Inbound dialing issues in Colorado Springs (Resolved)
  • Priority - Critical
  • Affecting System - Option9
  • Data102 is aware of an inbound dialing issue in the Colorado Springs region, stemming from a CenturyLink interconnect problem. We are working aggressively to resolve this as soon as possible.

  • Date - 06/25/2020 08:00 - 07/14/2020 15:08
  • Last Updated - 07/14/2020 15:08
Inbound 719 call issues (Resolved)
  • Priority - Critical
  • Affecting Server - portal.magnavoip.com
  • At 21:51 MST Data102 saw multiple PRIs from our Colorado Springs local carrier go down simultaneously. This temporarily impacts inbound calling to 719 numbers on that trunk group.

    Data102 is in contact with the upstream provider and is working toward a speedy remedy.

    ** Update: 10:21pm: All of the PRIs are back up and appear to be functioning normally.

  • Date - 09/03/2018 21:51 - 07/14/2020 15:08
  • Last Updated - 09/03/2018 22:30
nimbus1 cloud platform network connectivity (Resolved)
  • Priority - Critical
  • Affecting Server - Nimbus VPS
  • Data102's nimbus platform experienced a partial switch failure at 15:25:27 MST. Network connectivity was fully restored at ~15:31:49. Because the partial switch failure impacted storage availability, some Windows virtual machines automatically rebooted. No Linux-based servers appear to have been impacted.

    Data102 is assessing every server to confirm it is operating normally at this time. Data102 will evaluate the switch failure and remediate accordingly.

  • Date - 08/24/2018 15:24 - 07/14/2020 15:08
  • Last Updated - 08/24/2018 16:05
Upstream connectivity issues (Resolved)
  • Priority - Critical
  • Affecting Other - Cogent transit
  • Over the last few hours we have observed some issues with our Cogent uplink, including heavy packet loss (an illustrative loss-measurement sketch follows this entry). We are temporarily removing Cogent from our transit mix at this time.

    ** Update 15:34 **
    We believe we have isolated and repaired the packet loss issues we have been seeing with Cogent and, after monitoring, have returned the network to full service.

  • Date - 08/07/2018 08:05 - 07/14/2020 15:08
  • Last Updated - 08/07/2018 15:36
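
The following is a minimal sketch of the kind of packet-loss measurement that informs a decision like this one; it simply wraps the system ping utility. The target address, probe count, and loss threshold are hypothetical examples, not Data102's actual monitoring configuration.

#!/usr/bin/env python3
"""Minimal packet-loss probe: send ICMP echo requests via the system `ping`
utility and report the observed loss percentage."""

import re
import subprocess

TARGET = "203.0.113.1"   # hypothetical next-hop address on the suspect transit
COUNT = 100              # probes per measurement cycle
LOSS_THRESHOLD = 5.0     # percent loss treated as "heavy" for this example

def measure_loss(target: str, count: int) -> float:
    """Run `ping -c <count>` and parse the '% packet loss' summary line."""
    result = subprocess.run(
        ["ping", "-c", str(count), target],
        capture_output=True, text=True,
    )
    match = re.search(r"([\d.]+)% packet loss", result.stdout)
    if not match:
        raise RuntimeError("could not parse ping output")
    return float(match.group(1))

if __name__ == "__main__":
    loss = measure_loss(TARGET, COUNT)
    print(f"{TARGET}: {loss:.1f}% loss")
    if loss > LOSS_THRESHOLD:
        print("heavy packet loss; candidate for removal from the transit mix")
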
Upstream connectivity flap 2018-03-21 @ 21:43 (Resolved)
  • Priority - High
  • Affecting Other - Network Services
  • Data102 saw our BGP sessions with Level3 fail this evening at around 21:43, 21:48, and 21:52 MST. Our logs and routers indicate that the BGP hold timer expired (a toy model of this mechanism follows this entry). We have opened a ticket with Level3 for more information as to what occurred; we did not observe any other BGP flaps, router reloads, or other issues, just Level3 flapping on both v4 and v6.

  • Date - 03/21/2018 21:48 - 07/14/2020 15:08
  • Last Updated - 03/21/2018 22:23
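
For context on the failure mode named above, the following is a toy model of the BGP hold timer described in RFC 4271: a session is torn down if no KEEPALIVE or UPDATE arrives within the negotiated hold time. The timer values below are common defaults, not the values negotiated on these particular sessions.

import time

HOLD_TIME = 90               # seconds; negotiated per session (180 is another common default)
KEEPALIVE = HOLD_TIME // 3   # keepalives are conventionally sent at one third of the hold time

class BgpSession:
    """Toy model of one BGP peering, tracking only the hold timer."""

    def __init__(self, hold_time: int = HOLD_TIME):
        self.hold_time = hold_time
        self.last_heard = time.monotonic()
        self.established = True

    def receive_keepalive(self) -> None:
        """Any KEEPALIVE or UPDATE from the peer resets the hold timer."""
        self.last_heard = time.monotonic()

    def check_hold_timer(self) -> bool:
        """If the peer has been silent longer than the hold time, drop the session."""
        if time.monotonic() - self.last_heard > self.hold_time:
            self.established = False   # the "hold timer expired" event seen in router logs
        return self.established
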
Impacted inbound/outbound dialing (Resolved)
  • Priority - Critical
  • Affecting Other - Magnavoip
  • Between 2:30pm and 3:00pm on 10/7/2017, Data102 observed multiple T1/PRIs lose connectivity to an upstream voice carrier.

    These circuits carry the majority of traffic for the Colorado Springs/719 region of our service. Customers with service via this carrier have been unable to make or receive calls for the duration of the outage.

    As of 4:30pm, we have remapped all of our outbound dialing to avoid these circuits; inbound service continues to be impacted.

    ** Update 4:40pm **
    We have learned that our upstream carrier suffered a power outage due to vandalism at their head end. Power has been restored, and services are being repaired.

    ** Update 6:05pm **
    We were notified of inbound service restoration, which was confirmed at approximately 6:10pm.

    ** Update 6:43pm **
    We were notified of outbound service restoration; trunks were reconfigured and outbound service was tested at 7:38pm (not urgent, as outbound calls were being processed through alternate upstreams without issue).

    ** Update 10/9/17 @ 11:30am **
    We are aware of intermittent inbound dialing failures, and of an "All circuits are busy" message occasionally playing on connect. This is due to various underlying circuits on our trunk group being down, and some calls attempting to use those downed services. We have remedied more than 50% of those circuits and are troubleshooting the remaining capacity.

    ** Update 10/9/17 @ 12:32pm **
    Technicians are en route to the far end of one of our down hi-cap circuits to replace equipment that was damaged during the outage. Upon replacement and cable swapping we expect full operation.

    ** Update 10/9/17 @ 2:16pm **
    Full service has been restored across all uplinks and carriers.

  • Date - 10/07/2017 14:29 - 03/21/2018 22:23
  • Last Updated - 10/09/2017 14:21
Degraded Spam filtration (Resolved)
  • Priority - Low
  • Affecting System - DirectMX
  • We have become aware that the DirectMX platform has failed to update anti-spam settings for at least 24 hours, resulting in a slight increase in spam being delivered to end users. We are aware of this issue and are actively investigating (a simple staleness-check sketch follows this entry).

    12:21pm
    This issue was isolated to a hung process and has been resolved; definition updates have completed.

  • Date - 12/09/2016 11:41 - 08/09/2017 11:59
  • Last Updated - 12/09/2016 12:21
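
As a simple illustration of how an update stall like this can be caught automatically, the sketch below flags anti-spam definitions that have not been refreshed within a 24-hour window. The definitions path is hypothetical; it is not the actual DirectMX file layout.

import os
import time

DEFINITIONS_PATH = "/var/lib/antispam/definitions.db"   # hypothetical path, for illustration only
MAX_AGE_SECONDS = 24 * 60 * 60                          # alert if older than 24 hours

def definitions_age(path: str) -> float:
    """Return how long ago, in seconds, the definitions file was last modified."""
    return time.time() - os.path.getmtime(path)

if __name__ == "__main__":
    age = definitions_age(DEFINITIONS_PATH)
    if age > MAX_AGE_SECONDS:
        print(f"ALERT: definitions are {age / 3600:.1f} hours old; the update process may be hung")
    else:
        print(f"OK: definitions updated {age / 3600:.1f} hours ago")
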
Intermittent outbound calling issues (Resolved)
  • Priority - Medium
  • Affecting Server - portal.magnavoip.com
  • Issue Description
    Earlier this morning Data102 became aware of intermittent outbound calling problems, where end users would place a call and receive a message indicating "Destination Unavailable" or similar. Inbound calls were not impacted.

    Issue Root Cause
    After debugging, Data102 determined that an upstream carrier's voice gateway became unavailable and unresponsive at an unknown time in the morning. Due to this failure, an alternate service was used for call completions. This failover service has a limited number of call paths available (46), and this alternate service intermittently became congested, resulting in a failure to complete outbound calls (an illustrative capacity sketch follows this entry).

    Issue Resolution
    After determining that the main voice gateway was unavailable, Data102 reconfigured the call platform to forward calls to an alternate voice gateway. This promptly resolved the call flow issue, and service was restored. Data102 has put additional monitoring of upstream service provider availability in place, and has set up additional alarming and auto-redirection services in case of gateway failure.

  • Date - 10/12/2016 09:29 - 10/12/2016 10:12
  • Last Updated - 10/12/2016 14:55
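
To illustrate why a trunk limited to 46 concurrent call paths can intermittently block calls, the sketch below evaluates the standard Erlang B formula. Only the 46-path figure comes from the incident; the offered loads are hypothetical busy-hour values.

def erlang_b(paths: int, offered_erlangs: float) -> float:
    """Blocking probability for `paths` circuits at a given offered load,
    computed with the standard Erlang B recursion."""
    b = 1.0
    for n in range(1, paths + 1):
        b = (offered_erlangs * b) / (n + offered_erlangs * b)
    return b

if __name__ == "__main__":
    paths = 46
    for load in (30.0, 40.0, 50.0):   # hypothetical offered loads, in erlangs
        blocked = erlang_b(paths, load) * 100
        print(f"{load:.0f} erlangs offered on {paths} paths -> {blocked:.1f}% of call attempts blocked")
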
RFO - VoIP - One-way audio problem (Resolved)
  • Priority - Critical
  • Affecting Server - sip.data102.com
  • Event Overview

    • Outage Scope: Customers using MagnaVoIP phone services behind NAT firewalls without ALG
    • Outage Date: 2016-07-11
    • Outage Type: One-way audio / loss of service
    • Severity: Critical
    • Root Cause: Bug causing incorrect NAT setting on customer-facing SIP proxy

    Event Report
    On 2016-07-10 @ 5:30am, Data102 completed a software upgrade of the MagnaVoIP platform, and throughout the day monitored and confirmed numerous active and final call completions.

    At approximately 8:05am 2016-07-11, multiple reports indicated that customers were experiencing one-way audio. Immediate troubleshooting began, and at 8:48am, vendor support was engaged.

    The vendor confirmed that the issue existed as described, but did not have a fix or a root cause.

    At approximately 1:30pm, the decision was made to roll back the previously performed upgrade. The rollback began at ~2:15pm, and all service was confirmed to be restored by 2:22pm.


    Customer Impact
    Customers whose SIP endpoint sits behind a firewall or similar device performing NAT, without a SIP Application Layer Gateway, and who were registered to sip1.magnavoip.com were impacted by this issue. This predominantly consisted of hosted handset customers. Customers with public IP addresses on their SIP endpoint, including IADs, Hosted vPBX, or byo-PBX with a public IP address, were not impacted.

    Customers who were impacted experienced one-way audio, and were unable to hear the remote party on the phone call. 

    Root Cause Analysis
    After gathering a large amount of debug data, sending it to our software vendor, and repeatedly retesting the situation in our development and testing environment, we were able to deduce the following:

    Data102's SIP1 customer-facing proxy was configured with "nat=0" rather than "nat=1". This caused the media handling to disregard whether or not SIP packets were NAT'd and to trust the supplied SDP; as a result, RTP/media was sent to the private IP address of the handset rather than to the public, NAT'd IP (a simplified sketch of this decision follows this entry). These private IP addresses are unroutable and unreachable, and thus no audio reached the handset.

    The aberration in the NAT configuration does not present as an issue on the current 3.0x software platform because of a bug that disregards the NAT setting and always forces "nat=1" regardless of the configured value or the network reality. This bug is fixed in the 4.x software, so the media is handled 'correctly' as configured, without NAT workarounds, which is what produced the one-way audio.

    This situation was not detected in our extensive development and testing routines because the default setting is "nat=1", and this configuration setting is not visible in the web UI; it is only accessible via the database. As such, it could not have been seen on a fresh dev/test install: it can only arise on an upgrade, where such a pre-existing broken configuration is unearthed, "fixed", and results in a service impact.

    Resolution
    Data102 resolved the immediate service impact by rolling back to the snapshots taken prior to the software upgrade on 2016-07-12. This immediately restored all functionality.

    Data102 completed very extensive testing in the development environment, with vendor involvement, to confirm exactly where the configuration problem lies, how the bug affected such a configuration, and how to fix it. Data102 is able to reproduce the behavior, and the solution, on demand, and is confident that the configuration fix is viable. Data102 plans to re-initiate the 4.x upgrade at a later date.

  • Date - 07/11/2016 08:00 - 07/11/2016 14:22
  • Last Updated - 07/13/2016 14:17
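
The following is a simplified sketch of the decision that the "nat" setting controls, not the vendor's actual implementation: when the SDP advertises a private (RFC 1918) address but the signalling arrived from a public source, RTP should be sent back to the observed source rather than to the address in the SDP. The IP addresses used are documentation/example values.

import ipaddress

def rtp_destination(sdp_connection_addr: str, observed_source_addr: str,
                    nat_workaround: bool) -> str:
    """Choose where to send RTP. With nat_workaround=False (the 'nat=0' case),
    the SDP address is trusted even when it is an unroutable private address,
    which reproduces the one-way audio described above."""
    advertised = ipaddress.ip_address(sdp_connection_addr)
    if nat_workaround and advertised.is_private:
        return observed_source_addr   # behave like 'nat=1': use the NAT'd public source
    return sdp_connection_addr        # behave like 'nat=0': trust the SDP verbatim

if __name__ == "__main__":
    sdp_addr = "192.168.1.50"       # private address a handset might advertise in its SDP
    source_addr = "198.51.100.20"   # public address the SIP traffic actually arrived from
    print("nat=1 behavior:", rtp_destination(sdp_addr, source_addr, nat_workaround=True))
    print("nat=0 behavior:", rtp_destination(sdp_addr, source_addr, nat_workaround=False))
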
CDP Backups offline (Resolved)
  • Priority - Critical
  • Affecting Other - SAN02-COS
  • Issue Description
    Service: CDP Backups
    Impact: Critical

    Issue Detail
    Yesterday at approximately 3:05 PM MST, Data102's Colorado Springs-side SAN02 lost two hard drives during a patrol read, destroying a 6-drive RAID5 set. At approximately 5PM, we completed swapping hard drives and began background-initializing the array, with an estimated completion time of 48 hours.

    This morning at ~9:15AM, we began the resync process, copying the data from our Denver-side SAN01 to SAN02 to restore all services. This process has an estimated completion time of ~33 hours (a rough illustration of how such estimates are derived follows this entry).

    In the meantime, to ensure data consistency, all CDP services have been temporarily disabled. 

    Updates
    01/01/16 @ 12:45pm: The initialization and copyback are proceeding, at 37% and 8.5% respectively.
    01/02/16 @ 08:30am: Process continues, at 57% and 59% respectively
    01/02/16 @ 11:50pm: Rebuild and data transfer complete
    01/03/16 @ 08:15am: All backup jobs started
    01/03/16 @ 11:30am: Backup jobs audited, repaired as needed
    01/04/16 @ 09:28am: All backup jobs appear to have completed as intended


    Our apologies for this significant inconvenience. We understand that backups are important, and are working as diligently and as swiftly as possible to restore service without losing any data. Should the need arise to take a backup or to restore a backup, we MAY be able to facilitate your request, depending on the location of your data. Please open a trouble ticket with your request and we will respond promptly.

    Data102 Operations Team
    VP/Ops Randal Kohutek
    888-328-2102 / 719-387-0000


  • Date - 01/01/2016 09:30 - 01/04/2016 09:30
  • Last Updated - 02/05/2016 12:40
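
As a rough illustration of how completion estimates like the 48-hour initialization and ~33-hour copy above are derived, the sketch below simply divides the amount of data to move by a sustained throughput. The capacity and throughput figures are hypothetical, not the actual SAN01/SAN02 numbers.

def hours_to_copy(capacity_tb: float, throughput_mb_s: float) -> float:
    """Time, in hours, to move `capacity_tb` terabytes at a sustained rate of
    `throughput_mb_s` megabytes per second (decimal units: 1 TB = 1,000,000 MB)."""
    total_mb = capacity_tb * 1_000_000
    return total_mb / throughput_mb_s / 3600

if __name__ == "__main__":
    # Hypothetical example: 10 TB replicated over a link sustaining about 85 MB/s.
    print(f"{hours_to_copy(10, 85):.0f} hours")   # roughly 33 hours at these assumed figures
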

Server Status

Below is a real-time overview of all of our servers, where you can check whether any known issues are currently occurring.

Server Name              HTTP  FTP  POP3  PHP Info  Server Load  Uptime
web01.orb.data102.com
web02.orb.data102.com