BT Wholesale Leased Line Outage

Resolved
Operational
Started 9 months ago, lasted 12 days

Affected

Connectivity & Voice
Inspira Broadband
Inspira Leased Lines & EFM
Updates
  • Resolved
    Resolved

    As previously reported, the ultimate cause of the outage was a crash of the active switch in a virtual switch chassis at our Telehouse North PoP, which occurred after we replaced the failed standby switch. We have carried out this procedure many times in the past; it has always been a hitless operation and is documented as such. Post-mortem analysis with vendor TAC concluded that the supervisor on the active switch must have entered a partially failed state when it switched over from standby to active after the switch failure the previous week. Had this been visible to us in any way, we would have scheduled the replacement work in an out-of-hours maintenance window. In light of this incident, we will carry out replacements of this nature out of hours should we see any switch failures in these systems going forward.

    This particular switch chassis had an uptime of just over six and a half years prior to the outage last week. Despite this stability, we are now planning to move away from these virtual switch systems as part of our planned network upgrades. Our network will transition to a more modern and efficient spine-leaf architecture in which the failure of a single device will have little to no impact on service. These upgrades represent a significant investment and will be rolled out to all PoPs within the next 1-2 years.

    All maintenance work at our THN PoP is now complete and the PoP is operating with its previous stability. Please accept our apologies again for the downtime experienced.

  • Monitoring
    Update

    Apologies for the disruption experienced this afternoon. What should have been a straightforward replacement of failed hardware has not gone to plan. A series of unexpected issues has hampered our NOC and caused knock-on, service-affecting issues. We are now taking these findings to Cisco TAC for review before any more works take place.

    We expect all services to remain stable.

    Further updates on any planned works will be shared in due course.

  • Monitoring
    Update

    We are aware of continued disruption at Telehouse North affecting some leased line and broadband connections. We are abandoning further works today in order to restore stability. Apologies for this continued disruption.

  • Monitoring
    Monitoring

    When we brought the new replacement switch into service as a standby, the primary device went into a panic state and rebooted. The reboot took longer than it should have because it automatically upgraded at the same time.

    Apologies, this was unexpected.

  • Resolved
    Resolved

    All affected NNI and associated circuits should now be restored.

    There should be no need to action any changes on-site; connections should simply restore. If you continue to see disruption, please raise individual faults against the circuits in question.

    Apologies for the disruption caused.

  • Identified
    Identified

    The cause of the issue is a hardware failure. We have on-site hands moving the affected NNIs to another switch and we hope to get all circuits operational ASAP.

  • Investigating
    Update

    This is confirmed as impacting all carriers, not just BT Wholesale. Colt, Sky and TalkTalk are also impacted. The cause appears to be linked to our switches. NOC are investigating and we hope to have an update to share shortly. Apologies for the disruption this will cause.

  • Investigating
    Investigating

    We are investigating an issue impacting our BT Wholesale connectivity into Telehouse North. This is impacting leased lines and may have had a temporary impact on broadband. As soon as we know more, we will update this feed.