Teams: This is what caused the recent massive outage

Published February 1, 2023
Author: Ash Khan

Teams’ parent company explains what it believes went wrong.

Microsoft, the company behind Office 365 and Teams, has disclosed its initial findings about what it believes caused a recent severe outage that affected some of its most popular software services.

The outage left users throughout Europe and Asia unable to use Microsoft 365 services for hours. Affected services included Microsoft Teams, Outlook, OneDrive for Business, Exchange Online, and SharePoint.

Initially, Microsoft blamed a wide-area networking (WAN) routing change. It has now disclosed the findings of its initial inquiry into the outage, which show that things were a little more complicated than that.

Microsoft Teams outage

According to the software company’s report, customers experienced networking connectivity issues between 07:05 UTC and 12:43 UTC on 25 January 2023. The issues showed up as high network latency and/or timeouts when users attempted to access Azure resources. Other Microsoft products and services, such as Microsoft 365 and the Power Platform, were also affected.

Microsoft explained that a change made to its Wide Area Network (WAN) caused the connectivity problems. The change affected connectivity between customers on the internet and Azure, connectivity across regions, and cross-premises connectivity via ExpressRoute.

As part of a planned change to update a WAN router’s IP address, a command sent to the router led it to send messages to all other routers in the WAN, causing all of them to recompute their adjacency and forwarding tables. During this recomputation, the routers were unable to correctly forward the packets passing through them. The command that triggered the problem behaves differently on different network devices, and it had not been validated through the full qualification process on the router where it was run.
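
To make the failure mode concrete, here is a toy Python sketch of the mechanism described above. It is not Microsoft's routing software: the four-router topology, the router names, and the functions `forwarding_table`, `flood_from`, and `can_deliver` are all hypothetical, and breadth-first search stands in for whatever routing protocol the real WAN uses. It only illustrates how a message flooded from one router can put every router into recomputation at once, so that no path is left to carry traffic.

```python
from collections import deque

# Hypothetical four-router WAN, given as an adjacency list (an assumption,
# not the real topology).
WAN = {
    "r1": ["r2", "r3"],
    "r2": ["r1", "r4"],
    "r3": ["r1", "r4"],
    "r4": ["r2", "r3"],
}


def forwarding_table(topology, source):
    """Map each destination to the next hop from `source`, using BFS."""
    next_hop = {}
    visited = {source}
    queue = deque()
    for neighbor in topology[source]:
        visited.add(neighbor)
        next_hop[neighbor] = neighbor
        queue.append(neighbor)
    while queue:
        node = queue.popleft()
        for neighbor in topology[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                # Reach `neighbor` through the same first hop used for `node`.
                next_hop[neighbor] = next_hop[node]
                queue.append(neighbor)
    return next_hop


def flood_from(topology, origin):
    """Return every router reached by a message flooded from `origin`."""
    reached = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in topology[node]:
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached


def can_deliver(topology, recomputing, src, dst):
    """Follow next hops from src to dst; fail if a forwarding router is recomputing."""
    node = src
    while node != dst:
        if node in recomputing:
            return False
        node = forwarding_table(topology, node)[dst]
    return True


# The command on r1 floods a message, so every router starts recomputing.
recomputing = flood_from(WAN, "r1")
print(can_deliver(WAN, set(), "r1", "r4"))        # before the change
print(can_deliver(WAN, recomputing, "r1", "r4"))  # during recomputation
```

In this sketch, delivery succeeds before the change and fails while every router is recomputing, which mirrors the report's point that the impact was network-wide, not limited to the router being updated.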

 

Microsoft stated that it identified the problem within an hour and restored all of its internal networking equipment within two and a half hours.

To prevent the same issue from happening again, Microsoft has blocked high-impact commands from being executed on the devices. The company is also requiring every command executed on its devices to follow safe change rules.
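
A guardrail of this kind could look roughly like the Python sketch below. Microsoft has not published how its controls are implemented, so everything here is an assumption made for illustration: the command names, the device models, the `HIGH_IMPACT_COMMANDS` and `QUALIFIED_ON` tables, and the `may_execute` function. The sketch combines the two ideas from the report: a command must have been qualified on the specific device model, and a high-impact command is refused unless it comes through an approved change.

```python
# Hypothetical policy data (assumptions for illustration only).
# Commands treated as high-impact:
HIGH_IMPACT_COMMANDS = {"change-router-ip"}

# Device models on which each command has passed full qualification:
QUALIFIED_ON = {
    "show-interfaces": {"model-a", "model-b"},
    "change-router-ip": {"model-a"},
}


def may_execute(command, device_model, approved_change=False):
    """Return (allowed, reason) for running `command` on `device_model`."""
    # Rule 1: the command must be qualified on this exact device model,
    # because the same command can behave differently across devices.
    if device_model not in QUALIFIED_ON.get(command, set()):
        return False, "not qualified on this device model"
    # Rule 2: high-impact commands are blocked outside an approved change.
    if command in HIGH_IMPACT_COMMANDS and not approved_change:
        return False, "high-impact command requires an approved change"
    return True, "ok"


print(may_execute("show-interfaces", "model-b"))
print(may_execute("change-router-ip", "model-b", approved_change=True))
print(may_execute("change-router-ip", "model-a"))
print(may_execute("change-router-ip", "model-a", approved_change=True))
```

Checking qualification first means that even an approved change is refused on a device model where the command was never validated, which is the gap the report identifies.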

 
