Global Azure outage: Our DNS update mangled domain records, reported Microsoft

Azure, Microsoft Office 365, Dynamics, Power BI, DevOps, all disrupted for nearly three hours...

Microsoft says a mishap during a DNS migration was behind an Azure outage that lasted nearly three hours on May 2, 2019, between 19:43 and 22:35 UTC.

The global incident affected a range of Microsoft cloud services, causing connection problems for Azure, multiple services under the Microsoft 365 umbrella, Dynamics, and DevOps.

It also affected Azure compute, storage, App Service, Azure AD identity services, and SQL Database.

Microsoft was midway through migrating its legacy domain name system (DNS) to Azure DNS when "some domains for Microsoft services were incorrectly updated", it explained on the Azure status history page.

Microsoft updated the page several times during the incident and as services were gradually restored.

The company assures customers that none of their DNS records were impacted during the event and that Azure DNS itself remained up throughout.

"The problem impacted only records for Microsoft services," it said, attributing the outage to a "nameserver delegation issue" that arose while Microsoft was transitioning from its legacy DNS servers to new DNS servers hosted in Azure.

All Microsoft services are now reporting normal operations.

The Incident Status message:


Network Connectivity - DNS Resolution

Summary of impact: Between 19:43 and 22:35 UTC on 02 May 2019, customers may have experienced intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc). Most services were recovered by 21:30 UTC with the remaining recovered by 22:35 UTC.

Preliminary root cause: Engineers identified the underlying root cause as a nameserver delegation change affecting DNS resolution and resulting in downstream impact to Compute, Storage, App Service, AAD, and SQL Database services. During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.

Mitigation: To mitigate, engineers corrected the nameserver delegation issue. Applications and services that accessed the incorrectly configured domains may have cached the incorrect information, leading to a longer restoration time until their cached information expired.

Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences. A detailed RCA will be provided within approximately 72 hours.
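
The mitigation note in the status message explains why recovery lagged behind the fix: clients that had already cached the incorrect records kept using them until those cache entries expired. The Python sketch below illustrates that effect with a toy time-to-live (TTL) cache. It is not Microsoft's resolver code; the names TtlCache, resolve, and simulate, the domain service.example, the nameserver values, and the 300-second TTL are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlCache:
    """Toy resolver cache: keeps an answer until its TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (answer, absolute expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, answer: str, ttl_seconds: float) -> None:
        self._entries[name] = (answer, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has expired: drop it so the next lookup goes upstream.
            del self._entries[name]
            return None
        return answer


def resolve(name: str, cache: TtlCache,
            authoritative: Dict[str, str], ttl_seconds: float) -> str:
    """Answer from the cache if possible, otherwise ask the source of truth."""
    cached = cache.get(name)
    if cached is not None:
        return cached
    answer = authoritative[name]
    cache.put(name, answer, ttl_seconds)
    return answer


def simulate() -> List[str]:
    """Replay a bad update, a fix, and the delay before clients see the fix."""
    now = [0.0]  # fake clock (seconds) so the example runs instantly
    cache = TtlCache(clock=lambda: now[0])

    # Hypothetical zone data: the record is wrong at first.
    authoritative = {"service.example": "ns.wrong.example"}
    seen = [resolve("service.example", cache, authoritative, 300.0)]

    # The record is corrected upstream, but the client still holds the old one.
    authoritative["service.example"] = "ns.right.example"
    now[0] = 120.0
    seen.append(resolve("service.example", cache, authoritative, 300.0))

    # Once the TTL has elapsed, the client finally picks up the fix.
    now[0] = 300.0
    seen.append(resolve("service.example", cache, authoritative, 300.0))
    return seen


if __name__ == "__main__":
    print(simulate())
```

In this sketch the second lookup, made two minutes after the fix, still returns the wrong nameserver. Only the third lookup, made once the 300-second TTL has elapsed, returns the corrected value. Real resolvers and applications use a variety of TTLs and caching layers, which would be consistent with services recovering at different times.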


Karl Carichner

CEO/Technology Specialist