
Postmortem: Hosted Build delays and Cloud Load Test errors on Visual Studio Team Services...

Discussion in 'Official Microsoft News' started by Visual Studio Team, Mar 31, 2017.

    Customer Impact:

    For 132 minutes on 15 March 2017, users of Visual Studio Team Services experienced delays and some timeouts for their builds and load tests. During the incident, builds for approximately 1,500 Hosted Build accounts did not start and eventually failed with timeout errors, and test runs for 40 Cloud Load Test accounts failed. The incident started at 21:48 UTC on 15 March 2017 and lasted until 00:00 UTC the following day.

    What went wrong:

    The workflows for both Hosted Build and Cloud Load Test rely on provisioning a new VM for each build and test run. On 15 March 2017 at 21:42 UTC, the Azure Storage Resource Provider started failing all service management operations globally. Because neither Hosted Build nor Cloud Load Test could replenish its VM capacity, builds remained queued and test runs failed. The VSTS SRE team was alerted 36 minutes into the incident, when the Build VM pool was exhausted. When we engaged our partners in Azure, we learned that they were already aware of the issue and working on a fix. After Azure Storage engineers applied a hotfix, both the Hosted Build and Cloud Load Test workflows recovered without manual intervention.

    Next Steps:

    Within the VSTS service, we identified opportunities to improve our monitoring. Specifically, our existing telemetry already logs exceptions for failed Azure management calls; alerting on these will let us notify the team earlier in future incidents. Additionally, our partners in Azure Storage have identified several resiliency improvements, which are outlined in the RCA linked below.

    Azure Storage Service RCA: https://azure.microsoft.com/en-us/status/history (view the entry titled "RCA – Storage provisioning impacting multiple services" on 3/16/2017).

    Sri Harsha
