
GitHub Reports December 2024 Service Disruptions
GitHub, a leading platform for software development collaboration, has disclosed two service disruptions that it experienced in December 2024. Both incidents degraded performance and affected user access to parts of its services.
The first disruption occurred on December 17, 2024, from 14:33 UTC to 14:50 UTC. During this period, users encountered intermittent errors and timeouts, with the error rate averaging 8.5% and peaking at 44.3% of requests. The incident impacted several core functionalities, including logging in, viewing repositories, and managing pull requests and issues.
According to GitHub’s blog post, the root cause was web server overload triggered by planned maintenance, which inadvertently caused the live updates service to fail. That service pushes automatic updates to users' open pages; with it unavailable, users had to refresh pages manually, which further strained the servers. GitHub responded by reversing the maintenance changes and scaling up the service to handle the increased traffic from WebSocket clients.
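The report does not describe how GitHub's clients reconnect, so the following is only a general illustration of one common way to keep many clients from hitting a recovering service at the same moment: exponential backoff with random jitter. The function name `backoff_delays` and its parameters (`base`, `cap`) are hypothetical and chosen for this sketch; they are not taken from GitHub's post.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return one randomized wait time (in seconds) per reconnect attempt.

    Hypothetical helper for illustration only. The upper bound doubles with
    each attempt (base, 2*base, 4*base, ...) up to `cap`, and the actual wait
    is drawn uniformly between 0 and that bound so that clients spread out
    instead of retrying in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        upper_bound = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, upper_bound))
    return delays


if __name__ == "__main__":
    # Print a sample retry schedule for six reconnect attempts.
    for attempt, delay in enumerate(backoff_delays(6)):
        print(f"attempt {attempt}: wait {delay:.2f}s before reconnecting")
```

Randomizing each wait matters as much as the doubling: without it, clients that disconnected together would also retry together.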
Post-incident analysis revealed gaps in GitHub’s alerting system, which led to a delayed assessment of the incident’s impact. As a result, GitHub is now focused on enhancing monitoring and alerting mechanisms to prevent similar issues in the future.
The second disruption took place on December 20, 2024, between 15:57 UTC and 16:39 UTC. GitHub attributed it to a partial outage at one of its third-party service providers, which made some marketing pages inaccessible and returned 500 errors to users who tried to open them. No operational products or service areas were affected during this time.
Access to the affected pages was restored once the service provider resolved the issue. GitHub is currently exploring ways to improve error handling and ensure graceful degradation of service in the event of future outages.
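GitHub has not said what form this graceful degradation would take. As a rough sketch of the general idea, a page handler can fall back to a previously cached copy, or to a short notice with a 503 status, instead of surfacing a raw 500 error when an upstream provider fails. The names `render_page`, `fetch_from_provider`, and `cached_copy` are hypothetical and used only for this example.

```python
def render_page(fetch_from_provider, cached_copy=None):
    """Return (status_code, body) for a page backed by an external provider.

    Hypothetical sketch: `fetch_from_provider` is any zero-argument callable
    that returns the page body or raises when the provider is unavailable.
    """
    try:
        return 200, fetch_from_provider()
    except Exception:
        # Broad catch for illustration; real code would catch the specific
        # errors its provider client raises and log the failure.
        if cached_copy is not None:
            # Serve the last known good content: stale, but still usable.
            return 200, cached_copy
        # Nothing cached: report a temporary condition rather than a 500.
        return 503, "This page is temporarily unavailable. Please try again shortly."


if __name__ == "__main__":
    def provider_down():
        raise ConnectionError("provider unreachable")

    print(render_page(provider_down, cached_copy="<h1>Cached page</h1>"))
    print(render_page(provider_down))
```

Whether stale content is acceptable depends on the page; for static marketing content it usually is, which is why the cached copy is tried first here.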
The company says it remains committed to improving the resilience of its infrastructure and the reliability of its services. Users can track real-time service status on GitHub's status page and follow ongoing improvements on the GitHub Engineering Blog.
Source: Blockchain.News