We initiated a systematic availability review process following our July 2024 offsite (see Revival of the GHES Availability Review Process). The first availability issue was then created on August 16th, marking almost a year since our previous review.
Our journey began by exploring what availability truly means for GHES. We recognized that an escalation's value extends beyond mere resolution: we aimed to foster deeper discussions, prevent recurrence through measured repair items, and share knowledge via comprehensive runbooks.
Over the past six months, we've made significant strides in our availability review process:
- Created 29 availability issues
- Generated 29 repair items, with 22 successfully resolved
- Conducted 9 availability review meetings
Thanks to our engineers' excellent presentations, escalations are now thoroughly documented and analyzed, providing invaluable insights into our customers' most pressing needs. 🎖️ Special recognition goes to @JoeFranks1993 for resolving the highest number of severity 1 escalations.
On average, resolving an escalation takes approximately 14 hours. Our fastest resolution time stands at an impressive 19 minutes, achieved by @zachary-mark's lightning-quick fix 🏆.
Analysis of issue labels has provided valuable insights into escalation causes:
Looking ahead, we see numerous opportunities for further improvement and welcome your feedback on our process.


