@zheng022
Last active February 18, 2025 07:27

Availability Review Retrospective

We initiated a systematic availability review process following our July 2024 offsite (see Revival of the GHES Availability Review Process). The first availability issue under the revived process was created on August 16th, almost a year after our previous review.

Our journey began by exploring what availability truly means for GHES. We recognized that an escalation's value extends beyond its resolution: we aimed to foster deeper discussions, prevent recurrence through tracked repair items, and share knowledge through comprehensive runbooks.

Over the past six months, we've made significant progress in our availability review process:

  • Created 29 availability issues
  • Generated 29 repair items, with 22 successfully resolved
  • Conducted 9 availability review meetings

Thanks to our engineers' excellent presentations, escalations are now thoroughly documented and analyzed, providing valuable insight into our customers' most pressing needs. 🎖️ Special recognition goes to @JoeFranks1993 for resolving the highest number of severity 1 escalations.

[Figure: distribution of escalations by assignee]

On average, resolving an escalation takes approximately 14 hours. Our fastest resolution time is an impressive 19 minutes, achieved by @zachary-mark's lightning-quick fix 🏆.

[Figure: escalation resolution time analysis]

Analyzing issue labels has given us valuable insight into what causes escalations:

[Figure: pie chart of escalation causes by issue label]

Looking ahead, we see many opportunities for further improvement, and we welcome your feedback on our process.
