X Thread: SRE Skills for Clawdbot

Post as a thread on the @avivl account.

I taught my AI assistant to do SRE.

Not just "summarize logs" — actual incident response, alert analysis, and engineering metrics.

Here's what an AI-powered operations toolkit looks like 🧵

1/11

Skill #1: Incident Response

My Clawdbot can now:
• Check production health across 67 Cloud Run services
• Correlate alerts with recent deploys
• Suggest rollbacks or scaling fixes
• Follow our actual runbooks instead of hallucinating them

2/11

The key: structured diagnostics.

It knows to check:

• Recent deployments (GitHub Actions)
• Error logs (Cloud Logging)
• Service metrics (latency, memory)
• External dependencies (API quotas)

In that order. Like an actual SRE would.

3/11
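For anyone who wants to try this, here is a minimal Python sketch of the fixed diagnostic order above. It assumes the signals have already been fetched into a hypothetical `Snapshot` record. The field names, check functions, and thresholds (1 hour, 1% errors, 1 s p95, 90% memory, 95% quota) are illustrative assumptions, not Clawdbot's actual implementation. The point is that the runbook order is encoded as data and the checks run in sequence.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Snapshot:
    """Hypothetical pre-fetched signals for one service (field names are illustrative)."""
    minutes_since_deploy: float  # from CI/CD history, e.g. GitHub Actions
    error_rate: float            # failing requests / total requests, from logs
    p95_latency_ms: float        # from service metrics
    memory_utilization: float    # 0.0 to 1.0
    api_quota_used: float        # 0.0 to 1.0 for the busiest external API


def check_recent_deploy(s: Snapshot) -> Optional[str]:
    # Elevated errors within an hour of a deploy is the classic rollback signal.
    if s.minutes_since_deploy < 60 and s.error_rate > 0.01:
        return "errors coincide with a recent deploy: consider rollback"
    return None


def check_error_logs(s: Snapshot) -> Optional[str]:
    if s.error_rate > 0.01:
        return f"error rate {s.error_rate:.1%} is above the 1% threshold"
    return None


def check_service_metrics(s: Snapshot) -> Optional[str]:
    if s.p95_latency_ms > 1000 or s.memory_utilization > 0.9:
        return "latency or memory is saturated: consider scaling"
    return None


def check_dependencies(s: Snapshot) -> Optional[str]:
    if s.api_quota_used > 0.95:
        return "an external API quota is nearly exhausted"
    return None


# The order mirrors the runbook: most likely cause first.
DIAGNOSTIC_ORDER: List[Tuple[str, Callable[[Snapshot], Optional[str]]]] = [
    ("recent deployments", check_recent_deploy),
    ("error logs", check_error_logs),
    ("service metrics", check_service_metrics),
    ("external dependencies", check_dependencies),
]


def diagnose(s: Snapshot) -> List[Tuple[str, str]]:
    """Run every check in runbook order and return the (step, finding) pairs that fired."""
    findings: List[Tuple[str, str]] = []
    for step, check in DIAGNOSTIC_ORDER:
        finding = check(s)
        if finding is not None:
            findings.append((step, finding))
    return findings
```

Because the order lives in `DIAGNOSTIC_ORDER`, changing the runbook means editing one list, and the findings come back in the same order a human would read them.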

Skill #2: Alert Insights

Weekly analysis of production alerts:
• Scans Gmail for alert patterns
• Identifies noisy/flapping alerts
• Cross-references with monitoring config
• Recommends specific threshold changes

Turns alert fatigue into actionable PRs.

4/11
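One way to flag flapping alerts, sketched in Python. It assumes alert emails have already been parsed into hypothetical `AlertEvent` records. In this sketch an alert counts as flapping if it fires at least `min_fires` times in the window and, on average, resolves within `max_avg_duration`. Both defaults are illustrative guesses, not values from our setup.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class AlertEvent:
    """Hypothetical alert email parsed into name, state, and timestamp."""
    name: str
    state: str  # "firing" or "resolved"
    at: datetime


def find_flapping(
    events: List[AlertEvent],
    min_fires: int = 5,
    max_avg_duration: timedelta = timedelta(minutes=10),
) -> Dict[str, int]:
    """Return {alert_name: fire_count} for alerts that fire often and clear quickly."""
    fires: Dict[str, int] = defaultdict(int)
    durations: Dict[str, List[timedelta]] = defaultdict(list)
    open_since: Dict[str, datetime] = {}

    for event in sorted(events, key=lambda e: e.at):
        if event.state == "firing":
            fires[event.name] += 1
            # Keep the earliest open time if duplicate "firing" emails arrive.
            open_since.setdefault(event.name, event.at)
        elif event.state == "resolved" and event.name in open_since:
            durations[event.name].append(event.at - open_since.pop(event.name))

    flapping: Dict[str, int] = {}
    for name, count in fires.items():
        closed = durations[name]
        if count >= min_fires and closed:
            avg = sum(closed, timedelta()) / len(closed)
            if avg <= max_avg_duration:
                flapping[name] = count
    return flapping
```

A flagged alert is a candidate for a longer evaluation window or a higher threshold. That is the kind of change the skill turns into a PR.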

The magic: it reads our actual infra code.

Points to specific files: "Adjust error threshold in src/core/services/monitoring/error-reporting.ts line 47"

Not generic advice. Real code changes.

5/11

Skill #3: DORA Metrics

Tracks the 4 key DevOps metrics:
• Deployment Frequency
• Lead Time for Changes
• Change Failure Rate
• Mean Time to Recovery (MTTR)

Weekly reports with trends and per-service breakdowns.

6/11

Data sources it pulls from:

• GitHub Actions → deploy frequency, failure rate
• GitHub PRs → lead time (created → merged)
• Gmail alerts → MTTR (alert → resolved)

All automated. No manual spreadsheets.

7/11
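Here is a rough Python sketch of how the three data sources above map onto the four metrics. The record types are hypothetical stand-ins for parsed GitHub Actions runs, GitHub PRs, and alert emails, assumed to be pre-filtered to the reporting window. Following the thread, change failure rate is approximated as failed deploy runs over all deploy runs, and lead time as PR created → merged. Both are simplifications of the formal DORA definitions, which count deployments that cause production failures and measure commit-to-production time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class DeployRun:
    """Hypothetical parsed deploy workflow run; group by `service` for per-service breakdowns."""
    service: str
    succeeded: bool


@dataclass
class PullRequest:
    """Hypothetical parsed PR; merged_at is None if it was never merged."""
    created_at: datetime
    merged_at: Optional[datetime]


@dataclass
class Incident:
    """Hypothetical alert email pair: first alert and its resolution."""
    alert_at: datetime
    resolved_at: datetime


def dora_report(deploys: List[DeployRun], prs: List[PullRequest],
                incidents: List[Incident], days: int = 7) -> dict:
    """Compute the four metrics over a reporting window of `days` days."""
    succeeded = [d for d in deploys if d.succeeded]
    lead_times = [pr.merged_at - pr.created_at for pr in prs if pr.merged_at is not None]
    restore_times = [i.resolved_at - i.alert_at for i in incidents]
    return {
        # Deployment frequency: successful deploys per day in the window.
        "deploys_per_day": len(succeeded) / days,
        # Lead time: median PR created -> merged (a proxy for commit -> production).
        "median_lead_time": median(lead_times) if lead_times else None,
        # Change failure rate: failed runs / all runs (a proxy; DORA counts production failures).
        "change_failure_rate": (len(deploys) - len(succeeded)) / len(deploys) if deploys else None,
        # MTTR: mean time from first alert to resolution.
        "mttr": sum(restore_times, timedelta()) / len(restore_times) if restore_times else None,
    }
```

Using the median for lead time keeps one long-lived PR from skewing the weekly number. Running the same function per service gives the per-service breakdown.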

The pattern: Skills = Runbooks as Code

Each skill is:
• A markdown file with procedures
• CLI commands it can run
• Context about our specific infra

AI follows the runbook. Humans review the output.

8/11
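To make "runbooks as code" concrete, here is a Python sketch that parses a numbered markdown procedure into ordered steps and pulls out the first inline `code` span of each step as the command to run. The step convention is a hypothetical format for illustration, not Clawdbot's actual skill file format. The `gh run list` and `gcloud logging read` commands in the test runbook are real CLI commands.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical convention: each procedure step is a numbered list item,
# and the first inline `code` span in a step is the command to run.
STEP_RE = re.compile(r"^\s*\d+\.\s+(.+)$")
CODE_RE = re.compile(r"`([^`]+)`")


@dataclass
class Step:
    text: str
    command: Optional[str]  # None for judgment steps with no command


def parse_runbook(markdown: str) -> List[Step]:
    """Extract ordered steps from a markdown runbook; other lines are context only."""
    steps: List[Step] = []
    for line in markdown.splitlines():
        match = STEP_RE.match(line)
        if match is None:
            continue  # headings and prose stay as context for the model
        text = match.group(1)
        code = CODE_RE.search(text)
        steps.append(Step(text=text, command=code.group(1) if code else None))
    return steps
```

Keeping the procedure in plain markdown means on-call humans can still read the same file, while the parser gives the agent an ordered checklist. Steps without a command are the judgment calls left for humans.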

What changed for us:

Before: Wake up to 47 alerts, spend 30 min triaging
After: "Golem, what happened overnight?" → 2-minute summary

Before: Monthly DORA review (if we remembered)
After: Weekly automated report in my inbox

9/11

The meta insight:

SRE is mostly pattern matching + executing known procedures.

That's exactly what AI is good at.

Humans should design the runbooks and make judgment calls. AI should execute the checklist.

10/11

All of this runs on a $34/month GCP VM.

Skills are just markdown files. No fancy infra needed.

Your AI assistant can be your junior SRE — if you teach it how.

#SRE #DevOps #AI #Clawdbot #PlatformEngineering

11/11
