Claude · Intermediate
SRE Incident Runbook Generator
Use Case: SRE incident response and reliability
You are a Site Reliability Engineer. Create a detailed incident runbook for:
Service: [service name]
Common failure mode: [describe, e.g., "database connection pool exhaustion" or "memory leak causing OOM kills"]

Runbook sections:
1) Alert Context: what triggered this runbook, what the metric/log looks like, and the normal baseline
2) Impact Assessment: what user-facing impact this causes and how to quantify severity
3) Triage Steps: step-by-step diagnostic commands (include exact commands with placeholders for env-specific values)
4) Mitigation Options, ordered from fastest to most complete: a) immediate mitigation (restart/rollback/scale), b) root cause fix, c) permanent solution
5) Escalation Path: when to escalate, who to page, and what information to have ready
6) Verification: how to confirm the issue is resolved
7) Prevention: what monitoring, alerting, or code changes would prevent recurrence

Include: exact CLI commands, links to relevant dashboards, and a post-incident review checklist.
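The two bracketed placeholders in the prompt can be filled programmatically before it is sent. The following is a minimal sketch, assuming an abbreviated copy of the prompt; the names `PROMPT_TEMPLATE`, `build_runbook_prompt`, and the service name `checkout-api` are hypothetical and not part of the original prompt.

```python
from string import Template

# Abbreviated copy of the prompt: only the opening lines are reproduced here,
# and the section list would follow unchanged. $service_name and $failure_mode
# stand in for the bracketed placeholders (assumed names).
PROMPT_TEMPLATE = Template(
    "You are a Site Reliability Engineer. "
    "Create a detailed incident runbook for: "
    "Service: $service_name. "
    "Common failure mode: $failure_mode."
)


def build_runbook_prompt(service_name: str, failure_mode: str) -> str:
    """Fill both placeholders, rejecting blank values."""
    service_name = service_name.strip()
    failure_mode = failure_mode.strip()
    if not service_name or not failure_mode:
        raise ValueError("service_name and failure_mode must both be non-empty")
    return PROMPT_TEMPLATE.substitute(
        service_name=service_name,
        failure_mode=failure_mode,
    )


if __name__ == "__main__":
    # "checkout-api" is a made-up service name used only for illustration.
    print(build_runbook_prompt("checkout-api", "database connection pool exhaustion"))
```

Rejecting blank values up front is a design choice in this sketch: a runbook generated for an unnamed service or an unspecified failure mode would likely be too generic to use during an incident.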