Incident response runbook
This runbook defines outage handling for hosted services. It follows the published rule: confirmed outages and critical incidents are handled 7/24, while non-outage requests remain in the business-hour workflow.
Classification
| Class | Typical condition | Handling model |
|---|---|---|
| Outage / critical | Site unreachable, critical function unavailable | Immediate triage in 7/24 mode |
| Degraded but live | Partial failures or performance issues | Business-hour diagnosis and mitigation plan |
| Routine request | Feature ask, advisory, non-urgent change | Planned and scoped workflow |
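The table above can be read as a small decision rule. The following is a minimal sketch, not a definitive implementation: the `Report` fields and the `classify` function are hypothetical names introduced for illustration, and the order of checks (outage first, then degraded, then routine) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class IncidentClass(Enum):
    # Values are the handling models from the classification table.
    OUTAGE_CRITICAL = "Immediate triage in 7/24 mode"
    DEGRADED_LIVE = "Business-hour diagnosis and mitigation plan"
    ROUTINE = "Planned and scoped workflow"


@dataclass
class Report:
    # Hypothetical fields; the runbook does not define a report schema.
    site_reachable: bool
    critical_function_available: bool
    degraded: bool = False  # partial failures or performance issues


def classify(report: Report) -> IncidentClass:
    """Map a report to a class, checking the most severe condition first."""
    if not report.site_reachable or not report.critical_function_available:
        return IncidentClass.OUTAGE_CRITICAL
    if report.degraded:
        return IncidentClass.DEGRADED_LIVE
    return IncidentClass.ROUTINE


if __name__ == "__main__":
    example = Report(site_reachable=False, critical_function_available=True)
    print(classify(example).value)
```

Checking the outage condition first means a report that is both unreachable and degraded is still routed to 7/24 handling.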
Response flow
1. Open one incident thread. Include the domain, timestamp, visible symptom, and the [OUTAGE] subject marker if applicable.
2. Confirm outage scope. Validate whether the impact meets the outage/critical criteria or is a routine-scope issue.
3. Contain and restore. Apply a rollback or mitigation to return the service to a stable state.
4. Close with a written summary. Share the timeline, action log, and follow-up controls in the same thread.
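As an illustration of the first step, the opening message of an incident thread could be assembled as follows. This is a sketch under stated assumptions: the function name `build_opening_message`, the subject layout, and the body field labels are hypothetical. Only the required contents (domain, timestamp, visible symptom, [OUTAGE] marker) come from the runbook.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def build_opening_message(
    domain: str,
    symptom: str,
    is_outage: bool,
    observed_at: Optional[datetime] = None,
) -> Tuple[str, str]:
    """Return (subject, body) for the first message of an incident thread."""
    # Default to the current UTC time when no observation time is given.
    observed_at = observed_at or datetime.now(timezone.utc)
    # The marker is added only when the event is an outage.
    marker = "[OUTAGE] " if is_outage else ""
    subject = f"{marker}{domain}: {symptom}"
    body = "\n".join(
        [
            f"Domain: {domain}",
            f"Timestamp: {observed_at.isoformat()}",
            f"Visible symptom: {symptom}",
        ]
    )
    return subject, body


if __name__ == "__main__":
    subject, body = build_opening_message("example.com", "site unreachable", True)
    print(subject)
    print(body)
```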
Communication pattern
- Keep one thread per event; do not split updates across channels.
- Share verified facts first, then assumptions labeled clearly.
- Track key actions with timestamps for post-incident review.
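The communication rules above could be supported by a small timestamped log kept per thread. The sketch below is illustrative only: `IncidentThread`, `LogEntry`, and the `FACT` / `ASSUMPTION` labels are hypothetical names, not part of any documented tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LogEntry:
    at: datetime
    text: str
    verified: bool  # True for verified facts, False for assumptions


@dataclass
class IncidentThread:
    """One thread per event; every entry carries a timestamp."""
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, text: str, verified: bool, at: Optional[datetime] = None) -> None:
        self.entries.append(LogEntry(at or datetime.now(timezone.utc), text, verified))

    def render_update(self) -> str:
        # Verified facts come first, then assumptions, each clearly labeled.
        facts = [e for e in self.entries if e.verified]
        assumptions = [e for e in self.entries if not e.verified]
        lines = [f"{e.at.isoformat()} FACT: {e.text}" for e in facts]
        lines += [f"{e.at.isoformat()} ASSUMPTION: {e.text}" for e in assumptions]
        return "\n".join(lines)


if __name__ == "__main__":
    thread = IncidentThread()
    thread.record("Cache may be stale", verified=False)
    thread.record("Rollback applied", verified=True)
    print(thread.render_update())
```

Because the entries keep their timestamps, the same list can serve as the action log for the post-incident review.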
Outage marker
For Kernel Host outage events, use [OUTAGE] in the subject line. This is the documented trigger for immediate 7/24 handling.
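On the receiving side, routing by the marker could look like the sketch below. It assumes the marker must appear literally and in upper case somewhere in the subject; the runbook does not specify case or position rules, and the function name and return labels are hypothetical.

```python
OUTAGE_MARKER = "[OUTAGE]"


def handling_mode(subject: str) -> str:
    """Return the handling mode implied by a subject line.

    Assumption: an exact, case-sensitive match of the marker anywhere
    in the subject triggers immediate handling.
    """
    if OUTAGE_MARKER in subject:
        return "immediate 7/24"
    return "business-hour"


if __name__ == "__main__":
    print(handling_mode("[OUTAGE] example.com: site unreachable"))
    print(handling_mode("example.com: feature request"))
```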