What makes a safe mitigation during incidents
Mitigations are where incidents compound: a rushed fix can add a second failure on top of the first. A safe mitigation is reversible, scoped, observable, and time-bounded.
The safe mitigation checklist
- Reversible: you can undo it quickly.
- Scoped: the blast radius is limited and known.
- Observable: you can verify impact within minutes.
- Time-bounded: you know when to stop and reassess.
If you can’t explain the verification window, it isn’t safe yet.
Mitigation safety axis
When you’re unsure, plot the action on these two axes, blast radius against reversibility:
Blast radius | Low reversibility | High reversibility
High | Avoid unless you must. | High risk. Escalate fast.
Low | Safe to try first. | Safe but verify quickly.
Common safe mitigations
- Roll back the last deploy.
- Disable a feature flag.
- Shed load for non-critical traffic.
- Scale out a consumer group, but only if the database behind it can handle the extra load.
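As an illustration of the load-shedding item, here is a minimal sketch of the decision itself. The `Request` type, its `critical` flag, and the 0.8 threshold are all hypothetical; a real system would get the load figure and the criticality tag from its own routing or admission layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    path: str
    critical: bool  # hypothetical tag; assumed to be set upstream by the routing layer


def should_shed(request: Request, current_load: float, shed_threshold: float = 0.8) -> bool:
    """Return True when this request should be rejected to protect the service.

    current_load is assumed to be a utilization fraction between 0.0 and 1.0.
    Critical traffic is never shed, which keeps the blast radius scoped.
    """
    if request.critical:
        return False
    return current_load >= shed_threshold
```

Because the threshold is a single parameter, undoing the mitigation is one change: raise the threshold back above any load the service can reach.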
Unsafe patterns
- Schema changes with no rollback plan.
- Multiple changes at once, so you can’t tell which one helped or hurt.
- Mitigations with no verification step.
A quick pre-flight check
Before acting, say out loud:
- “We’ll know this worked when ___ changes in ___ minutes.”
- “If that doesn’t happen, we’ll ___ next.”
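The two sentences above map onto a check with a deadline. The sketch below is one way to express that, assuming you can write the "this worked" condition as a function that returns True or False. The name `verify_within` and its parameters are illustrative, not taken from any tool.

```python
import time
from typing import Callable


def verify_within(
    check: Callable[[], bool],
    window_seconds: float,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it passes or the verification window closes.

    Returns True if the expected change was observed in time, and False
    if the window expired, which is the cue to move to the agreed next step.
    clock and sleep are injectable so the function can be tested without waiting.
    """
    deadline = clock() + window_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

A call might look like `verify_within(error_rate_is_below_one_percent, window_seconds=300)`, where `error_rate_is_below_one_percent` is a hypothetical function that queries your own metrics. A False result means the "we’ll ___ next" step applies.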
A one-minute decision template
- What are we changing?
- What is the blast radius?
- How will we verify?
- What is our rollback trigger?
If you can’t answer these in 60 seconds, the mitigation isn’t safe yet.
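If it helps to make the template concrete, the four questions can be held in a small record that refuses to call a plan ready while any answer is blank. This is a sketch only; `MitigationPlan` and its field names are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class MitigationPlan:
    # One field per question in the template; empty means unanswered.
    change: str = ""
    blast_radius: str = ""
    verification: str = ""
    rollback_trigger: str = ""

    def unanswered(self) -> List[str]:
        """Names of the questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_ready(self) -> bool:
        """True only when all four questions have an answer."""
        return not self.unanswered()
```

For example, a plan with only `change` and `blast_radius` filled in reports `verification` and `rollback_trigger` as unanswered, so it is not ready to run.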