DawnOps

What makes a safe mitigation during incidents

Mitigations are where incidents compound: a rushed fix can stack a second failure on top of the first. A safe mitigation is reversible, scoped, easy to verify, and time-bounded.

The safe mitigation checklist

  • Reversible: you can undo it quickly.
  • Scoped: the blast radius is limited and known.
  • Observable: you can verify impact within minutes.
  • Time-bounded: you know when to stop and reassess.

If you can’t explain the verification window, it isn’t safe yet.
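As a rough illustration, the checklist can be encoded as a small record that reports which items are still unanswered. This is a minimal sketch, not a prescribed tool: the `MitigationPlan` class, its field names, and the example flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MitigationPlan:
    """Hypothetical record of a proposed mitigation (field names are illustrative)."""
    action: str
    rollback_step: Optional[str] = None        # Reversible: how to undo it
    blast_radius: Optional[str] = None         # Scoped: what it can affect
    verify_signal: Optional[str] = None        # Observable: what should change
    verify_window_min: Optional[int] = None    # Observable: how soon it should change
    reassess_after_min: Optional[int] = None   # Time-bounded: when to stop and reassess


def checklist_gaps(plan: MitigationPlan) -> List[str]:
    """Return the checklist items the plan does not yet satisfy."""
    gaps = []
    if not plan.rollback_step:
        gaps.append("Reversible")
    if not plan.blast_radius:
        gaps.append("Scoped")
    # A signal without a verification window does not count as observable.
    if not plan.verify_signal or plan.verify_window_min is None:
        gaps.append("Observable")
    if plan.reassess_after_min is None:
        gaps.append("Time-bounded")
    return gaps


plan = MitigationPlan(
    action="Disable the checkout-recommendations flag",  # made-up example
    rollback_step="Re-enable the flag",
    blast_radius="Recommendations widget on checkout only",
    verify_signal="checkout p99 latency",
)
print(checklist_gaps(plan))  # ['Observable', 'Time-bounded']
```

The plan above names a signal but no verification window, so it is reported as not yet observable, which mirrors the rule that an unexplained verification window means the mitigation is not safe yet.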

Mitigation safety axis

When you’re unsure, plot the action on this axis:

Blast radius ↑
High | Avoid unless you must.     | High risk. Escalate fast.
Low  | Safe, but verify quickly.  | Safe to try first.
       Low reversibility -----> High reversibility
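When several candidate actions are on the table, one way to apply the two axes is to rank the candidates: smallest blast radius first, and within the same blast radius, the most reversible option first. The sketch below assumes a team scores both axes on a 1 to 5 scale. The scale, the `Candidate` type, and the example actions are illustrative assumptions, not part of the original guidance.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    blast_radius: int    # assumed scale: 1 (one endpoint) .. 5 (whole platform)
    reversibility: int   # assumed scale: 1 (hard to undo) .. 5 (instant undo)


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Smallest blast radius first; ties broken by highest reversibility."""
    return sorted(candidates, key=lambda c: (c.blast_radius, -c.reversibility))


options = [
    Candidate("Fail over the primary database", blast_radius=5, reversibility=2),
    Candidate("Disable the new search flag", blast_radius=1, reversibility=5),
    Candidate("Roll back the last deploy", blast_radius=3, reversibility=4),
]
for candidate in rank_candidates(options):
    print(candidate.name)
```

The scores are judgment calls made under pressure, so the ranking is only a prompt for discussion, not a decision.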

Common safe mitigations

  • Roll back the last deploy.
  • Disable a feature flag.
  • Shed non-critical traffic.
  • Scale out a consumer group, but only if the downstream database can absorb the extra load.
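To make one of these concrete, here is a minimal sketch of shedding non-critical traffic. It assumes requests can be classified by path. The `CRITICAL_ROUTES` set, the route names, and the `should_shed` helper are hypothetical. The sketch is scoped (critical routes are never shed) and reversible (setting the fraction back to 0.0 restores all traffic).

```python
import random
from typing import Callable

# Hypothetical list of routes that must never be shed.
CRITICAL_ROUTES = {"/checkout", "/login"}


def should_shed(
    path: str,
    shed_fraction: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether to reject a request.

    shed_fraction is the share of non-critical requests to drop:
    0.0 sheds nothing (the rollback state), 1.0 sheds all of them.
    """
    if path in CRITICAL_ROUTES:
        return False  # Scoped: critical traffic is always served.
    return rng() < shed_fraction


# Example: drop roughly half of non-critical requests.
for path in ["/checkout", "/recommendations", "/search"]:
    action = "shed" if should_shed(path, shed_fraction=0.5) else "serve"
    print(path, action)
```

In a real service the rejected requests would typically get a fast error response so clients can back off. That part is omitted here.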

Unsafe patterns

  • Schema changes with no rollback plan.
  • Multiple changes at once.
  • Mitigations with no verification step.

A quick pre-flight check

Before acting, say out loud:

  • “We’ll know this worked when ___ changes in ___ minutes.”
  • “If that doesn’t happen, we’ll ___ next.”
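The two sentences above amount to a check with a deadline. A minimal sketch of that idea follows, assuming the team has some function that returns True once the expected change is visible. The `verify_within` helper and its parameters are hypothetical, and the `check` callable stands in for whatever metric query is actually used.

```python
import time
from typing import Callable


def verify_within(
    check: Callable[[], bool],
    window_s: float,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or the verification window expires."""
    deadline = clock() + window_s
    while True:
        if check():
            return True   # "We'll know this worked when ___ changes."
        if clock() >= deadline:
            return False  # Window expired: move to the agreed next step.
        # Never sleep past the deadline.
        sleep(min(interval_s, max(0.0, deadline - clock())))


# Placeholder check that succeeds immediately; replace it with a real signal,
# for example a query that returns True once the error rate has dropped.
if verify_within(lambda: True, window_s=300):
    print("Mitigation verified")
else:
    print("Not verified: move to the agreed next step")
```

Writing down the window and the fallback before starting keeps the decision to stop from being renegotiated mid-incident.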

A one-minute decision template

  • What are we changing?
  • What is the blast radius?
  • How will we verify?
  • What is our rollback trigger?

If you can’t answer these in 60 seconds, the mitigation isn’t safe yet.
