DawnOps

On-call rotations: how to reduce variance for new engineers

New responders look unpredictable because the system is unpredictable. Reduce variance by standardizing the first 10 minutes.

Standardize the first checks

Pick one source-of-truth dashboard per service. Every responder should start there. If a service doesn’t have one, fix that before you add a new engineer to the rotation.
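One way to enforce this before onboarding is a small check over a service registry. The sketch below is illustrative only: the registry shape, the service names, and the dashboard names are all hypothetical, and a real setup would read them from wherever your service catalog lives.

```python
# Hypothetical registry: service name -> dashboards flagged as the "truth" dashboard.
def services_missing_truth_dashboard(registry: dict[str, list[str]]) -> list[str]:
    """Return services that do not have exactly one truth dashboard."""
    return sorted(
        name for name, dashboards in registry.items() if len(dashboards) != 1
    )


# Example data (made up for illustration).
registry = {
    "checkout": ["checkout-overview"],
    "search": [],  # no truth dashboard yet
    "billing": ["billing-overview", "billing-legacy"],  # two candidates, pick one
}

blocked = services_missing_truth_dashboard(registry)
print(blocked)  # ['billing', 'search']
```

Treating "more than one" as a failure is a deliberate choice here: two competing dashboards leave a new responder guessing just as much as none.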

Define safe mitigations

List the reversible actions that are always allowed and how to verify them. Make it a short list and make it visible.
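As a rough sketch, the list can live as structured data so that every entry is forced to state how to undo it and how to verify it. The field names and the two example entries below are assumptions for illustration, not a recommended set of mitigations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeMitigation:
    name: str      # what the responder is allowed to do
    rollback: str  # how to undo it (the action must be reversible)
    verify: str    # how to confirm it worked


# Hypothetical entries; replace with the actions your team has agreed on.
SAFE_MITIGATIONS = [
    SafeMitigation(
        name="Roll back the last deploy",
        rollback="Redeploy the previous release",
        verify="Error rate on the truth dashboard returns to baseline",
    ),
    SafeMitigation(
        name="Disable the newest feature flag",
        rollback="Re-enable the flag",
        verify="Affected endpoint latency drops within 5 minutes",
    ),
]


def render_card(mitigations: list[SafeMitigation]) -> str:
    """Render the list as a short plain-text card for a runbook or channel topic."""
    lines = []
    for number, m in enumerate(mitigations, start=1):
        lines.append(f"{number}. {m.name}")
        lines.append(f"   undo:   {m.rollback}")
        lines.append(f"   verify: {m.verify}")
    return "\n".join(lines)


print(render_card(SAFE_MITIGATIONS))
```

Keeping the list in one place and rendering it from data makes it easier to keep short and to pin somewhere visible.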

Add a shadow week

A single shadow week with real pages is more valuable than a month of reading. Pair the new responder with a buddy, and have the new responder write the comms update at least once.

Use a short comms template

A 4-line update keeps comms consistent and reduces escalations.
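The article does not say which four lines to use. The sketch below assumes one plausible set (status, impact, current action, next update time); those field names are illustrative, so substitute whatever your team agrees on.

```python
def format_update(status: str, impact: str, action: str, next_update: str) -> str:
    """Build a 4-line incident update. The four fields are an assumed layout."""
    fields = {
        "Status": status,
        "Impact": impact,
        "Action": action,
        "Next update": next_update,
    }
    lines = []
    for label, value in fields.items():
        value = value.strip()
        # Reject empty or multi-line values so the update always stays at 4 lines.
        if not value or "\n" in value:
            raise ValueError(f"{label} must be a single non-empty line")
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


update = format_update(
    status="Investigating",
    impact="Checkout errors for some users",
    action="Rolling back the last deploy",
    next_update="In 15 minutes",
)
print(update)
```

Rejecting empty fields is the point of the template: a responder cannot skip the impact line or forget to say when the next update is due.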

Measure the variance

Track time to first mitigation and the number of escalations for each new responder. Both should shrink as the responder gains experience.
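A minimal sketch of both metrics, assuming each incident record carries the responder, the page time, the time of the first mitigation, and an escalation count. The record shape and the sample data are hypothetical; real numbers would come from your paging or incident tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Incident:
    responder: str
    paged_at: datetime
    first_mitigation_at: datetime
    escalations: int


def summarize(incidents: list[Incident]) -> dict[str, dict[str, float]]:
    """Per responder: median minutes to first mitigation and total escalations."""
    by_responder: dict[str, list[Incident]] = defaultdict(list)
    for incident in incidents:
        by_responder[incident.responder].append(incident)

    summary = {}
    for responder, items in by_responder.items():
        minutes = [
            (i.first_mitigation_at - i.paged_at).total_seconds() / 60 for i in items
        ]
        summary[responder] = {
            "incidents": len(items),
            "median_minutes_to_first_mitigation": median(minutes),
            "escalations": sum(i.escalations for i in items),
        }
    return summary


# Made-up sample data for one new responder.
incidents = [
    Incident("ana", datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 12), 1),
    Incident("ana", datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 14, 8), 0),
]
print(summarize(incidents))
```

Running the same summary per week or per month, rather than over all time, is what shows whether the two numbers are actually shrinking.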

Consistency is what turns on-call from heroics into routine.
