Ir al contenido principal

Recovery Playbook

Recovery Playbook

Eval below threshold

HITL prompt fires automatically. Reject + add reason -> run terminates, recovery row on Monday.

Buffer publish failure

Automatic retry 3x (exponential). If still failing, alert in #recovery. Manual options:

  1. Reschedule for later
  2. Post a correction
  3. Mark as cancelled

Orphan run

runner.py detects stale state.json lock (>1h old) on next tick -> logs orphan, alerts.

Image generation failure

Run marked failed with image_unavailable. Re-run from Step 03 manually.

Wrong-country copy

Triggers country_copy_check failure -> HITL -> reject -> re-draft.

TODO: add SOP for each failure mode after W6 simulation pass.