# Recovery Playbook

## Eval below threshold
The HITL prompt fires automatically. If the reviewer rejects and adds a reason, the run terminates and a recovery row is created on Monday.

## Buffer publish failure
Publishing is retried automatically 3 times with exponential backoff. If it still fails, an alert is posted in `#recovery`. Manual options:
1. Reschedule for later
2. Post a correction
3. Mark as cancelled
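
A minimal sketch of the retry-then-alert behaviour described above. The names `publish_with_retry`, `publish`, and `alert` are hypothetical, and it assumes "3x" means one initial attempt followed by up to 3 retries with a doubling delay; the real delays and alert wording may differ.

```python
import time
from typing import Callable, Optional


def publish_with_retry(
    publish: Callable[[], None],
    alert: Callable[[str], None],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Attempt a publish, retrying with exponential backoff; alert if all attempts fail.

    `publish` and `alert` are placeholders for the real Buffer call and the
    `#recovery` notifier (both assumed, not taken from the codebase).
    """
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):  # 1 initial attempt + `retries` retries
        try:
            publish()
            return True
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                # Delay doubles each time: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** attempt)
    alert(f"Buffer publish failed after {retries} retries: {last_error}")
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.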

## Orphan run
On the next tick, `runner.py` detects a stale `state.json` lock (more than 1 hour old), logs the run as an orphan, and sends an alert.
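
One way the stale-lock check could look, as a sketch only. It assumes the lock's age is taken from the file's modification time; `is_stale_lock` and `STALE_AFTER_SECONDS` are hypothetical names, and `runner.py` may well track the lock age differently (for example, via a timestamp stored inside `state.json`).

```python
import os
import time
from typing import Optional

STALE_AFTER_SECONDS = 60 * 60  # a lock older than 1 hour counts as stale


def is_stale_lock(lock_path: str, now: Optional[float] = None) -> bool:
    """Return True if the lock file exists and was last modified over an hour ago."""
    try:
        mtime = os.path.getmtime(lock_path)
    except FileNotFoundError:
        # No lock file means there is no run to orphan.
        return False
    now = time.time() if now is None else now
    return (now - mtime) > STALE_AFTER_SECONDS
```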

## Image generation failure
The run is marked failed with `image_unavailable`. Re-run manually from Step 03.

## Wrong-country copy
Triggers a `country_copy_check` failure, which routes the run to HITL. The reviewer rejects it and the copy is re-drafted.

> TODO: add SOP for each failure mode after W6 simulation pass.

---

## Grants Pipeline Failure Modes

### Budget blocking gate fires -- salary above 40 pct
The 40 pct salary check is a hard gate. When it fires:
1. Run is blocked and cannot reach pre-submission
2. Telegram alert in hitl-approvals with restructuring guidance
3. Options: restructure budget and re-run, or waive with explicit override
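
The gate logic above might be sketched as follows. This assumes the 40 pct figure is salary as a share of the total budget, which the playbook does not spell out; `budget_gate`, `GateResult`, and the `override` flag are hypothetical names for illustration.

```python
from dataclasses import dataclass

SALARY_SHARE_LIMIT = 0.40  # assumed: salary as a fraction of total budget


@dataclass
class GateResult:
    blocked: bool
    salary_share: float
    overridden: bool = False


def budget_gate(salary: float, total: float, override: bool = False) -> GateResult:
    """Block the run when salary exceeds the limit, unless explicitly overridden."""
    if total <= 0:
        raise ValueError("total budget must be positive")
    share = salary / total
    over_limit = share > SALARY_SHARE_LIMIT
    return GateResult(
        blocked=over_limit and not override,
        salary_share=share,
        overridden=over_limit and override,
    )
```

Recording `overridden` separately from `blocked` keeps a waived gate visible in later review rather than making it look like a clean pass.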

### Partner unresponsive -- more than 14 days
Flagged automatically during the scan. The partner status is downgraded to `unresponsive` and an alert is posted in agent-ops. Consider replacing the partner or dropping the consortium component.
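
A sketch of the 14-day check, assuming the scan knows when each partner last responded. `partner_status` and the `"active"` default status are hypothetical; only the `unresponsive` status and the 14-day threshold come from this playbook.

```python
from datetime import datetime, timedelta

UNRESPONSIVE_AFTER = timedelta(days=14)


def partner_status(last_response: datetime, now: datetime, current: str = "active") -> str:
    """Downgrade to 'unresponsive' once more than 14 days have passed without a reply."""
    if now - last_response > UNRESPONSIVE_AFTER:
        return "unresponsive"
    return current
```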

### Capacity guard -- 2 or more active drafts
New opportunities in the evaluating stage are deferred until an active draft completes. The deferral is logged as `capacity_deferred` in audit. No intervention is needed; the next tick retries.
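
The guard could be expressed roughly as below. `admit_opportunity` and the list-based `audit` log are hypothetical stand-ins; only the threshold of 2 active drafts and the `capacity_deferred` event name come from this playbook.

```python
from typing import Dict, List

MAX_ACTIVE_DRAFTS = 2


def admit_opportunity(opportunity_id: str, active_drafts: int, audit: List[Dict[str, str]]) -> bool:
    """Return True if the opportunity may proceed; otherwise record a deferral."""
    if active_drafts >= MAX_ACTIVE_DRAFTS:
        # Deferred, not failed: the next tick simply evaluates it again.
        audit.append({"event": "capacity_deferred", "opportunity": opportunity_id})
        return False
    return True
```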

### HITL gate timeout
If no response arrives within the timeout, the run terminates cleanly. The opportunity stays at its current stage, and the next daily tick retries.

### Eval score below threshold
The run routes through pre-submission HITL with the score and failing checks visible. The reviewer can approve despite the low score.

### Subagent verify unavailable
Graceful degradation: the check returns `passed=True` with `weight=0`, so it does not count toward the score. Other checks still score normally.
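
To illustrate why a zero-weight check leaves the score unchanged, here is a sketch that assumes the eval score is a weighted pass fraction. `CheckResult` and `weighted_score` are hypothetical names, and the actual scoring formula may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CheckResult:
    name: str
    passed: bool
    weight: float


def weighted_score(results: Iterable[CheckResult]) -> float:
    """Weighted fraction of passing checks; zero-weight checks contribute nothing."""
    results = list(results)
    total = sum(r.weight for r in results)
    if total == 0:
        # Assumption: with no weighted checks at all, report 0.0 rather than divide by zero.
        return 0.0
    return sum(r.weight for r in results if r.passed) / total
```

With this formula, a degraded check reported as `CheckResult("subagent_verify", True, 0)` adds nothing to either the numerator or the denominator, so the score equals what the remaining checks produce on their own.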