Incidents
Which playbook should I open?
Payment incidents are not interchangeable. Identify the operational signal, assign an incident class, then open the playbook and reference that match the plane where the failure sits. When the class is unclear, start with payment incident triage.
Signals: Observability · Signal catalog · All playbooks
Incident taxonomy
Six classes cover the majority of production payment operations incidents. Each class maps to a primary playbook; secondary playbooks handle adjacent failure modes.
Detection
Payment attempts are observed incorrectly, duplicated at source, or not attributed to commerce records, all before policy confirmation.
Characteristic signals
- Checkpoint lag at detection
- Duplicate chain sends
- Missing payment_id linkage
Primary playbook: Duplicate payment investigation
Also consider: Exception queue triage
References: Settlement checkpoint model · Operational signal catalog
Settlement
Lifecycle progression stalls between detection and policy confirmation, or the confirmation policy cannot classify risk automatically.
Characteristic signals
- Checkpoint lag
- Stuck Paid populations
- Confirmation policy triggers
Primary playbook: Delayed settlement recovery
Also consider: Confirmation policy escalation · Settlement operations checklist
References: Confirmation policy matrix · Async settlement lifecycle
Webhook
Signed event delivery, verification, ordering, or idempotency failures on merchant webhook consumers.
Characteristic signals
- Webhook recency lag
- Verification failure spikes
- Duplicate side effects
Primary playbook: Webhook secret rotation
Also consider: Duplicate payment investigation
References: Webhook delivery model · Provider retry semantics
Provider
Upstream API degradation, webhook gaps, or read failures that prevent authoritative provider plane updates.
Characteristic signals
- Provider latency
- Webhook recency gap
- API error rate elevation
Primary playbook: Provider outage response
Also consider: Reconciliation close (freeze)
References: Provider retry semantics · Webhook delivery model
Reconciliation
Persistent three-plane mismatch, exception queue overload, or period-close blockers requiring finance ownership.
Characteristic signals
- Reconciliation drift
- Exception queue depth
- Matcher repeat failures
Primary playbook: Exception queue triage
Also consider: Reconciliation close procedure · Under/overpayment handling
References: Reconciliation state model · Operational signal catalog
Payout
Withdrawal requests blocked, ledger eligibility disagreements, or treasury review backlog threatening outbound movement.
Characteristic signals
- Payout review backlog
- Ledger vs provider disagreement
- Recognition gate failures
Primary playbook: Merchant payout review
Also consider: Treasury recognition procedure
References: Merchant ledger transitions · Payment health dashboard model
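The taxonomy above can also be read as a lookup structure. The following is a minimal Python sketch, not part of any existing tooling. The class labels (Detection, Settlement, Webhook, Provider, Reconciliation, Payout) are taken from the incident class column of the routing table, and `IncidentClass`, `TAXONOMY`, and `suggest_playbook` are hypothetical names. The fallback rule (return payment incident triage when no class matches, or when two classes match equally well) is an assumption modeled on the guidance to start with triage when uncertain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentClass:
    name: str                    # short label, assumed from the routing table
    signals: frozenset[str]      # characteristic signals, lowercased
    primary: str                 # primary playbook
    secondary: tuple[str, ...]   # "also consider" playbooks


TAXONOMY = (
    IncidentClass("Detection",
                  frozenset({"checkpoint lag at detection", "duplicate chain sends",
                             "missing payment_id linkage"}),
                  "Duplicate payment investigation",
                  ("Exception queue triage",)),
    IncidentClass("Settlement",
                  frozenset({"checkpoint lag", "stuck paid populations",
                             "confirmation policy triggers"}),
                  "Delayed settlement recovery",
                  ("Confirmation policy escalation", "Settlement operations checklist")),
    IncidentClass("Webhook",
                  frozenset({"webhook recency lag", "verification failure spikes",
                             "duplicate side effects"}),
                  "Webhook secret rotation",
                  ("Duplicate payment investigation",)),
    IncidentClass("Provider",
                  frozenset({"provider latency", "webhook recency gap",
                             "api error rate elevation"}),
                  "Provider outage response",
                  ("Reconciliation close (freeze)",)),
    IncidentClass("Reconciliation",
                  frozenset({"reconciliation drift", "exception queue depth",
                             "matcher repeat failures"}),
                  "Exception queue triage",
                  ("Reconciliation close procedure", "Under/overpayment handling")),
    IncidentClass("Payout",
                  frozenset({"payout review backlog", "ledger vs provider disagreement",
                             "recognition gate failures"}),
                  "Merchant payout review",
                  ("Treasury recognition procedure",)),
)

TRIAGE = "Payment incident triage"


def suggest_playbook(observed: list[str]) -> str:
    """Return the primary playbook of the single best-matching class, else triage."""
    seen = {s.strip().lower() for s in observed}
    # Score each class by how many of its characteristic signals were observed.
    scores = sorted(((len(c.signals & seen), c) for c in TAXONOMY),
                    key=lambda pair: pair[0], reverse=True)
    best, runner_up = scores[0], scores[1]
    if best[0] == 0 or best[0] == runner_up[0]:
        return TRIAGE  # assumption: no match or a tie means the class is unclear
    return best[1].primary
```

For example, `suggest_playbook(["Payout review backlog", "Recognition gate failures"])` returns `"Merchant payout review"`, while a mix such as `["Checkpoint lag", "Webhook recency lag"]` ties between two classes and falls back to triage. Matching is by exact signal name, so real signal data would need normalizing first.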
Signal → playbook routing
Use direct routing when the signal and class are already clear; when the class is unclear, start with payment incident triage. The summary table uses playbook names; linked routes follow in the list below.
| Signal | Incident class | Open playbook | Supporting reference |
|---|---|---|---|
| Webhook recency lag | Webhook / Provider | Payment incident triage | Webhook delivery model |
| Webhook recency lag (sustained, API errors) | Provider | Provider outage response | Provider retry semantics |
| Checkpoint lag (detection → Paid) | Detection / Settlement | Delayed settlement recovery | Settlement checkpoint model |
| Checkpoint lag (policy confirmation) | Settlement | Confirmation policy escalation | Confirmation policy matrix |
| Exception queue depth rising | Reconciliation | Exception queue triage | Reconciliation state model |
| Reconciliation drift (repeat matcher failure) | Reconciliation | Reconciliation close | Reconciliation state model |
| Provider latency / API errors | Provider | Provider outage response | Provider retry semantics |
| Payout review backlog | Payout | Merchant payout review | Merchant ledger transitions |
| Duplicate payment / replay side effects | Webhook / Detection | Duplicate payment investigation | Webhook delivery model |
| Amount variance (under/over) | Reconciliation | Under/overpayment handling | Reconciliation state model |
- Webhook recency lag → Webhook / Provider → Payment incident triage + Webhook delivery model
- Webhook recency lag (sustained, API errors) → Provider → Provider outage response + Provider retry semantics
- Checkpoint lag (detection → Paid) → Detection / Settlement → Delayed settlement recovery + Settlement checkpoint model
- Checkpoint lag (policy confirmation) → Settlement → Confirmation policy escalation + Confirmation policy matrix
- Exception queue depth rising → Reconciliation → Exception queue triage + Reconciliation state model
- Reconciliation drift (repeat matcher failure) → Reconciliation → Reconciliation close + Reconciliation state model
- Provider latency / API errors → Provider → Provider outage response + Provider retry semantics
- Payout review backlog → Payout → Merchant payout review + Merchant ledger transitions
- Duplicate payment / replay side effects → Webhook / Detection → Duplicate payment investigation + Webhook delivery model
- Amount variance (under/over) → Reconciliation → Under/overpayment handling + Reconciliation state model
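The routing rows above amount to a keyed lookup with a triage fallback. A minimal Python sketch of that idea follows; `Route`, `ROUTES`, `FALLBACK`, and `route` are hypothetical names, the signal strings are the row labels lowercased, and returning payment incident triage for an unlisted signal is an assumption based on the "start with triage when class is unclear" guidance.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    incident_class: str
    playbook: str
    reference: Optional[str]


# Keys are the routing-table signal labels, lowercased.
ROUTES = {
    "webhook recency lag":
        Route("Webhook / Provider", "Payment incident triage", "Webhook delivery model"),
    "webhook recency lag (sustained, api errors)":
        Route("Provider", "Provider outage response", "Provider retry semantics"),
    "checkpoint lag (detection → paid)":
        Route("Detection / Settlement", "Delayed settlement recovery", "Settlement checkpoint model"),
    "checkpoint lag (policy confirmation)":
        Route("Settlement", "Confirmation policy escalation", "Confirmation policy matrix"),
    "exception queue depth rising":
        Route("Reconciliation", "Exception queue triage", "Reconciliation state model"),
    "reconciliation drift (repeat matcher failure)":
        Route("Reconciliation", "Reconciliation close", "Reconciliation state model"),
    "provider latency / api errors":
        Route("Provider", "Provider outage response", "Provider retry semantics"),
    "payout review backlog":
        Route("Payout", "Merchant payout review", "Merchant ledger transitions"),
    "duplicate payment / replay side effects":
        Route("Webhook / Detection", "Duplicate payment investigation", "Webhook delivery model"),
    "amount variance (under/over)":
        Route("Reconciliation", "Under/overpayment handling", "Reconciliation state model"),
}

# Assumption: any signal not listed above is treated as unclassified.
FALLBACK = Route("Unclear", "Payment incident triage", None)


def route(signal: str) -> Route:
    """Look up a signal label, ignoring case and extra whitespace."""
    key = " ".join(signal.lower().split())
    return ROUTES.get(key, FALLBACK)
```

Calling `route("Payout review backlog").playbook` gives `"Merchant payout review"`, and an unlisted label such as `route("unknown signal")` returns `FALLBACK`, so a caller is always pointed at some playbook.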
Open payment incident triage to classify signals, assign incident class, and route to the correct playbook without skipping evidence collection.