Webhook secret rotation playbook

Overlapping secrets, verification cutover, handler validation, and rollback boundaries for production webhook endpoints.

01

Objective

Rotate webhook secrets using overlapping verification keys, validated handlers, and an observable cutover, minimizing false verification failures and duplicate-processing incidents.

02

Prerequisites

  • Current signing algorithm and header names documented per environment.
  • Handler supports dual-secret verification (current + next).
  • Idempotency store healthy; duplicate metrics baselined.
  • Rollback secret retained in a secure vault until cutover completes.
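
The dual-secret prerequisite can be sketched as follows. This is a minimal illustration, assuming the provider signs the raw request body with HMAC-SHA256 and sends a hex digest; the actual algorithm and header are whatever was documented per environment. The names `sign`, `verify_with_overlap`, and the secret labels are hypothetical.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def sign(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, not a provider API)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_with_overlap(
    payload: bytes, signature: str, secrets: Mapping[str, bytes]
) -> Optional[str]:
    """Return the label of the first secret that verifies, or None.

    `secrets` maps a label such as "current" or "next" to the key bytes.
    """
    for label, secret in secrets.items():
        expected = sign(secret, payload)
        # Constant-time comparison; bytes are used so arbitrary input is safe.
        if hmac.compare_digest(expected.encode(), signature.encode()):
            return label
    return None


# Usage: during the overlap window both secrets are accepted.
secrets = {"current": b"old-secret", "next": b"new-secret"}
body = b'{"event_id": "evt_1"}'
matched = verify_with_overlap(body, sign(b"new-secret", body), secrets)
```

Returning the matching label rather than a boolean lets the handler emit a metric per secret, which makes it visible when traffic has fully moved to the new secret.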

03

Operational signals

  • Elevated 401/403 on webhook endpoint during partial deploys.
  • Verification failure rate spike after config push.
  • Provider retry volume increasing without matching business events.
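
One way to watch for the verification failure spike is a sliding-window rate compared against the baseline. The sketch below is illustrative only: the class name, the 5-minute window, the 3x factor, and the minimum sample size are assumptions, not values from this playbook.

```python
from collections import deque
from typing import Deque, Tuple


class VerificationRateMonitor:
    """Tracks verification outcomes over a sliding time window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, timestamp: float, verified: bool) -> None:
        self._events.append((timestamp, verified))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop outcomes that fell out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def failure_rate(self, now: float) -> float:
        self._evict(now)
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)

    def is_spiking(
        self, now: float, baseline: float, factor: float = 3.0, min_events: int = 20
    ) -> bool:
        """True when the windowed rate exceeds `factor` times the baseline."""
        self._evict(now)
        if len(self._events) < min_events:
            return False  # too few samples to call it a spike
        return self.failure_rate(now) > baseline * factor
```

In practice this would usually live in the metrics system rather than in the handler; the point is that the alert compares against a baseline captured before the rotation.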

04

Decision points

  • Rotation window length and overlap duration.
  • Whether to pause auto-fulfillment during cutover.
  • Emergency rollback versus forward-fix when both secrets fail verification.
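
To make the window and overlap decision concrete, the accepted secrets can be expressed as a function of time. This is a sketch under assumed names (`accepted_secrets`, the "current" and "next" labels); the actual durations are the decision this section asks for.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def accepted_secrets(
    now: datetime, rotation_start: datetime, overlap: timedelta
) -> List[str]:
    """Labels of the secrets the handler should accept at `now`."""
    if now < rotation_start:
        return ["current"]          # rotation has not begun
    if now < rotation_start + overlap:
        return ["current", "next"]  # overlap: both secrets verify
    return ["next"]                 # cutover complete


# Usage with a hypothetical 24-hour overlap.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
during = accepted_secrets(start + timedelta(hours=1), start, timedelta(hours=24))
```

A longer overlap tolerates slow or partial deploys at the cost of keeping the old secret valid for longer.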

05

Escalation paths

  • On-call engineering → security for suspected secret exposure.
  • Payment operations → finance if verified events stop updating provider plane.
  • Provider support when the delivery gap exceeds the configured retry window.

06

Failure modes

  • Single-secret handler deployed before overlap period ends.
  • Logging full signatures or secrets during debugging.
  • Returning 2xx before the idempotency write completes while rotation-induced retries are arriving.
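
The last two failure modes can be avoided together, roughly as sketched below: the idempotency record and the business effect commit in one transaction before any 2xx is returned, and logs carry only a short fingerprint of the signature. An in-memory SQLite table stands in for the idempotency store, and `handle_event`, `fingerprint`, and the `apply` callback are hypothetical names; the sketch assumes the effect can share the store's transaction.

```python
import hashlib
import sqlite3
from typing import Callable, Tuple


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (event_id TEXT PRIMARY KEY)")
    return conn


def fingerprint(signature: str) -> str:
    """Short hash logged in place of the raw signature."""
    return hashlib.sha256(signature.encode()).hexdigest()[:8]


def handle_event(
    conn: sqlite3.Connection,
    event_id: str,
    signature: str,
    apply: Callable[[str], None],
) -> Tuple[int, str]:
    """Return (status, log line); 2xx only after the idempotency write commits."""
    tag = fingerprint(signature)
    try:
        with conn:  # commits on success, rolls back if apply() raises
            try:
                conn.execute(
                    "INSERT INTO processed (event_id) VALUES (?)", (event_id,)
                )
            except sqlite3.IntegrityError:
                # Already processed: acknowledge so the provider stops retrying.
                return 200, f"duplicate event={event_id} sig={tag}"
            apply(event_id)
    except Exception:
        # Nothing was committed, so a provider retry is safe.
        return 500, f"failed event={event_id} sig={tag}"
    return 200, f"applied event={event_id} sig={tag}"
```

If the business effect lives in a different system than the idempotency store, a single transaction is not available and the audit step under recovery patterns becomes more important.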

07

Recovery patterns

  1. Re-enable previous secret; confirm verification success rate normalizes.
  2. Replay failed events from the provider dashboard, or re-fetch them via API reads where available.
  3. Audit idempotency store for partial applies during outage window.
  4. Post-incident: tighten rotation runbook and CI fixture tests.
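
The audit in step 3 amounts to a three-way set comparison over the outage window. The sketch below assumes three ID sets can be exported: events the provider reports as sent, keys in the idempotency store, and events whose business effect is confirmed. The function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AuditReport:
    to_replay: List[str]   # sent by provider, no trace locally
    partial: List[str]     # idempotency key written, effect not applied
    unrecorded: List[str]  # effect applied, idempotency key missing


def audit_window(
    provider_ids: Iterable[str],
    recorded_ids: Iterable[str],
    applied_ids: Iterable[str],
) -> AuditReport:
    provider, recorded, applied = set(provider_ids), set(recorded_ids), set(applied_ids)
    return AuditReport(
        to_replay=sorted(provider - recorded - applied),
        partial=sorted(recorded - applied),
        unrecorded=sorted(applied - recorded),
    )


# Usage with made-up event IDs.
report = audit_window(
    provider_ids=["a", "b", "c", "d", "e"],
    recorded_ids=["a", "b"],
    applied_ids=["a", "c"],
)
```

Entries in `partial` would be skipped as duplicates on a naive replay, and entries in `unrecorded` would be applied twice, so both need handling before step 2 is repeated.
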

  • Retries are normal. Webhook delivery is at-least-once. Design consumers to tolerate duplicates and out-of-order arrivals where possible.
  • Asynchronous by design. Payers, chains, and your servers operate on different clocks. UI and finance should not assume synchronous finality.
  • Eventual consistency. API reads, webhooks, and portal views may briefly diverge during transitions. Reconciliation jobs exist to converge truth.
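
A consumer that tolerates duplicates and out-of-order arrivals might look like the sketch below. It assumes each event carries a unique ID and a per-resource sequence number that increases with each state change; not every provider supplies one, in which case a timestamp or an API read would have to stand in. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Event:
    event_id: str
    resource_id: str
    sequence: int  # assumed to increase with each change to the resource
    status: str


class ResourceState:
    """Keeps only the newest known status per resource."""

    def __init__(self) -> None:
        self._latest: Dict[str, Event] = {}
        self._seen: Set[str] = set()

    def apply(self, event: Event) -> bool:
        """Return True if the event changed state, False if it was ignored."""
        if event.event_id in self._seen:
            return False  # duplicate delivery
        self._seen.add(event.event_id)
        current = self._latest.get(event.resource_id)
        if current is not None and event.sequence <= current.sequence:
            return False  # stale: a newer event already arrived
        self._latest[event.resource_id] = event
        return True

    def status(self, resource_id: str) -> Optional[str]:
        event = self._latest.get(resource_id)
        return event.status if event else None
```

With this shape, a late "pending" event arriving after "paid" is recorded as seen but does not overwrite the newer status.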