Observability

Payment systems intelligence (conceptual)

Observability for crypto payment infrastructure is plane-aware: it covers what integration engineers, operators, finance, and treasury each need to see, and what must never be collapsed into a single green indicator. These pages define signals and views, not live dashboards or published metrics.

Operational signal catalog

Mature teams track six bounded signals internally. Thresholds are merchant-defined; Kobbopay does not publish SLA numbers or live telemetry on this site.

Signal catalog summary — define thresholds internally; this site does not publish live metrics.

Signal | What it measures | Typical owner | Incident classes
Webhook recency | Time since the last successfully verified webhook was processed for a merchant environment, or per-endpoint if you shard … | Integration engineering / SRE | Webhook, Provider
Checkpoint lag | Elapsed time between lifecycle milestones (detection → eligibility → policy confirmation → finance reconciliation). | Payment operations | Settlement, Detection
Exception queue depth | Count of open, taxonomy-owned exceptions awaiting resolution, segmented by class and age bucket. | Operations / finance | Reconciliation, Settlement
Reconciliation drift | Persistent mismatch between commerce, provider, and finance plane states after matchers run, not one-off timing skew. | Finance reconciliation | Reconciliation
Provider latency | Response time and error rate for provider API reads/writes and webhook delivery attempts, observed from your integration … | Integration engineering | Provider, Webhook
Payout review backlog | Open payout or withdrawal requests awaiting treasury review, dual control, or ledger eligibility confirmation. | Treasury / finance | Payout, Reconciliation

  • Webhook recency: Time since the last successfully verified webhook was processed for a merchant environment, or per-endpoint if you shard consumers.

    Healthy pattern: Recency stays within thresholds you define per traffic profile; occasional gaps align with known quiet periods.

    Investigate when: Recency grows while commerce or provider planes show activity; spikes after deploys or secret rotation.

  • Checkpoint lag: Elapsed time between lifecycle milestones (detection → eligibility → policy confirmation → finance reconciliation).

    Healthy pattern: Lag distributions match rail and confirmation policy expectations documented internally.

    Investigate when: Payments stall between checkpoints; lag grows faster than historical baseline for the same rail.

  • Exception queue depth: Count of open, taxonomy-owned exceptions awaiting resolution, segmented by class and age bucket.

    Healthy pattern: Depth stable or draining during business hours; new items match known noise patterns.

    Investigate when: Depth grows monotonically; aging items exceed internal review targets; a single class dominates.

  • Reconciliation drift: Persistent mismatch between commerce, provider, and finance plane states after matchers run, not one-off timing skew.

    Healthy pattern: Drift items are rare, classified, and tied to known async windows.

    Investigate when: Same payment_id fails matchers repeatedly; drift clusters by rail, merchant, or time window.

  • Provider latency: Response time and error rate for provider API reads/writes and webhook delivery attempts, observed from your integration boundary.

    Healthy pattern: Latency and error rates within bands you track per environment; retries succeed without handler exhaustion.

    Investigate when: Elevated timeouts; read failures block status reconciliation; retry storms correlate with consumer crashes.

  • Payout review backlog: Open payout or withdrawal requests awaiting treasury review, dual control, or ledger eligibility confirmation.

    Healthy pattern: Backlog drains on schedule; holds are policy-driven with documented reasons.

    Investigate when: Requests exceed recognized balance checks; backlog grows during unrelated settlement incidents.
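
Two of the signals above can be computed from very little state. The following is a minimal Python sketch, not a Kobbopay API: the record shape (`class`, `opened_at`), the function names, and the age-bucket boundaries are all hypothetical, and real thresholds remain merchant-defined.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical age buckets; real boundaries are merchant-defined.
AGE_BUCKETS = [
    (timedelta(hours=1), "under_1h"),
    (timedelta(hours=24), "1h_to_24h"),
]
OLDEST_BUCKET = "over_24h"


def webhook_recency(last_verified_at: datetime, now: datetime) -> timedelta:
    """Time since the last successfully verified webhook was processed."""
    return now - last_verified_at


def age_bucket(opened_at: datetime, now: datetime) -> str:
    """Label an open exception by how long it has been waiting."""
    age = now - opened_at
    for limit, label in AGE_BUCKETS:
        if age < limit:
            return label
    return OLDEST_BUCKET


def exception_queue_depth(open_exceptions: list, now: datetime) -> Counter:
    """Count open exceptions per (class, age bucket) pair."""
    return Counter(
        (exc["class"], age_bucket(exc["opened_at"], now))
        for exc in open_exceptions
    )


# Illustrative data only.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
queue = [
    {"class": "Reconciliation", "opened_at": now - timedelta(minutes=20)},
    {"class": "Reconciliation", "opened_at": now - timedelta(hours=30)},
    {"class": "Settlement", "opened_at": now - timedelta(hours=3)},
]
depth = exception_queue_depth(queue, now)
recency = webhook_recency(now - timedelta(minutes=7), now)
```

Keeping the class and the age bucket together in one key makes both "investigate" conditions visible at once: a single class dominating, and items aging past review targets.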

Open full signal catalog reference →

Operational dashboard concepts

Role-oriented views prevent single-plane dashboards from hiding reconciliation drift, webhook gaps, or treasury risk. Design internal tooling against these questions—not vanity uptime percentages.

Role-oriented views — design your internal dashboards against these questions.

View | Primary questions (sample) | Key signals
Finance view | Which payments are books-ready versus merely detected? | Reconciliation drift; Exception queue depth; Checkpoint lag (finance gates)
Integration engineer view | Are webhooks verified, idempotent, and recent? | Webhook recency; Provider latency; Checkpoint lag (detection → Paid)
Support / operator view | What lifecycle state should support quote to the customer? | Checkpoint lag; Exception queue depth; Webhook recency (indirect stuck states)
Treasury view | Which balances are recognized versus in-flight? | Payout review backlog; Checkpoint lag (recognition → posting); Reconciliation drift (ledger vs provider)
Executive health view | Are payment systems degrading by class (webhook, settlement, reconciliation)? | Aggregate signal trends you define internally; Incident class counts (not vanity uptime percentages); Exception queue aging buckets

Finance view: primary questions

  • Which payments are books-ready versus merely detected?
  • Where do matchers fail across commerce, provider, and finance planes?
  • What exceptions block period close?

Must not collapse: Provider Confirmed labels into treasury posted without reconciliation evidence.
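
The question of where matchers fail across planes, and the distinction between persistent drift and one-off timing skew, can be sketched as a streak counter over matcher runs. This is an illustrative Python sketch under stated assumptions: the snapshot shape, the state names (`paid`, `pending`), and the three-run threshold are hypothetical, not values from Kobbopay.

```python
from collections import defaultdict

# Hypothetical threshold: consecutive mismatching matcher runs before a
# payment is treated as drift rather than timing skew.
PERSISTENT_AFTER_RUNS = 3


def planes_agree(snapshot: dict) -> bool:
    """True when commerce, provider, and finance report the same state."""
    return snapshot["commerce"] == snapshot["provider"] == snapshot["finance"]


def find_drift(matcher_runs: list) -> set:
    """Return payment_ids whose latest appearances all mismatched.

    matcher_runs is ordered oldest to newest; each run maps a payment_id
    to a snapshot of the three plane states. A matching run resets the
    streak, so a brief async window does not count as drift.
    """
    streak = defaultdict(int)
    for run in matcher_runs:
        for payment_id, snapshot in run.items():
            if planes_agree(snapshot):
                streak[payment_id] = 0
            else:
                streak[payment_id] += 1
    return {pid for pid, n in streak.items() if n >= PERSISTENT_AFTER_RUNS}


# Illustrative data only.
agree = {"commerce": "paid", "provider": "paid", "finance": "paid"}
skew = {"commerce": "paid", "provider": "paid", "finance": "pending"}
runs = [
    {"pay_1": skew, "pay_2": skew},
    {"pay_1": skew, "pay_2": agree},
    {"pay_1": skew, "pay_2": agree},
]
drift = find_drift(runs)
```

Here `pay_2` mismatches once and then settles, so it is treated as timing skew; `pay_1` fails every run and is reported as drift.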

Integration engineer view: primary questions

  • Are webhooks verified, idempotent, and recent?
  • Where do handlers crash or exhaust retries?
  • Which rails show elevated provider latency?

Must not collapse: HTTP 200 responses into successful side effects without idempotency persistence.
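
One way to avoid treating an HTTP 200 as proof of a side effect is to persist each event ID in the same transaction as the effect. The sketch below uses Python's standard `hmac` and `sqlite3` modules; the shared secret, the HMAC-SHA256-over-body signature scheme, the table name, and the `handle_webhook` function are assumptions for illustration, not a documented provider contract.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical shared secret and schema, for illustration only.
SECRET = b"example-webhook-secret"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")


def sign(body: bytes) -> str:
    """Assumed scheme: hex HMAC-SHA256 over the raw request body."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def handle_webhook(event_id: str, body: bytes, signature: str, apply_effect) -> str:
    """Verify the signature, then run the side effect at most once per event."""
    if not hmac.compare_digest(sign(body), signature):
        return "rejected"
    try:
        # The connection context manager commits on success and rolls the
        # insert back if apply_effect raises, so a failed effect can retry.
        with db:
            db.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
            apply_effect(body)
    except sqlite3.IntegrityError:
        # Primary key collision: this event_id was already processed.
        return "duplicate"
    return "processed"


# Illustrative delivery, retry, and forged request.
applied = []
body = b'{"payment_id": "pay_1", "status": "paid"}'
first = handle_webhook("evt_1", body, sign(body), applied.append)
retry = handle_webhook("evt_1", body, sign(body), applied.append)
forged = handle_webhook("evt_2", body, "bad-signature", applied.append)
```

The handler can answer 200 for both `processed` and `duplicate`, but only the first delivery changes state. In a real system the side effect would need to write through the same transaction, or be idempotent itself, for the rollback to be meaningful.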

Support / operator view: primary questions

  • What lifecycle state should support quote to the customer?
  • Which exceptions are owned and within review?
  • Is fulfillment allowed under merchant policy?

Must not collapse: Explorer screenshots or chat overrides into authoritative lifecycle truth.

Treasury view: primary questions

  • Which balances are recognized versus in-flight?
  • What payout requests await dual control?
  • Are settlement and payout rails aligned?

Must not collapse: Detected inbound funds into payout eligibility without recognition gates.

Executive health view: primary questions

  • Are payment systems degrading by class (webhook, settlement, reconciliation)?
  • Where are open incidents concentrated?
  • Is period close at risk from exception or drift trends?

Must not collapse: Multiple incident classes into a single uptime percentage without taxonomy.

Open dashboard model reference →

From signals to action

When a signal degrades, classify the incident, then open the matching playbook. Start with payment incident triage when the class is unclear.
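
The routing step can be sketched as a lookup from degraded signals to candidate incident classes. In this Python sketch the signal-to-class mapping is copied from the catalog table on this page, while the intersection rule, the identifier names, and the playbook labels are assumptions made for illustration.

```python
# Candidate incident classes per signal, copied from the catalog table.
CANDIDATE_CLASSES = {
    "webhook_recency": ["Webhook", "Provider"],
    "checkpoint_lag": ["Settlement", "Detection"],
    "exception_queue_depth": ["Reconciliation", "Settlement"],
    "reconciliation_drift": ["Reconciliation"],
    "provider_latency": ["Provider", "Webhook"],
    "payout_review_backlog": ["Payout", "Reconciliation"],
}

TRIAGE = "payment incident triage"


def route(degraded_signals: list) -> str:
    """Pick the one class shared by every degraded signal, else triage.

    Assumed rule: intersect the candidate classes of all degraded
    signals. Exactly one survivor means the class is clear; zero or
    several means it is unclear, so start with triage.
    """
    candidates = None
    for signal in degraded_signals:
        classes = set(CANDIDATE_CLASSES[signal])
        candidates = classes if candidates is None else candidates & classes
    if candidates and len(candidates) == 1:
        return f"{next(iter(candidates))} playbook"
    return TRIAGE


clear = route(["exception_queue_depth", "checkpoint_lag"])
unclear = route(["webhook_recency"])
```

Exception queue depth and checkpoint lag degrading together share only the Settlement class, so the sketch routes there directly; webhook recency alone could be a Webhook or a Provider incident, so it falls back to triage.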

Incident taxonomy and playbook routing →