Observability

Payment systems intelligence (conceptual)

Observability for crypto payment infrastructure is plane-aware: it covers what integration engineers, operators, finance, and treasury each need to see, and what must never be collapsed into a single green indicator. These pages define signals and views, not live dashboards or published metrics.

Full references: Signal catalog · Dashboard model · Incident taxonomy & routing · Incident triage playbook

Operational signal catalog

Mature teams track six bounded signals internally. Thresholds are merchant-defined; Kobbopay does not publish SLA numbers or live telemetry on this site.

Signal catalog summary — define thresholds internally; this site does not publish live metrics.

Signal	What it measures	Typical owner	Incident classes
Webhook recency	Time since the last successfully verified webhook was processed for a merchant environment—or per-endpoint if you shard …	Integration engineering / SRE	Webhook, Provider
Checkpoint lag	Elapsed time between lifecycle milestones (detection → eligibility → policy confirmation → finance reconciliation).	Payment operations	Settlement, Detection
Exception queue depth	Count of open, taxonomy-owned exceptions awaiting resolution—segmented by class and age bucket.	Operations / finance	Reconciliation, Settlement
Reconciliation drift	Persistent mismatch between commerce, provider, and finance plane states after matchers run—not one-off timing skew.	Finance reconciliation	Reconciliation
Provider latency	Response time and error rate for provider API reads/writes and webhook delivery attempts—observed from your integration …	Integration engineering	Provider, Webhook
Payout review backlog	Open payout or withdrawal requests awaiting treasury review, dual control, or ledger eligibility confirmation.	Treasury / finance	Payout, Reconciliation

Webhook recency: time since the last successfully verified webhook was processed for a merchant environment, or per endpoint if you shard consumers.
Healthy pattern: Recency stays within thresholds you define per traffic profile; occasional gaps align with known quiet periods.
Investigate when: Recency grows while commerce or provider planes show activity; spikes after deploys or secret rotation.
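The webhook recency check can be sketched in a few lines. This is a minimal illustration, not a Kobbopay API: the function names, the 15-minute threshold, and the `other_planes_active` flag are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def webhook_recency_seconds(last_verified_at: datetime, now: datetime) -> float:
    """Seconds since the last successfully verified webhook was processed."""
    return (now - last_verified_at).total_seconds()


def should_investigate(recency_s: float, threshold_s: float, other_planes_active: bool) -> bool:
    # A long gap alone may be a known quiet period. A long gap while the
    # commerce or provider plane shows activity is the pattern to investigate.
    return recency_s > threshold_s and other_planes_active


# Hypothetical values: last verified webhook 20 minutes ago, 15-minute threshold.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_verified_at = now - timedelta(minutes=20)
recency = webhook_recency_seconds(last_verified_at, now)
```

With these values `recency` is 1200 seconds, which exceeds the assumed 900-second threshold only matters if another plane is active at the same time.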
Checkpoint lag: elapsed time between lifecycle milestones (detection → eligibility → policy confirmation → finance reconciliation).
Healthy pattern: Lag distributions match rail and confirmation policy expectations documented internally.
Investigate when: Payments stall between checkpoints; lag grows faster than historical baseline for the same rail.
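One way to compute checkpoint lag is to diff the timestamps of consecutive milestones per payment. The sketch below assumes each payment carries a dictionary of milestone timestamps; the milestone order comes from the catalog, while the key names and both helper functions are hypothetical.

```python
from datetime import datetime, timezone

# Milestone order from the signal catalog; the key spellings are assumptions.
MILESTONES = ["detection", "eligibility", "policy_confirmation", "finance_reconciliation"]


def checkpoint_lags(reached: dict) -> dict:
    """Seconds elapsed between consecutive milestones that were both reached."""
    lags = {}
    for earlier, later in zip(MILESTONES, MILESTONES[1:]):
        if earlier in reached and later in reached:
            lags[f"{earlier}->{later}"] = (reached[later] - reached[earlier]).total_seconds()
    return lags


def waiting_since(reached: dict, now: datetime):
    """Return (last milestone reached, seconds waited for the next one), or None if complete."""
    done = [m for m in MILESTONES if m in reached]
    if not done or done[-1] == MILESTONES[-1]:
        return None
    return done[-1], (now - reached[done[-1]]).total_seconds()


# Hypothetical payment that reached eligibility and has not progressed since.
reached = {
    "detection": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "eligibility": datetime(2024, 1, 1, 12, 2, 30, tzinfo=timezone.utc),
}
now = datetime(2024, 1, 1, 12, 10, 0, tzinfo=timezone.utc)
```

Comparing `waiting_since` against the historical lag distribution for the same rail is what separates a stalled payment from a slow but normal one.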
Exception queue depth: count of open, taxonomy-owned exceptions awaiting resolution, segmented by class and age bucket.
Healthy pattern: Depth stable or draining during business hours; new items match known noise patterns.
Investigate when: Depth grows monotonically; aging items exceed internal review targets; single class dominates.
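Segmenting queue depth by class and age bucket could look like the following sketch. The bucket boundaries, the record fields (`class`, `age_hours`), and the 80% dominance share are assumptions for illustration; real targets are set internally.

```python
from collections import Counter

# Hypothetical age buckets in hours: (label, exclusive upper bound).
AGE_BUCKETS = [("<1h", 1), ("1-24h", 24), (">=24h", float("inf"))]


def age_bucket(age_hours: float) -> str:
    for label, upper in AGE_BUCKETS:
        if age_hours < upper:
            return label
    return AGE_BUCKETS[-1][0]


def queue_depth(open_exceptions: list) -> Counter:
    """Count open exceptions per (incident class, age bucket)."""
    return Counter((e["class"], age_bucket(e["age_hours"])) for e in open_exceptions)


def dominant_class(open_exceptions: list, share: float = 0.8):
    """Return the class holding at least `share` of the queue, or None."""
    counts = Counter(e["class"] for e in open_exceptions)
    if not counts:
        return None
    cls, n = counts.most_common(1)[0]
    return cls if n / len(open_exceptions) >= share else None


open_exceptions = [
    {"class": "Reconciliation", "age_hours": 0.5},
    {"class": "Reconciliation", "age_hours": 30},
    {"class": "Settlement", "age_hours": 3},
]
depth = queue_depth(open_exceptions)
```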
Reconciliation drift: persistent mismatch between commerce, provider, and finance plane states after matchers run, as opposed to one-off timing skew.
Healthy pattern: Drift items are rare, classified, and tied to known async windows.
Investigate when: Same payment_id fails matchers repeatedly; drift clusters by rail, merchant, or time window.
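The distinction between persistent drift and one-off timing skew can be sketched as a streak counter over matcher runs. This example assumes the three planes expose a comparable state value per `payment_id` and that a streak of three consecutive mismatches counts as persistent; both are illustrative assumptions, not documented behavior.

```python
from collections import defaultdict


def planes_agree(record: dict) -> bool:
    """True when commerce, provider, and finance planes report the same state."""
    return record["commerce"] == record["provider"] == record["finance"]


def persistent_drift(matcher_runs: list, min_consecutive: int = 3) -> set:
    """payment_ids whose current mismatch streak is at least `min_consecutive` runs.

    `matcher_runs` is ordered oldest first. A payment absent from a run keeps
    its previous streak; a run where the planes agree resets it to zero.
    """
    streak = defaultdict(int)
    for run in matcher_runs:
        for record in run:
            pid = record["payment_id"]
            streak[pid] = 0 if planes_agree(record) else streak[pid] + 1
    return {pid for pid, n in streak.items() if n >= min_consecutive}


# Hypothetical runs: pay_1 never converges, pay_2 is timing skew that resolves.
stuck = {"payment_id": "pay_1", "commerce": "paid", "provider": "confirmed", "finance": "unposted"}
skewed = {"payment_id": "pay_2", "commerce": "paid", "provider": "pending", "finance": "paid"}
settled = {"payment_id": "pay_2", "commerce": "paid", "provider": "paid", "finance": "paid"}
runs = [[stuck, skewed], [stuck, settled], [stuck, settled]]
```

Grouping the flagged identifiers by rail, merchant, or time window would then surface the clustering pattern described above.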
Provider latency: response time and error rate for provider API reads/writes and webhook delivery attempts, observed from your integration boundary.
Healthy pattern: Latency and error rates within bands you track per environment; retries succeed without handler exhaustion.
Investigate when: Elevated timeouts; read failures block status reconciliation; retry storms correlate with consumer crashes.
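A latency and error-rate summary measured at the integration boundary might be computed as below. The sketch uses a simple nearest-rank percentile; the input shape (pairs of latency in milliseconds and a success flag) and the choice of p50 and p95 are assumptions.

```python
def percentile(sorted_values: list, pct: int):
    """Nearest-rank percentile of an ascending list; pct is an integer from 1 to 100."""
    rank = (pct * len(sorted_values) + 99) // 100  # integer ceiling of pct * n / 100
    return sorted_values[max(rank, 1) - 1]


def provider_summary(calls: list):
    """Summarize (latency_ms, succeeded) pairs for one provider and environment."""
    if not calls:
        return None
    latencies = sorted(ms for ms, _ in calls)
    failures = sum(1 for _, ok in calls if not ok)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "error_rate": failures / len(calls),
    }


# Hypothetical window: nine fast successes and one slow failure.
calls = [(100, True)] * 9 + [(900, False)]
summary = provider_summary(calls)
```

Tracking the tail (p95) alongside the median keeps a single slow timeout from hiding behind a healthy average.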
Payout review backlog: open payout or withdrawal requests awaiting treasury review, dual control, or ledger eligibility confirmation.
Healthy pattern: Backlog drains on schedule; holds are policy-driven with documented reasons.
Investigate when: Requests exceed recognized balance checks; backlog grows during unrelated settlement incidents.
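The recognized-balance check and a backlog breakdown by hold reason can be sketched as follows. The field name `hold_reason`, its values, and the rule that already-queued payouts count against the recognized balance are assumptions for illustration only.

```python
from collections import Counter
from decimal import Decimal


def exceeds_recognized_balance(requested: Decimal, recognized: Decimal, already_queued: Decimal) -> bool:
    """True when a request, added to queued payouts, would exceed the recognized balance."""
    return requested + already_queued > recognized


def backlog_by_hold_reason(open_requests: list) -> Counter:
    """Count open payout requests per documented hold reason."""
    return Counter(r["hold_reason"] for r in open_requests)


# Hypothetical backlog; the reasons mirror the review stages named in the catalog.
open_requests = [
    {"id": "po_1", "hold_reason": "treasury_review"},
    {"id": "po_2", "hold_reason": "dual_control"},
    {"id": "po_3", "hold_reason": "dual_control"},
]
backlog = backlog_by_hold_reason(open_requests)
```

`Decimal` is used rather than `float` so that amounts compare exactly.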

Open full signal catalog reference →

Operational dashboard concepts

Role-oriented views prevent single-plane dashboards from hiding reconciliation drift, webhook gaps, or treasury risk. Design internal tooling against these questions—not vanity uptime percentages.

Role-oriented views — design your internal dashboards against these questions.

View	Primary questions (sample)	Key signals
Finance view	Which payments are books-ready versus merely detected?	Reconciliation drift; Exception queue depth; Checkpoint lag (finance gates)
Integration engineer view	Are webhooks verified, idempotent, and recent?	Webhook recency; Provider latency; Checkpoint lag (detection → Paid)
Support / operator view	What lifecycle state should support quote to the customer?	Checkpoint lag; Exception queue depth; Webhook recency (indirect stuck states)
Treasury view	Which balances are recognized versus in-flight?	Payout review backlog; Checkpoint lag (recognition → posting); Reconciliation drift (ledger vs provider)
Executive health view	Are payment systems degrading by class (webhook, settlement, reconciliation)?	Aggregate signal trends you define internally; Incident class counts (not vanity uptime percentages); Exception queue aging buckets

Finance view: primary questions

Which payments are books-ready versus merely detected?
Where do matchers fail across commerce, provider, and finance planes?
What exceptions block period close?

Must not collapse: Provider Confirmed labels into treasury posted without reconciliation evidence.

Integration engineer view: primary questions

Are webhooks verified, idempotent, and recent?
Where do handlers crash or exhaust retries?
Which rails show elevated provider latency?

Must not collapse: HTTP 200 responses into successful side effects without idempotency persistence.
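The gap between an HTTP 200 and a persisted side effect can be illustrated with a small idempotent handler. This is a sketch under stated assumptions: it uses an in-memory SQLite database as a stand-in store, the table names and `handle_webhook` signature are hypothetical, and signature verification is assumed to have happened before the handler is called.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store with an idempotency table and a stand-in ledger."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE ledger (event_id TEXT, amount INTEGER)")
    conn.commit()
    return conn


def handle_webhook(conn: sqlite3.Connection, event_id: str, amount: int):
    """Return (http_status, side_effect_applied) for an already verified event."""
    try:
        # One transaction: the idempotency key and the side effect commit
        # together, or both roll back if either insert fails.
        with conn:
            conn.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
            conn.execute("INSERT INTO ledger VALUES (?, ?)", (event_id, amount))
        return 200, True
    except sqlite3.IntegrityError:
        # Duplicate delivery: acknowledge it, but do not repeat the side effect.
        return 200, False


conn = make_store()
first = handle_webhook(conn, "evt_1", 500)
retry = handle_webhook(conn, "evt_1", 500)
ledger_rows = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
```

Both calls return status 200, yet only the first applies the side effect. A dashboard that counts 200 responses would report two successes; one that counts `side_effect_applied` reports one, which is the number finance cares about.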

Support / operator view: primary questions

What lifecycle state should support quote to the customer?
Which exceptions are owned and within review?
Is fulfillment allowed under merchant policy?

Must not collapse: Explorer screenshots or chat overrides into authoritative lifecycle truth.

Treasury view: primary questions

Which balances are recognized versus in-flight?
What payout requests await dual control?
Are settlement and payout rails aligned?

Must not collapse: Detected inbound funds into payout eligibility without recognition gates.

Executive health view: primary questions

Are payment systems degrading by class (webhook, settlement, reconciliation)?
Where are open incidents concentrated?
Is period close at risk from exception or drift trends?

Must not collapse: Multiple incident classes into a single uptime percentage without taxonomy.

Open dashboard model reference →

From signals to action

When a signal degrades, classify the incident, then open the matching playbook. Start with payment incident triage when the class is unclear.
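The routing step can be sketched as a lookup from degraded signals to candidate incident classes. The mapping below is copied from the signal catalog table; the intersection heuristic and the playbook naming are assumptions made for this example, not a documented routing rule.

```python
# Signal -> candidate incident classes, as listed in the signal catalog table.
SIGNAL_CLASSES = {
    "Webhook recency": ["Webhook", "Provider"],
    "Checkpoint lag": ["Settlement", "Detection"],
    "Exception queue depth": ["Reconciliation", "Settlement"],
    "Reconciliation drift": ["Reconciliation"],
    "Provider latency": ["Provider", "Webhook"],
    "Payout review backlog": ["Payout", "Reconciliation"],
}

TRIAGE = "payment incident triage"


def route(degraded_signals: list) -> str:
    """Pick a playbook when exactly one class explains every degraded signal.

    Hypothetical heuristic: intersect the candidate classes of all degraded
    signals. Anything other than a single shared class falls back to triage.
    """
    candidate_sets = [set(SIGNAL_CLASSES[s]) for s in degraded_signals if s in SIGNAL_CLASSES]
    if not candidate_sets:
        return TRIAGE
    common = set.intersection(*candidate_sets)
    if len(common) == 1:
        return f"{common.pop()} playbook"
    return TRIAGE
```

For example, `route(["Exception queue depth", "Reconciliation drift"])` yields the Reconciliation playbook, while `route(["Webhook recency"])` falls back to triage because the signal alone cannot separate a Webhook incident from a Provider one.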

Incident taxonomy and playbook routing →