Simpaisa

Payment Error Threshold Intelligence

An AI-assisted, merchant-aware alerting system that turns noisy wallet-error streams into ranked, ownership-attributed incidents.

Role / Product Manager, design & automation
Year / 2025
Status / Shipped in production · deepened as a portfolio build

Anomaly detection and alert-funnel dashboard

68% ALERT-NOISE REDUCTION

100 MERCHANTS MONITORED

The problem

In production, I shipped a Python and Datadog pipeline that cut payment-failure detection time from 1.5 hours to under a minute. But naive rules ("alert when error rate > 5%") still flood ops teams, because some errors are normal, some stem from merchant-side configuration, and only some signal a real operator outage. This build goes deeper: it detects genuine degradation and attributes ownership, without crying wolf.

What I did

A two-tier, dual-condition threshold model. Pre-alerts fire when errors rise above the baseline for that merchant / operator / error-reason combination AND the success rate trends unhealthy; full alerts escalate when both conditions breach stricter thresholds. Explainable by design, with an AI copilot that drafts plain-English summaries and merchant-safe comms around the alerts but never performs the detection itself.

Python / pandas · NumPy · Statistical thresholds · Datadog · AI copilot

THE MODEL

Two conditions beat one threshold.

For each merchant / operator / error-reason combination, I baseline the mean and spread of the error rate and its correlation with the success rate. A pre-alert needs error_rate ≥ mean + 1σ AND success ≤ a warning percentile; a full alert tightens both thresholds. The dual condition is what stops harmless error spikes from paging anyone.
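A minimal sketch of the two-tier rule, using only the Python standard library. The source states the pre-alert condition (mean + 1σ and a warning percentile) but not the full-alert values, so the `full_k=2.0` multiplier and the 25th/10th percentile choices are assumptions, as are the names `Baseline`, `fit_baseline`, and `classify`.

```python
from dataclasses import dataclass
from statistics import mean, pstdev, quantiles


@dataclass
class Baseline:
    """Historical profile for one (merchant, operator, error reason) key."""
    err_mean: float
    err_std: float
    success_warn: float  # success rate at the warning percentile
    success_crit: float  # success rate at the stricter percentile


def fit_baseline(error_rates, success_rates, warn_pct=25, crit_pct=10):
    """Summarise history; the percentile defaults are assumed, not from the source."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(success_rates, n=100, method="inclusive")
    return Baseline(
        err_mean=mean(error_rates),
        err_std=pstdev(error_rates),
        success_warn=cuts[warn_pct - 1],
        success_crit=cuts[crit_pct - 1],
    )


def classify(b, error_rate, success_rate, pre_k=1.0, full_k=2.0):
    """Return 'full_alert', 'pre_alert', or 'ok' for one observation."""
    # Both conditions must hold: errors above baseline AND success unhealthy.
    if error_rate >= b.err_mean + full_k * b.err_std and success_rate <= b.success_crit:
        return "full_alert"
    if error_rate >= b.err_mean + pre_k * b.err_std and success_rate <= b.success_warn:
        return "pre_alert"
    return "ok"
```

With a baseline of 4% ± 1% errors, an 8% error rate alongside a healthy 95% success rate returns `ok`, which is the case a single "error rate > 5%" rule would have paged on.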

OWNERSHIP

Whose problem is it?

Isolated to one merchant with config/credit signals → merchant-side. Errors rising across many merchants as success drops broadly → operator-side. That attribution is what turns an alert into an action.
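One way to sketch that attribution is to measure how widely alerts spread across an operator's merchants within a time window. The alert schema, the function name `attribute_owner`, and the 30% `spread_cutoff` are hypothetical; the sketch also ignores the config/credit error-reason signals mentioned above, which a fuller version would check before blaming a merchant.

```python
from collections import defaultdict


def attribute_owner(alerts, merchants_per_operator, spread_cutoff=0.3):
    """Label each operator's alerts in one window as operator- or merchant-side.

    alerts: list of dicts with 'merchant' and 'operator' keys (assumed schema).
    merchants_per_operator: operator -> number of active merchants.
    spread_cutoff: assumed share of merchants alerting before the operator is blamed.
    """
    affected = defaultdict(set)
    for alert in alerts:
        affected[alert["operator"]].add(alert["merchant"])

    owners = {}
    for operator, merchants in affected.items():
        spread = len(merchants) / merchants_per_operator[operator]
        owners[operator] = {
            "owner": "operator" if spread >= spread_cutoff else "merchant",
            "spread": spread,
            "merchants": sorted(merchants),
        }
    return owners
```

A single alerting merchant out of 50 yields a spread of 0.02 and a merchant-side label, while 20 of 50 alerting together crosses the cutoff and points at the operator.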

BACKTEST

From 4,002 candidates to 175 real incidents.

On a 30-day holdout across 100 merchants and 2 operators, success-rate guardrails cut 4,002 single-condition candidates to 1,269 dual-condition alerts, a 68% reduction, and surfaced 175 full-alert incidents as a ranked triage queue with owner, spread, and impact score.
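The headline percentage follows directly from the counts above; a short check, where the helper name `reduction` is illustrative:

```python
def reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


candidates, dual_condition, full_alerts = 4002, 1269, 175

# Single-condition candidates -> dual-condition alerts.
print(f"{reduction(candidates, dual_condition):.1%}")  # 68.3%

# Single-condition candidates -> full-alert incidents in the triage queue.
print(f"{reduction(candidates, full_alerts):.1%}")     # 95.6%
```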

From raw error spikes to ranked incidents · 30-day backtest, 100 merchants

Naive rule: “error rate > 5%” on its own

4,002

Dual-condition: errors up AND success down together

1,269

Real incidents: ranked, each with an owner

175

4,002 → 175: candidates triaged down to incidents a human can actually work

The trick is the dual condition: a page fires only when errors spike and success dips together, cutting the noise by 68% before anything ever reaches a person.

Keep exploring

sastaticket.pk
Cybersource Pending-Transaction Fix
Traced a 22% pending-transaction rate to a button that never disabled, then shipped the fix that eliminated it.

Simpaisa × Spotify / Boku
bKash Tokenization Mock API
A behavior-faithful bKash tokenization mock that stood in for a sandbox we did not have, so two engineering teams built in parallel and Spotify’s hard launch slot held on time.