API Abuse at Scale

June 1, 2024 · 2 min read · #security #ml

When you’re defending an API surface that 300 million people touch every day, the problem stops being about any single request. The attacker who matters isn’t the one firing a malformed payload — your WAF catches that. The one who matters looks exactly like a real user, because for a while, they were one.

The signal is in the shape, not the content

Early on I made the classic mistake: I tried to write rules. Rate limits, geo-blocks, header heuristics. Every rule I wrote, adversaries routed around within days. The lesson landed hard: static rules are a snapshot of yesterday’s attack.

What actually held up was modeling behavior over time. Not “is this request bad” but “does this entity behave like the population of legitimate entities.” That reframing changed everything:

A single login from Lagos means nothing.
Ten thousand logins from Lagos, each succeeding on the second attempt, with identical inter-request timing, means everything.
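
To make the "identical inter-request timing" signal concrete, here is a minimal sketch, not the production detector. It assumes you already have per-entity request timestamps in seconds, and it measures how regular the gaps between them are using the coefficient of variation. The function names and the 0.05 threshold are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def timing_regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive requests.

    Values near 0 mean the requests are almost perfectly evenly spaced.
    Returns None when there are too few requests to judge.
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return 0.0  # every request landed at the same instant
    return pstdev(gaps) / average_gap


def looks_scripted(timestamps, cv_threshold=0.05):
    """Flag entities whose request spacing is suspiciously uniform.

    cv_threshold is an illustrative assumption, not a tuned value.
    """
    cv = timing_regularity(timestamps)
    return cv is not None and cv < cv_threshold


scripted = [0.0, 2.0, 4.0, 6.0, 8.0]    # metronome-like spacing
organic = [0.0, 1.3, 9.8, 11.0, 47.5]   # irregular, human-like spacing
print(looks_scripted(scripted), looks_scripted(organic))
```

On its own this is still just one weak feature. It becomes useful when many entities share the same shape, which is the point of the Lagos example.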

Baselines beat blocklists

The system I’m proudest of doesn’t block anything outright. It builds a rolling baseline of normal for every dimension we can measure — request cadence, session shape, device entropy, the JA4 fingerprint of the TLS handshake — and scores deviation. Confident anomalies get challenged; everything else flows.
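
A rough sketch of what "rolling baseline plus deviation score" can look like for a single numeric dimension, such as requests per minute. It assumes an exponentially weighted mean and variance as the baseline and a z-score as the deviation. The real system's model is not described here, so the class name, the smoothing factor, and the challenge threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RollingBaseline:
    """Exponentially weighted mean and variance for one measured dimension."""

    alpha: float = 0.1  # smoothing factor (assumed value)
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def score(self, value):
        """Deviation from the baseline in standard deviations.

        Returns 0.0 until the baseline has seen enough data to have spread.
        """
        if self.count < 2 or self.var == 0:
            return 0.0
        return abs(value - self.mean) / math.sqrt(self.var)

    def update(self, value):
        """Fold a new observation into the baseline."""
        if self.count == 0:
            self.mean = value
        else:
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.count += 1


def decide(baseline, value, challenge_at=4.0):
    """Challenge confident anomalies; let everything else flow.

    Only allowed traffic updates the baseline, so a burst of abuse
    does not drag "normal" toward itself. challenge_at is an assumption.
    """
    if baseline.score(value) >= challenge_at:
        return "challenge"
    baseline.update(value)
    return "allow"


baseline = RollingBaseline()
for observed in [10, 11, 9, 10, 12, 10, 9, 11, 10, 10]:
    baseline.update(observed)

print(decide(baseline, 10.5), decide(baseline, 100))
```

Skipping the update on challenged values is one design choice among several. It protects the baseline from poisoning, but it also means a genuine shift in behavior needs some other path, such as a passed challenge, before it is learned.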

The goal was never to stop bad requests. It was to make abuse expensive and slow while keeping the good path invisible.

That reframing cut our manual triage load by about 85%. The analysts stopped chasing individual alerts and started reviewing clusters the model had already grouped and explained.

What I’d tell my past self

If you’re building abuse detection: instrument first, model second, block last. You cannot detect a deviation from a baseline you never measured. Spend the first month just watching, and the patterns will tell you what to build.
