Technology & Architecture

Why We Chose Lightweight Anomaly Detection Over ML Models

December 18, 2025 · 3 min read

Machine learning fraud detection is the industry-standard pitch. Train models on labeled data, deploy neural networks, let AI catch fraudsters.

We went a different direction: rule-based detection with lightweight anomaly signals. This wasn't because we couldn't build ML systems. It was a deliberate engineering choice.

The ML Fraud Detection Promise

In theory, ML models can:

  • Learn complex patterns humans miss
  • Adapt to evolving fraud techniques
  • Handle high-dimensional feature spaces
  • Provide probabilistic risk scores

This sounds great. The reality is more complicated.

Why ML Fraud Detection Struggles

1. Training Data Problem

ML needs labeled training data: examples of "fraud" and "not fraud." But fraud labels are often:

  • Delayed (you don't know if traffic was fraudulent until downstream metrics appear)
  • Subjective (what's "fraud" varies by buyer, offer, vertical)
  • Imbalanced (fraud is rare, creating class imbalance issues)
  • Adversarial (fraudsters adapt when you deploy models)

Without clean labels, models learn noise rather than signal.

2. Explainability Problem

When a neural network flags traffic as "87% fraud probability," what does that mean? Which signals triggered it? How can users adjust their tolerance?

Black-box models create black-box decisions. Users can't understand, verify, or customize.

3. Latency Problem

Fraud decisions happen at ad-serve time (milliseconds). Complex models add latency. Simple models are faster.

At scale, the difference between a 2 ms and a 50 ms fraud check matters enormously: the check sits inside the ad-serve critical path, so every millisecond comes straight out of the auction and render budget.

4. Adversarial Adaptation

Sophisticated fraud operations probe detection systems. They learn what triggers blocks. ML models trained on historical data struggle against novel attack patterns.

Simple rules are actually more robust to certain attack types because they're based on physical constraints (a human can't click in 5ms) rather than statistical patterns (fraudsters can manipulate).

Our Approach: Explicit Signals

Instead of ML models, we use explicit signals with configurable weights:

Physical Impossibility Signals

  • Click timing < 50ms (physically impossible for humans)
  • Interaction events outside screen bounds
  • Page engagement before page could load

These aren't statistical—they're physical constraints that can't be gamed without changing the fraud technique entirely.
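To make this concrete, here's a minimal sketch of what these checks look like. The event shape and field names are illustrative, not our exact schema; the 50 ms threshold is the one mentioned above.

```typescript
// Illustrative event shape -- not our production schema.
interface ClickEvent {
  pageLoadedAt: number; // ms since epoch
  clickedAt: number;    // ms since epoch
  x: number;            // click coordinates
  y: number;
  screenWidth: number;
  screenHeight: number;
}

function physicalImpossibilitySignals(e: ClickEvent): string[] {
  const signals: string[] = [];
  const dwellMs = e.clickedAt - e.pageLoadedAt;

  if (dwellMs < 0) {
    // Engagement recorded before the page could have loaded.
    signals.push("click_before_load");
  } else if (dwellMs < 50) {
    // No human perceives, aims, and clicks within 50 ms.
    signals.push("click_too_fast");
  }

  // Real pointer events land inside the viewport.
  if (e.x < 0 || e.y < 0 || e.x > e.screenWidth || e.y > e.screenHeight) {
    signals.push("click_outside_bounds");
  }
  return signals;
}
```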

Technical Environment Signals

  • WebDriver flag (Selenium/Puppeteer detection)
  • Headless browser indicators
  • Automation framework artifacts

These detect specific tools rather than inferring from patterns.
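A sketch of how these checks might run in the browser. The WebDriver flag and the HeadlessChrome user-agent marker are standard; the injected-globals list is a small illustrative sample, not the full set we check.

```typescript
function environmentSignals(): string[] {
  const signals: string[] = [];

  // Selenium/Puppeteer set this flag per the WebDriver spec.
  if (navigator.webdriver) signals.push("webdriver_flag");

  // Headless Chrome announces itself in the user agent string.
  if (/HeadlessChrome/.test(navigator.userAgent)) signals.push("headless_ua");

  // Automation frameworks often leave injected globals behind
  // (e.g. PhantomJS and Nightmare artifacts).
  for (const key of ["_phantom", "callPhantom", "__nightmare"]) {
    if (key in window) signals.push(`automation_artifact:${key}`);
  }
  return signals;
}
```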

Behavioral Heuristics

  • No mouse/keyboard/scroll events
  • Untrusted event objects
  • Missing expected browser capabilities

Simple rules with clear meaning.
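In code, these heuristics reduce to a few counters and one property check. A minimal sketch, with illustrative names:

```typescript
// Interaction counters accumulated by listeners registered at page load.
let mouseMoves = 0;
let keyPresses = 0;
let scrolls = 0;

document.addEventListener("mousemove", () => mouseMoves++);
document.addEventListener("keydown", () => keyPresses++);
document.addEventListener("scroll", () => scrolls++);

function behavioralSignals(click: MouseEvent): string[] {
  const signals: string[] = [];

  // Synthetic events dispatched from script have isTrusted === false.
  if (!click.isTrusted) signals.push("untrusted_event");

  // A session with zero interaction events before the click is suspect.
  if (mouseMoves + keyPresses + scrolls === 0) signals.push("no_interaction");

  return signals;
}
```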

The Benefit: User Control

Because every signal is explicit and weighted, users can:

  • Understand exactly why traffic was flagged
  • Adjust weights based on their tolerance
  • Disable signals that don't apply to their use case
  • Add their own rules via IP blacklists

Try doing that with a neural network. Here's roughly what that control surface looks like; the signal names, default weights, and the cap at 1.0 below are illustrative, not our production configuration.
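```typescript
type SignalWeights = Record<string, number>;

// Illustrative defaults -- users can override any of these.
const defaultWeights: SignalWeights = {
  click_too_fast: 0.9,
  webdriver_flag: 1.0,
  headless_ua: 0.8,
  untrusted_event: 0.9,
  no_interaction: 0.4,
};

function scoreTraffic(
  signals: string[],
  weights: SignalWeights = defaultWeights,
  blacklist: Set<string> = new Set(),
  ip?: string,
): { score: number; reasons: string[] } {
  // User-supplied IP blacklists short-circuit everything else.
  if (ip && blacklist.has(ip)) {
    return { score: 1, reasons: ["ip_blacklisted"] };
  }
  // Each fired signal contributes its weight; a weight of 0 disables it.
  const reasons = signals.filter((s) => (weights[s] ?? 0) > 0);
  const score = Math.min(1, reasons.reduce((sum, s) => sum + weights[s], 0));
  return { score, reasons };
}
```

The `reasons` array is the explainability story: every flagged request carries the exact rules that fired, and setting a weight to zero disables a signal outright.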

External ML Integration

For users who want ML-powered detection, we integrate external providers (IPQualityScore, HUMAN, etc.). They have the scale and labeled data to train effective models.

This separates concerns: we handle the platform, they handle specialized fraud ML. Users choose their preferred approach.
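The integration point is just a normalized score behind a shared interface. As a sketch: the interface below is hypothetical, and while the IPQualityScore URL shape follows their public docs, verify the details against current documentation before relying on it.

```typescript
interface FraudProvider {
  // Returns a normalized risk score in [0, 1].
  scoreIp(ip: string): Promise<number>;
}

class IPQualityScoreProvider implements FraudProvider {
  constructor(private apiKey: string) {}

  async scoreIp(ip: string): Promise<number> {
    const res = await fetch(
      `https://ipqualityscore.com/api/json/ip/${this.apiKey}/${ip}`,
    );
    const body = await res.json();
    // IPQS reports fraud_score on a 0-100 scale; normalize to [0, 1].
    return (body.fraud_score ?? 0) / 100;
  }
}
```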

The Tradeoff

Our approach misses some sophisticated fraud that state-of-the-art ML would catch (patterns too subtle for explicit rules). It also produces fewer false positives on legitimate edge cases, which ML models often flag incorrectly.

We chose simplicity, explainability, and user control over maximum detection accuracy. For a platform emphasizing transparency, that's the right tradeoff.
