How PopTrade Detects Ad Fraud: A Technical Overview Without the ML Hype
Every ad platform claims to have world-class fraud detection. Most are exaggerating. This article explains how PopTrade's fraud detection actually works: the signals we use, our scoring approach, and why we chose practical methods over impressive-sounding but ineffective ML buzzwords.
The Fraud Detection Reality Check
Let's be honest about what fraud detection can and cannot do:
What We Can Detect
- Known bot signatures and automation tools
- Datacenter and proxy traffic
- Behavioral anomalies that deviate significantly from human patterns
- Technical inconsistencies in device/browser fingerprints
- Geographic mismatches and VPN usage
What Nobody Can Reliably Detect
- Sophisticated residential proxy traffic
- Human click farms
- Motivated fraudsters who study detection methods
- Zero-day fraud techniques
Anyone claiming 100% fraud detection is lying. Our goal is catching the obvious fraud cheaply and making sophisticated fraud expensive enough to be unprofitable.
Signal-Based Scoring
We use a weighted signal approach rather than black-box ML:
Critical Signals (High Weight)
WebDriver Detection
Checks whether navigator.webdriver is true, which indicates Selenium, Puppeteer, or similar automation:
- Weight: 0.95 (near-certain fraud)
- Why: No legitimate user has this flag set
Automation Markers
Presence of window.callPhantom, window._phantom, or similar:
- Weight: 0.90
- Why: These objects are injected only by automated browser environments
Headless Browser Signatures
User-Agent containing HeadlessChrome or PhantomJS:
- Weight: 0.85
- Why: Explicit admission of non-human browser
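Taken together, these checks are a few lines of client-side code. Here is a minimal sketch in TypeScript; the function and signal names are illustrative, not PopTrade's actual tag code, but the weights mirror the values above:

```typescript
interface Signal {
  name: string;
  weight: number;
}

// Collect the three critical automation signals from the browser.
function collectCriticalSignals(): Signal[] {
  const signals: Signal[] = [];

  // Flag set by Selenium, Puppeteer, Playwright, and similar drivers.
  if (navigator.webdriver === true) {
    signals.push({ name: "webdriver", weight: 0.95 });
  }

  // Objects injected by PhantomJS-style automation environments.
  const w = window as any;
  if (w.callPhantom !== undefined || w._phantom !== undefined) {
    signals.push({ name: "automation_marker", weight: 0.90 });
  }

  // Headless browsers that announce themselves in the User-Agent.
  if (/HeadlessChrome|PhantomJS/.test(navigator.userAgent)) {
    signals.push({ name: "headless_ua", weight: 0.85 });
  }

  return signals;
}
```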
Behavioral Signals (Medium Weight)
Time-to-Event (TTE)
How quickly did a click happen after page load?
- Under 50ms: Weight 0.65 (physically impossible for humans)
- Under 500ms: Weight 0.40 (suspicious but possible)
- Why: Bots click instantly; humans need time to perceive and react
No User Interactions
Zero mouse movements, keyboard events, or scroll events:
- Weight: 0.60
- Why: Real users move mouse, scroll, interact with page
Untrusted Events
Click events where event.isTrusted is false:
- Weight: 0.80
- Why: Programmatically generated clicks are flagged as untrusted by the browser
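A sketch of how a page tag might gather these behavioral signals, assuming the script runs at page load (names are again illustrative, reusing the Signal shape from the sketch above):

```typescript
type Signal = { name: string; weight: number };

let interactionCount = 0;

// Count human-style interactions that happen before any click.
for (const type of ["mousemove", "keydown", "scroll", "touchstart"]) {
  window.addEventListener(type, () => { interactionCount++; }, { passive: true });
}

// Called from the ad click handler.
function collectBehavioralSignals(event: MouseEvent): Signal[] {
  const signals: Signal[] = [];
  const tte = performance.now(); // ms since navigation start: our time-to-event

  if (tte < 50) {
    signals.push({ name: "tte_under_50ms", weight: 0.65 });
  } else if (tte < 500) {
    signals.push({ name: "tte_under_500ms", weight: 0.40 });
  }

  if (interactionCount === 0) {
    signals.push({ name: "no_interactions", weight: 0.60 });
  }

  // Synthetic clicks (element.click(), dispatchEvent) carry isTrusted: false.
  if (!event.isTrusted) {
    signals.push({ name: "untrusted_event", weight: 0.80 });
  }

  return signals;
}
```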
Technical Signals (Variable Weight)
Plugin Count
navigator.plugins.length equals zero on desktop:
- Weight: 0.30
- Why: Most real browsers have some plugins; headless often has none
Screen Anomalies
Screen dimensions that don't match any real device:
- Weight: 0.35
- Why: Bots often use arbitrary or default screen sizes
Geographic Mismatch
Declared location doesn't match IP geolocation:
- Weight: 0.40-0.70 depending on severity
- Why: VPN/proxy users or spoofed location data
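The first two of these translate to straightforward client-side checks; the geographic comparison happens server-side against an IP geolocation database. A sketch (the known-width list is a hypothetical stand-in for a real device-dimension database):

```typescript
type Signal = { name: string; weight: number };

// Hypothetical allowlist of common device widths; a real check would
// consult a much larger device database.
const KNOWN_WIDTHS = new Set([360, 375, 390, 414, 768, 1280, 1366, 1440, 1536, 1920, 2560]);

function collectTechnicalSignals(): Signal[] {
  const signals: Signal[] = [];

  // Headless environments commonly report zero plugins on desktop.
  const isMobile = /Mobi|Android/i.test(navigator.userAgent);
  if (!isMobile && navigator.plugins.length === 0) {
    signals.push({ name: "zero_plugins", weight: 0.30 });
  }

  // Dimensions that match no known device suggest a synthetic viewport.
  if (!KNOWN_WIDTHS.has(screen.width)) {
    signals.push({ name: "screen_anomaly", weight: 0.35 });
  }

  return signals;
}
```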
Score Aggregation
Individual signals combine into a final fraud score:
final_score = 1 - product(1 - signal_weight for each triggered signal)
This means:
- Single weak signal: low score
- Multiple weak signals: elevated score
- Any critical signal: high score
- Multiple critical signals: near-certain fraud
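In code, the aggregation is a one-liner. A sketch, with worked examples using the weights listed earlier:

```typescript
type Signal = { name: string; weight: number };

// final = 1 - product(1 - w_i): each triggered signal removes a share of
// the remaining "probably legitimate" probability mass, so weak signals
// compound and any single critical signal dominates.
function aggregate(signals: Signal[]): number {
  return 1 - signals.reduce((acc, s) => acc * (1 - s.weight), 1);
}

// Single weak signal:    aggregate([{ name: "zero_plugins", weight: 0.30 }]) -> 0.30
// Two weak signals:      1 - (0.70 * 0.65)                                   -> 0.545
// One critical signal:   1 - (1 - 0.95)                                      -> 0.95
// Two critical signals:  1 - (0.05 * 0.10)                                   -> 0.995
```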
The Soft Reject System
Not everything is black and white. We have three outcomes:
Accept (Score below threshold)
Traffic looks legitimate. Impression served, advertiser charged, publisher paid.
Hard Reject (Score above high threshold)
Traffic is almost certainly fraud. Request blocked entirely. No impression, no charge, no payment.
Soft Reject (Score in middle zone)
Traffic is suspicious but not certain fraud. This is where it gets interesting:
- Impression may still serve (configurable by buyer)
- Traffic routed to fallback if configured
- Logged for analysis but not automatically blocked
- Publisher not penalized for borderline cases
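As a sketch, the three-way decision is a pair of threshold comparisons. The values below are hypothetical; in practice buyers configure their own:

```typescript
type Decision = "accept" | "soft_reject" | "hard_reject";

// Hypothetical defaults; buyers tune these per campaign.
const SOFT_THRESHOLD = 0.5;
const HARD_THRESHOLD = 0.85;

function decide(score: number): Decision {
  if (score >= HARD_THRESHOLD) return "hard_reject"; // block: no impression, charge, or payment
  if (score >= SOFT_THRESHOLD) return "soft_reject"; // log; serve or fall back per buyer config
  return "accept";                                   // serve, charge, pay
}
```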
Why Soft Reject Matters
The advertising ecosystem has a problem: aggressive fraud detection creates false positives that harm legitimate publishers. Soft reject addresses this:
For Publishers
Borderline traffic isn't immediately rejected. If your users happen to use VPNs for privacy, they might still see ads (depending on buyer settings) rather than being blanket-blocked.
For Advertisers
You control the threshold. Conservative buyers can hard-reject anything suspicious. Others can accept soft-reject traffic at lower bids.
For the Platform
Fewer disputes. When we say traffic is fraud, we mean it. Soft rejects handle the gray area without false accusations.
External Provider Integration
Our built-in detection catches common fraud. For sophisticated threats, we integrate external providers:
Available Integrations
- IPQualityScore - IP reputation, proxy detection, device fingerprinting
- Pixalate - MRC-accredited invalid traffic detection
- Fraudlogix - Real-time bot and click fraud detection
- HUMAN (White Ops) - Sophisticated bot detection
How Integration Works
- External score fetched in parallel with internal checks
- Results cached for 5-15 minutes to reduce API calls
- External score weighted into final decision
- Buyers choose which providers to enable
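A sketch of that flow, with a short-lived in-memory cache and the external result folded in as one more weighted signal (the provider client is a hypothetical stand-in for any of the integrations above):

```typescript
type Signal = { name: string; weight: number };

// Cache external scores per IP to stay within the 5-15 minute window.
const cache = new Map<string, { score: number; expiresAt: number }>();
const CACHE_TTL_MS = 10 * 60 * 1000;

async function externalScore(
  ip: string,
  fetchScore: (ip: string) => Promise<number>, // hypothetical provider client
): Promise<number> {
  const hit = cache.get(ip);
  if (hit && hit.expiresAt > Date.now()) return hit.score;
  const score = await fetchScore(ip);
  cache.set(ip, { score, expiresAt: Date.now() + CACHE_TTL_MS });
  return score;
}

// Internal checks and the external lookup run in parallel; the external
// score then enters the same 1 - product(1 - w) aggregation as a signal.
async function scoreRequest(
  ip: string,
  internalSignals: Promise<Signal[]>,
  fetchScore: (ip: string) => Promise<number>,
): Promise<number> {
  const [signals, ext] = await Promise.all([internalSignals, externalScore(ip, fetchScore)]);
  const all = [...signals, { name: "external_provider", weight: ext }];
  return 1 - all.reduce((acc, s) => acc * (1 - s.weight), 1);
}
```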
Why We Don't Use Deep Learning
The honest answer: it doesn't work well for our use case.
ML Requires Massive Labeled Data
You need millions of confirmed fraud/not-fraud examples. Labeling is expensive and often wrong.
Fraud Evolves Faster Than Models
By the time you train a model on last month's fraud patterns, fraudsters have moved on.
Explainability Matters
When we reject traffic, we need to explain why. "The black box said no" isn't acceptable for dispute resolution.
Signal-Based is Faster
ML inference adds latency. Signal checks take microseconds. In RTB, speed matters.
Our approach: use interpretable signals that catch known fraud patterns, integrate external providers for sophisticated threats, and stay humble about what we can't detect.