A top-five European retail bank was losing an estimated EUR 47 million annually to transaction fraud. Its existing rule-based detection system, a set of 2,400 handcrafted rules maintained by a team of 8 analysts, caught obvious patterns but consistently missed sophisticated attacks, particularly synthetic identity fraud, in which criminals combine real and fabricated information to create new, seemingly legitimate identities.

Why rules weren't enough

The rule-based system had three fundamental limitations:

Static thresholds — Rules like "flag transactions over EUR 10,000" were trivially circumvented by structuring payments just below the limit
No behavioral context — Rules evaluated each transaction in isolation, missing patterns that only emerge over time (e.g., a gradual ramp-up of transaction amounts over weeks)
Maintenance burden — Each new fraud pattern required manual rule creation and testing, with a 6–8 week lead time from detection to deployment

By the time a new rule was deployed, the fraudsters had already moved on to a different technique. The bank was always fighting the last war.
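
To make the first two limitations concrete, here is a minimal Python sketch of how structured payments slip past a static per-transaction threshold while a cumulative check over a time window still catches them. The EUR 10,000 limit is the one quoted above; the 24-hour window, the cumulative limit, and the names `static_rule` and `RollingSumRule` are hypothetical and chosen only for illustration. This is not the bank's rule engine.

```python
from collections import deque
from datetime import datetime, timedelta

STATIC_LIMIT_EUR = 10_000        # the static rule quoted above
WINDOW = timedelta(hours=24)     # hypothetical look-back window
WINDOW_LIMIT_EUR = 10_000        # hypothetical cumulative limit


def static_rule(amount):
    """Flag a single transaction strictly above the static limit."""
    return amount > STATIC_LIMIT_EUR


class RollingSumRule:
    """Flag when one account's spend inside the window exceeds the limit."""

    def __init__(self):
        self._events = deque()   # (timestamp, amount) pairs, oldest first
        self._total = 0.0

    def check(self, ts, amount):
        self._events.append((ts, amount))
        self._total += amount
        # Evict transactions that have fallen out of the look-back window.
        while ts - self._events[0][0] > WINDOW:
            _, old_amount = self._events.popleft()
            self._total -= old_amount
        return self._total > WINDOW_LIMIT_EUR


start = datetime(2024, 1, 1, 9, 0)
# Three payments of EUR 9,500, one hour apart: each is below the static limit.
payments = [(start + timedelta(hours=i), 9_500.0) for i in range(3)]

rolling = RollingSumRule()
static_flags = [static_rule(amount) for _, amount in payments]
rolling_flags = [rolling.check(ts, amount) for ts, amount in payments]
```

In this sketch the static rule never fires, while the windowed rule fires from the second payment onward, because it keeps state across transactions instead of judging each one in isolation.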

The streaming architecture

We replaced the batch-oriented rule engine with a real-time streaming pipeline built on Apache Kafka and Apache Flink:

pipeline:
  ingestion:
    source: kafka
    topic: raw-transactions
    throughput: 12,000 events/sec
    format: avro

  feature_engineering:
    engine: flink
    windows:
      - type: sliding
        size: 1h
        slide: 5min
        features: [tx_count, tx_sum, unique_merchants, avg_amount]
      - type: session
        gap: 30min
        features: [session_duration, session_tx_count, geo_spread]
      - type: tumbling
        size: 24h
        features: [daily_volume, new_merchant_ratio, cross_border_ratio]

  scoring:
    model: gradient_boosted_ensemble
    latency_budget: 50ms
    fallback: rule_engine_v2

  action:
    - threshold: 0.92 -> block_transaction
    - threshold: 0.75 -> flag_for_review
    - threshold: 0.50 -> enhanced_monitoring
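
The tiered thresholds in the action section could be applied to a model score roughly as follows. This is a sketch under two assumptions that the configuration does not state: each threshold is treated as an inclusive lower bound, and scores below 0.50 fall through to a hypothetical `allow` outcome. The function name `choose_action` is also illustrative.

```python
# Thresholds copied from the action section of the config, highest first.
ACTIONS = [
    (0.92, "block_transaction"),
    (0.75, "flag_for_review"),
    (0.50, "enhanced_monitoring"),
]


def choose_action(score):
    """Return the action for the highest threshold the score reaches."""
    for threshold, action in ACTIONS:
        if score >= threshold:
            return action
    return "allow"  # hypothetical default, not part of the config


decisions = [choose_action(s) for s in (0.95, 0.80, 0.50, 0.10)]
```

Keeping the tiers in one ordered table means a threshold can be tuned without touching the decision logic.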

The feature engineering layer is the core of the design. For each incoming transaction, Flink computes 147 features across multiple time windows, from 5-minute micro-patterns to 30-day behavioral baselines; the configuration above lists only ten of them. This gives the model temporal context, such as how a customer's current activity compares with their own history, that rules evaluating one transaction at a time cannot see.
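
As a rough illustration of what one of these windows computes, the sketch below derives the four features of the 1-hour window (`tx_count`, `tx_sum`, `unique_merchants`, `avg_amount`) for a single account in plain Python. It is a stand-in, not Flink code: it recomputes the trailing hour on every event instead of emitting on a 5-minute slide, it assumes events arrive in timestamp order, and the `Tx` and `TrailingWindow` names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Tx:
    ts: datetime
    amount: float
    merchant: str


class TrailingWindow:
    """Keeps one account's transactions from the trailing window."""

    def __init__(self, size=timedelta(hours=1)):
        self.size = size
        self._txs = deque()  # oldest transaction first

    def add(self, tx):
        self._txs.append(tx)
        # Drop transactions older than the window (assumes in-order events).
        while tx.ts - self._txs[0].ts >= self.size:
            self._txs.popleft()
        amounts = [t.amount for t in self._txs]
        tx_sum = sum(amounts)
        return {
            "tx_count": len(amounts),
            "tx_sum": tx_sum,
            "unique_merchants": len({t.merchant for t in self._txs}),
            "avg_amount": tx_sum / len(amounts),
        }


t0 = datetime(2024, 1, 1, 12, 0)
window = TrailingWindow()
window.add(Tx(t0, 20.0, "grocer"))
window.add(Tx(t0 + timedelta(minutes=30), 40.0, "fuel"))
# By minute 70 the first transaction has left the 1-hour window.
features = window.add(Tx(t0 + timedelta(minutes=70), 60.0, "grocer"))
```

A production job would keep this state per account key inside the stream processor and would also have to handle late and out-of-order events, which this sketch ignores.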

The model

We use a gradient-boosted ensemble (XGBoost) rather than deep learning for two reasons: interpretability requirements from the bank's compliance team, and the strict 50ms latency budget that rules out heavier architectures.

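import xgboost as xgb

# X_train, y_train, X_val and y_val are assumed to be prepared elsewhere.
model = xgb.XGBClassifier(
    n_estimators=350,
    max_depth=8,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.7,
    scale_pos_weight=580,  # severe class imbalance: 1 fraud per 580 legit
    tree_method="hist",
    eval_metric="aucpr",  # area under the precision-recall curve
    # Set on the estimator: XGBoost 2.0 removed this argument from fit().
    early_stopping_rounds=20,
)

# Early stopping monitors aucpr on the validation set passed as eval_set.
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
)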

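The scale_pos_weight parameter is critical: fraud is extremely rare (0.17% of transactions), and without explicit handling of the class imbalance the model could predict "legitimate" for everything and still achieve 99.83% accuracy while catching no fraud at all. For the same reason, training is evaluated on the area under the precision-recall curve (aucpr) rather than on accuracy.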
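
The arithmetic behind these numbers can be checked with a short sketch. The label counts below are hypothetical and chosen only to match the 1:580 ratio quoted in the code comment; the weight follows the commonly used heuristic of dividing the number of negative examples by the number of positive ones.

```python
def scale_pos_weight(n_negative, n_positive):
    """Common heuristic: ratio of negative to positive training examples."""
    return n_negative / n_positive


def majority_baseline_accuracy(n_negative, n_positive):
    """Accuracy of a model that labels every transaction as legitimate."""
    return n_negative / (n_negative + n_positive)


# Hypothetical label counts in the 1:580 ratio described above.
n_fraud = 1_000
n_legit = 580_000

weight = scale_pos_weight(n_legit, n_fraud)
baseline_accuracy = majority_baseline_accuracy(n_legit, n_fraud)
baseline_recall = 0 / n_fraud  # the all-legitimate model catches no fraud
```

With these counts the weight comes out at 580 and the do-nothing baseline scores about 99.83% accuracy with zero recall, which is why accuracy alone says little about a fraud model.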

Catching synthetic identities

The breakthrough came from a set of graph-based features we engineered from the bank's transaction network. Synthetic identities often share characteristics that are invisible at the individual level but emerge when you look at the network:

Multiple new accounts sharing the same device fingerprint or IP address
Transaction patterns that mirror each other too closely (automated behavior)
Sudden appearance of a "well-established" credit history (fabricated through data manipulation)

We computed these features using a lightweight graph analysis running alongside the main pipeline, updating relationship scores every 5 minutes.
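
One of the signals listed above, several new accounts sharing a device fingerprint, can be sketched without a graph library by grouping accounts per device. The record layout, the 30-day cutoff for a "new" account, and the function name `shared_device_counts` are assumptions made for this example; the source does not describe how the production graph analysis is implemented.

```python
from collections import defaultdict


def shared_device_counts(accounts, max_age_days=30):
    """For each new account, count the other new accounts on its device.

    accounts: list of (account_id, device_fingerprint, account_age_days).
    """
    new_accounts_by_device = defaultdict(set)
    for account_id, device, age_days in accounts:
        if age_days <= max_age_days:
            new_accounts_by_device[device].add(account_id)

    return {
        account_id: len(new_accounts_by_device[device]) - 1
        for account_id, device, age_days in accounts
        if age_days <= max_age_days
    }


accounts = [
    ("A1", "dev-1", 3),
    ("A2", "dev-1", 5),
    ("A3", "dev-1", 400),  # long-standing account, ignored by the feature
    ("A4", "dev-2", 10),
]
counts = shared_device_counts(accounts)
```

A count above zero does not prove fraud (family members share devices), so a value like this is better used as one model input than as a blocking rule.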

Results

After a 6-month deployment with parallel running (new system alongside the old one for validation):

Fraud detection rate: 89.4% (up from 62.1% with rules alone)
False positive rate: 0.031% (down from 0.089% — fewer legitimate customers inconvenienced)
Average detection latency: 38ms (vs. 12–36 hours with the batch system)
Estimated annual savings: EUR 31 million in prevented fraud losses
Rule maintenance team: reduced from 8 analysts to 2, with the others redeployed to strategic fraud intelligence

The system processes over 1 billion transactions per month and continues to improve as the model is retrained weekly with confirmed fraud labels.
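
To put the false positive rates in absolute terms, the sketch below converts them into monthly counts at the stated volume. It rests on simplifying assumptions that the source does not confirm: volume is taken as exactly 1 billion transactions in a 30-day month, and each rate is read as a share of all transactions.

```python
MONTHLY_TX = 1_000_000_000           # "over 1 billion" taken as exactly 1e9
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumes a 30-day month


def flagged_per_month(rate_per_100k, monthly_tx=MONTHLY_TX):
    """Convert a rate given per 100,000 transactions into a monthly count."""
    return monthly_tx * rate_per_100k // 100_000


old_false_positives = flagged_per_month(89)  # 0.089% with rules alone
new_false_positives = flagged_per_month(31)  # 0.031% with the new system
avg_events_per_sec = MONTHLY_TX / SECONDS_PER_MONTH
```

Under these assumptions the drop corresponds to roughly 890,000 versus 310,000 wrongly flagged transactions per month. The average load works out to a few hundred events per second, which suggests the 12,000 events/sec in the pipeline configuration describes peak capacity rather than typical traffic.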