Calibrating predictions differently for friendlies vs tournaments

Status: Shipped (Variant 4: per-tier Platt temperature scaling). The production calibrator uses a hybrid strategy: Platt for the tournament tier (where isotonic collapses to identity at n~70) and isotonic for friendlies/qualifiers (where it is more expressive at n~400+). Gate passed. See results below.
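The hybrid strategy in the status line can be illustrated with a minimal sketch. It is not the production code: the names `fit_platt`, `apply_platt`, `choose_method` and the cutoff `MIN_N_FOR_ISOTONIC = 200` are hypothetical, and the sketch treats a single outcome class as a binary problem and assumes the two-parameter Platt form (slope and intercept on the logit). The note only says that isotonic is unhelpful at n~70 and useful at n~400+, so the cutoff is a placeholder between those two points.

```python
import math

def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def fit_platt(probs, labels, lr=0.1, steps=2000):
    """Fit q = sigmoid(a * logit(p) + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0  # start at the identity map
    zs = [logit(p) for p in probs]
    n = len(zs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for z, y in zip(zs, labels):
            q = 1.0 / (1.0 + math.exp(-(a * z + b)))
            grad_a += (q - y) * z
            grad_b += q - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def apply_platt(p, a, b):
    """Map a raw probability through the fitted Platt curve."""
    return 1.0 / (1.0 + math.exp(-(a * logit(p) + b)))

# Hypothetical cutoff; the note only gives n~70 (too small) and n~400+ (fine).
MIN_N_FOR_ISOTONIC = 200

def choose_method(n_samples, min_n=MIN_N_FOR_ISOTONIC):
    """Pick the calibrator family for a tier based on its sample size."""
    return "isotonic" if n_samples >= min_n else "platt"
```

With only two parameters, the Platt fit cannot overfit a small tier the way a free-form monotone curve can, which is one plausible reading of why it was preferred for the tournament tier.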

Summary

The shipping ensemble calibrator (scripts/fit_ensemble_calibrator.py) fits per-class isotonic regression curves on the uniform-averaged output of three components (Elo bracket MC + Dixon-Coles + Hierarchical Poisson MAP). The first cut reduced holdout ECE on the 365-day common-subset training pool from 4.62pp uncalibrated to 2.70pp under the pooled-across-tiers fit (5-fold CV, n_train = 939, current artefact at data/wc2026/ensemble_calibrator.json).
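The pipeline described above (uniform averaging, per-class isotonic curves, ECE) can be sketched in plain Python. This is an illustration under stated assumptions, not the contents of scripts/fit_ensemble_calibrator.py: all function names are hypothetical, the isotonic fit is a step function from pool-adjacent-violators rather than an interpolated curve, the renormalisation step is assumed, and ECE is computed per class with ten equal-width bins, which may differ from the definition used in the note.

```python
from bisect import bisect_left

def average_components(*components):
    """Uniformly average several (win, draw, loss) probability triples."""
    k = len(components)
    return tuple(sum(c[i] for c in components) / k for i in range(3))

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing step function."""
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the last one.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values

def apply_isotonic(model, p):
    """Look up the calibrated value for raw probability p."""
    uppers, values = model
    return values[min(bisect_left(uppers, p), len(values) - 1)]

def calibrate(models, triple):
    """Apply one isotonic model per class, then renormalise to sum to 1."""
    raw = [apply_isotonic(m, p) for m, p in zip(models, triple)]
    total = sum(raw)
    return tuple(r / total for r in raw) if total > 0 else tuple(triple)

def ece(probs, hits, n_bins=10):
    """Expected calibration error: count-weighted |mean confidence - hit rate|."""
    bins = [[0.0, 0.0] for _ in range(n_bins)]  # [sum of probs, sum of hits]
    for p, h in zip(probs, hits):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i][0] += p
        bins[i][1] += h
    return sum(abs(conf - hit) for conf, hit in bins) / len(probs)
```

Multiplying the returned ECE by 100 gives percentage points, the unit used for the 4.62pp and 2.70pp figures.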

A subsequent tier-aware refit (three sets…

