Calibrating predictions differently for friendlies vs tournaments

Status: Shipped (Variant 4: per-tier Platt temperature scaling). The production calibrator uses a hybrid strategy: Platt scaling for the tournament tier, where isotonic regression collapses to the identity at n~70, and isotonic regression for friendlies/qualifiers, where it is more expressive at n~400+. Gate passed. See results below.

Full note: 3,433 words.
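The hybrid rule in the status line can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code in scripts/fit_ensemble_calibrator.py. It reads "Platt temperature scaling" as a single temperature T applied to the log-probabilities of three outcomes (an assumed home/draw/away layout), fitted by minimising mean negative log-likelihood with a golden-section search. The tier labels and every function name below are hypothetical.

```python
import math
from typing import List, Sequence


def apply_temperature(probs: Sequence[float], t: float) -> List[float]:
    """Divide log-probabilities by t, then renormalise with a stable softmax."""
    logits = [math.log(max(p, 1e-12)) / t for p in probs]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(rows: Sequence[Sequence[float]], outcomes: Sequence[int], t: float) -> float:
    """Mean negative log-likelihood of the observed outcome indices under temperature t."""
    loss = 0.0
    for probs, y in zip(rows, outcomes):
        loss -= math.log(max(apply_temperature(probs, t)[y], 1e-12))
    return loss / len(rows)


def fit_temperature(rows, outcomes, lo: float = 0.25, hi: float = 4.0, iters: int = 60) -> float:
    """Golden-section search for the NLL-minimising temperature on [lo, hi].

    NLL is convex in 1/t for this one-parameter family, hence unimodal in t.
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = mean_nll(rows, outcomes, c), mean_nll(rows, outcomes, d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = mean_nll(rows, outcomes, c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = mean_nll(rows, outcomes, d)
    return (a + b) / 2.0


def choose_method(tier: str) -> str:
    """Hypothetical dispatch mirroring the shipped split: one-parameter scaling
    for the small tournament tier, isotonic regression for the larger tiers."""
    return "temperature" if tier == "tournament" else "isotonic"
```

For example, on ten predictions of [0.9, 0.05, 0.05] where the favourite wins six times, the fitted temperature is ln 18 / ln 3 ≈ 2.63, which softens the favourite's probability to 0.6. A one-parameter fit like this stays well-posed at small n, which is one plausible reading of why the tournament tier uses it instead of a free-form isotonic curve.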

Summary

The shipping ensemble calibrator (scripts/fit_ensemble_calibrator.py) fits per-class isotonic regression curves to the uniform average of the three component models' outputs (Elo bracket Monte Carlo, Dixon-Coles, and Hierarchical Poisson MAP). The first cut reduced holdout ECE on the 365-day common-subset training pool from 4.62pp (uncalibrated) to 2.70pp with a single fit pooled across tiers (5-fold CV, n_train = 939; current artefact at data/wc2026/ensemble_calibrator.json).
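The pooled pipeline described above can be sketched in pure Python under several assumptions, none of which the summary confirms: each component emits a [home, draw, away] probability vector; "uniform-averaged" means an unweighted mean of the three vectors; each per-class isotonic curve is fitted one-vs-rest with the pool-adjacent-violators algorithm (PAVA); the three calibrated values are renormalised to sum to 1; and ECE is the top-label variant with 10 equal-width bins, reported in percentage points. All names are hypothetical.

```python
import bisect
from typing import List, Sequence


def uniform_average(*components: Sequence[float]) -> List[float]:
    """Unweighted mean of the component probability vectors."""
    return [sum(ps) / len(ps) for ps in zip(*components)]


class Isotonic:
    """Non-decreasing step function fitted with pool-adjacent-violators (PAVA)."""

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "Isotonic":
        blocks: List[list] = []  # each block: [sum_y, count, largest_x]
        for x, y in sorted(zip(xs, ys)):
            if blocks and blocks[-1][2] == x:  # tied x values share one block
                blocks[-1][0] += y
                blocks[-1][1] += 1
            else:
                blocks.append([y, 1, x])
            # Merge backwards while adjacent block means are out of order.
            while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
                s, n, x_hi = blocks.pop()
                blocks[-1][0] += s
                blocks[-1][1] += n
                blocks[-1][2] = x_hi
        self.x_hi = [b[2] for b in blocks]
        self.values = [b[0] / b[1] for b in blocks]
        return self

    def predict(self, x: float) -> float:
        # First block whose right edge is >= x; inputs past the last edge are clamped.
        i = bisect.bisect_left(self.x_hi, x)
        return self.values[min(i, len(self.values) - 1)]


def fit_per_class(probs, outcomes, n_classes: int = 3) -> List[Isotonic]:
    """One-vs-rest isotonic curve per outcome class."""
    return [
        Isotonic().fit([p[k] for p in probs], [1.0 if y == k else 0.0 for y in outcomes])
        for k in range(n_classes)
    ]


def calibrate(curves: Sequence[Isotonic], p: Sequence[float]) -> List[float]:
    """Map each class probability through its curve, then renormalise (assumed step)."""
    raw = [c.predict(pk) for c, pk in zip(curves, p)]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [1.0 / len(p)] * len(p)


def top_label_ece(probs, outcomes, n_bins: int = 10) -> float:
    """Top-label ECE in percentage points with equal-width confidence bins."""
    gaps = [0.0] * n_bins  # per bin: sum(correct) - sum(confidence)
    for p, y in zip(probs, outcomes):
        k = max(range(len(p)), key=lambda j: p[j])
        b = min(int(p[k] * n_bins), n_bins - 1)
        gaps[b] += (1.0 if k == y else 0.0) - p[k]
    return 100.0 * sum(abs(g) for g in gaps) / len(probs)
```

Fitting the curves on four folds and scoring top_label_ece on the fifth, then averaging across folds, is one way to produce a 5-fold CV figure like the one quoted above; the pipeline's actual ECE definition and fold scheme are not stated in this summary. Note that this sketch predicts a piecewise-constant curve, whereas scikit-learn's IsotonicRegression interpolates linearly between fitted points.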

A subsequent tier-aware refit (three sets…
