Research note

Back-filling international xG from StatsBomb open data

Status: Production xG path WIRED (2026-05-29). `auto-refit.yml` fits DC with `--use-xg` and refits the calibrator on the xG-enabled ensemble; HP excluded (fails the gate's ECE half). The xG-enabled artefacts (`dixon_coles.json` / `ensemble_calibrator.json` / `data.json`) regenerate on the first auto-refit after merge (not hand-committed; see "Production wiring" for why). Single-provider corpus (314 StatsBomb + 28 residual Opta Copa-2021 = 342 rows). Gate clears for DC + Ensemble on both evaluation slices.

Backtest date: 29 May 2026

Topline

The model's `--use-xg` path fits `round(xG)` as the per-match Poisson response in Dixon-Coles and Hierarchical Poisson, falling back to realised goals where xG is absent. It was structurally bounded by a tiny corpus: the JaseZiv FBref mirror that `scripts/pull_intl_xg.py` reads carries Opta xG for only 143 tournament matches, namely WC 2018 (64), Euro 2020 (51) and Copa 2021 (28). WC 2022 is present-but-null there; Euro 2024 / Copa 2024 / AFCON 2023 are absent. With 143 xG-bearing matches out of ~49…
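The response rule described above can be sketched in a few lines. This is an illustrative sketch only, not the project's code: the function name `poisson_response` and the `MIRROR_XG_COVERAGE` dictionary are hypothetical, and the rounding mode is an assumption (Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from what the production fit does).

```python
import math
from typing import Optional

# Per-tournament counts of xG-bearing matches in the mirror, as quoted in the note.
MIRROR_XG_COVERAGE = {"WC 2018": 64, "Euro 2020": 51, "Copa 2021": 28}


def poisson_response(goals: int, xg: Optional[float]) -> int:
    """Integer response for a Poisson fit (hypothetical helper).

    Uses rounded xG when it is available, otherwise falls back to the
    realised goal count. A missing value may arrive as None or NaN.
    """
    if xg is None or math.isnan(xg):
        return goals  # fallback: realised goals where xG is absent
    # Assumption: built-in round(), which rounds halves to even.
    return round(xg)


if __name__ == "__main__":
    print(poisson_response(goals=0, xg=1.7))   # xG present: rounded xG is used
    print(poisson_response(goals=2, xg=None))  # xG absent: realised goals are used
    print(sum(MIRROR_XG_COVERAGE.values()))    # total xG-bearing matches in the mirror
```

Rounding discards the fractional part of xG, so a side with 0.4 xG and a side with 0.0 xG produce the same response; that is a consequence of keeping an integer-valued Poisson response rather than a property of this sketch.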
