Research
Notes & ablations
Decision logs from model building. Each note documents a hypothesis, the backtest setup, the result, and whether the change shipped. Negative results are kept here too: the decision not to ship is just as important as the decision to ship, and a public record guards against accidentally re-running the same ablation.
- Not shipped · 3 June 2026
A within-match chase layer "passes" the headline gate — and the placebo proves it shouldn't
The feasibility probe found that, after controlling for team strength, only
- Shipped · 31 May 2026
Testing our approach on the Champions League final
The `/test/live/<slug>/` route renders the live-tracker pipeline
- Not shipped · 29 May 2026
Is composite *coverage* the lever for the player-strength offset? (No)
player-composite's match coverage — whether honestly (point-in-time WC
- Production xG path WIRED (2026-05-29): `auto-refit.yml` fits DC with `--use-xg` and refits the calibrator on the xG-enabled ensemble; HP excluded (fails the gate's ECE half). The xG-enabled artefacts (`dixon_coles.json` / `ensemble_calibrator.json` / `data.json`) regenerate on the first auto-refit after merge (not hand-committed; see "Production wiring" for why). Single-provider corpus (314 StatsBomb + 28 residual Opta Copa-2021 = 342 rows). Gate clears for DC + Ensemble on both evaluation slices · 29 May 2026
Back-filling international xG from StatsBomb open data
The model's `--use-xg` path fits `round(xG)` as the per-match Poisson response in
- Not shipped · 29 May 2026
Does a player-form (momentum) offset improve match forecasts? (No)
player-form differential offset `Δ = α·(form_home − form_away)` does
- Not shipped · 29 May 2026
Can we fit the player-strength coefficient instead of hand-setting it? (No)
α = 0.05 offset (Model 16) beats a per-fold fitted α on median Brier.
- Not shipped · 27 May 2026
Anytime-scorer `start_prob` v2 — predicted-XI layer (default-off)
Model 5 (`scripts/build_anytime_scorer.py`) produces `P(player scores ≥ 1 across the WC tournament)`. The headline depends on `E[minutes]`, which is derived from `start_prob` (the per-match starter likelihood). The v1 chain was:
- Not shipped · 27 May 2026
Do teams try harder in must-win games? (No, actually)
Football economics literature (Brams & Ismail 2018; Apesteguia & Palacios-Huerta 2010 on tournament-incentive distortions) reports that match outcomes in the final round of group-stage tournaments deviate from baseline expectations when the
- Shipped · 27 May 2026
Hierarchical Poisson — full PyMC NUTS posterior
* Fit posterior: `scripts/fit_hp_posterior.py`
- Not shipped · 27 May 2026
Letting team ratings drift over time (didn't improve predictions)
Per the design note (variant a, "EMA on (α_t, β_t)"): each team's attack/defence parameters should EVOLVE through time rather than absorb every era's matches into a single stationary compromise. Refit DC at K snapshot timestamps (= the 8 qu
- Shipped · 25 May 2026
Are Premier League players really better? Cross-league strength adjustments
α = 0.05 unchanged.
- Shipped · 25 May 2026
Does your starting goalkeeper change your defence? (Yes)
The starting-keeper rating is informative beyond what the team-level
- Shipped · 24 May 2026
Predicting goalscorers: breaking down shot volume and shot quality
- `scripts/build_ratings_player.py` (now splices `big5_player_shooting.parquet` into `ratings_player.csv`)
- Not shipped · 24 May 2026
Do some playing styles beat others? (Not enough to measure)
- `scripts/build_style_matchup_training.py` (per-match training join)
- Not shipped · 23 May 2026
Retuning the models for tournament football — what changed
PR #310 documented that all four models in the ensemble are ~7% worse on tournament matches than on the all-matches average. The natural follow-up is to refit the predict-time knobs on a tournament-only training slice and serve tournament-v
- Shipped · 22 May 2026
How well do the models predict tournaments specifically?
> **Update (2026-06-01) — superseded by a leakage-free measurement.** The ~0.545 tournament Brier below comes from a single-fold backtest that composes its ensemble from the *current* Elo snapshot rather than each team's rating as it stood
- Not shipped · 21 May 2026
Does extra rest between matches help? (Not measurably)
Sports-science literature reports a measurable effect of recovery time on football performance: better-rested teams score slightly more goals than fatigued ones. The expected magnitude is small but consistent across studies (Mohr et al. 201
- Shipped
Calibrating predictions differently for friendlies vs tournaments
- `scripts/fit_ensemble_calibrator.py` (current implementation)
- Complete — A1 + B1 run on the 8-walk / 90-day harness (snapshot 2026-05-28). Verdict: **neither ships.** B1 fails its gate outright; A1's gate "passes" but on an extremization artefact, not a keeper-skill signal. See the Results section at the end
Can international-tournament StatsBomb signals beat the club-derived baseline?
PR #525 + PR #532 produced two new per-team signals extracted from StatsBomb open event data across WC 2018/2022, Euro 2020/2024, Copa America 2024, AFCON 2023:
- Design only. No code written, no fit run, no decision taken
Can team strength change mid-season? Design for a time-varying model
The shipping Dixon-Coles fit (`scripts/fit_dixon_coles.py`) gives every
- Design only. No code written, no fit run, no decision taken
Can we model the game *script*? Design for a within-match game-state layer
Our forecasting stack — ClubElo / FIFA-Elo + Dixon-Coles + Hierarchical
Post-processing parameter sweep
Walk-forward backtest (8 folds x 90 days, n=1,844 matches) over four
- Feasibility probe complete. The design note's stated risk was wrong; a different risk is binding. Qualified result — read the decision gate at the bottom
Within-match game-state: the corpus is ample, the confounding is the problem
The design note proposed a within-match game-state layer — a leading team