Research
Notes and ablations
Decision records from building the model. Each note documents a hypothesis, the backtest configuration, the result, and whether the adjustment was published. Negative results are kept here too: the decision not to publish is as important as the decision to publish, and a public record protects against revisiting the same ablation by accident.
- Not published · 3 June 2026
A within-match chase layer "passes" the headline gate — and the placebo proves it shouldn't
The feasibility probe found that, after controlling for team strength, only
- Published · 31 May 2026
Testing our approach on the Champions League final
The `/test/live/<slug>/` route renders the live-tracker pipeline
- Not published · 29 May 2026
Is composite *coverage* the lever for the player-strength offset? (No)
player-composite's match coverage — whether honestly (point-in-time WC
- Production xG path WIRED (2026-05-29): `auto-refit.yml` fits DC with `--use-xg` and refits the calibrator on the xG-enabled ensemble; HP excluded (fails the gate's ECE half). The xG-enabled artefacts (`dixon_coles.json` / `ensemble_calibrator.json` / `data.json`) regenerate on the first auto-refit after merge (not hand-committed; see "Production wiring" for why). Single-provider corpus (314 StatsBomb + 28 residual Opta Copa-2021 = 342 rows). Gate clears for DC + Ensemble on both evaluation slices · 29 May 2026
Back-filling international xG from StatsBomb open data
The model's `--use-xg` path fits `round(xG)` as the per-match Poisson response in
- Not published · 29 May 2026
Does a player-form (momentum) offset improve match forecasts? (No)
player-form differential offset `Δ = α·(form_home − form_away)` does
- Not published · 29 May 2026
Can we fit the player-strength coefficient instead of hand-setting it? (No)
α = 0.05 offset (Model 16) beats a per-fold fitted α on median Brier.
- Not published · 27 May 2026
Anytime-scorer `start_prob` v2 — predicted-XI layer (default-off)
Model 5 (`scripts/build_anytime_scorer.py`) produces `P(player scores ≥ 1 across the WC tournament)`. The headline depends on `E[minutes]`, which is derived from `start_prob` (the per-match starter likelihood). The v1 chain was:
- Not published · 27 May 2026
Do teams try harder in must-win games? (No, actually)
Football economics literature (Brams & Ismail 2018; Apesteguia & Palacios-Huerta 2010 on tournament-incentive distortions) reports that match outcomes in the final round of group-stage tournaments deviate from baseline expectations when the
- Published · 27 May 2026
Hierarchical Poisson — full PyMC NUTS posterior
* Fit posterior: `scripts/fit_hp_posterior.py`
- Not published · 27 May 2026
Letting team ratings drift over time (didn't improve predictions)
Per the design note (variant a, "EMA on (α_t, β_t)"): each team's attack/defence parameters should EVOLVE through time rather than absorb every era's matches into a single stationary compromise. Refit DC at K snapshot timestamps (= the 8 qu
- Published · 25 May 2026
Are Premier League players really better? Cross-league strength adjustments
α = 0.05 unchanged.
- Published · 25 May 2026
Does your starting goalkeeper change your defence? (Yes)
The starting-keeper rating is informative beyond what the team-level
- Published · 24 May 2026
Predicting goalscorers: breaking down shot volume and shot quality
- `scripts/build_ratings_player.py` (now splices `big5_player_shooting.parquet` into `ratings_player.csv`)
- Not published · 24 May 2026
Do some playing styles beat others? (Not enough to measure)
- `scripts/build_style_matchup_training.py` (per-match training join)
- Not published · 23 May 2026
Retuning the models for tournament football — what changed
PR #310 documented that all four models in the ensemble are ~7% worse on tournament matches than on the all-matches average. The natural follow-up is to refit the predict-time knobs on a tournament-only training slice and serve tournament-v
- Published · 22 May 2026
How well do the models predict tournaments specifically?
> **Update (2026-06-01) — superseded by a leakage-free measurement.** The ~0.545 tournament Brier below comes from a single-fold backtest that composes its ensemble from the *current* Elo snapshot rather than each team's rating as it stood
- Not published · 21 May 2026
Does extra rest between matches help? (Not measurably)
Sports-science literature reports a measurable effect of recovery time on football performance: better-rested teams score slightly more goals than fatigued ones. The expected magnitude is small but consistent across studies (Mohr et al. 201
- Published
Calibrating predictions differently for friendlies vs tournaments
- `scripts/fit_ensemble_calibrator.py` (current implementation)
- Complete — A1 + B1 run on the 8-walk / 90-day harness (snapshot 2026-05-28). Verdict: **neither ships.** B1 fails its gate outright; A1's gate "passes" but on an extremization artefact, not a keeper-skill signal. See the Results section at the end
Can international-tournament StatsBomb signals beat the club-derived baseline?
PR #525 + PR #532 produced two new per-team signals extracted from StatsBomb open event data across WC 2018/2022, Euro 2020/2024, Copa America 2024, AFCON 2023:
- Design only. No code written, no fit run, no decision taken
Can team strength change mid-season? Design for a time-varying model
The shipping Dixon-Coles fit (`scripts/fit_dixon_coles.py`) gives every
- Design only. No code written, no fit run, no decision taken
Can we model the game *script*? Design for a within-match game-state layer
Our forecasting stack — ClubElo / FIFA-Elo + Dixon-Coles + Hierarchical
Post-processing parameter sweep
Walk-forward backtest (8 folds × 90 days, n=1,844 matches) over four
- Feasibility probe complete. The design note's stated risk was wrong; a different risk is binding. Qualified result — read the decision gate at the bottom
Within-match game-state: the corpus is ample, the confounding is the problem
The design note proposed a within-match game-state layer — a leading team