Measured in public

Model report card

Every match, graded. The pre-match probability locks before kickoff; once the final whistle goes, it is scored against the result and the day's card fills in. Each match gets a Brier score, and a running mean tracks the tournament so far.

Matches graded so far
2 / 4
Mean Brier score
0.398
Lower is better
0 = perfect foresight; 0.667 = always saying one-third each.
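
As a worked illustration of that scale, the sketch below computes a three-way Brier score as the sum of squared errors over the three outcomes, which is the form consistent with the 0.667 reference point. The function name, the outcome labels (`home`, `draw`, `away`) and the example forecast are assumptions made for illustration, not the site's actual code.

```python
OUTCOMES = ("home", "draw", "away")  # hypothetical labels for the three results


def brier_score(probs, result):
    """Sum of squared errors between the forecast and the one-hot result."""
    if set(probs) != set(OUTCOMES):
        raise ValueError("probs must contain exactly home, draw and away")
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if result not in OUTCOMES:
        raise ValueError("result must be one of home, draw, away")
    return sum(
        (probs[o] - (1.0 if o == result else 0.0)) ** 2 for o in OUTCOMES
    )


# Always saying one-third each scores 2/3, whatever the result.
uniform = {o: 1 / 3 for o in OUTCOMES}
print(round(brier_score(uniform, "draw"), 3))  # 0.667

# Putting all the probability on the realised result scores 0.
certain = {"home": 1.0, "draw": 0.0, "away": 0.0}
print(brier_score(certain, "home"))  # 0.0
```

Under this form the worst possible score is 2.0, reached by putting all the probability on an outcome that does not happen.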

12 Jun 2026

0 of 2 graded · day card →

11 Jun 2026

2 of 2 graded · day mean Brier 0.398 · day card →

Upcoming match days

How results feed back into the model

Every final score on this page joins the model's training data. An automatic refit runs throughout the tournament: it pulls the latest results, re-estimates each team's attack and defence rates (Dixon-Coles on shot-quality data, hierarchical Poisson on goals), rolls the Elo ratings forward, and re-checks the calibration layer against everything played so far. A sanity guard blocks any refit that moves a leading team's tournament probability implausibly far in one step.
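
One way such a sanity guard could work is sketched below. It is only an illustration: the page does not say how "leading team" or "implausibly far" are defined, so the function name, the `top_n` cut-off and the `max_shift` threshold are all hypothetical, and the probabilities are made up.

```python
def guard_allows_refit(before, after, top_n=5, max_shift=0.10):
    """Return True if no leading team's probability moved more than max_shift.

    before / after map team name -> tournament-win probability.
    top_n and max_shift are hypothetical settings, not the site's values.
    """
    # "Leading" is taken here to mean the top_n teams before the refit.
    leaders = sorted(before, key=before.get, reverse=True)[:top_n]
    return all(
        abs(after.get(team, 0.0) - before[team]) <= max_shift
        for team in leaders
    )


# Made-up numbers, for illustration only.
before = {"Team A": 0.20, "Team B": 0.15, "Team C": 0.05}
small_move = {"Team A": 0.22, "Team B": 0.14, "Team C": 0.05}
big_move = {"Team A": 0.35, "Team B": 0.10, "Team C": 0.05}

print(guard_allows_refit(before, small_move))  # True
print(guard_allows_refit(before, big_move))    # False
```

A blocked refit would leave the previous fit in place until someone reviews the jump; that follow-up step is likewise an assumption, not something the page describes.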

So an upset changes future forecasts twice: directly, through the ratings of the two teams involved, and indirectly, through what the calibrator learns about how confident the model should be. What never changes is the graded number above: each pre-match probability is locked before kickoff and scored as published. The refit improves tomorrow's forecast; it cannot touch yesterday's grade.

How to read this. Each graded row quotes the probability the model published before kickoff (the same number the calibration scoreboard grades) next to the realised result. A Brier score or log-loss from a single match is noisy; the tournament mean is the more reliable measure. Methodology at /docs/calibration/. Research output, not betting advice.
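
To make the two per-match measures concrete, here is a minimal sketch of log-loss for a single match and of a running mean over per-match scores. The function names and the example scores are invented for illustration; the actual definitions are the ones in the methodology document.

```python
import math


def log_loss(prob_of_result):
    """Negative natural log of the probability given to the realised result."""
    if not 0.0 < prob_of_result <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(prob_of_result)


def running_mean(scores):
    """Mean of the scores seen so far, after each match in turn."""
    means, total = [], 0.0
    for n, score in enumerate(scores, start=1):
        total += score
        means.append(total / n)
    return means


# Saying one-third on the realised result costs ln(3), about 1.099.
print(round(log_loss(1 / 3), 3))

# Invented per-match Brier scores: the mean settles as matches accumulate.
print(running_mean([0.25, 0.75, 0.5]))
```

A single surprising result can swing one match's score a long way, which is why the mean over many matches says more about the model than any one row.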