Track record
How accurate is the model?
Anyone can say a team has a 70% chance. The honest test is what happens next: do teams given a 70% chance really win about seven times in ten? Here is the one number that answers that — and then every place you can check our working.
Graded against results you already know
When this model says a team has a 70% chance, that is about how often it happens. We tested it on all 987 matches at 24 past tournaments (2014–2024). Each match was graded by the model rebuilt as it stood the day before kickoff, so it never saw the result. Its stated chances landed within about 5.6 percentage points of what actually happened.
Put as a single number: on average it rated the result that actually happened about 35% more likely than a blind 1-in-3 guess would.
For the statistically minded, that is a Brier score of 0.572 against the 0.667 a blind guess scores (lower is better). It is the honest yardstick for 2026, not a number flattered after the fact.
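For readers who want to see where the 0.667 baseline comes from, here is a minimal sketch of a three-outcome Brier score. It assumes the standard sum-of-squares form over win, draw, and loss, which is the form under which a blind 1-in-3 guess scores 0.667; the function names and the example forecast are illustrative, not taken from the model's code.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcome: int) -> float:
    """Brier score for one match.

    probs   -- stated chances for each possible result (assumed to sum to 1)
    outcome -- index of the result that actually happened
    Sums the squared gap between each stated chance and what happened
    (1 for the actual result, 0 for the others). Lower is better.
    """
    return sum(
        (p - (1.0 if i == outcome else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def mean_brier(forecasts: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Average Brier score across a set of matches."""
    scores = [brier_score(p, o) for p, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)


# A blind guess gives every result a 1-in-3 chance:
# (1 - 1/3)^2 + (0 - 1/3)^2 + (0 - 1/3)^2 = 6/9
blind = [1 / 3, 1 / 3, 1 / 3]
print(round(brier_score(blind, 0), 3))  # 0.667

# Hypothetical forecast: 70% win, 20% draw, 10% loss, and the team wins.
print(round(brier_score([0.7, 0.2, 0.1], 0), 3))  # 0.14
```

A confident forecast that comes true scores well below the blind guess, and a confident forecast that fails scores well above it, which is why the average over many matches is a fair yardstick.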
See the full scoreboard — by tournament, by confidence band, with reliability diagrams →
Check the working
Five ways the model is held to account — the evidence, the failures, and the versioned record behind every number.
Live + held-out
Calibration scoreboard
The full held-out backtest broken out by tournament and confidence band, plus the live tracker that scores every 2026 match as it's played. A 70%-rated outcome should happen about 70% of the time — this is where you check.
The argument · free
Why trust the numbers
The discipline behind the probabilities — pre-registered acceptance gates, tier-honest reporting, and the parts of the model where confidence is genuinely lower, called out by name.
Published failures
What didn't work
Every model variant that failed the ship gate, published in full with its verdict. The no-ships are as visible as the wins: if only the winners were shown, the model would look more inevitable than it is.
Versioned record
Brier at every release
The versioned history of the model — each retrain stamped with its Brier-at-release, so the number on any page traces back to a dated row.
How it's built
Methodology
The component models, training procedure, data sources, and backtest design — all reproducible from public data.
Prediction integrity
Locked before kickoff
Every match forecast locks a few hours before kickoff. The locked probabilities are the final prediction the model is graded against. Once frozen, the numbers cannot change, so the calibration scores on this page reflect what was actually published before each match.
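The calibration check described on this page, where a 70%-rated outcome should happen about 70% of the time, can be sketched as follows. This is a rough illustration under stated assumptions: the ten-band grouping, the function names, and the toy data are all invented for the example, and the page does not say how its 5.6-point figure is computed, so the weighted gap below is one common choice rather than the site's own method.

```python
from collections import defaultdict


def calibration_table(probs, happened, n_bands=10):
    """Group locked forecasts into confidence bands.

    probs    -- stated chance for each graded outcome (0.0 to 1.0)
    happened -- 1 if that outcome occurred, else 0
    Returns rows of (band_start, count, mean_stated, observed_rate).
    """
    bands = defaultdict(list)
    for p, y in zip(probs, happened):
        # Small epsilon guards against float rounding at band edges;
        # a stated chance of exactly 1.0 falls in the top band.
        idx = min(int(p * n_bands + 1e-9), n_bands - 1)
        bands[idx].append((p, y))

    rows = []
    for idx in sorted(bands):
        items = bands[idx]
        n = len(items)
        mean_stated = sum(p for p, _ in items) / n
        observed_rate = sum(y for _, y in items) / n
        rows.append((idx / n_bands, n, mean_stated, observed_rate))
    return rows


def mean_calibration_gap(rows):
    """Average |stated - observed| across bands, weighted by band size."""
    total = sum(n for _, n, _, _ in rows)
    return sum(n * abs(s - o) for _, n, s, o in rows) / total


# Toy data (made up): ten 70% calls, seven of which came true,
# and five 30% calls, two of which came true.
probs = [0.7] * 10 + [0.3] * 5
happened = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 3

for band_start, n, stated, observed in calibration_table(probs, happened):
    print(f"{band_start:.0%} band: n={n}, stated={stated:.2f}, observed={observed:.2f}")
print(f"mean gap: {mean_calibration_gap(probs and calibration_table(probs, happened)):.3f}")
```

In the toy data the 70% band is perfectly calibrated and the 30% band is off by ten points, so the weighted gap comes out at roughly three points. Because the inputs are the locked probabilities, the table cannot be improved after the fact.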