Group stage scorecard: grading all 72 predictions

Every prediction we published for the group stage is now on the record. Seventy-two matches, all scored, all graded. This is the full accountability report.

We grade each prediction using Brier scores. Lower is better. An A+ means the model called it almost perfectly (Brier below 0.20). An F means the model was badly wrong (Brier of 1.60 or above). The full methodology explains the scoring in detail, and the interactive report card lets you explore match by match.
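
As a rough illustration of the scoring, the sketch below assumes the three-outcome form of the Brier score: squared error summed over home win, draw, and away win, so scores run from 0 to 2. That form is consistent with the values reported here, but it is an assumption, not something taken from the methodology page. The grade cut-offs are the ones listed at the end of this report. The function names and the example forecast are hypothetical.

```python
def brier_score(probs, outcome):
    """Sum of squared errors over the three possible results (0 = perfect, 2 = worst).

    Assumes the multi-outcome Brier form; the real methodology may differ.
    """
    return sum((p - (1.0 if name == outcome else 0.0)) ** 2 for name, p in probs.items())


def grade(brier):
    """Map a Brier score to a letter grade using the report card thresholds."""
    thresholds = [(0.20, "A+"), (0.45, "A"), (1.00, "B"), (1.40, "C"), (1.60, "D")]
    for limit, letter in thresholds:
        if brier < limit:
            return letter
    return "F"


# Hypothetical forecast (not from the model): 60% home win, 25% draw, 15% away win.
forecast = {"home": 0.60, "draw": 0.25, "away": 0.15}
for result in forecast:
    score = brier_score(forecast, result)
    print(f"{result}: Brier {score:.3f}, grade {grade(score)}")
```

With this made-up forecast, the same set of probabilities earns an A if the favourite wins, a B if the match is drawn, and a C if the underdog wins.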

The headline numbers

Metric	Value
Total matches	72
Mean Brier score	0.516
A+ grades	23 (32%)
A grades	18 (25%)
B grades	22 (31%)
C grades	8 (11%)
D grades	1 (1%)
F grades	0 (0%)

More than half of all predictions earned an A or A+. None received an F. The single D was Spain 0-0 Cape Verde on Match Day 1, where the model assigned just 10.5% to the draw.

Draws were the model's weakness. Twenty of 72 matches ended in draws (27.8%), but the model's average draw probability was 23.2%. That gap of roughly 4.6 percentage points between prediction and reality drove most of the C and D grades: seven of the nine came from drawn matches.

The model's favourite won 43 of 72 matches (59.7%). Underdogs won outright nine times (12.5%). The remaining 20 were draws.
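
The arithmetic behind these shares is easy to check. The sketch below uses only the counts and the 23.2% average draw probability quoted above; the variable names are illustrative.

```python
total = 72
favourite_wins, underdog_wins, draws = 43, 9, 20
assert favourite_wins + underdog_wins + draws == total  # the three outcomes cover every match

observed_draw_rate = draws / total   # share of matches that ended level
predicted_draw_rate = 0.232          # model's average draw probability, from the text
gap_points = (observed_draw_rate - predicted_draw_rate) * 100
expected_draws = predicted_draw_rate * total  # draws implied by the model's average

print(f"Favourite wins: {favourite_wins / total:.1%}")
print(f"Underdog wins:  {underdog_wins / total:.1%}")
print(f"Draws:          {observed_draw_rate:.1%}")
print(f"Draw gap:       {gap_points:.1f} percentage points")
print(f"Expected draws: {expected_draws:.1f} vs {draws} observed")
```

In match terms, a 23.2% average implies about 16.7 draws across 72 matches, against the 20 that actually occurred.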

Group rankings by mean Brier

Rank	Group	Mean Brier	A+/A	B	C	D/F
1	I (France, Senegal, Norway, Iraq)	0.262	5	1	0	0
2	C (Brazil, Morocco, Haiti, Scotland)	0.271	5	1	0	0
3	J (Argentina, Algeria, Austria, Jordan)	0.275	5	1	0	0
4	L (England, Croatia, Ghana, Panama)	0.474	4	1	1	0
5	F (Netherlands, Japan, Sweden, Tunisia)	0.479	4	2	0	0
6	K (Portugal, DR Congo, Colombia, Uzbekistan)	0.533	3	2	1	0
7	B (Canada, Bosnia, Qatar, Switzerland)	0.543	4	1	1	0
8	E (Germany, Ivory Coast, Ecuador, Curaçao)	0.589	3	1	2	0
9	A (Mexico, South Africa, South Korea, Czech Republic)	0.622	3	2	1	0
10	G (Belgium, Egypt, Iran, New Zealand)	0.666	2	4	0	0
11	D (USA, Australia, Paraguay, Turkey)	0.676	1	5	0	0
12	H (Spain, Cape Verde, Uruguay, Saudi Arabia)	0.804	2	1	2	1

Three groups stand out. Groups I, C, and J all scored below 0.30, meaning the model read nearly every match correctly. At the other end, Group H was the only group above 0.70. The gap between the best and worst group (0.262 vs 0.804) is wide enough to tell a story about where the model works and where it breaks down.

The best groups

Group I: France, Senegal, Norway, Iraq (0.262)

The model's best group. Five of six predictions earned A or A+, and the worst call (Senegal 2-3 Norway, Brier 0.546) was still a B. France won all three matches as projected. Norway finishing second was well-calibrated. The model had a clear read on the power structure in this group.

Match	Result	P(result)	Brier	Grade
France vs Senegal	3-1	57.4%	0.276	A
Iraq vs Norway	1-4	67.7%	0.166	A+
France vs Iraq	3-0	79.3%	0.073	A+
Senegal vs Norway	2-3	39.8%	0.546	B
France vs Norway	4-1	56.3%	0.287	A
Senegal vs Iraq	5-0	62.4%	0.224	A
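
The group mean is simply the average of the six match scores. A quick check using the Group I values from the table above (because the published match scores are rounded, a recomputed mean could differ in the last digit for other groups):

```python
from statistics import mean

# Match-level Brier scores for Group I, copied from the table.
group_i = {
    "France vs Senegal": 0.276,
    "Iraq vs Norway": 0.166,
    "France vs Iraq": 0.073,
    "Senegal vs Norway": 0.546,
    "France vs Norway": 0.287,
    "Senegal vs Iraq": 0.224,
}

group_mean = mean(group_i.values())
worst_match = max(group_i, key=group_i.get)  # highest Brier = weakest call

print(f"Group I mean Brier: {group_mean:.3f}")
print(f"Worst call: {worst_match} ({group_i[worst_match]:.3f})")
```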

Group C: Brazil, Morocco, Haiti, Scotland (0.271)

Four A+ grades out of six: Brazil's wins over Haiti (Brier 0.015, the best single prediction of the entire tournament) and Scotland, Scotland's win over Haiti, and Morocco's win over Haiti. The two non-A+ results were Brazil 1-1 Morocco (Brier 0.847, a B), where the model had Brazil as favourites and the draw surprised, and Morocco 1-0 Scotland (Brier 0.396, an A), where the model correctly favoured Morocco but gave them only a 48.7% chance.

Match	Result	P(result)	Brier	Grade
Brazil vs Morocco	1-1	27.7%	0.847	B
Haiti vs Scotland	0-1	68.6%	0.152	A+
Brazil vs Haiti	3-0	90.8%	0.015	A+
Morocco vs Scotland	1-0	48.7%	0.396	A
Brazil vs Scotland	3-0	71.1%	0.130	A+
Morocco vs Haiti	4-2	77.6%	0.083	A+

Group J: Argentina, Algeria, Austria, Jordan (0.275)

Argentina winning all three matches was projected at high confidence, and the model earned four A+ grades. The only blemish was Algeria 3-3 Austria (Brier 0.816, a B), a wild six-goal draw on Match Day 3 where the model slightly favoured Algeria to win outright.

Match	Result	P(result)	Brier	Grade
Argentina vs Algeria	3-0	69.6%	0.149	A+
Austria vs Jordan	3-1	64.0%	0.196	A+
Argentina vs Austria	2-0	64.5%	0.198	A+
Algeria vs Jordan	2-1	59.3%	0.249	A
Argentina vs Jordan	3-1	84.9%	0.040	A+
Algeria vs Austria	3-3	26.6%	0.816	B

The worst group

Group H: Spain, Cape Verde, Uruguay, Saudi Arabia (0.804)

The model's hardest group by a wide margin. Spain 0-0 Cape Verde on Match Day 1 was the worst prediction of the tournament (Brier 1.572, the only D grade). The model gave that draw a 10.5% probability. Cape Verde 2-2 Uruguay (Brier 1.151, a C) and Saudi Arabia 1-1 Uruguay (Brier 1.073, a C) compounded the problem. The model treated Uruguay as a strong contender and Cape Verde as the group's weakest team. In reality, Uruguay finished with just 2 points and Cape Verde nearly qualified.

Match	Result	P(result)	Brier	Grade
Spain vs Cape Verde	0-0	10.5%	1.572	D
Saudi Arabia vs Uruguay	1-1	23.4%	1.073	C
Spain vs Saudi Arabia	4-0	85.4%	0.037	A+
Cape Verde vs Uruguay	2-2	21.4%	1.151	C
Spain vs Uruguay	1-0	59.9%	0.247	A
Cape Verde vs Saudi Arabia	0-0	29.6%	0.744	B

The lesson from Group H is about draws. Four of the six matches were draws, and a fifth was decided by a single goal. When a group produces more draws than expected, too much of the model's probability sits on win/loss outcomes, and the Brier penalties pile up.
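
To see why an unexpected draw is so costly, consider how the score behaves once the draw has happened. Assuming the three-outcome Brier score (an assumption that is consistent with the figures in this report), a draw probability d leaves 1 - d to be split between the two wins. The score is lowest when that remainder is split evenly and highest when it all sits on one team. The helper name below is hypothetical.

```python
def draw_brier_bounds(d):
    """Range of three-outcome Brier scores when a draw occurs and the model gave it probability d."""
    rest = 1.0 - d
    best = rest ** 2 + 2 * (rest / 2) ** 2  # remainder split evenly between the two wins
    worst = rest ** 2 + rest ** 2           # remainder placed entirely on one team
    return best, worst


# Draw probabilities taken from the text: Spain vs Cape Verde (10.5%),
# the model's average (23.2%), and Cape Verde vs Saudi Arabia (29.6%).
for d in (0.105, 0.232, 0.296):
    low, high = draw_brier_bounds(d)
    print(f"P(draw) = {d:.1%}: Brier between {low:.3f} and {high:.3f}")
```

Under this assumption, a draw priced at 10.5% cannot score better than about 1.20. The reported 1.572 for Spain vs Cape Verde sits near the top of the possible range, which suggests that almost all of the remaining probability was on a Spain win.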

The middle groups

A few groups deserve brief notes.

Group D (USA, Australia, Paraguay, Turkey) had the second-worst mean Brier (0.676) despite no C, D, or F grades. Five of six matches were B grades: results that were plausible but not confidently predicted. The USA winning the group was expected, but their margins of victory (4-1 vs Paraguay, 2-0 vs Australia) were larger than the model projected, and Australia finishing second ahead of Paraguay was an outcome the model saw as close to a coin flip. The USA losing to Turkey 2-3 on Match Day 3 was the group's only A grade.

Group G (Belgium, Egypt, Iran, New Zealand) was similar: four B grades, reflecting a group where the model had a reasonable read on the hierarchy but couldn't separate the outcomes with confidence. Belgium starting slowly (1-1 draw with Egypt, 0-0 draw with Iran) dragged the scores up before their 5-1 win over New Zealand (Brier 0.073, A+) restored order.

Group E (Germany, Ivory Coast, Ecuador, Curaçao) contained both extremes: Germany 7-1 Curaçao (Brier 0.018, the second-best prediction of the tournament) alongside two C grades, Curaçao 0-0 Ecuador (Brier 1.397, the second-worst prediction of the tournament) and Ivory Coast 1-0 Ecuador (Brier 1.058, where the model had the underdogs at just 17.2%).

The ten biggest misses

Match	Result	P(result)	Brier	Grade
Spain vs Cape Verde	0-0	10.5%	1.572	D
Curaçao vs Ecuador	0-0	15.0%	1.397	C
Qatar vs Switzerland	1-1	15.4%	1.353	C
England vs Ghana	0-0	16.8%	1.322	C
Portugal vs DR Congo	1-1	19.3%	1.220	C
Cape Verde vs Uruguay	2-2	21.4%	1.151	C
South Africa vs South Korea	1-0	16.6%	1.093	C
Saudi Arabia vs Uruguay	1-1	23.4%	1.073	C
Ivory Coast vs Ecuador	1-0	17.2%	1.058	C
Iran vs New Zealand	2-2	24.5%	0.982	B

Eight of the ten biggest misses were draws. The other two were underdog wins where the model gave the winner less than 20% probability. The pattern is clear: when the model is most wrong, it is usually because it underestimated the draw.
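
The split between draws and decisive results can be checked mechanically against the table. The sketch below parses the ten results listed above and tallies which ended level; the data structure and helper name are illustrative.

```python
# (match, result, probability the model gave the actual result), from the table above.
misses = [
    ("Spain vs Cape Verde", "0-0", 0.105),
    ("Curaçao vs Ecuador", "0-0", 0.150),
    ("Qatar vs Switzerland", "1-1", 0.154),
    ("England vs Ghana", "0-0", 0.168),
    ("Portugal vs DR Congo", "1-1", 0.193),
    ("Cape Verde vs Uruguay", "2-2", 0.214),
    ("South Africa vs South Korea", "1-0", 0.166),
    ("Saudi Arabia vs Uruguay", "1-1", 0.234),
    ("Ivory Coast vs Ecuador", "1-0", 0.172),
    ("Iran vs New Zealand", "2-2", 0.245),
]


def is_draw(result):
    """True when both sides of a 'home-away' scoreline are equal."""
    home, away = (int(goals) for goals in result.split("-"))
    return home == away


draws = [m for m in misses if is_draw(m[1])]
decisive = [m for m in misses if not is_draw(m[1])]

print(f"Draws: {len(draws)}, decisive results: {len(decisive)}")
for match, result, prob in decisive:
    print(f"  {match} {result}: winner given {prob:.1%}")
```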

The ten best calls

Match	Result	P(result)	Brier	Grade
Brazil vs Haiti	3-0	90.8%	0.015	A+
Germany vs Curaçao	7-1	90.0%	0.018	A+
Spain vs Saudi Arabia	4-0	85.4%	0.037	A+
Argentina vs Jordan	3-1	84.9%	0.040	A+
France vs Iraq	3-0	79.3%	0.073	A+
Belgium vs New Zealand	5-1	79.0%	0.073	A+
England vs Panama	2-0	78.8%	0.074	A+
Morocco vs Haiti	4-2	77.6%	0.083	A+
Portugal vs Uzbekistan	5-0	71.7%	0.128	A+
Curaçao vs Ivory Coast	0-2	71.5%	0.129	A+

The best calls share a pattern: strong favourites winning comfortably. All ten had pre-match probabilities above 71%, and when the model assigns a probability that high, it is almost always right. The model's strength is reading power gaps; its weakness is reading competitive matches where draws are likelier than the model expects.

What this means for the knockouts

The group stage exposed one systematic weakness: draw calibration. The model predicted draws at 23.2% on average, but draws happened 27.8% of the time. That gap matters less in the knockout rounds, where draws lead to extra time and penalties rather than a final result. The model's core strength (reading which team is stronger and by how much) translates directly to the single-elimination format.

The full interactive scorecard is on the report card page. Every match, every grade, every probability.

Brier scores are computed from predictions frozen at least 24 hours before kickoff (the "receipt" system). Grades follow the thresholds on the report card page: A+ (below 0.20), A (below 0.45), B (below 1.00), C (below 1.40), D (below 1.60), F (1.60 and above). The model publishes probabilities, not recommendations.