Research
Negative results
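Model variants and feature additions that were tested, judged against the 8×90-day forward Brier + ECE gate, and failed to improve on the incumbent ensemble. They are published in full because the decision not to ship belongs to the same calibration story as the decision to ship: each entry below records a hypothesis that could have been proposed, the test that judged it, and why the test came back negative.
11 of the 23 notes in the corpus were not shipped. The full note index (including shipped variants) is at /research/notes/.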
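As a rough illustration of what a forward Brier + ECE gate can look like, the sketch below scores a candidate and an incumbent over a set of forward windows and compares them. Everything in it is an assumption made for illustration: the function names (`brier`, `ece`, `passes_gate`), the top-label ECE with ten equal-width bins, the use of the median across folds, and the `ece_tolerance` parameter are hypothetical and are not taken from the project's code.

```python
from statistics import median


def brier(probs, outcomes):
    """Mean multiclass Brier score.

    probs: list of probability vectors (e.g. [p_home, p_draw, p_away]).
    outcomes: list of integer class indices, one per match.
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)


def ece(probs, outcomes, n_bins=10):
    """Top-label expected calibration error with equal-width confidence bins."""
    # Each bin holds [count, summed confidence, summed hits].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == y else 0.0
    # sum_b (n_b / n) * |mean_conf_b - accuracy_b| simplifies to the line below.
    return sum(abs(s_conf - s_hit) for cnt, s_conf, s_hit in bins if cnt) / len(probs)


def passes_gate(candidate_folds, incumbent_folds, ece_tolerance=0.0):
    """Hypothetical gate: the candidate must lower the median Brier score across
    the forward windows without raising the median ECE beyond the tolerance.

    Each argument is a list of (probs, outcomes) pairs, one per forward window
    (eight 90-day windows in the setup described above).
    """
    cand_brier = median(brier(p, y) for p, y in candidate_folds)
    inc_brier = median(brier(p, y) for p, y in incumbent_folds)
    cand_ece = median(ece(p, y) for p, y in candidate_folds)
    inc_ece = median(ece(p, y) for p, y in incumbent_folds)
    return cand_brier < inc_brier and cand_ece <= inc_ece + ece_tolerance
```

The actual gate may use a different aggregation or a different pass rule; the point of the sketch is only that a variant is judged on both accuracy (Brier) and calibration (ECE) over out-of-sample windows, so a variant that improves one while degrading the other does not pass.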
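Why publish results that did not pass
- Avoid selective publication. If only the variants that pass the gate are published, the incumbent ensemble looks more inevitable than it is. Failed results are evidence of the negative space around every shipped model change, and they show what the corpus and the gate cannot distinguish.
- Prevent accidental re-testing. An ablation that failed six months ago is invisible to a new collaborator unless its report is discoverable. Keeping negative results on the same platform as positive ones means "has anyone tried this?" has an answer that does not require digging through the commit log.
- Bound the model ceiling. A series of high-capacity variants that fail on the same corpus is itself a measurement: with the data currently available, the gate is hard to beat. That signal is more useful to a reader who can see the failures than to one who sees only the successes.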
- Not shipped · 3 June 2026
A within-match chase layer "passes" the headline gate — and the placebo proves it shouldn't
The feasibility probe found that, after controlling for team strength, only
Read the note →
- Not shipped · 29 May 2026
Is composite *coverage* the lever for the player-strength offset? (No)
player-composite's match coverage — whether honestly (point-in-time WC
Read the note →
- Not shipped · 29 May 2026
Does a player-form (momentum) offset improve match forecasts? (No)
player-form differential offset `Δ = α·(form_home − form_away)` does
Read the note →
- Not shipped · 29 May 2026
Can we fit the player-strength coefficient instead of hand-setting it? (No)
α = 0.05 offset (Model 16) beats a per-fold fitted α on median Brier.
Read the note →
- Not shipped · 27 May 2026
Anytime-scorer `start_prob` v2 — predicted-XI layer (default-off)
Model 5 (`scripts/build_anytime_scorer.py`) produces `P(player scores ≥ 1 across the WC tournament)`. The headline depends on `E[minutes]`, which is derived from `start_prob` (the per-match starter likelihood). The v1 chain was:
Read the note →
- Not shipped · 27 May 2026
Do teams try harder in must-win games? (No, actually)
Football economics literature (Brams & Ismail 2018; Apesteguia & Palacios-Huerta 2010 on tournament-incentive distortions) reports that match outcomes in the final round of group-stage tournaments deviate from baseline expectations when the
Read the note →
- Not shipped · 27 May 2026
Letting team ratings drift over time (didn't improve predictions)
Per the design note (variant a, "EMA on (α_t, β_t)"): each team's attack/defence parameters should EVOLVE through time rather than absorb every era's matches into a single stationary compromise. Refit DC at K snapshot timestamps (= the 8 qu
Read the note →
- Not shipped · 24 May 2026
Do some playing styles beat others? (Not enough to measure)
- `scripts/build_style_matchup_training.py` (per-match training join)
Read the note →
- Not shipped · 23 May 2026
Retuning the models for tournament football — what changed
PR #310 documented that all four models in the ensemble are ~7% worse on tournament matches than on the all-matches average. The natural follow-up is to refit the predict-time knobs on a tournament-only training slice and serve tournament-v
Read the note →
- Not shipped · 21 May 2026
Does extra rest between matches help? (Not measurably)
Sports-science literature reports a measurable effect of recovery time on football performance: better-rested teams score slightly more goals than fatigued ones. The expected magnitude is small but consistent across studies (Mohr et al. 201
Read the note →
- Not shipped
Can international-tournament StatsBomb signals beat the club-derived baseline?
PR #525 + PR #532 produced two new per-team signals extracted from StatsBomb open event data across WC 2018/2022, Euro 2020/2024, Copa America 2024, AFCON 2023:
Read the note →