Research

Negative results

Model variants and feature additions that were tested, judged by the 8×90-day walk-forward Brier + ECE gate, and did not improve the production ensemble. They are published in full because the decision not to ship is the same calibration story as the decision to ship: each entry below records a hypothesis someone could have formulated, the test that judged it, and the reason the test said no.
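As a rough illustration of what a Brier + ECE gate over walk-forward windows could look like, here is a minimal sketch. It is not the production gate: the binary-outcome simplification, the 10 equal-width ECE bins, the `Fold` container, and the pass rule in `gate_passes` (strictly lower mean Brier, no higher mean ECE) are all assumptions made for the example. Each fold stands in for one 90-day evaluation window.

```python
from dataclasses import dataclass
from typing import List, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: size-weighted mean of |mean forecast - observed rate|."""
    bins: List[list] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_p = sum(p for p, _ in b) / len(b)
        rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(mean_p - rate)
    return ece


@dataclass
class Fold:
    """One out-of-sample evaluation window (hypothetical container)."""
    baseline_probs: List[float]
    candidate_probs: List[float]
    outcomes: List[int]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def gate_passes(folds: Sequence[Fold]) -> bool:
    """Assumed rule: the candidate must lower mean Brier and not raise mean ECE."""
    base_brier = _mean([brier_score(f.baseline_probs, f.outcomes) for f in folds])
    cand_brier = _mean([brier_score(f.candidate_probs, f.outcomes) for f in folds])
    base_ece = _mean(
        [expected_calibration_error(f.baseline_probs, f.outcomes) for f in folds]
    )
    cand_ece = _mean(
        [expected_calibration_error(f.candidate_probs, f.outcomes) for f in folds]
    )
    return cand_brier < base_brier and cand_ece <= base_ece
```

Under this assumed rule, a variant that merely matches the baseline does not pass, which is one way a gate can say no to a change that is not measurably worse.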

11 of the 25 notes in the corpus did not ship. The full index of notes, including the variants that did ship, is at /research/notes/.

Why publish what didn't ship

No cherry-picking. If only the variants that improved the gate were published, the production ensemble would look more inevitable than it is. The variants that did not ship are the evidence of what the corpus and the gate cannot distinguish: they are the negative space around every shipped change.
Avoids accidental retesting. An ablation that failed six months ago is invisible to a new contributor unless its write-up is findable. Keeping negative results on the same surface as positive ones means "has anyone already tried this?" has an answer that does not require reading the commit log.
Bounds the model's ceiling. A run of failed high-capacity variants on the same corpus is itself a measurement: the gate is hard to beat with the data currently available. That signal is more useful to a reader who can see the failures than to one who sees only the successes.

Negative results

A within-match chase layer "passes" the headline gate — and the placebo proves it shouldn't

Is composite coverage the lever for the player-strength offset? (No)

Does a player-form (momentum) offset improve match forecasts? (No)

Can we fit the player-strength coefficient instead of hand-setting it? (No)

Anytime-scorer `start_prob` v2 — predicted-XI layer (default-off)

Do teams try harder in must-win games? (No, actually)

Letting team ratings drift over time (didn't improve predictions)

Do some playing styles beat others? (Not enough to measure)

Retuning the models for tournament football — what changed

Does extra rest between matches help? (Not measurably)

Can international-tournament StatsBomb signals beat the club-derived baseline?
