Research

Rejected results

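Model variants and feature additions that were tested, judged against the 8x90-day walk-forward Brier + ECE gate, and failed to improve the production ensemble. We publish them in full because a rejection decision is the same calibration story as an adoption decision. Each entry below records the hypothesis someone could reasonably have held, the test that judged it, and the reason the test rejected it.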

Of the 25 notes in total, 11 are rejected results. The full note index, including the adopted variants, is at /research/notes/.
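For readers unfamiliar with the two metrics named in the gate, the following is a minimal sketch of how a Brier + ECE check over walk-forward folds could be computed. It is not the project's implementation: the binary-outcome simplification, the 10 equal-width ECE bins, and the `passes_gate` decision rule (lower mean Brier, ECE not worse) are all assumptions made for illustration, and every function name here is hypothetical.

```python
from typing import List, Sequence, Tuple

Fold = Tuple[Sequence[float], Sequence[int]]  # (forecast probabilities, 0/1 outcomes)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: size-weighted mean of |avg forecast - observed rate|."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_p = sum(p for p, _ in members) / len(members)
        rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(avg_p - rate)
    return ece


def passes_gate(baseline_folds: List[Fold], candidate_folds: List[Fold]) -> bool:
    """Assumed rule: candidate must lower mean Brier without raising mean ECE."""

    def mean_metric(folds: List[Fold], metric) -> float:
        return sum(metric(p, y) for p, y in folds) / len(folds)

    brier_better = mean_metric(candidate_folds, brier_score) < mean_metric(
        baseline_folds, brier_score
    )
    ece_not_worse = mean_metric(
        candidate_folds, expected_calibration_error
    ) <= mean_metric(baseline_folds, expected_calibration_error)
    return brier_better and ece_not_worse
```

Under this assumed rule, each of the eight 90-day folds would contribute one `(probs, outcomes)` pair per model, and a candidate that sharpens its forecasts (lower Brier) while becoming less calibrated (higher ECE) would still be rejected.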

Why we publish rejected results

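Guarding against selective publication. If we published only the variants that improved on the gate, the production ensemble would look more inevitable than it really is. Rejected results are evidence about the negative space around every adopted model change.
Preventing accidental retests. An ablation that failed six months ago is invisible to a new contributor unless its write-up is searchable. Keeping rejected results on the same pages as positive results means that "has anyone tried this?" can be answered without reading the commit log.
Setting an upper bound on the model. A run of consecutive failures by higher-capacity variants on the same data is itself a measurement: with the data currently available, the gate is hard to clear. That signal is more useful to a reader who can see the failures than to one who sees only the successes.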

Rejected results

A within-match chase layer "passes" the headline gate — and the placebo proves it shouldn't

Is composite coverage the lever for the player-strength offset? (No)

Does a player-form (momentum) offset improve match forecasts? (No)

Can we fit the player-strength coefficient instead of hand-setting it? (No)

Anytime-scorer `start_prob` v2 — predicted-XI layer (default-off)

Do teams try harder in must-win games? (No, actually)

Letting team ratings drift over time (didn't improve predictions)

Do some playing styles beat others? (Not enough to measure)

Retuning the models for tournament football — what changed

Does extra rest between matches help? (Not measurably)

Can international-tournament StatsBomb signals beat the club-derived baseline?
