This note pins down what is actually obtainable, cheaply, before we design models around feeds we will never have. The binding constraint on most amateur stacks is data, not modelling skill.
Data Sources — Landscape Note (Area F)
Summary + sample · full document is 2,054 words
Sample
Tiers of football data
Roughly, in increasing order of richness, cost, and access friction:
- Results + odds — every match, full coverage, free or near-free.
- Aggregated stats (shots, possession, xG totals) — public for top leagues.
- Event data (per-action rows: passes, dribbles, tackles, shots) — partly free at sample scale, paid at production scale.
- Tracking data (25 Hz positions of all 22 players + ball) — mostly paid, broadcast-derived options narrowing the gap.
- 360 / freeze-frames (off-ball player positions at the moment of each event) — a middle ground between event and tracking data, with partial access via StatsBomb Open Data.
- Lineups, injuries, late team news — scattered across feeds, highly time-sensitive, and the bottleneck for any lineup-aware model.
- Live odds / market microstructure — paid APIs or scraped exchange data.
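To give a feel for why the tracking tier is so much heavier than the tiers above it, here is a rough back-of-envelope sketch of its per-match volume. It uses the 25 Hz rate and the 22 players plus ball from the list; the 90-minute match length with no stoppage time, and the function name itself, are assumptions made for illustration only.

```python
# Assumptions (not from the note): regulation time only, no stoppage time,
# and one position sample per tracked entity per frame.
FRAME_RATE_HZ = 25        # tracking rate quoted in the tier list
PLAYERS_ON_PITCH = 22     # all 22 players, as in the tier list
BALL = 1
MATCH_SECONDS = 90 * 60   # assumed match length in seconds


def tracking_samples_per_match(
    frame_rate_hz: int = FRAME_RATE_HZ,
    entities: int = PLAYERS_ON_PITCH + BALL,
    match_seconds: int = MATCH_SECONDS,
) -> tuple[int, int]:
    """Return (frames, position samples) for a single match."""
    frames = frame_rate_hz * match_seconds
    return frames, frames * entities


frames, samples = tracking_samples_per_match()
print(f"{frames:,} frames, {samples:,} position samples per match")
# prints "135,000 frames, 3,105,000 position samples per match"
```

Under these assumptions a single match yields about 3.1 million position samples, which is one way to see why storage and licensing for this tier differ so sharply from a results-and-odds table with one row per match.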