Data Sources — Landscape Note (Area F)

Summary + sample · full document is 2,054 words

Summary

This note pins down what data is actually obtainable, and at what cost, before we design models around feeds we will never have. For most amateur stacks the binding constraint is data, not modelling skill.

Sample

Tiers of football data

Roughly, in increasing order of richness, cost, and access friction:

  1. Results + odds — every match, full coverage, free or near-free.
  2. Aggregated stats (shots, possession, xG totals) — public for top leagues.
  3. Event data (per-action rows: passes, dribbles, tackles, shots) — partly free at sample scale, paid at production scale.
  4. Tracking data (25 Hz positions of all 22 players + ball) — mostly paid, broadcast-derived options narrowing the gap.
  5. 360 / freeze-frames (off-ball player positions at the moment of each event) — middle ground, partial access via StatsBomb Open Data.
  6. Lineups, injuries, late team news — distributed across feeds, tightly time-sensitive, the bottleneck for any lineup-aware model.
  7. Live odds / market microstructure — paid APIs or scraped exchange data.
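
One way to put the tiers to work when scoping a model is to write them down as a small lookup and check a model's required inputs against what is actually obtainable. The sketch below is illustrative only: the `Tier` class, the short tier names, and the `missing_tiers` helper are hypothetical, and the access notes are shortened from the list above rather than taken from any vendor's terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    rank: int    # position in the list above
    name: str    # hypothetical short identifier, not an official term
    access: str  # access note shortened from the list above


TIERS = [
    Tier(1, "results_odds", "free or near-free"),
    Tier(2, "aggregated_stats", "public for top leagues"),
    Tier(3, "event_data", "partly free at sample scale, paid at production scale"),
    Tier(4, "tracking_data", "mostly paid"),
    Tier(5, "freeze_frames_360", "partial access via StatsBomb Open Data"),
    Tier(6, "lineups_team_news", "distributed across feeds, time-sensitive"),
    Tier(7, "live_odds", "paid APIs or scraped exchange data"),
]


def missing_tiers(required, obtainable):
    """Return the tiers a model needs but we cannot obtain, in rank order."""
    known = {t.name for t in TIERS}
    unknown = (set(required) | set(obtainable)) - known
    if unknown:
        raise ValueError(f"unknown tier names: {sorted(unknown)}")
    return [t for t in TIERS if t.name in required and t.name not in obtainable]


# Example: we can get results/odds and aggregated stats, but the model
# design also assumes tracking data and late team news.
obtainable = {"results_odds", "aggregated_stats"}
gaps = missing_tiers(
    {"results_odds", "tracking_data", "lineups_team_news"}, obtainable
)
for tier in gaps:
    print(f"tier {tier.rank} ({tier.name}): {tier.access}")
```

Running the check before any modelling work starts surfaces the feeds a design depends on but that are out of reach, which is the constraint the summary flags.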
