Every tick. Every venue. Queryable.
We're opening up the tick archive behind Flow. Every price change, every quote movement, every fill — captured from the moment each partner went live, across sportsbooks, peer-to-peer platforms, prediction markets, and sweeps. Backtest strategies, train models, measure execution quality against the real book. Launching in beta with a small set of quant teams.
# schema
ts_utc timestamp[ns, UTC]
partner string # kalshi | bettoredge | …
league string
contest_id string
position_hash string
side string # over | under | home…
best_price double
best_available double
depth list<struct>
consensus_fair double
is_live bool
# partitioned by league/contest_date
# delivered via S3 / GCS / HTTP
Built for the buy side.
The archive uses the same normalized schema that powers live Flow, and is planned as a historical record you can query, export, or replay. Once the beta opens, you no longer need to scrape six venues and reconcile their clocks yourself.
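As a rough illustration of how a research stack might hold one archive row in memory, the sketch below mirrors the schema listed above as Python dataclasses. The column names come from that listing. Everything else is an assumption made for the example: the `DepthLevel` fields (`price`, `size`), the `tick_from_row` helper, and the idea that `ts_utc` arrives as an integer count of nanoseconds since the Unix epoch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class DepthLevel:
    # Hypothetical fields: the schema only says depth is list<struct>.
    price: float
    size: float


@dataclass(frozen=True)
class Tick:
    ts_utc_ns: int              # assumed: nanoseconds since the Unix epoch, UTC
    partner: str
    league: str
    contest_id: str
    position_hash: str
    side: str
    best_price: float
    best_available: float
    depth: Tuple[DepthLevel, ...]
    consensus_fair: float
    is_live: bool

    @property
    def ts_utc(self) -> datetime:
        # datetime stores microseconds only, so the last three digits are dropped.
        seconds, nanos = divmod(self.ts_utc_ns, 1_000_000_000)
        base = datetime.fromtimestamp(seconds, tz=timezone.utc)
        return base.replace(microsecond=nanos // 1_000)


def tick_from_row(row: dict) -> Tick:
    """Build a Tick from a dict-like row (hypothetical helper)."""
    levels = tuple(
        DepthLevel(float(level["price"]), float(level["size"]))
        for level in row["depth"]
    )
    return Tick(
        ts_utc_ns=int(row["ts_utc"]),
        partner=row["partner"],
        league=row["league"],
        contest_id=row["contest_id"],
        position_hash=row["position_hash"],
        side=row["side"],
        best_price=float(row["best_price"]),
        best_available=float(row["best_available"]),
        depth=levels,
        consensus_fair=float(row["consensus_fair"]),
        is_live=bool(row["is_live"]),
    )
```

Keeping the raw nanosecond integer alongside the derived `datetime` avoids losing precision when ordering ticks that fall within the same microsecond.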
Quant funds & prop desks
Prove alpha before you trade it. Backtest signals against the exact book your execution layer will see, partner-by-partner, including fragmentation and venue outages.
Market makers
Calibrate quoting against historical depth, measure realized slippage, and stress-test inventory curves across the full catalog of markets — not just a single venue.
Research & data science
Train models on normalized, venue-level tick data. Study price discovery across fragmented books. Build features on consensus pricing and distribution-fair values.
What the archive will ship with.
The archive builds on the same data plane that powers live Flow — captured, partitioned, and delivered in the formats your research stack already speaks.
Tick-level resolution
Every price change, every quote movement, every fill. Nanosecond-precision timestamps. Partner-native event boundaries preserved.
Venue-level depth
Full orderbook depth per partner at each tick — not just the top-of-book. Reconstruct the exact liquidity profile any trade faced.
Consensus & fair values
Pre-computed consensus prices and distribution-model fair values alongside the raw data. Skip the aggregation engineering.
Exports that fit your stack
Apache Parquet, Arrow, CSV, or line-delimited JSON. Delivered via S3, GCS, or pre-signed HTTP. Partitioned by league and contest date.
Replay API
Stream the historical archive back through the same Flow WebSocket schema. Your live engine and your backtest use identical code paths.
Provenance preserved
Every row carries the raw partner payload hash. Audit any datapoint back to the original API response — no reconciliation disputes.
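The provenance check described above could look something like the following sketch. The page does not name the hash algorithm or the column that holds it, so SHA-256, hex encoding, and the `recorded_hash` argument are all assumptions for illustration only.

```python
import hashlib
import hmac


def payload_matches(raw_payload: bytes, recorded_hash: str) -> bool:
    """Return True if the raw partner payload hashes to the recorded value.

    Assumes SHA-256 with a hex-encoded digest; the real archive may differ.
    """
    digest = hashlib.sha256(raw_payload).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(digest, recorded_hash.lower())
```

A team that stores the original API responses can run a check like this over a sample of rows to confirm that nothing was altered between capture and delivery.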
What beta teams will build.
Four representative workloads the archive unblocks. Custom cuts (single-venue, single-league, or windowed around specific events) are negotiated during beta onboarding.
- Strategy backtesting
- Replay any contest tick-by-tick. Compute the exact fills your strategy would have hit, venue-by-venue. No synthetic reconstructions — this is the book that actually existed.
- Signal research
- Hunt for lead-lag across partners. Study how fragmentation propagates. Quantify which venues are price-discovery leaders by contest type.
- Execution analysis
- Measure realized slippage, fill rates, and adverse selection against historical depth. Attribute execution quality per partner.
- Model training
- Pre-normalized features across every venue: best price, consensus fair, depth-weighted mid, partner divergence. Ready for any ML pipeline.
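To make the execution-analysis workload concrete, here is a minimal sketch of walking one side of a historical book to estimate the average fill price and slippage for a buy order. It assumes each depth level is a `(price, size)` pair sorted best-first and that a lower price is better for the buyer. Neither the level layout nor the price convention is specified on this page, so treat both as placeholders.

```python
from typing import Optional, Sequence, Tuple

Level = Tuple[float, float]  # (price, size), assumed layout


def simulate_buy(depth: Sequence[Level], quantity: float) -> Tuple[float, Optional[float]]:
    """Walk levels best-first and return (filled_quantity, average_price)."""
    remaining = quantity
    cost = 0.0
    for price, size in depth:
        if remaining <= 0:
            break
        take = min(size, remaining)   # consume as much of this level as needed
        cost += take * price
        remaining -= take
    filled = quantity - remaining
    average = cost / filled if filled > 0 else None
    return filled, average


def slippage(depth: Sequence[Level], quantity: float) -> Optional[float]:
    """Average fill price minus the best price, or None if nothing filled."""
    filled, average = simulate_buy(depth, quantity)
    if average is None:
        return None
    return average - depth[0][0]


book = [(0.50, 100), (0.52, 50), (0.55, 200)]
filled, average = simulate_buy(book, 120)
```

With the sample book, an order for 120 units takes all 100 at the best level and 20 at the next one, so the average price sits slightly above the top of book. An order larger than the total displayed size is only partially filled, which is why the function reports the filled quantity separately.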
How beta onboarding will work.
When the archive goes live, we'll onboard teams directly so the first cut matches exactly what your research stack needs.
Scope the dataset
Leagues, contest types, partners, date range, depth level. We'll share a data dictionary and a sample extract within 48 hours.
Sign the data agreement
Commercial terms, usage scope, redistribution rules. Standard contracts for funds and vendors; custom for bespoke cuts.
Delivery
Initial extract via S3 / GCS / HTTP. Daily incremental drops or a pull API — your choice. Schema-stable across releases.
Iterate
Add partners, extend history, request derived features (consensus, distribution fair, slippage curves). Schema versioned semantically.
Want to be in the first beta cohort?
We're onboarding quant teams directly when the archive opens. Tell us the leagues, partners, and date range you need — we'll get you a data dictionary, a sample extract, and commercial terms as soon as we launch.