The Record · Public audit
Best Bet hit rate · last 15 days
21 of 38 curated Best Bets hit
Most sports prediction sites only publish the picks they got right. We publish every single prediction, locked before first pitch, and every outcome. The good ones. The bad ones. All of it.
Results update nightly · Yesterday's games post by midnight ET
152/248
Slate hit rate
57.5%
15,241 predictions
Days tracked
64
zero retroactive edits
Best Bet win rate by direction
slate baseline 57.5%
151-94 · pred 59.7% · cal gap -1.9pp
62%
+4.1pp vs baseline
1-2 · pred 49.5% · cal gap +16.2pp
33%
-9.2pp vs baseline
Edge is the percentage-point margin over picking that side blind. Cal gap is (predicted - actual): positive = model overconfident, negative = model underclaims. Tested live since 2026-05-18 - one slot per slate is reserved for a fade even when over edges rank higher.
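The two cards above can be reproduced from their win-loss records. This is a minimal sketch, not our production code: the function name is hypothetical, the 151-94 card is assumed to be the over side and the 1-2 card the fade side, and the blind rate for a fade is assumed to be 100 minus the slate baseline.

```python
def side_report(wins, losses, predicted_pct, blind_pct):
    """Return (actual %, edge vs blind pick, calibration gap) for one side."""
    actual_pct = 100.0 * wins / (wins + losses)
    edge_pp = actual_pct - blind_pct          # margin over picking that side blind
    cal_gap_pp = predicted_pct - actual_pct   # positive = model overconfident
    return actual_pct, edge_pp, cal_gap_pp

SLATE_BASELINE = 57.5  # slate hit rate, in percent

# Over side: 151-94 record, 59.7% average predicted probability.
over = side_report(151, 94, 59.7, SLATE_BASELINE)
# Fade side: 1-2 record, 49.5% predicted.
# ASSUMPTION: the blind rate for a fade is 100 - slate baseline.
fade = side_report(1, 2, 49.5, 100.0 - SLATE_BASELINE)

for name, (actual, edge, gap) in (("over", over), ("fade", fade)):
    print(f"{name}: actual {actual:.1f}%, edge {edge:+.1f}pp, cal gap {gap:+.1f}pp")
```

Under those assumptions the sketch prints +4.1pp / -1.9pp for the over side and -9.2pp / +16.2pp for the fade side, matching the cards.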
◆ Closing Line Value
CLV tracking is coming online.
The most defensible proof a betting model is +EV is closing-line value - the difference between our pick probability at lock time and where the market settled by first pitch. We're instrumenting captures from FanDuel + DraftKings at T-10min from first pitch. The framework is live; the first captures land on the next slate that the book-line scraper covers. We publish positive AND negative CLV - same rule as the rest of the audit.
Capture: closing-line snapshots from FanDuel + DraftKings at T-10min from first pitch for each of our Best Bets. We capture both Yes and No sides so we can de-vig.
De-vigging: quoted probability / (sum of both sides' quoted probabilities). Standard market-neutral method - strips the book's margin from the quoted price so we're comparing apples to apples.
CLV per pick: our locked probability minus the de-vigged book probability, expressed in percentage points. Positive = we priced the pick more aggressively than the market settled. Negative = the market thought we were too aggressive.
Publication threshold: 500 resolved captures. Below that the average CLV is dominated by variance - a run of cold matchups could swing the number 0.5pp+ without changing the underlying signal. We publish progress but not the average until the sample clears the threshold.
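The capture, de-vig, and CLV steps above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the books are assumed to quote American odds, and the -150 / +120 line and the 62% locked probability are invented for the example.

```python
def american_to_implied(odds: int) -> float:
    """Convert American odds to the book's quoted (vig-inclusive) probability."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)

def devig(quoted_yes: float, quoted_no: float) -> float:
    """Strip the margin: quoted prob / (sum of both sides' quoted probs)."""
    return quoted_yes / (quoted_yes + quoted_no)

def clv_pp(locked_prob: float, devigged_prob: float) -> float:
    """CLV in percentage points: locked probability minus de-vigged close."""
    return 100.0 * (locked_prob - devigged_prob)

# Hypothetical closing line: Yes -150 / No +120.
quoted_yes = american_to_implied(-150)
quoted_no = american_to_implied(120)
fair_yes = devig(quoted_yes, quoted_no)

# Hypothetical pick locked at 62%.
print(f"quoted sum: {quoted_yes + quoted_no:.4f}")   # above 1.0 because of the vig
print(f"de-vigged Yes: {fair_yes:.4f}")
print(f"CLV: {clv_pp(0.62, fair_yes):+.2f} pp")
```

The two de-vigged sides always sum to 1, which is why both Yes and No have to be captured.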
◆ Track Record · 15,241 predictions · 64 days
2026-04-09 → 2026-06-12
◆ All picks
57.5%
8,765 / 15,241 picks hit
Each dot stands for 1 of every 100 predictions. Filled = the pick hit. Hollow = it didn't. Every prediction is locked before first pitch and matched to the box score after the game resolves - wins and losses both. Nothing is removed or edited.
Average predicted probability: 53.04% vs actual hit rate: 57.51%. The model is conservative - predictions came in below actual outcomes.
Calibration error (5-bin): 4.47 pp - sample-weighted mean of |bin predicted − bin actual| across the 5 quintile buckets. 0 pp = perfectly calibrated; under 1 pp is excellent.
Brier score (Hit): 0.2473 - mean of (predicted − outcome)². Lower is better. 0 = oracle; 0.25 = coin flip; under 0.24 means the probabilities carry information.
Per-stat Brier: HR 0.1013 (n=15,241), K 0.2240 (n=8,822). HR baseline is ~0.029 (p × (1 − p) at a 3% league HR rate, i.e. 0.03 × 0.97); K baseline ~0.18. Lower beats baseline.
Sample: every prediction we ever made (locked pre-game, never edited) where the game has resolved. n = 15,241 across 64 days. Best Bets = 248 curated picks.
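The metrics defined above can be sketched in a few lines. This is a minimal illustration, not our production code: the function names are hypothetical, and the quintile buckets are assumed to be equal-count bins over predictions sorted by probability.

```python
def brier_score(preds, outcomes):
    """Mean of (predicted - outcome)^2. Lower is better."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)

def constant_baseline_brier(base_rate):
    """Brier score of always predicting the base rate: p * (1 - p)."""
    return base_rate * (1.0 - base_rate)

def calibration_error_pp(preds, outcomes, bins=5):
    """Sample-weighted mean |bin predicted - bin actual|, in percentage points.

    ASSUMPTION: bins are equal-count slices of the predictions sorted by probability.
    """
    pairs = sorted(zip(preds, outcomes))
    n = len(pairs)
    total = 0.0
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_hit = sum(y for _, y in chunk) / len(chunk)
        total += len(chunk) * abs(mean_pred - mean_hit)
    return 100.0 * total / n

# A coin-flip forecaster scores 0.25; a 3% base rate gives a baseline near 0.029.
print(constant_baseline_brier(0.5))
print(round(constant_baseline_brier(0.03), 4))
```

The baseline helper shows why 0.25 is the coin-flip reference and why a rare event like a home run has a much lower baseline to beat.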
Why a dot grid instead of a calibration plot: research across NYT, FT, 538, Polymarket, Whoop, ESPN BET converged on a single pattern - one big number, one iconic shape, one comparison. A 100-dot grid renders the percentage literally; you can count to verify. The calibration plot lives below in the Edge Audit panel and on the dedicated /calibration page for sharp readers.
◆ Calibration proof
Across 15,241 graded picks, locked before first pitch and matched to the official box score. A 56% pick is supposed to lose 44% of the time. Here is the receipt.
◆ Edge Audit · 64 days
2026-04-09 → 2026-06-12
+3.79 pp
Best Bet lift over slate baseline
Across 248 curated picks, our Best Bet hit rate is 61.3% vs a slate-wide baseline of 57.5% (n = 15241). Statistical confidence: <90% (z = 1.22).
| Selection rule | Hits / N | Rate | Δ vs slate |
|---|---|---|---|
| Top-5 by calibrated hit probability with CI-width tiebreak. Persisted server-side at lock time. | 152 / 248 | 61.3% | +3.79 pp |
| Pure model output: top-5 by calibrated hitProb. No additional curation. | 205 / 320 | 64.1% | +6.56 pp |
| 0-100 composite advantage score. Noisier than hitProb - sorting by it loses signal. | 192 / 320 | 60.0% | +2.50 pp |
| Naive baseline - qualified hitters ranked by 2026 OBP. Tests whether our model beats a one-stat dumb sort. | 212 / 320 | 66.3% | +8.75 pp |
| Every prediction made - n = 15241. The actuarial baseline. | 8764 / 15241 | 57.5% | — |
| Sanity check - 200 random samples/day, mulberry32 seeded. | 36806 / 64000 | 57.5% | +0.01 pp |

Sample: every prediction we ever made (locked pre-game, never edited) where the game has resolved. n = 15241 predictions across 64 days of MLB action. Best Bets = 248 curated picks (~5/day).
Lift definition: our hit rate minus the slate baseline hit rate. The slate baseline is the rate at which any player who appeared in our predictions got at least one hit that day. It IS already filtered to confirmed lineups, so it's already a high bar.
Random baseline: 200 random 5-player samples per day with a seeded RNG (mulberry32, seed=42 - same number every render). Matches the slate rate to within 0.2pp, which is the sanity check that we're not silently sampling the head.
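A seeded random baseline of this kind can be sketched as below. This is a Python port of the public mulberry32 generator written for illustration; the `random_baseline` helper, the without-replacement draw, and the two toy days of outcomes are assumptions, not our production sampler.

```python
def mulberry32(seed):
    """Python port of the mulberry32 PRNG; returns a function yielding floats in [0, 1)."""
    state = seed & 0xFFFFFFFF

    def next_float():
        nonlocal state
        # All arithmetic is masked to 32 bits to mirror the original integer math.
        state = (state + 0x6D2B79F5) & 0xFFFFFFFF
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & 0xFFFFFFFF
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & 0xFFFFFFFF
        return (t ^ (t >> 14)) / 4294967296

    return next_float

def random_baseline(days, samples_per_day=200, picks_per_sample=5, seed=42):
    """days: list of per-day outcome lists (1 = hit, 0 = no hit). Returns (hits, total)."""
    rand = mulberry32(seed)
    hits = total = 0
    for outcomes in days:
        for _ in range(samples_per_day):
            pool = list(outcomes)
            for _ in range(picks_per_sample):
                # ASSUMPTION: players are drawn without replacement within a sample.
                idx = int(rand() * len(pool))
                hits += pool.pop(idx)
                total += 1
    return hits, total

# Toy data for illustration only: two days, ten players each, 60% hit rate.
days = [[1, 1, 1, 0, 0, 1, 0, 1, 1, 0], [0, 1, 1, 1, 0, 1, 1, 0, 0, 1]]
hits, total = random_baseline(days)
print(f"{hits} / {total} = {100.0 * hits / total:.1f}%")
```

Because the seed is fixed, rerunning the sketch gives the same count every time, and the sampled rate should sit close to the toy slate's 60%.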
Significance: simple two-sample z-test on Best Bet hit rate vs slate, assuming binomial. z = 1.22, below the 1.645 needed for 90% two-sided confidence; p < 0.01 would require |z| > 2.58. The test is approximate - a rigorous version would need to account for non-independence within games and across days. Take the number as directional, not publication-grade.
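For readers who want to check the arithmetic, here is a hedged sketch of an unpooled two-sample z-test on the published counts. The exact variance formula behind our z = 1.22 is not spelled out on this page, so treat this version as an approximation that lands close to it; the function names are hypothetical.

```python
import math

def two_sample_z(hits_a, n_a, hits_b, n_b):
    """Unpooled two-proportion z statistic, assuming independent binomial samples."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / se

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Best Bets (152 / 248) vs the full slate (8764 / 15241).
z = two_sample_z(152, 248, 8764, 15241)
p = two_sided_p(z)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
print("clears 90% confidence" if abs(z) > 1.645 else "below 90% confidence")
```

The result comes out at roughly z = 1.2, which is why the panel reports confidence below 90%.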
What this audit does NOT include yet: (1) a naive baseline like "top-5 by season OBP", which requires fetching season stats per player at prediction time; (2) closing-line value (CLV), which requires pulling book lines daily. Both are in the queue. Until they ship, lift-over-slate is the cleanest defensible edge metric.
Hit Accuracy
↑ 29.8pp vs MLB avg
15,241 predictions · 0 removed
vs league baseline
Cumulative hit rate · every day
Apr 9 to Jun 11
Each point is the running accuracy through that date, settling toward the true rate as the sample grows.
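The running accuracy curve is a cumulative ratio: total hits so far divided by total picks so far. A minimal sketch, with a hypothetical function name and an invented three-day log used only for illustration:

```python
from itertools import accumulate

def running_hit_rate(daily):
    """daily: list of (hits, picks) per day -> cumulative hit rate after each day."""
    cum_hits = accumulate(h for h, _ in daily)
    cum_picks = accumulate(n for _, n in daily)
    return [h / n for h, n in zip(cum_hits, cum_picks)]

# Hypothetical three-day log, for illustration only.
daily = [(120, 200), (130, 240), (150, 260)]
rates = running_hit_rate(daily)
print([f"{100.0 * r:.1f}%" for r in rates])
```

Each new day moves the curve less than the one before it, because the denominator keeps growing.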
Tier Performance
Hits · n = 15,241
Top-5 picks per day, curated by calibrated hit probability with tight CI.
95% CI [55.5, 67.5]· Wilson interval
95% CI [59.2, 64.6]
95% CI [55.5, 61.4]
95% CI [57.9, 62.5]
95% CI [55.6, 57.4]
Lift = actual rate − predicted rate. Positive lift means we under-predicted that tier; negative means we over-predicted. Calibrated picks should show near-zero lift across tiers.
7/14/30-day windows within 3pp - consistent, not streaky
Best Verified Calls
Highest-confidence hits - locked before first pitch, published win or lose.
Every Day Since Launch
Click any day · see every call we made
Nailed It
highest confidence hits
Missed Calls
high confidence, wrong - we publish everything
Accuracy Over Time
Running average drawing in real time
57.5%
Full Prediction Log
n = 15,241
Every record locked before first pitch · nothing removed
◆ Best Bets · top 5/day
61.7%
153 / 248 picks hit
What this audit does NOT replace: the EdgeAuditPanel above (lift over slate baseline). CLV is the market test; the slate-baseline lift is the internal test. Both are real, both are published unedited, neither is a substitute for the other.
◆ Pitcher track record
29-day window · 2026-05-14 to 2026-06-11
584 resolved · 18 pending
53.4%
95% CI [49-57%] · n=584
32.7%
95% CI [29-37%] · n=584
-0.19
actual - proj
-0.15
actual - proj
| Date | Starts | K Over | QS Rate | K Bias |
|---|---|---|---|---|
| 2026-06-11 | 14 | 43% (6/14) | 14% (2/14) | -1.06 |
| 2026-06-10 | 28 | 64% (18/28) | 43% (12/28) | +0.49 |
| 2026-06-09 | 30 | 60% (18/30) | 20% (6/30) | -0.23 |
| 2026-06-08 | 15 | 53% (8/15) | 27% (4/15) | -0.01 |
| 2026-06-07 | 29 | 55% (16/29) | 31% (9/29) | -0.33 |
| 2026-06-06 | 25 | 36% (9/25) | 24% (6/25) | -1.24 |
| 2026-06-05 | 28 | 54% (15/28) | 21% (6/28) | -0.35 |
| 2026-06-04 | 16 | 50% (8/16) | 38% (6/16) | -0.17 |
| 2026-06-03 | 29 | 52% (15/29) | 41% (12/29) | -0.33 |
| 2026-06-02 | 29 | 69% (20/29) | 21% (6/29) | +0.68 |
| 2026-06-01 | 16 | 50% (8/16) | 31% (5/16) | -0.51 |
| 2026-05-31 | 28 | 57% (16/28) | 32% (9/28) | +0.13 |
| 2026-05-30 | 30 | 47% (14/30) | 27% (8/30) | -0.31 |
| 2026-05-29 | 28 | 32% (9/28) | 21% (6/28) | -1.14 |
Per-stat Brier (lower = better)
Phase 2c calibration (per-stat Platt scaling) lights up after 30+ days of resolved outcomes