MLB Betting Analytics: A Machine Learning Approach
The math, models, and market mechanics behind data-driven MLB betting — from how sportsbooks set lines to how XGBoost and Poisson distributions find edges in player props.
Updated March 2026 · 18 min read
1. How MLB Betting Markets Work
Before diving into models and math, you need to understand the markets you are betting into. MLB offers more daily betting volume than any other North American sport — 15 games a day, 162-game season, and dozens of player props per game. Here is how each market type works.
Moneyline
The moneyline is the simplest bet in baseball: pick which team wins. Odds are expressed in American format. A favorite is listed with a minus sign (e.g., NYY -150), meaning you risk $150 to win $100. An underdog is listed with a plus sign (e.g., BOS +130), meaning you risk $100 to win $130.
MLB moneylines are unique compared to other sports because the spread of odds is enormous. In the NFL, most games land between -110 and -200. In baseball, you regularly see -250 or -300 favorites when an ace like Gerrit Cole faces a bottom-tier rotation arm. On the extreme end, matchups can push past -400. These heavily juiced lines create both traps and opportunities — more on that in the edge detection section.
Run Line (-1.5 Spread)
The run line is baseball's version of the point spread, but it is almost always fixed at 1.5 runs rather than adjusted like NFL or NBA spreads. The favorite is listed at -1.5 (must win by 2+ runs), and the underdog at +1.5 (can lose by 1 run and still cover). Because the margin is fixed, the odds adjust around it — a heavy favorite might be -1.5 at -180, while a slight favorite might be -1.5 at +120.
Run line betting is particularly interesting because roughly 30% of MLB games are decided by exactly one run. The run line and moneyline settle differently whenever the favorite wins by exactly one run (the moneyline cashes, the -1.5 does not), which is a subset of those one-run games. That gap creates genuine structural opportunities for models that can estimate margin of victory, not just win probability.
Totals (Over/Under)
The totals market sets a combined run total for both teams — typically between 7.0 and 10.0 runs. You bet whether the actual total goes over or under. Totals are heavily influenced by starting pitching, park factors, weather (wind direction, temperature), and bullpen usage. A game at Coors Field with two mediocre starters might open at 11.5, while a deGrom vs. Ohtani matchup at a pitcher-friendly park might sit at 6.5.
Player Props
This is where the real analytical edge lives. Player props are bets on individual statistical outcomes within a game. The primary MLB prop markets include:
Hits: Over/under on a batter's hit total (typically O/U 0.5 or O/U 1.5)
Runs: Over/under on runs scored by a batter
RBIs: Over/under on runs batted in
Hits + Runs + RBIs (HRR): A combined stat line — the highest-volume prop market in MLB betting
Pitcher Strikeouts: Over/under on a starting pitcher's K total (e.g., Spencer Strider O/U 7.5 Ks)
Pitcher Walks: Over/under on bases on balls issued by the starter
Props are where sportsbooks embed the largest margins, but they are also where the most exploitable inefficiencies exist — because books have less data and less sharp action to calibrate these lines compared to sides and totals.
Live Betting
Live (in-play) betting allows you to place wagers after a game starts, with odds updating pitch-by-pitch. Live markets include adjusted moneylines, running totals, and player prop projections based on current game state. The speed of line movement creates opportunities for models that can process real-time data faster than the book can adjust. For example, if a pitcher records 5 strikeouts through 3 innings but the live K line has not moved from the pregame number, there is a window to bet the over before the book catches up.
How Our Live K Tracker Works
Our live strikeout tracker uses inning-by-inning XGBoost checkpoint models that update every 30 seconds during games. Each checkpoint fires as new innings complete, producing increasingly accurate projections as the game progresses. Here is how the projection evolves through a typical start:
| Inning | What Happens | Accuracy | Pitch Ceiling |
|---|---|---|---|
| Pre-game | Daily model projection (e.g., 7.2 K) | Baseline | N/A |
| Inning 1 | First live model fires using K rate, velocity, swinging strike % | ~68-75% | No |
| Inning 2 | Model refines with 2 innings of pitch data | ~74-80% | No |
| Inning 3 | Pitch count ceiling activates based on recent workload | ~80-85% | Active |
| Inning 4 | High-confidence projection, ceiling tightens | ~85-91% | Active |
| Inning 5+ | Near-final projection, pitcher nearing exit | ~91-96% | Active |
| Pulled | Projection locks at final K count | 100% | Locked |
The pitch count ceiling prevents unrealistic projections in late innings. If a pitcher has thrown 65 of his typical 80 pitches, the model caps the K projection based on estimated remaining batters rather than blindly extrapolating. This ceiling only activates at inning 3 and beyond — backtesting showed that applying it earlier hurts accuracy because the pitch pace estimate is too noisy with limited data.
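The ceiling logic described above can be sketched roughly as follows. This is an illustrative sketch, not the tracker's actual code: the function name, the `pitches_per_batter` and `k_rate` defaults, and the choice to cap at current strikeouts plus expected strikeouts among the remaining batters are all assumptions.

```python
def capped_k_projection(current_k, raw_projection, pitches_thrown,
                        pitch_limit, pitches_per_batter=3.9, k_rate=0.25):
    """Cap a strikeout projection by the batters a pitcher can still face.

    All parameter names and default values are illustrative assumptions.
    """
    remaining_pitches = max(pitch_limit - pitches_thrown, 0)
    remaining_batters = remaining_pitches / pitches_per_batter
    # Ceiling: strikeouts so far plus expected strikeouts among remaining batters.
    ceiling = current_k + remaining_batters * k_rate
    return min(raw_projection, ceiling)


# 65 of a typical 80 pitches thrown, 5 K so far, raw model output of 8.0 K.
print(round(capped_k_projection(5, 8.0, 65, 80), 2))
```

With only 15 pitches left, the cap pulls the projection well below the raw extrapolation, which is the behavior the text describes for late innings.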
For live bettors, the optimal window is between innings 1 and 3 when the model has signal but the live lines have not fully adjusted. By inning 4-5, the projection is highly accurate but the betting lines have typically caught up.
How to Use the Live K Probability Grid
The live tracker shows a probability grid for each pitcher at multiple lines (3.5, 4.5, 5.5, 6.5, 7.5 K). This grid is the core tool for identifying live betting opportunities. Here is how to read it with a real-world example:
Suppose Bryan Woo opens at O/U 5.5 strikeouts pre-game. After 4 innings, he has 5 K on 62 pitches with a 58% strike rate. The model projects him to finish with 7.7 K. His probability grid now reads:
| Line | Model Prob | Signal | Action |
|---|---|---|---|
| O/U 5.5 | 89% Over | One strikeout short of clearing (5 K through 4 inn) | No value: the live line has moved past this |
| O/U 6.5 | 68% Over | Strong signal | This is your live play if the book offers 6.5 |
| O/U 7.5 | 52% Over | Coin flip | Pass — no edge |
| O/U 8.5 | 31% Over | Lean under | Pass unless book offers strong plus odds |
The process is simple: check what line your sportsbook is currently offering for that pitcher, then find that number on the probability grid. If the model shows 65% or higher in either direction at the live line, that is a play. If it is between 50% and 60%, it is a pass because the edge is within the model's margin of error.
A critical distinction: the model was backtested against pre-game static lines, not live moving lines. When we report 85% accuracy after inning 4, that means 85% accuracy versus where the line opened, not where it currently sits. By inning 4, the live line has already moved to reflect the game state. The real edge lives in the gap between the model's projection and the live line — and that gap is largest in innings 1 through 3, before the book fully adjusts.
As a rule of thumb, look for the model projection to be at least one full strikeout above or below the live line. The model's RMSE (root-mean-square error, a measure of the typical size of a prediction miss) is approximately 1.2 K after inning 4. If the model says 7.7 and the live line is 6.5, the gap is 1.2 K, right at the RMSE boundary, which historically hits at 68%. If the model says 7.7 and the live line is 7.5, the 0.2 K gap is noise, not signal.
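The one-strikeout rule of thumb can be written as a small check. This is a minimal sketch; the function name and the `min_gap` parameter are hypothetical and only encode the rule as stated in the text.

```python
def live_k_signal(projection, live_line, min_gap=1.0):
    """Return 'over', 'under', or 'pass' from the projection-to-line gap."""
    gap = projection - live_line
    if gap >= min_gap:
        return "over"
    if gap <= -min_gap:
        return "under"
    return "pass"


print(live_k_signal(7.7, 6.5))  # over
print(live_k_signal(7.7, 7.5))  # pass
```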
The Middle Play: Pre-Game + Live Hedge
One of the most powerful strategies enabled by the live tracker is the middle play — taking opposite sides of the same prop at different lines to create a window where both bets win.
Example: A pitcher is projected for 7.2 K pre-game with a line of O/U 7.5. You take UNDER 7.5 before the game. Through 5 innings, he has only 3 K on 70 pitches — clearly struggling. The model now projects 4.8 K and the live line has dropped to O/U 4.5. You take OVER 4.5 live.
Now you have UNDER 7.5 (pre-game) and OVER 4.5 (live). If the pitcher finishes with 5, 6, or 7 strikeouts, both bets win. You created a 3-strikeout window (5, 6, 7) where you profit on both sides. Outside that window one leg still wins while the other loses: 4 or fewer K loses the over, and 8+ K loses the under. When both legs are priced near even, the downside in those cases is roughly the vig rather than a full stake.
The live tracker makes this strategy practical by showing you exactly when the projection has diverged enough from the pre-game line to create a viable middle. Without real-time projections, you would be guessing when to place the second leg.
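To see the middle window explicitly, the sketch below classifies every possible final strikeout count for an UNDER/OVER pair. The function name and the `max_k` bound are illustrative assumptions.

```python
def middle_outcomes(under_line, over_line, max_k=12):
    """Classify each final strikeout count for an UNDER/OVER middle."""
    results = {}
    for k in range(max_k + 1):
        under_wins = k < under_line
        over_wins = k > over_line
        if under_wins and over_wins:
            results[k] = "both win"
        elif under_wins or over_wins:
            results[k] = "split"
        else:
            results[k] = "both lose"
    return results


outcomes = middle_outcomes(under_line=7.5, over_line=4.5)
window = [k for k, result in outcomes.items() if result == "both win"]
print(window)  # [5, 6, 7]
```

Every count outside the window comes back as a split, which is why the width of the window and the price paid on each leg determine whether the middle is worth placing.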
2. How Sportsbooks Set MLB Lines
Understanding how lines are created is the first step to understanding where they can be wrong. Sportsbooks are not trying to predict outcomes perfectly — they are trying to balance their book and guarantee a profit through the vig.
Implied Probability from Odds
Every set of odds implies a probability. Converting American odds to implied probability is straightforward:
For favorites (negative odds): Implied % = |odds| / (|odds| + 100)
For underdogs (positive odds): Implied % = 100 / (odds + 100)
Example: NYY -150 → 150 / (150 + 100) = 60.0%
Example: BOS +130 → 100 / (130 + 100) = 43.5%
Combined: 60.0% + 43.5% = 103.5% (the 3.5% overround is the vig)
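The two conversion formulas and the overround check translate directly into code. A minimal sketch; the function names are illustrative.

```python
def implied_probability(american_odds):
    """Convert American odds to an implied probability between 0 and 1."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def overround(odds_a, odds_b):
    """Sum of both sides' implied probabilities minus 1 (the vig)."""
    return implied_probability(odds_a) + implied_probability(odds_b) - 1


print(f"{implied_probability(-150):.1%}")  # 60.0%
print(f"{implied_probability(130):.1%}")   # 43.5%
print(f"{overround(-150, 130):.1%}")       # 3.5%
```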
The Vig (Juice)
The vig is the sportsbook's margin, the mathematical guarantee that they profit regardless of the outcome. On a standard -110/-110 line, both sides imply 52.4%, totaling 104.8%. That 4.8% overround is the house edge. On MLB moneylines the overround depends on how far apart the two prices sit: a -200/+170 line sums to about 103.7% (66.7% + 37.0%), while a -300/+240 line sums to about 104.4% (75.0% + 29.4%).
Player props carry even more juice. A typical hits prop might be listed as O 0.5 (-180) / U 0.5 (+140), which implies 64.3% + 41.7% = 106.0% combined. Some books push the combined implied probability to 108-112% on less liquid markets. This is critical context: the higher the vig, the larger your edge needs to be just to break even.
Line Movement: Opening vs. Closing
Lines open 12-24 hours before first pitch and move based on betting action. Sharp money (from professional bettors and syndicates) typically hits early, moving the opening line. Public money (recreational bettors) comes in closer to game time and tends to favor favorites, overs, and star players.
Closing lines are considered the most efficient odds in the market — they reflect the maximum amount of information. Studies consistently show that bettors who can beat the closing line are profitable long-term, regardless of their win rate on individual bets. This is called Closing Line Value (CLV), and it is the gold standard metric for evaluating betting performance.
Why Some Lines Are Juiced to -250 or -600
When the public heavily favors one side, sportsbooks do not just move the line to balance action — they also widen the vig. A -250 moneyline implies 71.4% win probability, but the true probability might be closer to 66-68%. The book knows recreational bettors will pay the premium to bet the “sure thing.” This is why blindly betting heavy favorites is a losing strategy: you are paying an inflated price for an outcome that is not as certain as the odds suggest.
3. The Math Behind Player Props
Player props are count-based statistics — a batter gets 0, 1, 2, or 3 hits, not 1.7 hits. This discrete nature makes them a natural fit for the Poisson distribution, one of the most important probability distributions in sports analytics.
Why Hits, Runs, and Walks Follow Poisson
The Poisson distribution models the number of times an event occurs in a fixed interval when events happen independently at a known average rate. A batter going to the plate 3-5 times per game and recording hits on some fraction of those at-bats fits this model closely. The distribution is defined by a single parameter: lambda (the expected average).
P(X = k) = (lambda^k * e^(-lambda)) / k!
where k = number of occurrences, lambda = expected rate
For a batter like Aaron Judge, who averages 1.2 hits per game over the last 30 days, lambda = 1.2. The Poisson distribution then gives exact probabilities for every outcome:
P(0 hits) = 30.1%
P(1 hit) = 36.1%
P(2 hits) = 21.7%
P(3+ hits) = 12.1%
P(Over 0.5 hits) = 1 - P(0 hits) = 69.9%
P(Over 1.5 hits) = 1 - P(0 hits) - P(1 hit) = 33.7%
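The probabilities above come straight from the Poisson formula and can be recomputed with the standard library. A minimal sketch; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 1.2  # the hits-per-game rate used in the example
p0, p1, p2 = (poisson_pmf(k, lam) for k in range(3))
p3_plus = 1.0 - (p0 + p1 + p2)

print(f"P(0 hits)  = {p0:.1%}")
print(f"P(1 hit)   = {p1:.1%}")
print(f"P(2 hits)  = {p2:.1%}")
print(f"P(3+ hits) = {p3_plus:.1%}")
print(f"P(Over 0.5) = {1.0 - p0:.1%}")
print(f"P(Over 1.5) = {1.0 - p0 - p1:.1%}")
```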
Why O/U 0.5 Hits Is Always Juiced
Here is a truth that every MLB prop bettor needs to understand: the O/U 0.5 hits line is almost always a bad bet on the over. The average MLB hitter gets a hit in roughly 65-72% of games (depending on their batting average and plate appearances). Sportsbooks know this and price the over at -180 to -220, implying 64-69% probability. After accounting for the vig, the over barely breaks even, even when it hits at the expected rate.
The real action is at O/U 1.5 hits. Here the probability varies much more from hitter to hitter, and for stronger hitters it approaches 50/50, so the sportsbook cannot lean on a lopsided base rate and meaningful edges can emerge. A hitter with a lambda of 1.2 has roughly a 34% chance of going over 1.5, but a hitter with a lambda of 1.6 has roughly a 47% chance. That 13-point spread is where model precision creates profit.
CDF Approach vs. Binary Classification
There are two ways to model player props. The classifier approach trains a binary model: given these features, did this player go over or under the line? This works, but it throws away valuable information — specifically, the magnitude of the prediction.
The regressor + CDF approach is more powerful. Instead of predicting over or under directly, the model predicts the actual stat value (e.g., “Aaron Judge will get 1.35 hits today”). That prediction becomes the lambda parameter for the Poisson distribution, and the cumulative distribution function (CDF) calculates the exact probability of exceeding any line:
P(Over line) = 1 - Poisson_CDF(floor(line), lambda=prediction)
Example: prediction = 1.35, line = 1.5
P(Over 1.5) = 1 - P(0) - P(1) = 1 - 0.259 - 0.350 = 39.1%
This approach is strictly superior because a single regression model can evaluate any line — O/U 0.5, 1.5, 2.5, or even alternate lines like 3.5. You do not need separate models for each threshold.
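The regressor + CDF idea can be sketched in a few lines: one predicted value, many lines. The helper names are illustrative, and the prediction of 1.35 is the example value from the text rather than real model output.

```python
from math import exp, factorial, floor


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    return sum(lam ** i * exp(-lam) / factorial(i) for i in range(k + 1))


def prob_over(line, lam):
    """P(X > line) for a half-point line such as 0.5 or 1.5.

    For a whole-number line this treats an exact landing (a push) as not over.
    """
    return 1.0 - poisson_cdf(floor(line), lam)


prediction = 1.35  # example regression output, used as lambda
for line in (0.5, 1.5, 2.5, 3.5):
    print(f"Over {line}: {prob_over(line, prediction):.1%}")
```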
4. Feature Engineering for Baseball Models
In machine learning, the quality of your features determines the ceiling of your model. You can use the best algorithm in the world, but if you feed it bad inputs, you get bad outputs. Baseball is particularly rich for feature engineering because the game generates granular, structured data on every pitch, at-bat, and game.
Rolling Averages (Recency Windows)
Season-long averages are stable but slow to react. A batter who hit .230 in April but has hit .340 over the last two weeks is a fundamentally different hitter today. Rolling windows — typically the last 10, 15, or 30 games — capture current form. The tradeoff is variance: shorter windows are more responsive but noisier.
Effective models use multiple window lengths simultaneously. A 10-game rolling batting average captures hot/cold streaks. A 30-game average provides a more stable baseline. The model learns which timeframe is more predictive in different contexts — for example, short windows might matter more for power stats (home runs are streaky) while longer windows matter for contact stats (hit rate stabilizes faster).
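A multi-window rolling feature can be sketched with the standard library alone. The helper name and the game log below are made up for illustration; the one deliberate design choice is that each value uses only games before the current one, so the feature never leaks the outcome it is meant to predict.

```python
from collections import deque


def rolling_means(values, window):
    """Trailing mean over the previous `window` games, excluding today."""
    recent = deque(maxlen=window)
    out = []
    for value in values:
        out.append(sum(recent) / len(recent) if recent else None)
        recent.append(value)
    return out


hits_per_game = [1, 0, 2, 1, 3, 0, 1, 2]  # made-up game log
short_form = rolling_means(hits_per_game, window=3)
long_form = rolling_means(hits_per_game, window=5)
for game, (s, l) in enumerate(zip(short_form, long_form), start=1):
    print(game, s, l)
```

Both columns would be passed to the model as separate features, leaving it to learn which window matters in which context.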
Batter vs. Pitcher Matchups (BvP)
Historical batter-versus-pitcher data is one of the most intuitive features in baseball analytics. If Juan Soto is 8-for-15 lifetime against a specific starter, that history matters. However, BvP data is notoriously noisy because sample sizes are small — most matchups have fewer than 20 at-bats, which is not enough to draw reliable conclusions.
The solution is to use BvP data as one feature among many rather than a standalone signal. The model can weight it appropriately: high-sample BvP data (30+ ABs) gets meaningful signal, while low-sample data is effectively smoothed toward league averages by the ensemble of other features.
Platoon Splits (Left/Right Matchups)
Left-handed batters hit worse against left-handed pitchers. Right-handed batters hit worse against right-handed pitchers. This is one of the most persistent effects in baseball, and it is large enough to move prop lines by 10-15%. Encoding the platoon matchup (same-side vs. opposite-side) as a binary feature, plus interaction terms with the batter's split-specific stats, gives the model critical context.
For example, Shohei Ohtani's overall batting average might be .290, but his splits could be .315 vs. RHP and .245 vs. LHP. A model without platoon features would use .290 for every matchup — systematically overvaluing Ohtani against lefties and undervaluing him against righties.
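One way to encode the platoon matchup is a binary flag plus the split-specific stat and their product. A minimal sketch; the function and feature names are hypothetical, switch hitters are ignored for brevity, and the split figures are the illustrative numbers from the paragraph above.

```python
def platoon_features(batter_hand, pitcher_hand, avg_vs_rhp, avg_vs_lhp):
    """Build platoon features; hands are 'L' or 'R' (switch hitters omitted)."""
    same_side = int(batter_hand == pitcher_hand)
    split_avg = avg_vs_rhp if pitcher_hand == "R" else avg_vs_lhp
    return {
        "same_side": same_side,
        "split_avg": split_avg,
        # Interaction term: lets a model treat the split differently by matchup.
        "same_side_x_split_avg": same_side * split_avg,
    }


print(platoon_features("L", "L", avg_vs_rhp=0.315, avg_vs_lhp=0.245))
print(platoon_features("L", "R", avg_vs_rhp=0.315, avg_vs_lhp=0.245))
```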
Park Factors
Not all ballparks are created equal. Coors Field in Denver inflates offense by 15-25% due to altitude and thin air. Oracle Park in San Francisco suppresses home runs due to marine air and deep outfield dimensions. Park factors adjust the baseline expectation for every stat: hits, runs, home runs, and even strikeouts (some parks have larger strike zones due to umpire tendencies at specific venues).
A well-engineered park factor feature is not just a single number — it encodes how the park affects specific stat types. Yankee Stadium has a short right-field porch that inflates left-handed home runs specifically, while having a neutral effect on overall hits. Granular park factors that distinguish between stat types significantly improve player prop accuracy.
Opposing Team Quality
A batter facing a rotation arm with a 2.80 ERA and a bullpen ranked 3rd in the league is in a fundamentally different situation than one facing a 5.20 ERA starter backed by the worst bullpen in baseball. Features that encode opposing pitching quality — starter ERA, WHIP, K/9, BB/9, plus bullpen ERA and recent workload — give the model essential matchup context.
Why 42+ Features Beat Simple Averages
A naive model might use three features: batting average, opposing pitcher ERA, and home/away. An engineered model uses 42 or more features capturing rolling form, matchup dynamics, environmental factors, team context, and rest/travel effects. The performance gap is substantial.
The key insight is that baseball outcomes are driven by interaction effects. A left-handed power hitter facing a right-handed pitcher in Coors Field with the wind blowing out is a completely different situation than the same batter facing a lefty at Oracle Park. Simple averages cannot capture these interactions. Gradient-boosted trees can — and that is why they dominate.
5. XGBoost Regression + Poisson CDF
The core modeling pipeline for MLB player props combines XGBoost regression (to predict expected stat values) with Poisson CDF (to convert predictions into over/under probabilities). Here is why this architecture works and how the pieces fit together.
Why Gradient Boosting Outperforms Linear Models
Linear regression assumes a straight-line relationship between each feature and the outcome. In baseball, almost nothing is linear. The relationship between opposing pitcher K/9 and a batter's hit probability is not a straight line — it might be flat until K/9 exceeds 9.0, then drop sharply. The relationship between temperature and home run rate follows a curve, not a line. Platoon effects create discontinuities that linear models cannot represent.
XGBoost builds an ensemble of decision trees, where each tree corrects the errors of the previous ones. This architecture naturally captures nonlinear relationships, interaction effects (feature A matters only when feature B has a certain value), and threshold effects (performance changes at specific breakpoints). On tabular baseball data with 42+ features, XGBoost consistently outperforms linear regression, logistic regression, and even neural networks.
Regressor Approach vs. Classifier Approach
A classifier trained to predict “over 1.5 hits: yes or no” produces a probability for that specific line. If the sportsbook moves the line to 2.5, you need a completely different model. This approach is brittle and wasteful.
A regressor trained to predict “how many hits will this player get?” produces a continuous value — say, 1.42 hits. This single prediction can then be evaluated against any line using the Poisson CDF. One model handles every possible line, every alternate line, and every in-game adjustment. The regressor approach is more flexible, more data-efficient, and produces better-calibrated probabilities.
Using the Projection as Poisson Lambda
The XGBoost regression model outputs a prediction — for example, 6.8 strikeouts for a starting pitcher. This value becomes the lambda parameter in the Poisson distribution. From there, calculating the probability of any outcome is a direct computation:
Model prediction: lambda = 6.8 strikeouts
Sportsbook line: O/U 6.5 Ks
P(Over 6.5): 1 - Poisson_CDF(6, lambda=6.8) = 52.0%
Sportsbook implied: -120 odds = 54.5%
Edge: 52.0% - 54.5% = -2.5% (negative edge, a pass)
Now compare a different matchup where the model predicts 7.9 Ks against the same 6.5 line:
Model prediction: lambda = 7.9 strikeouts
Sportsbook line: O/U 6.5 Ks
P(Over 6.5): 1 - Poisson_CDF(6, lambda=7.9) = 67.4%
Sportsbook implied: -130 odds = 56.5%
Edge: 67.4% - 56.5% = +10.9% (strong play)
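The two comparisons above can be reproduced end to end: Poisson probability of the over, implied probability from the odds, and the difference between them. A minimal sketch using only the standard library; the function names are illustrative.

```python
from math import exp, factorial


def prob_over(line, lam):
    """P(X > line) under Poisson(lam) for a half-point line."""
    k_max = int(line)  # 6.5 -> 6
    cdf = sum(lam ** k * exp(-lam) / factorial(k) for k in range(k_max + 1))
    return 1.0 - cdf


def implied_probability(american_odds):
    """Convert American odds to an implied probability between 0 and 1."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def edge(lam, line, american_odds):
    """Model probability of the over minus the book's implied probability."""
    return prob_over(line, lam) - implied_probability(american_odds)


for lam, odds in ((6.8, -120), (7.9, -130)):
    print(f"lambda={lam}: P(over 6.5)={prob_over(6.5, lam):.1%}, "
          f"edge={edge(lam, 6.5, odds):+.1%}")
```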
Standard Deviation Floors: Preventing Overconfidence
One risk with the Poisson CDF approach is overconfident predictions. If a model predicts exactly 2.0 hits for a batter and you use lambda = 2.0, the Poisson distribution says there is an 86% chance of going over 0.5 hits. But the model's prediction has uncertainty that the Poisson distribution does not capture.
The solution is a standard deviation floor. Instead of using the raw Poisson CDF, the system applies a minimum variance adjustment that prevents any single prediction from producing an artificially narrow probability distribution. This is equivalent to saying “even when the model is very confident, maintain a minimum uncertainty band.” In practice, this means predictions near round numbers (like exactly 1.0 or 2.0) do not produce extreme probabilities that real-world variance would not support.
The floor is calibrated from historical prediction error — if the model's mean absolute error on hits predictions is 0.8, the effective variance should never drop below a threshold derived from that error rate. This keeps probabilities honest and prevents the model from outputting 90%+ confidence on propositions that actually hit at 75%.
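The text does not spell out how the floor is implemented, so the following is only one possible reading: replace the Poisson CDF with a normal approximation whose standard deviation is never allowed to fall below a floor. The function name, the normal approximation, and the `sd_floor` value are all assumptions made for illustration, and a normal curve is a crude stand-in at low counts.

```python
from math import erf, sqrt


def prob_over_with_floor(line, prediction, sd_floor):
    """Normal approximation to P(stat > line) with a minimum spread.

    Assumes prediction > 0. The Poisson standard deviation is sqrt(lambda);
    the floor widens the distribution whenever that value is smaller.
    """
    sd = max(sqrt(prediction), sd_floor)
    z = (line - prediction) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


raw = prob_over_with_floor(0.5, 2.0, sd_floor=0.0)
floored = prob_over_with_floor(0.5, 2.0, sd_floor=2.0)
print(f"no floor: {raw:.1%}, with floor: {floored:.1%}")
```

Under this reading, a wider floor pulls extreme probabilities back toward 50%, which matches the stated goal of keeping the model from reporting more confidence than its historical error supports.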
6. Edge Detection: Model vs. Vegas
Having an accurate model is necessary but not sufficient for profitable betting. The model needs to disagree with the sportsbook — and be right when it disagrees. This is the concept of edge.
What Edge Means
Edge is the difference between your model's estimated probability and the sportsbook's implied probability. If your model says the over has a 62% chance of hitting and the sportsbook odds imply 52%, your edge is +10 percentage points. You are getting a bet with +EV (positive expected value) because you are paying a 52% price for a 62% outcome.
Crucially, edge is not the same as confidence. A model might be 85% confident that Aaron Judge gets a hit today — but if the sportsbook is also pricing it at 85% implied (via -550 odds), there is zero edge. You would need to risk $550 to win $100 on a proposition that the model agrees is priced correctly. Conversely, a model might assign only 55% probability to a prop — but if the book is pricing it at 45% implied, that 10-point gap is a strong edge despite modest confidence.
Why 55% Model Confidence at -110 Odds Is Profitable
Let us walk through the math with a concrete example:
Scenario: Model says 55% chance Over 1.5 hits, book offers -110
Implied probability at -110: 52.4%
Edge: 55.0% - 52.4% = +2.6%
EV per $100 bet:
EV = (0.55 x $90.91) - (0.45 x $100)
EV = $50.00 - $45.00 = +$5.00 per bet
Over 500 bets at this edge: ~$2,500 expected profit on $100 flat bets
This is the fundamental principle: you do not need to be right on every bet. You need to be right more often than the odds imply, consistently, over a large sample. A 2.6% edge on -110 odds translates to approximately 5% ROI, which adds up to a substantial sum over a full MLB season of daily bets.
Good Prediction vs. Good Bet
This distinction trips up most bettors. A “good prediction” is one that correctly identifies the likely outcome. A “good bet” is one where the price you pay is less than the true probability warrants.
Consider two scenarios. In Scenario A, your model says the Dodgers have a 74% chance of beating the Rockies. The book offers -300, implying 75%. Your prediction may be accurate, but the bet is bad: you are paying a 75% price for a 74% outcome. In Scenario B, your model says the Marlins have a 38% chance of beating the Braves. The book offers +200, implying 33.3%. Your prediction says the Marlins probably lose, but the bet is good: you are getting a 38% outcome at a 33.3% price. Over time, making Scenario B bets consistently will make you money.
Expected Value Calculation
The expected value formula crystallizes everything into a single number:
EV = (P_win x Profit_if_win) - (P_lose x Stake)
Example: Mookie Betts Over 1.5 hits at +150
Model probability: 42%
Implied probability: 100/250 = 40%
EV = (0.42 x $150) - (0.58 x $100) = $63 - $58 = +$5.00
This bet loses 58% of the time but is still +EV because the payout compensates for the lower win rate.
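The EV formula works for any American odds once the profit on a winning bet is computed. The sketch below checks both this example and the earlier 55% at -110 scenario; the function names are illustrative.

```python
def profit_per_unit(american_odds):
    """Profit on a winning 1-unit stake at the given American odds."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def expected_value(p_win, american_odds, stake=100.0):
    """EV = P_win * profit_if_win - P_lose * stake."""
    profit = stake * profit_per_unit(american_odds)
    return p_win * profit - (1.0 - p_win) * stake


print(round(expected_value(0.55, -110), 2))  # 5.0
print(round(expected_value(0.42, 150), 2))   # 5.0
```

Two very different bets, one a modest favorite and one an underdog that loses most of the time, carry the same expected value per $100 staked.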
7. Pitcher Walks: An Overlooked Market
Walk props are one of the least-bet, least-analyzed markets in MLB — which is exactly what makes them interesting. Sportsbooks dedicate less modeling effort to low-volume markets, and the resulting lines are often softer than hits, runs, or strikeout props.
Walk Distributions
The average MLB starting pitcher issues approximately 1.68 walks per start. The distribution is textbook Poisson:
0 walks: ~18.7% of starts
1 walk: ~31.4% of starts
2 walks: ~26.4% of starts
3 walks: ~14.8% of starts
4+ walks: ~8.7% of starts
P(Over 1.5 walks) = P(2+) = 49.9% (at league-average lambda)
This near-50/50 base rate at the O/U 1.5 line is what makes the market actionable. Small deviations in a pitcher's true walk rate create meaningful edges against a line that the book prices near even.
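For comparison, a pure Poisson model with lambda = 1.68 can be tabulated in a few lines. It lands within a few tenths of a percentage point of the figures listed above. The helper name and bucket layout are illustrative.

```python
from math import exp, factorial


def poisson_table(lam, top_bucket):
    """Return P(0), P(1), ..., P(top_bucket - 1), P(top_bucket or more)."""
    probs = [lam ** k * exp(-lam) / factorial(k) for k in range(top_bucket)]
    probs.append(1.0 - sum(probs))
    return probs


walks = poisson_table(lam=1.68, top_bucket=4)
for k, p in enumerate(walks):
    label = f"{k}+" if k == len(walks) - 1 else str(k)
    print(f"{label} walks: {p:.1%}")
print(f"P(Over 1.5) = {sum(walks[2:]):.1%}")
```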
Control Metrics That Predict Walk Volume
The model uses several control-related features to predict walk volume for each start:
BB/9 (Walks per 9 innings): The primary rate stat. The league average is approximately 3.2 BB/9. A pitcher with 4.0+ BB/9 is a walk machine; under 2.0 is elite control.
K/BB Ratio: Strikeout-to-walk ratio is a proxy for command quality. A 3.0+ K/BB ratio indicates a pitcher who can challenge hitters without paying for it in free passes.
Zone% and Strike%: The percentage of pitches thrown in the strike zone and the percentage that result in strikes (including swinging strikes and foul balls). A pitcher whose zone% has dropped 5+ points over their last 3 starts is showing control deterioration that the season-long BB/9 has not yet captured.
First-Pitch Strike%: Pitchers who get ahead 0-1 in the count walk far fewer batters. A drop in first-pitch strike rate is an early warning indicator of walk risk.
How the Model Identifies High-Walk Starts
The walk model looks for convergence of factors: a pitcher with above-average BB/9, facing a patient lineup (high walk rate as a team), in a game context where the pitcher might nibble (e.g., facing a dangerous order with runners on base frequently). Recent form matters more than season averages for walks because control is highly variable start-to-start.
Consider a pitcher like Patrick Corbin, who has historically posted a 3.8-4.2 BB/9. Against a patient Phillies lineup that ranks top-5 in walks drawn, on a night where the O/U walk line is set at 1.5 with -115 odds (implying about 53.5%), the model might project 2.4 walks (lambda = 2.4). The Poisson CDF gives P(Over 1.5) = 69.2% — a massive 15+ point edge. These are the types of inefficiencies that exist in thin markets that the public ignores.
8. Why Most Bettors Lose
Sportsbooks are profitable businesses for a reason. Estimates suggest that 95-97% of sports bettors lose money over the long term. Understanding why most people lose is essential to avoiding the same traps.
The Vig Math: You Need 52.4% to Break Even
At standard -110 odds on both sides, a bettor must win 52.4% of bets to break even. Here is the derivation:
At -110: Risk $110 to win $100
Breakeven win rate: $110 / ($110 + $100) = 52.38%
At -120: Risk $120 to win $100 → Breakeven = 54.5%
At -150: Risk $150 to win $100 → Breakeven = 60.0%
At -200: Risk $200 to win $100 → Breakeven = 66.7%
Most recreational bettors win around 48-50% of their bets. That 2-4 point gap between their actual win rate and the breakeven threshold translates to slow, steady losses that compound over time. On player props where the vig is higher (-130 to -180 on the popular side), the breakeven climbs to 56-64%, making profitable blind betting essentially impossible.
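The breakeven derivation generalizes to any price. A minimal sketch with an illustrative function name; note that the breakeven win rate is the same quantity as the implied probability of the odds.

```python
def breakeven_win_rate(american_odds):
    """Win rate needed for zero expected profit at the given odds."""
    if american_odds < 0:
        risk, win = -american_odds, 100
    else:
        risk, win = 100, american_odds
    return risk / (risk + win)


for odds in (-110, -120, -150, -200):
    print(odds, f"{breakeven_win_rate(odds):.1%}")
```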
Recency Bias
Humans overweight recent events. A batter who went 4-for-4 yesterday feels like he is “hot” — but a single game is meaningless noise in a 162-game season. The actual statistical evidence for hot and cold streaks in baseball is weak; most streaks are consistent with random variation. A model trained on 10-30 game windows with proper feature engineering incorporates recency appropriately. A bettor reacting to last night's box score does not.
Results-Oriented Thinking
This is the most insidious cognitive bias in betting. A bettor places a +EV bet that loses and concludes the model is wrong. Another bettor places a -EV bet that wins and concludes they are a genius. Neither conclusion is correct. A single bet tells you almost nothing about whether the decision was good or bad. Only a large sample (200+ bets minimum) reveals whether a strategy is profitable.
Professional bettors and quant funds think in terms of expected value per bet and track performance over thousands of wagers. They celebrate good process (correctly identifying +EV spots) rather than good results (winning a single bet). This mindset shift is the single most important difference between profitable bettors and everyone else.
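The sample-size point can be made concrete with the standard error of an observed win rate. This sketch assumes independent bets with the same true win probability, which is a simplification; the function name is illustrative.

```python
from math import sqrt


def win_rate_standard_error(true_rate, n_bets):
    """Standard deviation of the observed win rate over n independent bets."""
    return sqrt(true_rate * (1.0 - true_rate) / n_bets)


for n in (20, 200, 2000):
    se = win_rate_standard_error(0.55, n)
    print(f"{n:>5} bets: 55% +/- {se:.1%}")
```

At 200 bets the one-standard-error band is still about 3.5 points wide, larger than the 2.6-point gap between a 55% win rate and the 52.4% breakeven at -110, which is why short runs of results say so little about process.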
The Favorite-Longshot Bias
Decades of research show that bettors systematically overbet longshots and underbet favorites. In MLB terms, the public loves parlaying three underdogs at +150 each for a massive payout, but hates laying -200 on a heavy favorite for a modest return. The result is that longshot odds are consistently worse than fair value (the book does not need to price them accurately because demand is artificially high), while moderate favorites sometimes offer genuine value because the public avoids them.
This bias is especially pronounced in player props. The “over” on a home run prop (+280) sounds exciting. The “under” at -400 sounds boring. But the under might be the +EV side — a fact the bettor who only chases big payoffs will never discover.
Volume and Discipline Beat Picking Winners
The uncomfortable truth about profitable betting is that it is boring. It requires betting small, consistent stakes on every +EV opportunity the model identifies — not swinging for the fences on “lock of the year” plays. A bettor who places 300 disciplined, flat-unit bets per month with a 2% average edge will almost certainly profit over a season. A bettor who places 5 max-bet plays per week based on gut feeling will almost certainly go broke.
The MLB season is uniquely suited to volume-based strategies. With 15 games per day and 20+ props per game, the model generates hundreds of evaluated opportunities daily. The sample size required for edge to materialize — typically 300-500 bets — can be reached within a single month. No other major sport offers this kind of daily volume during its season.