
Sprint Spillover Analysis: Why 70% of Sprints Miss and How to Fix It

Analyze sprint spillover patterns to predict and prevent missed commitments. Use PR data to build an early warning system for sprint risk.

12 min read · Updated February 1, 2026 · By CodePulse Team

Your team committed to 40 story points. You delivered 28. Again. Sprint after sprint, the pattern repeats: optimistic planning, frantic mid-sprint scrambles, and demoralized retrospectives asking "why do we keep missing?" The answer is not "estimate better." The answer is that story points are the wrong tool. This guide shows how to use PR data to predict spillover before it happens and break the cycle of missed commitments.

"Story points measure effort guesses. PRs measure actual completions. One of these is useful for forecasting. Hint: it is not the one that requires a meeting."

Sprint spillover is not a team discipline problem. It is a visibility problem. When you cannot see that a sprint is going sideways until day 8 of 10, intervention comes too late. But PR data gives you leading indicators by day 3. This guide introduces the Spillover Early Warning System: a framework for predicting and preventing missed sprint commitments using metrics that do not require anyone to guess.

🔥 Our Take

Story points are a relic of a pre-data era. They persist because teams are comfortable with them, not because they work.

Every study on estimation accuracy shows the same thing: teams overestimate what they can complete. Planning Poker does not fix this because it democratizes the optimism bias instead of eliminating it. Throughput-based forecasting using your PR history is more accurate, requires less ceremony, and cannot be gamed. The only reason to keep story points is organizational inertia. If you want to actually predict when work will ship, use data that measures completions, not intentions.

The Spillover Problem: Why 70% of Sprints Miss

Research from multiple agile studies shows a consistent pattern: approximately 70% of sprints do not complete all committed work. This is not a few teams doing poorly. This is the default outcome of sprint-based planning.

The Statistics Are Damning

Metric | Industry Average | Source
Sprints with spillover | 68-72% | State of Agile Reports 2021-2024
Average commitment completion rate | 72-78% | Scrum.org benchmarks
Teams that regularly hit 90%+ completion | Less than 20% | VersionOne surveys
Stories re-estimated mid-sprint | 35-40% | Rally/Broadcom data
Velocity prediction accuracy (story points) | +/- 25-40% | Multiple studies

Compare this to throughput-based forecasting (counting completed items rather than estimated points): teams using historical throughput for forecasting typically achieve +/- 10-15% accuracy. Same teams, same work, different measurement system.

"We do not have an estimation problem. We have a measurement problem. Story points measure intentions. PRs merged measure reality. One of these correlates with delivered value."

Why Traditional Sprint Planning Fails

Failure Mode | What Happens | How PR Data Fixes It
Planning Fallacy | Teams imagine best-case scenarios | Historical throughput includes actual interruptions
Anchoring Bias | First estimate sets the range | PR data is objective, no discussion needed
Social Pressure | Nobody wants to be the pessimist | Data does not feel social pressure
Point Inflation | Teams inflate points to hit velocity targets | PR count cannot be inflated without splitting work
Invisible Work | Bugs, support, and meetings not in estimates | Historical throughput bakes in all overhead
Scope Creep | Work expands after commitment | Cycle time trends reveal mid-sprint changes
Identify bottlenecks slowing your team with CodePulse

Root Causes of Sprint Spillover (Data-Driven)

Spillover has patterns. When you analyze PR data from teams with chronic spillover, you see the same root causes repeatedly. Fixing spillover means identifying which pattern is dominant for your team.

The Spillover Causes Framework

Cause | PR Data Signal | Frequency | Fix
Oversized PRs | Average PR size >400 lines | 35% of cases | Enforce smaller PRs, break work down
Review Bottlenecks | Wait-for-review > coding time | 25% of cases | Review SLAs, load balancing
High WIP | >2 active PRs per developer | 20% of cases | WIP limits, finish before starting
Cycle Time Variance | Standard deviation > mean | 10% of cases | Reduce blockers, improve process
Late-Sprint Starts | >40% of PRs opened in final third | 10% of cases | Earlier starts, better task breakdown

How Each Cause Creates Spillover

Oversized PRs (35% of spillover)

Large PRs are the single biggest predictor of spillover. When a "2-day" story turns into a 700-line PR, it sits in review for 3 days instead of 4 hours. The math does not work.

PR Size Impact on Cycle Time:

PR Size (lines)    Avg Cycle Time    Review Wait    Spillover Risk
----------------------------------------------------------------
< 100              6-8 hours         1-2 hours      Low (5%)
100-300            1-2 days          4-8 hours      Moderate (15%)
300-500            2-4 days          1-2 days       High (35%)
500-1000           4-7 days          2-4 days       Very High (60%)
> 1000             7-14+ days        4-7+ days      Almost Certain (85%)

Every 100 lines above 200 adds ~0.5 days to cycle time.
An 800-line PR takes 4x longer than two 400-line PRs combined.
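The size bands above can be expressed as a small lookup that flags risky PRs the moment they are opened. A minimal sketch in Python; the function name and thresholds simply mirror the table and are not part of any CodePulse API:

```python
# Hypothetical helper: map a PR's line count to the spillover-risk
# bands from the table above. Thresholds mirror the table exactly.

def pr_spillover_risk(lines_changed):
    """Return (risk_label, approximate_spillover_probability)."""
    bands = [
        (100, ("Low", 0.05)),
        (300, ("Moderate", 0.15)),
        (500, ("High", 0.35)),
        (1000, ("Very High", 0.60)),
    ]
    for threshold, result in bands:
        if lines_changed < threshold:
            return result
    return ("Almost Certain", 0.85)

print(pr_spillover_risk(180))  # -> ('Moderate', 0.15)
print(pr_spillover_risk(700))  # -> ('Very High', 0.6)
```

Wiring a check like this into CI or a pre-merge bot turns the table into an automatic nudge to split oversized work.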

Review Bottlenecks (25% of spillover)

Work is done, but it cannot merge. PRs pile up waiting for the same 1-2 reviewers. By the time reviews happen, the sprint is over.

High WIP (20% of spillover)

When every developer has 3+ PRs open, none of them are getting finished. Context switching kills throughput. See our High Activity, Low Progress Guide for the detailed diagnosis.

Cycle Time Variance (10% of spillover)

Even if your average cycle time is fine, high variance means unpredictable delivery. Some PRs ship in hours, others take weeks. You cannot plan with that variance.
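The "standard deviation > mean" signal from the causes table (a coefficient of variation above 1) is easy to check. A sketch, with illustrative cycle-time data:

```python
import statistics

# Flag unpredictable delivery when the standard deviation of cycle
# times exceeds the mean, per the "Cycle Time Variance" cause above.

def high_variance(cycle_times_hours):
    mean = statistics.mean(cycle_times_hours)
    stdev = statistics.pstdev(cycle_times_hours)
    return stdev > mean

steady = [20, 24, 18, 22, 26]    # similar-sized PRs ship predictably
erratic = [4, 6, 160, 8, 120]    # hours vs. weeks: cannot plan on this
print(high_variance(steady))     # False
print(high_variance(erratic))    # True
```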

Late-Sprint Starts (10% of spillover)

Work that starts on day 7 of a 10-day sprint cannot complete in time. If more than 40% of your PRs open in the final third of the sprint, you are setting up for spillover before any code is written.
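The 40% threshold can be computed directly from PR open dates. A minimal sketch; the dates are illustrative:

```python
from datetime import date

# Share of PRs opened in the final third of the sprint. Above ~40%,
# the guide treats the sprint as pre-committed to spillover.

def late_start_ratio(pr_open_dates, sprint_start, sprint_end):
    cutoff = sprint_start + (sprint_end - sprint_start) * 2 // 3
    late = sum(1 for d in pr_open_dates if d >= cutoff)
    return late / len(pr_open_dates)

start, end = date(2026, 2, 2), date(2026, 2, 13)  # illustrative sprint window
opened = [date(2026, 2, 3), date(2026, 2, 4), date(2026, 2, 10),
          date(2026, 2, 11), date(2026, 2, 12)]
ratio = late_start_ratio(opened, start, end)
print(f"{ratio:.0%} opened in final third")  # 60% -- well past the warning line
```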

📊 How to See This in CodePulse

Identify your dominant spillover cause:

Using PR Data to Predict Spillover Risk

The advantage of PR data is that it gives you early warning. By day 3 of a sprint, you can predict with high accuracy whether you will complete your commitments.

The Three Key Predictors

Spillover Risk Score

Risk = (PR Size Factor x 0.4) + (Cycle Time Factor x 0.35) + (WIP Factor x 0.25)

Calculate spillover risk using weighted PR metrics. Each factor scales from 0-100. Total risk above 60 indicates high spillover probability.

Examples:

Healthy Sprint: Avg PR size 180 lines (Factor: 20), cycle time 18 hours (Factor: 25), WIP/dev 1.5 (Factor: 15) = Risk 20 (Low)

At-Risk Sprint: Avg PR size 450 lines (Factor: 65), cycle time 72 hours (Factor: 70), WIP/dev 3.2 (Factor: 80) = Risk 70 (High)

Interpretation:
0-30: Low risk - on track for completion
31-50: Moderate risk - monitor closely
51-70: High risk - intervention needed
71-100: Critical - reduce scope now

Factor Calculations

Factor | Calculation | Why It Matters
PR Size Factor | ((Avg Lines - 100) / 5), capped at 100 | Large PRs have exponentially longer cycle times
Cycle Time Factor | (Avg Hours / Sprint Hours) x 100 | If one PR takes 50% of sprint time, you can only fit 2
WIP Factor | ((Active PRs/Dev - 1) x 40), capped at 100 | WIP above 2 means context switching is killing flow
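Putting the weights and factor formulas together, the risk score can be sketched in a few lines of Python. The `sprint_hours=80` value (a 10-day sprint of 8-hour days) is an assumption, and the computed factors land close to, but not exactly on, the worked example's rounded values:

```python
# Sketch of the Spillover Risk Score. Factor formulas follow the table
# above; weights (0.4 / 0.35 / 0.25) come from the formula. Names are
# illustrative, not a CodePulse API.

def spillover_risk(avg_pr_lines, avg_cycle_hours, sprint_hours, wip_per_dev):
    pr_size_factor = min(max((avg_pr_lines - 100) / 5, 0), 100)
    cycle_time_factor = min(avg_cycle_hours / sprint_hours * 100, 100)
    wip_factor = min(max((wip_per_dev - 1) * 40, 0), 100)
    return round(pr_size_factor * 0.4 + cycle_time_factor * 0.35
                 + wip_factor * 0.25)

def interpret(risk):
    if risk <= 30:
        return "Low risk - on track for completion"
    if risk <= 50:
        return "Moderate risk - monitor closely"
    if risk <= 70:
        return "High risk - intervention needed"
    return "Critical - reduce scope now"

# Roughly the "healthy sprint" example (80 working hours assumed):
risk = spillover_risk(avg_pr_lines=180, avg_cycle_hours=18,
                      sprint_hours=80, wip_per_dev=1.5)
print(risk, interpret(risk))
```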

The Spillover Early Warning System

(Figure: sprint timeline showing checkpoints across Alignment, Progress, Risk Detection, and Completion phases, with actions at each stage.)
Sprint health checkpoints: catch problems early while there is still time to act

Stop waiting until the retrospective to discover your sprint failed. Implement checkpoints that surface problems while there is still time to act.

Sprint Health Checkpoints

Day 1-2: Launch Check
  • All sprint items have PRs or branches created
  • No items larger than 400 expected lines
  • Review assignments distributed evenly
  • WIP per developer at or below 2
Day 3-4 (Mid-Sprint): Momentum Check
  • At least 25% of PRs have received first review
  • No PR waiting more than 24 hours for review
  • Average cycle time on track (< 40% of sprint length)
  • No developer with 0 merged PRs yet
Day 6-7: Completion Check
  • At least 60% of sprint PRs merged
  • No PR with more than 2 review cycles
  • Items still in progress have clear path to merge
  • Scope cuts identified if needed
Day 8-10: Close-Out Check
  • All remaining PRs in final review
  • No new PRs being opened
  • Carryover items clearly identified
  • Root cause of any spillover documented

Warning Signs by Day

Sprint Warning Signs (10-day sprint example):

Day 2 Warnings (Severe Impact):
  - < 50% of items have PRs started
  - Any item estimated at > 5 days work
  - Review queue already building

Day 4 Warnings (High Impact):
  - < 20% of PRs merged
  - Average PR age > 48 hours
  - WIP per developer > 3

Day 6 Warnings (Moderate Impact):
  - < 50% of PRs merged
  - Any PR waiting > 72 hours for review
  - New scope added to sprint

Day 8 Warnings (Limited Recovery):
  - < 70% of PRs merged
  - PRs still being opened
  - Large PRs still in progress

After Day 8, spillover is likely unavoidable.
Intervention windows shrink rapidly.
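A sketch of how these day-indexed thresholds could be checked automatically. The `snapshot` fields are hypothetical stand-ins for whatever your PR tooling exports; thresholds mirror the warnings above:

```python
# Day-indexed warning check for a 10-day sprint. Each threshold comes
# from the "Warning Signs by Day" list; field names are illustrative.

def sprint_warnings(day, snapshot):
    warnings = []
    if day >= 2 and snapshot["items_with_prs"] < 0.5:
        warnings.append("fewer than 50% of items have PRs started")
    if day >= 4 and snapshot["merged_ratio"] < 0.2:
        warnings.append("fewer than 20% of PRs merged")
    if day >= 4 and snapshot["wip_per_dev"] > 3:
        warnings.append("WIP per developer above 3")
    if day >= 6 and snapshot["merged_ratio"] < 0.5:
        warnings.append("fewer than 50% of PRs merged")
    if day >= 8 and snapshot["merged_ratio"] < 0.7:
        warnings.append("fewer than 70% of PRs merged")
    return warnings

day4 = {"items_with_prs": 0.8, "merged_ratio": 0.1, "wip_per_dev": 3.5}
for w in sprint_warnings(4, day4):
    print("WARNING:", w)
```

Run from a daily cron or CI job, a check like this makes the momentum checkpoint automatic rather than one more thing to remember in standup.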

"By day 4 of a 10-day sprint, your outcome is 80% determined. The signals are there. Most teams just do not look at them until the retrospective."

Automated Monitoring

Manual checks get skipped when teams are busy. Automate the warning system:

  • Daily dashboard review: Add sprint health to daily standup agenda
  • Threshold alerts: Notify when any metric crosses warning threshold
  • Trend detection: Alert when trajectory suggests missing deadline
  • Review queue monitoring: Alert when queue exceeds 8-hour SLA

🔔 Setting Up Spillover Alerts in CodePulse

Create early warning alerts for spillover signals:

  • Navigate to Alert Rules
  • Create alert: Wait-for-review time exceeds 8 hours
  • Create alert: Any developer with more than 3 active PRs
  • Create alert: Average PR size exceeds 400 lines
  • Create alert: Cycle time exceeds 72 hours

Reducing Spillover Without Gaming Velocity

The goal is not to hit arbitrary story point targets. The goal is predictable, sustainable delivery. Here is how to reduce spillover through process improvements rather than point manipulation.

Strategy 1: Right-Size Your PRs

The single highest-impact change you can make. Smaller PRs mean faster reviews, fewer merge conflicts, and more predictable cycle times.

Before | After | Impact
One 800-line PR per feature | Three 250-line PRs per feature | 40% faster cycle time
Reviews take 2-4 hours | Reviews take 20-30 minutes | Reviewers stay engaged
Merge conflicts common | Merge conflicts rare | Less rework
Feedback late in process | Feedback early and often | Less wasted effort

Strategy 2: Implement WIP Limits

Stop starting and start finishing. A strict WIP limit forces completion before new work begins. Counterintuitively, doing less at once means delivering more over time.

WIP Limit Implementation:

Step 1: Measure current WIP
  - Count active PRs per developer right now
  - Typical finding: 3-5 per person

Step 2: Set limit at current - 1
  - If average is 4, set limit at 3
  - This is the "easy" step

Step 3: Reduce by 1 every 2 weeks
  - 3 -> 2 is the hard step
  - Stay at 2 - this is sustainable

Step 4: Enforce
  - New work cannot start until WIP < limit
  - Blocked? Help unblock, don't work around it

Expected results:
  - Week 1-2: Painful adjustment, velocity dips
  - Week 3-4: Flow improves, cycle time drops
  - Week 5+: Throughput increases 20-40%
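The enforcement step can be as simple as a pre-start check against open PRs. A minimal sketch, with illustrative names and a limit of 2:

```python
from collections import Counter

# Enforce a per-developer WIP limit before new work starts.
# `active_prs` maps PR id -> author; the limit of 2 follows Step 3 above.

WIP_LIMIT = 2

def can_start_new_work(author, active_prs):
    """Allow a new PR only while the author is under the WIP limit."""
    open_count = Counter(active_prs.values())[author]
    return open_count < WIP_LIMIT

active = {"pr-101": "alice", "pr-102": "alice", "pr-103": "bob"}
print(can_start_new_work("alice", active))  # False: at the limit, finish first
print(can_start_new_work("bob", active))    # True: room for one more
```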

Strategy 3: Fix Review Bottlenecks

  • Set SLAs: First review within 4 hours, all reviews within 24 hours
  • Distribute load: No one reviews more than 2x team average
  • Make it easy: Smaller PRs get reviewed faster
  • Protected time: Block 1-2 hours daily for review
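A minimal sketch of the SLA check behind the first bullet. The field names and timestamps are illustrative; the 4-hour threshold follows the SLA above:

```python
from datetime import datetime, timedelta

# Flag PRs breaching the first-review SLA (first review within 4 hours).

FIRST_REVIEW_SLA = timedelta(hours=4)

def sla_breaches(prs, now):
    """Return ids of PRs still waiting past the first-review SLA."""
    return [pr["id"] for pr in prs
            if pr["first_review_at"] is None
            and now - pr["opened_at"] > FIRST_REVIEW_SLA]

now = datetime(2026, 2, 2, 15, 0)
prs = [
    {"id": "pr-201", "opened_at": datetime(2026, 2, 2, 9, 0),
     "first_review_at": None},
    {"id": "pr-202", "opened_at": datetime(2026, 2, 2, 13, 0),
     "first_review_at": None},
]
print(sla_breaches(prs, now))  # ['pr-201'] -- six hours with no review
```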

Strategy 4: Use Throughput for Planning

Instead of estimating story points, count items. Your historical throughput tells you how many items you can complete per sprint. Use that number.

Throughput-Based Sprint Planning

Sprint Capacity = (Avg Items/Sprint x 0.85) - Known Interruptions

Use historical throughput with a 15% buffer. This accounts for estimation optimism and unexpected work.

Examples:

Historical Performance: Last 6 sprints averaged 12 items. Buffer (15%): -1.8 items. Planned PTO: -1 item. = Plan for 9 items

Interpretation:
Buffer: 85% of average accounts for bad weeks
Result: Commit to 9, stretch goal of 12
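The capacity formula translates directly to code. A sketch; rounding down to whole items is an assumption, though it agrees with the worked example:

```python
import math

# Throughput-based sprint capacity: historical average, minus a 15%
# optimism buffer, minus known interruptions, rounded down to whole items.

def sprint_capacity(avg_items_per_sprint, known_interruptions=0, buffer=0.15):
    return math.floor(avg_items_per_sprint * (1 - buffer) - known_interruptions)

# Last 6 sprints averaged 12 completed items; one item's worth of PTO:
print(sprint_capacity(12, known_interruptions=1))  # 9
```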

For more on transitioning from estimation to throughput, see our Stop Estimating, Start Forecasting guide.

Strategy 5: Commit vs Forecast Separation

Not everything in a sprint needs to be a commitment. Separate what you are committing to from what you are forecasting as stretch goals:

  • Commit (70% of capacity): What you will definitely complete
  • Stretch (20% of capacity): What you will attempt if things go well
  • Reserve (10% of capacity): Buffer for unplanned work
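The 70/20/10 split above can be sketched as a small helper; the rounding choice (reserve absorbs the remainder) is illustrative:

```python
# Split a sprint's item capacity into commit / stretch / reserve
# buckets per the 70/20/10 guideline above.

def split_capacity(total_items):
    commit = round(total_items * 0.7)
    stretch = round(total_items * 0.2)
    reserve = total_items - commit - stretch  # remainder absorbs rounding
    return {"commit": commit, "stretch": stretch, "reserve": reserve}

print(split_capacity(10))  # {'commit': 7, 'stretch': 2, 'reserve': 1}
```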

For more on capacity planning, see our Capacity Planning with PR Data guide.


Frequently Asked Questions

Q: We use Scrum and story points are required. What do we do?

A: You can use story points for relative sizing discussions while using throughput for actual forecasting. Run both in parallel: do your Planning Poker, but track items completed (not points completed) for your predictions. After a few months, you will have data showing which method is more accurate. Most teams find throughput wins by a wide margin.

Q: How do we handle items that are genuinely different sizes?

A: Over a sufficient time window (4-6 sprints), size variation averages out. Your historical throughput already includes the mix of small and large items your team naturally works on. If you have occasional truly massive items, break them down or track them separately. Most "large" items can be split into multiple PRs anyway.

Q: Will smaller PRs mean more overhead?

A: Initially, yes. You will have more PRs to review. But smaller PRs review faster (a 100-line PR takes 15 minutes to review; a 500-line PR takes 2 hours). The net time is less, and the feedback loop is faster. After 2-3 sprints, teams report that smaller PRs feel lighter, not heavier.

Q: What if management still wants story point velocity reports?

A: Give them what they need while internally using what works. Report story points for compliance, but make decisions based on throughput. Over time, educate stakeholders on why throughput is a better predictor. Show them the accuracy comparison. Most leaders care about predictability more than any specific methodology.

Q: How quickly can we see improvement?

A: PR size reduction shows impact within 1-2 sprints. WIP limits typically hurt before they help (expect a dip in sprint 1, improvement by sprint 3). Review bottleneck fixes are immediate but require enforcement. Overall, expect measurable spillover reduction within 4-6 sprints if you address your dominant cause.

Q: Is some spillover acceptable?

A: Yes. Targeting 100% sprint completion every time means you are sandbagging. 85-90% completion is healthy: it means you are challenging yourselves while remaining realistic. Below 70% consistently indicates a systemic problem. Above 95% consistently suggests you are not committing to enough.

Action Plan: This Sprint

This Week

  1. Baseline your metrics: Check your current average PR size, cycle time, and WIP per developer in Dashboard
  2. Calculate your risk score: Use the formula above to see your current spillover risk
  3. Identify dominant cause: Which of the five causes matches your data most closely?

This Sprint

  1. Implement checkpoints: Add the Day 3-4 momentum check to your standup routine
  2. Set one improvement target: Focus on your dominant cause only
  3. Track daily: Monitor the key metric for your target cause

This Quarter

  1. Transition to throughput: Start tracking items completed alongside story points
  2. Compare accuracy: After 4-6 sprints, compare throughput predictions to point-based predictions
  3. Reduce ceremony: As throughput proves more accurate, reduce time spent on estimation meetings

For related guidance, see our guides on Stop Estimating, Start Forecasting, Capacity Planning with PR Data, and High Activity, Low Progress.

See these insights for your team

CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.

Free tier available. No credit card required.