Your team committed to 40 story points. You delivered 28. Again. Sprint after sprint, the pattern repeats: optimistic planning, frantic mid-sprint scrambles, and demoralized retrospectives asking "why do we keep missing?" The answer is not "estimate better." The answer is that story points are the wrong tool. This guide shows how to use PR data to predict spillover before it happens and break the cycle of missed commitments.
"Story points measure effort guesses. PRs measure actual completions. One of these is useful for forecasting. Hint: it is not the one that requires a meeting."
Sprint spillover is not a team discipline problem. It is a visibility problem. When you cannot see that a sprint is going sideways until day 8 of 10, intervention comes too late. But PR data gives you leading indicators by day 3. This guide introduces the Spillover Early Warning System: a framework for predicting and preventing missed sprint commitments using metrics that do not require anyone to guess.
🔥 Our Take
Story points are a relic of a pre-data era. They persist because teams are comfortable with them, not because they work.
Every study on estimation accuracy shows the same thing: teams overestimate what they can complete. Planning Poker does not fix this because it democratizes the optimism bias instead of eliminating it. Throughput-based forecasting using your PR history is more accurate, requires less ceremony, and cannot be gamed. The only reason to keep story points is organizational inertia. If you want to actually predict when work will ship, use data that measures completions, not intentions.
The Spillover Problem: Why 70% of Sprints Miss
Research from multiple agile studies shows a consistent pattern: approximately 70% of sprints do not complete all committed work. This is not a few teams doing poorly. This is the default outcome of sprint-based planning.
The Statistics Are Damning
| Metric | Industry Average | Source |
|---|---|---|
| Sprints with spillover | 68-72% | State of Agile Reports 2021-2024 |
| Average commitment completion rate | 72-78% | Scrum.org benchmarks |
| Teams that regularly hit 90%+ completion | Less than 20% | VersionOne surveys |
| Stories re-estimated mid-sprint | 35-40% | Rally/Broadcom data |
| Velocity prediction accuracy (story points) | +/- 25-40% | Multiple studies |
Compare this to throughput-based forecasting (counting completed items rather than estimated points): teams using historical throughput for forecasting typically achieve +/- 10-15% accuracy. Same teams, same work, different measurement system.
"We do not have an estimation problem. We have a measurement problem. Story points measure intentions. PRs merged measure reality. One of these correlates with delivered value."
Why Traditional Sprint Planning Fails
| Failure Mode | What Happens | How PR Data Fixes It |
|---|---|---|
| Planning Fallacy | Teams imagine best-case scenarios | Historical throughput includes actual interruptions |
| Anchoring Bias | First estimate sets the range | PR data is objective, no discussion needed |
| Social Pressure | Nobody wants to be the pessimist | Data does not feel social pressure |
| Point Inflation | Teams inflate points to hit velocity targets | PR count cannot be inflated without splitting work |
| Invisible Work | Bugs, support, meetings not in estimates | Historical throughput bakes in all overhead |
| Scope Creep | Work expands after commitment | Cycle time trends reveal mid-sprint changes |
Root Causes of Sprint Spillover (Data-Driven)
Spillover has patterns. When you analyze PR data from teams with chronic spillover, you see the same root causes repeatedly. Fixing spillover means identifying which pattern is dominant for your team.
The Spillover Causes Framework
| Cause | PR Data Signal | Frequency | Fix |
|---|---|---|---|
| Oversized PRs | Average PR size >400 lines | 35% of cases | Enforce smaller PRs, break work down |
| Review Bottlenecks | Wait-for-review > Coding time | 25% of cases | Review SLAs, load balancing |
| High WIP | >2 active PRs per developer | 20% of cases | WIP limits, finish before starting |
| Cycle Time Variance | Standard deviation > mean | 10% of cases | Reduce blockers, improve process |
| Late-Sprint Starts | >40% PRs opened in final third | 10% of cases | Earlier starts, better task breakdown |
How Each Cause Creates Spillover
Oversized PRs (35% of spillover)
Large PRs are the single biggest predictor of spillover. When a "2-day" story turns into a 700-line PR, it sits in review for 3 days instead of 4 hours. The math does not work.
PR Size Impact on Cycle Time:

| PR Size (lines) | Avg Cycle Time | Review Wait | Spillover Risk |
|---|---|---|---|
| < 100 | 6-8 hours | 1-2 hours | Low (5%) |
| 100-300 | 1-2 days | 4-8 hours | Moderate (15%) |
| 300-500 | 2-4 days | 1-2 days | High (35%) |
| 500-1000 | 4-7 days | 2-4 days | Very High (60%) |
| > 1000 | 7-14+ days | 4-7+ days | Almost Certain (85%) |

Every 100 lines above 200 adds roughly 0.5 days to cycle time. An 800-line PR takes 4x longer than two 400-line PRs combined.
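The rule of thumb translates directly into code. A minimal sketch (the helper name is mine, not a CodePulse function):

```python
def extra_cycle_days(pr_lines):
    """~0.5 extra cycle-time days per 100 lines above 200 (rule of thumb above)."""
    return max(0.0, (pr_lines - 200) / 100 * 0.5)

# An 800-line PR carries roughly 3 extra days of cycle time
print(extra_cycle_days(800))
```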
Review Bottlenecks (25% of spillover)
Work is done, but it cannot merge. PRs pile up waiting for the same 1-2 reviewers. By the time reviews happen, the sprint is over.
High WIP (20% of spillover)
When every developer has 3+ PRs open, none of them are getting finished. Context switching kills throughput. See our High Activity, Low Progress Guide for the detailed diagnosis.
Cycle Time Variance (10% of spillover)
Even if your average cycle time is fine, high variance means unpredictable delivery. Some PRs ship in hours, others take weeks. You cannot plan with that variance.
Late-Sprint Starts (10% of spillover)
Work that starts on day 7 of a 10-day sprint cannot complete in time. If more than 40% of your PRs open in the final third of the sprint, you are setting up for spillover before any code is written.
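The 40% late-start check is easy to script from PR open timestamps. A minimal sketch, assuming you already have the open dates (the function name and the rounding of the two-thirds cutoff are my conventions):

```python
from datetime import date, timedelta

def late_start_share(pr_open_dates, sprint_start, sprint_end):
    """Fraction of PRs opened in the final third of the sprint."""
    sprint_days = (sprint_end - sprint_start).days
    # Final third begins two-thirds of the way through the sprint
    cutoff = sprint_start + timedelta(days=round(sprint_days * 2 / 3))
    if not pr_open_dates:
        return 0.0
    late = sum(1 for opened in pr_open_dates if opened >= cutoff)
    return late / len(pr_open_dates)

# 10-day sprint: the final third starts around day 7
opened = [date(2024, 6, 3), date(2024, 6, 4), date(2024, 6, 5),
          date(2024, 6, 10), date(2024, 6, 11)]
share = late_start_share(opened, date(2024, 6, 3), date(2024, 6, 13))
print(f"{share:.0%} of PRs opened in the final third")  # 40% -- right at the warning line
```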
📊 How to See This in CodePulse
Identify your dominant spillover cause:
- Dashboard shows cycle time breakdown including wait-for-review time
- Forecasting displays throughput trends and delivery predictions
- Developer Analytics reveals WIP per developer and PR size patterns
- Repository Metrics shows PR size distribution and merge rates
Using PR Data to Predict Spillover Risk
The advantage of PR data is that it gives you early warning. By day 3 of a sprint, you can predict with high accuracy whether you will complete your commitments.
The Three Key Predictors
Spillover Risk Score
Calculate spillover risk using weighted PR metrics. Each factor scales from 0-100. Total risk above 60 indicates high spillover probability.
Factor Calculations
| Factor | Calculation | Why It Matters |
|---|---|---|
| PR Size Factor | ((Avg Lines - 100) / 5) capped at 100 | Large PRs have exponentially longer cycle times |
| Cycle Time Factor | (Avg Hours / Sprint Hours) x 100 | If one PR takes 50% of sprint time, you can only fit 2 |
| WIP Factor | ((Active PRs/Dev - 1) x 40) capped at 100 | WIP above 2 means context switching is killing flow |
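The factor formulas above can be combined into a simple calculator. This sketch uses the three calculations from the table; the weights are illustrative assumptions, since the article does not specify a weighting:

```python
def clamp(value, lo=0.0, hi=100.0):
    """Keep a factor inside the 0-100 range."""
    return max(lo, min(hi, value))

def spillover_risk(avg_pr_lines, avg_cycle_hours, sprint_hours,
                   active_prs_per_dev, weights=(0.4, 0.3, 0.3)):
    """Weighted spillover risk score on a 0-100 scale.

    Factor formulas come from the table above; the default weights
    are an assumption, not values from this guide.
    """
    size_factor = clamp((avg_pr_lines - 100) / 5)               # large PRs -> long cycles
    cycle_factor = clamp(avg_cycle_hours / sprint_hours * 100)  # share of sprint per PR
    wip_factor = clamp((active_prs_per_dev - 1) * 40)           # context switching above WIP 2
    w_size, w_cycle, w_wip = weights
    return w_size * size_factor + w_cycle * cycle_factor + w_wip * wip_factor

# 350-line PRs, 40h average cycle in an 80h sprint, 2.5 active PRs per dev
score = spillover_risk(350, 40, 80, 2.5)
print(round(score, 1))  # under the 60-point warning line, but not by much
```

Anything above 60 warrants a mid-sprint intervention; recompute the score daily so you see the trend, not just a snapshot.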
The Spillover Early Warning System
Stop waiting until the retrospective to discover your sprint failed. Implement checkpoints that surface problems while there is still time to act.
Sprint Health Checkpoints
Day 1-2: Launch Check
- All sprint items have PRs or branches created
- No items larger than 400 expected lines
- Review assignments distributed evenly
- WIP per developer at or below 2
Day 3-4 (Mid-Sprint): Momentum Check
- At least 25% of PRs have received first review
- No PR waiting more than 24 hours for review
- Average cycle time on track (< 40% of sprint length)
- No developer with 0 merged PRs yet
Day 6-7: Completion Check
- At least 60% of sprint PRs merged
- No PR with more than 2 review cycles
- Items still in progress have clear path to merge
- Scope cuts identified if needed
Day 8-10: Close-Out Check
- All remaining PRs in final review
- No new PRs being opened
- Carryover items clearly identified
- Root cause of any spillover documented
Warning Signs by Day
Sprint Warning Signs (10-day sprint example):
Day 2 Warnings (Severe Impact):
- < 50% of items have PRs started
- Any item estimated at > 5 days work
- Review queue already building
Day 4 Warnings (High Impact):
- < 20% of PRs merged
- Average PR age > 48 hours
- WIP per developer > 3
Day 6 Warnings (Moderate Impact):
- < 50% of PRs merged
- Any PR waiting > 72 hours for review
- New scope added to sprint
Day 8 Warnings (Limited Recovery):
- < 70% of PRs merged
- PRs still being opened
- Large PRs still in progress
After Day 8, spillover is likely unavoidable. Intervention windows shrink rapidly.
"By day 4 of a 10-day sprint, your outcome is 80% determined. The signals are there. Most teams just do not look at them until the retrospective."
Automated Monitoring
Manual checks get skipped when teams are busy. Automate the warning system:
- Daily dashboard review: Add sprint health to daily standup agenda
- Threshold alerts: Notify when any metric crosses warning threshold
- Trend detection: Alert when trajectory suggests missing deadline
- Review queue monitoring: Alert when queue exceeds 8-hour SLA
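A threshold check of this kind is a few lines of code. This sketch uses the warning values suggested in this guide; the metric names and the `check_alerts` helper are my conventions, not a CodePulse API:

```python
# Warning thresholds mirror the alert values suggested in this guide;
# tune them to your team's baseline.
WARNING_THRESHOLDS = {
    "wait_for_review_hours": 8,   # review queue SLA
    "active_prs_per_dev": 3,      # WIP ceiling
    "avg_pr_size_lines": 400,     # right-sizing target
    "cycle_time_hours": 72,       # open-to-merge budget
}

def check_alerts(metrics):
    """Return the names of metrics that crossed their warning threshold."""
    return [name for name, limit in WARNING_THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

todays_metrics = {
    "wait_for_review_hours": 11,
    "active_prs_per_dev": 2,
    "avg_pr_size_lines": 520,
    "cycle_time_hours": 30,
}
print(check_alerts(todays_metrics))  # ['wait_for_review_hours', 'avg_pr_size_lines']
```

Wire the output into your standup notes or a chat webhook so the warnings surface daily without anyone remembering to look.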
🔔 Setting Up Spillover Alerts in CodePulse
Create early warning alerts for spillover signals:
- Navigate to Alert Rules
- Create alert: Wait-for-review time exceeds 8 hours
- Create alert: Any developer with more than 3 active PRs
- Create alert: Average PR size exceeds 400 lines
- Create alert: Cycle time exceeds 72 hours
Reducing Spillover Without Gaming Velocity
The goal is not to hit arbitrary story point targets. The goal is predictable, sustainable delivery. Here is how to reduce spillover through process improvements rather than point manipulation.
Strategy 1: Right-Size Your PRs
The single highest-impact change you can make. Smaller PRs mean faster reviews, fewer merge conflicts, and more predictable cycle times.
| Before | After | Impact |
|---|---|---|
| One 800-line PR per feature | Three 250-line PRs per feature | 40% faster cycle time |
| Reviews take 2-4 hours | Reviews take 20-30 minutes | Reviewers stay engaged |
| Merge conflicts common | Merge conflicts rare | Less rework |
| Feedback late in process | Feedback early and often | Less wasted effort |
Strategy 2: Implement WIP Limits
Stop starting and start finishing. A strict WIP limit forces completion before new work begins. Counterintuitively, doing less at once means delivering more over time.
WIP Limit Implementation:
Step 1: Measure current WIP
- Count active PRs per developer right now
- Typical finding: 3-5 per person
Step 2: Set limit at current - 1
- If average is 4, set limit at 3
- This is the "easy" step
Step 3: Reduce by 1 every 2 weeks
- 3 -> 2 is the hard step
- Stay at 2 - this is sustainable
Step 4: Enforce
- New work cannot start until WIP < limit
- Blocked? Help unblock, don't work around it
Expected results:
- Week 1-2: Painful adjustment, velocity dips
- Week 3-4: Flow improves, cycle time drops
- Week 5+: Throughput increases 20-40%
Strategy 3: Fix Review Bottlenecks
- Set SLAs: First review within 4 hours, all reviews within 24 hours
- Distribute load: No one reviews more than 2x team average
- Make it easy: Smaller PRs get reviewed faster
- Protected time: Block 1-2 hours daily for review
Strategy 4: Use Throughput for Planning
Instead of estimating story points, count items. Your historical throughput tells you how many items you can complete per sprint. Use that number.
Throughput-Based Sprint Planning
Use historical throughput with a 15% buffer. This accounts for estimation optimism and unexpected work.
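The throughput-plus-buffer rule is a few lines of arithmetic. A sketch using the 15% buffer from the text (the sample sprint history is made up):

```python
def sprint_commitment(items_completed_per_sprint, buffer=0.15):
    """Commit to average historical throughput minus a safety buffer."""
    avg = sum(items_completed_per_sprint) / len(items_completed_per_sprint)
    return int(avg * (1 - buffer))  # round down: under-commit, over-deliver

history = [12, 9, 11, 10, 13, 11]   # items merged in the last six sprints
print(sprint_commitment(history))   # average is 11 -> commit to 9 items
```

Note that rounding down is deliberate: the cost of finishing early and pulling in stretch work is far lower than the cost of spillover.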
For more on transitioning from estimation to throughput, see our Stop Estimating, Start Forecasting guide.
Strategy 5: Commit vs Forecast Separation
Not everything in a sprint needs to be a commitment. Separate what you are committing to from what you are forecasting as stretch goals:
- Commit (70% of capacity): What you will definitely complete
- Stretch (20% of capacity): What you will attempt if things go well
- Reserve (10% of capacity): Buffer for unplanned work
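The 70/20/10 split can be computed directly from your planned item count. A minimal sketch (function name and dict shape are mine):

```python
def split_capacity(sprint_items):
    """Split planned items into commit / stretch / reserve buckets (70/20/10)."""
    commit = round(sprint_items * 0.70)
    stretch = round(sprint_items * 0.20)
    reserve = sprint_items - commit - stretch   # remainder buffers unplanned work
    return {"commit": commit, "stretch": stretch, "reserve": reserve}

print(split_capacity(10))  # {'commit': 7, 'stretch': 2, 'reserve': 1}
```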
For more on capacity planning, see our Capacity Planning with PR Data guide.
Frequently Asked Questions
Q: We use Scrum and story points are required. What do we do?
A: You can use story points for relative sizing discussions while using throughput for actual forecasting. Run both in parallel: do your Planning Poker, but track items completed (not points completed) for your predictions. After a few months, you will have data showing which method is more accurate. Most teams find throughput wins by a wide margin.
Q: How do we handle items that are genuinely different sizes?
A: Over a sufficient time window (4-6 sprints), size variation averages out. Your historical throughput already includes the mix of small and large items your team naturally works on. If you have occasional truly massive items, break them down or track them separately. Most "large" items can be split into multiple PRs anyway.
Q: Will smaller PRs mean more overhead?
A: Initially, yes. You will have more PRs to review. But smaller PRs review faster (a 100-line PR takes 15 minutes to review; a 500-line PR takes 2 hours). The net time is less, and the feedback loop is faster. After 2-3 sprints, teams report that smaller PRs feel lighter, not heavier.
Q: What if management still wants story point velocity reports?
A: Give them what they need while internally using what works. Report story points for compliance, but make decisions based on throughput. Over time, educate stakeholders on why throughput is a better predictor. Show them the accuracy comparison. Most leaders care about predictability more than any specific methodology.
Q: How quickly can we see improvement?
A: PR size reduction shows impact within 1-2 sprints. WIP limits typically hurt before they help (expect a dip in sprint 1, improvement by sprint 3). Review bottleneck fixes are immediate but require enforcement. Overall, expect measurable spillover reduction within 4-6 sprints if you address your dominant cause.
Q: Is some spillover acceptable?
A: Yes. Targeting 100% sprint completion every time means you are sandbagging. 85-90% completion is healthy: it means you are challenging yourselves while remaining realistic. Below 70% consistently indicates a systemic problem. Above 95% consistently suggests you are not committing to enough.
Action Plan: This Sprint
This Week
- Baseline your metrics: Check your current average PR size, cycle time, and WIP per developer in Dashboard
- Calculate your risk score: Use the formula above to see your current spillover risk
- Identify dominant cause: Which of the five causes matches your data most closely?
This Sprint
- Implement checkpoints: Add the Day 3-4 momentum check to your standup routine
- Set one improvement target: Focus on your dominant cause only
- Track daily: Monitor the key metric for your target cause
This Quarter
- Transition to throughput: Start tracking items completed alongside story points
- Compare accuracy: After 4-6 sprints, compare throughput predictions to point-based predictions
- Reduce ceremony: As throughput proves more accurate, reduce time spent on estimation meetings
For related guidance, see our guides on Stop Estimating, Start Forecasting, Capacity Planning with PR Data, and High Activity, Low Progress.
See these insights for your team
CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.
Free tier available. No credit card required.
Related Guides
Story Points Are a Scam. Here's What Actually Works
Story points are often waste. Learn how to use historical throughput and cycle time to forecast delivery dates with higher accuracy and less meeting time.
Stop Guessing Capacity. Your PRs Already Know
Use PR metrics to build data-driven capacity models, plan sprints realistically, and communicate engineering capacity to stakeholders.
Lots of Commits, No Features: The Productivity Illusion
Your engineers are committing code, but features aren't launching. Use the HALT Framework to diagnose churn, abandon rate, review bottlenecks, and WIP explosion - with targeted fixes for each.
Agile vs DevOps: Why the Debate Is Missing the Point
Agile and DevOps are not competitors—they are complementary. Agile handles planning and prioritization; DevOps handles delivery and operations. This guide shows how they work together.
