Looking for 2025 engineering benchmarks? Here's the headline: 3-hour median cycle time for teams that do code review (based on 802,979 PRs we analyzed). But the number alone won't help you—this guide shows you how to actually use that benchmark without falling into the "hit the number" trap.
Quick Benchmarks (Enterprise Teams with Code Review)
Want the full data, charts, and methodology?
View the complete 2025 Engineering Benchmarks Report →

The rest of this guide focuses on how to use these benchmarks—comparing your team fairly, setting realistic targets, and communicating progress to leadership.
Step 1: Choose the Right Benchmark for Your Context
The most common mistake with benchmarks is comparing apples to oranges. Global GitHub stats include solo hobby projects, automated bot PRs, and self-merged code. That's not your team.
The Two-Benchmark Problem
Our research surfaces two very different pictures:
| Benchmark | Median Cycle Time | Who It Represents |
|---|---|---|
| Global GitHub | ~0 hours | Everyone (71% self-merge, 85% no review) |
| Reviewed PRs Only | 3 hours | Teams with code review (14.6% of PRs) |
"If your team does code review, compare to the 3-hour benchmark—not the instant-merge global average that includes hobby projects."
Action: Identify Your Comparison Group
Before looking at any benchmark, answer these questions:
- Do you require code review? If yes, use enterprise/reviewed-PR benchmarks only
- What's your team size? A 5-person startup ships differently than a 200-person enterprise
- What industry are you in? Fintech with compliance review will be slower than a consumer app
- What's your risk tolerance? Infrastructure teams need more review than marketing sites
Step 2: Establish Your Baseline Before Comparing
Benchmarks are useless without knowing where you stand. Before you can improve, you need to measure your current state.
What to Measure
Your Baseline Checklist:

□ Median cycle time (PR open → merged)
□ P90 cycle time (your worst 10% of PRs)
□ Time breakdown by phase:
  - Waiting for review
  - In review
  - Waiting to merge
□ Cycle time by PR size
□ Review coverage (% of PRs with review)

Record these for the last 30, 60, and 90 days. Trends matter more than snapshots.
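If you'd rather compute the baseline yourself from raw PR timestamps, here's a minimal sketch. The PR records, field names, and dates are illustrative, not a CodePulse export format:

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records: opened / first-review / merged timestamps.
prs = [
    {"opened": datetime(2025, 1, 6, 9), "first_review": datetime(2025, 1, 6, 10), "merged": datetime(2025, 1, 6, 12)},
    {"opened": datetime(2025, 1, 7, 9), "first_review": datetime(2025, 1, 8, 9), "merged": datetime(2025, 1, 9, 9)},
    {"opened": datetime(2025, 1, 8, 14), "first_review": datetime(2025, 1, 8, 15), "merged": datetime(2025, 1, 8, 16)},
]

def hours(a, b):
    """Elapsed time between two timestamps, in hours."""
    return (b - a).total_seconds() / 3600

cycle = sorted(hours(p["opened"], p["merged"]) for p in prs)
waiting = [hours(p["opened"], p["first_review"]) for p in prs]

def p90(sorted_values):
    # Nearest-rank percentile: the value below which ~90% of PRs fall.
    return sorted_values[min(len(sorted_values) - 1, int(0.9 * len(sorted_values)))]

print(f"median cycle time: {median(cycle):.1f}h")
print(f"P90 cycle time:    {p90(cycle):.1f}h")
print(f"median wait for first review: {median(waiting):.1f}h")
```

Run this over your last 30/60/90 days of PRs to get the trend view the checklist asks for.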
📊 How to Get Your Baseline in CodePulse
Navigate to Dashboard to see your cycle time breakdown:
- The cycle time widget shows median and breakdown by phase
- Use the time filter to compare 30/60/90 day periods
- Check the Executive Summary for trends over time
Why P90 Matters More Than Median
Your median might be 4 hours, but if your P90 is 72 hours, 10% of your PRs are stuck for 3+ days. That's where the real pain lives.
Benchmark comparison: Our research shows enterprise P90 at ~149 hours (6 days). If yours is worse, you have outlier PRs dragging down the team.
Step 3: Do a Gap Analysis (Not a Judgment)
Once you have your baseline and the right benchmark, calculate the gap. But here's the critical mindset shift:
🔥 Our Take
A gap to benchmark is not a failure. It's a diagnostic signal.
If your cycle time is 3x the benchmark, the question isn't "why are we bad?" It's "what's different about our context, and is that difference intentional?" Maybe you have compliance requirements. Maybe you're in a high-risk domain. Maybe you just haven't optimized review assignment. The gap tells you where to look, not what to feel.
Gap Analysis Framework
For each metric where you're above benchmark:

1. IDENTIFY the gap
   Your median: 12 hours
   Benchmark: 3 hours
   Gap: 4x slower
2. INVESTIGATE the cause
   □ Is it waiting time? (reviewer availability)
   □ Is it review time? (PR complexity/size)
   □ Is it merge delay? (CI/CD or approval process)
   □ Is it intentional? (compliance, risk management)
3. DECIDE if it's a problem
   □ Does this gap hurt delivery?
   □ Does it frustrate developers?
   □ Is there a business cost?
   □ Or is it acceptable for your context?
4. PRIORITIZE action
   □ High impact + easy fix → Do now
   □ High impact + hard fix → Plan it
   □ Low impact → Ignore it
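The INVESTIGATE step boils down to a phase breakdown: split cycle time into its phases and see which one dominates. A quick sketch with illustrative numbers (the phase names and medians are made up for the example):

```python
# Median hours spent in each phase of the PR lifecycle (illustrative).
phases = {
    "waiting_for_review": 7.8,
    "in_review": 2.4,
    "waiting_to_merge": 1.8,
}

total = sum(phases.values())
bottleneck, worst = max(phases.items(), key=lambda kv: kv[1])
print(f"total: {total:.1f}h; bottleneck: {bottleneck} "
      f"({worst / total:.0%} of cycle time)")
```

In this example, waiting for review eats 65% of the cycle, which points at reviewer availability rather than review rigor.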
Step 4: Set Targets That Don't Backfire
Here's where most teams go wrong. They see a benchmark, turn it into a target, and watch their team game the metric. Don't do this.
Wrong Way: Benchmark as Target
❌ "Our OKR: Achieve 3-hour median cycle time by Q2"

What happens:
- PRs get smaller (gaming)
- Reviews get rushed (quality drops)
- People stop opening PRs until ready to merge (hiding work)
- Numbers improve, delivery doesn't
Right Way: Improvement from Baseline
✅ "Our goal: Reduce median cycle time by 25% from current baseline"

Why this works:
- Progress is relative to YOUR starting point
- No arbitrary external number to game
- Improvement is always possible
- Context is preserved
"The benchmark tells you what's possible. Your baseline tells you where to start. The target should be a percentage improvement, not the benchmark itself."
Target-Setting Formula
- If you're within 2x of benchmark: Target 15-25% improvement over 6 months
- If you're 2-5x above benchmark: Target 30-40% improvement, focus on the biggest bottleneck phase
- If you're 5x+ above benchmark: Don't target the number. Target fixing one root cause (e.g., "reduce waiting time by 50%")
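The formula above can be written down as a small helper. The cutoffs come from this guide; the function itself and its exact improvement percentages are an illustrative sketch, not a prescribed tool:

```python
def set_target(your_median_h, benchmark_h):
    """Suggest a target from the gap ratio (sketch of the guide's formula)."""
    ratio = your_median_h / benchmark_h
    if ratio < 2:
        # Within 2x of benchmark: 15-25% improvement over 6 months.
        return f"target {your_median_h * 0.80:.1f}h (-20%) over 6 months"
    if ratio < 5:
        # 2-5x above: 30-40% improvement, focused on the biggest bottleneck phase.
        return f"target {your_median_h * 0.65:.1f}h (-35%), fix the biggest bottleneck"
    # 5x+ above: don't target the number; attack one root cause.
    return "target one root cause (e.g. halve waiting time), not the number"

print(set_target(5, 3))    # within 2x of benchmark
print(set_target(12, 3))   # 4x above benchmark
print(set_target(20, 3))   # 6.7x above benchmark
```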
Step 5: Communicate Benchmarks to Leadership
Executives want context, not excuses. Here's how to present benchmark comparisons without sounding defensive or creating unrealistic expectations.
The Leadership Benchmark Brief
Executive Summary Template:

CURRENT STATE
Our median cycle time: 8 hours
Industry benchmark (reviewed PRs): 3 hours
Gap: 2.7x

CONTEXT
- We require 2 reviewers (benchmark assumes 1)
- Our PRs average 180 lines (benchmark median: 54)
- 30% of our PRs touch compliance-sensitive code

ROOT CAUSE
Primary bottleneck: Waiting for review (65% of cycle time)
This is a reviewer availability issue, not a process issue.

IMPROVEMENT PLAN
Target: Reduce to 5 hours (-37%) by end of Q2
Actions:
1. Implement review load balancing
2. Add PR size guidelines (<200 lines)
3. Set 4-hour first-review SLA

EXPECTED OUTCOME
Based on similar teams, these changes typically yield 30-50% cycle time reduction within 90 days.
What Leadership Wants to Know
- "How do we compare?" → Give them the benchmark with context
- "Why the gap?" → Explain intentional vs. fixable differences
- "What are we doing about it?" → Specific actions, not vague plans
- "When will we improve?" → Realistic timeline with caveats
For more on executive communication, see our Board-Ready Engineering Metrics guide.
Step 6: Prevent Gaming Before It Starts
Every metric will be gamed if incentives are wrong. Here's how to use benchmarks without creating perverse incentives.
Gaming Patterns to Watch For
| If You Measure... | Teams Might Game By... | Prevention |
|---|---|---|
| Cycle time | Splitting PRs artificially, rushing reviews | Also track PR size, review depth, rework rate |
| PR count | Tiny PRs, splitting one change into many | Track throughput (lines shipped), not count |
| Review time | Rubber-stamp approvals | Track review comments, change failure rate |
| Deployment frequency | Deploy empty changes, feature flags everything | Track deployment with actual changes |
Anti-Gaming Principles
- Never use metrics for individual performance reviews. The moment Alice's bonus depends on cycle time, she'll optimize cycle time—not delivery.
- Track metric bundles, not single metrics. If cycle time improves but rework rate spikes, you haven't improved—you've just moved the problem.
- Make targets team-level, not individual. Teams can self-regulate; individuals will compete.
- Celebrate learning, not hitting numbers. "We found our bottleneck" is a win, even if the metric didn't improve yet.
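"Track metric bundles" can be made concrete as a guard check: a cycle-time win only counts if the guard metrics didn't regress. A sketch with hypothetical metric names and thresholds (not a CodePulse feature):

```python
def bundle_verdict(before, after, guard_tolerance=0.10):
    """Did cycle time genuinely improve, or did the problem just move?"""
    if after["cycle_time_h"] >= before["cycle_time_h"]:
        return "no improvement"
    # Guard metrics: any regression beyond tolerance voids the "win".
    for guard in ("rework_rate", "change_failure_rate"):
        if after[guard] > before[guard] * (1 + guard_tolerance):
            return f"suspect: {guard} regressed (problem likely moved, not solved)"
    return "genuine improvement"

q1 = {"cycle_time_h": 12, "rework_rate": 0.08, "change_failure_rate": 0.05}
q2 = {"cycle_time_h": 7, "rework_rate": 0.15, "change_failure_rate": 0.05}
print(bundle_verdict(q1, q2))  # faster, but rework nearly doubled
```

Here cycle time dropped 12h → 7h, but rework nearly doubled, so the bundle flags it rather than celebrating it.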
For a deeper dive on this topic, see our Goodhart's Law in Engineering Metrics guide.
Step 7: Review and Recalibrate Quarterly
Benchmarks shift. Your context changes. What was relevant last year might not be now. Build a quarterly review habit. The teams that improve consistently aren't the ones with the best metrics—they're the ones who actually look at the data every quarter.
Quarterly Benchmark Review Checklist
Every quarter, ask:

CONTEXT CHECK
□ Has our team size changed significantly?
□ Have we added new compliance requirements?
□ Did we change our PR/review process?
□ Are we comparing to the same benchmark type?

PROGRESS CHECK
□ Did we improve from last quarter's baseline?
□ Are we closer to or further from the benchmark?
□ Did any improvements stick, or did we regress?
□ What worked? What didn't?

RECALIBRATION
□ Should we adjust our target based on learnings?
□ Is the benchmark still relevant to our context?
□ Do we need to add/remove metrics from our bundle?
□ What's the #1 focus for next quarter?
Quick Reference: 2025 Benchmarks
For detailed benchmark data, charts, and methodology, see our full 2025 Engineering Benchmarks research report. Here's the quick reference:
| Metric | Enterprise Benchmark | Notes |
|---|---|---|
| Median Cycle Time | 3 hours | For PRs with code review |
| P90 Cycle Time | 149 hours | ~6 days for slowest 10% |
| Waiting for Review | 0.6 hours | Median time to first review |
| Self-Merge Rate | 52% | For reviewed PRs (71% globally) |
| Review Coverage | 14.6% | % of PRs with formal review |
FAQ
Should I share benchmarks with my team?
Yes, but with context. Share the benchmark, explain why it may or may not apply to your situation, and focus on improvement from baseline rather than hitting the external number.
What if we're way above benchmark and can't improve?
First, verify the benchmark is appropriate for your context. If you're a regulated industry with mandatory review processes, you might never hit startup benchmarks—and that's fine. Document why your context is different and track improvement within your constraints.
How do I know if a benchmark is trustworthy?
Look for: sample size, methodology transparency, date of data, and whether they distinguish between different types of teams. Survey-based benchmarks tend to be optimistic; actual measured data (like ours from GitHub Archive) is more reliable.
Should I benchmark against competitors?
Usually not directly—you don't have their data. Instead, benchmark against industry averages and track your own improvement. If you're shipping faster than last quarter, you're winning regardless of what competitors do.
See these insights for your team
CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.
Free tier available. No credit card required.
Related Guides
Your PR Cycle Time Is Fine (Here's the Benchmark)
What is a good PR cycle time? Benchmarks and targets based on team size, industry, and engineering maturity.
DORA Metrics Are Being Weaponized. Here's the Fix
DORA metrics were designed for research, not management. Learn how to use them correctly as signals for improvement, not targets to game.
Goodhart's Law in Software: Why Your Metrics Get Gamed
When a measure becomes a target, it ceases to be a good measure. This guide explains Goodhart's Law with real engineering examples and strategies to measure without destroying what you're measuring.
I Got $2M in Budget With These 5 Engineering Metrics
Learn how to create engineering metrics presentations that resonate with board members, investors, and C-suite executives.
