
2025 Engineering Benchmarks: How to Use Them Without Gaming

A practical guide to using engineering benchmarks effectively—choosing the right comparison, setting targets, and communicating to leadership without falling into metric traps.

12 min read · Updated January 29, 2026 · By CodePulse Team

Looking for 2025 engineering benchmarks? Here's the headline: 3-hour median cycle time for teams that do code review (based on 802,979 PRs we analyzed). But the number alone won't help you—this guide shows you how to actually use that benchmark without falling into the "hit the number" trap.

Quick Benchmarks (Enterprise Teams with Code Review)

  • Median Cycle Time: 3h
  • P90 Cycle Time: 149h
  • Time to First Review: 0.6h
  • Review Coverage: 14.6%

Want the full data, charts, and methodology?

View the complete 2025 Engineering Benchmarks Report →

The rest of this guide focuses on how to use these benchmarks—comparing your team fairly, setting realistic targets, and communicating progress to leadership.

Step 1: Choose the Right Benchmark for Your Context

The most common mistake with benchmarks is comparing apples to oranges. Global GitHub stats include solo hobby projects, automated bot PRs, and self-merged code. That's not your team.

The Two-Benchmark Problem

Our research surfaces two very different pictures:

Benchmark           Median Cycle Time   Who It Represents
Global GitHub       ~0 hours            Everyone (71% self-merge, 85% no review)
Reviewed PRs Only   3 hours             Teams with code review (14.6% of PRs)

"If your team does code review, compare to the 3-hour benchmark—not the instant-merge global average that includes hobby projects."

[Figure: Benchmark decision tree. If your team requires code review, use the Enterprise benchmark (3h median, reviewed PRs); if not, the Global benchmark (~0h median, all GitHub) applies.]
Most professional teams should use the Enterprise benchmark (3h median) as their baseline.
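
If you want to encode that choice in a reporting script, here's a minimal sketch. Everything in it is illustrative (it is not a CodePulse or GitHub API); the numbers simply restate the medians above.

# Illustrative sketch: pick which benchmark to compare against, following
# the decision tree above. Values restate the 2025 report's medians.
BENCHMARKS = {
    "enterprise": {"median_cycle_time_h": 3, "p90_cycle_time_h": 149},
    "global": {"median_cycle_time_h": 0},  # includes hobby projects and self-merges
}

def choose_benchmark(requires_code_review: bool) -> dict:
    """Teams that require review should compare against reviewed-PR data only."""
    return BENCHMARKS["enterprise" if requires_code_review else "global"]

print(choose_benchmark(requires_code_review=True))
# {'median_cycle_time_h': 3, 'p90_cycle_time_h': 149}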

Action: Identify Your Comparison Group

Before looking at any benchmark, answer these questions:

  • Do you require code review? If yes, use enterprise/reviewed-PR benchmarks only
  • What's your team size? A 5-person startup ships differently than a 200-person enterprise
  • What industry are you in? Fintech with compliance review will be slower than a consumer app
  • What's your risk tolerance? Infrastructure teams need more review than marketing sites

Step 2: Establish Your Baseline Before Comparing

Benchmarks are useless without knowing where you stand. Before you can improve, you need to measure your current state.

What to Measure

Your Baseline Checklist:

□ Median cycle time (PR open → merged)
□ P90 cycle time (your worst 10% of PRs)
□ Time breakdown by phase:
  - Waiting for review
  - In review
  - Waiting to merge
□ Cycle time by PR size
□ Review coverage (% of PRs with review)

Record these for the last 30, 60, and 90 days.
Trends matter more than snapshots.

📊 How to Get Your Baseline in CodePulse

Navigate to Dashboard to see your cycle time breakdown:

  • The cycle time widget shows median and breakdown by phase
  • Use the time filter to compare 30/60/90 day periods
  • Check the Executive Summary for trends over time
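
Prefer to pull the numbers yourself from raw PR data? Here's a minimal sketch. It assumes you've exported each PR's opened, first-review, approved, and merged timestamps; the field names are made up for this example, not a real GitHub or CodePulse schema.

from statistics import median

def hours(delta):
    # Convert a timedelta into fractional hours
    return delta.total_seconds() / 3600

def phase_median(prs, start_key, end_key):
    # Median hours between two PR timestamps across all PRs
    return round(median(hours(pr[end_key] - pr[start_key]) for pr in prs), 1)

def baseline(prs):
    # prs: list of dicts with datetime fields (illustrative names)
    cycle = sorted(hours(pr["merged_at"] - pr["opened_at"]) for pr in prs)
    return {
        "median_cycle_time_h": round(median(cycle), 1),
        "p90_cycle_time_h": round(cycle[int(0.9 * (len(cycle) - 1))], 1),
        # Phase breakdown, matching the checklist above
        "waiting_for_review_h": phase_median(prs, "opened_at", "first_review_at"),
        "in_review_h": phase_median(prs, "first_review_at", "approved_at"),
        "waiting_to_merge_h": phase_median(prs, "approved_at", "merged_at"),
    }

Run it over 30-, 60-, and 90-day windows of merged PRs to get the trend the checklist asks for.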

Why P90 Matters More Than Median

Your median might be 4 hours, but if your P90 is 72 hours, 10% of your PRs are stuck for 3+ days. That's where the real pain lives.

Benchmark comparison: Our research shows enterprise P90 at ~149 hours (6 days). If yours is worse, you have outlier PRs dragging down the team.

Step 3: Do a Gap Analysis (Not a Judgment)

Once you have your baseline and the right benchmark, calculate the gap. But here's the critical mindset shift:

🔥 Our Take

A gap to benchmark is not a failure. It's a diagnostic signal.

If your cycle time is 3x the benchmark, the question isn't "why are we bad?" It's "what's different about our context, and is that difference intentional?" Maybe you have compliance requirements. Maybe you're in a high-risk domain. Maybe you just haven't optimized review assignment. The gap tells you where to look, not what to feel.

Gap Analysis Framework

[Figure: The gap analysis framework (Identify → Investigate → Decide → Prioritize)]
For each metric where you're above benchmark, work through these four steps (a small code sketch of steps 1 and 2 follows the framework):

1. IDENTIFY the gap
   Your median: 12 hours
   Benchmark:   3 hours
   Gap:         4x slower

2. INVESTIGATE the cause
   □ Is it waiting time? (reviewer availability)
   □ Is it review time? (PR complexity/size)
   □ Is it merge delay? (CI/CD or approval process)
   □ Is it intentional? (compliance, risk management)

3. DECIDE if it's a problem
   □ Does this gap hurt delivery?
   □ Does it frustrate developers?
   □ Is there a business cost?
   □ Or is it acceptable for your context?

4. PRIORITIZE action
   □ High impact + easy fix → Do now
   □ High impact + hard fix → Plan it
   □ Low impact → Ignore it
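
Steps 1 and 2 in code form, as a minimal sketch. It assumes you already have the phase breakdown from your baseline; the field names and example numbers are illustrative.

def gap_analysis(your_median_h, benchmark_median_h, phase_breakdown_h):
    # Step 1: size the gap as a multiple of the benchmark
    gap = your_median_h / benchmark_median_h
    # Step 2: point at the phase that eats the most cycle time
    biggest_phase, biggest_hours = max(phase_breakdown_h.items(), key=lambda kv: kv[1])
    return {
        "gap_multiple": round(gap, 1),
        "investigate_first": biggest_phase,
        "share_of_cycle_time": round(biggest_hours / sum(phase_breakdown_h.values()), 2),
    }

print(gap_analysis(
    your_median_h=12,
    benchmark_median_h=3,
    phase_breakdown_h={"waiting_for_review": 7, "in_review": 3, "waiting_to_merge": 2},
))
# {'gap_multiple': 4.0, 'investigate_first': 'waiting_for_review', 'share_of_cycle_time': 0.58}

Steps 3 and 4 stay human: only you can decide whether the gap is intentional and whether it's worth fixing.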

Step 4: Set Targets That Don't Backfire

Here's where most teams go wrong. They see a benchmark, turn it into a target, and watch their team game the metric. Don't do this.

Wrong Way: Benchmark as Target

❌ "Our OKR: Achieve 3-hour median cycle time by Q2"

What happens:
- PRs get smaller (gaming)
- Reviews get rushed (quality drops)
- People stop opening PRs until ready to merge (hiding work)
- Numbers improve, delivery doesn't

Right Way: Improvement from Baseline

✅ "Our goal: Reduce median cycle time by 25% from current baseline"

Why this works:
- Progress is relative to YOUR starting point
- No arbitrary external number to game
- Improvement is always possible
- Context is preserved

"The benchmark tells you what's possible. Your baseline tells you where to start. The target should be a percentage improvement, not the benchmark itself."

Target-Setting Formula

  1. If you're within 2x of benchmark: Target 15-25% improvement over 6 months
  2. If you're 2-5x above benchmark: Target 30-40% improvement, focus on the biggest bottleneck phase
  3. If you're 5x+ above benchmark: Don't target the number. Target fixing one root cause (e.g., "reduce waiting time by 50%")
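
The same formula as a minimal sketch; the tiers just restate the three rules above, and the thresholds are guidelines, not hard laws.

def suggest_target(baseline_median_h, benchmark_median_h):
    # Tiers restate the target-setting formula above
    ratio = baseline_median_h / benchmark_median_h
    if ratio <= 2:
        return "Target a 15-25% reduction from baseline over ~6 months"
    if ratio <= 5:
        return "Target a 30-40% reduction; focus on the biggest bottleneck phase"
    return "Don't target the number; fix one root cause (e.g. cut waiting time by 50%)"

print(suggest_target(baseline_median_h=8, benchmark_median_h=3))
# Ratio ~2.7x -> "Target a 30-40% reduction; focus on the biggest bottleneck phase"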

Step 5: Communicate Benchmarks to Leadership

Executives want context, not excuses. Here's how to present benchmark comparisons without sounding defensive or creating unrealistic expectations.

The Leadership Benchmark Brief

Executive Summary Template:

CURRENT STATE
Our median cycle time: 8 hours
Industry benchmark (reviewed PRs): 3 hours
Gap: 2.7x

CONTEXT
- We require 2 reviewers (benchmark assumes 1)
- Our PRs average 180 lines (benchmark median: 54)
- 30% of our PRs touch compliance-sensitive code

ROOT CAUSE
Primary bottleneck: Waiting for review (65% of cycle time)
This is a reviewer availability issue, not a process issue.

IMPROVEMENT PLAN
Target: Reduce to 5 hours (-37%) by end of Q2
Actions:
1. Implement review load balancing
2. Add PR size guidelines (<200 lines)
3. Set 4-hour first-review SLA

EXPECTED OUTCOME
Based on similar teams, these changes typically yield
30-50% cycle time reduction within 90 days.

What Leadership Wants to Know

  • "How do we compare?" → Give them the benchmark with context
  • "Why the gap?" → Explain intentional vs. fixable differences
  • "What are we doing about it?" → Specific actions, not vague plans
  • "When will we improve?" → Realistic timeline with caveats

For more on executive communication, see our Board-Ready Engineering Metrics guide.

Step 6: Prevent Gaming Before It Starts

Every metric will be gamed if incentives are wrong. Here's how to use benchmarks without creating perverse incentives.

Gaming Patterns to Watch For

If You Measure...      Teams Might Game By...                        Prevention
Cycle time             Splitting PRs artificially, rushing reviews   Also track PR size, review depth, rework rate
PR count               Tiny PRs, splitting one change into many      Track throughput (lines shipped), not count
Review time            Rubber-stamp approvals                        Track review comments, change failure rate
Deployment frequency   Empty deploys, feature-flagging everything    Track deployments that contain actual changes
[Figure: Anti-gaming metric bundle. Cycle Time is the primary metric, balanced by PR Size (watch for artificial splitting), Rework Rate (watch for rushed reviews), and Review Depth (watch for rubber-stamp approvals).]
Always track balancing metrics together. If cycle time improves but PR size drops and rework rises, that's gaming.
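
A minimal sketch of that balancing check, assuming you snapshot these numbers each quarter; the thresholds are arbitrary examples, not research-backed cutoffs.

def looks_like_gaming(prev, curr):
    # Cycle time "improved", but a balancing metric moved the wrong way
    faster = curr["median_cycle_time_h"] < prev["median_cycle_time_h"]
    prs_shrank = curr["median_pr_size_lines"] < 0.7 * prev["median_pr_size_lines"]
    rework_rose = curr["rework_rate"] > 1.2 * prev["rework_rate"]
    return faster and (prs_shrank or rework_rose)

prev = {"median_cycle_time_h": 12, "median_pr_size_lines": 180, "rework_rate": 0.08}
curr = {"median_cycle_time_h": 6, "median_pr_size_lines": 90, "rework_rate": 0.12}
print(looks_like_gaming(prev, curr))  # True: faster, but PRs shrank and rework rose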

Anti-Gaming Principles

  1. Never use metrics for individual performance reviews. The moment Alice's bonus depends on cycle time, she'll optimize cycle time—not delivery.
  2. Track metric bundles, not single metrics. If cycle time improves but rework rate spikes, you haven't improved—you've just moved the problem.
  3. Make targets team-level, not individual. Teams can self-regulate; individuals will compete.
  4. Celebrate learning, not hitting numbers. "We found our bottleneck" is a win, even if the metric didn't improve yet.

For a deeper dive on this topic, see our Goodhart's Law in Engineering Metrics guide.

Step 7: Review and Recalibrate Quarterly

Benchmarks shift. Your context changes. What was relevant last year might not be now. Build a quarterly review habit. The teams that improve consistently aren't the ones with the best metrics—they're the ones who actually look at the data every quarter.

Quarterly Benchmark Review Checklist

Every quarter, ask:

CONTEXT CHECK
□ Has our team size changed significantly?
□ Have we added new compliance requirements?
□ Did we change our PR/review process?
□ Are we comparing to the same benchmark type?

PROGRESS CHECK
□ Did we improve from last quarter's baseline?
□ Are we closer to or further from the benchmark?
□ Did any improvements stick, or did we regress?
□ What worked? What didn't?

RECALIBRATION
□ Should we adjust our target based on learnings?
□ Is the benchmark still relevant to our context?
□ Do we need to add/remove metrics from our bundle?
□ What's the #1 focus for next quarter?

Quick Reference: 2025 Benchmarks

For detailed benchmark data, charts, and methodology, see our full 2025 Engineering Benchmarks research report. Here's the quick reference:

Metric               Enterprise Benchmark   Notes
Median Cycle Time    3 hours                For PRs with code review
P90 Cycle Time       149 hours              ~6 days for the slowest 10%
Waiting for Review   0.6 hours              Median time to first review
Self-Merge Rate      52%                    For reviewed PRs (71% globally)
Review Coverage      14.6%                  % of PRs with formal review

FAQ

Should I share benchmarks with my team?

Yes, but with context. Share the benchmark, explain why it may or may not apply to your situation, and focus on improvement from baseline rather than hitting the external number.

What if we're way above benchmark and can't improve?

First, verify the benchmark is appropriate for your context. If you're a regulated industry with mandatory review processes, you might never hit startup benchmarks—and that's fine. Document why your context is different and track improvement within your constraints.

How do I know if a benchmark is trustworthy?

Look for: sample size, methodology transparency, date of data, and whether they distinguish between different types of teams. Survey-based benchmarks tend to be optimistic; actual measured data (like ours from GitHub Archive) is more reliable.

Should I benchmark against competitors?

Usually not directly—you don't have their data. Instead, benchmark against industry averages and track your own improvement. If you're shipping faster than last quarter, you're winning regardless of what competitors do.

See these insights for your team

CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.

Free tier available. No credit card required.