CodePulse Research

2025 Engineering Benchmarks: Year in Review

What changed in how software ships—and what to watch in 2026

90% No Review (1000+ line PRs), +7pp vs 2024

71% Self-Merged (all PRs), +3pp vs 2024

20x Less Scrutiny (large PRs), similar to 2024

38% Longer Wait (new contributors), improved from 53%

Based on analysis of 802,979 merged PRs from GitHub Archive / BigQuery | October 2025

Compare with 2024 Study (3,387,250 PRs)

Executive Summary

A year after our landmark 2024 study, we analyzed over 800,000 merged pull requests from GitHub's public archive to see how code review practices have evolved. The trends are clear—and concerning.

The 2025 story: Self-merge rates climbed to 71%, massive PRs now ship without review 90% of the time, and bot PRs collapsed to just 15.5%. The automation boom is over—but the review gap is widening.

The Big Picture: 2024 → 2025

Here's what changed in the past year. Some trends accelerated, others reversed—but the overall picture is one of faster shipping with less oversight.

Self-Merge Rate: 71.48%, up from 68.03% (+3.45 pp). More code shipping without peer review.

Bot PRs: 15.5%, down from 37.87% (-22.37 pp). Dramatic decline in automated PRs.

No Review (1000+ lines): 90.3%, up from 82.6% (+7.7 pp). Large PRs getting even less scrutiny.

First-Timer Wait Penalty: 37.5%, down from 53% (-15.5 pp). Onboarding experience improving.

Same-Day Merges: 55.27%, up from 52.67% (+2.6 pp). Faster merge cycles.

Peak Merge Day: Wednesday (was Monday). Merge patterns shifted mid-week.

Two Benchmarks: Global GitHub vs Enterprise-Style Teams

GitHub's 800K+ monthly PRs include everything from solo hobby projects to enterprise teams. Comparing your team to "GitHub average" can be misleading. We've split the data into two benchmarks: all PRs (the full picture) and reviewed PRs only (more representative of team-based development).

All GitHub PRs (includes solo projects, hobby repos, and self-merged code):

Total PRs: 802,979
Median Cycle Time: 0h (instant)
Self-Merge Rate: 71.48%
% with Review: 14.6%

Reviewed PRs Only (PRs with code review; more representative of team workflows):

Total PRs: 117,413
Median Cycle Time: 3h
Self-Merge Rate: 52.11%
Weekend Merges: 17.18%

✓ Recommended benchmark for team comparisons

Why this matters: The "0 hour median cycle time" for all GitHub PRs reflects reality—most code ships instantly without review. But if your team does code reviews, the 3-hour median for reviewed PRs is a more meaningful benchmark. First-timers in reviewed repos wait 15.2 hours (vs 1.4h for repeat contributors)—a 10.9x penalty.
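
If you want to reproduce this split yourself, a minimal BigQuery sketch against the public githubarchive dataset follows. It treats a PR as "reviewed" when the event payload reports a nonzero review_comments count; that is a simplifying proxy for the study's fuller definition (approvals and change requests would require joining PullRequestReviewEvent rows):

```sql
-- Sketch: median cycle time for all merged PRs vs. PRs with review
-- comments, from GH Archive's public BigQuery mirror (October 2025).
WITH merged AS (
  SELECT
    TIMESTAMP_DIFF(
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_at')),
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at')),
      MINUTE) / 60.0 AS cycle_hours,
    CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.review_comments') AS INT64) AS review_comments
  FROM `githubarchive.month.202510`
  WHERE type = 'PullRequestEvent'
    AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
    AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
)
SELECT
  COUNT(*) AS all_prs,
  APPROX_QUANTILES(cycle_hours, 100)[OFFSET(50)] AS median_all_h,
  -- APPROX_QUANTILES ignores NULLs, so this restricts to reviewed PRs
  APPROX_QUANTILES(IF(review_comments > 0, cycle_hours, NULL), 100)[OFFSET(50)] AS median_reviewed_h
FROM merged;
```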

Cycle Time Breakdown: Where Time Goes

For PRs that go through code review, we can break down the cycle into three phases. This is the real data your team can benchmark against—and it maps directly to DORA's Lead Time for Changes metric.

Cycle Time Phases (Reviewed PRs, October 2025)

Waiting for Review: 0.6h median / 128.9h P90
In Review: 0h median / 9.4h P90
Merge Delay: 0.1h median / 19.6h P90
Total Cycle: 3h median / 148.9h P90

Based on 117,413 PRs that received at least one code review. The P90 (90th percentile) shows what "slow" looks like—useful for SLA planning.
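
The "waiting for review" phase is the one most teams can act on directly. As an illustration of how it can be measured, the sketch below takes each PR's first submitted review from PullRequestReviewEvent payloads; the (repo, PR number) join key and field paths are assumptions based on GH Archive's event format, not the study's published query, and the result covers only PRs that received at least one review:

```sql
-- Sketch: median and P90 "waiting for review" (PR opened -> first review).
WITH first_review AS (
  SELECT
    repo.name AS repo,
    JSON_EXTRACT_SCALAR(payload, '$.pull_request.number') AS pr_number,
    MIN(TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.review.submitted_at'))) AS first_review_at,
    -- the review event embeds the PR object, including its creation time
    MIN(TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at'))) AS opened_at
  FROM `githubarchive.month.202510`
  WHERE type = 'PullRequestReviewEvent'
  GROUP BY repo, pr_number
)
SELECT
  APPROX_QUANTILES(TIMESTAMP_DIFF(first_review_at, opened_at, MINUTE) / 60.0, 100)[OFFSET(50)] AS wait_median_h,
  APPROX_QUANTILES(TIMESTAMP_DIFF(first_review_at, opened_at, MINUTE) / 60.0, 100)[OFFSET(90)] AS wait_p90_h
FROM first_review;
```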

Cycle Time by PR Size (Reviewed PRs)

[Chart: median hours by PR size, from Tiny (<10 lines) through Massive (1000+ lines)]

Larger PRs take longer—but XL PRs (8.7h) actually wait longer than Massive (7.3h), suggesting massive PRs may be auto-generated or batch imports.

DORA Connection: Lead Time for Changes

Our "Total Cycle Time" maps to DORA's Lead Time for Changes (code committed → running in production). For reviewed PRs, the 3-hour median and 149-hour P90 give you real benchmarks. Elite teams target under 1 hour; high performers under 1 day; medium performers under 1 week.

The Review Crisis: 84-90% of PRs Ship Without Formal Review

The uncomfortable truth from 2024 has gotten worse. Across all PR sizes, the vast majority ship without any documented review process. For massive PRs (1000+ lines), 90% now ship without formal review—up from 83% in 2024.

PRs Without Formal Review by Size

[Chart: share of PRs without formal review, by size bucket from <10 to 1000+ lines]

Based on 788,166 merged PRs. "No formal review" = zero approvals, zero change requests, zero review comments.
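
A rough version of this breakdown can be computed directly from the event payloads. The sketch below counts a merged PR as unreviewed when review_comments is zero, which slightly overstates the unreviewed share relative to the study's definition (it misses comment-free approvals and change requests); the size buckets mirror the chart above:

```sql
-- Sketch: % of merged PRs with zero review comments, by size bucket.
WITH merged AS (
  SELECT
    CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.additions') AS INT64)
      + CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.deletions') AS INT64) AS size,
    CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.review_comments') AS INT64) AS review_comments
  FROM `githubarchive.month.202510`
  WHERE type = 'PullRequestEvent'
    AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
    AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
)
SELECT
  CASE
    WHEN size < 10 THEN '<10'
    WHEN size < 50 THEN '10-50'
    WHEN size < 200 THEN '50-200'
    WHEN size < 500 THEN '200-500'
    WHEN size < 1000 THEN '500-1K'
    ELSE '1000+'
  END AS bucket,
  ROUND(100 * COUNTIF(review_comments = 0) / COUNT(*), 1) AS pct_no_review
FROM merged
GROUP BY bucket
ORDER BY MIN(size);
```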

"90% of pull requests over 1,000 lines ship without any code review—up from 83% last year."

What teams can do

  1. Set up branch protection rules requiring at least one approval before merge
  2. Use CODEOWNERS files to ensure domain experts review their areas
  3. Track your team's review coverage rate as a metric (aim for >90% reviewed)

The Self-Merge Reality: 71% of Code Ships Without a Second Pair of Eyes

Self-merge rates climbed from 68% in 2024 to 71% in 2025. Nearly three-quarters of all code now ships without another developer clicking the merge button. The "code review culture" many teams claim to have is increasingly a fiction.

Who Merges the Code?

Self-merged (author = merger): 573,883 PRs (71.48%)

Merged by someone else: 228,935 PRs (28.52%)

Up 3.5 percentage points from 2024 (68.03%)
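
Self-merge is one of the easier metrics to reproduce, since the merged-PR payload names both the author and the merger. A minimal sketch, assuming the merged_by field is populated (it is through October 2025; see the data-quality alert below):

```sql
-- Sketch: self-merge rate = author login equals merged_by login.
SELECT
  COUNTIF(author = merger) AS self_merged,
  COUNT(*) AS merged_total,
  ROUND(100 * COUNTIF(author = merger) / COUNT(*), 2) AS self_merge_pct
FROM (
  SELECT
    JSON_EXTRACT_SCALAR(payload, '$.pull_request.user.login') AS author,
    JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_by.login') AS merger
  FROM `githubarchive.month.202510`
  WHERE type = 'PullRequestEvent'
    AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
    AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
);
```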

"Self-merge rates hit 71%, meaning nearly three-quarters of code gets no peer review before shipping."

The Size Paradox: Large PRs Get 20x Less Scrutiny

The pattern from 2024 holds: the bigger the change, the less anyone looks at it. Review comments per 100 lines drops from 0.98 for tiny PRs to just 0.05 for massive ones—a 20x reduction in scrutiny per line.

Review Comments per 100 Lines of Code

[Chart: review comments per 100 lines, by size bucket from <10 to 1000+ lines]

As PR size increases, reviewers leave exponentially fewer comments per line of code—suggesting cognitive overload and "rubber-stamping."
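
Comment density per line is a simple ratio over the same payload fields. A sketch of the calculation, reusing the size buckets from the earlier no-review sketch (the study's exact bucketing and filters may differ):

```sql
-- Sketch: review comments per 100 changed lines, by PR size bucket.
WITH merged AS (
  SELECT
    CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.additions') AS INT64)
      + CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.deletions') AS INT64) AS size,
    CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.review_comments') AS INT64) AS review_comments
  FROM `githubarchive.month.202510`
  WHERE type = 'PullRequestEvent'
    AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
    AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
)
SELECT
  CASE
    WHEN size < 10 THEN '<10'
    WHEN size < 50 THEN '10-50'
    WHEN size < 200 THEN '50-200'
    WHEN size < 500 THEN '200-500'
    WHEN size < 1000 THEN '500-1K'
    ELSE '1000+'
  END AS bucket,
  ROUND(100 * SUM(review_comments) / SUM(size), 2) AS comments_per_100_lines
FROM merged
WHERE size > 0
GROUP BY bucket
ORDER BY MIN(size);
```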

Time to Merge by PR Size

[Chart: median hours to merge by PR size, from Tiny (<10 lines) through Massive (1000+ lines)]

Larger PRs take longer to merge (13h median for tiny vs 22h for massive), but not proportionally to their size—suggesting review depth doesn't scale with complexity.

What teams can do

  1. Enforce PR size limits (e.g., warning at 400 lines, hard block at 1000)
  2. Break large features into stacked PRs or feature flags
  3. Require multiple reviewers for PRs above a size threshold

The First-Contribution Tax: New Contributors Wait 38% Longer

Good news: the first-contributor wait penalty dropped from 53% in 2024 to 38% in 2025. Teams are getting better at onboarding new contributors, but the gap remains significant.

Time to Merge: First-Time vs Repeat Contributors

First-time contributors: 22h median time to merge, 280h P90 (11.7 days), 21,289 PRs analyzed

Repeat contributors: 16h median time to merge, 175h P90 (7.3 days), 171,251 PRs analyzed

First-time contributors wait 37.5% longer (6 extra hours), down 15.5 pp from 2024
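
Detecting first-time contributors in GH Archive is a windowing exercise: within the analysis period, an author's earliest merged PR to a repo counts as their first, matching the definition in the Methodology section. A sketch:

```sql
-- Sketch: median time to merge, first PR per (repo, author) vs. repeat PRs.
WITH merged AS (
  SELECT
    repo.name AS repo,
    JSON_EXTRACT_SCALAR(payload, '$.pull_request.user.login') AS author,
    TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at')) AS opened_at,
    TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_at')) AS merged_at
  FROM `githubarchive.month.202510`
  WHERE type = 'PullRequestEvent'
    AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
    AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
)
SELECT
  is_first,
  APPROX_QUANTILES(TIMESTAMP_DIFF(merged_at, opened_at, MINUTE) / 60.0, 100)[OFFSET(50)] AS median_h
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY repo, author ORDER BY opened_at) = 1 AS is_first
  FROM merged
)
GROUP BY is_first;
```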

"Good news for newcomers: first-time contributor wait times dropped from 53% to 38% longer than veterans."

Wednesday is the New Monday: 1 in 4 PRs Merge Mid-Week

A significant shift from 2024: Wednesday has become the peak merge day at 23.55%, dethroning Monday (which held at 19% in 2024). This suggests teams are moving away from "clear the backlog Monday morning" toward more distributed workflows.

Merges by Day of Week

[Chart: merge share by day of week, Sunday through Saturday]

Wednesday dominates at 23.55%, followed by Tuesday (14.33%) and Monday (13.91%). In 2024, Monday led at 19.08%.
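
Day-of-week is computed from the merge timestamp in UTC (see limitation 3 in the Methodology). A sketch:

```sql
-- Sketch: share of merges by day of week (UTC).
SELECT
  FORMAT_TIMESTAMP('%A',
    TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_at'))) AS day,
  ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_merges
FROM `githubarchive.month.202510`
WHERE type = 'PullRequestEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
GROUP BY day
ORDER BY pct_of_merges DESC;
```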

"Wednesday is the new Monday: 23.5% of PRs now merge mid-week, shifting from the traditional Monday peak."

The Bot Collapse: From 62% to 15.5% in Three Years

The most dramatic change since 2022: bot PRs (Dependabot, Renovate, CI automation) have collapsed from 62% at peak to just 15.5% in October 2025. The automation boom is definitively over. Teams are becoming far more selective about automated dependency updates.

Bot PRs as Percentage of All PRs (2020-2025)

[Chart: bot PR share by year, 2020 through 2025]

2022 saw peak bot activity at 62%. By 2025, bot PRs dropped to 31.5% (full year) and just 15.5% in October—a four-fold reduction from peak.

Bot PRs (Oct 2025): 15.5% (475,905 PRs)

Human PRs (Oct 2025): 84.5% (2,594,901 PRs)
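
The study does not publish its bot classifier; a common heuristic, used in the sketch below as an assumption, is to match GitHub App logins ending in "[bot]" (dependabot[bot], renovate[bot], and similar). The population here is opened PRs, since the chart above counts all PRs rather than merged ones:

```sql
-- Sketch: bot share of opened PRs, via the '[bot]' login suffix heuristic.
SELECT
  COUNTIF(ENDS_WITH(author, '[bot]')) AS bot_prs,
  COUNT(*) AS all_prs,
  ROUND(100 * COUNTIF(ENDS_WITH(author, '[bot]')) / COUNT(*), 1) AS bot_pct
FROM (
  SELECT JSON_EXTRACT_SCALAR(payload, '$.pull_request.user.login') AS author
  FROM `githubarchive.month.202510`
  WHERE type = 'PullRequestEvent'
    AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'opened'
);
```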

"Bot PRs collapsed from 62% (2022 peak) to just 15.5% in 2025. The automation boom is over."

The Always-On Culture: 25% of Code Ships on Weekends

A quarter of all code pushes happen on Saturday and Sunday. The "always-on" engineering culture continues, though weekend work has slightly decreased from 27% in 2024.

25.4% Weekend Pushes (down from 27% in 2024)

Wednesday: Peak Merge Day (23.55% of merges)

13.12% Friday Merges ("Friday deploy" lives)

Code Push Distribution by Day

[Chart: code push share by day of week, Sunday through Saturday]

"Same-day merges hit 55%, up from 53%. Code is shipping faster than ever."

Language Leaderboard: Fastest to Slowest

Different language ecosystems have different velocities. PowerShell leads with a 4h median (likely CI/automation scripts), while C is slowest at 24h (rigorous systems programming).

Language | Merged PRs | Median Hours | Avg PR Size
PowerShell | 1,292 | 4h | 234 lines
Dockerfile | 1,174 | 11h | 110 lines
Shell | 5,230 | 14h | 148 lines
Nix | 1,366 | 14h | 77 lines
TypeScript | 43,177 | 15h | 503 lines
JavaScript | 16,904 | 15h | 461 lines
Ruby | 3,534 | 15h | 174 lines
HCL | 1,492 | 15h | 133 lines
YAML | 1,416 | 15h | 29 lines
C# | 6,762 | 16h | 463 lines
HTML | 5,903 | 16h | 447 lines
Python | 29,761 | 17h | 371 lines
Vue | 1,575 | 17h | 526 lines
Scala | 1,466 | 17h | 139 lines
CSS | 1,397 | 17h | 496 lines
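
Language attribution in a leaderboard like this can be approximated with the primary language of the PR's base repository, as reported in the event payload; that field choice is our assumption about the simplest way to reproduce the table, not necessarily the study's method:

```sql
-- Sketch: median merge hours by the base repo's primary language.
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.language') AS language,
  COUNT(*) AS merged_prs,
  APPROX_QUANTILES(
    TIMESTAMP_DIFF(
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_at')),
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at')),
      MINUTE) / 60.0, 100)[OFFSET(50)] AS median_hours
FROM `githubarchive.month.202510`
WHERE type = 'PullRequestEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.language') IS NOT NULL
GROUP BY language
HAVING COUNT(*) >= 1000  -- keep ecosystems with a meaningful sample
ORDER BY median_hours;
```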

"PowerShell remains the speed king at 4 hours median merge time—6x faster than C at 24 hours."

Spotlight: How AI Developer Tools Ship Code

We took a closer look at three prominent AI-powered developer tools to see how their engineering teams ship compared to the GitHub average. The results reveal fascinating differences in velocity, review culture, and automation patterns.

Note: AI repos analysis uses January - October 2025 data to capture full activity for newer repos, while general GitHub stats are from October 2025.

Codex: Faster Than Enterprise Teams

2.3h vs the 3h reviewed-PR benchmark: OpenAI ships 23% faster than typical team workflows.

Gemini: Exceptional Review Culture

86% vs the 14.6% GitHub average: 86% of Gemini PRs go through review (6x the GitHub rate).

AI Tools Are Human-Driven

<3% bot PRs vs the 15.5% GitHub average: bot PRs are nearly absent; these are human-crafted projects.

Gemini Reviews Are Thorough

65% vs the 32% reviewed-PR benchmark: 65% of Gemini PRs have review comments (2x typical).

Repo | Contributors | Merged PRs | Median Merge | Self-Merge % | No Review % | Bot PRs %
OpenAI Codex | 453 | 1,031 | 11h | 73.5% | 73.6% | 2.9%
Gemini CLI | 523 | 948 | 21h | 58.4% | 35.1% | 2.4%
Claude Code | 53 | 54 | 100h | 68.5% | 81.5% | 0.0%
Reviewed PRs Benchmark | n/a | n/a | 3h | 52.1% | 67.8% | 15.5%

Cycle Time Comparison: AI Tools vs Enterprise Benchmark

How do AI tool repos compare to the reviewed PRs benchmark (3h median)?

[Chart: stacked median cycle time (Waiting, In Review, Merge) for OpenAI Codex, Gemini CLI, Claude Code, and the Reviewed PRs benchmark]

Claude Code: 0.9h (70% faster than benchmark)

Codex: 2.3h (23% faster than benchmark)

Gemini: 7.8h (2.6x longer; thorough reviews)

What we learned from AI tool repos

  • Claude Code ships fastest at 0.9h median—70% faster than the enterprise benchmark. Small PRs (18 lines median) enable rapid iteration.
  • Gemini CLI has exceptional review culture—only 35% of PRs ship without review comments vs 68% for reviewed repos. Google's code review rigor shows.
  • Codex balances speed and review—53% of PRs have reviews (vs 14.6% GitHub-wide) while maintaining 2.3h median cycle time.
  • All three are human-driven projects—bot PRs under 3% vs 15.5% average. AI tools are being hand-crafted by humans.

"Gemini CLI has 86% review engagement vs 14.6% GitHub average—6x the rate of typical open source."

Data Quality Alert: GitHub Archive Degradation


Starting November 2025, GitHub Archive event payloads have significantly reduced detail. Fields including user.login, additions, deletions, merged_by, and review_comments are no longer populated.

Affected Months: November 2025, December 2025

Last Good Month: October 2025

Impact: Many PR metrics (self-merge rate, PR size analysis, contributor analysis) cannot be calculated from November 2025 onwards.

Recommendation: Teams relying on GitHub Archive for analytics should monitor data quality and consider alternative data sources.

"GitHub Archive data quality degraded in late 2025—a warning sign for the analytics ecosystem."

Methodology

Data Source

All data comes from GitHub Archive, a public dataset that records all public GitHub events. We queried the data using Google BigQuery's public dataset.
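
For reference, queries of the following shape pull the base population of merged PRs from the monthly archive table. This is a sketch of the approach rather than the study's exact query, and field availability varies by month (see the data-quality alert above):

```sql
-- Sketch: base population of merged PRs for October 2025.
SELECT
  repo.name AS repo,
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.user.login') AS author,
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_by.login') AS merger,
  CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.additions') AS INT64) AS additions,
  CAST(JSON_EXTRACT_SCALAR(payload, '$.pull_request.deletions') AS INT64) AS deletions,
  TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at')) AS opened_at,
  TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_at')) AS merged_at
FROM `githubarchive.month.202510`
WHERE type = 'PullRequestEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true';
```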

Sample Size

Metric | Sample Size
PR size and review analysis | 802,979 merged PRs
Unique repositories | 262,212 repositories
Unique authors | 191,099 developers
Self-merge analysis | 802,818 merged PRs
Contributor analysis | 192,540 merged PRs

Definitions

  • "No formal review": Zero approvals, zero change requests, and zero review comments on the PR
  • "Self-merged": The PR author is the same person who clicked merge
  • "First-time contributor": This is the author's first PR to this repository (within the analysis period)
  • PR size: additions + deletions in lines of code

YoY Comparison Notes

2024 data comes from December 2024 (3,387,250 PRs). 2025 data comes from October 2025 (802,979 PRs). Different months may have seasonal variations.

Limitations

  1. Open source ≠ Enterprise: Public repos have different dynamics than private enterprise codebases. Volunteer contributors, different review cultures, and varying governance models affect metrics.
  2. October vs December: Different months may have different patterns. October avoids holiday effects but may miss end-of-year patterns.
  3. UTC timezone: All time-based analysis uses UTC, which may not reflect local business hours for globally distributed teams.
  4. Survivorship bias: We only analyzed merged PRs. Abandoned or rejected PRs may have different patterns.
  5. Data quality degradation: November-December 2025 GitHub Archive data has reduced payload detail, which is why we used October 2025 data.

How Does Your Team Compare?

These benchmarks come from public open source projects. How do your private repositories stack up? CodePulse tracks review coverage, self-merge rates, and contributor wait times for your team.

No credit card required. 5-minute setup. Read-only GitHub permissions.

Previous Research

See how 2025 compares to our landmark 2024 study that first revealed the review gap.

Read: 3.4 Million PRs, One Uncomfortable Truth (2024)