Code Review Quality

The Rubber Stamp Problem

The largest PRs receive 18x fewer review comments per line of code than the smallest. The bigger the change, the less anyone looks.

18x

Less Scrutiny

for massive PRs (1000+ lines) vs tiny PRs (<10 lines)

Based on 3,387,250 merged PRs | GitHub Archive / BigQuery | December 2024

The Scrutiny Cliff

As pull requests grow larger, review quality doesn't just decline; it collapses. Our analysis shows that review comments per 100 lines of code drop from 0.91 for tiny PRs to just 0.05 for massive PRs.

Review Comments per 100 Lines of Code

PR Size            | PRs Analyzed | No Review % | Comments/100 Lines
-------------------|--------------|-------------|-------------------
Tiny (<10 lines)   | 998,205      | 70.6%       | 0.91
Small (10-50)      | 763,798      | 72.6%       | 0.64
Medium (50-200)    | 670,849      | 74.7%       | 0.41
Large (200-500)    | 377,381      | 76.5%       | 0.25
XL (500-1000)      | 204,532      | 78.0%       | 0.16
Massive (1000+)    | 372,485      | 82.6%       | 0.05
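
The metric behind the chart is simple: review comments divided by lines changed, scaled to 100 lines. Here is a minimal sketch in Python using the published per-bucket rates from the table (the function and variable names are ours, not part of the study):

```python
# Comments per 100 lines for each size bucket, copied from the table above.
buckets = {
    "Tiny (<10 lines)": 0.91,
    "Small (10-50)":    0.64,
    "Medium (50-200)":  0.41,
    "Large (200-500)":  0.25,
    "XL (500-1000)":    0.16,
    "Massive (1000+)":  0.05,
}

def comments_per_100_lines(total_comments: int, total_lines_changed: int) -> float:
    """The underlying metric: review comments normalized by change size."""
    return 100 * total_comments / total_lines_changed

# The headline ratio: tiny PRs draw ~18x more comments per line than massive ones.
ratio = buckets["Tiny (<10 lines)"] / buckets["Massive (1000+)"]
print(f"{ratio:.0f}x")  # -> 18x (0.91 / 0.05 = 18.2)
```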

Why This Happens

The phenomenon isn't surprising when you consider human psychology. Reviewing code requires sustained attention and mental effort. When faced with a 1000+ line diff, reviewers experience what researchers call cognitive overload.

Review Fatigue

Studies show attention declines sharply after 200-400 lines. Beyond that, reviewers shift from "finding issues" to "getting through it."

Context Loss

Large PRs often touch many files. Reviewers lose the mental model of how pieces connect, making it harder to spot subtle bugs.

Time Pressure

A 1000-line PR might take hours to review properly. Busy engineers often approve quickly to unblock the author.

Trust Assumptions

"If they submitted this much code, they probably tested it thoroughly." Ironically, larger changes are more likely to contain bugs.

"Large PRs receive 18x fewer review comments per line of code. The biggest changes get the least attention. That's backwards."

The Security Implications

When 82.6% of massive PRs ship without a single review comment, security vulnerabilities slip through. This isn't theoretical: unreviewed changes have played a part in high-profile breaches.

What Ships Unreviewed

  • Large refactoring PRs that introduce subtle regressions
  • Framework upgrades with hidden breaking changes
  • Bulk feature additions with security edge cases
  • Generated code or vendor updates assumed to be safe

🔥 Our Take

If your team treats all PRs equally, you're doing it wrong.

Small PRs need quick turnaround. Large PRs need explicit review allocation: multiple reviewers, dedicated time, or automatic flags for additional scrutiny. One-size-fits-all review processes guarantee that your biggest changes get rubber-stamped.

What Teams Can Do

Set PR Size Limits

Enforce soft limits (warnings at 400 lines) and hard limits (block at 800 lines). Use tools like GitHub Actions to automatically flag oversized PRs.
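
As a sketch of what that flag can look like: GitHub Actions exposes the pull request payload (including additions and deletions) to workflow steps via GITHUB_EVENT_PATH, so a size check is a few lines of Python. The script name and thresholds below are illustrative, not a specific product's check:

```python
# pr_size_check.py -- run as a step in a `pull_request` workflow.
# Warn above the soft limit; fail the check above the hard limit.
import json
import os
import sys

SOFT_LIMIT = 400  # lines changed: warn
HARD_LIMIT = 800  # lines changed: block

# GitHub Actions writes the triggering webhook payload to this path.
with open(os.environ["GITHUB_EVENT_PATH"]) as f:
    pr = json.load(f)["pull_request"]

lines_changed = pr["additions"] + pr["deletions"]

if lines_changed > HARD_LIMIT:
    print(f"::error::{lines_changed} lines changed exceeds the hard limit of {HARD_LIMIT}. Split this PR.")
    sys.exit(1)
if lines_changed > SOFT_LIMIT:
    print(f"::warning::{lines_changed} lines changed exceeds the soft limit of {SOFT_LIMIT}. Consider splitting.")
```

Marking the check as required in branch protection is what turns the hard limit into an actual block rather than a suggestion.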

Use Stacked Diffs

Break large changes into a chain of small, reviewable PRs. Tools like Graphite and gh-stack make this workflow manageable.
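
If a dedicated tool is overkill, the core of the workflow can be scripted with plain git plus the GitHub CLI: each branch in the stack contains one slice of the change, and each PR's base is the branch beneath it, so reviewers only ever see one slice's diff. A rough sketch (branch names and titles are illustrative):

```python
# stack_prs.py -- open a chain of PRs, each diffed against the previous branch.
import subprocess

# Ordered stack: each branch builds on the one before it.
stack = [
    "refactor-1-extract-module",
    "refactor-2-new-api",
    "refactor-3-migrate-callers",
]

base = "main"
for branch in stack:
    subprocess.run(
        ["gh", "pr", "create",
         "--head", branch,
         "--base", base,  # reviewers see only this slice's diff
         "--title", f"Stacked: {branch}",
         "--body", "Part of a stacked change; review this slice on its own."],
        check=True,
    )
    base = branch  # the next PR in the stack diffs against this one
```

The bookkeeping cost: as lower branches change, the upper ones need rebasing onto them, which is exactly what Graphite-style tools automate.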

Scale Reviewers with Size

Large PRs should require multiple reviewers. Use CODEOWNERS files to automatically request appropriate domain experts.
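
CODEOWNERS handles routing by path; scaling the reviewer count with size takes a small automation on top. A hedged sketch against the GitHub REST API (owner, repo, threshold, and reviewer logins are all placeholders):

```python
# request_extra_reviewers.py -- request additional reviewers on oversized PRs.
import os
import requests

OWNER, REPO = "your-org", "your-repo"               # placeholders
EXTRA_REVIEWERS = ["senior-dev-1", "senior-dev-2"]  # placeholder logins
THRESHOLD = 500  # lines changed before extra reviewers are requested

HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def add_reviewers_if_large(pr_number: int) -> None:
    base = f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{pr_number}"
    pr = requests.get(base, headers=HEADERS).json()
    if pr["additions"] + pr["deletions"] > THRESHOLD:
        # POST /repos/{owner}/{repo}/pulls/{pull_number}/requested_reviewers
        resp = requests.post(
            f"{base}/requested_reviewers",
            headers=HEADERS,
            json={"reviewers": EXTRA_REVIEWERS},
        )
        resp.raise_for_status()
```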

Separate Mechanical Changes

Renames, formatting, and dependency updates should be separate PRs that can be reviewed quickly. Don't mix them with feature work.

Methodology

This analysis is based on 3,387,250 merged pull requests from GitHub Archive / BigQuery during December 2024. "Review comments" includes all PR review comments and inline code comments. PRs were bucketed by total lines changed (additions + deletions). For full methodology, see the complete study.
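
For reference, the bucketing reduces to a single function over lines changed; a minimal restatement in Python (the function name and the handling of exact boundary values are our assumptions):

```python
def size_bucket(additions: int, deletions: int) -> str:
    """Assign a PR to a size bucket by total lines changed (additions + deletions)."""
    lines = additions + deletions
    if lines < 10:
        return "Tiny (<10 lines)"
    if lines < 50:
        return "Small (10-50)"
    if lines < 200:
        return "Medium (50-200)"
    if lines < 500:
        return "Large (200-500)"
    if lines < 1000:
        return "XL (500-1000)"
    return "Massive (1000+)"
```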

See your team's PR size distribution

CodePulse shows you which PRs are getting rubber-stamped in your repos.