We analyzed Codex, Gemini CLI, and Claude Code. These AI tool repos ship up to 70% faster than the enterprise benchmark, with up to 6x the typical review engagement.
70% faster than the benchmark cycle time
Based on 2,033 merged PRs | GitHub Archive / BigQuery | January - October 2025
Three major AI companies have open-sourced their developer tools in 2025. We pulled their GitHub data to see how the teams building AI assistants actually ship code.
"Claude Code ships in 0.9 hours median—70% faster than the 3-hour enterprise benchmark."
How long does it take from PR creation to merge? We compared median cycle times for reviewed PRs (those receiving at least one code review) against the 3-hour enterprise benchmark.
Note: Claude Code has a smaller sample (54 PRs) but shows remarkably fast shipping for small, focused changes.
Median cycle time (PR creation to merge):
- Claude Code: 0.9h
- Codex: 2.3h
- Enterprise benchmark: 3h
- Gemini CLI: 7.8h
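Cycle time here is nothing exotic: merge timestamp minus creation timestamp, with the median taken over merged PRs. A minimal Python sketch of that computation, assuming ISO-8601 timestamps in the format GitHub's API returns (the sample values below are made up for illustration, not data from the study):

```python
from datetime import datetime
from statistics import median

GITHUB_TS = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by GitHub's API

def cycle_time_hours(created_at: str, merged_at: str) -> float:
    """Hours from PR creation to merge."""
    created = datetime.strptime(created_at, GITHUB_TS)
    merged = datetime.strptime(merged_at, GITHUB_TS)
    return (merged - created).total_seconds() / 3600

# Illustrative sample only.
prs = [
    ("2025-06-01T10:00:00Z", "2025-06-01T10:54:00Z"),
    ("2025-06-02T09:00:00Z", "2025-06-02T11:18:00Z"),
    ("2025-06-03T14:00:00Z", "2025-06-03T14:30:00Z"),
]
print(f"median cycle time: {median(cycle_time_hours(c, m) for c, m in prs):.1f}h")
```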
"Gemini CLI has 86% review engagement vs 14.6% GitHub average—6x the rate of typical open source."
Only 14.6% of PRs on GitHub get any code review. These AI tool repos are dramatically different—Gemini CLI reviews 86% of its PRs.
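For the engagement numbers, the logic is set membership: a PR counts as reviewed if at least one review event references it. A toy sketch of that classification in Python; the flat event records below are a simplification of GitHub Archive's actual rows, which carry these fields inside a JSON payload:

```python
# Simplified records; real GitHub Archive rows embed these fields in a JSON payload.
events = [
    {"type": "PullRequestReviewEvent", "pr_number": 101},
    {"type": "PullRequestReviewEvent", "pr_number": 101},
    {"type": "IssueCommentEvent", "pr_number": 102},
]
merged_prs = {101, 102, 103}

reviewed = {
    e["pr_number"]
    for e in events
    if e["type"] == "PullRequestReviewEvent" and e["pr_number"] in merged_prs
}
print(f"review engagement: {len(reviewed) / len(merged_prs):.0%}")  # 33% here
```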
Claude Code ships remarkably small PRs—a median of just 18 lines. This aligns with best practices for fast, reviewable changes.
Median PR size (lines changed):
- Claude Code: 18
- Codex: 54
- Gemini CLI: 74
- Enterprise benchmark: 54
Why it matters: Smaller PRs get reviewed faster, have fewer bugs, and are easier to revert. Claude Code's 18-line median shows a disciplined approach to incremental shipping.
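One way to turn that observation into practice is a CI guard that fails when a PR exceeds a size budget. A hedged sketch, assuming the PR branch is checked out with origin/main as the merge base; the 200-line budget is an arbitrary example, not a number from the study:

```python
import re
import subprocess
import sys

SIZE_BUDGET = 200  # example budget in changed lines; tune to your team

def pr_size(base: str = "origin/main") -> int:
    """Total lines added plus deleted on this branch relative to base."""
    out = subprocess.run(
        ["git", "diff", "--shortstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Typical output: " 3 files changed, 18 insertions(+), 4 deletions(-)"
    matches = re.findall(r"(\d+) (?:insertion|deletion)", out)
    return sum(int(n) for n in matches)

if __name__ == "__main__":
    size = pr_size()
    print(f"PR size: {size} lines (budget: {SIZE_BUDGET})")
    sys.exit(0 if size <= SIZE_BUDGET else 1)
```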
These AI tool teams demonstrate practices any engineering org can adopt. Here are the key takeaways:
- 2.3h vs the 3h reviewed-PR benchmark: OpenAI's Codex ships 23% faster than typical team workflows.
- 86% vs the 14.6% GitHub average: 86% of Gemini CLI PRs go through review (6x the GitHub rate).
- Under 3% vs the 15.5% GitHub average: bot PRs are nearly absent; these are human-crafted projects.
- 65% vs the 32% reviewed-PR benchmark: 65% of Gemini CLI PRs have review comments (2x typical).
AI tool teams are practicing what they preach.
These aren't just fast because they're small teams—they're fast because they've adopted the practices that matter: small PRs, high review engagement, and minimal automation overhead. The 0.9-hour median for Claude Code isn't magic; it's the result of shipping 18-line changes that anyone can review in minutes. If you want to ship faster, start by making your changes smaller.
Claude Code's 18-line median PRs prove that small changes ship faster. Break large features into reviewable chunks.
Gemini CLI's 86% review rate shows that review culture is intentional. Use branch protection and CODEOWNERS to make review the default (a sketch follows these takeaways).
Under 3% bot PRs in these repos vs 15.5% GitHub average. Keep automation PRs separate or batched to reduce review fatigue.
The 3-hour enterprise benchmark is your target. If you're above it, investigate where time is being lost; a sketch below shows how to check your own median against it.
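On making review the default: branch protection can require an approving review before merge. A minimal sketch against GitHub's REST API, assuming a token with admin access; `your-org/your-repo` is a placeholder, and the full protection schema has more knobs than shown here, so check GitHub's docs before relying on it:

```python
import requests

OWNER, REPO, BRANCH = "your-org", "your-repo", "main"  # placeholders
TOKEN = "..."  # a token with admin rights on the repo

resp = requests.put(
    f"https://api.github.com/repos/{OWNER}/{REPO}/branches/{BRANCH}/protection",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={
        # The endpoint expects all four top-level keys, nullable where unused.
        "required_status_checks": None,
        "enforce_admins": False,
        "required_pull_request_reviews": {"required_approving_review_count": 1},
        "restrictions": None,
    },
)
resp.raise_for_status()
print(f"review now required before merge on {BRANCH}")
```

Pairing this with a CODEOWNERS file then routes review requests to the right people automatically.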
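And on knowing where you stand: the same cycle-time arithmetic from earlier can run against your own recent PRs via GitHub's public PR listing endpoint. A sketch that also skips bot-authored PRs (logins ending in "[bot]"), matching the bot-filtering point above; unauthenticated requests are rate-limited, and only the last 100 closed PRs are sampled:

```python
from datetime import datetime
from statistics import median

import requests

resp = requests.get(
    "https://api.github.com/repos/your-org/your-repo/pulls",  # placeholder repo
    params={"state": "closed", "per_page": 100},
    headers={"Accept": "application/vnd.github+json"},
)
resp.raise_for_status()

fmt = "%Y-%m-%dT%H:%M:%SZ"
hours = [
    (datetime.strptime(pr["merged_at"], fmt)
     - datetime.strptime(pr["created_at"], fmt)).total_seconds() / 3600
    for pr in resp.json()
    if pr["merged_at"] and not pr["user"]["login"].endswith("[bot]")
]
if hours:
    print(f"median cycle time: {median(hours):.1f}h (benchmark: 3h)")
```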
Read more:
- The full study, with 800K+ PRs: how these AI repos compare to the broader ecosystem.
- Why the 3-hour median from reviewed PRs is the real benchmark for team workflows.
- Breaking down where PRs spend their time: 92% is waiting, not reviewing.
- 90% of massive PRs ship without review, and why small PRs like Claude Code's are better.
Methodology: This analysis is based on 802,979 PRs from GitHub Archive / BigQuery during January–October 2025. We analyzed merged pull requests from three repositories: openai/codex (1,031 PRs), google-gemini/gemini-cli (948 PRs), and anthropics/claude-code (54 PRs). Cycle time is measured from PR creation to merge. Review rate is the percentage of PRs receiving at least one review event. For full methodology, see the complete 2025 study.
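For readers who want to reproduce counts at this scale, GitHub Archive's public BigQuery dataset can be queried directly. A hedged sketch of the kind of query involved, shown for a single month (the `githubarchive.month.*` table layout and JSON payload paths reflect the dataset as of this writing and may need adjusting):

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
query = """
SELECT repo.name AS repo_name, COUNT(*) AS merged_prs
FROM `githubarchive.month.202501`
WHERE type = 'PullRequestEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
  AND repo.name IN ('openai/codex', 'google-gemini/gemini-cli',
                    'anthropics/claude-code')
GROUP BY repo_name
"""
for row in client.query(query).result():
    print(row.repo_name, row.merged_prs)
```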
Track your cycle time, review engagement, and PR size with CodePulse.