AI Developer Tools Spotlight

How AI Developer Tools Ship Code

We analyzed Codex, Gemini CLI, and Claude Code. These repos ship up to 70% faster than the enterprise benchmark, with up to 6x the review engagement of typical open source.

70%

Faster

than benchmark cycle time

Based on 2,033 merged PRs | GitHub Archive / BigQuery | January - October 2025

The 3 AI Tool Repos We Analyzed

Three major AI companies have open-sourced their developer tools in 2025. We pulled their GitHub data to see how the teams building AI assistants actually ship code.

OpenAI Codex

Merged PRs: 1,031
Contributors: 453
Since: Apr 2025

Gemini CLI

Merged PRs: 948
Contributors: 523
Since: Jun 2025

Claude Code

Merged PRs: 54
Contributors: 53
Since: Feb 2025

"Claude Code ships in 0.9 hours median—70% faster than the 3-hour enterprise benchmark."

Speed Comparison: Cycle Time

[Chart: median cycle time for reviewed PRs. Claude Code 0.9h (70% faster than the 3h enterprise benchmark), Codex 2.3h, Gemini CLI 7.8h, enterprise benchmark 3h.]

How long does it take from PR creation to merge? We compared median cycle times for reviewed PRs (PRs that received at least one code review) against the 3-hour enterprise benchmark.
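
The metric itself is simple enough to reproduce on your own repos. Below is a minimal Python sketch against the public GitHub REST API; the helper name is ours, and a real analysis would paginate beyond the first 100 closed PRs and restrict to reviewed PRs:

```python
# Minimal sketch: median PR cycle time (creation -> merge), in hours.
# Assumes the public GitHub REST API; does not paginate or filter to
# reviewed PRs the way the full study does.
import statistics
from datetime import datetime

import requests

def median_cycle_time_hours(owner: str, repo: str, token: str) -> float:
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        params={"state": "closed", "per_page": 100},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()

    hours = []
    for pr in resp.json():
        if pr["merged_at"] is None:  # closed without merging; not a shipped PR
            continue
        created = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
        merged = datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00"))
        hours.append((merged - created).total_seconds() / 3600)
    return statistics.median(hours)  # compare against the 3h benchmark
```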

Median Cycle Time (Reviewed PRs)

[Bar chart, 0h–8h: Claude Code, Codex, benchmark, Gemini CLI]

Note: Claude Code has a smaller sample (54 PRs) but shows remarkably fast shipping for small, focused changes.

0.9h

Claude Code

2.3h

Codex

3h

Benchmark

7.8h

Gemini CLI

"Gemini CLI has 86% review engagement vs 14.6% GitHub average—6x the rate of typical open source."

Review Culture: The 6x Difference

Only 14.6% of PRs on GitHub get any code review. These AI tool repos are dramatically different—Gemini CLI reviews 86% of its PRs.
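
The engagement metric is the share of merged PRs with at least one review event. A hedged sketch, using GitHub's reviews endpoint (the function name and sampling strategy are illustrative):

```python
# Sketch: fraction of the given PRs that received at least one review.
# Uses GET /repos/{owner}/{repo}/pulls/{number}/reviews; watch rate limits
# on large samples.
import requests

def review_rate(owner: str, repo: str, pr_numbers: list[int], token: str) -> float:
    headers = {"Authorization": f"Bearer {token}"}
    reviewed = 0
    for number in pr_numbers:
        url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}/reviews"
        if requests.get(url, headers=headers, timeout=30).json():
            reviewed += 1  # one or more review events counts as "reviewed"
    return reviewed / len(pr_numbers)
```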

Review Engagement Rate

[Bar chart, 0%–100%: Gemini CLI, Claude Code, Codex, GitHub average]

OpenAI Codex

PRs Reviewed: 53%
Total PRs: 1,031

Gemini CLI

PRs Reviewed: 86%
Total PRs: 948

Claude Code

PRs Reviewed: 74%
Total PRs: 54

PR Size Strategy: Small and Focused

Claude Code ships remarkably small PRs—a median of just 18 lines. This aligns with best practices for fast, reviewable changes.

18 lines

Claude Code median

54 lines

Codex median

74 lines

Gemini CLI median

54 lines

Enterprise benchmark

Why it matters: Smaller PRs get reviewed faster, have fewer bugs, and are easier to revert. Claude Code's 18-line median shows a disciplined approach to incremental shipping.
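
If you want to track this yourself, PR size is just additions plus deletions, which GitHub reports on the single-PR endpoint. A rough sketch (helper name ours):

```python
# Sketch: median PR size in lines changed (additions + deletions).
# The list endpoint omits these fields, so each PR is fetched individually.
import statistics

import requests

def median_pr_size(owner: str, repo: str, pr_numbers: list[int], token: str) -> float:
    headers = {"Authorization": f"Bearer {token}"}
    sizes = []
    for number in pr_numbers:
        pr = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
            headers=headers,
            timeout=30,
        ).json()
        sizes.append(pr["additions"] + pr["deletions"])
    return statistics.median(sizes)
```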

What You Can Learn

These AI tool teams demonstrate practices any engineering org can adopt. Here are the key takeaways:

Codex: Faster Than Enterprise Teams

2.3h vs 3h reviewed PR benchmark

OpenAI ships 23% faster than typical team workflows

Gemini: Exceptional Review Culture

86% vs 14.6% GitHub average

86% of Gemini PRs go through review (6x the GitHub rate)

AI Tools Are Human-Driven

<3% vs 15.5% GitHub average

Bot PRs nearly absent—these are human-crafted projects

Gemini Reviews Are Thorough

65% vs 32% reviewed PR benchmark

65% of Gemini PRs have review comments (2x typical)

Our Take

AI tool teams are practicing what they preach.

These aren't just fast because they're small teams—they're fast because they've adopted the practices that matter: small PRs, high review engagement, and minimal automation overhead. The 0.9-hour median for Claude Code isn't magic; it's the result of shipping 18-line changes that anyone can review in minutes. If you want to ship faster, start by making your changes smaller.

Adopt These Practices

Ship Small, Ship Often

Claude Code's 18-line median PRs prove that small changes ship faster. Break large features into reviewable chunks.

Enforce Code Review

Gemini's 86% review rate shows that review culture is intentional. Use branch protection and CODEOWNERS to make review the default.
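
One way to wire this up is GitHub's branch-protection API. The settings below are an illustrative starting point, not the Gemini repo's actual configuration:

```python
# Sketch: require one approving review (and CODEOWNERS sign-off) on a branch
# via PUT /repos/{owner}/{repo}/branches/{branch}/protection. Settings are
# illustrative defaults, not this study's recommendations verbatim.
import requests

def require_reviews(owner: str, repo: str, branch: str, token: str) -> None:
    resp = requests.put(
        f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection",
        json={
            "required_status_checks": None,
            "enforce_admins": True,
            "required_pull_request_reviews": {
                "required_approving_review_count": 1,
                "require_code_owner_reviews": True,  # pairs with a CODEOWNERS file
            },
            "restrictions": None,
        },
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
```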

Minimize Bot Noise

Under 3% bot PRs in these repos vs 15.5% GitHub average. Keep automation PRs separate or batched to reduce review fatigue.
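
To measure your own bot share, a simple classifier over PR author metadata is usually enough; the "[bot]" login suffix check is a common heuristic on top of GitHub's user.type field:

```python
# Sketch: classify a PR (as returned by the GitHub API) as bot-authored.
def is_bot_pr(pr: dict) -> bool:
    user = pr.get("user") or {}
    return user.get("type") == "Bot" or str(user.get("login", "")).endswith("[bot]")
```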

Track Your Benchmarks

The 3-hour enterprise benchmark is your target. If you're above it, investigate where time is being lost.


Methodology

This analysis is based on 802,979 PRs from GitHub Archive / BigQuery during January - October 2025. We analyzed merged pull requests from three repositories: openai/codex (1,031 PRs), google-gemini/gemini-cli (948 PRs), and anthropics/claude-code (54 PRs). Cycle time is measured from PR creation to merge. Review rate is the percentage of PRs receiving at least one review event. For full methodology, see the complete 2025 study.
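
As a rough approximation of that pipeline, the query below computes median cycle time per repo from the public githubarchive dataset (one month shown for brevity; the study spans January to October 2025, and the exact study queries are not reproduced here). Assumes the google-cloud-bigquery client:

```python
# Sketch: per-repo median PR cycle time from GitHub Archive on BigQuery.
# Table naming and JSON paths follow the public githubarchive schema.
from google.cloud import bigquery

SQL = """
SELECT
  repo.name AS repo,
  APPROX_QUANTILES(
    TIMESTAMP_DIFF(
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_at')),
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at')),
      MINUTE
    ) / 60.0, 100
  )[OFFSET(50)] AS median_cycle_hours
FROM `githubarchive.month.202501`  -- one month; the study covers 202501-202510
WHERE type = 'PullRequestEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
  AND repo.name IN ('openai/codex', 'google-gemini/gemini-cli',
                    'anthropics/claude-code')
GROUP BY repo
"""

for row in bigquery.Client().query(SQL).result():
    print(row.repo, round(row.median_cycle_hours, 1), "h median cycle time")
```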

Ship like the AI teams

Track your cycle time, review engagement, and PR size with CodePulse.