
McKinsey's Developer Productivity Article Was Wrong. Here's Why.

McKinsey's 2023 article on measuring developer productivity sparked industry backlash. Here's what they got right, what they got wrong, and a better approach for VPs facing pressure to implement their recommendations.

16 min read · Updated January 31, 2026 · By CodePulse Team

In August 2023, McKinsey published "Yes, you can measure developer productivity"—and the engineering world erupted. Kent Beck called it "naive." Gergely Orosz wrote a 12,000-word rebuttal. The debate exposed a fundamental tension: executives want productivity metrics, but developers fear surveillance. This guide navigates that tension with a pragmatic alternative.

If you're a VP or Director being pressured to implement McKinsey-style measurement, this guide is for you. We'll examine what McKinsey recommended, where they went wrong, what they got right, and how to build a metrics program that provides executive visibility without destroying engineering culture.

"The goal isn't to prove developers are productive. It's to help them be more productive. Those require fundamentally different measurement approaches."

Before critiquing McKinsey's approach, let's be precise about what they actually said. The 2023 article, authored by a team including former Google and Microsoft executives, made several specific claims:

The Three-Level Framework

McKinsey proposed measuring productivity at three levels:

| Level | What McKinsey Proposed | Example Metrics |
| --- | --- | --- |
| System | Organizational and process metrics | DORA metrics, CI/CD health, platform efficiency |
| Team | Collective output and collaboration | Velocity, sprint completion, team throughput |
| Individual | "Contribution" of each developer | Commits, code quality, story points completed |

The Contested Claims

McKinsey made several assertions that sparked controversy:

  • "Quantitative and qualitative factors can be measured" at the individual developer level
  • "Software developer performance can be broken into inner loop and outer loop activities" with separate metrics for each
  • Proposed "contribution analysis" that appeared to track individual developer output
  • Referenced "Quality of engineer X's code" as a measurable attribute

The article included a visualization showing how to measure individual developer "contribution"—including metrics like "contribution per developer" and "amount of legacy code refactored." This is what triggered the firestorm.

The Developer Community Backlash

Kent Beck's Criticism

Kent Beck—creator of Extreme Programming, co-author of the Agile Manifesto, and someone who has spent decades thinking about developer productivity—called the McKinsey article "a naive approach to a complicated subject."

"I've been measuring software development for 40 years. I know what works and what doesn't. This article gets some things right about system-level metrics, but the individual metrics are dangerous."

Beck's core objections:

  • Knowledge work is fundamentally different from manufacturing—you can't measure widgets produced
  • Individual metrics create perverse incentives—developers will optimize for the metric, not the outcome
  • The best work is often invisible—mentoring, preventing problems, simplifying systems

Gergely Orosz's "Pragmatic Engineer" Rebuttal

Gergely Orosz, former Uber engineering manager and author of The Pragmatic Engineer newsletter, published a comprehensive 12,000-word response. His key points:

  • McKinsey has no credibility in software—they've never shipped software products, so why would developers trust their measurement framework?
  • The "opportunity sizing" framing is concerning—McKinsey estimates $300B in productivity gains, which reads as "we can help you squeeze more out of developers"
  • System-level metrics are fine; individual metrics are not—DORA works because it measures outcomes, not individuals
  • The companies that inspired the article (Google, Microsoft) have backed away from individual developer productivity metrics

"When a management consultancy tells you they can measure individual developer productivity, ask them: have they ever managed developers? Have they ever shipped software?"


The Industry Response

The backlash was swift and nearly unanimous among practitioners:

  • Dan North (BDD creator): "This is what happens when people who don't understand software try to industrialize it"
  • Martin Fowler (ThoughtWorks chief scientist): Drew attention to Goodhart's Law and the dangers of optimizing proxy metrics
  • Charity Majors (Honeycomb CTO): Highlighted that the best developers often ship fewer commits because they're unblocking others
  • Will Larson (Author of "Staff Engineer"): Noted that individual metrics punish senior engineers who spend time on architecture and mentorship

What McKinsey Got Right

In the rush to condemn McKinsey, some legitimate points got lost. Let's be fair about what they got right:

1. Executives Need Visibility

McKinsey correctly identified a real problem: engineering is often a black box to leadership. "Trust us" is not a strategy when you're accountable to a board for millions in R&D spending. Some form of measurement is necessary.

2. System-Level Metrics Work

The DORA metrics McKinsey referenced are well-validated. Deployment frequency, lead time, change failure rate, and MTTR genuinely correlate with organizational performance. McKinsey was right to include them.
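To make these four metrics concrete, here is a minimal sketch of how a team might compute them from its own deployment records. The data and field layout are hypothetical, invented for illustration; note that every number describes the system, not any individual developer.

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment records: (merged_at, deployed_at, caused_failure, restored_at)
deploys = [
    (datetime(2026, 1, 5, 9),  datetime(2026, 1, 5, 14), False, None),
    (datetime(2026, 1, 6, 10), datetime(2026, 1, 6, 12), True,  datetime(2026, 1, 6, 13)),
    (datetime(2026, 1, 7, 11), datetime(2026, 1, 7, 15), False, None),
    (datetime(2026, 1, 8, 9),  datetime(2026, 1, 8, 10), False, None),
]

period_days = 7

# Deployment frequency: deploys per day over the reporting period
deploy_freq = len(deploys) / period_days

# Lead time for changes: merge-to-production, averaged, in hours
lead_time = mean((dep - merged).total_seconds() / 3600 for merged, dep, _, _ in deploys)

# Change failure rate: share of deploys that caused an incident
failures = [d for d in deploys if d[2]]
change_failure_rate = len(failures) / len(deploys)

# MTTR: average hours from a failed deploy to service restoration
mttr = mean((restored - dep).total_seconds() / 3600 for _, dep, _, restored in failures)

print(f"Deploy frequency:    {deploy_freq:.2f}/day")
print(f"Lead time:           {lead_time:.1f}h avg")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"MTTR:                {mttr:.1f}h")
```

Nothing here requires per-person attribution, which is exactly why DORA avoids the surveillance problem: the same pipeline data answers "how healthy is our delivery system?" without ever asking "whose commit was it?"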

3. Qualitative Assessment Matters

McKinsey suggested combining quantitative metrics with qualitative signals like code review quality and technical decision-making. This is actually the right approach: numbers alone miss too much.

4. Developer Experience Affects Output

The article acknowledged that developer experience (tooling, CI/CD speed, meeting load) affects productivity. Improving DX is often higher leverage than measuring individual output.

| McKinsey Recommendation | Assessment |
| --- | --- |
| Use DORA metrics at system level | Valid - well-researched framework |
| Track team-level velocity | Partially valid - if used for planning, not evaluation |
| Improve developer experience | Valid - high leverage approach |
| Combine quantitative + qualitative | Valid - essential for context |
| Measure individual "contribution" | Problematic - creates gaming and surveillance |
| "Quality of engineer X's code" | Problematic - poorly defined, easily gamed |

Where McKinsey Went Wrong

/// Our Take

McKinsey's fundamental error wasn't proposing measurement—it was proposing individual measurement using output proxies. This approach misunderstands both the nature of knowledge work and the psychology of metrics.

When you measure individual developers using activity metrics (commits, PRs, story points), you create a surveillance culture that drives out intrinsic motivation, punishes collaboration, and incentivizes gaming. The metrics improve while actual outcomes deteriorate.

1. Treating Developers Like Manufacturing Workers

McKinsey's framework implicitly assumes software development is like manufacturing: more widgets = more value. But software development is knowledge work. The most valuable contribution might be:

  • Deleting 10,000 lines of legacy code (negative "productivity" by output metrics)
  • Spending a week mentoring a junior developer (zero commits)
  • Writing a design doc that prevents three months of wasted work (no code shipped)
  • Simplifying a system so it's easier for everyone to work on (fewer total features)

None of these show up in "contribution analysis." All of them are immensely valuable.

2. Ignoring Goodhart's Law

"When a measure becomes a target, it ceases to be a good measure." McKinsey's individual metrics would immediately be gamed:

  • Story points? Developers inflate estimates. A "3" becomes an "8."
  • Commits? Developers split work into tiny, meaningless commits.
  • Code quality scores? Developers write verbose code to please linters.
  • PRs merged? Developers avoid hard problems that take longer.

For a deeper dive, see our guide on Goodhart's Law in engineering metrics.

3. Destroying Psychological Safety

Google's Project Aristotle found that psychological safety is the #1 predictor of team performance. Individual productivity metrics destroy psychological safety:

  • Developers fear being seen as "low performers"
  • Helping others becomes costly (reduces your own output)
  • Admitting you don't understand something becomes risky
  • Asking for help signals weakness

The irony: McKinsey's framework for improving productivity would actually decrease it.

4. Punishing Senior Engineers

Senior and staff engineers often contribute most through activities that don't register in output metrics:

| High-Value Senior Activity | Impact on Output Metrics |
| --- | --- |
| Reviewing PRs thoroughly | Zero personal output |
| Pair programming with junior devs | Halves "individual" productivity |
| Architecture and system design | No commits for weeks |
| Cross-team coordination | Meeting time, not coding time |
| Incident response | Disrupts planned work |

5. The Consulting Business Model Conflict

There's an elephant in the room: McKinsey sells consulting services to reduce headcount. The article estimates "$300B in opportunity" from productivity gains. To executives, that reads as: "We can help you do more with fewer developers."

This isn't inherently wrong—efficiency matters—but it creates a credibility problem. When the measurement framework comes from a firm that profits from headcount reduction, developers are right to be skeptical.

A Better Approach: SPACE Framework + Team Metrics

There's a middle path between "no measurement" and "surveillance." It's based on research from Microsoft, GitHub, and academia: the SPACE framework.

The fundamental difference: McKinsey measures individuals; SPACE measures systems and teams.

The SPACE Dimensions

| Dimension | What It Measures | Team-Level Metric |
| --- | --- | --- |
| Satisfaction | Developer happiness and well-being | Quarterly surveys, after-hours patterns |
| Performance | Outcomes and quality | Change failure rate, customer incidents |
| Activity | Volume of work (use cautiously) | Team PRs merged, reviews completed |
| Communication | Collaboration and knowledge sharing | Review network, cross-team collaboration |
| Efficiency | Flow and minimal friction | Cycle time, wait time for review |
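The Efficiency and Activity rows above can be computed directly from PR timestamps. The sketch below uses invented PR records (the field names `opened`, `first_review`, and `merged` are assumptions, not any particular tool's API) and deliberately carries no author field, so the output can only describe the team:

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records. Note: no author field -- these metrics are
# aggregated at the team level and never reported per person.
prs = [
    {"opened": datetime(2026, 1, 5, 9),  "first_review": datetime(2026, 1, 5, 15), "merged": datetime(2026, 1, 6, 11)},
    {"opened": datetime(2026, 1, 6, 10), "first_review": datetime(2026, 1, 6, 12), "merged": datetime(2026, 1, 7, 9)},
    {"opened": datetime(2026, 1, 7, 8),  "first_review": datetime(2026, 1, 7, 18), "merged": datetime(2026, 1, 9, 8)},
]

def hours(delta):
    return delta.total_seconds() / 3600

# Efficiency: team cycle time (open -> merge); median resists outliers
cycle_time = median(hours(p["merged"] - p["opened"]) for p in prs)

# Efficiency: wait time for first review (open -> first review)
review_wait = median(hours(p["first_review"] - p["opened"]) for p in prs)

# Activity (use cautiously): team PR throughput for the period
throughput = len(prs)

print(f"Team cycle time: {cycle_time:.0f}h median")
print(f"Review wait:     {review_wait:.0f}h median")
print(f"PRs merged:      {throughput}")
```

Using the median rather than the mean is a small but deliberate choice: one stalled PR shouldn't swing the team's headline number, and a metric that jumps on outliers invites exactly the "who caused this?" question the framework is trying to avoid.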

Key Principles

  • Team-level metrics, not individual rankings. "Our team's cycle time is 48 hours" is useful. "Sarah's cycle time is worse than John's" is toxic.
  • Measure at least 3 of 5 dimensions. Single metrics create incentives to game. Balanced metrics create incentives to improve.
  • Combine quantitative + qualitative. Numbers tell you what; conversations tell you why.
  • Use metrics for understanding, not evaluation. "Why did cycle time increase?" is productive. "Who caused cycle time to increase?" is not.

What This Looks Like in Practice

Executive Dashboard (What Leadership Sees)
==========================================

Team Health Summary
-------------------
Cycle Time:        48h avg (↓12% from last quarter)
Deployment Freq:   3.2/day (↑8%)
Change Fail Rate:  4.2% (stable)
Review Coverage:   94% (↑2%)

Team Satisfaction
-----------------
Latest survey: 4.1/5.0 (↑0.2)
After-hours work: 8% of commits (target: <10%)

Delivery Velocity
-----------------
PRs merged: 127 this sprint
Avg PR size: 186 lines (healthy)
Review turnaround: 6h avg

What's NOT on this dashboard:
- Individual developer rankings
- Commits per person
- "Productivity scores"
- Stack rankings
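The "After-hours work" line on the dashboard above is one of the cheapest Satisfaction signals to compute. Here is a minimal sketch under assumed conventions (a 09:00-18:00 local workday, weekends counted as after-hours, and invented commit timestamps); again, only team-level commit times are used, with no author attribution:

```python
from datetime import datetime

# Hypothetical commit timestamps for the whole team (no author attribution)
commits = [
    datetime(2026, 1, 5, 10, 30),
    datetime(2026, 1, 5, 21, 15),  # evening commit -> after hours
    datetime(2026, 1, 6, 14, 0),
    datetime(2026, 1, 6, 9, 45),
    datetime(2026, 1, 7, 16, 20),
]

def is_after_hours(ts, start=9, end=18):
    """True if outside the assumed 09:00-18:00 workday, or on a weekend."""
    return ts.weekday() >= 5 or not (start <= ts.hour < end)

after_hours = sum(is_after_hours(c) for c in commits)
share = after_hours / len(commits)

print(f"After-hours work: {share:.0%} of commits (target: <10%)")
```

Treat this as a conversation starter, not a verdict: a rising after-hours share is a prompt to ask the team about workload and deadlines, never to identify who committed at night.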

How to Respond When Leadership Asks for McKinsey Metrics

If you're a VP or Director, you may face pressure from executives who read the McKinsey article. Here's how to navigate that conversation:

Step 1: Acknowledge the Legitimate Need

Don't dismiss the request. Executives need visibility. Engineering can't be a black box. Start by validating the underlying concern:

"You're right that we need better visibility into engineering performance. I want to show you what we're proposing—it addresses your concerns while avoiding some pitfalls with the McKinsey approach."

Step 2: Present the Evidence

Bring data to the conversation:

  • Google's research: Their DevOps Research and Assessment (DORA) explicitly measures teams, not individuals
  • Microsoft's SPACE: Developed by researchers including Nicole Forsgren (DORA co-creator), emphasizes multidimensional team measurement
  • Industry consensus: Point to the backlash from Kent Beck, Gergely Orosz, and other respected voices
  • Attrition risk: Individual surveillance metrics correlate with engineer attrition—replacing developers costs 50-200% of salary

Step 3: Propose the Alternative

Don't just say no—offer a better option:

| Executive Concern | McKinsey Approach | Better Alternative |
| --- | --- | --- |
| "Are we shipping fast enough?" | Individual commits/PRs | Team deployment frequency + cycle time |
| "Is our quality acceptable?" | "Code quality of engineer X" | Team change failure rate + review coverage |
| "Do we have performance issues?" | Individual productivity scores | Manager 1:1s + team retrospectives |
| "Is headcount justified?" | Output per developer | Team throughput vs. business outcomes |

Step 4: Pilot and Demonstrate

Offer to run a pilot program:

  • Start with team-level SPACE metrics for one quarter
  • Share dashboards with leadership monthly
  • Survey developers on how the metrics affect their work
  • Measure whether teams improve under the new system

For more on rolling out metrics, see our guide on building metrics without surveillance.

Step 5: Set Clear Boundaries

Be explicit about what you will and won't do:

  • Will: Provide team-level performance metrics, delivery velocity, quality indicators
  • Will: Surface systemic issues (bottlenecks, slow CI, review delays)
  • Will: Report trends and improvements over time
  • Won't: Rank individual developers by output metrics
  • Won't: Tie metrics directly to compensation
  • Won't: Create "productivity scores" for individuals

/// How CodePulse Approaches This

We built CodePulse around the principle that metrics should help teams, not evaluate individuals. Here's what that means:

  • Team-level defaults: Dashboards show team metrics first; individual data is for self-reflection
  • No secret dashboards: Everyone sees the same data—managers don't have hidden views
  • Balanced metrics: We track speed AND quality AND collaboration, not just output
  • Trend focus: We emphasize improvement over time, not absolute scores
  • Recognition over punishment: Our Awards system celebrates contributions without creating rankings

Conclusion

McKinsey's article wasn't entirely wrong—executives do need visibility, and some measurement is necessary. But their individual-level "contribution analysis" was a mistake that would damage engineering culture more than it helps.

The better path is clear:

  • System-level metrics (DORA) for organizational health
  • Team-level metrics (SPACE) for continuous improvement
  • Qualitative signals (surveys, 1:1s) for context
  • Individual data for self-reflection only, never for evaluation

If you're being pressured to implement McKinsey-style measurement, push back—but push back with a better alternative, not just resistance. Executives have legitimate needs. Your job is to meet those needs without destroying what makes engineering teams work.

"You can measure developer productivity. Just don't measure individual developers. Measure the system. Measure the team. Measure outcomes. Then use that data to remove obstacles, not to evaluate people."

The goal isn't to prove developers are productive. It's to help them be more productive. Those require fundamentally different measurement approaches. Choose wisely.

See these insights for your team

CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.

Free tier available. No credit card required.