In August 2023, McKinsey published "Yes, you can measure developer productivity"—and the engineering world erupted. Kent Beck called it "naive." Gergely Orosz wrote a 12,000-word rebuttal. The debate exposed a fundamental tension: executives want productivity metrics, but developers fear surveillance. This guide navigates that tension with a pragmatic alternative.
If you're a VP or Director being pressured to implement McKinsey-style measurement, this guide is for you. We'll examine what McKinsey recommended, where they went wrong, what they got right, and how to build a metrics program that provides executive visibility without destroying engineering culture.
"The goal isn't to prove developers are productive. It's to help them be more productive. Those require fundamentally different measurement approaches."
What McKinsey Actually Recommended
Before critiquing McKinsey's approach, let's be precise about what they actually said. The 2023 article, authored by a team including former Google and Microsoft executives, made several specific claims:
The Three-Level Framework
McKinsey proposed measuring productivity at three levels:
| Level | What McKinsey Proposed | Example Metrics |
|---|---|---|
| System | Organizational and process metrics | DORA metrics, CI/CD health, platform efficiency |
| Team | Collective output and collaboration | Velocity, sprint completion, team throughput |
| Individual | "Contribution" of each developer | Commits, code quality, story points completed |
The Contested Claims
McKinsey made several assertions that sparked controversy:
- "Quantitative and qualitative factors can be measured" at the individual developer level
- "Software developer performance can be broken into inner loop and outer loop activities" with separate metrics for each
- Proposed "contribution analysis" that appeared to track individual developer output
- Referenced "Quality of engineer X's code" as a measurable attribute
The article included a visualization showing how to measure individual developer "contribution"—including metrics like "contribution per developer" and "amount of legacy code refactored." This is what triggered the firestorm.
The Developer Community Backlash
Kent Beck's Criticism
Kent Beck—creator of Extreme Programming, co-author of the Agile Manifesto, and someone who has spent decades thinking about developer productivity—called the McKinsey article "a naive approach to a complicated subject."
"I've been measuring software development for 40 years. I know what works and what doesn't. This article gets some things right about system-level metrics, but the individual metrics are dangerous."
Beck's core objections:
- Knowledge work is fundamentally different from manufacturing—you can't measure widgets produced
- Individual metrics create perverse incentives—developers will optimize for the metric, not the outcome
- The best work is often invisible—mentoring, preventing problems, simplifying systems
Gergely Orosz's "Pragmatic Engineer" Rebuttal
Gergely Orosz, former Uber engineering manager and author of The Pragmatic Engineer newsletter, published a comprehensive 12,000-word response. His key points:
- McKinsey has no credibility in software—they've never shipped software products, so why would developers trust their measurement framework?
- The "opportunity sizing" framing is concerning—McKinsey estimates $300B in productivity gains, which reads as "we can help you squeeze more out of developers"
- System-level metrics are fine; individual metrics are not—DORA works because it measures outcomes, not individuals
- The companies that inspired the article (Google, Microsoft) have backed away from individual developer productivity metrics
"When a management consultancy tells you they can measure individual developer productivity, ask them: have they ever managed developers? Have they ever shipped software?"
The Industry Response
The backlash was swift and nearly unanimous among practitioners:
- Dan North (BDD creator): "This is what happens when people who don't understand software try to industrialize it"
- Martin Fowler (ThoughtWorks chief scientist): Drew attention to Goodhart's Law and the dangers of optimizing proxy metrics
- Charity Majors (Honeycomb CTO): Highlighted that the best developers often ship fewer commits because they're unblocking others
- Will Larson (Author of "Staff Engineer"): Noted that individual metrics punish senior engineers who spend time on architecture and mentorship
What McKinsey Got Right
In the rush to condemn McKinsey, some legitimate points got lost. Let's be fair about what they got right:
1. Executives Need Visibility
McKinsey correctly identified a real problem: engineering is often a black box to leadership. "Trust us" is not a strategy when you're accountable to a board for millions in R&D spending. Some form of measurement is necessary.
2. System-Level Metrics Work
The DORA metrics McKinsey referenced are well-validated. Deployment frequency, lead time, change failure rate, and MTTR genuinely correlate with organizational performance. McKinsey was right to include them.
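To make the distinction concrete, here is a minimal sketch of how the four DORA metrics can be computed from deployment records alone. The records and field names (`deployed`, `committed`, `failed`, `restored`) are hypothetical, but notice that nothing is keyed to an individual developer; every number describes the system.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records for illustration only.
deploys = [
    {"deployed": datetime(2024, 5, 1, 10), "committed": datetime(2024, 4, 30, 9),
     "failed": False, "restored": None},
    {"deployed": datetime(2024, 5, 1, 15), "committed": datetime(2024, 5, 1, 11),
     "failed": True, "restored": datetime(2024, 5, 1, 16)},
    {"deployed": datetime(2024, 5, 2, 12), "committed": datetime(2024, 5, 1, 17),
     "failed": False, "restored": None},
]

days_observed = 2
# Deployment frequency: deploys per day across the whole team.
deployment_frequency = len(deploys) / days_observed
# Lead time for changes: commit-to-production, averaged over all deploys.
lead_time = sum((d["deployed"] - d["committed"] for d in deploys), timedelta()) / len(deploys)
# Change failure rate: share of deploys that caused a failure.
failures = [d for d in deploys if d["failed"]]
change_failure_rate = len(failures) / len(deploys)
# MTTR: mean time from a failed deploy to service restoration.
mttr = sum((d["restored"] - d["deployed"] for d in failures), timedelta()) / len(failures)

print(f"Deployment frequency: {deployment_frequency:.1f}/day")
print(f"Lead time for changes: {lead_time}")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"MTTR: {mttr}")
```

There is no per-author field anywhere in the calculation, which is exactly why these metrics measure outcomes rather than individuals.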
3. Qualitative Assessment Matters
McKinsey suggested combining quantitative metrics with qualitative signals such as code review quality and technical decision-making. This is the right approach: numbers alone miss too much.
4. Developer Experience Affects Output
The article acknowledged that developer experience (tooling, CI/CD speed, meeting load) affects productivity. Improving DX is often higher leverage than measuring individual output.
| McKinsey Recommendation | Assessment |
|---|---|
| Use DORA metrics at system level | Valid - well-researched framework |
| Track team-level velocity | Partially valid - if used for planning, not evaluation |
| Improve developer experience | Valid - high leverage approach |
| Combine quantitative + qualitative | Valid - essential for context |
| Measure individual "contribution" | Problematic - creates gaming and surveillance |
| "Quality of engineer X's code" | Problematic - poorly defined, easily gamed |
Where McKinsey Went Wrong
/// Our Take
McKinsey's fundamental error wasn't proposing measurement—it was proposing individual measurement using output proxies. This approach misunderstands both the nature of knowledge work and the psychology of metrics.
When you measure individual developers using activity metrics (commits, PRs, story points), you create a surveillance culture that drives out intrinsic motivation, punishes collaboration, and incentivizes gaming. The metrics improve while actual outcomes deteriorate.
1. Treating Developers Like Manufacturing Workers
McKinsey's framework implicitly assumes software development is like manufacturing: more widgets = more value. But software development is knowledge work. The most valuable contribution might be:
- Deleting 10,000 lines of legacy code (negative "productivity" by output metrics)
- Spending a week mentoring a junior developer (zero commits)
- Writing a design doc that prevents three months of wasted work (no code shipped)
- Simplifying a system so it's easier for everyone to work on (fewer total features)
None of these show up in "contribution analysis." All of them are immensely valuable.
2. Ignoring Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure." McKinsey's individual metrics would immediately be gamed:
- Story points? Developers inflate estimates. A "3" becomes an "8."
- Commits? Developers split work into tiny, meaningless commits.
- Code quality scores? Developers write verbose code to please linters.
- PRs merged? Developers avoid hard problems that take longer.
For a deeper dive, see our guide on Goodhart's Law in engineering metrics.
3. Destroying Psychological Safety
Google's Project Aristotle found that psychological safety is the #1 predictor of team performance. Individual productivity metrics destroy psychological safety:
- Developers fear being seen as "low performers"
- Helping others becomes costly (reduces your own output)
- Admitting you don't understand something becomes risky
- Asking for help signals weakness
The irony: McKinsey's framework for improving productivity would actually decrease it.
4. Punishing Senior Engineers
Senior and staff engineers often contribute most through activities that don't register in output metrics:
| High-Value Senior Activity | Impact on Output Metrics |
|---|---|
| Reviewing PRs thoroughly | Zero personal output |
| Pair programming with junior devs | Halves "individual" productivity |
| Architecture and system design | No commits for weeks |
| Cross-team coordination | Meeting time, not coding time |
| Incident response | Disrupts planned work |
5. The Consulting Business Model Conflict
There's an elephant in the room: McKinsey sells consulting services to reduce headcount. The article estimates "$300B in opportunity" from productivity gains. To executives, that reads as: "We can help you do more with fewer developers."
This isn't inherently wrong—efficiency matters—but it creates a credibility problem. When the measurement framework comes from a firm that profits from headcount reduction, developers are right to be skeptical.
A Better Approach: SPACE Framework + Team Metrics
There's a middle path between "no measurement" and "surveillance." It's based on research from Microsoft, GitHub, and academia: the SPACE framework.
The SPACE Dimensions
| Dimension | What It Measures | Team-Level Metric |
|---|---|---|
| Satisfaction | Developer happiness and well-being | Quarterly surveys, after-hours patterns |
| Performance | Outcomes and quality | Change failure rate, customer incidents |
| Activity | Volume of work (use cautiously) | Team PRs merged, reviews completed |
| Communication | Collaboration and knowledge sharing | Review network, cross-team collaboration |
| Efficiency | Flow and minimal friction | Cycle time, wait time for review |
Key Principles
- Team-level metrics, not individual rankings. "Our team's cycle time is 48 hours" is useful. "Sarah's cycle time is worse than John's" is toxic.
- Measure at least 3 of 5 dimensions. Single metrics create incentives to game. Balanced metrics create incentives to improve.
- Combine quantitative + qualitative. Numbers tell you what; conversations tell you why.
- Use metrics for understanding, not evaluation. "Why did cycle time increase?" is productive. "Who caused cycle time to increase?" is not.
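The principles above can be sketched in a few lines. This is an illustrative aggregation over hypothetical PR records (field names `opened`, `first_review`, `merged` are assumptions, not any particular API): the author field is deliberately absent, so the output can only describe the team.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical PR records for one team. No author field: these
# aggregates describe the team, never an individual.
prs = [
    {"opened": datetime(2024, 5, 1, 9),  "first_review": datetime(2024, 5, 1, 14),
     "merged": datetime(2024, 5, 2, 9)},
    {"opened": datetime(2024, 5, 1, 12), "first_review": datetime(2024, 5, 1, 18),
     "merged": datetime(2024, 5, 3, 12)},
    {"opened": datetime(2024, 5, 2, 10), "first_review": datetime(2024, 5, 2, 17),
     "merged": datetime(2024, 5, 4, 10)},
]

def hours(td: timedelta) -> float:
    return td.total_seconds() / 3600

cycle_time = mean(hours(p["merged"] - p["opened"]) for p in prs)         # Efficiency
review_wait = mean(hours(p["first_review"] - p["opened"]) for p in prs)  # Efficiency
throughput = len(prs)                                                    # Activity

print(f"Team cycle time: {cycle_time:.0f}h avg")
print(f"Review wait:     {review_wait:.0f}h avg")
print(f"PRs merged:      {throughput}")
```

Asking "why did our 40-hour cycle time creep up?" from this data is productive; sorting the same records by author would be the toxic variant the principles warn against.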
What This Looks Like in Practice
```
Executive Dashboard (What Leadership Sees)
==========================================

Team Health Summary
-------------------
Cycle Time:       48h avg (↓12% from last quarter)
Deployment Freq:  3.2/day (↑8%)
Change Fail Rate: 4.2% (stable)
Review Coverage:  94% (↑2%)

Team Satisfaction
-----------------
Latest survey:    4.1/5.0 (↑0.2)
After-hours work: 8% of commits (target: <10%)

Delivery Velocity
-----------------
PRs merged:        127 this sprint
Avg PR size:       186 lines (healthy)
Review turnaround: 6h avg
```

What's NOT on this dashboard:

- Individual developer rankings
- Commits per person
- "Productivity scores"
- Stack rankings
How to Respond When Leadership Asks for McKinsey Metrics
If you're a VP or Director, you may face pressure from executives who read the McKinsey article. Here's how to navigate that conversation:
Step 1: Acknowledge the Legitimate Need
Don't dismiss the request. Executives need visibility. Engineering can't be a black box. Start by validating the underlying concern:
"You're right that we need better visibility into engineering performance. I want to show you what we're proposing—it addresses your concerns while avoiding some pitfalls with the McKinsey approach."
Step 2: Present the Evidence
Bring data to the conversation:
- Google's research: Their DevOps Research and Assessment (DORA) explicitly measures teams, not individuals
- Microsoft's SPACE: Developed by researchers including Nicole Forsgren (DORA co-creator), emphasizes multidimensional team measurement
- Industry consensus: Point to the backlash from Kent Beck, Gergely Orosz, and other respected voices
- Attrition risk: Individual surveillance metrics correlate with engineer attrition—replacing developers costs 50-200% of salary
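The attrition point is easy to put in dollar terms for an executive audience. This is illustrative arithmetic only, using a hypothetical salary figure with the 50-200% replacement-cost range cited above:

```python
# Illustrative arithmetic only: replacement cost at 50-200% of salary.
salary = 180_000  # hypothetical fully-loaded engineer salary, USD
low, high = 0.5 * salary, 2.0 * salary
print(f"Replacing one engineer: ${low:,.0f} to ${high:,.0f}")
# If a surveillance-style rollout drives out five engineers:
print(f"Losing five: ${5 * low:,.0f} to ${5 * high:,.0f}")
```

Even at the low end, a handful of resignations can erase whatever efficiency gain the individual metrics were supposed to capture.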
Step 3: Propose the Alternative
Don't just say no—offer a better option:
| Executive Concern | McKinsey Approach | Better Alternative |
|---|---|---|
| "Are we shipping fast enough?" | Individual commits/PRs | Team deployment frequency + cycle time |
| "Is our quality acceptable?" | "Code quality of engineer X" | Team change failure rate + review coverage |
| "Do we have performance issues?" | Individual productivity scores | Manager 1:1s + team retrospectives |
| "Is headcount justified?" | Output per developer | Team throughput vs. business outcomes |
Step 4: Pilot and Demonstrate
Offer to run a pilot program:
- Start with team-level SPACE metrics for one quarter
- Share dashboards with leadership monthly
- Survey developers on how the metrics affect their work
- Measure whether teams improve under the new system
For more on rolling out metrics, see our guide on building metrics without surveillance.
Step 5: Set Clear Boundaries
Be explicit about what you will and won't do:
- Will: Provide team-level performance metrics, delivery velocity, quality indicators
- Will: Surface systemic issues (bottlenecks, slow CI, review delays)
- Will: Report trends and improvements over time
- Won't: Rank individual developers by output metrics
- Won't: Tie metrics directly to compensation
- Won't: Create "productivity scores" for individuals
/// How CodePulse Approaches This
We built CodePulse around the principle that metrics should help teams, not evaluate individuals. Here's what that means:
- Team-level defaults: Dashboards show team metrics first; individual data is for self-reflection
- No secret dashboards: Everyone sees the same data—managers don't have hidden views
- Balanced metrics: We track speed AND quality AND collaboration, not just output
- Trend focus: We emphasize improvement over time, not absolute scores
- Recognition over punishment: Our Awards system celebrates contributions without creating rankings
Related Reading
- SPACE Framework Implementation Guide — The research-backed alternative to McKinsey metrics
- Measure Team Performance Without Micromanaging — Practical approaches to team-level measurement
- Engineering Metrics Without Surveillance — Building trust while maintaining visibility
- Goodhart's Law and Engineering Metrics — Why measuring can destroy what you're trying to measure
Conclusion
McKinsey's article wasn't entirely wrong—executives do need visibility, and some measurement is necessary. But their individual-level "contribution analysis" was a mistake that would damage engineering culture more than it helps.
The better path is clear:
- System-level metrics (DORA) for organizational health
- Team-level metrics (SPACE) for continuous improvement
- Qualitative signals (surveys, 1:1s) for context
- Individual data for self-reflection only, never for evaluation
If you're being pressured to implement McKinsey-style measurement, push back—but push back with a better alternative, not just resistance. Executives have legitimate needs. Your job is to meet those needs without destroying what makes engineering teams work.
"You can measure developer productivity. Just don't measure individual developers. Measure the system. Measure the team. Measure outcomes. Then use that data to remove obstacles, not to evaluate people."
The goal isn't to prove developers are productive. It's to help them be more productive. Those require fundamentally different measurement approaches. Choose wisely.
See these insights for your team
CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.
Free tier available. No credit card required.
Related Guides
Why Microsoft Abandoned DORA for SPACE (And You Should Too)
Learn how to implement the SPACE framework from Microsoft and GitHub research to measure developer productivity across Satisfaction, Performance, Activity, Communication, and Efficiency.
Engineering Metrics That Won't Get You Reported to HR
An opinionated guide to implementing engineering metrics that build trust. Includes the Visibility Bias Framework, practical do/don't guidance, and a 30-day action plan.
How to Measure Developers Without Becoming the Villain
Learn how to implement engineering metrics that developers actually trust, focusing on insight over surveillance and team-level patterns.
Goodhart's Law in Software: Why Your Metrics Get Gamed
When a measure becomes a target, it ceases to be a good measure. This guide explains Goodhart's Law with real engineering examples and strategies to measure without destroying what you're measuring.
