In August 2023, McKinsey published "Yes, you can measure developer productivity"—and the engineering world erupted. Kent Beck called it "naive." Gergely Orosz wrote a 12,000-word rebuttal. The debate exposed a fundamental tension: executives want productivity metrics, but developers fear surveillance. This guide navigates that tension with a pragmatic alternative.
If you're a VP or Director being pressured to implement McKinsey-style measurement, this guide is for you. We'll examine what McKinsey recommended, where they went wrong, what they got right, and how to build a metrics program that provides executive visibility without destroying engineering culture.
"The goal isn't to prove developers are productive. It's to help them be more productive. Those require fundamentally different measurement approaches."
What McKinsey Actually Recommended
Before critiquing McKinsey's approach, let's be precise about what they actually said. The 2023 article, authored by a team including former Google and Microsoft executives, made several specific claims:
The Three-Level Framework
McKinsey proposed measuring productivity at three levels:
| Level | What McKinsey Proposed | Example Metrics |
|---|---|---|
| System | Organizational and process metrics | DORA metrics, CI/CD health, platform efficiency |
| Team | Collective output and collaboration | Velocity, sprint completion, team throughput |
| Individual | "Contribution" of each developer | Commits, code quality, story points completed |
The Contested Claims
McKinsey made several assertions that sparked controversy:
- "Quantitative and qualitative factors can be measured" at the individual developer level
- "Software developer performance can be broken into inner loop and outer loop activities" with separate metrics for each
- Proposed "contribution analysis" that appeared to track individual developer output
- Referenced "Quality of engineer X's code" as a measurable attribute
The article included a visualization showing how to measure individual developer "contribution"—including metrics like "contribution per developer" and "amount of legacy code refactored." This is what triggered the firestorm.
The Developer Community Backlash
Kent Beck's Criticism
Kent Beck—creator of Extreme Programming, co-author of the Agile Manifesto, and someone who has spent decades thinking about developer productivity—called the McKinsey article "a naive approach to a complicated subject."
"I've been measuring software development for 40 years. I know what works and what doesn't. This article gets some things right about system-level metrics, but the individual metrics are dangerous."
Beck's core objections:
- Knowledge work is fundamentally different from manufacturing—you can't measure widgets produced
- Individual metrics create perverse incentives—developers will optimize for the metric, not the outcome
- The best work is often invisible—mentoring, preventing problems, simplifying systems
Gergely Orosz's "Pragmatic Engineer" Rebuttal
Gergely Orosz, former Uber engineering manager and author of The Pragmatic Engineer newsletter, published a comprehensive 12,000-word response. His key points:
- McKinsey has no credibility in software—they've never shipped software products, so why would developers trust their measurement framework?
- The "opportunity sizing" framing is concerning—McKinsey estimates $300B in productivity gains, which reads as "we can help you squeeze more out of developers"
- System-level metrics are fine; individual metrics are not—DORA works because it measures outcomes, not individuals
- The companies that inspired the article (Google, Microsoft) have backed away from individual developer productivity metrics
"When a management consultancy tells you they can measure individual developer productivity, ask them: have they ever managed developers? Have they ever shipped software?"
The Industry Response
The backlash was swift and nearly unanimous among practitioners:
- Dan North (BDD creator): "This is what happens when people who don't understand software try to industrialize it"
- Martin Fowler (ThoughtWorks chief scientist): Drew attention to Goodhart's Law and the dangers of optimizing proxy metrics
- Charity Majors (Honeycomb CTO): Highlighted that the best developers often ship fewer commits because they're unblocking others
- Will Larson (Author of "Staff Engineer"): Noted that individual metrics punish senior engineers who spend time on architecture and mentorship
What McKinsey Got Right
In the rush to condemn McKinsey, some legitimate points got lost. Let's be fair about what they got right:
1. Executives Need Visibility
McKinsey correctly identified a real problem: engineering is often a black box to leadership. "Trust us" is not a strategy when you're accountable to a board for millions in R&D spending. Some form of measurement is necessary.
2. System-Level Metrics Work
The DORA metrics McKinsey referenced are well-validated. Deployment frequency, lead time, change failure rate, and MTTR genuinely correlate with organizational performance. McKinsey was right to include them.
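To make the distinction concrete, here is a minimal sketch of how the four DORA metrics can be computed from deployment records alone. The records and field names (`deployed`, `committed`, `failed`, `restored`) are hypothetical, but notice that nothing is keyed to an individual developer; every number describes the system.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records for illustration only.
deploys = [
    {"deployed": datetime(2024, 5, 1, 10), "committed": datetime(2024, 4, 30, 9),
     "failed": False, "restored": None},
    {"deployed": datetime(2024, 5, 1, 15), "committed": datetime(2024, 5, 1, 11),
     "failed": True, "restored": datetime(2024, 5, 1, 16)},
    {"deployed": datetime(2024, 5, 2, 12), "committed": datetime(2024, 5, 1, 17),
     "failed": False, "restored": None},
]

days_observed = 2
# Deployment frequency: deploys per day across the whole team.
deployment_frequency = len(deploys) / days_observed
# Lead time for changes: commit-to-production, averaged over all deploys.
lead_time = sum((d["deployed"] - d["committed"] for d in deploys), timedelta()) / len(deploys)
# Change failure rate: share of deploys that caused a failure.
failures = [d for d in deploys if d["failed"]]
change_failure_rate = len(failures) / len(deploys)
# MTTR: mean time from a failed deploy to service restoration.
mttr = sum((d["restored"] - d["deployed"] for d in failures), timedelta()) / len(failures)

print(f"Deployment frequency: {deployment_frequency:.1f}/day")
print(f"Lead time for changes: {lead_time}")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"MTTR: {mttr}")
```

There is no per-author field anywhere in the calculation, which is exactly why these metrics measure outcomes rather than individuals.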
3. Qualitative Assessment Matters
McKinsey suggested combining quantitative metrics with qualitative signals such as code review quality and technical decision-making. This is the right approach: numbers alone miss too much.
4. Developer Experience Affects Output
The article acknowledged that developer experience (tooling, CI/CD speed, meeting load) affects productivity. Improving DX is often higher leverage than measuring individual output.
| McKinsey Recommendation | Assessment |
|---|---|
| Use DORA metrics at system level | Valid - well-researched framework |
| Track team-level velocity | Partially valid - if used for planning, not evaluation |
| Improve developer experience | Valid - high leverage approach |
| Combine quantitative + qualitative | Valid - essential for context |
| Measure individual "contribution" | Problematic - creates gaming and surveillance |
| "Quality of engineer X's code" | Problematic - poorly defined, easily gamed |
Where McKinsey Went Wrong
/// Our Take
McKinsey's fundamental error wasn't proposing measurement—it was proposing individual measurement using output proxies. This approach misunderstands both the nature of knowledge work and the psychology of metrics.
When you measure individual developers using activity metrics (commits, PRs, story points), you create a surveillance culture that drives out intrinsic motivation, punishes collaboration, and incentivizes gaming. The metrics improve while actual outcomes deteriorate.
1. Treating Developers Like Manufacturing Workers
McKinsey's framework implicitly assumes software development is like manufacturing: more widgets = more value. But software development is knowledge work. The most valuable contribution might be:
- Deleting 10,000 lines of legacy code (negative "productivity" by output metrics)
- Spending a week mentoring a junior developer (zero commits)
- Writing a design doc that prevents three months of wasted work (no code shipped)
- Simplifying a system so it's easier for everyone to work on (fewer total features)
None of these show up in "contribution analysis." All of them are immensely valuable.
2. Ignoring Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure." McKinsey's individual metrics would immediately be gamed:
- Story points? Developers inflate estimates. A "3" becomes an "8."
- Commits? Developers split work into tiny, meaningless commits.
- Code quality scores? Developers write verbose code to please linters.
- PRs merged? Developers avoid hard problems that take longer.
For a deeper dive, see our guide on Goodhart's Law in engineering metrics.
3. Destroying Psychological Safety
Google's Project Aristotle found that psychological safety is the #1 predictor of team performance. Individual productivity metrics destroy psychological safety:
- Developers fear being seen as "low performers"
- Helping others becomes costly (reduces your own output)
- Admitting you don't understand something becomes risky
- Asking for help signals weakness
The irony: McKinsey's framework for improving productivity would actually decrease it.
4. Punishing Senior Engineers
Senior and staff engineers often contribute most through activities that don't register in output metrics:
| High-Value Senior Activity | Impact on Output Metrics |
|---|---|
| Reviewing PRs thoroughly | Zero personal output |
| Pair programming with junior devs | Halves "individual" productivity |
| Architecture and system design | No commits for weeks |
| Cross-team coordination | Meeting time, not coding time |
| Incident response | Disrupts planned work |
5. The Consulting Business Model Conflict
There's an elephant in the room: McKinsey sells consulting services to reduce headcount. The article estimates "$300B in opportunity" from productivity gains. To executives, that reads as: "We can help you do more with fewer developers."
This isn't inherently wrong—efficiency matters—but it creates a credibility problem. When the measurement framework comes from a firm that profits from headcount reduction, developers are right to be skeptical.
A Better Approach: SPACE Framework + Team Metrics
There's a middle path between "no measurement" and "surveillance." It's based on research from Microsoft, GitHub, and academia: the SPACE framework.
The SPACE Dimensions
| Dimension | What It Measures | Team-Level Metric |
|---|---|---|
| Satisfaction | Developer happiness and well-being | Quarterly surveys, after-hours patterns |
| Performance | Outcomes and quality | Change failure rate, customer incidents |
| Activity | Volume of work (use cautiously) | Team PRs merged, reviews completed |
| Communication | Collaboration and knowledge sharing | Review network, cross-team collaboration |
| Efficiency | Flow and minimal friction | Cycle time, wait time for review |
Key Principles
- Team-level metrics, not individual rankings. "Our team's cycle time is 48 hours" is useful. "Sarah's cycle time is worse than John's" is toxic.
- Measure at least 3 of 5 dimensions. Single metrics create incentives to game. Balanced metrics create incentives to improve.
- Combine quantitative + qualitative. Numbers tell you what; conversations tell you why.
- Use metrics for understanding, not evaluation. "Why did cycle time increase?" is productive. "Who caused cycle time to increase?" is not.
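The principles above can be sketched in a few lines. This is an illustrative aggregation over hypothetical PR records (field names `opened`, `first_review`, `merged` are assumptions, not any particular API): the author field is deliberately absent, so the output can only describe the team.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical PR records for one team. No author field: these
# aggregates describe the team, never an individual.
prs = [
    {"opened": datetime(2024, 5, 1, 9),  "first_review": datetime(2024, 5, 1, 14),
     "merged": datetime(2024, 5, 2, 9)},
    {"opened": datetime(2024, 5, 1, 12), "first_review": datetime(2024, 5, 1, 18),
     "merged": datetime(2024, 5, 3, 12)},
    {"opened": datetime(2024, 5, 2, 10), "first_review": datetime(2024, 5, 2, 17),
     "merged": datetime(2024, 5, 4, 10)},
]

def hours(td: timedelta) -> float:
    return td.total_seconds() / 3600

cycle_time = mean(hours(p["merged"] - p["opened"]) for p in prs)         # Efficiency
review_wait = mean(hours(p["first_review"] - p["opened"]) for p in prs)  # Efficiency
throughput = len(prs)                                                    # Activity

print(f"Team cycle time: {cycle_time:.0f}h avg")
print(f"Review wait:     {review_wait:.0f}h avg")
print(f"PRs merged:      {throughput}")
```

Asking "why did our 40-hour cycle time creep up?" from this data is productive; sorting the same records by author would be the toxic variant the principles warn against.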
What This Looks Like in Practice
```
Executive Dashboard (What Leadership Sees)
==========================================

Team Health Summary
-------------------
Cycle Time:       48h avg (↓12% from last quarter)
Deployment Freq:  3.2/day (↑8%)
Change Fail Rate: 4.2% (stable)
Review Coverage:  94% (↑2%)

Team Satisfaction
-----------------
Latest survey:    4.1/5.0 (↑0.2)
After-hours work: 8% of commits (target: <10%)

Delivery Velocity
-----------------
PRs merged:        127 this sprint
Avg PR size:       186 lines (healthy)
Review turnaround: 6h avg
```

What's NOT on this dashboard:

- Individual developer rankings
- Commits per person
- "Productivity scores"
- Stack rankings
How to Respond When Leadership Asks for McKinsey Metrics
If you're a VP or Director, you may face pressure from executives who read the McKinsey article. Here's how to navigate that conversation:
Step 1: Acknowledge the Legitimate Need
Don't dismiss the request. Executives need visibility. Engineering can't be a black box. Start by validating the underlying concern:
"You're right that we need better visibility into engineering performance. I want to show you what we're proposing—it addresses your concerns while avoiding some pitfalls with the McKinsey approach."
Step 2: Present the Evidence
Bring data to the conversation:
- Google's research: Their DevOps Research and Assessment (DORA) explicitly measures teams, not individuals
- Microsoft's SPACE: Developed by researchers including Nicole Forsgren (DORA co-creator), emphasizes multidimensional team measurement
- Industry consensus: Point to the backlash from Kent Beck, Gergely Orosz, and other respected voices
- Attrition risk: Individual surveillance metrics correlate with engineer attrition—replacing developers costs 50-200% of salary
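The attrition point is easy to put in dollar terms for an executive audience. This is illustrative arithmetic only, using a hypothetical salary figure with the 50-200% replacement-cost range cited above:

```python
# Illustrative arithmetic only: replacement cost at 50-200% of salary.
salary = 180_000  # hypothetical fully-loaded engineer salary, USD
low, high = 0.5 * salary, 2.0 * salary
print(f"Replacing one engineer: ${low:,.0f} to ${high:,.0f}")
# If a surveillance-style rollout drives out five engineers:
print(f"Losing five: ${5 * low:,.0f} to ${5 * high:,.0f}")
```

Even at the low end, a handful of resignations can erase whatever efficiency gain the individual metrics were supposed to capture.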
Step 3: Propose the Alternative
Don't just say no—offer a better option:
| Executive Concern | McKinsey Approach | Better Alternative |
|---|---|---|
| "Are we shipping fast enough?" | Individual commits/PRs | Team deployment frequency + cycle time |
| "Is our quality acceptable?" | "Code quality of engineer X" | Team change failure rate + review coverage |
| "Do we have performance issues?" | Individual productivity scores | Manager 1:1s + team retrospectives |
| "Is headcount justified?" | Output per developer | Team throughput vs. business outcomes |
Step 4: Pilot and Demonstrate
Offer to run a pilot program:
- Start with team-level SPACE metrics for one quarter
- Share dashboards with leadership monthly
- Survey developers on how the metrics affect their work
- Measure whether teams improve under the new system
For more on rolling out metrics, see our guide on building metrics without surveillance.
Step 5: Set Clear Boundaries
Be explicit about what you will and won't do:
- Will: Provide team-level performance metrics, delivery velocity, quality indicators
- Will: Surface systemic issues (bottlenecks, slow CI, review delays)
- Will: Report trends and improvements over time
- Won't: Rank individual developers by output metrics
- Won't: Tie metrics directly to compensation
- Won't: Create "productivity scores" for individuals
/// How CodePulse Approaches This
We built CodePulse around the principle that metrics should help teams, not evaluate individuals. Here's what that means:
- Team-level defaults: Dashboards show team metrics first; individual data is for self-reflection
- No secret dashboards: Everyone sees the same data—managers don't have hidden views
- Balanced metrics: We track speed AND quality AND collaboration, not just output
- Trend focus: We emphasize improvement over time, not absolute scores
- Recognition over punishment: Our Awards system celebrates contributions without creating rankings
Related Reading
- SPACE Framework Implementation Guide — The research-backed alternative to McKinsey metrics
- Measure Team Performance Without Micromanaging — Practical approaches to team-level measurement
- Engineering Metrics Without Surveillance — Building trust while maintaining visibility
- Goodhart's Law and Engineering Metrics — Why measuring can destroy what you're trying to measure
Conclusion
McKinsey's article wasn't entirely wrong—executives do need visibility, and some measurement is necessary. But their individual-level "contribution analysis" was a mistake that would damage engineering culture more than it helps.
The better path is clear:
- System-level metrics (DORA) for organizational health
- Team-level metrics (SPACE) for continuous improvement
- Qualitative signals (surveys, 1:1s) for context
- Individual data for self-reflection only, never for evaluation
If you're being pressured to implement McKinsey-style measurement, push back—but push back with a better alternative, not just resistance. Executives have legitimate needs. Your job is to meet those needs without destroying what makes engineering teams work.
"You can measure developer productivity. Just don't measure individual developers. Measure the system. Measure the team. Measure outcomes. Then use that data to remove obstacles, not to evaluate people."
The goal isn't to prove developers are productive. It's to help them be more productive. Those require fundamentally different measurement approaches. Choose wisely.
See these insights for your team
CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.
Free tier available. No credit card required.
Related Guides
Why Microsoft Abandoned DORA for SPACE (And You Should Too)
Learn how to implement the SPACE framework from Microsoft and GitHub research to measure developer productivity across Satisfaction, Performance, Activity, Communication, and Efficiency.
Engineering Metrics That Won't Get You Reported to HR
An opinionated guide to implementing engineering metrics that build trust. Includes the Visibility Bias Framework, practical do/don't guidance, and a 30-day action plan.
How to Measure Developers Without Becoming the Villain
Learn how to implement engineering metrics that developers actually trust, focusing on insight over surveillance and team-level patterns.
Goodhart's Law in Software: Why Your Metrics Get Gamed
When a measure becomes a target, it ceases to be a good measure. This guide explains Goodhart's Law with real engineering examples and strategies to measure without destroying what you're measuring.
