"When a measure becomes a target, it ceases to be a good measure." This is Goodhart's Law, and it's the most important thing to understand about engineering metrics. Every metric you track will eventually be gamed if you're not careful. This guide explains why, shows real examples from engineering teams, and offers strategies to measure performance without destroying it.
"The moment you start rewarding developers for lines of code, you'll get more lines of code—and probably worse software."
What is Goodhart's Law?
Goodhart's Law was originally stated by British economist Charles Goodhart in 1975, in the context of monetary policy. The idea is simple but profound:
"Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes."
Or, in simpler terms: when you optimize for a metric, you often destroy what the metric was supposed to measure.
This happens because metrics are proxies for things we actually care about. We care about "software quality," but we measure test coverage because it's countable. We care about "developer productivity," but we measure commits because they're trackable. The proxy is never the thing itself—and when people optimize the proxy, the underlying thing often suffers.
Real Examples in Engineering
Goodhart's Law isn't theoretical. It plays out in engineering organizations constantly:
Lines of Code
| The metric: | Lines of code (LOC) per developer |
|---|---|
| What you wanted: | High productivity, lots of features delivered |
| What you got: | Verbose code, copy-paste duplication, resistance to refactoring |
When LOC becomes a target, developers write more code than necessary. Functions that could be one-liners become twenty. Code that should be deleted stays. Refactoring that reduces LOC looks like "negative productivity."
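As a hypothetical illustration (the function names here are invented), both versions below do the same work, but only the second one scores well against an LOC target:

```python
# A concise version: one comprehension does the job.
def active_users(users):
    return [u for u in users if u.active]

# The same behavior padded out to inflate the line count. Nothing is gained,
# but a per-developer LOC metric rewards this version.
def active_users_padded(users):
    result = []
    for user in users:
        is_active = user.active
        if is_active is True:
            result.append(user)
        else:
            pass
    return result
```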
Test Coverage
| The metric: | Code coverage percentage |
|---|---|
| What you wanted: | Well-tested, reliable software |
| What you got: | Tests that execute code but don't verify behavior, assertion-free tests |
Teams chasing coverage targets write tests that technically cover lines but don't actually test anything meaningful. A test that calls a function without asserting results increases coverage while adding zero value.
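As a minimal, hypothetical sketch (pytest-style tests around an invented `apply_discount` function): the first test executes every line and counts toward coverage, yet passes even if the function is wrong; the second actually verifies the behavior.

```python
def apply_discount(price, percent):
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)

# Executes every line, so coverage goes up, but there is no assertion:
# this test passes even if apply_discount returns the wrong value.
def test_apply_discount_coverage_only():
    apply_discount(100, 10)

# Same coverage, but the assertion makes the test worth having.
def test_apply_discount_behavior():
    assert apply_discount(100, 10) == 90.0
```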
Velocity (Story Points)
| The metric: | Story points completed per sprint |
|---|---|
| What you wanted: | Predictable delivery, understanding of team capacity |
| What you got: | Point inflation, gaming of estimates, velocity as performance metric |
When velocity becomes a performance target, teams inflate their estimates. What was a "3" becomes a "5" to look more productive. Cross-team comparisons become meaningless. The metric stops tracking actual capacity.
Number of PRs
| The metric: | Pull requests merged per week |
|---|---|
| What you wanted: | Active development, regular integration |
| What you got: | Tiny PRs, unnecessarily split work, gaming for higher counts |
When PR count becomes a goal, developers split work into artificially small pieces. A feature that should be one PR becomes five. Review overhead increases. The metric goes up while actual throughput goes down.
Cycle Time
| The metric: | Time from first commit to merge |
|---|---|
| What you wanted: | Fast feedback, efficient process |
| What you got: | PRs opened before work is ready, rubber-stamp reviews to hit targets |
When cycle time has targets, teams find shortcuts. Opening a PR before the work is ready starts review in parallel with the remaining coding, and rubber-stamp approvals get it merged sooner, so the measured window from first commit to merge shrinks. Quality suffers while the metric improves.
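As a rough sketch of the definition above (the timestamps are invented; a real tool would pull them from the Git or GitHub API), the metric is just the elapsed window between two events, which is exactly why rubber-stamp merges compress it so easily:

```python
from datetime import datetime

# Hypothetical timestamps for one pull request.
first_commit = datetime(2024, 3, 4, 9, 0)    # first commit on the branch
merged_at = datetime(2024, 3, 6, 15, 0)      # pull request merged

cycle_time_hours = (merged_at - first_commit).total_seconds() / 3600
print(f"Cycle time: {cycle_time_hours:.0f}h")  # Cycle time: 54h

# A rushed, rubber-stamped review pulls merged_at earlier: the number improves
# even though nothing was delivered any faster.
```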
/// Our Take
Every metric we show in CodePulse can be gamed. We know this. The question isn't whether metrics can be gamed—they can. It's whether you create a culture where gaming them is more attractive than improving genuinely.
That's why we focus on team-level metrics rather than individual leaderboards, and why we emphasize understanding bottlenecks rather than hitting arbitrary targets. Metrics should inform, not evaluate.
Why Does Gaming Happen?
Understanding why Goodhart's Law operates helps you design better measurement systems:
1. Incentive Alignment
People optimize what they're rewarded for. If bonuses are tied to metrics, people will optimize those metrics—even at the expense of what the metrics were supposed to measure.
2. Metric Simplification
Reality is complex; metrics are simple. "Good software" involves quality, maintainability, performance, user satisfaction, and more. Any single metric captures only part of that—and optimizing the part can harm the whole.
3. Campbell's Law
Related to Goodhart's Law, Campbell's Law states: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
4. Local Optimization
Individuals optimizing their own metrics can create system-wide dysfunction. A developer optimizing PR count might create more work for reviewers. The developer's metric improves; the team's throughput decreases.
"You get what you measure—which is why you should be very careful what you measure."
Strategies to Avoid Metric Gaming
1. Measure for Understanding, Not Evaluation
Use metrics to understand what's happening, not to judge people. When cycle time increases, ask "what changed?" rather than "who's to blame?" When you remove the evaluation pressure, you remove the incentive to game.
2. Use Multiple Metrics Together
Balance metrics against each other. If you track cycle time, also track quality metrics like change failure rate, so optimizing one at the expense of the other becomes visible. This is why DORA uses four metrics together, not one. The table below lists useful pairings, and the sketch after it shows how a pairing can surface an imbalance.
| If You Track... | Also Track... | To Prevent... |
|---|---|---|
| Velocity | Bug rate, tech debt | Shipping fast but low quality |
| Test coverage | Test effectiveness (bugs caught) | Coverage without value |
| Cycle time | Change failure rate | Speed without stability |
| PR count | PR size, merge frequency | Artificial splitting |
| Deployment frequency | MTTR, customer incidents | Deploying for the sake of deploying |
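A minimal sketch of the pairing idea, assuming two invented metric fields rather than any particular tool's API: when the speed metric improves while its paired quality metric degrades, the imbalance is surfaced instead of hidden.

```python
from dataclasses import dataclass

@dataclass
class SprintMetrics:
    cycle_time_hours: float      # median time from first commit to merge
    change_failure_rate: float   # fraction of deployments causing incidents

def flag_unbalanced(previous: SprintMetrics, current: SprintMetrics) -> list[str]:
    """Warn when speed improves while the paired stability metric degrades."""
    warnings = []
    if (current.cycle_time_hours < previous.cycle_time_hours
            and current.change_failure_rate > previous.change_failure_rate):
        warnings.append(
            "Cycle time improved but change failure rate worsened: "
            "speed may be coming at the cost of stability."
        )
    return warnings

# Example: cycle time dropped 48h -> 36h, but failures rose from 5% to 12%.
print(flag_unbalanced(SprintMetrics(48, 0.05), SprintMetrics(36, 0.12)))
```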
3. Focus on Team Metrics, Not Individual
Individual metrics create competition; team metrics create collaboration. "Our team's cycle time" encourages helping each other. "My cycle time" encourages gaming.
4. Let Teams Choose Their Own Metrics
When teams pick what to measure, they have ownership. Imposed metrics feel like surveillance; chosen metrics feel like tools. Self-selected metrics are also harder to game because the team knows why they chose them.
5. Rotate or Sunset Metrics
Don't measure the same things forever. Once a metric has served its purpose (identified a problem, tracked an improvement), consider retiring it. Long-lived metrics develop long-evolved gaming strategies.
6. Look at Trends, Not Absolutes
"Cycle time decreased 20%" is more useful than "cycle time is 48 hours." Trends show improvement; absolutes invite comparison and competition. Focus on direction, not position.
📊 How CodePulse Addresses This
We designed CodePulse with Goodhart's Law in mind:
- Team-level focus: Default views show team metrics, not individual rankings
- Balanced metrics: Executive Summary combines speed, quality, and collaboration metrics
- Trend emphasis: Charts show change over time, not just current state
- No arbitrary targets: We show benchmarks for context, not goals to hit
Building a Healthy Metrics Culture
The solution to Goodhart's Law isn't to stop measuring—it's to measure thoughtfully:
What a Healthy Metrics Culture Looks Like
- Metrics are discussed in retrospectives, not performance reviews
- Teams ask "what does this tell us?" not "how do we improve this number?"
- Bad metrics are questioned and changed, not blindly optimized
- Leaders use metrics to understand, not to judge
- Gaming is discussed openly—and addressed by changing the metric, not punishing the gamer
Red Flags of Unhealthy Metrics Culture
- Metrics tied directly to compensation or performance reviews
- Individual leaderboards visible to management
- Targets set without team input
- Consistent "green" dashboards that don't match reality
- Fear of reporting bad numbers
- Metrics never change despite changing circumstances
"The goal of engineering metrics isn't to prove teams are productive. It's to help teams become more productive. Those are very different goals with very different measurement approaches."
Related Guides
- Engineering Metrics Without Surveillance — Building trust with metrics
- Measure Team Performance Without Micromanaging — Balanced measurement approaches
- DORA Metrics Guide — A well-designed multi-metric framework
- Data Quality in Engineering Metrics — Ensuring metrics are accurate and meaningful
Conclusion
Goodhart's Law isn't a reason to avoid metrics—it's a reason to use them wisely. Every metric you track will be gamed if the incentives are wrong. The solution is:
- Measure for understanding, not evaluation
- Use balanced metric sets, not single indicators
- Focus on team outcomes, not individual numbers
- Track trends, not absolute targets
- Rotate metrics as circumstances change
- Create psychological safety to report bad numbers
Remember: the map is not the territory. Metrics are maps of engineering performance, not the performance itself. Use them to navigate, not to judge.
"When a measure becomes a target, it ceases to be a good measure. When a team uses measures to improve, they become better measures—and a better team."
Start by auditing your current metrics. Are any tied to compensation? Are teams gaming them? Use CodePulse to understand your delivery flow, not to evaluate your developers. The metrics are there to serve you—not the other way around.