Lines of code (LOC) is the metric that refuses to die. First popularized in the 1960s at IBM, criticized by Fred Brooks in 1975, mocked by Bill Gates in the 1990s, and still showing up in board decks in 2025. This guide explains why LOC persists, why it was always flawed, and why AI-generated code has finally made it actively dangerous.
If you manage 50+ engineers and someone on your leadership team still references "lines of code per developer," this guide will arm you with the data to kill that conversation permanently.
"Measuring programming progress by lines of code is like measuring aircraft building progress by weight." — Bill Gates
A Brief History of LOC as a Metric
LOC started as a reasonable idea. In the 1960s, IBM needed to estimate project effort for large mainframe systems. Counting lines of code gave managers a tangible number to work with. Fred Brooks, who managed the IBM System/360 project, documented in The Mythical Man-Month (1975) that developers produced roughly 10 lines of debugged code per day, regardless of language. That observation was meant as a warning about the complexity of software. Instead, managers turned it into a target.
By the 1980s, the backlash had started. The Software Engineering Institute (SEI) and IEEE published guidelines for standardizing LOC counting, but the fundamental problem remained: there is no universal definition of what constitutes a "line of code." Source lines of code (SLOC) varies wildly between counting blank lines, comments, logical statements, and physical lines.
The metric survived anyway. Not because it was good, but because it was easy. Counting lines requires zero judgment, zero context, and zero understanding of software. That combination is irresistible to executives who want a single number.
Why LOC Won't Die: The Seductive Simplicity Problem
LOC endures for the same reason BMI endures in medicine: it's a bad proxy that's easy to calculate. Here are the specific reasons it keeps resurfacing:
| Reason LOC Persists | Why It's Wrong |
|---|---|
| It's countable | So are bugs. Counting the wrong thing is worse than counting nothing. |
| Non-technical leaders can understand it | Understanding a metric is useless if the metric doesn't measure what you think it does. |
| It feels objective | Objectivity without validity is just precise incorrectness. |
| It's language-agnostic (supposedly) | 50 lines of Java and 5 lines of Python can do the same thing. Language neutrality is a myth. |
| Tooling makes it trivial | Every version control system can count lines. Availability is not the same as utility. |
The real reason LOC persists is organizational. Engineering is opaque to most executives. They want a number that translates engineering effort into something tangible. LOC fills that gap, even though it fills it with garbage. If you want to understand what actually drives developer productivity metrics, start with outcomes, not outputs.
Lines of Code in the AI Era: Now It's Actually Dangerous
LOC was always a bad metric. With AI coding assistants, it has become a dangerous one.
In 2024, AI generated 41% of all code, with 256 billion lines written by AI tools in that year alone, according to Elite Brains research. GitHub Copilot users complete 126% more projects per week than manual coders, per Second Talent's analysis of AI coding statistics. If you're measuring LOC, your AI-assisted developers look 2-3x more "productive" than developers doing careful, minimal implementations.
But here is the problem: that AI-generated code is not free. A GitClear study of 211 million changed lines of code found that in 2024, duplicated code blocks rose eightfold compared to previous years. Code that was revised within two weeks of being written grew from 3.1% in 2020 to 5.7% in 2024. Refactoring activity dropped from 25% of changes in 2021 to under 10% in 2024, while copy-pasted code climbed from 8.3% to 12.3%.
Read that again: AI produces more lines, but those lines churn faster, get duplicated more, and replace less existing code. Google's 2024 DORA report found that AI adoption correlated with a 7.2% decrease in delivery stability. More code. More problems.
The LOC Paradox in the AI Era:
- Developer A (with Copilot): +2,400 lines/week
- Developer B (without): +600 lines/week
- LOC says: "A is 4x more productive."
- Reality: A's code churns 2x faster and has 8x more duplication, while B's smaller changes ship with fewer bugs.
Which developer do you actually want more of?

If you are tracking LOC in an organization where 40%+ of code is AI-generated, you are literally measuring how much your developers use autocomplete. That is not a productivity metric. That is a Goodhart's Law time bomb.
Our Take
LOC in 2025 is not just outdated. It is embarrassing. Any VP of Engineering who puts "lines of code" on a dashboard is telling their team: "I don't understand what you do, and I'm not willing to learn."
With AI assistants generating 41% of all code, LOC now measures how aggressively your team hits Tab in their IDE. The best engineers we work with routinely have negative LOC weeks. They delete dead code, consolidate abstractions, simplify architectures. Under LOC, those weeks register as zero or negative productivity. That tells you everything about the metric.
What LOC Hides: Delete, Refactor, Simplify
The deepest problem with LOC is what it penalizes. In mature codebases, the most valuable engineering work is reductive.
Deletion. Removing 500 lines of dead code that nobody understands is more valuable than adding 500 lines of new features. Dead code creates confusion, increases cognitive load during reviews, and can hide security vulnerabilities. LOC records deletion as negative productivity.
Refactoring. Consolidating three similar functions into one well-tested utility reduces the codebase by hundreds of lines while improving maintainability. LOC says the developer who did this was less productive than the one who copy-pasted.
Simplification. Replacing a 200-line custom implementation with a 10-line call to a well-maintained library is good engineering. LOC says it is a 190-line productivity loss.
These are not edge cases. In our research across thousands of PRs, the highest-impact changes often have more deletions than additions. Teams that track code churn rate understand this. A healthy codebase has regular cleanup cycles. LOC punishes every one of them.
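One concrete way to surface reductive work is to look at a change's deletion share rather than its net line count. Here is a minimal sketch; the function name and the example numbers are illustrative, not part of any tool discussed in this guide.

```python
def deletion_ratio(additions: int, deletions: int) -> float:
    """Fraction of changed lines that are deletions.

    Above 0.5, the change shrinks the codebase -- the kind of
    cleanup work that raw LOC records as negative productivity.
    """
    total = additions + deletions
    return deletions / total if total else 0.0

# A refactor replacing 1,000 legacy lines with 50 clean ones:
ratio = deletion_ratio(additions=50, deletions=1000)  # ~0.95
```

Under LOC this PR scores -950; under a deletion-share view it is an overwhelmingly reductive, high-value change.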
"The developer who deletes 1,000 lines of legacy code and replaces it with 50 lines of clean abstraction has done more for your company than the one who wrote 2,000 lines of new spaghetti."
5 Metrics That Replaced LOC at Top Companies
The industry has moved on from LOC. Google's DORA research program, spanning over 39,000 professionals across organizations of all sizes, identified metrics that actually predict organizational performance. Here are the replacements:
1. Cycle Time (First Commit to Merge)
Cycle time measures how long it takes for code to go from first commit to merged. Unlike LOC, it captures process efficiency. A team producing fewer lines but merging in 4 hours is outperforming a team producing 10x the lines that sit in review for 3 days. Check our engineering metrics dashboard guide for how to set this up.
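As a sketch of the calculation, assuming you can extract a first-commit timestamp and a merge timestamp per PR (field names here are hypothetical):

```python
from datetime import datetime
from statistics import median

def cycle_time_hours(first_commit: datetime, merged: datetime) -> float:
    """Elapsed hours from a PR's first commit to its merge."""
    return (merged - first_commit).total_seconds() / 3600

def median_cycle_time(prs: list[tuple[datetime, datetime]]) -> float:
    """Median rather than mean: one PR stuck in review for weeks
    should not mask the typical developer experience."""
    return median(cycle_time_hours(start, end) for start, end in prs)

prs = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 13, 0)),  # 4 hours
    (datetime(2025, 1, 7, 10, 0), datetime(2025, 1, 8, 4, 0)),  # 18 hours
    (datetime(2025, 1, 9, 9, 0), datetime(2025, 1, 12, 9, 0)),  # 72 hours
]
```

The median here is 18 hours; the mean would be dragged to ~31 hours by the one slow PR, which is why median is the more honest headline number.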
2. Deployment Frequency
How often your team ships to production. This measures what executives actually care about: delivery speed. Elite teams deploy multiple times per day. The number of lines in each deployment is irrelevant.
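Computing it is trivial once you log deploy events; this sketch assumes a list of deploy dates pulled from your CI system:

```python
from datetime import date

def deploys_per_day(deploy_dates: list[date]) -> float:
    """Average production deployments per calendar day over the observed window."""
    if not deploy_dates:
        return 0.0
    window = (max(deploy_dates) - min(deploy_dates)).days + 1
    return len(deploy_dates) / window

# Ten deploys spread across a five-day work week -> 2.0 per day.
week = ([date(2025, 1, 6)] * 2 + [date(2025, 1, 7)] * 3
        + [date(2025, 1, 8)] * 2 + [date(2025, 1, 9)] * 2
        + [date(2025, 1, 10)])
```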
3. Change Failure Rate
The percentage of deployments that cause incidents. This is the quality signal that LOC cannot provide. A 10,000-line release with a 0% failure rate is better than a 100-line release that takes down production.
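The arithmetic is a one-liner; what matters is agreeing on what counts as a "failure" (incident, rollback, hotfix). A minimal sketch:

```python
def change_failure_rate(deployments: int, failures: int) -> float:
    """Percentage of production deployments that caused an incident or rollback."""
    return 100.0 * failures / deployments if deployments else 0.0

# Two incidents across 95 deployments in a quarter:
rate = change_failure_rate(deployments=95, failures=2)  # ~2.1%
```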
4. Code Churn Rate
The ratio of code rewritten or deleted versus code added. Churn reveals whether code is sticking or getting thrown away. High churn on recent code suggests unclear requirements or poor initial implementation, which are problems LOC completely ignores.
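As defined here, churn is a simple ratio; the signal comes from restricting it to recently written code (the GitClear study above uses a two-week window). A sketch, with the window choice left as an assumption:

```python
def churn_rate(lines_added: int, lines_rewritten_or_deleted: int) -> float:
    """Lines rewritten or deleted per line added, measured over a recent window.

    Apply this only to code written recently (e.g. the last two weeks):
    high churn on brand-new code suggests unclear requirements or a
    poor first implementation.
    """
    return lines_rewritten_or_deleted / lines_added if lines_added else 0.0

# 120 of this sprint's 1,000 new lines were already reworked:
rate = churn_rate(lines_added=1000, lines_rewritten_or_deleted=120)  # 0.12
```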
5. PR Throughput with Size Context
PRs merged per period, weighted by whether they are small and reviewable or large and risky. This captures both velocity and quality. A team merging 20 small, well-reviewed PRs per week is in better shape than one merging 5 massive PRs that nobody can review properly.
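Adding size context can be as simple as bucketing PRs before counting them. The thresholds below are hypothetical; tune them to what your reviewers can genuinely read in one sitting:

```python
from collections import Counter

def pr_size_bucket(lines_changed: int) -> str:
    """Classify a PR by total lines changed. Thresholds are illustrative."""
    if lines_changed <= 200:
        return "small"
    if lines_changed <= 800:
        return "medium"
    return "large"

def weekly_throughput(pr_line_counts: list[int]) -> dict[str, int]:
    """PRs merged this week, broken down by size bucket."""
    return dict(Counter(pr_size_bucket(n) for n in pr_line_counts))

# Five merges this week: mostly small, one risky outlier.
breakdown = weekly_throughput([40, 120, 90, 300, 1500])
```

A week of twenty "small" merges and a week of five "large" ones both show five-plus PRs on a naive count; the breakdown makes the risk difference visible.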
| Metric | What It Measures | LOC Equivalent |
|---|---|---|
| Cycle Time | Speed of delivery | None. LOC says nothing about delivery speed. |
| Deployment Frequency | Shipping cadence | None. LOC says nothing about shipping. |
| Change Failure Rate | Quality of releases | None. LOC says nothing about quality. |
| Code Churn Rate | Code stability | None. LOC penalizes healthy cleanup. |
| PR Throughput | Team output with size context | Vaguely similar, but LOC ignores review and risk. |
📊 How to See This in CodePulse
CodePulse replaces LOC tracking with metrics that actually correlate with engineering health:
- Velocity Score (/velocity) combines cycle time, throughput, quality, and collaboration into a single composite score. It is the "single number" that LOC pretends to be, but backed by real signal.
- Engineering Health Score (/executive) gives leadership the high-level overview they want without reducing engineering to a line count.
- Cycle Time Breakdown (/dashboard) shows where time actually goes: coding, waiting for review, in review, and merge.
- Code Churn Rate tracks the ratio of deletions to additions, so refactoring shows up as valuable work, not negative productivity.
How to Explain to Your CEO Why LOC Is Embarrassing
This is the practical section. Your CEO or board member asks: "How many lines of code did the team write last quarter?" Here is how to handle it without making them feel stupid, while steering toward metrics that actually work.
The Analogy That Works
"Measuring engineering by lines of code is like measuring a hospital by number of procedures. More procedures does not mean better healthcare. We want fewer procedures with better outcomes. Similarly, we want less code that does more, ships faster, and breaks less."
The Redirect
"Instead of lines of code, here is what I track: we shipped 47 features last quarter with a median cycle time of 18 hours and a 2.1% change failure rate. That means we are shipping fast and breaking almost nothing. I can show you the trend."
The Data Point
"AI tools now generate 41% of all code (source). If we measure lines of code, we are really measuring how much our developers use autocomplete. Google's DORA research found that AI adoption actually decreases delivery stability by 7.2% (2024 DORA Report). More code is not better code."
"The phrase 'engineering productivity' is a trap. It implies engineers are factory workers with countable outputs. Software is creative knowledge work. Measure the system, not the workers."
The One-Page Replacement
Give your executive team a one-page dashboard with four numbers: deployment frequency, cycle time, change failure rate, and team throughput (PRs merged). These four numbers tell you more about engineering health than a billion lines of code ever could. Every major tech company uses some variant of this framework.
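If you want to prototype that one-pager before buying a dashboard, the headline line is a few lines of formatting code. This is a sketch; the function name and display format are assumptions, not a prescribed layout:

```python
def one_page_summary(deploys_per_day: float, median_cycle_hours: float,
                     change_failure_pct: float, prs_merged: int) -> str:
    """Render the four headline numbers as a single line for an executive update."""
    return (f"Deploys/day: {deploys_per_day:.1f} | "
            f"Median cycle time: {median_cycle_hours:.0f}h | "
            f"Change failure rate: {change_failure_pct:.1f}% | "
            f"PRs merged: {prs_merged}")

# Matches the quarter described in "The Redirect" above:
line = one_page_summary(3.0, 18, 2.1, 47)
```

Notice that lines of code appears nowhere in it, and nothing is lost.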
FAQ
Is LOC ever useful?
In narrow contexts, yes. LOC can be useful for rough project sizing during estimation (comparing similar projects in the same language), identifying unusually large PRs that need extra review attention, and detecting potential code churn patterns. It should never be used to compare developer productivity or set performance targets.
What about LOC per developer per sprint?
This is the worst variant. Different tasks require vastly different amounts of code. A developer fixing a critical one-line bug has done more for your company than one who wrote 3,000 lines of boilerplate. Per-developer LOC comparisons create exactly the wrong incentives.
My organization still requires LOC reporting. What do I do?
Report it, but always present it alongside context metrics. Show LOC next to cycle time, churn rate, and deployment frequency. Over time, your stakeholders will see that the contextual metrics correlate with business outcomes while LOC does not.
How did companies like Google move past LOC?
Google invested in the DORA research program, which studied 39,000+ professionals over a decade to identify metrics that predict organizational performance. DORA metrics (deployment frequency, lead time, change failure rate, mean time to recovery) replaced LOC because they correlate with business results. In 2024, DORA added rework rate as a fifth metric. In 2025, they moved to seven team archetypes instead of simple performance tiers.
Does AI-generated code make LOC completely meaningless?
Essentially, yes. When 41% of code is AI-generated and AI-authored code shows 8x more duplication and nearly double the churn rate (GitClear 2025 research), LOC becomes a measure of AI tool adoption, not engineering output.
See these insights for your team
CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.
Free tier available. No credit card required.
Related Guides
Lines of Code Is Embarrassing. Measure This Instead
Stop treating engineers like factory workers. Learn why LOC tracking is embarrassing, which metrics destroy trust, and how to measure productivity without surveillance. 83% of developers suffer burnout—bad metrics make it worse.
Goodhart's Law in Software: Why Your Metrics Get Gamed
When a measure becomes a target, it ceases to be a good measure. This guide explains Goodhart's Law with real engineering examples and strategies to measure without destroying what you're measuring.
Engineering Metrics Dashboard: The 7 Metrics You Need
Skip vanity metrics. Here are the 7 engineering metrics VPs actually need to track team performance, delivery, and quality.
High Code Churn Isn't Bad. Unless You See This Pattern
Learn what code churn rate reveals about your codebase health, how to distinguish healthy refactoring from problematic rework, and when to take action.
