The DORA Four Keys have become the definitive framework for measuring software delivery performance. But most VPs and Directors of Engineering face a common problem: they want to implement DORA metrics, but they lack access to comprehensive CI/CD telemetry, their deployment pipelines are fragmented across multiple tools, or their incident tracking is manual and inconsistent. Can you still measure the Four Keys accurately?
Yes. This guide shows you exactly how to implement DORA metrics using only GitHub data as your source of truth. You'll learn how each metric maps to observable Git activity, which proxies work (and which don't), and how to benchmark your performance against industry standards—all without instrumenting a single deployment pipeline.
What Are the DORA Four Keys (And Why Google Invested in Them)
The DORA (DevOps Research and Assessment) program began in 2014 as an academic research initiative to identify what separates high-performing engineering organizations from low performers. After analyzing data from over 36,000 engineering professionals across multiple years, the research team (later acquired by Google in 2018) identified four key metrics that reliably predict organizational performance.
Why Google Cares About DORA
Google's acquisition of the DORA team wasn't academic curiosity—it was strategic. The Four Keys directly correlate with business outcomes that matter to any engineering organization:
- 2x higher organizational performance: Teams in the elite DORA category are twice as likely to meet or exceed their organizational performance goals
- 50% less burnout: High-performing teams report significantly lower rates of burnout and fatigue
- Higher employee retention: Teams with strong DORA metrics have measurably better retention rates
- Faster innovation: Elite performers can experiment and iterate at significantly higher velocity
The Four Keys matter because they measure outcomes (how fast you deliver value, how reliable your delivery is) rather than outputs (how many commits you make, how many hours you work). This makes them uniquely valuable for engineering leadership.
The Four Key Metrics
The DORA framework measures software delivery performance across two dimensions:
- Throughput (Speed): How fast can you deliver changes?
  - Deployment Frequency: How often do you release to production?
  - Lead Time for Changes: How long from commit to production?
- Stability (Quality): How reliable are your changes?
  - Change Failure Rate: What percentage of changes cause incidents?
  - Time to Restore Service: How quickly can you recover from failures?
The genius of the framework is this balance: you can't optimize for speed alone (you'll ship broken code) or stability alone (you'll never ship anything). Elite teams excel at both simultaneously.
🎯 Why These Four Metrics?
The DORA researchers tested hundreds of potential metrics and found that these four had the strongest correlation with overall organizational performance. More importantly, they're predictive: improving these metrics leads to improved business outcomes, not just better engineering stats.
Other metrics you might track—like code coverage, story points, or lines of code—don't have this predictive power. The Four Keys do.
How Each Key Metric Maps to GitHub Data
The traditional DORA implementation requires comprehensive instrumentation: deployment webhooks, incident management integrations, CI/CD telemetry. But for teams using GitHub as their source of truth for development activity, you can measure all four metrics with surprising accuracy using only Git data.
Deployment Frequency from GitHub
DORA Definition: How often your organization successfully releases to production (or releases to end users for on-demand software).
GitHub Proxy: PRs merged to your main/production branch per working day.
This proxy works when your development workflow follows one of these patterns:
- Trunk-based development: Every merge to main triggers an automated deployment to production
- Continuous deployment: Merges to main deploy within minutes via automated pipelines
- Daily/weekly release trains: Merges to main deploy in batches on a predictable schedule
Accuracy caveat: If you merge to main but manually gate deployments (e.g., merge on Wednesday, deploy on Friday), this proxy will overestimate your deployment frequency. In that case, consider tracking release tags instead.
```bash
# Example: Calculating Deployment Frequency from Git
# Count merges to main in the last 30 days
git log --merges --first-parent main --since="30 days ago" --format="%ad" | wc -l

# Divide by working days (30 days ≈ 21 working days)
# If result = 42 merges → 42/21 = 2 deploys per working day (High Performer)
```
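If your merges are gated and production releases are cut as GitHub Releases (or tags), counting releases is usually the more honest deployment frequency proxy. Here is a minimal sketch against the GitHub Releases REST API; the repo name is a placeholder, the token comes from a GITHUB_TOKEN environment variable, and only the first page of results is fetched:

```python
import os
from datetime import datetime, timedelta, timezone

import requests

# Hypothetical repo; adjust to your environment.
REPO = "your-org/your-repo"
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

# First page only (per_page max is 100); paginate for busy repos.
resp = requests.get(
    f"https://api.github.com/repos/{REPO}/releases",
    headers=headers,
    params={"per_page": 100},
)
resp.raise_for_status()

cutoff = datetime.now(timezone.utc) - timedelta(days=30)
recent = [
    r for r in resp.json()
    if r.get("published_at")
    and datetime.fromisoformat(r["published_at"].replace("Z", "+00:00")) >= cutoff
]

# A 30-day window has roughly 21 working days
print(f"Release frequency ≈ {len(recent) / 21:.2f} per working day")
```

If you tag releases without creating GitHub Releases, the same idea works against your tag list (for example via git for-each-ref on refs/tags, sorted by creation date).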
Lead Time for Changes from GitHub
DORA Definition: The amount of time it takes a commit to get into production. Specifically: time from when code is committed to version control to when it's running in production.
GitHub Proxy: Time from first commit on a branch to PR merged to main. For a more actionable metric, use cycle time: PR opened to merged.
Why cycle time is often better than commit-to-merge lead time:
- First commit is often experimental: Developers may commit locally multiple times while exploring a solution before the "real" work begins
- PR creation signals readiness: When a developer opens a PR, they're saying "this is ready for review and potential merge"—that's your true start time
- Measurable bottlenecks: Cycle time lets you see exactly where delays happen (waiting for review, review duration, CI failures, merge conflicts)
CodePulse tracks both variants and breaks cycle time into four actionable components. For detailed analysis, see our Deployment Frequency and Lead Time Guide.
```python
# Example: Calculating Lead Time from the GitHub API
# For each merged PR:
first_commit_time = pr.commits[0].created_at
merge_time = pr.merged_at
lead_time = merge_time - first_commit_time

# Aggregate to the median (more robust than the mean for skewed distributions)
median_lead_time_hours = median(all_lead_times)

# Example result: median = 18 hours → High Performer
```
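For the cycle-time variant (PR opened to merged), both timestamps are on every pull request object returned by the GitHub REST API, so no per-commit lookups are needed. A rough, self-contained sketch; the repo name is a placeholder and only the first page of results is fetched:

```python
import os
import statistics
from datetime import datetime

import requests

REPO = "your-org/your-repo"  # placeholder
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls",
    headers=headers,
    params={"state": "closed", "base": "main", "per_page": 100},  # first page only
)
resp.raise_for_status()

def parse(ts):
    # GitHub returns ISO 8601 timestamps ending in "Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

cycle_times_hours = [
    (parse(pr["merged_at"]) - parse(pr["created_at"])).total_seconds() / 3600
    for pr in resp.json()
    if pr["merged_at"]  # closed-but-unmerged PRs have merged_at = null
]

if cycle_times_hours:
    print(f"Median cycle time: {statistics.median(cycle_times_hours):.1f} hours")
```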
Change Failure Rate from GitHub
DORA Definition: The percentage of changes to production that result in degraded service or require remediation (e.g., lead to service impairment or outage, require a hotfix, rollback, fix forward, or patch).
GitHub Proxy (Option 1): Revert commit ratio—count commits that revert previous changes as a proxy for production failures.
```bash
# Example: Detecting Reverts in Git
# Find revert commits (usually have "Revert" or "revert" in the message)
git log --grep="Revert" --grep="revert" -i --oneline --since="30 days ago"

# Calculate the ratio
# total_commits = 120
# revert_commits = 8
# change_failure_rate = (8 / 120) * 100 = 6.7% → Elite Performer
```
GitHub Proxy (Option 2): Hotfix branch frequency—if your team uses dedicated hotfix branches for production issues, count hotfix PRs merged as a proxy for failures.
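If you name hotfix branches consistently (a hotfix/ prefix is assumed here; adjust to your own convention), the ratio of hotfix merges to total merges is easy to compute once you have the PR list. A sketch that operates on PR dicts shaped like the GitHub API's pull request objects, demonstrated on a toy sample:

```python
# PR dicts shaped like GET /repos/{owner}/{repo}/pulls responses
def change_failure_rate(prs, hotfix_prefix="hotfix/"):
    merged = [pr for pr in prs if pr.get("merged_at")]
    hotfixes = [pr for pr in merged if pr["head"]["ref"].startswith(hotfix_prefix)]
    return 100 * len(hotfixes) / len(merged) if merged else 0.0

# Toy sample, not real data: two merged PRs, one of them a hotfix
sample = [
    {"merged_at": "2024-05-01T12:00:00Z", "head": {"ref": "feature/search"}},
    {"merged_at": "2024-05-02T09:30:00Z", "head": {"ref": "hotfix/login-500"}},
    {"merged_at": None, "head": {"ref": "feature/abandoned"}},
]
print(f"{change_failure_rate(sample):.1f}%")  # 50.0% on this toy sample
```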
GitHub Proxy (Option 3): PRs with failing status checks that were eventually merged—indicates code that passed review but failed automated testing, suggesting quality issues.
The hard truth: Change failure rate is the hardest DORA metric to measure accurately from Git alone. For true accuracy, you need incident management data (PagerDuty, Opsgenie, etc.) cross-referenced with deployments. If you track incidents in GitHub Issues with specific labels (incident, production-bug), you can correlate those with recent merges.
Time to Restore Service from GitHub
DORA Definition: How long it generally takes to restore service when a service incident occurs (e.g., unplanned outage, service impairment).
GitHub Proxy (Option 1): For teams that create GitHub Issues for incidents, measure time from issue creation (with incident label) to issue closure.
GitHub Proxy (Option 2): Time from revert commit to fix commit merged. This captures "we broke production, reverted it, and then fixed the underlying issue."
```python
# Example: Measuring MTTR from GitHub Issues
# For each incident issue:
incident_created = issue.created_at
incident_resolved = issue.closed_at
time_to_restore = incident_resolved - incident_created

# Aggregate to the median
median_mttr_hours = median(all_mttr_values)

# Example result: median = 2.5 hours → High Performer
```
The hard truth (again): Like change failure rate, Time to Restore Service is difficult to measure accurately from GitHub alone. Most engineering organizations need to integrate incident management tooling or at minimum maintain a rigorous practice of tracking incidents as GitHub Issues.
💡 Start with What You Can Measure
If you can't accurately measure all four keys immediately, start with the two throughput metrics (Deployment Frequency and Lead Time). These are straightforward to calculate from GitHub data and provide immediate value.
Add the two stability metrics (Change Failure Rate and Time to Restore) once you've established better incident tracking practices. Imperfect data for all four is better than perfect data for only two, but don't let perfect be the enemy of good.
Setting Up Measurement Without CI/CD Access
Many VPs and Directors of Engineering don't control their CI/CD infrastructure—DevOps or Platform teams do. Here's how to implement DORA metrics when you can't instrument your deployment pipelines.
Phase 1: Establish Your Development Workflow Patterns
Before you can choose the right proxies, you need to understand your actual workflow:
- Map the merge-to-deploy relationship: When a PR merges to main, how long until it's in production?
  - Immediate (automated CD) → Use merge timestamps directly
  - Within hours (manual trigger) → Add the average delay to merge timestamps
  - Batched weekly → Use release tag timestamps instead of merge timestamps
- Identify production branches: Does main represent production? Or do you have a separate production or release branch?
- Document release patterns: Do you deploy continuously, on a schedule (daily/weekly), or ad hoc?
This audit takes 1-2 hours and is essential for choosing accurate proxies. Interview 2-3 developers and your DevOps lead to understand the real workflow.
Phase 2: Connect GitHub as Your Single Source of Truth
Once you understand your workflow, configure your metrics tool to extract DORA data from GitHub:
- Deployment Frequency: Count PRs merged to your production branch per working day
- Lead Time: Calculate median time from PR creation to merge (cycle time)
- Change Failure Rate: Count revert commits or hotfix PRs as percentage of total merges
- Time to Restore: Track incident issues (if using GitHub Issues) or measure revert-to-fix time
Tools like CodePulse automatically extract these metrics from your GitHub data without requiring any CI/CD instrumentation. For manual tracking, you can query the GitHub API:
```text
# Example: GitHub API query for merged PRs (Deployment Frequency)
GET /repos/{owner}/{repo}/pulls?state=closed&base=main&per_page=100

# Filter results where merged_at is not null
# Count PRs merged in last 30 days
# Divide by working days (typically 21-22 per 30-day period)
```

Phase 3: Establish Baseline Performance
Before setting targets or making changes, collect 30-90 days of baseline data. This tells you where you currently stand:
DORA Baseline Performance (30-day snapshot)

This baseline becomes your "before" snapshot. All improvement initiatives will be measured against this baseline.
Phase 4: Create Your DORA Dashboard
Present your DORA metrics in a format that's accessible to both engineering teams and executives. Your dashboard should show:
- Current values: This week/month's performance for all four metrics
- Trend indicators: Are metrics improving or regressing vs. previous period?
- Performance classification: Elite, High, Medium, or Low according to DORA benchmarks
- Historical trends: 90-day rolling view showing improvement over time
- Team breakdowns: Compare metrics across different teams/repositories
For dashboard design best practices, see our Engineering Metrics Dashboard Guide.
Benchmarking Against Industry Standards
The DORA research team established performance benchmarks based on surveying tens of thousands of engineering teams. Here's how to interpret where your organization falls.
Complete DORA Performance Benchmark Table
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | On-demand (multiple deploys per day) | Between once per day and once per week | Between once per week and once per month | Fewer than once per month |
| Lead Time for Changes | Less than one hour | Between one day and one week | Between one week and one month | More than one month |
| Change Failure Rate | 0-15% | 16-30% | 16-30% | More than 30% |
| Time to Restore Service | Less than one hour | Less than one day | Between one day and one week | More than one week |
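For dashboards, the thresholds in this table translate directly into code. Below is a rough sketch for two of the metrics (deployment frequency as deploys per day, lead time in hours); the cutoffs come from the table above, and the boundary cases are bucketed by one reasonable reading rather than any official rule:

```python
def classify_deployment_frequency(deploys_per_day: float) -> str:
    # "On-demand (multiple deploys per day)" read as strictly more than one per day
    if deploys_per_day > 1:
        return "Elite"
    if deploys_per_day >= 1 / 7:    # between once per day and once per week
        return "High"
    if deploys_per_day >= 1 / 30:   # between once per week and once per month
        return "Medium"
    return "Low"

def classify_lead_time(hours: float) -> str:
    if hours < 1:                   # less than one hour
        return "Elite"
    if hours <= 7 * 24:             # up to one week
        return "High"
    if hours <= 30 * 24:            # up to one month
        return "Medium"
    return "Low"

print(classify_deployment_frequency(0.5))  # High (roughly every other day)
print(classify_lead_time(18))              # High (18 hours)
```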
Understanding Your Performance Profile
Most organizations don't fall cleanly into one category—you might be Elite at some metrics and Medium at others. Common patterns:
"Fast but fragile" teams: Elite deployment frequency and lead time, but Medium/Low change failure rate and time to restore. These teams ship quickly but break production frequently. Focus: Quality gates, testing practices, staged rollouts.
"Slow but stable" teams: Elite change failure rate and time to restore, but Medium/Low deployment frequency and lead time. These teams are careful but slow. Focus: Automation, reducing batch sizes, improving cycle time.
"Unbalanced" teams: High performers on throughput metrics but haven't invested in incident response. Elite deployment frequency and lead time, but no visibility into stability metrics. Focus: Implement incident tracking, establish on-call practices.
Comparing Against Industry Peers
DORA performance varies significantly by industry and company stage:
- Startups/Scale-ups: Typically achieve Elite/High throughput metrics (fast shipping) but Medium stability metrics (breaking things is acceptable when moving fast)
- Enterprise: Often Medium/High across all metrics—slower but more stable than startups
- Highly regulated (fintech, healthcare): Often Low/Medium throughput metrics due to compliance requirements, but Elite stability metrics (failures are not acceptable)
Don't compare your healthcare startup to a consumer SaaS company. Compare against companies with similar regulatory requirements and business constraints. For industry-specific guidance, see our guides on Engineering Metrics for Fintech and Engineering Metrics for Healthcare.
Common Implementation Pitfalls
After helping hundreds of engineering teams implement DORA metrics, we've seen the same mistakes repeatedly. Here's how to avoid them.
Pitfall 1: Measuring Without Context
The mistake: Tracking metrics in isolation without understanding what drives them or what trade-offs you're making.
Example: Your deployment frequency drops from 3/day to 1/day. Is that a problem? Maybe—or maybe you just finished a major migration that required larger, more careful changes.
The fix: Always annotate your metrics with context:
- Major releases or refactoring efforts
- Team size changes (people joining/leaving)
- Seasonal effects (holidays, end-of-year code freezes)
- Deliberate process changes (new review requirements, security audits)
Pitfall 2: Gaming the Metrics
The mistake: Using DORA metrics for individual performance evaluation, which incentivizes gaming behavior.
Example: If deployment frequency is tied to bonuses, developers will split every change into tiny PRs to inflate their merge count—even when larger PRs would be more appropriate.
The fix: Use DORA metrics only at the team/organization level. Never use them for individual performance reviews. Track individual contributions separately through qualitative feedback, peer reviews, and project impact.
Pitfall 3: Optimizing One Metric at the Expense of Others
The mistake: Focusing obsessively on one metric (usually deployment frequency) while ignoring the others.
Example: Pushing for Elite deployment frequency by removing code review requirements and skipping tests. Deployment frequency improves, but change failure rate skyrockets.
The fix: Track all four metrics together. Elite performers excel at both throughput and stability simultaneously. If improving one metric degrades another, you're not actually improving—you're just shifting where the problem shows up.
Pitfall 4: Comparing Teams Without Adjusting for Context
The mistake: Creating team leaderboards or directly comparing teams with different missions and constraints.
Example: Comparing your infrastructure team's deployment frequency (0.5/day) to your web frontend team's deployment frequency (5/day) and concluding the infrastructure team is "underperforming."
The fix: Compare teams against their own historical performance, not against each other. Platform teams, infrastructure teams, and security teams naturally have different DORA profiles than application teams. Judge each team by improvement, not absolute values.
Pitfall 5: Treating Elite Performance as the Only Goal
The mistake: Assuming every team must reach Elite status on all four metrics to be successful.
Example: A team moves from Low to Medium performance—a massive improvement—but leadership still sees them as "failing" because they're not Elite.
The fix: Celebrate improvement, not just absolute performance. The research shows that improving your metrics correlates with business outcomes—you don't need Elite status to benefit. A team that moves from Medium to High sees real benefits even if they never reach Elite.
Pitfall 6: Implementing Without Buy-In
The mistake: Rolling out DORA metrics as a top-down mandate without explaining why they matter or how they'll be used.
Example: Engineering teams see DORA metrics as "another management dashboard" and don't engage with them, or worse, actively try to game them.
The fix: Implement DORA metrics collaboratively:
- Explain why these specific metrics correlate with team success and reduced burnout
- Show how the metrics will be used (improvement tracking, not individual evaluation)
- Involve team leads in defining baselines and setting realistic targets
- Make metrics transparent—everyone should see the same dashboard you see
For a structured rollout approach, see our Engineering Metrics Rollout Playbook.
From Measurement to Improvement
Measuring DORA metrics is only valuable if you use that data to drive actual improvement. Here's a practical framework for turning metrics into better performance.
Step 1: Identify Your Biggest Constraint
Don't try to improve all four metrics simultaneously. Use your baseline data to identify which metric is your biggest bottleneck:
- If deployment frequency is Low/Medium: You're not shipping fast enough. Focus on reducing cycle time and batch sizes.
- If lead time is Low/Medium: You have process bottlenecks. Focus on identifying and removing delays (review time, CI time, approval processes).
- If change failure rate is Low/Medium: You're shipping broken code. Focus on quality gates (testing, review rigor, staged rollouts).
- If time to restore is Low/Medium: You can't recover from failures quickly. Focus on incident response (monitoring, runbooks, rollback procedures).
Step 2: Set a 90-Day Improvement Target
Choose realistic, achievable targets for your constraint metric:
- Realistic improvement rate: 20-30% improvement per quarter is achievable. 100% improvement requires fundamental process changes.
- Example targets:
- Deployment frequency: 1.2/day → 1.8/day (50% increase)
- Lead time: 48 hours → 32 hours (33% reduction)
- Change failure rate: 22% → 16% (approaching the Elite band)
- Time to restore: 6 hours → 4 hours (33% reduction)
Step 3: Identify Specific Interventions
For each target, identify 2-3 specific changes you'll make to drive improvement:
To improve Deployment Frequency:
- Reduce PR size (set a 400-line soft limit; a sketch for auditing PR sizes follows this list)
- Implement feature flags to allow shipping incomplete features
- Automate deployment triggers (remove manual gates)
- Encourage parallel work (multiple PRs in flight per developer)
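The 400-line soft limit above is easier to hold when oversized PRs are visible. Here is a sketch that flags large merged PRs via the GitHub REST API; note that the list endpoint omits size fields, so each PR is fetched individually (one extra request per PR), and the repo name and token are placeholders:

```python
import os

import requests

API = "https://api.github.com"
REPO = "your-org/your-repo"  # placeholder
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
SOFT_LIMIT = 400  # additions + deletions

# Recently closed PRs against main (first page only)
prs = requests.get(
    f"{API}/repos/{REPO}/pulls",
    headers=headers,
    params={"state": "closed", "base": "main", "per_page": 30},
).json()

for pr in prs:
    if not pr["merged_at"]:
        continue  # closed without merging
    # additions/deletions are only returned by the single-PR endpoint
    detail = requests.get(f"{API}/repos/{REPO}/pulls/{pr['number']}", headers=headers).json()
    size = detail["additions"] + detail["deletions"]
    if size > SOFT_LIMIT:
        print(f"#{pr['number']} ({size} lines changed): {pr['title'][:60]}")
```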
To improve Lead Time:
- Balance review load across more team members (no single bottleneck reviewer)
- Set SLA for first review (PRs reviewed within 4 business hours)
- Parallelize CI pipelines to reduce test wait time
- Enable auto-merge for approved PRs (remove manual merge step)
To improve Change Failure Rate:
- Increase test coverage for critical paths
- Implement staged rollouts (canary to 5% → 50% → 100%)
- Require two approvals for high-risk changes
- Add pre-merge integration testing in production-like environments
To improve Time to Restore:
- Implement automated rollback procedures
- Create incident runbooks for common failure modes
- Improve observability (add monitoring for key user journeys)
- Practice incident response with quarterly game days
Step 4: Measure Weekly, Adjust Monthly
Track your target metric weekly to see if your interventions are working:
- Week 1-2: Implement changes, expect metrics to dip slightly as team adjusts to new processes
- Week 3-6: Metrics should start improving as new practices take hold
- Week 7-12: Metrics should show sustained improvement; if not, adjust interventions
If you're not seeing improvement by week 8, hold a retrospective to understand why:
- Are people actually following the new process?
- Did we identify the right bottleneck?
- Are there hidden dependencies we didn't account for?
Step 5: Sustain and Expand
Once you've improved your constraint metric, celebrate the win and then move to the next constraint:
- Quarter 1: Improve deployment frequency from Low to Medium
- Quarter 2: Improve lead time from Medium to High
- Quarter 3: Improve change failure rate from Medium to High
- Quarter 4: Improve time to restore from Medium to Elite
This iterative approach prevents overwhelm and ensures each improvement becomes embedded before moving to the next challenge.
🚀 Track DORA Metrics in CodePulse
CodePulse automatically calculates all four DORA metrics from your GitHub data:
- Deployment Frequency: PRs merged to main per working day
- Lead Time: PR cycle time with component breakdown
- Change Failure Rate: Revert ratio and hotfix frequency
- Time to Restore: Incident issue duration
All metrics include trend analysis, performance classification, and team-level breakdowns—no CI/CD instrumentation required.
Reporting to Executives and Boards
When presenting DORA metrics to non-technical stakeholders, focus on business impact:
- Instead of: "We improved our deployment frequency from 1.2 to 2.1 per day"
- Say: "We've reduced time-to-market for customer-requested features by 40%, allowing us to respond to competitive threats within days instead of weeks"
- Instead of: "Our change failure rate decreased from 22% to 14%"
- Say: "We've reduced production incidents by 35%, improving customer experience and reducing on-call burden on the team"
For comprehensive guidance on board-ready reporting, see our Board-Ready Engineering Metrics Guide.
Final Thoughts: DORA Metrics as a Foundation
The Four Keys aren't the only metrics you should track, but they provide a solid foundation for understanding software delivery performance. Once you've implemented DORA metrics and established a baseline, you can expand to track additional dimensions:
- Team health metrics (review load distribution, burnout signals)
- Code quality metrics (test coverage, technical debt)
- Individual growth metrics (for mentoring, not performance evaluation)
- Business alignment metrics (feature adoption, customer impact)
But start with DORA. These four metrics, measured consistently over time, will give you the visibility and improvement framework your engineering organization needs to level up.
See these insights for your team
CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.
Free tier available. No credit card required.
Related Guides
DORA Metrics Are Being Weaponized. Here's the Fix
DORA metrics were designed for research, not management. Learn how to use them correctly as signals for improvement, not targets to game.
Measuring Deploy Frequency Without CI/CD (The Hack That Works)
Master DORA deployment frequency and lead time using GitHub data alone, without requiring CI/CD pipeline access.