
Engineering Metrics Look Wrong? Fix These 7 Data Quality Problems

Your engineering metrics look off. Cycle time is impossibly high. GitHub Insights shows different numbers. This guide covers the 7 most common data quality problems and how to fix each one.

12 min read · Updated March 25, 2026 · By CodePulse Team

Your engineering metrics look wrong. Cycle time is impossibly high. GitHub Insights shows different numbers. Bots are inflating your commit count. This guide covers the 7 most common data quality problems in engineering analytics and how to fix each one.

Quick Answer

Why do my engineering metrics look wrong?

The most common causes of misleading engineering metrics are: bot activity inflating counts (filter Dependabot, Renovate), draft PRs skewing cycle time (exclude drafts), stale PRs from months ago being merged (filter by date range), branch configuration including release/staging branches in metrics (set up branch exclusions), and force pushes rewriting commit timestamps. Start by enabling bot filtering and setting a 90-day window.

Why Your Numbers Don't Match GitHub Insights

This is the #1 question we hear from new users. You connect an analytics tool and the numbers differ from what GitHub shows. This is expected: the two systems count different things, and the filtered numbers are usually more useful for decision-making.

| Difference | GitHub Insights | Analytics Tools (CodePulse) |
| --- | --- | --- |
| Bot commits | Included | Excluded by default |
| Draft PRs | Counted in activity | Excluded from cycle time |
| Self-merges | Counted as merges | Flagged separately |
| Time calculation | Calendar time | Configurable (working hours option) |
| Branch filtering | All branches | Configurable exclusions |
| Private repos | Limited in free tier | Full access via GitHub App |

If you need the numbers to match exactly, disable bot filtering and include all branches. But you probably do not want that. Filtered metrics are more useful for decision-making.

🔥 Our Take

Precision is the enemy of useful metrics. An 80% accurate metric that you act on weekly beats a 99% accurate metric that takes a month to calculate.

Stop trying to make your analytics tool match GitHub Insights exactly. The goal is consistent, actionable trends, not accounting-grade precision. If cycle time is trending up, it does not matter whether the absolute number is 18 hours or 22 hours. What matters is the direction.

Problem 1: Bots Inflating Your Metrics

Dependabot, Renovate, GitHub Actions bots, and other automated accounts can generate hundreds of PRs per month. If these are included in your metrics, they distort everything: cycle time (bot PRs merge instantly or sit forever), PR volume (inflated by dependency updates), and review load (if someone reviews bot PRs manually).

Fix:

  • Enable bot filtering in your analytics tool (CodePulse excludes bots by default)
  • Check for accounts with [bot] in their login name
  • Add custom bot accounts specific to your org (deploy bots, CI bots)
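The filtering above can be sketched in a few lines. This is a minimal illustration, not CodePulse's implementation; the PR dict shape and the `CUSTOM_BOTS` names are invented for the example. GitHub App accounts do end their login with `[bot]`, which is the main signal to key on.

```python
# Sketch: drop bot-authored PRs before computing metrics.
# CUSTOM_BOTS holds org-specific automation accounts (illustrative names).
CUSTOM_BOTS = {"deploy-bot", "ci-runner"}

def is_bot(login: str) -> bool:
    """GitHub App accounts end in '[bot]'; also check an org-specific list."""
    return login.endswith("[bot]") or login in CUSTOM_BOTS

def filter_bots(prs):
    return [pr for pr in prs if not is_bot(pr["author"])]

prs = [
    {"author": "dependabot[bot]", "title": "Bump lodash"},
    {"author": "alice", "title": "Add checkout flow"},
    {"author": "deploy-bot", "title": "Release 1.4.2"},
]
human_prs = filter_bots(prs)  # only alice's PR remains
```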

Problem 2: Stale PRs Skewing Cycle Time

A PR opened 6 months ago and merged yesterday has a cycle time of 180 days. If that is included in your team's median, it massively distorts the picture. Stale PRs are usually abandoned work that someone merged to clean up, not representative of normal delivery.

Fix:

  • Filter metrics to PRs opened within the last 90 days
  • Use median instead of mean (medians are resistant to outliers)
  • Set up alerts for PRs open longer than 7 days to prevent staleness
  • Review and close abandoned PRs monthly
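A quick numeric example shows why the median matters here. The cycle times below are invented; the 4320-hour entry stands in for a 180-day stale PR merged during cleanup.

```python
# Sketch: one stale PR wrecks the mean but barely moves the median.
from statistics import mean, median

cycle_times = [18, 22, 15, 30, 4320]  # hours; last entry is a 180-day stale PR

print(mean(cycle_times))    # 881.0 -- dominated by the outlier
print(median(cycle_times))  # 22    -- close to typical delivery
```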

Problem 3: Wrong Branch Configuration

PRs to release branches, staging branches, and hotfix branches have different lifecycle patterns than feature PRs. Including them in the same metrics pool creates noise.

Fix:

  • Exclude PRs that target release/*, staging, and hotfix/* branches from your feature-delivery metrics
  • CodePulse automatically excludes PRs whose source branch is main, master, develop, or staging
  • Configure additional exclusion patterns for your branching strategy
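Branch exclusion patterns like these can be matched with simple glob rules. A minimal sketch, assuming an illustrative pattern list and a function that tests a PR's target branch; real tools expose this as configuration rather than code.

```python
# Sketch: glob-style branch exclusion for PR metrics.
from fnmatch import fnmatch

EXCLUDED_TARGETS = ["release/*", "staging", "hotfix/*"]  # illustrative patterns

def include_pr(target_branch: str) -> bool:
    """Keep a PR in the metrics pool only if its target matches no pattern."""
    return not any(fnmatch(target_branch, pat) for pat in EXCLUDED_TARGETS)

print(include_pr("main"))         # True  -- normal feature PR target
print(include_pr("release/2.3"))  # False
print(include_pr("staging"))      # False
```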

Problem 4: Force Pushes Breaking History

Force pushes (git push --force) rewrite commit history. This can cause analytics tools to lose track of original commit timestamps, making coding time calculations unreliable.

Fix:

  • Prefer git push --force-with-lease (safer but still rewrites history)
  • Use squash merges instead of rebase-and-force-push workflows for cleaner history
  • Most analytics tools use PR events (immutable) rather than commit timestamps for key metrics
  • CodePulse uses PR lifecycle events (created, reviewed, merged) which are not affected by force pushes
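Computing cycle time from PR events rather than commit timestamps looks like this. The timestamps are invented, but the ISO 8601 format matches what the GitHub API returns for `created_at` and `merged_at`.

```python
# Sketch: cycle time from immutable PR lifecycle events, so force pushes
# rewriting commit history cannot distort the number.
from datetime import datetime

def cycle_time_hours(created_at: str, merged_at: str) -> float:
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601, as the GitHub API returns it
    delta = datetime.strptime(merged_at, fmt) - datetime.strptime(created_at, fmt)
    return delta.total_seconds() / 3600

print(cycle_time_hours("2026-03-20T09:00:00Z", "2026-03-21T15:30:00Z"))  # 30.5
```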

Problem 5: Weekend and Holiday Hours in Cycle Time

A PR opened Friday at 5 PM and merged Monday at 9 AM shows 64 hours of cycle time, even though only minutes of actual work may have happened. Calendar-time cycle time can be misleading for teams that do not work weekends.

Fix:

  • Configure working hours in your analytics tool (CodePulse supports working days configuration per org)
  • Use "working hours only" mode if your team has consistent work schedules
  • For distributed teams across time zones, calendar time may be more appropriate since someone is always working
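A "working hours only" calculation can be sketched as a simple hour-by-hour walk. This is deliberately naive (slow, and it assumes the start time is aligned to the hour); production tools use interval arithmetic, and the 9-to-5 Monday-to-Friday window is just an example schedule.

```python
# Sketch: count only Mon-Fri, 9 AM - 5 PM hours between two datetimes.
from datetime import datetime, timedelta

def working_hours(start: datetime, end: datetime) -> int:
    hours = 0
    t = start
    while t < end:
        if t.weekday() < 5 and 9 <= t.hour < 17:  # Mon=0..Fri=4, 09:00-17:00
            hours += 1
        t += timedelta(hours=1)
    return hours

# The Friday-5-PM-to-Monday-9-AM PR: 64 calendar hours, 0 working hours.
opened = datetime(2026, 3, 20, 17)  # a Friday
merged = datetime(2026, 3, 23, 9)   # the following Monday
print(working_hours(opened, merged))  # 0
```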

"The best metric configuration is the one your team agrees represents reality. Consistency matters more than precision."

Problem 6: Archived and Forked Repos in Metrics

Archived repositories contain historical PRs with old cycle times that skew team averages. Forked repos may include upstream PRs that your team did not create.

Fix:

  • Only include repositories your team actively commits to
  • Remove archived repos from your analytics tool configuration
  • For forks, filter to only PRs authored by your team members
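Those three rules combine naturally into one filter. The repo and PR shapes and the `TEAM` set below are illustrative stand-ins, not a real API response.

```python
# Sketch: skip archived repos; in forks, keep only PRs authored by your team.
TEAM = {"alice", "bob"}  # illustrative team roster

def relevant_prs(repos):
    prs = []
    for repo in repos:
        if repo["archived"]:
            continue  # historical PRs would skew current averages
        for pr in repo["prs"]:
            if repo["fork"] and pr["author"] not in TEAM:
                continue  # upstream PRs your team did not create
            prs.append(pr)
    return prs

repos = [
    {"archived": True,  "fork": False, "prs": [{"author": "alice"}]},
    {"archived": False, "fork": True,  "prs": [{"author": "upstream-dev"}, {"author": "bob"}]},
    {"archived": False, "fork": False, "prs": [{"author": "alice"}]},
]
print(len(relevant_prs(repos)))  # 2
```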

Problem 7: Delayed or Missing Webhook Events

If your analytics rely on webhooks, delayed or dropped events create gaps in your data. GitHub webhooks are reliable in practice, but delivery is not guaranteed and events can arrive late or out of order.

Fix:

  • Use polling-based tools (like CodePulse, which syncs every 15 minutes via API) instead of webhook-only approaches
  • Check GitHub's webhook delivery log: Settings → Webhooks → Recent Deliveries
  • Implement webhook retry logic with exponential backoff
  • Run a daily reconciliation job that compares webhook data against API data
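The reconciliation job in the last bullet reduces to a set difference. The PR numbers below are invented stand-ins for what your webhook handler recorded versus what a fresh API listing returns.

```python
# Sketch: daily reconciliation of webhook-recorded PRs against the API.
webhook_prs = {101, 102, 104}        # PR numbers our webhook handler stored
api_prs = {101, 102, 103, 104, 105}  # PR numbers a fresh API listing returns

missing = api_prs - webhook_prs      # dropped webhook events to backfill
print(sorted(missing))               # [103, 105]
```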

Data Quality Checklist

Run through this checklist when setting up engineering analytics for the first time:

  • Bot filtering enabled? Check that Dependabot, Renovate, and custom bots are excluded.
  • Branch exclusions configured? Exclude PRs from main/master/release/staging branches.
  • Time range appropriate? Start with 90 days. Include historical backfill only after verifying current data quality.
  • Working hours configured? Decide whether to use calendar time or working hours for cycle time.
  • Archived repos excluded? Remove repos that are no longer actively developed.
  • Using median, not mean? Medians are resistant to outliers and give a more accurate picture of typical performance.

Getting Started

  1. Connect CodePulse with bot filtering enabled (the default).
  2. Review the initial sync data. If numbers look off, check the problems above in order.
  3. Configure branch exclusions and working hours in Settings.
  4. Compare a sample of 10 PRs manually against the tool's calculations to build confidence.

For more on data quality, see our data quality in engineering metrics guide and GitHub metrics guide.

Frequently Asked Questions

Why don't my numbers match GitHub Insights?

GitHub Insights counts all activity including bots, draft PRs, and self-merges. Engineering analytics tools like CodePulse filter bots by default, exclude draft PRs from cycle time calculations, and may use different time boundaries. The numbers will differ, and the analytics tool numbers are usually more useful because they reflect actual human engineering work.

See these insights for your team

CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.

Free tier available. No credit card required.