
RELIABILITY METRICS

Alert Fatigue and MTTR: How Noise Burns Your Error Budget

Updated May 2026 | Sources: DORA 2024 Accelerate State of DevOps, Honeycomb 2024 Observability Maturity Survey, Atlassian MTTR Benchmarks

What MTTA and MTTR Mean

MTTA

Mean Time To Acknowledge

How long it takes an on-call engineer to confirm they are investigating an alert. Industry median: 8-15 minutes. Healthy: under 5 minutes. Alert fatigue degrades MTTA first.

MTTR

Mean Time To Restore

Total time from incident detection to service restoration. DORA 2024 elite: under 1 hour. Low performers: over 1 week. Alert fatigue is one of the top three contributors to long MTTR.

MTTD

Mean Time To Detect

How long from an incident starting to its first alert triggering. Alert fatigue indirectly degrades MTTD: fatigued teams silence noisy monitors, reducing coverage.

Error budget

SLO-based allowed downtime

99.9% SLO = about 43.8 minutes/month; 99.95% = about 21.9 minutes/month (averaged over a 30.44-day month; a 30-day window gives 43.2 and 21.6 minutes). Alert fatigue burns error budgets faster by widening the detection and resolution gaps.

How Noise Degrades MTTA and MTTR

Phase 1: False-positive normalisation

As the false-positive ratio rises above 60%, engineers start pattern-matching alerts against past false positives before investigating. This adds 2-5 minutes to MTTA per page while they mentally categorise the alert before acting. At a false-positive rate of 80% or more, engineers start acknowledging pages without investigating until a second page or a customer report arrives.
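A minimal sketch of measuring where a rotation sits on this scale, assuming each page has been labelled afterwards as actionable or not (a labelling step this page does not prescribe). The 60% and 80% bands come from the text; the function names are illustrative.

```python
def false_positive_ratio(actionable: list[bool]) -> float:
    """Share of pages that needed no human action.

    actionable[i] is True if page i turned out to need intervention.
    """
    if not actionable:
        return 0.0
    return actionable.count(False) / len(actionable)


def fatigue_phase(fp_ratio: float) -> str:
    """Map a false-positive ratio onto the Phase 1 bands described above."""
    if fp_ratio >= 0.8:
        return "acknowledging without investigating"
    if fp_ratio > 0.6:
        return "pattern-matching before acting (+2-5 min MTTA per page)"
    return "below the normalisation threshold"


# Example: 15 of the last 20 pages needed no action.
history = [False] * 15 + [True] * 5
ratio = false_positive_ratio(history)
print(ratio, fatigue_phase(ratio))
```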

Phase 2: Night page desensitisation

After 6+ months of high page volume, engineers start acknowledging night pages 10-30 minutes late or missing them entirely. The incident.io 2024 survey found that 62% of respondents report weekly sleep disruption; a common response is to silence the phone between 11pm and 6am. Genuine P1s that fire in this window can then go unacknowledged for hours.
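One way to check for this pattern in your own data is to compare the median acknowledgement delay for pages fired inside the 11pm-6am window with that for daytime pages. The sketch assumes pages are available as (fired_at, acked_at) pairs in the on-call engineer's local time; the function names are hypothetical.

```python
from datetime import datetime, time
from statistics import median

# Night window from the text: 11pm to 6am local time.
NIGHT_START = time(23, 0)
NIGHT_END = time(6, 0)


def is_night(ts: datetime) -> bool:
    t = ts.time()
    # The window wraps past midnight, so it is "after start OR before end".
    return t >= NIGHT_START or t < NIGHT_END


def median_ack_minutes(pages: list[tuple[datetime, datetime]]) -> dict[str, float]:
    """Median minutes from firing to acknowledgement, split into night and day."""
    buckets: dict[str, list[float]] = {"night": [], "day": []}
    for fired, acked in pages:
        key = "night" if is_night(fired) else "day"
        buckets[key].append((acked - fired).total_seconds() / 60)
    # Skip a bucket with no pages rather than failing on an empty median.
    return {key: median(vals) for key, vals in buckets.items() if vals}


d = datetime(2026, 5, 4)
pages = [
    (d.replace(hour=10), d.replace(hour=10, minute=5)),
    (d.replace(hour=14), d.replace(hour=14, minute=3)),
    (d.replace(hour=2), d.replace(hour=2, minute=25)),
    (d.replace(hour=23, minute=30), d.replace(hour=23, minute=50)),
]
print(median_ack_minutes(pages))  # {'night': 22.5, 'day': 4.0}
```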

Phase 3: Signal degradation

Fatigued engineers juggling many concurrent alerts miss the correlating signals that point to a common root cause, so each alert is investigated independently. MTTR degrades because root-cause identification takes longer when the engineer cannot see the pattern across 15 simultaneous pages.
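A rough signal that Phase 3 conditions exist is the peak number of pages firing close together. This sketch finds the largest number of pages inside any sliding window using two pointers over sorted timestamps; the 5-minute window is an assumed default, not a figure from the sources above.

```python
from datetime import datetime, timedelta


def peak_concurrency(fired_at: list[datetime],
                     window: timedelta = timedelta(minutes=5)) -> int:
    """Largest number of pages that fired within any one sliding window."""
    times = sorted(fired_at)
    best = 0
    left = 0
    for right, t in enumerate(times):
        # Advance the left edge until the window spans at most `window`.
        while t - times[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best


# Example: a storm of 15 pages ten seconds apart, then one unrelated page.
start = datetime(2026, 5, 4, 3, 0)
storm = [start + timedelta(seconds=10 * i) for i in range(15)]
print(peak_concurrency(storm + [start + timedelta(hours=1)]))  # 15
```

A high peak suggests the pages probably share a root cause and are better investigated as one incident.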

Phase 4: Post-mortem erosion

Fatigued teams skip post-mortems or conduct cursory ones. Root causes recur. Without a post-mortem culture, alert rules are never updated based on incident learnings, the false-positive ratio continues to climb, and MTTR degrades with each cycle.

DORA 2024 MTTR Benchmarks

The DORA State of DevOps 2024 report classifies organisations into four performance tiers. On MTTR, elite performers now recover more than 2,100x faster than low performers.

Tier	MTTR target	Alert discipline typically observed	SLO adoption
Elite	< 1 hour	SLO-based alerting, weekly audits, correlation enabled, runbooks for every P1	High
High	< 1 day	Partial SLO adoption, severity tiers defined, deduplication enabled	Medium
Medium	1 day - 1 week	Threshold alerting, some runbooks, ad-hoc audits	Low
Low	> 1 week	All-threshold alerting, no runbooks, alert debt accumulating	None

Source: DORA 2024 Accelerate State of DevOps Report (dora.dev). Note: DORA does not directly measure alert discipline; the correlation is inferred from MTTR and incident management maturity dimensions.
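As a rough self-check, an MTTR figure can be mapped onto the bands in the table. DORA derives its tiers from cluster analysis across several metrics, so this single-metric lookup is only an approximation, and the inclusive/exclusive boundaries chosen here are assumptions.

```python
def mttr_tier(mttr_hours: float) -> str:
    """Rough DORA-style tier for an MTTR, using the bands in the table above."""
    if mttr_hours < 1:
        return "Elite"
    if mttr_hours < 24:
        return "High"
    if mttr_hours <= 24 * 7:  # boundary handling here is an assumption
        return "Medium"
    return "Low"


for hours in (0.5, 6, 48, 200):
    print(hours, mttr_tier(hours))
```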

Tuning Interventions Ranked by MTTA/MTTR Lift

01. Alert correlation and deduplication

Eliminates duplicate investigation across correlated alerts.

Impact: MTTA -40%, MTTR -20%
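A simplified sketch of both steps: deduplication suppresses repeats of the same alert that re-fire within a window, and correlation groups what remains so one engineer investigates each group. The Alert fields, the (name, service) fingerprint and grouping by service are illustrative assumptions; production tools typically correlate on richer signals such as topology or shared labels.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    name: str          # alert rule name (illustrative)
    service: str       # assumed correlation key
    fired_at: datetime


def deduplicate(alerts: list[Alert], window: timedelta) -> list[Alert]:
    """Drop repeats of the same (name, service) pair that re-fire within `window`.

    The window restarts on every repeat, so a continuously flapping alert
    stays suppressed until it has been quiet for a full window.
    """
    last_seen: dict[tuple[str, str], datetime] = {}
    kept: list[Alert] = []
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        key = (alert.name, alert.service)
        prev = last_seen.get(key)
        if prev is None or alert.fired_at - prev > window:
            kept.append(alert)
        last_seen[key] = alert.fired_at
    return kept


def correlate(alerts: list[Alert]) -> dict[str, list[Alert]]:
    """Group alerts by service so each group becomes one investigation."""
    groups: dict[str, list[Alert]] = defaultdict(list)
    for alert in alerts:
        groups[alert.service].append(alert)
    return dict(groups)


t0 = datetime(2026, 5, 4, 3, 0)
raw = [
    Alert("HighLatency", "checkout", t0),
    Alert("HighLatency", "checkout", t0 + timedelta(minutes=1)),
    Alert("ErrorRate", "checkout", t0 + timedelta(minutes=1)),
    Alert("HighLatency", "checkout", t0 + timedelta(minutes=2)),
    Alert("DiskFull", "db", t0 + timedelta(minutes=3)),
]
incidents = correlate(deduplicate(raw, timedelta(minutes=5)))
print({svc: len(a) for svc, a in incidents.items()})  # {'checkout': 2, 'db': 1}
```

In this example five raw pages become two investigations, which is the mechanism behind the MTTA gain.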

02. SLO-based alerting (burn rate)

Reduces false positives dramatically; engineers trust alerts more.

Impact: MTTA -15%, MTTR -25%
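Burn rate is the observed error ratio divided by the error budget ratio (1 - SLO); a burn rate of 1 spends the budget exactly over the SLO window. The sketch below follows the multiwindow, multi-burn-rate pattern described in the Google SRE Workbook for a 30-day window: page only when both a long and a short window exceed the threshold (14.4x over 1h and 5m, or 6x over 6h and 30m). The short window stops the page from firing after the errors have already stopped. The function names and the input dictionary shape are illustrative.

```python
SLO = 0.999  # example target; the error budget ratio is 1 - SLO

# (long window, short window, burn-rate threshold) for a 30-day SLO window,
# following the multiwindow pattern in the Google SRE Workbook.
PAGE_RULES = [("1h", "5m", 14.4), ("6h", "30m", 6.0)]


def burn_rate(error_ratio: float, slo: float = SLO) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    return error_ratio / (1 - slo)


def should_page(error_ratios: dict[str, float], slo: float = SLO) -> bool:
    """Page if, for any rule, both its long and short windows exceed the threshold.

    error_ratios maps a window label (e.g. "1h") to the observed ratio of
    failed requests to total requests over that window.
    """
    return any(
        burn_rate(error_ratios[long_w], slo) >= threshold
        and burn_rate(error_ratios[short_w], slo) >= threshold
        for long_w, short_w, threshold in PAGE_RULES
    )


# 2% errors over the last hour and still 3% right now: fast burn, page.
print(should_page({"1h": 0.02, "5m": 0.03, "6h": 0.004, "30m": 0.002}))  # True
# Same bad hour, but the last 5 minutes are clean: errors have stopped, no page.
print(should_page({"1h": 0.02, "5m": 0.0005, "6h": 0.003, "30m": 0.001}))  # False
```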

03. Runbooks for every P1/P2 alert

Faster investigation and resolution once acknowledged.

Impact: MTTA -10%, MTTR -30%
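A small lint can enforce the every-P1/P2-has-a-runbook rule in CI. The rule shape below loosely mirrors Prometheus alerting rules (alert, labels, annotations) with a runbook_url annotation, a widely used convention; the P1/P2 severity label values and the example rules are assumptions.

```python
def missing_runbooks(rules: list[dict]) -> list[str]:
    """Names of P1/P2 alert rules that have no runbook_url annotation."""
    return [
        rule["alert"]
        for rule in rules
        if rule.get("labels", {}).get("severity") in {"P1", "P2"}
        and not rule.get("annotations", {}).get("runbook_url")
    ]


rules = [
    {"alert": "CheckoutErrorBudgetBurn", "labels": {"severity": "P1"},
     "annotations": {"runbook_url": "https://runbooks.example.com/checkout-burn"}},
    {"alert": "DbReplicaLag", "labels": {"severity": "P2"}, "annotations": {}},
    {"alert": "PodRestarted", "labels": {"severity": "P3"}},
]

missing = missing_runbooks(rules)
if missing:
    # A CI job could exit non-zero at this point to block the merge.
    print("P1/P2 alerts without runbooks:", ", ".join(missing))
```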

04. Severity tiering (P1/P2/P3)

Concentrating pages on genuinely urgent issues improves focus.

Impact: MTTA -25% (P1s), MTTR -15%
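Severity tiering is ultimately a routing policy. The mapping below is a hypothetical example of one such policy, not a recommendation from the sources above: only P1 pages immediately, P2 pages during business hours and P3 becomes a ticket.

```python
from enum import Enum


class Route(Enum):
    PAGE_NOW = "page on-call immediately, any hour"
    PAGE_BUSINESS_HOURS = "page during business hours only"
    TICKET = "open a ticket, no page"


# Hypothetical policy: only P1 is allowed to wake someone up.
ROUTING = {"P1": Route.PAGE_NOW, "P2": Route.PAGE_BUSINESS_HOURS, "P3": Route.TICKET}


def route(severity: str) -> Route:
    # Unknown or missing severities become tickets rather than pages,
    # so a mislabelled rule cannot add to night-time noise.
    return ROUTING.get(severity, Route.TICKET)


week = ["P3", "P1", "P2", "P3", "P3", "critical", "P2"]
night_capable = sum(route(s) is Route.PAGE_NOW for s in week)
print(f"{night_capable} of {len(week)} alerts can page at night")
```

Falling back to a ticket is a deliberate design choice here: an unclassified rule costs a ticket, not someone's sleep.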

05. Weekly alert audit and noise culling

Slow-burn improvement as the false-positive ratio decreases over time.

Impact: MTTA -10% per month, MTTR -10% per month
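Two pieces of the weekly audit can be sketched: flagging the noisiest rules for culling, and projecting what a steady 10% monthly reduction means over time. The per-rule counts, the 50% false-positive cut-off and the 5-fire minimum are hypothetical, and the projection assumes the 10% compounds month on month rather than being subtracted linearly.

```python
def cull_candidates(stats: dict[str, tuple[int, int]],
                    max_fp_ratio: float = 0.5, min_fires: int = 5) -> list[str]:
    """Rules that fired often enough to judge and were mostly non-actionable.

    stats maps rule name -> (times fired, times it needed action).
    """
    return sorted(
        name
        for name, (fired, actionable) in stats.items()
        # max(..., 1) guards against dividing by zero for rules that never fired.
        if fired >= max(min_fires, 1) and (fired - actionable) / fired > max_fp_ratio
    )


def project(baseline: float, monthly_reduction: float, months: int) -> list[float]:
    """Apply the same percentage reduction each month (compounding assumption)."""
    values = [baseline]
    for _ in range(months):
        values.append(values[-1] * (1 - monthly_reduction))
    return values


week_stats = {"DiskFull": (40, 2), "HighLatency": (10, 8), "PodRestart": (3, 0)}
print(cull_candidates(week_stats))  # ['DiskFull']
print(round(project(120, 0.10, 6)[-1], 1))  # 63.8
```

Under the compounding assumption, a 120-minute MTTR falls to roughly 64 minutes after six months of audits.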

FAQ

How does alert fatigue affect MTTR?

Alert fatigue degrades MTTR through two mechanisms: (1) slower MTTA, as engineers hesitate over pages they assume are false positives, and (2) weaker investigation, as fatigued engineers miss correlating signals. Elite DORA 2024 performers (MTTR < 1 hour) are associated with lower page volumes and higher SLO adoption than low performers (MTTR > 1 week), although DORA does not measure page volume directly; the link is inferred from its MTTR and incident management maturity data.

What is an error budget and how does alert fatigue affect it?

An error budget is the maximum allowable downtime or error rate within an SLO. A 99.9% SLO has a monthly error budget of about 43.8 minutes (43.2 minutes over a 30-day window). Alert fatigue burns error budgets faster because (1) missed alerts mean incidents are detected later and last longer, and (2) fatigued teams skip post-mortems and return-to-green verification, allowing root causes to recur.
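The error-budget arithmetic is (1 - SLO) multiplied by the window length. The sketch below computes the budget in minutes and the fraction of it consumed by a full outage; the 30-day default window is an assumption, and the budget grows slightly for longer calendar months.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Full-outage minutes the SLO allows in the window: (1 - SLO) x window."""
    return (1 - slo) * window_days * MINUTES_PER_DAY


def budget_consumed(outage_minutes: float, slo: float, window_days: float = 30) -> float:
    """Fraction of the window's error budget used by a full outage."""
    return outage_minutes / error_budget_minutes(slo, window_days)


print(round(error_budget_minutes(0.999), 1))          # 43.2 (30-day window)
print(round(error_budget_minutes(0.9995, 30.44), 1))  # 21.9 (average month)
print(f"{budget_consumed(20, 0.999):.0%}")            # 46%
```

For a 99.9% SLO over 30 days, a single 20-minute outage (for example, one where a fatigued on-call engineer took 15 minutes to acknowledge) uses about 46% of the month's budget.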

What DORA metrics are affected by alert fatigue?

Alert fatigue primarily affects MTTR (mean time to restore service) and Change Failure Rate. Elite DORA performers restore service in under 1 hour; low performers take over a week. The key differentiators DORA 2024 identifies include SLO adoption, incident management maturity, and on-call team size relative to service count.

What is the difference between MTTA and MTTR?

MTTA (Mean Time To Acknowledge) measures how long it takes an on-call engineer to confirm they have seen and are investigating an alert. MTTR (Mean Time To Restore) measures the total time from incident detection to service restoration. Alert fatigue typically degrades MTTA first (slower acknowledgement), and that delay carries straight into a longer MTTR because investigation starts later.