INCIDENT METRICS

MTTA vs MTTR: What Alert Fatigue Does to Each (Incident Response 2026)

Updated June 2026. Sources: PagerDuty 2023 State of Digital Operations, Atlassian incident management documentation, DORA 2024 State of DevOps, Google SRE Book Chapter 11.

Definitions That Often Get Conflated

MTTA (Mean Time To Acknowledge) measures how quickly an on-call engineer acknowledges a page in the pager tool. The clock starts when the page fires and stops when the engineer clicks the acknowledge button. MTTR (Mean Time To Resolve, sometimes Mean Time To Recover) measures the total incident duration, from the page firing to confirmed restoration of service health. The two metrics measure adjacent but distinct properties of incident response, and they are often conflated in practice because vendors and engineering teams alike apply inconsistent conventions about what each metric covers.
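As a minimal sketch of the two definitions above, the following computes both means from per-incident timestamps. The `Incident` record and its field names are hypothetical; real pager exports use their own field names, and the sample timestamps are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; real pager exports use different field names.
    paged_at: datetime         # page fires: both clocks start here
    acknowledged_at: datetime  # engineer clicks acknowledge: MTTA clock stops
    resolved_at: datetime      # service health confirmed restored: MTTR clock stops


def mean_minutes(durations: list[timedelta]) -> float:
    """Average a list of durations and express the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def mtta_minutes(incidents: list[Incident]) -> float:
    return mean_minutes([i.acknowledged_at - i.paged_at for i in incidents])


def mttr_minutes(incidents: list[Incident]) -> float:
    return mean_minutes([i.resolved_at - i.paged_at for i in incidents])


t0 = datetime(2026, 6, 1, 10, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=40)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=80)),
]
print(mtta_minutes(incidents))  # 4.0
print(mttr_minutes(incidents))  # 60.0
```

Both functions share the same start timestamp, which is what makes the difference between them interpretable as time spent after acknowledgment.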

PagerDuty and Atlassian both publish MTTA in their incident metric APIs, and both define it consistently as page-to-acknowledgment time. MTTR is more fragmented. PagerDuty stops the clock at acknowledgment of resolution. Atlassian Statuspage stops it at confirmed restored health, provided Statuspage is wired into the incident workflow. Datadog and other monitoring tools sometimes report MTTR as time-to-recovery of the underlying signal, a different measurement that does not include human investigation time. When comparing MTTR across organisations, confirm the definition first; comparisons across inconsistent definitions are not meaningful.
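To illustrate why the definition has to be confirmed first, here is a small sketch that applies three stop-the-clock conventions to one incident. The timeline and event names are invented for illustration and do not correspond to any vendor's actual field names.

```python
from datetime import datetime, timedelta

# Hypothetical timeline for a single incident.
page_fired = datetime(2026, 6, 1, 2, 0)
timeline = {
    "signal_recovered": page_fired + timedelta(minutes=35),  # monitored metric back in range
    "marked_resolved": page_fired + timedelta(minutes=50),   # responder resolves in the pager tool
    "health_confirmed": page_fired + timedelta(minutes=75),  # restored service health confirmed
}


def minutes_to(event: str) -> float:
    """Minutes from the page firing to the named event."""
    return (timeline[event] - page_fired).total_seconds() / 60


durations = {event: minutes_to(event) for event in timeline}
for event, minutes in durations.items():
    print(f"{event}: {minutes:.0f} min")
```

The same incident reports as 35, 50, or 75 minutes depending on which event stops the clock, so a cross-organisation comparison can differ by a factor of two without any difference in response quality.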

A third metric that often gets bundled with MTTA and MTTR is MTTD (Mean Time To Detect), measuring time from incident occurrence to the page firing. MTTD is determined by your monitoring instrumentation, not by your incident response capability; it is a separate diagnostic that points at monitoring gaps rather than response problems. Healthy MTTD for production-impacting incidents is under 5 minutes from incident onset to detection.

Healthy vs Noisy Thresholds

Metric + condition	Healthy	Median	Noisy
MTTA, business-hours paging	Under 3 min	5 to 10 min	Above 15 min
MTTA, after-hours paging	Under 5 min	8 to 15 min	Above 25 min
MTTA, night paging (22:00 to 06:00)	Under 8 min	12 to 20 min	Above 30 min
MTTR, Sev-1 (customer-impacting)	Under 30 min	60 to 180 min	Above 4 hours
MTTR, Sev-2 (partial impact)	Under 2 hours	4 to 12 hours	Above 1 day
MTTR, Sev-3 (operational)	Under 1 business day	2 to 5 days	Above 1 week

The degradation of night MTTA relative to business-hours MTTA is the most diagnostic signal of sleep-disruption-driven alert fatigue. A team with healthy business-hours MTTA (under 5 minutes) but night MTTA above 20 minutes is signalling that the night rotation is suppressing pager acknowledgment, either intentionally (engineers know that night pages are usually false positives and delay engagement) or unintentionally (sleep disruption is degrading response capability). Either way, the night MTTA gap is a stronger predictor of attrition risk than aggregate MTTA.
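One way to compute the night MTTA gap is to bucket pages by local firing time and average each bucket separately. The sketch below uses the 22:00 to 06:00 night window from the table above; the 09:00 to 18:00 weekday business-hours window and the sample pages are assumptions chosen for illustration.

```python
from datetime import datetime
from statistics import mean


def paging_window(paged_at: datetime) -> str:
    """Bucket a page by local time. Night is 22:00 to 06:00 as in the table;
    the 09:00 to 18:00 weekday business-hours window is an assumption."""
    hour = paged_at.hour
    if hour >= 22 or hour < 6:
        return "night"
    if paged_at.weekday() < 5 and 9 <= hour < 18:
        return "business"
    return "after_hours"


def mtta_by_window(pages):
    """pages: iterable of (paged_at, acknowledged_at) pairs.
    Returns mean acknowledgment time in minutes per window."""
    buckets = {}
    for paged_at, acked_at in pages:
        minutes = (acked_at - paged_at).total_seconds() / 60
        buckets.setdefault(paging_window(paged_at), []).append(minutes)
    return {window: mean(values) for window, values in buckets.items()}


pages = [
    (datetime(2026, 6, 1, 10, 0), datetime(2026, 6, 1, 10, 2)),
    (datetime(2026, 6, 1, 14, 0), datetime(2026, 6, 1, 14, 4)),
    (datetime(2026, 6, 2, 2, 0), datetime(2026, 6, 2, 2, 20)),
    (datetime(2026, 6, 3, 23, 30), datetime(2026, 6, 3, 23, 54)),
]
result = mtta_by_window(pages)
night_gap = result["night"] - result["business"]
print(result)     # {'business': 3.0, 'night': 22.0}
print(night_gap)  # 19.0
```

The aggregate MTTA for these four pages is 12.5 minutes, which hides both the healthy daytime figure and the degraded night figure.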

How Alert Fatigue Affects Each Independently

Alert fatigue degrades MTTA through learned delay: engineers under sustained alert noise learn that most pages are false positives, and they delay acknowledgment expecting the alert to self-resolve before they need to engage. This is rational behaviour in the face of noisy alerting; it is also the mechanism by which the real signal gets missed. Each individual delay is small (an engineer waits 30 seconds longer to check the page); aggregated across a noisy quarter, the MTTA degrades from 3 minutes to 12 minutes and stays there even after the underlying noise is reduced. The learned behaviour persists.

Alert fatigue degrades MTTR through cognitive overload rather than delay. An engineer who has been paged 8 times in their primary week, even if most pages auto-resolve, carries accumulated context-switching debt. When a real incident occurs, that engineer's ability to reason cleanly about the system is impaired by the prior context switches. The 23-minute refocus penalty attributed to Gloria Mark applies per interruption; an engineer recovering from many recent interruptions is operating below their baseline capability for system reasoning. Mitigation that would normally take 20 minutes takes 35 minutes; the extra 15 minutes is hard to attribute from the metric alone but real in cumulative impact.

Separating the two effects is critical for diagnostic purposes. A team that has good MTTA but bad MTTR likely has a runbook coverage or training problem: the response starts promptly but the mitigation is slow. A team that has bad MTTA but good MTTR likely has an alert fatigue or pager hygiene problem: when the team actually engages, they mitigate well, but they are slow to engage. A team where both metrics are bad has multiple compounding problems and should sequence fixes carefully. The MTTR-only view collapses these distinct failure modes into one number and produces misallocated fix investment.

Action by Metric Quadrant

Good MTTA, Good MTTR

Maintain. The system is operating well; focus engineering attention on preventing rather than responding. Invest in SLO-based alerting (read /slo-vs-threshold) to keep this state robust as the system grows.

Good MTTA, Bad MTTR

Runbook coverage and training gap. Engineers respond promptly but cannot mitigate quickly. Highest-leverage investments: runbook coverage on top-50 alert classes (read /runbooks-oncall), incident response training, possibly two-tier on-call (read /two-tier-on-call-cost) so deep-knowledge engineers escalate cleanly.

Bad MTTA, Good MTTR

Alert fatigue and pager hygiene problem. When the team engages they fix things well, but they delay engagement. Highest-leverage investments: alert audit (read /alert-tuning), correlation and dedup (read /correlation-dedup), reduce false-positive rate. The metric improves only after the noise reduces; the learned-delay behaviour persists for weeks after volume reduction.

Bad MTTA, Bad MTTR

Compounding failure. Both the engagement and the mitigation are impaired. Diagnose the dominant cause: if the team is small and noisy, alert hygiene first; if the team is large and uncoordinated, runbook coverage and rotation structure first. Sequence carefully; trying to fix both simultaneously often produces neither result.
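The four quadrants above can be sketched as a small classifier. The function name and the default cut-offs are assumptions: the defaults borrow the "Noisy" thresholds for business-hours MTTA (15 minutes) and Sev-1 MTTR (4 hours) and should be treated as placeholders to tune per team and severity.

```python
def diagnose(mtta_min: float, mttr_min: float,
             mtta_limit: float = 15.0, mttr_limit: float = 240.0) -> str:
    """Map an (MTTA, MTTR) pair, both in minutes, to one of the four quadrants.

    A metric counts as "bad" when it exceeds its limit. The default limits
    are placeholder assumptions, not fixed standards.
    """
    bad_mtta = mtta_min > mtta_limit
    bad_mttr = mttr_min > mttr_limit
    if bad_mtta and bad_mttr:
        return "compounding failure: find the dominant cause and sequence fixes"
    if bad_mtta:
        return "alert fatigue / pager hygiene: audit alerts, dedup, cut false positives"
    if bad_mttr:
        return "runbook and training gap: runbook coverage, incident response training"
    return "maintain: invest in prevention and SLO-based alerting"


print(diagnose(3, 45))    # maintain: ...
print(diagnose(4, 300))   # runbook and training gap: ...
print(diagnose(22, 60))   # alert fatigue / pager hygiene: ...
print(diagnose(25, 400))  # compounding failure: ...
```

In practice the inputs would be per-severity, per-window figures rather than a single global pair, since one set of limits cannot fit both Sev-1 and Sev-3 incidents.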

The Trap of MTTR-Only Dashboards

Many incident-management vendor dashboards default to MTTR as the headline metric and bury MTTA in a secondary view. The implicit assumption is that customers care about total incident duration, not about whether the engineer engaged promptly. This is correct from a customer-impact perspective and misleading from an operational-diagnosis perspective. The two metrics serve different decision-making purposes and both belong on the headline dashboard.

Without MTTA on the dashboard, two failure modes that have similar customer impact but different fixes cannot be told apart. Failure mode one: engineers acknowledge promptly but mitigation is slow (the fix is runbook coverage and training). Failure mode two: engineers acknowledge late but mitigation is fast (the fix is addressing alert fatigue and pager hygiene). Both produce MTTR in the 90-minute range; the metric on its own says "we have an MTTR problem" without telling you whether to invest in runbook authoring or in alert tuning. Investment in the wrong fix wastes engineering attention and does not move the metric.

Configure your dashboard to show MTTA, MTTR, and the gap between them on the same view. Trend each over the same time window. When MTTA and MTTR move together, you have aligned signal: improvements are real, regressions are real. When MTTA and MTTR diverge, you have diagnostic signal: the failure mode is shifting and the fix priority should shift accordingly. This is cheap dashboard work that produces consistently better diagnostic decisions over time.
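A rough sketch of the "move together versus diverge" check follows. It compares the direction of change in each metric between consecutive periods; the function name, the strict sign comparison, and the weekly figures are all assumptions for illustration, and a real dashboard would likely want a tolerance band so that small fluctuations do not register as movement.

```python
def trend_signal(mtta: list[float], mttr: list[float]) -> list[str]:
    """Label each period-over-period step as 'aligned' (both metrics move the
    same way), 'diverging' (opposite directions), or 'flat' (either unchanged)."""
    labels = []
    for i in range(1, len(mtta)):
        d_mtta = mtta[i] - mtta[i - 1]
        d_mttr = mttr[i] - mttr[i - 1]
        if d_mtta == 0 or d_mttr == 0:
            labels.append("flat")
        elif (d_mtta > 0) == (d_mttr > 0):
            labels.append("aligned")
        else:
            labels.append("diverging")
    return labels


weekly_mtta = [4.0, 5.0, 9.0, 12.0]     # minutes, invented sample data
weekly_mttr = [70.0, 75.0, 72.0, 66.0]  # minutes, invented sample data

# Gap between the two metrics: time spent after acknowledgment.
gap = [r - a for a, r in zip(weekly_mtta, weekly_mttr)]

print(trend_signal(weekly_mtta, weekly_mttr))  # ['aligned', 'diverging', 'diverging']
print(gap)                                     # [66.0, 70.0, 63.0, 54.0]
```

In this sample, MTTR improves over the last two weeks while MTTA triples, which an MTTR-only view would report as good news.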

Frequently Asked Questions

What is MTTA?

MTTA (Mean Time To Acknowledge) is the average time from page firing to an on-call engineer acknowledging the page in the pager tool. PagerDuty and Atlassian both publish MTTA in their incident metric APIs. Healthy MTTA for paging-priority alerts is under 5 minutes for daytime pages and under 8 minutes for night pages. Industry median per PagerDuty 2023 is 8 to 15 minutes.

What is MTTR?

MTTR (Mean Time To Resolve, sometimes Recover) is the average time from page firing to incident resolution. Definitions vary slightly across vendors: some measure to acknowledgment of resolution, others to confirmed restored service health. Healthy MTTR for Sev-1 incidents is under 60 minutes. Industry median varies widely by incident severity and complexity; the DORA 2024 elite-performer benchmark is under one hour for production-impacting incidents.

How does alert fatigue affect MTTA differently from MTTR?

Alert fatigue degrades MTTA by training engineers to delay acknowledgment, expecting the alert to self-resolve before they need to engage. It degrades MTTR through a different mechanism: cognitive overload from accumulated alert noise reduces the operator's ability to mitigate effectively. The two effects compound: slow acknowledgment plus impaired mitigation produces both longer customer impact and worse incident outcomes. Some teams have good MTTR (incidents get fixed eventually) and bad MTTA (incidents take ages to start being addressed), which masks the real problem.

What is the MTTR-only dashboard trap?

When a team measures only MTTR and not MTTA, they cannot distinguish between two failure modes that have similar customer impact but different fixes. Failure mode one: engineers acknowledge promptly but mitigation is slow (fix: improve runbooks, training, tooling). Failure mode two: engineers acknowledge late but mitigation is fast (fix: address alert fatigue, paging hygiene). Both produce similar MTTR; only MTTA separates them. Teams that measure both diagnose correctly; teams that measure only MTTR often misallocate fix investment.

What are healthy MTTA and MTTR targets?

Healthy daytime MTTA for paging-priority alerts: under 5 minutes. Healthy night MTTA: under 8 minutes. Healthy MTTR for Sev-1 production-impacting incidents: under 60 minutes per DORA elite-performer benchmark. Healthy MTTR for Sev-2: under 4 hours. Healthy MTTR for Sev-3: under one business day. Targets should be set per incident severity, not globally; setting a single MTTR target produces perverse incentives to classify everything as low-severity to game the number.
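A per-severity check along these lines might look like the sketch below. The target values are the ones stated in this answer; treating "one business day" as 8 working hours is an assumption, as are the severity keys, the function name, and the sample incidents.

```python
from statistics import mean

# Targets in minutes, taken from the answer above. "One business day" is
# approximated as 8 working hours, which is an assumption.
MTTR_TARGET_MIN = {"sev1": 60, "sev2": 4 * 60, "sev3": 8 * 60}


def mttr_report(incidents):
    """incidents: iterable of (severity, minutes_to_resolve) pairs.
    Returns {severity: (mttr_minutes, within_target)} for each severity seen."""
    by_severity = {}
    for severity, minutes in incidents:
        by_severity.setdefault(severity, []).append(minutes)
    return {
        severity: (mean(values), mean(values) <= MTTR_TARGET_MIN[severity])
        for severity, values in by_severity.items()
    }


incidents = [("sev1", 45), ("sev1", 95), ("sev2", 120), ("sev3", 300)]
report = mttr_report(incidents)
for severity, (mttr, ok) in report.items():
    print(f"{severity}: {mttr:.0f} min, within target: {ok}")
```

Here the Sev-1 mean of 70 minutes misses its target while Sev-2 and Sev-3 pass; a single global mean of 140 minutes would not reveal which severity is responsible.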

Should MTTA include weekend and night pages?

Yes, but report them separately. A single MTTA number that mixes daytime and night-time pages obscures the night-time degradation that is the largest sleep-disruption driver. Most mature operations report three MTTA metrics: business-hours MTTA, after-hours MTTA, and night MTTA (typically 22:00 to 06:00 local time). The night MTTA is the leading indicator of sleep disruption and rotation burnout; track it explicitly.

What is the relationship between MTTA, MTTR, and SLO error budgets?

Indirect but real. SLO error budgets measure user-visible failure rate; MTTA and MTTR measure operational response. A system can have great SLO adherence (low failure rate) and poor MTTR (when failures happen they take ages to fix), or vice versa. The healthy state is both: low SLO breach rate and short MTTR when breaches occur. Read /slo-vs-threshold for the SLO mechanics that complement MTTA and MTTR measurement.