The Real Cost of a One-Hour Outage

When small-business founders think about the cost of downtime, they do the intuitive math: hourly revenue × duration.

"We do $2,000/hour. We were down for an hour. That cost $2,000."

That number is easy to tolerate; it's roughly a rounding error on a month's P&L. This is why so many SMBs underinvest in reliability: the immediate math doesn't justify spending more than they already do.

The problem is that the math is wrong. Not arithmetically; the multiplication is correct. But "direct revenue loss during the outage" is the lower bound of what an outage actually costs, not a reasonable estimate of it. For most businesses, the real cost is 3–10x larger. ITIC's 2024 survey of over 1,000 firms found that 90%+ of mid-to-large enterprises put the average hourly cost of downtime above $300,000, and 41% put it between $1 million and $5 million^[1]. Numbers like these only make sense if they include far more than direct revenue loss.

What the revenue-loss number leaves out

Five categories of cost are routinely missed:

Delayed-purchase recovery isn't 100%. It's tempting to assume that a customer who wanted to buy at 3 PM will simply come back and buy at 4 PM. Some will. Many won't. The more friction-tolerant the customer (enterprise B2B), the more likely they are to wait; the more impulse-driven (consumer ecommerce), the less.

Support costs spike for days. Tickets generated by the outage ("is your site broken?", "my transaction didn't complete", "I got charged but didn't get a confirmation") don't all arrive during the outage itself. They arrive over the following three to five days, as customers notice the effect, come back to check, or see the charge on their statements.

Engineering time is consumed. Someone has to diagnose the cause, write the post-mortem, implement the fix, and roll it out. For a one-hour outage, this typically adds up to 4–12 hours of engineering time in the following week. That time is real: the features that didn't get built while that engineer was on incident recovery are a cost.

Trust compounds. The first outage is a story. The second one is a pattern. The third is "are these guys going to be around in a year?" This effect doesn't show up on the balance sheet until it's catastrophic, at which point it's irreversible.

Enterprise sales impact. If you sell to businesses, your reliability story is a procurement question. One outage is forgivable. Several are a diligence issue. A prospect passing on you — or negotiating harder because of a reliability concern — is an outage cost nobody tracks, because it's invisible.

A better estimate

Here's a more honest framing:

Direct revenue loss

Hourly revenue × duration. This is your lower bound.

Recovery rate

What fraction of the demand lost during the outage actually comes back? Typically 50–80% for transactional SMBs, and higher for subscription and B2B businesses.

Support overhead

Tickets generated × average resolution time × loaded hourly support cost. Typically 2–5% of affected customers open a ticket.

Engineering recovery

Person-hours × loaded hourly engineering cost, including the week-long tail of post-mortem and fix work.

Reputational tail

Hardest to quantify. A reasonable proxy: number of affected customers × the small fraction (0.5–2%) who churn within the next 90 days × your typical customer LTV.

Run this exercise for a hypothetical one-hour outage at your business. The total is almost always a multiple of the direct revenue number, and 5–10x is not uncommon.
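To make the exercise concrete, here is a minimal Python sketch of the components above. It assumes the recovery rate is netted against the direct revenue figure (only demand that never comes back counts as lost revenue) before the other categories are added on top. The field names and example inputs are hypothetical; the rates are picked from inside the ranges given above.

```python
from dataclasses import dataclass


@dataclass
class OutageInputs:
    """Inputs for a single outage. All example values are hypothetical."""
    hourly_revenue: float             # normal revenue per hour
    duration_hours: float             # length of the outage
    recovery_rate: float              # share of lost demand that comes back later (0..1)
    affected_customers: int           # customers who hit the outage
    ticket_rate: float                # share of affected customers who open a ticket
    hours_per_ticket: float           # average resolution time per ticket
    support_cost_per_hour: float      # loaded hourly support cost
    engineering_hours: float          # diagnosis, post-mortem, fix, rollout
    engineering_cost_per_hour: float  # loaded hourly engineering cost
    churn_rate: float                 # share of affected customers lost within 90 days
    customer_ltv: float               # typical customer lifetime value


def estimate_outage_cost(o: OutageInputs) -> dict[str, float]:
    direct = o.hourly_revenue * o.duration_hours
    # Assumption: recovered demand is not a loss, so only the remainder counts.
    unrecovered = direct * (1 - o.recovery_rate)
    support = (o.affected_customers * o.ticket_rate
               * o.hours_per_ticket * o.support_cost_per_hour)
    engineering = o.engineering_hours * o.engineering_cost_per_hour
    reputational = o.affected_customers * o.churn_rate * o.customer_ltv
    total = unrecovered + support + engineering + reputational
    return {
        "direct": direct,
        "unrecovered_revenue": unrecovered,
        "support": support,
        "engineering": engineering,
        "reputational": reputational,
        "total": total,
        "multiple_of_direct": total / direct,
    }


example = OutageInputs(
    hourly_revenue=2000, duration_hours=1, recovery_rate=0.7,
    affected_customers=400, ticket_rate=0.03, hours_per_ticket=0.5,
    support_cost_per_hour=40, engineering_hours=8,
    engineering_cost_per_hour=100, churn_rate=0.01, customer_ltv=1500,
)
result = estimate_outage_cost(example)
print(f"total ${result['total']:,.0f}, {result['multiple_of_direct']:.1f}x direct")
# total $7,640, 3.8x direct
```

In this made-up case the reputational tail dominates: four churned customers at a $1,500 LTV outweigh everything else combined, which is why that line deserves a rough estimate even though it is the hardest to pin down.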

The cost scales with your customer profile

Different businesses have different shapes of outage cost:

Consumer ecommerce. High impulse-purchase rate means low recovery. Revenue loss is closer to real loss, but support cost is high because many low-value customers complain via many channels.

Transactional B2B. High recovery rate (customers are patient and will come back), but trust damage is outsized. A single outage during an enterprise sales cycle can cost a deal worth orders of magnitude more than the hourly revenue number.

Subscription SaaS. Direct revenue loss is approximately zero (you already billed). But churn in the 30–90 days following an outage can spike, and honest customer interviews can attribute it directly to the outage. One bad outage can add a full percentage point to monthly churn on top of baseline.
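To see what "a full percentage point" can mean in dollars, here is a rough sketch. It assumes the extra point of churn applies for a single month and uses the common simplification that a subscriber's expected remaining lifetime is 1 / baseline monthly churn. The subscriber count, price, and baseline churn are invented inputs.

```python
def churn_spike_cost(subscribers: int, monthly_revenue_per_sub: float,
                     baseline_monthly_churn: float,
                     extra_churn_points: float) -> float:
    """Future revenue lost when an outage adds `extra_churn_points`
    percentage points of churn for one month (simplified model)."""
    extra_cancellations = subscribers * extra_churn_points / 100
    # Simplification: average remaining lifetime = 1 / monthly churn rate.
    expected_lifetime_months = 1 / baseline_monthly_churn
    return extra_cancellations * monthly_revenue_per_sub * expected_lifetime_months


# Hypothetical: 2,000 subscribers at $50/month, 3% baseline churn, +1 point.
print(round(churn_spike_cost(2000, 50, 0.03, 1.0)))
# 33333
```

That base bills $100,000 a month, so in this example a one-point spike costs roughly a third of a month's billing, even though the outage itself cost almost nothing in direct revenue.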

Marketplace. Compounds badly. Buyers can't buy, sellers can't sell, and both sides may decide to go elsewhere. Trust effect is bidirectional.

In all of these, vendor risk matters: if your outage was caused by a vendor, customers don't care whose fault it was. The cost is still yours.

The framing that changes decisions

The reason the revenue-only math leads to underinvestment is that it makes outages look cheap. "We have an outage once a quarter, costs us maybe $2k each, that's $8k a year. Why would I spend more than $8k on monitoring?"

The same decision with the real math: "Our actual cost per outage is $10k–20k, we're having four a year, that's $40k–80k in impact, and the second-order effects include enterprise deal concerns. A $500/month monitoring stack that halves our outage count is a no-brainer."

Same situation, different frame, different decision.
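The two framings can be run side by side with a small payback calculation. This sketch assumes, as the quote does, that the monitoring stack halves the outage count and changes nothing else, and treats $500/month as $6,000/year. The function name is illustrative.

```python
def net_annual_benefit(outages_per_year: float, cost_per_outage: float,
                       reduction: float, monitoring_cost_per_year: float) -> float:
    """Avoided outage cost minus monitoring spend, assuming monitoring
    cuts the outage count by `reduction` (0..1) and nothing else."""
    avoided = outages_per_year * reduction * cost_per_outage
    return avoided - monitoring_cost_per_year


MONITORING_PER_YEAR = 500 * 12  # $500/month stack

print(net_annual_benefit(4, 2_000, 0.5, MONITORING_PER_YEAR))   # -2000.0 (revenue-only frame)
print(net_annual_benefit(4, 10_000, 0.5, MONITORING_PER_YEAR))  # 14000.0 (low end of real cost)
print(net_annual_benefit(4, 20_000, 0.5, MONITORING_PER_YEAR))  # 34000.0 (high end of real cost)
```

Under the revenue-only frame the stack looks like a $2,000 annual loss; with the fuller per-outage cost, the same spend returns more than twice its price.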

This is much of the reason SMBs tend to set up monitoring last: the cost of outages is felt over time and across categories, while the cost of monitoring is felt as a line item today. Without explicit math, the line item loses.

What published post-mortems tell us

The public record helps calibrate the real cost. A few well-documented outages worth reading if you haven't:

AWS S3 us-east-1, February 2017. A mistyped command during debugging removed more servers than intended, taking S3 offline for about four hours and cascading into other AWS services in the region that depended on it^[2]. The customer impact, from major SaaS platforms going dark to a generation of "we should stop putting everything in one region" post-mortems, vastly exceeded any direct AWS revenue loss.
Cloudflare, July 2019. A single regex rule with catastrophic backtracking spiked CPU across the fleet and took down Cloudflare-fronted sites for 27 minutes^[3]. Cloudflare fronts roughly 10% of internet traffic. The direct Cloudflare revenue impact is a fraction of the ripple-effect cost to their customers.
GitHub, October 21, 2018. A 43-second network partition triggered a database failover that took 24+ hours to fully reconcile^[4]. The outage itself was short; the consequences were long.

Each of these illustrates the same pattern: the direct outage cost was a small fraction of the total business impact. Engineering teams rebuilt things, customers lost trust, post-mortems consumed weeks, and follow-on architectural changes consumed months.

The outages nobody tracks

Beyond the visible outage, there are the ones that barely register:

A partial outage that affected 10% of users for 40 minutes
A degraded performance incident that caused checkout timeouts without full failure
A third-party outage that made one feature unusable
An incident that was caught by engineering before customers noticed, but took half a day to mitigate

These don't show up on the hourly-revenue calculation at all. They show up in the cumulative trust erosion and the cumulative engineering drag. The pillar on boring IT ops makes the case that most SMBs have more of these than they track, and that they add up to more total cost than the visible outages do.
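One way to keep these from disappearing is to log every incident, partial or not, and tally two numbers: user exposure (minutes × share of users affected, a "full-outage-equivalent" figure) and engineering hours spent. This is a hypothetical bookkeeping scheme rather than a standard metric, and the incidents below are invented to mirror the list above.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    description: str
    minutes: float            # how long the incident lasted
    fraction_affected: float  # share of users impacted, 0..1
    engineering_hours: float  # time spent diagnosing and mitigating


def tally(incidents: list[Incident]) -> tuple[float, float]:
    """Return (full-outage-equivalent minutes, total engineering hours)."""
    exposure = sum(i.minutes * i.fraction_affected for i in incidents)
    engineering = sum(i.engineering_hours for i in incidents)
    return exposure, engineering


quarter = [
    Incident("Partial outage, 10% of users", 40, 0.10, 3),
    Incident("Checkout timeouts under load", 90, 0.30, 6),
    Incident("Third-party outage broke one feature", 180, 0.05, 1),
    Incident("Caught before customers noticed", 240, 0.0, 4),
]

exposure, engineering = tally(quarter)
print(f"{exposure:.0f} outage-equivalent minutes, {engineering:.0f} engineering hours")
# 40 outage-equivalent minutes, 14 engineering hours
```

None of these would register as "an hour of downtime," yet in this example they add up to 40 minutes of full-outage exposure plus nearly two working days of engineering time. The last incident contributes zero exposure but still shows up in the engineering column.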

The implication for investment

You don't have to build enterprise-grade reliability to capture most of the return. You have to get past three common failure categories:

Silent failures that run for weeks before anyone notices. These are cheap to monitor and expensive to miss.
Response time: the gap between "something broke" and "someone is looking at it." Most SMBs have 30–90 minute response times by default, because alerts go to channels nobody watches after hours. Website Uptime Monitor covers the detection half; an alerting channel someone actually watches covers the rest.
Vendor-caused outages you can't prevent but can detect faster and communicate about. Is That Down aggregates vendor status pages so you can rule third parties in or out in seconds.
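For the first category, silent failures, a heartbeat check (sometimes called a dead man's switch) is one cheap pattern: each scheduled job records when it last succeeded, and a separate check flags any job that has been quiet longer than its expected interval plus a grace period. A minimal sketch follows; the job names, intervals, and the assumption that success times are already being recorded somewhere are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_jobs(last_success: dict[str, datetime],
               expected_interval: dict[str, timedelta],
               now: datetime,
               grace: timedelta = timedelta(hours=1)) -> list[str]:
    """Return the names of jobs silent longer than interval + grace.

    A job with no recorded success at all is treated as stale."""
    stale = []
    for job, interval in expected_interval.items():
        last = last_success.get(job)
        if last is None or now - last > interval + grace:
            stale.append(job)
    return sorted(stale)


now = datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc)
last_success = {
    "nightly-backup": datetime(2024, 3, 1, 2, 0, tzinfo=timezone.utc),   # 31 h ago
    "invoice-export": datetime(2024, 3, 2, 8, 30, tzinfo=timezone.utc),  # 30 min ago
}
intervals = {
    "nightly-backup": timedelta(hours=24),
    "invoice-export": timedelta(hours=1),
    "search-reindex": timedelta(hours=6),  # has never reported a success
}
print(stale_jobs(last_success, intervals, now))
# ['nightly-backup', 'search-reindex']
```

Treating "never reported" as stale is a deliberate choice: a job that was silently dropped from the scheduler is exactly the kind of failure that otherwise runs for weeks.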

None of these require a big investment. The case for monitoring on the cheap versus DIY is a useful related piece — there's a middle ground of modest spend that covers 80% of the downside, which is the right first investment for almost any SMB.

The short version

The cost of a one-hour outage is not one hour of revenue. It's the visible revenue plus a week of support, a week of engineering recovery, a sliver of churn, and a small dent in the trust you're trying to build with customers who are deciding whether to stay.

Run the full math once. Most SMB founders who do come away convinced that reliability work is obviously worth the investment. It was always worth it; the accounting just wasn't catching it.

References

[1] ITIC (Information Technology Intelligence Consulting), 2024 Hourly Cost of Downtime Report. Polled over 1,000 firms worldwide, November 2023 – March 2024. itic-corp.com/itic-2024-hourly-cost-of-downtime-report.
[2] Amazon Web Services, Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region, March 2017. aws.amazon.com/message/41926.
[3] Cloudflare, Details of the Cloudflare outage on July 2, 2019. blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019.
[4] GitHub, October 21 post-incident analysis, 2018. github.blog/news-insights/company-news/oct21-post-incident-analysis.