Vendor Risk Is Your Risk
Every third-party service your business depends on is a piece of infrastructure you don't control. Treat it that way.
Your customers don't care which of your vendors broke. If your product is down because Stripe is down, or your emails aren't sending because your provider is having problems, or your login is failing because your auth provider has an incident — to your customer, you're down.
This is obvious, and it's routinely forgotten. Small businesses in particular treat vendor reliability as someone else's problem, right up until the day it isn't.
The modern SMB is a dependency graph
A typical small-business tech stack quietly depends on ten to twenty third parties. Not counting tools used internally — just the things that sit in the customer request path or directly affect customer experience:
- Hosting provider
- DNS provider
- CDN
- Authentication provider
- Email sender (transactional and marketing)
- Payment processor
- Customer support tool (if it touches customer interactions live)
- Analytics (if integrations push data into customer-facing features)
- Third-party APIs (maps, tax calculation, address validation, etc.)
- The domain registrar itself
Each of these can have an outage. Each outage affects your customers, in ways that range from "annoying" to "can't transact." Your own code and infrastructure could be perfect, and you'd still have incidents.
Most SMBs can't name this list off the top of their head. Which is the first problem to fix.
The dependency inventory
Before you can manage vendor risk, you need to know what you depend on. This is a one-hour exercise, and it's worth doing in the open with whoever on your team knows the stack.
List every third party whose outage would affect your customers. For each one:
- What breaks if they're down?
- How would you first notice?
- What's the fallback?
- Who at your company owns the relationship?
This list is almost never written down in SMBs, even at companies that have been operating for years. The first time you create it, you'll usually find at least one vendor you'd forgotten about — often an integration a former employee set up that's still critical.
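If you'd rather keep the inventory next to your code than in a spreadsheet, here is a minimal Python sketch. The `Dependency` fields mirror the four questions above, and the vendor rows are made-up examples. Writing "none" as the fallback is a hypothetical convention. It separates "we checked and there isn't one" from "nobody has asked yet."

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    """One row of the dependency inventory: the four questions, answered."""
    vendor: str
    what_breaks: str   # What breaks if they're down?
    first_signal: str  # How would you first notice?
    fallback: str      # What's the fallback? Write "none" if there genuinely is none.
    owner: str         # Who at your company owns the relationship?


QUESTIONS = ("what_breaks", "first_signal", "fallback", "owner")


def find_gaps(inventory):
    """Return (vendor, field) pairs for every question still left blank."""
    gaps = []
    for dep in inventory:
        for field in QUESTIONS:
            if not getattr(dep, field).strip():
                gaps.append((dep.vendor, field))
    return gaps


# Hypothetical example rows; replace with your own stack.
inventory = [
    Dependency("Payment processor", "Checkout fails", "Synthetic checkout alert",
               "none", "Ops lead"),
    Dependency("Transactional email", "Receipts and password resets stall",
               "", "Secondary sender", ""),
]

for vendor, field in find_gaps(inventory):
    print(f"{vendor}: no answer for {field}")
```

Run as-is, it flags the transactional-email row for its missing first signal and missing owner.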
Status pages aren't enough
Every serious vendor maintains a status page. That's progress compared to ten years ago. But status pages have limitations:
They lag real incidents. If you line up user reports against vendor acknowledgments, customers usually start complaining 10-30 minutes before the status page updates. The vendor has to diagnose the problem, decide internally that it's a real incident, and then publish. You notice before they do.
They're often scoped narrowly. A "partial outage in the EU" status might affect 100% of your users if your users are in the EU. The page technically reflects the vendor's view, but the impact map is yours.
They miss degraded-but-working scenarios. Some of the worst vendor issues are intermittent slowdowns, elevated error rates at 2-5%, or authentication oddities that only affect a subset. These rarely make the status page.
You have to be subscribed. A status page you don't watch doesn't protect you.
The right play is to aggregate status pages from all critical vendors into one place where you actually see them — a channel, a dashboard, or a monitoring tool that consumes status-page APIs. This is infrastructure work, but it's small infrastructure work. Is That Down was built to do exactly this: one place that watches every vendor status page you tell it to, and alerts when any of them declare an incident.
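If you want to build that aggregation yourself, here is one way to do it, under some assumptions. Many vendors host their status page on Atlassian Statuspage, which publishes a machine-readable summary at `/api/v2/status.json`. The vendor names and URLs below are placeholders. Vendors on other status-page products would need their own parser. Treat this as a sketch, not a replacement for a maintained tool.

```python
import json
import urllib.request

# Placeholder vendors. Assumes each page is hosted on Atlassian Statuspage,
# which serves a JSON summary at /api/v2/status.json.
STATUS_PAGES = {
    "example-payments": "https://status.example-payments.com/api/v2/status.json",
    "example-email": "https://status.example-email.com/api/v2/status.json",
}


def fetch_status(url, timeout=10):
    """Download and decode one status.json document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_incident(payload):
    """Statuspage reports an indicator such as none, minor, major, or critical."""
    return payload.get("status", {}).get("indicator", "none") != "none"


def poll_all(pages, fetch=fetch_status):
    """Return one human-readable alert per unhealthy or unreachable status page."""
    alerts = []
    for vendor, url in pages.items():
        try:
            payload = fetch(url)
        except Exception as exc:  # an unreachable status page is itself worth knowing
            alerts.append(f"{vendor}: status page unreachable ({exc})")
            continue
        if is_incident(payload):
            description = payload["status"].get("description", "incident declared")
            alerts.append(f"{vendor}: {description}")
    return alerts
```

You would run this every minute or two (cron works) and post the returned alerts to whatever channel your team actually reads. The `fetch` parameter lets you test the polling logic without network access.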
Monitoring your own funnels beats trusting status pages
A more robust pattern is to monitor the effect of vendor health on your own service. Synthetic transactions that exercise the critical paths your vendors participate in:
- A scripted checkout end-to-end, once a minute
- A scripted login round-trip
- A test email sent through your transactional provider and checked at the destination
- A test API call to a critical third-party dependency
When one of these breaks, you care about the impact whether or not your vendor has declared an incident. This catches things status pages miss and gives you faster detection than waiting for the vendor to acknowledge.
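A synthetic-check runner can start very small. In this sketch, each check is a function that raises on failure. `http_ok` and the URLs are hypothetical stand-ins. A real checkout or email check would drive a test account through the full flow rather than fetch one page. The `slow_after` threshold defaults to an assumed 5 seconds. It exists because "slow but working" is exactly the degraded case that status pages tend to miss.

```python
import time
import urllib.request


def http_ok(url, timeout=10):
    """Hypothetical check: the URL answers with HTTP 200 within the timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"HTTP {resp.status}")


def run_checks(checks, slow_after=5.0):
    """Run every check; return (name, outcome, seconds) for failing or slow ones."""
    problems = []
    for name, check in checks.items():
        start = time.monotonic()
        try:
            check()
            outcome = "ok"
        except Exception as exc:
            outcome = f"failed: {exc}"
        elapsed = time.monotonic() - start
        if outcome != "ok":
            problems.append((name, outcome, elapsed))
        elif elapsed > slow_after:
            problems.append((name, "slow", elapsed))
    return problems


# Placeholder checks; each lambda defers the network call until run_checks runs it.
CHECKS = {
    "login page": lambda: http_ok("https://app.example.com/login"),
    "tax API": lambda: http_ok("https://tax-api.example.com/health"),
}
```

Scheduled once a minute, any non-empty result from `run_checks(CHECKS)` becomes an alert, whether or not the vendor has said anything yet.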
Fallbacks aren't always graceful, but they should be known
"What's the fallback?" is the question most SMBs answer with a long pause.
For some dependencies, there's no graceful fallback. If your domain registrar goes down during a DNS emergency, you can't pivot mid-incident. That's fine — but it's worth knowing there's no fallback, so you don't assume one exists.
For other dependencies, there's a real fallback available if you've prepared:
- Secondary DNS provider at a different vendor, pre-configured
- Secondary email-sending provider with authentication already set up at the domain level
- Local-caching fallback for third-party APIs used in hot paths
- A "maintenance mode" page that activates automatically when the backend is unreachable
The common denominator: these have to be set up before you need them. Mid-incident is the worst time to design a fallback.
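The local-caching fallback is the easiest of these to show in code. This sketch wraps any lookup function, such as a tax-rate or address-validation call (the vendor client itself is assumed). When the vendor call fails, it serves the last good answer instead of failing the request. It keeps the cache in a single process's memory, which is an assumption. Several app servers would want a shared cache instead.

```python
import time


class StaleOnErrorCache:
    """Wrap a third-party lookup; serve the last good answer if the vendor fails.

    Entries younger than `ttl` seconds are returned without calling the vendor.
    Older entries trigger a fresh call; if that call raises, the stale value is
    returned instead. With nothing cached for the key, the error propagates.
    """

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (value, fetched_at)

    def get(self, key):
        cached = self._entries.get(key)
        now = self._clock()
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # fresh hit: don't touch the vendor at all
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[0]  # vendor failing: serve stale rather than error out
            raise  # nothing cached for this key, so there is no fallback
        self._entries[key] = (value, now)
        return value
```

Usage would be something like `rates = StaleOnErrorCache(lookup_tax_rate, ttl=3600)` followed by `rates.get(postal_code)`, where `lookup_tax_rate` stands in for your vendor client. How stale is too stale is a business decision for each dependency. It belongs in the inventory next to the fallback.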
Contract-level questions worth asking
For vendors in the critical path, it's worth asking a few questions during procurement (and being willing to walk if the answers are bad):
- What's your uptime SLA, and what's the remedy for missing it?
- How do you notify customers of incidents? On what cadence?
- What's your data portability story if we want to leave?
- How do you handle scheduled maintenance that affects availability?
- What's the shortest path to a human engineer when something breaks?
If you want a more structured set of questions, the two canonical industry questionnaires are the Shared Assessments SIG[2] and the Cloud Security Alliance's CAIQ[3]. Both are overkill for most SMB vendor evaluations, but skimming them tells you what sophisticated procurement teams ask — which is what you'll eventually be asked too, when a large customer starts running diligence on you.
SMBs often skip this during evaluation because they assume the terms are non-negotiable. Sometimes they are. But "take it or leave it" is itself information — a vendor unwilling to negotiate basic SLA details is telling you how flexible they'll be in a real incident.
The compounding effect
Vendor risk is cumulative. Google's SRE availability table[1] spells out the math: 99.9% uptime allows about 8 hours 45 minutes of downtime per year; 99.95% allows about 4 hours 22 minutes; 99.99% ("four nines") allows about 52 minutes. Suppose you depend on ten critical vendors, and each independently hits 99.9% (industry-standard for most SaaS). Your combined availability is then 0.999 to the tenth power, or roughly 99.0%. That's about 87 hours, more than three and a half days a year, in which at least one vendor is down.
Vendors usually don't fail independently, which helps. Their outages tend to cluster (a shared cloud region going down takes several with it). An hour in which three vendors are down at once costs you one hour, not three. But the intuitive "each vendor is reliable" story still glosses over the multiplication.
The real cost of that much downtime is higher than the hours suggest. The direct revenue loss is the smallest part. The reputational effect compounds over time: customers may hear that an incident was the vendor's fault, but they remember it as yours.
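The multiplication is easy to check yourself. This sketch assumes a 365-day year and that every vendor in the list is required for the customer path. It computes two cases. In the independent case, each vendor fails at random. In the worst case, no two outages ever overlap. The opposite extreme, where every outage lands in the same hours, leaves you at a single vendor's 99.9%.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def downtime_hours(availability):
    """Hours per year spent outside the availability figure."""
    return (1 - availability) * HOURS_PER_YEAR


def all_required(availabilities):
    """Availability when every vendor must be up and failures are independent."""
    return math.prod(availabilities)


def no_overlap_floor(availabilities):
    """Worst case: outages never overlap, so their downtimes simply add up."""
    return 1 - sum(1 - a for a in availabilities)


vendors = [0.999] * 10  # ten critical vendors at "three nines"
for label, a in [("independent", all_required(vendors)),
                 ("no overlap", no_overlap_floor(vendors))]:
    print(f"{label}: {a:.4%} available, {downtime_hours(a):.1f} h/yr down")
```

With ten vendors at 99.9%, the independent case comes out around 99.0% and about 87 hours a year. The no-overlap worst case is 99.0% and 87.6 hours. Positively correlated outages, the usual real-world case, put you somewhere between the independent figure and the single-vendor 99.9%.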
The posture that works
The SMB that handles vendor risk well isn't the one that has a vendor-management framework. It's the one that has:
- A written list of critical dependencies
- Status monitoring on each
- Known fallbacks (or known "no graceful fallback" notes) for each
- A quarterly review of the list
That's it. Everything else is optional. The pillar on boring IT for SMBs places vendor risk in context with monitoring, alerting, and the rest of the ops stack — the short version is that your dependency graph deserves the same hygiene you give your own infrastructure, because functionally, it is your infrastructure.
References
- [1] Google, Site Reliability Engineering, Appendix A: Availability Table. sre.google/sre-book/availability-table.
- [2] Shared Assessments, Standardized Information Gathering (SIG) Questionnaire. sharedassessments.org/sig.
- [3] Cloud Security Alliance, Consensus Assessments Initiative Questionnaire (CAIQ). cloudsecurityalliance.org.