December 9, 2025 | 19:50
Reading time: approx. 5 min

Why SLAs Are Often Little More Than Marketing

As a follow-up to my blog post “What Really Measures IT Success”,[1] today I am writing about a related topic: the promises made by cloud and service providers, and how they entice customers with availability figures beyond 99%.

Anexia/Netcup, the infrastructure provider where I operate, among other things, my mail server, VPN gateway, and a few other hosts, states an availability of 99.6% on its website.[2] That means a theoretical downtime of up to 40 minutes per week or up to 35 hours per year.

Microsoft Azure, for example, cites 99.95% or 99.99% as a benchmark for enterprise availability.[3] In purely mathematical terms, 99.99% corresponds to a maximum of about one minute of downtime per week, or up to 53 minutes in an entire year; even 99.95% leaves room for only about 4.4 hours annually. Impressive at first glance.
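
For anyone who wants to verify these figures, here is a minimal Python sketch of the underlying arithmetic. The SLA percentages are the ones quoted above; everything else is plain math:

    # Translate an SLA availability percentage into the downtime it still permits.
    MINUTES_PER_WEEK = 7 * 24 * 60      # 10,080
    MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600

    def allowed_downtime(availability_percent: float) -> tuple[float, float]:
        """Return the permitted downtime in minutes per week and per year."""
        unavailable = 1 - availability_percent / 100
        return unavailable * MINUTES_PER_WEEK, unavailable * MINUTES_PER_YEAR

    for sla in (99.6, 99.95, 99.99):
        week, year = allowed_downtime(sla)
        print(f"{sla:>6}% -> {week:5.1f} min/week | {year / 60:5.1f} h/year")

    # Output:
    #   99.6% ->  40.3 min/week |  35.0 h/year
    #  99.95% ->   5.0 min/week |   4.4 h/year
    #  99.99% ->   1.0 min/week |   0.9 h/year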

As is so often the case, the issue lies elsewhere, not in the technology.

SLA Tricks and Fine Print

For most providers, availability is not measured as one overall figure but against many individual SLAs (Service Level Agreements).[4] The availability of a virtual machine is accounted for completely separately from the availability of storage, authentication, or a reverse or load-balancing proxy.[5]

Sounds complicated, doesn’t it? I know from many projects that this fine-grained separation can become a stumbling block. At the same time, the focus shifts away from technical matters toward legal subtleties with rules of their own. Understanding them has become an industry of its own.[6]
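
One consequence of this separation is easy to overlook: as soon as a workload depends on several separately guaranteed services at once, the composite availability is at best the product of the individual figures, and therefore lower than any single SLA. A minimal sketch with hypothetical per-service values, not taken from any real contract:

    import math

    # Hypothetical SLAs for services a single workload depends on simultaneously.
    service_slas = {
        "virtual machine": 99.9,
        "storage": 99.9,
        "authentication": 99.9,
        "load balancer": 99.95,
    }

    # If every dependency must be up, the availabilities multiply
    # (assuming independent failures).
    composite = math.prod(sla / 100 for sla in service_slas.values()) * 100
    print(f"Composite availability: {composite:.2f}%")  # Composite availability: 99.65%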

Points That Deserve a Closer Look

Here is an overview of some well-known industry patterns. They do not apply to all providers and are not meant as universally valid statements. As always: no claim to completeness. Your mileage may vary.

  • SLAs only take effect after a support ticket has been submitted. The 10 to 15 minutes often needed beforehand to register an incident, verify it, and formulate a ticket are simply ignored. If external service providers and additional partners are involved, hours can pass before a ticket is actually created.

  • Short-notice “planned” maintenance windows are not counted as downtime in many SLAs and effectively serve as a free pass for providers. Whether it is an outage or “planned” maintenance: for the customer, the system is unavailable either way.

  • Short outages, for example those lasting less than five minutes, are contractually excluded by many providers. Unfortunately, 15 outages of four minutes each still add up to a full hour by the end of the month, and that hour goes unaccounted for.

  • Multiple outages with a common root cause are grouped into a single event. Three outages lasting 10, 20, and 30 minutes respectively become one single event with only 30 minutes of total downtime after resolution, instead of the actual 60 minutes (the sketch after this list shows how much such exclusions shift the reported figure).

  • Different monitoring measurement points: service providers often measure availability only internally at an API, not at the actual service. For example, if high latencies make a service unusable, this is often not captured.

  • Complex proof requirements and reporting procedures. Customers must document outages themselves and comply with standardized reporting channels. For non-technical users, this is often barely manageable. Complexity and frequent changes to the process are sometimes intentional.

  • SLAs are typically evaluated monthly, not annually. “Bad” months are quietly and effectively averaged out.

  • Local or regional outages do not count if other regions of the provider continue to operate. Economically this is still painful for the customer, but formally there is no SLA violation.

  • SLAs only apply when using the “correct architecture” or booking certain services. Microsoft Azure, for example, only grants 99.99% availability if Azure is used the way Azure wants it to be used: with availability sets, availability zones, multiple instances, and much more. Many are not aware of this at the beginning. Later on, some deliberately choose a different approach for cost reasons.

  • Architectural complexity becomes mandatory, while the operational risk always remains with the customer. When dependencies interlock, this leaves plenty of room for interpretation: a brief hiccup in a critical service can still cause side effects and outages hours later. These ripple effects understandably do not appear in the SLAs of cloud providers.[7]
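
To illustrate how much such exclusions can shift the reported figure, here is a small Python sketch. The outage list, the five-minute cutoff, and the grouping rule are hypothetical and merely combine two of the patterns above; no real provider contract is modelled:

    from dataclasses import dataclass

    MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

    @dataclass
    class Outage:
        minutes: float
        root_cause: str

    # A hypothetical month: 15 short blips plus three storage-related incidents.
    outages = [Outage(4, f"blip-{i}") for i in range(15)]
    outages += [Outage(10, "storage"), Outage(20, "storage"), Outage(30, "storage")]

    def actual_downtime(events):
        return sum(e.minutes for e in events)

    def reported_downtime(events, cutoff=5):
        """Drop outages below the cutoff and count only the longest outage
        per shared root cause, as sketched in the list above."""
        longest = {}
        for e in events:
            if e.minutes < cutoff:
                continue  # too short to count
            longest[e.root_cause] = max(longest.get(e.root_cause, 0), e.minutes)
        return sum(longest.values())

    for label, minutes in (("actual", actual_downtime(outages)),
                           ("reported", reported_downtime(outages))):
        availability = 100 * (1 - minutes / MINUTES_PER_MONTH)
        print(f"{label:>8}: {minutes:5.0f} min down -> {availability:.3f}% availability")

    # Output:
    #   actual:   120 min down -> 99.722% availability
    # reported:    30 min down -> 99.931% availability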

The Elephant in the Room Is More of a Cash Cow

It is not that high availability is inherently more expensive. Redundancy does, however, significantly increase data traffic for forwarding, replication, or distributed backups.

On closer inspection, the supposed elephant in the room turns out to be a well-protected cash cow:[8] providers charge aggressively for internal traffic as soon as someone is stuck in vendor lock-in.[9] Infrastructures and central applications are not replaced every few years; this is where customers are most locked in and least likely to switch.
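
A deliberately crude estimate gives a feel for the order of magnitude. The traffic volumes and the per-gigabyte price below are placeholders, not real provider figures; actual pricing is usually tiered and varies widely:

    # Rough monthly traffic bill for keeping a redundant setup in sync.
    # All figures are hypothetical placeholders, not real provider prices.
    daily_replication_gb = 50     # changed data mirrored to a second zone/region
    daily_backup_gb = 200         # nightly backup shipped off-site
    price_per_gb_eur = 0.05       # assumed inter-zone/egress price per GB

    monthly_gb = 30 * (daily_replication_gb + daily_backup_gb)
    monthly_cost = monthly_gb * price_per_gb_eur

    print(f"{monthly_gb} GB/month -> {monthly_cost:.2f} EUR/month, "
          f"{12 * monthly_cost:.2f} EUR/year just for traffic")
    # 7500 GB/month -> 375.00 EUR/month, 4500.00 EUR/year just for traffic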

At this point, many cost-effectiveness calculations fail.

Conclusion

SLAs rarely protect the customer; they primarily protect the provider. The rest is pure marketing. They are only of limited use for fact-based statements about actual availability or real costs.

Companies are withdrawing from public cloud infrastructures, a trend known as repatriation. According to an IDC report from 2024, large enterprises in particular are increasingly bringing workloads back into their own server rooms and data centers.[10] The golden cloud-native days are over, according to a recent Heise analysis.[11] Add to this the current geopolitical situation.[12]

This does not mean that cloud solutions are inherently bad or uneconomical. As so often, it depends on the individual case.

In 2019, I had the opportunity to analyze this for a mid-sized construction company with five locations. The question at the time was whether the new DocuWare document management system should be operated in the cloud or in the company’s own server room.

Even with conservative assumptions about price developments, indexed only for inflation, the analysis showed a clear result: operating the system in-house was more economical over a ten-year period. This included building a second server room with air conditioning, redundant hardware, and infrastructure. The cost driver then, as now, was internal traffic, which the cloud provider charged for operations, replication, and backups.

Do you have projects or ideas for the new year?
I would be happy to support you.

Yours,
Tomas Jakobs

© 2026 Tomas Jakobs - Imprint and Legal Notice

Support this blog - Donate a Coffee