
2025 Was the Year the Cloud Reminded Us Who's Really in Control

Suzy Tonini

Why the outages keep happening, and what you can actually do about it

2025 was rough if you were betting your business on a single cloud vendor. In December, Snowflake customers watched helplessly as a schema update cascaded across multiple regions, blocking queries for 13 hours. Databricks users dealt with days of degraded AI services.

In October, the US-East-1 region of Amazon Web Services (AWS) went dark for 15 hours: a DNS error affecting DynamoDB took down over 1,000 companies. In June, a null pointer exception in Google Cloud's Service Control binary disabled multiple systems including Cloud Storage, Compute Engine, and BigQuery for several hours, with ripple effects hitting Spotify, Discord, and OpenAI.

Across all of these incidents, the pattern was the same: customers refreshed status pages and waited for someone else to fix the problem. The difference between vendors is not whether outages happen; it’s what options you have when they do.

The Pattern: Single Points of Failure with Global Reach

Snowflake’s December incident was triggered by a backwards-incompatible database schema update. Version mismatch errors caused operations to fail or hang indefinitely across multiple regions on AWS, Microsoft Azure, and Google Cloud Platform (GCP). Snowflake's communications stated there were no workarounds except for customers who had pre-configured replication to non-impacted regions. Everyone else waited.

Databricks’ multi-day December outage included Unity Catalog issues, compute degradation across multiple regions, and a prolonged Mosaic AI disruption. Status updates repeatedly noted they were "working with the cloud provider on potential mitigation paths." That phrase tells you everything about the dependency chain: when Azure has a bad day, Databricks customers in Azure regions have a bad day too.

The Google Cloud June incident revealed the same vulnerability. A faulty policy with blank fields was inserted into global configuration tables and replicated worldwide within seconds. The corrupted data triggered crash loops that took down core services for 7.5 hours. Google's own status dashboards were initially unavailable—SRE teams could not even confirm the scope of the disaster.

Regional redundancy does not help when the failure is logical rather than physical. When a platform relies on globally coordinated metadata or shared configuration, a single bad update propagates everywhere. The failure follows you from region to region.
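The distinction between a physical and a logical failure can be made concrete with a toy simulation. The sketch below is an illustration only, not a model of any vendor's actual system: the region names, the `Policy` class, and the `quota_limit` field are all hypothetical, and a blank field is represented as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical region names, used only for illustration.
REGIONS = ["us-east", "europe-west", "asia-south"]


@dataclass
class Policy:
    name: str
    quota_limit: Optional[int]  # None stands in for a blank field


def load_policy(policy: Policy) -> str:
    """Simulate a regional service reading a policy without validating it."""
    try:
        _ = policy.quota_limit + 0  # raises TypeError when the field is blank
        return "healthy"
    except TypeError:
        return "crashed"


def physical_failure(down_region: str, policy: Policy) -> Dict[str, str]:
    """One region goes offline; the others keep serving with a valid policy."""
    return {
        region: "down" if region == down_region else load_policy(policy)
        for region in REGIONS
    }


def logical_failure(bad_policy: Policy) -> Dict[str, str]:
    """The same bad policy is replicated to every region."""
    return {region: load_policy(bad_policy) for region in REGIONS}


good = Policy("default", quota_limit=100)
bad = Policy("default", quota_limit=None)

print(physical_failure("us-east", good))
print(logical_failure(bad))
```

In the first case only the affected region is lost and the other two stay healthy. In the second, every region reports a crash, because each one faithfully received the same broken input.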

Additionally, in these scenarios, the infrastructure is distributed, but control remains centralized. When Snowflake's control plane breaks, it doesn’t matter that they run on AWS, Azure, and Google Cloud underneath. When Databricks is waiting on Azure to fix something, multi-cloud marketing does not help. The single point of failure is the proprietary layer on top.

What Analysts Are Saying

The Gartner® 2025 analysis of cloud adoption trends estimates that more than 50% of organizations will not get the expected results from their multi-cloud implementations by 2029. The core problem: lack of interoperability between environments. 

In Forrester Predictions 2026: Cloud Outages, Private AI On Private Clouds, And The Rise Of The Neoclouds, the research firm predicts at least two major multiday cloud outages in 2026. The cloud industry is undergoing a massive infrastructure transition as hyperscalers race to build AI-native data centers. That investment is coming at a cost: legacy x86 and ARM environments are being deprioritized, leading to aging infrastructure faltering amid growing complexity.

The same Forrester report estimates that at least 15% of enterprises will shift toward private AI deployments built on private clouds in 2026. The drivers: rising AI costs, concerns about data lock-in, and the operational risk of depending on infrastructure that is increasingly optimized for someone else's priorities. The 2025 outages were a preview of what happens when your workloads are not the provider's top concern.

Architect for Resilience with Cloudera

Most enterprises have “accidental multi-cloud” architectures that arose through acquisitions, shadow IT, or best-of-breed tool selection rather than deliberate architectural planning. Their workloads are scattered across providers, but they lack the ability to move data and workloads when things go wrong.

Architecting for resilience means ensuring your data and AI platform enables portability and eliminates single points of failure.

The Cloudera platform is designed for portability, giving you the ability to fail over between environments to maintain operations. Workloads and data can move across AWS, Azure, Google Cloud, and on-premises environments without rewrites, friction, or vendor lock-in. Updates are not forced as global, non-backward-compatible changes.

When the inevitable outage happens, you have options: fail over to another cloud or move workloads back to your data center. You’re not stuck watching a status page—you remain in control of your data and can maintain consistent operations and compliance no matter where data resides.
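The decision at the heart of that kind of failover can be sketched in a few lines. This is a conceptual illustration under stated assumptions, not a Cloudera API or product feature: the environment names, the `choose_environment` helper, and the stubbed health snapshot are all hypothetical, and a real health check would probe actual endpoints.

```python
from typing import Callable, List, Optional


def choose_environment(
    preference: List[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first healthy environment in preference order, or None."""
    for env in preference:
        if is_healthy(env):
            return env
    return None


# Hypothetical environments and a stubbed health snapshot (assumptions).
PREFERENCE = ["aws", "azure", "on-prem"]
health = {"aws": False, "azure": False, "on-prem": True}

target = choose_environment(PREFERENCE, lambda env: health.get(env, False))
print(target)  # on-prem
```

The point of the sketch is that the selection logic is trivial; what matters is whether the data and workloads are portable enough for more than one entry in the preference list to be a real option.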

For a deeper dive on how to build a resilient architecture with Cloudera, read our blog: Architecting for Data Resilience: Ensuring Business Continuity with Cloudera

Looking Ahead

The AI buildout is straining infrastructure, and analyst firms point to more turbulence ahead: Forrester predicts multiday outages, Gartner predicts defensive multi-cloud adoption. Enterprises that come through 2026 in good shape will be those that treat resilience as an architectural principle rather than a compliance checkbox.

Cloudera does not have push-button cross-cloud failover out of the box—nobody does. But we’re architecturally positioned to support that resilience in ways proprietary platforms are not.

If the 2025 outages made you uncomfortable, we would like to have that conversation. Because the cloud is just someone else's computer. And when that computer has a bad day, you should have somewhere else to go.

To learn more about how you can architect for resilience with Cloudera, reach out to our professional services team, check out our product demos, or sign up for a free 5-day trial.

 
