The recent global IT outage experienced by a cloud hyperscaler was a reminder of a universal truth in technology: even if it’s minimal, downtime and service disruptions are inevitable. While the impact was widespread, disrupting services across retail, banking, healthcare, and other sectors, this wasn’t a failure unique to a single provider or a single cloud. It illustrates that disruption can occur anywhere: in any cloud region, with any provider.
The key takeaway is clear: organizations can and must take control by building a resilient data architecture that can adapt and thrive amid constant change. In this blog, we’ll share how Cloudera customers are uniquely positioned to ensure business continuity thanks to the flexibility of our portable architecture and to tools that ensure seamless failover and recovery. Cloudera is the only data and AI platform company that brings AI to data anywhere: in clouds, data centers, and at the edge.
Data resilience is an organization's ability to withstand, recover quickly from, and minimize the impact of data-related disruptions or failures. It is a proactive approach to business continuity, going beyond backup or disaster recovery to ensure that critical data always remains:
Available: Accessible to users and applications when needed (minimizing recovery time objective or RTO)
Intact/accurate (data integrity): Uncorrupted and unaltered (minimizing recovery point objective or RPO)
Secure: Protected from unauthorized access, loss, or theft
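To make the two objectives concrete, here is a minimal sketch of how a team might check a past incident or a failover drill against its RTO and RPO targets. The `Incident` fields and the function names are hypothetical and chosen for illustration; they are not part of any Cloudera API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Timestamps recorded for one outage or drill (hypothetical schema)."""
    last_good_copy: datetime    # most recent recoverable copy of the data
    outage_start: datetime      # when the service became unavailable
    service_restored: datetime  # when users could reach the service again


def data_loss_window(incident: Incident) -> timedelta:
    """Span of writes that could not be recovered; compared against the RPO."""
    return incident.outage_start - incident.last_good_copy


def downtime(incident: Incident) -> timedelta:
    """How long the service was unavailable; compared against the RTO."""
    return incident.service_restored - incident.outage_start


def meets_objectives(incident: Incident, rto: timedelta, rpo: timedelta) -> bool:
    """True only if both the downtime and the data loss stay within target."""
    return downtime(incident) <= rto and data_loss_window(incident) <= rpo


# Example: a drill with 4 minutes of unreplicated data and 20 minutes of downtime.
start = datetime(2024, 1, 1, 12, 0)
drill = Incident(
    last_good_copy=start - timedelta(minutes=4),
    outage_start=start,
    service_restored=start + timedelta(minutes=20),
)
passed = meets_objectives(drill, rto=timedelta(minutes=30), rpo=timedelta(minutes=5))
```

Tracking both numbers separately matters because a recovery can be fast and still lose too much data, or preserve every record and still take too long.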
Architecting for true resilience involves two core, interconnected pillars: technology that enables portability and a vetted process for failover.
Relying on a single provider, a single cloud, or even a single region within a cloud creates a critical business vulnerability, or single point of failure. Outages occur due to hardware failures, software issues, human error, natural disasters, or cyberattacks. The goal of resilience is to ensure that when one environment goes down, your operations can seamlessly and automatically continue elsewhere.
This means you must be able to fail over anywhere: between cloud regions, across cloud providers, and even back to a data center. Business operations must continue, and critical systems must remain up and running, regardless of where the initial disruption occurred.
Technology can provide resilience capability, but the process is essential for successful business continuity. Too many disaster recovery plans are written once and rarely revisited, even as people and technology evolve. A well-vetted plan is documented, practiced, and revisited regularly to ensure that the organization can execute in the event of a failure. Some elements of the plan include:
Prioritizing workloads to ensure mission-critical operations, such as transaction processing in retail and remote monitoring in healthcare, have the most stringent RTO and RPO targets in their service level agreements (SLAs).
Ensuring redundancy and high availability by establishing the ability to fail over between environments to maintain operations.
Backing up critical data and metadata, and establishing retention policies and governance.
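The first two plan elements can be sketched in a few lines. This is an illustrative outline only: the workload names, site names, and health map below are assumptions made for the example, and a real plan would draw them from your own inventory and monitoring.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Workload:
    name: str
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data-loss window


def recovery_order(workloads: list[Workload]) -> list[Workload]:
    """Recover workloads with the tightest objectives first (RTO, then RPO)."""
    return sorted(workloads, key=lambda w: (w.rto, w.rpo))


def pick_failover_site(preferred: list[str], healthy: dict[str, bool]) -> Optional[str]:
    """Return the first healthy site in preference order, or None if all are down."""
    for site in preferred:
        if healthy.get(site, False):
            return site
    return None


# Hypothetical inventory: mission-critical workloads carry the tightest targets.
workloads = [
    Workload("reporting", rto=timedelta(hours=24), rpo=timedelta(hours=12)),
    Workload("transactions", rto=timedelta(minutes=5), rpo=timedelta(minutes=1)),
    Workload("monitoring", rto=timedelta(minutes=5), rpo=timedelta(seconds=30)),
]
order = [w.name for w in recovery_order(workloads)]

# Hypothetical sites: another cloud region, a second cloud, and a data center.
target = pick_failover_site(
    ["cloud-region-b", "other-cloud", "data-center"],
    {"cloud-region-b": False, "other-cloud": True, "data-center": True},
)
```

Writing the priority order and the site preference down as data, rather than leaving them in a document, makes the plan easier to rehearse and to keep current as workloads change.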
Cloudera is the only data and AI platform provider that delivers a consistent cloud experience to data anywhere. This gives enterprises the freedom to move data and AI workloads between clouds and data centers—without friction or vendor lock-in—so that you’re no longer tied to any one piece of infrastructure. As a result, organizations can reduce business risk by leveraging Cloudera to architect for resilience and maintain consistent operations and compliance no matter where data resides.
The Cloudera platform supports high availability and disaster tolerance through our solutions and services, including:
Portable Data Services: Cloudera’s platform, including cloud-native data services and data lake, runs consistently on any cloud (AWS, Azure, Google Cloud) and on premises in Kubernetes. The freedom from underlying infrastructure enables customers to configure a variety of available sites—mixing different clouds and on-premises resources—to drastically reduce dependency on a single platform or vendor.
Data in Motion: Cloudera Data Flow, Cloudera Streaming Analytics, and Cloudera Streams Messaging enable customers to capture, process, and distribute data anywhere in real time. For mission-critical, real-time workloads like fraud detection and network monitoring, a potential outage can have significant business impact. Cloudera ensures these services remain highly available and can be replicated across environments.
Replication Manager: This core Cloudera component provides a simplified approach to backup and recovery. It replicates not just the data, but also the metadata and the critical security and governance policies tied to that data. This replication enables easy migration, continuous synchronization, and, most importantly, the ability to fail over quickly, with minimal data loss, by promoting a secondary replicated environment that is maintained alongside the primary operating environment.
Open Data Lakehouse: Cloudera’s open data lakehouse provides secure data management and portable cloud-native data analytics with a write-once, run-anywhere approach. This eliminates the time and costs associated with refactoring applications or workloads when moving between different infrastructures.
Figure 1. Cloudera Delivers the Cloud Experience Anywhere for AI Everywhere
Together, these capabilities enable Cloudera customers to run mission-critical data and AI workloads with confidence, ensuring near-zero downtime and data loss for their most important business processes, even during an infrastructure-level outage.
For many businesses, the recent outage was just a blip. But what if the disruption was a true disaster, like a war? Based in Ukraine, AM-BITS, an IT solutions provider for the banking, telecom, and retail sectors, faced an urgent need to secure and migrate their clients’ mission-critical data after geopolitical disruption forced organizations to rapidly accelerate their shift from on-premises systems to the cloud. A typical cloud migration could take six months or more—a timeline that many businesses could not afford.
To address this crisis of continuity, AM-BITS built a modern, multi-tenant data and AI platform powered by Cloudera. Leveraging Cloudera Shared Data Experience (Cloudera SDX), AM-BITS rapidly provided a “technical safe harbor” for its clients’ data assets, drastically reducing the time to securely migrate data to the cloud by 50%. Because Cloudera operates seamlessly across any environment, AM-BITS’ clients gained true flexibility: they could migrate to the cloud quickly, but they also maintained the option to move to a different cloud or bring data back on premises. By leveraging Cloudera, AM-BITS turned portability into a powerful tool for business continuity.
Data-related disruptions and outages can be caused by hardware failures, software issues, human error, natural disasters, cyberattacks, and more. It’s critical that organizations design their systems with those points of failure in mind and have a plan in place to recover their IT systems and data quickly and without significant disruption.
To learn more about how you can architect for resilience with Cloudera, take a look at our disaster recovery checklist and resources, or reach out to our professional services team who can help you design a plan for resilience.