From Hybrid by Accident to Hybrid by Design: Mastering Data Sovereignty and AI Cost Control

Kierstan Williams

No enterprise intentionally builds a chaotic tech landscape. It usually sneaks up through acquisitions, teams buying their own tools, and disconnected, often partial, cloud migrations. The result is a "hybrid by accident" architecture: an IT environment built on years of reactive choices rather than intentional ones. Worse, nobody has a real plan to fix it.

As enterprise AI moves from experimentation into production, accidental hybrid is more than a technical inconvenience; it’s a strategic liability. To maintain data sovereignty and avoid skyrocketing AI costs, organizations need to embrace a hybrid-by-design architecture. Those that make this shift deliberately will unlock the full value of their data assets. Those that don't will find their architectural debt compounding with every passing year and every new AI initiative.

That was the topic at hand in my recent conversation with guest speaker Noel Yuhanna, VP Principal Analyst at Forrester, as part of the Cloudera webinar "Welcome to the Era of Hybrid by Design."

In this blog, I'm expanding on that conversation and outlining how organizations can create a hybrid-by-design architecture through unified governance, open standards, and a clear map of AI lifecycle requirements.

How Enterprises End Up Here

Beyond M&A activity, siloed tool adoption, and lines of business locking in preferred vendors, the rapid push to the cloud over the past decade is a major culprit. Many organizations migrated fast and broadly, and now the pendulum is swinging back. Repatriation is a real and growing conversation.

The numbers tell the story: the share of CIOs reporting plans to repatriate workloads on-premises rose from 43% in 2020 to 83% in 2024. This isn't a rejection of the cloud; it's a maturing recognition that not every workload belongs there. In fact, as Yuhanna points out, roughly 80% of transactional processing in banking and healthcare still runs on-premises today. The early "cloud is cheap" misconception has given way to hard questions about architecture optimization, overprovisioning, and egress costs that erode the value proposition.

The Regulatory Pressure Underneath It All

Compliance is forcing immediate action on what used to be a slow-moving IT problem. Regulations like GDPR, the EU Data Act, and HIPAA demand strict data sovereignty. Meanwhile, the US CLOUD Act, which allows US authorities to access data globally, is colliding with EU and APAC privacy rules, actively driving enterprises toward sovereign, non-US cloud providers. In the financial sector, DORA is mandating vendor exit strategies because relying too heavily on a single cloud is now a systemic risk.

With new AI regulations demanding strict traceability, this pressure will only increase in 2026 and beyond. Data and AI governance are merging into one massive compliance hurdle; companies without the right architecture face a painful, expensive retrofit.

Why AI Makes This Urgent Now

Enterprise AI converts a chronic infrastructure problem into an acute one. The AI lifecycle has vastly different needs at each stage: training and contextualization demand bursty, large-scale compute, which may make them better suited to the cloud, while steady-state inference is often more economical on-premises. "Intentional hybrid" means mapping each stage to the infrastructure that actually fits it, rather than defaulting to a single environment and absorbing the penalties.
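To make the stage-by-stage mapping concrete, here is a minimal sketch of a break-even check on cost alone. Every figure, and the names `Stage`, `break_even_hours`, and `place`, are hypothetical and chosen for illustration; a real placement decision would also weigh sovereignty, latency, and capacity.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    gpu_hours_per_month: float  # expected usage (hypothetical)


def break_even_hours(onprem_monthly_cost: float, cloud_hourly_rate: float) -> float:
    """Hours per month at which owning capacity costs the same as renting it."""
    return onprem_monthly_cost / cloud_hourly_rate


def place(stage: Stage, onprem_monthly_cost: float, cloud_hourly_rate: float) -> str:
    """Return 'cloud' below the break-even point and 'on-prem' at or above it."""
    threshold = break_even_hours(onprem_monthly_cost, cloud_hourly_rate)
    return "on-prem" if stage.gpu_hours_per_month >= threshold else "cloud"


# Hypothetical figures: $4 per GPU-hour on demand, $1,200 per month to own and run one GPU.
CLOUD_RATE = 4.0
ONPREM_MONTHLY = 1200.0

stages = [
    Stage("training", gpu_hours_per_month=80),    # a few intense bursts
    Stage("inference", gpu_hours_per_month=700),  # close to always on
]
placements = {s.name: place(s, ONPREM_MONTHLY, CLOUD_RATE) for s in stages}
print(placements)
```

Under these assumed numbers the break-even point is 300 GPU-hours per month, so the bursty training stage lands in the cloud and the steady inference stage lands on-premises.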

Data gravity complicates this further. AI requires massive volumes of distributed data, and moving it across environments carries real latency and egress costs. You are forced into a corner: constrain models to a limited dataset (sacrificing quality) or absorb massive fees to centralize the data (destroying the business case).
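A rough back-of-the-envelope sketch shows how that corner forms. The data volumes and the per-GB egress rate below are hypothetical assumptions for illustration, not quoted prices from any provider.

```python
def egress_cost(data_tb: float, rate_per_gb: float) -> float:
    """One-time fee to move data_tb terabytes out of an environment."""
    return data_tb * 1000 * rate_per_gb  # 1 TB treated as 1,000 GB


# Hypothetical inputs: 500 TB spread across environments, $0.05 per GB egress.
TOTAL_TB = 500
RATE_PER_GB = 0.05

# Option 1: constrain the model to the 100 TB already co-located with the compute.
local_only_tb = 100
coverage = local_only_tb / TOTAL_TB

# Option 2: centralize the remaining 400 TB and pay to move it once.
centralize_fee = egress_cost(TOTAL_TB - local_only_tb, RATE_PER_GB)

print(f"Option 1 uses {coverage:.0%} of the data at no transfer cost")
print(f"Option 2 uses all of the data for ${centralize_fee:,.0f} in egress")
```

The fee in option 2 is a one-time figure; pipelines that keep re-reading data across a boundary pay it again every cycle.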

Agentic AI sharpens this challenge considerably. Because these systems require real-time, trusted data to take action, batch-lagged pipelines simply won’t survive. As Yuhanna notes, agentic AI adoption currently sits around 24% and is expected to double by the end of 2026. The organizations building a proactive architecture for that reality today will capture the value tomorrow.

The Case for Open Standards

Vendor lock-in isn't just a theoretical risk; it’s an active cost at both the infrastructure and software layers. When your proprietary data tools only run on a specific cloud, you face a compounding "double lock-in." This hands all leverage to the vendor, creating a severe bottleneck the moment a workload needs to move, whether you are shifting a finished cloud pilot to run on spare data center capacity, or migrating to a new sovereign cloud to meet compliance. Organizations reclaim that leverage through workload portability and open standards.

Two defining standards make this possible:

Kubernetes acts as a universal abstraction layer for your underlying infrastructure. By providing a consistent cloud-native operational model regardless of what hardware or cloud provider sits underneath, it eliminates the "platform-hopping tax": the re-engineering overhead that accumulates every time a workload crosses an infrastructure boundary.

Apache Iceberg does the equivalent job at the data layer. It isn't just about abstracting where your data lives; it's about expanding who can access it. The open table format and the Iceberg REST catalog allow organizations to share data in-place with any third-party system. This means you can leave your governed data exactly where it is, while allowing external analytics platforms to query it directly. By completely decoupling data from vendor-specific compute engines, organizations gain genuine, future-proof flexibility in how and where they run AI.

Consider what scale actually looks like in practice. Yuhanna recently encountered a customer connecting 50,000 databases across 1,000 disparate source systems. At that magnitude, complexity doesn't grow linearly; it compounds. Open standards aren't a nice-to-have; they're how enterprises stay in control of their own environments.

The Governance Gap and What It Costs

Fragmented infrastructure reliably produces fragmented governance. As Yuhanna highlights, roughly 70% of enterprise data lacks proper metadata and cataloging, which means only about 25% is actually used for analytics and most enterprise data sits completely untouched. In 2006, British mathematician and data science pioneer Clive Humby famously coined the phrase "data is the new oil," noting that raw data must be refined by AI and analytics to drive real value. If every piece of data contains potential insight, why would you tolerate an architecture that actively prevents you from using all of it?

The security implications are just as concrete. According to IBM’s Cost of a Data Breach Report 2025, multi-environment breaches average over $5 million (well above the $4.44 million global average) and now account for roughly 30% of all incidents. The reason is simple: breaches happen at integration points, and every environment boundary is an integration point.

The answer is a unified policy layer: a single, federated control plane spanning classification, access control, lineage, auditing, and compliance. In this model, policies follow the data, applying consistently and in real time across the entire ecosystem.
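As a minimal sketch of what "policies follow the data" can mean in practice, the example below keys access decisions to a dataset's classification rather than to the environment it sits in. The names `Dataset`, `POLICIES`, `AUDIT_LOG`, and `can_read` are hypothetical, and a production control plane would cover far more (lineage, masking, retention) than this toy check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: str  # e.g. "public", "pii"
    environment: str     # e.g. "on-prem", "cloud-eu"


# One policy table for every environment: classification -> roles allowed to read.
POLICIES = {
    "public": {"analyst", "data_scientist", "auditor"},
    "pii": {"auditor"},
}

AUDIT_LOG = []  # every decision is recorded, whatever the outcome


def can_read(role: str, dataset: Dataset) -> bool:
    """Decide access from the data's classification, never from where it lives."""
    allowed = role in POLICIES.get(dataset.classification, set())  # unknown class: deny
    AUDIT_LOG.append((role, dataset.name, dataset.environment, allowed))
    return allowed


onprem_copy = Dataset("customers", "pii", "on-prem")
cloud_copy = Dataset("customers", "pii", "cloud-eu")

# Same classification, same answer, regardless of environment.
print(can_read("analyst", onprem_copy), can_read("analyst", cloud_copy))
print(can_read("auditor", onprem_copy), can_read("auditor", cloud_copy))
```

Because the environment never enters the decision, moving a dataset between environments cannot silently change who may read it.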

Where to Start

Driven by the real-world demands of AI in production, tightening data sovereignty requirements, and a sharper focus on what infrastructure actually costs, organizations need to move from hybrid-by-accident to hybrid-by-design architectures. Here’s how to get started:

Establish clarity of purpose. Before touching any technology, build an 18-month roadmap anchored to concrete business outcomes, whether that is revenue growth, cost optimization, or resilience targets.

Conduct a data gravity audit. Map out where data actually lives, who accesses it, and your latency and egress exposure. This reliably surfaces forgotten workloads, duplicate data, and compliance blind spots.

Execute deliberate rationalization. Streamline overlapping tools, consolidate vendor relationships, standardize governance, and build for workload portability.
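The data gravity audit step above can begin as a simple inventory pass. The sketch below assumes a hypothetical `Record` shape, sample inventory, and egress rate; it totals monthly cross-environment egress exposure per dataset and flags identical content stored in more than one place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    dataset: str
    location: str           # where the data lives
    consumer_location: str  # where it is read from
    monthly_read_tb: float
    checksum: str           # content fingerprint, used to spot duplicates


def audit(records, rate_per_gb):
    """Summarize cross-environment egress exposure and duplicate copies."""
    exposure = defaultdict(float)
    copies = defaultdict(set)
    for r in records:
        copies[r.checksum].add((r.dataset, r.location))
        if r.location != r.consumer_location:  # only cross-boundary reads incur egress
            exposure[r.dataset] += r.monthly_read_tb * 1000 * rate_per_gb
    duplicates = {c: sorted(v) for c, v in copies.items() if len(v) > 1}
    return dict(exposure), duplicates


# Hypothetical inventory and a hypothetical $0.05 per GB egress rate.
records = [
    Record("orders", "on-prem", "cloud-a", 20, "abc"),
    Record("orders", "on-prem", "on-prem", 50, "abc"),
    Record("orders_copy", "cloud-a", "cloud-a", 5, "abc"),
    Record("clickstream", "cloud-a", "on-prem", 10, "def"),
]
exposure, duplicates = audit(records, rate_per_gb=0.05)
print(exposure)
print(duplicates)
```

In this sample, `orders` and `clickstream` carry egress exposure because they are read across a boundary, and `orders_copy` surfaces as a duplicate of `orders` living in a second environment.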

To learn more, replay my conversation with Noel Yuhanna and dive deeper with the “From Chaos to Control: Why ‘Hybrid by Design’ Is the Future of Enterprise Data Strategy” Industry Trend Report.

 
