Bridging the Gap Between High Performance Computing (HPC) and Sovereign AI: Part Two of Three

By Gabriele Folchi and Lama Itani

If you haven’t read part one on the basics of high performance computing (HPC), check it out now!
 

Key Principles of a Sovereign Data Lakehouse

The Open Data Lakehouse: A Simple PaaS for Engineers

While traditional engineering simulation software excels at helping mechanical engineers prepare, execute, and analyze simulation jobs, it is not natively designed to manage modern machine learning (ML) workflows and data pipelines. An open data lakehouse can bridge this gap, offering R&D engineers robust, contemporary capabilities on a platform that the IT department is likely already familiar with.

Key use cases and benefits of an open data lakehouse include:

  • Cost-effective, governed data archiving: Offers virtually unlimited, low-cost storage for archiving years of simulation snapshots (the datasets generated by solver sessions). This storage is managed and governed consistently across engineering and IT teams. Critically, essential metadata and lineage are preserved for each dataset, transforming it from an opaque file into a trusted asset that can be easily reused beyond its original creator (see the sketch after this list).

  • Simplified access to compute resources: Engineers can quickly deploy shared notebooks and Apache Spark or Ray clusters, which often share the same dedicated GPU resources used by the main HPC cluster.

  • Protection via open standards: An open data lakehouse prioritizes open standards like Apache Iceberg, Parquet, and Python over proprietary engineering formats. This is crucial for safeguarding a company's Intellectual Property (IP), ensuring simulation data remains accessible and usable by any tool, now and in the future, regardless of the company's evolving IT infrastructure or provider strategy.

  • A cloud-like PaaS experience: Data lakehouses structured as user-friendly, self-service platform-as-a-service (PaaS) stacks simplify the use of complex data engineering and MLOps tools, effectively bridging the gap between users from different technical backgrounds and fostering productive knowledge sharing between them.
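To make the archiving and open-standards points concrete, here is a minimal PySpark sketch that registers the metadata of a CFD solver snapshot in an Apache Iceberg table. The catalog name, namespace and table names, schema fields, and storage paths are illustrative assumptions rather than details from any specific product, and the sketch presumes an Iceberg-enabled Spark distribution.

```python
# Minimal sketch: archiving CFD snapshot metadata into a governed Iceberg table.
# All names and paths below (catalog "lakehouse", namespace "simulations",
# bucket locations) are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("simulation-snapshot-archive")
    # Assumes the iceberg-spark-runtime package is available to Spark.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3a://lakehouse-warehouse/")
    .getOrCreate()
)

# One row per solver session: lineage fields turn an opaque result dump
# into a queryable, reusable asset.
snapshot = spark.createDataFrame(
    [
        ("run-0421", "engine-v7", "CFD", "solver-x-2025.1",
         "s3a://raw-results/cfd/run-0421/", "2025-11-02T14:30:00Z"),
    ],
    ["run_id", "design", "discipline", "solver", "raw_path", "created_at"],
)

# Iceberg stores the rows as Parquet data files plus open table metadata,
# giving snapshot isolation, schema evolution, and time travel over
# years of simulation history.
snapshot.writeTo("lakehouse.simulations.snapshots").createOrReplace()
```

Because both the table format (Iceberg) and the file format (Parquet) are open, the same snapshot catalog stays queryable from any engine the organization adopts later, which is exactly the IP-protection argument above.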

The Risk of Public Cloud for Protecting IP of R&D

While a data lakehouse offers many advantages, it's not, by itself, a complete solution for highly regulated sectors (such as aerospace, defense, energy, and automotive) where sovereignty is a non-negotiable requirement. Simply put: not every data lakehouse can be deployed and operated in compliance with data sovereignty mandates, and relying on the public cloud carries significant risk for organizations that must keep the strictest control over their proprietary IP.

For instance, a single snapshot of a computational fluid dynamics (CFD) job—like a new engine design—effectively represents the complete blueprint of its performance and industrial design; this dataset is a company's crown jewel. It is therefore crucial to determine which key non-functional capabilities of a data lakehouse can provide the absolute legal assurance of operational sovereignty necessary to store such strategic assets. This leads directly to the core of the residency vs. sovereignty debate.

Data Residency vs. Sovereignty

The traditional definition of sovereignty as operating in an enterprise's home country is an outdated notion, a remnant of the pre-cloud era. Previously, data center infrastructure was typically managed by local personnel, inherently subjecting it to the company's local jurisdiction and legal obligations. However, the rise of commercial cloud offerings, together with the need for providers to guarantee extremely high service-level objectives around the clock, has fully enabled remote, follow-the-sun global cloud operations. This shift makes it impossible to guarantee, at least in standard commercial regions, the residency of the team operating the platform, thereby severing the link between "data residency" and true "sovereignty."

Consequently, the most dependable architecture for handling and processing critical engineering data is a sovereign data lakehouse: an open data lakehouse that's natively hybrid and cloud-agnostic. 

This approach offers the speed and ease of a cloud-like PaaS experience along with by-design compliance, enabling an enterprise to meet national or other jurisdictional policies that mandate operating entirely within a sovereign, private, and controlled environment, staffed by equally sovereign personnel.

 

Term: Data Residency
Explanation: The data physically sits on hardware inside a specific country's geopolitical boundaries.
Business Impact: Meets basic local compliance requirements; these are often driven less by security than by latency between the data and the IT solutions that consume it.

Term: Operational Sovereignty
Explanation: The people managing the cloud infrastructure (Cloud Ops), and the legal framework governing the provider, are also local and subject to the appropriate sovereign jurisdiction.
Business Impact: Prevents the risk of foreign-government access requests that could legally force the provider to hand over sensitive IP without the company's consent.

AI Economics: Achieving Cost Predictability for AI Models

Beyond security and legal compliance, a sovereign data lakehouse architecture offers another crucial advantage: predictable cost management for implementing AI workflows.

The financial model of running AI services in the public cloud is inherently variable and consumption-based, tying costs directly to usage metrics such as GPU-hours, processed tokens, operational volume, and data scanned. As more teams, projects, and applications leverage cloud infrastructure, costs grow rapidly and become hard to forecast. This model is particularly challenging for high-demand tasks like training complex generative AI (GenAI) models or large autoencoders, which require dedicated, constant, and massive GPU usage that is difficult to share efficiently.

Transitioning to a sovereign data lakehouse deployed in a private or fixed-cost colocation data center shifts an organization to predictable spending by:

  • Establishing fixed asset investment: Organizations invest in fixed, sharable infrastructure. This setup allows multiple teams and projects to use the same resources, effectively driving the marginal cost of initiating new R&D experiments down to near zero.

  • Eliminating "bill shock": This architecture removes the financial risk of unexpected, massive expenses, such as those caused by high-volume inference, continuous iterative R&D training loops, or the prohibitive data transfer fees common across public cloud regions and zones (the rough arithmetic sketch below illustrates the break-even point).
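To make the economics tangible, here is a back-of-the-envelope Python sketch of the break-even arithmetic between consumption-based cloud pricing and a fixed, shared on-premises investment. Every number in it (GPU-hour rate, hardware price, amortization window, operating costs) is an illustrative assumption, not a figure from this article or from any vendor.

```python
# Rough break-even sketch: pay-per-use cloud vs. fixed, shared infrastructure.
# All figures are illustrative assumptions.

CLOUD_RATE_PER_GPU_HOUR = 3.00   # assumed on-demand price per GPU-hour, USD
FIXED_CAPEX = 250_000.0          # assumed purchase price of an 8-GPU node, USD
AMORTIZATION_YEARS = 4           # assumed depreciation window
OPEX_PER_YEAR = 30_000.0         # assumed yearly power, cooling, and staff share, USD


def cloud_cost(gpu_hours: float) -> float:
    """Consumption-based model: cost scales linearly with usage."""
    return gpu_hours * CLOUD_RATE_PER_GPU_HOUR


def fixed_cost_per_year() -> float:
    """Fixed model: the same bill regardless of how many experiments run."""
    return FIXED_CAPEX / AMORTIZATION_YEARS + OPEX_PER_YEAR


if __name__ == "__main__":
    yearly_fixed = fixed_cost_per_year()                       # 250,000 / 4 + 30,000 = 92,500
    breakeven_hours = yearly_fixed / CLOUD_RATE_PER_GPU_HOUR   # ~30,833 GPU-hours/year
    capacity_hours = 8 * 24 * 365                              # 8 GPUs running 24/7 = 70,080

    print(f"Fixed cost per year:     ${yearly_fixed:,.0f}")
    print(f"Break-even utilization:  {breakeven_hours:,.0f} GPU-hours/year")
    print(f"On-prem yearly capacity: {capacity_hours:,} GPU-hours")
    # Above the break-even utilization, the shared node is cheaper than
    # pay-per-use cloud, and the marginal cost of one more R&D experiment
    # is effectively zero.
```

The specific numbers matter less than the shape of the two cost curves: the consumption-based line keeps climbing with usage, while the fixed line stays flat, so a reasonably utilized shared cluster makes each additional experiment effectively free.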


To Learn More, Keep Reading in Part Three!
