Luz Erez on Bringing Humanity Back to Healthcare with AI

Few types of data carry as much potential—or as much responsibility—as medical data. When used properly, healthcare data can improve outcomes, accelerate research, and quite literally save lives. But accessing and analyzing that data remains one of the hardest challenges in enterprise AI.

In this episode of The AI Forecast, host Paul Muller sits down with Luz Erez, founder of MDClone, to explore how synthetic data is changing the way healthcare organizations conduct research, deploy AI, and safeguard sensitive information. Their conversation spans everything from clinical workflows and physician burnout to the role synthetic data plays in validating AI agents safely at scale.

Here are the key takeaways from the conversation.

Bringing Humanity Back to Healthcare Through Automation

Paul: We talk about AI and what it’s going to do for business, and specifically what it’s going to do in the medical industry—but what would you like to see AI automate for you personally?

Luz: One of the things that happened to us during the last 40 years is that we lost contact with people. A physician today spends 60% of his time behind a desk on record-keeping, regulation, dosing, and so on.

He should be with you as a person. The rest of the work, which is important but is work that AI can do, will be done by machines. And it will totally alter the way that we interact. We will have more free time, and many of the interactions that we don’t like about work will be done by machines. I really believe it’s a much, much better future. I’m excited.

Why Medical Research Needs a New Data Model

Paul: You talk about the complexity of retrospective research, but what does retrospective research mean in this context?

Luz: Retrospective research means I’m doing research by looking at data of patients that already exist. And most of the time, people understand there is a difference between correlation and causality.

A researcher might look at medical data and say: I want all the medications of patients that had a relapse in kidney disease while on a beta blocker. Tools like SQL can’t answer this because first you have to define what “on a beta blocker” means, and what a “relapse” means.
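To make the point about definitions concrete, here is a minimal Python sketch of what it might take to pin down “on a beta blocker” and “relapse” over longitudinal records. It is not MDClone’s method. The record types, the 30-day grace period, and the 180-day remission gap are all hypothetical choices made for illustration; a real study would set them clinically.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Exposure:
    """A period during which a patient was prescribed a drug class (hypothetical schema)."""
    patient_id: str
    drug_class: str
    start: date
    end: date


@dataclass(frozen=True)
class Event:
    """A dated clinical event for a patient (hypothetical schema)."""
    patient_id: str
    code: str
    day: date


def on_drug(exposures, patient_id, drug_class, day, grace_days=30):
    """Assumed definition: 'on' a drug if the day falls inside an exposure
    window, extended by a grace period after the prescription ends."""
    grace = timedelta(days=grace_days)
    return any(
        e.patient_id == patient_id
        and e.drug_class == drug_class
        and e.start <= day <= e.end + grace
        for e in exposures
    )


def relapses(events, code, remission_days=180):
    """Assumed definition: a relapse is an event with the same code that
    follows the patient's previous such event by at least remission_days."""
    last_seen = {}
    found = []
    matching = (e for e in events if e.code == code)
    for ev in sorted(matching, key=lambda e: (e.patient_id, e.day)):
        prev = last_seen.get(ev.patient_id)
        if prev is not None and (ev.day - prev).days >= remission_days:
            found.append(ev)
        last_seen[ev.patient_id] = ev.day
    return found


def relapsed_while_on(exposures, events, code, drug_class):
    """Patients with at least one relapse that occurred while on the drug class."""
    return sorted({
        ev.patient_id
        for ev in relapses(events, code)
        if on_drug(exposures, ev.patient_id, drug_class, ev.day)
    })


# Toy records, invented for this example.
exposures = [
    Exposure("p1", "beta_blocker", date(2023, 1, 1), date(2023, 12, 31)),
    Exposure("p2", "beta_blocker", date(2022, 1, 1), date(2022, 3, 1)),
]
events = [
    Event("p1", "kidney_disease", date(2023, 1, 10)),
    Event("p1", "kidney_disease", date(2023, 9, 1)),
    Event("p2", "kidney_disease", date(2022, 2, 1)),
    Event("p2", "kidney_disease", date(2023, 2, 1)),
]

result = relapsed_while_on(exposures, events, "kidney_disease", "beta_blocker")
print(result)  # ['p1']
```

Both patients relapse under this definition, but only p1 is still within an exposure window at the time, so only p1 is returned. Changing the grace period or remission gap changes the cohort, which is why the definitions have to be fixed before any query can be written. Listing those patients’ medications would then be a lookup over the resulting cohort.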

As a physicist, I ask: what are the basic rules? The basic rules are rules of time and people, which means this is longitudinal. So, the main question is, how frequent is something taking place? Once I put the mathematics inside it, I could build logic and a system on top of it. But then I saw another problem. I can find the answers, but I cannot give them to anyone. A physician can ask about their patient, but population-level research requires consent, privacy, and governance. So how do you solve this?

We built something called synthetic data. The engine looks at real data, but it doesn’t give you that data. It gives you a list of avatars that look like the original data. Any statistics will be the same, but there’s no one-to-one correspondence with real people. There is no PHI (protected health information) issue.

Synthetic data allows you to share data, train models, and collaborate—without violating privacy. And today, synthetic data plays a major role in AI.
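As a rough illustration of the idea, the Python sketch below fits a simple statistical model to a handful of real-looking records and then samples new records from that model. It is not MDClone’s engine, which the conversation does not describe in detail. The bivariate normal model, the function names, and the toy (age, systolic blood pressure) values are all assumptions, and a parametric sampler like this reproduces summary statistics only approximately and carries no formal privacy guarantee.

```python
import math
import random
import statistics


def fit(pairs):
    """Summarize two numeric columns by their means, standard deviations,
    and correlation. Assumes both columns vary (nonzero spread)."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
    rho = cov / (sx * sy)
    return mx, my, sx, sy, rho


def sample(params, n, seed=0):
    """Draw n synthetic rows from a bivariate normal with the fitted
    parameters. No row is copied from the source data."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    # Guard against tiny negative values from floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Toy (age, systolic blood pressure) rows, invented for this example.
real = [
    (34, 118), (45, 125), (52, 130), (61, 142),
    (29, 115), (70, 150), (48, 128), (57, 135),
]

real_params = fit(real)
synthetic = sample(real_params, n=5000, seed=0)
synthetic_params = fit(synthetic)

print("real:     ", [round(v, 2) for v in real_params])
print("synthetic:", [round(v, 2) for v in synthetic_params])
```

The two printed summaries should be close, while none of the synthetic rows is a copy of a real one. That is the trade described in the conversation: aggregate structure is kept and individual records are not.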

Synthetic Data as the Foundation for Safe Medical AI

Paul: To get synthetic data of sufficient fidelity, surely it still has to come from actual data?

Luz: Sure. It looks at actual data and creates synthetic data. There is a balance between privacy and utility. You set the level of privacy, and we give you the best utility possible.

The key difference is governance. When a machine does this automatically and users only see synthetic data, all the ethical and privacy issues go away. Not everything can be synthetic—rare cases are hard—but for common medical data, synthetic data works extremely well.

Paul: How do you see this changing the future of medicine?

Luz: AI agents are already doing things like offering dosing recommendations with incredible accuracy, but they are not accurate enough yet. For dosing, you need absolute certainty. To validate these agents, you need hundreds of thousands of cases, and you don’t have them yet.

With synthetic data, we can bootstrap. We generate more and more cases until we can prove the agent works 100% of the time. We’ve shown that primary caregivers can save 40–60% of session time using agents like this, but only if they’re validated correctly.

Without synthetic data, you can’t safely test these systems at scale. I truly believe synthetic data is one of the bedrocks of medical AI.

Listen to the full conversation with Luz Erez on The AI Forecast on Spotify, Apple Podcasts, and YouTube.

You can also learn more about Cloudera’s partnership with MDClone at cloudera.com.
