Few types of data carry as much potential—or as much responsibility—as medical data. When used properly, healthcare data can improve outcomes, accelerate research, and quite literally save lives. But accessing and analyzing that data remains one of the hardest challenges in enterprise AI.
In this episode of The AI Forecast, host Paul Muller sits down with Luz Erez, founder of MDClone, to explore how synthetic data is changing the way healthcare organizations conduct research, deploy AI, and safeguard sensitive information. Their conversation spans everything from clinical workflows and physician burnout to the role synthetic data plays in validating AI agents safely at scale.
Here are the key takeaways from the conversation.
Paul: We talk about AI and what it’s going to do for business, and specifically what it’s going to do in the medical industry—but what would you like to see AI automate for you personally?
Luz: One of the things that happened to us during the last 40 years is that we lost contact with people. A physician today spends 60% of his time behind a desk handling registration, regulation, dosing, and so on.
He should be with you as a person. The rest of the work, which is important but is work that AI can do, will be done by machines. And it will totally alter the way that we interact. There will be more free time, and many of the interactions that we don’t like about work will be done by machines. I really believe it’s a much, much better future. I’m excited.
Paul: You talk about the complexity of retrospective research, but what does retrospective research mean in this context?
Luz: Retrospective research means I’m doing research by looking at data of patients that already exist. And most of the time, people understand there is a difference between correlation and causality.
A researcher might look at medical data and say: I want all the medications of patients that had a relapse in kidney disease while on a beta blocker. Tools like SQL can’t answer this because first you have to define what “on a beta blocker” means, and what a “relapse” means.
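To make the ambiguity concrete, here is a minimal Python sketch of how the answer to "relapsed while on a beta blocker" shifts with the definition of "on." Everything in it is hypothetical and invented for illustration: the toy records, the `relapsed_on_drug` helper, and the `grace_days` parameter. It is not how MDClone's engine works.

```python
from datetime import date, timedelta

# Hypothetical toy records; none of this comes from the interview or from MDClone.
prescriptions = [
    # (patient_id, drug_class, start, end)
    ("p1", "beta_blocker", date(2023, 1, 1), date(2023, 3, 31)),
    ("p2", "beta_blocker", date(2023, 1, 1), date(2023, 1, 31)),
]
relapses = [
    # (patient_id, relapse_date)
    ("p1", date(2023, 2, 15)),  # inside the prescription window
    ("p2", date(2023, 2, 10)),  # 10 days after the prescription ended
]

def relapsed_on_drug(prescriptions, relapses, drug_class, grace_days=0):
    """Return sorted patient ids whose relapse date falls inside a
    prescription window for drug_class, extended by grace_days."""
    grace = timedelta(days=grace_days)
    hits = set()
    for pid, when in relapses:
        for rx_pid, rx_class, start, end in prescriptions:
            if rx_pid == pid and rx_class == drug_class and start <= when <= end + grace:
                hits.add(pid)
    return sorted(hits)

# Same data, two definitions of "on a beta blocker".
strict = relapsed_on_drug(prescriptions, relapses, "beta_blocker")
lenient = relapsed_on_drug(prescriptions, relapses, "beta_blocker", grace_days=14)
print(strict)   # ['p1']
print(lenient)  # ['p1', 'p2']
```

The cohort changes from one patient to two purely because of the chosen grace period, which is why the definition has to be settled before any query can be written.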
As a physicist, I ask: what are the basic rules? The basic rules are rules of time and people, which means this is longitudinal. So, the main question is, how frequently is something taking place? Once I put the mathematics inside it, I could build logic and a system on top of it.
But then I saw another problem. I can find the answers, but I cannot give them to anyone. A physician can ask about their patient, but population-level research requires consent, privacy, and governance. So how do you solve this?
We built something called synthetic data. The engine looks at real data, but it doesn’t give you that data. It gives you a list of avatars that look like the original data. Any statistics will be the same, but there’s no one-to-one correlation with real people. There is no PHI issue.
Synthetic data allows you to share data, train models, and collaborate—without violating privacy. And today, synthetic data plays a major role in AI.
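For readers who want a feel for the idea, here is a deliberately simplified Python sketch. It assumes a single numeric column, fits a normal distribution to it, and samples new values, so the synthetic rows track the mean and spread of the original without being copies of it. The `real_ages` values and the `synthesize` function are hypothetical; this is a toy, not MDClone's engine, and it makes no privacy guarantee.

```python
import random
import statistics

# Hypothetical "real" column (ages); invented for illustration only.
real_ages = [34, 45, 51, 58, 62, 67, 70, 73, 79, 85]

def synthesize(values, n, seed=0):
    """Sample n new values from a normal distribution fitted to `values`."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]

synthetic_ages = synthesize(real_ages, n=10_000)

# Compare summary statistics of the original and the synthetic column.
print("mean :", round(statistics.mean(real_ages), 1),
      round(statistics.mean(synthetic_ages), 1))
print("stdev:", round(statistics.stdev(real_ages), 1),
      round(statistics.stdev(synthetic_ages), 1))
```

A production system would also need to preserve relationships between columns and across time, and to enforce a chosen privacy level. This sketch does none of that.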
Paul: To get synthetic data of sufficient fidelity, surely it still has to come from actual data?
Luz: Sure. It looks at actual data and creates synthetic data. There is a balance between privacy and utility. You set the level of privacy, and we give you the best utility possible.
The key difference is governance. When a machine does this automatically and users only see synthetic data, all the ethical and privacy issues go away. Not everything can be synthetic—rare cases are hard—but for common medical data, synthetic data works extremely well.
Paul: How do you see this changing the future of medicine?
Luz: AI agents are already doing things like offering dosing recommendations with incredible accuracy, but it’s not enough yet. For dosing, you need absolute certainty. To validate these agents, you need hundreds of thousands of cases, and you don’t have them yet.
With synthetic data, we can bootstrap. We generate more and more cases until we can prove the agent works 100% of the time. We’ve shown that primary caregivers can save 40–60% of session time using agents like this, but only if they’re validated correctly.
Without synthetic data, you can’t safely test these systems at scale. I truly believe synthetic data is one of the bedrocks of medical AI.
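As a rough illustration of why validation at this level takes so many cases, the Python sketch below computes a standard statistical bound: if an agent passes n independent test cases with zero failures, the one-sided 95% upper bound on its true failure rate is 1 - 0.05^(1/n), roughly 3/n. The function name and the case counts are hypothetical, and the independence assumption is an idealization; the interview does not describe MDClone's validation method in this detail.

```python
import math

def failure_rate_upper_bound(n_cases, confidence=0.95):
    """Upper bound on the true failure rate after n_cases independent
    trials with zero observed failures (exact binomial bound)."""
    alpha = 1.0 - confidence
    # 1 - alpha**(1/n), written with expm1 for numerical stability.
    return -math.expm1(math.log(alpha) / n_cases)

for n in (1_000, 100_000, 300_000):
    bound = failure_rate_upper_bound(n)
    print(f"{n:>7} clean cases -> failure rate below {bound:.2e} (95% confidence)")
```

Under these assumptions, a thousand clean cases only rule out failure rates above about 3 in 1,000, while a few hundred thousand push the bound down to about 1 in 100,000.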
Listen to the full conversation with Luz Erez on The AI Forecast on Spotify, Apple Podcasts, and YouTube.
You can also learn more about Cloudera’s partnership with MDClone at cloudera.com.