ClouderaNOW Learn about the latest innovations in data, analytics, and AI

Watch now

June 03, 2024 | Business

Cloudera Introduces AI Inference Service With NVIDIA NIM

Enterprise Ai AI

We are excited to announce a tech preview of Cloudera AI Inference service powered by the full-stack NVIDIA accelerated computing platform, which includes NVIDIA NIM inference microservices, part of the NVIDIA AI Enterprise software platform for generative AI. Cloudera’s AI Inference service uniquely streamlines the deployment and management of large-scale AI models, delivering high performance and efficiency while maintaining strict privacy and security standards.

It integrates seamlessly with our recently launched AI Registry, a central hub for storing, organizing, and tracking machine learning models throughout their lifecycle.

Cloudera AI Registry: Centralized Model Management

By combining the AI Registry with advanced inference capabilities, Cloudera provides a comprehensive solution for modern MLOps, enabling enterprises to efficiently manage, govern, and deploy models of any size across public and private clouds.

The new AI Inference service offers accelerated model serving powered by NVIDIA Tensor Core GPUs, enabling enterprises to deploy and scale AI applications with unprecedented speed and efficiency. Additionally, by leveraging the NVIDIA NeMo platform and optimized versions of open-source LLMs like LLama3 and Mistral models, enterprises can take advantage of the latest advancements in natural language processing, computer vision, and other AI domains.

Cloudera AI Inference: Scalable and Secure Model Serving

Key Features of Cloudera AI Inference service:

Hybrid cloud support: Run workloads on premises or in the cloud, depending on specific needs and requirements, making it suitable for enterprises with complex data architectures or regulatory constraints.
Platform-as-a-Service (PaaS) Privacy: Enterprises have the flexibility to deploy models directly within their own Virtual Private Cloud (VPC), providing an additional layer of protection and control.
Real-time monitoring: Gain insights into the performance of models, enabling quick identification and resolution of issues.
Performance optimizations: Up to 3.7x throughput increase for CPU-based inferences and up to 36x faster performance for NVIDIA GPU-based inferences.
Scalability and high availability: Scale-to-zero autoscaling and HA support for hundreds of production models, ensuring efficient resource management and optimal performance under heavy load.
Advanced deployment patterns: A/B testing and canary rollout/rollback allow gradual deployment of new model versions and controlled measurement of their impact, minimizing risk and ensuring smooth transitions.
Enterprise-grade security: Service Accounts, Access Control, Lineage, and Audit features maintain tight control over model and data access, ensuring the security of sensitive information.

The tech preview of the Cloudera AI Inference service provides early access to these powerful enterprise AI model serving and MLOps capabilities. By combining Cloudera's data management expertise with cutting-edge NVIDIA technologies, this service enables organizations to unlock the potential of their data and drive meaningful outcomes through generative AI. With its comprehensive feature set, robust performance, and commitment to privacy and security, the AI Inference service is critical for enterprises that want to reap the benefits of AI models of any size in production environments.

To learn more about how Cloudera and NVIDIA are partnering to expand GenAI capabilities with NVIDIA microservices, read our recent press release.

Robert Hryniewicz

Director of Product Marketing

More by this author ›

Peter Ableda

Director of Product Management, Machine Learning

More by this author ›

Zoram Thanga

Principal Engineer, Machine Learning

More by this author ›

Shubho Sinha

Senior Engineering Director

More by this author ›

Priyank Patel

VP, Product Management

More by this author ›

October 15, 2025 | Partners

Cloudera and Protegrity: Delivering Secure AI and Analytics for Regulated Industries

Jerome Alexander • 4 min read

Ready to Get Started?

Your form submission has failed.

This may have been caused by one of the following:

Your request timed out
A plugin/browser extension blocked the submission. If you have an ad blocking plugin please disable it and close this message to reload the page.

Misa Amane

Cloudera Introduces AI Inference Service With NVIDIA NIM

Key Features of Cloudera AI Inference service:

Robert Hryniewicz

Peter Ableda

Zoram Thanga

Shubho Sinha

Priyank Patel

Your form submission has failed.