We are thrilled to announce the general availability of the Cloudera AI Inference service, powered by NVIDIA NIM microservices, part of the NVIDIA AI Enterprise platform, to accelerate generative AI deployments for enterprises. This service supports a range of optimized AI models, enabling seamless and scalable AI inference.
The generative AI landscape is evolving at a rapid pace, marked by explosive growth and widespread adoption across industries. ChatGPT attracted over 100 million users within two months of its late-2022 release, demonstrating the technology's accessibility and its impact across user skill levels.
By 2023, the focus shifted towards experimentation. Enterprise developers began exploring proof of concepts (POCs) for generative AI applications, leveraging API services and open models such as Llama 2 and Mistral. These innovations pushed the boundaries of what generative AI could achieve.
Now, in 2024, generative AI is moving into production for many companies. Businesses are allocating dedicated budgets and building infrastructure to support AI applications in real-world environments. This transition, however, presents significant challenges: enterprises must safeguard intellectual property (IP), maintain brand integrity, and protect client confidentiality, all while adhering to regulatory requirements.
A major risk is data exposure — AI systems must be designed to align with company ethics and meet strict regulatory standards without compromising functionality. Ensuring that AI systems prevent breaches of client confidentiality, personally identifiable information (PII), and data security is crucial for mitigating these risks.
Enterprises also face the challenge of maintaining control over AI development and deployment across disparate environments. They require solutions that offer robust security, ownership, and governance throughout the entire AI lifecycle, from POC to full production. Additionally, there is a need for enterprise-grade software that streamlines this transition while meeting stringent security requirements.
To safely leverage the full potential of generative AI, companies must address these challenges head-on. Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools.
At Cloudera, we focus on simplifying the development and deployment of generative AI models for production applications. Our approach provides accelerated, scalable, and efficient infrastructure along with enterprise-grade security and governance. This combination helps organizations confidently adopt generative AI while protecting their IP and brand reputation and maintaining compliance with regulatory standards.
The new Cloudera AI Inference service provides accelerated model serving, enabling enterprises to deploy and scale AI applications with enhanced speed and efficiency. By leveraging the NVIDIA NeMo platform and optimized versions of open-source models like Llama 3 and Mistral, businesses can harness the latest advancements in natural language processing, computer vision, and other AI domains.
The Cloudera AI Inference service offers a powerful combination of performance, security, and scalability designed for modern AI applications. Powered by NVIDIA NIM, it delivers market-leading performance with substantial time and cost savings. Hardware and software optimizations enable up to 36 times faster inference with NVIDIA accelerated computing and nearly four times the throughput on CPUs, accelerating decision-making.
Integration with NVIDIA Triton Inference Server further enhances the service. It provides standardized, efficient deployment with support for open protocols, reducing deployment time and complexity.
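Open-protocol support means a deployed model endpoint can be called with ordinary HTTP tooling. As a minimal client-side sketch, assuming an OpenAI-compatible chat-completions route (the base URL, model name, and token below are hypothetical placeholders, not actual service values):

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt, token):
    """Build the URL, headers, and JSON body for an OpenAI-compatible
    chat-completions call (route and payload shape are the standard ones)."""
    url = f"{base_url}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # endpoint access is token-gated
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return url, headers, json.dumps(body).encode("utf-8")

def chat(base_url, model, prompt, token):
    """Send one chat request and return the model's reply text."""
    url, headers, data = build_chat_request(base_url, model, prompt, token)
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    # The request travels only inside the customer's secured environment.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the protocol is open and standard, the same client code works unchanged whether the endpoint is served on premises or in the cloud.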
In terms of security, the Cloudera AI Inference service delivers robust protection and control. Customers can deploy AI models within their virtual private cloud (VPC) while maintaining strict privacy and control over sensitive data in the cloud. All communications between the applications and model endpoints remain within the customer’s secured environment.
Comprehensive safeguards, including authentication and authorization, ensure that only users with configured access can interact with the model endpoint. The service also meets enterprise-grade security and compliance standards, recording all model interactions for governance and audit.
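The authorization-plus-audit pattern described above can be illustrated in a few lines. This is a conceptual sketch under assumed names (`authorize_and_log`, an in-memory allowlist and log), not the service's actual implementation:

```python
import datetime

def authorize_and_log(user, allowed_users, audit_log, action="invoke_model"):
    """Allow only configured users to reach the endpoint, and record
    every attempt (allowed or denied) in an audit trail."""
    entry = {
        "user": user,
        "action": action,
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "allowed": user in allowed_users,
    }
    audit_log.append(entry)  # every interaction is logged for governance
    if not entry["allowed"]:
        raise PermissionError(f"user {user!r} lacks access to the model endpoint")
    return entry
```

The key design point is that logging happens before the access decision is enforced, so denied attempts leave an audit record too.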
The Cloudera AI Inference service also offers exceptional scalability and flexibility. It supports hybrid environments, allowing seamless transitions between on-premises and cloud deployments for increased operational flexibility.
Seamless integration with CI/CD pipelines enhances MLOps workflows, while dynamic scaling and distributed serving optimize resource usage. These features reduce costs without compromising performance. High availability and disaster recovery capabilities help enable continuous operation and minimal downtime.
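From the application side, high availability across replicated endpoints often comes down to retry-with-failover logic. A minimal sketch, assuming a list of replica base URLs and a caller-supplied `send` function (both hypothetical):

```python
import time

def call_with_failover(endpoints, send, retries=3, backoff=0.5):
    """Try each endpoint in turn; retry all of them with exponential
    backoff between rounds, raising only when every round fails.

    `endpoints` is a list of replica base URLs; `send` performs one
    request against a given endpoint and raises on failure.
    """
    last_err = None
    for attempt in range(retries):
        for ep in endpoints:
            try:
                return send(ep)
            except Exception as err:  # unhealthy replica: try the next one
                last_err = err
        time.sleep(backoff * (2 ** attempt))  # back off before the next round
    raise RuntimeError(f"all endpoints failed after {retries} rounds") from last_err
```

In practice the serving layer handles this transparently behind a single endpoint; the sketch just shows why replicated serving keeps downtime minimal for callers.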
The Cloudera AI Inference service, powered by NVIDIA NIM microservices, delivers seamless, high-performance AI model inferencing across on-premises and cloud environments. Supporting open-source community models, NVIDIA AI Foundation models, and custom AI models, it offers the flexibility to meet diverse business needs. The service enables rapid deployment of generative AI applications at scale, with a strong focus on privacy and security, to help enterprises that want to unlock the full potential of their data with AI models in production environments.
*Feature coming soon. Please reach out to us if you have questions or would like to learn more.