Why Cloudera + NVIDIA
Today, data processing and data engineering have become the world's largest computing segment. Modest improvements in the accuracy of analytics models translate into billions to the bottom line. To build the best models, data scientists toil to train, evaluate, iterate, and retrain for highly accurate results and performant models. With RAPIDS on the Cloudera Data Platform (CDP), processes that took days now take minutes, making it easier and faster to build and deploy value-generating models. Enterprises can use GPU-accelerated Apache Spark 3.0 on CDP to remove bottlenecks and quickly improve performance, significantly improving time to insight and return on investment.
Independent Hardware Vendor (IHV)
- Expand AI use cases with a complete production ML toolkit enabled by NVIDIA computing
- Generate models that produce highly accurate data and insights trusted by the business
- Operate a fully secure ML environment that can meet evolving requirements
- Reduce ML training time and the interval between model deployments from days to minutes
Joint Solution Overview
Running data science workloads on an accelerated Cloudera Data Platform greatly improves time to value by enabling data scientists to collaborate in a single, unified, all-inclusive platform that can power any AI use case. With the latest release, accelerated Apache Spark 3.0 workloads now run seamlessly on CDP. With GPU acceleration, data science teams can use purpose-built tooling to run agile experimentation, data analytics, and machine learning 10x faster and at lower cost.
Cost-effective NVIDIA infrastructure empowers IT teams to deliver an accelerated CDP solution for intuitive, self-service ML — now and into the future. NVIDIA-Certified servers are available from leading OEM server vendors. For companies looking to jumpstart their AI journey, Accelerated CDP Starter Solutions are available to confidently deploy scalable hardware and software solutions that securely and optimally run accelerated workloads.
Joint Solution Benefits
NVIDIA and Cloudera have tested and benchmarked workloads across a wide range of infrastructure configurations and distilled the results into two simple recommendations:
- For companies buying servers dedicated to running Apache Spark for data analytics and ETL in CDP, a CDP-READY configuration of four NVIDIA-Certified servers with two NVIDIA A30 GPUs per server offers over five times the performance at less than 50% incremental cost compared with modern CPU-only alternatives.
- For companies buying servers to run not just Apache Spark but also machine learning in CDP, or servers that may be used for other AI-related applications during their lifetime, an AI-READY configuration of four NVIDIA-Certified servers with one NVIDIA A100 GPU per server offers over eight times the performance at less than 50% incremental cost compared with modern CPU-only alternatives. These numbers cover only the Apache Spark benchmarks; acceleration on ML and AI training is even more significant.
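To put the two recommendations in price-performance terms, the short Python sketch below works out the minimum gain in performance per dollar implied by the figures above. It assumes that "incremental cost" means extra spend expressed as a fraction of the CPU-only baseline cost; the function name is illustrative and not part of any Cloudera or NVIDIA tooling.

```python
def min_perf_per_cost_gain(speedup: float, incremental_cost: float) -> float:
    """Lower bound on performance per dollar relative to a CPU-only baseline.

    speedup: performance multiple vs. the baseline (5.0 for "five times").
    incremental_cost: extra cost as a fraction of the baseline cost
        (0.5 for "50% incremental cost"). This reading is an assumption.
    """
    return speedup / (1.0 + incremental_cost)


# "Over 5x at less than 50% more cost" means the true ratio exceeds 5 / 1.5.
cdp_ready = min_perf_per_cost_gain(5.0, 0.5)
# "Over 8x at less than 50% more cost" means the true ratio exceeds 8 / 1.5.
ai_ready = min_perf_per_cost_gain(8.0, 0.5)

print(f"CDP-READY: more than {cdp_ready:.1f}x performance per dollar")
print(f"AI-READY: more than {ai_ready:.1f}x performance per dollar")
```

Under that reading, the CDP-READY configuration delivers more than roughly 3.3 times the Spark performance per dollar of a CPU-only alternative, and the AI-READY configuration more than roughly 5.3 times.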
Cloudera and NVIDIA: Predicting customer churn using RAPIDS, Apache Spark, and NVIDIA GPUs
Easily deploy end-to-end data science pipelines on Cloudera Data Platform running on NVIDIA accelerated infrastructure to improve your data-driven operations.
Related blog posts
Enabling NVIDIA GPUs to accelerate model development in Cloudera Machine Learning
By Pete Ableda | April 10, 2021