Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Apache Spark is becoming the successor to MapReduce for data processing in Hadoop. Meanwhile, Apache Hive remains the most widely used data warehouse/ETL engine, with large-scale adoption across enterprises. It is therefore imperative to support Spark as an underlying execution engine for Hive, so that existing and future Hive workloads can seamlessly take advantage of Spark. With the recent release of Cloudera Enterprise 5.7, we have delivered on this goal by adding support for Hive-on-Spark. Data engineers and ETL developers can now move their Hive workloads from MapReduce (MR) to Spark seamlessly, gaining the benefits of Spark without any disruption on their end.