Best Practices for Using Apache Hive in CDH

Hive data warehouse software enables reading, writing, and managing large datasets in distributed storage. Queries written in the Hive query language (HiveQL), which is very similar to SQL, are converted into a series of jobs that execute on a Hadoop cluster through MapReduce or Apache Spark.

Users can run batch processing workloads with Hive and analyze the same data in interactive SQL or machine-learning workloads with tools such as Apache Impala or Apache Spark, all within a single platform.

As part of CDH, Hive also benefits from:

Unified resource management provided by YARN
Simplified deployment and administration provided by Cloudera Manager
Shared security and governance, provided by Apache Sentry and Cloudera Navigator, to meet compliance requirements

Continue reading:

Installation and Upgrade
Configuring
Using & Managing
Tuning
Hive Metastore (HMS)
Data Replication
Security
HCatalog
Troubleshooting

