NOTICE
As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.
Please visit recommended tutorials:
- How to Create a CDP Private Cloud Base Development Cluster
- All Cloudera Data Platform (CDP) related tutorials
Introduction
You will use Zeppelin's JDBC Hive Interpreter to run SQL queries against the NoSQL HBase table "tweets_sentiment" for the sum of happy and sad tweets, and then visualize the results.
Prerequisites
- Enabled Connected Data Architecture
- Setup the Development Environment
- Acquired Twitter Data
- Cleaned Raw Twitter Data
- Built a Sentiment Classification Model
- Deployed a Sentiment Classification Model
Outline
Implement a Zeppelin Notebook to Visualize Sentiment Scores
Create Hive Table Mapping to HBase Table
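To visualize the data stored in HBase, first use Zeppelin's JDBC Hive Interpreter to create an external Hive table that maps onto the HBase table. The statement below maps the HBase row key and the columns of the social_media_sentiment column family to Hive columns: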
%jdbc(hive)
CREATE EXTERNAL TABLE IF NOT EXISTS tweets_sentiment(`key` BIGINT, `handle` STRING, `language` STRING, `location` STRING, `sentiment` DOUBLE, `tweet_id` BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,social_media_sentiment:twitter.handle,social_media_sentiment:twitter.language,social_media_sentiment:twitter.location,social_media_sentiment:twitter.sentiment,social_media_sentiment:twitter.tweet_id")
TBLPROPERTIES("hbase.table.name" = "tweets_sentiment");
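Before querying, it can help to confirm that the mapping was registered as intended. The following is a minimal sketch that assumes the CREATE statement above succeeded; it only inspects the Hive table's metadata, which should list the HBase storage handler and the hbase.columns.mapping property.

```sql
%jdbc(hive)
-- Show the columns, storage handler, and table properties Hive recorded for the mapping
DESCRIBE FORMATTED tweets_sentiment;
```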
Load a Sample of the Data
Load data from the Hive table:
%jdbc(hive)
SELECT * FROM tweets_sentiment;
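The query above returns every row in the table. If the table is large, a capped variant may be more practical for a quick look. This is a sketch only: the limit of 10 rows is an arbitrary choice for illustration, not a value from the tutorial.

```sql
%jdbc(hive)
-- Return at most 10 rows; the limit value is illustrative
SELECT * FROM tweets_sentiment LIMIT 10;
```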
Visualize Sentiment Score Per Language in Bar Chart
To see each tweet's sentiment score per language, copy and paste the following query into a Zeppelin paragraph, run it, and select the bar chart view in the result toolbar.
%jdbc(hive)
SELECT language, sentiment FROM tweets_sentiment;
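The introduction mentions the sum of happy and sad tweets. One way to compute those totals directly in Hive is sketched below. It rests on an assumption the tutorial does not state: that a sentiment value of 0.5 or higher marks a happy tweet and a lower value marks a sad one. Adjust the cutoff to match the labels your model actually produces. The column aliases happy_tweets and sad_tweets are hypothetical names chosen for this example.

```sql
%jdbc(hive)
-- Assumption: sentiment >= 0.5 means happy, sentiment < 0.5 means sad (cutoff is illustrative)
-- Rows with a NULL sentiment are counted in neither column
SELECT language,
       SUM(CASE WHEN sentiment >= 0.5 THEN 1 ELSE 0 END) AS happy_tweets,
       SUM(CASE WHEN sentiment < 0.5 THEN 1 ELSE 0 END) AS sad_tweets
FROM tweets_sentiment
GROUP BY language
ORDER BY language;
```

In the bar chart view, language can serve as the key and the two count columns as the values, which gives one pair of bars per language.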
Summary
Congratulations! You just learned to write Hive code that maps to an HBase table, query that table, and visualize the data using Zeppelin's JDBC Hive Interpreter.