Overview
This 3-day course covers Apache HBase, a distributed, scalable, NoSQL database designed for real-time read/write access to large datasets. Built on top of HDFS, it brings low-latency random access to Hadoop-scale data. The course includes HBase architecture, data modeling, read/write internals, deployment, high availability, tuning, security, troubleshooting, and advanced topics like Phoenix, HBCK2, and YCSB benchmarking.
What Skills Will You Gain?
- Understanding HBase architecture and its role in the Cloudera Operational Database
- Deploying and configuring HBase clusters for high availability
- Designing effective HBase schemas for scalable, real-time workloads
- Analyzing and optimizing HBase write and read paths
- Tuning HBase performance through memory management, caching, and compaction
- Securing HBase with authorization policies in Ranger
- Monitoring and troubleshooting clusters using HBCK2 and diagnostic tools
- Performing data backup, recovery, and cluster migration
- Querying HBase tables using Apache Phoenix and its advanced features
- Benchmarking performance with the YCSB tool
- Managing medium-sized objects (MOBs) efficiently
Who Should Take This Course?
This course is designed for administrators and data engineers who manage or support Apache HBase deployments in production environments. It is also valuable for DevOps professionals involved in performance tuning, monitoring, and troubleshooting databases. Prior experience with HDFS and ZooKeeper is recommended. Students must have Internet access to connect to the hands-on lab environments.
Course Details
Configuring HBase for High Availability
HBase Schema Design
- General Design Considerations
- Application-Centric Design
- Designing HBase Row Keys
- Case: Row Key Design
HBase Performance Tuning
- Performance Evaluation using 'pe'
- Optimization through Parameters
- Garbage Collection
- Case: Parameter Optimization
- Best Practices
HBase Migration, Backup, and Recovery
- Full Migration
- Incremental Migration
- Best Practices
Resource Management
- Managing Roles and Templates
- Adding Workers
- Decommissioning and Removing Workers
YARN Queues and Jobs
- Installing and Configuring YARN Queues
- Running and Managing Jobs
Managing Services
- Identifying and Installing Parcels
- Adding and Removing Cloudera Services
Configuration Management
- Changing Configuration Properties
- Configuring Role Groups
HBase Monitoring & Troubleshooting
- Monitoring HBase
- Testing Network Bandwidth
- Cloudera Manager Charts
- Troubleshooting RIT Issues
- Best Practices
Operational Database
HBase Essentials
- Overview
- HBase Table Fundamentals
- HBase Shell
- HBase Data Access
- Column Family Design and Considerations
- Filtering Scans
- Best Practices
HBase Write & Read Path
- HBase Write Path
- HBase Read Path
- Deploying and Accessing an HBase Cluster
- HBase Cluster in Cloudera on Premises
- HBase Cluster in Cloudera on Cloud
Phoenix Overview
- Phoenix Overview
- Command Line Client
- Metadata SQL Commands and Line Client
- Creating a Table
- Table Architecture
- Modifying and Deleting Rows
- Reading Data
- Transactions
- BulkLoad
- Views
- Mapping Phoenix table to an existing HBase table
- Secondary Index
- Salted Table
- Phoenix Optimization
Introduction to the HBase HBCK2 Repair Tool
Introduction to BucketCache
Benchmarking HBase Using YCSB Tool
Storing Medium Objects (MOBs)
