
Apache HBase

Apache HBase, a key component of CDH, is a distributed, scalable, NoSQL database that runs on top of HDFS. HBase is modeled after Google’s Bigtable and provides the ability to store data in massive tables (billions of rows and millions of columns) for fast, random access.

Download & Install CDH

Primary Use Cases for HBase

Serving of Data to Many Users or Applications
Traditional relational databases are not inherently distributed. As the number of users reading and writing data grows, the storage, memory and CPU requirements can quickly grow beyond what a single machine can serve. HBase is distributed by design: the system is architected to use the storage, memory and CPU resources of any number of servers (or nodes) in a “cluster”, scaling the database horizontally as load and performance demands increase.

Providing Fast, Random Read/Write Access to Users and Applications
HDFS is a Write Once Read Many (WORM) file system that’s tuned for batch operations. The emphasis is on high throughput rather than low latency. HBase augments HDFS by providing record-based storage that allows users and applications to perform fast, random reads and writes to data. Changes are buffered in memory and periodically flushed to HDFS for persistence. This enables the Hadoop system to serve random reads and writes to users and applications across big tables in real time.
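The idea of buffering changes in memory and persisting them later can be pictured with a small toy model. The sketch below is only an illustration under simplifying assumptions: the class `ToyStore`, its `flush_threshold` parameter, and the use of in-process dictionaries in place of files on HDFS are all hypothetical and are not part of the HBase API.

```python
# Toy model of a buffered write path. Illustrative only: the names here
# are hypothetical and this is not how HBase itself is implemented.

class ToyStore:
    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}        # recent writes, held in memory
        self.flushed_files = []   # write-once snapshots, oldest first
                                  # (a stand-in for files persisted on HDFS)

    def put(self, row_key, value):
        """Record a write in memory; flush once the buffer is full."""
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the in-memory buffer as a new sorted snapshot."""
        if self.memstore:
            self.flushed_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row_key):
        """Random read: check memory first, then snapshots newest-first."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for snapshot in reversed(self.flushed_files):
            if row_key in snapshot:
                return snapshot[row_key]
        return None


store = ToyStore(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")   # threshold reached: both rows are flushed
store.put("row1", "c")   # newer value for row1 stays in memory
print(store.get("row1"))  # the in-memory value shadows the flushed one
print(store.get("row2"))  # served from the flushed snapshot
```

Because snapshots are never modified after they are written, this pattern fits a write-once file system: an update is simply a newer entry that shadows the older one at read time.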

Key Features of Apache HBase

  • Scale-out architecture - add servers to increase capacity
  • Full consistency - guard against node failures or simultaneous writes to the same record
  • High availability - multiple master nodes ensure continuous access to data
  • Automatic sharding - transparently and efficiently scale out your data across machines in the cluster
  • Active-active replication - stream data across locations for disaster recovery and data protection
  • Security - table and column family-level security via Kerberos
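The automatic sharding item in the list above can be sketched as range partitioning over sorted row keys, where a partition splits in two once it grows too large. This is a minimal sketch under stated assumptions: `ToyTable`, `max_rows_per_region`, and splitting by row count are hypothetical simplifications chosen for illustration, not HBase's actual split policy or API.

```python
import bisect

# Toy model of range-based sharding. Illustrative only: names and the
# row-count split rule are assumptions, not HBase's real behavior.

class ToyTable:
    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region
        # Region i holds keys in [start_keys[i], start_keys[i + 1]).
        self.start_keys = [""]
        self.regions = [{}]

    def _region_index(self, row_key):
        """Find the region whose key range contains row_key."""
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, value):
        i = self._region_index(row_key)
        self.regions[i][row_key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def _split(self, i):
        """Split region i at its median key into two adjacent regions."""
        keys = sorted(self.regions[i])
        mid_key = keys[len(keys) // 2]
        left = {k: v for k, v in self.regions[i].items() if k < mid_key}
        right = {k: v for k, v in self.regions[i].items() if k >= mid_key}
        self.regions[i:i + 1] = [left, right]
        self.start_keys.insert(i + 1, mid_key)

    def get(self, row_key):
        return self.regions[self._region_index(row_key)].get(row_key)


table = ToyTable(max_rows_per_region=2)
for key in ["a", "b", "c", "d", "e"]:
    table.put(key, key.upper())

print(table.start_keys)          # region boundaries after the splits
print(table.get("e"))            # lookups route to the right region
```

In a real cluster each region would be hosted by one of the servers, so splitting regions is what lets data and load spread across machines; the caller keeps using the same `put` and `get` calls throughout.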

For more information on HBase, visit the Apache HBase homepage or check out HBase: The Definitive Guide by Cloudera Solutions Architect Lars George.

Maximize the Value of HBase with Cloudera Enterprise RTD

Cloudera Enterprise RTD is an optional subscription module that can be added to Enterprise Core. When you add RTD to your Enterprise Core subscription, you can use Cloudera Manager’s powerful tools to interact with HBase and to configure and manage it within your CDH cluster. You also get our market-leading technical support for HBase.

Get Cloudera Enterprise RTD