Apache HBase

Apache HBase is a distributed, scalable data store that runs on top of Apache Hadoop’s file system, the Hadoop Distributed File System (HDFS). HBase is a key component of an enterprise data hub (EDH), as its design caters to applications that require fast, random access to large data sets. HBase, which is modeled after Google’s Bigtable, can handle massive data tables containing billions of rows and millions of columns.


HBase for the Enterprise

Serving data to many users or applications
Apache HBase is built to scale. Traditional relational databases are not inherently distributed, and as the number of users reading and writing data grows, the storage, memory, and CPU requirements can quickly exceed what a single machine can accommodate. Scaling such systems can be costly to build and cumbersome to operate. HBase is distributed by design: it builds on the cost-effective capabilities of Hadoop and an EDH and uses the storage, memory, and CPU resources of any number of servers within a cluster, so the database scales horizontally as load and performance demands increase. Users can query data in HBase using a number of computing engines offered by an EDH, including interactive SQL with Cloudera Impala and full-text, faceted search with Cloudera Search.

Providing fast, random read/write access to users and applications
HDFS is a "write once, read many" (WORM) file system that is well suited for batch processing and for interactive SQL and search operations. HDFS emphasizes high-throughput computing rather than low-latency I/O. HBase augments HDFS by providing a record-based storage layer that users and applications use to perform fast, random reads and writes. Recent changes are held in memory for fast access while the data is persisted to HDFS. This design enables a Hadoop-based EDH to serve random reads and writes to users and applications in real time while retaining the fault tolerance and durability of HDFS.
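
The idea of layering mutable records over a write-once file system can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HBase's actual implementation: the class `ToyRecordStore`, its `flush_threshold` parameter, and the in-process "files" are all hypothetical names invented for illustration. Writes land in a mutable in-memory buffer, the buffer is periodically written out once as a sorted, immutable snapshot (standing in for a file on HDFS), and reads consult memory first and then the snapshots from newest to oldest.

```python
from bisect import bisect_left


class ToyRecordStore:
    """Toy model (hypothetical): a mutable buffer over immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.memory = {}   # recent writes, mutable
        self.files = []    # immutable (keys, values) snapshots, oldest first

    def put(self, row_key, value):
        # A write only touches memory, so it is cheap and random-access.
        self.memory[row_key] = value
        if len(self.memory) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out once as a sorted snapshot that is never modified.
        if self.memory:
            items = sorted(self.memory.items())
            keys = tuple(k for k, _ in items)
            values = tuple(v for _, v in items)
            self.files.append((keys, values))
            self.memory = {}

    def get(self, row_key):
        # Newest data wins: check memory, then snapshots from newest to oldest.
        if row_key in self.memory:
            return self.memory[row_key]
        for keys, values in reversed(self.files):
            i = bisect_left(keys, row_key)
            if i < len(keys) and keys[i] == row_key:
                return values[i]
        return None


store = ToyRecordStore(flush_threshold=2)
store.put("user#42", "alice")
store.put("user#07", "bob")        # second write triggers a flush
store.put("user#42", "alice-v2")   # update lands in memory; the snapshot is untouched
print(store.get("user#42"))
print(store.get("user#07"))
```

In this sketch an update never rewrites an existing snapshot, which is what lets the underlying storage remain write-once while the store still appears to support random updates.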

Key Features of Apache HBase

  • Scale-out Architecture - Add servers to increase capacity
  • Full Consistency - Guard against node failures or simultaneous writes to the same record
  • High Availability - Multiple master nodes ensure continuous access to data
  • Automatic Sharding - Transparently and efficiently scale out your data across machines in the cluster
  • Active-active Replication - Stream data across locations for disaster recovery and data protection
  • Security - Secure table and column family-level access, with authentication via Kerberos
  • SQL Access - Query data interactively with Cloudera Impala or in batch with Apache Hive
  • Full-text, Faceted Search - Give non-technical users and your applications a familiar yet powerful, interactive search experience
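
To make the scale-out and automatic sharding features above more concrete, the following Python sketch models a table that is divided into contiguous row-key ranges, with a range splitting in two once it grows past a size limit. It is an illustration under assumptions rather than HBase's real algorithm: `ToyShardedTable`, `max_rows_per_region`, the server names, and the "least-loaded server" placement rule are all hypothetical.

```python
from bisect import bisect_right, insort


class ToyShardedTable:
    """Toy model (hypothetical): range-sharded row keys with automatic splits."""

    def __init__(self, servers, max_rows_per_region=4):
        self.servers = list(servers)
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]           # region i covers [start_keys[i], start_keys[i + 1])
        self.regions = [[]]              # sorted row keys held by each region
        self.owners = [self.servers[0]]  # server currently hosting each region

    def _region_index(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        return bisect_right(self.start_keys, row_key) - 1

    def server_for(self, row_key):
        return self.owners[self._region_index(row_key)]

    def put(self, row_key):
        i = self._region_index(row_key)
        rows = self.regions[i]
        if row_key not in rows:
            insort(rows, row_key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Cut the region at its middle key; the upper half becomes a new region
        # placed on whichever server currently hosts the fewest regions.
        rows = self.regions[i]
        mid = len(rows) // 2
        self.start_keys.insert(i + 1, rows[mid])
        self.regions[i:i + 1] = [rows[:mid], rows[mid:]]
        least_loaded = min(self.servers, key=self.owners.count)
        self.owners.insert(i + 1, least_loaded)


table = ToyShardedTable(["rs1", "rs2"], max_rows_per_region=2)
for key in ["apple", "banana", "cherry", "date"]:
    table.put(key)
print(table.start_keys)
print([table.server_for(k) for k in ["apple", "banana", "date"]])
```

Callers of this sketch only ever supply a row key; which range and which server hold it is decided inside the table, which is the sense in which the sharding is transparent to the application.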

For more information on HBase, visit the Apache HBase project site or check out HBase: The Definitive Guide by Cloudera Solutions Architect Lars George.

Get Support for HBase with Cloudera Enterprise

Cloudera Enterprise is the best way to leverage the power of Apache HBase in production environments. When you deploy HBase in an enterprise data hub with Cloudera Enterprise Flex Edition or Data Hub Edition, you can rely on our market-leading technical support for HBase and actively influence the future of the project.
