Configuring HBase in Pseudo-Distributed Mode
Pseudo-distributed mode differs from standalone mode in that each of the component processes runs in a separate JVM. It differs from distributed mode in that all of the processes run on the same server rather than on multiple servers in a cluster. This section also assumes that you want to store your HBase data in HDFS rather than on the local filesystem.
Modifying the HBase Configuration
To enable pseudo-distributed mode, you must first make some configuration changes. Open /etc/hbase/conf/hbase-site.xml in your editor of choice and insert the following XML properties between the <configuration> and </configuration> tags. The hbase.cluster.distributed property directs HBase to start each process in a separate JVM. The hbase.rootdir property directs HBase to store its data in HDFS rather than on the local filesystem. Be sure to replace myhost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file); you may also need to change the port number from the default (8020).
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://myhost:8020/hbase</value>
</property>
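If you script this step, the hbase.rootdir value can be derived from the NameNode URI instead of editing myhost by hand. The following is a minimal sketch, assuming you pass in the value of fs.defaultFS (or fs.default.name) yourself; the function name hbase_pseudo_props is hypothetical and only prints the fragment, it does not edit hbase-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two pseudo-distributed properties for
# hbase-site.xml, given the NameNode URI from core-site.xml
# (fs.defaultFS or fs.default.name), for example hdfs://myhost:8020
hbase_pseudo_props() {
  local nn_uri="${1%/}"   # drop one trailing slash so we don't emit "//hbase"
  case "$nn_uri" in
    hdfs://*) ;;
    *) echo "expected an hdfs:// URI, got: $1" >&2; return 1 ;;
  esac
  cat <<EOF
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>${nn_uri}/hbase</value>
</property>
EOF
}
```

For example, hbase_pseudo_props hdfs://myhost:8020 prints the fragment to paste between the <configuration> tags. Printing rather than editing the file in place keeps the change easy to review.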
Creating the /hbase Directory in HDFS
Before starting the HBase Master, you need to create the /hbase directory in HDFS. The HBase Master runs as hbase:hbase (user hbase, group hbase), so it does not have the permissions required to create a top-level directory.
To create the /hbase directory in HDFS:
$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase
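Depending on your Hadoop version, -mkdir may report an error if /hbase already exists, which gets in the way when setup scripts are re-run. A minimal sketch of an idempotent version, assuming the same hdfs superuser account as above and the hadoop fs -test -d check; the function name ensure_hbase_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Create an HDFS directory as the hdfs superuser (only if it is missing)
# and hand ownership to the hbase user. Safe to re-run.
ensure_hbase_dir() {
  local dir="${1:-/hbase}"
  # "hadoop fs -test -d" returns 0 when the path exists and is a directory
  if ! sudo -u hdfs hadoop fs -test -d "$dir"; then
    sudo -u hdfs hadoop fs -mkdir "$dir" || return 1
  fi
  # Repeating the chown is harmless, so always apply it
  sudo -u hdfs hadoop fs -chown hbase "$dir"
}
```

Running ensure_hbase_dir /hbase has the same effect as the two commands above the first time, and only re-applies the ownership on later runs.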
Enabling Servers for Pseudo-distributed Operation
After you have configured HBase, you must enable the various servers that make up a distributed HBase cluster. HBase uses three required types of servers: the ZooKeeper Server, the HBase Master, and the RegionServer.
Installing and Starting ZooKeeper Server
HBase uses ZooKeeper Server as a highly available, central location for cluster management. For example, it allows clients to locate the servers, and ensures that only one master is active at a time. For a small cluster, running a ZooKeeper node collocated with the NameNode is recommended. For larger clusters, contact Cloudera Support for configuration help.
Install and start the ZooKeeper Server in standalone mode by running the commands shown in the sections Installing the ZooKeeper Server Package and Starting ZooKeeper on a Single Server.
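Before starting the Master, you can confirm that ZooKeeper is answering. A minimal sketch, assuming the default client port 2181, a netcat (nc) binary on the PATH, and that the ruok four-letter-word command is enabled on your ZooKeeper version (newer releases require it to be whitelisted); the helper name zk_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Send ZooKeeper's "ruok" four-letter command; a healthy server replies "imok".
zk_ok() {
  local host="${1:-localhost}" port="${2:-2181}"
  local reply
  reply=$(printf 'ruok' | nc "$host" "$port" 2>/dev/null)
  [ "$reply" = "imok" ]
}
```

Usage: zk_ok && echo "ZooKeeper is up". The function returns non-zero for any other reply, including no reply at all.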
Starting the HBase Master
After ZooKeeper is running, you can start the HBase Master:
$ sudo service hbase-master start
Starting an HBase RegionServer
The RegionServer is the HBase process that actually hosts data and processes requests. The RegionServer typically runs on all HBase nodes except the node running the HBase Master.
To enable the HBase RegionServer on RHEL-compatible systems:
$ sudo yum install hbase-regionserver
To enable the HBase RegionServer on Ubuntu and Debian systems:
$ sudo apt-get install hbase-regionserver
To enable the HBase RegionServer on SLES systems:
$ sudo zypper install hbase-regionserver
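If one script has to run on all three platform families, the package manager can be chosen from the distribution ID. A minimal sketch, assuming /etc/os-release is present and that its ID values map to the families listed above (rhel, centos and ol for RHEL-compatible; ubuntu and debian; sles); the helper names install_cmd and distro_id are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the install command this section uses for a
# given package on a given distribution ID (the ID field of /etc/os-release).
install_cmd() {
  local distro="$1" pkg="$2"
  case "$distro" in
    rhel|centos|ol) echo "sudo yum install $pkg" ;;
    ubuntu|debian)  echo "sudo apt-get install $pkg" ;;
    sles)           echo "sudo zypper install $pkg" ;;
    *) echo "unsupported distribution: $distro" >&2; return 1 ;;
  esac
}

# Read ID from /etc/os-release in a subshell so its variables don't leak
# into the caller; prints "unknown" if the file is missing or has no ID.
distro_id() {
  ( [ -r /etc/os-release ] && . /etc/os-release && echo "${ID:-unknown}" ) || echo unknown
}
```

For example, install_cmd "$(distro_id)" hbase-regionserver prints the matching command. The sketch prints the command instead of running it so you can review it first.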
To start the RegionServer:
$ sudo service hbase-regionserver start
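Because each service depends on the previous one, the daemons have to come up in order: ZooKeeper, then the Master, then the RegionServer. A minimal sketch that stops at the first service that fails, assuming the ZooKeeper init script is named zookeeper-server (check the name your ZooKeeper package installed); the function name start_hbase_pseudo is hypothetical.

```shell
#!/usr/bin/env bash
# Start the pseudo-distributed HBase daemons in dependency order and stop
# at the first service that fails to start.
# Assumption: the ZooKeeper init script is called "zookeeper-server".
start_hbase_pseudo() {
  local svc
  for svc in zookeeper-server hbase-master hbase-regionserver; do
    echo "Starting $svc..."
    if ! sudo service "$svc" start; then
      echo "failed to start $svc; not starting the remaining services" >&2
      return 1
    fi
  done
}
```

Stopping early avoids starting a RegionServer that has no Master or ZooKeeper to register with.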
Verifying the Pseudo-Distributed Operation
After you have started ZooKeeper, the Master, and a RegionServer, the pseudo-distributed cluster should be up and running. You can verify that each of the daemons is running by using the jps tool from the Oracle JDK. If you are running a pseudo-distributed HDFS installation and a pseudo-distributed HBase installation on one machine, jps shows output similar to the following (the process IDs will differ):
$ sudo jps
32694 Jps
30674 HRegionServer
29496 HMaster
28781 DataNode
28422 NameNode
30348 QuorumPeerMain
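Checking that listing by eye is easy to get wrong in scripts. A minimal sketch that reads jps output on standard input and reports any of the five daemons from the listing above that are missing; the function name check_daemons is hypothetical.

```shell
#!/usr/bin/env bash
# Check jps output (read from stdin) for the daemons of a pseudo-distributed
# HDFS + HBase install; prints any that are missing and returns non-zero.
check_daemons() {
  local listing missing="" d
  listing=$(cat)
  for d in NameNode DataNode QuorumPeerMain HMaster HRegionServer; do
    # jps prints "<pid> <MainClass>", so compare against the second field
    if ! printf '%s\n' "$listing" | awk -v want="$d" '$2 == want {found=1} END {exit !found}'; then
      missing="$missing $d"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}
```

Usage: sudo jps | check_daemons. Matching the whole second field keeps, for example, a SecondaryNameNode line from being counted as the NameNode.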
You should also be able to open http://localhost:60010 in a browser and verify that the local RegionServer has registered with the Master.
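The browser check can also be scripted, at least the part that confirms the Master web UI is answering. A minimal sketch, assuming curl is installed and the UI is on port 60010 as above; the function name master_ui_up is hypothetical, and it does not confirm that the RegionServer has registered, so the page itself still needs a look.

```shell
#!/usr/bin/env bash
# Report whether the HBase Master web UI answers with HTTP 200.
# Assumptions: curl is installed and the UI listens on port 60010.
master_ui_up() {
  local url="${1:-http://localhost:60010/}" code
  # -s: quiet, -L: follow any redirect, -o /dev/null: discard the body,
  # -w: print only the final HTTP status code ("000" if no connection)
  code=$(curl -s -L -o /dev/null -w '%{http_code}' "$url")
  [ "$code" = "200" ]
}
```

Usage: master_ui_up && echo "Master UI is reachable".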
Installing and Starting the HBase Thrift Server
The HBase Thrift Server is an alternative gateway for accessing HBase. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and performs better than REST in many situations. The Thrift Server can run on the same hosts as the RegionServers, but should not run on the same host as the NameNode or the JobTracker. For more information about Thrift, visit http://thrift.apache.org/.
To enable the HBase Thrift Server on RHEL-compatible systems:
$ sudo yum install hbase-thrift
To enable the HBase Thrift Server on Ubuntu and Debian systems:
$ sudo apt-get install hbase-thrift
To enable the HBase Thrift Server on SLES systems:
$ sudo zypper install hbase-thrift
To start the Thrift server:
$ sudo service hbase-thrift start
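Thrift is not an HTTP protocol by default, so a quick way to confirm the server came up is a plain TCP probe. A minimal sketch, assuming the Thrift server listens on its default port 9090 and that the script runs under bash (the /dev/tcp path is a bash-only feature); the function name port_open is hypothetical.

```shell
#!/usr/bin/env bash
# Probe a TCP port using bash's /dev/tcp redirection (bash-only feature).
# Assumption: the HBase Thrift server listens on its default port, 9090.
port_open() {
  local host="${1:-localhost}" port="${2:-9090}"
  # Opening fd 3 succeeds only if a TCP connection can be made; the subshell
  # closes it again on exit. An unreachable remote host may block until the
  # operating system's connect timeout expires.
  ( exec 3<>"/dev/tcp/$host/$port" ) 2>/dev/null
}
```

Usage: port_open localhost 9090 && echo "Thrift server is listening". This only shows that something accepts connections on the port, not that it is the HBase Thrift server.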