Using HBase Command-Line Utilities

Besides the HBase Shell, HBase includes several other command-line utilities, which are available in the hbase/bin/ directory of each HBase host. This topic provides basic usage instructions for the most commonly used utilities.

PerformanceEvaluation
LoadTestTool
wal
hfile
hbck
clean

`PerformanceEvaluation`

The PerformanceEvaluation utility runs several preconfigured tests against your cluster and reports the cluster's performance. To run the PerformanceEvaluation tool in CDH 5.1 and higher, use the bin/hbase pe command. In CDH 5.0 and lower, use the command bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation.

For usage instructions, run the command with no arguments. The following output shows the usage instructions for the PerformanceEvaluation tool in CDH 5.7. The available options and commands depend on the CDH version.

$ hbase pe

Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
<OPTIONS> [-D<property=value>]* <command> <nclients>

Options:
nomapred Run multiple clients using threads (rather than use mapreduce)
rows Rows each client runs. Default: One million
size Total size in GiB. Mutually exclusive with --rows. Default: 1.0.
sampleRate Execute test on a sample of total rows. Only supported by randomRead.
Default: 1.0
traceRate Enable HTrace spans. Initiate tracing every N rows. Default: 0
table Alternate table name. Default: 'TestTable'
multiGet If >0, when doing RandomRead, perform multiple gets instead of single
gets.
Default: 0
compress Compression type to use (GZ, LZO, ...). Default: 'NONE'
flushCommits Used to determine if the test should flush the table. Default: false
writeToWAL Set writeToWAL on puts. Default: True
autoFlush Set autoFlush on htable. Default: False
oneCon all the threads share the same connection. Default: False
presplit Create presplit table. Recommended for accurate perf analysis (see
guide). Default: disabled
inmemory Tries to keep the HFiles of the CF inmemory as far as possible. Not
guaranteed that reads are always served from memory. Default: false
usetags Writes tags along with KVs. Use with HFile V3. Default: false
numoftags Specify the no of tags that would be needed. This works only if usetags
is true.
filterAll Helps to filter out all the rows on the server side there by not returning
anything back to the client. Helps to check the server side performance.
Uses FilterAllFilter internally.
latency Set to report operation latencies. Default: False
bloomFilter Bloom filter type, one of [NONE, ROW, ROWCOL]
valueSize Pass value size to use: Default: 1024
valueRandom Set if we should vary value size between 0 and 'valueSize'; set on read
for stats on size: Default: Not set.
valueZipf Set if we should vary value size between 0 and 'valueSize' in zipf form:
Default: Not set.
period Report every 'period' rows: Default: opts.perClientRunRows / 10
multiGet Batch gets together into groups of N. Only supported by randomRead.
Default: disabled
addColumns Adds columns to scans/gets explicitly. Default: true
replicas Enable region replica testing. Defaults: 1.
splitPolicy Specify a custom RegionSplitPolicy for the table.
randomSleep Do a random sleep before each get between 0 and entered value. Defaults: 0
columns Columns to write per row. Default: 1
caching Scan caching to use. Default: 30

Note: -D properties will be applied to the conf used.
For example:
-Dmapreduce.output.fileoutputformat.compress=true
-Dmapreduce.task.timeout=60000

Command:
append Append on each row; clients overlap on keyspace so some concurrent
operations
checkAndDelete CheckAndDelete on each row; clients overlap on keyspace so some concurrent
operations
checkAndMutate CheckAndMutate on each row; clients overlap on keyspace so some concurrent
operations
checkAndPut CheckAndPut on each row; clients overlap on keyspace so some concurrent
operations
filterScan Run scan test using a filter to find a specific row based on it's value
(make sure to use --rows=20)
increment Increment on each row; clients overlap on keyspace so some concurrent
operations
randomRead Run random read test
randomSeekScan Run random seek and scan 100 test
randomWrite Run random write test
scan Run scan test (read every row)
scanRange10 Run random seek scan with both start and stop row (max 10 rows)
scanRange100 Run random seek scan with both start and stop row (max 100 rows)
scanRange1000 Run random seek scan with both start and stop row (max 1000 rows)
scanRange10000 Run random seek scan with both start and stop row (max 10000 rows)
sequentialRead Run sequential read test
sequentialWrite Run sequential write test

Args:
nclients Integer. Required. Total number of clients (and HRegionServers)
running: 1 <= value <= 500
Examples:
To run a single client doing the default 1M sequentialWrites:
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
To run 10 clients doing increments over ten rows:
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred increment 10
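
The options and commands above can be combined as in the following sketch, which writes a small data set with randomWrite and then reads it back with randomRead, using threads instead of MapReduce. The table name PeSmokeTest, the row count, and the DRY_RUN switch are illustrative assumptions, and the --table=<name> form is inferred from the --rows=10 example in the usage output; confirm it against the usage output of your CDH version.

```shell
#!/bin/sh
# Sketch: write a small data set with PerformanceEvaluation, then read it back.
# TABLE, ROWS, CLIENTS, and DRY_RUN are illustrative assumptions, not tool defaults.
TABLE="${TABLE:-PeSmokeTest}"
ROWS="${ROWS:-1000}"
CLIENTS="${CLIENTS:-1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

# Build one PerformanceEvaluation command line.
# $1 is a command from the usage output, such as randomWrite or randomRead.
pe_cmd() {
  echo "hbase pe --nomapred --rows=${ROWS} --table=${TABLE} $1 ${CLIENTS}"
}

for test_name in randomWrite randomRead; do
  cmd="$(pe_cmd "$test_name")"
  echo "$cmd"
  if [ "$DRY_RUN" != "1" ]; then
    # Unquoted on purpose so the string splits into command and arguments.
    $cmd
  fi
done
```

With the defaults, the script only prints the two commands. Setting DRY_RUN=0 runs them on a host where the hbase command is on the PATH.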

`LoadTestTool`

The LoadTestTool utility load-tests your cluster by performing writes, updates, or reads on it. To run the LoadTestTool in CDH 5.1 and higher, use the bin/hbase ltt command. In CDH 5.0 and lower, use the command bin/hbase org.apache.hadoop.hbase.util.LoadTestTool. To print general usage information, use the -h option. The available options and commands depend on the CDH version.

$ bin/hbase ltt -h

Options:
 -batchupdate                    Whether to use batch as opposed to separate updates for every column
                                 in a row
 -bloom <arg>                    Bloom filter type, one of [NONE, ROW, ROWCOL]
 -compression <arg>              Compression type, one of [LZO, GZ, NONE, SNAPPY, LZ4]
 -data_block_encoding <arg>      Encoding algorithm (e.g. prefix compression) to use for data blocks
                                 in the test column family, one of
                                 [NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE].
 -deferredlogflush               Enable deferred log flush.
 -encryption <arg>               Enables transparent encryption on the test table, one of [AES]
 -families <arg>                 The name of the column families to use separated by comma
 -generator <arg>                The class which generates load for the tool. Any args for this class
                                 can be passed as colon separated after class name
 -h,--help                       Show usage
 -in_memory                      Tries to keep the HFiles of the CF inmemory as far as possible.  Not
                                 guaranteed that reads are always served from inmemory
 -init_only                      Initialize the test table only, don't do any loading
 -key_window <arg>               The 'key window' to maintain between reads and writes for concurrent
                                 write/read workload. The default is 0.
 -max_read_errors <arg>          The maximum number of read errors to tolerate before terminating all
                                 reader threads. The default is 10.
 -mob_threshold <arg>            Desired cell size to exceed in bytes that will use the MOB write path
 -multiget_batchsize <arg>       Whether to use multi-gets as opposed to separate gets for every
                                 column in a row
 -multiput                       Whether to use multi-puts as opposed to separate puts for every
                                 column in a row
 -num_keys <arg>                 The number of keys to read/write
 -num_regions_per_server <arg>   Desired number of regions per region server. Defaults to 5.
 -num_tables <arg>               A positive integer number. When a number n is speicfied, load test tool
                                 will load n table parallely. -tn parameter value becomes table name prefix.
                                 Each table name is in format <tn>_1...<tn>_n
 -read <arg>                     <verify_percent>[:<#threads=20>]
 -reader <arg>                   The class for executing the read requests
 -region_replica_id <arg>        Region replica id to do the reads from
 -region_replication <arg>       Desired number of replicas per region
 -regions_per_server <arg>       A positive integer number. When a number n is specified, load test tool
                                 will create the test table with n regions per server
 -skip_init                      Skip the initialization; assume test table already exists
 -start_key <arg>                The first key to read/write (a 0-based index). The default value is 0.
 -tn <arg>                       The name of the table to read or write
 -update <arg>                   <update_percent>[:<#threads=20>][:<#whether to ignore nonce collisions=0>]
 -updater <arg>                  The class for executing the update requests
 -write <arg>                    <avg_cols_per_key>:<avg_data_size>[:<#threads=20>]
 -writer <arg>                   The class for executing the write requests
 -zk <arg>                       ZK quorum as comma-separated host names without port numbers
 -zk_root <arg>                  name of parent znode in zookeeper
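
As a hedged illustration of the -write and -read argument formats listed above, the following sketch composes a command that writes a set of keys and verifies them on read. The table name LttSmokeTest and every numeric value are assumptions chosen for illustration, not recommendations.

```shell
#!/bin/sh
# Sketch: compose a LoadTestTool invocation that writes keys and verifies them.
# All values below are illustrative assumptions.
TABLE="${TABLE:-LttSmokeTest}"
NUM_KEYS="${NUM_KEYS:-10000}"
WRITE_SPEC="${WRITE_SPEC:-3:512:4}"   # <avg_cols_per_key>:<avg_data_size>:<#threads>
READ_SPEC="${READ_SPEC:-100:4}"       # <verify_percent>:<#threads>

# Print the command line instead of running it, so it can be reviewed first.
ltt_cmd() {
  printf 'hbase ltt -tn %s -num_keys %s -write %s -read %s\n' \
    "$TABLE" "$NUM_KEYS" "$WRITE_SPEC" "$READ_SPEC"
}

ltt_cmd
```

The sketch only prints the command. After reviewing the printed line, run it on a host where the hbase command is configured for the target cluster.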

`wal`

The wal utility prints information about the contents of a specified WAL file. To get a list of all WAL files, use the HDFS command hadoop fs -ls -R /hbase/WALs. To run the wal utility, use the bin/hbase wal command. Run it without options to get usage information.

$ hbase wal

usage: WAL <filename...> [-h] [-j] [-p] [-r <arg>] [-s <arg>] [-w <arg>]
 -h,--help             Output help message
 -j,--json             Output JSON
 -p,--printvals        Print values
 -r,--region <arg>     Region to filter by. Pass encoded region name; e.g.
                       '9192caead6a5a20acb4454ffbc79fa14'
 -s,--sequence <arg>   Sequence to filter by. Pass sequence number.
 -w,--row <arg>        Row to filter by. Pass row name.
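
The two steps described in this section, listing the WAL files and then inspecting one of them, might be chained as in the following sketch. The wal_paths helper is a hypothetical name, and its awk filter assumes the usual hadoop fs -ls layout, in which regular files start with - in the permissions column and the path is the last field.

```shell
#!/bin/sh
# Sketch: pick the first WAL file under /hbase/WALs and print it as JSON.

# Read `hadoop fs -ls -R` output on stdin and print only the file paths.
# Directory entries (permissions starting with "d") and header lines are skipped.
wal_paths() {
  awk '$1 ~ /^-/ { print $NF }'
}

if command -v hadoop >/dev/null 2>&1 && command -v hbase >/dev/null 2>&1; then
  first_wal="$(hadoop fs -ls -R /hbase/WALs | wal_paths | head -n 1)"
  if [ -n "$first_wal" ]; then
    # -j requests JSON output, as listed in the usage text above.
    hbase wal -j "$first_wal"
  else
    echo "no WAL files found under /hbase/WALs"
  fi
else
  echo "hadoop or hbase not found on PATH; nothing to do"
fi
```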

`hfile`

The hfile utility prints diagnostic information about a specified HFile, such as block headers or statistics. To get a list of all HFiles, use the HDFS command hadoop fs -ls -R /hbase/data. To run the hfile utility, use the bin/hbase hfile command. Run it without options to get usage information.

$ hbase hfile

usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p]
       [-s] [-v] [-w <arg>]
 -a,--checkfamily         Enable family check
 -b,--printblocks         Print block index meta data
 -e,--printkey            Print keys
 -f,--file <arg>          File to scan. Pass full-path; e.g.
                          hdfs://a:9000/hbase/hbase:meta/12/34
 -h,--printblockheaders   Print block headers for each block.
 -i,--checkMobIntegrity   Print all cells whose mob files are missing
 -k,--checkrow            Enable row order check; looks for out-of-order
                          keys
 -m,--printmeta           Print meta data of file
 -p,--printkv             Print key/value pairs
 -r,--region <arg>        Region to scan. Pass region name; e.g.
                          'hbase:meta,,1'
 -s,--stats               Print statistics
 -v,--verbose             Verbose output; emits file and meta data
                          delimiters
 -w,--seekToRow <arg>     Seek to this row and print all the kvs for this
                          row only
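
A common starting point is to print the metadata and statistics of a single HFile. The following sketch wraps that in a small function; hfile_summary is a hypothetical helper name, and the path is whatever full HFile path you pass in, for example one taken from the hadoop fs -ls -R /hbase/data listing.

```shell
#!/bin/sh
# Sketch: print metadata (-m) and statistics (-s) for one HFile (-f <path>).

hfile_summary() {
  if [ -z "$1" ]; then
    echo "usage: hfile_summary <full-path-to-hfile>" >&2
    return 2
  fi
  hbase hfile -m -s -f "$1"
}

# Only run when a path is supplied on the command line.
if [ "$#" -gt 0 ]; then
  hfile_summary "$1"
fi
```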

`hbck`

The hbck utility checks for, and optionally repairs, inconsistencies in HBase metadata (such as region assignments and the hbase:meta table) and errors in HFiles. To run hbck, use the bin/hbase hbck command. Run it with the -h option to get more usage information.

$ bin/hbase hbck -h

Usage: fsck [opts] {only tables}
where [opts] are:
-help Display help options (this)
-details Display full report of all regions.
-timelag <timeInSeconds> Process only regions that have not experienced any metadata updates in the last
<timeInSeconds> seconds.
-sleepBeforeRerun <timeInSeconds> Sleep this many seconds before checking if the fix worked if run with
-fix
-summary Print only summary of the tables and status.
-metaonly Only check the state of the hbase:meta table.
-sidelineDir <hdfs://> HDFS path to backup existing meta.
-boundaries Verify that regions boundaries are the same between META and store files.
-exclusive Abort if another hbck is exclusive or fixing.
-disableBalancer Disable the load balancer.

Metadata Repair options: (expert features, use with caution!)
-fix Try to fix region assignments. This is for backwards compatiblity
-fixAssignments Try to fix region assignments. Replaces the old -fix
-fixMeta Try to fix meta problems. This assumes HDFS region info is good.
-noHdfsChecking Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't
check/fix any HDFS issue, e.g. hole, orphan, or overlap
-fixHdfsHoles Try to fix region holes in hdfs.
-fixHdfsOrphans Try to fix region dirs with no .regioninfo file in hdfs
-fixTableOrphans Try to fix table dirs with no .tableinfo file in hdfs (online mode only)
-fixHdfsOverlaps Try to fix region overlaps in hdfs.
-fixVersionFile Try to fix missing hbase.version file in hdfs.
-maxMerge <n> When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
-sidelineBigOverlaps When fixing region overlaps, allow to sideline big overlaps
-maxOverlapsToSideline <n> When fixing region overlaps, allow at most <n> regions to sideline per group.
(n=2 by default)
-fixSplitParents Try to force offline split parents to be online.
-ignorePreCheckPermission ignore filesystem permission pre-check
-fixReferenceFiles Try to offline lingering reference store files
-fixEmptyMetaCells Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows)

Datafile Repair options: (expert features, use with caution!)
-checkCorruptHFiles Check all Hfiles by opening them to make sure they are valid
-sidelineCorruptHFiles Quarantine corrupted HFiles. implies -checkCorruptHFiles

Metadata Repair shortcuts
-repair Shortcut for -fixAssignments -fixMeta -fixHdfsHoles
-fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile
-sidelineBigOverlaps -fixReferenceFiles -fixTableLocks
-fixOrphanedTableZnodes
-repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles

Table lock options
-fixTableLocks Deletes table locks held for a long time (hbase.table.lock.expire.ms,
10min by default)

Table Znode options
-fixOrphanedTableZnodes Set table state in ZNode to disabled if table does not exists

Replication options
-fixReplication Deletes replication queues for removed peers
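
Because the repair options are marked as expert features, a cautious workflow is to run a read-only check first and keep the report for review. The following sketch does only that: it passes -details and optional table names, and no -fix or -repair option. The hbck_report helper and the REPORT_DIR variable are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch: run a read-only hbck check and save the full report to a file.

# Arguments, if any, are table names to restrict the check to.
# Prints the report path and returns the exit status of the hbase command.
hbck_report() {
  hbck_out="${REPORT_DIR:-/tmp}/hbck-$(date +%Y%m%d-%H%M%S).txt"
  hbase hbck -details "$@" > "$hbck_out" 2>&1
  hbck_status=$?
  echo "$hbck_out"
  return "$hbck_status"
}

if command -v hbase >/dev/null 2>&1; then
  hbck_report "$@"
else
  echo "hbase not found on PATH; nothing to do"
fi
```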

`clean`

After you have finished using a test or proof-of-concept cluster, the hbase clean utility can remove all HBase-related data from ZooKeeper and HDFS. To run the hbase clean utility, use the bin/hbase clean command. Run it with no options for usage information.

$ bin/hbase clean

Usage: hbase clean (--cleanZk|--cleanHdfs|--cleanAll)
Options:
        --cleanZk   cleans hbase related data from zookeeper.
        --cleanHdfs cleans hbase related data from hdfs.
        --cleanAll  cleans hbase related data from both zookeeper and hdfs.
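
Because these options delete data, a wrapper can make the choice explicit. The following sketch maps a short target name to one of the three flags from the usage output and refuses to run without a confirmation argument. The clean_flag helper, the target names, and the --yes argument are hypothetical conventions of this sketch, not part of hbase clean.

```shell
#!/bin/sh
# Sketch: guard `hbase clean` behind an explicit target and confirmation.

# Map a short target name to the matching flag; fail for anything else.
clean_flag() {
  case "$1" in
    zk)   echo "--cleanZk" ;;
    hdfs) echo "--cleanHdfs" ;;
    all)  echo "--cleanAll" ;;
    *)    return 1 ;;
  esac
}

# Require a second argument so the command cannot run by accident.
if [ "$#" -eq 2 ] && [ "$2" = "--yes" ]; then
  flag="$(clean_flag "$1")" || { echo "unknown target: $1" >&2; exit 2; }
  hbase clean "$flag"
else
  echo "usage: $0 zk|hdfs|all --yes"
fi
```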
