Configuring Mountable HDFS

CDH 5 includes a FUSE (Filesystem in Userspace) interface into HDFS. The hadoop-hdfs-fuse package enables you to use your HDFS cluster as if it were a traditional filesystem on Linux. Proceed as follows.

Before you start: You must have a working HDFS cluster and know the hostname and port that your NameNode exposes.

To install hadoop-hdfs-fuse on Red Hat-compatible systems:

$ sudo yum install hadoop-hdfs-fuse

To install hadoop-hdfs-fuse on Ubuntu systems:

$ sudo apt-get install hadoop-hdfs-fuse

To install hadoop-hdfs-fuse on SLES systems:

$ sudo zypper install hadoop-hdfs-fuse

You now have everything you need to begin mounting HDFS on Linux.

To set up and test your mount point in a non-HA installation:

$ mkdir -p <mount_point>
$ hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>

where namenode_port is the NameNode's RPC port, as configured in dfs.namenode.servicerpc-address.

To set up and test your mount point in an HA installation:

$ mkdir -p <mount_point>
$ hadoop-fuse-dfs dfs://<nameservice_id> <mount_point>

where nameservice_id is the value of fs.defaultFS. In this case the port defined for dfs.namenode.rpc-address.[nameservice ID].[name node ID] is used automatically. See Enabling HDFS HA for more information about these properties.
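
The mapping from fs.defaultFS to the mount URI can be sketched as a small shell helper. This is a minimal sketch, assuming fs.defaultFS has the usual hdfs://<nameservice_id> form; the function name to_fuse_uri and the value nameservice1 are hypothetical and are not part of the hadoop-hdfs-fuse package.

```shell
# Convert an fs.defaultFS value (assumed to look like hdfs://<nameservice_id>)
# into the dfs:// URI form passed to hadoop-fuse-dfs.
to_fuse_uri() {
  case "$1" in
    hdfs://*)
      authority=${1#hdfs://}      # drop the scheme
      authority=${authority%/}    # drop one trailing slash, if present
      printf 'dfs://%s\n' "$authority"
      ;;
    *)
      echo "expected an hdfs:// URI, got: $1" >&2
      return 1
      ;;
  esac
}

# On a configured cluster node the value could be read with:
#   hdfs getconf -confKey fs.defaultFS
to_fuse_uri "hdfs://nameservice1"   # nameservice1 is a made-up nameservice ID
```

The printed URI is what you would pass as the first argument to hadoop-fuse-dfs in place of dfs://<nameservice_id>.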

You can now run operations on your mount point as if it were a local filesystem. Press Ctrl+C to end the fuse-dfs program, then unmount the partition if it is still mounted.

To clean up your test:

$ umount <mount_point>
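
To decide whether the umount step is still needed, you can check the kernel's mount table first. The following is a minimal sketch, assuming a Linux system where /proc/mounts is available; the helper name is_mounted and the path /mnt/hdfs are hypothetical.

```shell
# Return 0 if the given directory is listed as a mount point.
# The optional second argument names the mount table to read;
# it defaults to /proc/mounts and exists mainly to make testing easy.
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

mount_point=/mnt/hdfs   # hypothetical mount point
if is_mounted "$mount_point"; then
  echo "$mount_point is still mounted; run: umount $mount_point"
else
  echo "$mount_point is not mounted"
fi
```

The check compares the second field of each mount table entry with the directory you pass in, so give it the same path you used with mkdir and hadoop-fuse-dfs.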

You can now add a permanent HDFS mount which persists through reboots.

To add a system mount:

  1. Open /etc/fstab and add a line similar to this at the bottom:
    hadoop-fuse-dfs#dfs://<name_node_hostname>:<namenode_port> <mount_point> fuse allow_other,usetrash,rw 2 0

    For example:

    hadoop-fuse-dfs#dfs://localhost:8020 /mnt/hdfs fuse allow_other,usetrash,rw 2 0
  2. Test to make sure everything is working properly:
    $ mount <mount_point>

You can now use standard commands such as ls on the mount point, as if it were a normal system disk.
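
If you manage several hosts, the fstab entry shown above can be generated rather than typed by hand. This is a minimal sketch that only prints the line in the format given in step 1; the function name hdfs_fstab_line is hypothetical, and the option list is copied from the example rather than derived from your cluster.

```shell
# Print an /etc/fstab entry for a FUSE-mounted HDFS.
# Arguments: NameNode hostname, NameNode port, local mount point.
hdfs_fstab_line() {
  if [ "$#" -ne 3 ]; then
    echo "usage: hdfs_fstab_line <host> <port> <mount_point>" >&2
    return 1
  fi
  printf 'hadoop-fuse-dfs#dfs://%s:%s %s fuse allow_other,usetrash,rw 2 0\n' "$1" "$2" "$3"
}

# Same values as the example above.
hdfs_fstab_line localhost 8020 /mnt/hdfs
```

Review the printed line before adding it to /etc/fstab, for example by piping it to sudo tee -a /etc/fstab.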

For more information, see the help for hadoop-fuse-dfs:

$ hadoop-fuse-dfs --help

Optimizing Mountable HDFS

  • We recommend that you use the -obig_writes option on kernels later than 2.6.26. This option improves write performance.
  • By default, the CDH 5 package installation creates the /etc/default/hadoop-fuse file with a maximum heap size of 128 MB. You might need to change the JVM minimum and maximum heap size for better performance. For example:
    export LIBHDFS_OPTS="-Xms64m -Xmx256m"

    Be careful not to set the minimum to a higher value than the maximum.
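
The minimum-versus-maximum rule can be checked before you save the file. The following is a minimal sketch, assuming the sizes use the common JVM suffixes k, m, or g (or no suffix for bytes); the helper names to_kb and heap_opts_ok are hypothetical.

```shell
# Convert a JVM size such as 64m, 1g, or 262144k to kilobytes.
to_kb() {
  num=${1%[kKmMgG]}                      # strip the unit suffix, if any
  case "$1" in
    *[kK]) echo "$num" ;;
    *[mM]) echo $((num * 1024)) ;;
    *[gG]) echo $((num * 1024 * 1024)) ;;
    *)     echo $((num / 1024)) ;;       # no suffix means bytes
  esac
}

# Return 0 if -Xms does not exceed -Xmx in the given option string.
heap_opts_ok() {
  xms=$(printf '%s\n' "$1" | sed -n 's/.*-Xms\([0-9]*[kKmMgG]\{0,1\}\).*/\1/p')
  xmx=$(printf '%s\n' "$1" | sed -n 's/.*-Xmx\([0-9]*[kKmMgG]\{0,1\}\).*/\1/p')
  # Nothing to compare if either flag is missing.
  [ -n "$xms" ] && [ -n "$xmx" ] || return 0
  [ "$(to_kb "$xms")" -le "$(to_kb "$xmx")" ]
}

LIBHDFS_OPTS="-Xms64m -Xmx256m"
if heap_opts_ok "$LIBHDFS_OPTS"; then
  echo "heap settings are consistent"
else
  echo "warning: -Xms is larger than -Xmx"
fi
```

With the example value above, the minimum (64 MB) is below the maximum (256 MB), so the check passes.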