Setting Up Apache Mahout Using the Command Line

Apache Mahout is a machine-learning tool. It aims to make building intelligent applications easier and faster by providing machine-learning libraries that scale to "reasonably large" datasets.

The main use cases for Mahout are:

  • Recommendation mining, which tries to identify things users will like on the basis of their past behavior (for example, shopping or online-content recommendations)
  • Clustering, which groups similar items (for example, documents on similar topics)
  • Classification, which learns from existing categories what members of each category have in common, and on that basis tries to categorize new items
  • Frequent item-set mining, which takes a set of item-groups (such as terms in a query session, or shopping-cart content) and identifies items that usually appear together

Installing Mahout

You can install Mahout from an RPM or Debian package, or from a tarball. Installing from packages is more convenient than installing the tarball because the packages:
  • Handle dependencies
  • Provide for easy upgrades
  • Automatically install resources to conventional locations

These instructions assume that you will install from packages if possible.

To install Mahout on a RHEL system:

$ sudo yum install mahout

To install Mahout on a SLES system:

$ sudo zypper install mahout

To install Mahout on an Ubuntu or Debian system:

$ sudo apt-get install mahout
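
The three commands above differ only in the package manager. As a rough sketch, a script could pick the matching command for the current host. The function names mahout_install_cmd and detect_pkg_manager are hypothetical helpers written for this illustration and are not part of Mahout or of any package; the sketch only prints the command and does not run it.

```shell
#!/bin/sh

# Hypothetical helper: print the install command for a given package manager.
# Only the three package managers named in this section are handled.
mahout_install_cmd() {
    case "$1" in
        yum)     echo "sudo yum install mahout" ;;
        zypper)  echo "sudo zypper install mahout" ;;
        apt-get) echo "sudo apt-get install mahout" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the first supported package manager found on PATH.
# Returns non-zero if none of them is available.
detect_pkg_manager() {
    for pm in yum zypper apt-get; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}
```

With both functions loaded, running pm=$(detect_pkg_manager) && mahout_install_cmd "$pm" prints the command to use, so you can review it before executing it yourself.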

To access Mahout documentation:

The Mahout docs are bundled in a mahout-doc package, which is installed separately. For example, on an Ubuntu or Debian system:

$ sudo apt-get install mahout-doc

The contents of this package are saved under /usr/share/doc/mahout*.
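
To see which directories the /usr/share/doc/mahout* pattern actually matches on a given host, something like the following sketch may help. The function name list_mahout_docs is a hypothetical helper for this illustration; the optional argument lets you point it at a different base directory.

```shell
#!/bin/sh

# Hypothetical helper: print every entry under the base directory whose name
# starts with "mahout". The base defaults to /usr/share/doc.
list_mahout_docs() {
    base="${1:-/usr/share/doc}"
    for d in "$base"/mahout*; do
        # When the glob matches nothing, the pattern is left as a literal
        # string; skip it so that nothing is printed in that case.
        [ -e "$d" ] || continue
        echo "$d"
    done
}
```

Calling list_mahout_docs with no argument prints nothing if the mahout-doc package has not been installed.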

The Mahout Executable

The Mahout executable is installed in /usr/bin/mahout. Use this executable to run your analysis.
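
Before running an analysis, it can be useful to confirm that the executable is where this section says it is. The following is a minimal sketch; check_mahout is a hypothetical helper name, and the path defaults to /usr/bin/mahout as stated above.

```shell
#!/bin/sh

# Hypothetical helper: succeed if the given path is a regular, executable file.
# The path defaults to /usr/bin/mahout, the location named in this section.
check_mahout() {
    bin="${1:-/usr/bin/mahout}"
    if [ -f "$bin" ] && [ -x "$bin" ]; then
        echo "found: $bin"
    else
        echo "not found or not executable: $bin" >&2
        return 1
    fi
}
```

A non-zero return from check_mahout suggests that the package was not installed, or that it was installed from a tarball to a different location.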

Getting Started with Mahout

To get started with Mahout, follow the instructions in the Apache Mahout Quickstart.

Viewing the Mahout Documentation

For more information about Mahout, see mahout.apache.org.