This is the documentation for Cloudera Search CDH 5 Beta 2 and 1.2.0 for CDH 4.
Documentation for other versions is available at Cloudera Documentation.

Cloudera Search Requirements

This topic describes Cloudera Search requirements, organized into categories.

CDH and Cloudera Manager Requirements

  • Cloudera Search for CDH 5 is included with and supported on CDH 5 beta.
  • Cloudera Search 1.2.0 requires CDH 4.6 and supports Cloudera Manager 4.8. For more information on CDH 4, see CDH 4 Documentation. If you do not want to upgrade to CDH 4.6 or Cloudera Manager 4.8, do not use Search 1.2.0.

Operating Systems

Cloudera Search provides packages for RHEL, SLES, Ubuntu, and Debian systems as described below. All packages are 64-bit.

Operating System                                   Version
-------------------------------------------------  --------------------------------------------------------
Red Hat compatible
  Red Hat Enterprise Linux (RHEL)                  5.7, 6.2, 6.4
  CentOS                                           5.7, 6.2, 6.4
  Oracle Linux with Unbreakable Enterprise Kernel  5.6, 6.4
SLES
  SUSE Linux Enterprise Server (SLES)              11 with Service Pack 1 or later
Ubuntu/Debian
  Ubuntu                                           Search 1.x only: Lucid (10.04) - Long-Term Support (LTS)
                                                   Precise (12.04) - Long-Term Support (LTS)
  Debian                                           Search 1.x only: Squeeze (6.03)
                                                   Cloudera Search for CDH 5 only: Wheezy (7.0)

  Note:
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera's packages, you can also download source tarballs from Downloads.

JDK

  • Cloudera Search for CDH 5 requires Oracle JDK 1.7. See Java Development Kit Installation for JDK downloads.
  • Cloudera Search 1.2 works with Oracle JDK 1.6 and Oracle JDK 1.7:
    • Cloudera Search works with JDK 1.6. Search is certified with 1.6.0_31, but any later maintenance (_xx) release should be acceptable for production, following Oracle's release notes and restrictions. The minimum supported version is 1.6.0_8.
    • Cloudera Search works with JDK 1.7. Search is certified with 1.7.0_15, but any later maintenance (_xx) release should be acceptable for production, following Oracle's release notes and restrictions.
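
The Search 1.2 version rules above can be sketched as a small check. This is only an illustration: the function names are hypothetical, and it assumes version strings in the form 1.6.0_31. Because the text states a minimum only for JDK 1.6, the sketch accepts any 1.7 release.

```python
import re

# Minimum JDK 1.6 release stated in the text above (1.6.0_8).
MIN_JDK6 = (1, 6, 0, 8)


def parse_jdk_version(text):
    """Turn a string such as '1.6.0_31' into the tuple (1, 6, 0, 31)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized JDK version string: {text!r}")
    major, minor, patch, update = match.groups()
    # A missing maintenance suffix (for example '1.7.0') is treated as update 0.
    return (int(major), int(minor), int(patch), int(update or 0))


def is_supported_for_search_1_2(text):
    """Return True if the version falls in a JDK family named for Search 1.2."""
    version = parse_jdk_version(text)
    if version[:2] == (1, 6):
        return version >= MIN_JDK6
    # Assumption: no minimum is given for JDK 1.7, so any 1.7 release passes.
    return version[:2] == (1, 7)
```

For example, `is_supported_for_search_1_2("1.6.0_31")` returns `True`, while `"1.6.0_5"` is rejected because it is older than 1.6.0_8.
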
  Note:

Cloudera Search supports running applications compiled with Oracle JDK 7 (JDK 1.7) with the following restrictions:

  • All CDH components must be running the same major version (that is, all deployed on JDK 6 or all deployed on JDK 7). For example, you cannot run Hadoop on JDK 6 while running Sqoop on JDK 7.
  • All nodes in the cluster must be running the same major JDK version: Cloudera does not support mixed environments (some nodes on JDK 6 and others on JDK 7).

To make sure everything works correctly, symbolically link the directory where you install the JDK to /usr/java/default on Red Hat and similar systems, or to /usr/lib/jvm/default-java on Ubuntu and Debian systems.
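
The symbolic link described above could be created with a helper like the following. This is a minimal sketch, not an official installation step: `link_default_jdk` is a hypothetical name, and the JDK installation directory depends on how the JDK was installed. The helper replaces an existing symbolic link but refuses to overwrite a real file or directory.

```python
import os


def link_default_jdk(jdk_home, link_path):
    """Point link_path (for example /usr/java/default) at jdk_home."""
    if not os.path.isdir(jdk_home):
        raise FileNotFoundError(f"JDK directory not found: {jdk_home}")
    if os.path.islink(link_path):
        # An older link is safe to replace.
        os.remove(link_path)
    elif os.path.exists(link_path):
        # Never clobber a real file or directory.
        raise FileExistsError(f"{link_path} exists and is not a symbolic link")
    os.symlink(jdk_home, link_path)
    # Return the resolved target so the caller can confirm the result.
    return os.path.realpath(link_path)
```

On a Red Hat compatible system, a call might look like `link_default_jdk("/usr/java/jdk1.7.0_15", "/usr/java/default")`, where the first path is an assumed install location. On Ubuntu and Debian the link path would be `/usr/lib/jvm/default-java`. Writing under `/usr` normally requires root privileges.
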

Ports Used by Cloudera Search

Cloudera Search uses the ports listed in the table below. Before you deploy Cloudera Search, make sure these ports are open on each system. The table reflects the current default settings, which are defined in the Solr defaults file located in /etc/default/solr.

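Component        Service             Port  Protocol  Access Requirement  Comment
---------------  ------------------  ----  --------  ------------------  ----------------------------------------------------------------------
Cloudera Search  Solr search/update  8983  http      External            All Solr-specific actions, update/query. Defined in /etc/default/solr.
CDH              Cloudera CDH admin  8984  http      Internal            CDH Administrative use.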
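
Once the services are running, a quick TCP probe can confirm that the default ports in the table accept connections. The sketch below assumes the default port numbers; `port_is_reachable` and `check_search_ports` are hypothetical helper names. A failed probe can mean either a blocked port or a service that is not yet started, so it does not replace checking firewall rules before deployment.

```python
import socket

# Default ports from the table above; adjust if /etc/default/solr was changed.
SEARCH_PORTS = {8983: "Solr search/update", 8984: "Cloudera CDH admin"}


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_search_ports(host, ports=SEARCH_PORTS):
    """Map each port number to whether it accepted a connection on host."""
    return {port: port_is_reachable(host, port) for port in ports}
```

Calling `check_search_ports("search-node-1.example.com")` (an example hostname) returns a dictionary keyed by port number, with `True` for each port that accepted a connection.
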

Memory

CDH initially deploys Solr with a JVM size of 1 GB. In the context of Search, 1 GB is a small value. Starting with this small value simplifies JVM deployment, but the value is insufficient for most actual use cases. Some of the factors to consider when determining an acceptable value for production usage are:

  • The more searchable material you have, the more memory you need. All things being equal, 10 TB of searchable data requires more memory than 1 TB of searchable data.
  • What you index matters. Indexing all fields in a collection of logs, emails, or Wikipedia entries requires more memory than indexing only the Date Created field.
  • The level of performance you require matters. If the system must be stable and respond quickly, more memory may help. If slow responses are acceptable, you may be able to use less memory.

The only way to ensure an appropriate amount of memory is to consider your requirements and experiment in your environment. In general:

  • 4 GB may be acceptable for smaller loads or for evaluation.
  • 12 GB is sufficient for some production environments.
  • 48 GB is sufficient for most situations.