This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

Configuring Oozie

Configuring which Hadoop Version to Use

The Oozie client does not interact directly with Hadoop MapReduce, and so it does not require any MapReduce configuration.

The Oozie server can work with either MRv1 or YARN. It cannot work with both simultaneously.

The MapReduce version that the Oozie server works with is determined by the CATALINA_BASE variable in the /etc/oozie/conf/oozie-env.sh file. By default, CATALINA_BASE is set to /usr/lib/oozie/oozie-server-0.20, which configures the Oozie server to work with MRv1.

To configure the Oozie server to work with YARN instead, set CATALINA_BASE to /usr/lib/oozie/oozie-server.

CAUTION:

Do this while the Oozie server is not running.

If you change the MapReduce version while the Oozie server still has running workflows that use the version you are changing from (for example, MRv1), all of those jobs will fail.
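
The switch described above can be scripted. The following is a minimal sketch, not a Cloudera-provided tool: set_catalina_base is a hypothetical helper name, and it assumes that oozie-env.sh assigns the variable on a single line of the form "export CATALINA_BASE=..." or "CATALINA_BASE=...". Stop the Oozie server before using it.

```shell
# Hypothetical helper: point CATALINA_BASE in an oozie-env.sh file at a new directory.
# Assumption: the variable is assigned on one line, with or without "export".
set_catalina_base() {
    env_file=$1
    new_base=$2
    if grep -Eq '^[[:space:]]*(export[[:space:]]+)?CATALINA_BASE=' "$env_file"; then
        # Rewrite the existing assignment; the previous file is kept as <file>.bak.
        sed -E -i.bak "s|^([[:space:]]*(export[[:space:]]+)?CATALINA_BASE=).*|\1${new_base}|" "$env_file"
    else
        # No assignment found: append one.
        printf 'export CATALINA_BASE=%s\n' "$new_base" >> "$env_file"
    fi
}

# Example (needs write access to the file; run while the Oozie server is stopped):
#   set_catalina_base /etc/oozie/conf/oozie-env.sh /usr/lib/oozie/oozie-server
```

Passing /usr/lib/oozie/oozie-server-0.20 as the second argument switches back to MRv1.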

Configuring Oozie after Upgrading from CDH3

  Note:

If you are installing Oozie for the first time, skip this section and proceed with Configuring Oozie after a New Installation.

Step 1: Update Configuration Files

  1. Edit the new Oozie CDH4 oozie-site.xml, and set all customizable properties to the values you set in the CDH3 oozie-site.xml:
      Important:
    • DO NOT copy over the CDH3 configuration files into the CDH4 configuration directory.
    • The configuration property names for the database settings have changed between Oozie CDH3 and Oozie CDH4: the prefix for these names has changed from oozie.service.StoreService.* to oozie.service.JPAService.*. Make sure you use the new prefix.
  2. If necessary do the same for the oozie-log4j.properties, oozie-env.sh and the adminusers.txt files.
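
Because the database property prefix changed, it can help to list the old property names with the new prefix applied before you edit the CDH4 file by hand. A minimal sketch, assuming each property name sits on one line inside a <name> element; renamed_db_properties is a hypothetical helper name. It only prints names and does not modify either file, so check each printed name against the CDH4 oozie-site.xml before relying on it.

```shell
# Hypothetical helper: print the database property names found in a CDH3
# oozie-site.xml, with the CDH4 prefix (JPAService) substituted for the
# CDH3 prefix (StoreService). Read-only.
renamed_db_properties() {
    # $1: path to the old (CDH3) oozie-site.xml
    sed -n 's|.*<name>[[:space:]]*oozie\.service\.StoreService\.\([^<[:space:]]*\)[[:space:]]*</name>.*|oozie.service.JPAService.\1|p' "$1"
}

# Example:
#   renamed_db_properties /path/to/cdh3/oozie-site.xml
```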

Step 2: Upgrade the Database

  Important:
  • Do not proceed before you have edited the configuration files as instructed in Step 1.
  • Before running the database upgrade tool, copy or symlink the MySQL JDBC driver JAR into the /var/lib/oozie/ directory.

Oozie CDH4 provides a command-line tool to perform the database schema and data upgrade that is required when you upgrade Oozie from CDH3 to CDH4. The tool uses Oozie configuration files to connect to the database and perform the upgrade.

The database upgrade tool works in two modes: it can do the upgrade in the database or it can produce an SQL script that a database administrator can run manually. If you use the tool to perform the upgrade, you must do it as a database user who has permissions to run DDL operations in the Oozie database.

To run the Oozie database upgrade tool against the database:

  Important:

This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -run

You will see output such as this:

Validate DB Connection.
DONE
Check DB schema exists
DONE
Check OOZIE_SYS table does not exist
DONE
Verify there are not active Workflow Jobs
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE
Upgrade COORD_JOBS new columns default values.
DONE
Upgrade COORD_JOBS & COORD_ACTIONS status values.
DONE
Table 'WF_ACTIONS' column 'execution_path', length changed to 1024
DONE

Oozie DB has been upgraded to Oozie version '3.3.2-cdh4.4.0'

The SQL commands have been written to: /tmp/ooziedb-5737263881793872034.sql
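
If you wrap the database tool in your own scripts, the requirement to run as the oozie Unix user can be checked up front. A minimal sketch; require_user is a hypothetical helper name and is not part of Oozie.

```shell
# Hypothetical guard for wrapper scripts around ooziedb.sh.
# Succeeds only if the current Unix user matches the expected one.
require_user() {
    expected=$1
    current=$(id -un)
    if [ "$current" != "$expected" ]; then
        echo "must run as '$expected' (current user: '$current')" >&2
        return 1
    fi
}

# Example use in a wrapper script:
#   require_user oozie && /usr/lib/oozie/bin/ooziedb.sh upgrade -run
```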

To create the upgrade script:

  Important:

This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile <SCRIPT>

For example:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile oozie-upgrade.sql

You should see output such as the following:

Validate DB Connection.
DONE
Check DB schema exists
DONE
Check OOZIE_SYS table does not exist
DONE
Verify there are not active Workflow Jobs
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE
Upgrade COORD_JOBS new columns default values.
DONE
Upgrade COORD_JOBS & COORD_ACTIONS status values.
DONE
Table 'WF_ACTIONS' column 'execution_path', length changed to 1024
DONE

Oozie DB has been upgraded to Oozie version '3.3.2-cdh4.4.0'

The SQL commands have been written to: oozie-upgrade.sql

WARN: The SQL commands have NOT been executed, you must use the '-run' option

  Important:

If you used the -sqlfile option instead of -run, the Oozie database schema has not been upgraded. You must run the oozie-upgrade.sql script against your database.
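
How the generated script is applied depends on the database. The following is a minimal sketch, assuming a database and a database user that are both named oozie and that the stock mysql or psql client is installed. print_apply_command is a hypothetical helper that only prints a command line, so that a database administrator can review it before running it.

```shell
# Hypothetical helper: print (not run) a client command that applies a SQL script.
# Assumption: database name and database user are both "oozie".
print_apply_command() {
    db_type=$1
    sql_file=$2
    case "$db_type" in
        mysql)
            # mysql reads statements from standard input; -p prompts for the password.
            echo "mysql -u oozie -p oozie < $sql_file" ;;
        postgresql)
            # psql -f executes the statements in the given file.
            echo "psql -U oozie -d oozie -f $sql_file" ;;
        *)
            echo "unsupported database type: $db_type" >&2
            return 1 ;;
    esac
}

# Example:
#   print_apply_command mysql oozie-upgrade.sql
```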

Step 3: Upgrade the Oozie Sharelib

  Important:

This step is required; CDH4 Oozie does not work with CDH3 shared libraries.

CDH4 Oozie has a new shared library which bundles CDH4 JAR files for streaming, DistCp and for Pig, Hive and Sqoop.

The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:

  • The shared library file for MRv1 is oozie-sharelib.tar.gz.
  • The shared library file for YARN is oozie-sharelib-yarn.tar.gz.
  1. Delete the Oozie shared libraries from HDFS. For example:
    $ sudo -u oozie hadoop fs -rmr /user/oozie/share
      Note:

    If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail with a security error. Instead, use the following commands: $ kinit <user> (if you are using a password) or $ kinit -kt <keytab> <principal> (if you are using a keytab) and then, for each command executed by this user, $ <command>

  2. Expand the Oozie CDH4 shared libraries in a local temp directory and copy them to HDFS. For example:
    $ mkdir /tmp/ooziesharelib
    $ cd /tmp/ooziesharelib
    $ tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
    $ sudo -u oozie hadoop fs -put share /user/oozie/share
      Important:

    If you are installing Oozie to work with YARN, use oozie-sharelib-yarn.tar.gz instead.

      Note:

    If the current shared libraries are in another location, make sure you use this other location when you run the above commands, and if necessary edit the oozie-site.xml configuration file to point to the right location.
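
The choice of tarball and the commands above can be combined in a small script. This is a sketch under stated assumptions: sharelib_tarball and upgrade_sharelib are hypothetical helper names, the shared library lives in /user/oozie/share, and Kerberos is not enabled (otherwise drop the sudo -u prefix as described in the note above).

```shell
# Hypothetical helper: map a MapReduce version to the bundled sharelib tarball.
sharelib_tarball() {
    case "$1" in
        mrv1) echo /usr/lib/oozie/oozie-sharelib.tar.gz ;;
        yarn) echo /usr/lib/oozie/oozie-sharelib-yarn.tar.gz ;;
        *)    echo "usage: sharelib_tarball mrv1|yarn" >&2; return 1 ;;
    esac
}

# Hypothetical helper: replace the sharelib in HDFS (assumes no Kerberos).
upgrade_sharelib() {
    tarball=$(sharelib_tarball "$1") || return 1
    workdir=$(mktemp -d) || return 1
    # Remove the old shared library; this reports an error if it does not exist yet.
    sudo -u oozie hadoop fs -rmr /user/oozie/share
    tar xzf "$tarball" -C "$workdir" || return 1
    # mktemp -d creates a private directory; let the oozie user read it.
    chmod -R a+rX "$workdir"
    sudo -u oozie hadoop fs -put "$workdir/share" /user/oozie/share
}

# Example:
#   upgrade_sharelib yarn
```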

Step 4: Start the Oozie Server

Now you can start Oozie:

$ sudo service oozie start

Check Oozie's oozie.log to verify that Oozie has started successfully.
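
One way to make this check repeatable is to scan the log for error-level entries. A minimal sketch: log_is_clean is a hypothetical helper name, and it assumes log4j-style lines in which the level appears as a separate word. A clean log does not by itself prove that the server is healthy, so the example also queries the server status, assuming it listens at http://localhost:11000/oozie.

```shell
# Hypothetical helper: print ERROR and FATAL lines from a log file.
# Returns 0 if none are found, 1 if any are found, 2 if the file is unreadable.
log_is_clean() {
    [ -r "$1" ] || return 2
    if grep -E '(^|[[:space:]])(ERROR|FATAL)([[:space:]]|$)' "$1"; then
        return 1
    fi
    return 0
}

# Example (the URL is an assumption; adjust it to your server):
#   log_is_clean /var/log/oozie/oozie.log && \
#       oozie admin -oozie http://localhost:11000/oozie -status
```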

Step 5: Upgrade the Oozie Client

Although older Oozie clients work with the new Oozie server, you need to install the new version of the Oozie client in order to use all the functionality of the Oozie server.

To upgrade the Oozie client, if you have not already done so, follow the steps under Installing Oozie.

Configuring Oozie after Upgrading from an Earlier CDH4 Release

  Note:

If you are installing Oozie for the first time, skip this section and proceed with Configuring Oozie after a New Installation.

Step 1: Update Configuration Files

  1. Edit the new Oozie CDH4 oozie-site.xml, and set all customizable properties to the values you set in the previous oozie-site.xml.
  2. If necessary do the same for the oozie-log4j.properties, oozie-env.sh and the adminusers.txt files.

Step 2: Upgrade the Oozie Sharelib

  Important:

This step is required; the current version of Oozie does not work with shared libraries from an earlier version.

The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:

  • The shared library file for MRv1 is oozie-sharelib.tar.gz.
  • The shared library file for YARN is oozie-sharelib-yarn.tar.gz.
  1. Delete the Oozie shared libraries from HDFS. For example:
    $ sudo -u oozie hadoop fs -rmr /user/oozie/share
      Note:

    If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail with a security error. Instead, use the following commands: $ kinit <user> (if you are using a password) or $ kinit -kt <keytab> <principal> (if you are using a keytab) and then, for each command executed by this user, $ <command>

  2. Expand the Oozie CDH4 shared libraries in a local temp directory and copy them to HDFS. For example:
    $ mkdir /tmp/ooziesharelib
    $ cd /tmp/ooziesharelib
    $ tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
    $ sudo -u oozie hadoop fs -put share /user/oozie/share
      Important:

    If you are installing Oozie to work with YARN, use oozie-sharelib-yarn.tar.gz instead.

      Note:

    If the current shared libraries are in another location, make sure you use this other location when you run the above commands, and if necessary edit the oozie-site.xml configuration file to point to the right location.
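
The Kerberos note above means that scripts need two command forms. One way to keep a single script for both cases is a small wrapper. This is a sketch: run_as and the KERBEROS_ENABLED variable are hypothetical names, not Oozie or Hadoop settings. With Kerberos enabled, you still need to run kinit for the right principal first.

```shell
# Hypothetical wrapper: run a command as a given Unix user, or directly
# when Kerberos is enabled and identity comes from the ticket cache.
run_as() {
    user=$1
    shift
    if [ "${KERBEROS_ENABLED:-no}" = yes ]; then
        "$@"
    else
        sudo -u "$user" "$@"
    fi
}

# Example:
#   KERBEROS_ENABLED=yes   # set after a successful kinit
#   run_as oozie hadoop fs -put share /user/oozie/share
```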

Step 3: Start the Oozie Server

Now you can start Oozie:

$ sudo service oozie start

Check Oozie's oozie.log to verify that Oozie has started successfully.

Step 4: Upgrade the Oozie Client

Although older Oozie clients work with the new Oozie server, you need to install the new version of the Oozie client in order to use all the functionality of the Oozie server.

To upgrade the Oozie client, if you have not already done so, follow the steps under Installing Oozie.

Configuring Oozie after a New Installation

  Note:

Follow the instructions in this section if you are installing Oozie for the first time. If you are upgrading Oozie from CDH3 or from an earlier CDH4 release, skip this section and choose the appropriate instructions under Configuring Oozie.

When you install Oozie from an RPM or Debian package, the installation places all configuration, documentation, and runtime files in the standard Linux directories, as follows.

Type of File        Where Installed

binaries            /usr/lib/oozie/
configuration       /etc/oozie/conf/
documentation       SLES: /usr/share/doc/packages/oozie/
                    other platforms: /usr/share/doc/oozie/
examples TAR.GZ     SLES: /usr/share/doc/packages/oozie/
                    other platforms: /usr/share/doc/oozie/
sharelib TAR.GZ     /usr/lib/oozie/
data                /var/lib/oozie/
logs                /var/log/oozie/
temp                /var/tmp/oozie/
PID file            /var/run/oozie/

Deciding which Database to Use

Oozie has a built-in Derby database, but Cloudera recommends that you use a Postgres, MySQL, or Oracle database instead, for the following reasons:
  • Derby runs in embedded mode and it is not possible to monitor its health.
  • It is not clear how to implement a live backup strategy for the embedded Derby database, though it may be possible.
  • Under load, Cloudera has observed locks and rollbacks with the embedded Derby database which don't happen with server-based databases.

Configuring Oozie to Use Postgres

Use the procedure that follows to configure Oozie to use PostgreSQL instead of Apache Derby.

Step 1: Install PostgreSQL 8.4.x or 9.0.x.

  Note:

See CDH4 Requirements and Supported Versions for tested versions.

Step 2: Create the Oozie user and Oozie database.

For example, using the Postgres psql command-line tool:

$ psql -U postgres
Password for user postgres: *****

postgres=# CREATE ROLE oozie LOGIN ENCRYPTED PASSWORD 'oozie' 
 NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE

postgres=# CREATE DATABASE "oozie" WITH OWNER = oozie
 ENCODING = 'UTF8'
 TABLESPACE = pg_default
 LC_COLLATE = 'en_US.UTF8'
 LC_CTYPE = 'en_US.UTF8'
 CONNECTION LIMIT = -1;
CREATE DATABASE

postgres=# \q

Step 3: Configure Postgres to accept network connections for user oozie.

  1. Edit the postgresql.conf file and set the listen_addresses property to *, to make sure that the PostgreSQL server starts listening on all your network interfaces. Also make sure that the standard_conforming_strings property is set to off.
  2. Edit the Postgres data/pg_hba.conf file as follows:
    host    oozie         oozie         0.0.0.0/0             md5
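
The pg_hba.conf entry above can be added idempotently, so that rerunning a setup script does not duplicate it. A minimal sketch: ensure_line is a hypothetical helper name, and the file path in the example depends on your installation. PostgreSQL uses the first pg_hba.conf record that matches a connection, so check that no earlier line already matches before relying on an appended entry.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
ensure_line() {
    file=$1
    line=$2
    # -x: match whole lines only; -F: treat the line as a fixed string, not a regex.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (path is an assumption; run as a user that may edit the file):
#   ensure_line /opt/PostgreSQL/8.4/data/pg_hba.conf \
#       'host    oozie         oozie         0.0.0.0/0             md5'
```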

Step 4: Reload the Postgres configuration.

$ sudo -u postgres pg_ctl reload -s -D /opt/PostgreSQL/8.4/data

Step 5: Configure Oozie to use Postgres.

Edit the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:postgresql://localhost:5432/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...
  Note:

In the JDBC URL property, replace localhost with the hostname where Postgres is running.

In the case of Postgres, unlike MySQL or Oracle, there is no need to download and install the JDBC driver separately, as it is license-compatible with Oozie and bundled with it.
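
Before moving on, it can be useful to read back the JDBC settings that Oozie will use. A minimal sketch, assuming each <name> and <value> element sits on a single line as in the fragment above; get_site_property is a hypothetical helper name. It is a line-based reader, not an XML parser, so it does not decode entities or handle values that span several lines.

```shell
# Hypothetical helper: print the value of one property from an oozie-site.xml.
# Assumption: <name>...</name> and <value>...</value> are each on one line.
get_site_property() {
    # $1: path to oozie-site.xml, $2: property name
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Example:
#   get_site_property /etc/oozie/conf/oozie-site.xml oozie.service.JPAService.jdbc.url
```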

Configuring Oozie to Use MySQL

Use the procedure that follows to configure Oozie to use MySQL instead of Apache Derby.

Step 1: Install and start MySQL 5.x

  Note:

See CDH4 Requirements and Supported Versions for tested versions.

Step 2: Create the Oozie database and Oozie MySQL user.

For example, using the MySQL mysql command-line tool:

$ mysql -u root -p
Enter password: ******

mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)

mysql> exit
Bye

Step 3: Configure Oozie to use MySQL.

Edit properties in the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://localhost:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...
  Note:

In the JDBC URL property, replace localhost with the hostname where MySQL is running.

Step 4: Add the MySQL JDBC driver JAR to Oozie.

Copy or symlink the MySQL JDBC driver JAR into the /var/lib/oozie/ directory.

  Note:

You must manually download the MySQL JDBC driver JAR file.
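
Linking the driver can be sketched as follows. link_jdbc_driver is a hypothetical helper name, and the JAR path in the example is a placeholder for wherever you saved the downloaded file.

```shell
# Hypothetical helper: symlink a JDBC driver JAR into the Oozie data directory.
link_jdbc_driver() {
    jar=$1
    dest=${2:-/var/lib/oozie}
    if [ ! -f "$jar" ]; then
        echo "driver JAR not found: $jar" >&2
        return 1
    fi
    # Resolve to an absolute path so the link stays valid from any directory.
    jar_dir=$(cd "$(dirname "$jar")" && pwd) || return 1
    # -s: symbolic link; -f: replace an existing link of the same name.
    ln -sf "$jar_dir/$(basename "$jar")" "$dest/"
}

# Example (placeholder path; needs write access to /var/lib/oozie):
#   link_jdbc_driver /path/to/mysql-connector-java.jar
```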

Configuring Oozie to Use Oracle

Use the procedure that follows to configure Oozie to use Oracle 11g instead of Apache Derby.

  Note:

See CDH4 Requirements and Supported Versions for tested versions.

Step 1: Install and start Oracle 11g.

Step 2: Create the Oozie Oracle user.

For example, using the Oracle sqlplus command-line tool:

$ sqlplus system@localhost

Enter password: ******

SQL> create user oozie identified by oozie default tablespace users temporary tablespace temp;

User created.

SQL> grant all privileges to oozie;

Grant succeeded.

SQL> exit

$

Step 3: Configure Oozie to use Oracle.

Edit the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>oracle.jdbc.driver.OracleDriver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:oracle:thin:@localhost:1521:oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...
  Note:

In the JDBC URL property, replace localhost with the hostname where Oracle is running and replace oozie with the TNS name of the Oracle database.

Step 4: Add the Oracle JDBC driver JAR to Oozie.

Copy or symlink the Oracle JDBC driver JAR into the /var/lib/oozie/ directory.

  Note:

You must manually download the Oracle JDBC driver JAR file.

Creating the Oozie Database Schema

After configuring Oozie database information and creating the corresponding database, create the Oozie database schema. Oozie provides a database tool for this purpose.

  Note:

The Oozie database tool uses Oozie configuration files to connect to the database to perform the schema creation; before you use the tool, make sure you have created a database and configured Oozie to work with it as described above.

The Oozie database tool works in two modes: it can create the database schema directly, or it can produce an SQL script that a database administrator can run to create the schema manually. If you use the tool to create the database schema, you must have the permissions needed to execute DDL operations.

To run the Oozie database tool against the database:

  Important:

This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run

You should see output such as the following:

Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '3.3.2-cdh4.4.0'

The SQL commands have been written to: /tmp/ooziedb-5737263881793872034.sql

To create the schema creation script:

  Important:

This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.

Run /usr/lib/oozie/bin/ooziedb.sh create -sqlfile <SCRIPT>. For example:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql

You should see output such as the following:

Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '3.3.2-cdh4.4.0'

The SQL commands have been written to: oozie-create.sql

WARN: The SQL commands have NOT been executed, you must use the '-run' option

  Important:

If you used the -sqlfile option instead of -run, the Oozie database schema has not been created. You must run the oozie-create.sql script against your database.

Enabling the Oozie Web Console

To enable Oozie's web console, you must download and add the ExtJS library to the Oozie server. If you have not already done this, proceed as follows.

Step 1: Download the Library

Download the ExtJS version 2.2 library from http://archive.cloudera.com/gplextras/misc/ext-2.2.zip and place it in a convenient location.

Step 2: Install the Library

Extract the ext-2.2.zip file into /var/lib/oozie.

Configuring Oozie with Kerberos Security

To configure Oozie with Kerberos security, see Oozie Security Configuration.

Installing the Oozie ShareLib in Hadoop HDFS

The Oozie installation bundles Oozie ShareLib, which contains all of the necessary JARs to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions.

The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:

  • The shared library file for MRv1 is oozie-sharelib.tar.gz.
  • The shared library file for YARN is oozie-sharelib-yarn.tar.gz.
  Important:

If Hadoop is configured with Kerberos security enabled, you must first configure Oozie with Kerberos Authentication. For instructions, see Oozie Security Configuration. Before running the commands in the following instructions, you must run the sudo -u oozie kinit -k -t /etc/oozie/oozie.keytab and kinit -k hdfs commands. Then, instead of using commands in the form sudo -u <user> <command>, use just <command>; for example, $ hadoop fs -mkdir /user/oozie

To install Oozie ShareLib in Hadoop HDFS in the oozie user home directory:

$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ mkdir /tmp/ooziesharelib
$ cd /tmp/ooziesharelib
$ tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
$ sudo -u oozie hadoop fs -put share /user/oozie/share
  Important:

If you are installing Oozie to work with YARN, use oozie-sharelib-yarn.tar.gz instead.

Configuring Support for Oozie Uber JARs

An uber JAR is a JAR that contains other JARs with dependencies in a lib/ folder inside the JAR. Beginning with CDH4.1, you can configure the cluster to handle uber JARs properly for the MapReduce action (as long as it does not include any streaming or pipes) by setting the following property in the oozie-site.xml file:

...
    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>true</value>
    </property>
    ...

When this property is set, users can use the oozie.mapreduce.uber.jar configuration property in their MapReduce workflows to notify Oozie that the specified JAR file is an uber JAR.

Configuring Oozie to Run against a Federated Cluster

To run Oozie against a federated HDFS cluster using ViewFS, configure the oozie.service.HadoopAccessorService.supported.filesystems property in oozie-site.xml as follows:

<property>
     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
     <value>hdfs,viewfs</value>
</property>