This is the documentation for Cloudera 5.2.x.
Documentation for other versions is available at Cloudera Documentation.

About Oozie

Apache Oozie Workflow Scheduler for Hadoop is a workflow and coordination service for managing Apache Hadoop jobs:

  • Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions; actions are typically Hadoop jobs (MapReduce, Streaming, Pipes, Pig, Hive, Sqoop, etc.).
  • Oozie Coordinator jobs trigger recurrent Workflow jobs based on time (frequency) and data availability.
  • Oozie Bundle jobs are sets of Coordinator jobs managed as a single job.

Oozie is an extensible, scalable and data-aware service that you can use to orchestrate dependencies among jobs running on Hadoop.
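
To make the DAG idea concrete, the following is a minimal Python sketch, not Oozie code. It models a workflow as a mapping from each action to the actions it depends on and derives one valid execution order. The action names and the execution_order helper are hypothetical, chosen only for illustration; a real Oozie workflow is defined in the workflow definition file, not in Python.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each action maps to the set of actions
# that must finish before it can start.
workflow = {
    "import-with-sqoop": set(),
    "clean-with-pig": {"import-with-sqoop"},
    "aggregate-with-hive": {"clean-with-pig"},
    "report-with-mapreduce": {"clean-with-pig"},
    "publish": {"aggregate-with-hive", "report-with-mapreduce"},
}


def execution_order(dag):
    """Return one ordering in which every action follows its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle, which is
    why a workflow has to be acyclic to be schedulable.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

The two middle actions do not depend on each other, so either may come first in the printed order; only the dependency constraints are guaranteed.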

  Important: Running Services

When starting, stopping and restarting CDH components, always use the service(8) command rather than running scripts in /etc/init.d directly. The service command sets the current working directory to / and removes most environment variables (passing only LANG and TERM), which creates a predictable environment for the service. If you run the scripts in /etc/init.d directly, locally set environment variables could produce unpredictable results. If you install CDH from packages, service is installed as part of the Linux Standard Base (LSB).
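
The effect of such a sanitized environment can be illustrated with a short Python sketch. This is not how service is implemented; it only mimics the behavior described above (working directory set to /, only LANG and TERM passed through) on a POSIX system. The run_in_clean_env helper and the MY_LOCAL_SETTING variable are hypothetical names used for the demonstration.

```python
import os
import subprocess
import sys


def run_in_clean_env(command):
    """Run command with cwd set to / and only LANG and TERM passed through."""
    clean_env = {
        name: os.environ[name]
        for name in ("LANG", "TERM")
        if name in os.environ
    }
    return subprocess.run(
        command,
        cwd="/",
        env=clean_env,
        capture_output=True,
        text=True,
        check=True,
    )


# A locally set variable that a script run directly would inherit.
os.environ["MY_LOCAL_SETTING"] = "surprise"

# The child process reports its working directory and whether it can
# see the locally set variable.
probe = "import os; print(os.getcwd()); print('MY_LOCAL_SETTING' in os.environ)"
result = run_in_clean_env([sys.executable, "-c", probe])
print(result.stdout)
```

The child process starts in / and does not see MY_LOCAL_SETTING, although the variable is still set in the calling process. In practice you do not need a wrapper like this; you run the service command itself, for example sudo service oozie start, assuming the service is registered under the name oozie on your system.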