Configuring Oozie to Enable MapReduce Jobs to Read from and Write to Microsoft Azure (ADLS)

MapReduce jobs controlled by Oozie as part of a workflow can read from and write to Azure Data Lake Storage (ADLS). The steps below show you how to enable this capability. Before you begin, you will need the following information from your Microsoft Azure account:
  • The client ID.
  • The client secret.
  • The refresh URL. To get this value, in the Azure portal, go to Azure Active Directory > App registrations > Endpoints. In the Endpoints region, copy the OAUTH 2.0 TOKEN ENDPOINT. This is the value to store as the refresh URL in step 1 below.
After storing these credentials in the keystore (the JCEKS file), specify the path to this keystore in the Oozie workflow configuration.
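Because the refresh URL is copied by hand from the portal, a quick shape check can catch a wrong endpoint before it goes into the keystore. The sketch below is a hypothetical POSIX shell helper, check_refresh_url, that is not part of Hadoop or Oozie. It assumes the token endpoint has the form https://login.microsoftonline.com/TENANT_ID/oauth2/token, where TENANT_ID is your directory ID, and it deliberately rejects any other shape (for example, .../oauth2/v2.0/token), so adjust the patterns if your endpoint differs.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop or Oozie tool): succeeds if $1 has the
# ASSUMED token endpoint shape
#   https://login.microsoftonline.com/<tenant>/oauth2/token
check_refresh_url() {
  url=$1
  # Overall shape: fixed host prefix and fixed /oauth2/token suffix.
  case $url in
    https://login.microsoftonline.com/*/oauth2/token) ;;
    *) echo "unexpected refresh URL: $url" >&2; return 1 ;;
  esac
  # Whatever sits between the host and /oauth2/token is treated as the
  # tenant (directory) ID; it must be non-empty and a single path segment.
  tenant=${url#https://login.microsoftonline.com/}
  tenant=${tenant%/oauth2/token}
  case $tenant in
    ''|*/*) echo "unexpected tenant segment in: $url" >&2; return 1 ;;
  esac
  echo "ok"
}
```

For example, check_refresh_url "$REFRESH_URL" prints ok for a matching value; otherwise it writes a message to standard error and returns a non-zero status.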

In the steps below, replace USER_NAME with your user name, replace path/to/file with the HDFS directory where the .jceks file is located, and replace the client ID, client secret, and refresh URL placeholders with the values from your Microsoft Azure account.
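The substitution in step 1 can also be scripted. The sketch below is a minimal POSIX shell example under stated assumptions: USER_NAME, CLIENT_ID, CLIENT_SECRET, and REFRESH_URL are shell variables you set yourself; the keystore is the jceks://hdfs/user/USER_NAME/adlskeyfile.jceks location used in step 1; and the functions store_adls_credential and create_adls_keystore, as well as the DRY_RUN=1 switch, are hypothetical names invented for this sketch. DRY_RUN=1 prints each command, with the value hidden, instead of running it. The aliases are the properties the ADLS connector reads: dfs.adls.oauth2.client.id, dfs.adls.oauth2.credential, and dfs.adls.oauth2.refresh.url. The final hadoop credential list call shows which aliases the keystore now holds.

```shell
#!/bin/sh
# Sketch: store the Azure client ID, client secret, and refresh URL under
# the ADLS connector's property names in one JCEKS keystore in HDFS.
# ASSUMED inputs: USER_NAME, CLIENT_ID, CLIENT_SECRET, REFRESH_URL.
# DRY_RUN=1 is a hypothetical switch of this sketch: print, don't run.

store_adls_credential() {
  # $1 = alias, $2 = value, $3 = provider URI
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Never echo the value itself; it may be a secret.
    echo "hadoop credential create $1 -provider $3 -value <hidden>"
  else
    hadoop credential create "$1" -provider "$3" -value "$2"
  fi
}

create_adls_keystore() {
  # ${VAR:?msg} aborts with msg if a required variable is unset or empty.
  provider="jceks://hdfs/user/${USER_NAME:?set USER_NAME}/adlskeyfile.jceks"
  store_adls_credential dfs.adls.oauth2.client.id \
    "${CLIENT_ID:?set CLIENT_ID}" "$provider" || return 1
  store_adls_credential dfs.adls.oauth2.credential \
    "${CLIENT_SECRET:?set CLIENT_SECRET}" "$provider" || return 1
  store_adls_credential dfs.adls.oauth2.refresh.url \
    "${REFRESH_URL:?set REFRESH_URL}" "$provider" || return 1
  # List the aliases stored in the keystore (values are not printed).
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "hadoop credential list -provider $provider"
  else
    hadoop credential list -provider "$provider"
  fi
}
```

Running it once with DRY_RUN=1 lets you review the commands before running it for real on a host with the Hadoop client configured. Passing secrets with -value can expose them in shell history and process listings; if you omit -value, hadoop credential create prompts for the value instead.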

  1. Create the credential store (.jceks file) and add your Azure client ID, client secret, and refresh URL to it as follows. Each alias is the name of the Hadoop property from which the ADLS connector reads that value.
    hadoop credential create dfs.adls.oauth2.client.id -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value CLIENT_ID
    hadoop credential create dfs.adls.oauth2.credential -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value CLIENT_SECRET
    hadoop credential create dfs.adls.oauth2.refresh.url -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value REFRESH_URL
  2. In the <configuration> section of the MapReduce action in Oozie's workflow.xml file, set hadoop.security.credential.provider.path to the path of the .jceks file so that the MapReduce framework can load the Azure credentials that give it access to ADLS.
    <action name="ADLSjob">