
NOTICE

 

As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings.


Introduction

Our objective in this part of building the HVAC system analysis application is to set up the development environment with shell code, so we can move on to acquiring the data, cleaning it, and visualizing key aspects of it to surface insights for our clients. We will clean up the NiFi canvas on HDF so we start fresh with no pre-existing dataflows.

Prerequisites

  • Enabled CDA for your appropriate system.

Outline

Verify Prerequisites Have Been Covered

Map sandbox IP to desired hostname in hosts file

  • If you need help mapping the Sandbox IP to a hostname, reference Environment Setup -> Map Sandbox IP To Your Desired Hostname In The Hosts File in Learning the Ropes of HDP Sandbox; a sample mapping is sketched below.
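
For a concrete example, a hosts-file mapping looks like the following sketch. The IP address is a placeholder assumption; substitute the IP your sandbox actually uses:

# Sample /etc/hosts entries (placeholder IP; use your sandbox's actual IP)
192.168.64.2    sandbox-hdp.hortonworks.com
192.168.64.2    sandbox-hdf.hortonworks.com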

Set up the Ambari admin password for "HDF" and "HDP"

If you need help setting the Ambari admin password, reference Learning the Ropes of HDP Sandbox.

Start up all required services for "HDF" and "HDP"

If unsure, log in to the Ambari dashboard and verify that the required services show as started; you can also check via the Ambari REST API, as sketched below.
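
An optional way to verify service states is the Ambari REST API. This is a minimal sketch, assuming the default Sandbox cluster name and substituting your own admin password:

# List each HDP service and its current state (run the same against sandbox-hdf for HDF)
curl -s -u admin:<Your-Ambari-Admin-Password> \
  "http://sandbox-hdp.hortonworks.com:8080/api/v1/clusters/Sandbox/services?fields=ServiceInfo/state"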

Overview of Shell Code Used in Both Approaches

In approach 1, you will manually run the code line by line to set up the development environment, while in approach 2 you will download and execute a shell script that automates the setup. The shell code echo "nameserver 8.8.8.8" | tee -a /etc/resolv.conf appends Google's Public DNS to /etc/resolv.conf so the server can translate hostnames to IP addresses. The problem this configuration potentially solves is resolving s3.amazonaws.com to an IP address that services in the application can use to fetch data. The next few lines of code initialize variables that are used in the Ambari REST API calls. The wait() function waits for Ambari services on HDF or HDP to stop or start. The two curl commands send JSON data via Ambari REST API calls telling the Ambari Server to overwrite the existing NiFi service state to INSTALLED (meaning STOPPED) or STARTED. First we tell Ambari to STOP the NiFi service. Next we back up and remove the pre-existing NiFi flow to clean the NiFi canvas, so we can do development. Lastly, we tell Ambari to START the NiFi service for the changes to take effect. The sketch below shows the shape of these REST calls.
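
For reference, here is a minimal sketch of the two kinds of Ambari REST calls the script makes, using placeholder credentials and the default Sandbox cluster name:

# Query the current state of the NIFI service (what the wait() function polls)
curl -s -u admin:<Your-Ambari-Admin-Password> \
  http://sandbox-hdf.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI

# Tell Ambari to stop NiFi by setting its desired state to INSTALLED
curl -u admin:<Your-Ambari-Admin-Password> -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo": {"context": "Stop NiFi"}, "ServiceInfo": {"state": "INSTALLED"}}' \
  http://sandbox-hdf.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI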

Approach 1: Manually Setup Development Environment

Setting up HDF

We will be using shell commands to set up the required services in our data-in-motion and data-at-rest platforms from the sandbox web shell clients.

Open the HDF web shell client located at http://sandbox-hdf.hortonworks.com:4200.

Prior to executing the shell code, replace the string "<Your-Ambari-Admin-Password>" in the last line, setup_nifi "admin" "<Your-Ambari-Admin-Password>", with the password you created for the Ambari admin user. For example, if our Ambari admin password was set to yellowHadoop, then the line of code would look as follows: setup_nifi "admin" "yellowHadoop"

Copy and paste the following shell code line by line in HDF web shell:

##
# Script sets up HDF services used in Building an HVAC System Analysis Application
##

DATE=`date '+%Y-%m-%d %H:%M:%S'`
LOG_DIR_BASE="/var/log/cda-sb/310/"
echo "$DATE INFO: Setting Up HDF Dev Environment for HVAC System Analysis App"
mkdir -p $LOG_DIR_BASE/hdf
setup_public_dns()
{
  ##
  # Purpose of the following section of Code:
  # NiFi GetHTTP will run into ERROR because it can't resolve the S3 Domain Name Server (DNS)
  #
  # Potential Solution: Append Google Public DNS to CentOS7 /etc/resolv.conf
  # CentOS7 is the OS of the server NiFi runs on. Google Public DNS is able to
  # resolve S3 DNS. Thus, GetHTTP can download HVAC Sensor Data from S3.
  ##

  # Adding Public DNS to resolve msg: unable to resolve s3.amazonaws.com
  # https://forums.aws.amazon.com/thread.jspa?threadID=125056
  echo "$DATE INFO: Adding Google Public DNS to /etc/resolv.conf"
  echo "# Google Public DNS" | tee -a /etc/resolv.conf
  echo "nameserver 8.8.8.8" | tee -a /etc/resolv.conf

  echo "$DATE INFO: Checking Google Public DNS added to /etc/resolv.conf"
  cat /etc/resolv.conf

  # Log everything, but also output to stdout
  echo "$DATE INFO: Executing setup_nifi bash function, logging to $LOG_DIR_BASE/hdf/setup-public-dns.log"
}

setup_nifi()
{
  ##
  # Purpose of the following section of Code:
  # CDF Sandbox comes with a prebuilt NiFi flow, which causes user to be
  # pulled away from building the HVAC System Analysis Application.
  #
  # Potential Solution: Backup prebuilt NiFi flow and call it a different name.
  ##

  echo "$DATE INFO: Setting HDF_AMBARI_USER based on user input"
  HDF_AMBARI_USER="$1" # $1: Expects user to pass "Ambari User" into the file
  echo "$DATE INFO: Setting HDF_AMBARI_PASS based on user input"
  HDF_AMBARI_PASS="$2" # $2: Expects user to pass "Ambari Admin Password" into the file
  HDF_CLUSTER_NAME="Sandbox"
  HDF_HOST="sandbox-hdf.hortonworks.com"
  HDF="hdf-sandbox"
  AMBARI_CREDENTIALS=$HDF_AMBARI_USER:$HDF_AMBARI_PASS

  # Ambari REST Call Function: waits on a service action completing

  # Wait for a service in the Ambari stack to reach the given state
  # Note: this function shadows the shell builtin `wait` within this script
  # $1: hdf-sandbox or hdp-sandbox
  # $2: Service
  # $3: Desired state - STARTED or INSTALLED (stopped)
  wait()
  {
    if [[ $1 == "hdp-sandbox" ]]
    then
      finished=0
      while [ $finished -ne 1 ]
      do
        echo "$DATE INFO: Waiting for $1 $2 service action to finish"
        ENDPOINT="http://$HDP_HOST:8080/api/v1/clusters/$HDP_CLUSTER_NAME/services/$2"
        AMBARI_CREDENTIALS="$HDP_AMBARI_USER:$HDP_AMBARI_PASS"
        str=$(curl -s -u $AMBARI_CREDENTIALS $ENDPOINT)
        if [[ $str == *"$3"* ]] || [[ $str == *"Service not found"* ]]
        then
          echo "$DATE INFO: $1 $2 service state is now $3"
          finished=1
        fi
          echo "$DATE INFO: Still waiting on $1 $2 service action to finish"
          sleep 3
      done
    elif [[ $1 == "hdf-sandbox" ]]
    then
      finished=0
      while [ $finished -ne 1 ]
      do
        echo "$DATE INFO: Waiting for $1 $2 service action to finish"
        ENDPOINT="http://$HDF_HOST:8080/api/v1/clusters/$HDF_CLUSTER_NAME/services/$2"
        AMBARI_CREDENTIALS="$HDF_AMBARI_USER:$HDF_AMBARI_PASS"
        str=$(curl -s -u $AMBARI_CREDENTIALS $ENDPOINT)
        if [[ $str == *"$3"* ]] || [[ $str == *"Service not found"* ]]
        then
          echo "$DATE INFO: $1 $2 service state is now $3"
          finished=1
        fi
          echo "$DATE INFO: Still waiting on $1 $2 service action to finish"
          sleep 3
      done
    else
      echo "$DATE ERROR: Didn't Wait for Service, sandbox chosen not valid"
    fi
  }

  # Stop NiFi first, then backup prebuilt NiFi flow, then start NiFi for
  # changes to take effect
  echo "$DATE INFO: Stopping HDF NiFi Service via Ambari REST Call"
  curl -u $AMBARI_CREDENTIALS -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":
  {"context": "Stop NiFi"}, "ServiceInfo": {"state": "INSTALLED"}}' \
  http://$HDF_HOST:8080/api/v1/clusters/$HDF_CLUSTER_NAME/services/NIFI
  echo "$DATE INFO: Waiting on HDF NiFi Service to STOP RUNNING via Ambari REST Call"
  wait $HDF NIFI "INSTALLED"

  echo "$DATE INFO: Prebuilt HDF NiFi Flow removed from NiFi UI, but backed up"
  mv /var/lib/nifi/conf/flow.xml.gz /var/lib/nifi/conf/flow.xml.gz.bak

  echo "$DATE INFO: Starting HDF NiFi Service via Ambari REST Call"
  curl -u $AMBARI_CREDENTIALS -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":
  {"context": "Start NiFi"}, "ServiceInfo": {"state": "STARTED"}}' \
  http://$HDF_HOST:8080/api/v1/clusters/$HDF_CLUSTER_NAME/services/NIFI
  echo "$DATE INFO: Waiting on HDF NiFi Service to START RUNNING via Ambari REST Call"
  wait $HDF NIFI "STARTED"

  # Log everything, but also output to stdout
  echo "$DATE INFO: Executing setup_nifi bash function, logging to $LOG_DIR_BASE/hdf/setup-nifi.log"
}

setup_public_dns | tee -a $LOG_DIR_BASE/hdf/setup-public-dns.log
setup_nifi "admin" "<Your-Ambari-Admin-Password>" | tee -a $LOG_DIR_BASE/hdf/setup-nifi.log

Setting up HDP

Open the HDP web shell client located at http://sandbox-hdp.hortonworks.com:4200. Copy and paste the following code to HDP web shell line by line.

##
# Sets up HDP services used in Building an HVAC System Analysis Application
##

DATE=`date '+%Y-%m-%d %H:%M:%S'`
LOG_DIR_BASE="/var/log/cda-sb/310/"
echo "Setting Up HDP Dev Environment for HVAC System Analysis App"

# Creates /sandbox directory in HDFS
# allow read-write-execute permissions for the owner, group, and any other users
mkdir -p $LOG_DIR_BASE/hdp
setup_hdfs()
{
  echo "$DATE INFO: Creating /sandbox/sensor/hvac_building and /sandbox/sensor/hvac_machine"
  sudo -u hdfs hdfs dfs -mkdir -p /sandbox/sensor/hvac_building/
  sudo -u hdfs hdfs dfs -mkdir /sandbox/sensor/hvac_machine
  echo "$DATE INFO: Setting permissions for hvac_buildnig and hvac_machine to 777"
  sudo -u hdfs hdfs dfs -chmod -R 777 /sandbox/sensor/hvac_building/
  sudo -u hdfs hdfs dfs -chmod -R 777 /sandbox/sensor/hvac_machine
  echo "$DATE INFO: Checking both directories were created and permissions were set"
  sudo -u hdfs hdfs dfs -ls /sandbox/sensor
}
# Log everything, but also output to stdout
echo "$DATE INFO: Executing setup_hdfs bash function, logging to /var/log/cda-sb/310/setup-hdp.log"
setup_hdfs | tee -a $LOG_DIR_BASE/hdp/setup-hdp.log

Now that the development environment is set up, you can move on to the summary.

Approach 2: Automatically Setup Development Environment

We will download and execute a shell script to automate the setup of our data-in-motion and data-at-rest platforms from the sandbox web shell clients.

Open HDF web shell client located at http://sandbox-hdf.hortonworks.com:4200.

Prior to executing the shell script, replace "<Your-Ambari-Admin-Password>" in the line of shell code AMBARI_USER_PASSWORD="<Your-Ambari-Admin-Password>" with the password you created for the Ambari admin user. For example, if our Ambari admin password was set to yellowHadoop, then the line of code would look as follows: AMBARI_USER_PASSWORD="yellowHadoop"

Copy and paste the following code to HDF web shell with your updated change:

AMBARI_USER="admin"
AMBARI_USER_PASSWORD="<Your-Ambari-Admin-Password>"
wget https://raw.githubusercontent.com/hortonworks/data-tutorials/master/tutorials/cda/building-an-hvac-system-analysis-application/application/setup/shell/setup-hdf.sh
bash setup-hdf.sh $AMBARI_USER $AMBARI_USER_PASSWORD

Open HDP web shell client located at http://sandbox-hdp.hortonworks.com:4200. Copy and paste the following code to HDP web shell:

wget https://raw.githubusercontent.com/hortonworks/data-tutorials/master/tutorials/cda/building-an-hvac-system-analysis-application/application/setup/shell/setup-hdp.sh
bash setup-hdp.sh
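
Once the script finishes, you can sanity-check that the HDFS directories were created; this uses the same path the script sets up:

# Verify the sensor directories and their permissions
sudo -u hdfs hdfs dfs -ls /sandbox/sensor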

Now that the development environment is set up, you can move on to the summary.

Summary

Congratulations! You now have the development environment ready to start building the HVAC system analysis application. The services needed for application development have been set up, so we can begin building the data pipeline.

