The Cask Data Application Platform (CDAP) provides an abstraction layer and runtime services enabling developers, even those without extensive Hadoop experience, to innovate and deliver new big data applications.
CDAP is available in three versions. The standalone SDK enables developers to develop and debug applications. The standalone VM provides the same functionality, and is recommended for Windows users. The distributed version of CDAP is available in RPM and DEB bundles. Applications developed on the standalone versions of CDAP can be deployed without modification on the CDAP distributed package running on a CDH cluster.
Supporting the installation and management of CDAP is the CDAP CSD (Custom Services Descriptor) for Cloudera Manager.
- System Requirements
- Quickstart Guide
- Supported Operating Systems
- Supported JDK Versions
- Supported Browsers
- Supported Node.js
- Additional Configuration
Supported Operating Systems
CDAP supports these operating systems:
- Red Hat Enterprise Linux and CentOS 5.7, 64-bit
- Red Hat Enterprise Linux and CentOS 5.10, 64-bit
- Red Hat Enterprise Linux and CentOS 6.4, 64-bit
- Red Hat Enterprise Linux and CentOS 6.4 in SELinux mode
- Red Hat Enterprise Linux and CentOS 6.5, 64-bit
Supported JDK Versions
CDAP supports JDK 1.7.x and JDK 1.8.x. Refer to the Cloudera Manager requirements for installing and upgrading Java.
Supported Browsers
CDAP Console supports these browsers:
- Firefox 11 or later
- Google Chrome
- Safari 5 or later
Supported Node.js
CDAP Console requires Node.js version 0.10.x or later.
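The Node.js requirement can be checked before installing. The following is a minimal sketch, assuming the version string has the form printed by `node --version` (for example `v0.10.36`); the helper names `parse_node_version` and `node_version_ok` are illustrative and not part of CDAP.

```python
import re

# Minimum Node.js version stated in the requirement above.
MIN_NODE = (0, 10)


def parse_node_version(text):
    """Parse a string such as 'v0.10.36' into a tuple of ints, e.g. (0, 10, 36)."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError("unrecognized Node.js version string: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def node_version_ok(text, minimum=MIN_NODE):
    """Return True when the parsed major.minor version is at least `minimum`."""
    return parse_node_version(text)[:len(minimum)] >= minimum
```

On a host where Node.js is installed, the input string could be obtained with `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout` and passed to `node_version_ok`.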
Additional Configuration
Certain YARN containers launched by CDAP connect to ZooKeeper. Setting the ZooKeeper 'maxClientCnxns' property to zero (unlimited) is recommended.
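To confirm that the setting took effect, the rendered ZooKeeper configuration can be inspected. This is a sketch only: it assumes the configuration is available as key=value text in the style of a zoo.cfg file, and the function names are hypothetical. On a cluster managed by Cloudera Manager, the value itself would normally be changed through Cloudera Manager rather than by editing the file.

```python
def read_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg-style text, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def max_client_cnxns_unlimited(text):
    """Return True only when maxClientCnxns is explicitly set to 0 (unlimited)."""
    return read_zoo_cfg(text).get("maxClientCnxns") == "0"
```

A missing entry is reported as not unlimited, because ZooKeeper then falls back to its own default limit.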
Kerberos-enabled clusters require additional settings and setup which are not currently managed by Cloudera Manager:
- The 'cdap' user must be granted HBase permissions to create tables. Run "grant 'cdap', 'CRW'" in an HBase shell.
- The 'cdap' user must be able to launch YARN containers, typically by adding it to the YARN "allowed.system.users" setting.
Confirm that YARN is configured properly to run MapReduce programs. Often, this includes ensuring that the HDFS "/user/yarn" directory exists with proper permissions.
Lower the default minimum YARN container size by adjusting the configuration "yarn.scheduler.minimum-allocation-mb" appropriately.
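The current minimum container size can be read from the YARN configuration before deciding how far to lower it. The sketch below assumes the standard Hadoop XML layout of `<property>` elements with `<name>` and `<value>` children; the sample document and its value of 512 are hypothetical, not a recommendation from this guide.

```python
import xml.etree.ElementTree as ET


def hadoop_property(xml_text, name):
    """Return the value of property `name` from Hadoop-style XML, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Hypothetical yarn-site.xml content, used for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
</configuration>"""

minimum_mb = int(hadoop_property(SAMPLE, "yarn.scheduler.minimum-allocation-mb"))
```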
Some versions of Hive attempt to create a temporary staging directory at the table location when executing queries. If a query fails with permission errors, set "hive.exec.stagingdir" in your Hive configuration to a temporary directory such as "/tmp/hive-staging". This can be set through Cloudera Manager in the "Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml" configuration field.
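The safety valve field takes hive-site.xml property elements. As an illustration, the snippet for the staging directory can be generated as follows; the helper name `hive_property_snippet` is hypothetical, and the property name and value are the ones given above.

```python
import xml.etree.ElementTree as ET


def hive_property_snippet(name, value):
    """Build a single <property> element in hive-site.xml form and return it as text."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


snippet = hive_property_snippet("hive.exec.stagingdir", "/tmp/hive-staging")
print(snippet)
# prints <property><name>hive.exec.stagingdir</name><value>/tmp/hive-staging</value></property>
```

Building the element with an XML library rather than by string concatenation ensures that any special characters in the value are escaped.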
Quickstart guide for installing and using CDAP within Cloudera Manager
1) Install CDAP Custom Service Descriptor (CSD)
2) Download and Distribute the CDAP parcel
3) Run the "Add Service" Wizard and select "CDAP":
Wizard Page 2: The optional Hive dependency supports the optional CDAP "Explore" component, which can be enabled.
Wizard Page 3: The CDAP "Security Auth" Service is an optional service for CDAP perimeter security; it can be configured and enabled.
Wizard Page 5: "Kerberos Auth Enabled" is needed when running against a secure Hadoop cluster.
Wizard Page 5: "Router Server Port" should match the "Router Bind Port"; the UI uses it to connect to the Router service.
After the Setup Wizard completes, the "Quick Link" from the "Cask DAP" service should load the UI. (By default, the UI listens on port 9999 of the host where the Web-App role instance is running.) The UI may initially show errors while the CDAP YARN containers are starting up; allow a few minutes for this. The "System Health" section on the Overview page shows the status of the CDAP services. They should all turn green once startup is complete.
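Waiting for the UI to come up can be scripted. The sketch below polls a URL until it responds; it assumes the default port 9999 mentioned above and a hypothetical hostname, and it checks only that the UI port answers HTTP requests, not that every service in "System Health" is green.

```python
import time
import urllib.error
import urllib.request


def ui_responds(url, timeout=5):
    """Return True if an HTTP GET to `url` yields a response with status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status code.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def wait_until(probe, attempts=30, delay=10, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A call such as `wait_until(lambda: ui_responds("http://webapp-host.example.com:9999/"))` would then retry for roughly five minutes; `webapp-host.example.com` stands in for the host running the Web-App role instance.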