CCA Spark and Hadoop Developer Exam (CCA175)
- Number of Questions: 8–12 performance-based (hands-on) tasks on a Cloudera Enterprise cluster. See below for the full cluster configuration.
- Time Limit: 120 minutes
- Passing Score: 70%
- Language: English
- Price: USD $295
Exam Question Format
Each CCA question requires you to solve a particular scenario. In some cases, a tool such as Impala or Hive may be used. In other cases, coding is required. To speed up development time on Spark questions, a template may be provided that contains a skeleton of the solution, asking the candidate to fill in the missing lines with functional code. The template will be written in either Scala or Python, but not necessarily in both languages.
You are not required to use the template and may solve the scenario using a language you prefer. Be aware, however, that coding every problem from scratch may take more time than is allocated for the exam.
Evaluation, Score Reporting, and Certificate
Your exam is graded immediately upon submission and you are e-mailed a score report the same day as your exam. Your score report displays the problem number for each problem you attempted and a grade on that problem. If you fail a problem, the score report includes the criteria you failed (e.g., “Records contain incorrect data” or “Incorrect file format”). We do not report more information in order to protect the exam content. Read more about reviewing exam content in the FAQ.
If you pass the exam, you receive a second e-mail within a few days of your exam with your digital certificate as a PDF, your license number, a LinkedIn profile update, and a link to download your CCA logos for use in your personal business collateral and on social media profiles.
Audience and Prerequisites
There are no prerequisites for taking any Cloudera certification exam. The CCA Spark and Hadoop Developer exam (CCA175) follows the same objectives as Cloudera Developer Training for Spark and Hadoop, and the training course is an excellent preparation for the exam.
Data Ingest
The skills to transfer data between external systems and your cluster, including the following:
- Import data from a MySQL database into HDFS using Sqoop
- Export data to a MySQL database from HDFS using Sqoop
- Change the delimiter and file format of data during import using Sqoop
- Ingest real-time and near-real-time streaming data into HDFS
- Process streaming data as it is loaded onto the cluster
- Load data into and out of HDFS using the Hadoop File System commands
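The ingest tasks above can be sketched as terminal commands. This is a minimal, hypothetical example: the connection string, credentials, table names, and HDFS paths are placeholders, not values from the actual exam.

```shell
# Hypothetical: import the "orders" table from a MySQL database into HDFS
# as tab-delimited text (changing the default delimiter and file format).
sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username cert_user --password-file /user/cert/.pw \
  --table orders \
  --fields-terminated-by '\t' \
  --as-textfile \
  --target-dir /user/cert/orders_tsv

# Hypothetical: export processed results from HDFS back into a MySQL table.
sqoop export \
  --connect jdbc:mysql://dbhost/retail_db \
  --username cert_user --password-file /user/cert/.pw \
  --table order_totals \
  --export-dir /user/cert/order_totals

# Hadoop File System commands for moving data into and out of HDFS.
hdfs dfs -put localfile.csv /user/cert/input/
hdfs dfs -ls /user/cert/input
hdfs dfs -get /user/cert/orders_tsv/part-m-00000 ./orders.tsv
```

Note that `--password-file` keeps the database password out of your shell history; supplying `--password` inline also works but is less tidy under exam time pressure.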
Transform, Stage, and Store
Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS:
- Load RDD data from HDFS for use in Spark applications
- Write the results from an RDD back into HDFS using Spark
- Read and write files in a variety of file formats
- Perform standard extract, transform, load (ETL) processes on data
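A minimal ETL sketch of the steps above, written as a PySpark script created and submitted from the terminal. The HDFS paths, field positions, and application name are hypothetical placeholders.

```shell
# Hypothetical ETL: read tab-delimited text from HDFS with Spark, convert
# each record to a new format, and write the results back into HDFS.
cat > etl.py <<'EOF'
from pyspark import SparkContext

sc = SparkContext(appName="etl-sketch")

# Load RDD data from HDFS
lines = sc.textFile("/user/cert/orders_tsv")

# Convert each tab-delimited record into a new format
# (here, a CSV of the first and fourth fields)
out = lines.map(lambda l: l.split("\t")) \
           .map(lambda f: ",".join([f[0], f[3]]))

# Write the results from the RDD back into HDFS
out.saveAsTextFile("/user/cert/orders_csv")
sc.stop()
EOF

spark2-submit etl.py
```

Writing the script to a file and submitting it mirrors the exam workflow; the same logic can also be typed interactively into the `pyspark2` shell.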
Data Analysis
Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data:
- Use metastore tables as an input source or an output sink for Spark applications
- Understand the fundamentals of querying datasets in Spark
- Filter data using Spark
- Write queries that calculate aggregate statistics
- Join disparate datasets using Spark
- Produce ranked or sorted data
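The analysis skills above can be sketched in one short Spark SQL job that filters, joins, aggregates, and sorts, using metastore tables as both source and sink. The table and column names are hypothetical, not exam content.

```shell
# Hypothetical report: query metastore (Hive) tables with Spark SQL and
# save the result back to the metastore.
cat > report.py <<'EOF'
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark SQL read and write metastore tables
spark = SparkSession.builder \
    .appName("report-sketch") \
    .enableHiveSupport() \
    .getOrCreate()

# Filter, join, aggregate, and sort in one query
result = spark.sql("""
    SELECT c.state, COUNT(*) AS order_count
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    WHERE o.status = 'COMPLETE'
    GROUP BY c.state
    ORDER BY order_count DESC
""")

# Use a metastore table as the output sink
result.write.mode("overwrite").saveAsTable("state_order_counts")
spark.stop()
EOF

spark2-submit report.py
```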
Configuration
This is a practical exam, and the candidate should be familiar with all aspects of generating a result, not just writing code:
- Supply command-line options to change your application configuration, such as increasing available memory
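Changing application configuration from the command line might look like the following; the resource values and script name are illustrative placeholders, not recommended exam settings.

```shell
# Hypothetical: submit an application with more memory and parallelism via
# command-line options, without changing the application code itself.
spark2-submit \
  --master yarn \
  --executor-memory 2G \
  --num-executors 4 \
  --conf spark.sql.shuffle.partitions=50 \
  app.py
```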
Spark 1 and Spark 2
Your exam cluster runs CDH 5.15 which comes with Spark 1.6. An additional package has been installed to offer Spark 2.3. Candidates should be aware of how to run two different versions of Spark before taking the exam. Instructions are found here: https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html
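On CDH clusters with the Spark 2 add-on installed, the two Spark versions are invoked through differently named commands, so knowing which command you are typing matters. A quick reference (the script name is a placeholder):

```shell
# Spark 1.6 (the CDH 5.15 default)
spark-shell            # Scala shell
pyspark                # Python shell
spark-submit app.py    # submit an application

# Spark 2.3 (the add-on package)
spark2-shell           # Scala shell
pyspark2               # Python shell
spark2-submit app.py   # submit an application
```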
Exam delivery and cluster information
CCA175 is a remote-proctored exam available anywhere, anytime. See the FAQ for more information and system requirements.
CCA175 is a hands-on, practical exam using Cloudera technologies. Each user is given their own CDH5 (currently 5.15.0) cluster pre-loaded with Spark 1.6, Spark 2.3, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (see the full list). In addition, the cluster comes with Python (2.6, 2.7, and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.
Documentation Available online during the exam
Cloudera Product Documentation
JDK 7 API Docs
Python 2.7 Documentation
Python 3.4 Documentation
Only the documentation, links, and resources listed above are accessible during the exam. All other websites, including Google/search functionality, and access to Spark external packages are disabled. You may not use notes or other exam aids.