This course is designed for software developers, engineers, and data scientists who develop Spark applications and need techniques for tuning their code. It is not an introductory Spark course: students should be comfortable completing the tasks covered in Cloudera Developer Training for Apache Spark and Hadoop. Spark examples and hands-on exercises are presented in Python and Scala, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed, and basic knowledge of SQL is helpful.
- Coverage of all concepts found in the Spark Application UI
- RDD execution
- DataFrame execution
- Catalyst optimizer
- Recognizing and dealing with skewed data
- Handling small files
- Join optimizations
- Unbalanced partitions
- Partitioned and bucketed tables
- Object serialization
- File formats
- Storage options
- Schema inference
- Static vs. dynamic scheduling
- Dynamic resource pools in YARN
- Partition processing
- Broadcast variables
- Driver and executor memory and CPU core configuration
- Python overhead
Developing High Performance Algorithms
- Caching data
CCA Spark and Hadoop Developer Certification
This course is excellent preparation for the CCA Spark and Hadoop Developer exam. Although we recommend further training and hands-on experience before attempting the exam, this course covers many of the subjects tested.
Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.
Advance your career
Big data developer roles are among the world's most in-demand and highly compensated technical positions. Many current job listings that match this professional profile seek CCA qualifications.
We also provide private training at your site, at your pace, and tailored to your needs.