Pig is an Apache project that uses a scripting language to query and analyze large data sets. With Apache Pig, users can create MapReduce programs without writing Java code. This e-learning module teaches you how to write user-defined functions (UDFs) that run inside Pig to extend its functionality and build a custom library of operations. We discuss what Pig UDFs are, which functions and languages are supported, and how to write custom UDFs in Java and Python. The module includes a hands-on exercise, with a sample solution, in which you write your own UDF in Python.
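To give a sense of what a Python UDF for Pig can look like, here is a minimal sketch. It is not taken from the module's exercise: the function name `normalize_name`, the schema string, and the fallback decorator are illustrative assumptions. The `pig_util` import is expected to resolve when Pig runs the file as a `streaming_python` UDF; the fallback only exists so the file can be tried outside Pig.

```python
try:
    # Expected to be available when Pig runs this file as a streaming_python UDF.
    from pig_util import outputSchema
except ImportError:
    # Assumption: a stand-in decorator so the file also imports outside Pig.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema  # record the declared Pig schema
            return func
        return decorator


@outputSchema("normalized:chararray")
def normalize_name(value):
    """Collapse extra whitespace and title-case a name; pass nulls through."""
    if value is None:
        return None
    return " ".join(value.split()).title()
```

In a Pig script, a file like this (here hypothetically saved as `name_udfs.py`) would typically be loaded with a statement along the lines of `REGISTER 'name_udfs.py' USING streaming_python AS name_udfs;` and then called as `name_udfs.normalize_name(field)`.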
Learn the new Parcel format for installing and upgrading CDH and other Hadoop ecosystem components. Parcels enable the new rolling upgrade functionality in Cloudera Manager, provide rollback functionality, and make maintenance windows short and painless. In this e-learning module, we discuss the benefits of Parcels, compare Parcels with packages, and explain what a Parcel file contains. The module finishes with a complete demonstration of a CDH upgrade and several component installations, including Cloudera Impala and Cloudera Search.
Learn how to use interactive, full-text search to quickly find relevant data in Hadoop and solve critical business problems simply and in real time. Cloudera Search integrates Apache Solr, an established, feature-rich, open-source search platform, and its extensible APIs with CDH. In this e-learning module, you will learn the fundamentals, use cases, and features of Cloudera Search. The module includes a short discussion of Cloudera Search architecture and a product demonstration.
Hive is an Apache project that facilitates ad hoc queries and analyses of large data sets in a Hadoop cluster using a SQL-like language. This e-learning module teaches you how to write user-defined functions (UDFs) to augment Hive's built-in capabilities. We discuss why UDFs are necessary, what kinds of UDFs exist, and how to write custom UDFs in Java. The module includes a hands-on exercise, with a sample solution, in which you write your own UDF.
Work at the speed of thought! This e-learning course explores Cloudera Impala's features, architecture, and benefits over legacy Hadoop platforms. Learn how to run interactive queries in Impala and understand how it optimizes data systems. This free online course includes a training module, homework, and a downloadable Impala demo VM for experimenting with this powerful new tool.