Overview

Cloudera University’s three-day Search training course is for developers and data engineers who want to index data in Hadoop for more powerful real-time queries. Participants will learn to get more value from their data by integrating Cloudera Search with external applications.

Learn a modern toolset

Cloudera Search brings full-text, interactive search and scalable, flexible indexing to Hadoop and an enterprise data hub. Powered by Apache Solr, Search delivers scale and reliability for a new generation of integrated, multi-workload queries.

Get hands-on experience

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning how to:

  • Perform batch indexing of data stored in HDFS and HBase
  • Perform indexing of streaming data in near-real-time with Flume
  • Index content in multiple languages and file formats
  • Process and transform incoming data with Morphlines
  • Create a user interface for your index using Hue
  • Integrate Cloudera Search with external applications
  • Improve the Search experience using features such as faceting, highlighting, and spelling correction

What to expect

This course is intended for developers and data engineers with at least basic familiarity with Hadoop and experience programming in a general-purpose language such as Java, C, C++, Perl, or Python. Participants should be comfortable with the Linux command line and should be able to perform basic tasks such as creating and removing directories, viewing and changing file permissions, executing scripts, and examining file output. No prior experience with Apache Solr or Cloudera Search is required, nor is any experience with HBase or SQL.

Course Outline

Introduction

Overview of Cloudera Search

  • What is Cloudera Search?
  • Helpful Features
  • Use Cases
  • Basic Architecture

Performing Basic Queries

  • Executing a Query in the Admin UI
  • Basic Syntax
  • Techniques for Approximate Matching
  • Controlling Output

Writing More Powerful Queries

  • Relevancy and Filters
  • Query Parsers
  • Functions
  • Geospatial Search
  • Faceting

Preparing to Index Documents

  • Overview of the Indexing Process
  • Generating Configuration Files
  • Schemaless Mode
  • Schema Design
  • Collection Management
  • Using Morphlines to Extract, Transform, and Load Data into Solr

Batch Indexing HDFS Data with MapReduce and Spark

  • Overview of the MapReduce HDFS Batch Indexing Process
  • Using the MapReduce Indexing Tool
  • Testing and Troubleshooting
  • Batch Indexing of Data in HDFS with Spark

Near-Real-Time Indexing with Flume

  • Overview of the Near-Real-Time Indexing Process
  • Introduction to Apache Flume
  • How to Perform Near-Real-Time Indexing with Flume
  • Testing and Troubleshooting

Indexing HBase Data with Lily

  • What is HBase?
  • Batch Indexing for HBase
  • Indexing HBase Tables in Near-Real-Time

Understanding Language and File Type Support

  • Field Types and Analyzer Chains
  • Word Stemming, Character Mapping, and Language Support
  • Schema and Analysis Support in the Admin UI
  • Metadata and Content Extraction with Apache Tika
  • Indexing Binary File Types with SolrCell

Improving Search Quality and Performance

  • Delivering Relevant Results
  • Helping Users Find Information
  • Query Performance and Troubleshooting

Building User Interfaces for Search

  • Search UI Overview
  • Building a User Interface with Hue
  • Integrating Search into Custom Applications

Considerations for Deployment

  • Planning for Deployment
  • Determining Hardware Needs
  • Security Overview
  • Collection Aliasing

Conclusion

I immediately began using lessons from my Cloudera class to address and revisit several real-world issues and use cases that had been problematic, and was able to quickly create working code that yielded desired results. 

Intel

Learn More

Advance Your Career

Big data analysts are among the world's most in-demand and highly compensated technical professionals. Check out some of the currently listed job opportunities that match this professional profile, many of which seek experience with Search and Solr.

Private Training

We also provide private training at your site, at your pace, and tailored to your needs.
