Using Avro with MapReduce

The Avro MapReduce API is an Avro module for running MapReduce programs which produce or consume Avro data files.

If you are using Maven, simply add the following dependency to your POM:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.3</version>
    <classifier>hadoop2</classifier>
</dependency>

Then write your program using the Avro MapReduce javadoc for guidance.

At runtime, include the avro and avro-mapred JARs in the HADOOP_CLASSPATH; and the avro, avro-mapred and paranamer JARs in -libjars.

To enable Snappy compression on output files call AvroJob.setOutputCodec(job, "snappy") when configuring the job. You will also need to include the snappy-java JAR in -libjars.