Using Flume with Avro

The HDFSEventSink, which serializes event data onto HDFS, supports plugin implementations of the EventSerializer interface. An implementation of this interface has full control over the serialization format, so it can be used when the default format provided by the sink does not suffice.

Flume ships with an abstract implementation of the EventSerializer interface called AbstractAvroEventSerializer. This class can be extended to support a custom schema for Avro serialization over HDFS. Flume also provides a simple concrete implementation, FlumeEventAvroEventSerializer, which maps each event to an Avro record consisting of a string header map and a byte payload. To use it, set the serializer property of the sink as follows:

<agent-name>.sinks.<sink-name>.serializer = AVRO_EVENT
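
For context, a fuller sink definition might look like the sketch below. It is illustrative only: the agent, sink, and channel names and the HDFS path are hypothetical placeholders. It also assumes that the sink writes a plain data stream rather than a SequenceFile, and that the optional serializer.compressionCodec setting of the Avro event serializer is wanted.

```properties
# Hypothetical names: agent1, hdfsSink, memChannel. Replace with your own.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel

# Hypothetical target directory on HDFS.
agent1.sinks.hdfsSink.hdfs.path = /flume/events

# Assumption: write a raw stream so that the configured serializer
# determines the on-disk format.
agent1.sinks.hdfsSink.hdfs.fileType = DataStream

# Use the built-in Avro event serializer described above.
agent1.sinks.hdfsSink.serializer = AVRO_EVENT

# Optional: compress Avro data blocks (assumes Snappy is available).
agent1.sinks.hdfsSink.serializer.compressionCodec = snappy
```

With a configuration along these lines, each file written by the sink should be an Avro container file whose records carry the event headers and body.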