Apache Crunch Incompatible Changes and Limitations

The following changes introduced in CDH 5.2 are not backward compatible:
  • The MemPipeline now checks that every DoFn passed to it is serializable. This is intended to catch non-serializable DoFns during in-memory testing, before the pipeline is run on a cluster.
  • Scala's Iterable has been replaced by TraversableOnce as the return type of functions passed to Scrunch flatMap, so that these functions can also return Iterators.
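The serializability requirement can be illustrated without Crunch itself. The following is a minimal sketch that assumes plain Java serialization (ObjectOutputStream) as a stand-in for whatever check MemPipeline actually performs. GoodFn, BadFn, and ExternalClient are hypothetical classes, not Crunch DoFns. They only show how a function object that declares Serializable but holds a non-serializable field fails to serialize.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializabilityCheck {

    // Hypothetical stand-in for a non-serializable resource (for example, a client or connection).
    static class ExternalClient {
    }

    // Hypothetical function class whose fields are all serializable.
    static class GoodFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String prefix = "row-";
    }

    // Hypothetical function class that declares Serializable but holds a
    // non-serializable field: the kind of mistake a serializability check surfaces.
    static class BadFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private final ExternalClient client = new ExternalClient();
    }

    // Returns true if the object can be written with standard Java serialization.
    static boolean isSerializable(Object fn) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(fn);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when any object reachable from fn does not implement Serializable.
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("GoodFn serializable: " + isSerializable(new GoodFn()));
        System.out.println("BadFn serializable: " + isSerializable(new BadFn()));
    }
}
```

Running main prints true for GoodFn and false for BadFn. In real Crunch code, a common way to avoid the failure is to mark such a field transient and create the resource in the DoFn's initialize() method, which runs after the DoFn has been deserialized.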

CDH 5.4.0 introduces new HBase APIs, so Crunch code developed against the HBase 0.96 APIs will probably need some changes. For more information, see the section on Apache Crunch under "What's New in CDH 5.4.0".