Harnessing the value of data in real time is an increasingly common use case for Hadoop. However, advancements were necessary to fill important gaps in the storage layer. When building applications, users often had to choose between two options: extremely fast analytic scan rates with no ability to handle real-time modifications (HDFS with Apache Parquet), or very fast random access at the cost of scan performance (Apache HBase). For real-time analytic applications that required fast analytic performance over data that was continuously being updated, we started to see complex “hybrid architectures” emerge.

