Traditional data warehouse environments are being overwhelmed by the soaring volume and variety of data pouring in from cloud, mobile, social media, machine, sensor, and other sources, and the problem will only worsen as big data continues to grow. IT organizations facing performance degradation in warehouses that are approaching capacity are already considering costly upgrades. However, an upgrade is not the most effective way to manage an excess of seldom-used data, nor does it free the valuable CPU cycles currently consumed by compute-intensive extract, load, and transform (ELT) jobs. To keep pace with exploding data volumes, the data warehouse itself needs to evolve.
One emerging strategy is data warehouse optimization, which uses Hadoop as an enterprise data hub to augment an existing warehouse infrastructure. By deploying the Hadoop framework to stage and process raw or rarely used data, you can reserve the warehouse for high-value information that business users access frequently. This white paper outlines a new reference architecture for this strategy, jointly developed by Informatica and Cloudera to help organizations speed time to value, maximize productivity, lower costs, and minimize risk. The reference architecture draws on complementary technology from Informatica, the leading provider of data integration software, and Cloudera, the leader in enterprise analytic data management powered by Apache™ Hadoop®. It supplies a blueprint for augmenting legacy warehouses to increase capacity and optimize performance, enabling organizations to better capitalize on the business value of big data.