The company collects data and feedback from every digital event related to online advertising campaigns and streams them into a Hadoop-based learning system, built from scratch, that performs several functions:
- Performs real-time extract, transform, and load (ETL) processing
- Sorts data by user
- Builds media buying algorithms
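The first two functions can be pictured with a minimal sketch in plain Python. It is an illustration only: the event fields (`user_id`, `ts`, `action`) and the function names are assumptions, not the company's actual schema or code, and the real system runs these steps at scale on Hadoop rather than in a single process.

```python
import json
from collections import defaultdict


def transform(raw_line):
    """Parse one raw JSON event line; return None if it is malformed.

    The field names used here are hypothetical.
    """
    try:
        event = json.loads(raw_line)
        if not isinstance(event, dict):
            return None
        return {
            "user_id": str(event["user_id"]),
            "ts": int(event["ts"]),
            "action": event.get("action", "unknown"),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Drop records that are not valid JSON or lack required fields.
        return None


def group_by_user(raw_lines):
    """Clean the events, group them by user, and order each group by time."""
    by_user = defaultdict(list)
    for line in raw_lines:
        event = transform(line)
        if event is not None:
            by_user[event["user_id"]].append(event)
    for events in by_user.values():
        events.sort(key=lambda e: e["ts"])
    return dict(by_user)
```

Grouping by user before any modeling step mirrors the shuffle phase of a MapReduce job, where all records sharing a key are brought together for a single reducer.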
The system captures information on how consumers engage with different ads, including where and how long they hover, what they click on, and how they navigate across the internet. This information is combined with data from advertisers, the company’s own customer data, feeds from third-party providers, and more. The company uses HBase to provide real-time read/write data access in Hadoop, and runs analytics on Hive, based on custom-built machine learning algorithms implemented as MapReduce code. The petabyte-scale Hadoop system is integrated with a 100-TB Greenplum reporting data warehouse that offers analytics and business intelligence to clients via TIBCO Spotfire. Data is currently migrated from Hadoop to Greenplum on an hourly or daily basis, depending on user requirements. The results of this learning system are sent to a media buying system that is connected to 30 different media suppliers around the world.
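The hourly-or-daily migration cadence can be sketched as a simple scheduling check. This is a hedged illustration: the dataset names, the `INTERVALS` table, and both functions are hypothetical, and the source does not describe how the transfer from Hadoop to Greenplum is actually triggered or performed.

```python
from datetime import datetime, timedelta

# Assumed mapping from a user-requested cadence to a minimum export interval.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
}


def is_export_due(cadence, last_export, now):
    """Return True when at least one full interval has passed since last_export."""
    return now - last_export >= INTERVALS[cadence]


def due_exports(datasets, now):
    """Return the sorted names of datasets whose next export is due.

    datasets maps a dataset name to a (cadence, last_export) tuple.
    """
    return sorted(
        name
        for name, (cadence, last_export) in datasets.items()
        if is_export_due(cadence, last_export, now)
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    datasets = {
        "campaign_summary": ("hourly", datetime(2024, 1, 2, 10, 30)),
        "audience_report": ("daily", datetime(2024, 1, 2, 0, 0)),
    }
    print(due_exports(datasets, now))
```

Keeping the cadence per dataset, rather than as one global setting, reflects the statement that the schedule depends on user requirements.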