An online retail pioneer since 1996, Connexity, Inc. (formerly Shopzilla, Inc.) operates a portfolio of shopping websites and is recognized as a leading source for connecting buyers and sellers in the digital world. Connexity reaches a global audience of more than 40 million shoppers each month, linking them with over 100 million products from tens of thousands of retailers. In addition to its namesake site, the company's portfolio of comparison shopping, merchant ratings, and review sites includes Bizrate, Bizrate Insights, Beso, and Tada, along with a more recent addition, Connexity, which provides audience targeting.
To process and deliver insights on millions of pageviews and ten billion ad bid requests daily, reaching over 100 million unique visitors, Connexity has deployed Cloudera to complement its enterprise data warehouse (EDW) in a hybrid big data environment that meets the needs of a wide range of users. A large amount of processing, cleansing, and transformation is done in the Hadoop environment, and the aggregated data is then pushed into the data warehouse via Apache Sqoop for reporting. For the opposite direction, Connexity has written a custom tool, known internally as Forklift, that moves data from the EDW into CDH (Cloudera's Hadoop distribution) in an optimized fashion.
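The aggregate-then-export flow described above can be sketched roughly as follows. This is an illustration only, not Connexity's actual pipeline: the in-memory roll-up stands in for work that would run as Hadoop jobs, and the function names, JDBC URL, table name, and HDFS path are all hypothetical. The `sqoop export` flags used are standard Sqoop options.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aggregate_pageviews(
    events: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, int]]:
    """Roll raw (date, site) pageview events up into (date, site, count) rows.

    Stand-in for the aggregation that would happen on the Hadoop cluster.
    """
    counts = Counter(events)
    return sorted((day, site, n) for (day, site), n in counts.items())


def build_sqoop_export(
    jdbc_url: str, table: str, export_dir: str, mappers: int = 4
) -> List[str]:
    """Build the argument list for a `sqoop export` from HDFS into the EDW."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,          # hypothetical warehouse JDBC URL
        "--table", table,               # hypothetical target table
        "--export-dir", export_dir,     # hypothetical HDFS directory of aggregates
        "--input-fields-terminated-by", "\t",
        "--num-mappers", str(mappers),
    ]


if __name__ == "__main__":
    rows = aggregate_pageviews([
        ("2015-01-01", "bizrate"),
        ("2015-01-01", "bizrate"),
        ("2015-01-01", "beso"),
    ])
    print(rows)
    print(" ".join(build_sqoop_export(
        "jdbc:postgresql://edw.example.com/reporting",
        "daily_pageviews",
        "/data/aggregates/daily_pageviews",
    )))
```

Only the small aggregated table crosses into the warehouse, which is the point of the pattern: the heavy lifting stays on the cluster, and the EDW receives reporting-sized data.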
Impact: Faster Processing
For its merchandising process, Connexity ingests more than 15,000 feeds covering 100 million products from retailers and processes them with Cloudera each day. What once took several days has been reduced to just a few hours, and a new approach is being tested that will reduce it further to minutes. The improved processing performance also benefits Connexity’s search engine marketing (SEM) activities.
Impact: Detailed Insight, Relevant Results
Connexity uses a combination of Apache Mahout and R running on Cloudera to perform classifications and user segmentation on its analytics and research cluster.
Industry: Retail, Business Services
- Legacy system was maxed out; took several days to process 100 million products
- 500 TB EDW growing by 5 TB every day
- Hybrid big data environment supports online price comparison services, SEO, SEM, merchandising, audience scoring, and data science
- Data processing reduced from days to hours or minutes
- Real-time reporting on 10 billion ad bid requests daily
- Improved monetization of data science and business analysis processes
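To put the figures above in perspective, a quick back-of-the-envelope calculation converts them into rates. This is a rough sketch: it assumes requests are spread evenly over the day and that the EDW grows linearly, neither of which the source states, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_second(requests_per_day: int) -> float:
    """Average request rate, assuming traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def daily_growth_percent(size_tb: float, growth_tb_per_day: float) -> float:
    """Daily growth as a percentage of the current warehouse size."""
    return 100.0 * growth_tb_per_day / size_tb


def days_to_double(size_tb: float, growth_tb_per_day: float) -> float:
    """Days until the warehouse doubles, assuming constant linear growth."""
    return size_tb / growth_tb_per_day


if __name__ == "__main__":
    print(round(requests_per_second(10_000_000_000)))  # 115741 requests/s on average
    print(daily_growth_percent(500, 5))                # 1.0 (percent per day)
    print(days_to_double(500, 5))                      # 100.0 days
```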