Zoosk, an online dating site, reduced dissatisfied user churn by 10 percent and can now innovate faster with up to 80 times faster processing performance.

Overview

Zoosk is an online dating site, serving more than 38 million singles in 80 countries and 25 languages. More than three million messages are sent via the company’s mobile and web apps daily. One of Zoosk’s key differentiators is photo verification, a feature to confirm the authenticity of members’ profile pictures. Zoosk has verified over 12 million user pictures since launching this feature in 2014.

By building its enterprise data hub on Cloudera Enterprise, Zoosk gained new insights into user behavior and preferences that have helped improve the user experience and enabled singles to find a match more quickly.

Impact

For singles seeking a relationship, the question is always: When will I meet that special someone?

At Zoosk, use of Cloudera Enterprise has helped the company optimize searches to reduce the wondering and enable its customers to find a match faster.

“We conduct a lot of studies on behavior,” said Marilson Campos, senior data architect, Zoosk. “As users engage with each other on the site, we capture those events, identify patterns, and build a model for each user to ensure that his or her search results prioritize singles who are most likely to reply. Without [Apache] Hadoop and Cloudera Enterprise, it wouldn't be possible to process those events and combinations to gain the insights we have.”

Such insights have also helped Zoosk refine application features to reduce dissatisfied user churn (as opposed to churn caused by successful matchmaking events) by 10 percent. “Because we capture all of the data and can query it quickly, we can see which events are most likely to increase churn,” said Campos. “We’ve made several changes to our products as a result.”

Additionally, improved platform performance, with up to 80 times faster processing, helps the company innovate faster.

“Product managers now are free to innovate and can test ideas quickly,” said Campos. “We can see right away if an idea doesn't work and move to the next one. That's one area where [Apache] Impala really helped us. You don't want to wait hours to see results.”

Business Drivers

A significant challenge for dating sites is the amount of information that must be processed and analyzed to help a person find a match. Traditional database platforms limited the amount of data Zoosk could store and process.

“Without Apache Hadoop, there would be no viable way to process the data,” said Campos. “We're processing about five terabytes a day, and we have a little more than one petabyte of data stored.”

Through its enterprise data hub, built on Cloudera Enterprise, Zoosk can process, store, analyze, and serve all the data it needs, including unstructured data, such as web logs, and structured data from internal databases and external data sources. The company is pursuing a number of applications for its big data, including delivering more in-depth behavioral analytics and building machine learning models from large datasets.

“Traditionally, the difficulty is how to scale to build a million models in a day, and that's where the tools are really helping,” said Campos.

In building the enterprise data hub, Zoosk's data architects sought to deliver data as a product with an agreed-upon service level. This approach helped staff identify which Hadoop components would best support each use case.

For example, Apache Hive is considered the “standard” for processing the data pipeline, enabling staff to conduct complex queries on very large datasets, while Apache Impala (incubating) with Apache Parquet is used primarily for queries performing aggregations in which immediate response adds value.

“Once we looked at the steps required to deliver a specific SLA [service level agreement], we could better measure and decide which Hadoop tools to use at each stage,” said Campos. “That’s where we saw a clear need for Impala, particularly for A/B testing. Product managers wanted to see results in a few hours and without Impala we wouldn’t have been able to deliver the data fast enough. We were blown away by the performance.”
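The kind of aggregation behind an A/B test readout can be sketched as follows. This is a minimal illustration in plain Python, not Zoosk's pipeline and not an Impala query; the event fields (variant, user_id, replied), the sample rows, and the function name reply_rate_by_variant are all hypothetical and chosen only to show the group-and-aggregate pattern.

```python
from collections import defaultdict

# Hypothetical event rows: (variant, user_id, replied).
# Field names and values are illustrative only, not Zoosk data.
events = [
    ("A", 1, True),
    ("A", 2, False),
    ("A", 3, False),
    ("B", 4, True),
    ("B", 5, True),
    ("B", 6, False),
]


def reply_rate_by_variant(rows):
    """Return {variant: share of rows with a reply}, similar to a GROUP BY."""
    totals = defaultdict(int)
    replies = defaultdict(int)
    for variant, _user_id, replied in rows:
        totals[variant] += 1
        if replied:
            replies[variant] += 1
    return {variant: replies[variant] / totals[variant] for variant in totals}


rates = reply_rate_by_variant(events)
for variant in sorted(rates):
    print(f"variant {variant}: reply rate {rates[variant]:.2f}")
```

At the scale described in this case study, the same grouping would run as a query on the cluster rather than in a single process; the sketch only shows the shape of the calculation.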

Ultimately, Zoosk plans to move its Hadoop clusters to the cloud, using Cloudera Director to facilitate provisioning.

“The idea of instant processing capability comes realized with the cloud and Cloudera Director, because I can fire clusters up really easily to gain additional computing power and then shut them down when we don’t need it,” said Campos.

Why Cloudera

“Cloudera offers the best Hadoop platform because it gives you the tools to perform and execute very fast compared to other options,” said Campos. “And Cloudera Manager makes it very easy and intuitive to manage so we don’t need dedicated staff to manage our cluster.”

Additionally, Cloudera Support provided vital expertise as Zoosk’s Hadoop environment grew.

“As complexity increases, you have to have somebody to help you, particularly with upgrades, if you want to move fast,” said Campos. “Cloudera Support enables you to move fast. We had issues that we wouldn’t have been able to solve as quickly as we did without Cloudera Support.”
