As a leading job search engine worldwide, Jobrapido’s mission is to “take the work out of looking for work.” To this end, the company aggregates more than 20 million jobs monthly for over 70 million registered users in 58 countries.
By building an enterprise data hub on Cloudera Enterprise, Jobrapido is gaining new insight into job seekers, which helps it increase traffic acquisition, job placement rates, and revenue.
Working with Cloudera, Jobrapido can now conduct deeper cohort analysis of user behavior and engagement. This, in turn, has improved traffic acquisition and the company's ability to target job seekers. Businesses have taken notice.
“Thanks to Cloudera, we are increasing ROI on SEM [search engine marketing] processes,” said Serrecchia. “We can now compute in near-real time the lifetime value of every job seeker we acquire and we're also able to increase conversion rates and monetization, thanks to a much deeper A/B testing suite.”
With the company's existing data warehouses, Jobrapido data scientists found that extract, transform, and load (ETL) and data processing took too long. These processes also often excluded important data sources needed to understand the lifetime value and needs of job seekers. These constraints were holding the company back from its mission to "take the work out of looking for work."
"At Jobrapido, data is considered an important asset, and we put a lot of effort and resources into leveraging it," said Serrecchia. "Our old data architecture was built around a classic ETL pipeline composed of several jobs running on the same server. Obviously, this solution could not scale its computational power in step with data volume growth."
Executives viewed moving to a modern data platform as critical to the company's business success. "The correct analysis and management of big data is essential today," added Michele Pinto, technical team leader, Big Data, Jobrapido. "We want to guarantee accurate results to every single user, every time a candidate searches for a job on our platform. To do so, a big data architecture is crucial to store as much data as possible."
“We have revolutionized our data platform, replacing it with a new big data architecture from Cloudera that is scalable in throughput and provides the computational power that matches our data growth rate,” said Pinto. “The result is a solution with low maintenance costs that removes any bottleneck in tracking events. Because building new tracking is much less costly, it also helps us create the data culture that we need in order to achieve our goals.”
The platform not only provides fast, flexible data processing, but also supports high-performance SQL business intelligence (BI) and exploration. Data is now available in real time for various tasks including reporting, visualization, analytics, and machine learning.
For example, the company applies machine learning algorithms for profile scoring as well as document clustering and classification. With the ability to process, store, and analyze greater data volumes and a wider range of data, including unstructured text, the company is dramatically improving the accuracy of its classification algorithms.
Additionally, the Cloudera platform provides BI analysts with fast time-to-insight using Apache Impala (incubating). “Our department’s main mission is to give all company stakeholders access to the data,” said Serrecchia. “We use Impala to boost performance of our SQL queries against our data lake. Impala is an incredible service that gives us impressive performance on queries.”
Cloudera Support also provided Jobrapido staff with valuable expertise to help ensure their success in deploying a new big data architecture.
Headquarters: Milan, Italy
- Modern Data Platform: Cloudera Enterprise
- Workloads: Analytic Database, Data Science and Data Engineering, Operational Database
- Components: Apache Flume, Apache HBase, Apache Hive, Apache Kafka, Apache Impala (incubating), Apache Oozie, Apache Spark, Apache Avro, Hue
- Databases: HP Vertica, PostgreSQL
- BI & Analytics Tools: KNIME, Microsoft Power BI, Tableau, R
- ETL Tool: Talend
Use Case
- Business Intelligence
Data Sources
- Clickstream data
- Relational data
- Applications and services logs
- Social media data
- Google AdWords
- Google AdSense
- Google Analytics
- Data Management Platform (Lotame)
Impact
- Increased conversion rates
- Increased ROI on SEM (search engine marketing) processes
- Increased traffic acquisition
Big Data Scale
- 2 TB/month