By implementing Cloudera as its advanced analytics sandbox, Equifax can process and analyze large volumes of data faster, allowing data scientists to build better models based on larger data sets.
Equifax organizes, assimilates, and analyzes data on more than 600 million consumers and 80 million businesses worldwide to deliver insights that support a wide range of applications—from assessing a consumer’s credit risk to helping businesses grow their customer base to helping government agencies fight fraud.
Equifax’s success depends on its ability to perform risk analysis and build data models very quickly, with a high degree of statistical accuracy. As data volumes grew, so did the company’s need for faster turnaround.
By deploying an advanced analytics environment powered by Cloudera as part of its long-term enterprise data hub (EDH) strategy, Equifax enables its data scientists to analyze much larger data sets and build models in less time than with prior methods, while maintaining a high degree of accuracy. The real-time insights delivered by Apache Impala also let data scientists test new theories more quickly as they design new data models.
Equifax’s primary goal was to reduce the time it took to develop new data models, and the deployment of the new analytics platform is expected to deliver dramatic time savings. Additionally, data scientists can perform real-time queries in their analytics sandbox, without having to move the data to another platform. This reduces the cost and effort of managing and synchronizing data across different environments.
- Reduce development time for new data models, which could take weeks to build and deliver
- Perform deeper analytics on larger data sets
- Apache Hadoop Platform: Cloudera Enterprise
- Apache Hadoop Components: Apache Hive, Apache Impala, Apache Mahout, Apache Sentry, Apache Spark
- ETL tool: Talend
- Analytics tools: Alpine Data Labs, R, SAS
- Security tool: Protegrity
- Servers: HP ProLiant DL360p Gen8 Servers using Intel® Xeon® E5-2600 v2 processors (management node), HP ProLiant SL4540 Gen8 Servers using Intel Xeon E5-2400 processors (edge node)
- Analytics sandbox
- Significant time reduction in delivery of new data models
- Deeper insights to help solve customer challenges
- Simplified management and reduced costs
Big Data Scale
- Analysis of 5 years of data