
500 billion human resources data aggregates created

Key highlights


Business Services


Headquarters: Roseland, NJ

Solution highlights

  • Modern Data Platform: Cloudera Enterprise, Data Hub Edition
  • Apache Hadoop Components: Apache Hive, Apache Impala, Apache Spark, MapReduce
  • Data Science Tools: Python, R, Scala
  • Databases: Oracle, Mainframes
  • BI and Analytics Tools: Qlik, Tableau, H2O

Applications supported

  • DataCloud product providing predictive analytics and insights to HR clients
  • Enterprise data hub supporting 15 internal departments and numerous use cases

Data sources

  • 600,000+ client databases capturing information on 29 million people
  • 30–35 million annual pay cycles processed by ADP
  • 15 million HR functions managed annually by ADP
  • 15 departments across ADP
  • Client data sets such as point-of-sale transactions and revenues


  • Allows clients to focus HR retention efforts on employees at highest risk of leaving
  • Helps clients understand how they compare to peers regarding pay and benefits
  • Tremendous growth among ADP client base, driving greater revenues

Big data scale

  • 1 billion records loaded per quarter, moving to monthly
  • 500 billion aggregates created
  • 200-TB lab plus two 400-TB production data centers, each with replication for disaster recovery

Automatic Data Processing (ADP), the Fortune 500 global provider of human capital management (HCM) solutions, has built a product called DataCloud, powered by Cloudera, that aggregates information across its 600,000 clients and generates insights to help clients prevent employee churn, ensure salary equality, and maximize human resources.


As its name implies, data is core to Automatic Data Processing (ADP)’s business; the 60-year-old Fortune 500 provider of human capital management (HCM) solutions is responsible for getting one in six Americans paid today. This puts tremendous data in ADP’s hands—and payroll is only one small piece of the work it does.

ADP is now putting that data to use and generating a new revenue stream by aggregating information across its 600,000 clients into a disruptive product offering, powered by data science on Apache Hadoop, that helps clients prevent employee churn, ensure salary equality, and maximize human resources. The product is DataCloud.


Marc Rind, Vice President of Product Development and Chief Data Scientist at ADP, explains, “Whenever we speak to HR professionals at our clients, I ask the same question... What is the absolute most important part of your job? Why are you here? The answer is always the same: to find and keep the best talent possible.”

ADP’s DataCloud supports the HR mission by helping clients prevent employee turnover with a model that:

  • Identifies employees at risk of leaving

  • Provides information about the at-risk employee’s job type, location, duration in current role, and management organization

  • Delivers tools and information to share with the employee’s manager so they can address the situation

In a pilot at one account, overall employee turnover was 17 percent. Using DataCloud, ADP identified the top one percent of at-risk employees and learned that turnover within that group was actually 50 percent. With that top one percent removed from the analysis, average turnover dropped to nine percent. DataCloud helped the client focus on a small population of at-risk employees where it could make a meaningful impact and drastically improve the company’s overall churn. Without this insight, the client would have spread retention efforts across the entire employee base, a less targeted approach that takes more time and resources and has a lower overall impact.
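The kind of segmentation described in the pilot can be illustrated with a minimal Python sketch. It is not ADP's model: the employee records, the risk scores, the 5 percent cutoff, and the function names `turnover_rate` and `split_by_risk` are all hypothetical, chosen only to show how turnover in a high-risk group can be compared with turnover in the rest of the workforce.

```python
# Hypothetical illustration only: synthetic employees with made-up risk scores.

def turnover_rate(employees):
    """Share of employees in the group who left; 0.0 for an empty group."""
    if not employees:
        return 0.0
    return sum(1 for e in employees if e["left"]) / len(employees)

def split_by_risk(employees, top_fraction):
    """Return (highest-risk group, everyone else), ranked by risk score."""
    ranked = sorted(employees, key=lambda e: e["risk"], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    return ranked[:cutoff], ranked[cutoff:]

# 200 synthetic employees; a lower id means a higher (assumed) risk score.
# Six of the ten riskiest employees left, plus every tenth employee after that.
employees = [
    {"id": i, "risk": 1 - i / 200, "left": i < 6 or (i >= 10 and i % 10 == 0)}
    for i in range(200)
]

at_risk, rest = split_by_risk(employees, top_fraction=0.05)

print(f"overall turnover: {turnover_rate(employees):.1%}")
print(f"top 5% at-risk:   {turnover_rate(at_risk):.1%}")
print(f"everyone else:    {turnover_rate(rest):.1%}")
```

In this synthetic data, turnover in the small high-risk group is several times the rate in the rest of the workforce, which is the pattern that makes targeted retention efforts worthwhile.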

Reducing employee churn has far-reaching business impacts. The cost of losing one employee is more than a simple hiring replacement. Recruiting and interviewing for that person’s replacement is costly. Productivity is lost while the new hire gets up to speed. Risk of others on the team leaving increases when they’re forced to pick up the slack. It’s a ripple effect.

DataCloud not only allows clients to look at their own data around human capital and employee retention, but also helps them understand how they compare to similar organizations and where/how they can make improvements. They can answer questions like:

  • Are we paying our workers appropriately based on industry averages for similar jobs, regardless of race, age, and gender?

  • What are bonus expectations?

  • What should our overtime rate be?

The value DataCloud offers is evidenced by the massive growth ADP has seen throughout its client base, driving greater success for ADP via this new revenue channel.

Business Drivers

DataCloud stemmed from a strategic shift at ADP to move from primarily processing transactions to also providing insights based on its greatest asset: data.

When considering whether to build this product, ADP reached out to clients to:

  • Gauge their interest in gaining insights based on aggregated and anonymized benchmarks developed from the data spanning ADP’s customer base

  • Give them the opportunity to opt out of participating in such a program

When only seven percent opted out, ADP knew the opportunity was real.

Making the vision a reality presented a technological challenge. The data was spread across data centers and applications, and it needed to be brought together for processing, exploration, and analysis. Doing so wasn’t feasible with traditional relational database technology.


DataCloud runs on Cloudera Enterprise, comprising a 200-terabyte (TB) lab and two 400-TB production data centers, each with replication for disaster recovery. Tools including Apache Impala, Apache Spark, and Tableau are used to process data and benchmarks, and to facilitate data exploration and analysis.

Ten data domains feed DataCloud a billion records every quarter, including:

  • 600,000-plus client databases capturing information on 29 million people

  • Mainframe-based data from the 30 to 35 million pay cycles ADP executes annually, including compensation, time card punches, bonuses, overtime, and salary increases  

  • Oracle-based data from the 15 million HR functions managed by ADP annually, such as benefit deductions and elections, performance scores, and recruiting processes

  • Data from 15 other ADP departments (such as Marketing, Sales, Implementations, and Service) that use the platform as their enterprise data hub (EDH) to build their own data products

  • Client data sets such as point-of-sale transactions and revenues

DataCloud conforms job title and role categorizations across 600,000 companies into a comparable standard from which 500 billion aggregates are created. Those aggregates are used to build the benchmarks that are delivered to clients. Jim Haas, Principal Architect at ADP, explains, “The data is drawing everybody together... Sometimes I call it ‘the little cluster that can’ because it’s just amazing what goes on in there in a day.”
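A minimal Python sketch of the general idea: conform raw job titles to a standard role, then aggregate across clients into a benchmark. Nothing here reflects ADP's actual pipeline. The title mapping, the sample records, the `min_clients` suppression threshold, and the function names are hypothetical assumptions used to illustrate aggregation that skips opted-out clients and withholds groups too small to be anonymous.

```python
from collections import defaultdict
from statistics import median

# Hypothetical mapping from raw client titles to a standard role.
TITLE_MAP = {
    "sr. software eng": "Software Engineer",
    "software engineer ii": "Software Engineer",
    "payroll clerk": "Payroll Specialist",
}

def normalize_title(raw):
    """Map a raw title to a standard role, or 'Unclassified' if unknown."""
    return TITLE_MAP.get(raw.strip().lower(), "Unclassified")

def build_benchmarks(records, opted_out, min_clients=2):
    """Median pay per standard role, skipping opted-out clients.

    Roles seen at fewer than min_clients distinct clients are suppressed
    (an assumed anonymization rule, not one stated in the case study).
    """
    pays = defaultdict(list)
    clients = defaultdict(set)
    for r in records:
        if r["client"] in opted_out:
            continue
        role = normalize_title(r["title"])
        pays[role].append(r["pay"])
        clients[role].add(r["client"])
    return {
        role: {"median_pay": median(values), "n": len(values)}
        for role, values in pays.items()
        if len(clients[role]) >= min_clients
    }

# Synthetic records from four hypothetical clients; client D opted out.
records = [
    {"client": "A", "title": "Sr. Software Eng", "pay": 120000},
    {"client": "B", "title": "Software Engineer II", "pay": 100000},
    {"client": "C", "title": "software engineer ii ", "pay": 110000},
    {"client": "D", "title": "Sr. Software Eng", "pay": 500000},
    {"client": "A", "title": "Payroll Clerk", "pay": 50000},
]

benchmarks = build_benchmarks(records, opted_out={"D"})
print(benchmarks)
```

Here the payroll role appears at only one client, so it is withheld from the output, and the opted-out client's outlier salary never enters the software engineering median.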

Why Cloudera

ADP selected Cloudera for a few reasons:

  • Stability and support: By packaging, integrating, testing, and maintaining many Hadoop components in a single stable distribution, Cloudera allows ADP’s technical resources to focus on what matters to them: building product.

  • Security: ADP deals with highly sensitive data, so ensuring its data would be safe, anonymized, and secure was critical.

  • Partnership: With the right blend of technical acumen and business vision, Cloudera provides a strategic partnership, not just a product. Together, ADP and Cloudera built and deployed a platform within a few months that delivered immediate value and supports future use cases.

“Cloudera has been one of the best companies I’ve ever seen in terms of customer support. Whether talking about architecture or if there are issues with a framework, I get communicated to immediately... If I need a fix, they provide it overnight.”

-Jim Haas, Principal Architect, ADP