Data Science Challenge

Solve Real-World Problems Faced by Top Data Scientists

In the challenge portion of CCP:DS, candidates compete against each other and against a benchmark set by a committee including some of the world's elite data scientists. Participants who surpass evaluation benchmarks receive the CCP: Data Scientist credential.

Prerequisite: Data Science Essentials (DS-200)
Schedule: Twice per year
Duration: Three months from launch date
Next Challenge Dates: Fall 2014
Language: English
Price: USD $600




Most Recent Challenge

Detecting Anomalies in Medicare Claims

In the U.S., Medicare reimburses private providers for medical procedures performed for covered individuals. As such, it needs to verify that the types of procedures performed and the costs of those procedures are consistent and reasonable. It also needs to detect possible errors or fraud in providers' claims for reimbursement. You have been hired to analyze a large amount of Medicare data and try to detect anomalies -- providers, areas, or patients with unusual procedures and/or claims.

You have access to summary data that aggregate information on procedures performed and billed by providers in 2011, as well as how much Medicare reimbursed.

Inpatient and outpatient procedures are distinct categories that don’t overlap, although a single provider might perform procedures of both types. Inpatient procedures are coded with DRG codes, and outpatient procedures with APC codes.

You also have access to actual individual patient procedures from 2013 and a small Hadoop cluster with data and relevant tools already loaded.

The following questions have been posed to you:

  1. We think that some providers and regions are consistently billing too much for procedures or billing for too many procedures, perhaps inadvertently.
    • Which procedures have the highest relative variance in cost?
    • For each procedure, consider the average amount claimed by each provider. Which three providers had the highest average amount claimed for the largest number of procedures?
  2. Some providers and regions are likely to be different in more subtle ways.
    • Based on amount and type of procedures claimed, which three providers and regions are least like the others?
    • Can you briefly explain what seems to be different about these?
  3. We have a lot of individual patient claim data. Our staff have identified several patients whose claims look unusual -- they could be errors or fraud or simply unique patient contexts that are worth review.
    • Given this information, identify another 10,000 patients that seem most likely to need review.
    • Can you briefly describe some common features in these patients?
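
As a rough illustration of how the two parts of question 1 might be approached, here is a minimal Python sketch. It is not the challenge's reference solution. It assumes that "relative variance" means the coefficient of variation (standard deviation divided by mean) and that the summary data hold one average claimed amount per provider per procedure. The record layout, procedure codes, provider ids, and amounts are all invented for illustration, as are the function names.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical toy records: (procedure_code, provider_id, average_amount_claimed).
# These values are made up for illustration; they are not Medicare data.
RECORDS = [
    ("P1", "A", 100.0),
    ("P1", "B", 110.0),
    ("P1", "C", 300.0),
    ("P2", "A", 50.0),
    ("P2", "B", 52.0),
    ("P2", "C", 55.0),
    ("P3", "A", 900.0),
    ("P3", "B", 400.0),
    ("P3", "C", 500.0),
]


def relative_variance(records):
    """Return {procedure: coefficient of variation} of the claimed amounts.

    Assumption: "relative variance" is read as population standard
    deviation divided by the mean, computed across providers.
    """
    amounts = defaultdict(list)
    for procedure, _provider, amount in records:
        amounts[procedure].append(amount)
    return {
        procedure: pstdev(values) / mean(values)
        for procedure, values in amounts.items()
        if mean(values) > 0  # skip procedures with no positive claims
    }


def top_claimers(records, n=3):
    """Return the n providers that top the most procedures by claimed amount.

    Assumption: each (procedure, provider) pair appears at most once.
    """
    best = {}  # procedure -> (highest amount seen, provider who claimed it)
    for procedure, provider, amount in records:
        if procedure not in best or amount > best[procedure][0]:
            best[procedure] = (amount, provider)
    wins = Counter(provider for _amount, provider in best.values())
    return wins.most_common(n)


if __name__ == "__main__":
    cv = relative_variance(RECORDS)
    for procedure in sorted(cv, key=cv.get, reverse=True):
        print(procedure, round(cv[procedure], 3))
    print(top_claimers(RECORDS))
```

On data at the scale described in the challenge, the same grouping logic would more plausibly run as a job on the provided Hadoop cluster than in memory on one machine; the sketch only shows the shape of the computation.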

Rules

Individual Contributions Only
You must participate in this challenge only on an individual basis; teams are not permitted.

Sharing
Any sharing of code or solutions or collaboration with another person or entity is strictly forbidden.

Tools
You may use any tools or software you desire to complete the challenge.