Data Science Challenge
Solve Real-World Problems Faced by Top Data Scientists
In the challenge portion of CCP:DS, candidates compete against each other and against a benchmark set by a committee that includes some of the world's elite data scientists. Participants who surpass the evaluation benchmarks receive the CCP: Data Scientist credential.
Prerequisite: Data Science Essentials (DS-200)
Schedule: Twice per year
Duration: Three months from launch date
Next Challenge Dates: Fall 2014
Price: USD $600
- Solution Kit: live data set, tutorial, and solution explanation from the Web Analytics Challenge
- Previous Challenge: overview of the most recent closed challenge
- Inaugural Challenge: overview of the very first challenge
- Meet the Data Scientists: short bios of the most recent CCP:DS class
- Pearson VUE Instructions: please read prior to registering
- Exam Policies and NDA: must be signed before beginning the challenge
Most Recent Challenge
Detecting Anomalies in Medicare Claims
In the U.S., Medicare reimburses private providers for medical procedures performed for covered individuals. It therefore needs to verify that the types of procedures performed, and their costs, are consistent and reasonable. It also needs to detect possible errors or fraud in providers' claims for reimbursement. You have been hired to analyze a large amount of Medicare data and detect abnormal data: providers, areas, or patients with unusual procedures and/or claims.
You have access to the following summary data, which aggregate information on procedures performed and billed by providers in 2011, as well as how much Medicare reimbursed:
Inpatient and outpatient procedures are distinct, non-overlapping types. A provider may perform procedures of both types. Inpatient procedures are coded with DRG (Diagnosis-Related Group) codes, and outpatient procedures with APC (Ambulatory Payment Classification) codes.
You also have access to actual individual patient procedure records from 2013, and to a small Hadoop cluster with the data and relevant tools already loaded.
The following questions have been posed to you:
- We think that some providers and regions are consistently billing too much for procedures or billing for too many procedures, perhaps inadvertently.
  - Which procedures have the highest relative variance in cost?
  - For each procedure, consider the average amount claimed by each provider. Which three providers had the highest average amount claimed for the largest number of procedures?
- Some providers and regions are likely to be different in more subtle ways.
  - Based on the amount and type of procedures claimed, which three providers and regions are least like the others?
  - Can you briefly explain what seems to be different about these?
- We have a lot of individual patient claim data. Our staff have identified several claims that look unusual: they could be errors, fraud, or simply unique patient contexts that are worth reviewing.
  - Given this information, identify another 10,000 patients who seem most likely to need review.
  - Can you briefly describe some common features of these patients?
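The questions above leave the methods open. As one possible starting point, here is a minimal Python sketch over small in-memory data. It assumes a hypothetical record layout of (provider_id, region, procedure_code, avg_claimed_usd), which is not the challenge's actual schema. It also makes several interpretive choices that are assumptions, not the committee's intended solutions. "Relative variance" is taken as variance divided by the squared mean. "Highest average for the largest number of procedures" is read as counting, for each provider, the procedures where its average claim is the highest. "Least like the others" is scored as distance from the average standardized profile. Finally, patients needing review are ranked by nearest-neighbor distance to the staff-flagged patients. All function names are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict

# Hypothetical record layout (an assumption, not the challenge schema):
# (provider_id, region, procedure_code, avg_claimed_usd)
Record = tuple[str, str, str, float]


def relative_variance(records: list[Record]) -> dict[str, float]:
    """Per procedure: sample variance / mean**2 of the providers' average claims."""
    by_proc: dict[str, list[float]] = defaultdict(list)
    for _, _, proc, amount in records:
        by_proc[proc].append(amount)
    result = {}
    for proc, amounts in by_proc.items():
        mean = statistics.fmean(amounts)
        # A sample variance needs at least two providers; skip zero means.
        if len(amounts) >= 2 and mean > 0:
            result[proc] = statistics.variance(amounts) / mean ** 2
    return result


def top_claimers(records: list[Record], k: int = 3) -> list[tuple[str, int]]:
    """Count, per provider, the procedures where it has the highest average claim."""
    best: dict[str, tuple[float, str]] = {}
    for provider, _, proc, amount in records:
        if proc not in best or amount > best[proc][0]:
            best[proc] = (amount, provider)
    wins = Counter(provider for _, provider in best.values())
    return wins.most_common(k)


def outlier_scores(profiles: dict[str, dict[str, float]]) -> dict[str, float]:
    """Euclidean norm of each entity's z-scored profile (its distance from the mean profile).

    profiles maps an entity (provider, region, or patient) to {feature: value};
    a missing feature is treated as 0.
    """
    features = sorted({f for p in profiles.values() for f in p})
    stats = {}
    for f in features:
        col = [p.get(f, 0.0) for p in profiles.values()]
        sd = statistics.pstdev(col) or 1.0  # constant feature: avoid dividing by zero
        stats[f] = (statistics.fmean(col), sd)
    scores = {}
    for entity, p in profiles.items():
        z = [(p.get(f, 0.0) - stats[f][0]) / stats[f][1] for f in features]
        scores[entity] = math.sqrt(sum(v * v for v in z))
    return scores


def most_like_flagged(
    vectors: dict[str, list[float]], flagged: set[str], n: int
) -> list[str]:
    """Return the n unflagged patients closest to any flagged patient.

    Requires at least one flagged patient present in vectors.
    """
    def nearest(pid: str) -> float:
        return min(math.dist(vectors[pid], vectors[f]) for f in flagged)

    candidates = [pid for pid in vectors if pid not in flagged]
    return sorted(candidates, key=nearest)[:n]
```

Each function is plain Python so that its logic is easy to check. On the full claim data, the same aggregations would more likely be expressed in whatever tools the cluster provides. Nearest-neighbor ranking is only one simple way to generalize from the flagged examples.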
Individual Contributions Only
You must participate in this challenge only on an individual basis; teams are not permitted.
Any sharing of code or solutions or collaboration with another person or entity is strictly forbidden.
You may use any tools or software you desire to complete the challenge.