

Geisinger Health System, based in Danville, Pennsylvania, is one of the largest health service organizations in the United States, serving more than 3 million residents throughout Pennsylvania and southern New Jersey. Geisinger is one of America’s leading rural healthcare providers, with an integrated, physician-led system that includes 30,000 employees, nearly 1,600 employed physicians, 12 hospital campuses, and two research centers. 

Business Challenge

Geisinger’s reputation for outstanding patient experience is due in large part to its focus on innovations that enhance patient care, its integrated health system approach, and its overall emphasis on caring and compassion. Geisinger was an early adopter of the Electronic Health Record (EHR) and has implemented it throughout the health system. This digital integration connects the system’s hospitals, 40 community practice sites, and the primary and specialty care physicians and extenders who serve patients throughout the Geisinger network.

The quality of Geisinger’s integration between EHRs and its delivery system depends on its ability to make data quickly and easily available to caregivers. To that end, Geisinger’s physicians and data scientists wanted to combine the terabytes of data already collected with data from clinical department systems, patient data, and patient satisfaction surveys. In addition to its existing data, Geisinger anticipated a trove of new data from devices that had not yet been invented when the health system architected its EHRs.

Health system leaders faced two challenges before they could turn that vision into reality. First, they needed to bring Geisinger’s data-in-motion under control more quickly and efficiently. Second, they needed to reduce the cost of storing all that data so data scientists could derive deep insight by comparing raw data spanning years and millions of doctor-patient interactions.


Geisinger began transitioning its architecture to meet those business and clinical needs. It turned to Apache Hadoop® and Hortonworks Data Platform (HDP®) to consolidate its structured and unstructured data. The initial use focused on filling the gaps left by its Enterprise Data Warehouse (EDW) and on enriching that patient data with financial data such as billing records. Recognizing the economies of scale of an open-source approach, Geisinger soon adopted Hortonworks Connected Data Platforms to meet these challenges and speed the delivery of actionable insight from both data-in-motion and data-at-rest. The health system was soon onboarding over 30 TB of important patient data.


Geisinger immediately began archiving and processing its 30 terabytes of patient data from its EDW into HDP. For most organizations, especially those in healthcare, the associated storage costs are prohibitive. Geisinger, however, was able to save $2 million in EDW replacement costs and $500,000 in annual maintenance costs by eliminating the need to maintain and expand its EDW platform.

After Geisinger successfully onboarded its structured data, attention turned to its unstructured data. A vast trove of medical records and doctor notes came into HDP as unstructured text, which then had to be made queryable. With HDP, Geisinger now runs queries on its unstructured data to derive analytical insights. Clinicians and non-clinicians can search through 200 million patient note records in seconds to find relevant conditions and medications, which helps them analyze the success of treatments, identify areas for improvement, and determine ways to save time and money for both patients and providers.