Minutes vs. months for clinical analysis

Key Highlights


  • Healthcare
  • Pharmaceuticals
  • Biotech


Headquarters: Brentford, United Kingdom

Solution highlights

  • Modern Data Platform: Cloudera Enterprise
  • Workloads: Data Warehouse, Data Science & Engineering
  • Key Components: Apache HBase, Apache Impala, Apache Kafka, Apache Sentry, Apache Spark, Cloudera Data Science Workbench, Cloudera Navigator, Cloudera Search, Kerberos
  • Data Science Tools: Anaconda, Hail, IPython, JupyterHub, RStudio
  • Databases: MongoDB
  • BI & Analytics Tools: AtScale, Kinetica, Spotfire, Trifacta, Waterline Data, Zoomdata
  • Data Acquisition and Curation Tools: StreamSets, Tamr
  • Implementation Partners: Modak Analytics, +tellic, Cloudwick

Applications supported

  • R&D Information Platform

Data sources

  • Over 2,000 databases spanning discovery, clinical, genomics and other R&D data


  • Significant reduction in data silos
  • Accelerates development of new drugs
  • Reduces time for access to clinical trial data from months to minutes
  • Decreases time and cost of participant selection for clinical trials

Big data scale

  • Over 5 PB

GlaxoSmithKline’s R&D Information Platform uses Cloudera and partner technologies to provide its scientists with insights to accelerate new product development, reduce costs, increase drug safety and, ultimately, improve, extend, and save lives.

GlaxoSmithKline (GSK) is a global pharmaceutical company with commercial operations in more than 150 countries, a network of 87 manufacturing sites, and R&D centers in the United Kingdom, the United States, Belgium, and China.  


It can take from six to over 12 years to conduct all the steps necessary—from research and testing to clinical trials and regulatory approvals—to bring a new drug or vaccine to market. Once a new product goes to market, pharmaceutical companies have a small window of opportunity to recoup development costs before their patent expires. Adding to the challenge, the cost to produce drugs has remained static in recent years, leading to a considerable reduction in profitability.

To combat these pressures, GSK sought to transform how data is used across Research & Development (R&D). “Our data was in silos, so it was difficult to take what we learned across the R&D pipeline and build on it,” said Mark Ramsey, senior vice president and R&D chief data officer for GSK. “To gain the new levels of efficiency and insight we needed to reduce our costs and speed development, we had to create a platform that would ingest all unstructured and structured R&D data, and deliver greater analytic capabilities.”


The GSK R&D Information Platform uses Cloudera, partner technologies, and homegrown tools to deliver a holistic view of all data within R&D and give researchers an immense analytic advantage. For example, it previously could take several months to assemble and analyze data from across multiple clinical trials. Now, with the clinical trial data standardized and analytics-ready, the same analysis can be done in minutes. Months to minutes drives significant value.

The platform combines over five petabytes (PB) across 10 different data domains, including discovery, clinical, genomics, and other R&D data drawn from more than 2,000 silos. As a result, researchers can combine and analyze data regardless of when, how, and where it was generated. Users from across GSK’s R&D organization access the platform. “This is the first time in the history of GSK that such a large data strategy has been charted,” said Ramsey. “The platform is powering game-changing insights.”


With privacy and security of vital importance in the healthcare industry, GSK needed to confirm that the platform addressed rigorous industry and internal standards, including the Health Insurance Portability and Accountability Act (HIPAA). “One of the key reasons we rely on Cloudera is because of the enterprise class security,” said Ramsey. “By leveraging the Cloudera SDX capabilities, we can manage all the metadata and policy information in a centralized fashion.”


With its new platform, GSK researchers are gaining insights that help streamline every aspect of the R&D process, including the following:

  • Drug discovery. Scientists can see the detailed data for all of the experiments undertaken across the organization. This information is critical in helping accelerate the development of new drugs.

  • Genomics analysis. GSK researchers can perform association analysis on genomic data spanning 500,000 people, work that was impractical on its legacy platform. In its first analysis, GSK researchers looked for DNA variants associated with specific traits, such as body mass index.

  • Clinical trials. Clinical trial teams can reduce the time and cost of identifying the optimal mix of participants for clinical trials by harnessing the breadth of data and analytics capabilities.

As GSK achieves greater efficiency and new insights across its many R&D processes, executives expect to ultimately move the needle in terms of time-to-market, bringing new drugs and vaccines to market more quickly and less expensively to help patients.

“We’ve created a platform that provides our scientists with insights that can shorten delivery timelines, reduce costs, expand reach, increase safety, and, in the end, improve, extend, and save lives.”

-Mark Ramsey, Senior Vice President, R&D Chief Data Officer, GSK
