Patterns and Predictions (P&P) is a predictive analytics firm whose core technology provides unstructured, linguistics-driven prediction. That technology powers the Durkheim Project’s ‘Big Data’ analytics network for the assessment of mental health risks, veteran suicide risk in particular. Partners include Bloomberg, the Geisel School of Medicine at Dartmouth, Cloudera, and Attivio. Funding sources include the U.S. Government’s Defense Advanced Research Projects Agency (DARPA), and customers include Global 100 companies. The company’s principal partner, Chris Poulin, is co-inventor of P&P’s core Centiment® technology.
The Durkheim Project began in 2010 with initial funding from DARPA. In 2011, P&P began sourcing the technology and building out the integrated foundational infrastructure and predictive modeling that would support the project’s extensive data collection and analysis once it was scaled up.
Phase One of the project began with a study of three cohorts of 100 subjects each, representing “non-psychiatric”, “psychiatric”, and “suicide positive” profiles. From unstructured clinical notes, the researchers developed linguistics-driven prediction models to estimate suicide risk. As participants join, an individual profile is set up for each one and made accessible, via a dashboard, to researchers at Geisel and to clinicians. The system assigns each profile an overall risk score based on the collective information and on keywords specific to that participant.
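The case study does not describe the internals of these models. As one illustration of a linguistics-driven classifier trained on notes from three labeled cohorts, here is a minimal sketch using multinomial naive Bayes with add-one smoothing. This is a swapped-in textbook technique, not necessarily P&P’s method; the `tokenize` helper, the `CohortNaiveBayes` class, and the toy notes are all hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a placeholder for real linguistic preprocessing."""
    return re.findall(r"[a-z']+", text.lower())


class CohortNaiveBayes:
    """Hypothetical multinomial naive Bayes over cohort labels (add-one smoothing)."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()            # notes seen per cohort
        self.token_counts: dict = defaultdict(Counter)  # token counts per cohort
        self.vocab: set = set()

    def train(self, notes: list[tuple[str, str]]) -> None:
        for text, cohort in notes:
            tokens = tokenize(text)
            self.doc_counts[cohort] += 1
            self.token_counts[cohort].update(tokens)
            self.vocab.update(tokens)

    def posterior(self, text: str) -> dict[str, float]:
        tokens = tokenize(text)
        total_notes = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for cohort, n_notes in self.doc_counts.items():
            n_tokens = sum(self.token_counts[cohort].values())
            score = math.log(n_notes / total_notes)  # log prior of the cohort
            for t in tokens:
                score += math.log((self.token_counts[cohort][t] + 1) / (n_tokens + v))
            log_scores[cohort] = score
        # Normalize in log space so long notes do not underflow.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}

    def risk_score(self, text: str) -> float:
        """Posterior probability of the 'suicide positive' cohort."""
        return self.posterior(text).get("suicide positive", 0.0)


# Invented toy notes for illustration only; not project data.
notes = [
    ("patient reports good sleep and normal appetite", "non-psychiatric"),
    ("routine follow up, no complaints", "non-psychiatric"),
    ("anxiety and low mood, started therapy", "psychiatric"),
    ("depressed mood, poor sleep, medication adjusted", "psychiatric"),
    ("expressed hopelessness and thoughts of self harm", "suicide positive"),
    ("feels hopeless, says family better off without him", "suicide positive"),
]
model = CohortNaiveBayes()
model.train(notes)
print(round(model.risk_score("patient feels hopeless"), 3))
```

Working in log space and normalizing at the end keeps the posterior numerically stable; a production system would need far richer features and validation than word counts.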
The technical rubric for the project is “maximum speed at minimum cost”, which prompted early adoption of Cloudera Search and Cloudera Impala. “The project has a very complex workflow,” explained Poulin. “All of our machine learning is indexed, and we actually access all of the machine learning through search interfaces, which can get expensive. With Cloudera Search and Impala, our ingestion of data on Hadoop is promisingly efficient in terms of lower costs, better computational throughput, and reduced complexity of IT support.”
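Poulin describes indexing the machine learning output and reaching it through search interfaces. The following is a minimal in-memory sketch of that idea. It assumes each profile is indexed as a document whose text is searchable and whose model-assigned risk score is stored as a filterable field. In the actual deployment a search engine such as Cloudera Search (built on Apache Solr) would fill this role; the `RiskIndex` and `ProfileDoc` names and the sample profiles are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


@dataclass
class ProfileDoc:
    profile_id: str
    text: str          # searchable text from the participant's profile
    risk_score: float  # model output, stored alongside the text as a filterable field


class RiskIndex:
    """Hypothetical inverted index whose documents carry a stored model score."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, ProfileDoc] = {}

    def add(self, doc: ProfileDoc) -> None:
        """Index a profile; re-adding an ID replaces its earlier version."""
        old = self.docs.get(doc.profile_id)
        if old is not None:
            for token in set(tokenize(old.text)):
                self.postings[token].discard(doc.profile_id)
        self.docs[doc.profile_id] = doc
        for token in set(tokenize(doc.text)):
            self.postings[token].add(doc.profile_id)

    def search(self, query: str, min_risk: float = 0.0) -> list[ProfileDoc]:
        """Profiles containing every query term with risk >= min_risk, highest first."""
        terms = tokenize(query)
        if terms:
            ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        else:
            ids = set(self.docs)
        hits = [self.docs[i] for i in ids if self.docs[i].risk_score >= min_risk]
        return sorted(hits, key=lambda d: d.risk_score, reverse=True)


index = RiskIndex()
index.add(ProfileDoc("p1", "poor sleep, feels hopeless", 0.81))
index.add(ProfileDoc("p2", "poor sleep after shift changes", 0.12))
index.add(ProfileDoc("p3", "hopeless about the future", 0.64))
print([d.profile_id for d in index.search("hopeless", min_risk=0.5)])  # ['p1', 'p3']
```

Storing the score with the document lets one query combine a text match and a risk threshold, which is the kind of access pattern the quote describes.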
Cloudera’s category leadership and subject matter expertise with Hadoop and Big Data led Poulin to engage Cloudera Professional Services to co-develop Bayesian counters, a lightweight statistical model that detects risk at scale. The counters are built on Apache HBase and CDH (Cloudera’s Distribution Including Apache Hadoop), the market-leading, 100% open source distribution of Hadoop and related projects. This Cloudera-based framework is a cornerstone technology of the Durkheim Project.
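The case study does not specify how the Bayesian counters are laid out in HBase. The sketch below assumes one plausible design: rows keyed by token, columns keyed by class label, and each labeled message incrementing cell counts, which HBase can do atomically per cell. `CounterTable` is a hypothetical in-memory stand-in for the HBase table and is not the HBase client API. The two labels, the smoothing constant `alpha`, and the `__meta__` row are also assumptions.

```python
from __future__ import annotations

import math
import re
from collections import defaultdict

META_ROW = "__meta__"  # hypothetical row holding per-label token totals and vocabulary size


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class CounterTable:
    """In-memory stand-in for an HBase table of counter cells: (row, column) -> int."""

    def __init__(self) -> None:
        self.cells: dict[tuple[str, str], int] = defaultdict(int)

    def increment(self, row: str, column: str, amount: int = 1) -> int:
        # In HBase this would be a single atomic per-cell increment.
        self.cells[(row, column)] += amount
        return self.cells[(row, column)]

    def get(self, row: str, column: str) -> int:
        return self.cells.get((row, column), 0)


def observe(table: CounterTable, text: str, label: str) -> None:
    """Update the counters with one labeled message ('risk' or 'control')."""
    for token in tokenize(text):
        if table.increment(token, "seen") == 1:  # first sighting grows the vocabulary
            table.increment(META_ROW, "vocab")
        table.increment(token, label)
        table.increment(META_ROW, label)  # total tokens seen under this label


def log_odds(table: CounterTable, text: str, alpha: float = 1.0) -> float:
    """Sum of smoothed per-token log likelihood ratios, risk vs control (no class prior)."""
    v = max(table.get(META_ROW, "vocab"), 1)
    n_risk = table.get(META_ROW, "risk")
    n_ctrl = table.get(META_ROW, "control")
    score = 0.0
    for token in tokenize(text):
        p_risk = (table.get(token, "risk") + alpha) / (n_risk + alpha * v)
        p_ctrl = (table.get(token, "control") + alpha) / (n_ctrl + alpha * v)
        score += math.log(p_risk / p_ctrl)
    return score


table = CounterTable()
observe(table, "I feel hopeless and alone", "risk")
observe(table, "hopeless tonight", "risk")
observe(table, "feeling fine today", "control")
observe(table, "fine and rested", "control")
print(log_odds(table, "hopeless"), log_odds(table, "fine"))
```

A positive score favors the risk label. Scoring only reads counts, so new labeled text updates the model incrementally without a retraining pass. That property is one reason a counter-based design can stay lightweight at scale.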
The Phase One build and testing concluded in early 2013. It validated that the project’s machine learning data fabric was viable, with predictive capabilities that were 65% accurate in predicting suicide risk among a veteran control group.
- Building a lightweight machine learning framework that detects real-time risk at scale
- CDH with Cloudera Impala & Cloudera Search
- Accurate, linguistics-driven correlations between real-time communications & suicide risk
- Infrastructure delivers lower cost, better computational throughput, & reduced complexity of IT support
Big Data Scale
- Over 1 TB of data processed per day in real time
- Up to 100,000 active-duty service members & veterans supported in real time
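For a rough sense of scale, the following arithmetic assumes decimal units (1 TB = 10^12 bytes), exactly 1 TB per day spread evenly over the day, and the full 100,000 participants. Under those assumptions the figures above work out to:

$$
\frac{10^{12}\ \text{bytes/day}}{86\,400\ \text{s/day}} \approx 1.16 \times 10^{7}\ \text{bytes/s} \approx 11.6\ \text{MB/s},
\qquad
\frac{10^{12}\ \text{bytes/day}}{10^{5}\ \text{participants}} = 10^{7}\ \text{bytes} = 10\ \text{MB per participant per day}.
$$

Because the stated volume is *over* 1 TB, these are lower bounds on the averages; real-time peaks would run higher than the mean rate.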