DNA analysis

Big Data / Smart Data & machine learning

A case study from one of our sample machine learning projects


“We had the data but it was in many places, many formats and calibrated by many machines. Your team were able to align it and plug the gaps to make our research possible.”

Dr James Creek

Case study

Genetic analysis of response to treatment

The iPredictor team consisted of experienced pharmaceutical physicians, geneticists and IT experts who worked together to develop a system for collecting specific patient history data, clinical diagnostic data, and proteomic and genetic markers. We developed an iPad app that captured relevant new information from an electronic case report form (eCRF), provided by healthcare systems, and passed it to servers for comparison with historical data from existing healthcare databases. Our team of algorithmists and statisticians used machine learning techniques to combine multiple data sets from disparate systems, align the data and plug any gaps. Calibration algorithms had to be developed to counter unintentional errors introduced by the diverse range of medical eye scanners.
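To make the calibration and gap-plugging steps concrete, here is a minimal Python sketch. It is an illustration only, not the project's method: the case study does not say which algorithms were used, and the project itself worked in Octave and SAS. The sketch assumes each scanner has read a shared reference target with a known true value, so that a constant per-device offset can be estimated, and it fills missing values with the mean of the observed ones. The device names, the `measurement` field and all numbers are hypothetical.

```python
from statistics import mean

def device_offsets(reference_readings, true_value):
    """Estimate a constant offset per device from readings of a shared reference target."""
    return {device: mean(values) - true_value
            for device, values in reference_readings.items()}

def calibrate(records, offsets, field):
    """Return copies of the records with the device offset subtracted from `field`."""
    calibrated = []
    for record in records:
        record = dict(record)  # copy so the input is left untouched
        if record[field] is not None:
            record[field] -= offsets[record["device"]]
        calibrated.append(record)
    return calibrated

def fill_gaps(records, field):
    """Replace missing values (None) in `field` with the mean of the observed values."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [dict(r, **{field: fill if r[field] is None else r[field]})
            for r in records]

# Hypothetical data: two scanners reading a reference target whose true value is 100.0
reference_readings = {"scanner_a": [102.0, 98.0], "scanner_b": [110.0, 110.0]}
offsets = device_offsets(reference_readings, true_value=100.0)

# Hypothetical patient records; the third one has a gap
records = [
    {"device": "scanner_a", "measurement": 250.0},
    {"device": "scanner_b", "measurement": 270.0},
    {"device": "scanner_a", "measurement": None},
]
filled = fill_gaps(calibrate(records, offsets, "measurement"), "measurement")
print([r["measurement"] for r in filled])
```

Calibrating before filling gaps matters in this sketch: otherwise the fill value would be computed from readings that still carry each scanner's bias. A real pipeline would likely need something richer than a constant offset and mean imputation.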

This information was used to develop a predictive model that assessed which patients were likely to respond well to treatment given their particular genotype and phenotype.
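As a rough illustration of what such a predictive model can look like, the Python sketch below fits a logistic regression by batch gradient descent. This is an assumption made for the example: the case study does not name the model type, and the original work was done in Octave and SAS. The two features (presence of a genetic marker and a scaled clinical score) and every data point are made up.

```python
import math

def predict(weights, features):
    """Return the modelled probability of response (weights[0] is the bias)."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradients = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict(weights, features) - label
            gradients[0] += error
            for j, value in enumerate(features):
                gradients[j + 1] += error * value
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradients)]
    return weights

# Made-up toy data: (marker_present, scaled_clinical_score) -> responded (1) or not (0)
rows = [(1, 0.2), (1, 0.8), (0, 0.3), (0, 0.9), (1, 0.5), (0, 0.5)]
labels = [1, 1, 0, 0, 1, 0]

weights = train_logistic(rows, labels)
with_marker = predict(weights, (1, 0.5))
without_marker = predict(weights, (0, 0.5))
print(f"with marker: {with_marker:.2f}, without marker: {without_marker:.2f}")
```

In this toy data the marker alone separates responders from non-responders, so the model assigns a higher response probability to a patient who carries it. Real genotype and phenotype data would be far noisier and would call for held-out validation.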

Diseases chosen for the research fulfilled a number of key criteria:

  1. Large amounts of clinical and diagnostic data were routinely collected outside clinical trials
  2. The primary treatment modalities were physician-administered (to eliminate compliance and persistence variability)
  3. The pharmacokinetics of the drugs used were well established
  4. Extensive clinical trial and disease-area working experience was available

Our research identified a number of key genetic markers, proteomic markers, critical patient demographics and clinical diagnostic data that were important in predicting response to the drug.

Informal partnerships were established with leading physicians at two academic teaching hospitals in the UK to validate the data requirements of the eCRF and to prove the resulting data model.

Tools used to analyse the data included Octave for machine learning and SAS for statistical modelling.