20.09.17

How pharmas can extract greater value from large-scale population studies

Dr Misha Kapushesky, Genestack CEO, discusses empowering discovery at the Oxford Global Pharmaceutical IT congress, 27-28 September, London

The cost and complexity of organising genomics research data are a challenge shared by large global pharma and growing biotech alike. Dr Misha Kapushesky, CEO of Cambridge-based Genestack, believes that the availability of large-scale population studies will intensify this pinch-point. At the upcoming Oxford Global Pharmaceutical IT congress he will use real-world case studies to describe how effective data management can empower discovery.

No matter who he talks to across the industry, Dr Kapushesky has found that a major issue is managing research data across organisational and geographical borders.

He says: “Today, you rarely come across an organisation that believes it can generate all the data it needs to make deductive discovery and development, so more data is being created through collaborations. This can be between academic partners and a therapeutic area, or between several therapeutic areas in the same organisation. There are also opportunities to put private data into a wider context with public data emerging from large population studies.

“You don’t have to look very far for examples, like the collaboration between AstraZeneca and HLI, or GSK and the UK Biobank. Organisations in this R&D space are getting access to practically population-size multi-omics datasets. However, this comes with challenges.

“Firstly, how do you find relevant data both internally and in public repositories and keep track of the provenance of the data that has been captured and processed? There needs to be a complete audit trail of the data so that computations are reproducible.

“Secondly, how do you provide rapid access for scientists and improve collaboration? You need to build an IT infrastructure so that they can find the data they need, wherever it is held, with simple searches such as ‘find patients that are over 30, female, non-smokers, without a mutation in a particular gene for whom I also have transcription data’. The ability to get fast answers to these queries is critical.

“Finally, in discovery, we want to be able to reuse and repurpose data, so how do you stop it residing in silos? For instance, ontologies are needed to structure data in existing databases and allow the infrastructure to scale as more information is generated. Distributed storage, compute and federated queries are tough technologies to harness.

“Being able to address these challenges effectively will free bioinformaticians from the routine work and make the scientists more productive.”
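
As a rough illustration only, the cohort search quoted above ("over 30, female, non-smokers, without a mutation in a particular gene, with transcription data") can be sketched as a simple filter over patient records. This is not a description of Genestack's platform: the `Patient` record, its field names, the `find_cohort` function and the sample data are all hypothetical, chosen just to make the query concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical record layout; real metadata schemas will differ.
    patient_id: str
    age: int
    sex: str
    smoker: bool
    mutated_genes: set = field(default_factory=set)
    has_transcription_data: bool = False


def find_cohort(patients, gene, min_age=30):
    """Return female non-smokers older than min_age who have no mutation
    in `gene` and for whom transcription data is available."""
    return [
        p for p in patients
        if p.age > min_age
        and p.sex == "female"
        and not p.smoker
        and gene not in p.mutated_genes
        and p.has_transcription_data
    ]


# Made-up sample records, for illustration only.
patients = [
    Patient("P1", 42, "female", False, {"BRCA1"}, True),
    Patient("P2", 35, "female", False, set(), True),
    Patient("P3", 29, "female", False, set(), True),
    Patient("P4", 51, "male", False, set(), True),
    Patient("P5", 47, "female", False, {"TP53"}, False),
]

cohort = find_cohort(patients, gene="BRCA1")
print([p.patient_id for p in cohort])  # prints ['P2']
```

In practice the difficulty Dr Kapushesky describes lies less in the filter itself than in running it across data held in many places and described with inconsistent metadata, which is where ontologies and federated queries come in.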

Dr Kapushesky believes that Genestack has addressed many of these issues by creating a flexible platform that works with its clients’ existing investment in hardware, data storage and analysis.

“Our mission is to create a multi-omics ecosystem that can integrate with public data on the cloud, or private data within an in-house deployment, and everything in between. To support the scientists, the platform incorporates a powerful user interface, intelligent data management and sophisticated metadata management. This makes it possible for them to quickly gauge how much relevant data exists, create virtual cohorts, see what analytical reports are available and what kinds of analyses can be done.”

In his presentation, Dr Kapushesky will describe how the company has created development-focussed partnerships to address the challenges of big data management.