Challenges
Over the past ten years, Rothamsted Research has developed the KnetMiner tools for life-science data integration, literature mining and gene network analysis. Combining these techniques enables biologists to mine the wealth of public-domain data and understand the biological mechanisms underlying complex traits such as yield, nutritional quality, and tolerance to disease or drought. However, these tools can be difficult for non-specialist users to set up and maintain.
Biological information and evidence are often scattered across databases and literature in myriad forms: effects of genetic variations, gene regulation, protein interactions, pathways, homologies and others. To make informed and effective decisions, these pieces of knowledge need to be examined together, establishing chains of functional associations and interrelations that link distant traits to their causal genes. Networks are a suitable data structure for capturing such heterogeneous, complex and interconnected relations. They provide a unified structure for deep relationship mining that is not achievable when the native data are examined in isolation or pairwise. Unfortunately, building and mining a large biological knowledge network is not a trivial task.
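To make the idea of "chains of functional associations" more concrete, here is a minimal Python sketch under stated assumptions. The node types (Trait, Publication, Protein, Gene), relation labels and entity names are all hypothetical and invented for illustration. This is not how KnetMiner or Ondex represent or search their networks; it only shows how a breadth-first search over a small heterogeneous graph can link a trait to candidate genes through intermediate evidence that no single pairwise dataset would connect.

```python
from collections import deque

# Hypothetical toy network: nodes are (type, name) tuples and each edge carries
# a relation label. None of these names come from KnetMiner or Ondex.
EDGES = [
    (("Trait", "drought tolerance"), "mentioned_in", ("Publication", "paper-1")),
    (("Publication", "paper-1"), "mentions", ("Protein", "P1")),
    (("Protein", "P1"), "interacts_with", ("Protein", "P2")),
    (("Protein", "P2"), "encoded_by", ("Gene", "geneB")),
    (("Trait", "drought tolerance"), "associated_with", ("Gene", "geneA")),
    (("Gene", "geneA"), "homologous_to", ("Gene", "geneC")),
]


def build_graph(edges):
    """Build an undirected adjacency list that keeps each edge's relation label."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append((rel, src))
    return graph


def gene_chains(graph, start, max_hops=4):
    """Breadth-first search for simple paths from `start` to any Gene node.

    Each returned path alternates nodes and relation labels:
    [node0, rel1, node1, rel2, node2, ...]. Search continues past genes,
    so genes reachable only via other genes (e.g. homologues) are also found.
    """
    chains = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        hops = (len(path) - 1) // 2  # number of edges traversed so far
        if node[0] == "Gene" and node != start:
            chains.append(path)
        if hops == max_hops:
            continue
        visited = set(path[0::2])  # nodes already on this path (avoid cycles)
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                queue.append(path + [rel, nxt])
    return chains


def describe(path):
    """Render a chain as readable text (edges are treated as undirected)."""
    parts = [path[0][1]]
    for i in range(1, len(path), 2):
        parts.append(f"--{path[i]}-- {path[i + 1][1]}")
    return " ".join(parts)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for chain in gene_chains(graph, ("Trait", "drought tolerance")):
        print(describe(chain))
```

In this toy example, geneB is reachable only through a publication and a protein interaction, which illustrates why examining the sources together matters. A real knowledge network would also need to rank such chains by the strength of their evidence, which this sketch does not attempt.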
Solution
Supported by Innovate UK, Genestack joined forces with the Hassani-Pak lab at Rothamsted Research to integrate KnetMiner, a suite of tools for evidence-based gene discovery, into a single architecture. Genestack software now provides a new environment for building, mining and distributing large-scale, context- and species-specific knowledge networks. Although the underlying tools, Ondex and KnetMiner, are open source and can be used outside the Genestack environment, doing so requires advanced data-management skills and dedicated servers. Using them within the Genestack architecture is simpler and lets users do much more, thanks to readily available public data and other bioinformatics applications. Although it originated in agricultural research, our network-mining solution is domain-agnostic in principle, so it can be tailored to any kind of data integration, whether for toxicology or drug development. The solution can be used either through a graphical user interface or programmatically, and deployed either in the cloud or on premises.
Impact
Genestack provides an enterprise-level infrastructure for bioinformatics R&D, with a strong focus on data and metadata management, secure sharing of data and tools, full reproducibility and governance, and scalable computing and visualisation. Using Genestack software, Rothamsted can take its data from raw machine output to contextual, knowledge-based interpretation within networks, combining private and public datasets and databases, and sharing its findings securely.