Mining for hidden knowledge
Combining artificial intelligence with human intelligence to search diverse biological datasets for clues.
Overview
Over the past ten years, Rothamsted Research has been developing the KnetMiner tools (www.knetminer.org) for Life Science Data integration, literature mining and gene network analysis. Combining these techniques enables biologists to mine the wealth of public-domain data and understand the biological mechanisms underlying complex traits such as yield, nutritional quality, and tolerance to disease or drought. However, these tools can be difficult for non-specialist users to set up and maintain.
Genestack provides an enterprise-level infrastructure for bioinformatics R&D, with a strong focus on data and metadata management, secure sharing of data and tools, absolute reproducibility and governance, as well as scalable computing and visualisation. Using Genestack, Rothamsted can take their data from raw machine output to contextual interpretation based on knowledge networks, combining private and public datasets and databases, and share their findings securely.
Make informed and effective decisions
Biological information and evidence are often scattered across databases and literature in myriad forms: effects of genetic variation, gene regulation, protein interactions, pathways, homologies and more.
To make informed and effective decisions, these pieces of knowledge need to be examined together, establishing chains of functional associations and interrelations between distant traits and causal genes. Networks are a suitable data structure for capturing such heterogeneous, complex and interconnected relations, and they offer a unified structure for deep relationship mining that is simply not achievable when the native data are analysed in isolation or pairwise. Unfortunately, building and mining a large biological knowledge network is not a trivial task.
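To illustrate what "chains of functional associations" between a trait and candidate genes can look like, here is a minimal Python sketch, assuming a toy network stored as (source, relation, target) triples. Every node name and relation type below is hypothetical, and breadth-first search is used only as a simple stand-in for walking evidence; it is not how Ondex or KnetMiner actually build networks or rank genes.

```python
from collections import deque

# Hypothetical toy knowledge network: all node IDs and relation types are made up
# for illustration and do not come from Ondex, KnetMiner or any real dataset.
EDGES = [
    ("trait:drought_tolerance", "cooccurs_with", "pub:paper_1"),
    ("pub:paper_1", "mentions", "protein:P1"),
    ("protein:P1", "interacts_with", "protein:P2"),
    ("protein:P2", "encoded_by", "gene:G2"),
    ("protein:P1", "encoded_by", "gene:G1"),
    ("gene:G3", "member_of", "pathway:PW1"),  # not connected to the trait
]


def build_graph(edges):
    """Index triples as an undirected adjacency map so evidence can be walked either way."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation, source))
    return graph


def evidence_chains(graph, start, prefix="gene:"):
    """Breadth-first search returning the shortest chain from start to each reachable gene."""
    parents = {start: None}  # node -> (previous node, relation used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relation, neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = (node, relation)
                queue.append(neighbour)
    chains = {}
    for node in parents:
        if node.startswith(prefix):
            # Walk back to the start, prepending each (previous node, relation) pair.
            chain = [node]
            while parents[chain[0]] is not None:
                previous, relation = parents[chain[0]]
                chain[:0] = [previous, relation]
            chains[node] = chain
    return chains


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for gene, chain in evidence_chains(graph, "trait:drought_tolerance").items():
        print(gene, ":", " -> ".join(chain))
```

Treating edges as undirected lets a chain run from a trait through a publication and a protein interaction to a gene, regardless of the direction in which each relation was recorded. A real knowledge network would also weight the different evidence types, which this sketch deliberately omits.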
Evidence-based gene discovery with Genestack and KnetMiner
Supported by Innovate UK, Genestack joined forces with the Hassani-Pak lab at Rothamsted Research to integrate the KnetMiner tool suite for evidence-based gene discovery into Genestack. The Genestack platform now provides a new environment for building, mining and distributing large-scale, context- and species-specific knowledge networks. The underlying tools, Ondex and KnetMiner, are open source, so you always have the option to contribute to them and use them outside Genestack, but be aware that doing so involves a steep learning curve, laborious data management and dedicated servers. In Genestack, the process is much simpler, and you can do much more by making use of readily available public data and other bioinformatics applications. Although it originated in agricultural research, our network mining solution is generic in principle, and we have designed the system accordingly. This means that if you're working on any kind of data integration, whether for toxicology or drug development, you can tailor-make a network that suits your purpose. You have the flexibility to work through a graphical user interface or programmatically, on or off the cloud. Everyone now has a new opportunity to derive meaning from the rapidly evolving body of Life Science Data.
Genestack now hosts over 40 plant and crop networks, as well as a prototype human disease network. You can see how these networks were built and mined by reading the user guide below. Nothing is stopping you from modifying them or creating many more, bringing your own data and research interests into play. All these data sources, built networks and applications are immediately available: if you're an existing Genestack user, ask us for access; if you don't yet have a Genestack account, let us know and we'll provide a tester account for you. This is a proof-of-concept project, and we'd very much welcome your feedback and ideas.
You can read more about KnetMiner by downloading the free eBook (no sign-up required) here.