A FAIR genomics data and metadata management system

If we’re going to improve the accuracy and consistency of metadata in genomics, we need robust management systems and consistent guidelines for researchers who want to enhance the reusability of their data.

The FAIR principles aim to overcome data discovery and reuse obstacles by ensuring that data and metadata are (F)indable, (A)ccessible, (I)nteroperable, and (R)eusable. The principles should apply not only to data in the conventional sense, but also to the algorithms, tools, and workflows that lead to the data.

Particular emphasis has been put on making data discovery and reuse easier not only for humans, but also for machines.

The public data landscape is changing, with many repositories already showing some degree of “FAIRness”, each with its own technological implementation of the different aspects of FAIR, e.g. Dataverse, FAIRDOM, and UniProt. There are even emerging projects in which FAIR is a key objective, e.g. bioCADDIE (a project to index biomedical data across data repositories and aggregators) and CEDAR (tools for creating metadata templates that follow community standards and integrate vocabularies and ontologies).

When FAIRness falls down

Despite best efforts, some of these new repositories fall short in their “FAIRness” goals in one aspect or another, particularly when it comes to core challenges in multi-omics research: integration of external and internal data and computational pipelines. Not only do we, as researchers, want to easily access data, we also want to be able to easily process them. And we want to do this in combination with private or public datasets using a mixture of private or public tools, while keeping track of their complete provenance. The provenance of processed data not only facilitates reproducibility and reusability, but it also lends itself to the aggregation of knowledge, i.e. meta-analysis.

This is less of an issue for researchers in academia since they operate on a scale that can tolerate a degree of inefficiency in their integration. However, that does not hold true as data volumes and collaboration requirements increase. Industrial R&D departments have to focus on maintaining existing tools and creating point solutions, but often lack the time and resources to fully integrate all of their resources. At Genestack, we help companies to do just that through off-the-shelf software as well as a suite of flexible custom-development services. Our areas of expertise extend from data management to system integration, visual analytics, and scientific consultancy.

A robust data management system

We recently launched the enterprise software Genestack ODM (ODM) that lets you record linked Life Science Data along with rich metadata and relationships – ODM becomes the Single-Point-of-Truth for all of your Life Science Data management needs. Data is imported into ODM with rich metadata following strict templates with support for public and private ontologies to ensure consistency. Users can seamlessly search and query diverse private and public data using features such as ontology-based autocomplete and suggested common field values while maintaining strict access control.
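To make the idea of template-driven metadata concrete, here is a minimal sketch in Python of checking a record against a template with controlled vocabularies. It is an illustration only and not ODM's API: the names `TemplateField`, `validate_record` and `TEMPLATE`, and the example vocabularies, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class TemplateField:
    """One field of a hypothetical metadata template."""
    name: str
    required: bool = True
    # Controlled vocabulary for the field; None means free text is accepted.
    allowed_terms: Optional[FrozenSet[str]] = None


def validate_record(record: Dict[str, str], template: List[TemplateField]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for f in template:
        value = record.get(f.name)
        if value is None or value == "":
            if f.required:
                problems.append(f"missing required field: {f.name}")
            continue
        if f.allowed_terms is not None and value not in f.allowed_terms:
            problems.append(f"{f.name}: '{value}' is not in the controlled vocabulary")
    return problems


# Hypothetical template: two vocabulary-controlled fields and one optional free-text field.
TEMPLATE = [
    TemplateField("organism", allowed_terms=frozenset({"Homo sapiens", "Mus musculus"})),
    TemplateField("tissue", allowed_terms=frozenset({"liver", "kidney"})),
    TemplateField("notes", required=False),
]

if __name__ == "__main__":
    print(validate_record({"organism": "Homo sapiens", "tissue": "liver"}, TEMPLATE))
    print(validate_record({"organism": "human"}, TEMPLATE))
```

Rejecting a free-text value such as "human" in favour of an agreed term is the kind of consistency check that keeps records searchable across datasets. A real system would draw its terms from full ontologies instead of small hard-coded sets.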

Genestack ODM records and retains relationship information and tracks file provenance.

We’ve also covered the essential integration aspects. ODM easily integrates into individual environments, like LIMSs, as well as other common platforms. The entire system is supported by our expert team of bioinformaticians and software developers.

The future is FAIR

Data only have value in the context of what we already know. Metadata allow us to build links between disparate blocks of detailed information and put them into perspective. However, creating a unified framework for metadata is challenging as our knowledge as a whole is constantly changing. Life science research and healthcare stand to benefit greatly from the opportunities afforded by this integration, but also suffer from the challenges associated with the fast-paced evolution of knowledge that NGS and multi-omics have recently injected into such fields. Diverse databases and data standards are competing and evolving in the omics space while users struggle to build compatibility between them.

To tackle this and advance the field, we need to embrace the FAIR principles, so that, as a community, we can benefit from the power that metadata offer.

Software built with intuitive integration in mind will allow large-scale Life Science Datasets to be harnessed efficiently and proprietary data to be integrated with the wider omics landscape. Genestack exists to do just that.

If you’d like to learn more about metadata and why they’re so valuable to the omics industry, you can download our metadata eBook.
