The FAIR principles of data management

The FAIR principles are a set of measurable guidelines, aimed at data producers and publishers, to ensure that scientific data are findable, accessible, interoperable and reusable.

The cost of being un-FAIR

Originally published in 2016 (Wilkinson et al., 2016, Scientific Data), the FAIR principles arose from the urgent need to improve the infrastructure supporting the reuse of research data.

A recent EU report estimates that not having FAIR research data costs the European economy more than €10.2bn per year in time spent, storage costs, licence costs, research retraction, and double funding, plus a staggering €16bn in unquantified elements, including the impact on research quality, economic turnover, and machine readability of research data.


The FAIR principles benefit a number of different stakeholders, including researchers wanting to share and reuse experimental data, scientific publishers and funding agencies for long-term data stewardship, providers of software for data management, analysis, and processing, and the data science community using new and existing data to advance discovery.

The benefits of FAIR data are not limited to the scientific community or to academic research.

Data production, management, and integration are core business activities for many life-science organisations such as pharmaceutical, biotech, agriscience, and healthcare companies.

Organisations in these sectors all stand to gain from implementation of the principles, with time and cost saving, productivity gains, and new analytical capabilities among the key benefits.

The FAIR principles

The FAIR principles define a set of characteristics that data, tools, vocabularies and infrastructures should have in order to be findable, accessible, interoperable and reusable.

Findable

Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for the automatic discovery of datasets and services.

Accessible

Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

Reusable

Metadata and data should be well-described so that they can be replicated and/or combined in different settings.

Each principle is broken down further into a series of measurable requirements. The FAIR principles have a low barrier to entry: they are minimally defined so that data producers, publishers and stewards can easily adhere to them, and they can be adopted incrementally. The FAIR principles are also modular: they can be adhered to in any combination, which supports a wide range of applications and special circumstances.
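To make the idea of measurable, modular requirements more concrete, here is a minimal Python sketch that checks a dataset's metadata record against one simplified test per principle. The field names (identifier, access_url, vocabulary, provenance and so on) and the checks themselves are illustrative assumptions, not the official FAIR metrics or any particular tool's schema.

```python
# Hypothetical, simplified checks: one per FAIR principle.
# These are NOT the official FAIR metrics; the field names are assumptions.
FAIR_CHECKS = {
    # Findable: the record carries a persistent identifier and a title.
    "findable": lambda m: bool(m.get("identifier")) and bool(m.get("title")),
    # Accessible: the record states where the data can be retrieved over HTTP(S).
    "accessible": lambda m: str(m.get("access_url", "")).startswith(
        ("http://", "https://")
    ),
    # Interoperable: the record names a data format and a shared vocabulary.
    "interoperable": lambda m: bool(m.get("format")) and bool(m.get("vocabulary")),
    # Reusable: the record states a licence and where the data came from.
    "reusable": lambda m: bool(m.get("license")) and bool(m.get("provenance")),
}


def fair_report(metadata: dict) -> dict:
    """Return a pass/fail result for each principle, evaluated independently."""
    return {name: check(metadata) for name, check in FAIR_CHECKS.items()}


# A made-up record that is findable and accessible, but not yet
# interoperable (no vocabulary) or reusable (no provenance).
example_record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Example expression dataset",
    "access_url": "https://example.org/datasets/42",
    "format": "text/csv",
    "license": "CC-BY-4.0",
}

if __name__ == "__main__":
    for principle, passed in fair_report(example_record).items():
        print(f"{principle}: {'pass' if passed else 'needs work'}")
```

Because each check is evaluated on its own, a dataset can satisfy some principles before others, which mirrors the incremental and modular adoption described above.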

The future is FAIR

Creating a unified framework for data and metadata is challenging, as our collective knowledge is constantly changing. Life science research and healthcare stand to benefit greatly from the opportunities afforded by this integration, but they also suffer from the challenges associated with the fast-paced evolution of knowledge that NGS and multi-omics have recently brought to these fields. Diverse databases and data standards compete and evolve in the omics space while users struggle to build compatibility between them.

To tackle this and advance the science, we need to embrace the FAIR principles so that, as a community, we can benefit from the power that data and metadata offer. Software built with intuitive integration in mind will allow large-scale life science datasets to be harnessed efficiently and proprietary data to be integrated with the wider omics landscape. Genestack ODM exists to do just that.
