Opinion

Want to democratize data science? Focus on the data and search services first, not the visual analytics

11.06.21

Nine out of 10 scientists in a typical R&D organization are unable to explore data effectively. Most lack the computational skills needed to work with the data directly and independently. Instead, they are constrained either by the limited availability of data scientists (who, in turn, spend most of their time on data wrangling rather than data science) or by rigid self-service visualization tools that can’t keep up with rapidly evolving data and research needs.

Deal with the root cause: the data and the search services, not the visual analytics.

The key issue is that data visualization applications are only as good as the underlying data quality and the ability to query it. For example:

Data management and search services enable you to build diverse visual analytic apps much more rapidly

Your best bet is to focus on the underlying source of limitations first: data management and search services. You’ll then have the lever to build applications much faster and much more cheaply, to the point that they become disposable.

What do good data management and search services look like?

Good data management should help you break down data silos, clean messy metadata, and track complex relationships. It should also enable you to do this as early as possible rather than retrospectively, because collecting and cleaning data only at the point when you need it is painstakingly difficult.

A good search service should let you run integrated queries across thousands of samples, millions of variants, hundreds of thousands of expression values, and so on. It should also be flexible, providing reusable building blocks for creating tailored visual analytic apps.

Once you have good data management and search services in place, consider building the visual analytic apps with a lighter implementation in R/Python rather than a more complex JavaScript frontend stack. This makes life a lot simpler for data scientists: they are more familiar with R/Python, and they can easily use existing bioinformatics packages.
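To make the idea of reusable building blocks concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Sample` class, the in-memory `SAMPLES` and `EXPRESSION` tables, and the three helper functions are hypothetical stand-ins, not part of any real search service, and the values are made up.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory stand-ins for a curated sample catalogue and an
# expression index. A real search service would sit behind an API.
@dataclass(frozen=True)
class Sample:
    sample_id: str
    metadata: dict

SAMPLES = [
    Sample("S1", {"tissue": "liver", "disease": "healthy"}),
    Sample("S2", {"tissue": "liver", "disease": "NASH"}),
    Sample("S3", {"tissue": "lung", "disease": "healthy"}),
]

# (sample_id, gene) -> expression value (made-up numbers)
EXPRESSION = {
    ("S1", "TP53"): 12.0,
    ("S2", "TP53"): 30.0,
    ("S3", "TP53"): 8.0,
    ("S1", "BRCA1"): 5.0,
}

def metadata_filter(**criteria) -> Callable[[Sample], bool]:
    """Building block 1: a predicate that matches samples on metadata fields."""
    def matches(sample: Sample) -> bool:
        return all(sample.metadata.get(k) == v for k, v in criteria.items())
    return matches

def find_samples(predicate: Callable[[Sample], bool]) -> list:
    """Building block 2: return every sample that satisfies the predicate."""
    return [s for s in SAMPLES if predicate(s)]

def expression_for(gene: str, samples: list) -> dict:
    """Building block 3: expression of one gene across the given samples,
    skipping samples that have no measurement for that gene."""
    return {
        s.sample_id: EXPRESSION[(s.sample_id, gene)]
        for s in samples
        if (s.sample_id, gene) in EXPRESSION
    }

# Compose the blocks: TP53 expression in liver samples.
liver_tp53 = expression_for("TP53", find_samples(metadata_filter(tissue="liver")))
print(liver_tp53)  # {'S1': 12.0, 'S2': 30.0}
```

The same three functions can be recombined for a different question, for example `expression_for("BRCA1", find_samples(metadata_filter(disease="healthy")))`, without writing a new application.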

What’s the impact of implementing this strategy?

Consider a typical scenario in a medium R&D department, where you have 20 data scientists and 200 biologists:

If we can make self-service applications faster and cheaper to build, a conservative estimate is a 50% reduction in the cost of each of these activities, which translates to annual savings of at least $5M. And that is before counting the long-term impact of better science.

Case study: Expression atlas

Data diversity and volume have grown rapidly over the years: it’s now common to want to query a gene or protein of interest across thousands of private and public transcriptomics and proteomics samples. Traditionally, it would take months to build such an application.

Using our flagship product, Genestack ODM, a single data scientist can build a powerful proteomics/transcriptomics expression atlas in just a few days. Moreover, it is written purely in R, in only a few hundred lines of code, so it’s easy to customise and extend to answer additional research questions.

But there’s no magic: this is only possible because all the hard work of integrating, harmonizing, and indexing the data has already been done by Genestack ODM. The application only needs to make a few API calls to retrieve the right data and metadata, from the right sources, for the right questions.
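As a rough illustration of why such an application can stay small, the sketch below joins two calls (one for sample metadata, one for expression values) and summarises a gene per tissue. It is written in Python rather than R. The functions `fetch_sample_metadata` and `fetch_expression`, the gene name `GENE_X`, and all values are hypothetical placeholders; they do not represent the Genestack ODM API.

```python
from collections import defaultdict
from statistics import mean

def fetch_sample_metadata() -> list:
    """Hypothetical stand-in for a metadata API call (made-up samples)."""
    return [
        {"sample_id": "A1", "tissue": "brain", "source": "public"},
        {"sample_id": "A2", "tissue": "brain", "source": "private"},
        {"sample_id": "A3", "tissue": "kidney", "source": "public"},
        {"sample_id": "A4", "tissue": "kidney", "source": "private"},
    ]

def fetch_expression(gene: str) -> dict:
    """Hypothetical stand-in for an expression API call.
    Returns sample_id -> expression value for the requested gene."""
    data = {"GENE_X": {"A1": 10.0, "A2": 14.0, "A3": 2.0, "A4": 4.0}}
    return data.get(gene, {})

def atlas_row(gene: str) -> dict:
    """Mean expression of one gene per tissue, joining metadata and values."""
    values = fetch_expression(gene)
    by_tissue = defaultdict(list)
    for sample in fetch_sample_metadata():
        sample_id = sample["sample_id"]
        if sample_id in values:  # ignore samples with no measurement
            by_tissue[sample["tissue"]].append(values[sample_id])
    return {tissue: mean(vals) for tissue, vals in by_tissue.items()}

print(atlas_row("GENE_X"))  # {'brain': 12.0, 'kidney': 3.0}
```

Because the integration and harmonization happen upstream, the application logic reduces to a join and an aggregation; a plotting layer would sit on top of `atlas_row`.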

 

 


 

 
