Bioinformatics highlights of the last decade

07.01.20

As we start a new decade, the pace of bioinformatics innovation appears to be at least as frantic as the last decade's. We thought it would be fun to ask people in the Genestack office what their particular highlights of the last 10 years were.

1. Sequencing data growth

Hardly a surprise, but developments in sequencing technologies continued apace, enabling faster and cheaper production of data. Where the decade started with the first publications from the 1000 Genomes Project, it ended with the sub-$1,000 genome and the completion of the 100,000 Genomes Project, The Cancer Genome Atlas, GTEx, the Human Microbiome Project, the 1000 Plant Genomes Project and many more.

2. Single-cell sequencing

The application of next-generation sequencing techniques to study the transcriptome at the level of a single cell has revolutionised scientists' ability to interrogate biological systems and diseases, and, of course, has generated a lot more data. The Human Cell Atlas initiative is using this technology to create a reference map of all of the cells that make up a healthy human body, a feat that was impossible before this technology.

3. Non-sequence data growth

These data types have been around for a while, but the last decade also saw a huge increase in their collection: medical/clinical data, data from wearables, proteomics, environmental data, imaging data and more. Projects such as UK Biobank enable scientists to correlate anonymised data from blood samples, lifestyles and behaviours, electronic health records and more with genetics and disease outcomes across a large number of individuals.

4. Life Science Data integration

The premise that an individual's genome correlates simply with their condition was abandoned long before the last decade. But as the collection of other data types has grown, the complexity of biological systems has made clear the need to query patients, samples and hypotheses across combinations of biomarkers and multiple life science data types.

5. AI/ML

The problem of classifying outcomes from large volumes of multiple input factors seems tailor-made for AI/ML, and while these techniques have been around for many decades, their application to the biosciences really took off in the last ten years. This has been driven mostly by the increasing availability of computational power and storage, improved techniques for deep learning, and data that is better suited to mining by AI/ML algorithms.
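
As a rough illustration of the kind of multi-factor classification described above, here is a minimal sketch using scikit-learn; the data is synthetic and the model choice is an assumption for illustration only, not a method used by any of the projects mentioned:

```python
# Minimal sketch: classifying a binary outcome from many input factors.
# Purely illustrative; uses synthetic data and scikit-learn's RandomForestClassifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# 1000 samples x 50 features standing in for expression levels, clinical
# variables, lifestyle factors, etc.
X = rng.normal(size=(1000, 50))
# Synthetic outcome driven by a few of the features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Hold out a test set, train, and evaluate.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"Held-out accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```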

6. Cloud computing and containerisation

The last decade saw the cloud platforms of Amazon, Microsoft and Google come of age, which, together with continued developments in CPUs and GPUs, made computational power increasingly available and affordable, enabling anyone to investigate AI/ML.

Containerisation technologies such as Docker were also released, lowering the barrier to cloud deployment and systems integration.

7. Data FAIRification

Many organisations began concerted efforts to manage the health of their data in the last decade. With the vast amount of data now being produced, keeping it Findable, Accessible, Interoperable and Reusable is more important than ever. Annotating data consistently with standard metadata and provenance allows cross-study hypotheses to be tested and is crucial if AI/ML algorithms are to infer classifications accurately.
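
To give a flavour of what standard annotation can look like, here is a minimal sketch of FAIR-style study metadata; the schema, identifier, and URL are hypothetical placeholders rather than a specific standard used by any project above, though the ontology terms shown are real NCBITaxon and OBI identifiers:

```python
# Minimal sketch of FAIR-style metadata: a study annotated with standard,
# machine-readable fields. The schema and values are hypothetical examples.
import json

study_metadata = {
    "identifier": "GSF-0001",  # Findable: a unique, persistent ID (placeholder)
    "title": "RNA-seq of treated vs control liver samples",
    "access_url": "https://example.org/datasets/GSF-0001",  # Accessible: placeholder URL
    # Interoperable: controlled-vocabulary terms rather than free text.
    "organism": {"label": "Homo sapiens", "ontology_id": "NCBITaxon:9606"},
    "assay": {"label": "RNA-seq assay", "ontology_id": "OBI:0001271"},
    "license": "CC-BY-4.0",  # Reusable: an explicit licence
    "provenance": {"derived_from": "raw FASTQ, processing pipeline v1.2"},
}

print(json.dumps(study_metadata, indent=2))
```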

What are your highlights of the last decade?
