Hi! I'm Richard. I'm two years into a maths degree at Cambridge University and am spending the summer as an intern at Genestack. I'm currently working on building and evaluating pipelines for transcriptome assembly from RNA-seq data in partnership with the Neuroscience & Pain unit at Pfizer.

With the cost of RNA sequencing falling dramatically over the last few years, there is high demand for tools that can take RNA-seq output data and assemble it into a transcriptome. There are currently many transcriptome assemblers available, falling into two main categories: reference-based assemblers, which use a reference genome to guide them, and 'de novo' methods, which can operate without a reference genome. No one tool consistently outperforms the others, and the tools are regularly updated, so it is often unclear which assembler to use.

I'll be integrating a number of transcriptome assemblers into the platform, including Cufflinks, StringTie and Trinity, and using them to assemble transcriptomes from RNA-seq data provided by Pfizer, as well as testing the programs on the datasets from RGASP. The biggest challenge in this project will be evaluating the accuracy of the different assemblers: we do not know what the true transcriptome is, so it is hard to measure how accurate the assembled transcriptomes are. I'll be using the DETONATE and rnaQUAST tools for this. Both can calculate a range of metrics for assessing the quality of an assembled transcriptome.

The aim is to have five pipelines constructed by the end of the summer, and to provide evaluation software that can measure the generated transcriptomes on a range of metrics and visualise the results, allowing an informed choice of assembler.
In this brief article, the first in a series of three, we share some of our direct experience using Large Language Models (LLMs) in life science R&D.
In his upcoming talk at BioIT 2024, our CEO Misha Kapushesky will discuss “The Uses of LLMs in Discovery Bioinformatics, the Role of Data Management & Lessons in Practical Applications.”
Life sciences runs on data: from high-complexity omics data through to validation assays, and from clinical trial patient data through to sales and manufacturing. Data is an asset and a resource that every organization relies on.