Variant Calling Using the Genestack Platform

Single nucleotide polymorphisms (SNPs), together with insertions and deletions (indels), account for the vast majority of human genetic variation. Studies of these mutations focus on identifying genomic markers for monogenic and complex diseases, such as diabetes, cancer and various autoimmune diseases. Variant calling is one of the main applications of NGS technology, thanks to its high throughput and low error rates. In this tutorial we will perform a variant calling analysis on the Genestack platform, going from uploading raw data to visualizing our results in a genome browser. We will use Genestack applications to go through the following steps:

  1. Quality control and preprocessing of raw reads
  2. Mapping reads onto a reference genome
  3. Quality control of mapped reads
  4. Variant calling analysis
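To make the SNP/indel distinction from the introduction concrete, here is a minimal sketch of how a single called variant can be labelled from its reference and alternate alleles. It assumes calls are written in VCF-style REF/ALT notation, where an indel record carries one shared "anchor" base before the inserted or deleted sequence; that convention, and the helper name `classify_variant`, are illustrative assumptions and not part of the Genestack applications used in this tutorial.

```python
# A minimal sketch, assuming VCF-style REF/ALT alleles for a single
# (biallelic) variant. Indels are assumed to be anchored on one shared
# leading base. classify_variant is a hypothetical helper for illustration.

VALID_BASES = set("ACGTN")


def classify_variant(ref: str, alt: str) -> str:
    """Return 'SNP', 'insertion', 'deletion' or 'other' for one REF/ALT pair."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt or not set(ref) <= VALID_BASES or not set(alt) <= VALID_BASES:
        raise ValueError(f"unexpected allele strings: {ref!r}, {alt!r}")
    if len(ref) == 1 and len(alt) == 1:
        # One base replaced by a different base; identical alleles are not a variant.
        return "SNP" if ref != alt else "other"
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        # Anchor base kept, extra bases inserted after it.
        return "insertion"
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        # Anchor base kept, the following reference bases deleted.
        return "deletion"
    # Multi-base substitutions and other complex events are not classified here.
    return "other"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "CTT"), ("GAT", "G"), ("AC", "GT")]:
        print(ref, alt, classify_variant(ref, alt))
```

Run as a script, this labels the four example pairs as SNP, insertion, deletion and other. Real variant callers also handle multi-allelic sites and normalise indel positions, which this sketch deliberately leaves out.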

The RNA-Seq dataset we will use comes from Luo J. et al. 2011 and includes four Homo sapiens samples: two replicates from bowel adenocarcinoma tissue and two controls from healthy bowel tissue. All four RNA-Seq samples consist of 65 bp single-end short reads generated with Illumina technology. The dataset has been published on GEO, the Gene Expression Omnibus database from NCBI, under accession number GSE29580. The experiment is available on Genestack with accession GSF000402; we can find it by typing GSF000402 in the search box at the top of the page.



GSF000402 is an experiment, and clicking on it will open a folder containing its corresponding sequencing assays, which are the RNA-Seq libraries from each sample. The yellow box shows useful information on this dataset: we can click on "show more" to read a description, or click on "Open Metainfo Viewer" to see metainformation on the experiment, including details for all samples.


Now that we have chosen an experiment, we can proceed with the analysis using Genestack applications. There are four main categories:

  1. Preprocess, for QC, filtering and editing data;

  2. Analyse, for sequencing data analysis, including differential gene expression for RNA-Seq or peak calling for ChIP-Seq;

  3. Explore, for data visualization and reports;

  4. Manage, for creating groups and sharing files and experiments.