How to import and organize genomic data in the cloud

The Genestack platform lets users upload large datasets to the cloud and keep them organized and format-free. In this guide, we'll import data from the Merkin et al. (2012) experiment and show how Genestack makes this process fast and reliable.

Step 1: Uploading files

In the File Manager, click the Import button. There are two ways to upload data into the platform:

1. Use data from your computer. Select or drag-and-drop files.



2. Upload from URLs (FTP or HTTP/HTTPS). Specify URLs for individual files or for whole directories. This is what we're going to do. Click "Import URLs" and paste the URLs of these files:


Files are uploaded in multiple streams to increase upload speed.
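The guide doesn't say how Genestack implements multi-stream transfers. A common approach is to request different byte ranges of the same file in parallel and reassemble them afterwards. The minimal sketch below only computes those ranges and the matching HTTP Range headers, assuming the source server supports range requests; the names split_ranges and range_headers are hypothetical and not part of Genestack.

```python
def split_ranges(total_size: int, n_streams: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) byte ranges covering bytes 0..total_size-1."""
    if total_size <= 0:
        return []
    # Never open more streams than there are bytes to fetch.
    n_streams = max(1, min(n_streams, total_size))
    chunk, remainder = divmod(total_size, n_streams)
    ranges = []
    start = 0
    for i in range(n_streams):
        # Spread leftover bytes over the first `remainder` streams.
        length = chunk + (1 if i < remainder else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_headers(total_size: int, n_streams: int) -> list[dict[str, str]]:
    """Build one HTTP Range header ("bytes=start-end") per stream."""
    return [{"Range": f"bytes={s}-{e}"} for s, e in split_ranges(total_size, n_streams)]


print(range_headers(10, 3))
# → [{'Range': 'bytes=0-3'}, {'Range': 'bytes=4-6'}, {'Range': 'bytes=7-9'}]
```

Each stream would then send a GET request with its own header and write the response at the matching offset of the output file.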

Uploading from URLs is done in the background. This means that even while these files are being uploaded, you can describe their metadata and use them in pipelines. Click the "Import files" button to proceed.

Step 2: Format recognition

After the data is uploaded, Genestack automatically recognizes file formats and transforms the files into biological data types: raw reads, mapped reads, reference genomes, etc. You won't have to worry about formats at all, because Genestack handles format conversions internally. If files are not recognized, or are recognized incorrectly, you can manually assign them to a specific data type using drag & drop. Click "Edit Metainfo" to proceed.
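Genestack's actual recognition logic isn't described here. As a rough illustration of the idea, many sequencing formats can be told apart from their first few bytes. The heuristic below is a simplified, hypothetical sketch (the function name sniff_data_type is made up): it unwraps gzip if present, then checks the BAM magic, SAM header tags, and the leading character of FASTQ and FASTA files. It will misclassify edge cases such as headerless SAM files.

```python
import zlib


def _maybe_gunzip(head: bytes) -> bytes:
    """Decompress a gzip-compressed file prefix; return other input unchanged."""
    if head[:2] == b"\x1f\x8b":
        # 16 + MAX_WBITS tells zlib to expect a gzip header. A decompressobj
        # accepts a truncated stream, so a prefix of the file is enough.
        return zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(head)
    return head


def sniff_data_type(head: bytes) -> str:
    """Guess a coarse data type from the first bytes of a file."""
    data = _maybe_gunzip(head)
    if data.startswith(b"BAM\x01"):
        # BAM is BGZF-compressed (gzip-compatible) and starts with this magic.
        return "mapped reads (BAM)"
    if data.startswith((b"@HD\t", b"@SQ\t", b"@RG\t", b"@PG\t", b"@CO\t")):
        # SAM header lines: check these before FASTQ, which also starts with '@'.
        return "mapped reads (SAM)"
    if data.startswith(b"@"):
        return "raw reads (FASTQ)"
    if data.startswith(b">"):
        return "sequences (FASTA, e.g. a reference genome)"
    return "unrecognized"


# Usage: read only a small prefix of each file, e.g.
# with open("reads.fastq.gz", "rb") as fh:
#     print(sniff_data_type(fh.read(4096)))
```

Reading a few kilobytes per file keeps this kind of check cheap even for multi-gigabyte uploads.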

Step 3: Editing metainfo

In this step, you can describe the uploaded data using an Excel-like spreadsheet. Edit file metainfo and add new columns, choosing from existing metainfo fields or creating new ones. Let's add a "Tissue" column: click the "Add column" button and choose "Tissue" from the dropdown. Add the other metainfo fields and fill them in according to the table below. It's important to fill in the "Organism" field so that your data is well organized:

Name        Organism            Tissue
SRR594393   Mus musculus        brain
SRR594394   Mus musculus        colon
SRR594420   Rattus norvegicus   colon
SRR594421   Rattus norvegicus   heart
SRR594395   Mus musculus        heart
SRR594419   Rattus norvegicus   brain

This metainfo makes your data easier to work with in further analysis. The following picture shows how to run a tissue-specific isoform expression analysis, using the "Tissue" field to group the samples automatically. To reproduce this analysis, read our blog post about isoform expression analysis.
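In Genestack, grouping happens in the user interface. To make concrete what "grouping samples by a metainfo field" means, here is a plain-Python sketch over the metainfo table from Step 3. It is not a Genestack API; missing_field and group_by are hypothetical helpers. The first checks that the required "Organism" field is filled for every file, and the second collects sample names per value of a field.

```python
from collections import defaultdict

# Metainfo from the table in Step 3, one dict per file.
SAMPLES = [
    {"Name": "SRR594393", "Organism": "Mus musculus", "Tissue": "brain"},
    {"Name": "SRR594394", "Organism": "Mus musculus", "Tissue": "colon"},
    {"Name": "SRR594420", "Organism": "Rattus norvegicus", "Tissue": "colon"},
    {"Name": "SRR594421", "Organism": "Rattus norvegicus", "Tissue": "heart"},
    {"Name": "SRR594395", "Organism": "Mus musculus", "Tissue": "heart"},
    {"Name": "SRR594419", "Organism": "Rattus norvegicus", "Tissue": "brain"},
]


def missing_field(samples: list[dict], field: str) -> list[str]:
    """Names of samples whose `field` is absent or empty."""
    return [s["Name"] for s in samples if not s.get(field)]


def group_by(samples: list[dict], field: str) -> dict[str, list[str]]:
    """Map each value of `field` to the names of the samples that carry it."""
    groups = defaultdict(list)
    for s in samples:
        groups[s[field]].append(s["Name"])
    return dict(groups)


assert not missing_field(SAMPLES, "Organism")  # every file has an organism
print(group_by(SAMPLES, "Tissue"))
# → {'brain': ['SRR594393', 'SRR594419'], 'colon': ['SRR594394', 'SRR594420'], 'heart': ['SRR594421', 'SRR594395']}
```

Each tissue group contains one mouse and one rat sample, which is what makes the cross-species, tissue-specific comparison possible.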

More cool stuff:

If you lose your Internet connection during an upload, you can resume unfinished uploads later. In addition to uploading your own data, you have free access to public data imported from SRA, GEO, ENA and other databases. These datasets are located in the Public experiments folder, which even includes the experiment used in this guide, with all of its metadata.
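The guide doesn't explain how Genestack resumes uploads. For imports from HTTP(S) URLs, a standard mechanism is a range request that asks the source server only for the bytes not yet received. The sketch below uses only the Python standard library. It assumes the server supports range requests, the helper names are hypothetical, and FTP resumption (done with the FTP REST command) is not covered.

```python
import os
import urllib.request


def bytes_on_disk(path: str) -> int:
    """Size of a partially downloaded file, or 0 if it does not exist yet."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def resume_request(url: str, offset: int) -> urllib.request.Request:
    """GET request asking only for bytes from `offset` onward (if any are done)."""
    headers = {"Range": f"bytes={offset}-"} if offset > 0 else {}
    return urllib.request.Request(url, headers=headers)


def resume_download(url: str, path: str, chunk_size: int = 1 << 20) -> None:
    """Continue an interrupted HTTP(S) transfer of `url` into `path`."""
    offset = bytes_on_disk(path)
    with urllib.request.urlopen(resume_request(url, offset)) as resp:
        # 206 Partial Content: the server honoured the range, so append.
        # Any other success code (e.g. 200) means the full file was sent,
        # so overwrite. An already complete file may get 416, raised as HTTPError.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
```

Checking the 206 status before appending matters: a server that ignores the Range header would otherwise cause the file to be duplicated on disk.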

Try Genestack Import and let us know what you think. Please leave comments and report bugs below, or email support@genestack.com.