The Genestack platform lets users upload large datasets to the cloud and keep them organized without having to manage file formats. In this guide, we'll import data from the Merkin et al. (2012) experiment and show how Genestack makes this process fast and reliable.
Step 1: Uploading files
You can upload files in two ways:
1. Use data from your computer. Select files or drag and drop them.
2. Upload from URLs (FTP or HTTP/HTTPS). Specify URLs for individual files or directories. This is what we're going to do. Click "Import URLs" and paste these URLs:
Uploading from URLs is done in the background. This means that even while these files are being uploaded, you can describe their metadata and use them in pipelines. Click the "Import files" button to proceed.
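Since the importer accepts only FTP and HTTP/HTTPS links, it can help to check a list of URLs before pasting it. The following is a minimal sketch in plain Python, independent of Genestack; the function name `split_by_scheme` and the `example.org` URLs are hypothetical placeholders, not the files used in this guide.

```python
from urllib.parse import urlparse

# Schemes this guide says the URL importer accepts.
SUPPORTED_SCHEMES = {"ftp", "http", "https"}


def split_by_scheme(urls):
    """Return (accepted, rejected) lists based on URL scheme and host.

    Hypothetical helper: it only checks the shape of each URL and
    does not contact any server.
    """
    accepted, rejected = [], []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines
        parsed = urlparse(url)
        if parsed.scheme in SUPPORTED_SCHEMES and parsed.netloc:
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    candidates = [
        "ftp://example.org/reads/sample_1.fastq.gz",
        "https://example.org/reads/sample_2.fastq.gz",
        "file:///home/user/sample_3.fastq.gz",  # local path, not a supported URL
        "example.org/sample_4.fastq.gz",        # scheme is missing
    ]
    ok, bad = split_by_scheme(candidates)
    print("Ready to paste:", *ok, sep="\n  ")
    print("Needs fixing:", *bad, sep="\n  ")
```

Anything in the second list is better uploaded from your computer or corrected before you click "Import files".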
Step 2: Format recognition
After the data is uploaded, Genestack automatically recognizes file formats and transforms the files into biological data types: raw reads, mapped reads, reference genomes, etc. You don't have to worry about formats at all, because Genestack handles format conversions internally. If files are not recognized or are recognized incorrectly, you can manually assign them to a specific data type using drag and drop. Click "Edit Metainfo" to proceed.
Step 3: Editing metainfo
In this step, you can describe the uploaded data using an Excel-like spreadsheet. Edit file metainfo and add new columns, choosing from existing metainfo fields or creating new ones. Let's add a "Tissue" column. Click the "Add column" button and choose "Tissue" from the dropdown:
Add other metainfo fields and fill them in according to the table below. It's important to fill in the "organism" field so that your data is well organized:
This metainfo will be useful in further analysis and will make your data easier to work with. The following picture shows how to run a tissue-specific isoform expression analysis, using the "Tissue" field to group the samples automatically. To reproduce this analysis, read our blog post about isoform expression analysis.
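To make the role of these fields concrete, here is a minimal Python sketch, independent of Genestack, that treats a metainfo sheet as a list of rows, reports files with an empty "organism" field, and groups files by "Tissue". The helpers `missing_field` and `group_by` are hypothetical, and the sample names and values are placeholders rather than the metadata of the experiment used in this guide.

```python
from collections import defaultdict


def missing_field(rows, field):
    """Return the names of files whose `field` is absent or blank."""
    return [r["Name"] for r in rows if not (r.get(field) or "").strip()]


def group_by(rows, field):
    """Map each value of `field` to the names of the files that carry it."""
    groups = defaultdict(list)
    for r in rows:
        key = (r.get(field) or "").strip() or "unassigned"
        groups[key].append(r["Name"])
    return dict(groups)


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sheet = [
        {"Name": "sample_1", "organism": "Mus musculus", "Tissue": "brain"},
        {"Name": "sample_2", "organism": "Mus musculus", "Tissue": "liver"},
        {"Name": "sample_3", "organism": "", "Tissue": "brain"},
    ]
    print("Missing organism:", missing_field(sheet, "organism"))
    for tissue, names in group_by(sheet, "Tissue").items():
        print(tissue, "->", ", ".join(names))
```

A blank or inconsistent value puts a sample in the wrong group or in none, which is why it pays to fill in the fields carefully before running an analysis.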
More cool stuff:
If you lose your Internet connection during an upload, you can resume unfinished uploads later. In addition to uploading your own data, you have free access to public data imported from SRA, GEO, ENA and other databases. These datasets are located in the "Public experiments" folder, which even includes the experiment used in this guide (with all its metadata).