Getting started on Genestack Platform

In this tutorial, we'd like to introduce you to the core features of Genestack Platform. You will learn how our system deals with files, how it helps you organise and manage your data and how to share data with your colleagues. You will see how easy it is to work on private and public data simultaneously and seamlessly, and how to reproduce complex analyses with data flows, a built-in mechanism for capturing and replaying your research. In this tutorial we will walk you through:

1. Creating an Account and Managing Users
2. Creating Users
3. Organising your research
4. Importing data onto the Genestack Platform
5. Initialising files
6. Managing and Sharing Data
7. Reproducing the Analyses with Data Flows

Creating an Account and Managing Users

It's easy to register on Genestack. All you need to do is provide an email address and set up a password. You will quickly receive a confirmation email with a link to click on, and then you'll be able to log in. After you log in, the system will take you to the Welcome Page. This is your main point of entry: here you can manage and search data using the File Manager, view your recent results and share findings with your colleagues, set up and launch analysis pipelines, and visit the tutorials section. You can always go back to the Welcome Page by clicking the Genestack icon in the upper left corner. If you prefer, you can change your account settings to make the File Manager, rather than the Welcome Page, your main point of entry to the platform.

Creating Users

Now that you have set up your own account, let's talk about user management. Open the menu in the top right corner of the screen, where your email address is displayed, and click on Manage Users to go to the user management screen. Every user in Genestack Platform belongs to an organisation. When you signed up via the sign-up dialog, we created a new organisation for you, and you automatically became its first user and its administrator. As an organisation administrator you can create as many new users for your organisation as you want, for instance accounts for your colleagues. Users in the same organisation can share data without any restrictions. The user management screen gives you an overview of all users in your organisation. You can change a user's password, make any user an administrator, lock a user out of the system, or create new users. Let's create a second user by clicking the Create user button. You will need to set the user name, email address and password. Users added this way are confirmed immediately and can log in right away.

Organising your research

So you have created a Genestack Platform account, logged in and created a new user. Let's now talk about data organisation on Genestack Platform. From the Welcome Page, go to the File Manager and explore it a little. Right now you do not have any private files, but you have access to all public data available on the platform. We have preloaded the platform with hundreds of thousands of publicly available experiments, curated datasets and reference genomes. In the public data folder you will also find Public Data Flows that you can use later.

Format-Free Files

While you browse around the folders, we'd like to point out a key feature of Genestack Platform: all files are format-free objects. Each Genestack file can be considered a container that packs several physical files, or even a database, together with complex and rich metadata. Let us take a look at an example. In the Reference genomes folder you will see several preloaded genomes. Take a look at the KIND column: these files are of the Reference Genome type. There is no single, commonly accepted file format for storing and exchanging genomic sequence and features: sequence can be stored in FASTA, EMBL or GenBank formats, and genomic features (introns, exons, etc.) can be represented, for example, in GFF or GTF files. Each of these formats comes in several flavours and versions, which are occasionally incompatible or in conflict with each other. In Genestack you no longer have to know or worry about file formats ("low-level implementation details", as programmers call them). A Reference Genome file contains packed sequence and genomic features. When data such as a reference genome is imported onto Genestack (several different formats can be imported), it is "packed" into a Genestack file, so all reference genomes behave identically, regardless of differences in the physical formats underneath. You can browse reference genomes with our Genome Browser, use them to map raw sequence reads, analyse variations, and add and manage rich annotations such as Gene Ontology, and you never have to think about formats again. Of course, reference genomes are not the only format-free files. All files in Genestack Platform are format-free: raw reads, mapped sequence, statistics, genome annotations, genomic data, codon tables and so forth. You can take a look at any file's metadata by clicking the "eye" icon in the File Manager. All files have rich metadata, which differs between file types.
Some metadata fields are filled in by our curators, some are available for you to edit via different applications in Genestack Platform, and some are computed when files are initialised.
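The "packing" idea can be illustrated with a conceptual Python sketch. This is not Genestack's implementation or API: the `ReferenceGenome` and `Feature` classes and the parsing helpers are hypothetical, and the parsers handle only simple FASTA and GFF3 input. The point is that once sequence and features are packed into one object, code that uses it (here, extracting the sequence of a feature) no longer depends on the original file formats.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    seqid: str
    type: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int    # 1-based, inclusive
    strand: str


@dataclass
class ReferenceGenome:
    """Hypothetical container: sequences and features packed into one object."""
    sequences: dict = field(default_factory=dict)  # sequence name -> bases
    features: list = field(default_factory=list)   # list of Feature

    def feature_sequence(self, feature: Feature) -> str:
        # Convert 1-based inclusive GFF coordinates to a Python slice.
        return self.sequences[feature.seqid][feature.start - 1:feature.end]


def parse_fasta(text: str) -> dict:
    """Parse simple FASTA text into {sequence name: sequence}."""
    sequences, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # The sequence name is the first word of the header line.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def parse_gff(text: str) -> list:
    """Parse GFF3 feature lines, keeping only a few of the nine columns."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.split("\t")
        features.append(Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6]))
    return features


def from_fasta_and_gff(fasta_text: str, gff_text: str) -> ReferenceGenome:
    # After this point nothing downstream needs to know about FASTA or GFF.
    return ReferenceGenome(parse_fasta(fasta_text), parse_gff(gff_text))
```

A second importer, for example one reading GenBank, would only need to produce the same `ReferenceGenome` object for everything downstream to keep working unchanged.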

Importing data onto the Genestack Platform

We've talked about the core concepts of Genestack and the geography of the Platform. Now let's discuss importing data onto the platform. You can find an "Import data" option on the Welcome Page and an "Import" button in the File Manager.
Clicking either of them takes you to the Raw File uploader page, which offers several ways to import your data: you can drag and drop or select files from your computer, import data from a URL, or use previous uploads. After the data is uploaded and imported, the platform automatically recognises file formats and transforms them into biological data types, e.g. raw reads, mapped reads, reference genomes and so on. This means you won't have to worry about formats at all, which will most likely save you a lot of time. If some files are not recognised, you can manually allocate them to a specific data type using the drag-and-drop area at the top of the page. In the next step, "Edit metainfo", you can describe the uploaded data: using an Excel-like spreadsheet, you can edit the file metainfo and add new attributes, for example cell type or age. Once this step is completed, click "Show files in File Manager" at the bottom of the page. Take a look at the KIND column: there are no file formats, just biological data types.
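Genestack's actual recognition logic is not described in this tutorial, so the sketch below only illustrates the idea with a hypothetical extension lookup (`EXTENSION_TO_TYPE`, `guess_data_type` and `group_uploads` are made-up names): recognised uploads are grouped by data type, and anything unrecognised is set aside for manual assignment, as in the uploader. Real format detection would usually inspect file contents rather than file names.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from file extensions to data types, for illustration only.
EXTENSION_TO_TYPE = {
    ".fastq": "Raw reads",
    ".fq": "Raw reads",
    ".bam": "Mapped reads",
    ".sam": "Mapped reads",
    ".vcf": "Genetic variations",
    ".gtf": "Genome annotations",
    ".gff": "Genome annotations",
}

COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def guess_data_type(filename: str) -> Optional[str]:
    """Return a data type for the file name, or None if it is unrecognised."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    # Ignore a trailing compression suffix such as ".gz".
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    if not suffixes:
        return None
    return EXTENSION_TO_TYPE.get(suffixes[-1])


def group_uploads(filenames):
    """Split uploads into {data type: [names]} and a list of unrecognised names."""
    recognised, unrecognised = {}, []
    for name in filenames:
        data_type = guess_data_type(name)
        if data_type is None:
            # These would have to be assigned to a data type manually.
            unrecognised.append(name)
        else:
            recognised.setdefault(data_type, []).append(name)
    return recognised, unrecognised
```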

Another way to import your data is to use import templates. Import templates allow you to specify required and optional metainfo attributes for different file kinds. To create one, scroll down to the bottom of the Welcome Page, where you'll find an "Add import template" button.
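As an illustration of what an import template expresses, here is a hypothetical sketch that checks a file's metainfo against the attributes its kind requires. The `ImportTemplate` class, its methods and the attribute names are assumptions, not Genestack's API. Optional attributes belong to the template but are never reported as missing.

```python
from dataclasses import dataclass, field


@dataclass
class ImportTemplate:
    """Hypothetical import template: required and optional attributes per file kind."""
    required: dict = field(default_factory=dict)  # kind -> set of attribute names
    optional: dict = field(default_factory=dict)  # kind -> set of attribute names

    def missing_attributes(self, kind: str, metainfo: dict) -> list:
        """Return required attributes that are absent or left empty, sorted by name."""
        filled = {key for key, value in metainfo.items() if value not in (None, "")}
        return sorted(self.required.get(kind, set()) - filled)

    def allowed_attributes(self, kind: str) -> set:
        # Everything the template mentions for this kind, required or optional.
        return self.required.get(kind, set()) | self.optional.get(kind, set())
```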

Initialising files and various file types

Now that you know how to import data onto the platform, we will walk you through file initialisation. All files on Genestack are created by applications. When an application creates a new file, it specifies what should happen when the file is initialised: running a script, a download, indexing or a computation. In practice this means that uninitialised files are cheap and quick to create; they can be configured and used as inputs to applications that create other files, and then, later, computed all at once. Let's look at an example. Go to the public experiment library and choose the experiment "Analysis of the intestinal microbiota of hybrid house mice reveals evolutionary divergence in a vertebrate hologenome" by Wang et al. Select the raw sequencing reads file called "FS01", right-click on it, and select "Preprocessing" and then the "Trim Low Quality Bases" app. This creates a file called "Trimmed FS01" that is not yet initialised. What is special about our system is that you do not have to start initialisation right away: you can already use this file as an input to applications that create other files. Notice that you can edit the initialisation parameters of the new file. You can change them because the file is not yet initialised, i.e. the computation (in this case, trimming) has not yet started. After initialisation has completed, these parameters are fixed and tell you how the file was created; they can be used to reproduce your work identically. If you want to start initialisation of this newly created file, click on the name of the file and select "Start initialisation". Here, however, we will show you how to use this file as an input for a different application. The trimmed file can, for example, be mapped to a reference genome. To do this, click on "add step" and select the Spliced Mapping application.
Using the "edit parameters" option you can check whether the system suggested the correct reference genome and, if not, select the correct one (in this case, a mouse genome). These actions create another file called "Mapped reads for Trimmed FS01" that is waiting to be initialised. This file, in turn, can be used as an input for a different application. As a last step you could, for example, create a genetic variations file by choosing the Variant Calling app in the "add step" option. To see the entire data flow we have just created, click on the name of the last created file, then go to "Manage" and "File provenance". This view shows the processes that have been completed and the ones that still need to be initialised. To initialise only one of the steps, click on its cell, then on "Actions", and select "Start initialization". To initialise all uninitialised dependencies, simply click the blue "Start initialisation" button at the top. You can track the progress of your computations using the Task Manager at the top of the page. All the files created in the above example are located in the tutorial folder. You can read more about data flows in the last section of this tutorial. One more thing worth mentioning: analysing more than one file with the same app is very easy. Just tick all the files you want to analyse, right-click on them and select the app you wish to use. The steps are identical to those for analysing a single file. For example, running an app on 100 selected files creates 100 new files that need to be initialised to start the tasks. Now let's talk a bit about the different types of files that can be found on the platform. As we have seen, every file has a built-in system type.
Some of these file types are particularly useful for organising your research, and we will now discuss them in more detail.
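Before moving on to file types, here is a conceptual Python model of the lazy initialisation walked through above, in which files are created and configured first and computed later together with their dependencies. It is a sketch, not how Genestack is implemented: `LazyFile`, `initialise`, `AlreadyInitialisedError` and the `quality_cutoff` parameter in the usage below are hypothetical. Initialising a file first initialises any uninitialised inputs, much like the "Start initialisation" button in the File provenance view, and parameters can no longer change once a file is initialised.

```python
from dataclasses import dataclass, field


class AlreadyInitialisedError(Exception):
    """Raised when trying to change parameters of an initialised file."""


@dataclass
class LazyFile:
    """Hypothetical model of a file whose computation is deferred."""
    name: str
    params: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # files this one is computed from
    initialised: bool = False

    def set_param(self, key, value):
        # Parameters are editable only until the file has been initialised.
        if self.initialised:
            raise AlreadyInitialisedError(f"{self.name} is already initialised")
        self.params[key] = value


def initialise(target: LazyFile) -> list:
    """Initialise target and all its uninitialised dependencies, inputs first.

    Returns the names of the files that were computed, in order.
    """
    order = []

    def visit(f: LazyFile):
        if f.initialised:
            return  # already computed, nothing to do
        for dep in f.inputs:
            visit(dep)
        # Stand-in for the real work (a script, a download, indexing, ...).
        f.initialised = True
        order.append(f.name)

    visit(target)
    return order
```

With a raw file, a trimmed file, a mapped file and a variants file chained together, calling `initialise` on the variants file computes the three derived files in dependency order; calling it again does nothing, because everything in its provenance is already initialised.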


There are many different file types in Genestack Platform. Every file is created by an application, and each file has a lot of associated metadata. For example, every file has one or more unique accessions, a name and a description. Applications use the file type and metadata to suggest what kinds of analyses a given file can be used in. Almost anywhere you see file names and accessions, e.g. in the File Manager or in other applications, you can click on them to bring up a file context menu. For example, clicking on a file containing raw sequencing reads displays such a menu. You can view and edit file metadata with Edit Metainfo, which appears under the Manage submenu. You can also open the metainfo viewer on any file in the system, for example a sequencing assay, by clicking on the eye icon.


Folders in Genestack behave the same as folders in other systems: you can put files in folders and remove files from them. There is one very useful difference from most systems, however: each file can be added (or, as we sometimes say, "linked") to multiple folders. No data is copied, of course; the file simply appears in multiple locations. This is very handy for organising your work. For example, you can collect files from multiple experiments into one folder and work on them as if they were all part of one experiment.
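One way to picture linking: folders store only references (accessions) to files that are kept once in a common store, so linking a file into a second folder duplicates nothing, and removing it from one folder leaves the file and its other links intact. The `Workspace` class and the accession `GSF000001` used below are hypothetical illustrations, not Genestack's data model.

```python
class Workspace:
    """Hypothetical model: folders store references to files, never copies."""

    def __init__(self):
        self.files = {}    # accession -> file object (stored exactly once)
        self.folders = {}  # folder name -> set of accessions

    def add_file(self, accession, data):
        self.files[accession] = data

    def link(self, folder, accession):
        # Linking only records the accession; no data is duplicated.
        self.folders.setdefault(folder, set()).add(accession)

    def unlink(self, folder, accession):
        # Removing a link does not delete the file itself.
        self.folders.get(folder, set()).discard(accession)

    def contents(self, folder):
        return [self.files[a] for a in sorted(self.folders.get(folder, set()))]
```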

Experiments, Assays, and Assay Groups

An experiment is a very special kind of folder. It contains only assays, that is, files that contain experimentally collected data. You can think of experiments as handy packages for experimental data. Assays are a general category of file types that store experimentally collected data. Assay groups are a way to collect assays with common metadata into experimental subgroups, e.g. technical replicates, biological samples undergoing the same treatment, and so forth.
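Collecting assays with common metadata into subgroups is essentially a group-by over metadata values. A hypothetical sketch follows; the function name, assay names and metadata keys are made up for illustration, and this is not how Genestack defines assay groups internally.

```python
from collections import defaultdict


def group_assays(assays, keys):
    """Group assay names by their values for the given metadata keys.

    `assays` is a list of metadata dicts, each with a "name" entry.
    Assays missing a key are grouped under None for that key.
    """
    groups = defaultdict(list)
    for assay in assays:
        signature = tuple(assay.get(k) for k in keys)
        groups[signature].append(assay["name"])
    return dict(groups)
```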

Managing and sharing data

To share data, we use groups. Managing groups is similar to managing users: click on your email address in the upper right corner and select "Manage groups" from the administration menu. Any user can create a group. You can think of a group as a shared project for two or more users. Let's share some data: the files we created above. To do this, go to Manage Groups and create your first group. The new group appears right away, and you can add a new member to it. No confirmation is needed: any user in your organisation can create a group and add other users from the organisation to it. You are the group administrator of any group you create. As group administrator you can add or remove users, make them administrators, or make them "sharing" or "non-sharing" users. All groups appear as folders under "Shared with me" in the File Manager, and as soon as you add a user to a group, they will see the group's folder in their File Manager. Group folders are the same as all other folders in the system: you can add files to and remove files from them just as with any other folder. There is an important point to note, though: adding a file to a group folder is not the same as sharing it with the group. To share one or more files with a group, select them in the File Manager (sharing functionality is also available in other places in Genestack Platform) and click "Share". In the window that opens, select the group you want to share the files with. You also have the option of allowing members to edit the files in addition to viewing them.
Without edit permission, members can still use the shared files in data flows, but they cannot edit the parameters of files that are not yet initialised, nor the metadata of any shared file, initialised or not. After you select the group and click Share, you'll need to confirm the sharing by entering your password (the system remembers this authorisation for the next five minutes). You will then have the option, though you don't have to take it, to also add the files to the group folder. If you choose to link the shared files, all group members will see them at the top level of the group folder. If you choose not to link them into the group folder, don't worry: the files are still shared. Group members will see them in search results and in file provenance data flows, and will be able to open them in applications. You can always add shared files to group folders later. If you add a file to a group folder, e.g. by dragging and dropping it in the File Manager, the platform will try to detect this and ask whether you want to share the file first.
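The rules above can be summarised in a small conceptual model in which sharing (with or without edit permission) and linking into a group folder are tracked separately. `SharedFile` and the helper functions are hypothetical names, not Genestack's API, and the model ignores the file owner's own rights.

```python
from dataclasses import dataclass, field


@dataclass
class SharedFile:
    """Hypothetical model of a file's sharing state, kept apart from folder links."""
    name: str
    initialised: bool
    shared_with: dict = field(default_factory=dict)  # group -> can_edit (bool)
    linked_in: set = field(default_factory=set)      # group folders the file appears in


def share(file: SharedFile, group: str, can_edit: bool = False) -> None:
    file.shared_with[group] = can_edit


def can_use(file: SharedFile, user_groups) -> bool:
    # Any share, read-only or editable, lets members open the file and use it in data flows.
    # Being linked into a group folder alone does not count as sharing.
    return any(g in file.shared_with for g in user_groups)


def can_edit_metadata(file: SharedFile, user_groups) -> bool:
    return any(file.shared_with.get(g, False) for g in user_groups)


def can_edit_parameters(file: SharedFile, user_groups) -> bool:
    # Parameters can only change before initialisation, and only with edit permission.
    return not file.initialised and can_edit_metadata(file, user_groups)
```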

Sharing with Other Organisations

It is very easy to share data with users in the same organisation: you simply create a group and share files, and all group members see the shared data immediately. What about sharing across organisations? Say you work in a hospital research group, have imported some valuable sequence data from pathogenic specimens into Genestack Platform, and want to share it with colleagues at a pharmaceutical company who are working on novel drugs to kill the pathogen. It is easy to set up a new cross-organisational group, or to turn an existing group into one. When you add new users, simply type in the email address of the user from the other organisation. Genestack Platform autocompletes email addresses only for users in your own organisation. This is a security feature: it means that no one from another organisation can find out who from your organisation is registered in Genestack Platform. After you enter the user's email address, you will see a new screen. The new user is not added to your group right away: instead, you create an invitation to the other organisation. Your organisation's administrator needs to approve it first, and then the other organisation's administrator has to approve it too. After the administrators of both organisations have confirmed the collaboration, the group becomes a cross-organisational group and further users can be added easily. The inviting organisation's administrator will see the invitation in their group management screen. Once they confirm the outgoing invitation, the other organisation's administrator will see it in their Incoming invitations section and will have to confirm it as well. After both confirmations, the new group has members from both organisations. Note that you can change the status of users from your organisation, but not of users from other organisations. A cross-organisational group can have multiple participating organisations.
Adding each new participating organisation requires approval from the administrators of all organisations already in the group, as well as from an administrator of the organisation being invited. Once the approvals are in, sharing is easy. This way you can collaborate across organisational (enterprise) boundaries, with appropriate administrative controls in place.
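The approval rule for adding an organisation can be stated as a simple set check. This is a hypothetical sketch of the rule only; the function names and organisation names are illustrative and do not correspond to any Genestack API.

```python
def required_approvals(member_orgs, invited_org):
    """Organisations whose administrators must approve adding invited_org:
    every organisation already in the group, plus the one being invited."""
    return set(member_orgs) | {invited_org}


def invitation_accepted(member_orgs, invited_org, approvals):
    # approvals: organisations whose administrator has approved so far.
    return required_approvals(member_orgs, invited_org) <= set(approvals)
```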

Reproducing your work with data flows

So, you have learned how to work with files and folders, and you have even created a simple analytical data flow that goes from raw sequence to a list of variants. Now, let's talk about reproducibility. We will show you how to take any data file in Genestack Platform and repeat the analysis steps that led up to it on different data. Let's go back to the genetic variations file you created, called "Variants from Mapped reads from Trimmed FS01". You can use the Welcome Page to find it in Recent Results, or go to the "Created files" folder in the File Manager. You can also find it in the tutorial folder. Rather than viewing its provenance as we did before, let's reuse it. To do this, select the file, go to "Manage" and select "Create new Data Flow". The next screen shows the data flow we created previously. The data flow editor has one core goal: to help you create more files using this diagram. To do this, you will need to make some decisions for the boxes in the diagram via the Action menu. If you want to select a different file, choose "Choose another file"; to keep the original file, simply leave the box unchanged. In this example, we will use the data flow to produce variant calls for another raw sequencing data file, FS02, reproducing the entire workflow, including trimming low quality bases, spliced mapping and variant calling. All you need to do is choose another input file and click the "Run dataflow" button at the top of the page. You will then be given a choice: you can initialise the entire data flow now or delay initialisation. If you decide to delay initialisation until later, you will be brought back to the Data Flow Runner page, where you can initialise individual files by clicking on the file name and selecting "Start initialization". This is the end of this tutorial.
We hope you found it useful and that you are now ready to make the most of our platform. If you have any questions, you can post them on our forum and we will answer them as soon as we can. Alternatively, you can email us.
Genestack team