During their research on pain disorders, Pfizer’s Cambridge team experimented on human sensory neurons. These cell models can be generated by directed differentiation of pluripotent stem cells. To assess what proportion of cells had successfully differentiated into neurons, and to further classify them into more specific neuronal subtypes, the researchers used single-cell RNA sequencing, a technique that profiles cell-to-cell variability at the transcriptomic level.
Distinguishing technical noise from true biological variability is a key issue in single-cell RNA-sequencing analysis. Normalization methods and bioinformatic tools commonly used for bulk RNA-seq have been shown to have difficulty handling the high cell-to-cell variability. One solution is to add, or ‘spike in’, a known amount of specific RNA transcripts in order to quantify technical noise. Computational techniques for dealing with this sort of data are known (Brennecke et al., 2013). However, no good solution existed for spike-in-free datasets. Pfizer turned to Genestack to develop a solution for processing and analysing such data in an intuitive and visual manner.
Our team tackled this challenge by developing two custom applications that run on the Genestack platform: the Single-Cell Analyser and the Single-Cell Visualiser.
First, the Single-Cell Analyser application identifies a set of genes that show significant biological variability across the cells, and then groups them into clusters with similar gene expression patterns. To do this, it uses two noise models: the Brennecke et al. (2013) model and a novel, custom-developed approach for the analysis of spike-in-free datasets.
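To make the idea of separating biological from technical variability more concrete, here is a minimal Python sketch. It is not the Single-Cell Analyser's implementation. It assumes a technical-noise curve of the general form CV² = a1/μ + a0, in the spirit of the fit described by Brennecke et al. (2013), and simply flags genes whose observed CV² exceeds that curve by a fixed margin. The parameter values `a0`, `a1` and `margin`, the function names, and the toy counts are all hypothetical. The published method estimates the curve from spike-ins and applies a statistical test rather than a fixed margin.

```python
from statistics import mean, variance


def squared_cv(values):
    """Squared coefficient of variation: sample variance / mean^2."""
    m = mean(values)
    return variance(values) / (m * m)


def technical_cv2(mu, a0, a1):
    """Assumed technical noise level at mean expression mu: a1/mu + a0."""
    return a1 / mu + a0


def flag_variable_genes(counts, a0, a1, margin=2.0):
    """Return genes whose observed CV^2 exceeds margin * technical CV^2.

    counts maps a gene name to its normalised counts across cells.
    This fixed-margin rule is a simplification, not a significance test.
    """
    flagged = []
    for gene, values in counts.items():
        mu = mean(values)
        if mu <= 0:
            continue  # no expression, nothing to assess
        if squared_cv(values) > margin * technical_cv2(mu, a0, a1):
            flagged.append(gene)
    return flagged


# Toy data (hypothetical): one stable gene, one gene that differs between cells.
counts = {
    "GENE_A": [100, 102, 98, 101, 99],
    "GENE_B": [5, 200, 3, 180, 4],
}
flagged = flag_variable_genes(counts, a0=0.05, a1=1.0)
print(flagged)
```

In this toy example only the second gene varies far more than the assumed technical noise would explain, so it is the only one kept for downstream clustering.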
The Single-Cell Visualiser application provides interactive visualizations to explore these clusters. The app uses both industry-standard methods, such as PCA, and newer tools, such as the t-SNE algorithm, which is better suited to segregating clusters of samples with similar gene expression patterns in the presence of technical noise.
Genestack's Single-Cell Analyser and Visualiser have one more important feature: an improved way of automating cluster identification. Previously, automatic cluster identification was done in the Visualiser by cutting the heatmap dendrogram at a fixed number of nodes. With the new method, cells are divided into clusters using the well-known k-means algorithm, so that each cell is assigned to a cluster based on its gene expression profile. Moreover, the optimal number of clusters can be determined using the "elbow method".
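As an illustration of k-means clustering combined with the elbow method, here is a minimal, dependency-free Python sketch. It is not the code used in the Genestack apps. The two-dimensional toy points stand in for per-cell expression profiles, the farthest-point initialisation is chosen only to keep the example deterministic, and the "largest second difference" rule in `elbow_k` is one assumed way of automating what is usually read off a plot by eye. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: take the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. Returns (labels, inertia), where inertia is the
    within-cluster sum of squared distances."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def elbow_k(inertias):
    """Pick the k with the sharpest bend in the inertia curve.
    inertias[i] is the inertia for k = i + 1."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Toy data (hypothetical): three tight groups of four "cells" each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
centres = [(0.0, 0.0), (10.0, 2.0), (4.0, 9.0)]
cells = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

inertias = [kmeans(cells, k)[1] for k in range(1, 7)]
best_k = elbow_k(inertias)
labels, _ = kmeans(cells, best_k)
print(best_k, labels)
```

The inertia drops sharply until each group has its own centroid and then flattens, which is the "elbow" the method looks for. On real expression data the bend is rarely this clean, so the suggested cluster number is best treated as a starting point.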
By combining the t-SNE algorithm with k-means clustering, the user can easily perform both sample visualisation and automated classification of cells into subpopulations:
The new approach results in a better separation of known cell types and is able to reveal cell subpopulations that could not be identified using standard PCA. Here is what it would look like if the visualisation were performed using PCA:
And here is what it would look like if we used dendrogram cutting instead of the k-means algorithm:
Both applications are fully integrated into Genestack, which ensures data provenance and in-depth metadata management. This also means they can be used seamlessly with results generated by all the other apps available on our platform.