Welcome to Granatum! This is a graphical single-cell RNA-seq (scRNA-seq) analysis pipeline for genomics scientists. The pipeline guides you step by step through the analysis of scRNA-seq data, starting from expression and metadata tables. It provides modules for quality control and normalization, clustering, differential gene expression and enrichment analysis, protein interaction network visualization, and cell pseudo-time pathway construction.
Note 1: please do not click your browser's "Back" button. To restart the pipeline, click your browser's "Refresh" button.
Note 2: depending on dataset size, some steps may take time. Please allow computations to complete even if your browser appears to hang.
Video tutorial: link to the video
Survey (suggestions are welcome!): link to the survey
Manuscript: link to the manuscript
Manual: download PDF
License: download text
Example human data (Kim, et al. 2016):
Is your data Human or Mouse? Make a selection under "Species". Then provide your Expression and Metadata tables as comma-separated value (CSV) files.
If you would like to add more datasets, click "Add another dataset" on the next page.
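Before uploading, it can help to check that every cell in the Expression table also appears in the Metadata table. The sketch below is a minimal illustration only: it assumes genes in rows and cells in columns for the Expression table, and cell IDs in the first column of the Metadata table. The function names and this layout are assumptions made for the example, not a description of the pipeline's actual parser; consult the manual for the required file layout.

```python
import csv
import io


def read_expression_table(text):
    """Parse a genes-by-cells CSV (assumed layout): header row holds cell IDs,
    first column holds gene IDs."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    cell_ids = rows[0][1:]
    expression = {row[0]: [float(v) for v in row[1:]] for row in rows[1:]}
    return cell_ids, expression


def read_metadata_table(text):
    """Parse a metadata CSV (assumed layout): first column is the cell ID,
    remaining columns are factors such as batch."""
    reader = csv.DictReader(io.StringIO(text))
    id_column = reader.fieldnames[0]
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


def missing_cells(cell_ids, metadata):
    """Return cell IDs present in the expression table but absent from the metadata."""
    return [cell for cell in cell_ids if cell not in metadata]


# Tiny made-up tables, for illustration only.
expression_csv = "gene,cell1,cell2,cell3\nGAPDH,10,12,0\nACTB,5,0,7\n"
metadata_csv = "cell,batch\ncell1,A\ncell2,B\n"

cell_ids, expression = read_expression_table(expression_csv)
metadata = read_metadata_table(metadata_csv)
unmatched = missing_cells(cell_ids, metadata)
print("Cells without metadata:", unmatched)
```

With real files, the same functions can be applied to the file contents read with `open(path).read()`.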
Remove confounding effects from data generated in batches. Box plots give expression statistics for a random sampling of up to 96 cells. Select a batch grouping label (factor) then click "Remove batch effect". If multiple datasets were separately uploaded, the "dataset" factor can be used.
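To give an intuition for what removing a batch effect means, here is a minimal sketch of one simple approach, median alignment, applied to a single gene. It is an illustrative assumption and not necessarily the method applied when you click "Remove batch effect"; the function name and the example values are made up.

```python
from statistics import median


def align_batch_medians(values, batches):
    """Shift each batch so that its median matches the median over all cells.

    values:  expression values (e.g. log-transformed) of one gene, one per cell
    batches: batch label of each cell, in the same order as values
    """
    overall = median(values)
    batch_medians = {
        b: median(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_medians[b] + overall for v, b in zip(values, batches)]


# Made-up values: batch B is systematically higher than batch A.
log_expr = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
batches = ["A", "A", "A", "B", "B", "B"]

corrected = align_batch_medians(log_expr, batches)
print(corrected)
```

After alignment, both batches share the same median, so the box plots of cells from different batches line up more closely.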
Remove unusual cells, e.g., those damaged by capture. Select cells by clicking points in the plot and/or using "Auto-identify", then click "Remove selected".
Adjust expression levels to correct for artificial differences between cells, e.g., differences in sequencing depth. When a rescaling/normalization button is clicked, the box plot (showing expression statistics for up to 96 randomly selected cells) will reflect the changes. For example, clicking "Rescaling to geometric mean" will cause the red dots (geometric means) to align. Note that clicking more than one rescaling/normalization button applies each adjustment to the already adjusted values (use "Reset" to go back to the unadjusted data).
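The idea behind rescaling to the geometric mean can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the geometric mean is taken over positive values only (zeros are ignored), each cell is assumed to have at least one positive value, and the function names are hypothetical.

```python
import math


def geometric_mean_of_positive(values):
    """Geometric mean over the positive entries only (assumption: zeros are ignored)."""
    positive = [v for v in values if v > 0]
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


def rescale_to_geometric_mean(cells):
    """Scale every cell so that all cells share the same geometric mean.

    cells: dict mapping cell ID -> list of expression values
    """
    cell_gms = {cell: geometric_mean_of_positive(v) for cell, v in cells.items()}
    # Common target: the geometric mean of the per-cell geometric means.
    target = geometric_mean_of_positive(list(cell_gms.values()))
    return {
        cell: [x * target / cell_gms[cell] for x in values]
        for cell, values in cells.items()
    }


# Made-up example: cell2 was sequenced four times deeper than cell1.
cells = {"cell1": [1.0, 4.0, 0.0], "cell2": [4.0, 16.0, 0.0]}
rescaled = rescale_to_geometric_mean(cells)
for cell, values in rescaled.items():
    print(cell, values, geometric_mean_of_positive(values))
```

Because each cell is multiplied by a single factor, relative expression within a cell is preserved while the per-cell geometric means become equal.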
Remove genes having very low expression and/or those with little variation (dispersion) by moving the sliders. It is recommended to keep at least 2,000 genes.
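The effect of the two sliders can be pictured with a small filter like the one below. It is a sketch under assumptions: dispersion is taken here as variance divided by mean, and the thresholds and gene names are made up; the pipeline may define dispersion and its cutoffs differently.

```python
from statistics import mean, pvariance


def filter_genes(expression, min_mean, min_dispersion):
    """Keep genes whose mean expression and dispersion both reach the thresholds.

    expression: dict mapping gene ID -> list of expression values across cells
    Dispersion is computed as variance / mean (an assumption for this sketch).
    """
    kept = {}
    for gene, values in expression.items():
        m = mean(values)
        if m <= 0:
            continue  # gene not expressed in any cell
        dispersion = pvariance(values) / m
        if m >= min_mean and dispersion >= min_dispersion:
            kept[gene] = values
    return kept


# Made-up example values.
expression = {
    "GENE_A": [10.0, 0.0, 20.0, 2.0],  # well expressed and variable
    "GENE_B": [5.0, 5.0, 5.0, 5.0],    # well expressed but constant
    "GENE_C": [0.0, 0.0, 0.1, 0.0],    # barely expressed
}

kept = filter_genes(expression, min_mean=1.0, min_dispersion=0.5)
print(sorted(kept))
```

Raising either threshold removes more genes, which is why it is worth watching the number of remaining genes while moving the sliders.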
Select a clustering method and enter a number of clusters (or check the box for auto selection), then click "Run clustering".
Identify differentially expressed genes between clusters. With the number of cores set to 2, this step runs for approximately 30 minutes on the Kim, et al. 2016 dataset (116 cells, 3,788 genes, 3 clusters) when using a VirtualBox appliance with 8 GB RAM and an Intel i7 processor. Note: the progress bar will not accurately reflect progress; please give the calculations time to complete.
Once complete, the enrichment of differentially expressed genes in KEGG pathways and GO terms can be calculated.
Tabs indicate cluster numbers. Genes are sorted by absolute Z-score.
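Sorting by absolute Z-score places the most strongly changed genes first, regardless of whether they are up- or down-regulated. A minimal sketch with made-up gene names and scores:

```python
def sort_by_absolute_z(z_scores):
    """Return (gene, z) pairs ordered from largest to smallest |z|."""
    return sorted(z_scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up Z-scores for one cluster.
z_scores = {"GENE_A": -4.2, "GENE_B": 1.3, "GENE_C": 3.1}

ranked = sort_by_absolute_z(z_scores)
for gene, z in ranked:
    print(gene, z)
```

Here the strongly down-regulated GENE_A ranks above the up-regulated GENE_C because its Z-score is larger in magnitude.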
Proteins from top differentially expressed genes are visualized with connecting lines indicating documented biochemical interactions. Go to the next step by clicking "Proceed" (bottom right of page).