Step-by-step guide of GREIN


GREIN : GEO RNA-seq Experiments Interactive Navigator

GREIN is an interactive web platform that provides user-friendly options to explore and analyze GEO RNA-seq data. The Gene Expression Omnibus (GEO) is a public repository of gene expression data that hosts more than 6,000 RNA-seq datasets, and this number is growing steadily. Most samples are deposited as raw sequencing data, which must be downloaded and processed before analysis. To convert these datasets into an analysis-ready format, we continuously process all available human, mouse, and rat RNA-seq samples from GEO using an R-based pipeline, GREP2 (GEO RNA-seq Experiments Processing Pipeline). This pipeline simultaneously downloads and processes the raw RNA-seq sequencing files available in GEO. A brief outline of the pipeline workflow is given below:

GREP2 : GEO RNA-seq Experiments Processing Pipeline

The whole pipeline consists of two sub-pipelines: downloading and processing. Both sub-pipelines run simultaneously in a Docker container. The pipeline diagram is shown below:

GEO RNA-seq pipeline workflow


Processing pipeline

  • Retrieve metadata for a given GEO series accession using the Bioconductor package GEOquery. We also download the metadata file for the data set from SRA to get the corresponding run information, and merge the two metadata files by sample name. We keep only samples whose library strategy is RNA-seq.

  • For each sample in a data set, download the associated run files from the SRA database using the ascp utility of Aspera Connect.

  • For each sample in a data set, generate FastQ files from each SRA file using the SRA Toolkit.

  • Remove adapter sequences, if necessary, using Trimmomatic.

  • Generate quality control (QC) reports for each FastQ file using FastQC.

  • Run Salmon to quantify transcript abundances for each sample. These transcript-level estimates are then summarized to the gene level using tximport. We use the lengthScaledTPM option in the summarization step, which gives estimated counts scaled up to library size while taking transcript length into account. We obtained gene annotation for Homo sapiens (GRCh38), Mus musculus (GRCm38), and Rattus norvegicus (Rnor_6.0) from Ensembl (release-91).

  • Combine all the information from the FastQC and Salmon quantification folders into a single QC report using MultiQC.
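
To illustrate what the lengthScaledTPM summarization in the Salmon/tximport step does, the following is a minimal Python sketch of the rescaling idea. It is not the GREP2 or tximport code itself (both are written in R); the function name `length_scaled_tpm` and the nested-list input layout (genes by samples) are assumptions made for illustration. The sketch multiplies each gene's TPM by that gene's average length across samples, then rescales every sample so that its column sum matches the original library size.

```python
def length_scaled_tpm(counts, abundance, lengths):
    """Illustrative sketch of lengthScaledTPM-style counts.

    All arguments are nested lists indexed as [gene][sample]:
      counts    -- estimated read counts
      abundance -- TPM values
      lengths   -- (effective) gene lengths
    Returns a [gene][sample] nested list of rescaled counts.
    """
    n_genes = len(counts)
    n_samples = len(counts[0])

    # Average length of each gene across all samples.
    mean_len = [sum(row) / n_samples for row in lengths]

    # TPM weighted by the gene's average length.
    scaled = [
        [abundance[g][s] * mean_len[g] for s in range(n_samples)]
        for g in range(n_genes)
    ]

    result = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        # Original library size of this sample.
        lib_size = sum(counts[g][s] for g in range(n_genes))
        scaled_sum = sum(scaled[g][s] for g in range(n_genes))
        # Rescale so the column sums to the original library size.
        factor = lib_size / scaled_sum if scaled_sum > 0 else 0.0
        for g in range(n_genes):
            result[g][s] = scaled[g][s] * factor
    return result


# Toy input: two genes, two samples (values are made up).
counts = [[100, 200], [300, 100]]
abundance = [[4e5, 8e5], [6e5, 2e5]]
lengths = [[1000, 1000], [2000, 2000]]
scaled_counts = length_scaled_tpm(counts, abundance, lengths)
```

Because each sample is rescaled to its own library size, the column sums of `scaled_counts` match those of `counts`, while the relative values within a sample follow the length-weighted TPMs rather than the raw counts.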


Citation

Please cite GREIN by:


Code

The source code of GREIN is available on GitHub. You can post any comments, suggestions, or bug reports here.
