GREIN is an interactive web platform that provides user-friendly options to explore and analyze GEO RNA-seq data. The Gene Expression Omnibus (GEO) is a public repository of gene expression data that hosts more than 6,000 RNA-seq datasets, and this number continues to grow. Most of the samples are deposited as raw sequencing files, which need to be downloaded and processed before they can be analyzed. To transform all these datasets into an analysis-ready format, we continuously process all available human, mouse, and rat RNA-seq samples from GEO using an R-based pipeline, GREP2 (GEO RNA-seq Experiments Processing Pipeline). This pipeline simultaneously downloads and processes the raw RNA-seq sequencing files available in GEO. A brief outline of the pipeline workflow is given below:
The pipeline consists of two sub-pipelines, one for downloading and one for processing, which run simultaneously in a Docker container. The pipeline diagram is shown below:
Retrieve metadata for a given GEO series accession using the Bioconductor package GEOquery. We also download the metadata file for the dataset from SRA to obtain the corresponding run information, and merge the two metadata files by sample name. We keep samples with library strategy
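The merge-and-filter step can be sketched in Python as follows. This is a minimal illustration, not GREP2's R code. The records are assumed to be plain dicts. The field names (`sample_name`, `run`, `library_strategy`) and the function `merge_metadata` are hypothetical. The library strategy to keep is passed in as a parameter rather than hard-coded.

```python
def merge_metadata(geo_samples, sra_runs, keep_strategy):
    """Join GEO sample metadata with SRA run records on sample name.

    geo_samples:   list of dicts with a hypothetical "sample_name" key
    sra_runs:      list of dicts with hypothetical "sample_name", "run",
                   and "library_strategy" keys (one dict per SRA run)
    keep_strategy: library strategy value to retain
    Returns one merged dict per retained run.
    """
    # Index GEO samples by name so each run can be matched in O(1).
    by_name = {s["sample_name"]: s for s in geo_samples}
    merged = []
    for run in sra_runs:
        geo = by_name.get(run["sample_name"])
        # Drop runs with no matching GEO sample or the wrong library strategy.
        if geo is None or run["library_strategy"] != keep_strategy:
            continue
        merged.append({**geo, **run})
    return merged
```

A sample with several runs yields several merged records, one per run. This matches the next step, which downloads every run associated with a sample.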
For each sample in a dataset, download the associated run files from the SRA database using the ascp utility of Aspera Connect.
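The sketch below shows how per-sample ascp invocations could be assembled, assuming the runs have already been grouped by sample. The `-QT`, `-l`, `-P` and `-i` options are standard ascp options:

- `-QT` sets the fair transfer policy and disables encryption.
- `-l` sets the maximum transfer rate.
- `-P` sets the TCP port.
- `-i` gives the private key file.

The remote-path function, the key location, the rate, and the name `ascp_commands` are hypothetical placeholders. They depend on the local Aspera installation and the SRA download layout.

```python
def ascp_commands(runs_by_sample, remote_path_for, key_file, dest_dir, rate="300m"):
    """Build one ascp argument list per SRA run, grouped by sample.

    runs_by_sample:  dict mapping sample name -> list of run accessions
    remote_path_for: hypothetical callable returning "user@host:/path" for a run
    key_file:        path to the Aspera private key (installation-specific)
    dest_dir:        local directory to download into
    """
    commands = {}
    for sample, runs in runs_by_sample.items():
        commands[sample] = [
            ["ascp", "-QT", "-l", rate, "-P33001", "-i", key_file,
             remote_path_for(run), dest_dir]
            for run in runs
        ]
    return commands
```

The sketch only builds the commands, so it can be tested without Aspera installed. Each argument list could then be executed with `subprocess.run(cmd, check=True)`.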
For each sample in a dataset, we generate FastQ files from each SRA file using the SRA Toolkit.
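A command along the following lines could be built for each run to produce the FastQ files. `fastq-dump` from the SRA Toolkit accepts three relevant options:

- `--outdir` sets the output directory.
- `--gzip` compresses the output.
- `--split-files` writes the two mates of a paired-end run to separate files.

The helper name `fastq_dump_command` and the choice to compress the output are assumptions made for illustration.

```python
def fastq_dump_command(sra_file, out_dir, paired):
    """Build a fastq-dump argument list for one SRA run file (sketch)."""
    cmd = ["fastq-dump", "--gzip", "--outdir", out_dir]
    if paired:
        # Paired-end runs: write mate 1 and mate 2 to separate FastQ files.
        cmd.append("--split-files")
    cmd.append(sra_file)
    return cmd
```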
If necessary, remove adapter sequences using Trimmomatic.
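Conceptually, adapter trimming removes adapter sequence from the 3' end of a read. The sketch below handles exact matches only, including a partial adapter that runs off the end of the read. It is a simplified stand-in for Trimmomatic's ILLUMINACLIP step, which also tolerates mismatches and has a palindrome mode for paired-end reads. The adapter in the usage example, `AGATCGGAAGAGC`, is the common Illumina adapter prefix and is chosen only for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter, or a prefix of it, from a read (exact matches only).

    min_overlap avoids trimming on very short, likely random overlaps.
    """
    # Case 1: the full adapter occurs in the read, so cut at its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the longest adapter prefix that the read ends with.
    for k in range(min(len(adapter), len(read)), max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

For example, `trim_3prime_adapter("ACGTACGTAGATCGGAAG", "AGATCGGAAGAGC")` returns `"ACGTACGT"`, because the read ends with the first 10 bases of the adapter.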
Quality control (QC) reports are generated for each of the FastQ files using FastQC.
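As an example of what a QC report summarizes, FastQC's per-base sequence quality module reports the distribution of Phred scores at each read position. The sketch below computes only the mean score at each position and assumes Phred+33 quality encoding. It is not FastQC's implementation.

```python
def per_base_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position across a set of reads.

    Assumes Phred+33 encoding (ASCII code minus 33). Reads may differ in
    length; each position is averaged over the reads that reach it.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

For example, `per_base_mean_quality(["II#", "I5"])` returns `[40.0, 30.0, 2.0]`: `I` encodes 40, `5` encodes 20, and `#` encodes 2.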
We then run Salmon to quantify transcript abundances for each sample. These transcript-level estimates are then summarized to the gene level using tximport. We use the lengthScaledTPM option in the summarization step, which gives estimated counts scaled up to library size while accounting for transcript length. Gene annotations for Homo sapiens (GRCh38), Mus musculus (GRCm38), and Rattus norvegicus (Rnor_6.0) were obtained from Ensembl (release 91).
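In R, this step corresponds to calling `tximport(files, type = "salmon", tx2gene = tx2gene, countsFromAbundance = "lengthScaledTPM")`. The Python sketch below makes the arithmetic explicit, following tximport's documented approach:

1. Transcript TPMs are summed per gene.
2. In each sample, a gene's length is the TPM-weighted mean of its transcripts' effective lengths.
3. Gene TPMs are multiplied by that length averaged over samples.
4. Each sample is rescaled so its counts add up to its library size, taken here as the sum of Salmon's estimated counts.

The sketch is an approximation for illustration. The dict-based inputs and the function name are assumptions. tximport handles edge cases, such as genes with zero abundance, in its own way.

```python
def length_scaled_tpm(tpm, counts, lengths, tx2gene):
    """Gene-level lengthScaledTPM counts from transcript-level estimates (sketch).

    tpm, counts, lengths: dict transcript -> list of per-sample values
    tx2gene:              dict transcript -> gene
    """
    n = len(next(iter(tpm.values())))      # number of samples
    genes = sorted(set(tx2gene.values()))
    g_tpm = {g: [0.0] * n for g in genes}  # summed TPM per gene and sample
    g_wlen = {g: [0.0] * n for g in genes} # summed TPM * length
    g_len = {g: [0.0] * n for g in genes}  # summed length (fallback)
    g_ntx = {g: 0 for g in genes}          # transcripts per gene
    lib = [0.0] * n                        # per-sample library size
    for tx, g in tx2gene.items():
        g_ntx[g] += 1
        for j in range(n):
            g_tpm[g][j] += tpm[tx][j]
            g_wlen[g][j] += tpm[tx][j] * lengths[tx][j]
            g_len[g][j] += lengths[tx][j]
            lib[j] += counts[tx][j]
    raw = {}
    for g in genes:
        # TPM-weighted mean length per sample; plain mean when the gene's TPM is 0.
        per_sample = [
            g_wlen[g][j] / g_tpm[g][j] if g_tpm[g][j] > 0 else g_len[g][j] / g_ntx[g]
            for j in range(n)
        ]
        avg_len = sum(per_sample) / n      # average length over samples
        raw[g] = [g_tpm[g][j] * avg_len for j in range(n)]
    totals = [sum(raw[g][j] for g in genes) for j in range(n)]
    # Rescale each sample so its counts sum to its library size.
    return {
        g: [raw[g][j] * lib[j] / totals[j] if totals[j] > 0 else 0.0 for j in range(n)]
        for g in genes
    }
```

Multiplying TPM by a length that is constant across samples keeps between-sample length differences out of the counts, while the final rescaling restores count-scale values.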
Please cite GREIN as: