RNA-Seq Analysis Pipeline
A complete Snakemake pipeline for RNA-Seq data analysis from raw FASTQ files to differential expression results with MultiQC reporting.
A production-ready Snakemake pipeline for comprehensive RNA-Seq data analysis, designed for reproducibility and scalability.
Overview
This pipeline automates the entire RNA-Seq analysis workflow from raw sequencing data to publication-ready results. It handles quality control, alignment, quantification, and differential expression analysis with minimal manual intervention.
Key Features
- End-to-End Automation: From FASTQ to differential expression results
- Quality Control: Comprehensive QC with FastQC and MultiQC
- Flexible Alignment: Support for multiple aligners (STAR, HISAT2)
- Quantification: Feature counting with featureCounts and HTSeq
- Differential Expression: DESeq2 and edgeR integration
- Reproducible Environments: Conda and Singularity support
- Scalable Execution: Local, cluster, and cloud deployment options
Pipeline Workflow
- Quality Control
  - FastQC for per-base quality assessment
  - MultiQC for consolidated reporting
- Read Preprocessing
  - Adapter trimming with Trim Galore!
  - Quality filtering
- Alignment
  - STAR alignment to reference genome
  - BAM file sorting and indexing
- Quantification
  - Feature counting with featureCounts
  - TPM and FPKM calculation
- Differential Expression
  - DESeq2 for statistical analysis
  - MA plots and volcano plots
  - Gene set enrichment analysis
- Reporting
  - Automated MultiQC reports
  - Differential expression summaries
  - Interactive visualizations
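The TPM and FPKM calculation in the quantification step can be sketched as follows. This is a minimal illustration and not the pipeline's own code: the function names, the example gene names, and the use of raw gene length in base pairs (rather than an effective length) are all assumptions made for the example.

```python
def fpkm(counts, lengths_bp):
    """FPKM: count * 1e9 / (gene length in bp * total counted fragments)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no counted fragments in this sample")
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """TPM: length-normalised rates rescaled so they sum to one million."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("no counted fragments in this sample")
    return {g: rate * 1e6 / scale for g, rate in rates.items()}


# Hypothetical counts and gene lengths for a single sample.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

for gene in counts:
    print(f"{gene}\tFPKM={fpkm_values[gene]:.1f}\tTPM={tpm_values[gene]:.1f}")
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM. Neither measure replaces the raw counts that DESeq2 expects as input.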
Technologies Used
- Workflow Management: Snakemake
- Programming: Python, R
- Environment Management: Conda, Singularity
- Alignment: STAR
- Quantification: featureCounts
- Statistics: DESeq2, edgeR
- Visualization: ggplot2, plotly
Getting Started
# Clone the repository
git clone https://github.com/tamoghnadas12/rnaseq-snakemake-pipeline
cd rnaseq-snakemake-pipeline
# Install dependencies
conda env create -f environment.yml
conda activate rnaseq-pipeline
# Configure samples
cp config/samples.example.tsv config/samples.tsv
# Edit samples.tsv with your sample information
# Run the pipeline
snakemake --cores 8 --use-conda
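Before a full run, a dry run can be useful for checking which jobs Snakemake would schedule. The sketch below assembles the command from the step above and optionally appends Snakemake's `--dry-run` flag. The helper name `snakemake_command` is hypothetical and not part of the repository.

```python
import shlex


def snakemake_command(cores=8, use_conda=True, dry_run=False):
    """Build the snakemake invocation as an argument list."""
    cmd = ["snakemake", "--cores", str(cores)]
    if use_conda:
        cmd.append("--use-conda")
    if dry_run:
        # --dry-run lists the planned jobs without executing them.
        cmd.append("--dry-run")
    return cmd


dry_run_cmd = snakemake_command(dry_run=True)
print(shlex.join(dry_run_cmd))

# To execute it from Python instead of the shell (requires snakemake on PATH):
# import subprocess
# subprocess.run(dry_run_cmd, check=True)
```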
Configuration
The pipeline is configured through one YAML file and two tab-separated sample sheets:
- config/config.yaml: Main configuration
- config/samples.tsv: Sample metadata
- config/units.tsv: Sequencing unit information
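A quick check of the sample sheet before launching the pipeline can catch missing columns or duplicate sample names early. The sketch below is only an illustration: the column names `sample` and `condition` are assumptions, so check config/samples.example.tsv for the columns the pipeline actually expects.

```python
import csv
import io

# Hypothetical required columns; the real sheet may use different names.
REQUIRED_COLUMNS = ("sample", "condition")


def load_samples(handle):
    """Read a tab-separated sample sheet into a {sample: condition} dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    samples = {}
    for row in reader:
        name = row["sample"]
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = row["condition"]
    return samples


# In-memory stand-in for config/samples.tsv with made-up sample names.
example_sheet = "sample\tcondition\nctrl_1\tcontrol\ntreat_1\ttreated\n"
samples = load_samples(io.StringIO(example_sheet))
print(samples)
```

For a real file, the same function can be called as `load_samples(open("config/samples.tsv", newline=""))`.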
Results
The pipeline generates a comprehensive output directory with:
- Quality control reports
- Alignment statistics
- Gene expression matrices
- Differential expression results
- Publication-ready figures
Documentation
Contributing
Contributions are welcome! Please see our contributing guidelines for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.