bench2bash

RNA-Seq Analysis Pipeline

A complete Snakemake pipeline that takes RNA-Seq data from raw FASTQ files to differential expression results, with MultiQC reporting.

#snakemake #rna-seq #bioinformatics #pipeline #differential-expression #quality-control

A production-ready Snakemake pipeline for comprehensive RNA-Seq data analysis, designed for reproducibility and scalability.

Overview

This pipeline automates the entire RNA-Seq analysis workflow from raw sequencing data to publication-ready results. It handles quality control, alignment, quantification, and differential expression analysis with minimal manual intervention.

Key Features

  • End-to-End Automation: From FASTQ to differential expression results
  • Quality Control: Comprehensive QC with FastQC and MultiQC
  • Flexible Alignment: Support for multiple aligners (STAR, HISAT2)
  • Quantification: Feature counting with featureCounts and HTSeq
  • Differential Expression: DESeq2 and edgeR integration
  • Reproducible Environments: Conda and Singularity support
  • Scalable Execution: Local, cluster, and cloud deployment options

Pipeline Workflow

  1. Quality Control

    • FastQC for per-base quality assessment
    • MultiQC for consolidated reporting
  2. Read Preprocessing

    • Adapter trimming with Trim Galore!
    • Quality filtering
  3. Alignment

    • STAR alignment to the reference genome
    • BAM file sorting and indexing
  4. Quantification

    • Feature counting with featureCounts
    • TPM and FPKM calculation
  5. Differential Expression

    • DESeq2 for statistical analysis
    • MA plots and volcano plots
    • Gene set enrichment analysis
  6. Reporting

    • Automated MultiQC reports
    • Differential expression summaries
    • Interactive visualizations
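For orientation, steps 1 to 4 roughly correspond to the commands below for a single paired-end sample, followed by one MultiQC run. This is a hand-written sketch, not the pipeline's Snakefile: the `raw/`, `trimmed/`, `aligned/`, `counts/` and `qc/` layout, the sample name, the thread counts and the `run` and `process_sample` helpers are all hypothetical. It assumes a STAR index already built with `STAR --runMode genomeGenerate`, and it only prints the commands unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Hand-written sketch of workflow steps 1-4 for ONE paired-end sample.
# Directory layout, file names and thread counts are hypothetical.
# DRY_RUN=1 (the default) prints each command instead of running it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
THREADS="${THREADS:-8}"
STAR_INDEX="${STAR_INDEX:-ref/star_index}"   # prebuilt STAR genome index (assumed)
GTF="${GTF:-ref/annotation.gtf}"             # gene annotation matching the index (assumed)

run() {
  # Print the command in dry-run mode; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

process_sample() {
  local sample="$1"
  local r1="raw/${sample}_R1.fastq.gz" r2="raw/${sample}_R2.fastq.gz"
  # Trim Galore names paired outputs <input name>_val_1.fq.gz and _val_2.fq.gz
  local t1="trimmed/${sample}_R1_val_1.fq.gz" t2="trimmed/${sample}_R2_val_2.fq.gz"
  local bam="aligned/${sample}.Aligned.sortedByCoord.out.bam"

  run mkdir -p qc/fastqc trimmed aligned counts
  # 1. Quality control of the raw reads
  run fastqc -t "$THREADS" -o qc/fastqc "$r1" "$r2"
  # 2. Adapter and quality trimming
  run trim_galore --paired --cores 4 -o trimmed "$r1" "$r2"
  # 3. Alignment straight to a coordinate-sorted BAM (zcat reads the .gz input), then indexing
  run STAR --runThreadN "$THREADS" --genomeDir "$STAR_INDEX" \
    --readFilesIn "$t1" "$t2" --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix "aligned/${sample}."
  run samtools index "$bam"
  # 4. Gene-level fragment counts; --countReadPairs needs Subread 2.0.2 or later
  run featureCounts -T "$THREADS" -p --countReadPairs -a "$GTF" \
    -o "counts/${sample}.featureCounts.txt" "$bam"
}

process_sample "${1:-ctrl_1}"
# Reporting: one MultiQC run over all tool outputs, after every sample is done
run multiqc qc trimmed aligned counts -o qc/multiqc
```

In the pipeline itself, Snakemake expresses each step as a rule with declared input and output files, which is what lets it rerun only out-of-date steps and process samples in parallel.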

Technologies Used

  • Workflow Management: Snakemake
  • Programming: Python, R
  • Environment Management: Conda, Singularity
  • Alignment: STAR
  • Quantification: featureCounts
  • Statistics: DESeq2, edgeR
  • Visualization: ggplot2, plotly
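featureCounts, the quantification tool listed above, reports raw read (or fragment) counts per gene. The workflow's TPM step rescales these by gene length and sequencing depth: for gene i in one sample, TPM_i = (c_i / L_i) / Σ_j (c_j / L_j) × 10^6, where c is the count and L the gene length in bases. As a minimal sketch, not the pipeline's own code, the awk function below (`counts_to_tpm` is a hypothetical name) applies that formula to a featureCounts table, whose standard layout is a comment line followed by a header of Geneid, Chr, Start, End, Strand, Length and one count column per BAM file.

```bash
#!/usr/bin/env bash
# Sketch: convert a featureCounts count table into TPM values.
# Assumes the standard featureCounts layout (tab-separated): "#" comment line(s),
# then a header Geneid Chr Start End Strand Length <one count column per BAM>.

counts_to_tpm() {
  # Reads the file twice: pass 1 sums count/length per sample, pass 2 prints TPM.
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                        # skip the featureCounts comment line
    FNR == NR {                          # pass 1
      if (++hdr1 == 1) next              # skip the header
      for (s = 7; s <= NF; s++) total[s] += $s / $6
      next
    }
    ++hdr2 == 1 {                        # pass 2, header: keep Geneid and sample names
      line = $1
      for (s = 7; s <= NF; s++) line = line OFS $s
      print line
      next
    }
    {                                    # pass 2, genes: TPM = (count/length) / total * 1e6
      line = $1
      for (s = 7; s <= NF; s++) {
        tpm = (total[s] > 0) ? ($s / $6) / total[s] * 1e6 : 0
        line = line OFS sprintf("%.2f", tpm)
      }
      print line
    }
  ' "$1" "$1"
}

# Usage (hypothetical paths): counts_to_tpm counts/gene_counts.txt > counts/gene_tpm.tsv
```

FPKM uses a different normalizer, FPKM_i = c_i × 10^9 / (L_i × N) with N the total number of mapped reads (or fragments), so FPKM columns need not share a common total across samples, whereas every TPM column sums to 10^6.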

Getting Started

# Clone the repository
git clone https://github.com/tamoghnadas12/rnaseq-snakemake-pipeline
cd rnaseq-snakemake-pipeline

# Install dependencies
conda env create -f environment.yml
conda activate rnaseq-pipeline

# Configure samples
cp config/samples.example.tsv config/samples.tsv
# Edit samples.tsv with your sample information

# Run the pipeline
snakemake --cores 8 --use-conda
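Before a full run, a quick check that the launcher tools are installed, followed by a Snakemake dry run (`-n`), catches configuration mistakes without starting any jobs; the dry run only lists the jobs Snakemake would execute. The sketch below is not part of the repository, and the `missing_tools` and `preflight` function names are hypothetical. With `--use-conda`, per-rule tools are installed into their own environments, so only `snakemake` and `conda` need to be on `PATH`; some Snakemake versions default to `mamba` as the conda frontend, in which case it belongs on the list too.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; run from the repository root before launching.

# Print each command name given as an argument that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

preflight() {
  local missing
  missing=$(missing_tools snakemake conda)   # add mamba if your setup needs it
  if [ -n "$missing" ]; then
    printf 'missing required tools:\n%s\n' "$missing" >&2
    return 1
  fi
  if [ ! -f config/samples.tsv ]; then
    echo "config/samples.tsv not found; copy and edit the example sheet first" >&2
    return 1
  fi
  # -n (dry run): list the jobs Snakemake would run without executing them.
  snakemake -n --cores 8 --use-conda
}

# Usage: preflight && snakemake --cores 8 --use-conda
```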

Configuration

The pipeline is configured through one YAML file and two tab-separated sample sheets:

  • config/config.yaml: Main configuration
  • config/samples.tsv: Sample metadata
  • config/units.tsv: Sequencing unit information
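The exact columns of `config/samples.tsv` and `config/units.tsv` are defined by the repository and are not listed here. As a hedged illustration, the sketch below assumes a `samples.tsv` with a header row and a unique sample ID in the first column (the `sample` and `condition` column names and the `check_sheet` function are hypothetical) and checks two mistakes that commonly break sample-sheet-driven workflows: rows with the wrong number of tab-separated fields and duplicated sample IDs. If `units.tsv` follows the common one-row-per-sequencing-unit layout, a sample ID can legitimately repeat there, so the duplicate check only fits `samples.tsv`.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a tab-separated sample sheet before running Snakemake.
# Assumes (hypothetically) a header row and a unique ID in the first column.

check_sheet() {
  awk -F'\t' '
    NF == 0 { next }                      # ignore blank lines
    NR == 1 { ncol = NF; next }           # the header defines the column count
    NF != ncol {
      printf "line %d: expected %d fields, found %d\n", NR, ncol, NF > "/dev/stderr"
      bad = 1
    }
    seen[$1]++ {
      printf "line %d: duplicate sample ID \"%s\"\n", NR, $1 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Example sheet with hypothetical columns: sample ID and experimental condition.
sheet=$(mktemp)
trap 'rm -f "$sheet"' EXIT
printf 'sample\tcondition\n' > "$sheet"
printf '%s\t%s\n' ctrl_1 control ctrl_2 control treat_1 treated treat_2 treated >> "$sheet"

check_sheet "$sheet" && echo "sample sheet OK"
```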

Results

The pipeline produces an output directory containing:

  • Quality control reports
  • Alignment statistics
  • Gene expression matrices
  • Differential expression results
  • Publication-ready figures
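Differential expression results are usually filtered before they are summarized or plotted. DESeq2's `results()` table includes `log2FoldChange` and `padj` (Benjamini-Hochberg adjusted p-value) columns, with `NA` where no adjusted p-value was computed, for example for genes with all-zero counts or genes removed by independent filtering. Assuming a hypothetical tab-separated export with a header row naming the columns, the awk sketch below (`filter_de` is a hypothetical name) keeps genes whose `padj` is below a cutoff and whose absolute log2 fold change exceeds a threshold; the 0.05 and 1 defaults are conventional choices, not values taken from this pipeline.

```bash
#!/usr/bin/env bash
# Sketch: keep significant genes from a DESeq2 results table exported as TSV.
# Assumes (hypothetically) a header row naming the columns, e.g.
# gene_id baseMean log2FoldChange lfcSE stat pvalue padj

filter_de() {
  local table="$1" alpha="${2:-0.05}" min_lfc="${3:-1}"
  awk -F'\t' -v alpha="$alpha" -v min_lfc="$min_lfc" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i      # locate columns by name
      if (!("padj" in col) || !("log2FoldChange" in col)) {
        print "header lacks padj or log2FoldChange" > "/dev/stderr"
        exit 2
      }
      print
      next
    }
    {
      p = $(col["padj"]); lfc = $(col["log2FoldChange"])
      if (p == "NA" || lfc == "NA") next           # no adjusted p-value for this gene
      abs_lfc = (lfc < 0) ? -lfc : lfc
      if (p + 0 < alpha + 0 && abs_lfc > min_lfc + 0) print
    }
  ' "$table"
}

# Usage (hypothetical file names): filter_de deseq2_results.tsv 0.05 1 > significant.tsv
```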

Documentation

Contributing

Contributions are welcome! Please see our contributing guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Start Your Own Project

Use our template to jump-start reproducible research workflows. It includes preconfigured environments, a standardized project structure, and example workflows.

Use This Template
git clone https://github.com/Tamoghna12/bench2bash-starter
cd bench2bash-starter
conda env create -f env.yml
make run