Variant Calling Pipeline

A robust Snakemake pipeline for germline and somatic variant calling from whole-genome and whole-exome sequencing data.

#variant-calling #snakemake #wgs #wes #gatk #bcftools #genomics

A production-ready Snakemake pipeline for comprehensive variant calling from whole-genome and whole-exome sequencing data, supporting both germline and somatic variant detection.

Overview

This pipeline implements GATK best-practice workflows for variant calling, with FreeBayes and bcftools available as alternative callers. It handles whole-genome and whole-exome data, and the caller set, joint genotyping, somatic mode, BQSR, and VQSR are each switched on or off in config/config.yaml.

Key Features

  • Multiple Variant Callers: GATK HaplotypeCaller, FreeBayes, bcftools
  • Germline and Somatic Calling: HaplotypeCaller for germline samples, Mutect2 for tumor-normal pairs
  • Quality Control: Alignment statistics, variant calling metrics, concordance analysis, and MultiQC reports
  • Annotation: Variant effect prediction with SnpEff
  • Filtering: Hard and soft filtering options
  • Joint Calling: Multi-sample variant calling support
  • Reproducible: Conda environments and container support

Pipeline Workflow

1. Preprocessing

  • Adapter trimming with Trim Galore!
  • Read alignment with BWA-MEM
  • BAM file sorting and indexing
  • Duplicate marking with Picard
  • Base quality score recalibration (BQSR)
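
The preprocessing steps above can be sketched as an ordered list of shell commands. This is a minimal illustration, not the pipeline's actual rules: the function name, the work/ file names, the ILLUMINA platform tag, and the thread count are all assumptions. The tool invocations themselves use standard Trim Galore, BWA, Samtools, Picard, and GATK4 options.

```python
import shlex


def preprocessing_commands(sample, r1, r2, ref, known_sites, threads=4):
    """Return shell commands for one paired-end sample, in execution order.

    Hypothetical helper: the work/ paths and step boundaries are assumptions.
    """
    q = shlex.quote
    # With --basename, Trim Galore names paired output <name>_val_1/_val_2.
    trimmed = [f"work/{sample}_val_{i}.fq.gz" for i in (1, 2)]
    sorted_bam = f"work/{sample}.sorted.bam"
    dedup_bam = f"work/{sample}.dedup.bam"
    metrics = f"work/{sample}.dup_metrics.txt"
    recal_table = f"work/{sample}.recal.table"
    final_bam = f"results/alignments/{sample}.bam"
    # bwa expands the literal "\t" sequences in the read-group string itself.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"
    known = " ".join(f"--known-sites {q(vcf)}" for vcf in known_sites)
    return [
        f"trim_galore --paired --basename {q(sample)} -o work {q(r1)} {q(r2)}",
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)}"
        f" {q(trimmed[0])} {q(trimmed[1])}"
        f" | samtools sort -@ {threads} -o {q(sorted_bam)} -",
        f"samtools index {q(sorted_bam)}",
        f"picard MarkDuplicates I={q(sorted_bam)} O={q(dedup_bam)}"
        f" M={q(metrics)} CREATE_INDEX=true",
        f"gatk BaseRecalibrator -R {q(ref)} -I {q(dedup_bam)} {known}"
        f" -O {q(recal_table)}",
        f"gatk ApplyBQSR -R {q(ref)} -I {q(dedup_bam)}"
        f" --bqsr-recal-file {q(recal_table)} -O {q(final_bam)}",
    ]
```

Printing the returned list for one sample gives a quick way to review the order of operations before wiring the same commands into Snakemake rules.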

2. Variant Calling

  • Germline Calling: GATK HaplotypeCaller per sample
  • Joint Genotyping: GenotypeGVCFs for cohort analysis
  • Somatic Calling: Mutect2 for tumor-normal pairs
  • Alternative Callers: FreeBayes, bcftools
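
As a rough sketch of the GATK calling steps listed above, the helpers below build the per-sample GVCF, joint genotyping, and tumor-normal commands. The function names and output file names are hypothetical, and combining GVCFs with CombineGVCFs (rather than GenomicsDBImport) is an assumption about how the cohort step might be done.

```python
def germline_commands(ref, bams, outdir="results/variants"):
    """Build GVCF-based germline commands.

    bams maps sample name -> analysis-ready BAM path.
    Hypothetical helper; output names are assumptions.
    """
    cmds, gvcfs = [], []
    for sample, bam in bams.items():
        gvcf = f"{outdir}/{sample}.g.vcf.gz"
        # -ERC GVCF emits reference confidence blocks for joint genotyping.
        cmds.append(f"gatk HaplotypeCaller -R {ref} -I {bam} -O {gvcf} -ERC GVCF")
        gvcfs.append(gvcf)
    combined = f"{outdir}/cohort.g.vcf.gz"
    inputs = " ".join(f"-V {gvcf}" for gvcf in gvcfs)
    cmds.append(f"gatk CombineGVCFs -R {ref} {inputs} -O {combined}")
    cmds.append(
        f"gatk GenotypeGVCFs -R {ref} -V {combined} -O {outdir}/cohort.vcf.gz"
    )
    return cmds


def somatic_commands(ref, tumor_bam, normal_bam, normal_name, out_prefix):
    """Build Mutect2 commands for one tumor-normal pair.

    normal_name must match the SM tag of the normal BAM's read group.
    """
    raw = f"{out_prefix}.unfiltered.vcf.gz"
    return [
        f"gatk Mutect2 -R {ref} -I {tumor_bam} -I {normal_bam}"
        f" -normal {normal_name} -O {raw}",
        f"gatk FilterMutectCalls -R {ref} -V {raw}"
        f" -O {out_prefix}.filtered.vcf.gz",
    ]
```

For two samples, germline_commands returns four commands: one HaplotypeCaller call per sample, then CombineGVCFs and GenotypeGVCFs for the cohort.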

3. Variant Processing

  • Variant quality score recalibration (VQSR)
  • Variant filtering and annotation
  • Variant effect prediction with SnpEff
  • Population frequency annotation
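
To make the filtering step concrete, here is a small pure-Python sketch of hard filtering for SNPs. The thresholds follow GATK's published hard-filtering recommendations for SNPs; the filter names and helper functions are hypothetical, and whether the pipeline uses these exact cutoffs is an assumption.

```python
import operator

# (filter name, INFO key, comparison that marks a failure, threshold)
SNP_HARD_FILTERS = [
    ("QD2", "QD", operator.lt, 2.0),
    ("FS60", "FS", operator.gt, 60.0),
    ("MQ40", "MQ", operator.lt, 40.0),
    ("SOR3", "SOR", operator.gt, 3.0),
    ("MQRankSum-12.5", "MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum-8", "ReadPosRankSum", operator.lt, -8.0),
]


def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def failed_filters(info):
    """Return the names of the hard filters this record fails."""
    fields = parse_info(info)
    failed = []
    for name, key, compare, threshold in SNP_HARD_FILTERS:
        raw = fields.get(key)
        if raw in (None, True, "."):
            continue  # annotation missing: the filter is not evaluated
        if compare(float(raw), threshold):
            failed.append(name)
    return failed


def filter_column(info):
    """Value for the VCF FILTER column: PASS or a ';'-joined filter list."""
    failed = failed_filters(info)
    return ";".join(failed) if failed else "PASS"
```

A record with `QD=1.5;FS=10.0;MQ=60.0` is tagged `QD2`, while one that clears every threshold is tagged `PASS`. Tagging the FILTER column keeps failing records in the file so they can be inspected later.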

4. Quality Control

  • Alignment statistics with Samtools
  • Variant calling metrics
  • Concordance analysis
  • MultiQC reports
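
One simple form of concordance analysis is to compare the sites reported by two callers. The sketch below is an assumption about what such a check could look like, not the pipeline's implementation: it keys each call on (chromosome, position, REF, ALT) and does not normalize indel representation, so differently left-aligned indels would count as discordant.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from uncompressed VCF lines."""
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # split multi-allelic records
            keys.add((chrom, int(pos), ref, alt))
    return keys


def concordance(calls_a, calls_b):
    """Summarize the overlap between two call sets."""
    a, b = variant_keys(calls_a), variant_keys(calls_b)
    shared, union = a & b, a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

Passing the lines of a GATK VCF and a FreeBayes VCF for the same sample gives the number of shared and caller-specific alleles plus a Jaccard index.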

Installation

# Clone the repository
git clone https://github.com/tamoghnadas12/variant-calling-pipeline
cd variant-calling-pipeline

# Install dependencies
conda env create -f environment.yml
conda activate variant-calling

# Configure reference data
bash scripts/download_references.sh

Configuration

The pipeline is configured through:

  • config/config.yaml: Main configuration
  • config/samples.tsv: Sample metadata
  • config/units.tsv: Sequencing unit information

Example configuration:

# config/config.yaml
ref:
  genome: 'references/hg38.fasta'
  dbsnp: 'references/dbsnp_138.vcf'
  mills: 'references/Mills_and_1000G_gold_standard.indels.vcf'

calling:
  callers: ['gatk', 'freebayes']
  joint_calling: true
  somatic: false

processing:
  bqsr: true
  vqsr: true
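
A configuration like the one above can be sanity-checked before a run. The following is a sketch only: the validation rules (for example, that joint calling needs the gatk caller because joint genotyping uses GenotypeGVCFs) are assumptions inferred from the workflow description, and validate_config is a hypothetical helper operating on the already-parsed YAML dictionary.

```python
KNOWN_CALLERS = {"gatk", "freebayes", "bcftools"}


def validate_config(config):
    """Return a list of human-readable problems; empty means the config looks sane."""
    problems = []

    ref = config.get("ref", {})
    for key in ("genome", "dbsnp", "mills"):
        if not ref.get(key):
            problems.append(f"ref.{key} is not set")

    calling = config.get("calling", {})
    callers = calling.get("callers", [])
    if not callers:
        problems.append("calling.callers is empty")
    unknown = sorted(set(callers) - KNOWN_CALLERS)
    if unknown:
        problems.append(f"unknown callers: {', '.join(unknown)}")
    # Assumption: joint genotyping is only available through GATK.
    if calling.get("joint_calling") and "gatk" not in callers:
        problems.append("calling.joint_calling requires the gatk caller")

    processing = config.get("processing", {})
    for key in ("bqsr", "vqsr"):
        if not isinstance(processing.get(key), bool):
            problems.append(f"processing.{key} must be true or false")

    return problems
```

Inside a Snakefile, the parsed `config` object could be passed to this function and the run aborted if the returned list is non-empty.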

Usage

Germline Variant Calling

# Run germline pipeline
snakemake --cores 16 --use-conda --config calling/somatic=false

# Joint genotyping for cohort
snakemake --cores 16 --use-conda --config calling/joint_calling=true

Somatic Variant Calling

# Run somatic pipeline
snakemake --cores 16 --use-conda --config calling/somatic=true

# Tumor-normal pairs are specified in config/samples.tsv
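
The layout of samples.tsv is not shown here, so the following sketch assumes a hypothetical schema with `sample`, `patient`, and `type` (tumor or normal) columns. Under that assumption, tumor-normal pairs could be derived like this:

```python
import csv
import io


def tumor_normal_pairs(tsv_text):
    """Return (tumor, normal) sample pairs grouped by patient.

    Assumes hypothetical columns: sample, patient, type.
    """
    by_patient = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        groups = by_patient.setdefault(row["patient"], {})
        groups.setdefault(row["type"], []).append(row["sample"])

    pairs = []
    for patient, groups in sorted(by_patient.items()):
        tumors = groups.get("tumor", [])
        if not tumors:
            continue  # germline-only patient: nothing to pair
        normals = groups.get("normal", [])
        if len(normals) != 1:
            raise ValueError(
                f"patient {patient}: expected exactly one normal, found {len(normals)}"
            )
        pairs.extend((tumor, normals[0]) for tumor in tumors)
    return pairs
```

Each returned pair would then drive one Mutect2 run. A patient with several tumor samples shares the same matched normal.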

Output Files

The pipeline generates organized output directories:

results/
├── alignments/           # BAM files and indices
├── variants/            # VCF files (per-sample and joint)
├── annotations/         # Annotated VCF files
├── metrics/             # QC metrics and statistics
├── reports/             # HTML and PDF reports
└── multiqc/             # MultiQC reports
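
After a run, a quick check that every directory in the tree above exists can catch a partially completed workflow. This is a minimal sketch; missing_output_dirs is a hypothetical helper, and it only checks that the directories are present, not that the files inside them are complete.

```python
from pathlib import Path

# Directory names taken from the results/ tree above.
EXPECTED_DIRS = (
    "alignments",
    "variants",
    "annotations",
    "metrics",
    "reports",
    "multiqc",
)


def missing_output_dirs(results_root="results"):
    """Return the expected output directories that do not exist yet."""
    root = Path(results_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means all six output directories were created.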

Technologies Used

  • Workflow: Snakemake
  • Alignment: BWA-MEM
  • Variant Calling: GATK, FreeBayes, bcftools
  • Processing: Picard, Samtools
  • Annotation: SnpEff, ANNOVAR
  • QC: FastQC, MultiQC
  • Environment: Conda, Singularity

Contributing

Please read our contributing guidelines for information on how to contribute to this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

