Metagenomics Analysis Toolkit
A Python-based toolkit for comprehensive metagenomics data analysis, providing streamlined workflows for taxonomic profiling, functional annotation, and microbial diversity analysis.
Overview
The Metagenomics Analysis Toolkit simplifies complex metagenomics workflows by providing a unified interface for common analysis tasks. It integrates popular tools like MetaPhlAn, HUMAnN, and QIIME2 into easy-to-use Python functions.
Key Features
- Taxonomic Profiling: Species-level identification with confidence scores
- Functional Annotation: Pathway and gene family analysis
- Diversity Analysis: Alpha and beta diversity with statistical testing
- Visualization: Publication-ready plots and interactive dashboards
- Data Management: Standardized data structures and file formats
- Reproducibility: Integrated provenance tracking
Core Modules
1. Taxonomic Profiling
from metagenomics import taxonomy

# Run MetaPhlAn analysis
taxa_results = taxonomy.run_metaphlan(
    fastq_files=['sample1_R1.fastq', 'sample1_R2.fastq'],
    output_dir='results/taxonomy'
)

# Generate Krona plot
taxonomy.create_krona_plot(taxa_results, 'krona.html')
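For readers who want to inspect a profile by hand: MetaPhlAn writes a tab-separated table in which comment lines start with `#` and each clade name is a pipe-delimited lineage ending in a rank prefix such as `s__`. The sketch below extracts species-level rows from that kind of text. The helper name `species_abundances` is hypothetical and not part of this toolkit, and the code assumes the relative abundance is in the third column, as in the default MetaPhlAn 3 and 4 output. The example lineages are abbreviated toy data.

```python
def species_abundances(profile_text):
    """Return {species_name: relative_abundance} from MetaPhlAn-style text.

    Hypothetical helper for illustration; assumes the column order
    clade_name, NCBI_tax_id, relative_abundance.
    """
    result = {}
    for line in profile_text.splitlines():
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        lineage = fields[0].split("|")
        # Keep rows whose deepest rank is species (s__); this skips
        # higher ranks and strain/SGB-level (t__) rows.
        if lineage[-1].startswith("s__"):
            result[lineage[-1][3:]] = float(fields[2])
    return result


# Toy profile with abbreviated lineages and made-up abundances.
example = (
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species\n"
    "k__Bacteria\t2\t100.0\t\n"
    "k__Bacteria|g__Escherichia|s__Escherichia_coli\t2|561|562\t60.0\t\n"
    "k__Bacteria|g__Escherichia|s__Escherichia_coli|t__SGB_toy\t2|561|562|\t60.0\t\n"
    "k__Bacteria|g__Bacteroides|s__Bacteroides_fragilis\t2|816|817\t40.0\t\n"
)
species = species_abundances(example)
```

Filtering on the last lineage component avoids double counting, since every species' abundance is also included in its genus, family, and higher-rank rows.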
2. Functional Analysis
from metagenomics import functional

# Run HUMAnN analysis
func_results = functional.run_humann(
    input_dir='results/taxonomy',
    output_dir='results/functional'
)

# Pathway enrichment analysis
enrichment = functional.pathway_enrichment(
    func_results,
    metadata_file='metadata.tsv'
)
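HUMAnN pathway and gene-family tables mix community-level rows with per-taxon ("stratified") rows, which carry a `|` followed by the taxon in the feature name. Comparisons between sample groups are often run on the community-level rows only, so it can help to separate the two first. The sketch below does that on raw table text. `split_stratified` is a hypothetical helper, not a function of this toolkit, and how `functional.pathway_enrichment` handles stratified rows is not stated here. The pathway name and values are toy data.

```python
def split_stratified(table_text):
    """Split HUMAnN-style rows into community-level and per-taxon dicts.

    Returns (community, stratified), where community maps
    feature -> [abundance per sample] and stratified maps
    (feature, taxon) -> [abundance per sample].
    """
    community, stratified = {}, {}
    for line in table_text.splitlines():
        # Skip blank lines and the '#'-prefixed header line.
        if not line.strip() or line.startswith("#"):
            continue
        feature, *values = line.split("\t")
        abundances = [float(v) for v in values]
        if "|" in feature:
            # Stratified row: "<feature>|<taxon>"
            name, taxon = feature.split("|", 1)
            stratified[(name, taxon)] = abundances
        else:
            community[feature] = abundances
    return community, stratified


# Toy table: one made-up pathway measured in a single sample.
example = (
    "# Pathway\tsample1_Abundance\n"
    "UNMAPPED\t5.0\n"
    "PWY-A: toy pathway\t10.0\n"
    "PWY-A: toy pathway|g__Bacteroides.s__Bacteroides_fragilis\t7.0\n"
    "PWY-A: toy pathway|unclassified\t3.0\n"
)
community, stratified = split_stratified(example)
```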
3. Diversity Analysis
from metagenomics import diversity

# Calculate alpha diversity
alpha_div = diversity.alpha_diversity(
    taxa_table='otu_table.tsv',
    method='shannon'
)

# Beta diversity and PCoA
beta_div, pcoa = diversity.beta_diversity(
    taxa_table='otu_table.tsv',
    method='bray-curtis'
)
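To make the two method names above concrete, here is a minimal standard-library sketch of the underlying formulas: the Shannon index, H = -Σ p_i ln p_i over taxon proportions p_i, and the Bray-Curtis dissimilarity, Σ|a_i - b_i| / Σ(a_i + b_i) between two samples. The function names are illustrative, not the toolkit's internals. The natural logarithm is used here; tools differ in their default log base, so Shannon values are only comparable when the base matches.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) of one sample's taxon counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing, so they are skipped to avoid log(0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' count vectors."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


even_sample = shannon_index([10, 10])          # two equally abundant taxa: ln(2)
disjoint = bray_curtis([1, 0], [0, 1])         # no shared taxa: 1.0
identical = bray_curtis([3, 2], [3, 2])        # same composition: 0.0
```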
Installation
# Clone the repository
git clone https://github.com/tamoghnadas12/metagenomics-toolkit
cd metagenomics-toolkit
# Create conda environment
conda env create -f environment.yml
conda activate metagenomics-toolkit
# Install package
pip install -e .
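After installation, it can be useful to confirm that the external command-line tools are reachable from the active environment. The sketch below checks `PATH` for `metaphlan`, `humann`, and `kraken2`, which are the usual executable names of those tools. Which executables this toolkit actually invokes is an assumption, and `find_missing_tools` is a hypothetical helper.

```python
import shutil


def find_missing_tools(tools=("metaphlan", "humann", "kraken2")):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```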
Usage Examples
Basic Workflow
import metagenomics as mg

# Initialize analysis
analysis = mg.MetagenomicsAnalysis(
    input_dir='data/fastq',
    output_dir='results'
)

# Run complete pipeline
analysis.run_taxonomic_profiling()
analysis.run_functional_annotation()
analysis.run_diversity_analysis()

# Generate report
analysis.generate_report()
Custom Analysis
import metagenomics as mg
from metagenomics.visualization import plot_taxa_barplot

# Load pre-computed results
taxa_df = mg.load_taxa_table('results/taxa.tsv')
func_df = mg.load_functional_table('results/functional.tsv')

# Custom visualization: plot the 20 most abundant taxa, grouped by treatment
plot_taxa_barplot(
    taxa_df,
    top_n=20,
    group_by='treatment',
    output_file='taxa_barplot.png'
)
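The `top_n` argument suggests the plot is limited to the most abundant taxa. One common way to do that is to rank taxa by total abundance across samples, keep the top n, and sum the remainder into a single "Other" row so each sample still adds up to the same total. The sketch below shows that idea with plain dictionaries. Whether `plot_taxa_barplot` does exactly this is an assumption, and `collapse_to_top_n` is a hypothetical helper.

```python
def collapse_to_top_n(table, top_n):
    """Keep the top_n most abundant taxa and pool the rest as 'Other'.

    table maps taxon -> [abundance per sample]; all lists are assumed
    to have the same length and sample order.
    """
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    # Rank taxa by total abundance across samples, highest first.
    ranked = sorted(table, key=lambda taxon: sum(table[taxon]), reverse=True)
    kept, rest = ranked[:top_n], ranked[top_n:]
    collapsed = {taxon: list(table[taxon]) for taxon in kept}
    if rest:
        # Sum the remaining taxa sample by sample.
        collapsed["Other"] = [sum(column) for column in zip(*(table[t] for t in rest))]
    return collapsed


# Toy relative abundances (percent) for two samples.
toy_table = {"A": [50, 40], "B": [30, 30], "C": [15, 20], "D": [5, 10]}
top_two = collapse_to_top_n(toy_table, 2)
# top_two == {"A": [50, 40], "B": [30, 30], "Other": [20, 30]}
```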
Technologies Integrated
- Taxonomic Profiling: MetaPhlAn4, Kraken2, Centrifuge
- Functional Analysis: HUMAnN3, KEGG, MetaCyc
- Diversity Analysis: QIIME2, scikit-bio, vegan
- Visualization: Matplotlib, Seaborn, Plotly, Krona
- Data Management: Pandas, NumPy, HDF5
Contributing
We welcome contributions! Please read our contributing guidelines and code of conduct.
License
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.