Soil Metagenome Pipeline
main @ b8f617b

Workflow Type: Nextflow
Stable

Soil Metagenome Pipeline is a modular Nextflow DSL2 workflow for assembling, polishing, binning, annotating, and functionally characterizing complex soil metagenomes. It orchestrates established tools for long- and short-read metagenomics to generate high-quality metagenome-assembled genomes (MAGs), assign taxonomy, and screen for biosynthetic gene clusters (BGCs).

What it does

  • Assembles long-read metagenomes (e.g., ONT) with Flye and optionally polishes the assembly with Medaka (long reads) and/or NextPolish (short reads).
  • Maps short/long reads to assemblies to compute coverage/depth for downstream binning and QC.
  • Bins contigs with multiple strategies (SemiBin2, VAMB, MetaCoAG, COMEBin) and can integrate the results.
  • Evaluates MAG quality with CheckM2 and assigns taxonomy with GTDB-Tk.
  • Annotates bins and/or assemblies (Bakta, eggNOG), detects BGCs with antiSMASH, and clusters the detected BGCs into similarity networks with BiG-SCAPE.
  • Produces organized outputs suitable for downstream comparative genomics.

Key features

  • Modular DSL2 design: swap/extend modules under modules/ and submodules/.
  • Reproducible runtime via Conda/containers (profiles in conf/).
  • Sensible defaults with overridable parameters via nextflow.config or CLI.
  • Caching and resumability: supports -resume for efficient re-runs.

Modules at a glance (non-exhaustive)

  • Assembly and polishing: Flye, Medaka, NextPolish
  • Coverage mapping: minimap2/samtools, CoverM, strobealign
  • Binning: SemiBin2, VAMB, MetaCoAG, COMEBin, plus bin collection utilities
  • QC and taxonomy: CheckM2, GTDB-Tk
  • Annotation and function: Bakta (assemblies/bins), eggNOG
  • BGC discovery: antiSMASH (assemblies/bins), BiG-SCAPE networks
  • Taxonomic profiling: MMseqs2/Metabuli helpers

Inputs

  • Reads: long reads (ONT/PacBio), optional short reads (Illumina).
  • Sample sheet: a tab-separated file like data/samples.tsv describing sample IDs and file paths.
  • Reference databases: external DBs required by some tools (e.g., GTDB-Tk, antiSMASH, BiG-SCAPE) are not bundled. Configure their locations via params or environment as appropriate.
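
As a rough sketch of the "via params or environment" approach, the fragment below sets database locations in a custom Nextflow config file. The parameter names (gtdbtk_db, antismash_db, bigscape_pfam_db) and all paths are hypothetical placeholders, not names confirmed by this pipeline; check nextflow.config for the parameters it actually reads. GTDBTK_DATA_PATH is the environment variable that GTDB-Tk itself consults for its reference data, but whether this pipeline relies on it is an assumption.

```groovy
// databases.config (hypothetical file name)
// All parameter names below are illustrative assumptions;
// replace them with the names defined in the pipeline's nextflow.config.
params {
    gtdbtk_db        = '/path/to/gtdbtk/release'
    antismash_db     = '/path/to/antismash/databases'
    bigscape_pfam_db = '/path/to/pfam'
}

// The env scope exports variables into every task's environment.
// GTDB-Tk reads GTDBTK_DATA_PATH to locate its reference data.
env {
    GTDBTK_DATA_PATH = '/path/to/gtdbtk/release'
}
```

A file like this can be supplied on the command line with Nextflow's -c option, which merges it with the pipeline's own configuration.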

Quick start

  • Dry run / graph preview: nextflow run . -dsl2 -preview

  • Example execution (adjust paths and profile to your environment):
      nextflow run . -profile conda -resume \
        --reads '/path/to/_{R1,R2}.fastq.gz' \
        --longreads '/path/to/.fastq.gz' \
        --samples 'data/samples.tsv' \
        --outdir 'results'

See conf/ for example profiles (conda, docker, singularity, slurm). Tune resources via nextflow.config using withName: blocks for process-specific CPU, memory, and time.
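
The following is a minimal sketch of per-process resource tuning with withName: selectors. The process names (FLYE, GTDBTK_CLASSIFY) and the resource values are assumptions chosen for illustration; the actual process names are defined in the pipeline's modules and may differ.

```groovy
process {
    // Defaults applied to every process unless a selector overrides them
    cpus   = 4
    memory = 16.GB
    time   = 12.h

    // Hypothetical process name: long-read assembly is typically the
    // most memory-hungry step, so it gets a larger allocation here
    withName: 'FLYE' {
        cpus   = 32
        memory = 256.GB
        time   = 72.h
    }

    // Hypothetical process name: taxonomic classification of MAGs
    withName: 'GTDBTK_CLASSIFY' {
        cpus   = 16
        memory = 128.GB
        time   = 24.h
    }
}
```

These settings can live in nextflow.config or in a separate file passed with -c, for example: nextflow run . -profile conda -c custom.config -resume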

Citation

If you use Soil Metagenome Pipeline in your research, please cite the corresponding preprint:

A machine-readable citation file (CITATION.cff) is included in the repository root. GitHub will display a "Cite this repository" button.

License

This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later). See the LICENSE file for the full text.

Version History

main @ b8f617b (earliest) Created 19th Sep 2025 at 19:34 by Caner Bağcı

add bin


Creator
  • Caner Bagci
Citation
Bagci, C. (2025). Soil Metagenome Pipeline. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1960.1