MAGNETO (automated workflow dedicated to MAG reconstruction)

Workflow Type: Snakemake
Stable


MAGNETO is an automated Snakemake workflow dedicated to the reconstruction of MAGs (Metagenome-Assembled Genomes) from metagenomic data.

It includes a fully automated co-assembly step informed by optimal clustering of metagenomic distances, and implements complementary genome binning strategies to improve MAG recovery.

Key Features

  • Quality Control (QC): Automatically assesses the quality and the contamination of input reads, ensuring that low-quality data are filtered out to improve downstream analyses.

  • Assembly: MAGNETO uses a high-performance assembler (MEGAHIT) to construct contigs from metagenomic reads.

  • Gene Collection: Extracts and compiles gene sequences from contigs, providing a comprehensive gene catalog directly after assembly.

  • Binning: Groups contigs into probable genomes using composition signatures and abundance profiles.

  • Genomes collection: Provides taxonomic and functional annotation of reconstructed MAGs.

Documentation

A full description is available in the wiki pages.

Dependencies

A working installation of conda and git is mandatory to install MAGNETO. If mamba is already installed on your system, the creation of the main environment will be faster.

  • python 3.8+
  • snakemake 7.32.4
  • mamba 1.5.8
  • conda 4.10.3
  • click 8.01

Other dependencies (such as Python libraries used for analysis or by the programs of the workflow) are installed through setup.py and conda environment management.
The default conda front-end for Snakemake has been mamba for some time now. Although Snakemake can in principle use conda instead of mamba, the design of MAGNETO makes mamba mandatory, as --conda-frontend conda does not propagate to the subworkflows.
Unless you use your own databases or have already downloaded them, MAGNETO will also require an internet connection.

Installation

Install from bioconda

conda install -c bioconda magneto

Install from source

Main conda environment installation

Start by creating a conda environment containing snakemake, mamba and the python module click:

conda create -n magneto snakemake-minimal=7.32.4 click=8.01 mamba=1.5.8 -c bioconda -c conda-forge

Note

  • If you have mamba or micromamba already installed, you can create the environment with it instead of conda.

Then, activate your environment:

conda activate magneto

Installation of magneto module in the conda environment

Installation is performed using pip:

git clone https://gitlab.univ-nantes.fr/bird_pipeline_registry/magneto.git
python3 -m pip install magneto/

Magneto is now installed in the "magneto" conda environment. Remember to activate the environment whenever you run the pipeline!

Initialization of working directory

magneto init --wd <working_directory>

This will set configuration files into <working_directory>/config/:

  • config.yaml, in which all parameters for the programs in the workflow may be set;
  • SGE/Slurm profiles, to run the workflow on clusters. Two versions are currently available, for SGE and for Slurm; you can find them in <working_directory>/config/[sge or slurm]/config.yaml. (Certain details may need to be modified to reflect the specific characteristics of the cluster, such as queue or partition names.)
  • cluster_[sge or slurm].yaml, which specifies the resources allocated to the workflow. It is completely modular, and the resources can be adapted for each Snakemake rule.
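
As an illustration, a per-rule resource entry in cluster_slurm.yaml might look like the following sketch; the rule names, partition names, and values here are assumptions to adapt to your cluster, not MAGNETO's actual defaults:

```yaml
# Hypothetical excerpt of cluster_slurm.yaml -- adapt names and values
# to your cluster and to the actual rules of the workflow.
__default__:            # fallback resources for any rule not listed below
  partition: normal
  mem: 8G
  cpus: 4
  time: "12:00:00"

megahit_assembly:       # assumed rule name; assembly usually needs more memory
  partition: bigmem
  mem: 128G
  cpus: 16
  time: "48:00:00"
```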

Input data

Magneto supports both single-end and paired-end reads in fasta/fasta.gz/fastq/fastq.gz format. You will need to provide a file in YAML format listing your read files, following the general patterns below:

For a sample file containing paired-end read files:

If you have one run per sample:

<sample_name>:
  <run_name>:
  - <path/to/forward_reads_file>
  - <path/to/reverse_reads_file>
<other_sample_name>:
  <...>

If you have multiple runs per sample:

<sample_name>:
  <run1_name>:
  - <path/to/run1_forward_reads_file>
  - <path/to/run1_reverse_reads_file>
  <run2_name>:
  - <path/to/run2_forward_reads_file>
  - <path/to/run2_reverse_reads_file>
<other_sample_name>:
  <...>

For a sample file containing single-end read files:

<sample_name>:
  <run1_name>:
  - <path/to/run1_reads_file>
  <run2_name>:
  - <path/to/run2_reads_file>
<other_sample_name>:
  <...>

Set the path of your sample file in the samples field of the config file (<working_directory>/config/config.yaml). You can find a template for sample files in <working_directory>/config/dummy_samples.yaml, which can be used for testing (see the Test section below).
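
For instance, a minimal paired-end sample file with one run per sample could look like this (sample and file names are purely illustrative):

```yaml
# Illustrative samples.yaml -- two samples, one run each, paired-end
sample_A:
  run_1:
  - /data/reads/sample_A_R1.fastq.gz
  - /data/reads/sample_A_R2.fastq.gz
sample_B:
  run_1:
  - /data/reads/sample_B_R1.fastq.gz
  - /data/reads/sample_B_R2.fastq.gz
```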

Usage

The general command line is as follows:

magneto run <submodule> --profile <profile_path> --rerun-incomplete

With the following submodule names:

  • qc performs read trimming, using fastp and FastQ Screen;

  • motus performs taxonomic profiling of reads;

  • assembly performs the assembly of metagenomic reads into contigs, using MEGAHIT;

  • genes performs functional and taxonomic annotation of contigs, to obtain the gene collection;

  • binning performs binning of contigs into putative genomes, using MetaBAT2;

  • genomes performs bin quality checking and dereplication, using CheckM and dRep;

  • all runs the complete workflow at once.

--skip-qc allows you to bypass the read-trimming step.

You can choose the type of assembly to perform with --config target=[single_assembly or co_assembly].

A command-line example is shown below:

magneto run all --profile config/slurm/ --config target=single_assembly --rerun-incomplete

The --profile option lets you use a Snakemake pre-configuration to run Magneto on clusters. Use either the SGE or the Slurm profile, depending on your system. By default, Snakemake will use the config.yaml file located in the specified folder (in this example, config/slurm).

More details can be found in the wiki pages.

Test

⚠️ Test data not currently working, work in progress

To test the workflow you can use a dummy dataset found in the test folder. Simply extract the archive to your working directory, e.g.:

tar -zxf test/dummy_dataset.tar.gz -C <working_directory>

Then launch magneto with the --dummy option (or simply set the samples field in <working_directory>/config/config.yaml to config/dummy_samples.yaml):

magneto run all --dummy --rerun-incomplete --profile config/sge/

Databases management

MAGNETO uses dedicated databases to run fastp and CheckM. It downloads them automatically and stores them by default in <working_directory>/Database.

Output

Outputs of the different steps are stored in the <working_directory>/intermediate_results folder, with the following organization:

intermediate_results
|
|__reads
|   |
|   |__PE (for paired-end reads)
|   |__SE (for single-end reads)
|__assembly
|   |
|   |__single_assembly
|   |   |
|   |   |__megahit
|   |      |
|   |      |__<sample_1>
|   |      |__<sample_2>
|   |      |__ ...
|   |      |__<sample_n>
|   |__co_assembly
|      |
|      |__megahit
|      |  |
|      |  |__<cluster_1>
|      |  |__<cluster_2>
|      |  |__ ...
|      |  |__<cluster_n>
|      |__simka (distance matrix computed between samples)
|      |__clusters (repartition of the samples into clusters inferred from the distance matrix)
|__binning
|  |
|  |__single_binning
|  |  |
|  |  |__<...>
|  |
|  |__co_binning
|  |  |
|  |  |__<...>

The final output of the workflow will be stored in the <working_directory>/genomes_collection subfolder. Graphical reports (notably from FastQ Screen and MultiQC) will be stored in <working_directory>/reports.

Steps implemented

Pre-processing

  • QC (fastp and fastqscreen)
  • merging (bbmerge)

mOTUs

  • motus profiling (motus)

Assembly

  • single-assembly (megahit)
  • metagenomic distance between samples (simka)
  • co-assembly (megahit - [clustering : CAH + silhouette])
  • assembly QC and filtering (metacovest.py)
  • misassembly detection (DeepMAsED; for single assembly only at this time)
  • assembly taxonomic annotation (CAT)
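
The co-assembly step above groups samples by clustering a pairwise metagenomic distance matrix (computed with Simka) and selecting the partition via a silhouette criterion. The sketch below is an illustrative re-implementation of the silhouette score in plain Python, not MAGNETO's actual code; the toy matrix and labels are made up:

```python
# Illustrative sketch (not MAGNETO's actual code): score candidate
# partitions of a sample distance matrix with the silhouette coefficient
# and keep the best one for co-assembly grouping.

def silhouette(dist, labels):
    """Mean silhouette coefficient of a partition over a distance matrix."""
    n = len(labels)
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    scores = []
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of i's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for c, members in clusters.items()
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

# Toy distance matrix: samples 0-1 are similar, samples 2-3 are similar.
D = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]
print(silhouette(D, [0, 0, 1, 1]))  # well-separated partition, high score
print(silhouette(D, [0, 1, 0, 1]))  # mixed partition, negative score
```

A partition that matches the distance structure scores close to 1, while one that mixes dissimilar samples scores negatively, which is what lets the workflow pick the number of co-assembly clusters automatically.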

Genomes collection

  • single-binning from single-assembly
  • single-binning from co-assembly
  • co-binning from single-assembly
  • co-binning from co-assembly
  • filter out contigs not consistent with their bin's assignment (from CAT/BAT results, homemade script) # requires an update
  • improve collection with external genomes db.
  • checkM (by batch)
  • dRep (by batch, in theory for every taxonomic level - 0.95: species, 0.99: strain)
  • dRep95 followed by dRep99 within the 95% clusters
  • GTDB-TK
  • functional annotation (eggNOG-mapper)
  • genomes_length table
  • genomes_reads_counts table
  • genomes_bp_covered table
  • genomes_abundance table
  • genomes_function table
  • genomes_taxo table
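
As a toy illustration of how the abundance table above could relate to the length and read-count tables, here is a simple length-normalised calculation; the normalisation, the names, and the numbers are assumptions for illustration, not MAGNETO's documented formula:

```python
# Hypothetical illustration -- not MAGNETO's documented formula: derive a
# length-normalised abundance from genomes_length and genomes_reads_counts.

lengths = {"MAG_1": 2_000_000, "MAG_2": 4_000_000}   # genome size (bp), made up
counts = {
    "MAG_1": {"sampleA": 10_000, "sampleB": 0},      # mapped read counts, made up
    "MAG_2": {"sampleA": 20_000, "sampleB": 5_000},
}

# Normalise counts by genome length so larger genomes are not over-weighted.
abundance = {
    mag: {s: c / lengths[mag] for s, c in per_sample.items()}
    for mag, per_sample in counts.items()
}
print(abundance["MAG_1"]["sampleA"])  # 0.005 reads per bp
```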

Genes collection

  • CDS prediction using prodigal per sample
  • concatenate all CDS
  • CDS clustering using linclust
  • read back-mapping against the gene collection
  • eggNOG-mapper
  • taxonomy using the MMseqs2 protocol (UniProt db as reference)

Reports

  • multiqc report (pre-processing)
  • assembly report (homemade)

[Figure: MAGNETO workflow]

Citing the pipeline

Churcheward B, Millet M, Bihouée A, Fertin G, Chaffron S.
MAGNETO: An Automated Workflow for Genome-Resolved Metagenomics.
mSystems. 2022 Jun 15:e0043222. doi: 10.1128/msystems.00432-22

Version History

1.3 (latest) Created 17th Jul 2025 at 15:33 by Hugo Lefeuvre

Edit README.md


Frozen 1.3 c94f307

master @ 819d4cf (earliest) Created 17th Jul 2025 at 15:27 by Hugo Lefeuvre

Edit README.md


Frozen master 819d4cf
help Creators and Submitter
Creators
Submitter
Workflow citation
Chaffron, S., Bihouee, A., Churcheward, B., Millet, M., Fertin, G., & Lefeuvre, H. (2025). MAGNETO (automated workflow dedicated to MAG reconstruction). WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1815.2