WHALE
master @ 1036e1d

Workflow Type: Nextflow
Work-in-progress

WHALE: (W)orkflow for (H)uman-genome (A)nalysis of (L)ong-read (E)xperiments

Introduction

WHALE is a bioinformatics pipeline based on Nextflow and nf-core for long-read DNA sequencing analysis. It takes a samplesheet as input and performs quality control, alignment, variant calling and annotation.

Pipeline summary

  1. Read QC (FastQC)
  2. Summarize QC results for raw reads (MultiQC)
  3. Alignment (Minimap2)
  4. Variant calling
  5. Merging of variant calls
  6. Annotation

Usage

First, prepare a samplesheet listing your input data. Depending on the step of the analysis you want to run, the input data type can be fastq, bam (with its bai index), vcf or bed. For fastq input, the samplesheet should look as follows:

samplesheet.csv:

sample,fastq
A123,/path/to/your/input/file/A123.fastq.gz
B456,/path/to/your/input/file/B456.fastq.gz
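
For many fastq files, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch, not part of WHALE: the function name make_samplesheet is hypothetical, and it assumes that every input file ends in .fastq.gz and that the file name without that extension is an acceptable sample ID.

```shell
# Hypothetical helper (not part of WHALE): print a samplesheet for every
# *.fastq.gz file in a directory, using the file name as the sample ID.
make_samplesheet() {
    local fastq_dir="$1"
    local fq
    printf 'sample,fastq\n'
    for fq in "$fastq_dir"/*.fastq.gz; do
        # Skip the unexpanded glob when the directory has no matching files.
        [ -e "$fq" ] || continue
        printf '%s,%s\n' "$(basename "$fq" .fastq.gz)" "$fq"
    done
}
```

It could be called as make_samplesheet /path/to/your/input/file > samplesheet.csv. Paths are written exactly as passed in, so give an absolute directory if the pipeline will run from a different working directory.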

There are two types of full analysis:

  • SNV analysis: -profile snv_analysis
  • SV analysis: -profile sv_analysis

Each full analysis can start with:

  • Alignment: --step mapping (default; input data type: fastq)
  • Variant calling: --step variant_calling (input data type: bam and bai)

A specific step of the analysis can also be executed on its own:

  • SNV calling (and merge): -profile snv_calling (input data type: bam and bai)
  • SV calling (and merge): -profile sv_calling (input data type: bam and bai)
  • SNV annotation: -profile snv_annotation (input data type: vcf)
  • SV annotation: -profile sv_annotation (input data type: bed)
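
The profile, step and input type combinations above can be collected in a small lookup, which may help as a check before launching a run. This is only a sketch: the function name expected_input is hypothetical and is not provided by WHALE. It simply restates the lists above.

```shell
# Hypothetical pre-flight helper (not part of WHALE): print the input data
# type that the documentation lists for a profile and, optionally, a --step.
expected_input() {
    local profile="$1"
    local step="${2:-mapping}"   # mapping is the documented default step
    case "$profile" in
        snv_analysis|sv_analysis)
            # Full analyses accept either documented starting step.
            case "$step" in
                mapping)         echo "fastq" ;;
                variant_calling) echo "bam and bai" ;;
                *)               echo "unknown step: $step" >&2; return 1 ;;
            esac ;;
        snv_calling|sv_calling) echo "bam and bai" ;;
        snv_annotation)         echo "vcf" ;;
        sv_annotation)          echo "bed" ;;
        *)                      echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}
```

For example, expected_input sv_analysis variant_calling prints "bam and bai", and expected_input snv_annotation prints "vcf".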

Profiles to use in the CCC (UAM):

  • -profile uam,singularity,batch
  • -profile uam_allcontigs,singularity,batch

Profiles to use on the server:

  • -profile tblabserver,singularity
  • -profile tblabserver_allcontigs,singularity
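
An analysis profile and an environment profile are combined into a single comma-separated -profile value. A minimal sketch of that composition follows. The function name profile_for and the site labels "ccc" and "server" are hypothetical names chosen for this example; only the profile names themselves come from the lists above.

```shell
# Hypothetical helper (not part of WHALE): join one or more analysis profiles
# with the environment profiles documented for the CCC or the server.
profile_for() {
    local analysis="$1"   # e.g. sv_calling or snv_analysis,sv_analysis
    local site="$2"       # "ccc" or "server" (labels used only in this sketch)
    local suffix=""
    # An optional third argument selects the documented *_allcontigs variant.
    if [ "${3:-}" = "allcontigs" ]; then
        suffix="_allcontigs"
    fi
    case "$site" in
        ccc)    echo "${analysis},uam${suffix},singularity,batch" ;;
        server) echo "${analysis},tblabserver${suffix},singularity" ;;
        *)      echo "unknown site: $site" >&2; return 1 ;;
    esac
}
```

For example, profile_for sv_calling ccc prints sv_calling,uam,singularity,batch, which can be passed to nextflow run WHALE as -profile "$(profile_for sv_calling ccc)".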

Examples

SNV and SV analysis starting with variant calling on the server:

nextflow run WHALE \
   -profile snv_analysis,sv_analysis,tblabserver,singularity \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --step variant_calling

SV calling in the CCC:

nextflow run WHALE \
   -profile sv_calling,uam,singularity,batch \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Pipeline output

WHALE will create the following subdirectories in the output directory:

  • alignment
  • snv_calling
    • snv_merge
  • snv_annotation
  • sv_calling
    • sv_merge
  • sv_annotation
    • overlapping_sv_samples
  • multiqc
  • pipeline_info
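
After a run, the output directory can be checked against this list. The sketch below is not part of WHALE: the function name check_outdir is hypothetical, and it only looks at the top-level subdirectories. It assumes the indented entries above (snv_merge, sv_merge, overlapping_sv_samples) are nested inside their parent directories, and that a run with a single-step profile will presumably produce only some of these directories.

```shell
# Hypothetical helper (not part of WHALE): report which of the documented
# top-level output subdirectories exist under a given output directory.
check_outdir() {
    local outdir="$1"
    local sub
    for sub in alignment snv_calling snv_annotation sv_calling \
               sv_annotation multiqc pipeline_info; do
        if [ -d "$outdir/$sub" ]; then
            echo "present: $sub"
        else
            echo "absent:  $sub"
        fi
    done
}
```

Calling check_outdir with the directory given to --outdir prints one line per expected subdirectory, so missing results are easy to spot.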

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Illustration by Yolanda Benítez

Version History

master @ 1036e1d (earliest) Created 12th Aug 2025 at 10:56 by Yolanda Benítez Quesada

Merge pull request #1 from RafaFariasVarona/profile_uam

update master branch

