reference (and plasmid) preprocessing workflow
Version 1

Workflow Type: Common Workflow Language

**Workflow for preprocessing a reference file.**

Steps:
- When a reference GenBank file is not provided, it is downloaded from NCBI based on an accession number.
- When multiple plasmid GenBank files are provided, they are merged into one file.
- When one or more plasmid GenBank files are provided, the reference is merged with the plasmid GenBank file(s) into one file. A FASTA file is also extracted.
- When no plasmid GenBank files are provided, a FASTA file is extracted from the reference GenBank file.
- A GFF3 file is extracted from the final GenBank file.
- The final step determines the relevant outputs.
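
The branching described in the list above can be sketched in a few lines of Python. This is only an illustration of which steps are expected to run for a given combination of inputs; it is not taken from the workflow's CWL, and the function name `plan_steps` and its parameters are hypothetical. The step IDs are those listed in the Steps table.

```python
from typing import List


def plan_steps(has_reference_file: bool, plasmid_count: int) -> List[str]:
    """Return the step IDs expected to run, in order (illustrative only)."""
    steps: List[str] = []
    if not has_reference_file:
        # No reference GenBank file supplied: download it from NCBI.
        steps.append("fetch_reference")
    if plasmid_count > 1:
        # Several plasmid files are first merged into one.
        steps.append("merge_plasmids")
    if plasmid_count >= 1:
        # Reference and plasmid(s) are merged; a FASTA file is extracted too.
        steps.append("merge_reference")
    else:
        # Without plasmids the FASTA comes straight from the reference.
        steps.append("extract_fasta")
    steps.append("extract_gff3")
    steps.append("determine_output")
    return steps
```

For example, a run with a local reference file and no plasmids would be expected to go through `extract_fasta`, `extract_gff3` and `determine_output` only.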

All tool CWL files and other workflows can be found here:
Tools: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/tools
Workflows: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/workflows


Inputs

| ID | Name | Description | Type |
|---|---|---|---|
| accession_number | accession number | Accession number used to download a GenBank file from NCBI. Mandatory when no reference file is provided. | string |
| fasta_extraction_script | FASTA extraction script | Python script that extracts a FASTA file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | File |
| gff3_extraction_script | GFF3 extraction script | BioPerl script that extracts a GFF3 file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | File |
| merging_genbank_script | merging script | Python script to merge multiple GenBank files. Passed externally within the git structure to avoid having to host a new image. | File |
| plasmids | plasmid file(s) | Input plasmid GenBank files. | array of File |
| reference_file | reference GenBank file | Reference file in GenBank format. | File |
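
As a rough illustration of how these inputs fit together, the sketch below builds a CWL job object in Python and prints it as JSON, which CWL runners accept as a job file. The input IDs come from the table above; the helper names `file_entry` and `build_job`, all file paths, and the accession placeholder are hypothetical. Omitting `plasmids` when none are given is an assumption about the workflow, not something stated in its description.

```python
import json
from typing import Dict, Optional, Sequence


def file_entry(path: str) -> Dict[str, str]:
    """CWL job-file representation of a File input."""
    return {"class": "File", "path": path}


def build_job(scripts: Dict[str, str],
              reference_file: Optional[str] = None,
              accession_number: Optional[str] = None,
              plasmids: Sequence[str] = ()) -> Dict[str, object]:
    """Assemble a job object; `scripts` maps script input IDs to paths."""
    if reference_file is None and accession_number is None:
        # The accession number is mandatory when no reference file is given.
        raise ValueError("provide reference_file or accession_number")
    job: Dict[str, object] = {
        name: file_entry(path) for name, path in scripts.items()
    }
    if reference_file is not None:
        job["reference_file"] = file_entry(reference_file)
    if accession_number is not None:
        job["accession_number"] = accession_number
    if plasmids:
        # Assumption: the plasmids input may be left out when empty.
        job["plasmids"] = [file_entry(p) for p in plasmids]
    return job


if __name__ == "__main__":
    scripts = {  # hypothetical paths within a local checkout
        "fasta_extraction_script": "scripts/fasta_extraction.py",
        "gff3_extraction_script": "scripts/gff3_extraction.pl",
        "merging_genbank_script": "scripts/merge_genbank.py",
    }
    job = build_job(scripts, accession_number="<ACCESSION>",
                    plasmids=["plasmid_1.gbk", "plasmid_2.gbk"])
    print(json.dumps(job, indent=2))
```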

Steps

| ID | Name | Description |
|---|---|---|
| determine_output | determine output | Determines relevant final outputs. |
| extract_fasta | extract FASTA | Extracts FASTA file from input reference file when no plasmids are provided. |
| extract_gff3 | extract GFF3 | Extracts GFF3 annotation file from the (merged) reference. |
| fetch_reference | fetch reference | Downloads the associated GenBank file from the supplied accession number. |
| merge_plasmids | merge plasmids | Merges plasmids when more than one are present. |
| merge_reference | merge plasmid(s) with reference | Merges the plasmid(s) with the reference GenBank file. |
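
To give an idea of what a FASTA extraction like `extract_fasta` involves, here is a minimal standard-library sketch that pulls the sequence of every record out of GenBank flat-file text. It is not the workflow's own extraction script, which is not shown here; the function names `genbank_records` and `to_fasta` are hypothetical, and the FASTA header is assumed to be the LOCUS name. Because it handles multiple records, it would also work on a merged reference plus plasmid file.

```python
from typing import Iterator, List, Tuple


def genbank_records(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (locus_name, sequence) for each record in GenBank text."""
    name = ""
    in_origin = False
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else "unnamed"
            in_origin = False
            chunks = []
        elif line.startswith("ORIGIN"):
            # Sequence lines follow until the "//" record terminator.
            in_origin = True
        elif line.startswith("//"):
            if name:
                yield name, "".join(chunks)
            name, in_origin, chunks = "", False, []
        elif in_origin:
            # Lines look like "        1 gatcctccat atacaacggt";
            # keep only the letters, dropping positions and spaces.
            chunks.append("".join(ch for ch in line if ch.isalpha()))


def to_fasta(text: str, width: int = 60) -> str:
    """Convert GenBank text to FASTA, wrapping sequences at `width`."""
    out: List[str] = []
    for name, seq in genbank_records(text):
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + ("\n" if out else "")
```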

Outputs

| ID | Name | Description | Type |
|---|---|---|---|
| fasta_final | FASTA output file | Final FASTA output file. | File |
| genbank_final | GenBank output file | Final GenBank output file. | File |
| gff3 | GFF3 output file | Final GFF3 output file. | File |

Version History

Version 1 (earliest) Created 22nd Jul 2025 at 16:22 by Martijn Melissen

Initial commit


Open master 1c0d264

Last updated: 12th Aug 2025 at 11:40
