**Workflow for preprocessing a reference file.**
Steps:
- When no reference GenBank file is provided, one is downloaded from NCBI based on an accession number.
- When multiple plasmid GenBank files are provided, they are merged into one file.
- When one or more plasmid GenBank files are provided, the reference is merged with the plasmid GenBank file(s) into one file. A FASTA file is also extracted.
- When no plasmid GenBank files are provided, a FASTA file is extracted from the reference GenBank file.
- A GFF3 file is extracted from the final GenBank file.
- The final step determines the relevant outputs.
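
The branching described in the steps above can be sketched in Python. This is only an illustration of the decision logic, not the workflow's CWL: the function name `plan_steps` is hypothetical, and the step IDs are taken from the Steps table further down.

```python
from typing import List, Optional, Sequence


def plan_steps(
    reference_file: Optional[str],
    accession_number: Optional[str],
    plasmids: Sequence[str] = (),
) -> List[str]:
    """Return the step IDs expected to run for a given set of inputs.

    Hypothetical helper: it mirrors the prose description only.
    """
    steps: List[str] = []

    # No reference file: one has to be fetched from NCBI by accession number.
    if reference_file is None:
        if accession_number is None:
            raise ValueError("provide either reference_file or accession_number")
        steps.append("fetch_reference")

    # More than one plasmid file: merge the plasmids first.
    if len(plasmids) > 1:
        steps.append("merge_plasmids")

    if plasmids:
        # Plasmid(s) present: merge with the reference (a FASTA is also produced here).
        steps.append("merge_reference")
    else:
        # No plasmids: FASTA comes straight from the reference GenBank file.
        steps.append("extract_fasta")

    # These two steps are described without any condition.
    steps.append("extract_gff3")
    steps.append("determine_output")
    return steps


if __name__ == "__main__":
    print(plan_steps(None, "some_accession", ["p1.gbk", "p2.gbk"]))
```

With two plasmid files and no reference file, the sketch lists all fetch and merge steps; with a reference file and no plasmids, only the extraction steps remain.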
All tool CWL files and other workflows can be found here:
Tools: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/tools
Workflows: https://git.wur.nl/ssb/automated-data-analysis/cwl/-/tree/main/workflows
Inputs
ID | Name | Description | Type |
---|---|---|---|
accession_number | accession number | Accession number used to download a GenBank file from NCBI. Mandatory when no reference file is provided. | |
fasta_extraction_script | FASTA extraction script | Python script that extracts a FASTA file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | |
gff3_extraction_script | GFF3 extraction script | BioPerl script that extracts a GFF3 file from GenBank files. Passed externally within the git structure to avoid having to host a new image. | |
merging_genbank_script | merging script | Python script that merges multiple GenBank files. Passed externally within the git structure to avoid having to host a new image. | |
plasmids | plasmid file(s) | Input plasmid GenBank files. | |
reference_file | reference GenBank file | Reference file in GenBank format. | |
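
As a rough illustration of how these inputs might be supplied, the sketch below builds a CWL job object in Python and prints it as JSON. The input IDs come from the table above; everything else is an assumption: the Type column is empty in this listing, so the sketch guesses that the scripts, reference and plasmids are CWL `File` inputs and that the accession number is a string, and all paths and the helper names `cwl_file` and `build_job` are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Sequence


def cwl_file(path: str) -> Dict[str, str]:
    """Wrap a path as a CWL File object."""
    return {"class": "File", "path": path}


def build_job(
    scripts: Dict[str, str],
    reference_file: Optional[str] = None,
    accession_number: Optional[str] = None,
    plasmids: Sequence[str] = (),
) -> Dict[str, Any]:
    """Assemble a job object; input types are assumed, not confirmed."""
    # The accession number is mandatory when no reference file is given.
    if reference_file is None and accession_number is None:
        raise ValueError("accession_number is required without a reference_file")

    job: Dict[str, Any] = {key: cwl_file(path) for key, path in scripts.items()}
    if reference_file is not None:
        job["reference_file"] = cwl_file(reference_file)
    if accession_number is not None:
        job["accession_number"] = accession_number
    if plasmids:
        job["plasmids"] = [cwl_file(p) for p in plasmids]
    return job


if __name__ == "__main__":
    # Hypothetical script locations inside a local checkout.
    scripts = {
        "fasta_extraction_script": "scripts/extract_fasta.py",
        "gff3_extraction_script": "scripts/extract_gff3.pl",
        "merging_genbank_script": "scripts/merge_genbank.py",
    }
    job = build_job(scripts, reference_file="reference.gbk", plasmids=["plasmid1.gbk"])
    print(json.dumps(job, indent=2))
```

The printed JSON could be saved as a job file and passed to a CWL runner alongside the workflow, provided the assumed types match the real input definitions.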
Steps
ID | Name | Description |
---|---|---|
determine_output | determine output | Determines relevant final outputs. |
extract_fasta | extract FASTA | Extracts FASTA file from input reference file when no plasmids are provided. |
extract_gff3 | extract GFF3 | Extracts GFF3 annotation file from the (merged) reference. |
fetch_reference | fetch reference | Downloads the associated GenBank file from the supplied accession number. |
merge_plasmids | merge plasmids | Merges plasmids when more than one are present. |
merge_reference | merge plasmid(s) with reference | Merges the plasmid(s) with the reference GenBank file. |
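
To give a feel for what a step such as `extract_fasta` does, here is a minimal standard-library sketch that pulls sequences out of GenBank flat-file text. It is not the workflow's actual extraction script, whose contents are not shown here; the function names are hypothetical, and the parsing is simplified to the `LOCUS`, `ORIGIN` and `//` lines of each record.

```python
from typing import Iterator, List, Tuple


def genbank_records(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (locus_name, sequence) for each record in GenBank text."""
    name = "unknown"
    in_origin = False
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            name = parts[1] if len(parts) > 1 else "unknown"
        elif line.startswith("ORIGIN"):
            # Sequence lines follow until the record terminator.
            in_origin = True
            chunks = []
        elif line.startswith("//"):
            if in_origin:
                yield name, "".join(chunks)
            name, in_origin, chunks = "unknown", False, []
        elif in_origin:
            # Drop the position numbers and spacing, keep the bases.
            chunks.append("".join(ch for ch in line if ch.isalpha()))


def to_fasta(text: str, width: int = 60) -> str:
    """Render every record as FASTA, wrapping sequence lines at `width`."""
    lines: List[str] = []
    for name, seq in genbank_records(text):
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = (
        "LOCUS       TEST01   12 bp    DNA     linear   UNK\n"
        "ORIGIN\n"
        "        1 acgtacgtac gt\n"
        "//\n"
    )
    print(to_fasta(sample), end="")
```

Because a merged GenBank file holds several records, each ending in `//`, the same sketch would emit one FASTA entry per record for a reference merged with plasmids.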
Outputs
ID | Name | Description | Type |
---|---|---|---|
fasta_final | FASTA output file | Final FASTA output file. | |
genbank_final | GenBank output file | Final GenBank output file. | |
gff3 | GFF3 output file | Final GFF3 output file. | |
Version History
Version 1 (earliest) Created 22nd Jul 2025 at 16:22 by Martijn Melissen
Initial commit
master
1c0d264

Created: 22nd Jul 2025 at 16:22
Last updated: 12th Aug 2025 at 11:40

