Publications


Abstract

Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
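
To make the provenance point concrete, here is a minimal sketch of a workflow step that records what it consumed and produced as machine-readable metadata; run_step is a hypothetical helper, and no specific standard (such as W3C PROV) is implied:

```python
# Minimal sketch of a workflow step that records provenance as it runs.
# run_step is a hypothetical helper; no specific standard (e.g. W3C PROV)
# is implied.
import hashlib
import json
import time

def run_step(step_name, infile, outfile, transform):
    with open(infile, "rb") as f:
        data = f.read()
    result = transform(data)
    with open(outfile, "wb") as f:
        f.write(result)
    # The provenance record: what ran, on which input, producing which output.
    return {
        "step": step_name,
        "input": {"path": infile, "sha256": hashlib.sha256(data).hexdigest()},
        "output": {"path": outfile, "sha256": hashlib.sha256(result).hexdigest()},
        "ended_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

with open("raw.txt", "wb") as f:
    f.write(b"hello")
print(json.dumps(run_step("uppercase", "raw.txt", "clean.txt", bytes.upper), indent=2))
```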

Authors: Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober

Date Published: 2020

Publication Type: Journal

Abstract

This paper presents a performance evaluation of a phylogenetic networks framework in the environment of the Santos Dumont supercomputer. The work reinforces the benefits of parallelizing the framework using parallel approaches based on High-Throughput Computing (HTC) and High-Performance Computing (HPC). The results of the parallel execution of the proposed framework demonstrate that this type of bioinformatics experiment is well suited to HPC environments, even though not all of the framework's component tasks and programs were designed to exploit scalability in HPC environments or multi-level parallelism techniques. A comparative analysis of executing the five pipelines sequentially (as originally designed and used by bioinformaticians) gave an estimated time of 81.67 minutes, whereas executing the same experiment through the framework, which runs the five pipelines in parallel and benefits from better task management, yielded a total execution time of 38.73 minutes. This improvement of approximately 2.11x in execution time suggests that using an optimized framework reduces computational time, improves resource allocation, and shortens allocation wait times.
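
The reported improvement follows directly from the two measured times; a quick check in Python:

```python
# Quick check of the reported speedup from the two measured times.
sequential_min = 81.67  # five pipelines executed one after another
parallel_min = 38.73    # the same five pipelines run through the framework
print(f"speedup: {sequential_min / parallel_min:.2f}x")  # -> speedup: 2.11x
```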

Authors: Rafael Terra, Kary Ocaña, Carla Osthoff, Lucas Cruz, Philippe Navaux, Diego Carvalho

Date Published: 19th Oct 2022

Publication Type: InProceedings

Abstract

In recent years, the development of technologies such as next-generation sequencing and high-performance computing has allowed the execution of bioinformatics experiments of high complexity that are computationally intensive. Different bioinformatics fields need high-performance computing platforms to take advantage of parallelism and task distribution through specialized scientific workflow management systems. One bioinformatics field that needs high-performance computing is phylogeny, which expresses the evolutionary relationships between genes and organisms, establishing which of them are most closely related evolutionarily. Phylogeny is used in several approaches, such as species classification, the discovery of individuals’ kinship, the identification of the origins of pathogens, and even conservation biology. One way of representing these phylogenetic relationships is with phylogenetic networks. However, the construction of these networks uses computationally intensive algorithms that require the constant manipulation of different input data. This work develops a framework for the construction of explicit phylogenetic networks, modeling a scientific workflow that brings together different methods for constructing the networks and the required treatment of input data. The framework was developed to allow multiple flows of the workflow to run in an automated, parallel, and distributed manner in a single execution and to be executable in high-performance computing environments, a challenging task since the tools used were not developed with this environment in mind. To orchestrate the workflow tasks, the scalable parallel programming library Parsl was used, enabling optimizations in the execution of workflow tasks and better management of resources. Two versions of the framework were developed, called Single Partition and Multi Partition, which differ in the manner in which resources are used. In tests, execution time improved by a factor of about five compared with the sequential execution of a flow without the optimizations. The framework was validated using public Dengue virus genome data, which were processed, annotated, and executed in the framework on the Santos Dumont supercomputer. The construction of the genomes’ explicit phylogenetic networks indicates that the framework is a functional, efficient, and easy-to-use tool.
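
The abstract names Parsl as the orchestration layer. Below is a minimal sketch, assuming hypothetical flow names and a local-threads configuration, of how independent flows can be submitted concurrently as Parsl apps; it is not the framework's actual code:

```python
# Minimal sketch of launching independent pipeline flows as Parsl apps.
# Flow names and bodies are placeholders, not the framework's code.
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)  # local threads; an HPC run would load a cluster config

@python_app
def run_flow(name, genomes):
    # Stand-in for one flow: alignment, model selection, network construction...
    return f"{name} finished for {genomes}"

# All five flows are submitted at once and run concurrently.
futures = [run_flow(f"flow-{i}", "dengue_genomes.fasta") for i in range(5)]
print([f.result() for f in futures])  # wait for every flow to complete
```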

Authors: Rafael Terra, Kary Ocaña, Carla Osthoff, Diego Carvalho

Date Published: 18th Feb 2022

Publication Type: Master's Thesis

Abstract

Evolutionary processes and the dispersal of Dengue genomes in Brazil are relevant to the impact and to the endemo-epidemic and social surveillance of emerging arboviruses. Phylogenetic trees and networks make it possible to display evolutionary and reticulate events in viruses arising from high diversity and mutation rates and frequent homologous recombination. We present a parallel and distributed scientific workflow for phylogenetic networks, designed to work with the diversity of tools and resources of computational biology experiments and coupled to high-performance computing environments. We report an improvement in execution time of approximately 5x compared with sequential execution in analyses of Dengue genomes, with identification of recombination events.

Authors: Rafael Terra, Micaella Coelho, Lucas Cruz, Marco Garcia-Zapata, Luiz Gadelha, Carla Osthoff, Diego Carvalho, Kary Ocaña

Date Published: 18th Jul 2021

Publication Type: InProceedings

Abstract

Background A new era of flu surveillance has already started based on the genetic characterization and exploration of influenza virus evolution at whole-genome scale. Although this has been prioritized by national and international health authorities, the demanded technological transition to whole-genome sequencing (WGS)-based flu surveillance has been particularly delayed by the lack of bioinformatics infrastructures and/or expertise to deal with primary next-generation sequencing (NGS) data. Results We developed and implemented INSaFLU (“INSide the FLU”), which is the first influenza-oriented, free, web-based bioinformatics suite that deals with primary NGS data (reads) towards the automatic generation of the output data that are actually the core first-line “genetic requests” for effective and timely influenza laboratory surveillance (e.g., type and sub-type, gene and whole-genome consensus sequences, variants’ annotation, alignments and phylogenetic trees). By handling NGS data collected from any amplicon-based schema, the implemented pipeline enables any laboratory to perform multi-step, software-intensive analyses in a user-friendly manner without previous advanced training in bioinformatics. INSaFLU gives access to user-restricted sample databases and project management, being a transparent and flexible tool specifically designed to automatically update project outputs as more samples are uploaded. Data integration is thus cumulative and scalable, fitting the need for continuous epidemiological surveillance during flu epidemics. Multiple outputs are provided in nomenclature-stable and standardized formats that can be explored in situ or through multiple compatible downstream applications for fine-tuned data analysis. This platform additionally flags samples as “putative mixed infections” if the population admixture involves influenza viruses with clearly distinct genetic backgrounds, and enriches the traditional “consensus-based” influenza genetic characterization with relevant data on influenza sub-population diversification through an in-depth analysis of intra-patient minor variants. This dual approach is expected to strengthen our ability not only to detect the emergence of antigenic and drug-resistance variants but also to decode alternative pathways of influenza evolution and to unveil intricate routes of transmission. Conclusions In summary, INSaFLU supplies public health laboratories and influenza researchers with an open “one size fits all” framework, potentiating the operationalization of a harmonized multi-country WGS-based surveillance for influenza virus.
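
As an illustration of the kind of flagging described above, the sketch below marks a sample as a putative mixed infection when many sites show intermediate minor-variant frequencies; the thresholds, input format, and decision rule are assumptions for illustration, not INSaFLU's actual criteria:

```python
# Illustrative only: flag a sample as a "putative mixed infection" when many
# sites carry minor variants at intermediate frequency. Thresholds and input
# format are assumptions, not INSaFLU's actual decision rule.
def flag_mixed_infection(minor_allele_freqs, band=(0.20, 0.80), min_sites=20):
    """minor_allele_freqs: per-site minor allele frequencies in [0, 1]."""
    intermediate = [f for f in minor_allele_freqs if band[0] <= f <= band[1]]
    return len(intermediate) >= min_sites

sample = [0.02, 0.45, 0.50, 0.38, 0.01] * 10  # toy per-site frequencies
print(flag_mixed_infection(sample))  # True: 30 sites fall in the band
```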

Authors: Vítor Borges, Miguel Pinheiro, Pedro Pechirra, Raquel Guiomar, João Paulo Gomes

Date Published: 1st Dec 2018

Publication Type: Journal

Abstract

This report reviews the current state of the art in automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.

Authors: Stephanie Walton, Laurence Livermore, Olaf Bánki, Robert W. N. Cubey, Robyn Drinkwater, Markus Englund, Carole Goble, Quentin Groom, Christopher Kermorvant, Isabel Rey, Celia M Santos, Ben Scott, Alan Williams, Zhengzhe Wu

Date Published: 14th Aug 2020

Publication Type: Journal

Abstract

Predicting physical or functional associations through protein-protein interactions (PPIs) is an integral approach for inferring novel protein functions and discovering new drug targets during repositioning analysis. Recent advances in high-throughput data generation and multi-omics techniques have enabled large-scale PPI predictions, promoting several computational methods based on different levels of biological evidence. However, integrating multiple results and strategies to optimize and automatically extract interaction features, and to scale up the entire PPI prediction process, is still challenging. Most procedures do not offer an in-silico validation process to evaluate the predicted PPIs. In this context, this paper presents the PredPrIn scientific workflow, which enables PPI prediction based on multiple lines of evidence, including the structure, sequence, and functional annotation categories, by combining boosting and stacking machine learning techniques. We also present a pipeline (PPIVPro) for the validation process based on cellular co-localization filtering and a focused search for PPI evidence in scientific publications. Our combined approach thus provides the means for large-scale training or prediction of new PPIs and a strategy to evaluate prediction quality. PredPrIn and PPIVPro are publicly available at https://github.com/YasCoMa/predprin and https://github.com/YasCoMa/ppi_validation_process.
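
A minimal sketch of the boosting-plus-stacking combination the abstract describes, using scikit-learn with synthetic stand-ins for the structure, sequence, and annotation features; this is not PredPrIn's actual code, which lives in the repositories above:

```python
# Sketch of combining boosting and stacking for PPI prediction; synthetic
# features stand in for structure, sequence and annotation evidence.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("boosting", GradientBoostingClassifier()),
                ("forest", RandomForestClassifier())],
    final_estimator=LogisticRegression(),  # meta-learner over base outputs
)
stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.2f}")
```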

Authors: Yasmmin Côrtes Martins, Artur Ziviani, Marisa Fabiana Nicolás, Ana Tereza Ribeiro de Vasconcelos

Date Published: 6th Sep 2021

Publication Type: Journal

Abstract

Point cloud datasets provided by LiDAR have become an integral part in many research fields including archaeology, forestry, and ecology. Facilitated by technological advances, the volume of these datasets has steadily increased, with modern airborne laser scanning surveys now providing high-resolution, (super-)national scale, multi-terabyte point clouds. However, their wider scientific exploitation is hindered by the scarcity of open source software tools capable of handling the challenges of accessing, processing, and extracting meaningful information from massive datasets, as well as by the domain-specificity of existing tools. Here we present Laserchicken, a user-extendable, cross-platform Python tool for extracting statistical properties of flexibly defined subsets of point cloud data, aimed at enabling efficient, scalable, distributed processing of multi-terabyte datasets. We demonstrate Laserchicken’s ability to unlock these transformative new resources, e.g. in macroecology and species distribution modelling, where it is used to characterize the 3D vegetation structure at high resolution (<10 m) across whole countries or regions. We further discuss its potential as a domain agnostic, flexible tool that can also facilitate novel applications in other research fields.
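
Conceptually, the core operation is computing statistical features over flexibly defined subsets of a point cloud; the sketch below illustrates that idea with plain numpy and toy data, and does not use Laserchicken's actual API:

```python
# Conceptual sketch with plain numpy (not Laserchicken's API): compute
# statistical features for a flexibly defined subset of a point cloud.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 100.0, size=(100_000, 3))  # toy x, y, z cloud

# Subset: points inside one 10 m x 10 m cell of a target raster.
mask = (points[:, 0] < 10) & (points[:, 1] < 10)
z = points[mask, 2]

features = {  # height-distribution metrics of the kind used for vegetation
    "point_count": int(z.size),
    "max_height": float(z.max()),
    "mean_height": float(z.mean()),
    "height_p95": float(np.percentile(z, 95)),
}
print(features)
```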

Authors: C. Meijer, M.W. Grootes, Z. Koma, Y. Dzigan, R. Gonçalves, B. Andela, G. van den Oord, E. Ranguelova, N. Renaud, W.D. Kissling

Date Published: 1st Jul 2020

Publication Type: Journal

Abstract

Quantifying ecosystem structure is of key importance for ecology, conservation, restoration, and biodiversity monitoring because the diversity, geographic distribution and abundance of animals, plants and other organisms are tightly linked to the physical structure of vegetation and associated microclimates. Light Detection And Ranging (LiDAR) — an active remote sensing technique — can provide detailed and high resolution information on ecosystem structure because the laser pulse emitted from the sensor and its subsequent return signal from the vegetation (leaves, branches, stems) delivers three-dimensional point clouds from which metrics of vegetation structure (e.g. ecosystem height, cover, and structural complexity) can be derived. However, processing 3D LiDAR point clouds into geospatial data products of ecosystem structure remains challenging across broad spatial extents due to the large volume of national or regional point cloud datasets (typically multiple terabytes consisting of hundreds of billions of points). Here, we present a high-throughput workflow called ‘Laserfarm’ enabling the efficient, scalable and distributed processing of multi-terabyte LiDAR point clouds from national and regional airborne laser scanning (ALS) surveys into geospatial data products of ecosystem structure. Laserfarm is a free and open-source, end-to-end workflow which contains modular pipelines for the re-tiling, normalization, feature extraction and rasterization of point cloud information from ALS and other LiDAR surveys. The workflow is designed with horizontal scalability and can be deployed with distributed computing on different infrastructures, e.g. a cluster of virtual machines. We demonstrate the Laserfarm workflow by processing a country-wide multi-terabyte ALS dataset of the Netherlands (covering ∼34,000 km² with ∼700 billion points and ∼16 TB of uncompressed LiDAR point clouds) into 25 raster layers at 10 m resolution capturing ecosystem height, cover and structural complexity at a national extent. The Laserfarm workflow, implemented in Python and available as Jupyter Notebooks, is applicable to other LiDAR datasets and enables users to execute automated pipelines for generating consistent and reproducible geospatial data products of ecosystem structure from massive amounts of LiDAR point clouds on distributed computing infrastructures, including cloud computing environments. We provide information on workflow performance (including total CPU times, total wall-time estimates and average CPU times for single files and LiDAR metrics) and discuss how the Laserfarm workflow can be scaled to other LiDAR datasets and computing environments, including remote cloud infrastructures. The Laserfarm workflow allows a broad user community to process massive amounts of LiDAR point clouds for mapping vegetation structure, e.g. for applications in ecology, biodiversity monitoring and ecosystem restoration.
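
The modular, horizontally scalable design described above can be pictured as independent tiles flowing through the same chain of stages; in the sketch below, the stage names mirror the abstract but the bodies are placeholders and a local process pool stands in for cluster-scale distribution (this is not the Laserfarm API):

```python
# Conceptual sketch of Laserfarm-style modular pipelines; stage names mirror
# the abstract but the bodies are placeholders, not the library's API.
from concurrent.futures import ProcessPoolExecutor

def retile(tile):           return f"{tile}:retiled"
def normalize(tile):        return f"{tile}:normalized"
def extract_features(tile): return f"{tile}:features"
def rasterize(tile):        return f"{tile}:raster"

def process_tile(tile):
    # Every tile flows through the same chain of stages.
    for stage in (retile, normalize, extract_features, rasterize):
        tile = stage(tile)
    return tile

if __name__ == "__main__":
    tiles = [f"tile_{i:03d}" for i in range(8)]
    # Tiles are independent; a pool stands in for cluster-scale distribution.
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(process_tile, tiles)))
```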

Authors: W. Daniel Kissling, Yifang Shi, Zsófia Koma, Christiaan Meijer, Ou Ku, Francesco Nattino, Arie C. Seijmonsbergen, Meiert W. Grootes

Date Published: 1st Dec 2022

Publication Type: Journal
