The bioinformatic workflow presented here enables the analysis of RNA sequencing (RNA-seq) data obtained from human reproductive tissues in unexplained recurrent pregnancy loss (uRPL) research. The pipeline requires two inputs: a sample sheet containing the sample information (example_input_data.csv) and gene expression matrices generated with Salmon as part of the nf-core/rnaseq bioinformatics pipeline (example_count_data.csv). For details on running nf-core/rnaseq, including its required inputs and expected outputs, please refer to the nf-core/rnaseq documentation. The steps used to download publicly available high-throughput RNA-seq datasets and generate the Salmon gene expression matrices (e.g. counts files) are described in our GitHub repository (also available as a file through WorkflowHub, Data_Preparation.md), alongside documentation showing the expected outputs from this pipeline.
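As a rough illustration of how the two inputs relate, the base R sketch below checks that every sample in a sample sheet has a matching column in a count matrix and reorders the columns to match. The helper name `align_counts`, the `sample` and `condition` column names, and the toy values are assumptions made for this example. They are not taken from example_input_data.csv, example_count_data.csv, or the attached R script.

```r
# Hypothetical helper (not part of the published workflow): checks that every
# sample in the sample sheet has a matching column in the count matrix and
# returns the counts with columns reordered to follow the sample sheet.
align_counts <- function(sample_sheet, counts, id_col = "sample") {
  stopifnot(id_col %in% colnames(sample_sheet))
  ids <- as.character(sample_sheet[[id_col]])
  missing <- setdiff(ids, colnames(counts))
  if (length(missing) > 0) {
    stop("Samples missing from count matrix: ", paste(missing, collapse = ", "))
  }
  counts[, ids, drop = FALSE]
}

# Toy stand-ins for the two input files; with real data these objects would
# typically come from read.csv(). Column names here are assumed.
sample_sheet <- data.frame(
  sample = c("S1", "S2", "S3"),
  condition = c("control", "uRPL", "uRPL")
)
counts <- matrix(
  c(10, 0, 5, 20, 3, 8, 15, 1, 9),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("S2", "S1", "S3"))
)

aligned <- align_counts(sample_sheet, counts)
print(colnames(aligned))
```

Aligning sample order up front matters because downstream tools such as DESeq2 expect the columns of the count matrix to be in the same order as the rows of the sample information.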
The workflow was designed to compare datasets generated using different RNA sequencing methods by looking for concordance in differential expression analysis results, including differentially expressed genes and enriched functional pathways. It can be accessed and used by others to help improve the standardisation and reproducibility of RNA-seq analysis through consistent methods and documentation.
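One simple way to think about concordance between two datasets is as the overlap between their lists of differentially expressed genes. The base R sketch below is only an illustration of that idea. The helper `de_concordance`, the gene identifiers, and the use of a Jaccard index as a summary number are assumptions added here, and the workflow itself may summarise overlap differently (for example, with Venn diagrams).

```r
# Hypothetical helper: summarises the overlap between two sets of
# differentially expressed gene IDs.
de_concordance <- function(genes_a, genes_b) {
  genes_a <- unique(genes_a)
  genes_b <- unique(genes_b)
  shared <- intersect(genes_a, genes_b)
  all_genes <- union(genes_a, genes_b)
  list(
    shared = shared,                      # genes called in both datasets
    only_a = setdiff(genes_a, genes_b),   # genes unique to dataset A
    only_b = setdiff(genes_b, genes_a),   # genes unique to dataset B
    # Jaccard index: shared genes divided by all genes called in either set
    jaccard = if (length(all_genes) == 0) NA_real_ else length(shared) / length(all_genes)
  )
}

# Made-up gene lists for illustration only
dataset_1 <- c("GENE1", "GENE2", "GENE3", "GENE4")
dataset_2 <- c("GENE3", "GENE4", "GENE5")

res <- de_concordance(dataset_1, dataset_2)
print(res$shared)
print(res$jaccard)
```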
The workflow is split into the following sections, with the main packages used listed in parentheses (tool versions are available in the attached R script):
Section 1: Initialising environment and loading required packages and files
Section 2: Principal Component Analysis (PCAtools)
- Section 2.1: Generating PCA objects to be used in Sections 2.2-2.4 (PCAtools)
- Section 2.2: Principal Component Retention (PCAtools)
- Section 2.3: Confounding factor identification using Eigencor plots and Pearson's Correlation coefficients (PCAtools)
- Section 2.4: Generate PCA plots with arrows representing confounding numeric variables (ggplot2)
Section 3: Differential Expression Analysis (DESeq2)
- Assess concordance in differential expression between datasets (ggVenn)
Section 4: Functional Annotation of KEGG pathways (clusterProfiler, pathview)
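To make the idea behind the confounding factor step in Section 2 more concrete, the sketch below uses base R (`prcomp` and `cor`) to correlate principal component scores with numeric sample variables. It is a stand-in under stated assumptions, not the workflow's own PCAtools code. The simulated expression matrix, the `age` and `rin` variables, and the choice of three components are all invented for illustration.

```r
set.seed(1)

# Simulated expression matrix: 50 genes (rows) x 6 samples (columns)
expr <- matrix(
  rnorm(300),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("S", 1:6))
)

# Hypothetical numeric sample metadata (values are made up)
metadata <- data.frame(
  age = c(28, 31, 35, 29, 40, 33),
  rin = c(8.1, 7.5, 9.0, 6.8, 7.9, 8.4),
  row.names = colnames(expr)
)

# prcomp() expects samples in rows, so the matrix is transposed
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Pearson correlation between the first three PCs and each numeric variable;
# a strong correlation would flag that variable as a possible confounder
pc_cor <- cor(pca$x[, 1:3], metadata, method = "pearson")

print(round(var_explained[1:3], 3))
print(round(pc_cor, 2))
```

With random data like this, no strong correlations are expected. In a real dataset, a variable that correlates strongly with an early component is a candidate for inclusion in the differential expression design.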
Version History
Version 1 (earliest) Created 2nd Oct 2025 at 00:48 by Isabella Brown
Initial commit (master, a562786)

Created: 2nd Oct 2025 at 00:48
Last updated: 3rd Oct 2025 at 05:29
