Research Object Crate for Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings

Original URL: https://workflowhub.eu/workflows/1816/ro_crate?version=1

# 📄 Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings

This repository contains the code used for the experiments in the paper: **_Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings_** by *Diane Duroux, Paul P. Meyer, Giovanni Visonà, and Niko Beerenwinkel*.

## ⚙️ Install the dependencies

Clone the repository, unzip OriginalData.zip, and install the necessary dependencies listed in the requirements.txt file:

```bash
pip install -r requirements.txt
```

## 💻 AMR Classifier Training with ResMLP and inference

The following command trains a ResMLP model for AMR classification using the preprocessed DRIAMS data.

### 📦 Output

In `output//_results/`, the script generates:

- `test_set_seed0.csv` ➤ Contains predictions: `species`, `sample_id`, `drug`, `response`, and `Prediction`.

### 🛠 Required Arguments

| Argument | Description |
|--------------------------|--------------------------------------------------------------------------------------------|
| `--driams_long_table` | Path to the metadata file for the current dataset. |
| `--spectra_matrix` | Path to the input mass spectra (either raw or MAE-encoded). |
| `--sample_embedding_dim` | Dimension of the spectra input (6000 for raw, or same as for MAE). |
| `--drugs_df` | Path to the antimicrobial compound encoding file. |
| `--fingerprint_class` | Type of encoding: `'morgan_1024'`, `'molformer_github'`, or `'selfies_flattened_one_hot'`. |
| `--fingerprint_size` | Size of the encoding: 1024 (Morgan), 768 (Molformer), or 24160 (SELFIES). |
| `--split_type` | Set to `specific` if splits are pre-defined, else random. |
| `--split_ids` | Path to the `data_splits.csv` file. |
| `--experiment_group` | Name of the output folder. |
| `--experiment_name` | Name of the output subfolder. |
| `--seed` | Random seed for reproducibility. |
| `--n_epochs` | Number of epochs for classifier training. |
| `--learning_rate` | Learning rate for the optimizer. |
| `--patience` | Number of epochs to wait before early stopping. |
| `--batch_size` | Batch size for classifier training. |

### 🚀 Example: ResMLP Training on DRIAMS B2018 with Raw Spectra + Morgan Fingerprints

```bash
ulimit -Sn 10000  # Optional: increase file descriptor limit if needed

python3 code/ResAMR_classifier.py \
    --driams_long_table ProcessedData/B2018/combined_long_table.csv \
    --spectra_matrix ProcessedData/B2018/rawSpectra_data.npy \
    --sample_embedding_dim 6000 \
    --drugs_df OriginalData/drug_fingerprints_Mol_selfies.csv \
    --fingerprint_class morgan_1024 \
    --fingerprint_size 1024 \
    --split_type specific \
    --split_ids ProcessedData/B2018/data_splits.csv \
    --experiment_group rawMS_MorganFing \
    --experiment_name ResMLP \
    --seed 0 \
    --n_epochs 2 \
    --learning_rate 0.0003 \
    --patience 10 \
    --batch_size 128
```

---

## 💰 Funding

This research was primarily supported by the ETH AI Center.
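
## 📊 Summarizing the predictions (sketch)

The Output section above lists the columns of `test_set_seed0.csv`. As a rough illustration of how that file might be inspected after a run, the sketch below computes a per-drug accuracy using only the Python standard library. It is not part of the repository. The helper names `load_predictions` and `per_drug_accuracy` are hypothetical, and the code assumes that `response` holds binary labels (0 or 1) and that `Prediction` holds a score in [0, 1] that can be thresholded at 0.5; neither assumption is confirmed by this README, so check the file before relying on it.

```python
import csv
from collections import defaultdict


def load_predictions(path):
    """Read a predictions CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def per_drug_accuracy(rows, threshold=0.5):
    """Return {drug: accuracy} for rows with `drug`, `response`, `Prediction`.

    Assumption (not confirmed by the README): `response` is a 0/1 label and
    `Prediction` is a score in [0, 1], binarized here at `threshold`.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        label = int(float(row["response"]))
        predicted = int(float(row["Prediction"]) >= threshold)
        total[row["drug"]] += 1
        correct[row["drug"]] += int(predicted == label)
    return {drug: correct[drug] / total[drug] for drug in total}
```

A typical call would be `per_drug_accuracy(load_predictions(path))`, where `path` points at the `test_set_seed0.csv` written by your own run. Accuracy is used here only because it is simple; for imbalanced resistance labels a ranking metric may be more informative.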

License
GPL-3.0
