TY - JOUR
T1 - Application of MALDI-TOF MS and machine learning for the detection of SARS-CoV-2 and non-SARS-CoV-2 respiratory infections
AU - Yegorov, Sergey
AU - Kadyrova, Irina
AU - Korshukov, Ilya
AU - Sultanbekova, Aidana
AU - Kolesnikova, Yevgeniya
AU - Barkhanskaya, Valentina
AU - Bashirova, Tatiana
AU - Zhunusov, Yerzhan
AU - Li, Yevgeniya
AU - Parakhina, Viktoriya
AU - Kolesnichenko, Svetlana
AU - Baiken, Yeldar
AU - Matkarimov, Bakhyt
AU - Vazenmiller, Dmitriy
AU - Miller, Matthew S.
AU - Hortelano, Gonzalo H.
AU - Turmukhambetova, Anar
AU - Chesca, Antonella E.
AU - Babenko, Dmitriy
N1 - Publisher Copyright:
© 2024 American Society for Microbiology. All rights reserved.
PY - 2024/5
Y1 - 2024/5
N2 - Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) could aid the diagnosis of acute respiratory infections (ARIs) owing to its affordability and high-throughput capacity. MALDI-TOF MS has been proposed for use on commonly available respiratory samples, without specialized sample preparation, making this technology especially attractive for implementation in low-resource regions. Here, we assessed the utility of MALDI-TOF MS in differentiating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vs non-COVID acute respiratory infections (NCARIs) in a clinical lab setting in Kazakhstan. Nasopharyngeal swabs were collected from inpatients and outpatients with respiratory symptoms and from asymptomatic controls (ACs) in 2020–2022. PCR was used to differentiate SARS-CoV-2+ and NCARI cases. MALDI-TOF MS spectra were obtained for a total of 252 samples (115 SARS-CoV-2+, 98 NCARIs, and 39 ACs) without specialized sample preparation. In our first sub-analysis, we followed a published protocol for peak preprocessing and machine learning (ML), trained on publicly available spectra from South American SARS-CoV-2+ and NCARI samples. In our second sub-analysis, we trained ML models on a peak intensity matrix representative of both South American (SA) and Kazakhstan (Kaz) samples. Applying the established MALDI-TOF MS pipeline “as is” resulted in a high detection rate for SARS-CoV-2+ samples (91.0%), but low accuracy for NCARIs (48.0%) and ACs (67.0%) by the top-performing random forest model. After re-training of the ML algorithms on the SA-Kaz peak intensity matrix, the accuracy of detection by the top-performing support vector machine with radial basis function kernel model was at 88.0%, 95.0%, and 78% for the Kazakhstan SARS-CoV-2+, NCARI, and AC subjects, respectively, with a SARS-CoV-2 vs rest receiver operating characteristic area under the curve of 0.983 [0.958, 0.987]; a high differentiation accuracy was maintained for the South American SARS-CoV-2 and NCARIs. MALDI-TOF MS/ML is a feasible approach for the differentiation of ARI without specialized sample preparation. The implementation of MALDI-TOF MS/ML in a real clinical lab setting will necessitate continuous optimization to keep up with the rapidly evolving landscape of ARI.
AB - Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) could aid the diagnosis of acute respiratory infections (ARIs) owing to its affordability and high-throughput capacity. MALDI-TOF MS has been proposed for use on commonly available respiratory samples, without specialized sample preparation, making this technology especially attractive for implementation in low-resource regions. Here, we assessed the utility of MALDI-TOF MS in differentiating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vs non-COVID acute respiratory infections (NCARIs) in a clinical lab setting in Kazakhstan. Nasopharyngeal swabs were collected from inpatients and outpatients with respiratory symptoms and from asymptomatic controls (ACs) in 2020–2022. PCR was used to differentiate SARS-CoV-2+ and NCARI cases. MALDI-TOF MS spectra were obtained for a total of 252 samples (115 SARS-CoV-2+, 98 NCARIs, and 39 ACs) without specialized sample preparation. In our first sub-analysis, we followed a published protocol for peak preprocessing and machine learning (ML), trained on publicly available spectra from South American SARS-CoV-2+ and NCARI samples. In our second sub-analysis, we trained ML models on a peak intensity matrix representative of both South American (SA) and Kazakhstan (Kaz) samples. Applying the established MALDI-TOF MS pipeline “as is” resulted in a high detection rate for SARS-CoV-2+ samples (91.0%), but low accuracy for NCARIs (48.0%) and ACs (67.0%) by the top-performing random forest model. After re-training of the ML algorithms on the SA-Kaz peak intensity matrix, the accuracy of detection by the top-performing support vector machine with radial basis function kernel model was at 88.0%, 95.0%, and 78% for the Kazakhstan SARS-CoV-2+, NCARI, and AC subjects, respectively, with a SARS-CoV-2 vs rest receiver operating characteristic area under the curve of 0.983 [0.958, 0.987]; a high differentiation accuracy was maintained for the South American SARS-CoV-2 and NCARIs. MALDI-TOF MS/ML is a feasible approach for the differentiation of ARI without specialized sample preparation. The implementation of MALDI-TOF MS/ML in a real clinical lab setting will necessitate continuous optimization to keep up with the rapidly evolving landscape of ARI.
KW - acute respiratory infection
KW - COVID-19
KW - machine learning
KW - MALDI-TOF MS
KW - SARS-CoV-2
UR - http://www.scopus.com/inward/record.url?scp=85192114525&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192114525&partnerID=8YFLogxK
U2 - 10.1128/spectrum.04068-23
DO - 10.1128/spectrum.04068-23
M3 - Article
C2 - 38497716
AN - SCOPUS:85192114525
SN - 2165-0497
VL - 12
JO - Microbiology spectrum
JF - Microbiology spectrum
IS - 5
ER -