CERES

Browsing by Author "Pruthi, Siddharth Sanjay"

    Item (Open Access)
    Leveraging large-scale Mycobacterium tuberculosis whole genome sequence data to characterise drug-resistant mutations using machine learning and statistical approaches
    (Springer, 2024-12-01) Pruthi, Siddharth Sanjay; Billows, Nina; Thorpe, Joseph; Campino, Susana; Phelan, Jody E.; Mohareb, Fady; Clark, Taane G.
    Tuberculosis disease (TB), caused by Mycobacterium tuberculosis (Mtb), is a major global public health problem, resulting in >1 million deaths each year. Drug resistance (DR), including the multi-drug form (MDR-TB), is challenging control of the disease. Whilst many DR mutations in the Mtb genome are known, analysis of large datasets generated using whole genome sequencing (WGS) platforms can reveal new variants through the assessment of genotype-phenotype associations. Here, we apply tree-based ensemble methods to a dataset comprised of 35,777 Mtb WGS and phenotypic drug-susceptibility test data across first- and second-line drugs. We compare model performance across models trained using mutations in drug-specific regions and genome-wide variants, and find high predictive ability for both first-line (area under ROC curve (AUC); range 88.3–96.5) and second-line (AUC range 84.1–95.4) drugs. To aggregate information from low-frequency variants, we pool mutations by functional impact and observe large improvements in predictive accuracy (e.g., sensitivity: pyrazinamide +25%; ethionamide +10%). We further characterise loss-of-function mutations observed in resistant phenotypes, uncovering putative markers of resistance (e.g., ndh 293dupG, Rv3861 78delC). Finally, we profile the distribution of known DR-associated single nucleotide polymorphisms across discretised minimum inhibitory concentration (MIC) data generated from phenotypic testing (n = 12,066), and identify mutations associated with highly resistant phenotypes (e.g., inhA −779G>T and 62T>C). Overall, our work demonstrates that applying machine learning to large-scale WGS data is useful for providing insights into predicting Mtb binary drug resistance and MIC phenotypes, thereby potentially assisting diagnosis and treatment decision-making for infection control.
