The development of drug resistance in tuberculosis is a significant global problem that often leads to therapeutic failure. Although genetic sequencing can identify known resistance-causing mutations, it does not on its own explain the underlying mechanisms or translate them into ways of avoiding therapeutic failure. We explored the use of structural information to bridge this knowledge gap and to guide the identification and characterization of novel mutations within the gene rpoB.
Missense mutations identified in our genome-wide association study were divided into two phenotypic groups: resistant (n=203) and susceptible (n=28). The mCSM suite of computational tools was then used to quantitatively analyze the molecular changes introduced by these mutations, covering changes in protein stability, dynamics, interaction binding affinities and physicochemical properties. These measurements were integrated into a predictive binary classifier that distinguishes rifampicin-resistant from rifampicin-susceptible mutations.
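The step of integrating several structural measurements into one classifier input can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `MutationFeatures` record and its field names are hypothetical stand-ins for the stability, dynamics, binding-affinity and physicochemical descriptors listed above, and the z-score standardization is an assumed preprocessing choice (commonly used before distance-based methods such as KNN so that no single feature dominates).

```python
from dataclasses import dataclass, astuple
from statistics import mean, pstdev


@dataclass
class MutationFeatures:
    # Hypothetical per-mutation descriptors; names are illustrative only.
    stability_ddg: float          # predicted change in protein stability
    dynamics_change: float        # predicted change in flexibility/dynamics
    nucleic_acid_ddg: float       # predicted change in nucleic-acid binding affinity
    rifampicin_ddg: float         # predicted change in rifampicin binding affinity
    hydrophobicity_change: float  # change in a physicochemical property


def to_matrix(records):
    """Turn a list of MutationFeatures into a list of numeric feature rows."""
    return [list(astuple(r)) for r in records]


def standardize(rows):
    """Z-score each column (population SD); zero-variance columns map to 0."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
```

In use, each mutation's predicted values would be wrapped in a `MutationFeatures` record, converted with `to_matrix`, and passed through `standardize` before training; in practice the means and standard deviations should be taken from the training set only and reused for the blind-test sets.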
After training with the k-nearest neighbours (KNN) algorithm and validation on independent blind-test sets drawn from the online resources TBDreamDB, tbvar, GMTV and MUBII-TB-DB, our preliminary binary classifier achieved a precision of 90.4% and an accuracy of 80.6%. Analysis of the model showed that changes in interactions within the RNA polymerase complex, including interactions with the nucleic acids and with rifampicin, were a significant driver of resistance.
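To make the classifier and the two reported metrics concrete, here is a minimal from-scratch sketch of KNN with Euclidean distance and majority voting, followed by precision and accuracy. It is illustrative only: the value of `k`, the distance metric, and the choice of "R" (resistant) as the positive class are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def precision_and_accuracy(y_true, y_pred, positive="R"):
    """Precision = TP / (TP + FP); accuracy = correct predictions / total."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = correct / len(pairs)
    return precision, accuracy
```

For example, with toy points `[[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]` labelled `["S", "S", "S", "R", "R", "R"]`, `knn_predict(..., [5.5, 5.5])` returns `"R"`. Because the dataset here is imbalanced (203 resistant versus 28 susceptible), precision on the resistant class and overall accuracy can diverge, which is one reason to report both.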
This work highlights the power of structural information for interpreting genomic variants. Our structure-based tool can analyze missense mutations located throughout the RpoB structure, rather than only those in the rifampicin resistance-determining region (RRDR) targeted by the current gold standard, GeneXpert MTB/RIF. Following clinical validation, this tool can serve as a backbone for treatment strategies in tuberculosis patients presenting with novel mutations, and for antimicrobial stewardship efforts.