The embodiments relate to a method of determining an antibiotic resistance profile for a bacterial microorganism and to a method of determining the resistance of a bacterial microorganism to an antibiotic drug.
Antibiotic resistance is a form of drug resistance whereby a sub-population of a microorganism (e.g., a strain of a bacterial species) may survive and multiply despite exposure to an antibiotic drug. It is a serious health concern for the individual patient as well as a major public health issue. Timely treatment of a bacterial infection requires the analysis of clinical isolates obtained from patients with regard to antibiotic resistance, in order to select an efficacious therapy.
Antibacterial drug resistance (ADR) represents a major health burden. According to the World Health Organization's antimicrobial resistance global report on surveillance, ADR leads to 25,000 deaths per year in Europe and 23,000 deaths per year in the US. In Europe, 2.5 million extra hospital days lead to a societal cost of 1.5 billion euros. In the US, 2 million illnesses lead to a direct cost of 20 billion dollars. The overall cost is estimated to be substantially higher, reducing the gross domestic product (GDP) by up to 1.6%.
Currently, resistance/susceptibility testing is carried out by obtaining a culture of the suspected bacteria, subjecting it to different antibiotic drug protocols and determining in which cases the bacteria do not grow in the presence of a certain substance. In this case, the bacteria are not resistant (i.e., they are susceptible to the antibiotic drug) and the therapy may be administered to the respective patients. U.S. Pat. No. 7,335,485 describes a method of determining the antibiotic susceptibility of a microorganism, wherein the organism is cultured in the presence of an antibiotic drug to be tested. More recently, sensitive technologies such as mass spectrometry have been applied to determine resistance, but this still requires culturing of the microorganism to be tested in the presence of the antibiotic drug to be tested. Further, in all these techniques, each microorganism to be tested has to be tested against individual antibiotic drugs or drug combinations, requiring extensive, time-consuming, and cumbersome tests.
It is known that drug resistance may be associated with genetic polymorphisms. This holds for viruses, where resistance testing is established clinical practice (e.g., HIV genotyping). More recently, it has been shown that resistance also has genetic causes in bacteria and even in higher organisms, such as humans, where the resistance of tumors to certain cytostatic agents may be linked to genomic mutations.
Wozniak et al. (BMC Genomics 2012, 13(Suppl 7):S23) disclose genetic determinants of drug resistance in Staphylococcus aureus based on genotype and phenotype data. Stoesser et al. disclose prediction of antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genomic sequence data (J Antimicrob Chemother 2013; 68: 2234-2244).
Escherichia coli (E. coli) is a Gram-negative, facultative anaerobic, rod-shaped bacterium potentially found, e.g., in the lower gastro-intestinal tract of mammals. While many species of the Escherichia genus are harmless, some strains of some species are pathogenic in humans causing urinary tract infections, gastrointestinal disease, as well as a wide range of other pathologic conditions. E. coli is responsible for the majority of these pathologic conditions.
There remains a need for quick and efficient antibiotic resistance testing.
The scope of the present invention is defined solely by the appended claims and is not affected to any degree by the statements within this summary. The present embodiments may obviate one or more of the drawbacks or limitations in the related art.
Extensive studies performed on the genomes of E. coli bacteria resistant to antibiotic drugs revealed remarkable differences from wild type E. coli. Based on this information, it is now possible to provide a detailed analysis of the resistance pattern of E. coli strains based on individual genes or mutations on a nucleotide level. This analysis involves the identification of a resistance against individual antibiotic drugs as well as clusters of them. This allows not only for the determination of a resistance to a single antibiotic drug, but also to groups of antibiotics such as lactam or quinolone antibiotics, or even to all relevant antibiotic drugs.
Therefore, the present embodiments will considerably facilitate the selection of an appropriate antibiotic drug for the treatment of an E. coli infection in a patient and thus will largely improve the quality of diagnosis and treatment.
According to a first aspect, the present embodiments are directed to a method of determining an antibiotic resistance profile for a bacterial microorganism belonging to the species E. coli including the acts of: a) providing a sample containing or suspected of containing the bacterial microorganism; b) determining the presence of a mutation in at least one gene of the bacterial microorganism selected from the group of genes listed in Table 4; wherein the presence of a mutation is indicative of a resistance to an antibiotic drug.
Table 4 is depicted in the following:
The presence or absence of a mutation in these genes is tested in relation to the reference strain E. coli K12 substrain DH10B (see also more detailed information in the following and in Example 1). In an embodiment, act b) includes determining the presence of a mutation in at least two or more genes selected from the group of Table 4, and wherein the presence of a mutation in at least two genes is indicative of a resistance to an antibiotic drug.
Instead of testing only single genes or mutants, a combination of several variant positions may improve the prediction accuracy and further reduce false positive findings that are influenced by other factors. Therefore, the presence of a mutation in 2, 3, 4, 5, 6, 7, 8 or 9 (or more) genes selected from Table 4 may be determined.
In a further embodiment, the present method includes in act b) determining the presence of a mutation in at least one gene selected from the group of genes listed in Table 5, and wherein the presence of a mutation in the at least one gene is indicative of a resistance to an antibiotic drug.
The genes according to Table 5 have never been described before in the context of antibiotic resistance of E. coli bacteria. They may be used for the determination of an antibiotic drug resistance of E. coli alone or in combination with other genes disclosed herein.
For E. coli, 86 ultra highly significant pairs of genetic positions and drug resistance (Table 2) were identified. The 86 combinations correspond to 35 genetic positions, since the sites may be significant for more than one single drug. Most importantly, the respective sites are located in 9 genes: hofB, allA, mukB, ymdC, potB, ycgK, ycgB, valS, yjjJ. These genes thus appear to be critical for antibiotic resistance/susceptibility. The identified mutations all lead to amino acid alterations, either to an exchange of the amino acid at the respective position or to the creation of a new stop codon. For more detailed information, reference is made to Example 1, below.
One embodiment relates to a method of determining the resistance or susceptibility of a bacterial microorganism belonging to the species E. coli to an antibiotic drug including: providing a sample containing or suspected of containing the bacterial microorganism belonging to the species E. coli; determining from the sample a nucleic acid sequence information of at least one gene selected from the group of hofB, allA, mukB, ymdC, potB, ycgK, ycgB, valS, and yjjJ; and based on the determination of the genetic information determining the resistance or susceptibility to the antibiotic drug.
In a further embodiment, the presence of a mutation in at least one gene selected from the group of hofB, allA, mukB, ymdC, potB, ycgK, ycgB, valS, and yjjJ is determined. Thus, the presence of a mutation in at least one or 2, 3, 4, 5, 6, 7, 8 or 9 of these genes may be analyzed.
In a further embodiment, the presence of a mutation in at least one gene selected from the group of the following Table 6 is determined. The exact amino acid exchange indicated in Table 6 may be determined.
Surprisingly, it was discovered that an overlap of mutations in functionally similar proteins of E. coli and K. pneumoniae exists. Interestingly, when considering the proteins that were associated significantly with at least one drug, an overlap of 1,746 proteins was found (same official name and more than 80 percent positives in BLAST in pairwise comparison) that are affected in E. coli as well as in K. pneumoniae. Extending the analysis to the exact AA exchanges in these proteins, an overlap of 55 mutated positions that are equal in both organisms was detected. Therefore, the above genes might form a valuable basis for the determination of the antibiotic resistance pattern in both E. coli and K. pneumoniae microorganisms.
According to an optional aspect, the nucleic acid sequence information may be the determination of the presence of a single nucleotide at a single position in at least one gene.
Thus the embodiments include a method wherein the presence of a single nucleotide polymorphism or mutation at a single nucleotide position is detected.
For example, this may be done in at least one gene selected from the group of hofB, allA, mukB, ymdC, potB, ycgK, ycgB, valS, and yjjJ. Therefore, according to an optional aspect, the mutation is a mutation selected from the group of mutations listed in Table 2 (see below in Example 1). The present embodiments thus also include a method of determining an antibiotic resistance profile for a bacterial microorganism belonging to the species E. coli including the acts of: a) providing a sample containing or suspected of containing the bacterial microorganism; b) determining the presence of a mutation in at least one position as identified in Table 2; wherein the presence of a mutation is indicative of a resistance to an antibiotic drug.
The determination may be made based on 1, 2, 3, 4, 5, 6, 7, and up to the 35 genetic positions identified in Table 2.
The method may include determining the resistance of E. coli to one or more antibiotic drugs. These drugs include, but are not restricted to, antibiotic drugs selected from the group consisting of ampicillin sulbactam (A.S.), ampicillin (AM), amoxicillin clavulanate (AUG), aztreonam (AZT), ceftriaxone (CAX), ceftazidime (CAZ), cefotaxime (CFT), cefepime (CPM), ciprofloxacin (CP), ertapenem (ETP), levofloxacin (LVX), cefuroxime (CRM), piperacillin tazobactam (P/T), trimethoprim sulfamethoxazole (T/S), tobramycin (TO), gentamicin (GM), cefazolin (CFZ), cephalothin (CF), imipenem (IMP), meropenem (MER) and tetracycline (TE). See also Table 1.
It was discovered that mutations in certain genes are indicative not only for a resistance to one single antibiotic drug, but to groups containing several drugs.
For example, it turned out that in case of the group of lactam antibiotics, the presence of a mutation in the following genes: chbG, eutQ, flgL, gudD, gyrA, ldrA, menE, murB, murP, nepI, parC, pphB, ptrB, rhaD, ydiU, yegE, yegI, yfbL, yfiK, ygcR, ygiF, ygjM, yohG, and/or yrfB may be determined and is indicative for the presence of a resistance against antibiotics of this group.
The group of lactam antibiotics may include A.S., AM, AUG, AZT, CFZ, CPE, CFT, CAZ, CAX, CRM, CF, CP, IMP, MER, ETP and/or P/T. The p-value threshold for these identified genes is ≤10^−45.
It is within the scope of the present embodiments that the above determination is done based on a single gene or on 2, 3, 4, etc. genes of this group; however, a mutation may be determined in all of these genes in relation to the reference strain K12 substrain DH10B (see also below for further information).
In a further embodiment, the antibiotic drug is selected from quinolone or aminoglycoside antibiotics and the presence of a mutation in the following genes is determined: agaD, chbG, eutE, eutQ, gcvP, gspO, gyrA, livG, menE, nepI, parC, speC, tiaF, torZ, uidB, yegE, yegI, yejA, ygcU, ygfZ, ygiF, ygjM, yjjU, yjjW, ymdC, ypdB, yqjA, and/or ytfG.
The quinolone and aminoglycoside antibiotics may be selected from CP, LVX, GM and TO.
Surprisingly, the relevant genes completely overlapped regarding a resistance to quinolone and aminoglycoside antibiotics; the p-value threshold for these genes is ≤10^−53. Here too, it is within the scope of the present embodiments that the determination is done based on a single gene or on only 2, 3, 4, or more genes of this group; however, a mutation may be determined in all of these genes in relation to the reference strain K12 substrain DH10B.
In a further embodiment, the antibiotic drug is tetracycline and the presence of a mutation in at least one or more of the following genes is determined: astE, chbG, eutQ, flgL, gudD, gyrA, hemF, hypF, kdpE, ldrA, menE, murB, murP, nepI, ompC, parC, pphB, ptrB, and/or rhaD. The p-value threshold is ≤10^−47.
In a still further embodiment, the antibiotic drug is trimethoprim sulfamethoxazole and the presence of a mutation in at least one or more of the following genes is determined: astE, chbG, eutQ, flgL, gudD, gyrA, ldrA, menE, murB, nepI, parC, ycjX, ydiU, yegE, yfiK, ygcR, ygiF, and/or yrfB. The p-value threshold is ≤10^−48.
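By way of illustration only, the gene lists given above for the four drug groups may be held in a simple lookup structure. The following Python sketch copies the gene names from the preceding embodiments; the dictionary keys, the function name `classes_flagged`, and the idea of passing a list of mutated genes are assumptions made for this example and are not part of the described method.

```python
# Gene lists copied from the embodiments above.
# The dictionary keys are shorthand labels chosen for this sketch.
GENES_BY_CLASS = {
    "lactam": set(
        "chbG eutQ flgL gudD gyrA ldrA menE murB murP nepI parC pphB "
        "ptrB rhaD ydiU yegE yegI yfbL yfiK ygcR ygiF ygjM yohG yrfB".split()
    ),
    "quinolone/aminoglycoside": set(
        "agaD chbG eutE eutQ gcvP gspO gyrA livG menE nepI parC speC "
        "tiaF torZ uidB yegE yegI yejA ygcU ygfZ ygiF ygjM yjjU yjjW "
        "ymdC ypdB yqjA ytfG".split()
    ),
    "tetracycline": set(
        "astE chbG eutQ flgL gudD gyrA hemF hypF kdpE ldrA menE murB "
        "murP nepI ompC parC pphB ptrB rhaD".split()
    ),
    "trimethoprim sulfamethoxazole": set(
        "astE chbG eutQ flgL gudD gyrA ldrA menE murB nepI parC ycjX "
        "ydiU yegE yfiK ygcR ygiF yrfB".split()
    ),
}


def classes_flagged(mutated_genes):
    """Return the drug groups for which at least one listed gene is mutated."""
    mutated = set(mutated_genes)
    return sorted(name for name, genes in GENES_BY_CLASS.items() if genes & mutated)


# Genes that appear in every one of the four lists.
SHARED_GENES = set.intersection(*GENES_BY_CLASS.values())
```

Evaluated on the lists as written, `SHARED_GENES` contains chbG, eutQ, gyrA, menE, nepI, and parC, i.e., the genes named for all four drug groups.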
In an embodiment, the method includes determining a mutation, wherein the mutation is selected from the group of mutations listed in Table 7. Table 7 is depicted in the following:
Apart from the above genes indicative of a resistance against antibiotics, single nucleotide polymorphisms (SNPs) may have a high significance for the presence of a resistance against defined antibiotic drugs. The analysis of these polymorphisms on a nucleotide level may further improve and accelerate the determination of a drug resistance to antibiotics in E. coli.
For example, a resistance of E. coli against the antibiotic drug AM may be determined by the presence of a single nucleotide polymorphism in at least one, for example, 1, 2, 3, 4, 5, or 6 of the following nucleotide positions: 2428183, 4525576, 1684413, 4636902, 1181357, 206427.
In an embodiment, the antibiotic drug is A/S and an SNP in at least one, for example, 1, 2 or 3 of the following nucleotide positions is detected: 2428183, 4054212, 1974644.
In a further embodiment, the antibiotic drug is AUG and a mutation in the following nucleotide position is detected: 2463877.
For a resistance to the antibiotic drug AZT, a mutation in at least one of the following nucleotide positions is detected: 2428183, 1615473.
In a still further embodiment, the antibiotic drug is CAX and a mutation in at least one of the following nucleotide positions is detected: 2428183, 1615473.
A resistance to the antibiotic drugs CFT, CP, CPE, CRM, GM, LVX, TO, T/S or CAZ may be detected by a mutation in the nucleotide position 2428183.
When the antibiotic drug is ETP, a mutation in at least one, for example 1, 2, 3, or 4 of the following nucleotide positions is detected: 2052365, 2233638, 4553471, 2565236.
In a further embodiment, the antibiotic drug is P/T and a mutation in at least one, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 of the following nucleotide positions is detected: 2233638, 2216164, 2725302, 1567286, 2755319, 319290, 3240296, 1517573, 2178525, 2924554, 1516808, 37032, 1368519, 4575887.
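By way of illustration only, the nucleotide positions listed above may be used as a lookup from observed variant positions to the antibiotic drugs concerned. The following Python sketch reproduces the positions stated above for a subset of the drugs; the function name `drugs_flagged` and the representation of a sample as a set of variant positions relative to the reference genome are assumptions made for this example.

```python
# Nucleotide positions copied from the text above for a subset of drugs.
SNP_POSITIONS = {
    "AM": {2428183, 4525576, 1684413, 4636902, 1181357, 206427},
    "A/S": {2428183, 4054212, 1974644},
    "AUG": {2463877},
    "AZT": {2428183, 1615473},
    "CAX": {2428183, 1615473},
    "ETP": {2052365, 2233638, 4553471, 2565236},
}


def drugs_flagged(sample_variant_positions):
    """Return {drug: sorted matching positions} for drugs with at least one hit."""
    observed = set(sample_variant_positions)
    hits = {}
    for drug, positions in SNP_POSITIONS.items():
        matched = positions & observed
        if matched:
            hits[drug] = sorted(matched)
    return hits
```

For a hypothetical sample with variants at positions 2428183 and 2233638, the function returns entries for AM, A/S, AZT, CAX, and ETP, while AUG is not reported.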
The resistance to the respective antibiotic drug may be tested according to the decision diagrams of
A decision diagram or “decision tree” is a tree-like graph for prediction tasks, e.g., classification. Given a data set including a number of samples with feature values (e.g., any measurements, here SNPs) and class labels (e.g., resistant/not resistant against a certain drug), a decision tree models the decision process of inferring the sample class label from its feature values.
To build the model, a given data set is used (as described above). Among all features (SNPs) and their possible values (DNA bases A, C, T, and G), the feature value is selected that achieves the optimal sample separation with respect to the given sample labels. That may be a SNP whose value for the not resistant samples differs from its value for the resistant samples. The selected feature value becomes the root of the tree (e.g., the first tree node, often drawn at the top) and the samples are split according to that feature, e.g., into samples having that feature value and samples with another value. The resulting subsets of samples form new nodes, and the feature selection and splitting process is repeated for each of them separately. This procedure stops if a specific criterion is fulfilled (e.g., no further improvement is obtained or the maximal tree size is reached).
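By way of illustration only, the selection of the feature value achieving the best sample separation may be sketched as follows. The text does not state which separation criterion is used; the Gini impurity employed here is an assumption, as are the function names and the representation of each sample as a mapping from SNP position to base.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = not resistant, 1 = resistant)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(samples, labels):
    """Return (weighted impurity, position, base) of the best equality test.

    samples: list of dicts mapping SNP position -> base ('A', 'C', 'G', 'T')
    labels:  list of class labels, one per sample
    Returns None if no test separates the samples into two non-empty groups.
    """
    best = None
    n = len(samples)
    positions = sorted({pos for s in samples for pos in s})
    for pos in positions:
        for base in "ACGT":
            left = [y for s, y in zip(samples, labels) if s.get(pos) == base]
            right = [y for s, y in zip(samples, labels) if s.get(pos) != base]
            if not left or not right:
                continue  # this test does not split the samples
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, pos, base)
    return best
```

Applying `best_split` recursively to the two resulting subsets, until a stopping criterion is met, yields a tree of the kind described above.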
The used graphical representation is defined as follows:
The tree root is drawn at the top. Each node contains the following information: (1) Its feature and its value(s) drawn below the node, e.g., SNP 2428183=G. (2) Class label: 0=not resistant, 1=resistant. (3) Class distribution: The proportion of samples contained in that node belonging to class 0 or 1. (4) Proportion of samples contained in that node (with respect to the number of samples used to build the tree). (5) Color: green=0, blue=1; the stronger the color, the higher the certainty for the chosen class label.
The model may be built on the so-called training set and its prediction power may be tested on the so-called test set (e.g., to assess the model performance on unseen data). Both data sets may be independent and have no intersection. However, if the available data set is not large enough to form sufficiently large training and test data sets, a procedure called k-fold cross validation (CV) is applied: the data set is divided into k subsets of equal size, and each of the k subsets is then used once as test data with the rest as training data. The final tree is built on the whole data set, so the CV is only used to estimate the performance of the final model.
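By way of illustration only, the k-fold cross validation described above may be sketched as follows. The function names, the `fit` and `predict` callables supplied by the caller, and the use of accuracy as the performance measure are assumptions made for this example; shuffling of the samples before splitting is omitted for brevity, and k is assumed not to exceed the number of samples.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k folds of near-equal size."""
    indices = list(range(n_samples))
    # Distribute any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(samples, labels, k, fit, predict):
    """Return the mean test accuracy over k folds.

    fit(train_samples, train_labels) -> model
    predict(model, sample) -> predicted label
    """
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Each sample is used exactly once as test data, and the resulting estimate describes the expected performance of the final model built on the whole data set.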
The classification of a new sample works as follows: (1) One starts at the tree root: the value of the root attribute in the sample is checked. If the value is equal to the root value then one goes left to the next node. Otherwise, one goes right. (2) The value of the current node attribute in the sample is checked and it is decided again whether to go left or right. And so on. (3) The process stops if one is in a leaf node (terminal node, node without outgoing edges). The sample gets the same label as that leaf node.
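By way of illustration only, the classification procedure described above may be sketched as follows. The nested-dictionary representation of the tree and the function name `classify` are assumptions made for this example. The example tree is hypothetical: the two positions are taken from the text, whereas the bases tested, the tree structure, and the leaf labels are invented solely to show the traversal.

```python
def classify(node, sample):
    """Walk from the root to a leaf and return the leaf's class label.

    Internal node: {"pos": int, "base": str, "left": node, "right": node}
    Leaf node:     {"label": 0 or 1}   (0 = not resistant, 1 = resistant)
    sample:        dict mapping SNP position -> base
    """
    while "label" not in node:
        if sample.get(node["pos"]) == node["base"]:
            node = node["left"]   # value equals the node value: go left
        else:
            node = node["right"]  # otherwise: go right
    return node["label"]


# Hypothetical tree for demonstration only; not taken from the decision diagrams.
example_tree = {
    "pos": 2428183, "base": "G",
    "left": {"label": 0},
    "right": {
        "pos": 1615473, "base": "C",
        "left": {"label": 0},
        "right": {"label": 1},
    },
}
```

With this hypothetical tree, a sample carrying G at position 2428183 reaches the left leaf directly, whereas any other sample is additionally tested at position 1615473.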
According to an optional aspect, a detected mutation is a mutation leading to an altered amino acid sequence in a polypeptide derived from a respective gene in which the detected mutation is located. According to this aspect, the detected mutation thus leads to a truncated version of the polypeptide (wherein a new stop codon is created by the mutation) or a mutated version of the polypeptide having an amino acid exchange at the respective position.
According to an optional aspect, determining the nucleic acid sequence information or the presence of a mutation includes determining a partial sequence or an entire sequence of the at least one gene.
According to an optional aspect, determining the nucleic acid sequence information or the presence of a mutation includes determining a partial or entire sequence of the genome of the bacterial microorganism, wherein the partial or entire sequence of the genome includes at least a partial sequence of the at least one gene.
According to an optional aspect, the sample is a patient sample (clinical isolate).
According to an optional aspect, determining the nucleic acid sequence information or the presence of a mutation includes using a next generation sequencing or high throughput sequencing method. According to a further aspect, a partial or entire genome sequence of the bacterial organism is determined by using a next generation sequencing or high throughput sequencing method.
According to an optional aspect, the method further includes determining the resistance to 2, 3, 4, 5, or 6 antibiotic drugs.
In a further aspect, the present embodiments are directed to a diagnostic method of determining an antibiotic resistant E. coli infection in a patient, including the acts of: a) obtaining or providing a sample containing or suspected of containing E. coli from the patient; and b) determining the presence of at least one mutation in at least one gene as described above, wherein the presence of the at least one mutation is indicative of an antibiotic resistant E. coli infection in the patient.
In a still further aspect, the present embodiments are directed to a method of treating a patient suffering from an antibiotic resistant E. coli infection, including the acts of: a) obtaining or providing a sample containing or suspected of containing E. coli from the patient; b) determining the presence of at least one mutation in at least one gene as described above, wherein the presence of the at least one mutation is indicative of a resistance to one or more antibiotic drugs; c) identifying the at least one or more antibiotic drugs; d) selecting one or more antibiotic drugs different from the ones identified in act c) and being suitable for the treatment of an E. coli infection; and e) treating the patient with the one or more antibiotic drugs.
According to an embodiment, the patient is a vertebrate, e.g., a mammal such as a human patient.
Regarding the dosage of the antibiotic drug, reference is made to the established principles of pharmacology in human and veterinary medicine. For example, Forth, Henschler, Rummel “Allgemeine und spezielle Pharmakologie und Toxikologie”, 9th edition, 2005 might be used as a guideline. Regarding the formulation of a ready-to-use medicament, reference is made to “Remington, The Science and Practice of Pharmacy”, 22nd edition, 2013.
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art.
The term “nucleic acid molecule” refers to a polynucleotide molecule having a defined sequence. It includes DNA molecules, RNA molecules, nucleotide analog molecules and combinations and derivatives thereof, such as DNA molecules or RNA molecules with incorporated nucleotide analogs or cDNA.
The term “nucleic acid sequence information” relates to information that may be derived from the sequence of a nucleic acid molecule, such as the sequence itself or a variation in the sequence as compared to a reference sequence.
The term “mutation” relates to a variation in the sequence as compared to a reference sequence. Such a reference sequence may be a sequence determined in a predominant wild type organism or a reference organism, e.g., a defined and known bacterial strain or substrain. A mutation is, for example, a deletion of one or multiple nucleotides, an insertion of one or multiple nucleotides, a substitution of one or multiple nucleotides, a duplication of one or a sequence of multiple nucleotides, a translocation of one or a sequence of multiple nucleotides, and, in particular, a single nucleotide polymorphism (SNP).
As used herein, a “sample” is a sample including a nucleic acid molecule from a bacterial microorganism. Examples of samples are: cells, tissue, body fluids, biopsy specimens, blood, urine, saliva, sputum, plasma, serum, cell culture supernatant, swab sample, and others.
New and highly efficient methods of sequencing nucleic acids referred to as next generation sequencing have opened the possibility of large scale genomic analysis. The term “next generation sequencing” or “high throughput sequencing” refers to high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. Examples include Massively Parallel Signature Sequencing (MPSS), Polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, Ion semiconductor sequencing, DNA nanoball sequencing, Helioscope™ single molecule sequencing, Single Molecule SMRT™ sequencing, Single Molecule real time (RNAP) sequencing, and Nanopore DNA sequencing.
It is to be understood that this invention is not limited to the particular component parts of the process acts of the methods described herein as such methods may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting. It is noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include singular and/or plural referents unless the context clearly dictates otherwise. For example, the term “a” as used herein may be understood as one single entity or in the meaning of “one or more” entities. It is also to be understood that plural forms include singular and/or plural referents unless the context clearly dictates otherwise. It is moreover to be understood that, in case parameter ranges are given which are delimited by numeric values, the ranges are deemed to include these limitation values.
Here, a unique collection of genes was identified that allows the determination of the resistance of a bacterial microorganism to commonly used antibiotic drugs.
A unique cohort of bacterial samples obtained from 150 clinical isolates was sequenced in order to understand the genetic resistance mechanisms by using High Throughput sequencing. In parallel, classical resistance tests were applied using 21 drugs or combinations of drugs (Table 1).
E. coli strains to be tested were seeded on agar plates and incubated under growth conditions for 24 hours. Then, colonies were picked and incubated in growth medium in the presence of a given antibiotic drug in dilution series under growth conditions for 16-20 hours. Bacterial growth was determined by observing turbidity.
Next, mutations were sought that are highly correlated with the results of the phenotypic resistance test.
For sequencing, samples were prepared using a Nextera library preparation, followed by multiplexed paired-end sequencing using the Illumina HiSeq 2500 system. Data were mapped with BWA (Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub. [PMID: 20080505]) and SNPs were called using samtools (Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]).
The reference sequence was obtained from Escherichia coli str. K-12 substr.
LOCUS CP000948 4686137 bp DNA circular BCT 5 Jun. 2008
DEFINITION Escherichia coli str. K12 substr. DH10B, complete genome.
SOURCE Escherichia coli str. K-12 substr. DH10B
ORGANISM Escherichia coli str. K-12 substr. DH10B
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia.
REFERENCE 1 (bases 1 to 4686137)
The mutations were matched to the genes and the amino acid changes were calculated. Using different algorithms (SVM, homology modeling), mutations leading to amino acid changes with likely pathogenicity/resistance were calculated. Known variants from the Swiss-Prot database were excluded and all variants in the respective genes were selected.
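By way of illustration only, the calculation of an amino acid change from a single nucleotide substitution may be sketched as follows. The sketch uses the standard genetic code and assumes that the coding sequence is given in upper case on the coding strand and that the position is counted from 1 within the gene; the function name `amino_acid_exchange` and the short test sequence are hypothetical and are not taken from the E. coli genome.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_exchange(cds, gene_pos, alt_base):
    """Return the exchange in standard nomenclature, e.g. 'S2L' or 'S2*'.

    cds:      coding sequence of the gene (coding strand, upper case)
    gene_pos: 1-based position of the substituted nucleotide in the gene
    alt_base: alternative base observed at that position
    """
    codon_index = (gene_pos - 1) // 3
    offset = (gene_pos - 1) % 3
    ref_codon = cds[codon_index * 3: codon_index * 3 + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return f"{ref_aa}{codon_index + 1}{alt_aa}"
```

For the hypothetical sequence ATGTCGTAA, replacing the C at position 5 by T changes the codon TCG to TTG and gives the exchange S2L, whereas replacing it by A creates the stop codon TAG and gives S2*.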
As noted above, for E. coli 86 ultra highly significant pairs of genetic positions and drug resistance (Table 2) were identified. The 86 combinations correspond to 35 genetic positions, since the sites may be significant for more than one single drug. Most importantly, the respective sites are located in 9 genes: hofB, allA, mukB, ymdC, potB, ycgK, ycgB, valS, yjjJ. These genes thus appear to be critical for antibiotic resistance/susceptibility. The identified mutations all lead to amino acid alterations, either to an exchange of amino acid at the respective position or the creation of a new stop-codon. Thereby, resistance related variants for the following 6 antibiotic drugs were detected: CP, LVX, TE, CFZ, CRM, GM.
In Table 2, the columns are designated as follows:
Genome Pos: genomic position of the SNP/variant in the E. coli reference genome (see below);
Therapy: the therapy to which the mutation is significantly correlated, multiple therapies are in separate rows (if a SNP is correlated to e.g. 4 therapies this leads to 4 single rows);
P-value: significance value calculated using Fisher's exact test;
Gene pos: position of the mutation in the gene;
Ref: reference base, A, C, T, G;
Alt: Alternative base associated with resistance;
AA: original Amino acid;
Alt A: changed amino acid;
Gene: affected gene;
Exchange: amino acid exchange in standard nomenclature;
The P-value was calculated using Fisher's exact test based on a contingency table with 4 fields: #samples Resistant/wild type; #samples Resistant/mutant; #samples not Resistant/wild type; #samples not Resistant/mutant.
In Table 3, the identified genes and gene products are listed and identified by Gene ID of the gene and (NCBI) Accession number of the corresponding protein corresponding to:
The test is based on the distribution of the samples in the 4 fields. Even distribution indicates no significance, while clustering into two fields indicates significance.
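By way of illustration only, Fisher's exact test on a contingency table with 4 fields may be sketched as follows. The text does not state whether a one-sided or two-sided test was used; the two-sided variant shown here, which sums the probabilities of all tables that are no more probable than the observed one, is an assumption, as is the function name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a: #samples resistant / wild type      b: #samples resistant / mutant
    c: #samples not resistant / wild type  d: #samples not resistant / mutant
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left field, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum all tables at most as probable as the observed one
    # (small relative tolerance to absorb floating-point error).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
```

In a hypothetical table of six samples, a clustered distribution such as 0, 3, 3, 0 gives a p-value of 0.1, whereas an even distribution such as 2, 2, 2, 2 gives a p-value of 1, in line with the statement above that even distribution indicates no significance.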
Using this approach, 35 highly significant, novel genetic positions or mutations in 9 genes (hofB, allA, mukB, ymdC, potB, ycgK, ycgB, valS, yjjJ) were identified that allow the determination of resistance to commonly used antibiotic drugs. All the highly significant mutations described herein and listed in Table 2 are non-conservative mutations leading to an amino acid exchange or a new stop codon (designated with a “*” symbol in Table 2), and thus to an altered protein. It is thus likely that the identified 9 genes play a significant role in antibiotic resistance and are putative targets for developing new drug candidates.
In this example, genetic susceptibility of E. coli to 21 different drugs from five drug classes is evaluated (see below).
Methods: Antimicrobial susceptibility test (AST) for 1,162 clinical E. coli isolates with varying spectra of resistance to 21 FDA-approved drugs was performed and genomes of all isolates were sequenced. Genetic variants were correlated to the AST data.
Results: 25,744 sites in the E. coli genome significantly correlated to drug resistance are reported. Highest significance was reached for the drugs ciprofloxacin and levofloxacin with respect to the amino acid (AA) exchange S83L in GyrA, a target for quinolones (ciprofloxacin: p = 10^−235, with accuracy, specificity, and sensitivity of 98%, 99%, and 94%; levofloxacin: p = 10^−209, with 97%, 98%, and 93%). The second most significant association was observed for ParC, a second target of quinolones (AA exchange S80I; ciprofloxacin: p = 10^−196; levofloxacin: p = 10^−194). Particularly many AA exchanges significantly associated with resistance to multiple drugs were discovered in YigN. By analyzing the sequence coverage on the genome level, a gene dose dependency of several genes is identified, including mmuP and mmuM, encoding a putative S-methylmethionine transporter and a homocysteine S-methyltransferase. Both loci are associated with resistance against β-lactams and quinolones.
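By way of illustration only, accuracy, specificity, and sensitivity of a genetic marker with respect to the phenotypic test result may be computed from four counts as follows. The function name and the convention that a resistant isolate carrying the mutation counts as a true positive are assumptions made for this example; the counts used in the usage note are hypothetical and are not the study data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion counts.

    tp: resistant isolates carrying the mutation (true positives)
    fp: susceptible isolates carrying the mutation (false positives)
    tn: susceptible isolates without the mutation (true negatives)
    fn: resistant isolates without the mutation (false negatives)
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

For hypothetical counts of 45 true positives, 10 false positives, 40 true negatives, and 5 false negatives, the function returns an accuracy of 0.85, a sensitivity of 0.9, and a specificity of 0.8.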
Conclusion: a high-throughput screening and analysis pipeline is presented to investigate antibiotics resistance in E. coli strains. The results demonstrate the potential of genetics-based tests to predict susceptibility against antimicrobial drugs. In addition, novel correlations of gene dose to resistance are reported.
A systematic evaluation using E. coli was carried out. Specifically, 1,162 E. coli samples were collected over 22 years (1991-2013) across over 60 different institutes. For these isolates, the standard AST for 21 FDA-approved drugs was carried out and performed Whole Genome Sequencing (WGS) for the same 1,162 isolates to build a database revealing genetic sites for predicting AST from genetic data.
1,162 E. coli strains were selected from the microbiology strain collection at Siemens Healthcare Diagnostics (West Sacramento, Calif.) for susceptibility testing and whole genome sequencing.
Frozen reference AST panels were prepared following Clinical and Laboratory Standards Institute (CLSI) recommendations. The following antimicrobial agents (with μg/ml concentrations shown in parentheses) were included in the panels: Amoxicillin/K Clavulanate (0.5/0.25-64/32), Ampicillin (0.25-128), Ampicillin/Sulbactam (0.5/0.25-64/32), Aztreonam (0.25-64), Cefazolin (0.5-32), Cefepime (0.25-64), Cefotaxime (0.25-128), Ceftazidime (0.25-64), Ceftriaxone (0.25-128), Cefuroxime (1-64), Cephalothin (1-64), Ciprofloxacin (0.015-8), Ertapenem (0.12-32), Gentamicin (0.12-32), Imipenem (0.25-32), Levofloxacin (0.25-16), Meropenem (0.12-32), Piperacillin/Tazobactam (0.25/4-256/4), Tetracycline (0.5-64), Tobramycin (0.12-32), and Trimethoprim/Sulfamethoxazole (0.25/4.7-32/608). Prior to use with clinical isolates, AST panels were tested with QC strains. AST panels were considered acceptable for testing with clinical isolates when the QC results met QC ranges described by CLSI (ref. 16).
Isolates were cultured on trypticase soy agar with 5% sheep blood (BBL, Cockeysville, Md.) and incubated in ambient air at 35±1° C. for 18-24 h. Isolated colonies (4-5 large colonies or 5-10 small colonies) were transferred to a 3 ml Sterile Inoculum Water (Siemens) and emulsified to a final turbidity of a 0.5 McFarland standard. 2 ml of this suspension was added to 25 ml Inoculum Water with Pluronic-F (Siemens). Using the Inoculator (Siemens) specific for frozen AST panels, 5 μl of the cell suspension was transferred to each well of the AST panel. The inoculated AST panels were incubated in ambient air at 35±1° C. for 16-20 h. Panel results were read visually, and minimal inhibitory concentrations (MIC) were determined.
Four streaks of each Gram-negative bacterial isolate were cultured on trypticase soy agar containing 5% sheep blood, and cell suspensions were made in sterile 1.5 ml collection tubes containing 50 μl Nuclease-Free Water (AM9930, Life Technologies). Bacterial isolate samples were stored at −20° C. until nucleic acid extraction. The Tissue Preparation System (TPS) (096D0382-02_01_B, Siemens) and the VERSANT® Tissue Preparation Reagents (TPR) kit (10632404B, Siemens) were used to extract DNA from these bacterial isolates. Prior to extraction, the bacterial isolates were thawed at room temperature and were pelleted at 2000×g for 5 seconds. The DNA extraction protocol DNAext was used for complete total nucleic acid extraction of 48 isolate samples and eluates, 50 μl each, in 4 hours. The total nucleic acid eluates were then transferred into 96-Well qPCR Detection Plates (401341, Agilent Technologies) for RNase A digestion, DNA quantitation, and plate DNA concentration standardization processes. RNase A (AM2271, Life Technologies), which was diluted in nuclease-free water following the manufacturer's instructions, was added to 50 μl of the total nucleic acid eluate for a final working concentration of 20 μg/ml. The digestion enzyme and eluate mixture was incubated at 37° C. for 30 minutes using the Siemens VERSANT® Amplification and Detection instrument. DNA from the RNase-digested eluate was quantitated using the Quant-iT™ PicoGreen dsDNA Assay (P11496, Life Technologies) following the assay kit instructions, and fluorescence was determined on the Siemens VERSANT® Amplification and Detection instrument. Data analysis was performed using Microsoft® Excel 2007. 25 μl of the quantitated DNA eluates were transferred into a new 96-Well PCR plate for plate DNA concentration standardization prior to library preparation. Elution buffer from the TPR kit was used to adjust DNA concentration. The standardized DNA eluate plate was then stored at −80° C. until library preparation.
Prior to library preparation, quality control of isolated bacterial DNA was conducted using a Qubit 2.0 Fluorometer (Qubit dsDNA BR Assay Kit, Life Technologies) and an Agilent 2200 TapeStation (Genomic DNA ScreenTape, Agilent Technologies). NGS libraries were prepared in 96 well format using NexteraXT DNA Sample Preparation Kit and NexteraXT Index Kit for 96 Indexes (Illumina) according to the manufacturer's protocol. The resulting sequencing libraries were quantified in a qPCR-based approach using the KAPA SYBR FAST qPCR MasterMix Kit (Peqlab) on a ViiA 7 real time PCR system (Life Technologies). 96 samples were pooled per lane for paired-end sequencing (2×100 bp) on Illumina Hiseq2000 or Hiseq2500 sequencers using TruSeq PE Cluster v3 and TruSeq SBS v3 sequencing chemistry (Illumina). Basic sequencing quality parameters were determined using the FastQC quality control tool for high throughput sequence data (Babraham Bioinformatics Institute).
Raw paired-end sequencing data for the 1,162 E. coli samples were mapped against the E. coli DH10B reference (NC_010473) (see also above in Example 1) with BWA 0.6.1 (ref. 20). The resulting SAM files were sorted and converted to BAM files, and PCR duplicates were marked using the Picard tools package 1.104 (http://picard.sourceforge.net/). The Genome Analysis Toolkit 3.1.1 (GATK; ref. 21) was used to call SNPs and indels for blocks of 200 E. coli samples (parameters: -ploidy 1 -glm BOTH -stand_call_conf 30 -stand_emit_conf 10). VCF files were combined into a single file, and quality filtering was carried out for SNPs (QD<2.0 || FS>60.0 || MQ<40.0) and indels (QD<2.0 || FS>200.0). Detected variants were annotated with SnpEff (ref. 22) to predict coding effects. For each annotated position, genotypes of all E. coli samples were considered. E. coli samples were split into two groups, a low resistance group (having lower MIC values for the considered drug) and a high resistance group (having higher MIC values), with respect to a certain MIC value (breakpoint). To find the best breakpoint, all thresholds were evaluated, and p-values were computed with Fisher's exact test relying on a 2×2 contingency table (number of E. coli samples having the reference or variant genotype vs. number of samples belonging to the low and high resistance group). The best computed breakpoint was the threshold yielding the lowest p-value for a certain genomic position and drug. For further analyses, positions with non-synonymous alterations and p-value <10^−9 were considered. Based on the contingency table, the accuracy (ACC), sensitivity (SENS), specificity (SPEC), and the positive/negative predictive values (PPV/NPV) were calculated.
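For illustration only, the breakpoint search described above may be sketched as follows. The sketch assumes that isolates with an MIC strictly above a candidate threshold form the high resistance group (the text does not state how the boundary is handled), that every observed MIC value except the largest is tried as a threshold, and that the two-sided form of Fisher's exact test is used. The function names `fisher_exact_two_sided` and `best_breakpoint` and the toy data are hypothetical and are not taken from the described pipeline.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x samples in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables that are at most as likely as the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def best_breakpoint(mics, has_variant):
    """Return (p_value, breakpoint, table) for the threshold with the lowest p-value.

    table = (variant/high, variant/low, reference/high, reference/low)
    """
    best = None
    # The largest MIC is skipped because it would leave the high group empty
    for bp in sorted(set(mics))[:-1]:
        a = b = c = d = 0
        for mic, variant in zip(mics, has_variant):
            high = mic > bp  # assumption: strictly above the threshold is "high"
            if variant and high:
                a += 1
            elif variant:
                b += 1
            elif high:
                c += 1
            else:
                d += 1
        p = fisher_exact_two_sided(a, b, c, d)
        if best is None or p < best[0]:
            best = (p, bp, (a, b, c, d))
    return best


# Toy data (invented): MICs in μg/ml and whether each isolate carries the variant
mics = [0.015, 0.015, 0.03, 0.06, 0.25, 4, 8, 8, 8, 8]
carriers = [False] * 5 + [True] * 5
p_value, breakpoint_mic, table = best_breakpoint(mics, carriers)
print(breakpoint_mic, table, p_value)
```

In this toy case the threshold that separates carriers from non-carriers perfectly is selected. For the full data set, the same scan would be repeated for every annotated position and drug.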
Since a potential reason for drug resistance is gene duplication, gene dose dependency was evaluated. For each sample, the genomic coverage for each position was determined using BEDTools (ref. 23). Gene ranges were extracted from the reference assembly NC_010473.gff, and the normalized median coverage per gene was calculated. To compare low- and high-resistance isolates, the best area under the curve (AUC) value was computed. Groups of at least 20% of all samples having a median coverage larger than zero for that gene and containing more than 15 samples per group were considered in order to exclude artifacts, and cases with AUC>0.75 were further evaluated.
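As a minimal sketch of the gene dose comparison, the following code computes a normalized median coverage for one gene and the AUC between two groups of isolates. The normalization shown here (dividing the gene's median depth by the sample's genome-wide median depth) is an assumption, since the text does not specify it. The AUC is computed as the probability that a randomly chosen high-resistance isolate has higher coverage than a randomly chosen low-resistance isolate, with ties counted as one half. The function names and numbers are hypothetical.

```python
from statistics import median


def normalized_median_coverage(gene_depths, genome_median_depth):
    """Median per-base depth of one gene, scaled by the sample's overall depth.

    Assumption: normalization by the genome-wide median depth of the sample.
    """
    return median(gene_depths) / genome_median_depth


def auc(high_group, low_group):
    """Area under the ROC curve for separating the two groups by coverage."""
    wins = 0.0
    for h in high_group:
        for l in low_group:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5  # ties contribute one half
    return wins / (len(high_group) * len(low_group))


# Invented example: normalized coverage of one gene in each isolate
high_resistance = [2.0, 2.1, 1.9, 1.0]
low_resistance = [1.0, 1.1, 0.9, 1.0]
score = auc(high_resistance, low_resistance)
print(score, score > 0.75)
```

An AUC near 1 indicates higher coverage in the high-resistance group, while an AUC near 0 indicates lower coverage, so a screen for decreased gene dose would also inspect 1 − AUC.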
Results
The aim of our study was to demonstrate the feasibility of genetic antimicrobial susceptibility tests (GAST), to verify our method for known resistance mechanisms, and to discover novel mechanisms. Culture-based ASTs were performed for 1,162 E. coli isolates and 21 antimicrobial drugs belonging to 5 different drug classes: β-lactams, fluoroquinolones, aminoglycosides, tetracyclines, and folate synthesis inhibitors. The complete list of drugs is shown in Table 1. For the same 1,162 E. coli isolates, whole genome sequencing using Illumina's HiSeq2500 instrument was carried out.
Most Significant Sites in the E. coli Genome
In order to calculate genome-wide significance scores, all 1,162 E. coli genomes were mapped to the reference strain DH10B. For each genomic position, the base for each sample was determined, and 973,226 sites were discovered that passed the quality filtering and in which at least one sample had a non-reference base. The respective sites were correlated to the AST data for the 21 drugs using Fisher's exact test. Our analysis revealed 25,744 sites where a genetic mutation significantly correlated with at least one drug (p-value<10^−9) and led to a change in the AA sequence, including point mutations and small insertions and deletions. The highest significance was reached for AA exchange S83L in GyrA and the drug Ciprofloxacin (p=10^−235). Remarkably, GyrA is one of the targets of Ciprofloxacin. For this position, three AA exchanges, S83L, S83W, and S83A, are annotated in UniProt as conferring resistance to quinolones. For this site, only 5 false positive (0.4%) and 18 false negative samples (1.6%) were discovered while 1,139 samples were identified correctly, corresponding to accuracy, specificity, and sensitivity of 98.0%, 99.4%, and 93.8%, respectively.
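The performance measures for this site can be recomputed from a 2×2 contingency table, as sketched below. The counts of 5 false positives, 18 false negatives, and 1,139 correct calls are taken from the text. The split of the 1,139 correct calls into 272 true positives and 867 true negatives is an assumption: it is one split that is consistent with the reported sensitivity and specificity, and it is not stated in the source. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # resistant isolates detected
        "specificity": tn / (tn + fp),   # susceptible isolates recognized
        "ppv": tp / (tp + fp),           # resistant, given the variant
        "npv": tn / (tn + fn),           # susceptible, given the reference
    }


# fp and fn are from the text; the tp/tn split is an assumed, consistent example
metrics = diagnostic_metrics(tp=272, fp=5, tn=867, fn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under this assumed split, the accuracy of 1,139/1,162 rounds to the reported 98.0%, and the positive predictive value falls at the upper end of the 94.8% to 98.2% range mentioned for the GyrA and ParC exchanges.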
Besides the mutations in type II topoisomerase drug targets (GyrA/ParC), mutations in genes ygiF (A110T, p=10^−67, acc=86%, spec=89.5%, sens=69.9%) and ygjM (A68V, p=10^−63, acc=89.9%, spec=94.4%, sens=67.1%) also reach high significance. Compared to the above-described AA exchanges, these two sites demonstrate substantially decreased sensitivity and positive predictive value (PPV). While the PPV for the four AA exchanges in GyrA and ParC was between 94.8% and 98.2%, the PPV of these two exchanges decreases to 59.0% and 70.8%. This means that the likelihood to be resistant given the exchanged AA is almost as high as the likelihood to be susceptible given the exchanged AA, limiting the probability that the respective AA exchanges are causative.
To discover other AA exchanges that are potentially causative for drug resistance, the list of all 25,744 sites was filtered (at least 150 resistant E. coli isolates carry the AA exchange, NPV>50%, PPV>75%). This filtering revealed 127 candidate sites (see also Table 4). Besides the already described exchanges in GyrA and ParC, AA exchanges in YdjO associated with predicted resistance to different β-lactams (V121E, S120C, V118F, I114V, K111E, and D112N) were discovered. Likewise, AA exchanges in YcbS (E848Q, E848*), RhsC (R717Q, W492C), YcbQ (T86I), YagR (S274T), and YeaU (N293K) were reported for lactams. Finally, AA exchanges related to quinolones, tetracycline, and lactams in YhaL were discovered (altogether 23 different sites).
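The filtering step may be sketched as follows, using the three thresholds given in the text (at least 150 resistant carriers, NPV>50%, PPV>75%). The `Site` record, the function name `candidate_sites`, and the example rows are hypothetical. The gene names and values below are invented placeholders and do not reproduce any entry of Table 4.

```python
from dataclasses import dataclass


@dataclass
class Site:
    gene: str
    exchange: str
    drug: str
    resistant_carriers: int  # resistant isolates that carry the AA exchange
    ppv: float
    npv: float


def candidate_sites(sites, min_carriers=150, min_npv=0.50, min_ppv=0.75):
    """Keep sites that pass all three filters described in the text."""
    return [
        s for s in sites
        if s.resistant_carriers >= min_carriers
        and s.npv > min_npv
        and s.ppv > min_ppv
    ]


# Invented placeholder rows, for illustration only
sites = [
    Site("geneA", "X10Y", "drug1", resistant_carriers=260, ppv=0.97, npv=0.95),
    Site("geneB", "X20Y", "drug1", resistant_carriers=200, ppv=0.59, npv=0.90),
    Site("geneC", "X30Y", "drug2", resistant_carriers=40, ppv=0.99, npv=0.80),
]
for site in candidate_sites(sites):
    print(site.gene, site.exchange, site.drug)
```

Only the first placeholder row passes: the second fails the PPV filter and the third has too few resistant carriers.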
In addition, the most significant non-synonymous AA exchange for each drug was computed (p-value threshold<10^−9). Of 21 tested drugs, only two (Imipenem, Meropenem) were not found to be associated with an AA exchange with such a low p-value. Interestingly, the S83L mutation in GyrA is the predominant exchange in 15 drugs. For the drugs Ciprofloxacin and Levofloxacin, of which GyrA is a target, the p-values were, however, much lower (<10^−209) than the p-values for this mutation in association with the remaining 13 drugs (>10^−62). In addition, a significant decrease in sensitivity and/or PPV was observed in these cases: either the sensitivity or the PPV is below 55% for drugs of which GyrA is not the target, demonstrating that these measures are effective for separating mutations in true targets from others.
In nine cases, mutations associated with drugs were detected in genes that also encode the targets for the respective drugs. This includes the mutations associated with Ciprofloxacin and Levofloxacin in GyrA (S83L, D87N, D87Y, D678E, E574D) and ParC (S80I, E84G, E84V, E84A, A192V, Q481H, A471G, T718A, Q198H), mutations associated with Cephalothin in AmpC (K40R, I300V, T335I, A210P, Q196H, A236T, R248C), with Sulfamethoxazole in FolC (A319T, R88C, G217S), with Cefazolin in MrcB (D839E, QQQP815Q, R556C) and PbpC (L357V, V348A, A15T, A217V, Q495L, V768F, A701E, K766R, K766T, T764S, T764A, R602L, E446G, R669H, A202T), and with Ceftazidime in PbpG (A28V).
Mutations are not uniformly distributed across E. coli genes: for example, yfaL, fhuA, yehI, yjgL, and yeeJ carry over 120 non-synonymous variants per gene.
A potential reason for drug resistance is gene duplication or deletion, which may be observed in our dataset by inspecting the read coverage of different genes in the groups of resistant and susceptible isolates. To estimate the difference in coverage, AUC values were calculated for the normalized median coverage per gene in the two groups. Altogether, 23 cases of abnormal differences in gene coverage between resistant and susceptible bacteria were discovered, resulting in an AUC>0.75.
The considerable and ongoing increase of infections caused by multi-drug resistant pathogens presents a major threat for patients, especially in hospital settings. The development of new drugs is a long and expensive venture and has stagnated in recent years despite increasing investments in research and development. The announcement by the FDA in September 2012 to form an internal task force for supporting the development of new antimicrobial drugs emphasizes the importance of this topic. Until these drugs become available, the existing ones have to be applied as efficiently as possible. Abundant prescribing of broad-spectrum antibiotics promotes the development of multi-drug resistance, so a more careful selection of drugs is needed. Thus, methods that may quickly stratify patients and provide them with the optimal therapy are needed. Identifying the genetic loci in the infectious agent that are predominantly responsible for an observed resistance or susceptibility is a crucial point for this.
Here, 1,162 clinical isolates of E. coli were analyzed for their susceptibility towards 21 FDA-approved drugs, and this information was combined with whole genome NGS data to identify potential variants that might be causative for the observed resistance patterns. In total, 25,744 significant sites were found (p-value <10^−9). The method correctly identified already known drug targets in nine gene/drug combinations: gyrA (Ciprofloxacin, Levofloxacin), parC (Ciprofloxacin, Levofloxacin), ampC (Cephalothin), folC (Trimethoprim/Sulfamethoxazole), mrcB (Cefazolin), pbpC (Cefazolin), and pbpG (Ceftazidime). To identify other potential sites that might be secondary drug targets, filtering criteria were applied using the measures NPV/PPV, which provided a reduction in the number of potentially relevant sites from 25,744 to 127 sites.
Considering the best drug-target combinations according to the computed p-values, the AA exchange S83L in GyrA was found to be the predominant mutation for 15 drugs. Since only Ciprofloxacin or Levofloxacin are approved drugs for GyrA, the other associations to this protein might be a side-effect of multi-drug resistance. Employing additional measures such as sensitivity, PPV, and NPV facilitates the separation of causative drug targets from other variants as exemplified in this case.
Instead of using only single variants, a combination of several variant positions may improve the prediction accuracy and further reduce false positive findings that are influenced by other factors.
Since gene duplication and/or deletion might also play a role in resistance development mechanisms, the gene coverage was analyzed in combination with the resistance data, and 23 cases of abnormal differences in gene coverage between resistant and susceptible bacteria were discovered. Interestingly, an increase of genetic material in resistant bacteria was found (e.g., for genes mmuP, mmuM, and yieI), but also a decrease for certain genes such as mngB and mngR. While for membrane or transporter proteins either an increase or a decrease of gene dosage may influence drug susceptibility, by not allowing a drug to permeate the membranes or by transporting it out of the cell more efficiently, a decrease of the quantity of metabolic enzymes or transcription factors is not as easily interpretable in this context and might be more or less directly related to the fitness of the isolates.
Another source of information that might improve the accuracy of our analysis is the strain-specific plasmids. Mapping the sequencing data against those plasmids will extend our knowledge about additional resistance mechanisms. In a first approach, a subset of the sequencing data was mapped to about 300 E. coli plasmids. Among the genes having the most significant variant sites were, e.g., repA1, trbI, psiB, and traG, which are directly involved in replication, plasmid transfer, and maintenance and might play an indirect role in resistance development by giving the host the ability to facilitate spreading of resistance genes.
Compared to approaches using MALDI-TOF MS, the present approach has the advantage that it covers almost the complete genome and thus enables us to identify the potential genomic sites that might be related to resistance. While MALDI-TOF MS may also be used to identify point mutations in bacterial proteins (ref. 33), this technology only detects a subset of proteins, and of these not all are equally well covered. In addition, the identification and differentiation of certain related strains may not be feasible.
The present method allows a best breakpoint to be computed for the separation of isolates into resistant and susceptible groups. A flexible software tool was designed that, besides the best breakpoints, also allows values defined by different guidelines (e.g., European and US guidelines) to be considered, preparing for an application of the GAST in different countries.
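A minimal sketch of how such a tool might switch between a computed breakpoint and guideline-defined breakpoints is given below. The software tool itself is not described in the text, so the data layout and the names `BREAKPOINTS`, `resistance_group`, and `group_isolates` are hypothetical. All breakpoint values are placeholders chosen for illustration and are not values from any European or US guideline. The convention that an MIC strictly above the breakpoint counts as high resistance is likewise an assumption.

```python
# Placeholder breakpoints in μg/ml for one drug; NOT actual guideline values
BREAKPOINTS = {
    "computed": 0.25,     # e.g., the best breakpoint found by the p-value scan
    "guideline_A": 1.0,   # hypothetical guideline-defined value
    "guideline_B": 0.5,   # hypothetical guideline-defined value
}


def resistance_group(mic, source, breakpoints=BREAKPOINTS):
    """Assign one MIC to the low or high resistance group for a breakpoint source."""
    # Assumption: an MIC strictly above the breakpoint counts as high resistance
    return "high" if mic > breakpoints[source] else "low"


def group_isolates(mics, source, breakpoints=BREAKPOINTS):
    """Split a mapping of isolate id -> MIC into low and high resistance groups."""
    groups = {"low": [], "high": []}
    for isolate_id, mic in mics.items():
        groups[resistance_group(mic, source, breakpoints)].append(isolate_id)
    return groups


# Invented MIC values for three isolates
mics = {"iso1": 0.12, "iso2": 0.5, "iso3": 4.0}
for source in BREAKPOINTS:
    print(source, group_isolates(mics, source))
```

Because the grouping depends only on the selected breakpoint, the same genetic association analysis can be rerun for each guideline without changing any other step.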
Another critical point of this study is that its analysis included only cultured bacterial strains. Several studies used culture-independent samples from urine, fecal samples, or vaginal swabs and applied NGS to identify or characterize the pathogens directly. The advance of NGS technology, including the development of new long-read sequencers such as PacBio and Oxford Nanopore, will further improve and speed up our procedure in the future to develop a culture-independent diagnostic test based on NGS data.
This approach is capable of identifying mutations in genes that are already known as drug targets, as well as detecting potential new target sites.
It is to be understood that the elements and features recited in the appended claims may be combined in different ways to produce new claims that likewise fall within the scope of the present invention. Thus, whereas the dependent claims appended below depend from only a single independent or dependent claim, it is to be understood that these dependent claims may, alternatively, be made to depend in the alternative from any preceding or following claim, whether independent or dependent, and that such new combinations are to be understood as forming a part of the present specification.
While the present invention has been described above by reference to various embodiments, it may be understood that many changes and modifications may be made to the described embodiments. It is therefore intended that the foregoing description be regarded as illustrative rather than limiting, and that it be understood that all equivalents and/or combinations of embodiments are intended to be included in this description.
Number | Date | Country | Kind |
---|---|---|---|
14153260.6 | Jan 2014 | EP | regional |
EP14179456 | Aug 2014 | EP | regional |
The present patent document is a §371 nationalization of PCT Application Serial Number PCT/EP2015/051926, filed Jan. 30, 2015, designating the United States, which is hereby incorporated by reference, and this patent document also claims the benefit of EP 14153260.6, filed Jan. 30, 2014, and EP 14179456.0, filed Aug. 1, 2014, which are also hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/051926 | 1/30/2015 | WO | 00 |