The vaginal microbiome plays a vital role in gynecological and reproductive health. Lactobacillus-predominated vaginal microbiota constitute the first line of defense against infection. Protective mechanisms include lactic acid production by Lactobacillus spp., which acidifies the vaginal microenvironment and elicits anti-inflammatory effects [1-4]. This environment wards off non-indigenous organisms, including causative agents of sexually transmitted infections (STIs) such as HIV, and bacteria associated with bacterial vaginosis (BV) [5-7]. However, vaginal Lactobacillus spp. are functionally diverse. For example, L. crispatus and L. gasseri are capable of producing both the D- and L-isomers of lactic acid, L. jensenii produces only the D-isomer, and L. iners only the L-isomer [4, 8]. These key features have implications for susceptibilities to pathogens [9, 10].
The vaginal microbiota has previously been shown to cluster into community state types (CSTs) that reflect differences in bacterial species composition and abundance [1, 11]. Lactobacillus spp. predominate in four of the five CSTs (CST I: L. crispatus; CST II: L. gasseri; CST III: L. iners; CST V: L. jensenii). In contrast, CST IV communities are characterized by a paucity of lactobacilli and the presence of a diverse array of anaerobes such as Gardnerella vaginalis and “Ca. Lachnocurva vaginae”. CST IV is found, albeit not exclusively, during episodes of BV, a condition associated with increased risk of sexually transmitted infections, including HIV, as well as preterm birth and other adverse gynecological and obstetric outcomes [12-20].
BV is characterized by a lack of Lactobacillus spp. in the vagina and the presence of a diverse, anaerobic microbiota. It is a common form of vaginitis affecting women of all ages, races and socioeconomic statuses, with an estimated prevalence of 23-29% [43,57] and closer to 50% in sub-Saharan Africa [78-81]. In the U.S., women of African or Latin descent have a higher prevalence (33.2% and 30.7%, respectively) than white or Asian women [57]. BV is clinically defined by the presence of at least 3 of the 4 Amsel's criteria (Amsel-BV): vaginal pH >4.5; abnormal discharge; clue cells on wet mount; and a fishy odor upon addition of 10% KOH [21]. Patients who present with symptoms and satisfy Amsel's criteria (symptomatic Amsel-BV) are treated with antibiotics; however, efficacy is poor and recurrence is common [21-24]. In research settings, BV is often defined by scoring Gram-stained vaginal smears (Nugent-BV) [25] or by molecular typing of bacterial composition through sequencing of marker genes (molecular-BV) [26]. No current definition of BV relies on both the composition and the function of the microbiome.
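The Amsel definition above is a counting threshold over four clinical observations. The minimal Python sketch below illustrates only that counting logic; the function and argument names are hypothetical, each criterion is assumed to have already been assessed clinically, and the sketch is not a diagnostic tool.

```python
def amsel_bv(vaginal_ph: float, abnormal_discharge: bool,
             clue_cells: bool, positive_whiff_test: bool) -> bool:
    """Return True when at least 3 of the 4 Amsel criteria are met (illustrative only)."""
    criteria = [
        vaginal_ph > 4.5,     # vaginal pH above 4.5
        abnormal_discharge,   # abnormal discharge
        clue_cells,           # clue cells observed on wet mount
        positive_whiff_test,  # fishy odor upon addition of 10% KOH
    ]
    return sum(criteria) >= 3  # True counts as 1, False as 0


print(amsel_bv(5.0, True, True, False))  # 3 of 4 criteria met -> True
print(amsel_bv(4.2, True, False, True))  # 2 of 4 criteria met -> False
```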
Species-level composition of the vaginal microbiota, as used for CSTs, may not suffice to accurately capture associations between the vaginal microbiome and outcomes of interest, because functional differences exist between strains of the same species. For example, in the skin microbiome, strains of Staphylococcus aureus or Streptococcus pyogenes elicit different acute immune responses [27]. Similarly, genomic and functional analyses of Lactobacillus rhamnosus strains demonstrate distinct adaptations to specific niches (for example, the gut versus the oral cavity) [28]. While functional differences likely exist between strains of the same species in the vaginal microbiota, metagenomic studies show that combinations of multiple strains co-exist within a single vaginal microbiome [29, 30]. These strain assemblages are known as metagenomic subspecies (mgSs) [29] and are important to consider because they may affect the functional diversity and resilience of a species within a microbiome.
Determining the mechanistic consequences and health outcomes associated with mgSs may improve precision of risk estimates and interventions.
Relevant to the characterization of metagenomic subspecies (mgSs), recent findings have shown that multiple strains of the same species are commonly observed in the vaginal microbiome [29], and that samples can be clustered into mgSs defined by unique strain combinations, which are represented by species-specific gene sets and thus by unique sets of functions. These critical observations led to the concept of classifying vaginal microbiomes by their mgSs composition and abundance, and thus by both species composition and function, i.e., metagenomic community state types (MgCSTs). MgCSTs describe vaginal microbiomes through a new lens, one that includes both compositional and functional dimensions.
MgCSTs are composed of unique combinations of mgSs. A two-step classifier that assigns metagenomic subspecies and mgCSTs was developed and validated, and it is designed to work in concert with the vaginal non-redundant gene database, VIRGO [29]. This classifier will facilitate reproducibility and comparisons across studies.
MgCSTs allow the taxonomic composition and functional potential of vaginal microbiomes to be integrated into prognostic, diagnostic and therapeutic strategies. The present invention takes advantage of MgCST classifications for prognostic, diagnostic and therapeutic strategies associated with bacterial vaginosis (BV), along with other important uses.
Thus, and in non-limiting examples, the invention is drawn to methods of characterizing a vaginal microbiome, methods of identifying a subject predisposed to develop bacterial vaginosis, methods of identifying a subject predisposed to re-develop bacterial vaginosis, methods of diagnosing bacterial vaginosis in a subject, methods of treating bacterial vaginosis in a subject, and methods of preventing bacterial vaginosis in a subject.
In particular, and in a first embodiment, the invention is directed to methods of characterizing a vaginal microbiome, comprising obtaining a vaginal microbiome sample from a subject, determining the metagenomic community state type (MgCST) of the sample, and classifying the MgCST as one of MgCSTs 1-27.
In a second embodiment, the invention is directed to methods of identifying a subject predisposed to develop bacterial vaginosis, comprising obtaining a vaginal microbiome sample from a subject, and determining the metagenomic community state type (MgCST) of the sample, wherein when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis, the subject is identified as a subject predisposed to develop bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
In a third embodiment, the invention is directed to methods of identifying a subject predisposed to re-develop bacterial vaginosis, comprising determining the metagenomic community state type (MgCST) of a vaginal microbiome of a subject after treatment for bacterial vaginosis and classifying it as one of MgCSTs 1-27, wherein when the MgCST of the subject is classified as one or more of MgCSTs 12 and 17-25, the subject is predisposed to re-develop bacterial vaginosis.
In certain aspects of this embodiment, when the MgCST of the subject is classified as one or more of MgCSTs 19 and 22, the subject is predisposed to re-develop bacterial vaginosis.
In a fourth embodiment, the invention is directed to methods of diagnosing bacterial vaginosis in a subject, comprising obtaining a vaginal microbiome sample from a subject, and determining the metagenomic community state type (MgCST) of the sample, wherein when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis, the subject is diagnosed as having bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
In a fifth embodiment, the invention is directed to methods of treating bacterial vaginosis in a subject, comprising obtaining a vaginal microbiome sample from a subject, determining the metagenomic community state type (MgCST) of the sample, and administering a therapeutically effective amount of an antibacterial agent to the subject when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
In certain aspects of this embodiment, the antibacterial agent is metronidazole or clindamycin.
In a sixth embodiment, the invention is directed to methods of preventing bacterial vaginosis in a subject, comprising obtaining a vaginal microbiome sample from a subject, determining the metagenomic community state type (MgCST) of the sample, and administering a prophylactically effective amount of an antibacterial agent to the subject when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
In certain aspects of this embodiment, the antibacterial agent is metronidazole or clindamycin.
In each of the relevant embodiments and aspects of the invention as summarized herein, the methods of the invention may further comprise determining one or more of (a) age of the subject, (b) race of the subject, (c) vaginal pH of the subject, and (d) Gram stain assessment of the subject.
In each of the relevant embodiments and aspects of the invention as summarized herein, the methods of the invention may further comprise determining one or more metagenomic subspecies (mgSs) clusters present in the sample.
In each of the relevant embodiments and aspects of the invention as summarized herein, the MgCST may be determined by a two-step classifier that assigns one or both of mgSs and MgCSTs. The classifier may be based on mgSs composition or abundance, or both. The classifier may use the vaginal non-redundant gene database (VIRGO).
In each of the relevant embodiments and aspects of the invention as summarized herein, the subject may be a human.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described herein, which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that any conception and specific embodiment disclosed herein may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that any description, figure, example, etc. is provided for the purpose of illustration and description only and is by no means intended to define the limits of the invention.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found, for example, in Benjamin Lewin, Genes VII, published by Oxford University Press, 2000 (ISBN 019879276X); Kendrew et al. (eds.); The Encyclopedia of Molecular Biology, published by Blackwell Publishers, 1994 (ISBN 0632021829); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by Wiley, John & Sons, Inc., 1995 (ISBN 0471186341); and other similar technical references.
As used herein, “a” or “an” may mean one or more. As used herein when used in conjunction with the word “comprising,” the words “a” or “an” may mean one or more than one. As used herein “another” may mean at least a second or more. Furthermore, unless otherwise required by context, singular terms include pluralities and plural terms include the singular.
As used herein, “about” refers to a numeric value, including, for example, whole numbers, fractions, and percentages, whether or not explicitly indicated. The term “about” generally refers to a range of numerical values (e.g., +/−5-10% of the recited value) that one of ordinary skill in the art would consider equivalent to the recited value (e.g., having the same function or result). In some instances, the term “about” may include numerical values that are rounded to the nearest significant figure.
MgCSTs are categories of microbiomes classified using both taxonomy and the functional potential encoded in their metagenomes. MgCSTs reflect unique combinations of metagenomic subspecies (mgSs), which are assemblages of bacterial strains of the same species within a microbiome. As demonstrated herein, mgCSTs are associated with demographics such as age and race, as well as with vaginal pH and Gram stain assessment of vaginal smears. Importantly, these associations varied between mgCSTs predominated by the same bacterial species. A subset of mgCSTs, including three of the six predominated by G. vaginalis mgSs, was associated with a greater likelihood of an Amsel-BV diagnosis, as was a mgSs of L. iners. This L. iners mgSs, among other functional features, encoded enhanced genetic capabilities for epithelial cell attachment that could facilitate cytotoxin-mediated cell lysis.
The present invention thus demonstrates that MgCSTs are a novel and easily implemented approach to reducing the dimensionality of complex metagenomic datasets while maintaining their functional uniqueness. MgCSTs enable investigation of multiple strains of the same species and of the functional diversity within that species. Such investigations of functional diversity may be key to unraveling the pathways by which the vaginal microbiome modulates protection of the genital tract. Importantly, the findings presented herein support the hypothesis that functional differences between vaginal microbiomes, including those that appear compositionally similar, are critical considerations in vaginal health.
Based on these findings, the present invention is directed, in non-limiting examples, to methods for characterizing a vaginal microbiome as well as more practical applications such as methods of identifying a subject predisposed to develop bacterial vaginosis, methods of identifying a subject predisposed to re-develop bacterial vaginosis, methods of diagnosing bacterial vaginosis in a subject, methods of treating bacterial vaginosis in a subject, and methods of preventing bacterial vaginosis in a subject.
As summarized above, in a first embodiment the invention is directed to methods of characterizing a vaginal microbiome, comprising obtaining a vaginal microbiome sample from a subject, determining the metagenomic community state type (MgCST) of the sample, and classifying the MgCST as one of MgCSTs 1-27.
In a second embodiment, the invention is directed to methods of identifying a subject predisposed to develop bacterial vaginosis, comprising obtaining a vaginal microbiome sample from a subject, and determining the metagenomic community state type (MgCST) of the sample, wherein when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis, the subject is identified as a subject predisposed to develop bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
In a third embodiment, the invention is directed to methods of identifying a subject predisposed to re-develop bacterial vaginosis, comprising determining the metagenomic community state type (MgCST) of a vaginal microbiome of a subject after treatment for bacterial vaginosis and classifying it as one of MgCSTs 1-27, wherein when the MgCST of the subject is classified as one or more of MgCSTs 12 and 17-25, the subject is predisposed to re-develop bacterial vaginosis.
In certain aspects of this embodiment, when the MgCST of the subject is classified as one or more of MgCSTs 19 and 22, the subject is predisposed to re-develop bacterial vaginosis.
In a fourth embodiment, the invention is directed to methods of diagnosing bacterial vaginosis in a subject, comprising obtaining a vaginal microbiome sample from a subject, and determining the metagenomic community state type (MgCST) of the sample, wherein when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis, the subject is diagnosed as having bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
In a fifth embodiment, the invention is directed to methods of treating bacterial vaginosis in a subject, comprising obtaining a vaginal microbiome sample from a subject, determining the metagenomic community state type (MgCST) of the sample, and administering a therapeutically effective amount of an antibacterial agent to the subject when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
In a sixth embodiment, the invention is directed to methods of preventing bacterial vaginosis in a subject, comprising obtaining a vaginal microbiome sample from a subject, determining the metagenomic community state type (MgCST) of the sample, and administering a prophylactically effective amount of an antibacterial agent to the subject when the MgCST of the subject is determined to be a MgCST associated with bacterial vaginosis.
In certain aspects of this embodiment, a MgCST associated with bacterial vaginosis is one or more of MgCSTs 12 and 17-25.
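As a reading aid, the MgCST groupings recited in these embodiments can be expressed as set-membership checks. The sketch below is hypothetical (all names are illustrative), assumes an MgCST between 1 and 27 has already been assigned, and encodes only the groupings stated above; it is not a clinical decision rule.

```python
# Hypothetical constants encoding the MgCST groupings recited above.
BV_ASSOCIATED_MGCSTS = frozenset({12, *range(17, 26)})  # MgCSTs 12 and 17-25
RECURRENCE_MGCSTS = frozenset({19, 22})                 # subset recited for re-development


def bv_associated(mgcst: int) -> bool:
    """True if an assigned MgCST (1-27) is one of those associated with BV."""
    if not 1 <= mgcst <= 27:
        raise ValueError(f"MgCST must be between 1 and 27, got {mgcst}")
    return mgcst in BV_ASSOCIATED_MGCSTS


def recurrence_flag(mgcst_after_treatment: int) -> bool:
    """True if a post-treatment MgCST is one of MgCSTs 19 and 22."""
    return bv_associated(mgcst_after_treatment) and mgcst_after_treatment in RECURRENCE_MGCSTS


for m in (3, 12, 19, 26):
    print(m, bv_associated(m), recurrence_flag(m))
```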
The methods of the present invention are each based on determining the metagenomic community state type (MgCST) of the vaginal sample obtained from a subject.
Samples may be obtained from the subject via any suitable means including, but not limited to, self- or clinician-collected vaginal swabs or clinician-collected cervicovaginal lavage.
Once obtained, the sample is placed into a preservation buffer (e.g., RNAlater or C2 Buffer) and frozen at −20° C. or −80° C., or flash frozen at −80° C. Samples then undergo initial processing, including DNA extraction, followed by metagenomic library preparation. The resulting libraries are then subjected to high-throughput whole genome shotgun sequencing. The final step is assigning or identifying the sample as mgCST 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 or 27.
The required inputs are direct outputs from VIRGO and include the taxonomic abundance table (“summary.Abundance.txt”) and the gene abundance table (“summary.NR.abundance.txt”). It is imperative that taxonomic and gene column headings match those output by VIRGO. The expected output is a count table with samples as rows, taxa as columns, and counts normalized by gene length as values. Additional columns indicate the sample mgCST classification and the Yue-Clayton similarity score for all 27 mgCSTs. A heatmap is also produced showing taxon relative abundances in samples, where samples are labeled with their assigned mgCSTs. Substantial differences between samples assigned to the same mgCST may indicate either an incongruence in taxonomic or gene names or the need for an additional mgCST. The classifier is contained in an R script, which is available at https://github.com/ravel-lab/mgCST-classifier. MgSs and mgCST classifications were robust at sampling depths greater than 100,000 reads per sample.
As described above, a two-step classifier that assigns metagenomic subspecies and mgCSTs was developed and validated; it is designed to work in concert with the vaginal non-redundant gene database (VIRGO). This classifier facilitates reproducibility and comparisons across studies.
In each of the relevant embodiments and aspects of the invention as summarized herein, the methods of the invention may further comprise determining one or more of (a) age of the subject, (b) race of the subject, (c) vaginal pH of the subject, and (d) Gram stain assessment of the subject.
In each of the relevant embodiments and aspects of the invention as summarized herein, the methods of the invention may further comprise determining one or more metagenomic subspecies (mgSs) clusters present in the sample.
In each of the relevant embodiments and aspects of the invention as summarized herein, the MgCST may be determined by a two-step classifier that assigns one or both of mgSs and MgCSTs. The classifier may be based on mgSs composition or abundance, or both. The classifier may use the vaginal non-redundant gene database (VIRGO).
In each of the relevant embodiments and aspects of the invention, the subject may be a human, a non-human primate, a horse, cow, goat or sheep, a companion animal such as a dog, cat or rodent, or another mammal.
As described herein, the present invention is directed to methods of treating bacterial vaginosis in a subject and methods of preventing bacterial vaginosis in a subject, among other equally important goals.
In these methods of treatment and prevention, any antibacterial agent that is therapeutically effective (in methods of treatment) and/or prophylactically effective (in methods of prevention) may be used. Suitable antibacterial agents include, but are not limited to, metronidazole and clindamycin.
The amounts and means for administration of the antibacterial agents will depend on such factors as the MgCST and severity of the infection, demographic information related to the subject, and the selected antibacterial, among other factors. Therefore, dosages and means for administration will be determined by an attending physician or other medical professional.
Study cohorts. Raw metagenomic data from 1,890 vaginal samples were used in this study. These included publicly available metagenomes, including those used in the construction of the vaginal non-redundant gene database, VIRGO (virgo.igs.umaryland.edu, n=342) [29], the University of Maryland Baltimore Human Microbiome Project (UMB-HMP, n=677, PRJNA208535, PRJNA575586, PRJNA797778), the National Institutes of Health Human Microbiome Project (NIH HMP, n=174, phs000228), metagenomes from Li et al. [62] (n=44, PRJEB24147), and the Longitudinal Study of Vaginal Flora and Incident STI (LSVF, n=653, dbGaP project phs002367). All samples in LSVF (n=653) and some in UMB-HMP (n=20) had clinical Amsel-BV diagnosis information. Amsel-BV was diagnosed based on the presence of 3 of the 4 Amsel's criteria [21], and symptomatic Amsel-BV was diagnosed when a woman reported symptoms upon questioning [58]. At the time of these studies, gender identity information was not collected. All participants responded to recruiting materials that used the terms “women” or “woman”, and individuals are referred to as women in previous publications; thus, individuals are referred to herein as “woman” or “women” to maintain consistency.
Sequence Processing and Bioinformatics. Host reads were removed from all metagenomic sequencing data using BMTagger and the GRCh38 reference genome, and reads were quality filtered using Trimmomatic (v0.38; sliding window size 4 bp, Q15, minimum read length 75 bp) [63]. Metagenomic sequence reads were mapped to VIRGO using bowtie (v1; parameters: -p 16 -l 25 --fullref --chunkmbs 512 --best --strata -m 20 --suppress 2,4,5,6,7,8), producing a taxonomic and gene annotation for each read. Samples with fewer than 100,000 mapped reads were removed from the analysis (n=59). For each gene, the number of mapped reads was multiplied by the read length (150 bp) and divided by the gene length to produce a coverage value. Conserved domain and motif searches were performed with CD-Search and the Conserved Domain Database (CDD) using an e-value threshold of 10^-4. The taxonomic composition table generated using VIRGO was run through the vaginal CST classifier VALENCIA [11].
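The per-gene coverage normalization and the depth filter described above reduce to simple arithmetic. A minimal sketch, assuming per-gene mapped read counts and gene lengths (in bp) are already available as dictionaries; the function names are hypothetical, and the total number of mapped reads is approximated as the sum of the per-gene counts.

```python
READ_LENGTH_BP = 150        # read length stated in the text
MIN_MAPPED_READS = 100_000  # samples below this depth were removed


def gene_coverage(mapped_reads: int, gene_length_bp: int,
                  read_length_bp: int = READ_LENGTH_BP) -> float:
    """Per-gene coverage: mapped reads x read length / gene length."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return mapped_reads * read_length_bp / gene_length_bp


def sample_coverages(read_counts: dict, gene_lengths: dict):
    """Return {gene: coverage}, or None if the sample falls below the depth cutoff."""
    if sum(read_counts.values()) < MIN_MAPPED_READS:
        return None  # excluded from analysis
    return {gene: gene_coverage(n, gene_lengths[gene]) for gene, n in read_counts.items()}


print(gene_coverage(20, 1000))  # 20 * 150 / 1000 = 3.0
```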
Metagenomic Subspecies. For each species, a gene presence/absence matrix was constructed from the metagenomes, in which a gene was considered present if it had at least 0.5× the average coverage after normalizing for gene length. Metagenomic subspecies were generated for species present in >20 samples, where a species was considered present when more than 75% of the estimated median number of genes encoded in its reference genomes from the Genome Taxonomy Database [64] were detected (data not shown). Samples were clustered using binary gene counts and hierarchical clustering with Ward linkage of Jaccard distances, calculated using the vegdist function of the vegan package (v2.5-5) in R (v3.5.2). MgSs were defined using the dynamic hybrid tree cut method (v1.62-1) with minClusterSize=2 [66]. Heatmaps of gene presence/absence were constructed for each species using the heatmap.2 function of the gplots package [67] (data not shown). MgSs were tested for association with low species coverage using logistic regression in which membership in the mgSs was the binary outcome and the log10-transformed coverage of the species was the predictor. Tests were done at the participant level: if a participant had more than one sample and the samples were assigned to the same mgSs, only one sample was used, but if the mgSs differed, the samples were included in each. P-values were adjusted for multiple comparisons using Bonferroni correction. Significant dependence on coverage was observed for multiple mgSs of Atopobium vaginae, Gardnerella vaginalis, and Lactobacillus iners (data not shown); for these species, the classifiers were built using samples with ≥5.5×10^5 reads. The cluster stability of each mgSs was evaluated using the clusterboot function of the R package fpc (v2.2-10) [68, 69] with 100 bootstraps.
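The presence calling and clustering steps above can be sketched in pure Python. This is an illustration under stated assumptions, not the original R pipeline: "average coverage" is read as the mean coverage of the species' genes in that sample, Ward linkage is applied through the Lance-Williams update on squared Jaccard distances (one common formulation), and the dynamic hybrid tree cut is not reimplemented, so the tree is simply cut at a fixed number of clusters k. All function names are hypothetical.

```python
from itertools import combinations


def presence_calls(coverages, min_fraction=0.5):
    """Call a gene present if its coverage is >= min_fraction x the mean gene coverage."""
    mean_cov = sum(coverages) / len(coverages)
    return [1 if c >= min_fraction * mean_cov else 0 for c in coverages]


def jaccard_distance(u, v):
    """Jaccard distance between two binary gene-presence vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def ward_clusters(vectors, k):
    """Agglomerative Ward clustering on squared Jaccard distances, cut at k clusters."""
    n = len(vectors)
    size = {i: 1 for i in range(n)}
    members = {i: [i] for i in range(n)}
    d2 = {frozenset((i, j)): jaccard_distance(vectors[i], vectors[j]) ** 2
          for i, j in combinations(range(n), 2)}
    next_id = n
    while len(members) > k:
        pair = min(d2, key=d2.get)  # closest pair of current clusters
        i, j = tuple(pair)
        d_ij = d2.pop(pair)
        for m in list(members):
            if m in (i, j):
                continue
            d_im = d2.pop(frozenset((i, m)))
            d_jm = d2.pop(frozenset((j, m)))
            n_i, n_j, n_m = size[i], size[j], size[m]
            # Lance-Williams update for Ward linkage
            d2[frozenset((next_id, m))] = (
                (n_i + n_m) * d_im + (n_j + n_m) * d_jm - n_m * d_ij
            ) / (n_i + n_j + n_m)
        members[next_id] = members.pop(i) + members.pop(j)
        size[next_id] = size[i] + size[j]
        next_id += 1
    labels = [0] * n
    for label, idx_list in enumerate(sorted(members.values(), key=min), start=1):
        for idx in idx_list:
            labels[idx] = label
    return labels


samples = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
print(ward_clusters(samples, k=2))  # [1, 1, 2, 2]
```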
Metagenomic CSTs. Using gene abundance information (normalized by gene length and sequencing depth), the proportion of each vaginal species in each sample was estimated. For species that were subdivided into mgSs, the proportion of the assigned mgSs in a sample was set equal to the proportion of the species in that sample. When a species was present in a sample but with too few genes detected to constitute a mgSs (<75% of the estimated median number of genes encoded in reference genomes), it was labeled as “mgSs 0”. Samples in the resulting compositional table were hierarchically clustered using Jensen-Shannon distances. Clusters were defined using the dynamic hybrid tree cut method (v1.62-1) [66]. A heatmap for metagenomic CSTs was produced using the gplots package heatmap.2 function (
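The construction of the mgSs-level compositional table and the distance used to cluster it can be sketched as follows. The key format (species name plus mgSs number) and the function names are hypothetical. The Jensen-Shannon distance is taken here as the square root of the base-2 divergence, a common convention; the logarithm base is not stated in the text. The resulting pairwise distances would then be clustered hierarchically, as described above.

```python
from math import log2, sqrt


def mgss_profile(species_abundance, mgss_calls):
    """Relative-abundance profile keyed by species, or by species plus mgSs label.

    species_abundance: {species: abundance normalized by gene length and depth}
    mgss_calls: {species: assigned mgSs number, 0 if too few genes were detected}
    Species absent from mgss_calls (not subdivided into mgSs) keep their own name.
    """
    total = sum(species_abundance.values())
    profile = {}
    for species, abundance in species_abundance.items():
        key = f"{species} mgSs {mgss_calls[species]}" if species in mgss_calls else species
        profile[key] = profile.get(key, 0.0) + abundance / total
    return profile


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence between two profiles."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return sqrt(max(0.0, 0.5 * kl(p) + 0.5 * kl(q)))


s1 = mgss_profile({"L. crispatus": 3.0, "L. iners": 1.0}, {"L. crispatus": 2, "L. iners": 0})
s2 = mgss_profile({"G. vaginalis": 1.0}, {"G. vaginalis": 1})
print(s1)  # {'L. crispatus mgSs 2': 0.75, 'L. iners mgSs 0': 0.25}
print(jensen_shannon_distance(s1, s2))  # 1.0, no shared mgSs
```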
Statistical analysis of the association between mgCST and age, race, Nugent score, vaginal pH, and BV. For samples with race, age category, Nugent score category, vaginal pH category, or Amsel-BV diagnosis information (Table 1), the Cochran-Mantel-Haenszel chi-squared test (CMH test, “mantelhaen.test” from the samplesizeCMH R package, v0.0.0, github.com/pegeler/samplesizeCMH) was used to determine associations with mgCSTs while accounting for source study (the confounding variable). Here, the CMH test evaluates the association between two binary variables (e.g., “mgCST X or not” and “high Nugent score or not”) across strata defined by source study. Tests were done at the participant level: if a participant had more than one sample and the samples were assigned to the same mgCST, only one sample was used, but if the mgCSTs differed, the samples were included in each.
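The CMH statistic referred to above can be computed directly from one 2×2 table per stratum. The following is a minimal pure-Python sketch of the standard 2×2×K CMH chi-squared test with the continuity correction that R's mantelhaen.test applies by default; the function name and the example counts are hypothetical and are not study data.

```python
from math import erfc, sqrt


def cmh_test(tables, correct=True):
    """Cochran-Mantel-Haenszel chi-squared test for K stratified 2x2 tables.

    Each table is ((a, b), (c, d)), e.g. rows "mgCST X or not" and columns
    "high Nugent score or not", with one table per source study (stratum).
    Returns (statistic, p_value) on 1 degree of freedom.
    """
    delta = 0.0     # sum over strata of observed minus expected count in cell a
    variance = 0.0  # sum over strata of the hypergeometric variance of cell a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            raise ValueError("each stratum needs at least 2 observations")
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        delta += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("statistic undefined: zero variance across strata")
    yates = 0.5 if correct and abs(delta) >= 0.5 else 0.0  # continuity correction
    statistic = (abs(delta) - yates) ** 2 / variance
    p_value = erfc(sqrt(statistic / 2))  # upper tail of chi-squared with 1 df
    return statistic, p_value


stat, p = cmh_test([((10, 5), (3, 12)), ((8, 4), (2, 10))])
print(round(stat, 3), p)
```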
Statistical analysis of the association between mgSs, gene clusters, and BV. For L. iners, associations between mgSs and Amsel-BV were evaluated at the participant level using chi-square analyses that compared the proportion of Amsel-BV positive to negative participants within an mgSs to that in all participants containing any L. iners. For both L. iners and G. vaginalis, associations between gene clusters and Amsel-BV were evaluated at the participant level using chi-square analyses that compared the proportion of Amsel-BV positive to negative participants within a gene cluster to that in all participants containing any L. iners or G. vaginalis gene cluster, respectively. A gene cluster was defined as present in a sample when ≥30% of the genes in the cluster were present.
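The gene cluster presence rule and the comparison of proportions above can be sketched as follows. The text does not state the exact chi-square formulation, so this sketch assumes one plausible reading: a goodness-of-fit test of a group's Amsel-BV positive/negative counts against the proportions observed in the reference population (all participants carrying the species). Function names and counts are hypothetical.

```python
from math import erfc, sqrt


def cluster_present(detected_genes, cluster_genes, min_fraction=0.30):
    """A gene cluster is present when >= 30% of its genes are detected in a sample."""
    cluster = set(cluster_genes)
    return len(cluster & set(detected_genes)) / len(cluster) >= min_fraction


def chisq_against_reference(pos, neg, ref_pos, ref_neg):
    """Chi-square (1 df) comparing a group's Amsel-BV +/- counts with the
    proportions observed in the reference population of participants."""
    n = pos + neg
    p_ref = ref_pos / (ref_pos + ref_neg)
    expected = (n * p_ref, n * (1 - p_ref))
    statistic = sum((o - e) ** 2 / e for o, e in zip((pos, neg), expected))
    return statistic, erfc(sqrt(statistic / 2))  # upper tail, 1 df


genes = [f"g{i}" for i in range(1, 11)]
print(cluster_present({"g1", "g2", "g3", "g4"}, genes))  # 4 of 10 genes detected
print(chisq_against_reference(30, 10, 50, 50))
```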
Longitudinal Stability and Shannon Diversity of L. crispatus mgSs. For participants in the HMP cohort who contributed multiple samples, at least one of which was assigned to an L. crispatus mgSs, Yue-Clayton's θ was used to measure microbiota stability for each participant [70]. Here, the 16S rRNA gene amplicon sequencing-based CSTs from all samples from a participant [60] were used to produce a reference centroid, and each sample was then compared to that reference (Yue-Clayton's θ). The mean θ for each participant represented overall microbiota compositional stability, with values closer to 1 indicating higher stability. Shannon's H diversity index was calculated for each sample using the diversity function of the vegan package and was compared between mgSs using the Wilcoxon signed rank test.
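The stability and diversity measures above can be sketched in pure Python. The sketch assumes each sample is represented as a relative-abundance dictionary and uses the standard Yue-Clayton form θ = Σpq / (Σp² + Σq² − Σpq); the participant centroid is taken as the mean profile of that participant's samples. Function names and example profiles are hypothetical, and the Wilcoxon comparison is not shown.

```python
from math import log


def yue_clayton_theta(p, q):
    """Yue-Clayton similarity of two relative-abundance dicts (1 = identical)."""
    keys = set(p) | set(q)
    pq = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    pp = sum(v * v for v in p.values())
    qq = sum(v * v for v in q.values())
    return pq / (pp + qq - pq)


def participant_stability(profiles):
    """Mean theta between each of a participant's samples and their mean profile."""
    keys = set().union(*profiles)
    centroid = {k: sum(pr.get(k, 0.0) for pr in profiles) / len(profiles) for k in keys}
    return sum(yue_clayton_theta(pr, centroid) for pr in profiles) / len(profiles)


def shannon_h(abundances):
    """Shannon's H with the natural logarithm (vegan's default for diversity())."""
    total = sum(abundances.values())
    return -sum(v / total * log(v / total) for v in abundances.values() if v > 0)


stable = [{"L. crispatus": 0.9, "L. iners": 0.1}, {"L. crispatus": 0.95, "L. iners": 0.05}]
shifting = [{"L. crispatus": 1.0}, {"G. vaginalis": 1.0}]
print(participant_stability(stable), participant_stability(shifting))
print(shannon_h({"L. crispatus": 1, "L. iners": 1}))  # ln(2)
```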
Estimating the number of L. crispatus strains. The number of L. crispatus strains in a mgSs was estimated using a pangenome accumulation curve, which was generated by mapping the gene content of publicly available isolate genome sequences (data not shown) to VIRGO (blastn; thresholds: 90% identity, 70% coverage). For each N from 1 to 61, 100 bootstrap combinations of N isolates were selected, and the number of unique L. crispatus Vaginal Orthologous Groups [VOGs; provided in the VIRGO output [29]] encoded in their genomes was determined. A power-law curve relating the number of isolates to the number of VOGs detected was then fit to the resulting data, producing the equation Y = 2057N^0.14, where Y is the number of L. crispatus VOGs detected and N is the estimated number of strains. This equation was then used to estimate the number of L. crispatus strains in each metagenome from the observed number of L. crispatus VOGs in that metagenome. The number of strains in each sample was compared between mgSs using the Wilcoxon signed rank test.
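The strain estimate above inverts the fitted curve: from Y = 2057N^0.14, N = (Y / 2057)^(1/0.14). The text does not state how the curve was fit, so the fitting function below substitutes ordinary least squares on log-log axes as an assumption; the function names are hypothetical, and only the constants 2057 and 0.14 come from the text.

```python
from math import exp, log


def fit_power_law(n_isolates, n_vogs):
    """Least-squares fit of Y = a * N**b on log-log axes; returns (a, b)."""
    xs = [log(n) for n in n_isolates]
    ys = [log(y) for y in n_vogs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = exp(my - b * mx)
    return a, b


def estimate_strains(observed_vogs, a=2057.0, b=0.14):
    """Invert Y = a * N**b to estimate the number of strains N from a VOG count."""
    return (observed_vogs / a) ** (1.0 / b)


print(round(estimate_strains(2300), 2))  # estimated strains for a hypothetical VOG count
```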
Construction of the random forests for mgSs classification. Random forests were constructed for mgSs classification using the R package randomForestSRC (v2.12.1) [71]. A random forest was built for each species (n=28), where the training data contained gene presence/absence values, with gene presence defined as above for mgSs. Random forest classification analysis was implemented with all predictors included in a single model; for each mgSs random forest, the predictors were all genes in the species. Ten-fold cross-validation (90% of data as training, 10% as testing) was performed, wherein each training set was used to build and tune a random forest model using the tune.rfsrc function. A random forest model with the optimal parameters was then used to predict mgSs classifications for the test set, and out-of-bag error estimates (misclassification error) are reported. The overall misclassification error is the average misclassification error across folds, and the “correct” assignment is based on the original hierarchical clustering assignment. The final models included all data and the optimal tuning parameters determined for each species. For mgSs assignment, the mgSs with the highest probability (based on the proportion of tree votes) is assigned, and the user is provided with both the assignment and its probability as a measure of confidence.
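The random forests themselves were built with randomForestSRC in R and are not reimplemented here. The sketch below illustrates only two surrounding steps described above: splitting samples into ten folds (about 90% training and 10% testing each) and turning per-tree votes into an mgSs assignment plus a vote-share probability. Function names and labels are hypothetical.

```python
import random
from collections import Counter


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def assign_by_votes(tree_votes):
    """Return (mgSs, probability), where probability is the winning share of tree votes."""
    counts = Counter(tree_votes)
    label, n_votes = counts.most_common(1)[0]  # ties resolve to the first label seen
    return label, n_votes / len(tree_votes)


print(assign_by_votes(["mgSs 1"] * 70 + ["mgSs 2"] * 30))  # ('mgSs 1', 0.7)
```

Reporting the vote share alongside the label is what allows a user to treat low-probability assignments with caution.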
Construction of the nearest centroid classifier for mgCSTs. Using mgCSTs as defined above, reference centroids were produced from the mean relative abundance of each mgSs in a mgCST. For classification, the similarity of a sample to each reference centroid is determined using Yue-Clayton's θ [70]. Compared to Jensen-Shannon distance, the Yue-Clayton θ depends more on metagenomic subspecies at high relative abundance than on those at low relative abundance. Samples are assigned to the mgCST with which they share the highest similarity, and the degree of similarity to that mgCST can be taken as a measure of confidence in the assignment. Ten-fold cross-validation was applied, wherein each training set was used to build the reference centroids and each test set was used for assignment. The misclassification error for each fold was calculated as 1 minus the number of correct assignments (based on the original hierarchical clustering assignment) divided by the total number of assignments. The overall misclassification error is the average misclassification error across folds.
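The nearest centroid step above can be sketched in pure Python. Profiles are assumed to be dictionaries of mgSs relative abundances, and the labels "A" and "B" are placeholders rather than actual mgCST numbers; the function names are hypothetical and this is not the published R classifier.

```python
def yue_clayton_theta(p, q):
    """Yue-Clayton similarity of two relative-abundance dicts (1 = identical)."""
    keys = set(p) | set(q)
    pq = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    pp = sum(v * v for v in p.values())
    qq = sum(v * v for v in q.values())
    return pq / (pp + qq - pq)


def build_centroids(profiles, labels):
    """Reference centroid per mgCST: mean relative abundance of each mgSs."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    centroids = {}
    for label, members in groups.items():
        keys = set().union(*members)
        centroids[label] = {k: sum(m.get(k, 0.0) for m in members) / len(members) for k in keys}
    return centroids


def assign_mgcst(profile, centroids):
    """Assign to the most similar centroid; return (label, theta) as a confidence score."""
    scores = {label: yue_clayton_theta(profile, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def misclassification_error(predicted, truth):
    """1 - (correct assignments / total assignments) for one cross-validation fold."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 1 - correct / len(truth)


train = [
    {"L. crispatus mgSs 1": 0.9, "L. iners mgSs 2": 0.1},
    {"L. crispatus mgSs 1": 0.8, "L. iners mgSs 2": 0.2},
    {"G. vaginalis mgSs 3": 0.7, "A. vaginae mgSs 1": 0.3},
    {"G. vaginalis mgSs 3": 0.6, "A. vaginae mgSs 1": 0.4},
]
centroids = build_centroids(train, ["A", "A", "B", "B"])
print(assign_mgcst({"L. crispatus mgSs 1": 0.85, "L. iners mgSs 2": 0.15}, centroids))
```

Averaging misclassification_error over the ten folds gives the overall error described above.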
Running the mgCST classifier. The required inputs are direct outputs from VIRGO [29]: the taxonomic abundance table ("summary.Abundance.txt") and the gene abundance table ("summary.NR.abundance.txt"). Taxonomic and gene column headings must match those output by VIRGO. The output is a count table with samples as rows, taxa as columns, and counts normalized by gene length as values. Additional columns give each sample's mgCST classification and its Yue-Clayton similarity scores for all 27 mgCSTs. A heatmap is also produced showing taxon relative abundances in samples, with samples labeled by assigned mgCST. Substantial differences between a sample's profile and the other samples in its assigned mgCST may indicate either a mismatch in taxonomic or gene names or the need for an additional mgCST. The classifier is contained in an R script available at github.com/ravel-lab/mgCST-classifier. It combines the random forest models (for mgSs) with the Yue-Clayton θ nearest centroid model (for mgCSTs) and therefore requires the training sets available at https://figshare.com/account/home#/projects.
Validation of the mgCST classifier in external datasets. Three external, publicly available vaginal metagenome datasets were used to validate the generalizability of mgCST assignments beyond the training dataset: ENA PRJEB34536 [72], NIH PRJNA576566 [73], and NIH PRJNA779415 [74]. Briefly, host reads were removed using BMTagger [75] and the GRCh38_p12 human reference genome. Ribosomal RNA reads were removed using SortMeRNA [76], and reads were quality filtered using fastp [77] with a minimum length of 50 bp and a mean quality of 20 in a sliding window of 4 bp. The remaining reads were processed through VIRGO using default settings (virgo.igs.umaryland.edu) [29], and the summary tables were used to assign mgSs and mgCSTs with the mgCST classifier. Generalizability of the mgSs and mgCST reference datasets is illustrated using the probability of assignment from the mgSs random forest classifier and the Yue-Clayton similarity scores of the assigned mgCSTs. All bioinformatic and statistical analyses are available in R Markdown notebooks (Files S7 and S8).
Metagenomic Community State Types (mgCST) of the Vaginal Microbiome.
The within-species bacterial genomic diversity was evaluated in 1,890 vaginal metagenomes from 1,024 reproductive-age women, most of them North American (98.7% of samples) (Table 1). Vaginal metagenomes derived from five cohort studies, as well as metagenomes generated to build the vaginal non-redundant gene database (VIRGO [29]), were used to construct mgCSTs as described above. In total, 135 metagenomic subspecies (mgSs) from 28 species were identified by hierarchical clustering of species-specific gene presence/absence profiles (data not shown). Subsequent hierarchical clustering of samples based on mgSs compositional data produced 27 mgCSTs (Table 2). Cluster stability was ≥0.75 for most mgCSTs (Table 2). MgCSTs consisted of mgSs from commonly observed vaginal species, including L. crispatus (mgCSTs 1-6, 19% of samples), L. gasseri (mgCSTs 7-9, 3% of samples), L. iners (mgCSTs 10-14, 23% of samples), L. jensenii (mgCSTs 15 and 16, 4.6% of samples), “Ca. Lachnocurva vaginae” (mgCSTs 17-19, 7.5% of samples), G. vaginalis (mgCSTs 20-25, 36.3% of samples), and Bifidobacterium breve (mgCST 26, 0.74% of samples) (
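Elsewhere in this section, mgSs and mgCST clusters are described as derived from hierarchical clustering of Jensen-Shannon distances. The sketch below computes that distance for two compositional profiles. Whether the square-root distance or the divergence itself was used, and with which log base, is not stated here; base 2 and the square root are assumptions, and the linkage method is likewise unspecified, so only the distance is shown.

```python
import math
from typing import Sequence


def _kl_bits(p: Sequence[float], m: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms where p_i is 0 contribute 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the base-2 Jensen-Shannon divergence (bounded in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Usage: two hypothetical mgSs profiles that share a dominant mgSs
a = [0.70, 0.20, 0.10, 0.00]
b = [0.70, 0.10, 0.10, 0.10]
print(round(jensen_shannon_distance(a, b), 3))
```

Unlike Yue-Clayton's θ, this distance is sensitive to differences among low-abundance members (such as the 0.00 vs 0.10 entry above), which is the contrast drawn in the Methods.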
Database (VIRGO, virgo.igs.umaryland.edu) [29]; the University of Maryland Baltimore Human Microbiome Project (UMB-HMP; PRJNA208535, PRJNA575586, PRJNA797778); the National Institutes of Health Human Microbiome Project (NIH-HMP; phs000228); Li et al. [60] (PRJEB24147); and the Longitudinal Study of Vaginal Flora and Incident STI (LSVF; dbGaP project phs002367).
Table: mgCSTs and their predominant metagenomic subspecies (Lactobacillus spp. and Gardnerella vaginalis mgSs), with the number of samples from each source.
Vaginal mgCSTs and Demographics.
Race and Age. Race information was available for 1,441 samples from 858 women. Most women identified as either Black (71%) or White (20%), and the remainder as Asian (6.3%), Hispanic (2.2%), or other (<1%) (Table 1). Age was reported for 1,623 samples from 897 individuals and ranged from 15 to 45 years. After adjusting for between-cohort heterogeneity, certain race and age categories were associated with mgCSTs (
Nugent Scores and Vaginal pH. Of the 968 women for whom Nugent scores were available, 48% had low Nugent scores (0-3), 20% had intermediate scores (4-6), and 32% had high scores (7-10) (Table 1). The Nugent scoring system is a Gram stain-based method for diagnosing bacterial vaginosis from vaginal smears. A vaginal smear is spread on a microscope slide, Gram stained, and examined for three bacterial morphotypes: Lactobacillus (large Gram-positive rods), Gardnerella/Bacteroides (small Gram-variable or Gram-negative rods), and curved Gram-variable rods (Mobiluncus). Each morphotype is scored according to its abundance per oil-immersion field, and the three scores are summed to give a total score between 0 and 10. A score of 0-3 is considered negative for BV, 4-6 intermediate, and 7-10 positive for BV.
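The scoring described above can be made concrete with a short sketch. It assumes the standard Nugent grading, in which each morphotype is first graded 0 to 4+ by count per oil-immersion field (0 = none, 1+ = <1, 2+ = 1-4, 3+ = 5-30, 4+ = >30), and that this matches the scoring used in the cohorts; the function names are hypothetical.

```python
def nugent_score(lacto_grade: int, gardnerella_grade: int, mobiluncus_grade: int) -> int:
    """Total Nugent score (0-10) from semi-quantitative morphotype grades (0-4)."""
    for grade in (lacto_grade, gardnerella_grade, mobiluncus_grade):
        if not 0 <= grade <= 4:
            raise ValueError("grades must be between 0 and 4")
    lacto = 4 - lacto_grade                          # fewer lactobacilli -> higher score (0-4)
    gardnerella = gardnerella_grade                  # more Gardnerella/Bacteroides -> higher (0-4)
    mobiluncus = (0, 1, 1, 2, 2)[mobiluncus_grade]   # curved rods contribute at most 2
    return lacto + gardnerella + mobiluncus


def interpret_nugent(score: int) -> str:
    """Map a total score to the categories used in the text."""
    if score <= 3:
        return "negative"
    if score <= 6:
        return "intermediate"
    return "BV"


# Usage: abundant lactobacilli, no BV morphotypes
print(nugent_score(4, 0, 0), interpret_nugent(nugent_score(4, 0, 0)))  # 0 negative
```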
Vaginal pH was available for 979 women; of these, 31% had low pH (<4.5) and 69% had high pH (≥4.5) (Table 1). Both Nugent score and vaginal pH were associated with mgCSTs after adjusting for between-cohort heterogeneity (
Amsel-BV and Vaginal Symptoms. Of 627 women, each with a vaginal sample and same-day clinical examination data (n=607 from LSVF cohort, n=20 from HMP cohort), 40.3% had asymptomatic Amsel-BV and 5.5% had symptomatic Amsel-BV diagnoses. Twelve percent of Amsel-BV cases were symptomatic. Diagnosis of Amsel-BV was associated with mgCSTs (
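As a consistency check, the proportion of Amsel-BV cases that were symptomatic follows from the two prevalences reported above:

```latex
\frac{5.5\%}{40.3\% + 5.5\%} = \frac{5.5}{45.8} \approx 0.120 = 12.0\%
```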
Functional Potential of mgCSTs and Metagenomic Subspecies.
L. crispatus mgCSTs differ by species diversity, stability, and the potential to produce D-lactic acid. L. crispatus is known to produce both L- and D-lactic acid, which acidifies the vaginal environment and confers protective properties [4, 10, 31, 32]. Metagenomic analyses revealed differences among L. crispatus mgSs. First, VIRGO identified two L- and two D-lactate dehydrogenase genes in L. crispatus. All genes were present in L. crispatus mgSs except for mgSs 2. MgSs 2 was missing a D-lactate dehydrogenase gene (V1806611) that has 96.1% identity to a functionally validated ortholog, P30901.2 (
L. iners metagenomic subspecies are associated with Amsel-BV diagnoses. The role of L. iners in the vaginal microbiome is not fully understood because it has been implicated in both healthy and BV states [34]. Sixty-five percent of samples containing L. iners mgSs 4 were Amsel-BV positive, significantly more than among all samples harboring any L. iners mgSs (45.8%, p=1.1e−6,
Next, the association of individual L. iners genes with Amsel-BV was evaluated. Most samples in L. iners mgSs 4 contained genes from cluster 6 (yellow gene cluster,
As previously mentioned, positive Amsel-BV diagnoses were common in G. vaginalis mgCSTs 20, 22, and 24, while mgCST 23 contained relatively more Amsel-BV-negative than Amsel-BV-positive samples (
Automated Classification of mgCSTs Using Random Forest Models.
Random forest models were built for each species (n=28) and used to assign the 135 mgSs as described above. There was good concordance between mgSs assigned by hierarchical clustering of Jensen-Shannon distances and random forest-based assignments, with κ>0.8 for most species (data not shown). Ten-fold cross-validation of the classifier showed that the misclassification error for mgSs assignment ranged from 0 to 30% (data not shown). The error estimates for most major vaginal taxa were near or below 10%, with L. gasseri having the lowest (2.2%). L. iners consistently yielded higher misclassification error estimates (20%) despite attempts to fine-tune the model, likely because of high genetic homogeneity between, and heterogeneity within, L. iners mgSs. Following mgSs assignment, mgCSTs were assigned using the nearest centroid classification method, as previously used for taxonomy-based vaginal community state type assignments [11]. Good concordance was observed between mgCSTs assigned by hierarchical clustering of Jensen-Shannon distances and nearest centroid-based assignments (κ=0.78, data not shown). Ten-fold cross-validation of centroid classification showed a mean misclassification error of 9.6%, with some mgCSTs classified more accurately than others (data not shown).
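Assuming the reported κ is unweighted Cohen's kappa between the clustering-derived labels and the classifier's labels, it corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch with hypothetical labels:

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa between two label assignments of the same samples."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # p_o
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # p_e
    if expected == 1.0:
        return 1.0  # both assignments use one identical label; kappa is undefined
    return (observed - expected) / (1 - expected)


# Usage: hierarchical clustering labels vs. classifier labels (hypothetical)
clustering = ["A", "A", "A", "B"]
classifier = ["A", "A", "B", "B"]
print(cohens_kappa(clustering, classifier))  # 0.5
```

Here raw agreement is 75%, but half of that agreement is expected by chance, so κ is 0.5; this is why κ>0.8 is a stricter criterion than a high percentage agreement.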
Three external, publicly available metagenomic datasets illustrate generalizability of the mgCSTs assignment (
The source code for the mgCST classifier is an R script and is available at https://github.com/ravel-lab/mgCST-classifier and uses direct outputs from VIRGO.
Bacterial Vaginosis (BV) and Recurrence (rBV).
As described above, 27 mgCSTs were characterized from 1,898 samples and each was dominated by different strain combinations of commonly observed vaginal bacteria (
(i) Metronidazole treatment does not prevent recurrence. Certain BV-like mgCSTs, namely CLV 19 and Gv 22, are prognostic of recurrent Amsel-BV. A retrospective case-control analysis was performed within the Longitudinal Study of Vaginal Flora (LSVF) [24], an observational study that recruited reproductive-age, nonpregnant women presenting for routine health care visits at 1 of 12 health department clinics in Birmingham, Alabama, from August 1999 to February 2002. Clinical evaluations, participant surveys, and cervicovaginal lavages (CVL) were collected at each visit. BV was diagnosed by the presence of 3 of 4 Amsel's criteria [21]. The primary outcomes of interest were BV resolution (controls, n=232) or recurrence (cases, n=402) after an index BV event, defined as a clinical diagnosis of Amsel-BV during the study period irrespective of patient-reported BV symptoms (foul odor, vaginal discharge, itching, or burning) (
In this multinomial model accounting for known BV risk factors: race [83], prior history of BV [84,85], and smoking status [86], as well as personal hygiene and sexual practices during the interval prior to the outcome, receiving metronidazole treatment was not associated with BV resolution and, in fact, trended towards lower odds of resolution compared to recurrence (OR: 0.6, 95% CI: 0.3-1.1), emphasizing the long-term ineffectiveness of the current CDC standard-of-care guidelines for BV [58,84,87,90].
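The odds ratios above come from a multinomial model adjusted for covariates, which is not reproduced here. As a simpler illustration of how an odds ratio and its 95% confidence interval are read, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2×2 table; the counts are hypothetical. An interval that includes 1, such as the 0.3-1.1 reported for metronidazole, indicates no statistically significant association.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald 95% CI from a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def ci_excludes_null(lower: float, upper: float) -> bool:
    """True when the interval does not contain OR = 1."""
    return not (lower <= 1.0 <= upper)


# Usage with hypothetical counts
or_, lo, hi = odds_ratio_wald(20, 10, 10, 20)
print(round(or_, 2), round(lo, 2), round(hi, 2), ci_excludes_null(lo, hi))
```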
Comparatively, certain mgCSTs at the index visit, namely CLV 19 and Gv 22, were associated with 10-fold and nearly 20-fold increased odds of BV recurrence, respectively, compared to Lactobacillus-dominated microbiomes, while Gv mgCSTs 23 and 25 were not related to recurrence. These findings remain significant when accounting for hygiene practices and sexual behaviors following the index visit, implying that reinfection through sexual behavior is unlikely to explain these associations. These data show for the first time that mgCST-based definitions of BV can identify recurrent BV, presenting an opportunity to develop diagnostic tests and novel treatments tailored to prevent recurrence. Thus far, genomic investigations have revealed that Gv 22 differs from other Gv mgCSTs in its potential to bind host epithelia and has added potential for mucin degradation through Gardnerella sp. 11 and 13, which are unique to mgCST 22 (
(ii) Functionally distinct vaginal microbiomes predict BV recurrence and resolution. Using traditional amplicon sequencing-based characterizations, the odds of recurrence versus resolution were 6-fold greater in all non-Lactobacillus-predominated communities than in Lactobacillus-abundant microbiota (
(iii) Microbial mechanisms associated with BV recurrence. Leading hypotheses about the recurrence of BV include metronidazole resistance through ferredoxin/ferredoxin-NADP reductase (FNR) and nitroimidazole reductase, the production of polymicrobial biofilms, and sexual partner reinfection [88-92], though the mechanisms underlying these hypotheses have yet to be established. Metagenomic data were mapped to VIRGO2, an updated version of the vaginal non-redundant gene database [94]. First, Gardnerella species compositions differ across mgCSTs 20-25, and BV was associated with the mgCSTs containing a greater diversity of Gardnerella species [35], though the functional repertoires of these communities remain unevaluated. The metagenomic data presented herein already provide some novel insights: the presence of additional Gardnerella species adds virulence factors such as vaginolysin, hemolysin, the muralytic enzyme precursor Rpf2, a vancomycin resistance protein, and the glycogen-debranching enzymes pullulanase and oligo-1,6-glucosidase [82]. Next, a COG-directed analysis identified "Extracellular Structure" proteins in S. vaginalis, S. amnii, and Dialister that were associated with recurrence. Specifically, adhesin proteins homologous to those of Burkholderia pseudomallei, the intracellular agent responsible for melioidosis, were identified [95,96], as were homologs of the trimeric autotransporter YadA, a fibronectin-binding adhesin required for epithelial cell invasion by enteropathogenic Yersinia enterocolitica [97]. Hypotheses about the role of biofilms in BV exist [90], but critical evidence of such bacterial adherence in the vagina is lacking.
Further Studies on Bacterial Vaginosis Recurrence (rBV) and Persistence.
Rationale & Hypotheses. Because CLV 19 and Gv 22 were associated with BV recurrence, and this association is hypothesized to reflect the persistence of specific species or strains in these communities (for example, through biofilm formation), this study will address a significant knowledge gap by testing the following hypotheses.
Study design. To test these hypotheses, metagenomic sequencing data will be generated from LSVF cervicovaginal lavages collected at the 3- and 6-month follow-up visits of participants who experienced BV recurrence (n=530 samples; index BV event data have already been generated), as well as at the index, 3- and 6-month follow-up visits of 196 participants who experienced BV persistence (n=588 samples).
Description of the study cohort. This aim seeks to use archived samples and data originally collected in the NIH's Longitudinal Study of Vaginal Flora (LSVF, Z01-HD002535), in which 3,620 reproductive-age women were followed for 12 months and assessed quarterly with clinical examinations between August 1999 and February 2002, yielding 13,591 clinical visits. At each visit, participants underwent a pelvic exam and were surveyed on symptoms, demographics, and behaviors. For the exam, the clinician placed a speculum, unlubricated or lubricated with water, in the vagina. The quality and consistency of any vaginal discharge were described. A vaginal swab was touched to a ColorpHast stick and the pH was read. A traditional wet mount (microscopy) was performed to assess for clue cells and to perform the whiff test, per Amsel's clinical criteria for the diagnosis of BV. Finally, cervicovaginal lavage (CVL) was collected, aliquoted, and stored at −80 °C for further testing. Patients with asymptomatic BV are defined as those who did not report vaginal symptoms on direct questioning but met at least 3 of the 4 Amsel's criteria based on clinician exam (thin, homogeneous vaginal discharge; vaginal pH >4.5; clue cells >20% of epithelial cells; and a positive whiff test). Patients with symptomatic BV met at least 3 of the 4 Amsel's criteria and reported vaginal symptoms (discharge, vaginal irritation, itching, burning, foul odor, or other) when questioned, and were treated with standard of care (metronidazole or clindamycin). STIs were tested for at each visit as follows: N. gonorrhoeae by culture, C. trachomatis by ligase chain reaction (LCR), and T. vaginalis by wet mount. STIs will be controlled for in analyses. All participants were HIV-negative at enrollment. At each visit, participants underwent a detailed interview with a female interviewer.
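The case definitions in this paragraph can be expressed as a small decision rule. The four criteria and their thresholds (pH >4.5, clue cells >20% of epithelial cells) are taken from the text; the record structure and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AmselExam:
    """Clinician findings and symptom report at one visit (hypothetical fields)."""
    ph: float
    homogeneous_discharge: bool
    clue_cell_percent: float  # clue cells as a percentage of epithelial cells
    positive_whiff: bool
    reports_symptoms: bool


def amsel_criteria_met(exam: AmselExam) -> int:
    """Count how many of the four Amsel's criteria are satisfied."""
    return sum([
        exam.homogeneous_discharge,
        exam.ph > 4.5,
        exam.clue_cell_percent > 20,
        exam.positive_whiff,
    ])


def classify_bv(exam: AmselExam) -> str:
    """'negative', 'asymptomatic', or 'symptomatic' Amsel-BV (at least 3 of 4 criteria)."""
    if amsel_criteria_met(exam) < 3:
        return "negative"
    return "symptomatic" if exam.reports_symptoms else "asymptomatic"


# Usage: three criteria met, no symptoms reported on direct questioning
visit = AmselExam(ph=5.0, homogeneous_discharge=True, clue_cell_percent=25,
                  positive_whiff=False, reports_symptoms=False)
print(classify_bv(visit))  # asymptomatic
```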
Metagenomic Sequencing. Accounting for the data available from the preliminary study (index BV events in the BV recurrence group, from NIH K01), 1,118 additional samples will require metagenomic sequencing. DNA will be extracted using a validated procedure for CVL specimens (~700 μl) that consistently provides 10-30 μg of high-quality DNA adequate for metagenomic analyses. The aim is to obtain at least 40 million high-quality reads per sample to ensure cost-effective yet sufficient coverage for mgCST classification and genome assembly. Metagenomic libraries will be sequenced on a NovaSeq 6000 platform using S4 flow cells at 60 samples per flow cell.
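The sample counts in the study design and the sequencing plan can be cross-checked with simple arithmetic. The flow-cell estimate assumes every flow cell is loaded at the stated 60 samples, with the last one partially filled; the constant names are illustrative.

```python
import math

RECURRENCE_SAMPLES = 530          # 3- and 6-month visits, BV recurrence group
PERSISTENCE_PARTICIPANTS = 196    # BV persistence group
VISITS_PER_PERSISTENT = 3         # index, 3- and 6-month visits
SAMPLES_PER_FLOW_CELL = 60
TARGET_READS_PER_SAMPLE = 40_000_000

persistence_samples = PERSISTENCE_PARTICIPANTS * VISITS_PER_PERSISTENT
total_samples = RECURRENCE_SAMPLES + persistence_samples
flow_cells = math.ceil(total_samples / SAMPLES_PER_FLOW_CELL)
min_total_reads = total_samples * TARGET_READS_PER_SAMPLE
min_reads_per_flow_cell = SAMPLES_PER_FLOW_CELL * TARGET_READS_PER_SAMPLE

print(persistence_samples, total_samples, flow_cells, min_total_reads)
# 588 1118 19 44720000000
```

The 1,118 total matches the number of additional samples stated above, and at 60 samples per flow cell the plan requires 19 S4 flow cells.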
Sequence processing. Human reads will first be removed from raw sequencing reads using BMTagger, with the GRCh38 human genome as the reference. The remaining non-human reads will then be quality filtered using fastp (v.0.21) to remove polyG tails, reads shorter than 75 bp, and low-quality reads (-g -l 75 -3 -W 4 -M 20). After quality filtering, the remaining reads will be mapped to the vaginal non-redundant gene database, VIRGO, and mgCSTs will be assigned with the classifier. Quality-filtered reads will also be assembled into contigs using metaSPAdes. Paired-end reads, as well as unpaired reads, will be passed to the assembler, which will be run with k-mer sizes of 21, 33, and 55. The resulting contigs will then be binned using three different tools: MetaBAT, MaxBin, and MetaDecoder. For MetaBAT and MetaDecoder, contigs will first be indexed, and reads aligned to the contigs, with Bowtie2 (bowtie2-build and bowtie2, respectively). With SAMtools, the alignment results will be output in SAM format, filtered to retain only properly paired reads (samtools view -f 0x2), converted to sorted BAM files (samtools sort), and indexed (samtools index). Next, binning will be performed using MetaBAT with a maximum edge threshold of 1000 (--maxEdges 1000). Similarly, the indexed and aligned contigs will be the input for MetaDecoder: contig coverage will be calculated from the aligned SAM files using MetaDecoder's coverage tool, single-copy marker genes will be mapped to the contigs using the seed function, and contigs will be clustered into bins using metadecoder cluster with a minimum contig length of 1,000 bp. MaxBin will be used to bin contigs longer than 1,000 bp (-min_contig_length 1000). The bins generated by MetaBAT, MaxBin, and MetaDecoder will be refined using metaWRAP, with a minimum completeness threshold of 70% (-c 70) and a maximum contamination threshold of 5% (-x 5).
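The fastp options above correspond to polyG tail trimming (-g, --trim_poly_g), a minimum read length (-l 75, --length_required), and 3' sliding-window trimming (-3, --cut_tail) with a 4-base window (-W 4) and mean quality threshold of 20 (-M 20). The sketch below approximates these operations in plain Python to show what they do to a read. It is a simplified illustration, not fastp's implementation: exact G runs only (fastp tolerates mismatches), a minimum polyG run of 10 is assumed, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def trim_poly_g(seq: str, min_run: int = 10) -> int:
    """Length kept after removing a 3' run of at least `min_run` G bases.

    Simplification: only exact G runs are recognized (no mismatches).
    """
    stripped = seq.rstrip("G")
    run = len(seq) - len(stripped)
    return len(stripped) if run >= min_run else len(seq)


def cut_tail(quals: Sequence[int], window: int = 4, mean_q: float = 20) -> int:
    """Length kept after 3' sliding-window trimming.

    The window starts at the 3' end; while its mean Phred quality is below
    `mean_q`, the 3'-most base is dropped and the window shifts one base.
    """
    end = len(quals)
    while end >= window and sum(quals[end - window:end]) / window < mean_q:
        end -= 1
    return end


def trim_and_filter(seq: str, quals: Sequence[int], min_len: int = 75) -> Optional[str]:
    """Apply polyG and tail trimming; discard reads shorter than `min_len`."""
    keep = trim_poly_g(seq)
    keep = cut_tail(quals[:keep])
    trimmed = seq[:keep]
    return trimmed if len(trimmed) >= min_len else None


# Usage: a 90-bp read whose last 10 bases have low quality
read = "A" * 90
quals = [30] * 80 + [10] * 10
print(len(trim_and_filter(read, quals)))  # 82
```

Because the window stops as soon as its mean reaches the threshold, a few low-quality bases can remain at the new 3' end (82 bp kept here rather than 80); this is inherent to window-mean trimming.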
The quality of the refined bins will be assessed using CheckM, and taxonomic classification will be performed on the final set of bins using GTDB-Tk (v.2.0.0).
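The refinement thresholds (completeness ≥70%, contamination ≤5%) amount to a simple filter on per-bin quality estimates such as those CheckM reports. In this sketch, the input dictionary layout is hypothetical (not CheckM's output format), and whether the bounds are inclusive is an assumption.

```python
from typing import Dict, List

MIN_COMPLETENESS = 70.0   # metaWRAP -c 70
MAX_CONTAMINATION = 5.0   # metaWRAP -x 5


def keep_bin(completeness: float, contamination: float) -> bool:
    """True if a bin meets both thresholds (inclusive bounds assumed)."""
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION


def select_bins(stats: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of bins passing the thresholds, sorted alphabetically."""
    return sorted(
        name for name, s in stats.items()
        if keep_bin(s["completeness"], s["contamination"])
    )


# Usage with hypothetical CheckM-style estimates (percent)
checkm = {
    "bin.1": {"completeness": 95.2, "contamination": 1.1},
    "bin.2": {"completeness": 64.8, "contamination": 0.4},
    "bin.3": {"completeness": 88.0, "contamination": 7.3},
}
print(select_bins(checkm))  # ['bin.1']
```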
Statistical Analyses. H1: Identifying mgCSTs associated with BV persistence. Applying the same methods used to identify mgCSTs associated with recurrent BV (
Table: mgCST groups, namely Lactobacillus crispatus/gasseri/jensenii, Lactobacillus iners (five entries), and Gardnerella (six entries).
H2: Determining the rates of new BV infections among recurrent and persistent BV. Among participants that experienced either recurrent or persistent BV (
H3: Quantifying the proportions of strains shared in recurrent and persistent BV. To better understand the temporal dynamics of the microbiome during persistent and recurrent BV, the degree of strain sharing between BV events within individuals with recurrent or persistent BV will be quantified using InStrain (
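One way to summarize strain sharing is the fraction of genomes detected at both BV events whose population-level ANI (popANI, as reported by InStrain's compare workflow) meets a same-strain threshold. This sketch assumes the commonly used 99.999% popANI threshold and a hypothetical dictionary of popANI values expressed as fractions; it does not parse InStrain output.

```python
from typing import Dict

SAME_STRAIN_POP_ANI = 0.99999  # assumed same-strain threshold (99.999% popANI)


def shared_strain_fraction(pop_ani: Dict[str, float],
                           threshold: float = SAME_STRAIN_POP_ANI) -> float:
    """Fraction of genomes compared between two BV events that meet the threshold."""
    if not pop_ani:
        raise ValueError("no genomes were compared")
    shared = sum(1 for value in pop_ani.values() if value >= threshold)
    return shared / len(pop_ani)


# Usage: hypothetical popANI values between a participant's index and recurrent BV event
comparison = {
    "G. vaginalis": 0.999995,
    "Ca. Lachnocurva vaginae": 1.0,
    "L. iners": 0.99990,
    "S. amnii": 0.99950,
}
print(shared_strain_fraction(comparison))  # 0.5
```

Under this summary, a high shared fraction at recurrence would be more consistent with persistence of the same strains than with reinfection.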
While CSTs, based on 16S rRNA gene amplicons, provide insight into the species composition of the vaginal microbiota, metagenomic CSTs (mgCSTs), based on whole genome shotgun sequencing, offer additional information on the functional potential of the vaginal microbiome.
In one study, mgCSTs were assigned in a subset of a cohort (n=708 samples, 1:1 case to matched control). Compared with the reference L. crispatus-dominated mgCST 1, G. vaginalis-dominated mgCST 22 presented 3.5-fold higher odds of STI acquisition (aOR: 3.53, 95% CI: 1.8-7.9, p=0.001,
Recent findings that motivated the development of mgCST classification include studies showing that multiple strains of the same species are commonly observed in the vaginal microbiome [29] and that samples can be clustered into metagenomic subspecies (mgSs) defined by unique strain combinations represented by species-specific gene sets, and thus by unique sets of functions. These critical observations led to the conceptualization of a classification of vaginal microbiomes based on their mgSs composition and abundance, and thus defined by both species composition and function: metagenomic community state types (mgCSTs). MgCSTs describe vaginal microbiomes through a new lens, one that includes both compositional and functional dimensions.
L. iners-predominated vaginal microbiota have been associated with increased risk of bacterial vaginosis (BV) [39, 40]. Longitudinal observational prospective studies support this conclusion and present several critical findings: 1) L. iners is often detected at low to medium abundance during episodes of BV and commonly dominates the vaginal microbiota after metronidazole treatment for BV, and 2) L. iners-predominated vaginal microbiota are more prevalent prior to incident BV [41, 42]. The frequency of L. iners-predominated vaginal microbiota was high in Black and Hispanic women (31.4% and 36.1%, respectively), both of whom experience a disproportionate prevalence of BV in the US, with reported rates of 33.2% and 30.7%, respectively (compared to 22.7% and 11.1% in White and Asian women) [43]. Interestingly, L. iners-predominated vaginal microbiota were even more frequent in North American Asian women in this study, as was shown previously by Ravel et al. [1].
MgCST classification provides insight into this apparent contradiction of the prevailing dogma linking L. iners to increased risk of BV. L. iners mgSs 4 was associated with Amsel-BV, while L. iners mgSs 3 (which predominates mgCST 12) was significantly associated with negative BV diagnoses and was most frequently observed in Asian women. This is the first evidence of genetically distinct combinations of L. iners strains (mgSs) in healthy versus BV-like states. This critical finding points to the possibility of beneficial L. iners-dominated microbiomes, which had not previously been demonstrated.
The analyses presented herein also identified a specific set of L. iners genes associated with positive Amsel-BV diagnoses. Macklaim et al. 2018 reported marked differences in L. iners gene expression between two control patients and two patients diagnosed with BV, including increased expression of CRISPR-associated protein genes in BV samples [44]. However, the mgSs analysis of L. iners presented herein indicates that BV and non-BV microbiomes differ not simply in the expression of a common gene pool but also in which L. iners mgSs are present. In microbiomes from women with Amsel-BV diagnoses, L. iners was enriched for functions related to host immune evasion and host colonization. For example, serine/threonine-protein kinases (STPKs) contribute to resistance to phagocytosis by macrophages, invasion of host cells including epithelial cells and keratinocytes, antibiotic resistance, disruption of the NF-κB signaling pathway, and mucin binding [45]. Bacteria attached to host epithelial cells (clue cells) are a hallmark of high Nugent scores (a bacterial morphology-based definition of bacterial vaginosis) and a criterion in Amsel-BV diagnoses [21, 25]. L. iners can appear as Gram-variable cocci (like G. vaginalis) or rods [46, 47], and the data presented herein suggest that certain strains of L. iners (specifically those containing gene cluster 6) may adhere to epithelial cells, contributing to the appearance of clue cells. In addition, epithelial cell adherence could make certain L. iners strains more difficult to displace from the vaginal environment and contribute to the common observation of L. iners following antibiotic treatment [48]. Interestingly, like L. iners mgSs 4, mgSs of “Ca. Lachnocurva vaginae” were strongly associated with Amsel-BV and were not found in the vaginal microbiomes of Asian women in this study. Together with the observation that Asian women in this study were more likely to have L. iners mgSs 3 than any other L. iners mgSs and were less burdened with Amsel-BV, this leads to the hypothesis that selective pressures by the host environment may result in niche specialization by vaginal bacteria. Sources of selective pressure could relate to host-provided nutrient availability (e.g., mucus glycan composition), the host innate and adaptive immune system, the circulation of other species' mgSs in a population, or any combination of these.
Several distinct mgCSTs are strongly associated with Amsel-BV. Critically, these data support the need for an improved definition of BV and the importance of a personalized approach to treatment. “Ca. Lachnocurva vaginae”-predominated mgCSTs were strongly associated with asymptomatic Amsel-BV and contained more high Nugent scores than other mgCSTs. Conversely, intermediate Nugent scores were most prevalent in G. vaginalis-predominated mgCSTs, and only three of these six mgCSTs were associated with Amsel-BV, which suggests that not all G. vaginalis-dominated microbiomes are related to Amsel-BV. G. vaginalis contains vast genomic diversity, supporting its split into different genomospecies [38, 49, 50]. Different genomospecies can co-exist, and the data presented herein show that G. vaginalis-predominated mgSs represent unique combinations of genomospecies and of strains within these genomospecies. MgCSTs 20-22 contain high Gardnerella genomospecies diversity, which has been associated with positive Amsel-BV diagnoses in studies using qPCR or transcriptomic data to define Gardnerella species [49, 51-53]. The data presented herein corroborate these reports and further indicate that mgCSTs with more Gardnerella genomospecies carry more gene variants coding for virulence factors, such as the cholesterol-dependent pore-forming cytotoxin vaginolysin and sialidase (neuraminidase), thus expanding the functional redundancy of these enzymes and potentially contributing to the association with positive Amsel-BV diagnoses [54-56]. However, mgCST 24, which is composed of G. vaginalis mgSs 4 (G. vaginalis and G. swidsinskii), has relatively low Gardnerella genomospecies diversity yet was highly associated with Amsel-BV and symptomatic Amsel-BV. Together, these data suggest that enumeration and classification of Gardnerella genomospecies may prove to be an important diagnostic of different “types” of Amsel-BV, which could inform treatment options.
For example, it is possible that harboring more Gardnerella genomospecies may predict BV recurrence following metronidazole treatment, suggesting the need for a different approach to treatment. Alternatively, some Gardnerella genomospecies may be important and novel targets of therapy.
In the clinic, antibiotic treatment for BV (generally diagnosed with a point-of-care test) is recommended only when the patient reports symptoms, which is estimated to occur in fewer than half of women with BV [24, 57, 58]. In research settings, both symptomatic and asymptomatic Amsel-BV can be evaluated. Indeed, in the observational research studies included in this analysis, in which Amsel's criteria were evaluated along with whether participants reported symptoms, symptomatic Amsel-BV accounted for only 12% of Amsel-BV cases, and 30% of these were in mgCST 24 (dominated primarily by Gardnerella swidsinskii and G. vaginalis). It is hypothesized that the high prevalence of BV recurrence post-treatment may be due to the heterogeneity in the genetic make-up of the microbiota associated with BV, as revealed by mgCSTs. By resolving this heterogeneity into distinct types, mgCSTs allow more precise estimates of risk. Furthermore, these findings highlight the potential importance of developing specialized treatments that target “types” of BV.
The mgCST framework can also be used to identify vaginal microbiomes that are associated with positive health outcomes. For example, mgCSTs predominated by different L. crispatus mgSs varied in their association with low Nugent scores, the number of L. crispatus strains present, and the longitudinal stability of communities. The vaginal microbiome can be dynamic [59-61]. Shifts from Lactobacillus-predominated to non-Lactobacillus-predominated microbiota can increase the risk of infection following exposure to a pathogen. The present study identified L. crispatus mgCSTs with variable stability, suggesting that not all L. crispatus-predominated microbiomes are functionally similar and that they may be differently permissive to infection. Those associated with higher stability may reduce the window of opportunity for pathogens to invade. Microbiome stability may be related to the diversity of other, non-Lactobacillus members of the microbiome, to the number of L. crispatus strains present, or to both. In any case, the present study shows that protective abilities vary even among L. crispatus-predominated communities. This information could be critical in selecting and assembling strains of L. crispatus to design novel live biotherapeutic products aimed at restoring an optimal vaginal microenvironment.
To aid in further exploration, a validated classifier for both mgSs and mgCSTs is provided at github.com/ravel-lab/mgCST-classifier/blob/main/README.md.
While the invention has been described with reference to certain particular embodiments thereof, those skilled in the art will appreciate that various modifications may be made without departing from the spirit and scope of the invention. The scope of the appended claims is not to be limited to the specific embodiments described.
All patents and publications mentioned in this specification are indicative of the level of skill of those skilled in the art to which the invention pertains. Each cited patent and publication is incorporated herein by reference in its entirety. All of the following references have been cited in this application:
This invention was made with government support under Grant Numbers AI163413, AI136400, AI084044, AI083264, AI116799 and NR015495 awarded by National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63541969 | Oct 2023 | US