The present invention relates to the field of anticancer treatment. In particular, the present invention concerns the role of the gut microbiota in the efficacy of treatments comprising administration of immune checkpoint inhibitors (ICI), in the treatment of renal cell cancer (RCC). The present invention provides “metagenomics-based gut oncomicrobiome signatures” (GOMS) at diagnosis prior to PD-1/PD-L1/PD-L2 blockade as novel predictors of response or resistance for the best clinical outcome and at 6 months of therapy of a renal cell cancer. The present invention also provides theranostic methods to identify patients in need of a bacterial compensation treatment before receiving an ICI and/or during the therapy with an ICI.
Major conceptual advances in cancer biology have been made over the past decade. The understanding that immune responses are routinely generated against tumor-specific neoantigens expressed by cancer-associated mutations and commonly dampened by the immunosuppressive tumor microenvironment (TME) has been seminal to the development of effective immunotherapies aimed at provoking immune control against tumor progression (Sharma and Allison, 2015a, 2015b). Progress in cancer immunotherapy has resulted in remarkable success in the treatment of a variety of hematological and solid metastatic malignancies such as melanoma, lung, bladder, kidney, Hodgkin's lymphoma, acute B cell leukemia, liver, Merkel-cell carcinoma and head and neck tumors, amongst others (Borghaei et al., 2015; Nghiem et al., 2016; Robert et al., 2011; Rosenberg et al., 2016). To date, therapies that block inhibitory signaling pathways expressed by T lymphocytes (so called “immune checkpoints”) during the initial priming phase (in draining lymph nodes) or the effector phases (in tumor beds) of adaptive anticancer immune responses have demonstrated the greatest clinical benefit in overall survival (Ribas et al., 2016; Topalian et al., 2012). The prototypic example for this success has been the use of monoclonal antibodies (mAbs) targeting PD-1 (expressed by activated/exhausted T cells) or its ligand PD-L1 (commonly expressed by cancer cells or cells of the TME) (Pardoll, 2015). By releasing these molecular brakes, such mAbs reinstate the anticancer adaptive arm of the immune response.
While PD-1 blockade represents the most effective first line therapy in B-RAF wild type melanoma and the best option in second line unresectable lung carcinomas, approximately 60-70% of tumors exhibit primary resistance to this therapeutic strategy. Primary resistance has been attributed to low tumor mutational burden and poor intrinsic antigenicity of tumor cells (Riaz et al., 2016; Rizvi et al., 2015), defective antigen presentation and priming phase (Spranger et al., 2015), limited intratumoral infiltration related to exhausted T cell functions (Smyth et al., 2016), and metabolic immunosuppressive pathways related to adenosine and indoleamine 2,3-dioxygenase (Koyama et al., 2016; Smyth et al., 2016). Similarly, primary resistance mechanisms to CTLA-4 blockade have also been elucidated. For example, melanomas presenting with loss of IFN-γ signaling lack response to ipilimumab (Gao et al., 2016).
More recently, secondary resistance mechanisms to chronic inhibition of PD-1 receptors have been reported in approximately 25% of melanoma patients (Koyama et al., 2016; Ribas et al., 2016; Zaretsky et al., 2016). Four out of 74 melanoma patients who progressed for a median follow up of 1.8 years despite continuous therapy with pembrolizumab developed lesions presenting with loss-of-function mutations in Janus kinases JAK1 or JAK2, thus leading to decreased STAT1 phosphorylation and reduced sensitivity to the antiproliferative effects of IFNs or, alternatively, a mutation within the gene encoding β-2 microglobulin, preventing folding and transport of MHC class I molecules to the cell surface for T cell recognition of tumor cells (Zaretsky et al., 2016). In addition to this Darwinian natural selection of heritable genetic (or epigenetic) traits, other acquired resistance mechanisms have been reported in mice. These predominantly include adaptive immune resistance resulting from the IFN-γ-inducible expression of PD-L1 (Pardoll, 2012; Taube et al., 2012), a primary ligand of PD-1, TNF-induced loss of antigenic variants (Landsberg et al., 2012), as well as TCR-dependent upregulation of additional exhaustion markers on activated T lymphocytes such as Tim3/HAVCR2 (Koyama et al., 2016; Restifo et al., 2012; Smyth et al., 2016), lymphocyte activation gene 3 (Lag3), T cell immunoreceptor with Ig and ITIM domains (TIGIT), B and T cell lymphocyte attenuator (BTLA), and V-domain Ig suppressor of T cell activation (VISTA).
The inventors' team just reported in Routy et al. Science Jan. 5, 2018, in conjunction with two other reports (Gopalakrishnan V et al and Matson V et al), that primary resistance to PD-1/PDL-1-based immune checkpoint inhibitors (ICI) can be due to an abnormal gut microbiota. Antibiotics (ATB) inhibited the clinical benefit of ICI in advanced lung, kidney and bladder cancer patients. Fecal microbiota transplantation (FMT) from cancer patients who responded to ICI (but not from non-responding patients) into germ-free or ATB-treated mice ameliorated the antitumor effects of PD-1 blockade. Metagenomics of patient stools at diagnosis revealed correlations between clinical responses to ICI and dedicated microbial patterns, with relative increase of Akkermansia muciniphila when examining both lung and kidney cancer patients altogether. The diagnosis of a gut dysbiosis is important since it is amenable to a therapy that restores efficacy of ICI. Indeed, oral supplementation with A. muciniphila or Alistipes indistinctus post-FMT with non-responder feces restored the efficacy of PD-1 blockade in an IL-12-dependent manner, by increasing the recruitment of CCR9+CXCR3+CD4+T lymphocytes into tumor beds.
The results disclosed in the present application show the predictive value of metagenomics-based gut oncomicrobiome signatures (GOMS), calculated at diagnosis from data of a larger cohort of 69 renal cell cancer (RCC) patients who received a second line therapy with nivolumab or pembrolizumab for a tyrosine kinase inhibitor or mTOR inhibitor-resistant advanced or metastatic RCC, for the clinical benefit of such a treatment. This study was performed using three RECIST criteria (best outcome, time to progression (TTP) at 3 months, TTP at 6 months), including or excluding 11 RCC patients who took antibiotics during, or in the 2 months preceding, the first administration of anti-PD1 Abs.
According to a first aspect, the present invention pertains to a method for in vitro determining if an individual having a renal cell cancer (RCC) is likely to respond to a treatment with an anti-PD1/PD-L1/PD-L2 Ab-based therapy, comprising the following steps:
The invention also pertains to 12 models useful to perform the above method, 6 of which were obtained from data of patients who did not take any antibiotics during the last two months, whereas the 6 others can be used in patients whose recent antibiotic intake is unknown. These models rely on (partly) different subsets of the microbiota (listed in Tables 3, 6, 9 and 12 below) and can be used either alone or combined.
Tools designed to easily perform the above method are also part of the present invention, such as a nucleic acid microarray comprising nucleic acid probes specific for each of the microorganism species to be detected in step (i) of the method, and such as a set of primers comprising primer pairs for amplifying sequences specific for each of the microorganism species to be detected in step (i) of said method.
Theranostic methods for determining whether an individual needs a bacterial compensation with a bacterial composition and/or by FMT before receiving an anti-PD1/PD-L1 Ab-based therapy are also part of the invention.
The RECIST 1.1 criterion taken into account as predictive for clinical benefit was “best clinical outcome”.
A. Volcano plots (left) and Linear discriminant analysis effect size (LEfSe) (right) analysis to assess putative bacterial biomarkers for building metagenomics-based GOMS in RCC patients' stools regardless of antibiotic usage. Volcano plots were generated by computing, for each bacterial species: i) the log2 of the fold ratio (FR) between the mean relative abundances of R versus NR (x axis); ii) the co-log10 (that is, -log10) of the P values derived from a Mann-Whitney U test calculated on relative abundances in absolute values (y axis). LEfSe plots were generated with Python 2.7 on output files derived from the LEfSe pipeline, and all species with LDA score≥2 were considered for subsequent analysis.
B. ROC curves to assess predictability of metagenomic-based GOMS in RCC patients' stools regardless of antibiotic usage. Patients were divided into NR and R according to their «resistant» or «responding» clinical phenotype upon ICI treatment. Combinations of selected bacterial species in panel A were generated with Python 2.7 and underwent ROC analysis. Upon 5-fold cross-validation (no noise added), the ROC curve corresponding to the bacterial species consortium having the best AUC was depicted. Specificity (x axis) and sensitivity (y axis) along with their best values were reported for each curve (inset). Diagonal line depicts the absence of predictability for the best clinical outcome.
C-D. Idem as A-B but considering only patients who did not take ATB.
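As a rough illustration of the ROC analysis mentioned in panel B, the following Python sketch computes an AUC as the fraction of (R, NR) patient pairs that a score ranks correctly, and splits a cohort into 5 folds. The function names roc_auc and k_fold_indices, the label encoding (1 for R, 0 for NR) and the contiguous fold layout are assumptions made for this example only; this is not the inventors' pipeline.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve computed by pairwise comparison.

    scores: predicted probabilities of response, one per patient.
    labels: 1 for responders (R), 0 for non-responders (NR) (assumed encoding).
    Ties between an R and an NR score count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one R and one NR sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

An AUC of 0.5 corresponds to the diagonal line of the ROC plot (no predictability), and 1.0 to a perfect separation of R and NR. In practice, samples would normally be shuffled before being assigned to folds.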
The RECIST 1.1 criterion taken into account as predictive for clinical benefit was “TTP< or >3 months”.
A. Volcano plots (left) and LEfSe (right) analysis to assess putative bacterial biomarkers for building metagenomics-based GOMS in RCC patients' stools regardless of antibiotic usage. Volcano plots were generated by computing, for each bacterial species: i) the log2 of the fold ratio (FR) between the mean relative abundances of R versus NR (x axis); ii) the co-log10 (that is, -log10) of the P values derived from a Mann-Whitney U test calculated on relative abundances in absolute values (y axis). LEfSe plots were generated with Python 2.7 on output files derived from the LEfSe pipeline, and all species with LDA score≥2 were considered for subsequent analysis.
B. ROC curves to assess predictability of metagenomic-based GOMS in RCC patients' stools regardless of antibiotic usage. Patients were divided into NR and R according to their «resistant» or «responding» clinical phenotype upon ICI treatment. Combinations of selected bacterial species in panel A were generated with Python 2.7 and underwent ROC analysis. Upon 5-fold cross-validation (no noise added), the ROC curve corresponding to the bacterial species consortium having the best AUC was depicted. Specificity (x axis) and sensitivity (y axis) along with their best values were reported for each curve (inset). Diagonal line depicts the absence of predictability for the best clinical outcome.
C-D. Idem as A-B but considering only patients who did not take ATB.
The RECIST 1.1 criterion taken into account as predictive for clinical benefit was “TTP< or >6 months”.
A. Volcano plots (left) and LEfSe (right) analysis to assess putative bacterial biomarkers for building metagenomics-based GOMS in RCC patients' stools regardless of antibiotic usage. Volcano plots were generated by computing, for each bacterial species: i) the log2 of the fold ratio (FR) between the mean relative abundances of R versus NR (x axis); ii) the co-log10 (that is, -log10) of the P values derived from a Mann-Whitney U test calculated on relative abundances in absolute values (y axis). LEfSe plots were generated with Python 2.7 on output files derived from the LEfSe pipeline, and all species with LDA score≥2 were considered for subsequent analysis.
B. ROC curves to assess predictability of metagenomic-based GOMS in RCC patients' stools regardless of antibiotic usage. Patients were divided into NR and R according to their «resistant» or «responding» clinical phenotype upon ICI treatment. Combinations of selected bacterial species in panel A were generated with Python 2.7 and underwent ROC analysis. Upon 5-fold cross-validation (no noise added), the ROC curve corresponding to the bacterial species consortium having the best AUC was depicted. Specificity (x axis) and sensitivity (y axis) along with their best values were reported for each curve (inset). Diagonal line depicts the absence of predictability for the best clinical outcome.
C-D. Idem as A-B but considering only patients who did not take ATB.
All RCC patients who did not take ATB from the Routy et al. Science 2018 database were considered, unifying discovery and validation cohorts at TTP6. Alpha-diversity, beta-diversity, Volcano and LEfSe plots were generated as described in
Logistic regression models were drawn from stool MG of RCC patients regardless of antibiotic usage and OUTCOME_1 criterion. Each Model is depicted by three graphs: a feature selection (left), a confusion matrix (center) and a ROC curve (right). Confusion matrices and ROC curves were generated after 5-fold cross-validation. The higher the percentage within the crossing ‘True label’ and ‘Predicted label’ cells, the higher the model predictability. For the ROC curves, both AUC (area under curve) and CV_AUC (cross-validated area under curve) need to be taken into account. At the bottom of each triplet is reported the Model equation described by a logistic regression: exp is the exponential function with natural base, CAG is the relative abundance of each CAG species expressed within the closed interval [0:1] and standardized (zero mean and unit variance).
Logistic regression models were drawn from stool MG of RCC patients with no antibiotic usage and OUTCOME_1 criterion. Triplet graph (feature selection, confusion matrix, ROC curve) description as in
Logistic regression models were drawn from stool MG of RCC patients regardless of antibiotic usage and OUTCOME_2 criterion. Triplet graph (feature selection, confusion matrix, ROC curve) description as in
Logistic regression models were drawn from stool MG of RCC patients with no antibiotic usage and OUTCOME_2 criterion. Triplet graph (feature selection, confusion matrix, ROC curve) description as in
In the present text, the following definitions are used:
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the human gut microbiota diversity extends beyond what is currently covered by reference databases. In the results disclosed herein, the inventors used a method based on binning co-abundant genes across a series of metagenomic samples, which enables comprehensive discovery of new microorganisms without the need for reference sequences. In what follows, part of the species identified as likely to play a role in the patients' response to therapies based on antibodies against PD1, PD-L1 or PD-L2 are newly-identified species, not yet precisely referenced in public databases. For each of the identified species (both newly-identified and species very close to already referenced species), the present application discloses a set of 50 bacterial genes which are non-redundant sequences and can be used, alone or in combination, as tracer genes to assess the presence and relative abundance of the corresponding species. Of course, once the species are identified, either by the set of non-redundant genes disclosed herein, or later on by their further identification and/or inclusion into a database, the person skilled in the art can assess their relative abundance by any appropriate means, such as, for example, by measuring the copy number of another non-redundant gene that co-varies with the 50 sequences disclosed in the present application, or even by identifying a signature of this species at the protein level rather than in a nucleic acids sample. Hence, the present invention is not limited to the use of the disclosed sequences to measure the relative abundance of the corresponding species.
When necessary, other definitions are provided later in the present text.
The present invention concerns a method for in vitro determining if a subject having a renal cell cancer is likely to benefit from a cancer treatment with an ICI and more specifically, from a treatment with antibodies (or other inhibiting molecules) directed against the immune checkpoints PD1, PD-L1 or PD-L2, alone or together with CTLA4 and/or other drugs as defined above. The non-responder (NR) or responder (R) status is established following these steps:
(i) from a fecal sample of the said subject (or from an ileal or colonic mucosal specimen), obtaining an “abundances pattern” based on the relative abundances of a set of bacterial species, expressed within the closed interval [0:1];
(ii) using the obtained abundances pattern to calculate, using one (or several) of the complementary models proposed in the present application, the probability that the subject will not respond (NR) or respond (R) to the treatment.
It is important to note here that, as already mentioned above, the present invention is not limited by the technique used to measure the relative abundances of the bacterial species, which can be obtained by NGS (through any past or future NGS platform, from the first generation to the last available on the market and those in development, using any NGS output file provided as fastq, BAM, SAM, or other kinds of file extensions) or any other technique such as, for example, qPCR (quantitative polymerase chain reaction) and microarrays to express the relative abundances of selected bacterial species, using the sequences provided herein or any other co-variant sequence. When the relative abundances are assessed by genetic analysis, the data preferably derive from shotgun sequencing, and not 16S targeted sequencing, in order to comply with the bioinformatic pipeline described in Materials and Methods below.
According to a first aspect, the present invention pertains to a method for in vitro determining if an individual having a renal cell cancer (RCC) is likely to respond to a treatment with an ICI-based therapy such as an anti-PD1/PD-L1/PD-L2 Ab-based therapy, comprising the following steps:
Clostridium_bolteae_ATCC_BAA_613
Bacteroides_faecis_MAJ27
Clostridium_sp_CAG_226
Barnesiella_viscericola_DSM_18177
Coprococcus_catus_GD_7
Subdoligranulum_sp_4_3_54A2FAA
Bacteroides_stercoris_ATCC_43183
Bacteroides_sp_CAG_20
Prevotella_sp_CAG_891
Clostridium_sp_CAG_230
Faecalibacterium_sp_CAG_74
Eubacterium_sp_CAG_115
Eubacterium_rectale_M104_1
Bacteroides_ovatus_V975
Bacteroides_sp_CAG_144
Prevotella_sp_CAG_617
Sutterella_wadsworthensis_2_1_59BFAA
Prevotella_sp_CAG_279
Alistipes_obesi
Prevotella_sp_CAG_617
Ruminococcus_callidus_ATCC_27760
Clostridium_sp_CAG_62
Eubacterium_sp_CAG_251
Hungatella_hathewayi_12489931
Alistipes_sp_CAG_268
Dorea_formicigenerans_ATCC_27755
Azospirillum_sp_CAG_239
Ruminococcus_sp_CAG_177
Anaerotruncus_colihominis_DSM_17241
Eggerthella_lenta_DSM_2243
Clostridium_sp_CAG_413
Eubacterium_rectale_CAG_36
Oscillibacter_sp_CAG_241
Butyricimonas_virosa_DSM_23226
Subdoligranulum_sp_CAG_314
Sutterella_sp_CAG_351
Megasphaera_elsdenii_14_14
Clostridium_methylpentosum_DSM_5476
Acidiphilium_sp_CAG_727
Clostridium_sp_CAG_7
Clostridium_sp_CAG_524
Faecalibacterium_cf_prausnitzii_KLE1255
Holdemanella_biformis_DSM_3989
Lactobacillus_vaginalis_DSM_5837_ATCC_49540
Pseudoflavonifractor_capillosus_ATCC_29799
Dialister_succinatiphilus_YIT_11850
Coprococcus_catus_GD_7
Clostridium_clostridioforme_2_1_49FAA
Faecalibacterium_prausnitzii_SL3_3
In the above method, the “abundances pattern based on the relative abundances of a set of bacterial species” can be, for example, in the form of a vector of said relative abundances. This abundance pattern will be inserted into an executable program (Windows OS environment) by the person performing the method (e.g., a clinician) to obtain the NR or R probability percentage of a subject.
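By way of illustration only, the vector form of the abundances pattern could be assembled as in the following Python sketch. The function name abundance_vector, the use of per-species counts as input and the treatment of undetected species as 0.0 are assumptions of this example; the present text does not prescribe how the measurement output is laid out.

```python
def abundance_vector(counts, model_species):
    """Build a vector of relative abundances within the closed interval [0:1].

    counts: dict mapping each detected species name to a non-negative
            abundance measurement (hypothetical input format).
    model_species: ordered list of the species names a given model expects.
    Species absent from `counts` are reported with a relative abundance of 0.0.
    """
    if any(v < 0 for v in counts.values()):
        raise ValueError("abundance measurements must be non-negative")
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no abundance signal in the sample")
    return [counts.get(species, 0) / total for species in model_species]


# Example with made-up measurements for three of the species named above.
sample = {
    "Clostridium_bolteae_ATCC_BAA_613": 30,
    "Bacteroides_faecis_MAJ27": 10,
    "other_species_pooled": 60,  # hypothetical bucket for everything else
}
vector = abundance_vector(
    sample,
    ["Clostridium_bolteae_ATCC_BAA_613",
     "Bacteroides_faecis_MAJ27",
     "Coprococcus_catus_GD_7"],
)
```

Keeping the order of model_species fixed matters, because each position of the vector is later paired with the regression coefficient of the same species.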
As described in the experimental part below, several models can be obtained from clinical data of a representative cohort to assess the probability, for an individual, to respond (or not to respond) to the treatment. The experimental part describes in detail 12 possible models which, although based on different strategies, all provide very good predictability and success rates.
Other models can possibly be used, keeping in mind that an overrepresentation of bacterial species identified as “good bacteria” in Table 1 above, and/or underrepresentation of bacterial species identified as “bad bacteria” in the same table indicate that the individual is likely to respond to the treatment, whereas an overrepresentation of bacterial species identified as “bad bacteria” in Table 1 above, and/or underrepresentation of bacterial species identified as “good bacteria” in the same table indicate that the individual is likely to be a poor responder to the treatment. For assessing whether a bacterial species is over- or underrepresented in a fecal sample, its relative abundance is compared to a control value corresponding to the relative abundance of the same bacterial species in normal/healthy volunteers (i.e., individuals not having a cancer). For bacterial species that are not detected in the control volunteers, the mere presence of the bacterium is considered as an overrepresentation. Equations are used to “weight” the predictive values of each species' over- or underrepresentation and more precisely calculate the probability that the patient responds to the treatment (PR) or the probability that the patient resists the treatment (PNR) with an ICI-based therapy.
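The comparison to control values described above can be sketched in Python as follows. The function names, the "over"/"under"/"equal" strings and the +1/-1/0 encoding are assumptions of this example, and the "good"/"bad" label of a species would have to be taken from Table 1; this qualitative reading does not replace the weighted equations.

```python
def representation(sample_abundance, control_abundance):
    """Classify a species as over- or underrepresented versus a control value.

    control_abundance is the relative abundance of the same species in
    normal/healthy volunteers. When the species is not detected in controls
    (control value 0), its mere presence counts as an overrepresentation.
    """
    if sample_abundance < 0 or control_abundance < 0:
        raise ValueError("relative abundances must be non-negative")
    if control_abundance == 0:
        return "over" if sample_abundance > 0 else "equal"
    if sample_abundance > control_abundance:
        return "over"
    if sample_abundance < control_abundance:
        return "under"
    return "equal"


def favorable_signal(label, status):
    """Return +1 if the observation points toward response, -1 toward resistance.

    label: "good" or "bad" (as assigned to the species in Table 1).
    status: output of representation().
    Good bacteria overrepresented or bad bacteria underrepresented point toward
    response; the opposite combinations point toward resistance.
    """
    if label not in ("good", "bad"):
        raise ValueError("label must be 'good' or 'bad'")
    if status == "equal":
        return 0
    favorable = (label == "good") == (status == "over")
    return 1 if favorable else -1
```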
In step (ii), several equations can be used, each corresponding to a model based on a subset of 8 bacteria from the set recited in step (i). The clinician will then combine the result obtained with each of these equations to more precisely predict the R or NR status of the patient.
According to a particular embodiment of the above method, at least one equation used in step (ii) corresponds to a model for a set of at least 8 bacterial species which are present in the fecal sample of said individual. However, a model based on 8 bacteria which are not all present in the sample can also be used, especially if it is combined with one or several other models.
According to another particular embodiment of the above method, the model(s) used in step (ii) lead to an overall predictability of at least 65% and a success rate of at least 70%. The performance of a model will depend on the representativeness of the cohort on which it has been obtained and on the relevance of the strategy used to build the model. A person skilled in the art of statistics can easily calculate the overall predictability and success rate of a model and determine whether it satisfies the required predictability and success rate criteria.
According to another particular embodiment of the above method, the Probability score obtained in step (ii) is interpreted as clinically meaningful if it is higher than 75% (fixed threshold) and/or if it is concordant among the different Models used, always taking into account the antibiotic usage (known or unknown).
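The interpretation rule above can be sketched in Python as follows. This is only one possible reading: it assumes that each model returns a probability of response PR, that the NR probability is 1 - PR, that the status called by a model is the more probable of the two, and that the "Probability score" compared to the 75% threshold is the probability of the called status. The function name interpret_scores and the returned dictionary are hypothetical.

```python
def interpret_scores(p_response_per_model, threshold=0.75):
    """Summarize the outputs of one or several models for a single patient.

    p_response_per_model: list of probabilities of response (PR), one per model,
    all chosen according to the antibiotic usage status (known or unknown).
    """
    if not p_response_per_model:
        raise ValueError("at least one model output is required")
    if any(not 0.0 <= p <= 1.0 for p in p_response_per_model):
        raise ValueError("probabilities must lie within [0, 1]")
    # Status called by each model: the more probable of R and NR (assumption).
    calls = ["R" if p >= 0.5 else "NR" for p in p_response_per_model]
    # Concordance: every model used calls the same status.
    concordant = len(set(calls)) == 1
    # Threshold: at least one model is more than 75% sure of its call.
    above_threshold = any(max(p, 1.0 - p) > threshold
                          for p in p_response_per_model)
    return {
        "calls": calls,
        "concordant": concordant,
        "above_threshold": above_threshold,
        "meaningful": concordant or above_threshold,
    }
```

The "and/or" of the text is rendered here as a logical OR; a stricter use would require both conditions at once.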
According to another aspect of the present invention, the above method is combined with another method for determining, from a feces sample from an RCC patient, whether said patient is likely to be a good responder to a treatment with an ICI, based on an animal model. Such a method was already described in a previous application from the inventors' team (WO2016/063263) and comprises the steps of (i) performing a fecal microbial transplantation (FMT) of a feces sample from the patient into germ free (GF) model animals (e.g., GF mice); (ii) at least 7 to 14 days after step (i), inoculating said mice with a transplantable tumor model; (iii) treating the inoculated mice with the ICI; and (iv) measuring the tumor size in the treated animals. The results of step (iv) are illustrative of the response that can be expected for said patient to said treatment. In case the result obtained with the animal differs from the NR or R status predicted by the model(s) (with a probability of X %), the result with the animal model will prevail in the clinician's conclusion.
When the individual's antibiotic regimen exposure during the last two months is unknown, the method according to the invention is preferably performed using, in step (i), a set of bacterial species that comprises at least 8 bacterial species selected from the group consisting of:
Clostridium_bolteae_ATCC_BAA_613
Bacteroides_faecis_MAJ27
Barnesiella_viscericola_DSM_18177
Coprococcus_catus_GD_7
Subdoligranulum_sp_4_3_54A2FAA
Clostridium_sp_CAG_230
Faecalibacterium_sp_CAG_74
Bacteroides_ovatus_V975
Bacteroides_sp_CAG_144
Prevotella_sp_CAG_617
Hungatella_hathewayi_12489931
Dorea_formicigenerans_ATCC_27755
Eggerthella_lenta_DSM_2243
Acidiphilium_sp_CAG_727
Faecalibacterium_cf_prausnitzii_KLE1255
Holdemanella_biformis_DSM_3989
Lactobacillus_vaginalis_DSM_5837_ATCC_49540
Clostridium_clostridioforme_2_1_49FAA
According to a preferred embodiment of the method for assessing the R or NR status (best outcome criterion) of a patient whose antibiotic regimen exposure during the last two months is unknown, one, two or three equations are used in step (ii), which correspond to models obtained for the following sets of bacterial species, identified by their CAG numbers:
According to a preferred embodiment of the above method, at least one equation used corresponds to a model obtained with a set of bacteria which are all present in the individual's sample. However, the models can be run even if a few bacteria are missing. In such a case, it is preferable to use at least two and preferably at least 3 equations in step (ii).
According to a particular embodiment of the above method based on Models 1.2, 2.2 and/or 3.2, illustrated in the experimental part below, the equations for calculating the probability that said individual responds to the treatment (PR) are as follows:
wherein Xj (j=1 to 8) are the relative abundances of the bacterial species measured in the individual's sample and βj (j=1 to 8) are the following regression coefficients:
Of course, these coefficients can be refined by the person skilled in the art by applying the same strategies as those described in the present application to other clinical data (i.e., clinical data from another cohort in addition to or in replacement of the data used by the inventors). They can also vary merely because the 10000 iterations employed in the logistic regression (see methods) introduce each time a weak ‘floating randomness’ in each coefficient, not exceeding 0.2 units. The coefficients can also differ if a different technique is used for measuring the relative abundances of the bacterial species (e.g., using qPCR instead of MGS analysis). Importantly, even using different techniques, regression coefficients will retain their positive or negative sign, which reflects the positive or negative contribution of a given CAG species to the overall model.
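To make the role of the relative abundances Xj and the regression coefficients βj concrete, the following Python sketch evaluates a generic logistic regression. It assumes the standard logistic form, an optional intercept defaulting to zero, and standardization of each abundance against a cohort mean and standard deviation; the exact model equations and coefficient values are those given for each Model and are not reproduced here. All numbers in the example call are placeholders, not coefficients of any Model.

```python
import math


def standardize(value, cohort_mean, cohort_sd):
    """Rescale one relative abundance to zero mean and unit variance.

    cohort_mean and cohort_sd are assumed to come from the reference cohort.
    """
    if cohort_sd <= 0:
        raise ValueError("cohort standard deviation must be positive")
    return (value - cohort_mean) / cohort_sd


def response_probability(x, beta, intercept=0.0):
    """Probability of response (PR) under a standard logistic model (assumption).

    x: standardized relative abundances X1..X8 of the model species.
    beta: regression coefficients beta1..beta8, in the same species order.
    A positive coefficient raises PR as the species becomes more abundant;
    a negative coefficient lowers it.
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    z = intercept + sum(b * xi for b, xi in zip(beta, x))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Placeholder inputs for illustration only.
x_example = [0.5, -1.2, 0.0, 0.3, 1.1, -0.4, 0.0, 0.8]
beta_example = [1.0, -0.5, 0.2, 0.0, 0.7, -1.0, 0.3, 0.1]
p_r = response_probability(x_example, beta_example)
p_nr = 1.0 - p_r  # probability of resistance under the same assumption
```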
In case the patient did not take any antibiotic during the last two months, the method according to the invention is preferably performed using, in step (i), a set of bacterial species that comprises at least 8 bacterial species selected from the group consisting of:
Clostridium_bolteae_ATCC_BAA_613
Clostridium_sp_CAG_226
Subdoligranulum_sp_4_3_54A2FAA
Prevotella_sp_CAG_891
Faecalibacterium_sp_CAG_74
Bacteroides_sp_CAG_144
Prevotella_sp_CAG_617
Prevotella_sp_CAG_279
Dorea_formicigenerans_ATCC_27755
Ruminococcus_sp_CAG_177
Anaerotruncus_colihominis_DSM_17241
Eggerthella_lenta_DSM_2243
Clostridium_sp_CAG_524
Faecalibacterium_cf_prausnitzii_KLE1255
Holdemanella_biformis_DSM_3989
Coprococcus_catus_GD_7
According to a preferred embodiment of the method for assessing the R or NR status (best outcome criterion) of a patient who did not take antibiotics during the last two months, one, two or three equations are used in step (ii), each of which corresponds to a model obtained for the following sets of bacterial species, identified by their CAG numbers:
According to a preferred embodiment of the above method, at least one equation used corresponds to a model obtained with a set of bacteria which are all present in the individual's sample. However, the models can be run even if a few bacteria are missing. In such a case, it is preferable to use at least two and preferably at least 3 equations in step (ii).
According to a particular embodiment of the above method based on Models 4.2, 5.2 and/or 6.2, illustrated in the experimental part below, the equations for calculating the probability that said individual resists (PNR) or responds (PR) to the treatment are as follows:
wherein Xj (j=1 to 8) are the relative abundances of the bacterial species measured in the individual's sample and βj (j=1 to 8) are the following regression coefficients:
As mentioned above, these coefficients can be refined for several reasons or even changed if a different technique is used for measuring the relative abundances of the bacterial species (e.g., using qPCR instead of MGS analysis).
The models 1.2, 2.2, 3.2, 4.2, 5.2 and 6.2 above have been obtained using a criterion of best outcome. Patients who first responded to the treatment but relapsed a few months later were thus considered as responders. In order to assess if an individual is likely to have a long-term benefit from the treatment, defined as a time to progression (TTP) of at least 6 months, the inventors obtained additional models by considering only the stable diseases of more than 6 months (SD>6 months) among responders (OUTCOME_2 in the experimental part). These models are based on partly different subsets of the microbiota composition.
For assessing whether an individual whose antibiotic regimen exposure during the last two months is unknown is likely to have a long-term benefit from a treatment with an anti-PD1/PD-L1/PD-L2 Ab-based therapy, the method according to the invention is preferably performed using, in step (i), a set of bacterial species that comprises at least 8 bacterial species selected from the group consisting of:
Sutterella_wadsworthensis_2_1_59BFAA
Ruminococcus_callidus_ATCC_27760
Eubacterium_sp_CAG_251
Dorea_formicigenerans_ATCC_27755
Azospirillum_sp_CAG_239
Clostridium_sp_CAG_413
Eubacterium_rectale_CAG_36
Oscillibacter_sp_CAG_241
Butyricimonas_virosa_DSM_23226
Subdoligranulum_sp_CAG_314
Sutterella_sp_CAG_351
Megasphaera_elsdenii_14_14
Clostridium_sp_CAG_7
Holdemanella_biformis_DSM_3989
Dialister_succinatiphilus_YIT_11850
Faecalibacterium_prausnitzii_SL3_3
According to a preferred embodiment of the method for assessing the R or NR status (TTP>6months criterion) of a patient whose antibiotic regimen exposure during the last two months is unknown, one, two or three equations are used in step (ii), which correspond to models obtained for the following sets of bacterial species, identified by their CAG numbers:
According to a preferred embodiment of the above method, at least one equation used in step (ii) corresponds to a model obtained with a set of bacteria which are all present in the individual's sample. However, the models can be run even if a few bacteria are missing. In such a case, it is preferable to use at least two and preferably at least 3 equations in step (ii).
According to a particular embodiment of the above method based on Models 7.2, 8.2 and/or 9.2, illustrated in the experimental part below, the equations for calculating the probability that said individual resists the treatment (PNR) are as follows:
wherein Xj (j=1 to 8) are the relative abundances of the bacterial species measured in the individual's sample and βj (j=1 to 8) are the following regression coefficients:
As already mentioned, these coefficients can be refined for several reasons or even changed if a different technique is used for measuring the relative abundances of the bacterial species.
In case the patient did not take any antibiotic during the last two months, the method according to the invention for assessing his/her probability of having a long-term benefit (TTP>6 months) from a treatment with an anti-PD1/PD-L1/PD-L2 Ab-based therapy is preferably performed using, in step (i), a set of bacterial species that comprises at least 8 bacterial species selected from the group consisting of:
Clostridiales_bacterium_1_7_47FAA
Bacteroides_stercoris_ATCC_43183
Bacteroides_sp_CAG_20
Eubacterium_sp_CAG_115
Eubacterium_rectale_M104_1
Sutterella_wadsworthensis_2_1_59BFAA
Alistipes_obesi
Prevotella_sp_CAG_617
Clostridium_sp_CAG_62
Eubacterium_sp_CAG_251
Alistipes_sp_CAG_268
Azospirillum_sp_CAG_239
Firmicutes_bacterium_CAG_103
Firmicutes_bacterium_CAG_270
Butyricimonas_virosa_DSM_23226
Subdoligranulum_sp_CAG_314
Sutterella_sp_CAG_351
Megasphaera_elsdenii_14_14
Clostridium_methylpentosum_DSM_5476
Holdemanella_biformis_DSM_3989
Pseudoflavonifractor_capillosus_ATCC_29799
Clostridium_clostridioforme_2_1_49FAA
Faecalibacterium_prausnitzii_SL3_3
According to a preferred embodiment of the method for assessing the R or NR status (TTP>6 months criterion) of a patient who did not take antibiotics during the last two months, one, two or three equations are used in step (ii), each of which corresponds to a model obtained for the following sets of bacterial species, identified by their CAG numbers:
According to a preferred embodiment of the above method, at least one equation used in step (ii) corresponds to a model obtained with a set of bacteria which are all present in the individual's sample. However, the models can be run even if a few bacteria are missing. In such a case, it is preferable to use at least two and preferably at least 3 equations in step (ii).
According to a particular embodiment of the above method based on Models 10.2, 11.2 and/or 12.2, illustrated in the experimental part below, the equations for calculating the probability that said individual resists to the treatment (PNR) or responds (PR) are as follows:
wherein Xj (j=1 to 8) are the relative abundances of the bacterial species measured in the individual's sample and βj (j=1 to 8) are the following regression coefficients:
As for the other models, these coefficients can be refined for several reasons or even changed if a different technique is used for measuring the relative abundances of the bacterial species.
According to a particular embodiment of the method of the invention, the fecal sample is obtained before the first administration of any ICI, such as an anti-PD1/PD-L1/PD-L2 antibody.
According to another particular embodiment, the individual already received a first-line therapy different from PD1/PD-L1/PD-L2 Ab-based therapies for treating his/her RCC. The method of the invention is however not limited to patients receiving a PD1/PD-L1/PD-L2 Ab-based therapy as a second line therapy, and the presently disclosed methods can also be used to assess the responder or non-responder status of RCC patients receiving a first-line PD1/PD-L1/PD-L2 Ab-based therapy either alone or in combination with other antineoplastic drugs (chemotherapy or another ICI such as anti-CTLA-4, as specified above).
According to another particular embodiment, the anti-PD1/PD-L1/PD-L2 Ab-based therapy administered to the patient is a treatment with an anti-PD1 antibody such as nivolumab or pembrolizumab or an anti-PD-L1 antibody such as atezolizumab or durvalumab, for example.
According to another of its aspects, the present invention pertains to a theranostic method for determining if a cancer patient needs a bacterial compensation before administration of an anti-PD1/PD-L1/PD-L2 Ab-based therapy and/or during this therapy, comprising assessing, by a method as above-disclosed, whether the patient is likely to be a good responder to such a therapy, wherein if the patient is not identified as likely to be a good responder, the patient needs a bacterial compensation.
According to this aspect of the invention, the bacterial compensation can be done either by fecal microbiota transplant (FMT), using microbiota from one or several donors (for example, from responders to the treatment), possibly enriched with bacterial strains known to be beneficial in this situation, or by administration of a bacterial composition. The inventors already described bacterial compositions that can be used for such a compensation and restore the ability, for the patient, to respond to the treatment (e.g., in WO 2016/063263 and in WO 2018/115519). Non-limitative examples of bacterial strains which can be beneficial to patients with an initial NR status are: Enterococcus hirae, Akkermansia muciniphila, Blautia strains, Coprococcus comes strains, Alistipes shahii, other Alistipes species (e.g. Alistipes indistinctus and/or onderdonkii and/or finegoldii), Ruminococcaceae, Clostridiales species, Bacteroidales species, Actinobacteria, Coriobacteriales species, Methanobrevibacter, Burkholderia cepacia, Bacteroides fragilis, Actinotignum schaalii, as well as the following additional bacteria:
Examples of compositions which can be beneficial to a patient with an initial NR status assessed by the method according to the invention are:
A nucleic acid microarray designed to perform the method according to the invention is also part of the present invention. Such a nucleic acid microarray comprises nucleic acid probes specific for each of the microorganism species to be detected in step (i) of said method (i.e., at least 8 species selected amongst those recited in Table 1). In a specific embodiment, the nucleic acid microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one sequence selected from SEQ ID NOs: 1-2950. For example, the said microarray comprises at least 8 oligonucleotides, each oligonucleotide being specific for one sequence of a distinct species recited in Table 1. The microarray of the invention can of course comprise more oligonucleotides specific for sequences of SEQ ID NOs: 1-2950, for example at least n×96 oligonucleotides, divided into 12 sets of n×8 oligonucleotides corresponding to the 12 models described herein, with n being an integer comprised between 1 and 25 which corresponds to the number of oligonucleotides used to specifically assess the presence of one specific bacterial species. The microarray according to the invention may further comprise at least one oligonucleotide for detecting at least one gene of at least one control bacterial species. A convenient bacterial species may be e.g. a bacterial species the abundance of which does not vary between individuals having a R or a NR status. Preferably, the oligonucleotides are about 50 bases in length. Suitable microarray oligonucleotides specific for any gene of SEQ ID NOs: 1-2950 may be designed, based on the genomic sequence of each gene, using any method of microarray oligonucleotide design known in the art.
In particular, any available software developed for the design of microarray oligonucleotides may be used, such as, for instance, the OligoArray software, the GoArrays software, the Array Designer software, the Primer3 software, or the Promide software, all known to the person skilled in the art.
As mentioned above, the relative abundance of the recited bacterial species can be measured by techniques different from the metagenomics analysis used herein, especially when the person skilled in the art knows which bacterial species are to be measured. A particular technique which can be used for this purpose is qPCR (quantitative PCR). The PCR-based techniques are performed with amplification primers designed to be specific for the sequences which are measured. The present invention hence also pertains to a set of primers suitable for performing the above method, i.e., a set of primers comprising primer pairs for amplifying sequences specific for each of the microorganism species to be detected in step (i) of said method (i.e., at least 8 species selected amongst those recited in Table 1). Such a set of primers comprises a minimum of 16 primers, but it can comprise more primers, for example 30, 40, 50, 60, 70, 80, 100, 200, 300, 500, 1000, 2000 or more. According to a particular embodiment, the set of primers comprises at least one primer pair specifically amplifying part of a sequence selected amongst SEQ ID NOs: 1-2950. Of course, primer sets according to the invention can advantageously comprise 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 100, 200, 300, 500, 1000 or more pairs of primers each specifically amplifying part of a sequence selected amongst SEQ ID NOs: 1-2950.
Other characteristics of the invention will also become apparent in the course of the description which follows of the biological assays which have been performed in the framework of the invention and which provide it with the required experimental support, without limiting its scope.
Metagenomic Analysis of Stool Samples
The quantitative metagenomics pipeline developed at MetaGenoPolis (Jouy-en-Josas, France) was employed to reach a species-level description of the RCC gut microbiota. DNA was extracted from 69 stool samples through an automated platform (SAMBO, MetaGenoPolis) and subjected to shotgun sequencing using an Ion Proton sequencer to reach >20 million short DNA sequence reads (MetaQuant platform, MetaGenoPolis). Reads were filtered (Q>25) and cleaned to eliminate possible contaminants such as human reads. The high-quality (HQ) reads were then mapped against the MetaHIT hs_9.9 M gene catalogue and counted through a two-step procedure: 1) using uniquely mapped reads; 2) attributing shared reads according to their mapping ratio based on unique reads. The identity threshold for mapping was set at >95%, in order to accommodate gene allelic variants and the non-redundant nature of the catalogue itself. HQ reads were downsized to 13 million for each sample in order to correct for sequencing depth, then normalized through the RPKM (reads per kilobase per million mapped reads) method. A profile matrix of gene frequencies was thus obtained and used as an input file for MetaOMineR, a suite of R packages developed at MetaGenoPolis.
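As a rough illustration of the RPKM normalization step mentioned above, the following Python sketch computes reads per kilobase per million mapped reads for a single sample. The function name rpkm, the gene identifiers and the toy counts are hypothetical; this is not the MetaGenoPolis pipeline, only a minimal example of the formula.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Return RPKM values for one sample.

    read_counts: dict gene -> number of mapped reads
    gene_lengths_bp: dict gene -> gene length in base pairs
    """
    total_mapped = sum(read_counts.values())
    if total_mapped == 0:
        raise ValueError("sample has no mapped reads")
    per_million = total_mapped / 1e6
    # reads / (gene length in kilobases) / (millions of mapped reads)
    return {
        gene: count / (gene_lengths_bp[gene] / 1e3) / per_million
        for gene, count in read_counts.items()
    }


# Hypothetical toy sample (invented identifiers, counts and lengths).
counts = {"gene_a": 500, "gene_b": 1500, "gene_c": 0}
lengths = {"gene_a": 1000, "gene_b": 3000, "gene_c": 2000}
profile = rpkm(counts, lengths)
print(profile)
```

In this toy sample, gene_a and gene_b receive the same RPKM value because gene_b has three times the reads but is also three times longer.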
In order to achieve a species-level description of the microbes in RCC stool samples, the hs_9.9 M catalogue was clustered into 1436 MetaGenomic Species (MGS), i.e., groups of roughly >500 genes that covary in abundance among hundreds of samples and thus ultimately belong to the same microbial species. The taxonomical annotation of MGS was performed using gene homology with previously sequenced organisms, using BLASTn against the nt and wgs NCBI databases. After taxonomical assignment of each MGS, a matrix of frequency profiles of 50 co-abundant genes (CAG) was built using normalized MGS mean signals, so that the MGS signals within a sample sum to 1. With this approach, each species was assigned to a definite CAG number.
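The normalization described above (mean signal per MGS, scaled so that all MGS in a sample sum to 1) can be sketched as follows. This is a simplified illustration under stated assumptions: the MGS signal is taken as the plain mean of its marker genes, and all identifiers (gene names, CAG_demo_1, CAG_demo_2) are invented.

```python
def mgs_relative_abundances(gene_signal, mgs_marker_genes):
    """Compute relative abundances of MGS for one sample.

    gene_signal: dict gene -> normalized signal in the sample
    mgs_marker_genes: dict MGS identifier -> list of marker genes
    """
    raw = {}
    for mgs, genes in mgs_marker_genes.items():
        # Assumption: the MGS signal is the mean over its marker genes.
        values = [gene_signal.get(g, 0.0) for g in genes]
        raw[mgs] = sum(values) / len(values)
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no MGS signal in this sample")
    # Scale so that the MGS signals of the sample sum to 1.
    return {mgs: value / total for mgs, value in raw.items()}


# Hypothetical toy data (two marker genes per MGS instead of 50).
gene_signal = {"g1": 4.0, "g2": 6.0, "g3": 1.0, "g4": 3.0}
markers = {"CAG_demo_1": ["g1", "g2"], "CAG_demo_2": ["g3", "g4"]}
abundances = mgs_relative_abundances(gene_signal, markers)
print(abundances)
```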
Statistical Analysis
Multivariate statistics were employed on the matrix of frequency profiles in order to describe the gut microbial composition and diversity. Python 2.7 and related statistical/graphical libraries (Matplotlib, Scikit learn, Pandas, Numpy) were used to compare all MGS through pairwise analysis, linear discriminant analysis effect size (LEfSe) and volcano plots. A P value less than or equal to 0.05 was considered statistically significant. Volcano plots were generated by computing, for each bacterial species: i) the log2 of the fold ratio (FR) between the mean relative abundances of R versus NR (x axis) and ii) the co-log10 (i.e., −log10) of the P value from a Mann-Whitney U test calculated on the relative abundances in absolute values (y axis). In order to ease the graphical representation of the volcano plots, at TTP6 the chosen P value was less than or equal to 0.7. LEfSe plots were generated with Python 2.7 on output files derived from the LEfSe pipeline, and all species with an LDA score ≥2 were considered for subsequent analysis.
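The two volcano-plot coordinates described above can be computed per species as in the following sketch. It assumes NumPy and SciPy are available; the function name volcano_point, the pseudocount used to avoid division by zero and the toy abundances are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def volcano_point(abund_r, abund_nr, pseudocount=1e-9):
    """Return (log2 fold ratio R/NR, -log10 Mann-Whitney P) for one species."""
    abund_r = np.asarray(abund_r, dtype=float)
    abund_nr = np.asarray(abund_nr, dtype=float)
    # Assumption: a small pseudocount protects against a zero mean.
    fold_ratio = (abund_r.mean() + pseudocount) / (abund_nr.mean() + pseudocount)
    _, p_value = mannwhitneyu(abund_r, abund_nr, alternative="two-sided")
    return float(np.log2(fold_ratio)), float(-np.log10(p_value))


# Hypothetical relative abundances of one species in R and NR subjects.
r_values = [0.04, 0.05, 0.06, 0.07, 0.08]
nr_values = [0.010, 0.012, 0.015, 0.020, 0.011]
x, y = volcano_point(r_values, nr_values)
print(x, y)
```

A positive x value means the species is, on average, more abundant in R than in NR; a larger y value means a smaller P value.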
Building the Predictive Models
A double strategy was implemented to select the bacterial species used to build the models.
The first one (“human-oriented”) relies on: i) computing Receiver Operating Characteristic (ROC) curves with a Support Vector Machine (SVM) classifier on all possible combinations of bacterial species, looking for the consortia giving the best Area Under Curve (AUC) values; ii) building a logistic regression classifier and a RFECV (recursive feature elimination with cross-validation) feature selection with the selected bacterial species; iii) computing metrics such as ROC curves and confusion matrices using the train/test subsets derived from the original dataset.
The second strategy (“machine-learning”) relies on: i) building a logistic regression classifier and a RFECV feature selection on all bacterial species (without permutation), selecting the ones with the highest ranking (equal to 1); ii) performing a new feature selection by means of RFE (recursive feature elimination), selecting the 8 species with the highest ranking; iii) re-building a new logistic regression classifier plus RFECV feature selection on the 8 selected bacterial species; iv) computing metrics such as ROC curves and confusion matrices using the train/test subsets derived from the original dataset.
The main difference between the first and the second strategies is that the first one allows combinations of all the bacterial species present in the original dataset, selecting sub-groups of them (called ‘consortia’) based on the highest AUC, while the second one is able to select single species, owing to the natural tendency of the logistic regression model to use all the variables in order to reach the maximum predictability.
The logistic regression classifier and the feature selection models were implemented in Python (version 2.7) using the methods ‘LogisticRegression’, ‘RFE’ and ‘RFECV’ from the Scikit Learn module (version 0.19.1). The SVM classifier (C-Support Vector Classification) was implemented in Python (version 2.7) using the method ‘SVC’ from the Scikit Learn module (version 0.19.1). The combinations for the ROC curves in the first strategy were generated in Python (version 2.7) using the method ‘combinations’ from the ‘itertools’ module, which is built into Python 2.7.
Logistic regression coefficients β were calculated by the logistic regression classifier with the following settings: no constant (also known as intercept) added to the decision function; a maximum number of iterations equal to 10000; a liblinear solver (with L2 penalization). The RFECV feature selection model (step=1, 3-fold cross-validation, accuracy scoring) was implemented in Python with a logistic regression estimator (no intercept, 10000 max iterations, liblinear solver), providing rankings within the interval [1: (nspecies-1)], and it was used to perform a recursive feature selection on all bacterial species, selecting only those species having a rank equal to 1. The RFE feature selection model (step=1, features to select=8) was similarly implemented in Python with the same estimator as RFECV (logistic regression, no intercept, 10000 max iterations, liblinear solver).
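The two-round feature selection described above (RFECV, then RFE down to 8 species, then a final logistic regression) could be sketched with scikit-learn as follows. This is an illustration written for a current Python 3 / scikit-learn installation rather than the original Python 2.7 script; the helper names make_estimator and select_species, the standardization inside the helper and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def make_estimator():
    # No intercept, 10000 max iterations, liblinear solver (L2 penalty is the default).
    return LogisticRegression(fit_intercept=False, max_iter=10000, solver="liblinear")


def select_species(X, y, n_final=8):
    """Return indices of the selected species and their beta coefficients."""
    # Assumption: abundances are standardized before selection.
    X_std = StandardScaler().fit_transform(X)
    # Round 1: RFECV (step=1, 3-fold CV, accuracy scoring); keep rank == 1.
    rfecv = RFECV(make_estimator(), step=1, cv=3, scoring="accuracy").fit(X_std, y)
    first_pass = np.flatnonzero(rfecv.ranking_ == 1)
    # Round 2: RFE down to at most n_final species.
    n_keep = int(min(n_final, first_pass.size))
    rfe = RFE(make_estimator(), n_features_to_select=n_keep, step=1)
    rfe.fit(X_std[:, first_pass], y)
    selected = first_pass[rfe.ranking_ == 1]
    # Final classifier on the selected species only.
    final_model = make_estimator().fit(X_std[:, selected], y)
    return selected, final_model.coef_.ravel()


# Synthetic stand-in for the abundance matrix (69 subjects, 30 species).
rng = np.random.default_rng(0)
X = rng.random((69, 30))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
selected, betas = select_species(X, y)
print(selected, betas)
```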
Receiver Operating Characteristic (ROC) curves were built with Python 2.7 using the SVM classifier (linear kernel) and employed to assess the actual predictive value of selected bacterial species or consortia. Two different approaches were used to achieve such a selection: 1) species retrieved from the LEfSe analysis (with LDA score ≥2) were combined in all possible combinations (2×2, 3×3, 4×4, 5×5, etc.), and 2) all species were considered and the first 5 having the best AUC were combined into a consortium. Area Under Curve (AUC), specificity and sensitivity were computed for each best combination, and reported as ROC curve and tabular data. A 5-fold cross-validation was used, with no added noise. Mean relative abundances (±Standard Error of the Mean), Fold Ratio (FR), Log 10 of FR and P value of cohort comparison (R vs NR, non-parametric Mann-Whitney U test) were also reported.
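The search for the consortium with the best AUC can be illustrated with the following sketch, which scores every combination of a given size with a linear-kernel SVC and 5-fold cross-validation. The helper name best_consortium, the species labels and the synthetic data are hypothetical; the original scripts are not reproduced here.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def best_consortium(X, y, species, size):
    """Return (species names, mean cross-validated AUC) of the best combination."""
    best_names, best_auc = None, -np.inf
    for combo in combinations(range(len(species)), size):
        # Linear-kernel SVM, 5-fold cross-validation, ROC AUC scoring.
        auc = cross_val_score(
            SVC(kernel="linear"), X[:, list(combo)], y, cv=5, scoring="roc_auc"
        ).mean()
        if auc > best_auc:
            best_names = tuple(species[i] for i in combo)
            best_auc = auc
    return best_names, float(best_auc)


# Synthetic stand-in data: 60 subjects, 5 candidate species.
rng = np.random.default_rng(0)
X = rng.random((60, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
species = ["sp_0", "sp_1", "sp_2", "sp_3", "sp_4"]
names, auc = best_consortium(X, y, species, size=2)
print(names, auc)
```

The number of combinations grows quickly with the number of candidate species, which is one reason to restrict the search to a pre-selected list such as the LEfSe-derived species.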
According to the aforementioned methods, bacterial species were chosen in the following manner: i) for the first strategy, taking into account all the combinations for all available species, and especially those derived from LEfSe, we selected, for each of the OUTCOME_1, OUTCOME_2, TTP3 and TTP6 timepoints, the ‘consortium’ having the highest AUC value, both for NR and R status; ii) for the second strategy, we selected the bacterial species having a RFECV and then RFE rank equal to 1, regardless of whether their logistic regression coefficients (β) were negative or positive, for OUTCOME_1 and OUTCOME_2.
Taking into account the two aforementioned strategies, a logistic regression classifier was implemented, for both of them, on standardized raw data (removing the mean and scaling to unit variance) in order to calculate regression coefficients (β) to be used in equation [2]. Thus, the relative contribution of each selected bacterial species to each model was weighted by the β coefficient and put into the equation [2] in order to predict through equation [1] the estimation (in probability percentage) of the outcome for a definite new subject.
$$P = \frac{1}{1 + \exp(-z)} \qquad [1]$$
and
$$z = \beta_1 x_1 + \ldots + \beta_n x_n \qquad [2]$$
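Equations [1] and [2] translate directly into a few lines of Python. The sketch below is only an illustration; the function names linear_score and probability and the example values are invented.

```python
import math


def linear_score(betas, x):
    """Equation [2]: z = beta_1*x_1 + ... + beta_n*x_n."""
    if len(betas) != len(x):
        raise ValueError("betas and x must have the same length")
    return sum(b * xi for b, xi in zip(betas, x))


def probability(betas, x):
    """Equation [1]: P = 1 / (1 + exp(-z))."""
    z = linear_score(betas, x)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and standardized abundances.
p_neutral = probability([0.5, -0.5], [1.0, 1.0])   # z = 0
p_high = probability([0.5, -0.5], [2.0, -2.0])     # z = 2
print(p_neutral, p_high)
```

When z equals 0 the probability is exactly 0.5, and it increases towards 1 as z grows.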
For both strategies, two feature selection models (RFECV and then RFE) were implemented to reduce the number of selected species to 8, a number suitable for implementation in forthcoming diagnostic kits for practical clinical use. Evaluation of the final models (built on three different sets of 8 bacterial species) was performed through metrics such as ROC curves and confusion matrices using train/test subsets of the original dataset (implemented in Python 2.7 and Scikit learn 0.19, through the train_test_split method).
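An evaluation of this kind (train/test split, confusion matrix, ROC AUC) might look like the following sketch for a model built on 8 species. The split ratio, the stratification, the random seeds and the synthetic data are assumptions made for the example; the source only states that train_test_split was used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 69 subjects and 8 selected species.
rng = np.random.default_rng(1)
X = rng.normal(size=(69, 8))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Assumption: 70/30 split, stratified on the R/NR label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize with statistics learned on the training subset only.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(fit_intercept=False, max_iter=10000, solver="liblinear")
model.fit(scaler.transform(X_train), y_train)

X_test_std = scaler.transform(X_test)
cm = confusion_matrix(y_test, model.predict(X_test_std))
auc = roc_auc_score(y_test, model.predict_proba(X_test_std)[:, 1])
print(cm)
print(auc)
```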
For OUTCOME_1, taking into account all the aforementioned embodiments, methods and strategies, and taking into account the dichotomous/complementary nature of logistic regression classifier, in the present invention we propose four equations ([3], [4], [6], [7]) to predict the NR or R status of a definite new subject JD. Two of them ([3], [4]) take into account the eventuality of an unknown antibiotic regimen/treatment status, while the other two ([6], [7]) consider that the individual did not take antibiotics during the past two months.
In order to circumvent the caveats deriving from the natural sources of variability within the subjects' cohort (feces sampling, feces consistencies, sample storage, sequencing biases, etc.) that could affect the relative abundances of under-represented bacterial species, the inventors built two further equations ([5], [8]) based solely on the species having an average relative abundance greater than or equal to 0.001 if expressed in the interval [0:1] (or 0.1% if expressed in the interval [0:100]), for subjects with unknown antibiotic treatment ([5]) and for subjects who did not take antibiotics during the last two months prior to sampling ([8]). These patterns of species underwent the second strategy of modeling as described above.
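The abundance filter described above (mean relative abundance of at least 0.001 on a [0:1] scale across the cohort) can be sketched in plain Python. The function name abundant_species and the toy profiles are hypothetical.

```python
def abundant_species(profiles, threshold=0.001):
    """Return species whose mean relative abundance is >= threshold.

    profiles: dict sample -> dict species -> relative abundance in [0, 1]
    """
    species = sorted({s for p in profiles.values() for s in p})
    n_samples = len(profiles)
    kept = []
    for s in species:
        # Species absent from a sample count as zero abundance.
        mean_abundance = sum(p.get(s, 0.0) for p in profiles.values()) / n_samples
        if mean_abundance >= threshold:
            kept.append(s)
    return kept


# Hypothetical cohort of two samples and two species.
profiles = {
    "sample_1": {"sp_a": 0.002, "sp_b": 0.0001},
    "sample_2": {"sp_a": 0.004, "sp_b": 0.0003},
}
kept = abundant_species(profiles)
print(kept)
```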
For OUTCOME_2, taking into account all the aforementioned embodiments, methods and strategies, and taking into account the dichotomous/complementary nature of logistic regression classifier, in the present invention we propose four equations ([9], [10], [12], [13]) to predict the NR or R status of a definite new subject JD. Two of them ([9], [10]) take into account the eventuality of an unknown antibiotic regimen/treatment status, while the other two ([12], [13]) consider that the individual did not take antibiotics during the past two months.
In order to circumvent the caveats deriving from the natural sources of variability within the subjects' cohort (feces sampling, feces consistencies, sample storage, sequencing biases, etc.) that could affect the relative abundances of under-represented bacterial species, the inventors built two further equations ([11], [14]) based solely on the species having an average relative abundance greater than or equal to 0.001 if expressed in the interval [0:1] (or 0.1% if expressed in the interval [0:100]), for subjects with unknown antibiotic treatment ([11]) and for subjects who did not take antibiotics during the last two months prior to sampling ([14]). These patterns of species underwent the second strategy of modeling as described above.
We explored the composition of the gut microbiota using quantitative metagenomics by shotgun sequencing, reaching >20 million short DNA sequence reads per sample, followed by analysis of the results in a 9.9 million-gene reference catalogue (J. Li, Nat. Biotechnol. 32, 834-841 (2014)). Total DNA was extracted from stool samples of 69 patients diagnosed with RCC prior to starting therapy (Table 1). Among the 69 RCC patients, 58 did not take ATB, while 11 had been prescribed ATB before the MG analyses. As reported in Materials and Methods, each metagenomic species (MGS) was assigned to a definite co-abundant gene profile (CAG), and the subsequent taxonomical annotation was performed based on gene homology to previously sequenced organisms (using BLASTn against the nt and WGS databanks). R (responders) were defined as those patients exhibiting a complete or partial response or a stable disease for >3 months, while NR (non-responders) were defined as harbouring a progressive disease.
When segregating responders (R) from non-responders (NR) in all 69 patients (according to the best clinical response as assessed by RECIST1.1), we failed to observe significant differences with a FDR of 10%.
GOMS Associated with Worse Clinical Outcome
However, the best predictive model for identifying the “resistant” patients, with an AUC value of 0.882, a specificity (true negative rate) of 0.778 and a sensitivity (true positive rate) of 0.860, was obtained when considering the LEfSe combination species, and the commensals retained in the model that were increased in NR (“bad bacteria”) were:
The cut-off values correspond to the means±SEM featuring in Table 2. We kept only those species having a p value <0.05 for the final model, and claim the equilibrium between good and bad bacteria for the best outcome in this patent.
When segregating responders (R) from non-responders (NR) in RCC patients who did not take ATB (according to the best clinical response as assessed by RECIST1.1), we failed to observe significant differences with a FDR p value <0.05.
However, the best predictive model for identifying the “resistant” patients, with an AUC value of 0.879, a specificity (true negative rate) of 0.707 and a sensitivity (true positive rate) of 0.950, was obtained when considering the LEfSe combination species, and the commensals retained in the model that were increased in NR (“bad bacteria”) were:
The cut-off values correspond to the means±SEM featuring in Table 15. We kept only those species having a p value <0.05 for the final model. According to the present invention, the equilibrium between good and bad bacteria is critical for the best outcome.
GOMS Associated with Best Clinical Outcome
The best predictive model for identifying the “responding” patients (R) among all patients, with an AUC value of 0.863, a specificity (true negative rate) of 0.798 and a sensitivity (true positive rate) of 0.842, was obtained when considering the LEfSe combination species; the detrimental commensals retained in the model, which were decreased in R, were:
The cut-off values correspond to the means±SEM featuring in Table 15. We kept only these species (that had a p value <0.05) for the final model and claim the equilibrium between good and bad bacteria for the best outcome in this patent.
The best predictive model allowing to identify the “responding” patients (R) among those who did not take ATB with AUC value of 0.845, Spec=0.990; Sens=0.614 was obtained when considering LEfSe combo species, and the commensals retained in the model were:
The following ones are decreased in R:
The following ones are increased in R:
The cut-off values correspond to the means±SEM featuring in Table 15. We kept only these species (that had a p value <0.05) for the final model. According to the present invention, the equilibrium between good and bad bacteria is critical for the best outcome.
Clostridium_bolteae_ATCC_BAA_613
Bacteroides_sp_CAG_144
Faecalibacterium_cf_prausnitzii_KLE1255_3
Eubacterium_sp_CAG_38
Prevotella_sp_CAG_617
Subdoligranulum_sp_4_3_54A2FAA
Prevotella_copri_CAG_164
Acinetobacter_sp_CAG_196
Azospirillum_sp_CAG_239
Clostridium_bolteae_ATCC_BAA_613
Bacteroides_sp_CAG_144
Faecalibacterium_cf_prausnitzii_KLE1255_3
Eubacterium_sp_CAG_38
Prevotella_sp_CAG_617
Subdoligranulum_sp_4_3_54A2FAA
Azospirillum_sp_CAG_239
Clostridium_bolteae_ATCC_BAA_613
Bacteroides_sp_CAG_144
Eubacterium_sp_CAG_115
Prevotella_sp_CAG_617
Bacteroides_plebeius_CAG_211
Eubacterium_siraeum_CAG_80
Bacteroides_coprophilus_DSM_18228_JCM_13818
Eubacterium_sp_CAG_115
Prevotella_sp_CAG_617
Clostridium_clostridioforme_2_1_49FAA_1
Bacteroides_plebeius_CAG_211
Eubacterium_siraeum_CAG_80
Eubacterium_sp_CAG_202
Bacteroides_coprophilus_DSM_18228_JCM_13818
Metagenomics-based GOMS associated with TTP>3 months according to RECIST 1.1 criteria (
When segregating responders (R, TTP>3 months) from non-responders (NR, TTP<3 months) in all 69 patients (according to the best clinical response as assessed by RECIST1.1), we failed to observe significant differences with a FDR p value <0.05. When segregating responders (R, TTP>3 months) from non-responders (NR, TTP<3 months) in RCC who did not take ATB (according to the best clinical response as assessed by RECIST1.1), we failed to observe significant differences with a FDR p value <0.05.
However, the multivariate model allowed us to identify some GOMS with clinical relevance.
GOMS Associated with TTP<3 Months
Resistance at 3 months was the best clinical readout of our analysis in RCC patients.
The best predictive model for identifying the “resistant” patients (NR) among all patients, with an AUC value of 0.913, a specificity (true negative rate) of 0.798 and a sensitivity (true positive rate) of 0.910, was obtained when considering the LEfSe combination, and the commensals retained in the model as “bad bacteria” increased in NR were the following, as shown in Table 16:
However, in patients who did not take ATB, the bacteria retained in this model, which gave an AUC of 0.884, a specificity of 0.667 and a sensitivity of 1.000, were:
For the bad bacteria, increase of:
For the good bacteria, decrease of:
We did retain the ratio between good and bad bacteria for TTP3 NR in our final model (see Table 16, part B).
GOMS Associated with TTP>3 Months
The best predictive model for identifying the “responder” patients (R) among all patients, with AUC=0.876; Spec=0.990; Sens=0.727, was obtained when considering the LEfSe combination, and the commensals retained in the model as “bad bacteria” decreased in R were the following, as shown in Table 16:
However, in patients who did not take ATB, the bacteria retained in this model, which gave AUC=0.899; Spec=0.990; Sens=0.836, were:
For the bad bacteria, a decrease in:
For the good bacteria, an increase in:
We did retain the ratio between good and bad bacteria for TTP3 R in our final model (see Table 16, part B).
Clostridium_sp_CAG_7_1
Hungatella_hathewayi_12489931_1
Prevotella_sp_CAG_617
Holdemanella_biformis_DSM_3989_1
Azospirillum_sp_CAG_239
Clostridium_bolteae_ATCC_BAA_613
Clostridium_sp_CAG_7_1
Faecalibacterium_cf_prausnitzii_KLE1255_3
Prevotella_sp_CAG_617
Holdemanella_biformis_DSM_3989_1
Azospirillum_sp_CAG_239
Peptoniphilus_sp_oral_taxon_836_str_F0141
Clostridium_bolteae_ATCC_BAA_613
Eubacterium_sp_CAG_38
Eubacterium_sp_CAG_115
Prevotella_sp_CAG_617
Holdemanella_biformis_DSM_3989_1
Faecalibacterium_cf_prausnitzii_KLE1255
Clostridium_bolteae_ATCC_BAA_613
Clostridium_sp_CAG_7_1
Akkermansia_muciniphila_CAG_154
Eubacterium_sp_CAG_115
Prevotella_sp_CAG_617
Prevotella_copri_CAG_164
Holdemanella_biformis_DSM_3989_1
Prevotella_sp_CAG_279_1
Faecalibacterium_cf_prausnitzii_KLE1255
When segregating responders (R, TTP>6 months) from non-responders (NR, TTP<6 months) in all RCC patients (according to the TTP6 as assessed by RECIST1.1), we did not observe significant differences with a FDR p value <0.05 (not shown). When segregating responders (R, TTP>6 months) from non-responders (NR, TTP<6 months) in RCC patients who did not take ATB (according to the TTP6 as assessed by RECIST1.1), we observed significant differences with a FDR p value <0.05 (
We then ran the multivariate model, which identified very robust GOMS with clinical relevance.
GOMS Associated with TTP>6 Months
The best predictive model for identifying the “responding” patients (R) among all patients, with an AUC value of 0.845, a specificity (true negative rate) of 0.889 and a sensitivity (true positive rate) of 0.800, was obtained when considering the LEfSe combination, and the good commensals retained in the model were the following, as shown in Table 17 (part A):
The bad commensals that should be decreased are:
The cut-off values correspond to the means±SEM featuring in Table 17.
However, in patients who did not take ATB, the list of favorable bacteria retained in this model was different and included the following ones; an AUC value of 0.804, a specificity (true negative rate) of 0.848 and a sensitivity (true positive rate) of 0.770 were obtained when considering the LEfSe combination (Table 17 part B):
The cut-off values correspond to the means±SEM featuring in Table 17. These favorable and unfavorable commensals are major species influencing favorable responses to PD1 blockade.
GOMS Associated with TTP<6 Months
As for the bacteria associated with a short TTP (<6 months) in all patients, the list of significant bacteria was very limited, even more so in patients without ATB. Very few were found to be significantly associated with resistance:
Clostridium_bolteae_ATCC_BAA_613
Alistipes_sp_CAG_514_1
Clostridiales_bacterium_VE202_14_1
Clostridium_sp_CAG_230
Azospirillum_sp_CAG_239_1
Azospirillum_sp_CAG_239_3
Alistipes_sp_CAG_268
Coprococcus_catus_GD_7_6
Clostridium_sp_CAG_230
Clostridium_sp_CAG_167
Oscillibacter_sp_KLE_1745_6
Azospirillum_sp_CAG_239_3
Eubacterium_sp_CAG_180_1
Eubacterium_sp_CAG_115
Eubacterium_siraeum_CAG_80
Butyrivibrio_crossotus_DSM_2876
In this example, three different models were built based on a bundle of 69 “reference patterns”, which are the standardized abundances profiles (zero mean and unit variance) obtained from the 69 samples used in training the logistic regression models. Standardization is necessary to homogenize the dynamic range of bacterial species abundances.
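Standardizing a new subject against the reference patterns can be sketched as follows. The population standard deviation is used here, which matches the usual zero-mean, unit-variance scaling; the function names fit_reference and standardize and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def fit_reference(reference_patterns):
    """Compute per-species mean and standard deviation from the reference patterns.

    reference_patterns: list of dicts species -> relative abundance
    """
    stats = {}
    for s in reference_patterns[0]:
        values = [pattern[s] for pattern in reference_patterns]
        stats[s] = (mean(values), pstdev(values))
    return stats


def standardize(pattern, stats):
    """Express one subject's abundances as standardized values."""
    # A species with zero variance in the reference set is mapped to 0.
    return {
        s: ((pattern[s] - mu) / sd if sd > 0 else 0.0)
        for s, (mu, sd) in stats.items()
    }


# Hypothetical reference patterns (two samples instead of 69) and one new subject.
references = [{"CAG_demo": 1.0}, {"CAG_demo": 3.0}]
stats = fit_reference(references)
subject_std = standardize({"CAG_demo": 3.0}, stats)
print(subject_std)
```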
Based on the first strategy described in the Materials and Methods, applied to 69 RCC patients (NR=27; R=42) with unknown antibiotic usage history and OUTCOME_1 criterion (best outcome), we selected 25 bacterial species that gave a logistic regression model overall predictability equal to 89.86% (model 1.1). In order to measure the goodness of the model with 25 species, after a 5-fold cross-validation performed on a subset of the original dataset, the model itself was able to predict 60% of NR and 69% of R subjects, with a CV_AUC=0.917 and an AUC=0.785. Upon RFECV and RFE feature selection, 8 species were selected from the aforementioned 25: the model built on these 8 species gave an overall predictability of 79.71% (model 1.2,
Based on the second strategy (machine-learning) applied to 69 RCC patients (NR=27; R=42) with unknown antibiotic usage history and OUTCOME_1 criterion, we took into account all the bacterial species of the dataset (n=1112). After a first round of feature selection, RFECV gave a rank=1 to 31 species which collectively, after a 5-fold cross-validation, were able to predict 100% of NR and 100% of R subjects, with a CV_AUC=1 and an AUC=1 (model 2.1). Upon a second round of feature selection, RFE gave a rank=1 to 8 species, which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 89.86% (model 2.2,
Taking into account the bacterial species (n=181, out of 1112 total species, 16.3% of the total microbial composition) having an average relative abundance greater than or equal to 0.001 if expressed in the interval [0:1] (thus, 0.1%) within the overall RCC cohort, we applied the second strategy (machine-learning) for 69 RCC patients (NR=27; R=42) with unknown antibiotic usage history and OUTCOME_1 criterion. After a first round of feature selection, RFECV gave a rank=1 to 21 species which collectively, after a 5-fold cross-validation, were able to predict 100% of NR and 92% of R subjects, with a CV_AUC=0.573 and an AUC=1 (model 3.1). After a second round of feature selection, RFE gave a rank=1 to 8 species which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 85.51% (model 3.2,
According to the Models 1.2, 2.2 and 3.2, which were modeled on unknown antibiotic usage history and OUTCOME_1 criterion, and are all based on 8 bacterial species so they can be properly compared, it is shown that the Model 2.2 has the highest predictability (89.86%) with the highest CV_AUC and AUC.
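The abundance filter applied before building Models 3.1 and 3.2 (average relative abundance of at least 0.001 across the cohort) can be sketched in Python as follows. This is a minimal illustration only: the function name `filter_by_mean_abundance`, the dictionary layout and the toy values are assumptions introduced here, not part of the described pipeline.

```python
from statistics import mean


def filter_by_mean_abundance(abundances, threshold=0.001):
    """Keep species whose mean relative abundance is >= threshold.

    `abundances` maps a species identifier to the list of its relative
    abundances (expressed in [0, 1]) across all samples of the cohort.
    """
    return {
        species: values
        for species, values in abundances.items()
        if mean(values) >= threshold
    }


# Hypothetical toy cohort of two samples and three species.
toy_cohort = {
    "CAG_A": [0.010, 0.020],    # mean 0.015   -> kept
    "CAG_B": [0.0001, 0.0002],  # mean 0.00015 -> dropped
    "CAG_C": [0.0, 0.004],      # mean 0.002   -> kept
}
kept_species = filter_by_mean_abundance(toy_cohort)
print(sorted(kept_species))
```

The threshold is applied to the mean across the whole cohort, so a species absent from some samples can still be retained if it is abundant in others.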
According to the results obtained in Example 4, and due to the fact that responders (R) are predicted better than non-responders (NR) in all the three models proposed with 8 species (
For the Model 1.2, according to the β coefficients of R (R_coeff), the probability estimate (expressed as a percentage) of belonging to R cohort for a tested subject JD is calculated as follows:
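$$
P_{R}(JD) = \frac{1}{1 + \exp\left[-\left(
-1.0323\,\mathrm{CAG00008}
-1.084\,\mathrm{CAG01039}
-0.9723\,\mathrm{CAG00473}
-0.9907\,\mathrm{CAG00140}
-0.6234\,\mathrm{CAG00610}
-0.8468\,\mathrm{CAG01141}
+0.8994\,\mathrm{CAG00413}
+0.6041\,\mathrm{CAG00317}
\right)\right]} \tag{3}
$$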
In equation [3], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”, i.e., all the relative abundances of each bacterial species for each fecal sample used in training the logistic regression model; the present invention includes a claim to a Python script that eases the computation of P_R(JD). Once the abundance pattern of a tested subject has been standardized, equation [3] can be applied to predict whether he/she belongs to the R cohort, with a model predictability of 79.71% and a success rate of 77%.
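As an illustration of how equation [3] might be evaluated for one subject, the following Python sketch standardizes each abundance against a set of reference values and then applies the Model 1.2 coefficients. It is not the Python script referred to in the text. The function names, the dictionary layout and the reference values are hypothetical, and standardizing with the reference mean and population standard deviation only (without pooling the tested subject into the reference set) is an assumption.

```python
import math
from statistics import mean, pstdev

# Coefficients of Model 1.2, copied from equation [3].
MODEL_1_2_R_COEFF = {
    "CAG00008": -1.0323,
    "CAG01039": -1.084,
    "CAG00473": -0.9723,
    "CAG00140": -0.9907,
    "CAG00610": -0.6234,
    "CAG01141": -0.8468,
    "CAG00413": 0.8994,
    "CAG00317": 0.6041,
}


def standardize(value, reference_values):
    """Scale one abundance to zero mean / unit variance of the reference set."""
    mu = mean(reference_values)
    sigma = pstdev(reference_values)  # population std (assumption)
    return 0.0 if sigma == 0 else (value - mu) / sigma


def probability_r(coeffs, subject, reference):
    """Apply P = 1 / (1 + exp(-z)) with z = sum(beta_i * x_i)."""
    z = sum(
        beta * standardize(subject[cag], reference[cag])
        for cag, beta in coeffs.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical reference patterns: three samples per species.
reference_patterns = {cag: [0.001, 0.002, 0.003] for cag in MODEL_1_2_R_COEFF}
# Hypothetical subject sitting exactly at the reference mean for each species.
subject_jd = {cag: mean(vals) for cag, vals in reference_patterns.items()}

print(f"P_R(JD) = {100 * probability_r(MODEL_1_2_R_COEFF, subject_jd, reference_patterns):.1f}%")
```

A subject whose abundances all equal the reference means has z = 0, so the sketch returns 50%, which is a convenient sanity check for any implementation.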
For the Model 2.2, according to the β coefficients of R (R_coeff), the probability estimate (expressed as a percentage) of belonging to R cohort for a tested subject JD is calculated as follows:
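$$
P_{R}(JD) = \frac{1}{1 + \exp\left[-\left(
-1.0791\,\mathrm{CAG00727}
-1.1164\,\mathrm{CAG01141}
-1.2839\,\mathrm{CAG00650}
-1.1156\,\mathrm{CAG00928}
-1.014\,\mathrm{CAG00473}
-0.9394\,\mathrm{CAG00122}
-0.8085\,\mathrm{CAG01144}
-0.7679\,\mathrm{CAG00063}
\right)\right]} \tag{4}
$$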
In equation [4], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_R(JD). Once the abundance pattern of a tested subject has been standardized, equation [4] can be applied to predict whether he/she belongs to the R cohort, with a model predictability of 89.86% and a success rate of 92%.
For the Model 3.2, according to the β coefficients of R (R_coeff), the probability estimate (expressed as a percentage) of belonging to R cohort for a tested subject JD is calculated as follows:
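$$
P_{R}(JD) = \frac{1}{1 + \exp\left[-\left(
-1.1735\,\mathrm{CAG00243}
-1.027\,\mathrm{CAG00473}
-0.9968\,\mathrm{CAG00327}
-1.0043\,\mathrm{CAG01039}
-0.7456\,\mathrm{CAG00140}
-0.8736\,\mathrm{CAG01263}
+0.7032\,\mathrm{CAG00037}
-0.6073\,\mathrm{CAG00357}
\right)\right]} \tag{5}
$$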
In equation [5], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_R(JD). Once the abundance pattern of a tested subject has been standardized, equation [5] can be applied to predict whether he/she belongs to the R cohort, with a model predictability of 85.51% and a success rate of 85%.
In this example, three different models were built based on a bundle of 58 “reference patterns”, which are the standardized abundances profiles (zero mean and unit variance) obtained from the 58 samples used in training the logistic regression models. Standardization is necessary to homogenize the dynamic range of bacterial species abundances.
Based on the first strategy applied to 58 RCC patients (NR=19; R=39) with no antibiotic usage history and OUTCOME_1 criterion (best outcome), we selected 25 bacterial species that gave a logistic regression model with an overall predictability of 89.66% (model 4.1). To measure the goodness of the 25-species model, a 5-fold cross-validation was performed on a subset of the original dataset: the model predicted 67% of NR and 100% of R subjects, with a CV_AUC=1 and an AUC=0.778. Upon RFECV and RFE feature selection, 8 species were selected from the aforementioned 25: the model built on these 8 species gave an overall predictability of 87.93% (model 4.2,
Based on the second strategy (machine-learning) applied to 58 RCC patients (NR=19; R=39) with no antibiotic usage history and OUTCOME_1 criterion, we took into account all the bacterial species of the dataset (n=1083). After a first round of feature selection, RFECV gave a rank=1 to 9 species which collectively, after a 5-fold cross-validation, were able to predict 100% of NR and 92% of R subjects, with a CV_AUC=1 and an AUC=1 (model 5.1). Upon a second round of feature selection, RFE gave a rank=1 to 8 species, which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 93.1% (model 5.2,
Taking into account only the bacterial species (n=183 out of 1083 total species, 16.9% of the total microbial composition) having an average relative abundance greater than or equal to 0.001 when expressed in the interval [0:1] (i.e., 0.1%) within the overall RCC cohort, we applied the second strategy (machine-learning) to 58 RCC patients (NR=19; R=39) with no antibiotic usage history and OUTCOME_1 criterion. After a first round of feature selection, RFECV gave a rank=1 to 20 species which collectively, after a 5-fold cross-validation, were able to predict 100% of NR and 92% of R subjects, with a CV_AUC=0.917 and an AUC=1 (model 6.1). After a second round of feature selection, RFE gave a rank=1 to 8 species, which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 84.48% (model 6.2,
According to the Models 4.2, 5.2 and 6.2, which were modeled on no antibiotic usage history and OUTCOME_1 criterion and were all based on 8 bacterial species in order to be properly compared, it is shown that the Model 5.2 has the highest predictability (93.1%) with optimal CV_AUC and AUC.
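The two-round feature selection used in the second strategy (RFECV, then RFE down to 8 species) could look roughly like the scikit-learn sketch below. The data are synthetic. Using the logistic regression classifier as the estimator for both rounds, `scoring="roc_auc"` and `min_features_to_select=8` are assumptions made for this illustration and are not stated in the text.

```python
import numpy as np
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical data: 58 samples x 30 species, labels 0 = NR and 1 = R.
X_raw = rng.random((58, 30))
y = (X_raw[:, 0] + X_raw[:, 1] > 1.0).astype(int)

# Standardize every species to zero mean and unit variance.
X = StandardScaler().fit_transform(X_raw)

# Classifier settings follow the requirements stated for the method
# (no intercept, 10000 iterations, liblinear solver, default L2 penalty).
estimator = LogisticRegression(
    fit_intercept=False, max_iter=10000, solver="liblinear"
)

# Round 1: recursive feature elimination with 5-fold cross-validation.
rfecv = RFECV(
    estimator, step=1, cv=5, scoring="roc_auc", min_features_to_select=8
).fit(X, y)
round1 = np.flatnonzero(rfecv.ranking_ == 1)

# Round 2: plain RFE on the rank-1 species, keeping exactly 8 of them.
rfe = RFE(estimator, n_features_to_select=8).fit(X[:, round1], y)
selected = round1[rfe.ranking_ == 1]

print("rank-1 species after RFECV:", len(round1))
print("column indices kept after RFE:", selected.tolist())
```

Setting `min_features_to_select=8` only guarantees that the first round leaves enough species for the second one; it does not change how the rank-1 set is then reduced.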
According to the results obtained in Example 6, and due to the fact that non-responders (NR) are predicted better than responders (R) in the first two models proposed with 8 species (
Regarding the Model 4.2, and according to the β coefficients of NR (NR_coeff), the probability estimate (expressed as a percentage) of belonging to NR cohort for a tested subject JD is calculated as follows:
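$$
P_{NR}(JD) = \frac{1}{1 + \exp\left[-\left(
0.955\,\mathrm{CAG00008}
+0.8761\,\mathrm{CAG01039}
+0.9951\,\mathrm{CAG00473}
+0.7609\,\mathrm{CAG00487}
+1.1107\,\mathrm{CAG00140}
+0.9402\,\mathrm{CAG01141}
+0.7187\,\mathrm{CAG01208}
-0.5812\,\mathrm{CAG00413}
\right)\right]} \tag{6}
$$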
In equation [6], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_NR(JD). Once the abundance pattern of a tested subject has been standardized, equation [6] can be applied to predict whether he/she belongs to the NR cohort, with a model predictability of 87.93% and a success rate of 100%.
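A short remark on how the NR and R forms of one model relate may be useful here. Assuming, as is the case for a two-class logistic regression fitted without intercept, that the NR coefficients are the R coefficients with opposite signs (an assumption, since both sets are not listed side by side for the same model), the log-odds satisfy $z_{NR} = -z_{R}$ and the two probabilities are complementary:

$$
P_{NR}(JD) = \frac{1}{1 + \exp(-z_{NR})} = \frac{1}{1 + \exp(z_{R})} = 1 - \frac{1}{1 + \exp(-z_{R})} = 1 - P_{R}(JD)
$$

Under that assumption, a subject with an estimated P_NR(JD) of 80% has an estimated P_R(JD) of 20% from the same model.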
Regarding the Model 5.2, and according to the β coefficients of NR (NR_coeff), the probability estimate (expressed as a percentage) of belonging to NR cohort for a tested subject JD is calculated as follows:
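$$
P_{NR}(JD) = \frac{1}{1 + \exp\left[-\left(
0.7112\,\mathrm{CAG00048\_1}
+0.8931\,\mathrm{CAG00300}
+1.0042\,\mathrm{CAG00473}
+1.1282\,\mathrm{CAG00650}
+1.2569\,\mathrm{CAG00720}
+1.2397\,\mathrm{CAG00727}
+0.7595\,\mathrm{CAG00963}
+1.0235\,\mathrm{CAG01141}
\right)\right]} \tag{7}
$$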
In equation [7], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_NR(JD). Once the abundance pattern of a tested subject has been standardized, equation [7] can be applied to predict whether he/she belongs to the NR cohort, with a model predictability of 93.1% and a success rate of 100%.
Regarding the Model 6.2, and according to the β coefficients of R (R_coeff), the probability estimate (expressed as a percentage) of belonging to R cohort for a tested subject JD is calculated as follows:
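$$
P_{R}(JD) = \frac{1}{1 + \exp\left[-\left(
-1.2907\,\mathrm{CAG00243}
-0.8848\,\mathrm{CAG00008}
-0.9776\,\mathrm{CAG00698}
-0.9997\,\mathrm{CAG00473}
-1.0697\,\mathrm{CAG00327}
-0.7806\,\mathrm{CAG01039}
-0.8376\,\mathrm{CAG00897}
-0.7459\,\mathrm{CAG00766}
\right)\right]} \tag{8}
$$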
In equation [8], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_R(JD). Once the abundance pattern of a tested subject has been standardized, equation [8] can be applied to predict whether he/she belongs to the R cohort, with a model predictability of 84.48% and a success rate of 100%.
In this example, three different models were built based on a bundle of 67 “reference patterns”, which are the standardized abundances profiles (zero mean and unit variance) obtained from the 67 samples used in training the logistic regression models. Standardization is necessary to homogenize the dynamic range of bacterial species abundances.
Based on the first strategy applied to 67 RCC patients (NR=37; R=30) with unknown antibiotic usage history and OUTCOME_2 criterion (TTP>6 months), we selected 10 bacterial species that gave a logistic regression model with an overall predictability of 65.22% (model 7.1). To measure the goodness of the 10-species model, a 5-fold cross-validation was performed on a subset of the original dataset: the model predicted 78% of NR and 44% of R subjects, with a CV_AUC=0.370 and an AUC=0.716. Upon RFECV and RFE feature selection, 8 species were selected from the aforementioned 10: the model built on these 8 species gave an overall predictability of 66.67% (model 7.2,
Based on the second strategy (machine-learning) applied to 67 RCC patients (NR=37; R=30) with unknown antibiotic usage history and OUTCOME_2 criterion, we took into account all the bacterial species of the dataset (n=1112). After a first round of feature selection, RFECV gave a rank=1 to 1070 species which collectively, after a 5-fold cross-validation, were able to predict 11% of NR and 100% of R subjects, with a CV_AUC=0.667 and an AUC=0.667 (model 8.1). Upon a second round of feature selection, RFE gave a rank=1 to 8 species, which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 88.41% (model 8.2,
Taking into account only the bacterial species (n=181 out of 1112 total species, 16.3% of the total microbial composition) having an average relative abundance greater than or equal to 0.001 when expressed in the interval [0:1] (i.e., 0.1%) within the overall RCC cohort, we applied the second strategy (machine-learning) to 67 RCC patients (NR=37; R=30) with unknown antibiotic usage history and OUTCOME_2 criterion. After a first round of feature selection, RFECV gave a rank=1 to 175 species which collectively, after a 5-fold cross-validation, were able to predict 33% of NR and 78% of R subjects, with a CV_AUC=0.222 and an AUC=0.605 (model 9.1). After a second round of feature selection, RFE gave a rank=1 to 8 species, which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 88.41% (model 9.2,
According to all Models, which were modeled on unknown antibiotic usage history and OUTCOME_2 criterion, it is shown that the Model 8.2 has a good predictability (84.41%) with good CV_AUC and AUC, and can be chosen as the most discriminant among NR and R patients.
According to the results obtained in Example 8, we took into consideration the logistic regression β coefficients relative to NR for Models 7.2, 8.2 and 9.2.
For the Model 7.2, according to the β coefficients of non-responders (NR_coeff), the probability estimate (expressed as a percentage) of belonging to NR cohort for a tested subject JD is calculated as follows:
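$$
P_{NR}(JD) = \frac{1}{1 + \exp\left[-\left(
-0.2034\,\mathrm{CAG00782}
+0.4092\,\mathrm{CAG00013}
+0.2708\,\mathrm{CAG00873}
+0.7665\,\mathrm{CAG01141}
-0.5899\,\mathrm{CAG00668}
-0.5181\,\mathrm{CAG00669}
-0.8415\,\mathrm{CAG00886}
-0.7339\,\mathrm{CAG00889}
\right)\right]} \tag{9}
$$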
In equation [9], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_NR(JD). Once the abundance pattern of a tested subject has been standardized, equation [9] can be applied to predict whether he/she belongs to the NR cohort, with a model predictability of 66.67% and a success rate of 78%.
For the Model 8.2, according to the β coefficients of NR (NR_coeff), the probability estimate (expressed as a percentage) of belonging to NR cohort for a tested subject JD is calculated as follows:
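$$
P_{NR}(JD) = \frac{1}{1 + \exp\left[-\left(
0.6422\,\mathrm{CAG00211}
-1.0581\,\mathrm{CAG00474}
-0.8984\,\mathrm{CAG00624}
+1.3189\,\mathrm{CAG00650}
-0.7324\,\mathrm{CAG00676}
-0.6623\,\mathrm{CAG00771}
-1.1372\,\mathrm{CAG01197}
-0.8525\,\mathrm{CAG01321}
\right)\right]} \tag{10}
$$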
In equation [10], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_NR(JD). Once the abundance pattern of a tested subject has been standardized, equation [10] can be applied to predict whether he/she belongs to the NR cohort, with a model predictability of 84.41% and a success rate of 100%.
For the Model 9.2, according to the β coefficients of NR (NR_coeff), the probability estimate (expressed as a percentage) of belonging to NR cohort for a tested subject JD is calculated as follows:
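$$
P_{NR}(JD) = \frac{1}{1 + \exp\left[-\left(
0.9341\,\mathrm{CAG00557}
+0.8927\,\mathrm{CAG00601}
+0.7108\,\mathrm{CAG00607}
-1.2341\,\mathrm{CAG00669}
+1.0379\,\mathrm{CAG00861}
-1.0016\,\mathrm{CAG00880}
-0.872\,\mathrm{CAG00937}
-0.8362\,\mathrm{CAG01321}
\right)\right]} \tag{11}
$$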
In equation [11], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_NR(JD). Once the abundance pattern of a tested subject has been standardized, equation [11] can be applied to predict whether he/she belongs to the NR cohort, with a model predictability of 88.41% and a success rate of 89%.
In this example, three different models were built based on a bundle of 56 “reference patterns”, which are the standardized abundances profiles (zero mean and unit variance) obtained from the 56 samples used in training the logistic regression models. Standardization is necessary to homogenize the dynamic range of bacterial species abundances.
Based on the first strategy applied to 56 RCC patients (NR=29; R=27) with no antibiotic usage history and OUTCOME_2 criterion, we selected 10 bacterial species that gave a logistic regression model overall predictability equal to 72.41% (model 10.1). In order to measure the goodness of the model with 10 species, after a 5-fold cross-validation performed on a subset of the original dataset, the model itself was able to predict 78% of NR and 67% of R subjects, with a CV_AUC=0.667 and an AUC=0.796. Upon RFECV and RFE feature selection, 8 species were selected from the aforementioned 10: the model built on these 8 species gave an overall predictability of 68.97% (model 10.2,
Based on the second strategy (machine-learning) applied to 56 RCC patients (NR=29; R=27) with no antibiotic usage history and OUTCOME_2 criterion, we took into account all the bacterial species of the dataset (n=1083). After a first round of feature selection, RFECV gave a rank=1 to 51 species which collectively, after a 5-fold cross-validation, were able to predict 89% of NR and 83% of R subjects, with a CV_AUC=0.944 and an AUC=0.944 (model 11.1). Upon a second round of feature selection, RFE gave a rank=1 to 8 species, which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 91.38% (model 11.2,
Taking into account only the bacterial species (n=183 out of 1083 total species, 16.9% of the total microbial composition) having an average relative abundance greater than or equal to 0.001 when expressed in the interval [0:1] (i.e., 0.1%) within the overall RCC cohort, we applied the second strategy (machine-learning) to 56 RCC patients (NR=29; R=27) with no antibiotic usage history and OUTCOME_2 criterion. After a first round of feature selection, RFECV gave a rank=1 to 20 species which collectively, after a 5-fold cross-validation, were able to predict 78% of NR and 67% of R subjects, with a CV_AUC=0.833 and an AUC=0.852 (model 12.1). After a second round of feature selection, RFE gave a rank=1 to 8 species, which were selected for the subsequent logistic regression classifier. The model built on these 8 species gave an overall predictability of 89.66% (model 12.2,
According to the Models 10.2, 11.2 and 12.2, which were modeled on no antibiotic usage history and OUTCOME_2 criterion, and were all based on 8 bacterial species in order to be properly compared, it is shown that the Model 11.2 has the highest discriminatory power among NR and R, with good predictability (91.38%), and excellent CV_AUC and AUC.
According to the results obtained in Example 10, and due to the fact that responders (R) are predicted better than non-responders (NR) in models 10.2 and 11.2, while the opposite holds for model 12.2, and that all models were proposed with 8 species (
Regarding the Model 10.2, and according to the β coefficients of R (R_coeff), the probability estimate (expressed as a percentage) of belonging to R cohort for a tested subject JD is calculated as follows:
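$$
P_{R}(JD) = \frac{1}{1 + \exp\left[-\left(
-0.1456\,\mathrm{CAG00580}
-0.5496\,\mathrm{CAG00013}
-0.4646\,\mathrm{CAG00873}
-0.5496\,\mathrm{CAG01141}
+0.5684\,\mathrm{CAG00668}
+0.5129\,\mathrm{CAG00669}
+0.6623\,\mathrm{CAG00886}
+0.7102\,\mathrm{CAG00889}
\right)\right]} \tag{12}
$$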
In equation [12], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_R(JD). Once the abundance pattern of a tested subject has been standardized, equation [12] can be applied to predict whether he/she belongs to the R cohort, with a model predictability of 68.97% and a success rate of 100%.
Regarding the Model 11.2, and according to the β coefficients of R (R_coeff), the probability estimate (expressed as a percentage) of belonging to R cohort for a tested subject JD is calculated as follows:
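$$
P_{R}(JD) = \frac{1}{1 + \exp\left[-\left(
-0.8463\,\mathrm{CAG00346}
+1.13\,\mathrm{CAG00530}
-1.183\,\mathrm{CAG00601}
-0.9544\,\mathrm{CAG00607}
+0.8929\,\mathrm{CAG00646}
+1.0919\,\mathrm{CAG00713}
+0.4881\,\mathrm{CAG00919}
+1.1606\,\mathrm{CAG01158}
\right)\right]} \tag{13}
$$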
In equation [13], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_R(JD). Once the abundance pattern of a tested subject has been standardized, equation [13] can be applied to predict whether he/she belongs to the R cohort, with a model predictability of 91.38% and a success rate of 100%.
Regarding the Model 12.2, and according to the β coefficients of NR (NR_coeff), the probability estimate (expressed as a percentage) of belonging to NR cohort for a tested subject JD is calculated as follows:
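$$
P_{NR}(JD) = \frac{1}{1 + \exp\left[-\left(
1.138\,\mathrm{CAG00142}
+1.1396\,\mathrm{CAG00218}
-1.0166\,\mathrm{CAG00341}
-0.8403\,\mathrm{CAG00474}
-0.7067\,\mathrm{CAG00508}
-1.185\,\mathrm{CAG00880}
+0.8466\,\mathrm{CAG01263}
-0.8895\,\mathrm{CAG01321}
\right)\right]} \tag{14}
$$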
In equation [14], “CAGXXXXX” denotes the relative abundance, in a stool sample from the tested subject JD, of the bacterial species defined by the recited CAG. These relative abundances must be expressed as standardized values (zero mean and unit variance). The best option is to standardize the “abundance pattern” of the tested subject against a bundle of known “reference patterns”; the present invention includes a claim to a Python script that eases the computation of P_NR(JD). Once the abundance pattern of a tested subject has been standardized, equation [14] can be applied to predict whether he/she belongs to the NR cohort, with a model predictability of 89.66% and a success rate of 78%.
The method described above relies on the following positive features:
According to the statistical method used to properly fit the “abundances pattern” of a definite subject into one of the proposed models, as described in Materials and Methods, we propose the following steps:
$$
P = \frac{1}{1 + \exp(-z)} \tag{1}
$$
and
$$
z = \beta_{1} x_{1} + \dots + \beta_{n} x_{n} \tag{2}
$$
where z is the log-odds (the natural logarithm of the ratio between the probability that an event will occur and the probability that it will not occur), exp is the natural exponential function, each β is a regression coefficient pre-computed according to the invention, and each x is the relative abundance of a selected bacterial species, expressed within the closed interval [0:1] and then necessarily standardized (mean removed and scaled to unit variance);
In the above method, the regression coefficients must be calculated with a logistic regression classifier meeting the following requirements: no constant (also known as intercept) added to the decision function; a maximum number of iterations equal to 10000; a liblinear solver (with L2 penalization). All these logistic regression parameters are described thoroughly in section 1.1.11 of the scikit-learn user guide.
OUTCOME_2 means one considers only SD>6 months among responders.
Based on the results disclosed herein, we conclude the following notions:
1/GOMS (“gut oncomicrobiome signatures”) can definitely predict good clinical outcome during PD-1 blockade with AUC>0.9, and decent specificity and sensitivity in RCC.
2/Some clinical criteria contrasting R versus NR can be impacted by the composition of the gut oncomicrobiome.
3/The GOMS of R and NR are by and large the same ones when taking into consideration the ‘human-driven’ strategy, and they can predict differently the NR or R status based on logistic regression coefficients.
4/Clinical significance relies on an equilibrium between “favorable” and “unfavorable” commensals, detectable in higher abundance in R feces or NR feces, respectively.
5/Some species withstand ATB, mainly members of Firmicutes phylum.
6/Our best predicting minimal ecosystem (called “GOMS”) on the whole population of RCC 2L is based on a few MGS (unfavorable Prevotella spp., Faecalibacterium spp., Coprococcus catus, Eggerthella lenta, Clostridium bolteae versus favorable Sutterella spp., Akkermansia, Alistipes spp.), as evidenced by pairwise analysis and logistic regression models.
7/Of note, some favorable commensals are part of the differences observed between healthy versus cancer individuals (bacteria overrepresented in HV (vs cancer patients) are also associated with the R phenotype in anti-PD1 treated patients).
In the present text, we proposed different GOMS (gut oncomicrobiome signatures) to predict, before an ICI treatment, the NR or R status of a definite subject suffering from RCC, depending on two main criteria: i) the presence of an antibiotic treatment/usage within the last two months; ii) the RECIST definition of outcome (OUTCOME_1 or OUTCOME_2). As a first step, we showed that GOMS are able to differentiate between NR and R at OUTCOME_1, TTP3 and TTP6, the latter showing significant alpha- and beta-diversity in the first cohort reported in Routy et al., Science 2018. Mean AUC values (among NR and R) for the best descriptors (single bacterial species or consortia) based on ROC curves showed the highest predictive value at TTP3: regardless of antibiotic usage, mean AUC_OUTCOME1=0.872, mean AUC_TTP3=0.894 and mean AUC_TTP6=0.840, while with no antibiotic treatment, mean AUC_OUTCOME1=0.862, mean AUC_TTP3=0.891 and mean AUC_TTP6=0.762. From these observations, it appears that in RCC the antibiotic pressure (at least within the two months before ICI treatment) enhances the predictability of the GOMS, especially at TTP6 (+10%), while leaving it unaltered at OUTCOME1 and TTP3.
In the above study, we analyzed in depth, by means of Volcano, LEfSe and pairwise analysis, the bacterial signature of NR and R in patients regardless of antibiotic usage and in those who did not take antibiotics during the two months preceding ICI treatment. We found that, from the original signature at TTP6, no bacterial species resisted ATB among NR, while 18% resisted ATB among responders (
Number | Date | Country | Kind |
---|---|---|---|
18306282.7 | Sep 2018 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/076158 | 9/27/2019 | WO | 00 |