Drug combination therapy has been widely used to treat diseases, including some of the most serious, such as cancer and AIDS. A drug combination can often achieve better therapeutic outcomes than single-drug treatment. The rationale for drug combination includes improved therapeutic effect, reduced dose and toxicity, and the reduction or delay of drug resistance.
Due to the large number of available drugs, their complex (and often poorly understood) mechanisms of action, and complicated drug-drug interactions, finding a combination of drugs with improved efficacy, reduced toxicity, and other desirable properties can be extremely difficult. Given limited resources, it is infeasible to screen all possible drug combinations experimentally.
There is a need to develop methods for improved reliability/accuracy in the prediction of properties of combination drugs.
The present disclosure provides computer-implemented methods to predict the effects of drug combinations using genomic data, treatment patterns, and clinical outcomes data.
In one aspect, the present disclosure provides a method of determining effects of drug combinations on treatment outcomes, which comprises: generating a plurality of genomic and clinical variables from the combination of (1) comprehensive genomic data, (2) EHR data, and (3) clinical treatment data, for each of a plurality of patients; wherein the plurality of patients comprise at least a first subset who have been treated with at least one first drug for a disease, and a second subset who have been treated with at least one second, different drug for the same disease, the first subset not entirely overlapping the second subset; setting up a plurality of two by two contingency tables in which rows are defined by the presence or absence of each of the plurality of genomic and clinical variables, and the columns are defined by the presence or absence of each of the first drug and the second drug; based on a Cox Proportional Hazards model, calculating independent risk factors, cumulative hazard-ratios, and p-values for the combination of the first drug and the second drug; and determining the nature of the combination of the first drug and the second drug as being one of additive, synergistic, and antagonistic with respect to treating the disease.
In another aspect, the present disclosure provides a method of determining drug effect on treatment outcome for a disease, comprising: generating a plurality of genomic and clinical variables from the combination of (1) comprehensive genomic data, (2) EHR data, and (3) clinical treatment data, for each of a plurality of patients; wherein some, but not all, of the plurality of patients share a common biomarker, and wherein some, but not all, of the plurality of patients have been treated with a same drug for a disease; based on the plurality of genomic and clinical variables and a two by two contingency table representing the following combinations: (1) the number of patients having the biomarker and having been treated with the drug, (2) the number of patients having the biomarker but having not been treated by the drug, (3) the number of patients not having the biomarker and having been treated with the drug, and (4) the number of patients not having the biomarker and not having been treated by the drug, using a Cox Proportional Hazards model to calculate independent risk factors, cumulative hazard-ratios, and p-values for the combination of the drug and the biomarker; and determining the nature of the combination of the drug and the biomarker as being one of additive, synergistic, and antagonistic with respect to treating the disease.
In a further aspect, the present disclosure provides a method of determining effects of drug combinations on treatment outcomes, the method comprising: generating a plurality of genomic and clinical variables from the combination of (1) comprehensive genomic data, (2) EHR data, and (3) clinical treatment data, for each of a plurality of patients; wherein the plurality of patients comprise at least a first subset who have been treated with at least one first drug for a disease, a second subset who have been treated with at least one second, different drug for the same disease, and a third subset who have been treated with at least one third drug, which is different from the first drug and different from the second drug, for the same disease, each of the first, second, and third subsets not entirely overlapping with any of the other subsets; setting up a plurality of two by two contingency tables in which rows are defined by the presence or absence of each of the plurality of genomic and clinical variables, and the columns are defined by the presence or absence of each of the first, second and third drug; based on a Cox Proportional Hazards model, calculating independent risk factors, cumulative hazard-ratios, and p-values for the combination of the first and the second drug, the combination of the first and the third drug, and the combination of the second and the third drug; and determining the nature of all possible binary combinations of the first, second and third drug as being one of additive, synergistic, and antagonistic with respect to treating the disease.
In some embodiments, whole-exome (WES) and transcriptome (RNA-Seq) sequencing data of patients' tumors are first obtained. Bioinformatics analysis can be performed on the sequencing data to provide certain genomic features for each cancer patient, such as gene expression, loss of heterozygosity (LOH), copy number alteration (CNA), somatic and germline mutations, microsatellite instability (MSI), tumor mutational burden (TMB), chromosomal variation, mutational signatures, human leukocyte antigen (HLA) typing, and human pathogens. Demographics, tumor types/characteristics (biomarkers, stage, pathology), treatment (prescriptions, surgery, radiotherapy, diagnostic imaging, side effects/adverse events), and long-term survival outcome clinical variables can be obtained from real-world clinical electronic health records (EHRs).
Any of the steps or aspects of the methods disclosed herein can be carried out on a computer using one or more computer processors.
Reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings.
In one aspect, the present disclosure provides a method of determining effects of drug combinations on treatment outcomes, which comprises: generating a plurality of genomic and clinical variables from the combination of (1) comprehensive genomic data, (2) EHR data, and (3) clinical treatment data, for each of a plurality of patients; wherein the plurality of patients comprise at least a first subset who have been treated with at least one first drug for a disease, and a second subset who have been treated with at least one second, different drug for the same disease, the first subset not entirely overlapping the second subset; setting up a plurality of two by two contingency tables in which rows are defined by the presence or absence of each of the plurality of genomic and clinical variables, and the columns are defined by the presence or absence of each of the first drug and the second drug; based on a Cox Proportional Hazards model, calculating independent risk factors, cumulative hazard-ratios, and p-values for the combination of the first drug and the second drug; and determining the nature of the combination of the first drug and the second drug as being one of additive, synergistic, and antagonistic with respect to treating the disease.
In another aspect, the present disclosure provides a method of determining effects of drug combinations on treatment outcomes, the method comprising: generating a plurality of genomic and clinical variables from the combination of (1) comprehensive genomic data, (2) EHR data, and (3) clinical treatment data, for each of a plurality of patients; wherein the plurality of patients comprise at least a first subset who have been treated with at least one first drug for a disease, a second subset who have been treated with at least one second, different drug for the same disease, and a third subset who have been treated with at least one third drug, which is different from the first drug and different from the second drug, for the same disease, each of the first, second, and third subsets not entirely overlapping with any of the other subsets; setting up a plurality of two by two contingency tables in which rows are defined by the presence or absence of each of the plurality of genomic and clinical variables, and the columns are defined by the presence or absence of each of the first, second and third drug; based on a Cox Proportional Hazards model, calculating independent risk factors, cumulative hazard-ratios, and p-values for the combination of the first and the second drug, the combination of the first and the third drug, and the combination of the second and the third drug; and determining the nature of all possible binary combinations of the first, second and third drug as being one of additive, synergistic, and antagonistic with respect to treating the disease. A particular combination of two drugs can be selected for treating patients based on the determined nature of these possible binary combinations of drugs.
In another aspect, the present disclosure provides a method of determining drug effect on treatment outcome for a disease, comprising: generating a plurality of genomic and clinical variables from the combination of (1) comprehensive genomic data, (2) EHR data, and (3) clinical treatment data, for each of a plurality of patients; wherein some, but not all, of the plurality of patients share a common biomarker, and wherein some, but not all, of the plurality of patients have been treated with a same drug for a disease; based on the plurality of genomic and clinical variables and a two by two contingency table representing the following combinations: (1) the number of patients having the biomarker and having been treated with the drug, (2) the number of patients having the biomarker but having not been treated by the drug, (3) the number of patients not having the biomarker and having been treated with the drug, and (4) the number of patients not having the biomarker and not having been treated by the drug, using a Cox Proportional Hazards model to calculate independent risk factors, cumulative hazard-ratios, and p-values for the combination of the drug and the biomarker; and determining the nature of the combination of the drug and the biomarker as being one of additive, synergistic, and antagonistic with respect to treating the disease.
The disclosed method utilizes certain data sources, which can be provided by healthcare institutions, hospitals, clinics, medical practice groups, and patients. As an example, for evaluating the efficacy of possible cancer drug combinations, data about cancer patients can be used. Tumor tissues can be collected from patients, pathology tests can be performed on the tissue, and the tissue can also be subjected to genomic sequencing, such as whole-exome (WES) and transcriptome (RNA-Seq) sequencing. Bioinformatics analysis can be performed on the sequencing data to provide certain genomic features for each cancer patient, such as gene expression, loss of heterozygosity (LOH), copy number alteration (CNA), somatic and germline mutations, microsatellite instability (MSI), tumor mutational burden (TMB), chromosomal variation, mutational signatures, human leukocyte antigen (HLA) typing, and human pathogens.
Meanwhile, patient data from real-world clinical electronic health records (EHR) for the patients can be used to obtain demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics like age and weight, etc. of the patients. Such patient data can be de-identified, processed, and stored into a database for use by clinical management software. Quality control and inspection may be performed on the patient data to reduce or eliminate errors.
Further, clinical treatment data can also be obtained about the patients. For example, a cancer patient may have undergone one or more therapies and may have been treated with one or more drugs for the cancer. The clinical treatment data can include prescriptions, surgery, radiotherapy, diagnostic imaging, side effects/adverse events, other treatment status and progress, as well as outcomes.
Based on the comprehensive genomic data, EHR data, and real-world treatment data, a database can be built to match these data and generate a plurality of genomic and clinical variables.
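The matching step above can be pictured with a minimal Python sketch. It assumes each data source is keyed by a de-identified patient ID and that only patients present in all three sources are retained; the patient IDs, variable names, and the helper `build_variable_table` are hypothetical and are not taken from the disclosure.

```python
# Hypothetical per-patient records from the three sources.
# All IDs and variable names below are illustrative assumptions.
genomic = {
    "P001": {"TP53_mutation": True, "MSI_high": False},
    "P002": {"TP53_mutation": False, "MSI_high": True},
    "P003": {"TP53_mutation": True, "MSI_high": True},
}
ehr = {
    "P001": {"age_over_60": True, "male": True},
    "P002": {"age_over_60": False, "male": False},
}
treatment = {
    "P001": {"Lenvatinib": True, "Sorafenib": False},
    "P002": {"Lenvatinib": False, "Sorafenib": True},
}


def build_variable_table(*sources):
    """Merge {patient_id: {variable: value}} maps.

    Only patients that appear in every source are kept, so each row
    carries genomic, EHR, and treatment variables together.
    """
    shared_ids = set(sources[0])
    for source in sources[1:]:
        shared_ids &= set(source)
    table = {}
    for pid in sorted(shared_ids):
        row = {}
        for source in sources:
            row.update(source[pid])
        table[pid] = row
    return table


variables = build_variable_table(genomic, ehr, treatment)
```

Here patient P003 is dropped because no EHR or treatment record matches the genomic record; a real pipeline might instead keep such patients with missing values, which is a design choice the text does not specify.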
Then, a novel exhaustive Cox proportional hazards (ECPH) model is used to evaluate all possible drug combinations with respect to their efficacy in prolonging patients' lives. In terms of efficacy, there can be three types of drug interaction: additive, synergistic, and antagonistic. Identifying drug combination interactions in clinical trial and/or real-world clinical data can inform the choice between sequential and simultaneous treatment and the design of new drug combinations.
Additive interaction means that the effect of two agents taken together is equal to the sum of the effects of the two agents taken separately. Synergistic interaction means that the effect of two agents taken together is greater than the sum of their separate effects at the same doses. Antagonistic interaction means that the effect of two agents taken together is less than the sum of the effects of the two agents taken independently of each other. In terms of the mathematical definition of interaction, the combination is synergistic if its effect is greater than the probability expected from the two agents contributing independently, additive if its effect is equal to that probability, and antagonistic if its effect is less than predicted.
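One common way to make the probabilistic definition concrete is the independence rule often called Bliss independence, under which the expected combined effect of two independently acting agents is P(A) + P(B) - P(A)·P(B). The sketch below assumes that reading, assumes effects are expressed as fractions between 0 and 1, and uses a hypothetical tolerance; none of these choices is stated in the disclosure.

```python
def expected_independent_effect(effect_a, effect_b):
    """Expected combined effect if the two agents act independently.

    Effects are fractions in [0, 1], e.g. probability of response.
    """
    return effect_a + effect_b - effect_a * effect_b


def classify_by_independence(observed, effect_a, effect_b, tolerance=0.05):
    """Compare an observed combined effect with the independent expectation.

    `tolerance` is an assumed margin for calling two effects equal.
    """
    expected = expected_independent_effect(effect_a, effect_b)
    if observed > expected + tolerance:
        return "synergistic"
    if observed < expected - tolerance:
        return "antagonistic"
    return "additive"
```

For single-agent effects of 0.3 and 0.4, the independent expectation is 0.58, so an observed combined effect of 0.80 would be labeled synergistic and 0.40 antagonistic under these assumptions.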
The Cox proportional-hazards (CPH) model is essentially a regression model used in medical research for investigating the association between the survival time of patients and one or more predictor variables. The CPH model extends survival analysis methods to assess simultaneously the effect of several risk factors on survival time.
The approach used in the present disclosure further extends the CPH model. In an example process, drugs used for fewer than one treatment cycle, and variables present in fewer than 15 patients, were removed. Then, a 2×2 contingency table is set up, in which rows are defined by every unique genomic or clinical variable, and the columns are defined by a drug variable (see Table 3 as an illustration). Then the CPH model (as described in “The Robust Inference for the Cox Proportional Hazards Model”, D. Y. Lin & L. J. Wei, Journal of the Am. Stat. Assoc., pp. 1074-1078, 1989) is used to calculate independent risk factors, cumulative hazard-ratios, and p-values for each drug combination table. These results are then used to predict and prioritize effective drug combinations with respect to additive, synergistic and antagonistic effects.
Additive, synergistic and antagonistic effects of factor/drug combinations in real-world clinical outcomes can be described as follows.
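The filtering and table-building steps can be sketched as follows. This is an illustration only: the cohort is synthetic, the variable names are hypothetical, and the treatment-cycle filter is omitted because the disclosure does not describe how cycles are recorded.

```python
MIN_PATIENTS = 15  # minimum patient count per variable, as stated in the text


def filter_variables(table, min_patients=MIN_PATIENTS):
    """Return the names of variables present in at least `min_patients` patients."""
    counts = {}
    for row in table.values():
        for name, present in row.items():
            if present:
                counts[name] = counts.get(name, 0) + 1
    return {name for name, n in counts.items() if n >= min_patients}


def contingency_2x2(table, variable, drug):
    """Count patients per cell, keyed by (variable present, drug present)."""
    cells = {(True, True): 0, (True, False): 0,
             (False, True): 0, (False, False): 0}
    for row in table.values():
        cells[(bool(row.get(variable)), bool(row.get(drug)))] += 1
    return cells


# Synthetic cohort of 40 patients (illustrative only).
cohort = {
    f"P{i:03d}": {
        "TP53_mutation": i % 2 == 0,
        "DrugA": i % 4 < 2,
        "rare_variant": i < 3,
    }
    for i in range(40)
}

kept = filter_variables(cohort)
table = contingency_2x2(cohort, "TP53_mutation", "DrugA")
```

In this synthetic cohort, `rare_variant` is carried by only three patients and is therefore dropped, while the remaining variable and drug yield four cells of ten patients each.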
Additive combination definition: The HR score of the drug A+ plus factor B+ group is between the other two treatment groups (A+B− and A−B+). The P-value of the drug A+ plus factor B+ group is not statistically significant compared with the other two treatment groups (A+B− and A−B+).
Synergistic combination definition: The HR score of the drug A+ plus factor B+ group is smaller than those of the other two treatment groups (A+B− and A−B+). The P-value of the drug A+ plus factor B+ group is statistically significant compared with the other two treatment groups (A+B− and A−B+). The drug A+ plus factor B+ group is a variable statistically independent of the other two treatment groups (A+B− and A−B+).
Antagonistic combination definition: The HR score of the drug A+ plus factor B+ group is greater than those of the other two treatment groups (A+B− and A−B+). The P-value of the drug A+ plus factor B+ group is statistically significant compared with the other two treatment groups (A+B− and A−B+).
In the above definitions of different types of combinations, factor B can be a second drug which has been used to treat the patient cohort, or a certain characteristic of the patient cohort, for example, a genomic biomarker.
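The three definitions above can be read as a simple decision rule. The sketch below is one possible reading: it assumes a significance threshold of 0.05 applied to the P-value of the A+B+ group (the text does not state a threshold), it does not check the statistical-independence condition of the synergistic definition, and the function name and the "unclassified" label are hypothetical.

```python
ALPHA = 0.05  # assumed significance threshold; not stated in the text


def classify_combination(hr_both, p_both, hr_a_only, hr_b_only, alpha=ALPHA):
    """Label the A+B+ group relative to the A+B- and A-B+ groups.

    hr_both, p_both : hazard ratio and p-value of the A+B+ group
    hr_a_only       : hazard ratio of the A+B- group
    hr_b_only       : hazard ratio of the A-B+ group
    """
    low, high = sorted((hr_a_only, hr_b_only))
    significant = p_both < alpha
    if significant and hr_both < low:
        return "synergistic"
    if significant and hr_both > high:
        return "antagonistic"
    if not significant and low <= hr_both <= high:
        return "additive"
    # Cases the three definitions do not cover, e.g. an HR outside the
    # range of the single-factor groups without statistical significance.
    return "unclassified"
```

With the values reported later in this document for PD-1/PD-L1 inhibitors and Lenvatinib, `classify_combination(0.278, 0.008, 0.503, 1.00)` returns "synergistic" under these assumptions.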
Any steps of the described methods can be performed on one or more computing devices (e.g., a workstation, a PC, a laptop, a mobile device, etc., or networked computers in a distributed environment, e.g., a cloud). As shown in
The described methods are validated by the following examples.
1. Synergistic Combination of Lenvatinib with PD-1/PD-L1 Immune Checkpoint Inhibitors for Hepatocellular Carcinoma (HCC) and Intrahepatic Cholangiocarcinoma (ICC).
In this example, data were collected and analyzed in the following steps:
(1) Clinical high-throughput sequencing and bioinformatics analysis according to the flow chart shown in
(2) Real-world clinical electronic health records (EHRs) collection, clinical data entry and long-term follow-up, according to the flowchart shown in
(3) Comprehensive genomic data is matched with real-world treatment patterns and clinical outcomes feature database and analysis workflow, as shown in
(4) A large one-hot encoding matrix (~10,000 × 10,000) of comprehensive genomic and clinical factors is constructed. Based on the one-hot encoding matrix, the combined effects of all factors, such as age, gender, gene mutation, and drug treatment, can be obtained.
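As a small-scale illustration of one-hot encoding, the sketch below turns categorical patient records into 0/1 rows, one column per field-value pair. The field names, values, and the helper `one_hot_encode` are hypothetical; the actual matrix layout used in the example is not specified here.

```python
def one_hot_encode(records):
    """Encode {patient_id: {field: value}} as 0/1 rows.

    Returns (columns, matrix), where columns is a sorted list of
    "field=value" labels and matrix maps patient_id to a list of 0/1.
    """
    columns = sorted(
        {f"{field}={value}"
         for rec in records.values()
         for field, value in rec.items()}
    )
    index = {name: i for i, name in enumerate(columns)}
    matrix = {}
    for pid, rec in records.items():
        row = [0] * len(columns)
        for field, value in rec.items():
            row[index[f"{field}={value}"]] = 1
        matrix[pid] = row
    return columns, matrix


# Illustrative records only.
records = {
    "P001": {"gender": "male", "age_group": "60+",
             "TP53": "mutant", "Lenvatinib": "yes"},
    "P002": {"gender": "female", "age_group": "<60",
             "TP53": "wild-type", "Lenvatinib": "no"},
}
columns, matrix = one_hot_encode(records)
```

Each patient row then has exactly one 1 per original field, which makes it straightforward to pick any two columns as the row and column factors of a 2×2 table.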
A sample snippet of the one-hot encoding matrix is shown in the table below.
(5) Applying the exhaustive Cox proportional hazards (ECPH) model
a. Patients are divided into four categories based on the combined use of the two drugs. For example: Patients treated with Lenvatinib and without Sorafenib were defined as the Lenvatinib treatment group. Patients treated with Sorafenib and without Lenvatinib were defined as the Sorafenib treatment group. Patients treated with both Sorafenib and Lenvatinib were defined as the Sorafenib-Lenvatinib treatment group. Patients treated with neither Sorafenib nor Lenvatinib were defined as the Sorafenib-Lenvatinib free treatment group.
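The four-way split described in step a can be expressed as a small function of two treatment flags. The function name and default arguments below are illustrative assumptions; the group labels follow the wording of the text.

```python
def treatment_group(received_a, received_b,
                    name_a="Sorafenib", name_b="Lenvatinib"):
    """Assign a patient to one of four groups from two treatment flags."""
    if received_a and received_b:
        return f"{name_a}-{name_b}"       # both drugs
    if received_a:
        return name_a                      # first drug only
    if received_b:
        return name_b                      # second drug only
    return f"{name_a}-{name_b} free"       # neither drug
```

The same function applies unchanged when the second factor is a biomarker rather than a drug, by passing the biomarker name as `name_b`.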
b. 2×2 contingency tables were calculated for all possible drug combination pairs, which yields millions of possible combinations, including those involving Lenvatinib.
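To give a sense of scale, the sketch below enumerates unordered pairs with the Python standard library and counts how many pairs arise from roughly 10,000 factors, the matrix dimension mentioned in step (4). The helper name and the three-drug list are illustrative only.

```python
from itertools import combinations
from math import comb


def factor_pairs(factors):
    """Yield every unordered pair of distinct factors, in sorted order."""
    return combinations(sorted(factors), 2)


pairs = list(factor_pairs(["Sorafenib", "Lenvatinib", "Regorafenib"]))

# Number of unordered pairs among about 10,000 factors.
n_pairs = comb(10_000, 2)
```

For 10,000 factors this gives 49,995,000 pairs, which is consistent with the "millions of possible combinations" noted above.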
c. The Cox PH model (“The Robust Inference for the Cox Proportional Hazards Model”, D. Y. Lin & L. J. Wei, Journal of the Am. Stat. Assoc., pp. 1074-1078, 1989) was used to calculate independent risk factors, cumulative hazard-ratios, and p-values for each drug combination table. Briefly, the hazard function can be interpreted as the risk of dying at time t. It can be estimated as follows: h(t) = h0(t) × exp(b1x1 + b2x2 + ... + bpxp), where t is the survival time, h(t) is the hazard function determined by a set of p covariates (x1, x2, ..., xp), the coefficients (b1, b2, ..., bp) measure the effect size of each covariate, and h0(t) is the baseline hazard, i.e., the value of the hazard when all xi are equal to zero.
The Cox PH model can be written as a multiple linear regression of the logarithm of the hazard on the variables xi, with the baseline hazard being an ‘intercept’ term that varies with time. The average hazard rate over an interval was used, in which the number of patients dying per unit time in the interval is divided by the average number of survivors at the midpoint of the interval:
h(t)=number of patients dying per unit time in the interval/((number of patients surviving at t)−(number of deaths in the interval)/2)
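The interval formula above translates directly into code. The sketch below is a literal transcription under the assumption that "number of patients surviving at t" means the number alive at the start of the interval; the function and parameter names are hypothetical.

```python
def interval_hazard(deaths, alive_at_start, interval_length=1.0):
    """Average hazard rate over one interval.

    deaths          : number of deaths in the interval
    alive_at_start  : number of patients surviving at the start time t
    interval_length : length of the interval in time units
    """
    deaths_per_unit_time = deaths / interval_length
    # Approximate number of survivors at the midpoint of the interval.
    midpoint_survivors = alive_at_start - deaths / 2
    if midpoint_survivors <= 0:
        raise ValueError("no survivors at the interval midpoint")
    return deaths_per_unit_time / midpoint_survivors
```

For example, 10 deaths in a one-unit interval among 100 patients alive at its start gives 10 / 95, or about 0.105 per unit time.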
The hazard ratio between a patient receiving the experimental drug (x1=1) and a patient receiving placebo (x1=0), with all other covariates held equal, is:
h(t|x1=1)/h(t|x1=0) = exp(b1)
The hazard ratios (HR) are defined as the quantities exp(bi). Thus, the two treatments are equally effective if HR=1, and the experimental drug carries a lower (higher) risk of death than placebo if HR<1 (HR>1). The function coxph (R survival package) can be used to compute the Cox proportional hazards regression model in R (https://cran.r-project.org/web/packages/survival/survival.pdf). Using Contingency Table A below as an example, HR scores were obtained for three treatment groups: Sorafenib+ and Lenvatinib+ group: HR: 1.35, P-value: 0.334; Sorafenib+ and Lenvatinib− group: HR: 0.76, P-value: 0.388; Sorafenib− and Lenvatinib+ group: HR: 0.56, P-value: 0.058. These three results were used to predict and prioritize effective drug combinations with respect to additive, synergistic and antagonistic effects (according to the above additive, synergistic and antagonistic combination definitions), as well as to explore the dynamics of combination therapy and its role in combating drug resistance in cancer treatments.
The following three tables are shown as calculation examples:
Sorafenib+ and Lenvatinib+ group: HR: 1.35, P-value: 0.334; Sorafenib+ and Lenvatinib− group: HR: 0.76, P-value: 0.388; Sorafenib− and Lenvatinib+ group: HR: 0.56, P-value: 0.058
Regorafenib+ and Lenvatinib+ group: HR: 1.88, P-value: 0.475; Regorafenib+ and Lenvatinib− group: HR: 0.99, P-value: 0.984; Regorafenib− and Lenvatinib+ group: HR: 0.74, P-value: 0.226
PD-1/PD-L1+ and Lenvatinib+ group: HR: 0.278, P-value: 0.008; PD-1/PD-L1+ and Lenvatinib− group: HR: 0.503, P-value: 0.117; PD-1/PD-L1− and Lenvatinib+ group: HR: 1.00, P-value: 0.977
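To show what a Cox fit computes, the following is a minimal pure-Python sketch for a single covariate. It maximizes the partial likelihood by Newton-Raphson with Breslow handling of ties and reports a Wald p-value. It is an illustration only: it is not the robust Lin and Wei estimator cited in this document, it is not a substitute for coxph, and the function name and toy data are hypothetical.

```python
import math


def fit_cox_single(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t) = h0(t) * exp(b * x) for one covariate.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    x      : covariate per patient (e.g. 1 = treated, 0 = not treated)
    Returns (b, hazard_ratio, wald_p_value).
    """
    b, info = 0.0, 0.0
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients only appear in risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    w = math.exp(b * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += x[i] - mean            # score contribution
            info += s2 / s0 - mean * mean  # observed information
        if info <= 0.0:
            break  # covariate carries no information
        step = grad / info
        b += step
        if abs(step) < tol:
            break
    if info > 0.0:
        z = b * math.sqrt(info)  # Wald statistic b / se, with se = 1/sqrt(info)
        p_value = math.erfc(abs(z) / math.sqrt(2))
    else:
        p_value = float("nan")
    return b, math.exp(b), p_value


# Toy data: treated patients (x = 1) tend to die later.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8]
toy_events = [1, 1, 1, 1, 1, 1, 1, 1]
toy_treated = [0, 0, 1, 0, 1, 0, 1, 1]
coef, hr, p = fit_cox_single(toy_times, toy_events, toy_treated)
```

On the toy data the treated patients survive longer, so the fitted hazard ratio is below 1. With real cohorts, a maintained survival-analysis library would normally be preferred over a hand-written fit.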
Synergistic Combination Definition Used in this Example:
The HR score of the PD-1/PD-L1+ and Lenvatinib+ group is smaller than those of the other two treatment groups (HR: 0.278 < PD-1/PD-L1+ and Lenvatinib− group: HR: 0.503 and PD-1/PD-L1− and Lenvatinib+ group: HR: 1.00). The P-value of the PD-1/PD-L1+ and Lenvatinib+ group is statistically significant compared with the other two treatment groups (P-value: 0.008 < PD-1/PD-L1+ and Lenvatinib− group: P-value: 0.117 and PD-1/PD-L1− and Lenvatinib+ group: P-value: 0.977). The PD-1/PD-L1 and Lenvatinib group is a statistically independent variable (Chi-square test of independence: P-value: 0.223). These results show that treatment with Lenvatinib plus anti-PD-1/PD-L1 therapy induced significant antitumor activity compared with Lenvatinib or anti-PD-1 treatment alone. The ECPH model provides a real-world scientific rationale for combination therapy of Lenvatinib with anti-PD-1/PD-L1 blockade to improve cancer immunotherapy.
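The independence check mentioned above can be illustrated with a Pearson chi-square test on a 2×2 table. The sketch below uses no continuity correction and hypothetical cell counts; the patient counts behind the reported P-value of 0.223 are not given in this document, so that value is not reproduced here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with one degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = factor present/absent, columns = drug present/absent.
chi2_stat, p_val = chi_square_2x2(20, 5, 5, 20)
```

A large P-value from such a test indicates no detectable association between receiving one factor and receiving the other, which is the sense of "statistically independent" used in the synergistic definition.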
Such validation can also find support in the following references:
2. Antagonistic Combination (Drug Resistance) of a Genetic HLA Biomarker with PD-1/PD-L1 Immune Checkpoint Inhibitors.
In this example, a specific HLA-B biomarker is considered a second factor and its combination with the treatment of a PD-1/PD-L1 drug is evaluated in a similar manner as outlined above. For example, a 2 by 2 contingency table can be set up as follows:
Such validation can also find support in the following reference: Patient HLA class I genotype influences cancer response to checkpoint blockade immunotherapy. (Science. 2018 Feb. 2; 359(6375):582-587. doi: 10.1126/science.aao4572. Epub 2017 Dec. 7.) In this paper, it is observed that, in two independent melanoma cohorts, patients with the HLA-B44 supertype had extended survival, whereas the HLA-B62 supertype (including HLA-B*15:01) or somatic loss of heterozygosity at HLA-I was associated with poor outcome.
It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2020/104900 | Jul 2020 | US |
| Child | 18159923 | | US |