Physicians engaged in precision oncology may need to integrate an overwhelming amount of information from publications and from their own experience. For example, as of the end of 2019, PubMed reports 19,748 publications matching the term “breast cancer” in the past year alone, and a corresponding search of ClinicalTrials.gov for open, recruiting studies returns 1,937 studies. Practitioners, such as oncologists, may therefore face challenges in reading all these materials, determining which may be most relevant, and synthesizing the whole of the data into relevant predictions for patient outcomes.
Oncologists treating less common cancers may face a potentially worse situation; instead of being overwhelmed, they may have only a few relevant publications, and may have seen only a small number of similar cases. Here, successful prediction of patient outcomes may depend on prior information gleaned from experts in similar, but not identical, disease states.
Importantly, prediction may not be an exact science. Every patient may respond differently, due to a multitude of unknowns; it may be difficult or even impossible to fully model patients and their disease states, or the complete set of interactions between patients and their treatment regimens.
For some cancers, such as chronic myelogenous leukemia, the level of uncertainty may be relatively low; patients may almost universally receive tyrosine kinase inhibitors, and the response characteristics may be relatively well-known. But for most cancers, and for many late-stage cancers, the unknown variables may far outnumber the known characteristics. In these cases, the sum of effects from the unknown variables may exceed the effects from known treatments. This may require probabilistic reasoning in order to devise an effective, rational treatment strategy.
Thus, there remains a need for automated intelligent systems and methods that acquire and structure knowledge from a diverse array of sources, such as clinical trials, case series, individual patient cases and outcomes data, and expert opinions, such that the information may be used to predict, for a given patient, the probable range of outcomes, over time, for a given treatment. Furthermore, such predictions may be explainable to a physician or scientist who queries the system for such a prediction; in contrast, a “black box” that provides answers without rationales may not instill confidence.
In light of the needs above, the present disclosure provides systems and methods for precision oncology using multilevel Bayesian models, which may effectively address challenges faced by physicians when treating patients with complex disease etiologies, such as cancer. Systems and methods of the present disclosure may be used to predict various measures of patient outcomes for particular patients under different treatment regimens. The systems and methods may be capable of learning from a diverse range of information sources, including individual patient outcomes observed outside of randomized trials (in other words, “real world evidence” or RWE) as well as other sources, such as expert surveys and summary statistics from clinical trials. The learning process may occur via a training module, which presents this data in a learning loop to a multilevel model module, which may be a combination of a Bayesian model and database.
Once the multilevel model module has been conditioned on such source data, it may be used in conjunction with a prediction module to predict outcomes for new patients under different treatment choices and provide a measure of the uncertainty of these predictions. These predictions may be probabilistic in nature, in that they represent a distribution of possible outcomes (e.g., in contrast to a single outcome).
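As an illustrative, non-limiting sketch of such a probabilistic prediction, the following toy code returns a distribution of outcome draws rather than a single outcome; the function name, patient features, and treatment effect sizes are hypothetical and do not reflect any actual model of the disclosure:

```python
import random

# Hypothetical sketch: a prediction module that returns a distribution of
# outcomes (e.g., months of progression-free survival) rather than a point
# estimate. The feature names and effect sizes below are invented.
def predict_outcome_distribution(patient_features, treatment, n_draws=1000, seed=0):
    rng = random.Random(seed)
    base = sum(patient_features.values())  # toy baseline effect
    shift = {"drug_a": 4.0, "drug_b": 1.5}.get(treatment, 0.0)
    # Each draw is one plausible outcome; together the draws form the
    # predicted distribution, truncated at zero (no negative times).
    return [max(0.0, rng.gauss(base + shift, 2.0)) for _ in range(n_draws)]

draws = predict_outcome_distribution({"stage": 2.0, "ecog": 1.0}, "drug_a")
```

Summaries such as a median or an uncertainty interval may then be computed from `draws`, in contrast to reporting a single-point prediction.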
A key advance may be that the multilevel model’s structure bears an understandable relationship to the domain, and to the types of inputs and outputs oncologists may expect. This structure may help users of systems and methods of the present disclosure to understand how the predictions and uncertainty therein may be derived, rather than treating the results as “black box” predictions. This level of explainability may be critical, for example, for certification of medical devices that rely on Artificial Intelligence and Machine Learning.
In an aspect, the present disclosure provides a system comprising a computer processor and a storage device having instructions stored thereon that are operable, when executed by the computer processor, to cause the computer processor to: (i) receive clinical data of a subject and a set of treatment options for a disease or disorder of the subject, wherein the set of treatment options corresponds to clinical outcomes having future uncertainty; (ii) access a prediction module comprising a trained machine learning model that determines probabilistic predictions of clinical outcomes of the set of treatment options based at least in part on clinical data of subjects; and (iii) apply the prediction module to at least the clinical data of the subject to determine probabilistic predictions of clinical outcomes of the set of treatment options for the disease or disorder of the subject.
In some embodiments, the clinical data is selected from somatic genetic mutations, germline genetic mutations, mutational burden, protein levels, transcriptome levels, metabolite levels, tumor size or staging, clinical symptoms, laboratory test results, and clinical history.
In some embodiments, the disease or disorder comprises cancer. In some embodiments, the subject has received a previous treatment for the cancer. In some embodiments, the subject has not received a previous treatment for the cancer.
In some embodiments, the cancer is selected from the group consisting of: Adrenal Gland Tumor, Ampulla of Vater Tumor, Biliary Tract Tumor, Bladder/Urinary Tract Tumor, Bone Tumor, Bowel Tumor, Breast Tumor, CNS/Brain Tumor, Cervix Tumor, Esophagus/Stomach Tumor, Eye Tumor, Head and Neck Tumor, Kidney Tumor, Liver Tumor, Lung Tumor, Lymphoid Tumor, Myeloid Tumor, Other Tumor, Ovary/Fallopian Tube Tumor, Pancreas Tumor, Penis Tumor, Peripheral Nervous System Tumor, Peritoneum Tumor, Pleura Tumor, Prostate Tumor, Skin Tumor, Soft Tissue Tumor, Testis Tumor, Thymus Tumor, Thyroid Tumor, Uterus Tumor, and Vulva/Vagina Tumor. In some embodiments, the cancer is selected from the group consisting of: Adrenal Gland Tumor, Ampulla of Vater Tumor, Biliary Tract Tumor, Bladder/Urinary Tract Tumor, Bone Tumor, Bowel Tumor, Breast Tumor, CNS/Brain Tumor, Cervix Tumor, Esophagus/Stomach Tumor, Eye Tumor, Head and Neck Tumor, Kidney Tumor, Liver Tumor, Lung Tumor, Lymphoid Tumor, Myeloid Tumor, Other Tumor, Ovary/Fallopian Tube Tumor, Pancreas Tumor, Penis Tumor, Peripheral Nervous System Tumor, Peritoneum Tumor, Pleura Tumor, Prostate Tumor, Skin Tumor, Soft Tissue Tumor, Testis Tumor, Thymus Tumor, Thyroid Tumor, Uterus Tumor, Vulva/Vagina Tumor, Adrenocortical Adenoma, Adrenocortical Carcinoma, Pheochromocytoma, Ampullary Carcinoma, Cholangiocarcinoma, Gallbladder Cancer, Intracholecystic Papillary Neoplasm, Intraductal Papillary Neoplasm of the Bile Duct, Bladder Adenocarcinoma, Bladder Squamous Cell Carcinoma, Bladder Urothelial Carcinoma, Inflammatory Myofibroblastic Bladder Tumor, Inverted Urothelial Papilloma, Mucosal Melanoma of the Urethra, Plasmacytoid/Signet Ring Cell Bladder Carcinoma, Sarcomatoid Carcinoma of the Urinary Bladder, Small Cell Bladder Cancer, Upper Tract Urothelial Carcinoma, Urachal Carcinoma, Urethral Cancer, Urothelial Papilloma, Adamantinoma, Chondroblastoma, Chondrosarcoma, Chordoma, Ewing Sarcoma, Giant Cell Tumor of Bone, Osteosarcoma, Anal 
Gland Adenocarcinoma, Anal Squamous Cell Carcinoma, Anorectal Mucosal Melanoma, Appendiceal Adenocarcinoma, Colorectal Adenocarcinoma, Gastrointestinal Neuroendocrine Tumors, Low-grade Appendiceal Mucinous Neoplasm, Medullary Carcinoma of the Colon, Small Bowel Cancer, Small Intestinal Carcinoma, Tubular Adenoma of the Colon, Adenomyoepithelioma of the Breast, Breast Ductal Carcinoma In Situ, Breast Fibroepithelial Neoplasms, Breast Lobular Carcinoma In Situ, Breast Neoplasm, NOS, Breast Sarcoma, Inflammatory Breast Cancer, Invasive Breast Carcinoma, Juvenile Secretory Carcinoma of the Breast, Metaplastic Breast Cancer, Choroid Plexus Tumor, Diffuse Glioma, Embryonal Tumor, Encapsulated Glioma, Ependymomal Tumor, Germ Cell Tumor, Brain, Meningothelial Tumor, Miscellaneous Brain Tumor, Miscellaneous Neuroepithelial Tumor, Pineal Tumor, Primary CNS Melanocytic Tumors, Sellar Tumor, Cervical Adenocarcinoma, Cervical Adenocarcinoma In Situ, Cervical Adenoid Basal Carcinoma, Cervical Adenoid Cystic Carcinoma, Cervical Adenosquamous Carcinoma, Cervical Leiomyosarcoma, Cervical Neuroendocrine Tumor, Cervical Rhabdomyosarcoma, Cervical Squamous Cell Carcinoma, Glassy Cell Carcinoma of the Cervix, Mixed Cervical Carcinoma, Small Cell Carcinoma of the Cervix, Villoglandular Adenocarcinoma of the Cervix, Esophageal Poorly Differentiated Carcinoma, Esophageal Squamous Cell Carcinoma, Esophagogastric Adenocarcinoma, Gastrointestinal Neuroendocrine Tumors of the Esophagus/Stomach, Mucosal Melanoma of the Esophagus, Smooth Muscle Neoplasm, NOS, Lacrimal Gland Tumor, Ocular Melanoma, Retinoblastoma, Head and Neck Carcinoma, Other, Head and Neck Mucosal Melanoma, Head and Neck Squamous Cell Carcinoma, Nasopharyngeal Carcinoma, Parathyroid Cancer, Salivary Carcinoma, Sialoblastoma, Clear Cell Sarcoma of Kidney, Renal Cell Carcinoma, Renal Neuroendocrine Tumor, Rhabdoid Cancer, Wilms’ Tumor, Fibrolamellar Carcinoma, Hepatoblastoma, Hepatocellular Adenoma, Hepatocellular Carcinoma, 
Hepatocellular Carcinoma plus Intrahepatic Cholangiocarcinoma, Liver Angiosarcoma, Malignant Nonepithelial Tumor of the Liver, Malignant Rhabdoid Tumor of the Liver, Undifferentiated Embryonal Sarcoma of the Liver, Combined Small Cell Lung Carcinoma, Inflammatory Myofibroblastic Lung Tumor, Lung Adenocarcinoma In Situ, Lung Neuroendocrine Tumor, Non-Small Cell Lung Cancer, Pleuropulmonary Blastoma, Pulmonary Lymphangiomyomatosis, Sarcomatoid Carcinoma of the Lung, Lymphoid Atypical, Lymphoid Benign, Lymphoid Neoplasm, Myeloid Atypical, Myeloid Benign, Myeloid Neoplasm, Adenocarcinoma In Situ, Cancer of Unknown Primary, Extra Gonadal Germ Cell Tumor, Mixed Cancer Types, Ovarian Cancer, Other, Ovarian Epithelial Tumor, Ovarian Germ Cell Tumor, Sex Cord Stromal Tumor, Acinar Cell Carcinoma of the Pancreas, Adenosquamous Carcinoma of the Pancreas, Cystic Tumor of the Pancreas, Pancreatic Adenocarcinoma, Pancreatic Neuroendocrine Tumor, Pancreatoblastoma, Solid Pseudopapillary Neoplasm of the Pancreas, Undifferentiated Carcinoma of the Pancreas, Penile Squamous Cell Carcinoma, Ganglioneuroblastoma, Ganglioneuroma, Nerve Sheath Tumor, Neuroblastoma, Peritoneal Mesothelioma, Peritoneal Serous Carcinoma, Pleural Mesothelioma, Basal Cell Carcinoma of Prostate, Prostate Adenocarcinoma, Prostate Neuroendocrine Carcinoma, Prostate Small Cell Carcinoma, Prostate Squamous Cell Carcinoma, Aggressive Digital Papillary Adenocarcinoma, Atypical Fibroxanthoma, Atypical Nevus, Basal Cell Carcinoma, Cutaneous Squamous Cell Carcinoma, Dermatofibroma, Dermatofibrosarcoma Protuberans, Desmoplastic Trichoepithelioma, Endocrine Mucin Producing Sweat Gland Carcinoma, Extramammary Paget Disease, Melanoma, Merkel Cell Carcinoma, Microcystic Adnexal Carcinoma, Porocarcinoma/Spiroadenocarcinoma, Poroma/Acrospiroma, Proliferating Pilar Cystic Tumor, Sebaceous Carcinoma, Skin Adnexal Carcinoma, Spiroma/Spiradenoma, Sweat Gland Adenocarcinoma, Sweat Gland Carcinoma/Apocrine Eccrine Carcinoma, 
Aggressive Angiomyxoma, Alveolar Soft Part Sarcoma, Angiomatoid Fibrous Histiocytoma, Angiosarcoma, Atypical Lipomatous Tumor, Clear Cell Sarcoma, Dendritic Cell Sarcoma, Desmoid/Aggressive Fibromatosis, Desmoplastic Small-Round-Cell Tumor, Epithelioid Hemangioendothelioma, Epithelioid Sarcoma, Ewing Sarcoma of Soft Tissue, Fibrosarcoma, Gastrointestinal Stromal Tumor, Glomangiosarcoma, Hemangioma, Infantile Fibrosarcoma, Inflammatory Myofibroblastic Tumor, Intimal Sarcoma, Leiomyoma, Leiomyosarcoma, Liposarcoma, Low-Grade Fibromyxoid Sarcoma, Malignant Glomus Tumor, Myofibroma, Myofibromatosis, Myopericytoma, Myxofibrosarcoma, Myxoma, Paraganglioma, Perivascular Epithelioid Cell Tumor, Pseudomyogenic Hemangioendothelioma, Radiation-Associated Sarcoma, Rhabdomyosarcoma, Round Cell Sarcoma, NOS, Sarcoma, NOS, Soft Tissue Myoepithelial Carcinoma, Solitary Fibrous Tumor/Hemangiopericytoma, Synovial Sarcoma, Tenosynovial Giant Cell Tumor Diffuse Type, Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma/High-Grade Spindle Cell Sarcoma, Non-Seminomatous Germ Cell Tumor, Seminoma, Sex Cord Stromal Tumor, Testicular Lymphoma, Testicular Mesothelioma, Thymic Epithelial Tumor, Thymic Neuroendocrine Tumor, Anaplastic Thyroid Cancer, Hurthle Cell Thyroid Cancer, Hyalinizing Trabecular Adenoma of the Thyroid, Medullary Thyroid Cancer, Oncocytic Adenoma of the Thyroid, Poorly Differentiated Thyroid Cancer, Well-Differentiated Thyroid Cancer, Endometrial Carcinoma, Gestational Trophoblastic Disease, Other Uterine Tumor, Uterine Sarcoma/Mesenchymal, Germ Cell Tumor of the Vulva, Mucinous Adenocarcinoma of the Vulva/Vagina, Mucosal Melanoma of the Vulva/Vagina, Poorly Differentiated Vaginal Carcinoma, Squamous Cell Carcinoma of the Vulva/Vagina, and Vaginal Adenocarcinoma.
In some embodiments, (iii) comprises applying the prediction module to at least treatment features of the set of treatment options to determine the probabilistic predictions of the clinical outcomes of the set of treatment options. In some embodiments, the treatment features comprise attributes of a surgical intervention, a drug intervention, a targeted intervention, a hormonal therapy intervention, a radiotherapy intervention, or an immunotherapy intervention. In some embodiments, the treatment features comprise the attributes of the drug intervention, wherein the attributes of the drug intervention comprise a chemical structure or a biological target of the drug intervention.
In some embodiments, (iii) comprises applying the prediction module to at least interaction terms between the clinical data of the subject and the treatment features of the set of treatment options to determine the probabilistic predictions of the clinical outcomes of the set of treatment options.
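A minimal sketch of constructing such interaction terms, with hypothetical feature names and values:

```python
# Hypothetical sketch: cross each clinical feature with each treatment
# feature to form interaction terms that the prediction module may use.
def interaction_terms(clinical, treatment):
    return {f"{c}*{t}": cv * tv
            for c, cv in clinical.items()
            for t, tv in treatment.items()}

terms = interaction_terms({"tmb": 1.2, "age": 0.6}, {"dose": 2.0})
```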
In some embodiments, the clinical outcomes having future uncertainty comprise a change in tumor size, a change in patient functional status, a time-to-disease progression, a time-to-treatment failure, overall survival, or progression-free survival. In some embodiments, the clinical outcomes having future uncertainty comprise the change in tumor size, as indicated by cross section or volume. In some embodiments, the clinical outcomes having future uncertainty comprise the change in patient functional status, as indicated by ECOG, Karnofsky, or Lansky scores.
In some embodiments, the probabilistic predictions of clinical outcomes of the set of treatment options comprise statistical distributions of the clinical outcomes of the set of treatment options. In some embodiments, (iii) further comprises determining a statistical parameter of the statistical distributions of the clinical outcomes of the set of treatment options. In some embodiments, the statistical parameter is selected from the group consisting of a median, a mean, a mode, a variance, a standard deviation, a quantile, a measure of central tendency, a measure of variance, a range, a minimum, a maximum, an interquartile range, a frequency, a percentile, a shape parameter, a scale parameter, and a rate parameter. In some embodiments, the statistical distributions of the clinical outcomes of the set of treatment options comprise a parametric distribution selected from the group consisting of a Weibull distribution, a log logistic distribution, a log normal distribution, a Gaussian distribution, a Gamma distribution, and a Poisson distribution.
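As a non-limiting sketch, statistical parameters such as the median and mean of a Weibull time-to-event distribution can be computed in closed form; the function below is illustrative only:

```python
import math

# Median and mean of a Weibull(k, λ) distribution, as might summarize a
# predicted time-to-event outcome. shape_k and scale_lam are the usual
# Weibull shape and scale parameters.
def weibull_summary(shape_k, scale_lam):
    median = scale_lam * math.log(2.0) ** (1.0 / shape_k)
    mean = scale_lam * math.gamma(1.0 + 1.0 / shape_k)
    return {"median": median, "mean": mean}

# With shape 1, the Weibull reduces to an exponential distribution.
summary = weibull_summary(1.0, 10.0)
```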
In some embodiments, the probabilistic predictions of clinical outcomes of the set of treatment options are explainable based on performing a query of the probabilistic predictions.
In some embodiments, the instructions are operable, when executed by the computer processor, to cause the computer processor to further apply a training module that trains the trained machine learning model. In some embodiments, the trained machine learning model is trained using a plurality of disparate data sources. In some embodiments, the training module aggregates datasets from the plurality of disparate sources, wherein the datasets are persisted in a plurality of data stores, and trains the trained machine learning model using the aggregated datasets. In some embodiments, the plurality of disparate sources is selected from the group consisting of clinical trials, case series, individual patient cases and outcomes data, and expert opinions.
In some embodiments, the training module updates the trained machine learning model using the probabilistic predictions of the clinical outcomes of the set of treatment options generated in (iii). In some embodiments, updating is performed using a Bayesian update or a maximum likelihood algorithm.
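A conjugate Beta-Binomial update is one minimal, illustrative form such a Bayesian update could take; the prior values and observation counts below are invented:

```python
# Hypothetical sketch: update a Beta(alpha, beta) belief about a treatment
# response rate with newly observed responders and non-responders.
def bayesian_update(alpha, beta, responders, nonresponders):
    return alpha + responders, beta + nonresponders

a, b = bayesian_update(2.0, 2.0, responders=8, nonresponders=2)
posterior_mean = a / (a + b)  # updated expected response rate
```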
In some embodiments, the trained machine learning model is selected from the group consisting of a Bayesian model, a support vector machine (SVM), a linear regression, a logistic regression, a random forest, and a neural network. In some embodiments, the trained machine learning model comprises a multilevel statistical model that accounts for variation at a plurality of distinct levels of analysis. In some embodiments, the multilevel statistical model accounts for correlation of subject-level effects across the plurality of distinct levels of analysis.
In some embodiments, the multilevel statistical model comprises a generalized linear model. In some embodiments, the generalized linear model comprises use of the expression:
η = Xβ + Zu, wherein η is a linear response, X is a vector of predictors for treatment effects fixed across subjects, β is a vector of fixed effects, Z is a vector of predictors for subject-level treatment effects, and u is a vector of subject-level effects. In some embodiments, the generalized linear model comprises use of the expression: y = g⁻¹(η), wherein η is a linear response, g is an appropriately chosen link function from observed data to the linear response, and y is an outcome variable of interest.
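The generalized linear model described above can be sketched as follows, assuming a logit link function for a binary outcome; the numeric predictor and effect values are illustrative only:

```python
import math

# η = X·β + Z·u : fixed (population-level) effects plus subject-level effects.
def linear_response(X, beta, Z, u):
    return sum(x * b for x, b in zip(X, beta)) + sum(z * v for z, v in zip(Z, u))

# y = g⁻¹(η) for a logit link g, mapping the linear response to a probability.
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

eta = linear_response([1.0, 0.5], [0.8, -0.4], [1.0], [0.3])
y = inv_logit(eta)
```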
In some embodiments, (iii) comprises applying a plurality of iterations of the prediction module to determine the probabilistic predictions of the clinical outcomes of the set of treatment options.
In some embodiments, the instructions are operable, when executed by the computer processor, to cause the computer processor to further use a parsing module to identify relevant features of the clinical data of the subject, the set of treatment options, and/or interaction terms between the clinical data of the subject and the treatment features of the set of treatment options. In some embodiments, the parsing module identifies relevant features by matching against a feature library.
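A minimal sketch of such matching against a feature library; the library contents and record fields are hypothetical:

```python
# Hypothetical feature library; in practice this would enumerate the
# clinical and treatment features recognized by the model.
FEATURE_LIBRARY = {"egfr_mutation", "tmb_high", "er_positive"}

# Parsing module sketch: keep only the fields of a raw record that match
# an entry in the feature library.
def parse_relevant_features(raw_record):
    return {k: v for k, v in raw_record.items() if k in FEATURE_LIBRARY}

features = parse_relevant_features({"egfr_mutation": 1, "shoe_size": 42})
```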
In some embodiments, the instructions are operable, when executed by the computer processor, to cause the computer processor to further generate an electronic report comprising the probabilistic predictions of clinical outcomes of the set of treatment options. In some embodiments, the electronic report is used to select a treatment option from among the set of treatment options based at least in part on the probabilistic predictions of clinical outcomes of the set of treatment options. In some embodiments, the selected treatment option is administered to the subject. In some embodiments, the prediction module is further applied to outcome data of the subject that is obtained subsequent to administering the selected treatment option to the subject, to determine updated probabilistic predictions of the clinical outcomes of the set of treatment options.
In another aspect, the present disclosure provides a computer-implemented method comprising: (i) receiving clinical data of a subject and a set of treatment options for a disease or disorder of the subject, wherein the set of treatment options corresponds to clinical outcomes having future uncertainty; (ii) accessing a prediction module comprising a trained machine learning model that determines probabilistic predictions of clinical outcomes of the set of treatment options based at least in part on clinical data of test subjects; and (iii) applying the prediction module to at least the clinical data of the subject to determine probabilistic predictions of clinical outcomes of the set of treatment options for the disease or disorder of the subject.
In some embodiments, the clinical data is selected from somatic genetic mutations, germline genetic mutations, mutational burden, protein levels, transcriptome levels, metabolite levels, tumor size or staging, clinical symptoms, laboratory test results, and clinical history.
In some embodiments, the disease or disorder comprises cancer. In some embodiments, the subject has received a previous treatment for the cancer. In some embodiments, the subject has not received a previous treatment for the cancer.
In some embodiments, the cancer is selected from the group consisting of: Adrenal Gland Tumor, Ampulla of Vater Tumor, Biliary Tract Tumor, Bladder/Urinary Tract Tumor, Bone Tumor, Bowel Tumor, Breast Tumor, CNS/Brain Tumor, Cervix Tumor, Esophagus/Stomach Tumor, Eye Tumor, Head and Neck Tumor, Kidney Tumor, Liver Tumor, Lung Tumor, Lymphoid Tumor, Myeloid Tumor, Other Tumor, Ovary/Fallopian Tube Tumor, Pancreas Tumor, Penis Tumor, Peripheral Nervous System Tumor, Peritoneum Tumor, Pleura Tumor, Prostate Tumor, Skin Tumor, Soft Tissue Tumor, Testis Tumor, Thymus Tumor, Thyroid Tumor, Uterus Tumor, and Vulva/Vagina Tumor. In some embodiments, the cancer is selected from the group consisting of: Adrenal Gland Tumor, Ampulla of Vater Tumor, Biliary Tract Tumor, Bladder/Urinary Tract Tumor, Bone Tumor, Bowel Tumor, Breast Tumor, CNS/Brain Tumor, Cervix Tumor, Esophagus/Stomach Tumor, Eye Tumor, Head and Neck Tumor, Kidney Tumor, Liver Tumor, Lung Tumor, Lymphoid Tumor, Myeloid Tumor, Other Tumor, Ovary/Fallopian Tube Tumor, Pancreas Tumor, Penis Tumor, Peripheral Nervous System Tumor, Peritoneum Tumor, Pleura Tumor, Prostate Tumor, Skin Tumor, Soft Tissue Tumor, Testis Tumor, Thymus Tumor, Thyroid Tumor, Uterus Tumor, Vulva/Vagina Tumor, Adrenocortical Adenoma, Adrenocortical Carcinoma, Pheochromocytoma, Ampullary Carcinoma, Cholangiocarcinoma, Gallbladder Cancer, Intracholecystic Papillary Neoplasm, Intraductal Papillary Neoplasm of the Bile Duct, Bladder Adenocarcinoma, Bladder Squamous Cell Carcinoma, Bladder Urothelial Carcinoma, Inflammatory Myofibroblastic Bladder Tumor, Inverted Urothelial Papilloma, Mucosal Melanoma of the Urethra, Plasmacytoid/Signet Ring Cell Bladder Carcinoma, Sarcomatoid Carcinoma of the Urinary Bladder, Small Cell Bladder Cancer, Upper Tract Urothelial Carcinoma, Urachal Carcinoma, Urethral Cancer, Urothelial Papilloma, Adamantinoma, Chondroblastoma, Chondrosarcoma, Chordoma, Ewing Sarcoma, Giant Cell Tumor of Bone, Osteosarcoma, Anal 
Gland Adenocarcinoma, Anal Squamous Cell Carcinoma, Anorectal Mucosal Melanoma, Appendiceal Adenocarcinoma, Colorectal Adenocarcinoma, Gastrointestinal Neuroendocrine Tumors, Low-grade Appendiceal Mucinous Neoplasm, Medullary Carcinoma of the Colon, Small Bowel Cancer, Small Intestinal Carcinoma, Tubular Adenoma of the Colon, Adenomyoepithelioma of the Breast, Breast Ductal Carcinoma In Situ, Breast Fibroepithelial Neoplasms, Breast Lobular Carcinoma In Situ, Breast Neoplasm, NOS, Breast Sarcoma, Inflammatory Breast Cancer, Invasive Breast Carcinoma, Juvenile Secretory Carcinoma of the Breast, Metaplastic Breast Cancer, Choroid Plexus Tumor, Diffuse Glioma, Embryonal Tumor, Encapsulated Glioma, Ependymomal Tumor, Germ Cell Tumor, Brain, Meningothelial Tumor, Miscellaneous Brain Tumor, Miscellaneous Neuroepithelial Tumor, Pineal Tumor, Primary CNS Melanocytic Tumors, Sellar Tumor, Cervical Adenocarcinoma, Cervical Adenocarcinoma In Situ, Cervical Adenoid Basal Carcinoma, Cervical Adenoid Cystic Carcinoma, Cervical Adenosquamous Carcinoma, Cervical Leiomyosarcoma, Cervical Neuroendocrine Tumor, Cervical Rhabdomyosarcoma, Cervical Squamous Cell Carcinoma, Glassy Cell Carcinoma of the Cervix, Mixed Cervical Carcinoma, Small Cell Carcinoma of the Cervix, Villoglandular Adenocarcinoma of the Cervix, Esophageal Poorly Differentiated Carcinoma, Esophageal Squamous Cell Carcinoma, Esophagogastric Adenocarcinoma, Gastrointestinal Neuroendocrine Tumors of the Esophagus/Stomach, Mucosal Melanoma of the Esophagus, Smooth Muscle Neoplasm, NOS, Lacrimal Gland Tumor, Ocular Melanoma, Retinoblastoma, Head and Neck Carcinoma, Other, Head and Neck Mucosal Melanoma, Head and Neck Squamous Cell Carcinoma, Nasopharyngeal Carcinoma, Parathyroid Cancer, Salivary Carcinoma, Sialoblastoma, Clear Cell Sarcoma of Kidney, Renal Cell Carcinoma, Renal Neuroendocrine Tumor, Rhabdoid Cancer, Wilms’ Tumor, Fibrolamellar Carcinoma, Hepatoblastoma, Hepatocellular Adenoma, Hepatocellular Carcinoma, 
Hepatocellular Carcinoma plus Intrahepatic Cholangiocarcinoma, Liver Angiosarcoma, Malignant Nonepithelial Tumor of the Liver, Malignant Rhabdoid Tumor of the Liver, Undifferentiated Embryonal Sarcoma of the Liver, Combined Small Cell Lung Carcinoma, Inflammatory Myofibroblastic Lung Tumor, Lung Adenocarcinoma In Situ, Lung Neuroendocrine Tumor, Non-Small Cell Lung Cancer, Pleuropulmonary Blastoma, Pulmonary Lymphangiomyomatosis, Sarcomatoid Carcinoma of the Lung, Lymphoid Atypical, Lymphoid Benign, Lymphoid Neoplasm, Myeloid Atypical, Myeloid Benign, Myeloid Neoplasm, Adenocarcinoma In Situ, Cancer of Unknown Primary, Extra Gonadal Germ Cell Tumor, Mixed Cancer Types, Ovarian Cancer, Other, Ovarian Epithelial Tumor, Ovarian Germ Cell Tumor, Sex Cord Stromal Tumor, Acinar Cell Carcinoma of the Pancreas, Adenosquamous Carcinoma of the Pancreas, Cystic Tumor of the Pancreas, Pancreatic Adenocarcinoma, Pancreatic Neuroendocrine Tumor, Pancreatoblastoma, Solid Pseudopapillary Neoplasm of the Pancreas, Undifferentiated Carcinoma of the Pancreas, Penile Squamous Cell Carcinoma, Ganglioneuroblastoma, Ganglioneuroma, Nerve Sheath Tumor, Neuroblastoma, Peritoneal Mesothelioma, Peritoneal Serous Carcinoma, Pleural Mesothelioma, Basal Cell Carcinoma of Prostate, Prostate Adenocarcinoma, Prostate Neuroendocrine Carcinoma, Prostate Small Cell Carcinoma, Prostate Squamous Cell Carcinoma, Aggressive Digital Papillary Adenocarcinoma, Atypical Fibroxanthoma, Atypical Nevus, Basal Cell Carcinoma, Cutaneous Squamous Cell Carcinoma, Dermatofibroma, Dermatofibrosarcoma Protuberans, Desmoplastic Trichoepithelioma, Endocrine Mucin Producing Sweat Gland Carcinoma, Extramammary Paget Disease, Melanoma, Merkel Cell Carcinoma, Microcystic Adnexal Carcinoma, Porocarcinoma/Spiroadenocarcinoma, Poroma/Acrospiroma, Proliferating Pilar Cystic Tumor, Sebaceous Carcinoma, Skin Adnexal Carcinoma, Spiroma/Spiradenoma, Sweat Gland Adenocarcinoma, Sweat Gland Carcinoma/Apocrine Eccrine Carcinoma, 
Aggressive Angiomyxoma, Alveolar Soft Part Sarcoma, Angiomatoid Fibrous Histiocytoma, Angiosarcoma, Atypical Lipomatous Tumor, Clear Cell Sarcoma, Dendritic Cell Sarcoma, Desmoid/Aggressive Fibromatosis, Desmoplastic Small-Round-Cell Tumor, Epithelioid Hemangioendothelioma, Epithelioid Sarcoma, Ewing Sarcoma of Soft Tissue, Fibrosarcoma, Gastrointestinal Stromal Tumor, Glomangiosarcoma, Hemangioma, Infantile Fibrosarcoma, Inflammatory Myofibroblastic Tumor, Intimal Sarcoma, Leiomyoma, Leiomyosarcoma, Liposarcoma, Low-Grade Fibromyxoid Sarcoma, Malignant Glomus Tumor, Myofibroma, Myofibromatosis, Myopericytoma, Myxofibrosarcoma, Myxoma, Paraganglioma, Perivascular Epithelioid Cell Tumor, Pseudomyogenic Hemangioendothelioma, Radiation-Associated Sarcoma, Rhabdomyosarcoma, Round Cell Sarcoma, NOS, Sarcoma, NOS, Soft Tissue Myoepithelial Carcinoma, Solitary Fibrous Tumor/Hemangiopericytoma, Synovial Sarcoma, Tenosynovial Giant Cell Tumor Diffuse Type, Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma/High-Grade Spindle Cell Sarcoma, Non-Seminomatous Germ Cell Tumor, Seminoma, Sex Cord Stromal Tumor, Testicular Lymphoma, Testicular Mesothelioma, Thymic Epithelial Tumor, Thymic Neuroendocrine Tumor, Anaplastic Thyroid Cancer, Hurthle Cell Thyroid Cancer, Hyalinizing Trabecular Adenoma of the Thyroid, Medullary Thyroid Cancer, Oncocytic Adenoma of the Thyroid, Poorly Differentiated Thyroid Cancer, Well-Differentiated Thyroid Cancer, Endometrial Carcinoma, Gestational Trophoblastic Disease, Other Uterine Tumor, Uterine Sarcoma/Mesenchymal, Germ Cell Tumor of the Vulva, Mucinous Adenocarcinoma of the Vulva/Vagina, Mucosal Melanoma of the Vulva/Vagina, Poorly Differentiated Vaginal Carcinoma, Squamous Cell Carcinoma of the Vulva/Vagina, and Vaginal Adenocarcinoma.
In some embodiments, (iii) comprises applying the prediction module to at least treatment features of the set of treatment options to determine the probabilistic predictions of the clinical outcomes of the set of treatment options. In some embodiments, the treatment features comprise attributes of a surgical intervention, a drug intervention, a targeted intervention, a hormonal therapy intervention, a radiotherapy intervention, or an immunotherapy intervention. In some embodiments, the treatment features comprise the attributes of the drug intervention, wherein the attributes of the drug intervention comprise a chemical structure or a biological target of the drug intervention.
In some embodiments, (iii) comprises applying the prediction module to at least interaction terms between the clinical data of the subject and the treatment features of the set of treatment options to determine the probabilistic predictions of the clinical outcomes of the set of treatment options.
In some embodiments, the clinical outcomes having future uncertainty comprise a change in tumor size, a change in patient functional status, a time-to-disease progression, a time-to-treatment failure, overall survival, or progression-free survival. In some embodiments, the clinical outcomes having future uncertainty comprise the change in tumor size, as indicated by cross section or volume. In some embodiments, the clinical outcomes having future uncertainty comprise the change in patient functional status, as indicated by ECOG, Karnofsky, or Lansky scores.
In some embodiments, the probabilistic predictions of clinical outcomes of the set of treatment options comprise statistical distributions of the clinical outcomes of the set of treatment options. In some embodiments, (iii) further comprises determining a statistical parameter of the statistical distributions of the clinical outcomes of the set of treatment options. In some embodiments, the statistical parameter is selected from the group consisting of a median, a mean, a mode, a variance, a standard deviation, a quantile, a measure of central tendency, a measure of variance, a range, a minimum, a maximum, an interquartile range, a frequency, a percentile, a shape parameter, a scale parameter, and a rate parameter. In some embodiments, the statistical distributions of the clinical outcomes of the set of treatment options comprise a parametric distribution selected from the group consisting of a Weibull distribution, a log logistic distribution, a log normal distribution, a Gaussian distribution, a Gamma distribution, and a Poisson distribution.
In some embodiments, the probabilistic predictions of clinical outcomes of the set of treatment options are explainable based on performing a query of the probabilistic predictions.
In some embodiments, the method further comprises applying a training module that trains the trained machine learning model. In some embodiments, the trained machine learning model is trained using a plurality of disparate data sources. In some embodiments, the training module aggregates datasets from the plurality of disparate sources, wherein the datasets are persisted in a plurality of data stores, and trains the trained machine learning model using the aggregated datasets. In some embodiments, the plurality of disparate sources is selected from the group consisting of clinical trials, case series, individual patient cases and outcomes data, and expert opinions.
In some embodiments, the training module updates the trained machine learning model using the probabilistic predictions of the clinical outcomes of the set of treatment options generated in (iii). In some embodiments, updating is performed using a Bayesian update or a maximum likelihood algorithm.
In some embodiments, the trained machine learning model is selected from the group consisting of a Bayesian model, a support vector machine (SVM), a linear regression, a logistic regression, a random forest, and a neural network. In some embodiments, the trained machine learning model comprises a multilevel statistical model that accounts for variation at a plurality of distinct levels of analysis. In some embodiments, the multilevel statistical model accounts for correlation of subject-level effects across the plurality of distinct levels of analysis.
In some embodiments, the multilevel statistical model comprises a generalized linear model. In some embodiments, the generalized linear model comprises use of the expression: η = Xβ + Zu, wherein η is a linear response, X is a vector of predictors for treatment effects fixed across subjects, β is a vector of fixed effects, Z is a vector of predictors for subject-level treatment effects, and u is a vector of subject-level effects. In some embodiments, the generalized linear model comprises use of the expression: y = g⁻¹(η), wherein η is a linear response, g is an appropriately chosen link function from observed data to the linear response, and y is an outcome variable of interest.
In some embodiments, (iii) comprises applying a plurality of iterations of the prediction module to determine the probabilistic predictions of the clinical outcomes of the set of treatment options.
In some embodiments, the method further comprises using a parsing module to identify relevant features of the clinical data of the subject, the set of treatment options, and/or interaction terms between the clinical data of the subject and the treatment features of the set of treatment options. In some embodiments, the parsing module identifies relevant features by matching against a feature library.
In some embodiments, the method further comprises generating an electronic report comprising the probabilistic predictions of clinical outcomes of the set of treatment options. In some embodiments, the electronic report is used to select a treatment option from among the set of treatment options based at least in part on the probabilistic predictions of clinical outcomes of the set of treatment options. In some embodiments, the selected treatment option is administered to the subject. In some embodiments, the method further comprises applying the prediction module to outcome data of the subject that is obtained subsequent to administering the selected treatment option to the subject, to determine updated probabilistic predictions of the clinical outcomes of the set of treatment options.
In another aspect, the present disclosure provides a non-transitory computer storage medium storing instructions that are operable, when executed by computer processors, to implement a method comprising: (i) receiving clinical data of a subject and a set of treatment options for a disease or disorder of the subject, wherein the set of treatment options corresponds to clinical outcomes having future uncertainty; (ii) accessing a prediction module comprising a trained machine learning model that determines probabilistic predictions of clinical outcomes of the set of treatment options based at least in part on clinical data of test subjects; and (iii) applying the prediction module to at least the clinical data of the subject to determine probabilistic predictions of clinical outcomes of the set of treatment options for the disease or disorder of the subject.
In some embodiments, the clinical data is selected from somatic genetic mutations, germline genetic mutations, mutational burden, protein levels, transcriptome levels, metabolite levels, tumor size or staging, clinical symptoms, laboratory test results, and clinical history.
In some embodiments, the disease or disorder comprises cancer. In some embodiments, the subject has received a previous treatment for the cancer. In some embodiments, the subject has not received a previous treatment for the cancer.
In some embodiments, the cancer is selected from the group consisting of: Adrenal Gland Tumor, Ampulla of Vater Tumor, Biliary Tract Tumor, Bladder/Urinary Tract Tumor, Bone Tumor, Bowel Tumor, Breast Tumor, CNS/Brain Tumor, Cervix Tumor, Esophagus/Stomach Tumor, Eye Tumor, Head and Neck Tumor, Kidney Tumor, Liver Tumor, Lung Tumor, Lymphoid Tumor, Myeloid Tumor, Other Tumor, Ovary/Fallopian Tube Tumor, Pancreas Tumor, Penis Tumor, Peripheral Nervous System Tumor, Peritoneum Tumor, Pleura Tumor, Prostate Tumor, Skin Tumor, Soft Tissue Tumor, Testis Tumor, Thymus Tumor, Thyroid Tumor, Uterus Tumor, and Vulva/Vagina Tumor. In some embodiments, the cancer is selected from the group consisting of: Adrenal Gland Tumor, Ampulla of Vater Tumor, Biliary Tract Tumor, Bladder/Urinary Tract Tumor, Bone Tumor, Bowel Tumor, Breast Tumor, CNS/Brain Tumor, Cervix Tumor, Esophagus/Stomach Tumor, Eye Tumor, Head and Neck Tumor, Kidney Tumor, Liver Tumor, Lung Tumor, Lymphoid Tumor, Myeloid Tumor, Other Tumor, Ovary/Fallopian Tube Tumor, Pancreas Tumor, Penis Tumor, Peripheral Nervous System Tumor, Peritoneum Tumor, Pleura Tumor, Prostate Tumor, Skin Tumor, Soft Tissue Tumor, Testis Tumor, Thymus Tumor, Thyroid Tumor, Uterus Tumor, Vulva/Vagina Tumor, Adrenocortical Adenoma, Adrenocortical Carcinoma, Pheochromocytoma, Ampullary Carcinoma, Cholangiocarcinoma, Gallbladder Cancer, Intracholecystic Papillary Neoplasm, Intraductal Papillary Neoplasm of the Bile Duct, Bladder Adenocarcinoma, Bladder Squamous Cell Carcinoma, Bladder Urothelial Carcinoma, Inflammatory Myofibroblastic Bladder Tumor, Inverted Urothelial Papilloma, Mucosal Melanoma of the Urethra, Plasmacytoid/Signet Ring Cell Bladder Carcinoma, Sarcomatoid Carcinoma of the Urinary Bladder, Small Cell Bladder Cancer, Upper Tract Urothelial Carcinoma, Urachal Carcinoma, Urethral Cancer, Urothelial Papilloma, Adamantinoma, Chondroblastoma, Chondrosarcoma, Chordoma, Ewing Sarcoma, Giant Cell Tumor of Bone, Osteosarcoma, Anal 
Gland Adenocarcinoma, Anal Squamous Cell Carcinoma, Anorectal Mucosal Melanoma, Appendiceal Adenocarcinoma, Colorectal Adenocarcinoma, Gastrointestinal Neuroendocrine Tumors, Low-grade Appendiceal Mucinous Neoplasm, Medullary Carcinoma of the Colon, Small Bowel Cancer, Small Intestinal Carcinoma, Tubular Adenoma of the Colon, Adenomyoepithelioma of the Breast, Breast Ductal Carcinoma In Situ, Breast Fibroepithelial Neoplasms, Breast Lobular Carcinoma In Situ, Breast Neoplasm, NOS, Breast Sarcoma, Inflammatory Breast Cancer, Invasive Breast Carcinoma, Juvenile Secretory Carcinoma of the Breast, Metaplastic Breast Cancer, Choroid Plexus Tumor, Diffuse Glioma, Embryonal Tumor, Encapsulated Glioma, Ependymomal Tumor, Germ Cell Tumor, Brain, Meningothelial Tumor, Miscellaneous Brain Tumor, Miscellaneous Neuroepithelial Tumor, Pineal Tumor, Primary CNS Melanocytic Tumors, Sellar Tumor, Cervical Adenocarcinoma, Cervical Adenocarcinoma In Situ, Cervical Adenoid Basal Carcinoma, Cervical Adenoid Cystic Carcinoma, Cervical Adenosquamous Carcinoma, Cervical Leiomyosarcoma, Cervical Neuroendocrine Tumor, Cervical Rhabdomyosarcoma, Cervical Squamous Cell Carcinoma, Glassy Cell Carcinoma of the Cervix, Mixed Cervical Carcinoma, Small Cell Carcinoma of the Cervix, Villoglandular Adenocarcinoma of the Cervix, Esophageal Poorly Differentiated Carcinoma, Esophageal Squamous Cell Carcinoma, Esophagogastric Adenocarcinoma, Gastrointestinal Neuroendocrine Tumors of the Esophagus/Stomach, Mucosal Melanoma of the Esophagus, Smooth Muscle Neoplasm, NOS, Lacrimal Gland Tumor, Ocular Melanoma, Retinoblastoma, Head and Neck Carcinoma, Other, Head and Neck Mucosal Melanoma, Head and Neck Squamous Cell Carcinoma, Nasopharyngeal Carcinoma, Parathyroid Cancer, Salivary Carcinoma, Sialoblastoma, Clear Cell Sarcoma of Kidney, Renal Cell Carcinoma, Renal Neuroendocrine Tumor, Rhabdoid Cancer, Wilms’ Tumor, Fibrolamellar Carcinoma, Hepatoblastoma, Hepatocellular Adenoma, Hepatocellular Carcinoma, 
Hepatocellular Carcinoma plus Intrahepatic Cholangiocarcinoma, Liver Angiosarcoma, Malignant Nonepithelial Tumor of the Liver, Malignant Rhabdoid Tumor of the Liver, Undifferentiated Embryonal Sarcoma of the Liver, Combined Small Cell Lung Carcinoma, Inflammatory Myofibroblastic Lung Tumor, Lung Adenocarcinoma In Situ, Lung Neuroendocrine Tumor, Non-Small Cell Lung Cancer, Pleuropulmonary Blastoma, Pulmonary Lymphangiomyomatosis, Sarcomatoid Carcinoma of the Lung, Lymphoid Atypical, Lymphoid Benign, Lymphoid Neoplasm, Myeloid Atypical, Myeloid Benign, Myeloid Neoplasm, Adenocarcinoma In Situ, Cancer of Unknown Primary, Extra Gonadal Germ Cell Tumor, Mixed Cancer Types, Ovarian Cancer, Other, Ovarian Epithelial Tumor, Ovarian Germ Cell Tumor, Sex Cord Stromal Tumor, Acinar Cell Carcinoma of the Pancreas, Adenosquamous Carcinoma of the Pancreas, Cystic Tumor of the Pancreas, Pancreatic Adenocarcinoma, Pancreatic Neuroendocrine Tumor, Pancreatoblastoma, Solid Pseudopapillary Neoplasm of the Pancreas, Undifferentiated Carcinoma of the Pancreas, Penile Squamous Cell Carcinoma, Ganglioneuroblastoma, Ganglioneuroma, Nerve Sheath Tumor, Neuroblastoma, Peritoneal Mesothelioma, Peritoneal Serous Carcinoma, Pleural Mesothelioma, Basal Cell Carcinoma of Prostate, Prostate Adenocarcinoma, Prostate Neuroendocrine Carcinoma, Prostate Small Cell Carcinoma, Prostate Squamous Cell Carcinoma, Aggressive Digital Papillary Adenocarcinoma, Atypical Fibroxanthoma, Atypical Nevus, Basal Cell Carcinoma, Cutaneous Squamous Cell Carcinoma, Dermatofibroma, Dermatofibrosarcoma Protuberans, Desmoplastic Trichoepithelioma, Endocrine Mucin Producing Sweat Gland Carcinoma, Extramammary Paget Disease, Melanoma, Merkel Cell Carcinoma, Microcystic Adnexal Carcinoma, Porocarcinoma/Spiroadenocarcinoma, Poroma/Acrospiroma, Proliferating Pilar Cystic Tumor, Sebaceous Carcinoma, Skin Adnexal Carcinoma, Spiroma/Spiradenoma, Sweat Gland Adenocarcinoma, Sweat Gland Carcinoma/Apocrine Eccrine Carcinoma, 
Aggressive Angiomyxoma, Alveolar Soft Part Sarcoma, Angiomatoid Fibrous Histiocytoma, Angiosarcoma, Atypical Lipomatous Tumor, Clear Cell Sarcoma, Dendritic Cell Sarcoma, Desmoid/Aggressive Fibromatosis, Desmoplastic Small-Round-Cell Tumor, Epithelioid Hemangioendothelioma, Epithelioid Sarcoma, Ewing Sarcoma of Soft Tissue, Fibrosarcoma, Gastrointestinal Stromal Tumor, Glomangiosarcoma, Hemangioma, Infantile Fibrosarcoma, Inflammatory Myofibroblastic Tumor, Intimal Sarcoma, Leiomyoma, Leiomyosarcoma, Liposarcoma, Low-Grade Fibromyxoid Sarcoma, Malignant Glomus Tumor, Myofibroma, Myofibromatosis, Myopericytoma, Myxofibrosarcoma, Myxoma, Paraganglioma, Perivascular Epithelioid Cell Tumor, Pseudomyogenic Hemangioendothelioma, Radiation-Associated Sarcoma, Rhabdomyosarcoma, Round Cell Sarcoma, NOS, Sarcoma, NOS, Soft Tissue Myoepithelial Carcinoma, Solitary Fibrous Tumor/Hemangiopericytoma, Synovial Sarcoma, Tenosynovial Giant Cell Tumor Diffuse Type, Undifferentiated Pleomorphic Sarcoma/Malignant Fibrous Histiocytoma/High-Grade Spindle Cell Sarcoma, Non-Seminomatous Germ Cell Tumor, Seminoma, Sex Cord Stromal Tumor, Testicular Lymphoma, Testicular Mesothelioma, Thymic Epithelial Tumor, Thymic Neuroendocrine Tumor, Anaplastic Thyroid Cancer, Hurthle Cell Thyroid Cancer, Hyalinizing Trabecular Adenoma of the Thyroid, Medullary Thyroid Cancer, Oncocytic Adenoma of the Thyroid, Poorly Differentiated Thyroid Cancer, Well-Differentiated Thyroid Cancer, Endometrial Carcinoma, Gestational Trophoblastic Disease, Other Uterine Tumor, Uterine Sarcoma/Mesenchymal, Germ Cell Tumor of the Vulva, Mucinous Adenocarcinoma of the Vulva/Vagina, Mucosal Melanoma of the Vulva/Vagina, Poorly Differentiated Vaginal Carcinoma, Squamous Cell Carcinoma of the Vulva/Vagina, and Vaginal Adenocarcinoma.
In some embodiments, (iii) comprises applying the prediction module to at least treatment features of the set of treatment options to determine the probabilistic predictions of the clinical outcomes of the set of treatment options. In some embodiments, the treatment features comprise attributes of a surgical intervention, a drug intervention, a targeted intervention, a hormonal therapy intervention, a radiotherapy intervention, or an immunotherapy intervention. In some embodiments, the treatment features comprise the attributes of the drug intervention, wherein the attributes of the drug intervention comprise a chemical structure or a biological target of the drug intervention.
In some embodiments, (iii) comprises applying the prediction module to at least interaction terms between the clinical data of the subject and the treatment features of the set of treatment options to determine the probabilistic predictions of the clinical outcomes of the set of treatment options.
In some embodiments, the clinical outcomes having future uncertainty comprise a change in tumor size, a change in patient functional status, a time-to-disease progression, a time-to-treatment failure, overall survival, or progression-free survival. In some embodiments, the clinical outcomes having future uncertainty comprise the change in tumor size, as indicated by cross section or volume. In some embodiments, the clinical outcomes having future uncertainty comprise the change in patient functional status, as indicated by ECOG, Karnofsky, or Lansky scores.
In some embodiments, the probabilistic predictions of clinical outcomes of the set of treatment options comprise statistical distributions of the clinical outcomes of the set of treatment options. In some embodiments, (iii) further comprises determining a statistical parameter of the statistical distributions of the clinical outcomes of the set of treatment options. In some embodiments, the statistical parameter is selected from the group consisting of a median, a mean, a mode, a variance, a standard deviation, a quantile, a measure of central tendency, a measure of variance, a range, a minimum, a maximum, an interquartile range, a frequency, a percentile, a shape parameter, a scale parameter, and a rate parameter. In some embodiments, the statistical distributions of the clinical outcomes of the set of treatment options comprise a parametric distribution selected from the group consisting of a Weibull distribution, a log-logistic distribution, a log-normal distribution, a Gaussian distribution, a Gamma distribution, and a Poisson distribution.
In some embodiments, the probabilistic predictions of clinical outcomes of the set of treatment options are explainable based on performing a query of the probabilistic predictions.
In some embodiments, the method further comprises applying a training module that trains the trained machine learning model. In some embodiments, the trained machine learning model is trained using a plurality of disparate data sources. In some embodiments, the training module aggregates datasets from the plurality of disparate sources, wherein the datasets are persisted in a plurality of data stores, and trains the trained machine learning model using the aggregated datasets. In some embodiments, the plurality of disparate sources is selected from the group consisting of clinical trials, case series, individual patient cases and outcomes data, and expert opinions.
In some embodiments, the training module updates the trained machine learning model using the probabilistic predictions of the clinical outcomes of the set of treatment options generated in (iii). In some embodiments, updating is performed using a Bayesian update or a maximum likelihood algorithm.
In some embodiments, the trained machine learning model is selected from the group consisting of a Bayesian model, a support vector machine (SVM), a linear regression, a logistic regression, a random forest, and a neural network. In some embodiments, the trained machine learning model comprises a multilevel statistical model that accounts for variation at a plurality of distinct levels of analysis. In some embodiments, the multilevel statistical model accounts for correlation of subject-level effects across the plurality of distinct levels of analysis.
In some embodiments, the multilevel statistical model comprises a generalized linear model. In some embodiments, the generalized linear model comprises use of the expression: η = Xβ + Zu, wherein η is a linear response, X is a vector of predictors for treatment effects fixed across subjects, β is a vector of fixed effects, Z is a vector of predictors for subject-level treatment effects, and u is a vector of subject-level effects. In some embodiments, the generalized linear model comprises use of the expression: y = g⁻¹(η), wherein η is a linear response, g is an appropriately chosen link function from observed data to the linear response, and y is an outcome variable of interest.
In some embodiments, (iii) comprises applying a plurality of iterations of the prediction module to determine the probabilistic predictions of the clinical outcomes of the set of treatment options.
In some embodiments, the method further comprises using a parsing module to identify relevant features of the clinical data of the subject, the set of treatment options, and/or interaction terms between the clinical data of the subject and the treatment features of the set of treatment options. In some embodiments, the parsing module identifies relevant features by matching against a feature library.
In some embodiments, the method further comprises generating an electronic report comprising the probabilistic predictions of clinical outcomes of the set of treatment options. In some embodiments, the electronic report is used to select a treatment option from among the set of treatment options based at least in part on the probabilistic predictions of clinical outcomes of the set of treatment options. In some embodiments, the selected treatment option is administered to the subject. In some embodiments, the method further comprises applying the prediction module to outcome data of the subject that is obtained subsequent to administering the selected treatment option to the subject, to determine updated probabilistic predictions of the clinical outcomes of the set of treatment options.
Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:
While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
As used in the specification and claims, the singular form “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
As used herein, the term “subject” generally refers to an entity or a medium that has testable or detectable genetic information. A subject can be a person, individual, or patient. A subject can be a vertebrate, such as, for example, a mammal. Non-limiting examples of mammals include humans, simians, farm animals, sport animals, rodents, and pets. A subject can be a person that has, or is suspected of having, a cancer. The subject may be displaying a symptom(s) indicative of a health or physiological state or condition of the subject, such as a cancer of the subject. As an alternative, the subject can be asymptomatic with respect to such health or physiological state or condition.
Physicians engaged in precision oncology may integrate an overwhelming amount of information from publications and from their own experience. For example, as of the end of 2019, PubMed reports 19,748 publications matching the term “breast cancer” in the past year alone, and the same search for open, recruiting studies in ClinicalTrials.gov returns 1,937 studies. Therefore, practitioners, such as oncologists, may face challenges in reading all these materials, determining which may be most relevant, and synthesizing the whole of the data into relevant predictions for patient outcomes.
Oncologists fighting less common cancers may be in a potentially worse situation; instead of being overwhelmed, they may have only a few relevant publications, and may have seen only a small number of similar cases. Here, successful prediction of patient outcomes may depend on prior information gleaned from experts in similar, but not exactly the same, disease states.
Importantly, prediction may not be an exact science. Every patient may respond differently, due to a multitude of unknowns; it may be difficult or even impossible to fully model patients and their disease states, or the complete set of interactions between patients and their treatment regimens.
For some cancers, such as chronic myelogenous leukemia, the level of uncertainty may be relatively low; patients may almost universally receive tyrosine kinase inhibitors, and the response characteristics may be relatively well-known. But for most cancers, and for many late-stage cancers, the unknown variables may far outnumber the known characteristics. In these cases, the sum of effects from the unknown variables may exceed the effects from known treatments. This may require probabilistic reasoning in order to devise an effective, rational treatment strategy.
Thus, there remains a need for automated intelligent systems and methods that acquire and structure knowledge from a diverse array of sources, such as clinical trials, case series, individual patient cases and outcomes data, and expert opinions, such that this information may be used to predict, for a given patient, what the probable range of outcomes might be, over time, for a given treatment. Furthermore, such predictions may be explainable to a physician or scientist who queries the system for such a prediction; in contrast, a “black box” that provides answers without rationales may not instill confidence.
In light of the needs above, the present disclosure provides systems and methods for precision oncology using multilevel Bayesian models, which may effectively address challenges faced by physicians when treating patients with complex disease etiologies, such as cancer. Systems and methods of the present disclosure may be used to predict various measures of patient outcomes for particular patients under different treatment regimens. The systems and methods may be capable of learning from a diverse range of information sources, including individual patient outcomes observed outside of randomized trials (in other words, “real world evidence” or RWE) as well as other sources, such as expert surveys and summary statistics from clinical trials. The learning process may occur via a training module, which presents this data in a learning loop to a multilevel model module, which may be a combination of a Bayesian model and database.
Once the multilevel model module has been conditioned on such source data, it may be used in conjunction with a prediction module to predict outcomes for new patients under different treatment choices and provide a measure of the uncertainty of these predictions. These predictions may be probabilistic in nature, in that they represent a distribution of possible outcomes (e.g., in contrast to a single outcome).
A key advance may be that the multilevel model’s structure bears an understandable relationship to the domain, and to the types of inputs and outputs oncologists may expect. This structure may help users of systems and methods of the present disclosure to understand how the predictions and uncertainty therein may be derived, rather than treating the results as “black box” predictions. This level of explainability may be critical, for example, for certification of medical devices that rely on Artificial Intelligence and Machine Learning.
The model may be constructed or improved by a training process and a prediction process. For both of these tasks, the user may need to provide a list of relevant patient features (e.g., biomarkers), a list of relevant treatment features, and a list of possible interactions between features. Patient features (biomarkers) may include, but are not limited to: somatic mutations (e.g., which may provide information about the cancer tumor itself); information about mutational burden (e.g., total number of mutations or number of mutations per million base pairs); germline genetic mutations (e.g., which may indicate a higher risk of developing cancer, such as the BRCA1 and BRCA2 mutations); and specific protein levels (e.g., if the protein ERCC1 is present, then platinum-based chemotherapies may not be likely to be effective; other proteins of interest include certain enzymes, antibodies, and cytokines).
Treatment features may describe the various attributes of the treatments, such as whether a treatment involves surgery, radiation, or a biochemical intervention. Each of these may be further subdivided. For example, surgical interventions may be divided into partial and total resections, exploratory biopsies, etc. Radiation may be described by wavelength, duration, burstiness, etc. For biochemical interventions, there may be multiple hierarchies, forming a lattice-like representation, of attributes that may describe the chemical structure, biological targets, and other attributes of the compounds. For example, the following hierarchy may be used, as described by Espinosa et al., “Classification of anticancer drugs—a new system based on therapeutic targets,” CANCER TREATMENT REVIEWS 2003; 29: pp. 515-523, which is incorporated by reference herein in its entirety:
This feature classification may be further refined, for example, down to the level of specific genes or pathways targeted by specific drugs (e.g., MEK, ERK, or p53).
The concatenation of a list of biomarkers, treatment features, and interaction terms between these features may specify a set of predictors for the model. In addition to identifying the predictors, a user of the system of the present disclosure may specify the desired treatment outcomes of interest, to be predicted by the system. These outcomes may include, but are not limited to: a change in tumor size (e.g., as measured in cross section, or in volumetric estimation); a change in patient functional status (e.g., ECOG, Karnofsky, or Lansky scores); a time-to-disease progression; a time-to-treatment failure; an overall survival; and a progression-free survival.
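The concatenation of predictors described above may be sketched as follows. This is an illustrative example only, not the disclosed implementation; the feature names (e.g., BRCA1_mutation, ERCC1_level, platinum_based) and the use of simple pairwise products as interaction terms are assumptions made for demonstration.

```python
# Hypothetical sketch: concatenating biomarkers, treatment features, and
# pairwise interaction terms into one ordered predictor vector.
def build_predictors(biomarkers, treatment_features):
    """Return (names, values): biomarkers, then treatment features,
    then biomarker x treatment interaction terms."""
    names, values = [], []
    for name, value in sorted(biomarkers.items()):
        names.append(name)
        values.append(value)
    for name, value in sorted(treatment_features.items()):
        names.append(name)
        values.append(value)
    # Interaction terms: product of each biomarker with each treatment feature.
    for b_name, b_val in sorted(biomarkers.items()):
        for t_name, t_val in sorted(treatment_features.items()):
            names.append(f"{b_name}*{t_name}")
            values.append(b_val * t_val)
    return names, values

# Illustrative feature values (assumed, not from the disclosure).
names, x = build_predictors(
    {"BRCA1_mutation": 1, "ERCC1_level": 0.2},
    {"platinum_based": 1},
)
```

In this sketch, two biomarkers and one treatment feature yield a five-element predictor vector: three main-effect terms plus two interaction terms.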
With the set of predictors and a desired set of outcomes, the system may then generate a “predictive model,” which may be a forward simulation from a set of given predictors to the set of desired outcomes. Because these simulations may be stochastic in nature, they may involve a plurality of iterations and produce a statistical distribution of possible treatment outcomes. The outcome predictions may be communicated as summary statistics of this distribution, such as the mean and standard deviation for continuous outcomes, shape/scale parameters of the distribution, or the frequency of specific cases for discrete outcomes (e.g., a rate parameter).
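A stochastic forward simulation of this kind may be sketched as below. The choice of a Weibull time-to-progression distribution and the particular shape and scale values are arbitrary assumptions for illustration; the disclosure contemplates several parametric families.

```python
import random
import statistics

def simulate_outcomes(scale_months, shape, iterations=10_000, seed=0):
    """Draw many simulated times-to-progression from an assumed Weibull
    distribution, producing a distribution of possible outcomes."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    return [rng.weibullvariate(scale_months, shape) for _ in range(iterations)]

samples = simulate_outcomes(scale_months=12.0, shape=1.5)

# Communicate the prediction as summary statistics of the distribution.
summary = {
    "mean": statistics.mean(samples),
    "median": statistics.median(samples),
    "stdev": statistics.stdev(samples),
}
```

Running many iterations and reporting the mean, median, and standard deviation mirrors the idea of communicating a distribution of outcomes rather than a single point prediction.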
The predictive model may be a generalized linear multilevel model. As such, the expected outcome of the generalized linear model may be a linear combination of predictor variables, under an appropriate transformation of the outcome variable. Multilevel models may be statistical models that account for variation at multiple levels of analysis. For instance, the model may measure the size of a subject’s tumors each month for several months after treatment. Variation in the size of a subject’s tumor at a particular time may be due to either the characteristics specific to the subject (e.g., having a more or less aggressive tumor) or from the time relative to the start of treatment. Additionally, such a model may consider subject-level effects on the time-to-disease progression or death to be additional effects on the survival of subjects which, while they may be correlated with predictors, may not be fixed across subjects when conditioned on the predictors. Models that fail to account for correlation in data from different levels of analysis may underestimate the uncertainty of model predictions.
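The linear response and link structure of the generalized linear model above may be illustrated numerically as follows. The logistic link and all numeric values are assumptions chosen purely for demonstration; the disclosure leaves the link function to be appropriately chosen for the outcome.

```python
import math

def linear_response(x, beta, z, u):
    """eta = X.beta + Z.u for a single subject (plain dot products):
    fixed effects plus subject-level effects."""
    return (sum(xi * bi for xi, bi in zip(x, beta)) +
            sum(zi * ui for zi, ui in zip(z, u)))

def inverse_logit(eta):
    """y = g^{-1}(eta) for an assumed logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

# Illustrative fixed-effect predictors/coefficients and one subject-level term.
eta = linear_response(x=[1.0, 0.5], beta=[0.2, -0.4], z=[1.0], u=[0.3])
y = inverse_logit(eta)
```

Here eta combines population-level (fixed) effects with a subject-level effect, and the inverse link maps the linear response onto the outcome scale.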
To perform the learning task, a learning module may update the state of the predictive model by conditioning it on new data. This new data may take the form of any treatment outcomes data that may be predicted by the predictive model or by summary statistics derived from the predictive model. The state representation of the model may be any representation of a probability distribution over such model parameters, such as a finite number of samples from the distribution, summary statistics of the distribution, or hyperparameters describing a particular instance of a parametric family of probability distribution functions. While the learning task may be considered a form of a Bayesian update, such an updating procedure may use techniques from frequentist statistics, such as maximum likelihood algorithms to derive new model parameters.
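As a minimal sketch of such an update, the conjugate Normal-Normal case (known observation variance) below shows how new outcome data could condition the state of a single model parameter. The prior and data values are arbitrary assumptions; the disclosure contemplates richer state representations (samples, summary statistics, or hyperparameters) and alternative techniques such as maximum likelihood.

```python
def normal_update(prior_mean, prior_var, obs_mean, obs_var, n):
    """Bayesian update of a Normal prior on a mean parameter, given n
    observations with sample mean obs_mean and known variance obs_var.
    Posterior precision is the sum of prior and data precisions."""
    precision = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + n * obs_mean / obs_var)
    return post_mean, post_var

# Prior belief about a treatment effect, conditioned on 25 new observations.
post_mean, post_var = normal_update(prior_mean=0.0, prior_var=4.0,
                                    obs_mean=1.2, obs_var=9.0, n=25)
```

The posterior mean moves from the prior toward the observed data, and the posterior variance shrinks, reflecting reduced uncertainty after conditioning on new outcomes.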
Improved systems and methods for predicting treatment outcomes may comprise improvements in the application of subject-specific biological features and/or the application of black box machine learning algorithms such as neural networks to the task of generating predictions of outcomes.
For example, systems and methods for predicting treatment outcomes may be improved in the application of subject-specific biological features. For example, genetic sequencing of a subject’s tumor may reveal mutations in known oncogenes (e.g., genes that have the potential to cause cancer). The presence or absence of mutations in these genes may be shown in randomized controlled trials to affect the efficacy of particular drugs that target proteins in related metabolic pathways. Methods for applying this knowledge may comprise use of decision trees whose decision criteria may be set by published studies. While these methods may provide clear guidance on applying the predictions, there may be little or no quantification of uncertainty in the predictions. Such uncertainty quantification may naturally arise in a Bayesian outcomes model, in which uncertainty may be expressed as the variance in the distribution of predicted outcomes. An additional challenge that such methods face may be that they require expensive clinical studies in order to discover new rules for achieving better outcomes, with results that may take years or even decades to be disseminated to widespread practice in the community. In contrast, the Bayesian outcomes model presented herein may be updated with multiple sources, including individual subject data, existing clinical trial data, and expert surveys, and it may be done in a timely fashion.
As another example, systems and methods for predicting treatment outcomes may be improved in the application of black box machine learning algorithms such as neural networks to the task of generating predictions of outcomes. Such algorithms may achieve high predictive accuracy but may require large datasets to make sensible predictions. Thus, they may not generalize well beyond the scope of data that the model has been trained on. Since many cancers may be rare, and many subjects present unique circumstances, training such networks may be difficult.
Training such systems may face challenges from the “large p, small n” problem. That is, there may be a very large number of parameters that may be fitted compared to the number of data points available for training. As an example, consider the size of the human genome and the number of possible mutations it may harbor, in relation to the number of childhood brain cancer subjects. The problems and challenges associated with potential overfitting may be enormous.
In addition, these algorithms may be difficult for domain experts to interpret and critique. Aside from hindering the adoption of such algorithms by care providers, the lack of interpretability may make it difficult to debug these algorithms. The same non-explainability may also complicate the certification of systems utilizing such algorithms as software medical devices.
Thus, there remains a need for systems and methods that may predict measures of subject outcomes using a relative paucity of data, which may handle uncertainty in prediction, and which may explain the outcomes in terms of features that a physician may use to describe the subject’s condition, such that a physician may understand why the system reached the conclusion that it did.
In a generalized linear multilevel model, the linear response to a treatment may be described by the expression:

η = Xβ + Zu

where η may be the linear response, X is a vector of predictors for treatment effects fixed across subjects, β is a vector of fixed effects, Z is a vector of predictors for subject-level treatment effects, and u is a vector of subject-level effects. Z may comprise any subset of predictors from X, indexed by subject. The subject-level effects parameters may be asserted or assumed to be drawn from a zero-centered multivariate normal distribution. These subject-level effects may be interpreted as the variation in outcomes across subjects beyond that due to measured predictor variables.
The expectation of the outcome may be related to the linear response by the expression:

g(E[y]) = η

where g is an appropriately chosen link function from the observed data to the linear response and y is the outcome variable of interest. The distribution about the expected value may be chosen to match the range of the outcome space, such as a normal distribution for continuous outcomes or a categorical distribution for discrete outcomes. Other outcomes, such as time-to-event outcomes, may use a more specialized distribution such as a Weibull, log logistic, or log normal distribution. Such distributions with additional shape or scale parameters beyond η may introduce additional linear dependence on predictor variables and subject-level variables.
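The linear response and link relationship above may be sketched numerically as follows; the effect values, predictor values, and choice of a log link are all illustrative assumptions:

```python
import math

# Hypothetical effects and predictors; names and values are illustrative.
beta = [0.4, -0.2, 0.1]   # fixed effects, one per predictor in X
u = [0.05, -0.03]         # subject-level effects, one per predictor in Z
x = [1.0, 1.0, 0.0]       # fixed-effect predictors X for one subject
z = [1.0, 1.0]            # subject-level predictors Z (a subset of X)

# Linear response: eta = X.beta + Z.u
eta = sum(a * b for a, b in zip(x, beta)) + sum(a * b for a, b in zip(z, u))

# With a log link g, the expected outcome is the inverse link of eta.
expected_outcome = math.exp(eta)
```

A normal, categorical, Weibull, or other distribution would then be placed around this expected value, matching the range of the outcome space.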
Importantly, the prediction model provided herein may not be stateless. It may accumulate knowledge over time, by being trained via a set of training inputs, and/or by learning from every example it may be presented with. Further, since the effects parameters may not be scalar parameters, but rather may be drawn from distributions, it may be possible to provide prior estimates of degrees of confidence or degrees of belief in certain effects, even if there have been no concrete cases yet available to examine (e.g., in a case where there have been in vitro experiments but there has not yet been in vivo usage of a drug, there may only be expert opinion to draw from at the moment).
The machinery that surrounds the prediction model may be organized into several modules that perform different functions, depending on whether the system is being trained with training data or asked to predict the outcomes for a specific subject.
At a simple level of abstraction, the systems and methods of the present disclosure may be used in different modes. When used in “training mode,” the system may be presented with multiple training examples, each of which comprises a subject case description and the actual treatment outcome. This data may be used to train an internal model (e.g., through one or more iterations), but may produce no output (other than for debugging and monitoring purposes).
When used in “prediction mode,” the system may be presented with a single subject case at a time. The system may then use the model to produce predicted outcomes, which describe the expected trajectory of a test subject on the proposed treatment regimen. These outcomes may be time-dependent and probabilistic in nature.
The system 100 comprises four modules: the parsing module 110, the model module 120, the prediction module 130, and the training module 140. In “training mode,” the system may be presented with training inputs 102, which may be training examples which have both input and outcome information. These training inputs may be used to update the internal model representation 121, and may be the way by which the system learns.
Another way the system may be used may be in “prediction mode.” In this mode, the system may be provided only features of a particular subject and treatment regimen in the prediction inputs 101. The system may then use the knowledge stored in the model representation 121, along with other parts of the system, and may generate predicted outcomes 105 therefrom. These predictions may not necessarily be exact values, but may be expected values with credible intervals associated with them.
For performing the prediction task, the user of the system may provide prediction inputs (e.g., subject case descriptions) to the parsing module. The parsing module may identify relevant biomarkers, treatments, and interaction terms by matching against the feature library 122. This identification process may produce a matrix of predictors 103, whose rows represent different treatment options and whose columns represent different feature variables that may be associated with variation in outcomes (alternatively, without loss of generality, rows may represent different feature variables that may be associated with variation in outcomes, and columns may represent different treatment options). The prediction module may iteratively draw sample parameters 131 from the model representation, then may use these sampled parameters with the predictor matrix to draw a sample of outcomes 132 under each treatment option. This iterative process may be repeated to build a larger sample of predicted outcomes 105 under each treatment option.
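This sample-parameters/sample-outcomes loop may be sketched as follows; the `sample_parameters` callable, the predictor values, and the LogNormal outcome noise are hypothetical:

```python
import random

def predict(sample_parameters, predictors, n_samples=500):
    """Sketch of the prediction loop: repeatedly draw model parameters,
    then draw one outcome per treatment row of the predictors matrix."""
    rng = random.Random(42)
    outcomes = {row: [] for row in range(len(predictors))}
    for _ in range(n_samples):
        beta = sample_parameters(rng)              # sample parameters 131
        for row, x in enumerate(predictors):
            eta = sum(a * b for a, b in zip(x, beta))
            outcomes[row].append(rng.lognormvariate(eta, 0.3))  # sample outcomes 132
    return outcomes

# Two treatment options over three illustrative predictor columns.
predictors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
draw = lambda rng: [rng.gauss(m, 0.1) for m in (0.2, 0.5, -0.4)]
samples = predict(draw, predictors)
```

The accumulated lists in `samples` play the role of the predicted outcomes 105 under each treatment option.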
For performing the training task, the user of the system may provide training inputs 102 (e.g., subject treatment outcomes data, expert survey data, or clinical trial data) to the parsing module 110. The parsing module 110 may identify relevant biomarkers, treatments, and interaction terms by matching against the feature library 122. This identification process may produce a matrix of predictors 103, whose rows represent different treatment options and whose columns represent different feature variables that may be associated with variation in outcomes (alternatively, without loss of generality, rows may represent different feature variables that may be associated with variation in outcomes, and columns may represent different treatment options). In addition, the parsing module 110 may identify treatment outcomes from the training inputs, and the parsing module 110 may produce a vector of outcomes 104. The training module may read the current model representation to construct a Bayesian prior distribution 141. The training module may then take these priors, and may use the predictors matrix and outcomes vector to perform a Bayesian update 142. This updating process may produce an updated model representation, which may be stored in place of the previous model representation 121.
While some embodiments of the present disclosure utilize Bayesian modeling to perform an update of internal model state, the same task may be performed using frequentist statistical techniques. The Bayesian formulation may be simpler to present; however, the restriction of the discussion to Bayesian methods should in no way be interpreted as a limitation of the present disclosure.
The outcome prediction shown in panel 204 may be enlarged and shown in
Returning to
The model module 120 may comprise model representation 121 and feature library 122. The model representation may be a database which comprises a record of model parameter distributions for each outcome type (e.g., time-to-disease progression, change in tumor load, change in performance status). These parameter distributions may be stored either as a finite number of samples from the distribution of interest or as hyperparameters of some parametric probability distribution (note that “hyperparameter” here may be used in the Bayesian sense, to refer to parameters that describe a particular probability distribution, as compared to the machine learning sense of parameters that may be tweaked to tune how an algorithm runs). The feature library may be another database comprising a list of treatment options, a list of biomarkers, and a list of interaction terms that reference entries in the treatment and biomarker lists. All of this information may be used in creating the predictors matrix 103, which may be used in intermediate calculations.
The parsing module 110 may perform the following sub-tasks: upon being presented with training input data 102, the “identify features” subsystem 111 may construct the predictors matrix 103, and the “identify outcomes” subsystem 112 may construct the outcomes vector 104. Additionally, the “identify features” subsystem 111 may construct the predictors matrix 103 when presented with prediction inputs 101. Training input data may comprise multiple subject case descriptions associated with treatment outcomes. Prediction input data may comprise a single subject case description.
To construct a predictors matrix from training input data, the parsing module 110 may partition the training data by individual subjects, then may construct, for each subject, a vector of features by matching the individual subject’s case description to the list of features provided by the feature library in the model module. These feature row vectors may be concatenated to form a matrix of predictors (predictors matrix 103). To construct the outcomes vector 104, the parsing module may similarly partition the training data by individual subjects, then may associate each subject with a treatment outcome.
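A minimal sketch of this construction, assuming a feature library of named biomarker, treatment, and interaction terms and binary feature matching (all names and values are illustrative):

```python
def build_training_matrices(training_cases, feature_library):
    """Partition training data by subject, match each case description
    against the feature library, and build the predictors matrix and
    outcomes vector."""
    predictors, outcomes = [], []
    for case in training_cases:
        row = [1.0 if feat in case["features"] else 0.0
               for feat in feature_library]
        predictors.append(row)               # one feature row vector per subject
        outcomes.append(case["outcome"])     # associated treatment outcome
    return predictors, outcomes

library = ["EGFR_mutation", "erlotinib", "EGFR_mutation*erlotinib"]
cases = [
    {"features": {"EGFR_mutation", "erlotinib", "EGFR_mutation*erlotinib"},
     "outcome": 14.2},
    {"features": {"erlotinib"}, "outcome": 5.1},
]
X, y = build_training_matrices(cases, library)
# X == [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]] and y == [14.2, 5.1]
```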
To construct a predictors matrix 103 from prediction inputs, the parsing module 110 may create a copy of the subject case description for each treatment option read from the feature library. Each treatment option may be associated with a copy of the case description. The parsing module 110 may take this set of case descriptions with hypothetical treatments, then for each hypothetical treatment, it may form a feature vector by matching against the biomarker, treatment, and interaction terms stored in the feature library 122. These feature row vectors may be concatenated to form a matrix of predictors, where the rows in this matrix represent different hypothetical treatment scenarios.
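A sketch of this case-per-treatment expansion, under the assumption that interaction terms in the feature library are named by their component biomarker and treatment (all names are hypothetical):

```python
def build_prediction_matrix(case_features, treatments, feature_library):
    """For a single subject, copy the case description once per treatment
    option, then match each hypothetical case against the feature library
    (biomarkers, treatments, and interaction terms)."""
    rows = []
    for treatment in treatments:
        hypothetical = set(case_features) | {treatment}
        # interaction terms fire when both of their components are present
        hypothetical |= {f"{b}*{treatment}" for b in case_features
                         if f"{b}*{treatment}" in feature_library}
        rows.append([1.0 if feat in hypothetical else 0.0
                     for feat in feature_library])
    return rows

library = ["EGFR_mutation", "erlotinib", "cisplatin", "EGFR_mutation*erlotinib"]
matrix = build_prediction_matrix({"EGFR_mutation"},
                                 ["erlotinib", "cisplatin"], library)
# one row per hypothetical treatment scenario:
# [[1.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
```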
The prediction module 130 may generate predicted outcomes 105 under different treatment options. Treatment options may be represented as rows of the inputted predictors matrix. Because predictions may be probabilistic in nature, representing a distribution of possible outcomes, they may be generated by sampling distributions. Thus, the prediction module may first sample parameters 131 from the parameter distribution stored in the model representation, then the prediction module 130 may sample from the outcomes distribution 132, conditional on the previously sampled parameters. These two subsystems may repeat their processes one or more times, as necessary, to generate a representative distribution.
The process by which the particular features may be chosen may be manual. Alternatively, automatic generators based on, for example, natural language parsing of domain models or simple causal diagrams, may be used.
The remaining components of
The sample parameters module 431 may read the model representation 421 to fetch values for the following model parameters: effects on TL 442, subject-level effects on TL 441, effects on PFS 443, and subject-level effects on PFS 440. The predictors matrix 403 may be multiplied by the vector of effects on TL 442 and added to the product of the predictors matrix with the subject-level effects on TL to form the TL linear response 445 variable. The TL linear response may be used as an additional predictor along with the other predictors from the predictors matrix for calculating the PFS linear response 444 from the vector of effects on PFS 443 and subject-level effects on PFS 440.
The sample outcomes module 432 may take the linear responses for TL 451 and PFS 450, and draw a sample from the appropriate outcomes distribution. For this example, sample TL outcomes may be drawn from a LogNormal distribution whose location parameter may be specified by the TL linear response 445, and sample PFS outcomes may be drawn from a LogLogistic distribution whose location parameter may be specified by the PFS linear response 444. The sampled outcomes may be appended to the list of predicted outcomes.
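One draw of this chained TL/PFS sampling may be sketched as follows; the effect vectors, the shape parameters, and the coefficient tying the TL linear response into the PFS response are illustrative assumptions:

```python
import math
import random

rng = random.Random(7)

def sample_outcome(row, effects_tl, effects_pfs, pfs_tl_coeff, pfs_shape=4.0):
    """One draw of tumor load (TL) and progression-free survival (PFS).
    The TL linear response feeds into the PFS linear response as an
    additional predictor."""
    eta_tl = sum(x * b for x, b in zip(row, effects_tl))
    tl = rng.lognormvariate(eta_tl, 0.25)        # LogNormal, location eta_tl
    eta_pfs = sum(x * b for x, b in zip(row, effects_pfs))
    eta_pfs += pfs_tl_coeff * eta_tl             # TL response as a PFS predictor
    # LogLogistic(scale exp(eta_pfs), shape pfs_shape) via inverse-CDF sampling
    quantile = rng.random()
    pfs = math.exp(eta_pfs) * (quantile / (1.0 - quantile)) ** (1.0 / pfs_shape)
    return tl, pfs

tl, pfs = sample_outcome([1.0, 1.0], [0.3, -0.1], [0.2, 0.4], pfs_tl_coeff=-0.5)
```

Each call corresponds to one appended entry in the list of predicted outcomes.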
Each subtask of parameter sampling and outcomes sampling may be independently repeated over some pre-specified number of iterations (e.g., 1,000 or 10,000) to generate a distribution of predicted outcomes. This predicted outcomes distribution may be summarized by e.g., mean and standard deviation statistics, which provide an indication of the expected outcome and the uncertainty, respectively.
The use of tumor load and progression-free survival as metrics of subject outcomes is provided for illustrative purposes only, and is not intended to be limiting in any respect. Other metrics may also be created using similar approaches, such as, but not limited to: tumor markers (e.g., CA19-9); overall survival; performance scores (e.g., ECOG or Karnofsky Score); serious adverse events; and so forth.
Returning to
At the next level, the training module comprises a subsystem for constructing priors 141, and a subsystem for performing a Bayesian update 142. The subtask of constructing priors may be performed either by directly taking samples of model parameters from the model representation 121, or by reading the hyperparameters and functional form of the parameter distribution from the model representation. The Bayesian update process may be performed with a wide variety of algorithmic methods, such as Markov Chain Monte Carlo, Variational Bayesian Inference, and Approximate Bayesian Computation.
An example of such a Bayesian update algorithm may be a Markov Chain Monte Carlo procedure with Metropolis-Hastings proposals (however, other algorithms may be possible; this example is not meant to be limiting):
In some embodiments, the system may “warm up” the chain over some large number of iterations until the Markov chain is approximately stationary, then may draw samples from the distribution until the desired number of samples is reached. Metrics such as the autocorrelation time and the Gelman-Rubin convergence statistic may be used to assess the convergence of the algorithm.
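A minimal random-walk Metropolis-Hastings sketch for a single scalar parameter, using a toy standard-normal posterior; the step size, iteration counts, and posterior are illustrative only:

```python
import math
import random

def metropolis_hastings(log_post, x0, n_warmup=500, n_samples=1000, step=0.5,
                        rng=random.Random(1)):
    """Random-walk Metropolis-Hastings: warm up until roughly stationary,
    then collect posterior samples."""
    x, samples = x0, []
    for i in range(n_warmup + n_samples):
        proposal = x + rng.gauss(0.0, step)      # symmetric proposal
        if math.log(rng.random() + 1e-300) < log_post(proposal) - log_post(x):
            x = proposal                         # accept the proposal
        if i >= n_warmup:
            samples.append(x)                    # keep post-warm-up draws only
    return samples

# Toy posterior: standard normal log-density (up to an additive constant).
draws = metropolis_hastings(lambda t: -0.5 * t * t, x0=3.0)
```

In practice the log-posterior would combine the prior read from the model representation with the likelihood of the predictors and outcomes.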
The learning loop may be adapted or customized to deal with different types of informative prior information. For example, the system may learn from examples of subjects interacting with their care providers; this may be a case where treatment decisions may be made, and importantly, follow-up data on the subject’s outcome may be available. In another example, the system may learn from surveys of expert opinion; in this case, no subject outcome data may be available, but because the data comes from experts, the strength of the prior beliefs may be high. In another example, the system may learn from clinical trials data; in this case, data involve real subjects with rigorous controls. These three examples may be illustrative and may not be exhaustive. Numerous other examples of learning opportunities may be applied to systems and methods of the present disclosure.
All of these examples involve use of the parsing module, predictors matrix, outcomes vector, the training module, and the model module, but arranged in slightly different ways, as may be illustrated herein.
Initially, a subject and the subject’s provider (together, 560) may wish to use the system to decide on the best course of treatment. They may input a case description 561 (which corresponds to prediction inputs 101 in
In some embodiments, the options predictors matrix 506 may be generated (corresponding to predictors matrix 103 of
At this point, and separately from the system, the subject and provider may discuss the options available to them, make a treatment decision, and begin treatment. This may result in an outcome at some future date (e.g., an increase or decrease in the subject’s tumor by some measurable amount), and they may again use a system of the present disclosure at that future date to enter information about how well the treatment performed. This may be where learning takes place.
An example of data being entered and displayed in the system may be shown in
Returning to
The training module 540 may take inputs 503 and 504, as well as the current model representation from the model module 520, to produce an updated model representation. This may complete the “learning loop”, in that the next subject that interacts with the system will receive better predictions from the system due to the updated model from the previous subject’s data.
The results of these polls, along with natural language discussions that may be mined for rationales, may be stored in this tool, allowing results to be communicated to a system of the present disclosure, among other uses.
Returning to
Note that this learning may be done purely based on the opinions of the experts, and not on any actual subject outcomes based on treatments. However, experts often have decades of experience, and may use lateral thinking and reasoning by analogy to predict how previously unused combinations of therapies may work together, even in the absence of hard evidence.
A simple reconfiguration of the system’s components may allow training of the model from data that has already been processed from groups of individual subjects, such as summary statistics from clinical trials data. More concretely, a clinical trial may describe the features of its subject sample, the treatments given to subjects, and the median progression-free survival in cohorts of subjects that received particular treatments.
To perform the training task in this scenario, some embodiments of the present disclosure apply an Approximate Bayesian Computation (ABC) method. In context, the ABC rejection sampling algorithm may be performed as follows.
The second operation may mark the beginning of the Approximate Bayesian Computation (ABC) loop. In this operation, which may be the propose subject sample operation 1012, the parsing module may match any inclusion or exclusion criteria and treatment arm descriptions from the clinical trial data against the feature library 1022 from the model module 1020 (corresponding to module 120 in
Next, the training module 1040 (corresponding to module 140 in
At this point, there may be observed summary statistics 1064 from the clinical trial, and predicted summary statistics 1065 from a synthetic subject population. Both the observed summary statistics and the predicted summary statistics may be fed to the compare statistics operation 1041 within the training module. On each ABC iteration, the training module may read the most recent model representation from the model module. The comparator 1041 may compare the observed and predicted summary statistics, using a pre-specified threshold for how close these quantities need to be in order to be accepted.
If the observed and predicted summary statistics are close enough, then the training module may store the sampled set of prior parameters in the model representation, and the system may successfully exit the training loop. Otherwise, the training module may reject the current parameter sample and another ABC iteration may begin, which includes generating additional synthetic subject samples in the propose subject sample operation 1012, new predicted summary statistics 1065, and the training module comparing statistics again in operation 1041 to check for the ABC loop exit criteria.
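The ABC rejection loop described above may be sketched as follows, using a toy cohort simulator and a reported median progression-free survival as the observed summary statistic; all distributions, priors, and thresholds are illustrative:

```python
import random
import statistics

def abc_rejection(observed_median, prior_sampler, simulate_cohort,
                  threshold=1.0, max_iter=10000, rng=random.Random(3)):
    """ABC rejection sketch: draw candidate parameters from the prior,
    simulate a synthetic subject cohort, and accept the candidate when the
    predicted summary statistic is close enough to the observed one."""
    for _ in range(max_iter):
        theta = prior_sampler(rng)
        predicted_median = statistics.median(simulate_cohort(theta, rng))
        if abs(predicted_median - observed_median) < threshold:
            return theta        # accepted: stored in the model representation
    raise RuntimeError("no acceptable parameters within max_iter")

# Toy trial: cohort PFS ~ LogNormal(theta, 0.3); reported median 10 months.
theta = abc_rejection(
    observed_median=10.0,
    prior_sampler=lambda rng: rng.uniform(1.0, 4.0),
    simulate_cohort=lambda t, rng: [rng.lognormvariate(t, 0.3)
                                    for _ in range(50)],
)
```

Since the median of a LogNormal(θ, σ) distribution is exp(θ), accepted values of θ cluster near ln(10) ≈ 2.3.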
In multilevel modeling, the observed variation in outcomes may be assumed to be split between fixed effects from observed covariates and random effects that vary on the unit (e.g., subject) level. For example, there may be an effect on the survival time of a subject from that subject having taken some treatment (e.g., a fixed effect), and there may be additional effects on the survival time from unobserved genetic mutations. Unmeasured sources of variation, such as these unobserved genetic mutations, may be modeled in the subject-level random effects on subject survival. Such subject-level effects may also vary with measured features, but they may still take on different values across subjects.
In the limit of a large number of small unobserved additive effects, the distribution of random effects on a per-unit basis may tend to follow a normal distribution. Deviations from a normal distribution may thus be indicative that there may be underlying sources of variation in outcomes that are clinically relevant (e.g., that have effects comparable to or larger than other known sources of variation).
By identifying clusters of subject-level random effects terms, it may be possible to classify subpopulations of interest to be examined in more detail to discover better predictors for likelihood of treatment response or survival time.
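A crude one-dimensional sketch of such cluster identification, using a simple gap heuristic on sorted subject-level effects (a real system might use a mixture model or a formal clustering algorithm; the values are illustrative):

```python
def split_clusters(effects, gap=1.0):
    """Sort subject-level random effects and start a new cluster wherever
    consecutive values are separated by more than `gap`. A multimodal
    result suggests subpopulations worth examining for new predictors."""
    ordered = sorted(effects)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > gap:
            clusters.append([value])     # large gap: new subpopulation
        else:
            clusters[-1].append(value)
    return clusters

# A bimodal set of effects suggests two subpopulations of responders.
effects = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
groups = split_clusters(effects)
# two clusters: [[-2.1, -2.0, -1.9], [1.8, 2.0, 2.2]]
```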
The present disclosure provides computer systems that may be programmed to implement methods of the disclosure.
The computer system 1201 can regulate various aspects of analysis, calculation, and generation of the present disclosure, such as, for example, (i) receiving clinical data of a subject and a set of treatment options for a disease or disorder of the subject, (ii) accessing a prediction module comprising a trained machine learning model that determines probabilistic predictions of clinical outcomes of the set of treatment options based at least in part on clinical data of subjects, and (iii) applying the prediction module to clinical data of the subject, treatment features, and/or interaction terms to determine probabilistic predictions of clinical outcomes of the set of treatment options for the disease or disorder of the subject. The computer system 1201 can be an electronic device of a user or a computer system that may be remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.
The computer system 1201 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1205, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1201 also includes memory or memory location 1210 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1215 (e.g., hard disk), communication interface 1220 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1225, such as cache, other memory, data storage and/or electronic display adapters. The memory 1210, storage unit 1215, interface 1220 and peripheral devices 1225 may be in communication with the CPU 1205 through a communication bus (solid lines), such as a motherboard. The storage unit 1215 can be a data storage unit (or data repository) for storing data. The computer system 1201 can be operatively coupled to a computer network (“network”) 1230 with the aid of the communication interface 1220. The network 1230 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that may be in communication with the Internet.
The network 1230 in some cases may be a telecommunication and/or data network. The network 1230 can include one or more computer servers, which can enable distributed computing, such as cloud computing. For example, one or more computer servers may enable cloud computing over the network 1230 (“the cloud”) to perform various aspects of analysis, calculation, and generation of the present disclosure, such as, for example, (i) receiving clinical data of a subject and a set of treatment options for a disease or disorder of the subject, (ii) accessing a prediction module comprising a trained machine learning model that determines probabilistic predictions of clinical outcomes of the set of treatment options based at least in part on clinical data of subjects, and (iii) applying the prediction module to clinical data of the subject, treatment features, and/or interaction terms to determine probabilistic predictions of clinical outcomes of the set of treatment options for the disease or disorder of the subject. Such cloud computing may be provided by cloud computing platforms such as, for example, Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and IBM cloud. The network 1230, in some cases with the aid of the computer system 1201, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1201 to behave as a client or a server.
The CPU 1205 may comprise one or more computer processors and/or one or more graphics processing units (GPUs). The CPU 1205 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1210. The instructions can be directed to the CPU 1205, which can subsequently program or otherwise configure the CPU 1205 to implement methods of the present disclosure. Examples of operations performed by the CPU 1205 can include fetch, decode, execute, and writeback.
The CPU 1205 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1201 can be included in the circuit. In some cases, the circuit may be an application specific integrated circuit (ASIC).
The storage unit 1215 can store files, such as drivers, libraries and saved programs. The storage unit 1215 can store user data, e.g., user preferences and user programs. The computer system 1201 in some cases can include one or more additional data storage units that may be external to the computer system 1201, such as located on a remote server that may be in communication with the computer system 1201 through an intranet or the Internet.
The computer system 1201 can communicate with one or more remote computer systems through the network 1230. For instance, the computer system 1201 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1201 via the network 1230.
Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1201, such as, for example, on the memory 1210 or electronic storage unit 1215. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1205. In some cases, the code can be retrieved from the storage unit 1215 and stored on the memory 1210 for ready access by the processor 1205. In some situations, the electronic storage unit 1215 can be precluded, and machine-executable instructions may be stored on memory 1210.
The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
Aspects of the systems and methods provided herein, such as the computer system 1201, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that may be carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
The computer system 1201 can include or be in communication with an electronic display 1235 that comprises a user interface (UI) 1240 for providing, for example, (i) a visual display indicative of training and testing of a trained algorithm, (ii) a visual display of data indicative of a cancer status of a subject, (iii) a quantitative measure of a cancer status of a subject, (iv) an identification of a subject as having a cancer status, or (v) an electronic report indicative of the cancer status of the subject. Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.
Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 1205. The algorithm can, for example, (i) receive clinical data of a subject and a set of treatment options for a disease or disorder of the subject, (ii) access a prediction module comprising a trained machine learning model that determines probabilistic predictions of clinical outcomes of the set of treatment options based at least in part on clinical data of subjects, and (iii) apply the prediction module to clinical data of the subject, treatment features, and/or interaction terms to determine probabilistic predictions of clinical outcomes of the set of treatment options for the disease or disorder of the subject.
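As an illustrative sketch only, steps (i)-(iii) above might be arranged as follows. The feature names, weights, and logistic form here are hypothetical stand-ins for the disclosure's trained machine learning model, chosen to show how clinical features, treatment features, and an interaction term can be combined into a probabilistic prediction per treatment option:

```python
import math

# Hypothetical stand-in for a trained prediction module: a logistic model
# with made-up weights over clinical features, a treatment indicator, and
# one clinical-by-treatment interaction term.
WEIGHTS = {
    "age": -0.02,
    "biomarker": 1.1,
    "treatment_b": 0.4,
    "age_x_treatment_b": -0.01,  # interaction term
}
BIAS = 0.3

def predict_outcome_probability(clinical, treatment):
    """Step (iii): apply the prediction module to clinical data,
    treatment features, and an interaction term."""
    z = BIAS
    z += WEIGHTS["age"] * clinical["age"]
    z += WEIGHTS["biomarker"] * clinical["biomarker"]
    is_b = 1.0 if treatment == "treatment_b" else 0.0
    z += WEIGHTS["treatment_b"] * is_b
    z += WEIGHTS["age_x_treatment_b"] * clinical["age"] * is_b
    return 1.0 / (1.0 + math.exp(-z))  # logistic link yields a probability

def rank_treatments(clinical, options):
    """Steps (i)-(ii): receive the subject's clinical data and the set of
    treatment options, then score each option with the prediction module."""
    preds = {opt: predict_outcome_probability(clinical, opt) for opt in options}
    return dict(sorted(preds.items(), key=lambda kv: -kv[1]))

subject = {"age": 62, "biomarker": 0.8}
print(rank_treatments(subject, ["treatment_a", "treatment_b"]))
```

In an actual system the stand-in weights would be replaced by a model trained on clinical data of subjects, but the control flow of receiving data, applying the model per option, and returning per-option probabilities would be similar.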
A system of the present disclosure may also perform a method for automated identification of anomalous subject subpopulations.
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
This application is a continuation of International Application No. PCT/US2021/035759, filed Jun. 3, 2021, which claims the benefit of U.S. Provisional Pat. Application No. 63/034,578, filed Jun. 4, 2020, and U.S. Provisional Pat. Application No. 63/094,478, filed Oct. 21, 2020, each of which is incorporated by reference herein in its entirety.
| Number | Date | Country |
| --- | --- | --- |
| 63034578 | Jun 2020 | US |
| 63094478 | Oct 2020 | US |
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/US2021/035759 | Jun 2021 | WO |
| Child | 18074659 | | US |