This invention relates generally to the field of disease detection and treatment identification, and in particular to the use of artificial intelligence to identify prostate cancer patients at high risk of progression within the clinically intermediate-risk group.
Prostate cancer (PCa) is the second most diagnosed cancer among US men. About 70% of PCa patients are either cured after the first line of intervention (radical prostatectomy and/or ionizing radiation therapy) or their disease remains organ-confined for life. Thus, the treatment decision for PCa relies on a patient's likelihood of disease progression. Patients whose prostate has been debulked with the first line of therapy are generally followed with a simple blood test that measures prostate-specific antigen (PSA) levels. About 30% of patients with rising blood PSA after the first line of treatment, defined as biochemical relapse (BCR), are considered to be at high risk of progressing to metastatic PCa and need aggressive treatment.
Generally, methods such as Gleason grading (GG), tumor volume measurement, and clinical staging (TNM) are used in the clinic to prognosticate the risk of disease progression. About 60% of patients who do not exhibit explicit high- or low-risk characteristics when assessed using these methods are considered intermediate-risk and do not have a clearly defined treatment plan. Roughly half of the patients within this group return to the clinic with disease progression after their first line of therapy. While these intermediate-risk patients are further classified into Gleason Grade Groups 2 and 3 (GG 3+4 and 4+3, respectively), studies indicate that a large number of patients with progressing PCa can be found in both groups.
The application of more aggressive therapies at earlier stages of PCa for patients at high risk of progression has improved patient outcomes. For example, the PROSPER, SPARTAN, and ARAMIS clinical trials added enzalutamide, apalutamide, or darolutamide, respectively, to standard androgen deprivation therapy (ADT) in both metastatic and non-metastatic castrate-resistant PCa (CRPC). All of the trials showed improvements in multiple clinical endpoints such as overall survival, time to metastasis, and time to biochemical recurrence. Thus, it is important to clearly and efficiently identify intermediate-risk patients at high risk of progression at an earlier stage of PCa, so that they can benefit from timely intervention with adjuvant therapies after the first line of therapy and improve overall disease outcome.
In general, the best indicators of risk of BCR are GG and TNM staging. More recently, however, some clinicians have adopted more advanced methods such as genomic tests. Based on the results of whole-tissue genomic analysis, a few sets of genomic markers included in the scoring systems of DECIPHER, POLARIS, Oncotype GPS, etc. have been identified for PCa prognosis and have recently started entering clinical practice. Most of these genome-based scores, however, have only marginally improved the prognostic accuracy of GG thus far. Their diagnostic accuracy (below 70% for intermediate-risk patients), requirement for a relatively large volume of tissue often unachievable in prostate biopsies, high pricing, and long turnaround time pose significant hurdles to clinical application.
In recent years, computer vision-based deep learning has been shown to recognize objects and diagnose diseases from histopathology whole slide images (WSIs) with impressive accuracy. Prior in silico studies that we and others have performed have shown deep learning models performing on par with human experts on diagnostic tasks such as tumor detection and grading. The models deliver accurate detection and quantification of known histological patterns of established clinical significance and reduce inter-observer and intra-observer variability among general and subspecialty expert pathologists. In parallel, methods focusing on directly learning morphological features associated with clinical outcome have been developed. Successes in this field include predicting overall survival in colorectal cancer from histology slides and predicting immune response to lung cancer treatment. Our prior work has shown the ability to predict response to neoadjuvant chemotherapy in triple-negative breast cancer.
Similarly, in the area of PCa, various AI-enabled digital pathology-based methods have been published, citing potential to improve patient risk stratification by improving GG using automated quantification or, in some cases, through novel biomarker discovery. Methods that take the route of quantifying GG, however, have some weaknesses. Distinguishing within GG 7 (3+4 versus 4+3) has vexed many algorithms. One study merged these two groups into a single category for prediction, defeating the purpose of the current GGG stratification. Another study was able to predict GG adequately overall but also struggled to distinguish 3+4 from 4+3. Both methods are limited in predicting disease progression by how well they can quantify morphologies based on GG. These limitations of GG indicate that there are further markers to be discovered within the prostate tumor landscape for better patient stratification to guide successful clinical intervention.
Very recently, an AI-based retrospective analysis of prostate tissue images from a large RTOG trial identified intermediate-risk patients who would benefit from adding short-term androgen deprivation therapy (ADT) to ionizing radiation [19, 20]. No such stratification has been available to inform intermediate-risk patients about the benefits of radical prostatectomy (RP) and subsequent ADT and/or anti-androgen therapy. This method showed the power of combining self-supervised learning, which learns high-dimensional features from WSIs, with standard clinical features to provide a risk of relapse at 5 years with an AUC of 0.67. Another study extracted gland features from PCa WSIs, independent of GG patterns, in order to rank patients by risk of BCR. That method produced a concordance index of 0.68 on a test set. The model, however, was limited to hand-engineered morphological features. While these features are quantitative, they are still limited to human perception and thus suffer from subjectivity.
Consequently, there is a need for computer-aided systems and methods that can identify prostate cancer patients at high risk of progression within the clinically intermediate-risk group.
The systems and methods described herein provide an AI-powered platform that objectively extracts features at visual and, more importantly, sub-visual levels and accurately identifies all GG patterns, including distinguishing 3+4 from 4+3, with a weighted kappa score of 98%. To bridge this advanced AI-powered method with clinically relevant biological significance, we performed genomic analysis in areas of tissue indicated as especially high risk and determined a potentially new STING pathway-related PCa biomarker, TMEM173. Through a combination of semi-supervised learning and accurate morphological quantification, we developed a system that relies only on digitized H&E WSIs to predict risk of BCR within 36 months and metastatic disease more accurately than GG, TNM, standard nomograms, and genomic tests. The system delivers a risk score using a cost-efficient, secure, and quick algorithm that preserves the WSIs analyzed, overcoming many hurdles of genomic analysis such as paucity of material, shipping of tissue samples, and others listed above.
These drawings and the associated description herein are provided to illustrate specific embodiments of the invention and are not intended to be limiting.
The following detailed description of certain embodiments presents various descriptions of specific embodiments of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims.
Unless defined otherwise, all terms used herein have the same meaning as is commonly understood by one of skill in the art to which this invention belongs. All patents, patent applications and publications referred to throughout the disclosure herein are incorporated by reference in their entirety. In the event that there is a plurality of definitions for a term herein, those in this section prevail.
When the terms “one”, “a” or “an” are used in the disclosure, they mean “at least one” or “one or more”, unless otherwise indicated.
In some embodiments, the systems and methods described herein identify novel and previously unknown morphological features on digitized images of H&E-stained slides that drive tumor progression. The system converts image patches into mathematical vector representations to generate hundreds of clusters of morphologically similar patterns using state-of-the-art deep convolutional neural network (CNN)-based models. It then ranks these image clusters to identify novel Regions of Interest (ROIs) on the H&E-stained slides that have high or low prognostic/predictive value, correlated with patient disease outcome information, to generate a morphometric score for therapy response prediction. These ROIs capture both cancer and TME landscapes, including the spatial distribution of immune and stromal cells relative to the cancer zones, which play major roles in tumor progression and resistance to therapy.
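The clustering step described above can be sketched as follows. This is a minimal illustration, assuming patch embedding vectors have already been extracted by a CNN; the embedding dimension, cluster count, and plain k-means procedure are illustrative stand-ins, not the specific models used by the system.

```python
import numpy as np

def kmeans_cluster(embeddings, k, iters=50, seed=0):
    """Group patch embedding vectors into k clusters of morphologically
    similar patterns (plain k-means sketch, illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialize cluster centers from randomly chosen patch vectors.
    centers = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # Assign each patch vector to its nearest cluster center.
        d = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = embeddings[labels == j].mean(axis=0)
    return labels, centers

# Illustrative data: 200 patch vectors of dimension 64 standing in for CNN features.
rng = np.random.default_rng(1)
patch_vectors = rng.normal(size=(200, 64))
labels, centers = kmeans_cluster(patch_vectors, k=8)
```

Each cluster of patches would then be ranked against patient outcome data, as described below, to locate the candidate ROIs.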
In the context of disease treatment, current medical practice and standard-of-care (SOC) might treat patients based on stages of a disease and the patient's responsiveness to available treatments for those stages. These treatments, in the case of cancer, may include administering drugs, radiation therapy, surgery, or other forms of treatment. If the patient does not respond to the treatments available for the patient's current stage of disease, the patient transitions to a different stage, where different, and potentially more aggressive, treatment options may be applied. Later-stage treatment options may include experimental or advanced therapy options.
The dynamics of disease progression and applied treatment, in some cases, are as follows. Most patients in earlier stages respond well to treatment, but a smaller percentage of patients in earlier stages do not respond well to the treatment options applied in those stages. For example, the first group of patients 14 that respond well to treatment options 12 can be substantially larger than the second patient group 16 that does not respond well to the treatment options 12. Furthermore, the treatment options available in the later stages can have more efficacy if they are applied in earlier stages. In other words, for the second patient group 16, if treatment options 20 were applied when those patients were in an earlier disease stage 10, the treatment options 20 may have had more efficacy.
Furthermore, later-stage treatment options 20 can include potentially more aggressive treatments or experimental advanced therapy options. In some cases, the late-stage treatment options 20 can be experimental in nature and can include treatment options for which governmental approval may not yet have been obtained. Nonetheless, the second group of patients 16 may substantially benefit from those treatments if they were applied in an earlier disease stage 10. Consequently, in terms of disease treatment efficacy and treatment discovery, systems and methods that help early identification of patients who respond well to available treatment options are beneficial and needed.
Also, pharmaceutical companies are running several thousand clinical trials to bring advanced and novel drugs to market across several cancer indications. The lack of reliable predictive biomarkers to identify responders versus non-responders to these drugs results in random selection of patients for the trials and contributes to the low success rate. Even when some of these trials succeed, only a small percentage of patients respond to the drugs when administered in clinical practice. Consequently, there is a huge unmet need for pharmaceutical companies to identify responders to a new drug early on.
Biochemical signatures, biomarkers, etc. can be used to predict patient outcome. Some genomics and proteomics techniques for discovering biomarkers that predict patient response are focused on biochemical markers and the structured molecular data of those markers, such as DNA, RNA, and protein data. This approach has major challenges. First, molecular analysis is done on DNA, RNA, and protein extracted from whole tissue, which delivers an average molecular signature across tens of thousands of cancer, benign, and micro-environmental (e.g., stromal, immune) cells. Consequently, this approach works better when a single gene or a few genes are heavily overexpressed or under-expressed across an entire tissue. However, tumors can be inherently heterogeneous, and several molecular subtypes with varying levels of aggressiveness can show up in the same tumor and tumor micro-environment. This molecular signal gets lost when averaged over an entire tissue.
Second, biochemical, molecular, structural, or other analysis of the tumor alone does not present a full picture of the disease. In many cases, it is the spatial interaction of the tumor with the tumor micro-environment (TME), including the stroma, several types of immune cells, blood vessels, etc., whose interplay determines patient response. Many current genomic analyses are not able to capture the TME dynamics, nor is there one single RNA or protein that can be linked to driving patient response. Nevertheless, histopathology remains the cornerstone of cancer diagnosis. Many molecular changes and TME elements that are linked to disease can result in morphological changes that are visible on tissue slides. Consequently, systems and methods that can identify and extract morphometric features that correlate with patient outcome from histopathology slides are valuable in disease treatment. It is therefore advantageous to employ artificial intelligence (AI) in an unsupervised manner to identify and extract these morphologic features.
Furthermore, the study of biomarkers and the identification of morphologic features for drug and treatment discovery can be slowed down by the sheer number of samples and volume of patient data that need to be analyzed to identify biomarkers of interest. For example, some methods rely on or work in conjunction with laboratory test results. The described embodiments substantially reduce the volume of data that needs to be analyzed in a laboratory environment, making their application more practical than existing systems. For example, the described embodiments can identify regions of interest (ROIs) on tissue slides that are more predictive and more promising or relevant for performing laboratory molecular analysis to identify predictive biomarkers of patient response. The identification of aberrant genes/proteins present in the ROI known to be involved in therapy response may also enable easier detection of disease or therapy response biomarkers, unlike techniques that operate on the whole tissue slide, where the abnormality can be masked by the large preponderance of cells with normal proteogenomic patterns.
Current methods of cancer diagnosis, and in some cases cancer prognosis, using histopathology involve trained pathologists examining sample slides from a patient. The pathologists examine patient cells and look for patterns and other markers identified in one or more SOC trade guidelines, such as guidelines published by the National Comprehensive Cancer Network (NCCN). Pathologists identify the types of cells they are observing in the sample, determine whether a patient sample contains benign or malignant tumor cells, and in some cases assign a grade to the detected cancer cells. The SOC guidelines are typically generated by researchers and health care professionals who, through years of experience observing patient samples, have accumulated a knowledge base of correlations between features observed in patient tissue and the outcomes of past patients exhibiting those features or combinations of features. In this paradigm, the identification of biomarkers is limited to the guidelines and past experiences of the healthcare professionals. The process of updating the guidelines and the way pathologists scan, examine, and identify biomarkers is therefore a dynamic and at the same time slow process.
In other words, the current methodologies of biomarker identification can include matching features from a sample space against a limited-scope database of known biomarkers. The described embodiments, on the other hand, can utilize unsupervised artificial intelligence architectures to scan tissue sample image data at a much faster speed and also identify biomarkers predictive of patient outcome that have never been previously identified.
Another challenge with traditional methods of identifying biomarkers and drug targets is that diseases such as cancer can be highly heterogeneous and evolve over time. One tumor may include many different molecular subtypes, some of which may be biomarkers predictive of patient response. Many techniques look at a small subset of potential molecular subtypes by analyzing a whole tissue slide from a patient. That approach has identified some useful biomarkers, but a wealth of data and information in each patient slide remains unexamined. As a result, many patients still get baseline treatments, even though they may be good candidates for a different treatment option. Not knowing the relevant biomarkers, the success rate of many treatments is lower than it could be, because a large patient population is treated with the same treatment options without regard to the anticipated response. What is worse, in the absence of a better alternative, low-success-rate treatment options become SOC. Systems and methods that can identify biomarkers predictive of patient response will help identify patients who are good candidates for a specific treatment option and deliver targeted and personalized therapies to an individual patient.
Furthermore, patient outcome and responsiveness can be a multimodal problem, where the tumor alone or normal disease pathways and mechanisms may not be the only relevant factors. For example, the tumor micro-environment (TME) can play a significant role in patient responsiveness. A drug can be correctly designed based on a disease or tumor, but it might not reach the correct target in the patient if the drug is not designed with the TME of the tumor cells in mind. As an example, a drug might be correctly designed to activate the immune system, but in some patients the tumor might have few infiltrated immune cells, or might have immune suppressor cells nullifying the drug effect. The described embodiments use the TME of a cell, including stroma, immune cells, blood vessels, etc., as well as the tumor cells, when identifying biomarkers, thus enabling the selection of treatment options with higher success potential.
In one sense, traditional methods of biomarker identification have relied on molecular biologists and pathologists as the initial actors who identified the biomarkers. The results of human-driven identification of biomarkers predictive of patient response were then verified using bioinformatics and statistical analysis. As discussed earlier, the human-driven method of biomarker identification is necessarily limited by the size and number of patient samples that can be analyzed in laboratory settings, and by the patterns and structures previously identified in research and trade guidelines. The disclosed embodiments, on the other hand, analyze tissue samples in a patient or patient population and identify biomarkers predictive of patient response that may not have been previously known. The results, including the newly identified biomarkers, can then be confirmed by pathologists or biologists in a laboratory setting.
For pharma and oncologists, each stage and sub-type of disease and each potential drug is a unique challenge for biomarker discovery and drug development. The disclosed computer-aided systems and methods that correlate disease outcome with tissue morphology are agnostic to the type of cancer and its treatment. The systems and methods rank morphological features based on known patient outcomes for a particular drug used to treat a specific disease but do not depend on the drug mechanism itself. They can therefore be applied to any disease that changes morphology, such as cancer, and to any treatment of interest.
In another embodiment, the method can also be used to identify morphometric features on patient tissues that correlate with a particular molecular change, such as protein loss or gene mutations, without the need for a molecular test such as immunohistochemistry (IHC) or gene panel testing. It can rank the morphological features in an unsupervised manner using only the molecular status as a label and determine the lead morphological features that can be related to the molecular change.
Next, a disease detection and grading module (DDGM) 208 transforms the image patches into a vector representation. The DDGM 208 can receive labels 210 for a given disease and, using supervised artificial intelligence techniques, determine which label applies to a given patch and augment a vector representation of the patches with the applicable labels. These labeled vectors are input to a sub-morphology detector 212, which can use unsupervised learning to determine further morphological sub-patterns within the labels 210. The sub-morphology detector 212 can output structured vectors that include vector representations of image patches labeled with labels 210 and the morphological sub-patterns determined by unsupervised learning. The structured vectors output from the sub-morphology detector 212 are input to a region of interest (ROI) and outcome prediction module 214, which can rank the patches in terms of patient disease outcome based on whether the morphological sub-patterns to which a patch belongs occur in patients with adverse outcomes or in patients with good outcomes or treatment response. The ranking includes assigning the patches a patch-level score. The patch-level scores can be combined to arrive at a patient-level score indicative of a prediction of a patient's response to a treatment. In one embodiment, a patch-level score can be a number between 0 and 1, where a high score (approximately 1) reflects that the morphological sub-pattern detected for a patch shows up only in patients with an adverse outcome, while a low score (approximately 0) reflects that the morphological sub-pattern shows up only in patients with a good outcome.
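The patch-level scoring rule described above (approximately 1 when a sub-pattern appears only in adverse-outcome patients, approximately 0 when it appears only in good-outcome patients) can be illustrated with a simple frequency-based sketch. The patient identifiers and cohort below are hypothetical, and the direct frequency calculation is an illustrative stand-in for the trained ranking model of module 214.

```python
from collections import defaultdict

def cluster_scores(patient_clusters, adverse):
    """Score each morphological sub-pattern cluster between 0 and 1:
    ~1 if it appears only in adverse-outcome patients, ~0 if only in
    good-outcome patients (illustrative frequency-based rule)."""
    seen = defaultdict(set)  # cluster id -> set of patients exhibiting it
    for patient, clusters in patient_clusters.items():
        for c in clusters:
            seen[c].add(patient)
    return {c: sum(p in adverse for p in pts) / len(pts)
            for c, pts in seen.items()}

# Hypothetical cohort: which sub-pattern clusters each patient's patches fall in.
patient_clusters = {
    "pt1": {0, 1}, "pt2": {0, 2}, "pt3": {1, 2}, "pt4": {2},
}
adverse = {"pt1", "pt2"}  # patients with adverse outcome
scores = cluster_scores(patient_clusters, adverse)
# Cluster 0 occurs only in adverse-outcome patients -> score 1.0
```

A patch would then inherit the score of the sub-pattern cluster it belongs to.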
Patch-level scores can also be used to determine regions of interest on a patient's tissue for which further focused laboratory, biochemical, or biomarker identification analysis yields information for predicting patient response and/or fine-tuning the artificial intelligence models within the system 200 and/or the ROI and outcome prediction module 214. The regions of interest can capture data on various patient tissues, such as tumor, immune, and stromal cells, and as a result the ROIs can capture both tumor heterogeneity and tumor micro-environment (TME) elements that are prognostic or predictive of the patient outcome.
The ROIs can be input to a spatial profiling and biomarker identification (SPBI) module 216, where molecular analysis is performed. The molecular analysis is performed on the ROIs to capture differential expression of proteins/RNA in regions marked as ROI versus regions not marked as ROI. The correlation is done between ROI and non-ROI regions of patients with adverse outcomes, as well as between patients with different outcomes, to identify the protein/RNA markers that are driving the patient outcome.
In some embodiments, an IHC or immunofluorescence (IF) module 218 can be used. IHC/IF biomarker slides can be generated for the protein markers identified by the SPBI module 216 to capture their spatial distribution in the TME. These biomarker slides can be co-registered with the H&E slides (or other types of input images, if used) to determine patch-level biomarker quantification and distribution, as well as other prognostic or predictive data. The combination of biomarker expression (quantity) with morphology data (the morphological sub-pattern indication identified by the sub-morphology detector 212) can be used to further improve the accuracy of patient outcome prediction by various modules of the system 200, including the ROI and outcome prediction module 214.
In one aspect, the system 200 reduces the complexity of data present in a patient image slide to a data structure. Images 202 and 204 can be WSIs or any digitized version of patient tissue, bone, or other anatomical regions, including but not limited to biopsied tissue, resected samples, circulating blood cells, etc. The images can be divided into a range of 50 to 100,000 patches in some embodiments. These patches can contain different expressions of tumors, benign cells, and the microenvironment of the cells. In one sense, the input to the system 200 can include a vast and complex dataset of images containing millions of cells and thousands of patches. The system 200 analyzes this complex dataset and transforms it into structured and usable data for disease and patient response prediction, in one aspect, by determining morphological similarity and determining a more limited dataset of morphological sub-patterns within which these millions of cells and thousands of patches may be classified.
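The division of an image into patches can be sketched as follows, assuming the digitized slide is available as an RGB array. The patch size, background threshold, and minimum-tissue fraction below are illustrative assumptions, not parameters disclosed for the system 200.

```python
import numpy as np

def tile_image(img, patch=256, white_thresh=240, min_tissue=0.1):
    """Split an H&E image array of shape (H, W, 3) into non-overlapping
    patches, keeping only those with enough non-background (non-white)
    pixels (illustrative tiling and tissue filter)."""
    h, w, _ = img.shape
    patches, coords = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = img[y:y + patch, x:x + patch]
            # Fraction of pixels darker than the white slide background.
            tissue = (tile.mean(axis=2) < white_thresh).mean()
            if tissue >= min_tissue:
                patches.append(tile)
                coords.append((y, x))
    return patches, coords

# Illustrative image: white background with one dark "tissue" block.
img = np.full((512, 512, 3), 255, dtype=np.uint8)
img[0:256, 0:256] = 120
patches, coords = tile_image(img)  # only the tissue-bearing tile survives
```

Discarding background tiles up front keeps the downstream vectorization and clustering focused on tissue-bearing regions.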
The system 200 performs vectorization on the input image patches and captures morphological similarity between those patches by performing vector operations on the resulting vectors. In one respect, the tissue space observed in image slides for a given disease (e.g., a tumor type) can be broken down into distinct categories of morphological sub-patterns within a broader morphological pattern labeled by labels 210. For example, one label 210 might be cells that have morphological patterns of benign cells, while there may be 50 morphological sub-patterns of those benign cells which can further classify them with more granularity and precision. In routine clinical practice, those morphological sub-patterns may be unknown or left unlabeled to increase the reading efficiency of human pathologists. Nonetheless, those sub-patterns can contain valuable and more targeted information to treat disease or predict patient outcome. The system 200 can determine these sub-patterns within a given label and reduce the complexity of the data.
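The vector operations that capture morphological similarity between patches can be illustrated with a cosine-similarity sketch; the specific similarity measure used by the system is an assumption here, and the vectors are toy stand-ins for patch representations.

```python
import numpy as np

def cosine_similarity(u, v):
    """Morphological similarity between two patch vectors as the cosine
    of the angle between them (1.0 = same direction, 0.0 = unrelated)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 0.0, 2.0])
b = np.array([2.0, 0.0, 4.0])  # same direction as a -> similarity 1.0
c = np.array([0.0, 3.0, 0.0])  # orthogonal to a -> similarity 0.0
```

Patches whose vectors score high against one another under such a measure would fall into the same morphological sub-pattern.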
The input images 202 may be provided without any associated patient outcome, for the purpose of training the artificial intelligence networks of the system 200 to identify various morphological sub-patterns. On the other hand, input images 204 may include patient outcome data, so the system 200 can additionally identify whether the detected morphological sub-patterns occur in patients with good or adverse outcomes. In the case of cancer, patients who respond well to a treatment express the cancer on their tissues in morphologically different ways than patients who do not respond well to the treatment. Identification of morphological sub-patterns that occur only in patients with good outcomes, versus those that occur only in patients with poor outcomes, can act as a marker or signature of the category to which a patient might be predicted to belong. Consequently, the detected morphological sub-patterns identify signatures or signals indicative of patient outcome or response, which can be used to predict patient outcome or response at an earlier stage of a disease.
In another aspect, the system 200 reduces the complexity of the input data and the overall problem of identifying morphological markers predictive of patient response. For example, at the stage of dividing the input images into patches, hundreds to thousands of patches might exist per patient, and thousands of patients might be participating in a treatment program or clinical trial. Each image is on the order of gigabytes of data. The system 200 reduces the complexity of the input data and the biomarker/morphological identification to, for example, tens of thousands of morphological sub-patterns, where every patient's data can be expressed in terms of structured vectors including identification of detected morphological sub-patterns. Vectors can be analyzed between patients with good outcomes and patients with poor outcomes. Patches with high predictive value and low predictive value can be identified. For example, patches belonging to morphological sub-patterns occurring only in patients with adverse outcomes (e.g., failing treatment) can be given a high score. An artificial intelligence model can be trained to rank patches and assign them scores based on known patient outcomes. The model can learn which patches occur only in patients who fail treatment, which patches occur only in patients who respond well to treatment, and which patches occur in both groups (and are therefore of low predictive value). Accordingly, the model can assign a score to each patch and its corresponding vector.
In some embodiments, the disease detection and grading module 208 can be exposed to patient data in two ways. Patient images 202 do not include patient outcome data. Patient outcome data can be difficult or time consuming to obtain; in some cases, it becomes available only after following up with a patient 3-5 years after a treatment option is administered. Nonetheless, the artificial intelligence models of the disease detection and grading module 208 can be exposed to input patient image data, without any known patient outcome, for the purpose of training the models to better identify morphological similarity and morphological patterns in patient image data. On the other hand, the disease detection and grading module 208 and the artificial intelligence models therein can be exposed to images 204, where the patient outcome is known, so the models of the system 200 can associate the detected morphological patterns and sub-patterns with a patient outcome and learn how to rank a detected morphological pattern in terms of patient outcome.
In other words, ranking of patches is specific to a particular task, which is specific to a particular clinical question, while detecting morphological similarity (detecting patterns and sub-patterns) can be universal, because a disease (such as cancer) shows up in patient tissue in so many different ways, regardless of the treatment given. Consequently, the models of disease detection and grading module 208 can be improved by exposing them to more patient images, regardless of patient outcome.
In one aspect, the supervised learning models of the system 200 are trained to identify morphological patterns associated with the labels 210, and the unsupervised learning models of the system 200 can identify morphological sub-patterns within the tissue samples labeled by the supervised learning models. Consequently, the system 200 can identify morphological similarity (morphological patterns and sub-patterns) through unsupervised networks across a population of patients for a given disease. Input images 204 that include patient outcome data can be processed to rank each patch within those images with a score indicative of patient outcome. The patch-level scores can be combined to yield a patient-level score for each patient. The patient-level score indicates a prediction of the patient's response to a given treatment.
Additionally, patch-level scores yield ROIs that are candidates for further analysis, both for training the models of the system 200 and for predicting patient outcome or response. Instead of performing molecular analysis of a whole patient image slide, only the regions of an image that contain morphological patterns and sub-patterns occurring in only one group of patients are analyzed in more detail to accurately yield biomarkers predictive of patient outcome. At the same time, the ROIs identified by the models of the system 200 include the environment and context of the cell tissues, allowing for an improved analysis of those regions that takes into account the heterogeneous nature of cancer. The same tumor cells can look and behave differently across different regions of the tissue, because cancers and diseases do not mutate or evolve uniformly across different sites. Similarly, the microenvironments of the cells can also look different depending on where in the tissue they are from. Different types of immune cells, blood vessels and stromal cells may appear differently in the images 202 and 204 depending on the region of the tissue they are from. Consequently, the ranking and scoring of image patches in the system 200 can be at least partially based on the environment and the microenvironment of the cells.
As discussed earlier, the system 200 makes it possible to analyze regions of interest in a tissue, as opposed to performing a whole-slide analysis of the tissue. In other words, the system 200 narrows the field of view down to regions that have predictive value. Molecular analysis of those regions can identify the biological mechanisms and pathways that are driving a detected phenotype (morphological pattern or sub-pattern). In other words, the visible regions on a patient image slide (such as an H&E image slide) are a manifestation of a tumor, but there are biochemical changes within the cells captured by the image slide that constitute the basis for the manifestation, or for the morphological patterns or sub-patterns, that have appeared on the slide. Identification of those biological processes and pathways through molecular analysis allows for accurate identification of predictive biomarkers, as well as for developing drugs and treatment options that target those pathways and/or explain why patients respond or do not respond to a given treatment. Many different techniques can be used for further detailed analysis of the ROIs, including IHC imaging, IF imaging, genetic profiling and other techniques. These techniques can identify tumor cells, immune cells or other cellular and subcellular components and molecules in the ROIs, as well as mutations or evolutions in those cells. Such identifications can be used for further refinement of the predictive models of the system 200 and/or for a better understanding of the disease or of patient outcome or response to a given treatment. IHC or IF images can be co-registered on the image slide to obtain a quantification of the expression of the biomarkers of interest. The quantification can be used for refinement of the predictive models of the system 200 and/or for a better understanding of the disease or patient outcome.
Pathologists can annotate a set or subset of training images with labels 210. The labels 210 can be high-level annotations of morphological patterns that may occur in images 202 and 204. Example labels 210 can include, but are not limited to, benign, cancer precursor, low-, medium- or high-grade cancer, immune cells, stromal cells, etc. The labels 210 can be based on morphological patterns known to pathologists and used for cancer detection and grading. A vectorization module 302 can include one or more supervised artificial intelligence networks, including for example, neural networks, deep neural networks, convolutional neural networks (CNNs), and other artificial intelligence networks. The vectorization module 302 accepts image patches as input and, using the labels 210, classifies the patches into the categories identified by the labels 210. In the example shown, image patches are categorized among five labels L1-L5. As described earlier, in some embodiments, the labels 210 can include a high-level disease identification and a high-level grading of the detected disease. The DDGM 208 outputs a labeled vector for each input image patch that places the patch in a category identified by the labels 210.
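The final classification step can be sketched as follows. This is a minimal, hypothetical illustration (not the actual implementation of the vectorization module 302), assuming the network's last layer emits one logit per label and using the label names L1-L5 from the example above:

```python
import math

LABELS = ["L1", "L2", "L3", "L4", "L5"]  # illustrative label names

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_patch(logits):
    """Map a patch's final-layer logits to a label and a probability vector."""
    probs = softmax(logits)
    label = LABELS[probs.index(max(probs))]
    return label, probs

label, probs = classify_patch([0.2, 2.1, 0.4, -1.0, 0.3])
# The highest logit (index 1) wins, so the patch is labeled "L2".
```

The probability vector returned alongside the label corresponds loosely to the "labeled vector" the DDGM 208 outputs per patch.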
The labels 210 may be high-level indications of morphological patterns. For example, there may be hundreds of morphological shapes and sub-patterns in which benign cells can appear on an image slide. Labels 210 may be kept at a high level of abstraction because labeling all the morphological sub-types that can appear in an image segment can be impractical, burdensome or difficult. Consequently, in regular clinical practice, a pathologist might label and rely on broad labeled categories. The DDGM 208 in combination with the sub-morphology detector 212 can further classify image patches based on the sub-types and sub-morphological categories to which they might belong.
Initially, the sub-patterns are broadly labeled using labels 210 (e.g., benign, cancerous, low, medium or high-grade cancer, immune, stroma) through the supervised learning processes of the DDGM 208. As will be described, the sub-morphology detector 212 uses an unsupervised learning method to extract sub-patterns that may be present in each label 210. In one embodiment, the labels 210 are based on the level of granularity that a pathologist might use to label images or image segments in her regular clinical practice. The artificial intelligence models of the vectorization module 302 can be trained to detect these labels in a set of input patches. In other words, the models of the vectorization module 302 learn morphological patterns and features corresponding to each label 210 and can distinguish and categorize the image patches based on those morphological features and patterns. In some embodiments, the last layer of prediction of the models of the vectorization module 302 can be used to extract a morphological vector corresponding to a patch, which can be used to represent the input data with less complexity, while retaining data relevant to patient outcome or response.
In one aspect, the input image data is heterogeneous and complex. When a pathologist examines an image segment (e.g., an image or image segment of a gland), they look at the nuclei, the gland, and the environment around the gland to make a judgment of whether it is cancerous or not. There may be some cells inside the gland that look cancerous, but the pathologist will still grade the whole gland as benign if they see some other types of cells which they know only show up in benign glands. The models of the system 200 and the DDGM 208 perform a similar function. The vectorization module 302 can include auxiliary labels 308 (e.g., nuclei, cytoplasm, gland, neighborhood, etc.). Image patches are also labeled according to the auxiliary labels 308. In one embodiment, the models of the vectorization module 302 further classify the image patches having auxiliary labels 308 at various sizes and resolutions. For example, the models of the vectorization module 302 can determine whether a patch has an auxiliary label 308 of a nucleus. The patch is then processed through the vectorization module 302 at different resolutions, where it is determined whether a label 210 is applicable to the patch when the patch is viewed at different sizes and resolutions. For example, the vectorization module 302 can determine whether the nucleus is cancerous or benign, whether, if the nucleus is in a gland, the gland is cancerous or not, and whether other cells within the neighborhood are cancerous or not. Based on the output of the processing of a patch at multiple resolutions and sizes, the vectorization module 302 can apply or modify a label 210 of the patch accordingly. This is similar to a process that a human pathologist might employ to apply a label 210 (e.g., cancer or not cancer) to an image patch.
For example, in clinical practice, if a gland (viewed at high resolution) looks benign, but everything around it (viewed at lower resolution) is cancerous, a pathologist is more likely to conclude that the gland is also cancerous.
In other words, the models of the system 200, including the models of the vectorization module 302, can operate on image data in the same way a pathologist might operate on the data (e.g., by labeling the image patches at various resolutions). Consequently, the models learn relevant and effective features and learn to ignore artifacts. In some embodiments, image patches to which an auxiliary label 308 has been applied can be processed in the vectorization module 302 at a plurality of resolutions. As an example, an image patch can be viewed by the model and assigned a label at three levels of resolution. Fewer or more levels of resolution are also possible.
Furthermore, the input image data can be highly imbalanced in terms of the features that are relevant to patient outcome or patient response. In the case of cancer, a high-grade cancer may show up in less than 0.01 percent of the tissue. For example, 20-50 cells out of millions of cells on an image slide may be high-grade cancer cells, yet those 20-50 high-grade cancer cells can change the treatment decision for the patient. A data balancing module 304 can balance the training data, so the models of the vectorization module 302 can give appropriate weight to morphological features that are highly relevant but may not occur at high frequency or in high quantity in the image slide. In some embodiments, the data balancing module 304 can use clustering to balance the input training data for the models of the vectorization module 302. The data balancing module 304 can cluster the input data based on morphology through an iterative process. The models of the vectorization module 302 are first trained using the baseline input image data, yielding a first level of accuracy. The output vectors are then used to cluster the input image data before it is passed through the models of the vectorization module 302 again. In subsequent passes, input training data is fed through the models of the vectorization module 302 in a manner that samples input data uniformly from each cluster across all pattern subtypes, regardless of how frequently those subtypes show up in the tissue.
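The cluster-uniform sampling described above can be sketched as follows; the function name and the round-robin strategy are illustrative assumptions, not the system's actual implementation:

```python
import random
from collections import defaultdict

def cluster_uniform_sample(patches, cluster_ids, n_samples, seed=0):
    """Sample patches so each morphological cluster is equally represented,
    regardless of how frequently its pattern occurs in the tissue."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for patch, cid in zip(patches, cluster_ids):
        by_cluster[cid].append(patch)
    clusters = list(by_cluster.values())
    # Draw round-robin from the clusters, resampling within small clusters
    # so rare sub-patterns appear as often as common ones.
    return [rng.choice(clusters[i % len(clusters)]) for i in range(n_samples)]

patches = list(range(10))
cluster_ids = [0] * 9 + [1]   # one rare sub-pattern: patch 9
batch = cluster_uniform_sample(patches, cluster_ids, n_samples=6)
# The rare cluster contributes half the batch despite being 10% of the data.
```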
To further increase the accuracy of the DDGM 208, a mistakes pipeline 306 can take patch labels that were determined with less confidence and increase their presence in the input training data, so the models of the vectorization module 302 can better learn the low-confidence labels. The vectorization module 302 may use a predetermined threshold before classifying input data as belonging to a label. For example, one classification threshold for a label can be a score of 0.5 or more on a scale of 0 to 1. Data classified with a score of 0.6 is classified with lower confidence than data classified with a score of 0.9. Classifications with higher confidence scores can increase the accuracy of the models of the vectorization module 302. The mistakes pipeline 306 can identify low-confidence classifications and sample them as input training data for the next round of training. In another embodiment, the mistakes pipeline can also sample from errors, in addition to low-confidence classifications. In some embodiments, the mistakes pipeline is applied after the models of the vectorization module 302 have learned the labels that are easy to learn. In some embodiments, a predetermined percentage of the input training data to the AI models of the vectorization module 302 is used to feed input data from the mistakes pipeline (sampled from those input values that have generated output vectors having errors or a low confidence level).
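The selection of low-confidence classifications for the mistakes pipeline might look like the following sketch, where the 0.15 margin is an arbitrary illustrative choice:

```python
def low_confidence_patches(predictions, threshold=0.5, margin=0.15):
    """Select patches whose classification score falls near the decision
    threshold, i.e., patches classified with low confidence."""
    return [pid for pid, score in predictions
            if abs(score - threshold) < margin]

preds = [("p1", 0.92), ("p2", 0.55), ("p3", 0.60), ("p4", 0.10)]
hard = low_confidence_patches(preds)
# p2 (|0.55-0.5| = 0.05) and p3 (|0.60-0.5| = 0.10) fall inside the margin;
# p1 and p4 are confident classifications and are skipped.
```

Patches returned by such a selector would then be oversampled in the next round of training.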
Additionally, as discussed earlier, the models 402 can be trained based on auxiliary labels 308 to provide more accurate identification and classification for the input data labeled with auxiliary labels 308 based on processing that input data at different sizes and resolutions. For example, for each nucleus that is labeled, three patches are generated (e.g., 64×64 pixels at 40× resolution, 256×256 pixels at 40× resolution and 1024×1024 pixels at 5× resolution). The innermost patch captures the nucleus; the middle patch captures the gland and the outermost patch captures the micro-environment of the nucleus. Three parallel CNNs are run to transform the patch to a 1024-dimensional vector. The 3×1024 vectors are combined and classified to one of the known labels. In other embodiments, fewer or more patches based on a detected nucleus can be generated. Other resolutions and sizes can also be used. In some embodiments, the patches are from a WSI image, which can include a large number of pixels (e.g., in the order of millions or billions). The DDGM 208 can include other modules to improve labeling the input data (by auxiliary labels 308 or by labels 210). For example, a background module, color normalization, gland segmenter or nuclei segmenter can be used to label input image patches.
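The three-patch concatenation can be sketched as below. The `fake_cnn_encoder` is a hypothetical stand-in for a trained CNN; only the shape handling (three 1024-dimensional vectors concatenated into one 3072-dimensional vector) reflects the description above:

```python
import random

def fake_cnn_encoder(patch_pixels, dim=1024, seed=0):
    """Stand-in for a trained CNN: maps a patch to a fixed-length vector.
    A real implementation would run a convolutional network here."""
    rng = random.Random(len(patch_pixels) + seed)
    return [rng.random() for _ in range(dim)]

def encode_nucleus(inner, middle, outer):
    """Run the three context patches through parallel encoders and
    concatenate the results into one 3x1024 = 3072-dimensional vector."""
    vectors = [fake_cnn_encoder(p, seed=i)
               for i, p in enumerate((inner, middle, outer))]
    return [x for vec in vectors for x in vec]

# 64x64 nucleus patch, 256x256 gland patch, 1024x1024 microenvironment patch
# (all-zero pixel lists are placeholders for real image data).
v = encode_nucleus([0] * 64 * 64, [0] * 256 * 256, [0] * 1024 * 1024)
```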
As described earlier, training data can be highly imbalanced. For example, there can be a very small percentage of high-grade cancer (<0.01%) that can determine or change treatment. To balance the dataset, the vectors generated from the first pass of training can be used to divide the input training data into clusters of morphologically similar patterns. This clustering can be performed, so different sub-patterns of input data are equally represented in the input training data (in iterative passes), even though they may not be present in equal amounts in the training dataset, or in clinical practice.
The mistakes pipeline 306 can further fine-tune the models of the vectorization module 302. Patches that the model has detected with low confidence are identified based on the absolute value of the difference between the confidence level and a predetermined threshold for a classification label. If the absolute value of the difference does not exceed a confidence threshold, the underlying input data, or a sampled subset of the underlying data, is fed through the mistakes pipeline 306 to the models 402 as part of the training data. Consequently, the models 402 can give appropriate weight to the low-confidence data. Additional patterns that show up in small quantities and cannot be captured through clustering can also be fed into the mistakes pipeline 306 to further improve the accuracy of the models 402.
In one aspect, the output of the DDGM 208 includes a detection of presence and grading of a disease (via classification in the labels 210) and vector representations for morphological types in that disease. These labeled vectors are fed into a sub-morphology detector (SMD) 212.
Each patient slide is divided into patches using the patch generator 206. As an example, each patient slide may be divided into 50 to 100K patches. Each patient's patches and the patient outcome data are fed through the DDGM 208, which generates a labeled vector for each patch. Now, instead of a patch being an image, the patch is represented by a vector. The dimensions of the vector can be chosen according to an embodiment; examples include 500-, 1,000- or 2,000-dimensional vectors. These vectors can inherently capture similarities, which correspond to morphological similarity on a tissue slide. In other words, patterns that are morphologically similar have mathematically similar vectors. In one embodiment, for example, cosine similarity can be used to cluster similar vectors: vectors that are mathematically similar yield cosine similarity values that are close in range. Other vector operations can also be used to determine similarity between vectors. In one respect, the SMD 212 converts an image into a structured dataset that can be used to rank and score image patches in terms of patient response or outcome.
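Cosine similarity between patch vectors can be computed as in this short sketch (the toy 3-dimensional vectors stand in for the much higher-dimensional vectors described above):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two patch vectors; values near 1
    indicate morphologically similar patches."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # same direction as a -> similarity 1.0
c = [-3.0, 0.5, 1.0]   # nearly orthogonal to a -> similarity near 0
```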
Next, the DDGM 208 can predict a label 210 for each patch. In one embodiment, where three parallel CNNs are used to implement the DDGM 208, the last layer of the vectors is extracted, and used as the vector representation of the patch. As an example, in the three-parallel net architecture, if each vector is of size 1024, the last vector chosen for vector representation of the patch is of size 3×1024 or 3072.
Next, the SMD 212 clusters the vectors to identify morphological sub-patterns within each label. This can generate hundreds of clusters of morphologically similar patterns that may not have an explicit label but can represent a phenotype. In the example shown, multiple morphological sub-patterns are identified for the stroma and cancer labels. The morphological sub-patterns identified by the SMD 212 can include regions that in turn include biomarkers, signals or signatures of disease or patient outcome that may have been previously unknown in ordinary clinical practice.
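The clustering step can be illustrated with a minimal k-means sketch. The initialization scheme and the toy 2-D vectors are simplifying assumptions; the real vectors are high-dimensional, and the SMD 212's exact clustering method may differ:

```python
def squared_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(vectors, k, iters=10):
    """Minimal k-means: group patch vectors into k clusters of
    morphologically similar sub-patterns. Returns one cluster id per vector."""
    # Seed centers at evenly spaced vectors (a production system would use
    # a smarter initialization such as k-means++).
    step = max(1, len(vectors) // k)
    centers = [list(vectors[(i * step) % len(vectors)]) for i in range(k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest center.
        assign = [min(range(k), key=lambda c: squared_dist(v, centers[c]))
                  for v in vectors]
        # Move each center to the mean of its assigned vectors.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Two well-separated groups of 2-D vectors stand in for patch vectors.
vecs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans(vecs, k=2)
# The first two vectors land in one cluster, the last two in the other.
```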
The DDGM 208 and SMD 212 convert unstructured data of gigapixel WSIs (or other input images if used) to a structure of clusters of morphological patterns. This structured representation of morphology enables downstream tasks of ranking these patterns and identifying which patterns are prognostic/predictive of patient outcome.
The input to the patch-level score module 708 can include a patch and the microenvironment of the patch. In some cases, the patch alone may not capture enough information to accurately score the patch. A TME adder 702 can be configured to obtain a region of predetermined size around a patch (e.g., by using the patch generator 206) and vectorize the region (e.g., by using the DDGM 208 and/or the SMD 212). The region surrounding a patch can be chosen to capture the microenvironment of the patch. The data of the patch and its surroundings (e.g., the microenvironment of the patch) can be used as input data to the patch-level score module 708 to generate a score for the patch. As an example, in some embodiments, the microenvironment of a patch can be chosen as a region of 3×3 or 5×5 patches surrounding the patch. The TME adder 702 can combine the vectors from the microenvironment region with the vectors from the patch region. In some embodiments, the combined vector can be a mean vector; other mathematical techniques for combining the microenvironment vectors and the patch vectors are also possible. Combining microenvironment vectors and patch vectors allows for scoring a patch not only based on any tumor that may be present in the patch, but also based on the microenvironment of the tumor. The morphological manifestation of a tumor can look different in different regions of the tissue based on the cells surrounding the tumor, and on the microenvironment of the tumor in general. Consequently, the complexity of the microenvironment of the tumor cells can be captured by combining vectors from the microenvironment region with the patch vectors where the tumor may be present. The combined vectors of patch and microenvironment (CVPM) can be clustered into multiple clusters using a CVPM clustering module 704. As an example, 50 to 100K patches for a patient slide can be converted to 100 to 200 distinct clusters of CVPM.
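The mean-vector combination performed by the TME adder 702 can be sketched as follows (the function name is hypothetical, and the 2-dimensional vectors are placeholders for the real patch vectors):

```python
def combine_patch_and_microenvironment(patch_vec, neighbor_vecs):
    """Combine a patch vector with the vectors of its surrounding region
    into a single mean vector (CVPM). Other combinations are possible."""
    all_vecs = [patch_vec] + neighbor_vecs
    n = len(all_vecs)
    return [sum(col) / n for col in zip(*all_vecs)]

patch = [1.0, 3.0]
neighbors = [[2.0, 5.0], [3.0, 1.0]]  # e.g., vectors from the 3x3 neighborhood
cvpm = combine_patch_and_microenvironment(patch, neighbors)
# Component-wise mean of (1,2,3) and (3,5,1) -> [2.0, 3.0]
```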
This can substantially reduce the complexity of the input image data and provide further structure for processing of the patient data and arriving at a prediction score.
Clusters of CVPM can include different numbers of corresponding patches, because the morphological patterns in each cluster can occur with different frequencies in a WSI or other patient input image. Nonetheless, the frequency of occurrence of a morphological pattern may or may not have relevance to patient outcome; some infrequently occurring morphological patterns can nevertheless be clinically significant for patient outcome and response. A sampling module 706 can sample input training data from each cluster in a manner that exposes the AI model of the patch-level score module 708 to data in a uniform manner, regardless of the volume and frequency of patches in the clusters. In other words, the sampling module 706 can be used to input a uniform representation across the morphological subtypes in a patient slide.
The patch-level score module 708 can include a deep learning artificial intelligence model that uses its weights to assign a score to each patch. In other words, the AI model of the patch-level score module converts a CVPM to a score (e.g., a number between 0 and 1). A patient-level score module 710 takes a predetermined sampling of the patch-level scores and combines them to arrive at a patient-level score. In one embodiment, a predetermined number of top patch-level scores and a predetermined number of bottom patch-level scores are combined to arrive at a patient-level score. During training, the patient-level score is compared against the known clinical patient outcome. If the results do not match, the weights of the model associated with the sampled patch-level scores are modified to find a better weight distribution that yields a patient-level score closer to the known clinical patient outcome. In one respect, the combination of all patch-level scores is not used, so the model can track which patch-level scores' associated weights need to be modified in order to fine-tune the predicted patient-level score. In another respect, the ROPM 214 and the AI model therein learn to predict patient outcome and, in the process, learn to rank patches in a manner that yields an accurate patient outcome prediction. Consequently, patch-level scores can yield regions of interest that likely contain predictive biomarkers, signatures and signals.
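Combining top and bottom patch-level scores into a patient-level score can be sketched as below; taking the mean of the selected scores is one illustrative combination rule, not necessarily the one used by the patient-level score module 710:

```python
def patient_level_score(patch_scores, r=2):
    """Combine the top-r and bottom-r patch-level scores into a single
    patient-level score (here a simple mean; r and the combination rule
    are illustrative choices)."""
    ranked = sorted(patch_scores)
    selected = ranked[:r] + ranked[-r:]
    return sum(selected) / len(selected)

scores = [0.05, 0.10, 0.50, 0.55, 0.90, 0.95]
score = patient_level_score(scores, r=2)
# Bottom two (0.05, 0.10) and top two (0.90, 0.95) -> mean 0.50
```

Because only the extreme scores contribute, training gradients concentrate on the patches the model considers most and least indicative of an adverse outcome.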
In some embodiments, the ROPM 214 includes a mistakes pipeline 712, which is similar in operation to the mistakes pipeline 306 described above. The mistakes pipeline 712 identifies patches that the model has not learned well and allocates a percentage of the input data in subsequent passes to those patches, so the model is exposed to and can better learn those patches. The mistakes pipeline 712 can determine which patches correspond to a mistaken prediction, where a mistaken prediction refers to the output 714 of the patch-level score module 708 and the patient-level score module 710, at the end of a training pass, predicting an outcome for the patient that does not match the known clinical outcome of that patient. For example, a patient may be non-recurrent, but the output 714 of the ROPM 214 may predict that the patient has recurrence. In this mistaken prediction scenario, there are patches on the patient's image slide to which the model of the ROPM 214 is allocating high values, where the high values given to those patches can cause the combined patient-level score for that patient image slide to be above a predetermined threshold, so that the model predicts the patient is recurrent. Conversely, the model of the ROPM 214 may be allocating lower weights to some patches that should otherwise be scored higher, causing a mistaken prediction of non-recurrence where the known clinical outcome of the patient is recurrence. The mistakes pipeline 712 can identify patches that are causing mistaken predictions and feed those patches, or a sampled subset of them, as training input data in successive training passes to the models of the ROPM 214. Consequently, in iterative and successive passes, the models of the ROPM 214 are exposed more to the patches that they do not accurately score and learn to score those patches more accurately.
In one respect, the mistakes pipeline 712 performs a tuning step, similar to the mistakes pipeline 306, where the AI models are first trained based on clusters of patches until an initial level of accuracy is reached. Once the model is mature but still making some mistakes, the training data of the model is sampled in a way that includes a predetermined percentage of mistakes from the mistakes pipeline 712, to train the model for better accuracy regarding those elements that are causing a mistaken output.
The output 714 of the ROPM 214 can include a patient outcome or response indicator in the form of a morphometric score. In some embodiments, the morphometric score is a number between 0 and 1 that indicates a risk profile of a given patient: the closer the morphometric score of a patient is to 1, the higher the risk of an adverse outcome for that patient. The output of the patch-level score module 708 also includes scores that indicate the correlation between each patch and the patient outcome. Therefore, the patch-level scores can be used to identify regions of interest (ROIs) for further analysis and for finding biomarkers, signatures or signals predictive of patient outcome.
At step 804, for each patient, the patches and corresponding vector representations for the patient image slides are collected. At step 806, to capture the microenvironment of each patch, an N×N region is selected around each patch, and the vectors of those regions are generated. At step 808, the vectors are averaged to generate a mean vector for each region. The mean vectors are clustered per label to generate multiple clusters. As an example, this can convert 50-100K patches into 100-200 distinct morphological clusters. At step 810, K patches are sampled uniformly across each cluster to generate a batch of vectors that represent the patient slides.
At step 812, each of the K patches is converted to a patch-level score between 0 and 1, using a supervised deep learning model that is trained based on patient outcome. A high score (near 1) indicates that the patch shows up in patients with an adverse outcome, while a low score (near 0) indicates a good outcome (e.g., the patch appears in patients with non-recurring cancer). The patch-level scores are generated using a set of weights in the deep learning model that are learned using known patient outcomes as labels.
At step 814, the top and bottom R patches are selected and combined to generate an outcome morphometric score for the patient. As described above, choosing a limited predetermined set of patches to be responsible for the outcome morphometric score can force the deep learning model to learn the most predictive features and give them the highest or lowest scores. At step 816, the deep learning model is further fine-tuned using a mistakes pipeline: patches that are causing mistakes in predicting patient outcome are identified, collected, and used to generate training data that exposes the deep learning model to those mistakes so it can learn them and assign more accurate weights to them. At step 818, based on patch-level scores, regions of interest (ROIs) are identified on the tissue slide. The method ends at step 820.
Having identified ROIs, molecular analysis or other techniques can be applied to those regions to determine which gene mutations or cellular processes are causing the patient to exhibit the morphological regions that are indicative of an adverse outcome. This platform allows classifying tissue globally based on the protein and mRNA expression in ROIs obtained through unsupervised morphological feature extraction, or focusing on any region of interest to discover novel gene expression profiles. Combining gene expression profiles/signatures found through protein and RNA analysis with the morphological context of ROIs (clusters of regions 904, 906 and 908) and non-ROIs (other regions or patches that show low scores) in a wide variety of tissue types, and their correlation with patient survival or therapy response outcome, helps to discover precise biomarker signatures for predicting outcome. For example, spatial profiling and biomarker identification techniques can be applied to the ROIs.
In the example of
Proteogenomic analysis or other biomarker identification techniques allow for detection of markers, including but not limited to immune markers, cancer markers, stromal markers, etc., that may be over- or under-expressed in ROIs compared to non-ROIs. Without the benefit of ROIs determined by the ROPM 214, molecular markers are searched for at the whole-slide level, where information related to the differential expression (up-regulation/down-regulation) of genes related to specific pathways in ROIs versus non-ROIs, and of proteins that remodel the tumor microenvironment, is lost or weakened. Such differential expression suggests that subpopulations of immune cells in the tumor microenvironment have specific features that differ from their behaviors in normal tissues, and identifies phenotypes that potentially help establish their roles in interacting with other cell types and modulating the tumor microenvironment. In other words, the problem with searching the whole slide for biomarkers, as is done in some existing techniques, is that any present disease signature is averaged out over the whole tissue; including elements that are not biomarkers weakens the disease signal and hinders detection of biomarkers.
When biomarkers (e.g., proteins) predictive of patient response or outcome are identified, it can be beneficial to determine the spatial distribution of those biomarkers across tissue slides. Spatial distribution data can help improve the signature, because there may be elements that are not visible on an H&E slide but are visible on a biomarker-specific IHC- or IF-stained image slide that captures the spatial distribution of a biomarker. In other words, there may be more relevant or subtle patient outcome or response data that are only visible on biomarker slides. Without the benefit of having identified biomarkers predictive of patient response, it would be burdensome or impractical to develop biomarker slides for all potential biomarkers (e.g., in some cases 20,000 proteins can be potential biomarker candidates). Once the set of potential biomarkers is narrowed down, the spatial distribution of those signals can be developed by overlaying or co-registering an annotated H&E slide with a biomarker slide, such as an IHC or IF slide, where the annotations include patch-level morphological patterns. From the overlay, tumor cells and the proteins expressed by them are identifiable, and the predicted patient outcome or response can be explained biologically as well.
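Quantifying biomarker expression inside versus outside ROIs on a co-registered slide can be sketched as follows, assuming a per-patch intensity grid and a boolean ROI mask derived from patch-level scores (both hypothetical inputs):

```python
def mean_expression(intensity, roi_mask):
    """Quantify biomarker expression inside vs. outside the regions of
    interest on a co-registered slide, given a per-patch intensity grid
    and a boolean ROI mask of the same shape."""
    inside, outside = [], []
    for row_i, row_m in zip(intensity, roi_mask):
        for val, in_roi in zip(row_i, row_m):
            (inside if in_roi else outside).append(val)
    return sum(inside) / len(inside), sum(outside) / len(outside)

# Toy 2x3 grid of IHC staining intensities and the matching ROI mask.
ihc = [[0.9, 0.8, 0.1],
       [0.7, 0.2, 0.1]]
roi = [[True, True, False],
       [True, False, False]]
roi_mean, non_roi_mean = mean_expression(ihc, roi)
# Inside the ROI: (0.9 + 0.8 + 0.7) / 3 = 0.8; outside: (0.1 + 0.2 + 0.1) / 3
```

A clear gap between the two means would indicate over- or under-expression of the biomarker within the ROIs relative to the rest of the tissue.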
The following discussion relates generally to the field of disease detection and treatment identification, and in particular to using artificial intelligence to identify prostate cancer patients at high risk of progression within the clinically intermediate-risk group.
About half of intermediate-group prostate cancer (PCa) patients are stratified into Gleason Grades (GG) 3+4 and 4+3. Several studies show that high-risk patients can still be found in both groups. An AI platform was used to develop a morphometric biomarker, by analyzing digitized H&E whole slide images (WSIs), which predicts early biochemical relapse (BCR) and radiographic progression to metastasis. One hundred twenty-five intermediate-risk samples (n=67, 3+4; n=58, 4+3) were collected from the Icahn School of Medicine at Mount Sinai (ISMMS) to form a held-out test set. A series of deep learning models trained using data from ISMMS, the University of Wisconsin-Madison, and TCGA generated a high-dimensional vector for each WSI to provide a numerical representation of observed morphologies, which was then converted into a single score to predict BCR within 36 months and high risk of metastasis (MET). The area under the receiver operating characteristic curve (AROC) was used to measure the accuracy of BCR prediction, and the concordance index (CI) was used to measure the performance of MET ranking. The hazard ratios (HR) of the high- and low-risk groups for patients within grades 3+4 and 4+3 show that our model can further stratify GG. Our method was significantly better at predicting BCR (AROC: 0.801) and ranking MET (CI: 0.764) relative to standard clinical metrics, GG, pathologic staging, and genomic tools such as DECIPHER. We further sub-stratified the patients into GG 3+4 and 4+3 and identified high-risk patients within each GG. Patients in GG 3+4 with high PathomIQ scores had a significantly higher risk of BCR (HR 3.3; 95% CI 1.44-7.56; p<0.005) compared to patients with low PathomIQ scores. A similar trend was seen in the GG 4+3 group (HR 3.0; 95% CI 1.32-6.83; p<0.01). Our histopathology-based prognostic biomarker significantly improves over standard clinical markers in stratifying patients with intermediate-risk PCa for BCR and MET.
Our scoring method may strongly impact the management of intermediate-risk PCa patients and clinical trial patient selection for the successful development of new therapies for early-stage PCa.
PATHOMIQ-PRAD Base. A total of 1000 radical prostatectomy (RP) WSIs from 589 patients were collected from the University of Wisconsin-Madison (UW), along with 243 from the publicly available Cancer Genome Atlas (TCGA). Slides from UW were scanned at 40× magnification using a high-capacity scanner (Aperio AT2 DX; Leica Biosystems). The training and validation details of the models that make up the base of PATHOMIQ-PRAD using these slides have previously been published.
PATHOMIQ-PRAD Output. A total of 325 RP and biopsy WSIs were collected from the Icahn School of Medicine at Mount Sinai (ISMMS) and UW. TCGA slides were also used in this mode. Slides from ISMMS were scanned using an Aperio CS2 (Leica Biosystems, Inc.) at 40× magnification. Each WSI was associated with a single patient and included an accompanying time-to-biochemical-relapse (BCR) value and standard clinical information such as Gleason grade, post-RP margin status, pathologic staging, etc. WSIs from ISMMS also included time-to-metastasis (MET), and a subset included DECIPHER scores from a genomic risk profiling tool. The Table in
The output of PATHOMIQ-PRAD was trained using data sourced from research institutions (ISMMS and UW) and publicly available data (TCGA). Patients were included in training if they had a BCR within 36 months or no BCR after 36 months. BCR was defined as rising PSA on two consecutive tests post-RP relative to the first post-RP PSA level. Patients with BCR after 36 months were excluded. No patient received perioperative treatment with androgen deprivation therapy or adjuvant radiotherapy, including patients with a positive margin.
To most effectively test the predictive power of PATHOMIQ-PRAD in identifying high-risk patients, we curated a held-out test set comprising only patients with assigned intermediate-risk Gleason scores, in order to measure its prognostic and predictive power on difficult-to-assess cases. This held-out test set from ISMMS had 125 patients with Gleason score 7 (n=67, 3+4; n=58, 4+3).
Referring now to
A portion of the system (also referred to as PATHOMIQ-PRAD) comprises multiple stages and multiple AI modules, each responsible for a different task. Branch X is responsible for tile-level classification of tissue and quantification of morphologies; Branch Y is responsible for high-level feature encoding of slides. A final model aggregates information from both branches to output a single score which represents risk of biochemical recurrence. All models were implemented in Python 3.9 using the PyTorch deep learning framework, and the models were trained on an Amazon Web Services g3.8xlarge instance with 4 GPUs, 64 vCPUs, and 16 GB of GPU RAM.
Pre-processing. While the process requires at least one digitized biopsy or RP slide, multiple slides may be submitted for one patient. If a submitted slide was not originally scanned at 40× magnification, it is upscaled to 40×. First, all digitized tissue across one or more slides is divided into 256×256 tiles and filtered using a quality control AI module which removes tiles with pen marks, scanning artifacts, tissue folds, and other degrading characteristics.
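By way of illustration only, the tiling and quality-control filtering described above might be sketched as follows. The function names (`tile_grid`, `keep_tile`) are hypothetical, and the quality-control model is left as a placeholder rather than the actual AI module:

```python
TILE = 256  # tile edge length in pixels, per the pre-processing step

def tile_grid(width, height, tile=TILE):
    """Return (x, y) origins of all full, non-overlapping tiles on a slide."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def keep_tile(tile_pixels, qc_model=None):
    """Stand-in for the QC module: reject tiles flagged as artifacts.

    With no model supplied (as in this sketch), every tile is kept.
    """
    if qc_model is None:
        return True
    return qc_model(tile_pixels) == "clean"

coords = tile_grid(1024, 512)   # 4 columns x 2 rows of 256x256 tiles
```

A real pipeline would read tile pixels from the WSI at each origin and pass them through the trained quality-control classifier before any downstream analysis.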
Branch X. Each tile is then fed through a series of convolutional neural network (CNN) deep learning models to classify it for morphological quantification. The cancer detection module classifies tiles as cancer, benign, or stroma. The cancer grading module takes any tile previously classified as cancer and classifies it into various standard prostate-specific morphological patterns. Using these classifications, slide-level statistics are generated and normalized.
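The normalization of tile classifications into slide-level statistics can be illustrated with a minimal sketch (the label set and function name are illustrative; the actual module may compute richer statistics):

```python
from collections import Counter

def slide_statistics(tile_labels):
    """Normalize per-tile classifications into slide-level proportions."""
    counts = Counter(tile_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Example: a slide whose tiles were classified by the detection module
stats = slide_statistics(["cancer", "cancer", "benign", "stroma"])
# -> {"cancer": 0.5, "benign": 0.25, "stroma": 0.25}
```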
Branch Y. Each tile is encoded into a high-dimensional feature representation using a previously trained CNN encoder. All vectors for a patient are averaged into a single vector, which is then reduced in dimensionality using principal component analysis.
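The averaging and dimensionality-reduction steps can be sketched as follows. This is a simplified stand-in: the CNN encoder itself is omitted, the PCA is fit here via SVD on the patient matrix rather than loaded from a trained model, and the function names are hypothetical:

```python
import numpy as np

def aggregate(tile_features):
    """Average all tile-level feature vectors for one patient into one vector."""
    return np.asarray(tile_features, dtype=float).mean(axis=0)

def pca_reduce(patient_vectors, n_components):
    """Project patient vectors onto their top principal components
    (SVD of the mean-centered matrix)."""
    X = np.asarray(patient_vectors, dtype=float)
    Xc = X - X.mean(axis=0)              # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T      # shape: (n_patients, n_components)
```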
Final Output. In the last step, information from Branches X and Y is combined as covariates in a final survival model which is trained to output a score between 0 and 1 indicating risk of biochemical recurrence, with 1 being the highest likelihood.
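As a hedged sketch of how covariates from both branches might map to a bounded (0, 1) risk score, the following uses a logistic link over a linear combination; the actual trained survival model and its weights are not specified here, and all names are illustrative:

```python
import math

def risk_score(branch_x_stats, branch_y_components, weights, bias=0.0):
    """Combine Branch X and Branch Y covariates into one (0, 1) risk score.

    A logistic link guarantees the output lies strictly between 0 and 1,
    with values near 1 indicating the highest likelihood of recurrence.
    """
    covariates = list(branch_x_stats) + list(branch_y_components)
    z = bias + sum(w * c for w, c in zip(weights, covariates))
    return 1.0 / (1.0 + math.exp(-z))
```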
Nomograms are useful ways to combine different clinical parameters and tests into a unified score. We compared against three of the most popular ones in the prostate setting. The Cancer of the Prostate Risk Assessment score (CAPRA-S) was calculated using the pre-surgical PSA level, pathologic grade, Gleason score, positive surgical margin, and histologic markers including extracapsular extension, seminal vesicle invasion, and regional lymph node invasion. The Partin score was calculated using clinical stage, Gleason score on diagnostic biopsy, and pre-surgical PSA. Finally, the Kattan score was calculated using age, pre-surgical PSA, Gleason score on diagnostic biopsy, and pathologic grade.
Area-under-the-receiver operating characteristic (AROC) was used to measure the overall performance of each method's ability to predict BCR within 36 months by evaluating all pairs of sensitivities and specificities using a sweeping threshold. An AROC of 0.5 indicates random performance. Concordance index (CI) was used to measure the performance of each method's ability to rank patients by risk of metastasis in concordance with their actual time-to-metastasis, including censored and uncensored outcomes. Similar to AROC, a CI of 0.5 indicates random performance.
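To make the two metrics above concrete, the following sketch computes AROC via the pairwise (Mann-Whitney) formulation and the concordance index over comparable pairs; function names are illustrative, and for both metrics 0.5 indicates random performance:

```python
def auroc(scores, labels):
    """AROC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half; equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def concordance_index(scores, times, events):
    """Fraction of comparable pairs whose risk ordering matches time ordering.

    A pair (i, j) is comparable when i's time is earlier and i's event
    is observed (uncensored); censored subjects still contribute as j.
    """
    num = den = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                den += 1
                if scores[i] > scores[j]:
                    num += 1
                elif scores[i] == scores[j]:
                    num += 0.5
    return num / den
```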
A univariate analysis for each method was conducted using Cox proportional-hazards modeling. Hazard ratios with confidence intervals and statistical significance are provided for each independent covariate for both BCR and MET. The genomic test score was divided into low, intermediate, and high categories using its respective thresholds of 0.45 and 0.6. CAPRA-S was divided into low and high categories based on its point system. Because the Kattan and Partin nomograms use continuous scoring systems without designated categorical risk groups, a median threshold was used to define low- and high-risk categories for the sake of comparing hazard ratios with the other methods.
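The two categorization schemes above can be sketched directly; the boundary handling at exactly 0.45 and 0.6 is an assumption of this sketch, as is the tie-breaking at the median:

```python
def genomic_category(score, low_cut=0.45, high_cut=0.6):
    """Map a continuous genomic risk score to low/intermediate/high
    using the thresholds described in the text."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "intermediate"
    return "high"

def median_split(scores):
    """Low/high split at the median, as used here for Kattan and Partin."""
    s = sorted(scores)
    m = len(s) // 2
    median = s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2
    return ["high" if x >= median else "low" for x in scores]
```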
To measure our model's predictive value in identifying high-risk patients within samples assigned intermediate-risk Gleason scores, we stratified patients into high- and low-risk groups based on our method's output score, using a clinical cutoff determined from the median PATHOMIQ_PRAD score on the training data. Using this classification, we used Kaplan-Meier modeling to measure and visualize how confidently these groups are stratified from each other. Unlike AROC, which uses a sweeping threshold to measure overall accuracy, this single-threshold analysis is essential for real-world clinical decision making. The log-rank test was used to measure the statistical significance of stratification (p<0.05) between groups.
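The Kaplan-Meier modeling referenced above rests on the product-limit estimator, which can be sketched in a few lines (a simplified stand-in for a full survival library; censored subjects leave the risk set without producing a drop in the curve):

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate, returned as (time, S(t)) pairs
    at each distinct event time. events[i] is 1 for an observed event,
    0 for censoring."""
    pairs = sorted(zip(times, events))
    surv, s = [], 1.0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        d = sum(1 for tt, e in pairs if tt == t and e)   # events at t
        n_t = sum(1 for tt, _ in pairs if tt >= t)       # at risk at t
        if d:
            s *= 1 - d / n_t
            surv.append((t, s))
        while i < len(pairs) and pairs[i][0] == t:       # consume time t
            i += 1
    return surv
```

Plotting one such curve per risk group and comparing them with a log-rank test yields the stratification analysis described in the text.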
All results are measured on the intermediate-risk patients (n=67, GG 3+4; n=58, GG 4+3) held-out test set described herein.
Referring to
In order to determine specific cutoffs for time-to-BCR and time-to-MET, scores produced on the training data were used to maximize patient stratification in each time to event category. BCR was best stratified using 0.4 and MET was best stratified using 0.55. These thresholds were then used to stratify the held-out test data.
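A simplified proxy for the cutoff search described above is sketched below: sweep candidate thresholds and keep the one maximizing the event-rate gap between the resulting high- and low-score groups. The actual system may optimize a different stratification criterion (e.g., log-rank separation), and the function name is hypothetical:

```python
def best_threshold(scores, events, candidates):
    """Pick the candidate cutoff maximizing the gap in event rates
    between the high-score and low-score groups on training data."""
    best, best_gap = None, -1.0
    for t in candidates:
        hi = [e for s, e in zip(scores, events) if s >= t]
        lo = [e for s, e in zip(scores, events) if s < t]
        if not hi or not lo:            # degenerate split; skip
            continue
        gap = sum(hi) / len(hi) - sum(lo) / len(lo)
        if gap > best_gap:
            best, best_gap = t, gap
    return best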
Our method was significantly better at predicting BCR within 36 months after RP and ranking MET relative to GG, TNM, and CAPRA-S. These results demonstrate that our AI-based deep learning method is capable of further dissecting morphological characteristics of H&E-stained PCa tissue images that are not easily discernible by the human eye and thus have never been included in the GG system.
Additionally, our assay is based on digital images of H&E-stained slides and does not require shipping of slides, as is required by all genomic-analysis-based assays. Moreover, our assay is non-destructive, and the tissue can be subjected to other proteo-genomic analyses subsequent to our scoring, if desired.
The results illustrated in
Thus, PATHOMIQ-PRAD performed on both biopsy and prostatectomy tissue images is a much superior test for predicting both BCR and MET in intermediate-risk PCa patients as compared to other tests generally performed in the clinic today. Some of these proteogenomic tests require physical shipping of tissue samples that are destroyed during the testing process. Use of this test on all PCa patient tissue images has demonstrated high accuracy of BCR prediction for all PCa patients. That has now been extended to prediction of both BCR and MET with similar accuracy for intermediate-risk patients.
Thus, the systems and methods described herein will be very helpful in assisting both urology surgeons and PCa medical oncologists to address the following unmet needs in PCa management decisions for intermediate-risk patients: (i) deciding who needs RP and who can be kept in watchful waiting, and (ii) deciding who needs aggressive therapy such as ADT and/or anti-androgen treatment post-RP.
Our systems and methods described herein provide a novel AI-powered PCa prognostic test for a larger cohort of intermediate-risk patients undergoing RP that can identify patients who will likely benefit from aggressive treatment such as ADT or anti-androgen therapy post-surgery to delay or block PCa progression.
Additional Discussion of methodologies followed:
Two types of data were collected: (1) Hematoxylin and Eosin (HE)-stained whole-slide images (WSIs) without associated outcome data for training and development of various artificial intelligence (AI) modules that comprise the PATHOMIQ_PRAD base, designed to accurately identify many hundreds of distinct high- and low-level morphological features; and (2) WSIs with associated outcome and clinical data for training and validation of the final PATHOMIQ_PRAD output, which is a single score indicating the risk of biochemical recurrence (BCR) and metastasis. The workflow is illustrated in
A total of 1000 radical prostatectomy (RP) WSIs from 589 patients were collected from the University of Wisconsin-Madison (UW), as well as 243 from the publicly available data in The Cancer Genome Atlas (TCGA). Slides from UW were scanned at 40× magnification using a high-capacity scanner (Aperio AT2 DX; Leica Biosystems). The training and validation details for models constituting the base for PATHOMIQ_PRAD using these slides have previously been published.
A total of 376 RP and biopsy WSIs were collected from the Icahn School of Medicine at Mount Sinai (ISMMS) and UW. 243 TCGA slides were also used in this mode. Of this data, 176 WSIs from ISMMS were set aside as a blinded test set. Slides from ISMMS were scanned using a NanoZoomer S210 digital slide scanner (Hamamatsu USA) at 40× magnification. Each WSI was associated with a single patient and included the time to BCR and standard clinical information such as decision-making nomograms. WSIs from ISMMS also included time to metastasis (except one case), and a subset included DECIPHER scores from a genomic risk profiling tool. Table 1 of
Patients were included in the training set regardless of whether they had a BCR event after RP. BCR was defined as rising prostate-specific antigen (PSA) on two consecutive tests after RP relative to the first PSA level after RP. No patient received perioperative treatment with androgen deprivation therapy (ADT) or adjuvant radiotherapy, including patients with a positive surgical margin.
As PATHOMIQ_PRAD was the most effective test in identifying patients with high-risk disease, we curated a held-out test set comprising only patients assigned an intermediate-risk Gleason score to measure its prognostic and predictive power on cases that are difficult to assess. The blinded test set from ISMMS comprised 176 patients with Gleason score 7 (98 with Gleason 3+4 and 78 with Gleason 4+3).
PATHOMIQ_PRAD consists of multiple stages and multiple AI-modules, each responsible for a different task (as illustrated in
Branch X is responsible for tile-level classification of tissue and quantification of morphologies. Branch Y is responsible for high-level feature encoding of slides. A final model aggregates information from both branches to output a single score representing the risk of BCR. All models were implemented in Python 3.9 using the PyTorch deep learning framework and were trained on an Amazon Web Services g3.8xlarge instance with four GPUs, 64 vCPUs, and 16 GB of GPU RAM.
While the algorithm requires at least one digitized biopsy or RP slide, multiple slides may be submitted for an individual patient. If a submitted slide was not originally scanned at 40× magnification, the slide is upscaled to 40×. First, all digitized tissue across one or more slides is divided into 256×256 tiles and filtered using a quality control AI module that removes tiles with pen marks, scanning artifacts, tissue folds, and other degrading characteristics.
After preprocessing, each tile is fed through a series of convolutional neural network (CNN) deep learning models to classify it for morphological quantification. The cancer detection module classifies tiles as cancer, benign, or stroma. The cancer grading module takes any tile previously classified as cancer and classifies it into various standard prostate-specific morphological patterns. Using these classifications, slide-level statistics are generated and normalized.
Each tile is encoded into a high-dimensional feature representation using a previously trained CNN encoder. All vectors for a patient are averaged into a single vector, which is then reduced in dimensionality using principal component analysis.
In the last step, data from branches X and Y are combined as covariates in a final survival model that is trained to output a score between 0 and 1 reflecting the risk of BCR, where 1 denotes the highest likelihood.
To measure the predictive performance of PATHOMIQ_PRAD in identifying high-risk PCa among samples assigned intermediate-risk Gleason scores, we stratified patients into high and low risk groups on the basis of the output score and clinical cutoffs of 0.45 for BCR and 0.55 for metastasis, as used in previous validation studies. We used the Kaplan-Meier method to measure and visualize the stratification of these groups for the complete test set. The log-rank test was used to measure the statistical significance of the stratification (p<0.05) between groups.
For the subset of patients with a genomic score, the concordance index was calculated as a measure of the performance of each method in ranking patients by risk of BCR or metastasis in relation to their actual time to BCR or metastasis, including censored and uncensored outcomes. A concordance index of 0.5 indicates random performance.
For the subset of patients with a genomic score, univariate analysis using Cox proportional-hazards modeling was conducted for each method. Hazard ratios with confidence intervals and p values for the statistical significance of the Wald test are provided for each independent covariate for BCR and for metastasis. The Wald test assesses the ratio of the covariate coefficient to its standard error and evaluates whether the coefficient for a given variable is significantly different from 0. Multivariate analysis was also conducted for all covariates in a single model. Statistical significance using the Wald test was assessed for each covariate. Three global statistical tests were used to measure the overall significance of the multivariate model: the likelihood ratio test, the log-rank test, and the Wald test. These three methods are asymptotically equivalent: as the sample size increases, they all perform similarly, while the likelihood ratio test performs better for smaller sample sizes.
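The Wald test described above can be illustrated with a small sketch: given a fitted Cox coefficient and its standard error, the z-statistic and two-sided p-value follow from the standard normal distribution, and the hazard ratio is the exponentiated coefficient:

```python
import math

def wald_test(coef, se):
    """Wald z-statistic and two-sided p-value for one Cox covariate.

    z = coef / se; p = 2 * (1 - Phi(|z|)), with Phi the standard
    normal CDF expressed via the error function.
    """
    z = coef / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

def hazard_ratio(coef):
    """Hazard ratio implied by a Cox coefficient."""
    return math.exp(coef)
```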
The genomic test score was divided into non-high and high categories using the published threshold value of 0.6. CAPRA-S was divided into low and high categories according to its point system. Because the Kattan and Partin nomograms use continuous scoring systems without designated categorical risk groups, a median threshold was used to define low and high risk categories for comparison of hazard ratios with the other methods.
The net benefit of PATHOMIQ_PRAD in comparison to a genomic score and CAPRA-S was assessed via decision curve analysis (DCA) for the 5-yr probability of BCR and metastasis. The 3-yr probability of BCR was also assessed because of recent clinical interest in identifying risk at an earlier time point. Net benefit was measured based on a range of threshold probabilities to indicate the minimum probability of disease at which an additional intervention would be justified. The net benefit was calculated as sensitivity×prevalence−(1−specificity)×(1−prevalence)×w, where w represents the odds at the threshold probability [6]. Individual survival models were trained and used for DCA.
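The net-benefit formula above can be computed directly from classification counts, since sensitivity×prevalence equals TP/n and (1−specificity)×(1−prevalence) equals FP/n. A minimal sketch (illustrative name; computed at a single threshold probability rather than over a full decision curve):

```python
def net_benefit(pred_high, outcomes, threshold):
    """Decision-curve net benefit at one threshold probability:
    TP/n - (FP/n) * w, where w = threshold / (1 - threshold)."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(pred_high, outcomes) if p and y)
    fp = sum(1 for p, y in zip(pred_high, outcomes) if p and not y)
    w = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * w
```

Sweeping `threshold` over the clinically plausible range and plotting the result for each model yields the decision curves compared in the analysis.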
PATHOMIQ_PRAD analyzes both cancer epithelium and the proximal tumor microenvironment (TME). To demonstrate its ability to recognize subtle differences in histological groups, regions of interest (ROIs) correlated most closely with disease progression in the training data were clustered on the basis of image similarity.
To investigate which features in the TME are most relevant when predicting disease outcome, regions indicative of high risk and low risk are extracted separately from the test data for review by a pathologist and identification of specific stromal features that have previously been largely ignored because of a lack of evidence regarding their role in tumor risk stratification (see
Patients with intermediate-risk prostate cancer (PCa) exhibit a wide range of disease characteristics and clinical outcomes, making their management challenging. Several stratification methods have been developed to better categorize these patients. Effective stratification is crucial for guiding selection of the most appropriate treatment options and improving clinical outcomes for individuals with intermediate-risk PCa.
The D'Amico classification, one of the earliest and most consistently used tools, stratifies PCa patients into risk groups according to their 5-yr BCR risk after radiotherapy or RP. The scheme uses clinical stage, PSA levels, and Gleason score for stratification. Despite its foundational role, there are limitations in relying solely on Gleason grading or other clinical parameters for effective patient stratification. This study highlights how our AI-powered platform addresses these limitations and offers a more precise stratification method for intermediate-risk PCa, thereby enhancing clinical decision-making for better outcomes.
PATHOMIQ_PRAD uses digitized hematoxylin and eosin (HE)-stained WSIs and applies a holistic approach that incorporates epithelial, stromal, and immune contexture to generate high-dimensional vectors for each image. This allows numerical representations of the morphologies observed, which are then transformed into a single prognostic score (ranging from 0 to 1) for each patient that is capable of predicting clinical outcomes.
The results in
While we did not evaluate the impact of exclusion of some of these features from the calculation on the predictive accuracy of PATHOMIQ_PRAD, a previous study evaluated ROIs in WSIs with high morphometric scores. Results from a pilot morphogenomic study, which used a NanoString panel of limited protein markers, allowed us to identify factors in the tumor microenvironment that drive tumor growth. This supports the validity of our approach. We discovered that well-known markers of cancer proliferation, such as Ki67, were elevated in these ROIs and reported for the first time that immune markers such as TMEM173, CD8, CD163, and PD-L1 were also highly expressed in the ROIs. This implies that the extensive perspective offered by PATHOMIQ_PRAD might encompass predictive characteristics that have not yet been identified. In addition, advanced technologies such as spatial transcriptomics using comprehensive whole-transcriptome assays and mass cytometry can further aid in this approach. These strategies are currently being explored in our laboratories.
PATHOMIQ_PRAD has been validated across multiple risk assessment and treatment response studies that highlighted its remarkable generalizability and broad clinical utility for PCa management. While the studies described here used WSIs for RP specimens, we tested the predictive capability of PATHOMIQ_PRAD on biopsy specimens and found that it performs equally well with biopsies. For a cohort of 436 patients, PATHOMIQ_PRAD showed stronger stratification for patients on apalutamide+ADT (hazard ratio 0.19, 95% confidence interval [CI] 0-0.37; p<0.005) versus placebo+ADT (hazard ratio 0.39, 95% CI, 0.17-0.86; p=0.02).
Our prior research demonstrated the high accuracy of the test in predicting BCR across all PCa risk categories, now expanded to include precise predictions of both BCR and metastasis for patients with intermediate-risk PCa. This novel AI platform can potentially fill crucial gaps in treatment decision-making for intermediate-risk PCa, particularly in evaluating the need for RP versus watchful waiting and in identifying patients who may benefit from more aggressive treatments, such as ADT and androgen receptor signaling inhibitors (ARSIs), after surgery. In addition, this approach has a fast turnaround time and lower costs, involves decentralized deployment, and does not destroy tissue, which are advantages in comparison to genomic tests.
In summary, our results to date show that the PATHOMIQ_PRAD test can guide clinical decision-making and is potentially ready for clinical translation and incorporation into routine practice, pending appropriate regulatory approvals. PATHOMIQ_PRAD is the first AI-driven prognostic tool tailored for intermediate-risk PCa after RP that can discern patients likely to respond to ADT or ARSI therapy to prevent or slow disease progression.
Processor 3101 may perform computing functions such as running computer programs. The volatile memory 3102 may provide temporary storage of data for the processor 3101. RAM is one kind of volatile memory. Volatile memory typically requires power to maintain its stored information. Storage 3103 provides computer storage for data, instructions, and/or arbitrary information. Non-volatile memory, which preserves data even when not powered and includes disks and flash memory, is an example of storage. Storage 3103 may be organized as a file system, database, or in other ways. Data, instructions, and information may be loaded from storage 3103 into volatile memory 3102 for processing by the processor 3101.
The computer 3100 may include peripherals 3105. Peripherals 3105 may include input peripherals such as a keyboard, mouse, trackball, video camera, microphone, and other input devices. Peripherals 3105 may also include output devices such as a display. Peripherals 3105 may include removable media devices such as CD-R and DVD-R recorders/players. Communications device 3106 may connect the computer 3100 to an external medium. For example, communications device 3106 may take the form of a network adapter that provides communications to a network. A computer 3100 may also include a variety of other devices 3104. The various components of the computer 3100 may be connected by a connection medium such as a bus, crossbar, or network.
Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “identifying” or “determining” or “executing” or “performing” or “collecting” or “creating” or “sending” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage devices.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the intended purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the method. The structure for a variety of these systems will appear as set forth in the description above. In addition, the present disclosure is not described with reference to any programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
This application claims the benefit of priority to U.S. Provisional Application No. 63/544,562, filed on Oct. 17, 2023, which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63544562 | Oct 2023 | US