The disclosed technology pertains to a system and interface for early diagnosis of cancer and other pathologies based upon medical images.
Conventional approaches to diagnosing cancer and other diseases based on medical imaging may utilize combinations of manual and automated steps and are often performed across multiple distinct systems and health information environments, and with multiple users providing review, confirmation, annotation, and other inputs.
The varied approaches, systems, interfaces, and participants in such processes often result in lengthy wait times for output of results, inaccurate or incomplete results, and various inefficiencies. As an example, some diagnostic systems may automatically identify a lesion depicted by a medical image, and may perform various additional tasks based on such an identification (e.g., resource intensive additional analysis, image annotation and markup, distribution of images and diagnoses to clinicians or others associated with the patient). Such processes are often queue-driven and performed across multiple local and remote devices within a single information system, such that a user providing care or support to a patient at the point-of-care (“POC”) will often have limited or no visibility on the status of such tasks, and little ability to influence or intervene in the outcome if needed. Such processes are often also performed across information systems from multiple different parties, with images, information, and other input being passed to third parties via communication interfaces that provide minimal feedback or interactivity.
While timely and high-quality care is certainly possible with a workflow having such limitations, it is not uncommon to have breakdowns in the workflow that are difficult or impossible to address, and that can contribute to delays, inefficiencies, inaccuracies, and poor patient outcomes. As an example, where an automatic diagnosis performed early in the workflow erroneously segments or identifies a lesion, a number of additional steps and actions may be taken before the error is identified and corrected. In the meantime, the patient, clinician, and others involved in the patient care may have been waiting for results of a workflow that now needs to be re-performed, or may have begun to take action or provide care based on workflow output that is now in question.
What is needed, therefore, is an improved system and interface for early diagnosis of cancer and other pathologies.
Aspects of the invention relate to methods and systems for establishing annotation, measurement, phenotypical characteristics, and diagnosis of, or other medical predictions concerning, a lesion on a medical image.
In a method according to one aspect of the invention, one or more measurements and one or more phenotypical characteristics of a lesion are established using a first machine learning model operating on a machine segmentation of a lesion indicated in a medical image. Using the first machine learning model or a second machine learning model, a medical prediction concerning the lesion is provided using one or more of at least some of the measurements, at least some of the phenotypical characteristics, or features extracted from the machine segmentation.
In a practical embodiment of the method described above, a trained person would click on or otherwise indicate a lesion or a region containing a lesion using a graphical user interface (GUI). A machine segmentation model would produce a machine segmentation of the lesion using the indication from the GUI. The system would then automatically produce measurements, define phenotypical characteristics of the lesion, and provide the medical prediction. The phenotypical characteristics would typically be clinically-known characteristics of the sort used by trained persons to make clinical judgments and predictions about lesions of that particular type. The medical prediction could be a diagnosis (e.g., benign, malignant, or the particular type of lesion), a prediction concerning response to a particular treatment, a prediction concerning survival, a prediction concerning progression, etc.
Another aspect of the invention relates to systems implementing these methods. These systems would generally include a processor, such as a microprocessor, and memory, and would establish, e.g., the GUI used to obtain the indication of interest and to provide the output. Systems according to embodiments of the invention may be physically distributed and networked with one another, or they may be located at the same physical location.
Other aspects, features, and advantages of the invention will be set forth in the description that follows.
The invention will be described with respect to the following drawing figures, in which like numerals represent like features throughout the description, and in which:
Here, the term “medical image” refers to any kind of medical imagery, including computed tomography (CT) scans, magnetic resonance imaging (MRI) scans, positron emission tomography (PET) scans, and X-ray images. The term “medical image” also applies to applications of these types of modalities to specific body parts, as in the case of mammography and digital breast tomosynthesis. As those of skill in the art will understand, a medical image need not be the result of a single exposure or capture event. Rather, a single medical image may be a reconstruction or interpolation from a larger image dataset, e.g., a particular plane or “slice” from a helical CT scan. Moreover, the medical image used in method 10 need not necessarily be two-dimensional: in many cases, the medical image may be a three-dimensional image showing, e.g., some compartment or interior volume of the body, although two-dimensional “slices” or projections of that three-dimensional image may be used in particular tasks and for particular purposes.
The trained person may access a single medical image or a series or set of related medical images from a scan or study. Typically, “access” occurs by retrieving one or more medical images from a database or other storage medium and displaying them on a desktop computer, a tablet computer, a touchscreen display, etc. In clinical use, the database may be a Picture Archiving and Communication System (PACS) database; research and other non-clinical applications of method 10 may use a PACS database or a database or repository of some other sort.
As was noted above, the medical image includes one or more lesions. Here, the term “lesion” is used in the general sense to indicate any kind of injury to, or disease in, an organ or tissue that is discernible in a medical image, e.g., a lung nodule visible on a CT image. Any lesion may be either benign or malignant (e.g., a cancerous tumor). A trained person may do any number of things with such a medical image. For example, a radiologist might “read” a scan in traditional fashion and produce a radiology report.
A traditional radiology report might include information on the precise boundaries of any lesions, which this description refers to as annotation or annotations. A traditional report might also include clinical measurements of any lesions; an indication of characteristics related to the biology or appearance of the lesion, which this description refers to as the phenotype of a lesion; and an indication of a diagnosis, such as whether a particular lesion is benign or malignant.
Artificial intelligence (AI) predictive systems used in medicine can easily become “black boxes” that offer diagnoses, treatment plans, or predictions regarding a lesion with no overt reasoning or evidence provided to support the diagnoses, treatment plans, or other predictions that are offered. However, as will be described below in more detail, method 10 and systems that implement it can provide both medical predictions and information derived from those predictions, like treatment plans, alongside the same sort of information that would be found in a traditional radiology report, thus giving trained professionals a basis on which to evaluate any clinical prediction or recommendation that might be made.
Method 10 begins at task 12 and continues with task 14, in which an indication of interest is obtained from the trained person. In task 14 of method 10, the trained person indicates one or more lesions or other regions of concern in the medical image or series of medical images. The method by which this is done is not critical. For example, the trained person may click on the lesion in a graphical user interface using a mouse or trackpad, tap on the lesion if a device with a touchscreen is being used, circle the lesion or mark it in some other way with a stylus, etc. While the method by which the trained person indicates lesions in the medical image is not critical, it is advantageous if that method is quick and convenient, because method 10 preferably requires as little of the trained person's time as possible.
The location or locations at which the trained person clicked or otherwise made an indication are recorded as the indication of interest. An indication of interest may comprise multiple points from multiple lesions or regions. The indication of interest essentially describes structures or regions that are of concern to the trained person and will be used to make one or more medical predictions in later tasks of method 10. In some cases, once the indication of interest is obtained, it may be displayed or overlaid on the medical image so that the trained person can confirm that the input was correctly received, but that need not be the case in all embodiments.
Typically, the indication of interest is stored numerically as a set of two-dimensional or three-dimensional coordinates. Those coordinates may be expressed in any useful frame of reference, e.g., relative to the medical image itself (i.e., the pixel or voxel coordinates of the indication of interest in the medical image), relative to an organ or anatomical feature in the medical image, or relative to some other point of origin. If method 10 is operating on a set of medical images, the indication of interest would typically also include an indication of the image to which it corresponds.
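As a purely illustrative sketch of such storage (the record and field names here are hypothetical, not part of the invention), an indication of interest might pair the clicked pixel or voxel coordinates with an identifier of the image to which they correspond:

```python
from dataclasses import dataclass, field


@dataclass
class IndicationOfInterest:
    """Hypothetical record of a trained person's clicks: the image the
    clicks refer to, plus the clicked points in pixel/voxel coordinates."""
    image_id: str  # which image in the scan, series, or study
    points: list = field(default_factory=list)  # (row, col) or (row, col, slice)


# A single click at pixel (128, 210) on one slice of a series
ioi = IndicationOfInterest(image_id="series-001/slice-042", points=[(128, 210)])
```

Multiple lesions or regions would simply contribute additional entries to the point list, consistent with the indication of interest comprising multiple points.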
With respect to method 10 of
However, a set of coordinates alone may not be a suitable input for some segmentation models, and the indication of interest may be expressed in any number of ways. For example, an image-like matrix could be constructed that weights each pixel in accordance with metrics relevant to the likelihood that that pixel is a part of the lesion to which the indication of interest pertains, such as the Euclidean distance from the point or points that were clicked, or some other metric.
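One minimal sketch of the Euclidean-distance weighting described above: build an image-sized matrix in which each pixel's value is its distance to the nearest clicked point, so the map is zero at a click and grows away from it. (The function name and plain-list representation are illustrative only; a practical system would likely use an array library.)

```python
import math


def distance_map(height, width, clicks):
    """Image-like matrix weighting each pixel by its Euclidean distance to
    the nearest clicked point; smaller values mean closer to a click."""
    return [
        [min(math.hypot(r - cr, c - cc) for cr, cc in clicks)
         for c in range(width)]
        for r in range(height)
    ]


# A single click at pixel (2, 3) in a 5x5 image
dm = distance_map(5, 5, [(2, 3)])
```

Other weightings, such as an inverted or Gaussian-decayed distance, could be substituted without changing the overall scheme.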
Segmentation models often use neural networks, such as convolutional neural networks (CNNs) and vision transformers, and those neural networks usually require input in the form of an image. In those cases, a distance map could be encoded as an additional image channel or layer, or as a separate image. Other types of segmentation models that do not rely on neural networks could also be used, including active contour and region-growing approaches.
In the illustrated embodiment, method 10 uses a U-net, a CNN architecture that is particularly suited for image segmentation tasks.
As shown in
With respect to
As indicated in
Once the segmentation is established in task 18, clinical measurements may be taken, as shown in task 22 of method 10. For convenience in illustration and explanation,
In general, measurement may include automatic computation of a straightforward clinical measure of lesion diameter, as well as short axis, area, volume, basic shape attributes, and other measurements. Measurement may also include the computation and/or extraction of complex measurements or features that cannot be calculated manually. This type of operation is referred to as radiomics.
In the field of radiomics, large amounts of quantitative data are extracted from medical images, such as CT, MRI, and PET scans, as well as classic X-ray images. This quantitative data is in the form of tens, hundreds, or even thousands of individual quantitative image features. Some of those features, like the size and shape of a lesion, are straightforward and would be understandable to any clinician. Other features, for example, relating to the texture of the image in and around the lesion, are less interpretable, or uninterpretable, to human eyes. The radiomic features are used with machine learning techniques to make medical predictions.
Examples of features that may be extracted and used include histogram features, textural features, filter- and transform-based features, and size- and shape-based features, including vessel features. As will be described below in more detail, vessel features can be considered to be a special case of size- and shape-based features. The classification of various radiomic features may vary depending on the authority one consults; the categories used here should not be considered a limitation on the range of features that could potentially be used.
Histogram features use the global or local gray-level histogram, and include gray-level mean, maximum, minimum, variance, skewness, kurtosis, etc. Measures of energy and entropy may also be taken as histogram or first-order statistical features. Texture features explore the relationships between voxels, and include features derived from the gray-level co-occurrence matrix (GLCM), the gray-level run-length matrix (GLRLM), the gray-level size zone matrix (GLSZM), and the gray-level distance zone matrix (GLDZM). Co-occurrence of local anisotropic gradient orientations (COLLAGE) features are another form of texture feature that may be used. (See P. Prasanna et al., “Co-occurrence of local anisotropic gradient orientations (collage): distinguishing tumor confounders and molecular subtypes on MRI,” in Int'l Conf. on Med. Image Computing and Computer-Assisted Intervention, pp. 73-80 (Springer, 2014).) Filter- and transform-based features include Gabor features, a form of wavelet transform, and Laws features.
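The first-order histogram features named above can be sketched directly from a region's gray levels. The following is a minimal illustration (not the invention's implementation) using standard population definitions of variance, skewness, kurtosis, and histogram energy/entropy:

```python
import math
from collections import Counter


def histogram_features(gray_levels):
    """First-order (histogram) features of a region's gray-level values."""
    n = len(gray_levels)
    mean = sum(gray_levels) / n
    var = sum((g - mean) ** 2 for g in gray_levels) / n
    std = math.sqrt(var)
    skew = sum((g - mean) ** 3 for g in gray_levels) / (n * std ** 3) if std else 0.0
    kurt = sum((g - mean) ** 4 for g in gray_levels) / (n * var ** 2) if var else 0.0
    # Energy and entropy from the normalized gray-level histogram
    probs = [c / n for c in Counter(gray_levels).values()]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return {
        "mean": mean, "min": min(gray_levels), "max": max(gray_levels),
        "variance": var, "skewness": skew, "kurtosis": kurt,
        "energy": energy, "entropy": entropy,
    }


feats = histogram_features([10, 10, 20, 20, 30, 30, 30, 40])
```

Texture features such as GLCM-derived measures build on the same gray levels but additionally encode the spatial arrangement of voxel pairs, which this first-order sketch deliberately ignores.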
Vessel features, i.e., features of the blood vessels in the peri-lesional region, may be used, including measures and statistics descriptive of vessel curvature and vessel tortuosity. (See, e.g., Braman, N., et al., “Novel Radiomic Measurements of Tumor-Associated Vasculature Morphology on Clinical Imaging as a Biomarker of Treatment Response in Multiple Cancers.” Clin. Cancer Res. 28 (20), pp. 4410-4424, (October, 2022).) Transform-based approaches to characterizing vessel features like curvature and tortuosity may also be used, such as Vascular Network Organization via Hough Transform (VaNgOGH). (See, e.g., Braman, N. et al. “Vascular Network Organization via Hough Transform (VaNgOGH): A Novel Radiomic Biomarker for Diagnosis and Treatment Response” in Medical Image Computing and Computer Assisted Intervention—MICCAI 2018 (eds. Frangi, A.F., et al.), pp. 803-811 (Springer, 2018).) As noted above, vessel features can be considered to be a special case of size- and shape-based features, considering the size and shape of the vessels, rather than the lesion. If vessel features are to be used, a segmentation of the vessels around the lesion may be performed in task 18 of method 10 to identify the vessels. Additional steps may also be taken, like the use of a fast-march algorithm to identify the centerlines of the vessels and steps to connect disconnected vessel portions, before extracting features from the vessels.
Once measurements are derived and/or extracted from the segmented image in task 22, method 10 continues with task 24. As was indicated above, radiomic features of the segmented lesion may be specifically extracted and used to make medical predictions. However, in many embodiments, it may not be necessary to perform a traditional radiomic feature extraction. Instead, post-attention deep features extracted from the U-net 102 may be sent to a machine learning model to make medical predictions, to establish semantic phenotype characteristics for a lesion, and for other purposes. This is the purpose of task 24 of method 10.
The “deep features” extracted in task 24 are essentially compressed, filtered versions of the medical image that have been through multiple trained convolution and downsampling operations, and thus have a reduced dimensionality. In the schematic of
Method 10 continues with task 26, in which semantic phenotype traits are derived using a machine-learning model. As shown in
In general, the information should be presented in a way that would be immediately understood by a trained person, and where an established clinical scale or metric for a particular trait is commonly used, the information may be presented using that established clinical scale or metric. The semantic phenotypical characteristics that are established by the machine learning model 136 may vary from embodiment to embodiment, and may be any characteristics that might be considered by a trained person, or any consistent characteristic that can be established and presented by a machine learning model 136.
In this embodiment, the machine learning phenotyping model 136 is a machine learning classifier. Any type of classifier may be used, depending on the nature of the medical prediction that is to be made, the nature of the features, and other factors. The classifier may be, e.g., a logistic regression or Cox proportional hazards model, a linear discriminant analysis (LDA) classifier, a quadratic discriminant analysis (QDA) classifier, a bagging classifier, a random forest classifier, a support vector machine (SVM) classifier, a Bayesian classifier, a Least Absolute Shrinkage and Selection Operator (LASSO) classifier, etc. A trained neural network may also serve as a classifier.
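At inference time, a logistic-regression-style classifier of the kind listed above reduces to a weighted sum of features passed through a sigmoid. The sketch below is illustrative only: the weights and bias would come from training, and the feature values stand in for deep features extracted from the segmentation model.

```python
import math


def phenotype_score(features, weights, bias):
    """Logistic-regression-style scoring: map a feature vector to a
    phenotype score in [0, 1]. Weights and bias are learned in training."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes to (0, 1)


# Hypothetical 3-feature input with hypothetical trained parameters
score = phenotype_score([0.2, -1.1, 0.7], [1.5, 0.3, -0.8], 0.1)
```

A different classifier from the list (SVM, random forest, etc.) would replace the scoring function but consume the same feature vector, which is what makes the choice of classifier an interchangeable design decision here.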
In this embodiment, the classifier is trained to make predictions (i.e., establish phenotype scores) using the deep features extracted from the U-net 102. However, the machine learning model 136 may be trained to make predictions based on any combination of features. For example, if radiomic features are extracted in task 22 of method 10, the machine learning model 136 may be trained to use some combination of deep features derived from the image segmentation process and radiomic features that are separately extracted from the segmented medical image.
As shown in
The risk score or medical prediction established in task 30 may be made by the same machine learning model 136 used to establish the phenotype characteristics, or it may be made by another machine learning model. That machine learning model may use any combination of deep features extracted from the segmentation process, radiomic features extracted from the medical image, general clinical or demographic information on the patient, or any other available information to make a prediction. In
Once all of the phenotype characteristics and metrics have been calculated, method 10 proceeds with task 30, and the annotation, measurements, phenotype, and medical prediction or predictions are output. This may be done in any number of ways. For example, in some embodiments, the output may be a written or printed output, or a textual report that is output to the trained person or, if system 100 and method 10 are being used clinically, stored in an electronic medical record (EMR) system for the particular patient.
Frequently, the results of a method like method 10 will be displayed in a graphical user interface that is either separate from or integrated into another system, like a radiological information system.
In general, method 10 and system 100 may be used to monitor a patient's progress, to confirm the efficacy of treatment, to predict the occurrence of side effects, or to monitor over time for recurrence. These are all potential “live clinical” uses. However, method 10 and system 100 may also be used to check the accuracy of diagnoses offered by human radiologists and oncologists, to search for evidence of malignancy before human eyes can find it, and for general research purposes.
As may be clear from the above, method 10 and system 100 generate many different types of information. However, not all of that information need be presented at one time or in a particular interface. Moreover, in many cases, the interface will be dynamic, presenting contextually important information as needed. For example, as shown in
In
Method 10 returns at 32.
The description of method 10 above assumes a single prediction based on a single two-dimensional segmentation of a lesion from a single medical image acquired at a single timepoint. However, methods according to embodiments of the invention are not limited to this. As was described briefly above, method 10 and other methods according to embodiments of the invention may be applied to a set of medical images acquired at the same time or at different times. If method 10 and other methods are applied to sets of medical images acquired at different times, then these methods and systems may be used for longitudinal monitoring. That is, method 10 and other methods and systems like it may be used to track and determine a lesion's response to treatment over time. If a system according to an embodiment of the invention is used for longitudinal monitoring, it may take into account all available medical images of the lesion over time. As new scans are taken and new medical images become available, the methods and systems may present the same medical predictions, updated or revised to include the new data, they may offer medical predictions of a different type that are clinically and contextually appropriate, or they may offer a mix of updated original and new, contextually-appropriate medical predictions.
In some cases, the medical predictions that are offered may be based, at least in some part, on the measurements that are taken. For example, if the measurements made during the annotation and measurement steps indicate that the lesion has grown, then one of the medical predictions may concern whether or not that apparent growth is true progression or hyperprogression. Similarly, if the measurements indicate that the lesion has progressed, the methods and systems may offer one or more medical predictions concerning whether or not the lesion is likely to benefit from some alternative treatment.
The description of method 10 presents its tasks in a certain order for ease of explanation. The tasks need not necessarily be performed in the described order. For example, once a segmentation of the medical image is established, multiple tasks may be performed essentially in parallel. Moreover, certain tasks are described as being performed by certain types of machine learning models, but the nature of the model used for any particular task may vary greatly from embodiment to embodiment.
In the above embodiment, the U-net 102 used to segment the medical image is a deep learning model, a type of CNN. Other types of neural networks, like vision transformers, could also be used. Segmentation models that do not rely on neural networks could also be used, including thresholding, active-contour, and region-growing approaches. If the segmentation model does not use deep learning, extracted radiomic features could be used for medical prediction instead of features extracted from a deep learning model.
Deep learning could be used to generate more, or even all, of the necessary measurements and predictions. For example, the measurement model 134, phenotyping model 136, and risk score model 138 could all be deep learning models, like fully-connected CNNs. If deep learning models are used for these components 134, 136, 138, it may be necessary to encode the deep features from the U-net 102 in forms that the other deep learning models can use. For example, image-based deep features may be encoded in a vector form for input to deep learning models that do not take image input.
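The vector encoding mentioned above can be as simple as raveling each two-dimensional feature map into one flat list. The helper below is a hypothetical illustration of that re-encoding step, not a required component:

```python
def flatten_feature_maps(feature_maps):
    """Encode image-like deep features (a list of 2-D channel maps) as a
    single flat vector for models that accept only vector input."""
    return [v for fmap in feature_maps for row in fmap for v in row]


# Two hypothetical 2x2 feature channels flattened into one 8-element vector
vec = flatten_feature_maps([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
```

In practice, pooling (e.g., taking each channel's mean or maximum) is a common alternative to raw flattening when the downstream model expects a compact, fixed-length vector.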
The use of multiple deep-learning machine models presents other opportunities as well. In the embodiment described above, the U-net 102 and any other machine learning models 134, 136, 138 would often be trained separately. However, it is possible to use multi-task learning to train multiple machine models at the same time. That is, each machine learning model 102, 134, 136, 138 has a loss function associated with it. It is possible, and in some embodiments, it may be desirable, to train several machine learning models at once, i.e., to simultaneously optimize more than one loss function.
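A common way to optimize several loss functions at once is to minimize a weighted sum of the per-task losses. The sketch below illustrates only the combination step; the loss values and task weights are hypothetical placeholders for the segmentation, measurement, phenotyping, and risk-score objectives:

```python
def multitask_loss(task_losses, task_weights):
    """Combine per-task losses (e.g., segmentation, measurement,
    phenotyping, risk score) into one jointly optimized objective."""
    return sum(w * l for w, l in zip(task_weights, task_losses))


# Hypothetical per-task losses and weights; weights are tuning knobs that
# balance how strongly each task drives the shared parameters.
total = multitask_loss([0.40, 0.25, 0.10, 0.05], [1.0, 0.5, 0.5, 2.0])
```

Gradient descent on this combined objective updates shared parameters (such as the segmentation backbone) using signal from all tasks simultaneously, which is the essence of multi-task learning.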
As those of skill in the art will understand, machine models must be trained before they can provide segmentations, predictions, and other kinds of useful output. Various training techniques could be used. Training the machine learning models 102, 134, 136, 138 described here would typically use a dataset of medical images of the desired type (CT, MRI, PET, etc.) with pathologically-confirmed diagnoses. For example, data from The Lung Image Database Consortium (LIDC) may be suitable for at least some uses, and the private image collections of hospitals or radiology practices could be used in other cases. The images should be sufficiently numerous and diverse in patient demographics, lesion type, phenotypical characteristics, and outcomes as to give the machine learning models 102, 134, 136, 138 sufficient exposure to a variety of different types of situations. In a typical arrangement, the available training data would be divided into two cohorts: a first cohort of data would be used for initial training, and a second cohort of data would be used for validation prior to deployment of any system 100. If necessary, the models 102, 134, 136, 138 could be retrained with adjustments until some defined performance metric is met, such as the area under the curve (AUC) of the receiver operating characteristic curve (ROC) for the model 102, 134, 136, 138.
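The AUC metric mentioned above has a simple empirical form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. A minimal sketch of that computation, for illustration only:

```python
def roc_auc(labels, scores):
    """Empirical ROC AUC: fraction of positive/negative score pairs in
    which the positive outranks the negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Perfect separation of two positives from two negatives gives AUC = 1.0
auc = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.4])
```

An AUC threshold on the held-out validation cohort is one natural "defined performance metric" for deciding whether the models need retraining before deployment.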
In this description, unless the term “model” is qualified in such a way as to indicate its nature (e.g., a machine learning model, deep learning model, etc.), the term should be interpreted more broadly. For example, a nomogram is a type of model that may be used in and with embodiments of the invention. Nomograms may be used with the medical predictions generated by method 10 and output as a part of descriptive clinical reports or in other ways. A nomogram might, for example, be used to compare or combine the predictions from two or more machine learning models. A simple, non-trainable combination strategy leveraging features of import should also be considered a model. For instance, the averaging or summing of several key features to derive an aggregate score would be an example of a simple model.
Any of the methods described here may be implemented as one or more sets of machine-readable instructions on a machine-readable medium that, when executed, cause a machine to perform the method. The methods may also be implemented as a system of interconnected computing components. For example, the GUI used by the trained person to obtain the indication of interest may be a local computing device, while the computers or devices used to perform the other computations described here may be remote or “cloud-based” machines that are connected by a network, such as the Internet. Some or all portions of methods according to embodiments of the invention may also be performed by embedded systems. For example, the capability of creating an appropriate image segmentation of a medical image may be built into a medical imaging device or another such machine.
As those of skill in the art will appreciate, the tasks of a method like method 10 need not all happen in a continuous sequence, and in some cases, certain or all tasks may be fully automated. For example, all medical images described in metadata as being of a particular type (e.g., lung CT images) may be automatically segmented on entry to a PACS database, or shortly thereafter, with the remainder of the tasks of method 10 performed only on demand by a trained person. In yet other embodiments, methods like method 10 may be run in fully automated fashion, with images automatically segmented, lesions automatically identified, measurements and phenotypical characteristics established, and predictions made without an explicit indication of interest from a trained person. In those cases, a finished report on any identified lesions may simply be sent to, and saved in, an EHR system.
While several different forms of GUI output are illustrated in the figures, the output used in any particular method or system according to an embodiment of the invention may differ. Not all embodiments or implementations need use all of the output generated by method 10 and methods like it. As one example, the output of method 10 may be used to create a pop-up alert in an EHR system that a particular patient is unlikely to respond to a particular treatment, like an immune checkpoint inhibitor. Any treatment or drug recommendations may be used as alerts, prompts, or guidelines in a computer physician order entry system or in a pharmacy information system.
An implementation of a single-click annotation, measurement, phenotyping, and diagnosis (AMPD) system was constructed for lung cancer screening. The AMPD system was validated on 3,073 nodules from 1,394 patient CT scans.
Methods and Materials: AMPD included two components: a click-based annotation and measurement (AM) module and a deep phenotyping and diagnosis (PD) module. AM was a click-based, pan-cancer deep learning segmentation model trained with 32,735 lesions from 4,427 patients in the DeepLesion dataset. The AM module was validated on screening CTs of 851 patients with 2,530 nodules from the Lung Image Database Consortium (LIDC) dataset. Nodules were annotated and measured by 4 radiologists, and rated with respect to 9 phenotypic properties: suspicion of malignancy, texture, spiculation, lobulation, margin definition, sphericity, calcification, internal structure, and subtlety. A multitask PD model was trained to predict phenotypic attributes and overall diagnosis using deep features from the AM model in this dataset. The approach was evaluated end-to-end on a subset of the longitudinal National Lung Screening Trial (NLST) study of patients who had nodules >4 mm present in their first screening exam: 94 were diagnosed with lung cancer on a subsequent CT scan, while 152 had stable nodules (<1.5 mm growth between exams) for 3 consecutive years. Diagnostic performance was assessed at time of diagnosis and one year prior. Clicks were simulated by selecting a random point within the middle 50% of a radiologist-defined lesion annotation (LIDC) or bounding box (DeepLesion, NLST).
Results: Within LIDC, the AM module showed strong alignment with the consensus of manual radiologist annotations (Dice=0.81) and diameter measurements (intraclass correlation coefficient=0.86, p<1e-10, measurement error=1.20±2.99 mm). Deep feature phenotype scores from the PD model were significantly correlated with radiologist ratings (Spearman correlation=0.06-0.62, p≤0.004). Leveraging this information, the PD model strongly predicted malignancy both at time of diagnosis (AUC=0.81) and one year prior (AUC=0.73). Lesion diameter was less predictive than PD (AUC=0.79 and AUC=0.68, respectively) and did not improve PD's performance when added to the model (AUC=0.83, AUC=0.74).
From a single click, the described implementation of the AMPD system produced high-quality annotations and measurements that strongly aligned with the consensus of expert readers, and, furthermore, generated interpretable diagnostic predictions that can predate a clinical finding of malignancy. Following additional prospective multi-site validation, implementations of an AMPD system could both streamline traditional lung screening protocols (e.g., Lung-RADS v1.1) and identify malignancy sooner.
While the invention has been described with respect to certain embodiments, the description is intended to be exemplary, rather than limiting. Modifications and changes may be made within the scope of the invention, which is defined by the appended claims.
This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/426,098, filed Nov. 17, 2022. The contents of that application are incorporated by reference herein in their entirety.