Advances in biotechnology have generated large quantities of detailed information on patients' genetic risk for cancer, such as the presence of cancer-promoting germline mutations, and have produced highly sensitive blood tests that can detect tumor DNA circulating in the blood. While these developments have the potential to improve the detection of cancer at earlier stages at the microscopic level, medical imaging remains integral to localizing and characterizing observable changes at the macroscopic level. The current imaging paradigm for cancer screening and detection is to visually search for any signs of cancer. While screening for cancer (e.g., mammography for breast cancer) has been shown to reduce cancer-related mortality, interpretation of screening exams is imperfect. For breast imaging, for example, nationally one in eight cancers goes undetected by radiologists and 10 percent of all screening exams are called back for diagnostic workup, with a majority being false-positive results. Improving the interpretation of screening images, for example, mammograms, would minimize potential harms and enhance benefits to the population being screened. In recent years, machine learning, including deep learning, has increasingly been utilized in the analysis of medical images (e.g., image recognition, disease classification) as a result of increased computational power and the accumulation of big data.
It would be desirable to provide a system and method for image-based tissue classification that utilize advanced machine learning approaches combined with information from advanced biotechnology to predict the formation of cancer (e.g., determine a probability of cancer formation) before a distinct abnormality is observed. A determination of a prediction or probability of cancer formation for a given patient may be valuable for treatment planning and decision making regarding treatment.
In accordance with an embodiment, a method for tissue classification includes receiving at least two images associated with a patient, the at least two images being of a tissue, identifying a region of interest in the at least two images, analyzing the region of interest to identify changes in the tissue, generating a probability map of the region of interest based on the changes in the tissue, the probability map indicating a likelihood of formation of cancer in the tissue within a predetermined time period, and displaying the probability map on a display.
In accordance with another embodiment, a system for tissue classification includes at least one database and a preprocessing module. The preprocessing module is coupled to the at least one database and configured to receive at least two images associated with a patient, the at least two images being of a tissue, to identify a region of interest in the at least two images, and to analyze the region of interest to identify changes in the tissue. The system also includes a classifier coupled to the at least one database and the preprocessing module and configured to generate a probability map of the region of interest based on the changes in the tissue, the probability map indicating a likelihood of formation of cancer in the tissue within a predetermined time period.
The present disclosure describes a system and method for image-based tissue classification and quantitative prediction. The computer-vision based system and method is configured to analyze medical images (e.g., images obtained for detecting the presence of cancer or diagnostic images) in the context of known germline mutations that have been identified using molecular or genomic analysis such as whole genome sequencing (WGS) or whole exome sequencing (WES). The system and method may be used to analyze tissue in an organ to assess its favorability status for initiating, forming, or growing a cancer given a known genetic risk (e.g., alterations to specific genes such as BRCA1/2). For example, a quantitative prediction of the probability of the formation of cancer in a tissue may be generated based on an analysis of images of the tissue. It has been shown that multiple variegated biologic networks display abnormal behavior or dynamics before a cancer forms, and these abnormalities change microenvironmental tissue states so that a cancer cell may survive, "take root," and progress. Image analysis of microenvironmental tissue states may reflect the presence of abnormal biologic networks, i.e., cellular, tissue, organ, and systemic network abnormalities, and thus detect the early pre-cancer cell environment. The system and method described herein is configured to identify "pre-cancer" tissue status rather than to search for an already formed cancer as is done in current imaging paradigms for detecting cancer. In various embodiments, the system and method described herein is configured to identify static and dynamic tissue imaging characteristics that indicate a "fertile ground" for cancer development in the presence of known elevated genetic risks.
It may be advantageous for an individual with a known cancer-promoting germline mutation to assess the personal risk of cancer developing in certain tissues or organs. For example, a germline mutation such as BRCA1/2 increases the risk of breast cancer while also increasing the risk of cancer in other tissues (e.g., fallopian tube/ovarian). The system and method described herein may be valuable for clinical decision-making on the part of patients who have a baseline risk as determined by WGS or WES analysis (or some other technology) and complemented by relevant phenotype changes in the tissues at risk. For example, knowing a probability of cancer formation provided by the system and method for tissue classification may be valuable to patients considering drug treatment, such as aromatase inhibitors, to prevent or delay possible cancer formation. Given a genetic predisposition for cancer development, the goal of computationally analyzing tissue with no obvious abnormalities is to detect subtle changes that reflect early time-sequenced biologic network perturbations that may eventually lead to actual cancer formation; multiple sequential and parallel tissue factors must develop for a cancer to start, survive, and grow. Monitoring of unobvious changes in tissue may be referred to as the "countdown" to cancer.
To detect these tissue changes, a computer-based system may be used that incorporates machine learning (ML) and deep learning (DL) methods to detect subtle static and dynamic incremental tissue (or whole organ) imaging features that are difficult to detect with the human visual system. Machine and deep learning may be used to extract information from medical images. In an embodiment, the longitudinal nature of the data collected in images (e.g., detection or diagnostic images) may enable utilization of advanced machine learning approaches to analyze subtle changes in the appearance of an imaged region (e.g., breast parenchyma) that may predict the formation of cancer even before a distinct abnormality is observed. In an embodiment, an individualized probability map that visualizes the risk that observed changes in pixel or voxel values reflect malignancy within a specific time period may be generated and presented for a given patient. The probability map may be valuable to, for example, radiologists and referring physicians in determining how best to move forward with further diagnostic tests, particularly in patients with an observed mutation (e.g., BRCA1 positive). The probability map may also be useful for shared decision making.
In an embodiment, the system and method for tissue classification may be used to characterize tissue changes that presage formation of actual cancer. In another embodiment, the system and method for tissue classification may also be used to identify cancer. Salient imaging features (e.g., intensity profile, shape, texture) may be quantitatively analyzed and associated with changes in tissue characteristics (e.g., microstructure, metabolic status, physiologic status, cytoarchitecture, etc.) that indicate a pathway favorable to cancer formation. The system and method for tissue classification may be used to provide quantitative predictions for the formation of various types of cancer such as, for example, breast cancer, prostate cancer, liver cancer, pancreatic cancer, etc.
Pre-processing module 114 is configured to receive a selected set of images for a specific patient. The set of images for the patient includes at least two sequential images of a region (e.g., tissue(s) or whole organs) of interest associated with a selected cancer of interest. The pre-processing module 114 performs various processing steps on the set of images to generate difference image data (e.g., extracted features) that may be associated with or characterize changes in tissue characteristics, as discussed further below.
In an embodiment, classifier 102 is trained to classify, on a pixel level (for two-dimensional images) or voxel level (for three-dimensional images), the probability of cancer within time period t+i.
At block 206, the sets of sequential images or exams are matched by modality, description, and relevant clinical and molecular data to generate a plurality of subgroups of the sets of sequential images. As mentioned above, in an embodiment individual prediction models (i.e., a classifier 102) may be generated based on each subgroup of cases that shares similar characteristics. In one example for breast cancer, subgroups may be created for imaging modalities (e.g., 2D mammography versus 3D mammography), patients (e.g., younger patients <60 years old versus older patients ≥60 years old, who tend to have less dense breasts), and clinical/molecular information, whenever available, such as race/ethnicity and mutation status (e.g., BRCA1/2). The purpose of creating a classifier for each subgroup is to reduce the variability that exists among different modalities, patients, clinical data, and molecular data. Individual classifiers or models are trained on data from each subgroup (e.g., a classifier for younger women who are BRCA1/2 positive with heterogeneously or extremely dense breasts). The images or exams may also be matched by laterality and other positioning information to ensure a consistent field of view. At block 208, a set of sequential images associated with a selected subgroup is identified for use in training a classifier for the subgroup. The training process described in reference to blocks 208-216 below may be repeated for each subgroup to create a classifier for that subgroup.
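As an illustration of this matching step, below is a minimal sketch of how exams might be bucketed into such subgroups, assuming a tabular index of exams; the column names and values are hypothetical, not from the source.

```python
import pandas as pd

# Hypothetical exam index; column names and values are illustrative only.
exams = pd.DataFrame({
    "patient_id":  [101, 101, 102, 103, 103],
    "modality":    ["2D_mammo", "2D_mammo", "3D_mammo", "2D_mammo", "2D_mammo"],
    "age":         [48, 49, 63, 55, 56],
    "brca_status": ["BRCA1/2+", "BRCA1/2+", "negative", "unknown", "unknown"],
})

# Derive the age stratum described in the text: <60 versus >=60.
exams["age_group"] = exams["age"].apply(lambda a: "<60" if a < 60 else ">=60")

# Match exams by modality, age stratum, and mutation status; each resulting
# subgroup would supply the training data for its own classifier.
for key, group in exams.groupby(["modality", "age_group", "brca_status"]):
    print(key, "->", sorted(group["patient_id"].unique()))
```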
At block 210, difference image data is generated for each set of sequential images in the subgroup.
At block 306, each image in the selected set of images (e.g., the pixel intensity values for each image) is denoised and normalized. If an image is of sufficient quality, a denoising algorithm is applied to reduce acquisition-specific noise and enhance tissue contrast in a region of interest (e.g., the parenchymal region). In various embodiments, known denoising algorithms may be used. In one embodiment for breast imaging utilizing mammograms, a convolutional neural network consisting of 10 symmetrically arranged layers (5 convolutional encoder layers and 5 deconvolutional decoder layers) may be used for denoising and normalizing. Each layer is followed by a rectified linear unit (ReLU). The convolutional neural network (CNN) with perceptual loss is trained to map mammograms acquired at different compression forces and tube currents to a standardized value, essentially denoising and normalizing these images. Given that actual patient images at different tube currents and compressions are impractical to obtain, the network is trained using a physics-based simulation to generate multiple possible views of breast parenchyma under different acquisition parameters. Using the trained denoising CNN, mammograms that are acquired serially but perhaps with slight variations in acquisition settings are inputted into the model, and a normalized, denoised image is generated as the output. At block 308, a baseline image is selected from the set of sequential images, and the remaining sequential images in the set are registered to the baseline image (or exam). In an embodiment, the selection of a baseline image is arbitrary. In another embodiment, the baseline image is the earliest acquired image in the set of sequential images. The images may be registered to the baseline image using known methods.
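Below is a minimal PyTorch sketch of the symmetric 10-layer encoder-decoder denoising network described at block 306. The kernel sizes and channel widths are assumptions (the text specifies only the 5-convolutional/5-deconvolutional structure with ReLUs), and training with a perceptual loss against physics-simulated mammograms is omitted.

```python
import torch
import torch.nn as nn

class DenoisingCNN(nn.Module):
    """10-layer sketch: 5 convolutional (encoder) and 5 deconvolutional
    (decoder) layers, each followed by a ReLU, as described in the text.
    Channel widths and kernel sizes are illustrative assumptions."""
    def __init__(self, channels=(1, 32, 64, 64, 128, 128)):
        super().__init__()
        enc, dec = [], []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            enc += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True)]
        rev = channels[::-1]
        for c_in, c_out in zip(rev[:-1], rev[1:]):
            dec += [nn.ConvTranspose2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True)]
        self.encoder, self.decoder = nn.Sequential(*enc), nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingCNN()
noisy = torch.randn(1, 1, 256, 256)   # stand-in for a simulated mammogram patch
denoised = model(noisy)
print(denoised.shape)                 # torch.Size([1, 1, 256, 256])
```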
At block 310, segmentation of an organ or tissue of interest (or region of interest) is performed on each image in the set of sequential images. The organ or tissue of interest is based on the selected cancer type. Known methods for image segmentation may be used to segment the organ or tissue of interest. In an embodiment for breast imaging, an automated segmentation approach may be applied that utilizes adaptive thresholding to delineate the breast parenchyma and an iterative "cliff detection" approach to delineate the pectoral margin, separating the breast tissue from the pectoral muscle. Once segmented, the breast parenchyma region (foreground) may be resized to fit an image of fixed size (e.g., 1200×1200 pixels), and the background region is set to 0.
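An illustrative OpenCV sketch of the adaptive-thresholding segmentation and fixed-size resizing described above follows; the threshold parameters are assumptions, and the iterative "cliff detection" of the pectoral margin is not shown.

```python
import cv2
import numpy as np

def segment_breast(image: np.ndarray, size: int = 1200) -> np.ndarray:
    """Adaptive thresholding to delineate the parenchyma, keeping the largest
    foreground component, zeroing the background, and resizing to a fixed
    grid. Parameter values are illustrative assumptions."""
    img8 = cv2.normalize(image, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    mask = cv2.adaptiveThreshold(img8, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, blockSize=51, C=-2)
    # Keep the largest connected component as the breast region.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # skip background
    segmented = img8 * (labels == largest).astype(np.uint8)    # background -> 0
    return cv2.resize(segmented, (size, size), interpolation=cv2.INTER_AREA)

mammo = (np.random.rand(2294, 1914) * 4095).astype(np.float32)  # stand-in image
print(segment_breast(mammo).shape)                              # (1200, 1200)
```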
At block 312, an extraction process is performed on the segmented organ or tissue of interest (or region of interest) for each image to extract features per pixel in the segmented region and to generate difference image data, for example, a difference image for each pair of sequential images in the set of sequential images. For example, if there are three sequential images (image 1, image 2, and image 3) in the set of sequential images, two difference images are generated: one between image 1 and image 2, and one between image 2 and image 3. In an embodiment, the feature for each pixel is a quantitative representation of at least one underlying tissue characteristic. For example, as mentioned above, salient imaging features (e.g., intensity profile, shape, texture) may be quantitatively analyzed and associated with changes in tissue characteristics (e.g., microstructure, metabolic status, physiologic status, cytoarchitecture, etc.). The purpose of the extraction process is to generate features that best characterize observed differences between two sequential imaging scans in the set of sequential images. In an embodiment, rather than operating on the original (raw) image, the image may first be transformed into a different space that helps amplify features that have changed between the two serial scans. Various transformations may be used for this purpose. In an embodiment for breast imaging, the Phase Stretch Transform may be applied because of an interest in detecting textural differences in the breast parenchyma (e.g., structural alterations in breast tissue that may indicate environmental changes). In this embodiment, the input image is first transformed into the frequency domain using a 2D or 3D fast Fourier transform, depending on the imaging modality. A warped phase stretch transform is then applied to the image in this domain. The phase of the output image is then thresholded and postprocessed using morphological operators to enhance edge information within the image. Each image is transformed in the same manner. Taking two sequential transformed images, the difference between the two images is calculated and, for example, a difference image is generated. At block 314, it is determined whether the current set of sequential images being processed is the last set of sequential images in the subgroup. If not, the process returns to block 302 and another set of sequential images is selected and processed as described with respect to blocks 304-312. If the current set is the last set of sequential images in the subgroup, the difference image data (e.g., difference images) for each set of sequential images is stored, for example, in the sequential image database 106.
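As a sketch of this transform-then-difference step, the code below applies a simplified warped phase-stretch operator (frequency-domain phase kernel, thresholding of the output phase, morphological cleanup) to two registered scans and differences the results. The kernel form and all parameter values are illustrative assumptions, not the exact transform used here.

```python
import numpy as np
from scipy import ndimage

def phase_stretch(img: np.ndarray, strength: float = 0.48, warp: float = 12.0,
                  thresh: float = 0.003) -> np.ndarray:
    """Simplified warped phase-stretch edge operator: FFT, apply a
    frequency-dependent nonlinear phase kernel, take the phase of the
    output, threshold it, and clean it up morphologically."""
    h, w = img.shape
    u = np.fft.fftfreq(h)[:, None]
    v = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(u ** 2 + v ** 2)
    rmax = r.max()
    # Warped phase profile, normalized so its peak equals `strength`.
    phase = warp * r * np.arctan(warp * r) - 0.5 * np.log1p((warp * r) ** 2)
    phase *= strength / (warp * rmax * np.arctan(warp * rmax)
                         - 0.5 * np.log1p((warp * rmax) ** 2))
    out = np.fft.ifft2(np.fft.fft2(img) * np.exp(-1j * phase))
    edges = np.angle(out) > thresh          # threshold the output phase
    return ndimage.binary_opening(edges)    # morphological cleanup

# Difference between two serially acquired, transformed images.
scan_t0 = np.random.rand(256, 256)          # stand-ins for registered scans
scan_t1 = np.random.rand(256, 256)
difference = phase_stretch(scan_t1).astype(int) - phase_stretch(scan_t0).astype(int)
```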
In an embodiment, the hazard of a malignancy event for case m at time interval i is modeled as:

h(i|Wx_m+b) = h_0(i)exp(θ_m)

where h(i|Wx_m+b) is the hazard given the input features x_m for case m, h_0(i) is the baseline hazard at time interval i, and θ_m is the risk score produced by the network for case m:

θ_m = G(Wx_m+b)

where G is the activation function, W defines the coefficient weight matrix between the input and hidden layers, and b is the bias term for each hidden node. The probability of the event to be observed occurring with case m at time i yields the partial log likelihood:

l(β) = Σ_{C(m)=1} (θ_m − log Σ_{n: i_n ≥ i_m} exp(θ_n))

where l(β) is the partial log likelihood, C(m)=1 indicates the occurrence of a malignancy in a case, and n ranges over the set of cases where malignancy has not occurred before time interval i_m. The output of the model is the predicted probability of an event (diagnosis of malignancy) assigned to each pixel (or set of pixels) in the image.
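The expression for l(β) is the standard Cox partial log likelihood. A generic PyTorch sketch of using it as a training loss follows (not necessarily the exact implementation used here); the risk set {n : i_n ≥ i_m} is accumulated with a cumulative log-sum-exp over cases sorted by event time.

```python
import torch

def neg_cox_partial_log_likelihood(theta: torch.Tensor, time: torch.Tensor,
                                   event: torch.Tensor) -> torch.Tensor:
    """Negative Cox partial log likelihood matching l(beta) above: each case m
    with an observed malignancy (C(m) = 1) contributes theta_m minus the
    log-sum-exp of scores over cases still event-free at time i_m."""
    order = torch.argsort(time, descending=True)   # latest times first
    theta, event = theta[order], event[order]
    # At position m, the cumulative log-sum-exp covers {n : i_n >= i_m}.
    log_risk = torch.logcumsumexp(theta, dim=0)
    return -((theta - log_risk) * event).sum()

scores = torch.randn(8, requires_grad=True)               # theta_m from the network
times = torch.randint(1, 24, (8,)).float()                # follow-up time intervals
events = torch.tensor([1., 0., 1., 1., 0., 0., 1., 0.])   # C(m): malignancy observed
loss = neg_cox_partial_log_likelihood(scores, times, events)
loss.backward()
```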
Once a classifier has been trained, the classifier may be applied to a set of sequential images for a particular patient to generate a probability map.
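A minimal sketch of how such a per-pixel map might be rendered for review, with hypothetical stand-in arrays in place of real model output and images:

```python
import numpy as np
import matplotlib.pyplot as plt

prob_map = np.random.rand(1200, 1200)   # stand-in per-pixel predictions
baseline = np.random.rand(1200, 1200)   # stand-in registered baseline image

# Overlay the probability map on the baseline image; pixels with a higher
# predicted likelihood of malignancy within the chosen time period appear
# with stronger color.
plt.imshow(baseline, cmap="gray")
plt.imshow(prob_map, cmap="hot", alpha=0.4)
plt.colorbar(label="Predicted probability of malignancy within t+i")
plt.title("Per-pixel probability map (illustrative)")
plt.show()
```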
The computer system 500 may operate autonomously or semi-autonomously, may read executable software instructions from memory 506 or a computer-readable medium (e.g., a hard drive, a CD-ROM, flash memory), or may receive instructions via the input from a user or any other source logically connected to a computer or device, such as another networked computer or server. Thus, in some embodiments, the computer system 500 can also include any suitable device for reading computer-readable storage media. In general, the computer system 500 may be programmed or otherwise configured to implement the methods and algorithms described in the present disclosure.
The input 502 may take any suitable shape or form, as desired, for operation of the computer system 500, including the ability for selecting, entering, or otherwise specifying parameters consistent with performing tasks, processing data, or operating the computer system 500. In some aspects, the input 502 may be configured to receive data, such as imaging data, clinical data or molecular data. In addition, the input 502 may also be configured to receive any other data or information considered useful for implementing the methods described above. Among the processing tasks for operating the computer system 500, the one or more hardware processors 504 may also be configured to carry out any number of post-processing steps on data received by way of the input 502.
The memory 506 may contain software 510 and data 512, such as imaging data, clinical data and molecular data, and may be configured for storage and retrieval of processed information, instructions, and data to be processed by the one or more hardware processors 504. In some aspects, the software 510 may contain instructions directed to implementing one or more machine learning algorithms with a hardware processor 504 and memory 506. In addition, the output 508 may take any form, as desired, and may be configured for displaying images, patient information, probability maps, and reports, in addition to other desired information. Computer system 500 may also be coupled to a network 514 using a communication link 516. The communication link 516 may be a wireless connection, cable connection, or any other means capable of allowing communication to occur between computer system 500 and network 514.
Computer-executable instructions for tissue classification according to the above-described systems and methods may be stored on a form of computer-readable media. Computer-readable media includes volatile and nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), flash memory or other memory technology, compact disk ROM (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired instructions and which may be accessed by a system (e.g., a computer), including by internet or other computer network forms of access.
The present invention has been described in terms of one or more preferred embodiments, and it should be appreciated that many equivalents, alternatives, variations, and modifications, aside from those expressly stated, are possible and within the scope of the invention.
This application is based on, claims priority to, and incorporates herein by reference in its entirety U.S. Ser. No. 62/807,811 filed Feb. 20, 2019 and entitled “System and Method For Tissue Classification Using Quantitative Image Analysis of Serial Scans.”
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2020/019076 | 2/20/2020 | WO | 00
Number | Date | Country
---|---|---
62807811 | Feb 2019 | US