In the field of cancer treatment in general, neoadjuvant therapy (NAT), the administration of therapeutic agents prior to surgery, can be very successful in downstaging tumors, making it feasible to conduct tissue-conserving surgery rather than more drastic surgical options. In breast cancer, for example, which is one of the most common cancers in women, NAT may significantly reduce the amount of normal breast tissue that would otherwise be removed during surgery.
After NAT is performed, the amount of cancerous tissue that remains—the residual cancer burden, or residual tumor burden—is an indicator of the effectiveness of NAT in that individual case, and it has been found to be a useful prognostic indicator of long-term survival. The “gold standard” for assessing residual tumor is pathological examination of tissue sections to assess tumor cellularity, defined as the proportion (usually expressed as a percentage) of the cells visible in the tissue sections that are cancerous rather than normal.
In current clinical practice, tumor cellularity (which may be referred to elsewhere in this disclosure simply as “cellularity”) is estimated manually, by a pathologist examining a patient's tissue sample after it has been sectioned and stained (typically with hematoxylin and eosin, or H&E).
The pathologist can then study one or more regions of the tissue section on the slide, comparing the proportion of residual tumor bed area containing cancer with standard cellularity references, such as set 200 shown in
The pathologist can then compare the appearance of each viewed tissue section with the reference set to come up with an estimate of cellularity, in terms of the percentage of area occupied by tumor cells relative to normal cells in that section. In the
Such manual estimations of cellularity, each involving a series of visual comparisons or matching tasks, are time consuming. Moreover, the quality and reliability of estimations made manually by different people—different raters—is, of course, subject to inter-rater variability, which can degrade the prognostic power of the estimations, in NAT trials and in regular patient care.
There is, therefore, a need for time-efficient methods of estimating cellularity that are less manual, and less dependent on the individual pathologist's skill and consistency. Such methods would ideally take advantage of technical advancements in digital pathology to save time, reduce the influence of human error, increase inter-observer agreement, and thus improve reproducibility and diagnosis accuracy.
Embodiments generally relate to systems and methods for automatically estimating cellularity in digital pathology slide images. In one embodiment, a method comprises: extracting patches of interest from the digital pathology slide image; operating on each of the extracted patches using a trained first deep convolutional neural network (DCNN) to classify that patch as either normal, having an estimated cellularity of 0%, or suspect, having a cellularity roughly estimated to be greater than 0%; operating on each of the suspect patches using a second DCNN, trained using a deep ordinal regression model, to determine an estimated cellularity score for that suspect patch; and combining the estimated cellularity scores of the patches of interest to provide an estimated cellularity for the digital pathology slide image at a patch-by-patch level.
In another embodiment, a method of training first and second deep convolutional neural networks to automatically estimate cellularity of digital pathology slide images comprises: training the first deep convolutional neural network (DCNN) to classify each of a first plurality of training digital pathology images input to the first DCNN as either normal, having an estimated cellularity of 0%, or suspect, having a roughly estimated cellularity greater than 0%; and training the second DCNN to estimate a cellularity score of each of a second plurality of training digital pathology images, using an ordinal regression model.
In yet another embodiment, an apparatus for automatically estimating cellularity in a digital pathology slide image comprises: one or more processors; and logic encoded in one or more non-transitory media for execution by the one or more processors. When the logic is executed, it is operable to: extract patches of interest from the digital pathology slide image; operate on each of the extracted patches using a trained first deep convolutional neural network (DCNN) to classify that patch as either normal, having an estimated cellularity of 0%, or suspect, having a cellularity roughly estimated to be greater than 0%; operate on each of the suspect patches using a second DCNN, trained using a deep ordinal regression model, to determine an estimated cellularity score for that suspect patch; and combine the estimated cellularity scores of the patches of interest to provide an estimated cellularity for the digital pathology slide image at a patch-by-patch level.
A further understanding of the nature and the advantages of particular embodiments disclosed herein may be realized by reference to the remaining portions of the specification and the attached drawings.
Embodiments described herein are directed to the automatic estimation of cellularity in digital pathology slide images. The present invention uses a DPI AI platform to perform the estimation, trained and operational in accordance with the methods and systems described below.
Each of the extracted patches may be considered as roughly corresponding to a microscope field image of tissue on a slide (like slide 141, for example) in prior art methods such as that of
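The patch extraction step described above can be illustrated with a short sketch. The tiling function below, its patch size, and its background threshold are all illustrative assumptions, not values from the disclosure; it simply tiles a slide image into non-overlapping patches and discards patches that are mostly white background, which is the kind of simple background removal the extraction step implies.

```python
import numpy as np

def extract_patches(slide, patch_size=224, background_thresh=0.9):
    """Tile a slide image (H, W, 3), values in [0, 1], into non-overlapping
    patches and keep only those that appear to contain tissue.

    A patch whose mean intensity exceeds `background_thresh` is treated as
    mostly white background and discarded. Both the patch size and the
    threshold are illustrative choices, not values from the disclosure.
    Returns a list of ((row, col) origin, patch array) pairs.
    """
    h, w, _ = slide.shape
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = slide[y:y + patch_size, x:x + patch_size]
            if patch.mean() < background_thresh:  # not mostly background
                patches.append(((y, x), patch))
    return patches
```

In a real pipeline the slide would be read from a whole-slide image file at a chosen magnification; here a plain array stands in for that input.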
The next action is feeding one patch at a time into deep convolutional neural network (DCNN) 330. DCNN 330 has been previously trained using relatively conventional AI image classification methods to operate on any input patch to classify it as either normal, meaning that it has an estimated cellularity of 0%, or suspect, meaning that it has a cellularity that is estimated, by a relatively rough estimation process, to be greater than 0%. When trained DCNN 330 is subsequently used in system 300 and receives an input patch, it will produce an output at OPA if it determines the patch is normal but will send the patch on through output OPB if it determines the patch is suspect. It has been found experimentally that while a trained DCNN such as DCNN 330 can be relied on not to mistakenly classify patches of greater than 0% cellularity as normal, it may mistakenly classify a few normal patches as being suspect, meaning that a small fraction of patches in stream OPB will actually also be of 0% cellularity, i.e., normal. Hence, the term “suspect” is a more accurate term than “cancerous” to describe the OPB patches.
In the routine operation of system 300, the OPB stream feeds the suspect patches, one at a time, into DCNN 340, which has been previously trained using a deep ordinal regression model (which will be described in more detail below) to operate on any input patch to estimate a corresponding cellularity score in the range of 0% to 100%. When trained DCNN 340 is subsequently used in system 300 and receives a stream of suspect input patches through OPB, it will estimate a cellularity score for each patch, and produce an output stream at OPC with a corresponding estimated cellularity score for each patch.
Cellularity scores can thus be provided for all the input patches, and a detailed analysis across the entire whole slide image 310 performed.
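The two-stage operation just described can be sketched in a few lines. The two model functions are stand-ins for the trained DCNNs 330 and 340 and are passed in as callables; averaging the per-patch scores into a slide-level estimate is an illustrative choice, since the disclosure leaves the combining step open.

```python
def estimate_slide_cellularity(patches, classify_patch, score_patch):
    """Two-stage cellularity estimate over a list of patches.

    classify_patch(patch) -> "normal" or "suspect"  (stands in for DCNN 330)
    score_patch(patch)    -> cellularity in 0..100  (stands in for DCNN 340)

    Normal patches are assigned 0% directly; only suspect patches are sent
    to the second, ordinal-regression-trained model. Simple averaging of
    the per-patch scores is an assumption, not the disclosure's rule.
    """
    scores = []
    for patch in patches:
        if classify_patch(patch) == "normal":
            scores.append(0.0)                 # first DCNN: 0% cellularity
        else:
            scores.append(score_patch(patch))  # second DCNN: ordinal score
    slide_score = sum(scores) / len(scores) if scores else 0.0
    return scores, slide_score
```

Keeping the per-patch scores alongside the slide-level figure is what enables the patch-by-patch analysis across the whole slide image.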
At step 430, a trained DCNN such as that shown as 330 in
When step 435 finds a patch classified as suspect (which, as noted above, has a small chance of actually being normal), that patch is operated on by a second DCNN which, unlike the DCNN used at step 430, has been trained by an ordinal regression model technique, which does not involve separately assigning different patches into different categories on the basis of apparent cellularity. Instead, this second DCNN operates, at step 445, to estimate a specific cellularity score for each suspect patch, and that cellularity score is provided as an output at step 450.
The training of the two DCNNs used in embodiments of the present invention is done with the use of training DPI images of similar size and image quality to the image patches the DCNNs will later be expected to process.
In some embodiments, different sets of training images are used for DCNNs to be used for classifying images as normal or suspect (meaning DCNNs serving the function of DCNN 330) than for DCNNs to be used for cellularity score estimations (meaning DCNNs serving the function of DCNN 340), possibly with a larger proportion of images of 0% cellularity in the former set than in the latter. In other embodiments, the same set of training images may be used for both types of DCNN.
In training the first DCNN of a system of the present invention, like DCNN 330 in system 300, a relatively straightforward iterative process is involved, of a type well known to one of skill in the art. As the goal of this DCNN is simply to classify images as either normal (0% cellularity) or suspect (>0% cellularity), an initial classification of each image of the training set can be compared with a rough classification based on ground truth for that image—or the seed image from which that image was derived—and the weighting and connectivity parameters of the internal DCNN backbone then iteratively adjusted until an acceptable matching rate is achieved.
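The iterative predict-compare-adjust loop described above can be made concrete with a deliberately reduced stand-in: a single logistic unit trained by gradient descent on per-image feature vectors. The real system adjusts the weights of a full DCNN backbone; only the loop structure (predict, compare with ground truth, adjust, repeat) carries over, and the feature representation, learning rate, and epoch count below are all assumptions made for illustration.

```python
import numpy as np

def train_normal_vs_suspect(features, labels, lr=0.1, epochs=200):
    """Illustrative stand-in for the iterative binary training step:
    a single logistic unit fit by gradient descent on cross-entropy loss.

    features: (n, d) array of per-image feature vectors
    labels:   (n,) array, 0 = normal (0% cellularity), 1 = suspect (>0%)
    Returns learned weights w and bias b.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        z = features @ w + b
        p = 1.0 / (1.0 + np.exp(-z))   # predicted P(suspect)
        grad = p - labels              # dLoss/dz for cross-entropy
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b
```

The stopping rule here is a fixed epoch count; the disclosure's "acceptable matching rate" criterion would instead monitor classification accuracy on the training or validation set.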
In training the second DCNN of a system of the present invention, like DCNN 340 in
Consider a digital pathology image 611 fed into the DCNN model 630. Image 611 is representative of images that the first DCNN in a system of the present invention (like 330 in system 300) might have classified as suspect. DCNN model 630 operates on image 611 in a relatively conventional way to extract image features 650 from the input image. Such feature extraction methods are well known in processing natural images, but in the present invention are applied to the processing of DPI images.
Then the network branches out into a stack of K output layers (660), where each output layer (or binary classifier) contains 2 neurons and corresponds to a binary classification task. The k-th task is to predict whether the cellularity of the input image is larger than the rank Ck. Each binary classifier in stack 660 outputs a “1” or a “0” depending on the comparison. For example, one classifier might output a “1” if it predicts the cellularity of input image 611 is above 5%, while another might do so if it predicts the cellularity of input image 611 is above 10%, and so on. The binary outputs (ok for the k-th classifier) generated by the output layers 660 are then “fused,” or mathematically combined, at fusing element 670, which weights the outputs of different classifiers differently, to provide a fused cellularity score “c”. When this cellularity score is compared with a ground truth cellularity score for input image 611, an iterative process can be carried out (indicated by the dashed lines in the figure) to adjust parameters in the output layers 660 and in the DCNN model 630 to minimize the loss function of the binary classifiers. Those parameters and weights are then fixed, and the DCNN is considered to be trained, with the fused score being the desired output.
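The fusing step can be illustrated with a minimal sketch. The rule below (count the positive "is cellularity > Ck?" decisions and report the highest threshold predicted to be exceeded) is in the spirit of multi-output ordinal regression CNNs, but it is an assumption: the disclosure states only that the fusing element weights the classifier outputs differently, without fixing the exact rule, and the rank thresholds used in the test are illustrative.

```python
def fuse_ordinal_outputs(binary_outputs, ranks):
    """Fuse K binary decisions o_k ("is cellularity > C_k?") into one score.

    binary_outputs: list of 0/1 values, one per binary classifier,
                    ordered by ascending threshold
    ranks:          the thresholds C_k (percentages), ascending

    With consistent ordinal outputs (all 1s before all 0s), the number of
    positive decisions identifies the highest rank the network believes is
    exceeded; that rank is returned as the fused score. This particular
    rule is an illustrative assumption, not the disclosure's exact weighting.
    """
    k = sum(binary_outputs)        # number of thresholds predicted exceeded
    return 0.0 if k == 0 else float(ranks[k - 1])
```

A learned fusing element would instead assign a trainable weight to each ok and combine them, with those weights adjusted during the same iterative loop that trains the backbone and output layers.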
Some of the mathematical details involved in the operation of the binary classifiers, the fusing element, and the loss function in one embodiment of the present invention are shown in
There are some very significant differences between the present invention and prior art methods, including those that apply DCNNs to the problem of cellularity estimation. One such difference is that the present invention does not require either nuclear segmentation or cell segmentation of the input image patches to be performed. Almost all prior art methods depend on one or both of these as an initial step, in which all the nuclei (or cells) in an image patch are located, classified to distinguish between normal nuclei/cells and cancerous ones, and corresponding areas of interest to be analyzed further are defined. In the present invention, the only sort of segmentation that may optionally occur is a rough sort of tumor segmentation, defining a boundary around the whole tumor area and removing the surrounding area as a simple background removal step. Avoiding the need for nuclear or cell-based segmentation is a valuable simplification afforded by the present invention.
Another difference is that prior art methods, including those few that avoid a nuclear or cell segmentation step, depend on a type of classification in which a cellularity score—the percentage of cellular area that appears to be taken up by cancer cells—is assigned based on finding the best match in terms of visual appearance to one category “bin” in a pre-established series. For example, if the input image “looks more like” images with ground truth cellularity scores in the range of 20%+/−5% than those images in any other “bins” in the entire 0 to 100% range, the cellularity of that input image will be estimated to be 20%+/−5%. In the present invention, instead of using a DCNN trained by that type of regression model, a direct estimate of a specific cellularity score is provided, from a DCNN trained according to an ordinal regression model.
Systems of the present invention can be realized in the form of stand-alone computer software, or can be deployed onto existing clinical CAD (computer-aided diagnosis) systems. Applications for the methods described herein, beyond NAT monitoring and survival prediction, include other image-based rating tasks such as cancer grading.
Although the description has been described with respect to particular embodiments thereof, these particular embodiments are merely illustrative, and not restrictive.
Any suitable programming language can be used to implement the routines of particular embodiments including C, C++, Java, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. The routines can execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular embodiments. In some particular embodiments, multiple steps shown as sequential in this specification can be performed at the same time.
Particular embodiments may be implemented in a computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or device. Particular embodiments can be implemented in the form of control logic in software or hardware or a combination of both. The control logic, when executed by one or more processors, may be operable to perform that which is described in particular embodiments.
Particular embodiments may be implemented by using a programmed general-purpose digital computer, application-specific integrated circuits, programmable logic devices, field-programmable gate arrays, or optical, chemical, biological, quantum, or nanoengineered systems, components, and mechanisms. In general, the functions of particular embodiments can be achieved by any means as is known in the art. Distributed, networked systems, components, and/or circuits can be used. Communication, or transfer, of data may be wired, wireless, or by any other means.
It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
A “processor” includes any suitable hardware and/or software system, mechanism or component that processes data, signals or other information. A processor can include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor can perform its functions in “real time,” “offline,” in a “batch mode,” etc. Portions of processing can be performed at different times and at different locations, by different (or the same) processing systems. Examples of processing systems can include servers, clients, end user devices, routers, switches, networked storage, etc. A computer may be any processor in communication with a memory. The memory may be any suitable processor-readable storage medium, such as random-access memory (RAM), read-only memory (ROM), magnetic or optical disk, or other non-transitory media suitable for storing instructions for execution by the processor.
As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” includes plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
Thus, while particular embodiments have been described herein, latitudes of modification, various changes, and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of particular embodiments will be employed without a corresponding use of other features without departing from the scope and spirit as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit.
Number | Name | Date | Kind |
---|---|---|---|
20210050094 | Orringer | Feb 2021 | A1 |
Number | Date | Country |
---|---|---|
WO2019108695 | Jun 2019 | WO |
WO2020014477 | Jan 2020 | WO |
Entry |
---|
Dov David et al: “Machine Learning for Healthcare Thyroid Cancer Malignancy Prediction From Whole Slide Cytopathology Images”, Proceedings of Machine Learning Research, Mar. 29, 2019 (Mar. 29, 2019) XP055952237, Retrieved from the Internet: URL:https://proceedings.mlr.press/v106/dov19a. |
Rakhlin Alexander et al: “Breast Tumor Cellularity Assessment Using Deep Neural Networks”, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), IEEE, Oct. 27, 2019 (Oct. 27, 2019), pp. 371-380, XP033732800, DOI: 10.1109/ICCVW.2019.00048. |
Axel Berg et al: “Deep Ordinal Regression with Label Diversity”, arxiv.org, Cornell University Library, 201 Olin Library Cornell University Ithaca, NY 14853, Jun. 29, 2020 (Jun. 29, 2020), XP08193220, DOI: 10.1109/ICPR48806.2021.9412608. |
Xiao Lichao et al: “Censoring-Aware Deep Ordinal Regression for Survival Prediction from Pathological Images”, Sep. 29, 2020 (Sep. 29, 2020), arxiv.org, Cornell University Library, 201 Olin Library Cornell University Ithaca, NY 14853, pp. 449-458, XP047564083. |
Niu Zhenxing et al: “Ordinal Regression with Multiple Output CNN for Age Estimation” , 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Jun. 27, 2016 (Jun. 27, 2016), pp. 4920-4928, XP033021685, DOI: 10.1109/CVPR.2016.532. |
“Breast Tumor Cellularity Assessment using Deep Neural Networks” https://openaccess.thecvf.com/content_ICCVW_2019/html/VRMI/Rakhlin_Breast_Tumor_Cellularity_Assessment_Using_Deep_Neural_Networks_ICCVW_2019_paper.html. |
Number | Date | Country
---|---|---
20220398719 A1 | Dec 2022 | US