This disclosure generally relates to tools for analyzing and classifying digital pathology images.
Whole Slide Images (WSI) result from scans of slides of samples or from digital-native scans. A scan, and the corresponding WSI, is often very large, for example 100,000 pixels by 100,000 pixels in each of several color channels, making it difficult to efficiently analyze a WSI on a holistic level using traditional computational methods. Current approaches to handling the large format of WSI include segmenting the WSI into smaller portions and performing parallel analysis using multiple processors or otherwise distributed processing. Segmenting and distributed processing may be useful for gathering an understanding of the discrete portions but cannot generate an understanding of the WSI as a whole.
A pathologist or other trained specialist will often evaluate a WSI for evidence of abnormalities in the depicted tissue. Labeling for WSI tends to refer to the entire image and not, for example, to a specific portion of the image. For example, a pathologist may identify a tissue abnormality (e.g., a tumor) in an image of a lung and label the image as “abnormal.” In most cases, however, the pathologist will not annotate the image to specify where in the image the tissue abnormality appears. This “all or nothing” labeling style is less useful for training computer-implemented algorithms to evaluate WSI. Even under whole-image labeling, however, pathologist analysis is time consuming, and having pathologists re-evaluate old samples to mark individual locations is prohibitively so. Moreover, many conditions are not mutually exclusive, so a single WSI may indicate multiple conditions simultaneously, which may require multiple specialists to review the same image to ensure that all abnormal conditions are labeled.
Accordingly, a desire exists for WSI labels or annotations that provide refinement beyond a binary labeling of images as “normal image” or “abnormal image.” In addition, a desire exists for such annotations to pinpoint portions of an image or locations in an image that depict an abnormality.
In particular embodiments, a computer-implemented method includes receiving or otherwise accessing a whole slide image and segmenting the whole slide image into multiple tiles. In particular, the whole slide image may be a large format image, and the size of the segmented tiles may be selected to facilitate efficient management and processing. The method includes generating an embedding feature vector corresponding to each tile of the plurality of tiles. In particular embodiments, the embedding feature vectors are generated using a neural network trained using natural images. The method includes computing a weighting value corresponding to each embedding feature vector using an attention network. The method includes computing an image embedding from the embedding feature vectors, where each embedding feature vector is weighted by the weighting value corresponding to that embedding feature vector. In particular embodiments, the method further includes normalizing the weighting values prior to computing the image embedding. The method includes generating a classification for the whole slide image from the image embedding. The classification for the whole slide image may indicate the presence of one or more biological abnormalities in tissue depicted in the whole slide image, including hypertrophy, Kupffer cell abnormalities, necrosis, inflammation, glycogen abnormalities, lipid abnormalities, peritonitis, anisokaryosis, cellular infiltration, karyomegaly, microgranuloma, hyperplasia, or vacuolation. The classification for the whole slide image may include an evaluation of a potentially toxic event associated with tissue depicted in the whole slide image. In particular embodiments, the computer may compute weighting values corresponding to each embedding feature vector using multiple attention networks and generate a respective classification for the whole slide image from each attention network. In particular embodiments, the classification indicates that the whole slide image depicts one or more abnormalities associated with the tissue depicted in the whole slide image. In particular embodiments, the method includes providing the classification for the whole slide image to a pathologist for verification.
In particular embodiments, the computer may generate a heatmap corresponding to the whole slide image. The heatmap may include tiles corresponding to the tiles of the whole slide image. An intensity value associated with each tile of the heatmap may be determined from the weighting value corresponding to the embedding feature vector of the corresponding tile of the whole slide image. In particular embodiments, the method further includes generating annotations for the whole slide image. The computer generates annotations for the whole slide image by identifying one or more weighting values satisfying a predetermined criterion, such as exceeding a threshold value, identifying one or more embedding feature vectors corresponding to the identified weighting values, and identifying one or more tiles corresponding to the identified embedding feature vectors. The annotations for the whole slide image may be provided for display in association with the whole slide image by marking the identified tiles or as an interactive overlay.
In particular embodiments, the computer may calculate a confidence score associated with the classification for the whole slide image from at least the weighting values and provide the confidence score for display in association with the classification for the whole slide image. In particular embodiments, the computer may identify, from the embedding feature vectors, weighting values, and slide embedding feature vector, derivative characteristics associated with the classification for the whole slide image. In particular embodiments, the computer may generate multiple classifications for multiple whole slide images, respectively, and train one or more attention networks to predict weighting values associated with one or more conditions, respectively, using the classifications as a ground truth associated with the whole slide images. In particular embodiments, the whole slide image is received from a user device and the method includes providing the classification for the whole slide image to the user device for display. In particular embodiments, the whole slide image is received from a digital pathology image generation system communicatively coupled with a digital pathology image processing system that performs the method.
The embodiments disclosed herein are only examples, and the scope of this disclosure is not limited to them. Particular embodiments may include all, some, or none of the components, elements, features, functions, operations, or steps of the embodiments disclosed herein. Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, wherein any feature mentioned in one claim category, e.g., method, may be claimed in another claim category, e.g., system, as well. The dependencies or references back in the attached claims are chosen for formal reasons only. However any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) may be claimed as well, so that any combination of claims and the features thereof are disclosed and may be claimed regardless of the dependencies chosen in the attached claims. The subject-matter which may be claimed includes not only the combinations of features as set out in the attached claims but also any other combination of features in the claims, wherein each feature mentioned in the claims may be combined with any other feature or combination of other features in the claims. Furthermore, any of the embodiments and features described or depicted herein may be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any of the features of the attached claims.
As described herein, WSI are extremely large format digital images that may result from digitization of physical slides into high-resolution image files or may be output directly by medical scanning devices. WSI are typically preserved in the highest possible resolution format because of the nature of the images being captured and to avoid the misdiagnosis of tissue depicted in the WSI because of artifacts that ordinarily result from image compression and manipulation. WSI often include orders of magnitude larger numbers of pixels than typical digital images, and may include resolutions of 100,000 pixels by 100,000 pixels (e.g., 10,000 megapixels) or greater.
Analysis of WSI is a labor-intensive process that requires highly specialized individuals with the knowledge and dexterity to review the WSI, recognize and identify abnormalities, classify the abnormalities, label the WSI, and potentially render a diagnosis of the tissue. Additionally, because WSI are used for a wide array of tissue types, persons with the knowledge and skill to identify abnormalities must be further specialized in order to provide accurate analysis and diagnosis. Tissue abnormalities that may be detected from a WSI include, by way of example only and not limitation, inflammation, pigmentation, degeneration, anisokaryosis, hypertrophy, mitotic increase, mononuclear cell infiltration, inflammatory cell infiltration, inflammatory cell foci, decreased glycogen, glycogen accumulation (diffuse or concentrated), extramedullary myelopoiesis, extramedullary hematopoiesis, extramedullary erythropoiesis, single-cell necrosis, diffuse necrosis, marked necrosis, coagulative necrosis, apoptosis, karyomegaly, peribiliary, increased cellularity, glycogen deposits, lipid deposits, microgranuloma, congestion, Kupffer cell pigmentation, increased hemosiderin, histiocytosis, hyperplasia, or vacuolation, among many others. Therefore, because of the labor- and knowledge-intensive nature of the work, WSIs are considered candidates for automating certain functions. However, the large size of WSIs renders typical techniques ineffective, slow, and expensive: it is not practical to apply standard image recognition and deep learning techniques, which require multiple rounds of analysis over many WSI samples to increase accuracy. The techniques described herein are directed to solving the problem of automating feature recognition in WSI and enable the development of novel data analysis and presentation techniques that previously could not be performed with WSI due to these well-documented limitations.
The systems disclosed herein may efficiently generate training data for feature recognition based on standard WSI labels. Additionally, the present systems may identify whether a WSI contains abnormalities and where in the WSI the abnormalities are located.
At 120 the digital pathology image processing system 210, for example using a tile-generating module 211, segments the whole slide image 105 into a plurality of tiles 115a, 115b, . . . 115n.
At 130 the digital pathology image processing system 210, for example using a tile embedding module 212, generates embeddings for each tile of the plurality of tiles using an embedding network 125. As an example, for tile 115a, the tile embedding module 212 generates a corresponding embedding 135a; for tile 115b, the tile embedding module 212 generates a corresponding embedding 135b; and for tile 115n, the tile embedding module 212 generates a corresponding embedding 135n. As described herein, the embeddings may include unique representations of the tiles that preserve some information about the content or context of the tiles. The tile embeddings may also be derived from a translation of the tiles into a corresponding tile embedding space, where proximity within the tile embedding space correlates to similarity of the tiles. For example, tiles that depict similar subject matter or have similar visual features will be positioned closer in the embedding space than tiles that depict different subject matter or have dissimilar visual features. The tile embeddings may be represented as feature vectors.
As illustrated in
At 150, the digital pathology image processing system 210, for example using an image embedding module 214, computes image embeddings V1, V2, . . . Vn for the whole slide image 105 by combining the tile embeddings in a weighted combination, using the weighting values generated for each embedding to weight the respective embedding. In some embodiments, multiple image embeddings V1, V2, . . . Vn may be generated, for example one image embedding for each attention network 145a, 145b, 145c. An image embedding V may be calculated as the weighted combination V=α1h1+α2h2+ . . . +αnhn, where hi is the tile embedding for tile i and αi is the weighting value corresponding to that embedding. In some embodiments, a single image embedding may be generated using all of the weighting values (e.g., weighting values from all of the attention networks).
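By way of illustration only, and not as a definition of the disclosed method, the weighted combination may be sketched as follows, assuming the tile embeddings are stacked into a single tensor and one weighting value is produced per tile (all names are hypothetical):

```python
import torch

def compute_image_embedding(tile_embeddings: torch.Tensor,
                            weighting_values: torch.Tensor) -> torch.Tensor:
    """Combine per-tile embeddings into a single image embedding.

    tile_embeddings: (n_tiles, embed_dim) feature vectors h1..hn.
    weighting_values: (n_tiles,) raw scores from one attention network.
    """
    alphas = torch.softmax(weighting_values, dim=0)  # optional normalization across tiles
    return (alphas.unsqueeze(1) * tile_embeddings).sum(dim=0)  # V = a1*h1 + ... + an*hn
```

With attention networks 145a, 145b, 145c, this function would be called once with each network's weighting values to produce the respective image embeddings V1, V2, . . . Vn.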
At 160, the digital pathology image processing system 210, for example using an image classification module 215, classifies the whole slide image 105 using the image embeddings V1, V2, . . . Vn. In some embodiments, the image classification module 215 uses a classification network 155 to generate the classifications. The classifications are then presented as evaluations of the whole slide image, where the evaluations are equivalent to predictions of one or more conditions present in the whole slide image. For example, the evaluations may include a determination that the whole slide image depicts normal biological conditions or contains diagnosable biological abnormalities. Diagnosable biological abnormalities may include abnormalities associated with hypertrophy (e.g., hepatocyte hypertrophy, Kupffer cell hypertrophy, etc.), Kupffer cells (e.g., Kupffer cell pigmentation, Kupffer cell hypertrophy, etc.), necrosis (e.g., diffuse, focal, coagulative, etc.), glycogen (e.g., glycogen depletion, glycogen deposits, etc.), inflammation, lipids (e.g., lipid depletion, lipid deposits, etc.), peritonitis, and other conditions. As another example, the evaluations may include a determination that indications of one or more conditions are present in the whole slide image. The evaluations may be provided to users or operators of the digital pathology image processing system 210 for review. The evaluations may also be provided to one or more user devices 230.
As discussed herein, the output from the digital pathology image processing system 210 may be provided in a number of forms, including a simple recitation of the evaluations made by the digital pathology image processing system. More advanced output may also be provided. As an example, the digital pathology image processing system 210 may generate “heatmaps” of the whole slide image where the value of each tile of the heatmap is correlated to the value of one or more of the weighting values generated by the attention networks. Example heatmaps are illustrated in
A digital pathology image generation system 220 may generate one or more digital pathology images, including, but not limited to whole slide images, corresponding to a particular sample. For example, an image generated by digital pathology image generation system 220 may include a stained section of a biopsy sample. As another example, an image generated by digital pathology image generation system 220 may include a slide image (e.g., a blood film) of a liquid sample. As another example, an image generated by digital pathology image generation system 220 may include fluorescence microscopy such as a slide image depicting fluorescence in situ hybridization (FISH) after a fluorescent probe has been bound to a target DNA or RNA sequence.
Some types of samples (e.g., biopsies, solid samples and/or samples including tissue) may be processed by a sample preparation system 221 to fix and/or embed the sample. Sample preparation system 221 may facilitate infiltrating the sample with a fixating agent (e.g., liquid fixing agent, such as a formaldehyde solution) and/or embedding substance (e.g., a histological wax). For example, a sample fixation sub-system may fix a sample by exposing the sample to a fixating agent for at least a threshold amount of time (e.g., at least 3 hours, at least 6 hours, or at least 12 hours). A dehydration sub-system may dehydrate the sample (e.g., by exposing the fixed sample and/or a portion of the fixed sample to one or more ethanol solutions) and potentially clear the dehydrated sample using a clearing intermediate agent (e.g., that includes ethanol and a histological wax). A sample embedding sub-system may infiltrate the sample (e.g., one or more times for corresponding predefined time periods) with a heated (e.g., and thus liquid) histological wax. The histological wax may include a paraffin wax and potentially one or more resins (e.g., styrene or polyethylene). The sample and wax may then be cooled, and the wax-infiltrated sample may then be blocked out.
A sample slicer 222 may receive the fixed and embedded sample and may produce a set of sections. Sample slicer 222 may expose the fixed and embedded sample to cool or cold temperatures. Sample slicer 222 may then cut the chilled sample (or a trimmed version thereof) to produce a set of sections. Each section may have a thickness that is (for example) less than 100 μm, less than 50 μm, less than 10 μm or less than 5 μm. Each section may have a thickness that is (for example) greater than 0.1 μm, greater than 1 μm, greater than 2 μm or greater than 4 μm. The cutting of the chilled sample may be performed in a warm water bath (e.g., at a temperature of at least 30° C., at least 35° C. or at least 40° C.).
An automated staining system 223 may facilitate staining one or more of the sample sections by exposing each section to one or more staining agents. Each section may be exposed to a predefined volume of staining agent for a predefined period of time. In some instances, a single section is concurrently or sequentially exposed to multiple staining agents.
Each of one or more stained sections may be presented to an image scanner 224, which may capture a digital image of the section. Image scanner 224 may include a microscope camera. The image scanner 224 may capture the digital image at multiple levels of magnification (e.g., using a 10× objective, 20× objective, 40× objective, etc.). Manipulation of the image may be used to capture a selected portion of the sample at the desired range of magnifications. Image scanner 224 may further capture annotations and/or morphometrics identified by a human operator. In some instances, a section is returned to automated staining system 223 after one or more images are captured, such that the section may be washed, exposed to one or more other stains and imaged again. When multiple stains are used, the stains may be selected to have different color profiles, such that a first region of an image corresponding to a first section portion that absorbed a large amount of a first stain may be distinguished from a second region of the image (or a different image) corresponding to a second section portion that absorbed a large amount of a second stain.
It will be appreciated that one or more components of digital pathology image generation system 220 can, in some instances, operate in connection with human operators. For example, human operators may move the sample across various sub-systems (e.g., of sample preparation system 221 or of digital pathology image generation system 220) and/or initiate or terminate operation of one or more sub-systems, systems or components of digital pathology image generation system 220. As another example, part or all of one or more components of digital pathology image generation system (e.g., one or more subsystems of the sample preparation system 221) may be partly or entirely replaced with actions of a human operator.
Further, it will be appreciated that, while various described and depicted functions and components of digital pathology image generation system 220 pertain to processing of a solid and/or biopsy sample, other embodiments may relate to a liquid sample (e.g., a blood sample). For example, digital pathology image generation system 220 may receive a liquid-sample (e.g., blood or urine) slide that includes a base slide, smeared liquid sample, and cover. Image scanner 224 may then capture an image of the sample slide. Further embodiments of the digital pathology image generation system 220 may relate to capturing images of samples using advanced imaging techniques, such as FISH, described herein. For example, once a fluorescent probe has been introduced to a sample and allowed to bind to a target sequence, appropriate imaging may be used to capture images of the sample for further analysis.
A given sample may be associated with one or more users (e.g., one or more physicians, laboratory technicians and/or medical providers) during processing and imaging. An associated user may include, by way of example and not of limitation, a person who ordered a test or biopsy that produced a sample being imaged, a person with permission to receive results of a test or biopsy, or a person who conducted analysis of the test or biopsy sample, among others. For example, a user may correspond to a physician, a pathologist, a clinician, or a subject. A user may use one or more user devices 230 to submit one or more requests (e.g., that identify a subject) that a sample be processed by digital pathology image generation system 220 and that a resulting image be processed by a digital pathology image processing system 210.
Digital pathology image generation system 220 may transmit an image produced by image scanner 224 back to user device 230. User device 230 then communicates with the digital pathology image processing system 210 to initiate automated processing of the image. In some instances, digital pathology image generation system 220 provides an image produced by image scanner 224 to the digital pathology image processing system 210 directly, e.g., at the direction of the user of a user device 230. Although not illustrated, other intermediary devices (e.g., data stores of a server connected to the digital pathology image generation system 220 or digital pathology image processing system 210) may also be used. Additionally, for the sake of simplicity, only one digital pathology image processing system 210, one image generating system 220, and one user device 230 are illustrated in the network 200. This disclosure anticipates the use of one or more of each type of system and component thereof without necessarily deviating from the teachings of this disclosure.
The network 200 and associated systems shown in
Digital pathology image processing system 210 may process digital pathology images, including whole slide images, to classify the digital pathology images and generate annotations for the digital pathology images and related output. A tile generating module 211 may define a set of tiles for each digital pathology image. To define the set of tiles, the tile generating module 211 may segment the digital pathology image into the set of tiles. As embodied herein, the tiles may be non-overlapping (e.g., each tile includes pixels of the image not included in any other tile) or overlapping (e.g., each tile includes some portion of pixels of the image that are included in at least one other tile). Features such as whether or not tiles overlap, in addition to the size of each tile and the stride of the window (e.g., the image distance or number of pixels between a tile and a subsequent tile), may increase or decrease the data set for analysis, with more tiles (e.g., through overlapping or smaller tiles) increasing the potential resolution of eventual output and visualizations. In some instances, tile generating module 211 defines a set of tiles for an image where each tile is of a predefined size and/or an offset between tiles is predefined. Furthermore, the tile generating module 211 may create multiple sets of tiles of varying size, overlap, step size, etc., for each image. In some embodiments, the digital pathology image itself may contain tile overlap, which may result from the imaging technique. Even segmentation without tile overlapping may be a preferable solution to balance tile processing requirements and avoid influencing the embedding generation and weighting value generation discussed herein. A tile size or tile offset may be determined, for example, by calculating one or more performance metrics (e.g., precision, recall, accuracy, and/or error) for each size/offset and by selecting a tile size and/or offset associated with one or more performance metrics above a predetermined threshold and/or associated with one or more optimal (e.g., highest precision, highest recall, highest accuracy, and/or lowest error) performance metric(s). The tile generating module 211 may further define a tile size depending on the type of abnormality being detected. For example, the tile generating module 211 may be configured with awareness of the type(s) of tissue abnormalities that the digital pathology image processing system 210 will be searching for and may customize the tile size according to the tissue abnormalities to optimize detection. For example, the tile generating module 211 may determine that, when the tissue abnormalities include inflammation or necrosis in lung tissue, the tile size should be reduced to increase the scanning rate, while when the tissue abnormalities include abnormalities with Kupffer cells in liver tissues, the tile size should be increased to give the digital pathology image processing system 210 more opportunity to analyze the Kupffer cells holistically. In some instances, tile generating module 211 defines a set of tiles where a number of tiles in the set, size of the tiles of the set, resolution of the tiles for the set, or other related properties, for each image is defined and held constant for each of one or more images.
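As an illustrative sketch only, assuming the whole slide image is already loaded as an in-memory array (production systems typically read tiles lazily from a pyramidal file format), non-overlapping or overlapping tiling may be expressed as:

```python
import numpy as np

def generate_tiles(image: np.ndarray, tile_size: int = 512, stride: int = 512):
    """Segment an (H, W, C) image into square tiles.

    stride == tile_size yields non-overlapping tiles; stride < tile_size
    yields overlapping tiles and a larger data set for analysis. Each tile
    is returned with its (row, col) pixel offset so results can later be
    mapped back onto the whole slide image.
    """
    tiles = []
    height, width = image.shape[:2]
    for row in range(0, height - tile_size + 1, stride):
        for col in range(0, width - tile_size + 1, stride):
            tiles.append(((row, col), image[row:row + tile_size,
                                            col:col + tile_size]))
    return tiles
```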
As embodied herein, the tile generating module 211 may further define the set of tiles for each digital pathology image along one or more color channels or color combinations. As an example, digital pathology images received by digital pathology image processing system 210 may include large-format multi-color channel images having pixel color values for each pixel of the image specified for one of several color channels. Example color specifications or color spaces that may be used include the RGB, CMYK, HSL, HSV, or HSB color specifications. The set of tiles may be defined based on segmenting the color channels and/or generating a brightness map or greyscale equivalent of each tile. For example, for each segment of an image, the tile generating module 211 may provide a red tile, blue tile, green tile, and/or brightness tile, or the equivalent for the color specification used. As explained herein, segmenting the digital pathology images based on segments of the image and/or color values of the segments may improve the accuracy and recognition rates of the networks used to generate embeddings for the tiles and image and to produce classifications of the image. Additionally, the digital pathology image processing system 210, e.g., using tile generating module 211, may convert between color specifications and/or prepare copies of the tiles using multiple color specifications. Color specification conversions may be selected based on a desired type of image augmentation (e.g., accentuating or boosting particular color channels, saturation levels, brightness levels, etc.). Color specification conversions may also be selected to improve compatibility between digital pathology image generation systems 220 and the digital pathology image processing system 210. For example, a particular image scanning component may provide output in the HSL color specification while the models used in the digital pathology image processing system 210, as described herein, are trained using RGB images; converting the tiles to the compatible color specification ensures that the tiles can still be analyzed. Additionally, the digital pathology image processing system may up-sample or down-sample images that are provided in a particular color depth (e.g., 8-bit, 16-bit, etc.) to be usable by the digital pathology image processing system. Furthermore, the digital pathology image processing system 210 may cause tiles to be converted according to the type of image that has been captured (e.g., fluorescent images may include greater detail on color intensity or a wider range of colors).
As described herein, a tile embedding module 212 may generate an embedding (e.g., 135a, 135b, . . . 135n) for each tile in a corresponding embedding space. The embedding may be represented by the digital pathology image processing system 210 as a feature vector for the tile. The tile embedding module 212 may use a neural network (e.g., a convolutional neural network) to generate a feature vector that represents each tile of the image. In particular embodiments, the tile embedding neural network may be based on the ResNet image network trained on a dataset based on natural (e.g., non-medical) images, such as the ImageNet dataset. By using a non-specialized tile embedding network, the tile embedding module 212 may leverage known advances in efficiently processing images to generate embeddings. Furthermore, using a natural image dataset allows the embedding neural network to learn to discern differences between tile segments on a holistic level.
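A minimal sketch of such a non-specialized tile embedding network follows, assuming a torchvision ResNet-50 pretrained on ImageNet with its classification head removed; the specific architecture, library, and input size are illustrative assumptions rather than requirements of the disclosure:

```python
import torch
import torchvision

# Illustrative assumption: ResNet-50 with ImageNet weights as the tile
# embedding network; the disclosure specifies only a ResNet trained on
# natural images such as the ImageNet dataset.
backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # drop the 1000-class head; keep 2048-d features
backbone.eval()

@torch.no_grad()
def embed_tiles(tiles: torch.Tensor) -> torch.Tensor:
    """tiles: (n_tiles, 3, 224, 224) normalized RGB tiles -> (n_tiles, 2048)."""
    return backbone(tiles)
```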
In other embodiments, the tile embedding network used by the tile embedding module 212 may be an embedding network customized to handle large numbers of tiles of large-format images, such as digital pathology whole slide images. Additionally, the tile embedding network used by the tile embedding module 212 may be trained using a custom dataset. For example, the tile embedding network may be trained using a variety of samples of whole slide images or even trained using samples relevant to the subject matter for which the embedding network will be generating embeddings (e.g., scans of particular tissue types). Training the tile embedding network using specialized or customized sets of images may allow the tile embedding network to identify finer differences between tiles, which may result in more detailed and accurate distances between tiles in the embedding space, at the cost of the additional time needed to acquire the images and the computational and economic cost of training multiple tile embedding networks for use by the tile embedding module 212. The tile embedding module 212 may select from a library of tile embedding networks based on the type of images being processed by the digital pathology image processing system 210.
As described herein, tile embeddings (e.g., 135a, 135b, . . . 135n) may be generated from a deep learning neural network using visual features of the tiles. Tile embeddings may be further generated from contextual information associated with the tiles or from the content shown in the tile. For example, a tile embedding may include one or more features that indicate and/or correspond to a size of depicted objects (e.g., sizes of depicted cells or aberrations) and/or density of depicted objects (e.g., a density of depicted cells or aberrations). Size and density may be measured absolutely (e.g., width expressed in pixels or converted from pixels to nanometers) or relative to other tiles from the same digital pathology image, from a class of digital pathology images (e.g., produced using similar techniques or by a single digital pathology image generation system or scanner), or from a related family of digital pathology images. Furthermore, tiles may be classified prior to the tile embedding module 212 generating embeddings for the tiles such that the tile embedding module 212 considers the classification when preparing the embeddings.
For consistency, the tile embedding module 212 may produce embeddings of a predefined size (e.g., vectors of 512 items, vectors of 2048 bytes, etc.). Alternatively, the tile embedding module 212 may produce embeddings of various and arbitrary sizes. The tile embedding module 212 may adjust the sizes of the embeddings based on user direction, or the sizes may be selected, for example, to optimize computational efficiency, accuracy, or other parameters. In particular embodiments, the embedding size may be based on the limitations or specifications of the deep learning neural network that generates the embeddings. Larger embedding sizes may be used to increase the amount of information captured in the embedding and improve the quality and accuracy of results, while smaller embedding sizes may be used to improve computational efficiency.
A weighting value generating module 213 may generate a weighting value for each tile that will be used in association with the tile and the corresponding embedding. As described herein, the weighting value may be an attention score generated by a neural network that receives tile embeddings as input and generates attention scores as output, also referred to as an attention neural network or simply an attention network. For example, the attention score may be defined as, and/or interpreted as, an extent to which a given tile is predictive of a specific output. A tile, or tile embedding, with a high attention score relative to other tiles in a set may be said to have been identified by the attention network as having a high influence on the classification of the digital pathology image. For example, the attention network may learn that certain features in the tile or tile embedding are highly relevant to a digital pathology image being classified as normal or abnormal or as indicating inflammation or necrosis. The weighting value generating module 213 may use multiple attention networks as needed, including at least one for each class of output that the digital pathology image processing system 210 may detect. As an example, the weighting value generating module 213 may use one or more attention networks that have been trained, as described herein, to determine the key instances of tiles associated with each of multiple conditions that are detectable in the digital pathology image. As an example only and not by way of limitation, the weighting value generating module 213 may include networks trained to detect particular diagnoses, which may be grouped according to their similarities or likely usefulness to an end user. For example, the networks may be trained to detect conditions including hypertrophy (e.g., hepatocyte hypertrophy, Kupffer cell hypertrophy, etc.), Kupffer cells (e.g., Kupffer cell pigmentation, Kupffer cell hypertrophy, etc.), necrosis (e.g., diffuse, focal, coagulative, etc.), glycogen (e.g., glycogen depletion, glycogen deposits, etc.), inflammation, lipids (e.g., lipid depletion, lipid deposits, etc.), peritonitis, and other conditions detectable in a digital pathology image. Additionally, the weighting value generating module 213 may include an attention network trained to determine abnormalities in the tiles of the digital pathology images and assign an overall weighting value for abnormal versus normal.
While some embodiments may include each given tile having a respective attention score, other embodiments may include attention scores that correspond to regions of an image that include one or more tiles or portions of tiles. For example, such image regions may extend beyond the borders of a single tile or may have a perimeter that is smaller than that of a single tile. Attention scores may result from processing of image-related details (e.g., intensities and/or color values) within the tile or image region. Contextual information for the tile, such as the position of the tile within the digital pathology image, may also be used by the attention network to generate the attention score. The attention network receives a series of embeddings (e.g., vector representations) that correspond to a set of pixel intensities or to a position within an embedding space. The attention network may include, for example, a feed-forward network, a perceptron network (e.g., a multilayer perceptron), and/or a network having one or more fully connected layers. The neural network may further include a convolutional neural network and one or more additional layers (e.g., a fully connected layer).
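By way of illustration, a minimal attention network of the kind described, a small multilayer perceptron producing one raw score per tile embedding, might be sketched as follows (the layer sizes and activation are illustrative assumptions):

```python
import torch
from torch import nn

class AttentionNetwork(nn.Module):
    """Feed-forward attention network: tile embedding -> scalar attention score."""

    def __init__(self, embed_dim: int = 2048, hidden_dim: int = 256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, tile_embeddings: torch.Tensor) -> torch.Tensor:
        # (n_tiles, embed_dim) -> (n_tiles,) raw attention scores, later
        # normalized (e.g., with softmax) across the tiles of one image
        return self.score(tile_embeddings).squeeze(-1)
```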
An image embedding module 214 generates an embedding for the digital pathology image (e.g., the whole slide image) using the tile embeddings (e.g., 135a, 135b, . . . 135n) and the weighting values. The image embedding may take the form of another feature vector that represents the image. As embodied herein, the image embedding may result from a combination of the tile embeddings in which the weighting values generated by the weighting value generating module 213 are used to weight the tile embeddings. In other words, the image embedding may be the result of a weighted combination of the tile embeddings according to the attention scores from each attention network. In generating the image embedding, the image embedding module 214 may apply further transformations and/or normalizations to the tile embeddings (e.g., 135a, 135b, . . . 135n) and weighting values. Accordingly, one or more image embeddings may be generated. In particular, the image embedding module 214 may generate one image embedding for each attention network (and thus each condition being evaluated). The image embedding module 214 may also generate one or more composite embeddings in which embeddings and weighting values across attention networks are combined.
An image classification module 215 then processes the image embedding to determine which classifications should be applied to the digital pathology image. The image classification module 215 may include or use one or more classification networks 155 trained to classify a digital pathology image from the image embedding. For example, a single classification network 155 may be trained to identify and differentiate between classifications. In another example, one classification network 155 may be used for each classification or condition of interest, such that each classification network 155 determines whether or not the image embedding is indicative of its subject classification or condition. The resulting classification(s) may be interpreted as evaluations of the digital pathology image and determinations that the digital pathology image includes indicators of one or more specified conditions. For example, the output of the image classification module 215 may include a series of binary yes or no determinations for a sequence of conditions. The output may be further organized as a vector composed of the yes or no determinations. The determinations may be augmented, for example, with a confidence score or interval representing the degree of confidence that the image classification module 215 or its component classification networks 155 have in a particular determination. For example, the image classification module 215 may indicate that the digital image is 85% likely to include abnormal cells, 80% likely to not be indicative of hypertrophy, 60% likely to be indicative of inflammation, etc. Additionally, the output of the classifier network(s) may include a set of scores associated with each potential classification. The image classification module 215 may then apply a normalizing function (e.g., softmax, averaging, etc.) to the scores before assessing the scores and assigning a confidence level. In this manner, the digital pathology image processing system 210 may automatically generate labels for digital pathology images from the image embeddings, which are in turn based on tile embeddings and weighting values.
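As an illustrative sketch of the one-network-per-condition arrangement (the condition names, dimensions, and the use of a sigmoid confidence are assumptions, not limitations of the disclosure):

```python
import torch
from torch import nn

class ConditionClassifiers(nn.Module):
    """One binary classification head per condition of interest, each
    consuming the image embedding produced with its attention network."""

    def __init__(self, embed_dim: int = 2048,
                 conditions=("abnormal", "hypertrophy", "inflammation")):
        super().__init__()
        self.conditions = conditions
        self.heads = nn.ModuleDict({c: nn.Linear(embed_dim, 1) for c in conditions})

    def forward(self, image_embeddings: dict) -> dict:
        # image_embeddings maps condition name -> (embed_dim,) image embedding;
        # output maps condition name -> confidence score in [0, 1]
        return {c: torch.sigmoid(self.heads[c](image_embeddings[c])).item()
                for c in self.conditions}
```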
As described herein, the image embedding network, attention networks, and classification network may be artificial neural networks (“ANN”) designed and trained for a specific function.
ANNs used in particular embodiments may be a feedforward ANN with no cycles or loops, where communication between nodes flows in one direction beginning with the input layer and proceeding to successive layers. As an example, the input to each node of the hidden layer 320 may include the output of one or more nodes of the input layer 310. Similarly, the input to each node of the output layer 350 may include the output of nodes of the hidden layer 340. ANNs used in particular embodiments may be deep neural networks having at least two hidden layers. ANNs used in particular embodiments may be deep residual networks, a feedforward ANN including hidden layers organized into residual blocks. The input into each residual block after the first residual block may be a function of the output of the previous residual block and the input of the previous residual block. As an example and not by way of limitation, the input into residual block N may be represented as F(x)+x, where F(x) is the output of residual block N−1, and x is the input into residual block N−1. Although this disclosure describes a particular ANN, this disclosure contemplates any suitable ANN.
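A residual block of this kind may be sketched as follows; the inner layers F are illustrative placeholders, and the defining property is the F(x)+x connection:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Residual block: the output is F(x) + x, so each block passes both
    its transformation and its input forward to the next block."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.f(x) + x  # F(x) + x
```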
In particular embodiments, each node of an ANN may include an activation function. The activation function of a node defines or describes the output of the node for a given input. In particular embodiments, the input to a node may be a singular input or may include a set of inputs. Example activation functions may include an identity function, a binary step function, a logistic function, or any other suitable function. Example activation functions for a node k may include the sigmoid function Fk(sk)=1/(1+e−sk), where sk is the input to node k.
The input of an activation function corresponding to a node may be weighted. Each node may generate output using a corresponding activation function based on weighted inputs. As embodied herein, each connection between nodes may be associated with a weight. For example, a connection 325 between the node 305 and the node 315 may have a weighting coefficient of 0.4, which indicates that the input of node 315 is 0.4 (the weighting coefficient) multiplied by the output of the node 305. More generally, the output yk of node k may be yk=Fk(sk), where Fk is the activation function corresponding to node k, sk=Σj(wjk·xj) is the input to node k, xj is the output of a node j connected to node k, and wjk is the weighting coefficient between node j and node k. As embodied herein, the input to nodes of the input layer 310 may be based on a vector representing an object, also referred to as a vector representation of the object, an embedding of the object in a corresponding embedding space, or other suitable input. Although this disclosure describes particular inputs to and outputs of nodes, this disclosure contemplates any suitable inputs to and outputs of nodes in an ANN. Moreover, although this disclosure describes particular connections and weights between nodes, this disclosure contemplates any suitable connections and weights between nodes.
In particular embodiments, an ANN 300 may be trained using training data. As an example and not by way of limitation, training data may include inputs to the ANN 300 and an expected output, such as a ground truth value corresponding to the input. For example, training data may include one or more vectors representing a training object and an expected label for the training object. Training typically occurs with multiple training objects simultaneously or in succession. Training an ANN may include modifying the weights associated with the connections between nodes of the ANN by optimizing an objective function. As an example and not by way of limitation, a training method may be used to backpropagate an error value. The error value may be measured as a distance between the output the ANN generates for each vector representing a training object and the expected output, for example, using a cost function that minimizes the error or a value derived from the error, such as a sum-of-squares error. Example training methods include, but are not limited to, the conjugate gradient method, the gradient descent method, stochastic gradient descent, etc. In particular embodiments, an ANN may be trained using a dropout technique in which one or more nodes are temporarily omitted while training such that they receive no input or produce no output. For each training object, one or more nodes of the ANN have a probability of being omitted. The nodes that are omitted for a particular training object may differ from the nodes omitted for other training objects. Although this disclosure describes training an ANN in a particular manner, this disclosure contemplates training an ANN in any suitable manner.
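For illustration only, a single training step of this kind (sum-of-squares error, backpropagation, stochastic gradient descent, and dropout) might look like the following sketch; the network shape and hyperparameters are placeholders:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                      nn.Dropout(p=0.5),       # temporarily omits nodes during training
                      nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss(reduction="sum")          # sum-of-squares error

def training_step(inputs: torch.Tensor, expected: torch.Tensor) -> float:
    model.train()                              # enables dropout
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), expected)    # distance from the expected output
    loss.backward()                            # backpropagate the error value
    optimizer.step()                           # adjust connection weights
    return loss.item()
```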
The weighting value generating module 213 may further apply normalizing functions to the attention scores associated with each embedding for the tiles. The normalizing functions may be used to normalize weighting values (e.g., attention scores) across the tiles. For example, one normalizing function that may be applied is the softmax function:
softmax(z)i=e^(zi)/Σj e^(zj), for i=1, . . . , n, where z is the input vector of attention scores, e^(zi) is the exponential function applied to the i-th element of z, and the denominator sums the exponentials over all elements of z so that the normalized outputs sum to one.
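For example, a direct implementation of this normalization (the max-subtraction is a customary numerical-stability detail, not part of the definition):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Normalize raw attention scores z into weights that sum to one."""
    e = np.exp(z - np.max(z))  # max-subtraction avoids overflow; result is unchanged
    return e / e.sum()

# e.g., softmax(np.array([2.0, 1.0, 0.1])) ~ array([0.659, 0.242, 0.099])
```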
An output generating module 216 of the digital pathology image processing system 210 may use the digital pathology image, tiles, tile embeddings, weighting values, image embedding, and classifications to generate output corresponding to the digital pathology image received as input. As described herein, in addition to labels and annotations for the digital pathology image, the output may include a variety of visualizations and interactive graphics. In many embodiments, the output will be provided to the user device 230 for display, but in certain embodiments the output may be accessed directly from the digital pathology image processing system 210.
In one embodiment, the output for a given digital pathology image may include a so-called heatmap that identifies and highlights areas of interest within the digital pathology image. A heatmap may indicate portions of an image that depict or correlate to a particular condition or diagnosis and may indicate the accuracy or statistical confidence of such indication(s).
Because the weighting value generating module 213 may use multiple attention networks to generate the attention scores and weighting values, the output generating module 216 may produce an equivalent number of heatmaps (e.g., one heatmap corresponding to each classification for which the attention networks are configured to identify instances of indicators of a condition).
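A minimal sketch of heatmap construction follows, assuming per-tile weighting values and the tile offsets recorded during segmentation (the helper names are hypothetical):

```python
import numpy as np

def attention_heatmap(weighting_values: np.ndarray, tile_offsets,
                      image_shape, tile_size: int) -> np.ndarray:
    """Paint each tile's normalized weighting value onto a single-channel
    map aligned with the whole slide image; one heatmap per attention network."""
    heatmap = np.zeros(image_shape[:2], dtype=np.float32)
    scaled = (weighting_values - weighting_values.min()) / (
        np.ptp(weighting_values) + 1e-8)       # scale intensities to [0, 1]
    for value, (row, col) in zip(scaled, tile_offsets):
        heatmap[row:row + tile_size, col:col + tile_size] = value
    return heatmap
```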
Another example output that may be produced by the output generating module 216 is annotations for the digital pathology image. The annotations may automatically indicate areas of interest to a user (e.g., a pathologist or clinician) within the digital pathology image. As described herein, the production of annotations for digital pathology images is often a difficult and time-consuming task that requires the input of individuals with a significant amount of training. Using the weighting values generated by the attention network(s), the digital pathology image processing system 210 may identify areas that a user should focus on as containing indicators of conditions of interest. As an example, the output generating module may compare the weighting values across the set of tiles for the digital pathology image and identify the tiles that have weighting values outside the norm for the image or for images of that type. The output generating module may compare the weighting values to a threshold weighting value that may be selected by the user or may be predetermined by the digital pathology image processing system 210. The threshold may differ based on the type of condition being evaluated (e.g., the threshold value for an “abnormal” annotation may differ from a threshold value for a “necrosis” annotation). Thus, the annotations for an input digital pathology image may be based on the identification of key instances within the set of tiles for the digital pathology image.
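As a sketch, the threshold comparison described above reduces to a simple filter over the weighting values (the threshold shown is arbitrary and, as noted, may differ per condition):

```python
def tiles_to_annotate(weighting_values, tile_offsets, threshold: float = 0.8):
    """Identify tiles whose weighting value satisfies the predetermined
    criterion (here, exceeding a threshold), marking them as areas of
    interest for annotation."""
    return [offset for value, offset in zip(weighting_values, tile_offsets)
            if value > threshold]
```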
Furthermore, the annotations may simplify the process of identifying visual matches contained within the same digital pathology image by applying pattern matching, for example drawing attention to tiles that contain the same abnormalities across the image. As embodied herein, after identifying tiles of interest (e.g., using the weighting values), the digital pathology image processing system 210 may perform gradient descent on the pixels of the identified tiles to maximize the recognition and association of tiles having visual characteristics similar to the identified tiles that may have been missed by the attention networks. Thus, the digital pathology image processing system 210 may learn and identify which visual patterns maximize the classification determination for each tile of interest. This recognition may be performed on an ad hoc basis, where new patterns are learned for each digital pathology image under consideration, or may be based on a library of common patterns. For example, the digital pathology image processing system 210 may store frequently occurring patterns for each classification and proactively compare tiles to those patterns to assist with identifying tiles and areas of the digital pathology image.
Once a set of tile embeddings (e.g., 135a, 135b, . . . 135n) has been identified for annotation, the digital pathology image processing system 210 works backwards to identify the tiles corresponding to those tile embeddings. For example, each embedding may be uniquely associated with a tile, which may be identified via a tile identifier within the tile embedding. The digital pathology image processing system 210 then attempts to group proximate tiles in circumstances where a collection of tiles has been determined to showcase the same condition or indicia. Each grouping of tiles may be collected and readied for display with the relevant annotations.
A first example of a digital pathology image including annotations is shown in
A second example of a digital pathology image including annotations is shown in
The model for this type of learning structure may be referred to as multiple instance learning. In multiple instance learning, a collection of instances is provided together as a set with a label. Note that the individual instances are often not labeled, just the set. The label is typically based on a condition being present. The basic assumption in the multiple instance learning techniques employed by the described system is that when a set is labeled as having the condition present (e.g., when a whole slide image is labeled as abnormal), at least one instance in the set is abnormal. Conversely, when the set is labeled as not having the condition (e.g., when a whole slide image is labeled as normal), no instance in the set is abnormal. From this principle, and iterative training approaches, the attention network(s) may learn to identify the features of a tile (or, more specifically, a tile embedding) that correlate to an abnormal slide.
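Stated formally, using standard multiple instance learning notation (an addition here, not notation from this disclosure), the set-level label Y relates to the unobserved per-instance labels yi as:

```latex
Y = \max_i y_i =
\begin{cases}
1 & \text{if } y_i = 1 \text{ for at least one tile } i \text{ (condition present)},\\
0 & \text{if } y_i = 0 \text{ for every tile } i \text{ (condition absent)}.
\end{cases}
```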
A training controller 217 of the digital pathology image processing system 210 may control training of the one or more models (e.g., neural networks) and/or functions used by digital pathology image processing system 210. In some instances, multiple or all of the neural networks used by digital pathology image processing system 210 (e.g., a network used to generate tile embeddings, attention network(s) used to generate weighting values, and a network used to classify images based on image embeddings) are trained together by training controller 217. In some instances, the training controller 217 may selectively train the models used by the digital pathology image processing system 210. For example, the digital pathology image processing system 210 may use a preconfigured model to generate tile embeddings and focus on training the attention network(s) to generate weighting values.
As illustrated in
The training controller 217 uses one or more loss or scoring functions 637 to evaluate the attention scores generated during the epoch. Training controller 217 may use a loss function that penalizes variability or differences in attention scores across the embeddings corresponding to each individual image. Building on the intuition that the differences between “normal” tiles of “normal” and “abnormal” images should be minimized, the loss function may penalize differences between a distribution of attention scores generated for each random sampling and a reference distribution. The reference distribution may include (for example) a delta distribution (e.g., a Dirac delta function) or a uniform or Gaussian distribution. Preprocessing of the reference distribution and/or the attention score distribution may be performed, which may include (for example) shifting one or both of the two distributions to have a same center of mass or average. It will be appreciated that, alternatively, attention scores may be preprocessed prior to generating the distribution. The loss function may characterize the differences between the distributions using (for example) Kullback-Leibler (KL) divergence. If the attention score distribution includes multiple disparate peaks, the divergence from a delta or uniform distribution may be more dramatic, which may result in a higher penalty. While the differences in attention scores for “normal” embeddings are minimized, the loss function may reward differences in “abnormal” tiles, effectively encouraging the attention network to learn to identify abnormal tiles from among normal tiles. Another technique may use a loss function that penalizes a lack of variability across tile attention scores. For example, a loss function may scale a penalty in an inverse manner to a KL divergence between an attention score distribution and a delta or uniform distribution. Thus, in some instances, different types (e.g., opposite types) of loss are used for tiles associated with different labels. The results R1, R2, . . . , Rn of the loss function are provided to the attention network 635, which applies or saves modifications to the attention network 635 to optimize the scores. After the attention network 635 is modified, another training epoch begins with a randomized sample of the input tiles.
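An illustrative sketch of such a distribution-matching penalty follows, assuming a softmax-normalized attention distribution compared against a uniform reference with KL divergence; the inverse scaling for abnormal slides is one possible choice among those described, not the only one:

```python
import torch
import torch.nn.functional as F

def attention_distribution_loss(scores: torch.Tensor, is_normal: bool) -> torch.Tensor:
    """Penalize attention-score variability on normal slides and reward it
    (via inverse scaling) on abnormal slides."""
    log_p = F.log_softmax(scores, dim=0)                 # attention score distribution
    uniform = torch.full_like(scores, 1.0 / scores.numel())
    # F.kl_div(input, target) computes KL(target || input) with log-space input
    kl = F.kl_div(log_p, uniform, reduction="sum")
    return kl if is_normal else 1.0 / (kl + 1e-8)
```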
The training controller 217 determines when to cease training. For example, the training controller 217 may determine to train the attention network(s) 635 for a set number of epochs. As another example, the training controller 217 may determine to train the attention network(s) 635 until the loss function indicates that the attention networks have passed a threshold value of the divergence between the distributions. As another example, the training controller 217 may periodically pause training and provide a test set of tiles where the appropriate label is known. The training controller 217 may evaluate the output of the attention network(s) 635 against the known labels on the test set to determine the accuracy of the attention network(s) 635. Once the accuracy reaches a set threshold, the training controller 217 may cease training the attention network(s) 635.
With the attention network(s) 635 sufficiently trained to identify the key instances of a provided set of tile embeddings by associating a higher attention score to the abnormal tiles, the training controller 217 may train the classifier network(s).
At 650, the training controller 217 causes the digital pathology image processing system 210, for example using an image embedding module 214, to compute image embeddings V1, V2, . . . Vn for each whole slide image by combining the tile embeddings in a weighted combination, using the weighting values generated for each embedding to weight the respective embedding. For example, the image embedding V1 for the image 605a may be generated from the embeddings 611a, 611b, . . . , 611n, in combination with weighting values a1, b1, . . . , n1; the image embedding V2 for the image 605b may be generated from the embeddings 612a, 612b, . . . , 612n, in combination with weighting values a2, b2, . . . , n2; and the image embedding Vn for the image 605c may be generated from the embeddings 613a, 613b, . . . , 613n, in combination with weighting values an, bn, . . . , nn.
As illustrated in
At 670, the training controller 217 accesses the ground truth classifications for each of the images being classified. As illustrated in
In some instances, training controller 217 preferentially selects, retrieves, and/or accesses training images associated with a particular label. Thus, a training data set may be biased toward digital pathology images associated with the particular label. The training data set may be defined to include more images associated with labels indicating abnormal conditions or a specified abnormal condition (e.g., inflammation and necrosis) relative to images associated with labels indicating normal conditions. This may be done to account for the expectation that more “normal” images will be readily available even though the digital pathology image processing system 210 may be targeted at identifying abnormal images.
As described herein, the traditional process for obtaining labels for digital pathology images (e.g., whole slide images) is arduous and time-consuming. The digital pathology image processing system 210, and the methods described herein for using and training it, may be used to increase the set of images available for training the various networks of the digital pathology image processing system. For example, after an initial training pass using data with known labels (potentially including annotations), the digital pathology image processing system 210 may be used to classify images without existing labels. The generated classifications may be verified by human agents and, should correction be needed, the digital pathology image processing system 210 (e.g., the classification network(s)) may be retrained using the new data. This cycle may repeat, with the expectation that fewer interventions will be required as the accuracy rate on previously unseen examples improves. Additionally, once a specified level of accuracy has been reached, the labels generated by the digital pathology image processing system 210 may be used as ground truth for training, e.g., the attention networks 635 used by the weighting value generating module 213.
At step 715, the digital pathology image processing system 210 segments the image into tiles. As described herein, the digital pathology image is expected to be significantly larger than standard images, and much larger than would normally be feasible for standard image recognition and analysis (e.g., on the order of 100,000 pixels by 100,000 pixels). To facilitate analysis, the digital pathology image processing system segments the image into tiles. The size and shape of the tiles is uniform within a given analysis, although the size and shape may vary across analyses. In some embodiments, the tiles may overlap to increase the opportunity for image context to be properly analyzed by the digital pathology image processing system 210. To balance the work performed against accuracy, however, it may be preferable to use non-overlapping tiles. Additionally, segmenting the image into tiles may involve segmenting the image based on a color channel or a dominant color associated with the image.
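By way of illustration, a minimal tiling routine might look like the following sketch (assuming NumPy; the tile size and stride are illustrative values, and edge remainders are simply dropped). A stride equal to the tile size yields non-overlapping tiles; a smaller stride yields overlapping tiles.

```python
import numpy as np

def segment_into_tiles(image: np.ndarray, tile_size: int = 512,
                       stride: int | None = None) -> list[np.ndarray]:
    """Segment a large image array of shape (H, W, C) into square tiles.

    stride == tile_size (the default) gives non-overlapping tiles;
    a smaller stride gives overlapping tiles for added context.
    Partial tiles at the right/bottom edges are dropped for simplicity.
    """
    stride = stride or tile_size
    tiles = []
    h, w = image.shape[:2]
    for y in range(0, h - tile_size + 1, stride):
        for x in range(0, w - tile_size + 1, stride):
            tiles.append(image[y:y + tile_size, x:x + tile_size])
    return tiles
```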
At step 720, the digital pathology image processing system 210 generates a tile embedding corresponding to each tile. The tile embedding maps the tile to an appropriate embedding space and may be considered representative of the features shown in the tile. Within the embedding space, tiles whose embeddings are in close proximity are considered similar, while distance between embeddings is indicative of dissimilarity between the corresponding tiles. The tile embedding may be generated by an embedding network that receives tiles (e.g., images) as input and produces embeddings (e.g., vector representations) as output. The embedding network may be trained on natural (e.g., non-medical) images or may be specialized on images expected to be similar to those input into the embedding network. Using natural images increases the volume and variety of available training data, while using specialized images may improve the resiliency of the embedding network and allow it to learn to discern finer details in the input images.
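For illustration, the following sketch shows one common way to obtain tile embeddings from a network pretrained on natural images, here a torchvision ResNet-50 with its classification layer removed. The choice of backbone and preprocessing is an assumption for the sketch, not the specific embedding network of the disclosure.

```python
import torch
from torchvision import models, transforms

# Backbone pretrained on natural images (ImageNet); dropping the final
# classification layer exposes a feature vector per tile.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # output the 2048-d pooled features
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_tile(tile) -> torch.Tensor:
    """Map one tile (e.g., a PIL image) to its embedding vector."""
    batch = preprocess(tile).unsqueeze(0)  # shape (1, 3, H, W)
    return backbone(batch).squeeze(0)      # shape (2048,)
```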
At step 725, the digital pathology image processing system 210 computes an attention score for each tile using one or more attention networks. The attention score may be generated by one or more specially-trained attention networks. The attention networks receive tile embeddings as input and produce a score for each tile embedding that indicates a relative importance of the corresponding tile. The importance of the tile, and thus the attention score, is based on identifying tiles that are dissimilar from the "normal" tiles. This is based on the intuition that, even in digital pathology images depicting tissue having abnormalities, the overwhelming majority of tiles will depict normal-looking tissue. Therefore, the attention network may efficiently pick out tile embeddings (and thus tiles) that are different from the rest of the tiles in each set. Multiple attention networks may be used simultaneously, with each attention network being trained to identify tiles that are abnormal in a specific manner (e.g., depicting different types of abnormalities).
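The following is a minimal sketch of an attention network of this kind, assuming PyTorch; the class name and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class AttentionScorer(nn.Module):
    """Small attention network: maps each tile embedding to a scalar
    attention score. A sketch only; dimensions are illustrative."""

    def __init__(self, embed_dim: int = 2048, hidden_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, tile_embeddings: torch.Tensor) -> torch.Tensor:
        # tile_embeddings: (n_tiles, embed_dim) -> scores: (n_tiles,)
        return self.scorer(tile_embeddings).squeeze(-1)
```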
At step 730, the digital pathology image processing system 210 computes weighting values for each embedding based on the corresponding attention score. The weighting values are highly correlated with the attention scores but may result from normalizing methods, such as applying normalizing functions (e.g., the softmax function), that balance out the values of the attention scores and facilitate comparison of attention scores across different tiles, images, and attention networks.
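For example, applying the softmax function to raw attention scores yields weighting values that sum to one and are therefore comparable across images with different numbers of tiles (scores here are illustrative).

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([0.1, 0.2, 3.5])  # raw attention scores (illustrative)
weights = F.softmax(scores, dim=0)      # approx. [0.031, 0.034, 0.934]:
                                        # the outlier tile gets most weight
assert torch.isclose(weights.sum(), torch.tensor(1.0))
```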
At step 735, the digital pathology image processing system 210 computes an image embedding corresponding to the image based on the tile embeddings and corresponding weighting values. The image embedding serves as an efficient representation of the ordinarily large-format digital pathology image without losing the context of the image (e.g., based on the attention networks identifying key tiles). The image embedding may result from a weighted combination of the tile embeddings using the weighting values as weights in the combination. In embodiments where multiple attention networks are used to attempt to identify multiple conditions, the digital pathology image processing system 210 may generate multiple image embeddings (which may each be used to classify the image), or the digital pathology image processing system 210 may create a unified image representation based on the tile embeddings and the multiple sets of weighting values.
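As an illustrative sketch of one possible unified representation (an assumption, not a design mandated by the disclosure), a weighted image embedding may be computed per attention network and the results concatenated.

```python
import torch

def unified_image_embedding(tile_embeddings: torch.Tensor,
                            weight_sets: list[torch.Tensor]) -> torch.Tensor:
    """Build one unified representation from several attention networks.

    tile_embeddings: (n_tiles, embed_dim); each weight set: (n_tiles,).
    Returns a concatenation of per-network weighted embeddings,
    shape (num_networks * embed_dim,).
    """
    per_net = [torch.einsum("t,td->d", w, tile_embeddings)
               for w in weight_sets]
    return torch.cat(per_net)
```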
At step 740, the digital pathology image processing system 210 generates a digital pathology image classification based on the image embedding using one or more classification networks. The classification networks may include artificial neural networks that receive image embeddings as input and produce either a predicted classification of the image (e.g., normal, abnormal, depicting inflammation, etc.) or a determination that the image belongs to a specified classification (e.g., in embodiments in which multiple classification networks are used and each is trained to identify a single classification for the image). The classification networks may also produce confidence scores or intervals for the detected classifications that may indicate the degree of certainty of the classification networks. The digital pathology image processing system 210 is not limited in the number or types of classifications that may be added to it; thus, as additional training samples for a new classification are identified, the capabilities of the digital pathology image processing system may be expanded in a semi-modular fashion.
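By way of example, a classification network with sigmoid outputs allows non-mutually-exclusive findings (e.g., inflammation and necrosis in the same image), with the per-class probabilities serving as rough confidence scores. The sizes and class count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImageClassifier(nn.Module):
    """Classification network sketch: maps an image embedding to
    per-class probabilities. Sigmoid (rather than softmax) outputs
    permit multiple simultaneous findings; dimensions are illustrative."""

    def __init__(self, embed_dim: int = 2048, num_classes: int = 3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        # Per-class probabilities double as rough confidence scores.
        return torch.sigmoid(self.head(image_embedding))
```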
At step 745, the digital pathology image processing system 210 may generate an enhanced overlay or interactive interface for the digital pathology image. The enhanced overlay or interactive interface may include visualizations of the digital pathology image designed to enhance a viewer's understanding of the image while also providing insight into the inner workings of the digital pathology image processing system. For example, as described herein, the digital pathology image processing system 210 may produce one or more "heatmaps" of the digital pathology image that map to the tiles (or related groupings) of the digital pathology image. The intensity of the cells of the heatmaps may correspond to, for example, the attention scores or weighting values produced by the attention networks. Thus, multiple heatmaps may be created (one corresponding to each of the attention networks), or a single heatmap may be created that illustrates the output of the various attention networks. The digital pathology image processing system 210 may also produce annotations for the digital pathology image that identify areas of the image that may be of interest to the viewer. For example, using the attention scores or weighting values, the digital pathology image processing system 210 may identify regions of the image, indicate the classification determined by the classification network for the tiles associated with each region, and provide additional data regarding that region and the tiles within it. The system may also use the tiles within an annotated region to perform image analysis and recognition on other tiles in the image, indicating where similar features may be found. These forms of output, and many others, may be designed to be provided through the user device 230.
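For illustration, a heatmap may be assembled by arranging the per-tile weighting values back onto the tile grid, as in the following sketch; the row-major tile ordering and the rescaling to [0, 1] for display are assumptions of the sketch.

```python
import numpy as np

def attention_heatmap(weights: np.ndarray,
                      grid_shape: tuple[int, int]) -> np.ndarray:
    """Arrange per-tile weighting values onto the tile grid so that
    heatmap cells align with tile positions in the source image.

    weights: shape (n_tiles,) in row-major tile order (assumed);
    grid_shape: (rows, cols) of the tile grid.
    """
    heatmap = weights.reshape(grid_shape).astype(float)
    lo, hi = heatmap.min(), heatmap.max()
    # Rescale intensities to [0, 1] for display.
    return (heatmap - lo) / (hi - lo + 1e-8)
```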
At step 750, the digital pathology image processing system 210 may identify derivative characteristics of the digital pathology image or the tissues depicted therein based on the tile embeddings, image embeddings, and/or classification. For example, the digital pathology image processing system 210 may store associations and correlations between certain types of classifications or features captured in tile embeddings. For example, the digital pathology image processing system may learn natural associations between types of abnormalities that may be depicted in digital pathology images. Thus, the derivative characteristics may serve as warnings or reminders to the user to look for additional features in the digital pathology image. The derivative characteristics may also correlate tile embeddings across digital pathology images. For example, the digital pathology image processing system 210 may store tile embeddings or patterns of tile embeddings and perform pattern matching with an image being evaluated to draw attention to similarities with previously reviewed images. The digital pathology image processing system 210 may therefore serve as a tool to identify underlying similarities and characteristics.
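As a sketch of such pattern matching (assuming PyTorch; the function name is hypothetical), stored tile embeddings from previously reviewed images may be compared to a query embedding by cosine similarity.

```python
import torch
import torch.nn.functional as F

def most_similar_stored_tiles(query: torch.Tensor,
                              stored: torch.Tensor,
                              top_k: int = 5):
    """Match a tile embedding from the image under review against
    stored embeddings from previously reviewed images.

    query: shape (embed_dim,); stored: shape (n_stored, embed_dim).
    Returns (similarities, indices) of the top_k closest stored tiles.
    """
    sims = F.cosine_similarity(query.unsqueeze(0), stored, dim=1)
    return torch.topk(sims, k=min(top_k, stored.shape[0]))
```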
At step 755, the digital pathology image processing system 210 provides the generated output for display. The generated output may include, for example, the digital pathology image classification, the enhanced overlay or interactive interface, or the derivative characteristics and statistics thereon. These outputs and more may be provided to a user via, for example, a suitably configured user device 230. The output may be provided in an interactive interface that facilitates the user's review of the analysis performed by the digital pathology image processing system 210 while also supporting the user's independent analysis. For example, the user may turn various features of the output on or off; zoom, pan, and otherwise manipulate the digital pathology image; and provide feedback or notes regarding the classifications, annotations, and derivative characteristics.
At step 760, the digital pathology image processing system 210 may receive feedback regarding the provided output. The user may provide feedback regarding the accuracy of the classifications or annotations. The user can, for example, indicate areas of interest (as well as the reasons why they are interesting) that were not previously identified by the digital pathology image processing system 210. The user may also indicate additional classifications for the image that were not already suggested or captured by the digital pathology image processing system 210. This feedback may also be stored for the user's later access, for example as clinical notes.
At step 765, the digital pathology image processing system 210 uses the feedback to retrain one or more of the networks used in generating the classification, for example, the attention networks or classification networks. The digital pathology image processing system 210 may use the feedback to supplement its available dataset, with the additional benefit that the feedback has been provided by a human expert, which increases its reliability. The digital pathology image processing system 210 may continuously revise the networks underlying the analysis provided by the system with a goal of increasing the accuracy of its classifications as well as increasing the rate at which the digital pathology image processing system identifies major areas of interest (e.g., attributing high attention scores to highly descriptive tiles). Thus, the digital pathology image processing system 210 is not a static system, but may offer and benefit from continuous improvement.
Particular embodiments may repeat one or more steps of the method described above, where appropriate.
The general techniques described herein may be integrated into a variety of tools and use cases. For example, as described, a user (e.g., a pathologist or clinician) may access a user device 230 that is in communication with the digital pathology image processing system 210 and provide a digital pathology image for analysis. The digital pathology image processing system 210, or the connection to it, may be provided as a standalone software tool or package that automatically annotates digital pathology images and/or generates heatmaps evaluating the images under analysis. As a standalone tool or plug-in that may be purchased or licensed on a streamlined basis, the tool may be used to augment the capabilities of a research or clinical lab. Additionally, the tool may be integrated into the services made available to customers of digital pathology image generation systems. For example, the tool may be provided as part of a unified workflow, in which a user who conducts or requests creation of a digital pathology image automatically receives an annotated image or equivalent heatmap. Therefore, in addition to improving digital pathology image analysis, the techniques may be integrated into existing systems to provide additional features not previously considered or possible.
Moreover, the digital pathology image processing system 210 may be trained and customized for use in particular settings. For example, the digital pathology image processing system 210 may be specifically trained for use in providing clinical diagnoses relating to specific types of tissue (e.g., lung, heart, blood, liver, etc.). As another example, the digital pathology image processing system 210 may be trained to assist with safety assessment, for example in determining levels or degrees of toxicity associated with drugs or other potential therapeutic treatments. Once trained for use in a specific subject matter or use case, the digital pathology image processing system 210 is not necessarily limited to that use case. For example, the digital pathology image processing system may be trained for use in toxicity assessment for liver tissues, but the resulting models may also be applied in a diagnostic setting. Training may be performed in a particular context, e.g., toxicity assessment, due to the availability of a relatively larger set of at least partially labeled or annotated digital pathology images. The included appendix relates to results of using the techniques described herein to perform toxicity assessment, including identifying a common toxicity event, and illustrates example output related to toxicity assessment.
This disclosure contemplates any suitable number of computer systems 800. This disclosure contemplates computer system 800 taking any suitable physical form. As example and not by way of limitation, computer system 800 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a mobile telephone, a personal digital assistant (PDA), a server, a tablet computer system, an augmented/virtual reality device, or a combination of two or more of these. Where appropriate, computer system 800 may include one or more computer systems 800; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks. Where appropriate, one or more computer systems 800 may perform without substantial spatial or temporal limitation one or more steps of one or more methods described or illustrated herein. As an example and not by way of limitation, one or more computer systems 800 may perform in real time or in batch mode one or more steps of one or more methods described or illustrated herein. One or more computer systems 800 may perform at different times or at different locations one or more steps of one or more methods described or illustrated herein, where appropriate.
In particular embodiments, computer system 800 includes a processor 802, memory 804, storage 806, an input/output (I/O) interface 808, a communication interface 810, and a bus 812. Although this disclosure describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, this disclosure contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.
In particular embodiments, processor 802 includes hardware for executing instructions, such as those making up a computer program. As an example and not by way of limitation, to execute instructions, processor 802 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 804, or storage 806; decode and execute them; and then write one or more results to an internal register, an internal cache, memory 804, or storage 806. In particular embodiments, processor 802 may include one or more internal caches for data, instructions, or addresses. This disclosure contemplates processor 802 including any suitable number of any suitable internal caches, where appropriate. As an example and not by way of limitation, processor 802 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction caches may be copies of instructions in memory 804 or storage 806, and the instruction caches may speed up retrieval of those instructions by processor 802. Data in the data caches may be copies of data in memory 804 or storage 806 for instructions executing at processor 802 to operate on; the results of previous instructions executed at processor 802 for access by subsequent instructions executing at processor 802 or for writing to memory 804 or storage 806; or other suitable data. The data caches may speed up read or write operations by processor 802. The TLBs may speed up virtual-address translation for processor 802. In particular embodiments, processor 802 may include one or more internal registers for data, instructions, or addresses. This disclosure contemplates processor 802 including any suitable number of any suitable internal registers, where appropriate. Where appropriate, processor 802 may include one or more arithmetic logic units (ALUs); be a multi-core processor; or include one or more processors 802. Although this disclosure describes and illustrates a particular processor, this disclosure contemplates any suitable processor.
In particular embodiments, memory 804 includes main memory for storing instructions for processor 802 to execute or data for processor 802 to operate on. As an example and not by way of limitation, computer system 800 may load instructions from storage 806 or another source (such as, for example, another computer system 800) to memory 804. Processor 802 may then load the instructions from memory 804 to an internal register or internal cache. To execute the instructions, processor 802 may retrieve the instructions from the internal register or internal cache and decode them. During or after execution of the instructions, processor 802 may write one or more results (which may be intermediate or final results) to the internal register or internal cache. Processor 802 may then write one or more of those results to memory 804. In particular embodiments, processor 802 executes only instructions in one or more internal registers or internal caches or in memory 804 (as opposed to storage 806 or elsewhere) and operates only on data in one or more internal registers or internal caches or in memory 804 (as opposed to storage 806 or elsewhere). One or more memory buses (which may each include an address bus and a data bus) may couple processor 802 to memory 804. Bus 812 may include one or more memory buses, as described below. In particular embodiments, one or more memory management units (MMUs) reside between processor 802 and memory 804 and facilitate accesses to memory 804 requested by processor 802. In particular embodiments, memory 804 includes random access memory (RAM). This RAM may be volatile memory, where appropriate. Where appropriate, this RAM may be dynamic RAM (DRAM) or static RAM (SRAM). Moreover, where appropriate, this RAM may be single-ported or multi-ported RAM. This disclosure contemplates any suitable RAM. Memory 804 may include one or more memories 804, where appropriate. Although this disclosure describes and illustrates particular memory, this disclosure contemplates any suitable memory.
In particular embodiments, storage 806 includes mass storage for data or instructions. As an example and not by way of limitation, storage 806 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Storage 806 may include removable or non-removable (or fixed) media, where appropriate. Storage 806 may be internal or external to computer system 800, where appropriate. In particular embodiments, storage 806 is non-volatile, solid-state memory. In particular embodiments, storage 806 includes read-only memory (ROM). Where appropriate, this ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory or a combination of two or more of these. This disclosure contemplates mass storage 806 taking any suitable physical form. Storage 806 may include one or more storage control units facilitating communication between processor 802 and storage 806, where appropriate. Where appropriate, storage 806 may include one or more storages 806. Although this disclosure describes and illustrates particular storage, this disclosure contemplates any suitable storage.
In particular embodiments, I/O interface 808 includes hardware, software, or both, providing one or more interfaces for communication between computer system 800 and one or more I/O devices. Computer system 800 may include one or more of these I/O devices, where appropriate. One or more of these I/O devices may enable communication between a person and computer system 800. As an example and not by way of limitation, an I/O device may include a keyboard, keypad, microphone, monitor, mouse, printer, scanner, speaker, still camera, stylus, tablet, touch screen, trackball, video camera, another suitable I/O device or a combination of two or more of these. An I/O device may include one or more sensors. This disclosure contemplates any suitable I/O devices and any suitable I/O interfaces 808 for them. Where appropriate, I/O interface 808 may include one or more device or software drivers enabling processor 802 to drive one or more of these I/O devices. I/O interface 808 may include one or more I/O interfaces 808, where appropriate. Although this disclosure describes and illustrates a particular I/O interface, this disclosure contemplates any suitable I/O interface.
In particular embodiments, communication interface 810 includes hardware, software, or both providing one or more interfaces for communication (such as, for example, packet-based communication) between computer system 800 and one or more other computer systems 800 or one or more networks. As an example and not by way of limitation, communication interface 810 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network. This disclosure contemplates any suitable network and any suitable communication interface 810 for it. As an example and not by way of limitation, computer system 800 may communicate with an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or one or more portions of the Internet or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, computer system 800 may communicate with a wireless PAN (WPAN) (such as, for example, a BLUETOOTH WPAN), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or other suitable wireless network or a combination of two or more of these. Computer system 800 may include any suitable communication interface 810 for any of these networks, where appropriate. Communication interface 810 may include one or more communication interfaces 810, where appropriate. Although this disclosure describes and illustrates a particular communication interface, this disclosure contemplates any suitable communication interface.
In particular embodiments, bus 812 includes hardware, software, or both coupling components of computer system 800 to each other. As an example and not by way of limitation, bus 812 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HYPERTRANSPORT (HT) interconnect, an Industry Standard Architecture (ISA) bus, an INFINIBAND interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or another suitable bus or a combination of two or more of these. Bus 812 may include one or more buses 812, where appropriate. Although this disclosure describes and illustrates a particular bus, this disclosure contemplates any suitable bus or interconnect.
Herein, a computer-readable non-transitory storage medium or media may include one or more semiconductor-based or other integrated circuits (ICs) (such as, for example, field-programmable gate arrays (FPGAs) or application-specific ICs (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical discs, optical disc drives (ODDs), magneto-optical discs, magneto-optical drives, floppy diskettes, floppy disk drives (FDDs), magnetic tapes, solid-state drives (SSDs), RAM-drives, SECURE DIGITAL cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these, where appropriate. A computer-readable non-transitory storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile, where appropriate.
Herein, “or” is inclusive and not exclusive, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A or B” means “A, B, or both,” unless expressly indicated otherwise or indicated otherwise by context. Moreover, “and” is both joint and several, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A and B” means “A and B, jointly or severally,” unless expressly indicated otherwise or indicated otherwise by context.
The scope of this disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the example embodiments described or illustrated herein that a person having ordinary skill in the art would comprehend. The scope of this disclosure is not limited to the example embodiments described or illustrated herein. Moreover, although this disclosure describes and illustrates respective embodiments herein as including particular components, elements, features, functions, operations, or steps, any of these embodiments may include any combination or permutation of any of the components, elements, features, functions, operations, or steps described or illustrated anywhere herein that a person having ordinary skill in the art would comprehend. Furthermore, reference in the appended claims to an apparatus or system or a component of an apparatus or system being adapted to, arranged to, capable of, configured to, enabled to, operable to, or operative to perform a particular function encompasses that apparatus, system, or component, whether or not it or that particular function is activated, turned on, or unlocked, as long as that apparatus, system, or component is so adapted, arranged, capable, configured, enabled, operable, or operative. Additionally, although this disclosure describes or illustrates particular embodiments as providing particular advantages, particular embodiments may provide none, some, or all of these advantages.
This application claims the benefit of and the priority to U.S. Provisional Application No. 63/160,493 entitled “ATTENTION-BASED MULTIPLE INSTANCE LEARNING FOR WHOLE SLIDE IMAGES” and filed on Mar. 12, 2021, which is hereby incorporated by reference in its entirety for all purposes.
Provisional application: No. 63/160,493, filed Mar. 2021 (US).
Parent application: PCT/US2022/020059, filed Mar. 2022 (US); child application: No. 18463585 (US).