The present disclosure relates to digital pathology. More particularly, the present disclosure relates to a method and apparatus for analyzing tissue structures by quantitatively measuring tissue cell nuclei arrangements on a digital histological micrograph.
The function of a tissue is determined to a large extent by the arrangement of its cells, and pathologists can learn a great deal about the functionality and health of a tissue by examining the structure of its cell arrangements. In particular, in the presence of cancer, cells lose their ability to grow in well-organized structures, such as epithelial layers, and their arrangements tend to become more random. Such randomness (or structural entropy) is an important diagnostic measure, and pathologists are trained to identify cell arrangements as normal, functional tissue or as atypical and indicative of disease.
Normal cells tend to form glands or other structures they are programmed to make. In cancer, cells often lose the ability to arrange themselves, thereby creating tumors. The degree to which these tumors lack cell arrangements is a measure of their aggressiveness. This measure is essential for providing adequate treatment to patients: the more aggressive the tumor, the more aggressive the treatment should be. A precise understanding of the aggressiveness of a tumor is key to better treatment of cancer patients. Heavy chemotherapy and radiotherapy have serious side effects that can potentially kill a patient, so it is essential to dose these treatments so that they kill the tumor and not the patient.
Prior art methods for identifying cell arrangements by computer have mostly focused on gland structures, attempting to identify them by their lumen, i.e., the visible white opening in the center of a gland. However, such methods are not effective in difficult situations where the glands are not well formed or when lumens are not visible. Other methods use edge detectors to identify local gland edges formed by several nuclei. This approach is also ineffective when the glands are not well formed or when they are touching each other.
Accordingly, a method and an apparatus are needed for reliably analyzing cell nuclei arrangements to determine the functionality and health of a tissue.
A method is disclosed for quantitatively measuring structural entropy of cell nuclei in biopsy tissue. The method comprises the steps of: obtaining a dye color map of an image of the tissue; locating cell nuclei in the dye color map; and measuring, within small groups of cell nuclei, features that determine their structural entropy or degree of organization.
In some embodiments, the step of measuring statistics of structural arrangement features is performed over cliques of neighboring nuclei.
In some embodiments, one of the features comprises an average distance of a given nucleus to its two closest neighbors.
In some embodiments, one of the features comprises an angle formed by a given nucleus and its two closest neighbors.
In some embodiments, one of the features comprises a number of circular sectors around a given nucleus that contain one or more cell nuclei.
In some embodiments, the step of measuring statistics of structural arrangement features is performed over paths of consecutive nuclei wherein a number of nuclei form each of the paths.
In some embodiments, each of the paths is obtained by the steps of: selecting a first cell nucleus not already belonging to a path; finding a second cell nucleus not already belonging to a path within a predetermined distance range of the first nucleus; finding a third cell nucleus not already belonging to a path within a predetermined distance range of the second nucleus and also within a predetermined angle of the direction formed by the first and second nuclei; and finding consecutive cell nuclei not already belonging to a path within a predetermined distance range of the preceding nucleus and also within a predetermined angle of the direction formed with the previous nucleus and offset by the angle formed by the latter direction and the direction formed by the previous nucleus and its predecessor in the path, until no existing nuclei not already belonging to a path satisfy those constraints.
In some embodiments, instead of considering one nucleus at each step of the path, a set of candidate nuclei is selected, and the method further comprises the steps of: creating a directed acyclic graph of all possible combinations of nuclei that form the path; assigning a cost value to each transition in the graph based on the distance between the two nuclei as well as the angle formed by their direction; and performing a Viterbi algorithm to find the “best” path, i.e., the path generating the lowest cost.
In some embodiments, one of the features comprises a number of nuclei in a path.
In some embodiments, one of the features comprises a curvature of a path.
In some embodiments, one of the features comprises the number of inflexion points or change of sign in the curvature of a path.
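The three path features just listed (number of nuclei, curvature, and number of inflexion points) can be illustrated with a minimal sketch. The disclosure does not define how curvature is computed; the sketch assumes, purely for illustration, that curvature is summarized as the mean absolute turning angle between consecutive hops, and that an inflexion point is a change of sign in that turning angle. The function name `path_features` and the tolerance `eps` are hypothetical.

```python
import math

def path_features(coords, eps=1e-9):
    """Features of one path given as an ordered list of (x, y) nucleus centers.

    Assumption: curvature is approximated by the mean absolute turning angle
    (radians) between consecutive hops; this is one plausible choice only.
    """
    # Heading of each hop between consecutive nuclei.
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(coords, coords[1:])]
    # Signed turning angle at each interior nucleus, wrapped to (-pi, pi].
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        d = (h1 - h0) % (2 * math.pi)
        if d > math.pi:
            d -= 2 * math.pi
        turns.append(d)
    mean_abs_turn = sum(abs(t) for t in turns) / len(turns) if turns else 0.0
    # An inflexion point is counted whenever the turning direction flips sign;
    # near-zero turns are ignored so straight segments do not create flips.
    signs = [1 if t > 0 else -1 for t in turns if abs(t) > eps]
    inflexions = sum(1 for s0, s1 in zip(signs, signs[1:]) if s0 != s1)
    return {"n_nuclei": len(coords),
            "mean_abs_turn": mean_abs_turn,
            "inflexions": inflexions}
```

For a straight path the turning angles are all zero, so both the curvature summary and the inflexion count are zero; an S-shaped path produces at least one inflexion.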
In some embodiments, the step of measuring statistics of structural arrangement features is performed over closed paths of consecutive nuclei, wherein the nuclei forming each path close onto themselves.
In some embodiments, each of the closed paths is obtained by the steps of: selecting a first cell nucleus not already belonging to a path; finding a second cell nucleus not already belonging to a path within a predetermined distance range of the first nucleus; finding a third cell nucleus not already belonging to a path within a predetermined distance range of the second nucleus and also within a predetermined angle of the direction formed by the first and second nuclei; finding consecutive cell nuclei not already belonging to a path, except for the first nucleus of this path, within a predetermined distance range of the preceding nucleus and also within a predetermined angle of the direction formed with the previous nucleus and offset by the angle formed by the latter direction and the direction formed by the previous nucleus and its predecessor in the path, until either the first nucleus of the path satisfies those constraints or no existing nuclei not already belonging to a path satisfy those constraints; and discarding paths for which the last nucleus is not the first nucleus.
In some embodiments, one of the statistics comprises a normalized histogram.
In some embodiments, one of the statistics comprises a percentile value.
In some embodiments, one of the statistics comprises an average value.
In some embodiments, one of the statistics comprises a standard deviation value.
In some embodiments, one of the statistics comprises the number of groups having certain characteristics.
In some embodiments, one of the statistics comprises the number of groups forming closed paths.
Also disclosed is an apparatus for measuring said structural entropy. The apparatus comprises a processor executing instructions for: obtaining a dye color map of the biopsy tissue sample; locating cell nuclei in the dye color map; and measuring, within small groups of cell nuclei, features that determine their structural entropy (or degree of organization).
When cells begin to lose their ability to form organized structures, their proliferation becomes more random and entropy increases. In the present disclosure, a method is described for directly measuring the structural entropy of cell nuclei. The method identifies different cell nuclei arrangements reliably and provides a powerful tool for locating and evaluating the aggressiveness of cancerous tissues.
After preparing the slide, a color scan (digital micrograph) of the slide of the stained biopsy tissue sample(s) is obtained with an imaging device in block 104.
In block 106, the digital micrograph is pre-processed to extract a dye (e.g., hematoxylin) color map. Specifically, individual cell nuclei of the tissue sample(s) are identified and located in the digital micrograph of the slide containing stained biopsy tissue. The resolution of the dye stained images is typically above 100K dpi (4.35 pixels per micron). A succession of filters applied at different resolutions is used to progressively “zoom-in” on regions of interest. In this context, regions of interest comprise regions where large nuclei may be present. A first filter, based on luminance, is used to locate the tissue within the slide. A second filter is used to reject pen markings made on the slide to separate the biopsy tissue samples. A third filter is used to reject areas that are out of focus. After filtering, the average dye colors are adaptively found and the original RGB pixels are projected onto each color channel to obtain dye intensity maps.
The staining process is a manual operation involving many steps, and the amount of stain used or the thickness of the biopsy tissue sample(s) may affect the final visual appearance of the sample(s). In addition, the imaging process (scanning) may be performed on different systems with different types of light sources, and may or may not apply color correction. There is therefore a need to first find, for each tissue sample, the actual dye colors and then to normalize the image based on color vectors corresponding to the dyes (H and E vectors in embodiments using HE-staining). The average H and E vectors are determined with the following equations:
where P is an RGB pixel and PR is the red component of P; and
In block 108, individual nuclei are located from the dye color map that stains the nuclear chromatin (e.g., Hematoxylin in HE-staining embodiments). In one exemplary embodiment, areas containing nuclei are located in the dye color map by running a DoG filter (Difference of Gaussian) over the image. Non-maxima suppression is applied to the resulting map and a list of candidate locations for each nucleus is obtained.
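The candidate-location step can be sketched as follows. This is a minimal, dependency-free illustration and not the implementation of the referenced publication: it assumes the dye map is a small 2D list of floats in which nuclei appear as bright blobs, and the function names, the two sigma values, and the threshold are hypothetical choices made for the example.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at about 3 sigma and normalized to sum to 1."""
    r = max(1, int(3 * sigma))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-r, r + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur(img, sigma):
    """Separable Gaussian blur with edge replication."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass, then vertical pass.
    tmp = [[sum(k[t + r] * img[y][min(max(x + t, 0), w - 1)]
                for t in range(-r, r + 1)) for x in range(w)] for y in range(h)]
    return [[sum(k[t + r] * tmp[min(max(y + t, 0), h - 1)][x]
                 for t in range(-r, r + 1)) for x in range(w)] for y in range(h)]

def dog_candidates(img, sigma_small=1.5, sigma_large=3.0, threshold=0.05):
    """Return (x, y) candidate nucleus locations from a dye intensity map.

    The Difference of Gaussians is the narrow blur minus the wide blur;
    non-maxima suppression keeps pixels that exceed the threshold and are
    strictly greater than all eight neighbors.
    """
    a, b = blur(img, sigma_small), blur(img, sigma_large)
    h, w = len(img), len(img[0])
    dog = [[a[y][x] - b[y][x] for x in range(w)] for y in range(h)]
    peaks = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = dog[y][x]
            if v < threshold:
                continue
            neighbors = [dog[y + dy][x + dx]
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0)]
            if all(v > n for n in neighbors):
                peaks.append((x, y))
    return peaks
```

In practice the two sigmas would be chosen to bracket the expected nucleus radius at the working resolution, so that blobs of roughly nuclear size give the strongest response.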
Starting at the previously obtained candidate locations, an Active Contour Algorithm (nuclei finder) is used to find the contours (nuclei). Opposing constraints of elasticity and edge-matching are used to iteratively update the snakes to match the contours of nuclei on the image. Several initial conditions and elastic strengths are used per location, thus producing n·e·i contours, where n is the number of detections in the image, e is the number of different elastic strengths and i is the number of different initializations.
Because of large variations in the appearance of biopsy tissues as well as differences in staining quality, there exist many opportunities for the nuclei finder to produce spurious contours. A typical spurious contour occurs when two or more nuclei are clumped together, resulting in one contour encompassing them all. Such spurious contours would easily throw off the statistics of the nuclear area measurements. Fortunately, since there are usually a large number of nuclei in an image, the nuclei for which the nuclei finder failed to produce a sufficiently trustworthy contour can be ignored. In order to reliably filter out those spurious contours, a machine-learning approach is used in which contours are overlaid onto the original digital micrograph and labeled manually. In one embodiment, an interactive graphical interface is used for overlaying the contours onto the image. The operator/user simply clicks with a mouse on spurious contours, thus creating a training set of “good” and “spurious” contours to train a classifier to select contours. For the training phase, each contour is encoded into a vector of features. For this task, only features relevant to discriminating between well-formed contours that match their underlying image and spurious contours that do not follow the outline of a nucleus are used. Hence, a set of features is selected that captures the response of the edge map along the contour, the texture within the contour, and the geometry of the contour. Then a classifier, such as a Support Vector Machine, is trained to select the contours or nuclei in the color map.
The processes of blocks 102, 104, 106, and 108 are well known in the art, for example, see U.S. Patent Application Publication No. 2009/0297007 A1, entitled Automated Method And System For Nuclear Analysis of Biopsy Images, which is assigned to the assignee herein. The disclosure of U.S. Patent Application Publication No. 2009/0297007 A1, as it pertains to the processes of blocks 102, 104, 106, and 108, is incorporated herein by reference.
In block 110, small groups of the cell nuclei found in block 108 are examined to measure how they are structurally arranged or organized. Micrographs of biopsy tissue typically contain vast numbers of cells. For efficiency reasons, it is therefore essential to restrict structural entropy evaluation to small, manageable groups of cells. Once structural features are obtained from those groups of cells, statistics of those features are computed over a region of interest of the tissue. Those statistics represent the structural entropy in that region of the tissue. The present disclosure describes two processes for grouping cell nuclei: cliques and paths. In biology, groups or cliques of atoms are sometimes used to characterize proteins and define similarities between molecules. The cliques process of the present disclosure is different, because an individual clique is not used for describing the local properties of the tissue; instead, a statistical measure over a set of cliques is calculated. The process of forming cliques starts from a given nucleus and adds neighboring nuclei that fit a certain set of constraints. The process of forming paths starts from a nucleus and adds nuclei while trying to form a path with certain directional and shape constraints typical of epithelial layers of glands.
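As an illustration of the cliques process, the following sketch computes the three clique features named in the summary for a single nucleus: the average distance to its two closest neighbors, the angle formed with those two neighbors, and the number of circular sectors around it that contain at least one nucleus. It is a sketch under stated assumptions rather than the disclosed implementation: nuclei are represented by (x, y) centers, and the function names, the sector radius, and the number of sectors are hypothetical.

```python
import math

def two_nearest(points, i):
    """Indices of the two nuclei closest to points[i] (needs at least 3 nuclei)."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return others[:2]

def clique_features(points, i, radius=30.0, n_sectors=8):
    """Return (mean distance, angle in degrees, occupied sector count) for nucleus i.

    radius and n_sectors are illustrative assumptions, not disclosed values.
    """
    xi, yi = points[i]
    a, b = two_nearest(points, i)
    da = math.hypot(points[a][0] - xi, points[a][1] - yi)
    db = math.hypot(points[b][0] - xi, points[b][1] - yi)
    mean_dist = (da + db) / 2.0
    # Angle at nucleus i between the directions to its two closest neighbors,
    # folded into [0, 180] degrees.
    ang_a = math.atan2(points[a][1] - yi, points[a][0] - xi)
    ang_b = math.atan2(points[b][1] - yi, points[b][0] - xi)
    diff = abs(ang_a - ang_b)
    if diff > math.pi:
        diff = 2 * math.pi - diff
    angle_deg = math.degrees(diff)
    # Count the angular sectors (within the radius) holding at least one nucleus.
    sector_width = 2 * math.pi / n_sectors
    occupied = set()
    for j, (x, y) in enumerate(points):
        if j == i or math.hypot(x - xi, y - yi) > radius:
            continue
        theta = math.atan2(y - yi, x - xi) % (2 * math.pi)
        occupied.add(min(int(theta / sector_width), n_sectors - 1))
    return mean_dist, angle_deg, len(occupied)
```

A nucleus sitting in an epithelial layer would be expected to give an angle near 180 degrees and few occupied sectors, whereas a nucleus in disorganized tissue would tend to give arbitrary angles and many occupied sectors.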
The path process joins the cell nuclei of a group in a path by selecting a cell nucleus and hopping from cell nucleus to cell nucleus following a given set of constraints. Once the constraints cannot be met anymore, the path construction stops and another cell nucleus is selected as the start point. Many numeric feature values can be derived from these paths. For example, in a given area of the micrograph, the number of paths, the average length of the paths, the average regularity of the paths, the curvature of the paths and the number of inflexion points in the paths are good features. Another feature is the proportion of cell nuclei belonging to such paths.
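A minimal sketch of the greedy path construction is given below. It follows the constraints stated in the summary (a distance range for each hop, and an angular tolerance around the previous direction offset by the previous turn), but the concrete thresholds, the choice of the nearest admissible nucleus at each hop, and the names `build_paths` and `angle_diff` are assumptions made for illustration.

```python
import math

def angle_diff(a, b):
    """Signed difference a - b wrapped to (-pi, pi]."""
    d = (a - b) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d

def build_paths(points, d_min=5.0, d_max=25.0, max_turn=math.radians(30)):
    """Greedily group nuclei (x, y centers) into paths of point indices.

    d_min, d_max and max_turn are illustrative values, not disclosed ones.
    """
    used, paths = set(), []
    for start in range(len(points)):
        if start in used:
            continue
        path = [start]
        used.add(start)
        heading = None   # direction of the last hop; None before the first hop
        turn = 0.0       # last change of heading, used to extrapolate the curve
        while True:
            cx, cy = points[path[-1]]
            best = None
            for j, (x, y) in enumerate(points):
                if j in used:
                    continue
                d = math.hypot(x - cx, y - cy)
                if not (d_min <= d <= d_max):
                    continue
                direction = math.atan2(y - cy, x - cx)
                # From the third nucleus on, the hop must stay close to the
                # previous heading offset by the previous turn.
                if heading is not None and \
                        abs(angle_diff(direction, heading + turn)) > max_turn:
                    continue
                if best is None or d < best[0]:
                    best = (d, j, direction)
            if best is None:
                break  # constraints can no longer be met: the path ends here
            _, j, direction = best
            if heading is not None:
                turn = angle_diff(direction, heading)
            heading = direction
            path.append(j)
            used.add(j)
        paths.append(path)
    return paths
```

Nuclei that cannot be linked to any neighbor come back as single-element paths, which makes it straightforward to compute the proportion of nuclei belonging to longer paths.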
An enhancement of this method allows more than one candidate nucleus at each step, thus creating several candidate paths for a given starting cell nucleus. The candidate paths create a directed acyclic graph with edges representing transitions from one cell nucleus to the next. Each edge of the graph is given a cost based on its fitness to the path. A CPU is then programmed with a Viterbi algorithm to find the best path among the multiple candidate paths.
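The lowest-cost search over candidate paths can be sketched as a Viterbi-style dynamic program. The disclosure states only that each transition cost depends on the distance between the two nuclei and the angle formed by their direction; the specific cost used here (deviation from a target hop length plus a weighted deviation from a reference heading), the layered `stages` representation of the graph, and all parameter names are assumptions made for illustration.

```python
import math

def viterbi_path(stages, ref_heading=0.0, d_target=10.0, w_angle=5.0):
    """Lowest-cost path through layered candidates.

    stages[k] is the list of candidate (x, y) nuclei at step k; every
    candidate at step k may follow every candidate at step k - 1.
    Returns (list of chosen points, total cost).
    """
    def cost(p, q):
        # Hypothetical transition cost: hop-length deviation plus a weighted
        # angular deviation from the reference heading, wrapped to [0, pi].
        d = math.hypot(q[0] - p[0], q[1] - p[1])
        a = math.atan2(q[1] - p[1], q[0] - p[0])
        dev = abs((a - ref_heading + math.pi) % (2 * math.pi) - math.pi)
        return abs(d - d_target) + w_angle * dev

    best = [0.0] * len(stages[0])   # best cost to reach each current candidate
    back = []                        # back-pointers, one list per transition
    for prev, cur in zip(stages, stages[1:]):
        new_best, ptr = [], []
        for q in cur:
            costs = [best[i] + cost(p, q) for i, p in enumerate(prev)]
            i_min = min(range(len(prev)), key=costs.__getitem__)
            new_best.append(costs[i_min])
            ptr.append(i_min)
        best = new_best
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    k = min(range(len(best)), key=best.__getitem__)
    total = best[k]
    idx = [k]
    for ptr in reversed(back):
        k = ptr[k]
        idx.append(k)
    idx.reverse()
    return [stages[s][i] for s, i in enumerate(idx)], total
```

Because each stage only needs the best cost of the previous stage, the search is linear in the number of steps and quadratic in the number of candidates per step, rather than exponential in the path length.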
Features from cliques and paths are then collected over a certain area of the tissue. Statistics of these features are then extracted to characterize the structure of that tissue area. Such statistics include (but are not limited to) average, standard deviation, percentiles and normalized histograms. Features from cliques and paths can be used individually, combined together or combined with other features to derive a structural measure of the tubularity of a cancer tumor. Tubularity refers to the degree to which cell nuclei arrange in glands (or tubules). If a large proportion of nuclei in a given region of interest (ROI) form paths that are sufficiently long (for example, at least 10 nuclei), then the area can be deemed to have a high degree of tubularity and should be classified as low probability for cancer. Clique statistics can be used in a similar way. For example, if the histogram of angles (between a considered nucleus and its two closest neighbors) over a given ROI shows a hump around 180 degrees, most nuclei are part of a gland (their closest neighbors are to their left and right, and not in front or behind). Thus, this ROI shows a high degree of glandularity (or tubularity) and should be classified as non-cancer. The diagnosis of cancer/non-cancer can be performed using these features (by themselves or combined with features from other analytics) in a classifier.
In other embodiments of the method and system, the size, shape and distribution of chromatin in each nucleus can be determined (prior to the process of block 110) to further improve the characterization of the nuclei arrangements. For example, the cliques and/or paths can be made more reliable by enforcing a constraint that the nuclei of the cliques and/or paths be of similar size and/or shape and/or texture. In addition, the speed of the method and system can be improved by filtering out nuclei of certain sizes and/or shapes and/or textures before extracting cliques.
While exemplary drawings and specific embodiments of the present disclosure have been described and illustrated, it is to be understood that the scope of the invention as set forth in the claims is not to be limited to the particular embodiments discussed. Thus, the embodiments shall be regarded as illustrative rather than restrictive, and it should be understood that variations may be made in those embodiments by persons skilled in the art without departing from the scope of the invention as set forth in the claims that follow and their structural and functional equivalents.
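The statistics named above can be computed with a few lines of standard-library code. The sketch below is illustrative only: the helper names, the bin layout, the choice of a population standard deviation, and the particular percentiles reported are assumptions, and the minimum path length of 10 nuclei is taken from the example in the text.

```python
import statistics

def normalized_histogram(values, n_bins, lo, hi):
    """Histogram of values in [lo, hi] whose bins sum to 1 (all zeros if empty)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def summarize(values):
    """Average, standard deviation and a few percentiles of one feature over an ROI."""
    return {"mean": statistics.mean(values),
            "std": statistics.pstdev(values),
            "p10": percentile(values, 10),
            "p50": percentile(values, 50),
            "p90": percentile(values, 90)}

def fraction_in_long_paths(paths, n_nuclei, min_len=10):
    """Proportion of the ROI's nuclei that belong to paths of at least min_len nuclei."""
    return sum(len(p) for p in paths if len(p) >= min_len) / n_nuclei
```

The resulting values (for example, the histogram bin covering angles near 180 degrees, or the fraction of nuclei in long paths) would then be supplied as inputs to a classifier rather than thresholded by hand.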
This application claims the benefit of U.S. Provisional Application No. 61/348,328, filed May 26, 2010, the entire disclosure of which is incorporated herein by reference.