The invention relates generally to digital image processing and analysis.
Segmentation of ridge-like and blob-like structures is one of the segmentation tasks used in medical and life sciences imaging applications. Such applications typically detect vessels, bronchial trees, bones, and nodules in medical applications, and neurons, nuclei, and membrane structures in microscopy applications. For example, partitioning a multiple channel digital image into multiple segments (regions/compartments) is one of the steps used to quantify one or more biomarkers in molecular cell biology, molecular pathology, and pharmaceutical research.
The methods and systems in part provide a likelihood function estimator that may be adapted to generate probability maps of ridge-like and blob-like structures in images. Such probability maps may be used to encode the segmentation information of different shapes in images using probability values between zero and one. One or more of the example embodiments of the methods iteratively estimate empirical likelihood functions of curvature and intensity based features. Geometric constraints may be imposed on the curvature feature to detect, for example, nuclei or membrane structures in fluorescent images of tissues. The methods may be configured to be non-parametric and to learn the distribution functions from the data. This is an improvement over existing parametric approaches, because the methods enable analysis of arbitrary mixtures of blob-like and ridge-like structures. This is highly valuable for applications such as tissue imaging, where a nuclei image of epithelial tissue comprises both ridge-like and blob-like structures.
An embodiment of one of the methods for segmenting images generally comprises the steps of providing an image comprising a plurality of pixels; categorizing the pixels into a plurality of subsets using one or more indexes; determining a log-likelihood function of one or more of the indexes; and generating one or more maps, such as a probability map, based on the determination of the log-likelihood function of one or more of the indexes. The subsets may comprise background pixels, foreground pixels and indeterminate pixels. The indexes may comprise one or more features such as, but not limited to, a shape index, a normalized-curvature index or an intensity value.
The step of determining may comprise estimating the log-likelihood function of one or more of the indexes, wherein the pixels may be categorized using at least three, but not necessarily only three, of the indexes, and wherein each iteration of the step of determining the log-likelihood function uses two of the three indexes to estimate the log-likelihood of the third index. These two indexes may be used to estimate one or more class conditional probabilities and the log-likelihood of the third index, wherein the log-likelihood may be estimated for at least one of the indexes at least in part by estimating one or more decision boundaries. One or more of the decision boundaries may be used to apply one or more monotonicity constraints to one or more log-likelihood functions.
The image may comprise an image of a biological material, such as but not limited to a biological tissue that may comprise one or more cellular structures, wherein the cellular structures may comprise one or more blob-like and ridge-like structures.
An embodiment of a system for segmenting images generally comprises a storage device for at least temporarily storing the image; and a processing device that categorizes the pixels into a plurality of subsets using one or more indexes, determines a log-likelihood function of one or more of the indexes, and generates one or more maps based on the determination of the log-likelihood function of one or more of the indexes. The images may comprise, but are not limited to, blob-like and ridge-like structures. For example, one or more of the blob-like structures may comprise at least a portion of a nucleus and one or more of the ridge-like structures may comprise at least a portion of a membrane. One or more of the maps may be a probability map of one or more of the blob-like structures and the ridge-like structures. The image may comprise, but is not limited to, one or more structures selected from a group consisting of: cellular structures, vascular structures, and neural structures.
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
a is an image of a retina used to illustrate one of the examples.
b illustrates the segmented foreground pixels based on the shape index and normalized-curvature index for the image shown in
c illustrates the segmented foreground pixels based on the shape index and intensity for the image shown in
d illustrates the segmented foreground pixels based on the intensity and normalized-curvature index for the image shown in
e illustrates an estimated probability map for the image shown in
f illustrates probability values greater than 0.5, indicating the pixels more likely to be vessels than background, for the image shown in
a-3f illustrate the estimated class conditional distribution and log-likelihood functions of the retina image shown in
a is the image of the retina shown in
b illustrates segmented pixels that have intensity values above a threshold, T, for the image shown in
c illustrates segmented pixels when the threshold, T, is decreased by 5% for the image shown in
d illustrates segmented pixels when the threshold, T, is increased by 5% for the image shown in
a illustrates an image of a membrane marker and estimated foreground subsets (white color) and background subsets (black color) based on two of the features used in this example.
b illustrates the segmented foreground pixels based on the shape index and normalized-curvature index for the image shown in
c illustrates the segmented foreground pixels based on the shape index and intensity for the image shown in
d illustrates the segmented foreground pixels based on the intensity and normalized-curvature index for the image shown in
e illustrates the estimated probability map for the image shown in
f illustrates the probability values greater than 0.5, indicating the pixels more likely to be membranes than background, for the image shown in
a-6f illustrate the estimated class conditional distribution and log-likelihood functions of the membrane image shown in
a illustrates an image of a nuclei marker and estimated foreground subsets (white color) and background subsets (black color) based on two of the features used in this example.
b illustrates the segmented foreground pixels based on the shape index and normalized-curvature index for the image shown in
c illustrates the segmented foreground pixels based on the shape index and intensity for the image shown in
d illustrates the segmented foreground pixels based on the intensity and normalized-curvature index for the image shown in
e illustrates the estimated probability map from the empirical log-likelihood function for the image shown in
f illustrates the probability map from the parametric log-likelihood function, for the image shown in
a-8f illustrate the estimated class conditional distribution and log-likelihood functions of the nuclei image shown in
a illustrates an example of raw image intensities for membrane, nuclei and c-Met markers.
b illustrates the detected compartments for the membrane, epithelial nuclei, stromal nuclei and cytoplasm for the image shown in
a illustrates an example of raw image intensities for a retinal image.
b illustrates the detected vasculature network for the image shown in
The quantitation of biomarkers can be accomplished without making a definite decision for each pixel, by instead computing the likelihood that a pixel belongs to a region. For example, instead of identifying membrane pixels, the likelihood of a pixel being a membrane can be computed, which is essentially the probability that the pixel is a membrane. Such probability maps can be computed using the intensity and geometry information provided by each channel. A likelihood function estimator that calculates the probability maps of membrane and nuclei structures in images is presented. Starting from known initial geometric constraints, the algorithm iteratively estimates empirical likelihood functions of curvature and intensity based features. The distribution functions are learned from the data. This differs from existing parametric approaches because it can handle arbitrary mixtures of blob-like and ridge-like structures. In applications such as tissue imaging, a nuclei image of epithelial tissue comprises both ridge-like and blob-like structures. The network of membrane structures in tissue images is another example, where the intersection of ridges can form structures that are partially blobs. Accurate segmentation of membrane and nuclei structures forms the base for higher level scoring and statistical analysis applications. For example, the distribution of a target protein on each of the segmented compartments can be quantified to reveal protein specific pathways, which can then be related to clinical outcomes.
Retina images are used to illustrate this example embodiment, and are used only to illustrate one or more of the steps of the methods and systems described. Although the steps of the methods are illustrated in this example in connection with the elongated vascular structures of the retina, the steps are equally applicable to other tissues and biological structures.
Eigenvalues of the Hessian matrix are used in this example embodiment to detect ridge-like and blob-like structures. Although these eigenvalues are used in this example because of their invariance to rigid transformations, other known feature detection algorithms may be used. The Hessian of an image I(x, y) is defined as
The eigenvalues (λ1(x, y)≦λ2(x, y)) of the Hessian matrix can either be calculated numerically or written analytically in terms of the elements of the Hessian matrix:
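As an illustration only, the sketch below computes both eigenvalues at every pixel with the standard closed form for a symmetric 2×2 matrix, λ1,2 = (Ixx + Iyy)/2 ∓ √(((Ixx − Iyy)/2)² + Ixy²). It is a minimal sketch that assumes the second derivatives are approximated by NumPy central differences (np.gradient); a practical implementation would more likely use Gaussian-smoothed derivatives at a chosen scale. The function name hessian_eigenvalues is hypothetical.

```python
import numpy as np


def hessian_eigenvalues(image):
    """Return (lam1, lam2), with lam1 <= lam2, of the 2x2 Hessian at every pixel.

    Assumption: derivatives are plain central differences from np.gradient;
    no Gaussian smoothing is applied.
    """
    img = np.asarray(image, dtype=float)
    # np.gradient returns derivatives along axis 0 (y) then axis 1 (x)
    Iy, Ix = np.gradient(img)
    Ixy, Ixx = np.gradient(Ix)  # d(Ix)/dy, d(Ix)/dx
    Iyy, _ = np.gradient(Iy)    # d(Iy)/dy
    # Closed-form eigenvalues of [[Ixx, Ixy], [Ixy, Iyy]]
    half_trace = 0.5 * (Ixx + Iyy)
    disc = np.sqrt(0.25 * (Ixx - Iyy) ** 2 + Ixy ** 2)
    return half_trace - disc, half_trace + disc
```

For a bright blob both eigenvalues are negative at its center, while for a bright straight ridge one eigenvalue is negative and the other is close to zero, which is the curvature cue the shape index exploits.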
The eigenvalues encode the curvature information of the image and provide useful cues for detecting ridge-type membrane structures or blob-type nuclei structures. However, the eigenvalues depend on image brightness. Below are two examples of curvature based features that are independent of image brightness:
and refer to them as the shape index and the normalized-curvature index, respectively. This is essentially the same as defining the eigenvalues in a polar coordinate system (See
and 0≦φ(x, y)≦π/2.
The image intensity I(x, y) is a significant source of information. However, due to brightness variations across different images and within the same image, it is difficult to determine the right intensity thresholds and the parameters that adjust for these variations. An intensity histogram of a retina image (
Using known geometric cues, an initial segmentation based on the shape index and the normalized-curvature index separates the image pixels into three subsets: background, foreground, and indeterminate. The indeterminate subset comprises all the pixels that are not included in the background or foreground subsets. From these subsets, the background and foreground intensity distributions, as well as the intensity log-likelihood functions, are estimated. The example algorithm used in this embodiment continues iterating, using two of the three features at a time to estimate the distribution of the feature that is left out. Three iterations are usually sufficient for convergence. As described below, these log-likelihood functions are combined in this embodiment to determine the overall likelihood function. A probability map that represents the probability of a pixel being foreground may then be calculated.
The log-likelihood functions are estimated based on the assumption that the intensity and the feature vectors defined in Equations 3 and 4 are independent. Notice that these equations are normalized such that they measure a ratio rather than absolute values. The arctangent operation in these equations maps these measures onto a bounded space. If the overall image brightness is increased or decreased, these metrics stay unchanged. Starting with initial log-likelihoods determined from the known geometry of the ridge-like or blob-like structures, the algorithm uses two of these three feature sets to estimate the class membership of each pixel (foreground, background, or indeterminate), and uses the pixel classes to estimate the class conditional probability and the log-likelihood of the third feature. This procedure is repeated either for a fixed number of iterations or until the log-likelihood functions converge.
The following table illustrates example embodiments of algorithms that may be used in the methods and systems. In Step-A, the class memberships are determined based on two of the three features. Note that the union of the foreground pixels, SF, and the background pixels, SB, is a subset of all the pixels. In other words, subsamples are taken from the dataset for which class membership can be determined with higher confidence. In this embodiment, only these points are then used to estimate the log-likelihood function of the other feature. In Step-B, the decision boundary is estimated along the direction of the feature that is not used in Step-A. Although not necessary for the estimation of the log-likelihood functions, the decision boundaries can be used to enforce monotonicity constraints on some of the log-likelihood functions. Step-C estimates the log-likelihood functions as a function of the class conditional functions. For the intensity and the normalized-curvature index, the monotonicity constraints are enforced. In this embodiment, this implies, for example for the intensity feature, that the brighter a pixel is, the more likely it is to be foreground.
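The round-robin estimation of Step-A and Step-C can be sketched as follows. This is a minimal, hypothetical sketch and not the embodiment's implementation: the Step-A rule (a pixel is foreground when both log-likelihoods are at least their thresholds εi, and background when both are at most −εi) is an assumption chosen because it reproduces the sets of Equations 7 and 8 from the initial log-likelihoods; the histogram bin count and the additive smoothing are arbitrary; and Step-B (decision boundaries and monotonicity) is omitted. All function names are illustrative.

```python
import numpy as np


def empirical_log_likelihood(feature, fg_mask, bg_mask, bins=64, alpha=1e-6):
    """Step-C sketch: per-pixel log(p(f | F) / p(f | B)) from subset histograms."""
    edges = np.linspace(feature.min(), feature.max(), bins + 1)
    h_f, _ = np.histogram(feature[fg_mask], bins=edges)
    h_b, _ = np.histogram(feature[bg_mask], bins=edges)
    # Small additive smoothing (an assumption) keeps empty bins finite
    p_f = (h_f + alpha) / (h_f.sum() + bins * alpha)
    p_b = (h_b + alpha) / (h_b.sum() + bins * alpha)
    table = np.log(p_f / p_b)
    # Map each pixel to its bin; clip so the maximum value lands in the last bin
    idx = np.clip(np.digitize(feature, edges) - 1, 0, bins - 1)
    return table[idx]


def class_subsets(L_a, L_b, eps_a, eps_b):
    """Step-A sketch (assumed rule): keep pixels on which both log-likelihoods agree."""
    fg = (L_a >= eps_a) & (L_b >= eps_b)
    bg = (L_a <= -eps_a) & (L_b <= -eps_b)
    return fg, bg


def iterate_log_likelihoods(features, L_init, eps, n_iter=3, bins=64):
    """Two features define the subsets; the third feature's log-likelihood is re-estimated.

    features: [intensity, normalized-curvature index, shape index] arrays
    L_init:   matching initial log-likelihoods; None for the intensity, which is
              estimated first from the two geometric features
    eps:      likelihood thresholds, one per feature
    """
    L = list(L_init)
    for _ in range(n_iter):
        for k in range(3):
            a, b = [j for j in range(3) if j != k]
            fg, bg = class_subsets(L[a], L[b], eps[a], eps[b])
            if fg.any() and bg.any():  # skip the update if a subset is empty
                L[k] = empirical_log_likelihood(features[k], fg, bg, bins)
    return L
```

The default of three outer iterations mirrors the statement that three iterations are usually sufficient; a convergence test on the change in the log-likelihood tables could be used instead.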
The initial log-likelihood functions are defined in this embodiment as
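$$L(f_2(x,y)) = 2\varepsilon_2\left(U(\varphi(x,y)-\varphi_M) - 0.5\right), \tag{5}$$
$$L(f_3(x,y)) = \varepsilon_3\left(U(\theta(x,y)-\theta_L) - U(\theta(x,y)-\theta_U) - U(\theta(x,y))\right), \tag{6}$$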
where U is the unit step function and εi is the likelihood threshold for feature i. Using these initial log-likelihoods, the sets in Step-A are equivalent to the following sets,
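$$S_F = \{(x,y) : \theta_L \le \theta(x,y) \le \theta_U,\ \varphi(x,y) > \varphi_M\}, \tag{7}$$
$$S_B = \{(x,y) : \theta(x,y) \ge 0,\ \varphi(x,y) \le \varphi_M\}, \tag{8}$$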
where θL=−3π/2, θU=−π/2 for blobs, and θL=−π/2−Δ1, θU=−π/2+Δ2 for ridges. These parameters can be easily derived for different geometric structures. For example, for bright blobs on a dark background, both eigenvalues are negative, hence the angle between them is less than −π/2. Since the angle is relative to the larger eigenvalue, it is bounded by −3π/2. The ridge margins are at small angles, Δ1 and Δ2; for straight ridges they are equal. For the initial sets, subsamples are taken from θ≧0 to observe background pixels. Note that due to noise, the background pixels can have any curvature index. However, in this embodiment a subset with positive polar curvature is sufficient to estimate the intensity distribution of the background pixels. An initial threshold for the normalized-curvature index, φM, is set to the median of all the normalized-curvature index values.
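Equations 5 through 8 translate directly into array operations. The sketch below assumes the shape index θ and the normalized-curvature index φ have already been computed per pixel, and it uses the convention U(0) = 1 for the unit step (ties on a boundary are measure-zero in practice). The function and constant names are hypothetical.

```python
import numpy as np


def unit_step(x):
    # U(x) with the convention U(0) = 1
    return np.heaviside(x, 1.0)


def initial_log_likelihoods(theta, phi, theta_L, theta_U, eps2=1.0, eps3=1.0):
    """Equations 5 and 6; phi_M is the median normalized-curvature index."""
    phi_M = np.median(phi)
    L2 = 2.0 * eps2 * (unit_step(phi - phi_M) - 0.5)                      # Eq. 5
    L3 = eps3 * (unit_step(theta - theta_L) - unit_step(theta - theta_U)
                 - unit_step(theta))                                     # Eq. 6
    return L2, L3, phi_M


def initial_subsets(theta, phi, theta_L, theta_U, phi_M):
    """Equations 7 and 8: confident foreground and background samples."""
    fg = (theta >= theta_L) & (theta <= theta_U) & (phi > phi_M)          # Eq. 7
    bg = (theta >= 0) & (phi <= phi_M)                                    # Eq. 8
    return fg, bg


# Limits stated in the text for bright blobs; ridges would instead use
# theta_L = -pi/2 - delta1 and theta_U = -pi/2 + delta2.
BLOB_THETA_L, BLOB_THETA_U = -1.5 * np.pi, -0.5 * np.pi
```

Pixels with L3 = ε3 and L2 = ε2 form SF, pixels with L3 = −ε3 and L2 = −ε2 form SB, and all remaining pixels are indeterminate.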
b shows the initial background (black), foreground (white), and indeterminate (gray) subsets computed using the shape index and the normalized-curvature index for the image shown in
Next, given the initial log-likelihood function of the shape index, and the estimated log-likelihood function of the intensity, the background/foreground subsets may be recomputed, as shown in
In one iterative embodiment, the same procedure is repeated for the shape index. The estimated log-likelihood functions for the intensity and the normalized-curvature index are used to form the background/foreground subsets,
The monotonicity constraint is imposed by first estimating the decision boundaries. Optimal thresholds for the intensity and the normalized-curvature index are estimated by maximizing the a posteriori probabilities (MAP),
In this example, the goal is to minimize the overall error criterion when the a priori probabilities of the background and the foreground are equal. Since estimates of the class conditional distributions are available in this example, the value of the decision threshold is determined by a one-dimensional exhaustive search rather than by any parametric approximation. While there is only one decision boundary along the intensity and normalized-curvature index dimensions, there can be multiple boundaries along the shape index feature. Therefore, a monotonicity constraint is not imposed on the log-likelihood function of the shape index in this example.
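A minimal sketch of the one-dimensional exhaustive search, assuming equal priors so that the MAP boundary is the threshold T minimizing P(f < T | foreground) + P(f ≥ T | background). Candidate thresholds are restricted to histogram bin edges (an assumption), and the function name map_threshold is hypothetical.

```python
import numpy as np


def map_threshold(fg_values, bg_values, bins=128):
    """Exhaustive search over bin edges for the equal-prior MAP threshold."""
    lo = min(fg_values.min(), bg_values.min())
    hi = max(fg_values.max(), bg_values.max())
    edges = np.linspace(lo, hi, bins + 1)
    h_f, _ = np.histogram(fg_values, bins=edges)
    h_b, _ = np.histogram(bg_values, bins=edges)
    p_f = h_f / h_f.sum()
    p_b = h_b / h_b.sum()
    # Threshold edges[i] labels bins 0..i-1 background and bins i.. foreground
    miss = np.concatenate(([0.0], np.cumsum(p_f)))               # P(f < T | F)
    false_alarm = 1.0 - np.concatenate(([0.0], np.cumsum(p_b)))  # P(f >= T | B)
    i = int(np.argmin(miss + false_alarm))
    return edges[i]
```

Because the search runs over the empirical histograms, no parametric form of the class conditional distributions is needed, which matches the non-parametric intent described above.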
Although the log-likelihood functions are estimated in Step-C of this example, the log-likelihood expression can become undefined or unstable when its numerator and denominator are small. Therefore, a modified empirical log-likelihood function is defined by imposing the non-decreasing constraint as follows,
where Δ is the bin size of the histogram used to estimate the intensity distributions. Equation 10 is calculated recursively starting from T̂k estimated by Equation 9. In this example, this ensures that the estimated empirical log-likelihood function does not change the decision boundary when the condition L*(fk(x, y))=0 is used for the decision. In the above example equation, the index k is defined for the first two features only, thereby excluding the shape index. Example empirical non-decreasing intensity log-likelihood functions are shown in
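Equation 10 itself is not reproduced in this text, so the following is only a hypothetical illustration of one way to obtain a non-decreasing binned log-likelihood whose sign change stays at the estimated threshold bin: a running maximum, floored at zero, from the threshold bin upward, and a running minimum, capped at zero, from the threshold bin downward. It is not claimed to be the recursion of Equation 10, and the function name is illustrative.

```python
import numpy as np


def monotone_log_likelihood(raw, t_index):
    """Non-decreasing version of a binned log-likelihood (hypothetical scheme).

    raw:     1-D array of finite per-bin log-likelihood values
    t_index: index of the bin holding the estimated decision threshold
    """
    raw = np.asarray(raw, dtype=float)
    out = np.empty_like(raw)
    # At and above the threshold bin: values >= 0 and non-decreasing
    out[t_index:] = np.maximum.accumulate(np.maximum(raw[t_index:], 0.0))
    # Below the threshold bin: values <= 0, non-increasing when walking downward
    below = np.minimum(raw[:t_index], 0.0)[::-1]
    out[:t_index] = np.minimum.accumulate(below)[::-1]
    return out
```

Because every bin below the threshold is at most zero and every bin at or above it is at least zero, thresholding the result at zero reproduces the decision boundary found by the MAP search.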
The methods and systems described may be used to process and analyze many different kinds of images for any number and type of purposes depending on the analytical tools desired for a given application. The methods and systems are particularly useful for analyzing images that comprise blob-like and/or ridge-like structures, or other similar structures that can be differentiated from one another based at least in part on shape, geographical and/or topographical features. For example, such images may include, but are not limited to, images of biological structures and tissues. For example, the methods and systems are useful for differentiating structures and tissues comprising vascular features, neural features, cellular and subcellular features.
Building again upon the assumption that the features are independent, the joint log-likelihood function can be computed from the individual log-likelihood functions,
A probability map representing the probability of a pixel being foreground may be calculated from the joint log-likelihood function as follows,
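Under the independence assumption, the joint log-likelihood is the sum of the per-feature log-likelihoods. If, as assumed here, each is a natural-log likelihood ratio and the foreground and background priors are equal, Bayes' rule gives the foreground probability as the logistic function 1/(1 + exp(−L)). The sketch below is illustrative and is not asserted to be the exact form used in this embodiment; the function names are hypothetical.

```python
import numpy as np


def joint_log_likelihood(log_likelihoods):
    """Sum of per-feature log-likelihood maps (valid under feature independence)."""
    return np.sum(np.stack(log_likelihoods), axis=0)


def probability_map(log_likelihoods):
    """Foreground probability, assuming equal priors and natural-log ratios.

    1 / (1 + exp(-L)) is evaluated as 0.5 * (1 + tanh(L / 2)), which cannot overflow.
    """
    L = joint_log_likelihood(log_likelihoods)
    return 0.5 * (1.0 + np.tanh(0.5 * L))
```

Thresholding this map at 0.5 is equivalent to testing whether the joint log-likelihood is positive.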
e, 5e, and 7e show the estimated probability maps for vessel, membrane, and nuclei images, respectively. In this example, as shown in
While the estimated binary decision maps for the vessel and membrane structures comprise accurate segmentation boundaries, the nuclei decision map tends to result in over-segmented regions. This is due to the large amount of light scattering around the nuclei, particularly in between compactly located epithelial nuclei, and inside the ring shaped epithelial nuclei where the scattered light makes relatively bright regions. Since the regions in between nearby nuclei and inside ring nuclei have high curvature and high intensity, these regions adversely contribute to the class conditional estimation of the shape index. A model-based likelihood function that deemphasizes the unexpected geometric structures is fitted to the nuclei log-likelihood functions. The dashed line in
Many molecular markers target either epithelial nuclei or stromal nuclei. Current practice in molecular imaging uses biomarkers such as keratin to differentiate epithelial tissue from stromal tissue. However, in this example, the curvature based methods obviate the need for markers to differentiate epithelial tissue from stromal tissue. As a result, the staining process is less complex, and biological and optical resources are freed for multiplexing other targets. The example computational algorithms used in one or more of the example embodiments exploit the knowledge that epithelial nuclei have membrane structures surrounding them. The nuclei in epithelial tissue are also larger and more densely populated than nuclei in stromal tissue.
The morphological differences between epithelial and stromal nuclei may be defined in this example, which is for illustration only, by identifying a superset of the nuclei, cytoplasm, and membrane sets. For example, this superset, denoted S(x, y), may be defined as the union of the detected compartments,
$$S(x,y) = C(x,y) \cup M(x,y) \cup N(x,y), \tag{13}$$
where C(x, y), M(x, y), and N(x, y) denote the cytoplasm, membrane, and nuclei pixels. Cytoplasm, in this example, is defined as the union of the small regions circumscribed by membrane and nuclei pixels. Since the stromal nuclei are not connected through membrane structures and are sparsely distributed, they can be detected by a connected component analysis of S(x, y). An epithelial mask, E(x, y), may be generated as the union of the large connected components of S(x, y). For the sample images in this example, any connected component larger than 800 pixels is accepted as part of the epithelial mask. The nuclei set is then separated into epithelial nuclei (Ne(x, y)) and stromal nuclei (Ns(x, y)) by masking,
$$N_e(x,y) = N(x,y)\cdot E(x,y), \tag{14a}$$
$$N_s(x,y) = N(x,y)\cdot\left(1 - E(x,y)\right). \tag{14b}$$
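The compartment union, the connected component filtering, and the masking of Equations 13, 14a, and 14b can be sketched with SciPy's ndimage.label. This assumes boolean compartment masks and the default 4-connectivity of ndimage.label; the 800-pixel limit is the value quoted for the sample images, and the function name split_nuclei is hypothetical.

```python
import numpy as np
from scipy import ndimage


def split_nuclei(cytoplasm, membrane, nuclei, min_size=800):
    """Separate nuclei into epithelial and stromal sets via an epithelial mask."""
    S = cytoplasm | membrane | nuclei                 # Eq. 13
    labels, _ = ndimage.label(S)                      # 4-connected components
    sizes = np.bincount(labels.ravel())               # sizes[0] counts non-S pixels
    keep = sizes > min_size                           # "larger than 800 pixels"
    keep[0] = False                                   # label 0 is not a component
    E = keep[labels]                                  # epithelial mask E(x, y)
    Ne = nuclei & E                                   # Eq. 14a
    Ns = nuclei & ~E                                  # Eq. 14b
    return E, Ne, Ns
```

The logical AND and AND-NOT on boolean masks are the same operations as the products N·E and N·(1 − E) in Equations 14a and 14b.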
As noted, the methods and systems may be used in a variety of applications. Segmenting digital images of tissue microarrays is an example of one such application. In this example, multiple channel digital images are segmented into multiple regions (segments/compartments) as one of the steps for quantifying one or more biomarkers. In this example, the quantitation is accomplished without having to make definite decisions for each pixel, but rather by determining the likelihood that a given pixel belongs to a region. For example, instead of identifying membrane pixels, the likelihood of a pixel being a membrane can be computed. This likelihood represents the probability that a given pixel belongs to a membrane region. Probability maps of these regions may be computed using the intensity and geometry information derived from each channel. For example,
Translocation of a target protein between different regions can be quantified based on the probability maps. The distribution of a target protein (cMet) on each of the regions can be represented by a probability distribution function (PDF). For example, the PDF of cMet on the membrane is the weighted empirical distribution of cMet, where the membrane probability map determines the weights. A translocation score may then be generated for one or more pairs of regions. In this example, there are five regions (membrane, epithelial nuclei, stromal nuclei, cytoplasm, and extracellular matrix). The translocation score is defined, in this example, as the normalized mean difference between the corresponding PDFs. These translocation scores may be used to reflect clinical outcome or to explore the association with life expectancy.
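A sketch of the weighted empirical distribution and of one possible translocation score. The exact normalization of the normalized mean difference is not specified above, so the score used here, (μa − μb)/(μa + μb) with probability-weighted means μ, is an assumption chosen only for illustration; the function names are hypothetical.

```python
import numpy as np


def weighted_pdf(marker, prob_map, bins):
    """Weighted empirical PDF of a marker; the region's probability map gives the weights."""
    hist, _ = np.histogram(marker, bins=bins, weights=prob_map)
    return hist / hist.sum()


def translocation_score(marker, prob_a, prob_b):
    """Hypothetical normalized mean difference between two regions' marker distributions."""
    mu_a = np.sum(prob_a * marker) / np.sum(prob_a)  # weighted mean in region a
    mu_b = np.sum(prob_b * marker) / np.sum(prob_b)  # weighted mean in region b
    # Assumed normalization: lies in [-1, 1] for non-negative intensities
    return (mu_a - mu_b) / (mu_a + mu_b)
```

With five regions, the score could be evaluated for each pair of probability maps, for example membrane against epithelial nuclei.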
As noted, the methods and systems may be used to analyze a variety of images. The microscopy images used in this example may be calibrated in advance using fluorescent calibration targets. Such calibration may not be possible for some images, such as the retinal image. However, illumination correction techniques may be applied to correct such variations. A commonly used illumination correction technique is homomorphic filtering, defined as,
$$I'(x,y) = \exp\left(\log I(x,y) - \log\left(I(x,y) * G(x,y)\right)\right), \tag{15}$$
where I′(x, y) is the corrected image, G(x, y) is a Gaussian filter, and * is the convolution operation. By replacing the image with the corrected intensities, images with large intensity variations can be segmented more accurately using the same algorithms described. To eliminate any artifacts introduced by the homomorphic filtering, the shape index and the normalized-curvature index are preferably calculated from the original intensity values.
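Equation 15 reduces to dividing the image by its Gaussian-blurred version, I′ = I/(I * G). The sketch below uses scipy.ndimage.gaussian_filter for the convolution; the default σ and the small offset that guards against log(0) are assumptions, not values from this description, and the function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def homomorphic_correct(image, sigma=50.0, offset=1e-6):
    """Illumination correction per Eq. 15: exp(log I - log(I * G))."""
    img = np.asarray(image, dtype=float) + offset  # assumed offset avoids log(0)
    background = gaussian_filter(img, sigma=sigma)  # I * G, slowly varying illumination
    return np.exp(np.log(img) - np.log(background))
```

Because the Gaussian convolution is linear, multiplying the whole image by a constant leaves the corrected image unchanged (with a zero offset), which is the property that makes images with large intensity variations comparable.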
While only certain features of the invention have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
The automated system 10 (
The storage device may comprise, but is not necessarily limited to, any suitable hard drive memory associated with the processor such as the ROM (read only memory), RAM (random access memory) or DRAM (dynamic random access memory) of a CPU (central processing unit), or any suitable disk drive memory device such as a DVD or CD, or a zip drive or memory card. The storage device may be remotely located from the processor or the means for displaying the images, and yet still be accessed through any suitable connection device or communications network including but not limited to local area networks, cable networks, satellite networks, and the Internet, regardless of whether hard wired or wireless. The processor or CPU may comprise a microprocessor, a microcontroller, or a digital signal processor (DSP).
In one of the embodiments, the storage device 12 and processor 14 may be incorporated as components of an analytical device such as an automated high-throughput system that stains and images tissue micro arrays (TMAs) in one system and still further analyzes the images. System 10 may further comprise a means for displaying 16 one or more of the images; an interactive viewer 18; a virtual microscope 20; and/or a means for transmitting 22 one or more of the images or any related data or analytical information over a communications network 24 to one or more remote locations 26.
The means for displaying 16 may comprise any suitable device capable of displaying a digital image such as, but not limited to, devices that incorporate an LCD or CRT. The means for transmitting 22 may comprise any suitable means for transmitting digital information over a communications network including but not limited to hardwired or wireless digital communications systems. The system may further comprise an automated device 28 for applying one or more of the stains and a digital imaging device 30 such as, but not limited to, an imaging microscope comprising an excitation source 32 and capable of capturing digital images of the TMAs. Such imaging devices are preferably capable of auto focusing and then maintaining and tracking the focus feature as needed throughout processing.
This is a continuation-in-part of U.S. patent application Ser. No. 11/606,582, entitled “System and Methods for Scoring Images of a Tissue Micro Array,” filed on Nov. 30, 2006, which is herein incorporated by reference.
Publication Number | Date | Country |
---|---|---|
20080031521 A1 | Feb 2008 | US |
Relation | Application Number | Date | Country |
---|---|---|---|
Parent | 11606582 | Nov 2006 | US |
Child | 11680063 | | US |
Parent | 11500028 | Aug 2006 | US |
Child | 11606582 | | US |