Method and apparatus for automatically detecting malignancy-associated changes

Information

  • Patent Grant
  • 6493460
  • Patent Number
    6,493,460
  • Date Filed
    Friday, March 26, 1999
  • Date Issued
    Tuesday, December 10, 2002
Abstract
A method for detecting malignancy-associated changes. A sample of cells is obtained and stained to identify the nuclear DNA material. The sample is imaged with a digital microscope. Objects of interest are identified in the sample of cells based on the intensity of the pixels that comprise the object versus the average intensity of all pixels in the slide image. An exact edge is located for each object and variations in the illumination intensity of the microscope are compensated for. A computer system calculates feature values for each object and, based on the value of the features, a determination is made whether the cell exhibits malignancy-associated changes or not.
Description




FIELD OF THE INVENTION




The present invention relates to image cytometry systems in general, and in particular to automated systems for detecting malignancy-associated changes in cell nuclei.




BACKGROUND OF THE INVENTION




The most common method of diagnosing cancer in patients is by obtaining a sample of the suspect tissue and examining it under a microscope for the presence of obviously malignant cells. While this process is relatively easy when the location of the suspect tissue is known, it is not so easy when there is no readily identifiable tumor or precancerous lesion. For example, detecting lung cancer from a sputum sample requires that one or more relatively rare cancer cells be present in the sample. Therefore, patients having lung cancer may not be diagnosed properly if the sample does not accurately reflect the conditions of the lung.




Malignancy-associated changes (MACs) are subtle changes that are known to take place in the nuclei of apparently normal cells found near cancer tissue. In addition, MACs have been detected in tissue found near precancerous lesions. Because the cells exhibiting MACs are more numerous than the malignant cells, MACs offer an additional way of diagnosing the presence of cancer, especially in cases where no cancerous cells can be located.




Despite the ability of researchers to detect MACs in patients known to have cancer or a precancerous condition, MACs have not yet achieved wide acceptance as a screening tool to determine whether a patient has or will develop cancer. Traditionally, MACs have been detected by carefully selecting a cell sample from a location near a tumor or precancerous lesion and viewing the cells under relatively high magnification. However, it is believed that the malignancy-associated changes that take place in the cells are too subtle to be reliably detected by a human pathologist working with conventional microscopic equipment, especially when the pathologist does not know beforehand if the patient has cancer or not. For example, a malignancy-associated change may be indicated by the distribution of DNA within the nucleus coupled with slight variations in the shape of the nucleus edge. However, nuclei from normal cells may exhibit similar types of changes but not to the degree that would signify a MAC. Because human operators cannot easily quantify such subtle cell changes, it is difficult to determine which cells exhibit MACs. Furthermore, the changes which indicate a MAC may vary between different types of cancer, thereby increasing the difficulty of detecting them.




SUMMARY OF THE INVENTION




The present invention is a system for automatically detecting malignancy-associated changes in cell samples. The system includes a digital microscope having a CCD camera that is controlled by and interfaced with a computer system. Images captured by the digital microscope are stored in an image processing board and manipulated by the computer system to detect the presence of malignancy-associated changes (MACs). At the present state of the art, it is believed that any detection of MACs requires that images be captured at a high spatial resolution and a high photometric resolution, that all information coming from the nucleus be in focus, that all information belong to the nucleus (rather than some background), and that there be an accurate and reproducible segmentation of the nucleus and nuclear material. Each of these steps is described in detail below.




To detect the malignancy-associated changes, a cell sample is obtained and stained to identify the nuclear material of the cells and is imaged by the microscope. The stain is stoichiometric and specific to DNA only. The computer system then analyzes the image to compute a histogram of all pixels comprising the image. First, an intensity threshold is set that divides the background pixels from those comprising the objects in the image. All pixels having an intensity value less than the threshold are identified as possible objects of interest while those having an intensity value greater than the threshold are identified as background and are ignored.




For each object located, the computer system calculates the area, shape and optical density of the object. Those objects that could not possibly be cell nuclei are ignored. Next, the image is decalibrated, i.e., corrected by subtracting an empty frame captured before the scanning of the slide from the current frame and adding back an offset value equal to the average background light level. This process corrects for any shading of the system, uneven illumination, and other imperfections of the image acquisition system. Following decalibration, the images of all remaining objects must be captured in a more precise focus. This is achieved by moving the microscope in the stage z-direction in multiple focal planes around the approximate frame focus. For each surviving object a contrast function (a texture feature) is calculated. The contrast function has a peak value at the exact focus of the object. Only the image at the highest contrast value is retained in the computer memory and any object which did not reach such a peak value is also discarded from further considerations.




Each remaining in-focus object on the image is further compensated for local absorbency of the materials surrounding the object. This is a local decalibration which is similar to that described for the frame decalibration described above, except that only a small subset of pixels having an area equal to the area of a square into which the object will fit is corrected using an equivalent square of the empty frame.




After all images are corrected with the local decalibration procedure, the edge of the object is calculated, i.e., the boundary which determines which pixels in the square belong to the object and which belong to the background. The edge determination is achieved by the edge-relocation algorithm. In this process, the edge of the original mask of the first contoured frame of each surviving object is dilated for several pixels inward and outward. For every pixel in this frame a gradient value is calculated, i.e., the sum of the differences between the pixel in question and all neighbor pixels touching it. Then the lowest gradient value pixel is removed from the rim, subject to the condition that the rim is not ruptured. The process continues until such time as a single pixel rim remains. To ensure that the proper edge of an object is located, this edge may be again dilated as before, and the process repeated until such time as the new edge is identical to the previous edge. In this way the edge is calculated along the highest local gradient.




The computer system then calculates a set of feature values for each object. For some feature calculations the edge along the highest gradient value is corrected by either dilating the edge by one or more pixels or eroding the edge by one or more pixels. This is done such that each feature achieves a greater discriminating power between classes of objects and is thus object specific. These feature values are then analyzed by a classifier that uses the feature values to determine whether the object is an artifact or is a cell nucleus. If the object appears to be a cell nucleus, then the feature values are further analyzed by the classifier to determine whether the nucleus exhibits malignancy-associated changes. Based on the number of objects found in the sample that appear to have malignancy-associated changes and/or an overall malignancy-associated score, a determination can be made whether the patient from whom the cell sample was obtained is healthy or harbors a malignant growth.











BRIEF DESCRIPTION OF THE DRAWINGS




The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated [as the same becomes better understood] by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:





FIG. 1 is a block diagram of the MAC detection system according to the present invention;

FIGS. 2A-2C are a series of flow charts showing the steps performed by the present invention to detect MACs;

FIG. 3 is an illustrative example of a histogram used to separate objects of interest from the background of a slide;

FIG. 4 is a flow chart of the preferred staining procedure used to prepare a cell sample for the detection of MACs;

FIGS. 5 and 6 are illustrations of objects located in an image;

FIGS. 7A-7F illustrate how the present invention operates to locate the edge of an object;

FIGS. 8 and 9 are diagrammatic illustrations of a classifier that separates artifacts from cell nuclei and MAC nuclei from non-MAC nuclei; and

FIG. 10 is a flow chart of the steps performed by the present invention to determine whether a patient is normal or abnormal based on the presence of MACs.











DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT




As described above, the present invention is a system for automatically detecting malignancy-associated changes (MACs) in the nuclei of cells obtained from a patient. From the presence or absence of MACs, a determination can be made whether the patient has a malignant cancer.




A block diagram of the MAC detection system according to the present invention is shown in FIG. 1. The system 10 includes a digital microscope 12 that is controlled by and interfaced with a computer system 30. The microscope 12 preferably has a digital CCD camera 14 employing a scientific CCD having square pixels of approximately 0.3 μm by 0.3 μm size. The scientific CCD has a 100% fill factor and at least a 256 gray level resolution. The CCD camera is preferably mounted in the primary image plane of a planar objective lens 22 of the microscope 12.




A cell sample is placed on a motorized stage 20 of the microscope whose position is controlled by the computer system 30. The motorized stage preferably has an automatic slide loader so that the process of analyzing slides can be completely automated.




A stable light source 18, preferably with feedback control, illuminates the cell sample while an image of the slide is being captured by the CCD camera. The lens 22 placed between the sample 16 and the CCD camera 14 is preferably a 20×/0.75 objective that provides a depth of field in the range of 1-2 μm that yields a distortion-free image. In the present embodiment of the invention, the digital CCD camera 14 used is the Microimager™ produced by Xillix Technologies Corp. of Richmond, B.C., Canada.




The images produced by the CCD camera are received by an image processing board 32 that serves as the interface between the digital camera 14 and the computer system 30. The digital images are stored in the image processing board and manipulated to facilitate the detection of MACs. The image processing board creates a set of analog video signals from the digital image and feeds the video signals to an image monitor 36 in order to display an image of the objects viewed by the microscope.




The computer system 30 also includes one or more input devices 38, such as a keyboard and mouse, as well as one or more peripherals 42, such as a mass digital storage device, a modem or a network card for communicating with a remotely located computer, and a monitor 40.





FIGS. 2A-2C show the steps performed by the system of the present invention to determine whether a sample exhibits MACs or not. Beginning with a step 50, a cell sample is obtained. Cells may be obtained by any number of conventional methods such as biopsy, scraping, etc. The cells are affixed to a slide and stained using a modified Feulgen procedure at a step 52 that identifies the nuclear DNA in the sample. The details of the staining procedure are shown in FIG. 4 and described in detail below.




At step 54, an image of a frame from the slide is captured by the CCD camera and is transferred into the image processor. In this process, the CCD sensor within the camera is cleared and a shutter of the camera is opened for a fixed period that is dependent on the intensity of the light source 18. After the image is optimized according to the steps described below, the stage then moves to a new position on the slide such that another image of the new frame can be captured by the camera and transferred into the computer memory. Because the cell sample on the slide occupies a much greater area than the area viewed by the microscope, a number of slide images are used to determine whether the sample is MAC-positive or negative. The position of each captured image on the slide is recorded in the computer system so that the objects of interest in the image can be found on the slide if desired.




Once an image from the slide is captured by the CCD camera and stored in the image processing board, the computer system determines whether the image produced by the CCD camera is devoid of objects. This is performed by scanning the digital image for dark pixels. If the number of dark pixels, i.e., those pixels having an intensity of the background intensity minus a predetermined offset value, is fewer than a predetermined minimum, the computer system assumes that the image is blank and the microscope stage is moved to a new position at step 60 and a new image is captured at step 54.
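The blank-frame test described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `is_blank_frame`, the offset of 20 gray levels and the minimum of 50 dark pixels are hypothetical, since the text says only that these values are predetermined, and a "dark" pixel is taken here to mean one darker than the background intensity minus the offset.

```python
def is_blank_frame(image, background, offset=20, min_dark_pixels=50):
    """Return True when the frame has too few dark pixels to contain objects.

    image: 2-D list of gray levels (0 = darkest, 255 = brightest).
    background: estimated background intensity of the frame.
    offset, min_dark_pixels: hypothetical stand-ins for the
    "predetermined" values mentioned in the text.
    """
    dark_limit = background - offset
    dark_count = sum(1 for row in image for value in row if value < dark_limit)
    return dark_count < min_dark_pixels


# A uniformly bright 10x10 frame is treated as blank.
empty_frame = [[200] * 10 for _ in range(10)]

# The same frame with an 8x8 dark patch (64 pixels) is not blank.
frame_with_object = [
    [90 if 1 <= r <= 8 and 1 <= c <= 8 else 200 for c in range(10)]
    for r in range(10)
]
```

A frame reported as blank would simply be skipped and the stage moved to the next position.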




If the image is not blank, then the computer system attempts to globally focus the image. In general, when the image is in focus, the objects of interest in the image have a maximum darkness. Therefore, for focus determination the height of the stage is adjusted and a new image is captured. The darkness of the object pixels is determined and the process repeats until the average darkness of the pixels in the image is a maximum. At this point, the computer system assumes that global focus has been obtained.




After performing the rough, global focus at step 62, the computer system computes a histogram of all pixels. As shown in FIG. 3, a histogram is a plot of the number of pixels at each intensity level. In the Microimager™-based microscope system, each pixel can have an intensity ranging from 0 (maximum darkness) to 255 (maximum brightness). The histogram typically contains a first peak 90 that represents the average intensity of the background pixels. A second, smaller peak 92 represents the average intensity of the pixels that comprise the objects. By calculating a threshold 94 that lies between the peaks 90 and 92, it is possible to crudely separate the objects of interest in the image from the background.
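As a rough illustration of the histogram step, the sketch below builds a 256-bin histogram and places a threshold midway between the background peak and a darker object peak. The text does not say how the threshold between the peaks is computed, so the midpoint rule, the `min_separation` parameter and both function names are assumptions made for this example.

```python
def intensity_histogram(image, levels=256):
    """Count how many pixels fall at each gray level."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def threshold_between_peaks(counts, min_separation=30):
    """Pick a threshold midway between the background and object peaks.

    Assumes the background peak is the tallest one and that objects are
    darker than the background by at least min_separation gray levels.
    """
    background_peak = max(range(len(counts)), key=lambda level: counts[level])
    darker_levels = range(0, max(background_peak - min_separation, 0))
    object_peak = max(darker_levels, key=lambda level: counts[level], default=None)
    if object_peak is None or counts[object_peak] == 0:
        raise ValueError("no second (object) peak found")
    return (background_peak + object_peak) // 2


# 90 background pixels at level 200 and 10 object pixels at level 80.
image = [[200] * 9 + [80] for _ in range(10)]
threshold = threshold_between_peaks(intensity_histogram(image))

# Pixels darker than the threshold are candidate object pixels.
object_mask = [[value < threshold for value in row] for row in image]
```

Real histograms are noisy, so a practical version would probably smooth the counts before looking for peaks.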




Returning to FIG. 2B, the computer system computes the threshold that separates objects in the image from the background at step 68. At a step 72, all pixels in the cell image having an intensity less than the threshold value are identified. The results of step 72 are shown in FIG. 5. The frame image 200 contains numerous objects of interest 202, 204, 206 . . . 226. Some of these objects are cell nuclei, which will be analyzed for the presence of MACs, while other objects are artifacts such as debris, dirt particles, white blood cells, etc., and should be removed from the cell image.




Returning to FIG. 2B, once the objects in the image have been identified, the computer system calculates the area, shape (sphericity) and optical density of each object according to formulas that are described in further detail below. At a step 76, the computer system removes from memory any objects that cannot be cell nuclei. In the present embodiment of the invention those objects that are not possibly cell nuclei are identified as having an area greater than 2000 μm², an optical density less than 1 c (i.e., less than ½ of the overall chromosome count of a normal individual) or a shape or sphericity greater than 4.
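The three rejection criteria of step 76 can be written as a single predicate. The sketch below assumes the area, optical density and sphericity have already been measured; the `ObjectMeasurements` record and the function name are hypothetical, while the limits are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class ObjectMeasurements:
    """Hypothetical record of the three measurements used at step 76."""
    area_um2: float            # object area in square micrometers
    optical_density_c: float   # integrated optical density in units of c
    sphericity: float          # shape measure; larger means less round


def could_be_nucleus(obj, max_area_um2=2000.0, min_od_c=1.0, max_sphericity=4.0):
    """Reject objects that are too large, too faint or too irregular."""
    return (
        obj.area_um2 <= max_area_um2
        and obj.optical_density_c >= min_od_c
        and obj.sphericity <= max_sphericity
    )


candidates = [
    ObjectMeasurements(area_um2=85.0, optical_density_c=2.0, sphericity=1.1),
    ObjectMeasurements(area_um2=2500.0, optical_density_c=2.0, sphericity=1.3),
    ObjectMeasurements(area_um2=60.0, optical_density_c=0.4, sphericity=1.2),
    ObjectMeasurements(area_um2=90.0, optical_density_c=2.1, sphericity=5.5),
]
surviving = [obj for obj in candidates if could_be_nucleus(obj)]
```

Only the first candidate survives; the other three fail on area, optical density and sphericity respectively.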




The results of step 76 are shown in FIG. 6 where only a few of the previously identified objects of interest remain. Each of the remaining objects is more likely to be a cell nucleus that is to be examined for a malignancy-associated change.




Again returning to FIG. 2B, after removing each of the objects that could not be a cell nucleus, the computer system determines whether there are any objects remaining by scanning for dark pixels at step 78. If no objects remain, the computer system returns to step 54, a new image on the slide is captured and steps 54-76 are repeated.




If there are objects remaining in the image after the first attempt at removing artifacts at step 76, the computer system then compensates the image for variations in illumination intensity at step 80. To do this, the computer system recalls a calibration image that was obtained by scanning in a blank slide for the same exposure time that was used for the image of the cells under consideration. The computer system then begins a pixel-by-pixel subtraction of the intensity values of the pixels in the calibration image obtained from the blank slide from the corresponding pixels found in the image obtained from the cell sample. The computer system then adds a value equal to the average illumination of the pixels in the calibration image obtained from the blank slide to each pixel of the cell image. The result of the addition illuminates the cell image with a uniform intensity.
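The illumination compensation of step 80 amounts to computing, for each pixel, the cell-image value minus the blank-slide value plus the mean blank-slide value. A minimal sketch follows; the function name and the tiny 2x2 images are illustrative assumptions, and the two images are assumed to have identical dimensions.

```python
def compensate_illumination(image, calibration):
    """Subtract the blank-slide frame and add back its mean level.

    image, calibration: 2-D lists of gray levels with the same shape.
    """
    pixel_count = sum(len(row) for row in calibration)
    mean_level = sum(sum(row) for row in calibration) / pixel_count
    return [
        [pixel - blank + mean_level for pixel, blank in zip(image_row, blank_row)]
        for image_row, blank_row in zip(image, calibration)
    ]


# The right-hand column of the blank slide is lit 20 levels brighter.
calibration = [[100, 120], [100, 120]]
# The cell image shows the same shading plus one dark object pixel.
image = [[100, 120], [50, 120]]

corrected = compensate_illumination(image, calibration)
```

After correction the background is flat at the mean calibration level (110), and the object pixel keeps its 50-level deficit relative to that background.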




Once the variations in illumination intensity have been corrected, the computer system attempts to refine the focus of each object of interest in the image at step 82 (FIG. 2C). The optimum focus is obtained when the object has a minimum size and maximum darkness. The computer system therefore causes the stage to move a predefined amount above the global focus position and then moves in a sequence of descending positions. At each position the CCD camera captures an image of the frame and calculates the area and the intensity of the pixels comprising the remaining objects. Only one image of each object is eventually stored in the computer memory coming from the position in which the pixels comprising the object have the maximum darkness and occupy a minimum area. If the optimum focus is not obtained after a predetermined number of stage positions, then the object is removed from the computer memory and is ignored. Once the optimum focus of the object is determined, the image received from the CCD camera overwrites those pixels that comprise the object under consideration in the computer's memory. The result of the local focusing produces a pseudofocused image in the computer's memory whereby each object of interest is ultimately recorded at its best possible focus.
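One way to picture the per-object focus search is as a selection over measurements taken at each stage height. The sketch below is an assumption-laden simplification: the text does not say how darkness and area are combined, so the code ranks positions by mean intensity (darkest first) and breaks ties by smaller area, and it treats an optimum at either end of the sweep as "optimum focus not obtained". The function name and the sample numbers are hypothetical.

```python
def best_focus(measurements):
    """Choose the stage height at which the object is darkest and smallest.

    measurements: list of (z_position, area_pixels, mean_intensity) tuples
    in the order the stage visited them.
    Returns the chosen z position, or None when no interior optimum exists.
    """
    if len(measurements) < 3:
        return None
    best_index = min(
        range(len(measurements)),
        key=lambda i: (measurements[i][2], measurements[i][1]),
    )
    # An optimum at the first or last position means the sweep never
    # passed through focus, so the object is discarded (assumption).
    if best_index in (0, len(measurements) - 1):
        return None
    return measurements[best_index][0]


# Descending sweep that passes through focus at z = 0.0.
sweep = [(2.0, 120, 95.0), (1.0, 104, 82.0), (0.0, 98, 74.0), (-1.0, 110, 88.0)]

# Sweep that is still improving at its last position.
unfinished = [(2.0, 120, 95.0), (1.0, 110, 90.0), (0.0, 100, 85.0)]
```

In this example `best_focus(sweep)` selects the position at z = 0.0, while the unfinished sweep yields `None` and the object would be dropped.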




At a step 84, the computer system determines whether any in-focus objects in the cell image were found. If not, the computer system returns to step 54 shown in FIG. 2A whereby the slide is moved to another position and a new image is captured.




Once an image of the object has been focused, the computer system then compensates for local absorbency of light near the object at a step 85. To do this, the computer system analyzes a number of pixels within a box having an area that is larger than the object by two pixels on all sides. An example of such a box is the box 207 shown in FIG. 6. The computer system then performs a pixel-by-pixel subtraction of the intensity values from a corresponding square in the calibration image obtained from the blank slide. Next the average illumination intensity of the calibration image is added to each pixel in the box surrounding the object. Then the average intensity value for those pixels that are in the box but are not part of the object is determined and this local average value is then subtracted from each pixel in the box that encloses the object.
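The local compensation of step 85 can be sketched in the same style as the frame-wide correction, restricted to a box around the object. The code below is illustrative only: the function name, the boolean `mask` representation of the object and the clipping of the box at the image border are assumptions not stated in the text.

```python
def compensate_local_absorbency(image, calibration, mask, margin=2):
    """Correct the box around one object and zero its local background.

    image, calibration: 2-D lists of gray levels with the same shape.
    mask: 2-D list of booleans, True where a pixel belongs to the object.
    margin: extra pixels added on every side of the object's bounding box.
    Returns the corrected box as a 2-D list.
    """
    rows = [r for r, mask_row in enumerate(mask) if any(mask_row)]
    cols = [c for mask_row in mask for c, inside in enumerate(mask_row) if inside]
    top = max(min(rows) - margin, 0)
    bottom = min(max(rows) + margin, len(image) - 1)
    left = max(min(cols) - margin, 0)
    right = min(max(cols) + margin, len(image[0]) - 1)

    cal_count = sum(len(row) for row in calibration)
    cal_mean = sum(sum(row) for row in calibration) / cal_count

    # Subtract the matching calibration square, then add the calibration mean.
    box = [
        [image[r][c] - calibration[r][c] + cal_mean for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]

    # Average the pixels in the box that are not part of the object.
    background = [
        box[r - top][c - left]
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
        if not mask[r][c]
    ]
    local_mean = sum(background) / len(background) if background else 0.0
    return [[value - local_mean for value in row] for row in box]


# 7x7 frame at level 200 with a single object pixel of level 80 in the center.
image = [[200] * 7 for _ in range(7)]
image[3][3] = 80
calibration = [[200] * 7 for _ in range(7)]
mask = [[r == 3 and c == 3 for c in range(7)] for r in range(7)]

box = compensate_local_absorbency(image, calibration, mask)
```

The result is a 5x5 box in which the surrounding background sits at zero and the object pixel carries only its own absorbance (-120 here).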




Once the compensation for absorbency around the object has been made, the computer system then determines a more precise edge of each remaining object in the cell image at step 86. The steps required to compute the edge are discussed in further detail below.




Having compensated for local absorbency and located the precise edge of the object, the computer system calculates a set of features for each remaining object at a step 87. These feature values are used to further separate artifacts from cell nuclei as well as to identify nuclei exhibiting MACs. The details of the feature calculation are described below.




At a step 88, the computer system runs a classifier that compares the feature values calculated for each object and determines whether the object is an artifact and, if not, whether the object is a nucleus that exhibits MACs.




At a step 90, the pseudofocus digital image, the feature calculations and the results of the classifier for each in-focus object are stored in the computer's memory.




Finally, at a step 92, the computer system determines whether further scans of the slide are required. As indicated above, because the size of each cell image is much less than the size of the entire slide, a number of cell images are captured to ensure that the slide has been adequately analyzed. Once a sufficient number of cell images have been analyzed, processing stops at step 94. Alternatively, if further scans are required, the computer system loops back to step 54 and a new image of the cell sample is captured.




As indicated above, before the sample can be imaged by the digital microscope, the sample is stained to identify the nuclear material.





FIG. 4 is a flow chart of the steps used to stain the cell samples. Beginning at a step 100, the cell sample is placed on a slide, air dried and then soaked in a 50% glycerol solution for four minutes. The cell is then washed in distilled water for two minutes at a step 102. At a step 104, the sample is bathed in a 50% ethanol solution for two minutes and again washed with distilled water for two minutes at a step 106. The sample is then soaked in a Bohm-Sprenger solution for 30 minutes at a step 108 followed by washing with distilled water for one minute at a step 110. At step 112, the sample is soaked in a 5N HCl solution for 45 minutes and rinsed with distilled water for one minute at a step 114. The sample is then stained in a thionine stain for 60 minutes at a step 116 and rinsed with distilled water for one minute at a step 118.




At step 120, the sample is soaked in a bisulfite solution for six minutes followed by a rinse for one minute with distilled water at a step 122. Next, the sample is dehydrated in solutions of 50%, 75% and 100% ethanol for approximately 10 seconds each at a step 124. The sample is then soaked in a final bath of xylene for one minute at a step 126 before a cover slip is applied at a step 128. After the cell sample has been prepared, it is ready to be imaged by the digital microscope and analyzed as described above.





FIGS. 7A-7F illustrate the manner in which the present invention calculates the precise edge of an object. As shown in FIG. 7A, an object 230 is comprised of those pixels having an intensity value less than the background/object threshold which is calculated from the histogram and described above. In order to calculate the precise edge, the pixels lying at the original edge of the object are dilated to form a new edge region 242. A second band of pixels lying inside the original edge are also selected to form a second edge region 244. The computer system then assumes that the true edge is somewhere within the annular ring bounded by the edge regions 242 and 244. In the presently preferred embodiment of the invention, the annular ring has a width of approximately ten pixels. To determine the edge, the computer calculates a gradient for each pixel contained in the annular ring. The gradient for each pixel is defined as the sum of the differences in intensity between each pixel and its surrounding eight neighbors. Those pixels having neighbors with similar intensity levels will have a low gradient while those pixels at the edge of the object will have a high gradient.
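The gradient definition above can be made concrete with a short sketch. The text does not say whether the differences are signed; this example assumes absolute differences, since a signed sum would largely cancel across an edge. The function name and the handling of image borders (missing neighbors are skipped) are likewise assumptions.

```python
def pixel_gradient(image, row, col):
    """Sum of absolute intensity differences to the eight neighbors.

    Neighbors that fall outside the image are skipped (assumption).
    """
    center = image[row][col]
    total = 0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue
            r, c = row + d_row, col + d_col
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += abs(image[r][c] - center)
    return total


# Bright background on the left, dark object on the right.
step_edge = [[200, 200, 80, 80] for _ in range(3)]

flat_gradient = pixel_gradient(step_edge, 1, 0)   # all neighbors equal
edge_gradient = pixel_gradient(step_edge, 1, 1)   # three neighbors are 120 darker
```

The pixel in the flat region has gradient 0, while the pixel next to the boundary has gradient 360 (three neighbors, each 120 levels darker).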




Once the gradients have been calculated for each pixel in the annular ring, the computer system divides the range of gradients into multiple thresholds and begins removing pixels having lower gradient values from the ring. To remove the pixels, the computer scans the object under consideration in a raster fashion. As shown in FIG. 7C, the raster scan begins at a point A and continues to the right until reaching a point B. During the first scan, only pixels on the outside edge, i.e., pixels on the edge region 242, are removed. The computer system then scans in the opposite direction by starting, for example, at point D and continuing upwards to point B returning in a raster fashion while only removing pixels on the inside edge, region 244 of the annular ring. The computer system then scans in another orthogonal direction (for example, starting at point C and continuing in the direction of point D in a raster fashion), this time only removing pixels on the outside edge region 242. This process continues until no more pixels at that gradient threshold value can be removed.




Pixels are removed from the annular ring subject to the conditions that no pixel can be removed that would break the chain of pixels around the annular ring. Furthermore, adjacent pixels cannot be removed during the same pass of pixel removal. Once all the pixels are removed having a gradient that is less than or equal to the first gradient threshold, the threshold is increased and the process starts over. As shown in FIG. 7D, the pixel-by-pixel removal process continues until a single chain of pixels 240′ encircles the object in question.




After locating the precise edge of an object, it is necessary to determine whether those pixels that comprise the edge should be included in the object. To do this, the intensity of each pixel that comprises the newly found edge is compared with its eight neighbors. As shown in FIG. 7E, for example, the intensity of a pixel 246 is compared with its eight surrounding pixels. If the intensity of pixel 246 is less than the intensity of pixel 250, then the pixel 246 is removed from the pixel chain as it belongs to the background. To complete the chain, pixels 248 and 252 are added so that the edge is not broken as shown in FIG. 7F. After completing the edge relocation algorithm and determining whether each pixel should be included in the object of interest, the system is ready to compute the feature values for the object.




Once the features have been calculated for each in-focus object, the computer system must make a determination whether the object is a cell nucleus that should be analyzed for malignancy-associated changes or is an artifact that should be ignored. As discussed above, the system removes obvious artifacts based on their area, shape (sphericity) and optical density. However, other artifacts may be more difficult for the computer to recognize. To further remove artifacts, the computer system uses a classifier that interprets the values of the features calculated for the object.




As shown in FIG. 8, a classifier 290 is a computer program that analyzes an object based on its feature values. To construct the classifier two databases are used. The first database 275 contains feature values of objects that have been imaged by the system shown in FIG. 1 and that have been previously identified by an expert pathologist as non-nuclei, i.e., artifacts. A second database 285 contains the features calculated for objects that have been imaged by the system and that have been previously identified by an expert as cell nuclei. The data in each of these databases is fed into a statistical computer program which uses a stepwise linear discriminant function analysis to derive a discriminant function that can distinguish cell nuclei from artifacts. The classifier is then constructed as a binary decision tree based on thresholds and/or the linear discriminant functions. The binary tree answers a series of questions based on the feature values to determine the identity of an object.




The particular thresholds used in the binary tree are set by statisticians who compare histograms of feature values calculated on known objects. For example, white blood cells typically have an area less than 50 μm². Because the present invention treats a white blood cell as an artifact, the binary decision tree can contain a node that compares the area of an object to the 50 μm² threshold. Objects with an area less than the threshold are ignored while those with a greater area are further analyzed to determine if they are possible MAC cells or artifacts.
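A two-node decision tree of the kind described, with one area threshold followed by one linear discriminant function, might look like the sketch below. Only the 50 μm² area node comes from the text; the feature names, the discriminant weights and bias, and the function name are hypothetical placeholders, since real values would come from the stepwise discriminant analysis of the training databases.

```python
def classify_object(features, weights, bias, min_area_um2=50.0):
    """Tiny binary decision tree: area threshold, then a linear discriminant.

    features: dict of feature name -> value for one object.
    weights, bias: hypothetical linear discriminant coefficients.
    Returns "artifact" or "nucleus".
    """
    # Node 1: objects smaller than the area threshold are ignored.
    if features["area_um2"] < min_area_um2:
        return "artifact"
    # Node 2: the sign of the linear discriminant score decides the class.
    score = bias + sum(weight * features[name] for name, weight in weights.items())
    return "nucleus" if score > 0 else "artifact"


# Placeholder discriminant: round objects score positive.
weights = {"sphericity": -1.0}
bias = 2.0

small_cell = {"area_um2": 40.0, "sphericity": 1.0}
round_object = {"area_um2": 90.0, "sphericity": 1.2}
ragged_object = {"area_um2": 90.0, "sphericity": 3.0}
```

With these placeholder coefficients the small object is rejected at the first node, the round object is labeled a nucleus and the ragged one an artifact.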




In the presently preferred embodiment of the invention, the discriminant functions that separate types of objects are generated by the BMDP program available from BMDP Statistical Software, Inc., of Los Angeles, Calif. Given the discriminant functions and the appropriate thresholds, the construction of the binary tree classifier is considered routine for one of ordinary skill in the art.




Once the binary tree classifier has been developed, it can be supplied with a set of feature values 292 taken from an unknown object and will provide an indication 294 of whether the object associated with the feature data is most likely an artifact or a cell nucleus.





FIG. 9 shows how a classifier is used to determine whether a slide exhibits malignancy-associated changes or not. The classifier 300 is constructed using a pair of databases. A first database 302 contains feature values obtained from apparently normal cells that have been imaged by the digital microscope system shown in FIG. 1 and are known to have come from healthy patients. A second database 304 contains feature values calculated from apparently normal cells that were imaged by the digital microscope system described above and were known to have come from abnormal (i.e., cancer) patients. Again, classifier 300 used in the presently preferred embodiment of the invention is a binary decision tree made up of discriminant functions and/or thresholds that can separate the two groups of cells. Once the classifier has been constructed, the classifier is fed with the feature values 306 that are obtained by imaging cells obtained from a patient whose condition is unknown. The classifier provides a determination 308 of whether the nuclei exhibit MACs or not.





FIG. 10 is a flow chart of the steps performed by the present invention to determine whether a patient potentially has cancer. Beginning at a step 325, the computer system recalls the features calculated for each in-focus nucleus on the slide. At a step 330, the computer system runs the classifier that identifies MACs based on these features. At a step 332, the computer system provides an indication of whether the nucleus in question is MAC-positive or not. If the answer to step 332 is yes, then an accumulator that totals the number of MAC-positive nuclei for the slide is increased at a step 334. At a step 336, the computer system determines whether all the nuclei for which features have been calculated have been analyzed. If not, the next set of features is recalled at step 338 and the process repeats itself. At a step 340, the computer system determines whether the frequency of MAC-positive cells on the slide exceeds a predetermined threshold. For example, in a particular preparation of cells (air dried, as is the practice in British Columbia, Canada) to detect cervical cancer, it has been determined that if the total number of MAC-positive epithelial cells divided by the total number of epithelial cells analyzed exceeds 0.45 per slide, then there is an 85% chance that the patient has or will develop cancer. Depending on whether the frequency of cells exhibiting MACs exceeds the threshold, the computer system can indicate that the patient is healthy at step 342 or likely has or will develop cancer at step 344.
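The loop of FIG. 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `is_mac_positive` is a hypothetical stand-in for the classifier, the feature records are invented placeholders, and the default threshold of 0.45 is the cervical-cell example quoted above rather than a general constant.

```python
def mac_frequency(nucleus_features, is_mac_positive):
    """Fraction of analyzed nuclei that the classifier flags as MAC-positive.

    nucleus_features: iterable of per-nucleus feature records (hypothetical format).
    is_mac_positive:  callable standing in for the classifier (hypothetical).
    """
    mac_positive = 0  # accumulator for MAC-positive nuclei
    total = 0
    for features in nucleus_features:  # recall the next set of features
        total += 1
        if is_mac_positive(features):  # run the classifier on this nucleus
            mac_positive += 1
    return mac_positive / total if total else 0.0


def slide_indication(frequency, threshold=0.45):
    """Compare the slide-level MAC frequency with a predetermined threshold."""
    if frequency > threshold:
        return "likely has or will develop cancer"
    return "healthy"
```

A caller would pass the per-nucleus features computed for one slide and whatever decision function was built from the training databases; the threshold would be chosen per cancer type and imaging setup.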




The threshold above which it is likely that a patient exhibiting MACs has or will develop cancer is determined by comparing the MAC scores of a large number of patients who did develop cancer and those who did not. As will be appreciated by those skilled in the art, the particular threshold used will depend on the type of cancer to be detected, the equipment used to image the cells, etc.




The MAC detection system of the present invention can also be used to determine the efficacy of cancer treatment. For example, patients who have had a portion of a lung removed as a treatment for lung cancer can be asked to provide a sample of apparently normal cells taken from the remaining lung tissue. If a strong MAC presence is detected, there is a high probability that the cancer will return. Conversely, the inventors have found that the number of MAC cells decreases when a cancer treatment is effective.




As described above, the ability of the present invention to detect malignancy-associated changes depends on the values of the features computed. The following is a list of the features that are currently calculated for each in-focus object.




I.2 Coordinate Systems, Jargon and Notation




Each image is a rectangular array of square pixels that contains within it the image of an (irregularly shaped) object, surrounded by background. Each pixel $P_{i,j}$ is an integer representing the photometric value (gray scale) of a corresponding small segment of the image, and may range from 0 (completely opaque) to 255 (completely transparent). The image rectangle is larger than the smallest rectangle that can completely contain the object by at least two rows, top and bottom, and two columns, left and right, ensuring that background exists all around the object. The rectangular image is a matrix of pixels, $P_{i,j}$, spanning $i = 1, \ldots, L$ columns and $j = 1, \ldots, M$ rows, with the upper left-hand pixel as the coordinate system origin, $i = j = 1$.




The region of the image that is the object is denoted by its characteristic function, Ω; this is also sometimes called the “object mask” or, simply, the “mask.” For some features, it makes sense to dilate the object mask by one pixel all around the object; this mask is denoted $\Omega^{+}$. Similarly, an eroded mask is denoted $\Omega^{-}$. The object mask is a binary function:






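$$\Omega = (\Omega_{1,1}, \Omega_{1,2}, \ldots, \Omega_{i,j}, \ldots, \Omega_{L,M}) \tag{1}$$

where

$$\Omega_{i,j} = \begin{cases} 1 & \text{if } (i,j) \in \text{object} \\ 0 & \text{if } (i,j) \notin \text{object} \end{cases}$$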















and where “(i,j) ∈ object” means that the pixel at coordinates (i, j) is part of the object, and “(i,j) ∉ object” means that the pixel at coordinates (i, j) is not part of the object.




II Morphological Features




Morphological features estimate the image area, shape, and boundary variations of the object.




II.1 area




The area, A, is defined as the total number of pixels belonging to the object, as defined by the mask, Ω:









$$\text{area} = A = \sum_{i=1}^{L} \sum_{j=1}^{M} \Omega_{i,j} \tag{2}$$













where i, j and Ω are defined in Section I.2 above.




II.2 x_centroid, y_centroid




The x_centroid and y_centroid are the coordinates of the geometrical center of the object, defined with respect to the image origin (upper-left hand corner):









$$\text{x\_centroid} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} i \cdot \Omega_{i,j}}{A} \tag{3}$$

$$\text{y\_centroid} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} j \cdot \Omega_{i,j}}{A} \tag{4}$$













where i and j are the image pixel coordinates and Ω is the object mask, as defined in Section I.2 above, and A is the object area.
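As an illustration of Equations 2 to 4, the following minimal Python sketch computes the area and centroid of a binary mask. The nested-list mask layout (`mask[row][column]`) and the function name are assumptions made for this example; coordinates are 1-based to match the convention of Section I.2.

```python
def area_and_centroid(mask):
    """Return (area, x_centroid, y_centroid) for a binary object mask.

    mask is a list of rows; mask[j-1][i-1] is 1 for object pixels, else 0
    (i = column index, j = row index, both starting at 1).
    """
    area = 0
    sum_i = 0
    sum_j = 0
    for j, row in enumerate(mask, start=1):
        for i, value in enumerate(row, start=1):
            if value:
                area += 1      # Equation 2: count object pixels
                sum_i += i     # numerator of Equation 3
                sum_j += j     # numerator of Equation 4
    if area == 0:
        raise ValueError("mask contains no object pixels")
    return area, sum_i / area, sum_j / area
```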




II.3 mean_radius, max_radius




The mean_radius and max_radius features are the mean and maximum values of the length of the object's radial vectors from the object centroid to its 8-connected edge pixels:









$$\text{mean\_radius} = \bar{r} = \frac{\sum_{k=1}^{N} r_k}{N} \tag{5}$$

$$\text{max\_radius} = \max(r_k) \tag{6}$$

where $r_k$ is the $k$th radial vector, and $N$ is the number of 8-connected pixels on the object edge.




II.4 var_radius




The var_radius feature is the variance of the length of the object's radius vectors, as defined in Section II.3.









$$\text{var\_radius} = \frac{\sum_{k=1}^{N} (r_k - \bar{r})^2}{N - 1} \tag{7}$$

where $r_k$ is the $k$th radius vector, $\bar{r}$ is the mean_radius, and $N$ is the number of 8-connected edge pixels.




II.5 sphericity




The sphericity feature is a shape measure, calculated as a ratio of the radii of two circles centered at the object centroid (defined in Section II.2 above). One circle is the largest circle that is fully inscribed inside the object perimeter, corresponding to the absolute minimum length of the object's radial vectors. The other circle is the minimum circle that completely circumscribes the object's perimeter, corresponding to the absolute maximum length of the object's radial vectors. The maximum sphericity value, 1, is given for a circular object:









$$\text{sphericity} = \frac{\text{min\_radius}}{\text{max\_radius}} = \frac{\min(r_k)}{\max(r_k)} \tag{8}$$

where $r_k$ is the $k$th radius vector.
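The radial features of Sections II.3 to II.5 can be sketched as follows. The text does not spell out how edge pixels are selected, so this example assumes that an edge pixel is an object pixel with at least one non-object 4-connected neighbor (which yields an 8-connected boundary); the function names are likewise illustrative.

```python
import math


def edge_pixels(mask):
    """Object pixels with at least one non-object 4-neighbor (assumed edge rule)."""
    rows, cols = len(mask), len(mask[0])

    def is_object(j, i):
        return 0 <= j < rows and 0 <= i < cols and bool(mask[j][i])

    edges = []
    for j in range(rows):
        for i in range(cols):
            if mask[j][i] and not all(
                is_object(j + dj, i + di)
                for dj, di in ((-1, 0), (1, 0), (0, -1), (0, 1))
            ):
                edges.append((i, j))
    return edges


def radial_features(mask):
    """mean_radius, max_radius, var_radius and sphericity (Equations 5 to 8)."""
    pixels = [(i, j) for j, row in enumerate(mask) for i, v in enumerate(row) if v]
    area = len(pixels)
    cx = sum(i for i, _ in pixels) / area
    cy = sum(j for _, j in pixels) / area
    # Length of the radial vector from the centroid to every edge pixel.
    radii = [math.hypot(i - cx, j - cy) for i, j in edge_pixels(mask)]
    n = len(radii)  # the variance below needs at least two edge pixels
    mean_r = sum(radii) / n
    var_r = sum((r - mean_r) ** 2 for r in radii) / (n - 1)
    return {
        "mean_radius": mean_r,
        "max_radius": max(radii),
        "var_radius": var_r,
        "sphericity": min(radii) / max(radii),
    }
```

The sketch uses 0-based indices internally; because only distances to the centroid are used, the results do not depend on the choice of origin.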




II.6 eccentricity




The eccentricity feature is a shape function calculated as the square root of the ratio of maximal and minimal eigenvalues of the second central moment matrix of the object's characteristic function, Ω:









$$\text{eccentricity} = \sqrt{\frac{\lambda_1}{\lambda_2}} \tag{9}$$

where $\lambda_1$ and $\lambda_2$ are the maximal and minimal eigenvalues, respectively, and Ω is the characteristic function given by Equation 1. The second central moment matrix is calculated as:










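$$\begin{bmatrix} x_{\text{moment2}} & xy_{\text{crossmoment2}} \\ xy_{\text{crossmoment2}} & y_{\text{moment2}} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{L} \sum_{j=1}^{M} (i - \text{x\_centroid})^2 \, \Omega_{i,j} & \sum_{i=1}^{L} \sum_{j=1}^{M} (i - \text{x\_centroid})(j - \text{y\_centroid}) \, \Omega_{i,j} \\ \sum_{i=1}^{L} \sum_{j=1}^{M} (i - \text{x\_centroid})(j - \text{y\_centroid}) \, \Omega_{i,j} & \sum_{i=1}^{L} \sum_{j=1}^{M} (j - \text{y\_centroid})^2 \, \Omega_{i,j} \end{bmatrix} \tag{10}$$

where the moments are taken about the object centroid (x_centroid, y_centroid) defined in Section II.2.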













Eccentricity may be interpreted as the ratio of the major axis to minor axis of the “best fit” ellipse which describes the object, and gives the minimal value 1 for circles.
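A minimal sketch of the eccentricity calculation follows, assuming the second central moments are taken about the object centroid and weighted by the mask. The eigenvalues of the symmetric 2×2 matrix are obtained in closed form; the function names and the choice to return infinity for a degenerate (one-pixel-wide) object are assumptions of this example.

```python
import math


def second_central_moments(mask):
    """Return (x_moment2, xy_crossmoment2, y_moment2) of a binary mask."""
    pixels = [(i, j) for j, row in enumerate(mask) for i, v in enumerate(row) if v]
    area = len(pixels)
    cx = sum(i for i, _ in pixels) / area
    cy = sum(j for _, j in pixels) / area
    mxx = sum((i - cx) ** 2 for i, _ in pixels)
    myy = sum((j - cy) ** 2 for _, j in pixels)
    mxy = sum((i - cx) * (j - cy) for i, j in pixels)
    return mxx, mxy, myy


def eccentricity(mask):
    """Square root of the ratio of the maximal to minimal eigenvalue."""
    mxx, mxy, myy = second_central_moments(mask)
    half_trace = (mxx + myy) / 2
    root = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam1 = half_trace + root  # maximal eigenvalue
    lam2 = half_trace - root  # minimal eigenvalue
    if lam2 <= 0:
        return math.inf       # degenerate object, e.g. a single row of pixels
    return math.sqrt(lam1 / lam2)
```

Because the eccentricity depends only on the ratio of the eigenvalues, it is unaffected by whether the moments are additionally divided by the object area.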




II.7 inertia_shape




The inertia_shape feature is a measure of the “roundness” of an object calculated as the moment of inertia of the object mask, normalized by the area squared, to give the minimal value 1 for circles:









$$\text{inertia\_shape} = \frac{2\pi \sum_{i=1}^{L} \sum_{j=1}^{M} R_{i,j}^{2} \, \Omega_{i,j}}{A^{2}} \tag{11}$$

where $R_{i,j}$ is the distance of the pixel, $P_{i,j}$, to the object centroid (defined in Section II.2), $A$ is the object area, and Ω is the mask defined by Equation 1.




II.8 compactness




The compactness feature is another measure of the object's “roundness.” It is calculated as the perimeter squared divided by the object area, giving the minimal value 1 for circles:









$$\text{compactness} = \frac{P^{2}}{4 \pi A} \tag{12}$$

where $P$ is the object perimeter and $A$ is the object area. Perimeter is calculated from boundary pixels (which are themselves 8-connected) by considering their 4-connected neighborhood:

$$P = N_1 + \sqrt{2}\, N_2 + 2 N_3 \tag{13}$$

where $N_1$ is the number of pixels on the edge with 1 non-object neighbor, $N_2$ is the number of pixels on the edge with 2 non-object neighbors, and $N_3$ is the number of pixels on the edge with 3 non-object neighbors.
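The perimeter rule of Equation 13 and the compactness of Equation 12 can be sketched as below. The example assumes that positions outside the image count as non-object, and it simply ignores isolated pixels with four non-object neighbors, since the equation assigns them no weight; both choices are assumptions of this sketch.

```python
import math


def perimeter(mask):
    """P = N1 + sqrt(2)*N2 + 2*N3, counting non-object 4-neighbors per pixel."""
    rows, cols = len(mask), len(mask[0])
    counts = {1: 0, 2: 0, 3: 0}
    for j in range(rows):
        for i in range(cols):
            if not mask[j][i]:
                continue
            non_object = 0
            for dj, di in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nj, ni = j + dj, i + di
                inside = 0 <= nj < rows and 0 <= ni < cols
                if not inside or not mask[nj][ni]:
                    non_object += 1
            if non_object in counts:  # interior (0) and isolated (4) pixels are skipped
                counts[non_object] += 1
    return counts[1] + math.sqrt(2) * counts[2] + 2 * counts[3]


def compactness(mask):
    """Perimeter squared divided by 4*pi*area (Equation 12)."""
    area = sum(1 for row in mask for v in row if v)
    p = perimeter(mask)
    return p * p / (4 * math.pi * area)
```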




II.9 cell_orient




The cell_orient feature represents the object orientation measured as a deflection of the main axis of the object from the y direction:









$$\text{cell\_orient} = \frac{180}{\pi} \left( \frac{\pi}{2} + \arctan\left[ \frac{\lambda_1 - y_{\text{moment2}}}{xy_{\text{crossmoment2}}} \right] \right) \tag{14}$$

where $y_{\text{moment2}}$ and $xy_{\text{crossmoment2}}$ are the second central moments of the characteristic function Ω defined by Equation 1 above, and $\lambda_1$ is the maximal eigenvalue of the second central moment matrix of that function (see Section II.6 above). The main axis of the object is defined by the eigenvector corresponding to the maximal eigenvalue. A geometrical interpretation of the cell_orient is that it is the angle (measured in a clockwise sense) between the y axis and the “best fit” ellipse major axis.




For slides of cell suspensions, this feature should be meaningless, as there should not be any a priori preferred cellular orientation. For histological sections, and possibly smears, this feature may have value. In smears, for example, debris may be preferentially elongated along the slide long axis.




II.10 elongation




Features in Sections II.10 to II.13 are calculated by sweeping the radius vector (from the object centroid, as defined in Section II.2, to object perimeter) through 128 discrete equal steps (i.e., an angle of 2π/128 per step), starting at the top leftmost object edge pixel, and sweeping in a clockwise direction. The function is interpolated from an average of the object edge pixel locations at each of the 128 angles.




The elongation feature is another measure of the extent of the object along the principal direction (corresponding to the major axis) versus the direction normal to it. These lengths are estimated using Fourier Transform coefficients of the radial function of the object:









$$\text{elongation} = \frac{a_0 + 2\sqrt{a_2^{2} + b_2^{2}}}{a_0 - 2\sqrt{a_2^{2} + b_2^{2}}} \tag{15}$$

where $a_2$, $b_2$ are Fourier Transform coefficients of the radial function of the object, $r(\theta)$, defined by:

$$r(\theta) = \frac{a_0}{2} + \sum_{n=1}^{m} a_n \cos(n\theta) + \sum_{n=1}^{m} b_n \sin(n\theta) \tag{16}$$













II.11 freq_low_fft




The freq_low_fft gives an estimate of coarse boundary variation, measured as the energy of the lower harmonics of the Fourier spectrum of the object's radial function (from 3rd to 11th harmonics):










$$\text{freq\_low\_fft} = \sum_{n=3}^{11} \left( a_n^{2} + b_n^{2} \right) \tag{17}$$

where $a_n$, $b_n$ are Fourier Transform coefficients of the radial function, defined in Equation 16.




II.12 freq_high_fft




The freq_high_fft gives an estimate of the fine boundary variation, measured as the energy of the high frequency Fourier spectrum (from 12th to 32nd harmonics) of the object's radial function:










$$\text{freq\_high\_fft} = \sum_{n=12}^{32} \left( a_n^{2} + b_n^{2} \right) \tag{18}$$

where $a_n$, $b_n$ are Fourier Transform coefficients of the $n$th harmonic, defined by Equation 16.




II.13 harmon01_fft, . . . , harmon32_fft




The harmon01_fft, . . . , harmon32_fft features are estimates of boundary variation, calculated as the magnitude of the Fourier Transform coefficients of the object radial function for each harmonic 1-32:

$$\text{harmon}n\text{\_fft} = \sqrt{a_n^{2} + b_n^{2}} \tag{19}$$

where $a_n$, $b_n$ are Fourier Transform coefficients of the $n$th harmonic, defined by Equation 16.
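The Fourier-based shape features of Sections II.10 to II.13 can be illustrated with a plain discrete Fourier series over the sampled radial function. This sketch assumes the radial function has already been resampled at equal angular steps (for example 128 samples) and uses the normalization implied by Equation 16, in which the mean radius equals $a_0/2$; the function names are hypothetical.

```python
import math


def fourier_coefficients(radii, max_harmonic=32):
    """Return lists a, b with a[n], b[n] for n = 0..max_harmonic.

    radii: radial function sampled at equal angular steps over 2*pi.
    """
    n_samples = len(radii)
    a, b = [], []
    for n in range(max_harmonic + 1):
        a.append(2 / n_samples * sum(
            r * math.cos(2 * math.pi * n * k / n_samples) for k, r in enumerate(radii)))
        b.append(2 / n_samples * sum(
            r * math.sin(2 * math.pi * n * k / n_samples) for k, r in enumerate(radii)))
    return a, b


def fourier_shape_features(radii):
    """elongation, freq_low_fft, freq_high_fft and per-harmonic magnitudes."""
    a, b = fourier_coefficients(radii, 32)
    second = 2 * math.sqrt(a[2] ** 2 + b[2] ** 2)
    return {
        "elongation": (a[0] + second) / (a[0] - second),                    # Equation 15
        "freq_low_fft": sum(a[n] ** 2 + b[n] ** 2 for n in range(3, 12)),   # Equation 17
        "freq_high_fft": sum(a[n] ** 2 + b[n] ** 2 for n in range(12, 33)), # Equation 18
        "harmonics": {n: math.hypot(a[n], b[n]) for n in range(1, 33)},     # Equation 19
    }
```

For a radial function of the form $R + c\cos 2\theta$, the elongation reduces to $(R + c)/(R - c)$, the ratio of the longest to the shortest radius, which is one way to see why the second harmonic is the one used.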




III Photometric Features




Photometric features give estimations of absolute intensity and optical density levels of the object, as well as their distribution characteristics.




III.1 DNA_Amount




DNA_Amount is the “raw” (unnormalized) measure of the integrated optical density of the object, defined by a once dilated mask, $\Omega^{+}$:

$$\text{DNA\_Amount} = \sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{+}_{i,j} \tag{20}$$

where the once dilated mask, $\Omega^{+}$, is defined in Section I.2 and OD is the optical density, calculated according to [12]:

$$OD_{i,j} = \log_{10} I_B - \log_{10} I_{i,j} \tag{21}$$

where $I_B$ is the intensity of the local background, and $I_{i,j}$ is the intensity of the $(i,j)$th pixel.
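Equations 20 and 21 translate directly into code. The sketch below assumes strictly positive pixel intensities (a zero intensity has no finite optical density), a single scalar background intensity, and a dilated mask supplied by the caller; the function names are illustrative only.

```python
import math


def optical_density(intensity, background):
    """OD = log10(I_B) - log10(I) for one pixel (Equation 21)."""
    return math.log10(background) - math.log10(intensity)


def dna_amount(image, dilated_mask, background):
    """Integrated optical density over the once dilated mask (Equation 20)."""
    total = 0.0
    for image_row, mask_row in zip(image, dilated_mask):
        for intensity, inside in zip(image_row, mask_row):
            if inside:
                total += optical_density(intensity, background)
    return total
```

Dividing the result by the slide's normalization value then gives DNA_Index.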




III.2 DNA_Index




DNA_Index is the normalized measure of the integrated optical density of the object:









$$\text{DNA\_Index} = \frac{\text{DNA\_Amount}}{\text{iod}_{\text{norm}}} \tag{22}$$

where $\text{iod}_{\text{norm}}$ is the mean value of the DNA amount for a particular object population from the slide (e.g., leukocytes).




III.3 var_intensity, mean_intensity




The var_intensity and mean_intensity features are the variance and mean of the intensity function of the object, I, defined by the mask, Ω:









$$\text{var\_intensity} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} \left( I_{i,j} \, \Omega_{i,j} - \bar{I} \right)^{2}}{A - 1} \tag{23}$$

where $A$ is the object area, Ω is the object mask defined in Equation 1, and $\bar{I}$ is given by:

$$\bar{I} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} I_{i,j} \, \Omega_{i,j}}{A} \tag{24}$$

$\bar{I}$ is the “raw” (unnormalized) mean intensity.

The mean intensity is normalized against $\text{iod}_{\text{norm}}$, defined in Section III.2:

$$\text{mean\_intensity} = \frac{\bar{I}}{\text{iod}_{\text{norm}}} \cdot 100 \tag{25}$$













III.4 OD_maximum




OD_maximum is the largest value of the optical density of the object, normalized to $\text{iod}_{\text{norm}}$, as defined in Section III.2 above:

$$\text{OD\_maximum} = \max(OD_{i,j}) \left( \frac{100}{\text{iod}_{\text{norm}}} \right) \tag{26}$$













III.5 OD_variance




OD_variance is the normalized variance (second moment) of the optical density function of the object:

$$\text{OD\_variance} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} \left( OD_{i,j} \, \Omega_{i,j} - \overline{OD} \right)^{2}}{(A - 1) \, \overline{OD}^{\,2}} \tag{27}$$

where Ω is the object mask as defined in Section I.2, and $\overline{OD}$ is the mean value of the optical density of the object:

$$\overline{OD} = \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega_{i,j}}{A} \right)$$











and A is the object area (total number of pixels). The variance is divided by the square of the mean optical density in order to make the measurement independent of the staining intensity of the cell.




III.6 OD_skewness




The OD_skewness feature is the normalized third moment of the optical density function of the object:









$$\text{OD\_skewness} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} \left( OD_{i,j} \, \Omega_{i,j} - \overline{OD} \right)^{3}}{(A - 1) \left( \sum_{i=1}^{L} \sum_{j=1}^{M} \left( OD_{i,j} \, \Omega_{i,j} - \overline{OD} \right)^{2} \right)^{3/2}} \tag{28}$$

where Ω is the object mask as defined in Section I.2, $\overline{OD}$ is the mean value of the optical density of the object and $A$ is the object area (total number of pixels).




III.7 OD_kurtosis




OD_kurtosis is the normalized fourth moment of the optical density function of the object:









$$\text{OD\_kurtosis} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} \left( OD_{i,j} \, \Omega_{i,j} - \overline{OD} \right)^{4}}{(A - 1) \left( \sum_{i=1}^{L} \sum_{j=1}^{M} \left( OD_{i,j} \, \Omega_{i,j} - \overline{OD} \right)^{2} \right)^{2}} \tag{29}$$

where Ω is the object mask as defined in Section I.2, $\overline{OD}$ is the mean value of the optical density of the object and $A$ is the object area.
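The normalized optical density moments of Sections III.5 to III.7 can be sketched as follows. As a simplifying assumption, the function takes a flat list holding the optical density of the object pixels only, so that the sums run over the object rather than the whole image rectangle; it also assumes at least two pixels and a non-constant optical density. The function name is hypothetical.

```python
def od_moments(od_values):
    """Return (OD_variance, OD_skewness, OD_kurtosis) for the object pixels.

    od_values: optical densities of the pixels inside the object mask.
    """
    a = len(od_values)                      # object area in pixels
    mean_od = sum(od_values) / a
    deviations = [od - mean_od for od in od_values]
    s2 = sum(d ** 2 for d in deviations)    # shared sum of squared deviations
    s3 = sum(d ** 3 for d in deviations)
    s4 = sum(d ** 4 for d in deviations)
    variance = s2 / ((a - 1) * mean_od ** 2)  # divided by mean squared (stain independence)
    skewness = s3 / ((a - 1) * s2 ** 1.5)
    kurtosis = s4 / ((a - 1) * s2 ** 2)
    return variance, skewness, kurtosis
```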




IV Discrete Texture Features




The discrete texture features are based on segmentation of the object into regions of low, medium and high optical density. This segmentation of the object into low, medium and high density regions is based on two thresholds: optical density high threshold and optical density medium threshold. These thresholds are scaled to the sample's $\text{iod}_{\text{norm}}$ value, based on the DNA amount of a particular subset of objects (e.g., lymphocytes), as described in Section III.2 above.




By default, these thresholds have been selected such that the condensed chromatin in leukocytes is high optical density material. The second threshold is located half way between the high threshold and zero.




The default settings from which these thresholds are calculated are stored in the computer as:






CHROMATIN_HIGH_THRES=36








CHROMATIN_MEDIUM_THRES=18






$A_{low}$ is the area of the pixels having an optical density between 0 and 18, $A_{med}$ is the area of the pixels having an optical density between 18 and 36, and $A_{high}$ is the area of the pixels having an optical density greater than 36. Together the areas $A_{high}$, $A_{med}$ and $A_{low}$ sum to the total area of the object. The actual thresholds used are these parameters, divided by 100, and multiplied by the factor $\text{iod}_{\text{norm}}/100$.




In the following discussion, $\Omega^{low}$, $\Omega^{med}$, and $\Omega^{high}$ are masks for low-, medium-, and high-optical density regions of the object, respectively, defined in analogy to Equation 1.




IV.1 lowDNAarea, medDNAarea, hiDNAarea




These discrete texture features represent the ratio of the area of low, medium, and high optical density regions of the object to the total object area:









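$$\text{lowDNAarea} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} \Omega^{low}_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} \Omega_{i,j}} = \frac{A_{low}}{A} \tag{30}$$

$$\text{medDNAarea} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} \Omega^{med}_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} \Omega_{i,j}} = \frac{A_{med}}{A} \tag{31}$$

$$\text{hiDNAarea} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} \Omega^{hi}_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} \Omega_{i,j}} = \frac{A_{hi}}{A} \tag{32}$$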













where Ω is the object mask as defined in Equation 1, and A is the object area.
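The threshold scaling and the area ratios of Equations 30 to 32 can be sketched together. The example assumes the default parameters 36 and 18, thresholds computed as parameter/100 × iod_norm/100, and half-open intervals at the boundaries (a pixel exactly at a threshold goes to the denser class); the boundary handling and the function name are assumptions.

```python
def density_area_fractions(od_values, iod_norm, high_param=36, medium_param=18):
    """Return (lowDNAarea, medDNAarea, hiDNAarea) for the object pixels.

    od_values: optical densities of the pixels inside the object mask.
    iod_norm:  slide normalization value used to scale the thresholds.
    """
    high_threshold = high_param / 100 * iod_norm / 100
    medium_threshold = medium_param / 100 * iod_norm / 100
    low = med = high = 0
    for od in od_values:
        if od < medium_threshold:
            low += 1
        elif od < high_threshold:
            med += 1
        else:
            high += 1
    area = len(od_values)  # the three regions partition the object
    return low / area, med / area, high / area
```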




IV.2 lowDNAamnt, medDNAamnt, hiDNAamnt




These discrete texture features represent the total extinction ratio for low, medium, and high optical density regions of the object, calculated as the value of the integrated optical density of the low-, medium-, and high-density regions, respectively, divided by the total integrated optical density:









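$$\text{lowDNAamnt} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{low}_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega_{i,j}} \tag{33}$$

$$\text{medDNAamnt} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{med}_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega_{i,j}} \tag{34}$$

$$\text{hiDNAamnt} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{hi}_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega_{i,j}} \tag{35}$$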













where Ω is the object mask as defined in Equation 1, and OD is the optical density as defined by Equation 21.




IV.3 lowDNAcomp, medDNAcomp, hiDNAcomp, mhDNAcomp




These discrete texture features are characteristic of the compactness of low-, medium-, high-, and combined medium- and high-density regions, respectively, treated as single (possibly disconnected) objects. They are calculated as the perimeter squared of each region, divided by 4π times the area of the region.









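$$\text{lowDNAcomp} = \frac{(P_{low})^{2}}{4 \pi A_{low}} \tag{36}$$

$$\text{medDNAcomp} = \frac{(P_{med})^{2}}{4 \pi A_{med}} \tag{37}$$

$$\text{hiDNAcomp} = \frac{(P_{hi})^{2}}{4 \pi A_{hi}} \tag{38}$$

$$\text{mhDNAcomp} = \frac{(P_{med} + P_{hi})^{2}}{4 \pi (A_{med} + A_{hi})} \tag{39}$$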













where P is the perimeter of each of the optical density regions, defined in analogy to Equation 13, and A is the region area, defined in analogy to Equation 2.




IV.4 low_av_dst, med_av_dst, hi_av_dst, mh_av_dst




These discrete texture features represent the average distance of the low-, medium-, high-, and combined medium- and high-density pixels from the center of the object, normalized by the object mean_radius.










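$$\text{low\_av\_dst} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} R_{i,j} \, \Omega^{low}_{i,j}}{A_{low} \cdot \text{mean\_radius}} \tag{40}$$

$$\text{med\_av\_dst} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} R_{i,j} \, \Omega^{med}_{i,j}}{A_{med} \cdot \text{mean\_radius}} \tag{41}$$

$$\text{hi\_av\_dst} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} R_{i,j} \, \Omega^{hi}_{i,j}}{A_{hi} \cdot \text{mean\_radius}} \tag{42}$$

$$\text{mh\_av\_dst} = \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} R_{i,j} \, \Omega^{med}_{i,j} + \sum_{i=1}^{L} \sum_{j=1}^{M} R_{i,j} \, \Omega^{hi}_{i,j}}{(A_{med} + A_{hi}) \cdot \text{mean\_radius}} \tag{43}$$

where $R_{i,j}$ is defined in Section II.7 as the distance from pixel $P_{i,j}$ to the object centroid (defined in Section II.2), and the object mean_radius is defined by Equation 5.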




IV.5 lowVSmed_DNA, lowVShigh_DNA, lowVSmh_DNA




These discrete texture features represent the average extinction ratios of the low-density regions, normalized by the medium-, high-, and combined medium- and high-average extinction values, respectively. They are calculated as the mean optical density of the medium-, high-, and combined medium- and high-density clusters divided by the mean optical density of the low density clusters.









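$$\text{lowVSmed\_DNA} = \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{med}_{i,j}}{A_{med}} \right) \div \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{low}_{i,j}}{A_{low}} \right) \tag{44}$$

$$\text{lowVShi\_DNA} = \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{hi}_{i,j}}{A_{hi}} \right) \div \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{low}_{i,j}}{A_{low}} \right) \tag{45}$$

$$\text{lowVSmh\_DNA} = \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{med}_{i,j} + \sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{hi}_{i,j}}{A_{med} + A_{hi}} \right) \div \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega^{low}_{i,j}}{A_{low}} \right) \tag{46}$$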













where OD is the region optical density defined in analogy to Equation 21, Ω is the region mask, defined in analogy to Equation 1, and A is the region area, defined in analogy to Equation 2.




IV.6 low_den_obj, med_den_obj, high_den_obj




These discrete texture features are the numbers of discrete 8-connected subcomponents of the objects consisting of more than one pixel of low, medium, and high density.




IV.7 low_cntr_mass, med_cntr_mass, high_cntr_mass




These discrete texture features represent the separation between the geometric center of the low, medium, and high optical density clusters (treated as if they were single objects) and the geometric center of the whole object, normalized by its mean radius.










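$$\text{low\_cntr\_mass} = \left[ \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} i \cdot \Omega^{low}_{i,j}}{A_{low}} - \text{x\_centroid} \right)^{2} + \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} j \cdot \Omega^{low}_{i,j}}{A_{low}} - \text{y\_centroid} \right)^{2} \right]^{1/2} \div (\text{mean\_radius}) \tag{47}$$

$$\text{med\_cntr\_mass} = \left[ \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} i \cdot \Omega^{med}_{i,j}}{A_{med}} - \text{x\_centroid} \right)^{2} + \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} j \cdot \Omega^{med}_{i,j}}{A_{med}} - \text{y\_centroid} \right)^{2} \right]^{1/2} \div (\text{mean\_radius}) \tag{48}$$

$$\text{hi\_cntr\_mass} = \left[ \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} i \cdot \Omega^{hi}_{i,j}}{A_{hi}} - \text{x\_centroid} \right)^{2} + \left( \frac{\sum_{i=1}^{L} \sum_{j=1}^{M} j \cdot \Omega^{hi}_{i,j}}{A_{hi}} - \text{y\_centroid} \right)^{2} \right]^{1/2} \div (\text{mean\_radius}) \tag{49}$$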













where mean_radius of the object is defined by Equation 5, the object's centroid is defined in Section II.2, Ω is the region mask defined in analogy to Equation 1, and A is the region area defined in analogy to Equation 2.




V Markovian Texture Features




Markovian texture features are defined from the co-occurrence matrix, $\Delta_{\lambda,\mu}$, of object pixels. Each element of that matrix stands for the conditional probability of the pixel of grey level λ occurring next (via 8-connectedness) to a pixel of grey level μ, where λ, μ are row and column indices of the matrix, respectively. However, the computational algorithms used here for the calculation of Markovian texture features use so-called sum and difference histograms, $H_l^s$ and $H_m^d$, where $H_l^s$ is the probability of neighboring pixels having grey levels which sum to $l$, and $H_m^d$ is the probability of neighboring pixels having grey level differences of $m$, where an 8-connected neighborhood is assumed. Values of grey levels, $l$, $m$, used in the sum and difference histograms are obtained by quantization of the dynamic range of each individual object into 40 levels.




For completeness, the formulae that follow for Markovian texture features include both the conventional formulae and the computational formulae actually used.




V.1 entropy




The entropy feature represents a measure of “disorder” in object grey level organization: large values correspond to very disorganized distributions, such as a “salt and pepper” random field:









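$$\text{entropy} = -\sum_{\lambda} \sum_{\mu} \Delta_{\lambda,\mu} \log_{10} \Delta_{\lambda,\mu} \quad \text{(conventional)} \tag{50}$$

$$\text{entropy} = -\sum_{l} H_l^s \log_{10} H_l^s - \sum_{m} H_m^d \log_{10} H_m^d \quad \text{(computational)}$$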






















V.2 energy




The energy feature gives large values for an object with a spatially organized grey scale distribution. It is the opposite of entropy, giving large values to an object with large regions of constant grey level:









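$$\text{energy} = \sum_{\lambda} \sum_{\mu} \Delta_{\lambda,\mu}^{2} \quad \text{(conventional)} \tag{51}$$

$$\text{energy} = \sum_{l} \left( H_l^s \right)^{2} + \sum_{m} \left( H_m^d \right)^{2} \quad \text{(computational)}$$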






















V.3 contrast




The contrast feature gives large values for an object with frequent large grey scale variations:









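$$\text{contrast} = \sum_{\lambda} \sum_{\mu} (\lambda - \mu)^{2} \, \Delta_{\lambda,\mu} \quad \text{(conventional)} \tag{52}$$

$$\text{contrast} = \sum_{m} m^{2} H_m^d \quad \text{(computational)}$$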





















V.4 correlation




A large value for correlation indicates an object with large connected subcomponents of constant grey level and with large grey level differences between adjacent components:









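$$\text{correlation} = \sum_{\lambda} \sum_{\mu} \left( \lambda - \overline{I_q} \right) \left( \mu - \overline{I_q} \right) \Delta_{\lambda,\mu} \quad \text{(conventional)} \tag{53}$$

$$\text{correlation} = \frac{1}{2} \left( \sum_{l} \left( l - 2\,\overline{I_q} \right)^{2} H_l^s - \sum_{m} m^{2} H_m^d \right) \quad \text{(computational)}$$

where $\overline{I_q}$ is the mean intensity of the object calculated for the grey scale quantized to 40 levels.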




V.5 homogeneity




The homogeneity feature is large for objects with slight and spatially smooth grey level variations:









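$$\text{homogeneity} = \sum_{\lambda} \sum_{\mu} \frac{1}{1 + (\lambda - \mu)^{2}} \, \Delta_{\lambda,\mu} \quad \text{(conventional)} \tag{54}$$

$$\text{homogeneity} = \sum_{m} \frac{1}{(1 + m)^{2}} \, H_m^d \quad \text{(computational)}$$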





















V.6 cl_shade




The cl_shade feature gives large absolute values for objects with a few distinct clumps of uniform intensity having large contrast with the rest of the object. Negative values correspond to dark clumps against a light background while positive values indicate light clumps against a dark background:









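$$\text{cl\_shade} = \sum_{\lambda} \sum_{\mu} \left( \lambda + \mu - 2\,\overline{I_q} \right)^{3} \Delta_{\lambda,\mu} \quad \text{(conventional)} \tag{55}$$

$$\text{cl\_shade} = \frac{\sum_{l} \left( l - 2\,\overline{I_q} \right)^{3} H_l^s}{\left( \sum_{l} \left( l - 2\,\overline{I_q} \right)^{2} H_l^s \right)^{3/2}} \quad \text{(computational)}$$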




















V.7 cl_prominence




The feature cl_prominence measures the darkness of clusters.









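$$\text{cl\_prominence} = \sum_{\lambda} \sum_{\mu} \left( \lambda + \mu - 2\,\overline{I_q} \right)^{4} \Delta_{\lambda,\mu} \quad \text{(conventional)} \tag{56}$$

$$\text{cl\_prominence} = \frac{\sum_{l} \left( l - 2\,\overline{I_q} \right)^{4} H_l^s}{\left( \sum_{l} \left( l - 2\,\overline{I_q} \right)^{2} H_l^s \right)^{2}} \quad \text{(computational)}$$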
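The sum and difference histograms that underlie the computational formulae of Section V can be sketched as below. The example assumes the grey levels have already been quantized (the 40-level quantization step is not shown), counts every ordered pair of 8-connected object pixels so that the difference histogram is symmetric, and evaluates only entropy, energy (as the sum of squared histogram entries) and contrast; all names are illustrative.

```python
import math
from collections import Counter


def sum_diff_histograms(levels, mask):
    """Normalized sum (H^s) and difference (H^d) histograms over 8-neighbor pairs."""
    rows, cols = len(levels), len(levels[0])
    sums, diffs = Counter(), Counter()
    pairs = 0
    for j in range(rows):
        for i in range(cols):
            if not mask[j][i]:
                continue
            for dj in (-1, 0, 1):
                for di in (-1, 0, 1):
                    nj, ni = j + dj, i + di
                    if (dj or di) and 0 <= nj < rows and 0 <= ni < cols and mask[nj][ni]:
                        sums[levels[j][i] + levels[nj][ni]] += 1
                        diffs[levels[j][i] - levels[nj][ni]] += 1
                        pairs += 1
    if pairs == 0:
        raise ValueError("object has no neighboring pixel pairs")
    h_s = {l: c / pairs for l, c in sums.items()}
    h_d = {m: c / pairs for m, c in diffs.items()}
    return h_s, h_d


def markov_features(h_s, h_d):
    """Entropy, energy and contrast from the sum and difference histograms."""
    entropy = (-sum(p * math.log10(p) for p in h_s.values())
               - sum(p * math.log10(p) for p in h_d.values()))
    energy = sum(p * p for p in h_s.values()) + sum(p * p for p in h_d.values())
    contrast = sum(m * m * p for m, p in h_d.items())
    return {"entropy": entropy, "energy": energy, "contrast": contrast}
```

Working with two one-dimensional histograms instead of the full co-occurrence matrix keeps the storage proportional to the number of grey levels rather than its square.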




















VI Non-Markovian Texture Features




These features describe texture in terms of global estimation of grey level differences of the object.




VI.1 den_lit_spot, den_drk_spot




These are the numbers of local maxima and local minima, respectively, of the object intensity function based on the image averaged by a 3×3 window, and divided by the object area.










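$$\text{den\_lit\_spot} = \frac{\sum_{i'=1}^{L} \sum_{j'=1}^{M} \delta^{max}_{i',j'}}{A} \tag{57}$$

and

$$\text{den\_drk\_spot} = \frac{\sum_{i'=1}^{L} \sum_{j'=1}^{M} \delta^{min}_{i',j'}}{A} \tag{58}$$

where

$$\delta^{max}_{i',j'} = \begin{cases} 1 & \text{if there exists a local maximum of } I_{i',j'} \text{ with value } \max_{i',j'} \\ 0 & \text{otherwise} \end{cases}$$

and

$$\delta^{min}_{i',j'} = \begin{cases} 1 & \text{if there exists a local minimum of } I_{i',j'} \text{ with value } \min_{i',j'} \\ 0 & \text{otherwise} \end{cases}$$

and where

$$I_{i',j'} = \frac{1}{9} \sum_{i=i'-1}^{i'+1} \sum_{j=j'-1}^{j'+1} I_{i,j} \, \Omega_{i,j}$$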
















and I is the object intensity, Ω is the object mask, and A is the object area.




VI.2 range_extreme




This is the intensity difference between the largest local maximum and the smallest local minimum of the object intensity function, normalized against the slide DNA amount, $\text{iod}_{\text{norm}}$, defined in Section III.2. The local maxima, $\max_{i',j'}$, and minima, $\min_{i',j'}$, are those in Section VI.1 above.

$$\text{range\_extreme} = \left( \max(\max_{i',j'}) - \min(\min_{i',j'}) \right) \left( \frac{100}{\text{iod}_{\text{norm}}} \right) \tag{59}$$













VI.3 range_average




This is the intensity difference between the average intensity of the local maxima and the average intensity of the local minima, normalized against the slide DNA amount value, $\text{iod}_{\text{norm}}$, defined in Section III.2 above. The local maxima, $\max_{i',j'}$, and minima, $\min_{i',j'}$, values used are those from Section VI.1 above.

$$\text{range\_average} = \left( \frac{\sum_{i'=1}^{L} \sum_{j'=1}^{M} \max_{i',j'}}{\sum_{i'=1}^{L} \sum_{j'=1}^{M} \delta^{max}_{i',j'}} - \frac{\sum_{i'=1}^{L} \sum_{j'=1}^{M} \min_{i',j'}}{\sum_{i'=1}^{L} \sum_{j'=1}^{M} \delta^{min}_{i',j'}} \right) \frac{100}{\text{iod}_{\text{norm}}} \tag{60}$$













VI.4 center_of_gravity




The center_of_gravity feature represents the distance from the geometrical center of the object to the “center of mass” of the optical density function, normalized by the mean_radius of the object:










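$$\text{center\_of\_gravity} = \frac{\sqrt{\left( \dfrac{\sum_{i=1}^{L} \sum_{j=1}^{M} i \cdot OD_{i,j} \, \Omega_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega_{i,j}} - \text{x\_centroid} \right)^{2} + \left( \dfrac{\sum_{i=1}^{L} \sum_{j=1}^{M} j \cdot OD_{i,j} \, \Omega_{i,j}}{\sum_{i=1}^{L} \sum_{j=1}^{M} OD_{i,j} \, \Omega_{i,j}} - \text{y\_centroid} \right)^{2}}}{\text{mean\_radius}} \tag{61}$$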













This gives a measure of the nonuniformity of the OD distribution.
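A minimal Python sketch of this feature follows. It assumes the optical density image and the mask are given as nested lists of equal shape, that indices are 1-based as in the equation, and that `x_centroid`, `y_centroid` and `mean_radius` have already been computed in the same coordinate convention; the function name and argument layout are illustrative only.

```python
import math


def center_of_gravity(od, mask, x_centroid, y_centroid, mean_radius):
    """Distance between the OD-weighted center of mass and the geometric
    centroid, divided by the mean radius (sketch of Eq. 61)."""
    total = sum_i = sum_j = 0.0
    for i, (od_row, mask_row) in enumerate(zip(od, mask), start=1):
        for j, (density, inside) in enumerate(zip(od_row, mask_row), start=1):
            weight = density * inside      # OD_ij * Omega_ij
            total += weight
            sum_i += i * weight
            sum_j += j * weight
    dx = sum_i / total - x_centroid        # first index paired with x, as in the equation
    dy = sum_j / total - y_centroid
    return math.hypot(dx, dy) / mean_radius
```

A uniform optical density over a symmetric mask gives a value of zero, and the value grows as the dense material shifts away from the geometric center.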




VII Fractal Texture Features




The fractal texture features are based on the area of the three-dimensional surface of the object's optical density represented essentially as a three-dimensional bar graph, with the vertical axis representing optical density, and the horizontal axes representing the x and y spatial coordinates. Thus, each pixel is assigned a unit area in the x-y plane plus the area of the sides of the three-dimensional structure proportional to the change in the pixel optical density with respect to its neighbors. The largest values of fractal areas correspond to large objects containing small subcomponents with high optical density variations between them.




The difference between fractal1_area and fractal2_area is that these features are calculated on different scales: the second one is based on an image in which four pixels are averaged into a single pixel, thereby representing a change of scale of fractal1_area. This calculation needs an additional mask transformation: $\Omega_{i_2,j_2}$ represents the original mask Ω with 4 pixels mapped into one pixel, where any square of 4 pixels not completely consisting of object pixels is set to zero. $\Omega_{i,j}$ represents $\Omega_{i_2,j_2}$ expanded by 4, so that each pixel in $\Omega_{i_2,j_2}$ is 4 pixels in $\Omega_{i,j}$.




VII.1 fractal1_area

$$\text{fractal1\_area} = \sum_{i=2}^{L} \sum_{j=2}^{M} \left( \left| OD^{*}_{i,j} - OD^{*}_{i,j-1} \right| + \left| OD^{*}_{i,j} - OD^{*}_{i-1,j} \right| + 1 \right) \Omega_{i,j} \tag{62}$$

where $OD^{*}_{i,j}$ is the optical density function of the image scaled by a factor common to all images such that the possible optical density values span 256 levels.




VII.2 fractal2_area

This is another fractal dimension, but based on an image in which four-pixel squares are averaged into single pixels, thereby representing a change of scale of fractal1_area in Section VII.1 above.

$$\text{fractal2\_area} = \sum_{i_2=2}^{L_2} \sum_{j_2=2}^{M_2} \left( \left| OD^{*}_{i_2,j_2} - OD^{*}_{i_2,j_2-1} \right| + \left| OD^{*}_{i_2,j_2} - OD^{*}_{i_2-1,j_2} \right| + 1 \right) \Omega_{i_2,j_2} \tag{63}$$

where

$$L_2 = \left\lfloor \frac{L}{2} \right\rfloor, \qquad M_2 = \left\lfloor \frac{M}{2} \right\rfloor,$$

with $L_2$, $M_2$ as integers, and $OD^{*}_{i_2,j_2}$ is a scaled optical density function of the image, with 4 pixels averaged into one.




VII.3 fractal_dimen




The fractal_dimen feature is calculated as the difference between logarithms of fractal1_area and fractal2_area, divided by log 2. This varies from 2 to 3 and gives a measure of the “fractal behavior” of the image, associated with a rate at which measured surface area increases at finer and finer scales.

$$\text{fractal\_dimen} = \frac{\log_{10}(\text{fractal1\_area}) - \log_{10}(\text{fractal2\_area})}{\log_{10} 2} \tag{64}$$
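The three fractal features can be sketched in Python as below. This is only an illustration under stated assumptions: the image is a nested list of optical density values already scaled to the 256-level range, the mask is a nested list of 0/1 values, the image is at least 4 × 4 pixels, and all function names are hypothetical.

```python
import math


def reduce_image(od, mask):
    """Average each 2x2 block into one pixel; a reduced mask pixel is 1
    only if all four source pixels belong to the object."""
    rows2, cols2 = len(od) // 2, len(od[0]) // 2
    od2 = [[0.0] * cols2 for _ in range(rows2)]
    mask2 = [[0] * cols2 for _ in range(rows2)]
    for i2 in range(rows2):
        for j2 in range(cols2):
            block = [(2 * i2 + a, 2 * j2 + b) for a in (0, 1) for b in (0, 1)]
            od2[i2][j2] = sum(od[i][j] for i, j in block) / 4.0
            mask2[i2][j2] = int(all(mask[i][j] for i, j in block))
    return od2, mask2


def expand_mask(mask2, rows, cols):
    """Map every reduced mask pixel back onto its 2x2 block; leftover
    odd rows or columns are treated as background."""
    rows2 = len(mask2)
    cols2 = len(mask2[0]) if mask2 else 0
    return [[mask2[i // 2][j // 2] if i // 2 < rows2 and j // 2 < cols2 else 0
             for j in range(cols)] for i in range(rows)]


def surface_area(od, mask):
    """Bar-graph surface area of Eqs. (62)/(63): unit top area plus the
    side areas toward the left and upper neighbors, inside the mask."""
    area = 0.0
    for i in range(1, len(od)):
        for j in range(1, len(od[0])):
            if mask[i][j]:
                area += abs(od[i][j] - od[i][j - 1]) + abs(od[i][j] - od[i - 1][j]) + 1
    return area


def fractal_features(od, mask):
    """Return (fractal1_area, fractal2_area, fractal_dimen)."""
    od2, mask2 = reduce_image(od, mask)
    mask1 = expand_mask(mask2, len(od), len(od[0]))
    area1 = surface_area(od, mask1)
    area2 = surface_area(od2, mask2)
    dimen = (math.log10(area1) - math.log10(area2)) / math.log10(2)
    return area1, area2, dimen
```

The sketch raises an error if either area is zero, which can happen for very small objects; a production implementation would need to decide how to treat such objects.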













VIII Run Length Texture Features




Run length features describe texture in terms of grey level runs, representing sets of consecutive, collinear pixels having the same grey level value. The length of the run is the number of pixels in the run. These features are calculated over the image with intensity function values transformed into 8 levels.




The run length texture features are defined using grey level length matrices, $R^{\Theta}_{p,q}$, for each of the four principal directions: Θ = 0°, 45°, 90°, 135°, where the directions are defined clockwise with respect to the positive x-axis. Note: As defined here, the run length texture features are not rotationally invariant, and therefore cannot, in general, be used separately, since for most samples there will be no a priori preferred direction for texture. For example, for one cell, a run length feature may be oriented at 45°, but at 90° in the next; in general, these are completely equivalent. Each element of matrix $R^{\Theta}_{p,q}$ specifies the number of times that the object contains a run of length q, in a given direction, Θ, consisting of pixels lying in grey level range, p (out of 8 grey levels). Let $N_g = 8$ be the number of grey levels, and $N_r$ be the number of different run lengths that occur in the object; then the run length features are described as follows:




VIII.1 short0_runs, short45_runs, short90_runs, short135_runs

These give large values for objects in which short runs, oriented at 0°, 45°, 90°, or 135°, dominate.

$$\text{short}\Theta\text{\_runs} = \frac{\sum_{p=1}^{N_g} \sum_{q=1}^{N_r} \dfrac{R^{\Theta}_{p,q}}{q^{2}}}{\sum_{p=1}^{N_g} \sum_{q=1}^{N_r} R^{\Theta}_{p,q}} \tag{65}$$













VIII.2 long0_runs, long45_runs, long90_runs, long135_runs

These give large values for objects in which long runs, oriented at 0°, 45°, 90°, or 135°, dominate.

$$\text{long}\Theta\text{\_runs} = \frac{\sum_{p=1}^{N_g} \sum_{q=1}^{N_r} q^{2}\, R^{\Theta}_{p,q}}{\sum_{p=1}^{N_g} \sum_{q=1}^{N_r} R^{\Theta}_{p,q}} \tag{66}$$













VIII.3 grey0_level, grey45_level, grey90_level, grey135_level

These features estimate grey level nonuniformity, taking on their lowest values when runs are equally distributed throughout the grey levels.

$$\text{grey}\Theta\text{\_level} = \frac{\sum_{p=1}^{N_g} \left( \sum_{q=1}^{N_r} R^{\Theta}_{p,q} \right)^{2}}{\sum_{p=1}^{N_g} \sum_{q=1}^{N_r} R^{\Theta}_{p,q}} \tag{67}$$













VIII.4 run0_length, run45_length, run90_length, run135_length

These features estimate the nonuniformity of the run lengths, taking on their lowest values when the runs are equally distributed throughout the lengths.

$$\text{run}\Theta\text{\_length} = \frac{\sum_{q=1}^{N_r} \left( \sum_{p=1}^{N_g} R^{\Theta}_{p,q} \right)^{2}}{\sum_{p=1}^{N_g} \sum_{q=1}^{N_r} R^{\Theta}_{p,q}} \tag{68}$$













VIII.5 run0_percent, run45_percent, run90_percent, run135_percent

These features are calculated as the ratio of the total number of possible runs to the object's area, having their lowest values for pictures with the most linear structure.

$$\text{run}\Theta\text{\_percent} = \frac{\sum_{p=1}^{N_g} \sum_{q=1}^{N_r} R^{\Theta}_{p,q}}{A} \tag{69}$$













where A is the object's area.
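The run length features of Sections VIII.1 to VIII.5 can be sketched in Python for the 0° direction as follows. The sketch assumes the object image has already been quantized to integer grey levels (for example 0 to 7) and that pixels outside the mask break a run; the run length matrix is stored sparsely as a counter keyed by (grey level, run length). The function names are hypothetical, and the other three directions would be handled by scanning along columns or diagonals in the same way.

```python
from collections import Counter


def run_length_matrix_0deg(levels, mask):
    """Count horizontal runs: runs[(p, q)] is the number of runs of
    grey level p and length q inside the mask."""
    runs = Counter()
    for level_row, mask_row in zip(levels, mask):
        run_level, run_len = None, 0
        for level, inside in zip(level_row, mask_row):
            current = level if inside else None      # background ends any open run
            if current is not None and current == run_level:
                run_len += 1
            else:
                if run_level is not None:
                    runs[(run_level, run_len)] += 1
                run_level = current
                run_len = 1 if current is not None else 0
        if run_level is not None:                     # close the run reaching the row end
            runs[(run_level, run_len)] += 1
    return runs


def run_length_features(runs, area):
    """Evaluate Eqs. (65)-(69) for one direction."""
    total = sum(runs.values())
    per_level, per_length = Counter(), Counter()
    for (p, q), count in runs.items():
        per_level[p] += count
        per_length[q] += count
    return {
        "short_runs": sum(c / q ** 2 for (_, q), c in runs.items()) / total,
        "long_runs": sum(c * q ** 2 for (_, q), c in runs.items()) / total,
        "grey_level": sum(v ** 2 for v in per_level.values()) / total,
        "run_length": sum(v ** 2 for v in per_length.values()) / total,
        "run_percent": total / area,
    }
```

For a two-row image with levels `[[0, 0, 1], [1, 1, 1]]` and a full mask, the matrix holds three runs: one of level 0 and length 2, one of level 1 and length 1, and one of level 1 and length 3.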




VIII.6 texture_orient




This feature estimates the dominant orientation of the object's linear texture.









$$\text{texture\_orient} = \frac{180}{\pi} \left( \frac{\pi}{2} + \arctan\left[ \frac{\lambda'_{1} - y_{\text{pseudo-moment2}}}{xy_{\text{pseudo-cross\_moment2}}} \right] \right) \tag{70}$$

where $\lambda'_{1}$ is the maximal eigenvalue of the run length pseudosecond moment matrix (calculated in analogy to Section II.9). The run length pseudosecond moments are calculated as follows:










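$$x_{\text{pseudo-moment2}} = \sum_{p=1}^{N_g} \sum_{q=1}^{N_r} \left[ R^{0}_{p,q} \sum_{l=1}^{q} \left( l^{2} - l \right) \right] \tag{71}$$

$$y_{\text{pseudo-moment2}} = \sum_{p=1}^{N_g} \sum_{q=1}^{N_r} \left[ R^{90}_{p,q} \sum_{l=1}^{q} \left( l^{2} - l \right) \right] \tag{72}$$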







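$$xy_{\text{pseudo-cross\_moment2}} = \left( \sum_{p=1}^{N_g} \sum_{q=1}^{N_r} \left[ R^{45}_{p,q} \cdot \sum_{l=1}^{q} \left( 2l^{2} - 2l \right) \right] - \sum_{p=1}^{N_g} \sum_{q=1}^{N_r} \left[ R^{135}_{p,q} \cdot \sum_{l=1}^{q} \left( 2l^{2} - 2l \right) \right] \right) \frac{\sqrt{2}}{2} \tag{73}$$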













Orientation is defined as it is for cell_orient, Section II.9, as the angle (measured in a clockwise sense) between the y axis and the dominant orientation of the image's linear structure.
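As a side note for implementation, the inner sums over l in the pseudosecond moments depend only on the run length q and reduce to a closed form. Using the standard sums of the first q integers and of their squares:

$$\sum_{l=1}^{q} \left( l^{2} - l \right) = \frac{q(q+1)(2q+1)}{6} - \frac{q(q+1)}{2} = \frac{(q-1)\,q\,(q+1)}{3}$$

For example, a run of length q = 3 contributes 0 + 2 + 6 = 8 = (2 · 3 · 4)/3, while a run of length 1 contributes nothing. The sums of (2l² − 2l) in the cross moment are simply twice this value.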




VIII.7 size_txt_orient




This feature amplifies the texture orientation for long runs.










$$\text{size\_txt\_orient} = \frac{\lambda'_{1}}{\lambda'_{2}} \tag{74}$$

where $\lambda'_{1}$, $\lambda'_{2}$ are the maximal and minimal eigenvalues of the run_length pseudosecond moment matrix, defined in Section VIII.6.




Each of the above features is calculated for each in-focus object located in the image. Certain features are used by the classifier to separate artifacts from cell nuclei and to distinguish cells exhibiting MACs from normal cells. As indicated above, it is not possible to predict which features will be used to distinguish artifacts from cells, or MAC cells from non-MAC cells, until the classifier has been completely trained and produces a binary decision tree or linear discriminant function.




In the present embodiment of the invention, it has been determined that thirty (30) of the above-described features appear more significant in separating artifacts from genuine nuclei and identifying cells with MACs. These primarily texture features are as follows:















30 preferred nuclear features

 1) Area
 2) mean radius
 3) OD variance
 4) OD skewness
 5) range average
 6) OD maximum
 7) density of light spots
 8) low DNA area
 9) high DNA area
10) low DNA amount
11) high DNA amount
12) high average distance
13) mid/high average distance
14) correlation
15) homogeneity
16) entropy
17) fractal dimension
18) DNA index
19) run 0 percent
20) run 45 percent
21) run 90 percent
22) run 135 percent
23) grey level 0
24) grey level 45
25) grey level 90
26) grey level 135
27) run length 0
28) run length 45
29) run length 90
30) run length 135















Although these features have been found to have the best ability to differentiate between types of cells, other object types may be differentiated by the other features described above.




As indicated above, the ability of the system according to the present invention to distinguish cell nuclei from artifacts, or cells that exhibit MACs from those that do not, depends on the ability of the classifier to make distinctions based on the values of the features computed. To separate cell nuclei from artifacts, the present invention may apply several different discriminant functions, each of which is trained to identify particular types of objects. For example, the following discriminant function has been used in the presently preferred embodiment of the invention to separate intermediate cervical cells from small pyknotic objects:



















| Feature | cervical cells | pyknotic |
|---|---|---|
| max_radius | 4.56914 | 3.92899 |
| freq_low_fft | −0.03624 | −0.04714 |
| harmon03_fft | 1.29958 | 1.80412 |
| harmon04_fft | 0.85959 | 1.20653 |
| lowVSmed_DNA | 58.83394 | 61.84034 |
| energy | 6566.14355 | 6182.17139 |
| correlation | 0.56801 | 0.52911 |
| homogeneity | −920.05017 | −883.31567 |
| cl_shade | −67.37746 | −63.68423 |
| den_drk_spot | 916.69360 | 870.75739 |
| CONSTANT | −292.92908 | −269.42419 |















Another discriminant function that can separate cells from junk particles is:



















| Feature | cells | junk |
|---|---|---|
| eccentricity | 606.67365 | 574.82507 |
| compactness | 988.57196 | 1013.19745 |
| freq_low_fft | −2.57094 | −2.51594 |
| freq_high_fft | −28.93165 | −28.48727 |
| harmon02_fft | −31.30210 | −30.18383 |
| harmon03_fft | 14.40738 | 14.30784 |
| medDNAamnt | 39.28350 | 37.50647 |
| correlation | 0.27381 | 0.29397 |
| CONSTANT | −834.57800 | −836.19659 |















Yet a third discriminant function can separate folded cells, which should be ignored, from cells suitable for analysis:

| Feature | normal interm | rejected objects |
|---|---|---|
| sphericity | 709.66357 | 701.85864 |
| eccentricity | 456.09146 | 444.18469 |
| compactness | 1221.73840 | 1232.27441 |
| elongation | −391.76352 | −387.19376 |
| freq_high_fft | −37.89624 | −37.39510 |
| lowDNAamnt | −41.89951 | −39.42714 |
| low_den_obj | 1.40092 | 1.60374 |
| correlation | 0.26310 | 0.29536 |
| range_average | 0.06601 | 0.06029 |
| CONSTANT | −968.73628 | −971.18219 |















Obviously, the particular linear discriminant function produced by the classifier will depend on the type of classifier used and the training sets of cells. The above examples are given merely for purposes of illustration.
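One plausible way to apply such a table is sketched below in Python. The decision rule is an assumption, since the text does not spell it out: each column is read as a linear classification function (the sum of coefficient times feature value, plus the CONSTANT), and the object is assigned to the class whose function gives the larger score. The coefficients are those listed for the cells/junk function, with feature names written with underscores; the function names are hypothetical.

```python
# Coefficients per feature: (cells, junk), copied from the cells/junk table.
CELLS_VS_JUNK = {
    "eccentricity": (606.67365, 574.82507),
    "compactness": (988.57196, 1013.19745),
    "freq_low_fft": (-2.57094, -2.51594),
    "freq_high_fft": (-28.93165, -28.48727),
    "harmon02_fft": (-31.30210, -30.18383),
    "harmon03_fft": (14.40738, 14.30784),
    "medDNAamnt": (39.28350, 37.50647),
    "correlation": (0.27381, 0.29397),
    "CONSTANT": (-834.57800, -836.19659),
}


def class_scores(features, table):
    """Evaluate one linear function per class (assumed scoring rule)."""
    scores = list(table["CONSTANT"])
    for name, coefficients in table.items():
        if name == "CONSTANT":
            continue
        for k, coefficient in enumerate(coefficients):
            scores[k] += coefficient * features[name]
    return scores


def classify(features, table, labels):
    """Return the label whose score is largest (assumed decision rule)."""
    scores = class_scores(features, table)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]
```

A call such as `classify(feature_values, CELLS_VS_JUNK, ("cells", "junk"))` would then label one object, where `feature_values` maps each feature name in the table to the value computed for that object.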




As can be seen, the present invention is a system that automatically detects malignancy-associated changes in a cell sample. By properly staining and imaging a cell sample, the features of each object found on the slide can be determined and used to provide an indication of whether the patient from whom the cell sample was obtained is normal or abnormal. In addition, MACs provide an indication of whether a cancer treatment is effective, as well as whether a cancer is in remission.
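The slide-level decision can be illustrated with a short Python sketch. It assumes the ratio test recited in the claims: the fraction of identified nuclei classified as exhibiting MACs is compared with a predetermined threshold, for which the claims give 0.45. The function names and the boolean-list input are assumptions made for this example.

```python
def mac_ratio(nucleus_flags):
    """Fraction of identified nuclei flagged as exhibiting MACs."""
    # nucleus_flags holds one boolean per identified nucleus (non-empty)
    return sum(1 for flag in nucleus_flags if flag) / len(nucleus_flags)


def exceeds_mac_threshold(nucleus_flags, threshold=0.45):
    """True when the MAC ratio exceeds the predetermined threshold."""
    return mac_ratio(nucleus_flags) > threshold
```

With 5 of 10 nuclei flagged the ratio is 0.5 and the threshold is exceeded; with 4 of 10 it is 0.4 and it is not.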




While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.



Claims
  • 1. A method of predicting whether a patient will develop cancer, comprising the steps of: obtaining a sample of apparently normal cells from the patient; determining whether the cells in the sample exhibit malignancy associated changes by: (1) staining the nuclei of the cells in the sample; (2) obtaining an image of the cells with a digital microscope and recording the image in a computer system; (3) analyzing the stored image of the cells to identify the nuclei; (4) computing a set of feature values for each nucleus found in the sample and from the feature values determining whether the nucleus exhibits a malignancy associated change; and determining a total number of nuclei in the sample that exhibit malignancy-associated changes and from the number predicting whether the patient will develop cancer, wherein the step of determining a total number of nuclei comprises: determining a ratio of nuclei determined to exhibit malignancy-associated changes to the total identified nuclei; comparing the ratio to a predetermined threshold; and predicting that a patient will develop cancer if the ratio exceeds the predetermined threshold.
  • 2. The method of claim 1, wherein the step of calculating one or more features of the set of feature values further comprises the step of: dilating or contracting the true edge of the object before calculating the features.
  • 3. The method of claim 1, wherein the features used to determine whether the nucleus exhibits a malignancy-associated change are selected from the group comprising: 1) area 2) mean radius 3) OD variance 4) OD skewness 5) range average 6) OD maximum 7) density of light spots 8) low DNA area 9) high DNA area 10) low DNA amount 11) high DNA amount 12) high average distance 13) mid/high average distance 14) correlation 15) homogeneity 16) entropy 17) fractal dimension 18) DNA index 19) run 0 percent 20) run 45 percent 21) run 90 percent 22) run 135 percent 23) grey level 0 24) grey level 45 25) grey level 90 26) grey level 135 27) run length 0 28) run length 45 29) run length 90 30) run length 135.
  • 4. The method of claim 1, wherein the predetermined threshold is 0.45.
  • 5. A method of predicting whether a patient will develop cancer, comprising the steps of: obtaining a sample of apparently normal cells from a patient; determining whether the cells in the sample exhibit malignancy associated changes by: (1) staining the cells in the sample; (2) obtaining an image of the cells with a digital microscope and recording the image in a computer system; (3) analyzing the stored image of the cells to identify the nuclei; and (4) computing a set of feature values for each nucleus found in a subsample of the sample, and from the feature values determining whether the nucleus exhibits a malignancy associated change; determining a composite value based on the nuclei determined to exhibit malignancy-associated changes and the total nuclei in the subsample; comparing the composite value to a predetermined threshold; and predicting whether the patient will develop cancer if the composite value exceeds the predetermined threshold.
  • 6. The method of claim 5, wherein the composite value is a ratio of the nuclei determined to exhibit malignancy-associated changes to the total nuclei in the subsample.
  • 7. The method according to claim 6, wherein the predetermined threshold is 0.45.
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of application Ser. No. 08/644,893, filed May 10, 1996, now U.S. Pat. No. 5,889,881, which is a continuation-in-part of application Ser. No. 08/425,257, filed Apr. 17, 1995, now abandoned, which is a continuation of Ser. No. 08/182,453, filed Jan. 10, 1994, now abandoned, which is a continuation-in-part of Ser. No. 07/961,596, filed Oct. 14, 1992, now abandoned, the disclosures of which are expressly incorporated herein by reference. The benefit of the priority of the filing dates of the above-identified applications is hereby claimed under 35 U.S.C. §120.

US Referenced Citations (5)
Number Name Date Kind
4453266 Bacus Jun 1984 A
5016283 Bacus et al. May 1991 A
5099521 Kosaka Mar 1992 A
5889881 MacAulay et al. Mar 1999 A
6026174 Palcic et al. Feb 2000 A
Continuations (2)
Number Date Country
Parent 08/644893 May 1996 US
Child 09/277499 US
Parent 08/182453 Jan 1994 US
Child 08/425257 US
Continuation in Parts (2)
Number Date Country
Parent 08/425257 Apr 1995 US
Child 08/644893 US
Parent 07/961596 Oct 1992 US
Child 08/182453 US