This application is the National Stage of International Application No. PCT/CN2019/091761 filed Jun. 18, 2019.
The present invention relates to a method and system for segmenting overlapping cytoplasm in a medical image, and more particularly, to a method and system for segmenting overlapping cytoplasm in a medical image for cervical cancer screening.
Cervical cancer ranks fourth in the mortality rate of malignant tumors in women, and it is also the fourth cancer in incidence. High-quality cervical cancer screening can greatly reduce the incidence and mortality of cervical cancer. Specifically, cervical cancer screening refers to examining the abnormality of each cervical cell sampled from the cervix and placed on a glass slide under a microscope to assess whether there are cervical cancer cells.
For the development of an automatic cervical cancer screening system, segmentation of the overlapping cytoplasm of the cells in the cervical image is one of the key tasks, because in order to examine the abnormality of the cells, the characteristics of the cell level (e.g., the shape and size of the cells, and the area ratio of cytoplasm to nucleus) are clinically important. However, since the intensity (or color) information in an overlapping area is usually confusing and even misleading, the lack of such intensity (or color) information makes this task very challenging.
The traditional method of segmenting overlapping cytoplasm in a medical image is achieved by leveraging the intensity information between the cytoplasm of cells in a clump or combining spatial information. This objective is generally achieved by extending typical segmentation models, such as threshold segmentation, watershed segmentation, and image segmentation. These methods theoretically hypothesize that the intensity information is sufficient to identify an occluded boundary part. However, this hypothesis is flawed. In fact, the intensity information of the overlapping area is usually confusing and even misleading.
To address the problems caused by insufficient intensity information, shape prior-based methods insert additional shape information into the segmentation and thereby achieve good segmentation performance. These methods model the prior shape either by a simple shape assumption (for example, that the cytoplasm has an elliptical or star shape) or by matching shape instances from a finite set of shapes collected in advance. Then, while leveraging the intensity information, the segmentation result is required to be as similar as possible to the modeled prior shape in order to segment the overlapping cytoplasm. This is usually realized by an active contour model or a level set model, where the prior shape is designed as a regularization term in an energy function, and it is assumed that the segmentation function attaining the minimum (or maximum) value of the energy function produces the optimal segmentation result.
Although the prior art has improved the accuracy of segmentation, the existing shape prior-based methods still have three main shortcomings. First, these methods use a finite set of shape hypotheses (for example, assumed shapes or collected shape instances) to model prior shapes. Because such specific shape hypotheses cannot restore the occluded boundary part of the cytoplasm well, the prior shapes modeled by these methods are usually insufficient to identify the occluded boundary part of the cytoplasm. Second, these methods use only the local prior shape (i.e., the prior information is only the shape of a single cytoplasm) to evolve the shape of the cytoplasm, without considering the shape relationship between all the cytoplasm and the clump. As a result, the segmentation results of these methods are usually inconsistent with the clump evidence, and the segmented cytoplasm boundary deviates from the ideal boundary of the clump. Third, although these methods require the final shape to be as similar as possible to the modeled prior shape, they do not impose shape constraints on the final shape. In fact, these methods try to find a suitable compromise between the intensity evidence and the local prior shape by balancing parameters. Therefore, when the intensity evidence contradicts the local prior shape, these methods produce implausible segmentation results.
In addition, for the establishment of an infinite shape hypothesis set, the prior art (e.g., T. F. Cootes, C. J. Taylor, D. H. Cooper et al., “Active shape models-their training and application”, Computer Vision and Image Understanding, 61(1): 38-59, 1995) discloses relevant content. Although this method can obtain an infinite shape hypothesis set by selecting infinitely many values, its main disadvantage is that it is difficult to collect a set of shape instances that ensures invisible cytoplasm shapes are well restored by the established shape hypotheses. In the prior art, the establishment of the shape hypothesis set depends on how well the shape instances are collected. Usually, representative shape instances have to be selected manually. This is a trial-and-error approach, and it is also labor-intensive. It is not feasible when an infinite number of instances would need to be collected to approximate complex shapes.
Therefore, a technical problem to be solved by the present invention is to provide a method and system that can more accurately and effectively segment overlapping cytoplasm in a medical image based on the prior shape.
In an embodiment, the present invention relates to a method for segmenting overlapping cytoplasm in a medical image, including: establishing a cytoplasm shape hypothesis set; and selecting a shape hypothesis for each cytoplasm from the established cytoplasm shape hypothesis set to perform constrained multi-shape evolution, thereby segmenting overlapping cytoplasm in the medical image, wherein the constrained multi-shape evolution includes: segmenting a clump area composed of a plurality of overlapping cytoplasm to provide clump evidence; performing shape alignment to assess quality of the selected shape hypotheses; and performing shape evolution to determine a better shape hypothesis for each cytoplasm.
The method for segmenting overlapping cytoplasm in a medical image further includes a step of learning the importance of shape instances for updating cytoplasm shape instances for the established cytoplasm shape hypothesis set.
Preferably, the shape alignment and the shape evolution are performed iteratively, an output of the shape alignment is used as an input of the shape evolution, and an output of the shape evolution is used as an input of the shape alignment. The shape alignment assesses whether to start a new shape evolution or not, and once it is determined that no new shape evolution is required, a current shape hypothesis after the shape alignment is regarded as a segmentation result of the overlapping cytoplasm.
Preferably, the cytoplasm shape hypothesis set is established with a formula as follows:
$$\mathcal{S}=\{\,s_i : s_i=\mu+Mx_i\,\}$$
where $\mu$ represents an average shape of collected shape instances, $Mx_i$ represents a linear combination of eigenvectors of a covariance matrix of the collected shapes, wherein each column of the matrix $M$ represents an eigenvector, $x_i$ represents a weight vector of the linear combination, and $s_i$ represents a shape hypothesis marked as $i$.
Preferably, the shape alignment includes filling the shape hypotheses selected from the cytoplasm shape hypothesis set to obtain a binary image, and obtaining a rotation angle and a scaling size required for aligning the binary image with the corresponding cytoplasm with a formula as follows:
$$\arg\max\,(B_i\cap B_c),\quad \text{s.t. } B_i\subset B_c$$
where $B_c$ represents an image of the segmented clump area; $B_i$ represents an alignment result image, $B_i$ is a binary image with the same size as $B_c$, and $B_i$ should be inside $B_c$.
Preferably, the shape evolution includes: setting an objective function $E$, the objective function being as follows:
$$E(x^k)=\sum_{(x,y)}\bigl(B_g(x,y)-B_c(x,y)\bigr)^2$$
where $x$ represents the set $\{x_i\}_{i=1}^{N}$ and $x^k$ represents $x$ in the k-th evolution; $N$ represents the quantity of cytoplasm in the clump area; $B_g=\bigcup_{i=1}^{N}B_i$ represents a binary image generated by the alignment result $\{B_i\}_{i=1}^{N}$; $(x, y)$ represents coordinates of a pixel in the image; and determining $x^{k+1}$ that causes $E$ to have a lower value than $x^k$.
Preferably, the determining of $x^{k+1}$ that causes $E$ to have a lower value than $x^k$ comprises: for any $p$, obtaining the following formula according to Taylor's theorem:
$$E(x^k+p)=E(x^k)+\nabla E(x^k)^{T}p+\tfrac{1}{2}\,p^{T}\nabla^{2}E(x^k+\gamma p)\,p$$
where $\nabla$ and $\nabla^{2}$ respectively represent a gradient and a Hessian matrix calculation; $\gamma$ represents a scalar in the interval (0, 1), and then the minimum value of $E$ over the region centered at $x^k$ with radius $\|p\|_2$ is obtained; returning to $x^k$ to approximate $E$ as follows:
$$m_k(p)=E(x^k)+\nabla E(x^k)^{T}p+\tfrac{1}{2}\,p^{T}\nabla^{2}E(x^k)\,p$$
where $m_k$ represents the approximation of $E$ at $x^k$; approximating the minimum value of $E$ by the minimum value of $m_k$, and solving the following formula by a trust region method:
$$p^{*}=\arg\min_{p}\,m_k(p),\quad \text{s.t. } \|p\|_2\le\Delta_k$$
where $\Delta_k$ represents the radius of the trust region; and obtaining an output result of the k-th evolution as follows:
$$x^{k+1}=x^k+p^{*}.$$
Preferably, the step of learning the importance of shape instances includes: randomly selecting a set of shape instances and calculating an average shape of the set of shape instances with a formula as follows:
$$\mu=\frac{1}{W}\sum_{i=1}^{K_s}\omega_i s_i$$
where $K_s$ represents the number of selected shape instances; $\omega_i$ represents the importance of each shape instance $s_i$; $W$ represents the sum of all $\omega_i$; and calculating the covariance matrix according to the obtained $\mu$ as follows:
$$M_c=\frac{1}{W}\sum_{i=1}^{K_s}\omega_i\,(s_i-\mu)(s_i-\mu)^{T}$$
where the first $t$ eigenvectors of the matrix $M_c$ constitute a matrix $M=(e_1\ e_2\ \ldots\ e_t)$, and their corresponding eigenvalues are $\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_t$.
Preferably, if the objective function value of the segmentation result obtained by the constrained multi-shape evolution is greater than a predetermined threshold, the step of learning the importance of shape instances is performed again to update the shape hypothesis set; and the update continues until the value cannot decrease any more or reaches the predetermined threshold.
According to another aspect, the present invention further relates to a system for segmenting overlapping cytoplasm in a medical image, including: a shape hypothesis set module configured to establish a cytoplasm shape hypothesis set; and a multi-shape evolution module configured to select a shape hypothesis for each cytoplasm from the established cytoplasm shape hypothesis set to perform constrained multi-shape evolution, thereby segmenting overlapping cytoplasm in the medical image, wherein the multi-shape evolution module is configured to: segment a clump area composed of a plurality of overlapping cytoplasm to provide clump evidence; perform shape alignment for assessing quality of the selected shape hypotheses; and perform shape evolution for determining a better shape hypothesis for each cytoplasm.
Preferably, the system for segmenting overlapping cytoplasm in a medical image further includes a shape instance importance learning module configured to update cytoplasm shape instances for the established cytoplasm shape hypothesis set.
Preferably, the shape instance importance learning module is configured to: randomly select a set of shape instances and calculate an average shape of the set of shape instances with a formula as follows:
$$\mu=\frac{1}{W}\sum_{i=1}^{K_s}\omega_i s_i$$
where $K_s$ represents the number of selected shape instances; $\omega_i$ represents the importance of each shape instance $s_i$; $W$ represents the sum of all $\omega_i$; and calculate the covariance matrix according to the obtained $\mu$ as follows:
$$M_c=\frac{1}{W}\sum_{i=1}^{K_s}\omega_i\,(s_i-\mu)(s_i-\mu)^{T}$$
where the first $t$ eigenvectors of the matrix $M_c$ constitute a matrix $M=(e_1\ e_2\ \ldots\ e_t)$, and their corresponding eigenvalues are $\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_t$.
Preferably, if the objective function value of the segmentation result obtained by the constrained multi-shape evolution module is greater than a predetermined threshold, the shape instance importance learning module performs the calculation again to update the shape hypothesis set; and the update continues until the value cannot decrease any more or reaches the predetermined threshold.
According to the method and system of the present invention, an infinite shape hypothesis set is used to model prior shapes; meanwhile, local prior shapes and overall prior shapes are combined with intensity information for evolution, and the resulting shape in each evolution is constrained to the shape hypothesis set. Compared with the existing method and system for segmenting overlapping cytoplasm in a medical image, the method and system of the present invention can better identify the occluded boundary part, thereby better segmenting the overlapping cytoplasm and providing more accurate shape characteristics for medical diagnosis. The infinite shape hypothesis set established in the present invention can better describe all possible shapes of the cytoplasm, thereby more efficiently segmenting overlapping cytoplasm of different shapes. The constrained multi-shape evolution algorithm of the present invention combines the local prior shapes and the overall prior shapes with intensity information for evolution by considering the shape relationship between all the cytoplasm and the entire clump, thereby obtaining more information for segmentation. The present invention uses the importance of each shape instance in the calculation of shape statistics, so that invisible shapes can be well approximated by the shape hypotheses in the shape hypothesis set. Incorporating a learning step into the multi-shape evolution step of the present invention allows useful information to be obtained more effectively.
Therefore, compared with the prior art, the method and system of the present invention for segmenting overlapping cytoplasm in a medical image can obtain more accurate results more effectively.
The technical solution of the present invention can be better understood through the drawings and the following description, in which:
According to the present invention, all overlapping cytoplasm in a clump is segmented by evolving the cytoplasm shapes under the guidance of the modeled local prior shape and overall prior shape while simultaneously evolving the mutual shape constraints of the cytoplasm, so that the shape prior-based method of the present invention for segmenting overlapping cytoplasm in a medical image can accurately and efficiently obtain cytoplasm segmentation results, thereby improving the accuracy and efficiency of cervical cancer screening. Specifically, by using statistical shape information to model local prior shapes, a shape hypothesis set containing infinitely many cytoplasm shape hypotheses is established; in the multi-shape evolution step, in addition to considering the local prior shapes, the present invention obtains the overall prior shape by evolving the shape of the cytoplasm and then using an algorithm to make the segmentation result consistent with the clump evidence. In addition, in the multi-shape evolution step, the final shape obtained in the evolution process is required to be within the shape hypothesis set, thereby reducing the implausible segmentation results that occur in the prior art. Moreover, in order that the established shape hypotheses can better restore any invisible cytoplasm shape, the present invention also adds a step of learning the importance of shape instances in the shape statistics calculation.
The present invention adopts the following new algorithms and steps to implement a new method for segmenting a clump with overlapping cytoplasm in a medical image.
The following content provides a detailed description of the above-mentioned three steps: step 201 of establishing a shape hypothesis set, step 202 of constrained multi-shape evolution, and step 203 of learning the importance of shape instances.
Establishing a Shape Hypothesis Set
First, the shape of the cytoplasm of the cell is parameterized. Boundary points stored in a vector s are used to describe the shape of the cytoplasm of each cell. The k-th element of s stores the distance of the boundary point whose angle equals k in a polar coordinate system, and the origin of the polar coordinate system is located at the center of mass of the cell nucleus. It should be noted that each cell is composed of a cytoplasm and a nucleus. In the present invention, the center of mass of the cell nucleus rather than the center of mass of the cytoplasm is selected as the origin of the polar coordinate system for reasons of feasibility, because it is easier to detect the center of mass of the cell nucleus than to detect the center of mass of the cytoplasm when the cytoplasm of cells overlaps (see
In addition, following the existing method of establishing an infinite shape hypothesis set in the prior art, the shape hypothesis set is established by using the statistical shape information of the cytoplasm. In the present invention, the shape hypothesis set is expressed as follows:
$$\mathcal{S}=\{\,s_i : s_i=\mu+Mx_i\,\} \qquad (1)$$
where $\mu$ represents an average shape of collected shape instances, $Mx_i$ represents a linear combination of eigenvectors of a covariance matrix of the collected shapes (wherein each column of the matrix $M$ represents an eigenvector and $x_i$ represents a weight vector of the linear combination), and $s_i$ represents a shape hypothesis marked as $i$.
By substituting xi of different values into formula (1), different shape hypotheses si can be obtained. Since an infinite number of xi can be selected, an infinite shape hypothesis set can be established. However, relying on the shape hypothesis set established by formula (1), it is difficult to collect shape instances that can well restore invisible cytoplasm shapes. The present invention overcomes this shortcoming by implementing the following step of learning the importance of shape instances in the calculation of shape statistics.
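The way formula (1) generates different shape hypotheses from different weight vectors can be sketched as follows. This is only an illustration: it assumes NumPy is available, the function name `shape_hypothesis` is hypothetical, and the toy mean shape and the two cosine and sine columns merely stand in for an average shape and eigenvectors that would normally be computed from collected shape instances.

```python
import numpy as np

def shape_hypothesis(mu, M, x):
    """Return s = mu + M @ x, one member of the hypothesis set of formula (1)."""
    mu = np.asarray(mu, dtype=float)
    M = np.asarray(M, dtype=float)
    x = np.asarray(x, dtype=float)
    if M.shape != (mu.shape[0], x.shape[0]):
        raise ValueError("M needs one row per boundary point and one column per weight")
    return mu + M @ x

# Toy data (assumed, not from the source): 8 boundary radii and 2 basis columns.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
mu = np.full(8, 10.0)                                   # a circle of radius 10
M = np.column_stack([np.cos(angles), np.sin(angles)])   # stand-ins for eigenvectors

s_mean = shape_hypothesis(mu, M, [0.0, 0.0])    # x = 0 reproduces the average shape
s_other = shape_hypothesis(mu, M, [2.0, -1.0])  # another x gives another hypothesis
```

Because x may be any real-valued vector, the set of reachable shapes is not limited to the collected instances, which is the sense in which the hypothesis set is infinite.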
Constrained Multi-Shape Evolution
In the clump area segmentation (step 301), the present invention uses a multi-size convolutional neural network (CNN) to segment cytoplasm and nucleus areas (see
In the shape alignment (step 302), since the shape hypothesis si is only a vector storing boundary point information, the corresponding binary image of si is obtained by filling the area inside the contour described by si. Pixels inside the contour are marked as 1, and pixels outside the contour are marked as 0. As described above, si is taken from the output of the shape evolution step, and the average shape of the instances collected for the shape hypothesis set is used as the initial si of each cytoplasm. In addition, since the present invention can circumvent non-rigid transformation through the evolution of shape hypotheses described below, the present invention limits the shape alignment to rigid alignment.
Specifically, for each si, the center of mass of the area filled from si is first aligned with the center of mass of the cell nucleus in the image. Then, a scaling factor $r_i$ and a rotation angle $\theta_i$ for alignment are obtained with a formula as follows:
$$\arg\max_{r_i,\,\theta_i}\,(B_i\cap B_c),\quad \text{s.t. } B_i\subset B_c \qquad (2)$$
where: $B_c$ represents an image of the segmented clump area; $B_i$ represents the alignment result, which is obtained by rotating the area filled from si by the angle $\theta_i$ and scaling it by the factor $r_i$; $B_i$ is a binary image with the same size as $B_c$; and the values of $r_i$ and $\theta_i$ are determined by grid search.
The alignment result Bi should be inside Bc. If there is no such constraint, the shape hypothesis actually obtained is aligned with the entire clump area, rather than aligned with the cytoplasm itself.
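The filling and grid search described for the shape alignment can be sketched in a few lines. The sketch makes several assumptions that the text does not state: the contour is sampled at evenly spaced angles, a rotation is applied as a cyclic shift of whole angular steps, images are plain nested lists, and the names `fill_shape` and `align` are hypothetical.

```python
import math

def fill_shape(radii, center, size, scale=1.0, shift=0):
    """Rasterise a polar contour into a binary image (1 inside, 0 outside)."""
    n = len(radii)
    cy, cx = center
    height, width = size
    image = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            dy, dx = row - cy, col - cx
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            k = int(round(angle / (2.0 * math.pi) * n)) % n
            # Rotating the contour by `shift` steps moves radius k - shift to angle k.
            if math.hypot(dx, dy) <= scale * radii[(k - shift) % n]:
                image[row][col] = 1
    return image

def align(radii, center, clump, scales, shifts):
    """Grid search for formula (2): maximise B_i ∩ B_c subject to B_i ⊂ B_c."""
    size = (len(clump), len(clump[0]))
    best = None  # (overlap, scale, shift, image)
    for scale in scales:
        for shift in shifts:
            candidate = fill_shape(radii, center, size, scale, shift)
            inside = all(c <= b
                         for c_row, b_row in zip(candidate, clump)
                         for c, b in zip(c_row, b_row))
            if not inside:
                continue  # violates the constraint B_i ⊂ B_c
            overlap = sum(map(sum, candidate))  # equals |B_i ∩ B_c| once B_i ⊂ B_c
            if best is None or overlap > best[0]:
                best = (overlap, scale, shift, candidate)
    return best

# Toy clump (assumed): a disk of radius 6 in a 21 x 21 image.
clump = fill_shape([6.0] * 16, (10, 10), (21, 21))
best = align([1.0] * 16, (10, 10), clump, scales=[2.0, 4.0, 6.0, 8.0], shifts=range(16))
```

In this toy case the scale 8 candidate is rejected by the containment constraint, so the search settles on the largest shape that still lies inside the clump.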
In the shape evolution (step 303), for the alignment result Bi, the shape evolution algorithm of the present invention can find a more suitable cytoplasm shape hypothesis than si. First, an objective function $E$ as shown in the following formula (3) is defined:
$$E(x^k)=\sum_{(x,y)}\bigl(B_g(x,y)-B_c(x,y)\bigr)^2 \qquad (3)$$
where $x$ represents the set $\{x_i\}_{i=1}^{N}$ and $x^k$ represents $x$ in the k-th evolution (as described in formula (1), $s_i$ is determined by $x_i$); $N$ represents the quantity of cytoplasm in the clump; $B_g=\bigcup_{i=1}^{N}B_i$ represents a binary image generated by the alignment result $\{B_i\}_{i=1}^{N}$; and $(x, y)$ represents coordinates of a pixel in the image.
It can be seen that the objective function of formula (3) measures the pixel-wise difference between the segmented clump area and the clump area composed of the alignment results. In an ideal state, if all the cytoplasm is segmented very accurately, $E$ is equal to 0. As mentioned above, this ideal state is difficult to achieve. The objective function is designed in this way mainly to make full use of the boundary information of the clump while minimizing the influence of insufficient intensity information in the overlapping area.
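The pixel-wise comparison behind formula (3) can be illustrated with a short sketch. It assumes that the objective counts the pixels on which the union Bg of the aligned images disagrees with the clump image Bc; for binary images a sum of squared differences and a count of differing pixels give the same value. The function names are hypothetical and the images are plain nested lists.

```python
def union(masks):
    """Pixel-wise union B_g of the aligned binary images B_1..B_N."""
    height, width = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(width)]
            for r in range(height)]

def clump_mismatch(masks, clump):
    """Sum over pixels of (B_g - B_c)^2, i.e. the number of disagreeing pixels."""
    merged = union(masks)
    return sum((g - b) ** 2
               for g_row, b_row in zip(merged, clump)
               for g, b in zip(g_row, b_row))

# Toy 1 x 6 clump with two overlapping cytoplasm masks (assumed data).
clump = [[1, 1, 1, 1, 1, 0]]
partial = [[[1, 1, 1, 0, 0, 0]], [[0, 0, 1, 1, 0, 0]]]   # leaves one clump pixel uncovered
complete = [[[1, 1, 1, 0, 0, 0]], [[0, 0, 1, 1, 1, 0]]]  # covers the clump exactly

mismatch_partial = clump_mismatch(partial, clump)
mismatch_complete = clump_mismatch(complete, clump)
```

Note that the value depends only on the clump boundary and the union of the masks, not on intensities inside the overlapping area, which matches the stated design motivation.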
Therefore, according to the present invention, $x^{k+1}$ that causes the objective function of formula (3), denoted $E$, to have a lower value than $x^k$ is found through the following formulas (4) to (6). For any $p$, the following formula (4) is obtained using Taylor's theorem:
$$E(x^k+p)=E(x^k)+\nabla E(x^k)^{T}p+\tfrac{1}{2}\,p^{T}\nabla^{2}E(x^k+\gamma p)\,p \qquad (4)$$
where $\nabla$ and $\nabla^{2}$ respectively represent a gradient and a Hessian matrix calculation, and $\gamma$ is a certain scalar in the interval (0, 1). The above formula (4) indicates that information about the function value, the first derivative and the second derivative can be used to approximate $E$ near $x^k$, so as to obtain the minimum value of $E$ over the region centered at $x^k$ with radius $\|p\|_2$.
In theory, the minimizer over this region is the optimal $x^{k+1}$ that can be used in the k-th evolution. However, since the value of the scalar $\gamma$ is unknown, it cannot be obtained directly by analysis. Therefore, the process returns to $x^k$ to approximate $E$; the approximation of $E$ at $x^k$ is represented by $m_k$ in the following formula (5):
$$m_k(p)=E(x^k)+\nabla E(x^k)^{T}p+\tfrac{1}{2}\,p^{T}\nabla^{2}E(x^k)\,p \qquad (5)$$
When $\|p\|_2$ is very small, the approximation is very accurate, with an approximation error of $O(\|p\|_2^{3})$, and the minimum value of $E$ is then approximated by the minimum value of $m_k$ as follows:
$$p^{*}=\arg\min_{p}\,m_k(p),\quad \text{s.t. } \|p\|_2\le\Delta_k \qquad (6)$$
where $\Delta_k$ represents the radius of the trust region. The formula (6) can be solved by the existing trust region method (see J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 2006), and finally, the output result of the k-th evolution is shown in formula (7):
$$x^{k+1}=x^k+p^{*} \qquad (7)$$
Once $x^{k+1}$ is obtained, the new shape hypothesis is aligned with the image, and then a new evolution calculation starts, until $E$ cannot decrease any more or reaches a predetermined threshold $E^{*}$. For the cytoplasm i, the final shape hypothesis si after alignment is regarded as the final segmentation result.
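The iteration of formulas (4) to (7) can be illustrated on a toy problem. The sketch below is not the procedure of the invention: it assumes NumPy, replaces the pixel-based objective with a smooth quadratic test function whose gradient and Hessian are known in closed form, and solves the quadratic model only approximately with the Cauchy point, one of the simple subproblem solvers described in the cited Nocedal and Wright text. All names and the radius update constants are illustrative choices.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimiser of the quadratic model along -g inside the ball ||p||_2 <= delta."""
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        return np.zeros_like(g)
    curvature = g @ B @ g
    tau = 1.0 if curvature <= 0.0 else min(1.0, g_norm ** 3 / (delta * curvature))
    return -tau * delta / g_norm * g

def trust_region_minimise(f, grad, hess, x0, delta=1.0, delta_max=10.0,
                          eta=0.1, tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = cauchy_point(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ B @ p)   # m_k(0) - m_k(p), positive here
        actual = f(x) - f(x + p)
        rho = actual / predicted                 # how well the model matched f
        if rho < 0.25:
            delta *= 0.25                        # poor model: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # good model at the border: expand
        if rho > eta:
            x = x + p                            # accept: x^{k+1} = x^k + p*
    return x

# Smooth toy objective (assumed), minimised at (1, -2).
def toy_f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def toy_grad(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

def toy_hess(x):
    return np.array([[2.0, 0.0], [0.0, 20.0]])

x_star = trust_region_minimise(toy_f, toy_grad, toy_hess, [0.0, 0.0])
```

The ratio `rho` compares the actual decrease with the decrease predicted by the quadratic model, which is what allows the radius to adapt when the approximation of formula (5) is accurate only for small steps.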
Learning the Importance of Shape Instances
In order that the shape hypotheses in the shape hypothesis set defined by the above formula (1) can better restore any invisible cytoplasm shape, the present invention also adopts the step 203 of learning the importance of shape instances in the shape statistics calculation. The method of the present invention for learning the importance of shape instances can solve a series of problems caused by manual collection of shape instances in the prior art.
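Specifically, a set of K input-output pairs is first selected randomly, as shown in the following formula (8):
$$\mathcal{D}=\bigl\{B_c^{\,j},\ \{s_i^{\,j}\}_{i=1}^{N_j}\bigr\}_{j=1}^{K} \qquad (8)$$
where $B_c^{\,j}$ represents the image of the segmented clump area in the training image j; $s_i^{\,j}$ represents the shape vector of the cytoplasm i in the image j; and $N_j$ represents the quantity of cytoplasm in the image j. $\mathcal{D}$ is obtained from the above formula (8), and for a small set of shape instances $\{s_i\}_{i=1}^{K_s}$ the average shape is calculated with the following formula (9):
$$\mu=\frac{1}{W}\sum_{i=1}^{K_s}\omega_i s_i \qquad (9)$$
where $K_s$ represents the number of selected shape instances, $\omega_i$ represents the importance of each shape instance $s_i$, and $W$ represents the sum of all $\omega_i$. Then, the covariance matrix is calculated with the following formula (10):
$$M_c=\frac{1}{W}\sum_{i=1}^{K_s}\omega_i\,(s_i-\mu)(s_i-\mu)^{T} \qquad (10)$$
where the first $t$ eigenvectors of the matrix $M_c$ constitute the matrix $M=(e_1\ e_2\ \ldots\ e_t)$, and their corresponding eigenvalues are $\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_t$.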
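The importance-weighted shape statistics can be sketched as follows. The sketch assumes NumPy, a weighted mean normalised by the sum W of the importances, and a covariance normalised by the same W; the exact normalisation is an assumption, although it only rescales the eigenvalues and leaves the eigenvectors unchanged. The function name and the toy data are hypothetical.

```python
import numpy as np

def weighted_shape_statistics(shapes, weights, t):
    """Weighted mean shape, first t eigenvectors of the weighted covariance, eigenvalues."""
    shapes = np.asarray(shapes, dtype=float)     # one shape vector per row
    weights = np.asarray(weights, dtype=float)   # importance of each shape instance
    W = weights.sum()
    mu = (weights[:, None] * shapes).sum(axis=0) / W
    centred = shapes - mu
    Mc = (weights[:, None] * centred).T @ centred / W
    eigenvalues, eigenvectors = np.linalg.eigh(Mc)    # returned in ascending order
    order = np.argsort(eigenvalues)[::-1][:t]         # keep the t largest
    return mu, eigenvectors[:, order], eigenvalues[order]

# Toy data (assumed): three 2-element shape vectors, the third twice as important.
mu_demo, M_demo, lam_demo = weighted_shape_statistics(
    [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]], [1.0, 1.0, 2.0], t=1)
```

Raising the importance of an instance pulls both the mean shape and the leading eigenvectors toward it, which is how the weights can change the hypothesis set without collecting new instances.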
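The step 203 of learning the importance of shape instances interacts with the above-mentioned constrained multi-shape evolution step 202 and step 201 of establishing a shape hypothesis set as follows: if the objective function value obtained by the constrained multi-shape evolution is greater than the predetermined threshold, the importance of the shape instances is learned again and the shape hypothesis set is updated, and this is repeated until the value does not decrease any more.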
This example is based on two typical cervical scraping data sets: Pap stain data set and H&E stain data set. For obtaining the Pap stain data set, reference is made to (Z. Lu, G. Carneiro, A. Bradley et al., “Evaluation of three algorithms for the segmentation of overlapping cervical cells”, IEEE Journal of Biomedical and Health Informatics, 21(2):441-450, 2017). This data set includes 8 publicly available images, and each image has 11 clumps with an average of 3.3 cytoplasm instances. The H&E stain data set is prepared by H&E staining; the data set includes 21 images, and each image has 7 clumps with an average of 6.1 cytoplasm instances.
First, a training set is constructed using three images randomly selected from the Pap stain data set and five images randomly selected from the H&E stain data set, and the remaining images form a test set. The training set has 72 clumps and 324 cytoplasm instances, of which 28 isolated cytoplasm instances are used to initialize a small shape instance set $\{s_i\}_{i=1}^{K_s}$.
In the method of the present invention, two parameters need to be set: the predetermined threshold $E^{*}$ on the objective function value for terminating the multi-shape evolution step, and the number t of eigenvectors used to form the matrix M. Although a smaller $E^{*}$ usually helps to improve the accuracy of the segmentation result, it also leads to a longer calculation time. In the present invention, in order to balance the accuracy of the segmentation result and the calculation time, the predetermined threshold $E^{*}$ is set to approximately between 3% and 7% of the number of pixels in the clump, preferably approximately 5%. As for the value of t, a larger t makes the shape si show more details of the overall shape, but it also consumes more computing resources in the shape evolution process. In the present invention, t is determined on the basis of the existing criterion $\left(\sum_{i=1}^{t}\lambda_i / \sum_{i}\lambda_i\right) > 0.995$ disclosed in the prior art (see T. F. Cootes, C. J. Taylor, D. H. Cooper et al., “Active shape models-their training and application”, Computer Vision and Image Understanding, 61(1):38-59, 1995), and the value of t is set to 20 in this example.
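The two parameter choices can be written down directly. This is a minimal sketch with hypothetical function names: `choose_t` returns the smallest t whose leading eigenvalues account for more than 99.5% of the eigenvalue sum, and `evolution_threshold` expresses the threshold as a fraction of the clump pixel count, with 5% as the preferred value stated in the text. The eigenvalues in the example are made up.

```python
def choose_t(eigenvalues, threshold=0.995):
    """Smallest t such that the t largest eigenvalues exceed `threshold` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for t, value in enumerate(ordered, start=1):
        running += value
        if running / total > threshold:
            return t
    return len(ordered)

def evolution_threshold(clump_pixels, fraction=0.05):
    """Stopping threshold as a fraction of the number of pixels in the clump."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction * clump_pixels

t_demo = choose_t([900, 96, 3, 1])           # 996 / 1000 is the first ratio above 0.995
threshold_demo = evolution_threshold(2000)   # 5% of a 2000-pixel clump
```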
In this embodiment, the results obtained by the method according to the present invention are compared with those of four existing technologies (see Table 1 below). The four existing technologies are the joint level set function method (see Z. Lu, G. Carneiro and A. P. Bradley, “An improved joint optimization of multiple level set functions for the segmentation of overlapping cervical cells”, IEEE Transactions on Image Processing, 24(4):1261-1272, 2015), the multi-cell labeling method (see Y. Song, E. L. Tan, X. Jiang et al., “Accurate cervical cell segmentation from overlapping clumps in pap smear images”, IEEE Transactions on Medical Imaging, 36(1):288-300, 2017), the multi-pass watershed method (see A. Tareef, Y. Song, H. Huang et al., “Multi-pass fast watershed for accurate segmentation of overlapping cervical cells”, IEEE Transactions on Medical Imaging, 2018) and the contour segmentation method (see Y. Song, J. Qin, B. Lei et al., “Automated segmentation of overlapping cytoplasm in cervical smear images via contour fragments”, in Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pages 168-175, AAAI, 2018). In Table 1, LSF, MCL, MPW, and CF are used to represent the segmentation results of the above four existing technologies, wherein LSF, MCL and CF belong to the existing shape prior-based methods, and MPW is a variant of the watershed method.
Table 1 lists the quantitative comparison of the segmentation results obtained by different methods under multiple overlap conditions. The overlap measures the degree of overlap and is calculated as the ratio of the length of the occluded boundary part to the length of the entire boundary of the cytoplasm. From the results in Table 1, it can be seen that the method of the present invention obtains the best segmentation results, and compared with the other methods, the accuracy of the method of the present invention is improved by about 5% on average. Specifically, when the overlap is less than 0.5 (see columns [0, 1/4) and [1/4, 2/4) in the table), the accuracy of the method of the present invention is improved by about 3% on average; when the overlap is greater than 0.5 (see columns [2/4, 3/4) and [3/4, 1) in the table), the accuracy of the method of the present invention is improved by about 8% on average.
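The overlap measure and the four overlap ranges used for the comparison amount to simple arithmetic, sketched here with hypothetical function names and made-up boundary lengths.

```python
def overlap_degree(occluded_length, boundary_length):
    """Ratio of the occluded boundary length to the whole cytoplasm boundary length."""
    if boundary_length <= 0:
        raise ValueError("boundary_length must be positive")
    return occluded_length / boundary_length

def overlap_bin(overlap):
    """Index 0..3 for the ranges [0, 1/4), [1/4, 2/4), [2/4, 3/4), [3/4, 1)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must lie in [0, 1)")
    return int(overlap * 4)

# A cytoplasm whose boundary is 100 pixels long with 30 pixels occluded (assumed numbers).
bin_demo = overlap_bin(overlap_degree(30, 100))
```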
The method and system of the present invention overcome the problem that the cytoplasm cannot be accurately segmented due to lack of intensity information in the overlapping area. Compared with the existing shape prior-based technologies, the method and system of the present invention provide a more accurate method and system for segmenting cytoplasm in an overlapping area by establishing an infinite shape hypothesis set, calculating and evolving local prior shapes and overall prior shapes, and imposing shape constraints on the final result.
The method and system of the present invention are not limited to the detection of cervical cancer, and those skilled in the art can make appropriate improvements so that the method and system of the present invention can be applied to other microscopy images that quantitatively measure cell-level features, such as pathological image measurement.
Although the present invention has been described above with reference to the specific embodiments of the shape prior-based method and system for segmenting overlapping cytoplasm, it is certainly conceivable that a person of ordinary skill in the art can derive many variants. Therefore, variations readily conceivable by those of ordinary skill in the art are considered as part of the present invention. The scope of the present invention is defined in the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2019/091761 | 6/18/2019 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2020/252665 | 12/24/2020 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20190272638 | Mouton | Sep 2019 | A1 |
Entry |
---|
Song et. al., Segmentation of Overlapping Cytoplasm in Cervical Smear Images via Adaptive Shape Priors Extracted From Contour Fragments IEEE Transactions on Medical Imaging 08; Aug. 5, 2019, vol. 38 Issue 12; pp. 2849-2855. |
Tareef et. al., Automatic segmentation of overlapping cervical smear cells based on local distinctive features and guided shape deformation Neurocomputing 221:94-107. |
Number | Date | Country | |
---|---|---|---|
20220237782 A1 | Jul 2022 | US |