The present invention relates to classifying different types of tissue in medical image data using machine learning based image classification, and more particularly to automatic brain tumor diagnosis using machine learning based image classification.
Cancer is a major health problem throughout the world. Early diagnosis of cancer is crucial to the success of cancer treatments. Traditionally, pathologists acquire histopathological images of biopsies sampled from patients, examine the histopathological images under microscopy, and make judgments as to a diagnosis based on their knowledge and experience. Unfortunately, intraoperative fast histopathology is often not sufficiently informative for pathologists to make an accurate diagnosis. Biopsies are often non-diagnostic and yield inconclusive results for various reasons. Such reasons include sampling errors, in which the biopsy may not originate from the most aggressive part of a tumor. Furthermore, the tissue architecture of the tumor can be altered during the specimen preparation. Other disadvantages include the lack of interactivity and a waiting time of about 30-45 minutes for the diagnosis result.
Confocal laser endomicroscopy (CLE) is a medical imaging technique that provides microscopic information of tissue in real time at cellular and subcellular levels. Thus, CLE can be used to perform an optical biopsy, and pathologists are able to access the images directly in the operating room. However, manual judgment as to a diagnosis may be subjective and variable across different pathologists. In addition, due to the large amount of image data acquired, the diagnosis task based on the optical biopsy can be a significant burden for pathologists. A computer-aided method for automated tissue diagnosis is desirable to reduce this burden and to provide quantitative measures to support a pathologist's final diagnosis.
The present invention provides a method and system for automated classification of different types of tissue in medical images using machine learning based image classification. Embodiments of the present invention reconstruct image features of input endomicroscopy images using a learnt discriminative dictionary and classify the tissue in the endomicroscopy images based on the reconstructed image features using a trained classifier. Embodiments of the present invention utilize a dictionary learning algorithm that explicitly learns class-specific sub-dictionaries that minimize the effect of commonality among the sub-dictionaries. Embodiments of the present invention can be used to distinguish between glioblastoma and meningioma and classify brain tumor tissue in confocal laser endomicroscopy (CLE) images as malignant or benign.
In one embodiment of the present invention, local feature descriptors are extracted from an endomicroscopy image. Each of the local feature descriptors is encoded using a learnt discriminative dictionary. The learnt discriminative dictionary includes class-specific sub-dictionaries and penalizes correlation between bases of sub-dictionaries associated with different classes. Tissue in the endomicroscopy image is classified using a trained machine learning based classifier based on coded local feature descriptors resulting from encoding each of the local feature descriptors using a learnt discriminative dictionary.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The present invention relates to automated classification of different types of tissue in medical images using machine learning based image classification. Embodiments of the present invention can be applied to endomicroscopy images of brain tumor tissue for automated brain tumor diagnosis. Embodiments of the present invention are described herein to give a visual understanding of the method for automated classification of tissue in medical images. A digital image is often composed of digital representations of one or more objects (or shapes). The digital representation of an object is often described herein in terms of identifying and manipulating the object. Such manipulations are virtual manipulations accomplished in the memory or other circuitry/hardware of a computer system. Accordingly, it is to be understood that embodiments of the present invention may be performed within a computer system using data stored within the computer system.
An image-based retrieval approach has been proposed to perform endomicroscopic image recognition tasks. In such an approach, classification is performed by querying an image database with a Bag of Words (BoW)-based image representation, and the most similar images in the database are retrieved. However, this approach requires a large amount of storage space, which may be infeasible for large database sizes. Embodiments of the present invention instead encode feature descriptors extracted from endomicroscopy images using learnt task-specific dictionaries.
Embodiments of the present invention utilize an automated machine learning based framework to classify endomicroscopy images into different tissue types. This framework has three stages: (1) offline dictionary learning; (2) offline classifier training; and (3) online image classification. Embodiments of the present invention apply this image classification framework to automated brain tumor diagnosis to distinguish between two types of brain tumors: glioblastoma and meningioma. It is possible to learn an overcomplete dictionary to approximate feature descriptors of a given endomicroscopy image. However, the present inventors have observed that, despite the highly discriminative features contained in the images of different categories of tissue (e.g., glioblastoma and meningioma), these images may also share common patterns that do not contribute to the image recognition task. Another challenge in distinguishing glioblastoma and meningioma is the large intra-class variance and small inter-class variance of the two types of brain tumors.
To address the above-described challenges and improve the performance of the dictionary based classification pipeline, embodiments of the present invention learn a discriminative dictionary using a dictionary learning algorithm that explicitly learns class-specific sub-dictionaries while minimizing the effect of commonality among the sub-dictionaries. The learnt discriminative dictionary can be used with any dictionary-based coding method, such as BoW, sparse coding, and locality-constrained coding. In addition, new coding methods that fully utilize the learnt discriminative dictionary are described herein.
In an advantageous embodiment, automated machine-learning based classification of tissue in endomicroscopy images is performed in three stages of off-line unsupervised codebook (dictionary) learning, off-line supervised classifier training, and on-line image or video classification.
At step 404, local feature descriptors are extracted from the training images. In a possible implementation, local feature points are detected on each training image, and local feature descriptors are extracted at each of the feature points on each training image. Various techniques may be applied for feature extraction. For example, feature descriptors such as Scale Invariant Feature Transform (SIFT), Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), and Gabor features can be extracted at each of a plurality of points in each training image. Each technique may be configured based on the clinical application and other user-desired characteristics of the results. For example, the SIFT feature descriptor is a local feature descriptor that has been used for a large number of purposes in computer vision. It is invariant to translations, rotations, and scaling transformations in the image domain and robust to moderate perspective transformations and illumination variations. The SIFT descriptor has proven very useful in practice for image matching and object recognition under real-world conditions. In one exemplary implementation, dense SIFT descriptors of 20×20 pixel patches computed over a grid with a spacing of 10 pixels are utilized. Such dense image descriptors may be used to capture uniform regions in cellular structures, such as the low-contrast regions in the case of meningioma.
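As an illustration of this dense extraction step, the following is a minimal Python/numpy sketch. A simple gradient-orientation histogram per patch stands in for the SIFT descriptor (computing true SIFT would typically rely on an external library); the 20×20 patch size and 10-pixel grid spacing follow the exemplary implementation above, while the descriptor itself and the function name are illustrative assumptions.

```python
import numpy as np

def dense_descriptors(image, patch=20, stride=10, bins=8):
    """Extract a gradient-orientation histogram (a simple SIFT stand-in)
    from each patch x patch window on a regular grid with the given stride."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientation folded into [0, pi)
    descs = []
    for r in range(0, image.shape[0] - patch + 1, stride):
        for c in range(0, image.shape[1] - patch + 1, stride):
            m = mag[r:r + patch, c:c + patch].ravel()
            a = ang[r:r + patch, c:c + patch].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            n = np.linalg.norm(hist)
            descs.append(hist / n if n > 0 else hist)  # l2-normalize each descriptor
    return np.array(descs)

rng = np.random.default_rng(0)
img = rng.random((100, 100))
descriptors = dense_descriptors(img)  # 9 x 9 grid positions -> shape (81, 8)
```

On a 100×100 image, the 20×20 patches with 10-pixel spacing yield a 9×9 grid, i.e., 81 local descriptors per image.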
In another possible embodiment, rather than using human-designed feature descriptors, machine learning techniques may be used to learn discriminative filters from the training images. These machine learning techniques may use various feature detection techniques including, without limitation, edge detection, corner detection, blob detection, ridge detection, edge direction, change in intensity, motion detection, and shape detection.
Given a training set Y=[y1, . . . , yN]∈ℝM×N, traditional dictionary learning aims to learn a dictionary of bases that best reconstructs the training examples:

minD,X Σi=1N (∥yi−Dxi∥22+λ∥xi∥1)   (1)
where D=[d1, . . . , dK]∈ℝM×K is the dictionary with K bases, xi∈ℝK are the reconstruction coefficients for yi, ∥•∥1 denotes the l1-norm that promotes sparsity of the reconstruction coefficients, and λ is a tuning parameter. Different from K-means clustering, which assigns each training example to the nearest cluster center, Equation (1) learns an overcomplete dictionary D and represents each training example as a sparse linear combination of the bases in the dictionary.
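For illustration, the alternating minimization suggested by Equation (1) can be sketched as follows. This is a minimal, illustrative implementation (ISTA for the sparse codes, a ridge-regularized least-squares update for the dictionary), not the specific solver used by any embodiment; the function names and iteration counts are assumptions.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of the l1-norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def learn_dictionary(Y, K=8, lam=0.1, outer=30, inner=20):
    """Alternating minimization for Equation (1):
    min_{D,X} sum_i ||y_i - D x_i||_2^2 + lam * ||x_i||_1."""
    rng = np.random.default_rng(0)
    M, N = Y.shape
    D = rng.standard_normal((M, K))
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((K, N))
    for _ in range(outer):
        # sparse coding step (ISTA) with D fixed
        S = np.linalg.norm(D, 2) ** 2          # squared spectral norm of D
        for _ in range(inner):
            X = soft(X - D.T @ (D @ X - Y) / S, lam / (2 * S))
        # dictionary step with X fixed: ridge-regularized least squares,
        # then renormalize atoms (rescaling X so D @ X is unchanged)
        D = Y @ X.T @ np.linalg.inv(X @ X.T + 1e-6 * np.eye(K))
        norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
        D /= norms
        X *= norms[:, None]
    return D, X

rng = np.random.default_rng(1)
Y = rng.standard_normal((10, 50))
D, X = learn_dictionary(Y)
rel_err = np.linalg.norm(Y - D @ X) / np.linalg.norm(Y)
```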
To learn a dictionary that is well-suited for supervised classification tasks, class-specific dictionary learning methods have been proposed that learn a sub-dictionary for each class. For example, such a dictionary learning method can be formulated as:

minD,X Σc=1C (∥Yc−DcXc∥22+λ∥Xc∥1)   (2)
where C is the number of classes, Yc=[y1c, . . . , yNcc], Xc=[x1c, . . . , xNcc], and Dc=[d1c, . . . , dKcc] are the training set, reconstruction coefficients, and sub-dictionary for the class c, respectively. However, the sub-dictionaries learned using Equation (2) typically share common (correlated) bases. Thus, the dictionary D may not be sufficiently discriminative for classification tasks, and the sparse representation will be sensitive to variations in features.
According to an advantageous embodiment of the present invention, a discriminative dictionary is learned by learning high-order couplings between the feature representations of images in the form of a set of class-specific sub-dictionaries under elastic net regularization, which is formulated as:

minD,X Σc=1C (∥Yc−DXc∥22+∥Yc−D∈cXc∥22+∥D∉cXc∥22+λ1∥Xc∥1+λ2∥Xc∥22)   (3)
where D∈c=[0, . . . , Dc, . . . , 0] and D∉c=D−D∈c. The term ∥Yc−DXc∥22 minimizes the global reconstruction residual of the training examples using the whole dictionary. The term ∥Yc−D∈cXc∥22 minimizes the reconstruction residual of the training examples of class c using the cth sub-dictionary. Accordingly, the minimization problem of Equation (3) learns dictionary bases D and reconstruction coefficients X that minimize both the residual for reconstructing the training examples of a specific class from all of the dictionary bases and the residual for reconstructing them from only the bases of the sub-dictionary associated with that class, while penalizing the use of bases of sub-dictionaries not associated with that class. The term ∥D∉cXc∥22 penalizes the reconstruction of training examples using sub-dictionaries of different classes. λ1∥Xc∥1+λ2∥Xc∥22 is the elastic net regularizer, where λ1 and λ2 are tuning parameters.
The elastic net regularizer is a weighted sum of the l1-norm and the l2-norm of the reconstruction coefficients. Compared to a pure l1-norm regularizer, the elastic net regularizer allows the selection of groups of correlated features even if the group is not known in advance. In addition to enforcing grouped selection, the elastic net regularizer is also crucial to the stability of the sparse reconstruction coefficients with respect to the input training examples. The incorporation of the elastic net regularizer to enforce a group sparsity constraint provides the following benefits for class-specific dictionary learning. First, the intra-class variations among features can be compressed, since features from the same class tend to be reconstructed by bases within the same group (sub-dictionary). Second, the influence of correlated atoms (bases) from different sub-dictionaries can be minimized, since their coefficients tend to be zero or non-zero simultaneously. Third, possible randomness in the coefficient distribution can be removed, since the coefficients have group-clustered sparse characteristics.
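To make the roles of the three residual terms and the elastic net concrete, the per-class objective of Equation (3) can be evaluated directly, as in the following sketch. The function name and data layout (one matrix of examples and one matrix of codes per class) are illustrative assumptions.

```python
import numpy as np

def objective_eq3(Y_by_class, D_by_class, X_by_class, lam1=0.1, lam2=0.1):
    """Evaluate Equation (3): for each class c, sum the global residual
    ||Yc - D Xc||^2, the own-class residual ||Yc - D_in_c Xc||^2, the
    cross-class penalty ||D_out_c Xc||^2, and the elastic net
    lam1 * ||Xc||_1 + lam2 * ||Xc||_2^2."""
    D = np.hstack(D_by_class)                          # full dictionary
    starts = np.cumsum([0] + [Dc.shape[1] for Dc in D_by_class])
    total = 0.0
    for c, (Yc, Xc) in enumerate(zip(Y_by_class, X_by_class)):
        D_in = np.zeros_like(D)                        # only class-c bases kept
        D_in[:, starts[c]:starts[c + 1]] = D_by_class[c]
        D_out = D - D_in                               # bases of the other classes
        total += (np.sum((Yc - D @ Xc) ** 2)
                  + np.sum((Yc - D_in @ Xc) ** 2)
                  + np.sum((D_out @ Xc) ** 2)
                  + lam1 * np.sum(np.abs(Xc))
                  + lam2 * np.sum(Xc ** 2))
    return total
```

For example, when the class-c examples are reconstructed exactly by class-c bases alone, all three residual terms vanish and only the elastic net terms remain.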
The discriminative dictionary D is learned by optimizing Equation (3). The optimization of Equation (3) can be solved iteratively by optimizing over one of D and X while fixing the other. D and X can be initialized using preset values. With the dictionary D fixed, the coefficient vector xjc (i.e., the coefficient vector of the j-th example in the c-th class) can be calculated by solving the following convex problem:

minxjc ∥ŷjc−D̂cxjc∥22+λ1∥xjc∥1   (4)
where

ŷjc=[yjc;yjc;0; . . . ;0]   (5a)

D̂c=[D;D∈c;D∉c;√λ2I]   (5b)
where I∈ℝK×K is an identity matrix. In an advantageous implementation, the Alternating Direction Method of Multipliers (ADMM) procedure can be used to solve Equation (4). While the dictionary D is fixed, Equation (4) is solved to optimize the coefficient vectors for all training examples in all classes.
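A generic ADMM solver for the l1-regularized least-squares form that Equation (4) takes (after the elastic-net l2 term is absorbed into the stacked dictionary of Equations (5a) and (5b)) can be sketched as follows. The penalty parameter rho, the iteration count, and the function name are illustrative choices, not prescribed by the text.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """ADMM for min_x ||A x - b||_2^2 + lam * ||x||_1, the form Equation (4)
    takes with the stacked vector and dictionary of Equations (5a)-(5b)."""
    K = A.shape[1]
    inv = np.linalg.inv(2 * A.T @ A + rho * np.eye(K))   # factor once, reuse
    Atb = A.T @ b
    x = np.zeros(K)
    z = np.zeros(K)
    u = np.zeros(K)
    for _ in range(iters):
        x = inv @ (2 * Atb + rho * (z - u))              # quadratic subproblem
        w = x + u
        z = np.sign(w) * np.maximum(np.abs(w) - lam / rho, 0.0)  # shrinkage step
        u = u + x - z                                    # dual update
    return z
```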
Next, with the reconstruction coefficients fixed, the bases (atoms) in the dictionary are updated. In an advantageous embodiment, the sub-dictionaries are updated class by class. In other words, while the sub-dictionary Dc is updated, all other sub-dictionaries are fixed. Terms that are independent of the current sub-dictionary can then be omitted from the optimization. Thus, the objective function for updating the sub-dictionary Dc can be expressed as:
Analytical solutions exist for Equation (6). In particular, Equation (6) can be solved using the following analytical solution:
Equation (6) can be solved for each sub-dictionary using the analytical solution in Equation (7) in order to update the dictionary bases of each sub-dictionary. The updating of the coefficients and dictionary bases can be iterated until the dictionary bases and/or reconstruction coefficients converge or until a preset number of iterations is performed. In an exemplary embodiment, a discriminative dictionary having two sub-dictionaries, one associated with a glioblastoma (malignant) class and one associated with a meningioma (benign) class, is learned for reconstructing local feature descriptors extracted from training images in the glioblastoma and meningioma classes.
In a possible embodiment in which an endomicroscopy video stream is received, entropy-based pruning may be used to automatically remove image frames with low image texture information (e.g., low-contrast frames containing little categorical information) that may not be clinically interesting or suitable for image classification. This removal may be used, for example, to address the limited imaging capability of some CLE devices. Image entropy is a quantity that describes the "informativeness" of an image, i.e., the amount of information contained in the image. Low-entropy images have very little contrast and large runs of pixels with the same or similar gray values, while high-entropy images have a great deal of contrast from one pixel to the next. For CLE images of glioblastoma and meningioma, low-entropy images contain many homogeneous image regions, while high-entropy images are characterized by rich image structures. The pruning can be performed using an entropy threshold. This threshold may be set based on the distribution of the image entropy throughout the dataset of training images used for learning the discriminative dictionary and training the machine learning based classifier.
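The entropy-based pruning can be sketched as follows, assuming Shannon entropy of the gray-level histogram in bits; the threshold value used here is arbitrary and would in practice be set from the training-set entropy distribution as described above.

```python
import numpy as np

def image_entropy(image, levels=256):
    """Shannon entropy of the gray-level histogram, in bits."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def prune_frames(frames, threshold):
    """Keep only frames whose entropy exceeds the threshold."""
    return [f for f in frames if image_entropy(f) > threshold]

rng = np.random.default_rng(0)
flat = np.full((64, 64), 128)                  # homogeneous frame: entropy 0
textured = rng.integers(0, 256, (64, 64))      # rich texture: high entropy
kept = prune_frames([flat, textured], threshold=4.0)  # keeps only the textured frame
```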
At step 504, local feature descriptors are extracted from the received endomicroscopy image. In an advantageous embodiment, a respective feature descriptor is extracted at each of a plurality of points on the endomicroscopy image, resulting in a plurality of local feature descriptors extracted from the endomicroscopy image. For example, a feature descriptor such as Scale Invariant Feature Transform (SIFT), Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), or Gabor features can be extracted at each of a plurality of points in the endomicroscopy image. It is also possible for multiple of the above feature descriptors to be extracted at each of the plurality of points of the endomicroscopy image. In an exemplary implementation, the SIFT feature descriptor is extracted at each of a plurality of points of the endomicroscopy image. The SIFT feature descriptor is invariant to translations, rotations, and scaling transformations in the image domain and robust to moderate perspective transformations and illumination variations. In one exemplary implementation, dense SIFT feature descriptors of 20×20 pixel patches computed over a grid with a spacing of 10 pixels are extracted from the endomicroscopy image.
In another possible embodiment, rather than using human-designed feature descriptors, local features can be automatically extracted using filters that are learned from the training images using machine learning techniques. These machine learning techniques may use various feature detection techniques including, without limitation, edge detection, corner detection, blob detection, ridge detection, edge direction, change in intensity, motion detection, and shape detection.
At step 506, the local feature descriptors extracted from the endomicroscopy image are encoded using a learnt discriminative dictionary. In an advantageous embodiment, the discriminative dictionary learnt using the offline training method described above is used.
Various encoding schemes can be used to calculate the reconstruction coefficients x for an input local feature descriptor y using the learnt discriminative dictionary D. For example, the learnt discriminative dictionary can be used in place of a conventional dictionary in existing encoding schemes, such as BoW, sparse coding, or locality-constrained linear coding. Other encoding schemes for calculating the reconstruction coefficients x using the learnt discriminative dictionary D are also described herein. Such feature encoding schemes are applied to each local descriptor extracted from the endomicroscopy image in order to determine the reconstruction coefficients for each local descriptor.
In an exemplary embodiment, reconstruction coefficients x for each local feature descriptor y can be calculated using feature encoding under the elastic-net regularizer. Encoding the local feature descriptor y under the elastic-net regularizer can be formulated as:

minx ∥y−Dx∥22+λ1∥x∥1+λ2∥x∥22   (8)
Equation (8) can be re-written as:

minx ∥ŷ−D̂x∥22+λ1∥x∥1   (9)
where ŷ=[y;0], D̂=[D;√λ2I], and I∈ℝK×K is an identity matrix. The ADMM optimization procedure can then be applied to Equation (9) in order to calculate the reconstruction coefficients x.
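A minimal sketch of this stacking trick follows, using plain ISTA instead of ADMM for brevity (both solve the same l1-regularized problem); the function name and default parameters are illustrative assumptions.

```python
import numpy as np

def encode_elastic_net(y, D, lam1=0.1, lam2=0.1, iters=500):
    """Solve Equation (8) via the stacked form of Equation (9): append
    sqrt(lam2) * I to D and zeros to y, then run ISTA on the resulting
    l1-regularized least-squares problem."""
    K = D.shape[1]
    D_hat = np.vstack([D, np.sqrt(lam2) * np.eye(K)])   # D_hat = [D; sqrt(lam2) I]
    y_hat = np.concatenate([y, np.zeros(K)])            # y_hat = [y; 0]
    L = 2.0 * np.linalg.norm(D_hat, 2) ** 2             # Lipschitz constant of the gradient
    x = np.zeros(K)
    for _ in range(iters):
        v = x - 2.0 * D_hat.T @ (D_hat @ x - y_hat) / L          # gradient step
        x = np.sign(v) * np.maximum(np.abs(v) - lam1 / L, 0.0)   # soft-threshold step
    return x
```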
In another exemplary embodiment, reconstruction coefficients x for each local feature descriptor y can be calculated using feature encoding by nearest centroid. In this embodiment, the local feature descriptor y is encoded by the nearest dictionary basis, i.e., xk=1 if k=argminj∥y−dj∥2 and xk=0 otherwise.
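Encoding by nearest centroid amounts to a one-hot code on the closest dictionary basis. A minimal sketch, assuming Euclidean distance and dictionary bases stored as columns:

```python
import numpy as np

def encode_nearest_basis(y, D):
    """One-hot code selecting the dictionary basis nearest to y
    (feature encoding by nearest centroid)."""
    dists = np.linalg.norm(D - y[:, None], axis=0)  # distance to each column of D
    x = np.zeros(D.shape[1])
    x[np.argmin(dists)] = 1.0
    return x
```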
In another exemplary embodiment, reconstruction coefficients x for each local feature descriptor y can be calculated using feature encoding under a locality-constrained linear regularizer. Encoding the local feature descriptor y under the locality-constrained linear regularizer can be formulated as:
where b is a locality adaptor that gives a different weight to each dictionary basis proportional to its similarity to the input local feature descriptor y, i.e., b=exp(dist(y,D)/σ), where dist(y,D)=[dist(y,d1), . . . , dist(y,dK)]T, and σ is a tuning parameter used for adjusting the decay speed of the locality adaptor.
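Because the locality-constrained linear objective is quadratic in x, it can be solved in closed form as a weighted ridge problem. The following sketch assumes an exponential form for the locality adaptor b, which is one common choice; the exact form, weights, and function name are illustrative assumptions.

```python
import numpy as np

def encode_llc(y, D, lam=0.1, sigma=1.0):
    """Locality-constrained linear encoding as a weighted ridge problem:
    min_x ||y - D x||_2^2 + lam * ||b * x||_2^2, where the locality adaptor b
    grows with the distance from y to each basis, suppressing far-away bases."""
    dist = np.linalg.norm(D - y[:, None], axis=0)   # dist(y, d_k) for each basis
    b = np.exp(dist / sigma)                        # locality adaptor (assumed form)
    A = D.T @ D + lam * np.diag(b ** 2)             # normal equations of the ridge problem
    return np.linalg.solve(A, D.T @ y)
```

In the sketch, a basis close to y receives weight near 1 and can take a large coefficient, while distant bases are heavily penalized and driven toward zero.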
In another exemplary embodiment, reconstruction coefficients x for each local feature descriptor y can be calculated using feature encoding under a locality-constrained sparse regularizer. Encoding the local feature descriptor y under the locality-constrained sparse regularizer can be formulated as:
where b is a locality adaptor that gives a different weight to each dictionary basis proportional to its similarity to the input local feature descriptor y, i.e., where dist(y,D)=[dist(y,d1), . . . , dist(y,dK)]T, and σ is a tuning parameter used for adjusting the decay speed of the locality adaptor.
In another exemplary embodiment, reconstruction coefficients x for each local feature descriptor y can be calculated using feature encoding under a locality-constrained elastic-net regularizer. Encoding the local feature descriptor y under the locality-constrained elastic-net regularizer can be formulated as:
where b is a locality adaptor that gives a different weight to each dictionary basis proportional to its similarity to the input local feature descriptor y, i.e., where dist(y,D)=[dist(y,d1), . . . , dist(y,dK)]T, and σ is a tuning parameter used for adjusting the decay speed of the locality adaptor.
In an advantageous embodiment, the coded local feature descriptors (i.e., the reconstruction coefficients for each of the extracted local feature descriptors) for the endomicroscopy image can be pooled in order to generate an image representation of the endomicroscopy image prior to being input to the trained classifier. One or more feature pooling operations can be applied to summarize the coded local feature descriptors into a final image representation of the endomicroscopy image. For example, pooling techniques such as max-pooling, average-pooling, or a combination thereof may be applied to the coded local feature descriptors. In a possible implementation, a combination of max-pooling and average-pooling operations can be used. For example, each feature map may be partitioned into regularly spaced square patches and a max-pooling operation may be applied, i.e., the maximum response for the feature over each square patch may be determined. The max-pooling operation allows local invariance to translation. Then, the average of the maximum responses may be calculated over the square patches, i.e., average-pooling is applied after max-pooling. Finally, the image representation may be formed by aggregating the feature responses from the average-pooling operation. Once the pooling is performed, the image representation generated by pooling the coded local feature descriptors for the endomicroscopy image is input to the trained classifier, and the trained classifier classifies the tissue in the endomicroscopy image based on the input image representation.
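The max-then-average pooling described above can be sketched as follows. Grouping consecutive descriptors stands in for the spatial square patches, purely for illustration; the function name and group size are assumptions.

```python
import numpy as np

def pool_codes(codes, group=4):
    """Summarize coded local descriptors (one per row) into a single image
    representation: max-pool within groups of descriptors, then average
    the group maxima (max-pooling followed by average-pooling)."""
    codes = np.asarray(codes)
    n = (len(codes) // group) * group               # drop any incomplete group
    grouped = codes[:n].reshape(-1, group, codes.shape[1])
    maxima = grouped.max(axis=1)                    # max response per group
    return maxima.mean(axis=0)                      # average of the maxima

rng = np.random.default_rng(0)
codes = rng.random((16, 5))                         # 16 coded descriptors, 5-d codes
representation = pool_codes(codes)                  # one 5-d image representation
```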
In an advantageous embodiment, the trained classifier classifies the tissue in the endomicroscopy image of a brain tumor as glioblastoma (malignant) or meningioma (benign). Further, in addition to classifying the tissue into one of a plurality of tissue classifications (e.g., glioblastoma or meningioma), the trained classifier may also calculate a classification score, which is a probability or confidence score regarding the classification result.
The above-described methods for learning a discriminative dictionary, training a machine learning based classifier, and automated classification of tissue in endomicroscopy images may be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high-level block diagram of such a computer is illustrated in the accompanying drawings.
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
This application claims the benefit of U.S. Provisional Application No. 62/139,016, filed Mar. 27, 2015, the disclosure of which is herein incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US16/23929 | 3/24/2016 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62139016 | Mar 2015 | US |