Embodiments described herein generally relate to generating pseudo lesion masks from bounding box annotations to aid training of deep learning segmentation models.
Supervised training of deep learning models for segmentation requires ground truth segmentation masks. However, annotating segmentation masks is often very costly, especially in the healthcare domain (for example, lesion segmentation in the context of medical imaging). In particular, annotating lesion masks in two-dimensional (2-D) or three-dimensional (3-D) medical images is time consuming, as the annotator needs to draw the contours of every lesion present in each image of a given study. Often, studies may contain multiple lesions that extend across multiple slices (where the study is in 3-D). Additionally, annotating a lesion mask generally requires an expert (for example, a radiologist). Finally, there is variability between annotators in determining the true boundaries of a lesion, which may impact the overall performance of a deep learning model trained via supervised learning from a set of lesion masks generated by multiple annotators.
To solve these and other problems, embodiments described herein provide methods and systems for generating pseudo lesion masks from bounding box annotations to aid training of deep learning segmentation models. In particular, embodiments described herein allow for the training of a lesion segmentation model without the need to annotate all cases (in a training dataset) with lesion masks. Rather, embodiments described herein use bounding boxes (for example, in two or three dimensions). By using bounding boxes rather than annotated lesion masks, the efficiency of annotating training data is increased.
For example, one embodiment provides a system for generating pseudo lesion masks. The system includes an electronic processor configured to receive an annotated medical image, the annotated medical image including a bounding box annotation positioned around at least one lesion of the medical image. The electronic processor is also configured to generate, using a ground truth generator, a pseudo-mask candidate, the pseudo-mask candidate representing a pseudo lesion mask for the at least one lesion of the medical image. The electronic processor is also configured to train a segmentation model using the pseudo-mask candidate as ground truth.
Another embodiment provides a method of generating pseudo lesion masks. The method includes receiving, with an electronic processor, an annotated medical image, the annotated medical image including a bounding box annotation positioned around at least one lesion of the medical image. The method also includes generating, with the electronic processor using a ground truth generator, a pseudo-mask candidate, the pseudo-mask candidate representing a pseudo lesion mask for the at least one lesion of the medical image. The method also includes training, with the electronic processor, a segmentation model using the pseudo-mask candidate as ground truth.
Another embodiment provides a non-transitory, computer-readable medium storing instructions that, when executed by an electronic processor, perform a set of functions. The set of functions includes receiving an annotated medical image, the annotated medical image including a bounding box annotation positioned around at least one lesion of the medical image. The set of functions also includes generating, using a ground truth generator, a pseudo-mask candidate, the pseudo-mask candidate representing a pseudo lesion mask for the at least one lesion of the medical image. The set of functions also includes training a segmentation model using the pseudo-mask candidate as ground truth.
Other aspects of the embodiments described herein will become apparent by consideration of the detailed description and accompanying drawings.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
One or more embodiments are described and illustrated in the following description and accompanying drawings. These embodiments are not limited to the specific details provided herein and may be modified in various ways. Furthermore, other embodiments may exist that are not described herein. Also, the functionality described herein as being performed by one component may be performed by multiple components in a distributed manner. Likewise, functionality performed by multiple components may be consolidated and performed by a single component. Similarly, a component described as performing particular functionality may also perform additional functionality not described herein. For example, a device or structure that is “configured” in a certain way is configured in at least that way but may also be configured in ways that are not listed. Furthermore, some embodiments described herein may include one or more electronic processors configured to perform the described functionality by executing instructions stored in non-transitory, computer-readable medium. Similarly, embodiments described herein may be implemented as non-transitory, computer-readable medium storing instructions executable by one or more electronic processors to perform the described functionality. As used herein, “non-transitory computer-readable medium” comprises all computer-readable media but does not consist of a transitory, propagating signal. Accordingly, non-transitory computer-readable medium may include, for example, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a RAM (Random Access Memory), register memory, a processor cache, or any combination thereof.
In addition, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. For example, the use of “including,” “containing,” “comprising,” “having,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. The terms “connected” and “coupled” are used broadly and encompass both direct and indirect connecting and coupling. Further, “connected” and “coupled” are not restricted to physical or mechanical connections or couplings and can include electrical connections or couplings, whether direct or indirect. In addition, electronic communications and notifications may be performed using wired connections, wireless connections, or a combination thereof and may be transmitted directly or through one or more intermediary devices over various types of networks, communication channels, and connections. Moreover, relational terms such as first and second, top and bottom, and the like may be used herein solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
As described above, supervised training of deep learning models for lesion segmentation requires ground truth segmentation masks. However, annotating lesion masks is very costly. In particular, annotating a lesion mask is time consuming (for example, multiple lesions, multiple slices for a given case). Additionally, annotating a lesion mask generally requires an expert (for example, a radiologist). Finally, there is variability between annotators.
Therefore, to solve these and other issues with existing lesion segmentation technology, embodiments described herein generate pseudo lesion masks from bounding box annotations to aid training of deep learning segmentation models, which increases the efficiency of annotating training data. For example, embodiments described herein provide methods and systems for generating pseudo lesion masks from bounding box annotations such that the training of a lesion segmentation model may be performed without the need to annotate all cases (in a training dataset) with lesion masks. Rather, embodiments described herein use bounding boxes (for example, in two or three dimensions). By using bounding boxes rather than annotated lesion masks, the efficiency of annotating training data is increased.
The server 105, the medical image database 115, and the user device 117 communicate over one or more wired or wireless communication networks 120. Portions of the communication network 120 may be implemented using a wide area network, such as the Internet, a local area network, such as a Bluetooth™ network or Wi-Fi, and combinations or derivatives thereof. Alternatively or in addition, in some embodiments, components of the system 100 communicate directly rather than through the communication network 120. Also, in some embodiments, the components of the system 100 communicate through one or more intermediary devices not illustrated in
The server 105 is a computing device, which may serve as a gateway for the medical image database 115. For example, in some embodiments, the server 105 may be a PACS server. Alternatively, in some embodiments, the server 105 may be a server that communicates with a PACS server to access the medical image database 115. As illustrated in
The electronic processor 200 includes a microprocessor, an application-specific integrated circuit (ASIC), or another suitable electronic device for processing data. The memory 205 includes a non-transitory computer-readable medium, such as read-only memory (“ROM”), random access memory (“RAM”) (for example, dynamic RAM (“DRAM”), synchronous DRAM (“SDRAM”), and the like), electrically erasable programmable read-only memory (“EEPROM”), flash memory, a hard disk, a secure digital (“SD”) card, another suitable memory device, or a combination thereof. The electronic processor 200 is configured to access and execute computer-readable instructions (“software”) stored in the memory 205. The software may include firmware, one or more applications, program data, filters, rules, one or more program modules, and other executable instructions. For example, the software may include instructions and associated data for performing a set of functions, including the methods described herein.
For example, as illustrated in
Accordingly, the learning engine 220 (as executed by the electronic processor 200) may perform machine learning using training data (for example, using ground truth) to develop a segmentation model that performs lesion segmentation with respect to one or more medical images (for example, the medical images stored in the medical image database 115). In other words, the segmentation model detects and segments one or more lesions included in a medical image. The training data may include, for example, medical images including at least one lesion and associated lesion masks (as ground truth).
Segmentation models generated by the learning engine 220 may be stored in the segmentation model database 225. As illustrated in
As seen in
The communication interface 210 allows the server 105 to communicate with devices external to the server 105. For example, as illustrated in
The user device 117 is also a computing device and may include a desktop computer, a terminal, a workstation, a laptop computer, a tablet computer, a smart watch or other wearable, a smart television or whiteboard, or the like. Although not illustrated, the user device 117 may include similar components as the server 105 (an electronic processor, a memory, and a communication interface). The user device 117 may also include a human-machine interface 140 for interacting with a user. The human-machine interface 140 may include one or more input devices, one or more output devices, or a combination thereof. Accordingly, in some embodiments, the human-machine interface 140 allows a user to interact with (for example, provide input to and receive output from) the user device 117. For example, the human-machine interface 140 may include a keyboard, a cursor-control device (for example, a mouse), a touch screen, a scroll ball, a mechanical button, a display device (for example, a liquid crystal display (“LCD”)), a printer, a speaker, a microphone, or a combination thereof. As illustrated in
Additionally, in some embodiments, to communicate with the server 105, the user device 117 may store a browser application or a dedicated software application executable by an electronic processor. The system 100 is described herein as providing a lesion segmentation and lesion mask generation service through the server 105. However, in other embodiments, the functionality described herein as being performed by the server 105 may be locally performed by the user device 117. For example, in some embodiments, the user device 117 may store the learning engine 220, the segmentation model database 225, the ground truth generator 230, or a combination thereof.
The medical image database 115 stores a plurality of medical images 165. In some embodiments, the medical image database 115 is combined with the server 105. Alternatively or in addition, the medical images 165 may be stored within a plurality of databases, such as within a cloud service. Although not illustrated in
The medical images 165 stored in the medical image database 115 may include a variety of classifications or types. For example, the medical images 165 may include anatomical images, such as a lateral chest radiograph, a PA chest radiograph, and the like. In some embodiments, a memory of the medical image database 115 stores the medical images 165 and associated data (for example, reports, metadata, and the like). For example, the medical image database 115 may include a picture archiving and communication system (“PACS”), a radiology information system (“RIS”), an electronic medical record (“EMR”), a hospital information system (“HIS”), an image study ordering system, and the like.
A user may use the user device 117 to access and view the medical images 165 and interact with the medical images 165. For example, the user may access the medical images 165 from the medical image database 115 (through a browser application or a dedicated application stored on the user device 117 that communicates with the server 105) and view the medical images 165 on the display device 160 associated with the user device 117. Alternatively or in addition, the user may access the medical images 165 from the medical image database 115 and annotate the medical images 165 (via the human machine interface 140 of the user device 117). As one example, the user may annotate a medical image 165 by adding a bounding box around a lesion included in the medical image 165.
As noted above, annotating lesion masks in medical images 165 is time consuming (for example, multiple lesions, where lesion masks need to be drawn on each slice where the lesion is present in a given case, and the like) and generally requires an expert (for example, a radiologist). To solve these and other problems, the system 100 is configured to automatically generate pseudo-mask candidates (for example, pseudo lesion masks) from bounding box annotations to aid training of deep learning segmentation models (for example, the models stored in the segmentation model database 225). The methods and systems described herein train (or re-train) the segmentation model(s) stored in the segmentation model database 225 using the pseudo-mask candidates as training data (or ground truth).
For example,
As illustrated in
After receiving the annotated medical image (at block 305), the electronic processor 200 (using the ground truth generator 230) generates a pseudo-mask candidate (at block 310). As noted above, the pseudo-mask candidate may represent a pseudo lesion mask for the lesion included in the annotated medical image. The pseudo-mask candidate may include a two-dimensional lesion mask or a three-dimensional lesion mask. For embodiments where a three-dimensional lesion mask is generated, the bounding box annotation may be a three-dimensional bounding box annotation.
In some embodiments, the electronic processor 200 (i.e., the ground truth generator 230) generates the pseudo-mask candidate by generating a shape. As noted above, the pseudo-mask candidate may include a two-dimensional lesion mask or a three-dimensional lesion mask. Accordingly, the shape may include a two-dimensional shape or a three-dimensional shape, such as, for example, a two-dimensional circle, a two-dimensional ellipse, a three-dimensional sphere, or the like. The electronic processor 200 may position (or fit) the shape within the bounding box of the annotated medical image. The electronic processor 200 may then deform the shape within the bounding box, where the deformed shape represents the pseudo-mask candidate. The electronic processor 200 may deform the shape by, for example, adjusting one or more boundaries (or boundary points) of the shape (i.e., the boundary defining the shape or area of the shape). For example,
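By way of a non-limiting illustration, the shape-based approach above may be sketched as follows. The function name, its parameters, and the use of low-frequency radial noise to deform the ellipse boundary are illustrative assumptions, not requirements of the embodiments:

```python
import numpy as np

def ellipse_pseudo_mask(box_h, box_w, deform_scale=0.15, seed=0):
    """Fit an ellipse inside a (box_h, box_w) bounding box and deform its
    boundary with smooth random radial noise to form a pseudo-mask."""
    rng = np.random.default_rng(seed)
    cy, cx = (box_h - 1) / 2.0, (box_w - 1) / 2.0  # box center
    ry, rx = box_h / 2.0, box_w / 2.0              # ellipse radii

    # Per-angle radial perturbation built from a few low frequencies,
    # so the deformed boundary stays smooth and lesion-like.
    yy, xx = np.mgrid[0:box_h, 0:box_w]
    theta = np.arctan2(yy - cy, xx - cx)
    perturb = np.zeros_like(theta)
    for k in range(1, 4):
        perturb += rng.normal(0, deform_scale / k) * np.cos(k * theta + rng.uniform(0, 2 * np.pi))

    # Normalized radial distance; the deformed interior is where the
    # distance falls below the perturbed unit boundary.
    r = np.sqrt(((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2)
    return (r <= 1.0 + perturb).astype(np.uint8)
```

Smoother or rougher pseudo-masks may be obtained by changing the number of harmonics or the `deform_scale` parameter; the same construction extends to three dimensions with an ellipsoid.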
Alternatively or in addition, in some embodiments, the electronic processor 200 (i.e., the ground truth generator 230) generates the pseudo-mask candidate using an edge detection process. In such embodiments, the electronic processor 200 may execute an edge detection process on the medical image 165 to determine one or more boundaries of the lesion included in the medical image 165. In particular, the electronic processor 200 may estimate rough or estimated lesion boundaries within the bounding box of the medical image 165. After determining the boundaries of the lesion, the electronic processor 200 may then deform at least one of the boundaries of the lesion to generate the pseudo-mask candidate (i.e., the ground truth).
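A minimal sketch of this edge-based variant, assuming a finite-difference gradient as the edge detector and a median threshold as a crude interior estimate (both stand-ins for whatever edge detection process an embodiment uses; the box convention `(y0, x0, y1, x1)` is likewise an assumption):

```python
import numpy as np

def edge_based_pseudo_mask(image, box, grad_thresh=0.2, jitter=1, seed=0):
    """Estimate rough lesion boundaries inside a bounding box via image
    gradients, then jitter the resulting region to form a pseudo-mask."""
    rng = np.random.default_rng(seed)
    y0, x0, y1, x1 = box
    patch = image[y0:y1, x0:x1].astype(float)

    # Finite-difference gradient magnitude as a crude edge detector.
    gy, gx = np.gradient(patch)
    edges = np.sqrt(gy ** 2 + gx ** 2) > grad_thresh * (np.ptp(patch) + 1e-8)

    # Crude interior estimate: pixels brighter than the patch median,
    # excluding the detected edge ring.
    interior = patch > np.median(patch)

    # Deform the estimated boundary by shifting the region a pixel or two.
    dy, dx = rng.integers(-jitter, jitter + 1, size=2)
    deformed = np.roll(interior & ~edges, (dy, dx), axis=(0, 1))

    mask = np.zeros_like(image, dtype=np.uint8)
    mask[y0:y1, x0:x1] = deformed
    return mask
```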
Alternatively or in addition, in some embodiments, the electronic processor 200 (i.e., the ground truth generator 230) generates the pseudo-mask candidate using a pre-existing segmentation model. The pre-existing segmentation model may be based on machine learning and may have been trained using a fully annotated training dataset that is smaller (in terms of number of cases) than the dataset being used to train the segmentation model. In such embodiments, the electronic processor 200 may access the pre-existing segmentation model (for example, a segmentation model stored in the segmentation model database 225). After accessing the pre-existing segmentation model, the electronic processor 200 uses the pre-existing segmentation model to generate an approximate or estimated lesion mask that fits within the bounding box annotation of the medical image 165. The electronic processor 200 may then deform at least one boundary of the approximate or estimated lesion mask to generate a deformed approximate lesion mask, where the deformed approximate lesion mask is used as the pseudo-mask candidate.
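One way this model-based variant might be sketched: `model_predict` is a hypothetical callable standing in for the pre-existing segmentation model, and a one-pixel erosion/dilation stands in for the boundary deformation:

```python
import numpy as np

def pseudo_mask_from_model(image, box, model_predict, seed=0):
    """Use a pre-existing segmentation model to estimate a lesion mask,
    keep only the part inside the bounding box, and deform its boundary.
    `model_predict` is a hypothetical callable returning a probability map
    with the same shape as `image`."""
    rng = np.random.default_rng(seed)
    y0, x0, y1, x1 = box

    prob = model_predict(image)
    approx = (prob > 0.5).astype(np.uint8)

    # Restrict the approximate mask to the annotated bounding box.
    clipped = np.zeros_like(approx)
    clipped[y0:y1, x0:x1] = approx[y0:y1, x0:x1]

    # Deform at least one boundary: randomly erode or dilate by one pixel
    # using a simple 4-neighborhood min/max filter.
    padded = np.pad(clipped, 1)
    neigh = np.stack([padded[1:-1, 1:-1], padded[:-2, 1:-1], padded[2:, 1:-1],
                      padded[1:-1, :-2], padded[1:-1, 2:]])
    deformed = neigh.max(axis=0) if rng.random() < 0.5 else neigh.min(axis=0)

    # Keep the deformed pseudo-mask candidate inside the box.
    out = np.zeros_like(deformed)
    out[y0:y1, x0:x1] = deformed[y0:y1, x0:x1]
    return out
```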
Alternatively or in addition, in some embodiments, the electronic processor 200 (i.e., the ground truth generator 230) generates the pseudo-mask candidate using a collection of previously annotated lesion masks. For example, in some embodiments, the medical images 165 stored in the medical image database 115 (or a portion thereof) are medical images 165 that were previously annotated with lesion masks. In such embodiments, the electronic processor 200 may sample a previously annotated lesion mask from the collection of previously annotated lesion masks. The electronic processor 200 may deform the sampled lesion mask (for example, by altering at least one boundary of the lesion mask). After deforming the sampled lesion mask, the electronic processor 200 may then position (or fit) the deformed sampled lesion mask into the bounding box annotation of the medical image 165 as the pseudo-mask candidate.
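A sketch of this sampling variant, assuming a one-pixel shift as the deformation and nearest-neighbor resampling as the fit into the target bounding box (both illustrative choices):

```python
import numpy as np

def pseudo_mask_from_library(prior_masks, box_h, box_w, seed=0):
    """Sample a previously annotated lesion mask, deform it, and resize it
    (nearest neighbor) to fit a (box_h, box_w) bounding box."""
    rng = np.random.default_rng(seed)
    mask = prior_masks[rng.integers(len(prior_masks))].astype(np.uint8)

    # Deform: a random one-pixel shift alters at least one boundary.
    mask = np.roll(mask, rng.integers(-1, 2, size=2), axis=(0, 1))

    # Nearest-neighbor resize into the bounding box dimensions.
    rows = np.arange(box_h) * mask.shape[0] // box_h
    cols = np.arange(box_w) * mask.shape[1] // box_w
    return mask[np.ix_(rows, cols)]
```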
Alternatively or in addition, in some embodiments, the electronic processor 200 (i.e., the ground truth generator 230) generates the pseudo-mask candidate using a collection of previously annotated lesion masks. For example, as noted above, in some embodiments, the medical images 165 stored in the medical image database 115 (or a portion thereof) are medical images 165 that were previously annotated with lesion masks. In such embodiments, the electronic processor 200 may determine a probability distribution of each lesion mask included in the collection of previously annotated lesion masks. The electronic processor 200 may then generate the pseudo-mask candidate based on the probability distribution.
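One simple realization of the distribution-based variant, under the assumption that the collection of previously annotated lesion masks has been aligned and resampled to a common bounding box size, is an empirical per-pixel frequency map that is then thresholded:

```python
import numpy as np

def pseudo_mask_from_distribution(prior_masks, threshold=0.5):
    """Estimate a per-pixel lesion probability from a collection of
    previously annotated masks (assumed aligned to a common box size),
    then threshold it into a pseudo-mask candidate."""
    stack = np.stack([m.astype(float) for m in prior_masks])
    prob = stack.mean(axis=0)  # empirical per-pixel lesion frequency
    return (prob >= threshold).astype(np.uint8), prob
```

Instead of thresholding, an embodiment could also sample each pixel from the estimated distribution to obtain varied pseudo-mask candidates.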
Alternatively or in addition, in some embodiments, the electronic processor 200 (i.e., the ground truth generator 230) generates the pseudo-mask candidate using a generative adversarial network (GAN). In such embodiments, the electronic processor 200 trains a GAN configured to generate one or more lesion mask shapes (for example, realistic lesion mask shapes). In some embodiments, the GAN generates the lesion mask shapes using an input, such as a bounding box aspect ratio, a medical image (for example, a CT image), noise, or the like. After training the GAN, the electronic processor 200 may generate a lesion mask using the GAN, where the lesion mask is the pseudo-mask candidate.
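The conditioning interface of such a generator might look as follows. This is only a stand-in: a fixed random linear map plays the role of the trained GAN generator, purely to illustrate mapping a noise vector and a bounding-box aspect ratio to a mask; a real embodiment would use a trained deep generator:

```python
import numpy as np

def gan_generator_stub(noise, aspect_ratio, out_h=32, out_w=32, seed=0):
    """Stand-in for a trained GAN generator: maps a noise vector and a
    bounding-box aspect ratio to a binary lesion-shaped mask. The fixed
    random weights below are NOT a trained network; they only illustrate
    the input/output interface described above."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([noise, [aspect_ratio]])  # condition on aspect ratio

    # A fixed random linear "network" followed by a sigmoid and threshold.
    w = rng.normal(0, 1.0 / np.sqrt(z.size), size=(out_h * out_w, z.size))
    logits = (w @ z).reshape(out_h, out_w)
    return (1.0 / (1.0 + np.exp(-logits)) > 0.5).astype(np.uint8)
```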
After generating the pseudo-mask candidate (at block 310), the electronic processor 200 trains a segmentation model using the pseudo-mask candidate (at block 315). In some embodiments, the electronic processor 200 uses the pseudo-mask candidate as ground truth (or training data) for the segmentation model.
In some embodiments, the electronic processor 200 is configured to update (or re-train) the segmentation model (for example, the segmentation model 505). The electronic processor 200 may update (or re-train) the segmentation model by comparing the predicted lesion mask and the pseudo-mask candidate and determining a difference (or error) between them, as seen in
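One common choice for the difference (or error) between the predicted lesion mask and the pseudo-mask candidate is a soft Dice loss; this is an illustrative assumption, as the embodiments do not fix a particular loss function:

```python
import numpy as np

def dice_loss(pred_prob, pseudo_mask, eps=1e-6):
    """Soft Dice loss between the model's predicted lesion probabilities
    and the pseudo-mask candidate used as ground truth. The loss is 0 for
    a perfect match and approaches 1 for disjoint regions."""
    inter = (pred_prob * pseudo_mask).sum()
    denom = pred_prob.sum() + pseudo_mask.sum()
    return 1.0 - (2.0 * inter + eps) / (denom + eps)
```

During re-training, this error would be backpropagated through the segmentation model to update its weights.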
Accordingly, generating a bounding-box annotation generally requires less work than generating a different, more precise annotation of a lesion. For example, a user may be able to quickly add one or more bounding boxes to an image (for example, four points per lesion in two dimensions and eight points per lesion in three dimensions), as compared to marking, with greater precision, the boundaries of each lesion represented within an image. Thus, automatically generating ground truth from two-dimensional or three-dimensional bounding boxes generally allows training data (i.e., ground truth) to be generated more quickly and efficiently than with existing technology. Furthermore, the different ways a mask can be generated from a bounding-box annotation, as described above, allow the complexity and accuracy of the system to be configured and controlled as needed.
Various features and advantages of the embodiments described herein are set forth in the following claims.