The present disclosure relates to a medical imaging technology. More particularly, the disclosure relates to methods and devices for annotating serialized medical images.
Medical imaging technologies, including X-rays, Magnetic Resonance Imaging (MRI), and Computed Tomography (CT), are critical tools in modern diagnostics, providing detailed internal structural images necessary for accurate diagnosis and treatment planning. However, the increasing volume and complexity of imaging data pose significant challenges for healthcare professionals, particularly in the precise identification and annotation of Regions of Interest (RoI).
However, annotating RoIs in a large volume of medical images is highly time-consuming. In some practices, accurate RoI annotation relies on manual operations by radiologists or radiographic technicians. This approach is particularly burdensome for serialized imaging such as CT and MRI, because professionals must annotate each slice obtained from the scan individually. The process not only consumes a significant amount of time but also introduces inconsistencies due to variances among annotators, which can reduce the efficiency of medical image analysis and clinical diagnosis.
The disclosure provides a medical imaging annotation device, which includes an interface, a memory and a processor. The memory is configured to store serialized medical images. The processor is coupled to the interface and the memory. The processor is configured to receive, via the interface, a first manual annotation on a first slice image of the serialized medical images and a second manual annotation on a second slice image of the serialized medical images. The processor is configured to execute a bidirectional inference mechanism to generate final annotation labels on intermediate slice images of the serialized medical images based on the first manual annotation and the second manual annotation.
The disclosure also provides a medical imaging annotation method, which includes steps of: obtaining serialized medical images; receiving a first manual annotation on a first slice image of the serialized medical images and a second manual annotation on a second slice image of the serialized medical images; and executing a bidirectional inference mechanism to generate final annotation labels on intermediate slice images of the serialized medical images based on the first manual annotation and the second manual annotation.
It is to be understood that both the foregoing general description and the following detailed description are provided by way of example, and are intended to provide further explanation of the invention as claimed.
The disclosure can be more fully understood by reading the following detailed description of the embodiments, with reference made to the accompanying drawings as follows:
Reference will now be made in detail to the present embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
Reference is made to
In some embodiments, the serialized medical images SIMG include Magnetic Resonance Imaging (MRI) scan images or Computed Tomography (CT) scan images. As embodiments shown in
The serialized medical images SIMG are not limited to the abdominal MRI/CT scan images shown in
For brevity, the serialized medical images SIMG shown in
Because the serialized medical images SIMG are captured in sequence, adjacent slice images among the serialized medical images SIMG are relevant to each other. When an object (e.g., an organ or a tissue) appears in the slice image SL1 with a specific contour, the object may also appear in the slice image SL2 with a similar contour. Therefore, if a manual annotation PM1 about an object is provided on the slice image SL1, the manual annotation PM1 on the slice image SL1 can be utilized as a hint to segment a similar object in the subsequent slice images SL2, SL3, and so on. Similarly, when another manual annotation PM8 about an object is provided on the slice image SL8, the manual annotation PM8 on the slice image SL8 can be utilized as a hint to segment a similar object in the adjacent slice images SL7, SL6, and so on.
The medical imaging annotation device 100 in the disclosure aims to effectively utilize a limited amount of manual-annotated data to enhance the automatic annotation accuracy of algorithms for accurately identifying unlabeled medical images. As shown in
The interface 120 may include Input-Output components (e.g., a touch panel, a keyboard, a mouse, a microphone, a display). As shown in
As shown in
In some embodiments, the manual annotations are not limited to being made on the first and the last slice images SL1 and SL8. In other embodiments, the manual annotations can be made on other combinations of the slice images, e.g., on the slice images SL2 and SL7, or on the slice images SL3 and SL6.
Reference is further made to
Reference is further made to
In some embodiments, the Track-Anything-Model is a type of computer vision algorithm designed to follow or track an object or label across the sequence of the serialized medical images SIMG. The Track-Anything-Model can be implemented by software instructions involving convolutional neural networks (CNNs), recurrent neural networks (RNNs), or transformers. The Track-Anything-Model is pre-trained on vast datasets to understand and predict the motion and appearance of a wide variety of objects.
As shown in
Reference is further made to
As shown in
As shown in
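Before the merger combines them, two tracking passes could produce the forward prediction annotations (PF2 to PF7) and the backward prediction annotations (PB2 to PB7) discussed below. The following is a minimal sketch of that control flow only; the propagate() function is a hypothetical stand-in for one step of the Track-Anything-Model, not the actual model.

```python
def propagate(prev_image, prev_mask, next_image):
    """Hypothetical stand-in for one tracking step.

    A real Track-Anything-Model step would match features between prev_image
    and next_image and deform prev_mask accordingly; here the mask is simply
    carried over so that only the control flow of the two passes is shown.
    """
    return prev_mask.copy()

def bidirectional_tracking(slices, first_mask, last_mask):
    """Forward pass seeded by the manual annotation on the first slice and
    backward pass seeded by the manual annotation on the last slice."""
    n = len(slices)
    forward = {0: first_mask}
    for i in range(1, n):                    # SL1 -> SL2 -> ... : forward predictions
        forward[i] = propagate(slices[i - 1], forward[i - 1], slices[i])
    backward = {n - 1: last_mask}
    for i in range(n - 2, -1, -1):           # SL8 -> SL7 -> ... : backward predictions
        backward[i] = propagate(slices[i + 1], backward[i + 1], slices[i])
    return forward, backward
```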
The merger 144 is configured to receive and merge the forward prediction annotation PF2 and the backward prediction annotation PB2 regarding the slice image SL2, and generate a final annotation label P2 on the slice image SL2. Similarly, the merger 144 is also configured to receive and merge the forward prediction annotation PF3 and the backward prediction annotation PB3 regarding the slice image SL3, and generate a final annotation label P3 on the slice image SL3. Similarly, the merger 144 is also configured to receive and merge the forward prediction annotation PF7 and the backward prediction annotation PB7 regarding the slice image SL7, and generate a final annotation label P7 on the slice image SL7.
In some embodiments, the final annotation labels P2 to P7 can be determined by the merger 144 according to the distances between a target slice image and the manually labeled slice images.
For example, regarding a Kth slice image of the serialized medical images, a final annotation label on the Kth slice image is produced by calculating a first distance between the slice image SL1 and the Kth slice image and a second distance between the Kth slice image and the slice image SL8, and comparing the two distances. If the first distance is shorter than the second distance, the forward prediction annotation on the Kth slice image is selected by the merger 144 as the final annotation label on the Kth slice image. If the second distance is shorter than the first distance, the backward prediction annotation on the Kth slice image is selected by the merger 144 as the final annotation label on the Kth slice image. K is a positive integer.
For example, regarding the slice image SL2, the first distance between the slice image SL1 and the slice image SL2 is “1” and the second distance between the slice image SL2 and the slice image SL8 is “6”. The first distance is shorter. In other words, the slice image SL2 is closer to the slice image SL1. In this case, the forward prediction annotation PF2 will be selected by the merger 144 as the final annotation label P2 on the slice image SL2.
Regarding the slice image SL3, the first distance between the slice image SL1 and the slice image SL3 is “2” and the second distance between the slice image SL3 and the slice image SL8 is “5”. In this case, the forward prediction annotation PF3 will be selected by the merger 144 as the final annotation label P3 on the slice image SL3.
Regarding the slice image SL7, the first distance between the slice image SL1 and the slice image SL7 is “6” and the second distance between the slice image SL7 and the slice image SL8 is “1”. The second distance is shorter. In other words, the slice image SL7 is closer to the slice image SL8. In this case, the backward prediction annotation PB7 will be selected by the merger 144 as the final annotation label P7 on the slice image SL7.
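A minimal sketch of this selection rule, assuming slice indices are used as the distance measure (the function and parameter names are illustrative, not part of the disclosure):

```python
def select_by_distance(k, first_index, last_index, forward_pred, backward_pred):
    """Return the prediction whose manually annotated anchor slice is closer
    to the K-th slice; the forward prediction wins ties (an arbitrary choice
    made only for this illustration)."""
    first_distance = abs(k - first_index)    # distance to the first annotated slice
    second_distance = abs(last_index - k)    # distance to the last annotated slice
    return forward_pred if first_distance <= second_distance else backward_pred
```

For the eight-slice example above, the slice images SL2 to SL4 would take the forward prediction annotations and the slice images SL5 to SL7 would take the backward prediction annotations.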
In some embodiments, the final annotation labels P2 to P7 are not limited to being selected from the forward prediction annotation and the backward prediction annotation. In other embodiments, the final annotation labels P2 to P7 can be determined by the merger 144 according to a weighted sum of the forward prediction annotation and the backward prediction annotation.
For example, regarding a Kth slice image of the serialized medical images, a final annotation label on the Kth slice image is produced by calculating a first distance between the slice image SL1 and the Kth slice image and a second distance between the Kth slice image and the slice image SL8, and then generating the final annotation label on the Kth slice image based on a weighted sum between a forward prediction annotation and a backward prediction annotation on the Kth slice image according to the first distance and the second distance.
For example, regarding the slice image SL2, the first distance between the slice image SL1 and the slice image SL2 is “1” and the second distance between the slice image SL2 and the slice image SL8 is “6”. The final annotation label P2 on the slice image SL2 can be generated by a weighted sum between the forward prediction annotation PF2 and the backward prediction annotation PB2, as below:
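One plausible form of this weighted sum, assuming each prediction is weighted by its nearness to the corresponding manually annotated slice (an inverse-distance scheme adopted here only for illustration), is:

\[
P_2 = \frac{d_2}{d_1 + d_2}\, PF_2 + \frac{d_1}{d_1 + d_2}\, PB_2 = \frac{6}{7}\, PF_2 + \frac{1}{7}\, PB_2,
\]

where \(d_1 = 1\) and \(d_2 = 6\) are the first distance and the second distance.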
In some embodiments, the forward prediction annotation PF2 includes probability values (e.g., 0% to 100%) on pixels of the slice image SL2. The probability values represent the likelihood that each pixel belongs to the target object or the label. The backward prediction annotation PB2 also includes probability values (e.g., 0% to 100%) on pixels of the slice image SL2. The merger 144 is configured to calculate the weighted sums between the probability values of the forward prediction annotation PF2 and the backward prediction annotation PB2. In some examples, if the weighted sum on one pixel is above 50%, this pixel is included in the final annotation label P2 on the slice image SL2; if the weighted sum on one pixel is below 50%, this pixel is excluded from the final annotation label P2 on the slice image SL2.
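A minimal sketch of this pixel-wise merge, assuming NumPy probability maps scaled to the range 0 to 1 and the inverse-distance weights assumed above (both assumptions are for illustration only):

```python
import numpy as np

def weighted_merge(forward_prob, backward_prob, first_distance, second_distance,
                   threshold=0.5):
    """Blend two per-pixel probability maps and binarize the result.

    The prediction propagated from the nearer manually annotated slice
    receives the larger weight (the assumed inverse-distance scheme).
    """
    total = first_distance + second_distance
    w_forward = second_distance / total   # closer to the first slice -> heavier forward weight
    w_backward = first_distance / total
    blended = w_forward * forward_prob + w_backward * backward_prob
    return blended > threshold            # pixels above 50% join the final annotation label

# Example for the slice image SL2: distances 1 and 6 give weights 6/7 and 1/7.
pf2 = np.random.rand(128, 128)            # stand-in forward probability map PF2
pb2 = np.random.rand(128, 128)            # stand-in backward probability map PB2
p2 = weighted_merge(pf2, pb2, first_distance=1, second_distance=6)
```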
Regarding the slice image SL3, the first distance between the slice image SL1 and the slice image SL3 is “2” and the second distance between the slice image SL3 and the slice image SL8 is “5”. The final annotation label P3 on the slice image SL3 can be generated by a weighted sum between the forward prediction annotation PF3 and the backward prediction annotation PB3, as below:
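Under the same assumed inverse-distance weighting, with \(d_1 = 2\) and \(d_2 = 5\):

\[
P_3 = \frac{5}{7}\, PF_3 + \frac{2}{7}\, PB_3.
\]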
Regarding the slice image SL7, the first distance between the slice image SL1 and the slice image SL7 is “6” and the second distance between the slice image SL7 and the slice image SL8 is “1”. The final annotation label P7 on the slice image SL7 can be generated by a weighted sum between the forward prediction annotation PF7 and the backward prediction annotation PB7, as below:
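Likewise, under the same assumed weighting, with \(d_1 = 6\) and \(d_2 = 1\):

\[
P_7 = \frac{1}{7}\, PF_7 + \frac{6}{7}\, PB_7.
\]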
The bidirectional inference mechanism, which includes the tracking model 142 and the merger 144 shown in
The manual annotations PM1 and PM8 and the final annotation labels P2 to P7 on the serialized medical images SIMG can be utilized as training data to train a medical-related model, such as an organ segmentation model, a medical imaging classification model, or a diagnosis-aiding model.
The bidirectional inference mechanism, which includes the tracking model 142 and the merger 144 shown in
To overcome the potential shape discontinuity issues introduced by Post-Fusion when combining annotation information, the disclosure provides another approach to performing the bidirectional inference mechanism. Reference is further made to
XMem is a model proposed for long-term video object segmentation (VOS), drawing inspiration from the Atkinson-Shiffrin memory model, which consists of short-term and long-term memory systems.
As shown in
In some embodiments, the bidirectional XMem model 146 is configured to generate the final annotation labels on the intermediate slice images based on the manual annotation PM1 and the manual annotation PM8.
In some embodiments, the bidirectional XMem model 146 is able to process a pair of slice images of the serialized medical images SIMG at one time to generate final annotation labels on the pair of slice images. For example, after receiving the serialized medical images SIMG (including the slice image SL1 to SL8), the manual annotation PM1 on the slice image SL1 and the manual annotation PM8 on the slice image SL8, the bidirectional XMem model 146 may generate the final annotation labels P2 and P7 on one pair of the slice images SL2 and SL7 at first.
As shown in
The sensory memory 146a is a short-term memory, which is responsible for handling immediate data from the most recent slices of the serialized medical images SIMG. The sensory memory 146a captures and processes transient information that is crucial for the segmentation of current frames. As shown in
The working memory 146b and the long-term memory 146c retain crucial information from previous frames that may no longer be present in the short-term memory. The working memory 146b and the long-term memory 146c are used to maintain context over the serialized medical images SIMG. The working memory 146b can be thought of as a dynamic, flexible storage space that handles information actively and is used for immediate processing tasks. It deals with current and recent frames, allowing the system to adapt quickly to changes in the input. In contrast to the working memory 146b, the long-term memory 146c stores information deemed important for maintaining consistency and context over the serialized medical images SIMG. The long-term memory 146c retains crucial object features and contextual data that persist over longer timescales.
As shown in
In some embodiments, the forward memory key feature KWF and the forward memory value feature VWF in the working memory 146b are updated according to forward information data INF1. At the beginning, the forward information data INF1 include the slice image SL1 and the manual annotation PM1. On the other hand, the backward memory key feature KWB and the backward memory value feature VWB in the working memory 146b are updated according to backward information data INF2. At the beginning, the backward information data INF2 include the slice image SL8 and the manual annotation PM8.
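For illustration, the state carried across slices could be organized as in the sketch below; the container and field names are hypothetical and only mirror the features named in this description (h1/h8, KWF/VWF, KWB/VWB).

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class BidirectionalMemory:
    """Hypothetical container mirroring the three memory tiers described above."""
    # Sensory (short-term) memory: hidden representations of the most recent pair,
    # e.g., h1 and h8 at the beginning.
    forward_hidden: Optional[Any] = None
    backward_hidden: Optional[Any] = None
    # Working memory: key/value features updated from the forward information
    # data INF1 and the backward information data INF2.
    forward_keys: List[Any] = field(default_factory=list)      # KWF
    forward_values: List[Any] = field(default_factory=list)    # VWF
    backward_keys: List[Any] = field(default_factory=list)     # KWB
    backward_values: List[Any] = field(default_factory=list)   # VWB
    # Long-term memory: consolidated features that persist over the whole series.
    longterm_keys: List[Any] = field(default_factory=list)
    longterm_values: List[Any] = field(default_factory=list)
```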
With the bidirectional mechanism, the data stored in the working memory 146b and the long-term memory 146c reflect information (e.g., INF1 and INF2) from slices both ahead and behind. This bidirectional memory retrieval enhances the contextual understanding of the bidirectional XMem model 146.
As shown in
As shown in
The bidirectional XMem model 146 merges the memories retrieved at different times (mid-term and long-term) to obtain a unified and comprehensive memory representation, such as the memory key Kc. Unlike the original XMem, the bidirectional XMem model 146 ensures a more complete representation of shape and temporal information. It is assumed that the shape information from the early, mid, and late stages of the serialized images can mutually assist, which is particularly useful in medical imaging for tracking objects.
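As one way to picture this retrieval, the readout from the merged memory can be written as an attention-style lookup. The simplified dot-product form below is an assumption for illustration (the original XMem uses an L2-based similarity), and \(V_c\) is a name introduced here for the merged memory values:

\[
F_{RO} = V_c \cdot \mathrm{softmax}\!\left(K_c^{\top} q\right),
\]

where \(q\) is the query derived from the current input slice (e.g., the forward or backward input query discussed below) and \(K_c\) is the merged memory key.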
As shown in
The decoder 146e is configured to generate a forward annotation mask (i.e., the final annotation label P2) about the forward input medical image (i.e., the slice image SL2) and a backward annotation mask (i.e., the final annotation label P7) about the backward input medical image (i.e., the slice image SL7) according to the forward input query QSL2, the backward input query QSL7, the readout feature FRO, the short-term forward hidden representation h1 and the short-term backward hidden representation h8 from the sensory memory 146a.
The decoder 146e is to generate the final segmentation masks from the combined features obtained from the query encoder 146d and memory modules (the sensory memory 146a, the working memory 146b and the long-term memory 146c). The decoder 146e takes in the enriched feature representation that combines current slice features from the query encoder 146d with relevant historical data retrieved from the memory modules. This integration ensures that both current observations and past context contribute to the segmentation outcomes (the final annotation label P2 and the final annotation label P7).
The mask encoder 146f is configured to generate a short-term forward hidden representation h2 according to the forward annotation mask (i.e., the final annotation label P2) and a short-term backward hidden representation h7 according to the backward annotation mask (i.e., the final annotation label P7). The short-term forward hidden representation h2 and the short-term backward hidden representation h7 are updated into the sensory memory 146a for the following segmentation regarding another pair of slice images (to be discussed in
In this case, the final annotation label P2 about the slice image SL2 and the final annotation label P7 about the slice image SL7 can be generated by the bidirectional XMem model 146 with reference to the manual annotation PM1 on the slice image SL1 and the manual annotation PM8 on the slice image SL8. As shown in
Reference is further made to
As shown in
In some embodiments, the forward memory key feature KWF and the forward memory value feature VWF in the working memory 146b are updated according to forward information data INF1. Currently, the forward information data INF1 include the slice image SL1, the manual annotation PM1, the slice image SL2 and the final annotation label P2. On the other hand, the backward memory key feature KWB and the backward memory value feature VWB in the working memory 146b are updated according to backward information data INF2. Currently, the backward information data INF2 include the slice image SL8, the manual annotation PM8, the slice image SL7 and the final annotation label P7.
As shown in
As shown in
As shown in
The decoder 146e is configured to generate a forward annotation mask (i.e., the final annotation label P3) about the forward input medical image (i.e., the slice image SL3) and a backward annotation mask (i.e., the final annotation label P6) about the backward input medical image (i.e., the slice image SL6) according to the forward input query QSL3, the backward input query QSL6, the readout feature FRO, the short-term forward hidden representation h2 and the short-term backward hidden representation h7 from the sensory memory 146a.
The decoder 146e is to generate the final segmentation masks from the combined features obtained from the query encoder 146d and the memory modules (the sensory memory 146a, the working memory 146b and the long-term memory 146c). The decoder 146e takes in the enriched feature representation that combines current slice features from the query encoder 146d with relevant historical data retrieved from the memory modules. This integration ensures that both current observations and past context contribute to the segmentation outcomes (the final annotation label P3 and the final annotation label P6).
The mask encoder 146f is configured to generate a short-term forward hidden representation h3 according to the forward annotation mask (i.e., the final annotation label P3) and a short-term backward hidden representation h6 according to the backward annotation mask (i.e., the final annotation label P6). The short-term forward hidden representation h3 and the short-term backward hidden representation h6 are updated into the sensory memory 146a for the following segmentation regarding another pair of slice images.
In this case, the final annotation label P3 about the slice image SL3 and the final annotation label P6 about the slice image SL6 can be generated by the bidirectional XMem model 146 with reference to the current input data and historical data. As shown in
Similarly, final annotation labels P4 and P5 regarding the other slice images SL4 and SL5 can be generated by the bidirectional XMem model 146.
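A minimal driver for this pairwise, outside-in order might look as follows. The step() callable is a hypothetical stand-in for one pass of the bidirectional XMem model 146, and the surrounding names are illustrative only:

```python
def bidirectional_pairwise_annotation(slices, first_mask, last_mask, step):
    """Walk the series from both ends toward the middle, one pair per pass.

    step(front_image, back_image, memory) stands in for one pass of the
    bidirectional XMem model: it should return the forward and backward
    annotation masks for the given pair and update the memory state
    (sensory, working and long-term) as a side effect.
    """
    n = len(slices)
    labels = {0: first_mask, n - 1: last_mask}   # manual annotations on the end slices
    memory = {}                                   # placeholder for the model state
    front, back = 1, n - 2
    while front <= back:                          # e.g., (SL2, SL7), (SL3, SL6), (SL4, SL5)
        forward_mask, backward_mask = step(slices[front], slices[back], memory)
        labels[front], labels[back] = forward_mask, backward_mask
        front, back = front + 1, back - 1
    return labels
```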
In some embodiments, the query encoder 146d, the decoder 146e and the mask encoder 146f shown in
As shown in the aforementioned embodiments, the medical imaging annotation device 100 can handle various kinds of continuous data, such as serialized images and medical videos, simultaneously. It effectively leverages multiple manual annotations through the bidirectional inference mechanism.
In some embodiments, the medical imaging annotation device 100 can be implemented by a computer, a computing server or a medical image server. The processor 140 can be implemented by a central processing unit, a graphic processing unit, a tensor processing unit or an ASIC.
A medical imaging annotation method executed by the medical imaging annotation device 100 is also an embodiment of this disclosure. The medical imaging annotation method includes steps of: obtaining the serialized medical images SIMG; receiving a first manual annotation (e.g., the manual annotation PM1) on a first slice image (e.g., the slice image SL1) of the serialized medical images SIMG and a second manual annotation (e.g., the manual annotation PM8) on a second slice image (e.g., the slice image SL8) of the serialized medical images SIMG; and executing a bidirectional inference mechanism (e.g., the tracking model 142 and the merger 144 shown in
Although the present invention has been described in considerable detail with reference to certain embodiments thereof, other embodiments are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the embodiments contained herein.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims.
This application claims the priority benefit of U.S. Provisional Application Ser. No. 63/599,565, filed Nov. 16, 2023, which is herein incorporated by reference.