Having annotated data is crucial to the training of machine-learning (ML) models or artificial neural networks. Conventional ways of data annotation rely heavily on manual work (e.g., by qualified annotators such as radiologists if the data includes medical images) and even when computer-based tools are provided, they still require a tremendous amount of human effort (e.g., mouse clicking, drag-and-drop, etc.). This strains resources and often leads to inadequate and/or inaccurate results. Accordingly, it is highly desirable to develop systems and methods to automate the data annotation process such that more data may be obtained for ML training and/or verification.
Disclosed herein are systems, methods, and instrumentalities associated with automatic 3D data (e.g., 3D images) annotation. According to embodiments of the disclosure, an apparatus configured to perform the data annotation task may include at least one processor that may be configured to obtain a first sequence of two-dimensional (2D) images (e.g., based on a first three-dimensional (3D) image dataset) and further obtain a first manual annotation based on a first user input (e.g., obtained through a graphical user interface provided by the processor), where the first manual annotation may be associated with a first image in the first sequence of 2D images and may indicate a location of a person or an object (e.g., an anatomical structure such as an organ) in the first image. The at least one processor may be configured to annotate, automatically, a first subset of images in the first sequence of 2D images based on the first manual annotation and a first machine-learning (ML) model, and to further annotate, automatically, a second subset of images in the first sequence of 2D images based on the first ML model and a second annotation. The second annotation may be an annotation automatically generated for the last image of the first subset of images or an annotation manually generated for a second image of the first sequence of 2D images. In this manner, the at least one processor may perform the automatic annotation task progressively, e.g., by processing one subset or batch of images at a time.
In some embodiments, the at least one processor may be configured to determine that the number of images included in the first subset of images (e.g., the number of images to be automatically annotated based on the first manual annotation in the first batch) is equal to the size of a pre-defined annotation propagation window. In some embodiments, the at least one processor may be configured to determine that the number of images included in the first subset of images (e.g., the number of images to be automatically annotated based on the first manual annotation in the first batch) is equal to the number of images sequentially located between the first image (e.g., corresponding to the first manual annotation) and the second image (e.g., corresponding to a second manual annotation) in the first sequence of 2D images. In other words, the at least one processor may be configured to automatically annotate images in the first sequence of 2D images based on the first manual annotation and the pre-defined annotation propagation window size until the processor encounters another manually annotated image (e.g., within the propagation window).
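By way of a non-limiting illustration, the batching rule described above may be sketched as follows. The function name `plan_batches`, the zero-based image indices, and the choice to propagate only forward from the earliest manual annotation are assumptions made for this sketch and are not prescribed by the disclosure.

```python
from typing import List, Set, Tuple


def plan_batches(num_images: int, manual: Set[int], window: int) -> List[Tuple[int, List[int]]]:
    """Return (reference_index, indices_to_auto_annotate) pairs in forward order.

    A batch ends when it reaches `window` images or when the next image
    already carries a manual annotation, whichever comes first.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not manual:
        raise ValueError("at least one manual annotation is required")

    batches: List[Tuple[int, List[int]]] = []
    ref = min(manual)  # assumption: start from the earliest manual annotation
    i = ref + 1
    while i < num_images:
        if i in manual:
            # A manually annotated image becomes the reference for the next batch.
            ref = i
            i += 1
            continue
        batch: List[int] = []
        while i < num_images and i not in manual and len(batch) < window:
            batch.append(i)
            i += 1
        batches.append((ref, batch))
        # The last automatically annotated image seeds the following batch.
        ref = batch[-1]
    return batches


if __name__ == "__main__":
    for reference, targets in plan_batches(10, {0, 5}, 3):
        print(reference, targets)
```

In this sketch, a manual annotation encountered inside the window cuts the current batch short, so the following batch is referenced to the manual annotation rather than to an automatically generated one.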
In some embodiments, the first ML model may be trained for extracting first features associated with the person or the object from the first manual annotation, extracting respective second features associated with the person or the object from the first subset of images, and automatically annotating the first subset of images based on the first features and the second features. In some embodiments, the at least one processor may be further configured to obtain a third manual annotation that may be associated with a third image in the first subset of images or in the second subset of images, and to annotate, automatically, one or more images adjacent to the third image (e.g., according to the pre-defined annotation propagation window) based on the third manual annotation. This way, a user may adjust an auto-generated annotation and have the adjustment propagated to other images to ensure the quality of the automatic annotation process.
In some embodiments, the first ML model may be trained using a plurality of sequentially ordered training images and, during the training, the first ML model may be used to annotate, automatically, the plurality of sequentially ordered training images in a first order (e.g., an ascending order of image indices) and based on a first manually created training annotation. The first ML model may be further used to annotate, automatically, the plurality of sequentially ordered training images in a second order (e.g., a descending order of the image indices) and based on a second manually created training annotation. The parameters of the first ML model may then be adjusted to enforce consistency between corresponding annotations obtained in the first order and the second order.
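The bidirectional consistency described above may be illustrated with the following minimal sketch. The helper names `propagate` and `consistency_loss`, the flat-list mask representation, and the use of a mean squared difference as the consistency measure are assumptions for illustration; the disclosure does not specify a particular loss, and the `step` callable merely stands in for the first ML model.

```python
from typing import Callable, Dict, List, Sequence

Mask = List[float]


def propagate(
    images: Sequence[Sequence[float]],
    ref_mask: Mask,
    step: Callable[[Sequence[float], Mask], Mask],
    order: Sequence[int],
) -> Dict[int, Mask]:
    """Annotate images in `order`; the first index carries the manual annotation."""
    masks: Dict[int, Mask] = {order[0]: ref_mask}
    prev = ref_mask
    for idx in order[1:]:
        # Each new annotation is predicted from the image and the previous annotation.
        prev = step(images[idx], prev)
        masks[idx] = prev
    return masks


def consistency_loss(forward: Dict[int, Mask], backward: Dict[int, Mask]) -> float:
    """Mean squared difference between annotations of the same image index."""
    total, count = 0.0, 0
    for idx, fwd_mask in forward.items():
        for a, b in zip(fwd_mask, backward[idx]):
            total += (a - b) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy stand-in for the model: blend the previous mask with the image.
    toy_step = lambda image, prev: [(p + x) / 2 for p, x in zip(prev, image)]
    imgs = [[0.0], [1.0], [1.0]]
    fwd = propagate(imgs, [0.0], toy_step, [0, 1, 2])  # ascending order
    bwd = propagate(imgs, [1.0], toy_step, [2, 1, 0])  # descending order
    print(consistency_loss(fwd, bwd))
```

During training, a loss of this kind may be minimized with respect to the model parameters so that the annotations produced in the two orders agree.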
In some embodiments, the at least one processor described herein may be further configured to determine, based on a second ML model and a readiness score associated with one or more annotated image sequences, whether to use the one or more annotated image sequences to automatically annotate a second sequence of 2D images. Such a second ML model may be trained for predicting a query annotation based on the one or more annotated image sequences and the readiness score may be determined by comparing the query annotation with a ground truth annotation. If the determination is to use the one or more annotated image sequences to automatically annotate the second sequence of 2D images, the at least one processor may obtain an annotation for the second sequence of 2D images (e.g., an initial annotation that may be propagated through the second sequence of 2D images) based on the one or more annotated image sequences and the second ML model.
A more detailed understanding of the examples disclosed herein may be had from the following description, given by way of example in conjunction with the accompanying drawing.
The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
In some embodiments of the present disclosure, a computer-generated user interface (not shown) may be provided to display the first sequence of 2D images 102 (e.g., after they have been obtained using the technique described above), and a user may, through the interface, select and annotate one or more of the first sequence of 2D images 102. For example, the user may select an image 102a from the first sequence of 2D images 102 and annotate, at 104, the image 102a by marking, outlining, or otherwise indicating the location or contour of an object of interest 106 (e.g., a brain hemorrhage) in the image 102a to obtain an annotation 108 (e.g., a segmentation mask) for the original image 102a. The annotation operation at 104 may be performed using tools provided by the user interface, which may allow the user to create the annotation 108 through one or more of a click, a tap, a drag-and-drop, a click-drag-and-release, a sketching or drawing motion, etc. that may be executed by the user with an input device (e.g., a computer mouse, a keyboard, a stylus, a touch screen, the user's finger, etc.). The annotation 108 may then be used at 110 to generate a 3D annotation for the first 3D image dataset, for example, by propagating the annotation 108 through (e.g., by automatically annotating) multiple other images 102b, 102c, etc. of the first sequence 102 (e.g., annotation 108 may be propagated to all or a subset of the first sequence of 2D images 102) to obtain annotations (e.g., segmentation masks) 108b, 108c, etc. The automatic 3D annotation at 110 may be accomplished using a first machine-learning (ML) data annotation model, which may be trained for detecting features associated with the object of interest 106 in image 102a (e.g., based on manual annotation provided at 104), identifying areas having similar features from the other 2D images (e.g., 102b, 102c, etc.) of the first sequence 102, and automatically annotating those areas as containing the object of interest 106. 
The implementation and/or training of the first ML data annotation model will be described in greater detail below, and the term “machine-learning model” may be used interchangeably herein with the term “machine-learned model” or “artificial intelligence model.”
The 3D annotation generated at 110 for the first 3D image dataset may be displayed to a user (e.g., the same user who created the 2D annotation 108), who may then confirm and/or adjust the 3D annotation (e.g., through one or more user inputs), for example, using the same user interface described above. The confirmed and/or adjusted 3D annotation may be used to automatically annotate (e.g., generate a 3D annotation for) other 3D image datasets such as a second 3D image dataset that may be associated with a second patient, as described below. The user-provided adjustment may also be used to improve the first ML data annotation model, for example, through reinforcement learning, which may be conducted (e.g., in an online manner) after the first ML data annotation model has been trained (e.g., in an offline manner) and deployed. The improvement may be accomplished, for example, based on the differences (e.g., prediction errors) between the automatic annotation predicted by the first ML data annotation model and the user input.
To annotate the second 3D image dataset based on the first 3D image dataset such as a previously annotated sequence of 2D images (e.g., the first sequence 102 of
The automatically generated initial 2D annotation(s) for the second sequence 202 may be confirmed and/or adjusted by a user, for example, using the user interface described herein. The confirmed or adjusted 2D annotation may then be propagated (e.g., from image 202a) to other images (e.g., 202b, 202c, etc.) of the second sequence 202 (e.g., to all or a subset of the second sequence of 2D images 202) to obtain a 3D annotation 208 for the second 3D image dataset (e.g., comprising segmentation masks 208a, 208b, 208c, etc.). The propagation (e.g., the automatic annotation of images 202b, 202c, etc.) may be accomplished, for example, based on the first ML data annotation model described herein. The user-confirmed or adjusted annotation(s) may also be used for improving the second ML data annotation model, for example, to generate more accurate initial annotation(s) for subsequent 3D image datasets.
In some implementations of the ML model shown in
The annotation 406 for the first image 402 may be used to enhance the completeness and/or accuracy of the first plurality of features f1 (e.g., which may be obtained as a feature vector or feature map). For example, using a normalized version of the annotation 406 (e.g., by converting probability values in the annotation mask to a value range between 0 and 1), the first image 402 (e.g., pixel values of the first image 402) may be weighted (e.g., before the weighted imagery data is passed to the feature extraction operation at 408) such that pixels belonging to the object of interest may be given larger weights during the feature extraction process. As another example, the normalized annotation mask may be used to apply respective weights to the features (e.g., preliminary features) extracted at 408 such that features associated with the object of interest may be given larger weights within the feature representation f1.
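As a non-limiting sketch of the first example above, the pixel weighting may be expressed as follows. Min-max normalization, the nested-list image representation, and the names `normalize_mask` and `weight_image` are assumptions chosen for illustration.

```python
from typing import List

Grid = List[List[float]]


def normalize_mask(mask: Grid) -> Grid:
    """Rescale mask values to the range [0, 1] (min-max normalization, an assumption)."""
    lo = min(min(row) for row in mask)
    hi = max(max(row) for row in mask)
    if hi == lo:
        # Constant mask: treat a positive constant as uniform weight 1, otherwise 0.
        fill = 1.0 if hi > 0 else 0.0
        return [[fill for _ in row] for row in mask]
    return [[(v - lo) / (hi - lo) for v in row] for row in mask]


def weight_image(image: Grid, mask: Grid) -> Grid:
    """Weight each pixel by the normalized annotation before feature extraction."""
    weights = normalize_mask(mask)
    return [
        [pixel * w for pixel, w in zip(img_row, w_row)]
        for img_row, w_row in zip(image, weights)
    ]


if __name__ == "__main__":
    image = [[10.0, 20.0], [30.0, 40.0]]
    annotation = [[0.0, 2.0], [4.0, 4.0]]
    print(weight_image(image, annotation))
```

The same normalized mask may alternatively be applied to preliminary feature maps rather than to pixel values, as noted in the second example above.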
The second image 404, which may include the same object of interest as the first image 402, may be processed through a feature extraction module 410 (e.g., which may be the same feature extraction module as 408 or a different feature extraction module) to determine a second plurality of features f2. The second plurality of features f2 may be represented in the same format as the first plurality of features f1 (e.g., as a feature vector) and/or may have the same size as f1. The two sets of features may be used jointly to determine a set of informative features f3 that may be indicative of the pixel characteristics of the object of interest in the first image 402 and the second image 404. For instance, informative features f3 may be obtained by comparing features f1 and f2, and selecting the features that are common to both f1 and f2. One example way of accomplishing this task may be to normalize the feature vectors of f1 and f2 (e.g., such that both vectors have values ranging from 0 to 1), compare the two normalized vectors (e.g., based on (f1−f2)), and select corresponding elements in the two vectors that have a value difference smaller than a predefined threshold as the informative features f3.
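The example selection procedure above may be sketched as follows. The use of min-max normalization and of the absolute difference |f1−f2| are assumptions made for this illustration, as are the function names.

```python
from typing import List, Sequence


def min_max(vector: Sequence[float]) -> List[float]:
    """Normalize a feature vector so that its values range from 0 to 1."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0.0 for _ in vector]
    return [(x - lo) / (hi - lo) for x in vector]


def informative_indices(f1: Sequence[float], f2: Sequence[float], threshold: float) -> List[int]:
    """Return positions where the normalized f1 and f2 differ by less than `threshold`."""
    if len(f1) != len(f2):
        raise ValueError("f1 and f2 must have the same size")
    n1, n2 = min_max(f1), min_max(f2)
    return [i for i, (a, b) in enumerate(zip(n1, n2)) if abs(a - b) < threshold]


if __name__ == "__main__":
    f1 = [0.0, 5.0, 10.0]
    f2 = [0.0, 9.0, 10.0]
    print(informative_indices(f1, f2, threshold=0.2))
```

The returned positions identify the elements treated as informative features f3 in this sketch; a smaller threshold yields a stricter notion of features common to both images.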
In examples, the second plurality of features f2 extracted from the second image 404 and/or the informative features f3 may be further processed at 412 to gather information (e.g., from certain dimensions of f2) that may be used to automatically annotate the object of interest in the second image 404. For example, based on the informative features f3, an indicator vector having the same size as the feature vector f1 and/or f2 may be derived in which elements that correspond to informative features f3 may be given a value of 1 and the remaining elements may be given a value of 0. A score may then be calculated to aggregate the informative features f3 and/or the informative elements of feature vector f2. Such a score may be calculated, for example, by conducting an element-wise multiplication of the indicator vector and feature vector f2. Using the calculated score, an annotation 414 of the object of interest may be automatically generated for the second image 404, for example, by backpropagating a gradient of the score through the first ML data annotation model (e.g., through the neural network used to implement the first ML data annotation model) and determining pixel locations (e.g., spatial dimensions) that may correspond to the object of interest based on the gradient values associated with the pixel locations. For instance, pixel locations having positive gradient values during the backpropagation (e.g., these pixel locations may make positive contributions to the desired result) may be determined to be associated with the object of interest and pixel locations having negative gradient values during the backpropagation (e.g., these pixel locations may not make contributions or may make negative contributions to the desired result) may be determined to be not associated with the object of interest.
Annotation 414 for the second image 404 may then be generated based on a weighted linear combination of the feature maps or feature vectors obtained by the first ML data annotation model (e.g., the gradients may operate as the weights in the linear combination).
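The scoring and gradient-weighted combination described above may be illustrated by the following sketch, which resembles a Grad-CAM-style computation. It assumes, purely for illustration, that each element of f2 is the spatial average of one feature map, so that the gradient of the score with respect to every pixel of feature map k equals the k-th indicator value divided by the number of pixels; an actual implementation would typically obtain the gradients by backpropagation through the neural network. All function names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def indicator_vector(size: int, informative: Sequence[int]) -> List[float]:
    """1 at positions of informative features, 0 elsewhere."""
    indicator = [0.0] * size
    for i in informative:
        indicator[i] = 1.0
    return indicator


def score(indicator: Sequence[float], f2: Sequence[float]) -> float:
    """Aggregate the informative elements of f2 (element-wise product, then sum)."""
    return sum(m * v for m, v in zip(indicator, f2))


def localization_map(feature_maps: Sequence[Grid], indicator: Sequence[float]) -> Grid:
    """Weighted linear combination of feature maps, with gradients as weights."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Analytic gradient of the score under the average-pooling assumption.
    weights = [m / (height * width) for m in indicator]
    combined = [[0.0] * width for _ in range(height)]
    for fmap, alpha in zip(feature_maps, weights):
        for r in range(height):
            for c in range(width):
                combined[r][c] += alpha * fmap[r][c]
    # Keep only positive contributions, which indicate the object of interest.
    return [[max(0.0, v) for v in row] for row in combined]


if __name__ == "__main__":
    maps = [[[4.0, 0.0]], [[0.0, 8.0]]]  # two 1x2 feature maps
    ind = indicator_vector(2, [0])
    print(score(ind, [2.0, 4.0]), localization_map(maps, ind))
```

Thresholding the resulting map would yield a binary mask in this sketch; the threshold value is left open because the disclosure does not specify one.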
The annotation (e.g., annotation 414) automatically generated using the techniques described above may be presented to a user, for example, through the user interface described herein so that adjustments may be made by the user to refine the annotation. For example, the user interface may allow the user to adjust the annotation 414 by executing one or more of a click, a tap, a drag-and-drop, a click-drag-and-release, a sketching or drawing motion, etc. Adjustable control points may be provided along the contour of the annotation 414 and the user may be able to change the shape of the annotation 414 by manipulating one or more of these control points (e.g., by dragging and dropping the control points to various new locations on the display screen).
The automatically generated 3D annotation for the first 3D image dataset may be confirmed and/or adjusted by a user, and the parameters of the first ML model may be adjusted based on the user adjustment, for example, through reinforcement learning (e.g., conducted in an online manner). The confirmed or adjusted 3D annotation may be used to automatically annotate other 3D image datasets including, e.g., a second 3D image dataset associated with a second patient.
The automatic annotation operations described herein may be performed in a progressive manner, for example, to reduce the amount of human effort and/or computational resources (e.g., memory consumption) involved.
In examples, if there is a manual annotation within a current annotation propagation window, the corresponding batch may end before the manual annotation and the manual annotation may be set/used as a reference for the next batch. For example, as shown in
In examples, a user (e.g., a human annotator) may adjust an auto-generated annotation for image i using the interface or tools described herein, and the adjusted annotation may be used as a reference or basis for annotating (e.g., re-annotating) a batch of other images located adjacent to image i, where the number of the images included in the batch may be equal to the smaller of the annotation propagation window size w described herein and the number of images located between image i and a manually annotated image nearest to image i. The last annotation in this batch (or the nearest manual annotation) may then be used as a basis for annotating another batch of images and the process may be repeated until a satisfactory 3D annotation is derived for the 3D dataset.
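The batch size used for such re-annotation may be computed, for example, as in the following sketch. The function name `reannotation_batch_size` and the integer image indices are assumptions for illustration; the number of images located between image i and a manually annotated image m is taken to be |m − i| − 1.

```python
from typing import Iterable


def reannotation_batch_size(i: int, manual: Iterable[int], window: int) -> int:
    """Smaller of the window size w and the gap to the nearest manual annotation."""
    gaps = [abs(m - i) - 1 for m in manual if m != i]
    if not gaps:
        # No other manual annotation exists, so only the window limits the batch.
        return window
    return min(window, min(gaps))


if __name__ == "__main__":
    # Image 4 was adjusted; images 0 and 7 carry manual annotations; w = 5.
    print(reannotation_batch_size(4, {0, 7}, 5))
```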
It should be noted that image 1 shown in
As shown in
Annotations obtained for a first 3D image set or a first set of one or more 2D image sequences (e.g., from the first 3D image dataset) may be used (e.g., as reference or support annotations) to automatically annotate a second 3D image set or a second set of one or more 2D image sequences. In examples, the first 3D image set or the first set of one or more 2D image sequences may belong to a first patient, the second 3D image set or the second set of 2D image sequences may belong to a second patient, and the automatic annotation may be referred to herein as cross-sequence annotation. Such a cross-sequence annotation task may be performed based on an ML model (e.g., referred to herein as a cross-sequence annotation ML model) and a readiness check may be performed (e.g., automatically) on the first 3D image set or the first set of one or more 2D image sequences before it is used to annotate the second 3D image set or the second set of 2D image sequences (e.g., to derive an initial annotation that may be propagated through the second 3D image set or the second set of 2D image sequences).
The cross-sequence annotation ML model may be implemented using a branch of the ML data annotation neural network described herein or using a separate neural network. In examples, such a cross-sequence annotation ML model may employ a two-branch architecture. A first branch of the ML model may be configured to receive one or more annotated image sets or image sequences (e.g., referred to herein as support annotations) as an input, while a second branch of the ML model may be configured to receive an un-annotated image set or image sequence as an input. From these inputs, the cross-sequence annotation ML model may predict an annotation for the un-annotated image set or image sequence (e.g., an initial annotation that may be propagated through the un-annotated image set or image sequences) based on features extracted from the support annotations. In doing so, the cross-sequence annotation ML model may assess the readiness of the support annotations (e.g., by calculating a readiness score for the support annotations) and proceed with the automatic, cross-sequence annotation if (e.g., only if) the readiness of the support annotations exceeds a pre-defined threshold (e.g., the value of this threshold may be determined based on various factors such as the size of the region to be annotated). If the readiness of the support annotations is lower than the pre-defined threshold, a manual annotation may be obtained (e.g., from a human annotator) and used as a basis for annotating the un-annotated image set or image sequence.
The training of the cross-sequence annotation ML model (e.g., a neural network implementing the cross-sequence annotation ML model) may be conducted based on a training dataset comprising N annotated (e.g., manually annotated) image sequences (e.g., the value of N may vary based on factors such as the quality of the training images). During the training, the cross-sequence annotation ML model may be used to predict, for each image sequence i of the N image sequences, an annotation (e.g., referred to herein as a query annotation) for the image sequence (e.g., an annotation for one of the images in the image sequence) based on the other (N−1) annotated image sequences. The cross-sequence annotation ML model may predict the query annotation, for example, by extracting features from the other (N−1) annotated image sequences and inferring the query annotation for image sequence i based on those extracted features (e.g., based on an average or a maximum of the respective sets of features extracted from the (N−1) annotated image sequences). The cross-sequence annotation ML model may then compare the predicted query annotation for image sequence i with a corresponding manual annotation already available in the training dataset (e.g., a ground truth annotation) and calculate a readiness score based on the comparison.
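The leave-one-out procedure described above may be sketched as follows. The `predict` and `compare` callables stand in for the cross-sequence annotation ML model and the comparison step, respectively; the toy versions shown (a pixel-wise average of the support annotations thresholded at 0.5, and the fraction of matching pixels) are assumptions for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Mask = List[int]
Support = List[Tuple[Sequence[float], Mask]]


def leave_one_out_scores(
    sequences: Sequence[Sequence[float]],
    annotations: Sequence[Mask],
    predict: Callable[[Support, Sequence[float]], Mask],
    compare: Callable[[Mask, Mask], float],
) -> List[float]:
    """For each sequence i, predict its annotation from the other N-1 and score it."""
    scores: List[float] = []
    for i in range(len(sequences)):
        support = [
            (seq, ann)
            for j, (seq, ann) in enumerate(zip(sequences, annotations))
            if j != i
        ]
        query = predict(support, sequences[i])
        scores.append(compare(query, annotations[i]))
    return scores


def toy_predict(support: Support, _query_images: Sequence[float]) -> Mask:
    """Toy stand-in: average the support annotations and threshold at 0.5."""
    masks = [ann for _, ann in support]
    return [1 if sum(col) / len(masks) >= 0.5 else 0 for col in zip(*masks)]


def toy_compare(pred: Mask, truth: Mask) -> float:
    """Toy stand-in: fraction of positions where prediction and ground truth agree."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)


if __name__ == "__main__":
    seqs = [[0.0], [0.0], [0.0]]  # placeholder image data
    anns = [[1, 1, 0], [1, 1, 0], [1, 0, 0]]
    print(leave_one_out_scores(seqs, anns, toy_predict, toy_compare))
```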
The readiness score may be calculated in different ways including, but not limited to, by calculating an intersection-over-union (IoU) between the predicted query annotation and the ground truth annotation, by determining a ratio of true positive points in the predicted query annotation to a sum of true positive points and false positive points in the predicted query annotation, based on a correctness metric associated with the predicted query annotation, etc. The same operations may be applied to each of the N image sequences, through which N readiness scores may be obtained and an annotation may be generated (e.g., based on an average or maximum of the annotations predicted for each image sequence i) for annotating another 3D image set or 2D image sequence. An overall readiness score representing the readiness of the N image sequences for the cross-sequence annotation task may then be calculated, for example, based on an average or median (or other statistical summary) of the N readiness scores. If the overall readiness score is satisfactory (e.g., based on an empirical evaluation), the N image sequences may be determined to be ready for the cross-sequence annotation task (e.g., for generating an initial annotation for a new image sequence). Otherwise (e.g., if the overall readiness score is unsatisfactory), more annotated image sequences (e.g., a (N+1)-th annotated image sequence) may be added to enrich the support annotation pool (e.g., comprising the annotated image sequences) before the pool of images can be used for the cross-sequence annotation task.
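Two of the readiness measures named above, together with the overall readiness check, may be sketched as follows. Masks are represented as flat lists of 0/1 values, and the function names, the handling of empty masks, and the example threshold are assumptions for illustration.

```python
from statistics import mean, median
from typing import Sequence


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Intersection-over-union between two binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0  # assumption: two empty masks agree


def precision(pred: Sequence[int], truth: Sequence[int]) -> float:
    """True positives divided by the sum of true positives and false positives."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    return tp / (tp + fp) if (tp + fp) else 0.0


def overall_readiness(scores: Sequence[float], use_median: bool = False) -> float:
    """Summarize the N per-sequence readiness scores by their average or median."""
    return median(scores) if use_median else mean(scores)


def is_ready(scores: Sequence[float], threshold: float) -> bool:
    """Proceed with cross-sequence annotation only if the summary exceeds the threshold."""
    return overall_readiness(scores) > threshold


if __name__ == "__main__":
    predicted, ground_truth = [1, 1, 0, 0], [1, 0, 1, 0]
    print(iou(predicted, ground_truth), precision(predicted, ground_truth))
    print(is_ready([0.9, 0.7, 0.8], threshold=0.75))
```

If `is_ready` returns False in this sketch, further annotated image sequences may be added to the support pool and the check repeated, consistent with the description above.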
The images (e.g., the N image sequences) used to train the cross-sequence annotation ML model may be pre-processed to increase the diversity of the training data and the robustness of the ML model. The pre-processing may be performed, for example, by applying various mathematical operations to the training images including, but not limited to, cropping, rotation, affine transformation, etc. The annotation generated by the cross-sequence annotation ML model may be in different formats, such as, e.g., a binary mask (e.g., in which 0 may represent background and 1 may represent the object of interest), a bounding box (e.g., surrounding the object of interest in an image), or a group of positive and negative seeds/points that may be used for mask inference.
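As an illustration of how two of the annotation formats mentioned above relate to each other, the following sketch derives a bounding box from a binary mask. The function name `mask_to_bbox` and the (top, left, bottom, right) ordering with inclusive, zero-based coordinates are assumptions made for this sketch.

```python
from typing import List, Optional, Tuple


def mask_to_bbox(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) enclosing all 1-valued pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask contains background only
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])


if __name__ == "__main__":
    binary_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
    ]
    print(mask_to_bbox(binary_mask))
```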
For simplicity of explanation, the training operations are depicted and described herein with a specific order. It should be appreciated, however, that the training operations may occur in various orders, concurrently, and/or with other operations not presented or described herein. Furthermore, it should be noted that not all operations that may be included in the training method are depicted and described herein, and not all illustrated operations are required to be performed.
The systems, methods, and/or instrumentalities described herein may be implemented using one or more processors, one or more storage devices, and/or other suitable accessory devices such as display devices, communication devices, input/output devices, etc.
Communication circuit 1004 may be configured to transmit and receive information utilizing one or more communication protocols (e.g., TCP/IP) and one or more communication networks including a local area network (LAN), a wide area network (WAN), the Internet, and/or a wireless data network (e.g., a Wi-Fi, 3G, 4G/LTE, or 5G network). Memory 1006 may include a storage medium (e.g., a non-transitory storage medium) configured to store machine-readable instructions that, when executed, cause processor 1002 to perform one or more of the functions described herein. Examples of the machine-readable medium may include volatile or non-volatile memory including but not limited to semiconductor memory (e.g., electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM)), flash memory, and/or the like. Mass storage device 1008 may include one or more magnetic disks such as one or more internal hard disks, one or more removable disks, one or more magneto-optical disks, one or more CD-ROM or DVD-ROM disks, etc., on which instructions and/or data may be stored to facilitate the operation of processor 1002. Input device 1010 may include a keyboard, a mouse, a voice-controlled input device, a touch sensitive input device (e.g., a touch screen), and/or the like for receiving user inputs to apparatus 1000.
It should be noted that apparatus 1000 may operate as a standalone device or may be connected (e.g., networked, or clustered) with other computation devices to perform the functions described herein. And even though only one instance of each component is shown in
While this disclosure has been described in terms of certain embodiments and generally associated methods, alterations and permutations of the embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure. In addition, unless specifically stated otherwise, discussions utilizing terms such as “analyzing,” “determining,” “enabling,” “identifying,” “modifying” or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data represented as physical quantities within the computer system memories or other such information storage, transmission or display devices.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other implementations will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 17969876 | Oct 2022 | US |
| Child | 18128290 | | US |