Computed Tomography (CT) is a widely used imaging modality that aids in the diagnosis and treatment of various diseases (e.g., cardiovascular diseases), with over 80 million exams performed annually in the United States. Cardiac CT, e.g., coronary CT angiography (CTA), is a commonly performed exam for patients with suspected coronary artery disease. A major challenge in cardiac CT is the intrinsic motion of the heart, which requires high temporal resolution and electrocardiogram (ECG) gating. Motion artifacts are commonly seen in cardiac CT; they hamper accurate delineation of key anatomic features (e.g., the coronary lumen) and pathological features (e.g., stenosis) and, ultimately, may compromise the diagnosis. This is mainly due to the limited temporal resolution offered by current CT systems. In CT imaging, temporal resolution is primarily determined by the data acquisition time and, more specifically, the gantry rotation time. However, gantry rotation in CT systems has already approached its mechanical limit, and it is extremely challenging to make the gantry rotate much faster.
Another approach is the dual-source CT (DSCT) technique, which uses two data acquisition systems to improve temporal resolution by a factor of two. Although DSCT may improve temporal resolution, its availability is very limited compared to that of standard single-source CT (SSCT). Additionally, DSCT still suffers from motion artifacts in patients with a high heart rate or irregular heartbeats, cases that would also benefit from further temporal resolution improvement.
Therefore, there is a need to improve the temporal resolution of CT, including but not limited to SSCT and DSCT, for the purpose of improving the quality of cardiac CT exams.
As noted above, motion artifacts are a major challenge in cardiac CT. Conventional motion correction techniques are limited for patients with high or irregular heart rates due to simplified modeling of CT systems and cardiac motion. Emerging deep learning-based cardiac motion correction techniques have demonstrated the potential for further quality improvement. Yet, many methods require CT projection data or advanced motion simulation tools that are not readily available.
To solve these and other problems, the embodiments described herein provide systems and methods for motion correction. The embodiments described herein improve the image quality of cardiac CT exams, including, e.g., improved temporal resolution and reduced motion artifacts. Embodiments described herein provide a deep convolutional neural network (CNN) integrated with customized attention and spatial transformer techniques. Embodiments described herein may be implemented directly with CT images, without the need for projection/raw data or any proprietary information. Accordingly, embodiments described herein may be implemented with scanners from any vendor and of any model, including photon-counting-detector (PCD) CT. For example, embodiments described herein enable the full potential of PCD-CT to be realized, providing high temporal resolution, high spatial resolution, and multi-energy CT for cardiac imaging.
Accordingly, embodiments described herein provide systems and methods for deep learning-based medical image motion artifact correction. One embodiment provides a system for medical image motion artifact correction. The system includes an electronic processor configured to receive a medical image associated with a patient, the medical image including at least one motion artifact. The electronic processor is also configured to apply a model developed using machine learning to the medical image for correcting motion artifacts, the model including at least one of a spatial transformer network and an attention mechanism network. The electronic processor is also configured to generate a new version of the medical image, wherein the new version of the medical image at least partially corrects the at least one motion artifact.
Another embodiment provides a method for medical image motion artifact correction. The method includes receiving, with an electronic processor, a medical image associated with a patient, the medical image including at least one motion artifact. The method also includes applying, with the electronic processor, a model developed using machine learning to the medical image for correcting motion artifacts, the model including at least one of a spatial transformer network and an attention mechanism network. The method also includes generating, with the electronic processor, a new version of the medical image, wherein the new version of the medical image at least partially corrects the at least one motion artifact.
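For concreteness, the receive-apply-generate flow recited above can be sketched in a few lines of PyTorch. This is a minimal illustration only; the model stands in for the motion-correction network described below, and the function name and tensor layout are assumptions rather than disclosed details.

```python
import torch

def correct_medical_image(model: torch.nn.Module, medical_image: torch.Tensor) -> torch.Tensor:
    """Receive a CT image with motion artifacts, apply the learned model,
    and generate a new version with the artifacts at least partially
    corrected. `medical_image` is assumed to be a (channels, H, W) tensor."""
    model.eval()
    with torch.no_grad():
        new_version = model(medical_image.unsqueeze(0))  # add batch dimension
    return new_version.squeeze(0)
```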
The foregoing and other aspects and advantages of the present disclosure will appear from the following description. In the description, reference is made to the accompanying drawings that form a part hereof, and in which there is shown by way of illustration a preferred embodiment. This embodiment does not necessarily represent the full scope of the invention, however, and reference is therefore made to the claims and herein for interpreting the scope of the invention.
One or more embodiments are described and illustrated in the following description and accompanying drawings. These embodiments are not limited to the specific details provided herein and may be modified in various ways. Furthermore, other embodiments may exist that are not described herein. Also, the functionality described herein as being performed by one component may be performed by multiple components in a distributed manner. Likewise, functionality performed by multiple components may be consolidated and performed by a single component. Similarly, a component described as performing particular functionality may also perform additional functionality not described herein. For example, a device or structure that is “configured” in a certain way is configured in at least that way, but may also be configured in ways that are not listed. Furthermore, some embodiments described herein may include one or more electronic processors configured to perform the described functionality by executing instructions stored in a non-transitory, computer-readable medium. Similarly, embodiments described herein may be implemented as a non-transitory, computer-readable medium storing instructions executable by one or more electronic processors to perform the described functionality. As used in the present application, “non-transitory, computer-readable medium” comprises all computer-readable media but does not consist of a transitory, propagating signal. Accordingly, a non-transitory, computer-readable medium may include, for example, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a RAM (Random Access Memory), register memory, a processor cache, or any combination thereof.
In addition, the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. For example, the use of “including,” “containing,” “comprising,” “having,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. The terms “connected” and “coupled” are used broadly and encompass both direct and indirect connecting and coupling. Further, “connected” and “coupled” are not restricted to physical or mechanical connections or couplings and can include electrical connections or couplings, whether direct or indirect. In addition, electronic communications and notifications may be performed using wired connections, wireless connections, or a combination thereof and may be transmitted directly or through one or more intermediary devices over various types of networks, communication channels, and connections. Moreover, relational terms such as first and second, top and bottom, and the like may be used herein solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The server 105, the medical image database 115, the user device 120, and the medical imaging system 125 communicate over one or more wired or wireless communication networks 130. Portions of the communication networks 130 may be implemented using a wide area network, such as the Internet, a local area network, such as a Bluetooth™ network or Wi-Fi, and combinations or derivatives thereof. In some embodiments, additional communication networks may be used to allow one or more components of the system 100 to communicate. Also, in some embodiments, components of the system 100 may communicate directly rather than through a communication network 130 and, in some embodiments, the components of the system 100 may communicate through one or more intermediary devices not shown in
The server 105 may include a computing device, such as a server, a database, or the like. As illustrated in
The communication interface 210 allows the server 105 to communicate with devices external to the server 105. For example, as illustrated in
The electronic processor 200 is configured to access and execute computer-readable instructions (“software”) stored in the memory 205. The software may include firmware, one or more applications, program data, filters, rules, one or more program modules, and other executable instructions. For example, the software may include instructions and associated data for performing a set of functions, including the methods described herein.
For example, as illustrated in
Motion artifact correction models generated by the learning engine 225 are stored in the model database 230. As illustrated in
As also illustrated in
Returning to
The medical image database 115 may include a computing device, such as a server, a database, or the like. As illustrated in
In some embodiments, the medical images 250 stored in the medical image database 115 include training data used by the learning engine 225. For example, in some embodiments, the training data includes one or more CT images from one cardiac phase or consecutive cardiac phases (e.g., 65-80% R-R interval) with slow temporal resolution (e.g., 140 ms) (as input data) and a counterpart CT image from one quiescent phase (e.g., 75% R-R interval) with fast temporal resolution (e.g., 75 ms) (as labels). Accordingly, embodiments described herein may be implemented directly with medical images 250 (e.g., CT images), without projection or raw data. In some embodiments, before being used as training data, the medical images 250 may be filtered. For example, the medical images 250 may be filtered to identify subsets or groupings of medical images 250 based on temporal resolution, phases (e.g., consecutive cardiac phases), or the like.
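For illustration, the pairing of slow-temporal-resolution inputs with a fast-temporal-resolution label described above might be assembled as in the following sketch, which assumes images are available as NumPy arrays keyed by cardiac phase; the function and variable names are hypothetical.

```python
import numpy as np

def build_training_pair(slow_images_by_phase: dict, fast_reference: np.ndarray):
    """Pair slow-temporal-resolution images (e.g., 140 ms at 65-80% R-R)
    with the fast-temporal-resolution image (e.g., 75 ms at 75% R-R).

    slow_images_by_phase: {phase_percent: 2-D image array}
    fast_reference: the quiescent-phase image used as the training label.
    """
    phases = sorted(p for p in slow_images_by_phase if 65 <= p <= 80)
    # Stack consecutive cardiac phases along the channel axis as model input.
    inputs = np.stack([slow_images_by_phase[p] for p in phases], axis=0)
    return inputs, fast_reference
```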
As noted above, in some embodiments, the medical image database 115 is combined with the server 105. Alternatively or in addition, the medical images 250 may be stored within a plurality of databases, such as within a cloud service. Furthermore, in some embodiments, the medical images 250 may be stored in a memory of the user device 120. In some embodiments, the medical image database 115 may be associated with one or more entities, such as a particular healthcare provider, a health collaborative, a health maintenance organization (HMO), or the like. Although not illustrated in
The user device 120 may also include a computing device, such as a desktop computer, a laptop computer, a tablet computer, a terminal, a smart telephone, a smart television, a smart wearable, or another suitable computing device that interfaces with a user. Although not illustrated in
The user device 120 may include other software applications. Example software applications may enable a user to display an electronic medical record (EMR), access a picture archiving and communication system (PACS) or another system or data storage device, or the like. The system 100 is described herein as providing a motion artifact correction service through the server 105. However, in other embodiments, the functionality described herein as being performed by the server 105 may be locally performed by the user device 120. For example, in some embodiments, the user device 120 may store the learning engine 225, the model database 230, the application 240, or a combination thereof.
The user device 120 may be used by an end user to generate and interact with medical images, including enhanced or improved medical images. As one example, a user may use the user device 120 to access a CT image and perform a motion artifact correction technique on the CT image (via, e.g., the application 240), such that the CT image has an improved temporal resolution and reduced motion artifacts. As another example, a user may use the user device 120 to interact with or view an enhanced medical image (e.g., a CT image with improved temporal resolution and reduced motion artifacts) for diagnostic and reporting purposes.
As illustrated in
The electronic processor 200 may receive the medical image 250 through the communication network 130 from the medical imaging system 125, the user device 120, the medical image database 115, or a combination thereof. As one example, the electronic processor 200 may receive the medical image 250 directly from the medical imaging system 125 after the medical image 250 was generated by the medical imaging system 125 (e.g., immediately after the medical imaging system 125 generates the medical image 250). As another example, the electronic processor 200 may receive the medical image 250 after a predetermined amount of time has elapsed since the medical image 250 was generated by the medical imaging system 125, such as several hours, a day, several days, or the like. Accordingly, in some embodiments, the medical imaging system 125 is configured to automatically transmit the medical image 250 to the server 105 (e.g., the electronic processor 200) for motion artifact correction prior to transmitting the medical image 250 for storage (e.g., locally in a memory of the medical imaging system 125 or remotely in a memory of the user device 120 or the medical image database 115).
In response to receiving the medical image 250 (at block 305), the electronic processor 200 performs a motion artifact correction technique on the medical image 250 (at block 310). The electronic processor 200 may perform the motion artifact correction technique on the medical image 250 by applying a model developed using machine learning to the medical image 250. As noted above, the model database 230 may store one or more models developed by the learning engine 225. Accordingly, the electronic processor 200 may access a model from the model database 230 and apply the model to the medical image 250.
In some embodiments, the model is a neural network, such as, e.g., a convolutional neural network (CNN). For example, the model may be a convolutional neural network configured to compensate for cardiac motion artifacts.
During training of the CNN-MC model 405, the CNN(s) may implicitly learn one or more patterns of motion artifacts and consequently correct them. In the illustrated example, the CNN-MC model 405 jointly utilizes spatial transformer (e.g., a spatial transformer network 410) and attention (e.g., an attention mechanism network 415) techniques. The attention mechanism network 415 and the spatial transformer network 410 enable the CNN(s) (e.g., the CNN-MC model 405) to focus on the dynamic structure of the heart and correct the corresponding motion artifacts without using a complicated system model that relies on assumptions about the modes of cardiac motion and the underlying physical processes. This improves the delineation of the coronary arteries and, if present, calcifications. Further, the CNN-MC model 405 is vendor agnostic, as it may be fully implemented in the CT image domain.
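The disclosure does not recite layer-level details, but the two named techniques can be sketched in PyTorch as follows. The layer sizes, the choice of an affine warp for the spatial transformer, and the sigmoid gating for attention are illustrative assumptions, not the disclosed architecture of the spatial transformer network 410 or the attention mechanism network 415.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Predicts an affine warp from the input and resamples the feature map,
    letting the network compensate for in-plane cardiac motion."""
    def __init__(self, channels: int):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(channels, 8, kernel_size=7), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 6),
        )
        # Initialize to the identity transform so training starts stable.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.localization(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

class AttentionGate(nn.Module):
    """Soft spatial attention: a learned sigmoid mask that lets the CNN
    focus on the moving cardiac structures (e.g., coronary arteries)."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        return x * self.mask(x)
```

In a complete model, such modules would typically wrap the feature maps of an encoder-decoder CNN; they are shown standalone here only to make the two techniques concrete.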
As illustrated in
Alternatively or in addition, in some embodiments, the CNN-MC model 405 implements a pseudo 3-D CNN structure, as illustrated in
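The referenced figure is not reproduced here, but one common reading of a pseudo 3-D structure is to stack a few neighboring axial slices as input channels and process them with 2-D convolutions, gaining through-plane context without the cost of full 3-D kernels. The following is a minimal sketch under that assumption; the slice count and feature width are illustrative.

```python
import torch.nn as nn

class Pseudo3DBlock(nn.Module):
    """2-D convolutions over a stack of adjacent axial slices.

    Treating `num_slices` neighboring slices as input channels gives the
    network through-plane context at 2-D kernel cost; this is an assumed
    interpretation of "pseudo 3-D", not the disclosed structure.
    """
    def __init__(self, num_slices: int = 3, features: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_slices, features, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(features, features, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(features, 1, kernel_size=3, padding=1),  # corrected center slice
        )

    def forward(self, slice_stack):  # (batch, num_slices, H, W)
        return self.net(slice_stack)
```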
In some embodiments, the CNN-MC model 405 may be associated with a customized loss function, with respect to which the parameters of the CNNs may be optimized. The customized loss function may be
The loss function may include a data fidelity term and two regularization terms. The data fidelity term may include, e.g., a mean-square error. The two regularization terms may include, e.g., an image-gradient-correlation and a feature-reconstruction loss. The terms of the loss function may be used to gauge the consistency of CT number, boundary, and high-level image features, respectively.
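Although the equation itself is not reproduced above, the three named terms can be sketched as follows. The relative weights and the feature extractor (any pretrained network, e.g., a VGG-style model, could serve) are assumptions for illustration, not disclosed values.

```python
import torch
import torch.nn.functional as F

def image_gradients(x):
    """Finite-difference gradients along height and width."""
    dy = x[..., 1:, :] - x[..., :-1, :]
    dx = x[..., :, 1:] - x[..., :, :-1]
    return dy, dx

def gradient_correlation(pred, target, eps=1e-8):
    """Correlation between image gradients, gauging boundary consistency."""
    corr = 0.0
    for g_p, g_t in zip(image_gradients(pred), image_gradients(target)):
        g_p = g_p - g_p.mean()
        g_t = g_t - g_t.mean()
        corr += (g_p * g_t).sum() / (g_p.norm() * g_t.norm() + eps)
    return corr / 2

def motion_correction_loss(pred, target, feature_net, lam_gc=0.1, lam_feat=0.01):
    """Data fidelity (MSE) + image-gradient-correlation + feature-reconstruction.

    feature_net is an assumed pretrained feature extractor; lam_gc and
    lam_feat are hypothetical weights.
    """
    fidelity = F.mse_loss(pred, target)                        # CT number consistency
    gc = 1.0 - gradient_correlation(pred, target)              # boundary consistency
    feat = F.mse_loss(feature_net(pred), feature_net(target))  # high-level features
    return fidelity + lam_gc * gc + lam_feat * feat
```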
Alternatively or in addition, in some embodiments, the loss function may be associated with additional regularization terms, as set forth below:
Returning to
In some embodiments, the electronic processor 200 transmits the new version of the medical image 250 to a remote device. As one example, the electronic processor 200 may transmit the new version of the medical image 250 to the user device 120 for display to a user of the user device 120 (via, e.g., a display device or other output mechanism of the user device 120). As another example, the electronic processor 200 may transmit the new version of the medical image 250 to a remote device for storage, such as, e.g., the user device 120, the medical image database 115, or another component of the system 100.
The embodiments described herein were tested on a patient cohort of ECG-gated (retrospectively gated) cardiac CT exams (n=40) that were retrospectively collected to create training and testing datasets for the CNN-MC model 405. Each case was originally acquired on a clinical dual-source CT scanner. Twenty-six cases were randomly selected to generate training datasets (e.g., training data used by the learning engine 225 of the server 105), and the remaining cases were reserved for validation. CT images corresponding to slow and fast temporal resolution (i.e., 140 ms and 75 ms, respectively) were reconstructed from the same projection data. For slow temporal resolution, CT images were reconstructed at cardiac phases corresponding to the 65%, 70%, 75%, and 80% R-R intervals in the ECG. Of note, these slow-temporal-resolution images were equivalent to CT images acquired with SSCT (denoted as “pseudo SSCT images”). For fast temporal resolution, CT images were reconstructed at the relatively quiescent phase of 75% R-R interval (denoted as “reference images”). The major parameters of image reconstruction are listed in the table illustrated in
After reconstruction was completed, image patches of the coronary arteries were extracted to generate additional training datasets. Centerlines of the major coronary arteries, including the right coronary artery (RCA), the left anterior descending artery (LAD), and the circumflex artery (CX), were obtained. Finally, image patches (80×80 pixels per patch) centered on these coronary arteries were extracted from the axial CT images.
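As an illustration of this patch extraction step, the following sketch crops an 80×80 patch centered on a coronary centerline point from an axial slice; clipping at the image border is an assumed handling detail.

```python
import numpy as np

def extract_patch(axial_image: np.ndarray, center_rc, size: int = 80) -> np.ndarray:
    """Extract a size x size patch centered on a coronary centerline point.

    axial_image: 2-D axial CT slice; center_rc: (row, col) of the vessel
    center on that slice. The crop is clipped to stay inside the image.
    """
    half = size // 2
    r = int(np.clip(center_rc[0], half, axial_image.shape[0] - half))
    c = int(np.clip(center_rc[1], half, axial_image.shape[1] - half))
    return axial_image[r - half:r + half, c - half:c + half]
```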
This pilot study involved two experiments. First, the effects of two training-inference strategies on CNN performance were evaluated. Strategy #1 trained the CNN using the original 200 mm field of view (FOV) of the whole heart and then deployed the CNN on testing images with the 200 mm FOV. Strategy #2 trained the CNN using patches of coronary arteries and deployed the CNN on testing patches of coronary arteries. In this experiment, only the attention-based network (e.g., the attention mechanism network 415) was used, for simplicity and without loss of generality. In the second experiment, the influence of network structure on CNN performance was investigated, and only Strategy #2 was used for simplicity. Besides visual inspection, CT number accuracy and the structural similarity index (SSIM) were also evaluated.
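For reference, SSIM and CT number error can be computed as in the sketch below, using the standard structural_similarity implementation from scikit-image; the lumen region-of-interest mask is an assumed input.

```python
import numpy as np
from skimage.metrics import structural_similarity

def evaluate(corrected: np.ndarray, reference: np.ndarray, lumen_mask: np.ndarray):
    """Return (SSIM, mean CT number error in HU within the lumen ROI)."""
    data_range = reference.max() - reference.min()
    ssim = structural_similarity(corrected, reference, data_range=data_range)
    ct_error = float(np.mean(corrected[lumen_mask] - reference[lumen_mask]))
    return ssim, ct_error
```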
In some embodiments, training is implemented using patch-based training, where image patches are centered on the coronary arteries, using standard augmentation (e.g., rotation, flipping, etc.), and using the Adam optimizer with dropout (rate = 0.1). In some embodiments, inference includes applying cardiac motion correction to testing patches of coronary arteries and implementing Monte Carlo Dropout (e.g., using the averaged CNN outputs as the final output per case). With respect to Monte Carlo Dropout, the standard dropout operation (i.e., randomly deactivating neurons) may be turned on during both training and inference. Using Monte Carlo Dropout during the inference stage improves the stability of network outputs (i.e., by averaging neural network outputs across several iterations of Monte Carlo Dropout). In some embodiments, evaluation includes visual inspection, CT number error (at the lumen), structural similarity, or the like.
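A minimal sketch of the Monte Carlo Dropout inference described above, assuming a PyTorch model: dropout layers are kept stochastic while the rest of the network runs in evaluation mode, and the outputs of several forward passes are averaged. The iteration count is a hypothetical value.

```python
import torch

def mc_dropout_predict(model: torch.nn.Module, image: torch.Tensor, n_iter: int = 10) -> torch.Tensor:
    """Average CNN outputs over several stochastic forward passes."""
    model.eval()
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()  # keep only dropout stochastic at inference
    with torch.no_grad():
        outputs = [model(image) for _ in range(n_iter)]
    return torch.stack(outputs).mean(dim=0)  # averaged output per case
```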
Before correction, motion artifacts distorted the cross-sectional structure of the coronary arteries, blurred the vessel edges, and induced CT number errors at the iodine-enhanced lumen. Both Strategies #1 and #2 improved the delineation, brightness (signal intensity), and SSIM of the coronary arteries, as illustrated in
With the same training-inference strategy, CNN #3 (e.g., the hybrid network of
The three CNNs yielded comparable SSIM (as illustrated in the table above). The 140 ms SSCT image showed clear broadening of the LAD with blurry boundaries (in both curved reformat and axial images) compared to the reference 75 ms DSCT image.
Accordingly, embodiments described herein provide methods and systems for deep learning-based medical image motion artifact correction and, more particularly, for cardiac CT images. The embodiments described herein improve SSCT cardiac image quality to approach that of DSCT by reducing motion artifacts using deep learning-based approaches. As described herein, three deep CNNs with customized attention and spatial transformer network (STN) techniques were developed to directly synthesize DSCT images, using pseudo SSCT images as inputs. The CNNs disclosed herein do not require access to patient raw projection data or advanced cardiac motion simulation tools. In the pilot study, the CNNs improved the delineation, structural accuracy, and CT number accuracy of the iodine-enhanced coronary arteries. The embodiments described herein demonstrate the potential for improving image quality in cardiac exams with SSCT.
The present disclosure has described one or more preferred embodiments, and it should be appreciated that many equivalents, alternatives, variations, and modifications, aside from those expressly stated, are possible and within the scope of the invention.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/321,881, filed on Mar. 21, 2022, and entitled “Deep Learning-Based Medical Image Motion Artifact Correction,” which is herein incorporated by reference in its entirety.
This invention was made with government support under EB028590 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2023/064787 | 3/21/2023 | WO |

Number | Date | Country
---|---|---
63321881 | Mar 2022 | US