This disclosure relates to a system using feature maps to estimate motion in computed tomography (CT) scanned images and to compensate for the estimated motion in reconstruction of computed tomography (CT) scanned images, and more particularly to using image registration and deep learning (DL) networks to estimate the motion from the feature maps of the computed tomography (CT) scanned images.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Computed tomography (CT) systems and methods are widely used, particularly for medical imaging and diagnosis. CT systems generally create images of one or more sectional slices through a subject's body. A radiation source, such as an X-ray source, irradiates the body from one side. At least one detector on the opposite side of the body receives radiation transmitted through the body. The attenuation of the radiation that has passed through the body is measured by processing electrical signals received from the detector.
X-ray CT has found extensive clinical applications in cancer, heart, and brain imaging. As CT has been increasingly used for a variety of applications including, e.g., cancer screening and pediatric imaging, there has arisen a push to reduce the radiation dose of clinical CT scans to be as low as reasonably achievable. For low-dose CT, the image quality can be degraded by many factors, such as high quantum noise, challenging scanning geometry (e.g., large cone angle, high helical pitch, truncation, etc.), and other non-ideal physical phenomena (e.g., scatter, beam hardening, crosstalk, metal, etc.).
Eliminating motion artifacts is one of the most challenging issues in CT imaging. The artifacts result from two major types of motion: respiratory motion and cardiac motion. With non-cooperative patients or in an emergency case, breathing motion or motion of the skull can compromise image quality in CT scans. Motion artifacts of lung structures are unavoidable in routine practice of chest CT scans. The pulsatile motion of the heart and the involuntary motion of the diaphragm are the main causes of lung motion, which occurs even under a breath-hold during CT scans.
As a result, motion artifacts of various lung structures, such as lung parenchyma, pulmonary vessels, or airways, are often visible in routine CT images. These artifacts impose challenges in the diagnosis of the lung using CT because they mimic various lung diseases: doubled edges can mimic bronchiectasis, and CT-value bias can mimic cysts, emphysema, or ground glass opacity (GGO) nodules. In addition, motion artifacts are one of the main challenges in quantitative chest CT. Likewise, motion estimation in cardiac CT images is crucial for the evaluation of human heart anatomy and function.
Conventional CT image reconstruction methods generally assume that an object is stationary during data acquisition. Artifacts can severely affect a diagnosis that uses these reconstructed images, especially if the imaged features are small. Correction methods are required to detect motion artifacts and correct them in the reconstruction of images. For example, plaque formed in coronary arteries is generally indicative of a risk of a potential heart attack but is difficult to image due to its small size.
Developing efficient correction methods can be challenging due to the difficulties of accurately formulating the forward model and solving a complicated inverse problem. While motion correction of the coronary vessels requires a local motion model, global motion models are sufficient for organs such as, for example, the lung or the skull.
Although many innovative technologies have been developed during the past decades to improve CT image quality, such as model-based iterative image reconstruction for breathing-motion correction or lung-motion correction, conventional imaging techniques employ brute-force approaches to mitigate the effects of motion artifacts in CT imaging. Some of these techniques include employing two X-ray tube/detector pairs angularly offset from each other, using a heavier or higher-power tube combined with spinning the gantry faster, or combining data from successive heart cycles. These techniques and methods, however, are often time consuming, pose a number of computational challenges, and require expensive hardware. Particularly, for some challenging scenarios, the image quality is still inferior.
Accordingly, there exists a need to develop an efficient method for motion estimation and for the generation of motion artifact-free CT images using motion compensation reconstruction techniques. Additionally, improved methods are desired in order to reduce computational time, hardware costs, and to further improve CT image quality. The embodiments disclosed herein meet such a need.
In one embodiment, there is provided a medical image processing method that includes: obtaining a set of projection data acquired in a computed tomography (CT) scan of a three-dimensional region of an object to be examined; generating, for each time point of a plurality of time points of the CT scan, based on a part of the obtained set of projection data corresponding to the time point, a pair of feature maps for estimating motion at the time point so as to generate a plurality of pairs of feature maps, each feature map representing a feature of an image reconstructed from the part of the obtained set of projection data; estimating, based on the generated plurality of pairs of feature maps, a four-dimensional motion field, wherein the four-dimensional motion field indicates a change of motion of the object in the three-dimensional region over time; and reconstructing, based on the estimated four-dimensional motion field and the obtained set of projection data, a CT image of the object.
In another embodiment, there is provided a medical image processing apparatus comprising processing circuitry configured to: obtain a set of projection data acquired in a computed tomography (CT) scan of a three-dimensional region of an object to be examined; generate, for each time point of a plurality of time points of the CT scan, based on a part of the obtained set of projection data corresponding to the time point, a pair of feature maps for estimating motion at the time point so as to generate a plurality of pairs of feature maps, each feature map representing a feature of an image reconstructed from the part of the obtained set of projection data; estimate, based on the generated plurality of pairs of feature maps, a four-dimensional motion field, wherein the four-dimensional motion field indicates a change of motion of the object in the three-dimensional region over time; and reconstruct, based on the estimated four-dimensional motion field and the obtained set of projection data, a CT image of the object.
A more complete understanding of this disclosure is provided by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
To address the above-identified challenges of known reconstruction methods for medical images, the methods described herein have been developed in order to estimate motion and perform reconstruction to generate motion-artifact-free images, and to further improve the image quality of medical images, such as lung and cardiac computed tomography (CT) images. Further, the examples provided herein of applying these methods are non-limiting, and the methods described herein can be applied to other medical imaging modalities, such as MRI, PET/SPECT, etc., by adapting the framework described herein.
Accordingly, the discussion herein discloses and describes merely exemplary implementations of the present disclosure. As will be understood by those skilled in the art, the present disclosure may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the present disclosure is intended to be illustrative, but not limiting of the scope of the invention, as well as other claims. The disclosure, including any readily discernible variants of the teachings herein, defines, in part, the scope of the foregoing claim terminology such that no inventive subject matter is dedicated to the public.
As discussed above, CT image quality can be degraded by many factors, such as respiratory motion, cardiac motion, and other non-ideal physical phenomena. Developing efficient correction methods can be challenging because existing techniques are often time consuming, pose a number of computational challenges, and require expensive hardware. Particularly, for some challenging scenarios, the resultant image quality is still inferior.
To address the above-identified challenges of known methods, the methods described herein use feature-map-based motion estimation and motion-compensated reconstruction. Instead of using a conventional CT image, the methods use a feature map, which presents edge and high-frequency information of the scanned object. For example, in certain implementations of the methods described herein, filters are utilized to obtain a feature map of the scanned image. The feature map is further utilized to generate a motion field, which is eventually utilized to compensate for the motion during reconstruction and to generate motion-artifact-free images. In particular, various implementations of the methods described herein provide several advantages over previous methods of image reconstruction.
In certain implementations of the methods described herein, the generation of the motion field is achieved through image registration. First, image registration is performed utilizing the feature map, and then the motion field is generated using the results of the image registration. Image registration, also known as image fusion or image matching, is the process of aligning two or more images based on image appearance. Medical image registration seeks to find an optimal spatial transformation that best aligns the underlying anatomical structures in the images. Medical image registration is used in many clinical applications such as image guidance, motion tracking, segmentation, dose accumulation, image reconstruction, etc.
In certain implementations of the methods described herein, the generation of a motion field is achieved through deep learning (DL) networks. In certain implementations, offline training of a DL network is performed, and the trained network is then embedded in the reconstruction step. In general, DL networks have been adapted to image processing for improving image spatial resolution and reducing noise. As compared to traditional methods, deep learning does not require accurate modelling, relying instead on learning from training data sets. Therefore, the methods described herein can achieve better image quality in terms of motion artifact-free images than previous methods.
For example, the methods disclosed herein leverage improvements in various research areas whereby DL-based convolutional neural networks (CNN) can be used to generate motion-artifact-free reconstructed CT images. Training data corresponding to different CT scanning methods and scanning conditions can be used to train various CNN networks to be tailored for projection data corresponding to particular CT scanning methods, protocols, applications, and conditions by using training data selected to match the particular CT scanning methods, protocols, applications, and conditions. Thus, respective CNN networks can be customized and tailored to certain conditions and methods for CT scanning.
Additionally, the customization of CNN networks can extend to the motion field of the projection data and can be extended to the anatomical structure or region of the body being imaged. The methods described herein can be applied to motion estimation and compensation of reconstructed CT images. Further, the redundancy of information in adjacent slices of a three-dimensional CT image can be used to perform volumetric-based DL by using a kernel for the convolution layers of the DL network that extends to pixels in slices above and below a slice that is being denoised. In general, DL can be adapted to image processing for estimation of motion and generation of motion-free images.
Accordingly, feature maps, image registration, and convolutional neural networks are used in the embodiments herein to obtain a motion field of the scanned image. The motion field is further utilized for motion compensation in the scanned image, resulting in a high-quality, motion-free scanned CT image.
Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views,
In step 120, raw projection data 115 is obtained from a projection view. More generally, data 115 can be image data obtained from a computed tomography (CT) apparatus through the process of back projection. Back projection is the process of mapping the raw projection data to an image. Generally, the projection data from several cycles is combined to reconstruct the final image. In the embodiments disclosed herein, partial raw projection data is obtained in step 120 for further processing.
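By way of a hedged illustration only, the relationship between projection data and back projection can be sketched in a few lines of Python; the scikit-image radon/iradon routines and the toy phantom below are stand-ins for the scanner's actual acquisition and reconstruction chain, not the apparatus described herein.

    # Illustrative sketch: forward projection of a toy object and filtered
    # back projection of the resulting sinogram (projection data) into an image.
    import numpy as np
    from skimage.transform import radon, iradon

    phantom = np.zeros((128, 128), dtype=np.float32)
    phantom[48:80, 48:80] = 1.0                       # toy scanned object

    angles_deg = np.linspace(0.0, 180.0, 180, endpoint=False)
    sinogram = radon(phantom, theta=angles_deg)       # analogue of raw projection data 115

    # Back projection maps the projection data back into image space.
    reconstruction = iradon(sinogram, theta=angles_deg, filter_name="ramp")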
In step 140, feature map reconstruction is performed using the raw data 115 to generate a feature map 135 of the scanned object. A feature map can represent, for example, edge and high-frequency information of the scanned object. A feature map captures information in an image that can usually be viewed as a collection of connected regions, each with basic features such as edges, frequency components related to some special image pattern, image intensity/color, prior shape, and texture. The aim of feature map reconstruction is to divide an image, according to its different features, into specific regions that are adjacent to each other and do not overlap, and to group these regions into classes, with regions having the same or similar features placed in the same class and with basic features differing between classes. In the embodiments disclosed herein, feature map generation is used to achieve an understanding of any object motion that occurred during scanning.
In the embodiments disclosed herein, the feature map reconstruction utilizes feature-enhancement filters and information about known parameters of the object being scanned, such as, for example, parameters of a heart when the scanned object is the heart of a patient. Feature-enhancement filters such as, for example, high-pass filters, low-pass filters, and bandpass filters can be used. In an example of the embodiments disclosed herein, various segmentation methods can be used for feature map reconstruction, including traditional segmentation and DL-based segmentation methods. Histogram-based image value conversion can also be utilized to enhance features, either to enlarge the difference in value between organs and tissues in an image or to reduce the differences between some special tissues and organs. Generally, feature-enhancement filters are used for highlighting certain information, for example, the features in the image, as well as weakening or removing unnecessary information, such as noise, based on the known parameters of the scanned object. The highlighted information is presented as a feature map of the scanned object.
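As a hedged sketch of one possible feature-enhancement filter, the following Python function applies a band-pass (difference-of-Gaussians) response that highlights edges and high-frequency structure; the function name, parameter values, and the final value conversion are illustrative choices rather than a prescribed design.

    # Band-pass feature enhancement: highlight edges and high-frequency structure,
    # suppress slowly varying background and some pixel noise.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def feature_map(ct_slice, sigma_fine=1.0, sigma_coarse=4.0):
        fine = gaussian_filter(ct_slice, sigma=sigma_fine)      # keeps edges, removes fine noise
        coarse = gaussian_filter(ct_slice, sigma=sigma_coarse)  # smooth background estimate
        band_pass = fine - coarse                                # edge / high-frequency content
        # Simple histogram-style value conversion: emphasize strong responses.
        return np.tanh(band_pass / (np.std(band_pass) + 1e-6))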
In step 160, the feature map 135 is used and a motion estimation is performed to obtain a motion field 155 of the scanned object. A motion field provides a detailed analysis of all the motion detected in the scanned image. The motion field can be a collection of the positions of all features of a scanned image at all time points of a scan. This information is used to determine any change in position in a three-dimensional region at each increment of time, i.e., an indication of motion in the scanned data.
In step 180, the motion field 155 output from step 160 is used to perform motion-compensated reconstruction. Information from motion field 155 is used in the motion-compensated reconstruction to obtain motion correction image 175. In the embodiments disclosed herein, the motion-compensated reconstruction utilizes information from motion field 155 and removes the corresponding motion artifacts to produce the motion correction image 175 as a motion-artifact-free image.
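A deliberately simplified sketch of the idea behind step 180 is given below: each partial reconstruction is warped by the motion field estimated for its time point before the partial images are combined. The helper names and the plain averaging are illustrative assumptions; an actual motion-compensated reconstruction applies the motion model inside the reconstruction algorithm itself.

    # Warp each partial volume by its estimated motion field, then combine.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def warp_3d(volume, mvf):
        """Warp a 3D volume by a motion field mvf of shape (3, Z, Y, X)."""
        grid = np.meshgrid(*[np.arange(s) for s in volume.shape], indexing="ij")
        coords = [g + d for g, d in zip(grid, mvf)]   # displaced sampling positions
        return map_coordinates(volume, coords, order=1, mode="nearest")

    def motion_compensated_average(partial_volumes, motion_fields):
        """Average partial reconstructions after warping each to a common phase."""
        warped = [warp_3d(p, m) for p, m in zip(partial_volumes, motion_fields)]
        return np.mean(warped, axis=0)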
In step 230, two data ranges corresponding to projection views on opposite sides of time point T1 on the trajectory are selected, as shown in
In step 250, feature map reconstruction is performed by extracting features from the projection data corresponding to ranges 235 and 235′. Conventionally, complete image reconstruction is performed from the data ranges of the projection view of the scanned object. In the implementation of the embodiments disclosed herein, only partial reconstruction is performed to obtain the feature map. Step 250 can be implemented in either of the two ways disclosed herein. In a first implementation of the example embodiments, projection-based feature extraction is performed.
In the first implementation of step 250, features from the raw projection data of the two selected data ranges are extracted and enhanced, and a feature map is generated. To perform such enhancement, a feature-enhancement filter is used. Generally, a high-pass filter or a band-pass filter can be used for feature enhancement. In certain example embodiments, a nonlinear transform can also be used to perform the enhancement. In one example, the feature extraction and enhancement are performed by implementing a deep neural network.
In this example, a feature map is generated from the extracted features using any of the various methods described above. Back-projected, feature-enhanced projection data can also be used to generate a feature map of the scanned object. For the two selected data ranges of the projection data, two feature maps are generated, forming a feature map pair, for example, FM11 and FM12. The feature map pair for the first two data ranges can be, for example, of high temporal resolution, due to the use of short data ranges.
In a second implementation of step 250, image-based feature extraction is performed. For such an implementation, back-projected projection data corresponding to the two selected data ranges is used to obtain a partial reconstruction image pair, say P11 and P12. Next, the partial reconstruction image pair P11 and P12 is used to generate the feature map pair FM11 and FM12. As described above, each of the feature maps FM11 and FM12 is generated by using feature-enhancement filters, for example, high-pass filters or band-pass filters, a nonlinear transformation, or a feature extraction deep neural network.
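The image-based variant can be sketched as follows, with limited-angle filtered back projection standing in for the partial reconstruction and a band-pass filter (as in the earlier sketch) standing in for feature enhancement; the names P11/P12 and FM11/FM12 follow the description above, while the sinogram and angle variables are assumed placeholders for the two selected data ranges.

    # Partial reconstruction from one short data range, then feature enhancement.
    import numpy as np
    from skimage.transform import iradon
    from scipy.ndimage import gaussian_filter

    def partial_recon(sinogram_range, angles_deg_range):
        """Back-project only the projections in one selected (short) data range."""
        return iradon(sinogram_range, theta=angles_deg_range, filter_name="ramp")

    def feature_map(img, sigma_fine=1.0, sigma_coarse=4.0):
        return gaussian_filter(img, sigma_fine) - gaussian_filter(img, sigma_coarse)

    # Assumed inputs: sinogram_a/angles_a and sinogram_b/angles_b are the two data
    # ranges on either side of time point T1.
    # P11 = partial_recon(sinogram_a, angles_a);  FM11 = feature_map(P11)
    # P12 = partial_recon(sinogram_b, angles_b);  FM12 = feature_map(P12)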
Next, steps 210 and 250 can be repeated for additional time points of the trajectory; for example, a new time point T2 is selected on the trajectory, as shown in
The generated feature map pairs are used to generate a motion field. The computation of accurate motion fields is a crucial aspect of 4D medical imaging because motion causes changes in the image intensities over time. In the embodiments disclosed herein, the feature maps are used to produce a 4D motion field of the scanned object.
In a first embodiment of a 4D motion estimation, a first step towards generation of a 4D motion field is 3D image registration.
In step 310, a 3D image registration, e.g., traditional non-rigid image registration, is performed. In order to perform this step, the feature map pairs 335, 335′ obtained from step 250, corresponding to all time points on the trajectory of the scan, are utilized as inputs. In one implementation, the 3D image registration generates a 3D object shape using the feature map pairs at each time point.
In the embodiments described above, the result of each 3D image registration, i.e., the output of step 310, is further used in step 330 to generate a 3D motion field 355 at each time point, for example, MVF1(x, y, z) corresponding to feature map pair FM11 and FM12 at time point T1, MVF2(x, y, z) corresponding to feature map pair FM21 and FM22 at time point T2, up to MVFN(x, y, z) corresponding to the Nth feature map pair FMN1 and FMN2 at time point TN.
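One possible registration-based estimate of a single-time-point motion field is sketched below, using SimpleITK's demons filter as a stand-in for the traditional non-rigid registration of step 310; the iteration count and smoothing value are illustrative assumptions, and the returned displacement field plays the role of MVF1(x, y, z) for the pair FM11/FM12.

    # Non-rigid (demons) registration of a feature map pair -> per-voxel displacement field.
    import numpy as np
    import SimpleITK as sitk

    def motion_field_from_pair(fm_a, fm_b):
        fixed = sitk.GetImageFromArray(np.asarray(fm_a, dtype=np.float32))
        moving = sitk.GetImageFromArray(np.asarray(fm_b, dtype=np.float32))

        demons = sitk.DemonsRegistrationFilter()
        demons.SetNumberOfIterations(50)
        demons.SetStandardDeviations(1.5)          # Gaussian smoothing of the field
        displacement = demons.Execute(fixed, moving)

        # Array of shape (Z, Y, X, 3): one 3D displacement vector per voxel.
        return sitk.GetArrayFromImage(displacement)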
In step 350, the 3D motion fields 355 (MVF1(x, y, z), MVF2(x, y, z), MVF3(x, y, z), . . . , MVFN(x, y, z)) are collated through motion fitting in time to obtain a 4D motion field MVF(x, y, z, t) 375. The 4D motion field 375 provides a detailed description of any motion of the scanned object.
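The motion fitting in time of step 350 can be sketched as a simple temporal interpolation of the per-time-point fields; cubic splines are one assumed choice among many, and the argument names below are placeholders.

    # Stack the 3D motion fields at T1..TN and interpolate along t to get MVF(x, y, z, t).
    import numpy as np
    from scipy.interpolate import CubicSpline

    def fit_4d_motion_field(fields, times):
        """fields: list of arrays of shape (Z, Y, X, 3) at times T1..TN."""
        stacked = np.stack(fields, axis=0)              # (N, Z, Y, X, 3)
        return CubicSpline(np.asarray(times), stacked, axis=0)

    # Usage sketch: mvf = fit_4d_motion_field([mvf1, mvf2, mvf3], [t1, t2, t3])
    #               field_at_t = mvf(0.42)              # 3D motion field at time t = 0.42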
In a second embodiment of 4D motion estimation, a 3D deep convolutional neural network (3D DCNN) is trained to output the 3D motion field based on input feature maps. Deep Convolutional Neural Networks (DCNN) are Deep Learning (DL) networks with a higher number of hidden layers, usually more than five, which are used to extract more features and increase the accuracy of the prediction. In general, DL can be adapted to image processing for improving image spatial resolution and reducing noise. As compared to traditional methods, DL does not require accurate noise and edge modelling and only relies on training data sets. Further, DL has the capability to capture the interlayer image features and build up a sophisticated network between noisy observations and latent clean images.
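A deliberately small PyTorch sketch of the kind of 3D DCNN meant here is shown below: it maps a two-channel feature map pair to a three-channel 3D motion field (one channel per displacement component). The depth, width, and layer choices are illustrative assumptions rather than the network 435 itself.

    # Minimal 3D CNN: (batch, 2, Z, Y, X) feature map pair -> (batch, 3, Z, Y, X) motion field.
    import torch
    import torch.nn as nn

    class MotionFieldCNN3D(nn.Module):
        def __init__(self, width=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(2, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv3d(width, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv3d(width, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv3d(width, 3, kernel_size=3, padding=1),   # (dx, dy, dz) per voxel
            )

        def forward(self, pair):
            return self.net(pair)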
The process 410 performs offline training of the 3D DL network 435, resulting in the 3D DL network being output in step 430. In one example, the offline 3D DL training process 410 trains the 3D DL network 435 using a large number of sets of 3D feature map pairs 415, for example, FM11 and FM12, generated, for example, in the process shown in
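Assuming reference motion fields (for example, from registration or from simulation) are available as supervision, the offline training of process 410 could look like the following sketch; the data loader, loss, and optimizer are illustrative placeholders rather than the training procedure actually used.

    # Supervised offline training sketch for the 3D network defined above.
    import torch

    model = MotionFieldCNN3D()                                  # network sketched above
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()

    def train_epoch(loader):
        model.train()
        for pair, target_mvf in loader:                         # (B,2,Z,Y,X), (B,3,Z,Y,X)
            optimizer.zero_grad()
            loss = loss_fn(model(pair), target_mvf)             # cost between prediction and reference
            loss.backward()
            optimizer.step()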
In process 440, in step 450, each of the feature map pairs 445, 445′ is input to the trained 3D DL network 435 to generate a corresponding 3D motion field at each time point, for example, MVF1(x, y, z) corresponding to feature map pair FM11 and FM12 at time point T1, MVF2(x, y, z) corresponding to feature map pair FM21 and FM22 at time point T2, up to MVFN(x, y, z) corresponding to the Nth feature map pair FMN1 and FMN2 at time point TN. The 3D motion field 465 at each time point corresponds to the position of each part of the scanned object, i.e., a feature, in the three-dimensional region.
In step 460, the 3D motion fields 465 at each time point (MVF1(x, y, z), MVF2(x, y, z), MVF3(x, y, z), . . . , MVFN(x, y, z)) are collated through motion fitting in time to obtain a 4D motion field MVF(x, y, z, t) 485. The 4D motion field 485 provides a detailed description of any motion in the scanned object.
The process 510 of method 500 performs 4D offline training of the 4D DL network 535. In step 530, a set of feature map pairs 515, such as, for example, FM11 and FM12, is used as training data to train a 4D DL network 535, similar to training the 3D DL network 435. The training results in the 4D DL network being output from step 530.
In process 540, each of the feature map pairs 545, 545′ is input to the trained 4D DL network 535 to generate a 4D motion field, 4D MVF(x, y, z, t) 565. The 4D motion field 565 provides a detailed description of any motion of the scanned object.
Mathematically, a neuron's network function is defined as a composition of other functions, each of which can further be defined as a composition of still other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables, as shown in
In
The 3D DL network 435 or the 4D DL network 535 operates to achieve a specific task, such as motion artifact estimation of a CT image or denoising of a CT image, by searching within the class of functions F, using a set of observations, to find the function f* in F that solves the specific task in some optimal sense. For example, in certain implementations, this can be achieved by defining a cost function C such that, for the optimal solution f*, C(f*) ≤ C(f) for all f in F (i.e., no solution has a cost less than the cost of the optimal solution). The cost function C is a measure of how far away a particular solution is from an optimal solution to the problem to be solved (e.g., the error). Learning algorithms iteratively search through the solution space to find a function that has the smallest possible cost. In certain implementations, the cost is minimized over a sample of the data (i.e., the training data).
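A tiny numeric illustration of this cost-function view of learning, with an assumed toy candidate set and a mean-squared-error cost, is given below; it only shows that the "optimal" function is the one whose cost over the observations is smallest.

    # Pick, from a small candidate set F, the function with the smallest cost C.
    import numpy as np

    x_obs = np.linspace(0.0, 1.0, 20)
    y_obs = 2.0 * x_obs + 0.1 * np.random.randn(20)     # noisy observations

    candidates = {s: (lambda x, s=s: s * x) for s in (0.5, 1.0, 2.0, 3.0)}
    cost = {s: float(np.mean((f(x_obs) - y_obs) ** 2)) for s, f in candidates.items()}

    best = min(cost, key=cost.get)                      # C(f*) <= C(f) for every candidate f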
Following a convolutional layer, a CNN can include local and/or global pooling layers, which combine the outputs of neuron clusters in the convolution layers. Additionally, in certain implementations, the CNN can also include various combinations of convolutional and fully connected layers, with pointwise nonlinearity applied at the end of, or after, each layer.
CNNs have several advantages for image processing. To reduce the number of free parameters and improve generalization, a convolution operation on small regions of input is introduced. One significant advantage of certain implementations of CNNs is the use of shared weights in convolutional layers, which means that the same filter (weight bank) is used as the coefficients for each pixel in the layer; this both reduces memory footprint and improves performance. Compared to other image-processing methods, CNNs advantageously use relatively little pre-processing. This means that the network is responsible for learning the filters that in traditional algorithms were hand-engineered. The lack of dependence on prior knowledge and human effort in designing features is a major advantage of CNNs.
As shown in
It can be appreciated that, in one embodiment, the implementations of motion estimation and compensation described above are applicable to a computed tomography (CT) apparatus or scanner.
An embodiment of an X-ray computed tomography (CT) apparatus according to the present disclosure will be described below with reference to the views of the accompanying drawing. Note that X-ray CT apparatuses include various types of apparatuses, e.g., a rotate/rotate-type apparatus in which an X-ray tube and X-ray detector rotate together around an object to be examined, and a stationary/rotate-type apparatus in which many detection elements are arrayed in the form of a ring or plane, and only an X-ray tube rotates around an object to be examined. The present disclosure can be applied to either type. In this case, the rotate/rotate type, which is currently the mainstream, will be exemplified.
The multi-slice X-ray CT apparatus further includes a high voltage generator 9909 that generates a tube voltage applied to the X-ray tube 9901 through a slip ring 9908 such that the X-ray tube 9901 generates X-rays. An X-ray detector 9903 is located at an opposite side from the X-ray tube 9901 across the object for detecting the emitted X-rays that have transmitted through the object. The X-ray detector 9903 is, for example, a photon-counting detector. The X-ray detector, or photon-counting detector, 9903 further includes individual detector elements or units, such as, for example, processing circuitry.
The CT apparatus further includes other devices for processing the detected signals from X-ray detector 9903. A data acquisition circuit or a Data Acquisition System (DAS) 9904 converts a signal output from the X-ray detector 9903 for each channel into a voltage signal, amplifies the signal, and further converts the signal into a digital signal. The X-ray detector 9903 and the DAS 9904 are configured to manage a predetermined total number of projections per rotation (TPPR).
The above-described data is sent through a non-contact data transmitter 9905 to a preprocessing device 9906, which is housed in a console outside the radiography gantry 9900. The preprocessing device 9906 performs certain corrections, such as sensitivity correction, on the raw data, as well as the various implementations of the motion estimation and compensation in CT scan images.
In an embodiment, the pre-processing device 9906 implements the various implementations of the motion estimation and compensation in CT scan images, as described in embodiments above. A memory 9912 stores the resultant data, which is also called projection data at a stage immediately before reconstruction processing. The memory 9912 is connected to a system controller 9910 through a data/control bus 9911, together with a reconstruction device 9914, input device 9915, and display 9916. The system controller 9910 controls a current regulator 9913 that limits the current to a level sufficient for driving the CT system.
The detectors are rotated and/or fixed with respect to the object being scanned, such as the patient, among various generations of the CT scanner systems. In one implementation, the above-described CT system can be an example of a combined third-generation geometry and fourth-generation geometry system. In the third-generation system, the X-ray tube 9901 and the X-ray detector 9903 are diametrically mounted on the annular frame 9902 and are rotated around the object as the annular frame 9902 is rotated about the rotation axis RA. In the fourth-generation geometry system, the detectors are fixedly placed around the patient and an X-ray tube 9901 rotates around the patient. In an alternative embodiment, the radiography gantry 9900 has multiple detectors arranged on the annular frame 9902, which is supported by a C-arm and a stand.
The memory 9912 can store the measurement value representative of the irradiance of the X-rays at the X-ray detector unit 9903. Further, the memory 9912 can store a dedicated program for executing various steps of the methods for motion estimation and compensation in CT scan images, such as example method 100,
Post-reconstruction processing performed by the reconstruction device 9914 can include filtering and smoothing the image, volume rendering processing, and image difference processing as needed. The image reconstruction process can implement various CT image reconstruction methods. The reconstruction device 9914 can use the memory to store, e.g., projection data, reconstructed images, calibration data and parameters, and computer programs.
Further, the memory 9912 can be non-volatile, such as ROM, EPROM, EEPROM or FLASH memory. The memory 9912 can also be volatile, such as static or dynamic RAM, and a processor, such as a microcontroller or microprocessor, can be provided to manage the electronic memory as well as the interaction between the FPGA or CPLD and the memory.
In one implementation, the reconstructed images can be displayed on a display 9916. The display 9916 can be an LCD display, CRT display, plasma display, OLED, LED, or any other display known in the art. The memory 9912 can be a hard disk drive, CD-ROM drive, DVD drive, FLASH drive, RAM, ROM, or any other electronic storage known in the art.
While certain implementations have been described, these implementations have been presented by way of example only, and are not intended to limit the teachings of this disclosure. Indeed, the novel methods, apparatuses and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions, and changes in the form of the methods, apparatuses and systems described herein may be made without departing from the spirit of this disclosure.