X-ray computed tomography (CT) is an imaging modality with broad clinical use in disease diagnosis, therapy monitoring, and image-guided intervention across all major anatomical regions. CT scans acquire x-ray projection data that must be reconstructed to form human-interpretable three-dimensional images of the patient's anatomy. This image reconstruction process requires setting parameters such as the reconstruction slice thickness and convolution kernel, which can greatly affect the noise level and the spatial resolution of the resulting images.
CT scanners differ substantially in x-ray sources, detectors, data acquisition systems, and reconstruction algorithms, but the general clinical workflow, including generating image series, image archival, retrieval, and display, remains very similar among most scanners. Images with different kernels (e.g., smooth, medium-sharp, and sharp), slice thicknesses, and intervals, in different 3D planes, are typically reconstructed from CT projection data. Some of these images can also be generated by 3D reformatting of already reconstructed images.
In order to properly assess the patient's medical condition, radiologists interpret image features from a variety of tissue types. Some diagnostic tasks rely on the detection of low-contrast features, whereas other tasks may require resolving small anatomical details at higher contrast levels. Low-contrast image features are most conspicuous at low noise levels, which is typically achieved using thicker reconstructed slices and low-pass reconstruction kernels at the cost of increased partial volume averaging and worse axial spatial resolution. Small-scale and high-contrast image features, on the other hand, can be better interpreted using thin slices and high-pass kernels at the cost of increased noise levels.
Selection of an appropriate reconstruction kernel is a step in reconstructing CT images that can greatly influence the appearance and clinical utility of the reconstructed image. A very sharp kernel results in well-defined edges and high spatial resolution, but also amplifies image noise. A very smooth kernel, on the other hand, reduces noise but blurs sharp edges and fine anatomical details. This fundamental tradeoff means that diagnostic tasks may require multiple image sets, each reconstructed with a different kernel, to evaluate all aspects of the images and achieve an accurate diagnosis.
Due to this inherent tradeoff between image noise and spatial resolution, there is currently no single image volume (often called a series) that is optimal for all diagnostic tasks, including challenging scenarios in which high spatial resolution is desired at low contrast levels. Image-based diagnosis is complex, covering diverse anatomy and serving several objectives; thus, it is routine in the clinical setting to reconstruct several image series from the same projection data. CT head trauma imaging is one such application: it employs thin slices reconstructed with sharp kernels to detect fractures, as well as thick and thin slices reconstructed with smooth kernels for soft-tissue imaging of the brain to detect hemorrhages and infarctions, resulting in at least three reconstructed series per scan.
Which image series are created depends on the clinical exam and diagnostic task. For example, trauma, musculoskeletal, thoracic, and neurological CT exams can require ten or more separate reconstructions for a single patient scan, many of which use different kernel settings. Creating and storing large numbers of image series places a heavy burden on technologists and on the archival system, increases reconstruction time, and slows down the scanner, which is problematic in a busy clinical environment. In addition, even if the exam protocol specifies a large number of image series at a variety of kernels and slice thicknesses, one may later want an additional image series reconstructed in a manner that differs greatly from all archived images. Since CT projection data are typically deleted from the scanner within a few days after the exam, the data are unlikely to be available to generate the needed image series at a later date.
One potential solution that could preserve the ability to reconstruct images with different kernels at any time is to archive the CT projection data. There are many reasons why this is not feasible. First, CT projection data are usually large files that are difficult to transfer through a network. Second, most archival systems do not support the format of CT projection data, which are encoded with proprietary information by the manufacturers of the CT systems. Most importantly, even if the CT projection data are available and can be interpreted correctly, the reconstruction system needed to generate images from the projection data might not be readily available. This is because the reconstruction system is vendor-specific and scanner-model-specific and may be upgraded from time to time, making older projection data unusable.
Other approaches have relied upon replacing smooth-kernel image voxels that correspond to high-contrast bone anatomy with sharp-kernel voxels, thereby producing a single image whose diagnostic performance matches that of the two individual kernel images. However, such threshold-based multi-kernel methods are prone to stair-step discontinuity artifacts at the brain-skull boundary and require matching slice thicknesses, which may be suboptimal for each kernel. Other approaches based on iterative reconstruction spatially vary the regularization function over different anatomy to control local smoothness, but at the cost of increased computation time.
Compared to abdominal imaging, head imaging poses additional challenges. Interpreting CT images of the head is a challenging diagnostic task that requires attention to both low-contrast features in the brain and high-resolution features in the skull. In addition to high spatial resolution requirements to detect fractures in the skull, Hounsfield unit (HU) differences in the brain, particularly between gray and white matter, are especially small. For these reasons, multiple image series are routinely reconstructed with different kernel and slice thickness configurations to meet these image quality requirements. This places a burden on technologists and radiologists to prepare and interpret different image series for a single exam.
Navigating and maintaining the plethora of available reconstruction kernels also creates numerous opportunities for errors, and therefore introduces a substantial burden on CT manufacturers, technologists, and radiologists reading CT images. There exists a need for both reducing the burden of multiple kernels and also for preserving the ability to generate different images with different kernels after the projection data has been lost.
The present disclosure addresses the aforementioned drawbacks by providing a system and method for synthesizing information from multiple image series of different kernels into a single image series using deep-learning based methods trained using a task-based loss function that includes a sharp loss term and a smooth loss term that parameterize training. For multi-kernel synthesis, a single set of images with desired high spatial resolution and low image noise can be synthesized from multiple image series of different kernels. The synthesized image series is sufficient for a wide variety of clinical tasks, even in circumstances that would otherwise require many separate image sets.
In one configuration, a method is provided for synthesizing computed tomography (CT) image series. The method includes reconstructing at least two CT image series. The at least two image series are reconstructed with different reconstruction kernels. The method also includes synthesizing at least one new CT image series by applying the at least two CT image series to an artificial neural network that has been trained on training data using a task-based loss function comprising a sharp loss term that trains the neural network for similarity with a sharp kernel target and a smooth loss term that trains the neural network for similarity with a smooth kernel target.
In one configuration, a system is provided for synthesizing computed tomography (CT) image series. The system includes a computer system configured to reconstruct at least two CT image series. The at least two image series are reconstructed with different reconstruction kernels. The computer system is also configured to synthesize at least one new CT image series by applying the at least two CT image series to an artificial neural network that has been trained on training data using a task-based loss function comprising a sharp loss term that trains the neural network for similarity with a sharp kernel target and a smooth loss term that trains the neural network for similarity with a smooth kernel target.
The foregoing and other aspects and advantages of the present disclosure will appear from the following description. In the description, reference is made to the accompanying drawings that form a part hereof, and in which there is shown by way of illustration a preferred embodiment. This embodiment does not necessarily represent the full scope of the invention, however, and reference is therefore made to the claims and herein for interpreting the scope of the invention.
A system and method is provided for synthesizing information from multiple image series reconstructed with different kernels into a single image series, and for generating image series with different kernels from a single image series reconstructed by the scanner. A single set of images may be synthesized with the best qualities of images reconstructed using multiple kernels that can be used for a wide variety of tasks. In addition, a method may employ a loss function with deep learning-based systems, such as a deep convolutional neural network (CNN), to generate images with different kernels from one single image series reconstructed by the scanner with a sharp kernel. This method may be used to generate images at different kernels in just a fraction of the time it takes to run a full reconstruction, and may do so without using the raw projection data.
Diagnostic CT imaging for head trauma is a type of exam with conflicting image quality requirements in terms of noise and spatial resolution, particularly when using filtered back projection (FBP) reconstruction. Diagnostic tasks in the brain require differentiation of very low-contrast image features, which are typically only discernible on thick-slice (e.g., 5 mm) reconstructions using smooth kernels and a very narrow display window. On the other hand, diagnostic tasks in bony regions require high spatial resolution to resolve small-scale features such as skull fractures; thus, very sharp reconstruction kernels (sometimes with edge enhancement) and thin slices are the clinical norm. To bridge the gap between these very different image configurations, the noise level of the sharp-kernel reconstructions may need to be reduced by a factor of 16 or more while maintaining both spatial resolution and soft-tissue contrast.
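As a rough illustration of the size of this gap, assume (as a common approximation for quantum-noise-limited FBP images, not a statement from this disclosure) that the noise standard deviation σ scales inversely with the square root of the product of dose D and slice thickness T:

$$\sigma \propto \frac{1}{\sqrt{D\,T}} \quad\Longrightarrow\quad \frac{\sigma_{\text{before}}}{\sigma_{\text{after}}} = 16 \;\Longleftrightarrow\; \frac{(D\,T)_{\text{after}}}{(D\,T)_{\text{before}}} = 16^{2} = 256$$

Under this assumption, achieving a 16-fold noise reduction through dose or slice thickness alone would require roughly a 256-fold increase in their product, before accounting for the separate effect of the kernel itself.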
Low-pass reconstruction kernels, such as those for the head, may be used with additional post-processing steps to enhance the small differences between gray and white matter and aid in the detection of low-contrast lesions. In some configurations, a method for synthesized Improved Resolution and Concurrent noise reduction (ZIRCON) is provided. The systems and methods may include a denoising and image synthesis CNN model that uses a loss function with two complementary loss terms to parameterize training. The systems and methods have been validated for their ability to reduce noise and enhance soft-tissue contrast. In a non-limiting example, reduced noise and enhanced soft-tissue contrast were achieved in the brain while preserving sharp details in the skull, effectively combining the favorable image quality features of each input kernel into a single image series optimized for head imaging.
The output of the systems and methods may be a single low-noise and high-resolution image series that may be used to perform diagnostic tasks that would otherwise require multiple series with different reconstruction parameters. Processed or synthesized images may have noise levels less than or equal to the noise levels of the corresponding smooth-kernel reconstructions. The synthesized images may maintain the soft-tissue contrast of the corresponding smooth-kernel reconstructions. The synthesized images may maintain the spatial resolution of the corresponding sharp-kernel reconstructions. The synthesized images may also be free of introduced artifacts that could affect diagnostic performance.
Although the advantages and disadvantages of smooth and sharp kernels are understood, it was not previously possible to combine the advantages of images reconstructed using different kernels into a single set of images. No generalized algorithm has previously been developed that is able to blend together images created with multiple kernels to produce an output image that matches or exceeds the clinical utility of each input image individually. Synthesizing information from multiple image series, each reconstructed with a different kernel, into a single image series allows for combining the advantages of different kernels. The application of an artificial neural network circumvents the difficult task of determining hand-selected rules for combining features in the input images. Instead of using predetermined rules, a network training procedure may determine how the input features should be merged to produce an optimal output image. Since applying a trained neural network is very fast, this adds negligible computation time to the image reconstruction process and opens up the possibility of applying the network in real time.
For the purposes of this disclosure and accompanying claims, the term “real time” or related terms are used to refer to and define a real-time performance of a system, which is understood as performance that is subject to operational deadlines from a given event to a system's response to that event. For example, a real-time extraction of data and/or displaying of such data based on empirically-acquired signals may be one triggered and/or executed simultaneously with, and without interruption of, a signal-acquisition procedure.
In some configurations, a convolutional neural network (CNN)-based multi-kernel synthesis may include a task-based loss able to parameterize specific image quality requirements. A CNN denoising and kernel synthesis model may be used with a task-based training routine to generate a single image series with thin slices, low noise, enhanced soft-tissue contrast, and preserved sharp detail. The resulting image series combines the desired image quality properties of each input series into a single thin-slice output with lower noise.
In some configurations, a CNN may be trained to synthesize multiple input image series, each produced with a different reconstruction kernel, into a single output image series that exhibits improved image quality (in terms of high sharpness and low noise levels) compared to each input individually. The CNN architecture may be based on any of a variety of designs, such as a ResNet design, and may include repeated blocks of residual units with a plurality of layers, such as a total of 32 convolutional layers. The CNN inputs may include a selection of original images, such as images produced by soft (e.g., B10), medium-sharp (e.g., B45), and sharp (e.g., B70) kernels, which may be stacked in the channel dimension. The CNN output may be treated as a perturbation that is added to the sharp-kernel input, which may reduce the required training time. The network may be trained using supervised learning with full-dose images, reduced-dose images, simulated images (such as simulated quarter-dose CT images), or other lower-dose images, and the like.
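The channel stacking and residual output described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: `placeholder_network` (a single 3x3 convolution per channel) is a hypothetical stand-in for the 32-layer ResNet-style model, and all function names are illustrative rather than part of the disclosed system.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive 2-D cross-correlation with zero padding and 'same' output size."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def placeholder_network(stacked, weights):
    """Hypothetical stand-in for the trained CNN: one 3x3 filter per input
    channel, summed into a single-channel perturbation map."""
    return sum(conv2d_same(stacked[c], weights[c]) for c in range(stacked.shape[0]))

def synthesize(x_smooth, x_medium, x_sharp, weights):
    # Stack the kernel images along a leading channel dimension: (C, H, W).
    stacked = np.stack([x_smooth, x_medium, x_sharp], axis=0)
    # The network output is treated as a perturbation of the sharp-kernel input.
    return x_sharp + placeholder_network(stacked, weights)
```

Because the prediction is a perturbation, a network with all-zero weights returns the sharp-kernel input unchanged; training then only has to learn the correction, which is one intuition for why this formulation may shorten training.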
In some configurations, the full-dose images reconstructed with a sharp kernel may be used as the ground truth to evaluate a comparative loss function, such as a mean squared error function. In some configurations, a loss function may include a task-based loss function. In a non-limiting example, a task-based loss function may include two task-based loss terms, although any number of terms may be used. A first term may be used to enforce similarity with the sharp-kernel target images in the HU range outside the brain (<0 HU or >80 HU) while penalizing sharp-kernel noise within the brain HU range using a total variation loss. A second term may be used to enforce similarity with the smooth-kernel target images within the brain HU range.
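The two task-based terms can be sketched as follows. Assumptions, none of which is prescribed by the text: the brain domain is found by thresholding the smooth-kernel target at the 0 to 80 HU range given above, the fidelity terms use mean squared error, the total variation (TV) penalty is the anisotropic form normalized by the brain pixel count, and `w_tv` is a hypothetical weight.

```python
import numpy as np

BRAIN_HU_MIN, BRAIN_HU_MAX = 0.0, 80.0  # example brain HU range from the text

def tv_in_mask(img, mask):
    """Anisotropic total variation over neighbor pairs that both lie inside mask."""
    tv_y = np.abs(np.diff(img, axis=0))[mask[1:, :] & mask[:-1, :]].sum()
    tv_x = np.abs(np.diff(img, axis=1))[mask[:, 1:] & mask[:, :-1]].sum()
    return tv_y + tv_x

def task_based_loss(z, sharp_target, smooth_target, w_tv=1.0):
    # Brain domain by intensity thresholding (assumed: on the smooth target).
    brain = (smooth_target >= BRAIN_HU_MIN) & (smooth_target <= BRAIN_HU_MAX)
    outside = ~brain
    # Term 1: match the sharp target outside the brain HU range ...
    term1 = np.mean((z[outside] - sharp_target[outside]) ** 2) if outside.any() else 0.0
    # ... while penalizing noise (via TV) inside the brain HU range.
    term1 += w_tv * tv_in_mask(z, brain) / max(int(brain.sum()), 1)
    # Term 2: match the smooth target within the brain HU range.
    term2 = np.mean((z[brain] - smooth_target[brain]) ** 2) if brain.any() else 0.0
    return float(term1 + term2)
```

For example, an output that equals both targets inside a uniform 40 HU region yields a loss of zero, while a 10 HU error at a single bone pixel contributes 100 through the first term only.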
CT image reconstruction kernels can be broadly characterized in terms of noise and spatial resolution properties as being either “smooth” (lower spatial resolution and lower noise) or “sharp” (higher spatial resolution and higher noise). Images reconstructed with smooth kernels (xsmooth) are optimal for interpreting low-contrast features, whereas images reconstructed with sharp kernels (xsharp) are optimal for interpreting features at small spatial scales. In accordance with the present disclosure, a framework is provided to combine the clinically desirable features from smooth-kernel images and the clinically desirable features from sharp-kernel images into a single image series. A parameterized function may be used:
$$\Phi(x_{\text{smooth}}, x_{\text{sharp}}, \theta) = z \qquad (1)$$
such that the synthesized image z has the characteristic spatial resolution of xsharp while also having the low noise level of xsmooth. Φ may be modeled using a deep convolutional neural network (CNN).
If examples of ideal synthesized images z were known a priori, then the parameters θ could be optimized using supervised learning with e.g. stochastic gradient descent:
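$$\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(\hat{z}_i, z_i), \qquad z_i = \Phi(x_{\text{smooth},i}, x_{\text{sharp},i}, \theta) \qquad (2)$$
where L denotes the loss function (e.g., the Frobenius norm of $\hat{z}_i - z_i$) averaged over a batch of N examples $(\hat{z}_i, z_i)$.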
Ideal synthesized images $\hat{z}_i$ are not typically known a priori, which makes a direct supervised learning approach impractical. In some configurations, $\hat{z}$ can also be defined as a noise-free representation of the sharp-kernel image $x_{\text{sharp}}$. This means we can abstractly decompose $x_{\text{sharp}}$ as:
$$x_{\text{sharp}} = \hat{z} + \delta \qquad (3)$$
where δ denotes the contribution from random noise in the imaging system. One solution to Eq. (1) is then:
$$\Phi(x_{\text{smooth}}, x_{\text{sharp}}, \theta) = x_{\text{sharp}} - \delta = \hat{z} \qquad (4)$$
Eq. (4) phrases the image synthesis task as a denoising task for xsharp.
Supervised learning using paired high-noise ($x'$) and low-noise ($x$) images is effective for optimizing CNN models for such denoising tasks. In some configurations, the denoising CNN may be modeled as a parameterized function $f(x; \theta)$, with parameter values $\hat{\theta}$ that satisfy:
$$f(x'; \hat{\theta}) = x' - \delta = x \qquad (5)$$
The optimal parameters can be found by solving the minimization problem:
$$\hat{\theta} = \arg\min_{\theta} L\big(f(x'; \theta), x\big) \qquad (6)$$
where L denotes a scalar loss function, such as the Frobenius norm of $f(x'; \theta) - x$.
Noise characteristics of the CT imaging system may be well-known, such that it is possible to simulate a representation of a CT image x with additional noise:
$$x'_{\text{sharp}} = x_{\text{sharp}} + \delta_{\text{sim}} \qquad (7)$$
where δsim denotes the contribution of additional simulated noise. An approximate solution to eq. (4) may be formed by solving a minimization problem of the form:
for a suitable choice of the loss function L. The loss function can be decomposed into two separate terms:
$$L(z, x_{\text{smooth}}, x_{\text{sharp}}) = L_{\text{sharp}}(z, x_{\text{sharp}}) + L_{\text{smooth}}(z, x_{\text{smooth}}) \qquad (9)$$
where the Lsmooth term constrains the low-contrast features in the synthesized image z to be similar to those of xsmooth; and Lsharp constrains the small-scale features in z to be similar to xsharp.
Referring particularly now to
The CT system 100 also includes an operator workstation 116, which typically includes a display 118; one or more input devices 120, such as a keyboard and mouse; and a computer processor 122. The computer processor 122 may include a commercially available programmable machine running a commercially available operating system. The operator workstation 116 provides the operator interface that enables scanning control parameters to be entered into the CT system 100. In general, the operator workstation 116 is in communication with a data store server 124 and an image reconstruction system 126. By way of example, the operator workstation 116, data store server 124, and image reconstruction system 126 may be connected via a communication system 128, which may include any suitable network connection, whether wired, wireless, or a combination of both. As an example, the communication system 128 may include proprietary or dedicated networks as well as open networks, such as the internet.
The operator workstation 116 is also in communication with a control system 130 that controls operation of the CT system 100. The control system 130 generally includes an x-ray controller 132, a table controller 134, a gantry controller 136, and a data acquisition system 138. The x-ray controller 132 provides power and timing signals to the x-ray source 104 and the gantry controller 136 controls the rotational speed and position of the gantry 102. The table controller 134 controls a table 140 to position the subject 112 in the gantry 102 of the CT system 100.
The DAS 138 samples data from the detector elements 110 and converts the data to digital signals for subsequent processing. For instance, digitized x-ray data is communicated from the DAS 138 to the data store server 124. The image reconstruction system 126 then retrieves the x-ray data from the data store server 124 and reconstructs an image therefrom. The image reconstruction system 126 may include a commercially available computer processor, or may be a highly parallel computer architecture, such as a system that includes multiple-core processors and massively parallel, high-density computing devices. Optionally, image reconstruction can also be performed on the processor 122 in the operator workstation 116. Reconstructed images can then be communicated back to the data store server 124 for storage or to the operator workstation 116 to be displayed to the operator or clinician.
The CT system 100 may also include one or more networked workstations 142. By way of example, a networked workstation 142 may include a display 144; one or more input devices 146, such as a keyboard and mouse; and a processor 148. The networked workstation 142 may be located within the same facility as the operator workstation 116, or in a different facility, such as a different healthcare institution or clinic.
The networked workstation 142, whether within the same facility or in a different facility as the operator workstation 116, may gain remote access to the data store server 124 and/or the image reconstruction system 126 via the communication system 128. Accordingly, multiple networked workstations 142 may have access to the data store server 124 and/or image reconstruction system 126. In this manner, x-ray data, reconstructed images, or other data may be exchanged between the data store server 124, the image reconstruction system 126, and the networked workstations 142, such that the data or images may be remotely processed by a networked workstation 142. This data may be exchanged in any suitable format, such as in accordance with the transmission control protocol (“TCP”), the internet protocol (“IP”), or other known or suitable protocols.
Referring to
In some configurations, a network training procedure may determine how the input image features should be merged to produce an optimal output image. The optimization may be performed to create a single image series in which different kernels have been mixed together to emphasize certain features or regions of interest in an image, such as mixing a soft kernel and a sharp kernel to form an optimized single image. In some configurations, an optimization may be performed using deep learning methods to model a function that combines image features in a way that is desirable with regard to a specified loss function. In some configurations, the synthesized image may be a single image and may be a combination of the best image qualities of multiple kernels. Image qualities may include spatial resolution, contrast-to-noise ratio, and signal-to-noise ratio.
In some configurations, the neural network is trained using supervised learning. To use supervised learning, a training dataset may be generated that includes input images, each paired with the desired output. Since the desired output, such as a map of the x-ray attenuation coefficients of the imaged object, may be unknown, and the input CT images may only approximate this quantity, generating a useful training dataset may not be a simple task. In some configurations, to create a dataset, the input images may be degraded by artificially inserting CT image noise, thereby allowing the original CT images to be used as examples of superior image quality. After the neural network has been trained to combine the input images in a desired way, it can then be applied to unaltered clinical images.
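Noise insertion for creating training pairs might look like the sketch below. It assumes white Gaussian noise whose variance is inversely proportional to dose, so a quarter-dose image needs added noise with three times the variance of the full-dose noise. Real CT noise is spatially correlated and kernel dependent, and practical noise insertion is often performed in the projection domain, so this image-domain version is only a simplification; `sigma_full` is a hypothetical measured full-dose noise level.

```python
import numpy as np

def simulate_reduced_dose(x_full, sigma_full, dose_fraction, rng=None):
    """Return x' = x + delta_sim approximating an image at `dose_fraction` of full dose.

    Assumes white Gaussian noise with variance proportional to 1/dose, so the
    added noise has variance sigma_full**2 * (1/dose_fraction - 1)."""
    if not 0.0 < dose_fraction <= 1.0:
        raise ValueError("dose_fraction must be in (0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    sigma_add = sigma_full * np.sqrt(1.0 / dose_fraction - 1.0)
    delta_sim = rng.normal(0.0, sigma_add, size=np.shape(x_full))
    return x_full + delta_sim
```

Pairing each degraded image with its original then yields (degraded input, original target) training examples in the sense of Eq. (7).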
In general, training a neural network may include initializing the neural network, such as by computing, estimating, or otherwise selecting initial network parameters (e.g., weights, biases, or both). Training data can then be input to the initialized neural network, generating model output data (e.g., synthesized image series data). The quality of the model output data can then be evaluated, such as by passing the model output data to a loss function to compute an error. The current neural network can then be updated based on the calculated error (e.g., using backpropagation methods based on the calculated error). For instance, the current neural network can be updated by updating the network parameters (e.g., weights, biases, or both) in order to minimize the loss according to the loss function. When the error has been minimized (e.g., by determining whether an error threshold or other stopping criterion has been satisfied), the current neural network and its associated network parameters represent the trained neural network.
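The generic loop above (initialize, forward pass, loss evaluation, parameter update, stopping criterion) can be illustrated with a deliberately tiny model. In this sketch the "network" is a hypothetical linear blend of a smooth and a sharp image plus an offset, trained by full-batch gradient descent on a mean squared error; for a deep network, backpropagation would compute the gradient that is written out analytically here.

```python
import numpy as np

def train_linear_blend(x_smooth, x_sharp, target, lr=0.1, tol=1e-12, max_iters=10_000):
    """Hypothetical model z = w[0]*x_smooth + w[1]*x_sharp + w[2], MSE loss."""
    # Design matrix: one row per pixel, columns = (smooth value, sharp value, 1).
    features = np.stack([x_smooth.ravel(), x_sharp.ravel(), np.ones(x_smooth.size)], axis=1)
    y = target.ravel()
    w = np.zeros(3)                                  # initialize parameters
    prev_loss = np.inf
    loss = prev_loss
    for _ in range(max_iters):
        residual = features @ w - y                  # forward pass minus target
        loss = float(np.mean(residual ** 2))         # evaluate the loss
        if prev_loss - loss < tol:                   # stopping criterion
            break
        grad = 2.0 * features.T @ residual / y.size  # gradient of the MSE
        w = w - lr * grad                            # parameter update
        prev_loss = loss
    return w, loss
```

For example, if the target is generated as `0.3 * x_smooth + 0.7 * x_sharp + 5` from random images, the recovered weights approach `[0.3, 0.7, 5]`.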
As described above, it is an aspect of the present disclosure that a task-based loss function can be used when training the neural network for multi-kernel synthesis. This task-based loss function includes two complementary terms: a sharp loss term and a smooth loss term. As will be described below, the sharp loss term can be applied to the model output data (e.g., synthesized image series data) while the smooth loss term is applied to the model output data after domain conversion.
Referring to
As one example, the task-based loss function may be expressed as:
Here, $\hat{y}$ is the model output; $y_{\text{sharp}}$ and $y_{\text{smooth}}$ are the respective routine-dose target kernel images; $\hat{y} * g$ denotes the filtering operation that smooths the model output during training; and $\mathrm{TV}(\hat{y})$ is an additional total variation regularization term that can be added to further control the overall noise level.
To model the kernel synthesis mapping Φ(xsharp; xsmooth) = z, a residual U-Net architecture, well established for image-to-image applications, may be used. The network dimensions may be adjusted to reduce the receptive field while maintaining the model capacity in terms of trainable parameters. In a non-limiting example, the depth of the U-Net in terms of the number of down-sampling blocks may be reduced from 5 to 2, which reduces the receptive field and reflects the more localized nature of kernel synthesis relative to, for example, organ segmentation. To maintain model capacity, the width of the U-Net, in terms of the number of convolutional filters at each layer, may be increased. In a non-limiting example, a base of 256 filters may be used, whereas conventional U-Net systems typically have a base of 64 filters.
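The effect of reducing U-Net depth on the receptive field can be checked with the standard recursion r ← r + (k − 1)·j, where k is the kernel size and j is the cumulative stride. The layout below (two 3x3 convolutions per level, 2x2 stride-2 pooling, and upsampling assumed to add no context) is a generic assumption, not the disclosed architecture.

```python
def unet_receptive_field(depth, convs_per_level=2, kernel=3):
    """Receptive field (pixels per axis) along the deepest path of a simple U-Net."""
    rf, jump = 1, 1
    for _ in range(depth):                          # encoder levels
        rf += convs_per_level * (kernel - 1) * jump
        rf += (2 - 1) * jump                        # 2x2, stride-2 pooling
        jump *= 2
    rf += convs_per_level * (kernel - 1) * jump     # bottleneck convolutions
    for _ in range(depth):                          # decoder levels
        jump //= 2                                  # 2x upsampling, assumed to add no context
        rf += convs_per_level * (kernel - 1) * jump
    return rf
```

Under these assumptions, depth 5 gives a receptive field of 408 pixels per axis and depth 2 gives 44 pixels. Width affects capacity separately: a 3x3 convolution has 9·c_in·c_out weights, so raising the base from 64 to 256 filters multiplies the weight count by about 16 for each layer whose input and output widths both scale.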
Decomposing the loss function (Eq. 9) into specific loss terms ($L_{\text{sharp}}$, $L_{\text{smooth}}$) for each image series ($x_{\text{sharp}}$, $x_{\text{smooth}}$) allows each term to be tailored to train a model whose output is a single image series possessing the favorable image quality properties of each input for the given task. In a non-limiting example for head trauma, $x_{\text{sharp}}$ may emphasize the high spatial frequencies used for visualizing the bones of the skull while being of limited use in the brain. $L_{\text{sharp}}$ may be formed to penalize differences between a model output $\hat{z}$ and the full-dose sharp-kernel target $x_{\text{sharp}}$ at the pixel locations $\vec{r}$ covering the skull, referred to as the sharp domain $D_{\text{sharp}}$:
$$L_{\text{sharp}} = \alpha \left\| \hat{z}(\vec{r}) - x_{\text{sharp}}(\vec{r}) \right\|_2, \quad \vec{r} \in D_{\text{sharp}} \qquad (10)$$
where $\|\cdot\|_2$ is the L2 norm and α is a scalar hyperparameter controlling the relative influence of $L_{\text{sharp}}$ in Eq. (9). $x_{\text{smooth}}$ may be optimized for low-contrast imaging tasks, such as those inside the brain, using a low-pass kernel that reduces image noise and a specialized non-linear gray-level remapping, referred to as multi-band filtration, that enhances gray-matter and white-matter contrast in the brain. A sharp kernel applied to an exceptionally high-dose acquisition could match the low noise of a smooth kernel, but would still lack the enhanced gray-white matter contrast of the clinical kernel. In a non-limiting example for the brain, a low-noise smooth-kernel image $x_{\text{smooth}}$ may therefore be desired for training purposes. A smooth kernel, however, sacrifices spatial resolution. To address this, $L_{\text{smooth}}$ may smooth the model output to match the spatial resolution of $x_{\text{smooth}}$ before computing the L2 norm, a process called kernel conversion, $C(\hat{z}(\vec{r}), \sigma)$. The kernel conversion process may be part of $L_{\text{smooth}}$ as follows:
$$L_{\text{smooth}} = \beta \left\| C(\hat{z}(\vec{r}), \sigma) - x_{\text{smooth}}(\vec{r}) \right\|_2 + \gamma \, \mathrm{TV}(\hat{z}(\vec{r})), \quad \vec{r} \in D_{\text{smooth}} \qquad (11)$$
Here, kernel conversion filters the model output with a fixed isotropic Gaussian kernel whose only parameter is the Gaussian width σ. σ may be determined by:
σ may be based upon the combination of sharp and smooth image data, $x_{\text{sharp}}$ and $x_{\text{smooth}}$, respectively. In a non-limiting example (Table 2), σ = 0.47 mm was found. A total variation (TV) regularizer, weighted by the hyperparameter γ, may be added to control the overall noise level of the model output within the region of interest, such as the brain ($D_{\text{smooth}}$).
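Kernel conversion C(ẑ, σ) and the smooth loss of Eq. (11) can be sketched in NumPy as below. Assumptions: σ is given in mm and converted to pixels with a hypothetical `pixel_mm` spacing, the Gaussian is truncated at 3σ with reflective padding, the D_smooth mask is a boolean array, and TV is the anisotropic form. Eq. (12) for choosing σ is not reproduced here; the 0.47 mm value above would simply be passed in as `sigma_mm`.

```python
import numpy as np

def gaussian_kernel_1d(sigma_px):
    """Normalized 1-D Gaussian truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma_px)))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (t / sigma_px) ** 2)
    return k / k.sum()

def kernel_conversion(z, sigma_mm, pixel_mm):
    """C(z, sigma): isotropic Gaussian smoothing of a 2-D image, sigma in mm."""
    k = gaussian_kernel_1d(sigma_mm / pixel_mm)
    r = len(k) // 2
    padded = np.pad(z, r, mode="reflect")
    # Separable filtering; 'valid' convolution trims the padding back off.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def l_smooth(z, x_smooth, mask, beta, gamma, sigma_mm, pixel_mm):
    """Eq. (11): beta * ||C(z) - x_smooth||_2 + gamma * TV(z), restricted to mask."""
    diff = kernel_conversion(z, sigma_mm, pixel_mm)[mask] - x_smooth[mask]
    fidelity = np.linalg.norm(diff)
    # Anisotropic TV over neighbor pairs that both lie inside the mask.
    tv = (np.abs(np.diff(z, axis=0))[mask[1:, :] & mask[:-1, :]].sum()
          + np.abs(np.diff(z, axis=1))[mask[:, 1:] & mask[:, :-1]].sum())
    return beta * fidelity + gamma * tv
```

Because the Gaussian is normalized, a uniform image passes through kernel conversion unchanged, so only resolution and noise differences contribute to the fidelity term.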
The domains D in which the loss terms take effect can be practically thought of as masks determined via segmentation. When the HU differences between the anatomic regions of interest are large, such as between bone and soft tissue in head imaging, these masks can be found by intensity thresholding. Thresholds may be based upon typical settings, such as brain display settings. In a non-limiting example, thresholds may be selected as:
In some configurations, to ensure a smooth transition between domains, a windowing function based on a cubic weighted distance function may be used.
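The exact windowing function is not specified here, so the sketch below shows one possible reading: a soft D_smooth weight equal to 1 inside the brain HU range and tapering to 0 outside it with the cubic ramp 3t² − 2t³. The 0 and 80 HU limits come from the example loss terms above; the 20 HU transition width is a hypothetical choice.

```python
import numpy as np

def smoothstep(t):
    """Cubic ramp 3t^2 - 2t^3, clipped to [0, 1]."""
    t = np.clip(t, 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def soft_brain_mask(hu, lo=0.0, hi=80.0, width=20.0):
    """Soft D_smooth weight in [0, 1]: 1 inside [lo, hi] HU, falling to 0
    over `width` HU on either side with a cubic ramp."""
    rise = smoothstep((hu - (lo - width)) / width)   # 0 at lo - width, 1 at lo
    fall = smoothstep(((hi + width) - hu) / width)   # 1 at hi, 0 at hi + width
    return rise * fall
```

The complement, 1 minus this weight, can then serve as a soft D_sharp mask, so the two loss terms blend rather than switch abruptly at the brain-skull boundary.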
The image quality tasks may be defined by the loss terms of Eq. (10) and Eq. (11), which are parameterized by α, β, and γ. These hyperparameters control image sharpness, soft-tissue contrast enhancement, and noise level within the region of interest (such as the brain), respectively, and are combined in Eq. (8) to define the total loss. The specific values of these hyperparameters may depend upon the combination of input kernels and the relative priority of each image quality task. Hyperparameters may be chosen with expert reader feedback on a validation set to balance noise reduction in the region of interest, such as the brain, while maintaining suitable image texture and preserving sharpness, such as in the skull. Non-limiting example hyperparameters for a brain imaging application are summarized in Table 1. Model parameters may be optimized using an optimizer, such as the Adam optimizer, with a stepwise decreasing learning rate η. In a non-limiting example, the learning rate may be decreased by a set factor, such as by a third, over three equidistant steps covering the total number of training epochs.
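A stepwise schedule of this kind can be sketched as a small function. Two readings are assumed: "decrease by a third" is taken to mean multiplying the rate by 1/3 at each step (pass `factor=2/3` for the other reading), and "three equidistant steps" is taken to mean three equal-length segments of the training run.

```python
def stepwise_lr(epoch, total_epochs, base_lr, n_steps=3, factor=1.0 / 3.0):
    """Piecewise-constant learning rate over `n_steps` equal-length segments.

    The rate is multiplied by `factor` at the start of each new segment."""
    segment = min(int(n_steps * epoch / total_epochs), n_steps - 1)
    return base_lr * factor ** segment
```

For a 90-epoch run with these defaults, epochs 0 to 29 use the base rate, 30 to 59 use one third of it, and 60 to 89 use one ninth. A comparable schedule is available in PyTorch as `torch.optim.lr_scheduler.StepLR` (parameters `step_size` and `gamma`), which can wrap `torch.optim.Adam`.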
In some configurations, edge enhancement may be used: high-contrast anatomical details from the sharp kernel may be further enhanced in the synthesized output. In a non-limiting example, although the Qr69 kernel used in model training (Table 2) had the same cutoff frequency as the clinical reference Hr69 kernel, it lacked the pre-processed edge enhancement native to the clinical reference kernel, which made Qr69 appear visually less sharp. Using Hr69 as the sharp training kernel instead revealed artifacts in the model output when viewed with brain display settings. These included boundary discontinuity artifacts at the brain-skull interface and unsatisfactory noise texture within the brain, and were due to the undesired edge enhancement of Hr69 and the lack of beam hardening correction. Although beam hardening corrections were not available for sharp kernels directly on the scanner, they were available on offline reconstruction systems. In some configurations, beam hardening correction may be incorporated as an additional learning task by using beam-hardening-corrected images as both the smooth and the sharp training targets. Such corrections are shown in a non-limiting example in Table 2. To address the reduced apparent sharpness of bone in the Qr69 kernel images, the highly attenuating skull regions of the sharp kernel image were edge-enhanced using an unsharp mask filter that best maps the Qr69 kernel to Hr69, and the result was added as a weighted sum with the model output:
$$\hat{z}' = x'_{\mathrm{sharp}}\,\delta + \hat{z}\,(1-\delta) \qquad (13)$$
where ẑ is the model output, x′sharp is the edge-enhanced sharp kernel image, and δ is the windowed mask representing Dsharp.
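A minimal NumPy sketch of the edge enhancement and the weighted sum in Eq. (13) follows. It assumes a classic unsharp mask, x′ = x + a(x − G_σ ∗ x). The blur width `sigma_px` and gain `amount` are hypothetical placeholders that, per the text, would be fitted so that the filter best maps Qr69 to Hr69. The function names are illustrative only.

```python
import numpy as np

def gaussian_blur(img, sigma_px):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    r = max(1, int(np.ceil(3 * sigma_px)))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma_px) ** 2)
    k /= k.sum()                                   # normalize so flat regions are preserved
    out = np.asarray(img, dtype=float)
    for axis in range(out.ndim):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        # 'valid' convolution of the padded signal restores the original length
        out = np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")
    return out

def unsharp_mask(img, sigma_px, amount):
    """x' = x + amount * (x - G_sigma * x): boosts edges, leaves flat regions unchanged."""
    img = np.asarray(img, dtype=float)
    return img + amount * (img - gaussian_blur(img, sigma_px))

def blend_sharp_regions(z_hat, x_sharp_enh, delta):
    """Eq. (13): z' = x'_sharp * delta + z_hat * (1 - delta), delta = windowed D_sharp mask."""
    return x_sharp_enh * delta + z_hat * (1.0 - delta)
```

Because the blend weights sum to one at every pixel, regions where δ = 0 (such as the brain) are passed through from the model output unchanged.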
In some configurations, the baseline image is a single image reconstruction and may have the sharpest spatial resolution permitted by the data acquisition, with a matrix size large enough to avoid aliasing. In some configurations, the spatial resolution can be lower than these limits and the matrix size can be smaller to save storage space, provided that all the clinically required kernels can still be generated from this single image series.
In current clinical workflows, many reconstruction jobs must be performed on the scanner by the image reconstruction system (IRS). Often, some of the reconstructions or reformats must be performed manually by the technologists before the images are sent to the PACS or long-term archival systems. This workflow makes inefficient use of scanner and technologist time, especially in a busy clinical environment. The process described with respect to
In some configurations, image series of different kernels can either be generated in real time on request from the human reader, or pre-generated according to a protocol specific to the diagnostic task and loaded onto an image storage system, such as the PACS, or other image viewers for diagnosis. In one non-limiting example, the series on the reconstruction job list can be pre-generated on the workstation by the CNN (and a 3D reformatting tool) and displayed automatically after the exam is loaded for diagnosis. Additional kernels can be generated as requested, which may be much faster than reconstructing images from raw data. Generating image series of different kernels in this way may not require vendor-specific software.
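The two generation modes above (protocol-driven pre-generation at exam load, plus on-demand generation of additional kernels) can be sketched as a small cache around a conversion function. This is a hypothetical viewer-side helper: `KernelSeriesProvider` is an illustrative name, and `convert(baseline, kernel_name)` stands in for the trained kernel-conversion CNN; neither is an API from the source.

```python
from typing import Callable, Dict, Iterable

import numpy as np

class KernelSeriesProvider:
    """Hypothetical helper serving kernel series derived from one baseline series."""

    def __init__(self, baseline: np.ndarray,
                 convert: Callable[[np.ndarray, str], np.ndarray]):
        self.baseline = baseline
        self.convert = convert            # placeholder for the kernel-conversion CNN
        self._cache: Dict[str, np.ndarray] = {}

    def pregenerate(self, protocol_kernels: Iterable[str]) -> None:
        """Run the protocol's job list when the exam is loaded."""
        for kernel in protocol_kernels:
            self.get(kernel)

    def get(self, kernel: str) -> np.ndarray:
        """Return a series, generating it on first request and caching it afterwards."""
        if kernel not in self._cache:
            self._cache[kernel] = self.convert(self.baseline, kernel)
        return self._cache[kernel]
```

Only the baseline series needs to be archived; every other series is derived from it on the workstation.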
In some configurations, the neural network is trained before being used to generate new image series of different kernels. During training, the target images can be at a higher dose so that the neural network learns to generate new image series with a noise reduction effect.
In some configurations, the newly generated image output of the neural network at step 440 of
In a non-limiting example brain application, the task-based loss function Ltask may combine loss terms appropriate to the sharp and smooth kernels, Lsharp and Lsmooth, by:
The task-based loss function may include two loss terms that control local image quality. Lsmooth enforces similarity with the smooth image target y2 within a given HU range, such as the range typical of brain tissue. Lsharp enforces similarity with the sharp target y1 outside the brain HU range, while penalizing sharp kernel noise within the brain HU range.
In some configurations, a U-Net CNN architecture may be used with sharp and smooth kernel inputs. A baseline model may be trained using a mean-squared-error (MSE) loss function. A second model may then be trained under the same conditions using a task-based loss function Ltask, which combines loss terms appropriate to the sharp and smooth kernels, Lsharp and Lsmooth. Within the brain HU range, the Lsharp loss function penalizes noise from the sharp kernel images using a total variation loss V; for all other HU values, Lsharp enforces similarity with the sharp kernel image target y1. Similarly, the Lsmooth loss function enforces similarity with the smooth kernel image target y2 within the desired HU range. Additionally, a kernel conversion C may be applied to the model output as a precalculated fixed-width Gaussian convolution that approximates the mapping from the sharp to the smooth kernel. Applying this kernel conversion to the model output may allow better similarity comparisons with the smooth target and improve training performance.
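The loss structure described above can be sketched with NumPy. This is an assumption-laden illustration, not Eq. (8) to (11) themselves, which are not reproduced in this section. Here `w_brain` stands for the windowed Dsmooth mask and `1 - w_brain` for the sharp-target domain; α weights sharp-target fidelity, β weights smooth-target fidelity after the Gaussian kernel conversion C, and γ weights a masked anisotropic total variation. A training implementation would express the same terms in a differentiable framework such as TensorFlow.

```python
import numpy as np

def gaussian_blur(img, sigma_px):
    """Kernel conversion C (assumed form): separable fixed-width Gaussian, edge padded."""
    r = max(1, int(np.ceil(3 * sigma_px)))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma_px) ** 2)
    k /= k.sum()
    out = np.asarray(img, dtype=float)
    for axis in range(out.ndim):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        out = np.apply_along_axis(np.convolve, axis, np.pad(out, pad, mode="edge"),
                                  k, mode="valid")
    return out

def masked_tv(img, w):
    """Anisotropic total variation V weighted by w (mean over pixels)."""
    dy = np.diff(img, axis=0, append=img[-1:, :])
    dx = np.diff(img, axis=1, append=img[:, -1:])
    return float(np.mean(w * (np.abs(dy) + np.abs(dx))))

def task_loss(z, y_sharp, y_smooth, w_brain, sigma_px, alpha=1.0, beta=1.0, gamma=1.0):
    """Sketch of L_task = L_sharp + L_smooth for one 2D model output z."""
    z = np.asarray(z, dtype=float)
    w_out = 1.0 - w_brain                                  # sharp-target domain
    l_sharp = (alpha * np.mean(w_out * (z - y_sharp) ** 2)
               + gamma * masked_tv(z, w_brain))             # penalize noise in the brain
    l_smooth = beta * np.mean(w_brain * (gaussian_blur(z, sigma_px) - y_smooth) ** 2)
    return float(l_sharp + l_smooth)
```

Comparing C(z) rather than z with the smooth target lets the model keep sharp-kernel detail while still being pulled toward the smooth kernel's HU values and contrast in the brain.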
Execution time may vary depending on the hardware and the exact CNN architecture used. In some configurations, a simpler kernel conversion may use a smaller CNN (fewer residual blocks). In one non-limiting example, the time to process a single scan slice was 0.17 s for kernel synthesis and 0.02 to 0.17 s for kernel conversion using commercially available graphics processing unit (GPU) hardware. A CNN may be implemented in any appropriate programming language, such as Python, and may be trained using any appropriate tools, including open-source tools (such as TensorFlow) and standard Python scientific computing packages. The CNN architecture may be selected to resemble image classification networks (such as ResNet) for ease of commercial deployment.
In one non-limiting example, simulated quarter-dose images obtained from different kernels were used as the network input, and the corresponding full-dose images reconstructed with the sharp kernel were used as the ground truth to evaluate a mean-squared-error loss function. The network was trained on 500,000 example images of various sizes that were cropped from ten abdominal CT exams. After training, the performance was evaluated by comparing input and output images using a reserved set of full-dose abdominal, chest, and phantom CT scans that were not used in the network training.
Images were combined from three input kernels: a very sharp kernel (e.g., Siemens B70), a very smooth kernel (e.g., Siemens B10), and an intermediate kernel (e.g., Siemens B45). The resulting images maintain the sharp edges and fine anatomical details of the sharpest input kernel, while also exhibiting the low noise characteristics of the smooth kernels and preserving their ability to reveal subtle anatomical details. For images acquired at a low dose level (one quarter of the clinical dose), the signal-to-noise ratio of the sharp kernel images is increased by over 200%, potentially making these images clinically useful. Different tissues that require specific kernels for optimal viewing, such as bone, liver, and lung, all appear to have satisfactory quality in the resulting image. Furthermore, the images are free of artificial textures that would detract from their natural appearance.
The synthetic images in the current example improved the signal-to-noise ratio by 338% compared to the sharp kernel images, without observable blurring of sharp edges. Despite the increased smoothness, the synthesized image maintains many of the sharp features and fine details contained in the sharpest input images. The synthesized images also appear natural, with no perceptible artificial texture. The algorithm was robust enough to be applied to multiple tissue types, including bone, lung, and liver.
Example Kernel Conversion from a Single Baseline Image Series
In one non-limiting example of how a deep learning-based method can be used to generate multiple reconstruction kernels from a single high-resolution baseline image series, a deep CNN was trained with supervised learning, using as inputs clinical CT images reconstructed with the sharpest available kernel (an S80 kernel on a Siemens Flash scanner) and thin slices (0.6 mm) to preserve the information acquired in the raw data. Using a very sharp kernel ensures that all useful information from the data acquisition is included in the input image series.
Three commonly used body CT target kernels were generated with the CNN approach: a smooth kernel (B30), a medium-sharp kernel (B46), and a sharp kernel (B70). A separate CNN was trained for each kernel. For each kernel, the full training dataset consisted of 250,000 paired 64×64 pixel images randomly cropped from the base kernel images and target kernel images. A supervised learning approach was used, with the loss function defined as the mean squared error (MSE) between the CNN-generated kernel image and the actually reconstructed target kernel image. The CNN was implemented with TensorFlow. The approach was tested on a reserved set of 20 clinical patient scans that were not used for network training. Accuracy was determined by subtracting the simulated kernel images from the target kernel images, and the mean absolute error was calculated on a pixel-by-pixel basis for each case.
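The paired-patch sampling and the error measures described above can be sketched with NumPy. The function names are hypothetical, and uniform random placement of co-located crops is an assumption about how "randomly cropped" pairs were drawn. The difference image follows the text (target minus simulated), and the mean absolute error is taken over all pixels.

```python
import numpy as np

def random_paired_crops(base, target, n, size=64, seed=None):
    """Cut n co-located size x size patches from a (base, target) pair of 2D slices."""
    base, target = np.asarray(base), np.asarray(target)
    if base.shape != target.shape:
        raise ValueError("base and target slices must have the same shape")
    rng = np.random.default_rng(seed)
    h, w = base.shape
    ys = rng.integers(0, h - size + 1, n)          # top-left corners, uniform over valid range
    xs = rng.integers(0, w - size + 1, n)
    a = np.stack([base[y:y + size, x:x + size] for y, x in zip(ys, xs)])
    b = np.stack([target[y:y + size, x:x + size] for y, x in zip(ys, xs)])
    return a, b

def difference_and_mae(simulated, reference):
    """Difference image (reference - simulated) and pixel-wise mean absolute error in HU."""
    diff = np.asarray(reference, dtype=float) - np.asarray(simulated, dtype=float)
    return diff, float(np.mean(np.abs(diff)))
```

Using the same corner coordinates for both images keeps every input patch aligned with its target patch, which is what the MSE loss requires.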
After training for 100 epochs, the result was tested on a patient scan not included in the training data. The CNN reproduced the appearance of the other kernels with nearly imperceptible differences. The mean absolute difference between the images generated with the trained CNN and the actually reconstructed images was 2.0 HU for the B30 kernel and 3.1 HU for the B46 kernel. Any reconstruction kernel can be generated, including those of iterative reconstruction. By training with high-dose image series, a noise reduction effect can be inherently included in the kernel conversion. By converting the baseline image series to images resembling those produced by iterative reconstruction, image quality improvements similar to those of iterative reconstruction may be achieved. Artifact correction may already be implemented in the baseline image series.
A dataset consisting of CT images of patients with suspected head trauma was collected to demonstrate clinical utility. Head trauma CT was chosen because the clinical task involves both low-contrast (hemorrhage, infarction) and high-resolution (skull fracture) image features. The protocols for head trauma cases are typically very complex and require multiple reconstructed image series with different kernels and slice thicknesses, so the potential value of a single low-noise, high-resolution image series is high. Clinical examples of suspected head trauma were collected under a protocol approved by the Institutional Review Board. Inclusion criteria were patients who underwent head CT exams in the emergency department for trauma or acute-onset symptoms suspected of fracture, intracranial hemorrhage, and/or infarction. A total of 585 cases were collected. Screening by a radiologist revealed acute findings in 82 patients (hemorrhage (N=52), hemorrhagic brain infarction (N=2), infarction (N=15), metastasis (N=2), fracture (N=6), and both fracture and hemorrhage (N=5)).
The selected exams were performed as part of routine patient care using commercially available CT scanners (SOMATOM Force, Siemens Healthineers GmbH, Forchheim, Germany) with a protocol specific to head trauma: 1 s rotation time, 192×0.6 mm collimation, pitch of 0.6, 120 kVp, 350 effective mAs, and no tube current modulation. A previously validated projection-domain noise-insertion tool was used to simulate exams at a dose level corresponding to 25% of the original dose. Multiple CT image series were reconstructed from each exam to form the training dataset and the clinical reference images. The reconstruction parameters are summarized in Table 2, where IBHC denotes iterative beam hardening correction.
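As a worked illustration of what a 25% dose level means for image noise, the relation below assumes idealized, quantum-noise-dominated images, in which the noise standard deviation σ scales with the inverse square root of dose D. It does not describe how the validated projection-domain tool works; it only shows the expected effect on noise and the independent noise variance that a lower-dose simulation must add.

$$\sigma(D) \propto \frac{1}{\sqrt{D}} \;\Rightarrow\; \sigma_{25\%} = \sigma_{100\%}\sqrt{\frac{1}{0.25}} = 2\,\sigma_{100\%}$$

$$\sigma_{\mathrm{add}}^{2} = \sigma_{25\%}^{2} - \sigma_{100\%}^{2} = \sigma_{100\%}^{2}\left(\frac{1}{0.25} - 1\right) = 3\,\sigma_{100\%}^{2}$$

Under this idealization, the quarter-dose inputs carry roughly twice the noise standard deviation of the full-dose images.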
From this dataset, a total of 110 exams from unique patients were used: 100 for model training and 10 for validation and hyperparameter tuning. An additional 10 cases were set aside for testing of task-independent image quality.
The output of the model was a single low-noise, high-resolution image series used to perform diagnostic tasks that would otherwise require multiple series with different reconstruction parameters. The noise levels of the synthesized images were compared with those of the corresponding smooth-kernel reconstructions by measuring the standard deviation in three uniform regions of interest (ROIs). Similarly, the soft-tissue contrast of the synthesized images was compared with that of the corresponding smooth-kernel reconstructions by measuring the contrast between adjacent gray matter and white matter in three ROIs. Spatial resolution was evaluated by comparing line profiles of the synthesized images and the clinical sharp-kernel images across relevant small-scale features such as skull fractures. Finally, the presence of clinically relevant artifacts was assessed by visual inspection of pathological ROIs and an informal review with a trained radiology fellow.
The qualitative performance of the synthesized images, in terms of low noise in the brain and preservation of sharp details in the skull, was demonstrated in a comparison against the input and reference clinical image series. Compared with the clinical thin sharp kernel, the synthesized images retained sharpness. At the same time, the thin slices, low noise, and soft tissue contrast enhancement of the synthesized images improved the visibility of a nearby hemorrhage, which was also visible in the thin smooth-kernel slice but was more conspicuous in the synthesized images.
Notably, adding Lsmooth to the task-based loss function in Eq. (11) markedly improved noise reduction, with improved soft tissue contrast, compared with a CNN denoising model trained with a generic MSE loss, as demonstrated by a line profile comparison of the synthesized images covering a white matter extension of the corona radiata. The gray-matter white-matter contrast definition was also similar to that of the thin-slice clinical smooth image series. This gray-white matter enhancement was confirmed by contrast measurements comparing the mean HU in elliptical ROIs covering adjacent regions of gray matter (odd-numbered ROIs) and white matter (even-numbered ROIs), which showed comparable contrast between the synthesized and smooth clinical images (Table 3). Combined with the lower noise shown by standard deviation measurements in three separate gray-matter regions (Table 4), this results in an overall higher gray-matter white-matter contrast-to-noise ratio (CNR) in the processed images.
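The ROI measurements and CNR comparison above can be sketched with NumPy. The CNR formula used here, |mean_GM − mean_WM| divided by the standard deviation in a uniform gray-matter ROI, is one common definition and an assumption, since the section does not state its exact formula. The helper names are hypothetical.

```python
import numpy as np

def ellipse_mask(shape, center, radii):
    """Boolean elliptical ROI: ((r - r0) / a_r)^2 + ((c - c0) / a_c)^2 <= 1."""
    rr, cc = np.ogrid[:shape[0], :shape[1]]
    return ((rr - center[0]) / radii[0]) ** 2 + ((cc - center[1]) / radii[1]) ** 2 <= 1.0

def roi_stats(img, mask):
    """Mean and sample standard deviation (HU) inside a boolean ROI."""
    vals = np.asarray(img, dtype=float)[np.asarray(mask, dtype=bool)]
    return float(vals.mean()), float(vals.std(ddof=1))

def gm_wm_cnr(img, gm_mask, wm_mask, noise_mask):
    """Assumed CNR: |mean_GM - mean_WM| / SD measured in a uniform (e.g., gray-matter) ROI."""
    gm_mean, _ = roi_stats(img, gm_mask)
    wm_mean, _ = roi_stats(img, wm_mask)
    _, noise_sd = roi_stats(img, noise_mask)
    return abs(gm_mean - wm_mean) / noise_sd
```

With this definition, comparable contrast (numerator) combined with a lower standard deviation (denominator) directly yields the higher CNR reported for the processed images.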
Preservation of high-frequency detail was then assessed quantitatively via a line profile measured across a fracture in the occipital bone, where the fracture definition matched that of the clinical thin sharp-kernel image.
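A line-profile comparison of this kind can be sketched with NumPy using bilinear interpolation along a segment. The endpoints, sample count, and the maximum-difference summary are hypothetical choices for illustration. The source compares profiles visually and does not specify an interpolation scheme or summary metric.

```python
import numpy as np

def line_profile(img, p0, p1, n_samples):
    """Sample a 2D image along the segment p0 -> p1 (row, col) with bilinear interpolation.

    Coordinates are in pixels; the image must be at least 2x2.
    """
    img = np.asarray(img, dtype=float)
    rows = np.linspace(p0[0], p1[0], n_samples)
    cols = np.linspace(p0[1], p1[1], n_samples)
    # Top-left neighbor, clipped so that r0 + 1 and c0 + 1 stay inside the image.
    r0 = np.clip(np.floor(rows).astype(int), 0, img.shape[0] - 2)
    c0 = np.clip(np.floor(cols).astype(int), 0, img.shape[1] - 2)
    fr, fc = rows - r0, cols - c0
    top = img[r0, c0] * (1 - fc) + img[r0, c0 + 1] * fc
    bottom = img[r0 + 1, c0] * (1 - fc) + img[r0 + 1, c0 + 1] * fc
    return top * (1 - fr) + bottom * fr

def max_profile_difference(profile_a, profile_b):
    """Largest absolute HU difference between two profiles sampled at the same points."""
    return float(np.max(np.abs(np.asarray(profile_a) - np.asarray(profile_b))))
```

Sampling the synthesized and clinical sharp-kernel images with identical endpoints makes the two profiles directly comparable point by point.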
Combining multi-kernel synthesis with a task-specific loss function can effectively merge the favorable image quality properties of multiple input image series while simultaneously reducing noise. Applied to a neuroimaging task, the result has low noise and enhanced contrast in the brain while preserving sharp high-frequency features in the skull. The benefit of a thin-slice image series with low noise in the brain is demonstrated by the enhanced visibility of a hemorrhage in the center of the brain. The quality of the model output may depend on the quality of the input data.
The non-limiting example results demonstrate that the systems and methods in accordance with the present disclosure are capable of maintaining the relevant image features of both the smooth-kernel and the sharp-kernel images. The combination of sharper reconstruction kernels, thinner slices, and low noise levels results in a clear enhancement of detail in soft-tissue regions. Both qualitative comparisons between the input kernel images and quantitative noise measurements in the brain confirmed the noise reduction performance. Contrast measurements between gray and white matter indicated a successful incorporation of the soft-tissue contrast enhancement derived from the multi-band filtration in the input smooth kernel image. This contrast enhancement combined with lower noise resulted in an overall higher contrast-to-noise ratio between gray and white matter than in any of the clinical kernels. Further enhancement of the visibility of soft tissue lesions was also indicated.
Line profile comparisons across an occipital fracture showed that the synthesized images preserved sharpness comparable to that of the clinical sharp kernel, maintaining the definition of critical pathology.
Given the exponential rise in CT image series, due both to the rapid rise in prescribed CT scans and to the growing number of images per scan, there is a demonstrated need to better handle potential information overload. Multi-kernel synthesis is one potential approach to addressing this issue, generating images that convey information more efficiently with anatomy-specific image quality. A CNN-based synthesis method that uses a task-specific loss function to parameterize training may help address these challenges and may also yield improved image synthesis optimized for applications such as head imaging.
The present disclosure has described one or more preferred embodiments, and it should be appreciated that many equivalents, alternatives, variations, and modifications, aside from those expressly stated, are possible and within the scope of the invention.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/170,064 filed on Apr. 2, 2021 and entitled “Systems and Methods for Multi-Kernel Synthesis and Kernel Conversion in Medical Imaging,” which is herein incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/023335 | 4/4/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63170064 | Apr 2021 | US |