This application relates to computed tomography technology.
Cone Beam Computed Tomography (CBCT) reconstruction is one of the central topics in medical image processing. In CBCT, the patient's volumetric information is retrieved from x-ray projections taken in a cone beam geometry along a number of directions. Examples of CBCT reconstruction algorithms include the filtered back projection (FBP) algorithm and other reconstruction algorithms, such as EM and ART/SART. Among its many applications, CBCT is particularly convenient for accurate patient setup in cancer radiotherapy.
Techniques, apparatus and systems are described for implementing a fast GPU-based CBCT reconstruction algorithm based on a small number of x-ray projections to lower radiation dose.
In one aspect a graphics processor unit (GPU) implemented method of reconstructing a cone beam computed tomography (CBCT) image includes receiving, at the GPU, image data for CBCT reconstruction. The GPU uses an iterative process to minimize an energy functional component of the received image data. The energy functional component includes a data fidelity term and a data regularization term. The reconstructed CBCT image is generated based on the minimized energy functional component.
Implementations can optionally include one or more of the following features. The received image data can include volumetric information projected using x-ray projections onto an x-ray imager plane in a cone beam geometry along a number of projection angles. The fidelity term can indicate consistency between the reconstructed CBCT image and an observed image from the number of projection angles. The data regularization term can include a total variation regularization term. Using the iterative process to minimize an energy functional component of the received image data can include using an algorithm that minimizes the data regularization term and the data fidelity term in two alternating steps. The algorithm that minimizes the data regularization term and the data fidelity term in two alternating steps can include an iterative algorithm. The iterative algorithm can include (a) performing a gradient descent update with respect to minimization of the data fidelity term; (b) performing Rudin-Osher-Fatemi model minimization to remove noise and artifacts while preserving sharp edges and main features; (c) performing truncation to ensure non-negativeness of the reconstructed image; and (d) repeating (a)-(c) until a desired minimization of the energy functional component is reached.
In another aspect, a computer implemented method of reconstructing a cone beam computed tomography (CBCT) image includes receiving, at the computer, image data for CBCT reconstruction. An iterative conjugate gradient least square (CGLS) algorithm is used to minimize an energy functional component. The reconstructed CBCT image is generated based on the minimized energy functional component.
Implementations can optionally include one or more of the following features. The received image data can include volumetric information projected using x-ray projections onto an x-ray imager plane in a cone beam geometry along a number of projection angles. The iterative CGLS algorithm can begin with an initial guess and repeatedly minimize the energy functional component until the reconstructed CBCT image is obtained. The iterative CGLS algorithm can ensure consistency between the reconstructed CBCT image and an observation image from the number of projection angles. The iterative CGLS algorithm can impose a tight frame regularization condition. Imposing a tight frame regularization condition can include decomposing the reconstructed image into a set of coefficients using a convolution function. The computer implemented method can be performed by a graphics processing unit (GPU).
In another aspect, a computing system for reconstructing a cone beam computed tomography (CBCT) image includes a graphics processing unit (GPU) to perform CBCT reconstruction. The CBCT reconstruction performed by the GPU includes receiving image data for CBCT reconstruction. The CBCT reconstruction performed by the GPU includes using an iterative process to minimize an energy functional component of the received image data. The energy functional component includes a data fidelity term and a data regularization term. The CBCT reconstruction performed by the GPU includes generating the reconstructed CBCT image based on the minimized energy functional component. The computing system includes a central processing unit (CPU) to receive the generated CBCT image for output.
Implementations can optionally include one or more of the following features. The received image data can include volumetric information projected using x-ray projections onto an x-ray imager plane in a cone beam geometry along a number of projection angles. The fidelity term can indicate consistency between the reconstructed CBCT image and an observed image from the number of projection angles. The data regularization term can include a total variation regularization term. The iterative process to minimize an energy functional component of the received image data can include an algorithm that minimizes the data regularization term and the data fidelity term in two alternating steps. The algorithm that minimizes the data regularization term and the data fidelity term in two alternating steps can include an iterative algorithm that includes (a) a gradient descent update with respect to minimization of the data fidelity term; (b) Rudin-Osher-Fatemi model minimization to remove noise and artifacts while preserving sharp edges and main features; (c) truncation to ensure non-negativeness of the reconstructed image; and (d) repeating (a)-(c) until a desired minimization of the energy functional component is reached.
In another aspect, a computing system for reconstructing a cone beam computed tomography (CBCT) image includes a graphics processing unit (GPU) to perform the CBCT reconstruction. The GPU is configured to perform the CBCT reconstruction including receiving image data for CBCT reconstruction. The GPU is configured to perform the CBCT reconstruction including using an iterative conjugate gradient least square (CGLS) algorithm to minimize an energy functional component. The GPU is configured to perform the CBCT reconstruction including generating the reconstructed CBCT image based on the minimized energy functional component. The computing system includes a central processing unit (CPU) to receive the reconstructed CBCT image for output.
Implementations can optionally include one or more of the following features. The received image data can include volumetric information projected using x-ray projections onto an x-ray imager plane in a cone beam geometry along a number of projection angles. The GPU can be configured to perform the iterative CGLS algorithm by beginning with an initial guess and repeatedly minimizing the energy functional component until the reconstructed CBCT image is obtained. The iterative CGLS algorithm can ensure consistency between the reconstructed CBCT image and an observation image from the number of projection angles. The GPU can use the iterative CGLS algorithm to impose a tight frame regularization condition. The GPU can be configured to impose the tight frame regularization condition by decomposing the reconstructed image into a set of coefficients using a convolution function.
The subject matter described in this specification can potentially provide one or more of the following advantages. For example, the described techniques, apparatus and systems can be used to implement fast graphics processing unit (GPU)-based low-dose cone beam computed tomography (CT) reconstruction via total variation regularization. In the described low-dose CBCT reconstruction implementation, the CBCT image is reconstructed by minimizing an energy functional that includes a Total Variation (TV) regularization term and a data fidelity term. An algorithm can be implemented to efficiently perform the optimization process via a proximal forward-backward splitting method and a multi-grid technique. Furthermore, the described reconstruction algorithm can be fully implemented on a graphics processing unit (GPU), ensuring satisfactory computational speed. The powerful minimization algorithm, as well as the GPU implementation, can provide both sufficient reconstruction accuracy and satisfactory efficiency. In particular, the reconstruction time can range from 0.5 to 5 minutes depending on the number of x-ray projections used, a significant improvement over existing similar reconstruction approaches.
Also, the described techniques can provide CBCT reconstructions with a greatly reduced number of noisy projections, while maintaining high image quality. In addition, the reconstruction process can be sped up by utilizing a better optimization algorithm and more powerful computational hardware. For example, general purpose graphics processing units (GPUs) can be used to speed up heavy duty tasks in radiotherapy, such as CBCT reconstruction, deformable image registration, dose calculation, and treatment plan optimization.
a shows two patches used to measure MTF in CBCT images reconstructed from TF and FDK algorithms at 1.0 mAs/projection with 40 projections.
b shows MTF measurements obtained from different methods.
c shows three patches used to measure MTF in CBCT images reconstructed from the TF method at 1.0, 0.3, and 0.1 mAs/projection with 40 projections.
d shows MTF measured at different mAs levels.
a shows results of taking a case at 0.3 mAs/projection and 40 projections as an example.
b shows the dependence of CNRs on mAs levels measured in those four ROIs in the CBCT images reconstructed using the TF method.
c shows corresponding CNRs obtained from the conventional FDK algorithm.
Like reference symbols and designations in the various drawings indicate like elements.
In one aspect, techniques, apparatus and systems are described for implementing fast graphics processing unit (GPU)-based low-dose cone beam computed tomography (CT) reconstruction via total variation regularization. In the described low-dose GPU-based CBCT reconstruction implementation, CBCT images can be reconstructed by minimizing an energy functional that includes a TV regularization term and a data fidelity term posed by the x-ray projections. By developing new minimization algorithms with mathematical structures suitable for GPU parallelization, the massive computing power of GPU can be adapted to dramatically improve the efficiency of the TV-based CBCT reconstruction.
For example, an algorithm can be implemented to efficiently perform the optimization process via a proximal forward-backward splitting method and a multi-grid technique. Furthermore, the described reconstruction algorithm can be fully implemented on a graphics processing unit (GPU), ensuring satisfactory computational speed. Test results of the CBCT reconstruction on a digital phantom are described below. The powerful minimization algorithm, as well as the GPU implementation, can be combined to provide both sufficient reconstruction accuracy and satisfactory efficiency. In particular, the reconstruction time can range from 0.5 to 5 minutes depending on the number of x-ray projections used.
Consider a patient volumetric image represented by a function ƒ(x, y, z), where (x, y, z) ∈ R3 is a vector in 3-dimensional Euclidean space. A projection operator Pθ maps ƒ(x, y, z) into another function on an x-ray imager plane along a projection angle θ:
Pθ[ƒ(x,y,z)](u,v) = ∫0^L(u,v) dl ƒ(xs + n1l, ys + n2l, zs + n3l),  (1)
where (xs,ys,zs) is the coordinate of the x-ray source S and (u,v) is the coordinate for a point P on the imager, n=(n1,n2,n3) being a unit vector along the direction SP.
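The line integral in Eq. (1) can be approximated numerically by sampling the volume along the ray from the source S to the detector point P. The following is a minimal Python sketch using nearest-neighbor sampling on a unit-spaced voxel grid; a production implementation would instead use Siddon ray tracing or trilinear interpolation, as discussed later in the text:

```python
import numpy as np

def cone_beam_projection(f, src, det_point, n_samples=256):
    """Approximate Eq. (1): line integral of the volume f along the ray
    from the x-ray source `src` to the detector point `det_point`.

    f          -- 3D array of attenuation values on a unit-spaced grid
    src        -- (xs, ys, zs) source position in voxel coordinates
    det_point  -- point P on the imager, in the same coordinates
    """
    src = np.asarray(src, dtype=float)
    det = np.asarray(det_point, dtype=float)
    ray = det - src
    length = np.linalg.norm(ray)       # L(u, v) in Eq. (1)
    n = ray / length                   # unit vector along SP
    dl = length / n_samples
    total = 0.0
    for k in range(n_samples):
        p = src + n * (k + 0.5) * dl   # midpoint sampling along the ray
        idx = np.round(p).astype(int)  # nearest-neighbor voxel lookup
        if all(0 <= idx[d] < f.shape[d] for d in range(3)):
            total += f[tuple(idx)] * dl
    return total
```

On a GPU, one thread would evaluate such an integral per detector pixel and projection angle.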
The CBCT image can be reconstructed by minimizing an energy functional consisting of a data fidelity term and a regularization term:
ƒ(x,y,z) = argmin E[ƒ] = argmin E1[ƒ] + μE2[ƒ], s.t. ƒ(x,y,z) ≥ 0 for ∀(x,y,z) ∈ R3,  (2)
where the energy functionals are

E1[ƒ] = ∫V dx dy dz |∇ƒ(x,y,z)|

and

E2[ƒ] = (1/2) Σθ ∫A du dv |Pθ[ƒ(x,y,z)](u,v) − Yθ(u,v)|^2.
Here V is the volume in which the CBCT image is reconstructed, Nθ is the number of projections used, and A is the projection area on each x-ray imager. ∥·∥p denotes the lp-norm of functions. In Eq. (2), the data fidelity term E2[ƒ] ensures the consistency between the reconstructed volumetric image ƒ(x,y,z) and the observation Yθ(u,v). The first term, E1[ƒ], known as the TV semi-norm, is extremely powerful for removing artifacts and noise from ƒ while preserving its sharp edges to a certain extent. The CBCT image reconstruction from few projections is an underdetermined problem in that there are infinitely many functions ƒ such that Pθ[ƒ(x,y,z)](u,v) = Yθ(u,v). By performing the minimization in Eq. (2), the projection condition can be satisfied to a good approximation. The TV regularization term serves to pick out, among those infinitely many candidate solutions, the one with the desired image properties, namely smooth while having sharp edges. The dimensionless constant μ > 0 controls the smoothness of the reconstructed images by adjusting the relative weight between E1[ƒ] and E2[ƒ]. In the reconstruction results shown herein, the value of μ can be chosen manually for the best image quality.
One of the obstacles encountered while solving Eq. (2) comes from the projection operator Pθ. In matrix representation, this operator is sparse. However, it contains approximately 4×10^9 non-zero elements for a typical clinical case described herein, occupying about 16 GB of memory. This matrix is usually too large to be stored in typical computer memory, and therefore it has to be computed repeatedly whenever necessary. This repeated work can consume a large amount of the computational time. For example, if Eq. (2) were solved with a simple gradient descent method, Pθ would have to be evaluated repeatedly while computing the search direction, i.e., the negative gradient, and while calculating the functional value for the step size search. This could significantly limit the computational efficiency.
To overcome this difficulty, the forward-backward splitting algorithm can be used. This algorithm splits the minimization of the TV term and the data fidelity term into two alternating steps, and the computation of the x-ray projection Pθ appears in only one of them. The computation efficiency can be improved owing to this splitting. Consider the optimality condition of Eq. (2), obtained by setting the functional variation with respect to ƒ(x,y,z) to zero:

δE1[ƒ]/δƒ + μ δE2[ƒ]/δƒ = 0.  (3)
If the above equation is split into the following form

δE1[ƒ]/δƒ + β(ƒ − g) = 0,
β(g − ƒ) + μ δE2[ƒ]/δƒ = 0,
where β>0 is a parameter and g(x,y,z) is an auxiliary function used in this splitting algorithm, the minimization problem can accordingly be split, leading to the following iterative algorithm (Algorithm A1) to solve Eq. (2):

Step 1: g(x,y,z) = ƒ(x,y,z) − (μ/β) δE2[ƒ]/δƒ(x,y,z);
Step 2: ƒ(x,y,z) = argmin E1[ƒ] + (β/2)E3[ƒ];
Step 3: ƒ(x,y,z) = max(ƒ(x,y,z), 0).
Here a new energy functional can be described as

E3[ƒ] = ∫V dx dy dz |ƒ(x,y,z) − g(x,y,z)|^2.
Step 1 in Algorithm A1 is a gradient descent update with respect to the minimization of the energy functional E2[ƒ]. Also, Step 2 here is just a Rudin-Osher-Fatemi (ROF) model, which has been shown to be extremely efficient and capable of removing noise and artifacts from a degraded image g(x,y,z) while still preserving the sharp edges and main features. In addition, since ƒ represents the x-ray attenuation coefficients, its non-negativeness can be ensured by a simple truncation as in Step 3.
The choice of β can be important in this algorithm. On one hand, a small value of β can lead to a large step size of the gradient descent update in Step 1, causing instability of the algorithm. On the other hand, a large β may tend to make the TV semi-norm E1[ƒ] unimportant in Step 2, reducing its efficacy in removing artifacts. In practice, an empirical value β ≈ 10μ can be appropriate.
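The alternating structure of Algorithm A1 can be sketched in outline form. The following Python sketch assumes placeholder implementations of the fidelity gradient and the ROF denoising step (both named hypothetically here), and assumes the Step 1 step size μ/β implied by the splitting:

```python
import numpy as np

def algorithm_a1(f0, grad_fidelity, rof_denoise, mu, beta, n_iters=50):
    """Sketch of the alternating scheme of Algorithm A1.

    grad_fidelity(f)    -- functional variation of E2 at f (placeholder)
    rof_denoise(g, beta) -- solution of the ROF sub-problem of Step 2
                            (placeholder); beta ~ 10*mu per the text
    """
    f = np.asarray(f0, dtype=float).copy()
    for _ in range(n_iters):
        # Step 1: gradient descent update on the data fidelity term
        g = f - (mu / beta) * grad_fidelity(f)
        # Step 2: ROF minimization removes noise/artifacts from g
        f = rof_denoise(g, beta)
        # Step 3: truncation enforces non-negative attenuation values
        f = np.maximum(f, 0.0)
    return f
```

In the GPU implementation, each of the three steps is itself parallelized over voxels or ray lines.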
For the sub-problem in Step 2, the optimization of an energy functional EROF[ƒ] = E1[ƒ] + (β/2)E3[ƒ] can be solved by a simple gradient descent method. At each iteration step of the gradient descent method, the functional value E0 = EROF[ƒ] can first be evaluated, as well as the functional gradient

d(x,y,z) = δEROF[ƒ]/δƒ(x,y,z) = −∇·(∇ƒ/|∇ƒ|) + β(ƒ(x,y,z) − g(x,y,z)).
An inexact line search can then be performed along the negative gradient direction with an initial step size λ = λ0. The trial functional value Enew = EROF[ƒ − λd] can be evaluated. If Armijo's rule is satisfied, namely

Enew ≤ E0 − cλ∥d∥2^2,
where c is a constant, the step size λ is accepted and an update of the image ƒ ← ƒ − λd is applied; otherwise, the search step size is reduced by a factor α with α ∈ (0,1). This iteration can be repeated until the relative energy decrease in a step,

|E0 − Enew| / E0,

is less than a given threshold ε. The parameters relevant in this sub-problem can be chosen as the empirical values c = 0.01, α = 0.6 and ε = 0.1%. The computation results are found to be insensitive to these choices.
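The gradient descent with Armijo backtracking described above can be sketched as follows, with `energy` and `grad` standing in for the ROF functional and its gradient (both placeholders here), and the empirical parameter values from the text used as defaults:

```python
import numpy as np

def gradient_descent_armijo(f, energy, grad, lam0=1.0, c=0.01,
                            alpha=0.6, eps=1e-3, max_iters=200):
    """Inexact line search scheme of the kind described above.

    energy(f) -> scalar functional value; grad(f) -> its gradient.
    Stops when the relative energy decrease in a step is below eps.
    """
    for _ in range(max_iters):
        e0 = energy(f)
        d = grad(f)
        lam = lam0
        # backtracking until Armijo's rule is met
        while energy(f - lam * d) > e0 - c * lam * np.sum(d * d):
            lam *= alpha
        f = f - lam * d
        e_new = energy(f)
        if abs(e0 - e_new) / max(e0, 1e-12) < eps:
            break
    return f
```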
Boundary conditions can also be addressed during the implementation. For simplicity, a zero boundary condition can be imposed in the described computation along the anterior-posterior direction and the lateral direction, while a reflective boundary condition can be used along the superior-inferior direction.
Computer graphics cards, such as the NVIDIA GeForce series and the GTX series, can be used for display purposes on desktop computers. However, special GPUs dedicated to scientific computing, for example the Tesla C1060 card (from NVIDIA of Santa Clara, Calif.), can be used for implementing the CBCT algorithms described herein. A scientific-computing-dedicated GPU card can have multiple processor cores (e.g., a total of 240 processor cores grouped into 30 multiprocessors with 8 cores each), each with a clock speed of 1.3 GHz. The card can be equipped with 4 GB DDR3 memory, shared by all processor cores. Utilizing such a GPU card with tremendous parallel computing ability can considerably elevate the computational efficiency of the described algorithm. In fact, a number of components can be easily accomplished in a parallel fashion. For instance, the evaluation of the functional value EROF[ƒ] in Step 2 of Algorithm A1 can be performed by first evaluating its value at each (x,y,z) coordinate and then summing over all coordinates. The functional gradient d(x,y,z) can also be computed with each GPU thread responsible for one coordinate.
A straightforward way of implementing Algorithm A1 is to interpret Pθ[ƒ] as a matrix multiplication and E2[ƒ] as a vector norm ∥Pθƒ − Yθ∥2^2. This leads to a simple form

δE2[ƒ]/δƒ = Σθ PθT (Pθƒ − Yθ)

for the functional variation δE2[ƒ]/δƒ(x,y,z) in Step 1, apart from some constants, where ·T denotes the matrix transpose. In practice, Pθ may be too large to be stored in computer memory. Also, Pθƒ is simply a forward projection calculation that can be easily computed by a ray-tracing algorithm, such as Siddon's algorithm. Due to the massive parallelization ability of the GPU, multiple threads can compute projections of a large number of rays simultaneously, and high efficiency can be achieved. On the other hand, an efficient algorithm to perform the operation of PθT is not readily available.
To overcome this difficulty, the functional variation δE2[ƒ]/δƒ(x,y,z) in Step 1 of Algorithm A1 can be analytically computed and a closed-form equation can be obtained:

δE2[ƒ]/δƒ(x,y,z) = Σθ [Pθ[ƒ](u*,v*) − Yθ(u*,v*)] · L(u*,v*)^3 / (L0 l(x,y,z)^2).  (6)
Here (u*,v*) is the coordinate for a point P on the imager, where a ray line connecting the source S = (xs, ys, zs) and the point V = (x,y,z) intersects with the imager. L0 is the distance from the x-ray source S to the imager. l(x,y,z) and L(u*,v*) are the distances from S to the point V and from S to the point P, respectively. See
Without loss of generality, assume the x-ray projection is taken along the positive x-direction. With the help of delta functions, Eq. (1) can be rewritten as:
Pθ[ƒ(x,y,z)](u,v) = ∫ dl dx dy dz ƒ(x,y,z) · δ(x−xs−n1l) δ(y−ys−n2l) δ(z−zs−n3l).  (7)
From the cone beam geometry, the components of the unit vector n can be written as n1 = L0/L(u,v), n2 = u/L(u,v), and n3 = v/L(u,v),
where L(u,v) = [L0^2 + u^2 + v^2]^(1/2). For the functional E2[ƒ] in Eq. (2), the variation is taken with respect to ƒ(x,y,z), yielding

δE2[ƒ]/δƒ(x,y,z) = Σθ ∫ du dv dl [Pθ[ƒ](u,v) − Yθ(u,v)] δ(x−xs−n1l) δ(y−ys−n2l) δ(z−zs−n3l).  (8)
The following functions can be defined as the arguments of the three delta functions in Eq. (8):

φ1(u,v,l) = x − xs − n1(u,v)l, φ2(u,v,l) = y − ys − n2(u,v)l, φ3(u,v,l) = z − zs − n3(u,v)l,

and consider the system of equations

φi(u,v,l) = 0, i = 1, 2, 3.  (9)
From the properties of the delta function, it follows that

δE2[ƒ]/δƒ(x,y,z) = Σθ [Pθ[ƒ](u*,v*) − Yθ(u*,v*)] · |∂(φ1,φ2,φ3)/∂(u,v,l)|^(−1),  (10)

where (u*,v*,l*) is the solution to Eq. (9) and the Jacobian is evaluated at this point. Explicitly, the following is obtained:

u* = L0(y−ys)/(x−xs), v* = L0(z−zs)/(x−xs), l* = [(x−xs)^2 + (y−ys)^2 + (z−zs)^2]^(1/2).  (11)
The geometric meaning of this solution is clear. l* is the distance from the x-ray source to the point V = (x,y,z). (u*,v*) is the coordinate for a point P on the imager, where a ray line connecting the source S = (xs,ys,zs) and the point V = (x,y,z) intersects with the imager. Finally, the Jacobian |∂(φ1,φ2,φ3)/∂(u,v,l)| in Eq. (10) can be evaluated. This somewhat tedious work yields a simple result:

|∂(φ1,φ2,φ3)/∂(u,v,l)| (u*,v*,l*) = L0 l*^2 / L(u*,v*)^3.  (12)
Substituting Equation (12) into equation (10) leads to Equation (6) above.
The form of Equation (6) suggests a very efficient way of evaluating this functional variation and hence accomplishing Step 1 in Algorithm A1 in a parallel fashion. For this purpose, the forward projection operation can first be performed to compute an auxiliary function defined as Tθ(u,v) ≡ [Pθ[ƒ(x,y,z)](u,v) − Yθ(u,v)] for all (u,v) and θ. For a point (x,y,z) at which the functional variation is to be evaluated, it suffices to take the function values Tθ(u*,v*) at the (u*,v*) coordinate corresponding to (x,y,z), multiply by the proper prefactors, and finally sum over θ. In numerical computation, since Tθ(u,v) can only be evaluated at a set of discrete (u,v) coordinates and (u*,v*) does not necessarily coincide with these discrete coordinates, a bilinear interpolation is performed to get Tθ(u*,v*). Because the computation of Tθ(u,v) can be achieved in a parallel manner with each GPU thread responsible for a ray line, and the evaluation of the functional variation δE2[ƒ]/δƒ(x,y,z) can be performed with each thread responsible for a voxel (x,y,z), extremely high efficiency in Step 1 is expected given the vast parallelization ability of the GPU.
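The bilinear interpolation used to evaluate Tθ(u*,v*) at a non-grid coordinate can be sketched as below; this is a per-pixel CPU sketch of what each GPU thread would compute for its voxel:

```python
import numpy as np

def bilinear_sample(T, u, v):
    """Bilinearly interpolate the detector image T at a continuous
    coordinate (u, v), where T is indexed on an integer pixel grid."""
    i0, j0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - i0, v - j0
    # clamp the four neighboring pixel indices to the detector extent
    i1 = min(i0 + 1, T.shape[0] - 1)
    j1 = min(j0 + 1, T.shape[1] - 1)
    i0 = max(i0, 0); j0 = max(j0, 0)
    return ((1 - du) * (1 - dv) * T[i0, j0] + du * (1 - dv) * T[i1, j0]
            + (1 - du) * dv * T[i0, j1] + du * dv * T[i1, j1])
```

The per-voxel evaluation of the variation then sums these interpolated values over all projection angles, weighted by the geometric prefactor in Eq. (6).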
Another technique employed to increase computation efficiency is the multi-grid method. Because of the convexity of the energy functional in Eq. (2), the minimization problem is equivalent to solving a set of nonlinear differential equations posed by the optimality condition as in Eq. (3). When solving a differential equation with a certain kind of finite difference scheme, the convergence rate of an iterative approach largely depends on the mesh grid size. In particular, the convergence rate usually worsens when a very fine grid size is used. Moreover, a finer grid also implies a larger number of unknown variables, significantly increasing the size of the computation task.
A multi-grid approach can be utilized to resolve these problems. When it is desired to reconstruct a volumetric CBCT image ƒ(x,y,z) on a fine grid Ωh of size h, a process can be started by solving the problem on a coarser grid Ω2h of size 2h with the same iterative approach as in Algorithm A1. Upon convergence, the solution ƒ2h on Ω2h can be extended to the fine grid Ωh by, for example, a linear interpolation, and then the solution can be used as the initial guess of the solution on Ωh. Because of the decent quality of this initial guess, only a few iteration steps of Algorithm A1 are needed to achieve the final solution on Ωh. This two-level idea can be used recursively. In practice, a 4-level multi-grid scheme can be implemented, e.g., the reconstruction can be sequentially achieved on grids Ω8h→Ω4h→Ω2h→Ωh. A considerable efficiency gain can be observed in the implementation.
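The coarse-to-fine multi-grid strategy can be sketched as follows; `solve_on_grid` is a placeholder for running Algorithm A1 on a given grid, and a nearest-neighbor prolongation stands in for the linear interpolation mentioned above:

```python
import numpy as np

def prolong(f_coarse):
    """Prolongation from grid size 2h to h. Nearest-neighbor replication
    is used for brevity; the text uses linear interpolation."""
    return f_coarse.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

def multigrid_reconstruct(shapes, solve_on_grid):
    """Coarse-to-fine reconstruction over grids Omega_8h -> ... -> Omega_h.

    shapes        -- list of 3D grid shapes, coarsest first, with each
                     axis doubling between consecutive levels
    solve_on_grid -- solve_on_grid(initial_guess, shape) runs the
                     iterative solver on that grid (placeholder)
    """
    f = np.zeros(shapes[0])
    f = solve_on_grid(f, shapes[0])        # full solve on coarsest grid
    for shape in shapes[1:]:
        f = prolong(f)                     # coarse solution -> initial guess
        f = solve_on_grid(f, shape)        # only a few iterations needed
    return f
```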
The process performed in Step 2 (250) above is further described in the flow chart on the right side of
The above described reconstruction algorithm can be tested with a digital NURBS-based cardiac-torso (NCAT) phantom, which maintains a high level of anatomical realism (e.g., detailed bronchial trees). For example,
For
Also, for the example shown in
The CBCT images are first reconstructed based on different numbers of x-ray projections Nθ. In all cases, the projections can be taken along equally spaced angles covering an entire 360-degree rotation.
For the purpose of comparison, the images reconstructed from the conventional FDK algorithm are shown with the same experimental settings. The reconstructed CBCT image from the described CBCT method based on 40 projections is almost visually indistinguishable from the ground truth. On the other hand, the image produced by the conventional FDK algorithm is full of streak artifacts due to the insufficient number of projections. Moreover, the required number of projections can be further lowered for some clinical applications. For example, 20 projections may suffice for patient setup purposes in radiotherapy, where major anatomical features have already been retrieved. As far as radiation dose is concerned, the results shown in
In order to quantify the reconstruction accuracy, the correlation coefficient c ≡ corr(ƒ, ƒ0) and the relative error

e ≡ ∥ƒ − ƒ0∥2 / ∥ƒ0∥2

can be used as metrics to measure the closeness between the ground truth image ƒ0(x,y,z) and the reconstruction results ƒ(x,y,z). The relative error e and the correlation coefficient c corresponding to the results shown in
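These two metrics can be computed as below; the l2 normalization of the relative error is an assumed convention here (the text does not spell out the exact form):

```python
import numpy as np

def relative_error(f, f0):
    """e = ||f - f0||_2 / ||f0||_2 over all voxels."""
    return np.linalg.norm(f - f0) / np.linalg.norm(f0)

def correlation_coefficient(f, f0):
    """c = corr(f, f0): Pearson correlation over all voxels."""
    a = f.ravel() - f.mean()
    b = f0.ravel() - f0.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```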
To have a complete visualization of the reconstructed CBCT image,
The following describes the convergence of the described algorithm for this NCAT phantom case, as the ground truth is known here and the quality of the reconstructed image can be quantified. The rigorous proof of the convergence of A1 has been established mathematically. Therefore, whether the solution to Eq. (2) above is reached, or if not, how far away it is, is only a matter of the number of iteration steps. Since the solution to Eq. (2) is not known (note that this solution is different from the ground truth image), it can be hard to quantify how far the solution obtained by the described algorithm is from the true solution. Alternatively, the relative error can be used to measure the distance between the described solution and the ground truth. Though this is not the mathematically correct metric to measure the convergence toward the true solution to Eq. (2), it is a practically relevant metric to quantify the goodness or correctness of the algorithm. In the described algorithm, the relative error and the obtained image quality are acceptable when the iteration is terminated.
To further demonstrate the convergence process, Nθ=40 can be used as an example.
To demonstrate the described algorithm's performance on a real physical phantom, CBCT reconstruction can be performed on a CatPhan 600 phantom (The Phantom Laboratory, Inc., Salem, N.Y.). A number of projections (e.g., 379) within a range of angles (e.g., 200 degrees) are acquired using a Varian On-Board Imager system (Varian Medical Systems, Palo Alto, Calif.) at 2.56 mAs/projection under a full-fan mode. A subset of 40 equally spaced projections is used to perform the reconstruction.
To further test the capability of handling noisy data, CBCT scans can be performed under different mAs levels, and reconstructions can then be conducted using the described TV-based algorithm and the FDK algorithm, as shown
Also, the described CBCT reconstruction algorithm can be validated on realistic head-and-neck anatomic geometry. A patient head-and-neck CBCT scan can be obtained using a Varian On-Board Imager system with 363 projections in 200 degrees and 0.4 mAs/projection. A subset of only 40 projections is selected for the reconstruction in this example.
In addition, CBCT scans can be performed on an anthropomorphic skeleton Rando head phantom (The Phantom Laboratory, Inc., Salem, N.Y.) to validate the described algorithm under low mAs levels in such a complicated anatomy. 363 projections within 200 degrees are acquired using a Varian On-Board Imager system at various mAs/projection levels. The CBCT reconstruction uses only a subset of equally spaced 40 projections. In
In addition,
Only a few embodiments have been described for a fast iterative algorithm for CBCT reconstruction. In the described TV-based CBCT techniques, an energy functional that includes a data fidelity term and a regularization term of TV semi-norm can be used. The minimization problem can be solved with a GPU-friendly forward-backward splitting method together with a multi-grid approach on a GPU platform, leading to both satisfactory accuracy and efficiency.
Reconstructions performed on a digital NCAT phantom and a real patient at the head-and-neck region indicate that images with decent quality can be reconstructed from 40 x-ray projections in about 130 seconds. Also, the described algorithm has been tested on a CatPhan 600 phantom and a Rando head phantom under different mAs levels, and it was found that CBCT images can be successfully reconstructed from scans with as low as 0.1 mAs/projection. All of these results indicate that the described new algorithm has improved the efficiency by a factor of about 100 over existing similar iterative algorithms and reduced the imaging dose by a factor of at least 36 compared to the current clinical standard full-fan head and neck scanning protocol. The high computation efficiency achieved by the described algorithm makes the iterative CBCT reconstruction approach applicable in real clinical environments.
In another aspect, a fast GPU-based algorithm can be implemented to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. In particular, described is an iterative tight frame (TF) based CBCT reconstruction algorithm. A condition that a real CBCT image has a sparse representation under a TF basis is imposed in the iteration process as regularization on the solution. To speed up the computation, a multi-grid method is employed. The described GPU implementation can achieve high computational efficiency, and a CBCT image of resolution 512×512×70 can be reconstructed in about 139 sec. The described algorithm can be implemented on a digital NCAT phantom and a physical CatPhan phantom. The described TF-based algorithm can be used to reconstruct CBCT in the context of undersampling and low mAs levels. Also, the reconstructed CBCT image quality can be quantitatively analyzed in terms of modulation-transfer-function and contrast-to-noise ratio under various scanning conditions. The results confirm the high CBCT image quality obtained from the described TF algorithm. Moreover, the described algorithm has also been validated in a real clinical context using a head-and-neck patient case.
Tight frames have the same structure as the traditional wavelets, except that they are redundant systems that generally provide sparser representations to piecewise smooth functions than traditional wavelets. CBCT reconstruction problem can be generally viewed as a 3-dimensional image restoration problem. In such a problem, the discontinuities of the reconstructed piecewise smooth image provide very important information, as they usually account for the boundaries between different objects in the volumetric image. In the TF approach, one tries to restore TF coefficients of the image, which usually correspond to important features, e.g. edges, as opposed to the image itself. This allows us to specifically focus on the reconstruction of the important information of the image, hence leading to high quality reconstruction results.
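As a concrete illustration, the following 1D example uses the piecewise-linear B-spline framelet filters from the unitary extension principle. These particular filters are an assumption made here for illustration (the text does not fix a specific TF system); the sketch demonstrates the defining perfect-reconstruction property WᵀW = I of a tight frame:

```python
import numpy as np

# Piecewise-linear B-spline framelet filters (unitary extension principle):
# one low-pass and two high-pass filters forming a 1D tight frame.
_R2 = np.sqrt(2.0)
FILTERS = [np.array([0.25, 0.5, 0.25]),          # low-pass
           np.array([_R2 / 4, 0.0, -_R2 / 4]),   # high-pass (edge-like)
           np.array([-0.25, 0.5, -0.25])]        # high-pass (ridge-like)

def _circ_conv(x, h):
    """Circular convolution with a centered 3-tap filter (periodic BC)."""
    H = np.fft.fft(np.concatenate([h, np.zeros(len(x) - len(h))]))
    y = np.real(np.fft.ifft(np.fft.fft(x) * H))
    return np.roll(y, -(len(h) // 2))  # re-center the filter

def tf_decompose(x):
    """W x: one band of framelet coefficients per filter."""
    return [_circ_conv(x, h) for h in FILTERS]

def tf_reconstruct(coeffs):
    """W^T c: adjoint transform (convolution with time-reversed filters).
    For a tight frame, W^T W = identity."""
    return sum(_circ_conv(c, h[::-1]) for c, h in zip(coeffs, FILTERS))
```

Note that the high-pass bands of a constant signal vanish, reflecting the sparse representation of smooth regions that the TF regularization exploits.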
Besides its effectiveness, the TF approach also has attractive numerical properties. Numerical schemes specifically designed for the TF approach can lead to a high convergence rate. Also, the numerical scheme only involves simple matrix-vector or vector operations, making it straightforward to implement the algorithm and parallelize it in a parallel computing structure. It is these numerical properties that lead to high computational efficiency in practice. Moreover, general purpose graphics processing units (GPUs) have offered a promising prospect of increasing the efficiency of heavy duty tasks in radiotherapy, such as CBCT FDK reconstruction, deformable image registration, dose calculation, and treatment plan optimization. Taking advantage of the high computing power of the GPU, the computational efficiency of TF-based CBCT reconstruction can be enhanced considerably.
Techniques, systems and apparatus are described for implementing a novel CBCT reconstruction algorithm based on TF and implementing it on a GPU. The described techniques, systems and apparatus can provide a new approach for CBCT reconstruction, in addition to the well-known FDK-type algorithms and the state-of-the-art iterative reconstruction algorithms, such as total variation. The described techniques, along with some validations, are described below. Experiments on a digital phantom, a physical phantom, and a real patient case demonstrate the possibility of reconstructing high quality CBCT images from extremely undersampled and noisy data. The associated high computational efficiency, due to the good numerical properties of the TF algorithm and the GPU implementation, makes this approach practically attractive. Introducing the novel TF algorithm to the CBCT reconstruction context for the first time can shed light on the CBCT reconstruction field and contribute to the realization of low-dose CBCT.
A patient volumetric image is represented by a function ƒ(x) with x=(x,y,z)∈R3. A projection operator Pθ maps ƒ(x) into another function on an x-ray imager plane along a projection angle θ:
Pθ[ƒ](u)≡∫0L(u) dl ƒ(xs−nl),  (13)
where xs=(xs,ys,zs) is the coordinate of the x-ray source and u=(u,v)∈R2 is the coordinate of the projection point on the x-ray imager, n=(n1,n2,n3) being a unit vector along the projection direction.
The CBCT image reconstruction from few projections is an underdetermined problem. Because of the insufficient measurements made at only a few x-ray projections, there are infinitely many functions ƒ satisfying the condition Pθ[ƒ](u)=gθ(u). Therefore, regularization based on some assumptions about the solution ƒ has to be performed during the reconstruction process. Such regularization-based CBCT reconstruction approaches usually result in solving challenging minimization problems. The most commonly used approach is an alternating iteration scheme, where, within each iteration step, the conditions to be satisfied by the solution are imposed one after another. In the described problem, there are three conditions that need to be satisfied by the solution, and three key operations are performed in each iteration step accordingly. These conditions, as well as the operations ensuring them, are described in the following.
First, the x-ray projection of the reconstructed volumetric image ƒ(x) should match the observation gθ(u). This condition is commonly achieved by solving a linear system Pƒ=g, where P is the matrix representation of the projection operator Pθ, and ƒ and g are vectors corresponding to the solution ƒ(x) and the observation gθ(u), respectively. Nonetheless, since this is a highly underdetermined problem, any numerical scheme attempting to directly solve Pƒ=g is unstable. Instead, this aspect of the specification describes an implementation that minimizes an energy E[ƒ]=∥Pƒ−g∥22 using a conjugate gradient least squares (CGLS) algorithm. This algorithm is essentially an iterative algorithm, which generates a new solution ƒ given an initial guess v. This process can be represented as ƒ←CGLS[v], and the details regarding the implementation of the CGLS algorithm are described below. The CGLS algorithm can be used to efficiently solve this minimization problem, and hence ensure the consistency between the reconstructed volumetric image ƒ(x) and the observation gθ(u).
Second, a regularization condition can be imposed on the solution ƒ(x): it should have a sparse representation under a piecewise linear TF system X={ψi(x)}. The solution ƒ(x) can be decomposed by X into a set of coefficients αi(x)=ψi(x)⊛ƒ(x), where ⊛ stands for the convolution of two functions. In this specification, the piecewise linear TF basis is used. Specifically, in one dimension (1D), the discrete forms of the basis functions are chosen as
and
The 3D basis functions are then constructed by the tensor product of the three 1D basis functions, i.e., ψi(x,y,z)=hl(x)hm(y)hn(z), with integers l, m, n chosen from 0, 1, or 2. The transform from ƒ(x) to the TF coefficients αi(x) via convolution is a linear operation. To simplify the notation, this transformation can be represented in matrix notation as α=Dƒ, where α is a vector consisting of all the TF coefficients. The introduction of the matrix D is merely for the purpose of simplifying notation. In practice, this transformation is computed via convolution rather than matrix multiplication. Conversely, the function ƒ(x) can be uniquely determined given a set of coefficients {αi(x)} by
which can be denoted as ƒ=DTα.
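As a concrete illustration, the decomposition α=Dƒ and the reconstruction ƒ=DTα can be sketched in 1D. The specific three-tap filters below are the standard piecewise-linear framelet filters from the tight-frame literature and are an assumption here, since the document's filter equations are not reproduced above; with periodic boundary handling they satisfy DTD=I, the defining perfect-reconstruction property of a tight frame.

```python
import numpy as np

# Standard piecewise-linear tight-frame (framelet) filters (assumed choice).
h = [np.array([1.0, 2.0, 1.0]) / 4.0,
     np.array([1.0, 0.0, -1.0]) * np.sqrt(2.0) / 4.0,
     np.array([-1.0, 2.0, -1.0]) / 4.0]

def circ_conv(f, k):
    """Circular (periodic) convolution of a 1D signal with a 3-tap filter."""
    return k[0] * np.roll(f, 1) + k[1] * f + k[2] * np.roll(f, -1)

def decompose(f):
    """alpha = D f : one coefficient band per filter."""
    return [circ_conv(f, k) for k in h]

def reconstruct(alpha):
    """f = D^T alpha : the adjoint applies the time-reversed filters."""
    return sum(circ_conv(a, k[::-1]) for a, k in zip(alpha, h))
```

The redundancy is visible here: a length-N signal maps to 3N coefficients, yet `reconstruct(decompose(f))` returns `f` exactly.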
Many natural images have a very sparse representation under the TF system X, i.e. only a small proportion of the elements within the coefficient vector α are considerably larger in magnitude than the rest. It is this property that can be utilized as a priori knowledge, in the form of a regularization term, in the CBCT reconstruction problem. A common way of imposing this condition on the solution ƒ is to discard small TF coefficients. The deletion of these small coefficients not only sharpens edges but also removes noise in the reconstructed CBCT image. As such, ƒ can first be decomposed into the TF space; the TF coefficients are then soft-thresholded with a predetermined value μ; and finally ƒ is reconstructed from the new coefficients. This process can be denoted as ƒ←DTTμ[Dƒ], where Tμ denotes the soft-threshold operation, defined as:
where sgn(.) is the sign function defined as
It is understood that the soft-threshold operation Tμ(α) is performed component-wise on the vector α.
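The component-wise soft-threshold is simple to state in code; a minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def soft_threshold(alpha, mu):
    """Shrink each coefficient toward zero by mu; zero out anything smaller:
    T_mu(a) = sgn(a) * max(|a| - mu, 0), applied component-wise."""
    return np.sign(alpha) * np.maximum(np.abs(alpha) - mu, 0.0)
```

Coefficients with magnitude below μ are removed entirely, while large coefficients (edges, salient features) survive with a uniform shrinkage of μ.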
Third, since the reconstructed CBCT image ƒ(x) physically represents the x-ray attenuation coefficient at a spatial point x, its non-negativity has to be ensured during the reconstruction in order to obtain a physically correct solution. For this purpose, a correction step is performed on the reconstructed image ƒ(x) by setting its negative voxel values to zero. Mathematically, this operation is denoted by ƒ←(ƒ)+, where (·)+ stands for a voxel-wise truncation of the negative values in the CBCT image ƒ.
In considering all the components mentioned above, the reconstruction algorithm can be summarized as shown in Algorithm B1:
There is only one tuning parameter μ in the algorithm. In practice, its value is carefully tuned so that the best image quality can be obtained. An example of a process for selecting this parameter is described further below.
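The three-step structure of Algorithm B1 can be sketched as follows. This is a simplified stand-in, not the full method: a few plain gradient (Landweber) steps replace the CGLS sub-solver, and the frame transform is taken as the identity (D=I) so that thresholding acts on ƒ directly; the point is the control flow of fidelity update, shrinkage, and positivity truncation.

```python
import numpy as np

def soft_threshold(a, mu):
    return np.sign(a) * np.maximum(np.abs(a) - mu, 0.0)

def reconstruct_b1(P, g, mu, n_outer=200, n_fid=3):
    """Sketch of Algorithm B1 with a dense projection matrix P."""
    step = 1.0 / np.linalg.norm(P, 2) ** 2       # safe gradient step size
    f = np.zeros(P.shape[1])
    for _ in range(n_outer):
        for _ in range(n_fid):                   # stand-in for a few CGLS steps
            f = f - step * (P.T @ (P @ f - g))   # descend on ||Pf - g||^2
        f = soft_threshold(f, mu)                # impose sparsity (D = I here)
        f = np.maximum(f, 0.0)                   # enforce non-negativity
    return f
```

With a tiny synthetic system, the loop drives the residual toward zero while keeping the solution non-negative.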
Mathematically, the TF coefficients Dƒ(k) generated by Algorithm B1 can be shown to converge to the solution of the following optimization problem:
The optimization problem of Equation (16) is a successful model in solving image restoration problems. With a simple modification, the convergence rate of B1 can be considerably enhanced, leading to Algorithm B2 used in the described reconstruction problem:
The CBCT reconstruction problem can be solved with the aforementioned algorithm B2 on a GPU, such as an NVIDIA Tesla C1060 card. This GPU card has a total number of 240 processor cores (grouped into 30 multiprocessors with 8 cores each), each with a clock speed of 1.3 GHz. It is also equipped with 4 GB DDR3 memory, shared by all processor cores. Utilizing such a GPU card with tremendous parallel computing ability can considerably elevate the computation efficiency. Described below are components of the described implementation.
In fact, a number of computationally intensive tasks involved in algorithm B2 share a common feature: applying a single operation to different parts of the data. For computation tasks of this type, it is straightforward to accomplish them in a data-parallel fashion, namely having all GPU threads run the same operation, each on a given subset of the data. Such a parallel scheme is particularly suitable for the SIMD (single instruction multiple data) structure of a GPU, and high computational efficiency can therefore be achieved.
Specifically, the following components in B2 fall into this category: 1) The component-wise soft-threshold in Step 3 and the voxel-wise positivity correction of the CBCT image in Step 4 can be parallelized with one GPU thread responsible for one TF coefficient or one voxel, respectively. 2) The transformation of a CBCT image ƒ into the TF space is merely a convolution operation αi(x)=ψi(x)⊛ƒ(x). This computation can be performed by having one GPU thread compute the resulting αi(x) at one x coordinate. The inverse transformation from the TF coefficients α to the image ƒ is also a convolution operation and can be achieved in a similar manner. 3) A matrix-vector multiplication of the form g=Pƒ is frequently used in the CGLS method. This operation corresponds to the forward x-ray projection of a volumetric image ƒ(x) onto the imager planes, also known as a digitally reconstructed radiograph. In the described implementation, it is performed in a parallel fashion, with each GPU thread responsible for the line integral of equation (13) along one x-ray line using Siddon's ray-tracing algorithm.
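The forward projection g=Pƒ can be illustrated with a much-simplified 2D sketch. Instead of Siddon's exact ray tracing (which accumulates exact intersection lengths with each voxel), each detector point's line integral is approximated by sampling ƒ at fixed steps along the ray; all geometry values here are illustrative assumptions.

```python
import numpy as np

def forward_project(f, src, det_pts, dl=0.25):
    """Approximate line integrals of a 2D image f (unit-sized pixels) from a
    point source `src` to each detector point, by step-wise sampling along the
    ray. A crude stand-in for Siddon's algorithm, for illustration only."""
    ny, nx = f.shape
    g = np.zeros(len(det_pts))
    for i, d in enumerate(det_pts):
        ray = np.asarray(d, float) - src
        length = np.linalg.norm(ray)
        n = ray / length                      # unit vector along the ray
        t = np.arange(0.0, length, dl)
        xs = src[0] + t * n[0]
        ys = src[1] + t * n[1]
        inside = (xs >= 0) & (xs < nx) & (ys >= 0) & (ys < ny)
        # nearest-neighbour sampling of the image along the ray
        g[i] = dl * f[ys[inside].astype(int), xs[inside].astype(int)].sum()
    return g
```

In the GPU implementation each iteration of the outer loop (one detector element, i.e. one ray) maps naturally onto one thread, which is exactly the data-parallel pattern described above.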
Another distinctive component in the described implementation is the CGLS solution to the optimization problem
in step 2 of B2. In this step, a CGLS method is applied to efficiently find a solution ƒ(k+1) to this least-squares problem, starting from the initial value v(k), in an iterative manner. The details of this CGLS algorithm are provided below in a step-by-step manner.
The CGLS algorithm can be used to solve the least-squares problem
in an iterative manner using the conjugate gradient method. Specifically, the algorithm performs the following iterations:
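The CGLS iteration can be sketched in dense numpy form. In the actual implementation P is never stored explicitly (the matrix-vector products are evaluated by ray tracing and by the backprojection formula below); here a small explicit matrix stands in for illustration.

```python
import numpy as np

def cgls(P, y, x0, n_iter):
    """Conjugate gradient on the normal equations P^T P x = P^T y,
    without ever forming P^T P explicitly."""
    x = x0.copy()
    r = y - P @ x          # residual in the data (projection) space
    s = P.T @ r            # gradient in the image space
    p = s.copy()
    gamma = s @ s
    for _ in range(n_iter):
        q = P @ p                    # forward projection of search direction
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = P.T @ r                  # backprojection of the new residual
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```

Each iteration costs one forward projection (P p) and one backprojection (PT r) plus a handful of vector operations, which is why those two products dominate the runtime discussion below.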
In the context of CBCT reconstruction with only a few projections, the normal equation PTPx=PTy is indeed underdetermined. The convergence of the CGLS algorithm for underdetermined problems has been established. In the described reconstruction algorithm, the CGLS is used as a means to ensure the data fidelity condition during each iteration step of the reconstruction. Specifically, given an input image x(0)=ƒ, the CGLS algorithm outputs a solution ƒ′=x(m+1) that is better than the input in the sense that the residual ∥Pƒ′−y∥22 is smaller than ∥Pƒ−y∥22. This fact holds regardless of the singularity of the linear system.
Since the use of CGLS is merely for ensuring data fidelity via minimizing the residual l2 norm, in each outer iteration of the described TF algorithm it is not necessary to run the CGLS iteration until convergence. In practice, M=2~3 CGLS steps per outer iteration are found to be sufficient. This choice is also favorable for computational efficiency, as more CGLS steps per outer iteration would considerably slow down the overall reconstruction.
Each iteration step of the CGLS algorithm includes a number of fundamental linear algebra operations. For the simple vector-vector and scalar-vector operations, the CUBLAS package (NVIDIA, 2009) is used for high efficiency. In addition, two time-consuming operations need special attention, namely matrix-vector multiplications of the form g=Pƒ and ƒ=PTg, where P is the x-ray projection matrix. Though it is straightforward to accomplish g=Pƒ on a GPU with Siddon's ray-tracing algorithm as described previously, it is quite cumbersome to carry out a computation of the form ƒ=PTg. It is estimated that the matrix P, though sparse, contains approximately 4×109 non-zero elements for a typical clinical case described herein, occupying about 16 GB of memory. Such a huge matrix P is too large to be stored in GPU memory, not to mention computing its transpose. Therefore, a new algorithm for completing the task ƒ=PTg has been designed. While ƒ=PTg could in principle be computed using Siddon's algorithm, such an operation is a backward one in that it maps a function g(u) on the x-ray imager back to a volumetric image ƒ(x) by updating its voxel values along all ray lines. If Siddon's ray-tracing algorithm were used in the GPU implementation with each thread responsible for updating the voxels along one ray line, a memory-conflict problem would arise because different GPU threads could simultaneously update the same voxel value. When such a conflict occurs, one thread has to wait until another finishes updating. This severely limits the utilization of the GPU's massive parallel computing power.
To overcome this difficulty, the explicit form of the resulting volumetric image function ƒ(x) can be analytically computed when the operator PT acts on a function g(u) on the x-ray imager, and a closed-form expression can be obtained as:
Here u* is the coordinate of the point on the imager where the ray line connecting the x-ray source at xs and the point x intersects the imager. L0 is the distance from the x-ray source to the imager, while l(x) and L(u*) are the distances from xs to x and from xs to u* on the imager, respectively. Refer back to
Let ƒ(.): R3→R and g(.): R2→R be two smooth enough functions in the CBCT image domain and in the x-ray projection image domain, respectively. The operator PθT is defined as the adjoint of Pθ, satisfying

⟨Pθ[ƒ], g⟩=⟨ƒ, PθT[g]⟩,  (18)

where ⟨.,.⟩ denotes the inner product. This condition can be explicitly expressed as

∫du Pθ[ƒ](u)g(u)=∫dx ƒ(x)PθT[g](x).  (19)
Now take the functional variation with respect to ƒ(x) on both sides of Equation (19) and interchange the order of integration and variation on the right-hand side. This yields:
With the help of a delta function, Equation (13) can be rewritten as:

Pθ[ƒ](u)=∫dl dx ƒ(x)δ(x−xs+nl).  (21)
Substituting Equation (21) into Equation (20), the following can be obtained:
where u* is the coordinate of the point on the imager at which the ray line connecting the source xs and the point x intersects the imager. L(u*) is the length from xs to u* and l(x) is the length from xs to x. The source-to-imager distance is L0. Additionally, a summation over projection angles θ is performed in Equation (17) to account for all the x-ray projection images.
One caveat when implementing Equation (22) is that the equation is derived from the condition of Equation (18), where the inner product of two functions is defined in an integral sense. In the CGLS algorithm, both P and PT are viewed as matrices. Therefore, an inner product defined in the vector sense, i.e. ⟨ƒ,g⟩=Σiƒigi for two vectors ƒ and g, should be understood in Equation (18). Changing the inner product from the function form to the vector form results in a numerical factor in Equation (17), simply the ratio of the pixel size ΔuΔv to the voxel size ΔxΔyΔz. The accuracy of the operator PT so defined can be assessed in terms of how well it satisfies the condition expressed in Equation (18). Numerical experiments indicate that this condition is satisfied with a numerical error of less than 1%. Though this could lead to CT number inaccuracy in the reconstructed CBCT image, absolute accuracy of the CT number is not crucial for the use of CBCT in cancer radiotherapy, since CBCT is mainly used for patient setup. This potential inaccuracy of the Hounsfield Unit should be taken into account when utilizing Equation (17).
Equation (17) indicates a very efficient way of performing ƒ=PTg in a parallel fashion. To compute ƒ(x) at a given x, one simply takes the value of g(u*) at the coordinate u*, multiplies by the proper prefactors, and finally sums over all projection angles θ. In numerical computation, since g(u) is evaluated at a set of discrete coordinates and u* does not necessarily coincide with these coordinates, a bilinear interpolation is performed to obtain gθ(u*). The computation can then be parallelized with each GPU thread handling the voxel at a given x coordinate. Extremely high efficiency can be obtained given the vast parallelization ability of the GPU.
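A voxel-driven sketch of ƒ=PTg in a simplified 2D fan-beam geometry: each image pixel finds its projection u* on the detector, interpolates gθ(u*), applies a distance prefactor, and sums over angles. The prefactor (L0/l(x))² and all geometry values used here are illustrative assumptions, not the document's exact expression; the structure (interpolate, weight, sum over θ, one thread per voxel) is the point.

```python
import numpy as np

def backproject(g_stack, angles, n, L0=3.0, R=2.0):
    """Voxel-driven adjoint-projection sketch on an n x n grid over [-1,1]^2.
    g_stack[k] holds detector samples for angles[k]; the flat detector has
    half-width 1.5 in the coordinate u. Geometry is illustrative."""
    n_det = g_stack.shape[1]
    u_det = np.linspace(-1.5, 1.5, n_det)            # detector coordinates
    xs, ys = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
    f = np.zeros((n, n))
    for g, th in zip(g_stack, angles):
        src = R * np.array([np.cos(th), np.sin(th)])  # source on a circle
        # depth along the central ray and transverse offset, per pixel
        d = (src[0] - xs) * np.cos(th) + (src[1] - ys) * np.sin(th)
        t = -xs * np.sin(th) + ys * np.cos(th)
        u_star = L0 * t / d                           # fan-beam magnification
        w = (L0 / np.hypot(src[0] - xs, src[1] - ys)) ** 2   # assumed weight
        f += w * np.interp(u_star, u_det, g, left=0.0, right=0.0)
    return f
```

Every pixel's update is independent of every other pixel's, which is precisely why this formulation avoids the memory conflicts of ray-driven backprojection: on a GPU, one thread owns one voxel and only ever writes to it.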
Another technique employed to increase computational efficiency is the multi-grid method. The convergence rate of an iterative approach solving an optimization problem usually worsens when a very fine grid of size Δx, Δy, Δz is used. Moreover, a fine grid implies a large number of unknown variables, significantly increasing the size of the computational task. The multi-grid approach can be utilized to resolve these problems. When reconstructing a volumetric CBCT image ƒ(x) on a fine grid Ωh of size h, the process can begin with solving the problem on a coarser grid Ω2h of size 2h with the same iterative approach as in Algorithm B2. Upon convergence, the solution ƒ2h on Ω2h can be smoothly extended to the fine grid Ωh using, for example, linear interpolation, and used as the initial guess of the solution on Ωh. Because of the decent quality of this initial guess, only a few iteration steps of Algorithm B2 are needed to achieve the final solution on Ωh. This idea can be applied recursively while seeking the solution ƒ2h by going to an even coarser grid of size 4h. In practice, a 4-level multi-grid scheme can be used, i.e. the reconstruction is sequentially achieved on grids Ω8h→Ω4h→Ω2h→Ωh.
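The coarse-to-fine control flow can be sketched as follows (1D for brevity): solve on the coarsest grid, linearly interpolate the result onto the next finer grid as a warm start, and repeat. Here `solve` stands in for the Algorithm B2 iterations at a given grid size and is supplied by the caller.

```python
import numpy as np

def prolong(f_coarse, n_fine):
    """Linearly interpolate a coarse-grid solution onto a finer grid."""
    x_c = np.linspace(0.0, 1.0, len(f_coarse))
    x_f = np.linspace(0.0, 1.0, n_fine)
    return np.interp(x_f, x_c, f_coarse)

def multigrid_solve(solve, n_fine, levels=4):
    """Sketch of the 4-level multi-grid strategy: grids of size
    n/8 -> n/4 -> n/2 -> n, each solve warm-started from the previous."""
    sizes = [n_fine // 2 ** k for k in range(levels - 1, -1, -1)]
    f = np.zeros(sizes[0])
    for n in sizes:
        f = prolong(f, n)    # warm start (a no-op on the coarsest grid)
        f = solve(f, n)      # run Algorithm B2-style iterations at this size
    return f
```

The payoff is that the expensive fine-grid solves start close to the answer, so only a few iterations are needed at the finest (most costly) level.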
The CBCT reconstruction results are presented on a NURBS-based cardiac-torso (NCAT) phantom, a Calphan phantom (The Phantom Laboratory, Inc., Salem, N.Y.), and a real patient at the head-and-neck region. For the example described, all of the reconstructed CBCT images have a resolution of 512×512×70 voxels with a voxel size of 0.88×0.88×2.5 mm3. The x-ray imager resolution is 512×384, covering an area of 40×30 cm2. The reconstructed volume is much shorter than the imager dimension along the z-direction due to the cone beam divergence. The x-ray source-to-axis distance is 100 cm and the source-to-detector distance is 150 cm. For this example, all of these parameters mimic realistic configurations in a Varian On-Board Imager (OBI) system (Varian Medical Systems, Palo Alto, Calif.). For the cases presented, a total number of 40 x-ray projections are used to perform the reconstruction. For the digital NCAT phantom, x-ray projections are numerically computed along 40 equally spaced projection angles covering a full rotation with Siddon's ray-tracing algorithm. The Calphan phantom and the real patient are scanned in the Varian OBI system under a full-fan mode over an angular range of 200°; 363 projections are acquired and a subset of 40 equally spaced projections is selected for the reconstruction.
NCAT phantom and Calphan phantom
The described TF-based reconstruction algorithm is first tested on a digital NCAT phantom, generated at the thorax region with a high level of anatomical realism (e.g., detailed bronchial trees). In this simulated case, the projection data are ideal in that they do not contain contamination from noise and scattering as in a real scan. Under this circumstance, a powerful reconstruction algorithm should be able to reconstruct the CBCT image almost exactly. For example, the total variation method can yield accurate reconstructions from very few views. To test the TF algorithm, the reconstruction is first performed with a large number of iterations (~100 iterations at each multi-grid level) to obtain high image quality.
As shown in 1420 and 1430, the image profiles along the horizontal and vertical cuts in this slice are plotted to provide a detailed comparison between the ground truth and the reconstruction results. For both image profiles 1420 and 1430, the reconstructed image and the ground truth are virtually indistinguishable. To quantify the reconstruction accuracy in this case, the RMS error can be computed as
where ƒ is the reconstructed image and ƒ0 is the ground truth image. The reconstructed 3D volumetric CBCT image attains an RMS error of e=1.92% in this case. When the RMS error is computed in the phantom region only, i.e. excluding the background outside the patient, the RMS error is e=1.67%. These numbers, as well as the figures presented in
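The RMS error here is presumably the relative root-mean-square difference between reconstruction and ground truth; a minimal sketch under that assumption:

```python
import numpy as np

def rms_error(f, f0):
    """Relative RMS error between reconstruction f and ground truth f0
    (assumed form: ||f - f0||_2 / ||f0||_2, reported as a percentage)."""
    return 100.0 * np.linalg.norm(f - f0) / np.linalg.norm(f0)
```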
The reconstruction time for this case is about 10 min on an NVIDIA Tesla C1060 card. CBCT can be used in various applications, including patient alignment in cancer radiotherapy, where fast reconstruction is of essential importance. The above example demonstrates the feasibility of using TF as a regularization approach to reconstruct CBCT, given that a sufficient number of iteration steps is performed. In some clinical practice, such as positioning a patient in cancer radiotherapy, it is adequate to perform fewer iterations for fast image reconstruction while still yielding acceptable image quality. For this purpose, the iteration process can be terminated earlier (e.g. ~20 iterations in 2 min). Under this condition, the transverse slice of the reconstructed CBCT images is shown in
Specifically,
The Calphan phantom contains a layer including a single point-like structure with a diameter of 0.28 mm as shown in
a shows two patches (1700) used to measure MTF in CBCT images reconstructed from TF and FDK algorithms at 1.0 mAs/projection with 40 projections.
At a constant mAs level of 1.0 mAs/projection, the spatial resolution in the images reconstructed from the TF and the FDK algorithms is compared. The patch images used to compute MTF are shown in
To quantitatively evaluate the contrast of the reconstructed CBCT images, the contrast-to-noise ratio (CNR) can be measured. For a given region of interest (ROI), CNR can be calculated as CNR=2|S−Sb|/(σ+σb), where S and Sb are the mean pixel values over the ROI and in the background, respectively, and σ and σb are the standard deviations of the pixel values inside the ROI and in the background. Before computing the CNR, it should be understood that the CNR is affected by the parameter μ, which controls to what extent the solution is regularized via the TF term. In fact, a small μ is not sufficient to regularize the solution, leading to a high noise level and hence a low CNR. In contrast, a large μ tends to over-smooth the CBCT image and reduce the contrast between different structures. Therefore, there exists an optimal μ level in the reconstruction.
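The CNR formula above translates directly into code; a minimal sketch:

```python
import numpy as np

def cnr(roi, background):
    """Contrast-to-noise ratio: CNR = 2|S - Sb| / (sigma + sigma_b),
    with means and standard deviations taken over the two pixel sets."""
    S, Sb = np.mean(roi), np.mean(background)
    sigma, sigma_b = np.std(roi), np.std(background)
    return 2.0 * abs(S - Sb) / (sigma + sigma_b)
```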
Taking the case at 0.3 mAs/projection and 40 projections as an example, CBCT reconstruction can be performed with different μ values and the CNRs can be computed for the four ROIs indicated in
b shows the dependence of CNRs on mAs levels measured in those four ROIs in the CBCT images reconstructed using the TF method (1810). The corresponding CNRs obtained from the conventional FDK algorithm are also shown in
Only a few embodiments of a TF-based fast iterative algorithm for CBCT reconstruction have been described. By iteratively applying three steps to impose the three conditions that the reconstructed CBCT should satisfy, high quality CBCT images can be reconstructed from undersampled and noisy projection data. In particular, the underlying assumption that a real CBCT image has a sparse representation under a TF basis is found to be valid and robust in the reconstruction, leading to high quality results. Such an algorithm is mathematically equivalent to the so-called balanced approach for TF-based image restoration. In practice, due to the GPU implementation, the multi-grid method, and the various techniques employed, high computational efficiency can be achieved.
As shown above, the described TF-based algorithm has been tested on a digital NCAT phantom and a physical Calphan phantom. The described TF-based algorithm can lead to higher quality CBCT images than those obtained from a conventional FDK algorithm in the context of undersampled and low-mAs scans. Quantitative analysis of the CBCT image quality has been performed with respect to the MTF and CNR under various scanning conditions, and the results confirm the high CBCT image quality obtained from the described TF-based algorithm. Moreover, reconstructions performed on a head-and-neck patient have presented very promising results for real clinical applications.
A few embodiments have been described in detail above, and various modifications are possible. The disclosed subject matter, including the functional operations described in this specification, can be implemented in electronic circuitry, computer hardware, firmware, software, or in combinations of them, such as the structural means disclosed in this specification and structural equivalents thereof, including potentially a program operable to cause one or more data processing apparatus to perform the operations described (such as a program encoded in a computer-readable medium, which can be a memory device, a storage device, a machine-readable storage substrate, or other physical, machine-readable medium, or a combination of one or more of them).
The term “data processing apparatus” or “computing system” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A program (also known as a computer program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
While this specification contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this application including the attached Appendix.
This application claims priority to U.S. Provisional Patent Application No. 61/304,353, filed Feb. 12, 2010, entitled “Graphics Processing Unit-Based Fast Cone Beam Computed Tomography Reconstruction,” the entire contents of which are incorporated by reference.
Filing Document: PCT/US11/24803; Filing Date: 2/14/2011; Country: WO; Kind: 00; 371(c) Date: 9/13/2012
Related provisional application — Number: 61/304,353; Date: Feb. 2010; Country: US