This invention relates generally to the use of Graphics Processing Units (GPUs) to accelerate reconstruction of slices from X-ray projection images acquired using a circular cone beam trajectory.
As is known in the art, in early computerized tomography for either medical or industrial applications, an x-ray fan beam and a linear array detector were used to achieve two-dimensional (2D) imaging. While an acquired data set may be complete and image quality is correspondingly high, only a single slice of an object is imaged at a time. Thus, if a 3D image is required, an approach which reconstructs a stack of slices is employed. Acquiring and reconstructing a 3D data set one 2D slice at a time is inherently slow. Moreover, in medical applications, motion artifacts occur because adjacent slices are not imaged simultaneously. Also, dose utilization is less than optimal because the distance between slices is typically less than the x-ray collimator aperture, resulting in double exposure of many parts of the body. In 2D CT, the scanning path of the source is often simply a circular scan about the object.
More recently, a system employing cone beam geometry has been developed for 3D imaging; it includes a cone beam x-ray source instead of a fan beam source and a 2D area detector instead of a linear array detector. An object to be imaged is scanned, preferably over a 360-degree angular range, either by moving the x-ray source in a scanning path about the object or by rotating the object while the source remains stationary. In either case, the area detector is fixed relative to the source, and the relative movement between the source and object provides the scanning (irradiation of the object by the cone beam energy). Compared to the conventional 2D “stack of slices” approach to 3D imaging, the cone beam approach has the potential to achieve 3D imaging in both medical and industrial applications rapidly and with improved dose utilization.
As is also known in the art, in cone beam computed tomography (CT), a conical beam is used and three dimensional (3D) image reconstruction is based on the technique known as filtered back projection. Various methods have been developed for 3D image reconstruction for cone beam x-ray imaging systems. For example, a back projection cone beam image reconstruction technique is described in U.S. Pat. No. 5,257,183, which issued on Oct. 26, 1993 to Kwok C. Tam, entitled “Method and Apparatus for Converting Cone Beam X-Ray Projection Data To Planar Integral and Reconstructing a Three-Dimensional Computerized Tomography (CT) Image of an Object”, which is incorporated herein by reference.
As is also known, increasing the sampling resolution of the projection images and the desire to reconstruct high-resolution output volumes increase both the memory consumption and the processing time considerably. In order to keep the processing time down, new strategies for memory management are required, as well as new algorithmic implementations of the reconstruction pipeline. In the current generation of cone beam scanners, a typical projection image is obtained with 1024×1024 sample points at 16-bit dynamic range. Using software, the acquired projection images are converted into line integrals of attenuation through the object (32-bit floating-point data) by applying the familiar Beer-Lambert law of exponential decay of intensity for rays from a monochromatic energy source passing through an absorbing medium; see Principles of Computerized Tomographic Imaging by Avinash C. Kak and Malcolm Slaney, IEEE Press, 1988. This procedure ignores the fact that the actual energy source is poly-energetic and that, besides actual absorption, other phenomena such as scatter and beam hardening take place. The effect of these phenomena appears as cupping artifacts in the resultant reconstruction, which are later removed by the post-processing operation described in this invention. It is noted that the “input” projection images are the line integrals of attenuation through the object obtained from the original projection images. Usually, between 360 and 720 images are acquired during one rotation around the patient. Consequently, high quality data volumes can be reconstructed from those images. On the other hand, the data set size can grow to several gigabytes and the processing time increases tremendously, too. Furthermore, for accurate and high-quality output volumes all calculations are typically performed in 32-bit floating-point precision. This improves the image quality over fixed-point formats at the price of even more memory consumption and longer processing time. To handle the large amount of data, an efficient memory management strategy is required to achieve fast turn-around times.
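By way of illustration, the intensity-to-line-integral conversion described above can be sketched as follows. This is a minimal CPU sketch in Python/NumPy, not the software referenced by the invention; the flat-field intensity value and the clamping threshold are assumptions introduced here.

```python
import numpy as np

def intensities_to_line_integrals(projection, i0, eps=1e-6):
    """Convert measured detector intensities to line integrals of attenuation.

    Beer-Lambert: I = I0 * exp(-integral of mu along the ray), so the line
    integral is recovered as -ln(I / I0).  The clamp with `eps` guards
    against zero or negative detector readings.
    """
    ratio = np.clip(projection.astype(np.float32) / i0, eps, None)
    return -np.log(ratio)

# Example: a 1024x1024, 16-bit projection with an assumed flat-field value.
raw = np.random.randint(1, 65535, size=(1024, 1024)).astype(np.uint16)
line_integrals = intensities_to_line_integrals(raw, i0=65535.0)
```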
While in past years there have been a few cone beam reconstruction papers exploiting the power of GPUs, for example by Mueller and Xu [F. Xu and K. Mueller, “Accelerating popular tomographic reconstruction algorithms on commodity PC graphics hardware,” IEEE Transactions on Nuclear Science, 2005], they focus mainly on the reconstruction process itself using various algorithms such as backprojection and algebraic reconstruction; the focus of the present invention, in contrast, is on the entire reconstruction pipeline with pre- and post-processing filters at high accuracy using floating-point precision.
In accordance with the present invention, a method is provided for reconstructing an object subject to a cone beam passing through the object and generating projection images of the object, comprising: applying curvature-based smoothing and boundary adjustments to the projection images using GPU shader programs; applying high-pass filtering to the curvature-smoothed and boundary-adjusted images using GPU shader programs; reconstructing the object from the high-pass filtered images using backprojection with GPU shader programs; and removing ring and cupping artifacts from the reconstructed object using GPU shader programs.
In one embodiment, a method is provided for reconstructing an object subject to a cone beam passing through the object and generating projection images of the object. The method includes: applying a non-linear curvature-based smoothing filter to the projection images; applying a high-pass filter to the curvature-smoothed images; backprojecting the high-pass filtered images into an output volume using voxel-driven backprojection; and removing cupping artifacts from the output volume by convolving every slice with a Butterworth kernel and adding the result to the slice weighted by a scaling factor.
In one embodiment, the output volume is divided into chunks, wherein a chunk is a stack of slices representing a part of the entire output volume.
In one embodiment, a method is provided for reconstructing an object subject to a cone beam passing through the object and generating projection images of the object. The method includes applying a non-linear curvature-based smoothing filter to the projection images, such filter smoothing the projection images in uniform regions while preserving edges of the images by taking local curvature and gradient magnitude effects into account. The method applies a high-pass filter to the curvature-smoothed images, using an FFT to convolve the projection images with a high-pass filter. The method backprojects the projection images into an output volume using voxel-driven backprojection, wherein for each voxel in the output volume the process requests the corresponding pixel in the projection images, such backprojection comprising counting the number of rays passing through each voxel and discarding voxels below a user-defined minimum ray limit. The method removes artifacts from the output volume, comprising removing ring artifacts from the output volume using radial median filtering and circular smoothing. The method removes cupping artifacts from the output volume, comprising convolving every slice with a Butterworth kernel and adding the result to the slice weighted by a scaling factor.
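For orientation, the embodiment just described can be summarized as a small driver routine. This is a hedged sketch only; the stage callables are placeholders for the GPU shader passes detailed below and do not reflect an actual interface of the invention.

```python
def reconstruct_volume(projections, geometry, stages):
    """Sketch of the five-stage pipeline described above.

    `stages` maps a stage name to a callable standing in for the
    corresponding GPU shader pass; the names are placeholders, not the
    invention's actual API.
    """
    imgs = [stages["curvature_smooth"](p) for p in projections]   # pre-filter 1
    imgs = [stages["ramp_filter"](p) for p in imgs]               # pre-filter 2 (FFT high-pass)
    volume = stages["backproject"](imgs, geometry)                # voxel-driven backprojection
    volume = stages["remove_rings"](volume)                       # post-processing 1
    volume = stages["remove_cupping"](volume)                     # post-processing 2
    return volume
```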
In order to improve the reconstruction pipeline speed, the process according to the invention takes advantage of the computational power of modern graphics processing units (GPUs). Although GPUs are not Turing-complete, they provide a powerful subset for parallel single instruction multiple data (SIMD)-type operations which outperform current CPUs even with streaming SIMD extension (SSE) 2 or 3 optimization. Furthermore, there is a second type of parallelism available on modern graphics cards: 32 to 48 pixel units exist on board, executing SIMD operations in parallel. To better understand the architecture and advantages of using the GPU, a brief overview of GPU programming is presented below.
The process according to the invention also uses a GPU-accelerated implementation of the Fast Fourier Transform (FFT) presented in papers by T. Schiwietz and R. Westermann, “GPU-PIV,” in VMV, pp. 151-158, 2004, and T. Schiwietz, T. Chang, P. Speier, and R. Westermann, “MR image reconstruction using the GPU,” in Medical Imaging 2006: Visualization, Image-Guided Procedures, and Display, Proceedings of the SPIE, pp. 646-655, April 2006.
In order to obtain the maximum performance of a GPU all computations according to the invention are executed by the GPU and the bus transfer between GPU memory and main memory is minimized. Therefore, the invention implements the entire reconstruction pipeline using GPU programs (shaders). A goal was to upload the projection images to GPU memory right after acquisition and download only the final output volume. Unfortunately, the memory consumption is very high. Therefore, the invention splits the volume into several parts (chunks) and reconstructs them separately.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
FIGS. 2(a) and 2(b) show the effect of curvature smoothing used in the process according to the invention;
Like reference symbols in the various drawings indicate like elements.
As will be described in more detail below, in addition to the filtered backprojection (FDK) algorithm (see L. A. Feldkamp, L. C. Davis, and J. W. Kress, “Practical cone-beam algorithm,” Optical Society of America Journal A 1, pp. 612-619, June 1984), the process implements the entire cone beam reconstruction pipeline, from nonlinear smoothing of the acquired raw data to artifact removal on the reconstructed output slices.
The process according to the invention, and as shown in the flowchart of
More particularly, Step 602 includes backprojecting the high-pass filtered image from projection coordinates into three dimensional (3D) voxel coordinates to produce a stack of image slices using a GPU shader program and Step 604 includes removing artifacts and cupping effects from the produced stack of images using GPU shader programs.
More particularly, the process uses the following GPU-accelerated pre- and post-filters in the pipeline:
1. Pre-Filtering (Step 600): Since raw data is usually noisy, a non-linear curvature-based filter is applied to all projection images. The filter smoothes the image in uniform regions but preserves the edges by taking the local curvature and gradient magnitude into account. The algorithm is based on work by Neskovic et al. [P. Neskovic and B. B. Kimia, “Three-dimensional shape representation from curvature dependent surface evolution,” IEEE International Conference on Image Processing, 1994, Proceedings, ICIP-94, Vol. 1, pp. 6-10, 13-16 November 1994].
2. Pre-Filtering (Step 602): The inverse Radon transform theory requires the images to be filtered with a high-pass ramp filter. The process uses the FFT to convolve the projection images with a high-pass filter. The GPU-accelerated FFT implementation is based on previous papers (see T. Schiwietz and R. Westermann, “GPU-PIV,” in VMV, pp. 151-158, 2004; and T. Schiwietz, T. Chang, P. Speier, and R. Westermann, “MR image reconstruction using the GPU,” in Medical Imaging 2006: Visualization, Image-Guided Procedures, and Display, Proceedings of the SPIE, pp. 646-655, April 2006, both incorporated herein by reference).
3. Backprojection (Step 604): The projection images are backprojected into the output volume. The process uses a voxel-driven backprojection implementation. That is, for each voxel in the output volume the process asks for the corresponding pixel in the projection images. This is implemented using projective texture mapping described by Segal et al. (see M. Segal, C. Korobkin, R. van Widenfelt, J. Foran, and P. Haeberli, “Fast shadows and lighting effects using texture mapping,” SIGGRAPH Comput. Graph. 26 (2), pp. 249-252, 1992). Geometry-wise, a voxel is only considered as reconstructed correctly if a certain percentage of all projection images have contributed to it. Therefore, the process counts the number of “valid” rays from the source passing through each voxel, a ray being “valid” if it meets a “usable” pixel location on the projection image, and discards voxels below a user-defined minimum “valid” count limit. In contrast to the publications by Mueller and Xu [see F. Xu and K. Mueller, “Accelerating popular tomographic reconstruction algorithms on commodity PC graphics hardware,” IEEE Transactions on Nuclear Science, 2005], the present implementation does not require two output volumes oriented along the axes perpendicular to the patient axis. This makes the combination pass obsolete and saves half of the memory.
4. Post-Processing (Step 606): Once a raw volume is reconstructed, reconstruction artifacts are removed by post-processing operations. First, the process removes ring artifacts. Typically, ring artifacts can be observed on the axial plane, centered around the patient axis, due to detector calibration errors. The ring artifacts are extracted using radial median filtering and circular smoothing. Finally, the isolated ring artifacts are removed from the original image. This filter is based on an algorithm by Zellerhoff et al. [see M. Zellerhoff, B. Scholz, E.-P. Ruehrnschopf, and T. Brunner, “Low contrast 3D reconstruction from C-arm data,” in Medical Imaging 2005: Visualization, Image-Guided Procedures, and Display, edited by Galloway, Robert L., Jr. and Cleary, Kevin R., Proceedings of the SPIE, Volume 5745, pp. 646-655, April 2005].
5. Post-Filtering (Step 606): The second post-processing filter is a cupping artifact removal. Low-frequency artifacts are removed from the output slices. In order to eliminate the artifacts, the process convolves every slice with a band-pass Butterworth kernel and adds the cupping signal, which is generated as the filter response, to the original slice, weighted by a scaling factor.
The acquisition device 105 may further be a flatbed scanner that takes in an optical image and digitizes it into an electronic image represented as binary data to create a computerized version of a photo or illustration.
The PC 110, which may be a portable or laptop computer, includes a CPU 125, a memory 130 and a graphics card 170, which are connected to an input device 150 and an output device 155. The CPU 125 includes a volume rendering module 145 that includes one or more methods for rendering a binary volume in a GPU. It is to be understood, however, that the volume rendering module 145 could be included in the graphics card 170.
The memory 130 includes a random access memory (RAM) 135 and a read only memory (ROM) 140. The memory 130 can also include a database, disk drive, tape drive or a combination thereof. The RAM 135 functions as a data memory that stores data used during execution of a program in the CPU 125 and is used as a work area. The ROM 140 functions as a program memory for storing a program executed in the CPU 125. The input device 150 is constituted by a keyboard or mouse and the output device 155 is constituted by a liquid crystal display (LCD), cathode ray tube (CRT) display or printer.
The graphics card 170, which is used to take binary data from the CPU 125 and turn it into an image, includes, inter alia, a GPU 175 and a memory 180. The GPU 175 determines what to do with each pixel to be displayed on, for example, the output device 155 or a display 160 of the operator's console 115. In operation, the GPU 175 makes a 3D image by first creating a wire frame out of straight lines, rasterizing the image and adding lighting, texture and color to the 3D image. The memory 180, which may be a RAM, holds information regarding each pixel and temporarily stores completed images. Although not shown, the graphics card 170 also includes a connection to a motherboard, which also holds the CPU 125, for receiving data and power and a connection to the output device 155 for outputting the picture.
It is to be understood that the memory 180 could be included in the GPU 175 or that the GPU 175 could include its own memory for performing certain storage tasks according to an exemplary embodiment of the present invention.
The operation of the system 100 is typically controlled from the operator's console 115, which includes a controller 165 such as a keyboard, and the display 160 such as a CRT display. The operator's console 115 communicates with the PC 110 and the acquisition device 105 so that 2D image data collected by the acquisition device 105 can be rendered into 3D data by the PC 110 and viewed on the display 160. It is to be understood that the PC 110 can operate and display information provided by the acquisition device 105 absent the operator's console 115, using, for example, the input device 150 and output device 155 to execute certain tasks performed by the controller 165 and display 160.
The operator's console 115 further includes any suitable image rendering system/tool/application that can process digital image data of an acquired image dataset (or portion thereof) to generate and display 2D and/or 3D images on the display 160. More specifically, the image rendering system may be an application that provides 2D/3D renderings and visualizations of medical image data, and which executes on a general purpose or specific computer workstation. Moreover, the image rendering system may enable a user to navigate through a 3D image or a plurality of 2D image slices. The PC 110 may also include an image rendering system/tool/application for processing digital image data of an acquired image dataset to generate and display 2D and/or 3D images.
The volume rendering module 145 may also be used by the PC 110 to receive and process digital medical image data, which as noted above, may be in the form of raw image data, 2D reconstructed data (e.g., axial slices), or 3D reconstructed data such as volumetric image data or multiplanar reformats, or any combination of such formats. The data processing results can be output from the PC 110 via the network 120 to an image rendering system in the operator's console 115 for generating 2D and/or 3D renderings of image data in accordance with the data processing results, such as segmentation of organs or anatomical structures, color or intensity variations, and so forth.
The remainder of the description is organized as follows: a general overview of GPU programming; a description of the stages of a GPU-accelerated reconstruction pipeline; and a memory management algorithm and the corresponding pipeline algorithm.
Reference is made to T. Schiwietz, T. Chang, P. Speier, and R. Westermann, “MR image reconstruction using the GPU,” in Medical Imaging 2006: Visualization, Image-Guided Procedures, and Display, Proceedings of the SPIE, pp. 646-655, April 2006, and to U.S. Patent Application Publication No. 2007/0014486, published Jan. 18, 2007, incorporated herein by reference. Reference is also made to the books: J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics (2nd ed. in C): Principles and Practice, Addison-Wesley Longman Publishing Co., Inc., Boston, Mass., USA, 1996; S. Thilaka and L. Donald, GPU Gems 2, ch. Medical Image Reconstruction with the FFT, pp. 765-784, Addison-Wesley, 2005; M. Woo, Davis, and M. B. Sheridan, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, Addison-Wesley Longman Publishing Co., Inc., Boston, Mass., USA, 1999; K. Gray, Microsoft DirectX 9 Programmable Graphics Pipeline, Microsoft Press, Redmond, Wash., USA, 2003; and GPU Gems 2, edited by Matt Pharr, Addison-Wesley, 2005.
Graphics cards provide readable and writable data structures in GPU memory. Basically, there are three types of data structures available: 1D, 2D, and 3D arrays, all referred to as textures. Among them, 2D textures provide the fastest update performance. The array elements of a texture are called texels. Each texel can have up to four float-typed components. The four components are often called the RGBA channels, as they are originally used to represent the red, green, blue, and alpha intensities of a color for rendering.
To set values in a texture, the GPU processing pipeline consisting of several stages is utilized. This procedure is also referred to as “rendering” which is carried over from the original convention when screen pixels are drawn by the pipeline.
Vertex shader: A stream of vertices enters the vertex shader stage. A vertex is a data structure on the GPU with attributes such as position vectors for certain coordinate systems. In this stage, vertex attributes are manipulated according to the objectives of the applications. Traditionally, user-defined vertex shader programs are used to transform position vectors from one space to another.
Geometry shader (triangle setup): In this stage, sets of three vertices are selected to set up triangles according to a geometry shader program.
Rasterization/Pixel shader: Using the three vertices of each triangle as knot points, a scanline algorithm generates texels that are circumscribed by the triangle. All vertex attributes are interpolated bilinearly. User-defined pixel shader programs are executed for each rasterized texel, which can access and work with the interpolated vertex attributes. The following types of instructions are available in this stage.
Arithmetical instructions perform addition, multiplication, exponent, etc.
Data access instructions allow reading values from GPU memory.
Control flow instructions allow branching and loops.
A large number of temporary registers are available for intermediate results in the program. Each texel is rasterized and processed independently and potentially in parallel. However, no processing order can be assumed by the programmer. The return value of a pixel shader program is primarily a four-component vector, which is explained in detail next.
Particularly, the texture to be updated is set as the render target. This implies that the pixel shader now writes values to the texture instead of the screen. The area of the texture that the programmer wants to write to is covered by a quadrilateral defined by four vertices and two triangles. Using the graphics card pipeline, a pixel shader program is executed for each texel in the target texture. With this procedure, arbitrary calculations can be performed. In the context of the reconstruction pipeline, the projection images and the output volume are represented as textures and updated using pixel-shader programs.
Currently, the two major graphics programming APIs are DirectX and OpenGL. Both APIs provide similar functionality for graphics cards programming. Additionally, there are so called shading languages to program the vertex and pixel shader units on the graphics card. The DirectX shading language is called high-level shading language (HLSL); and the one for OpenGL is the OpenGL shading language (GLSL). Both languages provide similar functionality and their syntax is closely related to the C language. Here, implementations have been written in DirectX with HLSL.
The pipeline consists of the following five stages:
Curvature Smoothing (projection image pre-processing) (Step 600,
High-pass filter (projection image pre-processing) (Step 602,
Backprojection (volume reconstruction) (Step 604,
Ring artifact removal (volume post-processing) (Step 606,
Cupping artifact removal (volume post-processing) (Step 606,
Usually, the projection image raw data is noisy, as shown in
A continuous scalar field φ is smoothed iteratively by the function φ_(i+1) = φ_i + βC|∇φ| (Equation 1), where C is the mean curvature of the level sets of φ, which can be written in terms of the Hessian as C = (trace(H(φ))|∇φ|² − ∇φᵀH(φ)∇φ) / |∇φ|³.
H(φ) denotes the Hessian matrix of φ and β is a free parameter in the range [0;1]. Equation 1 is derived from the fundamental forms as described by Farin [G. Farin, Curves and Surfaces for CAGD: A Practical Guide, Morgan Kaufmann Publishers Inc., San Francisco, Calif., USA, 2002] and Eberly [D. H. Eberly, 3D Game Engine Design, Morgan Kaufmann Publishers Inc., San Francisco, Calif., USA, 2001]. In two dimensions, Equation 1 evaluates to
Equation 2 is discretized using central differences
where p_(i,j) denotes the pixel intensity at the raster location (i, j). The curvature C is weighted by the gradient magnitude |∇φ| for normalization purposes. In order to prevent a division by zero in Equation 2, the algorithm returns 0 if the gradient magnitude |∇φ| becomes small. Since no derivatives can be computed accurately in the boundary region, the process uses an out-flow boundary condition: the curvature of the direct neighbor perpendicular to the border tangent is replicated. After that, the image pixels p_(i,j) are updated by the weighted curvature βC|∇φ|. Usually, four to six iterations are sufficient for satisfactory results.
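A CPU reference sketch of this smoothing step in Python/NumPy follows; it implements the standard two-dimensional level-set mean-curvature update with central differences and replicate (out-flow) boundaries. The exact discretization of Equations 2 and 3 may differ, so the constants and stencil below are assumptions.

```python
import numpy as np

def curvature_smooth(img, beta=0.2, iterations=5, grad_eps=1e-6):
    """Non-linear curvature-based smoothing (CPU reference sketch).

    Each iteration adds beta * C * |grad(phi)| to the image, where C is the
    curvature of the intensity level sets.  Uniform regions (small gradients)
    are left untouched, approximating the edge-preserving behaviour described
    in the text.
    """
    phi = img.astype(np.float32).copy()
    for _ in range(iterations):
        # Replicate padding acts as the out-flow boundary condition.
        p = np.pad(phi, 1, mode="edge")
        px = (p[1:-1, 2:] - p[1:-1, :-2]) * 0.5            # central differences
        py = (p[2:, 1:-1] - p[:-2, 1:-1]) * 0.5
        pxx = p[1:-1, 2:] - 2.0 * phi + p[1:-1, :-2]
        pyy = p[2:, 1:-1] - 2.0 * phi + p[:-2, 1:-1]
        pxy = (p[2:, 2:] - p[2:, :-2] - p[:-2, 2:] + p[:-2, :-2]) * 0.25
        grad2 = px * px + py * py
        # C * |grad(phi)| for 2D level sets; returns 0 where the gradient vanishes.
        flow = np.where(
            grad2 > grad_eps,
            (pxx * py * py - 2.0 * px * py * pxy + pyy * px * px)
            / np.maximum(grad2, grad_eps),
            0.0)
        phi += beta * flow
    return phi
```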
The algorithm is implemented, as shown by the flowcharts in
Thus, referring to
FIGS. 2(A) and 2(B) show the effect of curvature smoothing used in the process according to the invention;
According to the inverse Radon transformation, a high-pass filter, for example a ramp filter or a variation thereof, has to be applied to the projection images before the backprojection takes place. Commonly used filters are the Ram-Lak, Shepp-Logan, or the Hamming filter. Here, the process convolves the image with the Ram-Lak filter. The process uses GPU shader programs, as shown in the flowchart in
More particularly, referring to
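A minimal Python/NumPy sketch of this ramp filtering stage is given below; it applies a Ram-Lak weighting in the Fourier domain to each detector row. Whether the filtering is applied per detector row or in two dimensions, and any apodization combined with the ramp, are not specified here, so those choices are assumptions.

```python
import numpy as np

def ramp_filter_rows(projection):
    """Apply a Ram-Lak (ramp) high-pass filter to each detector row via the FFT.

    The filter multiplies every spatial frequency by its magnitude |f|, which
    is the high-pass weighting required by the inverse Radon transform.
    """
    n = projection.shape[1]
    freqs = np.fft.fftfreq(n)                  # cycles per pixel, in [-0.5, 0.5)
    ramp = np.abs(freqs).astype(np.float32)    # Ram-Lak frequency response
    spectrum = np.fft.fft(projection, axis=1)
    return np.real(np.fft.ifft(spectrum * ramp, axis=1)).astype(np.float32)
```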
In principle, the backprojection implementation is voxel-driven. That is, for each voxel in the volume the process inquires what the corresponding pixels in the projection images are. The process uses 2D textures to represent both the projection images and the slices of the output volume in GPU memory. The slices of the volume are updated using the techniques described above by rendering a quadrilateral covering the entire texture. 3D texture coordinates in projection space are specified at each vertex of the slices. Using the rasterization units, the coordinates at the vertices are interpolated across the slice. The process uses GPU shader programs, shown in the flowchart in
More particularly, referring to
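The voxel-driven backprojection with “valid” ray counting can be sketched on the CPU as follows; the 3×4 projection matrices, the nearest-neighbour pixel lookup, and the default valid-ray fraction are assumptions introduced for illustration, whereas the actual implementation runs as a pixel shader per slice with projective texture mapping and interpolation.

```python
import numpy as np

def backproject(filtered_projections, proj_matrices, vol_shape, min_valid_fraction=0.75):
    """Voxel-driven backprojection (CPU reference sketch).

    For every voxel, each 3x4 projection matrix maps the voxel centre to a
    detector pixel; the pixel value is accumulated if the location falls
    inside the detector ("valid" ray).  Voxels hit by fewer valid rays than
    the requested fraction of all projections are discarded (set to zero).
    """
    nz, ny, nx = vol_shape
    volume = np.zeros(vol_shape, dtype=np.float32)
    valid_count = np.zeros(vol_shape, dtype=np.int32)
    z, y, x = np.meshgrid(np.arange(nz), np.arange(ny), np.arange(nx), indexing="ij")
    # Homogeneous voxel-centre coordinates (voxel index space for simplicity).
    coords = np.stack([x.ravel(), y.ravel(), z.ravel(), np.ones(x.size)], axis=0).astype(np.float32)
    for img, P in zip(filtered_projections, proj_matrices):
        h, w = img.shape
        uvw = P @ coords                               # 3x4 matrix times 4xN coordinates
        u = uvw[0] / uvw[2]
        v = uvw[1] / uvw[2]
        ui, vi = np.rint(u).astype(int), np.rint(v).astype(int)
        valid = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h) & (uvw[2] > 0)
        contrib = np.zeros(coords.shape[1], dtype=np.float32)
        contrib[valid] = img[vi[valid], ui[valid]]
        volume += contrib.reshape(vol_shape)
        valid_count += valid.reshape(vol_shape).astype(np.int32)
    volume[valid_count < min_valid_fraction * len(filtered_projections)] = 0.0
    return volume
```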
Ring artifacts appear in the reconstructed volume because of detector calibration errors. Since the detector calibration can never be completely accurate, a post-processing filter helps to reduce the ring artifacts that arise typically. The algorithm is a 2D filter that is applied to all slices of the volume separately as the ring artifacts appear equally in each slice and not volumetrically. Basically, the algorithm extracts the ring artifacts and subtracts them from the original image. In detail, the algorithm works in four steps:
Dynamic range restriction: The dynamic range of the image is clamped to 11-bit [0; 2048] to avoid introducing artifacts from objects with high contrast.
Median Filtering: Each pixel is filtered by the median of five samples along the radial line to the image center.
Circular Smoothing: The extracted ring artifact image is refined by a circular smoothing filter in order to reduce the noise. The center of the circles is always located in the image center while the radius is the distance to the image center. The filter averages values along the circular neighborhood. The process uses a constant angular distance Δφ between the sample points. Here, the process uses six samples in both directions on the circle resulting in the average of 13 samples.
Correction: The extracted ring artifacts are subtracted from the original image (a reference sketch of these four steps follows below).
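A CPU sketch of the four steps is shown below; it forms the ring estimate as the difference between the clamped slice and its radial median on a polar resampling of the slice, which is one plausible reading of the extraction step, and the polar grid resolution and interpolation order are assumptions introduced here.

```python
import numpy as np
from scipy.ndimage import map_coordinates, median_filter, uniform_filter1d

def remove_ring_artifacts(slice_img, n_angles=720, clamp=(0.0, 2048.0),
                          radial_median=5, angular_window=13):
    """Ring artifact removal on one axial slice (CPU reference sketch).

    The slice is resampled onto a polar grid centred on the image centre, the
    ring estimate is formed as the difference between the clamped data and
    its radial median, smoothed along the angular direction, resampled back
    to Cartesian coordinates and subtracted from the original slice.
    """
    h, w = slice_img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    n_radii = int(np.hypot(cy, cx)) + 1

    radii = np.arange(n_radii, dtype=np.float32)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")          # polar grid (r, theta)
    ys, xs = cy + rr * np.sin(aa), cx + rr * np.cos(aa)

    clamped = np.clip(slice_img, *clamp)                         # dynamic range restriction
    polar = map_coordinates(clamped, [ys, xs], order=1, mode="nearest")
    rings = polar - median_filter(polar, size=(radial_median, 1))    # radial median (5 samples)
    rings = uniform_filter1d(rings, size=angular_window, axis=1, mode="wrap")  # circular smoothing

    # Resample the ring estimate back to Cartesian coordinates and subtract it.
    yy, xx = np.mgrid[0:h, 0:w]
    r_cart = np.hypot(yy - cy, xx - cx)
    a_cart = np.mod(np.arctan2(yy - cy, xx - cx), 2.0 * np.pi)
    rings_cart = map_coordinates(
        rings, [r_cart, a_cart * n_angles / (2.0 * np.pi)], order=1, mode="nearest")
    return slice_img - rings_cart
```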
The process uses GPU shader programs shown in the flowchart in
Thus, referring to FIGS. 3(A)-3(F),
The process is performed by GPU shader programs in accordance with the flowchart in
Cupping artifacts occur where a significant proportion of the lower energy photons in the beam spectrum have been scattered and absorbed. Typically, the artifacts are visible as a low-frequency error resulting in a non-uniform intensity distribution inside the image. In order to correct the cupping artifacts, the low-frequency errors must be identified and corrected. It can be observed that the cupping artifacts appear in the range of 0.4 to 4 cycles per image dimension. This filter band needs to be amplified and added to the original image. Formally, this can be expressed by the equation
f_corr(x, y) = F^−1[F[f(x, y)](1 + kW(u, v))]  (4)
where f(x, y) is the two-dimensional input image, f_corr(x, y) is the corrected image, F[f(x, y)] is the Fourier transform of the image f(x, y), W(u, v) is a low-pass filter to extract the cupping artifacts, and k is a filter gain scalar. The process uses the Butterworth filter for W(u, v)
where ω = √(u² + v²) is the angular frequency in radians per second, ω_l the low cut-off frequency, ω_h the high cut-off frequency, and n the order of the filter. In particular, the following parameter values are used:
Low cut-off frequency ω_l: 2 cycles per meter
High cut-off frequency ω_h: 3 cycles per meter
Filter order n: 2
Filter gain k: 2.25
The filter response is the low-frequency cupping artifact signal residing in the image. It is added to the original image weighted by the filter gain k and shifted by (avg+max)/2, where avg and max are the average and maximum values of the filter response.
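The correction of Equation 4 can be sketched as follows. Since the exact band-pass Butterworth expression for W(u, v) is not reproduced here, it is approximated below as the product of a high-pass and a low-pass Butterworth section at the two cut-off frequencies, and the handling of the (avg+max)/2 shift is likewise an assumption.

```python
import numpy as np

def remove_cupping(slice_img, pixel_size_m, low_cut=2.0, high_cut=3.0, order=2, gain=2.25):
    """Cupping artifact correction on one slice (CPU reference sketch of Eq. 4).

    W(u, v) is taken as a band-pass Butterworth built from a high-pass
    section at `low_cut` and a low-pass section at `high_cut` (cycles per
    metre); this form is an assumption, as is the sign of the final shift.
    """
    h, w = slice_img.shape
    fy = np.fft.fftfreq(h, d=pixel_size_m)            # spatial frequencies in cycles per metre
    fx = np.fft.fftfreq(w, d=pixel_size_m)
    radial = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    eps = 1e-12
    high_pass = 1.0 / (1.0 + (low_cut / np.maximum(radial, eps)) ** (2 * order))
    low_pass = 1.0 / (1.0 + (radial / high_cut) ** (2 * order))
    band_pass = high_pass * low_pass                  # assumed form of W(u, v)

    spectrum = np.fft.fft2(slice_img)
    response = np.real(np.fft.ifft2(spectrum * band_pass))   # extracted cupping signal
    # Shift by (avg + max) / 2 of the filter response, then add the gain-weighted result.
    shifted = response - (response.mean() + response.max()) / 2.0
    return slice_img + gain * shifted
```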
This algorithm has the following properties:
The process uses GPU shader programs, shown in the flowchart in
More particularly, referring to the flowchart in
Referring to
With all pipeline stages described herein, the reconstructed output volume is now of very high quality.
Since GPU memory is limited and usually not big enough to store all projection images and the output volume, a memory management strategy is necessary. In this section, two memory management strategies are described and reconstruction pipeline algorithms are derived from them.
Nowadays, graphics cards typically have 512 MB of RAM. One can easily see that this is not enough memory to reconstruct a volume with 512×512×512 voxels in floating-point precision. Such a volume alone takes 512 MB. This does not fit into GPU memory, since the graphics card driver uses some megabytes for internal buffers and, moreover, the projection images need to be stored in GPU memory as well. Therefore, a memory management strategy is required that swaps data between main memory and GPU memory. In general, both projection images and slices of the output volume can be swapped. Since the output volume does not fit into memory in one piece, the output volume is divided into chunks as depicted in
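The arithmetic behind that statement is straightforward, as the short check below illustrates.

```python
voxels = 512 ** 3                      # 512 x 512 x 512 output volume
bytes_per_voxel = 4                    # 32-bit floating point
volume_mb = voxels * bytes_per_voxel / 2 ** 20
print(volume_mb)                       # 512.0 MB -- the volume alone fills a 512 MB card
```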
Two Swapping Strategies
Swapping chunks: The projection image is the central pipeline element; it stays in GPU memory until it is no longer needed (p_n·p_s bytes upload, 0 bytes download, where p_n is the number of projection images and p_s the size of one projection image). Each projection image is first pre-processed and then backprojected into the volume. Since the volume does not fit into GPU memory in one piece, all chunks have to be swapped in and out for each projection image (p_n·c_n·c_s bytes upload and download, where c_n is the number of chunks and c_s the size of one chunk).
Swapping projection images: First, all projection images are pre-processed (p_n·p_s bytes upload) and then stored in main memory (p_n·p_s bytes download). Next, the chunks are processed sequentially. For each chunk, all projection images are backprojected. That means that all projection images have to be uploaded for each chunk (c_n·p_n·p_s bytes upload). Once a chunk is reconstructed completely, the post-processing filters can be applied directly before it is downloaded (c_n·c_s bytes download).
Which strategy produces the smaller bus traffic depends on the size of a projection image and the size of the output volume. A typical scenario looks like this: there are p_n = 360 projection images with 512×512 float-valued pixels, i.e. p_s = 1 MB each, and the desired output volume with 512×512×512 floating-point voxels is divided into c_n = 3 chunks. Each chunk takes about c_s = 170 MB. Swapping chunks causes about 360 GB of traffic, while swapping projection images causes only about 2.25 GB. Therefore, the implementation here is based on the second approach.
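The traffic estimates above follow directly from the two strategies; the helper below reproduces the comparison (sizes in megabytes).

```python
def bus_traffic_mb(p_n, p_s_mb, c_n, c_s_mb):
    """Compare upload/download traffic of the two swapping strategies (in MB)."""
    # Swapping chunks: one upload per projection image, plus every chunk
    # uploaded and downloaded once per projection image.
    swap_chunks = p_n * p_s_mb + 2 * p_n * c_n * c_s_mb
    # Swapping projection images: pre-process all images (upload + download),
    # re-upload all images once per chunk, download each chunk once.
    swap_projections = 2 * p_n * p_s_mb + c_n * p_n * p_s_mb + c_n * c_s_mb
    return swap_chunks, swap_projections

# Numbers from the example above: 360 one-megabyte projections, three ~170 MB chunks.
chunks_mb, projections_mb = bus_traffic_mb(p_n=360, p_s_mb=1.0, c_n=3, c_s_mb=170.0)
print(chunks_mb / 1024.0, "GB vs", projections_mb / 1024.0, "GB")
# -> roughly 359 GB vs roughly 2.25 GB, matching the text.
```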
Swapping projection images leads to the following reconstruction algorithm:
First, all projection images are preprocessed using the GPU. To this end, the images are uploaded to GPU memory, the curvature smoothing filter and the high-pass filter are applied, and the results are downloaded and saved in main memory. Then, the output volume is reconstructed chunk by chunk. All projection images are uploaded sequentially to GPU memory and backprojected to all slices of the chunk. Afterwards, the ring artifact removal and the cupping artifact removal are applied to the chunk in post-processing. Finally, the chunk is downloaded to main memory. This procedure is repeated for all chunks.
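Reconstructed from the description above, the outer loop might look like the following sketch; the `gpu` object and its methods are placeholders for the shader passes and bus transfers, not an actual API of the invention.

```python
def reconstruct_swapping_projections(raw_projections, geometry, chunks, gpu):
    """Reconstruction with the 'swapping projection images' strategy (sketch).

    `gpu` is a placeholder object whose methods stand in for the shader
    passes and bus transfers described in the text; its interface is assumed.
    """
    # Phase 1: pre-process all projection images on the GPU, keep results in main memory.
    preprocessed = []
    for raw in raw_projections:
        img = gpu.upload(raw)
        img = gpu.curvature_smooth(img)
        img = gpu.ramp_filter(img)
        preprocessed.append(gpu.download(img))

    # Phase 2: reconstruct the volume chunk by chunk.
    volume_parts = []
    for chunk in chunks:
        chunk_tex = gpu.allocate_chunk(chunk)
        for img in preprocessed:
            gpu.backproject(gpu.upload(img), chunk_tex, geometry)
        gpu.remove_ring_artifacts(chunk_tex)
        gpu.remove_cupping_artifacts(chunk_tex)
        volume_parts.append(gpu.download(chunk_tex))
    return volume_parts
```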
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
This application claims priority from U.S. Provisional application Ser. No. 60/816,540 filed on Jun. 26, 2006, which is incorporated herein by reference.