The invention relates generally to processing of digital volume images, and in particular, to a system and methods for improved high-speed processing of digital volume images using a CUDA (Compute Unified Device Architecture) enabled GPU (graphics processing unit).
3-D volume imaging has proved to be a valuable diagnostic tool that offers significant advantages over earlier 2-D radiographic imaging techniques for evaluating the condition of internal structures and organs. 3-D imaging of a patient or other subject has been made possible by a number of advancements, including the development of high-speed imaging detectors, such as digital radiography (DR) detectors that enable multiple images to be taken in rapid succession. Digital volume images, obtained from computed tomography (CT) or other imaging systems, provide valuable tools for diagnosis, treatment planning, and biomedical modeling and visualization.
While it may offer some benefits, 3-D volume imaging works with large amounts of data and requires considerable data processing resources, with very high CPU usage and long processing times. Image processing utilities for 3-D volume imaging include processes such as volume segmentation, a process that partitions a three-dimensional image set into a plurality of non-overlap regions. As an example of a segmentation process, the GrowCut segmentation algorithm (see “GrowCut—Interactive Multi-Label N-D Image Segmentation By Cellular Automata,” by Vladimir Vezhnevets, and Fadim Konouchine, International Conf. Computer Graphics and Vision 2005) stores at least five intermediate three-dimensional image sets in order to perform its segmentation processing. With this much data to process, computation cost is often a concern and the CPU (central processing unit) based GrowCut algorithm takes a very long time to compute. For a medium size volume data set (e.g.181×147×242 voxels), the execution time using GrowCut segmentation is about 15 minutes using a capable CPU processor (e.g. an Intel® Core™ 2 Duo CPU) with sequential processing.
One solution proposed for processing the massive amounts of data needed to support functions such as image segmentation is the use of a dedicated Graphical Processing Unit (GPU). Originally developed for computer game and simulation applications, the GPU has evolved from a dedicated graphic display processor with a fixed pipeline to a more capable processor for general purpose computing, matrix computing, image processing, simulation and medical imaging using parallel processing with the programming pipeline. GPU architecture and its parallel processing capabilities have been utilized for providing hardware-accelerated volume image rendering of CT and other images, as described in U.S. Patent Application No. 2006/0227131 entitled “Flat Texture Volume Rendering” by Schiwietz et al. This approach stores the 3D image slices as flat texture data. While such a method improves some aspects of image storage and addressing, however, it does not facilitate update of the volume image data and has other drawbacks. Storage of 3D image data slices as flat texture structures makes it cumbersome to apply processing that updates image data elements, such as bilinear filtering, for example, that require facile computation between neighboring voxels. Addressing problems with GPU hardware become considerably more complex than with conventional CPU structures. In order to find neighbors for a voxel, for example, it is necessary to calculate and keep track of the tile offsets in the flat volume. Such calculations, since they are required for every voxel in the GPU shader program, can slow shader performance considerably. It appears that, because of the complexity and time required for addressing neighboring voxels, methods such as that taught in Schiwietz et al. '7131 are not well suited to support segmentation, such as using the GrowCut algorithm noted earlier.
While GPU capabilities offer some promise for improving processing speed and capability overall, a number of significant problems remain. GPU programming is not straightforward and requires different strategies for data storage and addressing than those conventionally applied for central processing unit (CPU) schemes.
Thus it is seen that, while GPU capabilities offer an attractive alternative to conventional CPU-based image processing for volume images, there is considerable work needed to take advantage of GPU parallel processing capabilities. One aspect of this problem relates to the task of mapping the existing volume image data structures into a form that can be readily handled by the GPU and to addressing schemes needed to harness the capability of the GPU for high-level image processing such as registration, filtering, and segmentation.
It is an object of the present invention to advance the art of volume image processing using GPU-based technology. The present invention provides methods that help to streamline and simplify the problem of voxel addressing needed to obtain information from neighboring voxels for each voxel in a volume image.
An advantage of the present invention relates to the ease of indexing between slices of the image when the image data is arranged in a GPU flat volume data structure.
According to an aspect of the present invention, there is provided a method for processing digital volume image data, the method executed at least in part on a computer that has an associated graphics processing unit, the method comprising: ordering the digital volume image data as a stack of two-dimensional image slices, each slice containing a plurality of voxels; forming a 1:1 mapping of each of the slices, in sequential order, to a corresponding tile in a digital one-row flat volume in a global memory of the graphics processing unit; defining, for each voxel of a plurality of voxels in the digital flat volume, a neighborhood that comprises, relative to said each voxel, four or more adjacent voxels that are within the corresponding tile of said each voxel, and one or more adjacent voxels that are within the preceding tile, and one or more adjacent voxels that are within the next tile; updating said each voxel according to information stored for the adjacent voxels in the defined neighborhood for said each voxel to form a rendered volume image; and displaying the updated rendered volume image.
The foregoing and other objects, features, and advantages of the invention will be apparent from the following more particular description of the embodiments of the invention, as illustrated in the accompanying drawings, in which:
The following is a detailed description of the preferred embodiments of the invention, reference being made to the drawings in which the same reference numerals identify the same elements of structure in each of the several figures.
In the context of the present disclosure, the term “image” refers to multi-dimensional image data that is composed of discrete image elements. For 2-D images, the discrete image elements are picture elements, or pixels. For 3-D images, the discrete image elements are volume image elements, or voxels.
In the context of the present disclosure, the terms “tile” and slice are interchangeable, describing the same image content, in 2-D form, from different aspects. The term “texture” defines a variable-length data structure used in GPU data representation, familiar to those skilled in GPU programming. The texture is typically a 1-, 2-, or 3-dimension array, optimized to support parallel processing by the GPU. Textures can be readable and writable. In GPU image processing, the texture image elements can store information such as voxel color and intensity. A “shader” is a GPU program that is designed to operate efficiently on the texture data structure to support parallel processing of the image data. Rendering of processed data can be directed to a texture, for example.
In the context of the present disclosure, the terms “neighboring voxel” and “adjacent voxel” are considered to be synonymous. Voxel update is required as a result of some type of image processing, such as segmentation or applying a filter to the image data, for example.
Segmentation, filtering, and other image processing functions for volume images typically require calculations that address each voxel and its surrounding neighbors or adjacent voxels. In 2D image processing, similar types of operations are carried out for individual pixels, with each individual pixel having 8 neighboring pixels. By way of reference,
For a volume image, as shown for reference in
Embodiments of the present invention use the GPU 170 for high speed digital volume processing to support segmentation and other complex operations. A novel addressing scheme, termed neighbor-order rendering, allows quick access to data about neighboring voxels on different slices to facilitate the computation needed for segmentation, filtering, and other compute-intensive volume image processing operations.
By way of illustration,
The use of a flat volume has been proposed for image data representation in GPU processing in a number of different applications. One example application is described in the article “Simulation of Cloud Dynamics on Graphics Hardware” by Harris, Baxter, Scheuermann, Lastra, Proc. Graphics Hardware 2003, Eurographics Association, pp. 92-101. The use of a flat volume offers some advantages over more conventional 3D volume texture. For example, only one texture update is needed per operation and GPU parallelism is used efficiently. Schiwietz et al. '7131 also teaches a method of flat texture rendering for volume imaging. However, as noted earlier, it is necessary to calculate tile offsets in the flat volume in order to address neighbors for a voxel. This type of calculation, required for every voxel in the GPU shader program, necessitates multiple computations for update of each voxel, and can degrade shader performance. This, in turn, hampers the performance of the GPU for volume image processing applications.
In the flat volume or 2D texture of
m_texWidth=(m_volWidth)*(m_tileCol)
wherein m_tileCol is the number of tiles (slices in a row). In the current example, m_tileCol=4.
A texture height 114 is shown as m_texHeight. The texture height 112, m_texHeight of texture 104 equals:
m_texHeight=(m_volHeight)*(m_tileRow)
wherein m_tileRow is the number of rows of tiles in the 2D texture presentation. In the current example, m_tileRow=2.
To use the GrowCut algorithm or other type of processing using the mapped arrangement of
As shown in
sliceX=k % m_tileCol
wherein sliceX is the starting tile_x coordinate for the tile, “%” is a modulus operator, and kε[0, m_volDepth−1];
sliceY=k/m_tileCol
where sliceY is the starting tile_y coordinate for said tile, “/” is an integer division operator (fractional component discarded), and kε[0, m_volDepth−1].
For the exemplary 8-tile (slice) flat volume of
sliceX=5%4=1
sliceY=5/4=1
As shown in
x1=sliceX*m_volWidth,
y1=sliceY*m_volHeight
wherein, for a tile x1 is the starting x coordinate and y1 is the starting y coordinate.
The value z is used for indexing from a tile to its preceding and next tiles in neighbor-order rendering.
The logic flow diagram of
Using the arrangement and definitions described with reference to
By way of example, considering the
Continuing with the
The logic flow diagram of
The block diagram of
This neighbor-order rendering approach is applied to voxels in all valid tiles in the flat volume, one at a time. By way of example, the GrowCut algorithm employs five flat volumes (2D textures): one intensity texture, two label textures and two strength textures. All these 2D textures have the same basic tile arrangement.
As described with reference to
Those skilled in the art can appreciate that the neighbor-order rendering approach described herein can be generalized for applications in which a voxel has a neighborhood of some size other than 3×3×3. In such a case, for the exemplary GrowCut algorithm, a voxel's status (strength and label values) is updated based on its neighbors' status (strength and label values) in a plurality of steps by splitting its neighborhood into a plurality two dimensional neighborhood layers (or, simply, layers), namely, preceding or previous layers, current layer, and following layers residing in the preceding tiles, current tile and following tiles respectively.
Convergence verification is done by occlusion query. Two 2D label textures are compared and voxels in corresponding positions in two textures are discarded if they have identical label values. The occlusion query returns the number of remaining voxels in a label texture. The GrowCut evolution process (iteration) is terminated if the number returned is zero, which means that propagation process has converged.
The above design is a locally-parallel strategy, meaning that all voxels in a tile in a flat volume are processed simultaneously while tiles themselves in the flat volume are processed sequentially.
An alternate embodiment using globally-parallel processing strategy of the present invention is described with respect to
In the one-row flat volume or 2D texture of
m_texWidth=(m_tilelWidth)*(m_tileCol)=W*(m_tileCol)
wherein m_tileCol is the number of tiles (slices in a row). In the current example, m_tileCol=4.
A texture height 1014 is shown as m_texHeight. The texture height 1014, denoted m_texHeight, equals:
m_texHeight=(m_volHeight)*(m_tileRow)=H
wherein m_tileRow is the number of rows of tiles in the 2D texture presentation. In the current example, m_tileRow=1.
As shown in
There is shown an exemplary voxel(x,y) 1050 under processing for being updated using information from neighboring voxels. With this one-row flat volume configuration, each neighbor of voxel(x,y) 1050 in tile 1002d can be readily located with a corresponding coordinate offset. For example. neighbor voxel(x+W, y−1) 1052 in tile 1002e with an x coordinate offset by W and a y coordinate offset by −1. For neighbor voxel(x−W−1, y) in tile 1002c, the x offset is −W−1; the y offset is 0.
In the example of
In CUDA implementation, the one-row flat volume, shown as flat volume 1004 in
Continuing with the GrowCut example, processing of the volume image uses five one-row flat volumes assigned as: label one-row flat volume L; label buffer one-row flat volume LB; strength one-row flat volume S; strength buffer one-row flat volume SB; and intensity one-row flat volume C.
For a volume containing K slices, the layout of the one-row flat volume can be represented in a single-row matrix form, using label volume L as an example, as: L=[L1 L2 . . . LK]. where Lk is a slice in the volume. The same format is applicable to all the other volumes.
The GrowCut implemented in CUDA can be described by the following pseudo code.
For an image voxel c(x,y)k residing in an unmarked position (x,y) in any tiles except the first tile (k=1) and the last tile (k=K), its label l(x,y)k and strength s(x,y)k are updated in a function call of φ( ) as described in Vezhnevets' paper. In function φ, neighbors of c(x,y)k, l(x,y)k, and s(x,y)k are used. In CUDA implementation, neighbors' offset values, shown in Table 1062 in
Convergence verification, with diff( ) is performed for every iteration for the while loop described above, so that LB and current textures L are compared to verify if they are exactly the same or if the difference is within a small value of ε as shown in the above algorithm.
It can be appreciated that the data mapping and addressing scheme of the present invention, using the GPU flat volume representation, facilitates addressing of voxels in adjacent slices, thus simplifying the update processing task for each voxel. Using the GPU to perform this function provides significant advantages for processing throughput, helping to speed execution of the GrowCut algorithm and similar processing.
The present invention is described as a method. However, in another preferred embodiment, the present invention comprises a computer program product for image linear structure detection in medical applications in accordance with the method described. In describing the present invention, it should be apparent that the computer program of the present invention can be utilized by any well-known computer system, such as the personal computer. However, many other types of computer systems can be used to execute the computer program of the present invention.
It will be understood that the computer program product of the present invention may make use of image manipulation algorithms and processes that are well known. Accordingly, the present description is directed in particular to those algorithms and processes forming part of, or cooperating more directly with, the method of the present invention. Such algorithms and processes are conventional and within the ordinary skill in the image processing art. Additional aspects of such algorithms and systems, and hardware and/or software for producing and otherwise processing the images or co-operating with the computer program product of the present invention, are not specifically shown or described herein and may be selected from such algorithms, systems, hardware, components and elements known in the art.
A computer program product for executing the method of the present invention, such as computer 158 in
It will be appreciated that variations and modifications can be effected by a person of ordinary skill in the art without departing from the scope of the invention. The subject matter of the present invention relates to digital image processing and computer vision technologies, which is understood to mean technologies that digitally process a digital image to recognize and thereby assign useful meaning to human understandable objects, attributes or conditions, and then to utilize the results obtained in the further processing of the digital image.
The invention has been described in detail with particular reference to a presently preferred embodiment, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention. The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the invention is indicated by the appended claims, and all changes that come within the meaning and range of equivalents thereof are intended to be embraced therein.
This is a Continuation-in-Part of U.S. Ser. No. 13/156,378 titled SYSTEM AND METHOD FOR DIGITAL VOLUME PROCESSING filed on Jun. 9, 2011 in the names of Li et al.
Number | Date | Country | |
---|---|---|---|
Parent | 13156378 | Jun 2011 | US |
Child | 13421183 | US |