The invention relates generally to processing of digital volume images, and in particular, to a system and methods for improved high-speed processing of digital volume images using a GPU (graphics processing unit).
3-D volume imaging is a diagnostic tool that offers advantages over earlier 2-D radiographic imaging techniques for evaluating the condition of internal structures and organs. 3-D imaging of a patient or other subject has been made possible by a number of advancements, including the development of high-speed imaging detectors, such as digital radiography (DR) detectors that enable multiple images to be taken in rapid succession. Digital volume images, obtained from computerized tomography (CT) or other imaging systems, provide valuable tools for diagnosis, treatment planning, and biomedical modeling and visualization.
While it offers considerable benefits, 3-D volume imaging works with large amounts of data and requires considerable data processing resources, with very high CPU usage and long processing times. Image processing utilities for 3-D volume imaging include processes such as volume segmentation, a process that partitions a three-dimensional image set into a plurality of non-overlapping regions. As an example of a segmentation process, the GrowCut segmentation algorithm (see “GrowCut—Interactive Multi-Label N-D Image Segmentation By Cellular Automata,” by Vladimir Vezhnevets and Vadim Konouchine, International Conf. Computer Graphics and Vision 2005) stores at least five intermediate three-dimensional image sets in order to perform its segmentation. With this much data to process, computation cost is often a concern, and the CPU (central processing unit) based GrowCut algorithm takes a very long time to compute. For a medium size volume data set (e.g. 181×147×242 voxels), the execution time using GrowCut segmentation is about one hour using a capable CPU (e.g. an Intel® Core™ 2 Duo CPU).
One solution proposed for processing the massive amounts of data needed to support functions such as image segmentation is the use of a dedicated Graphics Processing Unit (GPU). Originally developed for computer game and simulation applications, the GPU has evolved from a dedicated graphic display processor with a fixed pipeline to a more capable processor for general purpose computing, matrix computing, image processing, simulation and medical imaging using parallel processing with the programming pipeline. GPU architecture and its parallel processing capabilities have been utilized for providing hardware-accelerated volume image rendering of CT and other images, as described in U.S. Patent Application No. 2006/0227131 entitled “Flat Texture Volume Rendering” by Schiwietz et al. This approach stores the 3D image slices as flat texture data. While such a method improves some aspects of image storage and addressing, it does not facilitate update of the volume image data and makes it cumbersome to apply processing, such as bilinear filtering, that requires facile computation between neighboring voxels. It is necessary to calculate the tile offsets in the flat volume in order to find neighbors for a voxel. Such calculation can slow the shader performance considerably, since it is required for every voxel in the GPU shader program. Significantly, because of the complexity and time required for addressing neighboring voxels, the method taught in Schiwietz et al. '7131 is not well suited to support segmentation, such as using the GrowCut algorithm noted earlier.
While GPU capabilities offer some promise for improving processing speed and capability overall, a number of significant problems remain. GPU programming is not straightforward and requires different strategies for data storage and addressing than those conventionally applied for central processing unit (CPU) schemes. The graphic pipeline API of the GPU does not directly handle volume imaging structures, but requires re-mapping of image data to existing data structures, which can be time consuming and can make functions such as image filtering more difficult to execute than when using conventional data representation. Even with high level programming languages designed for GPU interaction, such as OpenCL, CUDA, CAL and Brook, careful implementation design is important in order to achieve improvements in volume imaging performance.
Thus it is seen that, while GPU capabilities offer an attractive alternative to conventional CPU-based image processing for volume images, there is considerable work needed to take advantage of GPU speed and parallel processing capabilities. One aspect of this problem relates to the task of mapping the existing volume image data structures into a form that can be readily handled by the GPU and to addressing schemes needed to harness the capability of the GPU for high-level image processing such as registration, filtering, and segmentation.
It is an object of the present invention to advance the art of volume image processing using GPU based technology. The present invention provides methods that help to streamline and simplify the problem of voxel addressing needed to obtain information from neighboring voxels for each voxel in a volume image.
An advantage of the present invention relates to the ease of indexing between slices of the image when arranged in a GPU flat volume data structure.
According to an aspect of the present invention, there is provided a method for processing a digital volume image, the method executed at least in part on a computer and comprising: receiving the digital volume image as a stack of image slices, each slice containing a plurality of voxels; forming a 1:1 mapping of each of the slices, in order, to a corresponding tile in a digital flat volume; defining, for at least one voxel in a plurality of voxels in the digital flat volume, a neighborhood that comprises the at least one voxel and adjacent voxels that are within the corresponding tile of the at least one voxel, and adjacent voxels to the at least one voxel that are within the preceding tile in the digital flat volume, and adjacent voxels to the at least one voxel that are within the next tile in the digital flat volume; rendering the at least one voxel according to the adjacent voxels in its defined neighborhood; and displaying the volume image having the at least one rendered voxel.
The foregoing and other objects, features, and advantages of the invention will be apparent from the following more particular description of the embodiments of the invention, as illustrated in the accompanying drawings, in which:
The following is a detailed description of the preferred embodiments of the invention, reference being made to the drawings in which the same reference numerals identify the same elements of structure in each of the several figures.
In the context of the present invention, the terms “tile” and “slice” are interchangeable. The term “texture” defines a variable-length data structure used in GPU data representation, familiar to those skilled in GPU programming.
Segmentation, filtering, and other image processing functions for volume images typically require calculations that address each voxel and its surrounding neighbors or adjacent voxels. In 2D image processing, similar types of operations are carried out for individual pixels, with each individual pixel having 8 neighboring pixels. By way of reference,
Embodiments of the present invention use the GPU 170 for high speed digital volume processing to support segmentation and other complex operations. A novel addressing scheme, termed neighbor-order rendering, allows quick access to data about neighboring voxels on different slices to facilitate the computation needed for segmentation and other compute-intensive volume image processing operations.
By way of illustration,
The use of a flat volume has been proposed for data representation in GPU processing in a number of different applications. One example application is described in the article “Simulation of Cloud Dynamics on Graphics Hardware” by Harris, Baxter, Scheuermann, Lastra, Proc. Graphics Hardware 2003, Eurographics Association, pp. 92-101. The use of a flat volume offers some advantages over more conventional 3D volume texture. For example, only one texture update is needed per operation and GPU parallelism is used efficiently. Schiwietz et al. '7131 also teaches a method of flat texture rendering for volume imaging. However, as noted earlier, it is necessary to calculate tile offsets in the flat volume in order to address neighbors for a voxel. This type of calculation is required for every voxel in the GPU shader program, requires multiple computations for update of each voxel, and can degrade shader performance.
In the flat volume or 2D texture of
m_texWidth=(m_volWidth)*(m_tileCol)
wherein m_tileCol is the number of tiles (slices) in a row. In the current example, m_tileCol=4.
A texture height 114, denoted m_texHeight, of texture 104 equals:
m_texHeight=(m_volHeight)*(m_tileRow)
wherein m_tileRow is the number of rows of tiles in the 2D texture presentation. In the current example, m_tileRow=2.
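By way of illustration only, the two relations above can be sketched in a few lines of Python. The function name, the 64×48 slice size, and the choice of the smallest number of tile rows that holds every slice are assumptions made for this sketch and are not taken from the description.

```python
import math


def flat_texture_size(vol_width, vol_height, vol_depth, tile_col):
    """Return (tex_width, tex_height, tile_row) for a flat volume.

    Mirrors m_texWidth = m_volWidth * m_tileCol and
    m_texHeight = m_volHeight * m_tileRow.
    """
    # Assumption: use just enough rows of tiles to hold every slice.
    tile_row = math.ceil(vol_depth / tile_col)
    tex_width = vol_width * tile_col
    tex_height = vol_height * tile_row
    return tex_width, tex_height, tile_row


# Hypothetical 64 x 48 slices, 8 slices deep, 4 tiles per row.
print(flat_texture_size(64, 48, 8, 4))  # -> (256, 96, 2)
```

With eight slices and m_tileCol=4, the sketch yields m_tileRow=2, consistent with the current example.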
To use the GrowCut algorithm or other type of processing using the mapped arrangement of
As shown in
sliceX=k%m_tileCol
wherein sliceX is the starting tile_x coordinate for the tile, “%” is a modulus operator, and k ∈ [0, m_volDepth−1];
sliceY=k/m_tileCol
where sliceY is the starting tile_y coordinate for said tile, “/” is an integer division operator (fractional component discarded), and k ∈ [0, m_volDepth−1].
For the exemplary 8-tile (slice) flat volume of
sliceX=5%4=1
sliceY=5/4=1
As shown in
x1=sliceX*m_volWidth,
y1=sliceY*m_volHeight
wherein x1 is the starting x coordinate for a tile and y1 is the starting y coordinate for the tile.
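The tile-origin relations above can be sketched as follows. This is a minimal Python sketch rather than shader code; the function names and the 64×48 slice size are hypothetical. Adding the in-slice offsets (x, y) to the tile origin, and the inverse mapping, are assumptions that follow naturally from the tile layout but are not stated explicitly in the description.

```python
def tile_origin(k, tile_col, vol_width, vol_height):
    """Return (x1, y1), the starting texture coordinates of tile k."""
    slice_x = k % tile_col    # sliceX = k % m_tileCol
    slice_y = k // tile_col   # sliceY = k / m_tileCol (integer division)
    return slice_x * vol_width, slice_y * vol_height


def voxel_to_texel(x, y, k, tile_col, vol_width, vol_height):
    """Map voxel (x, y) of slice k to flat-volume coordinates (assumed)."""
    x1, y1 = tile_origin(k, tile_col, vol_width, vol_height)
    return x1 + x, y1 + y


def texel_to_voxel(u, v, tile_col, vol_width, vol_height):
    """Inverse mapping: flat-volume (u, v) back to (x, y, k) (assumed)."""
    slice_x, x = divmod(u, vol_width)
    slice_y, y = divmod(v, vol_height)
    return x, y, slice_y * tile_col + slice_x


# Slice k=5 with 4 tiles per row and hypothetical 64 x 48 slices.
print(tile_origin(5, 4, 64, 48))           # -> (64, 48)
print(voxel_to_texel(3, 2, 5, 4, 64, 48))  # -> (67, 50)
```

For k=5 and m_tileCol=4 this gives sliceX=1 and sliceY=1, matching the worked values above.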
The value z is used for indexing from a tile to its preceding and next tiles in neighbor-order rendering.
The logic flow diagram of
Using the arrangement and definitions described with reference to
By way of example, considering the
Continuing with the
The logic flow diagram of
The block diagram of
This neighbor-order rendering approach is applied to voxels in all valid tiles one at a time in the flat volume. By way of example, the GrowCut algorithm employs five flat volumes (2D textures): one intensity texture, two label textures and two strength textures. All these 2D textures have the same basic tile arrangement.
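As a rough CPU-side sketch of how one evolution step might traverse these textures, the Python below reads one label texture and one strength texture and writes the other pair, visiting the preceding, current, and next tile for each voxel. It is an illustration under stated assumptions, not the GPU shader of the invention: the function name and the nested-list texture representation are hypothetical, the update rule is the standard one from the cited GrowCut paper with a scalar intensity difference, and comparing each attack against the running best strength is an implementation choice made here.

```python
def growcut_step(intensity, label_in, strength_in,
                 vol_w, vol_h, vol_d, tile_col, max_diff):
    """One GrowCut iteration over flat textures stored as lists of rows."""
    # Output pair of the two label and two strength textures.
    label_out = [row[:] for row in label_in]
    strength_out = [row[:] for row in strength_in]
    for k in range(vol_d):                      # valid tiles, one at a time
        x1 = (k % tile_col) * vol_w
        y1 = (k // tile_col) * vol_h
        for y in range(vol_h):
            for x in range(vol_w):
                u, v = x1 + x, y1 + y
                best = strength_in[v][u]
                for dk in (-1, 0, 1):           # preceding, current, next tile
                    nk = k + dk
                    if not 0 <= nk < vol_d:
                        continue
                    ox = (nk % tile_col) * vol_w
                    oy = (nk // tile_col) * vol_h
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            nx, ny = x + dx, y + dy
                            if (dk, dy, dx) == (0, 0, 0):
                                continue        # skip the voxel itself
                            if not (0 <= nx < vol_w and 0 <= ny < vol_h):
                                continue        # stay inside the slice
                            nu, nv = ox + nx, oy + ny
                            diff = abs(intensity[v][u] - intensity[nv][nu])
                            g = 1.0 - diff / max_diff
                            attack = g * strength_in[nv][nu]
                            if attack > best:   # neighbor conquers the voxel
                                best = attack
                                label_out[v][u] = label_in[nv][nu]
                                strength_out[v][u] = attack
    return label_out, strength_out


# Hypothetical 2 x 2 x 2 volume, 2 tiles per row, one seed at voxel (0, 0, 0).
intensity = [[0, 0, 0, 0], [0, 0, 0, 0]]
labels = [[1, 0, 0, 0], [0, 0, 0, 0]]
strengths = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
labels, strengths = growcut_step(intensity, labels, strengths,
                                 2, 2, 2, 2, 255.0)
```

Bounding the in-slice offsets to the slice keeps a voxel on a tile edge from reading texels that belong to the horizontally adjacent tile in the texture.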
As described with reference to
Those skilled in the art can readily appreciate that the neighbor-order rendering approach described herein can be generalized for applications in which a voxel has a neighborhood of some size other than 3×3×3. In such a case, for the exemplary GrowCut algorithm, a voxel's status (strength and label values) is updated based on its neighbors' status (strength and label values) in a plurality of steps by splitting its neighborhood into a plurality of two-dimensional neighborhood layers (or, simply, layers), namely, preceding or previous layers, the current layer, and following layers, residing in the preceding tiles, the current tile, and the following tiles, respectively.
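One way to picture this splitting into layers is the following Python sketch, which lists, for a given voxel and neighborhood radius, the flat-volume coordinates of each two-dimensional layer together with its tile offset. The function name, the radius parameter, and the clipping of layers at the volume boundary are assumptions made for illustration; a radius of 1 corresponds to the 3×3×3 case.

```python
def neighborhood_layers(x, y, k, radius, vol_w, vol_h, vol_d, tile_col):
    """Return [(dk, texels), ...], one entry per layer of the neighborhood.

    dk < 0 marks preceding tiles, dk == 0 the current tile, and dk > 0
    following tiles. Each texel is an (u, v) flat-volume coordinate. The
    current layer includes the voxel itself.
    """
    layers = []
    for dk in range(-radius, radius + 1):
        nk = k + dk
        if not 0 <= nk < vol_d:
            continue                       # no tile exists on this side
        ox = (nk % tile_col) * vol_w       # tile origin, as defined earlier
        oy = (nk // tile_col) * vol_h
        texels = [
            (ox + nx, oy + ny)
            for ny in range(max(0, y - radius), min(vol_h, y + radius + 1))
            for nx in range(max(0, x - radius), min(vol_w, x + radius + 1))
        ]
        layers.append((dk, texels))
    return layers


# Interior voxel of a hypothetical 3 x 3 x 3 volume, 2 tiles per row.
for dk, texels in neighborhood_layers(1, 1, 1, 1, 3, 3, 3, 2):
    print(dk, len(texels))
```

Each layer can then be processed in its own step, so that a larger neighborhood only adds layers rather than changing the addressing.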
Convergence verification is done by occlusion query. The two 2D label textures are compared, and voxels in corresponding positions in the two textures are discarded if they have identical label values. The occlusion query returns the number of remaining voxels in a label texture. The GrowCut evolution process (iteration) is terminated if the number returned is zero, which means that the propagation process has converged.
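The logic of this test can be mimicked on the CPU as in the Python sketch below. It does not issue an actual GPU occlusion query; it only counts the texels that would survive the discard step. The function names and the nested-list texture representation are hypothetical.

```python
def count_changed_texels(label_a, label_b):
    """Count positions whose labels differ between two label textures."""
    return sum(
        1
        for row_a, row_b in zip(label_a, label_b)
        for a, b in zip(row_a, row_b)
        if a != b                      # identical labels are "discarded"
    )


def has_converged(label_a, label_b):
    """True when no texel changed, i.e. the count returned is zero."""
    return count_changed_texels(label_a, label_b) == 0


previous = [[1, 0, 0, 0], [0, 0, 2, 2]]
current = [[1, 1, 0, 0], [0, 0, 2, 2]]
print(count_changed_texels(previous, current))  # -> 1
print(has_converged(current, current))          # -> True
```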
It can be appreciated that the data mapping and addressing scheme of the present invention, using the GPU flat volume representation, facilitates addressing of voxels in adjacent slices, thus simplifying the update processing task for each voxel. Using the GPU to perform this function provides significant advantages for processing throughput, helping to speed execution of the GrowCut algorithm and similar processing.
The present invention is described as a method. However, in another embodiment, the present invention comprises a computer program product for image linear structure detection in medical applications in accordance with the method described. In describing the present invention, it should be apparent that the computer program of the present invention can be utilized by any well-known computer system, such as the personal computer. However, many other types of computer systems can be used to execute the computer program of the present invention.
It will be understood that the computer program product of the present invention may make use of image manipulation algorithms and processes that are well known. Accordingly, the present description is directed in particular to those algorithms and processes forming part of, or cooperating more directly with, the method of the present invention. Thus, it will be understood that the computer program product embodiment of the present invention may embody algorithms and processes not specifically shown or described herein that are useful for implementation. Such algorithms and processes are conventional and within the ordinary skill in the image processing art. Additional aspects of such algorithms and systems, and hardware and/or software for producing and otherwise processing the images or co-operating with the computer program product of the present invention, are not specifically shown or described herein and may be selected from such algorithms, systems, hardware, components and elements known in the art.
Processing results from methods and apparatus of the present invention can be displayed on a control monitor, for example, or can be reported to a viewer or provided, as data, for subsequent image processing and analysis. Linear structures and microcalcifications that are detected by the method of the present invention can be highlighted on the display, for example.
A computer program product may include one or more storage media, for example: magnetic storage media such as magnetic disk or tape; optical storage media such as optical disk, optical tape, or machine readable bar code; solid-state electronic storage devices such as random access memory (RAM) or read-only memory (ROM); or any other physical device or media employed to store a computer program having instructions for controlling one or more computers to practice the method according to the present invention. The computer of the present invention has both a central processing unit (CPU) and a Graphics Processing Unit (GPU) that cooperate to provide the volume processing functions described herein.
It will be appreciated that variations and modifications can be effected by a person of ordinary skill in the art without departing from the scope of the invention. The subject matter of the present invention relates to digital image processing and computer vision technologies, which is understood to mean technologies that digitally process a digital image to recognize and thereby assign useful meaning to human understandable objects, attributes or conditions, and then to utilize the results obtained in the further processing of the digital image.
The invention has been described in detail with particular reference to a presently preferred embodiment, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention. The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the invention is indicated by the appended claims, and all changes that come within the meaning and range of equivalents thereof are intended to be embraced therein.