The invention relates to a method for circularly accessing a plurality of memory addresses.
The invention also relates to a computer program product and to a system for circularly using a sequence of values.
Digital signal processing in general and image processing in particular frequently involve executing block type operations. The block type operations may comprise performing a computation using a block of pixels, for example a block of 3×3 pixels or 5×5 pixels. These computations can be performed efficiently by loading a number of lines in respective memory buffers of a fast memory, the number of lines corresponding to the size of the block, and then performing the relevant computations on the blocks comprised in the loaded buffers.
For example, in the case of 3×3 blocks, three consecutive lines of pixels may be loaded into the fast memory. Subsequently, the computations are done for the thus available blocks while simultaneously loading a fourth consecutive line into the fast memory. After having completed the computations for the first three consecutive lines, the first of those lines is discarded. The two remaining lines of pixels in combination with the fourth line again form three lines for performing block processing of 3×3 blocks.
Addressing the lines of pixels in the fast memory is relatively computationally expensive. Four pointers to the beginning of the memory buffers corresponding to the successive lines of pixels are maintained, and after processing the blocks corresponding to the first three lines and after loading the fourth line of pixels into memory, the blocks corresponding to the second to fourth lines are processed and the fifth line is loaded in the memory buffer originally containing the pixels of the first line. This process is repeated until the complete image has been processed. An indexed table containing the pointers to the buffers is maintained, and indices are maintained indicating which line is in which buffer to be processed and indicating into which buffer the next line is to be loaded.
After having processed the blocks and having loaded the next line, the indices are incremented modulo the number of pointers in the table, that is, the number of buffers, so that each pointer is used differently in a circular manner. Thus if the number of pointers is four, four modulo operations are required. However, modulo computation is a computationally expensive operation.
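The conventional index update described above can be illustrated with a minimal sketch. The names `NUM_BUFFERS`, `advance_indices`, `indices` and `history` are hypothetical and chosen for illustration only; the sketch merely assumes four buffers and one index per role, each advanced with its own modulo operation.

```python
# Illustrative sketch of the conventional scheme (all names are hypothetical):
# one index per role, each advanced with its own modulo operation.
NUM_BUFFERS = 4  # three lines being processed plus one line being loaded


def advance_indices(indices, num_buffers=NUM_BUFFERS):
    """Advance every role index by one position, wrapping with a modulo."""
    return [(i + 1) % num_buffers for i in indices]


# indices[0..2]: buffers holding the three lines under the 3x3 block,
# indices[3]:    buffer that receives the next line of the image.
indices = [0, 1, 2, 3]
history = [indices]
for _ in range(4):
    indices = advance_indices(indices)  # four modulo operations per iteration
    history.append(indices)
```

After four iterations the indices return to their initial assignment, at the cost of four modulo operations per iteration.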
In U.S. Pat. No. 5,463,749, a simplified cyclical buffer is disclosed. The buffer has an integer number of memory locations M in respect of which a number of consecutive memory locations STEP are required to be accessed in a single operation and having a predetermined START location defining an initial memory location to be accessed. M is constrained to be an integer multiple of STEP and the k least significant bits of START are zero where k is the minimal integer satisfying the relation 2^k > M−|STEP|. The result is the same as the general modulo algorithm employed in conventional cyclical buffers but without the cost of implementing the complete modulo function. An apparatus for generating successive addresses involves an adder and a k-bit comparator coupled via a multiplexer to an address register such that the k least significant bits of the adder or M−|STEP| or 0 is fed to the k least significant bits of the address register depending on the output of the k-bit comparator. This is a relatively complex way of addressing a circular buffer.
It is an object of the invention to provide a more efficient way of circularly accessing a plurality of memory addresses.
This object is realized by providing a method using a sequence of a plurality of m values wherein each value is represented by a predefined number of n bits, comprising
The method can include performing the steps of reading n predetermined bits of the register and identifying a memory address more than one time, reading n different predetermined bits each time, between successive rotations of the plurality of bits of the register. Hereinafter a unit shall indicate a sequence of n bits of the register representing one of the m values. A plurality of units can be read followed by the rotating, after which the plurality of units is read again. The integer multiple determines how fast the method steps through the plurality of values. If the integer multiple is equal to 1, the values are stepped through one by one. If the integer multiple is equal to or larger than 2, some values may be skipped. If the integer multiple is negative, the values are stepped through in the reverse order compared with a positive integer multiple. If the integer multiple is 0, the same value is accessed each time.
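The reading and rotating of units can be sketched in software as follows. This is only an illustration under stated assumptions: the register is emulated with a Python integer masked to m×n bits, unit 0 is assumed to occupy the n least significant bits, the rotation is assumed to be to the right for a positive integer multiple, and the function names `pack_units`, `read_unit` and `rotate_units` are hypothetical. The `%` in `rotate_units` only normalizes the shift amount for this emulation; a processor with a rotate instruction would not need it.

```python
def pack_units(values, n):
    """Pack m values of n bits each into one integer; values[0] is the lowest unit."""
    register = 0
    for position, value in enumerate(values):
        if not 0 <= value < (1 << n):
            raise ValueError("value does not fit in n bits")
        register |= value << (position * n)
    return register


def read_unit(register, position, n):
    """Read the n bits of the unit at the given position."""
    return (register >> (position * n)) & ((1 << n) - 1)


def rotate_units(register, m, n, multiple=1):
    """Rotate the m*n-bit register right by multiple*n bits.

    A negative multiple rotates left; a multiple of 0 leaves the register unchanged.
    """
    width = m * n
    shift = (multiple * n) % width  # emulation detail, see the text above
    mask = (1 << width) - 1
    return ((register >> shift) | (register << (width - shift))) & mask


# Assumed example: m = 4 values of n = 2 bits each, initially 0, 1, 2, 3.
register = pack_units([0, 1, 2, 3], n=2)
first = read_unit(register, 0, 2)                                    # 0
stepped = read_unit(rotate_units(register, 4, 2), 0, 2)              # 1
skipped = read_unit(rotate_units(register, 4, 2, multiple=2), 0, 2)  # 2
reverse = read_unit(rotate_units(register, 4, 2, multiple=-1), 0, 2) # 3
```

With an integer multiple of 1 the unit read at position 0 steps through the values one by one, with 2 every second value is skipped, and with −1 the values appear in the reverse order.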
An embodiment of the invention further comprises
This embodiment is a particularly practical way to cycle through a number of values, stored at distinct memory addresses. This is advantageous when the values may be represented by more than n bits.
An embodiment of the invention further comprises
In this way, it is possible to cycle through pointers. Also, it is possible to cycle through blocks of data associated with the pointers.
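Cycling through pointers by way of a table can be sketched as follows. The sketch assumes that the n-bit value read from the register is used as an index into a table, and that the table entries stand in for pointers to memory buffers; `pointer_table`, `lookup_index`, `rotate` and the other names are hypothetical, and Python `bytearray` objects are used in place of real memory addresses.

```python
N_BITS = 2                      # bits per unit (assumed)
M_UNITS = 4                     # number of units in the register (assumed)
WIDTH = N_BITS * M_UNITS
UNIT_MASK = (1 << N_BITS) - 1
REG_MASK = (1 << WIDTH) - 1

# Hypothetical pointer table: each entry stands in for a pointer to a buffer.
pointer_table = [bytearray(8) for _ in range(M_UNITS)]

# Units 0..3 initially hold the table indices 0, 1, 2, 3 (lowest unit first).
register = 0b11100100


def lookup_index(register):
    """Read the lowest unit; its value is the index into the pointer table."""
    return register & UNIT_MASK


def rotate(register):
    """Rotate the register right by one unit of N_BITS bits."""
    return ((register >> N_BITS) | (register << (WIDTH - N_BITS))) & REG_MASK


visited = []
for _ in range(6):
    index = lookup_index(register)
    buffer = pointer_table[index]   # the "pointer" selected for this iteration
    buffer[0] += 1                  # stand-in for processing the buffer's data
    visited.append(index)
    register = rotate(register)
```

Because the table entry may be of any width, the register only needs n bits per unit even when the pointers themselves need more than n bits.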
In an embodiment of the invention, the steps of
This embodiment makes it possible to apply different processing steps to different buffers in a cyclical manner. It also allows a processing step to be performed on data in a first buffer while a second buffer is simultaneously loaded with new data.
In another embodiment, the step of obtaining a value represented by n predetermined bits of the register is performed for all m values, each value being represented by a respective n bits.
This aspect is advantageously used if the processing algorithm involves processing a plurality of buffers in a different way simultaneously, and the role of each buffer changes in a repetitive way between processing steps.
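Reading all m units between rotations can be sketched as below, assuming that the position of a unit in the register denotes a role and that the value of the unit denotes the buffer currently fulfilling that role. This interpretation, the constants, and the names `read_all_units` and `rotate` are assumptions made for illustration.

```python
N_BITS = 2                      # bits per unit (assumed)
M_UNITS = 4                     # number of units, one per role (assumed)
WIDTH = N_BITS * M_UNITS
UNIT_MASK = (1 << N_BITS) - 1
REG_MASK = (1 << WIDTH) - 1


def read_all_units(register):
    """Return the buffer number assigned to each role, role 0 first."""
    return [(register >> (role * N_BITS)) & UNIT_MASK for role in range(M_UNITS)]


def rotate(register):
    """Rotate the register right by one unit, shifting every role to the next buffer."""
    return ((register >> N_BITS) | (register << (WIDTH - N_BITS))) & REG_MASK


register = 0b11100100           # role r is initially fulfilled by buffer r
assignments = []
for _ in range(3):
    assignments.append(read_all_units(register))
    register = rotate(register)
# assignments == [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1]]
```

A single rotation updates the assignment of all roles at once, so no per-role index arithmetic is needed between processing steps.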
In another embodiment, the respective read pointer values are associated with respective memory buffers, and the method comprises processing data stored in a plurality of the respective memory buffers.
The processing can be performed more efficiently if the memory buffers are part of a fast memory or cache memory. In particular if a data set needs to be processed that is too large to be loaded in the fast memory completely, part of the data set can be loaded in the memory buffers for processing.
In another embodiment, the step of processing data comprises performing a block type operation on an at least two-dimensional image, each memory buffer being loaded with a line of the image, the loaded lines collectively comprising block-shaped subsets of the image, and the block type operation is performed on blocks of pixels of the image by reading corresponding pixel values from the memory buffers.
This allows a highly efficient cyclic use of the buffers.
In another embodiment of the invention, a computer program product comprises instructions for causing a processor to perform the method of claim 1.
The invention also relates to a system as defined in claim 9.
These and other aspects of the invention will be elucidated hereinafter in the description of the drawing, wherein
Here, the steps of performing the required operations and loading the next line can be performed in parallel. To make the method more efficient, instead of releasing the fast memory holding the first consecutive line, this fast memory is reserved for loading the next consecutive line in the fast memory. This means that four memory buffers are allocated in the fast memory, each buffer capable of holding the pixel values of a single line of the image. Each line is kept in the buffer for three iterations for processing, after which the buffer is overwritten with a new line of the image. Each buffer can have four different roles in an iteration: the role of being multiplied with the first line of the kernel, the role of being multiplied with the second line of the kernel, the role of being multiplied with the third line of the kernel, and the role of being overwritten with the next consecutive line of the image. These roles are rotated over the four buffers after each iteration.
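The rotation of the four roles over the four buffers can be sketched for a 3×3 kernel as follows. This is a simplified illustration, not a definitive implementation: loading and processing are performed sequentially here rather than in parallel, images are plain lists of rows, only output pixels for which the kernel lies fully inside the image are produced, and the names `filter_3x3`, `unit` and `rotate` are hypothetical.

```python
N_BITS, M_UNITS = 2, 4          # four buffers, two bits per unit (assumed)
WIDTH = N_BITS * M_UNITS
UNIT_MASK = (1 << N_BITS) - 1
REG_MASK = (1 << WIDTH) - 1


def rotate(register):
    """Rotate the role register right by one unit."""
    return ((register >> N_BITS) | (register << (WIDTH - N_BITS))) & REG_MASK


def unit(register, role):
    """Return the number of the buffer currently fulfilling the given role."""
    return (register >> (role * N_BITS)) & UNIT_MASK


def filter_3x3(image, kernel):
    """Apply a 3x3 kernel using four line buffers whose roles are rotated."""
    height, width = len(image), len(image[0])
    buffers = [[0] * width for _ in range(M_UNITS)]
    register = 0b11100100       # roles 0..2: kernel rows, role 3: load next line
    for role in range(3):       # preload the first three lines
        buffers[unit(register, role)][:] = image[role]
    output = []
    for top in range(height - 2):
        if top + 3 < height:    # overwrite the buffer in the "load" role
            buffers[unit(register, 3)][:] = image[top + 3]
        rows = [buffers[unit(register, role)] for role in range(3)]
        out_row = []
        for x in range(width - 2):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * rows[ky][x + kx]
            out_row.append(acc)
        output.append(out_row)
        register = rotate(register)  # rotate the roles over the buffers
    return output


# Assumed example data: a 5x5 ramp image and a kernel that sums the block.
image = [[y * 5 + x for x in range(5)] for y in range(5)]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = filter_3x3(image, box)
```

Each line of the image is copied into a buffer exactly once and is then used for three consecutive iterations, once in each kernel-row role.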
Similar scenarios will be apparent to the skilled artisan. For example, if a 5×5 kernel were used in the above example, six fast memory buffers could be used, of which five would contain consecutive lines of the image and one would be overwritten with the next consecutive line.
The principle of reserving a buffer for loading new data while executing a filter on another buffer containing data is also referred to as double buffering.
Many applications of the invention will be obvious to the person skilled in the art. In this description, the application of applying a two-dimensional block filter to an image has been discussed. However, the invention can be applied equally well to three-dimensional filters for filtering volumetric datasets. Volumetric datasets comprise voxels ordered in a three-dimensional grid. The filter correspondingly also has a kernel extending in three dimensions. Consider a three-dimensional filter kernel with size L×M×N. For efficient computation, a number of lines of voxel values are loaded in the buffers. In this case, L×M+L buffers could be used. L×M buffers could be used for multiplication with filter kernel values, and the remaining L buffers could be used for double buffering, as set forth. Volumetric datasets typically occur in medical imaging.
The invention can be used to advantage for any application which requires a circular reading of predetermined values; in particular, for any application which requires repeated reading of a sequence of values, wherein the repeated readings differ in that a value that appears first in the sequence at a reading of the sequence should appear last at the next reading of the sequence.
It will be appreciated that the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source code, object code, a code intermediate between source and object code, such as a partially compiled form, or in any other form suitable for use in the implementation of the method according to the invention. The carrier may be any entity or device capable of carrying the program. For example, the carrier may include a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disk. Further, the carrier may be a transmissible carrier such as an electrical or optical signal, which may be conveyed via electrical or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant method.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb “comprise” and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
| Number | Date | Country | Kind |
|---|---|---|---|
| 06110716.5 | Mar 2006 | EP | regional |
| PCT/IB2007/050718 | Mar 2007 | IB | international |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/IB07/50718 | 3/5/2007 | WO | 00 | 9/6/2008 |