The invention relates generally to an address calculation unit for region-based image processing tasks.
Image processing tasks are often based on the selection of rectangular regions within a picture. Typically, the applied algorithms for the image processing require a pixel-accurate selection of the region. Therefore the addressing of input and output regions requires complex addressing operations.
Due to the nature of the applied algorithms, it is possible to process several pixels concurrently. For this, SIMD (single instruction, multiple data) architectures can be applied efficiently. However, one of the major issues arising with this kind of architectural approach is the addressing and selection of input and output operands for a certain operation. Performing the required address calculation requires a significant amount of processing power. As a result, the overall processing performance of the implementation is reduced.
Region-based processing of video data requires two-dimensional access to input and output data. In current implementations of this processing scheme, the addressing of the input and output regions is performed by calculations carried out on the general-purpose data path that is also used for data processing. As the arithmetic resources can be used either for data processing or for address calculations, but not for both at once, this approach reduces the available data processing performance: whenever an address calculation is carried out, the arithmetic units cannot be used for data processing. This also reduces the overall performance of the implementation.
The aim of this invention is to support an increased efficiency of this processing scheme for programmable embedded hardware implementations and it is an object of the invention to mitigate the drawbacks of the prior art.
The invention relates generally to an address calculation unit for region-based image processing tasks according to claim 1. Further inventive advantages are described in claims 2 to 7.
The above and other features and advantages of the invention will be apparent from the following description of an exemplary embodiment of the invention with reference to the accompanying drawings, in which:
An overview on how a region based addressing scheme can be applied to conventional architectures is depicted in
In order to perform an appropriate addressing of input and output data, several parameters have to be known by the addressing unit. As shown in
Global addressing for loading and storing sub-ROIs is performed as described by the following formula:
GlobalAddress = Image:Base + ((Roi:Posy + SubRoi:Posy) * Image:Stride) + (((Roi:Posx + SubRoi:Posx) * Image:Bpp) >> 3)
GlobalAddress as well as Image:Base and Image:Stride are assumed to be byte addresses in this example.
The address calculation can be easily extended for non-byte-aligned addressing schemes.
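As an illustration only, the global address formula above can be written out in a few lines of Python. The sketch assumes that Image:Bpp denotes bits per pixel (which the shift by 3 suggests) and that the pixel size is a multiple of 8 bits; the class name, function name and example values are hypothetical and not part of the described embodiment.

```python
from dataclasses import dataclass


@dataclass
class Image:
    base: int    # Image:Base, byte address of the first pixel
    stride: int  # Image:Stride, bytes from one image line to the next
    bpp: int     # Image:Bpp, assumed here to be bits per pixel


def global_address(image, roi_x, roi_y, sub_x, sub_y):
    """Byte address of a sub-ROI origin inside the image (hypothetical helper)."""
    # Vertical offset: absolute line number times the line stride in bytes.
    row_offset = (roi_y + sub_y) * image.stride
    # Horizontal offset: absolute pixel column converted from bits to bytes.
    col_offset = ((roi_x + sub_x) * image.bpp) >> 3
    return image.base + row_offset + col_offset


# Example values chosen only for illustration.
img8 = Image(base=0x1000, stride=640, bpp=8)
addr8 = global_address(img8, roi_x=16, roi_y=4, sub_x=8, sub_y=2)

img16 = Image(base=0, stride=1280, bpp=16)
addr16 = global_address(img16, roi_x=16, roi_y=4, sub_x=8, sub_y=2)
```

In a dedicated address calculation unit these multiplications and additions would run alongside the data path rather than on it; the Python form only makes the arithmetic explicit.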
Local addressing for accessing sub-ROI contents is performed according to the following scheme:
LocalAddress = Local:Base + (Local:Posy * Local:Stride) + ((Local:Posx * Image:Bpp) >> 3)
Local:Stride is assumed to be given in bytes in this example.
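The local addressing scheme can be sketched in the same way. The following is a minimal illustration, again assuming that Image:Bpp is a bit count and that Local:Base is a byte address in local memory; the function name and the example numbers are hypothetical.

```python
def local_address(local_base, local_stride, pos_x, pos_y, bpp):
    """Byte address of pixel (pos_x, pos_y) inside a sub-ROI held in local memory."""
    # Line offset in bytes within the locally stored sub-ROI.
    row_offset = pos_y * local_stride
    # Column offset: pixel index converted from bits to bytes.
    col_offset = (pos_x * bpp) >> 3
    return local_base + row_offset + col_offset


# Walk a hypothetical 4x2 sub-ROI of 16-bit pixels stored with a 32-byte stride.
addresses = [
    local_address(local_base=0, local_stride=32, pos_x=x, pos_y=y, bpp=16)
    for y in range(2)
    for x in range(4)
]
```

For this example the first line yields the byte addresses 0, 2, 4, 6 and the second line starts one stride later at 32.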
In order to achieve high-performance processing of region-based algorithms, several neighbouring pixels can be combined into one data word that is supplied to the data path. As a consequence, the resulting output data calculated by the data path typically contain several neighbouring pixels of the output sub-ROI. Writing pixel data that are not part of the sub-ROI can be avoided by an extension of the previously described implementation of the address generation unit:
In parallel to the generation of a local address for the output sub-ROI a mask is generated. This mask indicates which portion of the result is a valid part of the sub-ROI. Only this part is written to local memory. Portions not belonging to the sub-ROI are discarded.
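One way such a write mask could be derived is sketched below. This is an assumption-laden illustration, not the scheme of the embodiment: it assumes one mask bit per pixel lane of the data word, one byte per pixel in local memory, and a sub-ROI described by a start column and a width. All names are hypothetical.

```python
def write_mask(word_start_x, pixels_per_word, roi_x, roi_width):
    """Return a bit mask with bit i set if lane i of the word lies inside the sub-ROI."""
    mask = 0
    for lane in range(pixels_per_word):
        x = word_start_x + lane  # pixel column covered by this lane
        if roi_x <= x < roi_x + roi_width:
            mask |= 1 << lane
    return mask


def masked_write(memory, address, pixels, mask):
    """Write only the lanes enabled in the mask; other lanes are discarded."""
    for lane, value in enumerate(pixels):
        if mask & (1 << lane):
            memory[address + lane] = value  # assumes one byte per pixel


# Sub-ROI covering columns 10..14, processed in words of four pixels.
left_edge = write_mask(word_start_x=8, pixels_per_word=4, roi_x=10, roi_width=5)
right_edge = write_mask(word_start_x=12, pixels_per_word=4, roi_x=10, roi_width=5)

local_memory = bytearray(8)
masked_write(local_memory, 0, [1, 2, 3, 4], left_edge)
```

In this example the word starting at column 8 only has its upper two lanes inside the sub-ROI, so only the values 3 and 4 reach local memory.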
The masking operation is performed by the following scheme:
The invention described above can be applied to any application that requires region-based processing of multi-dimensional data. The described masking operation has advantages for all implementations supporting the concurrent processing of several pixels or, generally speaking, of several data elements.
For example, the invention may be applied in an automotive vision controller. Additionally, region-based processing may be applied to video analysis algorithms in the context of video compression and decompression applications.
Improvements are achieved by applying an address calculation unit that performs the address calculations required for accessing input and output data. The address calculation is performed in parallel to the data processing.
As an extension to the basic scheme, a mask calculation can be applied. The masking is used if several output pixels are generated concurrently. In case not all generated output pixels are part of the defined output region, setting the associated mask accordingly invalidates these pixel data.
The main advantage of the approach is the split of the relatively complex address calculation of region-based algorithms and the actual processing of data. The parallel implementation of both functions leads to a significant overall performance increase as well as an increased ease of use for region-based image processing algorithms.
The invention allows the concurrent address calculation and data processing of region-based tasks. This is achieved by extending the basic architecture with a dedicated address calculation unit. This address calculation unit is able to calculate the addresses of input and output pixels. Moreover, the unit calculates a so-called “write mask”, which indicates which part of the output data generated by the arithmetic unit contains valid data, i.e. data that is part of the selected output region.
Number | Date | Country | Kind |
---|---|---|---|
06122973.8 | Oct 2006 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB07/54184 | 10/5/2007 | WO | 00 | 4/24/2009 |