The present invention relates to an information processing device, in which an LSI and a memory are installed, that executes processing such as image processing.
In digital broadcasting and on DVDs, both in wide use today, moving images are compressed before being transferred or recorded, because the data amount of non-compressed digital moving images is extremely large. The most popular compression algorithm is MPEG2. Although there are other algorithms such as H.264, they are similar in that both intra-frame data compression and inter-frame data compression are used.
Intra-frame data compression reduces the data amount by reducing redundant data using the characteristics of images. On the other hand, inter-frame data compression reduces the data amount by exploiting the similarities between successive frames.
When inter-frame compression is applied to a moving image, motion prediction is performed to find how far an object moves between frames. Motion prediction is processing for finding the part of the reference frame to which the target image area is most similar. During motion prediction, an evaluation value representing the similarity between a part of the reference frame and the target image area is calculated sequentially for neighboring areas in the reference frame in order to find the coordinates where the similarity is highest. Because this processing requires high calculation performance, several methods for reducing the processing amount have been proposed.
One of the proposed methods is the hierarchical search. As disclosed in JP-A-10-320175, this search method thins both the target image area and the reference frame and, using the thinned area and the thinned reference frame, searches the reference frame for the most similar area. After that, the method searches the neighboring areas in the reference frame corresponding to the most similar area again using the image area and the reference frame that are not thinned.
The hierarchical search requires two types of image data, non-thinned and thinned, to be held in an external storage device. This increases the required capacity of the external storage device and, as a result, the cost. In addition, images are stored in the external storage device via the bus between the external storage device and the LSI. When both types of image data, non-thinned and thinned, are stored in the external storage device, the bandwidth of this bus is consumed by transferring the two types of data, and the system performance is decreased.
When the external memory is accessed, the present invention exchanges data positions within an area wider than the bus width, making it possible to create a thinned image from a non-thinned image stored in the external memory without wasting the bus bandwidth.
A first embodiment will be described with reference to
Reference numeral 11 denotes the external memory, which is a DDR-SDRAM in this embodiment. Of course, this memory is not limited to a DDR-SDRAM but may be some other type of memory such as an SRAM.
Next, the following describes the video receiving circuit 1 with reference to
Next, the following describes the motion prediction performed by the motion prediction circuits 2 and 3. Motion prediction refers to processing for searching the reference frame for the small area most similar to a small area in the current frame. In the description below, this small area is referred to as a macro block. Although it is assumed in this embodiment that the reference frame is a frame earlier than the current frame, the reference frame may also be a later frame, as in standard image compression algorithms, or may be composed of multiple frames. The motion prediction processing extracts an area, equal in size to the small area in the current frame, from the reference frame and calculates an evaluation value that indicates the similarity between the corresponding pixels. There are many methods for calculating the evaluation value. In this embodiment, the evaluation value is the sum of the absolute values of the differences between corresponding pixel values. It should be noted that the processing of this embodiment does not depend on the calculation method of the evaluation value. The same calculation is performed repeatedly while changing the selection area in the reference frame until the position where the evaluation value is the minimum, that is, the position where the similarity is the highest, is found. The difference between the coordinates of the area in the reference frame where the evaluation value is the minimum and the coordinates of the area in the current frame is the motion vector.
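The full-search procedure described above can be illustrated by the following software model (Python is used for illustration only; the function names and the list-based frame representation are not part of the embodiment):

```python
def sad(block_a, block_b):
    # Evaluation value: sum of the absolute differences between
    # corresponding pixel values of two equally sized blocks.
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def full_search(current, reference, bx, by, n):
    # Extract the n x n target macro block at (bx, by) in the current
    # frame, slide an n x n window over the reference frame, and keep
    # the position whose evaluation value is minimal.
    target = [row[bx:bx + n] for row in current[by:by + n]]
    best = None
    for y in range(len(reference) - n + 1):
        for x in range(len(reference[0]) - n + 1):
            window = [row[x:x + n] for row in reference[y:y + n]]
            cost = sad(target, window)
            if best is None or cost < best[0]:
                # Motion vector = reference coordinates - current coordinates.
                best = (cost, x - bx, y - by)
    return best  # (minimum evaluation value, mv_x, mv_y)
```

For the example sizes discussed below (a 16×16 macro block and a 512×512 search range), this double loop visits roughly (512−16)×(512−16) candidate positions, which is exactly the calculation load the hierarchical search is designed to reduce.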
Because this processing requires a large calculation amount, several methods have been tried to reduce the calculation amount. This embodiment uses the hierarchical search that is one of those methods.
For convenience of description, assume that the macro block is composed of 16×16 pixels and that the search range in the reference frame is composed of 512×512 pixels; note that this embodiment does not depend on those values. Because one macro block is composed of 256 pixels in this case, 256 subtractions, 256 absolute-value calculations, and 256 additions for the summation are required to calculate one evaluation value. If this evaluation value calculation is performed for every macro-block position in the 512×512 area, shifting one pixel at a time, a total of (512−16)×(512−16)=246016 evaluation values must be calculated. The difference between the coordinates of the macro block in the reference frame where the evaluation value is the minimum and the coordinates of the target macro block is the motion vector. That is, for the subtraction alone, the operation must be performed as many as 256×246016 times for one macro block of the target image.
The hierarchical search, designed to reduce the calculation amount, is performed in the following two stages. First, a coarse motion vector is calculated using thinned data of the target macro block in the current frame and thinned data of the reference frame. After that, a motion vector is calculated on a pixel basis for the neighboring areas using non-thinned data. The first calculation, which finds a coarse motion vector, is called the coarse search, while the later calculation, which finds a motion vector on a pixel basis, is called the fine search. The fine search is sometimes performed at a resolution of one pixel or finer (that is, at sub-pixel resolution), and this embodiment is also applicable in such a case.
In this embodiment, data is thinned on a pixel basis. In this case, a thinned macro block is composed of 4×4 pixels, meaning that 16 subtractions, 16 absolute-value calculations, and 16 additions for the summation are required to calculate one evaluation value. After thinning, the size of the area for calculating the evaluation value in the thinned reference frame is 256×256, meaning that (256−4)×(256−4)=63504 evaluation values are calculated. The number of subtractions is 16×63504, which is significantly smaller than the number described above. The amount of other calculations, such as the absolute-value calculation, is reduced similarly.
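As a rough check of the savings, the operation counts quoted above can be reproduced with a few lines (the function name is illustrative, not part of the embodiment):

```python
def subtraction_count(block, search):
    # One evaluation value needs block*block subtractions, and an
    # evaluation value is computed at (search - block)^2 candidate
    # positions, matching the counting convention used in the text.
    return block * block * (search - block) ** 2

full = subtraction_count(16, 512)   # non-thinned search: 256 * 246016
coarse = subtraction_count(4, 256)  # thinned search: 16 * 63504
```

The coarse search thus needs roughly one sixty-second of the subtractions of the non-thinned full search.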
This hierarchical search is executed by the motion prediction circuits 2 and 3. The motion prediction circuit 2 executes the coarse search, while the motion prediction circuit 3 executes the fine search. The following describes the internal structure of those circuits with reference to
The control circuit 305 controls the bus interface 304 to read the data of the current frame and the reference frame from the external memory 11 and stores the data in the current frame storage memory 301 and the reference frame storage memory 302, respectively. At this time, the data that is not thinned is read.
The pixel values read from the memories 301 and 302 are sent to the difference absolute value summation circuit 303 for use in calculating the sum of absolute differences, which is the evaluation value. The calculated evaluation value is sent to the control circuit 305. The control circuit 305 compares the received evaluation value with the minimum of the evaluation values received so far. If the received evaluation value is smaller, the control circuit 305 updates the minimum evaluation value and, at the same time, saves the difference between the coordinates in the reference frame that are used and the coordinates in the current frame as the motion vector value. At this time, the differences between the pixels of the target block of the current frame and the pixels of the macro block in the reference frame used in the calculation, which are the intermediate result of the difference absolute value summation circuit 303, are stored in the difference data storage memory 306. When the search range in the reference frame has been searched completely, the control circuit 305 holds the motion vector value of the target macro block and the difference data storage memory 306 holds the difference data. The motion vector value is sent to the in-frame compression circuit 4 as motion information 92, and the difference data is sent to the in-frame compression circuit 4 as difference data 93. Note that, when motion prediction does not improve the compression rate because of a scene change, or when the frame is an in-frame encoding frame of the kind sometimes inserted into a compressed stream, the value stored in the current frame storage memory 301 is transferred directly to the difference data storage memory 306 and is output as the difference data 93.
Next, the following describes how the hierarchical search is performed. The reference image to be used for the motion vector search is generated by the in-frame compression circuit 4 and written into the external memory 11 via the internal bus 8 and the memory control circuit 6. At this time, the write data mapping circuit 606 in the memory control circuit 6 exchanges the positions of data. The following describes this processing with reference to
In this embodiment, the data in a reference frame is represented in the plane format, and one pixel is assumed to be composed of 8 bits with each component corresponding to a plane. That is, when the leftmost area of an image is accessed, the 0th pixel from the left occurs in bits [63:56] of the data read from the write data buffer 605, the first pixel occurs in bits [55:48], and so on. When the patterns W1 and W2 are applied sequentially to the 16 pixels of horizontally consecutive data, the 16 pixels of horizontal data are written sequentially into the flip-flops 61a-61p beginning at the leftmost position of the image. When this image is sequentially read using patterns R1 and R2 in
That is, using patterns W1 and W2 alternately to write data into the flip-flops 61a-61p and using patterns R1 and R2 alternately to read the data from the flip-flops 61a-61p allow the data, received from the internal bus 8, to be sent directly to the external memory 11. This is called a data non-mapping write. On the other hand, using patterns W1 and W2 alternately to write data into the flip-flops 61a-61p and using patterns R3 and R4 alternately to read the data from the flip-flops 61a-61p divide the image data, received from the internal bus 8, into even-numbered position pixels and odd-numbered position pixels and allow the data to be written alternately to the external memory 11. This is called a data mapping write. This embodiment uses the latter write method, that is, the data mapping write, to write a reference frame into the external memory 11. As a result, the pixels in the even-numbered positions are stored in external memory addresses divisible by 16, and the pixels in odd-numbered positions are stored in addresses not divisible by 16 but divisible by 8. Data written by the control CPU 5 is written in the data non-mapping write mode. The write mode is switched by issuing a command to the bus or by referencing the block ID of the source. The mode information is saved temporarily in the register in the control circuit 607 to allow the mode to be selected based on this value.
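The even/odd reordering performed in the data mapping write can be modeled in software as follows (a minimal illustrative sketch; the function name and the list-based memory model are assumptions for explanation, not part of the embodiment):

```python
def mapping_write(run16):
    # Software model of the data mapping write: within each run of 16
    # horizontally consecutive pixels, the 8 even-position pixels are
    # packed into the first 8 bytes and the 8 odd-position pixels into
    # the last 8 bytes. The even pixels therefore land at external
    # memory addresses divisible by 16, and the odd pixels at addresses
    # divisible by 8 but not by 16.
    assert len(run16) == 16
    return run16[0::2] + run16[1::2]
```

For example, a run of pixels numbered 0 through 15 is stored as the even pixels 0, 2, 4, ..., 14 followed by the odd pixels 1, 3, 5, ..., 15.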
Next, with reference to
Using patterns W1 and W2 alternately to write data into the flip-flops 60a-60p and using patterns R1 and R2 alternately to read the data from the flip-flops 60a-60p allow the data, read from the external memory 11, to be sent directly to the read data buffer 603. This is called a data non-mapping read. In this case, applying the non-mapping read to the image data area written into the external memory 11 in the mapping write mode and increasing the read address by 8 bytes every 16th byte allow only the pixels in the even-numbered positions or only the pixels in the odd-numbered positions to be read. For example, only the pixels in the even-numbered positions can be read by reading data from addresses 0-7, addresses 16-23, addresses 32-39, and so on. Conversely, only the pixels in the odd-numbered positions can be read by reading data from addresses 8-15, addresses 24-31, addresses 40-47, and so on. Because the pixels that are not used are not read from the external memory 11 in those cases, the bandwidth of data transfer between the external memory 11 and the LSI 10 can be saved.
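The address pattern of this thinned read can be modeled as follows (an illustrative sketch; the function name and the example memory contents are assumptions, not taken from the embodiment):

```python
def non_mapping_thinned_read(memory, start, count):
    # Software model of a non-mapping read over an area stored in the
    # mapping write layout: fetch 8 bytes, then skip the next 8, i.e.
    # read addresses 0-7, 16-23, 32-39, and so on. With start = 0 this
    # yields only the even-position pixels; with start = 8, only the
    # odd-position pixels.
    out = []
    addr = start
    while len(out) < count:
        out.extend(memory[addr:addr + 8])
        addr += 16
    return out[:count]

# Two 16-pixel runs of an image line, stored in the mapped layout
# (even-position pixels first within each run).
mapped = [0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15,
          16, 18, 20, 22, 24, 26, 28, 30, 17, 19, 21, 23, 25, 27, 29, 31]
```

Reading with start = 0 returns the even pixels 0, 2, ..., 30 in their original order while only half of the bytes cross the bus, which is the bandwidth saving described above.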
Using patterns W3 and W4 alternately to write data into the flip-flops 60a-60p and using patterns R1 and R2 alternately to read the data allow the data, written by the write data mapping circuit 606 as pixels in even-numbered positions and pixels in odd-numbered positions, to be restored to the original image data. This is called a data mapping read. One of these two read modes, the data non-mapping read mode or the data mapping read mode, is selected based on the value that is specified by a bus command from the block issuing the read request to the external memory 11 and that is stored temporarily in the register in the control circuit 607. In this embodiment, the data non-mapping read is used for an image data read request from the motion prediction circuit 2, and the data mapping read is used for an image data read request from the motion prediction circuit 3.
When the hierarchical search is performed in this embodiment, the image is thinned by half both horizontally and vertically for the coarse search performed by the motion prediction circuit 2. To do so, a reference frame written into the external memory 11 in the mapping write mode is read in the non-mapping mode so that only the pixels in the even-numbered positions or only those in the odd-numbered positions are read. By reading every other pixel not only horizontally but also vertically, the amount of data transferred between the external memory 11 and the LSI 10 can be reduced to one quarter of that of a non-thinned transfer. The coarse search is performed in this way to find the position where the evaluation value is the minimum and, after that, the motion prediction circuit 3 performs the fine search on the neighboring areas. The fine search, whose read range is smaller than that of the coarse search, requires the non-thinned image. Because the external memory 11 already stores the data written in the mapping mode, the non-thinned image can be obtained by reading this area in the mapping mode. The fine search is performed using the obtained image.
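The two-stage flow, a coarse search on 2:1-thinned data followed by a fine search of the neighborhood at full resolution, can be modeled as follows (an illustrative Python sketch; the ±2-pixel fine-search window and the helper names are assumptions for explanation, not taken from the embodiment):

```python
def sad(a, b):
    # Evaluation value: sum of absolute differences between blocks.
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def best_position(target, ref, xs, ys, n):
    # Top-left position in ref, among the given candidates, that
    # minimizes the evaluation value for an n x n target block.
    best = None
    for y in ys:
        for x in xs:
            cost = sad(target, [row[x:x + n] for row in ref[y:y + n]])
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best[1], best[2]

def hierarchical_search(target, ref, n):
    # Coarse search: thin both the target block and the reference
    # frame to every other pixel, horizontally and vertically.
    thin = lambda img: [row[0::2] for row in img[0::2]]
    tt, tr = thin(target), thin(ref)
    m = n // 2
    cx, cy = best_position(tt, tr, range(len(tr[0]) - m + 1),
                                   range(len(tr) - m + 1), m)
    # Fine search: re-search a small neighborhood of the coarse hit
    # (scaled back to full-resolution coordinates) using non-thinned data.
    xs = range(max(0, 2 * cx - 2), min(len(ref[0]) - n, 2 * cx + 2) + 1)
    ys = range(max(0, 2 * cy - 2), min(len(ref) - n, 2 * cy + 2) + 1)
    return best_position(target, ref, xs, ys, n)
```

The returned position, minus the block's position in the current frame, is the motion vector; only the small fine-search window is visited at full resolution.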
Non-thinned data is also required when the display control circuit 7 outputs the monitor image. Therefore, data is read from the frame buffer area in the mapping read mode, as in the fine search.
In the first embodiment, a two-level hierarchical search is performed, in which the coarse search uses an image thinned to every other pixel and the fine search uses the non-thinned image. A second embodiment is applicable to searches with three or more levels. The following describes this embodiment with reference to
In addition, during execution of the non-mapping read, one pixel out of every four can be read by reading the image 8 bytes at a time while increasing the address by 32 at a time and, similarly, one pixel out of every two can be read by reading the image 8 bytes at a time while increasing the address by 16 at a time. The very coarse search and the coarse search can be implemented by thinning the image in this way not only horizontally but also vertically.
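The resulting bus-bandwidth savings can be checked with a small calculation (illustrative only): 8 bytes cross the bus out of every horizontal address stride, and only every n-th line is read vertically.

```python
def transfer_fraction(h_stride, v_step):
    # Fraction of a full frame's bytes that cross the bus when 8 bytes
    # are fetched per h_stride bytes horizontally and only one line in
    # every v_step lines is read vertically.
    return (8 / h_stride) * (1 / v_step)

# Stride 16 with every other line (coarse search): 1/4 of a full transfer.
# Stride 32 with every fourth line (very coarse search): 1/16 of a full transfer.
```

This matches the one-quarter reduction stated for the coarse search of the first embodiment and extends it to the very coarse level of this embodiment.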
The present invention saves the capacity of external storage and saves the bus bandwidth required for writing a thinned image into external storage.
It should be further understood by those skilled in the art that although the foregoing description has been made on embodiments of the invention, the invention is not limited thereto and various changes and modifications may be made without departing from the spirit of the invention and the scope of the appended claims.
Number | Date | Country | Kind
---|---|---|---
2005-303837 | Oct 2005 | JP | national