This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2008-321446 filed in Japan on Dec. 17, 2008; the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a cache memory device, a control method for a cache memory device, and an image processing apparatus, and more particularly to a cache memory device for storing image data of a frame, a control method for a cache memory device, and an image processing apparatus.
2. Description of Related Art
In television receivers for receiving terrestrial digital broadcasting, BS digital broadcasting, CS digital broadcasting and the like, or video recorders for reproducing video, image processing such as decoding is conventionally performed on image data.
For example, moving image data encoded by MPEG-4 AVC/H.264 or the like is decoded and retained in frame memory as frame data. The retained frame data is then used for decoding image data of subsequent frames, with a rectangular image in a predetermined area within the frame data serving as a reference image.
When a CPU or the like directly loads a reference image including a necessary image portion from the frame memory, e.g., SDRAM, data other than the necessary image data is also loaded. The unnecessary portion of the loaded data is discarded, and even when the discarded data is required in an immediately subsequent load of another reference image, it must be loaded from the SDRAM again.
Because the cache hit rate in image processing is low when general cache memory is utilized as-is for readout of a reference image from image data, Japanese Patent Application Laid-Open Publication No. 2008-66913, for example, proposes a technique for improving the cache hit rate when cache memory is used for readout of image data in image processing.
In the image data processing apparatus according to that proposal, multiple areas serving as readout units are defined in each of the horizontal and vertical directions in order to keep the capacity of the cache memory small while increasing the cache hit rate.
In many types of general image processing, processing is performed sequentially from the upper left area of a frame to the right, and upon completion of processing at the rightmost end, processing continues sequentially, again to the right, from the leftmost area of the row immediately below.
Even when the index allocation method disclosed in the above proposal is employed, however, it is often the case that image data in the upper portion of the area containing the pixels to be processed has already been replaced with other image data, and is therefore no longer present in the cache memory when reference should be made to it. In other words, the cache memory according to the proposal is not configured in consideration of the case where processing is performed sequentially from the upper left area of a frame to the right and, upon completion of processing at the rightmost end, continues sequentially to the right from the leftmost area immediately below.
According to an aspect of the present invention, there can be provided a cache memory device including a memory section configured to store image data of a frame with a predetermined size as one cache block, and an address conversion section configured to convert a memory address of the image data such that a plurality of different indices are assigned in units of the predetermined size in the horizontal direction in the frame so as to generate address data, wherein the image data is output from the memory section as output data by specifying a tag, an index, and a block address based on the address data generated through conversion by the address conversion section.
Hereinafter, embodiments of the present invention will be described with reference to drawings.
First, the configuration of an image processing apparatus according to the present embodiment is described.
A video processing apparatus 1, which may be a television receiver, a video decoder or the like, includes a central processing unit (CPU) 11 as an image processing section, an SDRAM 12 as main memory capable of storing multiple pieces of frame data, and an interface (hereinafter abbreviated as I/F) 13 for receiving image data. These components are interconnected via a bus 14. The CPU 11 has a CPU core 11a and a cache memory 11b.
The cache memory 11b is cache memory used for image processing; although shown here as contained in the CPU 11, it may instead be provided outside the CPU 11.
Furthermore, while the CPU 11 is employed as an image processing section here, another circuit device, such as a dedicated decoder circuit, may be used instead.
The I/F 13 is a receiving section configured to receive broadcasting signals of terrestrial digital broadcasting, BS broadcasting and the like via an antenna or a network. Coded image data that has been received is stored in the SDRAM 12 via the bus 14 under control of the CPU 11. The I/F 13 may also be a receiving section configured to receive image data recorded on a storage medium such as a DVD or a hard disk device.
The CPU 11 performs decoding processing according to a predetermined method, such as MPEG-4 AVC/H.264. That is to say, image data received through the I/F 13 is first stored in the SDRAM 12, and the CPU 11 performs decoding processing on the image data stored in the SDRAM 12 and generates frame data. In decoding processing, frame data is generated from the image data stored in the SDRAM 12 while reference is made to already generated frame data as necessary, and the generated frame data is stored in the SDRAM 12. For example, reference images in rectangular area units of a predetermined size in preceding and following frames are read from the SDRAM 12, and decoding is performed using those reference images according to the applicable method. Thus, both data before decoding and decoded data are stored in the SDRAM 12.
At the time of decoding processing, the CPU 11 utilizes the cache memory 11b for reading image data of a reference image. Because the CPU core 11a of the CPU 11 accesses the cache memory 11b first, memory bandwidth usage is reduced and the speed of reading out a reference image is increased. The CPU core 11a makes a data access to the SDRAM 12 by specifying a 32-bit memory address, for example. If data having the memory address is present in the cache memory 11b at the time, the data is read from the cache memory 11b. The configuration of the cache memory 11b is discussed later.
Next, data structure of image data stored in the SDRAM 12 will be described.
In the SDRAM 12, multiple pieces of coded frame data are stored. In the cache memory 11b, image data of one frame, divided into portions of a predetermined size, is stored with each portion occupying one cache line, namely one cache block; image data of the predetermined size thus corresponds to the data of one cache block.
One frame 20 is divided into multiple rectangular area units, each of which is formed of image data of a predetermined size, and each rectangular area unit represents one readout unit. The frame 20 is made up of multiple pixels in a two-dimensional matrix, here 1,920 by 1,080 pixels; in other words, the frame 20 is 1,920 pixels wide and 1,080 pixels high. In the SDRAM 12, data for multiple frames is stored. The CPU 11 is capable of decoding such 1,920-by-1,080-pixel image data.
The frame 20 is divided into matrix-like multiple rectangular area units RU, each of which is an image area serving as a readout unit.
Since one frame of 1,920 by 1,080 pixels is divided into matrix-like rectangular area units RU each having a size of 16 by 8 pixels, the frame 20 is divided into 120 rectangular area units RU horizontally and 135 vertically.
As described later, a cache data storage section (hereinafter called a memory section) in the cache memory 11b permits designation of a line or a cache block by means of an index.
Moreover, as described below, an index is assigned to each of the multiple (here 120) rectangular area units RU of a frame that align widthwise.
Image data in units of the rectangular area unit RU of the predetermined size is stored in the cache memory 11b as one piece of cache block data, and image data for multiple rectangular area units RU of a frame is stored in the cache memory 11b. In other words, the memory section stores image data for a frame with the predetermined size as one cache block, and image data for one rectangular area unit RU (i.e., 128-pixel data) is stored in one cache block.
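As a rough illustration of this geometry, the following minimal C sketch maps a pixel coordinate to its rectangular area unit; with 16-by-8-pixel RUs, a 1,920-by-1,080 frame yields 120 RUs per block row and 135 block rows. The function and variable names are illustrative assumptions, not part of the described device.

```c
#include <stdio.h>

#define FRAME_W      1920
#define RU_W           16   /* RU width in pixels  */
#define RU_H            8   /* RU height in pixels */
#define RUS_PER_ROW (FRAME_W / RU_W)   /* 120 RUs per block row */

/* Horizontal RU position of a pixel: every RU across a block row
 * receives a unique index, so a whole row can reside in cache at once. */
static unsigned ru_column(unsigned x) {
    return x / RU_W;   /* 0..119 */
}

int main(void) {
    unsigned x = 1900, y = 500;
    printf("pixel (%u,%u): RU column %u, RU row %u\n",
           x, y, ru_column(x), y / RU_H);   /* column 118, row 62 */
    return 0;
}
```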
Processing for decoding is generally performed by scanning two-dimensional frame data widthwise.
The cache memory 11b includes a tag table 21, a memory section 22, a tag comparator 23, a data selector 24, and an address conversion section (hereinafter also called just a conversion section) 25. Memory address data 31 from the CPU core 11a is converted into address data 32 in the conversion section 25. Address conversion will be discussed later.
The cache memory 11b and the CPU core 11a are formed on a single chip as a system LSI, for example.
The tag table 21 is a table configured to store tag data corresponding to individual index numbers. Herein, index numbers are from 0 to n.
The memory section 22 is a storage section configured to store cache block data corresponding to individual index numbers. As mentioned above, the memory section 22 stores frame image data of the predetermined size as one cache block.
The tag comparator 23 as a tag comparison section is a circuit configured to compare tag data in the address data 32 that is generated by conversion of the memory address data 31 from the CPU core 11a with tag data in the tag table 21, and output a match signal as an indication of a hit when there is a match.
The data selector 24 as a data selection section is a circuit configured to select and output the corresponding data in a selected cache block based on the block address data in the address data 32.
The conversion section 25 is a circuit configured to apply predetermined address conversion processing, which rearranges internal data as discussed below, to the memory address data 31 from the CPU core 11a, thereby converting it into the in-cache address data 32 for the cache memory 11b. More specifically, the conversion section 25 generates the address data 32 by converting the memory address data 31 for image data so that multiple indices are assigned in units of the predetermined size in the horizontal direction of a frame.
The CPU core 11a outputs the memory address data 31 for data that should be read out, namely an address in the SDRAM 12, to the cache memory 11b. The memory address data 31 is 32-bit data, for example.
The conversion section 25 performs the aforementioned address conversion processing on the memory address data 31 that has been input or specified, and through specification of a tag, an index, and a block address based on the data after conversion, image data is output from the memory section 22 as output data to the CPU core 11a.
Since a block address is an address for specifying pixels in a rectangular area unit RU of M by N pixels, each cache block is configured such that M>N. The index 32b of the address data 32 includes data that indicates a horizontal position in a frame and at least a portion of the data that indicates a vertical position in the frame.
As mentioned above, the conversion section 25 converts the memory address data 31 into the address data 32. The memory address data 31 is 32-bit address data, and the address data 32 in the cache memory 11b is also 32-bit address data. The address data 32 is made up of a tag 32a, an index 32b, and a block address 32c.
Correspondence between the memory address data 31 and the address data after conversion 32 is as follows. A predetermined bit portion A on higher-order side in the memory address data 31 directly corresponds to a bit portion A1 on the higher-order side in the tag 32a of the address data 32. A predetermined bit portion E on lower-order side in the memory address data 31 directly corresponds to a bit portion E1 on the lower-order side in the block address 32c of the address data 32.
A bit portion B that neighbors the bit portion A on the lower-order side in the memory address data 31 corresponds to a bit portion H on the lower-order side in the index 32b of the address data 32; the bit portion H indicates a horizontal position in the matrix of rectangular area units RU in a frame.
A bit portion D in the memory address data 31 that neighbors the bit portion E on the higher-order side corresponds to a bit portion V in the address data 32 that neighbors the bit portion H on the higher-order side; the bit portion V indicates a vertical position in the matrix of rectangular area units RU in a frame.
A bit portion C between the bit portion B and the bit portion D in the memory address data 31 is divided into two bit portions, C1 and C2. The bit portion C1 corresponds to the bit portion between the bit portion A1 of the tag 32a and the bit portion V in the address data 32. The bit portion C2 corresponds to the bit portion between the bit portion E1 of the block address 32c and the bit portion H.
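The following C sketch illustrates this bit rearrangement. The field widths are assumptions chosen for illustration only (the text defines the ordering of the portions A, B, C1, C2, D, and E, but not their sizes): here B is 7 bits, enough for horizontal positions 0 to 119, and the block address is 7 bits, matching a 128-pixel RU at one byte per pixel.

```c
#include <stdint.h>

typedef struct { uint32_t tag, index, block; } cache_addr_t;

/* Extract bits hi..lo (inclusive) of v. */
static uint32_t bits(uint32_t v, unsigned hi, unsigned lo) {
    return (v >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

/* Rearrange A|B|C1|C2|D|E (high to low) into tag=A1|C1, index=V|H,
 * block=C2|E1, following the correspondence described above. */
static cache_addr_t convert(uint32_t m) {
    uint32_t a  = bits(m, 31, 22);  /* A : tag high side (A1)          */
    uint32_t b  = bits(m, 21, 15);  /* B : horizontal RU position (H)  */
    uint32_t c1 = bits(m, 14, 10);  /* C1: tag low side                */
    uint32_t c2 = bits(m,  9,  7);  /* C2: block address high side     */
    uint32_t d  = bits(m,  6,  4);  /* D : vertical position bits (V)  */
    uint32_t e  = bits(m,  3,  0);  /* E : block address low side (E1) */
    cache_addr_t r;
    r.tag   = (a  << 5) | c1;  /* 15-bit tag   : A1 | C1 */
    r.index = (d  << 7) | b;   /* 10-bit index : V  | H  */
    r.block = (c2 << 4) | e;   /*  7-bit block : C2 | E1 */
    return r;
}
```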
The conversion section 25 performs conversion processing for association as described above when data is written into the cache memory 11b and when data is read from the cache memory 11b.
Operations of the cache memory 11b at the time of data readout in the present embodiment will be described.
When the memory address data 31 is input from the CPU core 11a, the conversion processing described above is performed in the conversion section 25 to generate the address data 32.
The tag comparator 23 of the cache memory 11b compares tag data stored in the tag table 21 that is specified by the index 32b in the address data 32 with tag data in the tag 32a, and outputs a match signal for indicating a hit to the data selector 24 if the two pieces of data match.
If the two pieces of data do not match, the tag comparator 23 returns a cache miss. Upon a cache miss, refilling is carried out.
The index 32b of the address data 32 is supplied to the memory section 22, and a cache block stored in the memory section 22 that is specified by the supplied index is selected and output to the data selector 24. Upon input of a match signal from the tag comparator 23, the data selector 24 selects data in the cache block that is specified by the block address 32c of the address data 32, and outputs the data to the CPU core 11a.
That is to say, image data is output from the memory section 22 as output data to the CPU core 11a through specification of a tag, an index, and a block address based on the address data 32 generated by the conversion section 25.
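A minimal sketch of this readout path is shown below, modeling one way as a direct-mapped lookup; the structure names and sizes are illustrative assumptions, not the patent's implementation.

```c
#include <stdint.h>

#define NUM_INDICES 1024u   /* assumed: 10-bit index, as in the sketch above */
#define BLOCK_BYTES  128u   /* assumed: one 16x8-pixel RU at 1 byte/pixel    */

typedef struct { uint32_t tag, index, block; } cache_addr_t;

typedef struct {
    uint32_t tag[NUM_INDICES];                /* tag table 21      */
    uint8_t  valid[NUM_INDICES];
    uint8_t  data[NUM_INDICES][BLOCK_BYTES];  /* memory section 22 */
} cache_t;

/* Readout path: the index selects a line, the tag comparator checks for
 * a match, and on a hit the block address selects the byte within the
 * cache block. On a miss, 0 is returned and a refill would follow. */
static int cache_read(cache_t *c, cache_addr_t a, uint8_t *out) {
    if (c->valid[a.index] && c->tag[a.index] == a.tag) {  /* tag comparator 23 */
        *out = c->data[a.index][a.block];                 /* data selector 24  */
        return 1;   /* hit */
    }
    return 0;       /* miss: refill from the SDRAM is carried out */
}
```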
As described above, according to the present embodiment, indices are uniquely assigned to the multiple blocks that align in the horizontal direction, with M-by-N-pixel image data constituting one cache block in a two-dimensional frame. Since all data in the horizontal direction of a frame can thus be cached, the cache hit rate is increased in decoding processing on an image processing apparatus in which image processing is often performed in raster-scan order, such as a video decoder.
While the video processing apparatus according to the first embodiment described above is an apparatus for processing non-interlaced images, a video processing apparatus according to a second embodiment of the present invention is an example of an apparatus that processes interlaced images. A cache memory device of the video processing apparatus according to the present embodiment is configured to store data in a memory section such that data for the top field and data for the bottom field of an interlaced image are not present together within a cache block. Such a configuration reduces the frequency of cache misses.
As the video processing apparatus according to the present embodiment has a configuration similar to that of the first embodiment, description of the common components is omitted.
Since some of various types of image processing for an interlaced image use only top field data, for example, cache misses would occur with a high frequency if top field data and bottom field data are present together in a cache block. Thus, in the present embodiment, data is stored such that only either top or bottom field data is present in each cache block of the cache memory.
In other words, in the SDRAM 12, top field data and bottom field data are stored in any of various layouts, whereas in the cache memory 11b, data is stored such that only either the top or the bottom field of data of a frame stored in the SDRAM 12 is present in each cache block.
The address conversion section 25 applies the address conversion processing described below to each piece of pixel data.
The conversion section 25 converts the memory address data 31 into the address data 32A. As in the first embodiment, the memory address data 31 is 32-bit address data and the address data 32A in the cache memory 11b is also 32-bit address data. The address data 32A is made up of a tag 32a, an index 32b, and a block address 32c.
In the present embodiment, correspondence between the memory address data 31 and the address data after conversion 32A is as follows. The conversion section 25 performs address conversion processing such that a bit portion T/B, made up of one bit or of two or more bits, which is included on the lower-order side of the 32-bit memory address data 31 as indication data showing the distinction between the top and bottom fields, is moved to and included at a predetermined position in the tag 32a of the address data after conversion 32A.
Furthermore, as in the first embodiment, index numbers are uniquely assigned in each of the top and bottom fields within each block row of a frame.
In another layout of the field data in the SDRAM 12, conversion is performed as follows.
The conversion section 25 converts the memory address data 31 into the address data 32B. As in the first embodiment, the memory address data 31 is 32-bit address data and the address data 32B in the cache memory 11b is also 32-bit address data. The address data 32B is made up of a tag 32aB, an index 32bB, and a block address 32cB.
In the present embodiment, correspondence between the memory address data 31 and the address data after conversion 32B is as follows. The conversion section 25 performs conversion processing to move a bit portion T/B, made up of one bit or of two or more bits, from the data portion 31b of the memory address data 31, which corresponds to the index 32bB, to a predetermined position in the tag 32aB of the address data 32B. That is to say, by moving the bit portion T/B, which is data indicative of field polarity, to the higher-order side of the index 32bB, only either top or bottom field data is present in each cache block, and the situation in which cache capacity is in effect only partially usable is prevented. In other words, a cache block corresponding to an index contains only top or only bottom field data, and all of the available indices can be used even during processing that uses only the top field, for example.
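The following sketch illustrates the idea common to both variants: the T/B bit is removed from the memory address and appended to the tag, so that top and bottom field data never share a cache block while either field can occupy any index. The bit position and field widths are assumptions, a single-bit T/B is assumed, and the first embodiment's rearrangement of the portions A through E is omitted for clarity.

```c
#include <stdint.h>

typedef struct { uint32_t tag, index, block; } cache_addr_t;

/* Assumed geometry: 7-bit block address, 10-bit index; the T/B bit sits
 * at an assumed position within the region that would otherwise feed
 * the index. */
enum { TB_POS = 17, BLOCK_BITS = 7, INDEX_BITS = 10 };

/* Pull the T/B (field polarity) bit out of the memory address and append
 * it to the tag; the remaining bits close ranks and form the block
 * address, the index, and the rest of the tag. */
static cache_addr_t convert_interlaced(uint32_t m) {
    uint32_t tb = (m >> TB_POS) & 1u;          /* T/B field bit       */
    uint32_t lo = m & ((1u << TB_POS) - 1u);   /* bits below T/B      */
    uint32_t hi = m >> (TB_POS + 1u);          /* bits above T/B      */
    uint32_t sq = (hi << TB_POS) | lo;         /* address without T/B */
    cache_addr_t r;
    r.block = sq & ((1u << BLOCK_BITS) - 1u);
    r.index = (sq >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1u);
    r.tag   = ((sq >> (BLOCK_BITS + INDEX_BITS)) << 1) | tb;  /* T/B into tag */
    return r;
}
```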
Since the bit portion T/B is sometimes represented in two or more bits as mentioned above, the bit portion T/B can be present in both the portion corresponding to the index and the portion corresponding to the block address of the address data after conversion, namely in both the data portions 31b and 31c of the memory address data 31.
In such a case, address conversion may be performed in a similar manner, with all of the bits making up the bit portion T/B moved to predetermined positions in the tag of the address data after conversion.
Operations of the cache memory 11b at the time of data readout in the present embodiment are similar to those of the cache memory 11b of the first embodiment, differing only in that the conversion processing performed in the conversion section 25 is the conversion processing described above.
As described above, for an interlaced image of a field structure, decoding processing is carried out separately on top field and bottom field. Therefore, if the two types of field data are present together in a cache block, data of both the fields will be read into the cache even when only data for either one of the fields is required, which decreases cache efficiency.
According to the above-described cache memory device 11b of the present embodiment, cache efficiency does not decrease because only either top or bottom field data is stored in each cache block.
Also, if individual cache blocks were allocated such that each is used for only the top field or only the bottom field, then when only data of one of the two fields is required, the cache blocks allocated to the data of the other field would not be used at all, which likewise decreases cache efficiency. By adopting the index allocation method described above, however, all indices remain usable regardless of which field is being processed.
Thus, according to the present embodiment, the cache hit rate for image data in decoding processing is improved, even for interlaced frames, on an image processing apparatus in which image processing is often performed in raster-scan order, such as a video decoder.
Now, a third embodiment of the present invention will be described.
Decoding can include processing in which the area of a referenced image changes in accordance with the type of processing being performed. One such type of processing includes adaptive motion-predictive control, e.g., Macro Block Adaptive Frame/Field (MBAFF) processing in MPEG-4 AVC/H.264.
In general image processing, a macroblock of 16 by 16 pixels is typically used as a processing unit. In a particular way of processing, e.g., the MBAFF processing mentioned above, however, image processing may be performed with 16 by 32 pixels as a processing unit.
To further improve the cache hit rate in such a case, the present embodiment changes the number of ways in the cache memory in accordance with the type of image processing, more specifically, in accordance with a change in the pixel area of the processing unit. Specifically, when the processing unit changes to the processing unit PU, the number of ways in the cache memory 11b is decreased in order to increase the number of indices so as to conform to the processing unit PU.
As a result, indices can be uniquely assigned so as to conform to the enlarged processing unit.
In addition to the conversion section 25, the cache memory 11bA includes a way switch 41, selectors 42, 43 and 44, a tag table 21A, a memory section 22A, a tag comparison section 23A, and a data selector 24A.
The conversion section 25 performs the address conversion processing described in the first or second embodiment. Address data after address conversion is maintained in a register as two pieces of data, 32D1 and 32D2, corresponding to the two numbers of indices associated with the change of the number of ways, as discussed later.
When the aforementioned MBAFF processing which involves lengthwise expansion of the processing unit area in a frame is executed, a predetermined control signal CS for changing the number of ways is supplied from the CPU core 11a to the way switch 41. Upon input of the predetermined control signal CS, the way switch 41 outputs a way-number signal WN which indicates the number of ways after change to each of the selectors 42, 43 and 44. The control signal CS is a signal that indicates a change of the pixel area of the processing unit.
The selector 42 outputs the block address (BA) of one piece of address data, selected from the multiple pieces of address data (two pieces here) in accordance with the way-number signal WN, to the data selector 24A.
The selector 43 outputs the index number of one piece of address data selected from multiple pieces of address data (two pieces of address data here) in accordance with the way-number signal WN to the tag table 21A and the memory section 22A.
The selector 44 outputs the tag of one piece of address data selected from multiple pieces of address data (two pieces of address data here) in accordance with the way-number signal WN to the tag comparison section 23A.
As described above, the way switch 41 receives the predetermined control signal CS from the CPU core 11a and outputs the way-number signal WN to each of the selectors (SEL). The predetermined control signal CS is a processing change command or data that indicates a change in processing state; in the present embodiment, the control signal CS is data indicating that a general image processing state has been changed to a processing state such as MBAFF processing, or indicating the MBAFF processing state itself.
Change of the number of ways and of the index numbers is made by changing the assignment of the multiple storage areas in the cache memory 11bA.
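A minimal sketch of this switching, with the control signal CS reduced to a single flag and illustrative names, might look as follows.

```c
#include <stdint.h>

typedef struct { uint32_t tag, index, block; } cache_addr_t;

/* Register holding the two precomputed decompositions of the converted
 * address: 32D1 for the four-way configuration, 32D2 for the two-way
 * configuration with one additional index bit. */
typedef struct {
    cache_addr_t d1;   /* used with 4 ways (fewer indices)         */
    cache_addr_t d2;   /* used with 2 ways (twice as many indices) */
} addr_reg_t;

/* Way switch and selectors: the control signal CS (here an "MBAFF
 * active" flag) determines the way-number signal WN, and the selectors
 * forward the tag, index, and block address of the matching
 * decomposition to the tag table, memory section, and data selector. */
static cache_addr_t select_addr(const addr_reg_t *r, int mbaff_active) {
    unsigned wn = mbaff_active ? 2u : 4u;   /* way-number signal WN */
    return (wn == 2u) ? r->d2 : r->d1;      /* selectors 42, 43, 44 */
}
```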
Operations of the cache memory 11bA will now be described.
When image processing is changed to processing that involves a change of the processing unit, e.g., MBAFF processing, during operation of the video processing apparatus 1, the CPU core 11a supplies the predetermined control signal CS to the cache memory 11bA.
Upon the cache memory 11bA receiving the control signal CS, the way switch 41 outputs the way-number signal WN (=2) to the selectors 42, 43 and 44 for changing the number of ways to two.
Then, the selector 42 selects the block address (BA) of address data 32D2 that corresponds to two ways from the two pieces of address data 32D1 and 32D2, and outputs the address to the data selector 24A.
The selector 43 selects the index number of the address data 32D2 that corresponds to two ways, and outputs the index number to the tag table 21A and the memory section 22A.
The selector 44 selects the tag of address data 32D2 that corresponds to two ways, and outputs the tag to the tag comparison section 23A.
As a result, the memory section 22A outputs output data with the index and block address (BA) specified based on the address data 32D2, whose index is drawn from the increased number of indices, so that the number of indices is increased as described above.
Thereafter, when image processing shifts back to the processing that was being executed before the MBAFF processing, or to another type of processing, the control signal CS becomes a signal indicating that MBAFF processing is no longer being performed. The cache memory 11bA then returns the number of ways from two to four and the indices to the originally used ranges of 1 to 119 and 128 to 247. The selectors select the address data 32D1 and respectively output its block address (BA), index, and tag.
As has been described, during MBAFF processing, the present embodiment halves the number of ways to thereby double the number of indices or allocates two block rows of 16 by 16 pixels to one way.
Thus, the way configuration of each of the tag table 21A and the memory section 22A is changed in accordance with the change of the number of ways, resulting in an increased number of indices in the tag table 21A and the memory section 22A. Accordingly, the cache hit rate can be improved even during processing in which the pixel area of the processing unit expands in the vertical (lengthwise) direction of a frame.
In general, the cache hit rate of cache memory is improved by increasing the number of ways. When the processing unit for image processing becomes large as described above, however, the cache hit rate can instead be increased by decreasing the number of ways so as to increase the number of indices.
As described above, according to the present embodiment, when two cache memories have the same capacity and the same number of bytes per cache block, the number of indices of cache blocks is large when the number of ways is small, and small when the number of ways is large. With regard to image processing, therefore, indices can be uniquely assigned over a wide range of an image when the number of ways is small, and over a narrow range when the number of ways is large. Cache memory is utilized efficiently, improving the cache hit rate, by reducing the number of ways to keep data for a wide range of an image in cache when access granularity to image data is large, and by increasing the number of ways to flexibly replace data for a small range of an image when access granularity is small.
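The relation can be illustrated numerically: with total capacity and block size fixed, the number of indices equals the capacity divided by (block size × number of ways), so halving the ways doubles the indices. The capacity and block size below are assumptions chosen for illustration.

```c
#include <stdio.h>

#define CACHE_BYTES (64u * 1024u)  /* assumed total cache capacity  */
#define BLOCK_BYTES 128u           /* assumed bytes per cache block */

/* With capacity and block size fixed, the number of indices (sets) is
 * inversely proportional to the number of ways. */
static unsigned num_indices(unsigned ways) {
    return CACHE_BYTES / (BLOCK_BYTES * ways);
}

int main(void) {
    printf("4 ways: %u indices\n", num_indices(4));  /* 128 indices */
    printf("2 ways: %u indices\n", num_indices(2));  /* 256 indices */
    return 0;
}
```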
In particular, data coded using MBAFF processing of MPEG-4 AVC/H.264 is processed by concurrently using two macroblocks in the vertical direction, meaning that the pixel area of the processing unit of an image to be decoded is large compared to when MBAFF processing is not used. Accordingly, access granularity to a reference image and the like also becomes large. Therefore, for stream data using MBAFF processing, cache memory can in some cases be utilized more efficiently by making the number of ways smaller than that for general stream data.
As has been described, according to the above-described embodiments, cache hit rate can be improved in a cache memory device that stores image data.
The present invention is not limited to the above-described embodiments and various changes and modifications are possible without departing from the scope of the invention.