1. Field
The invention relates generally to storage arrangements for reduced color-resolution image data in a memory, and particularly to methods and apparatus for efficiently accessing reduced color-resolution image data.
2. Description of the Related Art
Pixels defined in the YCbCr color space are described by a luma component (Y) and two chroma components (Cb, Cr). Each component is often represented by a one-byte value. In addition, pixels in a digital image are arranged in raster order. A raster pattern refers to the scanning of an image from side-to-side in lines from top-to-bottom.
Commonly, digital images are stored into and fetched from a computer memory, and it is generally important to minimize the size of memory in computer systems. One technique sometimes employed for reducing memory requirements is reducing the color resolution of image data. The human eye is more sensitive to brightness than to color, so the color resolution of an image can be lowered with modest visual impact. Color resolution is reduced by chroma subsampling. That is, while the luma information for every pixel in the image is sampled, the chroma information from fewer than all of the pixels is sampled. Reducing the color resolution of an image means that it can be stored in fewer bytes than are required to store the image at its full resolution. Certain operations, however, cannot be performed on a reduced color-resolution image. Accordingly, before these operations can be performed, the missing color information must be reintroduced into the image (by interpolation or repetition).
The color resolution of an image may be reduced horizontally, vertically, or in both dimensions. For example, if the color resolution is reduced horizontally, sampling is performed on groups of horizontally adjacent pixels in a line, such as a group of four adjacent pixels. The Y information of every pixel in the group is sampled, but the Cr and Cb information from fewer than all of the pixels is sampled. Reduction in the vertical direction is analogous, except sampling is performed on groups of vertically adjacent pixels in a column. On the other hand, if the color resolution is reduced both horizontally and vertically, sampling is performed on a group of pixels that are both horizontally and vertically adjacent, such as a group of two horizontally adjacent pixels in two vertically adjacent lines. As before, the Y information of every pixel in the group is sampled, but the Cr and Cb information from fewer than all of the pixels is sampled. For example, Cr information may be sampled from every other pixel on the odd-numbered lines, while Cb may be sampled from every other pixel on the even-numbered lines. When the color-resolution of the image has been reduced both horizontally and vertically, data access efficiency may suffer when the image is fetched for the purpose of reintroducing the missing color information.
Memory bandwidth refers to the amount of data that can be written to or read from a memory in a given time period, e.g., bytes per second. The amount of memory bandwidth available in a system depends on the memory clock frequency and the width of the memory bus. Given a particular bus width and frequency, there is a finite amount of bandwidth available in any given time period. The percentage of that finite amount of bandwidth being used at any time depends on the particular operation(s) being performed. It also depends on how efficiently those operations are performed. Systems must be designed so that there is always a sufficient amount of bandwidth available for performing necessary operations. Stated another way, a system should have enough bandwidth to accommodate peak and not merely average bandwidth requirements. However, as power consumption is proportional to clock frequency, it is desirable to limit the memory bandwidth (e.g., clock frequency) as much as possible, while still accommodating peak bandwidth requirements.
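As a minimal illustration of the relationship described above, the following Python sketch computes the bandwidth offered by a memory bus and the lowest clock frequency that would satisfy a given peak requirement. It assumes one bus-wide transfer per clock cycle, which the description does not state, and the function names are hypothetical.

```python
def bandwidth_bytes_per_second(clock_hz: float, bus_width_bytes: int) -> float:
    """Bandwidth of a bus, assuming one bus-wide transfer per clock cycle."""
    return clock_hz * bus_width_bytes


def min_clock_hz(peak_bytes_per_second: float, bus_width_bytes: int) -> float:
    """Lowest clock frequency that still accommodates the peak requirement."""
    return peak_bytes_per_second / bus_width_bytes


if __name__ == "__main__":
    # A 4-byte bus clocked at 50 MHz (illustrative numbers only).
    print(bandwidth_bytes_per_second(50e6, 4))
    # Clock needed to sustain a 200 MB/s peak on the same bus.
    print(min_clock_hz(200e6, 4))
```

Under this assumption, lowering the peak requirement lowers the clock frequency that the system must support.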
Of course, it is also desirable to have operations performed as efficiently as possible in terms of the number of memory accesses they need. Typically, memories formed from semiconductors are conceptually organized into rows and columns. When a memory is accessed, there is a maximum number of bytes at a particular row address that can be written or read in each access cycle. For example, if the memory bus is four bytes wide, it is possible (in an SRAM type memory) to access up to four bytes within a row in a single access. Sometimes four bytes from the specified row are needed, but at other times fewer than four may be needed. “Data access efficiency,” as the term is used herein, refers to the percentage of bytes available in a memory read cycle that are actually needed. Obviously, an operation that requires four memory cycles to fetch four bytes is less efficient than one that can fetch the four bytes in a single memory cycle. Similarly, the term also refers to the percentage of the maximum possible number of bytes that are actually stored in a memory write cycle.
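The definition of data access efficiency given above can be sketched as a short calculation. The function name and the default four-byte bus width are assumptions chosen to match the example in the text.

```python
def data_access_efficiency(bytes_needed: int, cycles: int,
                           bus_width_bytes: int = 4) -> float:
    """Percentage of the bytes available over `cycles` accesses that are needed."""
    bytes_available = cycles * bus_width_bytes
    return 100.0 * bytes_needed / bytes_available


if __name__ == "__main__":
    # Four needed bytes fetched in a single access on a four-byte bus.
    print(data_access_efficiency(4, 1))
    # The same four bytes fetched one per access over four accesses.
    print(data_access_efficiency(4, 4))
```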
Generally, an image will be transmitted for storing in a memory in raster order. A common way to store an image in memory is to store raster-ordered pixels at sequential memory addresses as they are received, which is an efficient way to store the image data. However, if an image in which the color-resolution has been reduced both horizontally and vertically is stored in raster order, data access efficiency can suffer when the image is fetched for the purpose of reintroducing the missing color information. This is because the Cr and Cb components needed to calculate missing color information are not stored “locally,” that is, they are not stored in the same row with the associated Y components.
Another situation where data access efficiency suffers is where a reduced color-resolution image is presented in raster order for storing in a memory and it is desired to display the image in a rotated orientation. An image may be rotated when it is stored into or when it is fetched from memory. Consider the case of rotating an image by 90 degrees upon storing. Further, assume that the color-resolution of the image has been reduced both horizontally and vertically. This operation typically requires that the image data be stored in such a way that fetching from sequential memory addresses provides a rotated, raster-ordered version of the image. The reason that “data access efficiency” suffers in this situation is that the Cr and Cb components needed to calculate missing color information for associated Y components should, for efficient fetching, be stored locally in the memory. However, the Cr and Cb components are not “local” to one another in the raster-ordered data stream presented for storage. First, two Y components and the Cr component from a first line will appear sequentially in the data stream. Later, two more Y components and the Cb component from pixels in the same image columns, but on the line below, will appear sequentially in the data stream. The first and second groups of three components are separated temporally by the time needed to store a line of image data. Accordingly, because there is a time delay before the second group can be stored, two memory accesses are needed.
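The temporal separation described above can be made concrete with a small sketch that locates the two three-component groups of one 2×2 pixel group in the raster-ordered sample stream. It assumes that each pair of pixels contributes its samples in the order Y, Y, chroma, that each sample occupies one stream position, and that the line width is even; these details and the function name are illustrative assumptions rather than part of the description.

```python
from typing import Tuple


def group_stream_offsets(pair_index: int, width: int) -> Tuple[int, int]:
    """Stream positions of the (Y, Y, Cr) and (Y, Y, Cb) groups of one pixel group.

    pair_index counts pairs of horizontally adjacent pixels along a line;
    width is the number of pixels per line (assumed even).
    """
    samples_per_line = width + width // 2   # `width` luma samples plus one chroma per pair
    first_group = 3 * pair_index            # group on the upper line
    second_group = samples_per_line + first_group  # same columns, one line later
    return first_group, second_group


if __name__ == "__main__":
    first, second = group_stream_offsets(0, 8)
    # The gap equals one full line of samples.
    print(first, second, second - first)
```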
Data access efficiency can also be penalized where a single storage arrangement is used for displaying an image in both rotated and non-rotated orientations. As an example, the frame to be displayed often includes two or more distinct images, e.g., a main window and one or more sub-windows. It is desirable to be able to rotate one image while not rotating the other. It is also desirable to store the entire frame using a single storage arrangement. However, displaying a main window without rotation while displaying a sub-window in a rotated orientation is a situation where data access efficiency suffers when the image is stored using certain storage arrangements.
Accordingly, there is a need for storage arrangements for reduced color-resolution image data, and particularly for methods and apparatus for efficiently accessing reduced color-resolution image data in a memory.
The invention is directed, in one embodiment, to a method for generating memory addresses for accessing an image in which groups of pixels share chroma components. The method includes providing a memory, having a plurality of first portions and a plurality of second portions. In addition, the method includes generating first memory addresses, each of which corresponds to one of the first portions of the memory. Each such first address may be used as an address where the luma components of one of the pixel groups are stored. In addition, the method includes generating second memory addresses, each of which corresponds to one of the second portions of the memory. Each such second address may be used as an address where the chroma components of one of the pixel groups are stored.
In another embodiment, the invention is directed to a graphics display controller for use with the data of an image for which groups of pixels share chroma components as a result of a reduction in the color resolution of the image. The graphics display controller includes a memory, having a plurality of first portions and a plurality of second portions, and an address generator. The address generator is capable of generating first memory addresses, each of which corresponds to one of the first portions. Each such first address may be used as an address where the luma components of one of the pixel groups are stored. In addition, the address generator is capable of generating second memory addresses, each of which corresponds to one of the second portions. Each such second address may be used as an address where the chroma components of one of the pixel groups are stored.
In another embodiment, the invention is directed to a device for use with the data of an image for which groups of pixels share chroma components as a result of a reduction in the color resolution of the image. Preferably, the device includes an image data source, a display device, a memory, and an address generator. The memory has a plurality of first portions and a plurality of second portions. The address generator is capable of generating first memory addresses, each of which corresponds to one of the first portions. Each such first address may be used as an address where the luma components of one of the pixel groups are stored. In addition, the address generator is capable of generating second memory addresses, each of which corresponds to one of the second portions. Each such second address may be used as an address where the chroma components of one of the pixel groups are stored.
The objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.
a illustrates a group of pixels.
b illustrates an exemplary image having groups of pixels.
a illustrates first and second exemplary memories.
b illustrates a third exemplary memory.
The present invention is directed generally to storage arrangements for reduced color-resolution image data in a memory, and particularly to methods and apparatus for efficiently accessing reduced color-resolution image data. Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
In the YCbCr color space, luma (Y) refers to a quantity representative of luminance. The term “color-difference” is used herein as having the same meaning as “chroma.” In the art, the term “YUV” is sometimes used to refer to the YCbCr color space. In this specification, the term YUV is used to refer to the YCbCr color space, e.g., Y=Y, U=Cr, and V=Cb. In addition, the terms “sample” and “component” are used herein with respect to a pixel as having the same meaning.
Chroma subsampling is usually expressed in terms of the number of each of three types of components in a sample area, i.e., #:#:#. The sample area is often four pixels and common sample formats include: 4:2:2, 4:2:0, and 4:1:1. In the 4:2:0 format, one color-difference component has half the sample rate of the luma components, and alternate color-difference components are sampled in alternate lines. Preferably, the inventions disclosed herein are for use with reduced color-resolution image data in the 4:2:0 format.
a illustrates a 2×2 group (or “tile”) 20 of pixels, specifically the pixels P0,0, P0,1, P1,0, and P1,1. (Herein, the first subscript denotes the row and the second the column.) Image data is generally presented in raster order for chroma subsampling. Thus, the pixels P0,0 and P0,1 would be presented first for sampling. Preferably, the U values of two adjacent pixels on a line are averaged to create a U sample, e.g., U0,0. Subsequently, after the entire line 0 has been presented, the pixels P1,0 and P1,1 would be presented for sampling. Preferably, the V values of two adjacent pixels on the next sequential line are averaged to create a V sample, e.g., V1,0. One of ordinary skill in the art will appreciate other ways in which the U and V samples may be created; however, it should be noted that the particular manner in which the U and V values are created is not critical.
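The preferred sampling of a 2×2 tile described above may be sketched as follows. The representation of a pixel as a (Y, U, V) tuple, the round-to-nearest integer average, and the function name are assumptions made for illustration; as noted, the particular manner of creating the U and V samples is not critical.

```python
def subsample_tile(tile):
    """Reduce a 2x2 tile of (Y, U, V) pixels to four Y samples, one U, and one V.

    tile is [[P00, P01], [P10, P11]]; the U sample is taken from line 0 and
    the V sample from line 1, as in the preferred sampling.
    """
    (p00, p01), (p10, p11) = tile
    luma = [p00[0], p01[0], p10[0], p11[0]]   # every pixel keeps its Y
    u_sample = (p00[1] + p01[1] + 1) // 2     # average U of the two pixels on line 0
    v_sample = (p10[2] + p11[2] + 1) // 2     # average V of the two pixels on line 1
    return luma, u_sample, v_sample


if __name__ == "__main__":
    tile = [[(10, 100, 200), (20, 110, 210)],
            [(30, 120, 220), (40, 130, 230)]]
    print(subsample_tile(tile))
```

The result holds six one-byte samples for four pixels, which is the 4:2:0 density assumed elsewhere in this description.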
In
b shows an exemplary image 26. The image 26 comprises four rows and eight columns of pixels. Typical images are much larger, but the image 26 is useful in this description for illustrating the principles of the invention. The image 26 includes the pixel tile 20.
The embodiments of the invention may be used with memories having a variety of different row and column configurations. Preferably, the embodiments are for use with a row and column configuration according to the memory 30 described below. In alternative embodiments, the inventions disclosed herein may be used with row and column configurations according to the memories 36 and 40 described below, as well as with other memory configurations.
a illustrates the first exemplary memory 30 and the second exemplary memory 36.
Each of the memories 30, 36, and 40 is notionally organized into a plurality of rows and columns. The memory 30 includes two “banks” 32, 34. Bank 32 includes a first set of rows, while bank 34 includes a second set of rows. In the memory 30, the rows of bank 32 are designated or allocated as first “portions” while the rows of bank 34 are designated or allocated as second “portions.” The memories 36 and 40 each comprise a single bank, 38 and 42, respectively. With respect to the memory 36, the bytes 0-3 of each row are allocated as first “portions” and the bytes 4-7 of each row are designated as second “portions.” In the memory 40, some of the rows are allocated as first portions while others are allocated or designated as second portions, as shown.
In order to access data in a random access memory, such as the memories 30, 36, and 40, the first step a memory controller needs to perform is to specify a row address. Once the memory controller has selected a particular row, it can select one or more columns in the row to access. To access data in another row, the memory controller needs to first repeat the step of specifying a row address.
In the memory 30, the first and second banks, or RAM cells, may or may not reside in the same semiconductor, but generally have the property that each bank can be accessed simultaneously. In other words, the data stored in any first portion and the data stored in any second portion may be fetched at the same time. Likewise, data may be stored in the two portions simultaneously.
In contrast, data stored in particular first and second portions of the memory 36 cannot, generally speaking, be accessed at the same time, as only one row in the bank 38 can be accessed at any point in time. (Only when the desired first and second portions reside in the same memory row is it possible to access first and second portions simultaneously.) In addition, data stored in particular first and second portions of the memory 40 cannot be accessed at the same time. Like memory 36, only one row in the memory 40 can be accessed at any point in time.
The number of bits that can be stored in a memory row is referred to as the “memory width.” Each bank of a memory is capable of storing one or more bits. In a preferred embodiment, the above-described “portions” are capable of storing 4 bytes. In an alternative embodiment, the portions are capable of storing 8 bytes. Thus, the width of each of banks 32 and 34 is 4 bytes, the width of bank 38 is 8 bytes, and the width of bank 42 is 4 bytes. In other embodiments of the invention, alternative memory widths may be employed.
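The difference between the two-bank memory 30 and the single-bank memory 36 can be sketched by mapping portions to physical locations and counting the accesses needed to reach a first portion and a second portion. The sketch assumes that the i-th portion of either kind lives in row i, which the description does not state, and it omits the memory 40 because its row allocation is given only in the drawings. All names are hypothetical.

```python
from typing import NamedTuple


class Location(NamedTuple):
    bank: int        # reference numeral of the bank holding the portion
    row: int         # row address within that bank
    first_byte: int  # byte offset of the portion within the row


def locate(memory: int, kind: str, index: int) -> Location:
    """Map the index-th 'first' or 'second' portion to a physical location."""
    if memory == 30:   # two banks: bank 32 holds first portions, bank 34 second
        return Location(32 if kind == "first" else 34, index, 0)
    if memory == 36:   # one bank: bytes 0-3 are first portions, bytes 4-7 second
        return Location(38, index, 0 if kind == "first" else 4)
    raise ValueError("only memories 30 and 36 are modeled in this sketch")


def accesses_needed(a: Location, b: Location) -> int:
    """1 if both locations can be reached in the same access cycle, else 2."""
    if a.bank != b.bank:
        return 1                        # independent banks are accessed simultaneously
    return 1 if a.row == b.row else 2   # one bank opens only one row at a time


if __name__ == "__main__":
    print(accesses_needed(locate(30, "first", 5), locate(30, "second", 2)))
    print(accesses_needed(locate(36, "first", 5), locate(36, "second", 2)))
```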
In the Background above, it was said that the Cr and Cb components are not “local” to one another in the raster-ordered data stream presented for storage.
The metric “accesses per pixel” is used herein to quantify the degree of data access efficiency. If it is assumed that a 2×2 group of pixels in the YUV 4:2:0 format is defined by 6 bytes, each component being one byte, and the memory bus is four bytes wide, the best achievable efficiency for storing or fetching data is 0.375 accesses per pixel. For writing data, this efficiency would be achieved where an input stream of pixel components is separated into groups of four sequential samples, and each group is placed on the bus for storing in a single access. In the 4:2:0 format, six samples define four pixels, so each sample (“s”) effectively defines ⅔ of a pixel (“p”): (6s/4p = 1s/0.667p). In each access four samples are stored, and four samples are equivalent to 8/3 pixels (4 × 0.667p). Therefore, each access stores 8/3 pixels. Accordingly, when expressed in accesses (“a”) per pixel, the efficiency is 0.375 accesses per pixel: (1a/4s = 1a/(8/3)p = ⅜ a/p). Similarly, for fetching data, this efficiency is achieved where four pixel components are placed on the bus in every read access. When both storing and fetching data are taken into account, the best achievable data access efficiency is 0.75 accesses per pixel (0.375 + 0.375).
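The arithmetic above can be checked with exact fractions. The following sketch uses the stated figures (six one-byte samples per four pixels, four samples per access); the function name and parameter names are hypothetical.

```python
from fractions import Fraction


def best_case_accesses_per_pixel(samples_per_group: int = 6,
                                 pixels_per_group: int = 4,
                                 samples_per_access: int = 4) -> Fraction:
    """Best achievable accesses per pixel when every access carries a full bus."""
    pixels_per_sample = Fraction(pixels_per_group, samples_per_group)  # 2/3 pixel per sample
    pixels_per_access = samples_per_access * pixels_per_sample         # 8/3 pixels per access
    return 1 / pixels_per_access                                       # 3/8 access per pixel


if __name__ == "__main__":
    one_way = best_case_accesses_per_pixel()
    print(one_way, float(one_way))           # storing or fetching alone
    print(2 * one_way, float(2 * one_way))   # storing plus fetching
```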
Fetching the image data for raster order, non-rotated presentation requires 0.625 memory accesses per pixel. In addition, fetching the image data for raster order, 90 degree-rotated presentation requires 1.0 memory accesses per pixel. Thus, the total accesses per pixel for both storing and fetching are 1.0 and 1.375, respectively, for non-rotated and 90 degree-rotated presentations. Moreover, as mentioned above, only one row can be accessed at a time. So even though data access is not as inefficient as in other arrangements, the memory clock would need to be much faster than the clock needed with a storage arrangement that employs two independently accessible banks, such as the memory 30.
Writing raster-ordered image data in the storage arrangement of
Fetching image data arranged as shown in
With respect to the storage arrangement of
For comparison purposes,
If image data is rotated on storing instead of when fetching, the number of memory accesses per pixel for bank 34 can be improved to 0.5, which reduces total accesses per pixel for bank 34 to 0.75, reducing the total accesses per pixel to 2.0 for 90 degree-rotated presentations. The data access efficiency for rotating upon storing is summarized below:
It can be seen that a peak bandwidth condition of 1.25 accesses per pixel occurs in bank 32 if the image data is rotated by 90 degrees upon storing. (This peak bandwidth condition of 1.25 accesses per pixel also occurs in bank 32 if the image data is rotated by 270 degrees upon storing.)
The image that is stored in a memory and fetched for display is preferably a “frame.” As the term is used herein, “frame” generally refers to the set of pixels (pixmap) that fills a display screen. A frame may be comprised of one or more still or video images that may be displayed in one or more windows, sprites, or other overlays. Different windows may be used for displaying the output of simultaneously running applications. For example, a main window may display the user interface for the communication functions of a mobile telephone, while a sub-window simultaneously displays a television video stream.
It is desirable to be able to rotate one image within a frame while not rotating the other. It is also desirable to store the entire frame using a single storage arrangement. However, displaying a main window without rotation while displaying a sub-window in a rotated orientation is a situation where data access efficiency suffers when the image is stored using certain storage arrangements. For example, consider the storage arrangement depicted in
In a preferred embodiment, the memory addresses for the luma components of horizontally adjacent pixel groups correspond to sequential first portions of the memory. For example, the pixel groups T0, T1, T2, and T3 are horizontally adjacent. Likewise, the pixel groups T4, T5, T6, and T7 are horizontally adjacent (
Preferably, the memory addresses of the chroma components of horizontally adjacent pairs of pixel groups correspond to sequential second portions of the memory. For example, the pixel groups T0, T1, T2, and T3 are horizontally adjacent, and the pixel groups T0 and T1 form a horizontally adjacent pair, as do the pixel groups T2 and T3. Likewise, the pixel groups T4, T5, T6, and T7 are horizontally adjacent, and the pixel groups T4 and T5 form a horizontally adjacent pair, as do the pixel groups T6 and T7. Referring to
Storing image data according to the arrangement of
It can be seen that a peak bandwidth condition of 1.0 accesses per pixel occurs in bank 32 if the image data is stored for non-rotated fetching according to one embodiment of the invention.
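The preferred arrangement for non-rotated fetching, in which luma components of horizontally adjacent pixel groups occupy sequential first portions and chroma components of horizontally adjacent pairs occupy sequential second portions, might be sketched as follows. The sketch assumes 4-byte portions, pixel groups numbered T0, T1, ... in raster order with an even number of groups per row, and the second row of groups continuing directly after the first; the function name and the returned "half" index are hypothetical.

```python
def horizontal_addresses(tile_index: int):
    """Portion indices for pixel group T<tile_index>, stored for non-rotated fetching.

    Returns (first_portion, second_portion, half): the first portion holds the
    group's four Y samples; the second portion is shared by a horizontally
    adjacent pair, and half (0 or 1) selects this group's two chroma bytes.
    """
    first_portion = tile_index         # T0, T1, T2, ... map to sequential first portions
    second_portion = tile_index // 2   # (T0, T1) share portion 0, (T2, T3) portion 1, ...
    half = tile_index % 2
    return first_portion, second_portion, half


if __name__ == "__main__":
    for t in range(8):
        print("T%d" % t, horizontal_addresses(t))
```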
Now consider access efficiency if the data stored in the arrangement of
It can be seen that a peak bandwidth condition of 1.0 accesses per pixel occurs in bank 32 if the image data is stored for non-rotated fetching, but fetched for rotated presentation according to one embodiment of the invention.
In a preferred embodiment, the memory addresses for the luma components of vertically adjacent pixel groups correspond to sequential first portions of the memory. For example, the pixel groups T0 and T4 are vertically adjacent. Likewise, the pixel groups: T1 and T5; T2 and T6; T3 and T7 are vertically adjacent (
In a further preferred embodiment, the memory addresses of the chroma components of vertically adjacent pairs of pixel groups correspond to sequential second portions of the memory. For example, the pixel groups: T0 and T4; T1 and T5; T2 and T6; T3 and T7 are vertically adjacent pairs. Referring to
Storing image data according to the arrangement of
It can be seen that a peak bandwidth condition of 1.0 accesses per pixel occurs in bank 32 if the image data is stored for rotated fetching according to one embodiment of the invention.
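The arrangement for rotated fetching, in which luma components of vertically adjacent pixel groups occupy sequential first portions and chroma components of vertically adjacent pairs occupy sequential second portions, might be sketched in the same way. The sketch assumes 4-byte portions, the eight pixel groups T0 to T7 of the exemplary image numbered in raster order (four per row, two rows), and an even number of group rows; the function and parameter names are hypothetical.

```python
def vertical_addresses(tile_index: int, tiles_per_row: int = 4, tile_rows: int = 2):
    """Portion indices for pixel group T<tile_index>, stored for rotated fetching.

    Returns (first_portion, second_portion, half): groups in the same column
    of groups receive sequential first portions, and each vertically adjacent
    pair shares one second portion, with half (0 or 1) selecting the group's
    two chroma bytes.
    """
    row, col = divmod(tile_index, tiles_per_row)
    first_portion = col * tile_rows + row            # walk down each column of groups
    second_portion = col * (tile_rows // 2) + row // 2
    half = row % 2
    return first_portion, second_portion, half


if __name__ == "__main__":
    for t in (0, 4, 1, 5, 2, 6, 3, 7):
        print("T%d" % t, vertical_addresses(t))
```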
Now consider access efficiency if the data stored in the arrangement of
It can be seen that a peak bandwidth condition of 1.0 accesses per pixel occurs in banks 32 and 34 if the image data is stored for rotated fetching, but fetched for non-rotated presentation according to one embodiment of the invention.
Comparing storage arrangements of
In contrast, the storage arrangements according to the invention provide a significant advantage when used for both non-rotated and 90 degree-rotated presentations. There is no corresponding need to increase clock frequency to accommodate a bandwidth requirement peak. Thus, it will be appreciated that the storage arrangement of the present invention advantageously reduces clock speed and conserves power.
The host 54 is typically a microprocessor, but may be a digital signal processor, a computer, or any other type of controlling device adapted for controlling digital circuits. The host 54 communicates with the graphics controller 52 over a bus 60 that is coupled with a host interface 62 in the graphics controller.
The graphics controller 52 includes a display device interface 64 for interfacing between the graphics controller and the display device 56 over display device bus 66. LCDs are typically used as display devices in portable digital appliances, such as mobile telephones, but any devices capable of rendering pixel data in visually perceivable form may be employed. In a preferred embodiment, the display device 56 is an LCD that has a display area 56a. In another preferred embodiment, the display device 56 is a printer.
Preferably, the graphics display controller 52 is a separate integrated circuit from the remaining elements of the system, that is, the graphics controller is “remote” from the host, camera, and display device. The graphics controller 52 includes a camera interface 68 (“CAM I/F”) for receiving pixel data output on data lines of a bus 70 from the camera 58.
A number of image processing operations may be performed on data provided by an image data source, such as the host or the camera. Such image processing operations may be performed by units included in an image processing block indicated generally as 72. The image processing block 72 may include, for example, a CODEC for compressing and decompressing image data. In addition, the image processing block 72 preferably includes a unit for reintroducing missing color information into reduced color resolution image data, such as by interpolation or repetition of U and V components. Further, the image processing block 72 preferably includes a unit for converting the image data from the YUV color space to the RGB color space. Moreover, the image processing block 72 preferably includes a unit for scaling and cropping image data received from the host 54 and the camera 58.
The source of image data in the system 50 is preferably the camera 58; however, this is not essential. Image data may be provided by the host or any other image data source 44. Further, in alternative embodiments, the image data may be provided by a plurality of image data sources simultaneously or at different times.
In a preferred embodiment, the graphics controller 52 includes a memory 74 for storing frames of image data in a frame buffer 76. In other embodiments, however, the memory 74 may be remote from the graphics controller. Data are stored in and fetched from the memory 74 under control of a memory controller 78. The memory 74 is preferably an SRAM; however, any type of memory may be employed. Image data stored in the frame buffer 76 are fetched and transmitted through a display pipe 80. Reduced color-resolution image data are generally not suitable for display. Accordingly, display pipe 80 may include logic for reintroducing missing color information into the image, such as by interpolation or repetition. Image data are transmitted from the display pipe 80 through the display device interface 64 and output bus 66 to the display device 56.
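Reintroducing missing color information by repetition, one of the two options mentioned above, can be sketched for a single 2×2 pixel group. The sketch assumes the group is held as four Y samples plus one U and one V sample; the function name and data layout are hypothetical, and interpolation across neighboring groups is not shown.

```python
def upsample_tile_by_repetition(luma, u_sample, v_sample):
    """Rebuild four full (Y, U, V) pixels by repeating the shared chroma samples."""
    return [(y, u_sample, v_sample) for y in luma]


if __name__ == "__main__":
    # Four Y samples with one shared U and one shared V (illustrative values).
    print(upsample_tile_by_repetition([10, 20, 30, 40], 105, 225))
```

Because every pixel of the group needs both chroma samples, the luma and chroma data for a group are fetched together, which is why their placement in memory affects the number of accesses.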
In one preferred embodiment, the graphics controller includes a chroma subsampling unit 46. The chroma subsampling unit 46 preferably samples the data in the manner described above with reference to
One advantage of storing image data according to the invention is that only a single address counter 49 is needed, as opposed to alternative storage arrangements where separate counters are needed for luma and chroma samples.
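One way a single counter could yield both luma and chroma addresses is sketched below. The description does not say how the address counter 49 is used, so the class, its base-address parameters, and the shift by one bit (reflecting one 4-byte chroma portion per two pixel groups) are illustrative assumptions only.

```python
class SingleCounterAddressGenerator:
    """Derives luma and chroma portion addresses from one shared counter."""

    def __init__(self, luma_base: int = 0, chroma_base: int = 0) -> None:
        self.luma_base = luma_base      # hypothetical start of the first portions
        self.chroma_base = chroma_base  # hypothetical start of the second portions
        self.counter = 0

    def next_addresses(self):
        """Return (luma_address, chroma_address) for the next pixel group."""
        luma_address = self.luma_base + self.counter
        # Two consecutive pixel groups share one chroma portion.
        chroma_address = self.chroma_base + (self.counter >> 1)
        self.counter += 1
        return luma_address, chroma_address


if __name__ == "__main__":
    generator = SingleCounterAddressGenerator()
    print([generator.next_addresses() for _ in range(4)])
```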
While image data is preferably rotated upon storing, in an alternative embodiment image data may be rotated on the output side. In the case of fetching for display, the display pipe 80 employs an address generator 82 that provides memory addresses for luma and chroma components. In the case of fetching for further processing, the image processing logic 72 uses the address generator 82 in a similar manner.
The address generator 82 provides addresses such that the samples can be fetched from memory according to the invention. As mentioned above, an image stored for non-rotated presentation e.g.,
While the YCbCr color space (referred to herein as YUV) is preferred, the embodiments of the invention are not limited to use with image data of this type. Embodiments of the invention may be used with image data defined in any suitable color space, such as YUV, RGB, YIQ, CMYK, YPbPr, HSV, and HSL.
With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing.
Any of the operations described herein that form part of the invention are useful machine operations. As described above, the invention preferably relates to a device or an apparatus specially constructed for performing these operations. It should be appreciated, however, that the invention may be employed in a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose computer systems may be used with computer programs written in accordance with the teachings herein. Accordingly, it should be understood that the invention can also be embodied as computer readable code on a computer readable medium. In one embodiment, the invention is directed to computer readable code for generating memory addresses to be used for storing reduced color-resolution image data in a memory for display in rotated and non-rotated orientations in a manner that reduces peak bandwidth requirements.
A computer readable medium is any data storage device that can store data which can be thereafter read by a computer system. A computer readable medium includes an electromagnetic carrier wave in which the computer code is embodied. Examples of the computer readable medium include, among other things, floppy disks, memory cards, hard drives, RAMs, ROMs, EPROMs, compact disks, and magnetic tapes.
Although the invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive. Further, the terms and expressions that have been employed in the foregoing specification are used as terms of description and not of limitation, and are not intended to exclude equivalents of the features shown and described or portions of them. The scope of the invention is defined and limited only by the claims that follow.
This application claims priority from co-pending U.S. Provisional Patent Applications No. 60/711,098, filed Aug. 25, 2005, entitled “A Method for Storing YUV 4:2:0 Data To Simplify Image Rotation,” Attorney Docket No. VP218PR, and No. 60/710,765, filed Aug. 23, 2005, entitled “A Method To Save Bandwidth In Saving Or Retrieving Image When Stored in YUV 420,” Attorney Docket No. VP216PR, which are hereby incorporated by reference in their entirety.
Number | Date | Country
---|---|---
60710765 | Aug 2005 | US
60711098 | Aug 2005 | US