The following description relates to a system and method for encoding digital image data and more particularly to a hierarchical approach to coding of video data.
Video data streams consist of sequences of images and contain high data volumes. Such data streams may be encoded, e.g., compressed, to reduce channel capacity requirements. Scalable video coding is commonly used to produce compressed video in a multiple-layer format, which is useful for applications that must support devices of various resolutions. Each image layer in such a format is an independent bit stream containing the data needed to produce a particular resolution as an enhancement of the next lower resolution. In a system employing such a format, each device need only decode the layers up to the resolution it needs.
In related-art implementations, an image is split into pixel regions, each corresponding to a single pixel in the lowest resolution. For example, for H.265 (HEVC) these regions may be 32×32 blocks within the image. As a result, the encoder cannot start encoding until at least 31 lines are available to it, e.g., have been read from the input video stream into a buffer, which may be referred to as a line buffer. This places burdensome memory requirements on the hardware used to encode the video stream and increases its cost. Thus, there is a need for a system and method for encoding video with reduced buffer space requirements.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known in this country to a person of ordinary skill in the art.
An aspect of an exemplary embodiment of the present invention includes a hierarchical system and method of encoding and compressing image data, or video data including a sequence of images. In one embodiment, a line buffer is used to hold a line of an image, and as the second line of the image is read from the input data stream, 2×2 blocks of the image are transformed, e.g., by a Hadamard transform. Each transform results in one low-frequency component and three high-frequency components. The high-frequency components are encoded, e.g., using entropy coding, and sent to the output bit stream. The low-frequency components are pushed to the line buffer. This process continues until enough low-frequency components have been formed to complete a 2×2 block of low-frequency components, which is then transformed. The process may be repeated hierarchically for multiple layers.
According to an embodiment of the present invention there is provided a method for encoding an image, the method including: processing a plurality of rows of image elements of a first kind, one N by N block at a time, N being an integer greater than 1, with a transform, wherein the transform is configured to form a plurality of first high-frequency components and one first low-frequency component; for each N by N block, encoding the first high-frequency components and storing the first low-frequency component; processing the first low-frequency components, one M by M block at a time, M being an integer greater than 1, with the transform, wherein the transform is configured to form a plurality of second high-frequency components and one second low-frequency component; and for each M by M block, encoding the second high-frequency components and storing the second low-frequency component.
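For an image already held in memory, the method above can be sketched as repeated application of a block transform to successively smaller low-frequency planes. This is a minimal sketch, assuming N = M = 2, the one-half-scaled formulas given in the embodiments below, and image dimensions divisible by 2 to the power of the number of levels. The names `transform_2x2`, `encode_level`, and `encode_hierarchical` are hypothetical, and the high-frequency values are simply collected where a real encoder would entropy code them.

```python
from typing import List, Tuple

Plane = List[List[float]]
Highs = Tuple[float, float, float]


def transform_2x2(x1: float, x2: float, x3: float, x4: float) -> Tuple[float, Highs]:
    """One low-frequency and three high-frequency values (constant 1/2).
    x1, x2 are the upper row of the block, x3, x4 the lower row (assumed)."""
    low = 0.5 * (x1 + x2 + x3 + x4)
    highs = (
        0.5 * ((x1 + x2) - (x3 + x4)),
        0.5 * ((x1 + x3) - (x2 + x4)),
        0.5 * ((x1 + x4) - (x2 + x3)),
    )
    return low, highs


def encode_level(plane: Plane) -> Tuple[Plane, List[Highs]]:
    """Transform every 2x2 block; return the half-size low-frequency plane
    and the high-frequency triples in raster order of the blocks."""
    lows: Plane = []
    highs: List[Highs] = []
    for r in range(0, len(plane), 2):
        row: List[float] = []
        for c in range(0, len(plane[0]), 2):
            low, h = transform_2x2(plane[r][c], plane[r][c + 1],
                                   plane[r + 1][c], plane[r + 1][c + 1])
            row.append(low)
            highs.append(h)
        lows.append(row)
    return lows, highs


def encode_hierarchical(image: Plane, levels: int) -> Tuple[Plane, List[List[Highs]]]:
    """Each level's input is the previous level's low-frequency plane."""
    plane = image
    layers: List[List[Highs]] = []
    for _ in range(levels):
        plane, highs = encode_level(plane)
        layers.append(highs)  # stand-in for entropy coding these values
    return plane, layers
```

With this scaling, each low-frequency value after k levels equals 2^k times the mean of the 2^k × 2^k pixel region it covers, so the final low-frequency plane is a scaled, reduced-resolution version of the image.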
In one embodiment, each image element of the first kind is a pixel value.
In one embodiment, each image element of the first kind is a low-frequency component obtained by applying a transform to an array of pixel values.
In one embodiment, the transform is a Hadamard transform.
In one embodiment, the transform is a discrete cosine transform.
In one embodiment, the transform is a wavelet transform.
In one embodiment, M equals 2.
In one embodiment, N equals 2 and M equals 2.
In one embodiment, the encoding of the first high-frequency components includes encoding the first high-frequency components with an entropy coding.
In one embodiment, the encoding of the second high-frequency components includes encoding the second high-frequency components with an entropy coding.
In one embodiment, N equals 2, and each N by N block has four image elements, including: a first image element; a second image element; a third image element; and a fourth image element.
In one embodiment, the low-frequency component is a constant multiplied by the sum of the four image elements.
In one embodiment, the constant is one-half.
In one embodiment, a high-frequency component is a constant multiplied by the difference between: the sum of the first image element and the second image element; and the sum of the third image element and the fourth image element.
In one embodiment, the constant is one-half.
In one embodiment, a high-frequency component is a constant multiplied by the difference between: the sum of the first image element and the third image element; and the sum of the second image element and the fourth image element.
In one embodiment, the constant is one-half.
In one embodiment, a high-frequency component is a constant multiplied by the difference between: the sum of the first image element and the fourth image element; and the sum of the second image element and the third image element.
In one embodiment, the constant is one-half.
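The 2×2 embodiments above, each with a constant of one-half, together form a 4×4 matrix that is symmetric with orthogonal rows of unit length, so the same formulas also invert the transform. The sketch below checks this; `forward` and `inverse` are hypothetical names, and ordering the outputs as (low, first, second, third high-frequency component) follows the order in which the embodiments list them.

```python
from typing import Tuple

Quad = Tuple[float, float, float, float]


def forward(x: Quad) -> Quad:
    """(L, H1, H2, H3) from the first..fourth image elements, constant 1/2."""
    x1, x2, x3, x4 = x
    return (0.5 * (x1 + x2 + x3 + x4),
            0.5 * ((x1 + x2) - (x3 + x4)),
            0.5 * ((x1 + x3) - (x2 + x4)),
            0.5 * ((x1 + x4) - (x2 + x3)))


def inverse(y: Quad) -> Quad:
    """The transform matrix is symmetric and orthonormal, hence its own
    inverse: applying the same formulas to (L, H1, H2, H3) recovers x."""
    return forward(y)
```

Because the transform is orthonormal, it also preserves the sum of squares of the block, e.g., (0, 1, 4, 5) maps to (5, −4, −1, 0), both with a sum of squares of 42.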
According to an embodiment of the present invention there is provided a system for encoding an image, the system including a processing unit configured to: process a plurality of rows of image elements of a first kind, one N by N block at a time, N being an integer greater than 1, with a transform, wherein the transform is configured to form a plurality of first high-frequency components and one first low-frequency component; for each N by N block, encode the first high-frequency components and store the first low-frequency component; process the first low-frequency components, one M by M block at a time, M being an integer greater than 1, with the transform, wherein the transform is configured to form a plurality of second high-frequency components and one second low-frequency component; and for each M by M block, encode the second high-frequency components and store the second low-frequency component.
These and other features and advantages of the present invention will be appreciated and understood with reference to the specification, claims and appended drawings wherein:
The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary embodiments of a hierarchical image and video codec provided in accordance with the present invention and is not intended to represent the only forms in which the present invention may be constructed or utilized. The description sets forth the features of the present invention in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions and structures may be accomplished by different embodiments that are also intended to be encompassed within the spirit and scope of the invention. As denoted elsewhere herein, like element numbers are intended to indicate like elements or features. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of “may” when describing embodiments of the present invention refers to “one or more embodiments of the present invention”.
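Referring to the appended drawings, in one embodiment the first line of an image is read into the line buffer, and as soon as the first two pixels of the second line have been read, the 2×2 block formed by these two pixels and the two pixels above them is transformed, e.g., with a Hadamard transform. Denoting the first through fourth image elements of the block by $x_1$, $x_2$, $x_3$, and $x_4$, in one embodiment the low-frequency component 130 is given by

$$L = \tfrac{1}{2}\left(x_1 + x_2 + x_3 + x_4\right)$$

and the high-frequency components 135 are given by

$$H_1 = \tfrac{1}{2}\left[(x_1 + x_2) - (x_3 + x_4)\right],\quad H_2 = \tfrac{1}{2}\left[(x_1 + x_3) - (x_2 + x_4)\right],\quad H_3 = \tfrac{1}{2}\left[(x_1 + x_4) - (x_2 + x_3)\right]$$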
The high-frequency components 135 may be compressed, e.g., using entropy coding, forming layer 5 data, and placed into the output bit stream 138. The low-frequency component 130 may be pushed back into the line buffer.
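The description leaves the entropy code open. As one hypothetical choice, the sketch below quantizes each high-frequency component with a uniform step and writes a signed order-0 Exp-Golomb codeword, the variable-length code that H.264 uses for its se(v) syntax elements. The quantizer, the choice of code, and the names `to_unsigned`, `exp_golomb`, and `encode_highs` are assumptions, not the described method.

```python
from typing import Iterable


def to_unsigned(v: int) -> int:
    """Map signed values 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v


def exp_golomb(n: int) -> str:
    """Order-0 Exp-Golomb codeword for n >= 0, as a string of bits."""
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def encode_highs(highs: Iterable[float], step: float) -> str:
    """Quantize each high-frequency component with a uniform step (assumed),
    then append its signed Exp-Golomb codeword to the output bits."""
    out = []
    for h in highs:
        q = int(round(h / step))  # Python's round() sends ties to the even integer
        out.append(exp_golomb(to_unsigned(q)))
    return "".join(out)
```

For integer pixel input, every first-layer high-frequency component is an integer multiple of 0.5, so `step=0.5` represents those components exactly; larger steps trade accuracy for fewer bits. Exp-Golomb gives the shortest codewords to values near zero, where the high-frequency components of smooth image regions tend to cluster.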
Subsequent 2×2 image blocks are encoded until the first two lines of the image have been processed. The third image line may then be read into the line buffer, and processing of additional 2×2 blocks may begin as soon as the first two pixels of the fourth line have been read into the line buffer. Low-frequency components generated by transforming these 2×2 blocks may be sent directly to the layer 4 encoder. Once the first two 2×2 blocks within the 3rd and 4th image rows have been encoded, the line buffer contains the low-frequency components 140, 145 from the first two 2×2 blocks of the first two lines, and the layer 4 encoder stores the low-frequency components 150, 155 from the first two 2×2 blocks of the 3rd and 4th lines. The low-frequency components 140, 145 may then be read into the layer 4 encoder, so that it holds the low-frequency components of this 2×2 set of 2×2 blocks, which corresponds to a 4×4 region of pixels.
The layer 4 encoder then encodes this 2×2 set of low-frequency components, e.g., by processing them with a transform, to obtain a low-frequency component 160 and three high-frequency components 165. The high-frequency components 165 may be compressed and placed in the output bit stream 138, forming layer 4 data, and the low-frequency component 160 may be pushed to the line buffer. The process of transforming 2×2 blocks in the layer 5 encoder may then resume until another 2×2 set of low-frequency components has been generated, which is then transformed in the layer 4 encoder. The layer 3, layer 2, and layer 1 encoders operate similarly: each uses as input the low-frequency components produced by the preceding encoder, which may either store these components in the line buffer or deliver them directly to the next encoder in the chain.
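The line-by-line ordering described above can be sketched as a chain of per-layer encoders, each consuming its input one element at a time in raster order. This is a minimal sketch, assuming 2×2 blocks at every layer and the one-half-scaled transform. The class `Level` and the function `build_chain` are hypothetical names. Each layer keeps its own one-row buffer instead of sharing a single line buffer as in the description, and high-frequency triples are collected rather than entropy coded.

```python
from typing import List, Optional, Tuple

Highs = Tuple[float, float, float]


def transform_2x2(x1: float, x2: float, x3: float, x4: float) -> Tuple[float, Highs]:
    """Scaled 2x2 Hadamard (constant 1/2), block elements in raster order."""
    return (0.5 * (x1 + x2 + x3 + x4),
            (0.5 * ((x1 + x2) - (x3 + x4)),
             0.5 * ((x1 + x3) - (x2 + x4)),
             0.5 * ((x1 + x4) - (x2 + x3))))


class Level:
    """One layer encoder. It buffers the upper row of each row pair and
    transforms a 2x2 block as soon as the block's lower-right element arrives."""

    def __init__(self, width: int, next_level: Optional["Level"]) -> None:
        self.width = width
        self.next_level = next_level
        self.line_buffer = [0.0] * width  # upper row of the current row pair
        self.left = 0.0                   # lower-left element of the block in progress
        self.col = 0
        self.lower_row = False            # False: buffering, True: transforming
        self.highs: List[Highs] = []      # stand-in for entropy-coded output
        self.base: List[float] = []       # low-frequency output of the last level

    def push(self, value: float) -> None:
        c = self.col
        if not self.lower_row:
            self.line_buffer[c] = value
        elif c % 2 == 0:
            self.left = value
        else:
            low, highs = transform_2x2(self.line_buffer[c - 1], self.line_buffer[c],
                                       self.left, value)
            self.highs.append(highs)
            if self.next_level is not None:
                self.next_level.push(low)  # buffered or used at once by the next layer
            else:
                self.base.append(low)
        self.col += 1
        if self.col == self.width:
            self.col = 0
            self.lower_row = not self.lower_row


def build_chain(width: int, levels: int) -> List[Level]:
    """Level k sees a plane of width width >> k; width must divide by 2**levels."""
    chain: List[Level] = []
    nxt: Optional[Level] = None
    for k in reversed(range(levels)):
        nxt = Level(width >> k, nxt)
        chain.insert(0, nxt)
    return chain


# Usage: feed a 4x4 image row by row, as it would arrive from the input stream.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
chain = build_chain(width=4, levels=2)
for row in image:
    for pixel in row:
        chain[0].push(pixel)
# chain[0].highs holds the layer-5 triples, chain[1].highs the layer-4 triple,
# and chain[1].base the remaining low-frequency value (30.0 here).
```

Passing each low-frequency value straight to the next layer's `push` mirrors the description: the next layer either buffers the value (upper row of its own row pair) or transforms immediately (lower row).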
In these examples, the transform may be applied to pixels in an image block, such as one of the 2×2 blocks, or to an array of low-frequency components resulting from previously applied transforms, such as the four low-frequency components resulting from applying the transform to each of a 2×2 set of 2×2 blocks. In either case, the input to the transform is referred to herein as a set, or array, of image elements. Thus, as used herein, image elements may be pixels, or they may be low-frequency components obtained by applying a transform to pixels or to image elements.
In one embodiment, additional lines in the image are processed as follows. For even lines, the 2×2 block formed by the current two pixels and the two pixels above them in the line buffer is transformed into one low-frequency value and three high-frequency values. The three high-frequency values are then compressed into enhancement layer 5, and the low-frequency value is either pushed to the line buffer or, if the line number is an integer multiple of 4, sent to the layer 4 encoder. Similarly, the layer 4 encoder runs on every 4th line, and the layer 3, 2, and 1 encoders run on every 8th, 16th, and 32nd line, respectively. Because the line buffer needed for each level is no more than half that of the previous level, the total for an image W pixels wide is at most W + W/2 + W/4 + W/8 + W/16 < 2W, so a total buffer memory capable of storing two lines of pixels is sufficient.
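The two-line bound can be checked with a few lines of arithmetic. This sketch assumes, hypothetically, that each layer holds exactly one row of its own input plane and that every layer uses 2×2 blocks, so the row width halves from one layer to the next; `line_buffer_sizes` and `total_line_buffer` are illustrative names.

```python
from typing import List


def line_buffer_sizes(width: int, levels: int) -> List[int]:
    """Elements buffered per layer: one row of that layer's input plane,
    whose width halves at each 2x2 layer (assumed model)."""
    return [width >> k for k in range(levels)]


def total_line_buffer(width: int, levels: int) -> int:
    """Total elements across all layers; a geometric series below 2 * width."""
    return sum(line_buffer_sizes(width, levels))
```

For a 1920-pixel-wide image and five layers, the per-layer sizes are 1920, 960, 480, 240, and 120 elements, for a total of 3,720, which is below the 3,840 elements of two full lines.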
In other embodiments, N×N blocks of pixels (i.e., N by N blocks of pixels), where N may be greater than 2, are transformed by the first encoder. In this case, N−1 lines and the first N pixels of the Nth line are read into the buffer before the first encoding step is performed. In each case, the initial encoding step encodes an N×N block of the image, yielding one low-frequency component and N²−1 high-frequency components. The high-frequency components are compressed, e.g., using entropy coding, and the low-frequency component is initially pushed to the line buffer. A subsequent encoder in the chain may work with the same (N×N) block size or with blocks of a different size, e.g., M×M.
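For N a power of two, an N×N Hadamard block transform can be sketched with the Sylvester construction as Y = H X Hᵀ / N. The normalization by N is an assumption chosen so that the transform is orthonormal; with it, Y[0][0] is the block sum divided by N, and for N = 2 the entries reproduce the one-half-scaled 2×2 formulas, in a different arrangement. The names `hadamard` and `transform_nxn` are hypothetical, and the power-of-two restriction comes from the construction, not from the description.

```python
from typing import List


def hadamard(n: int) -> List[List[int]]:
    """Sylvester-construction Hadamard matrix of size n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def transform_nxn(block: List[List[float]]) -> List[List[float]]:
    """Y = H X H^T / N. Y[0][0] is the low-frequency component; the other
    N*N - 1 entries are the high-frequency components."""
    n = len(block)
    h = hadamard(n)
    hx = [[sum(h[i][k] * block[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(hx[i][k] * h[j][k] for k in range(n)) / n for j in range(n)]
            for i in range(n)]
```

For example, a constant 4×4 block of ones yields a low-frequency component of 4.0 and fifteen high-frequency components equal to zero.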
The transform used need not be a Hadamard transform; it may instead be a discrete cosine transform (DCT) or a wavelet transform. For a 2×2 block, the three transforms, i.e., the Hadamard transform, the DCT, and the wavelet transform, may differ only by a scaling factor. For a larger block, such as a 16×16 block, the linear combination of pixel values forming, for example, one of the high-frequency components of the Hadamard transform may weight the pixel values in proportions not found in any of the linear combinations of, e.g., a DCT. Thus, for larger blocks, the choice of transform may affect the results of the encoding.
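The scaling remark can be checked numerically. The sketch below builds the standard orthonormal 1-D DCT-II basis and tests whether each basis vector has entries of equal magnitude, a property every Hadamard basis vector has. Both 2-D block transforms are separable products of their 1-D bases, so the 1-D comparison carries over to N×N blocks. The names `dct_ii_matrix` and `equal_magnitudes` are hypothetical.

```python
import math
from typing import List


def dct_ii_matrix(n: int) -> List[List[float]]:
    """Orthonormal DCT-II: row k is the k-th basis vector of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def equal_magnitudes(row: List[float], tol: float = 1e-12) -> bool:
    """Hadamard basis vectors have entries of equal magnitude; a DCT basis
    vector can be a rescaled Hadamard vector only if this holds."""
    return all(abs(abs(v) - abs(row[0])) <= tol for v in row)
```

For n = 2, the DCT basis is exactly the Hadamard basis scaled by 1/√2 (the one-level orthonormal Haar wavelet basis is identical). For n = 4, the basis vectors with indices 1 and 3 mix the magnitudes cos(π/8) and cos(3π/8), so no rescaling of the Hadamard transform reproduces them.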
Implementations according to embodiments of the present invention use significantly less line buffer memory than, for example, a related-art approach that requires 31 full lines and 32 additional pixels to be read into a line buffer before encoding of the first 32×32 block begins. Moreover, this related-art approach must handle different block sizes at each level, whereas embodiments of the present invention use the same transformation and encoding engines for each level, so these engines can be shared. One embodiment of the present invention uses a 2×2 transform at each level or layer; this transform is particularly simple, requiring only additions, subtractions, and scaling by one-half.
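As a rough, hypothetical illustration, assume an image W = 1920 pixels wide and five 2×2 layers (layers 5 through 1), with each layer holding one row of its own input. Counting buffered elements only, and ignoring the fact that each layer's low-frequency values span about twice the range of its inputs and so need roughly one more bit, the two approaches compare as follows:

$$B_{\text{related}} = 31W + 32 = 31 \cdot 1920 + 32 = 59{,}552$$

$$B_{\text{hier}} = W\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16}\right) = \tfrac{31}{16}\,W = 3{,}720 < 2W = 3{,}840$$

This is roughly a sixteenfold reduction under these assumptions. The two counts measure slightly different things: the first is the data read before encoding can start, and the second is the steady-state storage across all layers.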
Elements of embodiments of the present invention may be implemented using one or more processing units. The term “processing unit” is used herein to include any combination of hardware, firmware, and software, employed to process data or digital signals. Processing unit hardware may include, for example, application specific integrated circuits (ASICs), general purpose or special purpose central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), and programmable logic devices such as field programmable gate arrays (FPGAs).
Although exemplary embodiments of the hierarchical image and video codec have been specifically described and illustrated herein, many modifications and variations will be apparent to those skilled in the art. For example, although examples described herein relate to embodiments involving video data, related embodiments of the present invention may be practiced with individual images. Accordingly, it is to be understood that the hierarchical image and video codec constructed according to principles of this invention may be embodied other than as specifically described herein. The invention is also defined in the following claims, and equivalents thereof.
The present application claims priority to and the benefit of Provisional Application No. 61/807,671, filed Apr. 2, 2013, entitled “HIERARCHICAL IMAGE AND VIDEO CODEC”, the entire content of which is incorporated herein by reference.
Number | Date | Country
---|---|---
61807671 | Apr 2013 | US