Embodiments of the present invention generally relate to data processing. More specifically, embodiments of the present invention relate to encoding (compressing) data such as video data.
The ability to quickly and efficiently process video streams has grown in importance, with portable consumer electronic products incorporating more and more multimedia features. Mobile phones, for example, can be used to retrieve, view and transmit multimedia content. However, while the capabilities of portable devices continue to increase, such devices are still somewhat limited relative to more powerful platforms such as personal computers. Data transmission and retrieval rates may also be a factor. The amount of image (e.g., video) data is usually more of a consideration than the amount of audio data.
The data is often encoded (compressed) to facilitate storage and streaming, and then decoded (decompressed) for playback (e.g., display). Video data may be compressed using a Moving Pictures Experts Group (MPEG) scheme, for example. By encoding a video sequence, the number of bits needed to represent the video sequence can be greatly reduced.
In a typical video sequence, the content of one frame, or at least a portion of that frame, may be very similar to that of another frame. This is commonly referred to as “temporal redundancy.” A compression technique commonly referred to as “motion compensation” is employed to exploit temporal redundancy. If content in a frame is closely related to that of another (reference) frame, it is possible to accurately represent, or predict, the content of the frame using the reference frame.
The frames are partitioned into blocks of pixels (e.g., a macroblock of 16×16 pixels). If a block has moved to a different position but is not otherwise transformed significantly relative to a corresponding block in the reference frame, its movement can be represented using a motion vector. For example, a motion vector of (3,4) can mean that the block has moved three pixels to the left and four pixels upward relative to the position of its corresponding block in the reference frame. Motion compensation refers to the application of a motion vector to a decoded (decompressed) block to construct a new block (or frame or image).
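By way of illustration only (this sketch is not part of the described embodiments), motion compensation with a full-pel motion vector amounts to copying a block of pixels from the reference frame at an offset position. The following C sketch assumes an 8-bit luma plane in row-major order and uses hypothetical names such as predict_block and stride.

```c
#include <stdint.h>

/* Form a 16x16 prediction block by copying pixels from the reference
 * frame at the position indicated by a full-pel motion vector.
 *   ref      reference luma plane, row-major, 'stride' bytes per row
 *   pred     output 16x16 prediction block
 *   bx, by   top-left corner of the current block within the frame
 *   mvx, mvy full-pel motion vector components
 * The caller is assumed to have clamped (bx+mvx, by+mvy) so the 16x16
 * read stays inside the reference plane. */
static void predict_block(const uint8_t *ref, int stride,
                          uint8_t pred[16][16],
                          int bx, int by, int mvx, int mvy)
{
    const uint8_t *src = ref + (by + mvy) * stride + (bx + mvx);
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            pred[y][x] = src[y * stride + x];
}
```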
Compression standards continue to evolve in order to achieve higher compression rates without compromising the quality of the reconstructed video. A more recent compression standard that is becoming widely used is H.264, also referred to as MPEG-4 Part 10 and known more formally as Advanced Video Coding (AVC). Earlier standards such as MPEG-4 (which is distinct from MPEG-4 Part 10) continue to be used.
The continued use of earlier and still acceptable standards, such as MPEG-4, and the introduction of newer or improved standards, such as H.264, can create a dilemma for manufacturers of consumer electronic devices. Devices designed for one compression scheme may not be able to implement a different compression scheme. This may be particularly true in devices in which encoding is accomplished in hardware. Accordingly, a system and/or method that can readily adapt aspects of one compression scheme (e.g., MPEG-4) to another one (e.g., H.264) would be advantageous. Embodiments in accordance with the present invention provide this and other advantages.
In one embodiment, an H.264 encoder is implemented by adding an H.264 interpolator, including a half pixel (half pel) filter and a data packer module, into an otherwise MPEG-4 encoder pipeline. In one embodiment, the H.264 interpolator is implemented in hardware. The MPEG-4 pipeline is used to compute motion vectors, and the H.264 interpolator is used for motion compensation. The data packer module arranges the output of the motion compensator in a manner suitable for use by a downstream media processor unit (e.g., a digital signal processor), which directs the execution of other encoding processes such as transformation, quantization, inverse transformation and inverse quantization.
The implementation of an H.264 interpolator in hardware in an otherwise MPEG-4 pipeline is accomplished without increasing the number of gates and may reduce power consumption. Such features are particularly beneficial in portable handheld electronic devices such as portable phones, personal digital assistants (PDAs), and handheld gaming devices.
These and other objects and advantages of the various embodiments of the present invention will be recognized by those of ordinary skill in the art after reading the following detailed description of the embodiments that are illustrated in the various drawing figures.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the various embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with these embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be understood that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.
Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “executing,” “receiving,” “accessing,” “computing,” “identifying,” “decoding,” “encoding,” “loading,” “applying,” “removing,” “shifting,” “storing,” “selecting,” “arranging,” “directing,” “generating,” “reconstructing,” “comparing,” “transforming,” “quantizing,” “fetching” or the like, refer to actions and processes of a computer system or similar electronic computing device or processor. The computer system or similar electronic computing device manipulates and transforms data represented as physical (electronic) quantities within the computer system memories, registers or other such information storage, transmission or display devices.
The descriptions and examples provided herein are discussed in the context of video data; however, the present invention is not so limited. The data may be multimedia data; for example, there may be audio data associated with the video data.
As shown in
According to embodiments of the present invention, an H.264 encoder is implemented by inserting an H.264 interpolator into an otherwise MPEG-4 pipeline. The MPEG-4 pipeline computes motion vectors that are used by the H.264 interpolator for motion compensation. In one embodiment, the H.264 interpolator is implemented on the MPU 12. The output of the H.264 interpolator is provided to, for example, a digital signal processor (DSP) 16 that, in one embodiment, is a part of the MPU 12. Additional information regarding the H.264 interpolator is provided in conjunction with
Significantly, according to embodiments of the present invention, process 20 shows both MPEG-4 and H.264 processing blocks. As mentioned above, elements of an H.264 encoder are inserted into an MPEG-4 pipeline. In overview, the MPEG-4 pipeline is used to generate motion vectors that are in turn used by either an MPEG-4 encoder or an H.264 encoder. That is, downstream of the motion estimation block 206, the motion vectors generated according to MPEG-4 may be used in either an MPEG-4 encoding process or in an H.264 encoding process.
In the example of
In the present embodiment, an input (current) frame 202 is presented for encoding. The frame is processed in units of a macroblock (e.g., a 16×16 block of pixels). A prediction macroblock will be formed based on one or more reference frames 204. The reference frame(s) 204 may include a previously encoded frame (the frame encoded immediately before the current frame 202, noting that the order of encoding may differ from the order in which the frames are displayed) and are selected from frames that have already been encoded and reconstructed (e.g., reconstructed frame 216 for MPEG-4 mode or reconstructed frame 224 for H.264 mode). In the motion estimation block 206, motion vectors 208 are derived by comparing the input frame 202 and the reference frame(s) 204.
In an MPEG-4 implementation, the motion vectors 208 are used in the motion compensation block 210 to form the prediction macroblock. The prediction macroblock is subtracted from the current macroblock (input frame 202) to produce a residual that is transformed (e.g., discrete cosine transformed) and quantized into a set of coefficients (block 212). The coefficients can be entropy encoded (e.g., Huffman encoded) and, along with other information used for decoding but not central to this discussion, formed into a compressed bitstream, which may be transmitted or stored. Also, the coefficients are decoded in order to reconstruct a frame that can be used to encode other frames. In block 214, the coefficients are inverse quantized and inverse transformed to produce another residual (different from the aforementioned residual), which is added to the prediction macroblock to form a reconstructed macroblock and ultimately a reconstructed frame 216.
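The per-macroblock flow just described can be summarized in code. The C sketch below shows only the residual and reconstruction arithmetic; transform, quantize, inverse_quantize and inverse_transform are hypothetical placeholders standing in for the operations of blocks 212 and 214, not a specification of them.

```c
#include <stdint.h>

/* Hypothetical stage functions standing in for blocks 212 and 214. */
void transform(const int16_t residual[16][16], int16_t coeff[16][16]);
void quantize(int16_t coeff[16][16]);
void inverse_quantize(int16_t coeff[16][16]);
void inverse_transform(const int16_t coeff[16][16], int16_t residual[16][16]);

static uint8_t clip8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

/* One macroblock: form the residual, transform and quantize it (the
 * quantized coefficients would be entropy coded at this point), then
 * invert those steps and add the result back to the prediction to
 * build the reconstructed macroblock used as a future reference. */
static void encode_macroblock(const uint8_t cur[16][16],
                              const uint8_t pred[16][16],
                              int16_t coeff[16][16],
                              uint8_t recon[16][16])
{
    int16_t residual[16][16];

    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            residual[y][x] = (int16_t)(cur[y][x] - pred[y][x]);

    transform(residual, coeff);
    quantize(coeff);              /* entropy coding of coeff happens here */

    inverse_quantize(coeff);      /* in-place, for reconstruction only */
    inverse_transform(coeff, residual);
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            recon[y][x] = clip8(pred[y][x] + residual[y][x]);
}
```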
In an H.264 implementation, the motion vectors 208, derived according to an MPEG-4 encoding scheme, are used in the H.264 motion compensation block 218 to produce a prediction block; for half pel motion vectors, a six-tap filter (e.g., a six-tap finite impulse response filter) is applied to six adjacent pixels. The H.264 motion compensation block 218 is discussed further in conjunction with
Continuing with reference to
In the present embodiment, H.264 interpolator 32 also receives a signal 38 indicating the data type (e.g., current frame or reference frame).
The information output from the H.264 interpolator 32 includes out-of-band signals 36 and data 37. The data 37 includes a stream of motion-compensated pixel values corresponding to the pixels on the display screen 13 (
The out-of-band signals 36 of
In the following discussion of the H.264 interpolator 32 of
The data path through H.264 interpolator 32 includes a data pipeline and a control pipeline. The data and control pipelines contain bypass multiplexers that are used to forward data from the current pipe stage to a subsequent pipe stage; that is, data can jump past a pipe stage. The multiplexers that control the flow through the data and control pipelines are controlled by the same multiplexer select signals. Multiplexer bypass operations are controlled by the motion vectors 208 (
Continuing with reference to
In one embodiment, each plane of the reference frame 204 (
The H.264 interpolator block 32 of
The H.264 interpolator 32 reads data for the reference frame (e.g., reference frame 204 of
The H.264 interpolator 32 of
The data is processed using the luma row filters 404, 405, 406 and 407, the luma column filters 421, 422, 423 and 424, the chroma row filters 431, 432, 433 and 434, and the chroma column filters 441, 442, 443 and 444. With four filters in each stage, four pixels can be handled at a time. Each of the luma filters is a six-tap FIR filter that implements the kernel [1 −5 20 20 −5 1]/32 and produces an eight-bit output. The luma filters 404-407 and 421-424 each have six 14-bit inputs. Each respective pair of chroma filters 431-434 and 441-444 has two-by-five 14-bit inputs. That is, there are four chroma filters, each of which includes a row filter and a respective column filter. Thus, for example, one chroma filter includes chroma row filter 431 and chroma column filter 441, another such pair includes chroma row filter 432 and chroma column filter 442, and so on.
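By way of illustration, the [1 −5 20 20 −5 1]/32 kernel can be written as a small C function. The rounding offset and the clip to an eight-bit range used below are assumptions of this sketch (they follow common practice for this kernel, e.g., in H.264 half-pel interpolation); the embodiment's own rounding and clipping blocks are described below.

```c
#include <stdint.h>

/* Apply the six-tap kernel [1 -5 20 20 -5 1]/32 to six adjacent
 * samples p[0..5].  Adding 16 before the shift rounds to nearest;
 * the result is clipped to the eight-bit range.  Rounding and
 * clipping details are assumptions of this sketch. */
static uint8_t filter6(const int p[6])
{
    int v = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    v = (v + 16) >> 5;                 /* divide by 32 with rounding */
    if (v < 0)   v = 0;
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```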
Each luma and chroma operation is controlled by the motion vectors 208 (
The operations in the x-direction (e.g., row) and in the y-direction (e.g., column) can be decoupled, so that operations can first proceed on the rows and then on the columns. The luma filter operations are split into an x-component filter operation before the cache 411 and a y-component filter operation after the cache 411. If the x-component of the motion vector is zero, then the luma data bypasses (bypass 412) the luma row filters 404-407 using a bypass multiplexer (routing logic 410) instead of multiplying by zero. If the x-component of the motion vector is non-zero (e.g., one), then the data is filtered by the luma row filters 404-407. The results are selected by the routing logic 410 and placed in luma cache 414.
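A minimal sketch of this row-stage bypass decision is shown below, reusing the hypothetical filter6 function from the previous sketch. For simplicity the sketch clips each pass to eight bits, whereas the hardware keeps wider (e.g., 14-bit) intermediate values between the row and column stages; the flag mvx_half is an assumed stand-in for "the x-component selects a half-pel position".

```c
/* Row (x-direction) pass for one luma output sample.  'row' points at
 * the integer-position sample; when half-pel interpolation is needed
 * in x, the six-tap kernel is applied to the six horizontally adjacent
 * samples, otherwise the sample passes through unchanged (the bypass
 * multiplexer case).  filter6() is the kernel sketched earlier. */
static uint8_t luma_row_sample(const uint8_t *row, int mvx_half)
{
    if (!mvx_half)
        return row[0];                 /* bypass: no row filtering */

    const int taps[6] = { row[-2], row[-1], row[0],
                          row[1],  row[2],  row[3] };
    return filter6(taps);
}
```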
The cache 411 has six rows that are shared by the luma column filters 421-424, the chroma row filters 431-434, and the chroma column filters 441-444. The first two rows have five 14-bit pixel words and constitute chroma cache 412 coupled to the chroma row filters 431-434 and the chroma column filters 441-444. The next four rows have four 14-bit pixel words and constitute luma cache 414 coupled to the luma column filters 421-424.
Each row of the cache 411 is connected to the next row. At the start of a macroblock the data cache 411 is empty, but is filled as data is fetched and processed by the luma row filters 404-407. Results from the luma row filter operations are loaded into the first row of the cache 411. In each clock cycle in which the cache 411 is enabled, the data is shifted down into the next row of the cache 411. The data is fed to the luma column filters 421-424, the chroma row filters 431-434, and the chroma column filters 441-444.
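The row-shifting behavior of the cache can be pictured with the short sketch below. The uniform row width and 16-bit storage are simplifications (the actual rows hold 14-bit pixel words of two different widths, as described above), and the function name is hypothetical.

```c
#include <stdint.h>

#define CACHE_ROWS 6
#define CACHE_COLS 5   /* widest row; narrower rows ignore the extra words */

/* One enabled clock cycle of cache 411: shift every row down by one
 * and load the newest luma-row-filter results into row 0.  Per the
 * description above, the first two rows feed the chroma filters and
 * the remaining four rows feed the luma column filters. */
static void cache_shift(int16_t cache[CACHE_ROWS][CACHE_COLS],
                        const int16_t new_row[CACHE_COLS])
{
    for (int r = CACHE_ROWS - 1; r > 0; r--)
        for (int c = 0; c < CACHE_COLS; c++)
            cache[r][c] = cache[r - 1][c];
    for (int c = 0; c < CACHE_COLS; c++)
        cache[0][c] = new_row[c];
}
```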
The luma column filters 421-424 use the y-component of the motion vectors to filter the data. Downstream of the luma column filters 421-424 are luma clipping blocks labeled “C” in
The chroma filters 431-434 and 441-444 use the x-component of a motion vector to select and apply the first stage (e.g., a row filter) of the chroma filter operation, and then use the y-component of the motion vector to select and apply the second stage (e.g., a column filter) of the chroma filter operation. Each chroma filter implements the equation:
pixel-out=(a*E)+(b*F)+(c*G)+(d*H);
where a=(4−xfrac), b=xfrac, c=(4−xfrac) and d=xfrac for the first stage of the filter; and where a=(4−yfrac), b=(4−yfrac), c=yfrac and d=yfrac for the second stage of the filter. E, F, G and H represent the 4 input pixels, and xfrac and yfrac represent the lower 2 bits of the x-component and y-component, respectively, of the motion vector. The result of this relation is rounded up by adding 8, and the result is right-shifted 4 bits in the block labeled “A” in
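Taking the equation and coefficient definitions above at face value, one stage of the chroma filter and the subsequent rounding can be sketched in C as follows. The arrangement of the four input pixels E, F, G and H, the way the two stages are combined, and the final clip are assumptions of this sketch.

```c
#include <stdint.h>

/* One evaluation of pixel_out = a*E + b*F + c*G + d*H with the
 * stage-dependent coefficients described above.  'frac' is the lower
 * two bits of the motion-vector x-component (first stage) or
 * y-component (second stage). */
static int chroma_stage(int E, int F, int G, int H, int frac, int first_stage)
{
    int a, b, c, d;
    if (first_stage) {                 /* coefficients keyed to xfrac */
        a = 4 - frac; b = frac; c = 4 - frac; d = frac;
    } else {                           /* coefficients keyed to yfrac */
        a = 4 - frac; b = 4 - frac; c = frac; d = frac;
    }
    return a * E + b * F + c * G + d * H;
}

/* Rounding and 4-bit right shift performed in the block labeled "A";
 * the final clip to eight bits is an additional assumption. */
static uint8_t chroma_round(int v)
{
    v = (v + 8) >> 4;
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}
```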
With reference to
In block 81 of
In block 82, six-tap FIR filters are applied, using the motion vectors, to calculate half pel interpolated pixels according to a second encoding scheme that is different from the first encoding scheme. In one embodiment, the second encoding scheme is an H.264 encoding scheme.
In one embodiment, the x-component of a luma channel of the video data is operated on using a plurality of luma row six-tap FIR filters. A result from the plurality of luma row filters is loaded into a first row of a cache. A result already residing in the first row is shifted to a second row of the cache.
Furthermore, in one embodiment, the x-component of a chroma channel of the video data is operated on using a plurality of chroma row six-tap FIR filters, the y-component of the chroma channel is operated on using a plurality of chroma column six-tap FIR filters, and the y-component of the luma channel is operated on using a plurality of luma column six-tap FIR filters.
In block 83, a data packer interleaves blocks of video data for the input frame with blocks of video data for the reference frame.
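How the data packer might interleave the two kinds of blocks is sketched below; the block granularity, ordering and function name are assumptions, intended only to illustrate the idea of interleaving current-frame blocks with the corresponding reference-frame (motion-compensated) blocks into one stream for the downstream media processor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interleave n_blocks blocks of the current frame with the
 * corresponding reference-frame blocks: cur0, ref0, cur1, ref1, ...
 * Each block is block_bytes long; 'out' must hold 2*n_blocks blocks. */
static void pack_interleaved(uint8_t *out,
                             const uint8_t *cur, const uint8_t *ref,
                             size_t n_blocks, size_t block_bytes)
{
    for (size_t i = 0; i < n_blocks; i++) {
        memcpy(out, cur + i * block_bytes, block_bytes);
        out += block_bytes;
        memcpy(out, ref + i * block_bytes, block_bytes);
        out += block_bytes;
    }
}
```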
Embodiments of the present invention are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments.
This application claims priority to the copending provisional patent application Ser. No. 60/772,440, entitled “Adapting One Type of Encoder to Another Type of Encoder,” with filing date Feb. 10, 2006, assigned to the assignee of the present application, and hereby incorporated by reference in its entirety.