This disclosure relates to digital signal processing.
Conventional digital video encoding includes the compression of a source video using a compression algorithm. The Moving Picture Experts Group (MPEG)-1, MPEG-2, H.261, H.262, H.263 and H.264 video coding standards each describe a syntax of the bitstream that is generated following application of the compression algorithm to the source video. The data in video streams is often redundant in space and time, thus video streams may be compressed by removing the redundancies, and encoding only the differences. For example, a blue sky background across the top of a picture may persist for several frames of a video stream, while other objects move in the foreground. It would be redundant to encode the background of each frame since it remains the same. The moving foreground objects also have redundancies. For example, a jet airplane that moves across a frame may appear the same from frame to frame, with only its position changing. In this case, the jet airplane does not need to be encoded in every frame, only its change in position needs to be encoded.
The disclosure provides various embodiments of systems and methods for video coding. In one embodiment, a method includes receiving a digital video stream. The digital video stream includes multiple sequential independent frames. The method further includes storing a first frame of the digital video stream. The method also includes encoding a second frame of the digital video stream using motion compensation with the stored first frame as a reference.
The foregoing example methods, as well as other disclosed example methods, may be computer implementable. Moreover, some or all of these aspects may be further included in respective processes, computer implemented methods, and systems for video coding.
The processes, computer implemented methods, and systems may also include determining a motion-predicted picture using the stored first picture as a reference, subtracting the motion-predicted picture from the second picture to generate a difference, and encoding the difference. They may also include quantizing the first picture, de-quantizing the quantized first picture, creating a reconstructed first picture from the de-quantized first picture, and storing the reconstructed first picture. Encoding the second picture may include determining a first motion-predicted picture based on the stored reconstructed first picture as a reference; determining a second motion-predicted picture based on the stored first picture as a reference; determining a resulting motion-predicted picture using the first and the second motion-predicted pictures and criteria; subtracting the resulting motion-predicted picture from the second picture to generate a difference; and encoding the difference.
Determining the resulting motion-predicted picture may include selecting between the first and the second motion-predicted pictures using the criteria. Determining the resulting motion-predicted picture may also include blending the first and the second motion-predicted pictures using the criteria. The criteria may be based on a determined quantization noise and a determined mismatch noise. The criteria may also include selecting a signal based on the proximity of encoding an intra-coded picture refresh point. The criteria may also include determining whether a mismatch noise threshold is exceeded.
The picture may include multiple regions, and the criteria may be determined by the region of the plurality of regions that is being encoded. Each of the multiple regions may be a macroblock. The criteria may be based on the amount of motion associated with each of the multiple regions. The criteria may be based on the amount of quantization noise associated with each of the multiple regions. The criteria may be based on the spatial characteristics of each of the plurality of regions. The first picture may be a first macroblock and the second picture may be a second macroblock.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
Like reference symbols in the various drawings indicate like elements.
Motion compensated inter-frame predictive (MC-prediction) coding is a tool for video compression that takes advantage of the time redundancy present in video streams. Referring now to
Video encoder 100 includes an encoding path and a feedback path. The encoding path includes a mixer 102, a discrete cosine transform (DCT) block 104, quantization block 106 and an entropy encoder block 108. The feedback path includes an inverse quantization block 110a, inverse DCT block 110b, mixer 112, delay element 116, and motion prediction block 114.
Input signal A is received at mixer 102 where an encoder motion prediction signal Fe is subtracted to produce an encoder residual signal B, which may be referred to as the error signal B. Signal B is provided as an input to the DCT block 104. The transformed residual signal is quantized in quantization block 106, producing a quantized encoder residual signal. This block introduces error, specifically quantization noise, between the encoder input signal and the encoder's coded and reconstructed input signal. The quantized encoder residual signal is provided as an input to the entropy encoder block 108, which in turn produces an encoded stream, C. The entropy encoder block 108 operates to code the motion compensated prediction residual (i.e., the error signal embodied in the encoder residual signal B) to further compress the signal and may also produce a bit stream that is compliant with a defined syntax (e.g., a coding standard). The encoded stream C can be provided as an input to a transmission source that in turn can transmit the encoded stream to a downstream device where it may be decoded and the underlying source video input reconstructed.
The feedback path includes a motion prediction block 114 that decides how best to create a version of the current frame of video data using pieces of the past frame of video data. More specifically, the quantized encoder residual signal is also provided as an input to an inverse quantization block 110a in the feedback path. The output of the inverse quantization block 110a is an inverse quantized transformed encoder residual signal. The inverse quantization block 110a seeks to reverse the quantization process to recover the transformed error signal. The output of the inverse quantization block 110a is provided as an input to the inverse DCT block 110b that in turn produces recovered encoder residual signal D. The inverse DCT block 110b seeks to reverse the transform process invoked by DCT block 104 so as to recover the error signal. The recovered error signal D is mixed with the output of the motion prediction block 114 (i.e., encoder motion prediction signal F) producing the reconstructed input video signal E. The reconstructed input video signal E is provided as an input to the delay element 116. The delay imparted by delay element 116 allows for the alignment of frames in the encoding path and feedback path (to facilitate the subtraction performed by mixer 102). The delayed reconstructed encoder signal is provided as a past frame input to motion prediction block 114.
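The encoding and feedback paths described above can be illustrated with a minimal sketch. It is not the encoder 100 itself: the DCT and inverse DCT are omitted (treated as an identity transform), frames are flat lists of sample values, and the function names `quantize`, `dequantize`, and `encode_frame` are hypothetical names chosen for illustration.

```python
def quantize(values, step):
    """Uniform scalar quantizer: map each value to an integer level."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Inverse quantization: map integer levels back to sample values."""
    return [q * step for q in levels]


def encode_frame(current, prediction, step):
    """One pass through the encoding path and the feedback path.

    Assumption: the transform stage is skipped, so quantization is
    applied directly to the residual samples.
    Returns (levels, reconstructed), where levels would go to the
    entropy coder and reconstructed plays the role of signal E.
    """
    residual = [a - f for a, f in zip(current, prediction)]   # signal B
    levels = quantize(residual, step)                         # quantized residual
    recovered = dequantize(levels, step)                      # signal D
    reconstructed = [f + d for f, d in zip(prediction, recovered)]  # signal E
    return levels, reconstructed


levels, reconstructed = encode_frame([10, 20, 30, 40], [9, 21, 33, 40], step=4)
```

In this sketch the reconstructed frame differs from the input by at most half a quantization step per sample, which is the quantization noise that the stored reference carries into later predictions.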
Motion prediction block 114 has two inputs: a reconstructed past-frame input (i.e., delayed reconstructed encoder signal) and a current-frame input (i.e., input video signal A). Motion prediction block 114 generates a version of the past-frame input that resembles as much as possible (i.e., predicts) the current-frame using a motion model that employs simple translations only. Conventionally, a current frame is divided into multiple two-dimensional blocks of pixels, and for each block, motion prediction block 114 finds a block of pixels in the past frame that matches as well as possible. The prediction blocks from the past frame need not be aligned to a same grid as the blocks in the current frame. Conventional motion estimation engines can also interpolate data between pixels in the past frame when finding a match for a current frame block (i.e., sub-pixel motion compensation). The suitably translated version of the past-frame input is provided as an output (i.e., encoder motion prediction signal F) of the motion prediction block 114.
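The block search described above can be sketched as an exhaustive integer-pixel search. This is an illustrative assumption rather than the method of motion prediction block 114: the sum of absolute differences (SAD) is used as the matching cost, sub-pixel interpolation is left out, and the names `sad` and `best_motion_vector` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size, search):
    """Try every translation within +/- search pixels and return
    (cost, dx, dy) for the lowest-cost match, or None if no candidate fits."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame with a simple ramp pattern; the current frame is the
# reference shifted by 2 pixels horizontally and 1 pixel vertically.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
match = best_motion_vector(cur, ref, bx=2, by=2, size=2, search=2)
```

The displaced reference block need not sit on the block grid of the current frame, which is why the search iterates over arbitrary pixel offsets.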
The prediction generated from a previously encoded reconstructed frame of video is subtracted from the input in a motion compensation operation (i.e., by mixer 102). Compression takes place because the information content of the residual signal (i.e., encoder residual signal B) typically is small when the prediction closely matches the input. The motion compensated prediction residual is transformed, quantized and then coded as discussed above to produce a bit stream.
The nonzero signal in the motion predicted residual (i.e., encoder residual signal B) originates from three primary sources: motion mismatch, quantization noise and aliasing distortion.
The motion prediction (i.e., encoder motion prediction signal F) is a piecewise approximation of the input. The motion prediction is generated assuming motion between frames is simple and translational. Motion mismatch is the difference between the assumed motion model and true motion between input and reference frames.
The motion prediction of encoder 100 includes quantization noise. More specifically, the motion prediction signal (i.e., encoder motion prediction signal F) contains quantization noise due to the motion prediction being performed on imperfectly encoded past video frames.
Aliasing distortion arises from the conventional interpolation filters (not shown) used in the motion prediction block 114 to generate sub-pixel precision motion predictions. The interpolation filters introduce aliasing distortion in the prediction when the prediction is extracted using sub-pixel motion vectors. The magnitude of this distortion component is dependent upon the spatial frequency content of the signal being interpolated and the stop band attenuation characteristics of the filter used to perform the interpolation.
Referring now to
An encoder residual signal B is produced by subtracting (at 312), from input signal A, an encoder motion estimation signal G, Fe, or some combination of the two. A conventional encoder uses Fe in the feedback loop, which in the case of a conventional encoder, matches the F of a conventional decoder (see
If, however, the encoder uses G instead of Fe, then a conventional decoder 336 (
The efficiency gains obtained from not coding the quantization noise in B may outweigh the cost of the mismatch noise generated by using G. For example, the Moving Picture Experts Group (MPEG)-4/AVC or H.264 encoding standard supports hierarchical group of pictures (GOP) structures, in which some pictures may be used as motion compensation references only for a few nearby pictures, which limits how far mismatch noise in those pictures can propagate.
An encoder 210 may also be configured to limit the accumulation of mismatch noise by adjusting its use of intra-coded pictures, motion-compensated (MC) predicted reference pictures, and MC predicted non-reference pictures. For example, the encoder may use signal G increasingly as the encoder nears an intra-coded picture refresh point. Doing so may lessen the impact of mismatch noise accumulation as compared to quantization noise, because there may be more accumulation of quantization noise from using Fe over several frames. An encoder 210 may estimate the quantization noise present in a coded picture from properties of the picture and from the quantization step sizes used to compress the picture.
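The two ideas in the preceding paragraph can be sketched as follows. Both functions rest on assumptions not stated in this specification: the noise estimate uses the common uniform-error model, in which a quantizer with step size s contributes a noise variance of about s²/12, and the ramp toward the refresh point is taken to be linear. The names `estimated_quantization_noise` and `g_weight` are hypothetical.

```python
def estimated_quantization_noise(step_sizes):
    """Mean noise variance across the given quantization step sizes,
    assuming quantization error is uniform within each step (step**2 / 12)."""
    return sum(s * s for s in step_sizes) / (12 * len(step_sizes))


def g_weight(frames_until_refresh, ramp_length):
    """Fraction of signal G to use in the prediction.

    Assumption: the weight rises linearly from 0 (ramp_length or more
    frames before the intra-coded refresh point) to 1 (at the refresh point).
    """
    if ramp_length <= 0:
        raise ValueError("ramp_length must be positive")
    remaining = min(max(frames_until_refresh, 0), ramp_length)
    return 1.0 - remaining / ramp_length
```

For instance, with a ramp of four frames, `g_weight(2, 4)` yields an even mix of the two signals, and the weight reaches 1.0 on the frame immediately before the refresh.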
In another example, a video encoder 210 may use G for several pictures and, before reaching a mismatch noise threshold, may use Fe for one or more frames to limit error propagation, even without an intra-coded picture refresh. The encoder 210 may accomplish this by reducing the quantization step size enough to eliminate the mismatch noise propagation. The mismatch noise may be measured by calculating Fe and G for each coded picture and calculating the difference between Fe and G.
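A minimal sketch of this threshold-based switching follows. It assumes that mismatch noise is measured as the mean squared difference between Fe and G, that it accumulates additively from picture to picture while G is in use, and that the running total resets once Fe is used. None of these choices is specified above, and the names `mean_squared_difference` and `choose_reference` are hypothetical.

```python
def mean_squared_difference(fe, g):
    """Mismatch measure: mean squared difference between Fe and G."""
    return sum((a - b) ** 2 for a, b in zip(fe, g)) / len(fe)


def choose_reference(fe, g, accumulated, threshold):
    """Pick the prediction signal for the next picture.

    Returns (name, new_accumulated). G is used while the accumulated
    mismatch stays at or below the threshold; otherwise Fe is used and
    the running total is reset (an assumption of this sketch).
    """
    mismatch = mean_squared_difference(fe, g)
    if accumulated + mismatch > threshold:
        return "Fe", 0.0
    return "G", accumulated + mismatch


choice, running = choose_reference([0, 0], [1, 1], accumulated=0.0, threshold=2.5)
```

Calling `choose_reference` once per coded picture and feeding the returned total back in gives a simple way to switch to Fe before the threshold is crossed.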
In yet another example, the video encoder 210 may use a blend of signals Fe and G, using for example, the following equation:
α*Fe+(1−α)*G (Equation 1).
The encoder may set a constant α or adaptively determine α. For example, the encoder 210 may measure the mismatch noise resulting from the difference between Fe and G calculated for a given coded picture or series of pictures. The encoder 210 may use the measured mismatch noise along with any measured quantization noise to determine α, such as varying α so as to minimize the total of mismatch and quantization noise. The encoder 210 may set α to a constant 0.5 in order to use an equal blend of the signals Fe and G.
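Equation 1 and one possible adaptive rule for α can be sketched as follows. The rule is an assumption made for illustration: if the quantization noise carried by Fe and the mismatch noise caused by G are treated as independent, the blend contributes roughly α²·Q + (1−α)²·M of total noise, which is smallest at α = M/(Q+M). The names `blend` and `alpha_min_total_noise` are hypothetical.

```python
def blend(fe, g, alpha):
    """Equation 1 applied sample by sample: alpha*Fe + (1 - alpha)*G."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(fe, g)]


def alpha_min_total_noise(quant_noise, mismatch_noise):
    """Alpha minimising alpha**2 * Q + (1 - alpha)**2 * M.

    Assumption: the two noise sources are independent and weighted
    equally. Falls back to an equal blend when both estimates are zero.
    """
    total = quant_noise + mismatch_noise
    if total == 0:
        return 0.5
    return mismatch_noise / total


alpha = alpha_min_total_noise(quant_noise=1.0, mismatch_noise=3.0)
prediction = blend([10.0, 20.0], [14.0, 16.0], alpha)
```

Under this rule, larger measured mismatch noise pushes α toward 1 (more Fe), and larger quantization noise pushes it toward 0 (more G).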
The video encoder 210 may determine and apply different combinations of Fe and G to the different sub-regions of a picture, such as picture macroblocks, rather than using a single combination for all the sub-regions of a picture. This determination may be based on a variety of factors. For example, the encoder 210 may make the determination based on the amount of motion in a particular sub-region. As another example, the encoder 210 may determine that one signal or combination is appropriate based on a sub-region's spatial characteristics, such as the amount of detail in the sub-region or the sub-region's overall brightness. In yet another example, the encoder 210 may make the determination based on the amount of quantization noise in a sub-region.
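Applying a different combination to each sub-region can be sketched by giving every block its own blend weight. The sketch assumes square blocks, pictures stored as two-dimensional lists, and a precomputed table of per-block α values. How those values are derived from motion, spatial detail, or quantization noise is left open here, and the name `blend_by_region` is hypothetical.

```python
def blend_by_region(fe, g, alphas, block):
    """Blend Fe and G with a separate alpha for each block-by-block region.

    fe, g:   2D lists of equal size holding the two predictions.
    alphas:  2D list with one alpha per region, indexed [row][column].
    block:   region width and height in samples.
    """
    out = []
    for y, (row_fe, row_g) in enumerate(zip(fe, g)):
        out_row = []
        for x, (a, b) in enumerate(zip(row_fe, row_g)):
            alpha = alphas[y // block][x // block]
            out_row.append(alpha * a + (1.0 - alpha) * b)
        out.append(out_row)
    return out


fe = [[8.0] * 4 for _ in range(2)]
g = [[4.0] * 4 for _ in range(2)]
# Left region uses Fe only, right region uses G only.
mixed = blend_by_region(fe, g, alphas=[[1.0, 0.0]], block=2)
```

Setting every entry of `alphas` to 0 or 1 reduces the blend to a per-region selection between the two signals.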
The encoder 210 may combine various aspects of the above examples in certain embodiments. For example, the encoder may adaptively control α in Equation 1 based on the measured quantization noise, mismatch noise, and spatial detail, for a particular sub-region or for an entire picture. The encoder may choose between Fe or G based on the brightness and the amount of motion in the entire picture. Many other combinations may be implemented.
At step 508, another video frame may be received. At step 510, MC-prediction coding may be performed on this video using the copy of the original reference frame to determine the motion-compensated reference picture that corresponds to signal G of
At step 516, the motion compensation is applied based on the blend determined in step 514. At step 518, a determination is made as to whether there are more frames that need to be encoded. If not, the process ends. Otherwise, at step 520, a determination is made whether to encode another reference frame. If a reference frame is to be encoded, step 502 is executed. Otherwise, step 508 is executed. Process 500 ends when there are no more frames to be encoded.
The terms picture and frame are meant to refer generically to subdivisions of a video that may not correspond directly to their use in a given video coding standard. Likewise, the terms I-, P-, and B-pictures/frames are meant to refer generically regardless of the actual coding standard used. For example, an I-picture/frame refers to an intra-coded picture, that is, one that is encoded without using another frame or picture. P- and B-pictures or frames are encoded using other frames or pictures, which may be previous frames, future frames, or both. Those familiar with the art will recognize that terms differ from standard to standard while the overall concepts remain similar. As such, the terms used in this specification should be construed generically and not as pertaining to a particular standard, unless explicitly stated as so pertaining.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver, to name just a few. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
While this specification contains many specifics, these should not be construed as limitations on the scope of the invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the invention. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As another example, an adaptive encoder 210 (in
Number | Date | Country | |
---|---|---|---|
20090080525 A1 | Mar 2009 | US |