The present invention relates generally to video coding and, specifically, to video coding using motion compensated temporal filtering.
For storage and broadcasting purposes, digital video is compressed so that the resulting compressed video can be stored in a smaller space.
Digital video sequences, like ordinary motion pictures recorded on film, comprise a sequence of still images, and the illusion of motion is created by displaying the images one after the other at a relatively fast frame rate, typically 15 to 30 frames per second. A common way of compressing digital video is to exploit the redundancy between these sequential images (i.e. temporal redundancy). In a typical video, at any given moment there is slow or no camera movement combined with some moving objects, so consecutive images have similar content. It is therefore advantageous to transmit only the difference between consecutive images. The difference frame, called the prediction error frame En, is the difference between the current frame In and the reference frame Pn. The prediction error frame is thus given by
En(x,y)=In(x,y)−Pn(x,y),
where n is the frame number and (x, y) represents pixel coordinates. The prediction error frame is also called the prediction residue frame. In a typical video codec, the difference frame is compressed before transmission. Compression is achieved by means of the Discrete Cosine Transform (DCT) and Huffman coding, or similar methods.
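As a minimal sketch of this residue computation and its inverse at the decoder (the frame values and helper names are hypothetical, chosen only for illustration):

```python
def prediction_error(current, reference):
    """En(x, y) = In(x, y) - Pn(x, y), computed pixel by pixel."""
    return [[c - r for c, r in zip(crow, rrow)]
            for crow, rrow in zip(current, reference)]

def reconstruct(residue, reference):
    """Inverse step used by the decoder: In = Pn + En."""
    return [[e + r for e, r in zip(erow, rrow)]
            for erow, rrow in zip(residue, reference)]

I_n = [[10, 12], [14, 16]]   # current frame (hypothetical 2x2 values)
P_n = [[10, 11], [13, 13]]   # reference frame
E_n = prediction_error(I_n, P_n)
```

Because consecutive frames are similar, the residue values cluster near zero, which is what makes the subsequent DCT and entropy coding effective.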
Since video to be compressed contains motion, subtracting two consecutive images does not always result in the smallest difference. For example, when the camera is panning, the whole scene is changing. To compensate for the motion, a displacement (Δx(x, y), Δy(x, y)), called a motion vector, is added to the coordinates of the previous frame. Thus the prediction error becomes
En(x,y)=In(x,y)−Pn(x+Δx(x, y),y+Δy(x, y)).
In practice, the frame in the video codec is divided into blocks and only one motion vector for each block is transmitted, so that the same motion vector is used for all the pixels within one block. The process of finding the best motion vector for each block in a frame is called motion estimation. Once the motion vectors are available, the process of calculating Pn(x+Δx(x, y),y+Δy(x, y)) is called motion compensation and the calculated item Pn(x+Δx(x, y),y+Δy(x, y)) is called motion compensated prediction.
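A full-search motion estimation over integer displacements, as described above, can be sketched as follows; the block size, search range, and the sum-of-absolute-differences (SAD) cost are illustrative choices, not mandated by the text:

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def best_motion_vector(cur, ref, bx, by, bs, search):
    """Full-search motion estimation for the bs x bs block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    cur_blk = [row[bx:bx + bs] for row in cur[by:by + bs]]
    best, best_cost = (0, 0), float('inf')
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if not (0 <= x0 <= w - bs and 0 <= y0 <= h - bs):
                continue  # candidate block would fall outside the frame
            cand = [row[x0:x0 + bs] for row in ref[y0:y0 + bs]]
            cost = sad(cur_blk, cand)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best

ref = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
cur = [[1, 2, 3, 4], [5, 7, 8, 8], [9, 11, 12, 12], [13, 14, 15, 16]]
mv = best_motion_vector(cur, ref, 1, 1, 2, 1)
```

Once `mv` is found, motion compensation simply reads the reference block displaced by `mv`, yielding the motion compensated prediction Pn(x+Δx, y+Δy).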
In the coding mechanism described above, reference frame Pn can be one of the previously coded frames. In this case, Pn is known at both the encoder and decoder. Such coding architecture is referred to as closed-loop.
Pn can also be one of original frames. In that case the coding architecture is called open-loop. Since the original frame is only available at the encoder but not the decoder, there may be drift in the prediction process with the open-loop structure. Drift refers to the mismatch (or difference) of prediction Pn(x+Δx(x, y), y+Δy(x, y)) between the encoder and the decoder due to different frames used as reference. Nevertheless, open-loop structure becomes more and more often used in video coding, especially in scalable video coding due to the fact that open loop structure makes it possible to obtain a temporally scalable representation of video by using lifting-steps to implement motion compensated temporal filtering (i.e. MCTF).
FIGS. 1a and 1b show the basic structure of MCTF using lifting steps, illustrating both the decomposition and the composition process. In these figures, In and In+1 are original neighboring frames.
The lifting consists of two steps: a prediction step and an update step, denoted as P and U respectively in FIGS. 1a and 1b:
H=In+1−P(In)
L=In+U(H)
The prediction step P can be considered as the motion compensation; the output of P, i.e. P(In), is the motion compensated prediction.
In the composition process shown in FIG. 1b, the original frames are reconstructed as
I′n=L−U(H)
I′n+1=H+P(I′n)
If signals L and H remain unchanged between the decomposition and composition processes, the reconstruction is lossless regardless of the choice of P and U: I′n = L − U(H) = In + U(H) − U(H) = In, and I′n+1 = H + P(I′n) = In+1 − P(In) + P(In) = In+1.
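Under the simplest possible choices, assumed here purely for illustration (P as a zero-motion copy, U as a halving of H), the lifting round trip defined by the four equations above can be verified numerically:

```python
def P(frame):
    """Prediction step: zero-motion assumption, so P(In) = In."""
    return frame

def U(h):
    """Update step: half the high-pass signal (a Haar-like choice)."""
    return [[v // 2 for v in row] for row in h]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def decompose(I0, I1):
    H = sub(I1, P(I0))   # H = In+1 - P(In)
    L = add(I0, U(H))    # L = In + U(H)
    return L, H

def compose(L, H):
    I0 = sub(L, U(H))    # I'n   = L - U(H)
    I1 = add(H, P(I0))   # I'n+1 = H + P(I'n)
    return I0, I1

I0 = [[8, 9], [10, 11]]
I1 = [[9, 9], [12, 10]]
L, H = decompose(I0, I1)
R0, R1 = compose(L, H)   # reconstructed frames
```

Note that reconstruction is exact even though U uses lossy integer halving, because the composition subtracts the very same U(H) that the decomposition added.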
The structure shown in
In MCTF, the prediction step is essentially a general motion compensation process, except that it is based on an open-loop structure. In such a process, a compensated prediction for the current frame is produced based on best-estimated motion vectors for each macroblock. Because motion vectors usually have sub-pixel precision, sub-pixel interpolation is needed in motion compensation. Motion vectors can have a precision of ¼ pixel. In this case, possible positions for pixel interpolation are shown in
Typically, values at half-pixel positions are obtained by using a 6-tap filter with impulse response (1/32, −5/32, 20/32, 20/32, −5/32, 1/32). The filter operates on integer pixel values, along both the horizontal direction and the vertical direction where appropriate. For decoder simplification, the 6-tap filter is generally not used to interpolate quarter-pixel values. Instead, the quarter-pixel positions are obtained by averaging an integer position and its adjacent half-pixel positions, or by averaging two adjacent half-pixel positions, as follows:
b=(A+c)/2, d=(c+E)/2, f=(A+k)/2, g=(c+k)/2, h=(c+m)/2, i=(c+o)/2, j=(E+o)/2, l=(k+m)/2, n=(m+o)/2, p=(U+k)/2, q=(k+w)/2, r=(m+w)/2, s=(w+o)/2, t=(Y+o)/2, v=(w+U)/2, x=(Y+w)/2.
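A one-dimensional sketch of this two-stage interpolation, the 6-tap half-pel filter followed by quarter-pel averaging, might look as follows; the border clamping and rounding conventions are assumptions, not specified in the text:

```python
HALF_TAPS = (1, -5, 20, 20, -5, 1)   # 6-tap impulse response, divided by 32

def half_pel(samples, i):
    """Half-pel value between integer positions i and i+1 (edges clamped)."""
    n = len(samples)
    acc = sum(t * samples[min(max(i - 2 + k, 0), n - 1)]
              for k, t in enumerate(HALF_TAPS))
    return (acc + 16) >> 5            # rounded division by 32

def quarter_pel(a, b):
    """Quarter-pel by averaging two neighboring values, e.g. b = (A + c)/2."""
    return (a + b + 1) >> 1

row = [10, 10, 10, 10, 10, 10]
h = half_pel(row, 2)                  # a flat signal interpolates to itself
q = quarter_pel(row[2], h)
```

The two-dimensional case applies `half_pel` along rows and then columns (or vice versa) before the quarter-pel averaging, mirroring the horizontal/vertical filtering described above.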
An example of motion prediction is shown in
The present invention provides efficient methods for performing the update step in MCTF for video coding.
The update operation is performed according to coding blocks in the prediction residue frame. Depending on macroblock mode in the prediction step, a coding block can have different sizes. Macroblock modes are used to specify how a macroblock is segmented into blocks. For example, a macroblock may be segmented into a number of blocks as specified by a selected macroblock mode and the number can be one or more. In the update step, the reverse direction of the motion vectors used in the prediction step is used directly as an update motion vector and therefore no motion vector derivation process is performed.
Motion vectors that significantly deviate from their neighboring motion vectors are considered not reliable and excluded from the update step.
An adaptive filter is used in interpolating the prediction residue block for the update operation. The adaptive filter is an adaptive combination of a short filter (e.g. a bilinear filter) and a long filter (e.g. a 4-tap FIR filter). The switch between the short filter and the long filter is based on the energy level of the corresponding prediction residue block. If the energy level is high, the short filter is used for interpolation. Otherwise, the long filter is used.
For each prediction residue block, a threshold is adaptively determined to limit the maximum amplitude of the residue in the block before it is used as an update signal. In determining the threshold, one of the following mechanisms can be used:
Thus, the first aspect of the present invention is a method of encoding and decoding a video sequence having a plurality of video frames, wherein a macroblock of pixels in a video frame is segmented based on a macroblock mode. The method comprises a prediction operation and an update operation partially based on a reverse direction of the motion vectors used in prediction.
The second aspect of the present invention is the encoding module and the decoding module having a plurality of processors for carrying out the method of encoding and decoding as described above.
The third aspect of the present invention is an electronic device, such as a mobile terminal, having the encoding module and/or the decoding module as described above.
The fourth aspect of the present invention is a software application product having a memory for storing a software application having program codes to carry out the method of encoding and/or decoding as described above.
The present invention provides an efficient solution for the MCTF update step. It not only simplifies the update step interpolation process, but also eliminates the update motion vector derivation process. By adaptively determining a threshold to limit the prediction residue, this method does not require the threshold values to be saved in the bit-stream.
FIG. 1a shows the decomposition process for MCTF using a lifting structure.
FIG. 1b shows the composition process for MCTF using the lifting structure.
a shows an example of the relationship of associated blocks and motion vectors that are used in the prediction step.
b shows the relationship of associated blocks and motion vectors that are used in the update step.
Both the decomposition and composition processes for motion compensated temporal filtering (MCTF) can use a lifting structure. The lifting consists of a prediction step and an update step.
In the update step, the prediction residue at block Bn+1 can be added to the reference block along the reverse direction of the motion vectors used in the prediction step. If the motion vector is (Δx, Δy) (see
The update process is performed only on integer pixels in frame In. If An is located at a sub-pixel position, its nearest integer position block A′n is actually updated according to the motion vector (−Δx, −Δy). This is shown in
The update step can be performed block by block with a block size of 4×4 in the frame to be updated. For each 4×4 block in the frame, a good motion vector for updating the block may be derived by scanning all the motion vectors used in the prediction step and selecting the motion vector that has the maximum cover ratio of the current 4×4 block. This is shown in
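The cover-ratio selection described above can be sketched as follows; rectangles are (x, y, w, h) tuples, and the helper names are illustrative rather than standardized:

```python
def overlap(r1, r2):
    """Overlap area of two rectangles given as (x, y, w, h)."""
    x0 = max(r1[0], r2[0]); y0 = max(r1[1], r2[1])
    x1 = min(r1[0] + r1[2], r2[0] + r2[2])
    y1 = min(r1[1] + r1[3], r2[1] + r2[3])
    return max(0, x1 - x0) * max(0, y1 - y0)

def update_mv_for_block(target, pred_blocks):
    """pred_blocks: list of ((x, y, w, h), (dx, dy)) from the prediction step.
    Each prediction block's reference footprint in frame In is its rectangle
    displaced by its motion vector; the motion vector whose footprint covers
    most of the target 4x4 block is selected, reversed for the update."""
    best_mv, best_cov = (0, 0), -1
    for (x, y, w, h), (dx, dy) in pred_blocks:
        cov = overlap(target, (x + dx, y + dy, w, h))
        if cov > best_cov:
            best_cov, best_mv = cov, (-dx, -dy)
    return best_mv

pred_blocks = [((0, 0, 4, 4), (2, 0)), ((4, 0, 4, 4), (2, 0))]
mv = update_mv_for_block((0, 0, 4, 4), pred_blocks)
```

In this example the first prediction block's footprint covers half of the target block while the second covers none of it, so the first block's reversed motion vector is selected.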
In one embodiment of the present invention, the update operation is performed according to coding blocks in the prediction residue frame. Depending on the macroblock mode in the prediction step, a coding block can have different sizes, e.g. from 4×4 up to 16×16.
As shown in
Now that the position of block A′n and the update motion vector (−Δx, −Δy) are both available, the reference block for block A′n in the update step can also be located. This is shown in
In sum, each coding block Bn+1 in the prediction residue frame is processed according to the following procedure:
According to one embodiment of the present invention, the block diagrams for MCTF decomposition (or analysis) and MCTF composition (or synthesis) are shown in
In the above-described process, pixels to be updated are not grouped in 4×4 blocks. Instead, they are grouped according to the exact block partition and motion vector they are associated with.
Removing Outlier or Unreliable Motion Vectors from Update Step
In order to improve the coding performance and to further simplify the update step operation, a motion vector filtering process can be incorporated for the update step in MCTF. Motion vectors that differ significantly from their neighboring motion vectors can be excluded from the update operation.
There are different ways of filtering motion vectors for this purpose. One way is to check the differential motion vector of each coding block in the prediction residue frame. The differential motion vector is defined as the difference between the current motion vector and the prediction of the current motion vector. The prediction of the current motion vector can be inferred from the motion vectors of neighboring coding blocks that are already coded (or decoded). For coding efficiency, the corresponding differential motion vector is coded into the bit-stream.
The differential motion vector reflects how different the current motion vector is from its neighboring motion vectors. Thus, it can be directly used in the motion vector filtering process. For example, if the difference reaches a certain threshold Tmv, the motion vector is excluded. Assuming the differential motion vector of the current coding block is (Δdx, Δdy), then the following condition can be used in the filtering process:
|Δdx|+|Δdy|<Tmv
If a differential motion vector does not meet the above condition, the corresponding motion vector is excluded from the update operation. It should be noted that the above condition is only an example. Other conditions can also be derived and used. For instance, the condition can be
max(|Δdx|, |Δdy|)<Tmv.
Here max is an operation that returns the maximum value among a set of given values.
Since the prediction of the current motion vector is inferred only from the motion vectors of the neighboring coding blocks that are already coded (or decoded), it is also possible to check the motion vectors of more neighboring blocks regardless of their coding order relative to the current block. To carry out the filtering, one example is to consider the four neighboring blocks that are above, below, left of and right of the current block. The average of the four motion vectors associated with the four neighboring blocks is calculated and compared with the motion vector of the current block. Again, the conditions mentioned above can be used to measure the difference of the average motion vector and the current motion vector. If the difference reaches a certain threshold, the current motion vector is excluded from update operation.
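Both filtering variants, testing the differential motion vector or the difference from the average of the four spatial neighbors, reduce to a simple threshold comparison. The value Tmv = 8 below is an assumed example; the text does not fix a threshold:

```python
def mv_reliable(dmv, t_mv=8):
    """|ddx| + |ddy| < Tmv test on a differential motion vector (ddx, ddy)."""
    return abs(dmv[0]) + abs(dmv[1]) < t_mv

def mv_reliable_vs_neighbors(mv, neighbors, t_mv=8):
    """Same test against the average of the neighboring motion vectors
    (e.g. the blocks above, below, left of, and right of the current block)."""
    ax = sum(v[0] for v in neighbors) / len(neighbors)
    ay = sum(v[1] for v in neighbors) / len(neighbors)
    return abs(mv[0] - ax) + abs(mv[1] - ay) < t_mv
```

A motion vector failing either test would simply be skipped when the update step iterates over the coding blocks, which also reduces computation.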
By removing some of the motion vectors from the update step operation, such a filtering process can further reduce the update step computation complexity. With a motion vector filter module, the MCTF decomposition and composition processes are shown in
Adaptive Interpolation for Update Step Based on Prediction Residue Energy Level
In the present invention, an adaptive filter is used in interpolating the prediction residue block for the update operation. The adaptive filter is an adaptive combination of a shorter filter (e.g. a bilinear filter) and a longer filter (e.g. a 4-tap filter). Switching between the short filter and the long filter can be based on a final weight factor of each 4×4 block, which is determined from the prediction residue energy level of the block as well as the reliability of the update motion vector derived for the block. Energy estimation and interpolation are performed on the whole coding block regardless of its size. Interpolation on a larger block means less overall computation because more intermediate results can be shared in the process.
Energy estimation can be carried out using different methods. One method is to use the average squared pixel value of the block as the energy level. If the mean value of a prediction residue block is assumed to be zero, the average squared pixel value of the block is equivalent to the variance of the block. In one embodiment of the present invention, a filter from a filter set is selected for interpolating the block based on the calculated energy level. Blocks with a lower energy level have relatively smaller prediction residue, which also indicates that the motion vectors associated with these blocks are relatively more reliable. When choosing the interpolation filter, it is preferable to use the long filter for interpolation of these blocks because they are more important in maintaining the coding performance. For blocks with higher energy levels, however, the short filter can be used.
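The energy-based filter switch can be sketched as follows; the energy threshold of 64 is purely illustrative, since the text does not fix a value:

```python
def block_energy(residue):
    """Average squared pixel value; equals the variance for a zero-mean block."""
    n = sum(len(row) for row in residue)
    return sum(v * v for row in residue for v in row) / n

def choose_filter(residue, energy_threshold=64.0):
    """Short (bilinear) filter for high-energy blocks, long filter otherwise."""
    if block_energy(residue) > energy_threshold:
        return "bilinear"   # short filter: cheap, for less reliable blocks
    return "4-tap"          # long filter: better quality, for reliable blocks
```

In practice the threshold (or the weight-factor mapping) would be tuned experimentally, but the decision logic remains this simple comparison per coding block.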
Taking
Adaptive Threshold for Controlling Update Signal Strength
In the present invention, a threshold is adaptively determined for each coding block and used to limit the maximum amplitude of the update signal for the block. Since the threshold values are adaptively determined in the coding process, there is no need to save them in the coded bitstream.
In the example as shown in
U(i, j) = min(Tm, max(−Tm, U(i, j)))
In the above equation, max and min are operations that return the maximum and minimum value respectively among a set of given values.
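Applied pixel-wise to a residue block, the clipping equation above becomes:

```python
def clip_update_signal(residue, t_m):
    """Limit the update signal to [-Tm, Tm]:
    U(i, j) = min(Tm, max(-Tm, U(i, j)))."""
    return [[min(t_m, max(-t_m, v)) for v in row] for row in residue]

clipped = clip_update_signal([[30, -30, 5]], 16)
```

Values within the band pass through unchanged; only outliers are capped, which bounds the worst-case distortion the update step can introduce into frame In.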
There are different ways of determining the threshold value for each coding block. One way is to determine the threshold value based on the energy level of the block. Since the energy level of the block is already calculated when selecting the interpolation filter, it can be re-used in this step.
As mentioned above, blocks with lower energy levels have relatively smaller prediction residue, which also indicates that the motion vectors associated with these blocks are relatively more reliable. In this case, a higher threshold value should be assigned so that most prediction residue values in the block can be used directly for the update without being capped by the threshold. On the other hand, for blocks with higher energy levels, since their motion vectors may not be reliable, a relatively lower threshold should be assigned to avoid introducing visual artifacts.
One example of relating the threshold value to the prediction residue energy level can be given as follows:
Tm=C1*(1−E)+D1
In the above equation, E represents the prediction residue energy level of the block. As explained earlier, it is assumed that E is normalized and is in the range of [0, 1]. C1 and D1 are two constants whose values can be determined through tests. For example, with C1=16 and D1=4, the corresponding threshold values are found to be appropriate with good coding performance. According to the above equation, the higher the energy level of the block, the lower the threshold value used. The block diagram of such an adaptive control process on update signal strength is shown in
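With the example constants C1 = 16 and D1 = 4, the mapping from normalized energy to threshold is:

```python
def threshold_from_energy(energy, c1=16, d1=4):
    """Tm = C1*(1 - E) + D1, with E the normalized residue energy in [0, 1]."""
    assert 0.0 <= energy <= 1.0
    return c1 * (1 - energy) + d1
```

So a perfectly quiet block (E = 0) gets the loosest cap, Tm = 20, while a maximally energetic block (E = 1) gets the tightest, Tm = 4.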
In another embodiment of the present invention, the threshold value is adaptively determined based on a block-matching factor. The block-matching factor is an indicator indicating how well the block is matched or predicted in the prediction step. If the block is matched well, it implies that the corresponding motion vector is more reliable. In this case, a higher threshold value may be used in the update step. Otherwise, a lower threshold value should be used.
To obtain the block-matching factor, one method is to check the ratio of the variance of the corresponding block to be updated to the energy level of the prediction residue block. For the example shown in
Another method of obtaining a block-matching factor is to perform a high-pass filtering operation on the block to be updated. The amplitude (i.e. absolute value) of each filtered pixel in the block is then compared against the amplitude of the corresponding prediction residue pixel. If the block is well matched in the prediction step, the prediction residue pixel can be expected to have a smaller amplitude than the corresponding filtered pixel. The percentage of prediction residue pixels in the block having smaller amplitude than the corresponding filtered pixels can therefore be used as the block-matching factor, since a high percentage is a good indication that the block is well matched in the prediction step.
The high pass filtering operation can be general and is not limited to one method. One example is to apply a 2-D filter as follows:
Another example is to calculate the value difference between the current pixel and its four nearest neighboring pixels. The maximum difference among the four differential values can be used as the high pass filtered value for the current pixel.
Besides the above two examples of high pass filter, other high pass filters can also be used.
Once the block-matching factor is obtained, a threshold value can be derived from the block-matching factor. Assume the block-matching factor is M and it is a normalized value in the range of [0, 1]. An example of deriving the threshold value from the block matching factor can be given as follows:
Tm=C2*M+D2
In the above equation, C2 and D2 are two constants and their values can be determined through tests. For example, C2=16 and D2=4 may be appropriate values. According to the above equation, if a block is matched well and M has a relatively large value, Tm also has a relatively large value.
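A sketch combining the neighbor-difference high-pass filter (the second example above) with the percentage-based matching factor and the Tm = C2*M + D2 mapping; all helper names and the sample values are illustrative:

```python
def highpass(block, x, y):
    """Max absolute difference to the four nearest neighbors of pixel (x, y)."""
    h, w = len(block), len(block[0])
    c = block[y][x]
    diffs = [abs(c - block[ny][nx])
             for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
             if 0 <= nx < w and 0 <= ny < h]
    return max(diffs) if diffs else 0

def matching_factor(block, residue):
    """Fraction of residue pixels with smaller amplitude than the filtered pixel."""
    h, w = len(block), len(block[0])
    hits = sum(1 for y in range(h) for x in range(w)
               if abs(residue[y][x]) < highpass(block, x, y))
    return hits / (h * w)

def threshold_from_matching(m, c2=16, d2=4):
    """Tm = C2*M + D2, with M the normalized block-matching factor in [0, 1]."""
    return c2 * m + d2

block = [[0, 10], [10, 20]]     # block to be updated (hypothetical)
residue = [[1, 1], [1, 1]]      # small residue: the block was matched well
m = matching_factor(block, residue)
tm = threshold_from_matching(m)
```

Here every residue pixel is smaller than its high-pass counterpart, so M = 1 and the block receives the full threshold of C2 + D2.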
The process of adaptive control of update signal strength based on block-matching factor is shown in
In summary, the present invention provides a method, an apparatus and a software application product for performing the update step in motion compensated temporal filtering for video coding.
The update operation is performed according to coding blocks in the prediction residue frame. Depending on macroblock mode in the prediction step, a coding block can have different sizes. In encoding, the method is illustrated in
In decoding, the method is illustrated in
Referring now to
The mobile device 10 may communicate over a voice network and/or may likewise communicate over a data network, such as any public land mobile networks (PLMNs) in form of e.g. digital cellular networks, especially GSM (global system for mobile communication) or UMTS (universal mobile telecommunications system). Typically the voice and/or data communication is operated via an air interface, i.e. a cellular communication interface subsystem in cooperation with further components (see above) to a base station (BS) or node B (not shown) being part of a radio access network (RAN) of the infrastructure of the cellular network.
The cellular communication interface subsystem as depicted illustratively in
In case communications of the mobile device 10 through the PLMN occur at a single frequency or a closely-spaced set of frequencies, a single local oscillator (LO) 123 may be used in conjunction with the transmitter (TX) 122 and receiver (RX) 121. Alternatively, if different frequencies are utilized for voice/data communications or for transmission versus reception, a plurality of local oscillators can be used to generate a plurality of corresponding frequencies.
Although the mobile device 10 depicted in
After any required network registration or activation procedures, which may involve the subscriber identification module (SIM) 210 required for registration in cellular networks, have been completed, the mobile device 10 may then send and receive communication signals, including both voice and data signals, over the wireless network. Signals received by the antenna 129 from the wireless network are routed to the receiver 121, which provides for such operations as signal amplification, frequency down conversion, filtering, channel selection, and analog to digital conversion. Analog to digital conversion of a received signal allows more complex communication functions, such as digital demodulation and decoding, to be performed using the digital signal processor (DSP) 120. In a similar manner, signals to be transmitted to the network are processed, including modulation and encoding, for example, by the digital signal processor (DSP) 120 and are then provided to the transmitter 122 for digital to analog conversion, frequency up conversion, filtering, amplification, and transmission to the wireless network via the antenna 129.
The microprocessor/micro-controller (μC) 110, which may also be designated as a device platform microprocessor, manages the functions of the mobile device 10. Operating system software 149 used by the processor 110 is preferably stored in a persistent store such as the non-volatile memory 140, which may be implemented, for example, as a Flash memory, battery backed-up RAM, any other non-volatile storage technology, or any combination thereof. In addition to the operating system 149, which controls low-level functions as well as (graphical) basic user interface functions of the mobile device 10, the non-volatile memory 140 includes a plurality of high-level software application programs or modules, such as a voice communication software application 142, a data communication software application 141, an organizer module (not shown), or any other type of software module (not shown). These modules are executed by the processor 100 and provide a high-level interface between a user of the mobile device 10 and the mobile device 10. This interface typically includes a graphical component provided through the display 135 controlled by a display controller 130, and input/output components provided through a keypad 175 connected via a keypad controller 170 to the processor 100, an auxiliary input/output (I/O) interface 200, and/or a short-range (SR) communication interface 180. The auxiliary I/O interface 200 comprises in particular a USB (universal serial bus) interface, a serial interface, an MMC (multimedia card) interface and related interface technologies/standards, and any other standardized or proprietary data communication bus technology, whereas the short-range communication interface 180 is a radio-frequency (RF) low-power interface that includes in particular WLAN (wireless local area network) and Bluetooth communication technology, or an IRDA (infrared data access) interface.
The RF low-power interface technology referred to herein should especially be understood to include any IEEE 802.xx standard technology, the description of which is obtainable from the Institute of Electrical and Electronics Engineers. Moreover, the auxiliary I/O interface 200 as well as the short-range communication interface 180 may each represent one or more interfaces supporting one or more input/output interface technologies and communication interface technologies, respectively. The operating system, specific device software applications or modules, or parts thereof, may be temporarily loaded into a volatile store 150 such as a random access memory (typically implemented on the basis of DRAM (dynamic random access memory) technology for faster operation). Moreover, received communication signals may also be temporarily stored in volatile memory 150 before being permanently written to a file system located in the non-volatile memory 140 or any mass storage, preferably detachably connected via the auxiliary I/O interface, for storing data. It should be understood that the components described above represent typical components of a traditional mobile device 10 embodied herein in the form of a cellular phone. The present invention is not limited to these specific components, whose implementation is depicted merely for illustration and for the sake of completeness.
An exemplary software application module of the mobile device 10 is a personal information manager application providing PDA functionality, typically including a contact manager, calendar, task manager, and the like. Such a personal information manager is executed by the processor 100, may have access to the components of the mobile device 10, and may interact with other software application modules. For instance, interaction with the voice communication software application allows for managing phone calls, voice mails, etc., and interaction with the data communication software application enables managing SMS (short message service), MMS (multimedia messaging service), e-mail communications and other data transmissions. The non-volatile memory 140 preferably provides a file system to facilitate permanent storage of data items on the device, including particularly calendar entries, contacts, etc. The ability for data communication with networks, e.g. via the cellular interface, the short-range communication interface, or the auxiliary I/O interface, enables upload, download, and synchronization via such networks.
The application modules 141 to 149 represent device functions or software applications that are configured to be executed by the processor 100. In most known mobile devices, a single processor manages and controls the overall operation of the mobile device as well as all device functions and software applications. Such a concept is applicable for today's mobile devices. The implementation of enhanced multimedia functionalities includes, for example, reproducing video streaming applications, manipulating digital images, and capturing video sequences by integrated or detachably connected digital camera functionality. The implementation may also include gaming applications with sophisticated graphics and the necessary computational power. One way to deal with the requirement for computational power, which has been pursued in the past, is to implement powerful and universal processor cores. Another approach for providing computational power is to implement two or more independent processor cores, which is a well-known methodology in the art. The advantages of several independent processor cores can be immediately appreciated by those skilled in the art. Whereas a universal processor is designed for carrying out a multiplicity of different tasks without specialization to a pre-selection of distinct tasks, a multi-processor arrangement may include one or more universal processors and one or more specialized processors adapted for processing a predefined set of tasks. Nevertheless, the implementation of several processors within one device, especially a mobile device such as mobile device 10, traditionally requires a complete and sophisticated re-design of the components.
In the following, the present invention provides a concept which allows simple integration of additional processor cores into an existing processing device implementation, enabling the omission of an expensive, complete and sophisticated redesign. The inventive concept will be described with reference to system-on-a-chip (SoC) design. System-on-a-chip (SoC) is a concept of integrating at least numerous (or all) components of a processing device into a single highly integrated chip. Such a system-on-a-chip can contain digital, analog, mixed-signal, and often radio-frequency functions, all on one chip. A typical processing device comprises a number of integrated circuits that perform different tasks. These integrated circuits may include especially a microprocessor, memory, universal asynchronous receiver-transmitters (UARTs), serial/parallel ports, direct memory access (DMA) controllers, and the like. A universal asynchronous receiver-transmitter (UART) translates between parallel bits of data and serial bits. Recent improvements in semiconductor technology have enabled very-large-scale integration (VLSI) integrated circuits of significantly greater complexity, making it possible to integrate numerous components of a system in a single chip. With reference to
Additionally, the device 10 is equipped with a module for scalable encoding 105 and scalable decoding 106 of video data according to the inventive operation of the present invention. By means of the CPU 100, said modules 105 and 106 may be used individually. However, the device 10 is adapted to perform video data encoding or decoding, respectively. Said video data may be received by means of the communication modules of the device, or it may be stored within any imaginable storage means within the device 10. Video data can be conveyed in a bitstream between the device 10 and another electronic device in a communications network.
Although the invention has been described with respect to one or more embodiments thereof, it will be understood by those skilled in the art that the foregoing and various other changes, omissions and deviations in the form and detail thereof may be made without departing from the scope of this invention.
This patent application is based on and claims priority to pending U.S. Provisional Patent Application Ser. No. 60/695,648, filed Jun. 29, 2005.