This application claims the benefit under 35 U.S.C. §119(a)-(d) of UK Patent Application No. 1113113.3, filed on Jul. 29, 2011 and entitled “Method and Device for Error Concealment in Motion Estimation of Video Data”.
The above cited patent application is incorporated herein by reference in its entirety.
The present invention relates to video data encoding and decoding. In particular, the present invention relates to video encoding and decoding using an encoder and a decoder such as those that use the H.264/AVC standard encoding and decoding methods. The present invention focuses on error concealment based on motion information in the case of part of the video data being lost between the encoding and decoding processes.
H.264/AVC (Advanced Video Coding) is a standard for video compression that provides good video quality at a relatively low bit rate. It is a block-oriented compression standard using motion-compensation algorithms. By block-oriented, what is meant is that the compression is carried out on video data that has effectively been divided into blocks, where a plurality of blocks usually makes up a video frame (also known as a video picture). Processing frames block-by-block is generally more efficient than processing frames pixel-by-pixel, and the block size may be changed depending on the precision of the processing. A large block (or a block that contains several other blocks) may be known as a macroblock and may, for example, be 16 by 16 pixels in size. The compression method uses algorithms to describe video data in terms of a movement or translation of video data from a reference frame to a current frame (i.e. for motion compensation within the video data). This is known as “inter-coding” because of the inter-image comparison between blocks. The comparison between blocks gives rise to information (i.e. a prediction) regarding how an image in the frame has moved, and the relative movement plus a quantized prediction error are encoded and transmitted to the decoder. This type of inter-coding is thus known as “motion prediction encoding”. The following steps illustrate the main stages of inter-coding as applied to the current frame at the encoder side.
1. A current frame is to be a “predicted frame”. Each block of this predicted frame is compared with reference areas in a reference frame to give rise to a motion vector for each predicted block, pointing back to a reference area. The set of motion vectors for the predicted frame obtained by this motion estimation gives rise to a motion vector field. This motion vector field is then entropy encoded.
2. The current frame is then predicted from the reference frame and the difference signal for each predicted block with respect to its reference area (pointed to by the relevant motion vector) is calculated. This difference signal is known as a “residual”. The residual representing the current block then undergoes a transform such as a discrete cosine transform (DCT), quantisation and entropy encoding before being transmitted to the decoder. (An illustrative sketch of these two steps follows.)
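By way of illustration only, the following Python sketch shows the two steps above in their simplest form: an exhaustive block-matching search producing one motion vector per block, and the computation of the residual against the matched reference area. The function names and the sum-of-absolute-differences (SAD) criterion are illustrative assumptions, not mandated by the standard, and the transform, quantisation and entropy-encoding stages are omitted.

```python
import numpy as np

def estimate_motion(pred_frame, ref_frame, block=16, search=8):
    """Step 1 (sketch): exhaustive block matching. For each block of the
    predicted frame, find the motion vector pointing back to the reference
    area that minimises the sum of absolute differences (SAD)."""
    h, w = pred_frame.shape
    vectors = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cur = pred_frame[by:by + block, bx:bx + block].astype(np.int32)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # reference area must lie inside the frame
                    ref = ref_frame[y:y + block, x:x + block].astype(np.int32)
                    sad = int(np.abs(cur - ref).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            vectors[(by, bx)] = best_mv  # one vector per predicted block
    return vectors

def residual(pred_frame, ref_frame, by, bx, mv, block=16):
    """Step 2 (sketch): the difference signal between a predicted block and
    the reference area pointed to by its motion vector."""
    dy, dx = mv
    cur = pred_frame[by:by + block, bx:bx + block].astype(np.int32)
    ref = ref_frame[by + dy:by + dy + block,
                    bx + dx:bx + dx + block].astype(np.int32)
    return cur - ref
```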
Defining a current block by way of a motion vector from a reference area (i.e. by way of temporal prediction) will, in many cases, use less data than intra-coding the current block completely without the use of motion prediction. In the case of intra-coding a current block, that block is intra-predicted (predicted from pixels in the neighbourhood of the block), DCT transformed, quantized and entropy encoded. Generally, this occurs in a loop so that each block undergoes each step above individually, rather than in batches of blocks, for instance. Because intra-coded blocks lack motion prediction, more information is transmitted to the decoder for them than for inter-coded blocks.
Returning to inter-coding, a step which has a bearing on the efficiency and efficacy of the motion prediction is the partitioning of the predicted frame into blocks. Typically, macroblock-sized blocks are used. However, a further partitioning step is possible, which divides macroblocks into rectangular partitions with different sizes. This has the aim of optimising the prediction of the data in each macroblock. These rectangular partitions each undergo a motion compensated temporal prediction.
The inter-coded and intra-coded partitions are then sent as an encoded bitstream through a communication channel to a decoder.
At the decoder side, the inverse of the encoding processes is performed. Thus, the encoded blocks undergo entropy decoding, inverse quantisation and inverse DCT. If the blocks are intra-coded, this gives rise to the reconstructed video signal. If the blocks are inter-coded, after entropy decoding, both the motion vectors and the residuals are decoded. A motion compensation process is conducted using the motion vectors to reconstruct an estimated version of the blocks. The reconstructed residual is added to the estimated reconstructed block to give rise to the final version of the reconstructed block.
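As a hedged sketch of the inter-decoding path only (entropy decoding, inverse quantisation and inverse DCT are assumed to have already yielded the motion vector and the residual; the function name is hypothetical), the final block is obtained by motion compensation plus residual addition:

```python
import numpy as np

def reconstruct_inter_block(ref_frame, by, bx, mv, residual, block=16):
    """Motion compensation: copy the reference area the motion vector points
    to, then add the decoded residual to obtain the final reconstructed block."""
    dy, dx = mv
    estimated = ref_frame[by + dy:by + dy + block,
                          bx + dx:bx + dx + block].astype(np.int32)
    return estimated + residual  # final version of the reconstructed block
```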
Sometimes, for example, if the communication channel is unreliable, packets being sent over the channel may be corrupted or even lost. To deal with this problem at the decoder end, error concealment methods are known which help to rebuild the image blocks corresponding to the lost packets.
There are two main types of error concealment: spatial error concealment and temporal error concealment.
Spatial error concealment uses data from the same frame to reconstruct the content of lost blocks from that frame. For example, the available data is decoded and the lost area is reconstructed by luminance and chrominance interpolation from the successfully decoded data in the spatial neighbourhood of the lost area. Spatial error concealment is generally used in a case in which it is known that motion or luminance correlation between the predicted frame and the previous frame is low, for example, in the case of a scene change. The main problems with spatial error concealment are that the reconstructed areas are blurred, because the interpolation can be considered equivalent to a kind of low-pass filtering of the image signal of the spatial neighbourhood, and that this method does not deal well with a case in which several blocks—or even a whole slice—are lost.
Temporal error concealment—such as that described in US 2009/0138773, US 2010/0309982 or US 2010/0303154—reconstructs a field of motion vectors from the data available and then applies a reconstructed motion vector corresponding to a lost block in a predicted frame in such a way as to enable prediction of the luminance and the chrominance of the lost block from the luminance and chrominance of the corresponding reference area in the reference frame. For example, if the motion vector of a predicted block in a current predicted frame has been corrupted, a motion vector can be computed from the motion vectors of the blocks located in the spatial neighbourhood of the predicted block. This computed motion vector is then used to identify a candidate reference area from which the luminance of the lost block of the predicted frame can be estimated. Temporal error concealment works if there is sufficient correlation between the current frame and the previous frame (used as the reference frame), for example, when there is no change of scene.
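The cited schemes differ in their details; as one illustrative, non-limiting sketch of the neighbourhood computation just described, a lost block's motion vector can be taken as the component-wise median of the motion vectors of its available spatial neighbours:

```python
def conceal_motion_vector(mv_field, by, bx):
    """Estimate a lost motion vector from the available spatial neighbours
    (component-wise median; averaging is another common choice)."""
    neighbours = []
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        mv = mv_field.get((by + dy, bx + dx))
        if mv is not None:
            neighbours.append(mv)
    if not neighbours:
        return (0, 0)  # fall back to the co-located reference area
    ys = sorted(v[0] for v in neighbours)
    xs = sorted(v[1] for v in neighbours)
    return (ys[len(ys) // 2], xs[len(xs) // 2])
```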
However, temporal error concealment is not always effective when several blocks or even full slices are corrupted or lost.
It is desirable to improve the motion reconstruction process in video error concealment while maintaining a high decoding speed and high compression efficacy. Specifically, it is desirable to improve the block reconstruction quality while transmitting a very low quantity of auxiliary information and limiting delay in transmission.
Video data that is transmitted between a server (acting as an encoder) and at least one client (acting as a decoder) over a packet network is subject to packet losses (i.e. losses of packets that contain the elementary video data stream corresponding to frame blocks). For example, the network can be an internet protocol (IP) network carrying IP packets. The network can be a wired network and/or a wireless network. The network is subject to packet losses at several places within the network. Two kinds of packet losses exist: congestion losses, which occur when a network node is saturated and drops packets; and interference losses, which occur mainly on wireless links (for example, because of microwave interference).
For dealing with these losses, several solutions are possible. The first solution is the use of a congestion control algorithm. If loss notifications are received by the server (i.e. notifications that packets are not being received by the client), it can decide to decrease its transmission rate, thus controlling congestion over the network. Congestion control mechanisms such as those of TCP (Transmission Control Protocol) or TFRC (TCP Friendly Rate Control) implement this strategy. However, such protocols are not fully effective against congestion losses and are not at all effective against interference losses.
Other solutions are based on protection mechanisms.
Forward Error Correction (FEC) protects transmitted packets (e.g. RFC 2733) by transmitting additional packets with the video data. However, these additional packets can take up a large proportion of the communication channel between the server and the client, risking further congestion. Nevertheless, FEC enables the reconstruction of a perfect bitstream if the quantity of auxiliary information is sufficient.
Packet retransmission (e.g. RFC 793), as the name suggests, retransmits packets that are lost. This causes additional delay that can be unpleasant for the user (e.g. in the context of video conferencing, where a time lag is detrimental to efficient interaction between conference attendees). The counterpart of this increased delay is a very good reconstruction quality.
The use of redundant slices (as discussed in “Systematic Lossy Error Protection based on H.264/AVC redundant slices and flexible macroblock ordering”, Journal of Zhejiang University (Zhejiang University Press, co-published with Springer), ISSN 1673-565X (print), 1862-1775 (online), Vol. 7, No. 5, May 2006) requires the transmission of a high quantity of auxiliary information (though this quantity is usually lower than that generated by FEC). Redundant slices often enable only an approximation of the lost part of the video data.
As mentioned above, spatial and temporal error concealment work well only if a very small number of packets are lost, and if the lost packets contain blocks that are not near each other spatially or temporally, respectively, because it is the neighbouring blocks (in the spatial or temporal direction) that are used to rebuild the lost blocks.
Thus, none of the solutions proposed in the prior art enables the improvement of the block reconstruction quality while transmitting a very low quantity of auxiliary information and limiting delay in transmission.
It is thus proposed to improve the quality of the lost blocks of the video (using error concealment algorithms) while transmitting little auxiliary information. This will be described below with reference to the figures.
According to a first aspect of the invention, there is provided an encoder for encoding a first frame I(t) of a video bitstream, the first frame being defined by a plurality of blocks of pixels, the encoder comprising: means for generating an irregular grid of cells, each cell having a size generated according to a motion vector field derived from motion information of a second frame I(t−1) of the video bitstream; means for generating motion vectors to be applied to each cell of the irregular grid, the generated motion vectors being representative of the motion in the first frame I(t) of the video bitstream; and means for transmitting the generated motion vectors to a decoder.
The complexity (or extent) of motion demonstrated by the motion vector field in the second frame is preferably determined by comparison with a preceding frame I(t−2) and gives rise to a complexity map. This complexity map gives an indication of areas of high complexity and areas of low complexity. Blocks of pixels in areas of low complexity are grouped together in large cells, with one motion vector allocated to each large cell (ways of determining the motion vector to be allocated to it are described below and may involve more than simply taking an average of the motion vectors of the blocks in the large cell), and blocks in areas of high complexity are divided (or grouped) into small cells. The sizes of the large and small cells are variable and may be anything from a block to the whole frame, the latter being possible if there is substantially no movement from one frame to the next. Once the cell sizes are determined, this gives rise to an “irregular grid” representing the second frame, as the grid cells are chosen based on the motion of the second frame. Motion vectors of the first frame (i.e. the frame being encoded) are then mapped onto the irregular grid using methods described below.
According to a second aspect of the invention, there is provided a transcoder for encoding a first frame I(t) of a video bitstream, the first frame being defined by a plurality of blocks of pixels, the transcoder comprising: means for generating an irregular grid of cells, each cell having a size generated according to a motion vector field derived from motion information of a second frame I(t−1) of the video bitstream; means for generating motion vectors to be applied to each cell of the irregular grid, the generated motion vectors being representative of the motion in the first frame I(t) of the video bitstream; and means for transmitting the generated motion vectors to a decoder. A transcoder is similar in function to an encoder, but creates decoding information (e.g. as auxiliary information) from video data that has already been encoded, rather than from raw data as the encoder does.
According to a third aspect of the invention, there is provided a device—optionally an encoder—for generating auxiliary information for a first frame I(t) of a video bitstream, the first frame being defined by a plurality of blocks of pixels, the device comprising: means for generating an irregular grid of cells for the frame I(t) based on the motion information of a second frame I(t−1) of the video bitstream; means for generating motion vectors to be applied to each cell of the irregular grid, the generated motion vectors representing the motion of the first frame I(t) of the video bitstream; and means for transmitting the generated motion vectors to a decoder as auxiliary information.
According to a fourth aspect of the invention, there is provided a decoder for decoding a first frame I(t) of a video bitstream, the decoder comprising: means for generating an irregular grid of cells, each cell having a size generated according to a complexity of a motion vector field based on motion information of a second frame I(t−1) of the video bitstream at the respective position of the cell; means for receiving motion vectors from an encoder to be applied to each cell of the irregular grid, the received motion vectors representing the motion of the first frame I(t) of the video bitstream at the position of the cell; and means for applying the received motion vectors to the cells of the generated irregular grid to generate a motion vector field to be used for motion prediction of the first frame I(t).
The present invention is applicable where the first frame I(t) has lost one or more blocks of pixels while being transmitted from the encoder to the decoder. The decoder does not receive the irregular grid from the encoder, but recreates it itself using motion information from the preceding frames I(t−1), etc. The decoder does receive the motion vectors from the encoder that correspond to the first frame I(t) but that are associated with the cells in the irregular grid. In this way, the decoder determines which cells of the irregular grid correspond to or contain missing blocks and applies the received motion vectors of those cells to the incomplete first frame I(t) at the respective cell position. The advantage of this is that only the motion vectors need to be transmitted: both the encoder and the decoder are able to recreate the same irregular grid using correctly-received frames and certain predetermined rules.
According to a fifth aspect of the present invention, there is provided a processing device for generating an irregular grid of cells, each cell having associated with it a separate motion vector based on the motion in a current frame I(t−1) of a video bitstream, the processing device comprising: means for reading a plurality of blocks of the current frame I(t−1); means for determining a complexity value representing the complexity of motion within each block of the current frame I(t−1); means for grouping blocks together that have a low complexity value into a large cell with a single motion vector representing motion within the large cell, and for grouping or dividing blocks that have a high complexity value into small cells, each small cell having a motion vector representing the motion within each small cell; and means for generating an irregular grid made up of the large and/or small cells. This same processing device may be present in either or both of the encoder and the decoder, or indeed in a transcoder, which, as mentioned above, creates the estimated motion vector information that is sent to the decoder based on already-encoded video data rather than on raw data.
According to a sixth aspect of the present invention, there is provided an image processing system comprising an encoder as described above and a decoder as described above, wherein the encoder and the decoder are configured to generate the same irregular grid of cells.
According to a seventh aspect of the present invention, there is provided an encoding method of encoding a first frame I(t) of a video bitstream, the method comprising: generating an irregular grid of cells, each cell having associated with it a separate motion vector based on the motion of a second frame I(t−1) of the video bitstream at the position of the respective cell; generating motion vectors to be applied to each cell of the irregular grid, the generated motion vectors representing the motion of the first frame I(t) of the video bitstream at positions corresponding to the positions of each cell when the irregular grid is applied to the first frame I(t); and transmitting the generated motion vectors to a decoder.
According to an eighth aspect of the present invention, there is provided a decoding method of decoding a first frame I(t) of a video bitstream, the method comprising: generating an irregular grid of cells, each cell having associated with it a motion vector based on the motion of a second frame I(t−1) of the video bitstream at the position of the respective cell; receiving motion vectors from an encoder to be applied to each cell of the irregular grid, the generated motion vectors representing motion in the first frame I(t) at positions corresponding to positions of the cells of the irregular grid when applied to the first frame I(t); and applying the received motion vectors to the cells of the generated irregular grid to generate a motion vector field to be used for motion prediction of the first frame I(t).
According to a ninth aspect of the present invention, there is provided a transcoding method comprising: generating an irregular grid of cells, each cell having associated with it a separate motion vector based on the motion of a second frame I(t−1) of the video bitstream at the position of the respective cell; generating motion vectors to be applied to each cell of the irregular grid, the generated motion vectors representing the motion of the first frame I(t) of the video bitstream at positions corresponding to the positions of each cell when the irregular grid is applied to the first frame I(t); and transmitting the generated motion vectors to a decoder.
According to a tenth aspect of the present invention, there is provided a method of generating an irregular grid of cells, each cell having associated with it a motion vector based on the motion of a current frame I(t−1) of a video bitstream, the method comprising: reading a plurality of blocks of the current frame; determining a complexity value representing the complexity of motion within each block of pixels of the current frame; grouping blocks together that have a low complexity value into a large cell with a single motion vector representing the motion within the large cell, and grouping or dividing blocks that have a high complexity value into small cells, each having motion vectors representing the motion within each small cell; and generating an irregular grid made up of the large and/or small cells.
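As a minimal sketch of the method of this aspect (the threshold policy, cell sizes and data structures are illustrative assumptions rather than limitations), the grouping can be expressed as follows:

```python
def build_irregular_grid(complexity, threshold, group=4):
    """Each group of 4x4 motion-vector positions with a high complexity value
    is divided into small cells (one per position); a group with a low
    complexity value becomes a single large cell."""
    cells = []
    for (gy, gx), c in complexity.items():
        if c > threshold:  # high complexity: small cells, dense motion vectors
            for y in range(group):
                for x in range(group):
                    cells.append(('small', gy * group + y, gx * group + x, 1))
        else:              # low complexity: one large cell, one motion vector
            cells.append(('large', gy * group, gx * group, group))
    return cells
```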
The invention also provides a computer program and a computer program product for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein, and a computer readable medium having stored thereon a program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.
The server 100 sends a video data bitstream in the form of IP/RTP packets 103 over a first network link 102. The compressed bitstream (elementary stream generated by the server) is split into sub-parts (slices). These slices are embedded as VCL NALU (Video Coding Layer Network Abstraction Layer Units) into the IP/RTP packets 103.
When a video bitstream is being manipulated (e.g. transmitted or encoded, etc.), it is useful to have a means of containing and identifying the data. To this end, a type of data container used for the manipulation of the video data is a unit called a Network Abstraction Layer Unit (NAL unit or NALU). A NALU—rather than being a physical division of the frame as the macroblocks described above are—is a syntax structure that contains bytes representing data. Different types of NALU may contain coded video data or information related to the video data. A set of successive NALUs that contributes to the decoding of one frame forms an Access Unit (AU).
The wireless network is subject to interference 109. For example, microwaves can pollute the wireless network, in which case some packets 110 may be lost. The distance between two losses caused by interference is usually greater than the distance between two losses caused by congestion. However, losses caused by interference may also be close together or even consecutive.
To compensate for these losses, it is possible to use error concealment algorithms for reconstructing the missing part of the video as discussed above. However, the reconstruction quality is often poor and auxiliary information is usually necessary for helping the error concealment. It is proposed herein to use a new algorithm that aims to generate a very low quantity of auxiliary information. This low quantity of auxiliary information enables the improvement of the reconstruction quality in comparison with classic error concealment. As this quantity of auxiliary information is very low, its transmission is easy.
The main modules of the server 100 are shown schematically in box 205. In a video encoder 207, the video compression (e.g. H.264) algorithm compresses the input video data and generates a video bitstream 208. In parallel, auxiliary information 210 is calculated in an auxiliary information extraction module 209. The auxiliary information 210 may be created by the encoder itself or by an external transcoder, which takes as an input the encoded video data and creates the auxiliary information from that.
This auxiliary information 210 is related to the motion information between consecutive frames of the video bitstream. The extracted auxiliary information 210 is merged with the video bitstream 208 to give rise to a final bitstream 211 that will be transmitted to the client. For example, the auxiliary information is placed in an SEI (Supplemental Enhancement Information) message of the H.264/AVC or other type of bitstream. The SEI is optional information that can be embedded in the bitstream (in the form of a NALU). This information can be ignored by a decoder that is not aware of the syntax of the SEI. On the other hand, a dedicated video decoder can read this SEI and can extract the auxiliary information as appropriate.
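As a hedged sketch of one way the auxiliary information could be embedded (the H.264 user_data_unregistered SEI payload, type 5, carries a 16-byte UUID followed by arbitrary data; emulation-prevention bytes and other bitstream details are omitted for brevity):

```python
def wrap_as_sei(payload: bytes) -> bytes:
    """Wrap auxiliary-information bytes as an H.264 SEI NAL unit
    (user_data_unregistered). A decoder unaware of the payload syntax can
    simply skip the message."""
    body = bytes(16) + payload           # 16-byte UUID (application-chosen) + data
    out = bytearray(b'\x00\x00\x00\x01') # Annex B start code
    out.append(0x06)                     # nal_unit_type 6 = SEI
    out.append(0x05)                     # payloadType 5 = user_data_unregistered
    size = len(body)
    while size >= 255:                   # payloadSize coded in 0xFF-prefixed bytes
        out.append(0xFF)
        size -= 255
    out.append(size)
    out += body
    out.append(0x80)                     # rbsp_trailing_bits
    return bytes(out)
```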
The main modules of the video client 101 are shown in box 206. The video decompression is first triggered in a decoder 212. Assuming, for this module, that the RTP packets have been successfully received, the video decompression corresponds to the extraction of the different NALUs of the bitstream and the decompression of each NALU. Two kinds of information are extracted:
The auxiliary information 213; and
The video 214 which is not related to the auxiliary information.
If RTP packets have been lost during the video transmission (e.g. packets 204), an error correction algorithm based on the motion auxiliary information is run in an auxiliary information correction module 215.
The embodiments of the present invention are particularly concerned with creating the auxiliary information 210 and 213 in both the video server 100 and the video client 101. Optimally, the transmitted auxiliary information is minimal, yet sufficient to reconstruct blocks even when the information for reconstructing those blocks has been lost in a lost packet. The embodiments of the present invention are also concerned with how the video server and the video client can use an optimal amount of auxiliary information most efficiently to obtain correctly-reconstructed blocks from the successfully-received information.
According to an embodiment of the invention, the encoder 207 in the video server 100 and the decoder 212 in the video client 101 perform the creation and use of the auxiliary information in the following way.
At the encoder (or transcoder), for a frame I(t), the processing is broadly as follows: an irregular grid is generated based on the motion of the preceding frame I(t−1); a motion vector representative of the motion of the frame I(t) is generated for (by sub-sampling onto) each cell of the grid; and the resulting motion vectors are transmitted to the decoder as auxiliary information.
By “irregular grid”, what is meant is that a motion vector field is divided in such a way that areas each defined by a motion vector vary in size over the motion vector field. The appearance of the irregular grid and the way in which it is generated, as well as how it is down-sampled, will be described in more detail below.
In motion estimation module 305, the encoder estimates the motion between the frame 303 and the frame 302. The motion estimation algorithm may be a block-matching algorithm. The result of this motion estimation is the motion vector field 306 (shown as a symbolic representation of the field calculated by the motion estimation module 305).
The motion vector field 306 may represent a frame having a size of 64×32 pixels. In the embodiment shown, the frame is composed of 8 macroblocks of 16×16 pixels each. Each of the macroblocks is potentially divisible to create the irregular grid as explained below.
According to the complexity of the motion between the frames 303 and 302, the macroblocks can be decomposed into either 8×8 pixel blocks or 4×4 pixel blocks. For example, in a first macroblock 307, one motion vector is associated with the 16×16 macroblock. In macroblock 308, one motion vector is associated with each 8×8 block (so there are four motion vectors allocated to the macroblock 308). In macroblock 309, one motion vector is associated with each 4×4 block within one of the 8×8 blocks. A larger number of motion vectors may be allocated to a block or macroblock with more complex motion. If the motion is too complex or the trade-off in terms of rate/distortion optimization is poor, no motion vector is calculated and the macroblock is encoded as an Intra macroblock. Such a case is depicted in macroblock 310.
The motion vector field 306 calculated by the video encoder is a starting point for calculating the auxiliary information related to the frame 303. This motion information can be directly obtained during the video compression operation or obtained from a partial decoding of an already-encoded video bitstream.
Auxiliary information is preferably associated with each Inter frame of the video (unless there has been no relative movement between frames and the residual has a zero value). The auxiliary information related to the frame I(t) will thus be called AI(t). As mentioned above, there is no auxiliary information accompanying Intra frames, as these are encoded without motion vectors or residuals.
Suppose that the motion vector field 306 generated for the frame 303 corresponds to the motion vector field labelled 400.
The beginning of the process for generating the auxiliary information for the frame I(t) is now described.
In step 406, the motion vector field 400 is extended. This extension attributes a motion vector to each 4×4 block of the motion vector field 306, by replicating the motion vector of an 8×8 or 16×16 block or macroblock to the corresponding 4×4 blocks within the larger block or macroblock. For example, all 4×4 blocks within the macroblock 311 will be allocated the same motion vector as macroblock 311. The extension also interpolates motion vector values for the blocks without motion vectors (e.g. block 310). For example, the missing motion vector information in 310 could be created by replicating the neighbouring motion vector 311 during the interpolation process. The skilled person would understand various ways of interpolating motion vectors for blocks that do not have their own, such as averaging the motion vectors of surrounding blocks. The extension gives rise to the extended motion vector field 401 (a sketch of this extension follows).
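A minimal sketch of this extension, assuming motion vectors are keyed by 4×4-block position and larger partitions are given by their origin and size in 4×4 units (the single-pass neighbour fill stands in for the interpolation, which the description leaves open):

```python
def extend_motion_field(coarse, h4, w4):
    """Attribute a motion vector to every 4x4 block: replicate the vector of
    each larger partition to the 4x4 blocks it covers, then fill blocks with
    no vector (e.g. intra macroblocks) from an already-filled neighbour."""
    field = {}
    for (y, x, size), mv in coarse.items():       # size in 4x4 units: 1, 2 or 4
        for yy in range(y, y + size):
            for xx in range(x, x + size):
                field[(yy, xx)] = mv              # replication
    for y in range(h4):
        for x in range(w4):
            if (y, x) not in field:               # no motion vector available
                for dy, dx in ((0, -1), (-1, 0), (0, 1), (1, 0)):
                    mv = field.get((y + dy, x + dx))
                    if mv is not None:
                        field[(y, x)] = mv        # simple neighbour replication
                        break
    return field
```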
Once a motion vector is associated with each 4×4 block, as shown in motion vector field 401, the sub-sampling of the motion vector field can begin.
Each vector (among the set of vectors of the motion vector field that are to be sub-sampled) is successively selected in step 500 within what is defined here as the first loop. The presently-selected vector is called Vref. Two linked loops are used in the sub-sampling process, and the idea is to select, from among the motion vectors, the one that has the smallest cumulative error, determined in particular in step 505.
When it is established in steps 506 and 507 that all the vectors have been tested, the sum of step 505 is compared in step 508 to the current minimum value. If this sum is lower than the minimum (yes in step 508), the reference vector Vref is selected as the sub-sampled vector in step 509 and the minimum is updated. A new value for Vref is set in step 510 (from among the set of vectors of the motion vector field that are to be sub-sampled). If, in step 508, the sum of step 505 is not less than the minimum, the process starts again with the next vector selected as Vref in step 500.
Experimental results have shown that this method for calculating a sub-sampled motion vector produces better results than the average motion vector:
(the average motion vector is the vector that minimises the sum of squared distances d(V1,V2) = (V1x − V2x)² + (V1y − V2y)²), though the latter is also a legitimate way of obtaining the sub-sampled motion vector according to an embodiment of the present invention. A sketch of the cumulative-error selection follows.
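A compact sketch of the two linked loops (the outer loop selects each candidate Vref; the inner loop accumulates its distance to every other vector; the vector with the smallest cumulative squared distance is kept):

```python
def subsample_vector(vectors):
    """Select, from the motion vectors of a cell, the one whose cumulative
    squared distance to all the others is smallest. Unlike the average, the
    result is always an actual vector of the field."""
    def d(v1, v2):
        return (v1[0] - v2[0]) ** 2 + (v1[1] - v2[1]) ** 2
    best, best_sum = None, None
    for vref in vectors:                          # first loop: candidate Vref
        total = sum(d(vref, v) for v in vectors)  # second loop: cumulative error
        if best_sum is None or total < best_sum:
            best, best_sum = vref, total
    return best
```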
The principle of the creation of the irregular grid is that the same grid can be constructed symmetrically both in the encoder and in the decoder without the grid itself being transmitted as auxiliary information. In other words, it is desirable for both the encoder and the decoder to be able to recreate the same irregular grid. Once the grid is constructed, it can be used at the encoder for extracting the motion auxiliary information, as explained with respect to step 408.
According to a preferred embodiment of the invention, the auxiliary information may have a fixed budget or threshold of bandwidth to be allocated to motion vectors. Thus, the number of motion vectors, and therefore the format of the irregular grid, may be tailored (i.e. limited) to this budget. The threshold of complexity in the complexity map for a specific size of cell of the irregular grid may thus be dictated by the total number of grid cells permitted. For instance, in a case where there is little bandwidth and therefore a small budget for motion vectors in the auxiliary information, the complexity threshold over which small cells will be formed will be higher than if a large budget is available.
In 600, the image I(t−1) is displayed; in this case, the frame I(t) is subject to slice losses. If no loss occurs on the frame I(t−1) during transmission, the same frame is available both in the encoder and in the decoder. In 601, the encoded frame I(t) is displayed.
The irregular grid 603 is constructed in step/module 602 based on the frame I(t−1) (i.e. the frame preceding the current frame containing the losses). As the frame I(t−1) is not subject to slice loss, this grid can also be constructed by the decoder, in the same way as it had been constructed by the encoder and as explained below.
Once the irregular grid is constructed at the decoder side, the received motion vectors (from the auxiliary information) corresponding to each block of the irregular grid can be allocated to the right place in the grid at the decoder.
In stage 700, the encoded frame I(t−1) is displayed. The motion vector field associated with this frame is extracted in 701. This motion vector field may be characteristic of the motion between the frame I(t−1) and the frame I(t−2), for example. The way this motion vector field is calculated is similar to the process described in steps 404, 405 and 406 above.
The frame I(t−1) is labelled 800 and is the starting point for the process. Each cell of the frame contains an associated motion vector: for example, the motion vector 801, which can be represented as V(x,y)=(Vx,Vy), is associated with the block 802. The coordinates (x,y) are taken as being the centre of the block 802.
This motion vector 801 is inverted, giving: −V(x,y)=(−Vx, −Vy). Following the direction of the inverted vector gives the position, in the subsequent frame I(t), of the area equivalent to the block 802 in frame I(t−1). The block 802 is thus projected onto the frame I(t) 803 according to this inverted motion vector 805 and results in block 804. The centre of block 804 in frame I(t) is at the position represented by (x−Vx, y−Vy).
The value of the motion vector 805 associated with this block 804 is the same value as the original uninverted motion vector, namely V(x,y)=(Vx,Vy).
As can be seen from frame I(t) labelled 803, the inversion-produced block 804 shares the largest common area with the cell 806 from among all the cells of the frame I(t). Thus, the value of the motion vector V(x,y) 805 is attributed to the cell 806, as depicted in the resultant frame 807. The same inversion and projection process is repeated for all the cells of the frame I(t−1). An example of the result of this process is shown in frame 808. After this first process, some cells have no corresponding motion vectors because the motion vector inversion process has not led to a majority overlap of the inversion-produced block with those cells. An interpolation stage 809 may thus be conducted to obtain a full motion vector field 810 for the frame I(t). The interpolation may be performed in a similar way to the interpolation described above with respect to the motion vector extension 406. (A sketch of the inversion and projection follows.)
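A hedged sketch of the inversion and projection on a 4×4 grid (the largest-common-area test is reduced here to rounding the projected block centre, and competing projections onto the same cell are resolved first-come; the description leaves both choices open):

```python
def project_inverted_vectors(prev_field, h4, w4, block=4):
    """Invert each vector of frame I(t-1), project its block onto frame I(t),
    and attribute the (uninverted) vector value to the cell of I(t) that the
    projected block mostly covers. Cells left empty are interpolated later."""
    next_field = {}
    for (y, x), (vy, vx) in prev_field.items():
        cy = y * block + block // 2 - vy   # centre of the projected block
        cx = x * block + block // 2 - vx
        cell = (cy // block, cx // block)  # cell with the largest overlap
        if 0 <= cell[0] < h4 and 0 <= cell[1] < w4:
            next_field.setdefault(cell, (vy, vx))
    return next_field
```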
In the example illustrated in this figure, the complexity map calculation consists of calculating the maximum variation of motion vector size (i.e. by measuring the variance over a plurality of 4×4 blocks) with respect to ‘adjacent’ motion vectors. By adjacent, what is meant is either the nearest neighbours (top, bottom, left, right), the nearest and next-nearest neighbours (including diagonal nearest motion vectors), or even all of the motion vectors in a single block.
The maximum variation of vector size represents the maximum motion with respect to the previous frame. A higher complexity value therefore represents a greater motion in the relevant blocks, which will, in further steps described below, give rise to a higher density of motion vectors in the motion vector field for those blocks with higher complexity values. The complexity map therefore is created in order to determine the density of motion vectors to be output from the sub-sampling step/module.
Blocks of 4×4 motion vectors are extracted from the motion vector field (such as block 710 of the motion vector field 703) in stage 701. The variances of the horizontal and vertical components of these 16 motion vectors are then calculated.
For example, the block of 4×4 motion vectors 710 is selected and the variances of the motion vectors are calculated in stage 704. The maximum of the horizontal and vertical variances is determined and associated with the corresponding block 711 in the complexity map 705. For example, the complexity of the block 711 is called C. (A sketch of this calculation follows.)
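The variance calculation just described can be sketched as follows (motion vectors keyed by 4×4-block position; one complexity value per 4×4 group of vectors; the representation is an illustrative assumption):

```python
import numpy as np

def complexity_map(field, h4, w4, group=4):
    """Complexity of each group of 4x4 motion vectors: the maximum of the
    variances of the horizontal and vertical vector components."""
    cmap = {}
    for gy in range(0, h4, group):
        for gx in range(0, w4, group):
            vys = [field[(y, x)][0] for y in range(gy, gy + group)
                                    for x in range(gx, gx + group)]
            vxs = [field[(y, x)][1] for y in range(gy, gy + group)
                                    for x in range(gx, gx + group)]
            cmap[(gy // group, gx // group)] = max(np.var(vys), np.var(vxs))
    return cmap
```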
This complexity map 705 is split into two kinds of cells 707 using the highest complexity selection step/module 706. Of course, more kinds of cells than two may be distinguished in a separate embodiment. A group of small cells (e.g. 712) corresponds to a block of 4×4 motion vectors with a high complexity value. The large cell (e.g. 713) corresponds to a block of 4×4 motion vectors with a low complexity value. The number of ‘small’ cells and ‘large’ cells depends on the number of motion vectors (to be) sent in the auxiliary information.
From the frame 705, the two 4×4 blocks are checked and the one 711 with the larger complexity will be kept as (or divided into) small cells 712 (16 cells, in the illustrated case). The second 4×4 block will be effectively combined and considered as a single large cell 713 if the complexity is low. Of course, the size of the cells can vary according to the preferences of the user.
The complexity map 707 shows the two kinds of cells that are created (small and large). In the final sub-sampling stage 708, a motion vector is associated with each cell (whatever the size of the cell).
In this example, the motion vectors 709 for the small cells are the motion vectors of the frame I(t) at the same locations. It is noted that the frame I(t−1) was used to calculate the irregular grid format, but once this grid is calculated, with smaller and larger cells, the motion vectors for each cell are calculated using the motion in the frame I(t). These motion vectors are obtained either directly from the motion vector associated with a 4×4 block or by sub-sampling larger cells that have plural motion vectors.
For the large cell 713, the generation of the single motion vector 714 using the sub-sampling step/module 708 consists of applying the sub-sampling algorithm described above.
At the decoder side, the motion vector field in the irregular grid 720 is received as the auxiliary information and this is applied to the irregular grid that is independently but symmetrically calculated at the decoder from frame I(t−1) using the same method (i.e. motion comparison with I(t−2)) as the encoder. The retrieval of the motion vectors from the auxiliary information is explained above.
The final result is a motion vector field containing cells of different sizes. This motion vector field is the auxiliary information. In other words, the number of large and small cells is shared between the server and the client so that the same irregular grid is created. The motion vector field can be compressed by an entropy encoder (e.g. arithmetic encoding). Of course, the different cells of the irregular grid need to be read in the same way both at the server and at the client. For example, a lexicographic reading adapted to the irregularity of the grid can be used (a sketch of one such ordering follows). This and other methods for transmitting vectors in a specific order, so that they can be correctly applied to the cells of the irregular grid, will be understood by the skilled person.
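One possible shared convention, reusing the cell tuples of the earlier grid sketch (any ordering works, provided server and client agree on it):

```python
def serialise_grid_vectors(cells, vectors):
    """Emit the per-cell motion vectors in lexicographic (raster) order over
    cell origins, so a decoder that rebuilt the same irregular grid reads
    them back into the same cells."""
    ordered = sorted(cells, key=lambda c: (c[1], c[2]))  # (kind, y, x, size)
    return [vectors[(c[1], c[2])] for c in ordered]
```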
The advantage of this process of creating the complexity map is to have a larger density of motion vectors in areas with high motion complexity and a lower density of motion vectors in areas with low motion complexity. This gives rise to the irregular grid of the preferred embodiments. Thus, a minimal number of motion vectors can be achieved (with those motion vectors allocated to the most appropriate blocks), which in turn reduces the amount of bandwidth required by the auxiliary information.
First, the process for calculating the irregular grid is described.
In frame 900, the frame I(t−1) is displayed. As this frame is theoretically lossless (no slice lost on this frame), it is similar to the frame 700 used by the video server during the encoding process. In the motion vector extraction step/module 901, the motion vectors associated with the frame I(t−1) are extracted. These motion vectors are then inverted and projected 902 onto the frame I(t), as described above.
The complexity calculation step 904 is run by a complexity calculation module. Once again, this process is the same as the process conducted by the video server in stage 704, described above.
Once the same irregular grid 907 is created in the client as was created in the server, the auxiliary information (i.e. the motion vectors) associated with the frame I(t) is read. Specifically, the SEI carrying the auxiliary information is read and the motion vectors are extracted in the auxiliary information extraction step/module 908. These motion vectors are inserted into the correct locations in the irregular grid 909. As mentioned above, the association of the motion vectors with the correct locations in the irregular grid is achieved by coding the motion vectors in a certain order, with specific flags, or using a lexicographic reading that associates the motion vectors with the correct positions in the irregular grid.
In frame 910, one slice of the frame I(t) is lost. Though the reconstruction of the frame is correct for the received slice, no information (i.e. prediction information and residual, or Intra-frame information) is available for the lost slice.
In step/module 911, the motion vector information corresponding to the lost slice is inserted into frame I(t). The resulting frame is shown as 912. Thus, a full frame with motion vectors associated with each block is recreated.
In the motion compensation module 913, standard motion compensation is performed on the lost part of the frame I(t) using the resulting frame 912 (i.e. using the auxiliary motion information and the previous decoded frame I(t−1) 900). The result is the frame 914 where the lost slice has been replaced by the motion compensated information. This frame can be displayed.
The skilled person may be able to think of other applications, modifications and improvements that may be applicable to the above-described embodiment. The present invention is not limited to the embodiments described above, but extends to all modifications falling within the scope of the appended claims.