Video coding systems are widely used to compress digital video signals to reduce the storage need and/or transmission bandwidth of such signals. Among the various types of video coding systems, such as block-based, wavelet-based, and object-based systems, nowadays block-based hybrid video coding systems are the most widely used and deployed. Examples of block-based video coding systems include international video coding standards such as H.261, MPEG-1, MPEG-2, H.263, H.264/AVC, and H.265/HEVC.
Exemplary systems and methods described herein provide for encoding and decoding (collectively “coding”) of video in a bitstream using filter-based prediction. In an exemplary method of coding a video in a bitstream, the video includes a plurality of frames, each frame comprising blocks of pixels. A plurality of pixels neighboring a current block are reconstructed. For each of the pixels in the current block: (i) a respective set of reconstructed pixels neighboring the current pixel is used to derive a corresponding set of filter coefficients; and (ii) the current pixel is predicted by applying a filter to a respective pattern of reconstructed pixels neighboring the current pixel, wherein the filter uses the corresponding set of derived filter coefficients. The filter may be a Wiener filter. Prediction of a pixel results in a pixel value that, in an encoding method, can be subtracted from an original pixel input value to determine a residual that is encoded in the bitstream. In a decoding method, a residual can be decoded from the bitstream and added to the predicted pixel value to obtain a reconstructed pixel that is identical to or approximates the original input pixel. Prediction methods as described herein thus improve the operation of video encoders and decoders by decreasing, in at least some implementations, the number of bits required to encode and decode video. Further benefits of exemplary prediction methods to the operation of video encoders and decoders are described in the Detailed Description.
In some embodiments, the derivation of filter coefficients is performed recursively, such that a pixel predicted using one set of filter coefficients may in turn be used in deriving a new set of filter coefficients to predict a further pixel. For example, in some embodiments, for at least one of the pixels in the current block, the respective set of neighboring pixels used to derive the corresponding set of filter coefficients includes at least one other pixel in the current block. In some embodiments at least a first pixel in the current block is reconstructed by adding a residual to the prediction of the first pixel to generate a first reconstructed pixel. The first reconstructed pixel may then be included in the set of reconstructed pixels used to derive the filter coefficients used to predict at least a second pixel in the current block. In some embodiments, for each of the pixels in the current block, the pattern of reconstructed pixels is a subset of the set of reconstructed pixels.
In some exemplary embodiments, to code a first pixel in a video, a plurality of neighboring pixels of the first pixel are reconstructed. Coefficients of a Wiener filter or other filter are then derived based on the reconstructed neighboring pixels, and the first pixel is predicted by applying the Wiener filter to at least some of the reconstructed neighboring pixels. The coefficients may be derived on a pixel-by-pixel basis or a block-by-block basis.
In some embodiments, filter-based prediction techniques as described herein are used to predict residuals. In some embodiments, the filter-based prediction techniques are combined with other prediction techniques (e.g. other intra and inter prediction techniques). For example, a composite prediction of a current block may be generated as a weighted sum of filter-based prediction of the block and an inter or intra prediction of the block.
In some embodiments, filter coefficients are signaled in the bitstream.
In some embodiments, filter-based prediction techniques are used to predict residuals that remain from prediction using other techniques, such as residuals from intra or inter prediction. One such method is provided for coding a video in a bitstream, where the video comprises a plurality of frames, each frame comprising blocks of pixels. In the exemplary method, a plurality of pixels neighboring a current block are reconstructed and residuals are determined for those neighboring reconstructed pixels. The residuals of neighboring reconstructed pixels are used to derive coefficients of a filter, and residuals of pixels in the current block are predicted by applying the filter to a respective pattern of residuals of reconstructed pixels neighboring the current pixel, where the filter uses the corresponding set of derived filter coefficients. In some embodiments, a first-order prediction of the current block is generated using intra or inter prediction, and a second-order prediction of the current block is generated by adding the predicted residuals for the current block to the first-order prediction. Second-order residuals representing the difference between the second-order prediction and the input or reconstruction signal may be signaled in the bitstream. In some embodiments, coefficients used to predict the residuals are derived on a block-by-block basis. In some embodiments, those coefficients are derived on a pixel-by-pixel basis.
Exemplary video encoders and decoders that implement the filter-based prediction are described. The present disclosure further describes bitstreams generated using the encoding techniques disclosed herein.
A more detailed understanding may be had from the following description, presented by way of example in conjunction with the accompanying drawings, which are first briefly described below.
A detailed description of illustrative embodiments will now be provided with reference to the various figures. Although this description provides detailed examples of possible implementations, it should be noted that the provided details are intended to be by way of example and in no way limit the scope of the application.
For an input video block (e.g., an MB or a CU), spatial prediction 160 and/or temporal prediction 162 may be performed. Spatial prediction (e.g., “intra prediction”) may use pixels from already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction may reduce spatial redundancy inherent in the video signal. Temporal prediction (e.g., “inter prediction” or “motion compensated prediction”) may use pixels from already coded video pictures (e.g., which may be referred to as “reference pictures”) to predict the current video block. Temporal prediction may reduce temporal redundancy inherent in the video signal. A temporal prediction signal for a video block may be signaled by one or more motion vectors, which may indicate the amount and/or the direction of motion between the current block and its prediction block in the reference picture. If multiple reference pictures are supported (e.g., as may be the case for H.264/AVC and/or HEVC), then for a video block, its reference picture index may be sent. The reference picture index may be used to identify from which reference picture in a reference picture store 164 the temporal prediction signal comes.
The mode decision block 180 in the encoder may select a prediction mode, for example, after spatial and/or temporal prediction. The prediction block may be subtracted from the current video block at 116. The prediction residual may be transformed 104 and/or quantized 106. The quantized residual coefficients may be inverse quantized 110 and/or inverse transformed 112 to form the reconstructed residual, which may be added back to the prediction block 126 to form the reconstructed video block.
In-loop filtering (e.g., a deblocking filter, a sample adaptive offset, an adaptive loop filter, and/or the like) may be applied 166 to the reconstructed video block before it is put in the reference picture store 164 and/or used to code future video blocks. The video encoder 100 may output an output video bitstream 120. To form the output video bitstream 120, a coding mode (e.g., inter prediction mode or intra prediction mode), prediction mode information, motion information, and/or quantized residual coefficients may be sent to the entropy coding unit 108 to be compressed and/or packed to form the bitstream. The reference picture store 164 may be referred to as a decoded picture buffer (DPB).
The residual transform coefficients may be sent to an inverse quantization unit 210 and an inverse transform unit 212 to reconstruct the residual block. The prediction block and the residual block may be added together at 226. The reconstructed block may go through in-loop filtering 266 before it is stored in reference picture store 264. The reconstructed video in the reference picture store 264 may be used to drive a display device and/or used to predict future video blocks. The video decoder 200 may output a reconstructed video signal 220. The reference picture store 264 may also be referred to as a decoded picture buffer (DPB).
A video encoder and/or decoder (e.g., video encoder 100 or video decoder 200) may perform spatial prediction (e.g., which may be referred to as intra prediction). Spatial prediction may be performed by predicting from already coded neighboring pixels following one of a plurality of prediction directions (e.g., which may be referred to as directional intra prediction).
Intra coding is used to reduce spatial correlation in most image and video coding standards, such as JPEG, H.261, MPEG-1, MPEG-2, H.263, H.264/AVC, and H.265/HEVC. Directional intra prediction is used in H.264/AVC and H.265/HEVC to improve coding efficiency. The intra prediction modes utilize a set of reference samples from the row immediately above and the column immediately to the left of the current block to be predicted. In the following sections, the reference samples are denoted by R_{x,y}, with (x, y) having its origin one pixel above and one pixel to the left of the block's top-left corner. Similarly, P_{x,y} is used to denote a predicted sample value at a position (x, y).
Spatial prediction may be performed on video blocks of various sizes and/or shapes. Spatial prediction of a luma component of a video signal may be performed, for example, on block sizes of 4×4, 8×8, and 16×16 pixels (e.g., in H.264/AVC). Spatial prediction of a chroma component of a video signal may be performed, for example, on a block size of 8×8 (e.g., in H.264/AVC). For a luma block of size 4×4 or 8×8, a total of nine prediction modes may be supported, for example, eight directional prediction modes and the DC mode (e.g., in H.264/AVC). For a luma block of size 16×16, four prediction modes may be supported: horizontal, vertical, DC, and planar prediction.
Furthermore, directional intra prediction modes and non-directional prediction modes may be supported.
Non-directional intra prediction modes may be supported (e.g., in H.264/AVC, HEVC, or the like), for example, in addition to directional intra prediction. Non-directional intra prediction modes may include the DC mode and/or the planar mode. For the DC mode, a prediction value may be obtained by averaging the available neighboring pixels and the prediction value may be applied to the entire block uniformly. For the planar mode, linear interpolation may be used to predict smooth regions with slow transitions. H.264/AVC may allow for use of the planar mode for 16×16 luma blocks and chroma blocks.
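As a rough illustration of these two non-directional modes, the following sketch (assuming HEVC-style reference arrays `top` and `left` of length n+1, with a power-of-two block size n; function and argument names are illustrative, not drawn from any specification text) computes DC and planar predictions:

```python
import math

def dc_predict(top, left, n):
    """DC mode: average the 2n neighboring reference samples and apply the
    result uniformly across the whole n x n block."""
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> (int(math.log2(n)) + 1)
    return [[dc] * n for _ in range(n)]

def planar_predict(top, left, n):
    """Planar mode (HEVC-style): average a horizontal and a vertical linear
    interpolation to model smooth regions with slow transitions."""
    tr, bl = top[n], left[n]              # top-right and bottom-left references
    shift = int(math.log2(n)) + 1
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            hor = (n - 1 - x) * left[y] + (x + 1) * tr
            ver = (n - 1 - y) * top[x] + (y + 1) * bl
            pred[y][x] = (hor + ver + n) >> shift
    return pred
```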
An encoder (e.g., the encoder 100) may perform a mode decision (e.g., at block 180 in
HEVC intra coding supports two types of prediction unit (PU) division, PART_2N×2N and PART_N×N, splitting a coding unit (CU) into one or four equal size PUs, respectively.
For the 4:2:0 chroma format, an 8×8 CU that is split into four 4×4 PUs has four luma prediction blocks (PBs) but only one 4×4 PB per chroma channel for the 8×8 intra coded block, to avoid the throughput impact that 2×2 chroma intra prediction blocks would cause.
When a CU is split into multiple transform units (TUs), intra prediction may be applied to each TU sequentially in quad-tree Z scanning order, instead of applying intra prediction at the PU level. This allows the use of neighboring reference samples from previously reconstructed TUs that are closer to the samples of the current TU being coded.
In exemplary embodiments, each predicted sample P_{x,y} may be obtained by projecting its location onto a reference row or column of pixels, applying the selected prediction direction, and interpolating a predicted value for the sample at 1/32-pixel accuracy. Interpolation in some embodiments is performed linearly utilizing the two closest reference samples R_{i,0} and R_{i+1,0} for vertical prediction (modes 18-34 as shown in
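For illustration, a minimal sketch of this 1/32-pel linear interpolation for one row of a vertical-direction mode (assuming a 1-D array `ref` of reference samples indexed from the projection origin and an `angle` expressed in 1/32-pel units per row; names are illustrative, and a non-negative angle is assumed for simplicity):

```python
def angular_predict_row(ref, y, angle, n):
    """Predict one row of an n-wide block for a vertical angular mode by
    projecting the row onto the top reference samples and linearly
    interpolating between the two closest samples at 1/32-pel accuracy."""
    pos = (y + 1) * angle       # projected displacement in 1/32-pel units
    i = pos >> 5                # integer sample offset
    f = pos & 31                # fractional offset in [0, 31]
    # Weighted average of the two closest reference samples, with rounding.
    return [((32 - f) * ref[i + x] + f * ref[i + x + 1] + 16) >> 5
            for x in range(n)]
```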
HEVC supports a total of 33 angular prediction modes, as well as planar and DC prediction, for luma intra prediction for all PU sizes. HEVC defines three most probable modes (MPMs) for each PU based on the modes of the top and left neighboring PUs. First, a flag is coded to indicate whether the prediction mode of the current block is in the MPM set. If the current intra prediction mode equals one of the MPMs, only its index in the MPM set is transmitted to the decoder. Otherwise (the current prediction mode is not one of the MPM modes), a 5-bit fixed-length code is used to signal the selected mode outside the MPM set.
A 3-tap smoothing filter may be applied to all reference samples when smoothing is enabled (e.g., when intra_smoothing_disabled_flag is set to 0 in HEVC). The filtering may be further controlled by the given intra prediction mode and the transform block size. For 32×32 blocks, all angular modes except horizontal and vertical may use filtered reference samples. For 16×16 blocks, the modes not using filtered reference samples are extended to include the four modes (9, 11, 25, 27) (see directions in
For PBs associated with the chroma component, the intra prediction mode may be specified as planar, DC, horizontal, vertical, or 'DM CHROMA' mode. Table 1 shows an exemplary rule specifying the chroma component intra prediction mode given the corresponding luma PB intra prediction mode and the intra_chroma_pred_mode syntax element. In some cases, the intra prediction direction of the chroma component is mapped to diagonal mode '34'.
When the DM CHROMA mode is selected and the 4:2:2 chroma format is in use, the intra prediction mode for a chroma PB is derived from the intra prediction mode of the corresponding luma PB, as specified in Table 2.
When reconstructing intra-predicted transform blocks (TBs), an intra-boundary filter may be used to filter the predicted luma samples along the left and/or top edges of the TB for PBs using horizontal, vertical and DC intra prediction modes, as shown in
In an exemplary embodiment, the intra boundary filter may be defined with respect to an array of predicted samples p as input and predSamples as output as follows. For horizontal intra-prediction applied to luma transform blocks of size (nTbS) less than 32×32, with boundary filtering enabled (disableIntraBoundaryFilter equal to 0), the following filtering applies with x = 0 . . . nTbS−1, y = 0:
predSamples_{x,0} = Clip1Y(P_{−1,0} + ((P_{x,−1} − P_{−1,−1}) >> 1)) (3)
For vertical intra-prediction applied to luma transform blocks of size (nTbS) less than 32×32, with disableIntraBoundaryFilter equal to 0, the following filtering applies with x = 0, y = 0 . . . nTbS−1:
predSamples_{0,y} = Clip1Y(P_{0,−1} + ((P_{−1,y} − P_{−1,−1}) >> 1)) (4)
For DC intra-prediction applied to luma transform blocks of size (nTbS) less than 32×32, the following filtering applies, where dcVal is the DC predictor:
predSamples_{0,0} = (P_{−1,0} + 2*dcVal + P_{0,−1} + 2) >> 2 (5)
predSamples_{x,0} = (P_{x,−1} + 3*dcVal + 2) >> 2, with x = 1 . . . nTbS−1 (6)
predSamples_{0,y} = (P_{−1,y} + 3*dcVal + 2) >> 2, with y = 1 . . . nTbS−1 (7)
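A sketch of the DC-mode boundary filtering of Equations (5)-(7), assuming row-major `pred[y][x]` indexing with `top[x]` holding P_{x,−1} and `left[y]` holding P_{−1,y} (array and function names are illustrative):

```python
def dc_boundary_filter(pred, top, left, dc_val, n):
    """Smooth the first row and first column of a DC-predicted n x n luma
    block against the reference samples, per Eqs. (5)-(7)."""
    pred[0][0] = (left[0] + 2 * dc_val + top[0] + 2) >> 2       # Eq. (5)
    for x in range(1, n):
        pred[0][x] = (top[x] + 3 * dc_val + 2) >> 2             # Eq. (6)
    for y in range(1, n):
        pred[y][0] = (left[y] + 3 * dc_val + 2) >> 2            # Eq. (7)
    return pred
```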
The average improvement provided by the boundary smoothing is measured to be 0.4% in terms of Bjøntegaard Delta (BD) rate saving. The intra boundary filter is only applied on the luma component as the prediction for chroma components tends to be very smooth.
For intra mode residual coding, HEVC utilizes intra mode dependent transforms and coefficient scanning to code the residual information. A discrete sine transform (DST) is selected for 4×4 luma blocks, and the discrete cosine transform (DCT) is used for all other types of blocks.
During HEVC development, a linear-model (LM) based chroma intra prediction method was proposed. In this LM chroma prediction method, chroma samples are predicted from the collocated reconstructed luma samples using a linear model (LM) as follows:
Pred_C[x,y] = α·Rec_L[x,y] + β, (8)
where Pred_C indicates the predicted chroma samples in a block and Rec_L indicates the corresponding reconstructed luma samples in the block. The parameters α and β are derived from causal reconstructed luma and chroma samples around the current block and thus do not need to be signaled.
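A floating-point sketch of how α and β can be fit by least squares from the causal neighbors (actual codec implementations use integer arithmetic; the function and argument names here are illustrative):

```python
import numpy as np

def derive_lm_params(rec_luma_nb, rec_chroma_nb):
    """Fit alpha and beta of Eq. (8) from pairs of causal reconstructed
    luma/chroma neighbor samples; both encoder and decoder can repeat this
    derivation, so the parameters need not be signaled."""
    x = np.asarray(rec_luma_nb, dtype=np.float64)
    y = np.asarray(rec_chroma_nb, dtype=np.float64)
    n = x.size
    denom = n * np.dot(x, x) - x.sum() ** 2
    if denom == 0:                      # flat luma neighborhood: offset only
        return 0.0, float(y.mean())
    alpha = (n * np.dot(x, y) - x.sum() * y.sum()) / denom
    beta = (y.sum() - alpha * x.sum()) / n
    return alpha, beta
```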
Experimental results show average BD-rate reductions for the Y, Cb, and Cr components of 1.3%, 6.5%, and 5.5%, respectively, in the All Intra test configuration. However, for the low delay and random access configurations, the coding gain from this tool is smaller, since those configurations have fewer intra coded blocks.
Inter coding may be used to reduce temporal redundancy. Compared with intra coding, HEVC supports more PU partition sizes. In addition to the PART_2N×2N and PART_N×N partitions supported by intra prediction, inter-picture prediction also supports PART_2N×N and PART_N×2N, as well as four asymmetric motion partitions, PART_2N×nU, PART_2N×nD, PART_nL×2N, and PART_nR×2N, as shown in
In an exemplary embodiment, each inter-predicted PU has a set of motion parameters consisting of one or two motion vectors and reference picture indices. A P slice uses only one reference picture list, and a B slice may use two reference picture lists. The inter-prediction samples of a PB are obtained from a corresponding block region in the reference picture identified by the reference picture index, at a position displaced by the horizontal and vertical components of the motion vector (MV).
Motion vector prediction exploits the spatio-temporal correlation of motion vectors with those of neighboring PUs. To reduce motion vector signaling cost, HEVC provides a merge mode. In merge mode, a list of motion vector candidates drawn from neighboring PU positions (spatial neighbors and/or temporal neighbors) and/or zero vectors is assembled into the merge candidate list. The encoder selects the best predictor from the merge candidate list and transmits the corresponding index indicating the chosen candidate. The decoder then reuses the motion parameters of the merge candidate signaled by the encoder.
The HEVC encoder can be configured to encode a video losslessly by operating in the transquant bypass mode. In this mode, the transform and quantization steps and their associated inverse processes are bypassed at both the encoder and decoder. The mode may be enabled by setting the Picture Parameter Set (PPS) syntax element transquant_bypass_enabled_flag to one, which specifies that cu_transquant_bypass_flag is present. Setting cu_transquant_bypass_flag to one enforces that the quantization, transform, and in-loop filter processes are bypassed at both the encoder and decoder. In the present disclosure, the transquant bypass mode is referred to as the lossless coding mode.
Hybrid video coding standards, including HEVC, can code blocks within a frame using either an intra mode that exploits spatial redundancies or an inter mode that exploits temporal redundancies. In this disclosure, prediction systems and methods are described for intra and inter coding that demonstrate improved performance over existing methods. The exemplary systems and methods may be used with both lossy and lossless video coding.
The reference codec software for the exploration of next-generation video coding technologies, the JEM-1.0 software, includes 67 different intra prediction modes, which consist of 65 angular modes and two non-angular modes. These intra modes primarily use reconstructed pixels located in the left column and top row of the current block to generate a block of prediction pixels. Although this method is effective for predicting pixels that are close to the left and top boundaries, prediction suffers for pixels located further away from these boundaries. An example of this problem is illustrated in
In an exemplary embodiment, an intra prediction mode is provided that predicts pixels using the nearest neighboring reconstructed pixels. Furthermore, exemplary prediction modes use a filter that has been derived using the neighboring reconstructed pixels. Exemplary embodiments therefore provide enhanced prediction by adapting to local image characteristics.
In another exemplary embodiment, systems and methods are provided for inter coding to improve motion compensated prediction. In such an embodiment, a filter is applied whose coefficients have been derived to minimize the mean squared error between the prediction and the reconstructed signal. In lossless coding, the reconstructed signal is equal to the original signal. Different techniques are described herein for deriving the filter coefficients, either offline or online during the encoding/decoding process. By improving the motion compensated prediction, systems and methods described herein may improve the overall performance of inter coding.
In exemplary embodiments, nearest neighboring prediction is used to improve both intra and inter prediction in video coding. Systems and methods described herein use a filter derived from neighboring reconstructed pixels. In some embodiments, a Wiener filter (WF) is used for prediction. In some embodiments, as an alternative to the Wiener filter, other linear or nonlinear prediction filtering techniques may be used, such as least mean squares filtering or non-local mean filtering. The Wiener filter is a signal processing technique designed to yield predictions that minimize the mean squared error with respect to the actual signal. In some exemplary embodiments, there is no need to signal the filter coefficients to the decoder, because the filter coefficients are derived from neighboring reconstructed pixels. Prediction methods described herein can be applied to components of different color spaces, such as YCbCr, RGB, and YCoCg.
Intra Prediction using Wiener Filter.
In one embodiment of intra prediction described herein, pixel-based adaptive prediction is used, where the Wiener filter coefficients adapt on a pixel-by-pixel basis. In another embodiment, block-based adaptive prediction is used, where the Wiener filter coefficients are fixed across an entire block of pixels but may vary from block to block. In these exemplary embodiments, the coefficients (e.g., Wiener filter coefficients) are derived at the encoder and decoder using reconstructed neighboring pixels. Since these pixels are not available for blocks along the frame boundary, in some embodiments the use of this prediction mode may be disabled for transform units (TUs) on the frame boundary.
Pixel-Based Adaptive Prediction using Wiener Filtering.
Exemplary embodiments disclosed herein include systems and methods for encoding and decoding video using a filter-based intra prediction mode (filter prediction mode). Details of exemplary filter prediction algorithms are described below.
In an exemplary embodiment, an encoding process is described for a filter prediction mode. For the sake of clarity, the process is described with reference to a 4×4 TU block, although the methods described are not limited to that block size. In contrast to other intra modes, the filter prediction mode may operate on a pixel-by-pixel basis, in one of the scan orders shown in
r_i = x_i − p_i. (9)
These residuals within a TU may then be coded using an entropy coding technique such as Context-Adaptive Binary Arithmetic Coding (CABAC) and embedded into the bitstream.
In an embodiment where the encoding process does not consider quantization or transform, the reconstructed pixel derived at the encoder will be x̃_i = p_i + r_i = x_i. (10)
The reconstructed pixel is stored in the buffer and is made available for Wiener filter prediction of subsequent pixels. An exemplary decoding process used in filter prediction mode is illustrated in
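Putting the above encoding steps together, a minimal lossless-case sketch (no transform or quantization), with the coefficient-derivation and filtering steps abstracted as callables since they are detailed below; all names are illustrative:

```python
def encode_block_filter_mode(block, recon, derive_coeffs, apply_filter):
    """Pixel-by-pixel filter-mode encoding of an n x n block per Eqs. (9)-(10).
    `recon` maps (i, j) -> reconstructed value and already contains the
    causal neighbors of the block; it is updated as pixels are reconstructed
    so that later pixels can be predicted from earlier ones."""
    n = len(block)
    residuals = [[0] * n for _ in range(n)]
    for i in range(n):                # raster scan; other scan orders possible
        for j in range(n):
            w = derive_coeffs(recon, i, j)       # from reconstructed neighbors
            p = apply_filter(recon, i, j, w)     # filter-based prediction
            residuals[i][j] = block[i][j] - p    # Eq. (9), entropy coded later
            recon[(i, j)] = p + residuals[i][j]  # Eq. (10): equals block[i][j]
    return residuals
```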
The following notation is used in this description of filter-based prediction, such as Wiener filter prediction. Let x_{i,j} correspond to the original input pixel at spatial location (i,j), x̂_{i,j} denote its corresponding prediction pixel, and x̃_{i,j} denote its reconstructed pixel. For a given input pixel x_{i,j}, for example shown in
A filter pattern may be adapted on a pixel-by-pixel basis. The choice of filter pattern for a given pixel may be based on the availability of associated neighboring reconstructed pixels, characteristics of the neighboring reconstructed pixels (e.g., edge gradient and direction), etc. The choice may be inferred, for example, by setting rules for choosing a filter pattern given available reconstructed pixels, or a filter pattern index may be signaled in the bitstream.
Consider a situation in which it is desirable to predict pixel x_{k,l} using the associated neighboring reconstructed pixels selected by the choice of filter pattern. For example, when the filter pattern in
x̂_{k,l} = w_0·x̃_{k−1,l} + w_1·x̃_{k−1,l−1} + w_2·x̃_{k,l−1} + w_3·x̃_{k+1,l−1}, (11)
where W = [w_0, . . . , w_3] are the filter coefficients. Let X denote a vector of input pixels and X̂ its associated predictions. The coefficients W may then be derived so as to minimize the mean square error (MSE) between the original pixels and the prediction pixels:
W* = argmin_W ||X − X̂||². (12)
The Wiener filter offers a solution to the above equation. One approach to deriving the filter coefficients is as follows. A linear equation is constructed for a training pixel x̂_{k,l} using its neighboring reconstructed pixels, for example, as shown in
x̂_{k,l} = w_0·x̃_{k−1,l} + w_1·x̃_{k−1,l−1} + w_2·x̃_{k,l−1} + w_3·x̃_{k+1,l−1}. (13)
A linear equation is constructed for each of the N = 8 training pixels, and the resulting system of linear equations can be written in matrix form as ŝ = ZW, where z_k, the k-th row of the matrix Z, is the vector of neighboring reconstructed pixels associated with training pixel s_k. The coefficient vector W may be estimated by minimizing the mean squared error between the predicted samples ŝ_k and the actual samples s_k. The least squares solution for the coefficients W is given by
W = R_zz^{−1}·r_zs, (17)
where R_zz is an estimate of the covariance of z_k, defined as R_zz = (1/N)·Σ_{k=1}^{N} z_k^T z_k, (18)
and r_zs is an estimate of the cross-correlation between z_k and s_k, defined as r_zs = (1/N)·Σ_{k=1}^{N} z_k^T s_k. (19)
The matrix inverse in Equation (17) can be computed using Gaussian elimination, which involves floating-point operations. Alternatively, the matrix inverse can be computed using a fixed-point implementation, or other techniques may be used to invert the matrix R_zz.
A unique solution for the coefficients W exists if the matrix R_zz is invertible, or equivalently if the determinant of R_zz is nonzero. However, for sufficiently small values of the determinant, the solution for W may be unstable. Therefore, in exemplary embodiments, a threshold is applied to the determinant such that the coefficients in Equation (17) are used only when the following condition is true:
|det(R_zz)| > thresh, (20)
where thresh = 2^{(bitdepth−8)+1}, bitdepth is the internal bit depth used for encoding and decoding (bitdepth ≥ 8), |·| denotes the absolute value function, and det(·) denotes the matrix determinant. Other threshold values may alternatively be used.
It has been observed that, in some cases, even where the condition in Equation (20) is satisfied, the derived coefficients have very large values that result in poor prediction. To avoid this, in some embodiments, the mean of the absolute values of the coefficients is computed, and the coefficients are used for prediction only if this mean absolute value is no greater than a threshold thresh_2, as shown below:
(1/4)·Σ_{k=0}^{3} |w_k| ≤ thresh_2 (21)
Through experimentation, it has been found that setting thresh_2 equal to 4 yields good results. Other threshold values may alternatively be used. If either condition in Equation (20) or (21) is false, coefficients w_k = 1/4 may be used, which amounts to simple averaging, similar to DC prediction. Otherwise, the solution from Equation (17) is used.
The predicted pixel may be computed as follows:
x̂_{i,j} = w_0·x̃_{i−1,j} + w_1·x̃_{i−1,j−1} + w_2·x̃_{i,j−1} + w_3·x̃_{i+1,j−1}. (22)
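Combining Equations (17)-(22), a floating-point sketch of the coefficient derivation and pixel prediction (here `Z` stacks the four neighboring reconstructed pixels of each training pixel, `s` holds the training pixel values, and `recon` is the reconstructed-pixel map used above; function names are illustrative):

```python
import numpy as np

def derive_wiener_coeffs(Z, s, bitdepth=8, thresh_2=4.0):
    """Least-squares solution of Eq. (17) with the stability checks of
    Eqs. (20) and (21); falls back to simple averaging (w_k = 1/4)."""
    Z = np.asarray(Z, dtype=np.float64)
    s = np.asarray(s, dtype=np.float64)
    N = Z.shape[0]                                  # e.g. N = 8 training pixels
    Rzz = Z.T @ Z / N                               # covariance estimate, Eq. (18)
    rzs = Z.T @ s / N                               # cross-correlation, Eq. (19)
    thresh = 2.0 ** ((bitdepth - 8) + 1)            # determinant threshold
    if abs(np.linalg.det(Rzz)) <= thresh:           # Eq. (20) not satisfied
        return np.full(4, 0.25)
    W = np.linalg.solve(Rzz, rzs)                   # Eq. (17)
    if np.mean(np.abs(W)) > thresh_2:               # Eq. (21) not satisfied
        return np.full(4, 0.25)
    return W

def predict_pixel(recon, i, j, W):
    """Eq. (22): apply the causal 4-tap pattern of reconstructed pixels."""
    taps = [recon[(i - 1, j)], recon[(i - 1, j - 1)],
            recon[(i, j - 1)], recon[(i + 1, j - 1)]]
    return float(np.dot(W, taps))
```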
The Wiener filter coefficients derived above are floating-point values; fixed-point coefficients can also be derived. In one embodiment, the approach described in Chia-Yang Tsai et al., "Adaptive loop filtering for video coding," IEEE Journal of Selected Topics in Signal Processing, Vol. 7, No. 6, pp. 934-945, December 2013, for adaptive loop filtering is used to derive the fixed-point coefficients. In this approach, the coefficients are quantized to provide 8-bit precision for the fractional part. Constraints may be imposed on the range of filter coefficient values so that each coefficient can be represented within 10 bits.
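A sketch of such a fixed-point conversion, assuming 8 fractional bits and a signed 10-bit range (the exact quantization rule in the cited work differs in detail; names are illustrative):

```python
def quantize_coeffs(W, frac_bits=8, total_bits=10):
    """Scale floating-point coefficients to fixed point with `frac_bits`
    fractional bits and clamp them to a signed `total_bits` range."""
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(w * (1 << frac_bits)))) for w in W]

def apply_fixed_point(taps, Wq, frac_bits=8):
    """Filter with fixed-point coefficients; the rounding offset restores
    integer precision after the fractional-bit scaling."""
    acc = sum(w * t for w, t in zip(Wq, taps))
    return (acc + (1 << (frac_bits - 1))) >> frac_bits
```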
The exemplary filter prediction mode described above generates Wiener filter coefficients for each pixel, and therefore the Wiener filter is pixel adaptive. In some embodiments, one set of Wiener filter coefficients is used uniformly across all pixels within a block. This approach reduces the computational complexity both at the encoder and decoder, since it is not necessary to estimate coefficients for each pixel.
A block diagram illustrating an exemplary encoding process is provided in
The Wiener filter coefficients may be estimated offline or online. In an online approach, the coefficients are estimated using neighboring reconstructed pixels above and to the left of the current TU. During coefficient estimation, a greater number of training pixels may be considered than in the pixel-based adaptive scheme (e.g., N > 8), since the coefficients are estimated only once for an entire block. This requires that a greater number of neighboring reconstructed pixels be available.
In an offline approach, the Wiener filter coefficients are estimated offline, categorized based on certain criteria, and stored in lookup tables (LUTs). During filter based prediction, appropriate coefficients are chosen from the LUT for a TU based on the category to which it belongs. Described below are different categorizations of coefficients and the associated processes of selecting coefficients from the LUT during WF prediction.
Some embodiments make use of block size-based categorization. In this approach, Wiener coefficients are estimated for each valid size of TU. During the filter based prediction process, appropriate coefficients are fetched from the LUT based on the current TU size.
Some embodiments make use of intra mode-based categorization. In this approach, filter based prediction is not treated as a separate intra prediction mode, but rather accompanies an existing intra mode. Wiener filter coefficients are estimated offline for each intra mode. In the encoding process, the intra prediction modes are first applied to the given TU and the best mode is found based on the rate-distortion (RD) cost. Appropriate Wiener filter coefficients are selected from the LUT using the best intra mode, and those coefficients are used for filter based prediction operating on neighboring reconstructed pixels. If the RD cost is smaller when using the filter based mode, then the encoder may signal a flag, WF flag, equal to one in the bitstream; otherwise a zero is signaled. The signaling for the best intra mode remains unchanged. In the decoding process, if the WF flag is equal to one, then the decoder uses the signaled intra mode to select the Wiener filter coefficients from the LUT for predicting the TU. In another embodiment, the intra modes of the neighboring PUs are used to select the Wiener filter coefficients from the LUT. The WF flag is still signaled in the bitstream to indicate the use of WF prediction to the decoder.
Some embodiments make use of block size and intra mode-based Wiener filter weights. Such embodiments may be understood as a combination of block size-based categorization and intra mode-based categorization. Wiener filter coefficients are estimated for different block sizes and intra modes, and stored in a LUT. During WF prediction, the coefficients are chosen based on both the TU size and the intra mode.
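A sketch of such a LUT-based selection (the coefficient values and table layout here are purely illustrative placeholders, not trained values):

```python
# Hypothetical offline-trained table: (tu_size, intra_mode) -> coefficients.
WF_LUT = {
    (4, 6):  [0.30, 0.20, 0.30, 0.20],   # illustrative values only
    (8, 6):  [0.28, 0.22, 0.28, 0.22],
    (16, 6): [0.26, 0.24, 0.26, 0.24],
}

def select_coeffs(tu_size, intra_mode):
    """Fetch Wiener coefficients by TU size and intra mode, falling back to
    simple averaging when no entry exists for the category."""
    return WF_LUT.get((tu_size, intra_mode), [0.25, 0.25, 0.25, 0.25])
```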
The prediction block P is subtracted from the input block X to generate a block of residuals R, which are then entropy coded. The decoding process is illustrated in
The Joint Exploration Model JEM-1.0 considers 67 different intra modes. Among them, 65 are angular modes, and the remaining two are the DC and planar modes. Since the most frequently occurring modes are the non-angular modes and the horizontal and vertical modes, in some embodiments, one of the angular modes (preferably not a frequently occurring mode) is replaced with a filter based prediction mode. In one embodiment, JEM intra mode number 6 (or HEVC mode 4) is replaced by the filter based prediction mode, since mode 6 is close to a 45-degree angle, as illustrated in
Combining WF Mode with other Intra Modes.
A typical intra angular prediction mode performs well when a block contains pure directional edges. However, natural video often contains angular edges with textural variation, e.g., where the edge itself is textured, there is an illumination variation in the edge region, there is camera noise, or there are other weak edges in different directions. Since Wiener filter prediction adapts well to local pixel variation, a combination of Wiener filter prediction with an intra angular prediction mode may provide improved prediction for angular edges with textural variations.
Two exemplary techniques for combining filter based prediction with intra prediction are described in detail, namely (1) the use of a weighted average between filter-based prediction and another intra prediction, and (2) the use of filter-based prediction in tandem with another intra prediction. The filter based prediction used in these techniques may be either pixel-based adaptive or block-based adaptive. The effects of quantization and transform are again excluded from the following discussion for ease of explanation.
Weighted Averaging of WF Prediction with Other Intra Prediction.
In the methods described above, the filter based prediction mode is generally used as a separate intra prediction mode. In alternative embodiments, a prediction block may be generated as a weighted average between the filter based prediction P_WF and an existing intra mode prediction P_intra, as follows:
P = g·P_intra + (1 − g)·P_WF (23)
An exemplary block diagram of this scheme is provided in
Using WF Prediction in Tandem with other Intra Prediction.
In some embodiments, prediction generated by an existing intra mode is improved by applying filter based prediction to it, as shown in
Since the WF prediction is applied on an intra prediction block, pixels surrounding the to-be-filtered pixel location can be used by the Wiener filter. For example, filter patterns such as square and diamond shaped filters can be used, as shown by square pattern (
In cases where the Wiener filter uses only prediction signals, there is no dependency between the Wiener filter and the pixel reconstruction process. However, the Wiener filter prediction process can be improved if the Wiener filter is applied to reconstructed pixels. Nevertheless, this calls for the pixel reconstruction to be performed pixel by pixel in the coding order, before the Wiener filter is applied. Suppose the coding order is the horizontal direction. The samples used in Wiener filtering the top and left neighboring positions of the pixel to be filtered (e.g. p1, p2, p3 and p4 in the pattern of
Various techniques may be used to improve motion compensated prediction (MCP) using Wiener filtering, thereby improving the performance of inter frame coding. Three such techniques are described in detail below, namely (1) using implicit signaling, (2) using explicit signaling, and (3) combining filter-based prediction with inter prediction. In the case of intra coding, the Wiener filtering pattern is generally restricted to operating on reconstructed pixels that are above or to the left of the pixel-of-interest position. However, in filter based prediction mode for inter coding, it is possible to operate on all MCP pixels surrounding and including the pixel-of-interest position. The use of Wiener filtering for inter coding exploits the correlation of all neighboring pixels for improved prediction. For ease of explanation, the schemes presented here do not describe quantization and transform steps.
Improving Motion Compensated Prediction using Wiener Filtering with Implicit Signaling.
In some embodiments, Wiener filter coefficients are derived from the neighboring reconstructed pixels and the associated motion compensated prediction. A block diagram of such a method is provided in
Consider, for example, an instance where it is desirable to predict a sample x̂_b(t,i,j) of a neighboring block 'b' by applying a filter to its motion compensated prediction pixels as follows:
x̂_b(t,i,j) = w_0·x̃_b(t−n,i−1,j−1) + w_1·x̃_b(t−n,i,j−1) + w_2·x̃_b(t−n,i+1,j−1) + w_3·x̃_b(t−n,i−1,j) + w_4·x̃_b(t−n,i,j) + w_5·x̃_b(t−n,i+1,j) + w_6·x̃_b(t−n,i−1,j+1) + w_7·x̃_b(t−n,i,j+1) + w_8·x̃_b(t−n,i+1,j+1), (24)
It is desired to determine the coefficients W = [w_0, . . . , w_8] that minimize the MSE between the reconstructed pixels x̃_b(t) and the prediction pixels x̂_b(t):
W* = argmin_W E[(x̃_b(t) − x̂_b(t))²]. (25)
The Wiener filter is the solution to the above equation. A procedure such as that described in greater detail above may be used to derive the Wiener filter coefficients. Once the coefficients are derived, they may be used for predicting the reconstructed pixels of the current block, x̃_a(t), as follows:
x̂_a(t,i,j) = w_0·x̃_a(t−n,p−1,q−1) + w_1·x̃_a(t−n,p,q−1) + w_2·x̃_a(t−n,p+1,q−1) + w_3·x̃_a(t−n,p−1,q) + w_4·x̃_a(t−n,p,q) + w_5·x̃_a(t−n,p+1,q) + w_6·x̃_a(t−n,p−1,q+1) + w_7·x̃_a(t−n,p,q+1) + w_8·x̃_a(t−n,p+1,q+1), (26)
where the spatial indices (p,q) = (i+mv_x, j+mv_y), and (mv_x, mv_y) are the displacements due to motion along the x and y axes, respectively. These prediction pixels are used further in the encoding process for generating the residual block, etc. It should be noted that where lossless coding is performed, the reconstructed pixels are identical to the original pixels, i.e., x̃(t) = x(t), but this distinction in notation is retained for clarity of explanation.
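A sketch of this implicit-signaling scheme per Equations (24)-(26), using a neighboring block's reconstruction and its motion compensated prediction as training data (2-D NumPy arrays padded by one sample so the 3×3 window is defined everywhere; function and argument names are illustrative):

```python
import numpy as np

def windows_3x3(img):
    """Stack the 9-tap neighborhoods of all interior samples as matrix rows."""
    h, w = img.shape
    return np.asarray([img[i-1:i+2, j-1:j+2].ravel()
                       for i in range(1, h - 1) for j in range(1, w - 1)],
                      dtype=np.float64)

def implicit_mcp_filter(recon_nb, mcp_nb, mcp_cur):
    """Derive 3x3 Wiener coefficients mapping a neighboring block's MCP to
    its reconstruction (Eqs. (24)-(25)), then filter the current block's MCP
    (Eq. (26)). The decoder can repeat the same derivation, so no
    coefficients need to be signaled."""
    Z = windows_3x3(mcp_nb)                                # training inputs
    s = recon_nb[1:-1, 1:-1].astype(np.float64).ravel()    # training targets
    W, *_ = np.linalg.lstsq(Z, s, rcond=None)              # MSE-optimal taps
    h, w = mcp_cur.shape
    return (windows_3x3(mcp_cur) @ W).reshape(h - 2, w - 2)
```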
Improving Motion Compensated Prediction using WF with Explicit Signaling.
In some embodiments, the motion compensated prediction block is improved by estimating a Wiener filter whose coefficients minimize the MSE between the motion compensated prediction block and the input block of pixels, as follows:
W* = argmin_W ||X − WF(P_MCP, W)||², (27)
where WF( ) is a function that performs Wiener filter prediction using coefficients W. An exemplary block diagram for this scheme is provided in
Combining Filter-Based Prediction with Inter Prediction.
Filter-based intra prediction exploits local spatial correlation, and it can be combined with inter prediction, which exploits temporal correlation. This combination can reap benefits from both spatial and temporal correlations. One exemplary approach for improving the inter prediction is to employ weighted averaging of the motion compensated prediction P_MCP and the filter based intra prediction P_WF as follows:
P = g·P_MCP + (1 − g)·P_WF (28)
In some embodiments, the residuals generated by existing intra or inter prediction modes are predicted further using the Wiener filter operating on neighboring reconstructed residuals. The difference between the actual residuals and the residuals predicted by the Wiener filter is coded and transmitted in the bitstream. A block diagram for an exemplary encoding process using this approach is provided in
This mode is expected to perform well compared to Residual Differential Pulse Code Modulation (RDPCM), where only one neighboring residual is used as the residual prediction. Exemplary uses of Wiener filtering described herein operate on a group of neighboring residuals for prediction and can thereby better capture local residual statistics.
The Wiener filter coefficients may be estimated offline for different intra modes, TU sizes, and quantization parameters (QPs) and may be stored in a LUT. Based on this information the encoder or decoder may select the appropriate Wiener filter coefficients from the LUT. The Wiener filter coefficients may also be estimated online both at the encoder and decoder using neighboring reconstructed residuals. The encoder may indicate the use of Wiener filter residual prediction by signaling a flag in the bitstream.
In the filter based intra and inter encoding processes described herein, the residuals generated could be correlated and have large magnitudes, which could result in higher bit consumption. To reduce this correlation and make the magnitudes of the residuals smaller, some embodiments apply Residual Differential Pulse Code Modulation (RDPCM) to the residuals. RDPCM is a lossless method and therefore allows the original residuals to be recovered after the inverse RDPCM process. The order in which the residuals are processed is the same as the scan order chosen for the TU.
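A sketch of horizontal RDPCM applied to a block of residuals, with the matching inverse showing that the process is lossless (vertical differencing is analogous; names are illustrative):

```python
def rdpcm_forward(res):
    """Difference each residual against its left neighbor within the row,
    leaving the first residual of each row unchanged."""
    out = [row[:] for row in res]
    for row in out:
        for j in range(len(row) - 1, 0, -1):   # right-to-left, in place
            row[j] -= row[j - 1]
    return out

def rdpcm_inverse(dres):
    """Undo rdpcm_forward exactly, recovering the original residuals."""
    out = [row[:] for row in dres]
    for row in out:
        for j in range(1, len(row)):
            row[j] += row[j - 1]
    return out

assert rdpcm_inverse(rdpcm_forward([[5, 7, 6], [2, 2, 3]])) == [[5, 7, 6], [2, 2, 3]]
```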
This approach is applicable to both pixel-based and block-based adaptive filter based prediction. An exemplary encoding process for filter based prediction mode using RDPCM is illustrated in
The encoder 1302 and/or the decoder 1306 may be incorporated into a wide variety of wired communication devices and/or wireless transmit/receive units (WTRUs), such as, but not limited to, digital televisions, wireless broadcast systems, network elements/terminals, servers, such as content or web servers (e.g., a Hypertext Transfer Protocol (HTTP) server), personal digital assistants (PDAs), laptop or desktop computers, tablet computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, digital media players, and/or the like.
The communications network 1304 may be a suitable type of communication network. For example, the communications network 1304 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communications network 1304 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communications network 1304 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), and/or the like. The communication network 1304 may include multiple connected communication networks. The communication network 1304 may include the Internet and/or one or more private commercial networks such as cellular networks, WiFi hotspots, Internet Service Provider (ISP) networks, and/or the like.
The processor 1218 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a graphics processing unit (GPU), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 1218 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1202 to operate in a wired and/or wireless environment. The processor 1218 may be coupled to the transceiver 1220, which may be coupled to the transmit/receive element 1222. While
The transmit/receive element 1222 may be configured to transmit signals to, and/or receive signals from, another terminal over an air interface 1215. For example, in one or more embodiments, the transmit/receive element 1222 may be an antenna configured to transmit and/or receive RF signals. In one or more embodiments, the transmit/receive element 1222 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In one or more embodiments, the transmit/receive element 1222 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 1222 may be configured to transmit and/or receive any combination of wireless signals.
In addition, although the transmit/receive element 1222 is depicted in
The transceiver 1220 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1222 and/or to demodulate the signals that are received by the transmit/receive element 1222. As noted above, the WTRU 1202 may have multi-mode capabilities. Thus, the transceiver 1220 may include multiple transceivers for enabling the WTRU 1202 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.
The processor 1218 of the WTRU 1202 may be coupled to, and may receive user input data from, the speaker/microphone 1224, the keypad 1226, and/or the display/touchpad 1228 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 1218 may also output user data to the speaker/microphone 1224, the keypad 1226, and/or the display/touchpad 1228. In addition, the processor 1218 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1230 and/or the removable memory 1232. The non-removable memory 1230 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 1232 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In one or more embodiments, the processor 1218 may access information from, and store data in, memory that is not physically located on the WTRU 1202, such as on a server or a home computer (not shown).
The processor 1218 may receive power from the power source 1234, and may be configured to distribute and/or control the power to the other components in the WTRU 1202. The power source 1234 may be any suitable device for powering the WTRU 1202. For example, the power source 1234 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
The processor 1218 may be coupled to the GPS chipset 1236, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1202. In addition to, or in lieu of, the information from the GPS chipset 1236, the WTRU 1202 may receive location information over the air interface 1215 from a terminal (e.g., a base station) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1202 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
The processor 1218 may further be coupled to other peripherals 1238, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 1238 may include an accelerometer, orientation sensors, motion sensors, a proximity sensor, an e-compass, a satellite transceiver, a digital camera and/or video recorder (e.g., for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, and software modules such as a digital music player, a media player, a video game player module, an Internet browser, and the like.
By way of example, the WTRU 1202 may be configured to transmit and/or receive wireless signals and may include user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a tablet computer, a personal computer, a wireless sensor, consumer electronics, or any other terminal capable of receiving and processing compressed video communications.
The WTRU 1202 and/or a communication network (e.g., communication network 804) may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 1215 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA). The WTRU 1202 and/or a communication network (e.g., communication network 804) may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1215 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).
The WTRU 1202 and/or a communication network (e.g., communication network 804) may implement radio technologies such as IEEE 802.16 (e.g., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1×, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile Communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like. The WTRU 1202 and/or a communication network (e.g., communication network 804) may implement a radio technology such as IEEE 802.11, IEEE 802.15, or the like.
Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
The present application is a non-provisional filing of, and claims benefit under 35 U.S.C. § 119(e) from, U.S. Provisional Patent Application Ser. No. 62/326,590, filed Apr. 22, 2016, entitled "Prediction Systems and Methods Based on Nearest Neighboring Pixels for Video Coding," which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2017/028822 | 4/21/2017 | WO | 00

Number | Date | Country
---|---|---
62326590 | Apr 2016 | US