Video compression is used to reduce the size of video files, which allows video files to be processed, stored and transmitted more efficiently. Video compression takes advantage of predictive coding to remove redundant video frames without noticeably affecting the quality of video.
A variety of different video compression techniques and standards exist. Some video compression techniques and standards are applied to different types of frames of the video file (e.g., I frames, P frames and B frames) by analyzing macroblocks within the frame of video.
The present disclosure discloses an example apparatus and method for applying a temporal drift correction to video frames of a video. For example, video compression that takes advantage of predictive coding techniques is sensitive to error propagation, because the predictive accuracy of future frames is partially dependent on the perceptibility of the change from the frames previously encoded. As a result, the propagation of error may become noticeable to a viewer of the video when a reference frame (I-frame) reset occurs at a subsequent I-frame.
The propagation of error can be referred to as “drift.” Example video coding processes use a set frame sequence, known as a group of pictures (GOP), recursively in the processing of the video stream. Each GOP is bounded by an I-frame, which serves as a reset to encoding errors that may have accumulated during the processing of the previous GOP. If the total error accumulation in a GOP is significant, the I-frame reset will likely result in a visible step.
The errors in the video can be approximated by the following function (1):
R = Rb + Re + Rd, Function (1)
where R is the original video being encoded, Rb is a baseline encoding, Re is a reconstruction error of the video when compared to the original video and Rd is an error due to temporal drift. The drift, or temporal drift, refers to the accumulation of small imperceptible errors that result in a visible “jump” during an I-frame reset.
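Function (1) can be read as a per-pixel bookkeeping identity: whatever part of the original signal is not explained by the baseline encoding and the reconstruction error is attributed to temporal drift. A minimal sketch in Python (the helper name and the scalar inputs are illustrative, not part of the disclosure):

```python
def drift_component(r, r_b, r_e):
    """Per-pixel drift error implied by Function (1): Rd = R - Rb - Re.

    r: original pixel value, r_b: baseline-encoded value,
    r_e: reconstruction error versus the original.
    (Hypothetical helper; the disclosure only states the decomposition.)
    """
    return r - r_b - r_e

# A pixel whose baseline encoding and reconstruction error are known;
# the remaining difference is attributed to temporal drift:
rd = drift_component(r=128.0, r_b=125.0, r_e=2.0)
print(rd)  # 1.0
```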
In one example, the present disclosure corrects for the drift that occurs in certain compression techniques compared to a baseline compression technique.
The present disclosure can apply a correction factor to residuals of the dynamic compression methods such that the image quality of the dynamic compression methods approaches the image quality of the baseline compression method. For example, a dotted line 606 represents the drift correction that is applied to the dynamic compression methods represented by the dashed line 604. As can be seen in
In one example, the video 112 and the video 114 may be a sequence of video frames or images. The video 112 and video 114 may be encoded using I-frames, predictive frames (P-frames), and bi-predictive frames (B-frames). The video 112 and video 114 may be analyzed by the AS 102 using GOPs as described above and analyzing macroblocks of each video frame of a plurality of video frames of the GOPs.
In one implementation, the compressed video 116 that is generated by the first AS 102 may be outputted to the second AS 104 or a database (DB) 106. For example, the DB 106 may be a mass storage device that is used to store the compressed video 116 for later retrieval or processing.
In one implementation, the second AS 104 may be deployed as a hardware device having a processor and memory similar to the first AS 102. In one example, the second AS 104 may be a decoder that decodes the compressed video 116 to be displayed on a display device 120. The display device 120 may be a monitor, a television, and the like. As described above, the first AS 102 may encode the video 112 to reduce or eliminate the visual artifacts or the visual “jump” that is seen during an I-frame reset when the compressed video 116 is decoded and displayed on the display device 120.
At block 202, the method 200 begins. At block 204, the method 200 determines an amount of video information that is lost in a video frame due to compression. For example, as each video frame is compressed, the errors may accumulate over time for the entire video. Large amounts of compression, or the use of certain dynamic compression techniques, can lead to an accumulation of errors over time that lowers the image quality (e.g., correlated to an SSIM value as shown in
At block 206, the method 200 applies a drift correction to the video frame to add back a percentage of the amount of video information that is lost. In other words, some pixels that would normally be discarded in a macroblock or a video frame for compression may be added back in to reduce the accumulated errors over time. In one example, the percentage (e.g., the amount of video information that is added back in) of the drift correction may be associated with a drift compensation factor (β) that has a value between 0 and 1. In one example, β may be a function of a current frame position within a GOP, a next reference frame position within the GOP and a last reference frame position within the GOP. An example of a function (2) that is used to calculate β is provided below:
where βn represents the drift compensation factor at a current frame n. In one example, β may have a linear profile over time. In another example, β may have a quadratic profile over time.
In one example, determining when to start applying β may affect the optimal performance of the encoder. For example, if β is applied too early in time, then the advantage of compression is lost. However, if β is applied too late in time, then the I-frame reset becomes more visible. In one example, an optimal time to begin the application of β may be at the midpoint between two reference frames. For example, if a GOP included 60 frames, the method 200 may begin applying the drift correction beginning with the 31st frame. Said another way, if the first I-frame begins at time i and the next I-frame begins at time i+n, then the method 200 may begin applying the drift correction at time i+n/2.
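The disclosure specifies function (2) only by its inputs (the current, next reference, and last reference frame positions within the GOP) and by its linear or quadratic profile, so the ramp below is an assumed form, consistent with that description and with the midpoint starting rule, not the disclosure's actual function:

```python
def beta(n, n_last, n_next, profile="linear"):
    """Drift compensation factor for frame n in a GOP whose bounding
    reference frames sit at n_last and n_next (a hedged sketch; the
    exact function (2) is not reproduced in this text)."""
    mid = (n_last + n_next) / 2.0
    if n <= mid:
        return 0.0  # applying beta too early forfeits the compression gains
    t = (n - mid) / (n_next - mid)  # 0 at the midpoint, 1 at the next I-frame
    return t if profile == "linear" else t ** 2  # linear or quadratic profile

# 60-frame GOP: I-frames at frames 1 and 61, correction starts at frame 31.
print(beta(20, 1, 61))  # 0.0 -- before the midpoint, no correction
print(beta(46, 1, 61))  # 0.5 -- halfway up the linear ramp
print(beta(61, 1, 61))  # 1.0 -- full add-back at the next reference frame
```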
At block 208, the method 200 encodes the video frame with the drift correction. In one implementation, the blocks 204, 206 and 208 may be repeated for each video frame that is encoded once the application of the drift correction has begun. At block 210, the method 200 ends.
In one example, the method 300 may begin with a residue (R(x,y)) of a video frame. For example, the residue of a video frame may be obtained from motion block estimation that is performed on the B-frame or P-frame of a GOP. For example, a current macroblock of a B-frame or a P-frame can be predicted based on other macroblocks around the current macroblock or previous B-frames and P-frames. The residue may represent a difference between the predicted macroblock and an actual macroblock of the video frame.
At block 302, the residue may be multiplied with a perceptibility of the video frame (P(x,y)) to obtain a perceptibility weighted delta frame of the video frame. For example, the perceptibility weighted delta frame (Δweighted(x,y)) may be represented as Δweighted(x,y)=R(x,y)P(x,y). In one example, perceptibility of the video frame may be obtained by mapping how visually important every pixel is for a video frame. Based on the map, the perceptibility may determine whether a pixel is visually important enough to be kept or not visually important and therefore discarded. This may also be referred to as the encoding of residues.
At block 304, the method 300 may subtract the perceptibility weighted delta frame of the video frame from the residue to obtain a delta loss of the video frame. The delta loss of the video frame may represent the video information that was lost during compression. In one example, the delta loss (Δloss(x,y)) of the video frame may be represented as Δloss(x,y)=R(x,y)[1−P(x,y)].
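Blocks 302 and 304 amount to splitting the residue into a kept part and a discarded part; since Δweighted(x,y)+Δloss(x,y)=R(x,y), the two parts always recombine into the full residue. A scalar, per-pixel sketch (the function names are illustrative):

```python
def weighted_delta(residue, perceptibility):
    # Block 302: keep the perceptually important part, R(x,y)P(x,y).
    return residue * perceptibility

def delta_loss(residue, perceptibility):
    # Block 304: what compression would discard, R(x,y)[1 - P(x,y)].
    return residue * (1.0 - perceptibility)

# The kept and discarded parts always recombine into the full residue:
r, p = 12.0, 0.75
print(weighted_delta(r, p) + delta_loss(r, p))  # 12.0
```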
At block 306, the method 300 may multiply the delta loss of the video frame by a weight derived from a spatial activity map (SA(x,y)) of the video frame to obtain a preservation of the video frame (e.g., a λ map). The spatial activity map provides an indication of how much motion is in the video frame. The spatial activity provides an indication of how important a pixel may be from a spatial point of view. In one example, the preservation (Δpres(x,y)) may be represented as Δpres(x,y)=Δloss(x,y)[1−SA(x,y)]^ρ, where the exponent ρ is chosen to allow control of a relative weight assigned to low and high values of variance in a Lightness channel. In one example, the value of ρ may be chosen to be 2.2.
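The preservation step of block 306 can be sketched per pixel as follows (the scalar inputs and the function name are illustrative; a real encoder evaluates this over the whole spatial activity map):

```python
def preservation(d_loss, spatial_activity, rho=2.2):
    # Block 306: weight the lost detail by (1 - SA)^rho, so static
    # (low-activity) regions preserve more of the discarded detail.
    # rho = 2.2 per the example value given in the text.
    return d_loss * (1.0 - spatial_activity) ** rho

print(preservation(3.0, 0.0))  # 3.0 -- a static pixel keeps all its lost detail
print(preservation(3.0, 1.0))  # 0.0 -- a high-motion pixel preserves nothing
```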
At block 308, the method 300 may subtract the preservation of the video frame from the delta loss of the video frame. The result from the block 308 may be fed to block 310.
At block 310, the method 300 may apply the drift correction. For example, the drift compensation factor, β, may be multiplied by the result from the block 308 to provide a drift correction of the video frame. In one example, the drift correction (Δdrift(x,y)) may be represented as Δdrift(x,y)=βn[Δloss(x,y)−Δpres(x,y)].
As discussed above, the block 310 may begin operating at a particular time between the first I-frame and a subsequent I-frame. In other words, the block 310 may operate on a subset of video frames of the plurality of video frames that are contained in a GOP that includes I-frames, B-frames and P-frames before the subsequent I-frame. For example, the block 310 may begin operating at a point in time that is midway between the first I-frame and the subsequent I-frame.
At block 312, the method 300 may add the perceptibility weighted delta frame of the video frame, the drift correction of the video frame and the preservation of the video frame to obtain a modified residue of the video frame (R̃(x,y)). In one example, R̃(x,y) may be represented as R̃(x,y)=Δdrift(x,y)+Δweighted(x,y)+Δpres(x,y). The modified residue of the video frame may then be provided as an output to a subsequent video processing block and the method 300 ends.
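Taken together, the method 300 reduces, per pixel, to a few lines of arithmetic. The sketch below (with hypothetical scalar inputs) also makes the limiting behavior visible: with βn=1 every discarded bit is added back, so the modified residue equals the original residue regardless of the perceptibility and spatial activity values.

```python
def modified_residue(r, p, sa, beta_n, rho=2.2):
    """One pass of the method 300 for a single pixel (a sketch; a real
    encoder applies these steps across every (x, y) of the frame)."""
    d_weighted = r * p                    # block 302: kept residue
    d_loss = r * (1.0 - p)                # block 304: discarded residue
    d_pres = d_loss * (1.0 - sa) ** rho   # block 306: preservation
    d_drift = beta_n * (d_loss - d_pres)  # blocks 308/310: drift correction
    return d_weighted + d_drift + d_pres  # block 312: modified residue

# Full add-back (beta = 1) reproduces the original residue:
print(round(modified_residue(r=12.0, p=0.75, sa=0.5, beta_n=1.0), 6))  # 12.0
```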
In one example, the video attenuator 402 may attenuate the video (e.g., the video 112 or 114), or increase its gain, to improve the compression of the video. In other words, the video attenuator 402 may increase a compression gain of the video and determine an amount of information that is lost in a video frame of a plurality of video frames of the video. In one example, the output of the video attenuator 402 may be represented as the residue R(x,y) described above.
In one example, the drift corrector 404 may apply a drift correction to a subset of the plurality of video frames to add back a percentage of the amount of information that is lost during compression. For example, the drift corrector 404 may include the operations illustrated in
In one example, the drift corrector 404 may receive inputs from other functional blocks, such as a block that computes a spatial activity map. As described above, the drift corrector 404 may receive the SA(x,y) from the block that computes the spatial activity map.
In one example, the transform, scale, and quantizer 406 may perform a discrete cosine transform on the video. The encoder 408 may then perform the compression of the plurality of video frames based on the compression gain and the drift correction. The encoder 408 may then output the encoded video to a mass storage device (e.g., the DB 106) for later retrieval, or to a decoder (e.g., the second AS 104) for decompression and display.
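The transform stage can be illustrated with a toy one-dimensional DCT-II (the disclosure names only the discrete cosine transform; production encoders use fast, two-dimensional integer transforms rather than this direct formula):

```python
import math

def dct_ii(block):
    """Unnormalized 1-D DCT-II of a list of samples, the transform
    family the transform, scale, and quantizer 406 is described as
    performing (a toy sketch, not a production implementation)."""
    n = len(block)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(block))
            for k in range(n)]

# A flat block of pixels concentrates all energy in the DC coefficient,
# which is why smooth regions compress well after quantization:
coeffs = dct_ii([10.0] * 8)
print(round(coeffs[0], 6))  # 80.0 (the AC coefficients vanish)
```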
It should be noted that
In one example, the instructions 506 may include instructions to subtract a perceptibility weighted delta of a video frame from a residue of the video frame to calculate a delta loss of the video frame. The instructions 508 may include instructions to apply a drift correction to the delta loss frame. The instructions 510 may include instructions to generate a modified residue of the video frame based on the perceptibility weighted delta of the video frame plus the delta loss frame that includes the drift correction.
It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2016/014587 | 1/22/2016 | WO | 00

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2017/127115 | 7/27/2017 | WO | A
Number | Date | Country
---|---|---
20180227590 A1 | Aug 2018 | US