This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/CN2010/001922, filed Nov. 30, 2010, which was published in accordance with PCT Article 21(2) on Jun. 7, 2012 in English.
This invention relates to a method and an apparatus for measuring the quality of a video based on a frame loss pattern.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
In the transmission of digitally compressed video, a major source of impairment is the delivery of the video stream over an error-prone channel. Partial loss or partial corruption of information can have a dramatic impact on the quality perceived by the user, because a localized distortion within a frame can propagate spatially and temporally over subsequent frames. The visual impact of such frame loss varies between video decoders depending on their ability to deal with corrupted streams. In some cases, a decoder may decide to drop some frames on its own initiative. For example, a decoder can entirely drop or discard a frame that has corrupted or missing information and repeat the previous video frame instead until the next valid decoded frame is available. Encoders can also drop frames during a sudden increase of motion in the content when the target encoding bit rate is too low. In all of the above cases, a frame loss is said to occur in the video.
In many existing video quality monitoring products, the overall video quality of a medium is analyzed based on three main coding artifacts: jerkiness, blockiness and blurring. Blockiness and blurring are the two main kinds of spatial coding artifacts, which appear as discontinuities at block boundaries and as loss of high frequencies, respectively, while jerkiness is the most important temporal artifact.
A temporal video quality degradation caused by a set of group frame losses is called jerkiness, wherein a group frame loss means that one or more consecutive frames in a video sequence are lost together.
There are some studies about the evaluation of the perceptual impact of (periodic and non-periodic) video frame losses on perceived video quality.
In K. C. Yang, C. C. Guest, K. El-Maleh and P. K. Das, “Perceptual Temporal Quality Metric for Compressed Video”, IEEE Transaction on Multimedia, vol. 9, no. 7, November 2007, pp. 1528-1535 (hereinafter referred to as prior art 1), it was pointed out that humans usually have a higher tolerance to consistent frame loss and that the negative impact is highly related to the consistency of frame loss, which is then used as a measurement of jerkiness.
In R. R. Pastrana-Vidal and J. C. Gicquel, “Automatic Quality Assessment of Video Fluidity Impairments Using a No-Reference Metric”, the 2nd International Workshop on Video Processing and Quality Metric for Consumer Electronics, Scottsdale, USA 22-24, January 2006 (hereinafter referred to as prior art 2), the relationship between perceptual impacts of jerkiness and the length and occurrence frequency of the group frame losses was mentioned.
The inventors of the present invention have found that the frame loss pattern of a video has a great influence on the perceptual impact of jerkiness, which in turn impacts the overall video quality. A “frame loss pattern” means a pattern generated by recording in sequence, with different representations, the status of each frame in a video sequence, that is, whether the frame was successfully transmitted or lost during transmission.
Therefore, the present invention makes use of this finding, by providing a method for measuring the quality of a video, and a corresponding apparatus.
In one embodiment, a method for measuring a quality of video is provided. The method comprises: generating a frame loss pattern of the video by indicating whether each frame in the video is lost or successfully transmitted; and evaluating the quality of the video as a function of the generated frame loss pattern.
In one embodiment, an apparatus for measuring a quality of video is provided. The apparatus comprises: means for receiving an input video and generating a frame loss pattern of the received video; and means for evaluating the quality of the video as a function of the generated frame loss pattern.
Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, in which:
In the following description, various aspects of an embodiment of the present invention will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding. However, it will also be apparent to one skilled in the art that the present invention may be practiced without the specific details presented herein.
According to the finding of the inventors, the perceptual impact of jerkiness is greatly influenced by the frame loss pattern. The following cases (1) to (3), which share the same frame loss ratio, are taken as an example.
The overall frame loss ratio in all of the above three cases is 50%. However, their perceptual impact is quite different. In case 1, a viewer will perceive clear dithering and may even feel sick after a long period of viewing, while in case 3 the viewer will not perceive this kind of phenomenon but will instead face a long period of freezing. That is, different frame loss patterns with the same frame loss rate cause totally different perceptions.
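The observation above can be illustrated with a minimal sketch. The three patterns below are hypothetical stand-ins and are not the cases (1) to (3) of the description; the encoding of a pattern as a string of '1' (successfully transmitted) and '0' (lost) and the name `loss_ratio` are likewise assumptions made for illustration only.

```python
def loss_ratio(pattern: str) -> float:
    """Fraction of lost frames in a pattern of '1' (received) and '0' (lost)."""
    return pattern.count("0") / len(pattern)


# Hypothetical patterns of 12 frames each, all with half of the frames lost.
alternating = "10" * 6           # every other frame is lost
short_bursts = "1100" * 3        # two lost frames after every two received frames
long_freeze = "1" * 6 + "0" * 6  # one long run of consecutive losses

for name, pattern in [("alternating", alternating),
                      ("short_bursts", short_bursts),
                      ("long_freeze", long_freeze)]:
    # The ratio is identical for all three, although the patterns differ.
    print(name, pattern, loss_ratio(pattern))
```

A metric based only on the loss ratio cannot distinguish these patterns, which is why the pattern itself is taken as the input of the evaluation.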
According to an embodiment of the present invention, a method for measuring the quality of a video is provided based on the above finding.
As shown in
S201: a frame loss pattern is generated by indicating the status (lost or successfully transmitted) of each frame in a video sequence;
S202: grouping, from the first lost frame, one or more consecutive lost frames in the video sequence into a group frame loss;
S203: dividing, from the first group frame loss, the frame loss pattern into a plurality of sections having one or more consecutive group frame losses, each group frame loss in a section having the same number of successfully transmitted frames between the group frame loss and its previous group frame loss and the same number of lost frames;
S204: calculating a value of the quality degradation generated by each section of group frame loss;
S205: evaluating the quality of the video sequence by combining the values of all sections.
Next, a detailed description will be given with reference to attached drawings.
In the method according to the embodiment of the invention, a frame loss pattern of a video sequence is generated first. This can be achieved by indicating the status of all frames in the video sequence in an appropriate manner. It can be appreciated by a person skilled in the art that a frame loss can be detected by known methods. No further details will be given on this point.
Next, starting from the first one of all the lost frames in the video sequence, one or more consecutive lost frames will be grouped into one group which is called a group frame loss.
Denote the considered frame loss pattern, given by time stamp, as $FLP = \{gd_1, gd_2, \ldots, gd_R \mid gd_i = (gap_{gd_i}, len_{gd_i})\}$, wherein $gd_i$ represents the $i$-th group frame loss. The frame loss pattern is then divided (or segmented), from the first group frame loss, into a plurality of sub-sections. Each sub-section comprises one or more consecutive group frame losses, each of which has a similar perceptual impact on the quality degradation of the video sequence.
For the above purpose of dividing consecutive group frame losses with a similar perceptual impact, in this method, each group frame loss in a video sequence can be identified by two parameters $gd = (gap_{gd}, len_{gd})$: the first parameter $gap_{gd}$ is the number of successfully transmitted frames between the current group frame loss and the previous group frame loss; and the second parameter $len_{gd}$ is the number of lost frames in the current group frame loss. The values of both $gap_{gd}$ and $len_{gd}$ can be limited to integers between 1 and 10. If all the group frame losses in a segmented sub-section have the same parameters $gap_{gd}$ and $len_{gd}$, they will have a similar perceptual impact on the quality degradation of the video sequence.
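The grouping of consecutive lost frames and the two parameters of each group frame loss can be sketched as follows. This is a minimal illustration under assumptions: the pattern is encoded as a string of '1' (received) and '0' (lost), the gap of the first group is taken as the number of received frames before it, and the function name `group_frame_losses` and its `limit` parameter are hypothetical.

```python
from typing import List, Tuple


def group_frame_losses(pattern: str, limit: int = 10) -> List[Tuple[int, int]]:
    """Return one (gap, length) pair per run of consecutive lost frames.

    gap    -- received frames between this run and the previous one
    length -- lost frames in this run
    Both values are clamped to the range 1..limit.
    """
    def clamp(value: int) -> int:
        return min(max(value, 1), limit)

    groups = []
    gap = 0  # received frames seen since the previous group (or the start)
    run = 0  # lost frames in the group currently being read
    for status in pattern:
        if status == "0":
            run += 1
        else:
            if run > 0:
                # A received frame closes the current group frame loss.
                groups.append((clamp(gap), clamp(run)))
                gap = 0
                run = 0
            gap += 1
    if run > 0:
        # The sequence ended while frames were still being lost.
        groups.append((clamp(gap), clamp(run)))
    return groups


print(group_frame_losses("110011100"))  # two groups of two lost frames each
```

Clamping keeps every pair inside the 1 to 10 range on which the discrete evaluation function is later defined.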
A distance function $d(gd_1, gd_2) = |f(gap_{gd_1}, len_{gd_1}) - f(gap_{gd_2}, len_{gd_2})|$ can be used as a measurement of the difference in perceptual impact between two group frame losses in a sub-section. In the above distance function, the function $f(x,y)$ is used for a perceptual quality evaluation of a video, which will be described later.
The frame loss pattern is then segmented into sub-sections based upon the definition of the distance function.
The following is a pseudo-code of the distance function.
In the above pseudo-code, c is a constant number. This procedure divides the frame loss pattern FLP into a set of sub-sections:
$FLP = \{SubSection_1, SubSection_2, \ldots, SubSection_{nCount}\}$.
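The pseudo-code itself is not reproduced in this text, so the following is only a hypothetical sketch of how such a division could proceed. It assumes a greedy rule: a group frame loss joins the current sub-section while its distance to the first group of that sub-section stays below the constant c, and otherwise starts a new sub-section. The name `segment` and the stand-in evaluation function `toy_f` are assumptions, not the procedure or the tabulated function of the description.

```python
from typing import Callable, List, Tuple

Group = Tuple[int, int]  # (gap, length) of one group frame loss


def segment(groups: List[Group],
            f: Callable[[int, int], float],
            c: float) -> List[List[Group]]:
    """Greedily split consecutive group frame losses into sub-sections."""
    subsections: List[List[Group]] = []
    for gd in groups:
        if subsections:
            first = subsections[-1][0]
            # Distance d(gd1, gd2) = |f(gap1, len1) - f(gap2, len2)|
            if abs(f(*gd) - f(*first)) < c:
                subsections[-1].append(gd)
                continue
        subsections.append([gd])
    return subsections


def toy_f(gap: int, length: int) -> float:
    """Stand-in evaluation function used only to make the sketch runnable."""
    return float(length)


groups = [(2, 1), (2, 1), (3, 5), (3, 5), (1, 1)]
print(segment(groups, toy_f, c=1.5))
```

Comparing against the first group of the sub-section, rather than the previous one, prevents a slow drift of parameters from keeping very different groups in the same sub-section; other rules are equally conceivable.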
Next, a perceptual evaluation of each sub-section will be carried out.
It can be appreciated that since the group frame losses inside a same sub-section are considered to be of similar perceptual impact, the sub-section can be treated as a typically periodic frame loss.
Therefore, each sub-section is also identified by two parameters $SubSection = (gap_{SS}, len_{SS})$: the first parameter $gap_{SS}$ is the average number of successfully transmitted frames between two neighboring group frame losses, and the second parameter $len_{SS}$ is the average number of lost frames over all the group frame losses in the sub-section.
Simply speaking, the sub-section's feature values $gap_{SS}$ and $len_{SS}$ are exactly the average values of $gap_{gd}$ and $len_{gd}$ over all the group frame losses inside the sub-section.
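The averaging described above can be sketched in a few lines. The representation of a sub-section as a list of (gap, length) pairs and the name `subsection_features` are assumptions made for illustration.

```python
from typing import List, Tuple


def subsection_features(subsection: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Average gap and average length over the group frame losses of a sub-section."""
    count = len(subsection)
    gap_ss = sum(gap for gap, _ in subsection) / count
    len_ss = sum(length for _, length in subsection) / count
    return gap_ss, len_ss


# Three group frame losses given as (gap, length) pairs.
print(subsection_features([(2, 1), (3, 1), (4, 4)]))
```

Because the result is an average, the feature values are in general non-integer even though each individual pair is an integer pair.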
It is then supposed that the perceptual quality degradation of the sub-section is determined by the feature values $gap_{SS}$ and $len_{SS}$. It is defined as:
$J_p(SubSection) = f_p(gap_{SS}, len_{SS})$ (1)
By subjective examination, for some discrete values of $(gap_{SS}, len_{SS})$, the perceptual quality evaluation can be marked manually. For this purpose, we define the discrete function $f(x, y)$ with $x, y \in \{1, 2, \ldots, 10\}$.
As an example, the function f(x,y) can be defined as:
where $c_{still}$ is a constant used as a threshold, and CameraMotion is a measurement of the level of camera motion in the sub-section. The functions $f_1(x,y)$ and $f_2(x,y)$ are given in the tables below:
Since camera motions will be another important factor that influences the perceptual quality, the level of camera motions also needs to be estimated.
The camera motions can be estimated by known methods. One of the most important global motion estimation models is the eight-parameter perspective motion model described by L. Y. Duan, J. Q. Wang et al, “Shot-Level Camera Motion Estimation based on a Parametric Model” (hereinafter referred to as prior art 3).
The prior art 3 disclosed the following equations:
$x_i' = (a_0 + a_2 x_i + a_3 y_i)/(1 + a_6 x_i + a_7 y_i)$
$y_i' = (a_1 + a_4 x_i + a_5 y_i)/(1 + a_6 x_i + a_7 y_i)$
where $(a_0, \ldots, a_7)$ are the global motion parameters, $(x_i, y_i)$ denotes the spatial coordinates of the $i$-th pixel in the current frame and $(x_i', y_i')$ denotes the coordinates of the corresponding pixel in the previous frame. The relationship between the motion model parameters and the symbol-level interpretation is established as:
Pan=a0
Tilt=a1
Zoom=(a2+a5)/2
The algorithm introduced in prior art 3 is applied to extract the eight-parameter GME model in the method of the embodiment of the present invention. The level of the camera motion is finally defined as:
CameraMotion=β1×Pan+β2×Tilt+β3×Zoom
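The eight-parameter model and the camera motion level can be sketched as follows, assuming the motion parameters have already been estimated by some global motion estimation step that is not shown. The function names `warp` and `camera_motion` are hypothetical, and the default weights are the values reported in the parameter settings of this description.

```python
from typing import Sequence, Tuple


def warp(params: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    """Map a pixel of the current frame to the previous frame (perspective model)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = params
    denom = 1 + a6 * x + a7 * y
    return (a0 + a2 * x + a3 * y) / denom, (a1 + a4 * x + a5 * y) / denom


def camera_motion(params: Sequence[float],
                  betas: Tuple[float, float, float] = (1.0, 1.0, 2.0)) -> float:
    """Weighted combination of pan, tilt and zoom as defined in the text."""
    a0, a1, a2, _, _, a5, _, _ = params
    pan, tilt, zoom = a0, a1, (a2 + a5) / 2
    b1, b2, b3 = betas
    return b1 * pan + b2 * tilt + b3 * zoom


# Hypothetical parameters: a pure translation by (2, -1) with no perspective terms.
translation = (2.0, -1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
print(warp(translation, 0.0, 0.0))
print(camera_motion(translation))
```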
Then $f_p(x,y) = f(x,y)$ is defined for $x, y \in \{1, 2, \ldots, 10\}$. There remains the problem of how to generalize the function $f_p(x,y)$ to non-integer variables, which is a typical training problem. Therefore, a trained machine (for example, an Artificial Neural Network (ANN), which is known in the art) can be used to assign $J_p(SubSection) = ANN(gap_{SS}, len_{SS})$, the machine being trained with $f(x, y)$.
Up to this point, a value $J_p$ of the perceptual quality degradation generated by each sub-section of the frame loss pattern has been obtained.
Finally, the quality of the video sequence will be evaluated by combining the values of all sections of the frame loss pattern.
In this method, a pooling strategy can be used to combine these values into an overall quality evaluation of the video sequence. It should be pointed out that the pooling strategy of such a temporal quality is quite different from the pooling strategy when considering spatial artifacts such as blockiness, blur, etc. Because of the characteristic of a human vision system (HVS), people are “easy to hate, difficult to forgive”. The successfully transmitted frames between two sub-sections which are of higher temporal quality will usually be ignored when considering the overall temporal quality.
In the above-described segmentation step, the video sequence is segmented into a set of sub-sections of periodic frame loss $FLP = \{SubSection_1, SubSection_2, \ldots, SubSection_{nCount}\}$, and every two neighboring sub-sections are separated by some successfully transmitted frames. Denote by $NoLoss_i$ the successfully transmitted frames between $SubSection_i$ and $SubSection_{i+1}$. For simplicity, $NoLoss_i$ is treated as a special kind of periodic frame loss with the least quality degradation value of 1. That is, we set
$J_p(NoLoss) = 1$ (2)
All these $NoLoss_i$ are then inserted into the set FLP.
Therefore, the overall quality degradation is defined as:
Wherein w(flpi) is a weighting function for the element of FLP, which is defined as
$w(flp_i) = f_T(dist(flp_i)) \times f_D(J_p(flp_i)) \times length(flp_i)$ (4)
In this expression, $length(flp_i)$ is the number of frames in $flp_i$; $dist(flp_i)$ is the distance from the center of $flp_i$ to the last frame; and $J_p(flp_i)$ is the perceptual temporal degradation introduced by $flp_i$ as defined above.
$f_T$ is the function that describes the human “remember & forget” property. It is supposed that a viewer provides his/her overall evaluation upon finishing viewing the last frame. The sub-sections far away from the last frame will probably be forgotten by the viewer; the farther away a sub-section is, the higher the probability that it is forgotten.
$f_D$ is the function that describes the human “easy to hate, hard to forgive” vision property. A viewer is strongly affected by the sub-sections with a significant distortion, while ignoring most of the sub-sections without distortion.
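The weighting of equation (4) can be sketched as below. Two points are assumptions rather than statements of the description: the pooling formula of equation (3) is not reproduced in this text, so a normalized weighted average is used as a stand-in, and $f_T$ is taken to be 0 outside the memory window. The defaults for the window size and for $f_D$ follow the experimental settings reported in this description (a 300-frame window with $f_T \equiv 1$ inside it, and $f_D(d) = 6 - d$). The names `weight` and `pooled_degradation` are hypothetical.

```python
from typing import Callable, List, Tuple

Element = Tuple[float, float, int]  # (dist to last frame, Jp, length in frames)


def weight(dist: float, jp: float, length: int,
           window: int = 300,
           f_d: Callable[[float], float] = lambda d: 6.0 - d) -> float:
    """Equation (4): w = fT(dist) * fD(Jp) * length."""
    f_t = 1.0 if dist <= window else 0.0  # assumed: forgotten outside the window
    return f_t * f_d(jp) * length


def pooled_degradation(elements: List[Element], window: int = 300) -> float:
    """Assumed pooling: weighted average of Jp over all elements of FLP."""
    weights = [weight(dist, jp, length, window) for dist, jp, length in elements]
    total = sum(weights)
    if total == 0:
        raise ValueError("no element of the pattern lies inside the memory window")
    return sum(w * jp for w, (_, jp, _) in zip(weights, elements)) / total


# Hypothetical elements: a loss-free run, a degraded sub-section, and an old one.
elements = [(50, 1.0, 100), (150, 3.0, 20), (400, 5.0, 10)]
print(pooled_degradation(elements))
```

Passing $f_D$ as a parameter keeps the sketch independent of any particular choice of that function.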
As shown in
In the segmentation step, the frame loss pattern of an input video sequence is divided into a set of sections as described above. These sections can be classified into two kinds, one of which (SubSectioni) is composed of similar group frame losses and considered as periodic frame loss inside the segment; and another kind (NoLossi) contains no frame loss.
In the perceptual evaluation step, the perceptual evaluation of NoLossi is set to constant number 1. The perceptual evaluation of SubSectioni is estimated based on equation (1) as described above.
In the pooling step, an overall jerkiness evaluation is estimated based on the perceptual evaluation of all the sections according to equation (3) as described above.
Another embodiment of the present invention provides an apparatus for measuring the quality of a video based on frame loss pattern.
As shown in
Experiments were carried out to estimate the evaluation accuracy of the present invention compared with prior arts 1 and 2. For this purpose, a software tool was designed to conduct a subjective test of video quality.
A viewer is then required to mark the perception of jerkiness as follows:
In the subjective test, 10 CIF (video resolution of 352×288 pixels) sequences are selected and 20 frame loss patterns are chosen. Three viewers are invited to score, and their average value is considered the subjective score, denoted as $J_S$. All the sequences with their marked scores compose the dataset DS.
Parameters Setting:
In the implementation, the constants are determined empirically: $\beta_1 = \beta_2 = 1$, $\beta_3 = 2$; $c = 1.5$; $c_{still} = 0.23$.
For simplicity, a 300-frame window is taken as the memory size, supposing that a viewer forgets the quality of the frames before this window. Inside the window, $f_T \equiv 1$ and $f_D(d) = 6 - d$ are set. $f_1(x,y)$ and $f_2(x,y)$ are determined by Table 1 and Table 2 described above.
Experimental Results:
The evaluation accuracy of the present invention is estimated by comparing the objective evaluation result $J$ obtained according to the present invention with the subjective score $J_S$. The Pearson correlation is used to measure the prediction accuracy.
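For reference, the Pearson correlation between objective and subjective scores can be computed as in the following minimal sketch. The function name `pearson` and the score lists are hypothetical placeholders and are not data from the experiments.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder scores for illustration only.
objective = [1.0, 2.0, 3.0, 4.0]
subjective = [1.5, 2.5, 2.0, 4.5]
print(pearson(objective, subjective))
```

A value close to 1 indicates that the objective score ranks and scales the sequences much as the viewers did.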
The below table shows the Pearson Correlation (prediction accuracy) of the present invention and the algorithm proposed in the prior arts 1 and 2.
While there have been shown, described, and pointed out fundamental novel features of the present invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the apparatus and method described, in the form and details of the devices disclosed, and in their operation, may be made by those skilled in the art without departing from the spirit of the present invention. It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated.
It will be understood that the present invention has been described purely by way of example, and modifications of detail can be made without departing from the scope of the invention. Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may, where appropriate, be implemented in hardware, software, or a combination of the two.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2010/001922 | 11/30/2010 | WO | 00 | 5/23/2013 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2012/071680 | 6/7/2012 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
7233348 | Bourret | Jun 2007 | B2 |
8150234 | Bourret | Apr 2012 | B2 |
8843979 | Berkey et al. | Sep 2014 | B2 |
20030121053 | Honda | Jun 2003 | A1 |
20030142214 | Bourret | Jul 2003 | A1 |
20070047640 | Venna et al. | Mar 2007 | A1 |
20070237227 | Yang et al. | Oct 2007 | A1 |
20070242080 | Hamada et al. | Oct 2007 | A1 |
20080192119 | Li et al. | Aug 2008 | A1 |
20080205856 | Kim et al. | Aug 2008 | A1 |
20080316362 | Qiu et al. | Dec 2008 | A1 |
20090148058 | Dane et al. | Jun 2009 | A1 |
20090196188 | Takeyoshi et al. | Aug 2009 | A1 |
20100002771 | Huang et al. | Jan 2010 | A1 |
20100039943 | Ryoo et al. | Feb 2010 | A1 |
20100042843 | Brunk et al. | Feb 2010 | A1 |
20100053336 | Bourret | Mar 2010 | A1 |
20100091841 | Ishtiaq et al. | Apr 2010 | A1 |
20100095455 | Brinkerhoff et al. | Apr 2010 | A1 |
Number | Date | Country |
---|---|---|
101558657 | Oct 2009 | CN |
101635846 | Jan 2010 | CN |
2077672 | Jul 2009 | EP |
2296379 | Mar 2011 | EP |
2009018834 | Feb 2009 | KR |
WO2008119924 | Oct 2008 | WO |
WO2009087863 | Jul 2009 | WO |
Entry |
---|
Search Report:Aug. 25, 2011. |
Duan et al: “Shot-Level Camera Motion Estimation Based on a Parametric Model”; Proceedings of the TREC Video Retrieval Evaluation Online Proceedings; Oct. 15, 2005. |
Yang et al: “Perceptual Temporal Quality Metric for Compressed Video”; IEEE Transaction on Multimedia, vol. 9, issue 7, Nov. 2007 pp. 1528-1535. |
Pastrana-Vidal et al: “Automatic Quality Assessment of Video Fluidity Impairments Using a No Reference Metric.” International Workshop on Video Processing and Quality Metric for Consumer Electronics. |
Yang et al: “Temporal Quality Evaluation for Enhancing Compressed Video”, Proceedings of 16th Int'l Conference on Computer Communications & Networks, Aug. 13-16, 2007; Honolulu, Hawaii. |
Tasaka et al: “Enhancement of QoE in Audio-Video IP Transmission by Utilizing Tradeoff Between Spatial and Temporal Quality for Video Packet Loss”; IEEE Global Telecommunication Conference; Nov. 30-Dec. 4, 2008; New Orleans, Louisiana. |
Sun et al: “Low Complexity Frame Importance Modelling and Resource Allocation Scheme for Error Resilience H. 264 Video Streaming”; IEEE 10th Workshop on Multimedia Signal Processing, Oct. 8-10, 2008, Cairns, Australia. |
Xu et al: “A Novel Algorithm for Video Smoothness Evaluation”; Int'l Conference on Advanced Computer Theory & Engineering; Dec. 20-22, 2008. |
Number | Date | Country | |
---|---|---|---|
20130235214 A1 | Sep 2013 | US |