The present invention claims priority under 35 U.S.C. 119(a-d) to CN 201410360953.9, filed Jul. 25, 2014.
1. Field of Invention
The present invention relates to a video signal processing technology, and more particularly to a video quality evaluation method based on 3-dimensional (3D for short) wavelet transform.
2. Description of Related Arts
With the rapid development of video coding and display technology, video systems of many kinds are being deployed ever more widely and have gradually become a research focus in the field of information processing. Because of a series of uncontrollable factors, video information is inevitably distorted during the acquisition, compression, transmission, decoding and display stages, resulting in a decrease of video quality. Accurately measuring video quality is therefore key to the development of video systems.

Video quality evaluation is divided into subjective and objective quality evaluation. Since visual information is ultimately received by the human eye, subjective quality evaluation is the most reliable in terms of accuracy. However, subjective quality evaluation requires scoring by human observers, which is time-consuming and difficult to integrate into a video system. An objective quality evaluation model, by contrast, can be integrated into a video system for real-time quality evaluation, which allows timely adjustment of system parameters and thus a video application with high quality. An objective video quality evaluation method that is accurate, effective and consistent with human visual characteristics therefore has considerable application value.

Conventional objective video quality evaluation methods mainly simulate the motion and time-domain processing of the human visual system, combined with objective image quality evaluation methods; that is to say, a time-domain distortion evaluation of the video is added to conventional objective image quality evaluation, so as to evaluate the video quality objectively. Although these methods describe the time-domain information of video sequences from different angles, current understanding of how the human eye processes video information is limited.
Therefore, the time-domain description provided by these methods is limited, which means the video time-domain quality is difficult to evaluate, and eventually leads to poor consistency between objective evaluation results and subjective visual evaluation results.
An object of the present invention is to provide a video quality evaluation method based on 3D wavelet transform which is able to effectively improve the correlation between an objective quality evaluation result and the subjective quality perceived by human eyes.
Accordingly, in order to accomplish the above object, the present invention provides a video quality evaluation method based on 3D wavelet transform, comprising steps of:
a) marking an original undistorted reference video sequence as Vref, and marking a distorted video sequence as Vdis, wherein the Vref and the Vdis both comprise Nfr frames of images, wherein Nfr≥2^n, n is a positive integer, and n∈[3,5];
b) regarding 2^n frames of images as a group of pictures (GOP for short), respectively dividing the Vref and the Vdis into nGoF GOPs, marking a No. i GOP in the Vref as Grefi, and marking a No. i GOP in the Vdis as Gdisi, wherein nGoF=⌊Nfr/2^n⌋, the symbol ⌊ ⌋ means down-rounding, and 1≤i≤nGoF;
c) applying a 2-level 3D wavelet transform on each of the GOPs of the Vref, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences comprise 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
similarly, applying the 2-level 3D wavelet transform on each of the GOPs of the Vdis, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences are 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
d) calculating quality of each of the sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of a No. j sub-band sequence corresponding to the Gdisi as Qi,j, wherein Qi,j=(1/K)×Σ(k=1 to K) SSIM(VIrefi,j,k, VIdisi,j,k), K represents a frame quantity of the No. j sub-band sequence corresponding to the Grefi and of the No. j sub-band sequence corresponding to the Gdisi; if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-1 sub-band sequences, then K=2^(n-1); if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-2 sub-band sequences, then K=2^(n-2); VIrefi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Grefi, VIdisi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Gdisi, and SSIM( ) is a structural similarity function, SSIM(VIrefi,j,k, VIdisi,j,k)=((2×μref×μdis+c1)×(2×σref-dis+c2))/((μref^2+μdis^2+c1)×(σref^2+σdis^2+c2)), wherein μref represents an average value of the VIrefi,j,k, μdis represents an average value of the VIdisi,j,k, σref represents a standard deviation of the VIrefi,j,k, σdis represents a standard deviation of the VIdisi,j,k, σref-dis represents covariance between the VIrefi,j,k and the VIdisi,j,k, c1 and c2 are constants, and c1≠0, c2≠0;
e) selecting 2 sequences from the 7 level-1 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-1 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-1 sub-band sequences, wherein for the 7 level-1 sub-band sequences corresponding to the Gdisi, supposing that a No. p1 sequence and a No. q1 sequence of the level-1 sub-band sequences are selected, the quality of the level-1 sub-band sequences corresponding to the Gdisi is marked as QLv1i, wherein QLv1i=wLv1×Qi,p1+(1−wLv1)×Qi,q1, and wLv1 is a weight value of the Qi,p1;
and selecting 2 sequences from the 8 level-2 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-2 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-2 sub-band sequences, wherein for the 8 level-2 sub-band sequences corresponding to the Gdisi, supposing that a No. p2 sequence and a No. q2 sequence of the level-2 sub-band sequences are selected, the quality of the level-2 sub-band sequences corresponding to the Gdisi is marked as QLv2i, wherein QLv2i=wLv2×Qi,p2+(1−wLv2)×Qi,q2, and wLv2 is a weight value of the Qi,p2;
f) calculating quality of the GOPs of the Vdis according to the quality of the level-1 and level-2 sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of the Gdisi as QLvi, wherein QLvi=wLv×QLv1i+(1−wLv)×QLv2i, and wLv is a weight value of the QLv1i; and
g) calculating objective evaluated quality of the Vdis according to the quality of the GOPs of the Vdis, marking the objective evaluated quality as Q, wherein Q=(Σ(i=1 to nGoF) wi×QLvi)/(Σ(i=1 to nGoF) wi), and wi is a weight value of the QLvi.
Preferably, for selecting the 2 sequences of the level-1 sub-band sequences and the 2 sequences of the level-2 sub-band sequences, the step e) specifically comprises steps of:
e-1) selecting a video database with subjective video quality as a training video database, obtaining quality of each sub-band sequence corresponding to each GOP of the distorted video sequences in the training video database by applying the step a) to the step d), marking the No. nv distorted video sequence in the training video database as Vdisnv, wherein 1≤nv≤Nv, and Nv represents a total quantity of the distorted video sequences in the training video database;
e-2) calculating objective video quality of all the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, marking the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the Vdisnv as VQnv,j, wherein VQnv,j=(1/nGoF)×Σ(i=1 to nGoF) Qi,j, that is, the average of the qualities of the No. j sub-band sequences over all the GOPs of the Vdisnv;
e-3) forming a vector vXj with the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, wherein vXj=(VQ1,j, VQ2,j, . . . , VQNv,j); then calculating a linear correlation coefficient of the objective video quality of the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database and the subjective quality of the distorted video sequences, marking the linear correlation coefficient of the objective video quality of the No. j sub-band sequence corresponding to all the GOPs of the distorted video sequences and the subjective quality of the distorted video sequences as CCj, wherein CCj=Σ(nv=1 to Nv)(VQnv,j−VQ̄j)×(VSnv−VS̄)/√(Σ(nv=1 to Nv)(VQnv,j−VQ̄j)^2×Σ(nv=1 to Nv)(VSnv−VS̄)^2), VQ̄j is an average value of all element values of the vXj, VSnv represents the subjective quality of the No. nv distorted video sequence, and VS̄ is an average value of the subjective quality of all the distorted video sequences in the training video database;
e-4) selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 7 linear correlation coefficients corresponding to the 7 level-1 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-1 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-1 sub-band sequences to be selected; and selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 8 linear correlation coefficients corresponding to the 8 level-2 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-2 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-2 sub-band sequences to be selected.
Preferably, in the step e), wLv1=0.71, and wLv2=0.58.
Preferably, in the step f), wLv=0.93.
Preferably, for obtaining the wi, the step g) specifically comprises steps of:
g-1) calculating an average value of brightness average values of all the images in each of the GOPs of the Vdis, marking the average value of the brightness average values of all the images of the Gdisi as Lavgi, wherein Lavgi=(1/2^n)×Σ(f=1 to 2^n) ∂f, ∂f represents the brightness average value of a No. f frame of image of the Gdisi, a value of the ∂f is the brightness average value obtained by averaging brightness values of all pixels in the No. f frame of image, and 1≤i≤nGoF;
g-2) calculating an average value of motion intensity of all the images of each of the GOPs except a first frame of image in the GOP, marking the average value of the motion intensity of all the images of the Gdisi except the first frame of image as MAavgi, wherein MAavgi=(1/(2^n−1))×Σ(f′=2 to 2^n) MAf′, MAf′ represents the motion intensity of the No. f′ frame of image of the Gdisi, MAf′=(1/(W×H))×Σ(s=1 to W)Σ(t=1 to H)√(mvx(s,t)^2+mvy(s,t)^2), W represents a width of the No. f′ frame of image of the Gdisi, H represents a height of the No. f′ frame of image of the Gdisi, mvx(s,t) represents a horizontal value of a motion vector of a pixel with a position of (s,t) in the No. f′ frame of image of the Gdisi, and mvy(s,t) represents a vertical value of the motion vector of the pixel with the position of (s,t) in the No. f′ frame of image of the Gdisi;
g-3) forming a brightness average value vector with the average values of the brightness average values of all the images of the GOPs of the Vdis, marking the brightness average value vector as VLavg, wherein VLavg=(Lavg1, Lavg2, . . . , LavgnGoF); and forming an average value vector of the motion intensity with the average values of the motion intensity of all the images of the GOPs of the Vdis except the first frame of image, marking the average value vector of the motion intensity as VMAavg, wherein VMAavg=(MAavg1, MAavg2, . . . , MAavgnGoF);
g-4) normalizing every element of the VLavg, for obtaining normalized values of the elements of the VLavg, marking the normalized value of the No. i element of the VLavg as vLavgi,norm, wherein vLavgi,norm=(Lavgi−min(VLavg))/(max(VLavg)−min(VLavg)), Lavgi represents a value of the No. i element of the VLavg, max(VLavg) represents a value of the element with a max value of the VLavg, and min(VLavg) represents a value of the element with a min value of the VLavg;
and normalizing every element of the VMAavg, for obtaining normalized values of the elements of the VMAavg, marking the normalized value of the No. i element of the VMAavg as vMAavgi,norm, wherein vMAavgi,norm=(MAavgi−min(VMAavg))/(max(VMAavg)−min(VMAavg)), MAavgi represents a value of the No. i element of the VMAavg, max(VMAavg) represents a value of the element with a max value of the VMAavg, and min(VMAavg) represents a value of the element with a min value of the VMAavg; and
g-5) calculating the weight value wi of the QLvi according to the vLavgi,norm and the vMAavgi,norm, wherein wi=(1−vMAavgi,norm)×vLavgi,norm.
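The pooling of steps e) through g) can be sketched as follows. This is a minimal illustration using the preferred weight values wLv1=0.71, wLv2=0.58 and wLv=0.93; the function name is illustrative, the per-GOP weights are passed in directly, and the final step assumes the overall quality Q is the wi-weighted average of the GOP qualities.

```python
# Sketch of steps e)-g): combine the qualities of the two selected
# sub-band sequences per level, then the two levels, then the GOPs.
# Weight values follow the preferred embodiment; names are illustrative.

def pool_quality(q_lv1_pairs, q_lv2_pairs, w_i,
                 w_lv1=0.71, w_lv2=0.58, w_lv=0.93):
    """q_lv1_pairs / q_lv2_pairs: one (Q_{i,p}, Q_{i,q}) tuple per GOP;
    w_i: one weight per GOP (computed in step g) from brightness and
    motion intensity)."""
    q_gop = []
    for (p1, q1), (p2, q2) in zip(q_lv1_pairs, q_lv2_pairs):
        q_lv1 = w_lv1 * p1 + (1 - w_lv1) * q1            # step e), level 1
        q_lv2 = w_lv2 * p2 + (1 - w_lv2) * q2            # step e), level 2
        q_gop.append(w_lv * q_lv1 + (1 - w_lv) * q_lv2)  # step f)
    # step g): weighted average of the GOP qualities
    return sum(w * q for w, q in zip(w_i, q_gop)) / sum(w_i)
```

With identical per-sub-band qualities the pooled score reduces to that common value, which is a quick sanity check of the weighting.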
Compared to the conventional technologies, the present invention has advantages as follows.
Firstly, according to the present invention, the 3D wavelet transform is utilized in video quality evaluation for transforming the GOPs of the video. By splitting the video sequence along the time axis, the time-domain information of the GOPs is described, which to a certain extent solves the problem that video time-domain information is difficult to describe, effectively improves the accuracy of objective video quality evaluation, and thus improves the correlation between the objective quality evaluation result and the subjective quality perceived by human eyes.
Secondly, to account for the time-domain correlation between the GOPs, the method weights the quality of the GOPs according to motion intensity and brightness, in such a manner that the method better matches human visual characteristics.
These and other objectives, features, and advantages of the present invention will become apparent from the following detailed description, the accompanying drawings, and the appended claims.
a is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with wireless transmission distortion according to the preferred embodiment of the present invention.
b is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with IP network transmission distortion according to the preferred embodiment of the present invention.
c is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with H.264 compression distortion according to the preferred embodiment of the present invention.
d is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with MPEG-2 compression distortion according to the preferred embodiment of the present invention.
e is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of all distorted video sequences in a video quality database according to the preferred embodiment of the present invention.
Referring to the drawings and a preferred embodiment, the present invention is further illustrated.
Referring to the drawings, a video quality evaluation method based on 3D wavelet transform according to the preferred embodiment of the present invention comprises steps of:
a) marking an original undistorted reference video sequence as Vref, and marking a distorted video sequence as Vdis, wherein the Vref and the Vdis both comprise Nfr frames of images, wherein Nfr≥2^n, n is a positive integer, and n∈[3,5], wherein n=5 in the preferred embodiment;
b) regarding 2^n frames of images as a group of pictures (GOP for short), respectively dividing the Vref and the Vdis into nGoF GOPs, marking a No. i GOP in the Vref as Grefi, and marking a No. i GOP in the Vdis as Gdisi, wherein nGoF=⌊Nfr/2^n⌋, the symbol ⌊ ⌋ means down-rounding, and 1≤i≤nGoF;
wherein in the preferred embodiment, n=5, and therefore each of the GOPs comprises 32 frames of images; in practice, if the quantities of the frames of images of the Vref and the Vdis are not integer multiples of 2^n, after the GOPs are obtained in order, the remaining images are omitted;
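The GOP division of step b) can be sketched as a simple floor division; the function name is illustrative, not taken from the patent.

```python
# Minimal sketch of step b): split a frame list into GOPs of 2^n frames;
# leftover frames that do not fill a whole GOP are omitted, as described
# above.

def divide_into_gops(frames, n=5):
    gop_len = 2 ** n
    n_gof = len(frames) // gop_len   # down-rounding, i.e. floor(Nfr / 2^n)
    return [frames[g * gop_len:(g + 1) * gop_len] for g in range(n_gof)]
```

For example, 70 frames with n=5 yield nGoF=2 GOPs of 32 frames each, with the last 6 frames omitted.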
c) applying a 2-level 3D wavelet transform on each of the GOPs of the Vref, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences comprise 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
wherein the 7 level-1 sub-band sequences corresponding to the GOPs of the Vref comprise: a level-1 reference time-domain low-frequency horizontal detailed sequence LLHref, a level-1 reference time-domain low-frequency vertical detailed sequence LHLref, a level-1 reference time-domain low-frequency diagonal detailed sequence LHHref, a level-1 reference time-domain high-frequency approximated sequence HLLref, a level-1 reference time-domain high-frequency horizontal detailed sequence HLHref, a level-1 reference time-domain high-frequency vertical detailed sequence HHLref, and a level-1 reference time-domain high-frequency diagonal detailed sequence HHHref; the 8 level-2 sub-band sequences corresponding to the GOPs of the Vref comprise: a level-2 reference time-domain low-frequency approximated sequence LLLLref, a level-2 reference time-domain low-frequency horizontal detailed sequence LLLHref, a level-2 reference time-domain low-frequency vertical detailed sequence LLHLref, a level-2 reference time-domain low-frequency diagonal detailed sequence LLHHref, a level-2 reference time-domain high-frequency approximated sequence LHLLref, a level-2 reference time-domain high-frequency horizontal detailed sequence LHLHref, a level-2 reference time-domain high-frequency vertical detailed sequence LHHLref, and a level-2 reference time-domain high-frequency diagonal detailed sequence LHHHref;
similarly, applying the 2-level 3D wavelet transform on each of the GOPs of the Vdis, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences are 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
wherein the 7 level-1 sub-band sequences corresponding to the GOPs of the Vdis comprise: a level-1 distorted time-domain low-frequency horizontal detailed sequence LLHdis, a level-1 distorted time-domain low-frequency vertical detailed sequence LHLdis, a level-1 distorted time-domain low-frequency diagonal detailed sequence LHHdis, a level-1 distorted time-domain high-frequency approximated sequence HLLdis, a level-1 distorted time-domain high-frequency horizontal detailed sequence HLHdis, a level-1 distorted time-domain high-frequency vertical detailed sequence HHLdis, and a level-1 distorted time-domain high-frequency diagonal detailed sequence HHHdis; the 8 level-2 sub-band sequences corresponding to the GOPs of the Vdis comprise: a level-2 distorted time-domain low-frequency approximated sequence LLLLdis, a level-2 distorted time-domain low-frequency horizontal detailed sequence LLLHdis, a level-2 distorted time-domain low-frequency vertical detailed sequence LLHLdis, a level-2 distorted time-domain low-frequency diagonal detailed sequence LLHHdis, a level-2 distorted time-domain high-frequency approximated sequence LHLLdis, a level-2 distorted time-domain high-frequency horizontal detailed sequence LHLHdis, a level-2 distorted time-domain high-frequency vertical detailed sequence LHHLdis, and a level-2 distorted time-domain high-frequency diagonal detailed sequence LHHHdis;
wherein the time-domain of the video is split by the 3D wavelet transform; the time-domain information is described from the angle of frequency components and is treated in the wavelet domain, which to a certain extent solves the problem that video time-domain information is difficult to describe in video quality evaluation, and effectively improves the accuracy of the evaluation method;
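A 2-level 3D wavelet transform of one GOP can be sketched with a separable Haar filter. The choice of the Haar basis is an assumption (the patent does not fix the wavelet); the axis order is (time, height, width), so the first letter of each sub-band label is the time-domain band, matching the LLH/.../HHH and LLLL/.../LHHH naming above.

```python
import numpy as np

def haar_split(x, axis):
    """One level of orthonormal Haar analysis along one axis."""
    a = np.swapaxes(x, 0, axis)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.swapaxes(lo, 0, axis), np.swapaxes(hi, 0, axis)

def dwt3(x):
    """One-level 3D Haar DWT -> 8 sub-bands keyed 'LLL'...'HHH'
    (first letter: time axis)."""
    bands = {'': x}
    for axis in range(3):
        split = {}
        for key, v in bands.items():
            lo, hi = haar_split(v, axis)
            split[key + 'L'], split[key + 'H'] = lo, hi
        bands = split
    return bands

rng = np.random.default_rng(0)
gop = rng.random((32, 16, 16))        # one GOP, n = 5 (illustrative size)
level1 = dwt3(gop)                    # 8 bands; 'LLL' is decomposed again
level2 = {'L' + k: v for k, v in dwt3(level1.pop('LLL')).items()}
subbands = {**level1, **level2}       # 7 level-1 + 8 level-2 = 15
```

Each level-1 band here has 16 frames and each level-2 band 8 frames, matching the 2^(n-1) and 2^(n-2) counts of step c); since the Haar steps are orthonormal, the total energy of the 15 sub-bands equals that of the GOP.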
d) calculating quality of each of the sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of a No. j sub-band sequence corresponding to the Gdisi as Qi,j, wherein Qi,j=(1/K)×Σ(k=1 to K) SSIM(VIrefi,j,k, VIdisi,j,k), 1≤j≤15, 1≤k≤K, K represents a frame quantity of the No. j sub-band sequence corresponding to the Grefi and of the No. j sub-band sequence corresponding to the Gdisi; if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-1 sub-band sequences, then K=2^(n-1); if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-2 sub-band sequences, then K=2^(n-2); VIrefi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Grefi, VIdisi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Gdisi, and SSIM( ) is a structural similarity function, SSIM(VIrefi,j,k, VIdisi,j,k)=((2×μref×μdis+c1)×(2×σref-dis+c2))/((μref^2+μdis^2+c1)×(σref^2+σdis^2+c2)), wherein μref represents an average value of the VIrefi,j,k, μdis represents an average value of the VIdisi,j,k, σref represents a standard deviation of the VIrefi,j,k, σdis represents a standard deviation of the VIdisi,j,k, σref-dis represents covariance between the VIrefi,j,k and the VIdisi,j,k, c1 and c2 are constants for preventing instability of the SSIM function when the denominator is close to zero, and c1≠0, c2≠0;
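The per-frame SSIM and the sub-band quality Qi,j of step d) can be sketched with a single global window (the patent does not specify a windowing scheme, and the c1, c2 values below are the common defaults for 8-bit data, an assumption):

```python
import numpy as np

def ssim_global(ref, dis, c1=6.5025, c2=58.5225):
    """Single-window SSIM between two frames; c1 = (0.01*255)^2 and
    c2 = (0.03*255)^2 are assumed defaults, not fixed by the patent."""
    mu_r, mu_d = ref.mean(), dis.mean()
    var_r, var_d = ref.var(), dis.var()
    cov = ((ref - mu_r) * (dis - mu_d)).mean()
    return ((2 * mu_r * mu_d + c1) * (2 * cov + c2)) / \
           ((mu_r ** 2 + mu_d ** 2 + c1) * (var_r + var_d + c2))

def subband_quality(ref_seq, dis_seq):
    """Q_{i,j}: mean SSIM over the K frames of one sub-band sequence."""
    return float(np.mean([ssim_global(r, d)
                          for r, d in zip(ref_seq, dis_seq)]))
```

An undistorted sub-band sequence scores exactly 1, and any distortion lowers the score, which matches the role of Qi,j in the later pooling.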
e) selecting 2 sequences from the 7 level-1 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-1 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-1 sub-band sequences, wherein for the 7 level-1 sub-band sequences corresponding to the Gdisi, supposing that a No. p1 sequence and a No. q1 sequence of the level-1 sub-band sequences are selected, the quality of the level-1 sub-band sequences corresponding to the Gdisi is marked as QLv1i, wherein QLv1i=wLv1×Qi,p1+(1−wLv1)×Qi,q1, and wLv1 is a weight value of the Qi,p1;
and selecting 2 sequences from the 8 level-2 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-2 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-2 sub-band sequences, wherein for the 8 level-2 sub-band sequences corresponding to the Gdisi, supposing that a No. p2 sequence and a No. q2 sequence of the level-2 sub-band sequences are selected, the quality of the level-2 sub-band sequences corresponding to the Gdisi is marked as QLv2i, wherein QLv2i=wLv2×Qi,p2+(1−wLv2)×Qi,q2, and wLv2 is a weight value of the Qi,p2;
wherein in the preferred embodiment, wLv1=0.71, wLv2=0.58, p1=9, q1=12, p2=3, and q2=1;
wherein according to the present invention, the selection of the No. p1 and No. q1 level-1 sub-band sequences and of the No. p2 and No. q2 level-2 sub-band sequences is a process of choosing suitable parameters by statistical analysis; that is to say, the selection is performed on a suitable training video database through the following steps e-1) to e-4); after the values of the p1, q1, p2 and q2 are obtained, these constant values are applicable whenever distorted video sequences are evaluated with the video quality evaluation method;
wherein for selecting the 2 sequences of the level-1 sub-band sequences and the 2 sequences of the level-2 sub-band sequences, the step e) specifically comprises steps of:
e-1) selecting a video database with subjective video quality as a training video database, obtaining quality of each sub-band sequence corresponding to the GOPs of the distorted video sequences in the training video database by applying the step a) to the step d), marking the No. nv distorted video sequence in the training video database as Vdisnv, wherein 1≤nv≤Nv, and Nv represents a total quantity of the distorted video sequences in the training video database;
e-2) calculating objective video quality of all the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, marking the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the Vdisnv as VQnv,j, wherein VQnv,j=(1/nGoF)×Σ(i=1 to nGoF) Qi,j, that is, the average of the qualities of the No. j sub-band sequences over all the GOPs of the Vdisnv;
e-3) forming a vector vXj with the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, wherein vXj=(VQ1,j, VQ2,j, . . . , VQNv,j); then calculating a linear correlation coefficient of the objective video quality of the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database and the subjective quality of the distorted video sequences, marking the linear correlation coefficient of the objective video quality of the No. j sub-band sequence corresponding to all the GOPs of the distorted video sequences and the subjective quality of the distorted video sequences as CCj, wherein CCj=Σ(nv=1 to Nv)(VQnv,j−VQ̄j)×(VSnv−VS̄)/√(Σ(nv=1 to Nv)(VQnv,j−VQ̄j)^2×Σ(nv=1 to Nv)(VSnv−VS̄)^2), VQ̄j is an average value of all element values of the vXj, VSnv represents the subjective quality of the No. nv distorted video sequence, and VS̄ is an average value of the subjective quality of all the distorted video sequences in the training video database;
e-4) after obtaining the 15 linear correlation coefficients in the step e-3), selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 7 linear correlation coefficients corresponding to the 7 level-1 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-1 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-1 sub-band sequences to be selected; and selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 8 linear correlation coefficients corresponding to the 8 level-2 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-2 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-2 sub-band sequences to be selected;
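The statistical selection of steps e-3) and e-4) amounts to computing a Pearson linear correlation coefficient per sub-band and keeping the two best per level; a minimal sketch (function names illustrative):

```python
import numpy as np

def pearson_cc(x, y):
    """Linear correlation coefficient as used in step e-3)."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float((x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum()))

def select_top2(cc_values):
    """Step e-4): indices of the largest and second-largest CC values."""
    order = sorted(range(len(cc_values)),
                   key=lambda j: cc_values[j], reverse=True)
    return order[0], order[1]
```

In use, `select_top2` would be applied once to the 7 level-1 CC values and once to the 8 level-2 CC values to fix (p1, q1) and (p2, q2).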
wherein in the preferred embodiment, for selecting the No. p2 and the No. q2 level-2 sub-band sequences, and the No. p1 and the No. q1 level-1 sub-band sequences, a distorted video collection with 4 different distortion types and different distortion degrees based on 10 undistorted video sequences in a LIVE video quality database from the University of Texas at Austin is utilized; the distorted video collection comprises: 40 distorted video sequences with wireless transmission distortion, 30 distorted video sequences with IP network transmission distortion, 40 distorted video sequences with H.264 compression distortion, and 40 distorted video sequences with MPEG-2 compression distortion; each of the distorted video sequences has a corresponding subjective quality evaluation result which is represented by a difference mean opinion score DMOS; that is to say, each distorted video sequence Vdisnv has a corresponding subjective quality evaluation result VSnv;
f) calculating quality of the GOPs of the Vdis according to the quality of the level-1 and level-2 sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of the Gdisi as QLvi, wherein QLvi=wLv×QLv1i+(1−wLv)×QLv2i, wLv is a weight value of the QLv1i, in the preferred embodiment, wLv=0.93; and
g) calculating objective evaluated quality of the Vdis according to the quality of the GOPs of the Vdis, marking the objective evaluated quality as Q, wherein
wi is a weight value of the QLvi; wherein for obtaining the wi, the step g) specifically comprises steps of:
g-1) calculating an average value of brightness average values of all the images in each of the GOPs of the Vdis, marking the average value of the brightness average values of all the images of the Gdisi as Lavgi, wherein Lavgi=(1/2^n)×Σ(f=1 to 2^n) ∂f, ∂f represents the brightness average value of a No. f frame of image of the Gdisi, a value of the ∂f is the brightness average value obtained by averaging brightness values of all pixels in the No. f frame of image, and 1≤i≤nGoF;
g-2) calculating an average value of motion intensity of all the images of each of the GOPs except a first frame of image in the GOP, marking the average value of the motion intensity of all the images of the Gdisi except the first frame of image as MAavgi, wherein MAavgi=(1/(2^n−1))×Σ(f′=2 to 2^n) MAf′, MAf′ represents the motion intensity of the No. f′ frame of image of the Gdisi, MAf′=(1/(W×H))×Σ(s=1 to W)Σ(t=1 to H)√(mvx(s,t)^2+mvy(s,t)^2), W represents a width of the No. f′ frame of image of the Gdisi, H represents a height of the No. f′ frame of image of the Gdisi, mvx(s,t) represents a horizontal value of a motion vector of a pixel with a position of (s,t) in the No. f′ frame of image of the Gdisi, and mvy(s,t) represents a vertical value of the motion vector of the pixel with the position of (s,t) in the No. f′ frame of image of the Gdisi; the motion vector of each of the pixels in the No. f′ frame of image of the Gdisi is obtained with reference to a former frame of image of the No. f′ frame of image of the Gdisi;
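The motion-intensity measure of step g-2) is the mean motion-vector magnitude over the frame; a sketch, assuming the motion vectors have already been estimated against the previous frame (e.g. by block matching):

```python
import numpy as np

def motion_intensity(mvx, mvy):
    """MA_{f'}: average of sqrt(mvx^2 + mvy^2) over all W x H positions."""
    mvx = np.asarray(mvx, float)
    mvy = np.asarray(mvy, float)
    return float(np.sqrt(mvx ** 2 + mvy ** 2).mean())
```

For example, a uniform (3, 4) motion field yields an intensity of 5, and a static frame yields 0.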
g-3) forming a brightness average value vector with the average values of the brightness average values of all the images of the GOPs of the Vdis, marking the brightness average value vector as VLavg, wherein VLavg=(Lavg1, Lavg2, . . . , LavgnGoF); and forming an average value vector of the motion intensity with the average values of the motion intensity of all the images of the GOPs of the Vdis except the first frame of image, marking the average value vector of the motion intensity as VMAavg, wherein VMAavg=(MAavg1, MAavg2, . . . , MAavgnGoF);
g-4) normalizing every element of the VLavg, for obtaining normalized values of the elements of the VLavg, marking the normalized value of the No. i element of the VLavg as vLavgi,norm, wherein vLavgi,norm=(Lavgi−min(VLavg))/(max(VLavg)−min(VLavg)), Lavgi represents a value of the No. i element of the VLavg, max(VLavg) represents a value of the element with a max value of the VLavg, and min(VLavg) represents a value of the element with a min value of the VLavg;
and normalizing every element of the VMAavg, for obtaining normalized values of the elements of the VMAavg, marking the normalized value of the No. i element of the VMAavg as vMAavgi,norm, wherein vMAavgi,norm=(MAavgi−min(VMAavg))/(max(VMAavg)−min(VMAavg)), MAavgi represents a value of the No. i element of the VMAavg, max(VMAavg) represents a value of the element with a max value of the VMAavg, and min(VMAavg) represents a value of the element with a min value of the VMAavg; and
g-5) calculating the weight value wi of the QLvi according to the vLavgi,norm and the vMAavgi,norm, wherein wi=(1−vMAavgi,norm)×vLavgi,norm.
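Steps g-4) and g-5) are a min-max normalisation followed by the weight product; a minimal sketch (function names illustrative):

```python
import numpy as np

def minmax_norm(v):
    """Step g-4): map a per-GOP feature vector onto [0, 1].
    Assumes max(v) > min(v), i.e. the GOPs are not all identical."""
    v = np.asarray(v, float)
    return (v - v.min()) / (v.max() - v.min())

def gop_weights(l_avg, ma_avg):
    """Step g-5): w_i = (1 - vMAavg_{i,norm}) * vLavg_{i,norm}, so that
    bright, low-motion GOPs are weighted most heavily."""
    return (1.0 - minmax_norm(ma_avg)) * minmax_norm(l_avg)
```

For instance, brightness averages (0, 50, 100) with motion averages (10, 5, 0) give weights (0, 0.25, 1): the brightest and least-moving GOP dominates.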
For illustrating the effectiveness and feasibility of the present invention, the LIVE video quality database from the University of Texas at Austin is utilized for experimental verification, so as to analyze the correlation between the objective evaluation results and the difference mean opinion scores. The distorted video collection with 4 different distortion types and different distortion degrees is formed based on the 10 undistorted video sequences in the LIVE video quality database; the distorted video collection comprises: 40 distorted video sequences with wireless transmission distortion, 30 distorted video sequences with IP network transmission distortion, 40 distorted video sequences with H.264 compression distortion, and 40 distorted video sequences with MPEG-2 compression distortion. Scatter diagrams of the objective evaluated quality Q against the difference mean opinion score DMOS, for each distortion type and for all the distorted video sequences, are shown in the drawings.
Herein, 4 common parameters for evaluating the performance of a video quality evaluation method are utilized: the Pearson correlation coefficient under nonlinear regression (CC for short), the Spearman rank-order correlation coefficient (SROCC for short), the outlier ratio (OR for short), and the root mean squared error (RMSE for short). CC represents the accuracy of the objective quality evaluation method, and SROCC represents its prediction monotonicity; the closer the CC and the SROCC are to 1, the better the performance of the objective quality evaluation method. OR represents the dispersion degree of the objective quality evaluation method; the closer the OR is to 0, the better the method. RMSE represents the prediction accuracy of the objective quality evaluation method; the smaller the RMSE, the better the method. The CC, SROCC, OR and RMSE values of the video quality evaluation method according to the present invention are listed in Table 1. Referring to Table 1, for the overall hybrid distortion, the CC and the SROCC are both above 0.79, with the CC above 0.8; the OR is 0, and the RMSE is lower than 6.5. The correlation between the objective evaluated quality Q and the difference mean opinion score DMOS obtained according to the present invention is therefore high, which illustrates sufficient consistency of the objective evaluation results with the subjective visual evaluation results, and well illustrates the effectiveness of the present invention.
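The SROCC and RMSE figures of merit above can be sketched as follows; the nonlinear (logistic) regression applied before computing the CC is omitted, the outlier ratio (which depends on per-sequence confidence intervals) is not coded, and this Spearman sketch skips tie handling:

```python
import numpy as np

def rmse(pred, mos):
    """Root mean squared error between predictions and subjective scores."""
    pred = np.asarray(pred, float)
    mos = np.asarray(mos, float)
    return float(np.sqrt(((pred - mos) ** 2).mean()))

def srocc(pred, mos):
    """Spearman rank-order CC: Pearson CC of the ranks (no ties handled)."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(len(v))
        return r
    rx = ranks(np.asarray(pred, float))
    ry = ranks(np.asarray(mos, float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```

A perfectly monotone predictor scores SROCC = 1 regardless of scale, which is why SROCC measures monotonicity rather than absolute accuracy.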
One skilled in the art will understand that the embodiment of the present invention as shown in the drawings and described above is exemplary only and not intended to be limiting.
It will thus be seen that the objects of the present invention have been fully and effectively accomplished. Its embodiments have been shown and described for the purposes of illustrating the functional and structural principles of the present invention and is subject to change without departure from such principles. Therefore, this invention includes all modifications encompassed within the spirit and scope of the following claims.