The present invention claims priority under 35 U.S.C. 119(a-d) to CN 201410360953.9, filed Jul. 25, 2014.
1. Field of Invention
The present invention relates to a video signal processing technology, and more particularly to a video quality evaluation method based on 3-dimensional (3D for short) wavelet transform.
2. Description of Related Arts
With the rapid development of video coding and display technology, video systems of many kinds are being deployed ever more widely and have gradually become a research focus in the field of information processing. Because of a series of uncontrollable factors, video information is inevitably distorted during the acquisition, compression, transmission, decoding and display stages, resulting in a decrease of video quality. Accurately measuring video quality is therefore key to the development of video systems.

Video quality evaluation is divided into subjective and objective quality evaluation. Since visual information is ultimately received by the human eye, subjective quality evaluation is the most reliable in terms of accuracy. However, subjective quality evaluation requires scoring by human observers, which is time-consuming and difficult to integrate into a video system. An objective quality evaluation model, by contrast, can be integrated into a video system for real-time quality evaluation, which allows timely adjustment of system parameters and thus a video application with high quality. An objective video quality evaluation method that is accurate, effective and consistent with human visual characteristics therefore has considerable application value.

Conventional objective video quality evaluation methods mainly simulate the motion and time-domain processing of the human visual system, combined with objective image quality evaluation methods; that is to say, a time-domain distortion evaluation of the video is added to conventional objective image quality evaluation, so as to evaluate the video quality objectively. Although these methods describe the time-domain information of video sequences from different angles, current understanding of how the human eye processes video information is limited.
Therefore, the time-domain description provided by these methods is limited, which means the video time-domain quality is difficult to evaluate, and eventually leads to poor consistency between objective evaluation results and subjective visual evaluation results.
An object of the present invention is to provide a video quality evaluation method based on 3D wavelet transform which is able to effectively improve the correlation between an objective quality evaluation result and the subjective quality perceived by human eyes.
Accordingly, in order to accomplish the above object, the present invention provides a video quality evaluation method based on 3D wavelet transform, comprising steps of:
a) marking an original undistorted reference video sequence as Vref, and marking a distorted video sequence as Vdis, wherein the Vref and the Vdis both comprise Nfr frames of images, wherein Nfr≥2^n, n is a positive integer, and n∈[3,5];
b) regarding 2^n frames of images as a group of pictures (GOP for short), respectively dividing the Vref and the Vdis into nGoF GOPs, marking a No. i GOP in the Vref as Grefi, and marking a No. i GOP in the Vdis as Gdisi, wherein nGoF=⌊Nfr/2^n⌋, the symbol ⌊ ⌋ means down-rounding, and 1≤i≤nGoF;
c) applying a 2-level 3D wavelet transform on each of the GOPs of the Vref, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences comprise 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
similarly, applying the 2-level 3D wavelet transform on each of the GOPs of the Vdis, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences are 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
d) calculating quality of each of the sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of a No. j sub-band sequence corresponding to the Gdisi as Qi,j, wherein Qi,j=(1/K)×Σ(k=1 to K) SSIM(VIrefi,j,k, VIdisi,j,k), K represents a frame quantity of the No. j sub-band sequence corresponding to the Grefi and of the No. j sub-band sequence corresponding to the Gdisi; if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-1 sub-band sequences, then K=2^(n-1); if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-2 sub-band sequences, then K=2^(n-2); VIrefi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Grefi, VIdisi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Gdisi, and SSIM( ) is a structural similarity function, SSIM(VIrefi,j,k, VIdisi,j,k)=((2×μref×μdis+c1)×(2×σref-dis+c2))/((μref^2+μdis^2+c1)×(σref^2+σdis^2+c2)), wherein μref represents an average value of the VIrefi,j,k, μdis represents an average value of the VIdisi,j,k, σref represents a standard deviation of the VIrefi,j,k, σdis represents a standard deviation of the VIdisi,j,k, σref-dis represents covariance between the VIrefi,j,k and the VIdisi,j,k, c1 and c2 are constants, and c1≠0, c2≠0;
e) selecting 2 sequences from the 7 level-1 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-1 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-1 sub-band sequences, wherein for the 7 level-1 sub-band sequences corresponding to the Gdisi, supposing that a No. p1 sequence and a No. q1 sequence of the level-1 sub-band sequences are selected, the quality of the level-1 sub-band sequences corresponding to the Gdisi is marked as QLv1i, wherein QLv1i=wLv1×Qi,p1+(1−wLv1)×Qi,q1, and wLv1 is a weight value of the Qi,p1;
and selecting 2 sequences from the 8 level-2 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-2 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-2 sub-band sequences, wherein for the 8 level-2 sub-band sequences corresponding to the Gdisi, supposing that a No. p2 sequence and a No. q2 sequence of the level-2 sub-band sequences are selected, the quality of the level-2 sub-band sequences corresponding to the Gdisi is marked as QLv2i, wherein QLv2i=wLv2×Qi,p2+(1−wLv2)×Qi,q2, and wLv2 is a weight value of the Qi,p2;
f) calculating quality of the GOPs of the Vdis according to the quality of the level-1 and level-2 sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of the Gdisi as QLvi, wherein QLvi=wLv×QLv1i+(1−wLv)×QLv2i, and wLv is a weight value of the QLv1i; and
g) calculating objective evaluated quality of the Vdis according to the quality of the GOPs of the Vdis, marking the objective evaluated quality as Q, wherein Q=(Σ(i=1 to nGoF) wi×QLvi)/(Σ(i=1 to nGoF) wi), and wi is a weight value of the QLvi.
Preferably, for selecting the 2 sequences of the level-1 sub-band sequences and the 2 sequences of the level-2 sub-band sequences, the step e) specifically comprises steps of:
e-1) selecting a video database with subjective video quality as a training video database, obtaining quality of each sub-band sequence corresponding to each GOP of the distorted video sequences in the training video database by applying the step a) to the step d), marking the No. nv distorted video sequence in the training video database as Vdisnv, wherein 1≤nv≤Nv, and Nv represents a total quantity of the distorted video sequences in the training video database;
e-2) calculating objective video quality of all the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, marking the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the Vdisnv as VQnv,j, wherein VQnv,j=(1/nGoF)×Σ(i=1 to nGoF) Qi,j, that is, the average of the qualities of the No. j sub-band sequences over all the GOPs of the Vdisnv;
e-3) forming a vector vXj with the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, wherein vXj=(VQ1,j, VQ2,j, . . . , VQNv,j); then calculating a linear correlation coefficient of the objective video quality of the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database and the subjective quality of the distorted video sequences, marking the linear correlation coefficient of the objective video quality of the No. j sub-band sequence corresponding to all the GOPs of the distorted video sequences and the subjective quality of the distorted video sequences as CCj, wherein CCj=Σ(nv=1 to Nv)(VQnv,j−VQ̄j)×(VSnv−VS̄)/√(Σ(nv=1 to Nv)(VQnv,j−VQ̄j)^2×Σ(nv=1 to Nv)(VSnv−VS̄)^2), VQ̄j is an average value of all element values of the vXj, VSnv represents the subjective quality of the No. nv distorted video sequence, and VS̄ is an average value of the subjective quality of all the distorted video sequences in the training video database;
e-4) selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 7 linear correlation coefficients corresponding to the 7 level-1 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-1 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-1 sub-band sequences to be selected; and selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 8 linear correlation coefficients corresponding to the 8 level-2 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-2 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-2 sub-band sequences to be selected.
Preferably, in the step e), wLv1=0.71, and wLv2=0.58.
Preferably, in the step f), wLv=0.93.
Preferably, for obtaining the wi, the step g) specifically comprises steps of:
g-1) calculating an average value of brightness average values of all the images in each of the GOPs of the Vdis, marking the average value of the brightness average values of all the images of the Gdisi as Lavgi, wherein Lavgi=(1/2^n)×Σ(f=1 to 2^n) ∂f, ∂f represents the brightness average value of a No. f frame of image of the Gdisi, a value of the ∂f is the brightness average value obtained by averaging brightness values of all pixels in the No. f frame of image, and 1≤i≤nGoF;
g-2) calculating an average value of motion intensity of all the images of each of the GOPs except a first frame of image in the GOP, marking the average value of the motion intensity of all the images of the Gdisi except the first frame of image as MAavgi, wherein MAavgi=(1/(2^n−1))×Σ(f′=2 to 2^n) MAf′, MAf′ represents the motion intensity of the No. f′ frame of image of the Gdisi, MAf′=(1/(W×H))×Σ(s=1 to W)Σ(t=1 to H)√(mvx(s,t)^2+mvy(s,t)^2), W represents a width of the No. f′ frame of image of the Gdisi, H represents a height of the No. f′ frame of image of the Gdisi, mvx(s,t) represents a horizontal value of a motion vector of a pixel with a position of (s,t) in the No. f′ frame of image of the Gdisi, and mvy(s,t) represents a vertical value of the motion vector of the pixel with the position of (s,t) in the No. f′ frame of image of the Gdisi;
g-3) forming a brightness average value vector with the average values of the brightness average values of all the images of the GOPs of the Vdis, marking the brightness average value vector as VLavg, wherein VLavg=(Lavg1, Lavg2, . . . , LavgnGoF); and forming an average value vector of the motion intensity with the average values of the motion intensity of all the images of the GOPs of the Vdis except the first frame of image, marking the average value vector of the motion intensity as VMAavg, wherein VMAavg=(MAavg1, MAavg2, . . . , MAavgnGoF);
g-4) normalizing every element of the VLavg, for obtaining normalized values of the elements of the VLavg, marking the normalized value of the No. i element of the VLavg as vLavgi,norm, wherein vLavgi,norm=(Lavgi−min(VLavg))/(max(VLavg)−min(VLavg)), Lavgi represents a value of the No. i element of the VLavg, max(VLavg) represents a value of the element with a max value of the VLavg, and min(VLavg) represents a value of the element with a min value of the VLavg;
and normalizing every element of the VMAavg, for obtaining normalized values of the elements of the VMAavg, marking the normalized value of the No. i element of the VMAavg as vMAavgi,norm, wherein vMAavgi,norm=(MAavgi−min(VMAavg))/(max(VMAavg)−min(VMAavg)), MAavgi represents a value of the No. i element of the VMAavg, max(VMAavg) represents a value of the element with a max value of the VMAavg, and min(VMAavg) represents a value of the element with a min value of the VMAavg; and
g-5) calculating the weight value wi of the QLvi according to the vLavgi,norm and the vMAavgi,norm, wherein wi=(1−vMAavgi,norm)×vLavgi,norm.
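The pooling of steps e) through g) can be sketched as follows. This is a minimal illustration using the preferred weight values wLv1=0.71, wLv2=0.58 and wLv=0.93; the function name is illustrative, the per-GOP weights are passed in directly, and the final step assumes the overall quality Q is the wi-weighted average of the GOP qualities.

```python
# Sketch of steps e)-g): combine the qualities of the two selected
# sub-band sequences per level, then the two levels, then the GOPs.
# Weight values follow the preferred embodiment; names are illustrative.

def pool_quality(q_lv1_pairs, q_lv2_pairs, w_i,
                 w_lv1=0.71, w_lv2=0.58, w_lv=0.93):
    """q_lv1_pairs / q_lv2_pairs: one (Q_{i,p}, Q_{i,q}) tuple per GOP;
    w_i: one weight per GOP (computed in step g) from brightness and
    motion intensity)."""
    q_gop = []
    for (p1, q1), (p2, q2) in zip(q_lv1_pairs, q_lv2_pairs):
        q_lv1 = w_lv1 * p1 + (1 - w_lv1) * q1            # step e), level 1
        q_lv2 = w_lv2 * p2 + (1 - w_lv2) * q2            # step e), level 2
        q_gop.append(w_lv * q_lv1 + (1 - w_lv) * q_lv2)  # step f)
    # step g): weighted average of the GOP qualities
    return sum(w * q for w, q in zip(w_i, q_gop)) / sum(w_i)
```

With identical per-sub-band qualities the pooled score reduces to that common value, which is a quick sanity check of the weighting.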
Compared to the conventional technologies, the present invention has advantages as follows.
Firstly, according to the present invention, the 3D wavelet transform is utilized in video quality evaluation for transforming the GOPs of the video. By splitting the video sequence along the time axis, the time-domain information of the GOPs is described, which to a certain extent solves the problem that video time-domain information is difficult to describe, effectively improves the accuracy of objective video quality evaluation, and thus improves the correlation between the objective quality evaluation result and the subjective quality perceived by human eyes.
Secondly, to account for the time-domain correlation between the GOPs, the method weights the quality of the GOPs according to motion intensity and brightness, in such a manner that the method better matches human visual characteristics.
These and other objectives, features, and advantages of the present invention will become apparent from the following detailed description, the accompanying drawings, and the appended claims.
a is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with wireless transmission distortion according to the preferred embodiment of the present invention.
b is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with IP network transmission distortion according to the preferred embodiment of the present invention.
c is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with H.264 compression distortion according to the preferred embodiment of the present invention.
d is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of distorted video sequences with MPEG-2 compression distortion according to the preferred embodiment of the present invention.
e is a scatter diagram of objective evaluated quality Q judged by the video quality evaluation method and a difference mean opinion score DMOS of all distorted video sequences in a video quality database according to the preferred embodiment of the present invention.
Referring to the drawings and a preferred embodiment, the present invention is further illustrated.
Referring to the drawings, a video quality evaluation method based on 3D wavelet transform according to the preferred embodiment of the present invention comprises steps of:
a) marking an original undistorted reference video sequence as Vref, and marking a distorted video sequence as Vdis, wherein the Vref and the Vdis both comprise Nfr frames of images, wherein Nfr≥2^n, n is a positive integer, and n∈[3,5], wherein n=5 in the preferred embodiment;
b) regarding 2^n frames of images as a group of pictures (GOP for short), respectively dividing the Vref and the Vdis into nGoF GOPs, marking a No. i GOP in the Vref as Grefi, and marking a No. i GOP in the Vdis as Gdisi, wherein nGoF=⌊Nfr/2^n⌋, the symbol ⌊ ⌋ means down-rounding, and 1≤i≤nGoF;
wherein in the preferred embodiment, n=5, and therefore each of the GOPs comprises 32 frames of images; in practice, if the quantities of the frames of images of the Vref and the Vdis are not integer multiples of 2^n, after the GOPs are obtained in order, the remaining images are omitted;
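The GOP division of step b) can be sketched as a simple floor division; the function name is illustrative, not taken from the patent.

```python
# Minimal sketch of step b): split a frame list into GOPs of 2^n frames;
# leftover frames that do not fill a whole GOP are omitted, as described
# above.

def divide_into_gops(frames, n=5):
    gop_len = 2 ** n
    n_gof = len(frames) // gop_len   # down-rounding, i.e. floor(Nfr / 2^n)
    return [frames[g * gop_len:(g + 1) * gop_len] for g in range(n_gof)]
```

For example, 70 frames with n=5 yield nGoF=2 GOPs of 32 frames each, with the last 6 frames omitted.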
c) applying a 2-level 3D wavelet transform on each of the GOPs of the Vref, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences comprise 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
wherein the 7 level-1 sub-band sequences corresponding to the GOPs of the Vref comprise: a level-1 reference time-domain low-frequency horizontal detailed sequence LLHref, a level-1 reference time-domain low-frequency vertical detailed sequence LHLref, a level-1 reference time-domain low-frequency diagonal detailed sequence LHHref, a level-1 reference time-domain high-frequency approximated sequence HLLref, a level-1 reference time-domain high-frequency horizontal detailed sequence HLHref, a level-1 reference time-domain high-frequency vertical detailed sequence HHLref, and a level-1 reference time-domain high-frequency diagonal detailed sequence HHHref; the 8 level-2 sub-band sequences corresponding to the GOPs of the Vref comprise: a level-2 reference time-domain low-frequency approximated sequence LLLLref, a level-2 reference time-domain low-frequency horizontal detailed sequence LLLHref, a level-2 reference time-domain low-frequency vertical detailed sequence LLHLref, a level-2 reference time-domain low-frequency diagonal detailed sequence LLHHref, a level-2 reference time-domain high-frequency approximated sequence LHLLref, a level-2 reference time-domain high-frequency horizontal detailed sequence LHLHref, a level-2 reference time-domain high-frequency vertical detailed sequence LHHLref, and a level-2 reference time-domain high-frequency diagonal detailed sequence LHHHref;
similarly, applying the 2-level 3D wavelet transform on each of the GOPs of the Vdis, for obtaining 15 sub-band sequences corresponding to each of the GOPs, wherein the 15 sub-band sequences are 7 level-1 sub-band sequences and 8 level-2 sub-band sequences, each of the level-1 sub-band sequences comprises 2^(n-1) frames of images, and each of the level-2 sub-band sequences comprises 2^(n-2) frames of images;
wherein the 7 level-1 sub-band sequences corresponding to the GOPs of the Vdis comprise: a level-1 distorted time-domain low-frequency horizontal detailed sequence LLHdis, a level-1 distorted time-domain low-frequency vertical detailed sequence LHLdis, a level-1 distorted time-domain low-frequency diagonal detailed sequence LHHdis, a level-1 distorted time-domain high-frequency approximated sequence HLLdis, a level-1 distorted time-domain high-frequency horizontal detailed sequence HLHdis, a level-1 distorted time-domain high-frequency vertical detailed sequence HHLdis, and a level-1 distorted time-domain high-frequency diagonal detailed sequence HHHdis; the 8 level-2 sub-band sequences corresponding to the GOPs of the Vdis comprise: a level-2 distorted time-domain low-frequency approximated sequence LLLLdis, a level-2 distorted time-domain low-frequency horizontal detailed sequence LLLHdis, a level-2 distorted time-domain low-frequency vertical detailed sequence LLHLdis, a level-2 distorted time-domain low-frequency diagonal detailed sequence LLHHdis, a level-2 distorted time-domain high-frequency approximated sequence LHLLdis, a level-2 distorted time-domain high-frequency horizontal detailed sequence LHLHdis, a level-2 distorted time-domain high-frequency vertical detailed sequence LHHLdis, and a level-2 distorted time-domain high-frequency diagonal detailed sequence LHHHdis;
wherein the time-domain of the video is split by the 3D wavelet transform; the time-domain information is described from the angle of frequency components and is treated in the wavelet domain, which to a certain extent solves the problem that video time-domain information is difficult to describe in video quality evaluation, and effectively improves the accuracy of the evaluation method;
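A 2-level 3D wavelet transform of one GOP can be sketched with a separable Haar filter. The choice of the Haar basis is an assumption (the patent does not fix the wavelet); the axis order is (time, height, width), so the first letter of each sub-band label is the time-domain band, matching the LLH/.../HHH and LLLL/.../LHHH naming above.

```python
import numpy as np

def haar_split(x, axis):
    """One level of orthonormal Haar analysis along one axis."""
    a = np.swapaxes(x, 0, axis)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.swapaxes(lo, 0, axis), np.swapaxes(hi, 0, axis)

def dwt3(x):
    """One-level 3D Haar DWT -> 8 sub-bands keyed 'LLL'...'HHH'
    (first letter: time axis)."""
    bands = {'': x}
    for axis in range(3):
        split = {}
        for key, v in bands.items():
            lo, hi = haar_split(v, axis)
            split[key + 'L'], split[key + 'H'] = lo, hi
        bands = split
    return bands

rng = np.random.default_rng(0)
gop = rng.random((32, 16, 16))        # one GOP, n = 5 (illustrative size)
level1 = dwt3(gop)                    # 8 bands; 'LLL' is decomposed again
level2 = {'L' + k: v for k, v in dwt3(level1.pop('LLL')).items()}
subbands = {**level1, **level2}       # 7 level-1 + 8 level-2 = 15
```

Each level-1 band here has 16 frames and each level-2 band 8 frames, matching the 2^(n-1) and 2^(n-2) counts of step c); since the Haar steps are orthonormal, the total energy of the 15 sub-bands equals that of the GOP.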
d) calculating quality of each of the sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of a No. j sub-band sequence corresponding to the Gdisi as Qi,j, wherein Qi,j=(1/K)×Σ(k=1 to K) SSIM(VIrefi,j,k, VIdisi,j,k), 1≤j≤15, 1≤k≤K, K represents a frame quantity of the No. j sub-band sequence corresponding to the Grefi and of the No. j sub-band sequence corresponding to the Gdisi; if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-1 sub-band sequences, then K=2^(n-1); if the No. j sub-band sequence corresponding to the Grefi and the No. j sub-band sequence corresponding to the Gdisi are both level-2 sub-band sequences, then K=2^(n-2); VIrefi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Grefi, VIdisi,j,k represents a No. k frame of image of the No. j sub-band sequence corresponding to the Gdisi, and SSIM( ) is a structural similarity function, SSIM(VIrefi,j,k, VIdisi,j,k)=((2×μref×μdis+c1)×(2×σref-dis+c2))/((μref^2+μdis^2+c1)×(σref^2+σdis^2+c2)), wherein μref represents an average value of the VIrefi,j,k, μdis represents an average value of the VIdisi,j,k, σref represents a standard deviation of the VIrefi,j,k, σdis represents a standard deviation of the VIdisi,j,k, σref-dis represents covariance between the VIrefi,j,k and the VIdisi,j,k, c1 and c2 are constants for preventing instability of the SSIM function when the denominator is close to zero, and c1≠0, c2≠0;
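The per-frame SSIM and the sub-band quality Qi,j of step d) can be sketched with a single global window (the patent does not specify a windowing scheme, and the c1, c2 values below are the common defaults for 8-bit data, an assumption):

```python
import numpy as np

def ssim_global(ref, dis, c1=6.5025, c2=58.5225):
    """Single-window SSIM between two frames; c1 = (0.01*255)^2 and
    c2 = (0.03*255)^2 are assumed defaults, not fixed by the patent."""
    mu_r, mu_d = ref.mean(), dis.mean()
    var_r, var_d = ref.var(), dis.var()
    cov = ((ref - mu_r) * (dis - mu_d)).mean()
    return ((2 * mu_r * mu_d + c1) * (2 * cov + c2)) / \
           ((mu_r ** 2 + mu_d ** 2 + c1) * (var_r + var_d + c2))

def subband_quality(ref_seq, dis_seq):
    """Q_{i,j}: mean SSIM over the K frames of one sub-band sequence."""
    return float(np.mean([ssim_global(r, d)
                          for r, d in zip(ref_seq, dis_seq)]))
```

An undistorted sub-band sequence scores exactly 1, and any distortion lowers the score, which matches the role of Qi,j in the later pooling.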
e) selecting 2 sequences from the 7 level-1 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-1 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-1 sub-band sequences, wherein for the 7 level-1 sub-band sequences corresponding to the Gdisi, supposing that a No. p1 sequence and a No. q1 sequence of the level-1 sub-band sequences are selected, the quality of the level-1 sub-band sequences corresponding to the Gdisi is marked as QLv1i, wherein QLv1i=wLv1×Qi,p1+(1−wLv1)×Qi,q1, and wLv1 is a weight value of the Qi,p1;
and selecting 2 sequences from the 8 level-2 sub-band sequences of each of the GOPs of the Vdis, then calculating quality of the level-2 sub-band sequences corresponding to the GOPs of the Vdis according to quality of the selected 2 level-2 sub-band sequences, wherein for the 8 level-2 sub-band sequences corresponding to the Gdisi, supposing that a No. p2 sequence and a No. q2 sequence of the level-2 sub-band sequences are selected, the quality of the level-2 sub-band sequences corresponding to the Gdisi is marked as QLv2i, wherein QLv2i=wLv2×Qi,p2+(1−wLv2)×Qi,q2, and wLv2 is a weight value of the Qi,p2;
wherein in the preferred embodiment, wLv1=0.71, wLv2=0.58, p1=9, q1=12, p2=3, and q2=1;
wherein according to the present invention, the selection of the No. p1 and No. q1 level-1 sub-band sequences and of the No. p2 and No. q2 level-2 sub-band sequences is a process of choosing suitable parameters by statistical analysis; that is to say, the selection is performed on a suitable training video database through the following steps e-1) to e-4); after the values of the p1, q1, p2 and q2 are obtained, these constant values are applicable whenever distorted video sequences are evaluated with the video quality evaluation method;
wherein for selecting the 2 sequences of the level-1 sub-band sequences and the 2 sequences of the level-2 sub-band sequences, the step e) specifically comprises steps of:
e-1) selecting a video database with subjective video quality as a training video database, obtaining quality of each sub-band sequence corresponding to the GOPs of the distorted video sequences in the training video database by applying the step a) to the step d), marking the No. nv distorted video sequence in the training video database as Vdisnv, wherein 1≤nv≤Nv, and Nv represents a total quantity of the distorted video sequences in the training video database;
e-2) calculating objective video quality of all the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, marking the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the Vdisnv as VQnv,j, wherein VQnv,j=(1/nGoF)×Σ(i=1 to nGoF) Qi,j, that is, the average of the qualities of the No. j sub-band sequences over all the GOPs of the Vdisnv;
e-3) forming a vector vXj with the objective video quality of all the No. j sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database, wherein vXj=(VQ1,j, VQ2,j, . . . , VQNv,j); then calculating a linear correlation coefficient of the objective video quality of the same sub-band sequences corresponding to all the GOPs of the distorted video sequences in the training video database and the subjective quality of the distorted video sequences, marking the linear correlation coefficient of the objective video quality of the No. j sub-band sequence corresponding to all the GOPs of the distorted video sequences and the subjective quality of the distorted video sequences as CCj, wherein CCj=Σ(nv=1 to Nv)(VQnv,j−VQ̄j)×(VSnv−VS̄)/√(Σ(nv=1 to Nv)(VQnv,j−VQ̄j)^2×Σ(nv=1 to Nv)(VSnv−VS̄)^2), VQ̄j is an average value of all element values of the vXj, VSnv represents the subjective quality of the No. nv distorted video sequence, and VS̄ is an average value of the subjective quality of all the distorted video sequences in the training video database;
e-4) after obtaining the 15 linear correlation coefficients in the step e-3), selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 7 linear correlation coefficients corresponding to the 7 level-1 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-1 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-1 sub-band sequences to be selected; and selecting a max linear correlation coefficient and a second max linear correlation coefficient from the 8 linear correlation coefficients corresponding to the 8 level-2 sub-band sequences out of the obtained 15 linear correlation coefficients, regarding the level-2 sub-band sequences respectively corresponding to the max linear correlation coefficient and the second max linear correlation coefficient as the two level-2 sub-band sequences to be selected;
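The statistical selection of steps e-3) and e-4) amounts to computing a Pearson linear correlation coefficient per sub-band and keeping the two best per level; a minimal sketch (function names illustrative):

```python
import numpy as np

def pearson_cc(x, y):
    """Linear correlation coefficient as used in step e-3)."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float((x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum()))

def select_top2(cc_values):
    """Step e-4): indices of the largest and second-largest CC values."""
    order = sorted(range(len(cc_values)),
                   key=lambda j: cc_values[j], reverse=True)
    return order[0], order[1]
```

In use, `select_top2` would be applied once to the 7 level-1 CC values and once to the 8 level-2 CC values to fix (p1, q1) and (p2, q2).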
wherein in the preferred embodiment, for selecting the No. p2 and the No. q2 level-2 sub-band sequences, and the No. p1 and the No. q1 level-1 sub-band sequences, a distorted video collection with 4 different distortion types and different distortion degrees based on 10 undistorted video sequences in a LIVE video quality database from the University of Texas at Austin is utilized; the distorted video collection comprises: 40 distorted video sequences with wireless transmission distortion, 30 distorted video sequences with IP network transmission distortion, 40 distorted video sequences with H.264 compression distortion, and 40 distorted video sequences with MPEG-2 compression distortion; each of the distorted video sequences has a corresponding subjective quality evaluation result which is represented by a difference mean opinion score DMOS; that is to say, each distorted video sequence Vdisnv has a corresponding subjective quality evaluation result VSnv;
f) calculating quality of the GOPs of the Vdis according to the quality of the level-1 and level-2 sub-band sequences corresponding to the GOPs of the Vdis, marking the quality of the Gdisi as QLvi, wherein QLvi=wLv×QLv1i+(1−wLv)×QLv2i, wLv is a weight value of the QLv1i, in the preferred embodiment, wLv=0.93; and
g) calculating objective evaluated quality of the Vdis according to the quality of the GOPs of the Vdis, marking the objective evaluated quality as Q, wherein
wi is a weight value of the QLvi; wherein for obtaining the wi, the step g) specifically comprises steps of:
g-1) calculating an average value of brightness average values of all the images in each of the GOPs of the Vdis, marking the average value of the brightness average values of all the images of the Gdisi as Lavgi, wherein Lavgi=(1/2^n)×Σ(f=1 to 2^n) ∂f, ∂f represents the brightness average value of a No. f frame of image of the Gdisi, a value of the ∂f is the brightness average value obtained by averaging brightness values of all pixels in the No. f frame of image, and 1≤i≤nGoF;
g-2) calculating an average value of motion intensity of all the images of each of the GOPs except a first frame of image in the GOP, marking the average value of the motion intensity of all the images of the Gdisi except the first frame of image as MAavgi, wherein MAavgi=(1/(2^n−1))×Σ(f′=2 to 2^n) MAf′, MAf′ represents the motion intensity of the No. f′ frame of image of the Gdisi, MAf′=(1/(W×H))×Σ(s=1 to W)Σ(t=1 to H)√(mvx(s,t)^2+mvy(s,t)^2), W represents a width of the No. f′ frame of image of the Gdisi, H represents a height of the No. f′ frame of image of the Gdisi, mvx(s,t) represents a horizontal value of a motion vector of a pixel with a position of (s,t) in the No. f′ frame of image of the Gdisi, and mvy(s,t) represents a vertical value of the motion vector of the pixel with the position of (s,t) in the No. f′ frame of image of the Gdisi; the motion vector of each of the pixels in the No. f′ frame of image of the Gdisi is obtained with reference to a former frame of image of the No. f′ frame of image of the Gdisi;
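The motion-intensity measure of step g-2) is the mean motion-vector magnitude over the frame; a sketch, assuming the motion vectors have already been estimated against the previous frame (e.g. by block matching):

```python
import numpy as np

def motion_intensity(mvx, mvy):
    """MA_{f'}: average of sqrt(mvx^2 + mvy^2) over all W x H positions."""
    mvx = np.asarray(mvx, float)
    mvy = np.asarray(mvy, float)
    return float(np.sqrt(mvx ** 2 + mvy ** 2).mean())
```

For example, a uniform (3, 4) motion field yields an intensity of 5, and a static frame yields 0.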
g-3) forming a brightness average value vector with the average values of the brightness average values of all the images of the GOPs of the Vdis, marking the brightness average value vector as VLavg, wherein VLavg=(Lavg1, Lavg2, . . . , LavgnGoF); and forming an average value vector of the motion intensity with the average values of the motion intensity of all the images of the GOPs of the Vdis except the first frame of image, marking the average value vector of the motion intensity as VMAavg, wherein VMAavg=(MAavg1, MAavg2, . . . , MAavgnGoF);
g-4) normalizing every element of the VLavg, for obtaining normalized values of the elements of the VLavg, marking the normalized value of the No. i element of the VLavg as vLavgi,norm, wherein vLavgi,norm=(Lavgi−min(VLavg))/(max(VLavg)−min(VLavg)), Lavgi represents a value of the No. i element of the VLavg, max(VLavg) represents a value of the element with a max value of the VLavg, and min(VLavg) represents a value of the element with a min value of the VLavg;
and normalizing every element of the VMAavg, for obtaining normalized values of the elements of the VMAavg, marking the normalized value of the No. i element of the VMAavg as vMAavgi,norm, wherein vMAavgi,norm=(MAavgi−min(VMAavg))/(max(VMAavg)−min(VMAavg)), MAavgi represents a value of the No. i element of the VMAavg, max(VMAavg) represents a value of the element with a max value of the VMAavg, and min(VMAavg) represents a value of the element with a min value of the VMAavg; and
g-5) calculating the weight value wi of the QLvi according to the vLavgi,norm and the vMAavgi,norm, wherein wi=(1−vMAavgi,norm)×vLavgi,norm.
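Steps g-4) and g-5) are a min-max normalisation followed by the weight product; a minimal sketch (function names illustrative):

```python
import numpy as np

def minmax_norm(v):
    """Step g-4): map a per-GOP feature vector onto [0, 1].
    Assumes max(v) > min(v), i.e. the GOPs are not all identical."""
    v = np.asarray(v, float)
    return (v - v.min()) / (v.max() - v.min())

def gop_weights(l_avg, ma_avg):
    """Step g-5): w_i = (1 - vMAavg_{i,norm}) * vLavg_{i,norm}, so that
    bright, low-motion GOPs are weighted most heavily."""
    return (1.0 - minmax_norm(ma_avg)) * minmax_norm(l_avg)
```

For instance, brightness averages (0, 50, 100) with motion averages (10, 5, 0) give weights (0, 0.25, 1): the brightest and least-moving GOP dominates.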
For illustrating the effectiveness and feasibility of the present invention, the LIVE video quality database from the University of Texas at Austin is utilized for experimental verification, so as to analyze the correlation between the objective evaluation results and the difference mean opinion scores. The distorted video collection with 4 different distortion types and different distortion degrees is formed based on the 10 undistorted video sequences in the LIVE video quality database; the distorted video collection comprises: 40 distorted video sequences with wireless transmission distortion, 30 distorted video sequences with IP network transmission distortion, 40 distorted video sequences with H.264 compression distortion, and 40 distorted video sequences with MPEG-2 compression distortion. Scatter diagrams of the objective evaluated quality Q against the difference mean opinion score DMOS, for each distortion type and for all the distorted video sequences, are shown in the drawings.
Herein, 4 common parameters for evaluating the performance of a video quality evaluation method are utilized: the Pearson correlation coefficient under nonlinear regression (CC for short), the Spearman rank-order correlation coefficient (SROCC for short), the outlier ratio (OR for short), and the root mean squared error (RMSE for short). CC represents the accuracy of the objective quality evaluation method, and SROCC represents its prediction monotonicity; the closer the CC and the SROCC are to 1, the better the performance of the objective quality evaluation method. OR represents the dispersion degree of the objective quality evaluation method; the closer the OR is to 0, the better the method. RMSE represents the prediction accuracy of the objective quality evaluation method; the smaller the RMSE, the better the method. The CC, SROCC, OR and RMSE values of the video quality evaluation method according to the present invention are listed in Table 1. Referring to Table 1, for the overall hybrid distortion, the CC and the SROCC are both above 0.79, with the CC above 0.8; the OR is 0, and the RMSE is lower than 6.5. The correlation between the objective evaluated quality Q and the difference mean opinion score DMOS obtained according to the present invention is therefore high, which illustrates sufficient consistency of the objective evaluation results with the subjective visual evaluation results, and well illustrates the effectiveness of the present invention.
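The SROCC and RMSE figures of merit above can be sketched as follows; the nonlinear (logistic) regression applied before computing the CC is omitted, the outlier ratio (which depends on per-sequence confidence intervals) is not coded, and this Spearman sketch skips tie handling:

```python
import numpy as np

def rmse(pred, mos):
    """Root mean squared error between predictions and subjective scores."""
    pred = np.asarray(pred, float)
    mos = np.asarray(mos, float)
    return float(np.sqrt(((pred - mos) ** 2).mean()))

def srocc(pred, mos):
    """Spearman rank-order CC: Pearson CC of the ranks (no ties handled)."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(len(v))
        return r
    rx = ranks(np.asarray(pred, float))
    ry = ranks(np.asarray(mos, float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```

A perfectly monotone predictor scores SROCC = 1 regardless of scale, which is why SROCC measures monotonicity rather than absolute accuracy.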
One skilled in the art will understand that the embodiment of the present invention as shown in the drawings and described above is exemplary only and not intended to be limiting.
It will thus be seen that the objects of the present invention have been fully and effectively accomplished. Its embodiments have been shown and described for the purposes of illustrating the functional and structural principles of the present invention and is subject to change without departure from such principles. Therefore, this invention includes all modifications encompassed within the spirit and scope of the following claims.