This invention relates to a method for determining a video quality measure, and an apparatus for determining a video quality measure.
Video quality measurement is the basis of video coding and processing, and has attracted increasing attention from industry and academia. However, accurately estimating the image quality of video is still difficult, especially when no reference video is available. In this case, the only useful information is that which can be obtained from the video itself. A common approach is to extract some features from the uncompressed video and then build a relationship between the extracted features and the video quality. Feature extraction is one of the most important steps. Common features used for quality assessment or quality measurement are blur, blockiness, and noise. Several feature extraction algorithms have been proposed in recent decades. However, how to identify and extract more effective features for no-reference video quality measurement remains a problem.
The present invention solves at least some of the above-mentioned problems. It provides a new feature for image quality assessment and measurement without using a reference, also known as “no-reference” type of measurement. The feature is calculated based on a variance within a local area, and is called Context Variance (CV) herein since it refers to the context of a measuring position. The feature is highly related to video quality and can be very helpful for image quality assessment and measurement, such as video quality measurement. Except for the term Context Variance, the term variance is used in a mathematical sense herein, i.e. variance is the square of standard deviation.
In principle, a method for determining a quality measure for a video image comprises selecting a measuring point, such as a MB, determining a context area around the measuring point, calculating a variance of pixel values in the context area, calculating a variance of pixel values in the measuring point, calculating a relation between the two variances of pixel values, and averaging said relations for a plurality of measuring points, wherein a quality measure for a video image is obtained.
In particular, according to one aspect of the invention, a method for determining a quality measure for a video image comprises selecting a first encoding unit of the video image as a measuring point, determining a context area of the selected first encoding unit, calculating a variance of pixel values in the context area, calculating a variance of pixel values in the selected first encoding unit, calculating a relation between the variance of pixel values in the selected first encoding unit and the variance of pixel values in the context area, and averaging for a plurality of selected first encoding units said relations, wherein a quality measure for a video image is obtained. The context area comprises the selected first encoding unit and a plurality of second encoding units that are directly adjacent to the selected first encoding unit.
An apparatus for determining a quality measure for video images is disclosed in claim 11.
Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in
Another observation is that the variance of the pixels in neighbouring encoding units of a given encoding unit will not change very much as the quantization level is changed. Therefore the measuring method is largely independent of the actual quantization parameter value. A current encoding unit and at least adjacent portions of its neighbouring encoding units, e.g. a MB and at least adjacent portions of its neighbouring MBs, are defined herein as a “context block”. Based on these observations, the variance of an encoding unit is used to measure the compression distortion, and the variance of the encoding unit's context block is used to reduce the influence of the video content. Thus, the Context Variance that is obtained from the variance of the encoding unit and the variance of the context block can be used to measure video quality.
Some aspects of the invention are the following:
First, the variance of an encoding unit is used to measure the compression distortion. The variance of the pixels in an encoding unit tends to be lower when the compression distortion is higher.
Second, a context block is determined around the encoding unit, and the variance of the context block is used to reduce the influence of the image content. The variance of the context block (e.g. a current MB and its neighbouring blocks) does not change much as the compression strength is changed, because it is more closely related to the video content. Therefore it can be used to reduce the influence of the video content.
Third, areas with too plain or too complicated texture can be excluded, which makes the result more stable. In a local area (or context) with too plain or too complicated texture, the calculated local Context Variance ($CV_{block}$) may be unstable. Excluding those areas can help to get better results. An example is given below with respect to
In the following, the invention is described in detail. For most existing codecs, such as H.264 and MPEG2, the macroblock (MB) is the basic encoding unit. Therefore the MB is exemplarily used as a basic processing unit herein. In other embodiments, the processing unit may be different, such as a 24×24 pixel block or a 32×32 pixel block.
In a first step 21, a MB is selected to be processed next.
In some videos, especially movies, there are black edges 32, mostly at the top and bottom, sometimes also at the left and right. In one embodiment, MBs in the black edges are not selected: if a current MB or any of its neighbouring blocks lies completely or partially in a black edge, the calculation of the individual Context Variance $CV_{block}$ is skipped for the current MB, since any possible calculation result would differ greatly from the real value and would therefore distort the measurement.
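The skip rule for black edges can be sketched as follows. This is only an illustration under stated assumptions: the MB is taken to be 16×16 pixels, the context is taken to be the full 3×3 group of MBs, and the border thicknesses `top`, `bottom`, `left` and `right` are hypothetical inputs (how the black edges are detected is not described here). The function name is likewise hypothetical.

```python
MB = 16  # assumed macroblock size in pixels


def context_touches_black_edge(mb_row, mb_col, top, bottom, left, right,
                               height, width):
    """Return True if the MB at (mb_row, mb_col) or any of its 8 neighbours
    lies completely or partially inside a black border band.

    top/bottom/left/right are border thicknesses in pixels (hypothetical
    inputs). The function also returns True when the 3x3 context would
    extend beyond the picture, since no full context exists there.
    """
    # Pixel extent of the 3x3-MB context, as half-open ranges [y0, y1), [x0, x1).
    y0, y1 = (mb_row - 1) * MB, (mb_row + 2) * MB
    x0, x1 = (mb_col - 1) * MB, (mb_col + 2) * MB
    # Any overlap with a border band means the MB should be skipped.
    return (y0 < top or y1 > height - bottom or
            x0 < left or x1 > width - right)


# A 96x96 picture with 16-pixel black bars at the top and bottom.
print(context_touches_black_edge(2, 2, 16, 16, 0, 0, 96, 96))
print(context_touches_black_edge(1, 2, 16, 16, 0, 0, 96, 96))
```

A partial overlap is enough to trigger the skip, which matches the "completely or partially" wording of the embodiment.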
In a second step 22, the variance of the context block ($\sigma_{cb}^2$) is calculated.
The variance of the context block can be calculated according to eq. (1), where $x_i$ is the value of the i-th pixel, N is the number of pixels and $\bar{x}$ is the mean value of the N pixels.
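Eq. (1) itself is not reproduced in this text. Since the term variance is used herein in its mathematical sense, it can plausibly be written as below, with $\bar{x}$ denoting the mean pixel value. The division by N (population variance) rather than N−1 is an assumption.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 ,
\qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
\tag{1}
```

For the context block, the sum runs over all pixels of the context area, giving $\sigma_{cb}^2$; the same expression applied to the pixels of the selected MB alone gives $\sigma_b^2$.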
In a third step 25, it is checked whether $\sigma_{cb}^2$ is in a given range (α, β). In an area with too plain or too complicated texture, the calculated local Context Variance ($CV_{block}$, see eq. (2) in the fifth step) may be far from the real value, so in the final calculation of the image Context Variance ($CV_{image}$), the local Context Variance ($CV_{block}$) of these areas is excluded. The range (α, β) may have default values. An advantageous default is e.g. α=2 and β=2000, since it gives good results for most videos. Optionally, the range can be set or modified through user configuration 24. For videos with a high percentage of plain texture, α may be set a little lower, but not lower than 0.5; for videos with a high percentage of very complicated texture, β may be set a little higher, but not higher than 10000.
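A minimal sketch of the range check and its user configuration is given below. The default values and the limits 0.5 and 10000 are taken from the text; the function names and the choice to treat (α, β) as an open interval are assumptions made for illustration.

```python
DEFAULT_ALPHA = 2.0    # default lower bound given in the text
DEFAULT_BETA = 2000.0  # default upper bound given in the text
MIN_ALPHA = 0.5        # alpha should not be configured below this value
MAX_BETA = 10000.0     # beta should not be configured above this value


def make_range(alpha=DEFAULT_ALPHA, beta=DEFAULT_BETA):
    """Validate a (possibly user-configured) range and return (alpha, beta)."""
    if alpha < MIN_ALPHA:
        raise ValueError(f"alpha must not be lower than {MIN_ALPHA}")
    if beta > MAX_BETA:
        raise ValueError(f"beta must not be higher than {MAX_BETA}")
    if alpha >= beta:
        raise ValueError("alpha must be smaller than beta")
    return (alpha, beta)


def in_range(var_cb, rng):
    """True if the context-block variance lies inside the open range (alpha, beta)."""
    alpha, beta = rng
    return alpha < var_cb < beta


# A mostly plain video might use a slightly lower alpha.
plain_video_range = make_range(alpha=1.0)
print(in_range(1.5, make_range()), in_range(1.5, plain_video_range))
```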
In a fourth step 26, the variance $\sigma_b^2$ of the currently selected MB is calculated. While a MB is the basic encoding unit for most codecs, quantization may also be applied on 8×8 blocks or even smaller blocks, such as 4×4 blocks. In one embodiment, the MB is sub-divided, e.g. into four 8×8 sub-blocks (shown in
In a fifth step 27, the local Context Variance ($CV_{block}$) is calculated according to eq. (2), in which the block variance $\sigma_b^2$ is related to video compression distortion and the context block variance $\sigma_{cb}^2$ is related to video content. The local Context Variance ($CV_{block}$) is a content-independent quality metric. In eq. (2), if $\sigma_{cb}^2$ is too low, e.g. <0.1, a small difference in $\sigma_b^2$ may be greatly magnified; if $\sigma_{cb}^2$ is too high, e.g. >10000, an obvious difference in $\sigma_b^2$ may be greatly reduced. Therefore the result is more stable if the areas whose $\sigma_{cb}^2$ is outside the range (α, β) are excluded.
$$CV_{block} = \frac{\sigma_b^2}{\sigma_{cb}^2} \qquad (2)$$
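The sensitivity of eq. (2) to the context variance can be illustrated numerically. The values below are made up for illustration and are not measurements; `sensitivity` is a hypothetical helper that reports how much the local Context Variance changes for the same small change in the block variance.

```python
def cv_block(var_b, var_cb):
    """Local Context Variance per eq. (2): block variance over context variance."""
    return var_b / var_cb


def sensitivity(var_cb, delta_b=0.05, var_b=1.0):
    """Change in CV_block caused by a change of delta_b in the block variance."""
    return cv_block(var_b + delta_b, var_cb) - cv_block(var_b, var_cb)


# The same change of 0.05 in the block variance, seen in three contexts:
# a very plain one, a moderate one, and a very complicated one.
for var_cb in (0.1, 100.0, 10000.0):
    print(f"context variance {var_cb:>8}: CV_block changes by "
          f"{sensitivity(var_cb):.6f}")
```

The change scales with 1/σ²cb, which is why very plain contexts magnify small differences and very complicated contexts suppress them.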
Then it is checked 28 whether all required MBs have been tested, and the above-described steps 21, ..., 27 are repeated until this is the case. As described below, the calculation may be required only for the MBs within a relevant area of the image. In one embodiment, the calculation is performed only for a given number of randomly selected MBs within the relevant area. In one embodiment, the calculation is performed only for MBs at defined positions within the relevant area, e.g. in a grid. In one embodiment, the calculation is performed for all MBs within the relevant area. In one embodiment, MBs are excluded when the variance of their context block $\sigma_{cb}^2$ is outside the defined range mentioned above.
Finally, in a sixth step 29, an image Context Variance is calculated. After the above-described steps 21, ..., 27 have been performed for all the required MBs, the image Context Variance ($CV_{image}$) can be calculated as the average of all the calculated local Context Variances ($CV_{block}$).
$$CV_{image} = \mathrm{average}(CV_{block}) \qquad (3)$$
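Putting the steps together, the whole measurement might be sketched as below. This is an illustrative sketch, not the claimed implementation. It assumes 16×16 MBs, a context made of the full 3×3 group of MBs, population variance, the default range (2, 2000), and a picture given as a list of rows of luma values. Only interior MBs are measured, so that every measured MB has a full context. All function names are hypothetical.

```python
from statistics import pvariance

MB = 16  # assumed macroblock size in pixels


def region_variance(img, y0, x0, h, w):
    """Population variance of the h x w pixel region whose top-left is (y0, x0)."""
    return pvariance([img[y][x]
                      for y in range(y0, y0 + h)
                      for x in range(x0, x0 + w)])


def cv_image(img, alpha=2.0, beta=2000.0):
    """Average CV_block over all interior MBs whose context variance lies
    in (alpha, beta). Returns None if no MB qualifies."""
    mb_rows, mb_cols = len(img) // MB, len(img[0]) // MB
    cvs = []
    for r in range(1, mb_rows - 1):
        for c in range(1, mb_cols - 1):
            # Variance of the 3x3-MB context block around MB (r, c).
            var_cb = region_variance(img, (r - 1) * MB, (c - 1) * MB,
                                     3 * MB, 3 * MB)
            if not (alpha < var_cb < beta):
                continue  # too plain or too complicated texture: excluded
            # Variance of the selected MB itself.
            var_b = region_variance(img, r * MB, c * MB, MB, MB)
            cvs.append(var_b / var_cb)  # eq. (2)
    return sum(cvs) / len(cvs) if cvs else None  # eq. (3)


# Synthetic 64x64 checkerboard picture, used only to exercise the code.
demo = [[20 * ((x + y) % 2) for x in range(64)] for y in range(64)]
print(cv_image(demo))
```

Flattening the pixels of a single MB in the synthetic picture lowers that MB's block variance while its context variance changes comparatively little, so the resulting image value drops.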
The proposed scheme has been tested on a test database of 168 images with different content types and different quality levels.
Since block variance is strongly influenced by video content, it is often used to measure the video texture complexity. Thus, it may appear obvious to use variance also for video quality measurement. However, this would commonly require knowledge of the variance of the reference, in order to reduce the influence of the video content. Thus, obvious solutions are reference-based metrics. In contrast, the present solution does not need any reference or metadata. Another unfavourable approach would be to calculate a variance only near edges in the image content, since this would require an image analysis and edge extraction over a complete image. In contrast, in the present invention the variance is calculated on any selected encoding unit and its context block, independently of the image content.
As mentioned above, the calculation may be performed only for selected encoding units and their context blocks. In one embodiment, a given number of encoding units are randomly selected from a relevant portion of the image, so that each image uses an individual set of e.g. 20 different measuring points. In another embodiment, encoding units are selected according to a regular or irregular grid, e.g. every 2nd MB of every 2nd row of MBs, or every 4th MB of every 4th row of MBs, or similar. In a further embodiment, the calculation is performed for each encoding unit within the above-described relevant area.
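The three selection embodiments can be sketched as follows. The function names are hypothetical, and MB positions are given as (row, column) indices that are assumed to be already restricted to the relevant area of the image.

```python
import random


def select_all(mb_rows, mb_cols):
    """Every MB position within an area of mb_rows x mb_cols MBs."""
    return [(r, c) for r in range(mb_rows) for c in range(mb_cols)]


def select_grid(mb_rows, mb_cols, step=2):
    """Regular grid: every step-th MB of every step-th row of MBs."""
    return [(r, c)
            for r in range(0, mb_rows, step)
            for c in range(0, mb_cols, step)]


def select_random(positions, count=20, seed=None):
    """Pick count distinct measuring points at random, or all of them
    if fewer than count positions are available."""
    rng = random.Random(seed)
    return rng.sample(positions, min(count, len(positions)))


area = select_all(8, 8)
print(len(area), len(select_grid(8, 8, step=4)), len(select_random(area)))
```

Drawing a fresh random set per image gives each image its individual set of measuring points, whereas the grid keeps the positions fixed across images.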
One advantage of the invention is that no reference image is required. Another advantage of the invention is that the influence of the image content on the measuring result is strongly reduced. Thus, the feature is less related to video image content and highly related to video image quality. If enough measuring points are selected (e.g. more than 50% of all possible measuring points when evenly distributed), the measuring result can be considered as independent from the image content. A further advantage is that the measuring method is largely independent from the actual quantization parameter value that was used for encoding.
The invention may also be used for measuring compression distortion, i.e. distortion resulting from compression of a video image, where no reference image is available. In one embodiment, the method has a preceding step of decoding a video image, or decoding a sequence of video images received in a data stream.
Although the present invention has been disclosed with regard to video, one skilled in the art would recognize that the method and devices described herein may also be applied to any still picture.
While there have been shown, described, and pointed out fundamental novel features of the present invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the apparatus and method described, in the form and details of the devices disclosed, and in their operation, may be made by those skilled in the art without departing from the spirit of the present invention. It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated. It is to be noted that while a common encoding unit is a macroblock for most codecs, the invention may also be applied to other encoding units, such as blocks.
It will be understood that the present invention has been described purely by way of example, and modifications of detail can be made without departing from the scope of the invention. Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may, where appropriate, be implemented in hardware, software, or a combination of the two. Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
| Filing Document | Filing Date | Country | Kind | 371(c) Date |
|---|---|---|---|---|
| PCT/CN10/01850 | 11/18/2010 | WO | 00 | 11/29/2012 |