This invention is related to video compression and decompression systems.
This invention is directed to the field of video coding, which aims at compressing video content as much as possible without significantly compromising its visual quality. Typically, compression is attained by exploiting statistical data redundancies in the spatial and temporal dimensions. Given the high amount of data associated with video, state-of-the-art video coding standards (e.g. MPEG-2, MPEG-4, H.264/AVC, etc.) compress in a lossy fashion, where lossy in this context means that the decompressed signal differs from the original signal before compression. During such compression, parts of the video data are discarded or reduced by applying quantisation, whereby the video data are scaled at the encoder to reduce their values and expanded at the decoder to reconstruct them. In this context, video data may refer to the image pixels as well as to any data obtained by applying some related processing over the pixels (e.g. frequency transformation, temporal or spatial prediction, etc.).
The quantisation can be made more adaptive by exploiting characteristics of the image such as spatial activity, level of motion activity, luminance intensity, etc. These characteristics can be combined and used as input to a rule, or a set of rules, which decides the amount of scaling (by the quantisation) to be applied over an image area to be coded. Moreover, the quantisation can be made adaptive also with respect to the rule used in the quantisation step selection. In particular, a different rule can be used depending on the image features or on the different effect one may want to introduce in the image area. Additionally, considering the coding of high dynamic range images, it may be beneficial to apply different coding strategies to different areas where the pixel range varies. It will be understood from the embodiments of this invention that the quantisation step is then dependent on the computation of image-related features, which may involve a significant amount of computational resources.
For some decoder architectures the inverse quantisation is conducted at a time, or in a process pipe, where the feature computation may not yet have been performed. Therefore, the aforementioned dependency of the quantisation step on image features may be a critical factor in determining processor resource requirements.
It is an object of this invention to provide methods of encoding and decoding which allow additional coding efficiencies, dictated by the encoder, to be obtained without introducing additional dependencies at the de-quantisation level.
Accordingly, the present invention consists in one aspect in a method of video encoding in which an image is divided into blocks; image intensity values in at least some blocks are transformed to provide transform coefficients and said transform coefficients are quantised with quantisation parameters that vary between said coefficients to derive a compressed bitstream; comprising the steps of: selecting a compression control rule; varying values in accordance with the selected compression control rule operating on an identified compression control parameter; and signalling the compression control rule in association with the compressed bitstream.
In some embodiments this invention provides video compression and decompression systems with a framework to adaptively vary pixel values during image reconstruction. In this framework a selection of compression parameters is used to derive the varying effect used over each image area. The process affects spatial domain elements, just before reconstruction, and is therefore different from the standard quantisation used in frequency domain. However it can be used in the same framework where transforms are applied.
Preferably, the steps of varying values comprise expanding or compressing the dynamic range; for example compressing the dynamic range of pixel values at the encoder and expanding the dynamic range of values at the decoder.
Suitably, the values that are varied can be residual values, formed at the encoder after subtraction of a prediction and formed at the decoder prior to addition of the prediction. Additionally, for some images it may be desirable to control the application of such rules by limiting it to certain segments or blocks. In that case the usage of rules can be controlled by segment-related coding flags.
It may sometimes be convenient to define the expanding or compressing of the dynamic range as scaling by a scaling parameter.
Thus, a scaling operation can be described by the following operation:
r′=r/Δ
where r denotes the prediction residue, r′ denotes the scaled residue and Δ is a scaling parameter. In the inverse scaling process, the residue reconstructed at the decoder after inverse transformation is re-scaled by applying the following operation:
{circumflex over (r)}={circumflex over (r)}′×Δ
where {circumflex over (r)} denotes the final reconstructed residue and {circumflex over (r)}′ denotes the reconstructed residue after inverse transformation.
Another important aspect of the invention derives from the realisation that compression control rules depend on the given content, i.e. on a given image. Therefore this invention enables rule selection for each spatio-temporal video segment. A spatio-temporal video segment may be an image region, such as a slice or a tile, or a plurality of images, such as a Group Of Pictures (GOP), a video shot or a video programme.
The present invention can be used when high dynamic range images are considered, because of the different precision needed for coding different parts of such images (e.g. overlays usually have a lower range, and some intensity values require lower precision).
The present invention can also be used to take advantage of some masking phenomena related to the Human Visual System (HVS). One example of these phenomena is the one related to the average pixel intensity.
For this masking, the compression rule may be the Intensity Dependent (ID) profile, which models the HVS masking relative to the average pixel intensity of the image area. ID coding can be seen as a general case of Just Noticeable Distortion (JND) coding, where the amount of coding distortion introduced in each image area is just noticeable by a human observer.
The ID profile can be used to increase the amount of information discarded during compression. In one example, the increased discarding is obtained by scaling residual values r with a scaling parameter Δ. The scaling parameter depends on the average pixel intensity μ and on the ID profile.
It is important to note that the aforementioned scaling parameter Δ is also needed at the decoder side in order to properly perform re-scaling. However, the value for Δ depends on ID(μ), which in turn depends on the average intensity value μ for the image block being decoded. Usually, μ is computed over the original video data, which would then also be needed at the decoder side for inverse quantisation. To avoid the transmission of the original data to the decoder, the μ value can instead be computed from the predictor or from the already reconstructed pixels.
The representative image intensity may be an average luminance value.
A selected ID profile operating on the average luminance value for a block may be used in the scaling of both luminance and chrominance values. Another approach is to select a different ID profile for each component (e.g. Y, U and V, or R, G and B, depending on the chosen colour space format).
Preferably, a graphical representation of the ID profile is parameterised and the ID profile is signalled through a small number of ID profile parameters. The number of parameters may be very much smaller than the number of intensity points (e.g. 2^8 = 256 for 8-bit video), which enables very efficient compression of the ID profile by transmitting only a few numbers.
It should be understood that, whilst the present invention offers important advantages where the scaling is in accordance with intensity, it is not so limited. The described ID profile is therefore only an example of a compression control rule, and image intensity is only an example of a compression control parameter. In a variation, the compression control parameter may comprise motion information, with the compression control rule adapted accordingly.
Thus, more generally, a coding system may involve a non-linear signal expansion in which the dynamic range is compressed during quantisation/transform and is expanded to the original range at the point of reconstruction at the decoder. The non-linear expansion is driven by reconstruction-side signals, represented by the compression control rule and compression control parameters such as prediction intensity, motion information, spatial pixel activity, residual energy, etc.
The present invention consists in a further aspect in a method of decoding a compressed bitstream in which bitstream values are inverse quantised and where appropriate inverse transformed to form an image block; characterised by the steps of receiving in association with the bitstream a signalled compression control rule; and varying values of the image block in accordance with the signalled compression control rule operating on an identified compression control parameter.
The step of varying values may comprise expanding the dynamic range. The values that are varied may be residual values formed prior to addition of a prediction. The compression control parameter may be a representative image intensity for a transform block or other image area.
In one example, a prediction is received for a block, and the representative image intensity is formed for that block by computing the average image intensity of the prediction for that block, or of blocks in the neighbourhood.
It will be understood that embodiments of this invention allow a different compression control rule to be used for each spatio-temporal video segment, allowing broadcasters and video content producers to encode a given segment preserving some image details more faithfully or smoothing others. Moreover, the usage of said control rule may allow the creation of effects in the coded video such as, for example, fading shot transitions, weighted prediction, colour contrast enhancement, etc.
The present invention will now be described by way of example.
In one example of an encoding process according to the invention, the prediction residue to be frequency transformed, and subsequently quantised, is scaled in order to reduce its magnitude. To this end, the residue prior to transformation can be scaled by the following operation:
r′=r/Δ
where r denotes the prediction residue, r′ denotes the scaled residue and Δ is a scaling parameter.
In the inverse scaling process, the residue reconstructed after inverse transformation is re-scaled by applying a re-scaling operation described by the following operation:
{circumflex over (r)}={circumflex over (r)}′×Δ
where {circumflex over (r)} denotes the final reconstructed residue and {circumflex over (r)}′ denotes the reconstructed residue after inverse transformation.
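By way of a minimal sketch (not part of the specification; the function names and the rounding to the nearest integer at the encoder are assumptions), the forward and inverse scaling might be written as:

```python
def scale_residuals(residuals, delta):
    """Forward scaling at the encoder: r' = r / delta, rounded to
    the nearest integer to reduce magnitude before the transform."""
    return [round(r / delta) for r in residuals]

def rescale_residuals(scaled, delta):
    """Inverse scaling at the decoder: r_hat = r_hat' * delta."""
    return [s * delta for s in scaled]

residuals = [12, -7, 3, 0]
delta = 2.5
scaled = scale_residuals(residuals, delta)        # magnitudes reduced
reconstructed = rescale_residuals(scaled, delta)  # approximation of the original residue
```

The reconstruction is approximate: the rounding in the forward step is where information is discarded, which is the intended lossy behaviour.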
In one form of the invention, the scaling parameter Δ varies with image intensity according to an ID profile. The well-known Just Noticeable Distortion (JND) profile is an example of an ID profile.
ID profile derivation and transmission
From several experimental records available in the literature, it is known that the average pixel intensity masking profile (i.e. the ID profile) related to the human visual system is a U-shaped curve, ID=ID(μ), where μ denotes the average intensity value for the block being coded. Since the ID profile is needed at the decoder side to perform inverse scaling, a compact way to represent this profile is proposed in this invention.
In addition to the sending of Δ values for intensity values, an alternative is also described here. An ID profile may be used to vary the scaling parameter Δ indirectly. A parallel may be drawn with conventional quantisation, where a quantisation parameter QP is varied rather than the quantisation step Δ directly. Here, QP is different from the one used in conventional quantisation, which is performed in the transform domain.
Generally, the mapping θ between the quantisation parameter and the quantisation step, Δ=θ(QP), is invertible, that is QP=θ−1(Δ). Furthermore, QP takes integer values, which yields QP=floor(θ−1(Δ)+0.5). Here such a quantisation parameter is called the Intensity differential quantisation Parameter (idP). The strength of its quantisation can be related to the strength of the quantiser in conventional quantisation, for implementation purposes. As an example, the quantiser used in both the H.264/AVC and HEVC standards uses a nonlinear mapping θ given as follows:
Δ=θ(QP)=0.625×2^(QP/6)
That is, the quantisation step Δ doubles in size for each QP increment of six, and Δ(QP=0)=0.625. Using this example in the scaling case of the present embodiment leads to the idP curve profile depicted in the accompanying drawings.
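This mapping can be illustrated in a short sketch. `theta` follows the stated relationship, with Δ(QP=0)=0.625 and a doubling every six steps, while `theta_inv` applies the rounding rule QP=floor(θ−1(Δ)+0.5); the function names are illustrative only:

```python
import math

def theta(qp):
    """H.264/AVC-style mapping from quantisation parameter to step size:
    delta doubles for every QP increment of 6, with theta(0) = 0.625."""
    return 0.625 * 2 ** (qp / 6)

def theta_inv(delta):
    """Inverse mapping, rounded to the nearest integer QP (the idP)."""
    return math.floor(6 * math.log2(delta / 0.625) + 0.5)
```

For example, theta(0) gives 0.625 and theta(6) gives 1.25, and applying theta_inv to any theta(QP) recovers the integer QP.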
In one arrangement, an ID profile may be signalled to the decoder. The decoder can then derive the appropriate idP profile.
In an alternative, the idP profile may be derived at the coder and signalled to the decoder. For example, the points at which idP changes (other than the point μ=0) are sent to the decoder. Therefore, for μ=0, only idP(0) is sent; then, for the other points such that idP(μ−1)≠idP(μ), the pairs (μ, idP(μ)) are sent. Alternatively, differences of these parameters between points n−1 and n can be sent: (μn−μn−1, idP(μn)−idP(μn−1)).
It will frequently be the case that, even where idP changes, it will change by only ±1. The number of bits needed for profile transmission can then be reduced further. So if, for all μ>0, |idP(μ−1)−idP(μ)|∈{0, 1}, then the profile can be communicated in the same way as above, with the difference that the pairs become (μ, b(μ)), where b(μ) is a single value which defines whether the idP value increases or decreases at μ, with respect to idP(μ−1).
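A sketch of this change-point signalling for the ±1 special case (the function names and the list-of-pairs representation are illustrative assumptions, not the actual bitstream syntax):

```python
def encode_profile(idp):
    """Represent an idP profile as idp[0] plus, for each mu at which the
    value changes by exactly +/-1, the pair (mu, b(mu)), where b(mu) is
    1 for an increase and 0 for a decrease. Assumes all changes are +/-1,
    as in the special case described in the text."""
    changes = []
    for mu in range(1, len(idp)):
        d = idp[mu] - idp[mu - 1]
        if d != 0:
            assert d in (-1, 1), "this compact form only handles +/-1 steps"
            changes.append((mu, 1 if d > 0 else 0))
    return idp[0], changes

def decode_profile(first, changes, length):
    """Rebuild the full idP profile from its change points."""
    idp, value, pos = [], first, 0
    for mu, b in changes:
        idp.extend([value] * (mu - pos))  # hold the value up to the change point
        value += 1 if b else -1
        pos = mu
    idp.extend([value] * (length - pos))  # hold the final value to the end
    return idp
```

Only the first value and one bit per change point need to be transmitted, rather than one idP value per intensity point.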
Examples of encoders utilising the present invention are shown in the accompanying drawings.
In the scaling process S, rule parameters are used with a representative image intensity to determine the strength of scaling (idP). Although not shown, the rule is also provided to the entropy encoding block for incorporation in the output bitstream. For inter and/or intra coded blocks, a representative image intensity is estimated for the block by calculating an average luminance from either the block prediction (as indicated in the drawings) or from selected already decoded blocks.
An example of a decoder utilising the present invention is shown in the accompanying drawings.
The rules for scaling are extracted in the entropy decoding process and provided to an inverse scaling process IS. This computation block has available to it the prediction for the current block or selected already decoded blocks, for estimation of the representative image intensity as described above. The idP level is decided in that block for inverse scaling. The inverse scaling process then applies the final scaling of pixels to be used to reconstruct the decoded coefficients.
It will be observed that, in this case, the Q−1 stage is conventional and, in particular, does not rely upon the average pixel intensity μ, and for this reason does not require access to the prediction. In some decoder architectures, this may allow efficiencies or time savings.
Where appropriate, the scaling process may be enabled or disabled as signalled by, for example, image segment or image block related coding flags.
Computation of Average Pixel Intensity
It is also optionally specified how to compute the average pixel intensity μ, independently from the internal bit depth used by the encoder and decoder to represent image pixels. In fact, some video coding standards allow the encoder and decoder to change the internal bit depth representation for source pixels to compensate for rounding errors arising at various processing stages (e.g. motion compensation, transform, etc.). In some embodiments of the present invention the averaging operation to obtain the average pixel intensity for an image block is specified in such a way that the output is always kept within n bits. Therefore, assuming that both the encoder and decoder use m bits, with m>n, to internally represent image pixels, for an image block with R rows and C columns the average pixel intensity μ is given by:
μ=(Σi,j ((p(i, j)+o)>>(m−n))+q)>>log2(R×C)
where >> denotes the right bit shift operation, p(i, j) denotes the pixel at row i and column j, and o, q are rounding offsets. The values for these offsets are o=2^(m−n−1) and q=2^(log2(R×C)−1).
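Assuming the formula above, the computation might be sketched as follows (names are illustrative; R×C is assumed to be a power of two so that the final division can be performed as a right shift):

```python
def average_intensity(block, m, n):
    """Average pixel intensity of an R x C block of m-bit pixels,
    reduced to n bits (m > n). Divisions are replaced by right shifts;
    o and q are rounding offsets as in the text."""
    R, C = len(block), len(block[0])
    log2_rc = (R * C).bit_length() - 1  # log2(R*C), assuming a power of two
    o = 1 << (m - n - 1)                # o = 2^(m-n-1)
    q = 1 << (log2_rc - 1)              # q = 2^(log2(R*C)-1)
    total = sum((p + o) >> (m - n) for row in block for p in row)
    return (total + q) >> log2_rc
```

For a 2×2 block of 10-bit pixels all equal to 400, with n=8, each pixel reduces to 100 and the average is 100, i.e. the result stays within 8 bits regardless of the internal bit depth m.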
It will be understood that the techniques described and claimed here will typically form only a relatively small part of a complete encoding or decoding system. Also those techniques may be employed for some but not all images or image regions in a video sequence.
Dynamic Range Variation at the Reconstruction
In conventional video compression systems, the inverse transform is the last step before reconstruction. Therefore, the inverse transform has typically incorporated mechanisms that bring the underlying signal to the required level. This process often results in some reduction of precision. While the amount of that loss is in general not significant, in the case where inverse scaling is used after the inverse transform it may be beneficial to preserve as much of the signal's precision as possible before the final scaling.
Therefore the final scaling may take this into account. In that case, when scaling is used, some changes to the inverse transform are needed. For illustration, a case is considered where higher precision is preserved by modifying the bit-shifting amount that the inverse transform performs.
According to the accompanying drawings, the required shift is given by:
Bshift=BMAX−Bout.
It is important to take such precision adjustment into account during scaling, so that the output of the scaling is at Bout. To achieve that in this example case, the final reconstructed residuals are obtained as follows:
{circumflex over (r)}=({circumflex over (r)}′×Δ)>>Bshift.
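A one-line sketch of this deferred precision reduction (assuming Δ is represented as an integer in a fixed-point scheme; names are illustrative):

```python
def inverse_scale_with_shift(r_hat_prime, delta, b_shift):
    """Final reconstruction with deferred precision reduction:
    r_hat = (r_hat' * delta) >> Bshift. The multiplication is performed
    at full precision; the right shift to the output bit depth Bout
    happens only afterwards, preserving precision through the scaling."""
    return (r_hat_prime * delta) >> b_shift
```

Performing the shift after, rather than before, the multiplication is what preserves the extra precision through the final scaling stage.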
It will be understood that this invention has been described by way of examples only and that a wide variety of modifications are possible without departing from the scope of the invention. Thus the described ID profile is only one example of a compression control rule that can be selected and signalled in association with the compressed bitstream for use at the decoder. Similarly, representative image intensity is only one example of a compression control parameter that can be identified for an image area, the compression control parameter varying between at least some image areas.
Number | Date | Country | Kind |
---|---|---|---|
1210853.6 | Jun 2012 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2013/051598 | 6/19/2013 | WO | 00 |