The invention relates to a pre-processing device and method before encoding of a video image sequence.
Image encoding devices become more effective as the temporal or spatial entropy of the images they encode decreases.
They are therefore often associated with image pre-processing devices in which the images are processed in order to allow a better encoding.
As is known, pre-processing devices for reducing the entropy of a video sequence use linear or non-linear filters which reduce, or even eliminate, the high-frequency components that are mainly responsible for the image encoding cost in intra mode. Numerous filters are available, including one- or two-dimensional low-pass filters, Nagao filters, averaging filters and median filters.
The main drawbacks with these methods are:
The invention proposes to resolve at least one of the abovementioned drawbacks.
To this end, the invention proposes a pre-processing device before encoding of a video image sequence, characterized in that it comprises:
Applying a raw morphological processing would have a devastating effect on the quality of the image. The presence of a mixer between each morphological operator weights the effect of this processing by mixing the raw result of the operator and the input of the same operator.
According to a preferred embodiment, the device comprises means of measuring the complexity of said video image sequence before applying the plurality of morphological processing steps.
In practice, pre-processing to reduce the spatial entropy of the image is recommended mainly for images with high complexity. The pre-processing can thus be controlled according to, and adapted to, the complexity of the image.
Advantageously, the means of measuring the complexity of said video image sequence measure the intra-image correlation.
According to a preferred embodiment, the means of measuring the complexity compute a weighting coefficient for each mixer.
Advantageously, the weighting coefficient is identical for each mixer.
Advantageously, the weighting coefficients are inversely proportional to the intra-image correlation.
Preferably, the means of applying a plurality of morphological processing steps and the mixers apply the processing steps to the luminance component of the video signal, pixel by pixel, for each image.
Preferably, the device comprises:
This makes it possible to obtain progressive frames which each contain the complete vertical definition of an image. It is then possible to consider without bias a processing in both axes of the image, horizontal and vertical.
The invention also relates to a method of pre-processing before encoding a video image sequence. According to the invention, the method comprises:
The invention will be better understood and illustrated by means of examples of embodiments and advantageous implementations, by no means limiting, with reference to the appended figures in which:
The modules represented are functional units, which may or may not correspond to physically distinguishable units. For example, these modules, or some of them, may be grouped in a single component, or form functionalities of one and the same software. Conversely, certain modules may, if necessary, comprise separate physical entities.
The video signal Ei at the input of the pre-encoding device is an interlaced type video signal.
In order to improve the performance of the pre-encoding device, the video signal Ei is deinterlaced by the deinterlacer 1. The deinterlacer 1 doubles the number of lines per field of the video signal Ei using a deinterlacing method known to a person skilled in the art based on three consecutive fields of the video signal Ei. A progressive format is then obtained, in which each field becomes a frame containing the complete vertical definition of an image, so that processing can be performed along the vertical axis.
The signal E is obtained at the output of the deinterlacer 1.
A complexity analysis of the image is then carried out. In practice, spatial entropy reduction is applied mainly to images having a high spatial entropy.
The device therefore includes upstream means 2 of measuring the correlation of each image.
For each pixel of the image, the pixel result “Rp” is computed using the luminance of the current point and that of four of its neighbours:
Rp=[abs(P−P(−1,0))+abs(P−P(0,−1))+abs(P−P(0,+1))+abs(P−P(+1,0))]/4
Then, all these pixel results are accumulated over one frame.
This correlation measurement is used to ascertain the average deviation, over one image, between a pixel and its adjacent pixels. Useful information on the definition in the image is thus obtained.
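The per-pixel computation and its accumulation over a frame can be sketched as follows. This is a minimal illustration, not the device's implementation: the function names are hypothetical, border pixels are simply skipped, and the accumulated sum is divided by the number of pixels visited to give a mean. The text does not specify either the border handling or the normalisation, so both are assumptions.

```python
def pixel_result(img, y, x):
    """Rp for one pixel: mean absolute luminance difference to its 4 neighbours."""
    p = img[y][x]
    return (abs(p - img[y - 1][x]) + abs(p - img[y][x - 1])
            + abs(p - img[y][x + 1]) + abs(p - img[y + 1][x])) / 4


def mean_neighbour_deviation(img):
    """Accumulate Rp over one frame and return the mean.

    Assumptions (not stated in the text): border pixels are skipped, and the
    accumulated sum is normalised by the number of pixels visited.
    """
    h, w = len(img), len(img[0])
    total = 0.0
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total += pixel_result(img, y, x)
            count += 1
    return total / count if count else 0.0


# A flat frame (strong correlation) and a checkerboard (weak correlation).
flat = [[100] * 4 for _ in range(4)]
checker = [[0 if (x + y) % 2 == 0 else 255 for x in range(4)] for y in range(4)]
```

With 8-bit luminance, the flat frame gives the smallest possible value and the checkerboard the largest, which matches the reading that a large average deviation corresponds to a weak correlation.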
In other embodiments, it is possible to modify these equations in order to obtain a more complete definition of the complexity of the image. It is also possible to enlarge the vicinity of the current pixel taken into account in computing the image complexity.
From this measurement, a coefficient K is computed within the range [0,1], as a function of the complexity of the incoming images.
The table below illustrates the values of the coefficient K as a function of Cintra, given as an illustration only.
The value of Cintra is encoded on 8 bits and is therefore between 0 and 255.
When the correlation is very strong (very little definition), the pre-processing of the image is still performed, but in lesser proportions, illustrated by the value of the coefficient K.
Conversely, a weak correlation indicates high entropy. Random noise, whose correlation is weak or even zero, is a good example of high entropy.
The coefficient K can be the same for each mixer, as in the preferred embodiment, or different for each mixer.
The processing carried out in the device of
In practice, a processing of the chrominance component may provoke disagreeable artefacts in colour, and, above all, it is the luminance component that has most of the complexity of the image.
The signal Ein (identical to the signal E) at the output of the module 2 is then subjected to erosion in the module 3. The erosion process consists in keeping the pixel that has the minimum luminance value among the pixels of a structuring element that it receives as input. The structuring element comprises a 3×3 window (three rows by three columns) around the current pixel, i.e. 9 pixels. However, another window size can be used, bearing in mind that the severity of the erosion increases with the size of the window.
The module 3 therefore computes for each pixel of the incoming video signal Ein, its new luminance value.
The video signal T0 at the output of the module 3 therefore represents the eroded signal Ein: for each pixel, its luminance value is modified in relation to Ein and corresponds to the minimum value of the structuring element of which it is part.
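As an illustration of the erosion described above, the following sketch replaces each luminance value with the minimum over its 3×3 neighbourhood. The function name is hypothetical, and the treatment of border pixels (the window is clipped to the image) is an assumption, since the text does not specify it.

```python
def erode3x3(img):
    """Grey-level erosion with a 3x3 structuring element.

    Each output pixel is the minimum luminance found in the 3x3 window
    centred on it. Assumption: at the borders the window is clipped to
    the image rather than padded.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = min(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
    return out


# An isolated bright pixel is removed; an isolated dark pixel spreads.
bright_spot = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
dark_spot = [[50, 50, 50], [50, 0, 50], [50, 50, 50]]
```

The two small frames show why raw erosion is severe: a single dark pixel darkens its whole neighbourhood.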
Then, the signal T0 is transmitted to a mixer 4. The mixer 4 also receives as input the coefficient K transmitted by the module 2.
The mixer 4 produces as output the signal S0 according to the following formula:
S0=K×T0+(1−K)×Ein
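The mixer formula is a per-pixel weighted average of the operator output and the operator input. A minimal sketch, with a hypothetical function name and with luminance kept as floating-point values (the text does not say whether or how the result is rounded):

```python
def mix(k, processed, original):
    """Per-pixel blend: k * processed + (1 - k) * original, with k in [0, 1]."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return [
        [k * t + (1 - k) * e for t, e in zip(row_t, row_e)]
        for row_t, row_e in zip(processed, original)
    ]


# One pixel: operator output 0, operator input 200.
t0 = [[0]]
e_in = [[200]]
```

With K=0 the mixer passes its input through unchanged, with K=1 it outputs the raw result of the operator, and intermediate values of K attenuate the effect of the morphological operator accordingly.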
The signal S0 is input into the dilatation module 5.
The dilatation operation consists in retaining the pixel that has the maximum luminance value among the elements of a structuring element centred on the current pixel, with a size of 3×3=9 pixels.
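The dilatation can be sketched in the same way as the erosion, with the maximum taken in place of the minimum. As before, the function name is hypothetical and clipping the window at the image borders is an assumption.

```python
def dilate3x3(img):
    """Grey-level dilatation with a 3x3 structuring element.

    Each output pixel is the maximum luminance found in the 3x3 window
    centred on it. Assumption: at the borders the window is clipped to
    the image rather than padded.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
    return out


# An isolated bright pixel spreads over its whole 3x3 neighbourhood.
spot = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
```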
The dilatation module produces as output a new video signal T1 transmitted to the input of a mixer 6. The mixer 6 also receives as input the weighting coefficient K.
The mixer 6 then produces a signal S1 according to the following formula by weighting, for each pixel, the luminance value that it receives as input:
S1=K×T1+(1−K)×S0
The signal S1 is then input into a second dilatation module 7.
The dilatation module 7 then performs a dilatation operation on the signal S1. The dilatation operation consists in retaining the pixel that has the maximum luminance value among the elements of a structuring element centred on the current pixel with a size of 3×3=9 pixels.
The module 7 then produces as output a signal T2 which is input into the mixer 8 which weights the luminance component of the signal received, for each pixel received. The mixer 8 also receives as input the weighting coefficient K.
The mixer 8 produces as output a signal S2, according to the formula
S2=K×T2+(1−K)×S1
The signal S2 is then input into a second erosion module 9.
The module 9 applies erosion to the signal S2, the erosion operation consisting as indicated previously in replacing the current pixel with the minimum pixel among the pixels of the structuring element defined as previously.
The erosion module 9 produces as output a signal T3 which is input into a fourth mixer 10. The mixer 10 also receives as input the weighting coefficient K.
The mixer 10 produces as output a signal S3 according to the following formula:
S3=K×T3+(1−K)×S2
The signal S3 is then transmitted to the input of an interlacer 11 which is used to interlace the signal S3 in order to obtain the video output Si of the pre-processing device which is then transmitted to an encoding device.
The video signal Si entering the encoding device thus benefits from a reduced spatial entropy, and its subsequent encoding is made easier. Any encoding type can be considered subsequently.
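The chain of modules 3 to 10 described above (erosion, dilatation, dilatation, erosion, each followed by a mixer) can be summarised in a short sketch operating on the luminance of one progressive frame. The function names are hypothetical, the same coefficient K is used for every mixer as in the preferred embodiment, borders are handled by clipping the window (an assumption), and the deinterlacer 1 and interlacer 11 are left out.

```python
def morph3x3(img, pick):
    """Apply `pick` (min for erosion, max for dilatation) over each 3x3 window."""
    h, w = len(img), len(img[0])
    return [
        [
            pick(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def mix(k, processed, original):
    """Mixer: k * processed + (1 - k) * original, pixel by pixel."""
    return [
        [k * t + (1 - k) * s for t, s in zip(row_t, row_s)]
        for row_t, row_s in zip(processed, original)
    ]


def preprocess_luma(e_in, k):
    """Erosion, dilatation, dilatation, erosion, with a mixer after each step."""
    s = e_in
    for pick in (min, max, max, min):  # modules 3, 5, 7, 9
        t = morph3x3(s, pick)          # signals T0..T3
        s = mix(k, t, s)               # mixers 4, 6, 8, 10 -> S0..S3
    return s


# A 5x5 frame with one bright pixel on a flat background.
frame = [[10] * 5 for _ in range(5)]
frame[2][2] = 200
```

With K=0 the frame passes through unchanged. With K=1 the mixers are transparent and the raw operators are chained, which removes the bright pixel entirely. Intermediate values of K attenuate the pixel without removing it.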
Naturally, the invention is not limited to the embodiment described above. A person skilled in the art will easily understand that modifying the number of morphological operations of the pre-processing device can be considered, as can modifying the number of associated mixers.
An erosion followed by a dilatation is called an opening.
A dilatation followed by an erosion is called a closing.
Number | Date | Country | Kind |
---|---|---|---|
04/51381 | Jul 2004 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP05/52908 | 6/22/2005 | WO | 00 | 9/7/2007 |