The field of the invention is that of data compression (compression of audio and/or video data).
More specifically, the invention relates to a technique for the dynamic reduction of the entropy of a signal upstream to a data compression device (also called an encoder).
One particular embodiment proposes a real-time, bit-rate/time constrained encoder.
In the present description, a frame is defined as a set of successive items of data and a scene is defined as a set of successive frames. In the particular case of video, a frame is an image.
The quantity of information (entropy) contained in a signal can vary hugely over time. For example, in the case of a video signal, it is possible to pass from a static scene containing smooth textures to a scene comprising many moving objects and complex textures. In this case, a significant increase is observed in the complexity of the scene and in the quantity of information.
When compression techniques are used, this natural variation has two consequences:
Solutions have been developed to overcome these two problems.
There are known techniques, called bit-rate control techniques, used to regulate the output bit rate of the encoder. It is possible to ensure either a constant bit rate (CBR) or a variable bit rate (VBR). In both cases, an external constraint, for example the physical capacity of a communications channel, is met. If this external constraint varies over time, the bit rate is variable (VBR).
Techniques are also known for the dynamic reduction of the entropy of the signal upstream to the encoder. These techniques (which can be used jointly with the above-mentioned bit-rate control techniques) rely on the use, upstream to the encoder, of:
A filter is defined as a module that converts the frames of a scene by applying an operator. In the present description, the filters considered are aimed at reducing the entropy of the frames. However, one drawback of these filters is that the video quality of the frames (at output from the encoder) is degraded.
To ensure a minimum subjective video quality at output from the encoder, a first known solution (the simplest solution) is described in a first part (see paragraph 1.2) of the following article: Serhan Uslubas, Ehsan Maani and Aggelos K. Katsaggelos, “A Resolution Adaptive Video Compression System”, (Northwestern University, Department of EECS).
This first known solution consists in a systematic application of the chosen filter. In the case of this article, the filter is a resizing (reduction of the encoding resolution).
However, this first known solution is sub-optimal since the chosen resolution could be increased for most of the scenes (the scenes of low complexity are not rendered in full resolution). Conversely, if the chosen resolution is too high, it can happen that certain scenes do not reach the minimum level of quality expected at output from the encoder: there is a risk of undesirable artifacts (blockiness, frozen images, etc.). It can then be decided to increase the bit rate of the encoder, but in that case the efficiency of the compression is limited or an external constraint is not met.
To improve the flexibility of their first approach, the authors of the above-mentioned article propose a second solution described in a second part of the article (see paragraphs 2.1, 2.2 and 2.3). The principle of the second known solution is the following: for each block of each frame, the system carries out two encoding operations, one in high resolution (encoding of the original block) and the other in low resolution (encoding of a filtered block obtained by reducing the resolution). Then, the system selects one of the two encodings, taking the rate-distortion (RD) cost as the criterion.
This second known solution is efficient but not optimal for the following reasons:
A third known solution is described in the article by Jie Dong and Yan Ye, “Adaptive Downsampling for High-Definition Video Coding”, (InterDigital Communications, San Diego, Calif.). It improves the second known solution by applying a resizing (downsampling) of the entire frame, the chosen resizing being the one that gives the best balance between two estimated distortions (distortion related to encoding and distortion related to resizing), and therefore that which achieves the best overall performance in terms of rate-distortion (RD) cost.
It thus resolves the problem of homogeneity of filtering. However, the third known solution is not complete for the following reasons:
The invention, in at least one embodiment, is aimed especially at overcoming these different drawbacks of the prior art.
More specifically, it is a goal of at least one embodiment of the invention to provide a technique for the dynamic reduction of the entropy of a signal upstream to a data compression device (encoder) enabling the dynamic adaptation of the filtering of the frames of the signal before encoding, in order to increase the tolerance of an encoder to variations of complexity of a scene while at the same time ensuring minimum quality at the output from the encoder.
It is another goal of at least one embodiment of the invention to provide a technique of this kind enabling an implementation upstream to a real-time encoder complying with a constant bit rate (CBR).
It is an additional goal of at least one embodiment of the invention to provide a technique of this kind that can be implemented whatever the format of the signal output from the capture module.
It is an additional goal of at least one embodiment of the invention to provide a technique of this kind that is simple to implement and costs little.
One particular embodiment of the invention proposes a system for the dynamic reduction of the entropy of a signal upstream to a data compression device, said signal comprising a set of successive frames, said system comprising a filtering decision module that provides a setpoint value of filtering and a filtering module that filters the signal according to said setpoint value of filtering and gives a filtered signal to the data compression device. The system comprises a module for obtaining a piece of information on complexity for each frame of the signal, and said filtering decision module is adapted to determining said setpoint value of filtering for each frame of the signal as a function inter alia of said piece of information on complexity.
The general principle of the invention therefore consists in obtaining a piece of information on complexity for each frame of a signal and in using this piece of information in a decision-making process which (by filtering) adaptively reduces the entropy of this signal in order to meet the constraints of bit rate and quality expected at output of the compression device (encoder).
Thus, this particular embodiment of the invention relies on a wholly novel and inventive approach:
In a first particular application of the system, the setpoint value of filtering is a setpoint value of resolution and the filtering module is adapted to carrying out a resizing of each frame according to said setpoint value of resolution.
In a second particular application of the system, the setpoint value of filtering is a setpoint value of filtering strength and the filtering module is adapted to carrying out a low-pass filtering of each frame according to said setpoint value of filtering strength.
In a first particular implementation of the module for obtaining information on complexity, the module for obtaining information on complexity comprises means for estimating complexity adapted to determining an estimated complexity K(t) associated with a frame I(t), and said filtering decision module is adapted to determining said setpoint value of filtering for the frame I(t) as a function inter alia of said estimated complexity K(t).
Thus, in this first implementation, the information on complexity of a frame I(t) is an estimated complexity. If the estimation module is appropriately chosen (cf. especially the particular case described here below), the estimation of the complexity can be done in real time and the present technique can then be implemented upstream to a real-time data compression device (encoder).
According to one particular characteristic, said estimated complexity K(t) is defined by:
with:
This mode of estimation is compatible with a real-time implementation.
According to one particular characteristic, said estimated cost C(t) of the frame I(t) is defined by:
with:
According to one particular characteristic, said estimated cost CBlock|T of a block blk is defined by:
CBlock|T(blk) = P(blk=tempo) × Ctempo(blk) + (1 − P(blk=tempo)) × Cspatio(blk)
with:
According to one particular characteristic, said cost Ctempo is defined by:
with:
According to one particular characteristic, the cost Cspatio is defined by:
with:
According to one particular characteristic, said probability P(blk=tempo) is a function of SAD(blk) and the type T of encoding of the frame I(t) in which the block blk is included, and is defined by:
P(blk=tempo|SAD(blk), Frame=I)=0
P(blk=tempo|SAD(blk), Frame=B)=1
or
with:
In a second particular implementation of the module for obtaining information on complexity, the module for obtaining information on complexity comprises an encoding module enabling the computation of the complexity associated with a frame I(t) by computing the number of bits really necessary for encoding the frame I(t), and said filtering decision module is adapted to determining said setpoint value of filtering for a frame I(t) as a function inter alia of the complexity computed.
Thus, in this second implementation, the information on complexity of a frame I(t) is a computed complexity. This computation requires encoding which is added to the encoding performed by the data compression device. There is therefore a double encoding (double-pass encoding) not compatible with a real-time implementation.
In a third particular implementation of the module for obtaining information on complexity, the module for obtaining information on complexity comprises:
Thus, in this third mode of implementation, the information on complexity of a frame I(t) is a complexity obtained simply by combining (by multiplication) two pieces of information read in the signal: one pertaining to the size of the frame and the other to the quantization step. This is a low-cost solution but is restricted to situations where the received signal is an already compressed signal (which will be decoded and then re-encoded in a different format or at a different bit rate). The two pieces of information read are reliable only if the signal (compressed stream) has not previously undergone several successive encoding operations.
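A minimal sketch of this third implementation (the function and field names are hypothetical; it assumes the frame size in bits and the quantization step have already been parsed from the compressed stream):

```python
def complexity_from_stream(frame_size_bits, qstep):
    """Complexity of an already-compressed frame, obtained by simply
    multiplying its coded size by its quantization step (K = Qstep x C).
    Only reliable if the stream has not been re-encoded several times."""
    return qstep * frame_size_bits

# e.g. a frame coded on 40,000 bits with a quantization step of 28
k = complexity_from_stream(40_000, 28)
```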
Another embodiment of the invention proposes a method for the dynamic reduction of the entropy of a signal upstream to a data compression device, said signal comprising a set of successive frames, said method comprising a step for filtering the signal according to a setpoint value of filtering, the filtered signal resulting from said filtering being given to the data compression device. The method comprises the following steps for each frame of the signal: obtaining a piece of information on complexity and determining the setpoint value of filtering as a function inter alia of the piece of information on complexity.
Advantageously, the method comprises steps implemented within the system as described here above in any one of its different embodiments.
In a first particular application of the method, the setpoint value of filtering is a setpoint value of resolution and the filtering step comprises a resizing of each frame according to said setpoint value of resolution.
In a second particular application of the method, the setpoint value of filtering is a setpoint value of filtering strength and the filtering step comprises a low-pass filtering of each frame according to said setpoint value of filtering strength.
Another embodiment of the invention proposes a computer program product comprising program code instructions to implement the above-mentioned method (in any one of its different embodiments) when said program is executed on a computer.
Another embodiment of the invention proposes a computer-readable, non-transient storage medium storing a computer program comprising a set of executable instructions executable by a computer to implement the above-mentioned method (in any one of its different embodiments).
Other features and advantages of the invention shall appear from the following description given by way of an indicative and non-exhaustive example and from the appended figures, of which:
In all the figures of the present document, the identical elements and steps are designated by a same numerical reference.
Here below in the description, we consider by way of an example the particular case of the compression of a video signal (in this case the successive frames are images). The encoder (data compression device) used (referenced 10 in the figures) is for example an H.264/MPEG4-10 AVC encoder.
As illustrated in
The encoder 10 sends out a compressed signal 11 comprising frames compressed in the resolution fixed by the process implemented by the system for the dynamic reduction of entropy (step 65 of
The different modules (referenced 2, 4, 6 and 8) included in the system for the dynamic reduction of entropy are implemented by computer, with one or more hardware elements (memory components and processors in particular) and/or software elements (programs). Each module comprises for example a RAM, a processing unit fitted for example with a processor and driven by a computer program stored in a ROM. At initialization, the code instructions of the computer program are for example loaded into the RAM and then executed by the processor of the processing unit. This is only one way, among several possible ways, to carry out the different algorithms described in detail here below. Indeed, each module may equally be implemented on a reprogrammable computing machine (a PC, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated computing machine (for example a set of logic gates such as an FPGA or an ASIC, or any other hardware module). In the case of an implementation on a reprogrammable computing machine, the corresponding program (i.e. the sequence of instructions) could be stored in a storage medium that may be detachable (such as for example a floppy disk, a CD-ROM or a DVD-ROM) or not detachable, this storage medium being partially or totally readable by a computer or a processor.
Referring now to
In the case of classic encoding with a given bit rate, the simplest known solution consists in choosing a fixed resolution. This resolution is fixed to ensure adequate quality for most sequences. However, it will be sub-optimal (undersized) for the easier sequences and likely too high (oversized) for the difficult sequences.
In the first application of the system of
In this first application, the filter decision module 6 and the filter module 8 of
A detailed description is given now of an example of operation of the resolution decision module 26. The higher the resolution, the sharper the output frame. However, the higher the resolution, the greater the entropy of the frame and the greater the risk of compression artifacts (blockiness, frozen images, etc.) in the video at output from the encoder. Using the previously computed complexity, the resolution decision module 26 chooses the highest resolution that does not cause any risk of compression artifacts.
In one particular embodiment, a table is built beforehand. This table gives a particular resolution for a given bit rate of the encoder (the table comprises pairs each associating a resolution and a bit rate). This is an average resolution that is appropriate to a scene of average complexity. The resolution decision module 26 uses for example this table in the following way, for each frame:
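A minimal sketch of such a table-driven choice (the bit-rate/resolution pairs, the average complexity and the scaling rule below are hypothetical illustrations, not values taken from the description):

```python
# Illustrative bit-rate -> resolution table (the pairs are hypothetical).
# Each entry: (minimum bit rate in kbit/s, resolution chosen at that rate).
RATE_TO_RESOLUTION = [
    (4000, (1920, 1080)),
    (2000, (1280, 720)),
    (1000, (960, 540)),
    (0,    (640, 360)),
]

def choose_resolution(bit_rate_kbps, complexity, avg_complexity=1000):
    """Pick the highest resolution the encoder can sustain. The table is
    built for scenes of average complexity; for a more complex frame the
    table is looked up at a proportionally reduced 'effective' bit rate."""
    effective = bit_rate_kbps * avg_complexity / max(complexity, 1)
    for min_rate, resolution in RATE_TO_RESOLUTION:
        if effective >= min_rate:
            return resolution
    return RATE_TO_RESOLUTION[-1][1]
```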
Referring now to
The entropy of a source is the average quantity of information contained in this source. The greater the entropy, the greater is the number of bits needed for the encoding. In the case of lossy encoding, a part of the information is not rendered. It is chosen to eliminate a part of the information before the encoding so that the encoding can render the more important data.
In this second application, the filtering decision module 6 and the filtering module 8 of
This process integrates a low-pass filter 38 upstream to the encoder 10 because the high frequencies necessitate more bits in the encoding process and are not necessarily visible to the human visual system. It is sometimes preferable to eliminate them to prevent encoding artifacts (blockiness, frozen images, etc.). However, a low-pass filter induces an undesirable blurring effect when this filtering is not necessary.
A detailed description is now given of an example of operation of the filtering strength decision module 36. Depending on the encoding bit rate R, the resolution S of the scene and the complexity K(t) of the frames to be encoded, the filtering strength decision module 36 determines the strength F(t) of the low-pass filter to be applied to the frame I(t):
F(t)=function(R, S, K(t))
The average bit-rate/resolution table described in the above paragraph (pertaining to the first particular application) is also used here.
Here is an example of such a table:
Let Km be an average complexity, set for example to the following value: Km = 1000.
The strength F(t) is computed for example according to the following method.
Let Rtable be the bit rate associated with the resolution S in the table given as an example here above.
DR is defined as the difference between Rtable and the encoding bit rate R.
DR = Rtable − R
DK(t) is defined as the difference between K(t) and Km:
DK(t) = K(t) − Km
These two differences make it possible to determine F(t):
F(t)=MAX(0; a×(DK(t)+DR)+b)
a and b are to be defined according to the filter, for example: a=0.003 and b=1.5.
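The computation of F(t) described above can be sketched as follows (the clamp is written with max so that the strength never goes negative; Km, a and b take the example values given in the text):

```python
def filter_strength(k_t, r, r_table, km=1000, a=0.003, b=1.5):
    """Strength F(t) of the low-pass filter for the frame I(t).
    DR : shortfall of the encoding bit rate R w.r.t. the table rate Rtable.
    DK : excess of the frame complexity K(t) over the average complexity Km.
    F(t) = max(0, a*(DK + DR) + b): the harder the frame (or the tighter
    the bit rate), the stronger the filtering; never negative."""
    d_r = r_table - r
    d_k = k_t - km
    return max(0.0, a * (d_k + d_r) + b)
```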
The STC module 4 links the complexity of a video with the number of bits needed for encoding the video with a given encoder and a fixed quantization step denoted as Qstep. The complexity K is defined as the product of the Qstep chosen and the number of associated bits C:
K=Qstep×C (1)
In a real-time system, it would be far too costly in computing time to carry out a first encoding to compute the number of bits really needed for the encoding. The STC module 4 actually makes an estimation of this.
As illustrated in
We have: 0 ≤ N ≤ Nmax (for example Nmax = 60).
The complexity K(t) associated with the frame I(t) is defined by:
where:
Specific details on P(Type(I(k))=T).
The configuration of the encoder is known:
For example, we consider an H264 type encoder which sequences the frames as follows: P Bref B B. This means that a frame of the “inter (P)” type is followed by a “bi-predicted and reference inter (Bref)” type frame and two “bi-predicted inter (B)” type frames.
The frequency f of the input sequence is known, for example f = 25 fps. In the present example, the probability function P(Type(I(k))=T) will take the following values:
NOTE: If a frame is the first frame of a scene change, it will be of an “intra” type in the encoder, and this adds an “intra” frame in addition to the periodic “intra” frames. If the STC module knows the position of the scene changes (because of an external module or because of a detection integrated into the processor), it is then possible to improve the probability P(Type(I(k))=intra) by taking account of the additional “intra” frames placed at the scene changes.
If the configuration of the encoder is unknown, the probability function can be defined with default values.
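The derivation of these probabilities from the sequencing pattern can be sketched as follows (the pattern string and the optional intra period are inputs; the "one intra frame every intra_period frames" refinement is a hypothetical illustration of the periodic intra frames mentioned above):

```python
from collections import Counter
from fractions import Fraction

def type_probabilities(gop_pattern, intra_period=None):
    """Probability P(Type(I(k)) = T) that a future frame is encoded with
    type T, derived from the encoder's repeating sequencing pattern
    (e.g. 'P Bref B B'). If an intra period is known (one 'intra' frame
    every intra_period frames), that share is carved out of the pattern."""
    counts = Counter(gop_pattern.split())
    n = sum(counts.values())
    probs = {t: Fraction(c, n) for t, c in counts.items()}
    if intra_period:
        p_intra = Fraction(1, intra_period)
        probs = {t: p * (1 - p_intra) for t, p in probs.items()}
        probs['intra'] = p_intra
    return probs
```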
When N is equal to zero (N=0), the complexity K(t) is associated with the single frame I(t).
When N is different from zero (N≠0), the complexity K(t) is associated with the current frame I(t) but also with the group of N following frames. Indeed, in this case, the computation of the equation (2) gives an average complexity over a sliding window of N+1 frames (a smoothing enabling a more stable result to be obtained).
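The smoothing described above can be sketched as follows (equation (2) itself is not reproduced here; this sketch assumes the per-frame complexity is the product Qstep × C, as in equation (1)):

```python
def smoothed_complexity(costs, qsteps, t, n):
    """Average complexity K(t) over the sliding window made of the current
    frame I(t) and the N frames that follow it. costs[k] and qsteps[k] are
    the estimated cost and quantization step of frame I(k); the complexity
    of one frame is Qstep x C. With n == 0, only I(t) contributes."""
    window = range(t, min(t + n + 1, len(costs)))
    ks = [qsteps[k] * costs[k] for k in window]
    return sum(ks) / len(ks)
```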
The complexity of each frame is thus computed by means of a complexity estimation module (STC module) 4 which is:
One particular embodiment uses an H.264/MPEG4-10 AVC encoder.
The estimated cost of a frame is the sum of the estimated costs of the blocks of the frame. The estimated cost of a frame Cframe=T(t) varies according to the type T of encoding: intra (I), inter (P), bi-predicted inter (B) or bi-predicted and reference inter (Bref).
Cframe=T(t) = Σblk=0..Nblock CBlock|T(blk)   (3)
with:
A block can be encoded with a spatial prediction (Spatio) or a temporal prediction (Tempo). The estimated cost CBlock|T(blk) of a block blk is either an estimated cost of encoding with a spatial prediction Cspatio, or an estimated cost of encoding with a temporal prediction Ctempo. If we consider the probability P(blk=tempo) that a block blk is encoded in temporal mode, we have:
CBlock|T(blk) = P(blk=tempo) × Ctempo(blk) + (1 − P(blk=tempo)) × Cspatio(blk)   (4)
The process predicts the motion between the current image and a reference image. With no information on the sequencing of the images, the preceding image is taken as the reference image.
So as not to burden the process and in order to obtain real-time execution, the proposed technique uses for example a hierarchical motion estimation (HME) executed on 16×16 blocks. The principle of an HME is described for example in the following article: Chia-Wen Lin, Yao-Jen Chang, and Yung-Chang Chen, “Hierarchical Motion Estimation Algorithm Based on Pyramidal Successive Elimination”, Proceedings of International Computer Symposium (ICS98), Workshop on Image Processing and Character Recognition, pp. 41-44, Tainan, Taiwan, Dec. 17-19, 1998.
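A toy sketch of a hierarchical motion estimation of this kind (two pyramid levels built by 2×2 averaging, exhaustive SAD search with a small radius at each level; the pyramid depth, search radius and data layout are illustrative choices, not those of the cited article):

```python
def downsample(img):
    """Halve each dimension by 2x2 averaging (one pyramid level)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1]
              + img[2*y+1][2*x] + img[2*y+1][2*x+1]) // 4
             for x in range(w)] for y in range(h)]

def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences of the bs x bs block at (bx, by) in cur
    against the block displaced by (dx, dy) in ref; inf if out of bounds."""
    h, w = len(ref), len(ref[0])
    if not (0 <= bx+dx and bx+dx+bs <= w and 0 <= by+dy and by+dy+bs <= h):
        return float('inf')
    return sum(abs(cur[by+j][bx+i] - ref[by+dy+j][bx+dx+i])
               for j in range(bs) for i in range(bs))

def hme(cur, ref, bx, by, levels=2, bs=16, radius=2):
    """Hierarchical motion estimation for one block: search coarsely on the
    smallest pyramid level, then double and refine the vector at each finer
    level. Returns (dx, dy, SAD) at full resolution."""
    pyr = [(cur, ref)]
    for _ in range(levels):
        pyr.append((downsample(pyr[-1][0]), downsample(pyr[-1][1])))
    dx = dy = 0
    for lvl in range(levels, -1, -1):
        c, r = pyr[lvl]
        s = 2 ** lvl
        best = min((sad(c, r, bx//s, by//s, dx+i, dy+j, max(bs//s, 1)), i, j)
                   for j in range(-radius, radius+1)
                   for i in range(-radius, radius+1))
        dx, dy = dx + best[1], dy + best[2]
        if lvl:  # scale the vector up for the next, finer level
            dx, dy = dx * 2, dy * 2
    return dx, dy, best[0]
```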
This HME gives the following for each block blk:
Using these metrics, we have been able to define the following model:
α and λ are predetermined parameters of the model, defined in a preliminary learning phase. Qstep is fixed.
We use for example a measurement of spatial energy to estimate the intra prediction cost of the H264 encoder. The STC module 4 computes this energy (Energy(blk)) for each block blk of the current frame. Thus we define:
β is a predetermined parameter of the model defined in a preliminary learning phase.
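Equations (5) and (6) are not reproduced above; the following sketch merely illustrates the stated ingredients (a temporal cost driven by the block SAD and the fixed Qstep, a spatial cost driven by the block energy, with learned parameters α, λ and β). The linear forms and default values are assumptions for illustration only:

```python
# alpha, lam and beta stand in for the model parameters learned in the
# preliminary phase; their values and the forms below are assumed.
def c_tempo(sad_blk, qstep, alpha=0.9, lam=50.0):
    """Assumed temporal (inter) cost model: proportional to the motion
    residual SAD, cheaper at coarser quantization, plus a floor lam."""
    return alpha * sad_blk / qstep + lam

def c_spatio(energy_blk, qstep, beta=1.2):
    """Assumed spatial (intra) cost model: proportional to the spatial
    energy of the block, cheaper at coarser quantization."""
    return beta * energy_blk / qstep
```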
The probability distribution of a block being encoded with a temporal prediction, P(blk=tempo), can be modeled by an exponential function which depends on the SAD of the block and the type of image considered (or more specifically the type T of encoding of the frame in which the block blk is included). This probability is defined by:
with:
In an alternative embodiment, the equation (8) is replaced by:
with γ′ and σ′ being predetermined parameters of the model defined (for the B frames) in a preliminary learning phase.
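Combining the probability model with equation (4) gives a hedged sketch of the block-cost estimation (the exponential shape for P frames and the parameter values stand in for the learned γ and σ mentioned above; they are illustrative assumptions):

```python
import math

def p_tempo(sad_blk, frame_type, gamma=1.0, sigma=500.0):
    """Probability that block blk is encoded with a temporal prediction.
    Intra (I) frames never use it, B frames (in this sketch) always do;
    for P frames an exponential decay in the block SAD is assumed."""
    if frame_type == 'I':
        return 0.0
    if frame_type == 'B':
        return 1.0
    return gamma * math.exp(-sad_blk / sigma)

def block_cost(sad_blk, frame_type, cost_tempo, cost_spatio):
    """Equation (4): probability-weighted mix of the two encoding costs."""
    p = p_tempo(sad_blk, frame_type)
    return p * cost_tempo + (1.0 - p) * cost_spatio
```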
In the above description, the module 4 for obtaining a piece of complexity information is a module for estimating complexity (STC module), which enables an implementation upstream to a real-time encoder.
However, there are other possibilities for obtaining a piece of information on complexity associated with each frame of a scene.
In a first variant, the module 4 for obtaining information on complexity comprises an encoding module used to compute the complexity associated with a frame I(t) by computing the number of bits really needed for encoding the frame I(t). For each frame, the filtering decision module 6 receives a computed complexity which it takes into account (inter alia) to determine the setpoint value of filtering.
In other words, in this first variant, a first encoding is done (in addition to the one performed by the encoder 10), whence the notion of a double encoding (also called a “double pass” encoding). This first encoding serves only to assess the complexity of each frame. This method is efficient but it is also very costly. Optimizations of this first variant consist for example in using a simplified encoding for the first pass.
A second variant is possible in cases of use where the capture module 2 receives an already compressed stream 1 (compressed by another encoder (not shown)) which is decoded and then re-encoded (by the encoder 10) in a different format or a different bit rate. The module 4 for obtaining information on complexity in this case simply comprises:
In other words, in this second variant, pieces of information on frame sizes and on Qstep in the compressed stream are retrieved and combined. This is a low-cost approach but is limited to situations where the capture module receives a compressed stream. In addition, the pieces of information read are not reliable if this stream has previously undergone several successive encodings.
In the above description, the complexity estimation module (STC module) 4 forms part of a system for the dynamic reduction of entropy of a signal upstream to a data compression device (encoder) 10.
As illustrated in
For example, the complexity estimation module 4 gives the encoder 10 one or more of the following pieces of information:
The utility of each of these different pieces of information in the decision process of the encoder is described here below.
As indicated further above, when N is different from zero (N≠0), the complexity K(t) is associated with the current frame I(t) but also with the following group of N frames (average complexity on a sliding window of N+1 frames). In other words, K(t) corresponds to the complexity of the group of N frames to come. This piece of data is used by the encoder in the bit-rate allocation algorithm or rate control algorithm: if the complexity to come increases, diminishes or remains constant, the bit rate allocated to the current frames will not be the same.
As indicated further above, the spatial cost (also called the intra cost) of a frame corresponds to the sum of the spatial costs of the blocks of the frame. The spatial cost Cspatio of a block blk (i.e. the estimated cost of encoding with a spatial prediction) is defined by the equation (6) described in detail further above:
This piece of data is used by the encoder in the algorithm for allocating frames of the type I (intra). These frames are key frames in the stream, and are encoded only by means of spatial prediction. Depending on the spatial cost of a frame and the number of bits allocated to it, the algorithm decides on a quantization step for the frame.
The sequencer of the encoder is the part that decides on the number of Bs (i.e. the number of B type frames) in the GOP (group of pictures) sub-group to come. Depending on the encoder, whether H264 or MPEG2, there are different possibilities: no B, 1 B, 2 Bs, 3 Bs, 5 Bs, 7 Bs, etc.
The Bs are very useful because their prediction is optimized and they facilitate the allocation of the bit rate because their size is often smaller. However, depending on the motion present in the sequence, their efficiency varies: a difficult sequence with a great deal of motion will be more easily encoded with 1 B, while a greater number of Bs will be more efficient for a static sequence.
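A hedged sketch of a sequencer decision of this kind (the thresholds, the motion indicator and the candidate B counts are illustrative assumptions, not values from the description):

```python
def choose_b_count(avg_inter_cost, thresholds=(200.0, 500.0, 1000.0)):
    """Pick the number of B frames for the next GOP sub-group from an
    average temporal (inter) cost used as a motion indicator:
    high motion -> few Bs, near-static -> many Bs."""
    if avg_inter_cost >= thresholds[2]:
        return 1  # difficult, high-motion sequence
    if avg_inter_cost >= thresholds[1]:
        return 2
    if avg_inter_cost >= thresholds[0]:
        return 3
    return 7      # near-static sequence
```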
The temporal cost Ctempo of a block blk (i.e. the estimated cost of encoding with a temporal prediction), also called “inter cost”, gives this useful piece of information on motion to the sequencer. It is defined by the equation (5) described in detail further above:
Foreign application priority data: Number 1351277, Date Feb 2013, Country FR, Kind national.