The present invention belongs to the technical field of video coding and decoding, and in particular, relates to temporal domain rate distortion optimization considering coding-mode adaptive distortion propagation.
Rate distortion theory is the basic theory of lossy coding. The rate distortion optimization (RDO) technology developed based on this theory is one of the important tools to improve the coding efficiency, and has been widely applied in the field of video coding.
The performance of video coding is measured by the coding bits and the reconstruction distortion. On the one hand, higher video quality requires more coding bits; on the other hand, a lower bit budget greatly increases the distortion of the video, so the coding bits and the reconstruction distortion constrain each other. Rate distortion optimization makes the encoder select an optimal set of coding parameters so that the coding distortion is minimized on the premise that the coding bits do not exceed a target number of bits. The mathematical expression is shown in formula (1.1):
$$\min \sum_{i=1}^{N} D_i \quad \text{s.t.} \quad \sum_{i=1}^{N} R_i \le R_c \tag{1.1}$$
wherein $D_i$ and $R_i$ represent the distortion and the number of bits of the i-th coding unit, N is the total number of coding units, and $R_c$ represents the target number of bits.
To solve the above constrained rate distortion optimization problem, a global Lagrange multiplier $\lambda_g$ may be introduced to transform the constrained problem into the unconstrained problem of formula (1.2), wherein J is called the rate distortion cost function:
$$\min J, \quad J = \sum_{i=1}^{N} D_i + \lambda_g \sum_{i=1}^{N} R_i \tag{1.2}$$
Under the condition of independent rate distortion optimization, that is, when the rate distortion performance of different coding units is mutually independent, differentiating formula (1.2) with respect to $R_i$ gives $\lambda_g = -\partial D_i/\partial R_i$. It can be seen that $\lambda_g$ is the negative slope at a point on the rate distortion curve: a larger $\lambda_g$ corresponds to an operating point with a smaller code rate and larger distortion, and a smaller $\lambda_g$ corresponds to an operating point with a larger code rate and smaller distortion. Because $\lambda_g$ is the most important factor determining the rate distortion performance, its selection is very important. In the current VVC, the value of $\lambda_g$ is mainly determined by a preset quantization parameter (QP) and is independent of the input video sequence.
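The effect of the multiplier on the chosen operating point can be illustrated with a minimal Python sketch. The rate distortion points and the function name `select_operating_point` are hypothetical and are not taken from any encoder; the sketch only assumes the cost J = D + λ·R for a single coding unit.

```python
def select_operating_point(points, lam):
    """Return the (rate, distortion) pair that minimises J = D + lam * R."""
    return min(points, key=lambda p: p[1] + lam * p[0])


# Hypothetical operating points of one coding unit: (rate in bits, distortion).
rd_points = [(100, 50.0), (200, 30.0), (400, 18.0), (800, 12.0)]

# A small multiplier favours low distortion, a large one favours low rate.
low_lambda_choice = select_operating_point(rd_points, 0.01)
high_lambda_choice = select_operating_point(rd_points, 0.5)
```

With these illustrative numbers the small multiplier selects the highest-rate point and the large multiplier selects the lowest-rate point, which matches the slope interpretation given above.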
However, intra-frame and inter-frame prediction introduce dependencies among different coding units, so applying independent rate distortion optimization to each coding unit cannot achieve the optimal coding performance. Therefore, a global rate distortion optimization method with acceptable complexity is required to further improve the coding efficiency.
A temporal domain rate distortion optimization algorithm under the low-delay (LD) coding structure is studied in the literature "Temporally dependent rate-distortion optimization for low-delay hierarchical video coding". According to the temporal dependency under the LD configuration, a temporal domain distortion propagation chain under multiple reference frames is established, the degree of distortion propagation is estimated, and a propagation factor is calculated. The global Lagrange multiplier is then adjusted according to the aggregation propagation factor, thereby realizing temporal domain rate distortion optimization and addressing the global rate distortion optimization problem.
When the temporal domain rate distortion optimization of the coding unit Bi in the key frame fi is considered under the LD coding structure, the expected distortion of the affected coding unit Bi+1 in the coding frame fi+1 is:
wherein $P_{i,j}$ is the probability that the coding frame $f_i$ is referenced by the coding frame $f_j$, and $o_i$ is the coding parameter of $B_i$. The last three terms are irrelevant to the coding parameter $o_i$ of $B_i$, so formula (1.3) may be simplified as:
$$E(D_{i+1}) = P_{i,i+1} \cdot D_{i+1}(o_i, o_{i+1}^{1}) + a_{i+1} \tag{1.4}$$
In the same way, the expected distortion of the coding unit Bi+2 may be written as:
$$E(D_{i+2}) = P_{i,i+2} \cdot D_{i+2}(o_i, o_{i+2}^{2}) + P_{i+1,i+2} \cdot D_{i+2}(o_i, o_{i+1}^{*}, o_{i+2}^{1}) + a_{i+2} \tag{1.5}$$
wherein $a_{i+2} = P_{i-4,i+2} \cdot D_{i+2}(o_{i-4}, o_{i+2}^{3}) + P_{i-8,i+2} \cdot D_{i+2}(o_{i-8}, o_{i+2}^{4})$ is irrelevant to the coding parameter $o_i$ of $B_i$. The expected distortion of the other subsequent coding units affected by $B_i$ may be obtained in a similar way.
Based on the concept of the expected distortion, the rate distortion problem of the formula (1.2) may be represented again as:
The algorithm estimates the expected distortion of the current coding unit and the subsequent coding units relatively roughly, so it is difficult for the propagation factor to accurately measure the influence of the distortion of the current coding unit on the subsequent coding distortion, and a loss is generated in the new-generation video coding standard VVC. Meanwhile, the algorithm does not perform temporal domain rate distortion optimization on the I frame, although the coding performance of the I frame is very important in the LD coding structure.
To address the above problems and further improve temporal domain rate distortion optimization under the LD coding structure, the present invention reformulates the dependent rate distortion optimization problem based on temporal domain distortion propagation, according to the temporal domain dependency under the LD structure and an analysis of distortion propagation under the skip mode and the inter mode. By constructing a temporal domain distortion propagation chain, the aggregation distortion of the current coding unit and the affected future coding units is estimated, and the propagation factor of each coding unit in the temporal domain distortion propagation model is calculated. The Lagrange multiplier is then adjusted through this more accurate propagation factor to realize temporally dependent rate distortion optimization, and the I frame is coded twice to realize temporally dependent rate distortion optimization of the I frame.
The present invention adopts the following technical solutions:
The reconstruction distortion of a coding unit $B_i$ is assumed to be $D_i$. Inter-frame prediction includes a skip mode, in which no residual is transmitted and the inter-frame prediction value is used directly as the reconstruction value, and another mode, called the inter mode, in which the residual must be transmitted. Therefore, the distortion of the current coding unit may be composed of the distortions contributed by the skip mode and the inter mode:
$$D_i = p^{inter} \cdot D_i^{inter} + p^{skip} \cdot D_i^{skip} = d^{inter} + d^{skip} \tag{1.7}$$
Only the partial distortion $d^{inter}$ of the current coding unit in the inter mode affects the subsequent coding units. In the skip mode, no prediction residual is transmitted and the coded reference unit serves directly as the prediction block, so the distortion of the current coding unit is determined by the distortion of the previously coded reference unit. Its influence on the subsequent coding units is therefore attributable to the previously coded unit, and the skip-mode distortion should be excluded when the influence of the current coding unit on the subsequent coding units is considered. Here $D_i^{inter}$ and $D_i^{skip}$ are the coding distortions of the current coding unit when it selects the inter mode and the skip mode respectively, and $p^{inter}$ and $p^{skip}$ are the probabilities that the current coding unit selects the inter mode and the skip mode respectively, with $p^{inter} + p^{skip} = 1$. A larger error between the current coding unit and the prediction unit increases the probability that the encoder selects the inter mode, and a larger quantization step size increases the probability that the encoder selects the skip mode. Therefore, $p^{inter}$ is defined as:
wherein $D_i^{OMCP} = \|F_i - F_{i-1}\|^2$ is the original motion compensation error obtained for $B_i$ in the original frame through motion search, $F_i$ and $F_{i-1}$ represent the original pixels of the coding unit $B_i$ and the reference unit $B_{i-1}$ respectively, and Δ is the quantization step size.
When $B_i$ is coded, the partial derivative of formula (1.6) with respect to $R_i$ is evaluated to obtain the global Lagrange multiplier $\lambda_g$:
Multiplying both sides of formula (1.9) by $\partial R_i/\partial D_i$ and assuming that $\partial D_i/\partial R_i = \lambda_i$, the following may be obtained:
wherein $\lambda_i$ is the Lagrange multiplier of the coding unit $B_i$ under the global rate distortion performance. In addition, $\kappa_i$ represents the influence of the coding unit $B_i$ on the coding distortion of the subsequent video sequence, and is called the propagation factor of the coding unit $B_i$.
The distortion function under the inter mode with high code rate may be represented as Di+1inter=e−bR
$F_i$ represents the original pixels of the coding unit $B_i$, $\hat{F}_i$ represents the reconstructed pixels of the coding unit $B_i$, and $F_{i+1}$ represents the original pixels of the coding unit $B_{i+1}$.
According to experimental observation, α is approximately constant, and in this case the distortion of the coding unit $B_{i+1}$ may be represented as:
$$D_{i+1} \approx p_{i,i+1}^{inter} \cdot e^{-bR} \cdot \alpha \cdot (D_{i+1}^{OMCP} + D_i) + p_{i,i+1}^{skip} \cdot \alpha \cdot (D_{i+1}^{OMCP} + D_i) \tag{1-12}$$
wherein $p_{i,i+1}^{inter}$ and $p_{i,i+1}^{skip}$ represent the probabilities that the inter mode and the skip mode are used when the coding unit $B_{i+1}$ references the coding unit $B_i$, and $D_{i+1}^{OMCP}$ represents the original motion compensation error of the coding unit $B_{i+1}$.
At this time, the expected distortion of the coding unit Bi+1 affected by the coding unit Bi in the coding frame fi+1 may be obtained by a formula (1.4) and a formula (1.7):
Wherein γi,i+1=α·(pi,i+1inter·e−bR
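The role of the coefficient γ can be read off formula (1-12) by factoring out the common term. The step below is a reconstruction under the assumption that $\gamma_{i,i+1}$ denotes exactly this common factor; the exponent is written as it appears in formula (1-12).

```latex
\begin{aligned}
D_{i+1} &\approx p_{i,i+1}^{inter} \cdot e^{-bR} \cdot \alpha \cdot (D_{i+1}^{OMCP} + D_i)
          + p_{i,i+1}^{skip} \cdot \alpha \cdot (D_{i+1}^{OMCP} + D_i) \\
        &= \alpha \cdot \left( p_{i,i+1}^{inter} \cdot e^{-bR} + p_{i,i+1}^{skip} \right)
           \cdot (D_{i+1}^{OMCP} + D_i) \\
        &= \gamma_{i,i+1} \cdot (D_{i+1}^{OMCP} + D_i),
\qquad
\gamma_{i,i+1} = \alpha \cdot \left( p_{i,i+1}^{inter} \cdot e^{-bR} + p_{i,i+1}^{skip} \right)
\end{aligned}
```

Under this reading, the part of $D_{i+1}$ that depends on the coding of $B_i$ is $\gamma_{i,i+1} \cdot D_i$, while $\gamma_{i,i+1} \cdot D_{i+1}^{OMCP}$ does not depend on the coding parameter $o_i$.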
In the same way, the expected distortion of the coding unit Bi+2 affected by Bi in the coding frame fi+2 is:
$$E(D_{i+2}) = (P_{i+1,i+2} \cdot \gamma_{i+1,i+2} \cdot P_{i,i+1} \cdot \gamma_{i,i+1} + P_{i,i+2} \cdot \gamma_{i,i+2}) \cdot D_i^{inter} + c_{i+2} \tag{1-14}$$
wherein γi+1,i+2=α··(pi+1,i+2inter·e−bR
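The coefficient that multiplies $D_i^{inter}$ in formula (1-14) has a recursive structure: each later frame collects the contributions of all earlier frames it may reference. The following Python sketch illustrates that recursion. The function name, the dictionary layout, and the numeric values are hypothetical, and the generalisation beyond the two frames written out in formula (1-14) is an assumption.

```python
def propagation_weights(i, last, edge):
    """Coefficient of D_i^inter in E(D_j) for j = i .. last.

    edge[(k, j)] holds the product P_{k,j} * gamma_{k,j} for a frame j that
    may reference frame k; missing pairs are treated as "not referenced".
    Terms that do not depend on the coding of B_i are ignored.
    """
    w = {i: 1.0}  # the unit itself carries its own distortion with weight 1
    for j in range(i + 1, last + 1):
        w[j] = sum(edge.get((k, j), 0.0) * w[k] for k in range(i, j))
    return w


# Hypothetical values of P * gamma for three consecutive frames 0, 1, 2.
edge = {(0, 1): 0.5 * 0.8, (1, 2): 0.6 * 0.9, (0, 2): 0.3 * 0.7}
weights = propagation_weights(0, 2, edge)

# Sum over the affected frames (the unit's own weight is excluded).
aggregate = sum(v for j, v in weights.items() if j > 0)
```

For frame i+2 the recursion yields P_{i,i+2}·γ_{i,i+2} + P_{i+1,i+2}·γ_{i+1,i+2}·P_{i,i+1}·γ_{i,i+1}, which is the bracketed coefficient of formula (1-14).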
In the same way, the expected distortion of the coding unit Bi+3 affected by Bi in the coding frame fi+3 is:
wherein γi+2,i+3=α··(pi+2,i+3inter·e−bR
Therefore, the aggregation distortion of all the coding units influenced by the coding unit $B_i$ in the four coding frames of the current GOP is:
wherein γi,i+k+1−t=α··(pi,i+k+1−tinter·e−bR
being irrelevant to the coding parameter oi of the coding unit Bi.
In the same way, the aggregation distortion of all the coding units influenced by the coding unit Bi in four coding frames in the m-th GOP is:
γi+4m,j+4m+k+1−t=α··(pi+4m,i+4m+k+1−tinter·e−bR
being irrelevant to the coding parameter oi of the coding unit Bi.
The aggregation distortion of the coding units affected by Bi in all the subsequent coding frames from the coding frame fi+1 to the last coding frame fN is:
wherein M is the total number of GOPs from the coding frame $f_{i+1}$ to the last coding frame $f_N$, and L represents a term irrelevant to $o_i$.
It may be seen from a formula (1.8) that a relationship between the coding distortion Diinter of the current coding unit Bi using the inter mode and the actual coding distortion Di is as follows:
making
being the probability that the coding unit $B_i$ selects the inter mode, and formula (1.19) may be represented as: $D_i^{inter} = \eta_i D_i$.
According to a formula (1.10), the calculation formula of the propagation factor κi is:
The CTU-level global Lagrange multiplier $\lambda_g$ may be adaptively adjusted by using the propagation factor $\kappa_i$, and the CTU-level QP is adjusted accordingly; the frame-level QP of all the B frames is adjusted by using the frame-level average propagation factor.
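As an illustration of how a propagation factor could be turned into a multiplier and a QP offset, the following Python sketch assumes that the adjusted multiplier is obtained as λg/(1+κi) and that the multiplier and the QP follow the relation λ ∝ 2^((QP−12)/3) commonly used in HEVC and VVC reference encoders. Both relations and all function names are assumptions made for illustration; the text above does not spell them out.

```python
import math


def adjusted_lambda(lambda_g, kappa):
    """Assumed adjustment: a larger propagation factor lowers the multiplier."""
    return lambda_g / (1.0 + kappa)


def qp_offset(lambda_new, lambda_old):
    """QP change implied by a multiplier change, assuming
    lambda proportional to 2 ** ((QP - 12) / 3)."""
    return 3.0 * math.log2(lambda_new / lambda_old)


# Hypothetical example: a block whose distortion propagates strongly (kappa = 1).
lam_old = 100.0
lam_new = adjusted_lambda(lam_old, 1.0)
delta_qp = qp_offset(lam_new, lam_old)
```

Under these assumptions, halving the multiplier corresponds to lowering the QP by 3, so blocks or frames with a large influence on later frames are coded at higher quality.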
The I frame is particularly important under the LD coding structure because the subsequent coding frames reference it. At present, the QP of the I frame is uniformly lowered by 1 in the VTM, but the importance of the I frame differs from sequence to sequence. The I frame may therefore be coded twice: the distortion propagation chain is established from the coding distortion obtained by the first coding, the propagation factor of each 16×16 block in the I frame is calculated, and the QP of the I frame is adjusted by the frame-level average propagation factor. In this way, the QP of the I frame is adjusted according to the influence of the I frame on the subsequent coding frames, and the adjustment value is not limited to −1.
The present invention has the following beneficial effects: the problem in the traditional method that the I frame is not subjected to temporal domain rate distortion optimization is solved, so that the global rate distortion performance of the I frame is optimal; the dependent rate distortion optimization problem based on temporal domain distortion propagation is reformulated according to the temporal domain dependency under the LD coding structure and the distortion propagation analysis in the skip mode and the inter mode; and the rate distortion optimization performance under the LD coding structure is improved.
The present invention is described in detail below with reference to the embodiments:
In order to simplify the implementation of the global rate distortion algorithm, the global Lagrange multiplier $\lambda_g$ may be directly modified in the VTM through the propagation factor $\kappa_i$. The subsequent coding units are not actually coded when the propagation factor $\kappa_i$ is derived, so it is necessary to estimate their distortion.
Under the condition of high code rate, the coding distortion of the subsequent coding unit is most likely inter distortion, and at this time, Di+1=e−bR
F(θ)=Di+1/Di+1MCP=e−bR
wherein $\theta = \sqrt{2}\,Q_{step}/\sqrt{D^{MCP}}$. The $F(\theta)$ curve may be fitted from a large number of experiments with different quantization step sizes and coding units, a lookup table is established according to the curve, and the value of $F(\theta)$ is looked up after calculating θ, so that the inter distortion of the coding block is estimated. In the present invention, α is set to 0.94.
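The table lookup described above can be sketched in Python as follows. The grid values are placeholders chosen only for illustration, since the fitted curve is not given in the text; the linear interpolation between table entries and all function names are likewise assumptions.

```python
import math
from bisect import bisect_right

# Placeholder samples of the fitted F(theta) curve (NOT measured data).
THETA_GRID = [0.0, 0.5, 1.0, 2.0, 4.0]
F_GRID = [0.0, 0.1, 0.3, 0.6, 0.9]


def theta(q_step, d_mcp):
    """theta = sqrt(2) * Qstep / sqrt(D_MCP), as defined in the text."""
    return math.sqrt(2.0) * q_step / math.sqrt(d_mcp)


def lookup_f(t):
    """Look up F(theta), interpolating linearly and clamping at the ends."""
    if t <= THETA_GRID[0]:
        return F_GRID[0]
    if t >= THETA_GRID[-1]:
        return F_GRID[-1]
    hi = bisect_right(THETA_GRID, t)
    lo = hi - 1
    frac = (t - THETA_GRID[lo]) / (THETA_GRID[hi] - THETA_GRID[lo])
    return F_GRID[lo] + frac * (F_GRID[hi] - F_GRID[lo])


def estimate_inter_distortion(q_step, d_mcp):
    """Estimate the inter distortion as F(theta) * D_MCP."""
    if d_mcp <= 0.0:
        return 0.0  # perfect prediction: nothing left to distort
    return lookup_f(theta(q_step, d_mcp)) * d_mcp
```

For example, `estimate_inter_distortion(q_step, d_mcp)` would be called once per 16×16 block with the motion compensation error obtained from the motion search.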
According to the formula (1-9), a global Lagrange multiplier may be obtained:
Meanwhile, the Lagrange multiplier of the VTM is $\lambda_{VTM} = -\partial D_i^{VTM}/\partial R_i^{VTM} = b D_i^{VTM}$. Therefore, $\lambda_g$ and $\lambda_{VTM}$ have the following relationship:
$$D_i \cdot \lambda_g = (1 + \kappa_i) \cdot D_i^{VTM} \cdot \lambda_{VTM} \tag{1-23}$$
For all the coding units, there is:
The global Lagrange multiplier $\lambda_g$ may be evaluated by formula (1-24), wherein N is the number of all the coding units. Since the distortion of all the coding units cannot be obtained during the coding process, $\lambda_g$ is updated by a weighted sum of the distortion at this time, the distortion of the coded frames and the distortion of the frame that has just been coded. Since $D_i^{VTM}$ cannot be obtained in an encoder that integrates the rate distortion algorithm proposed in this section, $D_i$ is used in its place.
When the distortion propagation chain is established, motion search is performed on 16×16 blocks, and a propagation factor is calculated for each block. Each 128×128 CTU is divided and coded independently in the VTM, so the average of the propagation factors of all the 16×16 blocks in a CTU is taken as the propagation factor of that CTU, and the CTU-level Lagrange multiplier and QP are adjusted accordingly. Meanwhile, the frame-level QP is adjusted by using the average propagation factor of the whole image.
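The averaging step can be sketched in Python as follows. The function names and the handling of partial CTUs at the picture border (averaging only the blocks that exist) are assumptions; the text only states that the 16×16 block factors inside a 128×128 CTU are averaged.

```python
def ctu_propagation_factors(block_kappa, blocks_per_ctu=8):
    """Average per-16x16-block factors into per-CTU factors.

    block_kappa is a 2-D list (rows x columns of 16x16 blocks). A 128x128 CTU
    covers 8 x 8 such blocks; CTUs cut by the picture border average only the
    blocks they actually contain.
    """
    rows, cols = len(block_kappa), len(block_kappa[0])
    ctu_kappa = []
    for r0 in range(0, rows, blocks_per_ctu):
        ctu_row = []
        for c0 in range(0, cols, blocks_per_ctu):
            vals = [block_kappa[r][c]
                    for r in range(r0, min(r0 + blocks_per_ctu, rows))
                    for c in range(c0, min(c0 + blocks_per_ctu, cols))]
            ctu_row.append(sum(vals) / len(vals))
        ctu_kappa.append(ctu_row)
    return ctu_kappa


def frame_propagation_factor(block_kappa):
    """Average propagation factor over the whole picture."""
    vals = [k for row in block_kappa for k in row]
    return sum(vals) / len(vals)


# Hypothetical 128x256 picture: left CTU has factor 1.0, right CTU 3.0.
blocks = [[1.0] * 8 + [3.0] * 8 for _ in range(8)]
per_ctu = ctu_propagation_factors(blocks)
per_frame = frame_propagation_factor(blocks)
```

The per-CTU values would drive the CTU-level adjustment, and the per-frame value the frame-level QP adjustment.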
The I frame is coded twice to optimize and adjust its QP. To reduce the coding complexity, the first coding pass of the I frame is simplified: binary tree and ternary tree division modes are skipped, the CTU is divided only by the quad tree division mode, and the minimum size of the coding unit is set to 16×16 without smaller divisions. The distortion obtained from the first coding pass is used to estimate the influence of the distortion of the coding units in the I frame on the subsequent coding units, thereby realizing adaptive adjustment of the QP of the I frame.
The VVC reference software VTM5.0 serves as the experimental platform for the present invention. The experimental environment is configured according to the common test conditions (CTC) specified by JVET and the reference software, and the experiment is performed only under the LDB coding structure. The test sequences are the 16 video sequences of Classes B, C, D and E suggested by the CTC, and each test sequence is coded at four QP points (22, 27, 32 and 37).
The coding results are shown in Table 1. The table shows that the Y component of the test sequences under the LDB coding structure achieves a coding performance gain of 2.57%. The performance improves for most test sequences and most markedly for Class E, where 10.13% of the code rate is saved for the Y component. The main reason is that Class E consists of video sequences with relatively fixed scenes, in which the video frames have high similarity and high temporal domain dependence, so the present invention achieves a better effect for these sequences. Some sequences are then selected and their rate distortion curves are compared to observe the improvement in coding performance. As shown in
In terms of coding complexity, the temporal domain rate distortion optimization algorithm increases the coding complexity under the LDB coding structure by 15% on average. This is mainly because the motion search on each 16×16 block, which finds the affected coding blocks and establishes the distortion propagation chain, takes a certain amount of time. In addition, the I frame is optimized through 2-pass coding, and although the first coding pass of the I frame is simplified, it still adds a small amount of coding complexity.
Number | Date | Country | Kind |
---|---|---|---|
202010241861.4 | Mar 2020 | CN | national |
This application is a continuation application of International Application No. PCT/CN2020/132812, filed on Nov. 30, 2021, which is based upon and claims priority to Chinese Patent Application No. 202010241861.4, filed on Mar. 31, 2020, the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2020/132812 | Nov 2020 | US |
Child | 17412292 | | US |