This application claims priority to and the benefit of Korean Patent Application No. 10-2012-0134287 filed in the Korean Intellectual Property Office on Nov. 26, 2012, the entire contents of which are incorporated herein by reference.
The present invention relates to a fast encoding technology of a video signal, and more particularly, to a technology of using a probability distribution of rate-distortion costs in order to accelerate prediction mode determination during an encoding process of an encoder.
Current video compression standards have been designed to enable intra screen prediction or inter screen prediction with various block sizes in order to effectively encode a video signal. For example, the H.264/advanced video coding (AVC) standard may divide a single 16×16 macro block into blocks having a size of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, or 4×4, and thereby perform prediction. More recently, technology that predicts a video signal using a wider variety of block sizes than the related art has been applied. For example, the high efficiency video coding (HEVC) standard, which has a quad-tree coding structure, is expected to perform prediction with block sizes ranging from a maximum of 64×64 to a minimum of 4×4.
A process of finding the optimal combination, that is, the combination having the best coding efficiency among combinations of prediction blocks of various sizes, may be classified into (i) a splitting process and (ii) a pruning process. Initially, in the splitting process, prediction is performed for each block size while the largest block is split into smaller blocks, and the resulting rate-distortion value is stored. After this operation is repeated down to the smallest block, the pruning process obtains the sum of the rate-distortion values of the smallest blocks, compares the sum with the rate-distortion value of the single upper block covering the same area, and selects the smaller of the two.
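The splitting and pruning processes described above can be illustrated with a minimal Python sketch. It is not taken from any reference encoder: the function `best_partition`, the callback `rd_cost`, and the toy cost model `toy_cost` are hypothetical names introduced only for illustration, and the sketch considers square blocks only.

```python
def best_partition(x, y, size, min_size, rd_cost):
    """Return (cost, blocks) for the cheapest quad-tree partition of a square block.

    rd_cost(x, y, size) is a hypothetical callback that returns the
    rate-distortion value of predicting the block without splitting it.
    """
    # Splitting process: evaluate and keep the cost of the unsplit block.
    whole_cost = rd_cost(x, y, size)
    if size <= min_size:
        return whole_cost, [(x, y, size)]

    # Recurse into the four lower blocks and sum their best costs.
    half = size // 2
    split_cost = 0.0
    split_blocks = []
    for dy in (0, half):
        for dx in (0, half):
            cost, blocks = best_partition(x + dx, y + dy, half, min_size, rd_cost)
            split_cost += cost
            split_blocks += blocks

    # Pruning process: keep whichever of the two alternatives is cheaper.
    if split_cost < whole_cost:
        return split_cost, split_blocks
    return whole_cost, [(x, y, size)]


def toy_cost(x, y, size):
    # Assumed toy model: blocks anchored at the origin are "detailed" and cost
    # their area; every other block has a flat cost of 10.
    return float(size * size) if (x == 0 and y == 0) else 10.0


total_cost, chosen_blocks = best_partition(0, 0, 16, 4, toy_cost)
```

With this toy model the 16×16 block is split, only the top-left 8×8 block is split again, and every node of the tree is still evaluated once, which is the exhaustive search cost that motivates early termination.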
In general, as prediction block sizes increase and diversify, compression efficiency is enhanced, particularly for high definition video signals such as FULL-HD and UHD. However, the number of possible prediction block combinations also increases and thus, the operation amount that an encoder needs to determine the optimal prediction mode increases significantly. In the case of HEVC, when the depth of a 64×64 largest coding unit (LCU) is “0”, four (32×32) CUs may be present in depth 1, 16 (16×16) CUs may be present in depth 2, and 64 (8×8) CUs may be present in depth 3. Each 8×8 CU may be split into prediction units (PUs) having a size such as 8×8, 8×4, 4×8, 4×4, and the like, and thereby be predicted. To determine the optimal prediction mode, intra screen prediction and inter screen prediction need to be performed with respect to all CU depths and PU splits during the aforementioned splitting process and pruning process, which significantly increases the operation amount of the encoder.
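The CU counts quoted above follow from each split producing four lower blocks, so depth d holds 4^d CUs. The short Python sketch below reproduces those numbers; the function name `cu_counts` and its parameters are illustrative assumptions.

```python
def cu_counts(lcu_size=64, min_cu_size=8):
    """List (depth, cu_size, number_of_cus) for one LCU in a quad-tree."""
    counts = []
    depth, size = 0, lcu_size
    while size >= min_cu_size:
        counts.append((depth, size, 4 ** depth))  # each split yields four CUs
        depth += 1
        size //= 2
    return counts


per_depth = cu_counts()
total_cus = sum(n for _, _, n in per_depth)
```

For a 64×64 LCU and an 8×8 smallest CU this gives 1 + 4 + 16 + 64 = 85 CUs per LCU, before the PU split modes and prediction modes within each CU are counted.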
The present invention has been made in an effort to provide a fast prediction mode determination method of a video encoder that may remove an unnecessary operation of an encoder by selectively terminating early or omitting a splitting process and a pruning process based on a probability distribution of rate-distortion values, and thereby enables the encoder to quickly determine a prediction mode.
The present invention may include a method that may adaptively change a termination and omission determination criterion of the splitting process and the pruning process based on a characteristic of an input image. When using the method provided by the present invention, reliability regarding the termination and omission determination of the splitting process and the pruning process may be set and thus, it is possible to adjust the tradeoff between a decrease in an operation amount and a quality degradation of the encoder.
An exemplary embodiment of the present invention provides a fast prediction mode determination method of a video encoder, the method including: an early splitting test process of determining an early split coding unit (CU) through comparison between a first rate-distortion value and a first threshold with respect to candidate prediction modes that are selected by calculating the first rate-distortion value with respect to each prediction unit (PU) split mode in a single CU of an intra screen image or an inter screen image; and an early pruning test process of determining an early pruned CU through comparison between a second rate-distortion value and a second threshold with respect to a candidate prediction mode that does not correspond to the early split CU.
The early split CU may be a CU in which calculation of the second rate-distortion value is omitted from a pruning process, and the early pruned CU may be a CU in which a splitting process and a pruning process with respect to remaining lower CUs are omitted.
The first rate-distortion value JLRD may be calculated according to equation JLRD=DISTLRD+λpred·Rpred, and the second rate-distortion value JFRD may be calculated according to equation JFRD=DISTFRD+λmode·Rmode. Here, DISTLRD may denote a sum of absolute differences (SAD) or a sum of absolute Hadamard transformed differences (SATD) based on a luminance pixel value of an image in a corresponding prediction mode, λpred may denote a Lagrangean multiplier in the corresponding prediction mode, Rpred may denote a bit amount occurring due to usage of the corresponding prediction mode, DISTFRD may denote a sum of squared errors (SSE) based on a luminance pixel value of an image in a corresponding prediction mode, λmode may denote a Lagrangean multiplier in the corresponding prediction mode, and Rmode may denote a bit amount occurring due to usage of the corresponding prediction mode.
In the early splitting test process, when the first rate-distortion value is greater than the first threshold, a corresponding prediction mode may be determined as the early split CU. In the early pruning test process, when the second rate-distortion value is less than the second threshold, the corresponding prediction mode may be determined as the early pruned CU.
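The two comparisons can be sketched in Python as follows. This is only an illustration of the stated decision rule: the names `early_split_test`, `early_prune_test`, and `classify_cu`, the string labels, and the callback `compute_j_frd` are assumptions, and the handling of the smallest CU is left out.

```python
def early_split_test(j_lrd, first_threshold):
    """True when the first rate-distortion value exceeds the first threshold."""
    return j_lrd > first_threshold


def early_prune_test(j_frd, second_threshold):
    """True when the second rate-distortion value is below the second threshold."""
    return j_frd < second_threshold


def classify_cu(j_lrd, first_threshold, compute_j_frd, second_threshold):
    """Classify a CU; compute_j_frd is a hypothetical callback for the costly value."""
    if early_split_test(j_lrd, first_threshold):
        # Early split CU: the second rate-distortion value is never computed.
        return "early_split"
    j_frd = compute_j_frd()
    if early_prune_test(j_frd, second_threshold):
        # Early pruned CU: lower CUs are neither split nor pruned.
        return "early_pruned"
    return "full_search"


# The callback below would raise if called, showing that it is skipped.
label = classify_cu(500.0, 400.0, lambda: 1 / 0, 100.0)
```

Passing the second rate-distortion value as a callback makes the saving explicit: for an early split CU the expensive computation does not run at all.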
In the early pruning test process, a corresponding second rate-distortion value with respect to the early split CU may be replaced with a summed value of second rate-distortion values of the respective lower split modes.
The first threshold and the second threshold may be respectively updated based on a distribution of the first rate-distortion value and a distribution of the second rate-distortion value that are obtained periodically or intermittently at a predetermined time.
The first threshold and the second threshold may be updated per a predetermined frame.
The first threshold and the second threshold may be updated based on a Bayesian rule.
A value that satisfies a conditional probability value α given through the Bayesian rule within an error range ε may be determined as the first threshold or the second threshold.
Another exemplary embodiment of the present invention provides a video encoder, including: an early splitting test means to perform an early splitting test process of determining an early split CU through comparison between a first rate-distortion value and a first threshold with respect to candidate prediction modes that are selected by calculating the first rate-distortion value with respect to each PU split mode in a single CU of an intra screen image or an inter screen image; and an early pruning test means to perform an early pruning test process of determining an early pruned CU through comparison between a second rate-distortion value and a second threshold with respect to a candidate prediction mode that does not correspond to the early split CU.
The early split CU may be a CU in which calculation of the second rate-distortion value is omitted from a pruning process, and the early pruned CU may be a CU in which a splitting process and a pruning process with respect to remaining lower CUs are omitted.
The early splitting test means may calculate the first rate-distortion value JLRD according to equation JLRD=DISTLRD+λpred·Rpred, and may calculate the second rate-distortion value JFRD according to equation JFRD=DISTFRD+λmode·Rmode. Here, DISTLRD may denote a SAD or an SATD based on a luminance pixel value of an image in a corresponding prediction mode, λpred may denote a Lagrangean multiplier in the corresponding prediction mode, Rpred may denote a bit amount occurring due to usage of the corresponding prediction mode, DISTFRD may denote an SSE based on a luminance pixel value of an image in a corresponding prediction mode, λmode may denote a Lagrangean multiplier in the corresponding prediction mode, and Rmode may denote a bit amount occurring due to usage of the corresponding prediction mode.
When the first rate-distortion value is greater than the first threshold, the early splitting test means may determine a corresponding prediction mode as the early split CU. When the second rate-distortion value is less than the second threshold, the early pruning test means may determine the corresponding prediction mode as the early pruned CU.
In the early pruning test process, a corresponding second rate-distortion value with respect to the early split CU may be replaced with a summed value of second rate-distortion values of the respective lower split modes.
The first threshold and the second threshold may be respectively updated based on a distribution of the first rate-distortion value and a distribution of the second rate-distortion value that are obtained periodically or intermittently at a predetermined time.
The first threshold and the second threshold may be updated per a predetermined frame.
The first threshold and the second threshold may be updated based on a Bayesian rule.
A value that satisfies a conditional probability value α given through the Bayesian rule within an error range ε may be determined as the first threshold or the second threshold.
According to exemplary embodiments of the present invention, a fast prediction mode determination method of a video encoder may omit, or perform only a portion of, an operation with respect to a prediction mode during a video encoding process using a standard that provides prediction blocks of various sizes and types. Accordingly, compared to an existing scheme, it is possible to significantly decrease an operation amount required to determine whether to split a block. According to the method provided by the present invention, it is possible to adjust a determination criterion for omitting or partially performing the operation for the prediction block and thus, a user may select the tradeoff between a decrease in an operation amount and the resulting quality degradation.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
It should be understood that the appended drawings are not necessarily to scale, presenting a somewhat simplified representation of various features illustrative of the basic principles of the invention. The specific design features of the present invention as disclosed herein, including, for example, specific dimensions, orientations, locations, and shapes will be determined in part by the particular intended application and use environment.
In the figures, reference numbers refer to the same or equivalent parts of the present invention throughout the several figures of the drawing.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not limited to or restricted by the exemplary embodiments.
As described above, the present invention may significantly decrease an operation amount required to determine whether to perform coding unit (CU) splitting and a prediction unit (PU) split mode by omitting or partially performing a splitting process or a pruning process with respect to a CU or a PU of a predetermined depth in an encoder for performing the splitting process and the pruning process. For the above operation, a distribution of rate-distortion values (costs) used for prediction mode determination in video encoding is modeled and used. In general, low complexity rate-distortion cost JLRD is used for comparison between prediction modes in prediction blocks of the same size, that is, comparison between intra screen prediction modes of which prediction directions differ or comparison between inter screen prediction modes of which motion data differs, and is calculated according to Equation 1.
$$J_{LRD} = \mathrm{DIST}_{LRD} + \lambda_{pred} \cdot R_{pred}$$ [Equation 1]
Here, a sum of absolute differences (SAD) or a sum of absolute Hadamard transformed differences (SATD) based on a luminance pixel value of an image in a corresponding prediction mode is used for the DISTLRD value, λpred denotes a Lagrangean multiplier, and Rpred denotes an approximate bit amount occurring due to usage of the corresponding prediction mode. In consideration of calculation complexity, the residual signal is either not considered or is modeled from other values.
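As a hedged illustration of the low complexity cost, the Python sketch below evaluates it with SAD as the distortion term. The function names `sad` and `j_lrd`, the 2×2 sample blocks, and the values chosen for the multiplier and the bit estimate are assumptions made for the example; the SATD variant is omitted.

```python
def sad(orig, pred):
    """Sum of absolute differences between two equally sized luma blocks."""
    return sum(
        abs(o - p)
        for row_o, row_p in zip(orig, pred)
        for o, p in zip(row_o, row_p)
    )


def j_lrd(orig, pred, lambda_pred, r_pred):
    """Low complexity rate-distortion cost: distortion plus weighted bit estimate."""
    return sad(orig, pred) + lambda_pred * r_pred


# Toy 2x2 luma blocks (assumed values).
orig_block = [[10, 12], [14, 16]]
pred_block = [[9, 12], [15, 13]]

distortion = sad(orig_block, pred_block)                           # 1 + 0 + 1 + 3
cost = j_lrd(orig_block, pred_block, lambda_pred=2.0, r_pred=4)    # 5 + 2.0 * 4
```

No transform, quantization, or entropy coding is involved, which is why this cost is cheap enough to evaluate for every prediction direction or motion candidate.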
Next, to determine a final prediction mode of a prediction block, precise rate-distortion cost JFRD is used for rate-distortion cost comparison between prediction blocks of different sizes or comparison between different prediction modes, for example, to compare an intra screen prediction mode, an inter screen prediction mode that transmits motion data and a residual signal, an inter screen prediction mode that does not transmit motion data and a residual signal, and the like. JFRD is calculated according to Equation 2.
$$J_{FRD} = \mathrm{DIST}_{FRD} + \lambda_{mode} \cdot R_{mode}$$ [Equation 2]
Here, a sum of squared errors (SSE) based on a luminance pixel value of an image is used for the DISTFRD value according to a prediction mode, λmode denotes a Lagrangean multiplier, and Rmode denotes a bit amount occurring due to usage of a corresponding prediction mode. For precise calculation, Rmode corresponds to the number of actually generated bits, which is calculated by performing entropy coding of a coefficient that is obtained by performing transform, quantization, inverse transform, and inverse quantization with respect to a residual signal.
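The following Python sketch hints at why the precise cost is more expensive to obtain. It is a simplification under stated assumptions: the transform and the entropy coder are skipped, the residual is quantized directly with a uniform step, and the bit count `r_mode` is passed in rather than measured. The names `quantize_and_reconstruct`, `sse`, and `j_frd` are hypothetical.

```python
import math


def quantize_and_reconstruct(orig, pred, qstep):
    """Quantize the residual with a uniform step and rebuild the block.

    Assumption: no transform is applied; a real encoder would transform the
    residual before quantization and inverse transform it afterwards.
    """
    recon = []
    for row_o, row_p in zip(orig, pred):
        recon_row = []
        for o, p in zip(row_o, row_p):
            level = math.floor((o - p) / qstep + 0.5)  # round half up
            recon_row.append(p + level * qstep)
        recon.append(recon_row)
    return recon


def sse(orig, recon):
    """Sum of squared errors between the original and reconstructed blocks."""
    return sum(
        (o - r) ** 2
        for row_o, row_r in zip(orig, recon)
        for o, r in zip(row_o, row_r)
    )


def j_frd(orig, recon, lambda_mode, r_mode):
    """Precise rate-distortion cost; r_mode stands in for the entropy-coded bit count."""
    return sse(orig, recon) + lambda_mode * r_mode


orig_block = [[10, 12], [14, 16]]
pred_block = [[9, 12], [15, 11]]
recon_block = quantize_and_reconstruct(orig_block, pred_block, qstep=4)
cost = j_frd(orig_block, recon_block, lambda_mode=0.5, r_mode=6)
```

Even in this reduced form, the distortion is measured against a reconstructed block rather than the raw prediction, so every candidate must pass through the quantization round trip before its cost is known.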
Although JFRD provides a more accurate rate-distortion cost value than JLRD, it is apparent from Equation 1 and Equation 2 that calculating JFRD is more complex than calculating JLRD. Depending on the case, JLRD or another relatively simple calculation in a similar form may be used to determine a final prediction mode in order to decrease the calculation amount.
As an example of an existing HEVC configuration using the aforementioned JLRD and JFRD, the splitting process may select candidate prediction modes by calculating JLRD of each intra screen prediction direction with respect to all of the possible PU splits in a CU of a predetermined depth, and may then select the optimal prediction mode by calculating JFRD with respect to the candidate prediction modes. Similarly, with respect to inter screen prediction, a candidate prediction mode is selected from among prediction modes having different motion data using JLRD with respect to all of the possible PU splits. Next, JFRD is calculated with respect to the candidate prediction mode. JFRD values are also calculated for a prediction mode that does not transmit motion data and for a prediction mode that transmits neither motion data nor a residual signal.
By comparing the respective JFRD values obtained as above, it is possible to perform PU splitting and prediction mode determination with respect to a corresponding CU. The above process is repeatedly performed with respect to CUs of all depths, from the LCU down to the smallest CU (SCU). Next, in the pruning process, it is possible to determine whether to split each of the CUs within the LCU by repeating, from the SCU up to the LCU, an operation of comparing the JFRD sum of sub-CUs with the JFRD value of the upper CU covering the same area.
The present invention may be configured by additionally performing an early splitting test and an early pruning test while the encoder is performing the aforementioned splitting process.
Initially, in the early splitting test, a predetermined means (for example, an early splitting test means) of the encoder calculates a JLRD value (S111) with respect to each of P PU split modes in an intra screen image (an image for intra screen prediction having a predetermined number of pixels) and Q PU split modes in an inter screen image (an image for inter screen prediction having a predetermined number of pixels) (S110), in a CU of each depth, and selects candidate prediction modes within a predetermined range of the value (S112). When a current CU is not an SCU with respect to all of the candidate prediction modes (S113), the predetermined means of the encoder tests whether the JLRD value of each prediction mode is greater than a predetermined threshold JLRD
Meanwhile, only when the CU is not the early split CU, a predetermined means (for example, a pruning test means) of the encoder performs the early pruning test in order to determine an optimal prediction mode (S120) and tests whether JFRD value of a predetermined prediction mode is less than a predetermined threshold JFRD
JLRD
Schemes to deduce a posterior probability from a prior probability are used to determine JLRD
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \cdot P(\omega_j)}{p(x)}$$ [Equation 3]
Here, x corresponds to JLRD or JFRD as a measurement value. In an event ωj, j is “1” or “2” and denotes either a case in which a predetermined CU is split into sub-CUs or PUs smaller than the CU and thereby is predicted (j=1) or a case in which the predetermined CU is not split and is predicted as a PU having the same size as the corresponding CU (j=2). In Equation 3, p(x|ωj) and P(ωj) denote a conditional probability distribution and the prior probability, respectively, and p(x) is calculated as p(x)=Σ_{j=1}^{2} p(x|ωj)·P(ωj). p(x|ωj) may be directly calculated from rate-distortion costs stored for each of the aforementioned criteria, or may be calculated by modeling a distribution of each rate-distortion cost. For example, it is possible to model the distribution of rate-distortion cost as a normal distribution, a Laplacian distribution, and the like, and to calculate p(x|ωj) from a corresponding model. Accordingly, when rate-distortion cost with respect to a predetermined prediction block is given, it is possible to obtain a probability that the prediction block may be or may not be split to a lower prediction block through Equation 3. Conversely, it is possible to calculate rate-distortion cost that satisfies a given conditional probability value α within an approximate error range εJLRD
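To make the use of Equation 3 concrete, the Python sketch below models the two conditional distributions as normal distributions, evaluates the posterior probability that a CU is split, and searches for a cost whose posterior matches a target value α within a tolerance ε. Everything specific here is an assumption for illustration: the function names, the model parameters (means 600 and 300, standard deviation 100, equal priors), and the use of bisection, which presumes that the posterior increases monotonically with the cost over the search interval.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as the assumed model of p(x | w_j)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_split(x, split_model, unsplit_model, prior_split):
    """P(w_1 | x) from Bayes' rule with two events: split (j=1) and not split (j=2)."""
    joint_split = gaussian_pdf(x, *split_model) * prior_split
    joint_unsplit = gaussian_pdf(x, *unsplit_model) * (1.0 - prior_split)
    return joint_split / (joint_split + joint_unsplit)  # denominator is p(x)


def find_threshold(alpha, eps, lo, hi, split_model, unsplit_model, prior_split):
    """Bisection for a cost x with |P(w_1 | x) - alpha| <= eps.

    Assumes the posterior is monotonically increasing on [lo, hi].
    """
    mid = 0.5 * (lo + hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        p = posterior_split(mid, split_model, unsplit_model, prior_split)
        if abs(p - alpha) <= eps:
            break
        if p < alpha:
            lo = mid
        else:
            hi = mid
    return mid


# Assumed (mean, std) models of the cost for split and unsplit CUs.
SPLIT_MODEL = (600.0, 100.0)
UNSPLIT_MODEL = (300.0, 100.0)

threshold = find_threshold(0.95, 0.001, 0.0, 1000.0, SPLIT_MODEL, UNSPLIT_MODEL, 0.5)
```

Raising α moves the threshold toward higher costs, so fewer CUs are classified early and the result stays closer to the exhaustive search; lowering it trades some coding efficiency for speed.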
Meanwhile, even though all of the constituent elements constituting the aforementioned exemplary embodiments of the present invention are described as being combined into a single module or as operating in combination, the present invention is not limited thereto. That is, without departing from the spirit of the present invention, all of the constituent elements may be selectively combined into at least one module and thereby operate. Even though each of the constituent elements may be configured as single independent hardware, a portion of or all of the constituent elements may be selectively combined and thereby be configured as a computer program having a program module that performs a portion or all of the combined functions in a single piece or a plurality of pieces of hardware. The computer program may be stored in computer-readable media such as a universal serial bus (USB) memory, a compact disc (CD), a flash memory, and the like, and thereby be read and executed by a computer, thereby embodying the exemplary embodiments of the present invention. Storage media of the computer program may include magnetic recording media, optical storage media, carrier wave media, and the like.
As described above, the exemplary embodiments have been described and illustrated in the drawings and the specification. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and their practical application, to thereby enable others skilled in the art to make and utilize various exemplary embodiments of the present invention, as well as various alternatives and modifications thereof. As is evident from the foregoing description, certain aspects of the present invention are not limited by the particular details of the examples illustrated herein, and it is therefore contemplated that other modifications and applications, or equivalents thereof, will occur to those skilled in the art. Many changes, modifications, variations and other uses and applications of the present construction will, however, become apparent to those skilled in the art after considering the specification and the accompanying drawings. All such changes, modifications, variations and other uses and applications which do not depart from the spirit and scope of the invention are deemed to be covered by the invention which is limited only by the claims which follow.
Number | Date | Country | Kind |
---|---|---|---|
10-2012-0134287 | Nov 2012 | KR | national |