The presently disclosed subject matter relates generally to the field of compression of video information, and more specifically, to methods and systems for optimized video encoding.
The compression of video information (including, in particular, digital video information) comprises a well-known area of prior art endeavor. Generally speaking, video information compression results in a reduced set of data that consumes less memory when stored and that requires less bandwidth to transmit during a given period of time. Also, generally speaking, one goal of good compression methodologies is to achieve such benefits with minimal computational complexity while achieving a desired target size or bitrate of the compressed video stream or bitstream, and obtaining maximal perceptual quality when viewing the decompressed video sequence.
Modern video compression methodologies, such as Advanced Video Coding (AVC), also known as H.264; High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2; Google's VP9; and AOMedia Video 1 (AV1), can achieve relatively high compression ratios while providing good video quality. However, as known to those skilled in the art, video compression standards set forth only the specification of compliant bitstreams and the decoding methods. These standards do not address the encoder compression efficiency and performance. As a result, some implementing platforms may operate at a technical disadvantage, due, for example, to the power-consumption requirements and/or computational requirements that attend the encoder employing the compression techniques set forth in the corresponding methodology. In addition, existing prior art approaches generally require a particular bitrate level to achieve a particular level of perceived video quality, this bitrate being higher than is desired for many application settings.
In accordance with certain aspects of the presently disclosed subject matter, there is provided a computerized method of optimized video encoding, the method comprising: i) receiving a current video frame of an input video sequence to be encoded, the current video frame comprising a plurality of encoding blocks; ii) encoding the current video frame to generate a corresponding frame bitstream, comprising: for each encoding block of the plurality of encoding blocks: a) processing the encoding block using a filter, giving rise to a processed encoding block; b) computing a residual block as a difference between the encoding block and a corresponding predictor block; c) performing a frequency transform on the residual block to obtain a transformed residual block constituted by transform coefficients; d) performing an optimized quantization of the transform coefficients using a modified rate-distortion cost function, giving rise to quantized transform coefficients, wherein the modified rate-distortion cost function is obtained by configuring a reconstruction error in a rate-distortion cost function in accordance with a relation associated with the encoding block and the processed encoding block; and e) performing entropy encoding of the quantized transform coefficients to obtain a bit sequence corresponding to the encoding block; thereby giving rise to the frame bitstream comprising a plurality of bit sequences corresponding to the plurality of encoding blocks; and iii) placing the frame bitstream in an output video stream corresponding to the input video sequence, wherein upon decoding the output video stream, a reconstructed video frame corresponding to the current video frame has optimized perceived visual quality as compared to perceived visual quality of a reconstructed video frame of a corresponding frame bitstream which is generated without using the optimized quantization.
In addition to the above features, the method according to this aspect of the presently disclosed subject matter can comprise one or more of features (i) to (xi) listed below, in any desired combination or permutation which is technically possible:
In accordance with another aspect of the presently disclosed subject matter, there is provided a computerized system for optimized video encoding, the system comprising: an I/O interface configured to receive a current video frame of an input video sequence to be encoded, the current video frame comprising a plurality of encoding blocks; and a control circuitry operatively connected to the I/O interface, the control circuitry comprising a processor and a memory coupled thereto and configured to: i) encode the current video frame to generate a corresponding frame bitstream, comprising: for each encoding block of the plurality of encoding blocks: a) processing the encoding block using a filter, giving rise to a processed encoding block; b) computing a residual block as a difference between the encoding block and a corresponding predictor block; c) performing a frequency transform on the residual block to obtain a transformed residual block constituted by transform coefficients; d) performing an optimized quantization of the transform coefficients using a modified rate-distortion cost function, giving rise to quantized transform coefficients, wherein the modified rate-distortion cost function is obtained by configuring a reconstruction error in a rate-distortion cost function in accordance with a relation associated with the encoding block and the processed encoding block; and e) performing entropy encoding of the quantized transform coefficients to obtain a bit sequence corresponding to the encoding block; thereby giving rise to the frame bitstream comprising a plurality of bit sequences corresponding to the plurality of encoding blocks; and ii) place the frame bitstream in an output video stream corresponding to the input video sequence, wherein upon decoding the output video stream, a reconstructed video frame corresponding to the current video frame has optimized perceived visual quality as compared to perceived visual quality of a reconstructed video frame of a corresponding frame bitstream which is generated without using the optimized quantization.
This aspect of the disclosed subject matter can comprise one or more of features (i) to (ix) listed above with respect to the method, mutatis mutandis, in any desired combination or permutation which is technically possible.
In accordance with another aspect of the presently disclosed subject matter, there is provided a non-transitory computer readable storage medium tangibly embodying a program of instructions that, when executed by a computer, cause the computer to perform a method of optimized video encoding, the method comprising: i) receiving a current video frame of an input video sequence to be encoded, the current video frame comprising a plurality of encoding blocks; ii) encoding the current video frame to generate a corresponding frame bitstream, comprising: for each encoding block of the plurality of encoding blocks: a) processing the encoding block using a filter, giving rise to a processed encoding block; b) computing a residual block as a difference between the encoding block and a corresponding predictor block; c) performing a frequency transform on the residual block to obtain a transformed residual block constituted by transform coefficients; d) performing an optimized quantization of the transform coefficients using a modified rate-distortion cost function, giving rise to quantized transform coefficients, wherein the modified rate-distortion cost function is obtained by configuring a reconstruction error in a rate-distortion cost function in accordance with a relation associated with the encoding block and the processed encoding block; and e) performing entropy encoding of the quantized transform coefficients to obtain a bit sequence corresponding to the encoding block; thereby giving rise to the frame bitstream comprising a plurality of bit sequences corresponding to the plurality of encoding blocks; and iii) placing the frame bitstream in an output video stream corresponding to the input video sequence, wherein upon decoding the output video stream, a reconstructed video frame corresponding to the current video frame has optimized perceived visual quality as compared to perceived visual quality of a reconstructed video frame of a corresponding frame bitstream which is generated without using the optimized quantization.
This aspect of the disclosed subject matter can comprise one or more of features (i) to (ix) listed above with respect to the method, mutatis mutandis, in any desired combination or permutation which is technically possible.
In accordance with yet other aspects of the presently disclosed subject matter, there is provided a computerized method of optimized video encoding, the method comprising: sequentially receiving, by an I/O interface, a current video frame from a video sequence of input video frames to be encoded, the video frame comprising luma and chroma pixel planes; partitioning each pixel plane into encoding blocks, each encoding block comprising a rectangular block of pixel values from the pixel plane, and for each encoding block: selecting an initial predictor block from a previously encoded and reconstructed pixel plane associated with a previously processed video frame in the video sequence; computing an initial residual block as the difference between the encoding block and the initial predictor block; performing a frequency transform on the initial residual block giving rise to initial residual block transform coefficients; quantizing the initial residual block transform coefficients giving rise to initial quantized transform coefficients; estimating bit consumption of initial quantized transform coefficients; calculating the rate-distortion cost associated with the initial predictor block; performing inverse quantization and inverse transform giving rise to a reconstructed initial residual block; selecting an alternative predictor block, associated with a lowest approximate rate-distortion cost among all candidate alternative predictor blocks of the initial predictor block, and wherein the approximate rate-distortion cost is calculated using the reconstructed initial residual block; completing the block encoding process, giving rise to a bit sequence comprising the bitwise representation of the encoded block, and inserting the bit sequence into the output video stream.
In addition to the above features, the method according to this aspect of the presently disclosed subject matter can comprise one or more of features (a) to (g) listed below, in any desired combination or permutation which is technically possible:
In accordance with yet other aspects of the presently disclosed subject matter, there is provided a computerized system for optimized video encoding, the system comprising: an I/O interface configured to receive video information to be encoded, the video information comprising a sequence of video frames, and a control circuitry operatively connected to the I/O interface, the control circuitry comprising a processor and a memory coupled thereto and configured to encode a pixel plane of a video frame of said sequence, by passing the pixel plane to a partitioning module to obtain a set of encoding blocks, and then for each encoding block: activating the initial predictor selector to obtain an initial predictor block comprising prediction pixels and an initial Motion Vector (MV) indicating the relative coordinates of the initial predictor block; providing the initial predictor block to the transform and quantize module to obtain initial quantized transform coefficients; providing the initial quantized transform coefficients to the rate estimator to obtain an initial coefficient rate estimation; providing the inverse quantization and inverse transform module with the initial quantized transform coefficients to obtain the initial reconstructed residual block; providing the residual and predictor combiner with the reconstructed residual block and a predictor block to obtain a reconstructed block; providing the reconstructed block to the distortion(s) estimator to obtain at least one distortion estimation; providing the Motion Vector to the rate estimator to obtain the MV rate estimation; and calculating the modified rate-distortion cost according to the calculated rate and distortion values and providing the cost to the decider. Then, for each candidate alternative predictor corresponding to an alternative Motion Vector and an alternative predictor block, the system is configured to repeat the steps of providing the residual and predictor combiner, providing the reconstructed block to the distortion(s) estimator, providing the Motion Vector to the rate estimator, and calculating the modified rate-distortion cost. The system further comprises a decider module which performs selection of the Motion Vector corresponding to the optimal predictor candidate and residual to encode, and an entropy encoding module which performs encoding of block data giving rise to a bit sequence, which is inserted into the output video stream.
This aspect of the disclosed subject matter can comprise one or more of features (a) to (g) listed above with respect to the method, mutatis mutandis, in any desired combination or permutation which is technically possible.
In accordance with yet other aspects of the presently disclosed subject matter, there is provided a computerized method for optimized video encoding, which uses rate-distortion functions for decisions during encoding of an encoding block, and further comprising a modified rate-distortion cost calculator, wherein the modified rate-distortion calculation is obtained by computing a first complexity value associated with an encoding block, setting scaling factors according to this first complexity value, calculating the reconstructed block distortion values and adapting the rate-distortion functions used by the encoder by applying the scaling factors to the block distortion values.
In addition to the above features, the method according to this aspect of the presently disclosed subject matter can comprise one or more of features (1) to (6) listed below, in any desired combination or permutation which is technically possible:
In accordance with yet another aspect of the presently disclosed subject matter, there is provided a non-transitory computer readable storage medium tangibly embodying a program of instructions that, when executed by a computer, cause the computer to perform the method steps of any of the methods disclosed above.
The above needs are at least partially met through provision of the apparatus and method for optimized video encoding described in the following detailed description, particularly when studied in conjunction with the drawings.
In order to understand the presently disclosed subject matter and to see how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and/or relative positioning of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present teachings. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present teachings. Certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. The terms and expressions used herein have their ordinary technical meaning as are accorded to such terms and expressions by persons skilled in the technical field as set forth above, except where different specific meanings have otherwise been set forth herein.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the presently disclosed subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the presently disclosed subject matter.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “receiving”, “encoding”, “processing”, “calculating”, “computing”, “estimating”, “configuring”, “filtering”, “obtaining”, “generating”, “using”, “extracting”, “performing”, “placing”, “adding”, “partitioning”, “applying”, “comparing”, “sharpening”, “scaling”, “clipping”, “multiplying”, “repeating”, or the like, refer to the action(s) and/or process(es) of a computer that manipulate and/or transform data into other data, said data represented as physical, such as electronic, quantities and/or said data representing the physical objects. The term “computer” should be expansively construed to cover any kind of hardware-based electronic device with data processing capabilities including, by way of non-limiting example, the system/apparatus and parts thereof as well as the control circuit/circuitry therein disclosed in the present application.
The terms “non-transitory memory” and “non-transitory storage medium” used herein should be expansively construed to cover any volatile or non-volatile computer memory suitable to the presently disclosed subject matter.
It is appreciated that, unless specifically stated otherwise, certain features of the presently disclosed subject matter, which are described in the context of separate embodiments, can also be provided in combination in a single embodiment. Conversely, various features of the presently disclosed subject matter, which are described in the context of a single embodiment, can also be provided separately or in any suitable sub-combination. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the methods and apparatus.
Generally speaking, pursuant to these various embodiments, an apparatus has an input configured to receive video information and an output configured to provide an output video stream. The apparatus includes a control circuit operably coupled to the foregoing input and output and configured to perform optimized video encoding. It will be noted that some of the encoding operations described herein do not relate to the novel aspects of the invention, but are provided for the sake of completeness and clarity.
By one approach, the control circuit is configured to perform optimized quantization of the transform coefficients associated with the encoding block. In one of the steps of hybrid block-based encoders, transform coefficients are quantized prior to encoding into the bitstream. The control circuit described herein can be configured to perform the quantizing using quantization rate-distortion functions which utilize modified reconstruction errors, wherein the modification is according to a relation between the encoding block and a processed version of the encoding block.
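By way of non-limiting illustration only, the quantization approach described above can be sketched as follows. The sketch assumes that the relation between the encoding block and its processed version reduces to a single weight applied to the squared reconstruction error; the present description does not fix the form of that relation, so `relation_weight` (an energy ratio), the rate model `estimate_bits`, and the restriction to round-down and round-up candidate levels are all hypothetical choices made for the example.

```python
def estimate_bits(level):
    """Hypothetical rate model (an assumption): zero is cheapest, cost grows with magnitude."""
    return 1 if level == 0 else 2 * abs(level).bit_length() + 1


def relation_weight(block, processed_block):
    """Hypothetical relation (an assumption): energy of the processed block over the original."""
    e_orig = sum(p * p for p in block)
    e_proc = sum(p * p for p in processed_block)
    return e_proc / e_orig if e_orig else 1.0


def quantize_rdo(coeffs, step, lam, weight):
    """Per coefficient, pick the level minimizing lam * bits + weight * error**2."""
    levels = []
    for c in coeffs:
        base = int(abs(c) // step)
        best_mag, best_cost = base, None
        for mag in (base, base + 1):  # round-down and round-up candidates
            err = abs(c) - mag * step  # reconstruction error for this candidate
            cost = lam * estimate_bits(mag) + weight * err * err
            if best_cost is None or cost < best_cost:
                best_mag, best_cost = mag, cost
        levels.append(best_mag if c >= 0 else -best_mag)
    return levels
```

Under these assumptions, a coefficient of 5.0 with a quantization step of 8 and λ = 4.0 is quantized to level 1 when the weight is 1.0 and to level 0 when the weight is 0.25, which shows how modifying the reconstruction error term can change the quantization decision without changing the rate model.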
By another approach, in lieu of the foregoing or in combination therewith, the control circuit is configured to perform optimized INTER predictor selection, or Motion Vector (MV) refinement. In one of the steps of hybrid block-based encoders, when performing inter-frame prediction, the encoder seeks an optimal MV, corresponding to an optimal predictor, for the encoding block. After an initial predictor has been selected by the encoder, for example using coarse motion-estimation with a method known by those skilled in the art, the control circuit described herein can be configured to select an alternative predictor block associated with a lowest approximate rate-distortion cost among all candidate alternative predictor blocks of the initial predictor block, wherein the approximate rate-distortion cost is efficiently calculated using the reconstructed initial residual block.
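By way of non-limiting illustration only, the refinement described above can be sketched as follows. The sketch reads the approach as reusing the reconstructed initial residual for every candidate: each candidate predictor is combined with that same residual, and only the distortion and the MV rate are recomputed. Blocks are flattened to lists for brevity. The helper names, the scalar quantizer standing in for the transform, quantization, inverse quantization and inverse transform chain, the MV rate model, and the use of a sum of squared differences as the distortion are assumptions made for the example.

```python
def ssd(a, b):
    """Sum of squared differences, used here as the distortion measure (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reconstruct_residual(residual, step):
    """Stand-in (an assumption) for transform, quantization and their inverses."""
    return [step * round(r / step) for r in residual]


def mv_bits(mv):
    """Hypothetical MV rate model: cost grows with the magnitude of each component."""
    return sum(1 + 2 * abs(c).bit_length() for c in mv)


def refine_predictor(block, initial, candidates, coeff_bits, lam, step):
    """Return (mv, cost) with the lowest approximate RD cost.

    initial and each candidate are (mv, predictor) pairs; the reconstructed
    initial residual is computed once and reused for all candidates.
    """
    _, pred0 = initial
    recon_res = reconstruct_residual([b - p for b, p in zip(block, pred0)], step)

    def approx_cost(mv, pred):
        recon = [p + r for p, r in zip(pred, recon_res)]  # predictor plus reused residual
        return lam * (coeff_bits + mv_bits(mv)) + ssd(block, recon)

    best = min([initial] + list(candidates), key=lambda c: approx_cost(*c))
    return best[0], approx_cost(*best)
```

In this sketch the costly residual pipeline runs once per encoding block rather than once per candidate, so each additional candidate costs only one block addition, one distortion computation and one MV rate estimate.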
By yet another approach, in lieu of the foregoing or in combination therewith, the control circuit is configured to perform optimized encoding by using modified rate-distortion calculations. The control circuit described herein can be configured to compute a first complexity value associated with the encoding block, and set scaling factors according to the first complexity value, and upon calculating reconstructed block distortion values, adapting the rate-distortion functions by applying the scaling factors to the block distortion values.
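By way of non-limiting illustration only, the modified rate-distortion calculation described above can be sketched as follows. The choice of pixel variance as the complexity value, the thresholds, the specific scaling factors, and the direction of the adjustment (weighting distortion more heavily in smooth blocks) are assumptions made for the example and are not taken from the present description.

```python
def block_complexity(block):
    """Hypothetical complexity value (an assumption): pixel variance of the block."""
    mean = sum(block) / len(block)
    return sum((p - mean) ** 2 for p in block) / len(block)


def scaling_factor(complexity, low=25.0, high=400.0):
    """Hypothetical mapping (an assumption) from complexity to a distortion scaling factor."""
    if complexity < low:
        return 1.5   # smooth block: distortion weighted more heavily
    if complexity > high:
        return 0.75  # busy block: distortion weighted less heavily
    return 1.0


def modified_rd_cost(bits, distortion, lam, scale):
    """Lagrangian cost with the scaling factor applied to the block distortion."""
    return lam * bits + scale * distortion
```

For example, under these assumptions a block of pixel values [100, 101, 100, 101] has a complexity of 0.25 and receives a scaling factor of 1.5, so a distortion of 100.0 at 10 bits and λ = 2.0 yields a modified cost of 170.0 rather than 120.0.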
Using one or more of the aforementioned techniques, video information can be processed in a way that can greatly reduce the computational and/or bitrate requirements of the resulting compressed video bitstream. In particular, many prior art compression methodologies, including the recent HEVC standard, can be carried out in a considerably more efficient and less computationally-intensive manner. As a result, use of these teachings can reduce power requirements and/or can reduce the computational overhead requirements of the implementing encoder hardware while also possibly reducing the necessary bitrate. More importantly, these teachings permit a lower bitrate to be utilized than previous approaches while maintaining at least a similar level of perceptible quality and can also achieve a higher level of perceptible quality at a given bitrate than existing approaches. These and other benefits may become clearer upon thorough review and study of the following detailed description.
Referring now to the drawings,
There is presented an enabling computer-based apparatus/system 100 configured to perform optimized video encoding.
System 100 can comprise a control circuitry (also termed herein as control circuit, not shown separately) operating jointly with a hardware-based I/O interface 110 and a storage module or buffer 112. The system 100 may obtain, e.g., via I/O interface 110, video information to be encoded, the video information comprising a sequence of video frames (also termed herein as frames or input frames). In some embodiments, the input video information or the video frames thereof can be received from a user, a third-party provider or any other system that is communicatively connected with system 100. Alternatively, or additionally, the input video information or the video frames thereof can be pre-stored in the storage module or buffer 112.
The control circuitry is a processing circuitry configured to provide all processing necessary for the required blocks, which are further detailed below. The control circuitry refers to hardware (e.g., an electronic circuit) within a computer that executes a program. The control circuitry can comprise a processor (not shown separately) and a memory (not shown separately). The processor of system 100 can be configured to execute several functional modules in accordance with computer-readable instructions implemented on a non-transitory computer-readable memory comprised in the control circuitry. Such functional modules (such as, e.g., the video encoding module 118, or any modules included therein) are referred to hereinafter as comprised in the control circuitry.
According to certain embodiments, a “circuit” or “circuitry” can include a structure that includes at least one (and typically many) electrically-conductive paths (such as, e.g., paths comprised of a conductive metal such as copper or silver) that convey electricity in an ordered manner, whose path(s) will also typically include corresponding electrical components (both passive, such as, e.g., resistors and capacitors, and active, such as, e.g., any of a variety of semiconductor-based devices, as appropriate) to permit the circuit to effect the control aspect of these teachings.
Such a system 100 can comprise a fixed-purpose hard-wired hardware platform (including but not limited to, e.g., an application-specific integrated circuit (ASIC) which is an integrated circuit that is customized by design for a particular use, rather than intended for general-purpose use, a field-programmable gate array (FPGA), and the like) or can comprise a partially or wholly-programmable hardware platform (including but not limited to, e.g., microcontrollers, microprocessors, and the like). If desired, the system 100 can comprise an integral part of a dedicated video encoder integrated circuit which can implement the functionalities of the video encoding module 118, as will be described below. The system 100 can be configured (for example, by using corresponding programming as will be well understood by those skilled in the art) to carry out one or more of the steps, actions, and/or functions described herein.
As aforementioned, the system 100 can comprise a processor and a memory. By one approach the system 100 can be operably coupled to the memory. This memory may be integral to the control circuitry or can be physically discrete (in whole or in part) from the control circuitry as desired. This memory can also be local with respect to the control circuitry (where, for example, both share a common circuit board, chassis, power supply, and/or housing) or can be partially or wholly remote with respect to the control circuitry (where, for example, the memory is physically located in another facility, metropolitan area, or even country, as compared to the control circuitry).
In addition to other useful information described herein, this memory can serve, for example, to non-transitorily store the computer instructions that, when executed by the control circuitry, cause the control circuitry to behave as described herein. As used herein, the reference to “non-transitorily” will be understood to refer to a non-ephemeral state for the stored contents (and hence excludes when the stored contents merely constitute signals or waves) rather than volatility of the storage media itself and hence includes both non-volatile memory (such as, e.g., read-only memory (ROM) or erasable programmable read-only memory (EPROM)) as well as volatile memory (such as, e.g., random-access memory (RAM)).
As aforementioned, the I/O interface 110 (also referred to herein separately as input interface and output interface or input and output) is operably coupled to the system 100 and is configured to receive video information to be encoded, the video information comprising a sequence of video frames, as well as to output a compressed or encoded video stream or bitstream.
The teachings herein will accommodate receiving video information in any of a wide variety of formats. In a typical application setting, the video information can constitute digital content. By one approach, if desired, the original video content can have an analog format and can then be converted to a digital format to constitute the video information.
As noted above, the received video information is to be encoded, i.e., compressed. By one approach the video information refers to any original video content that has not been compressed in any way, aside from some optional inherent compression that might occur during the digitization process, such as, e.g., an original raw video clip or part thereof. Such a video clip can comprise a plurality of original video frames, and can be obtained from, e.g., a digital camera or recorder, or any other suitable devices that are capable of capturing or recording individual still images or sequences of images constituting videos or movies. By another approach, the video information may already have undergone some compression but, if so, is nevertheless to be compressed again via the video encoding module 118 (also referred to as video frame encoder). In such cases, a video bit-stream (also referred to as video bitstream or video stream) that contains encoded data can be first decoded or reconstructed to a decoded video sequence prior to being further processed using the present disclosure. The input video information can comprise the decoded or reconstructed video sequence which was decoded from the encoded video bit-stream. In this case, the compression refers to recompression of the video information. Without limiting the scope of the disclosure in any way, it should be noted that the term “frame” used in the specification should be expansively construed to include a single video picture, frame, image, field, or slice of the input video sequence.
The terms Rate-Distortion, RD, rate-distortion, and Rate-Distortion cost, RD cost, RDcost and RdCost may be interchangeably used herein. As known to those skilled in the art of video compression, rate-distortion cost uses a cost function which combines estimated rate or bits required when encoding certain data in a specific manner, with a measure of distortion which this specific manner of encoding will introduce to the corresponding reconstructed data. Generally speaking, encoders aim to minimize the RD cost to obtain the best possible quality at the lowest possible rate. Different forms of RD cost functions will result in different encoding decisions, and hence different bitrate of the compressed video stream and/or different quality of the reconstructed video obtained when decoding said video stream.
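By way of non-limiting illustration only, the role of the RD cost in encoding decisions can be sketched as follows, assuming a Lagrangian cost of the form λB + D, where B is an estimated bit count and D is a distortion value. The option names and the numeric values are hypothetical and serve only to show that changing the cost function (here, the multiplier λ) changes the decision.

```python
def rd_cost(bits, distortion, lam):
    """Lagrangian rate-distortion cost in the form lam * B + D."""
    return lam * bits + distortion


def pick_best(options, lam):
    """Return the (name, bits, distortion) option with the lowest RD cost."""
    return min(options, key=lambda o: rd_cost(o[1], o[2], lam))


# Two hypothetical ways of encoding the same data.
options = [("coarse", 40, 900.0), ("fine", 120, 250.0)]
```

With these values, a small multiplier (λ = 1.0) favors the "fine" option, whose cost is 370.0 against 940.0, whereas a large multiplier (λ = 10.0) favors the "coarse" option, whose cost is 1300.0 against 1450.0.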
According to certain embodiments the system 100 may receive as input a sequence of input video frames, or a single video frame from such a sequence 102. Each video frame may correspond to one or more pixel planes. For example, each frame may consist of one luminance or luma plane and two chrominance or chroma planes. The system 100 may further provide as output an output video stream 104, or bitstream, containing the bits corresponding to the compressed input frame or frames, i.e. a bitstream, which when fed into a corresponding decoder, will result in reconstructed video frames which are similar to the input video frames.
According to certain embodiments, functional modules comprised in the processor of the system 100 can comprise a frame level rate control module 116, a bitstream control module 117, and a video frame encoder 118 which are operatively connected with each other. The frame level rate control module 116 is used, as known by those skilled in the art, to configure the video frame encoder, and set parameters such as frame type and frame bit allocation or quantization parameters. The bitstream control module 117 manages the bitstream creation and receives, as input, data and bit sequences from the frame level rate control module 116 and the video frame encoder 118. The video frame encoder 118 can be configured to perform optimized video encoding in various ways as described herein. The block partitioning module 120 splits the frame into coding blocks such as Macro-Blocks or Coding Units. Note that in most coding standards and techniques further sub-partitioning of these coding blocks is also supported. In this description the terms “coding block” and “encoding block” are used to describe either the entire coding block or sub-blocks thereof, and no distinction is drawn between them for the purpose of the present teachings. For each coding block, all or some of the modules described in the block coding module 119 are invoked. These include, but are by no means limited to, the optimized rate distortion cost calculator 122 described with reference to
Those skilled in the art will be familiar with a wide variety of video encoders and compression techniques employing frame level rate control, block partitioning, initial predictor selection, residual calculation, frequency transform calculation, entropy encoding and decoding or reconstruction. As the present teachings are not especially sensitive to any particular choices in this regard, no further elaboration is provided here.
The storage module or buffer 112 comprises a non-transitory computer readable storage medium. For instance, the storage module can include a buffer that holds the input video information as well as an output video sequence. In another example, the buffer may also hold one or more of the intermediate results including but not limited to previously encoded and reconstructed blocks, pixel planes or video frames. According to certain embodiments, the storage module or buffer 112 can also comprise computer-readable instructions embodied therein to be executed by the system 100 for implementing the process of optimized video encoding as described below with reference to
Those versed in the art will readily appreciate that the teachings of the presently disclosed subject matter are not bound by the system illustrated in
The system in
It is also noted that the system illustrated in
While not necessarily so, the process of operation of system 100 can correspond to some or all of the stages of the methods described with respect to
Turning now to
Turning now to
Turning now to
One possible embodiment of the optimized predictor refinement module 124 will now be provided. First, an initial Rate-Distortion cost associated with the initial predictor is calculated. This is performed using three modules: the rate estimator 345, which for some x provides an estimated bit consumption resulting from entropy encoding of x, which may be denoted as B(x); the distortion(s) calculator 350, which for two inputs y1, y2 provides an estimate of the distortion between these two inputs, which may be denoted as D(y1, y2); and the modified Rate-Distortion calculator 360, which, given a bit estimation B and one or more distortion values D, calculates an RD cost, for example using a Lagrange multiplier format such as RDcost=λB+D. For calculating the initial RD cost, the rate estimator is applied to the initial predictor residual coefficients Cq(MVo) and the initial MV, while the distortion calculator is applied to the encoding block and the initial predictor reconstructed block. This yields an RDcost value associated with the initial predictor. The goal of the optimized predictor refinement module is to find an alternative predictor which yields a lower RDcost. The often-adopted approach to address this is to evaluate candidate alternative predictors by repeating, for each candidate, the full process of obtaining the predictor, calculating the residual, applying the frequency transform followed by quantization, inverse quantization and inverse transform, thus obtaining a candidate reconstructed residual, and using this data to calculate the rate-distortion cost of the proposed candidate predictor. The disadvantage of this approach is primarily its prohibitive computational cost, leading either to slow or computationally demanding encoding, or to poorer compression efficiency if the refinement process of choosing an improved predictor is precluded or constrained. In the approach proposed herein, this may be overcome by performing the proposed optimized refinement process.
The alternative predictor selector provides selected candidates of the alternative prediction block. These candidates depend on the configuration of the selector but may, for example, in one embodiment, consist of all predictors associated with a candidate motion vector MVj which are close to the initial motion vector MVo, for example MVj=MVo+delta_j wherein delta_j may indicate positive or negative increase in either the horizontal or vertical motion vector components, or in both. For each such alternative predictor, the following steps are performed. The candidate prediction is added to the initial predictor reconstructed residual 308 to obtain an estimated candidate reconstructed block. Then the distortion calculator(s) is/are applied to calculate Dj, a distortion between the encoding block and the estimated candidate reconstructed block. The rate estimator 345 is applied to MVj to obtain an estimation of the bits required for encoding this motion vector. Then the modified rate-distortion cost calculator 360 calculates the estimated RDcost_j based on the estimated rate and distortion values. The decider 370 determines which of the candidate alternate predictors is optimal and controls the output of the optimized predictor refinement module. This may be decided for example by selecting the candidate with the lowest RDcost_j value, however, other criteria or logic may also be used by the decider for this purpose. In some embodiments of the presently disclosed subject matter the encoding may proceed always using the initial predictor residual coefficients 312 combined with the selected refined motion vector. In yet other embodiments, the decider, when selecting the refined motion vector to be used, may decide, according to some internal logic, to use the coefficients corresponding to the selected motion vector. 
This internal logic may for example be based on the absolute difference between the initial motion vector and the selected motion vector, where the coefficients corresponding to the selected motion vector will be calculated when this difference exceeds a threshold. However, other criteria or logic may also be used by the decider for this purpose. In order to provide the coefficients corresponding to the selected motion vector, the residual between the selected predictor and encoding block is calculated by the residual calculator 325. This residual then undergoes a frequency transform and quantization by the transform and quantize module 330 and the resulting alternative residual coefficients 307 are provided as output for further encoding steps. In yet another embodiment the residual calculation and transform and quantization may be performed by corresponding blocks 126, 128 and 130 of block coding module 119.
Turning now to
Turning now to
resulting in the reconstructed block of transform coefficients CR. The integer numbers M(j) and D used in a given decoder are pre-defined according to the quantization level, and the integer values division is done by rounding to the nearest integer. The simplest method of quantization is a calculation of Cq(j) based on the above formula, as the inverse of the coefficients reconstruction. Such methods are called fixed dead-zone quantization and are well known in the prior art. For a specific set of quantized coefficients, the reconstruction error or distortion may be formulated as:
wherein Ej(Cq(j), C(j)) denotes a reconstruction error corresponding to a single quantized coefficient Cq(j), relative to a corresponding initial coefficient C(j). In some embodiments, the reconstruction error can be indicative of a difference related to the transform coefficients and corresponding de-quantized transform coefficients. In more detail, the error is related to the difference between the original pixel values of the residual block corresponding to C, and the pixel values obtained when performing reconstruction of the pixels by inverse quantization and inverse transform applied to Cq. Based on this, the problem of optimal quantization in the encoder can be formulated: obtain the quantized coefficients Cq which meet the contradictory requirements of minimizing both the number of compressed bits B(Cq) and the block reconstruction error E(Cq, C).
In a Rate-Distortion optimized quantization method the quantized coefficients Cq are calculated as a minimization problem solving: Cq=argmin(E(Cq, C)+λB(Cq)), wherein the Lagrange multiplier λ is pre-defined for a given block and quantization level. In practice, trellis quantization algorithms are usually used for an efficient numeric solution of this minimization problem.
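By way of a non-limiting illustration, the minimization above can be sketched as an independent per-coefficient search. This is a deliberate simplification of the trellis schemes mentioned above and is not the method of any particular encoder. The uniform step size `step`, the rate model `estimate_bits` and the three-candidate search are assumptions introduced for illustration only.

```python
import math


def estimate_bits(level: int) -> float:
    """Hypothetical rate model B(.): zero is cheap, larger magnitudes cost more."""
    if level == 0:
        return 1.0
    return 2.0 * math.log2(abs(level) + 1) + 2.0


def rd_quantize(coeffs, step, lam):
    """Per coefficient, pick the level minimizing (C - level*step)^2 + lam*bits.

    Assumes a uniform reconstruction CR = level * step (an illustrative
    stand-in for the standard-specific de-quantization).
    """
    quantized = []
    for c in coeffs:
        sign = 1 if c >= 0 else -1
        base = int(abs(c) / step)  # magnitude truncated toward zero
        # Candidates: zero, the truncated level, and the next level up.
        candidates = sorted({0, sign * base, sign * (base + 1)}, key=abs)
        best = min(
            candidates,
            key=lambda q: (c - q * step) ** 2 + lam * estimate_bits(q),
        )
        quantized.append(best)
    return quantized


# With lam = 0 only distortion matters; a large lam pushes levels toward zero.
low_lambda = rd_quantize([9.0, -3.0, 0.4], step=4.0, lam=0.0)
high_lambda = rd_quantize([9.0, -3.0, 0.4], step=4.0, lam=100.0)
```

In this sketch the Lagrange multiplier alone controls how readily a coefficient is reduced or zeroed, which is the behavior that tends to smooth the reconstructed image.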
Although the quantization methods that are based on Rate-Distortion provide the encoding close to optimal in the Rate-Distortion sense, their usage typically leads to smoother reconstructed images compared to those provided by simple fixed dead-zone quantization. This in turn results in lower perceived visual quality of the reconstructed video. The present disclosure proposes an optimized quantization of the transformed residual block of transform coefficients using modified reconstruction errors in Rate-Distortion cost functions which utilize a function related to the encoding block and the processed encoding block. The optimized quantization as illustrated in step 524 and described below is targeted at maintaining the advantages of Rate-Distortion based quantization while simultaneously improving the perceived visual quality of the reconstructed video. Specifically, upon decoding the output video stream, a reconstructed video frame corresponding to the current video frame has optimized perceived visual quality as compared to perceived visual quality of a reconstructed video frame of a corresponding frame bitstream which is generated without using the optimized quantization.
According to some embodiments, the encoding block can be processed by applying a filter in the pixel domain to the encoding block. A frequency transform can be performed on the encoding block to obtain a transformed encoding block, and can be performed on the processed encoding block to obtain a transformed processed encoding block, as illustrated in
According to some other embodiments, a frequency transform can be first performed on the encoding block to obtain a transformed encoding block. A filter in the transform domain can be applied to the transformed encoding block to obtain the transformed processed encoding block, as illustrated in
For purpose of illustration and exemplification, the initial encoding block pixels are denoted as Porig. The encoding block is processed in step 521, for example, by applying a filter (e.g., a texture sharpening filter) on the initial block, and the pixels of the resulting processed encoding block are denoted as Psharp. Denote the transform coefficients of a frequency transform of the blocks Porig and Psharp as Torig and Tsharp, respectively. Alternatively, in another embodiment, Torig is obtained in the same manner while Tsharp is obtained by applying a transform domain filter to Torig. The relation associated with the encoding block and processed encoding block, which is used to configure the reconstruction error, can be, in some cases, a relation between corresponding pixel values in the two blocks Porig and Psharp. In some other cases, the relation can be between corresponding transform coefficients of the transformed encoding block Torig and the transformed processed encoding block Tsharp. For instance, consider the block Tscale whose elements can be calculated for example as Tscale(j)=Tsharp(j)−Torig(j), or in yet another example as Tscale(j)=Tsharp(j)/Torig(j), or via any other suitable relation. Then, in one embodiment of the presently disclosed subject matter, it is proposed to calculate the quantized transform coefficients for the fixed quantization level as:
wherein functions Xj and Yj are parameters of the modification method. In some embodiments, the reconstruction error Ej can be configured by scaling the reconstruction error by a scaling factor (Xj) which is dependent on the relation. In some embodiments, the reconstruction error Ej can be configured by adding to the reconstruction error a difference value (Yj). For instance, a difference between transform coefficients in the transformed encoding block and the corresponding transformed processed encoding block can be calculated, and the difference can be clipped in accordance with a quantizer step size to obtain the difference value. Note that adding the function Yj(Tsharp(j), Torig(j)) to the initial coefficients C(j) as done here is equivalent to adding it to the difference C(j)−CR(j) in the square error calculation. Examples of functions Xj and Yj include but are not limited to setting Xj to 1, and Yj(x, y)=clip(2(x−y), −D/(2M(j)), D/(2M(j))), wherein clip(x, A, B) means clipping the value x in range [A; B]. In another example Xj is set to be some monotonically non-decreasing function and Yj is set to zero. The solution to this minimization problem can then be found using the same methods employed in conventional Rate-Distortion optimizing of quantization such as the Trellis scheme, the RDOQ approach implemented in the HM16 test model, etc.
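The following minimal sketch illustrates how the example functions Xj and Yj might enter a per-coefficient error term. It assumes a squared-error form in which Yj is added to the difference C(j)−CR(j), as noted above. The function names and the argument values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def clip(x, lo, hi):
    """Clip x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def y_offset(t_sharp, t_orig, d, m):
    """Example Yj(x, y) = clip(2(x - y), -D/(2M(j)), D/(2M(j)))."""
    bound = d / (2.0 * m)
    return clip(2.0 * (t_sharp - t_orig), -bound, bound)


def modified_error(c, c_rec, t_sharp, t_orig, d, m, x_scale=1.0):
    """Assumed form Xj * (C(j) + Yj - CR(j))^2, with Xj = 1 by default."""
    y = y_offset(t_sharp, t_orig, d, m)
    return x_scale * (c + y - c_rec) ** 2


# Sharpening raised this coefficient (10 vs 4), so the target is nudged upward
# by the clipped offset before the error against the reconstruction is taken.
offset = y_offset(t_sharp=10.0, t_orig=4.0, d=16, m=2)
err = modified_error(c=20.0, c_rec=16.0, t_sharp=10.0, t_orig=4.0, d=16, m=2)
```

Because the offset is clipped, the processed block can shift the quantization decision by only a bounded amount, regardless of how strongly the filter changed the coefficient.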
As aforementioned, in some embodiments the processing of the encoding block may be implemented using a sharpening filter which is usable for enhancing one or more features in the encoding block including, e.g., edges, etc. By way of a non-limiting example the sharpening filter may be implemented as:
wherein N8(j) is the index set of the 8 pixels spatially neighboring Porig(j).
In yet other embodiments the processing of the encoding block may be applied multiplicatively in the transform coefficients domain as Tsharp(j)=α(j)·Torig(j). In this case it is possible to set Tscale(j)=α(j). Note that if the transform used by the video coding algorithm keeps the convolution-multiplication property and a linear FIR filter is used for processing in the pixel domain, then there exists an equivalent multiplication filter in the transform domain which may be used instead.
It is to be noted that the optimized encoding as described above with reference to
Turning now to
An initial prediction pixels block Ppred is obtained for the current pixel block P, as depicted in step 610.
In step 522 pixels of an initial prediction residual block are calculated as Rinit(j)=Porig(j)−Ppred(j), wherein (j) indexing denotes individual pixels of the blocks.
A frequency transform, such as an integer approximation of the Discrete Cosine Transform, is applied to Rinit resulting in the transform coefficients block C, as depicted in step 523.
Transform coefficients block C is quantized resulting in the quantized coefficients block Cq, as depicted in step 630.
In step 640 the block of quantized coefficients is either compressed using full entropy encoding, resulting in a bit sequence consisting of B(Cq) bits, or alternatively B(Cq) is set to an estimate of the bits required to encode the block of quantized coefficients, forgoing the entropy encoding at this stage in order to reduce computational costs. The number of bits corresponding to the header data of the block, denoted as Bheader and corresponding for example to the prediction mode or Motion Vector used, is similarly either calculated or estimated.
The quantized coefficients block Cq is inverse quantized resulting in the block of the reconstructed coefficients Crec, and the reconstructed coefficients block Crec is inverse transformed resulting in the reconstructed prediction residual block Rrec, as depicted in step 650.
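The steps above can be sketched end to end for a single 4×4 block as follows. This is an illustrative approximation only: a floating-point orthonormal DCT stands in for the integer transform, a single uniform step size `step` stands in for the standard-specific quantization, and the entropy coding of step 640 is omitted. All function names are hypothetical.

```python
import math

N = 4


def dct_matrix(n=N):
    """Orthonormal DCT-II basis, a stand-in for the integer transform."""
    return [
        [
            (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


T = dct_matrix()


def forward(block):
    """Separable 2-D transform: T * block * T^t."""
    return matmul(matmul(T, block), transpose(T))


def inverse(coeffs):
    """Inverse 2-D transform: T^t * coeffs * T."""
    return matmul(matmul(transpose(T), coeffs), T)


def encode_block(p_orig, p_pred, step):
    # Step 522: initial prediction residual Rinit = Porig - Ppred.
    r_init = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(p_orig, p_pred)]
    # Step 523: frequency transform of the residual.
    c = forward(r_init)
    # Step 630: plain rounding quantizer (assumed uniform step size).
    c_q = [[round(v / step) for v in row] for row in c]
    # Step 650: inverse quantization, then inverse transform.
    c_rec = [[q * step for q in row] for row in c_q]
    r_rec = inverse(c_rec)
    # Reconstructed block: prediction plus reconstructed residual.
    p_rec = [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(p_pred, r_rec)]
    return c_q, r_rec, p_rec


p_orig = [[110] * N for _ in range(N)]
p_pred = [[100] * N for _ in range(N)]
c_q, r_rec, p_rec = encode_block(p_orig, p_pred, step=8.0)
```

For the flat residual in this usage example only the DC coefficient is non-zero, so the block is reconstructed exactly up to floating-point error.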
Further details on this system, previously described in regard to
Thus, for example, when SSD is selected as the distortion metric to be used, Rate-Distortion optimized encoding will select encoding parameters for each inter block in order to minimize the block cost function of the form:
RDcost=λ×(B(Cq)+Bheader)+SSD(Porig, Prec) wherein λ is a pre-defined Lagrange multiplier and SSD(Porig, Prec) is a sum of square differences between the pixel blocks Porig and Prec. This cost function calculation is computationally very costly since it requires performing all of the steps presented above: calculating a residual, transforming, quantizing, entropy coding for bit estimation, inverse quantizing, inverse transforming and calculating the reconstructed block Prec. Note that for a given reference frame and a given encoding block, the blocks Ppred, Prec, Rrec and Cq, the value of Bheader and the cost function RDcost are fully defined by the motion vector MV used for inter-prediction. This is why they may be referred to as Ppred(MV), Prec(MV), etc.
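As a brief illustration, the cost function above can be written directly in code. The bit counts and pixel values in the usage lines are made-up numbers, and the function names are hypothetical.

```python
def ssd(a, b):
    """Sum of square differences between two equally sized pixel blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rd_cost(lam, bits_coeffs, bits_header, p_orig, p_rec):
    """RDcost = lam * (B(Cq) + Bheader) + SSD(Porig, Prec)."""
    return lam * (bits_coeffs + bits_header) + ssd(p_orig, p_rec)


p_orig = [[1, 2], [3, 4]]
p_rec = [[1, 1], [1, 1]]
cost = rd_cost(2.0, bits_coeffs=10, bits_header=3, p_orig=p_orig, p_rec=p_rec)
```

The expensive part is not this arithmetic but producing `bits_coeffs` and `p_rec`, each of which requires the full transform, quantization and reconstruction chain.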
For the purpose of cost function minimization, the motion estimation is sometimes done in two stages. The first stage includes selecting a primary motion estimation resulting in the motion vector MVo. At the second stage, additional Rate-Distortion optimizing refinement of the motion vector MVo is performed, with the goal of seeking an optimal MV from a set of candidates “similar” to MVo for example in the spatial vicinity of MVo. The optimal motion vector is the one corresponding to the lowest value of RDcost, and it will be used to complete the block encoding as depicted in steps 670 and 680. Note this is particularly prevalent when reusing motion information obtained using corresponding lower resolution content, which for example may be performed in a pre-process step, or in cascade or hierarchical video encoding, or in multi-stream encoding when multiple resolutions are encoded simultaneously. It is also often used in the context of sub-pixel MV refinement, where the initial search seeks the best full-pel or integer pixel MV and the refinement stage finds the sub-pel offset providing the best prediction result.
This additional Rate-Distortion optimizing motion vector refinement at the second stage increases the encoding quality in the Rate-Distortion sense but leads to a dramatic slow-down of the encoding, or an increase in computational requirements, and thus cannot be used when encoding speed or CPU utilization is a primary concern.
The proposed optimized method for selecting an alternative predictor block, depicted in step 660, offers a solution which is both significantly faster than the simple one described above, and at the same time yields predictors which are almost as optimal, thus significantly improving coding efficiency compared to the case of using the initial MV, without further refinement, to encode the block.
In accordance with certain embodiments of the optimized motion vector refinement presently disclosed, it is proposed to use the prediction residual corresponding to the initial motion vector when evaluating the corresponding RDcost for all alternative MV candidates, thus avoiding the computationally costly steps 522, 523, 630, 640 and 650 for each candidate MV. In some embodiments, the initial residual may be used for encoding the bitstream. In other embodiments, steps 522, 523, 630, 640 and 650 may be repeated for the selected MV only, still obtaining a significant performance improvement as they are not performed for each MV candidate. In yet other embodiments, the optimized refinement may be applied as an iterative process, whereby, after selection of the optimized predictor, this MV is considered the ‘new’ MVo and the refinement process is repeated.
Some further details are now provided describing an example embodiment of step 660, as depicted in
RdCost(MV0)=λ·(B(Cq(MV0))+Bheader(MV0))+SSD(Porig,Prec(MV0))
In Step 662 a candidate motion vector is selected, and the full set of candidates selected can be defined as MVj, j∈
RdCost(MVj)=λ·(B(Cq(MV0))+Bheader(MVj))+SSD(Porig,Ppred(MVj)+Rrec(MV0))
This process is repeated for each candidate MV until step 666 determines that all candidates have been evaluated. Then, in 667 the optimized motion vector MVopt is selected, as the MVj corresponding to the minimal value of RdCost(MVj), j∈
The functions used above in RDcost calculation are provided as a non-limiting example only. The proposed optimized refinement may be used in conjunction with any other distortion functions.
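The refinement described above can be sketched as a loop in which the coefficient bits and the reconstructed residual of the initial motion vector are computed once and reused for every candidate. The callbacks `predict` and `header_bits`, the dictionary of predictor blocks and all numeric values are hypothetical placeholders for the encoder's own predictor fetch and rate estimation.

```python
def ssd(a, b):
    """Sum of square differences between two equally sized pixel blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def add_blocks(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def refine_mv(p_orig, r_rec0, bits_cq0, mv0, candidates, predict, header_bits, lam):
    """Return (MVopt, cost) using the estimated RdCost(MVj).

    r_rec0 and bits_cq0 belong to the initial vector MV0 and are reused for
    every candidate, so no transform or quantization runs inside the loop.
    """
    best_mv, best_cost = None, float("inf")
    for mv in [mv0] + list(candidates):
        # Estimated reconstruction: Ppred(MVj) + Rrec(MV0).
        est_rec = add_blocks(predict(mv), r_rec0)
        cost = lam * (bits_cq0 + header_bits(mv)) + ssd(p_orig, est_rec)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost


# Toy data: two predictor blocks keyed by motion vector (illustrative only).
predictors = {
    (0, 0): [[8, 18], [28, 38]],
    (1, 0): [[9, 19], [29, 39]],
}
p_orig = [[10, 20], [30, 40]]
r_rec0 = [[1, 1], [1, 1]]

mv_opt, cost_opt = refine_mv(
    p_orig, r_rec0, bits_cq0=5, mv0=(0, 0), candidates=[(1, 0)],
    predict=lambda mv: predictors[mv],
    header_bits=lambda mv: abs(mv[0]) + abs(mv[1]) + 1,
    lam=1.0,
)
```

For the initial vector itself the estimated reconstruction coincides with Prec(MV0), so the same expression covers both the initial cost and the candidate costs.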
Turning now to
For this figure it is assumed that an encoding block is obtained, as described with reference to previous figures. Optionally, processing of the encoding block, as described regarding
Step 664 illustrates an example of mode selection in accordance with certain embodiments of the presently disclosed subject matter. First, the complexity of the encoding block is calculated in step 710. This complexity may be calculated based on the pixels themselves or some transform thereof. The calculated Block Complexity may be denoted as BC. This block complexity may for example be a measure of the texture, or texture strength, variation in the block.
In one non-limiting example, BC may be calculated as follows: Denote Bj8×8, j∈
Next, in step 715, the calculated complexity value BC is used to obtain corresponding one or more scale Factor values, denoted below as F. This may be done via a calculation, a Look-Up-Table or using any mapping function. For example, and without limitation, a single scale Factor value F(BC) may be used, which will be used to scale a distortion metric in the RD cost calculation, according to block complexity. In a further example two scale Factor values may be used, each to scale a different distortion metric in the RD cost calculation. In yet another example, the possible range for BC values may be divided into N intervals, interval_1 corresponding to very low complexity values, and interval_N corresponding to very high complexity levels, and different scale Factor values or scale Factor functions may be used for the different intervals. Further by way of example, there may be four Scale-Factor values: A pair of scaling factors, one for high and one for low levels of BC, used to scale a texture complexity difference based distortion metric: Fcmpllow(BC) and Fcmplhigh(BC), and another pair of scaling factors, again one for high and one for low levels of BC, used to scale a Sum of Square Differences based distortion metric: Fssdlow(BC) and Fssdhigh(BC). These scale factors will be used in calculating the modified RD cost as detailed below. In a non-limiting example of a possible calculation of these scaling factors according to BC, the possible range for BC values is divided into 6 intervals, where interval_1 corresponds to BC values from 0 to interval_2_s (non-inclusive), interval_2 corresponds to BC values from interval_2_s to interval_3_s (non-inclusive) etc. Then, the scaling Factor, can be calculated for the low complexity intervals 1-3 as a monotonically non-decreasing function of the form:
And the scaling Factor for the high complexity intervals 4-6, can be calculated as a monotonically non-increasing function of the form:
whereby C1, C2, C3, C4 and R1, R2, R4, R5 are constant and ratio values selected such that the value corresponding to the highest BC value of interval_i equals the value corresponding to the lowest BC value of interval_i+1. These function forms and the division into 6 intervals are provided by way of a non-limiting example only. In yet another example, the scaling factors may be calculated using a monotonically non-increasing function for low complexity intervals, and a monotonically non-decreasing function for the high complexity intervals.
Steps 720, 725 and 730 are then repeated for each candidate encoding mode, with the goal of finding the optimal mode to use when encoding the block. In step 720 selected distortion measures are calculated for the encoding block and a reconstructed block associated with the candidate mode, wherein the reconstructed block may be received for example using the decoding/reconstruction module 134, possibly after employing one or more of blocks 126, 128 and 130 to calculate a residual and perform frequency transform and quantization. These distortion measures may be calculated using block 350, block 415 and/or block 420 previously described. A bitrate or rate estimate associated with encoding the block with the candidate mode is obtained in step 725, for example using block 345. Then, in step 730 the modified rate-distortion cost is calculated using the distortion(s) calculated in step 720, the rate estimate from step 725 and the scaling factors obtained in step 715. For example, as is implemented in some existing video encoding implementations, the RD cost may be calculated as:
Cost=SSD(Borig, Brec)+λ·R+μ·CmplDiff(Borig, Brec), wherein R is a block rate, SSD(Borig, Brec) is a sum of square differences between the pixels of the original encoding block Borig and the pixels of the reconstructed block Brec, CmplDiff(Borig, Brec) is a measure of difference between the texture variation strengths of the original block Borig and the reconstructed block Brec, and λ and μ are pre-calculated constants depending on the quantization level.
In Rate-Distortion optimized encoding, driven purely by the numeric distortion and rate values, μ is usually set to zero. Non-zero values of μ are used in order to improve the subjective quality of the reconstructed frame. Using non-zero μ values enables better preserving the accuracy of perceived fine texture elements in the frame. Typically, as μ increases, the size or bitrate of the block corresponding to the optimal coding mode will also increase. Thus, an increase of μ causes the RD optimization process to converge to selections which result in increase of the frame visual or subjective quality, at the price of an increase in the bitrate. Thus, an optimal value of μ for a given quantization level will provide the optimal perceptual quality/rate relation or trade-off.
Thus, adapting the Lagrange multipliers λ and μ in a content adaptive manner, in particular, according to the encoding block complexity, and not only according to quantization level as done in some existing video encoding implementations, can lead to better encoding decisions and thus allow for better subjective quality of video at a specific bitrate when compared to a result obtained with a video encoder which does not employ this adaptation.
While investigating adaptation of μ it became apparent that while the use of “large” values of μ was beneficial for the perceptual quality obtained when encoding blocks with “average” complexity, it was not beneficial for blocks with smooth texture and low complexity. Furthermore, when encoding highly contrasting blocks with very strong texture variations, or high complexity, sufficient quality and texture preservation was achieved even without using “large” values of μ. Thus, uniformly using a large value of μ for all encoding blocks, while improving visual quality for some blocks, will also introduce a rate increase in other blocks without a corresponding increase in obtained visual quality. Hence, it is proposed to apply the scaling factors described above in reference to step 715, calculating the RD cost as:
Cost=λ·R+Fssd(BC)·SSD(Borig,Brec)+Fcmpl(BC)·μ·CmplDiff(Borig,Brec)
wherein Fssd(BC) and Fcmpl(BC) are scaling factors dependent on the encoding block complexity BC, and wherein different functions or LUTs may be used to implement each of these scaling factors.
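A minimal sketch of mode selection with the modified cost is given below. The candidate records, the values of λ and μ, and the constant scaling-factor functions are hypothetical and serve only to show how the scaling factors can change which mode is selected.

```python
def modified_cost(rate, ssd_val, cmpl_diff, lam, mu, f_ssd, f_cmpl):
    """Cost = lam*R + Fssd(BC)*SSD + Fcmpl(BC)*mu*CmplDiff."""
    return lam * rate + f_ssd * ssd_val + f_cmpl * mu * cmpl_diff


def select_mode(candidates, bc, lam, mu, f_ssd_fn, f_cmpl_fn):
    """Pick the candidate mode with the lowest modified RD cost."""
    # Step 715: scaling factors depend only on the block complexity BC,
    # so they are evaluated once per block rather than once per mode.
    f_ssd, f_cmpl = f_ssd_fn(bc), f_cmpl_fn(bc)
    return min(
        candidates,
        key=lambda m: modified_cost(
            m["rate"], m["ssd"], m["cmpl_diff"], lam, mu, f_ssd, f_cmpl
        ),
    )


modes = [
    {"name": "intra", "rate": 100, "ssd": 500, "cmpl_diff": 10},
    {"name": "inter", "rate": 40, "ssd": 700, "cmpl_diff": 50},
]

# Unscaled cost (both factors equal to 1) favors the texture-preserving mode.
unscaled = select_mode(modes, bc=151, lam=2.0, mu=4.0,
                       f_ssd_fn=lambda bc: 1.0, f_cmpl_fn=lambda bc: 1.0)
# De-emphasizing both distortion terms lets the cheaper mode win instead.
scaled = select_mode(modes, bc=151, lam=2.0, mu=4.0,
                     f_ssd_fn=lambda bc: 0.5, f_cmpl_fn=lambda bc: 0.0)
```

In practice the two scaling-factor callbacks would be replaced by the interval-based functions or look-up tables obtained in step 715.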
Turning now to
In 805 Porig is presented, the 4×4 encoding block used for the duration of this example. The numerical values correspond to the 8-bit pixel values of the block. This corresponds also to input 202 of
The pixel values of the selected prediction block, Ppred, are presented in 810. This is another 4×4 8-bit pixel value set, taken from a previously reconstructed block in the encoded video stream.
In this example, the processing of the encoding block is performed in the pixel domain, corresponding to block 224. The processing involves applying a sharpening filter, resulting in the processed block Psharp, depicted in 815. In the example this block is obtained from Porig by applying a sharpening filter of the form:
where N8(j) is the index set of the 8 pixels spatially neighboring Porig(j), followed by a clipping operation to limit the values to the 8-bit range [0, 255]. Then, the frequency transform used in the encoding process is applied to each of Porig and Psharp, corresponding to block 226, yielding transform coefficient blocks Torig, depicted in 820, and Tsharp, depicted in 825. Note that in this example the transform and quantization procedures are performed in accordance with those of the H.264 or AVC video coding standard.
The 4×4 residual block Rinit, corresponding to 202, also supplied to the optimized quantization module as input, is depicted in 830. The frequency transform used for encoding is applied resulting in a block of initial coefficients C, depicted in 835.
Using a Quantization Parameter (QP) value equal to 42 and the de-quantization procedure as indicated in H.264 video coding standard, the quantized coefficients block Cqusual depicted in 850 would be obtained when applying the “usual”, non-novel Rate-Distortion optimized quantization with a simplified Trellis scheme to the residual block Rinit.
However, using the proposed method, the minimization problem for the quantized coefficients calculation is modified as
where, in the scope of this example, Xj(x, y)≡1 is set, and Yj(x, y)=clip(2(x−y), −D/(2M(j)), D/(2M(j))), wherein clip(x, A, B) means clipping the value x in range [A; B]. The function Yj(x, y) for this example is depicted in 840.
Applying the proposed quantization method using a QP value of 42 will result in the quantized coefficient block Cq depicted in 855. As seen in this example, the only difference between Cqusual and Cq, is the value of the coefficient at the position (1,1): due to different signs of Yj and C at this position the optimal absolute value of the corresponding quantized coefficient for the proposed method appears to be lower than that for the ordinary Rate-Distortion based method.
In order to understand the impact of this difference, the reconstructed block corresponding to Cqusual, depicted in 860, and the reconstructed block corresponding to Cq, depicted in 865, are also provided. As expected, the Sum of Square Differences (SSD) between the encoding block and the reconstructed block is lower for the case of ordinary RD-based quantization (8,706 vs 10,354). However, the SSD between the reconstructed block and the processed or sharpened encoded block is lower for the proposed method: 33,729 vs 35,393.
Thus, in this example, the proposed method provides a reconstructed block which is closer to the sharpened encoded block, thus introducing fewer smoothing or texture-loss visual artifacts in the reconstructed video, while simultaneously reducing the bit rate, or number of bits required to encode the block, due to the lower absolute value of the (1,1) coefficient.
Turning now to
In
In addition, two constants Acmpl and Assd are defined, which, for the sake of this example, are set to 230 and 190 respectively, and, accordingly, scaling factors are set to:
These scaling factors are incorporated into the RD cost as described above to obtain:
Cost=λ·R+Fssd(BC)·SSD(Borig,Brec)+Fcmpl(BC)·μ·CmplDiff(Borig,Brec)
equals 151. Using the above example scaling factor formulations, Fssd(151)=0.75 and Fcmpl(151)=1.02 are obtained, as is illustrated in point #2 on the graphs in
Thus configured, these teachings provide for optimized video encoding, such that a desired level of quality at a desired bit-rate can be attained in a reduced amount of time, with reduced computational requirements, and/or using a reduced bitrate to obtain the desired level of quality, when compared to an encoder which does not utilize these teachings.
Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the scope of the invention, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the inventive concept.
It is to be noted that the examples and embodiments described herein are illustrated as non-limiting examples and should not be construed to limit the presently disclosed subject matter in any way.
It is to be understood that the invention is not limited in its application to the details set forth in the description contained herein or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Hence, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for designing other structures, methods, and systems for carrying out the several purposes of the presently disclosed subject matter.
It will also be understood that the system according to the invention may be, at least partly, implemented on a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a non-transitory computer-readable storage medium tangibly embodying a program of instructions executable by the computer for executing the method of the invention.
Those skilled in the art will readily appreciate that various modifications and changes can be applied to the embodiments of the invention as hereinbefore described without departing from its scope, defined in and by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2018143702 | Dec 2018 | RU | national |