The invention relates generally to the field of video encoding, and more particularly to rate control for video encoders.
The International Telecommunication Union and the International Organization for Standardization developed a set of standards for low bit rate video compression, commonly referred to as H.264, MPEG-4 Part 10, or AVC (Advanced Video Coding). The goal of H.264 and similar standards is to provide a common set of standards for video compression that is applicable to a number of video applications and allows various encoders and decoders to function together. Because H.264 is capable of producing low bit rates, it is used for high-definition (HD) video applications.
Videos are made up of a series of frames, where each frame represents a single point in time of a particular scene or moving image. The frames are made up of pixels, and the number of pixels in a frame determines that frame's resolution. Each frame is further subdivided into macroblocks, each representing a small portion of a single frame. A typical 1080i HD video has approximately 30 frames per second, and each frame has 1920×1080 pixels. A macroblock is typically a block of 16×16 pixels. The large number of pixels in an HD video requires a considerably larger number of bits than standard-definition video.
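To make the scale concrete, the short sketch below works out the macroblock count and the raw (uncompressed) bit rate for the 1080i example above. It assumes 4:2:0 chroma subsampling at 8 bits per sample (an average of 12 bits per pixel), which is typical for H.264 but not stated above.

```c
#include <stdio.h>

int main(void) {
    const int width = 1920, height = 1080, fps = 30;
    const int mb = 16;                      /* macroblock dimension */

    /* Ceiling division: 1920/16 = 120 columns, 1080/16 = 67.5 -> 68 rows */
    int mb_cols = (width  + mb - 1) / mb;
    int mb_rows = (height + mb - 1) / mb;
    printf("macroblocks per frame: %d\n", mb_cols * mb_rows);  /* 8160 */

    /* Raw bit rate, assuming 4:2:0 sampling: ~12 bits per pixel */
    long long raw_bps = (long long)width * height * 12 * fps;
    printf("uncompressed: ~%.0f Mbit/s\n", raw_bps / 1e6);     /* ~746 */
    return 0;
}
```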
Video compression aims to reduce the number of bits in a video without significantly reducing the resolution or quality of the image. A common method of reducing the bit rate is prediction, whereby redundant information is intelligently reduced. Video encoders and video compression algorithms make predictions as to how a frame looks based on the redundancies that exist within a frame (spatial redundancy) and the redundancies that exist between a series of frames (temporal redundancy). For example, a scene that remains constant over time will be redundant in the temporal domain. However, once the scene changes, the amount of redundancy will be minimal, resulting in a spike in the bit requirements.
Another method of reducing the bit rate is discarding information through quantization. Quantization maps a range of values to a single value. Once a frame or macroblock is represented in the frequency domain, using one of the transform functions known in the art, such as the Discrete Cosine Transform or the Integer Transform, quantization is performed with a larger quantization step, so that fewer discrete values are available to represent the entire range. The idea is that the full number of discrete steps is not necessary to maintain a high quality image. The set of distinct values used in quantization is based on a step size, or quantization parameter (QP). When the step size is increased, more values are encompassed in an individual step. Greater step sizes result in a reduced number of bits and reduced image quality.
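As a minimal illustration of this trade-off (not the actual H.264 quantizer, which also involves scaling matrices and a logarithmic QP-to-step-size mapping), the sketch below quantizes a single transform coefficient at several step sizes:

```c
#include <math.h>
#include <stdio.h>

/* Map a transform coefficient to a quantization level and back.
 * A larger step collapses more input values onto each level,
 * costing fewer bits to code but losing more detail. */
static int quantize(double coeff, double step)   { return (int)lround(coeff / step); }
static double dequantize(int level, double step) { return level * step; }

int main(void) {
    const double coeff = 37.4;
    for (double step = 2.0; step <= 16.0; step *= 2) {
        int level = quantize(coeff, step);
        printf("step %4.1f: level %3d -> reconstructed %6.1f\n",
               step, level, dequantize(level, step));
    }
    return 0;
}
```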
Videos that must be sent over a network, such as the Internet, to an endpoint must meet additional requirements regarding the bit rate. Given the stochastic nature of the Internet, all of the information pertaining to an image does not reach the endpoint at the same time. Therefore, buffers are needed to temporarily store and collect the bits before sending the bit stream and corresponding image to the endpoint. The size of the buffer determines how many bits the buffer is capable of storing and how much time is required before the bit stream is finally sent to the endpoint. The size of the buffer affects the latency of the video encoding system. To avoid excessive delays in sending a video over the Internet, the buffer cannot be set too large; however, it must be set large enough to accommodate any spikes in the bit rate.
The bit stream must meet the requirements of the buffer, or information will be lost. The bit stream is generally controlled by a rate control block. Rate control aims to send as many bits of information as possible without exceeding the network bit rate and buffer size, while maintaining image quality.
HD video conferencing is an application of the H.264 video compression standard that presents unique challenges to rate control. The most important constraint is the real-time nature of video conferencing: the parties must be able to communicate with one another in real time. Any delay in the communication over 0.5 seconds will make the video unwatchable and a video conferencing unit unusable. Therefore, given the inherent latency of the Internet, rate control and encoding must be performed efficiently, and processing times kept to an absolute minimum.
Rate control methods in the prior art do not address the issues of low latency and maintaining a smaller buffer. Nor does the prior art efficiently and quickly address spikes in the bit rate, especially in low latency environments. Therefore, in a scene change or a highly complex video, the bit rate will be much higher and may not fit within the constraints of the buffer. Prior inventions also focus on applying rate control methods based on whether the image represents a scene change: the entire frame is analyzed by comparing each macroblock temporally and spatially. This requires buffering an entire frame and thus incurring more than a frame's worth of delay in the encoding process, which is too much for real-time applications.
It is an object of the invention to develop a method and system for rate control applicable to the H.264 standard. It is a further object of the invention to develop a method and system for rate control for single-pass, real-time, high-definition video applications that can maintain high image quality with low processing times. It is a further object of the invention to develop a method and system for rate control that can operate independently of the complexity of the current frame or macroblock as it compares to previous frames and macroblocks. It is a further object of the invention to develop a method and system for rate control that can be used on both high complexity and low complexity frames.
Exemplary embodiments of the invention are concerned with a method and system for a rate control block that adjusts the quantization parameter (QP) for a frame or macroblock based on the number of bits already used in encoding the frame or macroblock. In another embodiment of the invention, the QP for a macroblock is based on the occupancy of a buffer. In another embodiment of the invention, a range of allowable QPs is defined based on the occupancy of a buffer.
The present invention will be understood more fully from the detailed description that follows and from the accompanying drawings, which however, should not be taken to limit the invention to the specific embodiments shown, but are for explanation and understanding only.
FIG. 4a is a flowchart diagram of an embodiment of the present invention illustrating a method for setting a range of quantization parameter values.
FIG. 4b is a flowchart diagram of an embodiment of the present invention illustrating an optimized method for setting a range of quantization parameter values.
A method and system for rate control in a video encoder is described. In the following description specific details are set forth, such as device types, system configurations, protocols, applications, methods, etc., in order to provide a thorough understanding of the present invention. However, persons having ordinary skill in the relevant arts will appreciate that these specific details may not be needed to practice the present invention.
Step 101 begins at a current macroblock. Macroblocks in a frame of n macroblocks are encoded one at a time, beginning with the first macroblock in the frame. At the point of the current macroblock, all previous macroblocks in the frame have already been encoded. At step 102, encoding is performed on the macroblock. The encoding includes removing any redundancies in the macroblock and transforming the macroblock to the frequency domain. Encoding is performed using parameters that are either defined at this step, input to this step, or input by a user. First, there is a quantization parameter for the entire frame, frameQP. Second, there is a target number of bits to be used for encoding the entire frame, targetFrameBits. This value is based on the constraints of the system to which the method is applied. For example, in HD video conferencing, targetFrameBits may be reduced so that video processing times and sending times are reduced. There is also the macroblockQP, which is based on the frameQP and is adjusted according to the macroblocks that have already been encoded within the current frame.
The targetFrameBits, together with the bit information and bit requirements of the current macroblock, are sent to step 103 to calculate the bitDiff. The bitDiff is the difference between the number of bits actually used in encoding the frame through the current macroblock and the number of bits targeted to be used in encoding the frame through the current macroblock, targetCurrentBits. The formula for calculating the number of bits targeted to be used through the current macroblock is:
targetCurrentBits=targetFrameBits×currentMB/n,
where n is the total number of macroblocks in the frame and currentMB is the number of encoded macroblocks in the frame. The bitDiff is represented by the formula:
bitDiff=currentBits−targetCurrentBits,
where currentBits is obtained from the encoding step 102 and represents the number of bits that have already been used to encode the frame from the first macroblock to the current macroblock.
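A direct transcription of these two formulas in C, using the names from the description (the 64-bit intermediate guards against overflow, and truncating integer division is an implementation choice the description leaves open):

```c
/* Number of bits targeted through the current macroblock. */
int target_current_bits(int targetFrameBits, int currentMB, int n) {
    return (int)(((long long)targetFrameBits * currentMB) / n);
}

/* Running surplus (positive) or deficit (negative) against that target. */
int bit_diff(int currentBits, int targetFrameBits, int currentMB, int n) {
    return currentBits - target_current_bits(targetFrameBits, currentMB, n);
}
```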
In step 104, a right shift is applied to the bitDiff to obtain the mbCorrection.
mbCorrection=bitDiff>>x,
where ‘>>’ is the operator for a right shift and x is the number of positions to shift the bits. The value x is defined by the user and is based on the requirements of the system and how much degradation of the image will be tolerated.
In another embodiment of the invention, the formula for mbCorrection is:
mbCorrection=(bitDiff>>x)².
This formula is optimized for H.264 applications, where the quantization parameter does not map linearly to the bits. In an embodiment of the present invention, an optimized value for x is defined in the following formula:
x=mbShift+9.
The parameter mbShift is a variable shift parameter and the number 9 represents a fine adjustment on the shift value.
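A sketch of both variants in C (note that in C a right shift of a negative bitDiff is implementation-defined, so a production encoder would pin that behavior down; the squared exponent follows the formula as reconstructed above):

```c
/* Basic correction (step 104): scale bitDiff down by 2^x. */
int mb_correction(int bitDiff, int x) {
    return bitDiff >> x;
}

/* Variant optimized for H.264, where QP does not map linearly to bits.
 * mbShift is the user-tunable shift; the +9 is the fine adjustment. */
int mb_correction_h264(int bitDiff, int mbShift) {
    int x = mbShift + 9;
    int s = bitDiff >> x;
    return s * s;            /* (bitDiff >> x) squared */
}
```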
In step 105, the mbCorrection is added to the frameQP to fine tune the image quality and number of bits and obtain a macroblockQP. The resulting macroblockQP is returned to the encoding step 102 and used to encode the next macroblock and output the compressed bit stream.
If the current macroblock is the first macroblock within a frame, it is not necessary to offset the frameQP: no bits will have been used at the first macroblock, so the mbCorrection will be equal to zero and the macroblockQP will be equal to the frameQP.
Calculating mbCorrection' includes the step of defining an upper limit, mbLimitUp, and a lower limit, mbLimitDn, for the correction factor 208. mbLimitUp represents the maximum allowable correction factor that can be applied to the frameQP, or the maximum degradation allowed to the image quality. mbLimitDn represents the minimum allowable correction factor that can be applied to the frameQP. These values may be user-defined, variable, and/or based on the requirements of the system and the requirements of the image quality. mbLimitUp and mbLimitDn may have the same magnitude. It is appreciated that by limiting the magnitude of mbCorrection, the QPs of the n macroblocks within a single frame can be more consistent and thus produce a more homogeneous image. After defining the limits, mbCorrection is evaluated. If mbCorrection is between mbLimitUp and mbLimitDn 209, then mbCorrection' is equal to mbCorrection 210. If mbCorrection is less than mbLimitDn 211, and therefore exceeds the maximum allowable negative correction, then mbCorrection' is equal to mbLimitDn 212. If mbCorrection is greater than mbLimitUp 213, and therefore exceeds the maximum allowable positive correction, then mbCorrection' is equal to mbLimitUp 214.
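The evaluation of steps 209-214 is a saturating clamp; a minimal sketch with the names used above:

```c
/* Clamp the raw correction into [mbLimitDn, mbLimitUp] (steps 209-214). */
int clamp_correction(int mbCorrection, int mbLimitDn, int mbLimitUp) {
    if (mbCorrection < mbLimitDn) return mbLimitDn;   /* step 212 */
    if (mbCorrection > mbLimitUp) return mbLimitUp;   /* step 214 */
    return mbCorrection;                              /* step 210 */
}
```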
FIG. 4a illustrates another embodiment of the present invention. It is a method for setting a range of allowable values of the correction factor for a QP to be used for encoding, based on the current state of the system buffer. The method includes the steps of defining a maximum magnitude correction, mbMaxCorr, to the quantization parameter under nominal conditions 401; calculating the current occupancy of the buffer, buff_occ 402; defining buffer occupancy ranges and corresponding shift parameters 403; calculating an adjusted maximum correction, mbMaxCorr', allowable to the quantization parameter, based on buff_occ and the shift parameters 404; and finally setting upper and lower limits for the correction factor, mbLimitUp and mbLimitDn 405.
The maximum correction, mbMaxCorr, is defined by the user and is based on the constraints of the system to which the method is applied. It is appreciated that nominal conditions include situations where there is not a scene change, where the image is not highly complex, or where there exists a minimal amount of redundancy.
The buffer occupancy, buff_occ, is equal to the number of bits currently in the buffer divided by the total number of bits the buffer can accommodate. The size of the buffer is based on the constraints of the system. In a video conferencing application, or other real-time application, the buffer must be kept to a minimum and will therefore reach capacity more quickly.
Based on the requirements of the system and application to which the method of FIG. 4a is applied, buffer occupancy ranges and corresponding shift parameters are defined 403.
In the present invention, a left bit shift, as illustrated in FIG. 4a, is applied to mbMaxCorr to obtain the adjusted maximum correction:
mbMaxCorr'=mbMaxCorr<<rough_Tune(buff_occ),
where ‘<<’ is the operator for a left shift and rough_Tune is a variable, user-defined parameter giving the number of bits to shift mbMaxCorr as a function of buff_occ. A rough_Tune value may be defined for each buffer occupancy range. It is appreciated that as the buffer reaches capacity, the QP may also increase, to reduce the size of the incoming bit stream and to ensure that the incoming bit stream does not exceed the size of the buffer.
In a further embodiment of the method of FIG. 4a, a fine tuning parameter is added to select the QP more precisely:
mbMaxCorr'=(mbMaxCorr<<rough_Tune(buff_occ))+fine_Tune(buff_occ),
where fine_Tune is a variable, user-defined parameter that is a function of the buffer occupancy. A fine_Tune value may be defined for each buffer occupancy range and may be used to more precisely select the QP.
In step 405, mbLimitUp and mbLimitDn are defined based on mbMaxCorr' and are represented by the following formulas:
mbLimitUp=+mbMaxCorr' and
mbLimitDn=−mbMaxCorr'.
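Steps 404 and 405 reduce to a shift, an addition, and a sign flip; a minimal sketch in C, where rough and fine stand for the rough_Tune and fine_Tune values already selected for the current occupancy range:

```c
/* Step 404: widen the nominal maximum correction as the buffer fills. */
int adjusted_max_corr(int mbMaxCorr, int rough, int fine) {
    return (mbMaxCorr << rough) + fine;               /* mbMaxCorr' */
}

/* Step 405: symmetric limits around zero. */
void set_limits(int mbMaxCorrAdj, int *mbLimitUp, int *mbLimitDn) {
    *mbLimitUp = +mbMaxCorrAdj;
    *mbLimitDn = -mbMaxCorrAdj;
}
```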
Once the values of mbLimitUp and mbLimitDn have been defined, they can be applied to methods for offsetting a QP, as depicted in FIG. 2.
FIG. 4b illustrates another embodiment of the present invention. It depicts an exemplary method for setting buffer occupancy ranges, or bins. The optimal buffer occupancy ranges are defined in step 403 as: 1) buff_occ>0.875; 2) buff_occ>0.75; 3) buff_occ>0.5; 4) buff_occ>0.25; 5) all else. It is appreciated by those skilled in the art that these bins are optimal because the corresponding arithmetic calculations are efficient in terms of time and memory. A rough_Tune value is defined for each of the occupancy ranges. Where the buffer is less than 25% full, it is not necessary to further degrade the picture, and mbMaxCorr is not adjusted 408. In applications where additional fine tuning is required, a fine_Tune value may be defined for each of the occupancy ranges. Once it is determined which buffer occupancy range the buffer falls in 406, mbMaxCorr is shifted accordingly 407. The shifted mbMaxCorr (or the unchanged mbMaxCorr, where buff_occ<0.25) is returned 409 to step 404 to set mbLimitUp and mbLimitDn.
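Because the thresholds are 7/8, 3/4, 1/2, and 1/4, the bin can be selected with integer compares; the sketch below illustrates steps 406-408 under that reading. The rough_Tune values are placeholders, since the description leaves them user-defined:

```c
/* Hypothetical per-bin shift amounts; the description leaves these
 * user-defined. Bin 0 is the fullest range (buff_occ > 0.875). */
static const int rough_Tune[4] = { 3, 2, 1, 1 };

int shifted_max_corr(long long buf_bits, long long buf_size, int mbMaxCorr) {
    int bin;                                         /* step 406 */
    if      (8 * buf_bits > 7 * buf_size) bin = 0;   /* buff_occ > 0.875 */
    else if (4 * buf_bits > 3 * buf_size) bin = 1;   /* buff_occ > 0.75  */
    else if (2 * buf_bits >     buf_size) bin = 2;   /* buff_occ > 0.5   */
    else if (4 * buf_bits >     buf_size) bin = 3;   /* buff_occ > 0.25  */
    else return mbMaxCorr;    /* less than 25% full: no adjustment (408) */
    return mbMaxCorr << rough_Tune[bin];             /* step 407 */
}
```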
It is appreciated that the embodiments of the present invention as set forth are applicable to a video conferencing session, as depicted in FIG. 5.
The video input 601 enters the encoding module 604. The encoding module encodes the video according to the H.264 encoding standard. It is appreciated that the encoding module and embodiments of the present invention may be applicable to standards similar to the H.264 encoding standard. As part of the encoding process, the encoding module uses quantization parameters (QPs) to quantize the bits and reduce the size of the bit stream. During the encoding process, information from a frame or macroblock within the video input 601 is sent from the encoding module 604 to the rate control block 606 to obtain the optimal QP for the frame or macroblock. The rate control block 606 sends the optimal QP to the encoding module to complete the encoding of the frame or macroblock. The rate control block 606 can set the QP based on the number of bits already used to encode the current frame or current video. The rate control block 606 can also set the QP based on the current state of the buffer 607 in the system. The rate control block 606 uses user-defined or calculated limits to set the QP.
It is appreciated that in embodiments of the invention, the rate control block 606 can be incorporated into the encoding module 604 to form a complete encoding unit 608. The encoding unit 608 performs all of the same functions as the encoding module 604 and the rate control block 606 and outputs the same encoded bit stream of the video input.
The encoding module 604 outputs a bit stream, which is the compressed form of the original video input 601. The compressed, encoded bit stream of the video input 601 is sent to a data network 603, such as the Internet. The data network 603 receives the bit stream and sends it through the network and on to a decoding module 605. The decoding module 605 decodes the compressed, encoded bit stream according to the H.264 standard as applied in the encoding module 604.
The decoded bit stream is sent on to a buffer 607. The buffer compensates for any irregularities in the flow of the bit stream through the data network 603 by holding and storing the bit stream before sending it on. The size of the buffer and the amount of time the bit stream is held are set according to the requirements of the user application. It is appreciated that, as hardware requirements dictate, the buffer 607 may be located before the decoding module 605, according to the requirements of the user application. Once the bit stream has been decoded by the decoding module 605 and passed through the buffer 607, a video output 602 is sent. In a video conferencing session 500, the video output is displayed on a terminal 502.
The above description is included to illustrate embodiments of the present invention and is not meant to limit the scope of the invention. The scope of the invention is to be limited only by the claims set forth below. From the above discussion, many variations will be apparent to one skilled in the art that are encompassed by the scope and spirit of the following claims.