The present disclosure relates to a dynamic rate control algorithm for high definition video conferencing.
Video conferencing allows two or more locations to communicate using simultaneous two-way video transmissions. Among the different factors that affect user satisfaction during video conferencing, rate adaptation plays an important role. For example, generating video at a bitrate that exceeds network capabilities leads to degradation in video quality. Conversely, generating video at a bitrate below network capabilities is an inefficient use of network resources. Conventional rate control algorithms, however, are unable to address high definition video streaming requirements of video conferencing over best effort networks, such as the Internet. Therefore, it is desirable to develop a dynamic rate control mechanism for use in high definition video conferencing applications.
This section provides background information related to the present disclosure which is not necessarily prior art.
A computer-implemented method is provided for controlling an encoder that is particularly suited for a high definition video conferencing application. The method includes: receiving an average allowance of bits for encoding a data frame; receiving a burst allowance for data encoded by the encoder, where the burst allowance specifies a variance above the average allowance for a given time period; determining bits needed to encode an incoming data frame; comparing the bits needed to encode the incoming data frame to the average allowance of bits; computing a bit allowance for the incoming data frame using the bits needed to encode the incoming data frame and the burst allowance when the bits needed to encode the incoming data frame exceeds the average allowance of bits; computing a quantization parameter for the incoming data frame using the bit allowance for the incoming data frame; and providing the quantization parameter to the video encoder.
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features. Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure. Corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.
Rate distortion is one of the main features addressed in the H.264/AVC standard. Since the standard defines only a bit stream syntax and decoder process, encoder developers have freedom to develop rate control algorithms. Among various H.264 implementations, the x264 open-source implementation has received much attention and is used in many applications such as ffdshow, ffmpeg and MEncoder. It is one of the best software codecs on the Internet and its performance is close to or better than that of some commercial software. The main reasons behind the high performance of x264 are its rate control algorithm, which is based on the libavcodec implementation, and its motion estimation, macro block mode decision, quantization, and frame type decision algorithms. There are many parameters in frames and macro blocks which affect the rate-distortion and the final bit rate, such as frame types (e.g., I, P or B frame), number of frames, modes to be used for each macro block (e.g., INTRA, INTER, or SKIP), motion estimation methods (e.g., full search, three step search), etc. A rate control (RC) algorithm adjusts such parameters to meet a determined video bit rate. Practically, rate control can be applied at three different granularity levels: group of pictures level, frame level, and macro-block level. While the proposed rate control algorithm described below is discussed in the context of the H.264/AVC standard, it is extendable to other video compression standards as well.
Average bitrate control (ABR) is a known rate control algorithm that tries to assign bitrate to each frame based on a predetermined final bitrate. More specifically, a quantization parameter (QP) for the encoder is calculated based on the bitrates for data frames previously encoded by the video encoder and corresponding quantization parameters used to encode the previously encoded data frames. An exemplary formula for calculating the quantization parameter is as follows:
where QPi is the quantization parameter used to encode data frame i, Bi is the bits consumed for data frame i, and Wi is the desired bitrate for data frame i. To help the rate controller choose a proper QP value, a proper duration for collecting history should be defined. For example, the bitrate used 30 minutes ago is not a good reference for calculating the average bitrate because a frame from 30 minutes ago is less similar to the current frame than a frame from 5 minutes ago. Accordingly, the control algorithm uses a group of previous frames (referred to as a window) to calculate the QP value of the current frame. Therefore, formula (1) can be modified as follows:
where k is a fixed integer that is determined at the beginning based on the conference environment and is not changed during the conference session.
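By way of a non-limiting illustration, the window of the last k frames may be kept as sketched below in Python. The class name BitrateWindow and its methods are hypothetical, and the sketch only collects the consumed and desired bits over the window; how those sums are mapped to a QP value is governed by the formula referenced above and is not reproduced here.

```python
from collections import deque


class BitrateWindow:
    """Keeps (bits consumed, bits desired) for the last k encoded frames.

    Hypothetical helper: it only maintains the history window and
    does not compute a quantization parameter.
    """

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("window size k must be a positive integer")
        # deque(maxlen=k) silently drops the oldest frame once k are stored.
        self.frames = deque(maxlen=k)

    def record(self, bits_consumed: float, bits_desired: float) -> None:
        """Store the result of one encoded frame."""
        self.frames.append((bits_consumed, bits_desired))

    def clear(self) -> None:
        """Discard all history, e.g. when the target bitrate changes."""
        self.frames.clear()

    def usage_ratio(self) -> float:
        """Consumed bits divided by desired bits over the window."""
        consumed = sum(b for b, _ in self.frames)
        desired = sum(w for _, w in self.frames)
        if desired == 0:
            return 1.0  # assumption: no history means "on target"
        return consumed / desired
```

A ratio above 1 indicates that the recent frames consumed more bits than desired, and a ratio below 1 indicates the opposite. Because k is fixed, frames older than the window never influence the result.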
While the bitrate is fixed, the network delivers data at the specified bitrate with some variations. By definition, the bitrate is the amount of bits that are transferred per unit of time. For instance, the encoder may be configured to encode raw video at 3 Mbit/second. From the specified bit rate, the dynamic rate control algorithm 20 begins by determining an average allowance of bits (i.e., quota) for encoding a data frame as indicated at 21. That is, the average allowance of bits for a given frame is computed by dividing the bit rate by a frame rate, where the bit rate is the average bits per unit of time for the video data as reported by the encoder and the frame rate is received from the source of the data to be encoded by the encoder. By way of example, for a bitrate of 2 Mbit/second and a frame rate of 20 frames per second, the average allowance of bits per data frame is 100 kilobits per frame.
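The per-frame quota computation described above may be sketched as follows. This is a minimal illustration only; the function name average_allowance_bits is hypothetical and the inputs are assumed to be expressed in bits per second and frames per second.

```python
def average_allowance_bits(bitrate_bps: float, frame_rate_fps: float) -> float:
    """Average per-frame bit quota: bit rate divided by frame rate."""
    if frame_rate_fps <= 0:
        raise ValueError("frame rate must be positive")
    return bitrate_bps / frame_rate_fps


# Example from the text: 2 Mbit/second at 20 frames per second.
quota = average_allowance_bits(2_000_000, 20)
print(quota / 1000, "kilobits per frame")
```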
When allocating bits to an incoming data frame, the rate control algorithm 20 uses the concept of a future budget (or bit allowance). The concept of a future budget is very similar to the wage system. An employee may be paid at the beginning of a given month. The employee may pay any fixed expenses for the month and place the surplus money in the bank. The surplus can be used to manage any unexpected expenditures throughout the month. In the case of an unexpected expenditure, the employee can withdraw from the surplus money in the bank to pay for the unexpected expenditure. The proposed rate control algorithm operates in a similar manner.
The rate controller can receive at 22 a burst allowance for data encoded by the encoder, where the burst allowance specifies a variance from the allocated bandwidth for a given period of time and is provided, for example by the network service provider. The burst allowance represents the future budget for the rate controller.
The rate control algorithm 20 can consume up to a percent of the future budget when the bit allowance for an incoming data frame exceeds the average bit allowance, thereby preventing degradation of video quality. Thus, the future budget allows the encoder to tolerate some fluctuations in its output. However, if the encoder continues to generate video at bit rates over the desired bit rate, the future budget will be depleted and the video quality will be degraded.
On the other hand, when the encoder is generating fewer bits than the desired bit rate, the rate control algorithm 20 can replenish the future budget. The future budget can be increased by the difference between the average bit allowance and the bit allowance for the incoming data frame up to the value of the initial burst allowance.
Upon receipt of an incoming data frame, the rate controller determines at 23 the bits needed to encode the incoming data frame. The number of bits needed to encode the incoming data frame is then compared at 24 to the average bit allowance. When the bits needed to encode the incoming data frame exceeds the average bit allowance, the bit allowance for the incoming data frame is computed at 25 using the future budget. Specifically, the rate controller determines a difference between the bits needed to encode the incoming data frame and the average allowance of bits, subtracts the difference from the future bit allowance (i.e., future budget) to yield an updated future bit allowance, and sets the bit allowance equal to the bits needed to encode the incoming data frame so long as the updated future bit allowance is equal to or greater than zero. In this way, the quality of the video is maintained. Once the future budget is depleted (i.e., the updated future allowance is less than zero), the bit allowance for the incoming frame is set to the average bit allowance and the quality of the video is degraded.
Conversely, when the bits needed to encode the incoming data frame is less than or equal to the average bit allowance, the future budget is replenished at 26. To do so, a difference is computed between the average allowance of bits and the bits needed to encode the incoming data frame. The difference is added to the future bit allowance to yield an updated future bit allowance. The future budget can be increased up to but not to exceed the value of the initial burst allowance. In this case, the bit allowance for the incoming data frame is set to the average bit allowance.
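The two branches described above (spending from the future budget at 25 and replenishing it at 26) may be sketched in Python as follows. The function name allocate_frame_bits and its parameter names are hypothetical. The text does not state what value the future budget takes once it is depleted; the sketch assumes it is clamped to zero.

```python
from typing import Tuple


def allocate_frame_bits(
    bits_needed: float,
    average_allowance: float,
    future_budget: float,
    burst_allowance: float,
) -> Tuple[float, float]:
    """Return (bit allowance for this frame, updated future budget)."""
    if bits_needed > average_allowance:
        # Frame overshoots the quota: try to pay the excess from the budget.
        excess = bits_needed - average_allowance
        updated = future_budget - excess
        if updated >= 0:
            return bits_needed, updated
        # Budget depleted: fall back to the average allowance.
        # Assumption: the budget is clamped to zero in this case.
        return average_allowance, 0.0
    # Frame is at or under the quota: bank the surplus, capped at the
    # initial burst allowance.
    surplus = average_allowance - bits_needed
    return average_allowance, min(future_budget + surplus, burst_allowance)
```

The returned bit allowance is the value that would then be handed to the quantization parameter computation at 27. The cap in the last line keeps the future budget from growing beyond the initial burst allowance.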
In either case, the quantization parameter for the incoming frame is computed at 27 using the bit allowance allocated to the incoming data frame. In an exemplary embodiment, the quantization parameter is computed using equation (1) although other computational techniques are contemplated by this disclosure. The quantization parameter is then provided at 28 to the video encoder 14 which in turn encodes the incoming data frame in accordance with the provided quantization parameter.
Information pertaining to an incoming data frame is received at 31 by the rate controller. A determination is made at 32 as to whether the incoming data frame is the first frame to be received at a new bitrate. For the first frame, the quantization parameter is estimated as indicated at 33. According to an x264 implementation, the quantization parameter may be estimated as follows:
where Generatedbits and wanted_bit_window represent the encoded frame size and the desired frame size, respectively. The estimated quantization parameter is in turn passed by the rate controller 12 to the video encoder 14 and used to encode the first data frame.
For the frames following the first frame, a determination is made at 34 as to the bits needed to encode the current frame. The actual number of bits used to encode the previous frame is then compared at 35 to the average allowance of bits for encoding a data frame, where the actual number of bits used to encode the previous data frame is provided by the encoder to the rate controller. As noted above, the average allowance of bits may be computed by dividing the bit rate by the frame rate. When the actual bits needed to encode the previous frame is substantially equal to the average bit allowance, the current frame can be encoded without modifying the encoder parameters (i.e., using QP from previously encoded data frame) as indicated at 36.
When the actual bits needed to encode the previous data frame differs from the average bit allowance, a second determination is made at 37 to determine whether the actual bits needed to encode the data frame fall within a variance (e.g., ±10% of average bit allowance). If the bits needed fall within the variance, the quantization parameter is updated at 38 and the updated quantization parameter is used to encode the current frame. In the exemplary embodiment, the quantization parameter is updated using equation (1) above.
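The determinations at 35 and 37 may be illustrated by the following Python sketch. The function name, the returned labels, and the equal_tol parameter are hypothetical; in particular, the text does not quantify "substantially equal", so the 1% tolerance used here is an assumption, while the 10% variance follows the example given above.

```python
def classify_previous_frame(
    actual_bits: float,
    average_allowance: float,
    variance: float = 0.10,   # example variance from the text (+/-10%)
    equal_tol: float = 0.01,  # assumption: meaning of "substantially equal"
) -> str:
    """Decide how the QP for the current frame should be obtained."""
    deviation = abs(actual_bits - average_allowance) / average_allowance
    if deviation <= equal_tol:
        return "reuse_qp"          # step 36: keep the previous QP
    if deviation <= variance:
        return "update_qp"         # step 38: update QP from history
    return "outside_variance"      # step 39: consult the future budget
```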
When the actual bits needed to encode the previous data frame fall outside of the variance, a determination is made at 39 as to whether the bits needed to encode the incoming data frame exceed the average bit allowance or are below the average bit allowance. If the bits needed exceed the average bit allowance, then a determination is made at 40 as to whether the future bit allowance can be used to encode the incoming data frame. To do so, the rate controller determines a difference between the bits needed to encode the incoming data frame and the average allowance of bits and subtracts the difference from the future bit allowance to yield an updated future bit allowance. If the updated future bit allowance is equal to or greater than zero, then the quantization parameter can be calculated using the future bit allowance. Specifically, the bit allowance for the current frame is set at 41 to the number of bits needed to encode the incoming data frame and the quantization parameter is calculated at 42 using the set bit allowance. The quantization parameter can be calculated using equation (1) above. Lastly, the future bit allowance is updated at 43 by subtracting the difference at 42 from the future bit allowance to yield an updated future bit allowance.
The future budget has been depleted if the updated future bit allowance is less than zero. In this case, the quantization parameter is calculated at 44 without the benefit of the future budget. That is, the bit allowance for the incoming frame is set to the average bit allowance and the quantization parameter is calculated using equation (1) above. In such a case, because the number of bits needed to encode the current data frame is more than the average bit allowance, the video quality will be degraded.
If the bits needed are less than the average bit allowance, then the future budget can be replenished at 46. To do so, the rate controller determines a difference between the average allowance of bits and the bits needed to encode the incoming data frame. The future bit allowance can be replenished by adding the difference to the future bit allowance to yield an updated future bit allowance. The future budget can be increased up to but not to exceed the value of the initial burst allowance.
In the exemplary embodiment, the quantization parameter for the incoming data frame may depend upon the structural similarity (SSIM) index for the previous data frame. The SSIM is compared at 47 to a desired quality level. If the SSIM meets the desired quality level, the quantization parameter for the previous data frame is used at 48 to encode the current data frame. If the SSIM does not meet the desired quality level, the quantization parameter for the current data frame is calculated at 49 using equation (1) above. In this case, the bit allowance for the current frame is set to the average bit allowance. SSIM is a known method for measuring the similarity between two images. Other types of quality measures, such as PSNR, VQM, MPQM, NQM and other indicators of video quality also fall within the scope of this disclosure.
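The quality check at 47 through 49 may be sketched as follows. The function and parameter names are hypothetical. The recompute_qp callable stands in for equation (1) and is supplied by the caller, and "meets the desired quality level" is assumed here to mean greater than or equal to the threshold.

```python
from typing import Callable


def qp_after_quality_check(
    prev_ssim: float,
    desired_ssim: float,
    prev_qp: int,
    average_allowance: float,
    recompute_qp: Callable[[float], int],
) -> int:
    """Choose the QP for the current frame based on the previous frame's SSIM."""
    if prev_ssim >= desired_ssim:
        # Quality is acceptable: reuse the previous QP (step 48).
        return prev_qp
    # Quality is too low: recompute the QP with the bit allowance
    # set to the average bit allowance (step 49).
    return recompute_qp(average_allowance)
```

Any other quality measure mentioned above, such as PSNR, could be substituted for the SSIM value, provided that the comparison direction matches the measure.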
An example scenario is set forth below to illustrate the concept. In this example scenario, assume the average bit allowance is 100 kilobits per frame and the burst allowance is 200 kilobits per frame. When the bits needed to encode a first incoming data frame is 150 kilobits, the future bit allowance is adjusted from 200 to 150 kilobits per frame. When the bits needed to encode a second incoming data frame is 200 kilobits, the future bit allowance is adjusted from 150 to 50 kilobits per frame. If the bits needed to encode a third incoming data frame is 175 kilobits, the future bit allowance is adjusted from 50 to 0. Conversely, if the bits needed to encode the third incoming data frame is 50 kilobits, the future bit allowance is adjusted upward from 50 to 100 kilobits per frame. Note that the future bit allowance cannot be adjusted to a value higher than the initial burst allowance (i.e., 200 kilobits per frame).
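The figures in this scenario can be reproduced with a short Python trace. The helper name update_future_budget is hypothetical, and clamping the budget between zero and the burst allowance is an assumption chosen to match the adjustments stated in the scenario. All values are in kilobits per frame.

```python
def update_future_budget(budget, bits_needed, average=100, burst=200):
    """Apply one frame to the future bit allowance (kilobits per frame)."""
    delta = average - bits_needed  # negative when the frame overshoots
    # Clamp to [0, burst]: never negative, never above the initial allowance.
    return max(0, min(burst, budget + delta))


budget = 200
budget = update_future_budget(budget, 150)   # first frame
after_first = budget
budget = update_future_budget(budget, 200)   # second frame
after_second = budget

# Two alternatives for the third frame.
third_overshoot = update_future_budget(budget, 175)
third_undershoot = update_future_budget(budget, 50)
print(after_first, after_second, third_overshoot, third_undershoot)
```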
While adjusting to the new bitrate, the rate controller 12 operates in a “transient” state as indicated at 52. Previous history for calculating the quantization parameter in accordance with equation (1) is cleared (i.e., summations set to zero) as indicated at 53. For the first data frame at the new bitrate, the quantization parameter for the video encoder 14 is initially estimated at 54. In an exemplary embodiment, the quantization parameter may be estimated using equations (2)-(7) set forth above. The first data frame is then encoded at 55 with the estimated quantization parameter.
Next, a determination is made at 56 as to whether the encoder output was generated at a bitrate close to the desired new bitrate. If the generated bitrate has not yet reached the desired new bitrate, the quantization parameter for the next incoming data frame is calculated at 57 using equation (1) above. The quantization parameter is used by the video encoder at 55 to encode the next incoming data frame. The process is repeated until the generated bitrate reaches the desired new bitrate. Because this equation does not have any previous history, the steps between successive quantization parameters are large, so the generated bitrate converges quickly to the desired new bit rate.
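The handling of a bitrate change may be sketched as a small state holder in Python. The class name, the method names, and the 5% tolerance used to decide that the generated bitrate is "close to" the desired bitrate are all assumptions made for illustration; the text does not specify a tolerance.

```python
class RateControllerState:
    """Tracks whether the controller is still converging to a new bitrate."""

    def __init__(self, target_bitrate: float, tolerance: float = 0.05) -> None:
        self.target = target_bitrate
        self.tolerance = tolerance  # assumption: relative closeness threshold
        self.state = "transient"
        self.history = []           # generated bitrates seen at this target

    def change_bitrate(self, new_target: float) -> None:
        """Enter the transient state and clear history (steps 52 and 53)."""
        self.target = new_target
        self.state = "transient"
        self.history.clear()

    def observe(self, generated_bitrate: float) -> str:
        """Record the encoder output and update the state (step 56)."""
        self.history.append(generated_bitrate)
        close = abs(generated_bitrate - self.target) <= self.tolerance * self.target
        if self.state == "transient" and close:
            self.state = "steady"
        return self.state
```

Clearing the history on a bitrate change is what allows the first few quantization parameter updates to take large steps, since no earlier frames dampen the correction.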
Once the generated bitrate reaches the desired new bitrate, the rate controller 12 operates in a “steady” state as indicated at 58. That is, the rate controller 12 begins processing incoming data frames using, for example, the dynamic rate control algorithm 30 described above in relation to
where this equation is a simple approximation of equation (5). Some of the parameters, such as bluredcomplexity, Cplxrsum, and wanted_bit_windows, should be modified according to the new bitrate.
The proposed dynamic rate control algorithm described above was compared with other rate control mechanisms in terms of bandwidth, frame size, PSNR values for Y, and SSIM values for different types of high definition video conferencing (HDVC) environments, ranging from low activity video conferencing to high activity video conferencing with a noisy background. Rate control in HDVC was evaluated in two main situations: a fixed bit rate, where the bit rate does not change significantly during the session, and a variable bit rate, in which the bit rate is changed from 2 Mbps to 1.5 Mbps. In the first situation, the desired bit rate for all test cases was targeted at 2 Mbps; i.e., the currently available bandwidth was assumed to be around 2 Mbps, and the reaction of the different rate control algorithms in that situation was measured. The videos used in this simulation were taken from an actual HDVC session. For the simulation, the low activity video is a video in which a person is sitting in front of the camera and the background has almost no modifications. For example, the person talks, moves his head, and raises his hands. Moreover, there is a whiteboard behind the person with some text written on it. The quality of the text is one of the areas evaluated in the subjective tests.
Performance of three select rate control methods is evaluated and described in terms of matched bit rate and video quality. The three select rate control methods include: an average bitrate method (ABR); a constant rate factor method (CRF); and the proposed dynamic rate control algorithm 20.
In order to compare the proposed dynamic rate control algorithm 20 against the other rate control methods, it is necessary to generate a similar bit rate with all of them. Therefore, the CRF method was set to 21 for a 2.7 Mbps bit rate without using VBV. This is the best possible value to generate the closest bit rate to 2 Mbps. The bit rates generated by the ABR method and the proposed dynamic rate control algorithm 20 were 2 Mbps. The first frame is encoded as an I frame and the next frames are encoded as P frames. That is why the first frame in
As can be seen in
Similar to PSNR, SSIM is another video quality indicator. Similar to
In the above tests, the proposed dynamic rate control algorithm 20 is compared with other rate control methods when the network capacity does not change. Even in these situations, the proposed dynamic rate control algorithm 20 has advantages over the other rate control algorithms. However, the value of the proposed dynamic rate control algorithm 20 is even more apparent when the network bandwidth changes. The faster a rate control algorithm adjusts to a new network bandwidth, the smaller the negative effects of the network will be on the video conversation. This premise was tested and the results are discussed next.
In one simulation, the video bit rate was changed from 2 Mbps to 1.5 Mbps. With reference to
Frame size values are evaluated when the bitrate has been changed as shown in
With regard to PSNR, the ABR method generates a large negative impact on the quality (especially in frame 101) and then increases the video quality as shown in
Similar behavior was observed in relation to SSIM as shown in
The techniques described herein may be implemented by one or more computer programs executed by one or more processors. The computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium. The computer programs may also include stored data. Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
Some portions of the above description present the techniques described herein in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects of the described techniques include process steps and instructions described herein in the form of an algorithm. It should be noted that the described process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present disclosure is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein.
The present disclosure is well suited to a wide variety of computer network systems over numerous topologies. Within this field, the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.