The present invention generally relates to encoding video data, and more specifically, to a method and device for controlling a bit rate when encoding the video data.
The demand for exchanging video data between mobile communication devices is growing as the speed and the quality of the transmitted video increase. When video data is transmitted in a wireless environment, which is known to be limited by the available bandwidth and plagued by quality problems, manipulating and/or storing large amounts of information consumes both time and resources. Increasing the quality of the transmitted video data requires a greater amount of information, as this increased amount of information provides for better visual quality. Thus, compression techniques were developed to compress the transmitted video to sizes that can achieve a balance between maintaining the visual quality and reducing the amount of information necessary for displaying the video.
In order to reduce the amount of information necessary to display video, the compression techniques take advantage of the human visual system. Information that cannot be perceived by the human eye is typically removed. In addition, information is often repeated across multiple frames in a video sequence. To reduce the amount of information, redundant information is also removed from a video sequence. A video compression technique is described in detail in the Moving Picture Experts Group-2 (MPEG-2) standard in ISO/IEC 13818-2, “Information technology-generic coding of moving pictures and associated audio information: Video, 1996,” which is incorporated herein by reference.
Some encoders are developed to perform in a constant bit rate (CBR) mode, where the average rate of the video stream is essentially the same from start to finish. A video stream includes a plurality of pictures or frames of various types, such as I, B and P picture types as defined by the MPEG-2 standard. A picture, depending on its type, may consume more or fewer bits than the set target rate of the video stream. The CBR rate-control strategy maintains a bit ratio between the different picture types of the stream, such that the desired average bit rate is satisfied, and a high quality video sequence is displayed. Other encoders, for example MPEG-2, perform in a variable bit rate (VBR) mode. Variable bit rate encoding allows each compressed picture to have a different number of bits based on the complexity of intra and inter-picture characteristics. For example, the encoding of scenes with simple picture content consumes fewer bits than scenes with complicated picture content, in order to achieve the same perceived picture quality.
VBR encoding is accomplished in non-real time using two or more passes because of the amount of information that is needed to characterize the video and the complexity of the algorithms needed to interpret the information to effectively enhance the encoding process. In a first pass, encoding is performed and statistics are gathered and analyzed. In a second pass, the results of the analysis are used to control the encoding process. Although the VBR process produces a high quality compressed video stream, it does not allow for real-time operation.
The discussed compression techniques are generally based on a rate control algorithm that dynamically adjusts the parameters of the encoder to achieve the desired target bit rate. The rate control algorithm allocates a budget of bits to each group of pictures, individual picture and/or sub-picture in a video sequence. Block-based hybrid video encoding schemes such as the MPEG and H.26* (standards for video compression developed by the ITU Telecommunication Standardization Sector) families are inherently lossy. These processes achieve compression not only by removing truly redundant information from the bit stream, but also by making small quality compromises in ways that are intended to be minimally perceptible. In particular, a quantization parameter (QP) regulates how much spatial detail is saved. When QP is small, almost all of that detail is retained. As QP is increased, some of that detail is aggregated so that the bit rate drops, but at the price of some increase in distortion and some loss of quality. In this regard,
Thus, a bit rate control unit may be a part in a video encoder. Often, when video data is to be encoded, there is a need for a specific size or rate of the encoded video in bits per second. In all modern video codecs, the bit rate (and the quality) of the encoded video depends on the level at which the transformation coefficients are quantized (QP). However, for different video sequences, different QP yields different bit rates. Thus, the QP may be changed as the coding advances in order to produce a certain bit rate.
Two different rate controls are described in Sullivan et al., JVT-1049, Joint Model Reference Encoding Methods and Decoding Concealment Methods, ftp3.itu.ch/av-arch/jvt-site/2003—09_SanDiego/JVT-1049d0.doc and Beom J, U.S. Patent Application Publication 20050036698, the content of which is incorporated by reference herein. Sullivan et al. describes a rate control that uses a Mean of Absolute Difference (MAD) from a previous picture to calculate the QP for a basic unit (an arbitrary number of macroblocks). Beom does not use information from a previous frame but divides the picture into rows of macroblocks and then a number of bits is allocated to a current picture on the basis of previous encoding results without defining a relation between an encoding rate and a distortion. The limited number of bits is not forced when features of the current picture are different from those of the previous pictures, and a quantizer scale is set adaptively to various features of the current picture without using an additional number of bits corresponding to variation of the quantizer scale.
However, the rate control mechanisms discussed above do not provide a method for rate control that focuses on conversational applications where there is a demand for fast encoding and constant bit rate. These mechanisms face a trade-off between performance and speed. Accordingly, it would be desirable to provide devices, systems and methods for controlling a bit rate that avoid the afore-described problems and drawbacks.
According to an exemplary embodiment, there is a method for controlling a bit rate when encoding video data that includes a plurality of frames. The method includes partitioning a received current frame into groups of blocks; estimating a current row energy for a current group of blocks, wherein the current row energy of the current group of blocks is based, at least in part, on a corresponding row energy associated with a corresponding group of blocks in a previous frame; determining a target number of bits for the current group of blocks; calculating a quantization parameter for the current group of blocks of the current frame based on the estimated current row energy of the current group of blocks and the determined target number of bits for the current group of blocks; and encoding the current group of blocks based on the calculated quantization parameter.
According to another embodiment, there is a computer readable medium for storing computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to control a bit rate when encoding video data that includes a plurality of frames. The instructions include partitioning a received current frame into groups of blocks; estimating a current row energy for a current group of blocks, wherein the current row energy of the current group of blocks is based, at least in part, on a corresponding row energy associated with a corresponding group of blocks in a previous frame; determining a target number of bits for the current group of blocks; calculating a quantization parameter for the current group of blocks of the current frame based on the estimated current row energy of the current group of blocks and the determined target number of bits for the current group of blocks; and encoding the current group of blocks based on the quantization parameter.
According to still another embodiment, there is a communication device for controlling a bit rate when encoding video data that includes a plurality of frames. The device includes a processor configured to partition a current frame into groups of blocks; to estimate a current row energy for a current group of blocks, wherein the current row energy of the current group of blocks is based, at least in part, on a corresponding row energy associated with a corresponding group of blocks in a previous frame; to determine a target number of bits for the current group of blocks; and to calculate a quantization parameter for the current group of blocks of the current frame based on the estimated current row energy of the current group of blocks and the determined target number of bits for the current group of blocks; and an encoding unit connected to the processor and configured to encode the current group of blocks based on the calculated quantization parameter.
According to another exemplary embodiment, there is a communication device for controlling a bit rate when encoding video data that includes a plurality of frames. The device includes means for partitioning a received current frame into groups of blocks; for estimating a current row energy for a current group of blocks, wherein the current row energy of the current group of blocks is based, at least in part, on a corresponding row energy associated with a corresponding group of blocks in a previous frame; for determining a target number of bits for the current group of blocks; and for calculating a quantization parameter for the current group of blocks of the current frame based on the estimated current row energy of the current group of blocks and the input target number of bits for the current group of blocks; and means for encoding the current group of blocks based on the calculated quantization parameter.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate one or more embodiments and, together with the description, explain these embodiments. In the drawings:
The following description of the exemplary embodiments refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims. The following embodiments are discussed, for simplicity, with regard to the video codec H.264 terminology. However, the embodiments discussed next are not limited to this codec but may be applied to other existing systems.
Reference throughout the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout the specification are not necessarily all referring to the same embodiment. Further, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
As shown in
The novel process for video compressing that will be discussed in the following embodiments may be implemented in the communication device 50. The processor 52 may be configured to act as an encoder/decoder and also as a control unit or the communication device 50 may be fitted with dedicated circuitry for encoding/decoding and controlling the video. According to an exemplary embodiment, the process partitions frames (a frame is one of the many still images which compose a complete moving picture), which are received at the communication device, into groups of blocks (GOB) and uses statistics of the groups of blocks to choose a quantization level that produces an even bit rate and a smooth quantization distribution within the frames. For example, one possible partition is dividing the frames into rows of macroblocks. A row of macroblocks may be defined as those blocks that share a common characteristic, for example a same QP or, more generally, a substantially similar QP (inclusive of a same QP). Throughout this disclosure, the GOBs are referred to as rows of macroblocks or simply rows. However, as described later, the processes discussed in the disclosure also work when the GOB is chosen as parts of, or a number of, macroblock rows.
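By way of illustration only, the simplest partition mentioned above, dividing a frame into rows of macroblocks, may be sketched as follows. The function name and the use of pixel-row ranges are assumptions made for this sketch; the 16-pixel macroblock height corresponds to H.264 luma macroblocks.

```python
def partition_into_rows(frame_height: int, mb_size: int = 16) -> list[tuple[int, int]]:
    """Return (first_line, end_line) pixel ranges, one per macroblock row.

    Hypothetical helper: each returned range is one group of blocks (GOB)
    when the GOB is chosen to be a single macroblock row.
    """
    return [(y, min(y + mb_size, frame_height))
            for y in range(0, frame_height, mb_size)]


# A QCIF frame (176x144 luma samples) yields nine macroblock rows.
rows = partition_into_rows(144)
```

A GOB made of several rows, or of part of a row, would only change how these ranges are grouped or split.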
According to an exemplary embodiment, the GOB is chosen to be a macroblock row and the video codec used is H.264. A codec is a device or program capable of encoding and/or decoding a digital data stream or signal. Thus, a frame is made up of a plurality of rows and in this embodiment the row is the GOB. Each GOB is characterized by energy and QP. The QP may be determined based on statistics from the same row in a previous frame or frames and from the previously encoded rows in the same frame. The energy, the QP and the size in bits of the encoded row may be linked by formula (1), which is characteristic for H.264 (for another codec formula (1) may be modified):
$$Q_i = -A \cdot \ln(\mathrm{Bits}_i) + C_i \tag{1}$$
In formula (1) “A” is a constant, and a value of “A” which was found from experiments to produce good results regardless of the encoding size, quality and content of the video may be A=4. “Q” describes the quantization parameter QP, “C” describes the energy associated with the respective row, “Bits” describes the number of bits necessary to compress the respective row, and “i” represents the respective row and varies from 1 to the number of rows in the frame. The formula for determining the size of a row (i.e., the number of bits necessary to compress the row) when knowing the QP is then:
$$\mathrm{Bits}_i = e^{(C_i - Q_i)/A} \tag{2}$$
The variable C may vary from row to row within a frame and from frame to frame within a sequence. If C is large, the number of bits for a certain QP is high, and if C is small, the number of bits for that QP is low. For this reason, C can be referred to as a measurement of the energy for that row. When a row has been encoded, both the QP and the number of bits (Bits) are known, and from these the energy for the row may be calculated using another rewriting of formula (1):
$$C_i = Q_i + A \cdot \ln(\mathrm{Bits}_i) \tag{3}$$
For a sequence of frames that does not include scene cuts or flashes, i.e., sudden changes in the image from a frame to the next frame, there is a high correlation between the energy of one row in a given frame and the same row in previous frames. In addition, if the energy has increased or decreased for one row of a current frame compared to a corresponding row of the previous frame, it is likely that the next row (the row below) in the current frame has changed in a similar way. Thus, according to an exemplary embodiment, the energy of a current row in a current frame, also referred to herein as the “current row energy”, may be estimated by formula (4):
in which “i” is the current row number and “j” is the current frame number. Optional weighting constants a and b may be included in the formula to indicate that the terms may be given equal or unequal weights. Formula (4) shows that the energy of the current row in the current frame is calculated as the mean of the energy of the row corresponding to the current row in the previous frame (also referred to herein as the “corresponding row energy”), and the energy of the previous row in the current frame (also referred to herein as the “previous row energy”) multiplied by the change in energy in the previous row of the current frame and the previous row in the previous frame. $C_{i-1}(j-1)$ is also referred to herein as “a block energy” of the previous group of blocks in the previous frame. More generally, the energy of a given row in a current frame may be estimated based on the energy of a previous row in the same frame and on the energy of the same row in a previous frame and/or a previous row in a previous frame.
Thus, when a row is to be encoded, the desired number of bits to be used for encoding is known (as will be discussed later) and the energy estimated from formula (4) may be substituted into formula (1) to obtain the QP to be used in the encoding. In this respect,
Calculating the QP for a row based on the bit rate of the row and the energy of the row is one aspect of the process for encoding/compressing the image. Various other aspects are involved in this process, which are discussed next. One or more of these aspects may be implemented in the communication device together with the calculation of the QP for the rows. After a discussion of some of these aspects, an exemplary process, which includes these aspects, is discussed for a better understanding of the disclosure.
The calculation of target bits for a frame is one of the aspects mentioned above. The process assumes that the target number of bits is known for each frame. The target number of bits for a frame may be decided by using an averaging buffer. The desired bit rate and frame rate are specified and given as input before the encoding starts. These rates may be used to calculate the target size for each frame. For example, if an averaging buffer is used, the target size for a frame may be 1000 bytes and, if the encoded frame has a size of 950 bytes, then the target for the next frame is set to 1050 bytes. Thus, if the encoder achieves the target on the second frame, the bit rate is evened out to an average of 1000 bytes per frame.
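The averaging-buffer example above may be sketched as follows. The function names are hypothetical, and carrying the whole accumulated deficit or surplus into the very next frame is an assumption of this sketch; the description only gives the two-frame example.

```python
def nominal_frame_bytes(bit_rate_bps: float, frame_rate_fps: float) -> float:
    """Nominal frame size in bytes implied by the desired bit rate and frame rate."""
    return bit_rate_bps / (8.0 * frame_rate_fps)


def next_frame_target(nominal_bytes: float, encoded_sizes: list[float]) -> float:
    """Target for the next frame so that the running average returns to nominal.

    Assumption: the full accumulated deficit (or surplus) is applied to the
    next frame rather than being spread over several frames.
    """
    deficit = nominal_bytes * len(encoded_sizes) - sum(encoded_sizes)
    return nominal_bytes + deficit


# 240 kbit/s at 30 frames/s gives 1000 bytes per frame; a first frame of
# 950 bytes moves the next target to 1050 bytes, as in the example above.
target = next_frame_target(nominal_frame_bytes(240_000, 30), [950])
```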
Another aspect of the encoding process is related to the calculation of the target bits for a row of a frame. When a frame has been encoded, the number of bits used for the different rows may vary. Two factors affect how many bits are used for encoding a row: the energy of the row and the QP used for the row. However, when determining the bit target for a row, it is desirable to take into account the variation in energy of the row but not the variation of the QP of the row. This is so because it is desirable to use the statistics from the previous frame. Ideally, the same QP is used for all the rows in a frame. However, because of the uncertainty in QP during encoding of the previous frame, it will typically have been encoded with different QPs for different rows. Thus, the encoding of the current frame may not use the true number of bits from the encoding of the previous frame when the targets for the current frame are set up. Rather, the encoding should estimate how many bits each row would have cost if all rows had been encoded with the same QP, and use this bit number, because it is desirable to use the same QP for all rows.
When a frame has been encoded, the energy is known for each row, and the average QP may be calculated. The energy may be inserted into formula (1) together with the average QP to determine an estimation of how the bits would have been distributed if all rows were encoded with the same QP. This distribution of the bits is used when calculating the target bits for each row in the next frame. It is noted that the calculated distribution of the bits to be used in the frame to be encoded is different from the distribution of the bits in the already encoded, previous, frame. In addition, the target bit for a row is calculated with respect to the bits left when the row above the current row has been encoded, in order to match the bit target of the frame. This process is not applied if there is low correlation between a previous frame and the current frame, as will be discussed later.
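A minimal sketch of this redistribution is given below. It recomputes each row energy with formula (3), estimates the row sizes at the average QP with formula (1) solved for Bits, and then derives a row target from the bits left in the frame. The function names, the use of an arithmetic mean for the average QP, and the expression of the distribution as normalized shares are assumptions made for illustration.

```python
import math


def equal_qp_shares(row_qps: list[float], row_bits: list[float], a: float = 4.0) -> list[float]:
    """Share of the frame bits each row would take if all rows used the average QP."""
    energies = [q + a * math.log(b) for q, b in zip(row_qps, row_bits)]  # formula (3)
    avg_qp = sum(row_qps) / len(row_qps)          # assumed: arithmetic mean
    estimated = [math.exp((c - avg_qp) / a) for c in energies]  # formula (1) solved for Bits
    total = sum(estimated)
    return [e / total for e in estimated]


def row_target_bits(frame_target: float, bits_used: float,
                    shares: list[float], row: int) -> float:
    """Target for `row`, scaled to the bits left after the rows above it."""
    remaining_bits = frame_target - bits_used
    remaining_share = sum(shares[row:])
    return remaining_bits * shares[row] / remaining_share


# Previous frame: two rows coded with different QPs.
shares = equal_qp_shares([28, 32], [3000, 1000])
first_target = row_target_bits(4000, 0, shares, 0)
# If row 0 then overshoots to 3200 bits, row 1 receives whatever is left.
second_target = row_target_bits(4000, 3200, shares, 1)
```

Because each target is computed from the bits still available, an overshoot or undershoot in the rows above is absorbed by the rows below, which keeps the frame total close to the frame target.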
Another aspect of the encoding process is related to the recoding of a row. In some cases, the QP that is calculated from formula (1) gives rise to a coding with too few or too many bits with respect to the target bit of the row, when comparing the target bits to a given threshold. If this situation occurs, the row may be recoded with a QP calculated from the energy that is retrieved in the first encoding to reduce or increase the number of bits relative to the given threshold. For the last row, the threshold is set to achieve the target bit of the entire frame, i.e., the difference between encoded bits and target bits is no more than a specified limit for the frame. According to an exemplary embodiment, each row has its own threshold set with respect to a target of the row and the last row has a threshold set with respect to the frame.
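The recoding step may be sketched as follows: the energy actually measured in the first encoding (formula (3)) replaces the estimated energy, and formula (1) is evaluated again for the row target. The function names and the expression of the threshold as a relative tolerance are assumptions of this sketch.

```python
import math


def needs_recode(actual_bits: float, target_bits: float, tolerance: float) -> bool:
    """True when the row misses its target by more than the allowed fraction.

    Assumption: the threshold is a relative tolerance around the row target.
    """
    return abs(actual_bits - target_bits) > tolerance * target_bits


def recode_qp(first_qp: float, actual_bits: float, target_bits: float,
              a: float = 4.0) -> float:
    """QP for the second attempt, from the energy measured in the first one."""
    measured_energy = first_qp + a * math.log(actual_bits)   # formula (3)
    return -a * math.log(target_bits) + measured_energy      # formula (1)


# First attempt at QP 30 produced 2000 bits against a 1000-bit target.
if needs_recode(2000, 1000, tolerance=0.2):
    second_qp = recode_qp(30, 2000, 1000)   # about 32.77 before rounding
```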
In another aspect related to the coding process, the QP may be changed within a row, i.e., a first QP is used for a first part of the row and a second QP, different from the first QP, is used for a second part of the same row. According to this exemplary embodiment, this (rare) case occurs when one QP for the last row gives too many bits for the whole frame (above a specified threshold) but the next QP value gives too few bits for the whole frame (below a specified threshold). In this particular case, the QP changes within the last row. A process that produces the right amount of bits (if the interval between the thresholds is wide enough) is to start with the lower QP and, after each macroblock, evaluate whether the total number of bits is above the lower threshold. When this threshold is passed, the QP is increased by one.
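The within-row switch may be sketched as follows. The callable `encode_macroblock` is a hypothetical stand-in for the real macroblock encoder and is assumed to return the number of bits produced; the toy cost model in the usage lines is invented for illustration only.

```python
from typing import Callable


def encode_last_row_split_qp(num_macroblocks: int, low_qp: int,
                             bits_before_row: int, lower_threshold: int,
                             encode_macroblock: Callable[[int, int], int]
                             ) -> tuple[list[int], int]:
    """Encode the last row, raising the QP by one once the frame total
    passes the lower threshold. Returns the QP used per macroblock and
    the total number of bits for the frame."""
    qp = low_qp
    total_bits = bits_before_row
    qps_used = []
    for mb in range(num_macroblocks):
        total_bits += encode_macroblock(mb, qp)
        qps_used.append(qp)
        # Checked after each macroblock; the switch happens only once.
        if qp == low_qp and total_bits > lower_threshold:
            qp = low_qp + 1
    return qps_used, total_bits


# Toy cost model (assumption): 100 bits per macroblock at QP 30, 80 at QP 31.
toy_encoder = lambda mb, qp: 100 if qp == 30 else 80
qps, frame_bits = encode_last_row_split_qp(5, 30, 0, 250, toy_encoder)
# qps == [30, 30, 30, 31, 31], frame_bits == 460
```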
In still another aspect related to the encoding process, the QP is temporally aligned. More specifically, when the QP is calculated with formula (1) for a current row in a current frame, the effective QP of a corresponding row in the previous frame is not taken into account, although it is easier to encode a row if that row had a low QP in the previous frame and harder to encode the row if that row had a high QP in the previous frame. To take into account the QP of the corresponding row in the previous frame, the current QP is approximated by:
where Q
In another aspect related to the encoding process, the QP fluctuation is moderated. The rate control discussion until now was silent about controlling the QP to not fluctuate within a picture. When the QP fluctuates within a picture, different parts of the picture may have different visual qualities. Thus, according to an exemplary embodiment, every second row is assigned a high QP and every other row is assigned a low QP. In order to obtain smoother QP transitions, a limitation for the first QP guess (not the recodings) is introduced, with respect to the row above. Having alternating low and high QPs from one row to the next row is undesirable. That is, if one row is encoded with too many bits (but still below the threshold) the next may be given much higher QP (because the bit budget is reduced), but may be encoded with fewer bits than the target. Then the QP is increased and the alternating problem occurs. The above discussed exemplary embodiment solves this problem by introducing those restrictions.
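One way to express the limitation on the first QP guess with respect to the row above is a simple clamp, sketched below. The function name and the maximum step of two QP units are assumptions; the description does not state the size of the restriction.

```python
from typing import Optional


def limit_first_qp_guess(qp_guess: int, qp_row_above: Optional[int],
                         max_step: int = 2) -> int:
    """Keep the first QP guess within max_step of the QP used for the row above.

    Applied only to the first guess for a row, not to recodings.
    The first row of a frame has no row above and is left unchanged.
    """
    if qp_row_above is None:
        return qp_guess
    return max(qp_row_above - max_step, min(qp_row_above + max_step, qp_guess))


# A guess of 36 following a row coded at QP 30 is limited to 32.
limited = limit_first_qp_guess(36, 30)
```

Limiting only the first guess leaves the recoding step free to use whatever QP is needed to meet the row threshold.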
In another aspect related to the encoding process, an Extracted Sum of Absolute Differences (ESAD) novel method is used to determine differences between two frames. In natural video sequences (i.e., where the action flows without sudden changes in the frames), there is often a high correlation between two consecutive frames. This correlation is used by the modern video codecs and also when estimating the energy for a row. If there were no differences between two consecutive frames, the video sequence would just be a static image. In general, the stronger the differences between successive frames (motion changes, new objects, luminance changes etc.), the harder it is to encode (higher QP is required for a certain bit-rate) the respective frames. ESAD uses the motion vectors from the previously encoded frame together with the current frame that is going to be encoded. The Sum of Absolute Differences (SAD) is calculated for all pixels in a row with the motion vector of the latest encoded frame applied to the pixel positions, as indicated by
In this regard,
Assume that the character 808 has further moved from frame 802 to frame 804 along a straight line, for simplicity. Thus, if the motion vector encoded for frame 802 is extracted and applied to frame 804 and the sum of absolute differences is calculated for the pixels of rows 808 of frames 802 and 804, which have the same position, the ESAD value is determined. In this case, the ESAD value is low, correctly indicating a high correlation between frames 802 and 804.
With regard to Case 2, the difference from Case 1 is that character 806 moves differently from frame 802 to frame 804 than from frame 800 to frame 802. Thus, the motion vector 812 extracted from the encoding of frame 802 does not describe well the movement of the character 806 in frame 804, which results in a large ESAD value, correctly indicating the low correlation of frames 802 and 804.
However, a high ESAD value does not necessarily mean that the corresponding QP shall be increased from the previous frame, because it is possible that the previous frame also had high ESAD value. It is the difference between the ESAD value from the current frame and the ESAD value from the previous frame that reveals if the QP should be increased. Thus, based on the calculated ESAD value of the current frame relative to the previous frame, it can be detected if drastic changes are occurring in the sequence of frames.
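A simplified sketch of an ESAD computation for one macroblock row is given below. It assumes one motion vector per macroblock taken from the previously encoded frame, frames stored as lists of luma sample rows, a vector that points from a pixel position in the current frame to its reference position in the previous frame, and clamping of reference positions at the frame border. None of these details is specified in the description, and the function name is hypothetical.

```python
def esad_for_row(current: list[list[int]], previous: list[list[int]],
                 row_index: int, motion_vectors: list[tuple[int, int]],
                 block: int = 16) -> int:
    """Sum of absolute differences for one macroblock row, with the motion
    vectors of the latest encoded frame applied to the pixel positions.

    motion_vectors holds one (dx, dy) pair per macroblock in this row.
    """
    height, width = len(current), len(current[0])
    y_start = row_index * block
    total = 0
    for y in range(y_start, min(y_start + block, height)):
        for x in range(width):
            dx, dy = motion_vectors[x // block]
            ref_y = min(max(y + dy, 0), height - 1)   # clamp at frame border
            ref_x = min(max(x + dx, 0), width - 1)
            total += abs(current[y][x] - previous[ref_y][ref_x])
    return total


# Tiny example with 2x2 "macroblocks": content moves one pixel to the right.
previous = [[0, 10, 20, 30], [0, 10, 20, 30]]
current = [[0, 0, 10, 20], [0, 0, 10, 20]]
same_motion = esad_for_row(current, previous, 0, [(-1, 0), (-1, 0)], block=2)  # 0
no_motion = esad_for_row(current, previous, 0, [(0, 0), (0, 0)], block=2)      # 60
```

When the extracted vectors still describe the motion, the ESAD is low; when the motion has changed, the same vectors give a high ESAD.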
In another aspect related to the encoding process, a scene change may be detected by using the ESAD method described above. If a sequence of frames that is to be encoded contains scene cuts or abrupt scene changes (such as flashes or high speed motions), it may be necessary to make sharp increases in the QP. In these cases, there is a low correlation between the frame or frames just before the scene change and the frame or frames after the scene change. To determine whether a scene change occurs between two or more frames, at least one of the following two conditions should take place: (1) the increase in the ESAD value for the two adjacent frames may be above a certain threshold, and (2) the ESAD value itself may be above a certain threshold. In an exemplary embodiment, both values should be above the corresponding thresholds. Once a scene change is identified, the encoding parameters for a frame that is the first in the scene are set in the same way as they are set when a first frame in a sequence is encoded. Thus, the estimated QP for the frame in the scene change is calculated, according to an exemplary embodiment, only from the target bits for the frame and the frame size. Therefore, it is desirable to calculate the ESAD values prior to encoding in order to decide a suitable encoding method.
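The two conditions may be sketched as a single test. The function name and the threshold parameters are assumptions; the description does not give threshold values.

```python
def is_scene_change(esad_current: float, esad_previous: float,
                    increase_threshold: float, level_threshold: float,
                    require_both: bool = True) -> bool:
    """Detect a scene change from frame-level ESAD values.

    Condition (1): the ESAD increase between adjacent frames exceeds a threshold.
    Condition (2): the ESAD value itself exceeds a threshold.
    require_both=True follows the exemplary embodiment in which both hold.
    """
    increase_is_high = (esad_current - esad_previous) > increase_threshold
    level_is_high = esad_current > level_threshold
    if require_both:
        return increase_is_high and level_is_high
    return increase_is_high or level_is_high


# Illustrative numbers only: a jump from 1000 to 9000 with thresholds 5000 / 8000.
cut_detected = is_scene_change(9000, 1000, increase_threshold=5000, level_threshold=8000)
```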
In another aspect related to the encoding process, an energy change detection and compensation in the rows of a frame is performed using the ESAD method. As described above, an increase in energy may be detected by analyzing the difference of the ESAD values for two different but adjacent frames. If the ESAD value has increased (significantly) in some rows but not in all rows of one frame but no scene change is detected, two operations may be performed. First, the bit distribution, which is used in the encoding, is increased for those rows that have an ESAD increase to accommodate the increase in energy. Second, the Qtemporal introduced in equation (5) may be increased for all the rows in the frame, to ensure that the quality level is smooth for the whole frame even when QP has to be increased. The magnitudes of the increments are determined based on the magnitude of the ESAD increments. In one exemplary embodiment, the magnitude of the increments is proportional to the magnitude of the ESAD increments and they have a maximum value.
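The proportional, capped increment and its effect on the row bit distribution may be sketched as follows. The gain, the cap, the multiplicative form of the boost and the renormalization of the shares are all assumptions made for illustration.

```python
def capped_increment(esad_increase: float, gain: float, max_increment: float) -> float:
    """Increment proportional to the ESAD increase, limited to a maximum value."""
    if esad_increase <= 0:
        return 0.0
    return min(gain * esad_increase, max_increment)


def compensate_row_shares(row_shares: list[float], row_esad_increase: list[float],
                          gain: float, max_increment: float) -> list[float]:
    """Raise the bit share of rows whose ESAD increased, then renormalize
    so that the shares still sum to one (assumed policy)."""
    boosted = [share * (1.0 + capped_increment(delta, gain, max_increment))
               for share, delta in zip(row_shares, row_esad_increase)]
    total = sum(boosted)
    return [b / total for b in boosted]


# Only the first of two rows shows an ESAD increase; its increment hits the cap.
new_shares = compensate_row_shares([0.5, 0.5], [100, 0], gain=0.01, max_increment=0.5)
```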
The novel rate control and associated aspects discussed above have been implemented in an H.264 encoder. However, the process is not limited to H.264 but may be used with any codec that uses quantization levels. For those cases, the Bits, QP, and Energy formulas (1), (2) and (3) are adjusted depending on the nature of the quantization for that codec (for example logarithmic or linear).
In an exemplary embodiment, the frames may be partitioned in other ways than macroblock rows. The GOBs may consist of a number of rows or part of a row. In this respect,
An exemplary encoding process is discussed with reference to
The encoding process starts in step 100 with receiving one or more frames that have to be encoded. In step 102 the current frame is partitioned into groups of blocks, for example, groups of rows. In step 104, the encoding device calculates the row target bit distribution. In step 106, the frame is encoded. In step 108, the communication device verifies whether new frames have to be encoded. If more frames have to be encoded, the process returns to step 100. If no frames are left to be encoded, the encoded frames are transmitted to another communication device and the process ends.
Step 106 of encoding the frame includes various substeps which are shown in
Various optional features may be added to the process described in
The rate control process disclosed in the exemplary embodiments has a low complexity overhead and is efficient at achieving a target within a specified interval. The process is also flexible as it works for many frame sizes, bit rates, frame rates and the process may be applied to all video codecs that use quantization levels.
According to an exemplary embodiment, a method for controlling a bit rate when encoding video data is illustrated in
The disclosed exemplary embodiments provide a communication system, a method and a computer program product for encoding video data. It should be understood that this description is not intended to limit the invention. On the contrary, the exemplary embodiments are intended to cover alternatives, modifications and equivalents, which are included in the spirit and scope of the invention as defined by the appended claims. Further, in the detailed description of the exemplary embodiments, numerous specific details are set forth in order to provide a comprehensive understanding of the claimed invention. However, one skilled in the art would understand that various embodiments may be practiced without such specific details.
As also will be appreciated by one skilled in the art, the exemplary embodiments may be embodied in a wireless communication device, a telecommunication network, as a method or in a computer program product. Accordingly, although the exemplary embodiments described above refer to the usage of an encoder 10 and a CPU 52 which can be used to perform, for example, the various encoding and partitioning, estimating, determining and calculating functions, respectively, it will be appreciated that other “means” for performing the afore-described functions may likewise be used in other exemplary embodiments. For example, in such exemplary embodiments the “means” for performing such functions may take the form of pure hardware, pure software or both hardware and software components. Thus, the exemplary embodiments or “means” may take the form of a computer program product stored on a computer-readable storage medium having computer-readable instructions embodied in the medium. Any suitable computer readable medium or “means” may be utilized including hard disks, CD-ROMs, digital versatile disc (DVD), optical storage devices, or magnetic storage devices such as a floppy disk or magnetic tape. Other non-limiting examples of computer readable media or “means” include flash-type memories or other known memories.
The present exemplary embodiments may be implemented in a user terminal, a base station, and generally in a wireless communication network or system comprising both the user terminal and the base station. Thus, the exemplary embodiments, as well as the various “means” for performing the afore-described functions may also be implemented in an application specific integrated circuit (ASIC), or a digital signal processor. Suitable processors or “means” include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), and/or a state machine. A processor in association with software may be used to implement a radio frequency transceiver for use in the user terminal, the base station or any host computer. The user terminal may be used in conjunction with modules, implemented in hardware and/or software, such as a camera, a video camera module, a videophone, a speakerphone, a vibration device, a speaker, a microphone, a television transceiver, a hands free headset, a keyboard, a Bluetooth module, a frequency modulated (FM) radio unit, a liquid crystal display (LCD) display unit, an organic light-emitting diode (OLED) display unit, a digital music player, a media player, a video game player module, an Internet browser, and/or any wireless local area network (WLAN) module.
Although the features and elements of the present exemplary embodiments are described in the embodiments in particular combinations, each feature or element can be used alone without the other features and elements of the embodiments or in various combinations with or without other features and elements disclosed herein. The methods or flow charts provided in the present application may be implemented in a computer program, software, or firmware tangibly embodied in a computer-readable storage medium for execution by a general purpose computer or a processor.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/SE08/50766 | 6/25/2008 | WO | 00 | 11/10/2010 |