1. Field
The present disclosure relates generally to the field of video data processing, and, more particularly, to a multi-panel rate control method for real-time digital video encoders such as an MPEG-4 or an H series encoder.
2. Related Art
Video signals generally include data corresponding to one or more video frames, where each video frame is composed of an array of picture elements (pels). A typical color video frame at standard resolution can be composed of several hundred thousand pels arranged in an array of blocks. Since each pel has to be characterized by color (or hue) and luminance, these data may be represented with groups of four luminance pel blocks and two chrominance pel blocks called macroblocks (MBs). Thus, digital signals representing a sequence of video frame data, usually containing many video frames, have a large number of bits. However, the available storage space and bandwidth for transmitting such signals are limited. Therefore, compression processes are used to transmit or store video data more efficiently.
Compression of digital video signals for transmission or for storage has become widely practiced in a variety of contexts, especially in multimedia environments for video conferencing, video games, Internet image transmissions, digital TV and the like. Coding and decoding are accomplished with coding processors, which may be general-purpose computers, special hardware, multimedia boards or other suitable processing devices.
Compression processes typically involve quantization, in which sampled video signal data values are represented by a fixed number of predefined quantizer values. The quantized signal is composed of quantizer values that are approximations of the sampled video signal. Therefore, encoding the video signal data onto a limited number of quantizer values necessarily produces some loss in accuracy during the decoding process.
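As a minimal illustration of why quantization is lossy, the following sketch maps sample values onto a fixed set of levels with a uniform step and then reconstructs them. The uniform step, the function names and the sample values are assumptions made for illustration only and are not taken from the present disclosure.

```python
def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of `step`."""
    return [int(round(s / step)) for s in samples]


def dequantize(levels, step):
    """Reconstruct approximate sample values from quantizer indices."""
    return [q * step for q in levels]


samples = [3, 18, 41, 77, 121]   # hypothetical sample values
step = 16                        # hypothetical quantizer step size

levels = quantize(samples, step)
approx = dequantize(levels, step)

# The reconstruction error is bounded by half a step but is generally nonzero.
errors = [s - a for s, a in zip(samples, approx)]
```

A larger step leaves fewer distinct levels to code, and therefore fewer bits, at the price of a larger reconstruction error. This is the trade-off that a rate control method exploits when it adjusts the quantization parameter.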
The following disclosure describes embodiments of a method which provides improved digital video data compression capable of adjusting the quantization parameter to achieve an improved coding and decoding process. This method may employ an encoder having a panel-based architecture, with a digital signal processor handling one or several rows of MBs in lieu of frame-by-frame processing, thereby allowing a greater number of frames to be processed. Various embodiments of this method can further handle both frame and field pictures, as opposed to a single picture mode such as frame pictures.
The embodiments may perform several steps including bit allocation, rate control and adaptive quantization. Bit allocation assigns a target number of bits per group of pictures, and per picture of each type. Rate control adjusts the quantization parameter at the MB level to achieve that target number of bits per picture. Adaptive quantization further modulates the parameter per MB using a local activity measure. Bit allocation and rate control can be implemented through a central control unit, or a central digital signal processor (DSP) while adaptive quantization can be implemented at the local panel. One encoder employing the method may have a central DSP and several panels, each with its own DSP. During the process of encoding, a whole frame is divided into multiple slices which are processed in parallel by the DSPs at the multiple panels.
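The division of a frame into slices for the panels can be sketched as follows. The disclosure does not state how slice sizes are chosen, so the near-even split of MB rows, the function name and the example dimensions are all assumptions.

```python
def split_mb_rows(num_mb_rows, num_panels):
    """Return one (first_row, row_count) pair per panel.

    Assumption: MB rows are spread as evenly as possible, with any
    remainder going to the first panels.
    """
    base, extra = divmod(num_mb_rows, num_panels)
    slices = []
    start = 0
    for panel in range(num_panels):
        count = base + (1 if panel < extra else 0)
        slices.append((start, count))
        start += count
    return slices


# A 720-line picture has 45 rows of 16x16 MBs; four panels are hypothetical.
slices = split_mb_rows(45, 4)
```

Each panel's DSP would then encode its own slice while the central DSP performs bit allocation and rate control for the picture as a whole.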
An embodiment capable of implementing the present disclosure may be a video data encoding apparatus comprising a processor, an input/output device, a memory, and a video encoding module capable of performing bit allocation by assigning a target number of bits per GOP, rate control by adjusting the quantization parameter QP to achieve said target number of bits, and adaptive quantization by modulating the quantization parameter using the local activity measure. Such an embodiment shall be capable of handling scene changes within a GOP, and of checking and adjusting the target number of bits assigned to an I, P or B picture in order to prevent the buffers from overflowing and underflowing.
The above-mentioned features of the present disclosure will become more apparent with reference to the following description taken in conjunction with the accompanying drawings wherein like reference numerals denote like elements and in which:
The following detailed description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating general principles of embodiments of the present disclosure. The scope of the present disclosure is best defined by the appended claims.
In one embodiment, the real-time video encoder system 10 is implemented on a general-purpose computer or any other hardware equivalent. Thus, the real-time video encoder system 10 may comprise a processor (CPU) 11, memory 12, e.g., random access memory (RAM) and/or read only memory (ROM), video encoding module 14, and various input/output devices 13 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive).
It should be understood that the video encoding module 14 may be implemented as one or more physical devices that are coupled to the processor 11 through a communication channel. Alternatively, the video encoding module 14 may be represented by one or more software applications or with a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), where the software is loaded from a storage medium, (e.g., a magnetic or optical drive or diskette) into memory 12 and operated on by the processor 11 of the video encoding module 14. As such, the video encoding module 14 (including associated data structures) of the present embodiment may be stored on a computer readable medium, e.g., RAM memory, magnetic or optical drive or diskette or the like.
Real-time video encoders may have a multi-panel architecture for processing a whole picture. In such an architecture, a picture 21 is divided into several slices 22, and each panel 23 processes one of these slices 22, as shown in
In one embodiment, it is assumed in the following that a picture can be of type intra picture (I), predictive coded picture (P), or bi-directional predictive coded picture (B).
With respect to the first main step of bit allocation 32, pictures of the input video sequence are grouped into GOPs. A GOP 50 may contain one I picture 51 and a few P pictures 52, as shown in
Target Rate Per GOP: Given a target bit rate of bit_rate in bits per second and a picture rate of pic_rate in pictures per second, a GOP 50 of NGOP pictures is budgeted a nominal number of bits as
At the beginning of a GOP 50, the central DSP calculates a target number of bits, RGOP
Target Rate Per Picture: Given a target number of bits for a GOP 50, RGOP
For instance, in one embodiment of an encoder such as an MPEG-4 AVC/H.264 encoder, encoding an MB may require the coded information of its left and above neighbor MBs. The geometric positions of the current MBs in the panels therefore may not be the same, as shown in
The present method allows an interlaced picture of two fields, field 0 and field 1, to be encoded as a single frame picture or as two separate field pictures. An encoder such as an MPEG encoder may allow adaptive switching between frame and field picture coding. The rate control method therefore maintains two sets of complexity measures for pictures of pic_type I, P and B: one for frame pictures and one for field pictures. The target numbers of bits for frame pictures of pic_type I, P and B are set as
and the target numbers of bits for field pictures of pic_type I, P and B are set as
where pic_type indicates the picture type of I, P or B for the current picture; Cpic
GOP Remaining Bits Updating: After encoding a picture of type I, P or B, the remaining number of bits for the current GOP is updated as RGOP
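The GOP-level bookkeeping described above can be sketched as follows. The equations themselves are not reproduced in this text, so the sketch assumes the conventional form in which the nominal budget is the GOP duration multiplied by the bit rate, and in which the remaining budget is reduced by the bits actually spent on each coded picture. The class name, the carry_over argument and the example figures are hypothetical.

```python
def nominal_gop_bits(n_gop, bit_rate, pic_rate):
    """Bits a GOP of n_gop pictures receives at exactly bit_rate (assumed form)."""
    return n_gop * bit_rate / pic_rate


class GopBudget:
    """Tracks the remaining bits of the current GOP (illustrative only)."""

    def __init__(self, n_gop, bit_rate, pic_rate, carry_over=0.0):
        # carry_over is an assumption: the surplus or deficit left by the
        # previous GOP, added to the nominal budget of the new one.
        self.remaining = nominal_gop_bits(n_gop, bit_rate, pic_rate) + carry_over

    def picture_coded(self, bits_used):
        """Subtract the bits actually produced by the picture just encoded."""
        self.remaining -= bits_used
        return self.remaining


budget = GopBudget(n_gop=15, bit_rate=4_000_000, pic_rate=30)
start = budget.remaining
after_first_picture = budget.picture_coded(400_000)
```

With these example figures, a 15-picture GOP at 4 Mbit/s and 30 pictures per second starts with 2,000,000 bits, and 1,600,000 bits remain after a 400,000-bit picture.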
Complexity Initialization: At the beginning of a sequence, the complexity measures for frame and field pictures are initialized. For example,
After the first I and P frame pictures, the complexity measure for B frame picture is set based upon the updated complexity measures of I and P. For example,
Cframe
If the first I frame is coded as one I field followed by one P field, the complexity measures for P field 0 and B field pictures are set based upon the updated complexity measures of I and P. For example,
Note that the above settings for complexity measures are implemented only once per sequence.
Complexity Updating: The complexity measure of pic_type I, P or B is defined as the product of the number of bits used for a picture of pic_type I, P or B and the associated coding distortion, D, that is, Cpic
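A minimal sketch of this definition follows, with one set of measures kept for frame pictures and one for field pictures. The dictionary layout, the function name and the example values are assumptions; the disclosure does not specify in this text how the distortion D is measured.

```python
def update_complexity(complexity, mode, pic_type, bits_used, distortion):
    """Store complexity = bits_used * distortion for the given mode and type.

    mode is "frame" or "field"; pic_type is "I", "P" or "B".
    """
    value = bits_used * distortion
    complexity[(mode, pic_type)] = value
    return value


complexity = {}  # hypothetical container holding both sets of measures
c_frame_i = update_complexity(complexity, "frame", "I", 400_000, 2.5)
c_field_p = update_complexity(complexity, "field", "P", 90_000, 3.0)
```

A picture that needs many bits and still shows high distortion yields a large complexity value, which in turn would earn pictures of that type a larger share of the GOP budget.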
Note that a picture is encoded only once, either in frame mode or in field mode. However, the complexity measures in both frame and field modes are updated. Specifically, when a picture is coded in frame mode, the complexity measures in frame mode are updated using equation Cpic
When a picture is coded in field mode, the complexity measures in field mode are updated using the equations
Cpic
After field 1 is coded, the complexity measures in frame mode are also updated as
Picture Number Updating: The numbers of I, P and B (frame) pictures per GOP, NI, NP, and NB, are pre-set. For example, assume there is only one I frame in a GOP of NGOP and Nsub
Further assume that I in field mode is configured to be coded as two I fields, or I field 0 followed by P field 1, or P field 0 followed by I field 1, or I field 0 followed by B field 1, or B field 0 followed by I field 1, and that P and B in field mode are configured as two P fields and two B fields, respectively. Other configurations for P and B in field mode are also possible.
At the beginning of a GOP, the remaining numbers of I, P and B frame and field pictures for the current GOP are set as
and if I in field mode is configured to be coded as two I fields,
or if I in field mode is configured to be coded as I field 0 followed by P field 1,
or if I in field mode is configured to be coded as P field 0 followed by I field 1,
or if I in field mode is configured to be coded as one I field and one B field,
After a frame picture of I, P or B is encoded, the corresponding number of I, P or B pictures in the current GOP is updated in the following manner: if it is an I picture and if the I picture in field mode is configured to be coded as two I fields, then
or if the I picture in field mode is configured to be coded as I field 0 followed by P field 1, then
or if the I picture in field mode is configured to be coded as P field 0 followed by I field 1, then
or if the I picture in field mode is configured to be coded as one I field and one B field,
else, if it is a P picture, then
After field 0 of I, P, or B is encoded, the corresponding number of I, P or B pictures in the current GOP is updated in the following manner: if it is an I picture, then nfield
After field 1 of I, P or B is encoded, the corresponding number of I, P or B pictures in the current GOP is updated in the following manner: if it is an I picture, then
else if it is a P picture and if field 0 is coded as I, then nframe
Scene Change Handling: In one embodiment, a proposed encoder system allows preview beyond the current GOP in handling a scene change. If a scene change occurs within a GOP and the I picture in the GOP is in the new scene, no action is taken. Otherwise, the first P picture in the new scene is changed to an I picture. The following process is invoked depending upon whether the first P picture in the new scene is in the first half or the second half of the GOP.
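The branching described in this section can be sketched as follows. The sketch only selects the action; it does not compute the resulting GOP lengths, whose equations are not reproduced in this text. The function name, the returned dictionary and the choice of N ≤ NGOP/2 as the boundary of the "first half" are assumptions.

```python
def scene_change_action(n, n_gop, i_in_new_scene):
    """Choose how to restructure GOPs around a scene change.

    n: 1-based position in the GOP of the first P picture of the new scene.
    i_in_new_scene: True if the GOP's I picture already lies in the new scene.
    """
    if i_in_new_scene:
        # The new scene already has an I picture, so nothing is changed.
        return {"promote_p_to_i": False, "demote_i": None, "gop_order": None}
    if n <= n_gop / 2:  # assumed boundary of the first half
        # The scheduled I of the current GOP becomes a P picture.
        return {"promote_p_to_i": True, "demote_i": "current_gop",
                "gop_order": ("longer", "shorter")}
    # The scheduled I of the next GOP becomes a P picture.
    return {"promote_p_to_i": True, "demote_i": "next_gop",
            "gop_order": ("shorter", "longer")}


early = scene_change_action(n=3, n_gop=15, i_in_new_scene=False)
late = scene_change_action(n=12, n_gop=15, i_in_new_scene=False)
```

In both restructuring cases the new scene begins with an I picture, while one scheduled I picture is coded as P so that the two affected GOPs together retain the same number of I pictures.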
Assume that the first P picture in the new scene is the Nth picture of the GOP. If the first P picture is in the first half of the GOP, the scheduled I picture in the GOP is changed to a P picture. This creates a longer GOP 70 followed by a shorter current GOP 71, as shown in
The corresponding numbers of I, P and B frame and field pictures for the longer and shorter GOPs 70, 71 can be calculated from the above equations using the updated GOP lengths.
The nominal number of bits for the longer GOP 70 is set as
and the nominal number of bits for the shorter GOP 71 is reset as
On the other hand, if the first P picture in the new scene is in the second half of the GOP, the scheduled I picture in the next GOP is changed to a P picture. This creates a shorter GOP 72 followed by a longer GOP 73, as shown in
The shorter GOP 72 is of the length equal to NGOP=N−Nsub
Similarly, the corresponding numbers of I, P and B frame and field pictures for the shorter and longer GOPs 72, 73 can be calculated from the above equations using the updated GOP lengths. The nominal number of bits for the shorter GOP 72 is reset as
and the nominal number of bits for the longer GOP 73 is reset as
An alternative embodiment to the above embodiment, which compensates the longer or the shorter GOP, is described in
Rate Control: The target number of bits per frame or field may be achieved by properly selecting a value of QP per MB or per group of MBs. An MPEG-4 AVC/H.264 encoder, for instance, allows a total of 52 possible values of the quantization parameter (QP), i.e., 0, 1, 2, ..., 51. Given the target numbers of bits for (frame or field) pictures of pic_type I, P and B, Tpic
is the initial virtual buffer fullness at the beginning of the picture of pic_type I, P or B in frame or field. The final virtual buffer fullness of the current picture, dpic
The above assumes that each MB uses the same nominal number of bits. An alternative embodiment provides for weighting the bit budget per MB according to its need. For example,
where acti is the local activity measure of MB(i),
and the index i is over all the MBs in the current picture. Another way of determining the virtual buffer fullness is by the equation
where costi is the cost measure of MB(i) (often used in mode decision), and
and the index i is over all the MBs in the current picture. The above two options tend to distribute the bits over MBs of a picture according to their need. The initial values of the virtual buffer fullness are set as dpic
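The virtual buffer fullness options discussed above can be sketched with a single function. The equations are not reproduced in this text, so the sketch assumes the conventional form: the initial fullness, plus the bits produced so far, minus the share of the picture target budgeted to the MBs already processed. The function name and the example numbers are hypothetical.

```python
def virtual_buffer_fullness(d_init, bits_so_far, target_bits, weights, mbs_done):
    """Assumed form: d = d_init + bits_so_far - target_bits * budget_share.

    weights holds one non-negative value per MB of the picture:
      all equal   -> every MB is budgeted the same number of bits;
      act_i       -> budget weighted by local activity;
      cost_i      -> budget weighted by the mode-decision cost.
    """
    budget_share = sum(weights[:mbs_done]) / sum(weights)
    return d_init + bits_so_far - target_bits * budget_share


# Uniform budget: one of four MBs done, so 25% of the target is budgeted.
d_uniform = virtual_buffer_fullness(1000, 600, 2000, [1, 1, 1, 1], 1)

# Activity-weighted budget: the first two MBs carry 4/8 of the total activity.
d_weighted = virtual_buffer_fullness(1000, 600, 2000, [1, 3, 2, 2], 2)
```

A fullness that rises indicates that more bits are being produced than budgeted, which would call for a larger reference QP; a falling fullness would call for a smaller one.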
The same reference quantization parameter, QPpic
Interval for Updating Reference QP: The central DSP checks the virtual buffer fullness at a constant, or variable, interval. The interval may be set around the average time for processing one or several MBs. At each checking time instant, say t, the central DSP receives from each of the panels of the current (one, two or three) pictures of pic_type I, P and/or B the number of MBs that have been processed since the last checking time, and the associated bit, or bin, counts of the processed MBs. Note that, due to the complexity of each MB as well as the possibly different coding modes assigned, the panels may not be synchronized in processing their current MBs. Hence, at time t, panels may report slightly different numbers of processed MBs, that is, 1, 0, or other numbers. The central DSP then re-computes the virtual buffer fullness dpic
Adaptive Quantization: The reference quantization parameter, QPpic
and xk(i, j) are the original pixel values of MB/sub_MB partition (k). Normalized local activity is given by
where β is a constant and avg_act is the average value of actj of the picture. The reference quantization parameter QPpic
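The normalization can be sketched as follows. The equation is not reproduced in this text; the form below, which is symmetric in the local and average activity and confines the result to the range 1/β to β, is an assumption consistent with the description of β and avg_act. The function name, the default β = 2 and the zero-activity guard are likewise assumptions.

```python
def normalized_activity(act, avg_act, beta=2.0):
    """Assumed form: (beta * act + avg_act) / (act + beta * avg_act)."""
    denominator = act + beta * avg_act
    if denominator == 0:
        # A completely flat picture: treat every MB as average.
        return 1.0
    return (beta * act + avg_act) / denominator


flat_mb = normalized_activity(act=0.0, avg_act=10.0)      # smooth region
average_mb = normalized_activity(act=10.0, avg_act=10.0)  # typical region
busy_mb = normalized_activity(act=1e9, avg_act=10.0)      # very busy region
```

Under this form, an MB with below-average activity receives a value below 1 and would be quantized more finely, whereas a busy MB, in which quantization noise is less visible, receives a value above 1.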
Additional Buffer Protection: Assume that buffer_delay and decoder_buffer_size are the buffer delay and the decoder buffer size, respectively. The encoder buffer size can be set as buffer_size=min(buffer_delay, decoder_buffer_size). To prevent the overflow and underflow of both the encoder and decoder buffers, the target number of bits determined for the current picture in bit allocation, Tpic
Assume that buffer_occupany is the buffer occupancy of the encoder buffer. Before encoding a picture, the target number of bits assigned for the picture is checked and, if necessary, adjusted as follows: if
buffer_occupany+Tpic
Tpic
buffer_occupany+Tpic
Tpic
where α is a constant, and can be set, for example, to be between 0.90 and 0.95.
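The check can be sketched as follows. The conditions themselves are not reproduced in this text, so the sketch assumes that the target is reduced when it would push the encoder buffer above α·buffer_size and raised when the buffer, after draining, would fall below (1−α)·buffer_size. The function name, the bits_drained argument (the bits leaving the encoder buffer during one picture period) and the example figures are hypothetical.

```python
def protect_target_bits(target, buffer_occupancy, buffer_size,
                        bits_drained, alpha=0.9):
    """Adjust a picture's target bits to keep the encoder buffer in range."""
    upper = alpha * buffer_size          # assumed overflow guard level
    lower = (1.0 - alpha) * buffer_size  # assumed underflow guard level
    if buffer_occupancy + target > upper:
        target = upper - buffer_occupancy
    elif buffer_occupancy + target - bits_drained < lower:
        target = lower + bits_drained - buffer_occupancy
    return max(target, 0.0)


# buffer_size as defined in the text; both inputs are example values in bits.
buffer_size = min(1_800_000, 2_000_000)

reduced = protect_target_bits(300_000, 1_500_000, buffer_size, 133_333)
unchanged = protect_target_bits(300_000, 500_000, buffer_size, 133_333)
raised = protect_target_bits(50_000, 100_000, buffer_size, 133_333)
```

In the first call the target is cut to about 120,000 bits to avoid overflow, in the second it is left at 300,000 bits, and in the third it is raised to about 213,333 bits to avoid underflow.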
It is understood that this multi-panel rate control method for real-time video encoders may also be applied in other types of encoders. Those skilled in the art will appreciate that various adaptations and modifications of the preferred embodiments of this method and apparatus can be configured without departing from the scope and spirit of the present method and apparatus. Therefore, it is to be understood that, within the scope of the appended claims, the present method and apparatus may be practiced other than as specifically described herein.
Number | Date | Country | |
---|---|---|---|
20080084928 A1 | Apr 2008 | US |