The present disclosure generally relates to encoding of video signals and more particularly relates to variable bitrate encoding of video signals.
In many electronic devices, video information is encoded to reduce the size of the information and thus reduce the resources required to communicate or store it. The encoded video information is typically decoded before it is displayed. To ensure reliable communication of video information between different electronic devices, standards have been promulgated for many encoding methods, including the H.264 standard, which is also referred to as MPEG-4 Part 10 or Advanced Video Coding (AVC). Rate control is frequently employed in video encoding or transcoding applications in an attempt to ensure that picture data being encoded meets various constraints, such as network bandwidth limitations, storage limitations, or processing bandwidth limitations, which may dynamically change. These constraints are reflected in the target bit rate for the resulting encoded video stream, and thus the goal of rate control is to maintain the bit rate of the encoded stream within a certain range of the target bit rate.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
To illustrate via an example, in some scenarios an encoder may identify, early in a video stream, a relatively complex portion of video information. To maintain the quality of the encoded video stream, it is desirable for the encoder to increase the output bit rate. However, if the length of the video stream is ignored, the encoder typically must set the output bit rate to a lower level than might otherwise be desirable in order to account for potential additional complex portions later in the video stream that are yet to be encoded. In particular, when the length of the video stream is unknown, the number of potential additional complex portions is also unknown and difficult to predict. Further, failure to account for those additional complex portions can lead to an undesirably low output bit rate for the additional complex portions, resulting in poor quality of the encoded video stream, or to the target average bit rate (ABR) being exceeded for the encoded video stream, causing storage overflow or other errors. Accordingly, to avoid such errors by accounting for the possibility of the additional complex portions, the encoder can set the output bit rate, even for relatively complex portions, to a relatively low rate, potentially reducing the overall quality of the encoded video stream more than is needed to achieve the target ABR. By taking into account the length of the input video stream, the encoder can more aggressively set the output bit rate for complex portions of the video stream. For example, if the encoder identifies that a large amount of time remains in the video stream when a complex portion of the input video stream is encountered, it can set the output bit rate to a relatively high level, under the assumption that there will be relatively few complex portions remaining and the target ABR can therefore be achieved. If the encoder identifies that relatively little time remains before the end of the video stream when a complex portion of the video stream is encountered, the encoder can set the output bit rate to a relatively low level to ensure that the ABR is achieved. The encoder can thus improve the overall quality of the encoded video stream while ensuring that the ABR is achieved.
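To make the tradeoff concrete, the following is a minimal sketch of remaining-time-aware rate selection; the function name, the 0-to-1 complexity measure, and the headroom formula are illustrative assumptions for demonstration only, not an implementation prescribed by this disclosure:

```c
#include <stdio.h>

/* Illustrative sketch: pick an output bit rate from scene complexity and the
 * fraction of the stream that remains. More headroom above the target ABR is
 * allowed for complex content early in the stream, and the allowance tapers
 * off as less time remains to average the rate back down. */
static double select_bit_rate(double target_abr_bps, double complexity,
                              double remaining_s, double total_s)
{
    double remaining_frac = remaining_s / total_s;        /* 1.0 at start, 0.0 at end */
    double headroom = 1.0 + complexity * remaining_frac;  /* ranges from 1.0 to 2.0 */
    return target_abr_bps * headroom;
}

int main(void)
{
    /* The same complex scene (complexity 0.9) near the start and near the
     * end of a 3600-second stream with a 4 Mbps target ABR. */
    printf("early: %.0f bps\n", select_bit_rate(4.0e6, 0.9, 3200.0, 3600.0));
    printf("late:  %.0f bps\n", select_bit_rate(4.0e6, 0.9,  200.0, 3600.0));
    return 0;
}
```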
For ease of illustration, the techniques of the present disclosure are described in the example context of the ITU-T H.264 encoding standards, which are also commonly referred to as the MPEG-4 Part 10 standards or the Advanced Video Coding (AVC) standards. However, the techniques of the present disclosure are not limited to this context, but instead may be implemented in any of a variety of block-based video compression techniques, examples of which include the MPEG-2 standards and the ITU-T H.263 standards.
In operation, the video source 102 transmits or otherwise provides an input video stream 108 to the video processing device 104 in either an analog format, such as a National Television System Committee (NTSC) or Phase Alternating Line (PAL) format, or a digital format, such as an H.263 format, an H.264 format, a Moving Picture Experts Group (MPEG) format (such as MPEG-1, MPEG-2, or MPEG-4), a QuickTime format, a Real Media format, a Windows Media Video (WMV) or Audio Video Interleave (AVI) format, or another digital video format, either standard or proprietary. In instances whereby the input video stream 108 has an analog format, the video processing device 104 operates to encode the input video stream 108 to generate an encoded video stream 110, and in instances whereby the input video stream 108 has a digital format, the video processing device 104 operates to transcode the input video stream 108 to generate the encoded video stream 110. The resulting encoded video stream 110 is stored at the storage module 160 for subsequent decoding and display.
In the illustrated embodiment, the video processing device 104 includes interfaces 112 and 114, an encoder 116, a rate control module 118, and, in instances whereby the video processing device 104 provides transcoding, a decoder 120. The interfaces 112 and 114 include interfaces used to communicate signaling with the video source 102 and the video destination 106, respectively. Examples of the interfaces 112 and 114 include input/output (I/O) interfaces, such as Peripheral Component Interconnect Express (PCIE), Universal Serial Bus (USB), or Serial Advanced Technology Attachment (SATA) interfaces; wired network interfaces, such as Ethernet; or wireless network interfaces, such as IEEE 802.11x or Bluetooth™, or a wireless cellular interface, such as a 3GPP, 4G, or LTE cellular data standard. The decoder 120, the encoder 116, and the rate control module 118 each may be implemented entirely in hard-coded logic (that is, hardware), as the combination of software stored in a memory 122 and a processor 124 to access and execute the software, or as a combination of hard-coded logic and software-executed functionality. To illustrate, in one embodiment, the video processing device 104 is implemented as a system-on-a-chip (SOC) whereby portions of the decoder 120, the encoder 116, and the rate control module 118 are implemented as hardware logic, and other portions are implemented via firmware stored at the SOC and executed by a processor of the SOC.
The hardware of the video processing device 104 can be implemented using a single processing device or a plurality of processing devices. Such processing devices can include a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a digital signal processor, a field programmable gate array, a programmable logic device, a state machine, logic circuitry, analog circuitry, digital circuitry, or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in a memory, such as the memory 122. The memory 122 may be a single memory device or a plurality of memory devices. Such memory devices can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that when a processing device implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
In a transcoding mode, the decoder 120 operates to receive the input video stream 108 via the interface 112 and partially or fully decode the input video stream 108 to create a decoded data stream 126, which can include pixel information, motion estimation/detection information, timing information, and other video parameters. The encoder 116 receives the decoded data stream 126 and uses the video parameters represented by the decoded data stream to generate the encoded video stream 110, which comprises a transcoded representation of the video content of the original input video stream 108. The transcoding process implemented by the encoder 116 can include, for example, a stream format change (e.g., conversion from an MPEG-2 format to an AVC format), a resolution change, a frame rate change, a bit rate change, and the like. In an encoding mode, the decoder 120 is bypassed and the input video stream 108 is digitized and then encoded by the encoder 116 to generate the encoded video stream 110.
In at least one embodiment, the video source 102 provides to the video processing device 104 information indicating the length of the input video stream 108, such as by indicating an amount of time it takes to display video based on the input video stream 108 at a designated frame rate. For example, the input video stream 108 can represent a program or portion of a program to be recorded, and the length of the input video stream 108 can be indicated by the amount of time required to display the program or portion of the program at a designated frame rate. The length of the input video stream 108 can be provided via a user input, via an electronic programming guide, and the like. For example, the input video stream 108 may be generated in response to a user request to record a program represented by the input video stream 108. In response, the video processing device 104 can identify (e.g., based on an electronic programming guide) a length of the program when the program is displayed at a typical frame rate (e.g., 60 frames per second). Based on this information, the video processing device 104 can identify the length of the input video stream 108 using a linear transformation that relates the length of the program to the length of the input video stream 108.
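As a simple illustration of such a transformation, the sketch below converts a program duration (e.g., as reported by an electronic programming guide) and a designated frame rate into a stream length expressed as a frame count; the names are assumptions for demonstration only:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: the stream length, as a frame count, follows linearly
 * from the program duration and the designated display frame rate. */
static uint64_t stream_length_frames(double program_seconds, double frames_per_second)
{
    return (uint64_t)(program_seconds * frames_per_second + 0.5); /* round to nearest */
}

int main(void)
{
    /* A 30-minute program displayed at 60 frames per second. */
    printf("%llu frames\n",
           (unsigned long long)stream_length_frames(30.0 * 60.0, 60.0));
    return 0;
}
```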
In at least one embodiment, the rate control module 118 utilizes the length of the input video stream 108 to determine the length of the portion of the stream that has not yet been encoded (referred to for purposes of description as the "remaining stream time") and to dynamically determine and adjust various encoding parameters used by the encoder 116 based on the remaining stream time. In one embodiment, these encoding parameters include a control signal 128 (denoted "QP"), which governs the degree of quantization performed by the encoder 116, as described further below.
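In sketch form, the remaining stream time reduces to the stream length less the presentation time already encoded; the names below are illustrative:

```c
/* Remaining stream time: the total stream length minus the presentation time
 * of the portion already encoded, floored at zero. */
double remaining_stream_time(double stream_length_s, double encoded_s)
{
    double remaining = stream_length_s - encoded_s;
    return remaining > 0.0 ? remaining : 0.0;
}
```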
In operation, the encoder 116 employs a subtraction process and a motion estimation process for data representing macroblocks of pixel values for a picture to be encoded. The motion estimation process, employed by the SMS module 202, compares each of these new macroblocks with macroblocks in a previously stored reference picture or pictures to find the macroblock in a reference picture that most closely matches the new macroblock. The motion estimation process then calculates a motion vector, which represents the horizontal and vertical displacement from the macroblock being encoded to the matching macroblock-sized area in the reference picture. The motion estimation process also provides this matching macroblock (known as a predicted macroblock) out of the reference picture memory to the subtraction process, whereby it is subtracted, on a pixel-by-pixel basis, from the new macroblock entering the encoder. This forms a prediction error, or "residual", that represents the difference between the predicted macroblock and the actual macroblock being encoded. The encoder 116 employs a two-dimensional (2D) discrete cosine transform (DCT) to transform the residual from the spatial domain to the frequency domain. The resulting DCT coefficients of the residual are then quantized using a corresponding QP so as to reduce the number of bits needed to represent each coefficient. The quantized DCT coefficients then may be Huffman run/level coded to further reduce the average number of bits per coefficient. This is combined with motion vector data and other side information (including an indication of I, P, or B pictures) for insertion into the encoded video stream 110.
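The subtraction and quantization steps just described can be sketched as follows. For brevity the sketch quantizes the spatial residual directly, standing in for the H.264 transform-and-quantization chain, and omits the motion search and entropy coding stages; all names and the plain uniform quantizer are illustrative assumptions:

```c
#include <stdint.h>
#include <stdio.h>

#define MB 16  /* macroblock dimension in pixels */

/* Subtract the predicted macroblock from the macroblock being encoded,
 * pixel by pixel, forming the residual described above. */
static void form_residual(const uint8_t cur[MB][MB], const uint8_t pred[MB][MB],
                          int16_t res[MB][MB])
{
    for (int y = 0; y < MB; ++y)
        for (int x = 0; x < MB; ++x)
            res[y][x] = (int16_t)(cur[y][x] - pred[y][x]);
}

/* Quantize with a step size derived from QP: a larger step discards more
 * precision, so fewer bits are needed per value. (The real encoder applies
 * the 2D DCT before quantizing; that stage is omitted here.) */
static void quantize(const int16_t res[MB][MB], int step, int16_t q[MB][MB])
{
    for (int y = 0; y < MB; ++y)
        for (int x = 0; x < MB; ++x)
            q[y][x] = (int16_t)(res[y][x] / step);
}

int main(void)
{
    uint8_t cur[MB][MB], pred[MB][MB];
    int16_t res[MB][MB], q[MB][MB];
    for (int y = 0; y < MB; ++y)
        for (int x = 0; x < MB; ++x) {
            cur[y][x]  = (uint8_t)(16 * y + x);  /* arbitrary demonstration data */
            pred[y][x] = (uint8_t)(x + y);
        }
    form_residual(cur, pred, res);
    quantize(res, 8, q);                         /* step 8: fairly coarse */
    printf("res[3][3] = %d, q[3][3] = %d\n", res[3][3], q[3][3]);
    return 0;
}
```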
For the case of P/B reference pictures, the quantized DCT coefficients also go to an internal loop that represents the operation of the decoder (a decoder within the encoder). The residual is inverse quantized and inverse DCT transformed. The predicted macroblock is read out of the reference picture memory, added back to the residual on a pixel-by-pixel basis, and stored back into a memory to serve as a reference for predicting subsequent pictures. The encoding of I pictures uses the same process, except that no motion estimation occurs and the negative (−) input to the subtraction process is formed by spatial prediction. In this case the quantized DCT coefficients represent residual values from spatial prediction rather than from both temporal and spatial prediction, as was the case for P and B pictures. As is the case for P/B reference pictures, decoded I pictures are stored as reference pictures.
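Continuing the sketch above, the decoder-within-the-encoder loop can be illustrated as follows; as before, the inverse transform stage is omitted and the inverse quantizer simply mirrors the assumed forward one:

```c
#include <stdint.h>

#define MB 16  /* macroblock dimension, as in the previous sketch */

/* Inverse-quantize the residual and add the prediction back pixel by pixel,
 * clamping to the 8-bit range; the reconstructed macroblock would then be
 * stored as reference data for predicting subsequent pictures. */
void reconstruct(const int16_t q[MB][MB], int step,
                 const uint8_t pred[MB][MB], uint8_t recon[MB][MB])
{
    for (int y = 0; y < MB; ++y)
        for (int x = 0; x < MB; ++x) {
            int v = pred[y][x] + q[y][x] * step;   /* inverse quantization */
            recon[y][x] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
}
```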
The rate-quantization module 208 receives the length of the input video stream 108 and, based on the length, continuously identifies the remaining stream time. The rate-quantization module 208 uses the image complexity, target bit allocations, and remaining stream time as parameters for determining the QP, which in turn determines the degree of quantization performed by the encoder 116 and thus influences the bit rate of the resulting encoded video data. In one embodiment, the image complexity is estimated by a complexity estimation module 213 (implemented, for example, as part of the SMS module 202), which calculates an SVAR metric and a PCOST metric from the residuals and other pixel information of a picture as an estimate of image complexity for the picture to be encoded. The SVAR and PCOST metrics may be calculated using any of a variety of well-known algorithms. The bit allocations are represented by target numbers of bits that may be allocated at different granularities, such as per picture, group of pictures (GOP), slice, or block. In one embodiment, the hypothetical reference decoder (HRD) 206 maintains a model of the buffer fullness (e.g., of a coded picture buffer (CPB)) of a modeled decoder at the video destination 106.
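A QP selection along these lines might be sketched as follows; the blend of SVAR and PCOST, the base QP, and the weights are assumptions for illustration (only the 0-to-51 QP range is taken from H.264), not the rate-quantization module's actual mapping:

```c
#include <math.h>

/* Illustrative sketch: derive a QP from estimated image complexity (a blend
 * of normalized SVAR and PCOST metrics, assumed here to lie in 0..1) and the
 * fraction of the stream remaining. Complex content lowers the QP (finer
 * quantization, more bits); as the remaining stream time shrinks, the QP is
 * raised back toward the base value so the target ABR can still be met. */
int choose_qp(double svar, double pcost, double remaining_frac)
{
    double complexity = 0.5 * svar + 0.5 * pcost;        /* assumed blend    */
    double qp = 36.0 - 10.0 * complexity;                /* base mapping     */
    qp += 8.0 * complexity * (1.0 - remaining_frac);     /* low-time backoff */
    if (qp < 0.0)  qp = 0.0;
    if (qp > 51.0) qp = 51.0;                            /* H.264 QP range   */
    return (int)lround(qp);
}
```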
At time 302, the rate control module 118 identifies that a relatively complex portion of the input video stream 108 is to be encoded. Further, based on the length of the input video stream 108, the rate control module 118 identifies that the remaining stream time has a relatively large value. Accordingly, the rate control module 118 sets the bit rate for the encoded video stream 110 to a relatively high level, designated level 320. At time 303, the rate control module 118 identifies that the input video stream 108 still has the same complexity as identified at time 302 (e.g., has the same SVAR and PCOST metrics) but that the remaining stream time (the time before time 305) is below a threshold. In response, the rate control module 118 lowers the bit rate for the encoded video stream 110 to a lower level, designated level 321. At time 304, the rate control module 118 identifies that the complexity of the input video stream 108 has fallen, and in response sets the bit rate for the encoded video stream 110 to a still lower level, designated level 322, to ensure that the encoded video stream 110 meets the ABR 315.
At time 402, the rate control module 118 identifies that a relatively complex portion of the input video stream 108 is to be encoded. Further, based on the length of the input video stream 108, the rate control module 118 identifies that the remaining stream time has a relatively large value. Accordingly, the rate control module 118 sets the bit rate for the encoded video stream 110 to a relatively high level, designated level 420. At time 403, the rate control module 118 identifies that the complexity of the input video stream 108 has fallen, and in response lowers the bit rate for the encoded video stream 110 to a lower level, designated level 421. At time 404, the rate control module 118 identifies that the input video stream 108 has returned to the same complexity as identified at time 402 (e.g., has the same SVAR and PCOST metrics). However, the rate control module 118 identifies that the remaining stream time is lower. Accordingly, the rate control module 118 increases the bit rate for the encoded video stream 110 to a level designated level 422. This level is higher than the level used for less complex portions of the input video stream 108, but is lower than level 420 to account for the fact that there is less remaining stream time, and therefore subsequent complex portions of the input video stream 108 could cause the ABR 415 to be exceeded if the bit rate were set to level 420. At time 405, the rate control module 118 identifies that the complexity of the input video stream 108 has fallen, and in response sets the bit rate for the encoded video stream 110 to a lower level, designated level 423, to ensure that the encoded video stream 110 meets the ABR 415.
If, at block 506, the encoder 116 identifies that there is additional information in the input video stream 108 to be encoded, the method flow moves to block 510 and the rate control module 118 identifies a complexity for the next portion of the input video stream 108 to be encoded. At block 512 the rate control module 118 identifies, based on the time at which the portion to be encoded occurs in the input video stream 108 and the length of the input video stream 108, the remaining stream time. At block 514 the rate control module 118 sets the bit rate for encoding the portion of the input video stream 108 based on the target ABR, the remaining stream time, and the complexity of the portion. At block 516 the encoder 116 encodes the portion of the input video stream 108 so that the corresponding portion of the encoded video stream 110 has the bit rate set at block 514. The method flow returns to block 506.
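The flow of blocks 506 through 516 can be summarized in a short, self-contained sketch; the portion list, the complexity values, and the rate formula (reused from the earlier sketch) are demonstration assumptions:

```c
#include <stdio.h>

#define N_PORTIONS 6
static const double complexity[N_PORTIONS] = {0.2, 0.8, 0.3, 0.9, 0.4, 0.2};

/* Same remaining-time-aware rate idea as the earlier sketch. */
static double set_bit_rate(double target_abr, double remaining_frac, double c)
{
    return target_abr * (1.0 + c * remaining_frac);
}

int main(void)
{
    const double target_abr = 4.0e6;                            /* 4 Mbps target ABR */
    for (int i = 0; i < N_PORTIONS; ++i) {                      /* block 506 */
        double c = complexity[i];                               /* block 510 */
        double remaining = (double)(N_PORTIONS - i) / N_PORTIONS; /* block 512 */
        double rate = set_bit_rate(target_abr, remaining, c);  /* block 514 */
        printf("portion %d: complexity %.1f -> %.0f bps\n", i, c, rate); /* block 516 */
    }
    return 0;
}
```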
In this document, relational terms such as "first" and "second", and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual relationship or order between such entities or actions. The term "another", as used herein, is defined as at least a second or more. The terms "including", "having", or any variation thereof, as used herein, are defined as comprising.
Other embodiments, uses, and advantages of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. The specification and drawings should be considered examples only, and the scope of the disclosure is accordingly intended to be limited only by the following claims and equivalents thereof.
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed.
Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims.