Video encoding techniques modify an input video stream to produce a compressed output video stream. Compression techniques may include intra refresh (IR) encoding, in which the data to encode a video unit, such as, for example, a frame, slice, and/or macroblock, comes only from the present video frame, or inter prediction encoding, in which the data to encode a video unit comes from the present video frame by referring to one or more temporally adjacent video frames. Under certain conditions, inter prediction encoding allows for higher compression rates for a given output video quality by leveraging similarities between the temporally adjacent video frames. In contrast, IR encoding is based only on techniques for compressing still images, such as discrete cosine transform (DCT) encoding. Inter prediction encoding may incorporate techniques for compressing still images in combination with leveraging data from temporally adjacent video frames.
While encoding any digital video signal, in order to allow potential random access of the video signal, as well as for error resiliency reasons (for the decoder to be able to recover if an access unit of the bitstream is corrupted), a set of video units may be encoded as instantaneous decoder refresh (IDR) type, which means the units are forced as IR encoding units, rather than including any inter prediction encoding units. Typically, those forced IDR units come in patterns (once every preset number of frames and/or preset specific regions within a frame), the patterns representing IR regions within the frames.
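As a hedged sketch of such a forced-IDR pattern (all names and parameters below are hypothetical, not taken from the disclosure), a row-based intra refresh schedule might advance a forced-IDR region down the frame by its own height each frame, so the whole frame is refreshed once per cycle:

```python
def forced_idr_rows(frame_index: int, mb_rows: int, region_height: int) -> set:
    """Return the macroblock rows forced to IDR (intra refresh) for this frame.

    A row-based IR region of `region_height` macroblock rows sweeps down the
    frame, advancing by its own height each frame, so every macroblock row is
    refreshed within ceil(mb_rows / region_height) frames.
    """
    cycle_len = -(-mb_rows // region_height)          # ceiling division
    start = (frame_index % cycle_len) * region_height
    return set(range(start, min(start + region_height, mb_rows)))

# Example: 8 macroblock rows, IR region 2 rows tall -> 4-frame refresh cycle.
schedule = [forced_idr_rows(f, mb_rows=8, region_height=2) for f in range(4)]
```

A column-based schedule would be the same sketch transposed, sweeping left to right instead of top to bottom.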
IDR units, such as IDR macroblocks, allow random access of a video signal, as well as for error resiliency, but may cause noticeable visual artifacts, such as repetitive patterns, that harm the subjective quality of the video. In this manner, encoding that includes IDR type macroblocks may result in different quality of a reconstructed signal compared to encoding that does not include IDR type macroblocks. Subjectively noticeable visual artifacts from IDR type macroblocks are generally undesirable.
In accordance with one aspect of the present disclosure, a method of encoding video data includes accessing video information, and encoding macroblocks in an intra refresh (IR) region of a video frame of the video information with restricted quantization parameters (QPs) based on QPs of macroblocks within at least one of a non-IR region of the video frame, and a co-located region of a neighboring video frame in a temporal domain.
In some examples of this aspect of the disclosure, the restricted QPs of the macroblocks in the IR region are based on QPs of an intermediate area in the non-IR region of the video frame, wherein the intermediate area of the video frame is adjacent to the IR region of the video frame.
In the same or different examples of this aspect of the disclosure, the restricted QPs of the macroblocks in the IR region are based on QPs of the co-located region of the neighboring video frame in the temporal domain.
In the same or different examples of this aspect of the disclosure, the video frame includes the IR region of the video frame, a first intermediate area adjacent a first side of the IR region of the video frame, a second intermediate area in the non-IR region of the video frame adjacent a second side of the IR region of the video frame, and remaining portions of the non-IR region of the video frame. In at least some of these examples, the method further includes encoding a majority of macroblocks in the first and second intermediate areas of the non-IR region of the video frame with inter prediction encoding while forcing IR encoding on a distributed minority of the macroblocks, as well as encoding a majority of macroblocks in the remaining portions of the non-IR region of the video frame predominately with inter prediction encoding while forcing IR encoding on a distributed minority of the macroblocks.
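The QP restriction of this aspect can be sketched as a simple clamp; the averaging anchor and the plus-or-minus-2 window below are illustrative assumptions, not values taken from the disclosure:

```python
def restrict_ir_qp(proposed_qp: int, neighbor_qps: list, max_delta: int = 2) -> int:
    """Clamp the QP proposed by rate control for an IR-region macroblock so it
    stays within `max_delta` of the average QP of reference macroblocks
    (an intermediate area of the same frame, or a co-located region of a
    neighboring frame in the temporal domain).  Hypothetical sketch only.
    """
    anchor = round(sum(neighbor_qps) / len(neighbor_qps))
    low, high = anchor - max_delta, anchor + max_delta
    return max(low, min(high, proposed_qp))

# Rate control proposes QP 40 for an IR macroblock, but the adjacent
# intermediate area was encoded around QP 31, so the QP is pulled down.
limited = restrict_ir_qp(40, [30, 31, 32])
```

Keeping the IR-region QP near its spatial or temporal neighbors is what limits the visible quantization discontinuity at the region boundary.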
In accordance with one aspect of the present disclosure, an apparatus for encoding video data, the apparatus comprising a video encoder logic structured to: access video information, and encode instantaneous decoder refresh (IDR) type macroblocks in an intra refresh (IR) region of a video frame with restricted quantization parameters (QPs) based on QPs of macroblocks within at least one of a non-IR region of the video frame, and a co-located region of a neighboring video frame in a temporal domain.
In some examples of this aspect of the disclosure, the video encoder logic is further structured to limit restricted QPs of the IDR type macroblocks based on QPs of an intermediate area in the non-IR region of the video frame, wherein the intermediate area of the video frame is adjacent to the IR region of the video frame.
In the same or different examples of this aspect of the disclosure, the video encoder logic is further structured to limit restricted QPs of the IDR type macroblocks based on QPs of the co-located region of the neighboring video frame in the temporal domain.
In the same or different examples of this aspect of the disclosure, the video frame includes the IR region of the video frame, a first intermediate area adjacent a first side of the IR region of the video frame, a second intermediate area in the non-IR region of the video frame adjacent a second side of the IR region of the video frame, and remaining portions of the non-IR region of the video frame. In at least some of these examples, the video encoder logic is further structured to encode macroblocks in the first and second intermediate areas of the non-IR region of the video frame predominately with inter prediction encoding while forcing IR encoding on a distributed minority of the macroblocks, and to encode macroblocks in the remaining portions of the non-IR region of the video frame predominately with inter prediction encoding while forcing IR encoding on a distributed minority of the macroblocks, wherein encoding macroblocks in the first and second intermediate areas includes forcing IR encoding on a greater proportion of the macroblocks than when encoding macroblocks in the remaining portions of the non-IR region of the video frame.
In accordance with one aspect of the present disclosure, a non-transitory computer readable medium comprising executable instructions that when executed cause an integrated circuit (IC) fabrication system to fabricate one or more ICs that comprise: a video encoder logic structured to: access video information, and encode instantaneous decoder refresh (IDR) type macroblocks in an intra refresh (IR) region of a video frame with restricted quantization parameters (QPs) based on QPs of macroblocks within at least one of a non-IR region of the video frame, and a co-located region of a neighboring video frame in a temporal domain.
The embodiments will be more readily understood in view of the following description when accompanied by the below figures and wherein like reference numerals represent like elements, wherein:
Techniques disclosed herein, including methods and apparatus, allow encoding of video data in a way that mitigates visible patterns from IDR macroblocks, such as those within IR regions of a video frame. Two techniques are introduced. The first technique applies restrictions to limit variations in the quantization levels between different regions of the video frame. This first technique is described in more detail with respect to
Turning now to the drawings, and as described in detail below, one example of the presently disclosed system is a video encoder comprising an encoder with QP limiting logic and/or forced IR encoded macroblocks outside an IR region of a video frame.
In some embodiments, encoding subsystem 102 may be an accelerated processing unit (APU), which may include one or more CPU cores or one or more graphics processing unit (GPU) cores on a same die. Alternatively, one or more of processor 104, memory 106, and video encoder 108 may include one or more digital signal processors (DSPs), one or more field programmable gate arrays (FPGAs), or one or more application-specific integrated circuits (ASICs). In some embodiments, some or all of the functions of processor 104, memory 106, and video encoder 108 may be performed by any suitable processors.
In some embodiments, some or all of the video encoder 108, including entropy encoder 110, QP limiting control logic 112, and any other logic described herein, may be implemented by executing suitable instructions on, for example, processor 104 or any other suitable processor. In some examples, the suitable executable instructions may be stored on a computer readable storage medium, where the executable instructions are executable by one or more processors to cause the one or more processors to perform the actions described herein. In some embodiments, executable instructions may be stored on memory 106 or any other suitable memory that includes QP limiting encoder control code 138 that, when accessed over communication link 124 and executed by processor 104 or any other suitable processor, controls the video encoder 108 or parts thereof. For example, processor 104 may control the video encoding process by accessing the video encoder 108 over communication link 128. For example, video encoder 108 may include registers or other control mechanisms, such as within QP limiting control logic 112, that control some or all of the video encoding process. For example, communication link 134 may provide control information, data, or signals to the video encoder 108 to control the video encoding process. Some or all of this functionality may also be implemented in any other suitable manner such as but not limited to a software implementation, a firmware implementation, a hardware implementation, or any suitable combination of the example implementations described above.
As shown in
After the encoding process is performed, the video encoder 108 may generate encoded output video data 136 that may be provided to interface circuit 114. The interface circuit 114 may in turn provide encoded output video data 136 to expansion bus 140. The expansion bus 140 may further connect to, for example, a display 116; one or more peripheral devices 118; an additional memory 120; and one or more input/output (I/O) devices 122. The display 116 may be a cathode ray tube (CRT), a liquid crystal display (LCD), or any other suitable type of display. Thus, for example, after encoding the video data, the encoding subsystem 102 may provide the encoded output video data 136 for display on the display 116 and/or to any other suitable devices via, for example, the expansion bus 140. In some embodiments, the generated output video data 136 may be stored in memory, such as memory 106, additional memory 120, or any other suitable memory, to be accessed at a future time.
The encoding subsystem 102 may further include an optional video decoder 109. Video decoder 109 may be operable to decode video data, such as encoded output video data 136, received from external devices, such as the peripheral devices 118 and I/O devices 122. For example, especially with mobile devices, it may be necessary to both encode and decode video data with an APU, GPU, CPU or any other suitable integrated circuit(s) of the device.
In some embodiments, executable instructions that may include some or all of the video encoder 108, such as QP limiting control logic 112, and any other logic described herein may be stored in the additional memory 120 in addition to or instead of being stored in the memory 106. Additional memory 120 may also include, for example, QP limiting control code 138 that may be accessed by processor 104, or any other suitable processor, over communication link 130 to interface circuit 114. Interface circuit 114 allows access to expansion bus 140 over communication link 142, thus allowing processor 104 access to additional memory 120. The one or more I/O devices 122 may include, for example, one or more cellular transceivers such as a 3G, 4G or LTE transceiver, a WI-FI transceiver; a keypad, a touch screen, an audio input/output device or devices, a mouse, a stylus, a printer, and/or any other suitable input/output device(s).
When row or column based IR is enabled, through IDR video units, such as macroblocks forming IR region 252, the IR region may be visible in a video as a belt shaped flicker moving from the top of the frame to the bottom for row based IR, or from the left to the right for column based IR. This scanning flicker may be perceived as a visible artifact, especially when input video content containing fine granular texture is encoded with a relatively high QP, i.e., a lower bit rate. The scanning flicker may be caused by the different quantization effects on the DCT coefficients of the residue resulting from different prediction modes, such as IR and inter prediction modes. The scanning flicker may also be caused by different QPs between IR and non-IR regions in the spatial domain, such as IR region 252 and areas 254, 256, 258 of the non-IR region of video frame 250, and that of two co-located macroblocks in neighboring frames in the temporal domain, such as IR region 273 of video frame 272 and co-located region 275 of video frame 276 (
In the temporal domain, between two neighboring frames, such as video frames 272, 276, when two co-located macroblocks, such as co-located macroblocks within IR region 273 of video frame 272 and co-located region 275 of video frame 276, are encoded in different prediction modes, the reconstructed versions of those two co-located macroblocks look different. In the spatial domain, the boundary between a row/column based IR region and the non-IR region can be observed. These visual differences may be annoying to a viewer when input video with complex texture is encoded with a relatively high QP.
In the encoding process of a hybrid video encoder providing for both IR encoding and inter prediction encoding, before quantization, the distributions of DCT coefficients generated from IR and inter prediction are different. Thus, even when using the same QP, a different amount of quantization noise can be introduced. When this noise forms patterns, due to IDR type macroblocks for example, it may also contribute to undesirable visual artifacts in the encoded video stream.
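A small numeric sketch of this effect follows; the coefficient values and the step size are invented for illustration. Intra-coded blocks tend to produce a broad coefficient spectrum while inter-prediction residues are small, so the same quantization step accumulates different amounts of error in the two cases:

```python
def quantize_roundtrip_error(coeffs, step):
    """Quantize each DCT coefficient to the nearest multiple of `step`
    (a plain uniform quantizer, for illustration) and return the total
    absolute reconstruction error."""
    recon = [round(c / step) * step for c in coeffs]
    return sum(abs(c - r) for c, r in zip(coeffs, recon))

step = 10  # quantization step implied by some QP (illustrative value)
intra_coeffs = [87, -45, 23, -14, 9, -6, 4, -3]   # broad, intra-like spectrum
inter_coeffs = [12, -7, 4, -3, 2, -1, 1, 0]       # small inter-prediction residue

intra_err = quantize_roundtrip_error(intra_coeffs, step)
inter_err = quantize_roundtrip_error(inter_coeffs, step)
```

Because the error differs between the two modes even at the same QP, a spatial pattern of forced-intra macroblocks carries a matching pattern of quantization noise.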
Besides the impact from various prediction modes, such as IR and inter prediction, the QP difference in the spatial and temporal domains brings not only different quantization effects but also unequal values for loop filter parameters, and in turn a different extent of smoothing on the reconstructed picture, which may amplify visual artifacts between IR and non-IR regions.
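As a hedged illustration of the loop filter point (the mapping below is invented; real codecs use codec-specific tables keyed on QP), a deblocking filter whose strength tracks the average QP of the two blocks sharing an edge smooths an IR/non-IR boundary differently than the interior of either region:

```python
def loop_filter_strength(qp_a: int, qp_b: int) -> int:
    """Hypothetical deblocking strength for the edge between two blocks,
    growing with the average of their QPs (illustrative mapping only)."""
    return max(0, (qp_a + qp_b) // 2 - 15)

inside_non_ir = loop_filter_strength(28, 28)   # edge inside the non-IR region
boundary      = loop_filter_strength(28, 40)   # edge straddling the IR boundary
inside_ir     = loop_filter_strength(40, 40)   # edge inside the IR region
```

The three distinct strengths show how a QP discontinuity at the IR boundary produces a visible band of differently-smoothed pixels, which the QP limiting described below is intended to suppress.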
As discussed in further detail below, top intermediate area 254 and bottom intermediate area 256 may be utilized to mitigate visual artifacts from IR region 252. For example, as described in more detail with respect to
As another example, as described in more detail with respect to
As discussed in further detail below, co-located region 275 may be utilized to mitigate visual artifacts from IR region 273. For example, as described in more detail with respect to
Video encoder 108 includes DCT control logic 304, which transfers video information from the spatial domain to the frequency domain and calculates coefficients of the residue based on the video information from frame buffer 302. DCT control logic 304 may be applied to both IR encoded video units and inter prediction encoded video units, or only to IR encoded video units, depending on the selected encoding mode (IR encoding or inter prediction encoding) from mode decision control logic 316. Quantization control logic 306 receives video information with DCT coefficients from DCT control logic 304, and drops input video information based on the QP′ value received from QP limiting control logic 112. The QP′ value for each unit may be based on a QP value determined by the rate control logic 311, as limited by QP limiting control logic 112. As discussed in further detail with respect to
As shown in
The video information from the quantization control logic 306, which has been reduced based on the QP′ value from QP limiting control logic 112, is also redirected to facilitate inter prediction encoding. Specifically, the inverse quantization control logic 320 may perform the reverse process of the quantization control logic 306. The output of the inverse quantization control logic 320 will not include more information than the output of the quantization control logic 306. Since the quantization control logic 306 resulted in information loss of the input video signal, the output of the inverse quantization control logic 320 will also represent less information than the input video data 132. The output of the inverse quantization control logic 320 is directed to the inverse DCT control logic, which transforms the signal from the frequency domain into the spatial domain of the input video data 132 to produce a reconstructed frame. The reconstructed frame includes artifacts, some of which are mitigated through the deblocking filter 324. The output of deblocking filter 324 is directed to frame buffer 326, which is directed to motion compensation control logic 330 to allow comparison with an input video unit from the input video data 132 via motion estimation control logic 328 for inter prediction encoding. Motion estimation control logic 328 may utilize block matching motion estimation, integer motion estimation, fractional motion estimation, or other inter prediction encoding techniques.
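The lossy round trip through quantization and inverse quantization can be seen in a minimal numeric sketch; the step size and coefficient values are arbitrary illustrative choices, not values from the disclosure:

```python
def quantize(coeffs, step):
    # Forward quantization (quantization control logic 306 in the text):
    # map each coefficient to an integer level, discarding precision.
    return [round(c / step) for c in coeffs]

def inverse_quantize(levels, step):
    # Inverse quantization (inverse quantization control logic 320):
    # rescale the levels; the discarded precision cannot be recovered.
    return [lvl * step for lvl in levels]

step = 8
original = [33, -20, 5, 0]
recon = inverse_quantize(quantize(original, step), step)
```

The reconstruction differs from the original, which is why the encoder's own reconstruction loop, rather than the pristine input, must feed the reference frame buffer used for inter prediction.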
Mode decision control logic 316 evaluates whether inter prediction encoding or intra prediction coding is more suitable for each video unit; however, IDR type video units, within an IR region for example, as indicated by IR control logic 314 are forced IR macroblocks, i.e., automatically encoded using IR encoding techniques. In addition, as described in more detail with respect to
In
As shown in
As shown in
In contrast to video frame 450 of
For clarity, the techniques of
As represented by blocks 354, 358, and 362, the QP values from rate control logic 311 are analyzed according to the positions of the corresponding macroblocks. According to blocks 354, 356, QP values for macroblocks within top intermediate area 454 (
According to blocks 358, 360, QP limiting control logic 112 may modify the QP value of a macroblock within the IR region 452 calculated by the rate control logic 311 to produce a QP′ value actually used in the encoding of the macroblock. The QP′ values for macroblocks within IR region 452 (
Similarly, according to blocks 362, 364, QP′ values for macroblocks within bottom intermediate area 456 (
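The record-then-limit flow represented by blocks 354-364 can be sketched as follows; the class shape and the plus-or-minus-1 window are assumptions made for illustration, not details taken from the disclosure:

```python
class QPLimiter:
    """Sketch of a QP limiting pass: QPs chosen by rate control for the top
    intermediate area are recorded unchanged, then QPs inside the IR region
    (and, symmetrically, the bottom intermediate area) are clamped toward
    the recorded values to avoid a visible quantization discontinuity."""

    def __init__(self, max_delta: int = 1):
        self.max_delta = max_delta
        self.recorded = []

    def record(self, qp: int) -> int:
        # Top intermediate area: pass the rate-control QP through, remember it.
        self.recorded.append(qp)
        return qp

    def limit(self, qp: int) -> int:
        # IR region: clamp the proposed QP to a window around the recorded QPs.
        anchor = round(sum(self.recorded) / len(self.recorded))
        return max(anchor - self.max_delta, min(anchor + self.max_delta, qp))

limiter = QPLimiter()
top_qps = [limiter.record(q) for q in (30, 32, 31)]   # recorded as-is
ir_qps  = [limiter.limit(q) for q in (38, 30, 25)]    # rate-control proposals, clamped
```

In this sketch the QP′ actually used for each IR-region macroblock never strays more than one step from the intermediate area's average, mirroring the limiting role of QP limiting control logic 112.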
For clarity, the techniques of
In addition to, or as an alternative to, limiting the QP value to reduce visual artifacts in the encoded video using QP limiting control logic 112, IR control logic 314 of video encoder 108 may force IR encoding on macroblocks outside IR region 452 of video frame 600 to further mitigate visual artifacts in the encoded video data.
As shown in
As represented by blocks 554, 556, IR control logic 314 forces all or a predominant proportion of the macroblocks within IR region 452 to be IR encoded. Moreover, IR control logic 314 may further randomly introduce IR encoded macroblocks within the non-IR regions of video frame 600, including top and bottom intermediate areas 654, 656 as well as the remaining area 658 of the non-IR regions of video frame 600, in order to relieve the belt shaped visual difference between IR encoded macroblocks and inter prediction encoded macroblocks (basically, to break the pattern).
In a specific example, as represented by blocks 558, 560, IR control logic 314 may force a distributed minority, such as about twenty-five percent (25%), of the macroblocks within the top and bottom intermediate areas 654, 656 to be IR encoded. In the same or different example, as represented by blocks 562, 564, IR control logic 314 may force a distributed minority, such as a proportion equal to or less than that of the top and bottom intermediate areas 654, 656, for example about twelve and a half percent (12.5%), of the macroblocks within the remaining non-IR region of video frame 600, area 658, to be IR encoded. In this manner, macroblocks within the remaining non-IR region of video frame 600, area 658, may be predominately inter prediction encoded with a higher proportion of inter prediction encoded macroblocks than within the top and bottom intermediate areas 654, 656.
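A sketch of the per-macroblock forced-IR decision with the proportions from the text (all of the IR region, about 25% of the intermediate areas, about 12.5% elsewhere); the RNG-based selection and all names are hypothetical realizations of the idea, not the disclosed circuit:

```python
import random

def build_forced_ir_map(mb_rows, mb_cols, ir_rows, inter_rows,
                        p_inter=0.25, p_rest=0.125, seed=0):
    """Return a per-macroblock boolean map of forced IR encoding:
    probability 1.0 inside the IR region, ~25% in the intermediate
    areas, and ~12.5% in the remaining non-IR area."""
    rng = random.Random(seed)
    forced = [[False] * mb_cols for _ in range(mb_rows)]
    for r in range(mb_rows):
        p = 1.0 if r in ir_rows else (p_inter if r in inter_rows else p_rest)
        for c in range(mb_cols):
            forced[r][c] = rng.random() < p
    return forced

# IR region on rows 3-4, intermediate areas on rows 2 and 5.
ir_map = build_forced_ir_map(8, 16, ir_rows={3, 4}, inter_rows={2, 5})
```

The graded probabilities taper the density of intra macroblocks away from the IR region, which is what softens the belt-shaped boundary.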
Human eyes are more sensitive to patterns in video than to random noise, such that randomly introducing IR encoded macroblocks within the non-IR region reduces visual artifacts in the encoded video stream caused by the IDR type macroblocks of IR region 452.
In various examples, the probability with which the macroblocks within the top and bottom intermediate areas 654, 656 and the remaining non-IR region of video frame 600, area 658, are forced to be IR encoded can be controlled. In order to achieve the random distribution of forced IR encoded macroblocks, in one specific example, multiple bit fields of a cycle count value can be used to select macroblocks for forced IR encoding. For example, a macroblock could be selected for forced IR encoding with roughly a twenty-five percent (25%) probability if the last two bits of the cycle count value are '11' in binary. Of course, other selection techniques are also possible.
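The cycle-count bit-field selection described above can be sketched as follows (the function name and the running counter are hypothetical; only the two-bit test comes from the text):

```python
def forced_by_cycle_count(cycle_count: int, mask: int = 0b11, match: int = 0b11) -> bool:
    """Force IR encoding when the last two bits of a running cycle count
    are '11' in binary, selecting roughly one macroblock in four."""
    return (cycle_count & mask) == match

# Over 16 consecutive cycle-count values, exactly a quarter are selected.
selected = [c for c in range(16) if forced_by_cycle_count(c)]
```

Widening or narrowing the mask (e.g., testing three bits for ~12.5%) gives the different proportions used for the intermediate areas versus the remaining non-IR area.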
Different probabilities for forced IR encoding of macroblocks within the top and bottom intermediate areas 654, 656 and the remaining non-IR region of video frame 600, area 658, can be configured with IR control logic 314 of video encoder 108.
Referring to
The disclosed integrated circuit designs may be employed in any suitable apparatus including but not limited to, for example, a mobile or smart phone, a phablet, a tablet, a camera, a laptop computer, a portable media player, a set-top box, a printer, or any other suitable device which encodes or plays video and/or displays images. Such devices may include, for example, a display that receives image data, such as image data that has been processed in the manner described herein, including the encoded output video data 136, from the one or more integrated circuits, where the one or more integrated circuits may be or may include, for example, an APU, GPU, CPU or any other suitable integrated circuit(s) that provide(s) image data for output on the display. Such an apparatus may employ one or more integrated circuits as described above including one or more of the encoder with QP limiting control logic, and other components described above.
Also, integrated circuit design systems, such as workstations including, as known in the art, one or more processors, associated memory in communication via one or more buses or other suitable interconnect and other known peripherals, are known that create wafers with integrated circuits based on executable instructions stored on a computer readable medium such as but not limited to CDROM, RAM, other forms of ROM, hard drives, distributed memory, etc. The instructions may be represented by any suitable language such as but not limited to hardware description language (HDL), Verilog, or other suitable language. As such, the logic and structure described herein may also be produced as one or more integrated circuits by such systems using the computer readable medium with instructions stored therein. For example, one or more integrated circuits with the logic and structure described above may be created using such integrated circuit fabrication systems. In such a system, the computer readable medium stores instructions executable by one or more integrated circuit design systems that cause the one or more integrated circuit design systems to produce one or more integrated circuits. For example, the one or more integrated circuits may include one or more of the encoder with QP limiting control logic, and any other components described above that process video data in a way that reduces subjective visual artifacts, as described above.
Among other advantages, for example, the disclosed methods and apparatus allow video encoding that reduces subjective visual artifacts, such as flicker, caused by IDR type macroblocks and in turn subjective video quality can be improved. Other advantages will be recognized by those of ordinary skill in the art.
The above detailed description and the examples described therein have been presented for the purposes of illustration and description only and not for limitation. For example, the operations described may be done in any suitable manner. It is therefore contemplated that the present embodiments cover any and all modifications, variations or equivalents that fall within the scope of the basic underlying principles disclosed above and claimed herein. Furthermore, while the above description describes hardware in the form of a processor executing code, hardware in the form of a state machine or dedicated logic capable of producing the same effect, as well as other structures, is also contemplated.