Embodiments described herein relate to video encoding, and include an encoding system with temporally adaptive quantization.
Typically, signals, such as audio or video signals, may be digitally encoded for transmission to a receiving device. Video signals may contain data that is broken up into frames over time. Due to high bandwidth requirements, baseband video signals are typically compressed using video encoders prior to transmission and/or storage. Video encoders may employ a coding methodology to encode macroblocks within a frame using one or more coding modes. In many video encoding standards, such as MPEG-1, MPEG-2, MPEG-4, H.261, H.262, H.263, H.264, etc., a macroblock denotes a square region of pixels, 16×16 in size. Most of the coding processes (e.g., motion compensation, mode decision, quantization decision, etc.) occur at this level. Modern block-based video coding standards take advantage of temporal and spatial redundancy of a channel to achieve efficient video compression, but produce variable bitrate (VBR) bitstreams.
As complexity of content of the channel changes, the bitrates of the encoded bitstreams may vary over time. A quantification of complexity is often specific to a video coding methodology and an encoder used to encode the content. One issue with encoded bitstreams is managing the variability of the bitrates to efficiently use available bandwidth, while maintaining consistent video quality. In order to control an output bitrate for an encoded bitstream, a rate controller may assign a quantization parameter (QP) for a frame based on an evaluation of complexity of the frame. However, complexity within a frame may vary greatly from macroblock to macroblock. Using a frame-level QP value to encode macroblocks of a frame may produce an encoded frame with noticeable visual quality differences within the frame.
Certain details are set forth below to provide a sufficient understanding of embodiments of the disclosure. However, it will be clear to one having skill in the art that embodiments of the disclosure may be practiced without these particular details, or with additional or different details. Moreover, the particular embodiments described herein are provided by way of example and should not be used to limit the scope of the disclosure to these particular embodiments. In other instances, well-known video components, encoder or decoder components, circuits, control signals, timing protocols, and software operations have not been shown in detail in order to avoid unnecessarily obscuring the disclosure.
As previously described, the encoding system 150 may receive a video signal, and generate an encoded bitstream based on the video signal using one or more encoding techniques. The video signal may be divided into coding units. Examples of coding units may include frames, sub-frames, regions, fields, macroblocks, etc. In the interest of clarity, operation of the encoding system 150 will be discussed in terms of frames as coding units, and macroblocks as sub-coding units. The encoded bitstream may be a variable bitrate bitstream, with variance based on, for example, a complexity of the frames of the video signal. Examples of variables that may affect complexity may include spatial complexity of a frame (e.g., texture) and temporal complexity of a frame (e.g., motion).
In operation, the encoding system 150 may utilize a rate controller to determine a quality level with which to encode a frame of the video signal. The main quality driving parameter may be the quantization parameter (QP). The encoding system 150 may use standard methods for calculating a frame-level QP value, such as based on spatial complexity statistics for a frame. The encoding system 150 may then use adaptive quantization to adjust the assigned frame-level quality level for individual macroblocks of a current frame based on spatial and/or temporal complexities. For example, the encoding system 150 may determine intra-frame spatial statistics and/or motion estimation statistics associated with a macroblock. The encoding system 150 may then use adaptive quantization to adjust the frame-level QP value based on the spatial complexities of the macroblock, and may use temporally adaptive quantization to further adjust the frame-level QP value based on the motion complexities of the macroblock.
Balancing the visual quality of moving and stationary areas, or more precisely changing and non-changing areas (e.g., in motion versus not in motion), may present a complex problem. The video quality of areas of a frame that are changing (e.g., in motion) does not need to be as high as that of areas that are relatively stationary to maintain a particular quality level. This is because the changing area provides a temporal masking effect, where the human eye does not have enough time to assess video quality due to the movement. Conversely, a stationary or stable area (e.g., not in motion) does not provide any form of temporal masking, and the human eye has the time to evaluate the video quality. Additionally, if an area of a frame is stationary or stable, it may be easier to distinguish between changes in the content and unnatural compression artifacts (e.g., beating, shimmering, and washed-out texture). Therefore, reducing the quality of areas of a frame that are changing or in motion, and increasing the quality of areas of the frame that are stationary, may provide a perceived video quality across the frame that is more consistent than adjusting for only spatial complexities. Accordingly, by spatially and temporally adapting the visual quality based on spatial and motion complexities within a frame, the encoding system 150 may provide an encoded frame that has improved video quality balance across the frame as compared with using a common frame-level QP value for an entire frame.
The spatial complexity statistics may include an average pixel value (DC) and an activity variance. The activity variance represents pixel value variance from the DC value (e.g., the average) for the macroblock. The activity variance may indicate a complexity of texture of the macroblock. The encoding system 150 may provide a spatial QP value adjustment (e.g., sdQP) based on the DC value and the activity variance of the macroblock, which may be used to adjust the frame-level QP value assigned by the rate controller. Generally, the more activity variance (e.g., texture) within a macroblock, the harder it may be for the human eye to notice or distinguish visual quality defects. Thus, a macroblock with more activity variance (e.g., texture) may be encoded at a lower quality than a macroblock with little or no activity variance (e.g., texture). For example, the encoding system 150 may set the sdQP value to increase the frame-level QP value (e.g., decrease a video quality) when the macroblock has a large amount of activity variance. Additionally, the encoding system 150 may set the sdQP value to decrease the frame-level QP value (e.g., increase a video quality) when the macroblock has a small amount of activity variance.
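For illustration only, the following Python sketch computes a DC value and an activity measure for a 16×16 macroblock and maps that activity to a sdQP adjustment. The mean-absolute-deviation activity measure, the threshold values, and the helper names are assumptions introduced for this sketch and are not taken from the embodiments described herein.

```python
import numpy as np

def spatial_stats(mb):
    """Return (DC, activity) for a 16x16 macroblock of luma samples."""
    mb = mb.astype(np.float64)
    dc = mb.mean()                        # average pixel value (DC)
    activity = np.abs(mb - dc).mean()     # deviation from DC as a texture measure (assumption)
    return dc, activity

def sdqp(activity, low=4.0, high=16.0):
    """Map macroblock activity to a spatial QP adjustment (illustrative thresholds)."""
    if activity >= high:
        return +3    # busy texture masks defects; quality may be reduced
    if activity <= low:
        return -3    # flat area; spend more bits to avoid visible artifacts
    return 0

mb = np.random.randint(0, 256, (16, 16))
dc, activity = spatial_stats(mb)
print(dc, activity, sdqp(activity))
```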
Motion estimation statistics may include bidirectional motion estimation of an incoming macroblock. The motion estimation may provide an indication of how well a macroblock may be motion compensated. The motion estimation may include macroblock differences, which may include absolute pixel differences between macroblocks of frames in the forward and/or backward directions. Thus, the encoding system 150 may determine whether a macroblock can be found in a previous or next frame, and whether, if found, the macroblock is stationary or includes some amount of motion. Along with the motion estimation, a macroblock activity may be calculated for all macroblocks in each frame. The macroblock activity may be determined using frames, fields, or adaptively frame or field at the macroblock level based on the video content. In an example for a macroblock that is 16×16 pixels, the macroblock activity ratio (e.g., sa_ratio) may be calculated as follows:
sa_ratio = min(16, 16 * min(SADfw, SADbw) / act)
where sa_ratio is an activity ratio for a macroblock in a current frame, SADfw is a sum of absolute pixel differences of a best matched macroblock of the previous frame, SADbw is a sum of absolute pixel differences of a best matched macroblock of the next frame, and act is a sum of absolute pixel differences in both the horizontal and vertical directions in the current frame, which may be calculated as follows:
act = \sum_{y=0}^{15} \sum_{x=0}^{14} |pixel_{x,y} - pixel_{x+1,y}| + \sum_{y=0}^{14} \sum_{x=0}^{15} |pixel_{x,y} - pixel_{x,y+1}|
In this manner, act may be calculated by summing the absolute differences between neighboring pixels in the x direction for each row of the macroblock, and by adding to that a sum of the absolute differences between neighboring pixels in the y direction for each column of the macroblock. The SAD sum of absolute pixel differences between the current macroblock and the reference motion compensated macroblock (e.g., for a previous (SADfw) or next frame (SADbw)) may be calculated as follows:
SAD = \sum_{y=0}^{15} \sum_{x=0}^{15} |curr\_pixel_{x,y} - ref\_pixel_{x,y}|
where curr_pixelx,y is a pixel of the current macroblock, and ref_pixelx,y is a pixel of a reference macroblock (e.g., for a previous (SADfw) or next frame (SADbw)).
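A minimal Python sketch of the act, SAD, and sa_ratio calculations defined above is shown here for a 16×16 macroblock; the helper names and the guard against a zero act value are assumptions introduced for illustration.

```python
import numpy as np

def act(mb):
    """Sum of absolute neighboring-pixel differences in the x and y directions of a 16x16 macroblock."""
    mb = mb.astype(np.int64)
    horiz = np.abs(mb[:, :-1] - mb[:, 1:]).sum()   # |pixel(x,y) - pixel(x+1,y)| over each row
    vert = np.abs(mb[:-1, :] - mb[1:, :]).sum()    # |pixel(x,y) - pixel(x,y+1)| over each column
    return int(horiz + vert)

def sad(curr_mb, ref_mb):
    """Sum of absolute pixel differences against a motion-compensated reference macroblock."""
    return int(np.abs(curr_mb.astype(np.int64) - ref_mb.astype(np.int64)).sum())

def sa_ratio(curr_mb, ref_fw, ref_bw):
    """Activity ratio: how much the macroblock is changing, normalized by its texture."""
    a = max(act(curr_mb), 1)       # guard against division by zero for flat blocks (assumption)
    sad_fw = sad(curr_mb, ref_fw)  # best matched macroblock of the previous frame
    sad_bw = sad(curr_mb, ref_bw)  # best matched macroblock of the next frame
    return int(round(min(16, 16 * min(sad_fw, sad_bw) / a)))
```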
The act value may normalize the motion of a macroblock by an amount of activity (e.g., texture) within the macroblock. The sa_ratio may be rounded to an integer value between 0 and 16, and limited to a maximum value of 16 in some examples. The sa_ratio may provide an indication of how much the macroblock is changing. A value of 0 may correspond with an example where the macroblock is not changing at all. A value of 16 may correspond with an example where the macroblock is undergoing extensive changes. If the macroblock is not changing at all (e.g., the sa_ratio is closer to 0), the frame-level QP value may be decreased (e.g., higher video quality) because the human eye has more time to judge the quality of stationary areas of a screen. If the macroblock is changing extensively (e.g., the sa_ratio is closer to 16), the frame-level QP value may be increased (e.g., lower video quality) because the motion may mask the effects of lower quality to the human eye. Using the minimum of the SADfw or the SADbw values for the sa_ratio may immediately detect an area of a screen that becomes stationary, in order to adjust the video quality for the stationary area.
For example, when a car is moving through a frame, a portion of a current frame just behind the car may be compared with a corresponding portion of a next frame that most closely matches the portion of the current frame. Since a macroblock of the current frame for the area behind the car would closely match a macroblock of a next frame, the encoding system 150 may be able to determine that the area behind the car is becoming relatively stationary. Therefore, the encoding system 150 may decrease the frame-level QP value (e.g., increase the quality) of the macroblock of the current frame due to the determination that the macroblock is not in motion.
Based on the sa_ratio, the encoding system 150 may set a temporal change in QP (e.g., tdQP) value that adjusts the frame-level QP value. In some embodiments, the sa_ratio may be mapped to a value increase or decrease for the QP. In an example, the tdQP value may be determined as follows:
tdQP = SAR_to_DQP[sa_ratio] / 4
where SAR_to_DQP is:
SAR_to_DQP = {−39, −32, −27, −22, −19, −16, −13, −10, −8, −6, −4, −1, 1, 2, 4, 5}
The above mapping from the sa_ratio value to the tdQP value is exemplary. Other embodiments may use different mapping values. Further, the sa_ratio may be limited to a maximum value other than 16. The QP value may also be modified based on a smaller or larger section than a macroblock, such as a field or region of a frame.
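A sketch of the exemplary mapping is shown below; it assumes that the 16-entry table is indexed by a clamped sa_ratio value and that the division by four truncates toward zero, both of which are interpretations rather than requirements of the embodiments.

```python
SAR_to_DQP = [-39, -32, -27, -22, -19, -16, -13, -10, -8, -6, -4, -1, 1, 2, 4, 5]

def tdqp(sa_ratio):
    """Map sa_ratio (0..16) to a temporal QP adjustment using the exemplary table."""
    idx = min(max(sa_ratio, 0), len(SAR_to_DQP) - 1)  # clamp into the 16-entry table (assumption)
    return int(SAR_to_DQP[idx] / 4)                   # division by 4, truncating toward zero (assumption)

print(tdqp(0))    # stationary macroblock: negative delta, i.e., higher quality
print(tdqp(16))   # rapidly changing macroblock: positive delta, i.e., lower quality
```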
An encoder of the encoding system 150 may receive the frames of the video signal. Responsive to receipt of the frames, the encoder may encode the frames in accordance with one or more encoding methodologies or standards, such as MPEG-2, MPEG-4, H.263, H.264, and/or HEVC, and based on the frame-level QP value selected by the rate controller that has been modified (e.g., adjusted) based on spatial complexity and/or motion estimation. The encoded frames may be provided in the encoded bitstream.
The encoding system 250 may include a statistics block 220 and a motion estimator that each receive the video signal. The statistics block 220 may provide statistical information for a frame or a macroblock that indicates spatial complexity based on the video signal to a rate controller 230, a spatially adaptive quantizer 240, and a temporally adaptive quantizer 260. As explained with reference to
The rate controller 230 may provide a frame-level QP value based on the spatial complexity statistics for the entire frame to an adder 242. The spatially adaptive quantizer 240 may provide a spatial QP value adjustment (e.g., sdQP) to the adder 242 based on the spatial complexity statistics for a current macroblock of the frame (e.g., the DC value and the activity variance of the macroblock). The adder 242 may adjust the frame-level QP value provided by the rate controller 230 based on the sdQP value (e.g., raise the frame-level QP value for high activity variance or lower the frame-level QP value for low activity variance).
The encoding system 250 may further include a motion estimator 210 that is coupled to a temporally adaptive quantizer 260. The temporally adaptive quantizer 260 may be further coupled to the spatially adaptive quantizer 240. The motion estimator 210 may receive the video signal and may provide motion estimation statistics associated with a macroblock and/or a frame to the temporally adaptive quantizer 260. The motion estimation statistics from the motion estimator 210 may include, for example, bidirectional motion estimation of an incoming macroblock of the video signal. The motion estimation may provide an indication of how well the macroblock may be motion compensated.
The temporally adaptive quantizer 260 may provide a temporal QP value adjustment (e.g., tdQP) to the adder 262 based on the motion estimation statistics from the motion estimator 210 and the spatial complexity statistics from the statistics block 220. The adder 262 may modify the QP value output from the adder 242 based on the tdQP value. The adder 262 may provide an adjusted QP* value to the encoder 270.
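For illustration, the adder path may be summarized as in the following sketch, in which the frame-level QP from the rate controller 230 is adjusted by the sdQP and tdQP values; the clamp to an H.264-style QP range of 0 to 51 is an assumption rather than part of the embodiments.

```python
def macroblock_qp(frame_qp, sdqp_value, tdqp_value, qp_min=0, qp_max=51):
    """Combine the frame-level QP with spatial and temporal adjustments, as in adders 242 and 262."""
    qp = frame_qp + sdqp_value                 # adder 242: spatial adjustment
    qp_star = qp + tdqp_value                  # adder 262: temporal adjustment
    return min(max(qp_star, qp_min), qp_max)   # clamp to an H.264-style QP range (assumption)

print(macroblock_qp(30, +2, -6))   # textured but newly stationary area -> QP* of 26
```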
The encoder 270 may receive coding units via the video signal and provide the encoded bitstream at an output that is encoded based on the QP* value. The encoder 270 may be implemented in hardware, software, or combinations thereof. The encoder 270 may include an entropy encoder, such as a variable-length coding encoder (e.g., Huffman encoder, context-adaptive variable length coding (CAVLC) encoder, or context-adaptive binary arithmetic coding (CABAC) encoder), and/or may be configured to encode the frames, for instance, at a macroblock level. Each macroblock may be encoded in intra-coded mode, inter-coded mode, bidirectionally, or in any combination or subcombination of the same. As an example, the encoder 270 may encode the video signal in accordance with one or more encoding methodologies or standards, such as MPEG-2, MPEG-4, H.263, H.264, and/or HEVC. The encoding methodologies and/or standards implemented by the encoder 270 may result in encoded frames having variable bitrates.
In operation, the encoding system 250 may utilize the rate controller 230 to determine a frame-level QP value with which to encode a frame of the video signal. The rate controller 230 may use standard methods for calculating the QP at the frame level. Then, as previously described, the encoding system 250 may use adaptive quantization to adjust the assigned frame-level quality level for individual macroblocks of the frame based on spatial and/or temporal complexities. For example, for a given macroblock, the statistics block 220 may determine intra-frame spatial complexity statistics and/or the motion estimator 210 may determine motion estimation statistics. The spatially adaptive quantizer 240 may set a value of the sdQP to adjust the frame-level QP value based on the spatial complexity statistics, and the temporally adaptive quantizer 260 may set a value of the tdQP to further adjust the frame-level QP value based on the motion complexities of the macroblock. By spatially and temporally adapting the visual quality based on spatial and motion complexities within a frame, the encoding system 250 may provide an encoded frame that has improved video quality balance across the frame as compared with using a common frame-level QP value for an entire frame.
The spatial complexity statistics determined by the statistics block 220 may include an average pixel value (DC) and an activity variance. The spatially adaptive quantizer 240 may provide, to the adder 242, a sdQP value based on the DC value and the activity variance of the macroblock. As previously described, the greater the activity variance, the lower the quality required for the encoded macroblock. Thus, the spatially adaptive quantizer 240 may provide a sdQP value to the adder 242 that increases the QP value (e.g., decreases the video quality) when the macroblock has a large amount of activity variance, and may provide a sdQP value to the adder 242 that decreases the QP value (e.g., increases the video quality) when the macroblock has a small amount of activity variance. The adder 242 may add the sdQP value to the QP value, and provide an updated QP value to the adder 262.
Motion estimation statistics determined by the motion estimator 210 may include bidirectional motion estimation of an incoming macroblock. As previously described, the motion estimation statistics may provide an indication of how well a macroblock may be motion compensated. The motion estimator 210 may determine macroblock differences, which may include absolute pixel differences between macroblocks of frames in the forward and/or backward directions. The temporally adaptive quantizer 260 may use the motion differences to determine whether a macroblock can be found in a previous or next frame, and whether, if found, the macroblock is stationary or includes some amount of motion. For example, in
sa_ratio = min(16, 16 * min(SADfw, SADbw) / act)
where sa_ratio is an activity ratio for a macroblock in a current frame (e.g., current frame 320 of
act = \sum_{y=0}^{15} \sum_{x=0}^{14} |pixel_{x,y} - pixel_{x+1,y}| + \sum_{y=0}^{14} \sum_{x=0}^{15} |pixel_{x,y} - pixel_{x,y+1}|
The SAD sum of absolute pixel difference between the current macroblock and the reference motion compensated macroblock (e.g., a best matched macroblock of a previous (SADfw) or a best matched macroblock of next frame (SADbw)) may be calculated by the temporally adaptive quantizer 260, as follows:
SAD = \sum_{y=0}^{15} \sum_{x=0}^{15} |curr\_pixel_{x,y} - ref\_pixel_{x,y}|
where curr_pixelx,y is a pixel of the current macroblock, and ref_pixelx,y is a pixel of a reference macroblock (e.g., for a previous (SADfw) or next frame (SADbw)).
As previously described, the act value may normalize the motion of a macroblock by an amount of activity (e.g., texture) within the macroblock. Also as previously described, the sa_ratio may be rounded to an integer value between 0 and 16, and may be limited to a maximum value of 16 in some examples. The sa_ratio may provide an indication of how much the macroblock is changing (e.g., the higher the value, the more the macroblock is changing). If the macroblock is not changing at all (e.g., the sa_ratio is closer to 0), the temporally adaptive quantizer 260 may set the value of the tdQP to decrease the frame-level QP value (e.g., increase video quality), and if the macroblock is changing extensively, the temporally adaptive quantizer 260 may set the value of the tdQP to increase the frame-level QP value (e.g., lower video quality). Using the minimum of the SADfw or the SADbw values for the sa_ratio may allow the temporally adaptive quantizer 260 to detect an area of a screen that becomes stationary (e.g., as previously described in the moving car example), in order to adjust the video quality for the stationary area.
Based on the sa_ratio, the temporally adaptive quantizer 260 may provide a temporal change in QP (e.g., tdQP) value to the adder 262 to modify the updated QP value received from the adder 242 to provide the QP* value. The adder 262 may provide the QP* value to the encoder 270. In an example, the tdQP value may be determined as follows:
tdQP = SAR_to_DQP[sa_ratio] / 4
where SAR_to_DQP is:
SAR_to_DQP = {−39, −32, −27, −22, −19, −16, −13, −10, −8, −6, −4, −1, 1, 2, 4, 5}
The above mapping from the sa_ratio value to the tdQP value is exemplary. Other embodiments may use different mapping values. Further, the temporally adaptive quantizer 260 may set the tdQP value based on a smaller or larger section of a frame than a macroblock, such as at a field or region level of a frame.
The encoder 270 may receive and encode (based on the QP* value) the video signal that includes the macroblock. The video signal may be encoded in accordance with one or more encoding standards, such as MPEG-2, MPEG-4, H.263, H.264, and/or HEVC, to provide the encoded bitstream. The video signal may be encoded by the encoder 270 based on a quantization strength, which is based on the QP* value received from the adder 262. In encoding content, the encoder 270 may generate a predictor for a macroblock, and may subtract the predictor from the macroblock to generate a residual. The encoder 270 may transform the residual using, for example, a discrete cosine transform (DCT), to provide a block of coefficients. The encoder 270 may quantize the block of coefficients based on the QP* value. The block of quantized coefficients and other syntax elements may be provided to an entropy encoder and encoded into the encoded bitstream. In some embodiments, the block of quantized coefficients may be reconstructed to determine an encoding cost (e.g., inverse quantized and inverse transformed to produce a reconstructed macroblock residual). The reconstructed residual may be used for prediction in encoding subsequent macroblocks and/or frames, such as for further in-macroblock intra prediction or other mode decision methodologies.
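The quantization step described above may be sketched as follows; the relationship in which the quantization step size roughly doubles every 6 QP mirrors H.264-style quantizers, and that relationship, the constant 0.625, and the use of a floating-point 2-D DCT are simplifying assumptions for illustration rather than the encoder 270's actual implementation.

```python
import numpy as np
from scipy.fft import dctn

def qstep(qp_star):
    """Quantization step size; doubles roughly every 6 QP, H.264-style (assumption)."""
    return 0.625 * 2.0 ** (qp_star / 6.0)

def encode_block(curr_mb, predictor, qp_star):
    """Residual -> 2-D DCT -> uniform quantization with a step derived from QP*."""
    residual = curr_mb.astype(np.float64) - predictor.astype(np.float64)
    coeffs = dctn(residual, norm='ortho')                        # block of transform coefficients
    levels = np.round(coeffs / qstep(qp_star)).astype(np.int32)  # quantized coefficients
    return levels  # passed, with other syntax elements, to the entropy encoder
```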
Components described herein, including but not limited to the encoding systems, rate controllers, motion estimators, spatial complexity estimators, spatially and temporally adaptive quantizers, and encoders described herein, may be implemented in all or in part using software in some examples. The software may be implemented using instructions encoded on one or more computer readable media. Any electronic storage (e.g. memory) may be used to implement the computer readable media, which may be transitory or non-transitory. The computer readable media may be encoded with instructions for performing the acts described herein, including but not limited to, rate control, encoding, QP selection, temporally or spatial adaptive quantization, motion estimation, spatial complexity calculation, and combinations thereof. The instructions may be executable by one or more processing units to perform the acts described. The processing units may be implemented using any number and type of hardware capable of executing the instructions including, but not limited to, one or more processors, circuitry, or combinations thereof.
The media source data 402 may be any source of media content, including but not limited to, video, audio, data, or combinations thereof. The media source data 402 may be, for example, audio and/or video data that may be captured using a camera, microphone, and/or other capturing devices, or may be generated or provided by a processing device. Media source data 402 may be analog or digital. When the media source data 402 is analog data, the media source data 402 may be converted to digital data using, for example, an analog-to-digital converter (ADC). Typically, to transmit the media source data 402, some type of compression and/or encryption may be desirable. Accordingly, an encoding system with temporally adaptive quantization 410 may be provided that may encode the media source data 402 using any encoding method in the art, known now or in the future, including encoding methods in accordance with video standards such as, but not limited to, MPEG-2, MPEG-4, H.264, HEVC, or combinations of these or other encoding standards. The encoding system with temporally adaptive quantization 410 may be implemented using any encoder described herein, including the encoding system 150 of
The encoded data 412 may be provided to a communications link, such as a satellite 414, an antenna 416, and/or a network 418. The network 418 may be wired or wireless, and further may communicate using electrical and/or optical transmission. The antenna 416 may be a terrestrial antenna, and may, for example, receive and transmit conventional AM and FM signals, satellite signals, or other signals known in the art. The communications link may broadcast the encoded data 412, and in some examples may alter the encoded data 412 and broadcast the altered encoded data 412 (e.g., by re-encoding, adding to, or subtracting from the encoded data 412). The encoded data 420 provided from the communications link may be received by a receiver 422 that may include or be coupled to a decoder. The decoder may decode the encoded data 420 to provide one or more media outputs, with the media output 404 shown in
The receiver 422 may be included in or in communication with any number of devices, including but not limited to a modem, router, server, set-top box, laptop, desktop, computer, tablet, mobile phone, etc.
The media delivery system 400 of
A production segment 510 may include a content originator 512. The content originator 512 may receive encoded data from any of the video contributors 505, or combinations thereof. The content originator 512 may make the received content available, and may edit, combine, and/or manipulate any of the received content to make the content available. The content originator 512 may utilize encoding systems described herein, such as the encoding system with temporally adaptive quantization 410 of
A primary distribution segment 520 may include a digital broadcast system 521, the digital terrestrial television system 516, and/or a cable system 523. The digital broadcast system 521 may include a receiver, such as the receiver 422 described with reference to
The digital broadcast system 521 may include an encoding system, such as the encoding system with temporally adaptive quantization 410 of
The cable local headend 532 may include an encoding system, such as the encoding system with temporally adaptive quantization 410 of
Accordingly, encoding, transcoding, and/or decoding may be utilized at any of a number of points in a video distribution system. Embodiments may find use within any, or in some examples all, of these segments.
From the foregoing it will be appreciated that, although specific embodiments of the disclosure have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the disclosure. Accordingly, the disclosure is not limited except as by the appended claims.