A video may include a series of images. A series of images, when rendered in sequence, may be perceived by a viewer as a motion picture. Each of the images in a video may be referred to as a video frame. A video frame may be arranged as an array of pixels, each pixel having a corresponding set of data.
A video may include a relatively large amount of data. For example, a video having F video frames per second in which each video frame is an array of A by B pixels of X data bits each results in F times A times B times X bits per second of data. As a consequence, a video may consume relatively large amounts of storage space and large amounts of bandwidth of a communication channel.
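The data rate arithmetic above can be illustrated with a short sketch. The function name and the example values (30 frames per second, 1920 by 1080 pixels, 24 bits per pixel) are assumptions chosen for illustration and do not come from the description.

```python
def raw_bit_rate(frames_per_second: int, width: int, height: int,
                 bits_per_pixel: int) -> int:
    """Return the uncompressed data rate in bits per second (F * A * B * X)."""
    return frames_per_second * width * height * bits_per_pixel


# Assumed example values: 30 frames/s, 1920 x 1080 pixels, 24 bits per pixel.
rate = raw_bit_rate(30, 1920, 1080, 24)
print(f"{rate} bits/s, about {rate / 1e9:.2f} Gbit/s")
```

For these assumed values the uncompressed rate is roughly 1.49 Gbit/s, which suggests why storage space and channel bandwidth become a concern.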
Video encoding may be employed to reduce an amount of data in a video. For example, video encoding may be used to transform a series of video frames into a video bit stream having substantially less data than the original video frames while retaining much of the visual information in the original video frames.
Video encoding may be subject to one or more encoding constraints. One example of an encoding constraint is a bit rate constraint, e.g. a maximum or minimum bit rate in a video bit stream. Another example of an encoding constraint is an encoding time constraint, e.g. a maximum time that may be consumed in encoding all or part of a video.
Prior methods for meeting an encoding constraint include adjusting quantization parameters. For example, the quantization parameters used to encode video data may be used to increase or decrease the bit rate of an encoded video bit stream. Unfortunately, adjusting quantization parameters to meet an encoding constraint may excessively sacrifice the quality of an encoded video.
Video encoding is disclosed that enables fine-grained control over the complexity of motion estimation to meet encoding constraints. Video encoding according to the present teachings includes scaling a set of complexity control parameters in response to an encoding constraint and encoding a video in response to the complexity control parameters.
Other features and advantages of the present invention will be apparent from the detailed description that follows.
The present invention is described with respect to particular exemplary embodiments thereof and reference is accordingly made to the drawings in which:
FIGS. 7a-7b show examples of ordered mode searches.
The encoding constraint 24 may be any encoding constraint. One example of an encoding constraint is a bit rate constraint. Another example of an encoding constraint is an encoding time constraint, e.g. the encoding time of a macro-block or video frame, the time taken for motion estimation of a macro-block, etc. Another example of an encoding constraint is a buffering constraint. Another example of an encoding constraint is an amount of distortion in an encoded video signal. Another example of an encoding constraint is an amount of power consumption involved in encoding.
The complexity control parameters 52 in one embodiment are parameters for a fast motion estimation on macro-blocks. The complexity controller 20 may scale the complexity control parameters 52 to increase the complexity of fast motion estimation, thereby decreasing a bit rate of the video signal 14 and increasing coding time. The complexity controller 20 may scale the complexity control parameters 52 to decrease the complexity of fast motion estimation, thereby increasing a bit rate of the video signal 14 and decreasing coding time. The complexity controller 20 may scale the complexity control parameters 52 to meet a distortion constraint.
The complexity controller 20 obtains a timing signal 22 from the encoder 10. The timing signal 22 indicates a time consumed by the encoder 10 to encode a macro-block. The complexity controller 20 compares the timing signal 22 to a target encoding time. If the timing signal 22 indicates more time than the target encoding time then the complexity controller 20 scales the complexity control parameters 52 to decrease the encoding time. If the timing signal 22 indicates less time than the target encoding time then the complexity controller 20 scales the complexity control parameters 52 to increase the encoding time. The complexity controller 20 may employ a sliding window control loop to ensure that a variation in the encoding time over time is relatively small.
A training-based method may be used to determine a mapping of the scaled complexity control value 16 to the complexity control parameters 52. A training method may include creating a pool of rate-complexity (R-C) points at a constant distortion based on a large training video and finely sampling the appropriate parameters. The R-C points not on the convex hull are pruned out, and from the remaining R-C points the optimal parameter combination for a given complexity value is read out.
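One way the pruning and read-out might be carried out is sketched below. This is a minimal illustration under stated assumptions, not the training procedure itself: the names RCPoint, rc_frontier and params_for_complexity are hypothetical, each point is assumed to carry the parameter triple that produced it, and the read-out rule (the most complex hull point that still fits the budget) is an assumption.

```python
from typing import List, NamedTuple, Tuple


class RCPoint(NamedTuple):
    complexity: float                    # e.g. encoding time per macro-block
    rate: float                          # bits produced at the fixed distortion
    params: Tuple[float, float, float]   # hypothetical (lambda_MD, lambda_ME, beta)


def _cross(o: RCPoint, a: RCPoint, b: RCPoint) -> float:
    """Z-component of (a - o) x (b - o) in the complexity-rate plane."""
    return ((a.complexity - o.complexity) * (b.rate - o.rate)
            - (a.rate - o.rate) * (b.complexity - o.complexity))


def rc_frontier(points: List[RCPoint]) -> List[RCPoint]:
    """Keep the lower convex hull where rate falls as complexity rises."""
    hull: List[RCPoint] = []
    for p in sorted(points, key=lambda q: (q.complexity, q.rate)):
        # Pop points that would make the chain turn clockwise or go straight.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    if not hull:
        return []
    frontier = [hull[0]]
    for p in hull[1:]:
        if p.rate >= frontier[-1].rate:
            break  # extra complexity no longer lowers the rate
        frontier.append(p)
    return frontier


def params_for_complexity(frontier: List[RCPoint],
                          budget: float) -> Tuple[float, float, float]:
    """Read out the parameters of the most complex point within the budget."""
    feasible = [p for p in frontier if p.complexity <= budget]
    chosen = feasible[-1] if feasible else frontier[0]
    return chosen.params
```

If no point fits the budget, the sketch falls back to the cheapest point on the frontier; a real system might instead report that the budget cannot be met.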
The complexity controller 20 provides a feedback control loop for controlling the encoding time of the video encoder 10 per macro-block. The scaled complexity control value 16 (CS) is updated in response to a deviation from a target encoding time using a sliding window of previous M macro-blocks according to the following.
where c is the real encoding time for each macro-block measured with an accurate timer and CT is the target encoding time per macro-block. KP and KD are proportional and derivative constants.
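A feedback loop of this kind might look like the following sketch. The update rule shown is one conventional proportional-derivative form and is an assumption, not the update equation of this description. The class name, the clamping range for CS, and the choice to average c over the last M macro-blocks are likewise hypothetical.

```python
from collections import deque


class ComplexityController:
    """Assumed PD-style update of the scaled complexity control value CS."""

    def __init__(self, target_time: float, kp: float, kd: float, window: int,
                 cs_init: float = 0.5, cs_min: float = 0.0, cs_max: float = 1.0):
        self.target_time = target_time      # CT: target time per macro-block
        self.kp = kp                        # KP: proportional constant
        self.kd = kd                        # KD: derivative constant
        self.times = deque(maxlen=window)   # sliding window of the last M times
        self.cs = cs_init
        self.cs_min = cs_min
        self.cs_max = cs_max
        self.prev_error = 0.0

    def update(self, measured_time: float) -> float:
        """Fold in one measured encoding time c and return the new CS."""
        self.times.append(measured_time)
        average = sum(self.times) / len(self.times)
        error = self.target_time - average  # positive when encoding is fast
        delta = self.kp * error + self.kd * (error - self.prev_error)
        self.prev_error = error
        self.cs = min(self.cs_max, max(self.cs_min, self.cs + delta))
        return self.cs


# Assumed usage: macro-blocks taking twice the target time push CS down.
controller = ComplexityController(target_time=1.0, kp=0.1, kd=0.0, window=4)
print(controller.update(2.0))
```

Averaging over a window rather than reacting to a single macro-block is what keeps the variation in encoding time small in this sketch.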
The mapper 42 maps the CS for each macro-block to the complexity control parameters 52 before encoding. The target encoding time may be specified per any unit, e.g. a video frame or group of video frames. A similar mechanism may be used for joint complexity-rate control in real time coding and transmission systems where the delay and buffer constraints are satisfied with relatively little fluctuation in quality.
The rate-distortion slope is updated as follows.
where B1(i) and B2(i) are the fullness of the input buffer 150 and the output buffer 152 at time i and B1max and B2max are the maximum buffer sizes and μ1
The process of fine-grained complexity scaling in the video encoder 10 is based on an observation that a majority of the complexity in transform-based motion-compensated video encoders involves the motion estimation with mode search, along with transform and entropy coding. Most of the complexity may be attributed to the motion estimation (ME) and mode decision steps in the video encoder 10 even when a fast ME scheme is used. The complexity controller 20 allocates the total available complexity, e.g. per frame, optimally and differently to constituent macro-blocks.
The complexity control parameters 52 are selected to scale the complexity of motion/mode search in the video encoder 10 in the context of a fast ME process. In one embodiment, the complexity control parameters 52 include a mode gradient (λMD) for the number of modes searched, a motion estimation gradient (λME) for motion vector accuracy, and an early stop SAD (sum of absolute differences) threshold (β). The complexity control parameters 52 may be scaled in combination to achieve the best rate-distortion tradeoff for a given complexity.
The early stop SAD threshold (β) comes into play during the mode and motion search by the video encoder 10. The early stop criterion terminates the search and the best mode and motion vectors obtained up to that point are used as the decision for the corresponding macro-block. This is done by comparing the best SAD cost so far against the early stop SAD threshold. The early stop SAD threshold is obtained by SAD cost prediction from neighboring blocks for the 16×16 case and the SAD cost value for the next higher block size for smaller sizes of macro-blocks. The SAD cost threshold is scaled from the original prediction using the early stop SAD threshold (β) as follows.
SAD_Early_Stop_Th = β × (SAD cost prediction)
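The scaling and the early stop comparison can be sketched as follows. The function names are hypothetical, and the direction of the test (stop when the best SAD cost so far falls below the threshold) is an assumption.

```python
def early_stop_threshold(beta: float, predicted_sad_cost: float) -> float:
    """Scale the predicted SAD cost by beta to obtain the early stop threshold."""
    return beta * predicted_sad_cost


def should_stop_early(best_sad_so_far: float, beta: float,
                      predicted_sad_cost: float) -> bool:
    """Assumed test: stop once the best SAD cost so far is below the threshold."""
    return best_sad_so_far < early_stop_threshold(beta, predicted_sad_cost)


# Assumed example values: a prediction of 1000 and beta = 0.5.
print(should_stop_early(400, 0.5, 1000))
print(should_stop_early(600, 0.5, 1000))
```

Under this assumed test, a larger β stops the search sooner and lowers complexity, while a smaller β lets the search run longer.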
The motion estimation gradient (λME) is defined as follows.
where ΔSAD is the SAD cost difference between before and after that ME step is performed and Δcomputation is the computation required to perform that step which can be the number of SAD cost computations per pixel or real time required. When λME is smaller than a gradient threshold (λME
A method of scaling complexity using the motion estimation gradient (λME) and SAD cost threshold (SAD_Th) is as follows.
Step A1: For each macro-block.
Step A2: Check the SAD cost of the predictors to find the best possible initial search point.
Step A3: If SAD<SAD_Th go to step A5. Otherwise, do an unsymmetrical cross search.
Step A4: If SAD<SAD_Th go to step A5. Otherwise, do a big hexagon search.
Step A5: Conduct one step in the recursive small hexagon search loop.
Step A6: If λME is smaller than the gradient threshold, or if ΔSAD=0, go to step A8. Otherwise repeat step A5.
Step A7: Conduct one step in the recursive diamond search loop.
Step A8: If λME is smaller than the gradient threshold, or if ΔSAD=0, stop. Otherwise repeat step A7.
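The recursive search loops in the steps above share one pattern: conduct one step, measure the SAD improvement per unit of computation, and stop when that gradient is too small or when ΔSAD is zero. The sketch below illustrates the pattern with a small diamond. It assumes that λME is the ratio of ΔSAD to Δcomputation and that Δcomputation is counted as the number of SAD evaluations in the step. The function name, the cost callback and max_steps are hypothetical.

```python
from typing import Callable, List, Tuple

Vec = Tuple[int, int]
SMALL_DIAMOND: List[Vec] = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def gradient_terminated_search(cost: Callable[[Vec], float], start: Vec,
                               gradient_threshold: float,
                               pattern: List[Vec] = SMALL_DIAMOND,
                               max_steps: int = 64) -> Tuple[Vec, float]:
    """Repeat one search step until the SAD gain per computation is too small."""
    best_mv, best_cost = start, cost(start)
    for _ in range(max_steps):
        # One step: evaluate every candidate around the current best vector.
        candidates = [(best_mv[0] + dx, best_mv[1] + dy) for dx, dy in pattern]
        step_cost, step_mv = min((cost(c), c) for c in candidates)
        delta_sad = max(0.0, best_cost - step_cost)
        delta_computation = len(candidates)  # SAD evaluations in this step
        if step_cost < best_cost:
            best_mv, best_cost = step_mv, step_cost
        gradient = delta_sad / delta_computation  # assumed form of lambda_ME
        if delta_sad == 0 or gradient < gradient_threshold:
            break
    return best_mv, best_cost


# Assumed toy cost with its minimum at motion vector (3, -2).
toy_cost = lambda mv: abs(mv[0] - 3) + abs(mv[1] + 2)
print(gradient_terminated_search(toy_cost, (0, 0), gradient_threshold=0.0))
```

Raising the gradient threshold ends the loop after fewer steps, trading motion vector accuracy for lower complexity.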
A method of scaling sub-pixel complexity using the motion estimation gradient (λME) is as follows.
Step B1: For every (interpolated) macro-block.
Step B2: Conduct one step in the recursive hexagonal search loop, by computing SADs with respect to interpolated reference.
Step B3: If λME is smaller than the gradient threshold, or if ΔSAD=0, stop. Otherwise repeat step B2.
The mode gradient (λMD) is defined as follows.
where ΔSAD is the SAD cost difference between before and after that mode search step is performed and Δcomputation is the computation required to perform that mode which can be the number of SAD computations per pixel or real time consumed. When λMD is smaller than gradient threshold (λ—
The encoder 10 searches a fixed number of modes from a set of selected modes sequentially until a stopping criterion is satisfied. Alternatively, the encoder 10 may search only 16×16, 16×8, and 8×16 modes. The stopping criterion may be based on a threshold in the cost function or the mode gradient λMD.
The order in which the encoder 10 searches modes may be based on the statistical frequency of the modes for a given training set. Alternatively, the order may be based on low complexity features computed from a video. The dependencies in the INTER mode group from motion vector and SAD predictors require searching in order from larger to smaller sizes even though the search may terminate anywhere within that group.
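A frequency-based ordering followed by a sequential search with a stopping criterion might be sketched as follows. The function names, the mode labels, the cost callback and the cost threshold are hypothetical. The sketch uses only the cost threshold variant of the stopping criterion and does not enforce the larger-to-smaller ordering within the INTER group.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def order_modes_by_frequency(training_modes: List[str]) -> List[str]:
    """Order modes from most to least frequent in a training set."""
    return [mode for mode, _ in Counter(training_modes).most_common()]


def search_modes(ordered_modes: List[str], sad_cost: Callable[[str], float],
                 cost_threshold: float,
                 max_modes: int) -> Tuple[Optional[str], float]:
    """Search at most max_modes modes in order, stopping at a low enough cost."""
    best_mode: Optional[str] = None
    best_cost = float("inf")
    for mode in ordered_modes[:max_modes]:
        cost = sad_cost(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost < cost_threshold:
            break  # stopping criterion satisfied, skip the remaining modes
    return best_mode, best_cost


# Assumed training decisions and per-mode SAD costs for one macro-block.
order = order_modes_by_frequency(
    ["SKIP", "16x16", "SKIP", "8x8", "16x16", "SKIP"])
costs = {"SKIP": 900.0, "16x16": 400.0, "8x8": 300.0}
print(order, search_modes(order, costs.__getitem__, 500.0, 3))
```

With the assumed costs, the search stops at the 16x16 mode and never evaluates the 8x8 mode, which is where the complexity saving comes from.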
FIG. 7a shows an example ordered mode search for relatively low resolution video.
[...] intra modes should be tested earlier. The INTRA-II group includes a variety of predictors and complexity scaling may be performed by ordering the search within the predictors as well, particularly for high definition content in a video.
A method of scaling complexity using the mode gradient (λMD) is as follows.
Step C1: For every macro-block.
Step C2: Find Skip mode SAD_cost (SAD(Skip)), if SAD(Skip)<SAD_Early_Skip_Th then set mode=skip, go to step C13, else go to step C3.
Step C3: If SAD(Skip)<SAD_Early_Skip_Th, then set MV=MV pred, mode=Inter16×16, go to step C13, else go to step C4.
Step C4: Find Intra-cost (SAD(intra)), if SAD(intra)<SAD_Early_Skip_Th, then set mode=intra, go to step C13, else go to step C5.
Step C5: Find SAD_cost for 16×16 mode (SAD(16×16)), if
then set mode=Inter16×16 and go to step C13, else go to step C6.
Step C6: Find SAD_cost for 8×16 and 16×8 modes, if
then set mode=Inter16×8 (or 8×16) and go to step C13, else go to step C7.
Step C7: For each 8×8 block,
Step C8: Find SAD_cost for 8×8 mode, if
then go to step C11, else go to step C9.
Step C9: Find SAD_cost for 4×8 and 8×4 modes, if
then go to step C11, else go to step C10.
Step C10: Find SAD_cost for 4×4 mode, if
then go to step C11, else go to step C12.
Step C11: Set mode of the 8×8 block, if all 8×8 block modes are set go to step C12, else go to step C7 for the next 8×8 block.
Step C12: Find Intra-cost for the macro-block with predictions, select the mode with minimum SAD_cost.
Step C13: Encode macro-block with given mode.
The foregoing detailed description of the present invention is provided for the purposes of illustration and is not intended to be exhaustive or to limit the invention to the precise embodiment disclosed. Accordingly, the scope of the present invention is defined by the appended claims.