The present arrangement provides a system and method associated with real-time video encoding and, more specifically, a system which intelligently allocates and achieves a time budget during real-time video encoding.
Real-time encoding is an important feature of modern video encoders. Video encoding is a computationally intensive process, requiring the interaction of several core modules such as spatial and temporal prediction, motion estimation and compensation, mode decision, transform coding, quantization and entropy coding. Natural video sequences have widely varying characteristics (motion, texture, special effects, etc.) and, hence, widely varying coding complexities. Adding to the complexity of the encoding process is the fact that encoders may be implemented on various software and hardware platforms, each with its own distinct and varying processing capabilities. Thus, a drawback associated with real-time video encoding is the large variability in the time it takes to encode video data. Moreover, it is difficult to predict the encoding time consumed by different encoding units.
An attempt to remedy the deficiencies identified above focuses on allocating a time budget for use during encoding. One manner of allocating time budgets relates to allocating a time budget for each Picture to be encoded based on the target frame rate alone. Then, after accounting for the overhead time within a Picture, the Mode Decision and Motion Estimation modules at the Macroblock level are constrained to execute in a fixed amount of time. The achievement mechanism used to implement the above allocation uses fixed, pre-determined thresholds to determine whether or not to evaluate certain coding modes in order to meet the real-time constraint. However, there are certain weaknesses associated with this approach to time budgeting for real-time video encoding. First, allocating a constant Picture-level encoding time and a constant Macroblock encoding time is not optimal because of the different coding complexities associated with individual Pictures and Macroblocks. Second, the above method of time budgeting focuses solely on individual pictures and does not take into account the carry-over time between pictures and/or between macroblocks.
Another path to achieving real-time video coding efficiency focuses on reconfiguring an encoder depending on the fullness of a multi-frame input buffer. To maintain a target buffer fullness, and hence real-time encoding, a controller module reduces encoder complexity when the buffer fullness is high and increases complexity when the buffer fullness is low. Complexity control is achieved either by changing Picture-types or by switching between different Motion Estimation schemes. However, there are also drawbacks associated with this mode of achieving real-time coding efficiency. Specifically, while this method works well for smooth sequences, the encoder cannot be properly reconfigured to handle abrupt changes in complexity. An additional drawback of a method that relies on tight control of the multi-frame input buffer is that it cannot estimate the complexity of the incoming video signal without incurring additional computational overhead, resulting in an inability to adapt to the video signal characteristics.
Yet another path for achieving real-time coding efficiency is based on a frame-level control module for allocation and a per-frame complexity control module for achievement. In this method, the allocation module computes a target encoding time for a next frame depending on the total delay (or waiting time) experienced by the frames in the input buffer. If the coding delay is too large, then frames may be dropped. The complexity control module then uses a Lagrangian rate-distortion-complexity cost estimation to encode the frames within the target encoding time. The rate and distortion statistics of the co-located Macroblock in the previous Picture (in temporal order) and the Quantization Parameter (QP) are used to model the coding behavior of the current Macroblock. This model is used to determine whether it would be more efficient to use a SKIP mode or to evaluate all the remaining Macroblock coding modes. The drawbacks associated with this method are similar to those discussed above. Specifically, this real-time encoding scheme does not adapt to the input video signal characteristics during the time budget calculation: it fails to estimate the Macroblock complexity prior to actual encoding and does not model the performance of coding modes other than SKIP mode.
Therefore, a need exists for a system that provides an efficient real-time video encoder that remedies these and other deficiencies described hereinabove.
In a first embodiment, an apparatus for encoding video is provided. A pre-analysis processor processes unencoded video data formed from a series of video pictures into respective video segments. An allocation processor allocates a first encoding time budget to a respective video segment based on a size of the respective segment and a target frame rate for the respective video segment, a second encoding time budget to individual pictures that form the respective video segment based on a picture-level complexity value and a type of picture, the second time budget for all individual pictures being substantially equal to the first time budget, and a third encoding time budget to individual blocks that form respective ones of the individual pictures based on a coding mode for the individual block and a block complexity value, the third time budget for all blocks being substantially equal to the second time budget for the respective individual picture that includes the blocks. An encoding processor encodes respective video segments using the first, second and third time budgets.
In another embodiment, a method of encoding video is provided. The method includes the activities of processing unencoded video formed from a series of video pictures into respective video segments; allocating a first encoding time budget to a respective video segment based on a size of the respective segment and a target frame rate for the respective video segment; allocating a second encoding time budget to individual pictures that form the respective video segment based on a picture-level complexity value and a type of picture, the second time budget for all individual pictures being substantially equal to the first time budget; and allocating a third encoding time budget to individual blocks that form respective ones of the individual pictures based on a coding mode for the individual block and a block complexity value, the third time budget for all blocks being substantially equal to the second time budget for the respective individual picture that includes the blocks. The method further includes encoding respective video segments using the first, second and third time budgets.
The above presents a simplified summary of the subject matter in order to provide a basic understanding of some aspects of subject matter embodiments. This summary is not an extensive overview of the subject matter. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the subject matter. Its sole purpose is to present some concepts of the subject matter in a simplified form as a prelude to the more detailed description that is presented later.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of embodiments are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the subject matter can be employed, and the subject matter is intended to include all such aspects and their equivalents. Other advantages and novel features of the subject matter can become apparent from the following detailed description when considered in conjunction with the drawings.
The subject matter is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject matter. It can be evident, however, that subject matter embodiments can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the embodiments.
It should be understood that the elements shown in the FIGS. may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces.
The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its spirit and scope.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read only memory (“ROM”) for storing software, random access memory (“RAM”), and nonvolatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
As used in this application, the terms “component” and/or “module” are intended to refer to hardware, or a combination of hardware and software in execution. For example, these elements can be, but are not limited to being, a process running on a processor, a processor, an object, an executable running on a processor, and/or a microchip and the like. By way of illustration, both an application running on a processor and the processor can be a component or a module. One or more components and/or modules can reside within a process and may be localized on one system and/or distributed between two or more systems. Functions of the various components and/or modules shown in the figures can be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software.
The present invention advantageously provides a video encoder apparatus for real-time video encoding. The video encoder apparatus intelligently allocates time budgets at each of three different levels within the video. The present apparatus advantageously allocates encoding time budgets for each of a Group Of Pictures (GOP) level, a Picture level and Macroblock (MB) level. By advantageously taking into account the time budget for each of the three levels, the video encoder advantageously ensures efficient real-time encoding of video data.
The time budget allocation is based on time-complexity modeling at the Picture and MB level. The time budget allocation is, in some ways, similar but not equivalent to the rate control function performed by a rate control module of a video encoder. As is known, the rate control module of a video encoder allocates bit budgets among different coding units of an encoder to maintain a target bitrate in the encoded video data stream. While the present encoder performs rate control as discussed, the video encoder further advantageously couples adaptive time budget allocation to each coding unit that encodes a respective level of the video data (e.g. GOP, Picture and MB). The adaptive time budget allocation performed by the video encoder adapts based on at least one video encoding metric that is used by a time budget allocation processor or module. The at least one video encoding metric may include at least one of (a) video signal characteristics; (b) actual encoder configuration; and (c) computing resources (or platform capabilities).
Thus, the video encoder apparatus advantageously performs time budget allocation via a time budget allocation processor that automatically determines how the video encoder should make best use of its time in order to achieve real-time performance. Once an efficient time budget has been allocated to the respective levels of the video data to be encoded, the video encoder further advantageously achieves the determined time budget using a time budget “achievement” module, thereby forming a complete time control system. In one embodiment, the time budget allocation and time budget achievement functions are implemented as separate modules. In another embodiment, the time budget allocation and achievement functions are performed by a single module.
To accomplish the inventive time budget achievement, the video encoder uses time-complexity modeling to control an encoding mode decision for the third level being encoded. In one embodiment, time-complexity modeling is used to control the Macroblock level mode decision process in order to achieve the time budget required for real-time video encoding. The time budget achievement algorithm adapts according to the at least one video encoding metric. For example, as referenced above, the at least one video encoding metric may include at least one of (a) video signal characteristics; (b) actual encoder configuration; and (c) computing resources (or platform capabilities).
Furthermore, once the video encoder has completed the time budget allocation function for a respective video data stream, the video encoder 100 advantageously achieves the real-time encoding time budget using the same accurate time-complexity modeling approach applied to the Macroblock encoding mode decision at the Macroblock level. By using the same time-complexity modeling in both the time budget allocation and time budget achievement phases of encoding, the computing resources consumed by the video encoder 100 may be minimized, further improving the real-time coding efficiency. To achieve the time budget, the time associated with and consumed by the different Macroblock coding modes is advantageously measured and tracked to generate a Macroblock complexity measurement. The Macroblock complexity measurement requires low computational overhead to generate and may be advantageously used by the encoder for other encoding processes such as Picture-type selection and Rate Control. Thus, the achievement scheme of the video encoder dynamically adapts to the actual encoder performance and platform capabilities.
The adaptive time budget allocation and achievement algorithm may be implemented in any type of video encoder 100. For example, the video encoder 100 may be any standard video encoder including, but not limited to, a video encoder that encodes video according to an H.264/AVC encoding scheme, an H.264/SVC encoding scheme, an MPEG-4 encoding scheme and/or an MPEG-2 encoding scheme. These are described for purposes of example only, and the principles of the present invention may be embodied in any video encoder that encodes video data according to any video encoding standard.
As shown in FIG. 1, a video encoder 100 receives unencoded video data from a video source 50.
The video data from source 50 is processed by the pre-analysis processor 102 into a plurality of different levels, each level of the video pictures to be encoded separately. The video data is organized at a first level as a Group of Pictures (GOP), which represents a predetermined sequence of pictures. At a second level, hereinafter the Picture level, each picture of the sequence of pictures is divided into non-overlapping blocks of a predetermined shape having a predetermined size. The shape and size of the non-overlapping blocks is dependent upon the type of coding scheme implemented by the video encoder 100. The blocks into which each picture is divided form the third level to be encoded and are termed Macroblocks, which are the most basic unit of any video encoder 100.
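The three-level organization described above can be pictured with a small data model. The following Python sketch is illustrative only; the class and field names (Macroblock, Picture, Gop, complexity) are hypothetical and not taken from the disclosure:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Macroblock:
    index: int
    complexity: float = 0.0          # (Cmpl)_j supplied by the pre-analysis stage

@dataclass
class Picture:
    picture_type: str                # "I", "P" or "B"
    macroblocks: List[Macroblock] = field(default_factory=list)

@dataclass
class Gop:
    pictures: List[Picture] = field(default_factory=list)

    @property
    def size(self) -> int:           # N, the number of coded Pictures in the GOP
        return len(self.pictures)
```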
The video encoder 100 includes an encoding processor 103 that is coupled to the pre-analysis processor 102. The encoding processor 103 selectively receives pre-processed video data that is unencoded and encodes the pre-processed video data according to a video encoding scheme. The encoding processor 103 may encode the pre-processed video data according to any or all parameters associated with a particular video encoding standard or scheme. In one embodiment, the video encoder 100 may be an H.264/AVC video encoder and the video data received from the source 50 is uncompressed YUV formatted video data. In this embodiment, the pre-analysis processor 102 organizes the video data as a GOP having a predetermined size and the individual Pictures are divided into Macroblocks that are 16×16 pixels. The encoding processor 103 selectively encodes the pre-processed video data based on the parameters defined by the H.264/AVC encoding standard.
An output processor 105 is coupled to the encoding processor 103 for selectively outputting the video data encoded by the encoding processor 103 to a destination system. The output processor 105 may include a transmission function that enables transmission of the encoded data via a communication network. Additionally, the output processor 105 may further format and/or partition the encoded video data to enable efficient transmission to a destination system. This operation is described for purposes of example only, and the output processor 105 may perform any operation that enables the encoded video data to be provided to any destination system either locally or remotely located from the video encoder 100.
A rate control processor 104 is coupled to the pre-analysis processor 102 and implements a rate control scheme for the video data to be encoded. The rate control processor 104 allocates bits to each Picture of a particular GOP with the least amount of distortion and subject to a target bit rate. In one embodiment, the rate control processor 104 may implement a constant bit rate (CBR) encoding scheme. In another embodiment, the rate control processor 104 may be inactive such that the resulting encoding scheme is a variable bit rate (VBR) encoding scheme. The rate control processor 104 may allocate bits collectively to a GOP as well as to sub-levels of the GOP, including the Picture level and the Macroblock level.
The video encoder 100 also includes an allocation processor 106 that dynamically allocates a time budget associated with different organizational levels (GOP, Picture and MB) within a video data stream. The allocation processor 106, at the GOP level, derives a time budget based on a GOP size (i.e. number of coded Pictures in the GOP) and the target frame rate. The allocation processor 106 further advantageously determines an amount of encoding time associated with a previous GOP that was unused. For example, if the previous GOP had a time budget but the actual time it took to encode the GOP was lower than the allocated time budget, the allocation processor 106 may use any remaining time in determining and allocating a time budget for a present GOP. The allocation processor 106 estimates an overhead time associated with coding a current GOP from previously measured encoding time values and subtracts the estimated value from the GOP budget to yield the GOP encoding time budget.
In response to determining the time budget associated with the first level (GOP), the allocation processor 106 determines a second level time budget. The second level of encoding is at the Picture level, and the individual Picture level encoding time budgets can be derived depending on the operating mode of the encoder. For CBR (Constant Bit Rate) encoding, the derivation is based on the corresponding bit budgets assigned by the rate control processor 104, which has previously considered a picture-level complexity measurement while allocating bit budgets subject to a maximum GOP bit budget. If the encoding scheme is a VBR (Variable Bit Rate) encoding scheme, the derivation of the time budget for the second (Picture) level is based on a Picture-level complexity metric that was calculated by the pre-analysis processor. The complexity metric defines a complexity (e.g. an amount of energy) associated with at least one characteristic of the picture of the video data. The complexity metric may include at least one of (a) motion; (b) texture; (c) special effect; and (d) auxiliary picture characteristic. Additionally, when the allocation processor 106 is determining the second level time budget for the Picture level, the type of picture (e.g. I frame, P frame, B frame) is also taken into account.
Upon determining the time budget at the second level, that is to say, for each Picture of the GOP, the allocation processor 106 determines and allocates a time budget at the third level for each Macroblock that forms each Picture of the GOP. At the MB level, the allocation processor 106 determines and allocates time budgets in proportion to a local complexity measure associated with each Macroblock. The complexity measure used has a very low computational overhead and is also used by other modules within the encoder, such as Picture-type selection and Rate Control. The allocation processor 106 also measures a performance of the encoding processor 103. The performance of the encoding processor 103 is measured in terms of the actual encoding time associated with a particular encoding operation (e.g. GOP encoding, Picture level encoding, MB encoding) and the actual coded bits at each level of encoding. The performance measurement is utilized by the allocation processor to generate at least one model parameter that is used for allocating a time budget for a subsequent GOP and the Pictures and Macroblocks associated with the subsequent GOP. The at least one model parameter is automatically updated after each coded Picture, resulting in a dynamically adaptable time budget allocation that considers actual encoder performance and platform capabilities to adaptively update the time budget allocation within each of the GOP, Picture and Macroblock levels in the video data stream prior to encoding thereof.
The video encoder 100 further includes an achievement processor 108 and a memory 110 to which the achievement processor 108 is coupled. The achievement processor 108 enables the time budgets allocated by the allocation processor 106 to be achieved to facilitate efficient real-time encoding by the encoding processor 103. The achievement processor 108 uses the time-complexity modeling to control the Macroblock level mode decision process in order to achieve the allocated time budget required for real-time video encoding by the encoding processor 103. The achievement processor 108 executes an achievement algorithm that dynamically adapts according to the at least one video signal characteristic as well as the actual encoder configuration and computing resources available to the video encoder 100.
The achievement processor 108 uses an accurate time-complexity modeling approach at the Macroblock mode level whereby an encoding time associated with each Macroblock coding mode is modeled as a function of complexity. The time consumed by different Macroblock coding modes is measured and tracked, and because the Macroblock complexity measurement requires very low computational overhead to generate, the complexity measurement may also be used for other purposes such as Picture-type selection and Rate Control. The achievement scheme dynamically adapts to the actual encoder performance and platform capabilities.
The achievement processor 108 receives a time budget associated with a particular Macroblock as determined by the allocation processor 106. Within the allocated time budget for the Macroblock, the achievement processor evaluates a coding cost associated with the available coding modes that may be applied to the particular Macroblock. In order to evaluate or “code” each mode, the encoder has to perform spatial or temporal prediction, motion estimation and compensation, and residue coding (transformation, quantization and entropy coding). Therefore, the mode decision process is computationally intensive. Often, the mode decision process (along with the motion estimation) consumes the greatest portion of the encoding time. Hence, the objective is to reduce the computational burden at the mode decision stage in order to achieve the time budget that has been allocated for the particular Macroblock. The number and types of coding modes evaluated by the achievement processor 108 may include mandatory coding modes, whereby all coding modes designated as mandatory are evaluated prior to selection of a coding mode. Other, non-mandatory coding modes may also be evaluated by the achievement processor 108 if the time associated with evaluating those non-mandatory coding modes allows the achievement processor 108 to remain within the encoding time budget allocated to the particular Macroblock and still leave sufficient time for the encoding processor 103 to actually encode the particular Macroblock according to one of the MB coding modes. In response to evaluating the coding modes (either mandatory only, or mandatory and non-mandatory), the achievement processor 108 selects the coding mode that is determined to have the least coding cost, thereby providing the most compression-efficient coding mode for the particular Macroblock.
During the coding mode evaluation process, the achievement processor 108 calculates, for each coding mode evaluated, a ratio of the actual time required to evaluate a particular coding mode to a complexity value of the particular Macroblock. These ratios form a mode complexity map that may be stored in memory 110. Memory 110 being a separate component is described for purposes of example only, and the memory may be resident within any of the above described processors and be accessible by the achievement (or other) processors depending on the computational operation being performed. The achievement processor 108 selectively queries the mode complexity map for each coding mode that has not yet been evaluated to determine if the time remaining in the allocated encoding time budget will be sufficient to evaluate the next previously unevaluated coding mode.
Once the coding mode for the particular Macroblock having the lowest coding cost is selected, the achievement processor 108 repeats the evaluation for a subsequent Macroblock of the Picture. When all Macroblocks of a particular picture have been evaluated and encoded, the achievement processor 108 repeats this process for the Macroblocks of a subsequent picture until all Macroblocks of all pictures that form the GOP have been encoded. Thereafter, these operations are repeated for the Macroblocks of the pictures of subsequent GOPs.
In one embodiment, the video encoder 100 may be an H.264/AVC video encoder. The achievement processor 108 may achieve the time budget allocated for each Macroblock by the allocation processor 106. The Macroblock mode decision process for an H.264/AVC encoder selects a coding mode for each type of picture to be encoded (I, P and B pictures). For I Pictures, the available MB coding modes are Intra_16×16, Intra_4×4 and Intra_PCM. These modes support spatial prediction only. For P Pictures, the available MB coding modes include all the Intra modes, SKIP, Inter_16×16, Inter_8×8, Inter_8×16 and Inter_16×8. The Inter_8×8 mode also supports sub-partitions of sizes 8×4, 4×8 or 4×4. Within the Inter modes, only uni-directional temporal prediction is allowed. For B Pictures, the available MB coding modes include all the Intra and Inter modes mentioned above, with the addition of the DIRECT mode. Within the Inter modes, both unidirectional (forward or backward) and bidirectional (forward and backward) temporal prediction are supported. Most encoders evaluate some or all of these coding modes. For each mode, a Rate-Distortion cost is obtained. Next, the mode with the least cost is selected as the final coding mode, since this is the most compression-efficient option. The achievement processor 108 evaluates each mode indicated as mandatory and evaluates additional coding modes as the achievement processor 108 determines that sufficient time exists within the allocated time budget to evaluate those coding modes. The determination as to which coding modes are evaluated is performed using the mode complexity map, which is selectively updated to include the ratio of the actual coding time for a particular mode to the complexity measurement associated with the respective Macroblock being encoded.
The above discussion of the video encoder being an H.264/AVC encoder is provided for purposes of example only. With suitable modifications, the principles of the algorithm that controls operation of the video encoder 100 can also be implemented on other standard video encoders. For example, the Macroblock coding modes and partition sizes that are discussed in this section are unique to H.264/AVC. One skilled in the art could readily substitute the coding modes and MB sizes based on the particular encoding scheme being used by the encoder. The achievement scheme described above may operate in any other type of video encoder so long as a value corresponding to the time budget allocation for each Macroblock is available.
In block 202, the pre-analysis processor (102 in FIG. 1) processes the unencoded video data received from the source into the first, second and third coding levels (GOP, Picture and Macroblock) described above.
At block 204, a first coding level time budget is allocated. The first coding level is the coding associated with the Group of Pictures (GOP). The time budget for each GOP can be calculated from the current GOP size, as determined by the pre-analysis processor, and the target frame rate (in frames per second). Additionally, the first coding level time budget may also be based on the time remaining from the actual encoding of the immediately preceding GOP. Therefore, for the current GOP, its calculated time budget is given by Equation 1:

T^Calc_GOP = N / (FR)_Target + T_Carryover    (Equation 1)

where N represents the GOP size (e.g. the number of individual pictures in the current GOP), (FR)_Target represents the target frame rate (frames/second) and T_Carryover represents the difference between the calculated time budget for the previous GOP and the actual time taken to encode the previous GOP. For the very first GOP, T_Carryover equals 0; it is calculated and updated after the last Picture in the current GOP has been encoded. T_Carryover is used to maintain the real-time frame rate over consecutive GOPs.
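As a rough illustration of Equation 1, a GOP-level budget could be computed as follows. This is a minimal sketch with hypothetical names, not the disclosure's implementation:

```python
def gop_time_budget(gop_size: int, target_fps: float, carryover: float = 0.0) -> float:
    """Equation 1: T^Calc_GOP = N / (FR)_Target + T_Carryover.

    gop_size   -- N, the number of coded Pictures in the current GOP
    target_fps -- (FR)_Target, the target frame rate in frames per second
    carryover  -- T_Carryover, time left over (or overspent) by the previous GOP
    """
    return gop_size / target_fps + carryover
```

For example, a 30-picture GOP at a 30 frames/second target with 0.1 seconds carried over from the previous GOP yields a budget of 1.1 seconds.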
In block 206, the overhead time for the first coding level is computed and updated. The total time required to encode a particular GOP can be split into two parts: overhead time and encoding time. Overhead time can be generally defined as time spent by the encoder on tasks that do not directly contribute to the Macroblock encoding process, such as the time it takes for the pre-analysis processor to execute all of its defined functions. This is overhead time because these processes execute prior to the actual encoding stage performed by the encoding processor (103 in FIG. 1).
After Picture i has been pre-processed by the pre-analysis processor, block 206 measures and updates the overhead time T_Overhead for the current picture type using a sliding window average of the pre-process time associated with the last W_O coded pictures of the same type, as defined in Equation 2:

T_Overhead = (1/W_O) Σ_{k=1}^{W_O} T_Overhead(k)    (Equation 2)

In Equation 2, W_O represents the size of the sliding window of a predetermined number of previously coded individual pictures, and T_Overhead(k) represents the measured pre-process time of the k-th picture in that window. Furthermore, T_Overhead is tracked separately for each type of picture to be coded by the video encoder.
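A sliding-window overhead tracker along the lines of Equation 2 might look like the following sketch; the window size of 5 and the function names are assumptions for illustration:

```python
from collections import defaultdict, deque

W_O = 5  # assumed sliding window size; the disclosure leaves W_O predetermined

# One window of recent pre-process (overhead) times per picture type (I, P, B),
# since T_Overhead is tracked separately for each picture type.
overhead_windows = defaultdict(lambda: deque(maxlen=W_O))

def update_overhead(picture_type: str, measured_overhead: float) -> float:
    """Equation 2: return the average pre-process time of the last W_O
    coded pictures of the same type."""
    window = overhead_windows[picture_type]
    window.append(measured_overhead)
    return sum(window) / len(window)
```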
In block 208, a second coding level time budget is allocated based on the total time budget allocated for the first coding level. In this embodiment, the second coding level is the Picture level encoding, and the time budget for the Picture level encoding is determined based on the time budgeted to the respective GOP to which the current Picture belongs.
Picture-level encoding time can be generally defined as time spent by the encoder on tasks that directly contribute to the Macroblock encoding process. This typically involves motion estimation and compensation (for inter pictures), spatial prediction (for intra and inter pictures), mode decision, transform, quantization and, finally, entropy coding. The encoding time mainly depends on the allocated bits (in CBR mode) and the picture coding complexity. In an embodiment where the encoder is a variable bit rate encoder, the picture coding complexity may be used alone. At the Picture level, the goal is to optimally distribute the computed GOP budget T^Calc_GOP among the individual Pictures. Let i be the index of a Picture in coding order within the current GOP.
To allocate a time budget within the second coding level for the Picture level encoding stage, it is determined whether the current Picture i is the first picture to be coded in the current GOP. If so, the system initializes the time values according to Equation 3:

T^Remain_GOP = T^Calc_GOP;  T^Actual_GOP = 0    (Equation 3)

wherein the remaining time for the current GOP is set equal to the calculated coding time for the GOP and the actual encoding time is set equal to 0.
Before encoding a Picture with coding index i, a minimum encoding time Σ_{k=i}^{N} T^Min_k required for all the remaining pictures in this GOP is determined (T^Min_k is defined below in Equation 7). The encoding time available for this GOP is obtained by subtracting the total overhead time, as derived in Equation 2, from the allocated GOP level budget, as shown in Equation 4:

T^Avail_GOP = T^Remain_GOP − Σ_{k=i}^{N} T_Overhead(k)    (Equation 4)

where the sum runs over the overhead times estimated for the pictures of the GOP that have not yet been encoded.
Thereafter, the encoding time for the current picture is calculated according to Equation 5:

T^Calc_i = T^Avail_GOP · (θ · (Bits)^Calc_i) / (Σ_{k=i}^{N} θ_k · (Bits)^Calc_k)    (Equation 5)

where θ represents a model parameter for Picture i and (Bits)^Calc_i represents the total amount of allocated bits for the current Picture i obtained from Picture-level Rate Control. It is assumed that the bits allocated by the rate control processor (104 in FIG. 1) are proportional to the picture coding complexity and, hence, to the encoding time.
Model parameter θ is required to account for the different coding times of different Picture types. We can define θ_I, θ_P and θ_B as the model parameters for I, P and B Picture types, respectively. The GOP pattern decision (i.e. the Picture-type assignment) has already been made by the pre-analysis processor. Hence, when evaluating Equation 5, the appropriate model parameter is plugged in depending on the Picture-type. Extensive experiments with a variety of video sequences have shown that the formulation in Equation 5 results in optimum use of the encoding time budget.
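Under the proportional reading of Equation 5 given above, the Picture-level allocation could be sketched as follows. The names are hypothetical, and the theta and bits lists are assumed to be indexed in coding order:

```python
def picture_time_budget(theta, bits, i, t_gop_avail):
    """Equation 5 (as reconstructed): distribute the available GOP encoding
    time among the Pictures still to be coded, in proportion to
    theta_k * (Bits)^Calc_k.

    theta       -- per-picture model parameters (theta_I, theta_P or theta_B)
    bits        -- per-picture bit budgets from Picture-level Rate Control
    i           -- coding index (0-based here) of the current Picture in the GOP
    t_gop_avail -- T^Avail_GOP from Equation 4
    """
    weights = [theta[k] * bits[k] for k in range(i, len(bits))]
    return t_gop_avail * weights[0] / sum(weights)
```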
In an embodiment of a video encoder that does not use Rate Control (for example, in Variable Bit Rate mode), a Picture-level complexity metric can be used in place of the allocated bits in Equation 5. In this embodiment, the model parameter θ represents the ratio between the actual encoding time and the actual complexity of a given Picture type. In fact, as discussed below for the Macroblock-level time budget allocation, an MB-level complexity metric can be averaged over all the Macroblocks of a Picture to yield the Picture-level complexity value that may be used.
To calculate the time budget for each Picture, we look to the coding modes available for the Macroblocks that comprise the respective Picture. In the embodiment where the video encoder is an H.264/AVC encoder, the following coding modes are available for the following types of pictures. For I Pictures, the available MB coding modes are Intra_16×16, Intra_4×4 and Intra_PCM. These modes support spatial prediction only. For P Pictures, the available MB coding modes include all the Intra modes, SKIP, Inter_16×16, Inter_8×8, Inter_8×16 and Inter_16×8. The Inter_8×8 mode also supports sub-partitions of sizes 8×4, 4×8 or 4×4. Within the Inter modes, unidirectional temporal prediction is allowed. For B Pictures, the available MB coding modes include all the Intra and Inter modes, with the addition of the DIRECT mode. Within the Inter modes, both unidirectional (forward or backward) and bidirectional temporal prediction are supported.
The modes are then examined to find the one that is the least time consuming. It should be noted that, in the case of the SKIP or DIRECT mode (for P and B Pictures, respectively), the encoder makes use of inferred motion information and hence little or no additional computation (such as spatial or temporal prediction or Motion Estimation) is necessary. For I Pictures, there is no equivalent to the SKIP or DIRECT mode. Intra_16×16 is chosen as the mandatory mode since it consumes much less time than Intra_4×4 but is far more coding efficient than Intra_PCM. Another property of these chosen modes (SKIP, DIRECT and Intra_16×16) is that their encode time is fairly constant and independent of the video content.
The calculated Picture level budget is then constrained by the minimum Picture coding time T^Min_i. This is defined as the total time required to encode all the Macroblocks of the Picture with the least time consuming mode, without evaluating any other coding mode. Let M represent the number of Macroblocks in every Picture, let T^MB_Mode represent the time required to encode a Macroblock with a particular coding mode Mode without evaluating any other coding mode, and let Mode* represent the least time consuming mode. Then, we can write the following Equations 7 and 8:

T^Min_i = M · T^MB_{Mode*},  where Mode* = argmin_{Mode} T^MB_Mode    (Equation 7)

such that the allocated Picture budget never falls below this minimum:

T^Calc_i = max(T^Calc_i, T^Min_i)    (Equation 8)
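Equations 7 and 8 amount to a simple clamp, sketched below with hypothetical names:

```python
def constrain_picture_budget(t_calc: float, num_mbs: int, t_mode_star: float) -> float:
    """Equations 7 and 8: the Picture budget never falls below the time
    needed to code every Macroblock with the least time consuming mode.

    num_mbs     -- M, the number of Macroblocks in the Picture
    t_mode_star -- T^MB_{Mode*}, the encode time of the cheapest mode
                   (SKIP, DIRECT or Intra_16x16, depending on picture type)
    """
    t_min = num_mbs * t_mode_star    # Equation 7
    return max(t_calc, t_min)        # Equation 8
```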
Once the second, Picture level coding time budget is calculated, the third coding level time budget is allocated. The third coding level time budget is the time budget for coding each respective Macroblock that forms an individual Picture. Thus, prior to encoding a Macroblock j, it is determined whether or not the current MB j is the first MB to be encoded in the Picture and, if so, the system is initialized according to Equation 9:

T^Remain_i = T^Calc_i    (Equation 9)

where T^Remain_i represents the encoding time remaining for the current Picture i.
Thereafter, the time budget for Macroblock j is calculated according to Equation 10:

T^Calc_j = T^Remain_i · (Cmpl)_j / Σ_{k=j}^{M} (Cmpl)_k    (Equation 10)

where (Cmpl)_k represents the complexity metric for Macroblock k from the pre-analysis processor and M represents the number of Macroblocks in the current Picture. This computed budget can now be utilized by the time budget achievement processor (108 in FIG. 1). That is, the calculated Macroblock budget T^Calc_j is passed on to the achievement processor. The achievement processor may employ various mechanisms in order to constrain the Macroblock encode time to meet the allocated budget requirements.
Unlike at the Picture level, the use of MB-level model parameters is not required in Equation 10. The requirement is relaxed because only the allocation between MBs of the same Picture-type, independent of their coding mode, is considered. In one embodiment, Equation 10 might be made more accurate by considering model parameters for each possible MB coding mode. However, this approach has two main problems. First, there are a large number of coding modes, especially for P and B Pictures. Second, the coding time for each individual mode is extremely small and exhibits a large amount of variance. Therefore, in practice, only the general coding complexity of the whole Macroblock is considered, as shown in Equation 10, rather than individual coding modes.
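A sketch of the Macroblock-level allocation of Equation 10, with hypothetical names and a guard (not in the disclosure) for the degenerate all-zero-complexity case:

```python
def macroblock_time_budget(complexities, j, t_pic_remain):
    """Equation 10: allocate the Picture's remaining encoding time to
    Macroblock j in proportion to its pre-analysis complexity metric.

    complexities -- (Cmpl)_k for every Macroblock of the current Picture
    j            -- index (0-based here) of the Macroblock about to be encoded
    t_pic_remain -- T^Remain_i for the current Picture
    """
    total = sum(complexities[j:])            # complexities of MBs not yet coded
    if total == 0:                           # guard: a completely flat Picture
        return t_pic_remain / (len(complexities) - j)
    return t_pic_remain * complexities[j] / total
```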
In block 212, the encoding time associated with the third coding level is updated. For consistent real-time performance, the model parameters are measured and updated along with the actual (i.e. achieved) encoding times. This advantageously enables the budget allocations to adapt to any changes in the encoding system behavior due to internal or external factors. Internal factors may include encoder configuration changes, content changes, etc., whereas external factors include CPU load, thread and process scheduling, memory and disk accesses, etc.
Furthermore, after encoding Macroblock j, the system updates the remaining time budget for the current Picture according to Equation 11:

T^Remain_i = T^Remain_i − T^Actual_j    (Equation 11)

where T^Actual_j is the actual, achieved Macroblock encoding time. After evaluating Equation 11, it is possible for T^Remain_i to approach zero or become negative. To handle these cases, T^Remain_i is constrained by the minimum time required to encode all of the remaining Macroblocks in this Picture, as shown below in Equation 12:

T^Remain_i = max(T^Remain_i, (M − j) · T^MB_{Mode*})    (Equation 12)
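Equations 11 and 12 together form a charge-and-floor update, sketched below with assumed names:

```python
def update_picture_remaining(t_remain: float, t_actual: float,
                             mbs_left: int, t_mode_star: float) -> float:
    """Charge the achieved MB encode time against the Picture budget
    (Equation 11), then floor the result at the minimum time needed to
    encode the Macroblocks still pending (Equation 12)."""
    t_remain -= t_actual                          # Equation 11
    return max(t_remain, mbs_left * t_mode_star)  # Equation 12
```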
Once the actual coding time for the third level is updated, the system updates at least one parameter associated with the second coding level after the complete second level has been encoded. For example, when the complete Picture has been encoded, the model parameter θ is updated using a sliding window W_θ having a predetermined number of pictures (e.g. 3), as defined in Equation 13:

θ = (1/W_θ) Σ_{k=1}^{W_θ} T^Actual_k / (Bits)^Actual_k    (Equation 13)

where T^Actual_k represents the actual, achieved encoding time and (Bits)^Actual_k represents the actual bits consumed by the k-th of the last W_θ coded Pictures, each measured after the Picture has been completely encoded. Moreover, θ is tracked separately for each type of Picture that is to be encoded (I, P and B pictures).
In one embodiment, the updating of the model parameter value may be omitted if the Picture-level complexity (i.e. the average of all MB-level complexities from the pre-analysis processor) is below a certain threshold (Cmpl)_Min. This is because Pictures with very little or no motion (i.e. low complexity) provide no useful information regarding the time-complexity model relationship. In fact, including such Pictures in the update may adversely affect the modeling of other “normal” Pictures in the video sequence. From our experiments, a reasonable value for (Cmpl)_Min is 5.
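The model-parameter update of Equation 13, together with the low-complexity skip just described, might be sketched as follows; the window size of 3 and the threshold of 5 come from the text, while the names are hypothetical:

```python
from collections import defaultdict, deque

W_THETA = 3     # sliding window of coded Pictures (e.g. 3, per the text)
CMPL_MIN = 5    # Picture-level complexity threshold below which updates are skipped

theta_windows = defaultdict(lambda: deque(maxlen=W_THETA))

def update_theta(picture_type: str, t_actual: float, bits_actual: float,
                 picture_complexity: float):
    """Equation 13: theta is the windowed average of achieved time per
    actual coded bit, tracked separately per Picture type. Low-complexity
    Pictures are excluded from the update."""
    if picture_complexity < CMPL_MIN:
        return None                    # keep the previous model parameter
    window = theta_windows[picture_type]
    window.append(t_actual / bits_actual)
    return sum(window) / len(window)
```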
The system then measures and updates T^MB_{Mode*} using Equation 7 from the current Picture statistics. It should be noted that T^MB_{Mode*} is strongly dependent on the capabilities of the platform. Video encoding is generally a CPU-bound process rather than an I/O- or memory-bound process. Therefore, “platform capabilities” can be interpreted as “CPU speed” or a measure of available computational resources. So, for a given combination of CPU processing speed and encoder configuration, T^MB_{Mode*} determines an upper bound for the maximum achievable frame rate FR_Max, as provided in Equation 14:

FR_Max = N / Σ_{i=1}^{N} (T_Overhead(i) + M · T^MB_{Mode*}(i))    (Equation 14)

where N is the GOP size and the per-picture overhead and minimum mode times depend on the picture type.
Thereafter, the remaining GOP time budget is updated according to Equations 15 and 16:

T^Remain_GOP = T^Remain_GOP − (T^Actual_i + T_Overhead(i))    (Equation 15)

T^Actual_GOP = T^Actual_GOP + T^Actual_i + T_Overhead(i)    (Equation 16)

It is possible that, after evaluating Equation 15, T^Remain_GOP approaches zero or is negative. Thus, T^Remain_GOP is constrained by the minimum time required to encode all the remaining pictures in the current GOP, as provided in Equation 17:

T^Remain_GOP = max(T^Remain_GOP, Σ_{k=i+1}^{N} T^Min_k)    (Equation 17)
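The GOP-level bookkeeping of Equations 15 through 17 can be sketched in one helper; the names are hypothetical:

```python
def update_gop_remaining(t_gop_remain: float, t_gop_actual: float,
                         t_pic_actual: float, t_overhead: float,
                         min_times_left) -> tuple:
    """Account for the just-coded Picture, encode time plus its overhead
    (Equations 15 and 16), then floor the remaining GOP budget at the
    minimum time needed for the Pictures still to be coded (Equation 17).

    min_times_left -- iterable of T^Min_k (Equation 7) for each remaining Picture
    """
    t_gop_remain -= t_pic_actual + t_overhead              # Equation 15
    t_gop_actual += t_pic_actual + t_overhead              # Equation 16
    t_gop_remain = max(t_gop_remain, sum(min_times_left))  # Equation 17
    return t_gop_remain, t_gop_actual
```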
In block 216, the system determines whether the coded picture is the last coded picture in the current GOP. If so, the carryover time, defined as the difference between the calculated encoding time and the actual encoding time, is updated for use by the subsequent GOP.
The inventive time budget allocation algorithm provides a scheme that allocates budgets at three coding levels in order to ensure real-time video encoding efficiency. This allocation includes allocating a time budget at a first coding level (GOP level) based on a size of the GOP, a target frame rate for the GOP and a carryover time representing a difference between the calculated and actual encoding times associated with a previous GOP. Additionally, the algorithm models a second coding level time-complexity relationship and optimally distributes the time budget associated with the first coding level among the elements of the second coding level (e.g. the Pictures that make up a GOP) based on at least one of a Picture level bit budget, a Picture type, a picture complexity metric and measured encoder performance.
The algorithm of FIG. 3 depicts the time budget achievement process performed by the achievement processor. In block 302, the calculated Macroblock time budget T^Calc_MB is received from the allocation processor, and the remaining MB time budget is initialized to the same value as shown in Equation 18:

T^Remain_MB = T^Calc_MB    (Equation 18)

In operation, the remaining MB time budget is subsequently updated after evaluating each coding mode in accordance with Equation 19:

T^Remain_MB = T^Remain_MB − T^MB_Mode    (Equation 19)

where T^MB_Mode represents the time required to evaluate the coding mode Mode without evaluating any other coding mode.
In block 304, at least one mandatory coding mode is evaluated by the achievement processor. The Macroblock coding modes that are the least time consuming are designated as “mandatory” and are always evaluated. This is because the mode decision process requires at least one mode to be checked, even if the allocated time budget cannot be met. It should be noted that, in the case of the SKIP or DIRECT mode (for P and B Pictures, respectively), the encoder makes use of inferred motion information and hence little or no additional computation (such as spatial or temporal prediction or Motion Estimation) is necessary. For I Pictures, there is no equivalent to the SKIP or DIRECT mode. Intra_16×16 is selected as the mandatory mode since it consumes much less time than Intra_4×4 and is far more coding efficient than Intra_PCM. Another property of these chosen modes (SKIP, DIRECT and Intra_16×16) is that their encode time is fairly constant and independent of the video content.
If Mode* represents the least time consuming mode, then, for each type of picture discussed above (I, P and B), the following Equation 20 applies:

Mode* = Intra_16×16 (I Pictures);  Mode* = SKIP (P Pictures);  Mode* = DIRECT (B Pictures)    (Equation 20)
If the Macroblock time-budget is not achieved in spite of coding only mandatory MB modes, it means that the given combination of encoder configuration and platform capabilities (or computing resources) is insufficient to achieve the target real-time encoding frame-rate. Therefore, one or more of these factors need to be changed in order to perform real-time video encoding.
To determine whether or not any other coding modes beyond the mandatory modes are to be evaluated, the achievement processor queries a mode complexity map table in block 306. The mode complexity map table stores data representing the ratio between the actual time required to evaluate a particular coding mode and the complexity (e.g. a characteristic of the video picture) of the Macroblock, as shown in Equation 21:

ModeComplexityMap(Mode) = T^MB_Mode / (Cmpl)    (Equation 21)
Equation 21 may be implemented as a sliding window average of mode-specific complexity ratios, as shown in Equation 22:

ModeComplexityMap(Mode) = (1/W_M) Σ_{i=1}^{W_M} T^MB_Mode,i / (Cmpl)_i    (Equation 22)

where T^MB_Mode,i represents the actual, measured coding time of previously coded Macroblock i using coding mode Mode, (Cmpl)_i represents the MB-level complexity metric and W_M represents a sliding window having a predetermined size (e.g. 5). The complexity metric (Cmpl)_i is computed for each MB by the pre-analysis processor (102 in FIG. 1).
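A minimal sketch of the mode complexity map of Equations 21 and 22; the class shape and method names are assumptions, not the disclosure's interface:

```python
from collections import defaultdict, deque

W_M = 5  # sliding window size (e.g. 5, per the text)

class ModeComplexityMap:
    """Sliding-window average of time-to-complexity ratios per coding mode."""

    def __init__(self):
        self.ratios = defaultdict(lambda: deque(maxlen=W_M))

    def update(self, mode: str, coding_time: float, complexity: float) -> None:
        # Equation 21: ratio of the measured mode coding time to MB complexity.
        self.ratios[mode].append(coding_time / max(complexity, 1e-9))

    def estimate_time(self, mode: str, complexity: float) -> float:
        # Equation 22: windowed average ratio, scaled by this MB's complexity.
        window = self.ratios[mode]
        if not window:
            return 0.0   # no history yet; the caller may simply evaluate the mode
        return (sum(window) / len(window)) * complexity
```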
The formulation in Equation 22 tracks the relationship between coding time and complexity for each coding mode over a short window. This allows the ModeComplexityMap (and hence the achievement mechanism) to dynamically adapt to any changes in the encoder's performance, platform capabilities or computing resources. There are other factors that may affect the time budget achievement process. For example, the maximum number of reference pictures allowed and the maximum motion estimation search range can greatly affect the time consumed by each coding mode. The assumption made in our scheme is that these factors uniformly affect the encoding time of all the Macroblocks of the current Picture. Therefore, only the time-complexity relationship is considered in the Mode Complexity Map table. The Mode Complexity Map is queried to determine whether the remaining Macroblock time budget T^Remain_MB will be sufficient to evaluate the current MB coding mode.
In block 308, the system determines, based on the values in the Mode Complexity Map table, whether or not there is sufficient time to evaluate the current coding mode. If the determination is positive, the algorithm continues at block 310, whereby the current coding mode is evaluated and the table is updated with the resulting evaluation value. The evaluation of the current Macroblock may include spatial or temporal prediction, motion estimation and compensation, and residue coding (transform, quantization and entropy coding). The actual time consumed by the currently evaluated coding mode is measured, and the time-complexity ratio is updated in the appropriate index of the ModeComplexityMap table stored in memory.
If the result of the query in block 308 is negative, the algorithm selects a coding mode in block 312 to code the particular Macroblock. The achievement algorithm further includes an error correction aspect for cases in which the encoding budget is only sufficient to evaluate the mandatory coding modes. For certain types of pictures (e.g. B and P pictures), this means that several Macroblocks may be encoded using the SKIP or DIRECT modes. For video sequences with a high amount of motion, this may result in annoying visual artifacts. Therefore, it is necessary to correctly detect such “bad SKIP” or “bad DIRECT” modes and correct them by enforcing “Safe” modes. These “Safe” modes may use inferred motion information along with proper residue coding in order to limit the amount of distortion, which greatly improves visual quality while maintaining the real-time encoding constraint.
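Putting blocks 302 through 312 together, the achievement loop might look like the sketch below. It assumes a callable evaluate_mode(mode) that codes the Macroblock in that mode and returns a (rate-distortion cost, elapsed time) pair, and a ModeComplexityMap as sketched above; the “Safe”-mode correction is omitted:

```python
def choose_mb_mode(budget, complexity, mandatory_modes, optional_modes,
                   mode_map, evaluate_mode):
    """Sketch of the Macroblock-level achievement loop (blocks 302-312)."""
    remaining = budget                        # Equation 18
    costs = {}
    for mode in mandatory_modes:              # block 304: always evaluated
        cost, elapsed = evaluate_mode(mode)
        costs[mode] = cost
        mode_map.update(mode, elapsed, complexity)
        remaining -= elapsed                  # Equation 19
    for mode in optional_modes:               # blocks 306-310
        if mode_map.estimate_time(mode, complexity) > remaining:
            break                             # block 308 negative: stop evaluating
        cost, elapsed = evaluate_mode(mode)   # block 310: evaluate and update map
        costs[mode] = cost
        mode_map.update(mode, elapsed, complexity)
        remaining -= elapsed
    # Block 312: pick the most compression-efficient (least R-D cost) mode.
    return min(costs, key=costs.get)
```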
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions may be stored on a processor or computer-readable media such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory (“RAM”), a read-only memory (“ROM”) or any other magnetic, optical, or solid state media. The instructions may form an application program tangibly embodied on a computer-readable medium such as any of the media listed above. As should be clear, a processor may include, as part of the processor unit, a computer-readable media having, for example, instructions for carrying out a process. The instructions, corresponding to the method of the present invention, when executed, can transform a general purpose computer into a specific machine that performs the methods of the present invention.
What has been described above includes examples of the embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the embodiments, but one of ordinary skill in the art can recognize that many further combinations and permutations of the embodiments are possible. Accordingly, the subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.