A detailed description of one or more embodiments of the invention is provided hereinafter along with accompanying figures that illustrate by way of example the principles of the invention, wherein:
While the invention is described in connection with certain exemplary embodiments, it should be understood that the invention is not limited to any specific embodiment. On the contrary, the scope of the invention is limited only by the appended claims and the claim equivalents, and the invention encompasses numerous alternatives, modifications and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention.
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration, specific embodiments in which the invention may be practiced. It is understood that other embodiments may be utilized and certain changes may be made without departing from the scope of the present invention.
The proposed technique provides a mechanism for a low complexity transcoding of MPEG-2/MPEG-4 format bit stream into H.264 format bit stream.
With reference to
H.264 encoding is a highly computationally intensive operation. The proposed invention reduces the complexity of the trans-coding process by reusing certain data that is available in the input bit stream. By reusing the relevant data available in the input bit stream, this invention achieves a one-third reduction in implementation complexity.
Comparing the H.264 encoder (block 101) in
The H.264 re-encoder uses the decisions made by the MPEG-2/MPEG-4 encoder as a starting point. These decisions are available in the input MPEG-2/MPEG-4 bit stream. Starting from these decisions the re-encoder refines them to select the best coding configuration. Since the re-encoder starts from a decision that was already proven good in the MPEG-2/MPEG-4 context, the complexity is reduced substantially as it now only has to refine them to select a better coding configuration.
Information from the input MPEG-2/MPEG-4 bit stream that can be reused at the MB level includes:
The proposed technique reuses the following data from the input MPEG-2/MPEG-4 bit stream, at the picture level:
The proposed invention expediently deals with three options, namely:
Trans-Scaling
In the trans-scaling method of approach, a trans-scaler unit (block 202) is used to down-scale the resolution of the MPEG-2/MPEG-4 decoder's output as shown in
The horizontal resolution can be down-scaled by a factor of ½, ⅔, ¾. The vertical resolution can be down-scaled by a factor of ½, when the horizontal scaling factor is set to ½ as mentioned in the table below.
Trans-Coding
The second approach referenced above is trans-coding, which includes mapping the relevant information from the MPEG-2/MPEG-4 stream into the H.264 format, deciding the best mode in which to code the MB, and finally encoding and entropy coding the residual information to convert the data into the H.264 stream format.
The mapping process intelligently handles the differences between the MPEG-2/MPEG-4 and the H.264 video coding standard. For example, MPEG-2 allows the MVs to be specified only in half pixel units. MPEG-4 allows the MVs to be specified in quarter pixel units though it is possible to specify the MVs in half pixel units as well. All MVs in H.264 are specified in quarter pixel units, the difference being that the quarter pixels in MPEG-4 and H.264 are derived using different spatial filters.
Similarly there is no ‘one to one’ correspondence for all MB modes supported by MPEG-2/MPEG-4 and those supported by H.264.
Further, while coding interlaced sequences as frame pictures, both the MPEG-2 and MPEG-4 standards decide between the frame and the field coding mode on a MB by MB basis. In the H.264 standard however, this decision is done on a vertical pair of MBs.
When the trans-scaling option is enabled, the mapping process becomes more complicated since one MB in the output stream derives its information from many MBs in the input stream. The number of MBs in the input stream that map to a single MB in the output stream and the way this mapping is done, depends upon the configuration of the trans-scaler.
These and the other differences between the input and the output standards, together with the complexities introduced by scaling the resolution of the input sequence, necessitate an intelligent mapping process and mode selection engine to optimally trans-code the MPEG-2/MPEG-4 stream into the H.264 format.
The first step involved in the trans-coding operation is to map the relevant information from the input MPEG-2/MPEG-4 stream into the H.264 format. As mentioned before, at the picture level, information such as the picture coding type is used to decide the picture coding type of the picture in the output stream. In general the picture in the output stream is coded in the same mode as it was coded in the input stream. The trans-coder has the ability to code any picture as an intra picture (I) and may choose to do so whenever it decides that this is the optimal coding mode decision. For instance, when the input stream is corrupted with errors, the trans-coder may choose to code an intra (I) picture once it recovers from the errors, over-riding the decision that was made in the input stream. Further, if desired, while trans-coding progressive refresh MPEG-2/MPEG-4 streams the trans-coder may insert intra pictures at regular intervals to facilitate random access.
The flowchart in
The re-encoder translates the relevant MB level information from the MPEG-2/MPEG-4 stream to the H.264 format. The information used at this stage includes the mode (intra, inter, skip) in which the MB was coded in the input stream, the MVs used over the different MB partitions in case the MB is coded as an inter coded MB and information on whether the MB was coded in the frame or the field coding mode in case an interlaced picture is being coded as a frame picture.
A MB in H.264 can be coded as an inter, intra or a skip MB. The skip and the inter MB mode in MPEG-2/MPEG-4 are mapped to an inter MB in H.264. An intra MB in MPEG-2/MPEG-4 maps to an intra MB in H.264. To optimally trans-code the input stream, the trans-coder evaluates the mapped mode with the other two modes in which a MB could be coded. Thus, instead of blindly following the decisions made by the MPEG-2/MPEG-4 encoder, the trans-coder evaluates the different possible modes in which a MB could be coded before making the final decision.
After mapping the relevant information from the input stream, the re-encoder first computes the skip mode MVs and the corresponding predictor. If the residual information corresponding to the skip mode gets quantized to zero at the active quantizer scale, it declares the skip mode to be the best mode to code the MB in and exits the MB mode evaluation loop immediately (early exit).
However, if the residual information corresponding to the skip mode does not get quantized to zero at the active quantizer scale it stores the skip mode as one of the possible candidate modes to use for the current MB.
MVs found for individual MB partitions during the mapping stage are usually not the MVs that would yield the best coding cost. The noise that gets added to the sequence during the MPEG-2/MPEG-4 encoding process, the H.264 trans-coding process and the trans-scaling operation, together with the fact that the H.264 standard allows the MVs to be specified in quarter pixel resolution, makes the MV refinement stage (block 206) necessary. The MV refinement stage (block 206) refines the MVs mapped from the MPEG-2/MPEG-4 stream, and those derived for intra MBs in the input stream, at the full pixel and/or half pixel accuracy. The MV that results in the least cost during the MV refinement stage (block 206) is used as the final MV for the MB partition being refined.
The intra coding mode supported in the H.264 coding standard has a better coding efficiency than the one supported in MPEG-2/MPEG-4. It is therefore possible that a MB coded as an inter or a skip MB in the MPEG-2/MPEG-4 stream may be better coded as an intra MB in the H.264 stream. Evaluating the intra MB mode for every inter or skip MB coded in the MPEG-2/MPEG-4 stream is computationally intensive and usually unnecessary. The re-encoder evaluates the intra MB mode conditioned on a criterion. If the lesser of the cost found for the skip and the inter MB mode is less than the average cost of the inter MBs coded in the previous picture of the same type, the re-encoder declares that either the skip or the inter MB mode results in a good enough match and hence the intra MB mode need not be evaluated.
However, if this is not true, the re-encoder evaluates both the intra 4×4 and the intra 16×16 mode.
The re-encoder finally compares the cost of all the MB modes evaluated until now and selects the mode associated with the least cost as the mode to code the MB in.
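The mode evaluation loop described above can be sketched as follows. This is only an illustrative sketch: the function name `choose_mb_mode`, its parameters and the mode labels are hypothetical, and the cost values are assumed to be supplied by the caller (for example from the skip predictor and the MV refinement stage).

```python
def choose_mb_mode(skip_cost, skip_residual_is_zero, inter_cost,
                   avg_inter_cost_prev, eval_intra):
    """Pick a coding mode for one MB (illustrative sketch only).

    skip_residual_is_zero: True if the skip-mode residual quantizes to
        zero at the active quantizer scale.
    avg_inter_cost_prev: average cost of the inter MBs coded in the
        previous picture of the same type.
    eval_intra: hypothetical callback returning a dict of intra mode
        costs, e.g. {"intra4x4": c1, "intra16x16": c2}.
    """
    # Early exit: the skip residual vanishes, so skip is declared best.
    if skip_residual_is_zero:
        return "skip", skip_cost

    # Otherwise skip stays in the list as a candidate next to inter.
    candidates = {"skip": skip_cost, "inter": inter_cost}

    # Intra is evaluated only when neither skip nor inter is
    # "good enough" compared with the previous picture's average.
    if min(skip_cost, inter_cost) >= avg_inter_cost_prev:
        candidates.update(eval_intra())

    # Final decision: the mode with the least cost.
    mode = min(candidates, key=candidates.get)
    return mode, candidates[mode]
```

Passing the intra evaluation as a callback mirrors the text: the comparatively expensive intra 4×4 and intra 16×16 evaluations run only when the criterion fails.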
Mapping Stage
As mentioned before, there exist several differences between the MPEG-2/MPEG-4 and the H.264 standards. Further, when the trans-scaler is enabled, data from several MBs in the input picture is mapped to one MB in the output picture. Because of these complexities a ‘one to one’ mapping is usually not possible between the input and the output sequence. The mapping stage is therefore required to be intelligent enough to handle these differences and to optimally translate the information present in the input sequence into the H.264 format.
The mapping stage is best described in three sections. The first covers the way the input MB partitions physically map to partitions in the output MB. The second covers the algorithm used to map the information from the input MB partitions to their corresponding output MB partitions, including how the differences between the input and the output standards are handled. The third covers the arithmetic used to translate the MVs and other relevant information into the H.264 syntax.
The MB level mapping stage re-uses information present in the input MPEG-2/MPEG-4 picture at the corresponding location in the H.264 picture.
In the most simple of all cases when the trans-scaler is disabled, mode-I, a MB in the input picture maps to a MB in the output picture. As shown in
When the trans-scaler is configured in mode-II, the horizontal dimension of the output picture is ¾th that of the input picture. The vertical dimension remains the same. In this mode, 4 horizontally contiguous MBs from the input picture map to 3 horizontally contiguous MBs in the output picture. In
When the trans-scaler is configured in mode-III, the horizontal dimension of the output picture is ⅔rd that of the input picture. The vertical dimension remains the same. In this mode, 3 horizontally contiguous MBs from the input picture map to 2 horizontally contiguous MBs in the output picture. In
When the trans-scaler is configured in mode-IV, the horizontal dimension of the output picture is half that of the input picture. The vertical dimension remains the same. In this mode, 2 horizontally contiguous MBs in the input picture form 1 MB in the output picture. In
When the trans-scaler is configured in mode-V, both the horizontal and the vertical dimensions of the output picture are half those of the input picture. In this mode two rows of two horizontally contiguous MBs from the input picture map to a single MB in the output picture. In
With the above explanation for the mapping process using the 8×8 MB partitions, it should be relatively simple to infer from these figures how information from larger MB partitions (16×16, 16×8) in the input picture would be used in the output picture. As an example, consider that the input to the trans-coder is a progressive MPEG-2 picture where the only allowed MB partition is 16×16 and that the trans-scaler has been configured in mode II as shown in
The MB coding modes in the MPEG-2/MPEG-4 can be broadly classified into the following four categories—intra frame DCT mode, intra field DCT mode, inter frame MC mode, inter field MC mode. As mentioned before, with the trans-scaler enabled, an output MB in H.264 derives its information from several input MBs.
For the case of an interlaced frame picture, the input MBs could be coded in any of the four modes mentioned above. Deciding on a single coding mode for the output MB therefore is a non-trivial effort. Taking
As shown in
If that is not the case, the trans-coder computes the cost of coding the MB in several different modes and selects the mode that results in the least coding cost. As shown in
Tables 2-5 below are expediently used to map information from the input stream to the output stream. In particular, Table-2 relates to the interframe coding mode, Table-3 relates to the interfield coding mode, Table-4 relates to the intraframe coding mode and Table 5 relates to the intrafield coding mode. These tables 2-5 are used by the trans-coder to decide the steps required to convert the MBs from one mode in H.264 to the other coding mode in H.264 in the case of an interlaced frame picture.
Again as shown in
The re-encoder codes only half the vertical resolution of the input picture when the picture is being down-scaled by a factor of 2. For a progressive picture the trans-scaler is configured to down-scale the vertical resolution of the picture by 2. For an interlaced frame picture or a field picture the trans-scaler is not used to scale the vertical resolution. Instead, the trans-coder codes only one field (top/bottom) skipping the other.
As shown in
For a field picture, once the second field is eliminated, the picture generated using just the first field can be viewed as a progressive frame picture with half the vertical resolution. Hence in this particular case the first field picture (top/bottom) in the input sequence is coded as a frame picture in the output sequence with half the vertical resolution. Referring to
For an interlaced frame picture sequence too, just one field (top/bottom) is coded. In this case also, the output picture is coded as a frame picture. As shown in
MVs in MPEG-2 are specified in half pixel accuracy units. MVs in an MPEG-4 stream can be specified in either half or quarter pixel accuracy. MVs in H.264 are coded with quarter pixel accuracy. The MV information is mapped to the H.264 format based on the trans-scaler configuration, the accuracy with which the MVs are specified in the input stream, and whether or not they need to be converted from the frame to the field mode or vice-versa.
MPEG-4 quarter pixel accurate MVs are actually specified in half pixel units with a fractional component. MPEG-2 MVs are specified in half pixel units. The half pixel unit MVs are first converted to quarter pixel unit MVs using the following equation:
quarter_pel_mv(x) = half_pel_mv(x) << 1
quarter_pel_mv(y) = half_pel_mv(y) << 1    Eq (I)
These quarter pixel MVs are then approximated to the nearest half pixel location using the following equation:
quarter_pel_mv_appx_to_half_pel(x) = (quarter_pel_mv(x) + 1) & 0xFFFE
quarter_pel_mv_appx_to_half_pel(y) = (quarter_pel_mv(y) + 1) & 0xFFFE    Eq (II)
When the trans-scaler is configured in mode I, MVs from the input stream are converted to quarter pixel units using Eq (I) and Eq (II) and are used without any further scaling:
h264_quarter_pel_mv(x) = quarter_pel_mv_appx_to_half_pel(x)
h264_quarter_pel_mv(y) = quarter_pel_mv_appx_to_half_pel(y)    Eq (III)
When the trans-scaler is configured in mode II, H.264 MVs are derived from the half pixel accurate MVs derived using Eq (I) and Eq (II) as follows:
h264_quarter_pel_mv(x) = [(quarter_pel_mv_appx_to_half_pel(x) * 3 + 2) >> 2] & 0xFFFE
h264_quarter_pel_mv(y) = quarter_pel_mv_appx_to_half_pel(y)    Eq (IV)
When the trans-scaler is configured in mode III, H.264 MVs are derived from the half pixel accurate MVs derived using Eq (I) and Eq (II) as follows:
h264_quarter_pel_mv(x) = [(quarter_pel_mv_appx_to_half_pel(x) * 4 + 3) / 6] & 0xFFFE
h264_quarter_pel_mv(y) = quarter_pel_mv_appx_to_half_pel(y)    Eq (V)
When the trans-scaler is configured in mode IV, H.264 MVs are derived from the half pixel accurate MVs using Eq (I) and Eq (II) as follows:
h264_quarter_pel_mv(x) = ((quarter_pel_mv_appx_to_half_pel(x) + 1) >> 1) & 0xFFFE
h264_quarter_pel_mv(y) = quarter_pel_mv_appx_to_half_pel(y)    Eq (VI)
When the trans-scaler is configured in mode V, H.264 MVs are derived from half pixel accurate MVs using Eq (I) and Eq (II) as follows:
h264_quarter_pel_mv(x) = ((quarter_pel_mv_appx_to_half_pel(x) + 1) >> 1) & 0xFFFE
h264_quarter_pel_mv(y) = ((quarter_pel_mv_appx_to_half_pel(y) + 1) >> 1) & 0xFFFE    Eq (VII)
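As a worked illustration, Eq (I) through Eq (VII) can be sketched in Python for a half pixel unit input MV. The function names are hypothetical. Two assumptions are made about the arithmetic: the mask 0xFFFE is taken to clear the least significant bit of a 16-bit two's-complement value (written here as `& ~1` so that negative values keep their sign in Python), and the division in Eq (V) is taken to truncate toward zero as in C.

```python
def clear_lsb(v: int) -> int:
    # Assumed equivalent of "& 0xFFFE" on a 16-bit two's-complement value;
    # "& ~1" keeps the sign of Python's unbounded integers intact.
    return v & ~1

def div_trunc(n: int, d: int) -> int:
    # C-style integer division (truncates toward zero); assumed for Eq (V).
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q

def to_half_pel_grid(half_pel_mv: int) -> int:
    quarter = half_pel_mv << 1        # Eq (I): half pel -> quarter pel units
    return clear_lsb(quarter + 1)     # Eq (II): snap to a half pel location

def map_mv(half_pel_mv, mode):
    """Map an (x, y) MV in half pixel units to H.264 quarter pixel units.

    mode is the trans-scaler mode, 1..5 for mode I..V.
    """
    x = to_half_pel_grid(half_pel_mv[0])
    y = to_half_pel_grid(half_pel_mv[1])
    if mode == 1:                                      # Eq (III): no scaling
        return x, y
    if mode == 2:                                      # Eq (IV): 3/4 horizontal
        return clear_lsb((x * 3 + 2) >> 2), y
    if mode == 3:                                      # Eq (V): 2/3 horizontal
        return clear_lsb(div_trunc(x * 4 + 3, 6)), y
    if mode == 4:                                      # Eq (VI): 1/2 horizontal
        return clear_lsb((x + 1) >> 1), y
    if mode == 5:                                      # Eq (VII): 1/2 both ways
        return clear_lsb((x + 1) >> 1), clear_lsb((y + 1) >> 1)
    raise ValueError("mode must be in 1..5")
```

For example, a half pixel unit MV of (4, 3) becomes (8, 6) in quarter pixel units, and `map_mv((4, 3), 2)` scales only the horizontal component, giving (6, 6). For an input that is already an integer number of half pixel units, Eq (II) leaves the value unchanged; it only has an effect on MVs that carry a quarter pixel fractional component.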
Once mapped to the H.264 format, conversion of field mode MVs in quarter pixel units to frame mode in quarter pixel units is done using the following equation:
Similarly, once mapped to the H.264 format, conversion of the frame mode MVs to field mode MVs is done using the following equation:
quarter_pel_mv(x) = quarter_pel_mv(x)
quarter_pel_mv(y) = (quarter_pel_mv(y) >> 3) << 2    Eq (IX)
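A short sketch of Eq (IX) follows. The function name is hypothetical, and the shift is assumed to be an arithmetic (sign-preserving) shift, which is what Python's `>>` does for negative integers.

```python
def frame_to_field_mv(mv):
    """Apply Eq (IX) to an (x, y) MV given in quarter pixel units."""
    x, y = mv
    # The horizontal component is unchanged. The vertical component is
    # halved and rounded down to a full pixel location, that is, to a
    # multiple of 4 quarter pixel units.
    return x, (y >> 3) << 2
```

For example, a frame mode vertical displacement of 16 quarter pixel units (4 pixels) becomes 8 quarter pixel units (2 field lines), and a displacement of 20 also maps to 8 because the result is rounded down to a full pixel.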
MV Refinement:
In
The MV refinement stage (block 206) computes the cost of coding the MB partition at the closest half pixel location derived from the mapping stage. Quarter pixel accurate MVs computed for temporally predicting intra MBs in the input picture are approximated to the closest half pixel location for the purpose of refinement. Centered around this half pixel location, it then searches in a window that may span outside the boundaries of the picture for a full pixel location that results in a lesser coding cost. The MV associated with the least coding cost (say MVcenter) is used as the center pixel for subsequent sub pixel refinements.
The trans-coder searches a square grid (marked “green”) spaced at a distance of one half pixel unit around MVcenter.
This step is repeated for all partitions into which a MB is divided during the mapping stage.
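The two refinement steps described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `refine_mv`, the `cost` callback and the window size `radius_px` are hypothetical, the half pixel grid is assumed to be the 3×3 square around MVcenter, and all MVs are expressed in quarter pixel units (a full pixel is 4 units, a half pixel is 2 units).

```python
def refine_mv(center, cost, radius_px=2):
    """Refine one partition's MV (illustrative sketch only).

    center: (x, y) half pixel location from the mapping stage,
        in quarter pixel units.
    cost: hypothetical callback returning the coding cost of an MV.
    radius_px: assumed full pixel search radius around the center.
    """
    cx, cy = center
    best, best_cost = center, cost(center)

    # Step 1: full pixel locations (multiples of 4) inside the window.
    xs = [x for x in range(cx - 4 * radius_px, cx + 4 * radius_px + 1)
          if x % 4 == 0]
    ys = [y for y in range(cy - 4 * radius_px, cy + 4 * radius_px + 1)
          if y % 4 == 0]
    for y in ys:
        for x in xs:
            c = cost((x, y))
            if c < best_cost:
                best, best_cost = (x, y), c

    # Step 2: half pixel grid around MVcenter, the best MV found so far.
    mx, my = best
    for dy in (-2, 0, 2):
        for dx in (-2, 0, 2):
            cand = (mx + dx, my + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost
```

In a real trans-coder the cost callback would compare the partition against interpolated reference pixels; here it is left abstract so that the search order stays visible.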
It should be noted here that while the MPEG-2 standard does not allow unrestricted motion estimation, the trans-coder makes use of unrestricted motion estimation during the sub-pixel refinement stage, while evaluating the skip mode of coding and while converting intra MBs to inter MBs.
Complexities introduced by trans-scaling and the differences between the two standards result in finer MB partitions and in multiple, different MVs to be evaluated for each MB partition. These multiple evaluations require a lot of data to be fetched from the external memory, which is typically much slower than the processor clock. Optimizing accesses to the external memory hence optimizes the overall system. To do so, DMA (Direct Memory Access) is often employed to fetch data from the external memory to the internal memory, which works at a much faster speed. Configuring and triggering DMAs consumes processor cycles, and doing so repeatedly for each individual partition consumes even more. To reduce the number of DMAs that need to be triggered, the trans-coder fetches an area centered around a MB. The bounding rectangle of this area is defined by the amount of internal memory that can be spared for the search area and by the typical range of MVs that the trans-coder might expect to encounter in the input stream. This typical range is derived by considering the application in which the trans-coder is to be used and the f_code parameter in the input stream, which puts a constraint on the range of MVs in that stream. Pixel data that lie outside of this area and are required by some MVs are then fetched individually by configuring and triggering a separate DMA.
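The decision of whether a given MV can be served from the pre-fetched area or needs a separate DMA can be sketched as follows. The function names, the margin parameters and the 16×16 MB size default are assumptions made for illustration; the sketch also ignores the extra border pixels that sub-pixel interpolation filters would need.

```python
def search_area(mb_x, mb_y, margin_x, margin_y, mb_size=16):
    """Bounding rectangle (left, top, right, bottom) centered on a MB.

    margin_x and margin_y are hypothetical; in practice they would follow
    from the available internal memory and the expected MV range.
    """
    return (mb_x - margin_x, mb_y - margin_y,
            mb_x + mb_size + margin_x, mb_y + mb_size + margin_y)

def needs_separate_dma(part_x, part_y, part_w, part_h, mv, area):
    """True if the block referenced by mv is not fully inside area.

    mv is (x, y) in quarter pixel units; positions are in full pixels.
    """
    left, top, right, bottom = area
    mvx, mvy = mv
    ref_left = part_x + (mvx >> 2)                  # floor(mvx / 4)
    ref_top = part_y + (mvy >> 2)
    ref_right = part_x + part_w - ((-mvx) >> 2)     # ceil(mvx / 4)
    ref_bottom = part_y + part_h - ((-mvy) >> 2)
    inside = (left <= ref_left and ref_right <= right and
              top <= ref_top and ref_bottom <= bottom)
    return not inside
```

With a MB at (64, 32) and margins of 16 and 8 pixels, an 8×8 partition with an MV of (8, -4) stays inside the fetched area, while an MV of (100, 0) reaches past its right edge and would trigger an individual fetch.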
Transrating:
Rate control is done in the feedback loop from the output of the entropy coder (block 210) to the H.264 coding engine (block 209) as shown in
Initial buffer fullness,
Average bit rate at which the MPEG-2 stream has been encoded,
SAD (Sum of Absolute Differences) at the picture and the MB level,
Syntactical and the residual bits used at the picture and the MB level,
Average quantizer scale used over the picture, and,
Number of intra MBs in the input picture.
Information like the SAD, and the number of intra MBs that are collected at the picture level in the input stream are used to decide whether or not a scene change occurred in the sequence. The scene change information is critical for the trans-rater (block 208) to shape the VBV (Video Buffer Verifier) trajectory so as to avoid any trans-coder buffer underflow.
The SAD, the number of bits used for the picture and the average quantizer used over the picture are used by the trans-rater (block 208) to judge the complexity of the current picture to be coded and to allocate bits to it.
The SAD, the number of bits and the quantizer scale information at the MB level are used to judge the complexity of the MB to decide on the distribution of bits over the MBs in the picture.
If required, the trans-coder can be configured to closely track the VBV trajectory of the input MPEG-2/MPEG-4 stream.
Similar to a rate-control engine (block 106), the trans-rater (block 208) benefits from knowing the coding type and the complexity of the pictures in the input stream in advance. In systems where a low latency between the input and the output is not a requirement, the input stream is parsed in advance to learn the picture coding types and the bits used on each one of them. Once the trans-coder begins decoding (block 201) and encoding (block 203) the input stream, this advance information helps the trans-rater (block 208) shape the VBV trajectory. The trans-coder ensures that the input stream is parsed in advance so that the collected information can be used later while trans-coding.
In systems where a low-latency between the input and the output stream is desired, the trans-rater (block 208) maintains a histogram of the different GOP (Group Of Pictures) structures it has encountered over a sliding window of the last 16 GOP periods. The bin with the largest hit count is assumed to be the GOP structure for the next GOP to be coded. This helps the trans-rater (block 208) arrive at a reasonably good estimate of the GOP structure—the number of I, P, B pictures, the GOP duration and the sub GOP length which helps it shape the VBV trajectory to look close to the one in the input stream.
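The sliding-window histogram described above can be sketched as follows. The class name, its methods and the string encoding of a GOP structure (for example "IBBPBBPBB") are hypothetical choices made for illustration.

```python
from collections import Counter, deque

class GopPredictor:
    """Predict the next GOP structure from recent history (sketch only)."""

    def __init__(self, window=16):
        # Sliding window over the last `window` GOP periods.
        self.history = deque(maxlen=window)

    def observe(self, gop_structure):
        # Record the structure of a GOP that has just been seen; the
        # oldest entry drops out once the window is full.
        self.history.append(gop_structure)

    def predict(self):
        # The bin with the largest hit count is assumed to describe
        # the next GOP. Returns None before any GOP has been observed.
        if not self.history:
            return None
        return Counter(self.history).most_common(1)[0][0]
```

A caller would invoke `observe` once per completed GOP and `predict` before coding the next one, deriving the number of I, P and B pictures and the sub GOP length from the predicted structure.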
Besides the GOP structure the trans-rater gets more real-time feedback on the sub GOP structure (PBB) by decoding the temporal-reference syntax element. This helps the trans-rater know the number of B pictures that follow a P picture which helps the trans-rater shape the VBV trajectory and also delay or avoid the skip of a reference picture (P) by skipping the non-reference picture.
Error Handling:
The trans-coder needs to be resilient to errors in the input picture. The decoder (block 201) while decoding the input stream expediently checks the bounds and the constraints imposed on all syntactical elements of the stream to decide if the input picture was in error. This error information is sent to the error handling engine (block 214) which takes the necessary steps based upon the severity of the error determined by the information in the stream that got corrupted. Errors in the input stream are categorized as either critical or non-critical.
Information in the MPEG-2/MPEG-4 stream can be broadly classified into two types: header information and picture data information. Information such as the sequence header, the GOP header and the picture header all falls into the header information category. Information used to decode the MBs in the input picture falls into the picture data category. An error in the header information is considered critical. An error in the picture data information of a reference picture is also considered critical, whereas the same kind of error in a non-reference picture is not considered critical.
When the trans-coder encounters a critical error, it only decodes the incoming stream without encoding the pictures. It resumes encoding as soon as it finds an I picture or decodes around 0.5 sec of video without encountering any error. The encoder only inserts stuffing bits during this interval if the encoder buffer is going to underflow.
When the trans-coder encounters a non-critical error, it skips coding the current picture and inserts stuffing bits if necessary. The trans-coder then resumes encoding from the next picture.
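The error classification and the resulting actions can be sketched as follows. The function names and the action labels are hypothetical, and the 0.5 second threshold is the approximate value given in the description.

```python
def is_critical(in_header, in_reference_picture):
    """Classify an input stream error (illustrative sketch only).

    in_header: the error hit a sequence, GOP or picture header.
    in_reference_picture: the error hit picture data of a reference picture.
    """
    return in_header or in_reference_picture

def action_for_error(in_header, in_reference_picture):
    # Critical: keep decoding but stop encoding until recovery.
    if is_critical(in_header, in_reference_picture):
        return "decode_only_until_recovery"
    # Non-critical: skip the current picture, resume with the next one.
    return "skip_current_picture"

def can_resume_encoding(found_i_picture, error_free_seconds,
                        threshold_seconds=0.5):
    # Encoding resumes at the next I picture, or after roughly 0.5 sec
    # of video has been decoded without encountering any error.
    return found_i_picture or error_free_seconds >= threshold_seconds
```

In both cases the encoder side would additionally insert stuffing bits when its buffer is about to underflow; that step is omitted here.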
In the foregoing detailed description of embodiments of the invention, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the invention require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the detailed description of embodiments of the invention, with each claim standing on its own as a separate embodiment. It is understood that the above description is intended to be illustrative, and not restrictive. The description is intended to cover all alternatives, modifications and equivalents as may be included within the spirit and scope of the invention as defined in the appended claims. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.