A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates to the field of digital video processing, and more particularly in one exemplary aspect, to methods and systems of splicing video associated with digital video bitstreams.
2. Description of the Related Technology
Since the advent of Moving Pictures Expert Group (MPEG) digital audio/video encoding specifications, digital video is ubiquitously used in today's information and entertainment networks. Example networks include satellite broadcast networks, digital cable networks, over-the-air television broadcasting networks, and the Internet.
Furthermore, several consumer electronics products that utilize digital audio/video have been introduced in recent years. Some examples include digital versatile disk (DVD), MP3 audio players, digital video cameras, and so on.
Such proliferation of digital video networks and consumer products has led to an increased need for a variety of products and methods that perform storage or processing of digital video. One such example of video processing is changing bitrate of a compressed video bitstream. Such processing may be used to, for example, change bitrate of a digital video program stored on a personal video recorder (PVR) at the bitrate at which it was received from a broadcast video network to the bitrate of a home network to which the program is being sent. Changing bitrate of a video program is also performed in prior art video distribution networks such as digital cable networks or an Internet protocol television (IPTV) distribution network.
The wide spectrum of types of digital video devices has spawned a plethora of specific requirements and use cases. For example, at one extreme of the spectrum lie consumer devices which are meant for mobile personal playback, whereas the other end of the spectrum may include commercial grade theaters. Accordingly, several different advanced codecs of varying capabilities (such as VC-1 and H.264) have found support niches within the electronics community. Many video codecs also support a wide range of bit rates and features.
The task of splicing (i.e., combining or inserting) video data has become more complex with the introduction of the aforementioned advanced codecs. Splicing of disparate or heterogeneous video streams (for instance, those encoded with different codecs and/or different specifications of the same codec such as different bitrate, GOP structure, and interlace formats) may be desirable, for example, when performing advertisement insertion or bridging media from multiple sources. Furthermore, splicing has wide applicability in video distribution networks such as digital cable or satellite networks, or an internet protocol television (IPTV) distribution network. Unfortunately, splicing data streams from different codec types is not straightforward for many reasons. Some of these reasons are now described in greater detail.
As used herein, the term “picture” refers generally and without limitation to a frame or a field. If a frame is coded with lines from both fields, it is termed a “frame picture”. If, on the other hand, the odd or even lines of the frame are coded separately, then each of them is referred to as a “field picture”. Prior art video decoding generally comprises three frame types, Intra pictures (I-pictures), Predictive pictures (P-pictures), and Bi-directional pictures (B-pictures). H.264 allows other types of coding such as Switching I (SI) and Switching P (SP) in the Extended Profile. I-pictures are generally more important to a video codec than P-pictures, and P-pictures are generally more important to a video codec than B-pictures. P-pictures are dependent on previous I-pictures and P-pictures. B-pictures come in two types, reference, and non-reference. Reference B-pictures (Br-pictures) are dependent upon one or more I-pictures, P-pictures, or other reference B-pictures. Non-reference B-pictures are dependent on I-pictures, or P-pictures or reference B-pictures. As a result, the loss of a non-reference B-picture will not affect I-picture, P-picture and Br-picture processing, and the loss of a Br-picture, though not affecting I-picture and P-picture processing, may affect B-picture processing, and the loss of a P-picture, though not affecting I-picture processing, may affect B-picture and Br-picture processing. The loss of an I-picture may affect P-picture, B-picture and Br-picture processing.
Due to the varying importance of these different picture types, video encoding does not proceed in a sequential fashion. Significant amounts of processing power are required to compress and protect I-pictures, P-pictures, and Br-pictures, whereas B-pictures may be “filled-in” afterward. Thus, the video encoding sequence would first code an I-picture, then P-picture then Br-picture, and then the “sandwiched” B-picture. The pictures are decoded in their proper sequence. Herein lies a fundamental issue; i.e., decoding B pictures in a compressed digital video bit stream requires decompressed content from both prior and future frames of the bit stream.
Due to this complex ordering of pictures, which involves frame and field pictures, reference and non-reference B-pictures, and different orderings of top-bottom field parities in interlaced frames, splicing between two streams requires a complex set of algorithms that make the transition syntactically legal per the desired “target” video standard (e.g., H.264).
The task of producing a spliced video bitstream that is syntactically conformant to a standard (e.g., H.264), and which exhaustively addresses every possible mode in which pictures may be encoded in the two video bitstreams being spliced, remains unaddressed in the prior art. Prior art solutions for splicing mostly address MPEG-2 encoded video splicing, which does not include reference or hierarchical B-pictures, complementary field pairs, non-paired fields, etc. Due to the wider variety of H.264 coding possibilities, the splicing problem between any two arbitrary H.264 streams is quite complex, and completely unaddressed by such prior art solutions.
Hence, there is a need for an improved method and apparatus for splicing video bitstreams which may or may not be heterogeneous in nature, including those having reference B or hierarchical B pictures.
The present invention satisfies the foregoing needs by providing improved methods and apparatus for video processing, including splicing of disparate video data streams.
In a first aspect of the invention, a video splicing method is disclosed. In one embodiment, the method comprises: providing a first video stream comprising hierarchical B pictures; providing a second video stream comprising no hierarchical B pictures; identifying a splicing boundary; splicing the first and second streams at the boundary to produce a spliced stream; and applying a correction to the spliced stream. In one variant, the act of identifying is performed so as to maintain compliance with H.264 protocol requirements. In another variant, the act of identifying is performed based at least in part on frame type. The frame type is selected from e.g., (i) I-frames; and (ii) P-frames, and the act of splicing comprises splicing in the second stream at an I-frame or P-frame of the first stream. In another variant, the method further comprises evaluating field parity; e.g., evaluating whether a frame corresponds to a top field or bottom field associated with an interlaced video stream. The splicing boundary is then adjusted based at least in part on the evaluation of parity. In yet another variant, applying a correction comprises duplication of a frame. In a further variant, applying a correction comprises deleting a frame. In still another variant, the method further comprises throttling a bitrate associated with the spliced stream so as to avoid overflow or underflow conditions.
In a second embodiment, the video splicing method comprises: providing a first video stream encoded according to a standard and comprising a first plurality of coding parameters; providing a second video stream encoded according to the same standard and comprising a second plurality of coding parameters, the second plurality of parameters being different from the first plurality of parameters in at least one regard; identifying a splicing boundary; and splicing the first and second streams at the boundary to produce a spliced stream. In one variant, the standard comprises the H.264 standard.
In a second aspect of the invention, video splicing apparatus is disclosed. In one embodiment, the apparatus comprises: first apparatus adapted to receive a first video stream comprising hierarchical B pictures; second apparatus adapted to receive a second video stream comprising no hierarchical B pictures; logic in communication with the first and second apparatus, the logic configured to identify a splicing boundary within at least one of the first and second streams; a splicer configured to splice the first and second streams at the boundary; and logic configured to apply a correction. In one variant, the apparatus is configured to maintain compliance with H.264 protocol requirements. In another variant, the logic configured to identify is configured to identify based at least in part on frame type selected from e.g.,: (i) I-frames; and (ii) P-frames. In another variant, the splicer comprises logic adapted to splice in the second stream at an I-frame or P-frame of the first stream. In a further variant, the apparatus further comprises logic in communication with the splicer and configured to evaluate field parity (e.g., whether a frame corresponds to a top field or bottom field associated with an interlaced video stream). In still another variant, the apparatus further comprises logic in communication with the splicer and configured to adjust the splicing boundary based at least in part on the evaluation of parity. In another variant, the apparatus further comprises logic adapted to apply a correction via duplication or deletion of a frame. In yet another variant, the apparatus further comprises apparatus configured to throttle a bitrate associated with the spliced stream so as to avoid overflow or underflow conditions. The apparatus configured to throttle comprises e.g., first and second picture buffers, and at least one of the buffers is configured to be emptied at a substantially constant rate specified by a presentation timeline.
In another variant, the video splicing apparatus comprises a processor and at least one computer program adapted to run thereon, the at least one computer program comprising at least: (i) the logic configured to identify a splicing boundary within at least one of the first and second streams; (ii) the splicer; and (iii) the logic configured to apply a correction.
In a third aspect of the invention, computer readable apparatus is disclosed. In one embodiment, the apparatus comprises a storage medium, the medium adapted to store at least one computer program, the at least one computer program being configured to, when executed on a processing device: receive a first video stream comprising a first type of picture, the first type having a first form of dependency relating to frame type; receive a second video stream comprising a second type of picture, the second type having a second form of dependency relating to frame type different than the first form; identify a splicing boundary within the first stream; splice the second stream into the first at the boundary to produce a spliced stream; and determine whether a correction is required and if so, apply a correction.
In a fourth aspect of the invention, a splicing system is disclosed. In one embodiment, the system comprises a first video stream source, a second video stream source, and a splicing apparatus. In one embodiment, the two streams comprise H.264 encoded streams having pictures containing frame and field pictures and reference and non-reference B-pictures, and the splicer is adapted to splice the two streams.
a is a block diagram of the Hypothetical Reference Decoder (HRD) Model in Annex C of the H.264 Standard.
b is a graphical illustration of PTS and DTS time stamps of a sequence in display and encoding orders.
a is a logical flow diagram illustrating one embodiment of a method of deleting an extra picture at a splice point, in accordance with the present invention.
b is a logical flow diagram illustrating one embodiment of a method of splicing sequences of different hierarchies and filling gaps, in accordance with the present invention.
c is a logical flow diagram illustrating one embodiment of a method of filling gaps under differing circumstances, in accordance with the present invention.
All figures and tables © Copyright 2008-2009 TransVideo, Inc. All rights reserved.
The following detailed description is of the best currently contemplated modes of carrying out the invention. The description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating the general principles of the invention, since the scope of the invention is best defined by the appended claims.
As used herein, “video bitstream” refers without limitation to a digital format representation of a video signal that may include related or unrelated audio and data signals.
As used herein, “transrating” refers without limitation to the process of bit-rate transformation. It changes the input bit-rate to a new bit-rate which can be constant, or variable according to a function of time or so as to satisfy certain criteria. The new bitrate can be user-defined, or automatically determined by a computational process such as statistical multiplexing or rate control.
As used herein, “transcoding” refers without limitation to the conversion of a video bitstream (including audio, video and ancillary data such as closed captioning, user data and teletext data) from one coded representation to another coded representation. The conversion may change one or more attributes of the multimedia stream such as the bitrate, resolution, frame rate, color space representation, and other well-known attributes.
As used herein, the term macroblock (MB) refers without limitation to a two dimensional subset of pixels representing a video signal. A macroblock may or may not be comprised of contiguous pixels from the video and may or may not include equal number of lines and samples per line. A preferred embodiment of a macroblock comprises an area 16 lines wide and 16 samples per line.
As used herein, the term H.264 refers without limitation to ITU-T Recommendation No. H.264, “SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS—Infrastructure of audiovisual services—Coding of moving Video—Advanced video coding for generic audiovisual services” dated November 2007, and any variants (e.g., H.264 SVC), revisions, modifications, or subsequent versions thereof, each of which is incorporated by reference herein in its entirety.
In one salient aspect, the present invention discloses methods and apparatus for splicing two (or more) video streams together. The invention resolves the issues inherent to splicing two compressed video bit streams (having one or more disparate qualities, such as bit rate, format, field parity, etc.), together to form a single video bit stream. Splicing according to various embodiments of the invention can also be hierarchical in nature.
In one embodiment of the invention, two video streams are spliced together: one containing certain types of pictures (e.g., hierarchic B pictures), and the other without them. A splicing boundary is determined in compliance with e.g., extant protocol requirements. In a first implementation, the boundaries are determined based on frame types. One or more additional constraints (e.g., field parity, bit rate) are considered, and a correction (e.g., duplication of a frame and/or deletion of a frame) is applied.
The splicing boundary can be determined based on the decoding requirements of the frame types. For example, an I-picture has no decoding requirements, as it is decoded “standalone”. In contrast, a B-picture requires information from both its lead-pictures and its follow-pictures. A P-picture only relies on the information from its lead-pictures. Accordingly, a first video stream may be spliced at either an I-frame or a P-frame. The spliced-in second video stream replaces the spliced frame (e.g., the P-frame of the first stream) with its own replacement I-frame, thus prompting the video decoder to begin freshly decoding the second stream.
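The picture-type rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first stream is available as a list of picture-type labels; the function name `candidate_splice_points` and the label strings are hypothetical and are not part of any codec API.

```python
# Hypothetical picture-type labels; "Br" denotes a reference B-picture.
SPLICEABLE_TYPES = {"I", "P"}


def candidate_splice_points(picture_types):
    """Return the indices at which the first stream may be spliced.

    Following the text above, only I- and P-pictures qualify, because
    neither depends on pictures that follow it.
    """
    return [i for i, t in enumerate(picture_types) if t in SPLICEABLE_TYPES]


# Illustrative stream only; not taken from any table in this document.
stream1 = ["I", "P", "B", "B", "P", "Br", "B"]
points = candidate_splice_points(stream1)
```

A real splicer would additionally weigh the parity and bitrate constraints discussed in the following paragraphs before committing to one of these candidates.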
One or more additional constraints may also be considered, such as the current field parity, or bit rate. For example, in some “interlaced” video codecs, each frame additionally has “top/bottom parity”. Interlaced video flashes only half the frame at a time. A “field” is an image that contains only half of the lines needed to make a complete picture. The top field comprises every other row of an image, starting at the first row (e.g., 1, 3, 5, etc.). The bottom field comprises every other row of an image, starting at the second row (e.g., 2, 4, 6, etc.). The top field and bottom field are interlaced to produce the complete image without requiring the full bandwidth to do so. Each frame of an interlaced video is assigned “parity”, this parity indicating if the frame is a top or bottom field. Parity must always alternate; i.e., a top frame must always be followed by a bottom frame, and vice versa.
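The relationship between a frame and its two fields, as described above, can be sketched as follows. The sketch assumes a frame is represented simply as a list of rows; `split_fields` and `interlace` are hypothetical helper names introduced for illustration.

```python
def split_fields(frame_rows):
    """Split a frame (a list of rows) into (top_field, bottom_field).

    The top field takes rows 1, 3, 5, ... (1-based), i.e. indices 0, 2, 4, ...
    The bottom field takes rows 2, 4, 6, ..., i.e. indices 1, 3, 5, ...
    """
    return frame_rows[0::2], frame_rows[1::2]


def interlace(top, bottom):
    """Re-interleave a top and a bottom field into a complete frame."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.extend([top_row, bottom_row])
    # With an odd number of rows the top field holds one extra row.
    if len(top) > len(bottom):
        frame.append(top[-1])
    return frame


frame = ["row1", "row2", "row3", "row4"]
top, bottom = split_fields(frame)
```

Splitting and then interlacing returns the original frame, which is the sense in which the two fields together "produce the complete image".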
Thus, in one implementation, when a first video stream is spliced at a bottom parity P-field, that field is repeated as a stalling top parity field. The spliced-in second video stream replaces the subsequent field (e.g., the bottom parity P-field of the first stream) with its own replacement bottom parity I-field, thus prompting the video decoder to begin freshly decoding the second stream while remaining consistent with the correct parity sequence.
In another aspect of the invention, the output is throttled according to the input and output video bit streams to resolve any data rate discrepancies. In a first embodiment, two regulated buffers (a Coded Picture Buffer (CPB) and a Decoded Picture Buffer (DPB)) are described for a hypothetical reference decoder (HRD). The CPB is at the input of the HRD and is used to regulate network jitter and outgoing compressed bitrate. The DPB is at the output of the HRD and is used to store decoded pictures before they are displayed.
If there is a difference in the bitrates of two spliced streams and there is a difference in the CPB fullness at the splice point, then blindly switching from one stream to the other without correction can cause HRD CPB and DPB overflow or underflow.
Consequently, one rate matching apparatus is a CPB which accumulates or loses bits due to the addition or deletion of frames or fields. These quantities are measured in field units (i.e., a frame comprises a top field and bottom field). Thus, if a field is added to the CPB, the running count is incremented by one; if an extra frame is added, it is incremented by two. Likewise, if a frame is deleted from the CPB, the running count is decremented by two. During splicing, the CPB must operate within reasonable limits (which may vary depending on device operational memory capabilities).
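The field-unit bookkeeping described above can be sketched as a simple running count. The event representation (an action paired with a unit) and the name `cpb_change` are assumptions made for illustration only.

```python
# A frame comprises two fields, so it counts as two field units.
FIELD_UNITS = {"field": 1, "frame": 2}


def cpb_change(events):
    """Return the net CPB change, in field units, for a list of events.

    Each event is a hypothetical (action, unit) pair, where action is
    "add" or "delete" and unit is "field" or "frame".
    """
    total = 0
    for action, unit in events:
        if action not in ("add", "delete"):
            raise ValueError(f"unknown action: {action!r}")
        sign = 1 if action == "add" else -1
        total += sign * FIELD_UNITS[unit]
    return total
```

For instance, adding one field and then deleting one frame yields a net change of +1 − 2 = −1 field units.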
The DPB is in one implementation emptied at a constant time interval specified by the presentation timeline. Positive changes to the DPB denote delays from the intended or ideal presentation time. Negative changes indicate that the picture is presented earlier than intended. Ideally, the splicer should not cause significant deviations to the DPB.
Exemplary embodiments of the various apparatus and methods according to the present invention are now described in detail.
It will be recognized that while the exemplary embodiments of the invention are described herein primarily in the context of the H.264 codec syntax referenced above, the invention is in no way so limited, and in fact may be applied broadly across various different codec paradigms and syntaxes.
Moreover, it will be recognized that any tables contained herein are purely illustrative in nature, and not representative of actual images or relationships between frames or other elements (e.g., sizing and width variations in no way indicate any relative differences).
One common architectural concept underlying certain aspects and embodiments of the invention relates to use of a “three stage” process—i.e., (i) an input processing stage, (ii) an intermediate format processing stage, and (iii) an output processing stage. In one embodiment, the input processing stage comprises both a decompression stage that takes an input bitstream and produces an intermediate format signal, and a parsing stage that parses certain fields of the bitstream to make them available to the output processing stage.
The intermediate format processing stage performs signal processing operations, described below in greater detail, in order to condition the signal for transrating.
Finally, the output processing stage converts the processed intermediate format signal to produce the output bitstream, which comprises the transrated version of the input bitstream in accordance with one or more quality metrics such as e.g., a target bitrate and/or a target quality.
Annex C of the H.264 standard (ITU-T Recommendation No. H.264, “SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS—Infrastructure of audiovisual services—Coding of moving Video—Advanced video coding for generic audiovisual services” dated November 2007, which is incorporated by reference herein in its entirety) describes a hypothetical reference decoder (HRD) consisting of a Hypothetical Stream Scheduler (HSS), Coded Picture Buffer (CPB), instantaneous decoder, Decoded Picture Buffer (DPB), and instantaneous display, in that order. See
Appendix I herein lists various abbreviations and acronyms used in the following discussion.
The HRD model of AVC and the Video Buffering Verifier (VBV) model of MPEG-2 specify a Decode Time Stamp (DTS) (aka t_r(n)), which indicates the time at which an encoded picture or audio block (access unit) is instantaneously removed from the CPB and decoded by the instantaneous decoder. They also specify a Presentation Time Stamp (PTS) (aka t_o,dpb(n)), which indicates the instant at which an access unit is removed from the DPB and presented for instantaneous display.
For the exemplary embodiments of splicing discussed herein, it is assumed that the CPB does not overflow due to accumulation of bits, and does not underflow due to deletion of bits.
Also, it is assumed for purposes of certain embodiments herein that any deletion of frames is achieved by setting the no_output_of_prior_pics_flag in the dec_ref_pic_marking( ) syntax of the slice header of the immediately following IDR frame in display order. Furthermore, it is assumed that the DTS of the IDR frame is less than the PTS of the frame to be deleted, and greater than or equal to the PTS of the prior (possibly B) frame not to be deleted.
In order to delete frame P3 only by setting the no_output_of_prior_pics_flag in the slice header of IDR4, the PTS and DTS timings have to satisfy the following inequality:
PTS(B2)≦DTS(IDR4)<PTS(P3) Eqn. (1)
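The timing condition of Eqn. (1) can be checked with a short sketch. The function name and the integer tick values used below are hypothetical; any consistent time unit would serve.

```python
def can_delete_with_idr_flag(pts_prev_kept, dts_idr, pts_deleted):
    """Check the inequality of Eqn. (1).

    pts_prev_kept: PTS of the last frame that must still be output (e.g. B2)
    dts_idr:       DTS of the IDR frame carrying no_output_of_prior_pics_flag
    pts_deleted:   PTS of the frame to be deleted (e.g. P3)
    """
    return pts_prev_kept <= dts_idr < pts_deleted


# Illustrative tick values only; not taken from any figure or table.
ok = can_delete_with_idr_flag(pts_prev_kept=6006, dts_idr=6006, pts_deleted=9009)
too_late = can_delete_with_idr_flag(pts_prev_kept=6006, dts_idr=9009, pts_deleted=9009)
```

In the second call the IDR frame is decoded no earlier than the presentation time of the frame to be deleted, so the condition fails.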
The following scenarios which may be encountered during video splicing are now considered in detail.
As previously noted, splicing according to various embodiments of the invention can be hierarchical in nature. For example, considering three hypothetical streams (streams 1, 2 and 3), stream 2 can be spliced into stream 1, and then soon after stream 3 can be spliced into stream 2. Subsequently, we can return to stream 2 and eventually to stream 1.
Scenario No. 1—Sequences with No Hierarchic B Frames—
As a specific example of the foregoing generalized methodology 300, the following sequence (Seq1) with frame pictures only with SubGop=3 and no hierarchic B frames in the display (D) and Coding (C) order is considered. The sequence has SubGop=3, Hierarchy=0 and Latency=1.
The sequence has a latency of 1 frame; i.e., the decoded frames are displayed one frame after the encoding begins. Now another sequence (Seq2) with SubGop=4 and no hierarchic B frames is considered. It has SubGop=4, Hierarchy=0 and Latency=1.
The two sequences Seq1 and Seq2 are then spliced, with Seq1 first and Seq2 spliced in (Table 3), and Seq2 first with Seq1 spliced in (Table 4), where the spliced sequence is shown in bold.
Note that in all these frame sequences and the spliced sequences, the latency and hierarchy remain the same (at one in this example). Note also that the arbitrary number of B pictures in the SubGop may change across the splice point.
Scenario No. 2—Sequences with Hierarchic B Frames—
As shown in
Next a frame sequence (Seq3) is examined with SubGop=4 with one hierarchic B frame per SubGop, in decoding and encoding order. Here, SubGop=4, Hierarchy=1 and Latency=2.
Here, the latency is 2; i.e., the decoded frames are displayed two frames after the encoding begins. It is also noted that for frame sequences:
Latency=Levels of Hierarchy+1 Eqn. (2)
Now, a splice of a Hierarchy=0 sequence (such as Seq1 or Seq2) with a Hierarchy=1 sequence (such as Seq3), i.e., Seq3 followed by Seq1 or Seq2 is attempted.
Note that in both cases of splicing above (Table 6 and Table 7), the display sequence has an extra picture at the splice point which can be deleted and not displayed. The extra frame in both Tables is P8. This can be accomplished by e.g., one of the two methods below:
PTS(B7)≦DTS(I9)<PTS(P8) Eqn. (3)
DTS(I9)=DTS(B7)+2*Tframe Eqn. (4)
The foregoing logic is graphically illustrated in steps 414 and 416 of the method of
Next, a Hierarchy=1 sequence (such as Seq3) is spliced with a Hierarchy=0 sequence (such as Seq1 or Seq2), i.e., Seq1 or Seq2 followed by Seq3:
The foregoing logic is graphically illustrated in the generalized methodology of
In the two cases of splicing above, wherein a Hierarchy=1, Latency=2 sequence is spliced into a Hierarchy=0, Latency=1 sequence, a gap arises in the display sequence that needs to be filled in with the previous frame, which in this case corresponds to P9 or P8. This can be achieved in one embodiment of the invention as follows:
Steps 434-440 of the method of
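The relationship between hierarchy, latency (Eqn. (2)), and the display-sequence correction discussed in this scenario can be summarized in a minimal sketch. The function names are hypothetical, and extending the picture count beyond a latency difference of one is an assumption; the examples in this scenario only show a difference of one.

```python
def latency(hierarchy_levels):
    """Eqn. (2): Latency = Levels of Hierarchy + 1 (for frame sequences)."""
    return hierarchy_levels + 1


def display_correction(hierarchy_original, hierarchy_spliced):
    """Return (action, picture_count) needed at the splice point.

    A spliced sequence with lower latency leaves an extra picture in the
    display sequence (to be deleted); one with higher latency leaves a gap
    (to be filled with the previous frame).
    """
    diff = latency(hierarchy_spliced) - latency(hierarchy_original)
    if diff > 0:
        return ("fill_gap", diff)
    if diff < 0:
        return ("delete_extra", -diff)
    return ("none", 0)
```

For example, `display_correction(1, 0)` corresponds to Seq3 followed by Seq1 or Seq2 (one extra picture), while `display_correction(0, 1)` corresponds to Seq1 or Seq2 followed by Seq3 (a one-picture gap).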
In splicing sequences, in addition to the addition/deletion/no action of pictures to maintain continuity in the display sequence, field parity needs to be maintained. This can be demonstrated with the following field sequence with Hierarchy=0, Latency=1, and SubGop=3:
Here, FP denotes field parity of the display sequence which is either top (T) field or bottom (B) field. Consider the following sequence with different field parity.
If Seq4 is spliced with Seq5 even without change of hierarchy/latency, a problem results due to field parity mismatch at the splice point.
In Table 12, it can be seen that two bottom fields are next to each other for p3 and i4, which is illegal. In order to solve this problem, a replication is performed (shown in gray) for bottom field p3 as a top field p3 in the bitstream.
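The repair described above can be sketched as follows, assuming each field is represented as a hypothetical (name, parity) pair in display order. The function `splice_fields` is illustrative only; it models the parity bookkeeping and does not copy any actual bitstream data.

```python
def opposite(parity):
    """Return the other field parity ("T" for top, "B" for bottom)."""
    return {"T": "B", "B": "T"}[parity]


def splice_fields(original, spliced):
    """Join two field sequences, replicating the last original field with
    the opposite parity when the splice would otherwise put two fields of
    the same parity next to each other."""
    out = list(original)
    last_name, last_parity = out[-1]
    if spliced[0][1] == last_parity:
        # e.g. bottom field p3 is replicated as a top field p3
        out.append((last_name, opposite(last_parity)))
    out.extend(spliced)
    return out


# Illustrative field labels; only p3(B) followed by i4(B) comes from the text.
seq_a = [("i0", "T"), ("b1", "B"), ("b2", "T"), ("p3", "B")]
seq_b = [("i4", "B"), ("b5", "T")]
result = splice_fields(seq_a, seq_b)
```

After the repair, the parities in `result` alternate strictly across the splice point.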
A distinction can be made between repeating a picture and replicating the bits in a picture. Repeating a picture is applicable only for frame pictures, whereby the frame is repeated by using the pic_struct field of the picture timing SEI. The cases for pic_struct are given in Annex D of the H.264 standard (see Appendix III hereto):
Replication of bits is discussed in detail below.
Replication of a picture means copying the bits of a picture in the bitstream. This is different from repeating a picture by using picture timing SEI, which is not allowed for field pictures. Replication of a picture may produce a different number of bits in the new picture.
The illustrated embodiment makes use of the concept of field parity transitions across the splicing boundary being consistent or inconsistent. If the original sequence ends with a field parity (say T=top), and the spliced sequence starts with the opposite field parity (say B=bottom), that is labeled as consistent across the splice boundary, since the expected field parity for the next field is B=bottom. On the other hand, if the original sequence ends with a T=top field parity and the spliced sequence starts with T=top field parity, then we term it inconsistent parity, since this parity is different from the expected field parity.
For example, when a frame sequence I0 B1 B2 P3 with TB parity (i.e., top field first) is spliced with a frame sequence I0 B1 B2 P3 that also has TB parity, a consistent parity results. If the spliced sequence has BT parity (i.e., bottom field first), then the parity is inconsistent. Similarly, when a field sequence i0(T) b1(B) b2(T) p3(B) is spliced with a sequence i0(T) b1(B) b2(T) p3(B), the parity is consistent. If it is spliced with a sequence i0(B) b1(T) b2(B) p3(T), the parity is inconsistent.
The examples above demonstrate the need for the following conditions to create a legal H.264 bitstream after splicing:
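The consistent/inconsistent classification used in these examples can be expressed compactly. In this sketch, parities are the single characters "T" and "B", and a frame sequence's field order is given as the hypothetical string "TB" (top field first) or "BT" (bottom field first); the function names are illustrative.

```python
def is_parity_consistent(last_parity_original, first_parity_spliced):
    """A splice is consistent when the spliced sequence starts with the
    parity expected next, i.e. the opposite of the last original parity."""
    expected = "B" if last_parity_original == "T" else "T"
    return first_parity_spliced == expected


def frame_sequences_consistent(original_field_order, spliced_field_order):
    """Frame sequences: the original ends on the second field of its field
    order, and the spliced sequence starts on the first field of its own."""
    return is_parity_consistent(original_field_order[-1], spliced_field_order[0])
```

With these definitions, TB spliced with TB is consistent and TB spliced with BT is inconsistent; a field sequence ending in p3(B) is consistent with one starting at i0(T) and inconsistent with one starting at i0(B).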
In the next sections, the different splicing scenarios are described. In each case, the hierarchy of the original and spliced sequences is defined, as well as the field parity of the original and spliced sequences. The cases considered are:
The following assumptions are made in the context of the exemplary embodiments described below:
The following notations are used herein for convenience, yet are in no way intended as limiting on the various embodiments or implementations of the invention:
For each splicing case, the CPB may accumulate or lose bits due to addition or deletion of pictures. These quantities are measured in field units; i.e., if an extra field is added to the CPB, it is +1; if an extra frame is added, it is +2; if a frame is deleted, it is −2. One goal during splicing is to keep the CPB from growing out of bounds and to maintain this buffer within reasonable limits. The sequence will not be compliant if the CPB overflows or underflows.
The DPB change in the following sections denotes a change in the presentation timeline or schedule. Positive changes denote delays from the intended or ideal presentation time. Negative changes denote presentation earlier than intended. DPB overflow or underflow (nothing to present, or presentation too far in the future, leaving no space to decode a picture at its specified DTS) does not occur if the encoder provides HRD-legal bitstreams. The splicer, however, ensures that the display process can continue at the specified constant frame rate given in the VUI across the splice “seam” where a discontinuity in DPB fullness may occur. In other words, the splice must neither leave gaps in display time nor delay presentation so far that the spliced sequence cannot be decoded at its designated DTS. Note that if the cumulative DPB change becomes too positive, the splicer can/must delete a full sub-gop. Note that sub-gops are dense and contiguous in both display and coding order.
It is noted that the CPB and DPB change due to splice in and splice out in each pair of cases. If the CPB or DPB grows, they can be reduced by deleting an entire subgop so that the buffers are bounded. The CPB changes can be further bounded by (1) transrating, and (2) slower or faster than modulation for CBR splicing.
For splicing with field pictures, two cases exist according to the exemplary H.264 standard:
1. Complementary field pairs:
2. Non-paired field pictures:
Frame Sequence Followed by Spliced Frame Sequence—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent—
Splicer action: None.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent—
Splicer action:
1. Repeat P3 by using pic_struct=7 in picture timing SEI of P3.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
1. DTS(I5)=DTS(B3)+2*Tframe.
This splicing can alternatively be performed as below.
Splicer action:
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
This splicing can alternatively be performed as below.
Splicer action:
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent—
Splicer action: None.
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
CPB and Presentation Timeline Changes Due to Splice in Followed by Splice Out—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent:
Total Presentation Timeline Change=0.
Total CPB Change=0.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
3. Hierarchy original=0(1), Hierarchy spliced=1(0), Parity=Consistent:
Total Presentation Timeline Change=+2+0(−2)=+2(+0).
Total CPB Change=+0+2(−2)=+2(−2).
4. Hierarchy original=0(1), Hierarchy spliced=1(0), Parity=Inconsistent:
Total Presentation Timeline Change=+3+1(−1)=+4(+2).
Total CPB Change=+1+3(−1)=+4(+0).
5. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent:
Total Presentation Timeline Change=0.
Total CPB Change=0.
6. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
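The totals listed above are each the sum of two terms, which are taken here to be the splice-in and splice-out contributions in that order. As a hedged sketch, the primary (non-parenthesized) values for the cases that list both terms can be tabulated and summed as follows; the dictionary keys and the `totals` helper are hypothetical names.

```python
# (first term, second term) in field units, copied from the cases above
# that list both terms. Parenthesized alternatives are omitted.
FRAME_TO_FRAME = {
    "0/0 inconsistent": {"timeline": (1, 1), "cpb": (1, 1)},
    "0/1 consistent": {"timeline": (2, 0), "cpb": (0, 2)},
    "0/1 inconsistent": {"timeline": (3, 1), "cpb": (1, 3)},
    "1/1 inconsistent": {"timeline": (1, 1), "cpb": (1, 1)},
}


def totals(case):
    """Return (total timeline change, total CPB change) for a named case."""
    entry = FRAME_TO_FRAME[case]
    return sum(entry["timeline"]), sum(entry["cpb"])


timeline_total, cpb_total = totals("0/1 inconsistent")
```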
Non-Paired Field Sequence Followed by Spliced Non-Paired Field Sequence—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent—
Splicer action: None.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent—
See “Remaining Splicing Cases Involving Non-Paired Field Sequences” discussed subsequently herein.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent—
See “Remaining Splicing Cases Involving Non-Paired Field Sequences” discussed subsequently herein.
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
1. DTS(i4)=DTS(b3)+2*Tfield.
This can alternatively be spliced as:
Splicer action:
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
This can alternatively be spliced as:
Splicer action:
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent—
Splicer action: None.
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
CPB and Presentation Timeline Changes Due to Splice in Followed by Splice Out—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent:
Total Presentation Timeline Change=0.
Total CPB Change=0.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
3. Hierarchy original=0(1), Hierarchy spliced=1(0), Parity=Consistent:
Total Presentation Timeline Change=N+0(−1).
Total CPB Change=N+1(−1).
4. Hierarchy original=0(1), Hierarchy spliced=1(0), Parity=Inconsistent:
Total Presentation Timeline Change=N+1(−1).
Total CPB Change=N+2(−1).
5. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent:
Total Presentation Timeline Change=0.
Total CPB Change=0.
6. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
Here, “N” denotes timing discussed below in “Remaining Splicing Cases Involving Non-Paired Field Sequences”.
Frame Sequence Followed by Spliced Non-Paired Field Sequence—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
1. DTS(i4)=DTS(B2)+Tframe+Tfield.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
This can alternatively be spliced as:
Splicer action:
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent—
Splicer action: None.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
This can alternatively be spliced as below:
Splicer action:
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
This can alternatively be spliced as below:
Splicer action:
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent—
Splicer action:
1. DTS(i5)=DTS(B3)+2*Tframe.
This can alternatively be spliced as below:
Splicer action:
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
This can alternatively be spliced as below:
Splicer action:
Non-Paired Field Sequence Followed by Spliced Frame Sequence
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent—
See “Remaining Splicing Cases Involving Non-Paired Field Sequences” discussed subsequently herein.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent—
See “Remaining Splicing Cases Involving Non-Paired Field Sequences” discussed subsequently herein.
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent—
See “Remaining Splicing Cases Involving Non-Paired Field Sequences” discussed subsequently herein.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent—
See “Remaining Splicing Cases Involving Non-Paired Field Sequences” discussed subsequently herein.
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent—
Splicer action: None.
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent—
Splicer action:
1. Repeat frame I5 by using pic_struct=7 in picture timing SEI of I5.
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
CPB and Presentation Timeline Changes Due to Splice in Followed by Splice Out—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent:
Total Presentation Timeline Change=+0+N.
Total CPB Change=+1+N.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent:
Total Presentation Timeline Change=+1(−1)+N.
Total CPB Change=+2(−1)+N.
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent:
Total Presentation Timeline Change=+0+0=0.
Total CPB Change=+0+0=0.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent:
Total Presentation Timeline Change=−2(+0)+N.
Total CPB Change=−1(−3)+N.
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent:
Total Presentation Timeline Change=−1(+1)+N.
Total CPB Change=+0(+4)+N.
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent:
Total Presentation Timeline Change=+0(−2)+2=+2(+0).
Total CPB Change=+2(−2)+0=+2(−2).
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent:
Total Presentation Timeline Change=+1(−1)+3=+4(+2).
Total CPB Change=+3(−1)+3=+6(+2).
Here, “N” denotes timing discussed below in “Remaining Splicing Cases Involving Non-Paired Field Sequences”.
Remaining Splicing Cases Involving Non-Paired Field Sequences—
The six “unsolved” cases listed in the discussion presented above require a different method. All of these cases involve a non-paired field sequence. In these situations, the latency of the non-paired field sequences is increased by changing the PTS of the pictures, because fields cannot be repeated by pic_struct field of the picture timing SEI. Consider the exemplary sequence below.
The latency can be increased by one field period by setting PTS(i0)=PTS(i0)+Tfield. The same operation is performed for b1, b2, and p3.
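The PTS adjustment described above may be sketched as follows. The 90 kHz timestamp clock, the 59.94 fields per second rate, the sample PTS values, and the `delay_presentation` helper are all assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed timing: 90 kHz timestamps and 60000/1001 fields per second.
T_FIELD = Fraction(90000 * 1001, 60000)  # 1501.5 ticks per field


def delay_presentation(pts_by_picture, n_fields=1, t_field=T_FIELD):
    """Return a copy of the PTS map with every picture delayed by n_fields."""
    return {name: pts + n_fields * t_field
            for name, pts in pts_by_picture.items()}


# Hypothetical PTS values for the fields i0, b1, b2 and p3.
original = {"i0": Fraction(0), "b1": T_FIELD,
            "b2": 2 * T_FIELD, "p3": 3 * T_FIELD}
delayed = delay_presentation(original)
```

Exact fractions are used so that repeated half-tick field periods do not accumulate rounding error; a real multiplexer would round to integer ticks at the point where timestamps are written.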
Non-Paired Field Sequence followed by Spliced Non-Paired Field—
1. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent.
Splicer action: None.
2. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent.
Splicer action:
Replicate the bits of bottom field p3 as a top field p3 as described previously herein.
DTS(replicated p3)=DTS(b2)+Tfield.
DTS(i4)=DTS(replicated p3)+Tfield.
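The timestamp assignment for a replicated field may be sketched as below, assuming that each step advances the DTS by one field period. The function name and the integer tick values are hypothetical.

```python
def dts_after_replication(dts_last_original, t_field):
    """Return (DTS of the replicated field, DTS of the first spliced picture).

    Assumes the replicated field is decoded one field period after the last
    original picture, and the first spliced picture one field period later.
    """
    dts_replica = dts_last_original + t_field
    dts_spliced = dts_replica + t_field
    return dts_replica, dts_spliced


# Hypothetical values: DTS(b2) = 9000 ticks, Tfield = 1500 ticks.
dts_p3_replica, dts_i4 = dts_after_replication(9000, 1500)
```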
Non-Paired Field Sequence followed by Spliced Frame—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent.
Splicer action: None.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent.
Splicer action:
Replicate the bits of top field p3 as a bottom field p3 as described previously herein.
DTS(replicated p3)=DTS(b2)+Tfield.
DTS(I4)=DTS(replicated p3)+Tfield.
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent.
Splicer action:
1. Repeat I4 by using pic_struct=7 in picture timing SEI of I4.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent.
Splicer action:
Complementary Field Sequence Followed by Spliced Complementary Field Sequence—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent—
Splicer action: None.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
1. Replicate the bits of bottom field p7 as a non-paired top field p7.
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent—
Splicer action:
1. Replicate the bits of field pair p6 and p7 as a frame picture P7.
2. Repeat frame P7 by using pic_struct=7 in picture timing SEI of P7.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
1. Replicate the bits of field pair p6 and p7 as a frame picture P7.
2. Repeat frame P7 by using pic_struct=7 in picture timing SEI of P7.
3. Replicate the bits of bottom field i8 as a top field i8.
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
1. DTS(i10)=DTS(b7)+3*Tfield.
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
1. Replicate the bits of bottom field p9 as a non-paired top field p9.
2. DTS(p9)=DTS(b7)+Tfield.
3. DTS(i10)=DTS(b7)+4*Tfield.
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent—
Splicer action: None.
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
1. Replicate the bits of bottom field p9 as a non-paired top field p9.
2. DTS(p9)=DTS(b7)+Tfield.
3. DTS(i10)=DTS(b7)+Tfield.
CPB and Presentation Timeline Changes Due to Splice in Followed by Splice Out—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent:
Total Presentation Timeline Change=0.
Total CPB Change=0.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
3. Hierarchy original=0(1), Hierarchy spliced=1(0), Parity=Consistent:
Total Presentation Timeline Change=+4+0=+4.
Total CPB Change=+2+2=+4.
4. Hierarchy original=0(1), Hierarchy spliced=1(0), Parity=Inconsistent:
Total Presentation Timeline Change=+5+1=+6.
Total CPB Change=+3+3=+6.
5. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent:
Total Presentation Timeline Change=0.
Total CPB Change=0.
6. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
Frame Sequence Followed by Spliced Complementary Field Sequence—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent—
Splicer action: None.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent—
Splicer action:
1. Repeat P3 by using pic_struct=7 in picture timing SEI of P3.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
1. Repeat P3 by using pic_struct=7 in picture timing SEI of P3.
2. Replicate the bits of bottom field i4 as a non-paired top field i4.
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
1. DTS(i4)=DTS(B3)+2*Tframe.
This can alternatively be spliced as:
Splicer action:
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
This can alternatively be spliced as:
Splicer action:
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent—
Splicer action: None.
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
Complementary Field Sequence Followed by Spliced Frame Sequence—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent—
Splicer action: None.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent—
Splicer action:
1. Repeat frame I8 by using pic_struct=7 in picture timing SEI of I8.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
1. Replicate the bits of bottom field p7 as a non-paired top field p7.
2. Repeat frame I8 by using pic_struct=7 in picture timing SEI of I8.
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
1. DTS(I10)=DTS(b7)+3*Tfield.
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent—
Splicer action:
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent—
Splicer action: None.
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent—
Splicer action:
CPB and Presentation Timeline Changes Due to Splice in Followed by Splice Out—
1. Hierarchy original=0, Hierarchy spliced=0, Parity=Consistent:
Total Presentation Timeline Change=+0+0=0.
Total CPB Change=+0+0=0.
2. Hierarchy original=0, Hierarchy spliced=0, Parity=Inconsistent:
Total Presentation Timeline Change=+1+1=+2.
Total CPB Change=+1+1=+2.
3. Hierarchy original=0, Hierarchy spliced=1, Parity=Consistent:
Total Presentation Timeline Change=+2+0=+2.
Total CPB Change=+0+2=+2.
4. Hierarchy original=0, Hierarchy spliced=1, Parity=Inconsistent:
Total Presentation Timeline Change=+3+1=+4.
Total CPB Change=+1+3=+4.
5. Hierarchy original=1, Hierarchy spliced=0, Parity=Consistent:
Total Presentation Timeline Change=+0(−2)+2=+2(+0).
Total CPB Change=+2(−2)+0=+2(−2).
6. Hierarchy original=1, Hierarchy spliced=0, Parity=Inconsistent:
Total Presentation Timeline Change=+1(−1)+3=+4(+2).
Total CPB Change=+3(−1)+1=+4(+0).
7. Hierarchy original=1, Hierarchy spliced=1, Parity=Consistent:
Total Presentation Timeline Change=+0+0=0.
Total CPB Change=+0+0=0.
8. Hierarchy original=1, Hierarchy spliced=1, Parity=Inconsistent:
Total Presentation Timeline Change=+1+3=+4.
Total CPB Change=+1+3=+4.
Pull Down—
When film sequences coded for display at 24 frames per second (fps) are displayed as interlaced video at 30 fps, a 2-3 pull down method is used, as shown in the exemplary embodiment of
Here, each film frame can be displayed as a top (T) or bottom (B) field of interlaced video. Furthermore, each film frame can be displayed as two or three fields in various field combinations TB, BT, TBT, or BTB. The field parity for display is stored in the pic_struct field of the picture timing SEI of each film frame. These are:
1. pic_struct=3 is TB.
2. pic_struct=4 is BT.
3. pic_struct=5 is TBT.
4. pic_struct=6 is BTB.
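The pic_struct values listed above can be used to generate a 2-3 pull down cadence in which field parity keeps alternating from one film frame to the next. The following is a minimal sketch; the function name, and the choice to begin with a three-field frame starting on a top field, are assumptions.

```python
# pic_struct values and the fields they display, as listed above.
PIC_STRUCT_FIELDS = {3: "TB", 4: "BT", 5: "TBT", 6: "BTB"}
FIELDS_TO_PIC_STRUCT = {v: k for k, v in PIC_STRUCT_FIELDS.items()}


def pulldown_pic_structs(num_frames, first_parity="T"):
    """Assign pic_struct per film frame, alternating 3 and 2 fields."""
    result = []
    parity = first_parity
    for index in range(num_frames):
        count = 3 if index % 2 == 0 else 2  # assumed to start with 3 fields
        fields = ""
        for _ in range(count):
            fields += parity
            parity = "B" if parity == "T" else "T"
        result.append(FIELDS_TO_PIC_STRUCT[fields])
    return result


cadence = pulldown_pic_structs(4)
field_count = sum(len(PIC_STRUCT_FIELDS[p]) for p in cadence)
```

Four film frames map to ten fields, i.e., five interlaced frames, which matches the 24 fps to 30 fps ratio.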
The splicing process also considers the field parity consistency. Four specific cases are discussed below:
1. Original=2-3 Pull down, Spliced=2-3 Pull down, Parity=Consistent
Splicer action: None.
2. Original=2-3 Pull down, Spliced=2-3 Pull down, Parity=Inconsistent—
Splicer action:
Delete the last bottom field of P3 by changing the pic_struct to 4 from 6.
3. Original=2-3 Pull down, Spliced=Normal Video, Parity=Consistent
Splicer action: None.
4. Original=2-3 Pull down, Spliced=Normal Video, Parity=Inconsistent—
Splicer action:
1. Delete the last bottom field of P3 by changing the pic_struct to 4 from 6.
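The field deletion used in the inconsistent-parity cases above amounts to rewriting pic_struct so that the picture displays one field fewer. As a hedged sketch, the 6 to 4 mapping comes from the description, while the 5 to 3 mapping and the function name are assumptions added by symmetry.

```python
# pic_struct rewrite that drops the last displayed field of a picture.
DROP_LAST_FIELD = {
    6: 4,  # BTB -> BT, as in the cases above
    5: 3,  # TBT -> TB, assumed by symmetry
}


def drop_last_field(pic_struct):
    """Return the pic_struct that displays one field fewer."""
    if pic_struct not in DROP_LAST_FIELD:
        raise ValueError(f"pic_struct {pic_struct} has no third field to drop")
    return DROP_LAST_FIELD[pic_struct]


new_pic_struct = drop_last_field(6)
```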
Two Layer B-Hierarchy—
All discussions presented above can be extended to higher layers of hierarchy of B pictures, such as a two-layer hierarchy.
In Table 87 above, the anchor or reference or stored B pictures are shown in italics. A few exemplary splicing cases are now considered for purposes of illustration.
1. Hierarchy original=2, Hierarchy spliced=1, Parity=Consistent—
Splicer action:
1. DTS(P9)=DTS(B7)+Tframe.
2. Hierarchy original=2, Hierarchy spliced=0, Parity=Consistent—
Splicer action:
1. DTS(P9)=DTS(B7)+3*Tframe.
Alternatively, this can be spliced as follows:
Splicer action:
3. Hierarchy original=1, Hierarchy spliced=2, Parity=Consistent—
Splicer action:
1. Repeat P4 by using pic_struct=7 in picture timing SEI of P4.
4. Hierarchy original=0, Hierarchy spliced=2, Parity=Consistent—
Splicer action:
1. Repeat P3 twice by using pic_struct=8 in picture timing SEI of P3.
In one exemplary software implementation, the present invention may be implemented as a computer program that is stored on a computer useable medium, such as a memory card, a digital versatile disk (DVD), a compact disc (CD) and the like, that includes a computer readable program which when loaded on a computer implements the methods of the present invention.
It will be recognized by those skilled in the art that the invention described herein can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In an exemplary embodiment, the invention may be implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
In this case, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It should be understood, of course, that the foregoing relates to exemplary embodiments of the invention and that modifications may be made without departing from the spirit and scope of the invention as set forth in the following claims.
Consider the sequence below in display and coding order for the definitions below:
pic_struct indicates whether a picture should be displayed as a frame or one or more fields, according to Table III-1. Frame doubling (pic_struct equal to 7) indicates that the frame should be displayed two times consecutively, and frame tripling (pic_struct equal to 8) indicates that the frame should be displayed three times consecutively.
NOTE—Frame doubling can facilitate the display, for example, of 25 p video on a 50 p display and 29.97 p video on a 59.94 p display. Using frame doubling and frame tripling in combination on every other frame can facilitate the display of 23.98 p video on a 59.94 p display.
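The rates quoted in the note above can be checked with a short calculation. The sketch below assumes that a picture with pic_struct equal to 0 is displayed once; the `output_rate` helper is a hypothetical name.

```python
from fractions import Fraction

# Number of times a frame is displayed for each pic_struct value.
DISPLAY_COUNT = {0: 1, 7: 2, 8: 3}


def output_rate(source_fps, pic_structs):
    """Return the display rate when pic_structs repeats over the sequence."""
    shown = sum(DISPLAY_COUNT[p] for p in pic_structs)
    return source_fps * Fraction(shown, len(pic_structs))


doubled_25p = output_rate(Fraction(25), [7])                # 25p on a 50p display
film_rate = output_rate(Fraction(24000, 1001), [7, 8])      # alternate 2x and 3x
```

Alternating doubling and tripling displays each frame 2.5 times on average, so 24000/1001 frames per second becomes 60000/1001, i.e., 59.94 p.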
NumClockTS is determined by pic_struct as specified in Table III-1. There are up to NumClockTS sets of clock timestamp information for a picture, as specified by clock_timestamp_flag[i] for each set. The sets of clock timestamp information apply to the field(s) or the frame(s) associated with the picture by pic_struct.
The contents of the clock timestamp syntax elements indicate a time of origin, capture, or alternative ideal display. This indicated time is computed as
clockTimestamp=((hH*60+mM)*60+sS)*time_scale+nFrames*(num_units_in_tick*(1+nuit_field_based_flag))+tOffset, (III-1)
in units of clock ticks of a clock with clock frequency equal to time_scale Hz, relative to some unspecified point in time for which clockTimestamp is equal to 0. Output order and DPB output timing are not affected by the value of clockTimestamp. When two or more frames with pic_struct equal to 0 are consecutive in output order and have equal values of clockTimestamp, the indication is that the frames represent the same content and that the last such frame in output order is the preferred representation.
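Equation (III-1) may be transcribed directly into code, as in the sketch below. The parameter names mirror the symbols in the equation; the sample values (time_scale of 60000 and num_units_in_tick of 1001) are assumptions chosen to correspond to 59.94 fields per second.

```python
def clock_timestamp(hh, mm, ss, n_frames, t_offset,
                    time_scale, num_units_in_tick, nuit_field_based_flag):
    """Evaluate equation (III-1); the result is in ticks of a time_scale Hz clock."""
    seconds = (hh * 60 + mm) * 60 + ss
    return (seconds * time_scale
            + n_frames * (num_units_in_tick * (1 + nuit_field_based_flag))
            + t_offset)


# One second and one frame into the sequence, with no offset.
ticks = clock_timestamp(hh=0, mm=0, ss=1, n_frames=1, t_offset=0,
                        time_scale=60000, num_units_in_tick=1001,
                        nuit_field_based_flag=1)
```

With nuit_field_based_flag equal to 1, each count of nFrames advances the timestamp by two ticks of num_units_in_tick, i.e., one frame made of two fields.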
NOTE—clockTimestamp time indications may aid display on devices with refresh rates other than those well-matched to DPB output times.
This application claims priority to co-owned and co-pending U.S. provisional patent application Ser. No. 61/199,292 filed Nov. 14, 2008 entitled “Method and Apparatus for Splicing B Pictures in a Compressed Video Bitstream”, which is incorporated herein by reference in its entirety. This application is related to co-owned and co-pending U.S. patent application Ser. No. 12/322,887 filed Feb. 9, 2009 and entitled “Method and Apparatus for Transrating Compressed Digital Video”, U.S. patent application Ser. No. 12/604,766 filed Oct. 23, 2009 and entitled “Method and Apparatus for Transrating Compressed Digital Video”, U.S. patent application Ser. No. 12/396,393 filed Mar. 2, 2009 and entitled “Method and Apparatus for Video Processing Using Macroblock Mode Refinement”, U.S. patent application Ser. No. 12/604,859 filed Oct. 23, 2009 and entitled “Method and Apparatus for Video Processing Using Macroblock Mode Refinement”, and U.S. patent application Ser. No. 12/582,640 filed Oct. 20, 2009 and entitled “Rounding and Clipping Methods and Apparatus for Video Processing”, the contents of each of the foregoing incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61199292 | Nov 2008 | US