The present disclosure relates generally to the field of video data transmissions over digital networks.
The ever-increasing demand for multimedia content on end-user devices, combined with the limited bandwidth available to deliver that content, has led to the development of very efficient and highly robust video coding algorithms. For example, the H.264/AVC (Advanced Video Coding) digital video coding standard, written by the International Telecommunication Union (ITU) Video Coding Experts Group (VCEG) together with the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG), is widely known for its ability to provide high quality video in error-prone environments. The emerging scalable extension of H.264/AVC, known as H.264/SVC (Scalable Video Coding), defines a scalable video bitstream that contains a non-scalable base layer and one or more enhancement layers.
The H.264 standard contains a feature called Flexible Macroblock Ordering (FMO) that allows multiple “distinct slice groups” to be created in an H.264 picture in such a way that no macroblock (a block of 16×16 pixels) is surrounded by any other macroblock from the same slice group. Inside a slice group, all macroblocks are ordered in raster scan order. In effect, each slice group behaves as its own small, independent picture. For instance, one slice group can be intra-coded while a neighboring group can be predictively coded from the corresponding slice group in a reference picture. The FMO feature makes it possible to take independent contributing streams from multiple sources and combine them into one stream of composited pictures, for example, for video conferencing.
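By way of illustration only, the following sketch shows how an explicit macroblock-to-slice-group map (FMO map type 6) might describe such a composition of rectangular regions; the helper name and layout are hypothetical and are not taken from the H.264 reference software.

```python
# Sketch: build an explicit macroblock-to-slice-group map (FMO map type 6)
# for a composite picture assembled from rectangular regions.
# Illustrative only; this is not part of any H.264 implementation.

def build_slice_group_map(pic_width_mbs, pic_height_mbs, regions):
    """regions: list of (slice_group_id, x_mb, y_mb, width_mbs, height_mbs)."""
    # Default every macroblock to slice group 0 (e.g., a pad slice group).
    mb_map = [0] * (pic_width_mbs * pic_height_mbs)
    for group_id, x, y, w, h in regions:
        for row in range(y, y + h):
            for col in range(x, x + w):
                mb_map[row * pic_width_mbs + col] = group_id
    return mb_map

# Example: a 22x18-macroblock (352x288, CIF) picture divided into four
# quadrant slice groups, each 11x9 macroblocks (176x144, QCIF).
mb_map = build_slice_group_map(22, 18, [
    (0, 0, 0, 11, 9), (1, 11, 0, 11, 9),
    (2, 0, 9, 11, 9), (3, 11, 9, 11, 9),
])
```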
The present invention will be understood more fully from the detailed description that follows and from the accompanying drawings, which, however, should not be taken to limit the invention to the specific embodiments shown, but are for explanation and understanding only.
In the following description specific details are set forth, such as device types, system configurations, protocols, applications, methods, etc., in order to provide a thorough understanding of the disclosure herein. However, persons having ordinary skill in the relevant arts will appreciate that these specific details may not be needed to practice the embodiments described.
In the context of the present application, a computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as intermediate nodes and end nodes (also referred to as endpoints). A local area network (LAN) is an example of such a subnetwork; a plurality of LANs may be further interconnected by an intermediate network node, such as a router, bridge, or switch, to extend the effective “size” of the computer network and increase the number of communicating nodes. Examples of the devices or nodes include servers, mixers, control units, and personal computers. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined protocols.
A video receiver device represents any equipment, node, terminal, or other device capable of receiving, decoding, or rendering a digital video image. Examples of video receiver devices include a video appliance (e.g., a video monitor), a personal digital assistant (PDA), a personal computer (PC) such as a notebook, laptop, or desktop computer, a television device, a set-top box (STB), a cellular phone, a video phone, or any other device, component, element, or object capable of receiving, decoding, or rendering digital video images.
A video stream, such as a contributing stream or source stream, is a sequence of pictures encoded in accordance with a video coding specification, such as ITU H.264. A video coding specification may allow a video source to be encoded without incremental enhancement layers corresponding to SVC. An H.264/SVC stream herein refers to a video stream that contains at least one enhancement layer in addition to a base layer of coded video.
A video compositor (“compositor” for short) is any device capable of receiving two or more digital video input streams and combining them into a single digital video output stream while performing minimal or no decoding or re-encoding operations below the slice layer on any of the constituent streams.
Arbitrary Slice Ordering (ASO) is a technique for restructuring the ordering of macroblocks in pictures; it obviates the need to wait for a full set of pictures to arrive from all sources.
Overview
In one embodiment, a video compositor receives multiple video source streams. The source streams may comprise H.264/AVC streams, H.264/SVC streams, or a combination of H.264/AVC and H.264/SVC streams. Other types of streams compatible with the H.264 standard, such as future enhancements or extensions to the existing definitions, may also be received as inputs to the video compositor. The compositor first processes the existing H.264 headers from each source stream to provide proper information in the produced combined stream. Headers are removed from the source streams as necessary, but some of the parsed or interpreted information is retained, or modified as necessary, to effect the composited pictures in the combined stream. Processed header information may include slice headers, picture parameter sets, sequence parameter sets, and Network Abstraction Layer (NAL) headers.
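For context, the NAL unit header in H.264 is a single byte carrying a forbidden_zero_bit, a two-bit nal_ref_idc, and a five-bit nal_unit_type. The sketch below is illustrative only; it merely shows how incoming NAL units might be classified before deciding which headers to strip, retain, or rewrite.

```python
# Sketch: classify H.264 NAL units by type so a compositor can decide which
# headers (SPS, PPS, slice headers) to strip, retain, or rewrite.
# Field layout follows the H.264 NAL unit header syntax; the rest is illustrative.

NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}

def parse_nal_header(nal_bytes):
    first = nal_bytes[0]
    forbidden_zero_bit = (first >> 7) & 0x1   # must be 0 in a valid stream
    nal_ref_idc = (first >> 5) & 0x3          # 0 means not used for reference
    nal_unit_type = first & 0x1F              # 5-bit NAL unit type
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

def classify(nal_bytes):
    _, ref_idc, nal_type = parse_nal_header(nal_bytes)
    return NAL_TYPE_NAMES.get(nal_type, "other"), ref_idc
```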
The encoded macroblocks from each of the streams are preserved and are added to a picture with a picture parameter set (PPS) that identifies them as part of a larger composite picture containing multiple slice groups. Since each contributing stream is a sequence of pictures, the compositor produces a sequence of composited pictures, each composited picture corresponding to one of a series of sequential time intervals. Composition is performed in a manner that enables a visually synchronized presentation of the combined stream when decoded and displayed. The sequential progression of compositing pictures from the contributing streams may be governed by pictures having a transmission or presentation time within the time interval corresponding to a given composition operation. The time associated with a picture may be expressed, for example, as a time stamp relative to a reference clock. In one embodiment, only pictures with an associated time within the corresponding time interval are composited.
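A minimal sketch of that time-interval selection, assuming each incoming coded picture carries a timestamp relative to a common reference clock (the data model here is hypothetical), follows.

```python
# Sketch: select, from each contributing stream, the picture whose timestamp
# falls within the current composition interval. Hypothetical data model.

def pictures_for_interval(streams, interval_start, interval_end):
    """streams: mapping of stream_id -> list of (timestamp, coded_picture),
    each list sorted by timestamp. Returns stream_id -> coded_picture for the
    pictures associated with the interval [interval_start, interval_end)."""
    selected = {}
    for stream_id, pictures in streams.items():
        for timestamp, coded_picture in pictures:
            if interval_start <= timestamp < interval_end:
                selected[stream_id] = coded_picture
                break  # one picture per stream per composited picture
    return selected
```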
Using FMO and ASO techniques, the incoming pictures can be combined in a time-consistent manner to produce streams of composited pictures. ASO allows the respective portions of a composited picture to be transmitted or decoded in a particular order, e.g., raster scan order. Hence, the individual source streams do not need to be delayed to wait for pictures from the other sources to arrive. Note that all composition operations occur in the coded domain. By performing only limited modification of the streams using FMO and ASO techniques, the latency introduced by the compositor is kept acceptably low. In one embodiment, the raster scan order of the contributing streams in the composited picture is maintained with an added latency that does not exceed the latency of a typical video switch (e.g., less than 20 milliseconds).
In accordance with one embodiment, any number of two or more source video streams may be composited together. Additionally, source streams having varying resolutions, aspect ratios, etc., may be composited together to form a single, larger picture. In a specific embodiment, the compositor further combines contributing streams of different profiles into a single profile.
In another embodiment, the maximum number of contributing streams for composition is set to a predetermined threshold. If the number of contributing streams exceeds the threshold, certain contributing streams with the same picture resolution and/or AVC characteristics are adjoined side-by-side, or on top of each other, as a single slice group.
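One hypothetical way to express such a policy, assuming each contributing stream advertises its picture resolution (the threshold value and helper below are illustrative only):

```python
# Sketch: when the number of contributing streams exceeds a threshold, adjoin
# streams of the same resolution two at a time so each pair shares one slice
# group. Purely illustrative policy code.
from collections import defaultdict

MAX_SLICE_GROUPS = 8  # hypothetical threshold

def assign_slice_groups(streams):
    """streams: list of (stream_id, (width, height)). Returns a list of
    slice-group assignments, each a list of stream_ids adjoined together."""
    if len(streams) <= MAX_SLICE_GROUPS:
        return [[stream_id] for stream_id, _ in streams]
    by_resolution = defaultdict(list)
    for stream_id, resolution in streams:
        by_resolution[resolution].append(stream_id)
    groups = []
    for ids in by_resolution.values():
        # Adjoin same-resolution streams two at a time (side by side or
        # stacked) so that each pair occupies a single slice group.
        for i in range(0, len(ids), 2):
            groups.append(ids[i:i + 2])
    return groups
```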
It is appreciated that in the embodiment of
The use of pad slice groups is illustrated in
Depending on the sophistication level of receiver 17, composite picture 20 may be rendered as shown in
By way of further example, five video sources consisting of one Common Intermediate Format (CIF) image (352×288 pixels) and four Quarter CIF (QCIF) images (each 176×144 pixels) may be pieced together by the compositor of
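In macroblock units, one possible arrangement (not necessarily the one shown in the figure) works out as follows: CIF is 22×18 macroblocks and QCIF is 11×9 macroblocks, so the CIF image and a 2×2 grid of the four QCIF images fit exactly into a 44×18-macroblock (704×288) composite, and a larger output display format would be filled with pad slice groups.

```python
# Worked layout (one possible arrangement): a CIF source next to a 2x2 grid of
# QCIF sources, expressed in 16x16 macroblock units. Illustrative only.
MB = 16
CIF_W, CIF_H = 352 // MB, 288 // MB     # 22 x 18 macroblocks
QCIF_W, QCIF_H = 176 // MB, 144 // MB   # 11 x 9 macroblocks

# (slice_group_id, x_mb, y_mb, width_mbs, height_mbs)
layout = [
    (1, 0, 0, CIF_W, CIF_H),                     # CIF source on the left
    (2, CIF_W, 0, QCIF_W, QCIF_H),               # QCIF, upper middle
    (3, CIF_W + QCIF_W, 0, QCIF_W, QCIF_H),      # QCIF, upper right
    (4, CIF_W, QCIF_H, QCIF_W, QCIF_H),          # QCIF, lower middle
    (5, CIF_W + QCIF_W, QCIF_H, QCIF_W, QCIF_H), # QCIF, lower right
]

composite_w_mbs = max(x + w for _, x, _, w, _ in layout)   # 44 MBs -> 704 px
composite_h_mbs = max(y + h for _, _, y, _, h in layout)   # 18 MBs -> 288 px
assert (composite_w_mbs * MB, composite_h_mbs * MB) == (704, 288)
# A larger output display format (e.g., 704x480) would be covered by adding
# pad slice groups below or beside the active regions.
```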
In certain embodiments, in addition to encoding the pad slice groups with a simple, constant image, the pad slice groups may be keyed with a distinct pattern that readily identifies them as pad areas. This keying may include a small number of PCM-encoded macroblocks that have a unique pattern indicating that they comprise padding for the source pictures. A simple decoder could render these pad slice groups with only minor defects to the pad image. An enhanced or more sophisticated decoder could detect these keyed slice groups and recognize that all other slice groups were individually composed images. This would allow the decoder to scale and render each of the composed slice groups independently, producing an enhanced rendering.
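By way of a purely hypothetical illustration (H.264 defines only how I_PCM macroblocks carry raw samples, not any particular key pattern), a decoder-side check for keyed pad macroblocks might look like this:

```python
# Sketch: detect pad slice groups keyed with a known I_PCM sample pattern.
# The key pattern and the macroblock object model are hypothetical.

PAD_KEY_LUMA = bytes([0x80, 0x10] * 128)  # hypothetical 256-byte luma key

def is_keyed_pad_macroblock(mb):
    """mb: object with mb.is_pcm (bool) and mb.luma_samples (256 raw bytes)."""
    return mb.is_pcm and bytes(mb.luma_samples) == PAD_KEY_LUMA

def slice_group_is_padding(macroblocks, min_keyed=1):
    """Treat a slice group as padding if it contains keyed PCM macroblocks."""
    keyed = sum(1 for mb in macroblocks if is_keyed_pad_macroblock(mb))
    return keyed >= min_keyed
```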
It should be understood that the pad slice groups in a composite image may be produced using FMO in various rectilinear shapes, not limited to rectangles. Practitioners in the art will appreciate that FMO provides multiple different map types (patterns) that may be utilized to mark macroblock areas of interest.
To achieve that result, the compositor first removes the existing H.264 header information from each of the incoming streams (block 54). Once the incoming H.264 headers have been stripped, the compositor converts the macroblocks from the source streams into slice groups (block 55). Each of the source video images may comprise a single slice; however, as the separate source images are composed into a larger picture, pad slice groups may need to be added to create a rectangular composite picture size compatible with a certain output display format. The slice groups are then reordered into a single composite output stream with an appropriate slice group header using FMO and ASO techniques (block 56). As discussed previously, an H.264/AVC compliant decoder receiving the output stream may simply render the composite picture as a large rectangle with the given padding area arrangement. More advanced decoders may have the ability to recognize the padding information contained in the output stream, select the active video portions, and either display them separately or arrange them into a different display format.
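Read as a rough sketch, blocks 54 through 56 could be organized as below; the data types are deliberately simplified stand-ins, and nothing here parses an actual H.264 bitstream.

```python
# Sketch of the block 54-56 flow with simplified stand-ins for the
# coded-domain objects. Illustrative only.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class SourcePicture:          # stand-in for one coded source picture
    headers: dict             # parsed SPS/PPS/slice-header fields
    macroblock_data: bytes    # coded macroblocks, left untouched

@dataclass
class SliceGroup:
    group_id: int
    position_mbs: Tuple[int, int]   # (x, y) in macroblock units
    macroblock_data: bytes
    retained_headers: dict

def composite(sources: Dict[int, SourcePicture],
              layout: Dict[int, Tuple[int, int]]) -> List[SliceGroup]:
    groups = []
    for group_id, (stream_id, picture) in enumerate(sources.items(), start=1):
        # Block 54: strip the source headers; keep only the parsed fields that
        # are still needed to describe the slice group in the composite.
        retained = {k: picture.headers[k] for k in ("slice_type",)
                    if k in picture.headers}
        # Block 55: wrap the untouched macroblocks as a slice group placed at
        # this stream's coordinates in the composite layout.
        groups.append(SliceGroup(group_id, layout[stream_id],
                                 picture.macroblock_data, retained))
    # Block 56: emit the slice groups in the order they were produced; ASO
    # permits this even if it is not the composite's raster-scan order, and
    # the new SPS/PPS would carry the FMO map for the composite picture.
    return groups
```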
In another embodiment, the video compositor that formats the incoming streams into slice groups creates a picture parameter set with new (e.g., proprietary) fields that attach scaling parameters to each slice group. These scaling parameters may instruct the decoder to render the slice groups, irrespective of their current width and height, as an image that is N macroblocks wide by M macroblocks high, where N and M are integers. The scaling parameters may also indicate to the decoder whether a picture should be stretched, cropped, filled, or some combination thereof, to achieve a new scaled format. The slice groups are then placed at specific coordinates using FMO and ASO techniques. The slice groups, however, do not have to add up to an image that forms a rectangle. Instead, the scaling parameters would ensure that the decoder scales the individual slice groups to a rectangle.
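Because these fields are described as new (e.g., proprietary), the following representation is purely hypothetical; it simply illustrates the kind of per-slice-group scaling record a compositor might attach and a decoder might consume.

```python
# Sketch: a hypothetical per-slice-group scaling record carried alongside the
# picture parameter set. None of these fields exist in standard H.264.
from dataclasses import dataclass

@dataclass
class SliceGroupScaling:
    slice_group_id: int
    target_width_mbs: int    # render as N macroblocks wide ...
    target_height_mbs: int   # ... by M macroblocks high
    mode: str                # "stretch", "crop", "fill", or a combination

def render_size_pixels(scaling: SliceGroupScaling, mb_size: int = 16):
    """Target rendering size implied by the scaling record, in pixels."""
    return (scaling.target_width_mbs * mb_size,
            scaling.target_height_mbs * mb_size)

# Example: ask the decoder to render slice group 3 as 20x15 macroblocks
# (320x240 pixels), cropping as needed, regardless of its coded size.
example = SliceGroupScaling(slice_group_id=3, target_width_mbs=20,
                            target_height_mbs=15, mode="crop")
assert render_size_pixels(example) == (320, 240)
```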
It should be understood that elements of the present invention may also be provided as a computer program product which may include a machine-readable medium having stored thereon instructions which may be used to program a computer (e.g., a processor or other electronic device) to perform a sequence of operations. Alternatively, the operations may be performed by a combination of hardware and software. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media, or other types of media/machine-readable medium suitable for storing electronic instructions. For example, elements of the present invention may be downloaded as a computer program product, wherein the program may be transferred from a remote computer or telephonic device to a requesting process by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
Additionally, although the present invention has been described in conjunction with specific embodiments, numerous modifications and alterations are well within the scope of the present invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.