The present invention relates generally to images. More particularly, an embodiment of the present invention relates to the delivery of a dual-layer stream that combines a backwards-compatible interlaced video stream layer with an enhancement layer to reconstruct full-resolution progressive video.
Video broadcasting standards for digital television, such as the ATSC (Advanced Television Systems Committee) specification in the United States and the family of DVB (Digital Video Broadcasting) international standards, allow broadcasters to transmit digital content in a variety of resolutions and formats, such as 480p (e.g., 720×480 at 60 frames per second), 1080i (e.g., 1920×1080 at 60 fields per second), or 720p (1280×720 at 60 frames per second). Typically, a broadcasting station will allocate one or more channels for a particular broadcast, where each channel utilizes a single transmission format. For example, a sports station may broadcast a football game in 720p in one channel and in 480p in another channel. Broadcasting stations may prefer to use progressive transmission mode (e.g., 720p) for sports or movies, and interlaced transmission (e.g., 1080i) for regular programming (e.g., news and daytime TV series).
As more and more consumers invest in 1080p HDTVs, broadcasters are increasingly interested in offering premium programming, such as movies and special sports broadcasts, using 1080p (e.g., 1920×1080 at 60 or 50 frames per second) transmission formats. Since many older receivers and TV sets may not be able to decode 1080p broadcasts, to accommodate backwards compatibility with those sets, broadcasters could transmit both a 720p or 1080i stream and a 1080p stream on two separate channels; however, such solutions require at least twice the bandwidth of a traditional 720p or 1080i high definition (HD) broadcast.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Dual-layer, backwards compatible delivery of progressive video is described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
Overview
Example embodiments described herein relate to dual-layer, backwards compatible, delivery of high frame rate progressive video. In an embodiment, given an input progressive sequence, a video encoder creates a coded dual-layer stream that combines a backwards-compatible interlaced video stream layer with a full-resolution progressive video stream layer. Given two consecutive frames in the input progressive sequence, vertical processing generates a top field-bottom field (TFBF) frame in a base layer (BL) TFBF sequence, and horizontal processing generates a side-by-side (SBS) frame in an enhancement layer (EL) SBS video sequence. The BL TFBF and the EL SBS sequences are compressed together to create a coded, backwards compatible output stream.
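By way of illustration only, the frame packing described above may be sketched as follows. The sketch assumes that a TFBF frame interleaves the even lines of one progressive frame with the odd lines of the next, and that an SBS frame places horizontally decimated versions of the two frames next to each other. The low-pass filtering that would normally precede sub-sampling is omitted for brevity, and the names `make_tfbf` and `make_sbs` are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows of luma samples


def make_tfbf(frame_a: Frame, frame_b: Frame) -> Frame:
    """Interleave even rows of frame_a (top field) with odd rows of frame_b (bottom field)."""
    return [
        list(frame_a[y]) if y % 2 == 0 else list(frame_b[y])
        for y in range(len(frame_a))
    ]


def make_sbs(frame_a: Frame, frame_b: Frame) -> Frame:
    """Keep every other column of each frame; frame_a goes left, frame_b goes right."""
    return [row_a[0::2] + row_b[0::2] for row_a, row_b in zip(frame_a, frame_b)]


# Two tiny 4x4 stand-ins for consecutive progressive frames (assumed data).
frame_n = [[10 * y + x for x in range(4)] for y in range(4)]
frame_n1 = [[100 + 10 * y + x for x in range(4)] for y in range(4)]

bl_tfbf = make_tfbf(frame_n, frame_n1)  # base layer frame
el_sbs = make_sbs(frame_n, frame_n1)    # enhancement layer frame

for row in bl_tfbf:
    print(row)
for row in el_sbs:
    print(row)
```

Under these assumptions, each pair of input frames yields one BL frame and one EL frame of the same dimensions as a single input frame.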
In another embodiment, a decoder accesses a coded base layer (BL) top field-bottom field (TFBF) stream and a coded enhancement layer (EL) side-by-side (SBS) stream, where the coded BL TFBF stream was generated by an encoder based on vertical processing of at least two consecutive frames of a progressive video sequence, and the coded EL SBS stream was generated by an encoder based on horizontal processing of the at least two consecutive frames of the progressive video sequence. The coded BL TFBF stream may be decoded to generate an interlaced output sequence. The coded EL SBS stream may be decoded to generate a decoded SBS sequence. The decoded BL TFBF and EL SBS sequences may be demultiplexed to generate a progressive output sequence.
In an embodiment, given an input progressive sequence, a video encoder creates a dual-layer stream that combines a backwards-compatible interlaced video stream layer with a residual video stream layer. Given two consecutive frames in the input progressive sequence, vertical processing generates a top field-bottom field (TFBF) frame in a base layer (BL) TFBF sequence. Horizontal processing is applied to residuals between the input progressive sequence and up-sampled frames based on the BL TFBF sequence to generate an enhancement layer (EL) SBS residual sequence. The BL TFBF video sequence and the EL SBS residual sequence are compressed together to create a coded, backwards compatible output stream.
Example Residual-Free Dual-Layer Video Delivery System
In digital HDTV (High Definition Television) broadcasting, 720p and 1080i are two of the most popular transmission formats. Progressive transmission (e.g., 720p) is typically preferred for fast-action sequences, such as sports, since it offers higher vertical resolution than equivalent interlaced content (e.g., 1080i). On the other hand, interlaced transmission (e.g., 1080i) offers higher horizontal resolution and may require less transmission bandwidth than progressive video content transmitted at the same video quality.
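As a rough numerical aid, the following sketch compares the formats mentioned above in terms of lines per transmitted picture and uncompressed luma samples per second. These raw figures are not coded bit rates, which depend on the encoder and the content; the table of formats and the function names are assumptions made for illustration.

```python
from typing import Dict, Tuple

# name -> (width, height, pictures per second, interlaced?)
FORMATS: Dict[str, Tuple[int, int, int, bool]] = {
    "720p60": (1280, 720, 60, False),
    "1080i60": (1920, 1080, 60, True),   # 60 fields per second
    "1080p60": (1920, 1080, 60, False),
}


def lines_per_picture(height: int, interlaced: bool) -> int:
    """An interlaced field carries half of the frame lines."""
    return height // 2 if interlaced else height


def luma_samples_per_second(width: int, height: int, rate: int, interlaced: bool) -> int:
    """Uncompressed luma sample rate for the given format."""
    return width * lines_per_picture(height, interlaced) * rate


for name, (w, h, r, i) in FORMATS.items():
    print(name, w, lines_per_picture(h, i), luma_samples_per_second(w, h, r, i))
```

In this accounting, each 1080i field carries 540 lines of 1920 samples, each 720p frame carries 720 lines of 1280 samples, and 1080p60 carries twice the raw samples of 1080i60.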
In recent years, due to dramatic price decreases, more and more consumers have adopted 1080p HDTV sets. As 1080p HDTVs proliferate, there is an increased interest from broadcasters to offer premium programming, such as movies and sports broadcasts, using 1080p transmission formats. Since older TV sets and set-top boxes may not support 1080p decoding, one way to accommodate backwards compatibility is to transmit simultaneously, on separate channels, both a 1080p stream and a legacy stream; however, such a solution requires at least twice the bandwidth of a traditional high definition (HD) broadcast.
In an alternative implementation, one could apply hierarchical or layered coding methods. In such methods, a base layer (BL) carries a first stream in a first format (e.g., 720p or 1080i) and an enhancement layer (EL) carries a residual signal representing the difference between a second stream in a second format (e.g., 1080p) and a predicted version of the second stream using the base layer. Legacy decoders can decode only the base layer and ignore the enhancement layer; however, newer decoders can use both the base layer and the enhancement layer to reconstruct the second (e.g., 1080p) stream.
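The layered coding principle may be sketched on a one-dimensional signal as follows. The example assumes lossless layers and a predictor that simply repeats each base layer sample; practical systems use lossy codecs and more elaborate predictors, and all names below are hypothetical.

```python
from typing import List


def make_base_layer(signal: List[int]) -> List[int]:
    """Base layer: every other sample of the full-resolution signal."""
    return signal[0::2]


def predict_from_base(base: List[int]) -> List[int]:
    """Predict the full-resolution signal by repeating each base sample (assumed predictor)."""
    return [value for value in base for _ in range(2)]


original = [10, 12, 20, 26, 30, 31]

base = make_base_layer(original)              # what a legacy decoder outputs
prediction = predict_from_base(base)
residual = [o - p for o, p in zip(original, prediction)]  # carried in the EL

# A newer decoder adds the residual back to the prediction.
reconstructed = [p + r for p, r in zip(prediction, residual)]

print(base)
print(residual)
print(reconstructed == original)
```

A legacy decoder would stop at `base`, while a decoder that also reads the enhancement layer recovers the full signal.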
The conversion of a high-resolution, high-frame-rate progressive video (e.g., 1920×1080 at 60 frames per second) to a corresponding interlaced video (e.g., 1920×1080 at 60 fields per second) comprises applying to the progressive input video a single, low-pass, vertical filtering process to reduce aliasing artifacts due to the down-sampling process. Because such filtering is applied only to the base layer, the decoder can never fully restore the lost high frequencies from the original progressive signal, and the reconstructed progressive output may lose sharpness and fine detail. Embodiments of the present invention overcome these limitations by utilizing a combination of vertical and horizontal pre-processing in the base and the enhancement layers, respectively. By preserving high-frequency content in both the horizontal and vertical directions, compared to the prior art, embodiments provide better picture quality in the reconstructed progressive picture.
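The role of the vertical low-pass filter may be illustrated with a single image column whose lines alternate between dark and bright, which is the highest vertical frequency a frame can hold. The three-tap kernel [1, 2, 1]/4 and the edge handling below are assumptions chosen for simplicity, not filters taken from this description.

```python
from typing import List


def lowpass_121(column: List[float]) -> List[float]:
    """Apply a [1, 2, 1]/4 low-pass filter, replicating the edge samples."""
    n = len(column)
    return [
        (column[max(i - 1, 0)] + 2 * column[i] + column[min(i + 1, n - 1)]) / 4
        for i in range(n)
    ]


# One column of an 8-line frame with alternating dark and bright lines.
column = [0.0, 100.0] * 4

# Sub-sampling without filtering: the two fields disagree completely.
unfiltered_top = column[0::2]
unfiltered_bottom = column[1::2]

# Sub-sampling after filtering: both fields sit near the local average.
filtered = lowpass_121(column)
filtered_top = filtered[0::2]
filtered_bottom = filtered[1::2]

print(unfiltered_top, unfiltered_bottom)
print(filtered_top, filtered_bottom)
```

Without the filter, one field is entirely dark and the other entirely bright, which would appear as flicker on an interlaced display. With the filter, the fields agree, but the alternating-line detail is no longer present in the base layer.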
As depicted in
As used herein, the term “metadata” denotes any ancillary information that is embedded or transmitted in parallel with a coded bitstream and assists a decoder to render a decoded image. Such metadata may include, but are not limited to, such data as: color space or gamut information, dynamic range information, tone mapping information, or information related to processing reference frames.
As used herein, the term “vertical processing” denotes image processing operations, such as filtering and down-sampling, that are applied vertically (e.g., from top to bottom or bottom to top) to the pixels of an input frame. Similarly, as used herein, the term “horizontal processing” denotes image processing operations, such as filtering and down-sampling, that are applied horizontally (e.g., from left to right or right to left) to the pixels of an input image. As is well known in the art of image processing, such horizontal or vertical processing may be performed by a variety of means, including 1-D or 2-D filtering kernels and sub-sampling processing.
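The distinction between the two directions may be sketched with a 1-D kernel applied either along rows or along columns, followed by 2:1 sub-sampling in the same direction. The [1, 2, 1]/4 kernel, the edge replication, and the function names are illustrative assumptions.

```python
from typing import List, Sequence

Frame = List[List[float]]


def lowpass(samples: Sequence[float]) -> List[float]:
    """[1, 2, 1]/4 low-pass filter with edge replication (assumed kernel)."""
    n = len(samples)
    return [
        (samples[max(i - 1, 0)] + 2 * samples[i] + samples[min(i + 1, n - 1)]) / 4
        for i in range(n)
    ]


def transpose(frame: Frame) -> Frame:
    return [list(column) for column in zip(*frame)]


def horizontal_process(frame: Frame) -> Frame:
    """Filter each row, then keep every other column."""
    return [lowpass(row)[0::2] for row in frame]


def vertical_process(frame: Frame) -> Frame:
    """Filter each column, then keep every other row."""
    return transpose(horizontal_process(transpose(frame)))


frame = [[float(x + 10 * y) for x in range(4)] for y in range(4)]

h_out = horizontal_process(frame)  # 4 rows x 2 columns
v_out = vertical_process(frame)    # 2 rows x 4 columns

print(len(h_out), len(h_out[0]))
print(len(v_out), len(v_out[0]))
```

Implementing the vertical case as a transposed horizontal case is only a convenience of this sketch; a 2-D kernel could be used instead, as noted above.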
Following low pass filtering 115F, each frame is sub-sampled vertically to create one of the fields in the interlaced output sequence νit 130. For example, as shown in
Given an input sequence of progressive video frames νp 110,
Returning to
The BL encoder may comprise any video compression encoder, such as those based on the MPEG (Motion Pictures Experts Group) (e.g., MPEG-2, MPEG-4, or H.264) or JPEG2000 specifications, or any other video encoders known in the art. Such an encoder may encode (compress) the νit sequence 130 on its own, without any reference to the νps sequence. The EL encoder may also comprise any video compression encoder, such as those based on the MPEG (e.g., MPEG-2, MPEG-4, or H.264) or JPEG2000 specifications, or any other video encoders known in the art. In some embodiments, the EL encoder may be the same type as the BL encoder (e.g., both may be based on the H.264 specification). In other embodiments, the EL and BL encoders may be based on different specifications. For example, the BL encoder may be based on MPEG-2 specification, but the EL encoder may be based on the H.264 specification or a proprietary video encoder. The EL encoder could encode (compress) the νps sequence 150 on its own, without any reference to the νit sequence 130; however, in a more efficient embodiment, each frame in the νps sequence 150 may be encoded by taking into consideration reference frames from both the νps 150 and the νit 130 sequences.
Reference processing unit (RPU) 220 interfaces with both the BL encoder 230 and EL encoder 210. As described in PCT application 13/376,707 “Encoding and decoding architecture for format compatible 3D video delivery,” by A. Tourapis et al., incorporated herein by reference, the RPU 220 may serve as a pre-processing stage that processes information from BL encoder 230, before utilizing this information as a potential predictor for the enhancement layer in EL encoder 210. Information related to the RPU processing may be communicated (e.g., as metadata) to a decoder (e.g., 300) using the RPU stream 225. Some embodiments may not use an RPU unit. Some embodiments may encode the νps 150 and the νit 130 sequences using a multi-view encoder (MVC) as specified by the H.264 coding specification.
Using the coded BL stream 350, BL decoder 320 may decompress (decode) and generate a backwards compatible interlaced νi sequence 380. Using the coded EL stream 310 and information from the RPU unit 340 (e.g., reference frames from the BL stream 350), EL decoder 320 may also generate progressive SBS sequence νps 325. Demultiplexer 370 may combine the TFBF sequence 365 and the SBS sequence 325 to generate progressive sequence νp 375. Legacy decoders may only be able to decode the backwards-compatible νi sequence 380. Advanced decoders may be able to decode either one or both of these sequences.
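The two decoding paths may be sketched as simple unpacking steps applied to already decoded frames. The sketch assumes that a TFBF frame stores the top field on even lines and the bottom field on odd lines, and that an SBS frame stores one half-width picture on the left and the other on the right; the function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def split_tfbf(frame: Frame) -> Tuple[Frame, Frame]:
    """Return (top field, bottom field) from an interleaved frame."""
    return frame[0::2], frame[1::2]


def split_sbs(frame: Frame) -> Tuple[Frame, Frame]:
    """Return (left half, right half) of a side-by-side frame."""
    half = len(frame[0]) // 2
    return [row[:half] for row in frame], [row[half:] for row in frame]


decoded_bl = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
decoded_el = [[5, 5, 6, 6], [5, 5, 6, 6], [5, 5, 6, 6], [5, 5, 6, 6]]

# Legacy path: only the base layer is used, as an ordinary interlaced frame.
top_field, bottom_field = split_tfbf(decoded_bl)

# Advanced path: the enhancement layer halves are also available
# as inputs to the demultiplexer.
left_half, right_half = split_sbs(decoded_el)

print(top_field, bottom_field)
print(left_half[0], right_half[0])
```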
As depicted in
Similarly, EL half-resolution frame 410 is up-sampled horizontally by up-sampler 415 to create full-resolution frame 417. Vertical low pass filter 430B and high-pass filter 430A may be applied to frame 417 to yield corresponding filtered frames 434 and 432. All the filtered frames are combined together by averaging unit 450 and adder 440 to yield a full-resolution progressive frame 445.
Due to the low-pass filtering (e.g., 115F and 135F) being applied to the νp input 110 in the encoder, the demultiplexing process may be simplified according to the embodiments depicted in
In an alternative embodiment, the high-pass filtering operation (540) depicted in
Example Residual-Based Dual-Layer Video Delivery System
At the expense of some additional processing in the encoder, the operation of decoding demultiplexer 370 may be further simplified in an alternative embodiment that combines methods of vertical and horizontal processing described earlier with inter-layer coding and residual coding.
Using the output of RPU 855, or other equivalent processing and storage means, and vertical up-sampler 870, encoding system 800-E may generate two consecutive up-sampled progressive frames 873 and 875. For example, up-sampled νit(n) 873 may be generated by up-sampling the top field of νit(n) 822 and up-sampled νit(n+1) 875 may be generated by up-sampling the bottom field of νit(n) 822. These frames may be subtracted from the original progressive input frames 810-1 and 810-2 to generate two residual frames 833 and 835.
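The residual generation described above may be sketched as follows, under the assumption that the vertical up-sampler simply repeats each field line; an actual up-sampler would typically interpolate. The vertical low-pass filtering of the base layer and the compression steps are omitted, and the helper names are hypothetical.

```python
from typing import List

Frame = List[List[int]]


def line_double(field: Frame) -> Frame:
    """Up-sample a field vertically by repeating each line (assumed up-sampler)."""
    return [list(row) for row in field for _ in range(2)]


def subtract(a: Frame, b: Frame) -> Frame:
    """Sample-wise difference a - b."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two consecutive progressive input frames (assumed data, 4 lines x 2 samples).
frame_n = [[10, 12], [14, 16], [18, 20], [22, 24]]
frame_n1 = [[11, 13], [15, 17], [19, 21], [23, 25]]

# Base layer TFBF frame: even lines from frame n, odd lines from frame n+1.
tfbf = [frame_n[y] if y % 2 == 0 else frame_n1[y] for y in range(4)]

up_n = line_double(tfbf[0::2])    # from the top field
up_n1 = line_double(tfbf[1::2])   # from the bottom field

residual_n = subtract(frame_n, up_n)
residual_n1 = subtract(frame_n1, up_n1)

print(residual_n)
print(residual_n1)
```

In this sketch the residual is zero on the lines that the base layer already carries and non-zero only on the lines that had to be predicted.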
Similarly to the horizontal processing depicted in
The BL stream 852 and the residual EL stream 862 may be combined (multiplexed) with ancillary information from the RPU stream 857 to be transmitted to a receiver (not shown).
The output of encoder 800-E may be decoded using decoder system 300 depicted in
Using demultiplexer 800-D, r(n) (left half of the decoded SBS EL sequence) may be combined with the top field of νit(n) to generate progressive frame νp(n), and r(n+1) (right half of the decoded SBS EL sequence) may be combined with the bottom field of νit(n) to generate progressive frame νp(n+1).
Example Computer System Implementation
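A minimal sketch of this combination is given below. It assumes that each field is up-sampled vertically by line repetition, that each residual half is up-sampled horizontally by sample repetition, and that the two are then added; these up-samplers and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]


def line_double(field: Frame) -> Frame:
    """Vertical up-sampling by repeating each line (assumed)."""
    return [list(row) for row in field for _ in range(2)]


def pixel_double(half: Frame) -> Frame:
    """Horizontal up-sampling by repeating each sample (assumed)."""
    return [[p for p in row for _ in range(2)] for row in half]


def add(a: Frame, b: Frame) -> Frame:
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def reconstruct_pair(tfbf: Frame, sbs_residual: Frame) -> Tuple[Frame, Frame]:
    """Rebuild two progressive frames from one TFBF frame and one SBS residual frame."""
    half = len(sbs_residual[0]) // 2
    r_n = pixel_double([row[:half] for row in sbs_residual])   # left half
    r_n1 = pixel_double([row[half:] for row in sbs_residual])  # right half
    vp_n = add(line_double(tfbf[0::2]), r_n)     # top field + r(n)
    vp_n1 = add(line_double(tfbf[1::2]), r_n1)   # bottom field + r(n+1)
    return vp_n, vp_n1


tfbf = [[10, 12, 14, 16], [15, 17, 19, 21], [18, 20, 22, 24], [23, 25, 27, 29]]
sbs_residual = [[0, 0, -4, -4], [4, 4, 0, 0], [0, 0, -4, -4], [4, 4, 0, 0]]

vp_n, vp_n1 = reconstruct_pair(tfbf, sbs_residual)
print(vp_n)
print(vp_n1)
```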
Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control, or execute instructions relating to the encoding and decoding of dual-layer, backwards-compatible, progressive video, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to methods for encoding and decoding of dual-layer, backwards-compatible progressive video as described herein. The image and video embodiments may be implemented in hardware, software, firmware and various combinations thereof.
Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement encoding and decoding of dual-layer, backwards-compatible, progressive video as described above by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of forms. The program product may comprise, for example, physical media such as magnetic data storage media, including floppy diskettes and hard disk drives; optical data storage media, including CD ROMs and DVDs; electronic data storage media, including ROMs and flash RAM; or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.
Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a “means”) should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
Equivalents, Extensions, Alternatives and Miscellaneous
Example embodiments that relate to dual-layer encoding and decoding of progressive video are thus described. In the foregoing specification, embodiments of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
This application claims the benefit of priority to U.S. Provisional Patent Application Ser. No. 61/583,075, filed on Jan. 4, 2012, which is hereby incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2012/069426 | 12/13/2012 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/103490 | 7/11/2013 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
4661850 | Strolle | Apr 1987 | A |
5128791 | LeGall | Jul 1992 | A |
5216505 | Kageyama | Jun 1993 | A |
5270813 | Puri | Dec 1993 | A |
5305104 | Jensen | Apr 1994 | A |
5329365 | Uz | Jul 1994 | A |
5387940 | Kwok | Feb 1995 | A |
5408270 | Lim | Apr 1995 | A |
5410354 | Uz | Apr 1995 | A |
5457498 | Hori | Oct 1995 | A |
5475435 | Yonemitsu | Dec 1995 | A |
5497199 | Asada | Mar 1996 | A |
5508746 | Lim | Apr 1996 | A |
5703654 | Iizuka | Dec 1997 | A |
5742343 | Haskell | Apr 1998 | A |
6069664 | Zhu | May 2000 | A |
6366324 | Van Rooy | Apr 2002 | B1 |
6490321 | Sugiyama | Dec 2002 | B1 |
6700933 | Wu | Mar 2004 | B1 |
7447264 | Sugiyama | Nov 2008 | B2 |
20040101049 | Sugiyama | May 2004 | A1 |
20070009039 | Ryu | Jan 2007 | A1 |
20070086666 | Bruls | Apr 2007 | A1 |
20070140350 | Sakazume et al. | Jun 2007 | A1 |
20090154562 | Syed | Jun 2009 | A1 |
20090225869 | Cho | Sep 2009 | A1 |
20090262803 | Wang | Oct 2009 | A1 |
20090304081 | Bourge | Dec 2009 | A1 |
20100033622 | Bellers | Feb 2010 | A1 |
20110050851 | Chen | Mar 2011 | A1 |
20110074922 | Chen et al. | Mar 2011 | A1 |
20110134214 | Chen | Jun 2011 | A1 |
20120027079 | Ye et al. | Feb 2012 | A1 |
20120092452 | Tourapis | Apr 2012 | A1 |
Number | Date | Country |
---|---|---|
0598786 | Jun 1994 | EP |
2273620 | Jun 1994 | GB |
2013040170 | Mar 2013 | WO |
2013049179 | Apr 2013 | WO |
2013049383 | Apr 2013 | WO |
Entry |
---|
Dolby's Frame Compatible Full Resolution (FCFR) 3D System Specifications. Dolby Laboratories Inc. Dec. 2010. |
ITU-T and ISO/IEC JTC 1, “Advanced Video Coding for Generic Audiovisual Services” ITU-T Recommendation H.264 and ISO/IEC 14496-10, 2009. |
Tourapis, A.M. et al “A Frame Compatible System for 3D Delivery” MPEG Meeting, Jul. 26-30, 2010, ISO/IEC JTC1/SC29/WG11. |
Number | Date | Country |
---|---|---|---|
20140376612 A1 | Dec 2014 | US |
Number | Date | Country |
---|---|---|---|
61583075 | Jan 2012 | US |