1. Technical Field
The present disclosure relates to encoding and decoding techniques.
This disclosure was developed with specific attention paid to its possible use in encoding and/or decoding a video sequence composed of digital samples.
2. Description of the Related Art
A well-established paradigm in encoding a digital video signal is based on the layout illustrated in
There, an input (“original”) video signal Video In is encoded in an encoder to be then transmitted over a channel or stored in a storage medium to be eventually decoded in a decoder and reproduced as a Video Out signal.
Specifically,
The two encoder blocks of
During the decoding process, the i-th layer of the bitstream can be decoded starting from the results of decoding the previous layers. Increasing the number of layers in the bitstream increases the fidelity in reproducing the original signal from the signals being decoded.
Scalable Video Coding (SVC) as provided by the ITU-T/MPEG standards (ITU-T Rec. H.264/ISO 14496-10 AVC, Annex G "Scalable Video Coding") is exemplary of layered coding: it extends the H.264/AVC standard by means of a layered encoding process which enables spatial, temporal and quality scaling of the decoded signal.
In another encoding/decoding paradigm, known as Multiple Description Coding (MDC) as schematically represented in
In the decoding process, the fidelity of the signal decoded (i.e. reproduced) to the original signal increases with an increasing number of descriptions that are received and decoded. The block diagram of
Advantages of layered coding (LC) over multiple description coding (MDC) are:
Advantages of MDC over LC are:
The article by A. Vitali et al. "Video Over IP Using Standard-Compatible Multiple Description Coding: an IETF Proposal"—Proceedings of the 2006 Packet Video Workshop, Hangzhou, China, provides a detailed review of LC and MDC.
Internet Protocol TeleVision (IPTV) is a digital TV service provided using the IP protocol over a wideband network infrastructure such as the Internet.
IPTV is becoming one of the most significant applications within the framework of digital video technology. Projects aiming at producing IPTV set-top boxes for receiving High Definition TV (HDTV) over IP and using the 802.11n standard are currently in progress.
A feature of IPTV is the Video On Demand (VOD) capability, which permits any user in the system to access at any time a given TV content. At a given time instant, each user may notionally access a different content, whereby conventional point-to-point multicast transmission of encoded contents (left-hand side of
Recent research in the area of P2P protocols demonstrates that MDC encoding can greatly improve the efficiency of such a distribution system for multimedia contents. By resorting to MDC, users may exchange different alternative representations of the original signal, thus increasing the efficiency of connections between peers within the P2P network. The various representations received may be eventually re-composed to reconstruct the original signal with a quality that increases with the number of descriptions received.
Another useful feature for IPTV is adaptability of the content to the terminal, so that the digital video signal received can be effectively reproduced on different types of terminals adapted to be connected to an IPTV system, such as High Definition TV (HDTV) receivers, conventional Standard Definition TV (SDTV) receivers, PC desktops, PC laptops, PDAs, smart phones, iPods, and so on.
One embodiment provides a flexible arrangement wherein e.g. the decoded signal may be scaled as a function of the capabilities of the receiving terminals as regards spatial resolution of the display unit, the frame rate, the video quality and the computation resources.
One embodiment provides an arrangement which combines the advantages of LC and MDC, especially as regards compatibility with the SVC standard.
One embodiment is a method having the features set forth in the claims that follow. This disclosure also relates to corresponding encoding/decoding apparatus as well as to a corresponding signal. The disclosure also relates to a corresponding computer program product, loadable in the memory of at least one computer and including software code portions for performing the steps of the method of this disclosure when the product is run on a computer. As used herein, reference to such a computer program product is intended to be equivalent to reference to a computer-readable medium containing instructions for controlling a computer system to coordinate the performance of the method of the invention. Reference to “at least one computer” is evidently intended to highlight the possibility for the present invention to be implemented in a distributed/modular fashion.
The claims are an integral part of the disclosure as provided herein.
An embodiment of this disclosure is a method which combines the LC encoding paradigm with the MDC paradigm by using the SVC Video Encoding Standard (ITU-T/MPEG SVC).
An embodiment of the arrangement described herein makes it possible to combine the advantages of LC and MDC within a framework compatible with SVC, giving rise to improved encoding efficiency with respect to the prior art.
In an embodiment of the arrangement described herein, flexibility is provided by compliance with the SVC standard: by using an LC encoding paradigm, the bitstream including the various representations of the original video signal can be scaled by simply discarding the packets which correspond to layers that are not necessary, whereby the resulting bitstream can be decoded to yield a representation of the original video signal in the format held to be most suitable for the intended use.
An embodiment of the arrangement described herein makes it possible to provide a P2P network (for which MDC encoding is advantageous) and a set-top box adapted for serving different terminals (for which the scalability as offered by the SVC standard is advantageous).
Embodiments will now be described, by way of example only, with reference to the enclosed representations, wherein:
In the following description, numerous specific details are given to provide a thorough understanding of embodiments. The embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the embodiments.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The headings provided herein are for convenience only and do not interpret the scope or meaning of the embodiments.
The input signal SIN is also spatially subsampled in a downsample filter (DS) 12 to produce a downsampled version SBIN of the input signal SIN having a spatial and temporal resolution which is lower than or equal to the resolution of the N multiple descriptions MD1, MD2, . . . , MDN.
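The disclosure does not specify a particular downsample filter; as a purely illustrative sketch (the function name and the averaging choice are ours, not the disclosure's), a 2×2 block-averaging filter halving both spatial dimensions could read:

```python
def downsample_2x2_avg(frame):
    """Toy downsample filter (DS, block 12): average each 2x2 block of
    pixels, halving width and height. This is only one possible choice;
    the text does not fix the filter actually used."""
    h, w = len(frame), len(frame[0])
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
              frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w // 2)]
            for y in range(h // 2)]
```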
Subsequently, the various downsampled video signals thus generated are encoded in an encoder 14 complying with the SVC (scalable video coding) standard to generate an encoded output bitstream.
In an embodiment, the encoder 14 provides encoding as follows.
The downsampled signal SBIN from the downsample filter 12 is encoded as the base layer BL of the SVC bitstream resulting from encoding in the encoder 14.
The multiple descriptions MD1, . . . , MDN are encoded as enhancement layers (ELs) of the SVC bitstream. Each enhancement layer EL1, EL2, . . . , ELN (in the exemplary embodiment illustrated N=4) can be of the spatial or CGS (Coarse Grain Scalability) type.
Each enhancement layer EL1, EL2, EL3, EL4 is spatially predicted by the base layer. The inter-layer prediction mechanism of SVC leads to each enhancement layer being encoded efficiently.
Consequently, the encoding arrangement exemplified in
If compared to conventional single description encoding, MDC encoding introduces a redundancy in the data. In the embodiment illustrated in
The encoding arrangement as exemplified in
scalability, related to the use of the SVC standard;
robustness to errors and/or the higher efficiency in the case of P2P transmission for the enhancement layers, which is related to the use of the MDC paradigm;
higher encoding efficiency with respect to a conventional MDC encoding, which is a further advantage deriving from combining MD and SVC;
compliance with the SVC standard; specifically the arrangement as exemplified herein does not require any additional encoding/decoding capability with respect to video encoding/decoding as provided by ITU/MPEG video standards, while also ensuring full compatibility with SVC specifications.
Tests performed by the Applicant indicate that the rate-distortion efficiency of the arrangement exemplified herein compares well with the efficiency of a conventional MD encoding arrangement for various test sequences. For instance, comparisons with MD arrangements including four multiple descriptions independently encoded according to the H.264/AVC standard indicate that substantially the same quality in terms of PSNR can be ensured while obtaining, in the case of the arrangement exemplified herein, a much higher encoding efficiency (for instance in excess of 25% in the case of the "Crew" test sequence).
In each layer (BL=Base Layer; EL1, EL2, EL3=Enhancement Layers) each image is subdivided in slices, each of which includes a sequence of macro blocks belonging to that image. Each slice is encoded within a packet, designated NALU (Network Abstraction Layer Unit), and the data contained in the packets of the base layer are used to decode data included in the packets of the enhancement layers.
The packets are designated with K and B suffixes to denote the time prediction structure used by the SVC encoding process. There, K denotes a key picture (of the I or P type), while B denotes images encoded as hierarchical B-pictures. The representation of
The block diagram of
The bitstream to be decoded is first fed to an otherwise conventional SVC decoder 24, to extract (in a manner known per se) the base layer BL and the various enhancement layers EL1, EL2, EL3.
The enhancement layers are fed to an MD filter 20 which plays the role of a multiple description filter-composer (MDFC), performing spatial and/or temporal composition of the multiple descriptions by applying, in an inverse manner, the de-composition filter (Multiple Description Filter-Decomposer or MDFD) 10 which generated the multiple descriptions for the encoder 14 of
In an exemplary embodiment, the encoder 10 of
Those of skill in the art will appreciate that the arrangement described herein is in no way linked to any specific approach adopted for generating the multiple descriptions MD1, MD2, . . . , MDN. Any conventional method adapted to generate such multiple descriptions can be used within the framework of the instant disclosure.
In an embodiment, two multiple descriptions are generated by spatially subsampling the input sequence by using—for each image and each row of pixels—the even pixels for description MD1 while the odd pixels are used for description MD2. In that way, two descriptions MD1, MD2 are obtained each having a spatial resolution which is one half the spatial resolution of the original sequence.
The approach described in the foregoing may be extended to the columns of each image to derive four descriptions MD1, . . . , MD4, each having a resolution which is ¼ (one fourth) the resolution of the original sequence. Stated otherwise, each image in the input video sequence SIN is subdivided into sub-blocks including 2×2 pixels and each pixel is used to compose a different description.
From the mathematical viewpoint, a generic pixel (x, y) of each image t in each description MDi (i=1 . . . , 4) generated from the input video signal SIN has the following value:
MD1(x,y,t)=SIN(2x,2y,t)
MD2(x,y,t)=SIN(2x+1,2y,t)
MD3(x,y,t)=SIN(2x,2y+1,t)
MD4(x,y,t)=SIN(2x+1,2y+1,t)
where x=0, . . . , W(SIN)/2−1 and y=0, . . . , H(SIN)/2−1 (indices taken from zero, so that the arguments of SIN remain within the image).
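The polyphase decomposition defined by these equations can be sketched as follows (illustrative only; the function name is ours). An image is stored as a list of pixel rows, so frame[y][x] corresponds to SIN(x, y, t) for a fixed t:

```python
def spatial_split_2x2(frame):
    """Split one image into the four quarter-resolution descriptions
    MD1..MD4 by 2x2 polyphase subsampling:
    MDi(x, y) = SIN(2x + dx, 2y + dy), with (dx, dy) covering {0, 1}^2."""
    def take(dy, dx):
        return [row[dx::2] for row in frame[dy::2]]
    md1 = take(0, 0)  # even columns, even rows:  SIN(2x,   2y)
    md2 = take(0, 1)  # odd columns, even rows:   SIN(2x+1, 2y)
    md3 = take(1, 0)  # even columns, odd rows:   SIN(2x,   2y+1)
    md4 = take(1, 1)  # odd columns, odd rows:    SIN(2x+1, 2y+1)
    return md1, md2, md3, md4
```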
The block diagram of
In an embodiment, an input sequence can be subdivided into two multiple descriptions, each having a temporal resolution equal to half the temporal resolution of the input sequence, by simply using the even-indexed images for the first description and the odd-indexed images for the second description.
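This even/odd temporal decomposition can be sketched as follows (illustrative; frames are indexed from zero):

```python
def temporal_split(frames):
    """Split a frame sequence into two descriptions at half the frame
    rate: even-indexed images go to MD1, odd-indexed images to MD2."""
    return frames[0::2], frames[1::2]
```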
The block diagram of
By assuming that the input SIN to the encoder (i.e. the MDFD filter 10) of
W(SIN)=width of the images (measured in pixels),
H(SIN)=height of the images (again measured in pixels),
F(SIN)=frame-rate, namely the number of images per second,
then any spatial and/or temporal downsampling operation on the sequence SIN will give rise to a second sequence SB (such as the sequence SBIN produced by the downsample filter 12) with homologous parameters W, H, and F such that:
W(SB)≦W(SIN)
H(SB)≦H(SIN)
F(SB)≦F(SIN)
where SIN and SB denote the original input sequence and the downsampled sequence, respectively.
The MDFD filter 10, when receiving the input sequence SIN, generates therefrom N multiple descriptions MD1, . . . , MDN, each of which meets the following requirements:
W(SB)≦W(MDi)≦W(SIN)
H(SB)≦H(MDi)≦H(SIN)
F(SB)≦F(MDi)≦F(SIN),
where SIN and SB again denote the original input sequence and the subsampled sequence, respectively, while MDi denotes the i-th of the N multiple descriptions generated by the filter 10.
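The three constraints above can be collected into a simple validity check (an illustrative sketch; the class and function names are ours, not the disclosure's):

```python
from dataclasses import dataclass

@dataclass
class SeqParams:
    w: int     # W: image width in pixels
    h: int     # H: image height in pixels
    f: float   # F: frame rate, images per second

def is_valid_description(md, sb, sin):
    """True when W(SB) <= W(MDi) <= W(SIN), and likewise for H and F."""
    return (sb.w <= md.w <= sin.w and
            sb.h <= md.h <= sin.h and
            sb.f <= md.f <= sin.f)
```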
When coupled as represented in
The input sequence SBIN is sent to the SVC encoder and encoded as the base layer of the scalable bitstream, and the header of each NALU packet used for encoding the SBIN sequence may include the following syntax elements:
dependency_id=0
quality_id=0
layer_base_flag=1.
In an embodiment, the multiple descriptions (MDi), i=1, . . . , N, are sent to the SVC encoder and encoded as enhancement layers (ELi) of the spatial type, and the header of each NALU packet used for encoding the i-th multiple description may contain the following syntax elements:
dependency_id=i
quality_id=0
layer_base_flag=0.
In an embodiment, the slices in each ELi exploit inter-layer prediction from the base layer, while the header of each NALU of the enhancement layers ELi includes the syntax element:
base_id=0.
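The layer-to-header mapping described above can be summarized as follows (illustrative sketch; the field names reproduce those used in the text and are not guaranteed to match the exact SVC bitstream syntax):

```python
def nalu_header_fields(layer_index):
    """Header fields for a given layer: index 0 is the base layer
    carrying SBIN; index i >= 1 is the spatial enhancement layer ELi
    carrying description MDi, inter-layer predicted from the base."""
    if layer_index == 0:
        return {"dependency_id": 0, "quality_id": 0, "layer_base_flag": 1}
    return {"dependency_id": layer_index, "quality_id": 0,
            "layer_base_flag": 0, "base_id": 0}
```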
In an embodiment, the SVC decoder decodes the base layer BL by producing a video signal designated SOUT and further decodes all the enhancement layers ELi, each containing one of the multiple descriptions MDi of the original sequence. These multiple descriptions MDi are then composed by the MDFC filter in such a way as to provide a representation SOUT of the input sequence SIN.
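For the 2×2 spatial decomposition discussed earlier, the MDFC composition step reduces to interleaving the decoded descriptions back into a full-resolution image; a sketch (function name ours) assuming all four descriptions have been received:

```python
def spatial_merge_2x2(md1, md2, md3, md4):
    """Inverse of the 2x2 polyphase split: interleave four
    quarter-resolution descriptions into one full image."""
    h2, w2 = len(md1), len(md1[0])
    out = [[0] * (2 * w2) for _ in range(2 * h2)]
    for y in range(h2):
        for x in range(w2):
            out[2 * y][2 * x] = md1[y][x]          # SIN(2x,   2y)
            out[2 * y][2 * x + 1] = md2[y][x]      # SIN(2x+1, 2y)
            out[2 * y + 1][2 * x] = md3[y][x]      # SIN(2x,   2y+1)
            out[2 * y + 1][2 * x + 1] = md4[y][x]  # SIN(2x+1, 2y+1)
    return out
```

When fewer than four descriptions are received, the missing polyphase positions would instead be estimated (e.g., interpolated) from the available ones, which is where the error resilience of the MDC paradigm comes from.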
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, applications and publications to provide yet further embodiments.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
Number | Name | Date | Kind |
---|---|---|---|
6330370 | Goyal et al. | Dec 2001 | B2 |
6345125 | Goyal et al. | Feb 2002 | B2 |
6460153 | Chou et al. | Oct 2002 | B1 |
6625223 | Wimmer et al. | Sep 2003 | B1 |
6665646 | John et al. | Dec 2003 | B1 |
6757735 | Apostolopoulos et al. | Jun 2004 | B2 |
6920177 | Orchard et al. | Jul 2005 | B2 |
6920179 | Anand et al. | Jul 2005 | B1 |
7916789 | Kim et al. | Mar 2011 | B2 |
20040066793 | Van Der Schaar | Apr 2004 | A1 |
20050117641 | Xu et al. | Jun 2005 | A1 |
20050135477 | Zhang et al. | Jun 2005 | A1 |
20060088107 | Cancemi et al. | Apr 2006 | A1 |
20060256851 | Wang et al. | Nov 2006 | A1 |
20070009039 | Ryu | Jan 2007 | A1 |
20080002776 | Borer et al. | Jan 2008 | A1 |
20080013620 | Hannuksela et al. | Jan 2008 | A1 |
20080043832 | Barkley et al. | Feb 2008 | A1 |
20080232452 | Sullivan et al. | Sep 2008 | A1 |
20080292005 | Xu et al. | Nov 2008 | A1 |
20090074070 | Yin et al. | Mar 2009 | A1 |
20090238279 | Tu et al. | Sep 2009 | A1 |
Number | Date | Country |
---|---|---|
1 638 337 | Mar 2006 | EP |
2008060732 | May 2008 | WO |
Entry |
---|
Goyal, V. K., “Multiple Description Coding: Compression Meets the Network,” IEEE Signal Processing Magazine, pp. 74-93, Sep. 2001. |
Mayer, A. et al., “A Survey of Adaptive Layered Video Multicast using MPEG-2 Streams,” 14th IST Mobile and Wireless Communications Summit, Jun. 19, 2005, 5 pages. |
Vitali, A. et al., "Video over IP using standard-compatible multiple description coding: an IETF proposal," Journal of Zhejiang University Science A 7(5):668-676, 2006. |
Folli et al., “Scalable multiple description coding of video sequences,” GTTI: 1-7, 2008, URL:http://www.gtti.it/GTTI08/files/SessioneScientifica/folli.pdf [retrieved on Sep. 1, 2010]. |
Richardson, I., “H.264 and MPEG-4 Video Compression—Chapter 5: MPEG-4 Visual”, Oct. 17, 2003. |
Number | Date | Country | |
---|---|---|---|
20100027678 A1 | Feb 2010 | US |