Implementations are described that relate to coding systems. Particular implementations relate to bit-depth scalable coding and/or spatial scalable coding.
In recent years, digital images and videos with color bit depths higher than 8 bits have been deployed in many video and image applications, including, for example, medical image processing, digital cinema workflows in production and post-production, and home-theater applications. Bit depth is the number of bits used to represent the color of a single pixel in a bitmapped image or a video frame. Bit-depth scalability is a practical way to enable conventional 8-bit and higher-bit-depth digital imaging systems to coexist in the marketplace. For example, a video source can render a video stream carrying both an 8-bit depth and a 10-bit depth representation, and bit-depth scalability enables two different video sinks (e.g., displays), each having different bit-depth capabilities, to decode such a video stream.
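As a minimal numeric illustration (not part of the described implementations), the number of representable levels per color component grows exponentially with bit depth, which is why an 8-bit and a 10-bit representation of the same content differ:

```python
# Number of representable levels per color component for a given bit depth.
def levels(bit_depth):
    return 2 ** bit_depth

# An 8-bit component spans 0..255; a 10-bit component spans 0..1023,
# i.e. four times as many levels.
print(levels(8))   # 256
print(levels(10))  # 1024
```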
According to a general aspect, a source image of a base layer macroblock is encoded. A source image of an enhancement layer macroblock is encoded by performing inter-layer prediction. The source image of the base layer and the source image of the enhancement layer differ from each other both in spatial resolution and color bit-depth.
According to another general aspect, a source image of a base layer macroblock is decoded. A source image of an enhancement layer macroblock is decoded by performing an inter-layer prediction. The source image of the base layer and the source image of the enhancement layer differ from each other both in spatial resolution and color bit-depth.
According to another general aspect, a portion of an encoded image is accessed and decoded. The decoding includes performing spatial upsampling of the accessed portion to increase the spatial resolution of the accessed portion. The decoding also includes performing bit-depth upsampling of the accessed portion to increase the bit-depth resolution of the accessed portion.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
Several techniques for handling the coexistence of 8-bit and higher bit-depth video (in particular, 10-bit video) are discussed below. Certain embodiments include a method for encoding data such that the encoding has combined spatial and bit-depth scalability. Certain embodiments also include a method for decoding such an encoding.
One technique transmits only a 10-bit coded bit-stream; the 8-bit representation for standard 8-bit display devices is obtained by applying a tone-mapping method to the 10-bit representation. Another technique for enabling the coexistence of 8-bit and 10-bit video transmits a simulcast bit-stream that contains both an 8-bit coded representation and a 10-bit coded representation, and the decoder selects which bit depth to decode. For example, a 10-bit capable decoder can decode and output 10-bit video, while a normal decoder supporting only 8-bit data can output 8-bit video.
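For illustration only, the simplest conceivable tone-mapping operator from 10-bit to 8-bit is a linear rescaling that discards the two least significant bits; practical tone mapping is typically a more elaborate, possibly content-adaptive curve, and the function below is a stand-in, not the operator described in the text:

```python
def tone_map_10_to_8(sample_10bit):
    """Map a 10-bit sample (0..1023) to 8 bits (0..255) by a linear
    right shift; equivalent to integer division by 4.  Illustrative only."""
    return sample_10bit >> 2

# The full 10-bit range collapses onto the full 8-bit range.
assert tone_map_10_to_8(0) == 0
assert tone_map_10_to_8(1023) == 255
```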
The first technique transmits 10-bit data and is, therefore, not compliant with H.264/AVC 8-bit profiles. The second technique is compliant with all current standards, but it requires additional processing.
A scalable solution offers a tradeoff between bit-rate reduction and backward compatibility. The scalable extension of H.264/AVC (hereinafter "SVC") supports bit-depth scalability. A bit-depth scalable coding solution has several advantages over the techniques described above. For example, such a solution enables 10-bit depth to remain backward-compatible with AVC High Profiles and further enables adaptation to different network bandwidths or device capabilities. The scalable solution also provides low complexity, high efficiency, and flexibility.
The SVC bit-depth solution supports temporal, spatial, and SNR scalability, but does not support combined scalability. Combined scalability refers to combining both spatial and bit-depth scalability; that is, the different layers of a video frame or image differ from each other in both spatial resolution and color bit depth. In one example, the base layer is 8-bit depth at standard-definition (SD) resolution, and the enhancement layer is 10-bit depth at high-definition (HD) resolution.
Certain embodiments provide a solution that enables the bit-depth scalability to be fully compatible with the spatial scalability.
The EL source image 102 may be encoded using an output of the interlayer prediction module 150 or by performing only spatial prediction using a module 160. The operational mode is determined by the state of switch 104. The state of the switch 104 is an encoder decision made by a rate-distortion optimization process, which chooses the state that has the higher coding efficiency. Higher coding efficiency means lower cost, where cost is a measure that combines bit rate and distortion: a lower bit rate at the same distortion, or lower distortion at the same bit rate, means lower cost.
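The mode decision at switch 104 can be sketched as minimizing a Lagrangian rate-distortion cost J = D + λ·R. The function names, the Lagrange multiplier value, and the tie-breaking rule below are illustrative assumptions, not details taken from the text:

```python
def rd_cost(distortion, rate_bits, lmbda):
    """Lagrangian rate-distortion cost J = D + lambda * R: lower is better."""
    return distortion + lmbda * rate_bits

def choose_mode(d_inter_layer, r_inter_layer, d_spatial, r_spatial, lmbda=0.5):
    """Return the prediction mode with the lower RD cost, mirroring the
    encoder's switch between inter-layer and spatial prediction.
    Ties go to inter-layer prediction (an arbitrary illustrative choice)."""
    j_il = rd_cost(d_inter_layer, r_inter_layer, lmbda)
    j_sp = rd_cost(d_spatial, r_spatial, lmbda)
    return "inter_layer" if j_il <= j_sp else "spatial"
```

For example, at equal rate the mode with the lower distortion wins, matching the cost definition in the text.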
The interlayer prediction module 150 computes the prediction of the current enhancement layer by performing spatial and bit-depth upsampling on BLrec. Also shown in
A non-limiting block diagram of the interlayer prediction module 150 is shown in
The input BL bit stream 301 is parsed by the entropy decoding unit 310 and then is inverse quantized and inverse transformed by the inverse quantizer and inverse transformer module 320 to output a reconstructed base layer residual signal BLres. The spatial prediction of the current block, as computed by the spatial prediction module 330, is added to the output of module 320 to generate the reconstructed base layer collocated macroblock BLrec.
The EL bit stream 302 may be decoded using the output of the interlayer prediction module 340; otherwise, the decoding is performed based on spatial prediction, similar to the decoding of the BL bit stream 301. The interlayer prediction module 340 decodes the enhancement layer bit stream 302 using the BLrec macroblock by performing spatial and bit-depth upsampling. Deblocking is performed by deblocking modules 360-1 and 360-2.
A non-limiting block diagram of an implementation of the interlayer prediction module 340 is shown in
The interlayer prediction module 340 is adapted to process macroblocks that are intra-coded. Specifically, the reconstructed base layer macroblock BLrec is first spatially upsampled using a spatial upsampler 410. Then, bit-depth upsampling is performed, using a bit-depth upsampler 420, by applying a bit-depth upsampling function Fb to the spatially upsampled signal. The Fb function has the same parameters as the Fb function used to encode the enhancement layer. Components analogous to elements 230 and 240 in
The interlayer residual prediction module 520 processes a reconstructed base layer residual signal BLkres (where k is the picture order count of the current picture). The residual signal BLkres is output by the inverse quantizer and transformer module 530.
As illustrated in
At S810, a base layer bit-stream is encoded. The base layer typically has low bit depth and low spatial resolution. At S820, it is checked whether a collocated base layer macroblock is intra-coded; if so, execution continues with S830. Otherwise, execution proceeds to S840. At S830, a reconstructed base layer collocated macroblock BLrec is spatially upsampled to generate a signal Fs{BLrec}. At S831, a bit-depth upsampling function Fb{.} is generated. At S832, the bit-depth upsampling function Fb{.} is applied to the spatially upsampled signal Fs{BLrec} to generate the prediction of the current enhancement layer, Fb{Fs{BLrec}}. At S833, the parameters of the bit-depth upsampling function Fb{.} are encoded and the coded bits are inserted into the EL bit stream. Then, execution proceeds to S850.
At S840, the collocated base layer macroblock motion vector is motion upsampled for a motion-compensated prediction of the current enhancement layer macroblock. Then, at S841, interlayer residual prediction is performed by spatially upsampling (Fs{.}) the reconstructed base layer residual signal BLkres to generate the signal Fs{BLkres}. The signal Fs{BLkres} is then bit-depth upsampled (Fb′{.}) to generate the residual prediction signal Fb′{Fs{BLkres}}. At S850, the prediction signal of the current enhancement layer, which is output by either S833 or S841, is added to the EL bit stream.
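The intra path S830–S832 above can be sketched as a composition Fb{Fs{BLrec}}. The 2x nearest-neighbor upsampler and the shift-based Fb below are illustrative stand-ins; the text does not specify the interpolation filter or the form of Fb:

```python
def spatial_upsample_2x(block):
    """Fs{.}: nearest-neighbor 2x upsampling of a 2-D block
    (a stand-in for a codec's actual interpolation filter)."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def bit_depth_upsample(block, delta_bits=2):
    """Fb{.}: illustrative bit-depth upsampling from 8 to 10 bits by a
    left shift (a linear stand-in for inverse tone mapping)."""
    return [[v << delta_bits for v in row] for row in block]

def predict_enhancement_layer(bl_rec):
    """Prediction of the current EL macroblock: Fb{Fs{BLrec}}."""
    return bit_depth_upsample(spatial_upsample_2x(bl_rec))
```

For example, a 2x2 base-layer block becomes a 4x4 prediction block whose samples occupy the wider 10-bit range.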
At S910, the base layer bit stream is parsed and the parameters of the bit-depth upsampling function Fb{.} are extracted from the bit stream. At S920, a check is made to determine whether a collocated base layer macroblock is intra-coded; if so, execution continues with S930. Otherwise, execution proceeds to S940.
At S930, the reconstructed base layer collocated macroblock BLrec is spatially upsampled (Fs{.}) to generate a signal Fs{BLrec}. At S931, the spatially upsampled signal Fs{BLrec} is bit-depth upsampled (Fb{.}) to generate the prediction of the current enhancement layer, Fb{Fs{BLrec}}. Then, execution proceeds to S950.
At S940, the collocated base layer macroblock motion vector is motion upsampled for the motion-compensated prediction of the current enhancement layer macroblock. Then, at S941, an interlayer residual prediction is performed by spatially upsampling (Fs{.}) the reconstructed base layer residual signal BLkres to generate a signal Fs{BLkres}, and then bit-depth upsampling (Fb′{.}) the signal Fs{BLkres} to generate the residual prediction signal Fb′{Fs{BLkres}}. At S950, the residual prediction signal of the current enhancement layer is added to the bit stream of the enhancement layer.
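The residual path S941 can be sketched as Fb′{Fs{BLkres}}. Because residuals are signed differences, a plausible Fb′ is a pure gain of 2^Δbits rather than a tone-mapping curve; this choice, like the nearest-neighbor Fs, is an assumption for illustration, since the text does not specify the form of Fb′:

```python
def spatial_upsample_2x(block):
    """Fs{.}: nearest-neighbor 2x upsampling of a 2-D block (illustrative)."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def residual_bit_depth_upsample(block, delta_bits=2):
    """Fb'{.}: illustrative residual bit-depth upsampling by a gain of
    2**delta_bits, preserving the sign of each residual sample."""
    gain = 1 << delta_bits
    return [[v * gain for v in row] for row in block]

def residual_prediction(bl_k_res):
    """Residual prediction of the current EL macroblock: Fb'{Fs{BLkres}}."""
    return residual_bit_depth_upsample(spatial_upsample_2x(bl_k_res))
```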
The video transmission system 1000 is capable of generating and delivering video content with enhanced features, such as extended gamut and high dynamic range, compatible with different video receiver requirements. For example, the video content can be displayed over home-theater devices that support enhanced features, CRT and flat-panel displays supporting conventional features, and portable display devices supporting limited features. This is achieved by generating an encoded signal including a combined spatial and bit-depth scalability.
The video transmission system 1000 includes an encoder 1010 and a transmitter 1020 capable of transmitting the encoded signal. The encoder 1010 receives two video streams having different bit-depths and resolutions and generates an encoded signal having combined scalability properties. The encoder 1010 may be, for example, the encoder 100 or the encoder 500 which are described in detail above.
The transmitter 1020 may be, for example, adapted to transmit a program signal having a plurality of bitstreams representing encoded pictures. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers. The transmitter may include, or interface with, an antenna (not shown).
The video receiving system 2000 may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video receiving system 2000 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
The video receiving system 2000 is capable of receiving and processing video content with enhanced features, such as extended gamut and high dynamic range, compatible with different video receiver requirements. For example, the video content can be displayed over home-theater devices that support enhanced features, CRT and flat-panel displays supporting conventional features, and portable display devices supporting limited features. This is achieved by receiving an encoded signal including a combined spatial and bit-depth scalability.
The video receiving system 2000 includes a receiver 2100 capable of receiving an encoded signal having combined spatial and bit-depth scalability properties and a decoder 2200 capable of decoding the received signal.
The receiver 2100 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The receiver 2100 may include, or interface with, an antenna (not shown).
The decoder 2200 outputs two video signals having different bit-depths and resolutions. The decoder 2200 may be, for example, the decoder 300 or 700 described in detail above. In a particular implementation the video receiving system 2000 is a set-top box connected to two different displays having different capabilities. In this particular implementation, the system 2000 provides each type of display with a video signal having properties supported by the display.
The decoding operation 1420 includes performing spatial upsampling of the accessed portion to increase the spatial resolution of the accessed portion (1430). The spatial upsampling may change the accessed portion from standard definition (SD) to high definition (HD), for example.
The decoding operation 1420 includes performing bit-depth upsampling of the accessed portion to increase the bit-depth resolution of the accessed portion (1440). The bit-depth upsampling may change the accessed portion from 8-bits to 10-bits, for example.
The bit-depth upsampling (1440) may be performed before or after the spatial upsampling (1430). In a particular implementation, the bit-depth upsampling is performed after the spatial upsampling, and changes the accessed portion from 8-bit SD to 10-bit HD. The bit-depth upsampling in various implementations uses inverse tone mapping, which generally provides a non-linear result. Various implementations apply non-linear inverse tone mapping after spatial upsampling.
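A non-linear inverse tone mapping from 8-bit to 10-bit samples might look like the following power-law sketch. The curve shape and exponent are hypothetical illustrations of "non-linear", not the mapping the implementations actually use:

```python
def inverse_tone_map_8_to_10(sample_8bit, gamma=0.9):
    """Hypothetical non-linear inverse tone mapping from 8-bit (0..255)
    to 10-bit (0..1023) via a power-law curve.  The exponent is an
    illustrative assumption, not taken from the text."""
    normalized = sample_8bit / 255.0
    return round((normalized ** gamma) * 1023)

# Endpoints map onto the full 10-bit range; with gamma < 1, mid-tones
# are lifted relative to a purely linear 4x scaling.
assert inverse_tone_map_8_to_10(0) == 0
assert inverse_tone_map_8_to_10(255) == 1023
```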
The process 1400 may be performed, for example, using the enhancement layer decoding portions of decoders 300 or 700. Further, the spatial and bit-depth upsampling may be performed by, for example, the inter-layer prediction modules 340 (see
Further, the process 1400 may be performed by an encoder, such as, for example, the encoders 100 or 500. In particular, the process 1400 may be performed, for example, using the enhancement layer encoding portions of encoders 100 or 500. Further, the spatial and bit-depth upsampling may be performed by, for example, the inter-layer prediction modules 150 (see
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of equipment include video coders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a computer readable medium having instructions for carrying out a process.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Application No. 60/999,569, filed on Oct. 19, 2007, titled “Bit-Depth Scalability”, the contents of which are hereby incorporated by reference in their entirety for all purposes.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US08/11901 | 10/17/2008 | WO | 00 | 4/19/2010
Number | Date | Country
---|---|---
60999569 | Oct 2007 | US