This invention relates to a method and an apparatus for generating color mapping parameters for video encoding, and more particularly, to a method and an apparatus for generating the color mapping parameters for inter-layer prediction in scalable video encoding.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
A sample in a picture may be transformed from one color space to another color space, or more generally, from one color to another color. For example, in scalable video coding, Enhancement Layer (EL) pictures are usually predicted from (possibly upsampled) decoded Base Layer (BL) pictures. When the EL pictures and the BL pictures are represented in different color spaces, have been color graded differently, or have different luminance ranges (such as Standard Dynamic Range for the BL and High Dynamic Range for the EL), transforming the decoded BL pictures, for example, to the color space or the dynamic range of the EL may improve the prediction.
This color transform is also known as color mapping, which may be represented by a Color Mapping Function (CMF). The CMF can for example be approximated by a 3×3 gain matrix plus an offset (Gain-Offset model), which are defined by 12 parameters. When only one set of Gain-Offset model parameters is used to map the entire color space of the BL pictures, such an approximation of the CMF may not be very precise because it assumes a linear transform model. To improve the precision of color mapping, the color space of the BL pictures can be partitioned into multiple octants, wherein each octant is associated with a respective color mapping function.
In another example, a 3D Look Up Table (also known as 3D LUT), which indicates how a color (usually with three components) is mapped to another color in a look-up table, can be used to describe a CMF. The 3D LUT can be much more precise because its size can be increased depending on the required accuracy. However, a 3D LUT may thus represent a huge data set.
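As a rough illustration of the idea (not the SHVC-specified process), the following Python sketch applies a 3D LUT to a (y, u, v) color using nearest-grid-point lookup; the LUT layout, the lookup strategy, and all names are assumptions chosen for brevity, and a practical implementation would typically interpolate (for example, trilinearly) between neighboring LUT entries.

```python
import numpy as np

def apply_3d_lut(lut, color, max_value=255):
    """Map a (y, u, v) color through a 3D LUT of shape (N, N, N, 3).

    lut[i, j, k] holds the output color for grid point (i, j, k) of the
    input color space.  Nearest-grid-point lookup is used for brevity.
    """
    n = lut.shape[0]
    idx = [min(int(round(c * (n - 1) / max_value)), n - 1) for c in color]
    return lut[idx[0], idx[1], idx[2]]

# Usage sketch: an identity 17x17x17 LUT maps a color (approximately) to itself.
n = 17
grid = np.linspace(0, 255, n)
identity_lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
print(apply_3d_lut(identity_lut, (120, 64, 200)))
```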
In another example, the color transform can be performed by applying a one-dimensional color LUT independently on each color component of a picture or of a region in the picture. Applying a 1D LUT independently on each color component breaks the correlation between the components, which may decrease the efficiency of the inter-layer prediction and thus the coding efficiency. To compensate for this decorrelation, a linear model, such as a 3×3 matrix (in the case of 3 color components) and optionally a vector of offsets, can be applied to the mapped components. Optionally, an additional transform can be performed by applying another one-dimensional color LUT independently on each color component of the picture or of a region in the picture.
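To make the component-decorrelation point concrete, here is a minimal Python sketch, under the assumption of 8-bit components and with illustrative names, of per-component 1D LUT mapping followed by a 3×3 matrix (and optional offsets) applied across the mapped components:

```python
import numpy as np

def map_with_1d_luts_and_matrix(picture, luts, matrix, offsets=None):
    """Apply a 1D LUT per component, then a 3x3 cross-component matrix.

    picture: (H, W, 3) array with integer components in [0, 255].
    luts:    three 1D arrays of length 256, one per color component.
    matrix:  3x3 linear model compensating for the correlation that the
             independent per-component LUTs cannot capture.
    offsets: optional length-3 offset vector.
    """
    mapped = np.stack([luts[c][picture[..., c]] for c in range(3)], axis=-1)
    out = mapped @ matrix.T
    return out if offsets is None else out + offsets

# Usage sketch: identity LUTs and an identity matrix leave the picture unchanged.
pic = np.random.randint(0, 256, size=(4, 4, 3))
identity_lut = np.arange(256)
out = map_with_1d_luts_and_matrix(pic, [identity_lut] * 3, np.eye(3))
```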
According to an aspect of the present principles, a method for video encoding is presented, comprising: accessing a first set of samples and a second set of samples of a base layer picture, which respectively belong to a first octant and a second octant in a color space for the base layer picture; generating color mapping parameters for the first octant responsive to the first and second sets of samples; transforming a block of samples of the base layer picture to form a prediction block of a corresponding block in an enhancement layer (EL) picture, the block of samples of the base layer picture including at least one sample which belongs to the first octant, wherein the at least one sample which belongs to the first octant is transformed based on the generated color mapping parameters; encoding the corresponding block in the enhancement layer picture using the formed prediction block; and generating a bitstream responsive to the encoding. The present embodiments also provide an apparatus for performing these steps.
The present embodiments also provide a computer readable storage medium having stored thereon instructions for video encoding according to the methods described above.
The present embodiments also provide a computer readable storage medium having stored thereon a bitstream generated according to the methods described above.
In scalable video coding, for example, as defined in the scalable extension of HEVC (also referred to as SHVC, as described in a document entitled “High Efficiency Video Coding, Recommendation ITU-T H.265,” published by ITU-T in October 2014), video signals represented in different layers can have different parameters, such as, but not limited to, spatial resolutions, sample bit depths, and color gamuts. Depending on which parameters differ between the BL and EL, appropriate forms of inter-layer processing are applied to the BL reconstructed pictures to derive the inter-layer reference (ILR) pictures for efficient EL coding.
In the following, we use a two-layer SHVC encoder to illustrate various embodiments according to the present principles. It should be noted that the present principles can be applied to any scalable video encoders with one or more enhancement layers. In the present application, we use the terms “picture” and “image” interchangeably.
When the color spaces and/or the color gamuts of the BL and of the EL are different, one can use a color mapping function to transform the samples of the BL when performing the inter-layer prediction of the EL samples from BL samples. In the following, the color mapping is also called CGS (Color Gamut Scalability) prediction as it supports color gamut scalability. In the present application, we use the YUV color space to illustrate different embodiments. The present principles can also be applied to other color spaces, for example, but not limited to, the RGB color space and XYZ color space.
As described before, to improve the precision of color mapping, the color space of the BL pictures can be partitioned into multiple octants, wherein each octant is associated with a respective Gain-Offset model.
Mathematically, the CGS prediction of EL sample (y′, u′, v′) from the corresponding BL sample (y, u, v) using the Gain-Offset model can be described as:
(y′, u′, v′)ᵀ = Mi (y, u, v)ᵀ + Oi,  (1)

where Mi is the 3×3 gain matrix and Oi is the offset vector for octant i.
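As a minimal Python sketch of equation (1), where the function name and the numpy representation are illustrative assumptions:

```python
import numpy as np

def cgs_predict_sample(bl_sample, gain, offset):
    """Apply the Gain-Offset model of equation (1) to one BL sample.

    bl_sample: (y, u, v) from the base layer.
    gain:      3x3 gain matrix Mi of the octant containing the sample.
    offset:    length-3 offset vector Oi of that octant.
    Returns the predicted EL sample (y', u', v').
    """
    return gain @ np.asarray(bl_sample, dtype=float) + np.asarray(offset, dtype=float)
```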
At step 440, the encoder begins to loop over the individual octants in the current picture. At step 450, the encoder computes the CMF parameters, for example, the twelve parameters of a Gain-Offset model, for the current octant (Octi). The loop over the individual octants ends at step 460. At step 470, the encoder performs CGS prediction to obtain the EL prediction from the BL samples based on the CMF parameters. The CGS prediction may be performed, for example, on a block basis or on a picture basis. When it is performed on a block basis, for each sample in a block, the encoder determines the octant to which the sample belongs. Subsequently, using the color mapping parameters for that octant, the encoder can transform the sample into the EL prediction using the CMF. The encoder may also perform other operations, for example, but not limited to, spatial upsampling and bit depth upsampling, to obtain the EL prediction. Based on the CGS prediction and/or other types of inter-layer prediction, the encoder encodes the enhancement layer for the current picture at step 480. The loop over individual pictures ends at step 490. Method 400 ends at step 499.
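The following sketch illustrates the block-based CGS prediction of step 470, assuming for simplicity a uniform partitioning of each component range into segments; the partitioning, like all names below, is an assumption made for illustration only.

```python
import numpy as np

def octant_index(sample, splits=(2, 2, 2), max_value=255):
    """Return the index of the octant a (y, u, v) sample falls into,
    assuming each component range is uniformly split into splits[c] segments."""
    idx = 0
    for c, s in enumerate(splits):
        seg = min(int(sample[c]) * s // (max_value + 1), s - 1)
        idx = idx * s + seg
    return idx

def cgs_predict_block(bl_block, gains, offsets):
    """Per-sample CGS prediction of a block: find each BL sample's octant and
    apply that octant's Gain-Offset parameters (equation (1))."""
    pred = np.empty(bl_block.shape, dtype=float)
    h, w, _ = bl_block.shape
    for r in range(h):
        for c in range(w):
            i = octant_index(bl_block[r, c])
            pred[r, c] = gains[i] @ bl_block[r, c] + offsets[i]
    return pred
```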
For the decoder to properly decode the bitstream, the CMF parameters are also encoded in the bitstream. For example, the CMF parameters can be encoded using syntax structures colour_mapping_table ( ) and colour_mapping_octants ( ), in PPS (Picture Parameter Set), as described in Sections F.7.3.2.3.4 and F.7.3.2.3.5 of the SHVC Specification.
In the current implementation of the SHVC reference software, the color mapping function parameters are estimated using an error minimization method (such as Least Square Minimization, LSM):
(Mi, Oi) = arg min(Mi, Oi) (ErrY(Mi, Oi) + ErrU(Mi, Oi) + ErrV(Mi, Oi)),  (2)

where ErrX(Mi, Oi) = Σ(y,u,v)∈Octi (x′ − xEL)² is the error for component X (X being Y, U or V) between the mapped value x′ obtained with (Mi, Oi) and the corresponding EL sample value xEL.
The computation of the minimization problem (2) is performed separately for each octant, using the samples (y, u, v) belonging to the current octant (i.e., (y, u, v)∈Octi). Because different octants use different sets of samples to estimate the color mapping function parameters, two samples that are close in the BL color space, but fall into two different octants, may be transformed into two samples that show color discontinuity in the EL prediction frame.
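A minimal sketch of the per-octant least-squares estimation of problem (2), assuming the BL samples of the octant and the corresponding EL sample values are available as paired arrays; numpy's lstsq is used as one possible solver, and the names are illustrative:

```python
import numpy as np

def estimate_gain_offset(bl_samples, el_samples):
    """Least-squares estimate of the Gain-Offset parameters for one octant.

    bl_samples: (N, 3) BL samples (y, u, v) falling into the octant.
    el_samples: (N, 3) corresponding EL sample values used as targets.
    Jointly solves for the 3x3 gain matrix Mi and offset vector Oi that
    minimize the squared error of equation (2) over the octant's samples.
    """
    a = np.hstack([bl_samples, np.ones((bl_samples.shape[0], 1))])
    p, *_ = np.linalg.lstsq(a, el_samples, rcond=None)  # p has shape (4, 3)
    gain = p[:3].T      # Mi
    offset = p[3]       # Oi
    return gain, offset
```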
For example, consider a BL picture that includes a red area with smooth gradients. After the partitioning of the color space, the colors corresponding to a first subset of the red area fall into one octant, and the colors corresponding to the rest of the red area fall into other octants. After color mapping (CGS prediction for the EL), the color range corresponding to the first subset may become more saturated than the colors corresponding to the rest of the red area. This generates an artificial edge (artifact), in the EL prediction, in an area that was originally smooth.
The present principles are directed to a method and an apparatus for improving the parameter estimation of the color mapping function. In one embodiment, the color mapping function parameters are estimated using not only samples from the current octant but also samples from neighboring octants. Advantageously, the color mapping function parameters are no longer estimated independently for each octant, and the proposed techniques may reduce color discontinuity artifacts at the octant boundaries. Thus, the proposed techniques may improve the subjective quality of the reconstructed enhancement layer video.
In one embodiment, we propose to compute the color mapping function parameters by including samples from neighboring octants.
Subsequently, the minimization problem can be formulated as:
(Mi, Oi) = arg min(Mi, Oi) (ErrY(Mi, Oi) + ErrU(Mi, Oi) + ErrV(Mi, Oi)),  (3)

where ErrX(Mi, Oi) = Σ(y,u,v)∈Oct′i (x′ − xEL)²; that is, the error is now accumulated over the samples of the super octant Oct′i, which includes the current octant Octi and the overlapping areas taken from its neighboring octants.
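A sketch of how the sample set of the super octant Oct′i might be gathered before running the same least-squares estimation; the margins (in code values) by which the octant is extended in each direction, and the function name, are purely illustrative assumptions:

```python
import numpy as np

def collect_super_octant(samples, low, high, overlap=(8, 8, 8)):
    """Collect the samples of a super octant Oct'_i.

    samples:   (N, 3) BL samples of the picture.
    low, high: per-component lower/upper bounds of octant Oct_i.
    overlap:   margin (in code values) by which the octant is extended in
               the Y, U and V directions; a margin of 0 disables overlapping
               in that direction.
    """
    lo = np.asarray(low) - np.asarray(overlap)
    hi = np.asarray(high) + np.asarray(overlap)
    mask = np.all((samples >= lo) & (samples < hi), axis=1)
    return samples[mask]
```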
In the following, we describe the methods of selecting the super octant Oct′i in further detail.
Asymmetrical Overlapping
In one embodiment, we may choose to use the overlapping area along one or two directions of the color space, but not all the directions. For example, we observe that in some cases the color discontinuity occurs most frequently along the Y-direction; thus, we may consider overlapping only in the Y-direction.
While using a super octant to estimate the CMF parameters for an octant can reduce the color discontinuity artifact, it may also sacrifice compression efficiency. Thus, by only overlapping in certain direction(s), we provide a good tradeoff between artifact reduction and compression efficiency.
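With the collect_super_octant() sketch above, overlapping only along the Y-direction simply amounts to setting the U and V margins to zero; the margin of 8 code values and the octant bounds below are illustrative values, not specified ones.

```python
import numpy as np

bl_samples = np.random.randint(0, 256, size=(1000, 3))  # placeholder BL samples
# Overlap of 8 code values along Y only; no overlap along U or V.
super_oct = collect_super_octant(bl_samples, low=(0, 0, 0),
                                 high=(128, 128, 128), overlap=(8, 0, 0))
```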
Selective Overlapping
We may also selectively choose the overlapping area for each octant separately. For example, in
The encoder estimates at step 840 the CMF parameters (P2) with overlapping at all boundaries that need to be checked, and computes at step 850 the discontinuity error (E2,j) in the EL prediction based on P2 for boundary j. In one example, the overlapping area for one boundary is ⅛ of the size of the octant. Using the discontinuity errors based on P1 and P2 (i.e., the CMF parameters calculated without or with an overlapping area for boundary j, respectively), the encoder determines at step 860 whether or not an overlapping area is used for boundary j. In particular, when E1,j<E2,j, there is less artifact without the overlapping area, and the encoder chooses not to use an overlapping area for boundary j. Otherwise, if E1,j≧E2,j, the encoder chooses to use an overlapping area for boundary j. At step 870, the encoder checks whether there are more boundaries to be checked. If yes, the control returns to step 850. At step 880, based on the determined overlapping areas, the encoder may re-compute the CMF parameters for the octants. Method 800 ends at step 899.
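A sketch of the per-boundary decision of method 800; the two error callbacks stand in for the computation of E1,j (based on P1) and E2,j (based on P2) and are hypothetical placeholders rather than defined encoder interfaces.

```python
def select_overlapping_boundaries(boundaries, error_without_overlap, error_with_overlap):
    """Decide, per octant boundary j, whether an overlapping area is used.

    error_without_overlap(j): discontinuity error E1,j obtained with the
        parameters P1 (no overlapping) -- hypothetical encoder callback.
    error_with_overlap(j):    discontinuity error E2,j obtained with the
        parameters P2 (overlapping on all candidate boundaries).
    Returns the set of boundaries for which an overlapping area is kept.
    """
    keep = set()
    for j in boundaries:
        e1, e2 = error_without_overlap(j), error_with_overlap(j)
        if e1 >= e2:  # E1,j >= E2,j: overlapping reduces (or keeps) the artifact
            keep.add(j)
    return keep

# Usage sketch with placeholder error values:
e1 = {0: 1.5, 1: 0.2}
e2 = {0: 0.7, 1: 0.9}
print(select_overlapping_boundaries([0, 1], e1.get, e2.get))  # {0}
```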
The steps in method 800 may proceed in a different order from what is shown in
An example is depicted in
In another embodiment, we choose the overlapping area based on the observation that the color discontinuity artifacts often occur at pixels which are close to each other in the image and whose colors fall on different sides of an octant boundary. In other words, the artifacts often happen among those pixels that are close to each other in both the image and color spaces. Thus, we choose to use an overlapping area for an octant boundary if there are pixels that are adjacent to each other in the image space but whose colors fall on different sides of that octant boundary. In
Table 1 provides exemplary pseudo-code for one implementation.
For sample Si, we denote its location as Xi and its color values as Ci=(yi, ui, vi). The distance between samples Si and Sj (i.e., the distance between locations Xi and Xj, denoted as Disij (Si, Sj)) can be used to determine spatially neighboring pixels, for example, sample Sj is considered to be a spatially neighboring sample of Si if Disij(Si, Sj)≦aimgDis. In one example, threshold aimgDis can be set to 1.
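A one-function sketch of the spatial neighborhood test; the text does not specify the distance measure Disij, so the Chebyshev distance used here is an assumption:

```python
def is_spatial_neighbor(x_i, x_j, a_img_dis=1):
    """True if sample S_j, at image location x_j, is a spatially neighboring
    sample of S_i at x_i, i.e. Dis_ij(S_i, S_j) <= a_img_dis.
    Chebyshev distance between the (row, col) locations is assumed."""
    return max(abs(x_i[0] - x_j[0]), abs(x_i[1] - x_j[1])) <= a_img_dis
```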
We denote the CMF parameters estimated on octant OctK itself (no overlapping) as P1(OctK), and the color sample value obtained with the CGS prediction for sample Si as C′i=(y′i,u′i,v′i). Similarly, we denote the CMF parameters estimated on octant OctL itself (no overlapping) as P1(OctL), and the color sample value obtained with the CGS prediction for pixel Sj as C′j=(y′j,u′j,v′j). In one example, the discontinuity error for component Y can be calculated as
Eij = |y′i − y′j|.  (4)
In Table 1, the average discontinuity error avgE(BKL) is used to determine whether an overlapping area is used for the boundary of OctK and OctL. In other embodiments, the number of spatially neighboring samples that fall into two neighboring octants in the color space, N(BKL), can be used to determine whether an overlapping area is used for the boundary of OctK and OctL. For example, we may check whether N(BKL)≧nDisColor, i.e.,
if N(BKL)≧nDisColor (5)
in place of checking
if (avgE(BKL)≧aDiscolor) (6)
as described in Table 1. An exemplary value of the threshold nDisColor can be set to 1% of the overall number of samples in the image. Alternatively, we can check
if (avgE(BKL)≧aDiscolor∥N(BKL)≧nDisColor) or (7)
if (avgE(BKL)≧aDiscolor && N(BKL)≧nDisColor) (8)
to determine whether or not an overlapping area is to be used for the boundary of OctK and OctL.
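The pseudo-code of Table 1 is not reproduced here, but the following sketch illustrates the decision logic described above for one boundary BKL, with the choice among conditions (5)-(8) exposed as a parameter; the data layout, names, and threshold handling are assumptions.

```python
import numpy as np

def decide_overlap_for_boundary(boundary_errors, a_dis_color, n_dis_color, mode="avg"):
    """Decide whether an overlapping area is used for the boundary of Oct_K and Oct_L.

    boundary_errors: list of discontinuity errors E_ij (equation (4)) for the
        spatially neighboring sample pairs whose colors fall on the two sides
        of the boundary B_KL.
    a_dis_color: threshold on the average discontinuity error avgE(B_KL).
    n_dis_color: threshold on the number of such pairs N(B_KL).
    mode: "count" -> condition (5), "avg" -> condition (6),
          "or"    -> condition (7), "and" -> condition (8).
    """
    n_kl = len(boundary_errors)
    avg_e = float(np.mean(boundary_errors)) if n_kl else 0.0
    if mode == "count":
        return n_kl >= n_dis_color
    if mode == "avg":
        return avg_e >= a_dis_color
    if mode == "or":
        return avg_e >= a_dis_color or n_kl >= n_dis_color
    return avg_e >= a_dis_color and n_kl >= n_dis_color
```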
After the overlapping areas are determined for different octant boundaries, that is, after the super octant is determined, the CMF parameters can be estimated accordingly based on the super octant.
The size of the overlapping area could be varied, for example, with the luma values of the two octants. In one embodiment, we can use a larger overlapping area between two octants when the octants have greater luma values. This is based on the observation that human eyes are usually more sensitive to bright regions in the image. As another example, the size of the overlapping area could be varied with the perceptual importance of the corresponding image region.
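As a toy illustration of varying the overlap size with luma, under the assumption of a simple linear ramp and purely illustrative constants:

```python
def overlap_size_for_luma(octant_mean_luma, base=4, max_extra=8, max_value=255):
    """Grow the overlapping area with the luma level of the octant, since
    bright regions tend to be perceptually more sensitive.  The linear ramp
    and the constants are assumptions, not specified values."""
    return base + int(max_extra * octant_mean_luma / max_value)
```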
In the above, we discussed using the Gain-Offset model. The present principles can also be applied when other models are used.
The system 1300 may include at least one processor 1310 configured to execute instructions loaded therein for implementing the various processes as discussed above. Processor 1310 may include embedded memory, input output interface and various other circuitries as known in the art. The system 1300 may also include at least one memory 1320 (e.g., a volatile memory device, a non-volatile memory device). System 1300 may additionally include a storage device 1340, which may include non-volatile memory, including, but not limited to, EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, magnetic disk drive, and/or optical disk drive. The storage device 1340 may comprise an internal storage device, an attached storage device and/or a network accessible storage device, as non-limiting examples. System 1300 may also include an encoder/decoder module 1330 configured to process data to provide an encoded video or decoded video.
Encoder/decoder module 1330 represents the module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 1330 may be implemented as a separate element of system 1300 or may be incorporated within processors 1310 as a combination of hardware and software as known to those skilled in the art.
Program code to be loaded onto processors 1310 to perform the various processes described hereinabove may be stored in storage device 1340 and subsequently loaded onto memory 1320 for execution by processors 1310. In accordance with the exemplary embodiments of the present principles, one or more of the processor(s) 1310, memory 1320, storage device 1340 and encoder/decoder module 1330 may store one or more of the various items during the performance of the processes discussed hereinabove, including, but not limited to, the base layer input video, the enhancement layer input video, equations, formulas, matrices, variables, operations, and operational logic.
The system 1300 may also include communication interface 1350 that enables communication with other devices via communication channel 1360. The communication interface 1350 may include, but is not limited to, a transceiver configured to transmit and receive data from communication channel 1360. The communication interface may include, but is not limited to, a modem or network card, and the communication channel may be implemented within a wired and/or wireless medium. The various components of system 1300 may be connected or communicatively coupled together using various suitable connections, including, but not limited to, internal buses, wires, and printed circuit boards.
The exemplary embodiments according to the present principles may be carried out by computer software implemented by the processor 1310 or by hardware, or by a combination of hardware and software. As a non-limiting example, the exemplary embodiments according to the present principles may be implemented by one or more integrated circuits. The memory 1320 may be of any type appropriate to the technical environment and may be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory and removable memory, as non-limiting examples. The processor 1310 may be of any type appropriate to the technical environment, and may encompass one or more of microprocessors, general purpose computers, special purpose computers and processors based on a multi-core architecture, as non-limiting examples.
Referring to
The data transmission system 1400 receives processed data and other information from a processor 1401. In one implementation, the processor 1401 generates color mapping function parameters, for example, using method 400, 800, or 1100. The processor 1401 may also provide metadata to the data transmission system 1400 indicating, for example, the partitioning of the color space.
The data transmission system or apparatus 1400 includes an encoder 1402 and a transmitter 1404 capable of transmitting the encoded signal. The encoder 1402 receives data information from the processor 1401 and generates one or more encoded signals.
The encoder 1402 may include sub-modules, including for example an assembly unit for receiving and assembling various pieces of information into a structured format for storage or transmission. The various pieces of information may include, for example, coded or uncoded video, and coded or uncoded elements. In some implementations, the encoder 1402 includes the processor 1401 and therefore performs the operations of the processor 1401.
The transmitter 1404 receives the encoded signal(s) from the encoder 1402 and transmits the encoded signal(s) in one or more output signals. The transmitter 1404 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers using a modulator 1406. The transmitter 1404 may include, or interface with, an antenna (not shown). Further, implementations of the transmitter 1404 may be limited to the modulator 1406.
The data transmission system 1400 is also communicatively coupled to a storage unit 1408. In one implementation, the storage unit 1408 is coupled to the encoder 1402, and stores an encoded bitstream from the encoder 1402. In another implementation, the storage unit 1408 is coupled to the transmitter 1404, and stores a bitstream from the transmitter 1404. The bitstream from the transmitter 1404 may include, for example, one or more encoded bitstreams that have been further processed by the transmitter 1404. The storage unit 1408 is, in different implementations, one or more of a standard DVD, a Blu-Ray disc, a hard drive, or some other storage device.
Referring to
The data receiving system 1500 may be, for example, a cell phone, a computer, a set-top box, a television, or another device that receives encoded video and provides, for example, a decoded video signal for display (display to a user, for example), for processing, or for storage. Thus, the data receiving system 1500 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
The data receiving system 1500 is capable of receiving and processing data information. The data receiving system or apparatus 1500 includes a receiver 1502 for receiving an encoded signal, such as, for example, the signals described in the implementations of this application. The receiver 1502 may receive, for example, a signal providing a bitstream, or a signal output from the data transmission system 1400 of
The receiver 1502 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers using a demodulator 1504, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The receiver 1502 may include, or interface with, an antenna (not shown). Implementations of the receiver 1502 may be limited to the demodulator 1504.
The data receiving system 1500 includes a decoder 1506. The receiver 1502 provides a received signal to the decoder 1506. The signal provided to the decoder 1506 by the receiver 1502 may include one or more encoded bitstreams. The decoder 1506 outputs a decoded signal, such as, for example, decoded video signals including video information.
The data receiving system or apparatus 1500 is also communicatively coupled to a storage unit 1507. In one implementation, the storage unit 1507 is coupled to the receiver 1502, and the receiver 1502 accesses a bitstream from the storage unit 1507. In another implementation, the storage unit 1507 is coupled to the decoder 1506, and the decoder 1506 accesses a bitstream from the storage unit 1507. The bitstream accessed from the storage unit 1507 includes, in different implementations, one or more encoded bitstreams. The storage unit 1507 is, in different implementations, one or more of a standard DVD, a Blu-Ray disc, a hard drive, or some other storage device.
The output data from the decoder 1506 is provided, in one implementation, to a processor 1508. The processor 1508 is, in one implementation, a processor configured for performing post-processing. In some implementations, the decoder 1506 includes the processor 1508 and therefore performs the operations of the processor 1508. In other implementations, the processor 1508 is part of a downstream device such as, for example, a set-top box or a television.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation” of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
Additionally, this application or its claims may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application or its claims may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application or its claims may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
Foreign Application Priority Data: EP 15305447.3, March 2015 (regional); EP 15305507.4, April 2015 (regional).