The present invention relates to video coding systems and, more particularly, to three dimensional (3D) image coding and decoding systems.
Television programming is becoming more widely available in 3D. Sporting events and concerts have been broadcast for home consumption. As 3D component sales ramp up and as the demand for 3D grows, it is expected that 3D programming will be offered widely on most of the popular TV channels in the near future.
In order to facilitate new video applications such as 3D television and free-viewpoint video (FVV), 3D video data formats consisting of both conventional 2D video and depth—generally referred to as “2D data”—can be utilized such that additional views can be rendered for the end user or viewer. There are a number of different 3D video formats including, for example: 2D plus depth (2D+Z), Layered Depth Video (LDV), Multiview plus Depth (MVD), Disparity Enhanced Stereo (DES), and Layer Depth Video plus Right View (LDV+R), to name a few. The 2D plus depth (2D+Z) format consists of a 2D video element and its corresponding depth map. The Layered Depth Video (LDV) format includes the 2D+Z format elements and occlusion video together with occlusion depth. The Multiview plus Depth (MVD) format consists of a set of multiple 2D+Z formatted elements, each 2D+Z formatted element related to a different viewpoint. The Disparity Enhanced Stereo (DES) format is composed of two LDV formatted elements, wherein each LDV formatted element is related to one of two different viewpoints. The Layer Depth Video plus Right View (LDV+R) format is composed of one LDV formatted element from a left view and the 2D video element from a right view.
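To make the relationships among these formats concrete, the following sketch models them as simple container types. This is an illustrative sketch only; the class and field names are assumptions made for this description and are not taken from any standard.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Frame2DPlusZ:
    """2D+Z: a conventional 2D video frame and its corresponding depth map."""
    video: np.ndarray   # H x W x 3 texture samples
    depth: np.ndarray   # H x W depth samples

@dataclass
class FrameLDV:
    """LDV: the 2D+Z elements plus occlusion video and occlusion depth."""
    video: np.ndarray
    depth: np.ndarray
    occlusion_video: np.ndarray
    occlusion_depth: np.ndarray

@dataclass
class FrameMVD:
    """MVD: a set of 2D+Z elements, one per viewpoint."""
    views: List[Frame2DPlusZ] = field(default_factory=list)

@dataclass
class FrameDES:
    """DES: two LDV elements, one per viewpoint."""
    left: FrameLDV
    right: FrameLDV

@dataclass
class FrameLDVPlusR:
    """LDV+R: one LDV element for the left view plus the right view's 2D video."""
    left: FrameLDV
    right_video: np.ndarray
```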
Coding has been used to protect the data in these various formats as well as to gain possible transmission or even processing efficiencies. Coding, as the term is contemplated for use herein, should be understood to encompass encoding and decoding operations. It is typically a challenging task to code 3D content, which usually involves multiple views and possibly corresponding depth maps as well. Each frame of 3D content may require the system to handle a huge amount of data. Although the coding of such formatted data remains a subject of ongoing research, at least one framework for encoding and decoding much of the 3D video content in these formats is known to have been presented in PCT Application No. PCT/US2010/001286, which has been identified above. Nonetheless, it appears that most coding efforts are directed primarily toward the actual video or textural information as opposed to supplemental data such as depth and occlusion data.
Occlusion data, either occlusion video or occlusion depth, is not directly viewed by, or presented to, an end user viewing a TV display. Instead, it is used for virtual view rendering purposes by a receiver. Occlusion data exhibits different characteristics from normal video or depth information. It typically contains pixel values (i.e., for occlusion video data) or depth values (i.e., for occlusion depth data) that are invisible from a TV viewer's observation point. No techniques are presently known for efficiently handling and coding occlusion data, in spite of the fact that occlusion data had surfaced in the LDV format within the MPEG 3DV Ad Hoc group at least as early as 2008.
Some coding experiments on the LDV format were performed using multi-view video coding (MVC), in which the occlusion data are treated as a normal 2D view. However, this approach is not an efficient way to handle occlusion video data and occlusion depth data.
Limitations in transmission bandwidth, storage capacity, and processing capacity, for example, in the face of growing demand for affordable 3D content will continue to underscore the need for greater efficiency throughout the 3D system. Yet, none of the techniques known in the art are suitable for coding occlusion data efficiently. Hence, a more efficient coding technique for occlusion data, including both occlusion video data and occlusion depth data, appears to be needed in order to provide greater system efficiencies in the processing, storage, and transmission of 3D content.
The coding treatment for occlusion data so far appears to ignore the fact that occlusion data is referenced infrequently in the rendering process, if at all, and that only small areas in a frame of occlusion data are typically used at any single point in the rendering process. Typically, the occlusion video is referenced when holes are observed after a view has been warped to a virtual position. Even then, reference is only made to one or more small areas of the occlusion video corresponding to the position of the holes in the warped view. A similar rationale applies to use of occlusion depth. These observations are then useful in developing an efficient coding strategy for the occlusion data.
In accordance with the principles of the present invention, coding methods for occlusion layers, such as occlusion video data and occlusion depth data in 3D video, are directed to improving the transmission and processing efficiency in systems handling this data. These coding methods for occlusion data include: indication of occlusion format; conversion of all occlusion data into a sparse data format; filling non-occlusion areas or macroblocks with a defined characteristic, such as a single color; rearranging the placement of the 2D data within the reference picture list; the use of proximity to depth boundaries to detect occlusion and non-occlusion areas or macroblocks; the use of skip mode coding for non-occlusion areas or macroblocks; the use of rate distortion cost for coding occlusion area macroblocks; and the coding of a single occlusion frame while skipping the next n−1 occlusion frames. Each of these techniques, whether applied separately or in combination, affords improved and even significantly enhanced coding and transmission gains for the overall bitstreams of 3D data.
According to an aspect of the present principles, there is provided a method for processing occlusion data in a sequence of video data frames, the method includes the steps of: determining a format for the occlusion data, the format selected from one of a sparse occlusion data format and a filled occlusion data format; when the format for the occlusion data is determined to be the filled occlusion data format, converting the occlusion data into a sparse occlusion data format before encoding; encoding the occlusion data to produce encoded occlusion data; and outputting the encoded occlusion data together with an indicator representative of the format determined for the occlusion data.
According to another aspect of the present principles, there is provided an apparatus for processing occlusion data in a sequence of video data frames, the apparatus includes an encoder for: determining a format for the occlusion data, the format selected from one of a sparse occlusion data format and a filled occlusion data format; when the format for the occlusion data is determined to be the filled occlusion data format, converting the occlusion data into a sparse occlusion data format before encoding; encoding the occlusion data to produce encoded occlusion data; and outputting the encoded occlusion data together with an indicator representative of the format determined for the occlusion data.
According to another aspect of the present principles, there is provided a method for processing occlusion data in a sequence of video data frames, the method includes the steps of: extracting an indicator representative of an original format for received occlusion data, the original format selected from one of a sparse occlusion data format and a filled occlusion data format; decoding the received occlusion data to produce decoded occlusion data; and when the indicator indicates the original format as a filled occlusion data format, converting the decoded occlusion data from a sparse occlusion data format to the filled occlusion data format, the converting further includes the step of replacing non-occlusion area data, which is represented with a defined characteristic, by respective collocated samples from 2D data in the video data frame associated with the occlusion data; outputting the decoded occlusion data and, when present, converted decoded occlusion data.
According to another aspect of the present principles, there is provided an apparatus for processing occlusion data in a sequence of video data frames, the apparatus includes a decoder for: extracting an indicator representative of an original format for received occlusion data, the original format selected from one of a sparse occlusion data format and a filled occlusion data format; decoding the received occlusion data to produce decoded occlusion data; and when the indicator indicates the original format as a filled occlusion data format, converting the decoded occlusion data from a sparse occlusion data format to the filled occlusion data format, the converting further includes replacing non-occlusion area data, which is represented with a defined characteristic, by respective collocated samples from 2D data in the video data frame associated with the occlusion data; outputting the decoded occlusion data and, when present, converted decoded occlusion data.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as an apparatus configured to perform a set of operations, or embodied as an apparatus storing instructions for performing a set of operations. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
The above-mentioned and other features and advantages of this invention, and the manner of attaining them, will become more apparent and the invention will be better understood by reference to the following description of embodiments of the invention taken in conjunction with the accompanying drawings, wherein:
The exemplary embodiments set out herein illustrate preferred embodiments of the invention, and such exemplary embodiments are not to be construed as limiting the scope of the invention in any manner.
Coding methods for occlusion layers, such as occlusion video data and occlusion depth data, are described herein that are directed to improving the transmission and processing efficiency of systems handling this data. Several improved coding techniques are disclosed. Additionally, the description also includes information about syntaxes for inclusion in frame headers or overhead messages to communicate details about the actual type of occlusion data and other information useful in the practice of the present invention.
It is intended that the encoding and decoding techniques described herein are applicable to occlusion data, in general, whether that data is occlusion depth data or occlusion video data, unless one specific kind of occlusion data is expressly specified. Moreover, it is also intended that the encoding and decoding techniques described herein are applicable to any format of the occlusion data, in general, whether that data format is sparse or filled, unless one specific type of occlusion data format is expressly specified. It is important to describe certain terms so that they are properly understood in the context of this application. Certain useful terms are defined below as follows:
“2D data” includes one or both of the 2D video data and depth data, wherein the term “data” can be used interchangeably with the term “layer”.
A “2D video” layer is generally used herein to refer to the traditional video signal.
A “depth” layer is generally used herein to refer to data that indicates distance information for the scene objects.
A “depth map” is a typical example of a depth layer.
An “occlusion video” layer is generally used herein to refer to video information that is occluded from a certain viewpoint. The occlusion video layer typically includes background information for the 2D video layer.
An “occlusion depth” layer is generally used herein to refer to depth information that is occluded from a certain viewpoint. The occlusion depth layer typically includes background information for the depth layer.
A “transparency” layer is generally used herein to refer to a picture that indicates depth discontinuities or depth boundaries. A typical transparency layer has binary information, with one of the two values indicating positions for which the depth has a discontinuity, with respect to neighboring depth values, greater than a particular threshold.
A “3DV view” is defined herein as a data set from one view position, which is different from the “view” used in MVC. For example, a 3DV view may include more data than the view in MVC. For the 2D+Z format, a 3DV view may include two layers: 2D video plus its depth map. For the LDV format, a 3DV view may include four layers: 2D video, depth map, occlusion video, and occlusion depth map. In addition, a transparency map can be another layer data type within a 3DV view, among others.
A “3DV layer” is defined as one of the layers of a 3DV view. Examples of 3DV layers are, for example, 2D view or video, depth, occlusion video, occlusion depth, and transparency map. Layers other than 2D view or video are also defined as “3DV supplemental layers”. In one or more embodiments, a 3DV decoder can be configured to identify a layer and distinguish that layer from others using a 3dv_layer_id. In one implementation, 3dv_layer_id is defined as in Table 1. However, it should be noted that the layers may be defined and identified in other ways, as understood by those of ordinary skill in the art in view of the teachings provided herein.
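As a purely illustrative sketch of such an identifier, the enumeration below assigns hypothetical numeric values to the 3dv_layer_id; the actual assignment is given by Table 1 (not reproduced here) and may differ.

```python
from enum import IntEnum

class ThreeDVLayerId(IntEnum):
    """Hypothetical 3dv_layer_id values; the actual mapping is given by Table 1."""
    VIDEO_2D = 0
    DEPTH = 1
    OCCLUSION_VIDEO = 2
    OCCLUSION_DEPTH = 3
    TRANSPARENCY = 4

def is_supplemental_layer(layer_id: ThreeDVLayerId) -> bool:
    # Every layer other than the 2D view/video layer is a 3DV supplemental layer.
    return layer_id != ThreeDVLayerId.VIDEO_2D
```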
In a generic 3DV coder/decoder (codec) framework such as the one described in PCT Application No. PCT/US2010/001286, as identified above, occlusion video and occlusion depth are treated in a specific 3DV layer making it possible to design new or additional coding modes. In the present description, the 3DV codec framework from
An enhanced 2D layer is generally used herein to distinguish such a layer from a layer that is compatible with AVC, MVC, SVC, or some other underlying standard. For example, enhanced 2D layers are typically not compatible with MVC because such layers allow new coding tools, such as, for example, using inter-layer references. Such layers are, therefore, generally not backward compatible with MVC.
Note that the term “enhanced 2D layer” (or “supplemental layer”) may also be used to refer to layers that could be coded with MVC, but which would not be expected to be displayed and so are not typically described as being coded with MVC. For example, a series of depth layers could be treated by MVC as a series of pictures and could be coded by MVC. However, it is not typical to display depth layers, so it is often desirable to have a different way of identifying and coding such layers, other than by using MVC.
Each layer can also use a different reference. The reference may be from a different layer than the picture/block being encoded (decoded). The references from different layers may be obtained from a 3DV Reference Buffer 316 (3DV Reference/Output Buffer 414). As shown in
By utilizing the 3DV Reference Buffer 316, each layer of the 3DV format can be encoded using references from its own layer, such as, for example, temporal references and/or inter-view references within the same layer with motion and/or disparity compensation, and/or using inter-layer prediction between the various layers. For example, an inter-layer prediction may reuse motion information, such as, for example, motion vector, reference index, etc., from another layer to encode the current layer, also referred to as motion skip mode. In this way, the output signal 318 may be interleaved with various layer information for one or more 3DV views. The inter-layer prediction may use any kind of technique that is based on access to the other layers.
With regard to the decoder system/apparatus 400, system 400 includes various layer decoders to which signal 318 may be input as shown in
In addition, the 3DV reference/output buffer 414 can be configured to generate an output signal 416 in a 3DV compatible format for presentation to a user. The formatted 3DV content signal 416 may, of course, include, for example, 2D view, depth, occlusion view, occlusion depth, and transparency map layers. The output buffer may be implemented together with the reference buffer, as shown in
Other implementations of the encoder 300 and the decoder 400 may use more or fewer layers. Additionally, different layers than those shown may be used. It should be clear that the term “buffer”, as used in the 3DV Reference Buffer 316 and in the 3DV Reference/Output Buffer 414, is an intelligent buffer. Such buffers may be used, for example, to store pictures, to provide references (or portions of references), and to reorder pictures for output. Additionally, such buffers may be used, for example, to perform various other processing operations such as, for example, hypothetical reference decoder testing, processing of marking commands (for example, memory management control operations in AVC), and decoded picture buffer management.
It should be noted that with regard to an MVC encoder, the input is composed of multiple views. Each view is a traditional 2D video. Thus, compared to an AVC encoder, the typical MVC encoder includes additional blocks such as a disparity estimation block, a disparity compensation block, and an inter-view reference buffer. Analogously,
With regard to the high level diagram illustrated in
For example, if a mode decision module 536 in signal communication with the switch 512 determines that the encoding mode should be intra-prediction with reference to the same block or slice currently being encoded, then the adder receives its input from intra-prediction module 530. Alternatively, if the mode decision module 536 determines that the encoding mode should be displacement compensation and estimation with reference to a block or slice, of the same frame or 3DV view or 3DV layer currently being processed or of another previously processed frame or 3DV view or 3DV layer, that is different from the block or slice currently being encoded, then the adder receives its input from displacement compensation module 508, as shown in
The adder 506 provides a signal including 3DV layer(s) and prediction, compensation, and/or estimation information to the transform module 514, which is configured to transform its input signal and provide the transformed signal to quantization module 516. The quantization module 516 is configured to perform quantization on its received signal and output the quantized information to an entropy encoder 518. The entropy encoder 518 is configured to perform entropy encoding on its input signal to generate bitstream 520. The inverse quantization module 522 is configured to receive the quantized signal from quantization module 516 and perform inverse quantization on the quantized signal. In turn, the inverse transform module 524 is configured to receive the inverse quantized signal from module 522 and perform an inverse transform on its received signal. Modules 522 and 524 recreate or reconstruct the signal output from adder 506.
The adder or combiner 526 adds (combines) signals received from the inverse transform module 524 and the switch 512 and outputs the resulting signals to intra prediction module 530 and deblocking filter 528. Further, the intra prediction module 530 performs intra-prediction, as discussed above, using its received signals. Similarly, the deblocking filter 528 filters the signals received from adder 526 and provides filtered signals to 3DV reference buffer 532.
The 3DV reference buffer 532, in turn, parses its received signal. The 3DV reference buffer 532 aids in inter-layer and displacement compensation/estimation encoding, as discussed above, by elements 534, 508, and 510. The 3DV reference buffer 532 provides, for example, all or part of various 3DV layers.
Adder 612 can receive one of a variety of other signals depending on the decoding mode employed. For example, the mode decision module 622 can determine whether 3DV inter-layer prediction, displacement compensation, or intra prediction encoding was performed on the currently processed block by the encoder 500 by parsing and analyzing the control syntax elements 607. Depending on the determined mode, mode selection control module 622 can access and control switch 623, based on the control syntax elements 607, so that the adder 612 can receive signals from the 3DV inter-layer prediction module 620, the displacement compensation module 618 or the intra prediction module 614.
Here, the intra prediction module 614 can be configured to, for example, perform intra prediction to decode a block or slice using references to the same block or slice currently being decoded. In turn, the displacement compensation module 618 can be configured to, for example, perform displacement compensation to decode a block or a slice using references to a block or slice, of the same frame or 3DV view or 3DV layer currently being processed or of another previously processed frame or 3DV View or 3DV layer, that is different from the block or slice currently being decoded. Further, the 3DV inter-layer prediction module 620 can be configured to, for example, perform 3DV inter-layer prediction to decode a block or slice using references to a 3DV layer, of the same frame or 3DV view currently processed or of another previously processed frame or 3DV view, that is different from the layer currently being processed.
After receiving prediction or compensation information signals, the adder 612 can add the prediction or compensation information signals with the inverse transformed signal for transmission to a deblocking filter 602. The deblocking filter 602 can be configured to filter its input signal and output decoded pictures. The adder 612 can also output the added signal to the intra prediction module 614 for use in intra prediction. Further, the deblocking filter 602 can transmit the filtered signal to the 3DV reference buffer 616. The 3DV reference buffer 616 can be configured to parse its received signal to permit and aid in inter-layer and displacement compensation decoding, as discussed above, by elements 618 and 620, to each of which the 3DV reference buffer 616 provides parsed signals. Such parsed signals may be, for example, all or part of various 3DV layers.
It should be understood that systems/apparatuses 300, 400, 500, and 600 can be configured differently and can include different elements as understood by those of ordinary skill in the art in view of the teachings disclosed herein.
Occlusion data plays a key role in Layered Depth Video (LDV) format.
For the purpose of rendering video for a viewer, it should be understood that the sparse occlusion data is considered to be equivalent to the counterpart filled occlusion data because the non-occlusion area is generally not referred to in 3D warping and hole filling operations at all. So it is possible to encode either the filled occlusion data or the sparse occlusion data in the LDV format without any confusion or loss of generality.
Sparse and filled occlusion data are equivalent to each other and interchangeable in terms of rendering. However, a rendering process may need to know whether a pixel belongs to an occlusion area or a non-occlusion area, such as when performing a hole filling process in rendering. In such a case, when a hole pixel resides in an occlusion area, the occlusion data can be used to fill the hole pixel. Otherwise, neighboring background pixels can be used to fill the hole pixel.
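The following sketch illustrates how a renderer might use that knowledge during hole filling. It is a minimal example assuming the occlusion layer is accompanied by an occlusion-area mask, and the left-neighbor background fallback is a deliberate simplification rather than part of any specified rendering process.

```python
import numpy as np

def fill_holes(warped, hole_mask, occlusion_video, occlusion_mask):
    """Fill holes exposed by warping a view to a virtual position.

    warped:          H x W x 3 warped view containing holes
    hole_mask:       H x W bool, True where the warped view has no sample
    occlusion_video: H x W x 3 occlusion layer (sparse or filled)
    occlusion_mask:  H x W bool, True where the occlusion layer carries occlusion data
    """
    out = warped.copy()
    for y, x in zip(*np.nonzero(hole_mask)):
        if occlusion_mask[y, x]:
            # The hole lies in an occlusion area: take the occlusion-layer sample.
            out[y, x] = occlusion_video[y, x]
        else:
            # Otherwise fall back to a neighboring background pixel; scanning to the
            # left for the nearest non-hole sample is a deliberately simple choice.
            xx = x
            while xx > 0 and hole_mask[y, xx]:
                xx -= 1
            out[y, x] = warped[y, xx]
    return out
```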
As noted above, the indication of the occlusion format is useful at least in assisting the determination of occlusion area or non-occlusion area. An indication of occlusion data format can be included in a high level syntax for the 3D video signal. As used herein, “high level syntax” refers to syntax present in the bitstream that resides hierarchically above the macroblock layer. For example, high level syntax, as used herein, may refer to, but is not limited to, syntax at the slice header level, Supplemental Enhancement Information (SEI) level, Picture Parameter Set (PPS) level, Sequence Parameter Set (SPS) level, View Parameter Set (VPS) level, and Network Abstraction Layer (NAL) unit header level. Table 2 presents an example of a modified SPS to include such an indicator flag, where the extended SPS for 3DV sequences is employed as an example.
The semantics for all the shaded entries in Table 2 above have been completely described in the commonly owned and co-pending PCT Application No. PCT/US2010/001286 on at least pages 50-55 with respect to Table 13 therein. The semantics of the remaining entry occlusion_data_format are as follows:
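Since Table 2 is not reproduced here, the sketch below only illustrates the general idea of carrying a one-bit occlusion_data_format indicator in a high level syntax structure. The BitWriter/BitReader helpers and the value assignment (0 for sparse, 1 for filled) are assumptions made for this illustration, not the actual extended-SPS syntax.

```python
SPARSE, FILLED = 0, 1  # assumed value assignment for occlusion_data_format

class BitWriter:
    """Toy bit writer used only for this illustration."""
    def __init__(self):
        self.bits = []
    def write_flag(self, value):
        self.bits.append(1 if value else 0)

class BitReader:
    """Toy bit reader matching the writer above."""
    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0
    def read_flag(self):
        value = self.bits[self.pos]
        self.pos += 1
        return value

def write_occlusion_format(bw: BitWriter, occlusion_data_format: int):
    # ... other extended-SPS fields would be written before/after this flag ...
    bw.write_flag(occlusion_data_format == FILLED)

def read_occlusion_format(br: BitReader) -> int:
    # ... other extended-SPS fields would be parsed before/after this flag ...
    return FILLED if br.read_flag() else SPARSE
```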
The encoding method for this embodiment proceeds as follows.
In step S603, the sparse occlusion data is encoded using a standard video encoding technique to produce encoded occlusion data. Standard video encoding techniques include, but are not limited to, Multiview Video Coding (MVC), H.264/Advanced Video Coding (AVC), and MPEG coding including at least MPEG-2. These coding techniques are standardized and are understood to be well known to persons of ordinary skill in this technical field. No further description of these techniques will be presented herein. Control is transferred to step S605.
In step S605, the bitstream is prepared for transmission. The bitstream includes the encoded occlusion data together with the indicator of occlusion data format (i.e., the indication of sparse or filled) for the originally received occlusion data. Control is transferred to step S606, where the encoding method ends.
In step S604, the received occlusion data is processed to change the occlusion data format from a filled format to a sparse format. When occlusion data is represented in the sparse format, each non-occlusion area is represented by a defined characteristic, such as a defined color or data value. This is accomplished by replacing data samples in the non-occlusion area by a defined characteristic such as a defined color or a defined depth level, such that a sparse occlusion data format results. The process is similar to color keying techniques wherein a color in one image is used to reveal another image behind. The change in representation to a sparse occlusion data format is preferable to the converse (i.e., sparse format changed to filled format) because of efficiencies that arise from the standard coding techniques.
Efficiencies are obtained through conventional encoding because most of the non-occlusion area, which is uniformly represented with a certain defined color, can be coded in skip mode. In skip mode encoding, a macroblock is coded as a skipped macroblock, thereby reducing the amount of data in the encoded occlusion data output by the encoder. When skip mode coding is used, the decoder decodes the macroblock by referring to motion vectors of the surrounding macroblocks and/or partitions within surrounding macroblocks. Skip mode coding is understood to be well known to persons of ordinary skill in this technical field. No further description of this coding technique will be presented herein. Control is then transferred to step S603.
As part of this conversion, it is necessary to identify at least one occlusion area and at least one non-occlusion area in the occlusion data. These areas will be mutually exclusive of each other. Identification allows the non-occlusion areas to be filled with a defined characteristic, such as the defined color.
One exemplary technique for performing such an identification of occlusion or non-occlusion areas includes the use of the depth data, which is from the same frame as the occlusion data, for detecting one or more depth discontinuities in the video data frame associated with the occlusion data. The area along each detected depth discontinuity is then classified as an occlusion area in the occlusion data. Other techniques may be utilized to perform the detection and/or classification described herein.
In another exemplary technique, the video data is input together with the filled occlusion data. Non-occlusion areas are exposed by calculating the difference frame between the video frame and the filled occlusion video frame. Samples in a non-occlusion area will have a value of zero or close to zero within the difference frame.
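A minimal sketch of this difference-frame identification, combined with the color-keying-style conversion of step S604, is shown below; the particular key color and difference threshold are assumptions chosen for illustration.

```python
import numpy as np

# Assumed "defined characteristic" for non-occlusion samples in the sparse format:
# a single key color (mid-gray here); any reserved color or depth level would do.
KEY_COLOR = np.array([128, 128, 128], dtype=np.uint8)

def non_occlusion_mask(video, filled_occlusion, threshold=2):
    """Samples whose difference to the 2D video frame is zero or near zero belong
    to the non-occlusion area of a filled occlusion frame."""
    diff = np.abs(video.astype(np.int16) - filled_occlusion.astype(np.int16))
    return diff.max(axis=2) <= threshold

def filled_to_sparse(video, filled_occlusion, threshold=2):
    """Convert filled occlusion video to the sparse format (step S604) by replacing
    every non-occlusion sample with the defined key color."""
    sparse = filled_occlusion.copy()
    sparse[non_occlusion_mask(video, filled_occlusion, threshold)] = KEY_COLOR
    return sparse
```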
The decoding method for this embodiment proceeds as follows.
In step S703, the sparse occlusion data is decoded using a standard video decoding technique to produce decoded occlusion data. Standard video decoding techniques include, but are not limited to, Multiview Video Coding (MVC), H.264/Advanced Video Coding (AVC), and MPEG coding including at least MPEG-2. Control is transferred to step S704.
In step S704, a determination is made concerning the occlusion data format for the occlusion data originally received at the encoder. This determination is based at least in part on the flag or indicator extracted in step S702. When the indicator indicates that sparse occlusion data was originally received by the encoder, the decoded occlusion data is already in the desired sparse format and control is transferred to step S705. When the indicator indicates that filled occlusion data was originally received by the encoder, control is transferred to step S706.
In step S705, the decoded occlusion data is output in either a sparse occlusion data format (from step S704) or a filled occlusion data format (from step S706). The method ends at step S707.
Step S706 is entered because it had been determined in step S704 that the occlusion data originally received by the encoder was in a filled occlusion data format as identified by the received flag or indicator extracted in step S702. As mentioned above, step S704 outputs decoded occlusion data in the sparse data format. In order to convert the sparse occlusion data format to the originally received filled occlusion data format, it is necessary to fill the non-occlusion area, identified by the defined characteristic such as the defined color, for example, with the collocated data sample in the corresponding video or depth component of the frame. When the occlusion data is the occlusion video, then the corresponding video component from the same frame is used for filling the non-occlusion area data samples in the decoded occlusion data. Similarly, when the occlusion data is the occlusion depth component, then the corresponding depth component from the same frame is used for filling the non-occlusion area data samples in the decoded occlusion data. When the decoded occlusion data is converted back into the proper originally received format, control is transferred to step S705.
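A sketch of this sparse-to-filled conversion for the occlusion video case is shown below (occlusion depth is handled analogously with a defined depth level); the key color is the same assumed defined characteristic used in the encoder sketch above.

```python
import numpy as np

KEY_COLOR = np.array([128, 128, 128], dtype=np.uint8)  # same assumed key color as at the encoder

def sparse_to_filled(decoded_occlusion, collocated_2d):
    """Restore the filled occlusion format at the decoder (step S706): every sample
    carrying the defined characteristic is replaced by the collocated sample of the
    corresponding 2D component (video shown here) of the same frame."""
    non_occlusion = np.all(decoded_occlusion == KEY_COLOR, axis=-1)
    filled = decoded_occlusion.copy()
    filled[non_occlusion] = collocated_2d[non_occlusion]
    return filled
```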
In another embodiment of the present invention, the location of the occlusion data, which can be either occlusion video or occlusion depth, is changed in the reference picture list. Construction of the reference picture list typically appends the inter-layer reference pictures after the temporal pictures and the inter-view reference pictures in the reference picture list. Examples of various reference picture lists are described in PCT Application No. PCT/US2010/001291, which has been identified above. In this regard, see also commonly owned U.S. Patent Application Serial No. 2010/0118933 for Pandit et al. In the present invention, when encoding occlusion data, the reference picture from the video layer is positioned at location 0 in the reference picture list. In other words, when encoding occlusion data, the 2D data having the same timestamp (i.e., the same video frame) is placed at location 0 in the reference picture list.
When occlusion data is encoded using this reordered reference picture list, it is possible to obtain some coding efficiency in dealing with the blocks in the non-occlusion area. It should be noted that the encoding described herein can be applied to either the occlusion video data or the occlusion depth data and that data can be in either a sparse occlusion data format or a filled occlusion data format. The coding efficiency is gained because skip mode encoding can be applied during encoding of the non-occlusion areas so that the depth or video data corresponding to the non-occlusion area(s) is directly copied without any further modification to the data in the non-occlusion area. This efficiency is made possible by having the non-occlusion area information immediately available from the occlusion video or depth data at location 0 in the reference picture list.
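The reordering can be sketched as follows; reference pictures are treated as opaque objects, and the function names are illustrative only.

```python
def conventional_reference_list(temporal_refs, inter_view_refs, inter_layer_refs):
    """Conventional construction: inter-layer references are appended after the
    temporal and inter-view references."""
    return list(temporal_refs) + list(inter_view_refs) + list(inter_layer_refs)

def occlusion_reference_list(temporal_refs, inter_view_refs, same_time_2d_ref):
    """Reordered construction for occlusion layers: the 2D data of the same
    timestamp is placed at location 0, so that a skip-coded non-occlusion block
    copies directly from it."""
    return [same_time_2d_ref] + list(temporal_refs) + list(inter_view_refs)
```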
Identification of the non-occlusion areas is achieved through any of the techniques discussed above in reference to step S604 in
For the decoder realized in accordance with this aspect of the present invention, data from the video reference frame is copied to the non-occlusion block. If a sparse occlusion data format is desired at the decoder, the copy process in the decoder is skipped and the decoder simply fills the block with the defined characteristic, such as the defined color described above.
The encoding method for this embodiment proceeds as follows.
In step S802, the reference picture list is arranged by placing the 2D data having the same timestamp at location 0. The term “2D data” is understood to include one or both of the 2D video data and the depth data. Control is then transferred to step S803.
It is to be understood that the preferred embodiment of the present invention is realized by processing the received occlusion data to change the occlusion data format from a filled format to a sparse format. This has been described above with respect to
In step S803, encoding of the data is performed. When the block of data being encoded is identified as being in a non-occlusion area, encoding is performed using skip mode encoding for that block. Otherwise, for a block of data identified as not being in a non-occlusion area (i.e., being in an occlusion area), the coding mode is selected on the conventional basis of rate distortion cost (RD cost). Control is then transferred to step S804.
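A sketch of this per-block mode decision is given below; the rd_cost callable is assumed to return the usual Lagrangian cost D + lambda * R for a candidate mode.

```python
def choose_block_mode(is_non_occlusion_block, candidate_modes, rd_cost):
    """Per-block mode decision of step S803 (sketch)."""
    if is_non_occlusion_block:
        # Non-occlusion blocks are coded in skip mode against the 2D reference
        # placed at location 0 of the reordered reference picture list.
        return "SKIP"
    # Occlusion-area blocks: choose the candidate mode with the lowest RD cost.
    return min(candidate_modes, key=rd_cost)
```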
In step S804, the bitstream is prepared for output transmission. The bitstream includes the encoded occlusion data together with the occlusion data format indicator or flag (i.e., the indication of sparse or filled) for the originally received occlusion data. This indicator has been described in detail above with respect to
The decoding method for this embodiment proceeds as follows.
In step S902, the reference picture list is again arranged by placing the 2D data having the same timestamp at location 0. As noted above, the term “2D data” is understood to include one or both of the 2D video data and the depth data. Control is then transferred to step S903.
In step S903, all macroblocks in the slice or picture are decoded in the conventional video decoding manner. Control is then transferred to step S904.
In step S904, on the basis of the indicator or flag received with the video data, one of two possible techniques is used for the occlusion data. When the indicator identifies the occlusion data format as sparse for the originally received occlusion data, the non-occlusion areas are filled with the defined characteristic, such as the defined color or defined depth value. When the indicator identifies the occlusion data format as filled for the originally received occlusion data, the non-occlusion areas are filled with the data samples from the corresponding portions of the 2D video. Control is then transferred to step S905, where the decoding method ends for this data.
It is recognized herein that, for the revised reference picture list construction described in the embodiment above, the reference picture index is not necessarily optimized for coding the occlusion blocks. This issue regarding optimization arises because blocks in occlusion areas are likely to use a temporal reference picture for best matching instead of an inter-layer reference picture. On the other hand, it is not necessarily good for the blocks in non-occlusion areas to put the inter-layer reference picture at the end of the reference picture list as is shown in PCT Application No. PCT/US2010/001291, identified above. Thus, the rearrangement of the reference picture list may not alone provide a completely suitable and effective solution for encoding/decoding blocks, both blocks associated with occlusion areas and blocks associated with non-occlusion areas.
Another embodiment of an encoder and decoder method for occlusion data involves the use of depth and the detection of depth boundaries. This embodiment is depicted in
In order to favor the coding of both the occlusion area blocks and the non-occlusion area blocks, for this embodiment of the present invention, the reference picture list is arranged by appending inter-layer reference pictures at the end of the reference picture list. Examples of such a reference picture list are described in PCT Application No. PCT/US2010/001291.
During the encoding process, boundary detection is performed on the reconstructed depth samples to determine the proximity of the current macroblock to a detected depth boundary, usually measured in pixels. The reconstructed depth samples are available usually at the output of deblocking filter 528 in the encoder of
If it is determined that a macroblock is within l pixels of a detected depth boundary, then this macroblock is marked as an occlusion area macroblock, and the encoding mode is selected using rate distortion (RD) cost as explained above. On the other hand, if it is determined that a macroblock is not within l pixels of a detected depth boundary, then the inter-layer skip encoding mode will be used to encode this macroblock.
In decoding, the blocks encoded via skip mode encoding utilize the depth data in the following way. The distance between the macroblock and the depth boundary is determined. For any macroblock that was skipped in the encoding process, when the distance from the macroblock to the nearest detected depth boundary is at or within (i.e., less than or equal to) a threshold of l pixels, that macroblock is identified as a temporally skipped block. Otherwise, when the distance from the skipped macroblock to the nearest detected depth boundary is greater than (i.e., beyond) the threshold of l pixels, that macroblock is identified as a non-occlusion area macroblock, and it is further deemed to be an inter-layer skipped macroblock.
Detection of the depth boundary is important to the operation of this codec embodiment. It is noted that the depth boundary should preferably be detected in the decoder using the same algorithm as was used in the encoder. Because the reconstructed depth samples are identical at the encoder and at the decoder, using the same detection algorithm ensures that both obtain identical boundary detection results. Depth boundary detection may be accomplished by any number of well known techniques. These well known techniques will not be described further herein.
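One simple way to realize the boundary detection and the distance test is sketched below. The gradient threshold, the macroblock size of 16, and the default value of l are assumptions, and any detector may be substituted as long as the encoder and the decoder use the same one.

```python
import numpy as np

def depth_boundary_mask(depth, grad_threshold=16):
    """Mark samples where the reconstructed depth changes abruptly between
    horizontal or vertical neighbors (a simple gradient test)."""
    d = depth.astype(np.int32)
    boundary = np.zeros(depth.shape, dtype=bool)
    boundary[:, :-1] |= np.abs(d[:, 1:] - d[:, :-1]) > grad_threshold
    boundary[:-1, :] |= np.abs(d[1:, :] - d[:-1, :]) > grad_threshold
    return boundary

def is_occlusion_macroblock(boundary, mb_y, mb_x, l=8, mb_size=16):
    """A macroblock is treated as an occlusion area macroblock when a detected
    depth boundary lies within l pixels of it."""
    y0 = max(0, mb_y * mb_size - l)
    x0 = max(0, mb_x * mb_size - l)
    y1 = (mb_y + 1) * mb_size + l
    x1 = (mb_x + 1) * mb_size + l
    return bool(boundary[y0:y1, x0:x1].any())
```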
The encoding method for this embodiment proceeds as follows.
In step S1003, one or more depth boundaries are detected from a reconstructed depth map. The distance from each macroblock to the closest depth boundary is measured. When the distance from a macroblock to its closest depth boundary is less than or equal to l pixels, the macroblock is marked as an occlusion area macroblock. Otherwise, the macroblock is a non-occlusion area macroblock. Since the mark or flag identifies the macroblock as being an occlusion area macroblock, the absence of the mark or flag automatically identifies the associated macroblock as being a non-occlusion area macroblock. It should be noted that a two state flag will suffice to identify each macroblock properly as either a non-occlusion area macroblock (e.g., flag=0) or an occlusion area macroblock (e.g., flag=1). Control is then transferred to step S1004.
In step S1004, the flag or mark for the macroblock is read. When the mark indicates that the macroblock is a non-occlusion area macroblock, the conventional skip mode encoding is used to encode the macroblock. When the mark indicates that the macroblock is an occlusion area macroblock, an encoding mode is selected and used based on conventional rate distortion cost (RD cost). Control is then transferred to step S1005.
In step S1005, the bitstream is prepared for output transmission. The bitstream includes the encoded occlusion data together with the occlusion data format indicator or flag (i.e., the indication of sparse or filled) for the originally received occlusion data. This indicator has been described in detail above with respect to
The decoding method for this embodiment proceeds as follows.
At step S1102, the reference picture list is arranged by placing the 2D data having the same timestamp after both the temporal and inter-view reference pictures in the reference picture list. Control is then transferred to step S1103.
In step S1103, just as in step S1002 for the encoding method, one or more depth boundaries are detected from a reconstructed depth map. The distance from each macroblock to the closest depth boundary is measured. When the distance from a macroblock to its closest depth boundary is less than or equal to l pixels, the macroblock is marked as an occlusion area macroblock. Otherwise, the macroblock is a non-occlusion area macroblock. Since the mark or flag identifies the macroblock as being an occlusion area macroblock, the absence of the mark or flag automatically identifies the associated macroblock as being a non-occlusion area macroblock. As described above with respect to
Macroblock decoding is then performed in step S1104. Decoding is performed initially on the basis of the indicators or flags received with the video data: one flag or mark indicating the macroblock as being a non-occlusion/occlusion area macroblock and the other indicator or flag identifying the occlusion data format as sparse or filled. First, all macroblocks in the slice or picture are decoded in the conventional video decoding manner, similar to the step S903 shown in
When a skipped macroblock is identified by one flag that indicates a non-occlusion area macroblock and the other indicator that identifies the occlusion data format as sparse for the originally received occlusion data, the non-occlusion areas are filled with the defined characteristic, such as the defined color or defined depth value. When a skipped macroblock is identified by one flag that indicates a non-occlusion area macroblock and the other indicator that identifies the occlusion data format as filled for the originally received occlusion data, the non-occlusion areas are filled with the data samples from the corresponding portions of the 2D video. For all other macroblocks, conventional decoding is used, as noted above. Control is then transferred to step S1105, where the decoding method ends for this data.
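The decoder-side handling of a skipped macroblock described above can be sketched as follows; the argument names and the defined value are illustrative assumptions.

```python
import numpy as np

def reconstruct_skipped_macroblock(decoded_mb, is_non_occlusion_mb,
                                   original_format_filled, collocated_2d_mb,
                                   defined_value):
    """Post-processing of a skipped macroblock in step S1104 (sketch)."""
    if not is_non_occlusion_mb:
        # Occlusion-area (temporally skipped) macroblocks keep their decoded samples.
        return decoded_mb
    if original_format_filled:
        # Filled format was originally received: restore the collocated 2D samples.
        return collocated_2d_mb.copy()
    # Sparse format was originally received: fill with the defined characteristic.
    out = np.empty_like(decoded_mb)
    out[...] = defined_value
    return out
```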
In this embodiment, it is expected that the occlusion frames are substantially identical or constant from one frame to the next over a defined period of time (or frames). On the encoder side, the occlusion data may be obtained by using one representative occlusion data frame. Alternatively, a number of consecutive occlusion data frames from a video scene may be merged in a combinatorial manner to realize the representative occlusion data frame. For both encoding and decoding, the representative occlusion data frame is then valid for a defined number of frames (i.e., period of time) until it is replaced by a new representative occlusion data frame. This method can be applied to either the occlusion video data or the occlusion depth data.
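One possible way to merge consecutive sparse occlusion frames into a representative frame is sketched below; the key color marking non-occlusion samples and the last-frame-wins merge rule are assumptions made for this illustration.

```python
import numpy as np

KEY_COLOR = np.array([128, 128, 128], dtype=np.uint8)  # assumed non-occlusion key color

def merge_representative(sparse_frames):
    """Combine consecutive sparse occlusion frames into one representative frame:
    wherever a frame carries real occlusion data (non-key samples), keep it, with
    later frames overwriting earlier ones where both have data."""
    representative = sparse_frames[0].copy()
    for frame in sparse_frames[1:]:
        has_data = ~np.all(frame == KEY_COLOR, axis=-1)
        representative[has_data] = frame[has_data]
    return representative
```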
In order to realize this technique, it is necessary to determine the number of frames n over which the representative occlusion data frame is valid until the next update. Additionally, it is necessary to include that number n in a syntax transmitted via a message from the encoder to the decoder so that the decoder can operate properly. While the frames over which the representative occlusion data frame is valid are generally intended to be consecutive, it is contemplated that the frames may even be non-consecutive under certain circumstances. For example, when two scenes are switched frequently, the occlusion data for one scene can be used for the frames related to that scene in the alternating scene sequence. Since those frames are alternated with frames from a second scene, the number n for the period actually covers non-consecutive frames.
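The period-n mechanism can be sketched as follows; the syntax element name occlusion_update_period and the encode_frame/decode_frame/send_message hooks are hypothetical placeholders for the actual codec and message transport.

```python
def encode_occlusion_sequence(occlusion_frames, n, encode_frame, send_message):
    """Encode one representative occlusion frame per period of n frames and skip
    the next n-1 frames."""
    for start in range(0, len(occlusion_frames), n):
        representative = occlusion_frames[start]      # or a merge of frames start .. start+n-1
        send_message({"occlusion_update_period": n})  # hypothetical syntax element name
        encode_frame(representative)
        # Frames start+1 .. start+n-1 are neither encoded nor transmitted.

def decode_occlusion_sequence(coded_units, decode_frame):
    """Each decoded representative frame remains valid for its own frame and the
    following n-1 frames in decoding order."""
    decoded = []
    for message, coded_frame in coded_units:
        n = message["occlusion_update_period"]
        representative = decode_frame(coded_frame)
        decoded.extend([representative] * n)
    return decoded
```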
In step S1202, the time period n is determined. This time period is generally expressed as an integer number of frames. It represents the period over which a single representative occlusion data frame (video or depth) is valid. Control is passed to step S1203.
In step S1203, the representative occlusion data frame is encoded. No encoding or transmission is performed on the next n−1 consecutive occlusion data frames. They are effectively skipped. The representative occlusion data frame may be one occlusion data frame selected from n consecutive occlusion data frames in time period n over which the representative occlusion data frame is valid. As noted above, the representative occlusion data frame may be a combination of the characteristics of two or more occlusion data frames selected from n consecutive occlusion data frames in time period n over which the representative occlusion data frame is valid. Control is passed to step S1204.
In step S1204, the encoded representative occlusion data frame is transmitted along with a syntax message indicating the period, n. Control is passed to step S1205.
In decision step S1205, it is determined whether the period n has expired so that a new representative occlusion data frame can be encoded to update and replace the current representative occlusion data frame. If the time period has expired and there is another representative occlusion data frame ready for encoding, then control is passed back to step S1202. If there are no more occlusion data frames ready for encoding, then control is passed to step S1206 where the process ends.
In this embodiment, a decoded occlusion frame will remain valid for its associated frame and all n−1 subsequent consecutive frames in decoding order until another representative occlusion frame is decoded to update and replace the prior representative occlusion frame.
The decoding process starts at step S1301, where control is passed to step S1302. In step S1302, the syntax message is decoded to determine the period n. Control is passed to step S1303.
In step S1303, the representative occlusion data frame is decoded. That representative occlusion data frame is then maintained as valid for period n, that is, for the next n−1 consecutive frames. Control is passed to step S1304.
In decision step S1304, it is determined whether the period n has expired so that a new representative occlusion data frame can be decoded to update and replace the current representative occlusion data frame. If the time period n has expired and there is another representative occlusion data frame ready for decoding, then control is passed back to step S1302. If there are no more occlusion data frames ready for decoding, then control is passed to step S1305 where the process ends.
The methods described herein are contemplated for use in computer processor based implementations, or on computer readable storage media, or in other apparatus such as the coding/decoding apparatus depicted in
The above descriptions and illustrations of the coding and decoding of occlusion data are exemplary of the various embodiments of the present invention. Certain modifications and variations, such as the use of different types of occlusion data, different orders of performing certain encoding or decoding steps, or even omitting one or more steps in a method, may also be used in the practice of the present invention.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present principles and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the present invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, including any elements developed at any time that perform the same function, regardless of structure.
A number of implementations have been described herein. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. In particular, although illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present principles. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.
This application claims the benefit, under 35 U.S.C. § 365 of International Application PCT/US2011/049877, filed Aug. 31, 2011, which was published in accordance with PCT Article 21(2) on Mar. 22, 2012 in English and which claims the benefit of U.S. provisional patent application No. 61/403,345, filed Sep. 14, 2010. The present application is related to the following co-pending, commonly owned patent applications: PCT Application No. PCT/US2010/001286 entitled “3D Video Coding Formats”, having an international filing date of Apr. 30, 2010; PCT Application No. PCT/US2010/001291 entitled “Reference Picture Lists for 3DV,” having an international filing date of Apr. 30, 2010; and PCT Application No. PCT/US2010/001292 entitled “Inter-Layer Dependency Information for 3DV”, having an international filing date of Apr. 30, 2010.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2011/049877 | 8/31/2011 | WO | 00 | 3/11/2013 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2012/036901 | 3/22/2012 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5327509 | Rich | Jul 1994 | A |
5892554 | DiCiccio et al. | Apr 1999 | A |
7058027 | Alessi et al. | Jun 2006 | B1 |
7671894 | Yea et al. | Mar 2010 | B2 |
8345751 | Klein Gunnewiek et al. | Jan 2013 | B2 |
8406525 | Ma et al. | Mar 2013 | B2 |
8571113 | Jeon et al. | Oct 2013 | B2 |
8854428 | Suh | Oct 2014 | B2 |
20020076080 | Hecht et al. | Jun 2002 | A1 |
20020077617 | Drevik | Jun 2002 | A1 |
20020176025 | Kim et al. | Nov 2002 | A1 |
20040119709 | Strom et al. | Jun 2004 | A1 |
20050094726 | Park | May 2005 | A1 |
20060146143 | Xin et al. | Jul 2006 | A1 |
20060233239 | Sethi et al. | Oct 2006 | A1 |
20070005795 | Gonzalez | Jan 2007 | A1 |
20070024614 | Tam et al. | Feb 2007 | A1 |
20070030356 | Yea et al. | Feb 2007 | A1 |
20070041442 | Novelo | Feb 2007 | A1 |
20070109409 | Yea et al. | May 2007 | A1 |
20070121722 | Martinian et al. | May 2007 | A1 |
20080117985 | Chen et al. | May 2008 | A1 |
20080253671 | Choi et al. | Oct 2008 | A1 |
20100020871 | Hannuksela | Jan 2010 | A1 |
20100020884 | Pandit et al. | Jan 2010 | A1 |
20100034260 | Shimizu et al. | Feb 2010 | A1 |
20100046635 | Pandit et al. | Feb 2010 | A1 |
20100091881 | Pandit et al. | Apr 2010 | A1 |
20100118942 | Pandit et al. | May 2010 | A1 |
20100165077 | Yin et al. | Jul 2010 | A1 |
20100195716 | Klein Gunnewiek et al. | Aug 2010 | A1 |
20100202535 | Fang et al. | Aug 2010 | A1 |
20100226444 | Thevathasan et al. | Sep 2010 | A1 |
20100231689 | Bruls | Sep 2010 | A1 |
20100284466 | Pandit et al. | Nov 2010 | A1 |
20110064302 | Ma et al. | Mar 2011 | A1 |
20110122131 | Bruls | May 2011 | A1 |
20110122230 | Boisson et al. | May 2011 | A1 |
20110142289 | Barenbrug et al. | Jun 2011 | A1 |
20110149037 | Van Der Horst et al. | Jun 2011 | A1 |
20110211128 | Petrides | Sep 2011 | A1 |
20110261050 | Smolic et al. | Oct 2011 | A1 |
20110291988 | Bamji et al. | Dec 2011 | A1 |
20120007948 | Suh | Jan 2012 | A1 |
20120007951 | Fogel | Jan 2012 | A1 |
20120044322 | Tian et al. | Feb 2012 | A1 |
20120050475 | Tian et al. | Mar 2012 | A1 |
20120056981 | Tian et al. | Mar 2012 | A1 |
20130033586 | Hulyalkar | Feb 2013 | A1 |
20130162773 | Tian et al. | Jun 2013 | A1 |
20130162774 | Tian et al. | Jun 2013 | A1 |
20130176394 | Tian et al. | Jul 2013 | A1 |
Number | Date | Country |
---|---|---|
2257533 | Dec 1997 | CA |
101166282 | Apr 2008 | CN |
101242530 | Aug 2008 | CN |
101292538 | Oct 2008 | CN |
101309411 | Nov 2008 | CN |
101415114 | Apr 2009 | CN |
1727091 | Nov 2006 | EP |
1806930 | Jul 2007 | EP |
2197217 | Jun 2010 | EP |
2464122 | Jun 2012 | EP |
09027969 | Jan 1997 | JP |
2005130428 | May 2005 | JP |
2008172749 | Jul 2006 | JP |
200822549 | Jan 2008 | JP |
2009513074 | Mar 2009 | JP |
2009532931 | Sep 2009 | JP |
2010531604 | Sep 2010 | JP |
100965881 | Jun 2010 | KR |
20080063323 | Dec 2014 | KR |
2005083636 | Sep 2005 | NO |
2007047736 | Apr 2007 | NO |
2007080223 | Jul 2007 | NO |
2007081189 | Jul 2007 | NO |
WO2006137000 | Dec 2006 | WO |
2007081926 | Jul 2007 | WO |
2007114609 | Oct 2007 | WO |
2007126511 | Nov 2007 | WO |
2008047316 | Apr 2008 | WO |
2008051381 | May 2008 | WO |
2008054100 | May 2008 | WO |
2008133455 | Nov 2008 | WO |
2008156318 | Dec 2008 | WO |
2009001255 | Dec 2008 | WO |
2009130561 | Oct 2009 | WO |
2010010077 | Jan 2010 | WO |
2010096189 | Aug 2010 | WO |
2010126608 | Nov 2010 | WO |
2010126612 | Nov 2010 | WO |
2010126613 | Nov 2010 | WO |
Entry |
---|
Notice of Grant for Related U.S. Appl. No. 13/822,064 dated Apr. 27, 2015. |
Telecommunication Standardization Sector of ITU, International Telecommunication Union, Mar. 2009, Series H: Audiovisual and Multimedia Systems Infrastructure of audiovisual services—Coding of moving video, Advanced video coding for generic audiovisual services, Recommendation ITU-T H.264. |
W.H.A. (FONS) Bruls, et al., Options for a New Efficient, Compatible, Flexible 3D Standard, Philips Research Laboratories, ICIP 2009, pp. 3497-3500, 978-1-4244-5654-3/09, 2009 IEEE. |
Fons Bruls, et al., International Organisation for Standardisation, Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11, MPEG2009/M16139, Feb. 2009, pp. 1-18, Lausanne. |
Fons Bruls, et al., International Organisation for Standardisation, Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11, MPEG2007/M14700, Jul. 2007, pp. 1-8, Lausanne |
Ying Chen et al, On MVC Reference Picture List Construction, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 22nd Meeting: Jan. 13-19, 2007, pp. 1-9, Document: JVT-V043, Filename: JVT-V043.doc, Marrakech, Morocco. |
Ying Chen et al., Operation Point and View Dependency Change SEI Message for MVC, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), Apr. 21-27, 2007, pp. 1-8, 23rd Meeting, Document: JVT-W038, Filename: JVT-W038.doc, San Jose, California. |
MVC Software Manual, Nov. 18, 2008, pp. 1-26, Version JMVC 3.01 (CVS tag: JMVC_3_0_1). |
Christopher Fehn et al, Study of Some MPEG Tools Related to 3D-Video, International Organisation for Standardisation, Coding of Moving Pictures and Associated Audio Information, May 2002, pp. 1-6, ISO/IEC JTC1/SC29/WG11, ISO/IEC JTC1/SC29/WG11, Fairfax. |
Stefan Grewatsch et al, Sharing of Motion Vectors in 3D Video Coding, 2004 International Conference on Image Processing (ICIP), Oct. 24-27, 2004, pp. 3271-3274. |
Yo-Sung Ho et al, Overview of Multi-view Video Coding, 2007 14th International Workshop on Systems, Signals and Image Processing and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services, Jun. 27-30, 2007, pp. 5-12, IEEE. |
Junyan Huo et al, A Flexible Reference Picture Selection Method for Spatial DIRECT Mode in Multiview Video Coding, 2008 Congress on Image and Signal Processing, May 27-30, 2008, pp. 268-272, IEEE. |
ISO/IEC, International Standard, Information Technology—MPEG Video Technologies, Part 3: Representation of auxiliary video and supplemental information, Oct. 15, 2007, pp. 1-34, First Edition, Reference No. ISO/IEC 23002-3:2007(E). |
ISO/IEC, Information technology—MPEG video technologies—Part 3: Representation of auxiliary video and supplemental information, Jan. 19, 2007, pp. 1-34, ISO/IEC JTC 1/SC 29, ISO/IEC FDIS 23002-3:2007(E), ISO/IEC JTC 1 /SC 29/WG 11. |
Patrick Lopez et al, 3DV EE3 Results on Lovebird 1 and Leaving Laptop Sequences, International Organisation for Standardisation, Coding of Moving Pictures and Audio, Oct. 2008, ISO/IEC JTC1/SC29/WG 11, MPEG2008/M 15802, Busan, Korea. |
Mona Mahmoudi et al, Sparse Representations for Three-Dimensional Range Data Restoration, Institute for Mathematics and Its Applications, Sep. 2009, pp. 1-6, IMA Preprint Series # 2280, University of Minnesota. |
P.Y. Yip et al, Joint Source and Channel Coding for H.264 Compliant Stereoscopic Video Transmission, Canadian Conference on Electrical and Computer Engineering, May 1-4, 2005, pp. 188-191, IEEE. |
Philipp Merkle et al, Efficient Compression of Multi-view Depth Data Based on MVC, 2007 3DTV Conference, May 7-9, 2007, pp. 1-4, IEEE. |
Yannick Morvan et al, System Architecture for Free-Viewpoint Video and 3D-TV, IEEE Transactions on Consumer Electronics, May 2008, pp. 925-932, vol. 54, No. 2, IEEE. |
Yannick Morvan et al, Design Considerations for a 3D-TV Video Coding Architecture, 2008 Digest of Technical Papers—International Conference on Consumer Electronics, Jan. 9-13, 2008, pp. 1-2, 6.4-5, IEEE. |
Han Oh et al, H.264-Based Depth Map Sequence Coding Using Motion Information of Corresponding Texture Video, PSIVT '06 Proceedings of the First Pacific Rim conference on Advances in Image and Video Technology, Dec. 10-13, 2006, pp. 898-907, Hsinchu Taiwan. |
Purvin Pandit et al, H.264/AVC extension for MVC using SEI message, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, 24th Meeting, Jun. 29-Jul. 6, 2007, pp. 1-14, (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6) Document: JVT-X061, Filename: JVT-X061.doc Geneva, Switzerland. |
Search Report for corresponding China Application. No. 2010800298716, dated Jan. 17, 2014, pp. 1-6. |
Final Office Action to corresponding U.S. Appl. No. 13/138,956, dated Jan. 26, 2015, pp. 1-25. |
Non-Final Office Action to corresponding U.S. Appl. No. 13/138,956, dated Jul. 2, 2014 pp. 1-22. |
Search Report for corresponding PCT Application No. PCT/US2010/001286, dated Oct. 18, 2010, pp. 1-18. |
Search Report for corresponding China Application. No. 2010800296161, dated Mar. 14, 2014, pp. 1-4. |
Search Report for corresponding China Application. No. 2010800296161, dated Dec. 5, 2014, pp. 1-4. |
Final Office Action to corresponding U.S. Appl. No. 13/318,412, dated Dec. 24, 2014, pp. 1-20. |
Final Office Action to corresponding U.S. Appl. No. 13/318,412, dated Mar. 27, 2014. pp. 1-25. |
Search Report for corresponding PCT Application No. PCT/US2010/001291, dated Nov. 30, 2010, pp. 1-19. |
Search Report for corresponding China Application. No. 2010800296072, dated Mar. 26, 2014, pp. 1-6. |
Non-Final Office Action to corresponding U.S. Appl. No. 13/318,418, dated Dec. 24, 2014 pp. 1-22. |
Search Report for corresponding PCT Application No. PCT/US2010/001292, dated Oct. 22, 2010, pp. 1-18. |
Shinya Shimizu et al, A Backward Compatible 3D Scene Coding Using Residual Prediction, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Mar. 31-Apr. 4, 2008, pp. 1141-1144, IEEE. |
Paul Kerbiriou et al, Looking For An Adequate Quality Criterion for Depth Coding, The International Society for Optical Engineering, Proc. SPIE 7526, Three-Dimensional Image Processing (3DIP) and Applications, 75260A, Feb. 4, 2010, From Conference vol. 7526. |
Dong Tian et al, View Synthesis Techniques for 3D Video, Published in SPIE Proceedings Applications of Digital Image Processing XXXII Sep. 2, 2009, pp. 1-11, vol. 7443, SPIE. |
Search Report for corresponding PCT Application No. PCT/US2011/049877, dated Nov. 4, 2011, pp. 1-13. |
Search Report for corresponding PCT Application No. PCT/US2011/049881, dated Nov. 3, 2011, pp. 1-14. |
Search Report for corresponding PCT Application No. PCT/US2011/049886, dated Aug. 31, 2011, pp. 1-16. |
Shinya Shimizu et al, Free-Viewpoint Scalable Multi-View Video Coding Using Panoramic Mosaic Depth Maps, 16th European Signal Processing Conference (EUSIPCO 2008), Aug. 25-29, 2008, pp. 1-5, EURASIP. |
Gary J. Sullivan et al, Editors' draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding—in preparation for ITU-T SG 16 AAP Consent (in integrated form), Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6) 30th Meeting, Jan. 29-Feb. 3, 2009, Document: JVT-AA007, Filename: JVT-AD007.doc, Geneva Switzerland. |
Detlev Marpe et al, the H.264/MPEG4 Advanced Video Coding Standard and its Application, IEEE communications Magazine, Aug. 2006, pp. 134-143, IEEE. |
Non-Final Office Action to corresponding U.S. Appl. No. 13/822,031, dated Jun. 3, 2015, pp. 1-22. |
Number | Date | Country | |
---|---|---|---|
20130162773 A1 | Jun 2013 | US |
Number | Date | Country | |
---|---|---|---|
61403345 | Sep 2010 | US |