Embodiments of this disclosure generally relate to encoding a block-based volumetric video, and more particularly, to a system and method for encoding the block-based volumetric video having a plurality of video frames of a 3D object into a 2D video format.
A volumetric video, or a free-viewpoint video, captures a representation of surfaces in 3-dimensional (3D) space and combines the visual quality of photography with the immersion and interactivity of 3D content. The volumetric video may be captured using multiple cameras to capture surfaces inside a defined volume by filming from one or more viewpoints and interpolating over space and time. Alternatively, the volumetric video may be created from a synthetic 3D model. One of the features of volumetric video is the ability to view a scene from multiple angles and perspectives in a realistic and consistent manner. Since the amount of data that has to be captured and streamed is huge as compared to non-volumetric video, encoding and compression play a key role in broadcasting the volumetric video. Each frame of a block-based volumetric video includes different types of data such as RGB data, depth data, etc. which have to be stored in the block-based volumetric video.
When encoding the block-based volumetric video in a 2D video format, a block may represent some part of an irregular 3D surface. If the block is rectangular, and the irregular 3D surface lies inside it, there may be some parts of the block that are “empty”, or “unoccupied”. These parts of the block do not contain any valid volumetric content, and should not be displayed to a viewer. Unfortunately, under data compression, transmission, and subsequent decompression for display, it becomes harder to discriminate which data is stored where in the block-based volumetric video and it can lead to errors that can cause unpleasant visual artifacts in a rendered output.
Accordingly, there remains a need for mitigating and/or overcoming drawbacks associated with current methods.
In view of the foregoing, embodiments herein provide a processor-implemented method for encoding a block-based volumetric video having a plurality of video frames of a 3D object into a 2D video format. The processor-implemented method includes (i) splitting each video frame of the plurality of video frames into a first region that includes RGB data, a second region that includes depth data, and at least a third region containing render metadata of the 3D object; and (ii) storing the render metadata of the 3D object in at least one of the first region that includes the RGB data, the second region that includes the depth data and the at least the third region in at least one channel that is selected from a U chroma channel, a V chroma channel, and a luma channel of the block-based volumetric video.
In some embodiments, the render metadata includes material information for rendering a surface of the 3D object.
In some embodiments, the material information includes a material property of a surface normal of a surface representation of surface data of the 3D object.
In some embodiments, the material information includes a 2D vector that represents a principal axis of anisotropy in a material of the 3D object.
In some embodiments, the material information describes at least one of a valid pixel that includes a valid volumetric content or an invalid pixel that does not include the valid volumetric content.
In some embodiments, if a magnitude of the 2D vector is above a threshold, the material of the 3D object is identified as being anisotropic, and if the magnitude of the 2D vector is equal to or below the threshold, the material of the 3D object is identified as being isotropic.
In some embodiments, the material information includes a transparency value that represents transparency data. In some embodiments, a relationship between the transparency value and whether a pixel is a valid pixel or an invalid pixel is defined by at least one of (i) if the transparency value is greater than a threshold, the pixel is a valid pixel and if the transparency value is lesser than the threshold, the pixel is an invalid pixel, or (ii) if the transparency value is lesser than the threshold, the pixel is a valid pixel and if the transparency value is greater than the threshold, the pixel is an invalid pixel. In some embodiments, a valid pixel is a fully opaque pixel. In some embodiments, a valid pixel is a partially transparent pixel. In some embodiments, an invalid pixel is a fully transparent pixel.
In some embodiments, the material information describes at least one of the valid pixel and the invalid pixel. In some embodiments, the invalid pixel is represented in a first color, and the valid pixel is represented in a second color. In some embodiments, the first color is different from the second color.
In some embodiments, the method further includes filling a pixel in the RGB data or the depth data that corresponds to the invalid pixel in the RGB data or the depth data with a selected color using an encoder. In some embodiments, the selected color is visually similar to a color of the valid pixel in the RGB data that is near to the pixel that corresponds to the invalid pixel in the RGB data. In some embodiments, the selected color is visually similar to a color of the valid pixel in the depth data that is near to the pixel that corresponds to the invalid pixel in the depth data. The method uses visually similar colors for two reasons. The first reason is that standard compression techniques, such as H.264, compress similar colors more efficiently than large color changes. The second reason is that if an invalid pixel is erroneously classified as valid due to compression artifacts, the displayed color or depth value is similar enough to the valid data to minimize visual artifacts.
In some embodiments, the transparency data has a first resolution, the RGB data that is stored in the first region has a second resolution, and the depth data that is stored in the second region has a third resolution. In some embodiments, the first resolution of the transparency data is different from at least one of the second resolution and the third resolution.
In some embodiments, the method further includes linearly interpolating the RGB data or the depth data to generate a smoothly varying value of the RGB data or the depth data, respectively, and to fetch the RGB data or the depth data at a sub-pixel location, when the transparency data is stored at least in the third region. In some embodiments, the sub-pixel location of the RGB data or the depth data represents at least one of an x coordinate or a y coordinate. The x coordinate and the y coordinate may include an integer value or a non-integer value.
In some embodiments, the render metadata includes an alpha value that represents transparency of at least one of the valid pixel or the invalid pixel. In some embodiments, the alpha value is stored in the at least the third region in a previously unused channel or in the luma channel.
In one aspect, a system for encoding a block-based volumetric video having a plurality of video frames of a 3D object into a 2D video format is provided. The system includes a memory that stores a set of instructions and a processor that executes the set of instructions and is configured to perform a method including: (i) splitting each video frame of the plurality of video frames into a first region that includes RGB data, a second region that includes depth data, and at least a third region containing render metadata of the 3D object and (ii) storing the render metadata of the 3D object in at least one of the first region that includes the RGB data, the second region that includes the depth data and the at least the third region in at least one channel that is selected from a U chroma channel, a V chroma channel, and a luma channel of the block-based volumetric video.
In some embodiments, the render metadata includes material information for rendering a surface of the 3D object.
In some embodiments, the material information includes a material property of a surface normal of a surface representation of surface data of the 3D object.
In some embodiments, the material information includes a 2D vector that represents a principal axis of anisotropy in a material of the 3D object.
In some embodiments, the material information describes at least one of a valid pixel that includes a valid volumetric content or an invalid pixel that does not include the valid volumetric content.
In some embodiments, the material information includes a transparency value that represents transparency data. In some embodiments, a relationship between the transparency value and whether a pixel is a valid pixel or an invalid pixel is defined by at least one of (i) if the transparency value is greater than a threshold, the pixel is a valid pixel and if the transparency value is lesser than the threshold, the pixel is an invalid pixel, or (ii) if the transparency value is lesser than the threshold, the pixel is a valid pixel and if the transparency value is greater than the threshold, the pixel is an invalid pixel. In some embodiments, a valid pixel is a fully opaque pixel. In some embodiments, a valid pixel is a partially transparent pixel. In some embodiments, an invalid pixel is a fully transparent pixel.
In some embodiments, the material information describes at least one of the valid pixel and the invalid pixel. In some embodiments, the invalid pixel is represented in a first color, and the valid pixel is represented in a second color. In some embodiments, the first color is different from the second color.
In another aspect, one or more non-transitory computer readable storage mediums are provided that store one or more sequences of instructions which, when executed by one or more processors, cause a processor-implemented method for encoding a block-based volumetric video having a plurality of video frames of a 3D object into a 2D video format to be performed. The method includes (i) splitting each video frame of the plurality of video frames into a first region that includes RGB data, a second region that includes depth data, and at least a third region containing render metadata of the 3D object and (ii) storing the render metadata of the 3D object in at least one of the first region that includes the RGB data, the second region that includes the depth data and the at least the third region in at least one channel that is selected from a U chroma channel, a V chroma channel, and a luma channel of the block-based volumetric video.
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments.
There remains a need for a more efficient method for mitigating and/or overcoming drawbacks associated with current methods. Referring now to the drawings, and more particularly to
In some embodiments, the content server 102 is implemented as a Content Delivery Network (CDN), e.g., an Amazon® CloudFront®, Cloudflare®, Azure®, or Edgecast® Content Delivery Network. In some embodiments, the content server 102 is associated with an online video publisher, e.g., YouTube by Google, Inc., Amazon Prime Video by Amazon, Inc., Apple TV by Apple, Inc., Hulu and Disney Plus by The Walt Disney Company, Netflix by Netflix, Inc., CBS All Access by ViacomCBS, Yahoo Finance by Verizon Media, etc., and/or an advertiser, e.g., Alphabet, Inc., Amazon, Inc., Facebook, Instagram, etc. In some embodiments, the content server 102 is associated with a media company, e.g., Warner Media, News Corp, The Walt Disney Company, etc. In some embodiments, the content server 102 is a video conferencing server, e.g., a Jitsi or Janus Selective Forwarding Unit (SFU).
A partial list of devices that are capable of functioning as the content server 102, without limitation, may include a server, a server network, a mobile phone, a Personal Digital Assistant (PDA), a tablet, a desktop computer, or a laptop. In some embodiments, the network 104 is a wired network. In some embodiments, the network 104 is a wireless network. In some embodiments, the network 104 is a combination of the wired network and the wireless network. In some embodiments, the network 104 is the Internet.
The video decoder 106 may be part of a mobile phone, a headset, a tablet, a television, etc. The viewer device 120, without limitation, may be selected from a mobile phone, a gaming device, a Personal Digital Assistant, a tablet, a desktop computer, or a laptop.
The video decoder 106 receives a volumetric video from the content server 102 through the network 104. In some embodiments, the content server 102 delivers a 3 Dimensional (3D) content. In some embodiments, the 3D content is a 3D asset or a 3D video.
The video frame splitting module 108 of the video decoder 106 splits each video frame (F) 110 of the plurality of video frames into a first region, a second region, and at least a third region. The first region includes RGB (Red, Green, and Blue) data 110A, the second region includes depth data 110B, and the at least the third region contains render metadata 110C of the 3D object. The video frame splitting module 108 of the video decoder 106 then transmits the RGB data 110A, the depth data 110B, and the render metadata 110C to the GPU 112 and the encoder 116. In some embodiments, the 3D object is selected from, without limitation, any of a synthetic data object, a human being, an animal, natural scenery, etc.
In some embodiments, the RGB data 110A stores a color image for each block and represents a color of a 3D surface within a block. In some embodiments, the depth data 110B stores a grayscale image for each block and represents a 3D shape of the 3D surface within the block. In some embodiments, the depth data 110B represents the 3D shape of the 3D surface as a height-field. The depth data 110B may be encoded as a grayscale video in a luma channel. In some embodiments, the video frame is 1536×1024 pixels. In some embodiments, there are 255 tiles, each of which has RGB, depth, and transparency components. In some embodiments, RGB data has a 64×64 resolution while the depth data 110B and transparency data have a 32×32 resolution. One such example is shown in
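As a minimal illustrative sketch of the splitting step, each tile can be sliced out of the decoded frame as fixed-size sub-regions. The tile footprint, the placement of the depth and transparency patches inside a tile, and the assumption that the frame has already been converted to RGB are hypothetical choices made for illustration only; the actual atlas layout is defined by the encoder.

```python
import numpy as np

# Illustrative sketch only: the real atlas layout is defined by the encoder's
# metadata. Here we assume a hypothetical 96x64 tile footprint in which a
# 64x64 color patch sits beside 32x32 depth and transparency patches, and the
# frame is an H x W x 3 RGB array (grayscale patches therefore carry the same
# value in every channel, so channel 0 is used to read them).
TILE_W, TILE_H = 96, 64     # assumed footprint of one packed tile
RGB_RES, AUX_RES = 64, 32   # assumed per-tile resolutions

def split_tile(frame: np.ndarray, tile_x: int, tile_y: int):
    """Extract the RGB, depth, and transparency regions of one tile."""
    x0, y0 = tile_x * TILE_W, tile_y * TILE_H
    rgb = frame[y0:y0 + RGB_RES, x0:x0 + RGB_RES, :]
    depth = frame[y0:y0 + AUX_RES, x0 + RGB_RES:x0 + TILE_W, 0]
    alpha = frame[y0 + AUX_RES:y0 + 2 * AUX_RES, x0 + RGB_RES:x0 + TILE_W, 0]
    return rgb, depth, alpha
```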
In some embodiments, the render metadata 110C includes material information for rendering a surface of the 3D object. The render metadata 110C may be information that is necessary for rendering the surface of the 3D object. In some embodiments, the material information includes a material property of a surface normal of a surface representation of surface data of the 3D object. In some embodiments, the material property includes at least one of unit-length or a direction of the surface normal. The material information of a material of the 3D object, or the unit-length of the surface normal of the surface representation may be encoded in an unused U chroma channel and an unused V chroma channel.
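As one hedged example of how a surface-normal direction could occupy otherwise unused U and V chroma samples, the x and y components of a unit normal can be remapped to the 8-bit chroma range and the z component reconstructed on decode. The exact mapping below is an assumption for illustration, not the mapping prescribed by the embodiments.

```python
import numpy as np

def encode_normal_uv(normal: np.ndarray) -> tuple:
    """Map the x/y components of a unit surface normal to 8-bit U/V samples."""
    n = normal / np.linalg.norm(normal)
    u = int(round((n[0] * 0.5 + 0.5) * 255))
    v = int(round((n[1] * 0.5 + 0.5) * 255))
    return u, v

def decode_normal_uv(u: int, v: int) -> np.ndarray:
    """Recover the normal, assuming its z component points toward the camera."""
    nx = u / 255.0 * 2.0 - 1.0
    ny = v / 255.0 * 2.0 - 1.0
    nz = np.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    return np.array([nx, ny, nz])
```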
In some embodiments, the surface representation includes a 2D surface that is embedded in 3 dimensions. In some embodiments, the surface representation includes the 2D surface that is parameterized in a rectangular grid. In some embodiments, the surface representation is parameterized in 2 dimensions as a depth map with color data.
In some embodiments, the material information includes a 2D vector that represents a principal axis of anisotropy in a material of the 3D object. For example, the material information may be a 2D parameterization of material properties, e.g., anisotropic specularity. In some embodiments, the 2D vector that represents the principal axis of the anisotropy in the material of the 3D object is defined using a U chroma channel and a V chroma channel. In some embodiments, if a magnitude of the 2D vector is above a threshold, the material of the 3D object is identified as being anisotropic, and if the magnitude of the 2D vector is equal to or below the threshold, the material of the 3D object is identified as being isotropic.
In some embodiments, from the magnitude of zero to the threshold, the material is interpreted as going from shiny to matte, and then from the threshold to the maximum, the material is interpreted as going from matte to shiny in the direction of the 2D vector, while maintaining a constant matte reflectivity in a direction perpendicular to the 2D vector.
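The following sketch illustrates that interpretation; the threshold value and the linear shininess ramps are assumptions chosen for clarity rather than values prescribed by the embodiments.

```python
import numpy as np

ANISO_THRESHOLD = 0.5  # assumed threshold on the magnitude of the chroma-encoded vector

def interpret_anisotropy(u: float, v: float) -> dict:
    """Interpret a chroma-encoded 2D vector (each component assumed in [-1, 1])."""
    vec = np.array([u, v])
    mag = float(np.linalg.norm(vec))
    if mag <= ANISO_THRESHOLD:
        # Isotropic: magnitude 0 reads as shiny, approaching the threshold as matte.
        return {"anisotropic": False,
                "shininess": 1.0 - mag / ANISO_THRESHOLD}
    # Anisotropic: past the threshold the material becomes shiny again, but only
    # along the direction of the vector; perpendicular to it the material stays matte.
    t = (mag - ANISO_THRESHOLD) / (1.0 - ANISO_THRESHOLD)
    return {"anisotropic": True,
            "axis": vec / mag,
            "shininess_along_axis": t}
```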
In some embodiments, the material information describes at least one of a valid pixel that includes a valid volumetric content or an invalid pixel that does not include the valid volumetric content.
The material information may include a transparency value that represents transparency data. In some embodiments, the transparency value that is stored in images is 8 bits. The transparency values may be mapped to floating-point values. In some embodiments, a relationship between the transparency value and whether a pixel is a valid pixel or an invalid pixel is defined by at least one of (i) if the transparency value is greater than a threshold, the pixel is a valid pixel and if the transparency value is lesser than the threshold, the pixel is an invalid pixel, or (ii) if the transparency value is lesser than the threshold, the pixel is a valid pixel and if the transparency value is greater than the threshold, the pixel is an invalid pixel. In some embodiments, a valid pixel is a fully opaque pixel. In some embodiments, a valid pixel is a partially transparent pixel. In some embodiments, an invalid pixel is a fully transparent pixel. The threshold value may be in a range of 0 to 256. In some embodiments, if the transparency data is stored in a separate channel, the threshold value may be half the range, e.g., 128.
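A minimal sketch of this classification, assuming 8-bit transparency samples and the mid-range threshold of 128 mentioned above; which side of the threshold counts as valid is a convention choice, so both orderings described above are supported via a flag.

```python
VALID_THRESHOLD = 128  # assumed mid-range threshold when transparency has its own channel

def transparency_to_float(t_8bit: int) -> float:
    """Map an 8-bit transparency sample to a floating-point value in [0, 1]."""
    return t_8bit / 255.0

def is_valid_pixel(t_8bit: int, opaque_is_valid: bool = True) -> bool:
    """Classify a pixel as valid or invalid from its 8-bit transparency sample."""
    if opaque_is_valid:
        return t_8bit > VALID_THRESHOLD
    return t_8bit < VALID_THRESHOLD
```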
In some embodiments, the transparency data has a first resolution, the RGB data 110A that is stored in the first region has a second resolution, the depth data 110B that is stored in the second region has a third resolution. In some embodiments, the first resolution of the transparency data is different from at least one of the second resolution and the third resolution. In some embodiments, the transparency data stored at least in the third region is stored in a previously unused channel.
The video frame splitting module 108 of the video decoder 106 stores the render metadata 110C of the 3D object in at least one of the first region that includes the RGB data 110A and the at least the third region, in at least one channel that is selected from a U chroma channel, a V chroma channel, and a luma channel of the block-based volumetric video.
In some embodiments, the render metadata 110C includes an alpha value that represents transparency of at least one of the valid pixel or the invalid pixel. In some embodiments, the alpha value is stored in the at least the third region in the previously unused channel or the luma channel. In some embodiments, an alpha value is represented by 8 bits. In some embodiments, an alpha value of 255 means totally opaque, and an alpha value of 0 means totally transparent. In some embodiments, an alpha value of 240 or greater means totally opaque, and an alpha value of 16 or lesser means totally transparent. In some embodiments, an alpha value between the totally opaque and totally transparent threshold values indicates the degree of transparency.
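As an illustrative sketch of that mapping, the guard-band cutoffs of 240 and 16 from the description can be applied before converting the remaining values to a fractional opacity; the linear ramp between the two cutoffs is an assumption.

```python
OPAQUE_ABOVE = 240      # alpha values at or above this are treated as totally opaque
TRANSPARENT_BELOW = 16  # alpha values at or below this are treated as totally transparent

def decode_alpha(alpha_8bit: int) -> float:
    """Map an 8-bit alpha sample to a [0, 1] opacity value with guard bands."""
    if alpha_8bit >= OPAQUE_ABOVE:
        return 1.0
    if alpha_8bit <= TRANSPARENT_BELOW:
        return 0.0
    # Between the two cutoffs the value indicates the degree of transparency.
    return (alpha_8bit - TRANSPARENT_BELOW) / float(OPAQUE_ABOVE - TRANSPARENT_BELOW)
```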
In some embodiments, the material information describes at least one of the valid pixel and the invalid pixel. In some embodiments, the transparency data encoding module 118 of the encoder 116 represents the invalid pixel in a first color, and the valid pixel in a second color. In some embodiments, the first color is different from the second color.
In some embodiments, the transparency data encoding module 118 of the encoder 116 fills a pixel that corresponds to the invalid pixel in the RGB data 110A or the depth data 110B with a selected color. The selected color may be similar to a color of the valid pixel in the RGB data 110A that is near to the pixel that corresponds to the invalid pixel in the RGB data 110A. In some embodiments, the selected color is similar to a color of the valid pixel in the depth data 110B that is near to the pixel that corresponds to the invalid pixel in the depth data 110B.
In some embodiments, the RGB data 110A and the depth data 110B corresponding to a region of invalid pixels are filled with colors that are selected to smoothly interpolate between the RGB data 110A or the depth data 110B, respectively, corresponding to valid pixels that border the region. In some embodiments, filled values are selected using a diffusion process that minimizes magnitude of gradients between pixels in the region corresponding to the invalid pixels.
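One way such a diffusion fill could be realized is sketched below with a simple Jacobi-style iteration that repeatedly replaces each invalid pixel with the average of its four neighbours while valid pixels stay fixed, driving the gradients inside the invalid region toward zero; the iteration count and the neutral initial guess are assumptions.

```python
import numpy as np

def diffusion_fill(values: np.ndarray, valid: np.ndarray, iters: int = 200) -> np.ndarray:
    """Fill invalid pixels of one plane (color channel or depth) by diffusion.

    `values` is a 2D float array and `valid` is a boolean mask of the same
    shape. Valid pixels are held fixed; invalid pixels relax toward the
    average of their 4-neighbours, minimizing gradient magnitude in the
    invalid region.
    """
    out = values.astype(float).copy()
    out[~valid] = out[valid].mean()  # neutral starting guess (assumption)
    for _ in range(iters):
        padded = np.pad(out, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[~valid] = avg[~valid]
    return out
```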
In some embodiments, if the transparency data, or information on whether a pixel is valid or invalid, is stored in the at least the third region, then the encoder 116 fills the corresponding invalid pixel in the RGB data 110A or the depth data 110B with a color similar to valid values in the RGB data 110A and the depth data 110B. In some embodiments, if the transparency data or the information on whether the pixel is valid or invalid is stored in the at least the third region, then the encoder 116 fills values in the RGB data 110A and the depth data 110B in full range.
The GPU 112 includes the transparency data interpolating module 114 that may linearly interpolate the RGB data 110A to generate a smoothly varying value of the RGB data 110A and to fetch the RGB data 110A at a sub-pixel location when the transparency data is stored in the at least the third region. Similarly, the transparency data interpolating module 114 may linearly interpolate the depth data 110B to generate a smoothly varying value of the depth data 110B and to fetch the depth data 110B at the sub-pixel location. The sub-pixel location of the RGB data 110A or the depth data 110B may represent at least one of an x coordinate or a y coordinate. In some embodiments, the x coordinate and the y coordinate include an integer value, e.g., −5, 1, 5, 8, 97 or a non-integer value, e.g., −1.43, 1¾, 3.14.
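A minimal sketch of such a bilinear fetch at a non-integer (x, y) location follows; clamping at the image borders is an assumed edge-handling choice.

```python
import numpy as np

def sample_bilinear(plane: np.ndarray, x: float, y: float) -> float:
    """Fetch a smoothly interpolated value from a 2D plane at a sub-pixel (x, y)."""
    h, w = plane.shape
    fx, fy = x - np.floor(x), y - np.floor(y)   # fractional offsets
    x0 = int(np.clip(np.floor(x), 0, w - 1))
    y0 = int(np.clip(np.floor(y), 0, h - 1))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    top = plane[y0, x0] * (1 - fx) + plane[y0, x1] * fx
    bottom = plane[y1, x0] * (1 - fx) + plane[y1, x1] * fx
    return float(top * (1 - fy) + bottom * fy)
```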
In some embodiments, a classification boundary 402 is inserted to classify the valid colors 404 and the invalid colors 406. In some embodiments, if a black color is used to indicate invalid pixels, the darkest valid pixels may still be relatively close to the black color. In some embodiments, some invalid pixels may have a color that is above the classification boundary 502, and some valid pixels may have a color that is below the classification boundary 502, after compressing a block-based volumetric video. In some embodiments, if 0 is used to indicate the invalid pixels, anything less than the classification boundary of 16 may be considered invalid. In some embodiments, anything above or equal to the classification boundary of 16, e.g., 40, may be considered valid.
In some embodiments, the RGB data 110A and the depth data 110B corresponding to a region of invalid pixels are filled with colors that are selected to smoothly interpolate between the RGB data 110A or the depth data 110B, respectively, corresponding to the valid pixels that border the region. In some embodiments, filled values are selected using a diffusion process that minimizes magnitude of gradients between pixels in the region corresponding to the invalid pixels.
With reference to
With reference to
At step 604, the method 600 includes storing the render metadata 110C of the 3D object in at least one of the first region that includes the RGB data 110A and the at least the third region, in at least one channel that is selected from a U chroma channel, a V chroma channel, and a luma channel of the block-based volumetric video. The render metadata 110C may be information that is necessary for rendering a surface of the 3D object.
At step 704, the method 700 includes storing material information that describes at least one of a valid pixel that includes a valid volumetric content or an invalid pixel that does not include the valid volumetric content in the first region that includes the RGB data 110A of the 3D object. In some embodiments, the valid pixel is fully opaque or partially transparent. In some embodiments, the invalid pixel is fully transparent or partially opaque. At step 706, the method 700 includes representing the invalid pixel in a first color, and the valid pixel in a second color. In some embodiments, the first color is different from the second color.
At step 904, the method 900 includes storing the transparency data 302 of the 3D object in the at least the third region in a previously unused channel. In some embodiments, the previously unused channel is a luma channel. At step 906, the method 900 includes filling a pixel in the RGB data 110A or the depth data 110B that corresponds to the invalid pixel in the RGB data 110A or the depth data 110B with a selected color using the encoder 116 (as shown in
At step 908, the method 900 includes linearly interpolating the RGB data 110A or the depth data 110B to generate a smoothly varying value of the RGB data 110A or the depth data 110B, respectively, and to fetch the RGB data 110A or the depth data 110B at a sub-pixel location when the transparency data 302 is stored in the at least the third region. The sub-pixel location of the RGB data 110A or the depth data 110B may represent at least one of an x coordinate or a y coordinate. In some embodiments, the x coordinate and the y coordinate include an integer value or a non-integer value.
The embodiments herein may include a computer program product configured to include a pre-configured set of instructions, which when performed, can result in actions as stated in conjunction with the methods described above. In an example, the pre-configured set of instructions can be stored on a tangible non-transitory computer readable medium or a program storage device. In an example, the tangible non-transitory computer readable medium can be configured to include the set of instructions, which when performed by a device, can cause the device to perform acts similar to the ones described here. Embodiments herein may also include tangible and/or non-transitory computer-readable storage media for carrying or having computer executable instructions or data structures stored thereon.
Generally, program modules utilized herein include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
The embodiments herein can include both hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
A representative hardware environment for practicing the embodiments herein is depicted in
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.
This patent application is a continuation-in-part of, and claims priority to, all the following including pending U.S. patent application Ser. No. 16/872,259 filed on May 11, 2020, which is a continuation-in-part of U.S. patent application Ser. No. 16/440,369 filed Jun. 13, 2019, now U.S. Pat. No. 10,692,247, which is a continuation-in-part of U.S. patent application Ser. No. 16/262,860 filed on Jan. 30, 2019, now U.S. Pat. No. 10,360,727, which is a continuation-in-part of PCT patent application no. PCT/US18/44826 filed on Aug. 1, 2018, U.S. non-provisional patent application Ser. No. 16/049,764 filed on Jul. 30, 2018, now U.S. Pat. No. 10,229,537, and U.S. provisional patent application No. 62/540,111 filed on Aug. 2, 2017, the complete disclosures of which, in their entireties, are hereby incorporated by reference.
Relation | Number | Date | Country
---|---|---|---
Parent | 16872259 | May 2020 | US
Child | 17334769 | | US
Parent | 16440369 | Jun 2019 | US
Child | 16872259 | | US
Parent | 16262860 | Jan 2019 | US
Child | 16440369 | | US