Sphere projected motion estimation/compensation and mode decision

Description

BACKGROUND

The present disclosure relates to coding of 360° video to achieve bandwidth compression and, in particular, to techniques for processing 360° video using compression algorithms that were designed to process two-dimensional video data.

The term “360° video” refers to video recordings where views in multiple directions, sometimes in all directions about a camera, are recorded at the same time. The 360° video may be captured using an omnidirectional camera or a collection of cameras that capture image data from different but overlapping fields of view and whose outputs are stitched together. A viewer of a 360° video may be given control over viewing direction during playback, which allows the viewer to navigate within the video's field of view.

Although 360° video captures image information from a three-dimensional space, the video data itself often is represented by image data in a two-dimensional format. The image data is represented by an array of pixels arranged at predetermined spatial locations in two dimensions (e.g., x, y locations). And, while objects at different depths within a field of view will be represented in the image data having sizes that correspond not only to the object's physical size but also to its distance from a camera, the pixel data that represents the objects does not vary pixel locations by depth.

The two-dimensional representation of a three-dimensional space can cause distortions of image data in different locations in a field of view. For example, straight lines in a three-dimensional space might not appear as straight lines in two-dimensional image data. Moreover, the sizes and shapes of different objects may become distorted as they move about within the field of view of a 360° image. These changes pose challenges to processing systems that operate on such image data.

As one example, the distortions can cause issues in video coding. Video coders typically reduce bandwidth of image signals by exploiting spatial and temporal redundancies in image data. Such redundancies, however, are not always detected by video coders that operate on two-dimensional representations of three-dimensional images due to the distortions that can arise from frame to frame. When such video coders fail to detect redundancies in content, they often generate coded representations of image data that are not as bandwidth-efficient as they could be.

Accordingly, the inventors perceive a need in the art for a video coding system that better recognizes redundancies in two-dimensional representations of three-dimensional image content, such as with 360° video.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a system for exchange of 360° video.

FIG. 2 is a functional block diagram of a coding system according to an embodiment of the present disclosure.

FIG. 3 illustrates an exemplary transform that may be performed for spherical projection in an embodiment of the present disclosure.

FIG. 4 illustrates an exemplary transform that may be performed for spherical projection in another embodiment of the present disclosure.

FIG. 5 illustrates a coding method according to an embodiment of the present disclosure.

FIG. 6 illustrates an exemplary spherical projection that may be performed by an embodiment of the present disclosure.

FIG. 7 is a functional block diagram of a coding system according to an embodiment of the present disclosure.

FIG. 8 is a functional block diagram of a decoding system according to an embodiment of the present disclosure.

FIG. 9 illustrates a coding method according to an embodiment of the present disclosure.

FIG. 10 illustrates an exemplary spherical projection that may be performed by another embodiment of the present disclosure.

FIG. 11 illustrates an exemplary spherical projection that may be performed by a further embodiment of the present disclosure.

FIG. 12 is a functional block diagram of a decoding system according to another embodiment of the present disclosure.

FIG. 13 is a functional block diagram of a decoding system according to a further embodiment of the present disclosure.

DETAILED DESCRIPTION

Embodiments of the present disclosure provide techniques for coding video data predictively based on predictions made from spherical-domain projections of input pictures to be coded and reference pictures that are prediction candidates. Spherical projections of an input picture and the candidate reference pictures may be generated. Thereafter, a search may be conducted for a match between the spherical-domain representation of a pixel block to be coded and a spherical-domain representation of the reference picture. On a match, an offset may be determined between the spherical-domain representation of the pixel block and a matching portion of the reference picture in the spherical-domain representation. The spherical-domain offset may be transformed to a motion vector in a source-domain representation of the input picture, and the pixel block may be coded predictively with reference to a source-domain representation of the matching portion of the reference picture.

FIG. 1 illustrates a system 100 in which embodiments of the present disclosure may be employed. The system 100 may include at least two terminals 110-120 interconnected via a network 130. The first terminal 110 may have a camera system 112 that captures 360° video. The terminal 110 also may include coding systems and transmission systems (not shown) to transmit coded representations of the 360° video to the second terminal 120, where it may be consumed. For example, the second terminal 120 may display the 360° video on a local display, it may execute a video editing program to modify the 360° video, it may integrate the 360° video into an application (for example, a virtual reality program), it may present the 360° video in a head mounted display (for example, in virtual reality applications), or it may store the 360° video for later use.

FIG. 1 illustrates components that are appropriate for unidirectional transmission of 360° video, from the first terminal 110 to the second terminal 120. In some applications, it may be appropriate to provide for bidirectional exchange of video data, in which case the second terminal 120 may include its own camera system, video coder and transmitters (not shown), and the first terminal 110 may include its own receiver and display (also not shown). If it is desired to exchange 360° video bidirectionally, then the techniques discussed hereinbelow may be replicated to generate a pair of independent unidirectional exchanges of 360° video. In other applications, it would be permissible to transmit 360° video in one direction (e.g., from the first terminal 110 to the second terminal 120) and transmit “flat” video (e.g., video from a limited field of view) in a reverse direction.

In FIG. 1, the second terminal 120 is illustrated as a computer display but the principles of the present disclosure are not so limited. Embodiments of the present disclosure find application with laptop computers, tablet computers, smart phones, servers, media players, virtual reality head mounted displays, augmented reality displays, hologram displays, and/or dedicated video conferencing equipment. The network 130 represents any number of networks that convey coded video data among the terminals 110-120, including, for example, wireline and/or wireless communication networks. The communication network 130 may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network 130 are immaterial to the operation of the present disclosure unless explained hereinbelow.

FIG. 2 is a functional block diagram of a coding system 200 according to an embodiment of the present disclosure. The system 200 may include a camera system 210, an image processing system 220, a video coder 230, a video decoder 240, a reference picture store 250, a predictor 260 and a pair of spherical transform units 270, 280. The camera system 210 may generate image data representing a local environment as a so-called “360° image,” containing image data of a field of view that extends around the camera system 210 in all directions. The image processing system 220 may convert the image data from the camera system 210 as needed to fit requirements of the video coder 230. The video coder 230 may generate a coded representation of its input image data, typically by exploiting spatial and/or temporal redundancies in the image data. The video coder 230 may output a coded representation of the input data that consumes less bandwidth than the input data when transmitted and/or stored.

The video decoder 240 may invert coding operations performed by the video coder 230 to obtain a reconstructed picture from the coded video data. Typically, the coding processes applied by the video coder 230 are lossy processes, which cause the reconstructed picture to possess various errors when compared to the original picture. The video decoder 240 may reconstruct pictures from select coded pictures, which are designated as “reference pictures,” and store the decoded reference pictures in the reference picture store 250. In the absence of transmission errors, the decoded reference pictures will replicate decoded reference pictures obtained by a decoder (not shown in FIG. 2).

The predictor 260 may select prediction references for new input pictures as they are coded. For each portion of the input picture being coded (called a “pixel block” for convenience), the predictor 260 may select a coding mode and identify a portion of a reference picture that may serve as a prediction reference for the pixel block being coded. The coding mode may be an intra-coding mode, in which case the prediction reference may be drawn from a previously-coded (and decoded) portion of the picture being coded. Alternatively, the coding mode may be an inter-coding mode, in which case the prediction reference may be drawn from another previously-coded and decoded picture. In an embodiment, the predictor 260 may search for prediction references of pictures being coded by operating on an input picture and a reference picture that have been transformed to a spherical projection representation. The spherical transform units 270, 280 may transform the input picture and the reference picture to the spherical projection representations.

When an appropriate prediction reference is identified, the predictor 260 may furnish the prediction data to the video coder 230 in a representation that the video coder 230 accepts. Typically, the reference picture(s) stored in the reference picture store will be in a format that the video coder accepts.

As indicated, the coded video data output by the video coder 230 should consume less bandwidth than the input data when transmitted and/or stored. The coding system 200 may output the coded video data to an output device 290, such as a transmitter (not shown) that may transmit the coded video data across a communication network 130 (FIG. 1) or a storage device (also not shown) such as an electronic-, magnetic- and/or optical storage medium.

FIG. 3 illustrates exemplary transforms that may be performed by the spherical transform units 270, 280 of FIG. 2 in a first embodiment. In this embodiment, a camera system 210 (FIG. 2) may perform a 360° capture operation 310 and output an equirectangular picture 320 having dimensions M×N pixels. The equirectangular picture 320 may represent a 360° field of view having been partitioned along a slice 312 that divides a cylindrical field of view into a two dimensional array of data. In the equirectangular picture 320, pixels on either side of the slice 312 represent adjacent image content even though they appear on different edges of the equirectangular picture 320.

The spherical transform unit 270 may transform pixel data at locations (x, y) within the equirectangular picture 320 to locations (θ, φ) along a spherical projection 330 according to a transform such as:

θ=x+θ₀, and (Eq. 1.)
φ=y+φ₀, where (Eq. 2.)

θ and φ respectively represent the longitude and latitude of a location in the spherical projection 330, θ₀ and φ₀ represent an origin of the spherical projection 330, and x and y represent the horizontal and vertical coordinates of the source data in the equirectangular picture 320.

When applying the transform, the spherical transform unit 270 may transform each pixel location along a predetermined row of the equirectangular picture 320 to have a unique location at an equatorial latitude in the spherical projection 330. In such regions, each location in the spherical projection 330 may be assigned pixel values from corresponding locations of the equirectangular picture 320. At other locations, particularly toward poles of the spherical projection 330, the spherical transform unit 270 may map several source locations from the equirectangular picture 320 to a common location in the spherical projection 330. In such a case, the spherical transform unit 270 may derive pixel values for the locations in the spherical projection 330 from a blending of corresponding pixel values in the equirectangular picture 320 (for example, by averaging pixel values at corresponding locations of the equirectangular picture 320).
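The mapping of Eqs. 1 and 2 and the blending toward the poles may be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes that x and y are scaled linearly to radians before the origin offsets are added, that pixel centers are used, and that the number of distinct sphere samples on a row shrinks with the cosine of the latitude. The function names `equirect_to_sphere` and `blend_row` are hypothetical.

```python
import math

def equirect_to_sphere(x, y, width, height,
                       theta0=-math.pi, phi0=-math.pi / 2.0):
    """Map pixel (x, y) of a width x height equirectangular picture to (theta, phi).

    Assumption: x and y are scaled to radians before the offsets of
    Eq. 1 and Eq. 2 (theta0, phi0) are applied.
    """
    theta = (x + 0.5) * 2.0 * math.pi / width + theta0  # longitude
    phi = (y + 0.5) * math.pi / height + phi0           # latitude
    return theta, phi

def blend_row(row, phi):
    """Average a row of source pixels into fewer sphere samples near the poles.

    Assumption: the count of distinct sphere locations on a row is
    proportional to cos(phi), with at least one sample kept.
    """
    n_out = max(1, round(len(row) * math.cos(phi)))
    blended = []
    for i in range(n_out):
        start = i * len(row) // n_out
        stop = (i + 1) * len(row) // n_out
        group = row[start:stop]
        blended.append(sum(group) / len(group))  # plain average of the group
    return blended
```

At the equator (phi = 0) every source pixel keeps its own location, while at a latitude of 60° pairs of neighboring pixels are averaged into one sample.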

FIG. 4 illustrates exemplary transforms that may be performed by the spherical transform units 270, 280 of FIG. 2 in another embodiment. In this embodiment, a camera system 210 (FIG. 2) may perform a 360° capture operation 410 and output a picture 420 having dimensions M×N pixels in which image content is arranged according to a cube map. The image capture 410 may capture image data in each of a predetermined number of directions (typically, six) which are stitched together according to the cube map layout. In the example illustrated in FIG. 4, six sub-images corresponding to a left view 411, a front view 412, a right view 413, a back view 414, a top view 415 and a bottom view 416 may be captured and arranged within the cube map picture 420 according to “seams” of image content between the respective views. Thus, as illustrated in FIG. 4, pixels from the front image that are adjacent to the pixels from each of the top, the left, the right and the bottom images represent image content that is adjacent respectively to content of the adjoining sub-images. Similarly, pixels from the right and back images that are adjacent to each other represent adjacent image content. Further, content from a terminal edge 422 of the back image is adjacent to content from an opposing terminal edge 424 of the left image. The cube map picture 420 also may have regions 426.1-426.4 that do not belong to any image.

The spherical transform unit 270 may transform pixel data at locations (x, y) within the cube map picture 420 to locations (θ, φ) along a spherical projection 430 according to transforms derived from each sub-image in the cube map. FIG. 4 illustrates six faces 411-416 of the image capture 410 superimposed over the spherical projection 430 that is to be generated. Each sub-image of the image capture corresponds to a predetermined angular region of a surface of the spherical projection 430. Thus, image data of the front face 412 may be projected to a predetermined portion on the surface of the spherical projection, and image data of the left, right, back, top and bottom sub-images may be projected on corresponding portions of the surface of the spherical projection.

In a cube map having square sub-images, that is, where the height and width of the sub-images 411-416 are equal, each sub-image projects to a 90°×90° region of the projection surface. Thus, each position x, y within a sub-image maps to a θ, φ location on the spherical projection 430 based on a sinusoidal projection of the form φ=f^k(x, y) and θ=g^k(x, y), where x, y represent displacements from a center of the cube face k for top, bottom, front, back, left, right, and θ, φ represent angular deviations in the sphere.

When applying the transform, some pixel locations in the cube map picture 420 may map to a unique location in the spherical projection 430. In such regions, each location in the spherical projection 430 may be assigned pixel values from corresponding locations of the cube map picture 420. At other locations, particularly toward edges of the respective sub-images, the spherical transform unit 270 may map image data from several source locations in the cube map picture 420 to a common location in the spherical projection 430. In such a case, the spherical transform unit 270 may derive pixel values for the locations in the spherical projection 430 from a blending of corresponding pixel values in the cube map picture 420 (for example, by a weighted averaging of pixel values at corresponding locations of the cube map picture 420).
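The per-face functions f^k and g^k are not spelled out above. One possible realization, offered only as a sketch, is the conventional cube-to-sphere mapping in which each face point is treated as a direction from the cube center and converted to longitude and latitude. The axis convention (x forward, y right, z up), the face orientations and the name `cube_to_sphere` are assumptions made for illustration.

```python
import math

# Direction vector for a point (u, v) on each face, with u and v in [-1, 1]
# measured from the face center (u to the right, v downward in the sub-image).
# Assumed axes: x forward, y right, z up.
_FACE_DIRECTIONS = {
    "front":  lambda u, v: (1.0, u, -v),
    "right":  lambda u, v: (-u, 1.0, -v),
    "back":   lambda u, v: (-1.0, -u, -v),
    "left":   lambda u, v: (u, -1.0, -v),
    "top":    lambda u, v: (v, u, 1.0),
    "bottom": lambda u, v: (-v, u, -1.0),
}

def cube_to_sphere(face, u, v):
    """Return (theta, phi) in radians for point (u, v) on cube face `face`."""
    x, y, z = _FACE_DIRECTIONS[face](u, v)
    norm = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)      # longitude
    phi = math.asin(z / norm)     # latitude
    return theta, phi
```

Under this convention the front face spans longitudes from -45° to +45° along its horizontal center line, consistent with each square sub-image covering a 90°×90° region, and points on a shared edge of two faces map to the same sphere location.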

The techniques of the present disclosure find application with other types of image capture techniques. For example, truncated pyramid-, tetrahedral-, octahedral-, dodecahedral- and icosahedral-based image capture techniques may be employed. Images obtained therefrom may be mapped to a spherical projection through analogous techniques.

Returning to FIG. 2, a predictor 260 may perform prediction searches using image data obtained by spherical projection. Thus, the spherical transform unit 270 may transform image data captured by the imaging system into a spherical projection and the second spherical transform unit 280 may transform candidate reference pictures into other spherical projections. The predictor 260 may search for prediction reference data to be used by the video coder 230 in the spherical domain. Once an appropriate prediction match is identified, the predictor 260 may furnish prediction data from the reference picture store 250 in the format that is accepted by the video coder 230 (e.g., without transforming it to the spherical projection).

FIG. 5 illustrates a coding method 500 according to an embodiment of the present disclosure. The method 500 may operate on a pixel block by pixel block basis to code a new input picture that is to be coded. The method 500 may begin by transforming data of an input pixel block to a spherical representation (box 510). The method 500 also may transform a candidate reference picture to the spherical representation (box 520). Thereafter, the method 500 may perform a prediction search (box 530) from a comparison between the transformed pixel block data and transformed reference picture data. When an appropriate prediction reference is found, the method 500 may code the input pixel block differentially using the matching reference picture data (the “reference block,” for convenience) as a basis for prediction (box 540). Typically, this differential coding includes a calculation of pixel residuals from a pixel-wise subtraction of prediction block data from the input pixel block data (box 542) and a transformation, quantization and entropy coding of the pixel residuals obtained therefrom (box 544). In this regard, the method 500 may adhere to coding protocols defined by a prevailing coding specification, such as ITU H.265 (also known as “HEVC”), H.264 (also, “AVC”) or a predecessor coding specification. These specifications define protocols for defining pixel blocks, defining search windows for prediction references, and for performing differential coding of pixel blocks with reference to reference blocks. The method 500 also may transform a spherical-domain representation of the motion vector to a coder-domain representation, the representation used by the video coding specification (box 546). The method 500 may output the coded pixel residuals, motion vectors and other metadata associated with prediction (typically, coding mode indicators and reference picture IDs) (box 548).

The prediction search (box 530) may be performed to maximize bandwidth conservation and to minimize information losses. The method 500 may perform operations to estimate when appropriate prediction reference(s) are found. In an embodiment, for each input pixel block, the method 500 may rotate the spherical projection of the reference frame about a plurality of candidate rotations with respect to the transformed input pixel block (box 532). At each candidate rotation, the method may estimate prediction residuals that would be obtained if the candidate rotation were used (box 534). These computations may be performed by a pixel-wise comparison of the spherically-projected input pixel block and a portion of the rotated candidate reference frame that aligns with the location of the input pixel block. Typically, comparisons that generate pixel residuals of high magnitude and high variance will lead to lower coding efficiencies than comparisons of other candidate pixel blocks that generate pixel residuals having lower magnitude and lower variance. The method 500 also may estimate coding distortions that would arise if the candidate reference block were used (box 536). These computations may be performed by estimating loss of pixel residuals based on quantization parameter levels that are predicted to be applied to the input pixel block, again operating in a domain of the spherical projection. Once estimates have been obtained for all candidate reference blocks under consideration, the method 500 may select the reference pixel block that minimizes overall coding cost (box 538).

For example, the coding cost J of an input pixel block with reference to a candidate “reference block” BLK_α,β,γ that aligns with the location of the input pixel block when the reference frame is rotated by an angle α, β, γ may be given as:

J=Bits(BLK_α,β,γ)+k*DIST(BLK_α,β,γ), where (Eq. 3.)

Bits(BLK_α,β,γ) represents a number of bits estimated to be required to code the input pixel block with reference to the reference block BLK_α,β,γ, DIST(BLK_α,β,γ) represents the distortion that would be obtained from coding the input pixel block with reference to the reference block BLK_α,β,γ, and k may be an operator-selected scalar to balance the contribution of these factors. As explained, the method 500 may be performed to select a reference pixel block that minimizes the value J.
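The selection by Eq. 3 may be sketched as follows. The bit and distortion estimates are taken as given inputs here; how they are produced is outside the sketch. The names `coding_cost` and `select_rotation` and the tuple layout of the candidates are hypothetical.

```python
def coding_cost(bits, dist, k):
    """Eq. 3: J = Bits + k * DIST for one candidate reference block."""
    return bits + k * dist

def select_rotation(candidates, k):
    """Pick the candidate with the lowest cost J.

    `candidates` is a sequence of (rotation, bits, dist) tuples, where
    `rotation` is an (alpha, beta, gamma) triple and `bits` and `dist` are
    estimates supplied by the caller (assumed inputs).
    """
    return min(candidates, key=lambda c: coding_cost(c[1], c[2], k))

candidates = [
    ((0.0, 0.0, 0.0), 120, 40.0),
    ((0.1, 0.0, 0.0), 90, 55.0),
    ((0.0, 0.1, 0.0), 100, 42.0),
]
best_rotation, best_bits, best_dist = select_rotation(candidates, k=1.0)
```

With k = 1.0 the third candidate wins (J = 142); lowering k to 0.5 shifts the balance toward bit savings and the second candidate wins (J = 117.5), which illustrates how the scalar k trades rate against distortion.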

FIG. 6 figuratively illustrates prediction operations between a spherical projection of an input frame 610 and a spherical projection of a reference frame 620 according to the embodiment of FIG. 5. As discussed, the two-dimensional representation of the input frame may be parsed into a plurality of pixel blocks, which are to be coded. These pixel blocks may have corresponding projections in the spherical projection of the input frame 610 as shown in FIG. 6. The blocks' sizes and shapes may vary according to their locations in the spherical projection. For example, pixel blocks that, in the two-dimensional representation, are located toward top and bottom edges of an image may project to shapes that are triangular in the spherical projection of the input frame 610 and will be located toward polar regions of the spherical projection. Pixel blocks that are located along rows in the middle of the two-dimensional representation may project to shapes that are approximately rectangular and will be located toward equatorial regions of the spherical projection. Pixel blocks from the two-dimensional representation that map to intermediate locations between the equatorial regions and the polar regions may have generally trapezoidal shapes. All such representations of pixel blocks to the spherical projection may be considered under the method 500 of FIG. 5 even though their shapes vary from pixel block to pixel block.

FIG. 6 also illustrates an exemplary rotation of a spherical projection of a reference frame 620. Here, an axis 622 of the spherical projection of the reference frame 620 is shown as rotated by an angle α, β, γ from its original position. When coding pixel blocks 614, 616 of the input frame, the method 500 of FIG. 5 may perform pixel-wise comparisons between pixels in the spherical projection of the input frame and pixels in co-located reference blocks (shown as blocks 624, 626 respectively) in the spherical projection of the reference frame 620. Here, again, reference blocks may vary in size and shape at different locations in the spherical projection of the reference picture 620.
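The candidate-rotation search may be illustrated with a deliberately reduced case. The sketch below considers only one rotation axis (yaw) and a single ring of samples at one latitude, rotates the reference ring in whole-sample steps, and scores each candidate by a sum of absolute differences against the co-located input samples. A full search over α, β, γ on a sphere would follow the same pattern with a three-angle rotation. The names `sad` and `best_yaw` are hypothetical, and the sum of absolute differences is an assumed stand-in for the residual estimate.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(p - q) for p, q in zip(block_a, block_b))

def best_yaw(input_ring, ref_ring, start, length, max_shift):
    """Find the yaw shift (in samples) that best aligns the reference ring.

    The input block is input_ring[start:start + length]; it is assumed not
    to wrap past the end of the ring. Rotating the reference by `shift`
    samples brings reference sample (i - shift) to position i.
    """
    n = len(ref_ring)
    block = input_ring[start:start + length]
    best = None
    for shift in range(-max_shift, max_shift + 1):
        candidate = [ref_ring[(start + i - shift) % n] for i in range(length)]
        cost = sad(block, candidate)
        if best is None or cost < best[1]:
            best = (shift, cost)
    return best  # (shift in samples, residual cost)

ref_ring = [0, 10, 20, 30, 40, 50, 60, 70]
input_ring = [60, 70, 0, 10, 20, 30, 40, 50]  # reference content rotated by 2 samples
shift, cost = best_yaw(input_ring, ref_ring, start=3, length=3, max_shift=3)
```

A shift of s samples on a ring of n samples corresponds to a yaw angle of 2π·s/n radians.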

When a prediction reference is selected for an input pixel block, the angle of rotation α, β, γ that corresponds to the selected prediction reference may be converted to a motion vector in the two-dimensional space of the reference picture. This motion vector may be transmitted as part of coded video data of the input pixel block.

Motion vectors for many coding protocols, such as HEVC and AVC, are limited to describing spatial displacements (e.g., x and y directions) in the two-dimensional domain of the input frames and reference frames. It may occur that the angle of rotation α, β, γ for a given prediction search maps to a spatial location in the two-dimensional domain that is both displaced by x and y directions and also is rotated with respect to a source pixel block. In one embodiment, if desired to utilize the principles of the present disclosure with such video coders, input pixel blocks may be coded using prediction references that are identified solely by the x and y displacements obtained from a conversion of the motion vector from the spherical domain to the two-dimensional domain. In other words, if an angle of rotation α, β, γ in the spherical domain converts to a motion vector of the form Δx, Δy and λ (where λ represents a rotation of a pixel block in the two-dimensional space), a video coder may perform prediction using a prediction reference selected by a motion vector of the form Δx, Δy, where λ is ignored. However, better performance is expected to be achieved where motion vectors for prediction may be represented fully, for example, in either a Δx, Δy, λ format or an α, β, γ format; these alternate embodiments are discussed hereinbelow.
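A conversion from a spherical rotation to a Δx, Δy displacement may be sketched for the equirectangular case as follows. Several choices here are assumptions, since the text does not fix them: the rotation is taken as roll (γ about x), then pitch (β about y), then yaw (α about z); the displacement is measured at the block center only; the in-plane rotation λ is discarded; and whether the rotation or its inverse applies depends on the search convention. All function names are hypothetical.

```python
import math

def pixel_to_direction(x, y, width, height):
    """Unit direction for equirectangular pixel (x, y), using pixel centers."""
    theta = (x + 0.5) * 2.0 * math.pi / width - math.pi
    phi = (y + 0.5) * math.pi / height - math.pi / 2.0
    return (math.cos(phi) * math.cos(theta),
            math.cos(phi) * math.sin(theta),
            math.sin(phi))

def direction_to_pixel(v, width, height):
    """Inverse of pixel_to_direction; returns fractional pixel coordinates."""
    vx, vy, vz = v
    theta = math.atan2(vy, vx)
    phi = math.asin(max(-1.0, min(1.0, vz)))
    x = (theta + math.pi) * width / (2.0 * math.pi) - 0.5
    y = (phi + math.pi / 2.0) * height / math.pi - 0.5
    return x, y

def rotate(v, alpha, beta, gamma):
    """Apply roll (gamma), then pitch (beta), then yaw (alpha). Assumed order."""
    x, y, z = v
    y, z = (y * math.cos(gamma) - z * math.sin(gamma),
            y * math.sin(gamma) + z * math.cos(gamma))
    x, z = (x * math.cos(beta) + z * math.sin(beta),
            -x * math.sin(beta) + z * math.cos(beta))
    x, y = (x * math.cos(alpha) - y * math.sin(alpha),
            x * math.sin(alpha) + y * math.cos(alpha))
    return x, y, z

def rotation_to_motion_vector(x, y, width, height, alpha, beta, gamma):
    """Displacement (dx, dy) of block center (x, y) under the rotation."""
    v = rotate(pixel_to_direction(x, y, width, height), alpha, beta, gamma)
    rx, ry = direction_to_pixel(v, width, height)
    dx = (rx - x + width / 2.0) % width - width / 2.0  # wrap across the seam
    dy = ry - y
    return dx, dy
```

For a 360×180 picture, a pure yaw of 3° moves a block center by three pixels horizontally and leaves its row unchanged; rotations with pitch or roll components generally change both coordinates and also imply a λ term that this sketch drops.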

Many coding applications perform motion estimation at granularities smaller than an individual pixel of a reference picture. For example, in the HEVC and H.264 protocols, video coders perform motion estimation at quarter-pixel and/or half-pixel increments. In an embodiment, a video encoder may perform spatial interpolation to develop image data sufficient to perform motion estimation at these smaller granularities. For example, the video encoder may perform interpolation to find a reference pixel that matches the source pixel at a finer rotation angle. In this manner, the spherically-projected rotation data may contain sufficient information to perform prediction searches at such granularities.
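Sampling a reference picture at fractional positions may be sketched with bilinear interpolation. This is an illustrative simplification: the HEVC and H.264 specifications define their own, longer interpolation filters, and the interpolation kernel is not specified in the text above. The name `sample_bilinear` is hypothetical, and coordinates outside the picture are clamped to its border as an assumed edge policy.

```python
import math

def sample_bilinear(picture, x, y):
    """Return the bilinearly interpolated value of `picture` at (x, y).

    `picture` is a list of rows of sample values; x and y may be fractional
    (for example, quarter-pixel positions).
    """
    height = len(picture)
    width = len(picture[0])
    x = min(max(x, 0.0), width - 1.0)   # clamp to the picture area
    y = min(max(y, 0.0), height - 1.0)
    x0 = int(math.floor(x))
    y0 = int(math.floor(y))
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    fx = x - x0
    fy = y - y0
    top = picture[y0][x0] * (1.0 - fx) + picture[y0][x1] * fx
    bottom = picture[y1][x0] * (1.0 - fx) + picture[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy
```

For a spherically projected search, the fractional position would come from converting a finely stepped rotation angle back to picture coordinates before sampling.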

The foregoing process also may be applied for use in intra-coding. When an input pixel block is to be coded, it may be coded with reference to previously-coded image data of the same frame in which the pixel block is located. Thus, referring to FIG. 6, by the time pixel block 616 is to be coded, coded data for several other pixel blocks (including block 614) will have been coded and transmitted to a decoder. Both the encoder and the decoder will have decoded data of those pixel blocks, which may be used as a source of prediction for pixel block 616. According to an embodiment, a video coder may search for appropriate prediction data within the previously-coded and decoded image data of the current frame, by rotating the coded image data through various permutations of α, β, γ to identify a prediction reference on an intra-coding basis. Also, the direction of intra prediction in the two-dimensional frame can be reflected in the spherical projection 610, where block 616 takes on a wedge-like shape. For example, a vertical prediction direction in the two-dimensional frame can be mapped accordingly in the spherical projection 610 with interpolation. The video coder may estimate a coding cost J of the candidate prediction references according to the techniques of Eq. 3.

FIG. 7 is a functional block diagram of a coding system 700 according to an embodiment of the present disclosure. The system 700 may include a pixel block coder 710, a pixel block decoder 720, an in-loop filter system 730, a reference picture store 740, a pair of spherical transform units 750, 760, a predictor 770, a controller 780, and a syntax unit 790. The pixel block coder and decoder 710, 720 and the predictor 770 may operate iteratively on individual pixel blocks of a picture. The predictor 770 may predict data for use during coding of a newly-presented input pixel block. The pixel block coder 710 may code the new pixel block by predictive coding techniques and present coded pixel block data to the syntax unit 790. The pixel block decoder 720 may decode the coded pixel block data, generating decoded pixel block data therefrom. The in-loop filter 730 may perform various filtering operations on a decoded picture that is assembled from the decoded pixel blocks obtained by the pixel block decoder 720. The filtered picture may be stored in the reference picture store 740 where it may be used as a source of prediction of a later-received pixel block. The syntax unit 790 may assemble a data stream from the coded pixel block data which conforms to a governing coding protocol.

The pixel block coder 710 may include a subtractor 712, a transform unit 714, a quantizer 716, and an entropy coder 718. The pixel block coder 710 may accept pixel blocks of input data at the subtractor 712. The subtractor 712 may receive predicted pixel blocks from the predictor 770 and generate an array of pixel residuals therefrom representing a difference between the input pixel block and the predicted pixel block. The transform unit 714 may apply a transform to the sample data output from the subtractor 712, to convert data from the pixel domain to a domain of transform coefficients. The quantizer 716 may perform quantization of transform coefficients output by the transform unit 714. The quantizer 716 may be a uniform or a non-uniform quantizer. The entropy coder 718 may reduce bandwidth of the output of the coefficient quantizer by coding the output, for example, by variable length code words.

The transform unit 714 may operate in a variety of transform modes as determined by the controller 780. For example, the transform unit 714 may apply a discrete cosine transform (DCT), a discrete sine transform (DST), a Walsh-Hadamard transform, a Haar transform, a Daubechies wavelet transform, or the like. In an embodiment, the controller 780 may select a coding mode M to be applied by the transform unit 714, may configure the transform unit 714 accordingly and may signal the coding mode M in the coded video data, either expressly or impliedly.

The quantizer 716 may operate according to a quantization parameter Q_P that is supplied by the controller 780. In an embodiment, the quantization parameter Q_P may be applied to the transform coefficients as a multi-value quantization parameter, which may vary, for example, across different coefficient locations within a transform-domain pixel block. Thus, the quantization parameter Q_P may be provided as a quantization parameter array.

The pixel block decoder 720 may invert coding operations of the pixel block coder 710. For example, the pixel block decoder 720 may include a dequantizer 722, an inverse transform unit 724, and an adder 726. The pixel block decoder 720 may take its input data from an output of the quantizer 716. Although permissible, the pixel block decoder 720 need not perform entropy decoding of entropy-coded data since entropy coding is a lossless event. The dequantizer 722 may invert operations of the quantizer 716 of the pixel block coder 710. The dequantizer 722 may perform uniform or non-uniform de-quantization as specified by the decoded signal Q_P. Similarly, the inverse transform unit 724 may invert operations of the transform unit 714. The dequantizer 722 and the inverse transform unit 724 may use the same quantization parameters Q_P and transform mode M as their counterparts in the pixel block coder 710. Quantization operations likely will truncate data in various respects and, therefore, data recovered by the dequantizer 722 likely will possess coding errors when compared to the data presented to the quantizer 716 in the pixel block coder 710.

The adder 726 may invert operations performed by the subtractor 712. It may receive the same prediction pixel block from the predictor 770 that the subtractor 712 used in generating residual signals. The adder 726 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 724 and may output reconstructed pixel block data.

The in-loop filter 730 may perform various filtering operations on recovered pixel block data. For example, the in-loop filter 730 may include a deblocking filter 732 and a sample adaptive offset (“SAO”) filter 733. The deblocking filter 732 may filter data at seams between reconstructed pixel blocks to reduce discontinuities between the pixel blocks that arise due to coding. SAO filters may add offsets to pixel values according to an SAO “type,” for example, based on edge direction/shape and/or pixel/color component level. The in-loop filter 730 may operate according to parameters that are selected by the controller 780.
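
As a simplified sketch of an offset filter keyed to pixel level, the following example divides the sample range into 32 equal-width bands and adds an offset to samples that fall in selected bands, clipping the result to the valid range. The band count, the dictionary of offsets and the function name are assumptions made for illustration; the disclosure does not specify how the SAO filter 733 derives or applies its offsets.

```python
def sao_band_offset(pixels, band_offsets, bit_depth=8):
    """Add a per-band offset to each sample and clip to the sample range.

    band_offsets maps a band index (0..31) to a signed offset; bands that
    are absent from the mapping are left unchanged.
    """
    shift = bit_depth - 5              # 32 bands of equal width
    max_val = (1 << bit_depth) - 1
    return [[min(max_val, max(0, p + band_offsets.get(p >> shift, 0)))
             for p in row]
            for row in pixels]


# Hypothetical reconstructed samples and offsets for three bands.
pixels = [[10, 100], [254, 40]]
filtered = sao_band_offset(pixels, {1: 2, 12: -3, 31: 5})
```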

The reference picture store 740 may store filtered pixel data for use in later prediction of other pixel blocks. Different types of prediction data are made available to the predictor 770 for different prediction modes. For example, for an input pixel block, intra prediction takes a prediction reference from decoded data of the same picture in which the input pixel block is located. Thus, the reference picture store 740 may store decoded pixel block data of each picture as it is coded. For the same input pixel block, inter prediction may take a prediction reference from previously coded and decoded picture(s) that are designated as reference pictures. Thus, the reference picture store 740 may store these decoded reference pictures.

The spherical transform units 750, 760 may perform transforms of image data to spherical projection representations. The first spherical transform unit 750 may perform its transform on candidate prediction reference data from the reference picture store, whether for intra prediction or inter prediction. The second spherical transform unit 760 may perform its transform on input video data as it is presented to the pixel block coder 710. The spherical transform units 750, 760 may output their transformed data, respectively, to the predictor 770.
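
A minimal sketch of one possible spherical transform follows. It assumes, purely for illustration, that the source picture is in an equirectangular layout in which columns map linearly to longitude and rows map linearly to latitude; the function name, the axis convention and the half-pixel centering are hypothetical, and the transforms actually used by the spherical transform units 750, 760 may differ.

```python
import math


def equirect_to_sphere(x, y, width, height, radius=1.0):
    """Map the center of pixel (x, y) to a 3-D point on a sphere.

    Assumed convention: longitude spans [-pi, pi) left to right and
    latitude spans (pi/2, -pi/2) top to bottom.
    """
    theta = (x + 0.5) / width * 2.0 * math.pi - math.pi    # longitude
    phi = math.pi / 2.0 - (y + 0.5) / height * math.pi     # latitude
    return (radius * math.cos(phi) * math.cos(theta),
            radius * math.cos(phi) * math.sin(theta),
            radius * math.sin(phi))


# Hypothetical 8x4 picture: the top-left pixel lands in the upper hemisphere.
top_left = equirect_to_sphere(0, 0, 8, 4, radius=2.0)
bottom_row = equirect_to_sphere(3, 3, 8, 4)
```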

As discussed, the predictor 770 may supply prediction data to the pixel block coder 710 for use in generating residuals. The predictor 770 may include an inter predictor 772, an intra predictor 773 and a mode decision unit 774. The inter predictor 772 may receive spherically-projected pixel block data representing a new pixel block to be coded and may search spherical projections of reference picture data from store 740 for pixel block data from reference picture(s) for use in coding the input pixel block. The inter predictor 772 may support a plurality of prediction modes, such as P mode coding and B mode coding. The inter predictor 772 may select an inter prediction mode and an identification of candidate prediction reference data that provides a closest match to the input pixel block being coded. The inter predictor 772 may generate prediction reference metadata such as motion vectors, to identify which portion(s) of which reference pictures were selected as source(s) of prediction for the input pixel block.

The intra predictor 773 may support Intra (I) mode coding. The intra predictor 773 may search from among spherically-projected pixel block data from the same picture as the pixel block being coded that provides a closest match to the spherically-projected input pixel block. The intra predictor 773 also may generate prediction reference indicators to identify which portion of the picture was selected as a source of prediction for the input pixel block.

The mode decision unit 774 may select a final coding mode to be applied to the input pixel block. Typically, as described above, the mode decision unit 774 selects the prediction mode that will achieve the lowest distortion when video is decoded given a target bitrate. Exceptions may arise when coding modes are selected to satisfy other policies to which the coding system 700 adheres, such as satisfying a particular channel behavior, or supporting random access or data refresh policies. When the mode decision selects the final coding mode, the mode decision unit 774 may output a non-spherically-projected reference block from the store 740 to the pixel block coder and decoder 710, 720 and may supply to the controller 780 an identification of the selected prediction mode along with the prediction reference indicators corresponding to the selected mode.
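
The selection performed by the mode decision unit 774 may be sketched, under stated assumptions, as a minimization of a cost J over candidate modes. The Lagrangian form J = D + λ·R used below is a common formulation assumed for this sketch; the candidate list, field names and numeric values are hypothetical.

```python
def select_mode(candidates, lam):
    """Return the candidate whose cost J = D + lam * R is lowest."""
    return min(candidates,
               key=lambda c: c["distortion"] + lam * c["rate_bits"])


# Hypothetical candidates produced by the intra and inter predictors.
candidates = [
    {"mode": "intra", "distortion": 120.0, "rate_bits": 40},
    {"mode": "inter_P", "distortion": 90.0, "rate_bits": 60},
    {"mode": "inter_B", "distortion": 80.0, "rate_bits": 95},
]

best = select_mode(candidates, lam=0.5)
best_low_lambda = select_mode(candidates, lam=0.1)
```

With a larger λ the rate term dominates and the cheaper P-mode candidate wins in this example; with a smaller λ the lower-distortion B-mode candidate is selected instead.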

The controller 780 may control overall operation of the coding system 700. The controller 780 may select operational parameters for the pixel block coder 710 and the predictor 770 based on analyses of input pixel blocks and also external constraints, such as coding bitrate targets and other operational parameters. As is relevant to the present discussion, when it selects quantization parameters Q_P, the use of uniform or non-uniform quantizers, and/or the transform mode M, it may provide those parameters to the syntax unit 790, which may include data representing those parameters in the data stream of coded video data output by the system 700.

During operation, the controller 780 may revise operational parameters of the quantizer 716 and the transform unit 714 at different granularities of image data, either on a per pixel block basis or on a larger granularity (for example, per picture, per slice, per largest coding unit (“LCU”) or another region). In an embodiment, the quantization parameters may be revised on a per-pixel basis within a coded picture.

Additionally, as discussed, the controller 780 may control operation of the in-loop filter 730 and the prediction unit 770. Such control may include, for the prediction unit 770, mode selection (lambda, modes to be tested, search windows, distortion strategies, etc.), and, for the in-loop filter 730, selection of filter parameters, reordering parameters, weighted prediction, etc.

FIG. 8 is a functional block diagram of a decoding system 800 according to an embodiment of the present disclosure. The decoding system 800 may include a syntax unit 810, a pixel block decoder 820, an in-loop filter 830, a reference picture store 840, a predictor 850 and a controller 860. The syntax unit 810 may receive a coded video data stream and may parse the coded data into its constituent parts. Data representing coding parameters may be furnished to the controller 860 while data representing coded residuals (the data output by the pixel block coder 210 of FIG. 2) may be furnished to the pixel block decoder 820. The pixel block decoder 820 may invert coding operations provided by the pixel block coder (FIG. 2). The in-loop filter 830 may filter reconstructed pixel block data. The reconstructed pixel block data may be assembled into pictures for display and output from the decoding system 800 as output video. The pictures also may be stored in the reference picture store 840 for use in prediction operations. The predictor 850 may supply prediction data to the pixel block decoder 820 as determined by coding data received in the coded video data stream.

The pixel block decoder 820 may include an entropy decoder 822, a dequantizer 824, an inverse transform unit 826, and an adder 828. The entropy decoder 822 may perform entropy decoding to invert processes performed by the entropy coder 718 (FIG. 7). The dequantizer 824 may invert operations of the quantizer 716 of the pixel block coder 710 (FIG. 7). Similarly, the inverse transform unit 826 may invert operations of the transform unit 714 (FIG. 7). They may use the quantization parameters Q_P and transform modes M that are provided in the coded video data stream. Because quantization is likely to truncate data, the data recovered by the dequantizer 824 likely will possess coding errors when compared to the input data presented to its counterpart quantizer 716 in the pixel block coder 710 (FIG. 7).

The adder 828 may invert operations performed by the subtractor 712 (FIG. 7). It may receive a prediction pixel block from the predictor 850 as determined by prediction references in the coded video data stream. The adder 828 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 826 and may output reconstructed pixel block data.

The in-loop filter 830 may perform various filtering operations on reconstructed pixel block data. As illustrated, the in-loop filter 830 may include a deblocking filter 832 and an SAO filter 834. The deblocking filter 832 may filter data at seams between reconstructed pixel blocks to reduce discontinuities between the pixel blocks that arise due to coding. SAO filters 834 may add offset to pixel values according to an SAO type, for example, based on edge direction/shape and/or pixel level. Other types of in-loop filters may also be used in a similar manner. Operation of the deblocking filter 832 and the SAO filter 834 ideally would mimic operation of their counterparts in the coding system 700 (FIG. 7). Thus, in the absence of transmission errors or other abnormalities, the decoded picture obtained from the in-loop filter 830 of the decoding system 800 would be the same as the decoded picture obtained from the in-loop filter 730 of the coding system 700 (FIG. 7); in this manner, the coding system 700 and the decoding system 800 should store a common set of reference pictures in their respective reference picture stores 740, 840.

The reference picture store 840 may store filtered pixel data for use in later prediction of other pixel blocks. The reference picture store 840 may store decoded pixel block data of each picture as it is decoded for use in intra prediction. The reference picture store 840 also may store decoded reference pictures.

As discussed, the predictor 850 may supply prediction data to the pixel block decoder 820. The predictor 850 may supply predicted pixel block data as determined by the prediction reference indicators supplied in the coded video data stream.

The controller 860 may control overall operation of the decoding system 800. The controller 860 may set operational parameters for the pixel block decoder 820 and the predictor 850 based on parameters received in the coded video data stream. As is relevant to the present discussion, these operational parameters may include quantization parameters Q_P for the dequantizer 824 and transform modes M for the inverse transform unit 826. As discussed, the received parameters may be set at various granularities of image data, for example, on a per pixel block basis, a per picture basis, a per slice basis, a per LCU basis, or based on other types of regions defined for the input image.

In an embodiment, use of spherical transforms by an encoder for selection of prediction references during coding does not require a decoder to use such transforms. In the embodiments illustrated in FIGS. 2 and 7, encoders 200, 700 may perform prediction searches with reference to spherically-projected input data and reference data but the differential video coding 230, 710 itself may be performed using non-spherically-projected data. In this manner, embodiments of the present disclosure may be used cooperatively with decoders that do not perform spherical projections.

FIG. 9 illustrates a coding method 900 according to an embodiment of the present disclosure. The method 900 may operate on a pixel block by pixel block basis to code a new input picture that is to be coded. The method 900 may begin by transforming data of an input pixel block to a spherical representation (box 910). The method 900 also may transform candidate reference picture data to the spherical representation (box 920). Thereafter, the method 900 may perform a prediction search (box 930) from a comparison between the transformed pixel block data and transformed reference picture data. When an appropriate prediction reference is found, the method 900 may code the input pixel block differentially using the matching reference block as a basis for prediction (box 940). Typically, this differential coding includes a calculation of pixel residuals from a pixel-wise subtraction of prediction block data from the input pixel block data (box 942) and a transformation, quantization and entropy coding of the pixel residuals obtained therefrom (box 944). The method 900 may represent motion vector data in a α, β, γ format (box 946) where α, β, and γ respectively represent rotations of the spherically-projected reference picture from its initial axis (FIG. 6). The method 900 may output the motion vector along with the coded residuals and other metadata of prediction (typically, coding mode indicators, reference picture IDs) (box 948).
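
For illustration, applying a motion vector in the α, β, γ format to a point of a spherically-projected reference picture may be sketched as a three-angle rotation. The axis assignment below (α about the z axis, β about the y axis, γ about the x axis, composed as Rz·Ry·Rx) is an assumption made for this sketch; the disclosure defines the angles with reference to FIG. 6, and the function names are hypothetical.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Build R = Rz(alpha) * Ry(beta) * Rx(gamma), angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def rotate(point, alpha, beta, gamma):
    """Rotate a 3-D point on the sphere by the motion vector angles."""
    r = rotation_matrix(alpha, beta, gamma)
    return tuple(sum(r[i][j] * point[j] for j in range(3)) for i in range(3))


# A point on the x axis rotated a quarter turn about z lands
# approximately on the y axis.
rotated = rotate((1.0, 0.0, 0.0), math.pi / 2, 0.0, 0.0)

# A rotation leaves the distance from the sphere's origin unchanged.
arbitrary = rotate((0.3, -0.4, 0.5), 0.2, -0.7, 1.1)
```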

In another embodiment, the method 900 may represent the α, β, γ format differentially with respect to a global rotation value estimated for the reference picture. In such an embodiment, the method 900 may estimate a global rotational difference between a reference picture and an input picture (box 940). For example, the method 900 may perform overall comparisons between the spherical projection of the input picture 610 (FIG. 6) and the spherical projection of the reference picture 620. The method may generate a first rotational parameter, called a “global rotation,” which reflects a detected rotation between the input picture 610 and the reference picture 620. The global rotation may be coded as a first vector α, β, γ. During coding of individual spherically-projected pixel blocks 614, 616, the reference picture may be rotated further in an effort to find rotations that achieve lower coding costs J than would be achieved if reference blocks were selected using the global rotation values alone. If such rotations are identified, then the motion vectors for the pixel blocks 614, 616 may be coded differentially with respect to the global rotation vector α, β, γ in a format Δα, Δβ and Δγ. In an embodiment, the global vector may be included in coded video data in syntactic elements that occur at a higher level than the coded pixel block data, for example, in picture headers or slice headers. And, when a single input picture is coded with reference to a plurality of reference pictures, such headers may include fields for identification of the reference pictures (for example, by a picture ID) and the global rotation vectors α, β, γ that apply to each of them.
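
The differential representation may be sketched as a component-wise difference against the global rotation vector, with the decoder adding the global vector back. The integer angle units and the sample values below are hypothetical, and angular wraparound is ignored for simplicity.

```python
def encode_differential(mv, global_rotation):
    """Return (d_alpha, d_beta, d_gamma) relative to the global rotation."""
    return tuple(m - g for m, g in zip(mv, global_rotation))


def decode_differential(delta, global_rotation):
    """Recover the absolute (alpha, beta, gamma) motion vector."""
    return tuple(d + g for d, g in zip(delta, global_rotation))


# Hypothetical values in integer angle units (for example, fractions of a degree).
global_rotation = (160, -32, 8)   # carried once, e.g. in a picture header
block_mv = (163, -30, 8)          # rotation found for one pixel block

delta_mv = encode_differential(block_mv, global_rotation)
restored_mv = decode_differential(delta_mv, global_rotation)
```

When block-level rotations cluster around the global rotation, the differences are small in magnitude, which is what makes the differential form cheaper to code in this sketch.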

The coding method 900 of FIG. 9 may be performed by a video coder such as shown in FIG. 7. In this embodiment, however, when the predictor 770 outputs spherical-domain motion vectors to the controller 780, the controller need not convert the motion vectors to a coder-domain representation. The controller 780 may perform the estimation of rotational differences between the input picture and the reference picture (inputs not shown) and may derive appropriate motion vector representations as described. The controller 780 also may output its estimate of rotational differences to the channel.

Similarly, decoding may be performed by a video decoder such as shown in FIG. 8. In this embodiment, however, a controller 860 may perform conversion operations to convert motion vectors from their differential representation (ΔMV) to absolute representation (MV) and further to convert the absolute representations of the motion vectors from a spherical-domain representation to a coder-domain representation. Thereafter, the predictor 850 may retrieve appropriate reference blocks from the reference picture store 840 and provide them to the pixel block decoder 820.

In another embodiment, spherical projections may assign variable radii to portions of image content from either input pictures or reference pictures. Such variable radii may accommodate object motion or camera movement that causes image content to become resized between input pictures and reference pictures. For example, when an object moves toward or away from a camera, the object's size within the camera's field of view may increase or decrease accordingly. Similarly, when a camera moves, some objects become closer to the camera, in which case their sizes increase, and other objects become farther from the camera, in which case their sizes decrease. In either case, a video coder may compensate for such changes by using different radii among spherical projections.

FIG. 10 illustrates an exemplary use case in which an input picture 1010 to be coded is transformed into a spherical projection 1012 on a sphere having radius R. The input picture 1010 may be coded predictively using a reference picture 1020 as a prediction reference, which has its own spherical projection 1022. For convenience, FIG. 10 illustrates only a portion of the spherical projections for the input and reference pictures. In FIG. 10, the spherical projection of the reference picture is shown also having a radius R. In an embodiment, a video coder may vary radii of the spherical projections 1012, 1022 of the input picture and the reference picture according to detected differences in relative sizes among objects therein in an effort to detect correlation between them. When correlation is found, the video coder may derive relative ratios of radii in the spherical projection 1012, 1022, shown as R_I/R_P, which reflects resizing between the elements of image content in the two projections 1012, 1022. In this manner, the spherical projection of one of the pictures (here, the reference picture 1020) may be altered, shown as projection 1024, to accommodate the alternate radius R_P. Thereafter, the resizing ratio R_I/R_P, along with other components of motion vectors derived from the prior embodiments, may be mapped to the reference picture, which identifies a reference block 1026 in the reference picture that may be resized and used as a basis to predict a pixel block 1014 in the input picture 1010.
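
As a rough sketch of the resizing step, the example below treats the ratio R_I/R_P as a uniform scale factor applied to a reference block and resamples the block by nearest-neighbor selection. Both the interpretation of the ratio as a direct scale factor and the choice of nearest-neighbor resampling are assumptions made for illustration; a practical coder likely would use interpolation filters, and the function name is hypothetical.

```python
def resize_block(block, ratio):
    """Resample a 2-D block by a uniform scale factor (nearest neighbor)."""
    src_h, src_w = len(block), len(block[0])
    dst_h = max(1, round(src_h * ratio))
    dst_w = max(1, round(src_w * ratio))
    return [[block[min(src_h - 1, int(y / ratio))][min(src_w - 1, int(x / ratio))]
             for x in range(dst_w)]
            for y in range(dst_h)]


# Hypothetical 2x2 reference block enlarged by a radius ratio of 2.
reference_block = [[1, 2], [3, 4]]
radius_input, radius_reference = 2.0, 1.0
prediction = resize_block(reference_block, radius_input / radius_reference)
```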

Variations of radii among spherical projections may be performed anew for each pixel block being coded or at other coding granularities. Thus, a video coder may change the radius of a reference picture's spherical projection on a per picture basis, a per tile basis or a per slice basis, if desired.

In another embodiment, spherical projections may assign variable spatial locations of origins of the spherical projections assigned to image content from the input pictures and/or the reference pictures. Such variable origins may accommodate object motion or camera movement that causes image content to become resized between input pictures and reference pictures. As discussed, when a camera moves, some objects become closer to the camera, in which case their sizes increase, and other objects become farther from the camera, in which case their sizes decrease. In this case, a video coder may compensate for such changes by assigning different locations to origins of the spherical projections.

FIG. 11 illustrates an exemplary use case in which an input picture 1110 to be coded is transformed into a spherical projection 1112 on a sphere having a radius R. The input picture 1110 may be coded predictively using a reference picture 1120 as a prediction reference, which has its own spherical projection 1122. For convenience, FIG. 11 illustrates only a portion of the spherical projections for the input and reference pictures. In FIG. 11, the spherical projection of the reference picture is shown also having the radius R. In an embodiment, a video coder may vary locations of origins of the radii R of the spherical projections 1112, 1122 of the input picture and the reference picture according to detected differences in relative sizes among objects therein in an effort to detect correlation between them. When correlation is found, the video coder may derive a relative offset among the spherical projections 1112, 1122, shown as Δx, Δy, Δz, which reflects shifts among the two projections 1112, 1122. Thereafter, the origin offsets Δx, Δy, Δz, along with other components of motion vectors derived from the prior embodiments, may be mapped to the reference picture, which identifies a reference block 1124 in the reference picture that may be resized and used as a basis to predict a pixel block 1114 in the input picture 1110.
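
One way to picture the effect of an origin offset is to displace a point of the reference projection by Δx, Δy, Δz and then project it back onto a sphere of radius R about the input picture's origin. This geometric interpretation and the function name are assumptions made for this sketch; the disclosure does not prescribe the computation.

```python
import math


def reproject_with_offset(point, offset, radius):
    """Displace a sphere point by the origin offset and re-project it
    onto a sphere of the given radius centered on the unshifted origin."""
    shifted = tuple(p + o for p, o in zip(point, offset))
    norm = math.sqrt(sum(c * c for c in shifted))
    if norm == 0.0:
        raise ValueError("displaced point coincides with the origin")
    return tuple(radius * c / norm for c in shifted)


# Hypothetical point on a unit sphere with the origin displaced along x.
shifted_point = reproject_with_offset((0.0, 1.0, 0.0), (1.0, 0.0, 0.0), 1.0)
```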

Variations of origin offsets Δx, Δy, Δz among spherical projections may be performed anew for each pixel block being coded or at other coding granularities. Thus, a video coder may change the origin offset of reference picture spherical projections on a per picture basis, a per tile basis or a per slice basis, if desired.

In a further embodiment, the techniques of FIG. 9, FIG. 10 and FIG. 11 may be combined to provide video coders and decoders multiple degrees of freedom to identify differences among spherical projections of input pictures and reference pictures. Thus, a video coder may provide motion vectors that reflect, for example, offsets with respect to a global rotation vector (which may be represented by angular vectors α, β, γ), radial ratios R_I/R_P and/or origin offsets Δx, Δy, Δz.

FIG. 12 is a functional block diagram of a decoding system 1200 according to another embodiment of the present disclosure. In the embodiment of FIG. 12, the decoding system 1200 may operate in a domain of a spherical projection, shown figuratively in FIGS. 3-4, 6 and 10-11, until decoded image data is to be output from the decoding system 1200. Decoded image data may be converted to a two-dimensional domain when it is to be output from the decoding system 1200.

The decoding system 1200 may include a syntax unit 1210, a pixel block decoder 1220, an in-loop filter 1230, a reference picture store 1240, a predictor 1250, a controller 1260, a spherical transform unit 1270 and an inverse spherical transform unit 1280. The syntax unit 1210 may receive a coded video data stream and may parse the coded data into its constituent parts. Data representing coding parameters may be furnished to the controller 1260 while data representing coded residuals (the data output by the pixel block coder 210 of FIG. 2) may be furnished to the pixel block decoder 1220. The pixel block decoder 1220 may invert coding operations provided by the pixel block coder (FIG. 2). The in-loop filter 1230 may filter reconstructed pixel block data. The reconstructed pixel block data may be assembled into pictures for display and output from the decoding system 1200 as output video. The pictures also may be stored in the reference picture store 1240 for use in prediction operations. The predictor 1250 may supply prediction data to the pixel block decoder 1220 as determined by coding data received in the coded video data stream. The spherical transform unit 1270 may transform data from the reference picture store 1240 to the spherical domain and furnish the transformed data to the predictor 1250. The inverse spherical transform unit 1280 may transform prediction data from the spherical domain back to the domain of the pixel block decoder 1220 and may furnish the transformed data to the adder 1228 therein.

The pixel block decoder 1220 may include an entropy decoder 1222, a dequantizer 1224, an inverse transform unit 1226, and an adder 1228. The entropy decoder 1222 may perform entropy decoding to invert processes performed by the entropy coder 718 (FIG. 7). The dequantizer 1224 may invert operations of the quantizer 716 of the pixel block coder 710 (FIG. 7). Similarly, the inverse transform unit 1226 may invert operations of the transform unit 714 (FIG. 7). They may use the quantization parameters Q_P and transform modes M that are provided in the coded video data stream. Because quantization is likely to truncate data, the data recovered by the dequantizer 1224 likely will possess coding errors when compared to the input data presented to its counterpart quantizer 716 in the pixel block coder 710 (FIG. 7).

The adder 1228 may invert operations performed by the subtractor 712 (FIG. 7). It may receive a prediction pixel block from the predictor 1250 as determined by prediction references in the coded video data stream. The adder 1228 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 1226 and may output reconstructed pixel block data.

The in-loop filter 1230 may perform various filtering operations on reconstructed pixel block data. As illustrated, the in-loop filter 1230 may include a deblocking filter 1232 and an SAO filter 1234. The deblocking filter 1232 may filter data at seams between reconstructed pixel blocks to reduce discontinuities between the pixel blocks that arise due to coding. SAO filters 1234 may add offset to pixel values according to an SAO type, for example, based on edge direction/shape and/or pixel level. Other types of in-loop filters may also be used in a similar manner. Operation of the deblocking filter 1232 and the SAO filter 1234 ideally would mimic operation of their counterparts in the coding system 700 (FIG. 7). Thus, in the absence of transmission errors or other abnormalities, the decoded picture obtained from the in-loop filter 1230 of the decoding system 1200 would be the same as the decoded picture obtained from the in-loop filter 730 of the coding system 700 (FIG. 7); in this manner, the coding system 700 and the decoding system 1200 should store a common set of reference pictures in their respective reference picture stores 740, 1240.

The reference picture store 1240 may store filtered pixel data for use in later prediction of other pixel blocks. The reference picture store 1240 may store decoded pixel block data of each picture as it is decoded for use in intra prediction. The reference picture store 1240 also may store decoded reference pictures.

As discussed, the predictor 1250 may supply prediction data to the pixel block decoder 1220. The predictor 1250 may supply predicted pixel block data as determined by the prediction reference indicators supplied in the coded video data stream.

The controller 1260 may control overall operation of the decoding system 1200. The controller 1260 may set operational parameters for the pixel block decoder 1220 and the predictor 1250 based on parameters received in the coded video data stream. As is relevant to the present discussion, these operational parameters may include quantization parameters Q_P for the dequantizer 1224 and transform modes M for the inverse transform unit 1226. As discussed, the received parameters may be set at various granularities of image data, for example, on a per pixel block basis, a per picture basis, a per slice basis, a per LCU basis, or based on other types of regions defined for the input image.

In the embodiment of FIG. 12, the pixel block decoder 1220, the in-loop filter 1230, and the reference picture store 1240 may operate on image data in a two-dimensional domain. The predictor 1250, however, may operate on motion vectors that identify reference blocks according to the techniques described hereinabove with respect to FIGS. 6, 10 and/or 11, which may identify reference pixel blocks by a rotational vector (α, β, γ), a ratio of projection radii (R_I/R_P), and/or offsets between origins of the spherical projections (Δx, Δy, Δz). In response, the predictor 1250 may retrieve a reference picture from the reference picture store 1240 and the spherical transform unit 1270 may transform the retrieved picture to the spherical domain. The predictor 1250 may align the spherical projection of the reference picture to the input pixel blocks according to the motion vectors that are received (e.g., by rotating the spherical projection of the reference picture according to the rotational vector (α, β, γ), by resizing the spherical projection of the reference picture according to the ratio of projection radii (R_I/R_P), and/or by shifting an origin of the spherical projection of the reference picture according to the origin offsets (Δx, Δy, Δz)). After such processing, a portion of the reference picture that aligns with a location of the coded pixel block that is being decoded may be output to the adder 1228 as a reference block.

The inverse spherical transform unit 1280 may transform the data of the reference block to the two-dimensional domain. The inverse transform unit may invert operations described with respect to ¶¶[29]-[36] hereinabove. The transformed reference block may be output to the adder 1228 of the pixel block decoder 1220 as a prediction block.

FIG. 13 is a functional block diagram of a decoding system 1300 according to another embodiment of the present disclosure. In the embodiment of FIG. 13, the decoding system 1300 may operate in a domain of a spherical projection, shown figuratively in FIGS. 3-4, 6 and 10-11, until decoded image data is to be output from the decoding system 1300. Decoded image data may be converted to a two-dimensional domain when it is to be output from the decoding system 1300.

The decoding system 1300 may include a syntax unit 1310, a pixel block decoder 1320, an in-loop filter 1330, a reference picture store 1340, a predictor 1350, a controller 1360 and an inverse spherical transform unit 1370. The syntax unit 1310 may receive a coded video data stream and may parse the coded data into its constituent parts. Data representing coding parameters may be furnished to the controller 1360 while data representing coded residuals (the data output by the pixel block coder 210 of FIG. 2) may be furnished to the pixel block decoder 1320. The pixel block decoder 1320 may invert coding operations provided by the pixel block coder (FIG. 2). The in-loop filter 1330 may filter reconstructed pixel block data. The reconstructed pixel block data may be assembled into pictures for display and output from the decoding system 1300 as output video. The pictures also may be stored in the reference picture store 1340 for use in prediction operations. The predictor 1350 may supply prediction data to the pixel block decoder 1320 as determined by coding data received in the coded video data stream.

The pixel block decoder 1320 may include an entropy decoder 1322, a dequantizer 1324, an inverse transform unit 1326, and an adder 1328. The entropy decoder 1322 may perform entropy decoding to invert processes performed by the entropy coder 718 (FIG. 7). The dequantizer 1324 may invert operations of the quantizer 716 of the pixel block coder 710 (FIG. 7). Similarly, the inverse transform unit 1326 may invert operations of the transform unit 714 (FIG. 7). They may use the quantization parameters Q_P and transform modes M that are provided in the coded video data stream. Because quantization is likely to truncate data, the data recovered by the dequantizer 1324 likely will possess coding errors when compared to the input data presented to its counterpart quantizer 716 in the pixel block coder 710 (FIG. 7).

The adder 1328 may invert operations performed by the subtractor 712 (FIG. 7). It may receive a prediction pixel block from the predictor 1350 as determined by prediction references in the coded video data stream. The adder 1328 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 1326 and may output reconstructed pixel block data.

The in-loop filter 1330 may perform various filtering operations on reconstructed pixel block data. As illustrated, the in-loop filter 1330 may include a deblocking filter 1332 and an SAO filter 1334. The deblocking filter 1332 may filter data at seams between reconstructed pixel blocks to reduce discontinuities between the pixel blocks that arise due to coding. SAO filters 1334 may add offset to pixel values according to an SAO type, for example, based on edge direction/shape and/or pixel level. Other types of in-loop filters may also be used in a similar manner. Operation of the deblocking filter 1332 and the SAO filter 1334 ideally would mimic operation of their counterparts in the coding system 700 (FIG. 7). Thus, in the absence of transmission errors or other abnormalities, the decoded picture obtained from the in-loop filter 1330 of the decoding system 1300 would be the same as the decoded picture obtained from the in-loop filter 730 of the coding system 700 (FIG. 7); in this manner, the coding system 700 and the decoding system 1300 should store a common set of reference pictures in their respective reference picture stores 740, 1340.

The reference picture store 1340 may store filtered pixel data for use in later prediction of other pixel blocks. The reference picture store 1340 may store decoded pixel block data of each picture as it is decoded for use in intra prediction. The reference picture store 1340 also may store decoded reference pictures.

As discussed, the predictor 1350 may supply prediction data to the pixel block decoder 1320. The predictor 1350 may supply predicted pixel block data as determined by the prediction reference indicators supplied in the coded video data stream.

The controller 1360 may control overall operation of the decoding system 1300. The controller 1360 may set operational parameters for the pixel block decoder 1320 and the predictor 1350 based on parameters received in the coded video data stream. As is relevant to the present discussion, these operational parameters may include quantization parameters Q_P for the dequantizer 1324 and transform modes M for the inverse transform unit 1326. As discussed, the received parameters may be set at various granularities of image data, for example, on a per pixel block basis, a per picture basis, a per slice basis, a per LCU basis, or based on other types of regions defined for the input image.

As indicated, the pixel block decoder 1320, the in-loop filter 1330, the reference picture store 1340, the predictor 1350 and the controller 1360 may operate on image data in a spherical projection domain. Thus, the decoding system 1300 may decode coded pixel blocks that vary in size and shape as shown in FIG. 6. The predictor 1350 may operate on motion vectors that identify reference blocks according to the techniques described hereinabove with respect to FIGS. 6, 10 and/or 11, which may identify reference pixel blocks by a rotational vector (α, β, γ), a ratio of projection radii (Ri/Rp), and/or offsets between origins of the spherical projections (Δx, Δy, Δz). In response, the predictor 1350 may retrieve reference pixel blocks from the reference picture store 1340 and align them to the input pixel blocks according to the motion vectors that are received (e.g., by rotating the spherical projection of the reference picture according to the rotational vector (α, β, γ), by resizing the spherical projection of the reference picture according to the ratio of projection radii (Ri/Rp), and/or by shifting an origin of the spherical projection of the reference picture according to the origin offsets (Δx, Δy, Δz)). After such processing, a portion of the reference picture that aligns with a location of the coded pixel block that is being decoded may be output to the adder 1328 as a reference block.
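The three alignment operations named above (rotation, resizing, and origin shift) may be sketched for a single point of the reference picture's spherical projection as follows. The sketch rests on assumptions that the disclosure does not fix: α, β and γ are treated as rotations about the x, y and z axes applied in that order, the radius ratio is applied as a multiplicative scale on the reference coordinates, and the origin offset is added last. All function names are hypothetical.

```python
import math


def rotate(point, alpha, beta, gamma):
    """Rotate a 3-D point about the x, y, then z axes (assumed convention)."""
    x, y, z = point
    ca, sa = math.cos(alpha), math.sin(alpha)
    y, z = y * ca - z * sa, y * sa + z * ca        # about x by alpha
    cb, sb = math.cos(beta), math.sin(beta)
    x, z = x * cb + z * sb, -x * sb + z * cb       # about y by beta
    cg, sg = math.cos(gamma), math.sin(gamma)
    x, y = x * cg - y * sg, x * sg + y * cg        # about z by gamma
    return (x, y, z)


def align_reference_point(point, rotation=(0.0, 0.0, 0.0),
                          radius_ratio=1.0, origin_offset=(0.0, 0.0, 0.0)):
    """Rotate, resize, then shift one point of the reference projection."""
    x, y, z = rotate(point, *rotation)
    x, y, z = x * radius_ratio, y * radius_ratio, z * radius_ratio
    dx, dy, dz = origin_offset
    return (x + dx, y + dy, z + dz)


aligned = align_reference_point(
    (1.0, 0.0, 0.0),
    rotation=(0.0, 0.0, math.pi / 2),
    radius_ratio=2.0,
    origin_offset=(0.0, 0.0, 1.0),
)
```

Here a point on the positive x axis is rotated a quarter turn about the z axis, scaled by 2, and shifted by one unit along z, which yields approximately (0, 2, 1).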

The inverse spherical transform unit 1370 may transform reconstructed images that are output from the in-loop filter 1330 to an output domain. Typically, the output domain will be a two-dimensional domain, which causes the reconstructed images to be suitable for display on a display device or for use by an application program that consumes such data. The inverse spherical transform unit 1370 may invert operations described with respect to ¶¶[29]-[36] hereinabove. In an embodiment, the inverse spherical transform unit 1370 may be omitted when outputting reconstructed images to an application that processes graphics data in a spherical domain.
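For an equirectangular output domain, a mapping from the spherical domain back to two-dimensional pixel coordinates may be sketched as follows. The angle conventions used here (longitude theta in [-π, π), latitude phi in [-π/2, π/2], image origin at the top-left corner) are assumptions made for illustration and may differ from the transforms described hereinabove; the function name is hypothetical.

```python
import math


def sphere_to_equirect(theta, phi, width, height):
    """Map longitude/latitude on the sphere to equirectangular (x, y)."""
    x = (theta / (2 * math.pi) + 0.5) * width   # longitude spans the width
    y = (0.5 - phi / math.pi) * height          # latitude spans the height
    return (x, y)


center = sphere_to_equirect(0.0, 0.0, 3840, 1920)
top_left = sphere_to_equirect(-math.pi, math.pi / 2, 3840, 1920)
```

Under these conventions the point at zero longitude and zero latitude lands at the center of the image.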

The foregoing embodiments have described video coding and decoding processes that operate on 360° video obtained from camera systems, but the principles of the present disclosure are not so limited. The techniques described herein may find application with 360° video regardless of the techniques by which such videos are generated. For example, 360° video may find application in computer applications such as video games and three-dimensional rendering applications. Thus, the 360° video may represent computer-generated models of virtual worlds or computer-rendered video data representing human-authored content, as desired. The principles of the present disclosure also find application with augmented reality systems in which camera-generated image data and computer-generated graphics data are merged into 360° video pictures that are coded. In this regard, the source of the 360° video is immaterial to the present discussion.

In an embodiment, a video coder and decoder may exchange signaling to identify parameters of the spherical projection. Such signaling may occur according to the following syntax, in one such embodiment.

Video coders and decoders may exchange a projection_format field, which identifies a type of projection format that is used by the video coder in the coder's domain. For example, the projection_format field may contain a projection_format_id value that may take the following values:

TABLE 1

projection_format_id	Coder Domain Format
0	2D conventional video
1	Equirectangular
2	Cube map
3	reserved

The projection_format field may be provided, for example, in a sequence parameter set of a coding protocol such as H.265.
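The projection_format_id values listed above may be modeled as an enumeration, as in the following sketch. The class and function names are hypothetical, and treating the reserved value as an error is an assumption about decoder behavior rather than something the syntax requires.

```python
from enum import IntEnum


class ProjectionFormat(IntEnum):
    """Coder domain formats keyed by projection_format_id."""
    CONVENTIONAL_2D = 0
    EQUIRECTANGULAR = 1
    CUBE_MAP = 2
    RESERVED = 3


def parse_projection_format(projection_format_id):
    """Return the format for an id; reject unknown or reserved ids (assumed)."""
    try:
        fmt = ProjectionFormat(projection_format_id)
    except ValueError:
        raise ValueError(f"unknown projection_format_id: {projection_format_id}")
    if fmt is ProjectionFormat.RESERVED:
        raise ValueError("projection_format_id 3 is reserved")
    return fmt


fmt = parse_projection_format(1)
```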

The video coder and decoder may exchange other signaling, such as a use_rotation_estimation field and a rotation_estimation_mode field. The use_rotation_estimation field may indicate, for example, whether rotational estimations such as described for box 840 (FIG. 8) are performed. The rotation_estimation_mode field may identify parameters of the rotational estimation, such as by:

TABLE 2

rotation_estimation_mode	Rotation Angle Signaled Along which Coordinate
0	x, y, z
1	x, y
2	x, z
3	y, z
4	x
5	y
6	z

The use_rotation_estimation field and the rotation_estimation_mode field may be signaled at different levels of coding granularity within a coding protocol. For example, the use_rotation_estimation field may be signaled at the pixel block level during coding while the rotation_estimation_mode field may be signaled at higher levels of granularity, for example in a sequence parameter set or a picture parameter set. The converse also may occur; the use_rotation_estimation field may be signaled at the sequence parameter set level during coding while the rotation_estimation_mode field may be signaled at a lower level of granularity, for example, at the pixel block level. Moreover, the use_rotation_estimation field and the rotation_estimation_mode field may be predictively coded from picture to picture or from pixel block to pixel block as desired. Additionally, context-adaptive binary arithmetic coding (CABAC) of the parameter data may be performed.
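The rotation_estimation_mode values of Table 2 may be sketched as a lookup that expands the signaled angles into a full three-axis rotation. The sketch assumes that mode 6 denotes the z coordinate, that angles are signaled in the axis order listed for the mode, and that an axis which is not signaled has a rotation of zero; the names ROTATION_AXES and expand_rotation are hypothetical.

```python
# Axes along which a rotation angle is signaled, per rotation_estimation_mode.
ROTATION_AXES = {
    0: ("x", "y", "z"),
    1: ("x", "y"),
    2: ("x", "z"),
    3: ("y", "z"),
    4: ("x",),
    5: ("y",),
    6: ("z",),   # assumed: mode 6 signals the z coordinate
}


def expand_rotation(rotation_estimation_mode, signaled_angles):
    """Return a full per-axis rotation, with unsignaled axes set to zero."""
    axes = ROTATION_AXES[rotation_estimation_mode]
    if len(signaled_angles) != len(axes):
        raise ValueError(
            f"mode {rotation_estimation_mode} expects {len(axes)} angle(s)"
        )
    rotation = {"x": 0.0, "y": 0.0, "z": 0.0}
    rotation.update(zip(axes, signaled_angles))
    return rotation


rotation = expand_rotation(2, [0.1, 0.3])
```

Signaling fewer angles in the modes with one or two axes reduces the amount of parameter data to be coded when motion is known to be confined to those axes.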

Similarly, the video coder and decoder may exchange other signaling, such as a global_rotation_estimation_mode field, that may signal a type of global rotation estimation that is performed. The global_rotation_estimation_mode field may identify parameters of the global rotational estimation, such as by:

global_rotation_estimation_mode	Mode
0	No global rotation estimation
1	Rotate reference
2	Rotate source

The global_rotation_estimation_mode field may be signaled in a sequence parameter set, a picture parameter set, or a slice segment header as may be convenient.
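The effect of the global_rotation_estimation_mode field may be sketched as a dispatch that selects which picture is rotated before prediction. The enumeration, the function name, and the caller-supplied rotate callable are hypothetical; the sketch only illustrates the selection among the three signaled modes and does not describe how the rotation itself is estimated.

```python
from enum import IntEnum


class GlobalRotationMode(IntEnum):
    """Values of the global_rotation_estimation_mode field."""
    NONE = 0
    ROTATE_REFERENCE = 1
    ROTATE_SOURCE = 2


def apply_global_rotation(mode, source, reference, rotate):
    """Return (source, reference) after rotating whichever one the mode names."""
    mode = GlobalRotationMode(mode)
    if mode is GlobalRotationMode.ROTATE_REFERENCE:
        return source, rotate(reference)
    if mode is GlobalRotationMode.ROTATE_SOURCE:
        return rotate(source), reference
    return source, reference


# Placeholder "pictures" and rotation, for illustration only.
mark_rotated = lambda picture: picture + " (rotated)"
result = apply_global_rotation(1, "source", "reference", mark_rotated)
```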

The foregoing discussion has described operation of the embodiments of the present disclosure in the context of video coders and decoders. Commonly, these components are provided as electronic devices. Video decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones or computer servers. Such computer programs typically are stored in physical storage media such as electronic-, magnetic- and/or optically-based storage devices, where they are read to a processor and executed. Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.

The foregoing description has been presented for purposes of illustration and description. It is not exhaustive and does not limit embodiments of the disclosure to the precise forms disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practicing embodiments consistent with the disclosure. Unless described otherwise herein, any of the methods may be practiced in any combination.

Claims

1. A video coding method, comprising: for a plurality of input pixel blocks from an input picture in a source-domain representation: transforming a first input pixel block in the source-domain representation to a spherical-domain representation; transforming a candidate reference picture from the source-domain representation to the spherical representation; searching for a match between the spherical-domain representation of the first input pixel block and a portion of the spherical-domain representation of the candidate reference picture; on a match, determining a spherical-domain motion vector including a spherical-domain rotational offset for predicting the first input pixel block in the spherical-domain representation from a matching portion of a reference picture in the spherical-domain representation; transforming the spherical-domain motion vector to a two-dimensional source-domain motion vector and a source-domain rotational offset for predicting the first input pixel block in the source-domain representation of the input picture; and predictively coding the first input pixel block in the source-domain including determining a source-domain prediction from the reference picture in the source-domain based on the two-dimensional source-domain motion vector and the source domain rotational offset.
2. The method of claim 1, wherein the searching comprises searching for a best mode to optimize overall cost to send source-domain representation of the pixel block in terms of minimum spherical domain error and source-domain bit rate.
3. The method of claim 1, further comprising, outputting coded data of the pixel blocks with source-domain motion vectors as coded data of the input picture.
4. The method of claim 1, further comprising, outputting, with coded data of the pixel blocks, spherical-domain offsets as coded data of the input picture.
5. The method of claim 4, further comprising, on the match: outputting data identifying the rotational offset, wherein the outputted spherical-domain offsets are represented differentially with respect to the rotational offset.
6. The method of claim 1, wherein, for at least one pixel block, the searching includes estimating a relative radius ratio between the spherical-domain representation of the pixel block and the spherical-domain representation of the matching portion of the reference picture, and outputting data identifying the relative radius ratio as coded data of the respective input pixel block.
7. The method of claim 1, wherein, for at least one pixel block, the searching includes estimating a relative offset between an origin of the spherical-domain representation of the pixel block and an origin of the spherical-domain representation of the matching portion of the reference picture, and outputting data identifying the relative offset as coded data of the respective input pixel block.
8. The method of claim 1, wherein the searching comprises estimating, for a plurality of candidate blocks from the reference pictures, a number of bits required to code the input pixel block with reference to each respective candidate block, and a matching candidate block is selected based on relative differences between the number of bits estimated as required from among the candidate blocks.
9. The method of claim 1, wherein the searching comprises estimating, for a plurality of candidate blocks from the reference pictures, an amount of distortion that could be created by coding the input pixel block with reference to each respective candidate block, and a matching candidate block is selected based on relative differences between the amount of distortion estimated for each of the candidate blocks.
10. The method of claim 1, wherein the input picture is an equirectangular picture in its source-domain representation.
11. The method of claim 1, wherein the input picture is a cube map picture in its source-domain representation.
12. The method of claim 1, wherein the input picture is one of a truncated pyramid-based, tetrahedral-based, octahedral-based, dodecahedral-based and icosahedral-based image in its source-domain representation.
13. A video coder, comprising: a pixel block coder, having an input for a pixel block of input data in a coding-domain and a reference pixel block, a reference picture store, storing data of a plurality of reference pictures in the coding-domain, a pair of spherical transform units, a first spherical transform unit having an input for the input pixel block in the coding-domain and a second spherical transform unit having an input for reference picture data in the coding-domain, a predictor having spherical-domain inputs coupled to outputs from the first and second spherical transform units, wherein, the predictor selects a spherical-domain motion vector including a spherical-domain rotational offset by searching the spherical transformed reference pictures for a prediction reference of the spherical transformed input pixel block, the predictor converts the spherical-domain motion vector to a coding-domain motion vector and a coding-domain rotational offset and outputs a prediction block in the coding-domain to the pixel block coder, and the pixel block coder codes a coding-domain residual between the pixel block in the coding-domain and the prediction in the coding domain.
14. The video coder of claim 13, wherein the video coder outputs coded data of the pixel blocks with source-domain motion vectors as coded data of the input picture.
15. The video coder of claim 13, wherein the video coder outputs coded data of the pixel blocks with spherical-domain offsets as coded data of the input picture.
16. The video coder of claim 15, further comprising a controller that: outputs data identifying the rotational offset, wherein the outputted spherical-domain offsets are represented differentially with respect to the rotational offset.
17. The video coder of claim 13, further comprising a controller that estimates a relative radius ratio between the spherical-domain representation of the pixel block and the spherical-domain representation of the matching portion of the reference picture, and outputting data identifying the relative radius ratio as coded data of the respective input pixel block.
18. The video coder of claim 13, further comprising a controller that estimates a relative offset between an origin of the spherical-domain representation of the pixel block and an origin of the spherical-domain representation of the matching portion of the reference picture, and outputting data identifying the relative offset as coded data of the respective input pixel block.
19. The video coder of claim 13, wherein the predictor: estimates, for a plurality of candidate blocks from the reference pictures, a number of bits required to code the input pixel block with reference to each respective candidate block, and selects a matching candidate block based on relative differences between the number of bits estimated as required from among the candidate blocks.
20. The video coder of claim 13, wherein the predictor: estimates, for a plurality of candidate blocks from the reference pictures, an amount of distortion that could be created by coding the input pixel block with reference to each respective candidate block, and selects a matching candidate block based on relative differences between the amount of distortion estimated for each of the candidate blocks.
21. The video coder of claim 13, wherein the input picture is an equirectangular picture in its source-domain representation.
22. The video coder of claim 13, wherein the input picture is a cube map picture in its source-domain representation.
23. A non-transitory computer readable medium storing program instructions that, when executed by a processing device, cause the device to: for a plurality of input pixel blocks from an input picture in a source-domain representation: transforming a first input pixel block in the source-domain representation to a spherical-domain representation; transforming a candidate reference picture from the source-domain representation to the spherical representation; searching for a match between the spherical-domain representation of the first input pixel block and a portion of the spherical-domain representation of the candidate reference picture; on a match, determining a spherical-domain motion vector including a spherical-domain rotational offset for predicting the first input pixel block in the spherical-domain representation from a matching portion of a reference picture in the spherical-domain representation; transforming the spherical-domain motion vector to a two-dimensional source-domain motion vector and a source-domain rotational offset for predicting the first input pixel block in the source-domain representation of the input picture; and predictively coding the first input pixel block in the source-domain including determining a source-domain prediction from the reference picture in the source-domain based on the two-dimensional source-domain motion vector and the source domain rotational offset.
24. The medium of claim 23, further comprising, on the match: determining a rotational offset between a spherical-domain representation of the input picture and a spherical-domain representation of the reference picture, outputting data identifying the rotational offset, and outputting coded data of the pixel blocks with spherical-domain offsets as coded data of the input picture, wherein the spherical-domain offsets are represented differentially with respect to the rotational offset.
25. The medium of claim 23, wherein, for at least one pixel block, the searching includes estimating a relative radius ratio between the spherical-domain representation of the pixel block and the spherical-domain representation of the matching portion of the reference picture, and outputting data identifying the relative radius ratio as coded data of the respective input pixel block.
26. The medium of claim 23, wherein, for at least one pixel block, the searching includes estimating a relative offset between an origin of the spherical-domain representation of the pixel block and an origin of the spherical-domain representation of the matching portion of the reference picture, and outputting data identifying the relative offset as coded data of the respective input pixel block.
27. A video decoder, comprising: a pixel block decoder having an input for coded image data and an output for reconstructed pixel blocks in a non-spherical projection domain; a reference picture store for storing reconstructed reference pictures assembled from reconstructed pixel blocks of frames in the non-spherical projection domain represented by the coded video data; and a predictor, responsive to motion vector data in a spherical-domain included in the coded image data, for converting the motion vector data in the spherical-domain to motion vector data in the non-spherical projection domain and providing reference block data in the non-spherical projection domain to the pixel block decoder based upon manipulations of a reference picture in a spherical domain; wherein the motion vector data includes an identification of a rotational offset between a spherical domain representation of the coded picture and a spherical domain representation of a reference picture, and the predictor aligns the reference picture and the coded picture according to the rotational offset.
28. The video decoder of claim 27, further comprising a transform unit for transforming reconstructed pictures from the spherical domain representation to a two-dimensional domain representation.
29. The video decoder of claim 27, wherein motion vector data of a coded picture includes an identification of a ratio of radii between a spherical domain representation of the coded picture and a spherical domain representation of a reference picture identified by the motion vector, and the predictor aligns the spherical domain representation of the reference picture to the spherical domain representation of the coded picture according to the ratio.
30. The video decoder of claim 27, wherein the motion vector includes an identification of an offset between an origin of a spherical domain representation of the coded picture and an origin of a spherical domain representation of a reference picture identified by the motion vector, and the predictor aligns the origins of the reference picture and the coded picture according to the offset.
31. A video coding method, comprising: for an input pixel block from an input picture in a source-domain representation: transforming the input pixel block in the source-domain representation to a spherical-domain representation; transforming a candidate reference picture from the source-domain representation to the spherical representation; searching for a match between the spherical-domain representation of the input pixel block and a portion of the spherical-domain representation of the candidate reference picture; on a match, determining a spherical-domain motion vector between the pixel block in the spherical-domain representation and a matching portion of a reference picture in the spherical-domain representation; transforming the spherical-domain motion vector to a source-domain motion vector for predicting the input pixel block in the source-domain representation of the input picture, predictively coding the input pixel block with reference to a source-domain representation of the matching portion of the reference picture converted from the spherical-domain representation; and outputting the coded pixel block and the spherical-domain offset.
32. A video coding method, comprising: for a plurality of input pixel blocks from an input picture in a source-domain representation: transforming a first input pixel block in the source-domain representation to a spherical-domain representation; transforming a candidate reference picture from the source-domain representation to the spherical representation; searching for a match between the spherical-domain representation of the first input pixel block and a portion of the spherical-domain representation of the candidate reference picture; on a match, determining spherical-domain offsets including a rotational offset for predicting the first input pixel block in the spherical-domain representation from a matching portion of a reference picture in the spherical-domain representation; and predictively coding the first input pixel block including predicting the first input pixel block in the spherical-domain using the spherical domain offsets including the rotational offset, and including in the coded output indications of the spherical domain offsets with the rotational offset.

20160165257	Chen et al.	Jun 2016	A1
20160227214	Rapaka et al.	Aug 2016	A1
20160234438	Satoh	Aug 2016	A1
20160241836	Cole et al.	Aug 2016	A1
20160269632	Morioka	Sep 2016	A1
20160277746	Fu et al.	Sep 2016	A1
20160286119	Rondinelli	Sep 2016	A1
20160350585	Lin et al.	Dec 2016	A1
20160350592	Ma et al.	Dec 2016	A1
20160352791	Adams	Dec 2016	A1
20160353089	Gallup et al.	Dec 2016	A1
20160353146	Weaver et al.	Dec 2016	A1
20160360104	Zhang et al.	Dec 2016	A1
20160360180	Cole et al.	Dec 2016	A1
20170013279	Puri et al.	Jan 2017	A1
20170026659	Lin et al.	Jan 2017	A1
20170038942	Rosenfeld et al.	Feb 2017	A1
20170054907	Nishihara et al.	Feb 2017	A1
20170064199	Lee et al.	Mar 2017	A1
20170078447	Hancock et al.	Mar 2017	A1
20170085892	Liu et al.	Mar 2017	A1
20170094184	Gao et al.	Mar 2017	A1
20170104927	Mugavero et al.	Apr 2017	A1
20170109930	Holzer et al.	Apr 2017	A1
20170127008	Kankaanpaa et al.	May 2017	A1
20170142371	Barzuza et al.	May 2017	A1
20170155912	Thomas et al.	Jun 2017	A1
20170180635	Hayashi et al.	Jun 2017	A1
20170200255	Lin et al.	Jul 2017	A1
20170200315	Lockhart	Jul 2017	A1
20170208346	Narroschke et al.	Jul 2017	A1
20170214937	Lin et al.	Jul 2017	A1
20170223268	Shimmoto	Aug 2017	A1
20170223368	Abbas et al.	Aug 2017	A1
20170228867	Baruch	Aug 2017	A1
20170230668	Lin et al.	Aug 2017	A1
20170236323	Lim et al.	Aug 2017	A1
20170244775	Ha et al.	Aug 2017	A1
20170251208	Adsumilli et al.	Aug 2017	A1
20170257644	Andersson et al.	Sep 2017	A1
20170272698	Liu et al.	Sep 2017	A1
20170272758	Lin et al.	Sep 2017	A1
20170278262	Kawamoto et al.	Sep 2017	A1
20170280126	Van der Auwera	Sep 2017	A1
20170287200	Forutanpour et al.	Oct 2017	A1
20170287220	Khalid et al.	Oct 2017	A1
20170295356	Abbas et al.	Oct 2017	A1
20170301065	Adsumilli et al.	Oct 2017	A1
20170301132	Dalton et al.	Oct 2017	A1
20170302714	Ramsay et al.	Oct 2017	A1
20170302951	Joshi et al.	Oct 2017	A1
20170309143	Trani et al.	Oct 2017	A1
20170322635	Yoon et al.	Nov 2017	A1
20170323422	Kim et al.	Nov 2017	A1
20170323423	Lin et al.	Nov 2017	A1
20170332107	Abbas et al.	Nov 2017	A1
20170336705	Zhou et al.	Nov 2017	A1
20170339324	Tocher et al.	Nov 2017	A1
20170339341	Zhou et al.	Nov 2017	A1
20170339391	Zhou et al.	Nov 2017	A1
20170339392	Forutanpour et al.	Nov 2017	A1
20170339415	Wang et al.	Nov 2017	A1
20170344843	Wang et al.	Nov 2017	A1
20170353737	Lin et al.	Dec 2017	A1
20170359590	Zhang et al.	Dec 2017	A1
20170366808	Lin et al.	Dec 2017	A1
20170374332	Yamaguchi et al.	Dec 2017	A1
20170374375	Makar et al.	Dec 2017	A1
20180005447	Wallner et al.	Jan 2018	A1
20180005449	Wallner et al.	Jan 2018	A1
20180007387	Izumi	Jan 2018	A1
20180007389	Izumi	Jan 2018	A1
20180018807	Lu et al.	Jan 2018	A1
20180020202	Xu et al.	Jan 2018	A1
20180020238	Liu et al.	Jan 2018	A1
20180027178	Macmillan et al.	Jan 2018	A1
20180027226	Abbas et al.	Jan 2018	A1
20180027257	Izumi et al.	Jan 2018	A1
20180047208	Marin et al.	Feb 2018	A1
20180048890	Kim et al.	Feb 2018	A1
20180053280	Kim et al.	Feb 2018	A1
20180054613	Lin et al.	Feb 2018	A1
20180061002	Lee et al.	Mar 2018	A1
20180063505	Lee et al.	Mar 2018	A1
20180063544	Tourapis et al.	Mar 2018	A1
20180075576	Liu et al.	Mar 2018	A1
20180075604	Kim et al.	Mar 2018	A1
20180075635	Choi et al.	Mar 2018	A1
20180077451	Yip et al.	Mar 2018	A1
20180084257	Abbas	Mar 2018	A1
20180091812	Guo et al.	Mar 2018	A1
20180098090	Lin et al.	Apr 2018	A1
20180101931	Abbas et al.	Apr 2018	A1
20180109810	Xu et al.	Apr 2018	A1
20180124312	Chang	May 2018	A1
20180130243	Kim et al.	May 2018	A1
20180130264	Ebacher	May 2018	A1
20180146136	Yamamoto	May 2018	A1
20180146138	Jeon et al.	May 2018	A1
20180152636	Yim et al.	May 2018	A1
20180152663	Wozniak et al.	May 2018	A1
20180160113	Jeong et al.	Jun 2018	A1
20180160138	Park	Jun 2018	A1
20180160156	Hannuksela et al.	Jun 2018	A1
20180164593	Van der Auwera et al.	Jun 2018	A1
20180167613	Hannuksela et al.	Jun 2018	A1
20180167634	Salmimaa et al.	Jun 2018	A1
20180174619	Roy et al.	Jun 2018	A1
20180176468	Wang et al.	Jun 2018	A1
20180176536	Jo et al.	Jun 2018	A1
20180176596	Jeong et al.	Jun 2018	A1
20180176603	Fujimoto	Jun 2018	A1
20180184101	Ho	Jun 2018	A1
20180184121	Kim et al.	Jun 2018	A1
20180191787	Morita et al.	Jul 2018	A1
20180192074	Shih et al.	Jul 2018	A1
20180199029	Van der Auwera et al.	Jul 2018	A1
20180199034	Nam et al.	Jul 2018	A1
20180199070	Wang	Jul 2018	A1
20180218512	Chan et al.	Aug 2018	A1
20180220138	He et al.	Aug 2018	A1
20180227484	Hung et al.	Aug 2018	A1
20180234700	Kim et al.	Aug 2018	A1
20180240223	Yi et al.	Aug 2018	A1
20180240276	He et al.	Aug 2018	A1
20180242016	Lee et al.	Aug 2018	A1
20180242017	Van Leuven et al.	Aug 2018	A1
20180249076	Sheng et al.	Aug 2018	A1
20180249163	Curcio et al.	Aug 2018	A1
20180249164	Kim et al.	Aug 2018	A1
20180253879	Li et al.	Sep 2018	A1
20180268517	Coban et al.	Sep 2018	A1
20180270417	Suitoh et al.	Sep 2018	A1
20180276789	Van der Auwera et al.	Sep 2018	A1
20180276826	Van der Auwera et al.	Sep 2018	A1
20180276890	Wang	Sep 2018	A1
20180288435	Boyce et al.	Oct 2018	A1
20180295282	Boyce et al.	Oct 2018	A1
20180302621	Fu et al.	Oct 2018	A1
20180307398	Kim et al.	Oct 2018	A1
20180315245	Patel	Nov 2018	A1
20180322611	Bang et al.	Nov 2018	A1
20180329482	Woo et al.	Nov 2018	A1
20180332265	Hwang et al.	Nov 2018	A1
20180332279	Kang et al.	Nov 2018	A1
20180338142	Kim et al.	Nov 2018	A1
20180343388	Matsushita	Nov 2018	A1
20180349705	Kim et al.	Dec 2018	A1
20180350407	Decoodt et al.	Dec 2018	A1
20180352225	Guo et al.	Dec 2018	A1
20180352259	Guo et al.	Dec 2018	A1
20180352264	Guo et al.	Dec 2018	A1
20180359487	Bang et al.	Dec 2018	A1
20180374192	Kunkel	Dec 2018	A1
20180376126	Hannuksela	Dec 2018	A1
20180376152	Wang et al.	Dec 2018	A1
20190004414	Kim et al.	Jan 2019	A1
20190007669	Kim et al.	Jan 2019	A1
20190007679	Coban et al.	Jan 2019	A1
20190007684	Van der Auwera et al.	Jan 2019	A1
20190012766	Yoshimi	Jan 2019	A1
20190014304	Curcio et al.	Jan 2019	A1
20190026956	Gausebeck et al.	Jan 2019	A1
20190028642	Fujita et al.	Jan 2019	A1
20190045212	Rose et al.	Feb 2019	A1
20190057487	Cheng	Feb 2019	A1
20190057496	Ogawa et al.	Feb 2019	A1
20190082184	Hannuksela	Mar 2019	A1
20190104315	Guo et al.	Apr 2019	A1
20190108611	Izumi	Apr 2019	A1
20190132521	Fujita et al.	May 2019	A1
20190132594	Chung et al.	May 2019	A1
20190141318	Li et al.	May 2019	A1
20190158800	Kokare et al.	May 2019	A1
20190200016	Jang et al.	Jun 2019	A1
20190215512	Lee et al.	Jul 2019	A1
20190215532	He et al.	Jul 2019	A1
20190230285	Kim	Jul 2019	A1
20190230337	Kim	Jul 2019	A1
20190230377	Ma	Jul 2019	A1
20190236990	Song et al.	Aug 2019	A1
20190238888	Kim	Aug 2019	A1
20190246141	Kim et al.	Aug 2019	A1
20190253622	Van der Auwera et al.	Aug 2019	A1
20190253624	Kim	Aug 2019	A1
20190268594	Lim et al.	Aug 2019	A1
20190273929	Ma et al.	Sep 2019	A1
20190273949	Kim et al.	Sep 2019	A1
20190281217	Kim	Sep 2019	A1
20190281290	Lee et al.	Sep 2019	A1
20190289324	Budagavi	Sep 2019	A1
20190289331	Byun	Sep 2019	A1
20190297341	Zhou	Sep 2019	A1
20190297350	Lin et al.	Sep 2019	A1
20190306515	Shima	Oct 2019	A1
20190387251	Lin et al.	Dec 2019	A1
20200029077	Lee et al.	Jan 2020	A1
20200036976	Kanoh et al.	Jan 2020	A1
20200045323	Hannuksela	Feb 2020	A1
20200074687	Lin et al.	Mar 2020	A1
20200077092	Lin et al.	Mar 2020	A1
20200084441	Lee et al.	Mar 2020	A1
20200120340	Park et al.	Apr 2020	A1
20200120359	Hanhart et al.	Apr 2020	A1
20200137401	Kim et al.	Apr 2020	A1
20200162731	Kim et al.	May 2020	A1
20200213570	Shih et al.	Jul 2020	A1
20200213571	Kim et al.	Jul 2020	A1
20200213587	Galpin et al.	Jul 2020	A1
20200244957	Sasai et al.	Jul 2020	A1
20200252650	Shih et al.	Aug 2020	A1

Foreign Referenced Citations (13)

Number	Date	Country
2077525	Jul 2009	EP
2008-193458	Aug 2008	JP
2012-160886	Aug 2012	JP
2014-176034	Sep 2014	JP
2017-0015938	Feb 2017	KR
WO 2012044709	Apr 2012	WO
WO 2015138979	Sep 2015	WO
WO 2015184416	Dec 2015	WO
WO 2016076680	May 2016	WO
WO 2016140060	Sep 2016	WO
WO 2017125030	Jul 2017	WO
WO 2017127816	Jul 2017	WO
WO 2018118159	Jun 2018	WO

Non-Patent Literature Citations (19)

Entry
Boyce et al.; “Common Test Conditions and Evaluation Procedures for 360 degree Video Coding”; Joint Video Exploration Team; ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; Doc. JVET-D1030; Oct. 2016; 6 pages.
Tosic et al.; “Multiresolution Motion Estimation for Omnidirectional Images”; IEEE 13th European Signal Processing Conf.; Sep. 2005; 4 pages.
Li et al.; “Projection Based Advanced Motion Model for Cubic Mapping for 360-Degree Video”; Cornell University Library; 2017; 5 pages.
Zheng et al.; “Adaptive Selection of Motion Models for Panoramic Video Coding”; IEEE Int'l Conf. Multimedia and Expo; Jul. 2007; pp. 1319-1322.
He et al.; “AHG8: Algorithm description of InterDigital's projection format conversion tool (PCT360)”; Joint Video Exploration Team; ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; Doc. JVET-D0090; Oct. 2016; 6 pages.
International Patent Application No. PCT/US2017/051542; Int'l Search Report and the Written Opinion; dated Dec. 7, 2017; 17 pages.
International Patent Application No. PCT/US2017/051542; Int'l Preliminary Report on Patentability; dated Jul. 4, 2019; 10 pages.
He et al.; “AHG8: InterDigital's projection format conversion tool”; Joint Video Exploration Team (JVET) of ITU-T SG 16 WP3 and ISO/IEC JTC 1/SC 29/WG 11 4th meeting; Oct. 2016; 18 pages.
Kammachi et al.; “AHG8: Test results for viewport-dependent pyramid, cube map, and equirectangular panorama schemes”; JVET-D00078; Oct. 2016; 7 pages.
Yip et al.; “Technologies under Considerations for ISO/IEC 23000-20 Omnidirectional Media Application Format”; ISO/IEC JTC1/SC29/WG11 MPEG2017/W16637; Jan. 2017; 50 pages.
International Patent Application No. PCT/US2018/018246; Int'l Search Report and the Written Opinion; dated Apr. 20, 2018; 15 pages.
He et al.; “AHG8: Geometry padding for 360 video coding”; Joint Video Exploration Team (JVET); Document: JVET-D0075; Oct. 2016; 10 pages.
Vishwanath et al.; “Rotational Motion Model for Temporal Prediction in 360 Video Coding”; IEEE 19th Int'l Workshop on Multimedia Signal Processing; Oct. 2017; 6 pages.
Sauer et al.; “Improved Motion Compensation for 360 Video Projected to Polytopes”; Proceedings of the IEEE Int'l Conf. on Multimedia and Expo; Jul. 2017; pp. 61-66.
International Patent Application No. PCT/US2018/017124; Int'l Search Report and the Written Opinion; dated Apr. 30, 2018; 19 pages.
Choi et al.; “Text of ISO/IEC 23000-20 CD Omnidirectional Media Application Format”; Coding of Moving Pictures and Audio; ISO/IEC JTC1/SC29/WG11 N16636; Jan. 2017; 48 pages.
International Patent Application No. PCT/US2018/018246; Int'l Preliminary Report on Patentability; dated Sep. 6, 2019; 8 pages.
International Patent Application No. PCT/US2018/017124; Int'l Preliminary Report on Patentability; dated Aug. 29, 2019; 12 pages.
Sauer et al.; “Geometry correction for motion compensation of planar-projected 360VR video”; Joint Video Exploration Team; Document: JVET-D0067; Oct. 2016; 13 pages.

Related Publications (1)

Number	Date	Country
20180184121 A1	Jun 2018	US
