The present disclosure relates to techniques for correcting image artifacts in multi-view images.
Some modern imaging applications capture image data from multiple directions about a camera. Many cameras have multiple imaging systems that capture image data in several different fields of view. An aggregate image may be created that represents a merger or “stitching” of image data captured from these multiple views.
Oftentimes, the images created from these capture operations exhibit visual artifacts due to discontinuities in the fields of view. For example, a “cube map” image, described herein, may be generated from the merger of six different planar images that define a cubic space about a camera. Each planar view represents image content of objects within the view's respective field of view. Thus, each planar view possesses its own perspective and its own vanishing point, which differ from the perspectives and vanishing points of the other views of the cube map image. Visual artifacts can arise at seams between these images. The artifacts are most pronounced when parts of a common object are represented in multiple views. Parts of the object may appear as if they are at a common depth in one view, but other parts of the object may appear as if they have variable depth in another view.
The inventors perceive a need in the art for image correction techniques that mitigate such artifacts in multi-view images.
Embodiments of the present invention provide an image correction technique for a multi-view image that includes a plurality of planar views. Image content of the planar views may be projected from the planar representation to a spherical projection. Thereafter, a portion of the image content may be projected from the spherical projection back to a planar representation. The image content of the planar representation may be used for display. Extensions of the disclosure provide techniques to correct artifacts that may arise during deblocking filtering of the multi-view images.
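By way of illustration, the projection chain summarized above might be sketched as follows. The cube-face orientation convention and the gnomonic re-projection are assumptions made for the example, not a statement of the disclosed method:

```python
import numpy as np

def face_pixel_to_sphere(u, v, face):
    """Map a cube-face coordinate (u, v) in [-1, 1] to a unit-sphere
    direction. The face ordering (+x, -x, +y, -y, +z, -z) is an
    illustrative convention; any consistent convention works."""
    dirs = {
        '+x': (1.0, -v, -u), '-x': (-1.0, -v, u),
        '+y': (u, 1.0, v),   '-y': (u, -1.0, -v),
        '+z': (u, -v, 1.0),  '-z': (-u, -v, -1.0),
    }
    x, y, z = dirs[face]
    n = np.sqrt(x * x + y * y + z * z)
    return x / n, y / n, z / n

def sphere_to_view_plane(x, y, z, f=1.0):
    """Gnomonic (rectilinear) projection of a sphere point onto a plane
    tangent at +z, i.e. a planar representation suitable for display."""
    if z <= 0:
        return None  # content behind the view plane is not projected
    return f * x / z, f * y / z
```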
The image source 210 may generate image data as a multi-directional image, containing image data of a field of view that extends around a reference point in multiple directions.
The image pre-processing system 220 may process the input images to condition them for coding by the video coder 230. For example, the image pre-processor 220 may perform image formatting, projection and/or padding operations as described herein.
The video coder 230 may generate a coded representation of its input image data, typically by exploiting spatial and, for video, temporal redundancies in the image data. The video coder 230 may output a coded representation of the input data that consumes less bandwidth than the original source video when transmitted and/or stored.
For video, the video decoder 240 may invert coding operations performed by the video coder 230 to obtain a reconstructed picture from the coded video data. Typically, the coding processes applied by the video coder 230 are lossy processes, which cause the reconstructed picture to possess various errors when compared to the original picture. The video decoder 240 may reconstruct select coded pictures, which are designated as “reference pictures,” and store the decoded reference pictures in the reference picture store 250. In the absence of transmission errors, the decoded reference pictures will replicate decoded reference pictures obtained by a decoder (not shown in
The predictor 260 may select prediction references for new input pictures as they are coded. For each portion of the input picture being coded (called a “pixel block” for convenience), the predictor 260 may select a coding mode and identify a portion of a reference picture that may serve as a prediction reference for the pixel block being coded. The coding mode may be an intra-coding mode, in which case the prediction reference may be drawn from a previously-coded (and decoded) portion of the picture being coded. Alternatively, the coding mode may be an inter-coding mode, in which case the prediction reference may be drawn from another previously-coded and decoded picture.
When an appropriate prediction reference is identified, the predictor 260 may furnish the prediction data to the video coder 230. The video coder 230 may code input video data differentially with respect to prediction data furnished by the predictor 260. Typically, prediction operations and the differential coding operate on a pixel block-by-pixel block basis. Prediction residuals, which represent pixel-wise differences between the input pixel blocks and the prediction pixel blocks, may be subject to other coding operations to reduce bandwidth further.
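A minimal sketch of the differential coding step, assuming 2-D NumPy arrays for the pixel blocks (the function names are illustrative):

```python
import numpy as np

def residual_block(input_block, prediction_block):
    """Subtractor stage: the residual is the pixel-wise difference
    between the input pixel block and the prediction pixel block."""
    return input_block.astype(np.int16) - prediction_block.astype(np.int16)

def reconstruct_block(residual, prediction_block):
    """Decoder-side counterpart: adding the prediction back to the
    (decoded) residual recovers the pixel block."""
    return residual + prediction_block.astype(np.int16)
```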
As indicated, the coded video data output by the video coder 230 should consume less bandwidth than the input data when transmitted and/or stored. The coding system 200 may output the coded video data to an output device 270, such as a transmitter, that may transmit the coded video data across a communication network 130 (
The image post-processor 330 may perform operations on reconstructed video data output from the video decoder 320 to condition it for consumption by the video sink 340. As part of its operation, the image post-processor 330 may remove padding information from decoded data. The image post-processor 330 also may perform projection and reformatting operations to alter the format of the decoded data to a format required by the video sink 340.
The video sink 340, as indicated, may consume decoded video generated by the decoding system 300. Video sinks 340 may be embodied by, for example, display devices that render decoded video. In other applications, video sinks 340 may be embodied by computer applications, for example, gaming applications, virtual reality applications and/or video editing applications, that integrate the decoded video into their content. In some applications, a video sink may process the entire multi-view field of view of the decoded video for its application but, in other applications, a video sink 340 may process a selected sub-set of content from the decoded video. For example, when rendering decoded video on a flat panel display, it may be sufficient to display only a selected sub-set of the multi-view video. In another application, decoded video may be rendered in a multi-view format, for example, in a planetarium.
Image sources 210 that capture multi-directional images often generate image data that include discontinuities in image content. Such discontinuities often occur at “seams” between fields of view of the camera sub-systems that capture image data in various fields of view, from which a final multi-directional image is created.
Embodiments of the present disclosure provide techniques for reducing effects of image content discontinuities.
In an embodiment, image rendering may be performed by projecting content from the spherical domain 1010 to a planar domain. For example, as shown in
The principles of the present discussion find application with multi-view images captured according to other techniques. For example, as illustrated in
In another embodiment, illustrated in
In another embodiment, illustrated in
In a further embodiment, illustrated in
The image format may be obtained from an omnidirectional camera 1540 that contains a plurality of imaging systems 1550, 1560, 1570 to capture image data in an omnidirectional field of view. Imaging systems 1550 and 1560 may capture image data in top and bottom fields of view, respectively, as “flat” images. The imaging system 1570 may capture image data in a 360° field of view about a horizon H established between the top and bottom fields of view. In the embodiment illustrated in
The pixel block coder 1610 may include a subtractor 1612, a transform unit 1614, a quantizer 1616, and an entropy coder 1618. The pixel block coder 1610 may accept pixel blocks of input data at the subtractor 1612. The subtractor 1612 may receive predicted pixel blocks from the predictor 1650 and generate an array of pixel residuals therefrom representing a difference between the input pixel block and the predicted pixel block. The transform unit 1614 may apply a transform to the sample data output from the subtractor 1612, to convert data from the pixel domain to a domain of transform coefficients. The quantizer 1616 may perform quantization of transform coefficients output by the transform unit 1614. The quantizer 1616 may be a uniform or a non-uniform quantizer. The entropy coder 1618 may reduce bandwidth of the output of the coefficient quantizer by coding the output, for example, by variable length code words.
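For illustration, one pass through these stages might look like the sketch below, assuming a DCT transform mode and a single scalar quantization step, with entropy coding omitted:

```python
import numpy as np
from scipy.fft import dctn

def code_pixel_block(block, prediction, qp_step):
    """Sketch of the pixel block coder stages described above."""
    residual = block.astype(np.float64) - prediction   # subtractor 1612
    coeffs = dctn(residual, norm='ortho')              # transform unit 1614
    levels = np.round(coeffs / qp_step).astype(int)    # quantizer 1616
    return levels   # the entropy coder 1618 would code these losslessly
```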
The transform unit 1614 may operate in a variety of transform modes as determined by the controller 1660. For example, the transform unit 1614 may apply a discrete cosine transform (DCT), a discrete sine transform (DST), a Walsh-Hadamard transform, a Haar transform, a Daubechies wavelet transform, or the like. In an embodiment, the controller 1660 may select a transform mode M to be applied by the transform unit 1614, may configure the transform unit 1614 accordingly and may signal the transform mode M in the coded video data, either expressly or impliedly.
The quantizer 1616 may operate according to a quantization parameter QP that is supplied by the controller 1660. In an embodiment, the quantization parameter QP may be applied to the transform coefficients as a multi-value quantization parameter, which may vary, for example, across different coefficient locations within a transform-domain pixel block. Thus, the quantization parameter QP may be provided as a quantization parameter array.
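A sketch of such a multi-value quantizer, assuming the QP array has the same shape as the transform-domain pixel block:

```python
import numpy as np

def quantize(coeffs, qp_steps):
    """Quantize with a per-coefficient step size; coarser steps may be
    assigned to high-frequency coefficient locations, for example."""
    return np.round(coeffs / qp_steps).astype(np.int32)

def dequantize(levels, qp_steps):
    """Inverse operation; the rounding in quantize() is irreversible,
    which is the source of the coding errors noted for the dequantizer."""
    return levels * qp_steps

# illustrative 8x8 step array that grows away from the DC position
qp_steps = 8.0 + 2.0 * (np.arange(8)[:, None] + np.arange(8)[None, :])
```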
The entropy coder 1618, as its name implies, may perform entropy coding of data output from the quantizer 1616. For example, the entropy coder 1618 may perform run length coding, Huffman coding, Golomb coding and the like.
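As one example of such a variable length code, an order-0 exponential-Golomb coder (a member of the Golomb code family) can be sketched in a few lines:

```python
def exp_golomb_encode(n):
    """Order-0 exponential-Golomb code word for an unsigned value n:
    a prefix of leading zeros followed by the binary form of n + 1."""
    bits = bin(n + 1)[2:]
    return '0' * (len(bits) - 1) + bits

# exp_golomb_encode(0) -> '1', (1) -> '010', (2) -> '011', (3) -> '00100'
```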
The pixel block decoder 1620 may invert coding operations of the pixel block coder 1610. For example, the pixel block decoder 1620 may include a dequantizer 1622, an inverse transform unit 1624, and an adder 1626. The pixel block decoder 1620 may take its input data from an output of the quantizer 1616. Although permissible, the pixel block decoder 1620 need not perform entropy decoding of entropy-coded data since entropy coding is a lossless process. The dequantizer 1622 may invert operations of the quantizer 1616 of the pixel block coder 1610. The dequantizer 1622 may perform uniform or non-uniform de-quantization as specified by the decoded signal QP. Similarly, the inverse transform unit 1624 may invert operations of the transform unit 1614. The dequantizer 1622 and the inverse transform unit 1624 may use the same quantization parameters QP and transform mode M as their counterparts in the pixel block coder 1610. Quantization operations likely will truncate data in various respects and, therefore, data recovered by the dequantizer 1622 likely will possess coding errors when compared to the data presented to the quantizer 1616 in the pixel block coder 1610.
The adder 1626 may invert operations performed by the subtractor 1612. It may receive the same prediction pixel block from the predictor 1650 that the subtractor 1612 used in generating residual signals. The adder 1626 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 1624 and may output reconstructed pixel block data.
The in-loop filter 1630 may perform various filtering operations on recovered pixel block data. For example, the in-loop filter 1630 may include a deblocking filter 1632 and a sample adaptive offset (“SAO”) filter 1633. The deblocking filter 1632 may filter data at seams between reconstructed pixel blocks to reduce discontinuities between the pixel blocks that arise due to coding. SAO filters may add offsets to pixel values according to an SAO “type,” for example, based on edge direction/shape and/or pixel/color component level. The in-loop filter 1630 may operate according to parameters that are selected by the controller 1660.
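A toy stand-in for the deblocking stage, assuming a vertical pixel block seam at column x; production deblocking filters use more elaborate strength decisions, but the gating idea is the same:

```python
import numpy as np

def deblock_vertical_seam(img, x, alpha=8):
    """Blend pixels straddling the seam only when the step across it is
    small enough to be a coding artifact rather than a true edge."""
    p0 = img[:, x - 1].astype(int)
    q0 = img[:, x].astype(int)
    weak = np.abs(p0 - q0) < alpha      # leave genuine object edges alone
    delta = (q0 - p0) // 4
    img[:, x - 1] = np.where(weak, p0 + delta, p0)
    img[:, x] = np.where(weak, q0 - delta, q0)
    return img
```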
The reference picture store 1640 may store filtered pixel data for use in later prediction of other pixel blocks. Different types of prediction data are made available to the predictor 1650 for different prediction modes. For example, for an input pixel block, intra prediction takes a prediction reference from decoded data of the same picture in which the input pixel block is located. Thus, the reference picture store 1640 may store decoded pixel block data of each picture as it is coded. For the same input pixel block, inter prediction may take a prediction reference from previously coded and decoded picture(s) that are designated as reference pictures. Thus, the reference picture store 1640 may store these decoded reference pictures.
As discussed, the predictor 1650 may supply prediction data to the pixel block coder 1610 for use in generating residuals. The predictor 1650 may include an inter predictor 1652, an intra predictor 1653 and a mode decision unit 1652. The inter predictor 1652 may receive pixel block data representing a new pixel block to be coded and may search reference picture data from store 1640 for pixel block data from reference picture(s) for use in coding the input pixel block. The inter predictor 1652 may support a plurality of prediction modes, such as P mode coding and B mode coding. The inter predictor 1652 may select an inter prediction mode and an identification of candidate prediction reference data that provides a closest match to the input pixel block being coded. The inter predictor 1652 may generate prediction reference metadata, such as motion vectors, to identify which portion(s) of which reference pictures were selected as source(s) of prediction for the input pixel block.
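For illustration, an exhaustive block-matching search over a small window, assuming sum-of-absolute-differences as the distortion measure (the search strategy and window size are assumptions for the example):

```python
import numpy as np

def motion_search(block, ref, bx, by, radius=8):
    """Full search of a reference picture window for the best-matching
    block; the winning offset (dx, dy) serves as the motion vector."""
    h, w = block.shape
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue  # candidate block falls outside the picture
            sad = np.abs(block.astype(int) - ref[y:y + h, x:x + w].astype(int)).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv, best
```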
The intra predictor 1653 may support Intra (I) mode coding. The intra predictor 1653 may search, from among pixel block data of the same picture as the pixel block being coded, for data that provides the closest match to the input pixel block. The intra predictor 1653 also may generate prediction reference indicators to identify which portion of the picture was selected as a source of prediction for the input pixel block.
The mode decision unit 1652 may select a final coding mode to be applied to the input pixel block. Typically, as described above, the mode decision unit 1652 selects the prediction mode that will achieve the lowest distortion when video is decoded given a target bitrate. Exceptions may arise when coding modes are selected to satisfy other policies to which the coding system 1600 adheres, such as satisfying a particular channel behavior, or supporting random access or data refresh policies. When the mode decision unit 1652 selects the final coding mode, it may output a selected reference block from the store 1640 to the pixel block coder and decoder 1610, 1620 and may supply to the controller 1660 an identification of the selected prediction mode along with the prediction reference indicators corresponding to the selected mode.
The controller 1660 may control overall operation of the coding system 1600. The controller 1660 may select operational parameters for the pixel block coder 1610 and the predictor 1650 based on analyses of input pixel blocks and also external constraints, such as coding bitrate targets and other operational parameters. As is relevant to the present discussion, when it selects quantization parameters QP, the use of uniform or non-uniform quantizers, and/or the transform mode M, it may provide those parameters to the syntax unit 1670, which may include data representing those parameters in the data stream of coded video data output by the system 1600. The controller 1660 also may select between different modes of operation by which the system may generate reference images and may include metadata identifying the modes selected for each portion of coded data.
During operation, the controller 1660 may revise operational parameters of the quantizer 1616 and the transform unit 1614 at different granularities of image data, either on a per pixel block basis or on a larger granularity (for example, per picture, per slice, per largest coding unit (“LCU”) or another region). In an embodiment, the quantization parameters may be revised on a per-pixel basis within a coded picture.
Additionally, as discussed, the controller 1660 may control operation of the in-loop filter 1630 and the prediction unit 1650. Such control may include, for the prediction unit 1650, mode selection (lambda, modes to be tested, search windows, distortion strategies, etc.), and, for the in-loop filter 1630, selection of filter parameters, reordering parameters, weighted prediction, etc.
And, further, the controller 1660 may perform transforms of reference pictures stored in the reference picture store when new packing configurations are defined for input video.
The principles of the present discussion may be used cooperatively with other coding operations that have been proposed for multi-view video. For example, the predictor 1650 may perform prediction searches using input pixel block data and reference pixel block data in a spherical projection. Such prediction techniques may be performed as described in U.S. patent application Ser. No. 15/390,202, filed Dec. 23, 2016 and U.S. patent application Ser. No. 15/443,342, filed Feb. 27, 2017, both of which are assigned to the assignee of the present application, the disclosures of which are incorporated herein by reference. In such an embodiment, the coding system 1600 may include a spherical transform unit 1690 that transforms input pixel block data to a spherical domain prior to being input to the predictor 1650.
As indicated, the coded video data output by the video coder 230 (
The pixel block decoder 1720 may include an entropy decoder 1722, a dequantizer 1724, an inverse transform unit 1726, and an adder 1728. The entropy decoder 1722 may perform entropy decoding to invert processes performed by the entropy coder 1618 (
The adder 1728 may invert operations performed by the subtractor 1612 (
The in-loop filter 1730 may perform various filtering operations on reconstructed pixel block data. As illustrated, the in-loop filter 1730 may include a deblocking filter 1732 and an SAO filter 1734. The deblocking filter 1732 may filter data at seams between reconstructed pixel blocks to reduce discontinuities between the pixel blocks that arise due to coding. SAO filters 1734 may add offsets to pixel values according to an SAO type, for example, based on edge direction/shape and/or pixel level. Other types of in-loop filters may also be used in a similar manner. Operation of the deblocking filter 1732 and the SAO filter 1734 ideally would mimic operation of their counterparts in the coding system 1600 (
The reference picture store 1740 may store filtered pixel data for use in later prediction of other pixel blocks. The reference picture store 1740 may store decoded pixel block data of each picture as it is decoded for use in intra prediction. The reference picture store 1740 also may store decoded reference pictures.
As discussed, the predictor 1750 may supply prediction data to the pixel block decoder 1720. The predictor 1750 may supply predicted pixel block data as determined by the prediction reference indicators supplied in the coded video data stream.
The controller 1760 may control overall operation of the decoding system 1700. The controller 1760 may set operational parameters for the pixel block decoder 1720 and the predictor 1750 based on parameters received in the coded video data stream. As is relevant to the present discussion, these operational parameters may include quantization parameters QP for the dequantizer 1724 and transform modes M for the inverse transform unit 1726. As discussed, the received parameters may be set at various granularities of image data, for example, on a per pixel block basis, a per picture basis, a per slice basis, a per LCU basis, or based on other types of regions defined for the input image.
And, further, the controller 1760 may perform transforms of reference pictures stored in the reference picture store 1740 when new packing configurations are detected in coded video data.
Embodiments of the present invention may mitigate boundary artifacts in coding systems 1600 and decoding systems 1700 by altering operation of the in-loop filters 1630, 1730 in those systems. According to such embodiments, the in-loop filters 1630, 1730 may be prevented from performing filtering on regions of decoded images that contain null data. For example, in
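A sketch of this gating, assuming a per-pixel mask marking the null regions of the padded format (the mask representation is an assumption made for the example):

```python
import numpy as np

def deblock_with_null_mask(img, null_mask, x):
    """Deblock the vertical seam at column x, but only on rows where
    neither side of the seam lies in a null region of the format."""
    p0 = img[:, x - 1].astype(int)
    q0 = img[:, x].astype(int)
    active = ~(null_mask[:, x - 1] | null_mask[:, x])   # skip null data
    delta = (q0 - p0) // 4
    img[:, x - 1] = np.where(active, p0 + delta, p0)
    img[:, x] = np.where(active, q0 - delta, q0)
    return img
```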
Embodiments of the present disclosure provide coding systems that generate padded images from input pictures and perform video coding/decoding operations on the basis of the padded images. Thus, a padded input image may be partitioned into a plurality of pixel blocks and coded on a pixel-block-by-pixel-block basis. An image pre-processor 220 (
The padded image content may be derived from views that are adjacent to the view being filtered. For example, in the image space illustrated in
Similarly, for the image format 1900 illustrated in
In another embodiment, shown in
As coding progresses through other rows of the source image 2000 (
In such embodiments, a coding syntax may be developed to notify decoding systems 1700 of the deblocking mode decisions performed by coding systems 1600. In one embodiment, it may be sufficient to provide a deblocking mode flag in coding syntax as follows:
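A hypothetical form of such a flag is sketched below; the element name and one-bit encoding are assumptions for illustration, not a reproduction of the disclosed syntax:

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative)."""
    def __init__(self, data):
        self.data, self.pos = data, 0

    def read_bit(self):
        byte, off = divmod(self.pos, 8)
        self.pos += 1
        return (self.data[byte] >> (7 - off)) & 1

def parse_deblocking_mode_flag(reader):
    """Hypothetical deblocking_mode flag: 1 indicates the encoder applied
    deblocking across view seams, so the decoder should mirror it; 0
    indicates seam filtering was suppressed."""
    return reader.read_bit() == 1
```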
The foregoing embodiments may be performed without requiring padding data to be transmitted in a channel. Padding data may be derived from decoded video data contained in other views. Thus, in the absence of transmission errors between the coding system 1600 and the decoding system 1700, the coding system 1600 and the decoding system 1700 may develop padding data and perform filtering in parallel based on information that is available locally to each system.
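A sketch of such locally derived padding, assuming two decoded views whose shared edge needs only a copy (a real cube-map layout would also rotate or flip the neighboring content into alignment):

```python
import numpy as np

def pad_view_from_neighbor(view, neighbor, width):
    """Append 'width' columns of the adjacent view's decoded content to
    the right edge of a view. Both encoder and decoder already hold this
    data, so no padding needs to be transmitted."""
    return np.concatenate([view, neighbor[:, :width]], axis=1)
```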
In another embodiment, padded image data may be used in prediction operations for video coding. A predictor may interpolate reference pictures for prediction that include padding content provided adjacent to each view of a multi-view image. An exemplary padded reference picture 1830 is illustrated in
Embodiments of the present disclosure may create padded images 1830, 1930 (
In such an embodiment, video coders 230 (
Similarly, a decoding system 1700 (
The padding operations may be performed locally by an encoder and decoder without requiring the content of the padded image data to be signaled in a coded data stream. In such embodiments, a coding syntax may be developed to notify decoding systems 1700 of the prediction padding decisions performed by coding systems 1600. In one embodiment, it may be sufficient to provide a prediction_mode flag in coding syntax as follows:
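Using the BitReader sketched earlier, the corresponding hypothetical element (name assumed) might be parsed as:

```python
def parse_prediction_mode_flag(reader):
    """Hypothetical prediction_mode flag: 1 indicates reference pictures
    are padded with neighboring-view content before interpolation and
    motion-compensated prediction; 0 indicates no padding is used."""
    return reader.read_bit() == 1
```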
Such a flag permits an encoder and decoder to control whether to perform padding or not when developing reference pictures for prediction.
The foregoing discussion has described operation of the embodiments of the present disclosure in the context of video coders and decoders. Commonly, these components are provided as electronic devices. Video coders, decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones or computer servers. Such computer programs typically are stored in physical storage media such as electronic-, magnetic- and/or optically-based storage devices, where they are read into a processor and executed. Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.
For example, the techniques described herein may be performed by a central processor of a computer system.
The central processor 2110 may read and execute various program instructions stored in the memory 2130 that define an operating system 2112 of the system 2100 and various applications 2114.1-2114.N. As it executes those program instructions, the central processor 2110 may read, from the memory 2130, decoded image data created either by a codec 2150 or an application 2114.1 and may perform filtering controls as described hereinabove.
As indicated, the memory 2130 may store program instructions that, when executed, cause the processor to perform the techniques described hereinabove. The memory 2130 may store the program instructions on electrical-, magnetic- and/or optically-based storage media.
The transceiver 2140 may represent a communication system to receive coded video data from a network (not shown). In an embodiment where the central processor 2110 operates a software-based video codec, the transceiver 2140 may place coded video data in memory 2130 for retrieval by the processor 2110. In an embodiment where the system 2100 has a dedicated codec, the transceiver 2140 may provide coded video data to the codec 2150.
The foregoing discussion has described the principles of the present disclosure in terms of encoding systems and decoding systems. As described, an encoding system typically codes video data for delivery to a decoding system where the video data is decoded and consumed. As such, the encoding system and decoding system support coding, delivery and decoding of video data in a single direction. In applications where bidirectional exchange is desired, a pair of terminals 110, 120 (
Several embodiments of the present disclosure are specifically illustrated and described herein. However, it will be appreciated that modifications and variations of the present disclosure are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.