The present Application relates to video processing, such as video coding. More particularly, an embodiment of the present invention relates to region based asymmetric coding, such as region based asymmetric coding for video compression.
The provision of a stereoscopic (3D) user experience is a long-held goal of both content providers and display manufacturers. For 3D video delivery, bandwidth is a major concern. 3D video contains information about two distinct views, one for each eye, which may potentially double the bitrate compared to a 2D video coding system. In many systems, however, it may not be possible to accommodate such an increase in bandwidth to support 3D applications, whereas sacrificing quality to fit the 3D signal within the same bandwidth may be undesirable. Thus, how to reduce the bandwidth for 3D video delivery without affecting the image quality is an issue for 3D video systems.
According to a first aspect of the present disclosure, a method for processing video data is provided, comprising: providing video data representing a first view and a second view, each of the first view and the second view comprising a plurality of regions; and asymmetrically processing the regions of the first view according to a first view processing pattern and asymmetrically processing the regions of the second view according to a second view processing pattern, wherein the first view processing pattern is different from the second view processing pattern.
According to a second aspect, a method for processing video data comprised of a plurality of frames, each frame represented according to a first view and a second view, each view comprising a plurality of regions, is provided, the method comprising: for each frame, evaluating a first view processing pattern and a second view processing pattern; and, for each frame, asymmetrically processing the regions of the first view according to the first view processing pattern and asymmetrically processing the regions of the second view according to the second view processing pattern, wherein, for each frame, the first view processing pattern is different from the second view processing pattern.
According to a third aspect, a method for encoding video data is provided, comprising: providing video data representing a first view and a second view, each of the first view and the second view comprising a plurality of color components; and asymmetrically encoding the color components of the first view according to a first view encoding pattern and asymmetrically encoding the color components of the second view according to a second view encoding pattern, wherein the first view encoding pattern is different from the second view encoding pattern.
Further embodiments of the disclosure are shown in the specification, claims and drawings of the present application.
Several methods can be utilized to deliver 3D data. Methods include simulcast encoding and delivery of the different views (encoding and sending two views independently) (see, for example,
Other methods have also been developed for next generation 3D TV activities, which are primarily based on the combined use of multiview video coding and depth information [see reference 12].
Although many of these methods can result in reduced bandwidth compared to the simulcast approach, further bandwidth reduction is still desired. A method used to further reduce bandwidth employs asymmetric coding and processing of the two views. This concept is based on the theories of binocular suppression [see references 4, 5 and 9]. The theory states that the final 3D perception is usually dominated by the higher-quality component of a stereo pair if both views are spatially and temporally properly aligned and the difference between the samples of the two views is not too significant, or the quality of the “lesser” view is within an acceptable range. Based on the above theory, one may elect to encode one view with lower quality than the other view without introducing visible distortion artifacts in the binocular percept. In the presence of blocking artifacts, for example, the binocular quality of a stereo sequence tends to be close to the mean of the quality of both views.
Lowering/degrading quality of a view can involve the consideration of blurring/filtering, the use of coarser compression (quantization), or the reduction of spatial and/or temporal resolution [see references 6, 7 and 8].
All of the above asymmetric 3D video coding approaches usually consider imposing quality asymmetry in one view, thus making one view of higher quality than the other. This level of asymmetry has been restricted to the picture level, and will therefore be referred to as picture-level or view-level asymmetry.
In the present disclosure, techniques that impose asymmetry at the region level, but at the same time may consider overall/average symmetry or significantly reduced asymmetry at the picture/view level, are presented. Throughout the present disclosure, the term “asymmetry” or “asymmetric” at the region level will be used to mean that not all of the regions of a picture or view are processed (e.g., pre-filtered or coded) in the same way, i.e., that some regions are processed differently from others.
Region asymmetry may consider coding as well as pre-filtering of the content. The techniques proposed in the present disclosure could be applied to any 3D coding method, including 2D+delta, MVC, simulcast, and frame compatible coding. The methods of the present disclosure could also be applied to Full Resolution Frame Compatible (FCFR) coding schemes [see reference 10]. In the context of FCFR coding [see again reference 10], where both the base and enhancement layers are likely to consist of frame compatible formats (e.g., side-by-side, over-under, and line-interleaved, among others), the asymmetric coding concepts of the present disclosure can be extended to both the base layer and the one or more enhancement layers.
Without loss of generality, frame-compatible coding formats will be used in the following paragraphs as an example. However, the teachings of the present disclosure are also intended to encompass other compression formats, such as Simulcast, MVC, and SVC, and could also be applied to multiple view scenarios. Applicants will use H.264/MPEG-4 AVC and motion-compensated DCT-based video codecs as an example for the compression techniques presented herein. The person skilled in the art will understand that other coding techniques, such as JPEG image coding, wavelets, and other coding standards and methods, can be applied.
A Frame Compatible 3D video signal is created by first multiplexing the two stereo views into a single frame configuration through the consideration of a variety of filtering, sampling, and arrangement methods, as shown, for example, in
Although other methods, such as filtering, adjusting the Lagrangian lambda, thresholding, and biasing coding modes, among others, could be used to illustrate and apply the teachings of the present disclosure, applicants will primarily focus, by way of example, on the application of different quantization. In the case of filtering, a stronger low-pass filter (e.g., a blurring filter) can be used in one view and not in the other. In the case of biasing the coding modes, one view can be biased more towards bitrate-efficient modes than higher-quality modes relative to the other, e.g., biasing the selection of SKIP, Direct, 16×16, etc. modes versus other higher-precision modes. This could be done, for example, by changing the Lagrangian lambda, or by limiting the mode decision to a smaller set of coding modes.
In the context of H.264, quantization is adjusted primarily through the Quantization Parameter (QP), but also through the use of quantization matrices or other encoding techniques, such as quantization rounding, thresholding, or application of different Lagrangian parameters. In the case of quantization matrices, if both view samples belong to the same slice, then, since quantization matrices are dependent on the transform mode, quantization matrices of different strength could be defined for, say, the 4×4 and 8×8 modes, and samples in each view could be biased to select one transform mode instead of the other. If separate slices are used, then different quantization matrices could be signaled instead. For quantization rounding and thresholding, a larger rounding offset or a larger threshold could be set for one view versus the other, respectively, whereas similar considerations could be used when applying a Lagrangian parameter, i.e., a larger Lagrangian (lambda) parameter would result in biasing the coding decision of a block towards a lower bitrate.
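By way of a non-limiting illustration, the following sketch shows a generic dead-zone scalar quantizer in Python; it is a simplified model rather than the exact H.264 integer-arithmetic formulation, and the step size and offset values are hypothetical.

```python
def deadzone_quantize(coeff, qstep, f):
    """Generic dead-zone scalar quantizer (simplified; not the exact
    H.264 integer-arithmetic formulation). A smaller rounding offset f
    widens the dead zone, zeroing more coefficients and lowering the
    bitrate of the region/view it is applied to."""
    sign = 1 if coeff >= 0 else -1
    return sign * int(abs(coeff) / qstep + f)

# Hypothetical example: the same coefficient quantized with two
# different rounding offsets, e.g., one offset per view, to impose
# quality asymmetry between the views.
c = 7.0
print(deadzone_quantize(c, qstep=4.0, f=0.5))    # round-to-nearest -> 2
print(deadzone_quantize(c, qstep=4.0, f=1/6.0))  # wider dead zone  -> 1
```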
According to an embodiment of the present disclosure, the coding parameter (i.e., QP) assignment is switched between the views alternately, region by region. The region can be some fixed pattern (such as a macroblock row, a slice, or a sequence of macroblocks), or it can be decided using the characteristics of the signals, such as content activity, edges, depth/disparity information, or a region of interest.
According to an embodiment of the disclosure, the QP assignment is balanced for each view, so that somewhat similar, if not equal, average QP values are used for both views. This would result in an overall similar quality for the two views.
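By way of a non-limiting illustration, the following Python sketch assigns complementary per-row QP patterns to the two views under the assumption that each region is a macroblock row; the base QP and offset values are hypothetical.

```python
def region_qp_patterns(num_rows, base_qp, delta):
    """Assign complementary per-row QPs to the two views: where one
    view uses a coarser QP, the other uses a finer one, and vice
    versa. Averaged over the picture, both views end up with the same
    mean QP, i.e., the view-level quality remains balanced."""
    view0 = [base_qp + (delta if r % 2 == 0 else 0) for r in range(num_rows)]
    view1 = [base_qp + (0 if r % 2 == 0 else delta) for r in range(num_rows)]
    return view0, view1

# Hypothetical example: 6 macroblock rows, base QP 26, QP offset 3.
v0, v1 = region_qp_patterns(6, base_qp=26, delta=3)
print(v0)  # [29, 26, 29, 26, 29, 26]
print(v1)  # [26, 29, 26, 29, 26, 29]
```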
For the location, QP changes could happen at fixed, random, or even adaptive (based on the view/signal characteristics) intervals. For the QP value, QP changes (QP offset or delta QP) between regions for each view could be fixed, “random”, or could even be adjusted based on the view/signal characteristics. QP variation can refer to either a QP location change and/or a QP value change. In one embodiment, the QP variation for asymmetric coding can be fixed for all the regions. In another embodiment, a set of values for the allowable QP variation can be generated; then, for each region, a different QP value can be assigned using a random number generator and/or user intervention. QP values of macroblocks (MBs) within a region could be further adjusted given their spatio-temporal characteristics. For QP variation adaptation, for example, a smaller QP variation between corresponding regions of the two views can be considered for regions that appear closer to a viewer's eyes, and a larger QP variation for regions that are farther away.
In a different embodiment, a smaller or no QP variation is considered at the region or regions of interest, whereas larger variations and/or larger QP parameters are considered for less important regions. A region or regions of interest could be user-specified, or defined as objects in the foreground, possibly extracted using a segmentation or depth estimation (objects appearing closer to the viewer) technique. In a further embodiment, smaller or no QP variation between corresponding regions of the two views is considered for regions that may have a higher impact on the subjective or objective quality of a scene, whereas higher QP variations between corresponding regions of the two views are considered for regions with lesser subjective or objective quality impact. Specifically, the same QP can be assigned to the edges for both views, but for the regions everywhere else, asymmetric coding can be applied and higher QP variation between corresponding regions of the two views can be allowed. This characterization could, for example, be made using brightness/contrast analysis, texture and/or edge analysis, and motion, among others.
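As a further non-limiting illustration, the following Python sketch selects a per-region inter-view QP variation from an assumed normalized depth map; the threshold and variation values are hypothetical.

```python
def adaptive_qp_deltas(region_depths, near_threshold, small_delta, large_delta):
    """Choose a per-region inter-view QP variation: regions that
    appear closer to the viewer (or are otherwise more important,
    e.g., a region of interest or strong edges) receive a smaller
    variation, while less important, farther regions tolerate a
    larger one."""
    return [small_delta if depth <= near_threshold else large_delta
            for depth in region_depths]

# Hypothetical example: four regions with normalized depths, where a
# smaller depth value means closer to the viewer.
print(adaptive_qp_deltas([0.1, 0.8, 0.3, 0.9],
                         near_threshold=0.5, small_delta=1, large_delta=4))
# -> [1, 4, 1, 4]
```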
In a further embodiment, the QP variation and pattern are designed so as not to cause any “QP pop-up” effect, which may result from the use of different QP assignments. To overcome this, when a QP is allocated to each region, QPs can be adjusted smoothly within one picture. For example, when assigning the quantization parameter for one view, if the first region is allocated a quantization parameter value equal to QP, then the quantization parameter of the second region can be assigned to QP+1, that of the third region to QP+2, and so on. For the other view, the first region can be assigned to QP+1, the second region to QP, and the third region to QP+3 as shown, by way of example, in
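A minimal Python sketch of such a smooth intra-picture QP assignment, mirroring the numeric example above with a hypothetical base QP, is:

```python
BASE_QP = 26  # hypothetical base quantization parameter

# Per-region QP assignments mirroring the numeric example above:
# consecutive regions of each view change QP gradually, avoiding
# abrupt "QP pop-up" jumps, while corresponding regions still differ
# between the two views.
view0_qps = [BASE_QP + d for d in (0, 1, 2)]  # QP, QP+1, QP+2
view1_qps = [BASE_QP + d for d in (1, 0, 3)]  # QP+1, QP, QP+3
print(view0_qps)  # [26, 27, 28]
print(view1_qps)  # [27, 26, 29]
```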
Temporal impact can also be considered in addition to spatial impact, if desired. Temporally, the pattern can be the same for the full sequence of views, or it can be adjusted adaptively, based on the characteristics of the signal, such as temporal segmentation of the scenes and motion trajectory. For example, as shown in
In the embodiment of
In the further embodiment shown in
Quality asymmetry could also be controlled by adjusting the quality of the chroma components in addition to or instead of the luma. Higher or lower quality in the chroma components may be more easily tolerated than adjusting the quality of the luma, and this could again be done with chroma QP adjustments or by biasing coding decisions, thresholding, motion estimation, or other parameters for the coding of the chroma of one view versus the other. Adjusting and biasing different chroma components for one view versus the other may also be possible.
In the embodiments of
In the example of
Turning now to the FCFR system, the same or a different type of asymmetry can be applied for the base and the one or more enhancement layers. In one embodiment, region-asymmetric/view-symmetric coding is applied in the base layer, whereas in the one or more enhancement layers only symmetric coding is applied, as shown in
In another embodiment, asymmetric coding can be applied for both the base and the one or more enhancement layers. For example, the same QP variation pattern for each region is used for both the base and enhancement layers.
According to a further embodiment, different regions could be defined for the base versus the enhancement layers. By way of example, the enhancement layer could use bigger regions or use region analysis, while the base layer could operate on a fixed pattern, or vice versa.
The person skilled in the art will understand that the above teachings can easily be generalized for encoding sequences with more views and more enhancement layers.
2. Interaction with Other Coding Factors
In the present disclosure, the image quality of the alternating regions for each view is degraded on purpose. This may have some impact on other encoding and decoding factors, such as pre-processing, pre-analysis, motion estimation, mode decision, rate control, and post-processing among others. Therefore, these can be taken into consideration to further improve coding performance given the characteristics of the asymmetric quality allocation.
Some example methods of how these factors could be considered and improved are presented.
In many modern encoders, such as the H.264 JM reference software, Lagrange optimization in the form J = D + lambda*R is used to determine the best motion vector for each mode as well as the best mode for each macroblock. Here, D represents distortion, which could be based on a variety of metrics, such as the Sum of Absolute Differences (SAD), Sum of Square Errors (SSE), or other metrics including subjective evaluation, among others, and could be based on prediction only or on the final reconstruction after full encoding. R represents an estimate or the actual value of the bitrate required for encoding a certain motion vector or coding mode, which may include, apart from mode information such as the mode index, motion vectors, reference index, and QP information, also the residual information. Lambda is commonly a constant factor, which is decided by the quantization process applied to the image, i.e. a masterQP, and may also relate to the slice type as well as other factors such as the use and coefficients of quantization matrices, the QP characteristics of the color components, and the coding structure, among others. In Lagrange optimization, it is preferable to use the same lambda for the overall optimization of the full picture.
Due to macroblock-based rate control or other encoder optimization considerations, such as Rate Distortion Optimization Quantization (RDOQ) [see reference 11], one may wish to apply a different QP for each macroblock. In this case, a masterQP can be defined, which is used to decide a common lambda for every macroblock. The masterQP is generally set as the initial QP “assignment” or QP “estimate” for the current slice, and its value does not change within a slice/picture. It should be noted that although this value does not change, different QP values can still be used for different macroblocks within the same slice, as is also presented in the present disclosure. In the present invention, the lambda computation also accounts for the asymmetric QP allocation by considering a masterQP that is now a function of the different QP values used within a slice. The function can, for example, be set as the average or weighted average of these values, their minimum or maximum, or some other linear or non-linear combination thereof.
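By way of a non-limiting illustration, the following Python sketch derives a masterQP as a weighted average of the region QPs and maps it to a single lambda; the exponential QP-to-lambda relation shown is the JM-style mapping commonly associated with H.264 reference software, and the constants are illustrative only.

```python
def master_qp(region_qps, weights=None):
    """Combine the different QP values used within a slice into a
    single masterQP; a (weighted) average is shown here, but the
    minimum, maximum, or other combinations are equally possible."""
    if weights is None:
        weights = [1.0] * len(region_qps)
    return sum(q * w for q, w in zip(region_qps, weights)) / sum(weights)

def lambda_from_qp(qp):
    """JM-style exponential QP-to-lambda mapping (illustrative only;
    the actual constant and exponent depend on the slice type and
    encoder configuration)."""
    return 0.85 * 2.0 ** ((qp - 12.0) / 3.0)

# Hypothetical example: a slice mixing regions coded at QP 26 and 29.
mqp = master_qp([26, 29, 26, 29])  # -> 27.5
lam = lambda_from_qp(mqp)          # one common lambda, approx. 30.5
```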
The asymmetric coding process described in the present disclosure can also be combined with any rate control system. In one example, MB-level rate control may take into consideration the statistics of the same regions in both views jointly. The QP variation between the regions can then be maintained, and the formulae used for rate control can be adapted to consider this QP variation; more specifically, the rate control will generate an average QP value (QPrc) that will be used as the QP base for these two regions. The corresponding region in one view will be allocated a QP value of QPrc−N, and the equivalent region in the other view will be allocated a QP value of QPrc+N.
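A minimal sketch of this joint allocation, with hypothetical function and variable names, is:

```python
def allocate_pair_qps(qp_rc, n):
    """Given the base QP (QPrc) generated jointly by rate control from
    the statistics of co-located regions in both views, allocate
    QPrc - N to the region in one view and QPrc + N to the equivalent
    region in the other view: the desired QP variation of 2N is kept
    while the pair's average stays at QPrc."""
    return qp_rc - n, qp_rc + n

# Hypothetical example: rate control yields QPrc = 28, variation N = 2.
print(allocate_pair_qps(28, 2))  # -> (26, 30)
```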
Region based asymmetric coding can require some prior knowledge to decide the best way of defining regions and allocating QPs. This can be done in a pre-processing and/or a pre-analysis step. The encoder can later reuse some of the statistics generated earlier during pre-filtering, perform refinements thereof, or may even require some additional new statistics, such as disparity information. The consideration of asymmetric quality coding can also guide the pre-filtering process. For example, if it is known that one region is likely to use a larger QP, then, instead of or in addition to an increase in QP, a stronger filter can be applied, or the boundaries between adjacent macroblocks can be filtered in such a way that QP variation and blocking artifacts become less visible.
Since asymmetric coding is switched, according to some embodiments, region by region for each view, a joint design of the post-filtering is desirable. For example, filtering across the boundaries between regions should account for the difference in quality across those regions. A stronger filter might be needed when the QP variation is bigger across the boundary of different regions. For FCFR, if different asymmetry is used for the base and the enhancement layer, post-processing should be adjusted accordingly to account for asymmetries in quality not only within the same view but also between the base and enhancement layers. A post-processor could exploit such asymmetries in an attempt to improve the quality of the base or enhancement layer, depending on how these asymmetries are designed.
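By way of a non-limiting illustration, the following sketch maps the QP difference across a region boundary to a post-filter strength; the linear mapping and gain value are hypothetical and are not part of any standardized deblocking process.

```python
def boundary_filter_strength(qp_left, qp_right, gain=0.5):
    """Map the QP variation across a region boundary to a post-filter
    strength: the larger the QP difference between the neighboring
    regions, the stronger the smoothing applied across the boundary
    (simple linear mapping, for illustration only)."""
    return gain * abs(qp_left - qp_right)

print(boundary_filter_strength(26, 29))  # 1.5 -> mild filtering
print(boundary_filter_strength(22, 34))  # 6.0 -> strong filtering
```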
In particular,
The methods and systems described in the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. Features described as blocks, modules, or components may be implemented together (e.g., in a logic device such as an integrated logic device) or separately (e.g., as separate connected logic devices). The software portion of the methods of the present disclosure may comprise a computer-readable medium which comprises instructions that, when executed, perform, at least in part, the described methods. The computer-readable medium may comprise, for example, a random access memory (RAM) and/or a read-only memory (ROM). The instructions may be executed by a processor (e.g., a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA)).
As thus described herein, an embodiment of the present invention may relate to one or more of the example embodiments, which are enumerated in Table 1, below. Accordingly, the invention may be embodied in any of the forms described herein, including, but not limited to, the following Enumerated Example Embodiments (EEEs), which describe structure, features, and functionality of some portions of the present invention.
EEE1. A method for processing video data, comprising:
providing video data representing a first view and a second view, each of the first view and the second view comprising a plurality of regions; and
asymmetrically processing the regions of the first view according to a first view processing pattern and asymmetrically processing the regions of the second view according to a second view processing pattern,
wherein the first view processing pattern is different from the second view processing pattern.
EEE2. The method of Enumerated Example Embodiment 1, wherein the step of asymmetrically processing the regions of the first view and the second view comprises the step of asymmetrically pre-filtering the regions of the first view and the second view.
EEE3. The method of Enumerated Example Embodiment 1, wherein the step of asymmetrically processing the regions of the first view and the second view comprises the step of asymmetrically coding the regions of the first view and the second view.
EEE4. The method of Enumerated Example Embodiment 3, wherein the step of asymmetrically coding the regions of the first view and the regions of the second view comprises the step of varying a coding parameter from region to region of the first view and varying a coding parameter from region to region of the second view.
EEE5. The method of Enumerated Example Embodiment 4, wherein the coding parameter includes one or more of a quantization parameter, a coding mode, quantization matrices, quantization rounding, quantization thresholding or application of different Lagrangian parameters.
EEE6. The method of Enumerated Example Embodiment 5, wherein the asymmetrical coding of the first view according to the first view coding pattern is balanced with respect to the asymmetrical coding of the second view according to the second view coding pattern to minimize quality asymmetry.
EEE7. The method according to any one of Enumerated Example Embodiments 1-4, wherein the first view processing pattern is balanced with respect to the second view processing pattern to minimize quality asymmetry.
EEE8. The method according to any one of the previous Enumerated Example Embodiments, wherein each region is one or more macroblock rows of the first view or second view.
EEE9. The method according to any one of Enumerated Example Embodiments 1-7, wherein the regions are based on object segmentation.
EEE10. The method according to any one of Enumerated Example Embodiments 1-7, wherein each region is a depth region.
EEE11. The method according to Enumerated Example Embodiment 10, wherein the regions of the first view and the second view are asymmetrically coded according to the depth of said regions.
EEE12. The method according to Enumerated Example Embodiment 11, wherein asymmetrical coding occurs through assignment of different quantization parameters.
EEE13. The method according to any one of Enumerated Example Embodiments 1-7, wherein each region is a portion of the first view and second view.
EEE14. The method according to any one of Enumerated Example Embodiments 1-7, wherein each region is based on features of the video data to be processed.
EEE15. The method of Enumerated Example Embodiment 14, wherein the features include one or more of content activity, edges, depth information, disparity information, and region of interest of the video data to be processed.
EEE16. The method according to any one of the previous Enumerated Example Embodiments, wherein the first view and the second view are adapted to be combined to form stereoscopic video data.
EEE17. The method according to any one of the previous Enumerated Example Embodiments, wherein the first view and the second view are part of a multi-view arrangement.
EEE18. The method of Enumerated Example Embodiment 1, wherein the video data representing the first view and the second view are provided on a base layer and one or more enhancement layers and are adapted to be multiplexed in a frame compatible 3D video signal.
EEE19. The method of Enumerated Example Embodiment 4, wherein the coding parameter varies between a region of the first view and a corresponding region of the second view for all regions of the first view and second view.
EEE20. The method of Enumerated Example Embodiment 19, wherein the regions are macroblock rows and wherein, for each region of different views, the coding parameter varies between a first value in the first view and a second value in the second view for even rows and between the second value in the first view and the first value in the second view for odd rows.
EEE21. The method of Enumerated Example Embodiment 19, wherein the regions are macroblock rows and wherein, for each region of different views, the coding parameter varies between a first value in the first view and a second value in the second view for even rows and between a third value in the first view and the first value in the second view for odd rows.
EEE22. The method of Enumerated Example Embodiment 19, wherein the coding parameter varies between a first value in the first view and a second value in the second view for a first set of regions in the first view and second view, and varies between the second value in the first view and the first value in the second view for a second set of regions in the first view and second view.
EEE23. The method of Enumerated Example Embodiment 19 wherein the coding parameter varies between a first value in the first view and a second value in the second view for a first set of regions in the first view and second view, and varies between a third value in the first view and the first value in the second view for a second set of regions in the first view and second view.
EEE24. The method of Enumerated Example Embodiment 22 or 23, wherein a region is a slice or a sequence of macroblocks.
EEE25. The method of Enumerated Example Embodiment 4, wherein the coding parameter varies between a region of the first view and a corresponding region of the second view for some regions of the first view and second view.
EEE26. The method of Enumerated Example Embodiments 19 or 25, wherein the coding parameter varies between two or more values.
EEE27. The method of any one of Enumerated Example Embodiments 19-26, wherein variation of the coding parameter is fixed, random or adjustable.
EEE28. The method of any one of Enumerated Example Embodiments 19-26, wherein the regions contain more relevant regions and less relevant regions, and wherein variation of the coding parameter for the more relevant regions is lower than variation of the coding parameter for the less relevant regions.
EEE29. The method of Enumerated Example Embodiment 28, wherein the more relevant regions include a region of interest.
EEE30. The method of Enumerated Example Embodiment 28, wherein the more relevant regions include edge regions.
EEE31. The method of any one of Enumerated Example Embodiments 28-30, wherein variation of the coding parameter for the more relevant regions is zero.
EEE32. The method according to any one of the previous Enumerated Example Embodiments, wherein the first view processing pattern and the second view processing pattern include smooth variation of a processing parameter between regions of a same view.
EEE33. The method according to any one of the previous Enumerated Example Embodiments, wherein the step of asymmetrically processing the regions of the first view according to a first view processing pattern and the step of asymmetrically processing the regions of the second view according to a second view processing pattern include temporal variation of the first processing pattern and the second processing pattern.
EEE34. The method of Enumerated Example Embodiment 33, wherein the temporal variation is adaptively adjustable.
EEE35. The method of Enumerated Example Embodiment 33 or 34, wherein the temporal variation is based on features of the video data.
EEE36. The method of Enumerated Example Embodiment 33 or 34, wherein the temporal variation occurs every frame, every group of frames, every group of pictures, every scene cut, randomly, or in accordance with motion statistics.
EEE37. The method of Enumerated Example Embodiment 25, wherein the coding parameter does not vary between corresponding regions of the two views for the remaining regions.
EEE38. The method according to Enumerated Example Embodiment 18, wherein the first view processing pattern and the second view processing pattern of the base layer are the same as the first view processing pattern and the second view processing pattern of the one or more enhancement layers.
EEE39. The method according to Enumerated Example Embodiment 18, wherein the first view processing pattern and the second view processing pattern of the base layer are different from the first view processing pattern and the second view processing pattern of the one or more enhancement layers.
EEE40. The method according to Enumerated Example Embodiment 18, wherein asymmetric processing is applied to the base layer and not to the one or more enhancement layers.
EEE41. A method for processing video data comprised of a plurality of frames, each frame represented according to a first view and a second view, each view comprising a plurality of regions, comprising:
for each frame, evaluating a first view processing pattern and a second view processing pattern;
for each frame, asymmetrically processing the regions of the first view according to the first view processing pattern and asymmetrically processing the regions of the second view according to the second view processing pattern,
wherein, for each frame, the first view processing pattern is different from the second view processing pattern.
EEE42. The method of Enumerated Example Embodiment 41, wherein the first view processing pattern and the second view processing pattern comprise a first view coding pattern and a second view coding pattern, respectively.
EEE43. The method of Enumerated Example Embodiment 42, wherein the first view coding pattern and the second view coding pattern comprise a first quantization parameter assignment pattern and a second quantization parameter assignment pattern, respectively.
EEE44. A method for encoding video data, comprising:
providing video data representing a first view and a second view, each of the first view and the second view comprising a plurality of color components; and
asymmetrically encoding the color components of the first view according to a first view encoding pattern and asymmetrically encoding the color components of the second view according to a second view encoding pattern,
wherein the first view encoding pattern is different from the second view encoding pattern.
EEE45. The method of Enumerated Example Embodiment 44, wherein the color components of the first and second views are chroma components.
EEE46. An encoder for encoding a video signal according to the method recited in one or more of Enumerated Example Embodiments 1-45.
EEE47. An apparatus for encoding a video signal according to the method recited in one or more of Enumerated Example Embodiments 1-45.
EEE48. A system for encoding a video signal according to the method recited in one or more of Enumerated Example Embodiments 1-45.
EEE49. A computer-readable medium containing a set of instructions that causes a computer to perform the method recited in one or more of Enumerated Example Embodiments 1-45.
EEE50. Use of the method recited in one or more of Enumerated Example Embodiments 1-45 to encode a video signal.
Furthermore, all patents and publications mentioned in the specification may be indicative of the levels of skill of those skilled in the art to which the disclosure pertains. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually.
The examples set forth above are provided to give those of ordinary skill in the art a complete disclosure and description of how to make and use the embodiments of the methods of the disclosure, and are not intended to limit the scope of what the inventors regard as their disclosure. Modifications of the above-described modes for carrying out the disclosure may be used by persons of skill in the video art, and are intended to be within the scope of the following claims.
It is to be understood that the disclosure is not limited to particular methods or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the following claims.
This application claims priority to U.S. Provisional Patent Application No. 61/387,946 filed 29 Sep. 2010 and U.S. Provisional Patent Application No. 61/472,112 filed 5 Apr. 2011, both of which are incorporated herein by reference in their entirety.