This application claims priority under 35 U.S.C. § 119(a) to European Patent Application No. EP23184567.8 filed on Jul. 10, 2023. The above-identified patent application is hereby incorporated by reference in its entirety.
Embodiments according to the disclosure relate to multi-component color picture or video coding and a concept for applying intra-prediction including matrix-based intra prediction modes such as in VVC.
It is known in various block-based video coders to allow an intra-predicted block of one color component to adopt the intra-prediction mode used for another color component. Further, codecs like VVC offer matrix-based intra prediction modes in addition to the usual DC, planar and directional modes.
It is desired to provide concepts for applying intra-prediction in multi-component or color picture or video coding in a manner achieving a more efficient coding.
Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.
Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The term “couple” and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another. The terms “transmit,” “receive,” and “communicate,” as well as derivatives thereof, encompass both direct and indirect communication. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrase “associated with,” as well as derivatives thereof, means to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. The term “controller” means any device, system or part thereof that controls at least one operation. Such a controller may be implemented in hardware or a combination of hardware and software and/or firmware. The functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. The phrase “at least one of,” when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, “at least one of: A, B, and C” includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.
Moreover, various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The terms “application” and “program” refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
Definitions for other certain words and phrases are provided throughout this patent document. Those of ordinary skill in the art should understand that in many if not most instances, such definitions apply to prior as well as future uses of such defined words and phrases.
The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the disclosure. For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals even if occurring in different figures.
In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present disclosure. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
In the following, different examples, embodiments and aspects will be described. At least some of these examples, embodiments and aspects refer, inter alia, to methods and/or apparatus for video coding and/or for performing block-based predictions, e.g. using linear or affine transforms with neighboring sample reduction, and/or for optimizing video delivery (e.g., broadcast, streaming, file playback, etc.), e.g., for video applications and/or for virtual reality applications.
Further, examples, embodiments and aspects may refer to Versatile Video Coding (VVC) or successors. Also, further embodiments, examples and aspects will be defined by the enclosed claims.
It should be noted that any embodiments, examples and aspects as defined by the claims can be supplemented by any of the details (features and functionalities) described in the following chapters.
Also, the embodiments, examples and aspects described in the following chapters can be used individually, and can also be supplemented by any of the features in another chapter, or by any feature included in the claims.
Also, it should be noted that individual examples, embodiments and aspects described herein can be used individually or in combination. Thus, details can be added to each of said individual aspects without adding details to another one of said examples, embodiments and aspects.
It should also be noted that the present disclosure describes, explicitly or implicitly, features of decoding and/or encoding system and/or method.
Moreover, features and functionalities disclosed herein relating to a method can also be used in an apparatus. Furthermore, any features and functionalities disclosed herein with respect to an apparatus can also be used in a corresponding method. In other words, the methods disclosed herein can be supplemented by any of the features and functionalities described with respect to the apparatuses.
Also, any of the features and functionalities described herein can be implemented in hardware or in software, or using a combination of hardware and software, as will be described in the section “implementation alternatives”.
Moreover, any of the features described in parentheses (“( . . . )” or “[ . . . ]”) may be considered as optional in some examples, embodiments, or aspects.
In order to ease the understanding of the following examples of the present application, the description starts with a presentation of possible encoders and decoders fitting thereto into which the subsequently outlined examples of the present application could be built.
As mentioned, encoder 14 performs the encoding in a block-wise or block-based manner. To this end, encoder 14 subdivides picture 10 into blocks 18, in units of which encoder 14 encodes picture 10 into data stream 12. Examples of possible subdivisions of picture 10 into blocks 18 are set out in more detail below. Generally, the subdivision may end up in blocks 18 of constant size such as an array of blocks arranged in rows and columns. Alternatively, the subdivision may end up in blocks 18 of different block sizes such as by use of a hierarchical multi-tree subdivisioning starting from the whole picture area of picture 10 or from a pre-partitioning of picture 10 into an array of tree blocks. These examples shall not be treated as excluding other possible ways of subdivisioning picture 10 into blocks 18.
Further, encoder 14 is a predictive encoder configured to predictively encode picture 10 into data stream 12. For a certain block 18 this means that encoder 14 determines a prediction signal for block 18 and encodes the prediction residual, i.e. the prediction error at which the prediction signal deviates from the actual picture content within block 18, into data stream 12.
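The relation between a block, its prediction signal and the prediction residual may be illustrated by the following minimal sketch. The function names and the representation of a block as a list of rows are assumptions made for illustration only; quantization, transformation and entropy coding of the residual are omitted.

```python
def prediction_residual(original, prediction):
    """Sample-wise deviation of the prediction signal from the block content."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, prediction)]


def reconstruct(prediction, residual):
    """Decoder-side counterpart: add the residual back onto the prediction."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


# Toy 2x2 block and a toy prediction signal for it.
original = [[10, 12], [11, 13]]
prediction = [[10, 10], [12, 12]]
residual = prediction_residual(original, prediction)
```

In this lossless toy setting, adding the residual to the prediction recovers the original block exactly; in an actual codec the residual is typically quantized, so the reconstruction is only approximate.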
Encoder 14 may support different prediction modes so as to derive the prediction signal for a certain block 18. The prediction modes, which are of importance in the following examples, are intra-prediction modes according to which the inner of block 18 is predicted spatially from neighboring, already encoded samples of picture 10. The encoding of picture 10 into data stream 12 and, accordingly, the corresponding decoding procedure, may be based on a certain coding order 20 defined among blocks 18. For instance, the coding order 20 may traverse blocks 18 in a raster scan order such as row-wise from top to bottom with traversing each row from left to right, for instance. In case of hierarchical multi-tree based subdivisioning, raster scan ordering may be applied within each hierarchy level, wherein a depth-first traversal order may be applied, i.e. leaf nodes within a block of a certain hierarchy level may precede blocks of the same hierarchy level having the same parent block according to coding order 20. Depending on the coding order 20, neighboring, already encoded samples of a block 18 may be located usually at one or more sides of block 18. In some of the examples presented herein, neighboring, already encoded samples of a block 18 are located to the top of, and to the left of block 18.
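The depth-first traversal with raster scan ordering within each hierarchy level may be sketched as follows. The sketch assumes, for illustration only, a quadtree as the multi-tree and a hypothetical callback `split` that stands in for the partitioning information; neither is prescribed by the description above.

```python
def coding_order(x, y, size, split):
    """Yield leaf blocks (x, y, size) of a quadtree in depth-first order.

    Within each hierarchy level the four children are visited in raster
    scan order (top-left, top-right, bottom-left, bottom-right).
    `split(x, y, size)` is a hypothetical callback returning True if the
    block is subdivided further.
    """
    if size > 1 and split(x, y, size):
        half = size // 2
        for dy in (0, half):          # rows from top to bottom
            for dx in (0, half):      # each row from left to right
                yield from coding_order(x + dx, y + dy, half, split)
    else:
        yield (x, y, size)


def split(x, y, size):
    # Split the 16x16 tree block once and its top-left 8x8 child once more.
    return size == 16 or (size == 8 and (x, y) == (0, 0))


order = list(coding_order(0, 0, 16, split))
```

Here the four 4x4 leaves of the top-left child are traversed before the remaining 8x8 blocks of the same parent, so that samples above and to the left of a block are already coded when the block is reached.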
Intra-prediction modes may not be the only ones supported by encoder 14. In case of encoder 14 being a video encoder, for instance, encoder 14 may also support inter-prediction modes according to which a block 18 is temporally predicted from a previously encoded picture of video 16. Such an inter-prediction mode may be a motion-compensated prediction mode according to which a motion vector is signaled for such a block 18 indicating a relative spatial offset of the portion from which the prediction signal of block 18 is to be derived as a copy. Additionally, or alternatively, other non-intra-prediction modes may be available as well such as inter-prediction modes in case of encoder 14 being a multi-view encoder, or non-predictive modes according to which the inner of block 18 is coded as is, i.e. without any prediction.
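A motion-compensated prediction of the kind described above may be sketched as follows. The sketch assumes an integer-valued motion vector and clamps positions at the picture border; both simplifications, as well as the function name, are assumptions made for illustration only.

```python
def motion_compensated_prediction(reference, x0, y0, width, height, mv):
    """Copy a width x height block from a previously coded picture.

    (x0, y0) is the top-left position of the block to be predicted and
    mv = (mvx, mvy) is the spatial offset of the portion to be copied.
    Positions outside the reference picture are clamped to its border
    (an assumption of this sketch).
    """
    mvx, mvy = mv
    max_y = len(reference) - 1
    max_x = len(reference[0]) - 1

    def clamp(value, upper):
        return max(0, min(value, upper))

    return [[reference[clamp(y0 + mvy + j, max_y)][clamp(x0 + mvx + i, max_x)]
             for i in range(width)]
            for j in range(height)]


# Toy 4x4 reference picture whose sample at (x, y) has the value 10*y + x.
reference = [[10 * y + x for x in range(4)] for y in range(4)]
block = motion_compensated_prediction(reference, 2, 2, 2, 2, (-1, -2))
```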
Before starting with focusing the description of the present application onto intra-prediction modes, a more specific example for a possible block-based encoder, i.e. for a possible implementation of encoder 14, as described with respect to
As already mentioned above, encoder 14 performs block-based encoding. In certain embodiments, the encoder 14 subdivides picture 10 into blocks for which an intra-prediction mode is selected out of a set or plurality of intra-prediction modes supported by predictor 44 or encoder 14, respectively, and for which the selected intra-prediction mode is performed individually. Other sorts of blocks into which picture 10 is subdivided may, however, exist as well. For instance, the above-mentioned decision whether picture 10 is inter-coded or intra-coded may be done at a granularity or in units of blocks deviating from blocks 18. For instance, the inter/intra mode decision may be performed at a level of coding blocks into which picture 10 is subdivided, and each coding block is subdivided into prediction blocks. Prediction blocks within coding blocks for which it has been decided that intra-prediction is used are each subjected to an intra-prediction mode decision. To this end, for each of these prediction blocks, the encoder decides which supported intra-prediction mode should be used for the respective prediction block. These prediction blocks will form blocks 18 which are of interest here. Prediction blocks within coding blocks associated with inter-prediction would be treated differently by predictor 44. They would be inter-predicted from reference pictures by determining a motion vector and copying the prediction signal for this block from a location in the reference picture pointed to by the motion vector. Another block subdivisioning pertains to the subdivisioning into transform blocks in units of which the transformations by transformer 32 and inverse transformer 40 are performed. Transform blocks may, for instance, be the result of further subdivisioning coding blocks. Naturally, the examples set out herein should not be treated as being limiting and other examples exist as well.
For the sake of completeness only, it is noted that the subdivisioning into coding blocks may, for instance, use multi-tree subdivisioning, and prediction blocks and/or transform blocks may be obtained by further subdividing coding blocks using multi-tree subdivisioning, as well.
A decoder 54 or apparatus for block-wise decoding fitting to the encoder 14 of
Again, with respect to
As said, the above examples are merely illustrative for video codecs, i.e., decoders and encoders, into which the subsequently described embodiments could be built into. A further example is VVC. That is, embodiments of the present application may also result from a modification of the VVC as defined in the current VVC specification so far. This current VVC specification is used in order to describe an embodiment of the present application by way of a modification and motivate the underlying thoughts and advantages. Thereinafter, a broader description of embodiments of the present application is presented. The above-outlined details for possible decoders and encoders into which embodiments of the present application may be built into, may individually or in combination be used in order to specify and further develop the broadened embodiment described herein below.
In the current VVC specification, if a frame is coded in the 4-4-4 color format (sps_chroma_format_idc equal to 3 in the VVC specification) and with the tree type single tree (treeType equal to SINGLE_TREE in the VVC specification), then, in the VVC specification, it is not possible to code a block in the following configuration:
This can be seen as follows, where, from now on, it is assumed that sps_chroma_format_idc is equal to 3, that treeType is equal to SINGLE_TREE and that the adaptive color transform is disabled on the given block, i.e. cu_act_enabled_flag[xCb][yCb]=0 in section 8.4.3.
According to section 8.4.5.2.1, on the luma component (cIdx equal to 0) of the block, MIP can be used as an intra prediction mode if the MIP-flag for the given block, denoted IntraMipFlag[xTbCmp][yTbCmp] in that section, is equal to 1.
However, if the MIP-flag for the given block is equal to 1, then, in section 8.4.3, two options are possible: either the intra chroma mode, denoted intra_chroma_pred_mode in that section, is equal to 4, or intra_chroma_pred_mode is not equal to 4.
If the intra chroma mode of section 8.4.3 is equal to 4, then, according to 8.4.3, the Mip chroma direct flag, denoted MipChromaDirectFlag[xCb][yCb] in that section, is always inferred to be equal to 1. However, in this case, according to section 8.4.5.2.1, for the given block, the intra prediction signal for chroma (cIdx not equal to 0) has to be computed by the MIP-prediction specified in 8.4.5.2.2. Thus, if the intra chroma mode of section 8.4.3 is equal to 4, the chroma intra prediction signal cannot be computed using the planar mode. If the intra chroma mode of section 8.4.3 is not equal to 4, then, according to 8.4.3, the variable lumaIntraPredMode is set to INTRA_PLANAR=0. Consequently, according to Table 20, the chroma intra prediction mode, denoted IntraPredModeC[xCb][yCb] in 8.4.3, can only be set to 66 or to 81 or to 82 or to 83. Thus, according to section 8.4.5.2.1, the intra prediction signal for chroma (cIdx equal to 1 or 2), is computed by the process specified in section 8.4.5.2.6. Consequently, according to 8.4.5.2.6, for chroma, where the chroma intra prediction mode is denoted by predModeIntra in that section, the chroma intra prediction signal is either computed by the process specified in section 8.4.5.2.14 (for predModeIntra equal to 81 or 82 or 83) or by the process specified in section 8.4.5.2.13 (for predModeIntra equal to 66). In particular, the chroma intra prediction signal is not computed by the planar process, which is specified in 8.4.5.2.11. Thus, if the intra chroma mode of section 8.4.3 is not equal to 4, the intra chroma prediction signal is not generated by the planar mode either.
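The case analysis above may be retraced by the following minimal sketch. It only encodes the dispatch rules and the candidate mode values named in the preceding paragraph; it is not a reproduction of Table 20 or of the cited sections, and the function names are assumptions made for illustration only.

```python
PLANAR_PROCESS = "8.4.5.2.11"


def chroma_prediction_process(mip_chroma_direct_flag, pred_mode_intra):
    """Return the section whose process generates the chroma prediction signal.

    Only the cases named in the text are covered by this sketch.
    """
    if mip_chroma_direct_flag == 1:
        return "8.4.5.2.2"            # MIP prediction
    if pred_mode_intra == 0:
        return PLANAR_PROCESS         # planar mode
    if pred_mode_intra == 66:
        return "8.4.5.2.13"
    if pred_mode_intra in (81, 82, 83):
        return "8.4.5.2.14"
    raise ValueError("mode not covered by this sketch")


def reachable_processes_with_mip_luma():
    """Collect the processes reachable when luma uses MIP (4-4-4, single tree, no ACT)."""
    reachable = set()
    # intra_chroma_pred_mode equal to 4: MipChromaDirectFlag is inferred to be 1.
    reachable.add(chroma_prediction_process(1, None))
    # intra_chroma_pred_mode not equal to 4: the mode values named in the text.
    for mode in (66, 81, 82, 83):
        reachable.add(chroma_prediction_process(0, mode))
    return reachable


reachable = reachable_processes_with_mip_luma()
```

Under these stated rules, the planar process of section 8.4.5.2.11 does not appear among the reachable processes, which is the gap addressed by the present disclosure.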
However, embodiments of the present disclosure take into consideration that it is desirable to enable the possibility for a combination of MIP on luma and Planar on Chroma for the case of 4-4-4 single tree and no ACT in a way that fits to the structure of the surrounding VVC specification.
As a first observation, and from a signal-processing perspective, in the 4-4-4-single-tree case it is, on the one hand, very desirable to support, for the case that the adaptive color transform is disabled on a given block, the possibility that the luma intra prediction signal is generated by a MIP-mode and that the chroma intra prediction signals are computed by the planar mode, while, on the other hand, it is also very desirable to support the possibility that luma and chroma prediction signals are computed by the same MIP-mode. The reason for this observation is that both MIP and planar modes target rather smooth content and that, for typical content, the luma and the chroma signals of a block reveal some correlation. However, depending on the content, this correlation might either reach the degree that one can assume that both signals are rather smooth, in which case it might be highly beneficial to allow MIP in the luma component but planar in the chroma component, or it might be so strong that it is preferable that luma and chroma use exactly the same intra prediction mode. Consequently, following the general principle of hybrid video coding which tries to support efficient coding options for all statistically significant cases, it is very desirable to support both combinations of luma and chroma prediction generation in the 4-4-4-single-tree case. However, it is noted that, by the very design of the adaptive color transform, the first combination, the usage of a MIP mode in the luma, but of the planar mode in the chroma components, is not suitable in the case that the adaptive color transform is enabled.
As a second observation, it is noted that the VVC specification has already been finalized for some time and that multiple companies worldwide are already working on implementing it on devices, in particular for the case of a 4-2-0 color format. During the VVC-standardization process, the design of the standard for the 4-2-0 format was fixed earlier in the process while more specialized applications like the 4-4-4 case were treated later. Thus, it is highly desirable for any VVC modification that, if possible, for the case of the 4-2-0 color format the specification should remain unchanged. This holds particularly true for the intra-mode-derivation process of VVC, which is already quite sophisticated and which, for the 4-2-0 case, was designed carefully by the JVET community. In particular, for the non-4-4-4-single-tree case, an important aspect of the VVC-intra-mode derivation is that, for the case that on a given block the luma intra prediction signal is computed using a MIP-mode, the direct mode (intra_chroma_pred_mode equal to 4 in section 8.4.3) always implies that the chroma intra prediction signals are computed using a planar mode. This is due to the fact that for non-4-4-4-single-tree cases, there is no appropriate way to interpret a MIP-mode on the luma block also as a MIP-mode on the chroma block, in particular since MIP-modes are defined depending on the shape (or size), where, for the non-4-4-4-single-tree case, the shape of the luma and of the chroma components of a given block might be different. Here, the planar mode is a reasonable candidate to be used in chroma when MIP is used in luma since both MIP modes and the planar mode tend to be more efficient for rather smooth content.
However, if it is the case that for chroma, the planar intra prediction signal can be generated by using the direct mode, in order to avoid redundancy, it is desirable that the planar mode can be signaled via the direct-mode option and not by any other option. Hence, in Table 20 of section 8.4.3 of the specification, the planar mode should occur once in each column. For this reason, in Table 20 of the current specification, the planar mode has to be removed from the row corresponding to intra_chroma_pred_mode=0 whenever its usage as a chroma intra prediction mode can be signaled via the direct mode, i.e. via setting intra_chroma_pred_mode=4.
Despite the highly sophisticated process of mode-derivation in the VVC-specification, which is partially outlined above, the embodiments of the present disclosure describe a modification of the VVC specification that alters the specific MIP-Planar combinations between luma and chroma for the case of a 4-4-4-single-tree-non-ACT block, but keeps the functionality of the specification unmodified in all other cases. That is, various embodiments of the present application extend the capabilities of the VVC standard towards enabling the combination of MIP on the luma and planar on the chroma component of a block in the 4-4-4-single-tree-non-ACT case while, at the same time, suggesting only very limited modifications of the VVC specification that, in particular, do not alter the design for any case other than the 4-4-4-single-tree-non-ACT case.
It is suggested to introduce a specific new value for the luma intra prediction mode. This value is chosen such that all the aforementioned aspects can be kept, i.e. the specification, in particular the intra-mode derivation, is kept unchanged for all cases that are not 4-4-4-no-ACT. Thus, on the one hand, in the case that one does not have single-tree-4-4-4, in the present embodiment, Table 20 is designed such that the direct mode is always the planar mode in the case that a MIP mode is used to compute the luma intra prediction signal and consequently, for this case, the planar mode does not occur in the row corresponding to intra_chroma_pred_mode=0 (since it is already present in the row corresponding to intra_chroma_pred_mode=4). On the other hand, for the case of 4-4-4-single-tree, by the method suggested in the present disclosure, Table 20 is designed such that the planar mode is assigned to the row corresponding to intra_chroma_pred_mode=0 in the case that a MIP mode is used to compute the luma intra prediction signal.
As a concrete example, all above-mentioned goals are achieved by the following modification of section 8.4.3 of the VVC specification, highlighted by underlining.
Input to this process are:
In this process, the chroma intra prediction mode IntraPredModeC[xCb][yCb] and the MIP chroma direct mode flag MipChromaDirectFlag[xCb][yCb] are derived.
The variable isSingleTreeAnd444 is derived as follows:
The MIP chroma direct mode flag MipChromaDirectFlag[xCb][yCb] is derived as follows:
The chroma intra prediction mode IntraPredModeC[xCb][yCb] is derived as follows:
IntraPredModeY[xCb+cbWidth/2][yCb+cbHeight/2].
In the following, it is described how the proposed modification solves the identified problem and meets all described design criteria.
First, the presented solution does not alter anything for the cases which are not 4-4-4-single-tree case. Namely, in these cases, lumaIntraPredMode cannot be set to −1 since the conditions that sps_chroma_format_idc is equal to 3 and that treeType is equal to SINGLE_TREE cannot both be true. Thus, the solution does not modify the outcome of Table 20 and of the whole section 8.4.3 in this case.
Second, the modified specification now supports that, in the 4-4-4-single-tree-no-ACT case, for a given block, the luma intra prediction signal is generated using a MIP mode while the chroma intra prediction signals are generated using the planar mode. Namely, in the case 4-4-4-single-tree-no-ACT, if in the above modified version of section 8.4.3, the luma intra prediction signal is generated using a MIP-mode, i.e. if IntraMipFlag[xCb+cbWidth/2][yCb+cbHeight/2] is equal to 1, then, by the above modification, lumaIntraPredMode is set equal to −1. Consequently, according to the above modified Table 20, if intra_chroma_pred_mode is equal to 0, then the chroma intra mode IntraPredModeC[xCb][yCb] is set equal to 0, which corresponds to the planar mode. Moreover, since in 8.4.3, intra_chroma_pred_mode is equal to 0, the MIP direct mode flag, denoted MipChromaDirectFlag[xCb][yCb] in that section, is set equal to 0. Consequently, according to section 8.4.5.2.1, for chroma (cIdx not equal to zero), the chroma intra prediction signal is computed by the process specified in section 8.4.5.2.6, where predModeIntra equals IntraPredModeC[xCb][yCb]=0=INTRA_PLANAR. Moreover, according to section 8.4.5.2.6 this implies that the chroma intra prediction signal is computed by the process specified in section 8.4.5.2.11, i.e. the chroma intra prediction signal is computed using the planar mode.
Third, the modified specification also supports that, in the 4-4-4-single-tree case, for a given block, the luma intra prediction signal is generated using a MIP mode and the chroma intra prediction signals are computed using the same MIP mode. Namely, if in the 4-4-4-single-tree case, in the above modified version of section 8.4.3, the luma intra prediction signal is generated using a MIP-mode, i.e. if IntraMipFlag[xCb+cbWidth/2][yCb+cbHeight/2] is equal to 1, and if intra_chroma_pred_mode is equal to 4 or if the adaptive color transform is enabled, then the MIP direct mode flag, denoted MipChromaDirectFlag[xCb][yCb] in that section, is set equal to 1. Consequently, in section 8.4.3, IntraPredModeC[xCb][yCb] is set equal to IntraPredModeY[xCb][yCb], which is the MIP mode used for luma. Moreover, in section 8.4.5.2.1, the intra prediction signals for chroma (cIdx not equal to zero) are computed by the MIP-prediction specified in 8.4.5.2.2, since MipChromaDirectFlag[xTbCmp][yTbCmp] is equal to 1. Here, the MIP-mode IntraPredModeC[xCb][yCb] is used, which equals the MIP mode IntraPredModeY[xCb][yCb] used for luma.
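The three properties discussed above may be retraced by the following minimal sketch of the modified derivation. It is an illustration under stated assumptions and not the specification text: the function name and argument names are hypothetical, and of the modified Table 20 only those entries are modeled that are named in the present description (the entry for lumaIntraPredMode equal to −1 and intra_chroma_pred_mode equal to 0, and the planar column entries for intra_chroma_pred_mode equal to 0 and 4).

```python
INTRA_PLANAR = 0
SINGLE_TREE = "SINGLE_TREE"

# Partial model of the modified Table 20, keyed by
# (intra_chroma_pred_mode, lumaIntraPredMode). Only entries named in the
# description are included; all other entries are outside this sketch.
PARTIAL_TABLE_20 = {
    (0, -1): INTRA_PLANAR,            # new: planar via row 0 for 4-4-4 single tree
    (0, INTRA_PLANAR): 66,            # unchanged: planar is not in row 0 here
    (4, INTRA_PLANAR): INTRA_PLANAR,  # unchanged: direct mode yields planar
}


def derive_chroma_mode(chroma_format_idc, tree_type, act_enabled,
                       intra_mip_flag, intra_pred_mode_y,
                       intra_chroma_pred_mode):
    """Return (MipChromaDirectFlag, IntraPredModeC) for one block."""
    is_single_tree_and_444 = (chroma_format_idc == 3
                              and tree_type == SINGLE_TREE)
    mip_chroma_direct_flag = int(
        is_single_tree_and_444
        and intra_mip_flag == 1
        and (intra_chroma_pred_mode == 4 or bool(act_enabled)))
    if mip_chroma_direct_flag:
        # Chroma adopts the MIP mode used for luma.
        return 1, intra_pred_mode_y
    if intra_mip_flag == 1:
        luma_mode = -1 if is_single_tree_and_444 else INTRA_PLANAR
    else:
        luma_mode = intra_pred_mode_y
    key = (intra_chroma_pred_mode, luma_mode)
    if key not in PARTIAL_TABLE_20:
        raise NotImplementedError("table entry not modeled in this sketch")
    return 0, PARTIAL_TABLE_20[key]
```

For a 4-4-4 single-tree block without ACT whose luma uses MIP mode 5, `derive_chroma_mode(3, SINGLE_TREE, False, 1, 5, 0)` yields planar for chroma, whereas `derive_chroma_mode(3, SINGLE_TREE, False, 1, 5, 4)` yields the same MIP mode as luma. For a block that is not 4-4-4 single tree, the value −1 is never assigned, so the outcome corresponds to the unmodified derivation.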
Now that a certain embodiment of the present application has been outlined and presented as a modification of the current VVC specification along with a presentation of the associated advantages thereof, the description of the present application proceeds with broadened embodiments which are not necessarily restricted to an implementation according to VVC. Again, for the subsequently explained embodiments, it is naturally possible to extend same by individual or combined details of the description brought forward above, i.e. those brought forward with respect to
Reference is made to
The block-based decoder 54 of
The decoder 54 of
Other than partitioning scheme type 310a, the partitioning scheme type 310b decouples the partitioning of picture 10 into first color component blocks on the one hand and the partitioning of picture 10 into second color component blocks on the other hand. To be more precise, according to partitioning scheme type 310b, the decoder decodes from data stream 12 first color component partitioning information for defining the partitioning of picture 10 into first color component blocks, and second color component partitioning information for partitioning picture 10 into second color component blocks. The partitioning of the second color component into the second color component blocks may be defined by the second color component partitioning information uniquely or in a manner depending on the first color component partitioning information. In any case, as can be seen, the partitioning scheme type 310b causes that there might be, and in
The aforementioned first and second color component blocks may form the block level at which decoder and encoder perform the selection among intra coding and inter coding, i.e. intra-prediction and inter-prediction. In case of the partitioning scheme type 310a, for instance, it might be that the selection is done commonly for co-located first and second color component blocks, i.e., so that each second color component block is intra-predicted if the co-located first color component block is intra-predicted as well, and is inter-predicted if its co-located first color component block is inter-predicted as well.
As decoder 54 supports different color sampling formats and different partitioning schemes, the block-based decoder 54 derives 404 the color sampling format to be used for picture 10 and the partitioning scheme to be used for picture 10 from one or more first syntax elements 402 in data stream 12.
The description of the functionality of decoder 54 concentrates, from now on, on the mode of operation with respect to intra-predicted blocks. In principle, however, and as became clear from the indications above pointing to
Decoder 54 decodes the first color component 101 of picture 10 in the block-wise manner and, in doing so, selects 406, for each of the intra-predicted first color component blocks of picture 10, one out of several intra-prediction modes.
According to the embodiment of
The directional modes 5041 . . . n mutually differ in the definition of an angle or direction along which the picture content defined in the already decoded neighborhood 410 is copied into, or extrapolated into, the block inner. To this end, some interpolation may be used in order to define sub-sample positions between the reference samples, and for each sample in the block inner, that sample or sub-sample position in the neighborhood is chosen which hits the respective sample position in the block inner when using the angle or direction of the corresponding directional mode for aiming. The sample value of the chosen sample or sub-sample position is then used or copied as predicted sample value. In other words, the neighborhood's content is copied, along the mode's direction, into the block inner.
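The copying of the neighborhood's content along a direction, including the interpolation at sub-sample positions, may be illustrated by the following simplified sketch. It assumes, for illustration only, that only the reference row directly above the block is used, that the direction is given as a non-negative horizontal displacement per row, and that floating-point linear interpolation is applied; actual codecs typically use integer arithmetic and further reference samples.

```python
import math


def directional_prediction(top_ref, width, height, dx_per_row):
    """Predict a block by projecting each sample onto the row above the block.

    top_ref[k] is the reconstructed sample at column k of the row directly
    above the block. dx_per_row >= 0 is the horizontal displacement, in
    samples, accumulated per row of distance from that reference row.
    """
    last = len(top_ref) - 1
    prediction = []
    for y in range(height):
        offset = (y + 1) * dx_per_row
        row = []
        for x in range(width):
            pos = x + offset                  # (sub-)sample position hit
            i = math.floor(pos)
            frac = pos - i
            left = top_ref[min(i, last)]      # clamp at the end of the row
            right = top_ref[min(i + 1, last)]
            row.append((1 - frac) * left + frac * right)
        prediction.append(row)
    return prediction


top_ref = [0, 10, 20, 30, 40, 50, 60, 70]
vertical = directional_prediction(top_ref, 4, 2, 0.0)
diagonal = directional_prediction(top_ref, 4, 2, 0.5)
```

With a displacement of zero the reference row is simply repeated downwards, whereas a displacement of half a sample per row makes the first row fall onto sub-sample positions between two reference samples.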
In case of matrix-based intra-prediction modes 5101 . . . m, the block inner 408 is predicted by deriving a sample value vector 514 out of the reference samples of the already decoded neighborhood 410, neighboring the block inner 408, computing a matrix-vector product 512 between the sample value vector 514 and a prediction matrix 516 which is associated with the respective matrix-based intra-prediction mode so as to obtain a prediction vector 518, and predicting samples 519 in the block inner 408 on the basis of this prediction vector 518.
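The three steps of a matrix-based intra-prediction may be sketched as follows. The reduction of the reference samples by averaging, the toy matrix and all function names are assumptions made for illustration only; in particular, the matrix below is not a trained prediction matrix, and offsets, shifts and upsampling as they may occur in an actual codec are omitted.

```python
def reduce_boundary(samples, out_len):
    """Average groups of neighboring reference samples down to out_len values."""
    group = len(samples) // out_len
    return [sum(samples[k * group:(k + 1) * group]) // group
            for k in range(out_len)]


def matrix_based_prediction(top, left, matrix, width, height, reduced_len=2):
    """Predict a width x height block from the samples above and to the left."""
    # Step 1: derive the sample value vector from the neighborhood.
    vector = reduce_boundary(top, reduced_len) + reduce_boundary(left, reduced_len)
    # Step 2: matrix-vector product yields the prediction vector.
    pred_vector = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
    # Step 3: arrange the prediction vector as samples of the block inner.
    return [pred_vector[y * width:(y + 1) * width] for y in range(height)]


# Toy 16x4 matrix for a 4x4 block: row r picks vector entry r % 4.
toy_matrix = [[1 if c == r % 4 else 0 for c in range(4)] for r in range(16)]
block = matrix_based_prediction([10, 10, 20, 20], [30, 30, 40, 40],
                                toy_matrix, 4, 4)
```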
Note that each of the other modes 5041 . . . n to 508 could, theoretically, be implemented using a matrix-vector product as well. However, the prediction matrices 516 associated with the matrix-based intra-prediction modes are the result of machine learning and, while they might be integer valued, they tend to be irregular, unlike matrices that would emulate any of modes 5041 . . . n to 508. In other words, the matrix-based intra-prediction modes, unlike the other intra-prediction modes 504 to 508, have no real semantic meaning with respect to the way the block inner 408 is predicted based on the neighborhood 410. The latter statement is also interesting when inspecting different block sizes. As already noted above, all partitioning schemes lead to picture 10 being partitioned into blocks of different sizes with respect to both color components with this leading, in turn, also to intra-predicted blocks of different sizes again with respect to the first and second color component, respectively. In order to account for intra-predicted blocks of different sizes, decoder 54 selects the prediction matrices associated with the matrix-based intra-prediction modes depending on a size of the block inner 408 of the block to be intra predicted. In other words, the block-based decoder 54 manages, for each possible block size or for each of several block size ranges, a different set of prediction matrices associated with the matrix-based intra-prediction modes. Since these prediction matrices are, however, the result of machine learning, the prediction matrices associated with the matrix-based intra-prediction modes for one block size are not semantically comparable with the different set of prediction matrices associated with the matrix-based intra-prediction modes for a different block size.
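The size-dependent management of prediction matrices may be sketched as follows. The three size classes and their thresholds are modeled on the MIP size classes of VVC and are stated here as an assumption rather than quoted from the specification; the matrix sets are placeholders given by name only.

```python
def mip_size_class(width, height):
    """Map a block size to one of three size classes (assumed thresholds)."""
    if width == 4 and height == 4:
        return 0
    if width == 4 or height == 4 or (width == 8 and height == 8):
        return 1
    return 2


# Placeholder matrix sets, one per size class. The entries stand in for
# trained prediction matrices and are hypothetical names only.
MATRIX_SETS = {
    0: ["A0", "A1", "A2", "A3"],
    1: ["B0", "B1", "B2", "B3"],
    2: ["C0", "C1", "C2", "C3"],
}


def select_prediction_matrix(width, height, mode_index):
    """Select the matrix for a mode index from the set of the block's size class."""
    return MATRIX_SETS[mip_size_class(width, height)][mode_index]
```

The same mode index thus refers to unrelated matrices for blocks of different size classes, for instance `select_prediction_matrix(4, 4, 3)` and `select_prediction_matrix(16, 16, 3)`, which is why a matrix-based mode of one block cannot simply be reinterpreted for a block of a different size.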
This observation leads to the main topic of the block-based decoder 54, namely the question of how to effectively determine the intra-prediction mode to be used for a predetermined second color component block 182i for which block 181i forms a co-located intra-predicted first color component block. Again, the block-based decoder 54 selects one of the available intra-prediction modes for the intra-predicted first color component block 181i, such as by use of one or more syntax elements 400 in data stream 12. For instance, the selection may be guided by a single syntax element selecting one out of the set of available intra-prediction modes 5041 to 510m. Alternatively, it may be guided by a set of syntax elements among which, for instance, one or two indicate whether the intra-prediction mode for the intra-predicted first component block 181i is to be selected out of a list of most probable prediction modes, constructed on the basis of intra-prediction modes used for spatially and/or temporally neighboring intra-predicted first component blocks, and, if so, which entry of this list is to be used. If not, a further syntax element specifies the intra-prediction mode to be used for block 181i out of the remaining available intra-prediction modes, i.e. those excluded from the list of most probable prediction modes. Other possibilities exist as well, and the explicit examples shall not be treated as exclusive.
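The second signaling alternative, based on a list of most probable modes, may be sketched as follows. The argument names standing in for the syntax elements, and the mode labels, are hypothetical. How the list of most probable modes is constructed from neighboring blocks is not shown and is assumed to be given.

```python
# Illustrative sketch only; syntax element names and mode labels are hypothetical.

def decode_first_component_mode(mpm_flag, mpm_index, remainder_index,
                                mpm_list, all_modes):
    """Resolve the mode of the first component block from parsed syntax elements."""
    if mpm_flag:
        # The mode is one of the most probable modes; mpm_index says which one.
        return mpm_list[mpm_index]
    # Otherwise the further syntax element indexes the remaining modes,
    # i.e. all available modes excluding those in the most probable list.
    remaining = [mode for mode in all_modes if mode not in mpm_list]
    return remaining[remainder_index]


all_modes = ["PLANAR", "DC", "DIR_2", "DIR_18", "DIR_50", "MIP_0"]
mpm_list = ["PLANAR", "DIR_50"]

from_list = decode_first_component_mode(True, 1, None, mpm_list, all_modes)
from_rest = decode_first_component_mode(False, None, 2, mpm_list, all_modes)
```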
In decoding the second color component 102 of picture 10 in a block-wise manner, decoder 54 decodes, for the second color component block 182i, a second color-component intra-prediction mode syntax element 403 from data stream 12. The syntax element 403 has a comparatively reduced set of available states and may index one out of a set 412 of explicit intra-prediction modes 4121 . . . p, and a special direct mode or inter-component mode induction mode 416. Note that, alternatively, the decoder may determine the intra-prediction mode index from the data stream implicitly, i.e. not through explicit signaling using intra-prediction mode syntax element 403, but by an inference based on other information contained in the data stream, such as by inspecting whether the data stream indicates that ACT (adaptive color transform) applies for the block 182i or not.
That is, the decoder is capable of applying each of modes 5041 to 510m to both first-component intra-prediction blocks and second-component intra-prediction blocks. In each case, the block inner 408 relates to the block to be predicted, and the neighborhood 410 relates to first component samples if the block inner is that of a first-component intra-prediction block, and to second component samples if the block inner is that of a second-component intra-prediction block. That is, the intra-prediction modes operate, with respect to the sample prediction, intra-color-component wise. However, merely a reduced set out of these modes is available/signalable for a certain block 182i, as described below.
Both the set 412 and the direct mode 416 are determined, or populated, in a manner depending on the intra-prediction mode that has been selected for the co-located first component intra-prediction block 181i. Together, the intra-prediction modes 4121 . . . p of set 412 and mode 416 form a reduced set or list out of which a selection is made by way of an index, such as the index represented by syntax element 403. The modes of the list 412+416 are, as said, recruited, i.e. selected, out of the larger set of all available modes 5041 to 510m according to certain rules outlined in more detail below. These rules follow the considerations set out above. This results in a very effective signaling of the intra-prediction mode to be used for block 182i for two reasons. Firstly, the intra-prediction mode syntax element 403 has a reduced set of available states among which to make a selection. Secondly, these states or modes 412+416 are filled or defined by decoder and encoder in such a manner that the reduced set most likely comprises the mode which, out of all modes, fits best or results in the best coding compression for block 182i, even when considering the other modes which had been available for block 181i but are not included among the available states or modes 412+416 for block 182i.
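As a hedged illustration of how an index such as the one represented by syntax element 403 might be resolved against the reduced list 412+416, consider the following sketch. The position of the direct mode at the end of the list, the function name, and the mode labels are assumptions made for this example; the disclosure does not prescribe an ordering here.

```python
# Illustrative sketch only; list ordering and names are hypothetical.

def resolve_second_component_mode(index, explicit_modes, direct_mode):
    """Map a decoded index onto the reduced list of explicit modes plus direct mode."""
    signalable = list(explicit_modes) + [direct_mode]  # assumed ordering
    if not 0 <= index < len(signalable):
        raise ValueError("index outside the reduced mode list")
    return signalable[index]


explicit_modes = ["PLANAR", "DC", "DIR_50", "DIR_18"]
direct_mode = "MIP_3"  # e.g. inherited from the co-located first component block

first_entry = resolve_second_component_mode(0, explicit_modes, direct_mode)
last_entry = resolve_second_component_mode(4, explicit_modes, direct_mode)
```

With four explicit modes and one direct mode, the index only needs to distinguish five states rather than the full set of available modes.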
In particular, if the intra-prediction mode index indexes one of the set 412 of explicit intra-prediction modes, the decoder forms the set of explicit intra-prediction modes in a manner distinguishing between several cases, namely cases where the color sampling format defines the common sampling format type 300a and the partitioning scheme defines the common block-partitioning according to 310a, and cases where this is not so. The reason for treating the former as a specific situation is that, in this situation, the intra-predicted second component block 182i and the co-located intra-predicted first component block 181i are of equal size and shape and, accordingly, if a matrix-based intra-prediction mode has been selected for block 181i, this mode is also meaningful for block 182i. If, however, this combination of color sampling format and partitioning scheme does not apply, it is not clear whether a matrix-based prediction mode selected for block 181i, i.e. one having a certain index, would also be suitable for block 182i, as the blocks might be, or definitely are, of different size/shape and, accordingly, the correspondence between the matrix-based intra-prediction modes of the two blocks might not be given. Accordingly, the decoder uses a matrix-based prediction mode selected for block 181i as the direct mode 416 only if the common sampling format type 300a and the common block-partitioning 310a apply, and uses the planar mode 506 as a substitute for direct mode 416 otherwise. This, in turn, means that, when a matrix-based mode is selected for block 181i, the planar mode is not available by way of direct mode 416 if the common sampling format type 300a and the common block-partitioning 310a apply. In that case, the set 412 of explicit modes is therefore formed so that the planar mode is included in this set.
The exact way of forming set 412 and defining direct mode 416 is defined in the claims. The encoder performs the same predictions, but selects the intra-prediction modes for blocks 181i and 182i, wherever the selection is free, for example in a rate-distortion optimization sense.
In particular, for defining the direct mode 416, the following applies:
If the color sampling format defines the common sampling format type 300a and the partitioning scheme defines the common block-partitioning 310a, and if one of the matrix-based intra-prediction modes 5101 to 510m is selected for the co-located intra-predicted first-color-component block 18a1, the one of the matrix-based intra-prediction modes 5101 to 510m which is selected for the co-located intra-predicted first-color-component block 18a1 is used as direct mode, as there is no block size/shape gap between first and second component blocks 181i and 182i and, thus, the same mode may reasonably be used.
If this situation does not apply (i.e. if the color sampling format does not define the common sampling format type 300a or the partitioning scheme does not define the common block-partitioning 310a) and if any of the matrix-based intra-prediction modes 5101 to 510m is selected for the co-located intra-predicted first-color-component block 18a1, then the planar mode is used as direct mode 416.
If any of the planar, DC or directional modes is selected for the co-located intra-predicted first-color-component block 18a1, then the intra-prediction mode selected for the co-located intra-predicted first-color-component block 18a1 is used as direct mode 416, irrespective of whether the special situation of 300a and 310a applies or not.
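The three rules above for defining the direct mode 416 may be summarized in a short, non-limiting sketch. The string labels for the modes, the "MIP_" prefix used to mark matrix-based modes, and the two boolean arguments standing for the conditions 300a and 310a are hypothetical conventions chosen for this example.

```python
# Illustrative sketch only; mode labels and argument names are hypothetical.

def is_matrix_based(mode):
    """Assumed labeling convention: matrix-based modes are named 'MIP_<index>'."""
    return mode.startswith("MIP_")


def derive_direct_mode(first_component_mode, common_format, common_partitioning):
    """Derive the direct mode for the second component block.

    common_format:       True if the common sampling format type applies.
    common_partitioning: True if the common block-partitioning applies.
    """
    if is_matrix_based(first_component_mode):
        if common_format and common_partitioning:
            # Blocks have equal size and shape, so the matrix-based mode carries over.
            return first_component_mode
        # Size or shape may differ, so the planar mode is used as a substitute.
        return "PLANAR"
    # Planar, DC and directional modes are inherited in every case.
    return first_component_mode


inherited_matrix_mode = derive_direct_mode("MIP_3", True, True)
substituted_mode = derive_direct_mode("MIP_3", True, False)
inherited_directional = derive_direct_mode("DIR_50", False, False)
```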
Set 412 is formed so that the planar mode is available for block 182i. For example, if the color sampling format defines the common sampling format type 300a and the partitioning scheme defines the common block-partitioning according to 310a, and if any of the matrix-based intra-prediction modes 5101 to 510m is selected for a co-located intra-predicted first-color-component block 18a1, the planar mode is used as one of the set 412 of explicit intra-prediction modes, since in that case the direct mode 416 is represented by one of the matrix-based modes, while the remaining modes of set 412 are recruited from the directional modes and the DC mode.
As another example, if the DC mode or any of the directional modes is selected for the co-located intra-predicted first-color-component block 18a1, the planar mode is used as one of the set 412 of explicit intra-prediction modes, while the remaining modes of set 412 are recruited from the directional modes and the DC mode.
For yet another example, if the planar mode is selected for the co-located intra-predicted first-color-component block 18a1, the set of explicit intra-prediction modes is recruited from directional modes and the DC mode and excludes the planar mode.
As yet another example, if the color sampling format does not define the common sampling format type 300a or the partitioning scheme does not define that the more than one color component are coded into the data stream using the common block-partitioning according to 310a, and if one of the matrix-based intra-prediction modes 5101 to 510m is selected for a co-located intra-predicted first-color-component block 18a1, the set of explicit intra-prediction modes is recruited from directional modes and excludes the planar mode, since in that case the planar mode is used as the direct mode, as a substitute for the matrix-based mode of block 181i, due to the difference in size/shape.
In recruiting set 412, an order of preference may be applied. The planar mode, if it is to be included in set 412, may be most preferred, followed by the DC mode, unless the DC mode is used as direct mode 416, followed by a predetermined sequence of decreasingly preferred directional modes, from which any directional mode that happens to be used as the direct mode is left out.
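The order of preference just described may be sketched, without limitation, as follows. The set size of four, the particular sequence of directional modes, and the mode labels are assumptions made for illustration; the disclosure leaves them open. The sketch simply walks the preference order and skips whichever mode is already used as the direct mode.

```python
# Illustrative sketch only; set size, directional order and labels are hypothetical.

def form_explicit_set(direct_mode, size=4,
                      directional_order=("DIR_50", "DIR_18", "DIR_66", "DIR_2")):
    """Recruit the explicit modes in order of preference, skipping the direct mode."""
    # Planar first, then DC, then the predetermined sequence of directional modes.
    candidates = ["PLANAR", "DC", *directional_order]
    return [mode for mode in candidates if mode != direct_mode][:size]


set_for_matrix_direct = form_explicit_set("MIP_3")
set_for_planar_direct = form_explicit_set("PLANAR")
set_for_dc_direct = form_explicit_set("DC")
set_for_directional_direct = form_explicit_set("DIR_50")
```

When the direct mode is a matrix-based mode, the planar mode heads the explicit set; when the direct mode is itself the planar mode, whether inherited or used as a substitute, the planar mode is absent from the explicit set and the next preferred modes fill it.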
Although the figures illustrate different examples of user equipment, various changes may be made to the figures. For example, the user equipment can include any number of each component in any suitable arrangement. In general, the figures do not limit the scope of this disclosure to any particular configuration(s). Moreover, while figures illustrate operational environments in which various user equipment features disclosed in this patent document can be used, these features can be used in any other suitable system. None of the description in this application should be read as implying that any particular element, step, or function is an essential element that must be included in the claims scope.
Although the present disclosure has been described with an exemplary embodiment, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications as fall within the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
23184567.8 | Jul 2023 | EP | regional |