The present disclosure generally relates to scalable video coding schemes, in particular encoding and decoding schemes (e.g., codecs) having signal element modification functionality.
An encoded video signal having scalability can be decoded (by a decoder) at different levels of quality (e.g., at different spatial dimensions). This is advantageous because it means that a single encoded video signal can be sent to many types of decoding devices (each having different operating capabilities), and each device can decode the encoded video signal in line with the operating capability of the decoder. For example, a first decoding device may only be able to decode and render an encoded video signal using a base codec whereas a second decoding device may be able to decode and render an encoded video signal using a base (or “base layer”) codec and an enhancement (or “enhancement layer”) codec. If an encoded video signal does not have scalability, then a first encoded video signal needing only the base codec would need to be sent to the first decoding device and a second encoded video signal needing both the base layer codec and the enhancement layer codec would be sent to the second decoding device. Therefore, it is desirable to have an (single) encoded video signal that can be decoded using the base codec by the first decoding device and the base and enhancement layer codecs by the second decoding device.
However, the first decoding device might, in some cases, be able to reconstruct a visually acceptable representation of an original video signal using only the base codec, albeit not one having the visual quality of a representation reconstructed using both the base and enhancement layer codecs.
This may be undesirable in terms of content protection. For example, it may be desirable that decoding using only the base codec does not produce a visually acceptable representation of the original video signal.
In addition, as explained above, it may be desirable for the second decoding device to receive the same encoded video signal as the first decoding device, but to be able to reconstruct the original video signal accurately using the same base codec as the first decoding device, together with the enhancement layer codec.
There is thus a need for scalable video coding schemes and systems, and encoded video signals, having improved content protection.
Aspects and variations of the present invention are set out in the appended claims. Certain unclaimed aspects are further set out in the detailed description below.
In the detailed description below, measures are described which provide improved content protection. Such measures make it difficult or impossible for certain decoders to accurately reconstruct an original signal. Such decoders may be able to create one or more versions of the original signal, but the version(s) have undesirable (from the perspective of a viewer) visual artefacts. Other decoders may be able to create one or more versions of the original signal without the undesirable visual artefacts, those versions thereby being better reconstructions of the original signal.
Without loss of generality, measures are described which modify a signal by introducing a watermark into the signal. The watermark is present, as a visual artefact, in a base layer rendition of the signal. However, the watermark is fully or substantially removed in one or more enhancement layer renditions of the signal. Without having the correct enhancement layer data to remove the watermark, the watermark appears as an undesirable visual artefact in the output signal.
Examples are presented herein with reference to a signal as a sequence of samples (i.e., two-dimensional images, video frames, video fields, sound frames, etc.). For simplicity, non-limiting embodiments illustrated herein often refer to signals that are displayed as 2D planes of settings (e.g., 2D images in a suitable colour space), such as for instance a video signal. In a preferred case, the signal comprises a video signal. An example video signal is described in more detail with reference to
The terms “picture”, “frame” or “field” will be used interchangeably with the term “image”, so as to indicate a sample in time of the video signal: any concepts and methods illustrated for video signals made of frames (progressive video signals) are readily applicable also to video signals made of fields (interlaced video signals), and vice versa. Despite the focus of embodiments illustrated herein on image and video signals, people skilled in the art can easily understand that the same concepts and methods are also applicable to any other types of multidimensional signal (e.g., audio signals, volumetric signals, stereoscopic video signals, 3DoF/6DoF video signals, plenoptic signals, point clouds, etc.). Although image or video coding examples are provided, the same approaches may be applied to signals with fewer than two dimensions (e.g., audio or sensor streams) or more than two dimensions (e.g., volumetric signals).
In the description, the terms “image”, “picture” or “plane” (intended with the broadest meaning of “hyperplane”, i.e., an array of elements with any number of dimensions and a given sampling grid) will often be used to identify the digital rendition of a sample of the signal along the sequence of samples, wherein each plane has a given resolution for each of its dimensions (e.g., X and Y), and comprises a set of plane elements (or “signal elements”, “elements”, or “pels”, or display elements, for two-dimensional images often called “pixels”, for volumetric images often called “voxels”, etc.) characterized by one or more “signal element values”, “values” or “settings” (e.g., by way of non-limiting example, colour settings in a suitable colour space, settings indicating density levels, settings indicating temperature levels, settings indicating audio pitch, settings indicating amplitude, settings indicating depth, settings indicating alpha channel transparency level, etc.). Each plane element is identified by a suitable set of coordinates, indicating the integer positions of said element in the sampling grid of the image. Signal dimensions can include only spatial dimensions (e.g., in the case of an image) or also a time dimension (e.g., in the case of a signal evolving over time, such as a video signal). In one case, a frame of a video signal may be seen to comprise a two-dimensional array with three colour component channels, or a three-dimensional array with two spatial dimensions (e.g., of an indicated resolution—with lengths equal to the respective height and width of the frame) and one colour component dimension (e.g., having a length of 3). In certain cases, the processing described herein is performed individually on each plane of colour component values that make up the frame. For example, planes of pixel values representing each of the Y, U, and V colour components may be processed in parallel using the methods described herein.
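By way of a non-limiting illustrative sketch (all values and the helper name are hypothetical), a frame can be held as separate colour component planes, each of which is processed independently:

```python
# Hypothetical illustration: a 2x2 video frame stored as three colour
# component planes (Y, U, V), each a 2D array of signal element values.
frame = {
    "Y": [[16, 235], [81, 145]],
    "U": [[128, 90], [54, 240]],
    "V": [[128, 34], [255, 110]],
}

def process_plane(plane, op):
    """Apply an element-wise operation to every signal element of one plane."""
    return [[op(value) for value in row] for row in plane]

# Each plane is processed individually; an identity pass is used here,
# but any of the per-plane operations described herein could be applied.
processed = {name: process_plane(plane, lambda v: v)
             for name, plane in frame.items()}
```

Each plane element is addressed by its integer coordinates within that plane's sampling grid, matching the definition above.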
Certain examples described herein use a scalability framework that uses a base encoding and an enhancement encoding. The video coding systems described herein operate upon a received decoding of a base encoding (e.g., frame-by-frame or complete base encoding) and add one or more of spatial, temporal, or other quality enhancements via an enhancement layer. The base encoding may be generated by a base layer, which may use a coding scheme that differs from the enhancement layer, and in certain cases may comprise a legacy or comparative (e.g., older) coding standard.
In the spatially scalable coding scheme, the methods and apparatuses may be based on an overall algorithm which is built over an existing encoding and/or decoding algorithm (e.g., MPEG standards such as AVC/H.264, HEVC/H.265, etc., as well as non-standard algorithms such as VP9, AV1, and others) which works as a baseline for an enhancement layer. The enhancement layer works according to a different encoding and/or decoding algorithm. The idea behind the overall algorithm is to encode/decode the video frame hierarchically, as opposed to using block-based approaches as done in the MPEG family of algorithms. Hierarchically encoding a frame includes generating residuals for the full frame, then for a reduced or decimated frame, and so on.
Various measures (such as methods, apparatuses, systems and computer programs) are described herein in which a to-be-modified signal is modified using a modification signal to produce a modified signal. The to-be-modified signal may or may not have been modified previously but is to be modified by using the modification signal. The modified signal has an element corresponding to an element of the to-be-modified signal. The elements may be corresponding in that they have the same position (e.g., coordinates) in their respective signals. The element of the modified signal has a modified value with respect to a value of the corresponding element of the to-be-modified signal. As such, the values are different from each other as a result of the modification. The modified signal, or a down-sampled modified signal derived based on down-sampling the modified signal, is sent to be encoded. This may involve sending the modified signal or the down-sampled modified signal to an encoder. A decoded modified signal is received. The decoded modified signal may be received from a decoder, which may be associated with the encoder. The decoded modified signal is a decoded version of the modified signal as encoded (i.e., by the encoder to which it is sent) or of the down-sampled modified signal as encoded (i.e., by the encoder to which it is sent). The decoded modified signal is used to generate a processed signal. Residual data is generated based at least on a value of an element of a target signal and a value of a corresponding element of the processed signal. The target signal and the processed signal may take various different forms, as will become more apparent from the below.
In some examples, such as described below with reference to
In some examples, such as described below with reference to
In some examples, such as described below with reference to
In some examples, such as described below with reference to
In some examples, such as described below with reference to
In some examples, such as described below with reference to
As explained above, in some examples, such as described below with reference to
In some examples, such as described below with reference to
In some examples, such as described below with reference to
As explained above, in some examples, such as described below with reference to
In some examples, such as described below with reference to
In some examples, the modification signal comprises an overlay to be applied to the to-be-modified signal. The overlay may be anything that covers, lies over, or is otherwise applied over the to-be-modified signal.
In some examples, the modification signal comprises a watermark. A watermark is a type of overlay. A watermark may comprise text (e.g. “WATERMARK”), an image, a pattern, or the like.
In some examples, the modified signal has another element corresponding to another element of the to-be-modified signal, and the other element of the modified signal has the same value as the value of the corresponding other element of the to-be-modified signal. Such other element may be considered to be an unmodified element. Unmodified elements are likely to have relatively small associated residuals, which may be more efficient to encode than larger residuals.
In some examples, the modifying of the to-be-modified signal comprises combining the to-be-modified signal with the modification signal. Such combining may take various different forms. Examples include, but are not limited to, addition-based combining, subtraction-based combining and substitution-based combining.
In some examples, the to-be-modified signal comprises a luminance component and chrominance components. The luminance component of the to-be-modified signal is modified based on the modification signal. The chrominance components of the to-be-modified signal are not modified based on the modification signal.
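A minimal Python sketch of such luma-only modification follows (the sample values, the 8-bit clipping, and the helper name are illustrative assumptions, not a definitive implementation):

```python
def modify_luma_only(frame, watermark):
    """Add a watermark to the Y (luminance) plane only; the U and V
    (chrominance) planes pass through unmodified. Values are treated as
    8-bit samples and clipped at 255 for this illustration."""
    modified_y = [
        [min(255, y + w) for y, w in zip(y_row, w_row)]
        for y_row, w_row in zip(frame["Y"], watermark)
    ]
    return {"Y": modified_y, "U": frame["U"], "V": frame["V"]}

frame = {
    "Y": [[10, 20], [30, 40]],
    "U": [[128, 128], [128, 128]],
    "V": [[128, 128], [128, 128]],
}
watermark = [[0, 0], [0, 100]]  # non-zero only where the overlay appears
out = modify_luma_only(frame, watermark)
# Y becomes [[10, 20], [30, 140]]; U and V are unchanged.
```

Restricting the modification to the luminance component keeps the chrominance residuals small while still producing a clearly visible artefact.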
In some examples, the using of the decoded modified signal to generate the processed signal comprises performing an up-sampling operation.
Various measures (such as methods, apparatuses, systems and computer programs) are described herein in which a decoded modified signal is processed to obtain a processed signal, and the processed signal and residual data are combined to generate a reconstructed signal. The decoded modified signal is a decoded version of a modified signal as encoded, or of a down-sampled modified signal as encoded, the down-sampled modified signal having been derived by down-sampling the modified signal. The modified signal has been produced based on a to-be-modified signal having been modified using a modification signal. The modified signal has an element corresponding to an element of the to-be-modified signal. The element of the modified signal has a modified value with respect to a value of the corresponding element of the to-be-modified signal.
The encoder 100 comprises a first input 10 for receiving a first input signal I1.
The encoder 100 comprises a second input 20 for receiving a second input signal I2.
The encoder 100 comprises a modifier 30. The modifier 30 modifies a to-be-modified signal to generate a modified signal. The to-be-modified signal may or may not already have been modified in some way prior to being input to the modifier 30. In this example, the to-be-modified signal comprises the first input signal I1.
In examples, the modifier 30 modifies the to-be-modified signal using (or “based on”) another signal. The other signal may be referred to herein as a “modification signal”, a “modifying signal” or the like. In this example, the modification signal comprises the second input signal I2. The modifier 30 outputs a modified signal IM.
A modifier may be a combining modifier, which modifies the to-be-modified signal by combining the to-be-modified signal with the modification signal. Such combining may take various different forms, examples of which are described below.
A modifier may be an additive modifier, which modifies the to-be-modified signal by adding the modification signal to the to-be-modified signal. Such addition may involve the modifier adding a signal element value of a signal element of the modification signal to a signal element value of a corresponding signal element of the to-be-modified signal. The result of such addition may be a signal element value of a corresponding signal element of the modified signal.
A modifier may be a subtractive modifier, which modifies the to-be-modified signal by subtracting the modification signal from the to-be-modified signal (or vice versa). Such subtraction may involve the modifier subtracting a signal element value of a signal element of the modification signal from a signal element value of a corresponding signal element of the to-be-modified signal. The result of such subtraction may be a signal element value of a corresponding signal element of the modified signal.
A modifier may be a substitutive (or “substitution”, “replacing”, “replacement”, “overwriting” or the like) modifier, which modifies the to-be-modified signal by substituting one or more values of the to-be-modified signal with one or more corresponding values from the modification signal. Such substitution may involve the modifier replacing a signal element value of a signal element of the to-be-modified signal with a signal element value of a corresponding signal element of the modification signal. The result of such substitution may be a signal element value of a corresponding signal element of the modified signal.
A modifier may be a shifting (or “shift”) modifier, which modifies the to-be-modified signal by shifting the position of one or more values of the to-be-modified signal within the to-be-modified signal. For example, the modification signal may indicate that all values of the to-be-modified signal are to be shifted one signal element to the right. In such an example, the value “one” from the modification signal is not added to or subtracted from, and does not replace, any signal element values of the to-be-modified signal, but nevertheless influences how the to-be-modified signal is modified.
Other types of modifier may be used.
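The modifier types above may be sketched element-wise as follows (an illustrative Python sketch; the function names, one-dimensional signals, and the wrap-around behaviour of the shift are assumptions made for brevity):

```python
def additive(signal, mod):
    """Additive modifier: add each modification value to the corresponding
    signal element value."""
    return [s + m for s, m in zip(signal, mod)]

def subtractive(signal, mod):
    """Subtractive modifier: subtract each modification value from the
    corresponding signal element value."""
    return [s - m for s, m in zip(signal, mod)]

def substitutive(signal, mod, mask):
    """Substitutive modifier: replace the signal element value with the
    modification value wherever the mask is True."""
    return [m if use else s for s, m, use in zip(signal, mod, mask)]

def shifting(signal, shift):
    """Shifting modifier: move all values `shift` elements to the right
    (wrapping around here, purely for illustration)."""
    return signal[-shift:] + signal[:-shift]

sig = [1, 2, 3, 4]
mod = [10, 0, 0, 10]
```

For example, `additive(sig, mod)` gives `[11, 2, 3, 14]` and `shifting(sig, 1)` gives `[4, 1, 2, 3]`; in the shifting case the value ‘one’ influences how the signal is modified without being added to, subtracted from, or replacing any element value.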
As such, in examples, the modified signal has an element (e.g., a signal element) corresponding to an element (e.g., a signal element) of the to-be-modified signal. The element (e.g., a signal element) of the modified signal has a modified value with respect to a value of the corresponding element (e.g., a signal element) of the to-be-modified signal. As such, the modified value is different from the value of the corresponding element of the to-be-modified signal. The modified value may have been modified as a result of addition, subtraction, substitution, shifting or any other modifying operation.
The modified input signal IM is provided to a down-sampler 105D. The down-sampler 105D outputs, at the base level, a down-sampled signal B derived based on down-sampling the modified input signal IM. The down-sampled signal B may be considered to be a down-sampled modified signal, since it is a down-sampled version of the modified input signal IM. The down-sampler 105D outputs (or “sends”) the down-sampled signal B to an encoder, which in this example is a base encoder 120E at the base level of the encoder 100. The down-sampler 105D also outputs to a residual generator 110-S. An encoded base stream BE is created directly by the base encoder 120E, and may be quantised and entropy encoded as necessary according to the base encoding scheme. The encoded base stream BE may be referred to as the base layer or base level. The encoded base stream BE may be considered to be an encoded modified signal, since it is an encoded version of the down-sampled signal B, which itself is a down-sampled version of the modified input signal IM.
To generate an encoded sub-layer 1 enhancement stream, the encoded base stream BE is received from the encoder, which in this example is the base encoder 120E. The encoded base stream BE is decoded via a decoding operation that is applied at a base decoder 120D. The decoding generates a decoded base stream BD. In preferred examples, the base decoder 120D may be a decoding component that complements an encoding component in the form of the base encoder 120E within a base codec. In other examples, the base decoding block 120D may instead be part of the enhancement level.
The decoded base stream BD is then processed to generate a processed signal. In this example, the processed signal is an up-sampled signal U, which will be described in more detail below.
Via the residual generator 110-S, a difference between the decoded base stream BD output from the base decoder 120D and the down-sampled input video B is created (i.e., a subtraction operation 110-S is applied to a frame of the down-sampled input video B and a frame of the decoded base stream BD to generate a set of residuals). As such, the residual generator 110-S uses and processes both the decoded base stream BD and the down-sampled input video B to generate the set of residuals. Here, residuals represent the error or differences between a reference signal or frame and a desired signal or frame. The residuals used in the first enhancement level can be considered as a correction signal as they are able to ‘correct’ a frame of a (future) decoded base stream. To distinguish from the residuals used in the second enhancement level, the residuals used in the first enhancement level will be referred to hereafter as “correction data”. The correction data is useful as it can correct for quirks or other peculiarities of the base codec. These include, amongst others, motion compensation algorithms applied by the base codec, quantisation and entropy encoding applied by the base codec, and block adjustments applied by the base codec.
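The correction data generation described above may be sketched element-wise as follows (an illustrative Python sketch; the sample values and the small codec errors shown are assumptions):

```python
def generate_correction_data(down_sampled, decoded_base):
    """Sub-layer 1 correction data: the element-wise difference between the
    down-sampled input B and the decoded base stream BD (operation 110-S)."""
    return [
        [b - bd for b, bd in zip(b_row, bd_row)]
        for b_row, bd_row in zip(down_sampled, decoded_base)
    ]

B = [[10, 12], [14, 16]]   # down-sampled input video
BD = [[9, 12], [15, 16]]   # decoded base stream, with small codec errors
correction = generate_correction_data(B, BD)
# correction == [[1, 0], [-1, 0]]; adding it to BD recovers B exactly.
```

Adding the correction data back to the decoded base stream cancels the quirks of the base codec, which is why the correction data can ‘correct’ a frame of a decoded base stream.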
In
To generate the encoded sub-layer 2 stream, a further level of enhancement information is created by producing and encoding a further set of residuals via residual generator 100-S.
The up-sampler 105U up-samples a corrected (via the below-described sub-layer 1 correction) version of the decoded base stream BD to produce an up-sampled version U (via up-sampler 105U) of a corrected version of the decoded base stream BD. The up-sampled version U may be referred to as a reference signal, a reference frame, or the like. In this example, the up-sampled version U is the processed signal referred to above.
To achieve a reconstruction of the corrected version of the decoded base stream as would be generated at a decoder (e.g., as shown in
The residual generator 100-S generates a further set of residuals R based at least on a value of an element of a target signal and a value of a corresponding element of the processed signal. The target signal may also be referred to as a target frame, a desired signal, a desired frame or the like. In this example, the target signal is the first input signal I1, which is also the to-be-modified signal in this example. As explained above, in this example the processed signal comprises the up-sampled signal U.
As such, in this example, the further set of residuals R are the difference between the up-sampled version U (via up-sampler 105U) of the corrected version of the decoded base stream BD (the processed signal), and the first input signal I1 (the target signal).
In more detail, the up-sampled signal U (i.e., reference signal or frame) is compared to the first input signal I1 (i.e., desired signal or frame) to create the further set of residuals R (i.e., a difference operation is applied by the residual generator 100-S to the up-sampled re-created frame U to generate the further set of residuals R). The further set of residuals R is then processed via an encoding pipeline that mirrors that used for the set of correction data, to become an encoded sub-layer 2 stream (i.e., an encoding operation is then applied to the further set of residuals R to generate the encoded further enhancement stream). In particular, the further set of residuals R is transformed (i.e., a transform operation 110-0 is performed on the further set of residuals R to generate a further transformed set of residuals). The transformed residuals are then quantized and entropy encoded in the manner described above in relation to the set of correction data (i.e., a quantization operation 120-0 is applied to the transformed set of residuals to generate a further set of quantized residuals; and an entropy encoding operation 120-0 is applied to the quantized further set of residuals to generate the encoded sub-layer 2 stream containing the further level of enhancement information). In certain cases, the operations may be controlled such that, for example, only the quantisation step 120-1 is performed, or only the transform and quantization steps. Entropy encoding may optionally be used in addition. Preferably, the entropy encoding operation may be a Huffman encoding operation or a run-length encoding (RLE) operation, or both (e.g., RLE then Huffman encoding). The transformation applied at both blocks 110-1 and 110-0 may be a Hadamard transformation that is applied to 2×2 or 4×4 blocks of residuals.
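One form that a 2×2 Hadamard-style transformation of a residual block may take, together with its inverse, is sketched below in Python. The exact component definitions and scaling here are assumptions for illustration; the claimed transform is not limited to this form:

```python
def hadamard_2x2(block):
    """Decompose a 2x2 residual block [[a, b], [c, d]] into average (A),
    horizontal (H), vertical (V) and diagonal (D) components."""
    (a, b), (c, d) = block
    return [a + b + c + d,   # A: average component
            a - b + c - d,   # H: horizontal difference
            a + b - c - d,   # V: vertical difference
            a - b - c + d]   # D: diagonal difference

def inverse_hadamard_2x2(coeffs):
    """Exact inverse of hadamard_2x2 (each sum is a multiple of 4)."""
    A, H, V, D = coeffs
    return [[(A + H + V + D) // 4, (A - H + V - D) // 4],
            [(A + H - V - D) // 4, (A - H - V + D) // 4]]

block = [[5, -3], [2, 0]]
coeffs = hadamard_2x2(block)     # [4, 10, 0, 6]
```

Without quantisation, the inverse transform reconstructs the residual block exactly; quantising the coefficients between the two steps introduces the controlled loss described above.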
In this example, the encoder used to encode the further set of residuals R (or the data derived based on the further set of residuals R) is different from the base encoder 120E used to encode the base stream.
The encoding operation in
As illustrated in
Additionally, and optionally in parallel, the encoded sub-layer 2 stream is processed to produce a decoded further set of residuals R. Similar to sub-layer 1 processing, sub-layer 2 processing comprises an entropy decoding process 230-0, an inverse quantization process 220-0 and an inverse transform process 210-0. Of course, these operations will correspond to those performed at block 100-0 in encoder 100, and one or more of these steps may be omitted as necessary. Block 200-0 produces a decoded sub-layer 2 stream comprising the further set of residuals R and these are summed at operation 200-C with the output U from the up-sampler 205U in order to create a sub-layer 2 reconstruction of the input signal I1, which may be provided as the output O of the decoder. Thus, as illustrated in
The numerical signal element values used in this and other examples should be understood to be by way of example only.
As explained above, first and second input signals I1, I2 are received. In this example, the combiner 30 is an additive combiner, which adds the first and second input signals I1, I2 together to generate the modified input signal IM. In this example, such adding involves adding a signal element value of a signal element in the first input signal I1 and a signal element value of a corresponding signal element in the second input signal I2 together to create a signal element value of a corresponding signal element in the modified input signal IM. In this example, corresponding signal elements have the same position as each other within their respective signals.
The down-sampler 105D receives and down-samples the modified input signal IM to generate the down-sampled input signal B. In this example, such down-sampling involves averaging the signal element values of each signal element in a 2×2 block of signal elements in the modified input signal IM, to give a signal element value of a corresponding signal element in the down-sampled input signal B. However, down-sampling can be performed in other ways. In this example, a signal element in the down-sampled input signal B corresponds to a 2×2 block of signal elements in the modified input signal IM.
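The 2×2 averaging down-sampling described above may be sketched in Python as follows (the sample values are illustrative, and integer division is an assumption made for brevity):

```python
def downsample_2x2_average(signal):
    """Down-sample by averaging each non-overlapping 2x2 block of signal
    elements (as performed by down-sampler 105D in this example)."""
    height, width = len(signal), len(signal[0])
    return [
        [(signal[2 * i][2 * j] + signal[2 * i][2 * j + 1]
          + signal[2 * i + 1][2 * j] + signal[2 * i + 1][2 * j + 1]) // 4
         for j in range(width // 2)]
        for i in range(height // 2)
    ]

IM = [[4, 8, 10, 10],
      [0, 4, 10, 10],
      [2, 2, 10, 10],
      [6, 6, 10, 10]]
B = downsample_2x2_average(IM)
# B == [[4, 10], [4, 10]]
```

Each signal element of the down-sampled input signal B thus corresponds to a 2×2 block of signal elements of the modified input signal IM.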
In this and other examples, for convenience and brevity, it is assumed that the decoded base stream BD is the same as the down-sampled input signal B. In other words, it is assumed that the down-sampled input signal B is encoded by the base encoder 120E to generate the encoded base stream BE, the encoded base stream BE is decoded by the base decoder 120D to generate the decoded base stream BD, and the signal element values of signal elements of the decoded base stream BD are the same as the signal element values of corresponding signal elements of the down-sampled input signal B. However, the original signal reconstruction O at the higher resolution can still be achieved in a similar manner even if the decoded base stream BD is not the same as the down-sampled input signal B. In particular, the above-described set of correction data can remove encode-decode errors.
In this and other examples, for convenience and brevity, it is also assumed that no errors are introduced in the sub-layer 1 processing. In other words, it is assumed that the output of the residual generator 110-S is the same as the input to the summing operation 110-C from the inverse transform block 110-1i. Again, the original signal reconstruction O at the higher resolution can still be achieved in a similar manner even if errors are introduced in the sub-layer 1 processing.
The up-sampler 105U up-samples the decoded base stream BD (whether or not the decoded base stream BD has been subject to correction) to generate the up-sampled signal U.
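One simple up-sampling choice is nearest-neighbour replication, sketched below in Python (practical up-samplers may instead use separable filters; this choice, like the sample values, is an assumption for illustration):

```python
def upsample_nearest_2x(signal):
    """Up-sample by replicating each signal element into a 2x2 block
    (a simple stand-in for up-sampler 105U in this example)."""
    out = []
    for row in signal:
        wide = [value for value in row for _ in (0, 1)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                           # duplicate rows
    return out

BD = [[4, 10], [4, 10]]
U = upsample_nearest_2x(BD)
# U == [[4, 4, 10, 10]] repeated over four rows
```

Because the 2×2 averaging used for down-sampling discards detail that replication cannot restore, the up-sampled signal U generally differs from the first input signal I1.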
In this example, the signal element values of signal elements of the up-sampled signal U are not exactly the same as signal element values of corresponding signal elements of the first input signal I1. This may be as a result of asymmetries between the down-sampling and the up-sampling.
The residual generator 100-S calculates a difference between a target signal, which in this example is the first input signal I1, and a processed signal, which in this example is the up-sampled signal U. In this example, such calculation involves subtracting a signal element value of a signal element in the up-sampled signal U from a signal element value of a corresponding signal element in the first input signal I1 to give a value of a corresponding residual element in the further set of residuals R.
As explained above, in terms of decoder-side processing, a decoder can generate the up-sampled signal U using the encoded base stream BE and the set of correction data. Combining the up-sampled signal U with the further set of residuals R creates the original signal reconstruction O at the higher resolution. In this example, such combining involves adding a signal element value of a signal element in the up-sampled signal U and a signal element value of a corresponding residual element in the further set of residuals R together to give a signal element value of a corresponding signal element in the original signal reconstruction O.
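The residual generation and decoder-side combination may be sketched together as follows (an illustrative Python sketch with assumed sample values):

```python
def generate_residuals(target, processed):
    """Further set of residuals R = I1 - U, element-wise (generator 100-S)."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, processed)]

def reconstruct(processed, residuals):
    """Decoder-side combination O = U + R, element-wise (operation 200-C)."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(processed, residuals)]

I1 = [[4, 8], [0, 4]]  # original (target) signal
U = [[4, 4], [4, 4]]   # up-sampled decoded base, carrying the modification
R = generate_residuals(I1, U)  # [[0, 4], [-4, 0]]
O = reconstruct(U, R)          # recovers I1 exactly
```

Because the residuals R are computed against the unmodified target signal I1, combining them with U at the decoder removes the effect of the modification from the reconstruction O.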
It should be understood that, although the effect of the modification is visible in the base video stream, i.e., in the decoded base stream BD, and the up-sampled signal U, it is not visible in the original signal reconstruction O.
As such, if a decoder receives the encoded base stream BE, it will be able to recover some content of the first input signal I1 at the base level. The decoder may also be able to up-sample the decoded base stream BD and recover some content of the first input signal I1 at a higher spatial resolution. However, without the further set of residual data R, the decoder would not be able to reconstruct fully the first input signal I1.
The absolute values of the residuals in the bottom-right 2×2 block of the further set of residuals R are generally larger than those in the other 2×2 blocks of the further set of residuals R. This is, in this example, primarily a result of the modification modifying the values of the signal elements in the corresponding 2×2 block of the first input signal I1, such that the first input signal I1 and the up-sampled signal U differ particularly in that 2×2 block.
Since the residuals in the bottom-right 2×2 block of the further set of residuals R are likely to have a more significant impact on the visual quality of the original signal reconstruction O than the residuals in the other 2×2 blocks, they may be processed differently from the other residuals in the further set of residuals R. For example, the residuals in the bottom-right 2×2 block (corresponding to the modified signal elements of the first input signal I1) may not be subject to any quantization, or at least may be subject to a lower level of quantization than other residuals in the further set of residuals R.
It should also be understood that in many spatially scalable coding schemes and systems, it would be desirable to minimise the values of all residuals in the further set of residuals R. This can make encoding such residuals more efficient and can reduce the amount of residual data to be processed.
In contrast, the modification described above goes against this general desire by increasing the magnitudes of some of the residuals in the further set of residuals R, compared to the magnitudes those values would have had without such modification. A benefit of such modification is, as described above, improved content protection.
In this example, the first, second and modified input signals I1, I2, IM are shown as being 4×4 arrays. In practice, the signals may have much larger dimensions.
Additionally, in this example, the modification occurs in the bottom-right 2×2 block of the array. Where signals may have much larger dimensions, modifications may occur at more central and visually prominent parts of the signal to maximise the (negative) effect of the modification on a viewer of the base layer stream without the enhancement.
In this example, the values of the elements in the bottom-right 2×2 block of the second input signal I2 are selected, based on the values of the corresponding elements of the bottom-right 2×2 block of the first input signal I1, such that the values of the elements in the bottom-right 2×2 block of the modified input signal IM are all ‘10’. Assuming, for this example only, that ‘10’ is the maximum signal element value, the values of the elements in the bottom-right 2×2 block of the second input signal I2 may instead be selected independently of the values of the corresponding elements of the bottom-right 2×2 block of the first input signal I1; for example, they may all be set to ‘10’. If the first and second input signals I1, I2 are added together and the sum of values of corresponding signal elements exceeds ‘10’, the value of the corresponding signal element of the modified input signal IM can be clipped to the maximum value of ‘10’.
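Such clipped (saturating) additive combining may be sketched as follows (an illustrative Python sketch; the maximum value of ‘10’ and the sample values follow the example above and are not limiting):

```python
MAX_VALUE = 10  # example-only maximum signal element value

def saturating_add(to_be_modified, modification):
    """Additive combining clipped at the maximum signal element value, so a
    watermark region can be driven to a flat maximum regardless of the
    underlying content."""
    return [
        [min(MAX_VALUE, s + m) for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(to_be_modified, modification)
    ]

I1 = [[3, 7], [9, 2]]        # content in the watermark region
I2 = [[10, 10], [10, 10]]    # modification block set to the maximum value
IM = saturating_add(I1, I2)
# IM == [[10, 10], [10, 10]] irrespective of the values in I1
```

Selecting the modification values independently of the content in this way avoids the encoder having to derive a content-dependent modification signal for each frame.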
The example spatially scalable encoder 100 shown in
However, in the example spatially scalable encoder 100 shown in
In contrast, in the example spatially scalable encoder 100 shown in
Further, in this example, the target signal, based on which the residual generator 100-S generates the set of residual data R, comprises a to-be-down-sampled signal. In this example, the to-be-down-sampled signal is the first input signal I1. The to-be-down-sampled signal, namely the first input signal I1, is down-sampled to derive the to-be-modified signal, namely the down-sampled signal B.
As explained above, a first input signal I1 is down-sampled to create a down-sampled signal B, for example by averaging.
In this example, the combiner 30 is an additive combiner, which adds the down-sampled signal B and the second input signal I2 together to generate a modified down-sampled signal BM. The base encoder 120E encodes the modified down-sampled signal BM to generate an encoded base stream BE and the base decoder 120D decodes the encoded base stream BE to generate a decoded base stream BD. In this specific example, for convenience and brevity, it is again assumed that the decoded base stream BD is the same as the modified down-sampled signal BM, which may be expressed mathematically as: BD=BM.
The output BD1 of the residual generator 110-S is the difference between the down-sampled signal B and the decoded down-sampled signal BD, which may be expressed mathematically as: BD1=B−BD=B−BM. In this example, the modified down-sampled signal BM is the combination of the down-sampled signal B and the second input signal I2, which may be expressed mathematically as: BM=B+I2. Taking these two mathematical expressions together, BD1=B−BM=B−(B+I2)=−I2.
The output BD2 of the summing operation 110-C is the sum of the decoded base stream BD (which equals the modified down-sampled signal BM) and the output BD1 of the residual generator 110-S, which may be expressed mathematically as: BD2=BM+BD1. However, from above, BD1=B−BM. As such, BD2=BM+BD1=BM+(B−BM)=B. As such, the modification caused by the second input signal I2 has, in effect, been removed at the output BD2 of the summing operation 110-C.
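The derivation above may be illustrated numerically as follows. This is a hypothetical sketch only: the array values are assumptions, and the base codec is assumed lossless (BD=BM) for brevity, as in the example.

```python
# Hypothetical numeric sketch of the correction-data arithmetic above,
# assuming (for brevity) a lossless base codec so that BD = BM.
def add(x, y):  # element-wise addition of two equally sized arrays
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]

def sub(x, y):  # element-wise subtraction
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]

b  = [[4, 5], [5, 6]]   # example down-sampled signal B (values assumed)
i2 = [[3, 0], [0, 3]]   # example modification signal I2 (values assumed)

bm  = add(b, i2)        # modified down-sampled signal: BM = B + I2
bd  = bm                # lossless assumption: BD = BM
bd1 = sub(b, bd)        # correction data: BD1 = B - BM = -I2
bd2 = add(bd, bd1)      # summing operation: BD2 = BM + (B - BM) = B

assert bd1 == [[-3, 0], [0, -3]]  # the negative of I2
assert bd2 == b                   # the modification has been removed
```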
The output BD2 of the summing operation 110-C is then up-sampled to generate the up-sampled signal U, and the further set of residuals R is calculated based on differences between the first input signal I1 and the up-sampled signal U.
Again, in terms of decoder-side processing, a decoder can generate the up-sampled signal U using the encoded base stream BE and the set of correction data. Combining the up-sampled signal U with the further set of residuals R creates the original signal reconstruction O at the higher resolution. Additionally, although, again, the effect of the modification is visible in the base video stream, i.e., in the decoded base stream BD (which equals the modified down-sampled signal BM), it is not visible in the original signal reconstruction O. The absolute values of the residuals in the bottom-right 2×2 block in the first set of residuals R shown in
The example spatially scalable encoder 100 shown in
However, in the example spatially scalable encoder 100 shown in
In contrast, in the example spatially scalable encoder 100 shown in
Additionally, the example spatially scalable encoder 100 shown in
The first further set of residuals R1 is calculated as the difference between the modified input signal IM and the up-sampled signal U. As such, in this example, the target signal, based on which the first further set of residuals R1 is generated, comprises the modified input signal IM.
The second further set of residuals R2 is calculated as the sum of the first further set of residuals R1 and the third input signal I3. As such, further residual data (namely, the second further set of residuals R2) is generated based on a value of an element of the first further set of residuals R1 and a value of a corresponding element of a signal based on the modification signal I2. In this example, the signal based on the modification signal I2 is obtained by processing the modification signal I2. In this example, such processing involves calculating the negative (or “inverse”) of the modification signal I2 to generate the third input signal I3.
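The generation of the second further set of residuals R2 from R1 and the negated modification signal may be sketched as follows. This is an illustration under assumed values, not a definitive implementation.

```python
# Hypothetical sketch: deriving the second further set of residuals R2
# by adding the negated modification signal I3 = -I2 to R1 (values assumed).
def negate(x):
    return [[-a for a in row] for row in x]

def add(x, y):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]

i2 = [[3, 2], [2, 1]]   # example modification signal I2
i3 = negate(i2)         # third input signal: I3 = -I2
r1 = [[5, 4], [4, 3]]   # example first further set of residuals R1
r2 = add(r1, i3)        # second further set of residuals: R2 = R1 - I2
print(r2)  # [[2, 2], [2, 2]]
```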
It should be noted that the second set of residuals R2 shown in
The example spatially scalable encoder 100 shown in
However, in the example spatially scalable encoder 100 shown in
In contrast, in the example spatially scalable encoder 100 shown in
As such, in this example, the second further set of residuals R2 is calculated as the difference between the first further set of residuals R1 and the modification signal I2. As such, further residual data is generated based on a value of an element of the first further set of residuals R1 and a value of a corresponding element of the modification signal I2.
It can therefore be seen that the example spatially scalable encoders 100 shown in
The example spatially scalable encoder 100 shown in
However, in the example spatially scalable encoder 100 shown in
In contrast, in the example spatially scalable encoder 100 shown in
In particular, the modifier 30 substitutes (which may also be referred to as “replaces” or “overwrites”) a signal element value of a signal element in the first input signal I1 with a signal element value of a corresponding signal element in the second input signal I2 to give a signal element value of a corresponding signal element in the modified input signal IM.
In this example, not all signal element values of signal elements in the first input signal I1 are overwritten. However, in this example, signal element values of signal elements in the first input signal I1 that are overwritten form the basis of the third input signal I3. As such, in this example, the first input signal I1 is processed to derive the third input signal I3.
Additionally, the further modifier 40 overwrites any value of a residual element in the first set of residuals R1 corresponding to a modified signal element of the modified input signal IM with the signal element value of the corresponding signal element of the third input signal I3.
Again, in this example, not all values of all residual elements in the first set of residuals R1 are overwritten by the further modifier 40; only those residual elements corresponding to a modified signal element of the modified input signal IM.
In this example, the modifier 30 overwrites the signal element values “7”, “8”, “8” and “9” in the bottom-right 2×2 block of the first input signal I1 with the signal element values “10”, “10”, “10” and “10” in the bottom-right 2×2 block of the second input signal I2. The further modifier 40 overwrites the residual values “2”, “2”, “2” and “1” in the bottom-right 2×2 block of the first set of residuals R1 with the signal element values “7”, “8”, “8” and “9” in the bottom-right 2×2 block of the third input signal I3.
Instead of the third input signal I3 storing the signal element values “7”, “8”, “8” and “9” from the first input signal I1, the third input signal I3 could indicate which residuals in the first set of residual elements R1 are to be overwritten with corresponding signal element values from the first input signal I1.
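The substitution-based modification described above may be sketched as follows. The mask-based formulation is an assumption for illustration; the function name and values are hypothetical.

```python
# Hypothetical sketch of the substitution-based modifier, assuming a mask
# that marks which elements to overwrite (all names and values are
# illustrative, not part of any standard).
def substitute(signal, replacement, mask):
    """Overwrite masked elements of `signal` with those of `replacement`."""
    return [[r if m else s for s, r, m in zip(srow, rrow, mrow)]
            for srow, rrow, mrow in zip(signal, replacement, mask)]

i1   = [[7, 8], [8, 9]]          # to-be-overwritten block of I1
i2   = [[10, 10], [10, 10]]      # replacement values from I2
mask = [[True, True], [True, True]]

im = substitute(i1, i2, mask)    # modified input signal block IM
# The overwritten values of I1 form the basis of the third input signal I3,
# later used to overwrite the corresponding residuals in R1.
i3 = i1
print(im)  # [[10, 10], [10, 10]]
```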
The example spatially scalable encoder 100 shown in
However, in the example spatially scalable encoder 100 shown in
In contrast, in the example spatially scalable encoder 100 shown in
As such, in this example, correction data is generated based on: (i) a value of an element of the down-sampled modified signal B, (ii) a value of a corresponding element of the decoded modified signal BD, and (iii) a value of a corresponding element of the modification signal I2 or a value of a corresponding element of a signal I3 based on the modification signal. In this specific example, the signal based on the modification signal, namely the third input signal I3, is used but, in other examples, the modification signal I2 may be used.
In this example, the signal based on the modification signal, namely the third input signal I3, is derived by processing the modification signal I2. In this specific example, such processing involves down-sampling the modification signal I2 and creating a negative of the down-sampled modification signal, as will become more apparent below.
The correction data, or data derived based on the correction data, can be sent to be encoded.
In this example, the combiner 30 is an additive combiner, which adds the first and second input signals I1, I2 together to generate the modified input signal IM, and the down-sampler 105D down-samples the modified input signal IM to generate the down-sampled signal B. The base encoder 120E encodes the down-sampled signal B to generate an encoded base stream BE and the base decoder 120D decodes the encoded base stream BE to generate a decoded base stream BD. In this specific example, for convenience and brevity, it is again assumed that the decoded base stream BD is the same as the down-sampled signal B, which may be expressed mathematically as: BD=B.
The output BD1 of the residual generator 110-S is the difference between the down-sampled signal B and the decoded down-sampled signal BD, which may be expressed mathematically as: BD1=B−BD. Since, in this example, BD=B, it follows that BD1=0.
In this example, the third input signal I3 is the negative of a down-sampled version of the second input signal −I2D. The third input signal I3 may be obtained by down-sampling the second input signal I2 and generating a negative of the result.
In this example, the modifier 40 modifies the output BD1 of the residual generator 110-S by adding the third input signal I3 to give an output BD2, where: BD2=BD1+(−I2D)=0+(−I2D)=−I2D.
The output BD3 of the summing operation 110-C is the sum of the decoded base signal BD and the output BD2 of the modifier 40, which may be expressed mathematically as: BD3=BD+BD2. Since BD is generated by down-sampling IM, where IM=I1+I2, and since the down-sampling operation (for example, averaging) is linear, BD=I1D+I2D, where I1D and I2D denote the down-sampled I1 and I2 respectively. The summing operation 110-C therefore effectively cancels out the down-sampled I2, which may be expressed mathematically as: BD3=(I1D+I2D)+(−I2D)=I1D, leaving the output BD3 of the summing operation 110-C as the down-sampled I1.
The output BD3 of the summing operation 110-C, namely the down-sampled I1, is then up-sampled to generate the up-sampled signal U, and the further set of residuals R is calculated based on differences between the first input signal I1 and the up-sampled signal U.
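The cancellation described above may be illustrated numerically as follows. This sketch assumes a lossless base codec (BD=B) and 2×2-average down-sampling, which is linear, so that down-sampling IM=I1+I2 yields I1D+I2D; all values are assumptions for illustration.

```python
# Hypothetical numeric sketch of the arithmetic above, assuming a lossless
# base codec (BD = B) and 2x2-average down-sampling, which is linear so
# that down(I1 + I2) = down(I1) + down(I2). All values are illustrative.
def downsample(x):
    """Average each non-overlapping 2x2 block."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j+1] + x[i+1][j] + x[i+1][j+1]) / 4
             for j in range(0, w, 2)] for i in range(0, h, 2)]

def add(x, y):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]

def negate(x):
    return [[-a for a in row] for row in x]

i1 = [[4, 4], [4, 4]]            # example first input signal I1
i2 = [[8, 8], [8, 8]]            # example modification signal I2
im = add(i1, i2)                 # modified input signal: IM = I1 + I2

b   = downsample(im)             # down-sampled signal B; BD = B (lossless)
bd1 = [[0.0]]                    # correction data: BD1 = B - BD = 0
i3  = negate(downsample(i2))     # third input signal: I3 = -I2D
bd2 = add(bd1, i3)               # modified correction data: BD2 = -I2D
bd3 = add(b, bd2)                # BD3 = BD + BD2 = I1D

assert bd3 == downsample(i1)     # the down-sampled I1 is recovered
```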
Again, in terms of decoder-side processing, a decoder can generate the up-sampled signal U using the encoded base stream BE and the set of correction data. Combining the up-sampled signal U with the further set of residuals R creates the original signal reconstruction O at the higher resolution. Additionally, although, again, the effect of the modification is visible in the base video stream, i.e., in the decoded base stream BD, it is not visible in the original signal reconstruction O.
The set of residuals R in
The top part of
The bottom part of
Even if a decoder were to receive an encoded version of the base video stream B in the bottom part of
As such, a shifting modifier can likewise provide content protection.
The shifting modifier may apply different-sized and/or different-direction shifts to different encodings. This can provide a further degree of content protection.
In more detail, a first encoding of the first input signal I1 may use a first shift (in terms of size and/or direction) and a second encoding of the first input signal I1 may use a second, different shift (in terms of size and/or direction). Residual data R associated with the first encoding would not match (or “align”) with the base video stream B of the second encoding. Similarly, residual data R associated with the second encoding would not match with the base video stream B of the first encoding. As such, even if a decoder had access to a base video stream B from one of the first and second encodings of the first input signal I1, it would need the corresponding residual data R from that same encoding to generate a matched (or “aligned”) reconstruction of the first input signal I1.
Additionally, the first and second encodings may use different codecs, for example to encode the residual data R. If a decoder uses an incorrect codec to decode residual data R (for example if the decoder uses a codec used for a different encoding), then any reconstruction of the first input signal I1 may also not match the first input signal I1.
The shift amount may be small, for example three signal elements or fewer. The shift direction may be up, down, left, right or otherwise.
By using a small shift, the residual values can be kept relatively small, for efficient encoding, while still enabling content protection.
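A shifting modifier may be sketched as follows. The shift size, direction and edge-padding behaviour shown here are assumptions chosen for illustration only.

```python
# Hypothetical sketch of a shifting modifier: a one-element shift to the
# right with edge padding (shift size, direction and padding are assumptions).
def shift_right(signal, amount=1):
    """Shift each row right by `amount`, repeating the left edge value."""
    return [[row[0]] * amount + row[:-amount] for row in signal]

i1 = [[1, 2, 3, 4],
      [5, 6, 7, 8]]
im = shift_right(i1)  # modified input signal IM, shifted by one element
print(im)  # [[1, 1, 2, 3], [5, 5, 6, 7]]

# Residuals computed against the shifted signal would not align with an
# unshifted base stream, providing a degree of content protection.
```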
In this example, the shift modifier shifts the first input signal I1 to generate the modified input signal IM, which is then down-sampled. However, in other examples, and in a manner similar to that described above with reference to
In
In certain preferred implementations, the components of the base layer 301 may be supplied separately to the components of the enhancement layer 302; for example, the base layer 301 may be implemented by hardware-accelerated codecs whereas the enhancement layer 302 may comprise a software-implemented enhancement codec. The base layer 301 comprises a base encoder 310. The base encoder 310 receives a version of an input signal to be encoded 306, for example a signal following one or two rounds of down-sampling and generates a base bitstream 312. The base bitstream 312 is communicated between the encoder 305 and decoder 306. At the decoder 306, a base decoder 314 decodes the base bitstream 312 to generate a reconstruction of the input signal at the base level of quality 316.
Both enhancement sub-layers 303 and 304 comprise a common set of encoding and decoding components. The first sub-layer 303 comprises a first sub-layer transformation and quantisation component 320 that outputs a set of first sub-layer transformed coefficients 322. The first sub-layer transformation and quantisation component 320 receives data 318 derived from the input signal at the first level of quality and applies a transform operation. This data may comprise the first set of residuals as described above. The first sub-layer transformation and quantisation component 320 may also apply a variable level of quantisation to an output of the transform operation (including being configured to apply no quantisation). Quality scalability may be applied by varying the quantisation that is applied in one or more of the enhancement sub-layers. The set of first sub-layer transformed coefficients 322 are encoded by a first sub-layer bitstream encoding component 324 to generate a first sub-layer bitstream 326. This first sub-layer bitstream 326 is communicated from the encoder 305 to the decoder 306. At the decoder 306, the first sub-layer bitstream 326 is received and decoded by a first sub-layer bitstream decoder 328 to obtain a decoded set of first sub-layer transformed coefficients 330. The decoded set of first sub-layer transformed coefficients 330 are passed to a first sub-layer inverse transformation and inverse quantisation component 332. The first sub-layer inverse transformation and inverse quantisation component 332 applies further decoding operations including applying at least an inverse transform operation to the decoded set of first sub-layer transformed coefficients 330. If quantisation has been applied by the encoder 305, the first sub-layer inverse transformation and inverse quantisation component 332 may apply an inverse quantisation operation prior to the inverse transformation. The further decoding is used to generate a reconstruction of the input signal. 
In one case, the output of the first sub-layer inverse transformation and inverse quantisation component 332 is the reconstructed first set of residuals 334 that may be combined with the reconstructed base stream 316 as described above.
In a similar manner, the second sub-layer 304 also comprises a second sub-layer transformation and quantisation component 340 that outputs a set of second sub-layer transformed coefficients 342. The second sub-layer transformation and quantisation component 340 receives data derived from the input signal at the second level of quality and applies a transform operation. This data may also comprise residual data 338 in certain embodiments, although this may be different residual data from that received by the first sub-layer 303, e.g., it may comprise the further set of residuals as described above. The transform operation may be the same transform operation that is applied at the first sub-layer 303. The second sub-layer transformation and quantisation component 340 may also apply a variable level of quantisation to an output of the transform operation (including being configured to apply no quantisation). The set of second sub-layer transformed coefficients 342 are encoded by a second sub-layer bitstream encoding component 344 to generate a second sub-layer bitstream 346. This second sub-layer bitstream 346 is communicated from the encoder 305 to the decoder 306. In one case, at least the first and second sub-layer bitstreams 326 and 346 may be multiplexed into a single encoded data stream. In one case, all three bitstreams 312, 326 and 346 may be multiplexed into a single encoded data stream. The single encoded data stream may be received at the decoder 306 and de-multiplexed to obtain each individual bitstream.
At the decoder 306, the second sub-layer bitstream 346 is received and decoded by a second sub-layer bitstream decoder 348 to obtain a decoded set of second sub-layer transformed coefficients 350. As above, the decoding here relates to a bitstream decoding and may form part of a decoding pipeline (i.e., the decoded set of transformed coefficients 330 and 350 may represent a partially decoded set of values that are further decoded by further operations). The decoded set of second sub-layer transformed coefficients 350 are passed to a second sub-layer inverse transformation and inverse quantisation component 352. The second sub-layer inverse transformation and inverse quantisation component 352 applies further decoding operations including applying at least an inverse transform operation to the decoded set of second sub-layer transformed coefficients 350. If quantisation has been applied by the encoder 305 at the second sub-layer, the inverse second sub-layer transformation and inverse quantisation component 352 may apply an inverse quantisation operation prior to the inverse transformation. The further decoding is used to generate a reconstruction of the input signal. This may comprise outputting a reconstruction of the further set of residuals 354 for combination with an up-sampled combination of the reconstruction of the first set of residuals 334 and the base stream 316 (e.g., as described above).
The bitstream encoding components 324 and 344 may implement a configurable combination of one or more of entropy encoding and run-length encoding. Likewise, the bitstream decoding components 328 and 348 may implement a configurable combination of one or more of entropy decoding and run-length decoding.
Further details and examples of a two sub-layer enhancement encoding and decoding system may be obtained from published LCEVC documentation.
In general, examples described herein operate within encoding and decoding pipelines that comprise at least a transform operation. The transform operation may comprise a discrete cosine transform (DCT) or a variation thereof, a Fast Fourier Transform (FFT), or a Hadamard transform as implemented by LCEVC. The transform operation may be applied on a block-by-block basis. For example, an input signal may be segmented into a number of different consecutive signal portions or blocks and the transform operation may comprise a matrix multiplication (i.e., linear transformation) that is applied to data from each of these blocks (e.g., as represented by a 1D vector). In this description and in the art, a transform operation may be said to result in a set of values for a predefined number of data elements, e.g., representing positions in a resultant vector following the transformation. These data elements are known as transformed coefficients (or sometimes simply “coefficients”).
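A block-wise Hadamard-style transform of the kind used by LCEVC-style enhancement layers may be sketched as follows. The coefficient names, the absence of normalisation and the 2×2 block size are simplifying assumptions for illustration.

```python
# Hypothetical sketch of a block-wise 2x2 Hadamard-style transform
# (normalisation and block traversal are simplified assumptions).
def hadamard_2x2(block):
    """Transform a 2x2 block [[a, b], [c, d]] into four coefficients."""
    a, b = block[0]
    c, d = block[1]
    average    = a + b + c + d
    horizontal = a - b + c - d
    vertical   = a + b - c - d
    diagonal   = a - b - c + d
    return [average, horizontal, vertical, diagonal]

# Example residual block: a flat block yields a single non-zero coefficient
coeffs = hadamard_2x2([[2, 2], [2, 2]])
print(coeffs)  # [8, 0, 0, 0]
```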
As described herein, where the signal data comprises residual data, a reconstructed set of coefficient bits may comprise transformed residual data, and a decoding method may further comprise instructing a combination of residual data obtained from the further decoding of the reconstructed set of coefficient bits with a reconstruction of the input signal generated from a representation of the input signal at a lower level of quality to generate a reconstruction of the input signal at a first level of quality. The representation of the input signal at a lower level of quality may be a decoded base signal (e.g., from base decoder 314) and the decoded base signal may be optionally upscaled before being combined with residual data obtained from the further decoding of the reconstructed set of coefficient bits, the residual data being at a first level of quality (e.g., a first resolution). Decoding may further comprise receiving and decoding residual data associated with a second sub-layer 304, e.g., obtaining an output of the inverse transformation and inverse quantisation component 352, and combining it with data derived from the aforementioned reconstruction of the input signal at the first level of quality. This data may comprise data derived from an upscaled version of the reconstruction of the input signal at the first level of quality, i.e., an upscaling to the second level of quality.
Although examples have been described with reference to a tier-based hierarchical coding scheme in the form of LCEVC, the methods described herein may also be applied to other tier-based hierarchical coding schemes, such as VC-6: SMPTE VC-6 ST-2117 as described in PCT/GB2018/053552 and/or the associated published standard document, which are both incorporated by reference herein.
In some examples, the to-be-modified signal described above comprises a luminance component (e.g., the Y component) and chrominance components (e.g., the U and V components). In some such examples, the luminance component of the to-be-modified signal is modified based on the modification signal, but the chrominance components of the to-be-modified signal are not modified based on the modification signal. Modifying only the luminance component can still produce the visual artefacts described herein, and can involve less processing than modifying all three components, while still providing the desired content protection.
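A luminance-only modification may be sketched as follows. The (Y, U, V) tuple structure, the 8-bit maximum of 255 and the additive modification are assumptions for illustration.

```python
# Hypothetical sketch of modifying only the luminance (Y) plane of a YUV
# signal, leaving chrominance (U, V) untouched (structure is assumed).
def modify_luma_only(yuv, modification, max_value=255):
    """Add `modification` to the Y plane only, clamping element values."""
    y, u, v = yuv
    y_mod = [[min(a + b, max_value) for a, b in zip(r1, r2)]
             for r1, r2 in zip(y, modification)]
    return (y_mod, u, v)

y = [[100, 110], [120, 130]]
u = [[128, 128], [128, 128]]
v = [[128, 128], [128, 128]]
mod = [[200, 200], [200, 200]]

y_mod, u_out, v_out = modify_luma_only((y, u, v), mod)
assert u_out == u and v_out == v          # chrominance unchanged
assert y_mod == [[255, 255], [255, 255]]  # luminance saturated
```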
In LCEVC and certain other coding technologies, a video signal fed into a base layer such as 301 is a downscaled version of the input video signal 302. In this case, the signal that is fed into both sub-layers comprises a residual signal comprising residual data. A plane of residual data may also be organised in sets of n by n blocks of signal data 410. The residual data may be generated by comparing data derived from the input signal being encoded, e.g., the video signal 402, and data derived from a reconstruction of the input signal, the reconstruction of the input signal being generated from a representation of the input signal at a lower level of quality. In the example of
Hence, a plane of data 408 for the first sub-layer 303 may comprise residual data that is arranged in n by n signal blocks 410. One such 2 by 2 signal block is shown in more detail in
As shown in
The techniques described herein may be implemented in software or hardware, or may be implemented using a combination of software and hardware. They may include configuring an apparatus to carry out and/or support any or all of techniques described herein.
The above examples are to be understood as illustrative. Further examples are envisaged.
For example, although examples described above relate primarily to image or video signals, the spatially scalable coding schemes and systems described herein may be applied to audio signals. In such cases, an audio signal may be modified, for example, by adding noise. The audio signal may be recoverable at a base level but with the added noise present. The audio signal may be recoverable at an enhancement level but with the added noise removed.
It is to be understood that any feature described in relation to any one example may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the examples, or any combination of any other of the examples. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.
The following sets out particularly preferred examples of the present disclosure as a set of numbered clauses. It will be understood that these are examples helpful in understanding the invention.
Numbered Clause 1. A method comprising:
Numbered Clause 2. A method according to Numbered Clause 1, comprising:
Numbered Clause 3. A method according to Numbered Clause 2, wherein the target signal comprises the to-be-modified signal.
Numbered Clause 4. A method according to Numbered Clause 3, comprising sending the residual data, or data derived based on the residual data, to be encoded.
Numbered Clause 5. A method according to any of Numbered Clauses 2 to 4, comprising generating correction data based on:
Numbered Clause 6. A method according to Numbered Clause 5, comprising deriving said signal based on the modification signal by processing the modification signal.
Numbered Clause 7. A method according to Numbered Clause 5 or 6, comprising:
Numbered Clause 8. A method according to Numbered Clause 2, wherein the target signal comprises the modified signal.
Numbered Clause 9. A method according to Numbered Clause 8, comprising:
Numbered Clause 10. A method according to Numbered Clause 9, comprising deriving said signal based on the modification signal by processing the modification signal.
Numbered Clause 11. A method according to Numbered Clause 9 or 10, comprising:
Numbered Clause 12. A method according to Numbered Clause 1, wherein the target signal comprises a to-be-down-sampled signal.
Numbered Clause 13. A method according to Numbered Clause 12, comprising down-sampling the to-be-down-sampled signal to derive the to-be-modified signal.
Numbered Clause 14. A method according to any of Numbered Clauses 1 to 13, wherein the modification signal comprises an overlay to be applied to the to-be-modified signal.
Numbered Clause 15. A method according to any of Numbered Clauses 1 to 14, wherein the modified signal has another element corresponding to another element of the to-be-modified signal, and wherein the other element of the modified signal has the same value as the value of the corresponding other element of the to-be-modified signal.
Numbered Clause 16. A method according to any of Numbered Clauses 1 to 15, wherein the modifying of the to-be-modified signal comprises combining the to-be-modified signal with the modification signal.
Numbered Clause 17. A method according to any of Numbered Clauses 1 to 16, wherein the to-be-modified signal comprises a luminance component and chrominance components, wherein the luminance component of the to-be-modified signal is modified based on the modification signal, and wherein the chrominance components of the to-be-modified signal are not modified based on the modification signal.
Numbered Clause 18. A method according to any of Numbered Clauses 1 to 17, wherein the using of the decoded modified signal to generate the processed signal comprises performing an up-sampling operation.
Numbered Clause 19. A method comprising:
Numbered Clause 20. A method according to any of Numbered Clauses 1 to 19, wherein the element of the modified signal and the corresponding element of the to-be-modified signal are pixels.
Numbered Clause 21. Apparatus configured to perform a method according to any of Numbered Clauses 1 to 20.
Numbered Clause 22. A computer program arranged, when executed, to perform a method according to any of Numbered Clauses 1 to 20.
Numbered Clause 23. A data stream comprising:
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 2204621.3 | Mar 2022 | GB | national |
| Filing Document | Filing Date | Country | Kind |
| --- | --- | --- | --- |
| PCT/GB2023/050818 | 3/29/2023 | WO | |