EMBEDDING DATA WITHIN TRANSFORMED COEFFICIENTS USING BIT PARTITIONING OPERATIONS

Abstract
Examples described herein relate to decoding and encoding signals. Certain examples described herein encapsulate custom data that is not signal data within a stream of encoded signal data. The custom data may comprise a wide variety of metadata that annotates the signal data, or provides additional information relating to the signal data. Certain examples described herein encapsulate custom data within a set of transformed coefficient values that represent data derived from a transform operation that forms part of the signal encoding. The encapsulation may be performed by applying a bit shift operation to coefficient bits representing the set of transformed coefficient values.
Description
FIELD OF INVENTION

The invention relates to a system and method for adaptively varying the quality of a stream of video data in order to ensure that the stream may be decoded at the highest possible quality that the decoding device can handle. In particular, the invention relates to a methodology of encoding a stream of video data which enables adaptive-quality video decoding.


BACKGROUND TO THE INVENTION

Video content providers use set-top boxes to deliver content to end users, and those set-top boxes may vary in age and quality. As a result, the video decoding capability of the set-top boxes currently used to receive video content from a content provider ranges from poor to excellent.


Furthermore, it is known that within a level of resolution (e.g., Standard Definition or SD, 720p High Definition or HD, 1080p HD, Ultra HD) a number of profiles, each defining a set of capabilities, may be defined. For example, in H.264 a profile defines the requirements to decode a particular stream and is typically associated with a class of application. Accordingly, a content provider may have to broadcast to a number of set-top boxes with varying profiles, some of which are unable to decode the content, or unable to use their full capabilities.


Similarly an end user may decode a stream on a device such as a smartphone, tablet computer, laptop etc., and the video decoding capabilities of such devices may vary greatly.


In order to ensure that all users are able to view content, the content delivery network (CDN) has to either: provide the content at the lowest supported specification, or profile (meaning that many devices, such as set-top boxes, cannot display the content at their best capabilities because it is provided at a lower specification); or provide multiple channels showing the same content at different qualities, or profiles (increasing costs and bandwidth for the CDN).


SUMMARY OF THE INVENTION

In order to mitigate at least some of the above problems, there is provided a method for encoding a first stream of video data comprising a plurality of frames of video, the method comprising, for one or more of the plurality of frames of video, the steps of: encoding a frame of the video data in a hierarchical arrangement, the hierarchical arrangement comprising a base layer of video data and a first enhancement layer of video data, said first enhancement layer of video data comprising a plurality of sub-layers of enhancement data, such that when encoded: the base layer of video data comprises data which, when decoded, renders the frame at a first, base, level of quality; and each sub-layer of enhancement data comprises data which, when decoded with the base layer, renders the frame at a higher level of quality than the base level of quality; and wherein the steps of encoding the sub-layers of enhancement data comprise: quantizing the enhancement data at a determined initial level of quantization, thereby creating a set of quantized enhancement data; associating with each of the plurality of sub-layers a respective notional quantization level; and allocating, for each of the plurality of sub-layers, a sub-set of the set of quantized enhancement data based on the respective notional quantization level.


The arrangement of the stream provides multiple benefits. As identified above, in the prior art the stream is encoded at the lowest specification supported by the recipient decoders; as such, a number of recipient decoders are not used to their optimal ability, because the stream is deliberately downgraded to ensure that all devices are able to decode it. As the base layer of the stream enables a minimum level of quality to be rendered by a decoding device, all decoders are able to render the video at, as a minimum, a base level of quality. The arrangement of the stream further enables the decoder to decode, in a hierarchical fashion, further enhancement information which, when decoded with the base layer, produces a video with an enhanced level of quality. As multiple sub-layers of enhancement data are provided within the encoded stream, a given recipient decoder decodes as many sub-layers of enhancement data as it is able to (limited by bandwidth and device capabilities), ensuring that video is rendered at the best quality the decoder can achieve. As such, each individual decoder may be used to its full capabilities, without having to provide multiple streams for the various user profiles.


By associating each sub-layer with a notional quantization level, each sub-layer comprises the data which renders the video at an enhanced level of quality equivalent to that notional quantization level. Advantageously, as the enhancement layers contain data regarding the individual pixels to be enhanced, if the bandwidth is insufficient to download the entirety of a sub-layer for a given frame, or set of frames (i.e. a sub-layer is only partially downloaded), the decoder may progress to the next chunk of data without having to download the remainder of the sub-layer. In many prior art systems, as the data is encoded with reference to other frames, once downloading of a layer of data (such as an enhancement layer) has begun, the layer must be downloaded in its entirety before progressing to the next chunk of data. Such an arrangement can result in delays, due to buffering etc., when the available bandwidth changes.
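The tolerance to partial downloads described above follows from each enhancement record being self-contained. The Python sketch below assumes a hypothetical fixed-size record layout (x and y as unsigned 16-bit integers, the correction as a signed 16-bit integer); the record format and the function names are illustrative assumptions, not the stream syntax of this description. A truncated sub-layer still yields every complete record that arrived, and only the trailing partial record is discarded.

```python
import struct
from typing import Iterable, List, Tuple

# Hypothetical record layout for one enhanced pixel: x, y (uint16), correction (int16).
RECORD = struct.Struct("<HHh")

Pixel = Tuple[int, int, int]


def pack_sub_layer(pixels: Iterable[Pixel]) -> bytes:
    """Serialise a sub-layer as a flat run of fixed-size pixel records."""
    return b"".join(RECORD.pack(x, y, c) for x, y, c in pixels)


def parse_partial_sub_layer(data: bytes) -> List[Pixel]:
    """Recover every complete record from a possibly truncated download.

    Each record refers to one pixel of the current frame only, so the records
    that did arrive can be applied without the rest of the sub-layer.
    """
    usable = len(data) - len(data) % RECORD.size  # drop a trailing partial record
    return list(RECORD.iter_unpack(data[:usable]))


full = pack_sub_layer([(1, 2, 3), (4, 5, -6), (7, 8, 9)])
# Simulate a download cut off part-way through the third record.
print(parse_partial_sub_layer(full[:14]))  # [(1, 2, 3), (4, 5, -6)]
```

Fixed-size records are chosen here only because they make the truncation boundary trivial to find; a real bitstream could use any framing that lets the decoder stop cleanly.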


Further aspects of the invention will be apparent from the description and the claims.





BRIEF DESCRIPTION OF THE FIGURES


FIG. 1 is a block diagram of an exemplary system for performing an adaptive video quality methodology according to an aspect of the invention;



FIG. 2 is a schematic representation of the data structure according to an aspect of the invention;



FIG. 3 is a block diagram of an exemplary set-top box which is configured to decode a video stream encoded using the data structure of FIG. 2;



FIG. 4 is a flow chart of the methodology of encoding an adaptive video quality stream according to an aspect of the invention;



FIGS. 5A to 5F are schematic representations of the selection and encoding of pixels according to the method of FIG. 4; and



FIGS. 6A to 6D are plots of the residual data used to determine an initial level of quantization according to an aspect of the invention.





DETAILED DESCRIPTION OF AN EMBODIMENT

The present invention defines a system and method for encoding a video stream in such a manner that enables the recipient decoder to adaptively vary the quality of the stream according to available bandwidth and the capabilities of the decoder.


By encoding the stream in a hierarchical arrangement, it is possible to define a base layer, which all recipient devices are able to decode, and one or more enhancement layers which, when decoded, enable the stream to be rendered at a higher level of quality than the base layer. Each enhancement layer has a plurality of sub-layers, each containing a subset of pixels which, when decoded, render the frame at a notional level of quantization. Therefore, depending on the capabilities of the decoder and the available bandwidth, the image may be decoded at a number of different levels of quality and levels of quantization. Such an arrangement also ensures that the same stream may be decoded by multiple devices at different levels of quality and quantization, so that each decoder receives the stream at its optimal ability.



FIG. 1 is a block diagram of a system for performing an adaptive video quality methodology according to an aspect of the invention.


In FIG. 1 there is shown the system 100, the system 100 comprising a streaming server 102 connected via a network to a plurality of client devices 108, 110, 112.


The streaming server 102 comprises an encoder 104, the encoder being configured to encode a first video stream utilising the methodology described herein. The streaming server 102 is configured to deliver an encoded video stream 106 to a plurality of client devices such as set-top boxes 108, 110 and 112. Each client device 108, 110 and 112 is configured to decode and render the encoded video stream 106. The client devices and the streaming server 102 are connected via a network 114.


For ease of understanding, the system 100 of FIG. 1 is shown with reference to a single streaming server 102 and three recipient set-top boxes 108, 110, 112, though in further embodiments the system 100 may comprise multiple servers (not shown) and several tens of thousands of set-top boxes. In further embodiments the client devices 108, 110, 112 may be any other devices capable of decoding video streams, such as smart TVs, smartphones, tablet computers, laptop computers etc.


The streaming server 102 can be any suitable data storage and delivery server which is able to deliver encoded data to the set-top boxes over the network. Streaming servers are known in the art, and may use unicast and/or multicast protocols. The streaming server is arranged to store the encoded data stream and to provide the encoded video data in one or more encoded data streams 106 to the client devices 108, 110 and 112. The encoded video stream 106 is generated by the encoder 104. The encoder 104 in FIG. 1 is located on the streaming server 102, though in further embodiments the encoder 104 is located elsewhere in the system 100. The encoder 104 generates the encoded video stream in accordance with the methodology described with reference to FIGS. 4 and 5A to 5F.


The client devices 108, 110 and 112 are devices such as set-top boxes, which are well known in the art. As is known in the art, the capabilities of the set-top boxes in such a system 100 vary. For example, in FIG. 1 set-top box 108 is an older set-top box only able to decode SD resolution, set-top box 110 is a more advanced box than set-top box 108 and is able to decode at a higher resolution such as 720p, and set-top box 112 is able to decode at 1080p. Such variation in set-top box capabilities is typically seen where, for example, a broadcaster provides new set-top boxes to new clients and slowly introduces the updated boxes to existing clients, meaning that a number of legacy devices are constantly in use.


An input video stream is encoded at the encoder 104 to generate an encoded data stream 106. As described in detail below, the encoded data stream 106 is encoded so as to comprise data that enables each of the plurality of client devices to decode the video stream in accordance with the maximum capability of that device. Accordingly, the single encoded data stream 106 may be decoded at different video qualities in accordance with the capabilities of the client devices 108, 110, 112. Furthermore, the encoded data stream 106 is encoded in such a manner that the level of quality of the decoded video may vary over time for each device, depending on the capabilities of the device and the network. Accordingly, the system 100 is configured to provide a data stream which enables an adaptive variation in the quality of the video rendered.



FIG. 2 is a schematic representation of the hierarchical arrangement of an encoded video stream according to an aspect of the invention.


There is shown the encoded stream 210, the encoded stream 210 comprising a base layer 212 and an enhancement layer 214. The enhancement layer comprises a plurality of sub-layers of enhancement data, comprising a first sub-layer 216, a second sub-layer 218 and a third sub-layer 220. For ease of understanding only three sub-layers of enhancement data 216, 218, 220 are shown, though it is to be understood that further sub-layers of enhancement data may be present.


Each individual frame, or each segment comprising a plurality of frames, is encoded in a manner such as that shown in FIG. 2. In a preferred embodiment the hierarchical encoding of the frame or segment means that each frame is encoded with reference to itself only and does not contain any information regarding other frames.


The base layer 212 comprises data encoded in a known video encoding standard, for example MPEG-4 at standard definition. Preferably, the base layer 212 is chosen as the highest standard of video which can be decoded by all devices; in other words, all recipient devices are able to decode and render the base layer. In further embodiments the video is encoded in a hierarchical arrangement as described in PCT/IB2012/053723. In such an arrangement there is a layer with a lowest (base) level of quality and one or more enhancement layers which define higher levels of quality. In such embodiments the base layer 212 may be defined as the lowest level of quality, and in further embodiments the base layer 212 comprises a base layer and one or more enhancement layers. That is to say, the base layer 212 already comprises some form of enhancement data, and the invention provides further enhancement data.


The sub-layers of enhancement data 216, 218, 220 provide further data which when decoded with the base layer 212 will enable a device to decode and render the video data at a higher level of quality than the base layer.


The first sub-layer of enhancement data 216, when decoded with the base layer 212, renders the stream at a higher level of quality. The higher level of quality may be defined by the resolution of the video (e.g. the first level of quality layer being equivalent to a form of HD stream) or by other metrics (such as profile, colour, signal-to-noise ratio etc.). In an embodiment the base layer 212 and enhancement layer 214 are encoded at the same resolution and, as described below in detail with reference to FIGS. 4 and 5A to 5F, each sub-layer of enhancement data provides information to enhance, or correct, individual pixels, thereby improving the quality of the image. Accordingly, the first sub-layer of enhancement data 216, when decoded with the base layer 212, renders the video at the same resolution as the base layer, wherein a plurality of pixels of the base layer are enhanced/corrected to provide an image with a higher level of quality than the base image. In an embodiment the base layer 212 and first sub-layer 216, when decoded together, return a decoded frame which is equivalent to a first, nominal, profile.


The second sub-layer of enhancement data 218, when decoded with the base layer 212 and the first sub-layer 216, provides further data regarding individual pixel correction, thereby rendering the image at a level of quality that is higher than the level of quality associated with the first sub-layer 216. In an embodiment the base layer 212 and first and second sub-layers 216, 218, when decoded together, result in a decoded frame which is equivalent to a second nominal profile, wherein the second profile is associated with a higher set of capabilities than the first nominal profile. Similarly, the third sub-layer of enhancement data 220 comprises a further set of correction/enhancement data for individual pixels which, when decoded with the base layer 212 and the first 216 and second 218 sub-layers of enhancement data, renders the frame at an even higher level of quality. Again, the base layer 212 and first, second and third sub-layers, when decoded together, result in a decoded frame which is equivalent to a third nominal profile, wherein the third profile is associated with a higher set of capabilities than the first and second nominal profiles.


In further embodiments the first sub-layer of enhancement data 216, when decoded with the base layer 212, renders the video at 720p. The second sub-layer of enhancement data 218, when decoded with the base layer 212 and the first sub-layer 216, renders the video at 1080p, and the third sub-layer of enhancement data 220, when decoded with the base layer 212 and the first 216 and second 218 sub-layers of enhancement data, renders the stream at UHD. Therefore each sub-layer of enhancement data is associated with a level of quality, which may be defined by the resolution or by other metrics.


The hierarchical arrangement allows multiple profiles to be present in a single stream. The base layer represents a baseline profile defining the minimum level of quality accessible by all decoders, and each sub-layer of enhancement data can define a further profile associated with the video.


As the data is arranged in a hierarchical manner, for each frame, or segment of data, the base layer 212 is downloaded first followed by the data of the first sub-layer of enhancement data 216, the second sub-layer of enhancement data 218 and the third sub-layer of enhancement data 220.



FIG. 3 is a block diagram showing an example client device 300 (equivalent to devices 108, 110, 112 in FIG. 1) which is configured to decode a stream encoded in accordance with an aspect of the invention. In a preferred embodiment the client device is an OTT-enabled device, for example a smartphone, tablet computer, desktop computer or set-top box, and the client device 300 comprises a communications port 302 coupled to the network 114, a processor 304, a buffer 306, a decoder 308 and an output 310.


The computer processor 304 is configured to initiate and control reception of the encoded data stream via the communications port 302, which is in turn configured to communicate with the streaming server 102 across the data network 114. As is known in the art, the buffer 306 is configured, under the control of the computer processor 304, to buffer, or cache, the data, before sending the buffered part of the encoded data stream to the decoder 308 for decoding after a prescribed time period. The decoder 308 is configured to decode the buffered part of the encoded data stream and to pass the decoded data to the output 310 for presentation to a user.


Clearly, the above description is a simplified version of how encoded data streams are buffered, decoded and, if appropriate, output, and is provided not as a comprehensive discussion as to how the decoding and display of encoded data in client devices works, but so as to give sufficient context in which to explain the invention.


As described above, the decoder 308 of every OTT device, such as a set-top box, is configured to be able to decode the base layer 212. Depending on the capabilities of the processor 304 and decoder 308, a device may or may not be able to decode the enhancement information provided in one or more of the sub-layers of enhancement data. Furthermore, as the data is arranged in a hierarchical manner, if the bandwidth available in the network means that the set-top box is unable to buffer the entirety of the enhancement data, the set-top box would, as a minimum, be able to render the base layer, thus ensuring a minimum level of quality at all times.


If the processor 304 and decoder 308 are able to decode one or more sub-layers of enhancement information, an image with a level of quality higher than the base layer is provided to the output 310. Therefore, depending on the capabilities of the set-top box, the quality of the video provided to the output 310 may vary, which beneficially ensures that when the data is provided to multiple set-top boxes, each set-top box is able to decode the stream at its maximum capability.



FIG. 4 is a flow chart of the process of encoding the video stream according to an aspect of the invention, and FIGS. 5A to 5F are diagrammatic representations of the output of various steps relating to the selection and encoding of pixels in a frame in accordance with the method of FIG. 4.


The process starts at step S102, where a video stream to be encoded is received at the encoder of the system 100.


For one or more of the frames of the video stream, the encoder encodes the frame in accordance with the methodology defined by steps S104 to S116, so as to encode the frame in the hierarchical manner.


At step S104 the encoder identifies which pixels of the frame are to be enhanced. When a frame is encoded, differences between the encoded frame and a reference frame (which may be the raw frame, or a frame encoded in a different manner) will be apparent. Such differences, or residuals, are artefacts of the encoding process. The extent of the residuals depends on the encoding process and the parameters used, such as the level of quantization.


The residual is therefore indicative of the amount of enhancement required for a given pixel in the encoded frame to be visually indistinguishable from the reference frame. Therefore, in an embodiment, the residuals are calculated by comparing the encoded frame with a reference frame, and all pixels which are to be enhanced in order to render the frame at the higher level of quality are identified.


Therefore, at step S104 each of the individual pixels which are to be enhanced with enhancement data is identified.
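Step S104 can be illustrated with a minimal Python sketch. It assumes that frames are held as nested lists of integer sample values and that any pixel whose residual magnitude exceeds a threshold counts as an enhancement pixel; the frame representation, the threshold parameter and the function name are illustrative assumptions rather than the method of this description, which also allows other metrics.

```python
from typing import List, Tuple

Frame = List[List[int]]       # rows of sample values (assumed representation)
Pixel = Tuple[int, int, int]  # (x, y, residual)


def find_enhancement_pixels(encoded: Frame, reference: Frame, threshold: int = 0) -> List[Pixel]:
    """Compare the encoded frame with the reference frame and return every
    pixel whose residual magnitude exceeds `threshold`."""
    pixels: List[Pixel] = []
    for y, (enc_row, ref_row) in enumerate(zip(encoded, reference)):
        for x, (enc, ref) in enumerate(zip(enc_row, ref_row)):
            residual = ref - enc  # correction that would bring the pixel to the reference
            if abs(residual) > threshold:
                pixels.append((x, y, residual))
    return pixels


encoded = [[10, 12], [8, 9]]
reference = [[10, 15], [5, 9]]
print(find_enhancement_pixels(encoded, reference))  # [(1, 0, 3), (0, 1, -3)]
```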


In a preferred embodiment the encoded frame is the frame encoded at the base layer 212, and the reference frame is a frame quantized at the theoretical maximum level of quantization for a given bandwidth. As such, the pixels which display the largest difference between the comparison frame and the base layer can be identified, and are deemed to be the pixels which need to be “touched”, or enhanced. This process is defined in further detail below with reference to FIGS. 7A to 7D.


In further embodiments other suitable metrics are used to determine which pixels are to be enhanced.



FIG. 5A is an example of a frame in which the pixels to be enhanced have been identified as per step S104. FIG. 5A is a graphical representation of a frame of pixels, comprising an array of pixels in the x and y directions. The number of pixels in each direction is denoted N, and each pixel may be described by its co-ordinates (x_n, y_n), where n denotes the nth pixel in the x or y direction.


As is known in the art, each pixel will also be associated with data to describe the pixel. In FIG. 5A the black pixels represent the pixels which are identified at step S104 as showing a residual and therefore in need of enhancement; that is to say, they are to be enhanced in order to provide a frame at a higher level of quality. Each pixel may then be identified by its coordinates.


As described in further detail with reference to FIGS. 7A to 7D, not all the pixels which are identified at step S104 (and shown in FIG. 5A) have the same importance, in that some pixels will add a more significant correction than other pixels. For example, the pixels which provide the most significant correction would be deemed the most important, as these pixels would have the greatest effect on the level of quality of the frame.


At step S106 an initial quantization factor is determined. The initial quantization factor is representative of the amount of data that can be encoded, and is determined from the available bandwidth via a rate control mechanism. The step of determining the initial quantization factor is explained below with reference to FIGS. 7A to 7D. All the pixels which could be enhanced with enhancement data are identified at step S104; however, due to bandwidth considerations it is typically not possible to encode all the enhancement data. Accordingly, only a subset of the enhancement pixels is encoded. As described in detail below, the subset of pixels which are encoded is beneficially selected as those which are visually the most important.
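The rate control mechanism itself is not specified at this point, so the sketch below is only a hypothetical stand-in. It assumes that q is treated as the fraction of enhancement pixels that can be encoded and that every encoded residual costs a fixed number of bits; neither assumption comes from this description, and the function and parameter names are illustrative. With the 3 Mb/s stream and 2 Mb/s base layer used as an example later in this description, plus an assumed 25 frames per second, 10,000 residuals and 8 bits per residual, the per-frame enhancement budget is 40,000 bits and q evaluates to 0.5.

```python
def initial_quantization_factor(
    stream_bps: float,
    base_bps: float,
    fps: float,
    n_residuals: int,
    bits_per_residual: float,
) -> float:
    """Hypothetical rate control for step S106.

    Assumes q is the fraction of enhancement pixels that fit into the bits left
    over once the base layer is accounted for, clamped to [0, 1].
    """
    if n_residuals == 0:
        return 1.0  # nothing to enhance, so nothing needs to be dropped
    budget_bits_per_frame = (stream_bps - base_bps) / fps
    q = budget_bits_per_frame / (n_residuals * bits_per_residual)
    return max(0.0, min(1.0, q))


print(initial_quantization_factor(3_000_000, 2_000_000, 25, 10_000, 8))  # 0.5
```

Clamping keeps q meaningful at both extremes: a generous budget encodes every enhancement pixel, and a base layer that already fills the stream leaves no room for enhancement data.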


Once the initial quantization factor is determined, the enhancement pixels (i.e. the pixels determined at step S104 to require enhancement data) are quantized at the determined initial quantization level at step S108. Therefore, in most situations only a subset of the enhancement pixels is encoded at the initial quantization level at step S108. As the most important enhancement pixels are those with the most information (as they contain the most correction information), such pixels are beneficially quantized in preference during the quantization stage. The quantization of the enhancement data occurs in a known manner.
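Step S108 can be sketched as follows, under the illustrative assumption that q is the fraction of enhancement pixels retained. Pixels are ranked by residual magnitude, so the largest corrections, which this description deems the most important, are quantized in preference. The uniform step size stands in for the "known manner" of quantization and is likewise hypothetical, as is the function name. The returned list stays in importance order, which is convenient when steps S112 to S116 allocate pixels to sub-layers.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (x, y, residual)


def quantize_initial(pixels: List[Pixel], q: float, step: int = 1) -> List[Pixel]:
    """Hypothetical step S108: keep the round(q * N) enhancement pixels with the
    largest residual magnitudes and quantize their residuals with a uniform step.

    The result is ordered from most to least important pixel.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if step < 1:
        raise ValueError("step must be a positive integer")
    keep = round(q * len(pixels))
    ranked = sorted(pixels, key=lambda p: abs(p[2]), reverse=True)
    # int() truncates toward zero, giving a simple dead-zone quantizer.
    return [(x, y, int(r / step)) for x, y, r in ranked[:keep]]


pixels = [(0, 0, 5), (1, 0, -9), (2, 0, 2), (3, 0, 7), (4, 0, -1)]
print(quantize_initial(pixels, q=0.6))  # [(1, 0, -9), (3, 0, 7), (0, 0, 5)]
```

Lowering q in this model drops the smallest corrections first, matching the statement that a lower q encodes fewer enhancement pixels.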


In further embodiments the initial quantization factor is fixed, for example q=0.6.



FIG. 5B graphically represents the output of step S108.



FIG. 5B shows the same frame as FIG. 5A; the black and hatched pixels in FIG. 5B are the same pixels as those identified in FIG. 5A as requiring enhancement. In FIG. 5B the black pixels are the pixels to be enhanced that are quantized at the initial quantization level at step S108. The pixels with vertical hatching are the pixels to be enhanced which, due to the initial level of quantization used (e.g. q=0.6), have not been encoded. That is to say, the enhancement data associated with the pixels with vertical hatching is not encoded, and accordingly the enhancement data for these pixels is not present in the enhancement layer 214. Beneficially, the pixels which are quantized (the black pixels) are those identified as having the most important enhancement data.


Accordingly, at step S108 a subset of the enhancement pixels is quantized, where the number of enhancement pixels which are quantized varies according to the initially determined quantization factor. Therefore, as described with reference to FIGS. 7A to 7D below, depending on the conditions, the initial quantization factor may change, resulting in a smaller or greater number of pixels being quantized. Beneficially, due to the selection mechanism utilized, the most important enhancement pixels are quantized.


Once the enhancement data has been quantized at step S108, the enhancement data of the pixels to be enhanced is divided into sub-layers, each sub-layer being representative of a notional quantization level of the enhancement data. As described above, the initial quantization level is representative of the highest level of quantization possible for a given data set and conditions. As such, the initial level of quantization is representative of the highest level of quality achievable.


At step S110 a plurality of notional quantization levels are determined. In an embodiment three notional quantization levels are determined, with the upper notional quantization level being equivalent to the initial quantization level as determined at step S106. The second, middle, quantization level is lower than the first level (thereby effectively reducing the number of pixels to be enhanced which are quantized), and the third, lower, quantization level is less than the second quantization level (thereby effectively further reducing the number of pixels to be enhanced which are quantized).


In an embodiment the third lower level of quantization is determined as being the minimum acceptable quality for the stream and as a default is set at q=0.2. The middle level of quantization is then set as the mid-point between the upper and lower levels of quantization. In further embodiments other values, and other methods of setting the values, may be used. In further embodiments the levels of quantization are fixed, for example q=0.6, 0.4, and 0.2.
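In the embodiment just described, step S110 reduces to a small calculation: the upper level equals the initial level, the lower level defaults to q=0.2, and the middle level is their mid-point. The Python sketch below shows only that arithmetic; the function name and the validation check are illustrative additions.

```python
from typing import Tuple


def notional_levels(initial_q: float, lower_q: float = 0.2) -> Tuple[float, float, float]:
    """Return the (upper, middle, lower) notional quantization levels of step S110."""
    if not 0.0 <= lower_q <= initial_q:
        raise ValueError("expected 0 <= lower_q <= initial_q")
    middle_q = (initial_q + lower_q) / 2  # mid-point between upper and lower levels
    return initial_q, middle_q, lower_q


print(notional_levels(0.6))  # (0.6, 0.4, 0.2)
```

With an initial level of q=0.6 this reproduces the fixed levels 0.6, 0.4 and 0.2 mentioned as an alternative embodiment.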


At step S112 an upper sub-layer is defined as the enhancement pixels which would be quantized when the notional level of quantization is equivalent to the initial level of quantization used at step S108.


As the enhancement data has already been quantized (at step S108), the level of quantization used at step S112 is a notional level: the data is not quantized again; rather, the methodology determines which pixels of the already-quantized data would have been quantized. Accordingly, steps S112, S114 and S116 each refer to a notional quantization, as each determines notionally which pixels would have been quantized at the given level of quantization.



FIG. 5C illustrates, as black pixels, the pixels which are defined as the upper sub-layer of quantized enhancement data. As can be seen in FIG. 5C, these are the black pixels of FIG. 5B (i.e. the pixels of enhancement data which were quantized).


At step S114 the middle sub-layer of enhancement data is defined by utilising the middle level of quantization (which is lower than the initial quantization value) and determining which pixels of the enhancement data would have been quantized had the middle level of quantization been used to quantize the enhancement data. As stated above, no further quantization occurs, thereby reducing computational cost.



FIG. 5D is a graphical representation of the output of step S114.


In FIG. 5D there are shown the pixels of the frame, where the quantized pixels of enhancement data are represented by the black and hatched pixels. The pixels with horizontal hatching represent the pixels which would have been quantized had the middle, notional, level of quantization been used. The black pixels are the remaining quantized pixels. Accordingly, as can be seen in FIG. 5D, the middle sub-layer of enhancement data is a subset of the upper sub-layer of enhancement data.


At step SI 16 the lower sub-layer of enhancement data is defined. The lower sub-layer of enhancement data is defined by utilising the determined lower level of quantization (which is lower than the upper and middle quantization values). As with the upper and middle quantization values, the lower sub-layer of enhancement data is determined by identifying which pixels of the enhancement data would have been quantized had the lower level of quantization value been used to quantize the enhancement data.


As the enhancement data has already been quantized, step S116 results in a notional level of quantization, as it simply calculates which pixels would have been quantized had the lower quantization value been used at step S108.



FIG. 5E is a graphical representation of the output of step S116.


In FIG. 5E there are shown the pixels of the frame, where the quantized pixels of enhancement data are represented by the black pixels, the pixels with horizontal hatching and the pixels with a checkerboard pattern. In FIG. 5E the pixels with the checkerboard pattern represent the pixels identified as forming the lower sub-layer of enhancement data. As can be seen, the pixels of the lower sub-layer are a subset of the pixels of the middle sub-layer. As in FIG. 5D, the pixels with horizontal hatching represent the pixels which would have been quantized had the middle, notional, level of quantization been used, and the black pixels are the remaining quantized pixels.


In the example given above, three notional quantization levels are used to define three sub-layers of enhancement data; in further embodiments a different number of notional quantization levels may be used to define a different number of sub-layers. As described with reference to FIG. 2, each sub-layer may be associated with a profile (representative of the capabilities of a decoding device), and depending on the number of profiles to be catered for, the number of sub-layers may increase, or decrease, as appropriate. As the methodology utilises a notional level of quantization, the quantization step occurs only once, regardless of how many sub-layers are defined. Accordingly, the computational cost of quantizing the frame does not increase as the number of sub-layers increases.
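The notional allocation of steps S112 to S116 can be sketched as index slicing over the quantized pixels, so no second quantization pass is needed however many levels are used. This Python sketch assumes that a notional level q retains the round(q * N) most important of the N enhancement pixels and that the quantized pixels are ordered from most to least important; under those assumptions, which are illustrative and not stated in this description, the pixels that would survive a lower level are simply a prefix of the list. Each returned sub-layer holds only the pixels not already in a lower sub-layer, mirroring the disjoint sub-sets 502, 504 and 506 of FIG. 5F. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (x, y, quantized residual)


def allocate_sub_layers(
    quantized: Sequence[Pixel], total_pixels: int, levels: Sequence[float]
) -> List[List[Pixel]]:
    """Hypothetical steps S112 to S116.

    `quantized` is the output of the single quantization pass (most important
    first), `total_pixels` is N, the number of enhancement pixels identified,
    and `levels` are the notional quantization levels. Returns the sub-layers
    from lowest to highest, each containing only its incremental pixels.
    """
    sub_layers: List[List[Pixel]] = []
    start = 0
    for q in sorted(levels):
        # Pixels that would have been quantized at level q form a prefix of the ranked list.
        end = min(round(q * total_pixels), len(quantized))
        sub_layers.append(list(quantized[start:end]))
        start = max(start, end)
    return sub_layers


quantized = [(i, 0, 10 - i) for i in range(6)]  # six pixels kept at q = 0.6 out of N = 10
lower, middle, upper = allocate_sub_layers(quantized, total_pixels=10, levels=(0.6, 0.4, 0.2))
print(len(lower), len(middle), len(upper))  # 2 2 2
```

Adding a fourth level only adds one more slice, which reflects the point that the cost of quantization does not grow with the number of sub-layers.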



FIG. 5F illustrates the final distribution of the encoded enhancement pixels in accordance with the above described process.


In FIG. 5F the full set of quantized pixels 500, as identified at step S108, is shown. The full set of quantized pixels 500 comprises: a sub-set of quantized elements allocated to the lower sub-layer 502; a sub-set of quantized elements allocated to the middle sub-layer 504; and a sub-set of quantized elements allocated to the upper sub-layer 506.


As the data is arranged in a hierarchical manner as described with reference to FIG. 2, set-top box 108 (which has the lowest capabilities) would receive the stream and decode the lower sub-layer 502 to produce video at a level of quality that is higher than the base layer. Similarly, set-top box 110 (which has higher capabilities than set-top box 108) would receive and decode the lower sub-layer 502 and the middle sub-layer 504 to produce video at an even higher level of quality. Set-top box 112 would receive and decode all the sub-layers and produce the highest level of quality video. Accordingly, the video quality produced from the same encoded stream may vary adaptively in accordance with the capabilities of the receiving device. In an embodiment, the differing capabilities of the set-top boxes (or indeed any other type of decoding device) are defined by multiple profiles, and beneficially devices associated with different profiles may decode the same stream.
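The partition of FIG. 5F and the profile-dependent decoding described above can be sketched in Python. The sketch assumes that each sub-layer is bounded by a dead-zone half-width, with the lower sub-layer using the widest dead zone as in FIGS. 6B to 6D, so that sub-layers 502, 504 and 506 are disjoint. The threshold values, the function names and the `PROFILE_LAYERS` table are hypothetical.

```python
from typing import Dict, List


def partition_sublayers(residuals: List[int], t_upper: int, t_middle: int,
                        t_lower: int) -> Dict[str, List[int]]:
    """Assign pixel indices to disjoint sub-layers by residual magnitude.
    Assumes t_upper < t_middle < t_lower (widest dead zone for the lower layer)."""
    if not t_upper < t_middle < t_lower:
        raise ValueError("expected t_upper < t_middle < t_lower")
    layers: Dict[str, List[int]] = {"lower": [], "middle": [], "upper": []}
    for i, r in enumerate(residuals):
        m = abs(r)
        if m >= t_lower:
            layers["lower"].append(i)   # largest corrections (cf. 502)
        elif m >= t_middle:
            layers["middle"].append(i)  # cf. 504
        elif m >= t_upper:
            layers["upper"].append(i)   # cf. 506
        # otherwise the pixel lies in the dead zone and is not encoded
    return layers


# Hypothetical mapping from device profile to the sub-layers it decodes.
PROFILE_LAYERS = {
    "box_108": ["lower"],
    "box_110": ["lower", "middle"],
    "box_112": ["lower", "middle", "upper"],
}


def pixels_decoded(layers: Dict[str, List[int]], profile: str) -> List[int]:
    """Indices of the enhancement pixels a device of `profile` would apply."""
    return sorted(i for name in PROFILE_LAYERS[profile] for i in layers[name])


layers = partition_sublayers([0, 1, -2, 3, -5, 8, 12, -20], 2, 6, 10)
```

Because the lower sub-layer holds the largest residuals, even the least capable device receives the visually most important corrections.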


Once step S116 is complete, the frame has been encoded in the hierarchical manner, and the process returns to step S104 and continues until all frames have been encoded.


As described above with reference to FIGS. 4 and 5A to 5F, the quantization factor q determines the number of enhancement pixels that will be encoded. The lower the value of q, the fewer enhancement pixels are encoded in the enhancement data, and accordingly the less the base layer is enhanced. As described with reference to FIG. 4, the pixels which need enhancement are determined. However, in many cases the rate available for transmitting the encoded stream does not allow every enhancement pixel to be encoded. For example, if the encoded stream is limited to 3 Mb/s and the base layer (e.g. the MPEG-4 video data) comprises 2 Mb/s of data, 1 Mb/s remains for enhancement data. Accordingly, if it is known that the enhancement data comprises N residuals, the quantization factor must be selected to ensure that the total stream does not exceed 3 Mb/s. Provisions are therefore made to ensure that the most important pixels are encoded and quantized. This is achieved by a rate control mechanism which determines the quantization factor based on the amount of space available for the enhancement data in the encoded stream.


The rate control aspect of the invention relies on the fact that, during quantization of a frame, the quantization metric is typically defined such that the residuals are distributed around a value of 0 in a Laplacian distribution, since such values require the fewest bits when entropy encoded. An example of the typical distribution of the residuals is shown as a histogram in FIG. 6A.


In FIG. 6A the x-axis represents the value of the residual and the y-axis the number of pixels which have that residual value. As can be seen, due to the Laplacian distribution the majority of the pixels have values near 0. The pixels which define the tails of the distribution are fewer in number; however, as these represent the pixels with the most correction, or enhancement, they are considered to be visually the most important. Such pixels are therefore preferentially encoded.


As the pixels which have a value of, or near, 0 are less important in terms of enhancement of the base layer, a dead zone is defined around 0. The pixels in the dead zone are deemed to be of lesser importance and can therefore be ignored, with only a limited effect resulting from not including such pixels in the enhancement data.


In an embodiment, the number of bits used to encode the enhancement data varies according to the value of the residual. For example, residual values which are zero, or close to zero, may be encoded using a low number of bits, as little information is needed to enhance the pixel. Conversely, a high residual value would require a larger number of bits, as more information is associated with the larger residual. In an embodiment a number of bins are defined, each bin defining a range of residual values, and each bin has a set number of bits to define the enhancement data. As the distribution is known, the number of pixels in each bin is also known, and accordingly the total data required to encode the entire enhancement data is also known.
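The bin-based size estimate described above can be sketched as follows. The bin boundaries and bit costs in `BINS` are hypothetical placeholders, since the source states only that each bin has a set number of bits; the histogram maps each residual value to its pixel count, as plotted in FIG. 6A.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical bin table: (largest |residual| in the bin, bits per pixel).
# The source says only that each bin has a set number of bits.
BINS: List[Tuple[int, int]] = [(0, 1), (3, 3), (15, 6), (255, 10)]


def bits_for(residual: int) -> int:
    """Bits used to encode one pixel, taken from the bin its residual falls in."""
    magnitude = abs(residual)
    for upper, bits in BINS:
        if magnitude <= upper:
            return bits
    raise ValueError(f"residual {residual} lies outside the bin table")


def total_bits(histogram: Counter) -> int:
    """Total enhancement data size for a histogram {residual value: pixel count}."""
    return sum(count * bits_for(r) for r, count in histogram.items())


hist = Counter({0: 500, 1: 200, -1: 180, 4: 40, -9: 20, 30: 5})
print(total_bits(hist))  # 2050
```

Because the per-bin cost and the per-bin pixel count are both known, the total is a simple weighted sum, which is what allows the level of quantization to be chosen against a known budget.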


This information can be used to determine the level of quantization required to encode the enhancement data. For example, if it is known that 1 Mb is available for the enhancement data then, as described below, the level of quantization is selected so as to select pixels totalling 1 Mb of data.



FIG. 6B is a histogram of the same pixel distribution as FIG. 6A in which a dead zone 602 has been introduced for pixels around 0. The dead zone 602 schematically represents the dead zone for the initial level of quantization, i.e. the highest level of quantization, meaning that a minimum number of pixels are included in the dead zone. As the pixel distribution is known, and pixels which have a value of, or near, 0 have the least impact on the enhancement data, it has been realised that such pixels may be removed from the enhancement data with minimal effect on the enhancement of the frame. The dead zone therefore defines the pixels which are not quantized as enhancement data. The width of the dead zone is determined by the quantization factor, and the pixels outside the dead zone, i.e. those in the tails, will be encoded. Although the pixels near 0 each require the fewest bits to define, their large number means that the dead zone may significantly reduce the size of the enhancement data.


As described above, because the number of pixels is known, as well as the number of bits required to describe each pixel, it is possible to determine the level of quantization required to encode the enhancement data.



FIG. 6C shows the dead zone 604 for the middle sub-layer. As can be seen in FIG. 6C, the dead zone 604 is wider than the dead zone 602 in FIG. 6B, meaning that more pixels are ignored. Accordingly the level of correction is lower, whilst still ensuring that the most significant pixels (i.e. those in the tails) are maintained.



FIG. 6D shows the dead zone 606 for the lowest sub-layer, with the lowest level of quantization. The dead zone in FIG. 6D is wider than that of FIG. 6C, so more pixels are ignored. Again, the most important pixels are encoded in FIG. 6D.


As the available rate is known, it is possible to determine the maximum possible size of the enhancement data. From FIG. 6A it is possible to determine the total size of the enhancement data given the values of the residuals and the number of pixels. Starting from a residual value of 0, pixels are included in the dead zone until the size of the enhancement data is at, or below, the maximum allowable size. This determines the number of pixels to be quantized, from which the initial quantization value of step S106 may be determined.
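The search described above, widening the dead zone outwards from 0 until the enhancement data fits the available rate, can be sketched as follows. The per-pixel cost model (`bits_per_pixel`: one sign bit plus the bit length of the magnitude) is an assumption standing in for the bin table, and the function returns a dead-zone half-width rather than the quantization value of step S106, since the source does not specify how that value is derived from the dead zone. All names and figures are hypothetical.

```python
from collections import Counter
from typing import Callable


def bits_per_pixel(residual: int) -> int:
    """Assumed cost model: one sign bit plus the bit length of the magnitude,
    so residuals near 0 are cheapest and tail residuals cost the most."""
    return 1 + abs(residual).bit_length()


def size_outside_dead_zone(hist: Counter, half_width: int,
                           cost: Callable[[int], int] = bits_per_pixel) -> int:
    """Bits needed for the pixels whose |residual| is at least `half_width`,
    i.e. the pixels that lie outside the dead zone and are encoded."""
    return sum(n * cost(r) for r, n in hist.items() if abs(r) >= half_width)


def choose_dead_zone(hist: Counter, budget_bits: int) -> int:
    """Widen the dead zone outwards from 0 until the enhancement data fits the
    budget; returns the smallest half-width that satisfies it."""
    if budget_bits < 0:
        raise ValueError("budget must be non-negative")
    half_width = 0
    # Terminates: once half_width exceeds every |residual| the size is 0.
    while size_outside_dead_zone(hist, half_width) > budget_bits:
        half_width += 1
    return half_width


hist = Counter({0: 500, 1: 200, -1: 180, 4: 40, -9: 20, 30: 5})
print(size_outside_dead_zone(hist, 0))  # 1550: no dead zone
print(choose_dead_zone(hist, 300))      # 2: residuals 0 and ±1 are dropped
```

Removing the central values first matches the observation that, although each near-zero pixel is cheap, their large number makes the dead zone the most effective way to reduce the size of the enhancement data.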


Accordingly, the methodology ensures that the most important enhancement pixels are encoded, and the hierarchical arrangement ensures that recipient devices, such as set-top boxes, are able to decode the video at the maximum capability of each device.

Claims
  • 1. (canceled)
  • 2. A method of encoding signal data, comprising: obtaining coefficient bits representing values for a set of transformed coefficients, the values being generated by applying at least a transform operation to blocks of signal data derived from an input signal being encoded; obtaining userdata bits representing custom data to add to an encoded signal bitstream; applying a bit shift operation to the coefficient bits, the bit shift operation shifting the coefficient bits by a predefined number of bits; setting values of a set of additional bits added to the coefficient bits based on the userdata bits to generate a modified set of coefficient bits; and instructing generation of an encoded bitstream using the modified set of coefficient bits, wherein the encoded bitstream carries both the custom data and an encoding of the signal data, wherein the blocks of signal data comprise residual data generated by comparing data derived from the input signal being encoded and data derived from a reconstruction of the input signal, the reconstruction of the input signal being generated from a representation of the input signal at a lower level of quality, wherein the encoded bitstream is an encoded enhancement bitstream for a first enhancement sub-layer at a first level of quality, and the method further comprises: obtaining further coefficient bits representing values for a set of transformed coefficients at a second level of quality, the second level of quality being higher than the first level of quality, the values being generated by applying at least a transform operation to blocks of signal data at the second level of quality; and instructing generation of an encoded enhancement bitstream for a second enhancement sub-layer at the second level of quality using the further coefficient bits without applying a bit shift operation.
  • 3. The method of claim 2, comprising, prior to obtaining coefficient bits: obtaining the blocks of signal data derived from an input signal being encoded; applying the transform operation to data from each of the blocks of signal data to generate initial transformed coefficients; and quantizing the initial transformed coefficients to generate the set of transformed coefficients.
  • 4. The method of claim 2, further comprising: encoding the modified set of coefficient bits using one or more of entropy encoding and run-length encoding to generate the encoded bitstream.
  • 5. The method of claim 2, wherein the representation of the input signal at a lower level of quality comprises a representation of the input signal at a lower resolution.
  • 6. The method of claim 5, wherein the encoded bitstream is an encoded enhancement bitstream to enhance an encoded base bitstream, the encoded base bitstream being an encoded representation of the input signal at a lower level of quality.
  • 7. The method of claim 2, wherein the blocks of signal data comprise n by n blocks of signal data, and the transform operation implements a matrix multiplication applied to flattened vectors of length n² representing the blocks of signal data and wherein the matrix multiplication comprises a multiplication with an n² by n² Hadamard matrix.
  • 8. The method of claim 2, wherein the transform operation outputs values for a set of data elements for each block of signal data, and the coefficient bits represent transformed coefficient values for a predefined one of the set of data elements.
  • 9. The method of claim 2, comprising: obtaining custom data to add to the encoded signal bitstream; obtaining a parameter indicating a bit length for user data values, the bit length indicating the predefined number of bits for the bit shift operation; and pre-processing the custom data to generate a bitstream of custom data values, each value being represented in the bitstream by a group of bits of the bit length.
  • 10. The method of claim 2, wherein the custom data comprises data associated with defined specific locations within the input signal, and wherein the method comprises: pre-processing the custom data to assign custom data values to specific blocks of the signal data based on the defined specific locations within the input signal, wherein applying the bit shift operation and copying the userdata bits is performed for at least the specific blocks of the signal data.
  • 11. An encoder configured to perform the method according to claim 2.
  • 12. A method of decoding signal data, the method comprising: obtaining an encoded bitstream; decoding the encoded bitstream to obtain an initial set of coefficient bits representing values for a set of transformed coefficients, the values being generated during encoding by applying at least a transform operation to blocks of signal data derived from an input signal; extracting userdata from a set of end bits of the initial set of coefficient bits; applying a bit shift operation to the initial set of coefficient bits, the bit shift operation being in a direction that is opposite to a direction of a bit shift operation applied during encoding, the bit shift operation generating a reconstructed set of coefficient bits; and instructing further decoding of the reconstructed set of coefficient bits, the further decoding comprising applying at least an inverse transform operation to values represented by the reconstructed set of coefficient bits, wherein the further decoding is used to generate a reconstruction of the input signal, wherein the reconstructed set of coefficient bits comprise transformed residual data, and the method further comprises: instructing a combination of residual data obtained from the further decoding of the reconstructed set of coefficient bits with a reconstruction of the input signal generated from a representation of the input signal at a lower level of quality to generate a reconstruction of the input signal at a first level of quality, wherein the encoded bitstream is an encoded enhancement bitstream for a first enhancement sub-layer at the first level of quality and the method further comprises: obtaining an encoded enhancement bitstream for a second enhancement sub-layer at a second level of quality; decoding, without applying a bit shift operation and extracting user data, the encoded enhancement bitstream for the second enhancement sub-layer to obtain a second set of residual data for the second level of quality; instructing a combination of the second set of residual data with a reconstruction at the second level of quality derived from the reconstruction of the input signal at the first level of quality to generate a reconstruction of the input signal at the second level of quality.
  • 13. The method of claim 12, wherein decoding the encoded bit stream comprises one or more of entropy decoding and run-length decoding and wherein further decoding of the reconstructed set of coefficient bits comprises applying an inverse quantization operation prior to the inverse transform operation.
  • 14. The method of claim 12, wherein the encoded bitstream is an encoded enhancement bitstream to enhance an encoded base bitstream, the reconstruction of the input signal being derived from a decoding of the encoded base bitstream.
  • 15. The method of claim 12, wherein extracting userdata comprises: obtaining a parameter indicating a number of bits used for user data values; obtaining bit values for a set of bits that are located at one end of the initial set of coefficient bits, the set of bits being added during the bit shift operation applied during encoding; and post-processing the bit values to reconstruct a set of user data values, wherein the size of the bit shift operation is set by the number of bits, D, used for user data values.
  • 16. The method of claim 12, wherein the transform operation outputs values for a set of data elements for each block of signal data, and the coefficient bits represent transformed coefficient values for a predefined one of the set of data elements.
  • 17. The method of claim 16, wherein the initial set of coefficient bits represent values for the predefined one of the set of data elements for different n by n blocks of signal data, and the inverse transform operation implements a matrix multiplication applied to vectors of the set of data elements to regenerate the blocks of signal data and wherein the matrix multiplication comprises a multiplication with an n² by n² Hadamard matrix.
  • 18. The method of claim 17, comprising: associating extracted userdata with specific locations in the reconstruction of the input signal based on the location of the blocks of signal data with respect to the input signal, wherein the bit shift operation applied to the initial set of coefficient bits is a right shift.
  • 19. The method of claim 12, wherein the input signal comprises a video signal, and the method is applied for blocks of data for at least one color plane associated with frames of the video signal.
  • 20. A decoder configured to perform the method according to claim 12.
  • 21. A bitstream comprising: an encoded enhancement bitstream for a first enhancement sub-layer at a first level of quality comprising: a modified set of coefficient bits carrying custom data and an encoding of signal data, the modified set of coefficient bits being derived from: an initial set of coefficient bits representing values for a set of transformed coefficients, the values being generated during encoding by applying at least a transform operation to blocks of signal data derived from an input signal; and userdata bits at a set of end bits of the initial set of coefficient bits, said userdata bits representing said custom data; wherein the initial set of coefficient bits are useable to obtain a first set of residual data configured to be combined with a reconstruction of the input signal generated from a representation of the input signal at a lower level of quality to generate a reconstruction of the input signal at a first level of quality, and a further encoded enhancement bitstream for a second enhancement sub-layer at a second level of quality, the second level of quality being higher than the first level of quality, the further encoded enhancement bitstream comprising: a set of coefficient bits without userdata, the set of coefficient bits without userdata representing values for a set of transformed coefficients at a second level of quality, the values being generated by applying at least a transform operation to blocks of signal data at the second level of quality, wherein the set of coefficient bits is generated without applying a bit shift operation; wherein the set of coefficient bits without user data are useable to obtain a second set of residual data for the second level of quality for combining with a reconstruction at the second level of quality derived from the reconstruction of the input signal at the first level of quality to generate a reconstruction of the input signal at the second level of quality.
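Claims 2, 9, 12, 15 and 18 together describe a round trip: custom data is split into D-bit user data values, each coefficient is shifted left by D bits and the added end bits are set from the user data, and the decoder reads those end bits before applying a right shift to restore the coefficients. The following is a minimal Python sketch under stated assumptions: user data is packed most-significant-bit first, unused slots are zero-filled, the byte count is known to the decoder, and Python's unbounded integers stand in for a codec's fixed-width coefficient storage (where the left shift could overflow). All function names are hypothetical. Per claims 2 and 12, only the first enhancement sub-layer would carry user data; coefficients of the second sub-layer would bypass `embed` and `extract` entirely.

```python
from typing import List, Tuple


def pack_custom_data(custom: bytes, d: int) -> List[int]:
    """Pre-processing: split custom data into d-bit user data values,
    most significant bit first, zero-padding the final group (assumed)."""
    bits = "".join(f"{b:08b}" for b in custom)
    if len(bits) % d:
        bits += "0" * (d - len(bits) % d)
    return [int(bits[i:i + d], 2) for i in range(0, len(bits), d)]


def embed(coeffs: List[int], userdata: List[int], d: int) -> List[int]:
    """Encoder: shift each coefficient left by d bits, then set the d added
    low-order bits from the user data (zero where no user data remains)."""
    if len(userdata) > len(coeffs):
        raise ValueError("more user data values than coefficients")
    padded = userdata + [0] * (len(coeffs) - len(userdata))
    out = []
    for c, u in zip(coeffs, padded):
        if not 0 <= u < (1 << d):
            raise ValueError(f"user data value {u} does not fit in {d} bits")
        out.append((c << d) | u)
    return out


def extract(modified: List[int], d: int) -> Tuple[List[int], List[int]]:
    """Decoder: read the d end bits as user data, then shift right by d
    (the opposite direction to the encoder) to restore the coefficients."""
    mask = (1 << d) - 1
    return [m >> d for m in modified], [m & mask for m in modified]


def unpack_custom_data(values: List[int], d: int, n_bytes: int) -> bytes:
    """Post-processing: concatenate the d-bit values and rebuild the first
    n_bytes bytes; n_bytes is assumed to be known to the decoder."""
    bits = "".join(f"{v:0{d}b}" for v in values)
    if len(bits) < 8 * n_bytes:
        raise ValueError("not enough user data bits")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * n_bytes, 8))


coeffs = [5, -3, 0, 7, 2, -1, 4, 9]
userdata = pack_custom_data(b"\xa5", 2)       # [2, 2, 1, 1]
modified = embed(coeffs, userdata, 2)
restored, recovered = extract(modified, 2)
assert restored == coeffs
assert unpack_custom_data(recovered, 2, 1) == b"\xa5"
```

Negative coefficients round-trip here because Python's `>>` is an arithmetic (flooring) shift and `&` behaves as on an infinitely sign-extended two's-complement value; an implementation using fixed-width or sign-magnitude coefficients would need to handle the sign explicitly.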
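Claims 7 and 17 describe the transform as a multiplication of a flattened n by n block (a vector of length n²) by an n² by n² Hadamard matrix. A minimal Python sketch follows. It assumes the Sylvester construction and its natural row order, which a particular codec may order or scale differently, and it omits the quantization of claim 3; the function names are hypothetical. Because the Sylvester matrix H is symmetric with H·H = n²·I, the inverse transform is a second multiplication by H followed by division by n², which is exact for integer input.

```python
from typing import List

Matrix = List[List[int]]


def hadamard(size: int) -> Matrix:
    """Sylvester construction of a size x size Hadamard matrix (size a power of 2)."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    h = [[1]]
    while len(h) < size:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def transform_block(block: Matrix) -> List[int]:
    """Flatten an n x n block row by row into a vector of length n^2 and
    multiply it by the n^2 x n^2 Hadamard matrix."""
    n = len(block)
    v = [x for row in block for x in row]
    h = hadamard(n * n)
    return [sum(h[i][j] * v[j] for j in range(n * n)) for i in range(n * n)]


def inverse_transform(coeffs: List[int], n: int) -> Matrix:
    """H is symmetric and H @ H = n^2 * I, so multiplying by H again and
    dividing by n^2 recovers the block exactly for integer input."""
    h = hadamard(n * n)
    v = [sum(h[i][j] * coeffs[j] for j in range(n * n)) // (n * n)
         for i in range(n * n)]
    return [v[r * n:(r + 1) * n] for r in range(n)]


coeffs = transform_block([[1, 2], [3, 4]])   # [10, -2, -4, 0]
assert inverse_transform(coeffs, 2) == [[1, 2], [3, 4]]
```

For n = 2 each block yields four coefficients; per claims 8 and 16, the user data bits would be carried in the coefficient values of one predefined element of that set.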
Priority Claims (3)
Number Date Country Kind
1915553.0 Oct 2019 GB national
2000430.5 Jan 2020 GB national
2001408.0 Jan 2020 GB national
Parent Case Info

The present application is a continuation of U.S. patent application Ser. No. 17/770,114, filed Apr. 19, 2022, which is a continuation of U.S. patent application Ser. No. 17/164,422, filed Feb. 1, 2021, which is a continuation of U.S. patent application Ser. No. 16/078,352, filed Aug. 21, 2018, which is a 371 US National Stage Entry of PCT/GB2017/050584, filed Mar. 3, 2017, which claims priority to UK Patent Application No. 1603727.7, filed Mar. 3, 2016, the entire disclosures of which are incorporated herein by reference.

Provisional Applications (1)
Number Date Country
62984261 Mar 2020 US
Continuations (1)
Number Date Country
Parent 17770114 Apr 2022 US
Child 18739998 US