This invention relates generally to transmitting and receiving a composite image and in the most important example to video broadcasting systems and notably to a framework which enables the transmission of additional information useful for post production editing and/or composition of video sequences. With this framework, flexibility in content production can be achieved in the context of digital video broadcasting.
Embodiments of this invention are directed to the digital video broadcasting area, which aims at delivering video content through a broadcasting chain that consists broadly of four phases: video content production, post-production editing, video content transmission and receiver reception with possible further processing. During the post-production editing and receiver side processing phases, a video is manipulated in order to enhance its quality, insert or delete some image areas, compose it with other videos, etc. Moreover, at the receiver side, some processing may also be performed to embed secondary streams which carry additional information for a particular audience. An example of this additional information is a sign language interpreter video that helps deaf people to follow broadcast programs. The processing carried out during the aforementioned manipulations may require some information which needs to be shared among the different parties involved in the broadcasting delivery chain. Therefore, it is important to provide an efficient representation of this information to allow flexibility in the content manipulation and transmission at affordable bandwidth.
One example of such information needed for post-production and/or receiver side processing is a transparency mask represented by the so-called alpha channel. An alpha channel is a signal associated with a particular video content and is typically used to compose different videos together or to insert objects in a video. It should be noted however that the transparency mask of this invention may encompass an alpha channel of any form. In particular, an alpha channel may be represented as a video sequence with the same number of frames as the associated video content, in which each frame has the same width and height as the frames of that video content. Each pixel in an alpha channel signal assumes a value in the range [vmin, vmax] which represents the degree of opacity (or equivalently the degree of transparency) for that particular pixel. An example of one frame for a particular alpha channel is shown in
It is an objective of the present invention to enable the transmission of information useful for video editing and post production processing performed at different stages of one typical video broadcasting delivery chain.
In one aspect the present invention consists in a method of transmitting, in a video sequence of images, a composite image comprising at least a foreground image and a transparency mask, comprising the steps of encoding the foreground image; determining whether the transparency mask is the same as a transparency mask of a preceding image in the video sequence; where the transparency mask is not the same as a transparency mask of a preceding image, encoding the transparency mask as an image; and transmitting the encoded foreground image and any encoded transparency mask together with a flag signifying whether the encoded transparency mask for a preceding image is to be used in association with the encoded foreground image of the current image.
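By way of illustration only, the decision made at the transmitter for each image can be sketched as follows. The function name `plan_mask_transmission` and the boolean `reuse_previous_mask` are hypothetical and stand in for whatever flag syntax is actually used; masks are assumed to be plain lists of pixel rows, and the encoding of the foreground image and of the mask itself is outside the sketch.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Mask = List[List[int]]  # assumed representation: rows of pixel values


def plan_mask_transmission(
    masks: Iterable[Mask],
) -> Iterator[Tuple[bool, Optional[Mask]]]:
    """Yield (reuse_previous_mask, mask_to_encode) for each image in order.

    reuse_previous_mask is a hypothetical boolean standing in for the
    transmitted flag; mask_to_encode is None when no mask data is sent.
    """
    previous: Optional[Mask] = None
    for mask in masks:
        if previous is not None and mask == previous:
            # Mask unchanged: only the flag is sent, no mask payload.
            yield True, None
        else:
            # First mask, or mask changed: it must be encoded as an image.
            yield False, mask
        previous = mask
```

For a sequence of three masks in which the second repeats the first, the sketch yields one mask to encode, then a reuse flag with no payload, then a new mask to encode.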
Preferably, the method further comprises the step of transmitting a flag signifying whether or not the encoded transparency mask is to be decoded as a binary transparency mask in which each pixel can take only two values. Pixel values in a transparency mask may be compared with a threshold to derive a binary transparency mask. Clipping values may be signalled to a decoder for use in clipping of a decoded binary transparency mask. A binary transparency mask may be encoded by partitioning each mask into a non-overlapping grid of blocks; coding each block by transmitting its pixel value if all the pixels of the block share the same value or a split flag to signal that the block should be further split; and continuing the process recursively. A minimum allowed block size may be determined and the process of block splitting continued recursively until the minimum allowed block size is reached. Blocks with the minimum size which contain pixels with values which are not all equal may be encoded using predictive and entropy coding techniques including Differential Pulse Code Modulation (DPCM).
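The recursive block coding described above may be sketched as follows, purely as an illustration. Several points are assumptions not stated in the text: each split divides a block into four equal quadrants, the mask dimensions are multiples of `block_size`, both sizes are powers of two, and the output is a list of symbolic tuples rather than an actual bitstream. The `"raw"` symbol is only a stand-in for the predictive and entropy coding of minimum-size blocks, which is not implemented here.

```python
from typing import List, Tuple

Mask = List[List[int]]
Symbol = Tuple  # ("uniform", value) | ("split",) | ("raw", [pixels])


def encode_block(mask: Mask, top: int, left: int, size: int,
                 min_size: int, out: List[Symbol]) -> None:
    """Append the symbols describing one square block of the mask."""
    pixels = [mask[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    if all(p == pixels[0] for p in pixels):
        # All pixels share one value: transmit that value only.
        out.append(("uniform", pixels[0]))
    elif size <= min_size:
        # Minimum size reached with mixed values: placeholder for
        # predictive/entropy coding (e.g. DPCM), not implemented here.
        out.append(("raw", pixels))
    else:
        # Signal a split and recurse into four quadrants (assumed layout).
        out.append(("split",))
        half = size // 2
        for d_top, d_left in ((0, 0), (0, half), (half, 0), (half, half)):
            encode_block(mask, top + d_top, left + d_left, half, min_size, out)


def encode_mask(mask: Mask, block_size: int, min_size: int) -> List[Symbol]:
    """Partition the mask into a non-overlapping grid and code each block."""
    out: List[Symbol] = []
    for top in range(0, len(mask), block_size):
        for left in range(0, len(mask[0]), block_size):
            encode_block(mask, top, left, block_size, min_size, out)
    return out
```

In this sketch a large uniform region costs a single symbol, which reflects why the approach suits masks made of a few flat areas separated by sharp edges.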
Suitably, the method further comprises the step of transmitting the encoded foreground image together with compositing information such as the size or location of the foreground image in the composite image. The compositing information may include the colour of pixels forming a frame of the composite image.
In another aspect, the present invention consists in a method of decoding a composite image, comprising the steps of receiving an encoded foreground image and any encoded transparency mask together with a flag; decoding the encoded foreground image; and where indicated by said flag, using the foreground image in association with the transparency mask for a preceding image in forming a composite image.
Preferably, the method comprises the further steps of, where indicated by a flag, decoding the encoded transparency mask as a binary transparency mask in which each pixel can take only two values; and using the foreground image in association with the binary transparency mask in forming a composite image. The step of decoding the encoded transparency mask as a binary transparency mask may comprise a decoding step to produce a preliminary transparency mask in which pixels are not constrained to take only two values; and a clipping step to produce a binary transparency mask in which pixels are constrained to take only two values. The clipping step may utilise clipping values signalled to the decoder by an encoder.
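The two-stage decoding above (a preliminary mask followed by a clipping step) may be illustrated by the following sketch of the clipping step alone. It assumes a single signalled threshold and assumes that values at or above the threshold map to the opaque value; neither the comparison direction nor the function name `clip_to_binary` is taken from the text, and other clipping operations are possible.

```python
from typing import List

Mask = List[List[int]]


def clip_to_binary(decoded: Mask, threshold: int,
                   value_transparent: int, value_opaque: int) -> Mask:
    """Map every pixel of a preliminary decoded mask to one of two values.

    Assumption: pixels >= threshold become opaque, all others transparent.
    """
    return [
        [value_opaque if pixel >= threshold else value_transparent
         for pixel in row]
        for row in decoded
    ]
```

With a threshold of 128 and output values 0 and 255, a decoded row such as `[3, 250]` is restored to `[0, 255]`, removing the small deviations introduced by lossy coding.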
Suitably, the method further comprises the step of receiving an encoded foreground image together with compositing information; and using the foreground image in accordance with the compositing information to form a composite image. The foreground image may be scaled according to size information in the compositing information. The foreground image may be positioned in the composite image according to position information in the compositing information. A frame of the composite image may assume a colour specified by the compositing information.
The composite image may form part of a video sequence of images in which coded data relating to the transparency mask are transmitted as a secondary picture in the same access unit as the coded data relating to the foreground image forming the primary coded picture. The foreground image and transparency mask may be encoded according to a video coding standard such as H.264/AVC or HEVC. Each flag may be represented in the Sequence Parameter Set (SPS) header syntax element of the H.264/AVC or HEVC standard.
Compositing information may be organised in a Supplemental Enhancement Information (SEI) message as specified by the H.264/AVC and HEVC standards. The information contained in the SEI message for the purpose of frame composition may persist only for the time instant at which the SEI message is received, or may persist until a new SEI message is received.
In the following description, the term alpha channel will be used to describe an example of a transparency mask.
According to one arrangement, a video sequence corresponding to the main broadcast program is divided into frames which are encoded using motion compensated predictive video coding techniques standardised by H.264/AVC or the new High Efficiency Video Coding (HEVC) standard. In both the H.264/AVC and HEVC standards, the coded data relative to one frame are organised into access units, each of which contains a set of Network Abstraction Layer (NAL) units. Each NAL unit contains coded data relative to the coded video sequence. These data may be headers relative to video sequence parameters (e.g. frame width and height) or may be data relative to the frame pixels themselves. In order to keep together the main broadcast program and its associated alpha channel, the presence of alpha channel pictures (hereafter also denoted as secondary pictures) is signalled in the same access unit as the coded picture relative to the main broadcast video (hereafter also referred to as the foreground image or primary picture). It may also be useful to signal data for frame composition, for alpha channel processing after decoding, and for post processing of the frame composed using the alpha channel. Finally, there is also provided a simplified coding algorithm for alpha channel signals which assume only two values (vtransparent and vopaque) and are also denoted as binary alpha channels.
The present invention will now be described by way of several examples related to the field of post-production editing and frame composition. These examples involve the use of secondary pictures to embed the alpha channel signal in video bitstreams to ease editing and processing. The examples also use the concept of Supplemental Enhancement Information (SEI) messages, which are syntax elements carrying information useful for video processing. Finally, the examples also provide a simplified encoding algorithm for binary alpha channels which requires lower computational complexity than classic, generic video coding techniques.
In order to keep together the data associated to the primary coded pictures and the alpha channel, it is proposed to signal the presence of the alpha channel compressed data in each access unit relative to the primary picture.
The flag secondary_picture_present specifies whether the coded data of the alpha channel are present in the same access unit as the primary picture. The flag is_binary_secondary_picture specifies whether the transparency mask is a binary picture and therefore can assume only two values (transparent and opaque). The quantity bit_depth_secondary_picture specifies the bit depth for the pixels in the alpha channel. In the case of a binary transparency mask, this quantity is equal to one. The quantity value_opaque_pixels specifies the value for pixels in the alpha channel which are classified as opaque and, correspondingly, the quantity value_transparent_pixels specifies the value for transparent pixels.
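As an informal illustration, the quantities described above can be gathered in a small container with a consistency check. The field names are those used in the description; the class name `SecondaryPictureParams`, the `validate` method and the requirement that the opaque and transparent values differ are assumptions made for this sketch, and no bitstream parsing is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondaryPictureParams:
    """Hypothetical in-memory view of the signalled alpha channel parameters."""
    secondary_picture_present: bool
    is_binary_secondary_picture: bool
    bit_depth_secondary_picture: int
    value_opaque_pixels: int
    value_transparent_pixels: int

    def validate(self) -> None:
        """Raise ValueError if the parameters contradict the description."""
        if self.bit_depth_secondary_picture < 1:
            raise ValueError("bit depth must be at least one")
        if self.is_binary_secondary_picture and self.bit_depth_secondary_picture != 1:
            # Stated in the text: a binary mask has a bit depth equal to one.
            raise ValueError("a binary transparency mask has bit depth one")
        if self.value_opaque_pixels == self.value_transparent_pixels:
            # Assumption of this sketch: the two values must be distinguishable.
            raise ValueError("opaque and transparent values must differ")
```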
In some applications the required alpha channel may assume only binary values, i.e. either αtransparent or αopaque. Examples of such applications are logo insertion, advertisement broadcasting, or the insertion of sign language interpreters in the news to help deaf people to follow the programs. Since only a binary channel is required, the encoding process is simplified by having only two values to transmit. The use of a binary alpha channel is denoted by a flag. An example of a binary alpha channel is depicted in
Given that the alpha channel may be used during the frame composition, the accurate coding of its sharp edges is important. In fact, conventional lossy compression algorithms may smooth and blur the edges of binary alpha channels, resulting in annoying artifacts in the final composed frame. Moreover, from
When the transmitted alpha channel signal is decoded, its values may need to be clipped in order to stay in the range [αtransparent, αopaque]. Moreover, for some video broadcasting applications, although the needed alpha channel is binary, the transmitter may apply some processing to soften/smooth the alpha channel so that its compression can be improved. At the receiver, the decoded alpha channel has to be converted back to a binary one. In this case an appropriate threshold should be applied to the decoded alpha channel values. The needed thresholds can be signalled in the syntax structures of the coded video which carry the information for the sequence level parameters. In one example, the threshold is signalled in the SPS as follows:
The quantity alpha_clipping_type specifies which kind of clipping may be applied to the alpha channel values. Examples of useful clipping operations are depicted in
The flag secondary_picture_status has four values with the following meaning:
The chroma keying technique consists in extracting the pixels from one picture which are different from one specific value (usually referred to as the key) of luminance or any other suitable colour space representation (e.g. red, green and blue). Usually, given the camera noise and other imperfections during the content acquisition process, image pixels which should have the key value present a value slightly different from the key, which may be misinterpreted by the chroma key technique. In order to overcome this drawback, some robust chroma keying methods have been devised in the literature, but these require a significant amount of computational resources. These kinds of chroma keying techniques may not be suitable when the processing has to be performed at the decoder side. Therefore, one alternative approach is to perform the keying at the transmitter side, where the computational resources are less limited, and then set the pixels which are meant to have the key value to exactly this key value. The key is then transmitted together with the video, and at the receiver the chroma keying process is a simple binary classification (background/foreground). Since lossy encoding may be applied to the transmitted image, the pixels with the key value may have a decoded value which differs from the original key. In this case an interval value can be sent so that all the pixel values falling in that interval can still be considered as belonging to the background. In one example, this interval value may be represented by a tolerance value so that a pixel still belongs to the background if D=|V−K|<T, where V is the pixel value, K is the value for the key, T is the tolerance and |•| denotes the absolute value. The value for the key and the interval can be transmitted in the syntax structures of the coded video which carry the information for the sequence level parameters. In one example the syntax structure can be the SPS of the H.264/AVC and HEVC standards as follows:
The flag key_value_present indicates whether the coded video contains pixels with a conventional key value. The quantities key_value_component_1, . . . , key_value_component_n specify the key values for each component of the pixels in the video sequence. Finally, the quantities tolerance_value_component_1, . . . , tolerance_value_component_n specify how much a pixel value may differ from the key and still be considered as belonging to the background.
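The receiver-side classification using the per-component key and tolerance values may be sketched as follows. The test D=|V−K|<T is taken from the description; applying it to every component and requiring all components to pass is an assumption of this sketch, as are the function names and the default output values 0 and 1.

```python
from typing import List, Sequence

Pixel = Sequence[int]  # one value per colour component


def is_background(pixel: Pixel, key: Pixel, tolerance: Pixel) -> bool:
    """Return True if |V - K| < T holds for every component (assumed rule)."""
    return all(abs(v - k) < t for v, k, t in zip(pixel, key, tolerance))


def key_mask(image: List[List[Pixel]], key: Pixel, tolerance: Pixel,
             value_transparent: int = 0, value_opaque: int = 1) -> List[List[int]]:
    """Derive a binary mask: background pixels transparent, others opaque."""
    return [
        [value_transparent if is_background(p, key, tolerance) else value_opaque
         for p in row]
        for row in image
    ]
```

Because the comparison is strict, a component that differs from the key by exactly the tolerance is classified as foreground in this sketch.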
The flag frame_comp_info_persistence_flag specifies whether the current SEI message overwrites the information for frame composition previously received. Depending on its value, the flag may indicate that the information is overwritten only for the frame at the time instant when the SEI message is received, or is overwritten for all the following frames starting from the time instant when the SEI message is received until a new SEI message is received. The quantities composite_frame_background_colour_1, . . . , composite_frame_background_colour_n specify the colour assumed by all the components of the background pixels in the composite frame. The quantities frame_0_offset_left and frame_0_offset_top specify the position in the composite frame of the top left corner of frame 0. Similarly, the quantities frame_1_offset_left and frame_1_offset_top specify the position in the composite frame of the top left corner of frame 1. The quantities frame_0_width and frame_0_height specify the width and height of frame 0 in the composite frame. The quantities frame_1_width and frame_1_height have the same meaning for frame 1.
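A minimal sketch of how a receiver might use the background colour, offset and size quantities to build a composite frame is given below. It handles a single colour component, scales by nearest-neighbour sampling and blends with an optional alpha plane using integer arithmetic; the scaling method, the blending formula, the `alpha_max` value and all function and parameter names are assumptions of the sketch rather than part of the signalled syntax.

```python
from typing import List, Optional

Plane = List[List[int]]  # one colour component, rows of pixel values


def resize_nearest(plane: Plane, width: int, height: int) -> Plane:
    """Scale a plane to width x height by nearest-neighbour sampling."""
    src_h, src_w = len(plane), len(plane[0])
    return [[plane[r * src_h // height][c * src_w // width]
             for c in range(width)]
            for r in range(height)]


def place(canvas: Plane, frame: Plane, offset_left: int, offset_top: int,
          width: int, height: int, alpha: Optional[Plane] = None,
          alpha_max: int = 255) -> None:
    """Scale a frame and blend it into the canvas at the given position."""
    scaled = resize_nearest(frame, width, height)
    scaled_alpha = resize_nearest(alpha, width, height) if alpha is not None else None
    for r in range(height):
        for c in range(width):
            y, x = offset_top + r, offset_left + c
            if not (0 <= y < len(canvas) and 0 <= x < len(canvas[0])):
                continue  # ignore pixels falling outside the composite frame
            a = scaled_alpha[r][c] if scaled_alpha is not None else alpha_max
            # Weighted mix of the frame pixel and what is already on the canvas.
            canvas[y][x] = (a * scaled[r][c] + (alpha_max - a) * canvas[y][x]) // alpha_max


def compose(canvas_width: int, canvas_height: int, background: int,
            placements: List[dict]) -> Plane:
    """Fill the composite frame with the background colour, then place frames."""
    canvas = [[background] * canvas_width for _ in range(canvas_height)]
    for placement in placements:
        place(canvas, **placement)
    return canvas
```

Each entry of `placements` is a dictionary of keyword arguments for `place`, for example `{"frame": f0, "offset_left": 0, "offset_top": 0, "width": 2, "height": 2}`; frames are drawn in list order, so later entries are blended over earlier ones.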
Number | Date | Country | Kind |
---|---|---|---|
1306208.8 | Apr 2013 | GB | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2014/051011 | 3/31/2014 | WO | 00 |