The disclosed subject matter relates to systems, methods, and media for providing interactive video using scalable video coding.
Digital video systems have become widely used for purposes ranging from entertainment to video conferencing. Many digital video systems must provide different video signals to different recipients, which can be quite a complex process.
For example, traditionally, when different content is desired to be provided to different recipients, a separate video encoder would need to be provided for each recipient. In this way, the video for that recipient would be encoded for that user by the corresponding encoder. Dedicated encoders for individual users may be prohibitively expensive, however, both in terms of processing power and bandwidth.
Similarly, in order to facilitate interactive video for an end user, it is commonly required to use a different encoder for each state of the video. For example, this may be the case with real-time on-screen menus that may have different combinations of on-screen elements, closed captions and translations that may be provided in different languages, Video On Demand (VOD) that can provide different levels of content, etc. In each of these types of products, an end user may desire to interactively switch the content that is being received, and hence change what content needs to be encoded for that user.
Accordingly, it is desirable to provide mechanisms for controlling video signals.
Systems, methods, and media for providing interactive video using scalable video coding are provided. In some embodiments, systems for providing interactive video using scalable video coding are provided, the systems comprising: at least one microprocessor programmed to at least: provide at least one scalable video coding capable encoder that at least: receives at least a base content sequence and a plurality of mutually exclusive added content sequences that have different content from the base content sequence; produces a first scalable video coding compliant stream that includes at least a basic layer, that corresponds to the base content sequence, and a first mutually exclusive enhancement layer, that corresponds to content in a first of the plurality of mutually exclusive added content sequences; and produces at least a second mutually exclusive enhancement layer, that corresponds to content in a second of the plurality of mutually exclusive added content sequences; and perform multiplexing of the first scalable video coding compliant stream and the second mutually exclusive enhancement layer to provide a second stream.
In some embodiments, methods for providing interactive video using scalable video coding are provided, the methods comprising: receiving at least a base content sequence and a plurality of mutually exclusive added content sequences that have different content from the base content sequence; producing a first scalable video coding compliant stream that includes at least a basic layer, that corresponds to the base content sequence, and a first mutually exclusive enhancement layer, that corresponds to content in a first of the plurality of mutually exclusive added content sequences; producing at least a second mutually exclusive enhancement layer, that corresponds to content in a second of the plurality of mutually exclusive added content sequences; and performing multiplexing of the first scalable video coding compliant stream and the second mutually exclusive enhancement layer to provide a second stream.
In some embodiments, computer-readable media encoded with computer-executable instructions that, when executed by a microprocessor programmed with the instructions, cause the microprocessor to perform a method for providing interactive video using scalable video coding are provided, the method comprising: receiving at least a base content sequence and a plurality of mutually exclusive added content sequences that have different content from the base content sequence; producing a first scalable video coding compliant stream that includes at least a basic layer, that corresponds to the base content sequence, and a first mutually exclusive enhancement layer, that corresponds to content in a first of the plurality of mutually exclusive added content sequences; producing at least a second mutually exclusive enhancement layer, that corresponds to content in a second of the plurality of mutually exclusive added content sequences; and performing multiplexing of the first scalable video coding compliant stream and the second mutually exclusive enhancement layer to provide a second stream.
a is a diagram of an SVC-capable encoder in accordance with some embodiments of the disclosed subject matter.
b is a diagram of another SVC-capable encoder in accordance with some embodiments of the disclosed subject matter.
c is a diagram of yet another SVC-capable encoder in accordance with some embodiments of the disclosed subject matter.
a is a diagram illustrating the combination of basic and enhancement layers in accordance with some embodiments of the disclosed subject matter.
b is another diagram illustrating the combination of basic and enhancement layers in accordance with some embodiments of the disclosed subject matter.
a is a diagram showing contents of two SVC streams and a non-SVC-compliant stream produced by multiplexing the two SVC streams in accordance with some embodiments of the disclosed subject matter.
b is a diagram of how the contents of the non-SVC-compliant stream of
Systems, methods, and media for providing interactive video using scalable video coding are provided. In accordance with various embodiments, two or more video signals can be provided to a scalable video coding (SVC)-capable encoder so that a basic layer and one or more enhancement layers are produced by the encoder. The basic layer can be used to provide base video content and the enhancement layer(s) can be used to modify that base video content with added video content. By controlling when the enhancement layer(s) are available (e.g., by concealing corresponding packets, by selecting corresponding packets, etc.), the availability of the added video content to a video display can be controlled.
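As a minimal sketch of controlling availability by concealing packets, the following Python fragment drops packets that belong to concealed enhancement layers while always passing the basic layer. The `Packet` type, the `layer_id` numbering (0 for the basic layer), and the function name are hypothetical and chosen only for illustration; an actual SVC stream identifies layers through its own syntax elements.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Packet:
    layer_id: int   # hypothetical numbering: 0 = basic layer, 1+ = enhancement layers
    payload: bytes


def conceal_layers(packets: Iterable[Packet], concealed: Set[int]) -> List[Packet]:
    """Return the packets that remain after concealing the given enhancement layers.

    Basic-layer packets (layer_id 0) are always kept so the base content
    stays decodable.
    """
    return [p for p in packets if p.layer_id == 0 or p.layer_id not in concealed]


# Illustrative stream: base content plus one enhancement layer (e.g., a logo).
stream = [
    Packet(0, b"base-0"),
    Packet(1, b"logo-0"),
    Packet(0, b"base-1"),
    Packet(1, b"logo-1"),
]
visible = conceal_layers(stream, concealed={1})
```

With layer 1 concealed, only the two basic-layer packets remain; with an empty `concealed` set, the stream passes through unchanged.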
A scalable video protocol may include any video compression protocol that allows decoding of different representations of video from data encoded using that protocol. The different representations of video may include different resolutions (spatial scalability), frame rates (temporal scalability), bit rates (SNR scalability), portions of content, and/or any other suitable characteristic. Different representations may be encoded in different subsets of the data, or may be encoded in the same subset of the data, in different embodiments. For example, some scalable video protocols may use layering that provides one or more representations (such as a high resolution image of a user, or an on-screen graphic) of a video signal in one layer and one or more other representations (such as a low resolution image of the user, or a non-graphic portion) of the video signal in another layer. As another example, some scalable video protocols may split up a data stream (e.g., in the form of packets) so that different representations of a video signal are found in different portions of the data stream. Examples of scalable video protocols may include the Scalable Video Coding (SVC) protocol defined by the Scalable Video Coding Extension of the H.264/AVC Standard (Annex G) from the International Telecommunication Union (ITU), the MPEG2 protocol defined by the Motion Picture Experts Group, the H.263 (Annex O) protocol from the ITU, and the MPEG4 part 2 FGS protocol from the Motion Picture Experts Group, each of which is hereby incorporated by reference herein in its entirety.
Turning to
Base content sequence 102 can be any suitable video signal containing any suitable content. For example, in some embodiments, base content sequence 102 can be video content that is fully or partially in a low-resolution format. This low-resolution video content may be suitable as a teaser to entice a viewer to purchase a higher resolution version of the content, as a more particular example. As another example, in some embodiments, base content sequence 102 can be video content that is fully or partially distorted to prevent complete viewing of the video content. As another example, in some embodiments, base content sequence 102 can be video content that is missing text (such as closed captioning, translations, etc.) or graphics (such as logos, icons, advertisements, etc.) that may be desirable for some viewers.
Added content sequence(s) 104 can be any suitable content that provides a desired total content sequence. For example, when base content sequence 102 includes low-resolution content, added content sequence(s) 104 can be a higher resolution sequence of the same content. As another example, when base content sequence 102 is video content that is missing desired text or graphics, added content sequence(s) 104 can be the video content with the desired text or graphics.
Additionally or alternatively, in some embodiments, added content sequence(s) 104 can be any suitable content that provides a desired portion of a content sequence. For example, when base content sequence 102 includes a television program, added content sequences 104 can include closed captioning content in different languages (e.g., one sequence 104 is in English, one sequence 104 is in Spanish, etc.).
In some embodiments, the resolution and other parameters of the base content sequence and added content sequence(s) can be identical. In some embodiments, where added content is restricted to a small part of a display screen (e.g., as in the case of a logo or a caption), it may be beneficial to position the content in the added content sequence so that it is aligned to macroblock (MB) boundaries. This may improve the visual quality of the one or more enhancement layers encoded by the SVC encoder.
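The alignment described above can be illustrated with a short Python sketch that computes the smallest macroblock-aligned rectangle enclosing an overlay such as a logo. It assumes the 16x16 luma macroblock size of H.264/AVC; the function name and the `(x, y, width, height)` convention are hypothetical.

```python
MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 luma samples


def align_region_to_macroblocks(x, y, width, height, mb=MB_SIZE):
    """Return the smallest (x, y, width, height) rectangle that encloses the
    given region and whose edges all fall on macroblock boundaries."""
    left = (x // mb) * mb                    # round left edge down
    top = (y // mb) * mb                     # round top edge down
    right = -(-(x + width) // mb) * mb       # round right edge up
    bottom = -(-(y + height) // mb) * mb     # round bottom edge up
    return left, top, right - left, bottom - top


# A 100x30 logo placed at (37, 21) expands to a 112x48 region at (32, 16).
aligned = align_region_to_macroblocks(37, 21, 100, 30)
```

Placing the added content inside such an aligned region means the difference from the base content is confined to whole macroblocks rather than spilling partially into neighboring ones.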
SVC-capable encoder 106 can be any suitable SVC-capable encoder for providing an SVC stream, or can include more than one SVC-capable encoder that each provide an SVC stream. For example, in some embodiments, SVC-capable encoder 106 can implement a layered approach (similar to Coarse Grained Scalability) in which two layers are defined (basic and enhancement), the spatial resolution factor is set to one, intra prediction is applied only to the basic layer, the quantization error between a low-quality sequence and a higher-quality sequence is encoded using residual coding, and motion data, up-sampling, and/or other trans-coding is not performed. As another example, SVC-capable encoder 106 (and sub-encoders 261 and 281 of
Such an SVC encoder can be implemented in any suitable hardware in accordance with some embodiments. For example, such an SVC encoder can be implemented in a special purpose computer or a general purpose computer programmed to perform the functions of the SVC encoder. As another example, an SVC encoder can be implemented in dedicated hardware that is configured to provide such an encoder. This dedicated hardware can be part of a larger device or system, or can be the primary component of a device or system. Such a special purpose computer, general purpose computer, or dedicated hardware can be implemented using any suitable components. For example, these components can include a processor (such as a microprocessor, microcontroller, digital signal processor, programmable gate array, etc.), memory (such as random access memory, read only memory, flash memory, etc.), interfaces (such as computer network interfaces, etc.), displays, input devices (such as keyboards, pointing devices, etc.), etc.
As mentioned above, SVC-capable encoder 106 can provide SVC stream 108, which can include basic layer 110 and one or more enhancement layers 112. The basic layer, when decoded, can provide the signal in base content sequence 102. The one or more enhancement layers 112, when decoded, can provide any suitable content that, when combined with basic layer 110, can be used to provide a desired video content. Decoding of the SVC stream can be performed by any suitable SVC decoder, and the basic layer can be decoded by any suitable Advanced Video Coding (AVC) decoder in some embodiments.
While
Turning to
Data from motion compensation and intra prediction process 202 can then be used by inter-layer prediction techniques 220, along with added content sequence 104, to drive motion compensation and prediction mechanism 212. Any suitable data from motion compensation and intra prediction mechanism 202 can be used. Any suitable SVC inter-layer prediction techniques 220 and any suitable SVC motion compensation and intra prediction processes in mechanism 212 can be used. A residual texture signal 214 (produced by motion compensation and intra prediction mechanisms 212) may then be quantized and provided together with the motion signal 216 to entropy coding mechanism 218. Entropy coding mechanism 218 may then perform any suitable entropy coding function and provide the resulting signal to multiplexer 210. Multiplexer 210 can then combine the resulting signals from entropy coding mechanisms 208 and 218 as an SVC compliant stream 108.
Side information can also be provided to encoder 106 in some embodiments. This side information can identify, for example, a region of an image where content corresponding to a difference between the base content sequence and an added content sequence is located (e.g., where a logo or text may be located). In some embodiments, side information can additionally or alternatively identify the content (e.g., closed caption data in English, closed caption data in Spanish, etc.) that is in each enhancement layer. The side information can then be used in a mode decision step within block 212 to determine whether to process the added content sequence or not.
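One way such a mode decision might use region side information is sketched below in Python: only macroblocks that overlap the signaled region are selected for enhancement-layer processing, and all others are skipped. This is an assumption about how the decision could be made, not the method of block 212; the function name, the rectangle convention, and the 16x16 macroblock size are illustrative.

```python
def macroblocks_to_process(frame_w, frame_h, region, mb=16):
    """Return (column, row) indices of macroblocks that overlap the region.

    `region` is a hypothetical (x, y, width, height) rectangle taken from the
    side information, marking where added content differs from base content.
    """
    x, y, w, h = region
    selected = []
    for mb_y in range(0, frame_h, mb):
        for mb_x in range(0, frame_w, mb):
            # Standard axis-aligned rectangle overlap test.
            overlaps = (mb_x < x + w and x < mb_x + mb and
                        mb_y < y + h and y < mb_y + mb)
            if overlaps:
                selected.append((mb_x // mb, mb_y // mb))
    return selected


# A 64x32 frame with a 20x8 caption region starting at (20, 4).
caption_mbs = macroblocks_to_process(64, 32, (20, 4, 20, 8))
```

Here only the two macroblocks in the top row that the caption touches are selected; the remaining six can be left to the basic layer.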
Turning to
Data from motion compensation and intra prediction processes 252 and 253 can then be used by inter-layer prediction techniques 270 and 290 (respectively), along with added content sequences 104, to drive motion compensation and prediction mechanisms 262 and 282 (respectively). Any suitable data from motion compensation and intra prediction mechanisms 252 and 253 can be used. Any suitable SVC inter-layer prediction techniques 270 and 290 and any suitable SVC motion compensation and intra prediction processes in mechanisms 262 and 282 can be used. Residual texture signals 264 and 284 (produced by motion compensation and intra prediction mechanisms 262 and 282, respectively) may then be quantized and provided together with the motion signals 266 and 286 to entropy coding mechanisms 268 and 288, respectively. Entropy coding mechanisms 268 and 288 may then perform any suitable entropy coding function and provide the resulting signal to multiplexers 260 and 280, respectively. Multiplexers 260 and 280 can then combine the resulting signals from entropy coding mechanisms 258 and 268 and entropy coding mechanisms 258 and 288 as SVC compliant streams 294 and 296. These SVC compliant streams can then be provided to multiplexer 292, which can produce a non-SVC compliant stream 295.
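The final multiplexing step can be sketched in Python as follows. The example merges two streams that share the same basic layer but carry different enhancement layers, keeping a single copy of each basic-layer packet. The `LayerPacket` type, the layer labels, and the frame-ordered interleaving are assumptions made for illustration and do not represent an actual SVC bitstream format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerPacket:
    frame: int      # frame index the packet belongs to
    layer: str      # hypothetical label: "base", "enh1", "enh2", ...
    payload: bytes


def multiplex(stream_a: List[LayerPacket], stream_b: List[LayerPacket]) -> List[LayerPacket]:
    """Combine two streams frame by frame, dropping repeated base packets."""
    combined = []
    seen_base_frames = set()
    # sorted() is stable, so within a frame stream_a packets precede stream_b packets.
    for pkt in sorted(stream_a + stream_b, key=lambda p: p.frame):
        if pkt.layer == "base":
            if pkt.frame in seen_base_frames:
                continue  # base content for this frame is already present
            seen_base_frames.add(pkt.frame)
        combined.append(pkt)
    return combined


first = [LayerPacket(0, "base", b"B0"), LayerPacket(0, "enh1", b"E1-0"),
         LayerPacket(1, "base", b"B1"), LayerPacket(1, "enh1", b"E1-1")]
second = [LayerPacket(0, "base", b"B0"), LayerPacket(0, "enh2", b"E2-0"),
          LayerPacket(1, "base", b"B1"), LayerPacket(1, "enh2", b"E2-1")]
merged = multiplex(first, second)
```

The merged list carries one basic layer and both enhancement layers per frame, so a downstream component can pick whichever enhancement layer it needs.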
In some embodiments, rather than using two sub-encoders 261 and 281, as shown in
In some embodiments, the quantization levels used by motion compensation and intra prediction mechanisms 252, 253, 262, and 282 are all identical.
In some embodiments, multiplexer 292 can include a mechanism to prevent duplicate base content from streams 294 or 296 from being in stream 295 as described further in connection with
Controller 310, or a similar mechanism in a network component, display, endpoint, etc., may use any suitable software and/or hardware to control which enhancement layers are presented and/or which packets of an SVC stream are concealed. For example, these devices may include a digital processing device that may include one or more of a microprocessor, a processor, a controller, a microcontroller, a programmable logic device, and/or any other suitable hardware and/or software for controlling which enhancement layers are presented and/or which packets of an SVC stream are concealed.
In some embodiments, controller 310 can be omitted.
Such a video distribution system, as described in connection with
Turning to
Additionally or alternatively to providing three SVC streams, a single stream may be generated and only selected portions (e.g., packets) utilized at each of video displays 312, 314, and 316. The selection of portions may be performed at the displays or at a component between the encoder and the displays as described above in some embodiments.
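A minimal Python sketch of this single-stream approach is given below. Each display receives the basic layer plus at most one selected enhancement layer. The layer labels, the use of the display reference numerals as dictionary keys, and the function name are hypothetical and serve only to illustrate the selection.

```python
def select_for_displays(packets, choices):
    """Split one stream into per-display packet lists.

    `packets` is a list of (layer, payload) tuples; `choices` maps each
    display to the enhancement layer it should receive, or None for base only.
    """
    out = {display: [] for display in choices}
    for layer, payload in packets:
        for display, chosen in choices.items():
            if layer == "base" or layer == chosen:
                out[display].append((layer, payload))
    return out


single_stream = [("base", b"B"), ("english", b"EN"), ("spanish", b"ES")]
per_display = select_for_displays(
    single_stream,
    {312: None, 314: "english", 316: "spanish"},
)
```

In this example display 312 receives only the base content, while displays 314 and 316 each receive the base content plus one caption layer.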
Turning to
MCU 502 can include an SVC-capable encoder 504 and a video generator 506. Video generator 506 may generate a continuous presence (CP) layout in any suitable fashion and provide this layout as a base content sequence to SVC-capable encoder 504. The SVC capable encoder may also receive as added content sequences current speaker video, previous speaker video, and other participant video from current speaker end point 508, previous speaker end point 510, and other participant end points 512, 514, and 516, respectively. SVC streams can then be provided from encoder 504 to current speaker end point 508, previous speaker end point 510, and other participant end points 512, 514, and 516 and be controlled as described below in connection with
As illustrated in
Although
Turning to
As shown in
In some embodiments, any suitable computer readable media can be used for storing instructions for performing the functions described herein. For example, in some embodiments, computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention, which is only limited by the claims which follow. Features of the disclosed embodiments can be combined and rearranged in various ways.
An example of an “encoder.cfg” configuration file that may be used with a JSVM 9.1 encoder in some embodiments is shown below:
An example of a “base_content.cfg” configuration file (as referenced in the “encoder.cfg” file) that may be used with a JSVM 9.1 encoder in some embodiments is shown below:
An example of an “added_content.cfg” configuration file (as referenced in the “encoder.cfg” file) that may be used with a JSVM 9.1 encoder in some embodiments is shown below:
This application is a continuation-in-part of U.S. patent application Ser. No. 12/170,674, filed Jul. 10, 2008, which is hereby incorporated by reference herein in its entirety.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 12170674 | Jul 2008 | US |
| Child | 12761885 | | US |