When video is streamed over the Internet and played back through a Web browser or media player, the video is delivered in digital form. Digital video is also used when video is delivered through many broadcast services, satellite services and cable television services. Real-time videoconferencing often uses digital video, and digital video is used during video capture with most smartphones, Web cameras and other video capture devices.
Digital video can consume an extremely large number of bits. The number of bits used per second of represented video content is known as the bit rate. Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
Over the last two decades, various video codec standards have been adopted, including the H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (MPEG-4 AVC or ISO/IEC 14496-10) standards and the MPEG-1 (ISO/IEC 11172-2), MPEG-4 Visual (ISO/IEC 14496-2) and SMPTE 421M standards. In particular, decoding according to the H.264 standard is widely used in game consoles and media players to play back encoded video. H.264 decoding is also widely used in set-top boxes, personal computers, smartphones and other mobile computing devices for playback of encoded video streamed over the Internet or other networks. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve correct results in decoding.
Several factors affect quality of video information, including spatial resolution, frame rate and distortion. Spatial resolution generally refers to the number of samples in a video image. Images with higher spatial resolution tend to look crisper than other images and contain more discernable details. Frame rate is a common term for temporal resolution for video. Video with higher frame rate tends to mimic the smooth motion of natural objects better than other video, and can similarly be considered to contain more detail in the temporal dimension. During encoding, an encoder can selectively introduce distortion to reduce bit rate, usually by quantizing video information during encoding. If an encoder introduces little distortion, the encoder maintains quality at the cost of higher bit rate. An encoder can introduce more distortion to reduce bit rate, but quality typically suffers. For each of these factors, the tradeoff for higher quality is increased cost of storing and transmitting the information, in terms of bit rate.
When encoded video is delivered over the Internet to set-top boxes, mobile computing devices or personal computers, one video source can provide encoded video to multiple receiver devices. Or, in a videoconference, one device may deliver encoded video to multiple receiver devices. Different receiver devices may have different screen sizes or computational capabilities, with some devices able to decode and play back high quality video, and other devices only able to play back lower quality video. Also, different receiver devices may use network connections having different bandwidths, with some devices able to receive higher bit rate (higher quality) encoded video, and other devices only able to receive lower bit rate (lower quality) encoded video.
In such scenarios, with simulcast encoding and delivery, video is encoded in multiple different ways to provide versions of the video at different levels of distortion, temporal quality and/or spatial resolution quality. Each version of video is represented in a bitstream that can be decoded to reconstruct that version of the video, independent of decoding other versions of the video. A video source (or given receiver device) can select an appropriate version of video for delivery to the receiver device, considering available network bandwidth, screen size, computational capabilities, or another characteristic of the receiver device.
Scalable video coding and decoding are another way to provide different versions of video at different levels of distortion, temporal quality and/or spatial resolution quality. With scalable video coding, an encoder splits video into a base layer and one or more enhancement layers. The base layer alone provides a reconstruction of the video at a lower quality level (e.g., lower frame rate, lower spatial resolution and/or higher distortion). One or more enhancement layers can be decoded along with the base layer video data to provide a reconstruction with increased video quality in terms of higher frame rate, higher spatial resolution and/or lower distortion. Scalability in terms of distortion is sometimes called signal-to-noise ratio (SNR) scalability. A receiver device can receive a scalable video bitstream and decode those parts of it appropriate for the receiver device, which may be the base layer video only, the base layer video plus some of the enhancement layer video, or the base layer video plus all enhancement layer video. Or, a video source, media server or given receiver device can select an appropriate version of video for delivery to the receiver device, considering available network bandwidth, screen size, computational capabilities, or another characteristic of the receiver device, and deliver only layers for that version of the video to the receiver device.
Scalable video coding enables a rich set of configuration options, but this flexibility poses challenges for an encoder to advertise its encoding capabilities. It also poses challenges in terms of configuring which scalable video coding options are used for a given bitstream and signaling of run-time controls during encoding.
In summary, innovations described herein provide a framework for advertising encoder capabilities, initializing encoder configuration, and signaling run-time control messages for video coding and decoding. In various scenarios, the framework facilitates scalable video coding/decoding, simulcast video coding/decoding, or video coding/decoding that combines features of scalable and simulcast video coding/decoding.
According to a first set of innovations described herein, encoder capabilities are advertised. A controller for encoding receives a request for encoder capability data. For example, the encoding controller receives the request as part of a function call from a controller for a decoding host. For a given session, the encoding can include scalable video coding and/or simulcast video coding.
The encoding controller determines the encoder capability data, which can include various types of data. It can include data that indicate a number of bitstreams, each bitstream providing an alternative version of input video. For example, the number of bitstreams is a maximum number of simulcast bitstreams supported by an encoder, where each bitstream can be encoded as a scalable bitstream or non-scalable bitstream. The capability data can also include data that indicate scalable video coding capabilities for encoding of the bitstreams. The encoder capability data can further include data that indicate computational limits for the encoding of the bitstreams, which can, for example, be parameterized in terms of macroblocks per second. The encoder capability data can also include data that indicate spatial resolution and/or frame rate of the input video. The encoding controller sends the encoder capability data, for example, as part of a reply, to a decoding host controller.
From the perspective of a decoding host controller, the decoding host controller creates a request for encoder capability data and sends the request, for example, as part of a function call to an encoding controller as described above. For a given session, the encoded video data can include data encoded using scalable video coding and/or simulcast video coding. The decoding host controller receives encoder capability data in reply and processes the encoder capability data.
According to a second set of innovations described herein, an encoder is initially configured. A controller for a decoding host determines encoder capability data for an encoder. For example, the decoding host controller determines the encoder capability data by requesting it and receiving it from an encoding controller. The decoding host controller then creates stream configuration request data based at least in part on the encoder capability data. The stream configuration request data can include various types of data. It can include data that indicate a number of bitstreams, each bitstream providing an alternative version of input video. For example, the number of bitstreams is a target number of simulcast bitstreams, where each bitstream can be encoded as a scalable bitstream or non-scalable bitstream. The configuration request data can also include data that indicate scalable video coding options for the bitstreams. The decoding host controller sends the stream configuration request data, for example, as part of a function call to the encoding controller. Eventually, the decoding host controller receives and processes a reply.
From the perspective of an encoding controller, the encoding controller receives stream configuration request data. The encoding controller processes the stream configuration request data, for example, configuring an encoder and allocating encoder resources if the stream configuration request is permissible. The encoding controller then sends a reply, e.g., indicating successful configuration according to the request, or indicating failure.
According to a third set of innovations described herein, run-time control messages are signaled. During decoding of encoded video data for a bitstream, a decoding host controller creates a control message for run-time control of encoding for the bitstream. The control message includes layer identifier data. For example, the layer identifier data can include at least a stream identifier for the bitstream and a layer identifier for a given layer of the bitstream. The layer identifier data can include different types of layer identifiers. The stream identifier and/or a layer identifier can use a wild card symbol to identify multiple different streams and/or multiple different layers. The decoding host controller then sends the control message, e.g., as part of a function call to an encoding controller. The control message can be, for example, a request to insert a synchronization picture for the given layer, a request to change spatial resolution for the given layer, a request to start streaming of a subset of the bitstream, a request to stop streaming of a subset of the bitstream, or another request.
From the perspective of a controller for encoding, the encoding controller receives and processes the control message. For example, the encoding controller receives a control message as described above from a decoding host controller. The way the control message is processed depends on the type of control message. The encoding controller then sends a reply, e.g., indicating successful processing of the control message as expected, or indicating failure.
The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
Innovations for encoder capability advertisement, encoder configuration and run-time control for video coding and decoding are described herein. An encoding controller advertises encoding capabilities to a decoding host controller, which specifies an initial configuration for encoding subject to the advertised encoder capabilities. The decoding host controller and encoding controller can then exchange run-time control messages during streaming. Example data structures, signatures for function calls, and call flows for communication between a decoding host controller and encoding controller are presented.
In some examples described herein, encoder capability advertisement, encoder configuration and run-time control messages are described for encoders that perform scalable video coding (SVC) compliant with the H.264 standard to produce H.264/SVC bitstreams. Innovations described herein can also be implemented for encoder capability advertisement, encoder configuration and run-time control messages for video coding and decoding according to other standards or formats. For example, innovations described herein can be used for encoder capability advertisement, encoder configuration and run-time control messages for VP6, VP8, SMPTE 421M or another format, including formats under development such as H.265 or HEVC.
More generally, various alternatives to the examples described herein are possible. Certain techniques described with reference to flowchart diagrams can be altered by changing the ordering of stages shown in the flowcharts, by splitting, repeating or omitting certain stages, etc. The various aspects of encoder capability advertisement, encoder configuration and run-time control messages can be used in combination or separately. Different embodiments use one or more of the described innovations. Some of the techniques and tools described herein address one or more of the problems noted in the background. Typically, a given technique/tool does not solve all such problems.
I. Example Computing Systems
With reference to
A computing system may have additional features. For example, the computing system (100) includes storage (140), one or more input devices (150), one or more output devices (160), and one or more communication connections (170). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system (100). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing system (100), and coordinates activities of the components of the computing system (100).
The tangible storage (140) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory way and which can be accessed within the computing system (100). The storage (140) stores instructions for the software (180) implementing one or more innovations for encoder capability advertisement, encoder configuration and run-time control for video coding and decoding.
The input device(s) (150) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system (100). For video encoding, the input device(s) (150) may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system (100). The output device(s) (160) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system (100).
The communication connection(s) (170) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.
The innovations can be described in the general context of computer-readable media. Computer-readable media are any available tangible media that can be accessed within a computing environment. By way of example, and not limitation, with the computing system (100), computer-readable media include memory (120, 125), storage (140), and combinations of any of the above.
The innovations can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.
The terms “system” and “device” are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed, and can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.
For the sake of presentation, the detailed description uses terms like “determine” and “use” to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
II. Frameworks for Capability Advertisement, Configuration and Control
In the framework (201) shown in
The bitstream (223) is conveyed over a transport channel (230) using an appropriate communication protocol. The transport channel (230) can include the Internet or another computer network.
The controller (222) is an encoding controller that communicates with the encoder (211). The encoding controller (222) can be part of the same computing system as the encoder (211) or part of another computing system. In general, the encoding controller (222) gets encoder capability data from the encoder (211) and advertises the capabilities of the encoder (211) to one or more controllers for decoding hosts. Sections III, IV and VII describe aspects of encoder capability advertisement.
The controller (252) is a decoding host controller that communicates with the decoder(s) (271, . . . , 27n). The decoding host controller (252) can be part of the same computing system as the decoder(s) (271, . . . , 27n) or part of another computing system. In general, the decoding host controller (252) gets the encoder capability data from the encoding controller (222) and creates stream configuration request data appropriate for the decoder(s) (271, . . . , 27n). The stream configuration request data can be set considering the computational capabilities, screen size or quality setting of a given decoder or decoders, or considering the network bandwidth between the encoder (211) and decoder(s) (271, . . . , 27n). The stream configuration request data are also set subject to the encoder capability data. The decoding host controller sends the configuration request data to the encoding controller (222), which uses the configuration request data to configure the encoder (211). Sections III, V and VII describe aspects of initial encoder configuration.
The decoding host controller (252) and encoding controller (222) can also exchange run-time control messages during streaming and playback of the bitstream. Such run-time control messages can be based upon feedback from one of the decoder(s) (271, . . . , 27n) to the decoding host controller (252). Sections III, VI and VII describe aspects of run-time control.
In the framework (202) shown in
The multi-layer encoder (210) can include a single encoder used multiple times to encode different versions of video in different component bitstreams for simulcast transmission. Or, the multi-layer encoder (210) can include multiple encoders used to produce the respective component bitstreams in parallel. The multi-layer encoder (210) can encode video for a videoconference, video telephone call, streaming over the Internet, or other use scenario. The component bitstreams can differ from each other in terms of the number of layers of temporal, spatial and/or SNR scalability supported in the bitstream, if the bitstream is scalable at all. The component bitstreams can all use the same format, or different component bitstreams can use different formats. The component bitstreams can be encoded for the same profile and level of decoding, or different component bitstreams can be encoded for different profile and/or level of decoding to serve decoders with different capabilities.
The multi-layer encoder (210) multiplexes the component bitstreams together to form a multi-layer encoding (MLE) bitstream (221). In doing so, the multi-layer encoder (210) applies composition rules to facilitate demultiplexing and avoid contradictory assignments of values to parameters in the MLE bitstream (221). Example composition rules are described in U.S. patent application Ser. No. 13/235,217, filed Sep. 16, 2011, entitled “Multi-layer Encoding and Decoding,” the disclosure of which is hereby incorporated by reference. In the context of the H.264 standard, an MLE bitstream can include multiple H.264/SVC bitstreams multiplexed together, multiple H.264/AVC bitstreams multiplexed together or a mix of H.264/AVC and H.264/SVC bitstreams multiplexed together.
The MLE bitstream DEMUX (250) receives the MLE bitstream (221) and demultiplexes at least part of a component bitstream (251) from it. The MLE DEMUX (250) applies decomposition rules in demultiplexing. The DEMUX (250) can be part of a multi-point conferencing unit in a videoconferencing system, network server that distributes streaming media, receiver, or other entity in a network environment. The operations of the DEMUX (250) depend on its role, as further detailed in U.S. patent application Ser. No. 13/235,217, which also describes example decomposition rules. In general, considering the computational capabilities, screen size or quality setting of a given decoder, or considering network bandwidth, the DEMUX (250) selects all or part of a component bitstream (251) that is appropriate in terms of bit rate, spatial resolution, frame rate or other quality level, for delivery to a decoder. Different decoders (271, 272, . . . , 27n) can thus receive different versions of the video from the MLE DEMUX (250). The number of decoders depends on implementation.
In
The decoding host controller (252) communicates with the MLE DEMUX (250). The decoding host controller (252) can be part of the same computing system as the DEMUX (250) or part of another computing system. The decoding host controller (252) gets the encoder capability data from the encoding controller (222) and creates stream configuration request data appropriate for the decoder(s) (271, . . . , 27n), e.g., considering the computational capabilities, screen size or quality setting of a given decoder or decoders, or considering network bandwidth. The decoding host controller (252) sends the stream configuration request data to the encoding controller (222), which uses the configuration request data to configure the encoders of the multi-layer encoder (210). The decoding host controller (252) and encoding controller (222) can also exchange run-time control messages during streaming and playback of the bitstream, as explained with reference to
In
In
III. Generalized Approach to Capability Advertisement, Configuration and Control
Features of encoder capability advertisement, encoder configuration and run-time control are provided in a framework that supports a wide range of encoders with different capabilities. The encoders can use scalable video coding and/or non-scalable video coding. The encoders potentially target various applications and scenarios, from low-end mobile phone video chat, up to high-end telepresence systems.
As part of a capability advertisement stage, a controller for a decoding host creates a request for encoder capability data and sends the request to an encoding controller. The encoding controller determines encoder capability data and provides the encoder capability data to the decoding host controller. In one implementation, the encoder capability data are formatted and requested using the structures, call signatures and call flows explained with reference to
As part of a configuration stage, the decoding host controller processes the encoder capability data received from the encoding controller. The decoding host controller creates stream configuration request data appropriate for the decoder(s) in question, subject to the capabilities indicated in the encoder capability data. The decoding host controller provides the configuration request data to the encoding controller. The encoding controller uses it to configure the encoder(s) and allocate resources for encoding, and also acknowledges receipt of the configuration request data. In one implementation, the stream configuration request data are formatted and sent using the structures, call signatures and call flows explained with reference to
The encoder then starts encoding and streaming, and the decoder starts decoding and playback. As part of the run-time control stage, during decoding, the decoding host controller can create a control message and send it to the encoding controller. The encoding controller processes the run-time control message appropriately, and also acknowledges receipt of the run-time control message. In one implementation, run-time control messages are formatted and sent using the structures, call signatures and call flows explained with reference to
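To make the staging concrete, here is a minimal C sketch of the exchange from the perspective of the decoding host controller. The type and function names other than InitializeEncoder( ) are hypothetical placeholders (the actual structures and call signatures are the ones described in sections IV through VII), and the encoding-controller side is stubbed out.

```c
/* Sketch of the capability advertisement, configuration and run-time
 * control stages, as seen from the decoding host controller. Names other
 * than InitializeEncoder() are hypothetical placeholders. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int maxSimulcastStreams; /* plus other capability fields */ } EncoderCapability;
typedef struct { int numStreams;          /* plus per-stream layering fields */ } StreamConfig;
typedef struct { int streamId, layerId;   /* plus request type and parameters */ } ControlMessage;

/* Stubs standing in for calls into the encoding controller. */
static bool QueryEncoderCapability(EncoderCapability *cap) { cap->maxSimulcastStreams = 3; return true; }
static bool InitializeEncoder(const StreamConfig *cfg) { (void)cfg; return true; }
static bool SendControlMessage(const ControlMessage *msg) { (void)msg; return true; }

int main(void) {
    /* Stage 1: capability advertisement. */
    EncoderCapability cap;
    while (!QueryEncoderCapability(&cap)) { /* recreate the request and retry */ }

    /* Stage 2: initial configuration, subject to the advertised capabilities. */
    StreamConfig cfg = { .numStreams = 1 };
    if (cfg.numStreams > cap.maxSimulcastStreams) cfg.numStreams = cap.maxSimulcastStreams;
    while (!InitializeEncoder(&cfg)) { /* adjust the configuration and retry */ }

    /* Stage 3: run-time control during streaming and playback. */
    ControlMessage msg = { .streamId = 0, .layerId = 0 };
    if (!SendControlMessage(&msg)) { /* recreate the control message and retry */ }

    printf("configured %d stream(s)\n", cfg.numStreams);
    return 0;
}
```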
IV. Generalized Approach to Encoder Capability Advertisement
To start, the decoding host controller creates (410) a request for encoder capability data and sends (420) the request (e.g., as part of a function call) to the encoding controller. For example, the request for encoder capability data is formatted and sent using the structures, call signatures and call flows explained with reference to
The encoding controller receives (530) the request for encoder capability data and determines (540) the encoder capability data. The encoder capability data can include one or more of the following.
Alternatively, the encoder capability data include other and/or additional data.
The encoding controller then sends (550) the encoder capability data (e.g., as part of a return for a function call). The decoding host controller receives (460) the encoder capability data and processes it. The decoding host controller evaluates (470) whether the encoder capability data has been successfully provided. If so, the decoding host controller can continue to the encoder configuration phase. Otherwise, the decoding host controller creates (410) another request for encoder capability data and again attempts to get encoder capability data.
V. Generalized Approach to Initial Encoder Configuration
To start, the decoding host controller determines (610) encoder capability data. For example, the decoding host controller receives the encoder capability data from an encoding controller as described with reference to
The decoding host controller then creates (620) stream configuration request data based at least in part on the encoder capability data. The stream configuration request data can include one or more of the following.
Alternatively, the stream configuration request data include other and/or additional data.
The stream configuration request data can also be based at least in part on data that indicate computational limits for encoding. For example, the data that indicate computational limits for encoding are maximum macroblock processing rates. Alternatively, the data that indicate computational limits take another form.
The decoding host controller sends (630) the stream configuration request data (e.g., as part of a function call) to the encoding controller. For example, the stream configuration request data are formatted and sent using the structures, call signatures and call flows explained with reference to
The encoding controller receives (740) the stream configuration request data and processes (750) it. For example, the encoding controller configures one or more encoders and allocates resources such as memory buffers for encoding. The encoding controller can then start streaming with the encoder(s).
The encoding controller sends (760) a reply (e.g., as part of a return for a function call). The decoding host controller receives (670) the reply and processes it. The decoding host controller evaluates (680) whether the encoder configuration succeeded. If so, the decoding host controller can continue to the run-time control phase. Otherwise, the decoding host controller creates (620) new stream configuration request data and again attempts to initialize the encoder.
VI. Generalized Approach to Run-Time Control
During decoding of encoded video data of a bitstream, a decoding host controller creates (810) a run-time control message for control of encoding for the bitstream. Such run-time control can use commands specified for a particular layer of a bitstream, in which case the control message indicates which layer is controlled/changed. For this purpose, the control message can include layer identifier data, which in turn can include a stream identifier of the bitstream and at least one layer identifier of a given layer of the bitstream. For example, the layer identifier data are formatted using the structure explained with reference to
The decoding host controller sends (820) the control message (e.g., as part of a function call) to an encoding controller. For example, the control message is formatted and sent using the structures, call signatures and call flows explained with reference to
The control message can be a request to insert a synchronization picture for the given layer, a request to change spatial resolution for the given layer, a request to set a priority identifier for the given layer, a request to set quantization parameters and/or rate control parameters for the given layer, a request to start streaming of a subset of the bitstream, a request to stop streaming of a subset of the bitstream, or some other type of control message. When the control message is a request to start streaming a subset of the bitstream, the subset can include encoded video data for the given layer and any layers upon which the given layer depends. When the control message is a request to stop streaming a subset of the bitstream, the subset can include encoded video data for the given layer and any higher layers.
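As a sketch only, the request types listed above could be represented by an enumeration along the following lines; the enumerator names are hypothetical and are not part of the control message format described herein.

```c
/* Hypothetical enumeration of the run-time control request types
 * described above; the actual message format is defined elsewhere. */
typedef enum {
    CONTROL_INSERT_SYNC_PICTURE,        /* insert a synchronization picture for the given layer */
    CONTROL_CHANGE_SPATIAL_RESOLUTION,  /* change spatial resolution for the given layer */
    CONTROL_SET_PRIORITY_ID,            /* set a priority identifier for the given layer */
    CONTROL_SET_QP_OR_RATE_CONTROL,     /* set quantization and/or rate control parameters */
    CONTROL_START_LAYER_STREAMING,      /* start streaming the given layer and layers it depends on */
    CONTROL_STOP_LAYER_STREAMING        /* stop streaming the given layer and any higher layers */
} RunTimeControlType;
```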
The encoding controller receives (930) the run-time control message and processes (940) it, in a way that depends on the type of control message, and if such processing is feasible. For example, the encoding controller causes an encoder to insert a synchronization picture for the given layer, change spatial resolution for the given layer, set quantization parameters and/or rate control parameters for the given layer, or make some other change. Or, the encoding controller causes an encoder or multiplexer to set a priority identifier for the given layer, start streaming of a subset of the bitstream, stop streaming of a subset of the bitstream, or perform some other action.
The encoding controller then sends (950) a reply (e.g., as part of a return from a function call). The decoding host controller receives (860) the reply and processes it. The decoding host controller evaluates (680) whether the run-time control succeeded. If so, the decoding host controller can continue the run-time control phase, possibly restarting the technique (800) for another control message. Otherwise, the decoding host controller creates (810) the control message again for another attempt at the run-time control operation.
VII. Capability Advertisement, Configuration and Control in Example Implementation
In an example implementation, features of encoder capability advertisement, encoder configuration and run-time control are provided in a framework that supports a wide range of hardware and software H.264 encoders with different capabilities. The H.264 encoders can use scalable video coding (that is, H.264/SVC) and/or non-scalable video coding (that is, H.264/AVC). In the example implementation, the framework uses a tiered approach from low to high capabilities that is designed to allow these different encoders to be used in a unified video system. The framework supports a variety of frame rates and spatial resolutions. For additional details about the framework in the example implementation, see U.S. patent application Ser. No. 13/235,217.
As part of a capability advertisement stage, a controller for a decoding host initiates a call to query an encoding controller for encoder capabilities. In
As part of a configuration stage, the decoding host controller processes the reply (including the structure that indicates encoder capabilities). The decoding host controller creates a stream configuration request structure.
The decoding host controller initiates a call to initialize the encoder. In
As part of the run-time control stage, during the encoding/decoding, the decoding host controller initiates a call to an appropriate function of the interface. In
A. Capability Advertisement Structures and Function Signatures
The next field in the function signature (1102) has a type of H264SVCCapability and indicates the maximum encoder capabilities and options for an H.264/SVC bitstream.
In the structure (1101), the field MaxNumOfTemporalEnhancementLayers indicates the maximum number of temporal enhancement layers in a bitstream. The field has 3 bits, as indicated by the number 3 in the structure (1101). A non-zero value of this field indicates the encoder supports the creation of temporal scalable bitstreams formed in a hierarchical prediction structure. For example, an encoder can produce an H.264/SVC bitstream using a hierarchical P-picture prediction structure to achieve temporal scalability. In this case, a frame in a temporal enhancement layer uses the immediately preceding reconstructed frame in the lower layer as a reference frame. Thus, each layer represents a temporal scale. A value of temporal_id can specify the hierarchical dependency of a temporal layer relative to other layers, with 0 representing the base temporal layer, 1 the first temporal enhancement layer, 2 the second temporal enhancement layer, and so forth.
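As an illustration of the hierarchical P-picture case described above, the following C sketch assigns temporal_id values by frame index, assuming a dyadic structure in which each temporal enhancement layer doubles the frame rate of the layers below it. The function name and the dyadic assumption are illustrative only.

```c
#include <stdio.h>

/* For a dyadic hierarchical P-picture structure with numTemporalLayers
 * layers (temporal_id 0 .. numTemporalLayers-1), frames at multiples of
 * 2^(numTemporalLayers-1) belong to the base layer, and odd-indexed
 * frames belong to the highest temporal enhancement layer. */
static int TemporalIdForFrame(unsigned frameIndex, int numTemporalLayers) {
    int highest = numTemporalLayers - 1;
    if (frameIndex == 0)
        return 0;                       /* first frame is in the base temporal layer */
    int trailingZeros = 0;
    while ((frameIndex & 1u) == 0) { frameIndex >>= 1; trailingZeros++; }
    int tid = highest - trailingZeros;
    return tid < 0 ? 0 : tid;
}

int main(void) {
    /* With three temporal layers, frames 0..7 map to 0 2 1 2 0 2 1 2. */
    for (unsigned i = 0; i < 8; i++)
        printf("%d ", TemporalIdForFrame(i, 3));
    printf("\n");
    return 0;
}
```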
The single-bit field RewriteSupport indicates whether the encoder supports the creation of quality scalable bitstreams that can be converted into bitstreams that conform to one of the non-scalable H.264/AVC profiles by using a low-complexity rewriting process.
The next fields of the structure (1101) relate to support for SNR scalability. According to the H.264/SVC standard, an encoder can use coarse-grained scalability (“CGS”) and medium-grained scalability (“MGS”) in a single bitstream. Typically, however, it suffices for an encoder to use either CGS or MGS for a given bitstream.
The three-bit field MaxNumOfCGSEnhancementLayers indicates the maximum number of CGS quality enhancement layers in a bitstream. A non-zero value of this field indicates the encoder supports the creation of CGS quality scalable bitstreams. The field MaxNumOfMGSSublayers indicates the maximum number of MGS sub-layers allowed in an MGS enhancement layer in a bitstream. A non-zero value of this field indicates the encoder supports the creation of MGS quality scalable bitstreams with sub-layering. Key frame generation is supported when MGS is supported. The field AdditionalSNRScalabilitySupport indicates whether additional SNR layers are allowed to be present in a spatial enhancement layer. When this field is 1, additional SNR scalability may be introduced in a way that follows the quality capability specified for the base spatial layer. That is, the introduction of SNR enhancement layers in a spatial enhancement layer is constrained by the values of the fields MaxNumOfCGSEnhancementLayers and MaxNumOfMGSSublayers. Also, the rewrite mode is disabled in any spatial resolution enhancement layers.
Finally, the three-bit field MaxNumOfSpatialEnhancementLayers indicates the maximum number of spatial enhancement layers supported in a bitstream. A non-zero value of this field indicates the encoder supports the creation of spatial scalable bitstreams. The remaining bits are reserved in the structure (1101) of
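A C bit-field sketch consistent with the capability fields described above follows. The widths of MaxNumOfTemporalEnhancementLayers, RewriteSupport, MaxNumOfCGSEnhancementLayers and MaxNumOfSpatialEnhancementLayers are as stated above; the widths shown for MaxNumOfMGSSublayers, AdditionalSNRScalabilitySupport and the reserved bits are assumptions, and the authoritative layout of structure (1101) is the one defined for the interface.

```c
/* Sketch of an H.264/SVC capability structure along the lines of
 * structure (1101). Widths marked "assumed" are not specified in the
 * surrounding description. */
typedef struct {
    unsigned MaxNumOfTemporalEnhancementLayers : 3;  /* max temporal enhancement layers per bitstream */
    unsigned RewriteSupport                    : 1;  /* 1 = CGS-to-AVC rewriting process supported */
    unsigned MaxNumOfCGSEnhancementLayers      : 3;  /* max CGS quality enhancement layers per bitstream */
    unsigned MaxNumOfMGSSublayers              : 3;  /* max MGS sub-layers per MGS layer (width assumed) */
    unsigned AdditionalSNRScalabilitySupport   : 1;  /* 1 = SNR layers allowed in spatial enhancement layers (width assumed) */
    unsigned MaxNumOfSpatialEnhancementLayers  : 3;  /* max spatial enhancement layers per bitstream */
    unsigned Reserved                          : 18; /* remaining bits reserved (size assumed) */
} H264SVCCapabilitySketch;
```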
Returning to
In
B. Initial Configuration Structures and Function Signatures
After the controller for the decoding host receives the encoder capability data, the decoding host controller can determine one or more feasible and appropriate layering structures for the respective stream(s), along with spatial resolutions and frame rates at the respective layers. With these structure(s), the decoding host controller can specify a particular stream configuration to the encoding controller.
1. Setting Per Stream Configuration Structures
The three-bit field NumberOfTemporalEnhancementLayers indicates the number of temporal enhancement layers in the stream. This value effectively corresponds to the values of syntax element temporal_id in H.264/SVC. For example, if this field is 2, three temporal layers, corresponding to temporal_id 0, 1, and 2 are present in the bitstream. The value of this field does not exceed the maximum number of temporal layers specified in the H.264/SVC capability structure (1101).
The remaining fields of the configuration structure (1201) are organized according to spatial base layer and 1st, 2nd and 3rd spatial enhancement layers. The structure (1201) shown in
In the structure (1201), the next fields indicate SNR scalability attributes for the spatial base layer. The single-bit field SNRModeBase indicates whether CGS or MGS is used to generate quality layers in the spatial base layer. The value 0 means CGS is used, and the value 1 means MGS is used.
When CGS is used according to SNRModeBase, SNRModeAttributeBase (1 bit) indicates whether a rewriting process is enabled (0 means rewriting is not used, and 1 means rewriting is used). The two-bit field NumberOfSNREnhancementLayersBase indicates the number of CGS quality enhancement layers in the spatial base layer. When the value of this field is 0, it means no SNR scalability is introduced in the spatial base layer. A non-zero value effectively corresponds to the values of the syntax element dependency_id in H.264/SVC. For example, if the value of this field is 2, three CGS layers (corresponding to dependency_id 0, 1, and 2) are present in the spatial base layer in the bitstream.
When MGS is used according to SNRModeBase, SNRModeAttributeBase indicates whether key frame generation is enabled (0 means key frame generation is disabled, and 1 means it is enabled). The field NumberOfSNREnhancementLayersBase indicates the number of MGS sub-layers in the spatial base layer. When the value of this field is 0, it means no SNR scalability is introduced in the spatial base layer, as noted above for the CGS case. A non-zero value effectively corresponds to the values of the syntax element quality_id in H.264/SVC. For example, if the value of this field is 3, three MGS sub-layers (corresponding to quality_id 1, 2, and 3) are present in the spatial base layer in the bitstream. (That is, quality_id value 0 corresponds to the base quality layer, and quality_id values 1, 2, and 3 correspond to sub-layers in the MGS enhancement layer.)
The three-bit field NumberOfSNRLayers1st indicates whether spatial scalability is introduced in the bitstream and whether and how additional SNR scalability is used. When the value of NumberOfSNRLayers1st is 0, spatial scalability is not introduced in the bitstream. When the value of this field is 1, spatial scalability is used, but no additional SNR scalability is introduced in the 1st spatial enhancement layer. When the value of this field is 2 or larger, spatial scalability is used and additional SNR scalability is also introduced in the 1st spatial enhancement layer. In the last case, depending on the value of SNRMode1st, the value of NumberOfSNRLayers1st indicates the number of CGS quality layers or MGS sub-layers used in the 1st spatial enhancement layer. When the value of NumberOfSNRLayers1st is non-zero, the maximum number of spatial layers advised by the encoder capability data is at least 2. When the value of NumberOfSNRLayers1st is larger than 1, the use of additional SNR scalability does not exceed that specified in the encoder capability data.
In the configuration structure (1201), the field SNRMode1st (1 bit) indicates whether CGS or MGS is used to generate additional quality layers in a 1st spatial enhancement layer, if present. The value 0 means CGS is used, and the value 1 means MGS is used.
When CGS is used according to SNRMode1st, the field SNRModeAttribute1st (1 bit) indicates whether the rewriting process is enabled (0 means rewriting is not used; 1 means it is used). The value of NumberOfSNRLayers1st effectively corresponds to the values of the syntax element dependency_id in H.264/SVC. For example, if the value of this field is 3, three CGS layers (corresponding to dependency_id K+1, K+2, and K+3) are present in the 1st spatial enhancement layer in the bitstream, where K is 0 if SNRModeBase is 1 and K is NumberOfSNREnhancementLayersBase if SNRModeBase is 0.
When MGS is used according to SNRMode1st, the field SNRModeAttribute1st indicates whether key frame generation is enabled (0 means key frame generation is disabled; 1 means it is enabled). The value of NumberOfSNRLayers1st effectively corresponds to the values of the syntax element quality_id in H.264/SVC. For example, if the value of this field is 3, three MGS sub-layers (corresponding to quality_id 1, 2, and 3) are present in the 1st spatial enhancement layer in the bitstream.
The field UpscaleRatio1st (1 bit) indicates the resolution upscale ratio of the 1st spatial enhancement layer with respect to the base spatial layer. Two resolution upscale ratios are supported. A value of 0 means the upscaling ratio is 2, and a value of 1 means the ratio is 1.5.
In the configuration structure (1201), the fields SNRMode2nd and SNRMode3rd have the same meaning as the field SNRMode1st but relate to the 2nd and 3rd spatial enhancement layers, if present. The same applies for SNRModeAttribute2nd, UpscaleRatio2nd, SNRModeAttribute3rd and UpscaleRatio3rd.
The two-bit field NumberOfSNRLayers2nd indicates whether the 2nd spatial enhancement layer is introduced in the bitstream and, if so, whether additional SNR scalability is used. When this field is 0, the 2nd spatial enhancement layer is not introduced in the bitstream. When the value of this field is 1, the 2nd spatial enhancement layer exists, but no additional SNR scalability is introduced. When the value of this field is 2 or larger, additional SNR scalability is introduced in the 2nd spatial enhancement layer. In the last case, depending on the value of SNRMode2nd, the value of NumberOfSNRLayers2nd indicates the number of CGS quality layers or MGS sub-layers used in the 2nd spatial enhancement layer. When the value of NumberOfSNRLayers2nd is non-zero, NumberOfSNRLayers1st is also non-zero and the maximum number of spatial layers advised by the encoder capability data is at least 3. When the value of this field is larger than 1, the use of additional SNR scalability does not exceed that specified in the encoder capability data.
When CGS is used according to SNRMode2nd, NumberOfSNRLayers2nd effectively corresponds to the values of the syntax element dependency_id in H.264/SVC. For example, if the value of this field is 3, three CGS layers (corresponding to dependency_id K+1, K+2, and K+3) are present in the 2nd spatial enhancement layer in the bitstream, where (a) K is 1 if both SNRModeBase and SNRMode1st are 1, (b) K is NumberOfSNREnhancementLayersBase+1 if SNRModeBase is 0 but SNRMode1st is 1, and (c) K is NumberOfSNRLayers1st if SNRModeBase is 1 but SNRMode1st is 0.
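The following sketch computes the dependency_id offset K for CGS layers in the 2nd spatial enhancement layer from the settings of the lower layers, following cases (a) through (c) above; the case in which both SNRModeBase and SNRMode1st select CGS is not spelled out above, so the value returned for it is an assumption.

```c
/* Sketch: dependency_id offset K for CGS layers in the 2nd spatial
 * enhancement layer, per the cases described above. snrModeBase and
 * snrMode1st are 1 for MGS and 0 for CGS. The both-CGS case is not
 * described above; the value returned for it is an assumption. */
static int DependencyIdOffsetFor2ndSpatialLayer(int snrModeBase, int snrMode1st,
                                                int numSnrEnhLayersBase,
                                                int numSnrLayers1st) {
    if (snrModeBase == 1 && snrMode1st == 1)        /* case (a): both MGS */
        return 1;
    if (snrModeBase == 0 && snrMode1st == 1)        /* case (b): base CGS, 1st MGS */
        return numSnrEnhLayersBase + 1;
    if (snrModeBase == 1 && snrMode1st == 0)        /* case (c): base MGS, 1st CGS */
        return numSnrLayers1st;
    /* both CGS (assumed): highest dependency_id used by the lower layers */
    return numSnrEnhLayersBase + numSnrLayers1st;
}
```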
When MGS is used according to SNRMode2nd, NumberOfSNRLayers2nd effectively corresponds to the values of syntax element quality_id in H.264/SVC. For example, if the value of this field is 3, three MGS sub-layers (corresponding to quality_id 1, 2, and 3) are present in the 2nd spatial enhancement layer in the bitstream.
NumberOfSNRLayers3rd is defined in a similar way for the 3rd spatial enhancement layer.
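For reference, a C bit-field sketch consistent with the per-stream configuration fields described in this subsection follows. The widths of the 2nd and 3rd layer mode, attribute and upscale fields are assumed to match their 1st-layer counterparts, and the widths shown for NumberOfSNRLayers3rd and the reserved bits are assumptions; the authoritative layout of structure (1201) is the one defined for the interface.

```c
/* Sketch of a per-stream configuration structure along the lines of
 * structure (1201). Widths marked "assumed" are not specified in the
 * surrounding description. */
typedef struct {
    unsigned NumberOfTemporalEnhancementLayers : 3;
    /* Spatial base layer */
    unsigned SNRModeBase                       : 1;  /* 0 = CGS, 1 = MGS */
    unsigned SNRModeAttributeBase              : 1;  /* rewriting (CGS) or key frames (MGS) */
    unsigned NumberOfSNREnhancementLayersBase  : 2;
    /* 1st spatial enhancement layer */
    unsigned NumberOfSNRLayers1st              : 3;  /* 0 = no spatial scalability */
    unsigned SNRMode1st                        : 1;
    unsigned SNRModeAttribute1st               : 1;
    unsigned UpscaleRatio1st                   : 1;  /* 0 = ratio of 2, 1 = ratio of 1.5 */
    /* 2nd spatial enhancement layer */
    unsigned NumberOfSNRLayers2nd              : 2;
    unsigned SNRMode2nd                        : 1;
    unsigned SNRModeAttribute2nd               : 1;
    unsigned UpscaleRatio2nd                   : 1;
    /* 3rd spatial enhancement layer */
    unsigned NumberOfSNRLayers3rd              : 2;  /* width assumed */
    unsigned SNRMode3rd                        : 1;
    unsigned SNRModeAttribute3rd               : 1;
    unsigned UpscaleRatio3rd                   : 1;
    unsigned Reserved                          : 9;  /* size assumed */
} StreamConfigurationSketch;
```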
2. Checking that Configuration is Computationally Tractable
In general, the controller for the decoding host can pick any combination of resolutions/frame rates for a particular layering structure, so long as the selected resolution and frame rate do not exceed the resolution and frame rate of the source or exceed the maximum values advised by the encoding controller, and so long as the aggregate macroblock processing rate does not exceed the relevant value indicated by the encoding controller. The aggregate macroblock processing rate for a given layer depends on the frame rate and spatial resolution for the given layer, and also depends on the frame rate and spatial resolution for layers used in reconstruction of the given layer.
For example, suppose layer A is 360p video with spatial resolution of 640×360 at 15 frames per second. The macroblock processing rate for layer A is:

ceil(640/16)×ceil(360/16)×15 = 40×23×15 = 13800 macroblocks per second,

where ceil( ) is a function that rounds up a fractional value to the next highest integer value.
Suppose layer B is 720p video with spatial resolution of 1280×720 at 30 frames per second. The macroblock processing rate for layer B by itself is:

ceil(1280/16)×ceil(720/16)×30 = 80×45×30 = 108000 macroblocks per second.
Layer A could be an H.264/AVC bitstream, and layer B could be a separate H.264/AVC bitstream for simulcast coding for the same input video. If layer A provides base layer video for an H.264/SVC bitstream, and layer B provides spatial and temporal scalability for the H.264/SVC bitstream, the aggregate macroblock processing rate is 13800+108000=121800 macroblocks per second.
In this example, the H.264/SVC bitstream includes one spatial resolution re-scaling stage (re-scaling by a factor of 2 horizontally and vertically, from 640×360 to 1280×720). For the array MaxMacroblockProcessingRate[i][j], the value of index i is 1. The H.264/SVC bitstream uses temporal and spatial scalability, so the value of index j is 3.
The decoding host controller computes the macroblock processing rate for the configuration that it has specified using the configuration structures (1201), then compares that macroblock processing rate to the maximum value for the appropriate values of i and j to confirm that the configuration is within the applicable computational limit specified in the encoder capability data. If the configuration exceeds the applicable computational limit, the decoding host controller can adjust the spatial resolution, frame rate and/or layering structure for the configuration to reduce the expected computational cost.
If the configuration includes a single bitstream, the values for the indices i and j depend on the number of spatial resolution re-scaling stages and degree of scalability for that one bitstream. If the configuration includes multiple bitstreams, conservatively, the decoding host controller can use the highest applicable values for the indices i and j. Also, the decoding host controller can count different simulcast streams at different spatial resolutions as re-scaling operations, since these affect the computational cost for the encoder. For example, when the configuration includes two simulcast streams having different spatial resolutions, and neither bitstream uses spatial scalability within that bitstream, the decoding host controller can set the value of the index i to 1. If one of the simulcast streams also uses spatial scalability within the bitstream, the decoding host controller can increment the value of the index i appropriately to account for the re-scaling operations.
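The check can be sketched in C as follows, reproducing the worked example of layers A and B above. The function name and the particular limit value are illustrative only; in practice the applicable limit comes from the encoder capability data for the appropriate indices i and j.

```c
#include <stdio.h>

/* Macroblocks per second for one layer: ceil(width/16) * ceil(height/16) * fps. */
static long MacroblockRate(int width, int height, int fps) {
    long mbWide = (width + 15) / 16;    /* ceil(width/16) */
    long mbHigh = (height + 15) / 16;   /* ceil(height/16) */
    return mbWide * mbHigh * (long)fps;
}

int main(void) {
    /* Worked example from the text: layer A is 640x360 at 15 fps,
     * layer B is 1280x720 at 30 fps. */
    long rateA = MacroblockRate(640, 360, 15);     /* 40 * 23 * 15 = 13800 */
    long rateB = MacroblockRate(1280, 720, 30);    /* 80 * 45 * 30 = 108000 */
    long aggregate = rateA + rateB;                /* 121800 */

    /* Hypothetical limit standing in for MaxMacroblockProcessingRate[i][j]
     * from the encoder capability data. */
    long maxRate = 244800;
    printf("aggregate rate %ld, within limit: %s\n",
           aggregate, aggregate <= maxRate ? "yes" : "no");
    return 0;
}
```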
3. Initializing Encoder with Configuration Data
Typically, a stream configuration request that stays within the SVC capabilities and computational limits advertised by the encoder succeeds. If it does not, the decoding host controller can create a new configuration request and provide the new configuration request through a call to InitializeEncoder( ).
C. Run-Time Control Structures and Function Signatures
After the successful completion of the initialization phase, the encoder starts streaming the H.264 bitstreams. During the encoding and decoding that follows, the decoding host controller may need to adjust how the encoder operates in response to network bandwidth fluctuation, a remote decoder request, a decoding host-side resource change, or another factor. For this purpose, the decoding host controller creates and sends run-time control messages. For example, the decoding host controller can request the insertion of an intra-coded picture in a scalable layer to react to packet loss at the decoding side.
The H.264/SVC standard employs three layer identifiers to identify a scalable layer in a bitstream: temporal_id, quality_id and dependency_id. These layer identifiers are non-negative integers. Lesser values are associated with lower scalable layers, and greater values are associated with higher scalable layers. The stream layer ID structure (1301) includes fields TemporalId (3 bits), QualityId (3 bits), and DependencyId (4 bits) that correspond to the syntax elements temporal_id, quality_id and dependency_id, respectively. In order to create an identifier for a particular scalable layer of a particular stream, and thereby facilitate communication between decoding host controller and encoding controller, values for the layer identifiers are constrained as follows. For temporal_id, quality_id or dependency_id, the value starts from zero and is incremented by one in the next higher scalable layer for that identifier (e.g., according to separate numbering, values 0 . . . x for temporal_id, values 0 . . . y for quality_id, and values 0 . . . z for dependency_id).
The stream layer ID structure (1301) further includes a four-bit field StreamId whose value identifies a stream with which the particular layer is associated. In this way, the stream layer ID structure (1301) can be used to identify layers in different simulcast streams. For example, the value of StreamId indicates an index to the stream layout structures (1201) signaled as part of the call to InitializeEncoder( ). More generally, StreamId is used between the encoding controller and decoding host controller to identify a particular stream for run-time control. When the configuration has only one stream, the value of StreamId is 0.
When multiple layers (in the same or different streams) are to be identified for a single control, wildcard masking can be used to reduce the number of control messages and calls. The maximum value in each field is reserved for wildcard masking purpose. For example, the value of 7 for TemporalId refers to layers with TemporalId 0 . . . 6.
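A C sketch of the stream layer ID structure (1301) and of the wildcard comparison described above follows; the field widths are as stated above, and the helper function names are illustrative.

```c
#include <stdbool.h>

/* Sketch of the stream layer ID structure (1301). The maximum value of
 * each field (7, 7, 15 and 15, respectively) is reserved as a wildcard
 * that refers to all lower values of that field. */
typedef struct {
    unsigned TemporalId   : 3;
    unsigned QualityId    : 3;
    unsigned DependencyId : 4;
    unsigned StreamId     : 4;
} StreamLayerId;

static bool FieldMatches(unsigned requested, unsigned actual, unsigned wildcard) {
    return requested == wildcard || requested == actual;
}

/* True if the layer identified by 'actual' is selected by 'requested',
 * where a field set to its maximum value acts as a wildcard. */
static bool LayerSelected(StreamLayerId requested, StreamLayerId actual) {
    return FieldMatches(requested.StreamId,     actual.StreamId,     15) &&
           FieldMatches(requested.DependencyId, actual.DependencyId, 15) &&
           FieldMatches(requested.QualityId,    actual.QualityId,    7)  &&
           FieldMatches(requested.TemporalId,   actual.TemporalId,   7);
}
```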
In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.
Number | Name | Date | Kind |
---|---|---|---|
4142071 | Croisier et al. | Feb 1979 | A |
4216354 | Esteban et al. | Aug 1980 | A |
4464783 | Beraud et al. | Aug 1984 | A |
5243420 | Hibi | Sep 1993 | A |
5381143 | Shimoyoshi et al. | Jan 1995 | A |
5416521 | Chujoh et al. | May 1995 | A |
5418570 | Ueno et al. | May 1995 | A |
5436665 | Ueno et al. | Jul 1995 | A |
5454011 | Shimoyoshi | Sep 1995 | A |
5463424 | Dressler | Oct 1995 | A |
5537440 | Eyuboglu et al. | Jul 1996 | A |
5541852 | Eyuboglu et al. | Jul 1996 | A |
5544266 | Koppelmans et al. | Aug 1996 | A |
5617142 | Hamilton | Apr 1997 | A |
5623424 | Azadegan et al. | Apr 1997 | A |
5659660 | Plenge et al. | Aug 1997 | A |
5677735 | Ueno et al. | Oct 1997 | A |
5835495 | Ferriere | Nov 1998 | A |
5970173 | Lee et al. | Oct 1999 | A |
5986712 | Peterson et al. | Nov 1999 | A |
5995151 | Naveen et al. | Nov 1999 | A |
6044089 | Ferriere | Mar 2000 | A |
6084909 | Chiang et al. | Jul 2000 | A |
6192075 | Jeng | Feb 2001 | B1 |
6192154 | Rajagopalan et al. | Feb 2001 | B1 |
6249288 | Campbell | Jun 2001 | B1 |
6259741 | Chen et al. | Jul 2001 | B1 |
6278691 | Ohyama et al. | Aug 2001 | B1 |
6278735 | Mohsenian | Aug 2001 | B1 |
6285716 | Knee et al. | Sep 2001 | B1 |
6370502 | Wu et al. | Apr 2002 | B1 |
6393059 | Sugiyama | May 2002 | B1 |
6404814 | Apostolopoulos et al. | Jun 2002 | B1 |
6426977 | Lee et al. | Jul 2002 | B1 |
6434197 | Wang et al. | Aug 2002 | B1 |
6463414 | Su et al. | Oct 2002 | B1 |
6466623 | Youn et al. | Oct 2002 | B1 |
6496216 | Feder | Dec 2002 | B2 |
6496868 | Krueger et al. | Dec 2002 | B2 |
6504494 | Dyas et al. | Jan 2003 | B1 |
6507615 | Tsujii et al. | Jan 2003 | B1 |
6522693 | Lu et al. | Feb 2003 | B1 |
6526099 | Christopoulos et al. | Feb 2003 | B1 |
6529552 | Tsai et al. | Mar 2003 | B1 |
6647061 | Panusopone et al. | Nov 2003 | B1 |
6650705 | Vetro et al. | Nov 2003 | B1 |
6678654 | Zinser, Jr. et al. | Jan 2004 | B2 |
6728317 | Demos | Apr 2004 | B1 |
6757648 | Chen et al. | Jun 2004 | B2 |
6823008 | Morel | Nov 2004 | B2 |
6925501 | Wang et al. | Aug 2005 | B2 |
6931064 | Mori et al. | Aug 2005 | B2 |
6934334 | Yamaguchi et al. | Aug 2005 | B2 |
6937653 | Song et al. | Aug 2005 | B2 |
6944224 | Zhao | Sep 2005 | B2 |
6961377 | Kingsley | Nov 2005 | B2 |
6963347 | Selvaggi et al. | Nov 2005 | B1 |
7027982 | Chen et al. | Apr 2006 | B2 |
7039116 | Zhang et al. | May 2006 | B1 |
7058127 | Lu et al. | Jun 2006 | B2 |
7068718 | Kim et al. | Jun 2006 | B2 |
7085322 | Ngai et al. | Aug 2006 | B2 |
7116714 | Hannuksela | Oct 2006 | B2 |
7142601 | Kong et al. | Nov 2006 | B2 |
7292634 | Yamamoto et al. | Nov 2007 | B2 |
7295612 | Haskell | Nov 2007 | B2 |
7319720 | Abrams, Jr. | Jan 2008 | B2 |
7336720 | Martemyanov et al. | Feb 2008 | B2 |
7343291 | Thumpudi | Mar 2008 | B2 |
7346106 | Jiang et al. | Mar 2008 | B1 |
7352808 | Ratakonda et al. | Apr 2008 | B2 |
7643422 | Covell et al. | Jan 2010 | B1 |
7694075 | Feekes, Jr. | Apr 2010 | B1 |
7773672 | Prieto et al. | Aug 2010 | B2 |
7840078 | Segall | Nov 2010 | B2 |
7844992 | Boyce | Nov 2010 | B2 |
7885341 | Chen et al. | Feb 2011 | B2 |
7936820 | Watanabe et al. | May 2011 | B2 |
8130828 | Hsu et al. | Mar 2012 | B2 |
8553769 | He | Oct 2013 | B2 |
9191671 | Vanam et al. | Nov 2015 | B2 |
20020036707 | Gu | Mar 2002 | A1 |
20020080877 | Lu et al. | Jun 2002 | A1 |
20020090027 | Karczewicz et al. | Jul 2002 | A1 |
20020131492 | Yokoyama | Sep 2002 | A1 |
20020136298 | Anantharamu | Sep 2002 | A1 |
20020172154 | Uchida et al. | Nov 2002 | A1 |
20020181584 | Alexandre et al. | Dec 2002 | A1 |
20030035480 | Schaar et al. | Feb 2003 | A1 |
20030058931 | Zhang | Mar 2003 | A1 |
20030185298 | Alvarez et al. | Oct 2003 | A1 |
20030206597 | Kolarov et al. | Nov 2003 | A1 |
20030227974 | Nakamura et al. | Dec 2003 | A1 |
20040117427 | Allen et al. | Jun 2004 | A1 |
20040125877 | Chang | Jul 2004 | A1 |
20040136457 | Funnell et al. | Jul 2004 | A1 |
20040165667 | Lennon et al. | Aug 2004 | A1 |
20040234142 | Chang et al. | Nov 2004 | A1 |
20040264489 | Klemets et al. | Dec 2004 | A1 |
20050025234 | Kato | Feb 2005 | A1 |
20050041740 | Sekiguchi | Feb 2005 | A1 |
20050053157 | Lillevold | Mar 2005 | A1 |
20050075869 | Gersho et al. | Apr 2005 | A1 |
20050084007 | Lightstone et al. | Apr 2005 | A1 |
20050123058 | Greenbaum et al. | Jun 2005 | A1 |
20050165611 | Mehrotra et al. | Jul 2005 | A1 |
20050169545 | Ratakonda et al. | Aug 2005 | A1 |
20050175091 | Puri et al. | Aug 2005 | A1 |
20050180511 | Arafune et al. | Aug 2005 | A1 |
20050195899 | Han | Sep 2005 | A1 |
20050201469 | Sievers | Sep 2005 | A1 |
20050207497 | Rovati et al. | Sep 2005 | A1 |
20050228854 | Steinheider et al. | Oct 2005 | A1 |
20050232497 | Yogeshwar et al. | Oct 2005 | A1 |
20060002479 | Fernandes | Jan 2006 | A1 |
20060114995 | Robey et al. | Jun 2006 | A1 |
20060120610 | Kong et al. | Jun 2006 | A1 |
20060126726 | Lin et al. | Jun 2006 | A1 |
20060126744 | Peng et al. | Jun 2006 | A1 |
20060159169 | Hui et al. | Jul 2006 | A1 |
20060215754 | Buxton et al. | Sep 2006 | A1 |
20060222078 | Raveendran | Oct 2006 | A1 |
20060239343 | Mohsenian | Oct 2006 | A1 |
20060245491 | Jam et al. | Nov 2006 | A1 |
20060248516 | Gordon | Nov 2006 | A1 |
20060248563 | Lee et al. | Nov 2006 | A1 |
20060262844 | Chin | Nov 2006 | A1 |
20060262846 | Burazerovic | Nov 2006 | A1 |
20070039028 | Bar | Feb 2007 | A1 |
20070053444 | Shibata et al. | Mar 2007 | A1 |
20070058718 | Shen et al. | Mar 2007 | A1 |
20070058729 | Yoshinari | Mar 2007 | A1 |
20070071105 | Tian et al. | Mar 2007 | A1 |
20070140352 | Bhaskaran et al. | Jun 2007 | A1 |
20070153906 | Petrescu et al. | Jul 2007 | A1 |
20070160128 | Tian et al. | Jul 2007 | A1 |
20070223564 | Bruls et al. | Sep 2007 | A1 |
20070230564 | Chen et al. | Oct 2007 | A1 |
20070280349 | Prieto et al. | Dec 2007 | A1 |
20080007438 | Segall et al. | Jan 2008 | A1 |
20080046939 | Lu et al. | Feb 2008 | A1 |
20080137736 | Richardson et al. | Jun 2008 | A1 |
20080144723 | Peisong et al. | Jun 2008 | A1 |
20080151101 | Tian et al. | Jun 2008 | A1 |
20080165844 | Karczewicz | Jul 2008 | A1 |
20080165864 | Eleftheriadis et al. | Jul 2008 | A1 |
20080187046 | Joch | Aug 2008 | A1 |
20080247460 | Kang | Oct 2008 | A1 |
20080259921 | Dinesh | Oct 2008 | A1 |
20090003452 | Au et al. | Jan 2009 | A1 |
20090012982 | Merchia et al. | Jan 2009 | A1 |
20090028247 | Suh et al. | Jan 2009 | A1 |
20090033739 | Sarkar et al. | Feb 2009 | A1 |
20090034629 | Suh et al. | Feb 2009 | A1 |
20090037959 | Suh et al. | Feb 2009 | A1 |
20090074074 | Au et al. | Mar 2009 | A1 |
20090110060 | Cortes et al. | Apr 2009 | A1 |
20090147859 | McGowan et al. | Jun 2009 | A1 |
20090176454 | Chen et al. | Jul 2009 | A1 |
20090201990 | Leprovost et al. | Aug 2009 | A1 |
20090219993 | Bronstein et al. | Sep 2009 | A1 |
20090225870 | Narasimhan | Sep 2009 | A1 |
20090244633 | Johnston | Oct 2009 | A1 |
20090268805 | Shanableh | Oct 2009 | A1 |
20090279605 | Holcomb et al. | Nov 2009 | A1 |
20090282162 | Mehrotra et al. | Nov 2009 | A1 |
20100086048 | Ishtiaq et al. | Apr 2010 | A1 |
20100091837 | Zhu et al. | Apr 2010 | A1 |
20100091888 | Nemiroff | Apr 2010 | A1 |
20100142622 | Le Leannec et al. | Jun 2010 | A1 |
20100189179 | Gu et al. | Jul 2010 | A1 |
20100189183 | Gu et al. | Jul 2010 | A1 |
20100208795 | Hsiang | Aug 2010 | A1 |
20100272171 | Xu | Oct 2010 | A1 |
20100316126 | Chen et al. | Dec 2010 | A1 |
20100316134 | Chen et al. | Dec 2010 | A1 |
20110001642 | Yu | Jan 2011 | A1 |
20110188577 | Kishore et al. | Aug 2011 | A1 |
20110305273 | He et al. | Dec 2011 | A1 |
20120044999 | Kim et al. | Feb 2012 | A1 |
20120051432 | Fernandes et al. | Mar 2012 | A1 |
20120056981 | Tian et al. | Mar 2012 | A1 |
20120219069 | Lim et al. | Aug 2012 | A1 |
20120320993 | Gannholm | Dec 2012 | A1 |
20130003833 | Jang | Jan 2013 | A1 |
20130034170 | Chen | Feb 2013 | A1 |
20130044811 | Kim | Feb 2013 | A1 |
20130070859 | Lu | Mar 2013 | A1 |
20140133547 | Tanaka | May 2014 | A1 |
Number | Date | Country |
---|---|---|
0 909 094 | Apr 1999 | EP |
1 195 992 | Apr 2002 | EP |
3032088 | Apr 2000 | JP |
2002-152752 | May 2002 | JP |
3317327 | Aug 2002 | JP |
2003-259307 | Sep 2003 | JP |
2005-252555 | Sep 2005 | JP |
2007-036666 | Feb 2007 | JP |
2007-295423 | Nov 2007 | JP |
10-2005-0089720 | Sep 2005 | KR |
10-2006-0132890 | Dec 2006 | KR |
10-2008-0102141 | Nov 2008 | KR |
WO 0195633 | Dec 2001 | WO |
WO 02054774 | Jul 2002 | WO |
WO 2004004359 | Jan 2004 | WO |
WO 2004025405 | Mar 2004 | WO
WO 2006096612 | Sep 2006 | WO |
WO 2006134110 | Dec 2006 | WO |
WO 2010088030 | Aug 2010 | WO |
Entry |
---|
Chen et al., “Bandwidth-Efficient Encoder Framework for H.264/AVC Scalable Extension,” IEEE Int'l Symp. on Multimedia Workshops, pp. 401-406 (2007). |
Detti et al., “SVEF: an Open-Source Experimental Evaluation Framework for H.264 Scalable Video Streaming,” IEEE Symp. on Computers and Communications, pp. 36-41 (2009). |
Kofler et al., “Improving IPTV Services by H.264/SVC Adaptation and Traffic Control,” IEEE Int'l Symp. on Broadband Multimedia Systems and Broadcasting, 6 pp. (2009). |
Ortiz Murillo et al., “Towards User-driven Adaptation of H.264/SVC Streams,” European Conf. on Interactive TV and Video, 4 pp. (2010). |
Zhu et al., “Rate Control Scheme for Temporal Scalability of H.264/SVC Based on New Rate Distortion Model,” Journal of Convergence Information Technology, vol. 6, No. 1, pp. 24-33 (Jan. 2011). |
Huang et al., “Optimal Coding Rate Control of Scalable and Multi Bit Rate Streaming Media,” Microsoft Research Technical Report MSR-TR-2005-47, 26 pp. (Apr. 2005) (also appeared in Proc. International Packet Video Workshop (PV 2004), Irvine, CA, Dec. 2004 and in Proc. Picture Coding Symposium (PCS 2004), San Francisco, CA, Dec. 2004). |
U.S. Appl. No. 60/341,674, filed Dec. 17, 2001, Lee et al. |
U.S. Appl. No. 60/488,710, filed Jul. 18, 2003, Srinivasan et al. |
U.S. Appl. No. 60/501,081, filed Sep. 7, 2003, Srinivasan et al. |
U.S. Appl. No. 60/501,133, filed Sep. 7, 2003, Holcomb et al. |
Akramullah et al., “Parallelization of MPEG-2 Video Encoder for Parallel and Distributed Computing Systems,” IEEE, pp. 834-837 (Aug. 1995). |
Asbun et al., “Very Low Bit Rate Wavelet-Based Scalable Video Compression,” Proc. Int'l Conf. on Image Processing, vol. 3, pp. 948-952 (Oct. 1998). |
Assuncao et al., “A Frequency-Domain Video Transcoder for Dynamic Bit-Rate Reduction of MPEG-2 Bit Streams,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, No. 8, pp. 953-967 (Dec. 1998). |
Assuncao et al., “Buffer Analysis and Control in CBR Video Transcoding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, No. 1, pp. 83-92 (Feb. 2000). |
Assuncao et al., “Transcoding of Single-Layer MPEG Video Into Lower Rates,” IEE Proc.-Vis. Image Signal Process., vol. 144, No. 6, pp. 377-383 (Dec. 1997). |
ATI Technologies, Inc., “Introduction to H.264,” 6 pp. (month unknown, 2005). |
Braun et al., “Motion-Compensating Real-Time Format Converter for Video on Multimedia Displays,” Proceedings IEEE 4th International Conference on Image Processing (ICIP-97), vol. I, pp. 125-128 (Oct. 1997). |
Brightwell et al., “Flexible Switching and Editing of MPEG-2 Video Bitstreams,” IBC-97, 11 pp. (Sep. 1997). |
Chang et al., “Real-Time Content-Based Adaptive Streaming of Sports Videos,” IEEE, pp. 139-146 (Jul. 2001). |
Chen et al., “Implementation of H.264 Encoder and Decoder on Personal Computers,” Journal of Visual Comm. and Image Representation, 19 pp. (Apr. 2006). |
Chen, “Synchronization and Control of Multi-threads for MPEG-4 Video Decoder,” IEEE 1999 Int'l Conf. on Consumer Electronics, pp. 298-299 (Jun. 1999). |
Crooks, “Analysis of MPEG Encoding Techniques on Picture Quality,” Tektronix Application Note, 11 pp. (Jun. 1998). |
Dawson, “Coding for Multiple Cores on Xbox 360 and Microsoft Windows,” 8 pp. (Aug. 2006) [Downloaded from the Internet on Jan. 22, 2007]. |
Dipert, “Image Compression Article Addendum,” EDN Magazine, 8 pp. (Jun. 18, 1998). |
Duffy, “CLR Inside Out: Using Concurrency for Scalability,” MSDN Magazine, 11 pp. (Sep. 2006) [Downloaded from the Internet on Jan. 22, 2007]. |
Flordal et al., “Accelerating CABAC Encoding for Multi-standard Media with Configurability,” IEEE Xplore, 8 pp. (Apr. 2006). |
Fogg, “Question That Should Be Frequently Asked About MPEG,” Version 3.8, 46 pp. (Apr. 1996). |
foldoc.org, “priority scheduling,” 1 p. (No date) [Downloaded from the Internet on Jan. 26, 2007]. |
foldoc.org, “multitasking,” 1 p. (Document dated Apr. 24, 1998) [Downloaded from the Internet on Jan. 26, 2007]. |
Gerber et al., “Optimizing Video Encoding using Threads and Parallelism: Part 1—Threading a video codec,” 3 pp., downloaded from Embedded.com, (Dec. 2009). |
Gibson et al., Digital Compression for Multimedia, “Chapter 4: Quantization,” Morgan Kaufman Publishers, Inc., pp. 113-138 (Jan. 1998). |
Gibson et al., Digital Compression for Multimedia, “Chapter 7: Frequency Domain Coding,” Morgan Kaufman Publishers, Inc., pp. 227-262 (Jan. 1998). |
Gill, “Tips and Tricks for Encoding Long Format Content with Windows Media Encoder,” downloaded from World Wide Web, 12 pp. (document marked Aug. 2003). |
Hamming, Digital Filters, Second Edition, “Chapter 2: The Frequency Approach,” Prentice-Hall, Inc., pp. 19-31 (Jan. 1983). |
Haskell et al., Digital Video: An Introduction to MPEG-2, Table of Contents, International Thomson Publishing, 6 pp. (1997). |
Huang et al., “Optimal Control of Multiple Bit Rates for Streaming Media,” Proc. Picture Coding Symposium, 4 pp. (Dec. 2004). |
Intel Corp., “Intel's Next Generation Integrated Graphics Architecture—Intel® Graphics Media Accelerator X3000 and 3000,” 14 pp. (Jul. 2006). |
ISO/IEC, “ISO/IEC 11172-2, Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s—Part 2: Video,” 112 pp. (Aug. 1993). |
ISO/IEC, “JTC1/SC29/WG11 N2202, Information Technology—Coding of Audio-Visual Objects: Visual, ISO/IEC 14496-2,” 329 pp. (Mar. 1998). |
ISO/IEC MPEG-2 Test Model 5, “TM5 Overview,” 10 pp. (Mar. 1993). |
Ito et al., “Rate control for video coding using exchange of quantization noise and signal resolution,” Electronics & Communications in Japan, Part II, Hoboken, New Jersey, vol. 83, No. 1, Jan. 1, 2000, pp. 33-43. |
ITU-T, “ITU-T Recommendation H.261, Video Codec for Audiovisual Services at p × 64 kbit/s,” 25 pp. (Mar. 1993). |
ITU-T, “ITU-T Recommendation H.262, Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Video,” 205 pp. (Jul. 1995). |
ITU-T, “ITU-T Recommendation H.263, Video Coding for Low Bit Rate Communication,” 162 pp. (Feb. 1998). |
ITU-T, “ITU-T Recommendation H.264, Advanced video coding for generic audiovisual services,” 676 pp. (Mar. 2010). |
Jacobs et al., “Thread-Parallel MPEG-2, MPEG-4 and H.264 Video Encoders for SoC Multi-Processor Architectures,” IEEE Trans. On Consumer Electronics, vol. 52, No. 1, pp. 269-275 (Feb. 2006). |
Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, “Joint Final Committee Draft (JFCD) of Joint Video Specification,” JVT-D157, 207 pp. (Aug. 2002). |
Kamikura et al., “Global brightness-variation compensation for video coding” IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, No. 8, pp. 988-1000 (Dec. 1998). |
Kari et al., “Intensity controlled motion compensation,” Data Compression Conference Proc., pp. 249-258, (Mar. 30-Apr. 1, 1998). |
Keesman et al., “Transcoding of MPEG Bitstreams,” Signal Processing: Image Communication 8, pp. 481-500 (Sep. 1996). |
Khan et al., “Architecture Overview of Motion Vector Reuse Mechanism in MPEG-2 Transcoding,” Technical Report TR2001-01-01, 7 pp. (Jan. 2001). |
Kim et al., “Multi-thread VLIW processor architecture for HDTV decoding,” IEEE 2000 Custom Integrated Circuits Conf., pp. 559-562 (May 2000). |
Knee et al., “Seamless Concatenation—A 21st Century Dream,” 13 pp. (Jun. 1997). |
Lei et al., “Rate Adaptation Transcoding for Precoded Video Streams,” 13 pp. (month unknown, 2000). |
Leventer et al., “Towards Optimal Bit-Rate Control in Video Transcoding,” ICIP, pp. 265-268 (Sep. 2003). |
Loomis et al., “VC-1 Technical Overview,” 7 pp. (Apr. 2006) [Downloaded from the Internet on Jan. 24, 2007]. |
Microsoft Corporation, “Microsoft Debuts New Windows Media Player 9 Series, Redefining Digital Media on the PC,” 4 pp. (Sep. 4, 2002) [Downloaded from the World Wide Web on May 14, 2004]. |
Microsoft Corporation, “Windows Media and Web Distribution for Broadcasters,” downloaded from the World Wide Web, 4 pp. (document marked Sep. 2007). |
Microsoft Corporation, “Microsoft Lync—Unified Communication Specification for H.264 AVC and SVC UCConfig Modes V 1.1,” 37 pp. (Jun. 2011). |
Microsoft Corporation, “Microsoft Lync—Unified Communication Specification for H.264 AVC and SVC Encoder Implementation V 1.01,” 32 pp. (Jan. 2011). |
Miyata et al., “A novel MPEG-4 rate control method with spatial resolution conversion for low bit-rate coding,” Conference Proceedings / IEEE International Symposium on Circuits and Systems (ISCAS), Kobe, Japan, May 23-26, 2005, pp. 4538-4541. |
Mook, “Next-Gen Windows Media Player Leaks to the Web,” BetaNews, 17 pp. (Jul. 2002) [Downloaded from the World Wide Web on Aug. 8, 2003]. |
Moshnyaga, “An Implementation of Data Reusable MPEG Video Coding Scheme,” Proceedings of World Academy of Science, Engineering and Technology, vol. 2, pp. 193-196 (Jan. 2005). |
Moshnyaga, “Reduction of Memory Accesses in Motion Estimation by Block-Data Reuse,” ICASSP '02 Proceedings, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. III-3128-III-3131 (May 2002). |
Nasim et al., “Architectural Optimizations for Software-Based MPEG4 Video Encoder,” 13th European Signal Processing Conference: EUSIPCO '2005, 4 pp. (Sep. 2005). |
Nuntius Systems, Inc., “H.264—a New Technology for Video Compression”, downloaded from the World Wide Web, 4 pp. (document marked Mar. 2004). |
Oehring et al., “MPEG-2 Video Decompression on Simultaneous Multithreaded Multimedia,” Int. Conf. on Parallel Architectures and Compilation Techniques (PACT '99), Newport Beach, CA (Oct. 1999). |
Ostermann et al., “Video Coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits and Systems Magazine, pp. 7-28 (Aug. 2004). |
Ozcelebi et al., “Optimal rate and input format control for content and context adaptive video streaming,” 2004 International Conference on Image Processing (ICIP), Singapore, Oct. 24-27, 2004, pp. 2043-2046. |
Ozcelebi et al., “Optimal rate and input format control for content and context adaptive streaming of sports videos,” 2004 IEEE 6th Workshop on Multimedia Signal Processing, Siena, Italy, Sep. 29-Oct. 1, 2004, pp. 502-505. |
Printouts of FTP directories from http://ftp3.itu.ch, 8 pp. [Downloaded from the World Wide Web on Sep. 20, 2005]. |
Reader, “History of MPEG Video Compression—Ver. 4.0,” 99 pp. [Document marked Dec. 16, 2003]. |
RealNetworks, Inc., “Chapter 5: Producing Video,” downloaded from the World Wide Web, 22 pp. (document marked 2004). |
Reed et al., “Optimal multidimensional bit-rate control for video communication,” IEEE Transactions on Image Processing, vol. 11, No. 8, pp. 873-874 (Aug. 1, 2002). |
Roy et al., “Application Level Hand-off Support for Mobile Media Transcoding Sessions,” Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video, 22 pp. (May 2002). |
Sambe et al., “High-speed Distributed Video Transcoding for Multiple Rates and Formats,” IEICE Trans on Information and Systems, vol. E88-D, Issue 8, pp. 1923-1931 (Aug. 2005). |
Schwarz et al., “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, No. 9, pp. 1103-1120 (Sep. 2007). |
Senda et al., “A Realtime Software MPEG Transcoder Using a Novel Motion Vector Reuse and a SIMD Optimization Techniques,” ICASSP '99 Proceedings, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2359-2362 (Mar. 1999). |
Shanableh et al., “Heterogeneous Video Transcoding to Lower Spatio-Temporal Resolutions and Different Encoding Formats,” IEEE Transactions on Multimedia, 31 pp. (Jun. 2000). |
Shanableh et al., “Transcoding of Video Into Different Encoding Formats,” ICASSP-2000 Proceedings, vol. IV of VI, pp. 1927-1930 (Jun. 2000). |
SMPTE, “Proposed SMPTE Standard for Television: VC-1 Compressed Video Bitstream Format and Decoding Process,” SMPTE 421M, pp. i-xx, 5-7, 23-27 (Aug. 2005). |
SMPTE, “SMPTE 327M-2000—MPEG-2 Video Recoding Data Set,” 9 pp. (Jan. 2000). |
Sullivan et al., “DirectX Video Acceleration (DXVA) Specification for H.264/MPEG-4 Scalable Video Coding (SVC) Off-Host VLD Mode Decoding,” 24 pp. (Jun. 2012). |
Sullivan, “DirectX Video Acceleration Specification for H.264/AVC Decoding,” 66 pp. (Dec. 2007, updated Dec. 2010). |
Sullivan et al., “DirectX Video Acceleration Specification for H.264/MPEG-4 AVC Multiview Video Coding (MVC), Including the Stereo High Profile,” 17 pp. (Mar. 2011). |
Sun et al., “Architectures for MPEG Compressed Bitstream Scaling,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, No. 2, pp. 191-199 (Apr. 1996). |
Sun et al., “Lossless Coders,” Digital Signal Processing for Multimedia Systems, Chapter 15, pp. 385-416 (Mar. 1999). |
Swann et al., “Transcoding of MPEG-II for Enhanced Resilience to Transmission Errors,” Cambridge University Engineering Department, Cambridge, UK, pp. 1-4 (Sep. 1996). |
Takahashi et al., “Motion Vector Synthesis Algorithm for MPEG2-to-MPEG4 Transcoder,” Proc. of SPIE, vol. 4310, pp. 872-882 (Jan. 2001). |
Tan et al., “On the Methods and Performances of Rational Downsizing Video Transcoding,” Signal Processing: Image Communication 19, pp. 47-65 (Jan. 2004). |
Tektronix Application Note, “Measuring and Interpreting Picture Quality in MPEG Compressed Video Content,” 8 pp. (2001). |
Tsai et al., “Rate-Distortion Model for Motion Prediction Efficiency in Scalable Wavelet Video Coding,” 17th International Packet Video Workshop, 9 pp. (May 2009). |
Tudor et al., “Real-Time Transcoding of MPEG-2 Video Bit Streams,” BBC R&D, U.K., 6 pp. (Sep. 1997). |
Van Der Tol et al., “Mapping of MPEG-4 decoding on a flexible architecture platform,” Proceedings of the SPIE, Media Processors, vol. 4674, 13 pp. (Jan. 2002). |
Van Der Tol et al., “Mapping of H.264 decoding on a multiprocessor architecture,” Proceedings of the SPIE, vol. 5022, pp. 707-718 (May 2003). |
Vetro et al., “Complexity-Quality Analysis of Transcoding Architectures for Reduced Spatial Resolution,” IEEE Transactions on Consumer Electronics, 9 pp. (Aug. 2002). |
Vishwanath et al., “A VLSI Architecture for Real-Time Hierarchical Encoding/Decoding of Video Using the Wavelet Transform,” Proc. ICASSP, 5 pp. (Apr. 1994). |
Waggoner, “In Depth Microsoft Silverlight,” downloaded from the World Wide Web, 94 pp. (document marked 2007). |
Watkinson, The MPEG Handbook, pp. 275-281 (Nov. 2004). |
Werner, “Generic Quantiser for Transcoding of Hybrid Video,” Proc. 1997 Picture Coding Symposium, Berlin, Germany, 6 pp. (Sep. 1997). |
Werner, “Requantization for Transcoding of MPEG-2 Intraframes,” IEEE Transactions on Image Processing, vol. 8, No. 2, pp. 179-191 (Feb. 1999). |
Wiegand et al., “Overview of the H.264/AVC Coding Standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, No. 7, pp. 560-576 (Jul. 2003). |
Youn et al., “Video Transcoder Architectures for Bit Rate Scaling of H.263 Bit Streams,” ACM Multimedia 1999, Orlando, Florida, pp. 243-250 (Oct. 1999). |
Zhou et al., “Motion Vector Reuse Algorithm to Improve Dual-Stream Video Encoder,” ICSP 2008, 9th International Conference on Signal Processing, pp. 1283-1286 (Oct. 2008). |
Number | Date | Country | |
---|---|---|---|
20130177071 A1 | Jul 2013 | US |