This application claims the benefit of the filing date of European Patent Application No. 14305052.4, filed Jan. 14, 2014, which is hereby incorporated by reference in its entirety.
This invention relates to a method and an apparatus for multiplexing, and more particularly, to a method and an apparatus for multiplexing multiple bit streams corresponding to layered coded contents, and a method and apparatus for processing the same.
When transporting Audio Video (AV) streams, one common challenge is to send as many streams (channels) as possible within a fixed capacity network link (with a fixed bandwidth) while ensuring that the quality of each AV service remains above an acceptance threshold.
When using Constant Bitrate (CBR) streams, simple time division multiplexing is often used to share the available bandwidth between AV services. While this approach is simple in terms of bandwidth allocation to each service, it is unfortunately inefficient in terms of AV coding. Indeed, when using CBR coding, sequences are coded at the same bitrate regardless of their complexity.
Variable Bitrate (VBR) coding allows spending higher bitrates on sequences with higher complexity (for example, sequences with more detail or more movement) while ensuring that lower bitrates are used for sequences with lower complexity. The complexity of the audio/video content is usually computed in order to decide how much bitrate will be dedicated at a given instant to the coding of the audio/video content.
Several VBR streams may be transported within a fixed capacity network link.
Statistical multiplexing is based on the assumption that statistically, higher complexity scenes from one stream can happen at the same time as lower complexity scenes from another stream in the same network link. Therefore, extra bandwidth used for coding complex scenes can come from bandwidth savings on the coding of less complex scenes at the same time. Statistical multiplexing usually evaluates in real time the complexity of all AV streams and then allocates the total available bandwidth among each of the streams taking into account the complexity of all streams. When several streams compete for the bandwidth, additional mechanisms such as simple priorities may be used to make decisions on bandwidth sharing.
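By way of illustration only, the following Python sketch shows one possible complexity-proportional bandwidth allocation with a per-stream minimum rate, consistent with the statistical multiplexing described above; the function name, the minimum-rate floor, and the proportional rule are assumptions of this example, not a prescribed implementation.

```python
def allocate_bandwidth(complexities, total_bw, min_bw=500_000):
    """Split total_bw (bits/s) among streams in proportion to their measured
    complexities, while guaranteeing each stream at least min_bw."""
    n = len(complexities)
    assert total_bw >= n * min_bw, "link cannot satisfy the minimum rates"
    spare = total_bw - n * min_bw
    total_complexity = sum(complexities) or 1.0
    return [min_bw + spare * c / total_complexity for c in complexities]

# Three streams on a 20 Mb/s link; the second carries a complex scene and
# therefore receives the largest share of the spare bandwidth.
print(allocate_bandwidth([1.0, 3.0, 1.0], total_bw=20_000_000))
```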
The invention sets out to remedy some of the drawbacks of the prior art. In particular, in some embodiments, the invention makes it possible to reduce bitrate peaks after multiplexing. The present principles provide a method of processing a first bit stream and a second bit stream, comprising: accessing the first bit stream and the second bit stream, wherein the first bit stream corresponds to one of a base layer of layered coded content and an enhancement layer of the layered coded content, and the second bit stream corresponds to the other one of the base layer and the enhancement layer of the layered coded content; delaying the second bit stream by a first time duration; and multiplexing the first bit stream and the delayed second bit stream, as described below.
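A minimal sketch of this method might look as follows, assuming packets carry explicit timestamps; the representation of packets as (timestamp, payload) tuples and the function name are illustrative assumptions, not part of the present principles.

```python
def multiplex_with_delay(first_stream, second_stream, delay_d):
    """Delay the second bit stream by delay_d seconds, then interleave both
    streams by timestamp. Packets are (timestamp_s, payload) tuples."""
    delayed = [(t + delay_d, payload) for t, payload in second_stream]
    return sorted(first_stream + delayed, key=lambda pkt: pkt[0])

# Example: delay the EL stream by 0.25 s relative to the BL stream.
bl = [(0.0, "BL-GOP0"), (0.5, "BL-GOP1")]
el = [(0.0, "EL-GOP0"), (0.5, "EL-GOP1")]
print(multiplex_with_delay(bl, el, delay_d=0.25))
```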
According to an embodiment, the method further comprises: determining bits in the multiplexed streams exceeding capacity of a network link; and time shifting the determined bits by a second time duration.
According to an embodiment, the method further comprises determining the first time duration responsive to encoding parameters for the layered coded content, the encoding parameters including at least one of a GOP (Group of Pictures) length and a GOP structure. According to a variant, the first time duration varies from GOP to GOP.
According to an embodiment, the method further comprises transmitting the multiplexed streams and information representative of the first time duration.
The present principles also provide an apparatus for performing these steps.
According to an embodiment, the apparatus is disposed within one of a server and a video multiplexer.
According to an embodiment, the apparatus comprises one or more of the following: a transmitting antenna, an interface to a transmitting antenna, a video encoder, a video memory, a video server, an interface with a video camera, and a video camera.
The present principles also provide a method of processing a first bit stream and a second bit stream, comprising: decoding the first bit stream into a first representation of a program content; decoding the second bit stream into a second representation of the program content after a delay from the decoding of the first bit stream, wherein the first bit stream corresponds to one of a base layer of layered coded content and an enhancement layer of the layered coded content, and the second bit stream corresponds to the other one of the base layer of the layered coded content and the enhancement layer of the layered coded content; and outputting signals corresponding to the first representation and the second representation for rendering as described below.
According to an embodiment, the method further comprises rendering the first representation at a speed slower than a speed specified in at least one of the first bit stream, the second bit stream, and a transport stream.
According to an embodiment, the method further comprises rendering the first representation at the specified speed after the rendering of the first and second representations are aligned in time.
According to an embodiment, the method further comprises de-multiplexing the first bit stream, the second bit stream and information representative of the delay from a transport stream.
The present principles also provide an apparatus for performing these steps.
According to an embodiment, the apparatus comprises one or more of the following: an antenna or an interface to an antenna, a communication interface, a video decoder, a video memory and a display.
The present principles also provide a computer readable storage medium having stored thereon instructions for processing a first bit stream and a second bit stream, according to the methods described above.
When transporting two representations of the same content, it may be advantageous to use layered coding, rather than coding each representation separately and transporting them simultaneously over the same network link. With layered coding, the base layer (BL) provides basic quality, while successive enhancement layers (EL) refine the quality incrementally. For example, both HD and UltraHD (UHD) versions of the same content can be delivered as one layered coded content, in which the base layer contains the HD version of the media and the enhancement layer contains the extra information needed to rebuild the UltraHD content from the HD content.
In layered coding, since the base layer and enhancement layers represent the same content with different qualities, their coding complexity (and therefore their bitrate needs for a proper quality after coding) usually follows a similar trend, and their bitrates usually exhibit peaks and drops at similar time instances. Such bitrate peaks may cause problems for statistical multiplexing which assumes that peaks and drops from different streams should statistically rarely coexist. In particular, simultaneous bitrate peaks from different bit streams can create an overall peak in terms of the total bandwidth usage, and the bitrate need for both the base layer and enhancement layers may exceed the network link capacity dedicated to this service.
In the present principles, we propose different methods to adapt statistical multiplexing for layered coding content. In one embodiment, we introduce a delay in the enhancement layer bit stream or the base layer bit stream such that bitrate peaks from different layers no longer occur simultaneously and the amplitude of the overall peak can be decreased.
In the following examples, we may assume that there is only one enhancement layer in the layered coding, and that the layered coding is applied to video content. The present principles can also be applied when there are more enhancement layers, and to other types of media, for example, audio content. In the present application, we use the term “BL version” or “BL content” to refer to the original or decoded content corresponding to the base layer, and the term “EL version” or “EL content” to refer to the original or decoded content corresponding to the enhancement layer. Note that to decode the EL version, the base layer is usually needed.
The delay D may be fixed, and it can be determined based on encoding parameters, for example, based on the GOP (Group of Pictures) length and GOP structure as set forth, for example, in the MPEG standards. In one example, delay D can be set to half the duration of the GOP. It is also possible to vary the value of D from GOP to GOP. In one example, D may vary from GOP to GOP depending on the coding structure (Intra only, IPPPP, IBBB, or random access) and/or the GOP length. In another example, if the quality of the enhancement layer is very low, the delay could be small because the enhancement layer bitrate peaks can be small. If delay D varies from GOP to GOP, the decoder needs to know the maximum value of D (Dmax) to decide its buffer size, and Dmax must be signaled to the decoder.
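For illustration, the following sketch derives a per-GOP delay from the GOP length and coding structure as described above; the structure-dependent scaling factors are assumptions of this example, not values taken from the MPEG standards.

```python
# Illustrative only: the factors below are assumptions of this sketch.
STRUCTURE_FACTOR = {
    "intra_only": 0.25,     # small bitrate peaks, small delay suffices
    "IPPPP": 0.5,
    "IBBB": 0.5,
    "random_access": 0.5,
}

def delay_for_gop(gop_length_frames, frame_rate, structure="IPPPP"):
    """Return delay D in seconds, e.g. half the GOP duration by default."""
    gop_duration = gop_length_frames / frame_rate
    return gop_duration * STRUCTURE_FACTOR[structure]

# D for a 32-frame random-access GOP at 25 fps: 0.64 s. If D varies per GOP,
# the maximum value Dmax must be signaled so the decoder can size its buffer.
print(delay_for_gop(32, 25.0, "random_access"))
```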
In some GOP structures, for example, I0P8B4B2b1b3B6b5b7, there is a significant delay between reception of the first image (I0) and the second image in display order (b1). To avoid the scenario where data runs out during decoding, in the present application we assume that the entire GOP is needed before decoding of the GOP starts. The present principles can also be applied when the decoding starts at a different time.
With traditional MPEG video coding, the maximum channel change time (also known as zap time) is usually 2 GOP times. For layered coding, since the enhancement layer can only be decoded after the base layer is received and decoded, the channel change time for the enhancement layer is equal to 2 maximum GOP times, wherein the maximum GOP time is the largest GOP time used among the base layer and enhancement layer(s) needed to decode a given layer. When BL and ELs have the same GOP size, the channel change time is 2 GOP times, the same as in traditional MPEG video coding.
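The channel change rule stated above can be expressed directly in code; this is a minimal sketch of that rule, with illustrative names and values.

```python
def channel_change_time(gop_times, layer):
    """Worst-case zap time for `layer` (0 = BL): twice the largest GOP
    duration among the layers needed to decode it (layers 0..layer)."""
    return 2 * max(gop_times[: layer + 1])

# BL GOP of 0.5 s, EL GOP of 1.0 s:
print(channel_change_time([0.5, 1.0], layer=0))  # 1.0 s to reach the BL
print(channel_change_time([0.5, 1.0], layer=1))  # 2.0 s to reach the EL
```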
As shown in method 300, a delay can be added to the enhancement layer or the base layer. When the delay is added to the base layer with regard to the enhancement layer (that is, the enhancement layer is sent before the base layer), at the time the first entire GOP of the base layer (BL GOP) is received and ready for display, the first entire GOP of the enhancement layer (EL GOP) could also have been received, and rendering could start directly in the EL version. If longer GOPs are used in the enhancement layer than in the base layer (which is often the case), the first EL GOP may not be entirely received when the first entire BL GOP is received; in that case, the base layer must be rendered first, and switching to the enhancement layer can be performed as described below for the scenario where the delay is added to the enhancement layer with regard to the base layer. One advantage of adding the delay to the base layer with regard to the enhancement layer is that the channel change time for the enhancement layer is decreased, even though there might be an extra playback delay, which is usually not an issue except for live events.
When the delay is added to the enhancement layer with regard to the base layer, the channel change time of the enhancement layer becomes (2 GOP times + D). The additional delay D may make the channel change time too long. It may also make it difficult for users to understand why, for a given content, the channel changes much faster to the BL version (HD, for instance) than to the EL version (UltraHD, for instance).
In order to reduce the channel change time, the present principles provide different channel change mechanisms for the EL content, for example, as described further below in methods 400 and 500.
At time T1, we can display the BL content but not the EL content, since the EL bit stream is delayed by D. For ease of notation, we use Fi to denote the ith frame to be rendered. At step 440, until one full EL GOP is received (at time T2), the decoder continues decoding and rendering the BL content while buffering the EL bit stream.
At time T2, the decoder is ready to decode and display the first frame F0 of the EL content, but the first frame F0 of the BL content has already been rendered at time T1, since the EL is delayed by D (D = T2 − T1). To resynchronize the two layers, two modes could be used, referred to below as the “replay” mode and the “wait” mode.
At time T2, the user has the option (step 450), for example, through a pop-up, to switch to the EL version. While the user is making the decision (for example, when the pop-up is presented to the user), the BL version can still be rendered in the background. If the user chooses not to switch to the EL version, BL decoding and rendering continue at step 460. Otherwise, if the user decides to switch to the EL version, the decoder starts decoding and rendering the EL frame at step 470, for example, using the “replay” mode or “wait” mode.
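The decision flow of method 400 can be sketched as follows; the step numbers come from the text above, while the parameter names and the frame-count bookkeeping are illustrative assumptions of this example.

```python
def method_400(bl_frames_ready, el_gop_frames, el_buffered, user_wants_el):
    """Sketch of the decision flow of method 400 (steps 440-470 above).
    Returns which version is rendered once a full EL GOP is buffered."""
    # Step 440: keep decoding/rendering the BL while the EL GOP fills up.
    while el_buffered < el_gop_frames:
        el_buffered += 1        # one more EL frame buffered
        bl_frames_ready += 1    # BL rendering continues in the meantime
    # Time T2: a full EL GOP is buffered. Step 450: offer the switch.
    if user_wants_el:
        return "EL"             # step 470: switch using "replay" or "wait"
    return "BL"                 # step 460: continue BL decoding/rendering

print(method_400(bl_frames_ready=8, el_gop_frames=16, el_buffered=4,
                 user_wants_el=True))
```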
The advantage of method 400 is that it is simple, and it allows the user to change quickly across several channels by looking at the BL versions; the user is offered the option to watch the EL version only when it is actually available. In one embodiment, the decoder may propose a user setting to decide whether to display such an option or to always automatically switch to the EL version when it becomes available.
When the user switches from the BL version to the EL version, the quality may improve significantly and the user may notice a quality jump. To smooth the quality transition, we can use progressive “upscaling” from the BL to EL, for example, using the method described in U.S. application Ser. No. 13/868,968, titled “Method and apparatus for smooth stream switching in MPEG/3GPP-DASH,” by Yuriy Reznik, Eduardo Asbun, Zhifeng Chen, and Rahul Vanam.
At step 510, the decoder accesses the base layer bit stream. It buffers the BL stream at step 520 until a full BL GOP is received and decoding can start. At step 530, it accesses the enhancement layer bit stream. It buffers the EL stream at step 540 until a full EL GOP is received and decoding can start. Due to the addition of delay D, the BL is in advance of the EL by N frames, where N corresponds to delay D expressed in frames (N = D × F for a frame rate of F frames per second).
Due to delay D, at the beginning of the decoding, the decoded frames from the BL and ELs may not be aligned. In order to align the rendering in time of the BL and EL contents, the present embodiments propose to slow down the rendering of the BL by m% at step 560, until the two layers are aligned. Note that video content is usually rendered at a frame rate specified for playback in the bit stream. However, in method 500, in order to align the BL and EL, the rendering of the BL is m% slower than the specified frame rate. Consequently, at some time T3, both the BL and EL contents will be aligned and offer the same frame at the same time. If it is determined at step 550 that the BL and EL contents are aligned, the BL and EL are rendered at the normal speed specified in the bit stream at step 570. Using method 500, a decoder can seamlessly switch from the BL to the EL without breaking the frame flow.
Time T3 can be obtained using the following formula:

T3 = T1 + D*100/m.
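The following sketch computes the quantities of method 500 directly from the formula above: the BL lead N in frames and the alignment time T3; the function and parameter names are illustrative.

```python
def alignment(delay_d, frame_rate, slowdown_pct, t1=0.0):
    """Method 500 quantities: the BL lead in frames (N = D * F) and the
    alignment time T3 = T1 + D * 100 / m, per the formula above."""
    n_frames = round(delay_d * frame_rate)       # BL lead in frames
    t3 = t1 + delay_d * 100.0 / slowdown_pct     # when BL and EL align
    return n_frames, t3

# D = 0.64 s at 25 fps with a 2% slow-down: a 16-frame lead, aligned at
# T3 = 32 s after T1. A larger m shortens T3 but is more noticeable.
print(alignment(0.64, 25.0, 2.0))
```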
The choice of m is important. The greater m is, the more likely the user will notice the slow-down effect; therefore, it is important to keep m low. On the other hand, the smaller m is, the longer it will take for the BL and EL to be aligned (at time T3). This trade-off is to be decided at the decoder. Such a decoder setting may or may not be exposed to the user.
When the decrease in the rendering speed is small enough (i.e., a small value of m), the slowdown of the video is usually hardly perceptible by the user. However, a slowdown of an audio stream is more noticeable, and existing solutions that adjust the pitch of the sound may be used to hide the slowdown.
As discussed above, adding a delay to the base layer or enhancement layer bit stream helps reduce simultaneous bitrate peaks of the BL and EL streams, and thus uses the bandwidth more efficiently. However, it may sometimes not be enough to completely eliminate the bandwidth overflow. Therefore, in addition to adding delay D, the present principles also propose using a time window W when transmitting the bit stream.
To illustrate how the time window works, consider bit streams whose aggregate bitrate momentarily exceeds the fixed bandwidth. In order to fit the bit streams into the fixed bandwidth, bits exceeding the bandwidth are shifted, backward or forward, within a time window W. Consequently, all streams can be transmitted within the network link capacity. For ease of notation, we denote the portion of bits that exceeds the bandwidth as an “over-the-limit” portion.
We have discussed introducing delay D and time window W in order to multiplex bit streams. These two mechanisms can be used separately or jointly. When introducing delay D, an entire BL or EL stream is shifted in time. By contrast, when using time window W, we first determine whether the overall bitrate of all bit streams goes beyond the maximum allowed bitrate, and if there exists an “over-the-limit” portion, we re-distribute that portion. Further, as discussed before, delay D may be determined based on encoding parameters or take a pre-determined value. By contrast, the time shift applied within window W depends on where the bitrate peaks are and where spare bitrate is available.
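One possible realization of the time-window mechanism is sketched below; the greedy nearest-slot strategy is an assumption of this example (shifting bits backward presumes the multiplexer can buffer and send them ahead of schedule), not a prescribed algorithm.

```python
def apply_time_window(bits_per_slot, capacity, window_slots):
    """Shift 'over-the-limit' bits into nearby slots with spare capacity,
    looking up to window_slots backward or forward. Greedy sketch only."""
    slots = list(bits_per_slot)
    for i, load in enumerate(slots):
        excess = load - capacity
        if excess <= 0:
            continue
        # Try neighbours at growing distance, backward then forward.
        for d in range(1, window_slots + 1):
            for j in (i - d, i + d):
                if excess <= 0 or not (0 <= j < len(slots)):
                    continue
                spare = capacity - slots[j]
                if spare > 0:
                    moved = min(spare, excess)
                    slots[j] += moved
                    excess -= moved
        slots[i] = capacity + max(excess, 0.0)  # any leftover still overflows
    return slots

# Capacity 10 per slot; the peak of 14 at slot 2 is spread within W = 2 slots.
print(apply_time_window([8, 9, 14, 7, 6], capacity=10, window_slots=2))
```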
When a time window is used to shift the “over-the-limit” portion of bits, the channel change mechanisms described before for using a delay (for example, methods 400 and 500) are still applicable. In particular, the value of T3 can be computed with W (replacing D) when the time window is used alone, or W+D (replacing D) when both delay D and time window are used.
Even when using both delay D and the time window, the aggregated bit stream may still exceed the network link capacity. In this case, we may decrease the bitrate of one or several streams within the same network link so as to fit the bit streams into the network link, or we may even have to drop one or more bit streams.
The present principles propose different methods, such as adding a delay to a base layer bit stream or an enhancement layer bit stream, and shifting an “over-the-limit” portion of bits within a time window, to more efficiently use the bandwidth. Particularly, our methods work well for transmitting layered coded content, which does not satisfy the usual assumption of statistical multiplexing.
At the receiver side, even given the delay added to the bit streams, the present principles provide different channel change mechanisms to allow a user to change channels quickly. In particular, a decoder can start rendering the BL content without having to wait for the EL to be available. Advantageously, this allows the user to quickly flip between many channels until he sees something he would like to watch for a longer time. The present principles also provide the option for the user to decide whether to switch to the EL version after watching the video for a period of time.
In the above, we discuss various methods that can be used for layered coding. The present principles can also be applied to scalable video coding that is compliant with a standard, for example, but not limited to, H.264 SVC or SHVC. The multiplexing methods and the channel change mechanisms can be used together with any transport protocol, such as MPEG-2 Transport, the MMT (MPEG Media Transport) protocol, or an ATSC (Advanced Television Systems Committee) transport protocol.
Decoder 100 may perform a channel change according to the present principles, for example as described in methods 400 and 500, when a user requests a channel change. In one mode, decoder 100 provides MPEG decoded data for display and audio reproduction on units 50 and 55, respectively. In another mode, the transport stream from unit 17 is processed by decoder 100 to provide an MPEG compatible datastream for storage on storage medium 105 via storage device 90.
A user selects a TV channel or an on-screen menu, such as a program guide, for viewing by using remote control unit 70. Processor 60 uses the selection information provided from remote control unit 70 via interface 65 to appropriately configure the elements of the decoder.
It is assumed for exemplary purposes that a video receiver user selects a sub-channel (SC) for viewing using remote control unit 70. Processor 60 uses the selection information provided from remote control unit 70 via interface 65 to appropriately configure the elements of decoder 100 to receive the physical channel corresponding to the selected sub-channel SC.
The output data provided to processor 22 is in the form of a transport datastream containing program channel content and program specific information for many programs distributed through several sub-channels.
Processor 22 matches the Packet Identifiers (PIDs) of incoming packets provided by decoder 17 with the PID values of the video, audio and sub-picture streams being transmitted on sub-channel SC. These PID values are pre-loaded in control registers within unit 22 by processor 60. Processor 22 captures the packets constituting the program transmitted on sub-channel SC and forms them into MPEG compatible video and audio streams for output to video decoder 25 and audio decoder 35, respectively. The video and audio streams contain compressed video and audio data representing the selected sub-channel SC program content.
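The PID-based packet capture just described can be sketched as follows; the PID values and stream names are hypothetical placeholders, not values from any actual transport stream.

```python
def filter_program(packets, pid_map):
    """Route transport packets to elementary streams by PID, in the manner
    described for processor 22. pid_map maps PID -> stream name."""
    streams = {name: [] for name in pid_map.values()}
    for pid, payload in packets:
        if pid in pid_map:            # PID values pre-loaded by processor 60
            streams[pid_map[pid]].append(payload)
    return streams

# Hypothetical PIDs for the video, audio and sub-picture streams of SC.
pid_map = {0x100: "video", 0x101: "audio", 0x102: "subpicture"}
packets = [(0x100, b"v0"), (0x101, b"a0"), (0x1FF, b"other"), (0x100, b"v1")]
print(filter_program(packets, pid_map))
```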
Decoder 25 decodes and decompresses the MPEG compatible packetized video data from unit 22 and provides decompressed program representative pixel data to device 50 for display. Similarly, audio processor 35 decodes the packetized audio data from unit 22 and provides decoded audio data, synchronized with the associated decompressed video data, to device 55 for audio reproduction.
In a storage mode of the system, processor 60, in conjunction with processor 22, forms a composite MPEG compatible datastream containing packetized content data of the selected program and associated program specific information. The composite datastream is output to storage interface 95. Storage interface 95 buffers the composite datastream to reduce gaps and bitrate variation in the data. The resultant buffered data is processed by storage device 90 to be suitable for storage on medium 105. Storage device 90 encodes the buffered datastream from interface 95 using known error encoding techniques such as channel coding, interleaving and Reed-Solomon encoding to produce an encoded datastream suitable for storage. Unit 90 stores the resultant encoded datastream incorporating the condensed program specific information on medium 105.
According to specific embodiments, the receiving system (or apparatus) comprises one or more of the following: an antenna or an interface to an antenna, a communication interface (e.g. from a wired or wireless link or network), a video decoder, a video memory and a display.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”) and other devices that facilitate communication of information between end-users.
According to specific embodiments of the method of processing a first bit stream and a second bit stream, the first bit stream and the second bit stream are accessed from a source belonging to a set comprising: a transmitting antenna, an interface to a transmitting antenna, a video encoder, a video memory, a video server, an interface with a video camera, and a video camera. According to a variant of the method, the multiplexed first bit stream and the second bit stream are sent to a destination belonging to a set comprising: a transmitting antenna, an interface to a transmitting antenna, a communication interface, a video memory, a video server interface and a client device.
According to specific embodiments of the method comprising decoding of first bit stream and the second bit stream, the first bit stream and the second bit stream are accessed before decoding from a source belonging to a set comprising a receiving antenna, an interface to a receiving antenna, a communication interface and a video memory. According to a variant of the method, signals corresponding to the first representation and the second representation for rendering are outputted to a destination belonging to a set comprising a video decoder, a video memory and a display.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation” of the present principles, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
Additionally, this application or its claims may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application or its claims may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application or its claims may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bit stream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
Foreign Application Priority Data:

| Number | Date | Country | Kind |
|---|---|---|---|
| 14305052.4 | Jan. 2014 | EP | regional |

PCT Filing Information:

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/IB2015/000018 | 1/13/2015 | WO | 00 |