This application claims priority from European Patent application No. 17305402.4, entitled “METHOD OF DELIVERY AUDIOVISUAL CONTENT AND CORRESPONDING DEVICE”, filed on Apr. 4, 2017, the contents of which are hereby incorporated by reference in its entirety.
The present disclosure generally relates to the field of streaming of audiovisual content to receiver devices, and in particular to receiver devices connected in a local network.
Any background information described herein is intended to introduce the reader to various aspects of art, which may be related to the present embodiments that are described below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light.
Audiovisual content is distributed by a server device to receiver devices (‘receivers’) in a network. When a receiver requests audiovisual data from a distribution server, a certain amount of data buffering is required on the server side, e.g., queued in a First In-First Out (FIFO) output queue, so that the distribution server has enough data in its transmission buffer to ensure smooth stream delivery to a receiver requesting a stream including the audiovisual data. This buffering requirement conflicts, however, with a fast channel change requirement. It results in a significant delivery delay when a receiver requests a new stream, e.g., in the case of channel change, or Video on Demand (VoD) trick mode. Several solutions have been proposed to tackle this problem. For example, US 2016/0150273 A1 to Yamagishi relates to implementing rapid zapping between channels in Moving Picture Experts Group-Dynamic Adaptive Streaming over HTTP (MPEG-DASH). Yamagishi proposes that a server separate a zapping segment stream from a viewing segment stream. The segments of the zapping stream are shorter than the segments of the viewing segment stream. When a receiver zaps to a new channel, it first connects to the zapping segment stream and then switches to the viewing segment stream. However, providing short segments puts a significant encoding strain on the server, and encoding efficiency is not optimal. Further, short segment lengths may result in poor streaming performance due to the overhead produced by frequent requests and the influence of network delay. Further, it is not always possible to provide very short segments. For example, HTTP Live Streaming (HLS) recommends a segment duration of 9-10 s.
There is thus a need for a solution that does not require segments of short length, which enables data buffering on the server side, while offering fast channel change and short access delay when trick modes are used.
According to one aspect of the present disclosure, a method for delivery of audiovisual content to a receiver device, implemented by a transmitter device, is provided. The method comprises receiving a request for obtaining audiovisual data from a source for delivery to the receiver device; obtaining the audiovisual data from the source; modifying decoding and presentation time references in the audiovisual data before delivery of the audiovisual data to the receiver device for slowed-down decoding of the audiovisual data by the receiver device, the slowed down decoding by the receiver device adding a delay between the obtaining of the audiovisual data from the source and delivery of the obtained audiovisual data to the receiver device; using the added delay between said obtaining of said audiovisual data from said source and delivery of said obtained audiovisual data to said receiver device for filling a transmission buffer in the transmitter device with the obtained audiovisual data; and delivery of the audiovisual data to the receiver device.
According to an embodiment of the method for delivery of audiovisual content to a receiver device, the decoding and presentation time references are modified by applying an offset to the decoding and presentation time references, the offset slowing down decoding and presentation of the audiovisual data by the receiver device.
According to an embodiment of the method for delivery of audiovisual content to a receiver device, the method further comprises ceasing to add the delay when the transmission buffer is full.
According to an embodiment of the method for delivery of audiovisual content to a receiver device, the method further comprises adding a further delay between the obtaining of the audiovisual data and delivery of the obtained audiovisual data to the receiver device by repeating independently decodable frames in the audiovisual data.
According to an embodiment of the method for delivery of audiovisual content to a receiver device, the audiovisual data source is one of:
a Digital Terrestrial Television frequency carrier;
a satellite frequency carrier;
a Cable frequency carrier;
an audiovisual data storage device.
According to an embodiment of the method for delivery of audiovisual content to a receiver device, the transmitter device is one of a gateway or a set top box.
According to one aspect of the present disclosure, a device for delivery of audiovisual content to at least one receiver device is provided. The device comprises a processor, a network interface and a memory, the processor being configured to: receive a request for obtaining audiovisual data from a source for delivery to the receiver device; obtain the audiovisual data from the source; modify decoding and presentation time references in said audiovisual data before delivery of said audiovisual data to said receiver device for slowed-down decoding of said audiovisual data by said receiver device, said slowed down decoding by said receiver device adding a delay between the obtaining of the audiovisual data from the source and delivery of the obtained audiovisual data to the receiver device; use the added delay between said obtaining of said audiovisual data from said source and delivery of said obtained audiovisual data to said receiver device for filling a transmission buffer in the transmitter device with the obtained audiovisual data; and deliver the audiovisual data to the receiver device.
According to an embodiment of the device, the processor, the network interface and the memory are further configured to modify the decoding and the presentation time references by applying an offset to the decoding and presentation time references, the offset slowing down decoding and presentation of the audiovisual data by the receiver device.
According to an embodiment of the device, the processor, the network interface and the memory are further configured to cease adding the delay when the transmission buffer is full.
According to an embodiment of the device, the processor, the network interface and the memory are further configured to add a further delay between the obtaining of the audiovisual data and delivery of the obtained audiovisual data to the receiver device by repeating independently decodable frames in the audiovisual data.
According to an embodiment of the device, the audiovisual data source is one of:
a Digital Terrestrial Television frequency carrier;
a satellite frequency carrier;
a Cable frequency carrier;
an audiovisual data storage device.
According to an embodiment of the device, the transmitter device is one of a gateway or a set top box.
More advantages of the present disclosure will appear through the description of particular, non-restricting embodiments. In order to describe the manner in which the advantages of the present disclosure can be obtained, particular descriptions of the present principles are rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. The drawings depict exemplary embodiments of the disclosure and are therefore not to be considered as limiting its scope. The embodiments described can be combined to form particular advantageous embodiments. In the following figures, items with the same reference numbers as items already described in a previous figure will not be described again to avoid unnecessarily obscuring the disclosure. The exemplary embodiments will be described with reference to the following drawings in which:
It should be understood that the drawings are for purposes of illustrating the concepts of the disclosure and are not necessarily the only possible configuration for illustrating the disclosure.
The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its spirit and scope.
All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Proposed is a solution that enables substantial data buffering on the server side while not penalizing fast channel change. While enabling the required buffering, channel change speed is improved through delivery, upon connection of a receiver to a new channel, of the audiovisual data to the receiver with modified timing. Then, the audiovisual data can be delivered with short delay, ensuring fast channel change, while the modified timing causes the receiver device to play the audiovisual data at a slower than ‘normal’ rate. The playing at slower-than-normal rate enables the server to constitute a transmission buffer as data is retrieved from the server at a lower than normal rate (i.e., the data is consumed at a slowed down rate by the receiver). When the server has constituted a transmission buffer of sufficient size, the audiovisual data is provided to the receiver without modification of the rate, so that the audiovisual data is consumed by the receiver at its normal rate and the transmission buffer stops growing.
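As a rough illustration of the trade-off described above, the following Python sketch estimates how long the slowed-down period must last to accumulate a given amount of buffered content. It assumes, for illustration only, a source that supplies content in real time and a receiver that consumes it at a constant slow-down factor; the function name and the 2 s buffer target used in the note below are hypothetical and are not taken from the present disclosure.

```python
def buffer_fill_time(target_buffer_s: float, slow_down_factor: float) -> float:
    """Return the wall-clock seconds needed to buffer target_buffer_s of content.

    Assumption: the source supplies 1 s of content per second while the
    receiver consumes slow_down_factor seconds of content per second, so the
    transmission buffer grows by (1 - slow_down_factor) seconds per second.
    """
    if not 0.0 < slow_down_factor < 1.0:
        raise ValueError("slow_down_factor must be strictly between 0 and 1")
    growth_per_second = 1.0 - slow_down_factor
    return target_buffer_s / growth_per_second
```

Under these assumptions, a 2 s buffer takes about 4 s to build at a factor of 0.5 and about 10 s at a factor of 0.8, which illustrates why a milder slow-down lengthens the period during which timing is modified.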
When device 100 receives a service/channel selection from any of the receivers 101-103, operations are required such as: tuning to a specific carrier frequency which comprises elementary streams of the selected service or channel, filtering of specific elementary streams belonging to the selected service/channel from a Multi Program Transport Stream (MPTS) multiplex transmitted on the tuned-to frequency and re-multiplexing of selected elementary streams into a Single Program Transport Stream (SPTS), storing the re-multiplexed data in a transmission buffer, and transmission of data from the transmission buffer to the receiver via IP unicast or otherwise. When device 100 receives a selection for readout of recorded content, a transmission buffer is constituted before the first packets of recorded data can be made available to the receiver. The receiver, in turn, will have to constitute a reception buffer and decode the data. The receiver will have to wait for receipt of an independently decodable frame, an I-frame, to start decoding. I-frames are typically transmitted in MPEG streams every second. From receipt of the first packets of the selected service/channel, it can take up to two seconds for the decoder to be able to start decoding. The sum of the times required for all of these operations constitutes the channel change delay.
Adaptive streaming technologies such as HLS or MPEG-DASH have become widespread because of their many advantages. If a receiver for which content belonging to a selected service/channel/recorded content is to be prepared requires transmission of the selected service/channel/content according to an adaptive streaming protocol, further decoding and encoding steps are required on the side of device 100. One of the main principles of adaptive streaming is that content is available to a receiver in several bit rates so that a receiver can choose between several bit rates according to bandwidth conditions. For example, if reception bandwidth is low due to user displacement, the receiver 101-103 selects a low bit rate version of the selected content. This enables continued viewing of the selected content at the price of degraded rendering quality. If reception bandwidth is sufficiently high, the receiver 101-103 selects a high bit rate version of the selected content for an optimal viewing experience. In order to adapt to rapidly varying bandwidth conditions, the content is chopped into segments with a duration of typically 10 seconds. This means that every 10 seconds, the receiver 101-103 may select the same content in a different bit rate version. On the device 100 side, this requires decoding and encoding operations. Content selected by the receiver 101-103 must first be decoded by device 100, and then be re-encoded by device 100 in the different bit rate versions. To enable the receiver 101-103 to smoothly switch between different bit rate versions of a same content, device 100 must re-encode each segment such that it starts with an independently decodable frame, the aforementioned I-frame. Each segment will thus start with an I-frame and each segment will comprise one or more complete Groups of Pictures (GoPs) that can be independently decoded by the receiver 101-103. When device 100 thus receives a selection request for a new service/channel, e.g., during a channel change, the device 100 will have to acquire data from the selected service/channel, wait for a first I-frame, decode one or several GoPs starting from the first received I-frame, and re-encode the decoded information into one or more GoPs with different bit rates. If the segment duration is set to 10 s, the device 100 may then additionally have to acquire 10 s of content of the selected service/channel after the first I-frame, and then decode the acquired content and re-encode it into different bit rate versions, before making the segments available to the receiver 101-103. When device 100 receives a selection request for content stored on storage 108, the I-frame closest to the requested play-out point in the requested content is sought, and one or several GoPs starting from that I-frame are decoded and re-encoded into one or more GoPs with different bit rates. If the segment duration is set to 10 s, the device 100 will have to acquire 10 s of content of the selected recorded content after the closest I-frame, and decode and re-encode the acquired content into different bit rate versions, before making the segments available to the receiver 101-103.
It will therefore take more than 10 s for the receiver 101-103 to receive a first segment of the requested content. As aforementioned, the receiver 101-103 will require some additional reception buffering before transmitting the received content to its decoder. In practice, it will thus take considerably more than 10 s for the receiver 101-103 to render a first image after a channel change or selection of stored content.
These time-consuming operations thus adversely affect user experience in case of channel surfing or selection of recorded content (e.g., via use of so-called trick modes such as play/pause/resume/fast forward/fast reverse). It is commonly accepted that channel change or content selection delays that exceed 2-3 s are perceived as too slow by consumers.
The effect of this treatment by device 100 is that a device 100 implementing the method can, after a delay substantially equal to duration 408 following receipt of a first request from a receiver 101-103 for retrieval of data from a channel, quickly start to encode first segments for the receiver 101-103, so that the receiver 101-103 can quickly start decoding and render the first data from the stream. As aforementioned, setting up reception of data from a selected channel may require operations by device 100 such as tuning, de-multiplexing and re-multiplexing, which may take additional time not accounted for herewith. Another effect of this treatment is that, since the timing references of the encoded segments are modified, the receiver 101-103 will decode and consequently render the data included in the segments at a slowed-down rate, which enables the device 100 to constitute a transmission buffer.
Not shown in
According to a different embodiment already described previously, the timing references are modified such that the acceleration of the decoding rate by the receiver 101-103 is barely observable for a user of the receiver device 101-103. Also not shown in
It can thus be observed that the present principles make it possible to constitute a transmission buffer while keeping channel change delay low, and that the present principles are applicable to both adaptive streaming solutions and non-adaptive streaming solutions, without requiring modifications on the receiver side.
Following are further details about the constitution of what is produced by device 100 and how.
At the first request from a receiver 101-103 for data from an audiovisual channel, device 100 may include, in the first SPTS data for transmission to the receiver 101-103, program specific information (PSI) about the program in the stream in the form of information tables such as the Program Association Table (PAT) and the Program Map Table (PMT), so that the receiver 101-103 can acquire information about the data, such as which program is comprised in the SPTS and what the Packet Identifier (PID) of the PMT is; the PMT in turn indicates which packets comprise a timing reference such as the Program Clock Reference (PCR) and gives the PIDs of the elementary video and audio streams in the data. If the device 100 prepares segments for adaptive streaming, the first segment may include the program specific information. In the first data for transmission to a receiver 101-103, the device 100 may remove subtitle and audio stream information from the PMT so that the receiver 101-103 can quickly render a first image included in the first data without waiting for audio/video synchronization.
For embodiments that comprise a repetition of I-frames such as discussed with reference to
Note that I-frames can be detected in a transport stream by searching for transport stream packets having an adaptation_field with random_access_indicator set to 1. Note that the frame rate can be determined, for example, from the sequence header of an MPEG2 stream (field frame_rate_code), or from the Video Usability Information (VUI) parameters in H.265/HEVC or H.264/AVC streams.
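The random access detection mentioned above can be sketched as follows in Python. The sketch assumes standard 188-byte MPEG-2 transport stream packets in which the adaptation field flags follow the adaptation_field_length byte; the function name is hypothetical and the code is an illustration, not the implementation used by device 100.

```python
TS_PACKET_SIZE = 188  # bytes in one MPEG-2 transport stream packet
SYNC_BYTE = 0x47      # first byte of every transport stream packet


def has_random_access_indicator(packet: bytes) -> bool:
    """Return True if the packet's adaptation field sets random_access_indicator."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport stream packet")
    # adaptation_field_control occupies bits 5-4 of the fourth header byte:
    # 0b10 = adaptation field only, 0b11 = adaptation field followed by payload.
    adaptation_field_control = (packet[3] >> 4) & 0x3
    if adaptation_field_control not in (0b10, 0b11):
        return False
    adaptation_field_length = packet[4]
    if adaptation_field_length == 0:
        return False  # a zero-length adaptation field carries no flags byte
    # random_access_indicator is the second most significant bit of the flags byte.
    return bool(packet[5] & 0x40)
```

A caller would typically iterate over a captured stream in 188-byte steps and treat the first packet of the video PID for which this function returns True as a candidate starting point.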
For slowed-down decoding of audiovisual data during the time required to constitute a transmission buffer, the PCRs in the data prepared by the device 100 for decoding by receiver 101-103 are multiplied by a slow-down factor, e.g., in the previously mentioned range of 0.5-0.95, for example 0.8. According to the previously discussed embodiment that enables a progressive acceleration, the slow-down factor progressively changes from, for example, 0.5 to 1 during the time required to constitute a transmission buffer. If the slowed-down decoding is preceded by a repetition of I-frames as previously discussed, the value of the first PCRs following the PCR of the last I-frame of the repetition should of course be based on the modified PCR of that last I-frame. Additionally, the Presentation Time Stamps (PTS) of the video elementary stream, of the audio elementary stream if not removed, and of the subtitle elementary stream if not removed should be modified for the duration required to constitute the transmission buffer. Each following PTS value should then be set to the value of the previous PTS from which the accumulated difference between the PCR value of the original stream and the slowed-down PCR value is deducted. Finally, the picture timing information of the video elementary stream should be modified to be coherent with the slow-down rate. For example, for MPEG2 video, the frame rate is changed by changing the fields frame_rate_code, frame_rate_extension_m, and frame_rate_extension_d. For HEVC video, vui_num_ticks_poc_diff_one_minus1 is divided by the slow-down factor. For H.264 video, the time scale value should be modified.
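One possible reading of the PCR and PTS modification described above is sketched below in Python. The sketch works on the 33-bit, 90 kHz PCR base only and ignores the 27 MHz extension; the anchor PCR (the value at which the slow-down period starts), the function names, and the interpretation of the "accumulated difference" as original PCR minus slowed-down PCR are assumptions made for illustration.

```python
TIMESTAMP_MODULO = 1 << 33  # PCR base and PTS are 33-bit counters at 90 kHz


def slow_down_pcr(original_pcr: int, anchor_pcr: int, factor: float) -> int:
    """Scale the time elapsed since anchor_pcr by factor (assumed 0 < factor <= 1)."""
    elapsed = (original_pcr - anchor_pcr) % TIMESTAMP_MODULO
    return (anchor_pcr + round(elapsed * factor)) % TIMESTAMP_MODULO


def shift_pts(original_pts: int, original_pcr: int, slowed_pcr: int) -> int:
    """Deduct the accumulated PCR difference (original minus slowed) from a PTS."""
    accumulated = (original_pcr - slowed_pcr) % TIMESTAMP_MODULO
    return (original_pts - accumulated) % TIMESTAMP_MODULO
```

With a factor of 0.8 and an anchor of 0, an original PCR of 90000 ticks (1 s) maps to 72000 ticks, and a PTS of 99000 ticks attached to that point maps to 81000 ticks, so the PTS keeps its 9000-tick lead over the modified clock. A progressively changing factor would require accumulating the scaled increments packet by packet rather than scaling from a single anchor.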
As mentioned, during the period required for constituting the transmission buffer, decoding and rendering of the audio stream by the receiver 101-103 may be inhibited by removing references to the audio elementary stream from the PMT. However, if the references are not removed, the slow-down applied to decoding and rendering by the receiver should not be too pronounced, as the pitch change would be audible, which would cause hearing discomfort for the user. If the slow-down is moderate (e.g., a factor of 0.8), the pitch change can be acceptable. The sample rate in the header of the audio packets should then be modified to be coherent with the slowed-down rate. According to a different embodiment, which is notably suited for more pronounced slow-downs, the audio speed change can be performed without pitch change, but this requires decoding and re-encoding of the audio stream. According to a different embodiment, the audio PTS is modified such that audio is first rendered when the decoding rate is no longer modified, so that there are no pitch changes.
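To give a sense of the magnitudes involved, the following Python sketch computes the pitch shift that results from playing audio at a given slow-down factor without pitch correction, together with the correspondingly scaled sample rate. The function names and the 48 kHz example are hypothetical; whether a given audio format can signal the resulting sample rate in its header is outside the scope of this sketch.

```python
import math


def pitch_shift_semitones(slow_down_factor: float) -> float:
    """Pitch shift in semitones when audio is played at slow_down_factor speed."""
    if slow_down_factor <= 0.0:
        raise ValueError("slow_down_factor must be positive")
    # One octave (a factor of 2 in frequency) spans 12 semitones.
    return 12.0 * math.log2(slow_down_factor)


def scaled_sample_rate(original_rate_hz: int, slow_down_factor: float) -> int:
    """Sample rate coherent with the slowed-down playback rate."""
    return round(original_rate_hz * slow_down_factor)
```

For a factor of 0.8, the pitch drops by a little under 4 semitones and a 48000 Hz stream would be signalled as 38400 Hz; for a factor of 0.5 the pitch drops by a full octave, which is consistent with the text's preference for moderate factors when audio is kept.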
When the transmission buffer is constituted and the decoding rate of receiver 101-103 returns to the encoding rate of the stream received by device 100 (i.e., slow-down rate=1), the timing references need to be modified continuously so that they are coherent with the timing references of the data provided to the receiver 101-103 during the period in which the transmission buffer was constituted. For this, an offset is to be added to each PCR of the stream destined to the receiver device 101-103 following this period, that is equal to the duration of the transmission buffer in terms of PCR ticks, or, in other words, that is equal to the difference between the last modified PCR of the slow-down period and the value of the original corresponding PCR in the stream received by the device 100. The same is true for the PTS time stamps. An offset is to be added to each PTS of the stream destined to the receiver device 101-103 following this period, that is equal to the duration of the transmission buffer in terms of PCR ticks, or, in other words, that is equal to the difference between the last modified PTS of the slow-down period and the value of the original corresponding PTS in the stream received by the device 100. Alternatively, see
Of course, when offsetting PCR and/or PTS, PCR/PTS wrap-around on the 33-bit base of the PCR/PTS should be taken into account.
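The constant offset applied after the slow-down period, including the 33-bit wrap-around, can be sketched as follows in Python. The sketch treats PCR base and PTS values as integers counted in 90 kHz ticks and represents the (negative) difference as its equivalent modulo 2^33; the function names are hypothetical.

```python
TIMESTAMP_MODULO = 1 << 33  # PCR base and PTS wrap after 2**33 ticks


def catch_up_offset(last_modified: int, last_original: int) -> int:
    """Offset equal to the last modified value minus its original value, mod 2**33."""
    return (last_modified - last_original) % TIMESTAMP_MODULO


def apply_offset(original: int, offset: int) -> int:
    """Add the offset to an original PCR base or PTS value, wrapping at 33 bits."""
    return (original + offset) % TIMESTAMP_MODULO
```

For example, if the slow-down period ends with an original PCR of 900000 ticks modified to 720000 ticks, every later original value is moved back by 180000 ticks (2 s), so an original value of 990000 becomes 810000, and a value that has just wrapped to 100 becomes 2^33 - 179900.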
It is to be appreciated that some elements in the drawings may not be used or be necessary in all embodiments. Some operations may be executed in parallel. Variant embodiments other than those illustrated and/or described are possible. For example, a device implementing the present principles may include a mix of hardware and software.
It is to be appreciated that aspects of the present principles can be embodied as a system, method or computer readable medium. Accordingly, aspects of the present principles can take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code and so forth), or an embodiment combining hardware and software aspects that can all generally be referred to herein as a “circuit”, “module” or “system”. Furthermore, aspects of the present principles can take the form of a computer readable storage medium. Any combination of one or more computer readable storage medium(s) can be utilized.
Thus, for example, it is to be appreciated that the diagrams presented herein represent conceptual views of illustrative system components and/or circuitry embodying the principles of the present disclosure. Similarly, it is to be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable storage media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
A computer readable storage medium can take the form of a computer readable program product embodied in one or more computer readable medium(s) and having computer readable program code embodied thereon that is executable by a computer. A computer readable storage medium as used herein is considered a non-transitory storage medium given the inherent capability to store the information therein as well as the inherent capability to provide retrieval of the information therefrom. A computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. It is to be appreciated that the following, while providing more specific examples of computer readable storage media to which the present principles can be applied, is merely an illustrative and not exhaustive listing, as is readily appreciated by one of ordinary skill in the art: a hard disk, a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Number | Date | Country | Kind |
---|---|---|---|
17305402.4 | Apr 2017 | EP | regional |