The present invention relates to a method and apparatus for streaming media to a plurality of adaptive client devices.
Wireless networks are well known in which many thousands of client wireless devices share the same network bandwidth. Cellular phones are one such example.
In a typical network, wireless network bandwidth fluctuates for a variety of reasons, including channel sharing between different client wireless devices, changing conditions, environmental or otherwise, between the client wireless devices and a base station with which they communicate, and changes between the base station and other servers that are accessed for purposes of obtaining content.
Bandwidth sharing among applications on the same client wireless device, if instituted, is another reason for network bandwidth fluctuations, since bandwidth available to each application can also fluctuate.
One known method of transmission with a client device is to use constant bit rate streaming. Users experience poor streaming media quality when available bandwidth is lower than the streaming bit rate, as undesired gaps in the stream occur, which thus cause gaps in the audio or other content being experienced.
In order to overcome the disadvantages of constant bit rate streaming, it is also known to use a streaming server that can adapt its streaming bit rate dynamically. While such dynamic adaptation has advantages, such an approach does not scale well if the streaming server streams a large number of streams, such as tens of thousands. This is because the streaming server needs to fully understand the syntax of the transmitted media bit stream and process the adaptive bit rate request in a sophisticated manner that requires intensive computation processing such as time synchronization, and header seeking within the media bit stream.
In one aspect there is provided a method of providing a media stream over a data channel of a best effort transmission network that includes a wireless path to a plurality of client devices.
In another aspect there is provided a method of encoding a stream of data into chunks, whereby the chunks are obtained by determining a break point between them that corresponds to a silence point.
In another aspect, there is provided a method for creating a library of encoded media for a media stream and linking the library to a plurality of cell phone devices.
These and other aspects and features of the present invention will become apparent to those of ordinary skill in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures, wherein:
a), (b) and (c) together illustrate communications in the wireless network according to the present invention;
a) illustrates a representation of an audio sequence, and various break points within the sequence that are used to establish raw chunks according to the present invention and
The present invention provides an end-to-end system that delivers a full version of long form content to small screen terminals/client devices in streaming (real-time) fashion and minimizes on-off interruption in the context of fluctuating bandwidth. As described further herein, in a preferred embodiment, the present invention employs: an intelligent chunking mechanism that segments a continuous stream of unencoded/uncompressed media data into raw chunks; a multi-rate, multi client encoder that generates a library of streaming media in encoded chunks for an array of different classes of client devices; a client only adaptive adjustment mechanism for various bit rate and compression scheme combinations that is based on multi-client, multi-rate encoding of raw chunks to ensure uninterrupted streaming in real time in the face of fluctuating bandwidth available to the streaming session; and a mechanism to transmit and display on the device screen media such as video, text, flash, or an image, with or without a hyperlink, while audio is being streamed as described herein and played through the device's audio output device. Such other media is preferably downloaded when the audio buffer is full or its occupancy is larger than a certain threshold, since it is then safer to download other content without causing the buffer to deplete.
As described herein, usage of the intelligent chunking mechanism, in combination with the multi-rate, multi client encoder, provides advantages over usage of the continuous bit streams that are conventionally used for streamed content. These advantages include: 1) simplification of the streaming server and the client, since each raw chunk (or just “chunk” as used herein) is independently encoded, leading to independence between chunks, and more particularly to independence between the different files that contain different encoded chunks. As a result, when the streaming server and the client deal with adaptive transmissions, all adaptation is done at the encoded chunk level. There is no need for the streaming server and/or the client to probe into the bit stream and seek certain header and timing information so as to determine the break points in the continuous bit stream at which to switch to another bit stream (encoded at a different bit rate); 2) support of multiple compression schemes to accommodate more diverse bit rate changes; 3) since each encoded chunk is preferably represented as a file, caching of the files in the network saves data server bandwidth, whereas a continuous bit stream cannot be cached; 4) client device implementation is simplified, as the client simply requests the file that corresponds to a chunk encoded at a particular bit rate/compression scheme combination (in contrast to sophisticated streaming protocols such as RTSP). As a result, low-resource cellphones can be supported; and 5) natural support for text-based content sources that are converted to audio using a text-to-speech engine, as text can be naturally broken into text segments based on sentence stops such as a period sign or a comma sign, and each text segment is then converted to an audio chunk via the text-to-speech engine.
In addition to using an intelligent chunk mechanism and a multi-client multi-rate encoding mechanism, usage of a client only approach can greatly simplify the streaming server on which content is stored and achieve much better scalability because the server can be stateless.
a), (b) and (c) illustrate the system 100 of the present invention, and the manner of communication between the various elements, particularly the streaming/web server and the client as shown in
The system 100 includes an intelligent chunk mechanism 110, a multi-rate, multi client encoder 112, which provides multi-rate, multi-client encoded content, described further hereinafter, to a streaming server 120. The streaming server 120 serves the content to each of a plurality of client devices 130-(1, 2, 3, n), preferably wireless client devices, through a transmission network 140.
A representative portion of the transmission network 140 is illustrated in
The client devices 130 include conventional hardware such as transmitter and receiver circuits, a processor, a memory, a user interface, and some type or types of content delivery mechanism, such as a speaker or display unit. The processor, among other functions, will execute a program 132 that provides for the functionality of the present invention as described, which program will reside in an application area of the memory, and allow the downloading of content into a buffer, which content is then ultimately provided from the buffer to the content rendering mechanism (an audio player or a video player) so that it can be rendered and thus presented to the user. The buffer of the client device 130 is functionally illustrated in
The basic mechanism of providing multi-rate content according to the present invention is described in detail below using the example of audio. Other streaming media formats can be treated in a similar manner, and modifications for different formats are also described further hereinafter.
With respect to this example, a particular audio source file is segmented into raw chunks (or just “chunks”) using the intelligent chunking mechanism 110 illustrated in
The intelligent chunking mechanism 110 is preferably adapted to operate upon different types of content, audio, both audio and video, as well as text.
The first, shown by break points 310, are each made so that each chunk has the same period of time. While this has ease of implementation aspects with respect to the subsequent encoding of the chunks, as well as with respect to the client device 130, a disadvantage can be that users may sense a brief pause in the middle of a sentence if the content is audio, for example, as the audio player on the client device 130 switches from the finished encoded chunk to the next encoded chunk. Accordingly, an intelligent chunk algorithm is used to segment the streaming content data at appropriate break points, which are not typically purely periodic.
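As a minimal sketch of non-periodic break points chosen at silence points, the following Python example moves each nominal (periodic) boundary to the quietest nearby frame. The frame-energy measure, the search window, and all function and parameter names are assumptions made for illustration only; the description above does not specify how a silence point is detected.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def silence_break_points(samples: Sequence[float], frame_len: int,
                         target_frames: int, search_frames: int) -> List[int]:
    """Return break points (as sample indices) near every target_frames frames.

    Each nominal boundary is shifted to the lowest-energy frame found within
    +/- search_frames of it (assumed stand-in for a "silence point").
    """
    if not 0 <= search_frames < target_frames:
        raise ValueError("search_frames must be in [0, target_frames)")
    energies = frame_energies(samples, frame_len)
    breaks: List[int] = []
    nominal = target_frames
    while nominal < len(energies):
        lo = max(nominal - search_frames, 0)
        hi = min(nominal + search_frames + 1, len(energies))
        quietest = min(range(lo, hi), key=lambda k: energies[k])
        breaks.append(quietest * frame_len)
        # The next chunk is measured from the break actually chosen.
        nominal = quietest + target_frames
    return breaks
```

Because the search window is narrower than the target chunk length, each break point falls strictly after the previous one, so chunk durations vary only within a bounded range.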
For a stream that includes both audio and video, either the audio or video can be used to determine the break point. Preferably the audio is used, and the video break point is made the same. But other methodologies can be used, including using the video scene change.
For a pure video stream without an audio component, a scene-change point, where there is a significant change in the background, is used, and can be detected by looking at the difference between two consecutive video frames. One detection mechanism is to use a difference threshold: if the difference is larger than that threshold, the point is treated as a scene change that is appropriate to use as a break point.
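The difference-threshold mechanism can be sketched as follows. This is an illustrative example only: representing a frame as a flat sequence of grayscale pixel values and using the mean absolute pixel difference as the difference measure are assumptions, as are the function names.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed: flattened grayscale pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def scene_change_points(frames: List[Frame], threshold: float) -> List[int]:
    """Indices i where frame i differs from frame i-1 by more than threshold."""
    return [
        i for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]
```

Each returned index is a candidate break point; a real chunker would presumably also enforce minimum and maximum chunk durations.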
Another type of content is live audio. Chunking takes place in a manner that is the same as for a large pre-recorded audio file as described above, except that the live content is chunked in real-time.
Another type of content is a pre-stored large text file. Such a text file is preferably first chunked based upon text breaks, including but not limited to the period sign, commas, as well as more sophisticated divisions (such as not causing a break between a subject and verb that are adjacent to each other). Once chunked in text form, the intelligent chunker will convert each text chunk to an audio chunk using a text-to-speech engine (not shown).
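A minimal sketch of the simplest form of text chunking, breaking only at period and comma signs, is shown below. The function name is hypothetical, and the more sophisticated divisions mentioned above (such as keeping an adjacent subject and verb together) and the text-to-speech step are not attempted.

```python
import re
from typing import List


def chunk_text(text: str) -> List[str]:
    """Split text into chunks that each end at a period or comma, if present."""
    # A run of non-stop characters, optionally followed by one stop character.
    pieces = re.finditer(r"[^.,]+[.,]?", text)
    return [m.group().strip() for m in pieces if m.group().strip()]
```

Note that this naive rule also splits inside decimal numbers and abbreviations, which is one reason a more sophisticated division would be preferred in practice.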
Real-time text, such as an RSS feed, is handled in the same way as a large pre-stored text file as described above, except that the real-time text is chunked in real time.
The multi-client multi-rate encoder 112 inputs the chunks that have been obtained by the intelligent chunker 110 described above, and for each of the different types of client devices 130, taking into account the specifications of the client type, the type of compression scheme being supported on the client device 130, and the type of wireless network that the particular type of client device 130 operates upon, encodes each chunk at a plurality of different bit rate/compression scheme combinations, with each bit rate/compression scheme combination corresponding to a so-called track. This is shown also in
The choice of the tracks by the multi-client multi-rate encoder 112, which correspond to one of the bit rates and a corresponding compression scheme as described above, depends on minimum type of client device, acceptable media quality, preferred/target media quality, network conditions, the compression algorithm chosen, and reasonable differences between two adjacent bit rates. For example, when network bandwidth can fluctuate from a few kbps to 100 kbps, 20 tracks can be generated for a single chunk: the first 8 tracks compressed with AMR at rates of 4.75 kbps, 5.15 kbps, 5.9 kbps, 6.7 kbps, 7.4 kbps, 7.95 kbps, 10.20 kbps and 12.20 kbps; the next four tracks compressed with W-AMR at rates of 14.25 kbps, 15.85 kbps, 19.85 kbps and 23.85 kbps; the next two tracks using AACPlus at bit rates of 32 kbps and 48 kbps; and the remaining tracks encoded with MP3 at bit rates between 40 kbps and 100 kbps.
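The example track set can be written down as data, as in the following sketch. Only the fourteen tracks whose rates are stated above are listed; the individual MP3 rates are not given and are therefore left out rather than guessed. The `Track` type and the `tracks_for_client` helper are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, NamedTuple, Set


class Track(NamedTuple):
    codec: str   # compression scheme name as used in the example above
    kbps: float  # encoded bit rate


# Rates taken from the example above; the MP3 tracks are omitted because
# their individual rates are not specified.
LADDER: List[Track] = (
    [Track("AMR", r) for r in (4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2)]
    + [Track("W-AMR", r) for r in (14.25, 15.85, 19.85, 23.85)]
    + [Track("AACPlus", r) for r in (32.0, 48.0)]
)


def tracks_for_client(ladder: Iterable[Track],
                      supported_codecs: Set[str]) -> List[Track]:
    """Tracks a given client type can decode, ordered by ascending bit rate."""
    usable = (t for t in ladder if t.codec in supported_codecs)
    return sorted(usable, key=lambda t: t.kbps)
```

Filtering by supported compression scheme reflects the point that the encoder prepares tracks per client type; a device that decodes only AMR would see just the first eight tracks.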
Each client device 130 can be of one of many different types as mentioned above, with each type having different capabilities: CPU power, memory, compression schemes supported, and how fast the client can switch from one player (that plays one encoded chunk) to the next player (that plays the next encoded chunk). The present invention uses these various different cellphone parameters to determine a cellphone profile table (shown in
Also, in use, a separate hint track with bit rate/compression information for all tracks (of different bit rate/compression combinations) is generated and supplied to the client device 130 at the beginning of a session. The hint track tells the client device 130 which tracks exist for each encoded chunk, and the approximate starttime and endtime for each encoded chunk, based on a chunk index scheme. For example, the hint track will notify the client device that the media content at issue has 12 tracks, and the target combination (of bit rate/compression) for each track. In addition, the hint track can also be used throughout the download of the media content, so that information on each chunk can be obtained, such as that chunk #2 has 8 tracks (bit rate/compression scheme), starts at 7.5 seconds and has a length of 8.2 seconds.
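One possible in-memory representation of the per-chunk hint information is sketched below. The field names are hypothetical and are not the fields of the hint track itself; the values for chunk #2 follow the example above, while the values for chunk #1 and the time-lookup helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkHint:
    index: int        # chunk number within the media content
    track_count: int  # how many bit rate/compression tracks exist
    start_s: float    # approximate start time, in seconds
    length_s: float   # approximate duration, in seconds

    @property
    def end_s(self) -> float:
        return self.start_s + self.length_s


def chunk_at(hints: List[ChunkHint], t_s: float) -> Optional[ChunkHint]:
    """Return the chunk whose time span contains t_s, or None if none does."""
    for hint in hints:
        if hint.start_s <= t_s < hint.end_s:
            return hint
    return None


hints = [
    ChunkHint(index=1, track_count=8, start_s=0.0, length_s=7.5),  # assumed
    ChunkHint(index=2, track_count=8, start_s=7.5, length_s=8.2),  # from text
]
```

With start and end times available per chunk, the client can map a playback position to a chunk index without inspecting any encoded media.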
With respect to the hint track, the hint track has the following fields, as shown in Table I below:
The client application program 132 on the mobile handset device 130 monitors the downloading speed of recent encoded chunks (in the stream buffer 134 associated with the device 130 and the application 132). The downloading speed is averaged over a specified period of time, and can vary widely depending on the wireless network capabilities, from a few kbps to a few Mbps. Actions are triggered based on buffer overflow or underflow status; these actions are first generally described below, with a more detailed discussion provided thereafter.
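A minimal sketch of averaging the downloading speed over a specified period is given below. The class name, the sliding-window policy (dropping downloads that finished before the window began) and the units are assumptions; the description states only that the speed is averaged over a specified period of time.

```python
from collections import deque
from typing import Deque, Tuple


class DownloadRateMonitor:
    """Average download speed of recently completed chunk downloads."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        # Each entry: (finish time in s, size in bytes, download duration in s)
        self._samples: Deque[Tuple[float, int, float]] = deque()

    def record(self, finished_at_s: float, size_bytes: int,
               duration_s: float) -> None:
        """Add one completed download and drop those older than the window."""
        self._samples.append((finished_at_s, size_bytes, duration_s))
        cutoff = finished_at_s - self.window_s
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def average_kbps(self) -> float:
        """Total bits received divided by total time spent downloading."""
        total_bits = sum(8 * size for _, size, _ in self._samples)
        total_time = sum(duration for _, _, duration in self._samples)
        return total_bits / total_time / 1000 if total_time > 0 else 0.0
```

Dividing total bits by total download time weights longer downloads more heavily than a simple mean of per-chunk rates would.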
The client program 132 is the preferable way in which to monitor the network, as the method described herein is network agnostic: by monitoring the buffer 200 on the client device 130 side, at the application level, the present invention can tell how good or bad the channel is. It is noted that each encoded chunk can be represented as a file on the server side, and the streaming server 120 can tell the client device 130 the file size before the transmission (in the HTTP protocol, for example, there is a Content-Length field in the HTTP header). There is thus no need for the client device 130 to know the size of each encoded chunk a priori, because the system preferably operates on a need-to-know basis. This assists in allowing the client device 130 to initiate the request for the appropriate track, based upon the application level monitoring of the buffer 200. Nonetheless, other ways to monitor the network channel situation on the client device 130 side can be used. For example, one can monitor the signal-to-noise level at the physical layer. Another way is to monitor the packet loss at the logical layer. Another way is to monitor delay and loss at the IP layer. However, all of these schemes are isolated from the application, and as such are not the preferred monitoring method.
If the downloading speed is lower than the content rendering rate (the audio decoder has to take an encoded chunk out of the buffer after it finishes playing the previous encoded chunk, hence the rendering rate has to do with the duration of each encoded chunk, not the track characteristics of each encoded chunk), the buffer occupancy will decrease. When the buffer occupancy is lower than Bl, the client program 132 initiates a switch to a track of lower resolution (bit rate/compression scheme), commensurate with the measured network speed, by requesting the server to send a different set of files (encoded chunks) that are encoded at that lower resolution track. The tradeoff is lower streaming media quality, but this quality is preferred to streams with a substantial number of lost packets. Bl is decided by the maximum latency required. The larger the latency can be, the larger Bl can be, and the less likely underflow of buffer 200 will happen.
On the other hand, if the network speed is higher than the current content rendering rate, and the buffer level is higher than Bt, the client program 132 can initiate an upshift to a higher resolution track to increase/restore the streaming media quality. Downshifting uses Bl and upshifting uses Bt in order to be conservative: if an upshift were made as soon as the buffer level rose above Bl, and the bandwidth then dropped again to a level that supports only a very low resolution track, the buffer could deplete very soon. Hence the buffer level is built up to the higher value Bt before upshifting, to play safe.
The downloading protocol can be of any type, but HTTP is particularly well suited, in which case the streaming server can be a standard stateless web server without any modification. The client simply uses the HTTP protocol to request a file, which corresponds to a chunk encoded at a particular track.
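As a minimal sketch of such a request, the following example fetches one encoded chunk as an ordinary file over HTTP using the Python standard library. The URL layout (content identifier, track number and chunk index as path segments) is purely hypothetical; the description states only that each encoded chunk at a given track is a file on a standard web server.

```python
import urllib.request


def chunk_url(base_url: str, content_id: str, track: int,
              chunk_index: int) -> str:
    """Build the URL of one encoded chunk (hypothetical path layout)."""
    return (f"{base_url.rstrip('/')}/{content_id}"
            f"/track{track:02d}/chunk{chunk_index:05d}")


def fetch_chunk(url: str, timeout_s: float = 10.0) -> bytes:
    """Download one chunk file with a plain HTTP GET.

    If the server sends a Content-Length header, it is used to verify
    that the whole file arrived.
    """
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        declared = response.headers.get("Content-Length")
        body = response.read()
    if declared is not None and int(declared) != len(body):
        raise OSError("incomplete chunk download")
    return body
```

Because each request names the track explicitly, switching tracks amounts to changing one path segment in the next request, and the server keeps no per-client state.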
Among other benefits, two distinct benefits stand out. First, the requirement to support complicated streaming protocols, such as RTSP, on the client device 130 is eliminated; such complicated protocols are typically currently available only on high end cellphones. Second, the requirement for expensive, complicated and non-scalable streaming servers is eliminated, as the present invention requires only conventional stateless web/WAP servers to serve streaming audio content.
Streaming server 120 illustrated in
The program 132 of the streaming client device 130 that requests encoded chunks will now be described with reference to the flowcharts of
The following annotations are used for both the flowcharts of
Then, preferably before each subsequent encoded chunk is downloaded, or after some number of encoded chunks are downloaded, the track (with the combination of bit rate and compression type for each track) is decided in step 520 so that the client device 130 can inform the server which encoded chunk to send. The track is determined in the following manner: first, the buffer level/occupancy is viewed. If the buffer level is less than Bl, then the lowest track is used in step 530 until the buffer level is larger than Bl.
When B>Bl, the current network bandwidth Rn is first estimated in step 540, based on the network bit rate measured for the previous encoded chunk (encoded chunk size divided by the time it took to download the previous encoded chunk). Then the target track is decided for the current encoded chunk to be downloaded in step 550. The algorithm is: if B>Bt, then the target track is set in step 560 to the track closest to the estimated network bandwidth, which may be a track that has a different combination with better quality (a greater bit rate/different compression scheme). If Bl<B<Bt, then the target track is set in step 570 to one level lower than the estimated network bandwidth. Steps 540-570 are then repeated until the end of the media stream, preferably for each encoded chunk, as shown by the arrows back to step 540.
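The track decision of steps 520-570 can be sketched as follows. This is one reading of the steps, not a definitive implementation: the "closest" track is assumed here to be the highest track whose bit rate does not exceed Rn, "one level lower" is assumed to mean the next lower track index, and the function and parameter names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def estimate_bandwidth_kbps(chunk_bytes: int, download_s: float) -> float:
    """Step 540: previous chunk size divided by its download time."""
    return chunk_bytes * 8 / download_s / 1000


def select_track(rates_kbps: Sequence[float], buffer_level: float,
                 b_low: float, b_target: float, rn_kbps: float) -> int:
    """Return the index of the track to request for the next chunk.

    rates_kbps must be sorted in ascending order; index 0 is the lowest track.
    """
    if buffer_level < b_low:
        return 0  # step 530: lowest track until the buffer recovers
    # Assumed meaning of "closest": highest track not exceeding Rn.
    fit = max(bisect_right(rates_kbps, rn_kbps) - 1, 0)
    if buffer_level > b_target:
        return fit  # step 560
    return max(fit - 1, 0)  # step 570: one level lower, to rebuild the buffer
```

For example, with the eight AMR rates, an estimated bandwidth of 8 kbps selects the 7.95 kbps track when the buffer is above Bt and the 7.4 kbps track when the buffer lies between Bl and Bt.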
With respect to the flowchart of
Preferably after the encoded chunk is downloaded (or after some interval or some other measure), it is checked in step 610 whether Rn>Rt. If no, step 620 follows. If yes (ideally Rn=2*Rt), then step 630 follows.
In step 620, since Rn<Rt, the program within the mobile device initiates a request for a track at a lower bit rate/compression scheme Rtlow right away, based on the measured Rn, so that an Rt is chosen that is lower than Rn. Once encoded chunks at this track combination are being received, step 630 follows (to ensure that buffer 200 will gradually grow to Bt).
In step 630, the content within the buffer 200 begins to be read out serially for rendering, while downloading of additional encoded chunks into the buffer 200 continues as fast as possible.
After Bt is reached, step 640 follows, and the download is slowed to a normal rate. Every Tc, an encoded chunk is downloaded, so that content is being downloaded into buffer 200 at the same rate it is being output from buffer 200.
At each encoded chunk downloaded, preferably Rn is calculated in step 650 and a determination of the network conditions is made, based upon the low threshold Bl, as will now be described.
If conditions are normal, then step 640 repeats. If Rn<Rt, the amount of data content in the buffer 200 will decrease, and a lower track resolution is likely needed. It is noted, however, that temporary fluctuations are permitted, so that the overall system will not change based on a temporary fluctuation. In order to determine that a fluctuation is significant, however, the present invention tracks some other measure, preferably a low threshold Bl, which corresponds to a low threshold amount of content data in the buffer 200.
If Bl is reached, step 660 follows and the program 132 at the client device 130 initiates a request for a track of a lower bit rate/compression scheme, Rtlow<Rn, to ensure that the buffer 200 will gradually grow to Bt. Alternatively, if Rt<Rn, instead of Bl being reached, a track of higher resolution can be requested.
After step 660, the buffer 200 should grow from Bl to Bt. There are, however, two possibilities. Both are based on the detected Rn as shown by step 670.
In the first, the buffer 200 grows, and Rt<Rn. While normally, as discussed above, this would indicate a change to a higher resolution track, in this instance a wait period of one encoded chunk occurs, as shown by step 680, before changing to a higher resolution track. This avoids overreaction, since the increase in Rn can be temporary, just like the temporary decreases noted above.
In the second possibility, it is determined that Rn still decreases below Rt, which would indicate a change to a lower resolution track. Similarly to what is described above, however, in step 690 a wait period occurs so that a change to a still lower resolution track is not made until Bl is again reached, in order to avoid a continued oscillation of track changes.
With respect to parameter setting for Bt and Bl, the difference between Bt and Bl is the window in which the network bit rate fluctuation is observed. Bl is set so that, given the round-trip delay incurred during downshifting to a lower resolution track, the buffer will not be depleted: Bl>(Rt−Rn)*Troundtrip.
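The bound on Bl can be evaluated directly, as in the short sketch below. The units (rates in kbps, delay in seconds, result in kilobits) and the example round-trip delay of 1.5 seconds are assumptions chosen for illustration.

```python
def min_low_threshold_kbit(rt_kbps: float, rn_kbps: float,
                           roundtrip_s: float) -> float:
    """Lower bound on Bl from Bl > (Rt - Rn) * Troundtrip, in kilobits.

    While the downshift request is in flight, the buffer drains at roughly
    Rt - Rn; no reserve is needed when the network keeps up (Rn >= Rt).
    """
    return max(rt_kbps - rn_kbps, 0.0) * roundtrip_s
```

For a track at Rt = 20 kbps, a network rate that has fallen to Rn = 2 kbps, and an assumed round-trip delay of 1.5 seconds, the bound is (20 − 2) × 1.5 = 27 kilobits, so Bl would need to exceed 27 kilobits of buffered content.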
Bt−Bl is determined by the statistical behavior of track change frequency and range. If the track changes very often, on the order of Tc, the window should be large to accommodate such frequent change. If the track changes dramatically (e.g., from 20 kbps to 2 kbps), the buffer 200 can deplete quickly, and in this case Bt should also be made larger. It should be understood, however, that a larger Bt is not always better, as it wastes network bandwidth (and users will potentially have to pay more) if the user abandons the transmission, or rewinds to peruse content again. Therefore, Bt is preferably decided by how often the track changes and how dramatically it changes.
In the above-described embodiment, if the buffer 200 undesirably depletes, the rendering/play is stopped, waiting for the next encoded chunk to arrive. In normal operation, the buffer 200 needs to be filled to Bt as described above before rendering will begin, which also assists in ensuring that the rendering will last if the network conditions become bad again. The drawback with such an implementation, however, is that such buffering requires a longer time, leading to a bad user experience. In order to minimize such breaks in streaming, in another aspect, the current network bandwidth is reviewed, and if the current network bandwidth is higher than the bit rate of the lowest encoding track and the encoded chunks are being transmitted at that lowest bit rate, then rendering is started right away, even if the level of the buffer 200 is not yet at the normal level Bt.
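The early-start rule can be expressed as a small predicate, sketched below under stated assumptions: the function and parameter names are hypothetical, and the requirement that the buffer hold at least some content before an early start is an added assumption rather than something stated above.

```python
def can_start_rendering(buffer_level: float, b_target: float,
                        rn_kbps: float, current_track_kbps: float,
                        lowest_track_kbps: float) -> bool:
    """Decide whether rendering may begin (or resume after depletion)."""
    if buffer_level >= b_target:
        return True  # normal rule: buffer has been filled to Bt
    on_lowest_track = current_track_kbps == lowest_track_kbps
    network_keeps_up = rn_kbps > lowest_track_kbps
    # Assumed: something must already be buffered to have anything to play.
    return buffer_level > 0 and on_lowest_track and network_keeps_up
```

The early start is safe in the sense that content on the lowest track is arriving faster than it is consumed, so the buffer should keep growing toward Bt while playing.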
It should be apparent that other algorithms can be used to determine which track to use for a particular encoded chunk.
Given the above description, an application level multicast support and unicast caching support feature will now be discussed.
In one particular implementation of the system shown in
Although the present invention has been particularly described with reference to embodiments thereof, it should be readily apparent to those of ordinary skill in the art that various changes, modifications and substitutes are intended within the form and details thereof, without departing from the spirit and scope of the invention. Accordingly, it will be appreciated that in numerous instances some features of the invention will be employed without a corresponding use of other features. Further, those skilled in the art will understand that variations can be made in the number and arrangement of components illustrated in the above figures. It is intended that the scope of the appended claims include such changes and modifications.
This application claims priority from U.S. Provisional Application No. 60/797,486 titled “An End-To-End System That Delivers Full Version Of Long Form Content To Small Screen Terminals By Combining Text, Image And Streaming Media, And Employing A Client Only Adaptive Bit Rate Adjustment Mechanism Based On Multi-Rate Chunking To Ensure Uninterrupted Streaming In Real-Time In The Face Of Fluctuating Bandwidth Available To The Streaming Session” and filed on May 5, 2006, the contents of which are expressly incorporated by reference herein.
Number | Date | Country
---|---|---
60797486 | May 2006 | US