Many multimedia applications, such as news-on-demand, distance learning, and corporate training, rely on the efficient transfer of pre-recorded or live multimedia streams between a server computer and a client computer. These media streams may be captured and displayed at a predetermined rate. For example, video streams may require a rate of 24, 29.97, 30, or 60 frames per second. Audio streams may require 44,100 or 48,000 samples per second. An important measure of quality for such multimedia communications is the precisely timed playback of the streams at the client location.
Achieving this precise playback is complicated by the popular use of variable bit rate (VBR) media stream compression. VBR encoding algorithms allocate more bits per unit of time to complex parts of a stream and fewer bits to simple parts, in order to keep the visual and aural quality reasonably uniform. For example, an action sequence in a movie may require more bits per second than the credits that are displayed at the end.
VBR compression may result in bursty network traffic and uneven resource utilization when streaming media. Additionally, due to the different transmission rates that may occur over the length of a media stream, transmission control techniques may need to be implemented so that a client buffer neither underflows nor overflows. Transmission control schemes generally fall within one of two categories: they may be server-controlled or client-controlled.
Server-controlled techniques generally pre-compute a transmission schedule for a media stream based on a substantial knowledge of its rate requirements. The variability in the stream bandwidth is smoothed by computing a transmission schedule that has a number of constant-rate segments. The segment lengths are calculated such that neither a client buffer overflow nor an underflow will occur.
Server-controlled algorithms may use one or more optimization criteria. For example, the algorithm may minimize the number of rate changes in the transmission schedule, may minimize the utilization of the client buffer, may minimize the peak rate, or may minimize the number of on-off segments in an on-off transmission model. The algorithm may require that complete or partial traffic statistics be known a priori.
Client-controlled algorithms may be used rather than server-controlled algorithms. In a client-controlled algorithm, the client provides the server with feedback, instructing the server to increase or decrease its transmission rate in order to avoid buffer overflow or starvation.
Systems and techniques are provided for using a multi-threshold buffer model to smooth data transmission to a client.
In general, in one aspect, a method includes receiving data such as streaming media data from a server transmitting the data at a first transmission rate. At least some of the received data is stored in a buffer. The buffer level is determined at different times. For example, a first buffer level is determined at a time, and a second buffer level is determined at a later time. The different buffer levels are compared to a plurality of buffer thresholds. For example, the first buffer level and the second buffer level are compared to the buffer thresholds to determine if one or more of the buffer thresholds is in the range between the first buffer level and the second buffer level (where the range includes the first buffer level and the second buffer level).
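The comparison described above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the function name `thresholds_crossed` and the use of integer buffer levels are assumptions made for the example. It returns every threshold lying in the closed range between two successive buffer level measurements, whichever direction the level moved.

```python
def thresholds_crossed(first_level, second_level, thresholds):
    """Return the thresholds lying between two buffer levels (inclusive).

    first_level and second_level are buffer levels measured at an earlier
    and a later time; thresholds is any iterable of threshold levels in the
    same unit. The function name and units are hypothetical.
    """
    low, high = sorted((first_level, second_level))
    # The range includes both measured levels, as described in the text.
    return [w for w in thresholds if low <= w <= high]


if __name__ == "__main__":
    levels = [5, 25, 50, 75, 95]  # example thresholds, percent of capacity
    print(thresholds_crossed(40, 60, levels))  # buffer filling
    print(thresholds_crossed(60, 40, levels))  # buffer draining
```

An empty result means no threshold was crossed between the two measurements, in which case no new server transmission rate needs to be determined.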
If at least one threshold is in the range, a second server transmission rate may be determined, based on the at least one threshold. The second server transmission rate may be predetermined (e.g., may be chosen from a list), or may be calculated.
Information based on the second server transmission rate may be transmitted to the server. For example, the second server transmission rate may be transmitted, or a change in server transmission rate may be transmitted. If the second server transmission rate is not different than the first transmission rate, rate information may or may not be transmitted to the server.
The second server transmission rate may be based on a difference between a buffer level and a target buffer level. Different methods may be used to determine second server transmission rates, depending on which threshold is in the range from the first buffer level to the second buffer level. For example, a first calculation method may be used to determine the second server transmission rate if a particular threshold is in the range, while a second calculation method may be used if a different threshold is in the range. Alternately, the second server rate may be calculated for a particular threshold, and may be chosen for a different threshold.
The second server transmission rate may be based on one or more predicted future consumption rates. Future consumption rates may be predicted using one or more past consumption rates. Future consumption rates may be predicted using a prediction algorithm. For example, an average consumption rate algorithm, an exponential average consumption rate algorithm, or a fuzzy exponential average algorithm may be used. One or more weighting factors may be used in the prediction algorithm.
In general, in another aspect, a method for transmitting data such as continuous media data includes transmitting data at a first transmission rate, receiving a communication from a client including rate change information, and transmitting additional continuous media data at a second transmission rate based on the rate change information. The rate change information may be determined using a plurality of buffer threshold levels.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
Like reference symbols in the various drawings indicate like elements.
As explained above, server-controlled or client-controlled algorithms may be used for transmission control of streaming continuous media data. Server-controlled techniques may have several disadvantages. For example, they may not work with live streams where only a limited rate history is available. Additionally, they may not adjust to changing network conditions, and they may get disrupted when users invoke interactive commands such as pause, rewind, and fast forward.
Client-controlled algorithms may be a better choice in a dynamic environment. A client-controlled technique may more easily adapt to changing network conditions. In addition, a simpler and more flexible architecture may be used, since the server does not need to be aware of the content format of the stream. Therefore, new media types such as “haptic” data can automatically be supported without modification of the server software.
However, available client-controlled techniques have a number of drawbacks, including feedback overhead and response delays. Available techniques may not adapt sufficiently quickly to avoid buffer starvation or overflow.
The present application is directed to systems and techniques for providing continuous media data to end users effectively. Using multi-threshold flow control (MTFC), the current systems and techniques may avoid buffer overflow or starvation, even when used in a bursty environment such as a VBR environment. Unlike some available techniques, a priori knowledge of the actual bit rate for the media stream is not necessary.
The current systems and techniques may be implemented in LAN, WAN, or other network environments, with a range of buffer sizes and prediction windows. Referring to
Client 130 includes a client buffer 140, with a capacity equal to B. Client buffer 140 may be used to store data prior to decoding and display/playback. For example, when client 130 is receiving streaming video data from server 110, buffer 140 stores data to be subsequently decoded by a media decoder and displayed to the user. If buffer 140 overflows, some of the data may be lost, and there may be a “hiccup” in the display. Similarly, if buffer 140 empties (i.e., “starves”), there may be a hiccup in the display until additional data is received. Therefore, managing the buffer level is important to providing a high quality display or playback to an end user.
Client 130 (and/or associated machines) also includes circuitry and/or software for implementing multi-threshold flow control. For example, client 130 can receive streaming media data from the server (i.e., client 130 has a network connection), can store at least some of the data in buffer 140 prior to decoding using a decoder (e.g., implemented in hardware and/or software), can determine the buffer level at different times, and can determine whether one or more thresholds has been passed since a previous determination of a buffer level. Client 130 also includes circuitry and/or software to implement a prediction algorithm, and to determine a new server sending rate and/or rate change, and to transmit the server transmission information to server 110.
Similarly, server 110 (and/or one or more associated machines) includes circuits and/or software for transmitting continuous media data to one or more clients, for receiving communications from the one or more clients, and for updating a server transmission rate based on transmission information contained in a communication from the one or more clients.
Buffer Model
Referring to
Watermark 210-U is set at an underflow threshold protection level; that is, a percentage of the buffer capacity that indicates that buffer starvation may be imminent. Watermark 210-1 marks a low buffer warning threshold. When the buffer level falls below watermark 210-1, the buffer is nearing starvation.
Similarly, watermark 210-O is set at an overflow threshold protection level; that is, a percentage of the buffer capacity that indicates that buffer overflow may be imminent. The overflow threshold protection level may be the same as or different than the underflow threshold protection level. Watermark 210-N is the overflow buffer warning threshold.
Using a model such as buffer model 200 may allow smooth streaming of media data from the server to the client. The number of intermediate watermarks N may be varied to provide greater control over the buffer level (larger N) or to provide less control with fewer rate adjustments (smaller N). The watermark spacing may be equidistant, based on an arithmetic series (see FIG. 3A), based on a geometric series (see FIG. 3B), or may be set using a different method.
For equidistant spacing, the underflow and overflow thresholds may first be determined. For example, the underflow threshold may be set as 5% of the buffer capacity, and the overflow threshold may be set as 95% of the buffer capacity.
The number of intermediate watermarks N may be chosen (e.g., selected or predetermined), with N≧1. More typically, the number of intermediate watermarks is greater than one, for smoother traffic (see, e.g., FIG. 7 and the related discussion, below). Denoting the threshold for the underflow watermark as Wu, the threshold for the overflow watermark as Wo, the thresholds Wi for each of the intermediate watermarks i=1 through i=N are then given by:
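For the equidistant case, one placement consistent with the description is Wi = Wu + i × (Wo − Wu)/(N + 1), which divides the span between the underflow and overflow watermarks into N + 1 equal steps. The Python sketch below assumes that formula; the function name `equidistant_watermarks` is hypothetical, and the arithmetic and geometric spacings of FIGS. 3A and 3B are not covered.

```python
def equidistant_watermarks(w_u, w_o, n):
    """Place n intermediate watermarks evenly between w_u and w_o.

    Assumes Wi = Wu + i * (Wo - Wu) / (N + 1) for i = 1..N, which is one
    reading of "equidistant" spacing. Levels may be given in any unit,
    for example percent of the buffer capacity.
    """
    if n < 1:
        raise ValueError("at least one intermediate watermark is required")
    if w_o <= w_u:
        raise ValueError("overflow watermark must lie above underflow watermark")
    step = (w_o - w_u) / (n + 1)
    return [w_u + i * step for i in range(1, n + 1)]


if __name__ == "__main__":
    # Underflow at 5% and overflow at 95% of capacity, five intermediate marks.
    print(equidistant_watermarks(5, 95, 5))
```

With the 5% and 95% example values and N = 5, the intermediate watermarks fall at 20%, 35%, 50%, 65%, and 80% of the buffer capacity.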
Transmission Smoothing
Whenever a threshold Wi is crossed, a new server sending rate may be calculated, and a rate adjustment (or equivalently, a new server sending rate) may be sent to the server. For example, when the RTSP protocol is used, the information may be sent to the server using an RTSP feedback command. The action taken may depend on which threshold has been crossed. For example, if the Wo or Wu thresholds are crossed, more aggressive action may be taken than if one of the intermediate thresholds Wi is crossed. Similarly, if the warning thresholds W1 or WN are crossed, the action taken may be more aggressive than if a different intermediate threshold Wi had been crossed, but may be less aggressive than if Wo or Wu had been crossed.
For example, if the Wo threshold is exceeded, the server may be paused (i.e., the sending rate may be set to zero), or its sending rate substantially decreased. The server may remain paused until the buffer level crosses a particular threshold or a particular value (e.g., the N/2 threshold, or the mid-point of the buffer capacity).
Similarly, if the buffer Wu threshold is crossed, the server sending rate may be increased substantially; for example, it may be increased to about one and a half times the average server sending rate until the buffer level reaches a particular value or threshold. When the intermediate thresholds are crossed, new server sending information may be determined by choosing particular rate change amounts or by calculating new server sending information as described below.
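The graded response described above can be sketched as a small dispatch function in Python. This is only an illustration under stated assumptions: the labels `"Wu"` and `"Wo"`, the action names, and the function `choose_action` are hypothetical, and the factor of 1.5 is taken from the example in the text rather than being a required value.

```python
def choose_action(crossed, n, avg_rate):
    """Map a crossed threshold to an (action, rate) pair.

    crossed is "Wu", "Wo", or an intermediate threshold index 1..n.
    avg_rate is the average server sending rate. A rate of None means
    the new rate still has to be chosen or calculated separately.
    """
    if crossed == "Wo":
        return ("pause", 0.0)              # overflow imminent: stop sending
    if crossed == "Wu":
        return ("boost", 1.5 * avg_rate)   # starvation imminent: send faster
    if crossed in (1, n):
        return ("warn", None)              # warning thresholds W1 and WN
    if isinstance(crossed, int) and 1 < crossed < n:
        return ("adjust", None)            # ordinary intermediate threshold
    raise ValueError(f"unknown threshold: {crossed!r}")


if __name__ == "__main__":
    print(choose_action("Wo", 5, 100.0))
    print(choose_action("Wu", 5, 100.0))
    print(choose_action(3, 5, 100.0))
```

A paused or boosted server would be returned to normal operation once the buffer level reaches the chosen recovery threshold, which this sketch leaves to the caller.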
In a simple system, the rate change amounts may be predetermined. For example, in an implementation with five intermediate thresholds W1-W5, the interval between packets may be set to 20% less than a default interval for W1, to 10% less than a default interval for W2, to the default interval for W3, to 10% greater than the default interval for W4, and to 20% greater than the default interval for W5.
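The five-threshold example can be expressed as a lookup table, as in the Python sketch below. The names `INTERVAL_FACTORS` and `packet_interval` and the use of milliseconds are assumptions for illustration. A shorter inter-packet interval corresponds to a higher sending rate, so the low thresholds speed the server up and the high thresholds slow it down.

```python
# Multiplier applied to the default inter-packet interval for each
# intermediate threshold W1..W5, following the example in the text.
INTERVAL_FACTORS = {1: 0.80, 2: 0.90, 3: 1.00, 4: 1.10, 5: 1.20}


def packet_interval(threshold_index, default_interval_ms):
    """Return the inter-packet interval to use after crossing a threshold."""
    return default_interval_ms * INTERVAL_FACTORS[threshold_index]


if __name__ == "__main__":
    for i in sorted(INTERVAL_FACTORS):
        print(f"W{i}: {packet_interval(i, 10.0):.1f} ms")
```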
Referring to
In operation, the server transmits streaming media data to a client at a server transmission rate (430). The client receives the streaming media data and stores at least some of the data in a buffer prior to decoding (440). At intervals, the buffer level is determined and compared to the previous buffer level to determine whether one or more thresholds has been crossed (450). If a threshold has been crossed, server transmission information (e.g., a new server transmission rate and/or a rate change) is calculated (460), and if it is different from the previous server transmission rate, the server transmission information is communicated to the server (470). Note that the method steps of
Techniques to provide data transmission smoothing may use a number of different components and variables. Table 1 includes a list of the parameters used herein.
Rate Change Computation
In order to determine an amount by which the server sending rate may be adjusted, the server sending rate, the decoder consumption rate, and the buffer level may be sampled at time intervals equal to Δtobs. If the observed buffer level bobs crosses any of the thresholds Wi, a new server sending rate is computed using Equation (2A) below, and the related rate change Δr is shown in Equation (2B).
Equation (3) below shows how C is related to the predicted future consumption rates r̂i (the prediction of future consumption rates is discussed more fully below):
When crossing the thresholds W1 and WN, the computed rate change Δr may not be sufficient to avoid reaching Wu and Wo, respectively, due to the error margin of the prediction algorithms. Although the error margin may be reduced, doing so adds computational complexity that may not be desired in certain situations.
An alternative is to decrease or increase r̂i by the mean absolute percentage error (MAPE), as shown in Equations (4A) and (4B). Equation (4A) shows how an adjusted r̂i may be calculated when WN is reached, while Equation (4B) shows how an adjusted r̂i may be calculated when W1 is reached.
r̂i(adjusted) = r̂i × (1 − MAPE) Equation (4A)
r̂i(adjusted) = r̂i × (1 + MAPE) Equation (4B)
Equation (5) shows how a MAPE value may be computed. In Equation (5), P is the number of prediction samples up to the current prediction time.
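As a rough illustration, the Python sketch below computes a MAPE using the conventional definition, the mean of |actual − predicted| / |actual| over P samples, expressed as a fraction. That definition is assumed here and may differ in detail from Equation (5). The sketch then applies the adjustment in the direction suggested by the description: the prediction is scaled down when WN is reached and scaled up when W1 is reached. The function names and the `"W1"`/`"WN"` labels are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over P paired samples, as a fraction.

    Uses the conventional definition; actual rates must be non-zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


def adjust_prediction(r_hat, mape_value, crossed):
    """Bias a predicted consumption rate when a warning threshold is crossed."""
    if crossed == "WN":   # buffer nearly full: assume lower consumption
        return r_hat * (1 - mape_value)
    if crossed == "W1":   # buffer nearly empty: assume higher consumption
        return r_hat * (1 + mape_value)
    return r_hat          # other thresholds: leave the prediction unchanged


if __name__ == "__main__":
    error = mape([100.0, 200.0], [110.0, 180.0])
    print(error)
    print(adjust_prediction(100.0, error, "WN"))
    print(adjust_prediction(100.0, error, "W1"))
```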
Consumption Rate Prediction
Rather than requiring knowledge of the bit rate of the media stream prior to transmission, the current systems and techniques predict a consumption rate, so that live streams (e.g., streams that are produced and transmitted as the events they depict occur, such as a live concert or a distance learning session) may be provided to end users.
Consumption rate prediction may observe the wobs most recent rate samples to predict wpred samples into the future. For example, if wobs=10 and wpred=2, the 10 previous rate samples may be used to predict the rate 2 samples into the future. The observation window R includes the wobs previous rate values <r1, r2, . . . , rwobs>, while the prediction window R̂ includes the wpred predicted rate values <r̂1, r̂2, . . . , r̂wpred>. The estimated future rate is denoted r̂.
Prediction algorithms may be based on a number of different schemes. For example, an average consumption rate algorithm may be used, an exponential average algorithm may be used, or a fuzzy exponential average algorithm may be used.
An average consumption rate algorithm may predict the average consumption rate of the prediction window R̂ using an average consumption rate of the observation window R, according to Equation (6):
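A minimal Python sketch of this predictor follows. It assumes that the average in question is the plain arithmetic mean of the wobs observed samples; the function name `predict_average` is hypothetical.

```python
def predict_average(observed):
    """Predict the future consumption rate as the mean of the observed rates.

    observed holds the wobs most recent rate samples (the window R).
    """
    if not observed:
        raise ValueError("observation window must not be empty")
    return sum(observed) / len(observed)


if __name__ == "__main__":
    print(predict_average([2.0, 4.0, 6.0]))
```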
An exponential average consumption rate algorithm may be used to give more weight to some samples in the observation window than to others. A smoothed consumption rate parameter for i=1 is set to r1, while the remainder of the SCR[i] are given by Equation (7) below, where αcr is a weighting parameter.
SCR[i] = αcr × SCR[i−1] + (1 − αcr) × r_{i−1} Equation (7)
The estimated future rate is then given by Equation (8) below.
r̂ = SCR[wobs+1] Equation (8)
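Equations (7) and (8) translate directly into a short loop. The Python sketch below follows the recurrence as written, with SCR[1] set to r1 and the estimate taken as SCR[wobs+1]; the function name `exponential_average` and the range check on the weighting parameter are additions for illustration.

```python
def exponential_average(observed, alpha_cr):
    """Estimate the future rate as SCR[wobs + 1] per Equations (7) and (8).

    observed is the window <r1, ..., r_wobs>; alpha_cr is the weighting
    parameter, assumed here to lie in [0, 1].
    """
    if not observed:
        raise ValueError("observation window must not be empty")
    if not 0.0 <= alpha_cr <= 1.0:
        raise ValueError("alpha_cr is expected to lie in [0, 1]")
    scr = observed[0]                      # SCR[1] = r1
    for r in observed:                     # yields SCR[2] .. SCR[wobs + 1]
        scr = alpha_cr * scr + (1.0 - alpha_cr) * r
    return scr


if __name__ == "__main__":
    print(exponential_average([10.0, 20.0], 0.5))
```

With αcr = 1 the estimate stays at the oldest sample r1, and with αcr = 0 it equals the most recent sample, which matches the weighting behavior described for large and small αcr.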
There are two variations in applying this algorithm to forecast the future consumption rates during the prediction window R̂. The first variation, which will be referred to as the “expanding window exponential average algorithm,” predicts r̂i based on an increasing window <R, r̂1, r̂2, . . . , r̂i−1> using Equation (8). The expanding window exponential average algorithm increases the window size by one sample each time a new r̂i is generated. The second variation, which will be referred to as the “sliding window exponential average algorithm,” keeps the window size constant and slides the observation window R forward when a new r̂i is generated.
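The two variations differ only in whether the oldest sample is dropped after each prediction is appended to the window. The Python sketch below shows both under that reading; the names `predict_window` and `_exp_avg` and the boolean `sliding` flag are hypothetical.

```python
def _exp_avg(samples, alpha_cr):
    """Exponential average per Equations (7) and (8) over the given window."""
    scr = samples[0]
    for r in samples:
        scr = alpha_cr * scr + (1.0 - alpha_cr) * r
    return scr


def predict_window(observed, w_pred, alpha_cr, sliding):
    """Predict w_pred future rates from the observation window.

    sliding=False: expanding window, each prediction is appended so the
    window grows by one sample per step.
    sliding=True: sliding window, each prediction is appended and the oldest
    sample is dropped so the window size stays constant.
    """
    if not observed:
        raise ValueError("observation window must not be empty")
    window = list(observed)
    predictions = []
    for _ in range(w_pred):
        r_hat = _exp_avg(window, alpha_cr)
        predictions.append(r_hat)
        window.append(r_hat)
        if sliding:
            window.pop(0)
    return predictions


if __name__ == "__main__":
    print(predict_window([10.0, 20.0], 2, 0.5, sliding=False))
    print(predict_window([10.0, 20.0], 2, 0.5, sliding=True))
```

In this small example the sliding variant discards the oldest sample before the second prediction, so recent values weigh more heavily than in the expanding variant.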
A fuzzy exponential average consumption rate algorithm may be used to generate the r̂i by combining a fuzzy logic controller with the window exponential average algorithm. Using the fuzzy exponential average algorithm, the parameter αcr is dynamically calculated.
The parameter αcr controls the weight given to different samples. When αcr is large, more weight is given to past samples. When αcr is small, more weight is given to the more recent samples. Therefore, if the variability in the consumption rate in the system is small (i.e., the bit rate of the stream is fairly constant), the prediction error should be small, and a large αcr may be used. On the other hand, if the variability is large (e.g., the stream is bursty), a small αcr is appropriate, so that more recent sample data is weighted more heavily.
Referring to
The variability of a stream may be characterized by a normalized variance var, calculated according to Equation (9) below.
Referring to
Feedback Message Delay
The round-trip feedback message delay (dfeedback) is an important factor in the transmission rate smoothing. The delay may be configured to be a conservatively estimated constant delay, or may be based on one or more measurements. The delay may be estimated dynamically, based on a prediction algorithm, to more closely reflect the transmission delay in the network.
The systems and techniques described herein can provide a number of benefits for streaming media data. Referring to
As
Feedback messages from the client to the server introduce overhead. In order to reduce consumption of network resources for control purposes, the overhead may be reduced by reducing the number of rate changes. However, there may be a trade-off between the number of rate changes and the smoothness of the traffic.
Referring to
Various implementations of the systems and techniques described here may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, configured to receive and/or transmit data and instructions, at least one input device, and at least one output device.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, different buffer sizes, threshold numbers, prediction window sizes, etc. may be used. Accordingly, other embodiments are within the scope of the following claims.
The present application claims priority to co-assigned U.S. Provisional Patent Application No. 60/352,071, entitled “A MULTI-THRESHOLD ONLINE SMOOTHING TECHNIQUE FOR VARIABLE RATE MULTIMEDIA STREAMS,” filed on Jan. 25, 2002, which is hereby incorporated by reference in its entirety.
The invention described herein was made in the performance of work funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0082826, and is subject to the provisions of Public Law 96-517 (35 U.S.C. 202) in which the contractor has elected to retain title.
Number | Date | Country
---|---|---
20030165150 A1 | Sep 2003 | US

Number | Date | Country
---|---|---
60352071 | Jan 2002 | US