HTTP-based live video streaming (e.g., HTTP adaptive streaming (HAS)) has gained immense popularity in the last five years. Existing HAS solutions use a pre-defined set of bitrate-resolution pairs (referred to as a bitrate ladder), with a fixed number of pairs. This approach, while simple to implement, fails to deliver a pleasant quality of experience (QoE) in real-world streaming setups, which often involve variable network conditions, device capabilities, and content complexities. Consequently, optimizing the bitrate ladder by dynamically adjusting the number and values of bitrates and resolutions during the live session to improve QoE while minimizing resource consumption remains a challenging problem.
An optimized bitrate ladder depends on the type of content and the available bandwidth of clients. Therefore, some solutions have been developed to optimize bitrate ladders based on these factors. These solutions are broadly classified into content-based and context-based techniques. Content-based techniques involve analyzing video content or extracting relevant features to determine ideal encoding parameters. These features may include spatial and temporal complexity, motion activity, and color variance. Alternatively, context-based techniques use network or client-related information to determine optimized bitrate ladders. These approaches take into account factors such as available network bandwidth and client device capabilities, including device display resolution and processing power. These techniques have shown good performance compared to fixed bitrate ladders. However, they largely depend on an offline phase and are primarily appropriate for video on-demand services, which makes their deployment for live streaming scenarios infeasible.
Therefore, improved adaptive bitrate ladder optimization is desirable for live video streaming.
The present disclosure provides techniques for optimizing an adaptive bitrate ladder for live video streaming. A method for optimizing a bitrate ladder for live streaming, the method comprising: receiving a client-side input and an origin-side input during a first interval in a timeslot, the client-side input comprising a CDN log from a client, the origin-side input comprising a quality measure from an origin server; during the first interval, extracting from the CDN log a frequency of requests for each bitrate in a bitrate ladder in the timeslot and a duration of a recent stall event for the client's player; selecting, during a second interval in the timeslot, an optimized bitrate ladder comprising an optimal set of bitrates (OSB) using an optimization function, the optimization function taking as input the quality measure and a coefficient value determined using the frequency of requests and the duration of the recent stall event; and sending the optimized bitrate ladder to the origin server for live encoding a next segment.
In some examples, the method also may include selecting the coefficient value based on an average difference of quality and an average difference of bitrate. In some examples, the coefficient value is selected to decrease one or both of the average difference of quality and the average difference of bitrate. In some examples, the method also may include determining the coefficient value using a stall analysis function configured to determine the coefficient value and a binary variable based on a threshold mean stall duration. In some examples, the OSB comprises a new OSB when the binary variable comprises a True value. In some examples, the OSB comprises a previously selected OSB when the binary variable comprises a False value. In some examples, the CDN log comprises a URL of a HTTP request message, the duration of the recent stall event included in the URL in common media client data (CMCD) format. In some examples, the origin server comprises an origin agent and the quality measure comprises a measure of quality of a previously encoded segment by the origin server's live encoder. In some examples, the origin agent is deployed as a plug-in at the origin server and configured to measure perceptual quality. In some examples, the quality measure comprises one or both of a video multi-method assessment fusion (VMAF) and peak signal-to-noise ratio (PSNR). In some examples, the origin server comprises a live encoder configured to perform the live encoding of the next segment.
In some examples, the method also may include storing a tuple for each client that experienced a stall event, the tuple comprising a unique player identifier, a stall start time, and a stall end time. In some examples, the method also may include storing a number of requests received from a given client for each bitrate in a bitrate ladder. In some examples, selecting the optimized bitrate ladder comprises implementing a mixed-integer linear programming (MILP) model configured to perform a multi-objective optimization (MOO) function.
In some examples, the method also may include receiving an HTTP request from the client, the request comprising a selected segment and a requested bitrate; and providing the selected segment at the requested bitrate when the requested bitrate is included in the OSB, or at a lower bitrate when the requested bitrate is not included in the OSB.
A distributed computing system may include: a distributed database configured to store client stall event information and bitrate ladders; and one or more processors configured to: receive a client-side input and an origin-side input during a first interval in a timeslot, the client-side input comprising a CDN log from a client, the origin-side input comprising a quality measure from an origin server; during the first interval, extract from the CDN log a frequency of requests for each bitrate in a bitrate ladder in the timeslot and a duration of a recent stall event for the client's player; select, during a second interval in the timeslot, an optimized bitrate ladder comprising an optimal set of bitrates (OSB) using an optimization function, the optimization function taking as input the quality measure and a coefficient value determined using the frequency of requests and the duration of the recent stall event; and send the optimized bitrate ladder to the origin server for live encoding a next segment. In some examples, the client stall event information is stored in tuples comprising a unique player identifier, a stall start time, and a stall end time.
A system for optimizing a bitrate ladder for live streaming, the system may include: a processor; and a memory comprising program instructions executable by the processor to cause the processor to implement: an analytics server configured to receive a client request comprising stall event information and an origin server message comprising a quality measure of a previously encoded segment, the analytics server further configured to select an optimal set of bitrates (OSB) using the stall event information and the quality measure; and an origin agent comprising a live encoder plug-in, the origin agent configured to measure perceptual quality of encoded segments and to request the encoder to adjust the bitrate ladder in accordance with the OSB selected by the analytics server. In some examples, the analytics server further is configured to implement a mixed-integer linear programming (MILP) model configured to perform a multi-objective optimization (MOO) function. In some examples, the MILP model is configured to receive as input a set of quality measures, a set of received requests for each bitrate in a bitrate ladder, and a coefficient value α.
The figures depict various example embodiments of the present disclosure for purposes of illustration only. One of ordinary skill in the art will readily recognize from the following discussion that other example embodiments based on alternative structures and methods may be implemented without departing from the principles of this disclosure, and which are encompassed within the scope of this disclosure.
The Figures and the following description describe certain embodiments by way of illustration only. One of ordinary skill in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein. Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures.
The above and other needs are met by the disclosed methods, a non-transitory computer-readable storage medium storing executable code, and systems for perceptually aware online per-title encoding.
In this invention, a bitrate ladder is optimized for live streaming services by utilizing information from the client and content delivery networks (CDNs) to improve quality of experience (QoE) and resource utilization within the delivery network. An end-to-end approach dynamically optimizes the bitrate ladder in live streaming applications leveraging real-time feedback from both origin and client sides. The invention comprises a highly scalable and plug-and-play solution (i.e., system) that seamlessly integrates with an existing HTTP adaptive streaming (HAS) solution. An end-to-end adaptive bitrate ladder optimization system, as described herein, can make the most out of both content-based and context-based bitrate ladder optimization techniques. Periodic real-time inputs may be received from a client (e.g., a video or media player and other devices configured to play videos and other media) and an origin server (e.g., an origin agent comprising or coupled with a live encoder) to dynamically select an optimized bitrate ladder comprising an optimal subset of bitrates (OSB) (i.e., an optimized temporary bitrate ladder) and to adjust the bitrate ladder accordingly during a live video session. This can result in a significant improvement in viewer QoE and reduction in encoding and delivery costs.
A system for optimizing a bitrate ladder may comprise an analytics server and an origin agent. The analytics server may determine an optimized bitrate ladder (e.g., periodically, responsively, per a schedule, on demand, ad hoc) based on inputs from various entities involved in the live streaming pipeline. The optimized bitrate ladder may comprise an OSB, as described herein. The origin agent may comprise a live encoder plugin configured to estimate a perceptual quality of every produced segment and request the encoder to adjust the bitrate ladder in accordance with an output (e.g., a decision) from the analytics server.
The analytics server may be located along a path between the client and the origin server. The analytics server may perform analytics and bitrate ladder optimization tasks (e.g., periodically, responsively, per a schedule, on demand, ad hoc). For example, during a live video session, the analytics server may collect client-side inputs (e.g., requested bitrate and stall duration) from each of a plurality of clients through a CDN and origin-side inputs (e.g., measured perceptual quality for each produced segment) from an origin server (e.g., a live encoder). The analytics server may use mixed-integer linear programming (MILP) to formulate a bitrate ladder optimization task as a multi-objective optimization problem. The result of said optimization comprises an optimized bitrate ladder (e.g., comprising OSB, as described herein), which may be provided to the origin server for a real-time encoding task.
The origin agent may be deployed as a plug-in at the origin server and configured to measure perceptual quality (e.g., in terms of video multi-method assessment fusion (VMAF), peak signal-to-noise ratio (PSNR), and the like). The origin agent may be further configured to communicate the perceptual quality measures to the analytics server (e.g., via in-band messages). Once the origin agent receives an updated bitrate ladder (i.e., an optimized bitrate ladder comprising OSB) from the analytics server, it may pass the updated bitrate ladder to an encoder (e.g., at the origin server) for an encoding task.
In some examples, an origin server may comprise a commodity server that hosts a live encoder program to encode content received from a live camera into different bitrate-resolution pairs. The segments may be delivered into a distributed network (e.g., CDN). In some examples, a client may comprise a video or media playback device (e.g., player). The client may request (e.g., continuously, periodically, or otherwise) and buffer segments from a CDN.
In some examples, the system for optimizing a bitrate ladder may operate in a timeslot manner, where each timeslot has a given duration of θ seconds. Within each timeslot, an analytics server may receive inputs from a CDN server (e.g., on behalf of a client) and an origin agent (e.g., on the origin side). Each timeslot may be divided into two or more intervals, comprising a collecting requests (CR) interval and an optimizing bitrate ladder (OL) interval. During the CR interval, the analytics server may process metadata from CDN servers (e.g., CDN logs) to extract at least a frequency of requests for each bitrate in the current timeslot and a duration of recent player stalls. For example, players may use Common Media Client Data (CMCD) to add stall information to a URL of an HTTP request message, thereby sending said stall information to a CDN server. A CDN server may transfer copies of relevant URLs to the analytics server. The origin agent may measure a quality of a plurality of segments produced by a live encoder and inform the analytics server accordingly. In this example, during the CR interval, the origin agent sends quality measures (e.g., PSNR, VMAF, and the like) of recently encoded video segments to the analytics server. The analytics server may use this information for updating the bitrate ladder. Moreover, the origin agent receives the recommended bitrate ladder from the analytics server and dictates it to a live encoder for encoding following live content (e.g., a next segment or plurality of segments). Any modifications made to the bitrate ladder are invisible to clients (e.g., players). A client receives a manifest that includes m different bitrate-resolution pairs (i.e., representations), which remain constant throughout a streaming session. A client may choose a representation from the manifest and send an HTTP request to buffer a subsequent segment.
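As a minimal sketch of the CR-interval log processing described above, the following Python code counts requests per bitrate and collects stall durations from request URLs. It assumes that CMCD data is carried in the `CMCD` query argument and that the requested bitrate is reported in the CMCD `br` key (kbps). The custom key `com.example-stall`, its unit (milliseconds), and the function names are hypothetical and are not taken from this disclosure.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical custom CMCD key carrying the last stall duration in milliseconds.
STALL_KEY = "com.example-stall"


def parse_cmcd(url):
    """Return the CMCD key/value pairs carried in the URL's `CMCD` query argument."""
    query = parse_qs(urlparse(url).query)  # parse_qs also percent-decodes values
    pairs = {}
    for payload in query.get("CMCD", []):
        # Simplification: assumes no commas inside quoted string values.
        for item in payload.split(","):
            key, sep, value = item.partition("=")
            # A key without a value is a boolean that is true; strings are quoted.
            pairs[key.strip()] = value.strip('"') if sep else True
    return pairs


def summarize_requests(urls):
    """Count requests per bitrate (kbps) and collect stall durations (seconds)."""
    requests_per_bitrate = Counter()
    stall_durations = []
    for url in urls:
        cmcd = parse_cmcd(url)
        if "br" in cmcd:
            requests_per_bitrate[int(cmcd["br"])] += 1
        if STALL_KEY in cmcd:
            stall_durations.append(int(cmcd[STALL_KEY]) / 1000.0)
    return requests_per_bitrate, stall_durations
```

The request counts correspond to the per-bitrate request frequencies, and the stall durations can be averaged per timeslot before being passed to a stall analysis step.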
If the segment with the requested bitrate is present on the CDN server, the client may obtain it. Otherwise, the CDN server responds to the request by providing a segment encoded at a lower bitrate. In each OL interval, the analytics server may select an optimal subset of m bitrates (i.e., OSB), which may then be communicated to the origin agent. The live encoder may use the updated OSB to encode the live content.
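The fallback behavior described above can be sketched as follows. This is an illustration only: it assumes the server falls back to the highest available bitrate below the requested one, whereas in this disclosure the choice of the lower bitrate is determined by the optimization model. The function name is hypothetical.

```python
def select_served_bitrate(requested, osb):
    """Return the bitrate served for a request, given the bitrates in the OSB."""
    if requested in osb:
        return requested
    # Assumption: fall back to the highest OSB bitrate below the requested one.
    lower = [bitrate for bitrate in osb if bitrate < requested]
    if not lower:
        raise ValueError("no bitrate at or below the requested one is available")
    return max(lower)
```

For example, with an OSB of {500, 1500, 6000} kbps, a request for 3000 kbps would be served at 1500 kbps under this assumption.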
In some examples, during each OL interval, an analytics server may determine an OSB for the offered representations in the manifest and inform the origin agent accordingly. In an example, a mixed integer linear programming (MILP) model may be used to provide OSB in each OL interval. For example, B={b1, b2, . . . , bm} may comprise a set of m bitrates in the manifest to which segments may be encoded, where bm comprises a highest bitrate. A set R={r1, r2, . . . , rm} consisting of m non-negative integer elements may be defined, where ri, 1≤i≤m, represents a number of requests for bi∈B. For each ri, a binary variable xi may be defined to indicate whether bitrate bi is included in the OSB (xi=1) or not (xi=0). However, if bi is not included in the OSB (xi=0), and there are still requests for that bitrate (i.e., ri>0), the analytics server may select a lower bitrate to serve those requests. In some examples, this may be handled by a set of (i−1) binary variables Yi={y(1,i), y(2,i), . . . , y(i−1,i)}, where y(j,i)=1 shows bj will be transmitted to players requesting bitrate bi. This results in the following constraints:
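One way to write constraints consistent with this description is given below. It is offered as a sketch under the stated definitions of xi and y(j,i), not as the exact equations of this disclosure. The first constraint requires that requests for each bitrate bi be served either by bi itself or by exactly one lower bitrate; the second allows bj to serve requests for bi only if bj is in the OSB.

```latex
\begin{align}
x_i + \sum_{j=1}^{i-1} y_{(j,i)} &= 1, && \forall i \in \{1,\dots,m\} \\
y_{(j,i)} &\le x_j, && \forall i \in \{2,\dots,m\},\ \forall j \in \{1,\dots,i-1\}
\end{align}
```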
The second constraint forces xj=1 when bitrate bj is added to the OSB to serve players requesting bitrates greater than bj. To prevent large bitrate changes, the number of chosen bitrates and the difference between two consecutive OSBs may each be limited by a threshold: a positive maximum length of the OSB (e.g., 5, 6, etc.), bounded by m, and β>0, a maximum change between two successive OSBs. Therefore:
where
where real variable q≥0 indicates the average difference of quality when bitrate bj is selected to serve all requests for bitrate bi (i.e., yj,i=1) and real variable s≥0 indicates the average difference of bitrate when bitrate bj is selected to serve all requests for bitrate bi (i.e., yj,i=1). With q and s, we can introduce the following multi-objective optimization (MOO) function:
where Q and S are used as upper-bounds of q and s for the normalization purpose, respectively. Coefficient α may be defined to prioritize q and s (e.g., to optimize reduction of or decreases in q and s). For example, by setting α=1, the analytics server may select a subset of bitrates from set B that minimizes the average quality degradation, thereby serving clients using the client-requested bitrates. In another example, by setting α=0, the analytics server may serve the requests with a lowest bitrate. The MILP model may be expressed as:
In an example, the maximum OSB length and β are both equal to 8, xi=0, ∀i, and for each bitrate bi, ri may be set to a random value between 50 and 100 (e.g., for a heterogeneous system regarding clients' requested qualities).
In some examples, the time complexity of the proposed MILP model is not affected by a number of clients and instead may be based on a number of bitrates in B. For example, if there are m bitrates in B, each bitrate bi has one binary variable xi and (i−1) variables yj,i. The total number of variables is therefore m(m+1)/2+2, where two real variables q and s are included. The number of constraints is equal to 2m+4.
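For small m, the selection performed by the model can be illustrated by exhaustive search instead of a MILP solver. The sketch below rests on several assumptions that are not stated in this disclosure: the objective is taken to be α·q/Q − (1−α)·s/S (so that α=1 minimizes quality degradation and α=0 favors the lowest bitrate), Q and S are taken as the quality and bitrate ranges of B, and the β constraint on changes between successive OSBs is omitted. All function and parameter names are hypothetical.

```python
from itertools import combinations


def best_osb(bitrates, quality, requests, alpha, max_len):
    """Search subsets of `bitrates` (sorted ascending) for the lowest objective.

    quality[i] is the quality measure of bitrates[i]; requests[i] is its
    request count. Returns the selected bitrates as a list.
    """
    m = len(bitrates)
    total = sum(requests)
    q_max = quality[-1] - quality[0]    # assumed upper bound Q
    s_max = bitrates[-1] - bitrates[0]  # assumed upper bound S

    def cost(i, j):
        # Weighted cost of serving requests for bitrates[i] with bitrates[j].
        dq = (quality[i] - quality[j]) / q_max
        ds = (bitrates[i] - bitrates[j]) / s_max
        return requests[i] * (alpha * dq - (1 - alpha) * ds) / total

    best = None
    for size in range(1, max_len + 1):
        for subset in combinations(range(m), size):
            chosen = set(subset)
            objective = 0.0
            feasible = True
            for i in range(m):
                if i in chosen or requests[i] == 0:
                    continue  # served as requested, or nothing to serve
                lower = [j for j in subset if j < i]
                if not lower:
                    feasible = False  # no lower bitrate can serve these requests
                    break
                objective += min(cost(i, j) for j in lower)
            if feasible and (best is None or objective < best[0]):
                best = (objective, [bitrates[j] for j in subset])
    return best[1]
```

Exhaustive search grows exponentially with m and is shown only to make the roles of xi, yj,i, and α concrete; a deployment would more plausibly hand the same model to a MILP solver.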
The following is an exemplary algorithm for an analytics server to determine an OSB:
[Algorithm 1: listing not reproduced. Recoverable annotations: CR interval starts; OL interval starts; calls to Algorithm 2, Algorithm 3, and Algorithm 4.]
During each CR interval (i.e., lines 2-6 of Algorithm 1), an analytics server may collect quality measures of previously encoded segments as reported by an origin server, which may be saved in set I. In addition, the analytics server may process CDN servers' logs to extract stall information and a number of demands for each bitrate bi and store them in the following sets:
In some examples, a main task during an OL interval is to generate an OSB using a proposed optimization model (e.g., the MOO in Eq. (8)). Inputs to an optimization model (e.g., MOO) may include (1) set I, (2) set R, and (3) a coefficient value α. Selecting an optimal value of α may include calling a StallAnalysis( ) function to determine an appropriate value for α in line 7 of Algorithm 1 above. An exemplary StallAnalysis( ) function may be:
A StallAnalysis( ) function may receive inputs, including a StallHistory set, which records an average duration of stalls in each timeslot, and a StallAlpha dictionary, which specifies an α value for each range of stall durations. A StallAlpha dictionary may be provided by a system administrator and can be updated during a streaming session. For example, if StallAlpha={[0,2]:1.0,[2,]:0.8}, then where an average stall duration falls in a range of [0,2] seconds, α may be set to 1.0, and where the average stall duration is greater than or equal to 2 seconds, α may be set to 0.8. In other examples, a StallAlpha parameter may be defined as {1: [0,1],0.9:[1,2],0.8:[2,3],0.7:[3,4],0.6:[4,5],0.5:[5,100]}, in which case where an average stall duration falls between 0 and 1 in a timeslot, the optimization model will run with α=1, and so on. A value of α may be determined in line 4 of Algorithm 2 above. Using the mean stall duration in the current timeslot, denoted by mstall, and the mean stall duration in the previous timeslot, denoted by lastmstall, the StallAnalysis( ) function may adjust a threshold Ts (e.g., Algorithm 2, lines 6-10) according to a difference between mstall and lastmstall. In line 12 of Algorithm 2, a binary (e.g., Boolean) variable flag may be defined with an initial value False. If stall events increase significantly, a random number generated in line 13 is, with a high probability, less than or equal to Ts, in which case flag may be set to True. If flag=True, a new OSB is required to prevent client players from experiencing further stall events. Algorithm 2 may return values of α and flag to Algorithm 1.
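The behavior described above can be sketched in Python as follows. This is a minimal illustration, not the listing of Algorithm 2: the threshold update rule (a base value plus a gain times the change in mean stall), the parameters `base_ts`, `gain`, and `default_alpha`, and the half-open treatment of range boundaries are all assumptions. The StallAlpha dictionary uses the second form given above, mapping each α to a [low, high] range.

```python
import random


def stall_analysis(stall_history, stall_alpha, base_ts=0.5, gain=0.25,
                   default_alpha=0.5, rng=random.random):
    """Return (alpha, flag) from per-timeslot mean stall durations in seconds."""
    mstall = stall_history[-1]
    lastmstall = stall_history[-2] if len(stall_history) > 1 else mstall

    # Look up alpha for the range that contains the current mean stall.
    alpha = default_alpha
    for candidate, (low, high) in stall_alpha.items():
        if low <= mstall < high:
            alpha = candidate
            break

    # Assumed adjustment: raise the threshold when stalls grow, lower it when
    # they shrink, and keep it within [0, 1].
    ts = min(1.0, max(0.0, base_ts + gain * (mstall - lastmstall)))

    # A new OSB is requested when the random draw does not exceed the threshold.
    flag = rng() <= ts
    return alpha, flag
```

Passing the random source as a parameter keeps the probabilistic decision reproducible in tests; in operation the default `random.random` would be used.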
Returning to Algorithm 1, an optimization function may be called in line 8 with inputs R, I, and α. An exemplary optimization function may be:
In an optimization function, an OSBHistory set may be declared to store produced OSBs. In line 3 of the optimization function, an EstimateQI( ) function may be called with input parameter I to train an estimator function F (e.g., from Eq. (5)), for example, using a linear regression technique. After that, a MILP model (e.g., Eq. (8)) may be run with appropriate inputs to produce an OSB in line 4 of Algorithm 3. The OSB may be determined by selected values of xi variables. The value of q may be returned along with the OSB to Algorithm 1.
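A quality estimator of the kind described above can be sketched with ordinary least squares. The sketch assumes that quality is modeled as a linear function of the logarithm of the bitrate and that the measurements from set I have been reduced to (bitrate, quality) pairs; both choices, and the function names, are assumptions rather than details of this disclosure.

```python
import math


def estimate_qi(measurements):
    """Fit quality = slope * log(bitrate) + intercept by least squares.

    `measurements` is a list of (bitrate, quality) pairs covering at least
    two distinct bitrates. Returns a function that estimates quality.
    """
    xs = [math.log(bitrate) for bitrate, _ in measurements]
    ys = [quality for _, quality in measurements]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def estimator(bitrate):
        # Estimated quality for a bitrate that was not encoded in this timeslot.
        return slope * math.log(bitrate) + intercept

    return estimator
```

The returned estimator can supply quality values for bitrates that are absent from the current OSB and therefore have no measured quality.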
Returning to Algorithm 1 above, if the StallAnalysis( ) function returns a True flag (i.e., flag=True), an analytics server may use a simple RESTful API to notify an origin agent of a newly determined OSB. On the other hand, if the stall events are insignificant (i.e., flag=False), another metric may be considered to determine whether the last OSB should remain unchanged, by calling a QualityAnalysis( ) function at line 12 of Algorithm 1. An exemplary QualityAnalysis( ) function may be:
In lines 3-5 of Algorithm 4, a difference between a quality of a requested bitrate and a quality of a served bitrate may be measured. The quality of served bitrates may be available from set I. Other bitrates may use function F (e.g., from Eq. (5)). If a mean of the quality difference is high, client players may be requesting higher bitrates due to various conditions (e.g., high available bandwidth), while the OSB is providing lower bitrates. In such a case, threshold Tq≤1 may be adjusted according to the gap between the value of q obtained by an optimization function (e.g., Algorithm 3) and the mean of the quality difference stored in q* (i.e., lines 6-11 of Algorithm 4). If a generated random number is less than Tq in line 13, the analytics server may send an obtained OSB to the origin agent in line 14. Otherwise, a live encoder may continue with a previous OSB.
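Putting the steps of an OL interval together, the control flow described for Algorithm 1 can be sketched as below. The four callables are hypothetical stand-ins for the StallAnalysis( ) function, the optimization function, the QualityAnalysis( ) function, and the notification to the origin agent; their signatures are assumptions made for illustration.

```python
def run_ol_interval(requests, qualities, stall_analysis, optimize,
                    quality_analysis, send_osb):
    """Run one OL interval; return the OSB sent to the origin agent, or None."""
    # Stall analysis yields the coefficient alpha and the update flag.
    alpha, flag = stall_analysis()
    # The optimization step produces a candidate OSB and its value of q.
    osb, q = optimize(requests, qualities, alpha)
    # The quality check is consulted only when stall analysis did not already
    # require a new OSB (short-circuit evaluation of `or`).
    if flag or quality_analysis(osb, q):
        send_osb(osb)
        return osb
    return None  # the live encoder continues with the previous OSB
```

In this sketch the candidate OSB is computed in every OL interval but is sent to the origin agent only when either analysis calls for an update.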
Analytics server 203a may receive request URLs from web server 203b and qualities of encoded segments from origin agent 207. Analytics server 203a may implement a timeslot and run an optimization model (e.g., Eq. (8) MOO) to determine an OSB. In this example, analytics server 203a may inform origin agent 207 and web server 203b of the OSB.
Origin agent 207 may send calculated quality measures (e.g., PSNR or VMAF values) to analytics server 203a. Origin agent 207 also may update encoder settings based on an OSB received from analytics server 203a. In some examples, during encoding, live encoder 208 may compute quality measure values and save a tuple of (segmentID, bitrate, quality-indicator-value) for each encoded segment in a log file. Origin agent 207 also may read the saved data from the log file and send it to analytics server 203a, for example, using a TCP socket. In addition, upon receiving an OSB from analytics server 203a, origin agent 207 may update a setting file by adding the selected optimal subset of the m bitrates and an OSB ID to the setting file. Subsequently, live encoder 208 may encode follow-on (i.e., next) segment(s) based on the most recently added OSB, for example, by adjusting ffmpeg arguments.
Computing device 601, which in some examples may be included in mobile device 601 and in other examples may be included in a server (e.g., dual-processor server), also may include a memory 602. Memory 602 may comprise a storage system configured to store a database 614 and an application 616. Application 616 may include instructions which, when executed by a processor 604, cause computing device 601 to perform various steps and/or functions (e.g., implementing algorithms described herein and other aspects of optimizing an adaptive bitrate ladder), as described herein. Application 616 further includes instructions for generating a user interface 618 (e.g., graphical user interface (GUI)). Database 614 may store various algorithms and/or data, including networks and data relating to bitrates, client information, videos, video segments, bitrate-resolution pairs, target encoding sets, device characteristics, network performance, among other types of data. Memory 602 may include any non-transitory computer-readable storage medium for storing data and/or software that is executable by processor 604, and/or any other medium which may be used to store information that may be accessed by processor 604 to control the operation of computing device 601.
Computing device 601 may further include a display 606, a network interface 608, an input device 610, and/or an output module 612. Display 606 may be any display device by means of which computing device 601 may output and/or display data (e.g., to play decoded video). Network interface 608 may be configured to connect to a network using any of the wired and wireless short range communication protocols described above, as well as a cellular data network, a satellite network, free space optical network and/or the Internet. Input device 610 may be a mouse, keyboard, touch screen, voice interface, and/or any other hand-held controller or device or interface by means of which a user may interact with computing device 601. Output module 612 may be a bus, port, and/or other interfaces by means of which computing device 601 may connect to and/or output data to other devices and/or peripherals.
In one embodiment, computing device 601 is a data center or other control facility (e.g., configured to run a distributed computing system as described herein), and may communicate with a media playback device and other client devices. As described herein, system 600, and particularly computing device 601, may be used for video playback, running an application, encoding and decoding video data, providing feedback to a server, measuring perceptual quality, implementing models, and otherwise implementing steps in an adaptive bitrate ladder optimization method, as described herein. Various configurations of system 600 are envisioned, and various steps and/or functions of the processes described below may be shared among the various devices of system 600 or may be assigned to specific devices.
While specific examples have been provided above, it is understood that the present invention can be applied with a wide variety of inputs, thresholds, ranges, and other factors, depending on the application. For example, the time frames and ranges provided above are illustrative, but one of ordinary skill in the art would understand that these time frames and ranges may be varied or even be dynamic and variable, depending on the implementation.
As those skilled in the art will understand, a number of variations may be made in the disclosed embodiments, all without departing from the scope of the invention, which is defined solely by the appended claims. It should be noted that although the features and elements are described in particular combinations, each feature or element can be used alone without other features and elements or in various combinations with or without other features and elements. The methods or flow charts provided may be implemented in a computer program, software, or firmware tangibly embodied in a computer-readable storage medium for execution by a general-purpose computer or processor.
Examples of computer-readable storage mediums include a read only memory (ROM), random-access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks.
Suitable processors include, by way of example, a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, or any combination thereof.