This disclosure generally relates to compressed video, including but not limited to compressed video for Internet protocol (IP) streaming over large-scale carrier-grade video services.
IP video streaming over large-scale video services lacks a guarantee of visual quality achieved in traditional cable and satellite digital video broadcasting services.
Various objects, aspects, features, and advantages of the disclosure will become more apparent and better understood by referring to the detailed description taken in conjunction with the accompanying drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements.
Various embodiments are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations can be used without departing from the spirit and scope of the present disclosure.
IP video streaming over large-scale video services lacks a guarantee of visual quality achieved in traditional cable and satellite digital video broadcasting services. Central to establishing a video-quality guarantee are compressed video buffering and timing as video frames are transmitted end-to-end from an encoder to a decoder over a distribution path in an IP network. In this disclosure, techniques for compressed video buffering are presented that address IP video traffic regulation for end-to-end video network quality of service (QoS). A set of relationships is established between decoder buffer size, network transmission rate, network delay and jitter, and video source characteristics (e.g., burstiness, data rate, and frame size). The techniques of this disclosure are compared across video coded with the Moving Picture Experts Group (MPEG) standard MPEG-2, the Advanced Video Coding (AVC) standard, and the High Efficiency Video Coding (HEVC) standard, including measurements and comparisons of the burstiness of the associated video streams. The applicability of the techniques of this disclosure to IP networks with a specific class of routers is also demonstrated.
Service providers and end users desire a visual quality of IP video delivery that matches the visual quality of existing cable and satellite digital video broadcasting services (referred to herein as carrier-grade quality). The present disclosure describes improved compressed video buffering and timing that bring IP video delivery to carrier-grade quality and improve QoS.
In an aspect, a system controls a transmission of a sequence of compressed video data from an encoder buffer to a network for delivery to a decoder buffer. To control the transmission, the system is configured to: determine characteristics of a video transmission path between the encoder buffer and the decoder buffer, the characteristics comprising at least one of a buffer size of the decoder buffer, an input rate of the decoder buffer, and a buffer size of an equivalent intermediary buffer of the video transmission path; determine a transmission rate from the characteristics of the video transmission path and from the sequence of compressed video data, the transmission rate being determined such that a target quality of service value can be guaranteed for the entire sequence of compressed video data transmitted at the determined transmission rate to the decoder buffer; and control transmission of the sequence of compressed video data at the determined transmission rate.
In another aspect, a method includes determining, in video source equipment, a transmission rate for compressed video data and transmitting the compressed video data at the determined rate from a transmitter of the video source equipment. The determined transmission rate is sufficient to avoid underflow and overflow of a decoder buffer in a memory of a receiver of a distribution path, where the distribution path is between an encoder buffer of the video source equipment and the decoder buffer. Determining the transmission rate includes identifying a combined amount of compressed video data of the encoder buffer, the combined amount including a first amount of compressed video data presently in the encoder buffer and a second amount of compressed video data to be provided to the encoder buffer, and identifying a third amount of compressed video data presently in at least one intermediary buffer in the distribution path. Determining the transmission rate further includes calculating an expected amount of data to be received at the decoder buffer over a time interval, ensuring that the combined amount of compressed video data plus the third amount of compressed video data is greater than or equal to the expected amount of data to be received at the decoder buffer, and ensuring that the expected amount of data to be received at the decoder buffer, minus the first amount of compressed video data presently in the encoder buffer and minus the third amount of compressed video data presently in the at least one intermediary buffer, is less than or equal to a maximum size of the decoder buffer.
In another aspect, an apparatus includes at least one memory including a video encoder data buffer, transmitter circuitry to provide data from the encoder data buffer to a distribution path and directed to a receiver in the distribution path, and processor circuitry. The processor circuitry determines a transmission rate for compressed video data to be transmitted by the transmitter, the transmission rate determined to ensure that a decoder buffer of the receiver will not underflow or overflow during transmission of a defined length video sequence to the receiver. The processor circuitry provides the determined transmission rate to network nodes in the distribution path for quality of service provisioning. The processor circuitry controls the transmitter circuitry to transmit the video sequence at the determined transmission rate.
Prior to discussing specific embodiments of the present solution, it may be helpful to describe aspects of the operating environment as well as associated system components (e.g., hardware elements) in connection with the methods and systems described herein. In the following discussions, a wired network may be based on any available wired network standard, including but not limited to Ethernet, Universal Serial Bus (USB), and IEEE 1394 (FireWire™); and a wireless network may be based on any available wireless network standard, including but not limited to WiFi (802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, etc.), Bluetooth, ZigBee, CDMA, GSM, Long Term Evolution (LTE), LTE Advanced, High-Speed Packet Access (HSPA) or Evolved HSPA (HSPA+), Wideband CDMA (WCDMA), Evolution-Data Optimized (EVDO), Worldwide Interoperability for Microwave Access (WiMAX), General Packet Radio Service (GPRS), and Enhanced Data Rates for GSM Evolution (EDGE).
In one or more embodiments, one or more devices 110 and access points 120 are, or are included as part of, a computing device. In one or more embodiments, a computing device includes a plurality of network interfaces, such as one or more wired interfaces for connecting to a wired network and one or more wireless interfaces for connecting to a wireless network. In one or more embodiments, a computing device uses these interfaces separately (e.g., connecting via WiFi when available and a cellular communication standard when WiFi is unavailable).
Examples of devices 110 include without limitation laptop computers, tablets, personal computers and/or cellular telephone devices. Computing devices are described more generally below with reference to
Devices 110 are illustrated in
Access point 120 allows one or more devices 110 to connect to a network through access point 120. For example, in one or more embodiments, access point 120 connects to a wired Ethernet connection and provides wireless connections using radio frequency links for wireless devices 110 to access the wired connection through access point 120.
In one or more embodiments, access point 120 is configured, designed and/or built for operating in a wireless local area network (WLAN). In one or more embodiments, access point 120 connects to a router (e.g., via a wired network), and in one or more embodiments, access point 120 is a component of a router. In one or more embodiments, access points 120 support a standard for sending and receiving data using one or more radio frequencies. In one or more embodiments, one or more access points 120 support public Internet hotspots, or an internal network that extends the network's Wi-Fi signal range.
In one or more embodiments, access points 120 are used in wireless networks (e.g., based on a radio frequency network protocol). Correspondingly, in one or more embodiments, one or more devices 110 include a built-in radio or a coupling to a radio. In one or more embodiments, devices 110 have the ability to function as a client node seeking access to resources (e.g., data, and connection to networked nodes such as servers) via one or more access points 120.
In one or more embodiments, access points 120 may be operably coupled to network hardware 130 via local area network connections. Examples of network hardware 130 include without limitation a router, gateway, switch, bridge, modem, system controller, and appliance. In one or more embodiments, network hardware 130 provides a local area network connection. In one or more embodiments, an access point 120 has an associated antenna or an antenna array to communicate with wireless devices 110. Such wireless devices 110 may register with a particular access point 120 to receive services (e.g., via a SU-MIMO or MU-MIMO configuration). In some embodiments, devices 110 communicate directly via an allocated channel and communications protocol (e.g., point-to-point communications). In one or more embodiments, one or more wireless devices 110 are mobile with respect to corresponding access point(s) 120; in one or more embodiments, one or more wireless devices 110 are relatively static with respect to corresponding access point(s) 120.
Examples of network connections include without limitation a point-to-point network, a broadcast network, a telecommunications network, a data communication network, and a computer network. Examples of network topology include without limitation a bus, star, or ring network topology. In some embodiments, different types of data may be transmitted via different protocols. In other embodiments, the same types of data may be transmitted via different protocols.
CPU 205 represents processing functionality implemented in one or more of a processor, microprocessor, microcontroller, ASIC, and/or FPGA, along with associated logic. More generally, CPU 205 is any logic circuitry that responds to and processes instructions fetched from memory 210. Examples of CPU 205 include processors manufactured by Intel Corporation of Mountain View, Calif.; International Business Machines of White Plains, N.Y.; or Advanced Micro Devices of Sunnyvale, Calif.
Memory 210 represents one or more memory devices capable of storing data and/or storing instructions (e.g., operating system and application software). Portions of memory 210 are accessed by CPU 205 through a bus, or through a direct memory access (DMA) device or function. Memory 210 includes semiconductor memories such as random access memory (RAM, e.g., static RAM (SRAM), dynamic RAM (DRAM), and ferroelectric RAM (FRAM), among others), or other semiconductor devices (e.g., NAND flash, NOR flash, and solid state drives (SSD), among others). In the embodiment shown in
In the embodiment shown in
In the embodiment shown in
I/O devices 245 include input devices such as keyboards, mice, trackpads, trackballs, microphones, dials, touch pads, touch screen, and drawing tablets, and output devices such as video displays and speakers. The I/O devices in one or more embodiments are controlled by an I/O controller 230 as shown in
Referring again to
Computing device 200 in one or more embodiments includes a network interface 220 providing one or more connections such as LAN or WAN links (e.g., 802.11, T1, T3, 56 kb, X.25, SNA, DECNET), broadband connections (e.g., ISDN, Frame Relay, ATM, Gigabit Ethernet, Ethernet-over-SONET), wireless connections, or some combination of any or all of the above. Connections are established using associated protocols (e.g., TCP/IP, IPX, SPX, NetBIOS, Ethernet, ARCNET, SONET, SDH, Fiber Distributed Data Interface (FDDI), RS232, IEEE 802.11, IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11n, IEEE 802.11ac, IEEE 802.11ad, CDMA, GSM, WiMax and direct asynchronous connections). In one or more embodiments, computing device 200 communicates with other computing devices via a gateway or tunneling protocol such as Secure Socket Layer (SSL) or Transport Layer Security (TLS). Network interface 220 in one or more embodiments includes a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or other device suitable for interfacing computing device 200 to a network capable of communication and performing the operations described herein.
In one or more embodiments, computing device 200 includes or is connected to one or more display devices 225. As such, any of I/O devices 245 and/or I/O controller 230 includes suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection and use of display device(s) 225 by computing device 200. For example, in one or more embodiments, computing device 200 includes a video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use display device(s) 225. In one or more embodiments, a video adapter includes multiple connectors to interface to display device(s) 225. In one or more embodiments, computing device 200 includes multiple video adapters, with each video adapter connected to display device(s) 225. In one or more embodiments, computing device 200 communicates with multiple displays 225. One ordinarily skilled in the art will recognize and appreciate the various ways and embodiments that a computing device 200 connects to, includes, and/or controls one or more display devices 225.
In one or more embodiments, bridge 260 provides a connection between the shared bus 250 and an external communication bus, such as a USB bus, an Apple Desktop Bus, an RS-232 serial connection, a SCSI bus, a FireWire bus, a FireWire 800 bus, an Ethernet bus, an AppleTalk bus, a Gigabit Ethernet bus, an Asynchronous Transfer Mode bus, a FibreChannel bus, a Serial Attached small computer system interface bus, a USB connection, or a HDMI bus.
A computing device 200 of the sort depicted in
A computing device 200 is, for example, a workstation, telephone, desktop computer, laptop or notebook computer, server, handheld computer, mobile telephone or other portable telecommunications device, media playing device, a gaming system, mobile computing device, or any other type and/or form of computing, telecommunications or media device that is capable of communication. Computing device 200 has sufficient processor power and memory capacity to perform the operations described herein. Aspects of the operating environments and components described above will become apparent in the context of the systems and methods disclosed herein.
In the embodiment of
In some video compression standards (e.g., MPEG-2, AVC, and HEVC), a hypothetical reference decoder (HRD) or video buffer verifier (VBV) model (referred to herein as a buffering model) is specified for modeling the transmission of compressed video data from encoder 305 to decoder 355. The buffering model generally imposes constraints on variations in bit rate over time in a compressed bit stream with respect to timing and buffering. Such a buffering model has been used in the past for carrier-grade video transmission (e.g., digital video broadcasting via cable and satellite).
where f is the frame rate (e.g., 30 frames per second). The term ‘frame’ as used herein refers to a single picture in a video sequence for ease of discussion. However, it should be understood to be within the scope of the present disclosure that a frame may include less than a single picture of the video sequence, or more than a single picture of the video sequence. For delivering carrier-grade compressed digital video, a video buffer manager at a source provides a mechanism to prevent decoder buffer under- and/or overflow. In the buffering model of
Assuming that the video delivery system preserves the original video frame rate (neither inserting nor dropping frames), then the encoding and decoding times satisfy equations 1 and 2.
There are a combined number ‘c’ of compressed frames in encoder buffer 410 and decoder buffer 420 at the time te,j (
Equation 3 indicates that after a frame is (instantaneously, in the ideal case) placed into the encoder buffer, it will take c/f seconds before it is (instantaneously, in the ideal case) removed from the decoder buffer. In this case, equation 4 applies.
Equations 1-4 relate to carrier-grade video distribution. The techniques of the present disclosure address buffer dynamics in a generalized IP video distribution (which encompasses carrier-grade video distribution). Parameters are determined to avoid underflow and overflow of decoder buffer 350 in the more general case of network 300 (
Now, assume that the end-to-end IP video data transmission has no loss and is in first-in-first-out (FIFO) order in both encoder buffer 430 and decoder buffer 450, and that compressed frames are transmitted as individual impulses. With these two assumptions, an end-to-end k-th distribution path from the encoder to the decoder as shown in
In equations 5 and 6, j is the compressed-frame index, σk≥0 is a constant, and σk/f is a network buffer delay for the k-th distribution path (e.g., video path 340). Without loss of generality, assume σk is an integer for the k-th distribution path. Note that if σk is not an integer, ⌈σk⌉ is used instead.
To obtain correct frame timing for video coding and transmission, Theorem 1 and Theorem 2 are posited, with associated proofs.
Theorem 1:
For the video transmission system described in
PEj+BEj+BN,kj≤∫t
Proof:
(End of proof.)
Theorem 2:
For the video transmission system described in
∫t
Proof:
(End of proof.)
Keeping Theorem 1 and Theorem 2 in mind, note that there are c+σk compressed frames in encoder buffer 430, aggregate network buffer 440, and decoder buffer 450 at the time te,j (i.e., BEj+BN,kj+BD,kj together hold the data of c+σk consecutive compressed frames). Thus, equation 10 applies.
As can be seen, decoder buffer 450 size is a network-delay dependent parameter.
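As a non-limiting illustration of the frame accounting behind equation 10, the following Python sketch tracks the combined contents of encoder buffer 430, aggregate network buffer 440, and decoder buffer 450 as a single FIFO. It assumes impulse frames, lossless FIFO delivery, and time measured in frame periods (so te,j = j and the frame rate f drops out), with frame j−(c+σk) decoded at the same instant te,j at which frame j is encoded. The function name combined_occupancy and the frame sizes are hypothetical. Because the total always spans c+σk consecutive frames, its peak cannot decrease as σk grows, which is the sense in which the decoder buffer size depends on network delay.

```python
from collections import deque
from typing import Deque, List, Tuple


def combined_occupancy(frame_sizes: List[int], c: int, sigma_k: int) -> List[int]:
    """Return BE + BN + BD (total buffered data) just after each time t_e,j.

    Hypothetical model: time is counted in frame periods (t_e,j = j). At t_e,j,
    frame j enters the encoder buffer and frame j - (c + sigma_k) leaves the
    decoder buffer, so at most c + sigma_k frames are in flight.
    """
    window = c + sigma_k
    if window < 1:
        raise ValueError("c + sigma_k must be at least 1")
    in_flight: Deque[Tuple[int, int]] = deque()  # FIFO spanning all three buffers
    total = 0
    occupancy = []
    for j, size in enumerate(frame_sizes):
        if in_flight and in_flight[0][0] == j - window:
            total -= in_flight.popleft()[1]  # frame j - (c + sigma_k) is decoded
        in_flight.append((j, size))  # frame j enters the encoder buffer
        total += size
        occupancy.append(total)
    return occupancy


frames = [5, 1, 1, 8, 1]  # made-up compressed frame sizes
# Peak combined occupancy for sigma_k = 0, 1, 2 with c = 1.
print([max(combined_occupancy(frames, c=1, sigma_k=s)) for s in range(3)])  # [8, 9, 10]
```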
In such a fixed-delay network, the following simplified conditions on video buffer dynamics are obtained, where decoder buffer 450 size is independent of the network delay. Without loss of generality, assume that network delay parameter Δk is an integer. (If it is not, ⌈Δk⌉ is used instead.) Corollary 1 is posited, with associated proof.
Corollary 1:
If the network link for the k-th distribution path has a fixed delay
between the output of encoder buffer 430 and the input of decoder buffer 450 at t∈T for all j, then
Proof:
(End of proof.)
Corollary 1 provides a set of conditions on video bit rate, frame rate, encoding/decoding time, video buffers, and network delay for the model of
the system model is modified as shown in
Corollary 2:
If the k-th distribution path between the output of encoder buffer 430 and the input of decoder buffer 450 has a fixed delay
with a maximum jitter
at t∈T for all j then
a) Decoder buffer 450 will not underflow if
b) Decoder buffer 450 will not overflow if
Proof:
(End of proof.)
Corollary 2 provides a set of sufficient conditions on video buffer dynamics for the system model given in
before decoder buffer 450 (size
Video data transmitted from the input of the aggregate network buffer 440 to the output of the dejitter buffer 540 has a fixed delay
Corollary 3 is posited, with associated proof.
Corollary 3:
For the k-th distribution path with a fixed delay
and a maximum jitter
at t∈T for all j, if a dejitter buffer 540 shown in
Proof:
(End of proof.)
Corollary 3 provides a set of buffer conditions for the system model constructed in
The network jitter can be compensated either in the dejitter buffer 540 as shown in
Next, the buffer, rate and delay constraints of an IP video transmission system are analyzed. The linear-bounded arrival process (LBAP) model is used for this analysis. An LBAP regulated stream constrains the video traffic data produced over any time interval τ by an affine function of this time interval τ. More specifically, if A(τ) is the traffic data transmitted over the time interval τ, the traffic is said to be LBAP if there exists a pair (ρ, b) for ρ≥0 and b≥0, such that equation 21 is satisfied, where ρ represents a long-term average rate of the source and b is a maximum burst the source is allowed to transmit in any time interval of length τ.
\[ A(\tau) \le \rho\tau + b, \quad \forall \tau > 0 \qquad (21) \]
When a maximum rate ρMAX of the source is known, a more useful arrival process model (ρ, b, ρMAX) which relates to the LBAP is:
\[ A(\tau) \le \min(\rho\tau + b,\ \rho_{MAX}\tau), \quad \forall \tau > 0 \]
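For a trace whose frames are modeled as impulses, the smallest token depth b that satisfies equation 21 for a given ρ can be found by checking every interval that starts and ends on an impulse, since those are the binding intervals. The sketch below is a brute-force O(n²) illustration under that impulse assumption; min_token_depth is a hypothetical name, and the dual (ρ, b, ρMAX) model is not covered, because an impulse cannot satisfy the ρMAXτ term as τ approaches 0.

```python
from typing import List, Tuple


def min_token_depth(arrivals: List[Tuple[float, float]], rho: float) -> float:
    """Smallest b with A(tau) <= rho * tau + b for every interval (equation 21).

    `arrivals` is a time-ordered list of (time, size) impulses. For impulse
    traffic it suffices to check intervals [t_i, t_j] with i <= j.
    """
    b = 0.0
    for i, (t_i, _) in enumerate(arrivals):
        burst = 0.0
        for t_j, s_j in arrivals[i:]:
            burst += s_j  # data arriving within [t_i, t_j]
            b = max(b, burst - rho * (t_j - t_i))
    return b


# Made-up trace: (time in frame periods, frame size); rho in size units per period.
print(min_token_depth([(0.0, 30), (1.0, 90), (2.0, 10)], rho=40.0))  # 90.0
```

A stream then conforms to a (ρ, b) description exactly when its allocated b is at least the returned value.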
Operationally, the above two arrival process models can be obtained by using so-called “leaky bucket” models: LBAP maps to a single leaky bucket (ρ, b) model while maximum rate LBAP maps to a dual leaky bucket model (ρ, b, ρMAX). In the leaky bucket technique, a counter, called the token bucket, builds up tokens of fixed size (e.g., 1 byte each), at a constant rate of ρ in a fixed bucket of size b. This size b is often referred to as the token depth. Each time a packet is offered, the value of the counter is compared to a predetermined threshold. If the counter is below the threshold, the counter is incremented and the packet is admitted to the network. Otherwise the packet is dropped. A modified version of the leaky bucket (ρ, b) model is a (ρ, b) regulator in which, rather than dropping packets, the (ρ, b) regulator buffers the packets for later transmission. A distribution path between the encoder buffer (e.g., encoder buffer 310 or 430) and the decoder buffer (e.g., decoder buffer 350 or 450) is next described in terms of (ρ, b) regulators.
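A (ρ, b) regulator that buffers rather than drops packets can be sketched with the common token-bucket formulation: tokens accrue at rate ρ up to depth b, and a packet of size s departs once s tokens are available, consuming them. This is an assumed, simplified model (bucket full at time 0, FIFO service, ρ>0), not necessarily the exact counter-and-threshold procedure described above, and regulate is a hypothetical name.

```python
from typing import List, Tuple

Packet = Tuple[float, float]  # (time, size)


def regulate(packets: List[Packet], rho: float, b: float) -> List[Packet]:
    """Delay (never drop) packets so the output obeys a (rho, b) token bucket.

    Assumptions: the bucket holds b tokens at time 0, packets are given in
    arrival order, and service is FIFO.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    tokens, clock, out = b, 0.0, []
    for arrival, size in packets:
        if size > b:
            raise ValueError("a packet larger than b can never conform")
        t = max(arrival, clock)  # FIFO: not before the previous departure
        tokens = min(b, tokens + rho * (t - clock))  # refill since last update
        if tokens < size:  # hold the packet until enough tokens accrue
            t += (size - tokens) / rho
            tokens = size
        tokens -= size
        clock = t
        out.append((t, size))
    return out


# Three packets offered at once to a regulator with rho = 10 units/s and b = 20.
print(regulate([(0.0, 20), (0.0, 20), (0.0, 5)], rho=10.0, b=20))
# [(0.0, 20), (2.0, 20), (2.5, 5)]
```

The first packet uses the full bucket immediately, and later packets are held just long enough for the bucket to refill, so excess data is queued rather than discarded.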
An arbitrary network of (ρi, bi), i=1, 2, . . . , m regulators can be analyzed by considering an equivalent single (ρ, b) regulator. Specifically, the worst-case network behavior of (ρi, bi), i=1, 2, . . . , m regulators can be modeled by studying the behavior of an equivalent single (ρ, b) regulator. For example, for regulators in a serial transmission path, the rate of the equivalent single (ρ, b) regulator is equal to the lowest of the allocated rates of the (ρi, bi), i=1, 2, . . . , m regulators, and its total latency is equal to the sum of the latencies of those regulators.
as in equation 22.
For the analysis, denote the following frame parameters:
from the encoder buffer 620. Note that this assumption is for deriving a lower bound on the token depth of the (ρ, b) regulator 610. In practice, it is also a requirement for some video applications (e.g., video conferencing). Theorem 3 is posited, with associated proof.
Theorem 3:
The token depth b of the (ρ, b) regulator 610 should satisfy:
Proof:
(End of proof.)
If the rate ρ is allocated to be at least the average rate of the compressed video sequence, as shown in equation 25,
\[ \rho \ge \rho_{avg} = P_E^{AVG} \cdot f \qquad (25) \]
then
\[ b \ge P_E^{MAX} - P_E^{AVG} \qquad (26) \]
and the entire video sequence (e.g., a movie) can be transmitted within the (time) length of the video sequence.
Inequality (24) shows that the required token depth b can be as large as the maximum size of a compressed frame. Inequality (26) implies that the burstiness of a compressed video sequence depends not only on the largest compressed frame size, but also on the average compressed frame size.
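Equations 25 and 26 can be evaluated directly from a compressed frame-size trace. In this sketch the frame sizes (in kbit) and the 30 frames-per-second rate are made-up values, and average_rate_allocation is a hypothetical name; it returns the smallest rate and token depth that those two inequalities allow.

```python
from typing import Sequence, Tuple


def average_rate_allocation(frame_sizes: Sequence[float], f: float) -> Tuple[float, float]:
    """Return (rho_avg, b_min) per equations 25 and 26.

    rho_avg = P_E^AVG * f        (average rate, size units per second)
    b_min   = P_E^MAX - P_E^AVG  (token depth required at that rate)
    """
    p_avg = sum(frame_sizes) / len(frame_sizes)
    p_max = max(frame_sizes)
    return p_avg * f, p_max - p_avg


# One large frame followed by five small ones, sizes in kbit, 30 frames/s.
print(average_rate_allocation([400, 100, 100, 100, 100, 100], f=30))  # (4500.0, 250.0)
```

For this trace, ρavg is 4500 kbit/s and the token depth must be at least 250 kbit.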
Traffic shaping is used at the network boundary to provide an average bandwidth between a sender (e.g., including encoder 305 in
The following analysis investigates whether increased compression ratios and decreased frame sizes (average and maximum) result in a decrease in burstiness. In the following analysis, assume that b always satisfies equation 24 and the encoder output data rate Re(t)≤ReMAX.
The metric PEMAX−PEAVG is used to measure the burstiness of a video stream coded by an encoder (e.g., encoder 305, which may be a coder/decoder (codec)), and this metric is applied to compare the burstiness levels of three generations of standard video codecs: MPEG-2, AVC, and HEVC. Some assumptions about the relative sizes of PEMAX and PEAVG for these three codecs are summarized in Table 1.
In the analysis, MPEG-2 is used as the baseline and compared with AVC and HEVC in terms of burstiness of coded video. For this comparison, it is assumed that, on average,
It was determined through simulation that, even though burstiness may be reduced (e.g., a burstiness metric ratio less than 1) for some combinations of α, β, γ, θ, and μ (e.g., with β=μ=0.5, γ=θ=0.6, and α ranging between 2 and 6), the burstiness is barely decreased, and is even increased (e.g., a burstiness metric ratio larger than 1), for other combinations (e.g., with β=μ=0.5, γ=θ=0.9, and α ranging between 2 and 6). Note that, on average, this latter combination assumes that AVC is 50% more efficient than MPEG-2 and HEVC is 50% more efficient than AVC, while the efficiency improvement for the largest frames is only 10%. Thus, even with compression ratios progressively increased and frame sizes (both average and maximum) reduced from MPEG-2 to AVC to HEVC, it was determined that the worst-case video burstiness is not reduced.
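The burstiness metric PEMAX−PEAVG and the ratio used for the codec comparison can be computed as below. The two traces are invented for illustration and are not the simulation data referred to above: the candidate's average frame is about 32% smaller than the baseline's, but its largest frame is only 10% smaller, which is enough to push the ratio above 1. The function names are hypothetical.

```python
from typing import Sequence


def burstiness(frame_sizes: Sequence[float]) -> float:
    """Burstiness metric P_E^MAX - P_E^AVG of a compressed frame-size trace."""
    return max(frame_sizes) - sum(frame_sizes) / len(frame_sizes)


def burstiness_ratio(candidate: Sequence[float], baseline: Sequence[float]) -> float:
    """Ratio below 1 means the candidate stream is less bursty than the baseline."""
    return burstiness(candidate) / burstiness(baseline)


baseline = [100, 40, 40, 40]  # made-up sizes for an older codec
candidate = [90, 20, 20, 20]  # made-up sizes for a newer codec
print(round(burstiness_ratio(candidate, baseline), 3))  # 1.167
```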
Next, the service rate offered for a compressed video sequence by a network of (ρi, bi), i=1, 2, . . . , m regulators in the system shown in
Lemma 1 is posited, with associated proof.
Lemma 1:
Decoder buffer 630 for the k-th distribution path will not underflow if the rate ρ(k) of its input equivalent (ρ(k),b) regulator 610 satisfies equation 27.
Proof:
(End of proof.)
For better network bandwidth utilization, a smaller rate ρ(k) is desirable. To satisfy equation 27, the allocated rate ρ(k) can be as small as shown in equation 28; however, equation 28 may be impractical to calculate.
It is known from the discussion above that, for the k-th distribution path, there are c+σk compressed frames residing in encoder buffer 430, aggregate network buffer 440 and decoder buffer 450 at the time te,j for all j. In other words, data BEj, BN,kj, and BD,kj contain exactly c+σk consecutive compressed frames for all j. Thus, the encoding time te,j of the j-th frame is the decoding time of the (j−(c+σk))-th frame. Since decoder buffer 450 does not underflow at the decoding time of the (j−(c+σk))-th frame, BD,kj must contain at least one frame, the (j−(c+σk))-th frame. Therefore, BEj+BN,kj must contain at most c+σk−1 frames, and
Therefore, the rate ρ(k) can be allocated for the k-th distribution path to be as shown in equation 29 (the allocated rate ρ(k)=rk(c) is a maximum value of the average rates of a sliding window of c+σk consecutive compressed frames of the video sequence).
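The allocation rk(c) in equation 29, the largest average rate over any window of c+σk consecutive compressed frames with each window delivered over (c+σk)/f seconds, can be computed in one pass with a running window sum. The frame sizes and the values of f, c, and σk are illustrative, and sliding_window_rate is a hypothetical name.

```python
from typing import Sequence


def sliding_window_rate(frame_sizes: Sequence[float], f: float, c: int, sigma_k: int) -> float:
    """r_k(c): maximum average rate over any c + sigma_k consecutive frames.

    A window of w = c + sigma_k frames is delivered over w / f seconds,
    so its average rate is sum(window) * f / w.
    """
    w = c + sigma_k
    if not 1 <= w <= len(frame_sizes):
        raise ValueError("window must cover between 1 and len(frame_sizes) frames")
    window_sum = sum(frame_sizes[:w])
    best = window_sum
    for j in range(w, len(frame_sizes)):
        window_sum += frame_sizes[j] - frame_sizes[j - w]  # slide by one frame
        best = max(best, window_sum)
    return best * f / w


frames = [400, 100, 100, 100, 100, 100]  # made-up sizes in kbit
print(sliding_window_rate(frames, f=30, c=2, sigma_k=0))  # 7500.0
print(sliding_window_rate(frames, f=30, c=2, sigma_k=1))  # 6000.0
```

In this example, a larger σk spreads the large frame over a longer window and lowers the required rate, which reflects the dependence of rk(c) on the network delay parameter.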
Note that rk(c) is not only dependent on the video sequence parameters (e.g., compressed sizes PEi and the frame rate f), but also the network delay parameter σk. Lemma 2 is posited, with associated proof.
Lemma 2:
For the video transmission system model given in
Proof:
(End of proof.)
Theorem 4 follows from Lemma 1 and Lemma 2.
Theorem 4:
For the video transmission system model given in
Note:
The rate ρ(k) in equation 29 differs from ρavg in equation 25. The rate ρavg implies that the entire video can be sent over the (time) length of the video. In contrast, the rate ρ(k) ensures that any c+σk consecutive compressed frames can be transmitted over the c+σk frame time interval (c+σk)/f.
For the system model given in
with a maximum jitter
Theorem 5 is posited, with associated proof.
Theorem 5:
For the video transmission system model given in
Proof:
(End of proof.)
Note that the rate
is now only a function of the video parameters, and is independent of network delay and jitter of the k-th distribution path. However, decoder buffer 450 size is a jitter-dependent parameter for the k-th distribution path.
For the video transmission system model given in
Theorem 6:
For the video transmission system model given in
Proof:
It can be seen that, from the perspective of decoding time and total buffer sizes, Theorem 5 is equivalent to Theorem 6. Also, the system model shown in
given in Theorem 6 also satisfies the rate conditions of Corollary 1. Thus, Corollary 4 is posited.
Corollary 4:
If the network link for the k-th distribution path has a fixed delay between the output of encoder buffer 430 and the input of decoder buffer 450 at t∈T for all frames, decoder buffer 450 with a size B̂DMAX given in equation 4 will neither underflow nor overflow if the rate ρ of its input equivalent (ρ, b) regulator satisfies
Once again, the rate
is only a function of the video parameters, and is independent of network delay and jitter of the k-th distribution path. In this case, the actual decoder buffer 450 size is now independent of network jitter.
In the analysis so far, it is assumed that each compressed frame is a single integral entity (e.g., modeled as an impulse function) when it traverses the IP network. With this assumption there is no ambiguity about frame-related timings. In this section, the assumption is relaxed for practical IP networks, in which each video frame is transmitted as a sequence of packets (e.g., Ethernet frames). Frame-related timings are defined for packetized video transmission, and delay and jitter bounds are established for a class of IP networks.
In practice, before the video stream is transmitted to the IP network, the frames are first packetized on the encoder side. The video packets are then transmitted over the IP network, which in general consists of a series of routers and switches. On the decoder side, the received video packets, which may be out of order after transit, are reordered and depacketized to reassemble the original compressed frames before being input to the decoder buffer. (It is assumed that the packets are transmitted across the IP network error-free.)
A frame PEj is packetized and transmitted to the IP network by packetizer/transmitter 720 as a sequence of Nj packets of sizes Lj,1, Lj,2, . . . , Lj,Nj.
Assume that the packetization of a video frame starts immediately after the frame is provided at input A to packetizer/transmitter 720, and that the transmission of the first packet of a video frame takes place once the packet is available to the transmitter portion of packetizer/transmitter 720. Therefore, the time t′e,j at which the j-th frame is output from encoder buffer 710 coincides with the time Te,j,0 at which the first packet of the frame starts to be transmitted; that is,
\[ t'_{e,j} = \min_{0 \le i \le N_j} T_{e,j,i} = T_{e,j,0} \]
Now assume that when the last packet of frame j is received and made available at input/output D to receiver/depacketizer 750, the reordering and depacketization of the frame are completed immediately. Therefore, the time t′d,j,k at which the j-th frame is provided to input/output E, and thus into decoder buffer 760, coincides with the time at which the last packet of the frame is received from the IP network; that is,
\[ t'_{d,j,k} = \max_{1 \le i \le N_j} T_{d,j,i} \]
Then, the end-to-end delay of the j-th frame for the k-th distribution path across the IP network can be denoted as in equation 32.
\[ D_{j,k}^{e2e} = t'_{d,j,k} - t'_{e,j} \qquad (32) \]
Assume that the maximum latency for any frame to go through packetizer/transmitter 720 is Tp; that is,
$T_p \triangleq \max_j \{ T_{e,j,N_j} - T_{e,j,0} \}$
Then, for any j and any i with 1≤i≤Nj,
$T_{e,j,i} - T_{e,j,0} \le T_p$ (33)
Furthermore, assume that a maximum delay for any video packet to traverse the k-th distribution path of the IP network is Dk; that is,
$D_k \triangleq \max_j \{ \max_{1 \le i \le N_j} ( T_{d,j,i} - T_{e,j,i} ) \}$
Thus, for any j and any i with 1≤i≤Nj,
$T_{d,j,i} \le T_{e,j,i} + D_k$ (34)
Lemma 3 is posited, with associated proof.
Lemma 3:
The maximum end-to-end delay of video frames across a k-th distribution path of the IP network is upper-bounded as:
$D^{e2e}_k \triangleq \max_j \{ D^{e2e}_{j,k} \} \le T_p + D_k$ (35)
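Proof:
For any frame j, t′e,j=Te,j,0, and applying equation 34 and then equation 33 to each packet i gives $t'_{d,j,k} = \max_{1 \le i \le N_j} T_{d,j,i} \le \max_{1 \le i \le N_j} T_{e,j,i} + D_k \le T_{e,j,0} + T_p + D_k$. Hence, by equation 32, $D^{e2e}_{j,k} = t'_{d,j,k} - t'_{e,j} \le T_p + D_k$ for every j, and taking the maximum over j yields equation 35.
(End of proof.)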
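Lemma 3 can be checked numerically on a packet trace. The sketch below uses a hypothetical trace layout (per frame, a list of send times starting with Te,j,0 and a list of receive times Td,j,i, in seconds), estimates Tp and Dk from their definitions, and confirms that no frame's end-to-end delay exceeds Tp+Dk. It assumes that the packets of a frame leave the transmitter in order, so that Te,j,Nj is the last send time; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

# One frame: ([T_e,j,0, T_e,j,1, ..., T_e,j,Nj], [T_d,j,1, ..., T_d,j,Nj]), seconds.
# This layout is a hypothetical choice for illustration.
Frame = Tuple[Sequence[float], Sequence[float]]


def packetizer_latency(frames: List[Frame]) -> float:
    """T_p = max_j (T_e,j,Nj - T_e,j,0); assumes in-order transmission."""
    return max(send[-1] - send[0] for send, _ in frames)


def network_delay(frames: List[Frame]) -> float:
    """D_k = max_j max_i (T_d,j,i - T_e,j,i), pairing packet i's send and receive times."""
    return max(
        max(recv_i - send_i for send_i, recv_i in zip(send[1:], recv))
        for send, recv in frames
    )


def e2e_delays(frames: List[Frame]) -> List[float]:
    """D^e2e_j,k = max_i T_d,j,i - T_e,j,0 (equation 32)."""
    return [max(recv) - send[0] for send, recv in frames]


def lemma3_holds(frames: List[Frame], tol: float = 1e-12) -> bool:
    """Check equation 35: max_j D^e2e_j,k <= T_p + D_k (tol absorbs float rounding)."""
    bound = packetizer_latency(frames) + network_delay(frames)
    return max(e2e_delays(frames)) <= bound + tol
```

Note that Tp and Dk are each maximized independently over all frames and packets, so the bound is generally not tight: the frame with the largest packetization spread need not be the one whose packets see the largest network delay.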
This bound holds even when packet reordering is performed by receiver/de-packetizer 750. For the k-th distribution path, if the input video stream is (ρ, b)-regulated, the maximum delay Dk incurred by a packet traversing an IP network that includes sk routers is upper-bounded by
where Θi,k is the latency of the i-th router along the k-th distribution path and pi,k is the propagation delay between the i-th router and its next neighboring node along the k-th distribution path. Each router has a rate of at least ρ.
If the i-th router (i<sk) can be modeled as a WFQ system, its latency is given by equation 37.
where Lk,max is the maximum packet size of the k-th video stream (which is sent along the k-th distribution path), Lmax,i is the maximum packet size among all streams sent to the i-th router, and ri is the total bandwidth of the output port of the i-th router. For the last router (the sk-th router), if it can be modeled as a WFQ system, its latency is given by equation 38.
If all sk routers in
Lemma 4 and Theorem 7 follow from Lemma 3 and equations 6 and 39.
Lemma 4:
For an IP network with (ρ, b)-regulated video input and sk WFQ routers along the k-th distribution path, the maximum end-to-end delay of video frames across the distribution path is bounded by
Theorem 7:
For an IP network with (ρ, b)-regulated video input and sk WFQ routers along the k-th distribution path, the network buffer delay parameter σk defined in equation 6 can be set as shown in equation 41.
The network buffer delay parameter σk can be split into a fixed delay parameter Δk and a maximum jitter parameter δk, for example, as shown in Corollary 5.
Corollary 5:
For an IP network with (ρ, b)-regulated video input and sk WFQ routers along the k-th distribution path, the network buffer delay parameter σk may be set to σk=Δk+δk, with Δk as shown in equation 42, where Lk,min is the minimum packet size of the video being sent along the k-th distribution path and ⌊⋅⌋ denotes the floor function.
Additionally, a maximum jitter parameter is determined as in equation 43.
The network buffer delay parameter σk in Corollary 5 is greater than or equal to the network buffer delay parameter σk in Theorem 7.
As can be seen from equations 42 and 43, the fixed delay parameter Δk is determined by the overall minimum packet latency over the routers in the k-th distribution path and the total propagation delay, while the maximum jitter parameter δk is determined by the latency from video stream burstiness (over the transmission rate), the latency from packetization and serialization, and the overall packet jitter over routers in the k-th distribution path caused by all streams.
Some comparison examples are provided below, in which Δk and δk are calculated for video transmitted at each of two average rates: ρavg, as defined in equation 25, and ρ*, as set according to Theorem 5. To set the stage for the examples, some parameters are defined next.
Referring again to
The video burst duration can be described by the ratio b/ρ of the maximum burst size b to the average data rate ρ; it represents how long a burst lasts when the video is transmitted across an IP network. From Theorem 3 and equation 26, (PEMAX−PEAVG) approximates the maximum burst size b. For the choice of the average data rate ρ, there are two options: (1) the data rate ρavg defined in equation 25; and (2) the data rate ρ* set according to Theorem 5. The video burst duration is then calculated as (PEMAX−PEAVG)/ρavg=(PEMAX−PEAVG)/(PEAVG×f) for option (1) and as (PEMAX−PEAVG)/ρ* for option (2). Following the same technique used in Table 1 to determine the relative sizes of maximum and average frames for different codecs, the video burst durations for MPEG-2, AVC and HEVC are compared in Table 2, with the assumption that ρ*=ρavg×ω for a factor ω.
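The two burst-duration options can be compared with a few lines of arithmetic. In the minimal Python sketch below (function names are hypothetical), `burst_duration` evaluates the expressions above with ρ*=ω×ρavg. `burst_size` estimates b from a frame-size trace under an assumption not stated in this section: a slot-based leaky bucket that drains ρ/f bits per frame interval. Under that assumed model, a single peak frame among frames of size ρ/f yields exactly PEMAX−PEAVG, consistent with the approximation above. The sketch does not attempt to reproduce Table 2, whose values also depend on the codec parameters (β, μ, γ, θ).

```python
from typing import Sequence


def burst_size(frame_bits: Sequence[float], rate_bps: float, fps: float) -> float:
    """Largest backlog (bits) when frames enter a buffer drained at rate_bps.

    Assumed slot model: each 1/fps-second frame interval removes rate_bps/fps bits.
    """
    drain_per_frame = rate_bps / fps
    backlog = worst = 0.0
    for bits in frame_bits:
        # Lindley recursion: backlog left at the end of this frame interval
        backlog = max(0.0, backlog + bits - drain_per_frame)
        worst = max(worst, backlog)
    return worst


def burst_duration(pe_max: float, pe_avg: float, fps: float, omega: float = 1.0) -> float:
    """(PEMAX - PEAVG) / rho in seconds, with rho = omega * PEAVG * fps.

    omega = 1 gives option (1) (rho_avg); omega > 1 gives option (2) (rho* = omega * rho_avg).
    """
    return (pe_max - pe_avg) / (omega * pe_avg * fps)
```

Because ω > 1 raises the drain rate, option (2) always shortens the burst duration by the factor 1/ω for the same burst size, which is the direction of the improvement described for Table 2.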
For β=μ=0.5, γ=θ=0.9, and ω=1.2, the burst latency is reduced for option (2) (e.g., for a ratio of maximum frame size to average frame size of α=6, the burst latency is reduced by approximately 100 milliseconds (ms)). Thus, the sliding average window of the present disclosure provides improved QoS.
Router queuing delay is represented by
and depends on an average data rate of the video stream (ρk), a number of routers (i.e., hops) along the k-th distribution path (sk), a maximum packet size of the k-th stream (Lk,max), a maximum packet size among all streams (Lmax,i) and a total bandwidth of the output port (ri) for the i-th router. A typical hop count for an Internet connection within U.S. domains is fourteen. The maximum Ethernet packet size is 1518 bytes; so both Lk,max and Lmax,i are set to 1518 bytes. Then, with the assumptions that ρk=20 Mbps (e.g., for 4K HEVC video) and that ri=100 Mbps, the router queuing delay is
Total propagation delay depends on the distances and media types of the links connecting all routers along the distribution path. For terrestrial fiber links totaling 4800 kilometers (km) (roughly coast-to-coast across the continental U.S.), the total propagation delay is 4800/(300×0.7)≈23 ms, assuming a light speed of 300 km/ms and a velocity factor of 0.7 for optical fiber. For MEO and GEO satellite links with path lengths of 18000 km and 74000 km, where the signal propagates through free space at essentially the speed of light (velocity factor ≈1), the corresponding delays are 18000/300=60 ms and 74000/300≈247 ms, respectively.
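The propagation-delay arithmetic above can be reproduced with a short sketch. It assumes, as in the text, a light speed of 300 km/ms, and treats a path as a list of (distance, velocity factor) links; the function names are illustrative.

```python
from typing import Iterable, Tuple

LIGHT_SPEED_KM_PER_MS = 300.0  # speed of light in vacuum, as used in the text


def propagation_delay_ms(distance_km: float, velocity_factor: float = 1.0) -> float:
    """One-way propagation delay of one link: distance / (c * velocity factor)."""
    return distance_km / (LIGHT_SPEED_KM_PER_MS * velocity_factor)


def total_propagation_ms(links: Iterable[Tuple[float, float]]) -> float:
    """Sum over (distance_km, velocity_factor) links along a distribution path."""
    return sum(propagation_delay_ms(d, vf) for d, vf in links)
```

For example, `propagation_delay_ms(4800, 0.7)` gives about 22.9 ms (fiber), while `propagation_delay_ms(18000)` and `propagation_delay_ms(74000)` give 60 ms and about 247 ms for the satellite paths, where the velocity factor is taken as 1.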
Table 3 summarizes some examples for Δk and δk in various networks.
In Table 3, it is assumed that:
In the above examples, the dominant component of the fixed delay Δk is the total propagation delay, and the dominant components of the maximum jitter δk are the packetization and serialization latency and the burst duration. In comparison, the contribution of router queuing to the fixed delay and the maximum jitter is very small.
So far, several models of video transmission systems have been analyzed, and theorems and corollaries have been proven for these models regarding the required conditions for video network transmission rates, buffer dynamics, and frame coding times. To ensure that the encoder and the decoder operate properly in these models, there should also be an appropriate clock to drive the system timing. Next, system timing is discussed, including its impact on video transmission over the network and the derivation of frame timing parameters such as td,j,k. There are two primary approaches for providing the system timing: the encoder clock and the global clock.
The encoder clock is a time source in the encoder which serves as a master timing source for the encoder and is also used for generating slave timing sources for the network and the decoder(s). For example, in MPEG-2 systems, the 27 MHz system time clock (STC) drives the encoding and is also used as a “master clock” for the entire video transmission system. At the encoder end, the decode time stamp (DTS) and the presentation time stamp (PTS) are created from the STC and carried together with the video packets. The DTS tells the decoder when to decode the frame while the PTS tells the decoder when to display the frame.
In addition to the time stamps that indicate when decoding and presentation should occur, samples of the STC are also embedded in the stream to allow a time reference to be recreated at the decoder. The program clock reference (PCR) in an MPEG-2 transport stream (TS) carries this 27 MHz clock recovery information and serves as the clock recovery mechanism for MPEG programs. In MPEG-2 TS, when a video is encoded, a 27 MHz STC drives the encoding. When the program is decoded (or re-multiplexed), the decoding is driven by a clock that is locked to the encoder's STC.
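The timestamp arithmetic behind this scheme can be sketched in Python. Per ISO/IEC 13818-1, the PCR is carried as a 33-bit base counted in 90 kHz units (27 MHz/300) plus a 9-bit extension counted in 27 MHz units (values 0 to 299), and PTS/DTS values are 33-bit counts of the same 90 kHz clock. The helper names below are hypothetical, and 33-bit wraparound is handled only in the simplest way.

```python
SYSTEM_CLOCK_HZ = 27_000_000            # MPEG-2 system time clock (STC)
PTS_CLOCK_HZ = SYSTEM_CLOCK_HZ // 300   # 90 kHz units of PCR_base, PTS and DTS
WRAP_33 = 1 << 33                       # PCR_base, PTS and DTS are 33-bit counters


def pcr_to_ticks(pcr_base: int, pcr_ext: int) -> int:
    """Combine PCR_base (90 kHz) and PCR_ext (27 MHz) into 27 MHz ticks."""
    if not (0 <= pcr_base < WRAP_33 and 0 <= pcr_ext < 300):
        raise ValueError("PCR field out of range")
    return pcr_base * 300 + pcr_ext


def ts_to_seconds(ts_90khz: int) -> float:
    """Convert a 33-bit PTS or DTS value to seconds."""
    return (ts_90khz % WRAP_33) / PTS_CLOCK_HZ


def is_due(dts_90khz: int, stc_ticks: int) -> bool:
    """True once the recovered 27 MHz STC has reached the DTS (no wraparound handling)."""
    return stc_ticks // 300 >= dts_90khz
```

Dividing the 27 MHz STC by 300 puts it on the same 90 kHz time base as the DTS and PTS, which is how a decoder can compare its locked clock against the time stamps carried with the video packets.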
The decoder uses the PCR to regenerate a local 27 MHz clock. As mentioned above, when a compressed video is inserted into the MPEG-2 TS, a 27 MHz PCR timestamp is embedded. At the decoder end, a voltage-controlled crystal oscillator (VCXO) generates a 27 MHz clock. When a PCR is received, it is compared to a local counter driven by the VCXO, and the difference is used to correct the frequency of the VCXO so that the 27 MHz clock is locked to the PCR. Decoding and presentation then occur at the DTS and PTS times, respectively.
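The PCR comparison loop described above behaves like a phase-locked loop. The sketch below simulates it numerically, assuming a proportional-integral (PI) correction with hypothetical dimensionless gains `kp` and `ki`, a fixed PCR arrival interval, and no network jitter; a real decoder steers a hardware VCXO, so this is only an illustration of the control idea, not a decoder implementation.

```python
def recover_clock(encoder_hz: float, nominal_hz: float = 27e6,
                  pcr_interval_s: float = 0.04, steps: int = 300,
                  kp: float = 0.5, ki: float = 0.1) -> float:
    """Simulate locking a local 27 MHz counter to received PCR values.

    Returns the local oscillator frequency after `steps` PCR updates.
    kp and ki are hypothetical per-update gains (the loop is stable for 0.5 / 0.1).
    """
    local_hz = nominal_hz   # free-running frequency of the simulated VCXO
    local_count = 0.0       # local counter driven by the simulated VCXO
    integral = 0.0
    for n in range(1, steps + 1):
        local_count += local_hz * pcr_interval_s          # local ticks this interval
        pcr = encoder_hz * n * pcr_interval_s             # PCR stamped from encoder STC
        error = pcr - local_count                         # phase error in 27 MHz ticks
        integral += error
        # PI frequency correction, converted from ticks per update to ticks per second
        local_hz = nominal_hz + (kp * error + ki * integral) / pcr_interval_s
    return local_hz
```

The integral term is what lets the loop settle at the encoder's exact frequency even when the encoder STC is offset from the nominal 27 MHz (for example by 30 ppm); a proportional-only loop would leave a steady phase error.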
In this approach, all time stamps (including the clock reference) are carried with the video packets and transmitted from the encoder end to the decoder end. Thus, the exact network delay need not be known to generate these time stamps, since the decoder clock is locked to the encoder clock and the actual decoding time already accounts for the network delay of the DTS packets (e.g.,
for the system given in
The STC approach requires that the network have a constant delay at the time stamp extraction point. Therefore, this approach is clearly suitable for the video transmission systems given by
The global clock is a global time source (e.g., synchronized “wall clock”) for the encoder and the decoder. For example, both the encoder and the decoder can use a precise global clock, such as a GPS clock, to generate, compare and calculate all encoding, decoding, and presentation timing parameters, and to drive the timing circuits for the encoding and decoding systems. In this approach, the DTS and PTS are also carried with the video packets and transmitted from the encoder to the decoder. For example, td,j,k given in equation 44 is also applicable for the system given in
Next is a discussion of three video service types: unicast, broadcast, and multicast.
Video unicast is a network communication in which a video is sent from one sender to one receiver, such as the sending of a video packet from a single source to a specified destination. Today, unicast transmission is still the predominant form of video transmission over the Internet and on local area networks (LANs). Examples include YouTube video transmission and Netflix video service. All IP networks support the unicast transfer mode, and most users are familiar with the standard unicast applications (e.g., HTTP, SMTP, FTP and Telnet), which employ the TCP transport protocol. All systems and theorems discussed above can be used in video unicast applications.
Video broadcast is a network communication in which a video is sent from one point to all other service points. In this case there is just one source (e.g., a server), but the video is sent to all connected receivers for the service. Video broadcast examples include cable and satellite digital pay-TV broadcasting services. Today, these services are still the predominant forms of high-quality, carrier-grade video service to hundreds of millions of homes. Broadcast transmission is supported on most IP networks. Network layer protocols (such as IPv4) support a form of broadcast that allows the same packet to be sent to every system in a logical network (in IPv4, the broadcast address consists of the IP network ID followed by an all-1s host number).
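Python's standard `ipaddress` module can illustrate the IPv4 broadcast address described above: the network ID with every host bit set to 1. A minimal sketch (the function name is illustrative):

```python
import ipaddress


def directed_broadcast(network: str) -> str:
    """IPv4 broadcast address of a network: the network ID with all host bits set to 1."""
    return str(ipaddress.IPv4Network(network).broadcast_address)
```

For example, `directed_broadcast("192.168.10.0/24")` returns `"192.168.10.255"`.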
In the traditional cable and satellite video broadcast services, the transmission propagation delay differences among all receivers are usually negligible. Thus, considering the system model given in
When video programs are streaming over an IP network, the transmission delays are different for each receiver due to network delay differences at various nodes. However, it can be seen from the example in Table 3 that, for the system models given in
If the service aims to achieve the same decoding time, a DTS offset would need to be added at each distribution path. In the systems given by
is the same for receivers on all distribution paths. However, the decoder buffer fullness for the receiver on each distribution path may be different at td,j.
Video multicast is a network communication where a video is sent from one or more points to a different set of points. In this case there may be one or more servers, and the information is distributed to a set of receivers. The discussions herein consider a single video server. However, all results can be easily extended to more video servers.
One application example that may use multicast is a video server sending out IP-networked TV channels. Simultaneous delivery of high-quality video and carrier-grade service to each of a large number of delivery platforms may exhaust the capability of even a high-bandwidth network with a powerful video server. This poses a scalability issue for applications that need sustained high bandwidth. One way to significantly ease scaling to larger groups of clients is to employ multicast networking. Multicasting is the networking technique of delivering the same packet simultaneously to a group of clients. IP multicast provides dynamic many-to-many connectivity between a set of senders/servers (at least one) and a group of receivers. The format of an IP multicast packet is identical to that of a unicast packet and is distinguished only by the use of a special class of destination address (e.g., a class D IPv4 address), which denotes a specific multicast group. Since TCP supports only the unicast mode, multicast applications use the UDP transport protocol.
Unlike IP broadcast transmission (which is used on some LANs), multicast video clients receive a stream of video packets only if they have previously elected to do so (by joining the specific multicast group address). Group membership is dynamic and controlled by the receivers (in turn informed by the local client applications). The routers in a multicast network learn which sub-networks have active clients for each multicast group and attempt to minimize the transmission of packets across parts of the network for which there are no active clients. Due to the dynamic management of the multicast transmission, the DTS offset solution for video broadcast in the earlier discussion may not be applicable here, and the decoding time td,j,k will be different for each distribution path k.
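The class D addressing and group-join step described above can be sketched with Python's standard `ipaddress`, `socket`, and `struct` modules. This is a minimal illustration, not part of the disclosure: the function names are hypothetical, and the join uses the common `IP_ADD_MEMBERSHIP` socket option with an `ip_mreq` structure (group address followed by local interface address), after which the host reports its membership to the local router via IGMP.

```python
import ipaddress
import socket
import struct

CLASS_D = ipaddress.IPv4Network("224.0.0.0/4")  # IPv4 multicast (class D) range


def is_multicast_group(addr: str) -> bool:
    """True if addr is a class D (multicast group) IPv4 address."""
    return ipaddress.IPv4Address(addr) in CLASS_D


def join_group(sock: socket.socket, group: str, iface: str = "0.0.0.0") -> None:
    """Ask the OS to deliver packets sent to `group` on this UDP socket."""
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not a class D address")
    # struct ip_mreq: 4-byte group address followed by 4-byte local interface address
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
```

A receiver would typically create a UDP socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, bind it to the stream's port (the port number depends on the service), and then call `join_group(sock, "239.1.1.1")` for a hypothetical group address.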
The video multicast mode is useful if a group of clients requires the same video at the same time, or when the clients are able to receive and store (cache) common video until it is requested (e.g., DVR). Where there is a common request for the same video by a group of clients, multicast transmission may provide significant bandwidth savings (up to (n−1)/n of the bandwidth compared to n separate unicast clients).
In addition to the above modes, the techniques of the present disclosure provide for improvements in adaptive bit rate (ABR) streaming, which is used for streaming multimedia over a communication network such as the Internet. Whereas earlier video streaming technologies used streaming protocols such as RTP with RTSP, ABR streaming technologies are generally based on HTTP and are designed to work efficiently over large distributed HTTP networks such as the Internet. ABR streaming works by detecting a user's bandwidth and decoder capacity in real time and adjusting the quality of the video stream accordingly. ABR uses an encoder that can encode a single source video at multiple bit rates, and the player client switches between the different encodings depending on available resources. More specifically, the source content is encoded at multiple bit rates, each bit-rate stream is segmented into small multi-second parts, and a manifest file informs the streaming client of the available streams and their bit rates. When starting, the client requests segments from the lowest bit-rate stream. If the client finds that the download speed is greater than the bit rate of the downloaded segment, the client requests segments of the next higher bit rate. If the client finds that the download speed for a segment is lower than the bit rate of that segment, indicating that the network throughput has deteriorated, the client requests a lower bit-rate segment. The segment duration varies by implementation; a typical segment duration is two seconds.
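The client-side switching rule described above can be written as a small state machine. The Python sketch below implements only that rule (start at the lowest bit rate, step up one rung when the measured download speed exceeds the current segment's bit rate, step down one rung when it falls below); the function names are hypothetical, and practical players typically add buffer-level heuristics and hysteresis that are not shown here.

```python
from typing import List, Sequence


def next_rendition(bitrates_bps: Sequence[float], current: int, download_bps: float) -> int:
    """Index of the bit-rate stream for the next segment (bitrates sorted ascending)."""
    if download_bps > bitrates_bps[current] and current + 1 < len(bitrates_bps):
        return current + 1   # throughput exceeds this rung: move up one rung
    if download_bps < bitrates_bps[current] and current > 0:
        return current - 1   # throughput deteriorated: move down one rung
    return current


def simulate(bitrates_bps: Sequence[float], speeds_bps: Sequence[float]) -> List[int]:
    """Stream index used for each segment, given the speed measured while fetching it."""
    idx = 0                  # the client starts with the lowest bit-rate stream
    chosen = []
    for speed in speeds_bps:
        chosen.append(idx)
        idx = next_rendition(bitrates_bps, idx, speed)
    return chosen
```

For example, with streams at 1, 3 and 6 Mbps and measured speeds of 5, 5, 5 and 2 Mbps, the client fetches segments from streams 0, 1, 2 and then 1.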
Assume that the transmitted video includes segments from all p bit-rate streams and the segments from the i-th bit-rate video stream are (ρi, bi)-regulated for i=1, 2, . . . p. Corollary 5 then extends to Corollary 6.
Corollary 6:
For an IP network with the (ρi, bi)-regulated video input and sk WFQ routers along the k-th distribution path, the network buffer delay parameter σk may be set to σk=Δk+δk,i with
where Lk,max,i is the maximum packet size of the i-th bit-rate video being sent along the k-th distribution path.
Therefore, Theorem 8 applies to ABR systems.
Theorem 8:
For a p bit-rate ABR system, if the video segments from the i-th bit-rate video stream are (ρi,bi)-regulated for i=1, 2, . . . , p, and the network buffer delay parameter σk for the k-th distribution path is given by Corollary 6, then the decoding time can be set to
Thus, described in this disclosure is a technique for providing compressed video buffering to support large-scale carrier-grade IP video streaming services, addressing video traffic regulation for end-to-end IP network QoS. General necessary and sufficient conditions were derived for the decoder buffer to neither overflow nor underflow when a video stream traverses any end-to-end IP distribution path. These results were then used to develop more specific sufficient conditions for the decoder buffer to neither overflow nor underflow with respect to distribution paths of different latency characteristics. A metric to measure the burstiness of video streams was developed and then employed to compare the burstiness of video streams coded by MPEG-2, AVC and HEVC. As a step towards applying the techniques to real-world IP networks, a class of routers that can be modelled as WFQ (Weighted Fair Queuing) systems was analyzed for queuing latency, and the upper bounds of video-frame delay and jitter across a network path of such routers were derived. Finally, the video system timing approaches (encoder clock and global clock) and video system types (unicast, broadcast and multicast) were discussed with respect to the developed technique of compressed video buffering.
While the disclosure has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes can be made and equivalents substituted without departing from the true spirit and scope of the disclosure as defined by the appended claims. In addition, many modifications can be made to adapt a particular situation, material, composition of matter, method, operation or operations, to the objective, spirit and scope of the disclosure. All such modifications are intended to be within the scope of the claims appended hereto. In particular, while certain methods have been described with reference to particular operations performed in a particular order, it will be understood that these operations can be combined, sub-divided, or reordered to form an equivalent method without departing from the teachings of the disclosure. Accordingly, unless specifically indicated herein, the order and grouping of the operations is not a limitation of the disclosure.
This application claims the benefit of and priority to U.S. Provisional Patent Application 62/140,557 filed Mar. 31, 2015 to Chen et al., titled “Compressed Video Buffering,” the contents of which are incorporated herein by reference in their entirety for all purposes.
Number | Date | Country | |
---|---|---|---|
20160295254 A1 | Oct 2016 | US |
Number | Date | Country | |
---|---|---|---|
62140557 | Mar 2015 | US |