1. Technical Field
The present invention relates to mobile communications, and more particularly to streaming scalable video over fading wireless channels.
2. Description of the Related Art
Wireless video streaming is becoming increasingly popular as both wireless networking and video coding technologies have made significant progress. On the wireless side, data transmission rates are steadily growing. The latest WiFi networks can support data rates of more than 100 Mbps, and the next generation (4G) wireless technologies are expected to achieve 1 Gbps for nomadic users and 100 Mbps for mobile users. On the video coding side, the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”) enables more efficient video compression, and the scalable video coding (SVC) extension of the MPEG-4 AVC Standard allows more flexible video coding. Nevertheless, it remains a challenge to adapt wireless networks to satisfy the requirements of video streaming services. In fading wireless networks, most MAC schedulers employ some type of channel-state-aware scheduling algorithm (e.g., Proportional Fair Scheduling) to exploit multi-user diversity. However, these schemes often ignore the real-time quality of service (QoS) requirements of video traffic.
Moreover, since the wireless medium is often shared by many users, it is desirable to adapt to the wireless channel conditions in order to satisfy the stringent bandwidth and delay requirements of video traffic. Streaming video over wireless networks has been studied extensively by many researchers, but much of the previous work has focused on the single-stream scenario where the transmitter of a video streaming service adaptively adjusts its transmission rate, re-transmission, video-truncation, forward error correction (FEC) and/or HARQ policy in order to optimize the received video quality. However, the wireless medium is a shared resource and a wireless base station (or access point) often provides streaming services to multiple wireless clients.
Overview of Scalable Video Coding
SVC can refer both to the general concept of scalable video coding and to the specific extension of the MPEG-4 AVC Standard. An SVC stream has a base layer and one or more enhancement layers. As long as the base layer is received, the receiver can decode the video stream. As more enhancement layers are received, the decoded video quality is improved. The bandwidth scalability of SVC includes temporal scalability, spatial scalability, and quality scalability. Temporal scalability refers to representing the same video in different temporal resolutions or frame rates. Spatial scalability refers to representing the video in different spatial resolutions or sizes. Normally, the picture of a spatial layer is based on the prediction from both lower-temporal layers and lower-spatial layers. Quality (or SNR) scalability refers to representing the same video in different SNRs or quality levels. To be precise, SNR-scalable coding quantizes the DCT-coefficients using different quantization parameters. SNR scalability in SVC includes coarse-grain scalability (CGS) and fine-grain scalability (FGS). CGS is achieved using the concept of spatial scalability but with identical picture size. FGS is achieved by so-called progressive refinement (PR) slices, each of which represents a refinement of the residual signal that corresponds to a bisection of the quantization step size (QP increase of 6).
In the SVC extension of the MPEG-4 AVC Standard, the base layer is an MPEG-4 AVC Standard bitstream for backwards compatibility. The temporal scalable bit-stream is generated using hierarchical prediction structures as illustrated in
These and other drawbacks and disadvantages of the prior art are addressed by the present principles, which are directed to streaming scalable video over fading wireless channels.
According to an aspect of the present principles, there is provided a method. The method includes building a model relating to a relationship between an average data rate and an average peak signal-to-noise ratio for a video sequence encoded using scalable video coding and having a base layer and one or more enhancement layers. The method also includes computing a vector relating to a set of average data rates for a particular boundary point on an achievable rate region for a transmission strategy. The boundary point is a function of a parameter set for a plurality of users. The achievable rate region is based upon the model. The method further includes scheduling the plurality of users to receive the video sequence over a wireless channel, such that at a given transmission time slot a particular one of the plurality of users associated with a maximum value is selected. The maximum value is based on the vector and a channel capacity available to the particular one of the plurality of users.
According to another aspect of the present principles, there is provided an apparatus. The apparatus includes a model builder for building a model relating to a relationship between an average data rate and an average peak signal-to-noise ratio for a video sequence encoded using scalable video coding and having a base layer and one or more enhancement layers. The apparatus also includes a user scheduler for computing a vector relating to a set of average data rates for a particular boundary point on an achievable rate region for a transmission strategy. The boundary point is a function of a parameter set for a plurality of users. The achievable rate region is based upon the model. The user scheduler schedules the plurality of users to receive the video sequence over a wireless channel, such that at a given transmission time slot a particular one of the plurality of users associated with a maximum value is selected. The maximum value is based on the vector and a channel capacity available to the particular one of the plurality of users.
These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
The present principles are directed to a system and method for scalable video streaming over fading wireless channels.
Referring now in detail to the figures in which like numerals represent the same or similar elements and initially to
The server 130 further includes at least one processor (CPU) 202 operatively coupled to other components via the system bus 204. A read only memory (ROM) 206, a random access memory (RAM) 208, a display adapter 210, an I/O adapter 212, a user interface adapter 214, a sound adapter 270, and a network adapter 298, are operatively coupled to the system bus 204. A display device 216 is operatively coupled to system bus 204 by display adapter 210. A disk storage device (e.g., a magnetic or optical disk storage device) 218 is operatively coupled to system bus 204 by I/O adapter 212. A mouse 220 and keyboard 222 are operatively coupled to system bus 204 by user interface adapter 214. The mouse 220 and keyboard 222 are used to input and output information to and from system 200. At least one speaker (hereinafter “speaker”) 285 is operatively coupled to system bus 204 by sound adapter 270. A (digital and/or analog) modem 296 is operatively coupled to system bus 204 by network adapter 298.
It is to be appreciated that while the model builder 291, the user scheduler 292, the frame scheduler 293, and the frame dropper 294, as well as the other elements of the server 130, are shown as separate elements, in other embodiments, one or more of the same may be combined with one or more other elements, while maintaining the spirit of the present principles. Moreover, it is to be appreciated that while the server 130 is shown separate from the base station 140, in other embodiments, one or more elements of the server, including, but not limited to, one or more of the model builder 291, the user scheduler 292, the frame scheduler 293, and the frame dropper 294, may be incorporated into the base station 140, into another server, and so forth. Further, it is to be appreciated that at least some of the elements shown and described with respect to server 130 are optional elements and, thus, may be omitted, depending upon the implementation. For example, in an embodiment, the sound adapter 270 and at least one speaker 285 may be omitted. Given the teachings of the present principles provided herein, one of ordinary skill in this and related arts will contemplate these and other variations to implementations of the present principles, while maintaining the spirit of the present principles.
With respect to user scheduling 510, in an embodiment, we use a maximal scheduling policy and/or a dynamic scheduling policy. In an embodiment, the maximal scheduling policy involves, at each time slot, the user with the maximum μi Ci being selected for scheduling, where Ci is the channel capacity of user i, and μi is the parameters computed per step 230 of
With respect to frame scheduling 520, after the user is selected, we choose the frames of that user in the order of their decoding deadline. In an embodiment, we can select frames based on, e.g., a decoding deadline and/or a playout deadline. In an embodiment, for the frames with the same decoding deadline, we choose the ones with the highest priority which is computed per step 310 of
With respect to the dropping strategy 530, we have two types of dropping. The first is late-dropping. When the playout deadline of a frame is passed, the frame is dropped. The second is early-dropping. Early dropping is based on the achievable rates for all users computed per step 330 of
Steps 510, 520, and 530 are described in further detail hereinafter with respect to various aspects of the present principles.
Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
It is to be appreciated that while one or more embodiments of the present principles are described herein with respect to the SVC extension of the MPEG-4 AVC Standard, the present principles are not solely limited to the same and, thus, given the teachings of the present principles provided herein, may be readily applied by one of ordinary skill in this and related arts to other video coding standards, recommendations, and/or extensions thereof, while maintaining the spirit of the present principles.
Moreover, it is to be appreciated that while one or more embodiments of the present principles are described herein with respect to temporal scalability and MGS scalability, the present principles are not limited to solely the preceding and, thus, given the teachings of the present principles provided herein, may be readily applied by one of ordinary skill in this and related arts to other aspects of SVC, including but not limited to, spatial scalability, SNR scalability, coarse grain scalability, fine grain scalability, and so forth, while maintaining the spirit of the present principles.
As noted above, the present principles are directed to streaming scalable video over fading wireless networks. In one or more embodiments, we focus on the issue of multi-user video streaming over a shared wireless channel. As noted above, an SVC bitstream includes a base layer and one or more enhancement layers. As long as the base layer is received, the receiver can decode the video stream. As more enhancement layers are received, the decoded video quality is improved. With an SVC-encoded stream, the scheduler at the wireless base station can adapt to changing wireless channel conditions by transmitting a subset of enhancement layers.
We identify two issues under such a scenario and develop solutions for them. The first issue is how to share the wireless radio resources in order to optimize overall streaming video quality. A fundamental characteristic of wireless channels is the random time variation of the channel states, which is called fading. Under the fading wireless channel model, channel-state-dependent scheduling is often used to exploit the multi-user diversity. We formulate the problem under this model as maximizing the weighted sum of video quality of all users subject to the achievable long-term (ergodic) rate constraint. To solve the problem, we develop a long-term radio resource allocation algorithm which determines the wireless scheduling policy and the parameters used by the scheduling policy. The scheduling policy has the following property: at each time slot, the user with the largest μiCi is selected for scheduling, where Ci is the channel capacity and μi is a parameter for user i. The set of parameters (μi, i=1, . . . ,n) is computed using a gradient-based approach. We rigorously prove that the scheduling policy together with the computed parameters achieves the global optimum of the weighted sum of video quality function under mild conditions. This is quite significant as the formulated objective function (video quality) is not concave in the space of parameters (μi, i=1, . . . ,n).
Further with regard to the first issue, in another embodiment, we first develop a model to characterize the relationship between the time-averaged data rate and the video quality (measured by PSNR), which turns out to be a concave, piece-wise linear function. We consider a general TDMA scheduling policy: at each time slot, only one user is selected for scheduling. We then formulate the problem as a long-term radio resource allocation problem where the objective is to maximize the weighted sum of the time-averaged video quality of all streaming users. We show that the optimal scheduling policy has the following property called maximal scheduling: at each time slot, the user with the largest μiCi is selected for scheduling, where Ci is the channel capacity and μi is a parameter for user i. We then design an algorithm to find the set of parameters {μi, i=1, . . . , n} (where n is the number of users) to maximize the weighted sum of PSNR of all n streaming users. We then design an online scheduling algorithm that uses the aforementioned maximal scheduling policy and the computed optimal parameters μ=(μ1,μ2, . . . ,μn) to select the user for scheduling in each slot. In addition, we also design a frame/layer dropping strategy based on the achievable rate for each user obtained in the aforementioned long-term resource allocation algorithm.
The second issue is how to design online scheduling algorithms that meet the video traffic QoS requirements in addition to exploiting multi-user diversity. Previous radio scheduling algorithms often only consider the deadline requirement of video traffic and channel conditions but ignore the bursty rate requirement of video traffic. We propose two exemplary scheduling algorithms. The static scheduling algorithm simply uses the results obtained from the aforementioned long-term resource allocation algorithm. The dynamic scheduling algorithm is based on the static scheduling algorithm but further adapts the parameters used by the scheduling policy to meet the instantaneous rate, deadline requirement of video traffic and wireless channel conditions. In this regard, we first compute the instantaneous rate requirement
Thus, we advantageously provide the following: (1) we develop an empirical model to relate the video quality and the average throughput; (2) we design a long-term radio resource allocation algorithm in order to optimize the overall video quality of multiple users; and (3) we devise two on-line scheduling schemes that jointly consider the deadline and the bursty rate requirement of video traffic and the varying wireless channel conditions. Our schemes also determine when video packets need to be dropped and which of them should be dropped.
Rate-Quality Model and Prioritization of Layers
A natural criterion for measuring video quality is distortion, which is defined as the mean square error of the reconstructed pixel values compared to the original uncompressed pixel values. Another important metric for measuring video quality is PSNR (Peak Signal to Noise Ratio). For a single video frame, PSNR is defined as PSNR=10 log10(255^2/D), where D is the distortion. However, for a sequence of video frames, the relationship between average PSNR and the average distortion is not direct, because both averages are taken over multiple frames with respect to their own values. Some researchers have used distortion as a measure of the video quality and developed models that relate data rate and distortion. But PSNR is more widely used as the final performance metric in the literature. Therefore, we choose to model the relationship between the average data rate and the average PSNR directly. In the prior art, different data rates are obtained by encoding using different quantization parameters. To serve the purpose of scalable video streaming, we obtain different data rates by sequentially truncating some layers of an SVC-encoded video stream.
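The per-frame PSNR definition, and the reason the average PSNR cannot be read off the average distortion, can be illustrated with a minimal Python sketch. The function names and the two sample distortion values are hypothetical and chosen only for illustration.

```python
import math

def mse(original, reconstructed):
    """Distortion D: mean square error between two equal-length pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(distortion, peak=255.0):
    """Per-frame PSNR in dB: 10 * log10(peak^2 / D)."""
    if distortion <= 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / distortion)

def average_psnr(distortions):
    """Average of the per-frame PSNR values over a sequence."""
    return sum(psnr(d) for d in distortions) / len(distortions)

# Hypothetical per-frame distortions for a two-frame sequence.
frame_distortions = [1.0, 100.0]
avg_of_psnr = average_psnr(frame_distortions)
psnr_of_avg = psnr(sum(frame_distortions) / len(frame_distortions))
```

Because the logarithm is nonlinear, `avg_of_psnr` and `psnr_of_avg` differ, which is one way to see why the rate-quality model here is fitted to the average PSNR directly.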
There are many possible ways to truncate SVC-encoded streams. For example, each video frame can be truncated to different SNR (quality) layers. However, allowing arbitrary truncation leads to very different video quality for each frame, which is undesirable, and also makes the rate-quality model more difficult to build. Therefore, in an embodiment, we truncate the video stream such that frames in the same temporal layer are truncated to the same SNR layer. In an embodiment, a layer can be specified by its temporal level t and its SNR level q. Thus, in an embodiment, the frames belonging to the same layer (t,q) are either all kept or all removed in a truncated video sequence.
In one embodiment, we consider truncation defined by a tuple (q0,q1, . . . ,qL) with q0≧q1≧ . . . ≧qL≧−1, where L represents the highest temporal layer. In the truncation defined by (q0,q1, . . . ,qL), frames with temporal layer k are truncated to keep up to qk SNR layers. When qk=−1, the whole temporal layer k is dropped, and when qk=0, only the base layer of temporal layer k is kept. We enforce that the maximum SNR layer in a lower temporal layer not be smaller than that in a higher temporal layer (i.e., q_k≧q_{k+1}) because a frame in a higher temporal layer may depend on a frame in a lower temporal layer. When the whole temporal layer k is dropped, we reconstruct it using linear interpolation from lower temporal layers that are not dropped and then we compute the average PSNR of all frames for a fair evaluation.
When we have to drop video packets under bad wireless channel conditions, we drop them in a certain order based on their importance/priority. As a result, we limit the truncation order of different layers by their priorities. As noted above, a layer is specified by (t, q), which represents the qth SNR layer of the tth temporal layer. If q=0, then layer (t, 0) is the base layer of temporal layer t. The priorities of the base layers of each temporal layer are decided by their dependence relationship: the lowest temporal base layer has the highest priority. In an embodiment, the priorities of other SNR layers are determined using the following algorithm. We start from the configuration where each temporal layer has only a base layer. At each iteration, the layer with the highest ratio of PSNR increase to rate increase is assigned the next highest priority. The detailed algorithm is presented as follows.
Pseudo-Code to Determine the Priority Order:
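A minimal Python sketch of the greedy ordering described above is given here under stated assumptions. The callable `evaluate`, which maps a truncation tuple (q0, . . . ,qL) to an (average rate, average PSNR) pair, is hypothetical, as are the toy per-layer costs and gains. The sketch also assumes that a candidate layer is eligible only if adding it keeps the tuple non-increasing.

```python
def priority_order(num_temporal, max_snr, evaluate):
    """Return layers (t, q) in decreasing priority order (greedy sketch)."""
    # Base layers first: the lowest temporal base layer has the highest priority.
    order = [(t, 0) for t in range(num_temporal)]
    q = [0] * num_temporal                      # start: base layers only
    rate, quality = evaluate(tuple(q))
    while True:
        best = None
        for t in range(num_temporal):
            if q[t] >= max_snr:
                continue                        # no SNR layer left to add
            if t > 0 and q[t - 1] <= q[t]:
                continue                        # would break q0 >= q1 >= ... >= qL
            trial = list(q)
            trial[t] += 1
            new_rate, new_quality = evaluate(tuple(trial))
            d_rate = new_rate - rate
            if d_rate <= 0:
                continue                        # guard against a zero rate increase
            ratio = (new_quality - quality) / d_rate
            if best is None or ratio > best[0]:
                best = (ratio, t, new_rate, new_quality)
        if best is None:
            return order
        _, t, rate, quality = best
        q[t] += 1
        order.append((t, q[t]))

# Toy example (hypothetical numbers): rate cost and PSNR gain of each SNR layer.
COST = {(0, 1): 10.0, (1, 1): 10.0, (0, 2): 10.0, (1, 2): 10.0}
GAIN = {(0, 1): 2.0, (1, 1): 1.0, (0, 2): 0.5, (1, 2): 0.8}

def toy_evaluate(q):
    rate, quality = 100.0, 30.0                 # base-layer-only operating point
    for t, qt in enumerate(q):
        for s in range(1, qt + 1):
            rate += COST[(t, s)]
            quality += GAIN[(t, s)]
    return rate, quality

order = priority_order(2, 2, toy_evaluate)
```

In practice `evaluate` would decode the truncated stream and measure the rate and PSNR; the toy table merely stands in for those measurements.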
For each rate obtained by sequentially adding the layers with decreasing priority, we can compute the distortion and PSNR compared with the original video sequence. The average PSNR is a piece-wise linear function of the average rate consisting of two line segments. The left line segment corresponds to the temporal scalability (i.e., some temporal frames are completely dropped and only the base layers are kept). The right line segment corresponds to the SNR scalability (i.e., all base layers are kept and some SNR layers are dropped). We next perform linear regression on each line segment and obtain a model, where the PSNR value Si can be written as follows:
where ri0,Si0 are the rate and the PSNR, respectively, at the intersection of the two line segments, and the last line specifies the maximum encoding rate of each video. That is, in an embodiment, ri0,Si0 are the rate and PSNR of user i when only the base layers of all temporal layers are kept. In one or more embodiments, depending upon the implementation, we may also want to use a minimum coding rate to maintain a minimum video quality. However, we assume that such a requirement can be achieved by admission control. For example, if the resulting data rate of user i is less than its minimum rate requirement, user i will be denied admission.
Note that Li>Ki>0 based on the practical video modeling, and the function Si(r) is therefore a concave function with respect to r.
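As an illustration of the model, the following Python sketch evaluates a two-segment, concave, piece-wise linear PSNR function. The exact parameterization is an assumption: the left segment is taken to have slope Li, the right segment slope Ki, both passing through (ri0, Si0), with the rate capped at the maximum encoding rate. All numeric values are hypothetical.

```python
def psnr_model(rate, r0, s0, slope_left, slope_right, r_max):
    """Assumed two-segment rate-PSNR model for one user.

    rate <= r0         : temporal-scalability segment, slope slope_left (L_i)
    r0 < rate <= r_max : SNR-scalability segment, slope slope_right (K_i)
    rate > r_max       : no gain beyond the maximum encoding rate
    """
    if not slope_left > slope_right > 0:
        raise ValueError("concavity requires L_i > K_i > 0")
    rate = min(rate, r_max)
    if rate <= r0:
        return s0 + slope_left * (rate - r0)
    return s0 + slope_right * (rate - r0)

# Hypothetical parameters: rates in kbps, PSNR in dB.
params = dict(r0=200.0, s0=30.0, slope_left=0.05, slope_right=0.01, r_max=800.0)
low = psnr_model(100.0, **params)     # on the temporal-scalability segment
knee = psnr_model(200.0, **params)    # at the intersection (r0, s0)
high = psnr_model(500.0, **params)    # on the SNR-scalability segment
capped = psnr_model(1000.0, **params) # beyond the maximum encoding rate
```

Because the slope decreases at the knee, any chord of this function lies on or below the curve, which is the concavity property used in the optimization that follows.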
Long-Term Radio Resource Allocation and On-Line Scheduling
When multiple users concurrently request streaming service of different video sequences through a wireless base station, the MAC scheduler in the base station needs to decide: (i) how much bandwidth should be allocated to each user; and (ii) how to achieve the desired bandwidth, in order to optimize the overall video quality. In a non-fading wireless network, the second problem can be simply solved by TDMA. But mobile networks often experience fast fading. In a fading wireless network, channel state dependent scheduling is often used to exploit multi-user diversity. Instead of considering the rate allocation on a short-term per-slot basis, we consider the long-term average resource allocation for each user in order to optimize average video quality under the assumption of fading wireless channels.
Problem Formulation
We assume that the link from the base station to each mobile client is fading with a known distribution. We consider a block fading model where the fading is constant in each time slot and changes independently from one time slot to the other. The complex baseband model of the ith user channel is given by the following:
$$y_i = h_i x + z \qquad (2)$$
where yi is the received signal, hi is the channel gain, x is the transmitted signal, and z is the complex Gaussian noise with zero mean and unit variance (we assume that the transmitted and the received signal are normalized with respect to the noise). We assume a fixed transmission power |x|^2=ρ. We define the channel state h=(h1,h2, . . . ,hn) as the vector of all individual channel gains. We assume that the channel capacity for each user at each time slot t is as follows:
$$C(h_i(t)) = B \log\left(1 + \rho\, |h_i(t)|^2\right) \qquad (3)$$
where B is the channel bandwidth. We assume a discrete time system with a TDMA transmission strategy as follows: at each time slot, the server picks only one user (which may depend on the channel states of all the users) and sends information with the supportable rate of the channel of this scheduled user. It is to be appreciated that with practical codes, a rate of B log(1+ρ|hi(t)|^2/Γ) can usually be achieved, where Γ≧1 represents the gap between the actual coding scheme and the Shannon capacity. All of our analysis and algorithms can be applied to this achievable rate equation.
Our objective is to maximize the weighted sum of average PSNR of all users as follows:
$$\max_{r \in \mathcal{R}} \ \sum_{i=1}^{n} \omega_i S_i(r_i) \qquad (4)$$
where Si is the PSNR of user i as modeled in Equation (1), ωi is the weight of user i, r=(r1, . . . ,rn), and $\mathcal{R}$ is the achievable ergodic rate region.
The major challenge in solving problem (4) is that the achievable rate region cannot be explicitly specified in a fading environment. We next address this challenge by characterizing the achievable rate region and its properties.
Achievable Rate Region —Embodiment 1
Let C(h)=B log(1+ρ|h|^2) denote the instantaneous capacity of the single link with the channel gain h. The achievable rate region for a TDMA strategy is given by
$$\mathcal{R} = \left\{ r : r_i \le E_h\left[ I\{s(h)=i\}\, C(h_i) \right],\ 1 \le i \le n \right\}$$
where s(h) is the index of the scheduled user for channel state h, and I{s(h)=i} is an indicator function which is equal to one if s(h)=i, and is equal to zero otherwise. We note that for the TDMA strategy, the scheduling function s(h) in general can be randomized. We show later that for any continuous distribution for the channel state h, each rate tuple in the boundary B can be achieved with a static scheduling function s(h) that returns only one index for a given channel state. Nonetheless, we consider the general randomized scheduling function to first show the convexity of the achievable rate region and then obtain the boundary points. Suppose that r(1) and r(2) are in the achievable rate region and are obtained by two scheduling functions s(1)(h) and s(2)(h). In other words, as follows:
$$r_i^{(k)} = E_h\left[ I\{s^{(k)}(h)=i\}\, C(h_i) \right], \quad 1 \le i \le n,\ k = 1, 2.$$
Thus, any intermediate point τr(1)+(1−τ)r(2), 0≦τ≦1 also belongs to the region by considering the randomized scheduling function as follows:
Therefore, we have proved that the achievable rate region is convex.
Next we define the boundary of the achievable rate region $\mathcal{R}$ as B={r∈$\mathcal{R}$: there exists no r′∈$\mathcal{R}$ such that r≺r′}, where r≺r′ means that all components of r are less than or equal to the corresponding components of r′ and at least one component of r is strictly less than the corresponding one in r′. In other words, the boundary B is the set of the Pareto-optimal achievable rate tuples. Due to the non-decreasing property of the PSNR function Si(r), there must exist an optimal solution to problem (4) in the boundary B of the achievable rate region. Therefore, to solve the PSNR optimization problem (4), it is sufficient to look for a solution on the boundary B.
The boundary surface B can be obtained by solving the following optimization problem:
for all μ=(μ1,μ2, . . . ,μn) that is a unit norm vector with positive real elements. To solve the above problem, for each channel state h we have the following:
Therefore, the solution for the scheduling function is given by the following:
$$s(h) = i \quad \text{if } h \in \left\{ h : \mu_i C(h_i) > \mu_k C(h_k),\ \forall k \ne i \right\} \qquad (6)$$
In the solution given by Equation (6) we have ignored the set of channel states h for which μiC(hi)=μkC(hk), which in fact has zero probability if the distribution of h is continuous. We call the scheduling policy defined by s(h) in Equation (6) the maximal scheduling policy, because this scheduler attains only rate tuples on the boundary, which are the Pareto-optimal rate tuples.
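The per-slot decision of the maximal scheduling policy can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, the logarithm base (base 2, giving bits) is an assumption since the text leaves it unspecified, and ties are broken in favor of the lowest index, which is immaterial when the channel distribution is continuous.

```python
import math

def capacity(gain_sq, rho=1.0, bandwidth=1.0, gap=1.0):
    """B * log2(1 + rho * |h|^2 / gap); gap = 1 gives the Shannon capacity."""
    return bandwidth * math.log2(1.0 + rho * gain_sq / gap)

def maximal_schedule(mu, gains_sq, rho=1.0, bandwidth=1.0):
    """Return the index i maximizing mu_i * C(h_i) for the current slot."""
    scores = [m * capacity(g, rho, bandwidth) for m, g in zip(mu, gains_sq)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two users with squared channel gains |h|^2 = 1 and 3 (capacities 1 and 2).
equal_weights = maximal_schedule([1.0, 1.0], [1.0, 3.0])   # user 1 has the better channel
skewed_weights = maximal_schedule([3.0, 1.0], [1.0, 3.0])  # weight 3 favors user 0
```

The second call shows how the parameter vector μ shifts the long-term rate split between users even when the channel states are unchanged.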
Solution to Problem (4)
We can now view a boundary point r=(r1, . . . ,rn) as a function of the parameter set μ=(μ1, . . . ,μn), where the average rate ri of user i can be written as follows:
$$r_i = E\left[ C(h_i)\, I\big( \mu_i C(h_i) > \mu_j C(h_j) \text{ for all } j \ne i \big) \right] \qquad (7)$$
Let γi=ρ|hi|^2 denote the SINR of the user i with PDF ƒγi(γ) and CDF Fγi(γ). Let R(γ)=B log(1+γ). Thus, the rate ri can be computed as follows:
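$$r_i(\mu) = \int_0^{\infty} R(x) \prod_{j \ne i} F_{\gamma_j}\!\left( R^{-1}\!\left( \mu_i R(x) / \mu_j \right) \right) f_{\gamma_i}(x)\, dx \qquad (8)$$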
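The average rates can also be estimated by simulation, which gives a simple cross-check on the integral expression. The Python sketch below is a Monte Carlo estimate under assumptions that are not part of the text: independent Rayleigh fading for all users, so that |hi|^2 is exponentially distributed with unit mean, a base-2 logarithm, and a hypothetical function name and default parameters.

```python
import math
import random

def estimate_rates(mu, rho=1.0, bandwidth=1.0, slots=20000, seed=0):
    """Monte Carlo estimate of r_i = E[C(h_i) * I(user i is selected)]."""
    rng = random.Random(seed)
    n = len(mu)
    totals = [0.0] * n
    for _ in range(slots):
        # Assumed Rayleigh fading: |h_i|^2 ~ Exp(1), independent across users and slots.
        caps = [bandwidth * math.log2(1.0 + rho * rng.expovariate(1.0))
                for _ in range(n)]
        # Maximal scheduling: serve the user with the largest mu_i * C(h_i).
        winner = max(range(n), key=lambda i: mu[i] * caps[i])
        totals[winner] += caps[winner]
    return [total / slots for total in totals]

symmetric = estimate_rates([1.0, 1.0])   # statistically identical users
skewed = estimate_rates([1.0, 0.01])     # user 0 is strongly favored
```

With equal parameters the two estimated rates are close to each other, and skewing μ toward one user moves the operating point along the boundary of the rate region.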
Now our objective is simply to address the following:
Note that Si(ri) is a non-decreasing concave function of ri.
A general approach to solve an optimization problem is the gradient-based method: we start with some point μ0, and in each step k, we find the gradient and update the vector μ(k) along a direction d that is related to the gradient:
$$\mu^{(k+1)} = \mu^{(k)} + \alpha^{(k)} d^{(k)} \qquad (10)$$
where d(k)=D(k)∇Y(μ(k)), where D(k) is a positive definite symmetric matrix and ∇Y(μ(k)) is the gradient of Y with respect to μ(k).
However, the challenge with problem (9) is (i) the function Y is generally not concave (or convex) with respect to μ, and (ii) the function Y is not differentiable at the points when some ri=ri0 or ri=rimax. When a function is not concave or convex, the solution generated by the gradient-based approach is often only a local maximum (or minimum) but not a global maximum. Hereinafter, we develop an algorithm and prove that the limit point of the algorithm is the global maximum of problem (9) under mild conditions.
To resolve the second issue, we note that although Si is not differentiable at ri=ri0 or ri=rimax, it has one-sided derivatives. For functions with one-sided derivatives, the following lemma is a simple generalization of the first-order necessary optimality conditions.
Lemma 2 If μ* is a local maximum of Y, then the following applies:
$$Y_i^{+\prime}(\mu^*) \le 0 \qquad (11)$$
$$Y_i^{-\prime}(\mu^*) \ge 0 \qquad (12)$$
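for all 1≦i≦n, where
$$Y_i^{+\prime}(\mu) = \lim_{\delta \downarrow 0} \frac{Y(\mu + \delta e_i) - Y(\mu)}{\delta}$$
is the right-sided partial derivative and
$$Y_i^{-\prime}(\mu) = \lim_{\delta \downarrow 0} \frac{Y(\mu) - Y(\mu - \delta e_i)}{\delta}$$
is the left-sided partial derivative, with e_i denoting the ith unit vector.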
The proof is straightforward: if either condition fails for some i, we can perturb μi by a sufficiently small amount to increase the objective value Y.
Now with the one-sided partial derivative, we can obtain an iterative modified gradient-based solution as follows. First, we compute the modified gradient g(k)=(g1(k),g2(k), . . . ,gn(k)) in each iteration k:
Let i0=arg max(|gi(k)|). The ascent direction is chosen to be d(k)=(0, . . . ,gi
Step-size selection using a modified Armijo rule: in any gradient-based approach, we also need to choose the step size α(k) appropriately in order for the algorithm to converge to a local maximum. The Armijo rule is a simple but effective rule to choose the step size when the gradient exists. For fixed scalars α0,σ,β, the Armijo rule chooses the minimum non-negative integer m such that α(k)=α0β^m and
$$Y\left(\mu^{(k)} + \alpha_0 \beta^m d^{(k)}\right) - Y\left(\mu^{(k)}\right) \ge \sigma \alpha_0 \beta^m\, \nabla Y\left(\mu^{(k)}\right)^T d^{(k)} \qquad (14)$$
where ∇Y(μ(k))T is the transpose of the gradient of Y with respect to μ. In our problem, the gradient ∇Y(μ(k)) may not exist, so we define a modified Armijo rule using d(k) to replace the gradient ∇Y(μ(k))T in Equation (14). The pseudo-code of our algorithm is listed next.
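The step-size selection alone (not the complete algorithm) can be sketched in Python as follows. The sketch reads the modified rule as replacing the gradient term in Equation (14) with the inner product of d(k) with itself; that reading, the function name, the default constants, and the iteration cap `max_m` are assumptions made for illustration.

```python
def modified_armijo_step(objective, mu, direction,
                         alpha0=1.0, sigma=0.1, beta=0.5, max_m=50):
    """Return alpha0 * beta**m for the smallest m meeting the ascent condition.

    Condition (assumed form): Y(mu + a*d) - Y(mu) >= sigma * a * (d . d),
    where d stands in for the gradient in the classical Armijo rule.
    """
    base = objective(mu)
    slope = sum(d * d for d in direction)
    for m in range(max_m + 1):
        alpha = alpha0 * beta ** m
        trial = [x + alpha * d for x, d in zip(mu, direction)]
        if objective(trial) - base >= sigma * alpha * slope:
            return alpha
    return 0.0  # no acceptable step found within max_m halvings

# Toy concave objective with its maximum at mu = 1 (hypothetical example).
def toy_objective(mu):
    return -(mu[0] - 1.0) ** 2

step = modified_armijo_step(toy_objective, [0.0], [2.0])
```

In the toy run the full step overshoots the maximum and is rejected, and the first halving lands on it and is accepted.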
With the choice of the gradient-based approach and the Armijo rule, we can show the convergence of the algorithm, which is summarized next. Note that step 4 of the algorithm employs finite stopping conditions. In the convergence analysis, we always assume the algorithm never stops and study the limit point of μ(k) and Y(k).
Lemma 3 Algorithm A1 converges to a point μ* satisfying the necessary conditions of a local maximum in Equations (11) and (12), assuming that the finite stopping condition is removed.
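By way of illustration only, the step-size selection described above can be sketched as a backtracking search. This is a minimal sketch of one reading of the modified Armijo rule, in which the inner product of the gradient with d(k) in Equation (14) is replaced by the inner product of d(k) with itself; it is not the listing of Algorithm A1. The function name `armijo_step`, the default values of `alpha0`, `sigma`, and `beta`, and the cap `max_m` are assumptions.

```python
def armijo_step(Y, mu, d, alpha0=1.0, sigma=0.1, beta=0.5, max_m=50):
    """Return (alpha, new_mu) for the smallest non-negative integer m such
    that alpha = alpha0 * beta**m gives a sufficient increase of Y along d.

    Assumed acceptance test: Y(mu + alpha*d) - Y(mu) >= sigma * alpha * (d . d).
    """
    base = Y(mu)
    dd = sum(x * x for x in d)
    for m in range(max_m + 1):
        alpha = alpha0 * beta ** m
        trial = [x + alpha * di for x, di in zip(mu, d)]
        if Y(trial) - base >= sigma * alpha * dd:
            return alpha, trial
    # No acceptable step found within max_m halvings: stay at the current point.
    return 0.0, list(mu)


def concave_example(mu):
    """Hypothetical smooth objective used only to exercise the rule."""
    return -(mu[0] - 1.0) ** 2
```

For `concave_example` at μ=(0) with direction d=(2), the full step α=1 overshoots the maximum and is rejected, while α=0.5 lands on it and is accepted.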
Optimality of the Algorithm A1
Theorem 1 The limit point of the algorithm A1 is a global maximum of function Y assuming that the PSNR function Si(ri) is non-decreasing, concave, and continuously differentiable with respect to ri.
Online Scheduling for SVC Video Streaming
An online scheduling algorithm for real-time video applications needs to address three issues:
1. User scheduling: at each time slot, which user should be scheduled?
2. Frame scheduling: after a user is selected, which packets/frames of the selected user should be transmitted?
3. Dropping strategy: when does it need to drop frames and which frames should be dropped?
The resource allocation algorithm presented in the previous section produces two results: (1) the vector μ used for user scheduling; and (2) the achievable average rate ri for each user i. We next describe two online scheduling schemes that exploit these results. In the first scheme, we simply apply the results obtained from the resource allocation algorithm. In the second scheme, we also take into account the bursty, dynamic arrival of video frames and their deadlines.
Static Scheduling Scheme
In this first scheme, the vector μ={μ1,μ2, . . . ,μn} is computed by the long-term resource allocation algorithm of the previous section and is held fixed during the process of streaming (μ may be re-computed when new users join or some users leave). At each time slot, the user with the largest μiCi is chosen for scheduling, where Ci is the current channel capacity of user i.
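By way of illustration only, the per-slot user selection of the static scheme can be sketched as follows. The function name `select_user` and the list-based representation of μ and of the channel capacities are assumptions; ties are broken in favor of the lowest user index, a choice the description does not specify.

```python
def select_user(mu, capacities):
    """Return the index i that maximizes mu[i] * capacities[i].

    mu         -- fixed weights from the long-term resource allocation
    capacities -- current channel capacity C_i of each user in this slot
    """
    if len(mu) != len(capacities):
        raise ValueError("mu and capacities must have the same length")
    # max() returns the first maximal index, so ties go to the lowest index.
    return max(range(len(mu)), key=lambda i: mu[i] * capacities[i])
```

With weights (1.0, 2.0, 0.5) and capacities (3.0, 1.0, 8.0), the products are 3, 2, and 4, so the third user is scheduled in that slot.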
Frame scheduling for the selected user is based on both the deadline and the priority of the packets. We differentiate between two types of deadlines, namely the playout deadline and the decoding deadline. The playout deadline is the time at which a frame needs to be displayed. The decoding deadline is the earliest time at which a frame is needed for decoding itself or other frames. The decoding deadline of a frame can be computed as the minimum playout deadline of all frames that depend on it. Table I shows the playout deadline and the corresponding decoding deadline of a video sequence encoded using SVC with hierarchical B-frames with GOP size 8. Note that the playout deadline of each frame is fixed but the decoding deadline may change. In Table I, if the frame B1 is dropped, then the decoding deadlines of B2, B4, and P8 will all become 2.
We then schedule packets of a given user in the order of their decoding deadline. Those packets with the same decoding deadline are scheduled in the order of their priority, which is also described herein.
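By way of illustration only, the decoding-deadline computation and the resulting packet order can be sketched as follows. The dependency structure `DEPENDS_ON` is an assumed hierarchical-B layout for a GOP of size 8, and the playout deadlines in `PLAYOUT` are assumed to equal the display index; neither is taken from Table I. A frame's own playout deadline is included in the minimum, and a larger priority number is assumed to mean a higher priority.

```python
def ancestors(frame, depends_on):
    """Every frame that `frame` needs, directly or indirectly, for decoding."""
    seen, stack = set(), list(depends_on.get(frame, []))
    while stack:
        ref = stack.pop()
        if ref not in seen:
            seen.add(ref)
            stack.extend(depends_on.get(ref, []))
    return seen


def decoding_deadlines(playout, depends_on, dropped=()):
    """Earliest playout deadline among each frame and the non-dropped
    frames that depend on it."""
    alive = [f for f in playout if f not in dropped]
    deadline = {f: playout[f] for f in alive}
    for f in alive:
        for ref in ancestors(f, depends_on):
            if ref in deadline:
                deadline[ref] = min(deadline[ref], playout[f])
    return deadline


def transmission_order(packets, deadline):
    """Sort (frame, priority) packets by decoding deadline, then priority."""
    return sorted(packets, key=lambda p: (deadline[p[0]], -p[1]))


# Assumed example data (not from Table I).
PLAYOUT = {"I0": 0, "B1": 1, "B2": 2, "B3": 3, "B4": 4,
           "B5": 5, "B6": 6, "B7": 7, "P8": 8}
DEPENDS_ON = {"P8": ["I0"], "B4": ["I0", "P8"], "B2": ["I0", "B4"],
              "B6": ["B4", "P8"], "B1": ["I0", "B2"], "B3": ["B2", "B4"],
              "B5": ["B4", "B6"], "B7": ["B6", "P8"]}
```

Under these assumptions, the decoding deadlines of B2, B4, and P8 move from 1 to 2 once B1 is dropped, which agrees with the example given for Table I.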
As to the dropping strategy, there are two types of dropping. The first is late dropping, which happens when the playout deadline of a packet has passed. If the base layer of a frame is dropped, all dependent frames are dropped too. Note that when all packets of a frame are either successfully transmitted or dropped, the decoding deadlines of the frames that it depends on need to be re-computed.
The second type of dropping is early dropping. With the achievable rate computed as described hereinbefore, we can pre-determine which layers should be dropped based on the rate requirement. We find the minimum priority such that the average data rate of the packets with priority higher than or equal to the minimum priority does not exceed the achievable rate. All packets with priority lower than the minimum priority are dropped at the beginning of the video streaming.
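By way of illustration only, the early-dropping threshold can be sketched as follows. The function name `min_priority_to_keep`, the mapping from priority level to average data rate, and the convention that a larger number means a higher priority are assumptions made for the example.

```python
def min_priority_to_keep(rate_by_priority, achievable_rate):
    """Return the lowest priority p such that the total average rate of all
    packets with priority >= p does not exceed achievable_rate.

    rate_by_priority -- dict mapping priority level to average data rate
    Returns None if even the highest priority level alone exceeds the rate.
    """
    total = 0.0
    threshold = None
    # Admit levels from the highest priority downward until the budget is hit.
    for p in sorted(rate_by_priority, reverse=True):
        if total + rate_by_priority[p] > achievable_rate:
            break
        total += rate_by_priority[p]
        threshold = p
    return threshold
```

For example, with average rates of 200, 150, and 100 for priorities 3, 2, and 1 and an achievable rate of 400, levels 3 and 2 fit (350 in total) and level 1 does not, so all packets of priority 1 would be dropped at the start.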
Dynamic Scheduling Scheme
The dynamic scheduling scheme is built on top of the static scheduling scheme with two additional enhancements. The first enhancement concerns user scheduling. As before, the user with the largest μiCi is selected for scheduling at each time slot. However, the vector μ is periodically updated to reflect both the bursty arrival and the deadlines of video traffic.
Assume that for user i, the size of total packets that need to be transmitted before the deadline Tj is Qj. We define the target rate
where t is the current time.
Now with the target rate
over all possible choices of μ. Clearly, if the optimal value of Equation (16) is larger than or equal to 1, then the target rate
Theorem 2 Solving problem (16) is equivalent to solving the following problem: find μ such that
assuming that the channel distribution function ƒr(γ) is a continuous function of γ for all i.
Proof. First we show that at the optimum solution of Problem (16), the condition (17) is satisfied. If it is not satisfied, then max(ri(μ)/
Therefore, the following applies:
This contradicts the fact that μ maximizes the objective function in (16). Thus, the condition (17) must be satisfied if μ is the optimal solution of (16).
Second, we prove that the value ri/
To solve problem (17), we define g=(g1, . . . ,gn),
In each iteration k, we compute
d(k)=(∇g(μ)∇g(μ)T)−1∇g(μ)(g(μ)−
and update μ(k) as follows:
μ(k+1)=μ(k)−α(k)d(k)
where α(k) is the step size chosen by the Armijo rule. In other words, α(k)=α0β^m, where m is the minimum non-negative integer such that:
$$h(\mu^{(k)}) - h(\mu^{(k)} - \alpha^{(k)} d^{(k)}) \ge \sigma \alpha^{(k)} \nabla h(\mu^{(k)})^T d^{(k)} \qquad (20)$$
The pseudo-code of the algorithm is presented as follows:
Lemma 5 Algorithm A3 converges to a stationary point of the function h(μ) (defined in Equation (18)), assuming σ<1 and that the finite stopping condition in the outer loop is removed.
In general, the stationary point of a function h(μ) is not necessarily a global minimum of the function because of the non-convexity of the function. But, surprisingly, we can also prove that the limit point of the algorithm A3 is a global minimum (i.e., 0) of the function h(μ), which is also the unique solution to problems (16) and (17), even though the function h(μ) may not be convex. The result is stated in the following theorem and the proof is given hereinafter.
Theorem 3 Algorithm A3 converges to the optimum solution to the problems (16) and (17) assuming that the finite stopping condition in the outer-loop is removed.
The second enhancement of this dynamic scheduling scheme is on the frame dropping strategy. Note that the algorithm A3 not only produces the new vector μ that is used for user scheduling, but also the value
If η>1, then it indicates that the target rate vector
When we need to drop some packets (i.e., η<η) or need to put some dropped packets back to the queue (i.e., η>
Having described preferred embodiments of a system and method (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope and spirit of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.
This application claims priority to provisional application Ser. No. 61/102,092 filed on Oct. 2, 2008, and provisional application Ser. No. 61/117,652 filed on Nov. 25, 2008, both of which are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
61102092 | Oct 2008 | US | |
61117652 | Nov 2008 | US |