This disclosure relates generally to the field of telecommunications and in particular to methods for scheduling cellular uplinks.
The 3GPP LTE-A based cellular network [1] together with the IEEE 802.16m based cellular network are the only two cellular networks classified as 4G cellular networks by the International Telecommunication Union. Some key attributes that a 4G uplink must possess are the ability to support a peak spectral efficiency of 15 bps/Hz and a cell average spectral efficiency of 2 bps/Hz, ultra-low latency and bandwidths of up to 100 MHz. To achieve these ambitious specifications, the 3GPP LTE-A uplink is based on a modified form of orthogonal frequency-division multiple access (OFDMA) [1]. In addition, it allows precoded multi-stream (precoded MIMO) transmission from each scheduled user as well as flexible multi-user scheduling. Notice that while OFDMA itself allows for significant spectral efficiency gains via channel dependent frequency domain scheduling, multi-user multi-stream communication promises substantially higher degrees of freedom [2]. Our focus in this paper is on the 3GPP LTE-A uplink (UL) and in particular on MU MIMO scheduling for the LTE-A UL. Almost all of the 4G cellular systems that will be deployed will be based on the 3GPP LTE-A standard [1]. This standard is an enhancement of the basic LTE standard, which is referred to in the industry as Release 8, and deployments conforming to Release 8 are already underway. The scheduling in the LTE-A UL is done in the frequency domain, where in each scheduling interval the scheduler assigns one or more resource blocks (RBs) to each scheduled user. Each RB contains a pre-defined set of consecutive subcarriers and consecutive OFDM symbols and is the minimum allocation unit.
The goal of this work is to design practical uplink MU-MIMO resource allocation algorithms for the LTE-A cellular network, where the term resource refers to RBs as well as precoding matrices. In particular, we consider the design of resource allocation algorithms via weighted sum rate utility maximization that account for finite user queues (buffers) and finite precoding codebooks. In addition, the designed algorithms comply with all the main practical constraints on the assignment of RBs and precoders to the scheduled users. Our main contributions are as follows:
1) We first assume that users can employ ideal Gaussian codes and that the base-station (BS) can employ an optimal receiver. We then enforce user rates to lie in a fundamental achievable rate region of the multiple access channel which is a polymatroid and show that the resulting resource allocation problem is NP-hard. We prove that the resource allocation problem can however be formulated as the maximization of a monotonic sub-modular set function subject to one matroid and multiple knapsack constraints, and can be solved using a recently discovered polynomial time randomized constant-factor approximation algorithm [3]. We also adapt a simpler deterministic greedy algorithm and show that it yields a constant-factor approximation for scenarios of interest.
2) We then consider scenarios where users employ codes constructed over finite alphabets. In this case the mutual information terms needed to specify an achievable rate region do not have closed form expressions. On the other hand the achievable rate region obtained for Gaussian alphabets can be a loose outer bound. Consequently, we obtain a tighter outer bound which is also a polymatroid. As a result all algorithms developed for Gaussian alphabets can be reused after simple modifications. Finally, we demonstrate the superior performance of our proposed algorithms via simulations using a realistic channel model.
An advance is made in the art according to aspects of the present disclosure directed to methods and systems for efficiently scheduling multiple users in 4G and 3GPP LTE cellular networks.
A more complete understanding of the disclosure may be realized by reference to the accompanying drawing in which:
The following merely illustrates the principles of the various embodiments. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the embodiments and are included within their spirit and scope.
Furthermore, all examples and conditional language recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the embodiments and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. Where the particular embodiments are methods and/or algorithms, it is understood that such methods and/or algorithms execute on any of a variety of commercially available processors, computers, and equivalents, whether dedicated or general purpose.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the FIGs., including any functional blocks labeled as “processors” may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the FIGs. are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements which performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Applicants thus regard any means which can provide those functionalities as equivalent as those shown herein.
Unless otherwise explicitly specified herein, the drawings are not drawn to scale.
Multi-User MIMO Scheduling in the LTE-A UL
Consider a single-cell with K users and one BS which is assumed to have Nr≧1 receive antennas. Suppose that user k has Nt≧1 transmit antennas and its power budget is Pk. Let Hk(n) denote the channel seen by the BS from user k on RB n. We let N denote the total number of RBs. In the following all rates are normalized by the number of resource elements in an RB.
We consider the problem of scheduling users in the frequency domain in a given scheduling interval. Let αk, 1≦k≦K denote the weight of the kth user which is an input to the scheduling algorithm and is updated using the output of the scheduling algorithm in every scheduling interval according to the proportional fairness rule [20]. Letting rk denote the rate assigned to the kth user, we consider the following weighted sum rate utility maximization problem,
max Σ_{k=1}^{K} αk rk, (1)
where the maximization is over the assignment of RBs, precoders and powers to the users subject to:
One precoder and one power level per user: Each scheduled user can be assigned one precoding matrix from a finite codebook 𝒲 of such matrices. In addition, each scheduled user can transmit with only one power level (or power spectral density (PSD)) on all its assigned RBs. This PSD is implicitly determined by the number of RBs assigned to that user, i.e., the user divides its total power equally among all its assigned RBs.
At most two chunks per-user: The set of RBs assigned to each scheduled user should form at-most two mutually non-contiguous chunks, where each chunk is a set of contiguous RBs. This constraint is a compromise between the need to provide enough scheduling flexibility and the need to keep the per-user peak-to-average-power ratio (PAPR) under check. Feasible RB allocation and co-scheduling of users in LTE-A multi-user uplink is depicted in
Finite buffers and finite input alphabets: We let Qk denote the queue (buffer) size in bits and let Sk denote the maximum alphabet size of the kth user, respectively. Thus, the rate rk assigned to user k cannot exceed Qk and on any RB user k cannot achieve a rate greater than log(Sk).
Control channel overhead constraints: Recall that every user that is given an UL grant (i.e., is scheduled on at least one RB) must be informed about its transmission rate and the set of RBs on which it must transmit along with the precoder it should employ. This information is sent on the DL control channel of limited capacity which imposes a limit on the number of users that can be scheduled. In particular, the scheduling information of a user is encoded and formatted into a packet and the size of the packet can be selected from a predetermined set of packet sizes. A longer (shorter) packet is used for a cell edge (cell interior) user. Furthermore, in order to minimize the number of blind decoding attempts by the users, each user is assigned a search space in the control channel and it searches for packets only in that space.
Per sub-band interference limit constraints: Inter-cell interference mitigation is performed by imposing interference limit constraints. In particular, on one or more subbands, the cell of interest must ensure that the total interference imposed by its scheduled users on a neighboring base-station is below a specified limit.
We will formulate the optimization problem in (1) as the maximization of a submodular set function subject to one matroid and multiple knapsack (linear packing) constraints.
Towards this end, let e=(u, c, W) denote an element, where 1≦u≦K denotes a user, W∈𝒲 denotes a precoder from the finite codebook 𝒲 and c∈C denotes a valid assignment of RBs chosen from the set C of all possible such valid assignments. In particular, each c is a vector with binary-valued ({0, 1}) elements and we say an RB i belongs to c (i∈c) if c contains a one in its ith position, i.e., c(i)=1. In addition c1 and c2 are said to intersect if there is some RB that belongs to both c1 and c2. Next, we let
ε={e=(u, c, W):1≦u≦K, c∈C, W∈𝒲} denote the ground set of all possible such elements. For any such element we adopt the convention that
e=(u, c, W) ⇒ ce=c; We=W; ue=u; Se=Su; (2)
αe=αu; Qe=Qu; He(n)=Hu(n). (3)
In addition, we let pe denote the power level (PSD) associated with the element e=(u, c, W). This PSD can be computed as
pe=Pu/size(c),
where size(c) denotes the number of ones (number of RBs) in c. Let αe, Qe denote the weight and buffer (queue) size associated with the element e, respectively, and let re denote the rate associated with the element e. We will use the phrase selecting an element e to imply that the user ue is scheduled to transmit on the RBs indicated in ce with PSD pe and precoder We. Thus, the constraints of one precoder and one power level per user along with at most two chunks per-user can be imposed by allowing the scheduler to select any subset of elements U⊂ε such that Σe∈U 1{ue=u}≦1 for each u∈{1, . . . , K}, where 1{.} denotes the indicator function. Accordingly, we define a family of subsets of ε, denoted by I, as
I={U⊂ε: Σe∈U 1{ue=u}≦1, ∀u∈{1, . . . , K}}. (4)
We next consider the decodability constraint after first assuming that each user can employ ideal Gaussian codes (i.e., codes for which the coded modulated symbols can be regarded as i.i.d. Gaussian) and that the base-station (BS) can employ an optimal receiver. Subsequently, we will impose a finite input alphabet constraint. Note that under the assumption of ideal Gaussian codes, the DFT spreading operation performed by each user on its codeword has no effect (the i.i.d. Gaussian distribution is invariant with respect to any unitary linear transformation). Accordingly, we define a set function f:2ε→IR+ as
It can be verified that f(.) defined in (5) is a submodular set function [5,21], i.e.,
f(A∪{e})−f(A)≧f(B∪{e})−f(B), (6)
for all A⊂B⊂ε and e∈ε. Further since it is monotonic (i.e., f(A)≦f(B), ∀A⊂B) and normalized f(Φ)=0, where Φ denotes the empty set, we can assert that f(.) is a rank function. Consequently, for each U⊂ε, the region
P(U,f)={r∈IR+^{|U|}: Σe∈A re≦f(A), ∀A⊂U} (7)
is a polymatroid [5]. Note that for each U⊂ε, P(U,f) is the fundamental achievable rate region of a multiple access channel. In particular, each rate-tuple ru=[re]e∈U∈P(U, f) is achievable [5] in the sense that for any rate assignment arbitrarily close to ru (i.e., r: r≦ru) there exist coding and decoding schemes that can meet any acceptable level of error probability. Thus, we can impose decodability constraints by imposing that the assigned rate-tuple satisfy ru∈P(U, f) for any selected subset U⊂ε.
Next, in order to impose buffer (queue) constraints, we define a hyper rectangle
B(U)={r∈IR+^{|U|}: 0≦re≦Qe, ∀e∈U}, ∀U⊂ε. (8)
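The decodability and buffer constraints can be checked numerically on a toy case. The following is a minimal sketch, assuming a single RB, a single receive antenna and scalar effective gains, so that the rank function reduces to f(A)=log2(1+Σe∈A ge). The element names, the gains, the queue sizes and the use of base-2 logarithms are illustrative assumptions and are not taken from the disclosure.

```python
import math
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a tuple."""
    items = list(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield combo


def in_polymatroid(rates, f, tol=1e-9):
    """Decodability: sum of rates over A must not exceed f(A) for every A."""
    return all(sum(rates[e] for e in A) <= f(A) + tol
               for A in nonempty_subsets(rates))


def in_box(rates, queues):
    """Buffer constraint: 0 <= r_e <= Q_e for every selected element."""
    return all(0.0 <= r <= queues[e] for e, r in rates.items())


# Hypothetical toy instance: g_e stands for p_e * |h_e|^2 on the single RB.
GAINS = {"e1": 3.0, "e2": 1.0}
QUEUES = {"e1": 1.5, "e2": 5.0}


def f(A):
    """Scalar-channel stand-in for the rank function of the text."""
    return math.log2(1.0 + sum(GAINS[e] for e in A))


# Corner point that decodes e1 last: decodable, but it overshoots Q_e1.
corner = {"e1": f(("e1",)), "e2": f(("e1", "e2")) - f(("e1",))}

# Cap e1 at its queue size and hand the freed sum rate to e2.
clipped = {"e1": QUEUES["e1"], "e2": f(("e1", "e2")) - QUEUES["e1"]}
```

Here `corner` lies in P(U,f) but outside B(U), while `clipped` lies in both, which is the kind of rate-tuple the intersection P(U,f)∩B(U) is meant to capture.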
Thus, for a (tentative) choice U, we can satisfy both decodability and buffer constraints by assigning only rate-tuples that lie in P(U,f)∩B(U). Clearly among all such rate-tuples we are interested in the one that maximizes the weighted sum rate. Hence, without loss of optimality with respect to (1), with each U⊂ε we can associate a rate-tuple in P(U, f)∩B(U) that maximizes the weighted sum rate. Consequently, we define the following set function that determines the reward obtained upon selecting any subset of ε. We define the set function h:2ε→IR+ as
Let us now consider the control channel overhead constraints. Let L denote the number of search regions and recall that each user is associated with only one search region. We associate each element e with the search region of ue. Let π:ε→{1, . . . , |ε|} denote a bijective mapping and let xU denote the characteristic vector of any subset U⊂ε, i.e., xU is a binary valued |ε| length vector having ones in positions {π(e)}e∈U and zeros elsewhere. Then the control channel overhead constraints can be represented as L packing (knapsack) constraints such that a subset U is feasible if and only if
ACxU≦1L, (11)
where AC∈[0,1]L×|ε| and 1L is a L length vector of ones. Note that any element e∈ε can be involved in only one of the L knapsack constraints, which in particular corresponds to the search region assigned to user ue.
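The packing test in (11) can be sketched as follows. This is only an illustration: the element names, the index map and the entries of the matrix (read here as the fraction of a search region consumed by an element's control packet) are assumed values.

```python
def characteristic_vector(selected, index_of, size):
    """Binary vector x_U with ones at the positions of the selected elements."""
    x = [0] * size
    for e in selected:
        x[index_of[e]] = 1
    return x


def satisfies_packing(A, x, tol=1e-12):
    """True if every row a of A satisfies a . x <= 1."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= 1.0 + tol for row in A)


# Hypothetical instance: four elements, two search regions.
# Elements e0 and e1 use region 0; e2 and e3 (same user) use region 1.
INDEX_OF = {"e0": 0, "e1": 1, "e2": 2, "e3": 3}
A_C = [
    [0.5, 0.75, 0.0, 0.0],  # search region 0
    [0.0, 0.0, 0.5, 0.5],   # search region 1
]

x_ok = characteristic_vector(["e0", "e2"], INDEX_OF, 4)
x_bad = characteristic_vector(["e0", "e1"], INDEX_OF, 4)
```

Each column has a single non-zero entry, mirroring the remark that an element is involved in only one of the L knapsack constraints.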
Finally, let us consider the interference limit constraints. Suppose that the cell of interest is surrounded by M adjacent cells (or sectors). Let em be an N-length vector of binary valued elements which conveys the RBs such that the total interference caused to the mth base station over all the RBs in em should be no greater than a specified upper bound. In particular, let Ru,m be the (wide-band) correlation matrix of the channel seen at the mth base station from the uth user in the cell of interest. Then the total interference caused to the mth base station over all the RBs indicated in em, upon selecting elements in any set U⊂ε is equal to
Then, we are allowed to select any set of elements U⊂ε such that the resulting total interference imposed on the mth base station over all the RBs indicated in em is no greater than a specified upper bound γ(m), i.e., such that
Thus, all the interference limit constraints can be represented as M packing (knapsack) constraints given by
AIxU≦1M, (13)
where AI∈[0, 1]M×|ε| and 1M is a M length vector of ones.
Summarizing the aforementioned results, we have formulated (1) as the following optimization problem:
In (14) we regard M, L as constants that are arbitrarily fixed. Then, for a given number of users K, number of RBs N and cardinality |𝒲| of the codebook 𝒲 (which together fix the size of the ground set ε), an instance (or input) of the problem in (14) consists of a set of user weights {αu} and queue sizes {Qu}, per-user per-RB channel matrices {Hu(n)}: 1≦u≦K, 1≦n≦N, a codebook 𝒲 (of cardinality |𝒲|) along with the matrices AC∈[0,1]L×|ε| and AI∈[0,1]M×|ε|. The output is a subset Û⊂ε along with a rate-tuple rÛ. Note that the cardinality of the ground set |ε| is O(K|𝒲|N4).
We first introduce the following two results that will be invoked later.
Lemma 1 The family of subsets I defined in (4) is an independence family and (ε, I) is a partition matroid.
Proof: First we note that I is downward closed, i.e., if A∈I then any B⊂A satisfies B∈I. Next, let ε(k) denote the set of all e∈ε:ue=k and notice that ε(k)∩ε(j)=Φ, ∀k≠j. Then, note that I can also be defined as A∈I ⇔ |A∩ε(k)|≦1, ∀1≦k≦K. Further, it can be verified that I satisfies the exchange property, i.e., for any A, B∈I such that |A|>|B| we have that ∃e∈A\B such that B∪{e}∈I. Thus, we can conclude that (ε,I) is a partition matroid.
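The ground set and the independence test of Lemma 1 can be illustrated on a tiny instance. The sketch below enumerates the RB assignments that form at most two contiguous chunks, builds the elements (u, c, W) and verifies the exchange property by brute force. The function names and the placeholder codebook entry "W0" are hypothetical, and the exhaustive enumeration over all 2^N binary vectors is only meant for small N.

```python
from itertools import combinations, product


def valid_assignments(num_rbs):
    """Non-empty binary RB vectors made of at most two contiguous chunks."""
    out = []
    for bits in product((0, 1), repeat=num_rbs):
        # A chunk starts wherever a 0 -> 1 transition occurs (leading 0 padded).
        chunks = sum(1 for prev, cur in zip((0,) + bits, bits)
                     if prev == 0 and cur == 1)
        if 1 <= chunks <= 2:
            out.append(bits)
    return out


def ground_set(num_users, num_rbs, codebook):
    """All elements e = (u, c, W)."""
    return [(u, c, w)
            for u in range(1, num_users + 1)
            for c in valid_assignments(num_rbs)
            for w in codebook]


def is_independent(subset):
    """Partition matroid test: at most one element per user."""
    users = [u for (u, _c, _w) in subset]
    return len(users) == len(set(users))


def exchange_property_holds(elements):
    """Brute-force check of the exchange property on a small ground set."""
    all_subsets = [s for r in range(len(elements) + 1)
                   for s in combinations(elements, r)]
    independent = [s for s in all_subsets if is_independent(s)]
    for a in independent:
        for b in independent:
            if len(a) > len(b) and not any(
                    is_independent(b + (e,)) for e in a if e not in b):
                return False
    return True


small = ground_set(num_users=2, num_rbs=2, codebook=["W0"])
```

For N RBs there are N(N+1)/2 single-chunk assignments and C(N+1, 4) two-chunk assignments, which is where the N4 factor in the size of the ground set comes from.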
The proof of the following lemma follows from basic definitions [21] and is skipped for brevity.
Lemma 2 The region P(U, f)∩B(U), ∀U⊂ε is a polymatroid characterized by the rank function f′:2ε→IR+ where
We are now ready to offer our main result. Let us assume that computing h(U) for any U⊂ε incurs a unit cost. We will show that even under this assumption the problem in (14) is NP hard.
Theorem 1 The optimization problem in (14) is NP hard and is the maximization of a monotonic sub-modular set function subject to one matroid and multiple knapsack constraints. For a fixed number of knapsack constraints and any arbitrarily fixed ε>0, there exists a randomized algorithm whose complexity scales polynomially in |ε| and which yields a 1−1/e−ε approximation to (14).
Proof: We will first show that (14) is the maximization of a monotonic sub-modular set function subject to one matroid and multiple knapsack constraints. Invoking Lemma 1, it suffices to show that the function h(.) is a monotonic submodular set function. From the definition of h(.) in (9) it is readily seen that it is monotonic, i.e., h(U′)≦h(U), ∀U′⊂U⊂ε. Let o(.,.) denote an ordering function such that for any subset U⊂ε, o(U, k) is the element having the kth largest weight among the elements in U. Hence we have that αo(U,1)≧αo(U,2)≧ . . . ≧αo(U,|U|). Further, let us adopt the convention that for any subset U⊂ε, o(U, k)=Φ, ∀k≧|U|+1 and αΦ=0. We can now invoke Lemma 2 together with the important property that the rate-tuple in any polymatroid that maximizes the weighted sum is determined by the corner point of that polymatroid in which the elements are arranged in the non-increasing order of their weights [21]. Thus, we can express h(.) as
A key step is to express (16) as
It can be verified that since f′(.) is monotonic and submodular, each set function
f′k(U)=f′({o(ε,1), . . . , o(ε,k)}∩U), ∀U⊂ε. (18)
is also a monotonic and submodular set function. From (17) it can be inferred that since h(.) is a weighted sum of monotonic submodular functions in which all the weights are non-negative, it is a monotonic submodular set function. Thus, (14) is the maximization of a monotonic submodular set function subject to one matroid and multiple knapsack constraints. Assuming that the number of knapsack constraints in (14) is fixed (i.e., M, L are fixed) and referring to [7], wherein the maximization of a generic submodular function subject to one matroid and a multiple albeit fixed number of knapsack constraints is considered, we can obtain a randomized algorithm whose complexity is polynomial in |ε| and which offers the aforementioned guarantee.
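The argument above can be checked numerically on a small instance. The sketch below assumes a scalar single-RB channel with f(A)=log2(1+Σe∈A ge) and assumes the standard truncation form f′(U)=min over R⊂U of f(R)+Σe∈U\R Qe for the rank function of Lemma 2, whose display equation is not reproduced in this text. It evaluates h(U) once at the weight-ordered corner point and once as a non-negative weighted sum of the functions f′k in (18), and then tests submodularity by brute force. All numbers and names are illustrative.

```python
import math
from itertools import combinations

# Hypothetical toy data: weights, effective gains and queue sizes per element.
WEIGHT = {"a": 3.0, "b": 2.0, "c": 1.5, "d": 1.0}
GAIN = {"a": 3.0, "b": 1.0, "c": 4.0, "d": 2.0}
QUEUE = {"a": 1.0, "b": 5.0, "c": 0.5, "d": 5.0}
ORDER = sorted(WEIGHT, key=lambda e: -WEIGHT[e])  # o(ground set, 1), o(., 2), ...


def subsets(items):
    """Yield every subset of items (including the empty set) as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def f(A):
    """Scalar-channel stand-in for the Gaussian rank function."""
    return math.log2(1.0 + sum(GAIN[e] for e in A))


def f_prime(U):
    """Assumed rank function of P(U, f) intersected with B(U)."""
    U = frozenset(U)
    return min(f(R) + sum(QUEUE[e] for e in U - R) for R in subsets(U))


def h_corner(U):
    """Weighted sum rate at the corner point ordered by non-increasing weight."""
    chosen, total, prev = [], 0.0, 0.0
    for e in ORDER:
        if e in U:
            chosen.append(e)
            cur = f_prime(chosen)
            total += WEIGHT[e] * (cur - prev)
            prev = cur
    return total


def h_telescoped(U):
    """Same value as a non-negative weighted sum of f'_k(U)."""
    U = frozenset(U)
    total = 0.0
    for k, e in enumerate(ORDER):
        next_weight = WEIGHT[ORDER[k + 1]] if k + 1 < len(ORDER) else 0.0
        total += (WEIGHT[e] - next_weight) * f_prime(U & frozenset(ORDER[:k + 1]))
    return total


def is_submodular(h, ground, tol=1e-9):
    """Brute-force test of diminishing marginal gains."""
    ground = frozenset(ground)
    for B in subsets(ground):
        for A in subsets(B):
            for e in ground - B:
                if h(A | {e}) - h(A) < h(B | {e}) - h(B) - tol:
                    return False
    return True
```

On this instance the two evaluations of h agree on every subset and `is_submodular(h_corner, WEIGHT)` holds, which is consistent with the proof but does not replace it.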
We will now show that (14) is an NP hard problem. We will consider instances of the problem where the number of RBs N=1, all users have identical weights and one transmit antenna each and where the codebook 𝒲 is degenerate, i.e., |𝒲|=1. Thus, we have |ε|=K. In addition, we assume that the number of receive antennas is equal to the number of users K so that a given input of user channels forms a K×K matrix, denoted here by H=[h1, . . . , hK]. Further, we will assume only one knapsack constraint which in particular is a cardinality constraint on the number of users that can be scheduled on the one available RB. We will show that the problem specialized to these instances is also NP-hard so that the original problem is NP-hard. Note that the matroid constraint now becomes redundant and (14) simplifies to maximizing the sum rate under a cardinality constraint
where 1≦C≦K is the input maximum cardinality. Now using the determinant equality
log|I+HDH†|=log|I+DH†HD| (20)
together with the monotonicity of the objective function, we can re-write (19) as
Note that (21) is equivalent to determining the C×C principal sub-matrix of the positive definite matrix I+H†H having the maximum determinant. Note that for a given K, an instance of the problem in (21) is the matrix H together with C. We will prove that (21) is NP-hard via contradiction. Suppose now that an efficient algorithm (with a complexity polynomial in K) exists that can optimally solve (21) for any input K×K matrix H and any C: 1≦C≦K. This in turn would imply that there exists an efficient algorithm (with a complexity polynomial in K) that for any input C: 1≦C≦K and any K×K positive definite matrix Σ, can determine the C×C principal sub-matrix of Σ having the maximum determinant. Invoking the reduction developed in [22], this would then contradict the NP hardness of the problem of determining whether a given input graph has a clique of a given input size.
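The equivalence used above can be seen on a small example. The sketch below works with a real-valued channel matrix for simplicity (the text uses complex matrices and the conjugate transpose) and checks that the determinant of I plus the sum of hk hk^T over the scheduled users equals the determinant of the corresponding principal sub-matrix of I+H^T H. The helper names and the matrix entries are illustrative, and the exhaustive search over user sets is exactly the exponential enumeration that the hardness result suggests cannot be avoided in general.

```python
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, sign, result = len(a), 1.0, 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def gram_plus_identity(H):
    """I + H^T H, where column k of H is the channel of user k."""
    n = len(H[0])
    return [[(1.0 if i == j else 0.0)
             + sum(H[r][i] * H[r][j] for r in range(len(H)))
             for j in range(n)] for i in range(n)]


def identity_plus_selected(H, users):
    """I + sum over the selected users k of h_k h_k^T."""
    m = len(H)
    return [[(1.0 if i == j else 0.0)
             + sum(H[i][k] * H[j][k] for k in users)
             for j in range(m)] for i in range(m)]


def principal_submatrix(M, idx):
    """Rows and columns of M restricted to the index set idx."""
    return [[M[i][j] for j in idx] for i in idx]


def best_user_set(H, C):
    """Exhaustive search for the size-C user set with the largest determinant."""
    G = gram_plus_identity(H)
    return max(combinations(range(len(H[0])), C),
               key=lambda S: det(principal_submatrix(G, S)))


# Hypothetical 3 x 3 real channel matrix.
H = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 3.0]]
```

For this H and C=2 the three candidate determinants are 14, 24 and 65, so `best_user_set(H, 2)` returns the user pair with indices 1 and 2.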
We remark that upon specializing the algorithm from [7] (which considers the maximization of any submodular function subject to one matroid and multiple knapsack constraints), we obtain an algorithm that has a good guarantee but a high complexity (since it involves partial enumeration). Another simpler randomized algorithm is indeed possible as shown in the result below.
Theorem 2 There is a randomized algorithm whose complexity scales polynomially in |ε| and which yields
approximation to (14).
Proof: The key observation is that the partition matroid constraint in (14) can be expressed as K linear packing constraints (one for each user). Let AP denote the resulting K×|ε| packing matrix whose kth row corresponds to the kth user. Note that this row has ones in each position for which the corresponding element e satisfies ue=k and zeros elsewhere. Together these K packing constraints are sparse packing constraints wherein in each column a non-zero entry appears only once. Similarly, since each user is assigned only one control region, all elements associated with the same user can be involved in only one out of the L control channel overhead constraints. Thus, the total K+L+M packing constraints are sparse constraints in which each element can appear in at-most M+2 constraints so that each column can have at-most M+2 non-zero entries. With this understanding, we can invoke the randomized algorithm from [8] which is applicable to the maximization of any monotonic submodular function subject to sparse packing constraints and obtain the guarantee claimed in the theorem.
Notice that since any monotonic submodular set function is also monotonic and sub-additive, we can infer the following result from Theorem 1.
Lemma 3 The function h(.) defined in (9) is sub-additive, i.e.,
h(U)≦h(U1)+h(U2), ∀U1,U2⊂U: U1∪U2=U. (22)
We note that while both the aforementioned randomized algorithms involve solving a continuous relaxation of (14) using a continuous greedy procedure [23], the rounding with alteration method proposed in [8] to obtain a feasible solution is significantly simpler. However, for practical implementation an even simpler combinatorial (deterministic) algorithm is required. Unfortunately, as remarked in [7], it is difficult to design combinatorial (deterministic) algorithms that can combine both matroid and knapsack constraints. Nevertheless in Algorithm I we specialize a well known greedy algorithm to our problem of interest (14). The following result provides the worst-case guarantee offered by Algorithm I.
Theorem 3 The complexity of Algorithm I is O(K2N4|𝒲|), where |𝒲| denotes the codebook cardinality, and it yields a 1/K approximation to (14). Further, if each knapsack constraint is a matroid constraint then Algorithm I yields a
approximation to (14).
Proof: We first consider the complexity of Algorithm I and note that since the partition matroid constraint needs to be satisfied, there can be at-most K steps in the repeat-until loop of the algorithm. Also, recall that the size of the ground set ε is O(KN4|𝒲|), where |𝒲| denotes the codebook cardinality. Then, at each step we need to compute h(S∪e) for each e∈ε\S such that S∪e satisfies all the constraints. Thus, the worst-case complexity is O(K2N4|𝒲|).
S ← S ∪ e
Let us now consider the approximation guarantees. Notice that due to the partition matroid constraint any optimal solution to (14) cannot contain more than K elements. Then, the subadditivity of h(.) shown in Lemma 3, together with the fact that in its first step Algorithm I selects the element of ε having the highest weighted rate, suffices to prove the 1/K guarantee. On the other hand, suppose that all knapsack constraints are matroid constraints (over all instances). For any instance, let (ε,Il) denote the matroid corresponding to the lth control channel constraint, where we let Il denote the independence family. Thus the set of elements that satisfy all the control channel overhead constraints belong to the matroid intersection ∩l=1L(ε,Il). Now recall that the control channel constraints involve mutually non-overlapping users and hence mutually non-overlapping elements of ε. Let εl denote the set of elements involved in the lth control channel constraint (i.e., all elements for which the entry in the lth row of AC is non-zero) and I′l denote the set of all subsets of εl that satisfy the lth control channel constraint. We see that (εl,I′l) is a matroid since I′l is downward closed and satisfies the exchange property. More importantly
∩l=1L(ε,Il)=∪l=1L(εl,I′l), (24)
where ∪l=1L(εl,I′l) denotes the union of matroids and hence is itself a matroid on ε. We caution that the union of matroids is a different operation than what one might expect [21]. Thus, the L control channel overhead constraints indeed are identical to one matroid constraint. Then, combining with the partition matroid and the other M (interference limit) matroid constraints, we see that the feasible subsets belong to the intersection of p=M+2 matroids and hence form a p-system. Then invoking the guarantee offered by the greedy algorithm on a p-system [24,25] proves the second part.
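A greedy selection of the kind analyzed in this proof can be sketched as follows. This is a generic illustration and not a reproduction of Algorithm I: the stopping rule (stop when no feasible element strictly increases h), the toy reward function and the single knapsack constraint are all assumptions made for the example.

```python
import math


def greedy_schedule(ground_set, h, is_feasible):
    """Repeatedly add the feasible element that gives the largest h(S + e)."""
    selected = []
    while True:
        best, best_value = None, h(selected)
        for e in ground_set:
            if e in selected:
                continue
            candidate = selected + [e]
            if not is_feasible(candidate):
                continue
            value = h(candidate)
            if value > best_value + 1e-12:
                best, best_value = e, value
        if best is None:
            return selected
        selected.append(best)  # S <- S U e


# Hypothetical elements: (user, effective gain, control channel cost).
ELEMENTS = [("u1", 3.0, 0.5), ("u1", 4.0, 0.9), ("u2", 2.0, 0.5), ("u3", 1.0, 0.25)]


def toy_h(subset):
    """Equal-weight sum rate on one RB with scalar channels (assumed form)."""
    return math.log2(1.0 + sum(gain for (_u, gain, _cost) in subset))


def toy_feasible(subset):
    """One element per user (partition matroid) plus one knapsack constraint."""
    users = [u for (u, _gain, _cost) in subset]
    one_per_user = len(users) == len(set(users))
    within_budget = sum(cost for (_u, _gain, cost) in subset) <= 1.0 + 1e-12
    return one_per_user and within_budget


schedule = greedy_schedule(ELEMENTS, toy_h, toy_feasible)
```

On this instance the greedy rule picks the single high-gain, high-cost element of user u1 and then cannot add anything else, whereas scheduling the cheaper element of u1 together with u2 would give a larger sum rate. The gap stays within the 1/K worst-case bound for K=3.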
Recall that hitherto we have assumed that computing h(U) for any U⊂ε incurs a unit cost. We can indeed show that Algorithm I has polynomial complexity under a stricter notion that computing f(U) (instead of h(U)) for any U⊂ε incurs a unit cost. To show this, it suffices to prove that h(U) can be determined with a complexity polynomial in |U|. A key observation towards this end is that for any U⊂ε, f′(U) in (15) can be computed as
Then, since the function f(R)−Σe∈RQe, ∀R⊂ε is a submodular set function, we can solve the minimization in (25) using submodular function minimization routines that have a complexity polynomial in |U| [26,27]. Thus, from (16) we can conclude that h(U) can indeed be determined with a complexity polynomial in |U|.
We now propose simple observations that can considerably speed up the greedy algorithm
where the first inequality stems from the fact that h(S∪e′) is monotonically increasing in the transmit PSD of e′ and the second inequality stems from the monotonicity and subadditivity of h(.). Thus, we have that
h(S∪e′)≦2 max{h(S∪e1), h(S∪e2)}. (27)
Then if S∪e1, S∪e2 as well as S∪e′ satisfy all the constraints, we can evaluate h(S∪e1), h(S∪e2) and skip evaluating h(S∪e′). By adopting this procedure over all elements in ε\S, we can ensure that the element selected will offer at-least ½ the gain yielded by the locally optimal element. Then, using a well known result on the greedy algorithm with an approximately optimal selection at each step [24,25] we can conclude that this variation of our greedy algorithm will yield an approximation guarantee
of when all knapsack constraints are matroid constraints.
Some remarks on the performance of Algorithm I are due. Clearly the approximation guarantee is much better when all knapsack constraints are matroid constraints. When the matrices AI, AC have rational valued elements, necessary and sufficient conditions for a knapsack constraint to be matroid constraint have been derived in [30]. A simple sufficient condition for a knapsack constraint to be matroid constraint is the following.
Assumption 1 The ith knapsack constraint is a matroid constraint if all its strictly positive coefficients are identical, i.e., 1{Ai,j>0}=1{Ai,k>0} ⇒ Ai,j=Ai,k, ∀j,k.
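To see why this condition suffices, note that if every strictly positive coefficient in a row equals a, the constraint reduces to a limit of at most ⌊1/a⌋ on the number of selected elements from the support of that row, which is a cardinality (partition matroid) constraint. A minimal sketch of this check, with hypothetical function names and rows:

```python
import math


def satisfies_assumption_1(row, tol=1e-12):
    """True if all strictly positive coefficients of the row are identical."""
    positive = [a for a in row if a > 0]
    return all(abs(a - positive[0]) <= tol for a in positive)


def max_selectable(row):
    """Largest number of supported elements the row allows, or None if unbounded.

    Only meaningful when satisfies_assumption_1(row) is True.
    """
    positive = [a for a in row if a > 0]
    if not positive:
        return None
    return math.floor(1.0 / positive[0] + 1e-9)


uniform_row = [0.25, 0.0, 0.25, 0.25]  # hypothetical: same packet size for all
mixed_row = [0.5, 0.25, 0.0]           # hypothetical: two packet sizes
```

For `uniform_row` the constraint allows any four elements from its support, so here it never binds. For `mixed_row` the feasible sets depend on which elements are chosen and not only on how many.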
Situations when the above assumption is satisfied in practice are common and occur when:
Indeed a better guarantee for the greedy algorithm can be claimed when Assumption 1 is satisfied. The following result follows from the fact that when Assumption 1 is satisfied by each control channel constraint, then the intersection of the partition matroid and the L control channel constraints is in-fact a single matroid on ε.
Theorem 4 If all knapsack constraints are matroid constraints and each control channel constraint further satisfies Assumption 1, then Algorithm I yields a
approximation to (14).
Finite alphabet constraints:
In practice every user u∈{1, . . . , K} can transmit using an alphabet of cardinality at-most Su on each of its assigned RBs. In this section we impose a finite alphabet constraint by enforcing that the rate assigned to each user should be decomposable as a sum of rates over its assigned RBs and that the rate on each assigned RB should not exceed log(Su), the latter being the maximum rate that can be achieved with any alphabet of cardinality Su. For each element e∈ε with ue=u, we define its maximum alphabet size as Se=Su. We first define a rank function f(n):2ε→IR+ for each 1≦n≦N, as
along with the polymatroid
We then recall the following useful result that can be inferred from [5]
Lemma 4 The region P(U, f) and the region
are identical.
Consequently, we can impose decodability and finite alphabet constraints by enforcing that any rate tuple r assigned to a given subset U of selected elements must be decomposable as
This ensures that the assigned rate tuple will not require a scheduled user to violate its alphabet cardinality constraint on any of its assigned RBs. Then, accommodating the buffer constraints as well, we impose that any rate tuple r assigned to a given subset U of selected elements must satisfy
We next offer the following result.
Theorem 5 For any choice of selected elements U⊂ε, the set of rate vectors that satisfy (32) is identical to the polymatroid T(U,g′) which is characterized by the rank function
Proof: We start by invoking Lemma 2 to deduce that the region
is a polymatroid with rank function g(n)(U) defined in (35). Thus, for any given subset of elements U the rate region of interest defined in (31) can also be expressed as
Define the function g(.) as in (34) and notice that g(.) is also a rank function so that the region
is also a polymatroid. Then along similar lines as Lemma 4, we can show that the rate region defined in (31) and T(U,g) are identical. Thus, for any given subset of elements U we can impose the finite alphabet constraint by considering the rate vectors that lie in the polymatroid T(U,g). Finally we can impose queue constraints as well by considering instead the rate region T(U,g′)=T(U,g)∩B(U) which is identical to the region defined in (32). The fact that T(U,g′) is a polymatroid characterized by a rank function g′(.) defined in (33) follows upon again invoking Lemma 2.
Upon defining
we consider the optimization problem
As before, it can be shown that h′(.) is a monotonic submodular set function so that the optimization problem in (39) is the maximization of a monotonic submodular function subject to one matroid and multiple knapsack constraints. Algorithm I and its associated results are thus applicable. We note that while h′(U) can also be computed with polynomial complexity for any U⊂ε, its computation is more complex than that of h(U).
MU-MIMO Scheduling in the LTE UL
We now consider UL MU-MIMO scheduling in LTE systems. As opposed to the LTE-A MU-MIMO scheduling there are three main differences.
The third constraint which demands complete overlap among users who share even one RB cannot be formulated as a matroid constraint and would require a large number of knapsack constraints (that would depend on both K and N). In addition the region of rates that can be achieved by simple receivers does not form a polymatroid. This renders the previous approach used for LTE-A scheduling unworkable. Fortunately, this constraint along with the one which mandates at-most one chunk per scheduled user facilitate the use of a local ratio test (LRT) based method. LRT was developed in [10] and has been used for interval scheduling problems among others. Recently it was used in [11] to develop a constant-factor (½) approximation algorithm for the LTE SU-MIMO problem in which at-most one user can be assigned to each RB and where there are no knapsack constraints. In the following we closely follow the notation developed by [11]. We fully exploit the power of the LRT technique by accommodating MU-MIMO scheduling with J≧1 knapsack constraints. We will assume that T and J are constants.
Let us define a set U as
U={U⊂{1, . . . , K}:|U|≦T} (40)
and let M=U×C. For any c∈C, we adopt the convention that i∈c if the ith element of c is one. We will use Tail(c) (Head(c)) to return the largest (smallest) RB index that contains a one in c. Thus, each c∈C has ones in all positions Head(c), . . . , Tail(c) and zeros elsewhere. We can now pose the resource allocation problem as
where X(U,c) is an indicator function that returns one if users in U are co-scheduled on the chunk indicated by c. Without loss of generality, we assume that the weight of the pair (U,c) in the qth knapsack, βq(U,c), lies in the interval [0,1]. p(U,c) denotes the weighted sum-rate obtained upon co-scheduling the users in U on the chunk indicated by c. We note that there is complete freedom with respect to the computation of p(U,c). Indeed, it can accommodate buffer and practical MCS constraints, account for any particular receiver employed at the base station and can incorporate any rule to assign a precoder and a power level to each user in U over the chunk c. An interesting observation is that the LRT method that will be used to solve (41) in the sequel, can also be used to obtain a feasible allocation to (14). However, it breaks down when we try to extend it to allocations with arbitrary partial overlaps and up-to two chunks per user. This is because in that case the objective function cannot be expressed as a separable sum of functions, each function depending only on a pair (U,c).
We assume that T, J are arbitrarily fixed. Then, for a given K, N we first partition the set M into two parts as M=Mnarrow∪Mwide, where we define Mnarrow={(U,c)∈M:βq(U,c)≦½, ∀1≦q≦J}. We then define J sets, V(1), . . . , V(J), over Mwide as (U,c)∈V(q) iff βq(U,c)>½, 1≦q≦J. Note that for a given K, N, an instance of the problem in (41) consists of {p(U,c)}∀(U,c)∈M and {βq(U,c)}, ∀(U,c)∈M, 1≦q≦J.
Then in order to sub-optimally solve (41) we propose Algorithm II which possesses the optimality given below. Note that since T, J are fixed, |M| is O(K^T N^2). From this the complexity of Algorithm II, which is essentially determined by that of Algorithm IIa, can be shown to be O(K^T N^3).
Theorem 6 The problem in (41) is NP-hard. Let Ŵopt denote the optimal weighted sum rate obtained upon solving (41) and let Ŵ denote the weighted sum rate obtained upon using Algorithm II. Then, we have that
Proof: Let us specialize (41) to instances where all the knapsack constraints are vacuous and where p(U,c)=0 whenever |U|≧2 for all (U,c)∈. Then (41) reduces to the SU scheduling problem considered in [11, 18] which was shown there to be NP-hard. Consequently, we can assert that (41) is NP-hard.
Next, consider first Algorithm IIa which outputs a feasible allocation over narrow yielding a weighted sum rate Ŵnarrow. Let Ŵopt,narrow denote the optimal weighted sum rate obtained by solving (41) albeit where all pairs (U,c) are restricted to lie in narrow. We will prove that
We present a proof that invokes results from [11] as much as possible and highlights mainly the key differences which allow us to co-schedule multiple users on a chunk and satisfy multiple knapsack constraints. Note that Algorithm IIa builds up the stack S in N steps. In particular let Sj, j=1, . . . , N be the element that is added in the jth step and note that either Sj=Φ or it is equal to some pair (U*j,c*j). As in [11], we use two functions p1(j):narrow→IR+ and p2(j):narrow→IR+ for j=0, . . . , N to track the function p′(,) as the stack S is being built up over N steps and in particular we set p1(0)(U,c)=0, ∀(U,c)∈narrow and p2(0)(U,c)=p(U,c), ∀(U,c)∈narrow. For our problem at hand, we define {p1(j)(U,c), p2(j)(U,c)}
recursively as
where X(.) denotes the indicator function and
Hence, we have that
p2(j−1)(U,c)=p2(j)(U,c)+p1(j)(U,c), ∀(U,c)∈narrow, j=1, . . . , N. (45)
It can be noted that
p2(j)(U,c)≦0, ∀(U,c)∈narrow:Tail(c)≦j
p2(k)(U,c)≦p2(j)(U,c), ∀(U,c)∈narrow& k≧j. (46)
Further, to track the stack S′ which is built in the while loop of the algorithm, we define stacks {S*j}j=0N where S*N=Φ and S*j is the value of S′ after the Algorithm has tried to add ∪m=j+1NSm to S′ (starting from S′=Φ) so that S*0 is the stack S′ that is the output of the Algorithm. Note that S*j+1⊂S*j⊂S*j+1∪Sj+1. Next, for j=0, . . . , N, we let W(j)opt denote the optimal solution to (41) but where M is replaced by Mnarrow and the function p(,) is replaced by p2(j)(,). Further, let W(j)=Σ(U,c)∈S*
W(j)opt≦(T+1+2J)W(j), ∀j=N, . . . , 0, (47)
which includes the claim in (43) at j=0. The base case W(N)opt≦(T+1+2J)W(N) is readily true since S*N=Φ and p2(N)(U,c)≦0, ∀(U,c)∈narrow. Assume that (47) holds for some j. We focus only on the main case in which Sj=(U*j,c*j)≠Φ (the remaining cases can be inferred from [11]). Note that since (U*j,c*j) is added to the stack S in the algorithm, p2(j−1)(U*j,c*j)>0. Then from the update formulas (44), we must have that p2(j)(U*j,c*j)=0. Using the fact that S*j−1⊂S*j∪(U*j,c*j) together with the induction
hypothesis, we can conclude that
Next, we will show that
Towards this end, suppose that S*j−1=S*j∪(U*j,c*j). Then, recalling (44) we can deduce that (49) is true since p1(j)(U*j,c*j)=p2(j−1)(U*j,c*j). Suppose now that S*j−1=S*j. In this case we can have two possibilities. In the first one (U*j,c*j) cannot be added to S*j due to the presence of a pair (U′,c′)∈S*j for which either U′∩U*j≠Φ or c′∩c*j≠Φ. Since any pair (U′,c′)∈S*j was added to S in the algorithm after the jth step, from the second inequality in (46) we must have that p2(j−1)(U′,c′)>0. Recalling (44) we can then deduce that p1(j)(U′,c′)=p2(j−1)(U*j,c*j) which proves (49). In the second possibility, (U*j,c*j) cannot be added to S*j due to a knapsack constraint being violated. In other words, for some q∈{1, . . . , J}, we have that
Since (U*j,c*j)∈narrow, βq(U*j,c*j)≦½ so that
which along with (44) also proves (49). Thus, we have established the claim in (49).
Finally, letting V(j)opt denote the optimal solution to (41) but where M is replaced by Mnarrow and the function p(,) is replaced by p1(j)(,), we will show that
Towards this end, from (44) we note that for any pair (U,c)∈narrow, p1(j)(U,c)≦p2(j−1)(U*j,c*j). Let V1(j)opt be an optimal allocation of pairs that results in V(j)opt. For any two pairs (U1,c1), (U2,c2)∈V1(j)opt
we must have U1∩U2=c1∩c2=Φ. In addition |U1| and |U2| are no greater than T. Thus we can have at-most T such pairs {(Ui,ci)} in V1(j)opt for which Ui∩U*j=Φ. Further, using the first inequality in (46) we see that any pair (U,c) for which c∩c*j≠Φ and p1(j)(U,c)=p2(j−1)(U*j,c*j) must have Tail(c)≧j so that j∈c. Thus, V1(j)opt can include at-most one pair (U,c) for which c∩c*j≠Φ. Now the remaining pairs in V1(j)opt (whose users do not intersect U*j and whose chunks do not intersect c*j) must satisfy the knapsack constraints. Let these pairs form the set {tilde over (V)}1(j)opt so that
Combining these observations we have that
which is the desired result in (52).
Thus, using (48), (49) and (52) we can conclude that
which proves the induction step and proves the claim in (43).
Let us now consider the remaining part which arises when wide≠Φ. Consider first Algorithm IIb which outputs a feasible allocation over wide yielding a weighted sum rate Ŵwide. Let Ŵopt,wide denote the optimal weighted sum rate obtained by solving (41) albeit where all pairs (U,c) are restricted to lie in wide. We will prove that
Let Vopt,wide be an optimal allocation of pairs from wide that results in a weighted sum rate Ŵopt,wide. Clearly, in order to meet the knapsack constraints, Vopt,wide can include at-most one pair from each V(q), 1≦q≦J so that there can be at-most J pairs in Vopt,wide. Thus, by selecting the pair yielding the
maximum weighted sum-rate we can achieve at-least Ŵopt,wide/J. The greedy algorithm first selects the pair yielding the maximum weighted sum rate among all pairs in wide and then attempts to add pairs to monotonically improve the objective. Thus, we can conclude that (57) must be true.
Notice that we select Ŵ=max{Ŵnarrow,Ŵwide} so that
It is readily seen that
Ŵopt≦Ŵopt,narrow+Ŵopt,wide. (59)
(58) and (59) together prove the theorem.
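The combination step can be written out explicitly. The sketch below assumes that (43) is the bound obtained from (47) at j=0 and that (57) is the 1/J guarantee argued above for the allocation over wide; the resulting constant is an inference from these two bounds together with (59), not a restatement of the display in Theorem 6.

```latex
\begin{align*}
\hat{W}^{\mathrm{opt}}
  &\le \hat{W}^{\mathrm{opt,narrow}} + \hat{W}^{\mathrm{opt,wide}} \\
  &\le (T+1+2J)\,\hat{W}^{\mathrm{narrow}} + J\,\hat{W}^{\mathrm{wide}} \\
  &\le (T+1+3J)\,\max\{\hat{W}^{\mathrm{narrow}},\hat{W}^{\mathrm{wide}}\}
   = (T+1+3J)\,\hat{W}.
\end{align*}
```

When wide is empty the second term vanishes and the constant reduces to T+1+2J.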
An interesting observation that follows from the proof of Theorem 6 is that any optimal allocation over wide can include at-most one pair from each V(q), 1≦q≦J. Then since the number of pairs in each V(q), 1≦q≦J is O(KTN2), we can determine an optimal allocation yielding Ŵopt,wide via exhaustive enumeration with a high albeit polynomial complexity (recall that T and J are assumed to be fixed). Thus, by using exhaustive enumeration instead of Algorithm IIb, we can claim the following result.
Corollary 1 Let Ŵopt denote the optimal weighted sum rate obtained upon solving (41) and let Ŵ denote the weighted sum rate obtained upon using Algorithm II albeit with exhaustive enumeration over wide. Then, we have that
Simulation Results
In this section we present our simulation results. We consider both the LTE as well as the LTE-A uplink. In each case we simulate an uplink with 10 users, wherein the BS is equipped with four receive antennas. The system has 280 sub-carriers divided into 20 RBs (of size 14 sub-carriers each) available as data subcarriers that are used for serving the users. We assume 10 active users all of whom have identical maximum transmit powers. We use the SCM urban macro channel model (with co-polarized antennas having 10λ, 1λ separation at the BS and the mobile (user), respectively, and 15° BS mean angular spread) to generate the channel between each user and the base-station. In all the results given below we assume an infinitely backlogged traffic model.
We first consider the multi-user scheduling over the LTE Uplink.
The following enumerated references have been referenced throughout the detailed description pertaining to the fourth generation cellular uplink. We list the references here for completeness, although they have been cited on separate disclosure documents as appropriate.
Multi-User Scheduling in the 3GPP LTE Cellular Uplink
We now discuss the aspects of the present disclosure directed to 3GPP LTE Cellular Uplink. We note that in this section the reference numerals are specific to this section.
The next generation cellular systems, a.k.a. 4G cellular systems, will operate over wideband multi-path fading channels and have chosen the orthogonal frequency-division multiplexing based multiple-access (OFDMA) as their air-interface [1,2]. The motivating factors behind the choice of OFDMA are that it is an effective means to handle multi-path fading and that it allows for enhancing multi-user diversity gains via channel-dependent frequency-domain scheduling. The deployment of 4G cellular systems has begun and will accelerate in the coming years. Predominantly the 4G cellular systems will be based on the 3GPP LTE standard [1] since an overwhelming majority of cellular operators have committed to LTE. Our focus in this paper is on the uplink (UL) in these LTE cellular systems and in particular on multi-user (MU) scheduling for the LTE UL. The UL in LTE systems employs a modified form of OFDMA, referred to as the DFT-Spread-OFDMA [1]. The available system bandwidth is partitioned into multiple resource blocks (RBs), where each RB represents the minimum allocation unit and is a pre-defined set of consecutive subcarriers. The scheduler is a frequency domain packet scheduler, which in each scheduling interval assigns these RBs to the individual users. Unlike single-user (SU) scheduling, a key feature of MU scheduling is that an RB can be simultaneously assigned to more than one user in the same scheduling interval. MU scheduling is well supported by fundamental capacity and degrees of freedom based analysis [3,4] and indeed, its promised gains need to be harvested in order to cater to the ever increasing traffic demands. Anticipating such growing data traffic, LTE UL has enabled MU scheduling in the uplink along with transmit antenna selection.
However, several constraints have been placed on such MU scheduling (and the resulting MU transmissions) which seek to balance the need to provide scheduling freedom with the need to ensure a low signaling overhead and respect device limitations.
Finally scheduling in LTE UL must respect control channel overhead constraints and interference limit constraints. The former constraints arise because the scheduling decisions are conveyed to the users on the downlink control channel, whose limited capacity in turn places a limit on the set of users that can be scheduled. The latter constraints are employed to mitigate intercell interference. In the sequel it is shown that both these types of constraints can be posed as column-sparse and generic knapsack (linear packing) constraints, respectively.
The goal of this work is to design practical MU resource allocation algorithms for the LTE cellular uplink, where the term resource refers to RBs, modulation and coding schemes (MCS), power levels as well as choice of transmit antennas. In particular, we consider the design of resource allocation algorithms via weighted sum rate utility maximization, which accounts for finite user queues (buffers) and practical MCS. In addition, the designed algorithms comply with all the aforementioned practical constraints. Our main contributions are as follows:
Resource allocation for OFDMA networks has been the subject of intense research [8-12] with most of the focus being on the downlink. A majority of OFDMA resource allocation problems hitherto considered are single-user (SU) scheduling problems, which attempt to maximize a system utility by assigning non-overlapping subcarriers to users along with transmit power levels for the assigned subcarriers. These problems have been formulated as continuous optimization problems, which are in general non-linear and non-convex. As a result several approaches based on game theory [13], dual decomposition [8] or the analysis of optimality conditions [14] have been developed. Recent works have focused on emerging cellular standards and have modeled the resource allocation problems as constrained integer programs. Prominent examples are [11], [15] which consider the design of downlink SU-MIMO schedulers for LTE and LTE-Advanced (LTE-A) systems, respectively, and derive constant factor approximation algorithms.
Resource allocation for the DFT-Spread-OFDMA uplink has been relatively much less studied with [7,16,17] being the recent examples. In particular, [7,16] show that the SU LTE UL scheduling problem is APX-hard and provide constant-factor approximation algorithms, whereas [17] extends the algorithms of [7,16] to the SU-MIMO LTE-A scheduling. The algorithm proposed in [7] is based on an innovative application of the LRT technique, which was developed earlier in [6]. However, we emphasize that the algorithms in [7,16,17] cannot incorporate MU scheduling and also cannot incorporate knapsack constraints. To the best of our knowledge the design of approximation algorithms for MU scheduling in the LTE uplink has not been considered before.
Consider a single-cell with K users and one BS which is assumed to have Nr≧1 receive antennas. Suppose that user k has Nt≧1 transmit antennas and its power budget is Pk. We let N denote the total number of RBs.
We consider the problem of scheduling users in the frequency domain in a given scheduling interval. Let αk, 1≦k≦K denote the weight of the kth user which is an input to the scheduling algorithm and is updated using the output of the scheduling algorithm in every scheduling interval, say according to the proportional fairness rule [18]. Letting rk denote the rate assigned to the kth user (in bits per N RBs), we consider the following weighted sum rate utility maximization problem,
where the maximization is over the assignment of resources to the users subject to:
We define the set C as the set containing N length vectors such that any c∈C is binary-valued with ({0,1}) elements and contains a contiguous sequence of ones with the remaining elements being zero. Here we say an RB i belongs to c (i∈c) if c contains a one in its ith position, i.e., c(i)=1 so that each c∈C denotes a valid assignment of RBs chosen from the set C. Also c1 and c2 are said to intersect if there is some RB that belongs to both c1 and c2. For any c∈C, we will use Tail(c) (Head(c)) to return the largest (smallest) index that contains a one in c. Thus, each c∈C has ones in all positions Head(c), . . . , Tail(c) and zeros elsewhere. Further, we define {G1, . . . , GL} to be a partition of {1, . . . , K} with the understanding that all users that belong to a common set (or group) Gs, for any 1≦s≦L, are mutually incompatible. In other words at-most one user from each group Gs can be scheduled in a scheduling interval. Notice that by choosing L=K and Gs={s}, 1≦s≦K we obtain the case where all users are mutually compatible. Let us define a family of subsets, U, as
U={U⊂{1, . . . , K}:|U|≦T&|U∩Gs|≦1∀1≦s≦L} (2)
and let M=U×C.
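To make the size of the search space concrete, the sets C and U can be enumerated directly for small instances. The following is a minimal sketch, assuming RBs are indexed 1, . . . , N and that the groups are given as a list of sets; all function names are illustrative, and the empty user set is omitted since it carries no allocation.

```python
from itertools import combinations


def contiguous_chunks(n_rbs):
    """All binary vectors of length n_rbs whose ones form one contiguous run."""
    chunks = []
    for first in range(1, n_rbs + 1):
        for last in range(first, n_rbs + 1):
            chunks.append(tuple(1 if first <= i <= last else 0
                                for i in range(1, n_rbs + 1)))
    return chunks


def head(c):
    """Smallest (1-based) RB index that holds a one in c."""
    return c.index(1) + 1


def tail(c):
    """Largest (1-based) RB index that holds a one in c."""
    return len(c) - c[::-1].index(1)


def user_sets(groups, max_users):
    """Non-empty user sets with |U| <= max_users and at most one user per group."""
    users = sorted(u for g in groups for u in g)
    group_of = {u: s for s, g in enumerate(groups) for u in g}
    result = []
    for size in range(1, max_users + 1):
        for combo in combinations(users, size):
            # Keep the set only if every user comes from a different group.
            if len({group_of[u] for u in combo}) == size:
                result.append(frozenset(combo))
    return result


chunks = contiguous_chunks(4)              # N = 4 gives N(N+1)/2 chunks
sets_ = user_sets([{1, 2}, {3}], 2)        # users 1 and 2 are incompatible
pairs = [(U, c) for U in sets_ for c in chunks]
print(len(chunks), len(sets_), len(pairs))
```

The number of chunks grows as N(N+1)/2 and the number of user sets as a polynomial of degree T in K, which is the source of the O(K^T N^2) count of pairs.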
We can now pose the resource allocation problem as
where Φ denotes the empty set and X(U,c) is an indicator function that returns one if users in U are co-scheduled on the chunk indicated by c. Note that the first constraint ensures that at-most one user is scheduled from each group and that each scheduled user is assigned at-most one chunk. In addition this constraint also enforces the complete overlap constraint. The second constraint enforces non-overlap among the assigned chunks. Note that p(U,c) denotes the weighted sum-rate obtained upon co-scheduling the users in U on the chunk indicated by c. We emphasize that there is complete freedom with respect to the computation of p(U,c). Indeed, it can accommodate finite buffer and practical MCS constraints, account for any particular receiver employed by the base station and can also incorporate any rule to assign a transmit antenna and a power level to each user in U over the chunk c.
The first set of J knapsack constraints in (3), where J is arbitrary but fixed, are generic knapsack constraints. Without loss of generality, we assume that the weight of the pair (U,c) in the qth knapsack, βq(U,c), lies in the interval [0,1]. Notice that we can simply drop each vacuous constraint, i.e., each constraint q for which Σ(U,c)∈βq(U,c)≦1. The second set of knapsack constraints are column-sparse binary knapsack constraints. In particular, for each (U,c)∈ and q∈I we have that αq(U,c)∈{0,1}. Further, we have that for each (U,c)∈, Σq∈Iαq(U,c)≦Δ, where Δ is arbitrary but fixed and denotes the column-sparsity level. Note that here the cardinality of I can scale polynomially in KN keeping Δ fixed. Together these two sets of knapsack constraints can enforce a variety of practical constraints, including the control channel and the interference limit constraints. For instance, defining a generic knapsack constraint as
for any given input {tilde over (K)} can enforce that no more than {tilde over (K)} users can be scheduled in a given interval, which represents a coarse control channel constraint. In a similar vein, consider any given choice of a victim adjacent base-station and a sub-band with the constraint that the total interference caused to the victim BS by users scheduled in the cell of interest, over all the RBs in the subband, should be no greater than a specified upper bound. This constraint can readily be modeled using a generic knapsack constraint where the weight of each (U,c)∈ is simply the ratio of the total interference caused by users in U to the victim BS over RBs that are in c as well as the specified subband, and the specified upper bound. The interference is computed using the transmission parameters (such as the power levels, transmit antennas, etc.) that yield the metric p(U,c). A finer modeling of the LTE control channel constraints is more involved (and somewhat tedious) and is given in Appendix B for the interested reader.
Note that for a given K, N, an instance of the problem in (3) consists of a finite set I of indices, a partition {G1, . . . , GL}, metrics {p(U,c)}∀(U,c)∈M and weights {βq(U,c)}, ∀(U,c)∈M, 1≦q≦J and {αq(U,c)}, ∀(U,c)∈M, q∈I. Then, in order to solve (3) for a given instance, we first partition the set M into two parts as M=Mnarrow∪Mwide, where we define Mnarrow={(U,c)∈M:βq(U,c)≦½, ∀1≦q≦J} so that Mwide=M\Mnarrow. We then define J sets, V(1), . . . , V(J) that cover Mwide (note that any two of these sets can mutually overlap) as (U,c)∈V(q) iff βq(U,c)>½ for q=1, . . . , J. Recall that T, J are fixed and note that the cardinality of M, |M|, is O(K^T N^2) and that Mnarrow and {V(q)} can be determined in polynomial time. Next, we propose Algorithm I which possesses the optimality given below. The complexity of Algorithm I, which is essentially determined by that of its module Algorithm IIa, scales polynomially in KN (recall that T is a constant). A detailed discussion on the complexity along with steps to reduce it are deferred to the next section. We offer the following theorem which is proved in Appendix A.
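The split into narrow and wide pairs, together with the covering sets V(1), . . . , V(J), requires a single pass over the pairs. The following is a minimal sketch, assuming the generic knapsack weights are supplied as a dictionary keyed by pair; the function and argument names are illustrative only.

```python
def partition_narrow_wide(pairs, betas):
    """Split pairs into narrow and wide and build the covering sets V(q).

    pairs : iterable of hashable (U, c) pairs
    betas : dict mapping each pair to its list of J weights in [0, 1]
    Returns (narrow, wide, cover) where cover[q] lists the pairs in V(q),
    with q running from 1 to J as in the text.
    """
    narrow, wide = [], []
    cover = {}
    for pair in pairs:
        # Knapsacks in which this pair alone uses more than half the budget.
        heavy = [q for q, b in enumerate(betas[pair], start=1) if b > 0.5]
        if heavy:
            wide.append(pair)
            for q in heavy:
                cover.setdefault(q, []).append(pair)
        else:
            narrow.append(pair)
    return narrow, wide, cover


example_betas = {"a": [0.1, 0.5], "b": [0.6, 0.2], "c": [0.9, 0.7]}
narrow, wide, cover = partition_narrow_wide(["a", "b", "c"], example_betas)
print(narrow, wide, cover)
```

A pair whose weight equals exactly ½ in every knapsack stays in narrow, matching the non-strict inequality in the definition, and a wide pair may appear in more than one V(q).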
Theorem 1. The problem in (3) is APX-hard, i.e., there is an ε>0 such that it is NP-hard to obtain a 1−ε approximation algorithm for (3). Let Ŵopt denote the optimal weighted sum rate obtained upon solving (3) and let Ŵ denote the weighted sum rate obtained upon using Algorithm I. Then, we have that
An interesting observation that follows from the proof of Theorem 1 is that any optimal allocation over wide can include at-most one pair from each V(q), 1≦q≦J. Then since the number of pairs in each V(q), 1≦q≦J is O(KTN2), we can determine an optimal allocation yielding Ŵopt,wide via exhaustive enumeration with a high albeit polynomial complexity (recall that T and J are assumed to be fixed). Thus, by using exhaustive enumeration instead of Algorithm IIb, we can claim the following result.
Corollary 1. Let Ŵopt denote the optimal weighted sum rate obtained upon solving (3) and let Ŵ denote the weighted sum rate obtained upon using Algorithm II albeit with exhaustive enumeration over wide. Then, we have that
For notational simplicity, henceforth unless otherwise mentioned, we assume that all users are mutually compatible, i.e., L=K with Gs={s}, 1≦s≦K.
In this section we present key techniques to significantly reduce the complexity of our proposed local ratio test based multi-user scheduling algorithm. As noted before the complexity of Algorithm I is dominated by that of its component Algorithm IIa. Accordingly, we focus our attention on Algorithm IIa and without loss of generality we assume that =narrow. Notice that hitherto we have assumed that all the metrics {p(U,c):(U,c)∈} are available. In practice, computing these O(K^T N^2) metrics, which are often complicated non-linear functions, is the main bottleneck and indeed must be accounted for in the complexity analysis. Before proceeding, we make the following assumption that is satisfied by all physically meaningful metrics.
Assumption 1. Sub-additivity: We assume that for any (U,c)∈
p(U,c)≦p(U1,c)+p(U2,c), ∀U1,U2:U=U1∪U2. (6)
The following features can then be exploited for a significant reduction in complexity.
A potential drawback of the LRT based algorithm is that some RBs may remain un-utilized, i.e., they may not be assigned to any user. Notice that when the final stack S′ is built in the while-loop of Algorithm IIa, an allocation or pair from the top of stack S is added to stack S′ only if it does not conflict with those already in stack S′. Often multiple pairs from S are dropped due to such conflicts resulting in spectral holes formed by unassigned RBs. To mitigate this problem, we perform a second phase. The second phase consists of running Algorithm IIa again albeit with modified metrics {{hacek over (p)}(U,c):(U,c)∈narrow} which are obtained via the following steps.
A consequence of using the modified metrics is that the second phase has significantly lower complexity since a large fraction of the allocations are disallowed. While the second phase does not offer any improvement in the approximation factor, simulation results presented in the sequel reveal that it offers a good performance improvement with very little added complexity.
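The way spectral holes arise can be seen from the step that builds the final stack S′. The following is a minimal sketch of that step only, assuming the stack S has already been produced and that only generic knapsack constraints are present (the column-sparse constraints are not modeled); the data layout and names are illustrative and are not taken from the algorithm listing.

```python
def unwind_stack(stack, betas, num_knapsacks):
    """Pop pairs from `stack` (last pushed first) and keep those that fit.

    stack : list of (U, c), U a frozenset of users, c a frozenset of RB indices
    betas : dict mapping (U, c) to a list of num_knapsacks weights in [0, 1]
    """
    selected = []
    used_users, used_rbs = set(), set()
    load = [0.0] * num_knapsacks
    for users, chunk in reversed(stack):
        if users & used_users or chunk & used_rbs:
            continue  # user or RB conflict with an already selected pair
        w = betas[(users, chunk)]
        if any(load[q] + w[q] > 1.0 for q in range(num_knapsacks)):
            continue  # adding the pair would violate a knapsack constraint
        selected.append((users, chunk))
        used_users |= users
        used_rbs |= chunk
        load = [load[q] + w[q] for q in range(num_knapsacks)]
    return selected


stack = [
    (frozenset({1}), frozenset({1, 2})),   # pushed first
    (frozenset({2}), frozenset({2, 3})),
    (frozenset({3}), frozenset({4})),      # pushed last, popped first
]
betas = {pair: [0.4] for pair in stack}
print(unwind_stack(stack, betas, 1))
```

In this toy run the pair pushed first is dropped because it shares RB 2 with a pair already selected, so RB 1 is left unassigned, which is the kind of hole the second phase is meant to fill.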
In this section we evaluate key features of our proposed algorithm over an idealized single-cell setup. In particular, we simulate an uplink wherein the BS is equipped with four receive antennas. The system has 280 sub-carriers divided into 20 RBs (of size 14 sub-carriers each) available as data subcarriers that are used for serving the users. We assume 10 active users all of whom have identical maximum transmit powers. We model the fading channel between each user and the BS as a six-path equal gain i.i.d. Rayleigh fading channel. In all the results given below we assume an infinitely backlogged traffic model. For simplicity, we assume that there are no knapsack constraints and that at-most two users can be co-scheduled on an RB (i.e., J=0, Δ=0 and T=2). Further, each user can employ ideal Gaussian codes and upon being scheduled, divides its maximum transmit power equally among its assigned RBs. Notice that since =narrow we can directly use Algorithm IIa.
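For a single scheduled user the equal power split leads to a simple per-chunk rate computation. The following is a minimal sketch, assuming a single transmitted stream per user, ideal Gaussian codes, and that the per-RB effective channel gains (for example the squared norm of the channel vector after receive combining) are already available; the function name, the unit noise power default and the choice of base-2 logarithm are assumptions made for illustration.

```python
import math


def single_user_rate(channel_gains, max_power, noise_power=1.0):
    """Sum of per-RB rates when max_power is divided equally over the RBs.

    channel_gains : effective channel gain on each assigned RB (hypothetical input)
    """
    if not channel_gains:
        return 0.0
    per_rb_power = max_power / len(channel_gains)
    return sum(math.log2(1.0 + per_rb_power * g / noise_power)
               for g in channel_gains)


# Same total power spread over one RB versus two RBs of equal gain.
print(single_user_rate([3.0], 1.0))
print(single_user_rate([3.0, 3.0], 1.0))
```

Multiplying such a rate by the user weight gives one possible single-user metric p(U,c) with |U|=1; the metric for co-scheduled users depends on the receiver employed at the BS and is not covered by this sketch.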
We next propose a sequential LRT based MU scheduling approach that yields a scheduling decision over narrow. As before, our focus is on avoiding as many metric computations as possible. The idea is to implement the LRT based MU scheduling algorithm in T iterations, where we recall T denotes the maximum number of users that can be co-scheduled on an RB. In particular, in the sth iteration where 1≦s≦T−1, we first perform the following steps to obtain metrics {hacek over (p)}(U,c), ∀(U,c)∈narrow, where only a few of these metrics are positive, and then use these metrics in Algorithm IIa to obtain a tentative scheduling decision.
In the last iteration, i.e. when s=T, we initialize {hacek over (p)}(U,c)=p(U,c), ∀(U,c)∈narrow. Then, using the set S′ obtained as the output of the (T−1)th iteration, we perform the two aforementioned steps. Additionally, to ensure non-overlapping chunk allocation, for each (U,c)∈S′ we set
{hacek over (p)}(U′,c′)=0 if c′∩c≠Φ & U′∩U=Φ, ∀(U′,c′)∈narrow. (14)
Notice that in each iteration only a small subset out of the set of all metrics is selected, which in particular is that whose corresponding pairs are compatible (as defined in the aforementioned conditions) with the output tentative scheduling decision of the previous iteration. These compatibility conditions ensure that the set of non-zero metrics (i.e., the chosen metrics) is small and each iteration builds upon the decision of the previous iteration. Indeed, in each iteration any (user set, chunk) allocation made by the previous iteration can only be altered by adding one additional user and/or by expanding the chunk. Next, we offer an approximation result for the sequential LRT based MU scheduling that holds under mild assumptions.
Assumption 2. Suppose F is any allocation {(U,c)} that is feasible for (3). Then F is downward closed in the following sense. Any allocation F′ constructed as F′={(U′,c):U′⊂U & (U,c)∈F} is also feasible.
Proposition 1. Suppose that Assumptions 1 and 2 are satisfied. Let the weighted sum rate yielded by the sequential LRT based MU scheduling over narrow be denoted by Ŵseq−narrow. Then,
In a practical cellular system the number of active users can be large. Indeed the control channel constraints may limit the BS to serve a much smaller subset of users. It thus makes sense from a complexity stand-point to pre-select a pool of good users and then use the MU scheduling algorithm on the selected pool of users. Here we propose a few user pre-selection algorithms. For convenience, wherever needed, we assume that at-most two users can be co-scheduled on an RB (i.e., T=2) which happens to be the most typical value.
Before proceeding we need to define some terms that will be required later. Suppose that each user has one transmit antenna and let hu,j denote the effective channel vector seen at the BS from user u on RB j, where 1≦u≦K and 1≦j≦N. Note that the effective channel vector includes the fading as well as the path loss factor and a transmit power value. Then, letting wu denote the PF weight of user u, we define the following metrics:
We are now ready to offer our user pre-selection rules where a pool of {tilde over (K)} users must be selected from the K active users. Notice that to reduce complexity, all rules neglect the contiguity and the complete overlap constraints.
It can be shown that f:2{1, . . . , K}→IR+ is a monotonic sub-modular set function [15]. As a result, the user pre-selection problem
can be sub-optimally solved by adapting a simple greedy algorithm [19], which offers a half approximation [15].
Proposition 2. The set function h(.) defined in (23) is a monotonic sub-modular set function. Thus the problem
can be solved sub-optimally (with a ½ approximation) by a simple greedy algorithm.
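The greedy selection referred to above can be sketched generically. The following is a minimal sketch, assuming the constraint is a cap on the pool size and that the set function is supplied by the caller; the coverage function used in the example is a stand-in chosen only because it is monotonic and sub-modular, and is not one of the metrics defined in this section.

```python
def greedy_select(candidates, set_function, pool_size):
    """Greedily grow a set of at most pool_size elements.

    set_function takes a frozenset and returns a number; it is assumed to be
    monotonic and sub-modular, which is what the greedy guarantee relies on.
    """
    chosen = frozenset()
    remaining = set(candidates)
    while remaining and len(chosen) < pool_size:
        base = set_function(chosen)
        # Pick the candidate with the largest marginal gain.
        best = max(remaining, key=lambda u: set_function(chosen | {u}) - base)
        chosen = chosen | {best}
        remaining.discard(best)
    return chosen


# Stand-in set function: number of distinct RBs on which the pool is "good".
good_rbs = {1: {1, 2, 3, 7}, 2: {3, 4, 6}, 3: {1, 2}, 4: {5}}


def coverage(pool):
    return len(set().union(*(good_rbs[u] for u in pool))) if pool else 0


print(sorted(greedy_select(good_rbs, coverage, 2)))
```

Each step needs one evaluation of the set function per remaining candidate, so selecting a pool of {tilde over (K)} users from K costs on the order of K{tilde over (K)} evaluations.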
As a benchmark to compare the performance of the proposed user pre-selection algorithms we can consider the case where LRT MU scheduling is employed without user pre-selection but where an additional knapsack constraint is used to enforce the limit on the number of users that can be scheduled in an interval. It can be verified that this can be achieved by defining a knapsack constraint in (3) as
We now present the performance of our MU scheduling algorithms via detailed system level simulations. The simulation parameters conform to those used in 3GPP LTE evaluations and are given in Table 4. In all cases inter-cell interference suppression (IRC) is employed by each base-station (BS).
We first consider the case when each cell (or sector) has an average of 10 users and where there are no knapsack constraints. In Table 5 we report the cell average and cell edge spectral efficiencies. The percentage gains shown for the MU scheduling schemes are over the baseline LRT based single-user scheduling scheme. Note that for the first three scheduling schemes we employed the second phase described in Section 4. As seen from Table 5, MU scheduling in conjunction with an advanced SIC receiver at the BS can result in very significant gains in terms of cell average throughput (about 27%) along with good cell edge gains. For the simpler MMSE receiver, we see significant cell average throughput gains (about 18%) but a degraded cell edge performance. We note that it is possible to trade off a small fraction of the cell average gains for a large cell edge performance improvement by altering the PF rule. Finally, the last two reported schemes are based on the sequential-LRT method described in Section 6. We notice that sequential-LRT based scheduling provides significant cell average gains while retaining the cell edge performance of SU scheduling. Thus, the sequential-LRT based scheduling method is an attractive way to trade off some cell average throughput gains for a reduction in complexity.
Next, in Tables 6 and 7 we consider LRT based MU scheduling, with the second phase described in Section 4, for the case when the BS employs the MMSE receiver and the case when it employs the SIC receiver, respectively. In each case we assume that an average of 15 users are present in each cell and at-most 7 first-transmission users can be scheduled in each interval. Thus, a limit on the number of scheduled users might have to be enforced in each scheduling interval. As a benchmark, we enforce this constraint (if it is required) using one knapsack constraint as described in Section 7. Note that upon specializing the result in Theorem 1 (with wide=Φ, T=2, Δ=0 and J=1) we see that the LRT based MU scheduling algorithm guarantees an approximation factor of ⅕. Then, we examine the scenario where a pool of K̃=7 users is pre-selected whenever the number of first-transmission users is larger than 7. The LRT based MU scheduling algorithm is then employed on this pool without any constraints. In Table 6 we have used the first, second and third pre-selection rules from Section 7 whereas in Table 7 we have used the first, second and fourth pre-selection rules. It is seen that the simple rule 1 provides a superior performance compared to the benchmark. Indeed, it is attractive since it involves computation of only single-user metrics. The other rule which possesses this feature (rule 2), however, provides much less improvement, mainly because it is much more aligned to single-user scheduling. Rules 3 and 4 involve computation of metrics that involve user-pairing and hence incur higher complexity. For the MMSE receiver, the gain of rule 3 over rule 1 is marginal, mainly because the metric in rule 3 is not sub-modular and hence cannot be well optimized by the simple greedy rule.
On the other hand, considering the SIC receiver, the gain of rule 4 over rule 1 is larger because the metric used in rule 4 is indeed sub-modular and hence can be well optimized by the simple greedy rule.
We considered resource allocation in the 3GPP LTE cellular uplink which allows for transmit antenna selection for each scheduled user as well as multi-user scheduling, wherein multiple users can be assigned the same time-frequency resource. We showed that the resulting resource allocation problem, which must comply with several practical constraints, is NP-hard. We then proposed constant-factor polynomial-time approximation algorithms and demonstrated their performance via simulations.
The following enumerated references have been referenced throughout the detailed description pertaining to the 3GPP cellular uplink. We list the references here for completeness, although they have been cited on separate disclosure documents as appropriate.
A Appendix: Proof of Theorem 1
Let us specialize (3) to instances where all the knapsack constraints are vacuous, where L=K and Gs={s}, 1≦s≦K and where p(U,c)=0 whenever |U|≧2 for all (U,c)∈. Then (3) reduces to the SU scheduling problem considered in [7,16] which was shown there to be APX-hard. Consequently, we can assert that (3) is APX-hard.
Next, consider Algorithm IIa, which outputs a feasible allocation over narrow yielding a weighted sum rate Ŵnarrow. Let Ŵopt,narrow denote the optimal weighted sum rate obtained by solving (3) albeit where all pairs (U,c) are restricted to lie in narrow. We will prove that
Ŵopt,narrow≦(T+1+Δ+2J)Ŵnarrow. (25)
We present a proof that invokes results from [7] as much as possible and highlights mainly the key differences which allow us to co-schedule multiple users on a chunk and satisfy multiple knapsack constraints. Note that Algorithm IIa builds up the stack S in N steps. In particular let Sj, j=1, . . . , N be the element that is added in the jth step and note that either Sj=Φ or it is equal to some pair (U*j,c*j). As in [7], we use two functions p1(j):narrow→IR+ and p2(j):narrow→IR+ for j=0, . . . , N to track the function p′(,) as the stack S is being built up over N steps and in particular we set p1(0)(U,c)=0, ∀(U,c)∈narrow and p2(0)(U,c)=p(U,c), ∀(U,c)∈narrow. For our problem at hand, we define {p1(j)(U,c),p2(j)(U,c)} recursively as
where (x)+=max{x,0}, x∈IR, X(.) denotes the indicator function and
Hence, we have that
p2(j−1)(U,c)=p2(j)(U,c)+p1(j)(U,c), ∀(U,c)∈narrow, j=1, . . . , N. (27)
It can be noted that
p2(j)(U,c)≦0, ∀(U,c)∈narrow:Tail(c)≦j
p2(k)(U,c)≦p2(j)(U,c), ∀(U,c)∈narrow & k≧j. (28)
Further, to track the stack S′ which is built in the while loop of the algorithm, we define stacks {S*j}, j=0, . . . , N, where S*N=Φ and S*j is the value of S′ after the algorithm has tried to add ∪_{m=j+1}^{N} Sm to S′ (starting from S′=Φ) so that S*0 is the stack S′ that is the output of the algorithm. Note that S*j+1⊂S*j⊂S*j+1∪Sj+1. Next, for j=0, . . . , N, we let W(j)opt denote the optimal solution to (3) but where is replaced by narrow and the function p(,) is replaced by p2(j)(,). Further, let W(j)=Σ(U,c)∈S*
W(j)opt≦(T+1+Δ+2J)W(j), ∀j=N, . . . , 0, (29)
which includes the claim in (25) at j=0. The base case W(N)opt≦(T+1+Δ+2J)W(N) is readily true since S*N=Φ and p2(N)(U,c)≦0, ∀(U,c)∈narrow. Assume that (29) holds for some j. We focus only on the main case in which Sj=(U*j,c*j)≠Φ (the remaining case holds trivially true). Note that since (U*j,c*j) is added to the stack S in the algorithm, p2(j−1)(U*j,c*j)>0. Then from the update formulas (26), we must have that p2(j)(U*j,c*j)=0. Using the fact that S*j−1⊂S*j∪(U*j,c*j) together with the induction hypothesis, we can conclude that
Next, we will show that
Towards this end, suppose that S*j−1=S*j∪(U*j,c*j). Then, recalling (26) we can deduce that (31) is true since p1(j)(U*j,c*j)=p2(j−1)(U*j,c*j). Suppose now that S*j−1=S*j. In this case we can have two possibilities. In the first one (U*j,c*j) cannot be added to S*j due to the presence of a pair (U′,c′)∈S*j for which at-least one of these three conditions is satisfied: ∃Gs:U′∩Gs≠Φ & U*j∩Gs≠Φ; c′∩c*j≠Φ; and ∃q∈I:αq(U′,c′)=αq(U*j,c*j)=1. Since any pair (U′,c′)∈S*j was added to S in the algorithm after the jth step, from the second inequality in (28) we must have that p2(j−1)(U′,c′)>0. Recalling (26) we can then deduce that p1(j)(U′,c′)=p2(j−1)(U*j,c*j) which proves (31). In the second possibility, (U*j,c*j) cannot be added to S*j due to a generic knapsack constraint being violated. In other words, for some
q∈{1, . . . , J}, we have that
Since (U*j,c*j)∈narrow, βq(U*j,c*j)≦½ so that
which along with (26) also proves (31). Thus, we have established the claim in (31).
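The conflict conditions used in the argument above can be illustrated with a small feasibility check. This is only a sketch under stated assumptions: the `Pair` container, its field names and the toy numbers are hypothetical, `alpha` holds the indices of the binary knapsack constraints in I that a pair loads, `beta` maps the index of a generic knapsack constraint to the weight of the pair, and the generic knapsack weights are assumed to be normalized so that each capacity equals one.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Pair:
    users: FrozenSet[int]                    # the user set U
    chunk: FrozenSet[int]                    # the chunk c, as a set of RB indices
    alpha: FrozenSet[int] = frozenset()      # binary constraints q with alpha_q = 1
    beta: Dict[int, float] = field(default_factory=dict)  # generic weights beta_q


def can_add(new: Pair, stack: List[Pair], groups: List[FrozenSet[int]],
            num_generic: int) -> bool:
    """Return True if `new` can join `stack` without any conflict."""
    for old in stack:
        # Condition 1: both pairs contain users from a common group G_s.
        if any(old.users & g and new.users & g for g in groups):
            return False
        # Condition 2: the two chunks overlap.
        if old.chunk & new.chunk:
            return False
        # Condition 3: both pairs load the same binary knapsack constraint.
        if old.alpha & new.alpha:
            return False
    # Generic knapsack constraints (capacity assumed normalized to one).
    for q in range(num_generic):
        load = sum(p.beta.get(q, 0.0) for p in stack) + new.beta.get(q, 0.0)
        if load > 1.0:
            return False
    return True


# Toy example: three single-user groups and one generic knapsack constraint.
groups = [frozenset({1}), frozenset({2}), frozenset({3})]
a = Pair(frozenset({1}), frozenset({1, 2}), beta={0: 0.5})
b = Pair(frozenset({2}), frozenset({3}), beta={0: 0.4})
c = Pair(frozenset({3}), frozenset({4}), beta={0: 0.3})
d = Pair(frozenset({1, 3}), frozenset({5}))

b_fits = can_add(b, [a], groups, num_generic=1)       # no conflict, load 0.9
c_fits = can_add(c, [a, b], groups, num_generic=1)    # generic knapsack exceeded
d_fits = can_add(d, [a], groups, num_generic=1)       # user 1 already scheduled
```

In the toy example the pair `c` is rejected only by the generic knapsack constraint. Its own weight is at most ½, so the pairs already on the stack must carry a load larger than ½ on that constraint, which is the observation the second possibility relies on.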
Finally, letting V(j)opt denote the optimal solution to (3) but where is replaced by narrow and the function p(,) is replaced by p1(j)(,), we will show that
Towards this end, from (26) we note that for any pair (U,c)∈narrow, p1(j)(U,c)≦p2(j−1)(U*j,c*j). Let V1(j)opt be an optimal allocation of pairs that results in V(j)opt. For any two pairs (U1,c1), (U2,c2)∈V1(j)opt we must have that for each Gs, 1≦s≦L, at-least one of U1∩Gs and U2∩Gs is Φ, as well as c1∩c2=Φ. In addition, |U1| and |U2| are no greater than T. Thus we can have at-most T such pairs {(Ui,ci)} in V1(j)opt for which ∃Gs:Ui∩Gs≠Φ & U*j∩Gs≠Φ. Further, using the first inequality in (28) we see that any pair (U,c) for which c∩c*j≠Φ and p1(j)(U,c)=p2(j−1)(U*j,c*j) must have Tail(c)≧j so that j∈c. Thus, V1(j)opt can include at-most one pair (U,c) for which c∩c*j≠Φ. Next, there can be at-most Δ constraints in I for which αq(U*j,c*j)=1, q∈I is satisfied. For each such constraint q∈I we can pick at-most one pair (U,c) for which αq(U,c)=1 and p1(j)(U,c)=p2(j−1)(U*j,c*j). Thus, V1(j)opt can include at-most Δ such pairs, one for each constraint. Now the remaining pairs in V1(j)opt (whose users do not intersect U*j, whose chunks do not intersect c*j and which do not violate any binary knapsack constraint in the presence of (U*j,c*j)) must satisfy the generic knapsack constraints. Let these pairs form the set Ṽ1(j)opt so that
Combining these observations we have that
which is the desired result in (34).
Thus, using (30), (31) and (34) we can conclude that
which proves the induction step and proves the claim in (25).
Let us now consider the remaining part which arises when wide≠Φ. Consider first Algorithm IIb which outputs a feasible allocation over wide yielding a weighted sum rate Ŵwide. Let Ŵopt,wide denote the optimal weighted sum rate obtained by solving (3) albeit where all pairs (U,c) are restricted to lie in wide. We will prove that
Ŵopt,wide≦JŴwide. (37)
Let Vopt,wide be an optimal allocation of pairs from wide that results in a weighted sum rate Ŵopt,wide. Clearly, in order to meet the knapsack constraints, Vopt,wide can include at-most one pair from each V(q), 1≦q≦J so that there can be at-most J pairs in Vopt,wide. Thus, by selecting the pair yielding the maximum weighted sum rate we can achieve at-least Ŵopt,wide/J. The greedy algorithm first selects the pair yielding the maximum weighted sum rate among all pairs in wide and then attempts to add pairs to monotonically improve the objective. Thus, we can conclude that (37) must be true.
Notice that we select Ŵ=max{Ŵnarrow,Ŵwide} so that
It is readily seen that
Ŵopt≦Ŵopt,narrow+Ŵopt,wide. (39)
(38) and (39) together prove the theorem.
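As a quick numerical check of how the two bounds combine, the sketch below evaluates the resulting guarantee. It uses the factor T+1+Δ+2J from (29) for the narrow part and the at-most-J-pairs argument for the wide part. The function name is illustrative, and adding the two factors is an assumption that follows the steps above (the maximum of two terms is at least each term) rather than a formula quoted from the statement of Theorem 1.

```python
from fractions import Fraction


def approximation_factor(T: int, delta: int, J: int, wide_empty: bool) -> Fraction:
    """Guarantee implied by W_opt <= W_opt,narrow + W_opt,wide when the
    better of the narrow and wide allocations is selected."""
    narrow = T + 1 + delta + 2 * J      # factor from (29) at j = 0
    wide = 0 if wide_empty else J       # at most J pairs in any wide allocation
    return Fraction(1, narrow + wide)


# Setting used in the simulations: wide is empty, T=2, delta=0, J=1.
factor = approximation_factor(T=2, delta=0, J=1, wide_empty=True)
```

For the setting used in the simulations this evaluates to ⅕, which matches the approximation factor quoted for the LRT based MU scheduling algorithm.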
B Appendix: Modeling 3GPP LTE Control Channel Constraints
In the 3GPP LTE system, the minimum allocation unit in the downlink control channel is referred to as the control channel element (CCE). Let {1, . . . , R} be a set of CCEs available for conveying UL grants. A contiguous chunk of CCEs from {1, . . . , R} that can be assigned to a user is referred to as a PDCCH. The size of each PDCCH is referred to as an aggregation level and must belong to the set {1, 2, 4, 8}. Let D denote the set of all possible such PDCCHs. For each user the BS first decides an aggregation level, based on its average (long-term) SINR. Then, using that user's unique identifier (ID) together with its aggregation level, the BS obtains a small subset of non-overlapping PDCCHs from D (of cardinality no greater than 6) that are eligible to be assigned to that user. Let Du denote this subset of eligible PDCCHs for a user u. Then, if user u is scheduled, only one PDCCH from Du must be assigned to it, i.e., must be used to convey its UL grant. Note that while the PDCCHs that belong to the eligible set of any one user are non-overlapping, those that belong to eligible sets of any two different users can overlap. As a result, the BS scheduler must also enforce the constraint that two PDCCHs that are assigned to two different scheduled users, respectively, must not overlap.
Next, the constraint that each scheduled user can be assigned only one PDCCH from its set of eligible PDCCHs can be enforced as follows. First, define a set Vu containing |Du| virtual users for each user u, 1≦u≦K, where each virtual user in Vu is associated with a unique PDCCH in Du and all the parameters (such as uplink channels, queue size etc.) corresponding to each virtual user in Vu are identical to those of user u. Let Ũ be the set of all possible subsets of such virtual users, such that each subset has a cardinality no greater than T and contains no more than one virtual user corresponding to the same user. Defining =×C, we can then pose (3) over after setting L=K with Gs=Vs, 1≦s≦K. Consequently, by defining the virtual users corresponding to each user as being mutually incompatible, we have enforced the constraint that at-most one virtual user for each user can be selected, which in turn is equivalent to enforcing that each scheduled user can be assigned only one PDCCH from its set of eligible PDCCHs.
Finally, consider the set of all eligible PDCCHs, {Du}, u=1, . . . , K. Note that this set is decided by the set of active users and their long-term SINRs. Recall that each PDCCH in {Du}, u=1, . . . , K, maps to a unique virtual user. To ensure that PDCCHs that are assigned to two virtual users corresponding to two different users do not overlap, we can define multiple binary knapsack constraints. Clearly R such knapsack constraints suffice (indeed, this can be many more than needed), where each constraint corresponds to one CCE and has a weight of one for every pair (Ũ,c)∈ wherein Ũ contains a virtual user corresponding to a PDCCH which includes that CCE. Then, a useful consequence of the fact that in LTE the set Du for each user u is extracted from D via a well-designed hash function (which accepts each user's unique ID as input) is that these resulting knapsack constraints are column-sparse.
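The construction above can be sketched in a few lines. The sketch is illustrative only: the eligible PDCCH sets are made-up toy data (in LTE they would come from the hash function mentioned above, which is not reproduced here), a PDCCH is modelled simply as a first CCE plus an aggregation level, and all names such as `build_virtual_users` are hypothetical.

```python
from typing import Dict, List, Tuple

Pdcch = Tuple[int, int]            # (first CCE, aggregation level)
VirtualUser = Tuple[int, Pdcch]    # (real user id, PDCCH it would use)


def cces(pdcch: Pdcch) -> range:
    """CCE indices occupied by a PDCCH: first, ..., first + level - 1."""
    first, level = pdcch
    return range(first, first + level)


def build_virtual_users(eligible: Dict[int, List[Pdcch]]) -> Dict[int, List[VirtualUser]]:
    """One virtual user per eligible PDCCH. The list for user u plays the
    role of the group G_u, whose members are mutually incompatible."""
    return {u: [(u, p) for p in pdcchs] for u, pdcchs in eligible.items()}


def cce_constraints(groups: Dict[int, List[VirtualUser]],
                    num_cces: int) -> Dict[int, List[VirtualUser]]:
    """For each CCE r in 1..R, list the virtual users whose PDCCH includes r.
    At most one of them may be scheduled (one binary knapsack constraint)."""
    rows: Dict[int, List[VirtualUser]] = {r: [] for r in range(1, num_cces + 1)}
    for vusers in groups.values():
        for vu in vusers:
            for r in cces(vu[1]):
                rows[r].append(vu)
    return rows


def is_valid(selection: List[VirtualUser],
             rows: Dict[int, List[VirtualUser]]) -> bool:
    users = [u for u, _ in selection]
    if len(users) != len(set(users)):      # one PDCCH per real user
        return False
    chosen = set(selection)
    # No CCE may be used by more than one selected virtual user.
    return all(sum(vu in chosen for vu in row) <= 1 for row in rows.values())


# Toy data with R = 8 CCEs (hypothetical eligible sets D_u).
eligible = {1: [(1, 2), (3, 2)], 2: [(1, 4), (5, 4)], 3: [(3, 1), (4, 1)]}
groups = build_virtual_users(eligible)
rows = cce_constraints(groups, num_cces=8)

ok = is_valid([(1, (1, 2)), (2, (5, 4)), (3, (3, 1))], rows)
overlap = is_valid([(1, (1, 2)), (2, (1, 4))], rows)
same_user = is_valid([(1, (1, 2)), (1, (3, 2))], rows)
```

In this model each virtual user appears in exactly as many per-CCE constraints as its aggregation level, that is, in at most 8 of the R rows, which is one way to see the column sparsity noted above.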
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/512,692 filed Jul. 28, 2011 and U.S. Provisional Patent Application Ser. No. 61/587,177 filed Jan. 17, 2012, the entire contents of which are incorporated by reference as if set forth at length herein.
Number | Date | Country |
---|---|---|
20130170366 A1 | Jul 2013 | US |

Number | Date | Country |
---|---|---|
61512692 | Jul 2011 | US |
61587177 | Jan 2012 | US |