1. Field of the Invention
This invention relates generally to telecommunications, and, more particularly, to detection in wireless communications.
2. Description of the Related Art
In the field of wireless telecommunications, such as cellular telephony, a system typically includes a plurality of base stations distributed within an area to be serviced by the system. Various users within the area, fixed or mobile, may then access the system and, thus, other interconnected telecommunications systems, via one or more of the base stations. Typically, a user maintains communications with the system as the user passes through an area by communicating with one and then another base station, as the user moves. The user may communicate with the closest base station, the base station with the strongest signal, the base station with a capacity sufficient to accept communications, etc.
Commonly, each base station is constructed to process a plurality of communications sessions with a plurality of users in parallel. In this way, the number of base stations may be limited while still providing communications capabilities to a large number of simultaneous users. Typically, each user is free to transmit information to the base station substantially unregulated. Moreover, each user is free to transmit any of a wide variety of information from a known universe of symbols. That is, multiple users may transmit a complex array of information to the base station at the same time. Further, the information transmitted from each user may be subjected to unique conditions, such as noise, attenuation, etc. Given the variety of signals that may be sent and the variety of complicating factors that may be applied to these signals, the base station has a daunting task of accurately and quickly determining what each user has transmitted. The base station's ability to handle this task limits the total number of users that may be accommodated.
The present invention is directed to overcoming, or at least reducing, the effects of one or more of the problems set forth above.
In one aspect of the instant invention, a method is provided for performing a tree search. The method comprises identifying a set of candidates and producing interim and final characteristics associated with each of the candidates by a plurality of parallel tasks. Each candidate is removed from the set of candidates in response to determining that at least one of the interim and final characteristics exceeds at least one preselected setpoint. A set of final candidates is built from the set of candidates having a final characteristic falling below the preselected setpoint.
In another aspect of the instant invention, a computer readable program storage device is encoded with instructions that, when executed by a computer, perform a method for searching a tree. The method comprises identifying a set of candidates and producing interim and final characteristics associated with each of the candidates by a plurality of parallel tasks. Each candidate is removed from the set of candidates in response to determining that at least one of the interim and final characteristics exceeds at least one preselected setpoint. A set of final candidates is built from the set of candidates having a final characteristic falling below the preselected setpoint.
The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements, and in which:
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Illustrative embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
Turning now to the drawings, and specifically referring to
In one embodiment, a plurality of the base stations 130 may be coupled to a Radio Network Controller (RNC) 138 by one or more connections 139, such as T1/E1 lines or circuits, ATM circuits, cables, optical digital subscriber lines (DSLs), and the like. Although only two RNCs 138 are illustrated, those skilled in the art will appreciate that a plurality of RNCs 138 may be utilized to interface with a large number of the base stations 130. Generally, the RNC 138 operates to control and coordinate the base stations 130 to which it is connected. The RNC 138 of
The RNC 138 is, in turn, coupled to a Core Network (CN) 165 via a connection 145, which may take on any of a variety of forms, such as T1/E1 lines or circuits, ATM circuits, cables, optical digital subscriber lines (DSLs), and the like. Generally the CN 140 operates as an interface to a data network 125 and/or to a public telephone system (PSTN) 160. The CN 140 performs a variety of functions and operations, such as user authentication; however, a detailed description of the structure and operation of the CN 140 is not necessary to an understanding and appreciation of the instant invention. Accordingly, to avoid unnecessarily obfuscating the instant invention, further details of the CN 140 are not presented herein.
The data network 125 may be a packet-switched data network, such as a data network according to the Internet Protocol (IP). One version of IP is described in Request for Comments (RFC) 791, entitled “Internet Protocol,” dated September 1981. Other versions of IP, such as IPv6, or other connectionless, packet-switched standards may also be utilized in further embodiments. A version of IPv6 is described in RFC 2460, entitled “Internet Protocol, Version 6 (IPv6) Specification,” dated December 1998. The data network 125 may also include other types of packet-based data networks in further embodiments. Examples of such other packet-based data networks include Asynchronous Transfer Mode (ATM), Frame Relay networks, and the like.
As utilized herein, a “data network” may refer to one or more communication networks, channels, links, or paths, and systems or devices (such as routers) used to route data over such networks, channels, links, or paths.
Thus, those skilled in the art will appreciate that the communications system 100 facilitates communications between the users 120 and the data network 125. It should be understood, however, that the configuration of the communications system 100 of
Unless specifically stated otherwise, or as is apparent from the discussion, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other such information storage, transmission or display devices.
Referring now to
In the illustrated embodiment, the users 120a, 120b are substantially similar at least at a functional block diagram level. Those skilled in the art will appreciate that while the users 120a, 120b are illustrated as being functionally similar in the instant embodiment, substantial variations may occur without departing from the spirit and scope of the instant invention. For purposes of describing the operation of the instant invention it is useful to describe the users 120a, 120b as being functionally similar. Thus, for the instant embodiment, the structure and operation of the users 120a, 120b is discussed herein without reference to the “a” and “b” suffixes on their element numbers, such that a description of the operation of the user 120 applies to both of the users 120a, 120b.
The user 120 shares certain functional attributes with the base station 130. For example, the user 120 includes a controller 250, an antenna 255 and a plurality of channel types: a shared channel type 260, a data channel type 270, and a control channel type 280. The controller 250 generally operates to control both the transmission and reception of data and control signals over the antenna 255 and the plurality of channel types 260, 270, 280.
Normally, the channel types 260, 270, 280 in the user 120 communicate with the corresponding channel types 220, 230, 240 in the base station 130. Under the operation of the controllers 210, 250 the channel types 220, 260; 230, 270; 240, 280 are used to effect communications from the user 120 to the base station 130. For example, in one embodiment of the instant invention, the base station 130 receives information from the users 120a, 120b over one or more of the channels 220, 230, 240 and performs a predefined search technique for identifying the information or symbols that the users 120a, 120b have transmitted. As discussed above, the accuracy and speed of the search technique can have a significant impact on the number of users 120 that a base station 130 can support.
Consider a multi-user system with M users and N different symbols that may be received from each user, which can be represented by:
y=Hs+n (1)
where y is an N×1 vector of received symbols, H is an N×M complex matrix representing both a channel and spreading associated with the transmitted symbol, s is an M×1 vector representing the transmitted symbols, and n is an N×1 vector representing additive white Gaussian noise. This is, of course, a simplified model in which users are assumed to be synchronous. The simplified model is useful for illustrating the principles of the instant invention, but is not intended to limit the spirit or scope of the instant invention.
Estimating the transmitted symbols may begin with finding an unconstrained maximum likelihood solution that will become the center of a search sphere for a subsequent constrained maximum likelihood solution. The unconstrained maximum likelihood solution is given by a Moore-Penrose pseudo-inverse:
$$\hat{s} = (H^H H)^{-1} H^H y \qquad (2)$$
The constrained maximum likelihood solution forces the result onto a lattice Λ of permissible solutions. The constrained maximum likelihood solution is then:
$$s_{ML} = \arg\min_{s \in \Lambda} \lVert y - Hs \rVert^2 \qquad (3)$$
It has been shown that solving equation (3) is equivalent to solving:
$$s_{ML} = \arg\min_{s \in \Lambda} (s-\hat{s})^H H^H H (s-\hat{s}) \qquad (4)$$
where ŝ is the unconstrained maximum likelihood solution as defined in equation (2).
Using a Cholesky or QR decomposition, an upper triangular matrix U may be obtained such that $H^H H = U^H U$ with non-negative diagonal elements. This allows equation (4) to be simplified to:
$$s_{ML} = \arg\min_{s \in \Lambda} (s-\hat{s})^H U^H U (s-\hat{s}) \qquad (5)$$
Rather than consider all points (equivalent to a brute-force search), it may be useful to only consider the set of points lying within a hyper-sphere of radius r, centered at ŝ.
$$(s-\hat{s})^H U^H U (s-\hat{s}) \le r^2 \qquad (6)$$
Or equivalently,
$$\sum_{i=1}^{M} \left| \sum_{j=i}^{M} u_{ij} (s_j - \hat{s}_j) \right|^2 \le r^2 \qquad (7)$$
where $u_{ij}$ represents elements of the upper triangular matrix U. The diagonal elements of U are real and non-negative, whereas the off-diagonal elements may be complex. Consideration of this subset of points is described as a tree search, where each level of the tree corresponds to a row of U in equation (5).
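The relationship between equations (2) through (6) can be checked numerically. The following is a minimal sketch in plain Python, assuming a small full-rank H chosen purely for illustration; the helper names (`cholesky_upper`, `solve_normal_equations`, `sphere_cost`) are hypothetical and are not part of the described embodiment. It factors H^H H into U^H U, solves for the unconstrained solution ŝ, and evaluates the row-by-row cost of a lattice point.

```python
import math

def conj_transpose(A):
    """Hermitian transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def cholesky_upper(G):
    """Upper-triangular U with U^H U = G, for Hermitian positive-definite G."""
    m = len(G)
    U = [[0j] * m for _ in range(m)]
    for i in range(m):
        # Diagonal entries come out real and positive.
        d = G[i][i].real - sum(abs(U[k][i]) ** 2 for k in range(i))
        U[i][i] = complex(math.sqrt(d))
        for j in range(i + 1, m):
            acc = G[i][j] - sum(U[k][i].conjugate() * U[k][j] for k in range(i))
            U[i][j] = acc / U[i][i]
    return U

def solve_normal_equations(U, b):
    """Solve U^H U s_hat = b by forward then backward substitution."""
    m = len(U)
    z = [0j] * m
    for i in range(m):                       # forward pass: U^H z = b
        z[i] = (b[i] - sum(U[k][i].conjugate() * z[k] for k in range(i))) / U[i][i]
    s_hat = [0j] * m
    for i in reversed(range(m)):             # backward pass: U s_hat = z
        s_hat[i] = (z[i] - sum(U[i][j] * s_hat[j] for j in range(i + 1, m))) / U[i][i]
    return s_hat

def sphere_cost(U, s, s_hat):
    """Left-hand side of equation (6), accumulated one row of U at a time."""
    m = len(U)
    return sum(abs(sum(U[i][j] * (s[j] - s_hat[j]) for j in range(i, m))) ** 2
               for i in range(m))

# Illustrative values only (assumed, not taken from the specification).
H = [[1 + 1j, 0.5], [0.2j, 1 - 0.5j], [0.3, 0.1 + 0.4j]]
y = [0.9 + 1.2j, -1.1 + 0.3j, 0.2 - 0.5j]
Hh = conj_transpose(H)
G = matmul(Hh, H)
U = cholesky_upper(G)
b = [row[0] for row in matmul(Hh, [[v] for v in y])]
s_hat = solve_normal_equations(U, b)
print(sphere_cost(U, [1, -1], s_hat))
```

For any lattice point s, the printed cost differs from the full residual ‖y − Hs‖² only by the constant ‖y − Hŝ‖², which is why minimizing either expression selects the same point.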
An exemplary binary tree 300 of depth 4 is shown in
By exploiting the triangular shape of U, the total cost of equation (5) can be computed incrementally, row-by-row in U from the bottom up. Should the cost at any stage (or row) ever exceed a threshold (called the radius), the current solution may be discarded and any other solutions that match the partial solution which was discarded may also be discarded (solutions that are below the current node in the tree can never have a lower cost than the current node because, by virtue of the norm in equation (7), the incremental cost is never negative). This allows one to efficiently prune significant parts of the tree 300 or search space during the search process, saving both computation time and power. It may also be desirable to reorder the rows of H, s and y so as to search the “easier to demodulate” layers first as described in equation (7), but the instant invention is not so limited.
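The incremental, bottom-up cost computation with pruning can be sketched as a depth-first search. The following is a minimal illustration, assuming BPSK symbols and a small real-valued U and ŝ chosen arbitrarily; the function name `tree_search` is hypothetical. It also shrinks the radius to the best leaf found so far, which is one common choice for a hard-decision search rather than something stated for this step.

```python
def tree_search(U, s_hat, radius_sq, symbols=(1, -1)):
    """Depth-first search of the tree; returns (best_cost, best_candidate).

    Returns (None, None) when no leaf lies within the given squared radius.
    """
    m = len(U)
    best_cost, best = None, None
    # Stack entry: (row index i to process next, cost so far,
    #               symbols already chosen for rows i+1 .. m-1).
    stack = [(m - 1, 0.0, ())]
    while stack:
        i, cost, tail = stack.pop()           # LIFO order gives a depth-first walk
        for sym in symbols:
            cand = (sym,) + tail               # cand[k] is the symbol for row i + k
            row = sum(U[i][i + k] * (cand[k] - s_hat[i + k]) for k in range(len(cand)))
            new_cost = cost + abs(row) ** 2    # one more outer term of the sum
            if new_cost > radius_sq:
                continue                       # prune this node and its whole subtree
            if i == 0:
                # Leaf reached: keep it and tighten the radius (assumed policy).
                best_cost, best, radius_sq = new_cost, cand, new_cost
            else:
                stack.append((i - 1, new_cost, cand))
    return best_cost, best

# Illustrative values only (assumed).
U = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.2]]
s_hat = [0.8, -1.1, 0.3]
best_cost, best = tree_search(U, s_hat, radius_sq=float("inf"))
print(best, best_cost)
```

Because each added term is a squared magnitude, a node whose partial cost already exceeds the radius cannot lead to a leaf inside the sphere, so skipping it loses nothing.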
An argument that simply minimizes equation (5) will produce the constrained maximum likelihood solution, but it gives no soft information or confidence about the decision. In order to generate soft information, a set of constrained points centered around ŝ, the sphere center, may be considered.
By examining the set of solutions that lie within the hyper-sphere with radius less than r, it is possible to approximate a posteriori probability (APP) with suitable accuracy. How many points need to be considered in this set is examined subsequently herein. From a set of the L most-likely solutions that lie within the hyper-sphere, a list sphere detector can generate soft information by examining the bit changes and the relative costs of these bit changes.
The stack 402 is responsible for storing partial candidates (where a partial candidate is an incomplete candidate, and a candidate is a solution to equation (6) with an associated cost). The processing elements 404, 406, 408 are each capable of computing one outer summation term of equation (7). The heap 410 is used to store the leading candidates, and the soft decision generator 412 uses information from the leading candidates stored in the heap 410 to produce a soft output signal. In one embodiment, the leading candidates are those candidates with the lowest costs, i.e., those closest to the sphere center.
The processing elements 404, 406, 408 comprise the main processing engine of the tree search engine 400. These processing elements 404, 406, 408 compute the cost of the child nodes (level i in
Each call to one of the processing elements 404, 406, 408 results in i being decremented. The processing elements 404, 406, 408 are described in more detail below.
The number of multiplication operations performed in the processing elements 404, 406, 408 can be significantly reduced by pre-computing U·ŝ. Since the vector s contains only ±1 entries (BPSK) or ±1 and ±j entries (QPSK), equation (7) may be simplified to the following expression, which contains selective add/subtract and squaring operations:
$$\sum_{i=1}^{M} \left| \sum_{j=i}^{M} \left( u_{ij} s_j - u_{ij} \hat{s}_j \right) \right|^2 \le r^2 \qquad (9)$$
where $u_{ij}\hat{s}_j$ are the pre-computed elements of U·ŝ, and $s_j \in \{\pm 1\}$.
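The saving from pre-computation can be illustrated with a short sketch for the BPSK case. The names `precompute_products` and `row_cost_bpsk` are hypothetical, and the numeric values are assumed for illustration. Once the products u_ij·ŝ_j are stored, each row term needs only additions, subtractions, and one squaring.

```python
def precompute_products(U, s_hat):
    """Table of u_ij * s_hat_j, computed once per received block."""
    m = len(U)
    return [[U[i][j] * s_hat[j] for j in range(m)] for i in range(m)]

def row_cost_bpsk(U, P, s, i):
    """Row i term |sum_j (u_ij s_j - u_ij s_hat_j)|^2 without any multiply by s_j."""
    acc = 0.0
    for j in range(i, len(U)):
        # s_j is +1 or -1, so u_ij * s_j is a selective add or subtract of u_ij.
        acc += U[i][j] if s[j] == 1 else -U[i][j]
        acc -= P[i][j]
    return abs(acc) ** 2

# Illustrative values only (assumed).
U = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.2]]
s_hat = [0.8, -1.1, 0.3]
P = precompute_products(U, s_hat)
print([row_cost_bpsk(U, P, [1, -1, 1], i) for i in range(3)])
```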
The stack 402 is used to store partial candidate solutions. In one embodiment, the stack 402 operates in a last-in, first-out (LIFO) mode, allowing the search to progress down the tree 300 in such a way as to compute the leaves from left to right across the tree 300. Alternatively, sorting entries in the stack 402 provides a more efficient way to search the tree 300 because nodes of most interest are visited first. A sorted stack is not strictly a stack because entries are not removed in a LIFO fashion, but for ease of understanding this sorted buffer will continue to be referred to as a stack.
Entries are sorted as they are added to the stack 402, limiting the memory required for the stack 402 to a small, well defined size, and simultaneously providing a mechanism to follow the branches with minimum incremental cost first, i.e., paths of highest interest first. Insertion sorting is efficient because entries added to the stack 402 do not generally move far during the insertion sort as discussed later. Those skilled in the art will appreciate that other sort techniques may be employed without departing from the spirit and scope of the instant invention.
Examining paths in order of interest means that the most likely leaves are examined first, which reduces processing in two ways. First, it means fewer leaves are added to the heap 410 and then discarded at a later time, and second, because lower cost candidates are found earlier, it allows the size of the search sphere to be dynamically reduced more quickly, resulting in more aggressive radius reduction, which in turn translates to fewer nodes visited. An added advantage of maintaining a sorted stack is that a meaningful result can be obtained even in cases where time constraints prevent the tree search from being completed. The stack 402 is common to all of the processing elements 404, 406, 408, and thus, provides a mechanism for redistributing the processing load between the processing elements 404, 406, 408.
Generally, the stack 402 stores several types of data, including the depth in the tree (i), the cost to date for each node that will be processed in parallel at the next level (i), and the partial candidate. In one embodiment, it may be useful to sort the information in the stack 402 first by depth and then by cost-to-date, such that the next stack entry to be popped is the one with the greatest depth and lowest cost-to-date.
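The two-key ordering can be sketched with an insertion-sorted list. This is a minimal software illustration, not the hardware stack 402; the class name `SortedStack` is hypothetical, and "depth" here is assumed to count the symbols already decided, so a larger value means deeper in the tree.

```python
import bisect

class SortedStack:
    """Buffer kept sorted so that pop() returns the deepest, cheapest entry."""

    def __init__(self):
        # Ascending by (depth, -cost); the entry of highest interest is last.
        self._entries = []

    def push(self, depth, cost, partial):
        # Insertion sort step: place the new entry at its sorted position.
        bisect.insort(self._entries, (depth, -cost, partial))

    def pop(self):
        depth, neg_cost, partial = self._entries.pop()
        return depth, -neg_cost, partial

    def __len__(self):
        return len(self._entries)

stack = SortedStack()
stack.push(1, 0.9, (1,))
stack.push(2, 1.4, (1, -1))
stack.push(2, 0.7, (-1, 1))
stack.push(1, 0.2, (-1,))
while stack:
    print(stack.pop())
```

In this example the two depth-2 entries are popped first, cheaper one first, followed by the depth-1 entries in order of increasing cost.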
Since the stack 402 is sorted, there can be a maximum of M−2 entries on the stack 402 per processing element, where M is the depth of the tree. Therefore, the maximum stack length is bounded by the expression p·(M−2), where p is the number of parallel processing elements. Being bounded, the stack 402 can be readily built in hardware.
Stack sorting is not as expensive as a general sort because entries added to the stack 402 are typically at increased depths and therefore do not generally move very far during the insertion sort. The sorting process need not become a bottleneck. Should sorting time be a problem, a smart stack controller can allow a processing element to pop an entry off the stack 402 before the insertion sort has found the correct position for the entry it is adding.
Alternatively, the load associated with sorting may be eased by performing only a partial sort during times of high activity. Upon detecting a period of high activity, a smart stack controller could stop using the second sort key and rely solely on the first sort key. In the instant embodiment, partial sorting based on only the first key would result in the stack entries being sorted by depth (guaranteeing maximum stack size is bounded) but not by cost. Thus some “out-of-order” processing would occur, which may not be ideal, but this is permissible because the tree may be searched in any order. On the other hand, it may be useful in some embodiments to sort by cost, as under some circumstances the order in which the tree is searched may be improved.
Stack entries with a high relative cost can be removed early; that is, before their cost exceeds the current radius. If the partial cost is scaled up to the depth of the tree and the entries that exceed the radius by a certain amount are discarded, the operation count may be reduced by a factor of at least about 2 without significant effect on the performance of the sphere detector. The following formulae with linear scaling have been used in a 16 user system to predictively prune stack entries with good results.
Constants 1.5 and 1.25 are selected because multiplication by either value can be achieved with a single shift-add operation. Division by i can be avoided by either precomputing 16/i or multiplying both sides of the expression by i. Other values for predictive stack pruning may be selected without departing from the spirit and scope of the instant invention.
The selection criteria shown in Table 1 are used to prune the entries in the partial candidate stack 402, assuming that the matrix U is well balanced, that is, all diagonal elements are approximately equal. Should there be a wide range in the magnitude of the diagonal elements of U, the matrix may be either normalized before performing detection or a non-linear scaling (based upon the magnitude of the diagonal elements) may be used to prune the stack. Predictive stack pruning based on the cost is performed on the newly calculated stack entry before the entry is added to the stack.
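The idea of linear scaling and shift-add multiplication can be sketched as follows. The exact thresholds of Table 1 are not assumed here: the comparison in `predictive_prune`, its parameter names, and the way levels are counted are all hypothetical stand-ins that only illustrate scaling a partial cost up to full depth while avoiding a division. The shift-add helpers show how integer multiplication by 1.5 or 1.25 reduces to one shift and one add.

```python
def hard_prune(cost, radius_sq):
    """Discard a node once its cost already exceeds the current radius."""
    return cost > radius_sq

def predictive_prune(cost, levels_done, total_levels, radius_sq, margin=1.5):
    """Assumed form: discard when the linearly scaled cost exceeds margin * radius.

    Equivalent to cost * total_levels / levels_done > margin * radius_sq,
    with both sides multiplied by levels_done so no division is needed.
    """
    return cost * total_levels > margin * radius_sq * levels_done

def times_1_5(x):
    """Fixed-point multiply by 1.5: x + x/2, one shift and one add."""
    return x + (x >> 1)

def times_1_25(x):
    """Fixed-point multiply by 1.25: x + x/4, one shift and one add."""
    return x + (x >> 2)

# A node one quarter of the way down a 16-level tree (values assumed).
print(hard_prune(3.0, 6.0), predictive_prune(3.0, 4, 16, 6.0))
print(times_1_5(8), times_1_25(8))
```

In this example the node survives the hard test because 3.0 does not exceed 6.0, yet the assumed predictive test flags it because its cost projected to full depth would be 12.0, which is above 1.5 times the radius.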
Using the heap 410 to store the list of the leading candidates (along with their cost) allows the largest cost-to-date candidate to be quickly found and is more efficient than keeping either a sorted or unsorted list. However, alternative constructs of the heap 410 may prove beneficial in certain circumstances. In practice, storing a fixed number of candidates is sufficient for generating bit a posteriori probabilities. The number of candidates that are required depends upon the quality of soft information desired and the number of users, M.
Assume a fixed amount of storage for L candidate solutions. As candidate solutions with cost less than the radius are generated, they are added to a heap. Once the heap is full, further candidates are added by discarding the highest-cost candidate held to date (the top of the heap) and replacing it with the new candidate. The heap controller then filters the new solution down to its appropriate level to maintain the heap rule. At the same time, the sphere radius is updated with the cost of the highest cost candidate in the new set (located at the heap top). This radius reduction strategy ensures that the L best candidates are kept and that additional power is not wasted computing candidates with cost greater than that of the Lth best candidate.
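The candidate list and its radius update can be sketched with Python's `heapq` module. This is a software illustration under stated assumptions, not the heap 410 itself: the class name `CandidateList` is hypothetical, and costs are stored negated because `heapq` keeps the smallest item on top while the list needs its highest-cost entry there.

```python
import heapq

class CandidateList:
    """Keeps the L lowest-cost candidates and exposes the shrinking radius."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                    # items are (-cost, candidate)
        self.radius_sq = float("inf")      # no limit until the list is full

    def offer(self, cost, candidate):
        """Add a candidate if it lies inside the sphere; return True if kept."""
        if cost >= self.radius_sq:
            return False
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (-cost, candidate))
        else:
            # Replace the current highest-cost entry and restore heap order.
            heapq.heapreplace(self._heap, (-cost, candidate))
        if len(self._heap) == self.capacity:
            # Radius reduction: the new top holds the highest cost still kept.
            self.radius_sq = -self._heap[0][0]
        return True

    def costs(self):
        return sorted(-neg for neg, _ in self._heap)

clist = CandidateList(capacity=3)
for cost, cand in [(5.0, (1, 1)), (2.0, (1, -1)), (9.0, (-1, 1)),
                   (4.0, (-1, -1)), (7.0, (1, 1)), (1.0, (-1, 1))]:
    clist.offer(cost, cand)
print(clist.costs(), clist.radius_sq)   # [1.0, 2.0, 4.0] 4.0
```

The offer with cost 7.0 is rejected outright because the radius has already dropped to 5.0 by then, which is the saving the radius reduction is meant to provide.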
The heap rule is
$$\mathrm{cost}(\lfloor x/2 \rfloor) \ge \mathrm{cost}(x) \qquad (10)$$
where 2 ≤ x ≤ L is the index into the heap and ⌊·⌋ denotes rounding down.
An entry can be added to the heap in no more than O(log₂ L) time. During the early part of the detection process, while the heap is not full, the heap building process may be simplified by building the heap from the bottom up. The first L/2 entries are added in leaf positions relative to the final heap and can be added in unit time. The next L/4 entries can be added in O(1) time (entries are filtered down by a maximum of 1 level), and so on, up the rows of the heap, with the last entry being added in O(log₂ L) time. Thus, the heap can be built in significantly less than O(L log₂ L) time. The data structure does not obey the properties of a heap until it is full, i.e., it is not a heap whilst it is being built. However, this is not a problem in this application because the data may be extracted from the heap in arbitrary order.
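The bottom-up argument can be made concrete with a short worked bound. As a simplifying assumption, let the heap be a full binary tree with d rows, so that L = 2^d − 1, and let h denote the height of a row above the leaves. A row at height h then holds (L+1)/2^(h+1) entries, each of which is filtered down by at most h levels:

```latex
\[
\sum_{h=1}^{d-1} h \cdot \frac{L+1}{2^{h+1}}
\;<\; (L+1) \sum_{h=1}^{\infty} \frac{h}{2^{h+1}}
\;=\; L+1 .
\]
```

Under this assumption the whole build costs fewer than L+1 single-level moves in total, compared with on the order of L·log₂ L when every entry is inserted at worst-case cost.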
The output of the tree search engine (or list sphere detector) is a soft decision for each user's bit, with the sign representing the decision and the magnitude representing the reliability. Generally, a log likelihood ratio (LLR) of probabilities is used:
In a list sphere detector, these probabilities can be determined directly from the cost information known about the candidates. For a system containing AWGN,
where cost is the cost of the candidate s and is a squared Euclidean distance measure.
The probability of a “1” being transmitted is equal to the sum of the probabilities of all of the combinations containing a “1” for that given user k. If Λ is the set of $2^M$ possible solutions for M users, then this is represented as
If only the costs of the best L solutions are known, then the others may be estimated from the knowledge that their cost is at least as high as that of our worst known point (current radius). This value can then be substituted in place of the unknown costs. Alternatively, these unknown results may be ignored completely, since their contribution is likely to be relatively small.
The soft outputs can then be determined by:
The softbit is thus obtained by performing a logsum of the probabilities for a received 1 and −1 (equations (13) and (14) respectively). The
term cancels out and 2σ² can be estimated without significantly affecting the performance of most decoders. Equation (15) can then be computed with the well-known logsum operation.
A hard decision can be determined from the soft outputs by recording the sign of the output, with the magnitude representing the relative confidence of the decision.
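The soft-output step can be sketched as follows. This is a minimal illustration under stated assumptions: each candidate's likelihood is taken to be proportional to exp(−cost/2σ²), consistent with the AWGN discussion above; the common scale factor is dropped because it cancels in the ratio; and when no stored candidate supports one of the two bit values, the current radius is substituted for the unknown cost. The function names and the example values are hypothetical.

```python
import math

def logsum(values):
    """log(sum(exp(v) for v in values)), computed in a numerically stable way."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def soft_outputs(candidates, radius_sq, two_sigma_sq):
    """Log likelihood ratio per user from a list of (cost, bits) candidates.

    bits is a tuple of +1 / -1 values, one per user. A positive output favors
    +1, and the magnitude indicates reliability.
    """
    m = len(candidates[0][1])
    fallback = [-radius_sq / two_sigma_sq]   # assumed stand-in for unknown costs
    llrs = []
    for k in range(m):
        plus = [-c / two_sigma_sq for c, bits in candidates if bits[k] == 1]
        minus = [-c / two_sigma_sq for c, bits in candidates if bits[k] == -1]
        llrs.append(logsum(plus or fallback) - logsum(minus or fallback))
    return llrs

# Illustrative candidate list (assumed values), radius equal to the worst cost.
candidates = [(0.5, (1, -1)), (1.5, (1, 1)), (2.0, (-1, -1))]
llrs = soft_outputs(candidates, radius_sq=2.0, two_sigma_sq=1.0)
hard = [1 if v >= 0 else -1 for v in llrs]
print(llrs, hard)
```

In this example the hard decisions recovered from the signs match the lowest-cost candidate, as expected when one candidate dominates the list.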
Since the soft decision generator 412 can extract the candidates from the heap in any order, reading data out of the heap 410 can be completed in linear time. Furthermore, since the time to generate the soft data is faster than the tree search, this step can be pipelined and computed in parallel with the initial calculations for the next block.
The value initially chosen for the radius may have significant impact on the operation of the tree search. If the radius is too small, very few, if any, solutions will lie within this radius and the search may fail or give poor results. On the other hand, if the initial radius is too large, numerous candidates will be generated and later discarded, requiring significant computational overhead. One choice for the initial radius that guarantees a full candidate list is to set the initial radius to infinity.
$$r_0 = \infty \qquad (16)$$
Radius reduction comes into effect as soon as the heap fills, reducing the search sphere and amount of computation required.
In a real-time system, it may be useful to terminate the search before it comes to its natural completion. A meaningful result may still be obtained because the sorted stack ensures the paths of highest interest are normally searched first.
Higher degrees of parallelism within a processing element are possible. For example, one could compute the cost of all related nodes with a common great-great-grandparent. However, simulations to date have shown that computing the children for nodes with a common great-grandparent in parallel within a processing element results in an acceptable trade-off between power and speed for systems with fewer than about 30 users.
Multiple processing elements operating in parallel can speed up the search process. The processing elements share a common stack 402 and a common heap 410. Simultaneous access to either the stack 402 or heap 410 may be handled with arbitration.
When the number of parallel processing elements 404, 406, 408 becomes large, access to the sorted stack 402 may become a bottleneck, such that the addition of further processing elements 404, 406, 408 may not significantly increase throughput. Adding a specialist last-row processing element 600 to the architecture, as shown in
The specialist last-row processing element 600 in one embodiment is highly parallel and may be configured to process equation (9) for the case when i=1 (i is decremented from M down to 1 and 1 corresponds to computations for the last level of the tree). When a processing element 404, 406, 408 reaches the penultimate row (i=2), instead of pushing the partial candidates back onto the stack 402, this partial candidate is delivered to the specialist processing element 600 for accelerated last row processing.
The specialist last-row processing element 600 may significantly reduce the load on the partial candidate stack (up to 50% in some applications), and to a lesser degree on the processing elements 404, 406, 408. Most of the activity in the stack 402 occurs with respect to nodes located near the end of the tree. Thus, since the specialist last-row processing element 600 is invoked in the region of high activity for the stack 402, the stack 402 receives substantial benefit.
The specialist last-row processing element 600 has additional parallel logic (compared with the general processing elements 404, 406, 408) making it larger and faster than the general processing elements 404, 406, 408. In one embodiment, the specialist last-row processing element 600 calculates 4 leaf costs in as many cycles with pipelining. By generating leaf costs at least as fast as the heap 410 is able to accept candidates, the likelihood of a bottleneck is greatly reduced. Although the general processing elements 404, 406, 408 have arbitrated access to the last-row processing element 600, they would on average not have to wait any longer for access as compared with access to the heap 410. The specialist last-row processing element 600 is similar to a general row-processing element in that it computes the cost of all children for a common grandparent.
With the specialist last-row processing element 600 in place, predictive stack pruning is no longer available on the penultimate row. This suggests that adding further specialist row processing elements on other rows is less worthwhile, with diminishing returns. Also, the hardware requirement for additional specialist processing elements grows exponentially.
An arithmetic unit 704 receives the information retrieved by the stack interface unit 702 and uses the information to compute one element of the outer sum of equation (7). The arithmetic unit 704 may be accomplished in hardware, software or a combination thereof. One exemplary representation of the arithmetic unit 704 is shown in
A pruning block 706 performs at least two tests on the 4 child nodes to determine whether to keep or discard the newly calculated nodes. Hard pruning involves testing to see whether the new cost exceeds the current radius and discarding the nodes if the cost threshold has been exceeded. A second test involves applying the equations shown in Table 1 to determine if predictive pruning is appropriate.
Accordingly, up to 4 new nodes may be discovered at one level further into the tree. These nodes are again partial candidate solutions, but are now closer to being (complete) candidates. An output controller 708 bundles the pairs of nodes and returns them to the stack 402, unless the nodes are leaf nodes or penultimate nodes in the case of specialist last-row processing being in place. If the nodes are leaf nodes, the output controller 708 delivers the candidate (which is equivalent to a leaf node) to the heap 410 instead.
Multiple iterations around the “stack 402 to processing element 700 and back to stack 402” loop build up successive elements of the outer summation term of equation (7) until the calculation is complete (i.e., when a leaf node is reached). The engine 400 is started by pushing a null partial candidate (corresponding to the top of the tree) onto the stack. The search process is complete when the stack 402 is empty and all of the processing elements 404, 406, 408 are idle.
Those skilled in the art will appreciate that the various system layers, routines, or modules illustrated in the various embodiments herein may be executable control units (such as the controllers 210, 250 (see
The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. Consequently, the method, system and portions thereof and of the described method and system may be implemented in different locations, such as the wireless unit, the base station, a base station controller and/or mobile switching center. Moreover, processing circuitry required to implement and use the described system may be implemented in application specific integrated circuits, software-driven processing circuitry, firmware, programmable logic devices, hardware, discrete components or arrangements of the above components as would be understood by one of ordinary skill in the art with the benefit of this disclosure. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.