The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
Methods and apparatuses for performing network coding over network topologies that change with time are disclosed. One embodiment of the invention provides a systematic way of increasing, and potentially maximizing, the amount of information delivered between multiple information sources (e.g., senders) and multiple information sinks (e.g., receivers) over an arbitrary network of communication entities (e.g., relays, routers, etc.), where the network is subject to changes (e.g., in connectivity and connection speeds) over the time of information delivery. Embodiments of the present invention differ from the approaches mentioned in the background that consider only static networks (fixed connectivity and connection speeds), and they provide higher throughput than prior art in which codes are designed to be robust over a sequence of topologies. Embodiments of the present invention also differ from the approach of using random network codes.
Each network node (e.g., each sender, receiver, relay, router) consists of a collection of incoming physical interfaces that carry information to this node and a collection of outgoing physical interfaces that carry information away from this node. In a scenario of interest, the network topology can change over time due to, for example, interface failures, deletions, or additions, node failures, and/or bandwidth/throughput fluctuations on any physical interface or link between interfaces.
In the following description, numerous details are set forth to provide a more thorough explanation of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
One can increase the average multicast rate by designing a strategy that targets the long-term network behavior.
Using an embodiment of the present invention, the optimal code used in
However, applying such a code over the time varying network of
Embodiments of the present invention achieve the gains mentioned above over a broad class of time-varying topologies under a variety of conditions. An embodiment of the present invention uses a “virtual topology” to define a fixed network code that does not need to be changed as the topology changes. The code is implemented over the sequence of instantaneous topologies by exploiting the use of buffers at each node.
In one such embodiment, if there exists an “average topology”, i.e., if the long-term time averages of the link bandwidths can be defined, the “virtual topology” used can be this average topology, as in
When the long-term time averages do not exist or the session lifetimes are relatively short, one can use another definition of the “virtual topology”. For example, in a time-varying network, one can consider a sequence of average graphs, each calculated over a limited time period, e.g., one average every “N” seconds over a period of “M” seconds, where M&gt;N. The virtual topology could then be the minimum average topology over this set of average topologies.
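As an illustration of the windowed computation just described, the following is a minimal sketch in Python; the per-edge dictionary representation, the one-sample-per-second assumption, and all names are illustrative, not elements of the invention.

```python
# Minimal sketch of the windowed "minimum average topology" described
# above. All names and the sampling model are illustrative; `samples`
# holds one capacity measurement per second per edge over an M-second period.

def min_average_topology(samples, window_n):
    """samples: dict edge -> list of M per-second capacity measurements.
    Returns dict edge -> minimum, over N-second windows, of the window average."""
    virtual = {}
    for edge, caps in samples.items():
        window_avgs = [
            sum(caps[i:i + window_n]) / window_n
            for i in range(0, len(caps) - window_n + 1, window_n)
        ]
        virtual[edge] = min(window_avgs)
    return virtual

# Example: one edge measured for M = 6 seconds, averaged every N = 2 seconds.
print(min_average_topology({("s", "r1"): [4, 6, 0, 2, 5, 5]}, window_n=2))
# window averages (5.0, 1.0, 5.0) -> virtual capacity 1.0
```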
In another embodiment, one may consider a similar long-term or short-term average topology in which some links, e.g., links below a minimum capacity, are removed.
In yet another embodiment, one may consider topologies such as those above in which links that do not change the min-cut capacity are ignored.
In such embodiments, the invention can also provide a sub-optimal adaptive strategy that still performs as well as or better than instantaneous or robust strategies.
Network coding-based solutions that enable high-throughput, low-complexity operation over networks with changing topologies, such as the foregoing embodiments based on the general principle of using a “virtual topology”, a “fixed network code”, and “virtual buffers”, are described. Solutions include, but are not limited to, (i) encoding functions that map input packets to output packets on outgoing physical interfaces at each node and techniques for buffering input packets upon arrival and output packets for transmission; (ii) mechanisms that determine the buffering time of input packets and possibly output packets and the associated number of output packets generated at each node; (iii) algorithms for updating the encoding functions at each node given deviations from the predicted transmission opportunities. One advantage of the proposed methods is that they can provide high-throughput, low-complexity information delivery and management over time-varying networks, with lower decoding delays than random network coding methods. This is accomplished by addressing short-term fluctuations in network topology and performance via operation over an “induced” time-averaged (over a longer time-scale) topology.
In one embodiment, virtual buffers are used. A virtual node architecture is described that (i) maps the incoming physical interfaces onto incoming logical interfaces; (ii) inter-connects the incoming logical interfaces to outgoing logical interfaces; and (iii) maps the outgoing logical interfaces onto outgoing physical interfaces. For example, in
The design and use of a fixed network code over a (finite- or infinite-length) sequence of time-varying networks for disseminating information from a set of sources to a set of destinations is also described. That is, during a prescribed period of time over which a network can be changing, a single network code may be selected that operates effectively and efficiently over such network variations. This is done by defining a code for a (fixed) “virtual topology”. The techniques to do so are widely known in the field and are applicable to any type of network (e.g., multicast, unicast, multiple users). In the case that (the same) information is multicast from a source to a set of destinations, the embodiment achieves a high multicast rate and, under certain conditions, the maximum achievable multicast rate.
For instance, where the “time-averaged” sequence of networks converges as the averaging window becomes long, one embodiment implements a fixed network code that is designed for the “time-averaged” network. The implementation of the fixed network code relies on the use of virtual input and output buffers. These input (output) buffers are used as interfaces between the input (output) of the fixed network code and the actual input (output) physical interfaces. The collective effect of the use of these virtual buffers at each node facilitates the implementation of the fixed network code (designed for the virtual topology, which in this case is selected as the time-averaged topology) over the sequence of time-varying topologies that arise over the network while attaining the maximum achievable multicast throughput.
In another embodiment, a sequence of updated network codes is selected to be sequentially implemented over a sequence of time intervals. In particular, during any given update, a new network code is chosen that is to be used over the next time period (i.e., until the next update). In one embodiment, a “virtual” network topology, based on which the network code is constructed, is the predicted time-averaged topology for that period. Prediction estimates of the time-averaged topology for the upcoming period can be formed in a variety of ways. In their simplest form, these estimates may be generated by weighted time-averaging capacities/bandwidths/throughputs of each link until the end of the previous period. In general, however, they may be obtained via more sophisticated processing that better models the link capacity/bandwidth/throughput fluctuations over time and may also exploit additional information about the sizes of the virtual buffers throughout the network. If the time-averaged graphs vary slowly with time (i.e., if they do not change appreciably from one update to the next), the proposed method provides a computation and bandwidth-efficient method for near-optimal throughput multicasting.
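As one concrete illustration of the simplest predictor mentioned above, the following sketch forms an exponentially weighted time-average of each link's per-period capacities; the smoothing factor and all names are illustrative assumptions.

```python
# Minimal sketch of weighted time-averaging of link capacities up to the
# end of the previous period. `alpha` and all names are illustrative.

def predict_capacities(history, alpha=0.3):
    """history: dict edge -> list of past per-period capacity measurements.
    Returns dict edge -> predicted capacity for the upcoming period."""
    predicted = {}
    for edge, caps in history.items():
        estimate = caps[0]
        for c in caps[1:]:
            estimate = alpha * c + (1 - alpha) * estimate  # weight recent periods more
        predicted[edge] = estimate
    return predicted

print(predict_capacities({("a", "b"): [10.0, 8.0, 9.0]}))  # -> approximately 9.28
```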
Referring to
The time-varying network topology comprises a plurality of information sources and a plurality of information sinks as part of an arbitrary network of communication entities operating as network nodes. In such a case, in one embodiment, each network node of the topology consists of a set of one or more incoming physical interfaces to receive information into said each network node and a set of one or more outgoing physical interfaces to send information from said each network node.
In one embodiment, the virtual network topology for a given time interval is chosen as the topology that includes all the nodes and edges from the time-varying topology, with each edge capacity set to the average capacity, bandwidth, or throughput of the corresponding network interface until the current time. In another embodiment, the virtual network topology to exist at a time interval comprises a topology with each edge capacity set to an autoregressive moving average estimate (prediction) of capacity, bandwidth, or throughput of the corresponding network interface until the current time. In yet another embodiment, the virtual network topology to exist at a time interval comprises a topology with edge capacities set as the outputs of a neural network, fuzzy logic, or any learning and inference algorithm that uses the time-varying link capacities, bandwidths, or throughputs as the input.
In one embodiment, the virtual network topology is defined as the topology with the nodes and edges of the time-varying network, with each edge capacity set to a difference between the average capacity, bandwidth, or throughput of the corresponding network interface up to the time interval and a residual capacity that is calculated based on current or predicted sizes of virtual output buffers. In another embodiment, the virtual network topology comprises a topology with each edge capacity set to a difference between an autoregressive moving average of capacity, bandwidth, or throughput of the corresponding network interface up to the time interval and a residual capacity that is calculated based on current or predicted sizes of virtual output buffers. In yet another embodiment, the virtual network topology comprises a topology with edge capacities set as outputs of a neural network, fuzzy logic, or a learning and inference algorithm that uses the time-varying link capacities, bandwidths, or throughputs, as well as the current or predicted sizes of virtual output buffers as its input.
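A minimal sketch of the residual-capacity variant described above follows; the drain model (current backlog spread uniformly over the interval) is an illustrative assumption.

```python
# Minimal sketch: virtual edge capacity = predicted average capacity minus
# a residual derived from the virtual output buffer backlog. The drain
# model and all values are illustrative assumptions.

def effective_capacity(avg_capacity, backlog_pkts, interval_secs):
    residual = backlog_pkts / interval_secs  # capacity already owed to queued packets
    return max(avg_capacity - residual, 0.0)

print(effective_capacity(avg_capacity=10.0, backlog_pkts=12, interval_secs=4))  # 7.0
```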
In one embodiment, the network topology varies due to one or more of link failures, link deletions, and link additions; time-varying capacity per link, time-varying bandwidth per link, time-varying throughput per link; time-varying inter-connectivity of network nodes; time-varying sharing of links with other users and applications; and node failures, node deletions, or node additions.
After determining a virtual network topology to exist at a time interval, processing logic selects, for the time interval, based on available network resources and the virtual network topology to exist at the time interval, a fixed network code for use during the time interval (processing block 402).
Once the network code has been selected, processing logic codes information to be transmitted over the time-varying network topology using the fixed network code (processing block 403). In one embodiment, the fixed network code is selected to achieve long-term multicast capacity over the virtual network. In one embodiment, selecting a network code for the time interval comprises choosing among many fixed network codes a code with optimized decoding delay characteristics. In one embodiment, selecting a network code comprises selecting, among many fixed network codes that satisfy a decoding delay constraint, the code that achieves the largest multicast capacity. In one embodiment, selecting a network code for the time interval comprises identifying an encoding function for use at a node in the topology for a given multicast session by computing a virtual graph and identifying the network code from a group of possible network codes that maximizes the multicast capacity of the virtual graph when compared to the other possible network codes. In one embodiment, computing the virtual graph is performed based on a prediction of an average graph to be observed for the session duration.
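For illustration, the multicast capacity that such a selection compares across candidate codes can be evaluated on the virtual graph as the minimum, over the receivers, of the source-to-receiver minimum cut (equivalently, maximum flow). The following is a minimal sketch using a small Edmonds-Karp max-flow routine; the graph and all names are illustrative.

```python
# Minimal sketch: multicast capacity of a virtual graph as the minimum,
# over receivers, of the source-receiver max-flow. Graph values are illustrative.

from collections import deque

def max_flow(cap, s, t):
    """cap: dict (u, v) -> edge capacity. Returns the max-flow value from s to t."""
    flow = 0
    residual = dict(cap)
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for (a, b), c in residual.items():
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if t not in parent:
            return flow
        # Recover the path, find its bottleneck, and push flow along it.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] = residual.get((v, u), 0) + bottleneck
        flow += bottleneck

# Butterfly-like example: source "s", receivers "r1" and "r2".
caps = {("s", "a"): 1, ("s", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
        ("c", "d"): 1, ("a", "r1"): 1, ("b", "r2"): 1, ("d", "r1"): 1, ("d", "r2"): 1}
print(min(max_flow(caps, "s", r) for r in ("r1", "r2")))  # multicast capacity: 2
```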
In one embodiment, coding information to be transmitted includes processing logic performing an encoding function that maps input packets to output packets on outgoing physical interfaces at each node and determining the buffering time of input packets and an associated number of output packets generated at each node.
Along with the coding process using the network code, processing logic handles incoming and outgoing packets at a node in the network using a virtual buffer system that contains one or more virtual input buffers and one or more virtual output buffers (processing block 404). In one embodiment, the network code dictates input and output encoding functions and buffering decisions made by the virtual buffer system for the node. The virtual buffer system handles incoming packets at a node, as well as determines scheduling for transmitting packets and whether to discard packets.
In one embodiment, a node using the virtual buffer system performs the following: it obtains information (e.g., packets, blocks of data, etc.) from one or more of the physical incoming interfaces; it places the information onto virtual input buffers; it passes information from the virtual input buffers to one or more local network coding processing function blocks to perform coding based on the network code for the time interval; it stores the information in the virtual output buffers once it becomes available at the outputs of (one or more of) the function blocks; and it sends the information from the virtual output buffers into physical output interfaces. In one embodiment, the (one or more) local network coding processing function blocks are based on a virtual-graph network code.
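The following is a minimal sketch of this per-node flow; the two-input XOR coding function and all names are illustrative assumptions standing in for whatever local functions the network code dictates.

```python
# Minimal sketch of a node's virtual buffer system: physical inputs ->
# virtual input buffers -> local coding function -> virtual output buffer
# -> physical output. The 2-input XOR code is an illustrative assumption.

from collections import deque

class VirtualBufferNode:
    def __init__(self, num_inputs):
        self.vin = [deque() for _ in range(num_inputs)]  # virtual input buffers
        self.vout = deque()                              # virtual output buffer

    def receive(self, iface, packet):
        self.vin[iface].append(packet)  # physical input -> virtual input buffer
        self._code()

    def _code(self):
        # Fire the local coding function whenever every input buffer has a packet.
        while all(self.vin):
            a, b = (q.popleft() for q in self.vin)
            self.vout.append(bytes(x ^ y for x, y in zip(a, b)))  # XOR the pair

    def release(self):
        # Copy (without discarding) the head packet to the physical interface.
        return self.vout[0] if self.vout else None

    def ack(self):
        self.vout.popleft()  # discard only after the transmission is acknowledged

node = VirtualBufferNode(num_inputs=2)
node.receive(0, b"\x0f")
node.receive(1, b"\xf0")
print(node.release())  # b'\xff'
node.ack()
```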
Referring to
Based on the virtual graph (and thus the virtual topology), processing logic computes a network code (processing block 504) and constructs a virtual buffer system for implementing the network code F(n) over the physical time-varying topologies during the n-th interval (processing block 505). The network code F(n) is a set of “|E|” functions, as follows:
F(n)={f1(n), f2(n), . . . , f|E|(n)}.
Each function can be computed at one node centrally (e.g., at the source node) and distributed to the routers (nodes). A given node needs to know only some of these functions, e.g., the ones it implements between its incoming and outgoing interfaces. Alternatively, each node in the network can compute its local functions itself, after sufficient topology information is disseminated to that node. In one embodiment, the network code is selected to be a throughput-maximizing code, while in other embodiments, the network code is selected to achieve high throughput and other requirements (e.g., decoding delay requirements).
Thus, over the n-th time interval (session), the process comprises the following: (i) the formation of a virtual topology for the duration of the session, obtained via link-capacity measurements collected over the network during all cycles of (or, a subset of the most recent) past sessions; (ii) the construction of a network code for use with the virtual topology; (iii) the implementation of the network code (designed for the virtual topology) over the sequence of time-varying topologies during the n-th time interval (session) by exploiting the use of virtual buffers.
As set forth above, prior to the n-th multicasting session, a virtual topology is formed for the n-th session. In constructing the virtual topology, it is assumed that a topology control mechanism is present, providing the sets of nodes and links that are to be used by the multicast communication session. The topology control mechanism can be a routine in the routing layer. Alternatively, since network coding does not need to discover or maintain path or route information, the topology control mechanism can be a completely new module replacing the traditional routing algorithm. Topology control can be done by establishing signaling paths between the source and destination, and the routers along the path can allocate resources. In an alternative setting where the topology corresponds to an overlay network, the overlay nodes allocate the path resources and routers at the network layer perform normal forwarding operations. In one embodiment, in generating the virtual topology, it is assumed that the set of instantaneous topologies during all the past sessions has been obtained via link-state measurements and is hence available. At the outset of the n-th session, the collection of weighted topology graphs {Gk}k&lt;n, {G*k}k&lt;n is available. Note that this set can also be written as a function of {Gk}k&lt;n, since {G*k}k&lt;n is itself a function of {Gk}k&lt;n.
One can specify {Gk}k&lt;n by the notation {V, E, C(k) for k&lt;n}, where V is the set of network nodes, E is the set of edges (links), and C(k) is the vector of link capacities whose i-th entry is the capacity of the i-th edge during the k-th session.
The multicast capacity of the virtual (average) graph is 1 symbol per cycle (determined by the minimum cut). The network code shown in
In general, Ci(k) representing the i-th element of C(k) (and denoting the capacity, or throughput value estimate during the k-th session over edge ei, i.e., the i-th edge of the topology graph) changes over time, although it remains bounded. The link-state measurement function tracks C(n) over time; at the outset of the n-th session, it uses knowledge of C(k) for k<n, to form a predicted (virtual) topology for the n-th session. Specifically, the virtual topology graph can be expressed as G*n=G(V,E,C*(n)), where the i-th entry of C*(n) is the predicted capacity of the i-th link during the n-th session.
In general, not all the vectors {C(k), k&lt;n} need to be used in calculating C*(n), and therefore in calculating G*n.
In one embodiment, the throughput vector of the virtual topology graph is an estimate of the time-averaged link capacities to be observed in the n-th session. In one embodiment, the computation of the estimate C*(n) takes into account other factors in addition to all C(k), for k&lt;n. In one embodiment, the computation takes into account any available statistical characterization of the throughput vector process, the accuracy of past-session C* estimates, and, potentially, the size of the virtual buffers that are discussed herein. In another embodiment, the computation takes into account finer information about the variability of the link capacities during any of the past sessions, and, potentially, other inputs, such as decoding or other constraints set by the information being multicast (e.g., delay constraints).
Letting C(k,j) denote the j-th vector of link capacity estimates that was obtained during the k-th session, and assuming τk such vectors are collected during the k-th session, a capacity vector for the virtual topology, C*(n), can be calculated in general by directly exploiting the sets {C(k,1), C(k,2), . . . , C(k, τk)}, for all k<n.
The i-th entry of C(k,j), denoting the link-capacity of the i-th link in the j-th vector estimate of the k-th session, may be empty, signifying that “no estimate of that entry/link is available within this vector estimate.”
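As an illustration, the following sketch forms C*(n) by averaging each link over whatever entries are available across the collected vectors, skipping empty entries (represented here as None); plain averaging is only one possible estimator.

```python
# Minimal sketch of forming the virtual capacity vector C*(n) from the
# measurement vectors C(k,j). Plain averaging is one illustrative choice.

def predict_capacity_vector(measurements):
    """measurements: list of vectors C(k,j); each vector has one entry per
    link, with None marking 'no estimate of that entry/link is available'."""
    num_links = len(measurements[0])
    c_star = []
    for i in range(num_links):
        vals = [m[i] for m in measurements if m[i] is not None]
        c_star.append(sum(vals) / len(vals) if vals else None)
    return c_star

# Two vector estimates over three links; link 2 is missing in the first.
print(predict_capacity_vector([[3.0, None, 1.0], [5.0, 2.0, 1.0]]))  # [4.0, 2.0, 1.0]
```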
In one embodiment, the virtual topology is computed in a centralized manner by collecting the link-state measurement data at a central location where the virtual topology is to be calculated. In another embodiment, a distributed link-state measurement and signaling mechanism is used. In such a case, assuming each node runs the same prediction algorithm, one can guarantee that each node shares the same view of the topology and of the predicted averages over the new session, provided sufficient time is allowed for changes to be propagated and take effect. Finally, the available link-state measurements can also be exploited by the topology control module in order to expand or prune the vertex set V and/or the edge set E depending on the attainable network capacity.
Once a virtual topology graph G*n=G(V,E,C*(n)) is chosen for use during the n-th session, a network code is constructed for this graph. There are many existing techniques that can design deterministic, or random (pseudo-random in practice) linear network codes that achieve the maximum-flow (minimum-cut) capacity over a given fixed graph. In one embodiment, one such linear network code is chosen based on one of the existing methods for designing throughput-maximizing network codes for such fixed network graphs. Such a network code can be expressed via |E| vector-input vector-output functions {f1, f2, . . . , f|E|} (one function per edge in the graph). Specifically, the network code function fi, associated with edge ei, outputs a vector of encoded packets yi of dimension Ci*(n), where Ci*(n) is the i-th element of C*(n). Let k=tail(ei) denote the tail of edge ei, and let Vk denote the subset of indices from {1,2, . . . , |E|} such that the associated edges in E have node k as their head node. Let also Yk denote the vector formed by concatenating all vectors yj for all j in Vk (denoting all the vectors of encoded packets arriving to node k through all its incoming edges), and let ck*(n) denote its dimension (which is equal to the sum of the Cj*(n) over all j in Vk). Then, the vector of encoded packets that is to be transmitted over edge ei out of node k is formed as follows
yi=fi(Yk)=WiYk,  (1)
where the scalar summation and multiplication operations in the above matrix multiplication are performed over a finite field, and Wi is a matrix of dimension Ci*(n)×ck*(n) with elements from the same field. Although not stated explicitly in the functional descriptions of Wi, yi, and Yk, in general, their dimensions depend not only on the edge index i, but also on the session index n.
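For concreteness, the following sketch evaluates equation (1) over GF(2), the smallest finite field, in which the scalar operations reduce to modulo-2 arithmetic; the dimensions and coefficients are illustrative.

```python
# Minimal sketch of equation (1): y_i = W_i * Y_k over GF(2). The field
# choice, dimensions, and coefficients here are illustrative assumptions.

import numpy as np

Wi = np.array([[1, 0, 1],
               [0, 1, 1]], dtype=np.uint8)  # shape C_i*(n) x c_k*(n) = 2 x 3
Yk = np.array([[1, 0, 1, 1],                # 3 incoming encoded packets,
               [0, 1, 1, 0],                # each shown as 4 one-bit symbols
               [1, 1, 0, 0]], dtype=np.uint8)

yi = (Wi @ Yk) % 2                          # symbol-by-symbol GF(2) combination
print(yi)  # 2 encoded packets to transmit over edge e_i
```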
The edge capacities of a virtual graph may not be integers. In that case, each edge capacity Ci*(n) is scaled by a common factor t(n) and rounded down to the nearest integer, denoted by Qi*(n). The network code outputs on edge ei a vector yi of dimension Qi*(n). Similarly, the dimensions of Wi are Qi*(n)×ck*(n), where ck*(n) is the dimension of Yk (denoting the vector formed by concatenating all vectors yj for all j in Vk).
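A small worked example of this scaling, with illustrative values:

```python
# Minimal worked example: non-integer virtual edge capacities C_i*(n) are
# scaled by a common factor t(n) and rounded down to Q_i*(n). Values are illustrative.

import math

t_n = 4.0                          # common scaling factor for session n
c_star = [1.5, 2.25, 0.8]          # fractional virtual edge capacities
q_star = [math.floor(t_n * c) for c in c_star]
print(q_star)                      # [6, 9, 3] encoded packets per edge
```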
In one embodiment, each packet consists of several symbols, where each symbol consists of a finite set of bits. The number of bits in a symbol is defined as the base-2 logarithm of the order of the finite field over which the linear combinations are formed. The linear combinations are applied on a symbol-by-symbol basis within each packet.
In an alternative embodiment, where the minimum-cut capacity can be achieved using a network code that does not utilize all the available capacity of each edge, an overall capacity-achieving network code can often be selected that generates sets of yi's in which some of the yi's have dimension less than Ci*(n).
Finally, associated with each receiver is a vector-input, vector-valued linear function that takes all the available packets at the incoming virtual interfaces and recovers the original packets. Each of these decoding operations corresponds to solving a set of linear equations based on the packets received from all the incoming edges at the receiving node. Note that intermediate nodes, i.e., nodes that are not final receivers of information, can also perform such decoding operations in calculating messages for their outgoing interfaces.
As is well known in the prior art, by properly selecting the size of the finite field and the set of coefficients used in the linear network-coding transformations over a fixed graph, one can attain the maximum achievable multicast capacity over that graph. In one such example, one can select the coefficients randomly at the start of each time interval and use them until the next interval, when the virtual graph will change; with high probability, the resulting network code will be throughput-maximizing over the fixed graph.
In one embodiment, the calculation of a network coding function is based on a virtual topology graph G*n. This network coding function works effectively over the actual time-varying networks. The use of the network code that was designed for the virtual graph relies on emulation of the virtual graph over the instantaneous physical graphs that arise in the network over time. Such emulation accommodates the fact that the sequence of physical topologies observed can, in general, be significantly different from the virtual topology that was assumed in designing the network coding functions f1, f2, . . . , f|E|. In one embodiment, emulation of the virtual graph over the instantaneous physical graphs is accomplished by exploiting a virtual buffering system with respect to the fi's. In one embodiment, the virtual buffer system consists of virtual input and virtual output buffers with hold/release mechanisms, designed with respect to the virtual-graph network code. Note that, as shown herein, the operation of these buffers is more elaborate than simply smoothing out local variations in link capacities, especially when alternating between various extreme topologies. In particular, it allows optimizing the choice of the network code used on the virtual graph in an effort to achieve objectives such as high throughput, low decoding complexity, and low decoding delay.
The choice of the virtual-graph network code determines the set of network-coding functions implemented at each of the nodes and, consequently, the associated virtual buffer architecture at each node. The principles behind designing a virtual buffer architecture can be readily illustrated by considering the sequence of networks presented in
When both of the virtual incoming buffers 702 and 703 have packets waiting, the local network-coding function 713 takes one packet from the head of each of virtual incoming buffers 702 and 703, XORs them and puts the encoded packet at the tail of the virtual outgoing buffer for edge e7 704. Then the two packets that were XORed are removed from the associated virtual input buffers 702 and 703. The procedure is repeated until at least one of the virtual input buffers 702 and 703 is empty. When the physical outgoing buffer 706 is ready to accept packets (e.g., the physical link is up and running), a release decision 705 (a decision to release the packet to the physical outgoing interface buffer 706) is made and the packet waiting at the head of the virtual outgoing buffer 704 is copied into the physical outgoing interface buffer 706. Once an acknowledgement of successful transmission of the packet is received (e.g., received ACK feedback 707), the packet is removed from the virtual output buffer 704.
Virtual buffers allow node 3 to continue network coding in a systematic manner as packet pairs become available, and to store the resulting encoded packets until the physical outgoing interface is ready to transmit them (e.g., physical outgoing interface buffers 706 are ready to transmit).
Note that the use of a deterministic network code (achieving the multicast capacity on the average graph) allows one to decide in a systematic low-complexity manner the information that needs to be stored and/or network coded so that the multicast capacity is achieved. Furthermore, it is guaranteed that this maximum capacity is achieved with an efficient use of storage elements (incoming packets are discarded once they are no longer needed by the fixed network code), as well as efficient use of transmission opportunities (it is a priori guaranteed that all packets transmitted by any given node are innovative). For instance, the network code of
Other embodiments of the virtual buffer architecture for implementing the network code at node 3 use a single virtual input buffer, with a more complex hold-and-release mechanism. In one such embodiment, both the y3 and y4 data (data from different physical incoming interface buffers 701) are stored in non-overlapping regions of the virtual input buffer as they become available to node 3. The hold-and-release mechanisms keep track of which of the available y3 and y4 data have not been network coded yet.
Two embodiments of the virtual buffer system at a typical node of an arbitrary network are depicted in
Specifically, input links 801 (e.g., logical links, physical links) feed packets to physical input buffers (1-Ni) 802, which in turn feed the packets to various virtual input buffers (1-Nf) 803. Packets in each of the virtual input buffers 803 are sent to one of the network coding functions F(1)-F(Nf) 804. The outputs of the network coding functions 804 are sent to distinct virtual output buffers 805. The coded data from virtual output buffers 805 are sent to physical output buffers 806, which in turn send them to output links 807 (e.g., logical links, physical links). Coded data from one of the virtual output buffers 805 is sent directly to one of the physical output buffers 806, while the other coded data from two of the virtual output buffers 805 are sent to the same one of physical output buffers 806 based on a release decision 810. Acknowledgement (ACK) feedback 808, when received, causes data to be removed from the virtual output buffers.
Specifically, input links 901 (e.g., logical links, physical links) feed packets to the common input buffer 902, which in turn feeds the packets to the joint release and discard mechanism 903. Packets released by this mechanism are sent to one of the network coding functions F(1)-F(Nf) 904. The results of the coding by network coding functions 904 are sent to distinct virtual output buffers 905. The coded data from the virtual output buffers 905 are sent to the physical output buffers 906, which send them to the output links 907 (e.g., logical links, physical links). Coded data from one of the virtual output buffers 905 is sent directly to one of the physical output buffers 906, while other coded data from two of the virtual output buffers 905 is sent to the same one of the physical output buffers 906 based on a release decision 910. Acknowledgement (ACK) feedback 908, when received, causes data to be removed from the virtual output buffers.
Thus, as shown in
Virtual output buffers collect and disseminate network-coded packets. In particular, during a single execution of a given function “F(k)”, one network-coded output packet is generated for transmission and appended to the associated virtual queue. The hold-and-release mechanisms of these output buffers are responsible for outputting the network-coded data into the physical output queues. Given that the rate of flow out of physical buffers is determined by the state of the links and possibly additional operations of the network node, and can thus be dynamic, these hold-and-release mechanisms can be designed to meet many objectives. In one embodiment, the virtual buffers copy subsets of (or all of) their packets, without discarding them, to the physical output buffer. A packet is discarded from the virtual outgoing buffer if its transmission is acknowledged by the physical link interface. In case of transmission failure, however, the packet is recopied from the virtual outgoing buffer (without being discarded) to the physical output buffer. In another embodiment, the hold-and-release mechanism of the virtual output buffers plays the role of a rate controller, limiting the release of the packets to the rate supported by the physical layer. Release decisions in this embodiment can be based on the buffer occupancy of the physical layer and the instantaneous rates of the outgoing links.
In more advanced embodiments, also illustrated in
Another set of embodiments that can be viewed as an alternative to those illustrated in
There are fundamental differences between the virtual buffers used in embodiments described herein and the physical input and physical output buffers (for storing received packets or packets awaiting transmission) that have already been provisioned in many existing network elements (e.g., 802.11 access points, IP routers, etc.). Such physical input/output buffers can take various forms. In some systems, a common physical Input/Output (I/O) buffer is employed (e.g., a First-In First-Out queue serving both functions in hardware), while in other cases multiple buffers are used, each serving a particular class of Quality of Service. Typically, when a packet is scheduled for transmission, it is removed from the interface queue and handed to the physical layer. If the packet cannot be delivered due to link outage conditions, after a finite number of retransmissions, the packet is discarded. On the other hand, virtual buffers are designed so as to enable the implementation of the (fixed) network coding functions (dictated by the virtual-graph network code) over the set of network topologies that arise over time. They accomplish this goal by accumulating and rearranging the packets that are required for each local network-code function execution, used in conjunction with hold-and-release operations that are distinctly different from those used in physical queues. The virtual buffer sizes (maximum delays) are set in accordance with the network code that is being implemented, i.e., they are set so as to maintain the average flow capacity out of the node required (or assumed) by each function in the network code design. Specifically, assuming the virtual-graph network code is designed in a way that requires an average flow of “Rk,i” units/sec on link “i” out of node “k”, the virtual buffer size and hold/release mechanism of packets to that link are designed to maintain that required flow rate Rk,i over link i out of node k, regardless of the instantaneous capacity of the link, which at any time can be greater than, equal to, or smaller than Rk,i. This flow rate is required by the network coding functions at subsequent nodes in the information path. In fact, the link may be used for transmitting packets from other functions, each having its own average flow requirements. Virtual buffers allow sharing of links over many functions in this case. The systematic methods described herein for the virtual-graph network code design and implementation ensure that the required data flow can be handled by each link on average, i.e., that Rk,i is less than or equal to the average throughput that link “i” can handle.
In another embodiment, each node locally selects the coefficients of its (linear) network-coding functions. The embodiment can be viewed as a decentralized alternative to the aforementioned approach where a virtual graph is first centrally calculated and used to estimate the multicast capacity and construct the network code. In the embodiment, portions of the virtual graph are locally obtained at each node. Specifically, the estimate of the capacity of any given edge is made available only to the tail node and the head node associated with this edge, and the resulting locally available information at each node is used for generating local network-coding functions (including the local code coefficients). “Throughput probing” can then be performed over the network for tracking the multicast throughput achievable with the given network coding functions (and thus the maximum allowable rate at the source).
Throughput-probing is a method that can be used to estimate the multicast capacity of a (fixed) graph without knowledge of the entire graph. It also allows the source to adjust its rate during each session so as to track long-term throughput fluctuations over sequences of sessions. When the actual throughput during a session is lower than the one predicted by the average graph, the network coding operations performed during the session provide adequate information for throughput probing. For instance, throughput probing can be accomplished by estimating the rates of data decoding at all destination nodes, and making those rates available to the source. The attainable multicast throughput can be estimated at the source as the minimum of these rates and can then be used to adjust (reduce in this case) the source rate for the next cycle. However, when the actual achievable throughput during a session is higher than the source rate used by the virtual-graph network code (i.e., higher than the minimum cut of the associated virtual graph), more information is needed for throughput probing beyond what is available by the network coding operations. In one embodiment, this additional information may be provided by the following two-phase algorithm.
In the first phase of the algorithm, the local network coding functions at the source node are designed for a source rate Rmax at every session, where Rmax denotes the maximum operational source rate in packets per second. Specifically, in each session, the network code at the source node operates on a vector of Kmax(n) source packets every t(n) seconds, where Kmax(n) equals Rmax×t(n). Both Rmax and t(n) are design parameters of the embodiment. Let R(n) denote the estimate of the source rate that can be delivered during the n-th session, and assume that R(n) does not exceed Rmax. To guarantee that the source rate delivered during the n-th session is limited to R(n) (even though the network code was designed to operate at a rate Rmax), only K(n)=R(n)×t(n) out of the Kmax(n) packets in each input vector are used to carry information, while the rest of the vector is set to zero.
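A minimal sketch of this zero-padding step follows; the packet length and function name are illustrative assumptions.

```python
# Minimal sketch of phase one: the source code always operates on
# Kmax(n) = Rmax * t(n) packets, but only K(n) = R(n) * t(n) of them carry
# information; the rest are zero packets. All names are illustrative.

PKT_LEN = 8  # illustrative packet length in bytes

def build_source_vector(data_packets, r_max, r_n, t_n):
    k_max, k_n = int(r_max * t_n), int(r_n * t_n)
    assert len(data_packets) == k_n <= k_max
    padding = [bytes(PKT_LEN)] * (k_max - k_n)  # zero packets carry no information
    return data_packets + padding

vec = build_source_vector([b"A" * PKT_LEN, b"B" * PKT_LEN], r_max=4, r_n=2, t_n=1.0)
print(len(vec))  # 4 packets enter the network code; 2 of them are zero-padded
```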
In the second phase, each intermediate node first sends data according to the fixed network code and opportunistically sends more coded packets, whenever extra transmission opportunities become available (and assuming there is no more data in the virtual output buffer). This incremental expansion of the local-network codes exploits additional transmission opportunities that are not exploited by the fixed code for the virtual graph, thereby allowing sensing of potential increases in throughput at the destinations.
The first phase together with the second phase allows one to estimate the multicast throughput by calculating the minimum decoding rate, i.e., calculating the number of independent linear equations to be solved at each receiver node and selecting the smallest one as the new source vector dimension for the next session (the new source rate is obtained by dividing the new source vector dimension by t(n)). For example, if the minimum source vector dimension is d(n) and d(n)>K(n), then at least d(n)−K(n) additional packets can be transmitted in each input vector (for a total of d(n) packets in each source vector). In one embodiment, throughput probing is performed more than once during a session, in which case the adjusted source rate is the average of the minimum decoding rates.
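The following sketch illustrates this rate update, counting each receiver's independent linear equations as the rank, over GF(2), of its received coefficient matrix; the rank routine and all values are illustrative.

```python
# Minimal sketch of the rate update: d(n) is the minimum, over receivers,
# of the number of independent equations (rank over GF(2)), and the new
# source rate is d(n)/t(n). The helper and values are illustrative.

def gf2_rank(rows):
    """Rank over GF(2); each row is a list of 0/1 coefficients."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot_idx = next((i for i, r in enumerate(rows) if r[col]), None)
        if pivot_idx is None:
            continue
        pivot = rows.pop(pivot_idx)
        rows = [[a ^ b for a, b in zip(r, pivot)] if r[col] else r for r in rows]
        rank += 1
    return rank

rx1 = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # coefficient rows seen by receiver 1
rx2 = [[1, 1, 1], [1, 0, 0]]             # coefficient rows seen by receiver 2
t_n = 1.0
d_n = min(gf2_rank(rx1), gf2_rank(rx2))  # minimum decoding dimension d(n)
print(d_n / t_n)                         # new source rate for the next session: 2.0
```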
The throughput probing algorithm may also be used in the case where the actual throughput during a session is lower than the one predicted by the average graph. In that case, the minimum decoding rate d(n)/t(n) is smaller than K(n)/t(n) and is used as the new source rate. The additional overhead for such throughput probing consists of two terms: (i) the number of bits that are required to describe the additional coefficients of the extra source packets used in each linear combination; and (ii) a few extra bits in order to be able to uniquely identify at each destination the number of non-zero-padded source packets used within each source input vector block. This additional overhead may be transmitted to the receivers once at the beginning of each session.
In summary, implementation-efficient and resource-efficient methods and apparatuses for realizing the benefits of network coding (in terms of achieving maximum flow capacity between a set of senders and a set of receivers) over time-varying network topologies have been described. These methods and apparatuses systematically select and implement a fixed network code over a session, during which the network topology is time-varying. Specifically, in one embodiment:
Under a wide range of conditions, the techniques described herein allow attaining optimal or near-optimal multicast throughput in the long-term. Since the network code employed by the proposed method stays fixed over each session and many different codes exist that achieve the same performance, the method allows one to select a near throughput-maximizing code with low decoding delay and complexity. Compared to other random network coding approaches proposed in the literature, for instance, the proposed codes can provide either lower decoding complexity and lower decoding delay for the same throughput, or higher throughput at comparable decoding complexity and decoding delay.
System 1000 further comprises a random access memory (RAM), or other dynamic storage device 1004 (referred to as main memory) coupled to bus 1011 for storing information and instructions to be executed by processor 1012. Main memory 1004 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 1012.
Computer system 1000 also comprises a read only memory (ROM) and/or other static storage device 1006 coupled to bus 1011 for storing static information and instructions for processor 1012, and a data storage device 1007, such as a magnetic disk or optical disk and its corresponding disk drive. Data storage device 1007 is coupled to bus 1011 for storing information and instructions.
Computer system 1000 may further be coupled to a display device 1021, such as a cathode ray tube (CRT) or liquid crystal display (LCD), coupled to bus 1011 for displaying information to a computer user. An alphanumeric input device 1022, including alphanumeric and other keys, may also be coupled to bus 1011 for communicating information and command selections to processor 1012. An additional user input device is cursor control 1023, such as a mouse, trackball, trackpad, stylus, or cursor direction keys, coupled to bus 1011 for communicating direction information and command selections to processor 1012, and for controlling cursor movement on display 1021.
Another device that may be coupled to bus 1011 is hard copy device 1024, which may be used for marking information on a medium such as paper, film, or similar types of media. Another device that may be coupled to bus 1011 is a wired/wireless communication capability 1025 to communication to a phone or handheld palm device.
Note that any or all of the components of system 1000 and associated hardware may be used in the present invention. However, it can be appreciated that other configurations of the computer system may include some or all of the devices.
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.
The present patent application claims priority to and incorporates by reference the corresponding provisional patent application Ser. No. 60/829,839, entitled, “A Method and Apparatus for Efficient Information Delivery Over Time-Varying Network Topologies”, filed on Oct. 17, 2006.