The present invention relates to apparatus and methods for controlling the transfer of communication traffic to multiple links of a multi-link system, and in particular, but not limited to controlling the transfer of communication traffic in a router or switch onto member links of a multi-link bundle or group.
Network switches or routers may have one or more physical egress ports each having one or more groups or bundles of links for carrying egress communication traffic. Ingress communication packets which are to be routed to the port are received and distributed among the links of the multi-link group for further transmission. To increase transmission speed and reduce latency, particularly for large packets, the router may include a fragmenter which divides packets into smaller packet fragments which are subsequently distributed among different links of the multi-link group, so that the packet is effectively transmitted over two or more links rather than a single link. The fragmented packet is eventually reassembled at an appropriate point in the network. Distribution of packets or packet fragments to the multi-link group is typically managed by a scheduler which initially directs each packet or packet fragment to a particular buffer or queue associated with a particular link of the multi-link group. In one fragmentation scheme, a maximum fragment size is specified and packets larger than the maximum fragment size are divided into one or more fragments of the maximum specified size. Where the size of a packet is not equal to an integral number of maximum size fragments, the last fragment will be smaller than the maximum size. Packets that are smaller than the maximum fragment size are not segmented.
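The fragmentation scheme described above can be illustrated with a minimal sketch. The function name `fragment_sizes` and its parameters are hypothetical and are used only for illustration; the sketch returns the sizes of the data units produced for one packet, assuming sizes are expressed in bytes.

```python
def fragment_sizes(packet_size: int, max_fragment_size: int) -> list[int]:
    """Return the sizes of the data units produced for a single packet.

    Hypothetical helper: packets no larger than the maximum fragment size
    are left whole; larger packets are cut into full fragments, with a
    smaller final fragment when the size is not an integral multiple.
    """
    if packet_size <= 0 or max_fragment_size <= 0:
        raise ValueError("sizes must be positive")
    if packet_size <= max_fragment_size:
        return [packet_size]  # not segmented
    full_count, remainder = divmod(packet_size, max_fragment_size)
    sizes = [max_fragment_size] * full_count
    if remainder:
        sizes.append(remainder)  # last fragment is smaller than the maximum
    return sizes
```

For example, under these assumptions a 1000-byte packet with a 400-byte maximum fragment size yields two full fragments and one 200-byte final fragment.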
A proposed mechanism for determining the member link on which to transmit a packet or fragment is based on a determination of the member link with the least queue depth. This mechanism involves the steps of (1) polling the amount of traffic queued to each member link, (2) transmitting the fragment to the first empty queue found, (3) if no empty queues are found, transmitting the fragment onto the member link with the least amount of queued traffic, and (4) in the event of a tie, selecting one of the tied links, for example, the first tied link to be found or a tied link that is randomly chosen.
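The four steps of this least-depth mechanism can be sketched as follows. The function name `select_least_depth` and the representation of queue depths as a list of integers are assumptions made for illustration; ties are resolved here by taking the first tied link found, which is one of the options mentioned above.

```python
def select_least_depth(queue_depths: list[int]) -> int:
    """Return the index of the member link chosen by the least-depth rule.

    Hypothetical sketch: queue_depths[i] is the amount of traffic queued
    to member link i, as obtained by polling every queue.
    """
    if not queue_depths:
        raise ValueError("at least one member link is required")
    # Step (2): use the first empty queue found, if any.
    for index, depth in enumerate(queue_depths):
        if depth == 0:
            return index
    # Steps (3) and (4): otherwise pick the least-loaded link; min() keeps
    # the first of several tied links.
    return min(range(len(queue_depths)), key=lambda i: queue_depths[i])
```

Note that every selection requires the depth of every queue, which is the processing cost discussed in the following paragraph.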
One drawback is that this method requires a relatively large amount of information and processing before a link is selected and a packet or packet fragment can be transmitted to the appropriate queue. Another drawback is that it can be difficult to maintain an accurate count of the depth of each member link queue, and this difficulty increases with the number of links in the multi-link group, and with the number of multi-link groups of the system. On highly channelized systems, the amount of processing required may impact throughput on other channels or links of other multi-link groups due to the amount of work required by the algorithm.
Another mechanism for determining the member link to which to transmit a packet or packet fragment involves a round robin selection process between member links and designating one of the member links as a reference link which is selected if another member link is full. This method involves the steps of (1) specifying one of the active links as the reference link. The amount of queued traffic associated with the reference link is monitored and used to back pressure the traffic management device scheduling traffic for the multi-link bundle;
(2) transmitting in a round robin manner successive packets or fragments to each active member link of the multi-link group; and (3) before transmitting to a link, polling the queue status to check if there is sufficient space for the packet or fragment. If there is insufficient space, the fragment is transmitted to the queue of the reference link.
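A minimal sketch of this round robin mechanism with a reference link is given below. The names `LinkQueue` and `distribute_round_robin` are hypothetical, queue occupancy is modelled simply as a list of queued sizes, and the sketch assumes that the reference queue always accepts overflow (back pressure on the traffic source is not modelled).

```python
from dataclasses import dataclass, field


@dataclass
class LinkQueue:
    """Hypothetical model of the queue associated with one member link."""
    capacity: int
    queued: list = field(default_factory=list)

    def free_space(self) -> int:
        return self.capacity - sum(self.queued)


def distribute_round_robin(unit_sizes, queues, reference_index):
    """Return, for each data unit, the index of the queue that received it."""
    placements = []
    current = 0
    for size in unit_sizes:
        target = current
        # Step (3): poll the selected queue; overflow goes to the reference link.
        if queues[target].free_space() < size:
            target = reference_index
        queues[target].queued.append(size)
        placements.append(target)
        # Step (2): selection always advances to the next active member link.
        current = (current + 1) % len(queues)
    return placements
```

Because selection advances after every data unit regardless of its size, a link that happens to receive only small final fragments is given little traffic per cycle.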
This method requires less computation than the first in selecting the member link to which to transmit a particular packet or fragment. However, this method is not particularly effective in evenly distributing traffic between active member links where packets are divided into multiple fragments and the final fragment is small. In this event, some traffic patterns cause traffic to be unevenly distributed among the member links, causing unexpectedly high delays or poor utilization of the bundle member links. Some member links may become empty while others have large amounts of traffic queued to them.
According to one aspect of the present invention, there is provided an apparatus for controlling the transfer of communication traffic to an interface having a plurality of links, comprising a detector for detecting a parameter indicative of the sizes of data units to be transferred to said interface, and a controller operative to cause data units to be transferred to the interface, wherein, for at least one of said links, said controller is operative to select the link to which to transfer a data unit based on the detected parameter of the data unit.
As used herein, the term “data unit” means either a packet or a fragment of a packet whether the fragment is a full fragment or a partial fragment. A “full fragment” is a fragment of a maximum specified fragment size and a “partial fragment” is a fragment of less than the maximum specified fragment size.
In this arrangement, the controller for managing the distribution of data units to the links of a multi-link group is sensitive to the data unit, and this enables the controller to distribute data units to the links more evenly. In particular, the controller is sensitive to a characteristic of the data units that may vary between data units, such as the size of the data unit. This allows the controller to discriminate between differently sized data units and control their distribution to links of a multi-link group on that basis.
When transmitting variable size packet fragments onto a multi-link PPP (point-to-point protocol) or FR (frame relay) interface on a switch or router, the inventors have found that it is desirable to distribute traffic evenly to all member links. The even distribution of traffic minimizes transmission and reassembly latency. On a highly channelized multi-service switch/router, the challenge is to distribute packets evenly to all member links of a bundle without impacting the throughput on other channels. This requires a highly efficient algorithm when selecting a member link onto which to transmit.
In some embodiments, the controller is operative to consecutively select the same link a plurality of times wherein each selection results in the transfer of a data unit to the link, if at least one of the data units has a size below a predetermined value. This mechanism enables, in addition to a relatively small data unit, another data unit to be transferred to the same link before selecting another link, thereby avoiding the transfer of only a relatively small data unit to a link in a single transfer session. This results in a more even distribution of traffic between the links of the multi-link group, and reduces the likelihood of a queue or link running dry.
In some embodiments, the apparatus further comprises a detector for detecting another characteristic of a data unit, and the controller is operative to select the link to which to transfer the data unit based on the detected characteristic. Thus, in this embodiment, the controller additionally selects the link based on whether a particular characteristic is present in a data unit. For example, the characteristic may be whether or not the data unit is a fragment of a packet. In one embodiment, if it is determined that the data unit is not a fragment of a packet and is below a predetermined fragment size (and is therefore an integral packet), the controller may be operative only to transfer that packet to the currently selected link without including another data unit which would otherwise increase the amount of traffic transferred to that link in a single transfer session. For example, this mechanism allows full data packets below a predetermined size such as voice packets, for instance, to be distributed among different links rather than two or more packets of such size being transmitted successively on the same link. This also provides a mechanism which allows the controller to discriminate between a full packet below a maximum fragment size and a fragment of a packet below the maximum size so that, for a plurality of consecutive full packets below the maximum fragment size, different member links can be successively selected for the transfer of each packet. This allows a contiguous stream of sub-maximum fragment size packets to be evenly distributed between the links and not all transmitted on a single link, so that the efficiency benefits of the multi-link system can be obtained.
In some embodiments, the plurality of links includes a reference link, and the controller is operative to transfer a data unit initially determined to be transferred to another link to the reference link in response to a status of the other link.
In some embodiments, the other link has an associated queue for receiving data units, and the status is that the queue has insufficient space for receiving the data unit. In this embodiment, the reference link provides an overflow for receiving data units which would otherwise have been transferred to other member links of the multi-link group if their respective queues had sufficient space.
In some embodiments, the controller is operative to select the reference link for the transfer of a data unit other than data units initially determined for transfer to another link. In some embodiments, the controller uses one or more different criteria or one or more different rules for transferring data units to the reference link from those used for transferring data units to another link of the group.
In this arrangement, the reference link is used not only as a data unit overflow but may also be selected by the controller for transmitting data units when not being used for data overflow. The controller may also use different criteria for transferring data units to the reference link from those used for transferring data to one or more other links. For example, the criteria used by the controller may have the effect that when the reference link is selected for transfer of non-overflow data, less traffic tends to be transferred to the reference link than to at least one other member link. In this arrangement, the controller is operative to bias selection of the links to which to transfer data units, towards one or more other member links relative to the reference link. In one specific, non-limiting example, when a data unit to be transferred to the reference link comprises a partial fragment of a packet (i.e. a fragment below a predetermined maximum fragment size), the controller may transfer only that fragment to the reference link without an additional data unit before selecting the next potential link to which to transfer the next data unit or units. This implementation provides a mechanism for reducing the non-overflow traffic on the reference link relative to traffic on the other member link(s). Thus, in contrast to the conventional round robin distribution mechanism discussed above which tends to underfill member links other than the reference link, the present embodiment better fills the member links while moderating the amount of traffic on the reference link so that better use is made of the multi-link system as a whole.
In some embodiments, a monitor is provided to monitor a status indicative of the amount of traffic and/or the amount of available space in a reference buffer of the reference link and to generate a signal indicative of the status which is used to control the flow of communication traffic to be distributed to the buffers and links of the multi-link group. In some embodiments, only the status signal of the reference buffer is used to control the flow of incoming communication traffic for distribution to the buffers and links of the multi-link group. This arrangement simplifies the system and reduces the resources (e.g. hardware) required to implement this function. In other embodiments, more than one reference link may be provided for a multi-link group, where the number of reference links is less than the total number of member links of the group. In such an arrangement, the status of each, or fewer than each reference buffer may be monitored, and their status used to control the flow of traffic for distribution by the multi-link group.
According to another aspect of the invention, there is provided an apparatus for controlling the transfer of communication traffic to a plurality of links of a group of links, comprising a detector for determining whether each data unit to be transferred to said group of links has a predetermined characteristic, and a controller operative to cause data units to be transferred to said group of links, wherein said controller is operative to select the link of the group to which each data unit is to be transferred, and is operative to control the number of data units transferred to a currently selected link based on the determination.
According to another aspect of the invention, there is provided a method for controlling the transfer of data units to a plurality of links of a group of links, comprising detecting a parameter capable of distinguishing between data units of different size, and selecting a link of the group to which to transfer the data unit based on the detected parameter.
According to another aspect of the present invention, there is provided an apparatus for controlling the transfer of communication traffic to a group of links including a reference link, the apparatus comprising a detector for detecting a status associated with each link and a controller operative to cause data units to be transferred to said reference link in response to the detected status of another link, wherein said controller is operative to select a link for the transfer of each data unit and is operative to transfer more data unit(s) to a link other than said reference link while said other link is selected than to said reference link while said reference link is selected.
In some embodiments, the controller is operative to transfer more data units to one or more links other than the reference link while the respective link is selected based on one or more predetermined criteria.
In some embodiments, the predetermined criterion is that each other link has sufficient space to receive the data unit(s).
In some embodiments, the criterion is based on a characteristic of at least one of the data units to be transferred, for example whether the data unit is less than a predetermined size or is a full packet or fragment of a packet, and/or any other characteristic.
According to another aspect of the invention, there is provided an apparatus for controlling the transfer of communication traffic to one or more buffers, each having an associated link and to a reference buffer having an associated reference link, the apparatus comprising a detector for detecting a characteristic, e.g. the sizes of data units of the communication traffic to be transferred to said buffer(s) and to said reference buffer, and a controller operative to cause data units to be transferred to said buffer(s) and to said reference buffer, wherein said controller is operative in response to the detected characteristic of the data units to bias selection of the buffers to which to transfer data units, towards said plurality of buffers relative to said reference buffer.
In some embodiments, the controller is operative to bias the selection based on one or both of (1) a determination that a buffer other than the reference buffer meets a predetermined criterion, and (2) that a data unit meets a predetermined criterion.
In some embodiments, the predetermined criterion of a buffer is whether the other buffer has sufficient space to receive a data unit. In some embodiments, the predetermined criterion of the data unit is whether the data unit is only part of a data packet.
Examples of embodiments of the present invention will now be described with reference to the drawings, in which:
In this embodiment, the router 1 includes a fragmenter 27 operatively coupled to the ingress module 3 for dividing received packets above a predetermined size into packet fragments. The fragmenter may be implemented so that data units output from the fragmenter include full packets which are either equal to or less than the predetermined maximum fragment size, packet fragments of a size equal to the maximum fragment size and partial fragments which are fragments of packets below the maximum fragment size. Thus, data units from the fragmenter may have any size ranging from the maximum fragment size downwards. Data units from the fragmenter 27 are transferred to the egress interface 15 under the control of the scheduler 9. In other embodiments, the fragmenter may be omitted so that, for example, only whole packets are transferred to the multi-link group.
The scheduler 9 comprises a detector 11 for detecting a parameter indicative of the sizes of data units to be transferred to the interface 15. The parameter may be any suitable parameter indicating the size of a data unit, including but not limited to any one or more of (1) the actual size of the data unit, (2) an indication that the data unit is below and/or above a certain size, and (3) an indication that the data unit is within and/or outside a particular size range. The scheduler 9 further comprises a controller which is operative to cause data units to be transferred to the interface 15, wherein, for at least one of the links of the interface, the controller is operative to select the link to which to transfer a data unit based on the parameter of the data unit detected by the detector 11.
In this embodiment, the detector 11 is also operative to detect another characteristic of data units which is also used by the controller 13 to select the link to which to transfer a data unit. The characteristic may be whether the data unit is a full packet which is less than or equal to the maximum fragment size or a partial fragment.
The scheduler including the detector and the controller may be implemented in software, firmware, hardware, or a combination of any two or more of these or by another suitable means.
In this embodiment, one of the links of the group and its associated buffer is functionally designated as a reference link (and reference buffer), which are defined as the link and buffer to which a data unit is transferred by the scheduler if it is determined that another member link to which the data unit would otherwise have been transferred has insufficient space in its associated buffer for receiving the data unit (or the link or buffer status is such that the link/buffer cannot receive the data unit for some other reason). In this particular example, link 19n and its associated buffer 23n are the reference link and reference buffer, respectively, although in other embodiments, any other link and associated buffer may provide the reference link/buffer. In other embodiments, any two or more member links and associated buffers may provide the reference function.
An indicator associated with each buffer 23a to 23n provides an indication to the scheduler 9 indicative of the amount of traffic queued in each buffer, and this is used by the scheduler to determine whether or not a data unit can be transferred to a particular buffer. These indicators are schematically represented in
In this embodiment, the router further comprises a queue monitor 31 which monitors the status of the reference buffer 23n. The queue monitor may generate a signal indicative of the available space in the reference buffer for receiving data units and this signal may be used to control (for example, maintain at a current level, increase or decrease) the flow of traffic to be transferred to the buffers and links of the multi-link group. The control signal may be used by any device which is capable of providing such control, which may include but is not limited to any one or more of the scheduler 9, the fragmenter 27, the ingress module 3 or a device upstream of the router or network device 1. Any one or more of these devices may communicate with each other to provide the control.
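One possible form of the signal generated by the queue monitor can be sketched as follows. The function name `flow_control_signal`, the three string values and the two watermark thresholds are all assumptions introduced for illustration; the description above does not specify how the available space is mapped to a control action.

```python
def flow_control_signal(free_space: int, capacity: int,
                        low_watermark: float = 0.25,
                        high_watermark: float = 0.75) -> str:
    """Map the free space of the reference buffer to a flow-control action.

    Hypothetical sketch: the watermarks are illustrative fractions of the
    reference buffer capacity, not values taken from the description.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    fraction_free = free_space / capacity
    if fraction_free < low_watermark:
        return "decrease"  # reference buffer nearly full: apply back pressure
    if fraction_free > high_watermark:
        return "increase"  # reference buffer nearly empty: admit more traffic
    return "maintain"
```

Only the reference buffer is consulted in this sketch, which reflects the arrangement in which a single status signal controls the flow of traffic for the whole multi-link group.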
In the general method of controlling the transfer of data units to member links of a multi-link group which may be implemented by the scheduler 9, for one or more member links, the controller selects the link to which to transfer a data unit based on a parameter indicative of its size. In addition, for these one or more member links, the link to which to transfer a data unit may also be based on another characteristic of a data unit such as whether or not the data unit is a full packet or partial fragment. In one example, where a specific link is selected for the transfer of a partial fragment, the same link may be consecutively selected also for the transfer of another data unit. In this way, the transfer of a relatively small data unit is accompanied by the transfer of another data unit to the same link (or buffer). This makes better use of a transfer session by transferring a larger amount of traffic, assisting in distributing traffic more evenly between the member links and helping to prevent a buffer running out of data units before receiving another unit. However, where a specific link is selected for the transfer of a full packet, only the packet is transferred without an additional data unit, and another buffer/link is initially selected for the transfer of the next data unit. Thus, the controller can discriminate between small packets and small fragments, and distribute small packets evenly among the links of the group.
Although in some embodiments, this method of consecutively selecting the same member link for the transfer of two or more data units before making another selection if one of the data units is a partial fragment may also apply to the reference link, in other embodiments, this method is not applied to the reference link and instead, a different criterion for transferring data to the reference link is used. In one embodiment, the method used for transferring data units to the reference link involves selecting the reference link a number of times which is less than the number of times another member link is consecutively selected to receive data units, and in one specific embodiment, the reference link is selected only once. Thus, in this embodiment, if the reference link is selected to receive a partial fragment, only the partial fragment is transferred to the reference link without a consecutive selection of the reference link for the transfer of another data unit, based on the transfer of a partial fragment. However, the reference link may be selected consecutively for the transfer of two or more data units where the previous transfer was to the reference link and a member link that would have been selected next cannot accept the next data unit and the reference link is invoked in its overflow capacity.
In other embodiments, the controller may be configured to consecutively select a link other than the reference link, for the transfer of n data units, where n≧3 based on a characteristic of a data unit and to consecutively select the reference link for the transfer of n-x data units, where x≧1 based on the same characteristic.
A specific but non-limiting example of an embodiment of a method for controlling the transfer of data units to member links of a multi-link group (or bundle) includes the following steps:
(1) One of the active links is specified as the reference link. As mentioned above, the status of the reference link and/or its associated buffer is used to control the flow of traffic to be transferred to the buffers/links of the multi-link bundle, and may for example be used to back pressure the traffic processor, and/or any other device which is capable of controlling the traffic flow.
(2) Member links to which data units are to be transferred are selected in a round-robin manner.
(3) Before transmitting a data unit to a particular link, the status of the associated buffer is polled to check if the buffer has sufficient space available for the data unit. If there is sufficient space in the selected buffer, the data unit is transferred to the selected buffer. If there is insufficient space, the data unit is transferred to the reference link.
(4) If the data unit to be transferred to a link other than the reference link is a partial fragment (i.e. a fragment of a packet which is smaller than the maximum fragment size), selection is not advanced to the next member link, but the same member link is again selected for receiving the next data unit. Thereafter, selection is advanced to the next member link.
(5) If the data unit to be transferred to the reference link is a partial fragment, the unit is transferred and selection is advanced to the next member link.
(6) If the data unit to be transferred to a member link is a full packet that is equal to or less than the size of a full fragment, the selection advances to the next member link.
(7) If the next data unit to be transferred is a full fragment, the full fragment is transferred to the currently selected link and the selection may advance to the next member link. Alternatively, in another embodiment, if the next two data units to be transferred comprise a full fragment and a partial fragment, the method may be implemented such that both data units are transferred to the same member link.
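Steps (1) to (7) above can be sketched as follows, using the first alternative of step (7) in which selection advances after a full fragment. The names `Kind`, `DataUnit`, `Buffer` and `distribute` are hypothetical, and the sketch assumes that distribution simply stops when the reference buffer cannot accept a data unit, leaving any back pressure to the caller.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Kind(Enum):
    FULL_PACKET = auto()       # whole packet no larger than the maximum fragment size
    FULL_FRAGMENT = auto()     # fragment of the maximum fragment size
    PARTIAL_FRAGMENT = auto()  # fragment smaller than the maximum fragment size


@dataclass
class DataUnit:
    name: str
    size: int
    kind: Kind


@dataclass
class Buffer:
    capacity: int
    units: list = field(default_factory=list)

    def free_space(self) -> int:
        return self.capacity - sum(u.size for u in self.units)


def distribute(units, buffers, reference_index):
    """Return the buffer index chosen for each data unit, in order."""
    placements = []
    current = 0          # step (2): round-robin position
    reselected = False   # True while a link is held after a partial fragment
    for unit in units:
        target = current
        # Step (3): poll the selected buffer; overflow goes to the reference link.
        if buffers[target].free_space() < unit.size:
            target = reference_index
            if buffers[target].free_space() < unit.size:
                break  # assumption: caller reduces traffic flow (step (1))
        buffers[target].units.append(unit)
        placements.append(target)
        # Step (4): after a partial fragment on a non-reference link, select
        # the same link once more for the next data unit.
        if (unit.kind is Kind.PARTIAL_FRAGMENT
                and target != reference_index and not reselected):
            reselected = True
            continue
        # Steps (5), (6) and (7): otherwise advance to the next member link.
        reselected = False
        current = (current + 1) % len(buffers)
    return placements
```

In this sketch a partial fragment sent to a non-reference buffer is always followed by one further data unit on the same buffer, whereas a partial fragment sent to the reference buffer is not.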
A flow diagram illustrating an example of a process for controlling the transfer of data units to member links of a multi-link group, and which may be implemented by the scheduler 9 shown in
Referring to
Returning to step 203, if it is determined that the data unit is not a partial fragment, it is determined at step 213 whether the data unit is a full packet rather than a fragment. In this case, the full packet may have a size either equal to or less than the maximum fragment size. If the data unit is a packet, the data unit is transferred to the selected member link buffer at step 215 and the process advances to step 211 in which the next member link buffer is selected.
Returning to step 213, if it is determined that the data unit is not a full packet, the process may deduce that the data unit is a full fragment, and transfers the full fragment to the selected member link buffer at step 217. The process may then advance to step 211, in which the next buffer is selected. In an alternative embodiment, after selecting the current member link, e.g. before, during or after transferring the full fragment to the selected buffer at step 217, (or at some other time), the process may perform steps in which a partial fragment is also transferred to the same buffer, an example of which is shown by the broken line steps in
Embodiments of the process may include both sets of steps 207,209 and steps 219,221 and 223 and in other embodiments, the process may include either one of these two sets of steps but not the other.
Once the next buffer for transfer of the next data unit is selected at step 211, the process determines if the selected buffer has sufficient space to receive the data unit at step 201. If yes, the process advances to step 203 and the cycle is repeated. If, at step 201 it is determined that the selected buffer has insufficient space, it is determined whether the selected buffer is the reference link buffer at step 225 and if not, a determination is made as to whether the reference buffer has sufficient space at step 227. If the reference buffer has sufficient space, the data unit is transferred to the reference buffer at step 229 and the process then advances to step 211 at which the next buffer is selected. Returning to step 227, if it is determined that the reference buffer does not have sufficient space, action is taken to reduce traffic flow for distribution to the member link group. Similarly, if at step 225 it is determined that the selected buffer that has insufficient space (as determined at step 201) is the reference buffer, the process advances to step 231 at which appropriate action is taken. Once appropriate action has been taken or while appropriate action is being taken, the process may again advance to step 211 at which the next member link buffer to which a data unit is to be transferred is selected.
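The space check and overflow decisions of steps 201, 225, 227, 229 and 231 can be sketched as a single function. The name `resolve_target` and the convention of returning `None` when traffic flow should be reduced are assumptions made for illustration; the sketch also assumes that a full reference buffer at step 227 leads to the same action as step 231.

```python
from typing import Optional


def resolve_target(selected: int, reference: int,
                   free_space: list[int], unit_size: int) -> Optional[int]:
    """Return the buffer index that should receive the data unit.

    Hypothetical sketch: free_space[i] is the space available in buffer i.
    None means that no buffer can take the unit and that action should be
    taken to reduce the traffic flow.
    """
    if free_space[selected] >= unit_size:   # step 201: selected buffer has space
        return selected
    if selected == reference:               # step 225: the full buffer is the reference
        return None                         # step 231: take appropriate action
    if free_space[reference] >= unit_size:  # step 227: reference buffer has space
        return reference                    # step 229: overflow to the reference buffer
    return None                             # assumed to lead to step 231 as well
```

In every case the process then returns to step 211, so the caller of this sketch would advance buffer selection after acting on the result.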
The flow diagram of
A more specific but merely illustrative and non-limiting example of an implementation of a process for transferring data units to member links of a multi-link group based on the embodiment of the method shown in
Referring to
As illustrated in
In the example of
In this example, the fragmented packets or full packets from the fragmenter 27 are made available for distribution to the member links of the multi-link group in the order of P1 to P6 and the fragments of each packet are made available in the same order in which they appear in the packet. (In other embodiments, packets and/or fragments of a packet may be made available for distribution in any other order.)
In the next cycle shown in
In the third cycle illustrated in
In the next cycle, illustrated in
It can be appreciated from the above example that the distribution method tends to cause the non-reference link member buffers to receive a higher proportion of the available data units for distribution to the group per distribution cycle compared to the prior methods. In the embodiment, this is achieved by consecutively selecting the same buffer for the transfer of two (or possibly more) data units, where one of the data units is relatively small. This helps to ensure that each time a buffer is selected in the distribution cycle, a larger minimum amount of traffic is transferred to that buffer before advancing to the next buffer, making it less likely that that particular buffer runs out of data units to transfer to the link before it is selected again in the next distribution cycle. Advantageously, this also helps to reduce or eliminate latency in reassembling fragments due to delays in receiving one or more packet fragments.
Referring to an alternative (or additional) process illustrated in
In the third cycle shown in
In the next cycle illustrated in
For illustrative purposes, further data units to be transferred to the member links of the multi-link group may include data units P7, F1P8, F2P8 and P9. After the partial fragment F4P6 is transferred to the buffer B2 together with the full fragment F3P6, buffer selection advances to the next buffer, B3, for the transfer of the next data unit P7. As data unit P7 is a full packet, buffer selection then advances to the next buffer which is the reference buffer RB, and the next data unit which is a full fragment F1P8 is transferred thereto. Invoking the process rule 219 illustrated in
It will be appreciated that this method is similar to that and provides the same benefits as the method described above with reference to
A further benefit of embodiments of the method is that, because the distribution method helps to distribute data units more evenly among the links of a multi-link group, the link buffers are less likely to run dry or to become full and unable to accept a data unit when selected. As a result, the buffer size may be reduced and/or the number of member links may be increased without compromising performance due to these two effects. The number of member links can be increased as it is less likely that the link buffer will run dry before it is next selected. The buffer size can be maintained or reduced as it is not necessary to oversize the buffers, if the number of buffers is increased, in order to accommodate more data units in each buffer to reduce the likelihood of running dry due to the increased time to complete a distribution cycle. Distributing the data units more evenly may also reduce the number of times the reference link is invoked in its overflow capacity, which also reduces the additional processing involved, thereby making the distribution method even more efficient.
Embodiments of the apparatus and method may be applied to any device requiring data distribution over a plurality of links, including, but not limited to network devices including switches and routers, examples of which include Multi-link Point to Point Protocol (ML PPP), Multi-Link Frame Relay (MLFR) as well as others, relays, end user devices, e.g. computers, mobile or static communication devices including personal handheld devices, including mobile telephones and other devices. Embodiments of the apparatus and method may be used in any communication network including wireless, or landline including wireline, optical and/or any other communication traffic conveying media.
It is to be noted that a round robin buffer selection cycle may start with any buffer and the buffers may be selected in any predetermined sequence.
In any aspect or embodiment of the apparatus or method described herein, any one or more features may be omitted altogether or substituted by one or more other features, which may or may not be an equivalent thereof.
Other aspects and embodiments comprise any one or more features disclosed herein in combination with any one or more other features disclosed herein, or a variant or equivalent thereof.
Numerous modifications to the embodiments described herein will be apparent to those skilled in the art.