Reducing Propagation Of Message Floods In Computer Networks

Description

BACKGROUND

Computer networks are typically comprised of a number of network switches which connect a group of computers together. Ideally, computer networks pass messages between computers quickly and reliably. Additionally, it can be desirable that a computer network be self-configuring and self-healing. In Ethernet switching networks, a spanning tree algorithm is often used to automatically generate a viable network topology. However, there are several challenges when implementing Ethernet switching networks within large datacenters and computer clusters. One challenge relates to instances where the network switches do not have the necessary information to deliver a message to its destination. In this case, the network switches broadcast the message throughout the entire network, resulting in a message flood. The message flood eventually results in the delivery of the message to the desired end station, but produces a large volume of network traffic. In large scale networks, where the number of end stations is large, the likelihood and magnitude of message floods increase dramatically.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate various embodiments of the principles described herein and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the claims.

FIG. 1 is a diagram of one illustrative computer network, according to one embodiment of principles described herein.

FIG. 2 is a diagram of a portion of an illustrative computer network, according to one embodiment of principles described herein.

FIG. 3 is a diagram of an illustrative method for a network switch to determine how to route an incoming message, according to one embodiment of principles described herein.

FIG. 4 is a diagram of a portion of an illustrative computer network which limits message flooding to a relevant portion of the computer network, according to one embodiment of principles described herein.

FIG. 5 is a diagram of an illustrative method for using a local unique constant in a hash function to reduce message flooding, according to one embodiment of principles described herein.

FIG. 6 is a flowchart showing an illustrative method for reducing propagation of message floods in computer networks, according to one embodiment of principles described herein.

Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.

DETAILED DESCRIPTION

The network switches which make up a computer network incorporate forwarding tables which contain forwarding information required to route messages to their intended destination. Forwarding tables use caches that are based on techniques which hash the end station address to implement destination lookup tables. After a forwarding route is cached, switches forward data only on the port that sends data toward its intended destination.

However, the forwarding tables are typically stored on random access memory (RAM) with limited memory capacity, which may prevent the network switch from retaining a complete set of forwarding data. With limited size RAM, hash collisions result in hash table misses. Hash table misses cause flood type broadcasting within the network that decreases network performance. Existing systems do not fully utilize the capabilities of neighboring switches to limit the propagation of flooding to relevant portions of the network when forwarding cache misses result in a broadcast.

Additionally, the network switches depend on the proper forwarding information being propagated through the network. If the destination of an incoming message can be matched with proper routing information contained within the forwarding table, the switch forwards the message along the proper route. However, if there is no routing information that corresponds to the desired destination, the switch broadcasts the message to the entire network. This creates a message flood which propagates through the entire computer network, eventually reaching the desired end station. Particularly in large computing networks, this message flood can consume a large portion of the network capacity, resulting in decreased performance and/or the requirement to build a more expensive network that has far greater capacity than would otherwise be required.

This specification describes networking techniques which reduce propagation of message floods while still allowing the message to reach the desired end station. In particular, the specification describes techniques that improve the ability of neighboring switches to mitigate broadcast penalties without the requirement for hardware changes or upgrades. This allows networks to incorporate smaller forwarding caches while providing an equivalent level of performance.

Existing forwarding techniques suffer from two inadequacies. First, since a forwarding cache in one switch uses the same hashing function as forwarding caches in neighboring switches, cache collisions produced in one switch may be replicated in neighboring switches. In particular, the broadcasting action within one switch may cause cache misses and broadcasting to neighboring switches. This may cause cache missing to propagate from switch to switch throughout a computer network.

According to one illustrative embodiment, distinct hash functions can be implemented within each switch. With this technique, even when a hash collision occurs within a forwarding cache in one switch, it is unlikely that a hash collision occurs in a neighboring switch. This can improve the neighboring switches' ability to block unnecessary broadcast traffic. By introducing the concept of a distinct hash within each switch, broadcast traffic and wasted network bandwidth are reduced.

Limiting the scope for broadcast traffic also reduces the number of unnecessary forwarding entries that must be maintained within switch caches that are not directly on the communication path. A second inadequacy is that, in many situations, it is difficult for a switch that is a neighbor to a switch that is missing its cache to learn the forwarding direction for missing end station addresses. The specification describes a method to detect that cache missing is occurring and to instruct neighboring switches as to the location of the missing end station in order to eliminate unnecessary propagation of broadcast traffic.

Additionally or alternatively, a method for neighboring switches to learn the forwarding direction for missing end station addresses can be implemented. First, a network switch detects conditions that are symptomatic of cache missing. In this situation, selective broadcasting is intentionally performed to deposit forwarding entries in the caches of switches that are in the neighborhood of the missing switch. This again improves the ability of neighboring switches to limit the effects of broadcast traffic. With this invention, networks can be constructed using simpler hashing functions and smaller forwarding table RAM while reducing the volume of unnecessary broadcast traffic.

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present systems and methods. It will be apparent, however, to one skilled in the art that the present apparatus, systems and methods may be practiced without these specific details. Reference in the specification to “an embodiment,” “an example” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least that one embodiment, but not necessarily in other embodiments. The various instances of the phrase “in one embodiment” or similar phrases in various places in the specification are not necessarily all referring to the same embodiment.

FIG. 1 shows one illustrative embodiment of a three tiered computer network which connects a number of end stations (105). Each end station (105) is connected to an edge switch (110). The edge switches (110) are connected in turn to larger aggregation switches (115) which are connected to core switches (120, 122). Thus, this illustrative computer network has a three tiered structure consisting of an edge layer (125), an aggregation layer (130), and a core layer (140). Using the computer network (100), any end station (105) can communicate with any of the other end stations (105).

The computer network topology and management are important to maximize the performance of the computer network, reduce costs, increase flexibility and provide the desired stability. Early in the development of computer networks, a number of problems had to be overcome. One of those problems was messages being trapped in an endless loop as a result of a minor change to the network topology, such as adding a link or an end station. The trapped message would be repeatedly passed between various network components in a closed cycle that never allowed the message to reach the intended destination. This could generate enormous volumes of useless traffic, often making a network unusable.

A spanning tree algorithm was developed to eliminate potential cycles within a network. The spanning tree algorithm identifies a set of links that spans the network and allows each end station to communicate with every other end station. Redundant links were blocked to prevent loops which could give rise to a cycle. After a spanning tree is identified throughout an entire network, each switch within the network can use a very simple forwarding procedure. When a message is sent from any end station A to any end station B, each switch forwards an incoming message on all active (spanning tree) ports except the port on which the message arrived. This process is called flooding and can be performed with no routing information except information needed to define the active ports. This simple procedure guarantees correct network transmission. Every message sent from an end station A traverses the entire spanning tree and is guaranteed to arrive at end station B, where it is received when B recognizes its target address. Other end stations drop the message addressed to end station B because it is not addressed to them. The use of the spanning tree prevents endless message transmission. When a message reaches the end of the spanning tree, no further message transmission occurs.
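By way of illustration only, the flooding procedure described above can be sketched in Python. The topology, the node names, and the function name `flood` are hypothetical and are not taken from the figures; end stations are modeled as leaf nodes with a single link, so the "forward on all ports except the arrival port" rule never causes them to forward anything.

```python
from collections import deque

def flood(tree, origin):
    """Flood a message from `origin` over a spanning tree.

    `tree` maps each node to the set of neighbors reachable over active
    (spanning tree) links. Returns the list of (sender, receiver) hops.
    """
    hops = []
    queue = deque([(origin, None)])  # (node, neighbor the message came from)
    while queue:
        node, came_from = queue.popleft()
        for neighbor in sorted(tree[node]):
            if neighbor == came_from:
                continue  # never send back on the arrival link
            hops.append((node, neighbor))
            queue.append((neighbor, node))
    return hops

# Hypothetical spanning tree: two end stations on one switch, two more switches upstream.
tree = {
    "A": {"S1"},
    "B": {"S1"},
    "S1": {"A", "B", "S2"},
    "S2": {"S1", "S3"},
    "S3": {"S2"},
}
hops = flood(tree, "A")
```

Because the active links form a tree, each link is traversed exactly once and transmission stops at the leaves. In this sketch the message from A reaches B, but it also travels over the upstream links to S2 and S3, which carry no useful traffic for this exchange.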

For large networks, this broadcast-based procedure can be very inefficient. FIG. 2 illustrates a portion of a computer network (200) in which a spanning tree has been implemented. The computer network (200) contains end station A (205), end station B (210), and a number of upstream switches (215, 220, 225). When a message is sent from end station A (205) to a neighboring end station B (210), that message traverses every link in the entire spanning tree even though communications could be supported from A to B using only the links (230, 235) between end station A (205), switch 1 (215), and end station B (210).

Adaptive forwarding has been developed to enhance communications efficiency using forwarding tables that learn the proper route to each destination. Messages contain MAC (Media Access Control) addresses that uniquely identify all end stations within an Ethernet network. Each message has a source MAC address and a destination MAC address. The source indicates the origin end station, and the destination indicates the target end station. Whenever a message is received on a link with source address X, a forwarding table entry is created so that all subsequent messages destined for X are forwarded only on this link. For example, after a first message is sent from end station B (210) with source address B, a forwarding entry to B is created within switch 1 (215). Subsequent messages sent into switch 1 (e.g. from end station A) with destination address B traverse only link 1 (230) and link 2 (235). This procedure is used to adaptively create forwarding tables throughout large networks to reduce network traffic. This adaptive forwarding procedure requires that the switches efficiently implement hardware based lookup tables. Lookup hardware reads the input destination MAC address, which consists of either 48 or 64 bits of address information, depending on the addressing standard. The lookup result, if successful, identifies the unique forwarding port to the addressed end station. If a forwarding entry for the input MAC address is not found, then the message is forwarded on all active links except the link on which the message arrived.
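The adaptive forwarding procedure can be sketched as follows. This is a minimal illustration that assumes an unbounded table (a Python dictionary) in place of the hardware lookup table; the class name `LearningSwitch` and the port names are hypothetical. The check that suppresses forwarding back onto the arrival port is ordinary learning-switch behavior and is included so that the sketch never returns the arrival port.

```python
class LearningSwitch:
    """Adaptive forwarding with an unbounded table (no hashing or RAM limit)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # source MAC address -> port on which it was last seen

    def receive(self, src, dst, in_port):
        """Learn the source, then return the list of ports to forward on."""
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # No forwarding entry: flood on all ports except the arrival port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination lies back the way the message came; nothing to forward.
            return []
        return [out_port]

switch_1 = LearningSwitch(ports=["link_1", "link_2", "uplink"])
first = switch_1.receive(src="A", dst="B", in_port="link_1")  # B unknown: flooded
reply = switch_1.receive(src="B", dst="A", in_port="link_2")  # A was learned
later = switch_1.receive(src="A", dst="B", in_port="link_1")  # B was learned
```

Only the first message is flooded. Once B has sent a message through the switch, traffic from A to B uses the single learned port.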

Efficient hash mapping approaches have been developed to implement hardware lookup for adaptive forwarding. FIG. 3 is a diagram of an illustrative associative lookup method (300). An input MAC address (305) is received on the left. The MAC address (305) is received by a hash function (310) which performs a randomizing function that produces a consistent random value for each MAC address read. In a simple example, the hash function (310) may multiply the MAC address by a very large prime number and select ten digits from the result. These ten digits are a quasi-random number which can be consistently generated given a MAC address. Consequently, whenever the same MAC address is applied on input, the same random address is produced. The hash function allows fast lookup and even distribution of MAC addresses within the RAM address space. In one example system, the hash function reads a 48 bit MAC address and produces a 12 bit hash address. This hash address is used to look up forwarding entries simultaneously within the two 4096 word lookup Forwarding Table Random Access Memories (RAMs) (315, 320). Table entries are stored within two RAMs totaling 8192 potential entries in this two-way set associative lookup. Each entry is marked as empty or not empty. Each non-empty entry holds a forwarding instruction consisting of a tag field that validates a match and a result field that indicates the correct forwarding action when the match occurs. In this example, the tag field contains the full destination MAC address and the result field contains the index of the port on which data should be sent to reach that address. For example, if a matching entry results in the value 6, then the 6th port is used to forward data.

During destination address lookup, two potential forwarding instructions result. The tag fields are then compared by the tag compare modules (325, 330). If the tag field for one of those forwarding instructions exactly matches the input destination MAC address, then the result field from the matching instruction can be used to forward data.
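A software sketch of the two-way set associative lookup is given below. It is not the hardware of FIG. 3: the multiplier `PRIME`, the bit positions selected from the product, and the names `hash_address` and `TwoWayTable` are assumptions made for illustration, and the sketch selects twelve bits of the product rather than decimal digits.

```python
HASH_BITS = 12
SETS = 1 << HASH_BITS  # 4096 entries per RAM
PRIME = 2**61 - 1      # a large prime (assumed; the description names no value)

def hash_address(mac):
    """Map a 48-bit MAC address to a 12-bit hash address."""
    return ((mac * PRIME) >> 20) & (SETS - 1)

class TwoWayTable:
    def __init__(self):
        # Two RAMs; each slot is None (empty) or a (tag, port) pair.
        self.rams = [[None] * SETS, [None] * SETS]

    def lookup(self, dst_mac):
        index = hash_address(dst_mac)
        for ram in self.rams:  # hardware would read both RAMs simultaneously
            entry = ram[index]
            if entry is not None and entry[0] == dst_mac:  # tag compare
                return entry[1]  # result field: forwarding port
        return None  # forwarding table lookup miss

table = TwoWayTable()
mac = 0x001A2B3C4D5E
table.rams[1][hash_address(mac)] = (mac, 6)  # install an entry by hand
port = table.lookup(mac)
missing = table.lookup(0x001A2B3C4D5F)
```

Here `port` is 6 and `missing` is `None`. The full-address tag comparison is what prevents a different address that happens to share the hash address from being forwarded on the wrong port.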

Whenever a message enters a switch, both its source address and the destination address are processed. The destination address is processed to determine the correct forwarding port. The source address is processed to determine whether a new entry should be placed in the forwarding table. When a source address is processed, the lookup table is queried to see whether that source address is already in the table. If no entry for the source address lies in the table, then there are no current instructions on how to reach the end station having that source address. In this case, a new entry can be added into the table. If either entry is empty, then the value for that forwarding entry is set with tag field equal to the source address and result field equal to the port on which the message arrived into the switch. For this switch, subsequent messages sent to that source address will be sent only on the correct forwarding port. If the address is already in the table and the correct forwarding port is indicated, no further action is needed. If the address is already in the table and an incorrect forwarding port is indicated, then the entry is overwritten with correct forwarding instructions.

As new entries are entered into the table, a replacement strategy is needed. When a message arrives from an end station having a given source address, the lookup process may determine that both entries are nonempty and do not match the newly arriving message. In this case, the new entry may displace a randomly selected entry from the two-way set. Thus, replacement “flips a coin” and decides which entry is to be replaced with the new entry. There are occasions when multiple, frequently used destinations happen to hash to the same hash address. For this two-way set associative scheme, only two distinct forwarding instructions can be held at the same hash address location within each of the two RAMs. If there is a third common communication to the same hash address, at least one of these communications will repeatedly fail to identify forwarding instructions. This is called a forwarding table lookup miss. In this case, data is flooded or forwarded on all spanning tree ports except for the port on which the message arrived.
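The source-address processing and the random replacement strategy can be sketched as follows. The class name `TwoWayLearningTable`, the reduced table size, and the modulo hash used in the example are assumptions chosen so that three addresses visibly share one hash address; they are not taken from the description.

```python
import random

class TwoWayLearningTable:
    def __init__(self, sets=4096, hash_fn=None, rng=None):
        self.hash_fn = hash_fn or (lambda mac: mac % sets)  # placeholder hash
        self.rng = rng or random.Random()
        self.rams = [[None] * sets, [None] * sets]

    def learn(self, src_mac, in_port):
        index = self.hash_fn(src_mac)
        # Address already present: keep it, correcting the port if it changed.
        for ram in self.rams:
            if ram[index] is not None and ram[index][0] == src_mac:
                ram[index] = (src_mac, in_port)
                return
        # Otherwise use an empty slot if either RAM has one.
        for ram in self.rams:
            if ram[index] is None:
                ram[index] = (src_mac, in_port)
                return
        # Both slots hold other addresses: "flip a coin" to pick the victim.
        self.rams[self.rng.randrange(2)][index] = (src_mac, in_port)

    def lookup(self, dst_mac):
        index = self.hash_fn(dst_mac)
        for ram in self.rams:
            if ram[index] is not None and ram[index][0] == dst_mac:
                return ram[index][1]
        return None

# Three hypothetical addresses that all map to hash address 1 when sets=16.
table = TwoWayLearningTable(sets=16)
for mac, port in [(0x11, 1), (0x21, 2), (0x31, 3)]:
    table.learn(mac, port)
resolvable = [mac for mac in (0x11, 0x21, 0x31) if table.lookup(mac) is not None]
```

After the third address is learned, `resolvable` contains exactly two of the three addresses. Messages to the displaced address miss and are flooded until that address is learned again, at which point it displaces one of the other two.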

According to one illustrative embodiment, several changes can be made to the architecture described above which may reduce cost and improve the performance of the computer network. A reduction in forwarding efficiency occurs when multiple destination addresses produce the same hash address. For example, in a one-way set associative table, only a single forwarding entry can reside at each hash location. When multiple forwarding addresses hash to the same location, forwarding misses will cause some incoming messages to be flooded.

At least two features can be introduced into the network architecture which reduce propagation of message floods within the network. The purpose of these features is to assist neighboring switches in halting the propagation of broadcast floods throughout a larger network and to reduce the total forwarding table space needed within the network. Reducing table space requirements again reduces the number of flooding actions within the network as each required entry can replace another needed entry. To simplify examples, we assume switches use a one-way associative hash table to implement adaptive forwarding.

FIG. 4 shows an illustrative computer network (400) where it is assumed that, due to an unfortunate choice of destination addresses, end station B (410) and end station C (415) conflict or map to the same hash address. Conflicting B and C forwarding entries cannot be simultaneously present within any switch. For this example, it is assumed that a large bidirectional communication flow occurs between end stations A and B (405, 410) as well as another large bidirectional flow between end stations A and C (405, 415). As traffic alternates first from end station A to B (405, 410), and then from end station A to C (405, 415), switch 4 (435) cannot hold forwarding entries for both communication flows. One of the flows will miss during its forwarding lookup and will flood messages on all paths including the required communication path. Determining which flow misses depends upon the replacement order for forwarding entries for the conflicting B and C end stations (410, 415). After end station B (410) communicates with end station A (405), then messages from end station A (405) to end station B (410) no longer miss, but messages from end station A (405) to end station C (415) now do miss. Similarly, after end station C (415) communicates with end station A (405), then messages from end station A (405) to end station C (415) no longer miss, but messages from end station A (405) to end station B (410) now do miss.
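The alternating misses at switch 4 can be reproduced with a small one-way table. The addresses, the port names, and the modulo hash below are hypothetical values chosen so that B and C share a hash address while A does not.

```python
class OneWayTable:
    """One-way associative table: a single entry per hash address."""

    def __init__(self, sets, hash_fn):
        self.hash_fn = hash_fn
        self.slots = [None] * sets

    def learn(self, mac, port):
        # A new entry simply overwrites whatever occupied the slot.
        self.slots[self.hash_fn(mac)] = (mac, port)

    def lookup(self, mac):
        entry = self.slots[self.hash_fn(mac)]
        if entry is not None and entry[0] == mac:
            return entry[1]
        return None

A, B, C = 0x0A, 0x1B, 0x2B  # B and C both map to hash address 11
switch_4 = OneWayTable(sets=16, hash_fn=lambda mac: mac % 16)
switch_4.learn(A, "port_a")

log = []  # (B resolvable, C resolvable) after each station sends toward A
for talker, port in [(B, "port_b"), (C, "port_c"), (B, "port_b")]:
    switch_4.learn(talker, port)
    log.append((switch_4.lookup(B) is not None, switch_4.lookup(C) is not None))
```

The resulting `log` is `[(True, False), (False, True), (True, False)]`: whichever of B or C sent most recently is resolvable, and messages to the other miss and are flooded. The entry for A is unaffected because it occupies a different hash address.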

Assume that, after the network is initialized, the very first communication is from end station A (405) to end station B (410). Since no switch within the entire network has a forwarding entry for end station B (410), the message is broadcast throughout the entire spanning tree and the adaptive forwarding procedure places an entry for end station A (405) in every switch. Now, all messages sent to end station A (405) traverse the proper communication path. For example, messages sent from either end station B (410) to end station A (405) or from end station C (415) to end station A (405) never traverse switch 5 (440). Consequently, switch 5 (440), and more remote switches, may never discover forwarding entries for end station B (410) or end station C (415). As a result, misses at switch 4 (435) for flows from end station A (405) to end station B (410) or from end station A (405) to end station C (415) propagate throughout large regions of the network that have no knowledge of the location of end station B (410) or end station C (415).

This problem can be alleviated by ensuring that switch 5 (440) becomes aware of the location of end stations for which a miss is repeatedly occurring. If switch 5 (440) has a forwarding entry for end station B (410) and switch 5 (440) receives a message for end station B (410) from switch 4 (435), then that message can be dropped because it has arrived on an input link that is also the correct route to the destination. If switch 5 (440) can learn the needed end station location information, switch 5 (440) can provide a barrier that limits unnecessary message propagation due to forwarding table misses in switch 4 (435).
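The barrier behavior amounts to a single comparison between the arrival port and the learned forwarding port. The following sketch assumes a dictionary in place of the forwarding table; the function name `forward_ports` and the port names are hypothetical.

```python
def forward_ports(table, ports, dst, in_port):
    """Return the ports on which a message for `dst` should be sent."""
    out_port = table.get(dst)
    if out_port is None:
        # Miss: flood on every port except the one the message arrived on.
        return [p for p in ports if p != in_port]
    if out_port == in_port:
        # The message arrived on the link that leads to its destination,
        # so it is a stray copy of a flood and can be dropped.
        return []
    return [out_port]

SWITCH_5_PORTS = ["to_switch_4", "remote_1", "remote_2"]

# Without an entry for B, switch 5 passes the flood on to the remote links.
uninformed = forward_ports({}, SWITCH_5_PORTS, "B", "to_switch_4")

# With an entry for B pointing back toward switch 4, the flood stops here.
informed = forward_ports({"B": "to_switch_4"}, SWITCH_5_PORTS, "B", "to_switch_4")
```

In this sketch `uninformed` is `["remote_1", "remote_2"]` and `informed` is an empty list, which is the barrier effect described above.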

One approach to propagating needed information uses logic in each switch to detect whenever a new forwarding entry is entered. For example, a message from end station B (410) to end station A (405) may cause a new forwarding entry for B to be added in switch 4 (435). This indicates that a message sent to end station B (410) would have missed just prior to this addition, and thus, misses to end station B (410) may likely happen in the future whenever this new entry is replaced. When the message from end station B (410) to end station A (405) is processed, and the B entry is added, this message is artificially flooded even though the lookup entry for end station A (405) lies in the forwarding table. This allows that on a subsequent miss from end station A (405) to end station B (410) at switch 4 (435), switch 5 (440) will block flooding that might otherwise propagate throughout the network. While the link connecting switch 4 (435) to switch 5 (440) is flooded, flooding does not propagate past switch 5 (440).

This artificial flooding action to teach the network the location of end stations need not be performed on every new entry insertion as that might waste undue link bandwidth. The artificial flooding action may be caused with some low probability each time a new entry is added. For example, when a message from end station B (410) to end station A (405) is processed and a replacement of the B forwarding entry occurs, the switch can flood with some low probability p (e.g. p=0.01). This allows that switch 5 (440) will eventually (after about 100 replacements of the destination at switch 4 (435)) learn the location for an end station that is repeatedly missing at switch 4 (435). The forwarding probability can be adjusted to produce the desired forwarding frequency. For example, a low forwarding probability could be used where there is a large number of communication flows and an inadequate hash table size such that the forwarding process can miss frequently. This can reduce the overall network traffic and minimize the expense of broadcasting messages over a long distance when this missing occurs. By informing a neighboring switch of the location of the conflicting end stations, the neighboring switch can be enabled to automatically act as a barrier to limit the flooding of messages to the remainder of the computer network.
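The relationship between the flooding probability p and the number of replacements before a neighbor is informed can be checked with a short simulation. If each insertion floods independently with probability p, the number of insertions up to and including the first flood has a mean of 1/p, which is 100 for p=0.01. The function names and the use of a seeded pseudo-random generator are assumptions for illustration.

```python
import random

def should_flood_on_insert(rng, p=0.01):
    """Decide whether a new forwarding entry triggers an artificial flood."""
    return rng.random() < p

def replacements_until_flood(rng, p=0.01):
    """Count insertions up to and including the first one that floods."""
    count = 1
    while not should_flood_on_insert(rng, p):
        count += 1
    return count

rng = random.Random(1)  # seeded only to make the run repeatable
trials = [replacements_until_flood(rng) for _ in range(2000)]
mean = sum(trials) / len(trials)  # close to 1 / p
```

A larger p informs neighboring switches sooner at the cost of more artificially flooded messages; a smaller p does the opposite.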

A significant problem remains to be solved. If identical hash functions are used within all the switches, the switches will all exhibit the same conflicts. For example, when switch 4 (435) repeatedly misses as traffic is alternately sent to end stations B and C (410, 415), then switch 5 (440) has the same conflict and may again propagate misses to its neighbors. In our example, forwarding entries for B and C cannot be simultaneously held within switch 4 (435) or switch 5 (440). Since switch 5 (440) is of identical construction and uses the same hash function, switch 5 (440) also cannot simultaneously hold entries for destinations of end stations B and C (410, 415).

This problem is rectified by the illustrative associative lookup method (500) shown in FIG. 5 which has a new input (505) to the hashing function. As discussed above, each switch performs a hash operation on incoming MAC addresses (305). However, if each switch performs the same hash operation on all incoming MAC addresses, all the switches will exhibit identical behavior. By introducing a local unique constant (505) or other variable into the hash function (310), each switch will then categorize the incoming MAC address (305) differently. According to one illustrative embodiment, the local unique constant (505) used by the switch may be its MAC address. The switch's MAC address then serves as an operand in the hash function (310) to ensure that a distinct hashing function is applied to end station addresses at each switch. This ensures that the likelihood of neighboring switches exhibiting the same forwarding table lookup miss is extremely low.
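One way to sketch a hash function that takes the switch's own MAC address as an operand is shown below. The description does not specify how the local unique constant is combined with the end station address; this sketch substitutes a keyed BLAKE2b digest from the Python standard library as a software stand-in, which is far heavier than anything a switch would implement in hardware. The function names and all addresses are hypothetical.

```python
import hashlib

HASH_BITS = 12

def local_hash(station_mac, switch_mac, bits=HASH_BITS):
    """Hash a 48-bit end station address using the switch's MAC as a key."""
    digest = hashlib.blake2b(
        station_mac.to_bytes(6, "big"),
        digest_size=2,
        key=switch_mac.to_bytes(6, "big"),
    ).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)

def find_conflict(station_mac, switch_mac):
    """Search upward for another address that shares the same hash address."""
    target = local_hash(station_mac, switch_mac)
    candidate = station_mac + 1
    while local_hash(candidate, switch_mac) != target:
        candidate += 1
    return candidate

SWITCH_4 = 0x020000000004
SWITCH_5 = 0x020000000005
B = 0x020000000B00
C = find_conflict(B, SWITCH_4)  # an "unlucky" partner for B at switch 4

conflict_at_4 = local_hash(B, SWITCH_4) == local_hash(C, SWITCH_4)
conflict_at_5 = local_hash(B, SWITCH_5) == local_hash(C, SWITCH_5)
```

By construction `conflict_at_4` is true. With a 12 bit hash address and independent per-switch functions, the chance that the same pair also conflicts at any one other switch is on the order of 1 in 4096.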

Returning to the example illustrated in FIG. 4, the situation has significantly improved when a local unique constant is used as an operand in the hash function at each switch. The same difficulty is confronted when an unlucky selection of the B and C target addresses leads to a conflict in the switch 4 (435) lookup table. Again, this causes flooding to neighboring switches when a message to end station B (410) or end station C (415) misses during forwarding table lookup. However, now the surrounding switches, including switch 5 (440), use a distinct hashing function to identify lookup table locations. With this new hash, it is very unlikely that end station B (410) and end station C (415) also conflict within switch 5 (440). Thus, now switch 5 (440) forms an effective barrier to the misses produced in switch 4 (435). This architecture reduces the total number of miss messages propagated within a fabric as well as the total amount of memory needed for lookup tables in a fabric.

Previously, when a repeated miss occurs at a switch, that miss might also be repeated at a neighboring switch. Under some conditions, misses can propagate throughout the entire fabric. These misses also systematically flood the network with potentially useless forwarding entries. In our example, conflicting flows from end station A to end station B and end station A to end station C cause repeated flooding that inserts forwarding entries for end station A throughout the network, potentially displacing useful entries even where not needed. With this improved architecture, misses still occur within a switch, but neighboring switches limit costly effects of flooding by acting as a barrier that reduces wasted bandwidth and wasted forwarding table space.

FIG. 6 is a flow chart of an illustrative method for reducing propagation of message floods in computer networks. In a first step, information is flooded to neighboring switches when a new entry is added to the forwarding table of a first switch (step 600). The neighboring switches apply unique hash functions to the flooded information (step 610) and store the information in the resulting forwarding table entry (step 620). When subsequent misses occur, the neighboring switches receive the flooded information and access the forwarding table (step 630) and determine that the flooded message has arrived on an input link that is also the correct route to the destination (step 640). The message can then be dropped, thereby reducing the undesirable flooding of information throughout areas of the computer network that are not needed to convey the message to its intended destination (step 650).
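The steps of FIG. 6 can be traced with two simplified switches. This sketch makes several assumptions: the tables are unbounded dictionaries rather than hashed RAM, every new entry triggers a flood (a flooding probability of 1), the displacement of the B entry at switch 4 is simulated by deleting it, and the class and port names are hypothetical.

```python
class Switch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # address -> port (stand-in for the hashed table)

    def receive(self, src, dst, in_port):
        """Learn `src`, then return the ports on which the message is sent."""
        added = src not in self.table
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port == in_port:
            return []  # steps 630-650: arrived on the correct route, so drop
        if out_port is None or added:
            # Lookup miss, or intentional flood after a new entry (step 600).
            return [p for p in self.ports if p != in_port]
        return [out_port]

switch_4 = Switch(["a_side", "b_side", "to_switch_5"])
switch_5 = Switch(["to_switch_4", "remote_1", "remote_2"])
switch_4.table["A"] = "a_side"       # A is already known everywhere
switch_5.table["A"] = "to_switch_4"

# Step 600: B sends to A; switch 4 adds a B entry and floods the message.
sent_by_4 = switch_4.receive("B", "A", "b_side")
# Steps 610-620: switch 5 receives the flooded copy and records B's direction.
sent_by_5 = switch_5.receive("B", "A", "to_switch_4")

# Later, the B entry at switch 4 is displaced and a message from A to B misses.
del switch_4.table["B"]
miss_flood = switch_4.receive("A", "B", "a_side")
# Steps 630-650: switch 5 sees that B lies back toward switch 4 and drops it.
blocked = switch_5.receive("A", "B", "to_switch_4")
```

In this trace the link between switch 4 and switch 5 carries the flooded copies, but `sent_by_5` and `blocked` are both empty, so nothing propagates past switch 5.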

In sum, by propagating missed forwarding information throughout the network, neighboring switches learn pertinent information about switches which may have recurring forwarding table misses. Introducing variations in the calculation of the hash function at each switch ensures that there is a very low likelihood that neighboring switches will exhibit the same forwarding table miss. The neighboring switches can then act as a barrier to prevent unnecessary flooding into other areas of the computer network after a forwarding table miss. By applying these principles, the overall efficiency of a computer network can be improved without replacing switches or increasing the forwarding look up table RAM in each switch.

The preceding description has been presented only to illustrate and describe embodiments and examples of the principles described. This description is not intended to be exhaustive or to limit these principles to any precise form disclosed. Many modifications and variations are possible in light of the above teaching.

Claims

1. A computer network (400) comprising: a first switch (435); and a neighboring switch (440); wherein said first switch (435) floods said computer network (400) as a result of a forwarding table miss, said neighboring switch (440) acting as a barrier to prevent said flood from propagating into unrelated areas of said computer network (400).
2. The network of claim 1, wherein said neighboring switch (440) acquired said information relating to a proper destination through intentional flooding of information.
3. The network of claim 2, wherein said first switch (435) performs said intentional flooding when a new forwarding table entry is added to said forwarding table (315, 320) of said first switch (435).
4. The network of claim 2, wherein said first switch (435) performs said intentional flooding for only a fraction of new forwarding table entries added to said forwarding table (315, 320) of said first switch (435).
5. The network of claim 1, wherein said first switch (435) and said neighboring switch (440) use different unique hash functions (310) to calculate hashed addresses for forwarding table entries such that likelihood of said first switch (435) and said neighboring switch (440) having an identical forwarding table conflict is significantly reduced.
6. The network of claim 5, wherein said switches (435, 440) use a unique local constant in a hash function (310) to calculate hashed addresses.
7. The network of claim 6, wherein said unique local constant is a local MAC address (305).
8. The network of claim 1, wherein said first switch (435) and said neighboring switch (440) use different unique hash functions (310) to calculate hashed addresses for forwarding table entries, such that likelihood of said first switch (435) and said neighboring switch (440) having an identical forwarding table conflict is significantly reduced; wherein said neighboring switch (440) acquires said information relating to a proper destination through intentional flooding of information; said intentional flooding occurring when a new entry is added to said forwarding table (315, 320) of said first switch (435).
9. A network switch (435, 440) comprising: a plurality of ports, at least a portion of said plurality of ports being connected to surrounding network elements within a computer network; a hash function (310), said hash function receiving incoming MAC addresses from one of said plurality of ports and calculating a hash address using a locally unique constant; a forwarding table RAM (315, 320); said forwarding table RAM comprising a look up table containing destination addresses and associated destination ports organized by hash address.
10. The network switch of claim 9, wherein said network switch (435, 440) records incoming MAC addresses and associated incoming ports to learn locations of network elements within a computer network (400); said network switch (435, 440) receiving intentionally flooded information from other switches in said computer network (400); said intentionally flooded information being generated when said other switches make a new entry into a forwarding table (315, 320).
11. The network switch of claim 10, wherein said network switch (435, 440) does not forward incoming messages received on a proper destination port; said network switch (435, 440) accessing said lookup table to determine if said incoming messages are received on said proper destination port.
12. A method of reducing flooding within a computer network (400) comprising: intentionally flooding said computer network (400) when a new forwarding table entry is made by a first network switch (435), such that information contained within said new forwarding table entry is recorded by a neighboring network switch (440); and said neighboring switch (440) blocking subsequent messages which are received on a proper destination port.
13. The method of claim 12, further comprising: said neighboring switch (440) applying a locally unique hash function (310) to a MAC address associated with said intentionally flooded information to generate a hash address; and said neighboring switch (440) recording said intentionally flooded information within a forwarding table (315, 320) at said hash address.
14. The method of claim 13, further comprising: said neighboring switch (440) receiving a subsequent message on an input port; said subsequent message containing an associated destination address and a source address; said neighboring switch (440) accessing said forwarding table entry to determine if a destination port associated with said destination address is identical to said input port; and if said destination port associated with said destination address is identical to said input port, said neighboring switch (440) refuses to forward said message to other ports.
15. The method of claim 13, further comprising: said neighboring switch (440) using a unique local constant (505) to generate said unique local hash function, said unique local constant being a MAC address of said neighboring switch (440).

PCT Information

Filing Document	Filing Date	Country	Kind	371c Date
PCT/US09/30768	1/12/2009	WO	00	7/1/2011
