Various exemplary embodiments disclosed herein relate generally to computer networking.
The Internet has evolved from a medium to interconnect machines into a medium to connect machines with content such as videos and photos. While the Internet developed on top of various mechanisms, such as routing information via IP address, future architectures may employ alternative mechanisms in view of the current state of the Internet. One common principle among many proposals is that these future architectures may be centered on the content provided, rather than the machines themselves.
Content-centric networking is a more recent paradigm where content is requested by name, rather than location. This paradigm may employ name-based routing, wherein a router may move traffic to a destination server based on the “content name.” As such, Internet routers may be provided with explicit information as to the content being moved.
One result of name-based routing is that, while IP addresses are constrained in the number of variations and are clustered geographically, content names may take on virtually any value and may point to a server located anywhere, regardless of the locations of servers hosting content with similar names. As such, the number of entries in a name-based routing table may be orders of magnitude larger than an IPv4 or IPv6 routing table. This presents new challenges in efficiently storing the table and quickly forwarding traffic using the table.
A brief summary of various exemplary embodiments is presented below. Some simplifications and omissions may be made in the following summary, which is intended to highlight and introduce some aspects of the various exemplary embodiments, but not to limit the scope of the invention. Detailed descriptions of a preferred exemplary embodiment adequate to allow those of ordinary skill in the art to make and use the inventive concepts will follow in later sections.
Various exemplary embodiments relate to a method performed by a network device for forwarding a message, the method including: receiving, at a first input line card of the network device, a message to be forwarded toward a destination, the message including a destination address; determining, by the first input line card, that a second input line card of the network device is configured with routing information related to the destination address; transmitting the message, by the first input line card, to the second input line card based on the determination that the second input line card is configured with routing information related to the destination address; determining, by the second input line card and based on the routing information related to the destination address, that the message should be forwarded via a first output line card of the network device to another network device; transmitting the message, by the second input line card, to the first output line card based on the determination that the message should be forwarded via a first output line card of the network device to another network device; and transmitting the message, by the first output line card, to another network device.
Various embodiments are described wherein the first output line card includes one of the first input line card and the second input line card.
Various embodiments are described wherein the destination address is a content name.
Various embodiments are described wherein transmitting the message to the second input line card includes transmitting the message, by the first input line card, to an input port of the second input line card.
Various embodiments are described wherein determining, by the first input line card, that the second input line card of the network device is configured with routing information related to the destination address includes: performing a hash function on at least a portion of the destination address to generate a hashed destination address; and determining that an assigned identifier of the second input line card corresponds to the hashed destination address.
Various embodiments are described wherein the first input line card stores a first set of forwarding information, and the second input line card stores a second set of forwarding information that is different from the first set of forwarding information.
Various exemplary embodiments relate to a network device for forwarding a message, the network device including: a routing information base (RIB) storage configured to store routing information; a plurality of line cards; and a processor configured to: generate a line card table that associates a first identifier with a first input line card of the plurality of line cards and associates a second identifier with a second input line card of the plurality of line cards; generate a first set of forwarding information based on the routing information and the first identifier; generate a second set of forwarding information that is different from the first set of forwarding information based on the routing information and the second identifier; provide the line card table and the first set of forwarding information to the first input line card; and provide the line card table and the second set of forwarding information to the second input line card.
Various embodiments are described wherein, in generating the first set of forwarding information based on the routing information and the first identifier, the processor is configured to: generate an address prefix and associated forwarding information based on the routing information; perform a mathematical operation with respect to the address prefix to generate a value; determine that the value matches the first identifier; and include the address prefix in the first set of forwarding information based on the value matching the first identifier.
Various embodiments are described wherein the processor is further configured to: receive an indication that the second input line card has failed; generate an updated line card table that associates the first identifier with the first input line card and associates the second identifier with a first output line card of the plurality of line cards; generate a third set of forwarding information based on the routing information and the second identifier; and provide the updated line card table and the third set of forwarding information to the first output line card.
Various embodiments are described wherein the first input line card includes: a first memory configured to store the line card table; and a first processing manager configured to: receive a message to be forwarded toward a destination device, the message including a destination address; determine, based on the second identifier and the destination address, that the second input line card should process the message; and transmit the message to the second input line card, based on determining that the second input line card should process the message.
Various embodiments are described wherein the second input line card includes: a second processing manager configured to determine that the second input line card should process the message; and a forwarding module configured to: determine that a first output line card should forward the message to another network device, and transmit the message to the first output line card.
Various embodiments are described wherein the second input line card further includes a memory configured to store a cache table; and the forwarding module is a cache configured to, in determining that the first output line card should forward the message to the other network device: determine that the cache table stores an entry associated with the destination address, and determine that the entry identifies the first output line card.
Various embodiments are described wherein the second input line card further includes a hash table storage configured to store the second set of forwarding information; and the forwarding module is a longest prefix matching (LPM) block configured to, in determining that the first output line card should forward the message to the other network device: identify an entry of the second set of forwarding information having a longest matching prefix for the destination address; and determine that the entry identifies the first output line card.
Various embodiments are described wherein, in identifying the entry of the second set of forwarding information having the longest matching prefix for the destination address, the LPM block utilizes a set of distributed Bloom filters to determine the length of the longest matching prefix.
Various exemplary embodiments relate to a network device for forwarding a message, the network device including: a first input line card that stores a first set of forwarding information; and a second input line card that stores a second set of forwarding information different from the first set of forwarding information, wherein the first input line card is configured to: receive a message to be forwarded, and transfer the message to the second input line card, and wherein the second input line card is configured to forward the message based on the second set of forwarding information.
Various embodiments are described wherein the first set of forwarding information and the second set of forwarding information store forwarding information for content address prefixes.
Various embodiments are described wherein the network device includes a switching fabric and, in forwarding the message based on the second set of forwarding information, the second input line card is configured to transmit the message via the switching fabric.
Various embodiments are described wherein the first input line card is configured to determine that a destination address associated with the message is assigned to the second input line card.
Various embodiments are described wherein, in determining that the destination address associated with the message is assigned to the second input line card, the first input line card is configured to: generate a hash value based on at least a portion of the destination address; generate an index based on the hash value and a number of line cards configured for the network device; and determine that the index corresponds to the second input line card.
Various embodiments are described wherein the second set of forwarding information includes forwarding information for a plurality of address prefixes of differing lengths and, in forwarding the message based on the second set of forwarding information, the second input line card is configured to: extract a destination address of the message; apply a set of distributed Bloom filters to the destination address to determine a beginning prefix length; and begin searching the second set of forwarding information by evaluating forwarding information for at least one address prefix having the beginning prefix length.
In order to better understand various exemplary embodiments, reference is made to the accompanying drawings, wherein:
To facilitate understanding, identical reference numerals have been used to designate elements having substantially the same or similar structure or substantially the same or similar function.
As described above, various emerging routing paradigms propose to expand the routing table used in routing traffic over the Internet and other networks. While larger tables may be accommodated by larger and faster memories and faster processors, this approach may be cost-prohibitive. Accordingly, it may be desirable to implement a method of route resolution that may distribute a routing table and/or various routing operations among multiple hardware entities. Various additional objects and benefits will be apparent in view of the following description. It will be apparent to those of skill in the art that, while various examples described herein are described with respect to name-based routing, the methods and systems described may be useful in other environments such as, for example, routing according to IPv4 or IPv6 protocols.
Referring now to the drawings, in which like numerals refer to like components or steps, there are disclosed broad aspects of various exemplary embodiments.
As shown, the routers 140a-e may interconnect the client device 110, the server 120, and the server 130, such that messages may be exchanged between these devices. As noted, the exemplary network 100 may constitute a simplification and, as such, there may be a number of intermediate routers and/or other network devices (not shown) providing communication between those network devices that are illustrated. For example, the router 140c may be connected to the server 120 through one or more intermediate network devices (not shown).
The client device 110 may be any device capable of requesting and receiving content via a network. For example, the client device 110 may include a personal computer, laptop, mobile phone, tablet, or other device. The servers 120, 130 may each be any device capable of receiving requests and serving content. For example, the servers 120, 130 may each include a personal computer, stand-alone server, blade server, or other device. The servers 120, 130 may each host a number of content items, each item being identified by at least one content name. As illustrated, for example, the server 120 may host three content items identified as “/TUX/notes.txt,” “/JDOE/notes.txt,” and “/JDOE/VIDEOS/JD2012/vid.avi.” As another example, the server 130 may also host three content items identified as “/JDOE/PAPERS/PaperA.pdf,” “/JDOE/PAPERS/PaperB.pdf,” and “/JDOE/VIDEOS/abc.mpg.” The client device 110 may request the delivery of any of these items by sending a request message, which may then be routed by the routers 140a-e to the appropriate server 120, 130. This routing may be performed based on the name of the content requested. Thus, the content name may be referred to as a destination address of the request message. The server 120, 130 may then use the request to locate the appropriate content and transmit the content back to the client device 110 via the routers 140a-e.
As an example,
In various alternative embodiments, the request message 150 may address a specific “chunk” of the content requested. For example, the request message 150 may request “/JDOE/VIDEOS/abc.mpg/chunk2.” In various embodiments wherein all chunks for a particular content item are stored at the same location, the destination address used in processing the message may omit the chunk identifier, thus routing based on “/JDOE/VIDEOS/abc.mpg.” In various alternative embodiments such as, for example, embodiments where content may be distributed among multiple servers, routers may route based on the chunk identifier as well, thus routing based on “/JDOE/VIDEOS/abc.mpg/chunk2” in this example.
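By way of non-limiting illustration, the choice between routing on the full name and routing on the name without its chunk identifier may be sketched as follows. The function name routing_name, the flag route_on_chunk, and the convention that a chunk identifier is a final name component beginning with "chunk" are assumptions made for this sketch only.

```python
def routing_name(content_name: str, route_on_chunk: bool) -> str:
    """Return the name used as the destination address for routing.

    Assumption for this sketch: a chunk identifier is a final name
    component that starts with "chunk" (e.g. "chunk2").
    """
    components = content_name.strip("/").split("/")
    if not route_on_chunk and components[-1].startswith("chunk"):
        # All chunks are stored at one location, so the chunk
        # identifier is omitted from the destination address.
        components = components[:-1]
    return "/" + "/".join(components)


if __name__ == "__main__":
    requested = "/JDOE/VIDEOS/abc.mpg/chunk2"
    print(routing_name(requested, route_on_chunk=False))
    print(routing_name(requested, route_on_chunk=True))
```

With route_on_chunk disabled, the sketch yields the content item name; with it enabled, the full chunk-level name is retained for routing.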
The RIB storage 212 may be a device configured to store routing information, such as a RIB. The RIB storage 212 may include a machine-readable storage medium such as read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, and/or similar storage media. For example, the RIB storage 212 may include an SRAM that stores information regarding which destination addresses may be reached through various next hop devices. Exemplary contents of the RIB storage 212 will be described in greater detail below with respect to
The processor 214 may be a processing device such as a microprocessor, field programmable gate array (FPGA), application specific integrated circuit (ASIC), or other device. The processor 214 may be configured to perform various control plane functions. The memory 216 may include any memory device capable of supporting processor 214. For example, the memory 216 may include one or more SRAM chips.
In various embodiments, the processor 214 may be configured to generate one or more sets of forwarding information based on the contents of the RIB storage 212 and provide the one or more sets of forwarding information, such as forwarding information bases (FIBs), to the line cards 222a-b, 224a-b, 226a-b. This generation may include selection of optimal routes for each known address prefix and correlation of next hop devices to appropriate output ports or line cards.
In various embodiments, the processor 214 may be configured to provide different sets of forwarding information to each of the line cards 222a-b, 224a-b, 226a-b. As will be described in greater detail below with respect to
The line cards 222a-b, 224a-b, 226a-b may each constitute a device capable of receiving and forwarding messages having one or more input ports and one or more output ports. Each line card 222a-b, 224a-b, 226a-b may be useful as an input line card 222a, 224a, 226a, and an output line card 222b, 224b, 226b. The input line cards 222a, 224a, 226a may each be configured to receive messages, either from another external device or the switching fabrics 230, 232, and subsequently send the message to an appropriate output line card 222b, 224b, 226b via the switching fabrics 230, 232. The output line cards 222b, 224b, 226b, may be configured to receive messages from the switching fabrics 230, 232 and output the message to a next hop network device. Thus, as used herein, the terms “input line card” and “output line card” may refer to the same type of physical device performing input and output functions, respectively. Further, a “line card” may constitute both an “input line card” and an “output line card.” The data plane 220 may include numerous additional line cards (not shown). In various embodiments, the data plane 220 may include five hundred or one thousand line cards (not shown).
The switching fabrics 230, 232 may include hardware or machine-executable instructions encoded on a machine-readable medium configured to transport messages between the line cards 222b, 224b, 226b. It will be understood that the data plane 220 may include fewer or additional switching fabrics (not shown). In various embodiments, the data plane 220 may include eight or sixteen switching fabrics (not shown).
As will be explained in greater detail below with respect to
In some cases, a line card 222a-b, 224a-b, 226a-b may receive a message for which the line card 222a-b, 224a-b, 226a-b stores the useful forwarding information. For example, in message flow 250, line card N 226a may receive a message from an external device and determine, based on the locally-stored forwarding information, that line card 2 224b should forward the message to the next hop. Input line card N 226a may then forward the message directly to output line card 2 224b which may, in turn, forward the message to the next hop device.
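As a non-limiting sketch, the decision made by a receiving line card, namely whether it holds the relevant forwarding information itself or should hand the message to another line card, may be expressed as follows. The function names, the table contents, and the use of the first name component as the hashed portion are illustrative assumptions, and the standard-library CRC-32 merely stands in for whatever hash function an embodiment employs.

```python
import zlib


def owner_of(content_name: str, line_card_table: dict) -> str:
    """Return the line card assigned to the given content name."""
    # Hash a portion of the destination address (here: first component).
    first = "/" + content_name.strip("/").split("/")[0]
    # CRC-32 is a stand-in hash chosen only because it is in the stdlib.
    index = zlib.crc32(first.encode("utf-8")) % len(line_card_table)
    return line_card_table[index]


def dispatch(content_name: str, local_card: str, line_card_table: dict):
    """Decide whether to look up locally or redirect to the owner."""
    owner = owner_of(content_name, line_card_table)
    if owner == local_card:
        return ("lookup_locally", local_card)
    return ("redirect", owner)


if __name__ == "__main__":
    # Hypothetical line card table: assigned integer -> line card label.
    table = {0: "line card 1", 1: "line card 2", 2: "line card N"}
    print(dispatch("/JDOE/VIDEOS/abc.mpg", "line card N", table))
```

Because only the first component is hashed in this sketch, all names sharing that component resolve to the same line card.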
The content prefix field 310 may store at least a portion of a content name or other destination address. The value of the content prefix field 310 may indicate that an entry applies to traffic matching the value. The next hop field 320 may store an identification of a next hop device capable of routing traffic matching the associated prefix toward its destination. In various embodiments, the next hop field 320 may alternatively or additionally store an indication of an output line card and/or port for matching traffic. The metric field 330 may store one or more metrics useful in determining which route is fastest, cheapest, or otherwise preferable for a particular prefix.
As an example, the data arrangement 300 is shown as including a number of entries 340-380. The data arrangement 300 may include numerous additional entries 390. A first entry 340 may indicate that traffic matching the content prefix “/TUX” may be forwarded to next hop “b” and that a metric associated with this route is set to a value of “2.” The next entry 350 may also apply to traffic matching the content prefix “/TUX,” indicating that next hop “d” may also be capable of forwarding this traffic toward the destination, but at a metric of “3.” When generating forwarding information, the processor 214 may select one of these entries 340, 350 to be used in forwarding traffic matching the “/TUX” prefix. In this case, the processor 214 may select the first entry 340 because this entry carries the lower metric value of “2.” Thus, the processor 214 may instruct the line cards 222a-b, 224a-b, 226a-b to forward this traffic to next hop b through the forwarding information provided.
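The selection of a preferred route per prefix may be illustrated by the following minimal sketch, which assumes that a lower metric is always preferable and that ties are resolved in favor of the earlier entry. The tuple layout and the function name select_best_routes are hypothetical.

```python
def select_best_routes(rib_entries):
    """Pick, for each content prefix, the next hop with the lowest metric.

    rib_entries: iterable of (content_prefix, next_hop, metric) tuples.
    Assumption: on equal metrics the entry seen first is kept.
    """
    best = {}
    for prefix, next_hop, metric in rib_entries:
        if prefix not in best or metric < best[prefix][1]:
            best[prefix] = (next_hop, metric)
    return {prefix: hop for prefix, (hop, _metric) in best.items()}


if __name__ == "__main__":
    # Entries mirroring the "/TUX" and "/JDOE/PAPERS" examples in the text.
    rib = [
        ("/TUX", "b", 2),
        ("/TUX", "d", 3),
        ("/JDOE/PAPERS", "d", 1),
    ]
    print(select_best_routes(rib))
```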
The third and fourth entries 360, 370 may similarly specify available routes and costs for traffic matching the prefix “/JDOE.” Not all entries may include prefixes of the same length. As illustrated by the fifth entry 380, traffic matching the content prefix “/JDOE/PAPERS” may be forwarded to next hop “d” at a metric cost of “1.” While a given message may match both prefixes “/JDOE” and “/JDOE/PAPERS,” the network devices described herein may employ “longest prefix matching,” thereby routing the message according to the longest matching prefix. In the example of
The method 400 may begin in step 405 and proceed to step 410 where the processor 214 may assign an integer to each active line card 222a-b, 224a-b, 226a-b. This step may include assigning an integer to all line cards 222a-b, 224a-b, 226a-b in the system, only those line cards 222a-b, 224a-b, 226a-b that are active, or only those line cards 222a-b, 224a-b, 226a-b that have not experienced a failure. Next, in step 415, the processor 214 may generate a line card table based on the assigned integers. The line card table may be, for example, a table that associates an identification of each line card 222a-b, 224a-b, 226a-b with the assigned integer.
In step 420, the processor 214 may generate a master FIB from the RIB according to any method known in the art. This step may include, for example, evaluating metrics associated with various next hops for various address prefixes, consolidation of various address prefixes, or translation of next hop devices into corresponding output line cards or output ports.
Processor 214 may then begin iterating through the master FIB to create FIBs for each line card 222a-b, 224a-b, 226a-b by retrieving a FIB entry in step 425. Next, the processor 214 may determine which line card 222a-b, 224a-b, 226a-b should receive the FIB entry in step 430. In this example, the processor 214 may retrieve the first component “b1” of the content prefix carried by the entry. For example, if the entry is related to prefix “/JDOE/VIDEOS,” the processor 214 may retrieve the component “/JDOE.” Next, the processor 214 may hash the component using a hash function such as, for example, CRC-64 to produce a hashed value h(b1). The processor may then generate an index “i” by evaluating h(b1) modulo N, where N is the number of active line cards. The resulting index “i” may be used in conjunction with the line card table generated in step 415 to determine the line card to which integer “i” has been assigned. After matching the integer “i” to a line card 222a-b, 224a-b, 226a-b, the processor 214 may, in step 435, add the FIB entry to the FIB to be transmitted for integer i, FIBi. Next, in step 440, the processor 214 may determine whether the master FIB includes additional entries to process. If so, the processor 214 may loop back to step 425. Otherwise, the method 400 may proceed to step 445.
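The splitting of a master FIB into per-line-card FIBs by evaluating h(b1) modulo N may be sketched as follows. This is a simplified illustration: the function name partition_fib and the dictionary representation are assumptions, and CRC-32 from the standard library is substituted for the CRC-64 mentioned above because no CRC-64 is available without third-party code.

```python
import zlib


def partition_fib(master_fib: dict, num_cards: int) -> list:
    """Split a master FIB into one FIB per active line card.

    master_fib maps content prefix -> output line card.
    Returns a list where element i is FIB_i for assigned integer i.
    """
    fibs = [dict() for _ in range(num_cards)]
    for prefix, out_card in master_fib.items():
        # First component b1 of the prefix, e.g. "/JDOE" for "/JDOE/VIDEOS".
        b1 = "/" + prefix.strip("/").split("/")[0]
        # Stand-in hash (CRC-32); index i = h(b1) modulo N.
        i = zlib.crc32(b1.encode("utf-8")) % num_cards
        fibs[i][prefix] = out_card
    return fibs


if __name__ == "__main__":
    master = {"/JDOE": 1, "/TUX": 1, "/JDOE/VIDEOS": 2}
    for i, fib in enumerate(partition_fib(master, 3)):
        print(i, fib)
```

Since the index depends only on the first component, every prefix beginning with that component lands in the same FIB, which allows longest prefix matching to be completed on a single line card.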
In step 445, the processor 214 may begin iterating through the active line cards by initializing a variable “j” to a value of “0.” Next, in step 450, the processor 214 may push the line card table and FIBj to the line card 222a-b, 224a-b, 226a-b that has been assigned the integer “j” in step 410. Then, the processor 214 may increment the value of “j” in step 455 and determine whether additional active line cards 222a-b, 224a-b, 226a-b remain to be processed, in step 460. If the current value of “j” is less than the number of active line cards N, the processor 214 may loop back to step 450. Otherwise, the method 400 may proceed to end in step 465.
It will be understood that various modifications to the method are possible. For example, instead of generating a master FIB and then splitting the FIB into subsets, the processor may generate the FIB for each line card 222a-b, 224a-b, 226a-b directly or concurrently. Various additional modifications will be apparent.
The processing manager 510 may include hardware and/or executable instructions on a machine-readable storage medium configured to receive a message to be forwarded and determine whether the line card 500 or some other line card should process the message. As will be explained in greater detail below with respect to
The cache 520 may include hardware and/or executable instructions on a machine-readable storage medium configured to determine whether a cache table stored in the SRAM 540 includes an entry matching the address of the current message. If so, the cache 520 may forward the message to an output line card as identified by the cache entry. If there is no cache hit, the cache 520 may pass the message to the distributed Bloom filters module 532 of the LPM block 530.
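The cache-first behavior described above may be illustrated by the following sketch, in which the cache table is modeled as a dictionary and the LPM block is modeled as a callable supplied by the caller. The names resolve_output_card and lpm_lookup are hypothetical, and the sketch does not address how cache entries are populated or evicted.

```python
def resolve_output_card(content_name, cache_table, lpm_lookup):
    """Return (output line card, source of the decision).

    cache_table: dict mapping full content name -> output line card.
    lpm_lookup:  callable standing in for the LPM block on a cache miss.
    """
    card = cache_table.get(content_name)
    if card is not None:
        # Cache hit: the entry identifies the output line card directly.
        return card, "cache"
    # Cache miss: hand the message to longest prefix matching.
    return lpm_lookup(content_name), "lpm"


if __name__ == "__main__":
    cache = {"/JDOE/PAPERS/PaperA.pdf": 2, "/JDOE/notes.txt": 1}
    # Placeholder LPM result used only for demonstration.
    fallback = lambda name: 1
    print(resolve_output_card("/JDOE/notes.txt", cache, fallback))
    print(resolve_output_card("/TUX/notes.txt", cache, fallback))
```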
The distributed Bloom filters module 532 may include hardware and/or executable instructions on a machine-readable storage medium configured to utilize Bloom filters stored in the SRAM 540 to determine a likely length of a longest prefix match for the current message. The distributed Bloom filters module 532 may, for example, apply the distributed Bloom filter procedure described in “IPv6 Lookups using Distributed and Load Balanced Bloom Filters for 100 Gbps Core Router Line Cards” by Song et al. and published in 2009 by the IEEE, the entirety of which is incorporated for all purposes herein by reference. The distributed Bloom filters module 532 may then pass the message to the route retriever 534 along with an indication of the likely length of the longest prefix match.
The route retriever 534 may search a FIB stored in the hash table storage 550 for a longest prefix match for the message. Based on the length indicated by the distributed Bloom filters module 532, the route retriever 534 may begin searching the FIB by looking for a matching prefix of the indicated length. If the distributed Bloom filters module 532 encountered a false positive, the route retriever 534 may gradually reduce the length of the prefixes being searched until eventually locating an entry having the longest prefix match for the message. The LPM block 530 may then forward the message to an output line card as indicated by the located entry.
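A minimal sketch of this search, assuming the FIB is held as a dictionary keyed by prefix and that the length hint counts name components, might read as follows. The function name longest_prefix_match is an assumption, and the step-by-step shortening shown here is only one way of recovering from a Bloom filter false positive.

```python
def longest_prefix_match(content_name, fib, start_length):
    """Search the FIB starting at the hinted prefix length.

    fib: dict mapping content prefix -> output line card.
    start_length: likely longest-match length (in name components)
                  as suggested by the Bloom filters.
    Returns (matched prefix, output line card) or None.
    """
    components = content_name.strip("/").split("/")
    length = min(start_length, len(components))
    while length > 0:
        prefix = "/" + "/".join(components[:length])
        if prefix in fib:
            return prefix, fib[prefix]
        # The hint was a false positive: try a shorter prefix.
        length -= 1
    return None


if __name__ == "__main__":
    fib = {"/JDOE": 1, "/JDOE/VIDEOS": 2}
    print(longest_prefix_match("/JDOE/VIDEOS/JD2012/vid.avi", fib, 3))
```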
The SRAM 540 may include hardware and/or executable instructions on a machine-readable storage medium configured to store various data useful for the line card 500. Exemplary contents of the SRAM 540 will be described in greater detail below with respect to
The hash table storage 550 may include hardware and/or executable instructions on a machine-readable storage medium configured to store a FIB for line card 500. Thus, the hash table storage may include a machine-readable storage medium such as read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, and/or similar storage media. Exemplary contents of the hash table storage 550 will be described in greater detail below with respect to
The hash table storage 550 may constitute an “off-chip memory” that is implemented as a physically separate device from the other components of line card 500. The processing manager 510, the cache 520, the LPM block 530, and the SRAM 540 may be implemented as a single chip such as, for example, a microprocessor, FPGA, or ASIC.
As an example, the exemplary data arrangement 700 is illustrated as storing two cache entries 730, 740. The data arrangement 700 may include numerous additional entries 750. The first cache entry 730 may indicate that messages including the destination address “/JDOE/PAPERS/PaperA.pdf” should be forwarded via line card “2,” which may correspond to output line card 2 224b. The second cache entry 740 may indicate that messages including the destination address “/JDOE/notes.txt” should be forwarded via line card “1,” which may correspond to output line card 1 222b.
As an example, the exemplary data arrangement 800 is illustrated as storing a number “p” of buckets 840-880. The first bucket 840 may include a single set of forwarding information, storing a hash, such as a CRC-64 hash, of the prefix “/JDOE” and indicating that matching messages should be forwarded via an output card “1.” The second bucket 850 may include two sets of forwarding information. The second bucket may indicate that for the longest prefix match “/TUX,” messages should be forwarded via output card “1,” while for the longest prefix match “/JDOE/VIDEOS,” messages should be forwarded via output card “2.” These two sets of forwarding information may occupy the same bucket because a mathematical operation used to determine an appropriate bucket may identify bucket “2” for both prefixes. For example, the hash values of either prefix modulo “p” may yield a value of 2. The meaning of the exemplary data in the remaining buckets 860-880 will be apparent in view of the foregoing description.
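The bucket organization may be sketched as follows, with each bucket holding pairs of a prefix hash and an output card, and with the bucket chosen as the hash modulo “p.” The function names are hypothetical, and CRC-32 from the standard library is used here purely as a stand-in for the CRC-64 hash named above.

```python
import zlib


def prefix_hash(prefix: str) -> int:
    # Stand-in for the CRC-64 hash mentioned in the text.
    return zlib.crc32(prefix.encode("utf-8"))


def build_buckets(fib: dict, p: int) -> list:
    """Place each (hash, output card) pair into bucket hash mod p."""
    buckets = [[] for _ in range(p)]
    for prefix, card in fib.items():
        h = prefix_hash(prefix)
        buckets[h % p].append((h, card))
    return buckets


def lookup(buckets: list, prefix: str):
    """Return the output card stored for the prefix, or None."""
    h = prefix_hash(prefix)
    for stored_hash, card in buckets[h % len(buckets)]:
        if stored_hash == h:
            return card
    return None


if __name__ == "__main__":
    fib = {"/JDOE": 1, "/TUX": 1, "/JDOE/VIDEOS": 2}
    buckets = build_buckets(fib, p=5)
    print(lookup(buckets, "/JDOE/VIDEOS"))
```

Several prefixes may share one bucket, in which case the stored hash distinguishes between them; choosing a very small “p” in the sketch forces this situation.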
If, on the other hand, the record identifies the present line card, the method 900 may proceed to step 940 where the cache 520 may search the cache table to determine whether any entry matches the content name of the received message. If there is a cache hit, the cache 520 may determine the appropriate output line card from the cache entry in step 950. Otherwise, the LPM block 530 may perform longest prefix matching using the FIB in step 955 to determine the appropriate output line card. An exemplary LPM procedure will be described in greater detail below with respect to
Next, in step 1015, the distributed Bloom filters module 532 may apply appropriate Bloom filters to each hashed prefix. For example, each hash function applied to the prefixes may be associated with a separate Bloom filter. Each prefix hashed according to the first hash function may be applied to the first Bloom filter, each prefix hashed according to the second hash function may be applied to the second Bloom filter, and so on. The application of each Bloom filter may result in a bit field indicating whether, according to that Bloom filter, each prefix is likely to have a match in the FIB. For example, the bit field “0110” may indicate that the prefixes “/JDOE/VIDEOS” and “/JDOE/VIDEOS/JD2012” are likely to have a FIB match, while the prefixes “/JDOE” and “/JDOE/VIDEOS/JD2012/vid.avi” do not have a match. The distributed Bloom filters module 532 may then, in step 1020, combine each of the “k” bit fields to generate a single master bit field. For example, the distributed Bloom filters module 532 may produce the logical “AND” of all of the “k” bit fields. The master bit field may be used by the distributed Bloom filters module 532 in step 1025 to identify the length “m” of the likely longest prefix match. For example, the position of the first “1” in the master bit field may correspond to the length of the likely longest prefix match.
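The combination of the “k” bit fields and the derivation of the length “m” may be illustrated as follows. The sketch assumes that each bit field is a string whose leftmost character corresponds to the shortest prefix, as in the “0110” example, and that the field is scanned from the longest-prefix end when looking for the first “1”; both conventions, and the function names, are assumptions rather than requirements.

```python
def combine_bit_fields(bit_fields):
    """Logical AND of k equally sized bit fields given as '0'/'1' strings."""
    width = len(bit_fields[0])
    master = ["1"] * width
    for field in bit_fields:
        for pos, bit in enumerate(field):
            if bit == "0":
                master[pos] = "0"
    return "".join(master)


def likely_longest_length(master):
    """Length m (in components) of the likely longest prefix match.

    Position 0 stands for a one-component prefix, so the rightmost '1'
    marks the longest candidate. Returns 0 when no bit is set.
    """
    return master.rfind("1") + 1


if __name__ == "__main__":
    # Three hypothetical per-filter bit fields for a four-component name.
    fields = ["0110", "0111", "1110"]
    master = combine_bit_fields(fields)
    print(master, likely_longest_length(master))
```

A prefix survives in the master bit field only if every Bloom filter reports it as a likely member, which reduces, but does not eliminate, false positives.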
Next, the route retriever 534 may begin to search for the longest prefix match in the FIB by first, in step 1030, hashing the first “m” components of the content name. Next, the route retriever 534 may determine which bucket to access in step 1035, by calculating a value “f” based on the hash value, or a portion thereof, modulo “p,” the total number of buckets. The route retriever 534 may then retrieve bucket “f” from the FIB, in step 1040, and determine whether the bucket includes routing information matching the first “m” components of the content name. If the bucket does not include a match, the route retriever 534 may use the master bit field to determine, in step 1050, the next candidate for the length of the longest matching prefix. The route retriever 534 may then loop back to step 1030. Once the route retriever 534 locates a matching entry in the FIB, the route retriever 534 may determine that the line card indicated by the matching entry should be used to output the message. Method 1000 may then proceed to end in step 1060.
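Steps 1030 through 1050 may be sketched as the following loop over candidate lengths taken from the master bit field. The bucket layout (pairs of prefix hash and output card), the function names, the longest-first ordering of candidates, and the CRC-32 stand-in hash are all assumptions of this sketch.

```python
import zlib


def prefix_of(content_name: str, m: int) -> str:
    """First m components of the content name."""
    return "/" + "/".join(content_name.strip("/").split("/")[:m])


def candidate_lengths(master: str) -> list:
    """Candidate lengths from the master bit field, longest first."""
    return [pos + 1 for pos in range(len(master) - 1, -1, -1)
            if master[pos] == "1"]


def lpm_search(content_name: str, master: str, buckets: list):
    """Return (m, output card) for the longest matching prefix, or None."""
    p = len(buckets)
    for m in candidate_lengths(master):
        h = zlib.crc32(prefix_of(content_name, m).encode("utf-8"))
        f = h % p  # bucket index "f"
        for stored_hash, card in buckets[f]:
            if stored_hash == h:
                return m, card
        # No match in bucket f: fall through to the next candidate.
    return None


if __name__ == "__main__":
    p = 4
    buckets = [[] for _ in range(p)]
    for prefix, card in {"/JDOE": 1, "/JDOE/VIDEOS": 2}.items():
        h = zlib.crc32(prefix.encode("utf-8"))
        buckets[h % p].append((h, card))
    print(lpm_search("/JDOE/VIDEOS/JD2012/vid.avi", "0110", buckets))
```

In the usage shown, the three-component candidate is a false positive, so the search falls back to the two-component prefix “/JDOE/VIDEOS.”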
According to the foregoing, it should be apparent that various embodiments enable the efficient use of large routing tables to route messages. For example, by distributing the forwarding information among multiple line cards, the task of next hop lookup may be distributed among such devices, thereby providing an efficient means to support a large routing table. Further, by assigning a subset of destination addresses to each of a plurality of line cards, next hop lookups can be delegated in an efficient and reliable manner.
It should be apparent from the foregoing description that various exemplary embodiments of the invention may be implemented in hardware or firmware. Furthermore, various exemplary embodiments may be implemented as instructions stored on a machine-readable storage medium, which may be read and executed by at least one processor to perform the operations described in detail herein. A machine-readable storage medium may include any mechanism for storing information in a form readable by a machine, such as a personal or laptop computer, a server, or other computing device. Thus, a tangible and non-transitory machine-readable storage medium may include read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, and similar storage media.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in machine readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
Although the various exemplary embodiments have been described in detail with particular reference to certain exemplary aspects thereof, it should be understood that the invention is capable of other embodiments and its details are capable of modifications in various obvious respects. As is readily apparent to those skilled in the art, variations and modifications can be effected while remaining within the spirit and scope of the invention. Accordingly, the foregoing disclosure, description, and figures are for illustrative purposes only and do not in any way limit the invention, which is defined only by the claims.