The present invention relates generally to packet processing systems, and more particularly to a network processor or other type of processor configured for use in performing tree-based processing.
A network processor generally controls the flow of packets between a physical transmission medium, such as a physical layer portion of, e.g., an asynchronous transfer mode (ATM) network or synchronous optical network (SONET), and a switch fabric in a router or other type of packet switch. Such routers and switches generally include multiple network processors, e.g., arranged in the form of an array of line or port cards with one or more of the processors associated with each of the cards.
In performing packet processing operations such as classifying, routing or switching, the network processor typically must examine at least a portion of each packet. A packet is generally made of a string of binary bits. The amount of each packet that must be examined is dependent upon its associated network communication protocols, enabled options, and other similar factors.
More specifically, in a packet classification operation, the network processor typically utilizes a tree traversal process to determine various characteristics associated with each packet, i.e., to classify the input data according to one or more data attributes. The tree structure is also known as a knowledge base. The tree structure typically has a root portion where the processing begins, intermediate branches, and finally a plurality of leaves, where the final decisions or matches occur. Thus, each node of the tree is an entry or a decision point, and such entries or decision points are interconnected by branches. An instruction or bit pattern resides at each decision point for analyzing the input bit pattern (also referred to as the search object) and in response thereto for sending the bit pattern to the next appropriate decision point.
Since the data is presented in the form of binary bits, the processor compares groups of the input bits with known bit patterns, represented by entries in the tree structure. A match between the group of input bits and the bits at a tree entry directs the process to the next associated entry in the tree. The matching process progresses along a path of the tree until the end is reached, at which point the input bits have been characterized. Because a large number of bits must be classified in a data network, these trees can require many megabits of memory storage capacity.
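By way of a concrete (and purely illustrative) sketch, a decision-point entry of such a tree might be laid out as follows in C. The field names, the fixed fan-out of four, and the single-byte symbol width are assumptions made for this example, not details taken from the invention.

```c
#include <stdint.h>

/*
 * Hypothetical layout of one decision-point entry. Each entry stores
 * the bit patterns to compare against the search object and, for each
 * pattern, the index of the next entry to visit on a match.
 */
#define MAX_BRANCHES 4                      /* assumed fixed fan-out   */

struct tree_entry {
    uint8_t  num_branches;                  /* patterns stored here    */
    uint8_t  pattern[MAX_BRANCHES];         /* known bit patterns      */
    uint32_t next_index[MAX_BRANCHES];      /* entry to go to on match */
    int      is_leaf;                       /* nonzero: final decision */
    uint32_t result;                        /* classification outcome  */
};

/* Compare one group of input bits against an entry's stored patterns;
 * return the next entry index, or -1 when no pattern matches. */
static long match_entry(const struct tree_entry *e, uint8_t input_bits)
{
    for (unsigned i = 0; i < e->num_branches; i++)
        if (e->pattern[i] == input_bits)
            return (long)e->next_index[i];
    return -1;
}
```

A traversal then reduces to repeatedly applying match_entry to successive groups of input bits, as elaborated in a later sketch.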
The classification process finds many uses in a data communications network. The input data packets can be classified based on a priority indicator within the packet, using a tree structure where the decision paths represent the different network priority levels. Once the priority level is determined for each packet, based on a match between the input bits and the tree bits representing the available network priority levels, the packets can be processed in priority order. As a result, time-sensitive packets (e.g., those carrying video-conference data) are processed before time-insensitive packets (e.g., a file transfer protocol (FTP) data transfer).
Other packet classification processes determine the source of the packet (for instance, so that a firewall can block all data from one or more sources), examine the packet protocol to determine which web server can best service the data, or determine network customer billing information. Information required for the reassembly of packets that have been broken up into data blocks for processing through a network processor can also be determined by a classification engine that examines certain fields in the data blocks. Packets can also be classified according to their destination address so that packets can be grouped together according to the next device they will encounter as they traverse the communications medium.
One important attribute of any tree processing scheme is the worst case time required to complete a traversal. Generally, such tree processing schemes are implemented in a plurality of steps or cycles that each take a predetermined amount of time to complete. Thus, the maximum time to complete a traversal of the tree is generally reduced by minimizing the time spent at each step of the process.
Another important attribute of any tree processing scheme is the processing bandwidth associated with the processor. The problem is that the processor has to fetch instructions associated with the tree from memory, and is thus limited by the bandwidth of the memory in which such instructions are stored.
Accordingly, a need exists for improved techniques for performing tree-based processing associated with a network processor or other type of processor, wherein the improved techniques serve to reduce the time required to perform the processing.
Principles of the invention provide improved techniques for performing tree-based processing associated with a network processor or other type of processor. Advantageously, such improved techniques serve to reduce the time required to perform the tree-based processing.
By way of example, in one aspect of the invention, a method of performing a traversal of a tree structure includes the following steps. A first portion of data of a tree structure to be traversed is stored in a first memory level. A second portion of data of the tree structure to be traversed is stored in a second memory level. At least a third portion of data of the tree structure to be traversed is stored in at least a third memory level. In response to receipt of an input search object, a processor traverses one or more of the portions of the tree structure respectively stored in the memory levels to determine one or more matches between the tree data stored in the memory levels and the input search object. The processor, the first memory level, and the second memory level are implemented on one integrated circuit, and the third memory level is implemented external to the integrated circuit.
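As a rough sketch of this aspect, the three memory levels can be modeled as regions holding disjoint depth ranges of the tree, with a fetch helper that selects the region owning a given depth. This reuses the hypothetical tree_entry layout from the earlier sketch; the split depths, the relative indexing, and the three-way layout are illustrative assumptions.

```c
#include <stdint.h>

/* Assumed arrangement: the tree's shallowest levels sit in the first
 * (fastest, on-chip) memory, the next levels in the second on-chip
 * memory, and everything deeper in the third, external memory. The
 * split depths are configuration values, not fixed by the method. */
struct memory_level {
    const struct tree_entry *base;   /* this level's node storage      */
    uint32_t first_depth;            /* first tree depth stored here   */
};

static struct memory_level mem[3];   /* [0],[1] on-chip; [2] external  */

/* Fetch a node, selecting the memory level that holds its depth.
 * The index is assumed to be relative to that level's base. */
static const struct tree_entry *fetch(uint32_t depth, uint32_t index)
{
    int lvl = (depth >= mem[2].first_depth) ? 2
            : (depth >= mem[1].first_depth) ? 1 : 0;
    return &mem[lvl].base[index];
}
```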
The processor may include two or more engines, and the first memory level includes two or more memory elements, wherein the two or more memory elements are respectively dedicated to the two or more engines. The step of storing the first portion of the tree structure in the first memory level may include storing a copy of the first portion of data of the tree structure in each of the two or more memory elements of the first memory level. A first one of the two or more engines may access one or more of the portions of the tree structure respectively stored in the memory levels, including its dedicated memory element associated with the first memory level, to determine one or more matches between the stored tree data and at least a portion of the input search object. Substantially simultaneously with the first one of the engines, a second one of the two or more engines may access one or more of the portions of the tree structure respectively stored in the memory levels, including its dedicated memory element associated with the first memory level, to determine one or more matches between the stored tree data and at least a portion of the input search object. The portion of the input search object processed by the first engine may be different from the portion of the input search object processed by the second engine. Alternatively, the portion of the input search object processed by the first engine may be the same as the portion of the input search object processed by the second engine. Still further, one engine may process one input search object while another engine processes another input search object.
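A minimal sketch of two engines working in parallel, each against its own copy of the root-level data, might look as follows. POSIX threads stand in for dedicated hardware engines; the context structure, the copy size, and the stubbed traversal body are all assumptions for illustration, reusing the tree_entry layout from the earlier sketch.

```c
#include <pthread.h>
#include <stdint.h>

/* Each engine gets a private copy of the root-level tree data, so the
 * two lookups do not contend for a single memory port. */
struct engine_ctx {
    struct tree_entry root_copy[64];   /* dedicated root-level memory  */
    const uint8_t *search_object;      /* the bits this engine checks  */
    uint32_t result;
};

static void *run_engine(void *arg)
{
    struct engine_ctx *ctx = arg;
    /* ... traverse from ctx->root_copy[0], falling through to the
     * shared second and third memory levels as the search deepens ... */
    (void)ctx;
    return NULL;
}

void classify_in_parallel(struct engine_ctx *e0, struct engine_ctx *e1)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, run_engine, e0);
    pthread_create(&t1, NULL, run_engine, e1);
    pthread_join(t0, NULL);            /* both engines ran at once     */
    pthread_join(t1, NULL);
}
```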
An access time associated with the first memory level may be less than an access time associated with at least one of the other memory levels. An access time associated with the third memory level may be greater than an access time associated with at least one of the other memory levels.
In one embodiment, the processor may include a network processor, the input search object may include packet data, and the tree structure may include data used for classifying at least a portion of the packet data.
In another aspect of the invention, apparatus for performing a traversal of a tree structure includes: a first memory level for storing a first portion of data of a tree structure to be traversed; a second memory level for storing a second portion of data of the tree structure to be traversed; at least a third memory level for storing at least a third portion of data of the tree structure to be traversed; and a processor for traversing, in response to receipt of an input search object, one or more of the portions of the tree structure respectively stored in the memory levels to determine one or more matches between the tree data stored in the memory levels and the input search object. The processor, the first memory level, and the second memory level are implemented on one integrated circuit, and the third memory level is implemented external to the integrated circuit.
In a further aspect of the invention, an integrated circuit comprises: a first memory level for storing a first portion of data of a tree structure to be traversed; a second memory level for storing a second portion of data of the tree structure to be traversed; and a processor, wherein the processor is configured to access the first memory level, the second memory level, and at least a third memory level for storing at least a third portion of data of the tree structure to be traversed, wherein the third memory level is remote from the integrated circuit. In response to receipt of an input search object, the processor traverses one or more of the portions of the tree structure respectively stored in the memory levels to determine one or more matches between the tree data stored in the memory levels and the input search object.
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
The present invention will be illustrated below in conjunction with an exemplary tree-based packet classification function performed by a network processor that is part of a packet processing system. It should be understood, however, that the invention is more generally applicable to any processing system in which it is desirable to avoid the drawbacks attributable to the use of existing techniques for performing tree-based processing.
By way of example only, principles of the invention are applicable to packet processors such as those available from Agere Systems Inc. (Allentown, Pa.), e.g., network processors respectively identified as APP350, APP550, and APP650. However, it is to be understood that principles of the invention are not limited to these, or any, particular processors.
It is to be understood that the term “processor” as used herein may be implemented, by way of example and without limitation, utilizing a microprocessor, central processing unit (CPU), digital signal processor (DSP), application-specific integrated circuit (ASIC), or other type of data processing device or processing circuitry, as well as portions and combinations of these and other devices or circuitry.
Referring now to the accompanying drawings, it should be understood that the particular arrangement of system elements shown therein is presented by way of illustrative example only.
An exemplary tree structure, having a root portion, intermediate branches, and a plurality of leaves, is now described in the context of an illustrative matching operation.
The decision process at each entry of the tree is executed by using a processor to compare a first number of symbols at the first entry with a first number of the input symbols. The result of the first comparison determines the next branch that the process will follow. The symbols at the second entry are fetched from memory by the processor and a second group of the input symbols are compared with the symbols at the second entry. These alternating fetching and comparing steps are executed as the search object is processed through the tree until a decision entry is reached.
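Rendered in code, the alternating fetch-and-compare loop might look like the following sketch, reusing the hypothetical tree_entry and match_entry helpers from above; the root-at-index-zero start and the 0xFFFFFFFF no-match code are conventions assumed for this example.

```c
#include <stddef.h>

/* Walk the tree by alternately fetching an entry and comparing the
 * next group of input symbols against it, until a decision entry
 * (leaf) is reached. */
uint32_t traverse(const struct tree_entry *tree, const uint8_t *input)
{
    uint32_t idx = 0;                              /* begin at root    */
    for (size_t pos = 0; ; pos++) {
        const struct tree_entry *e = &tree[idx];   /* fetch step       */
        if (e->is_leaf)
            return e->result;                      /* final decision   */
        long next = match_entry(e, input[pos]);    /* compare step     */
        if (next < 0)
            return 0xFFFFFFFFu;                    /* no match         */
        idx = (uint32_t)next;
    }
}
```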
The decision tree may be stored in a memory in which each entry occupies a memory location and each branch is represented by a memory address offset. As an illustration, assume an input search object whose bit pattern begins with the symbol A is applied to the tree.
Since the input object begins with an A, the process is directed to the node 218 (memory location 10), which contains three instructions or three potential pattern matches and a memory address offset associated with each. If the pattern match is D and the offset address for the D branch is 1, the process moves to the memory location 11 or node 219.
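The offset arithmetic in this example can be made concrete with a short sketch; the data layout below is hypothetical, but the numbers mirror the example above (node 218 at location 10 holding three candidate patterns, pattern D carrying offset 1, destination location 11).

```c
#include <stdint.h>

/* Offset-based branching: the next memory location is the current
 * location plus the offset stored with the matching pattern. */
struct branch     { uint8_t pattern; uint32_t offset; };
struct instr_node { uint8_t num_branches; struct branch branch[3]; };

static uint32_t next_location(uint32_t cur_loc,
                              const struct instr_node *n, uint8_t sym)
{
    for (unsigned i = 0; i < n->num_branches; i++)
        if (n->branch[i].pattern == sym)
            return cur_loc + n->branch[i].offset;  /* 10 + 1 -> 11     */
    return cur_loc;               /* assumed convention for no match   */
}
```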
Principles of the present invention realize that the tree structure for performing the classification process may be segregated and stored in a plurality of memory elements. This provides many advantages; for example, the processor is provided with parallel and simultaneous access to the levels of the tree structure. In addition, such an arrangement provides more memory bandwidth (and thus increased processing bandwidth), since there are more memory levels in which instructions can be stored.
Accordingly, the tree structure may be partitioned among multiple memory elements, such that, depending on the memory elements chosen (i.e., faster on-chip memory versus slower off-chip memory), different read access times are available. Thus, certain tree entries (i.e., nodes or instructions, as discussed above) are accessible faster than others.
For example, the lower-level branches of the tree (those nearest the root) can be stored on-chip with the processor, thereby reducing the read cycle time for the lower-level tree entries. Advantageously, there are fewer lower-level tree entries, as these appear near the tree root. Therefore, the on-chip storage requirements are considerably less than the storage requirements for the entire tree.
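A worked example makes the point; the fan-out of four and the depth of ten are arbitrary assumptions. The first three levels of a full 4-ary tree hold only 1 + 4 + 16 = 21 entries, while level 10 alone holds 4^10 = 1,048,576:

```c
#include <stdio.h>

/* Entry counts per level of a full k-ary tree: level d holds k^d
 * entries, so the levels nearest the root are a tiny fraction of the
 * whole tree and fit comfortably on-chip. */
int main(void)
{
    const unsigned long k = 4;               /* assumed fan-out        */
    unsigned long nodes = 1, total = 0;
    for (unsigned d = 0; d <= 10; d++) {
        total += nodes;
        printf("level %2u: %10lu entries (cumulative %lu)\n",
               d, nodes, total);
        nodes *= k;
    }
    return 0;
}
```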
As shown in an illustrative embodiment, the tree structure is apportioned among a plurality of memory elements 304-1 through 304-N, with the levels nearest the root stored in memory element 304-1.
It is known that a significant number of tree traversals are terminated (i.e., reach an end leaf) in the root memory or within one or two levels of the root memory. Simulation and analysis show that about 30% of the search objects being processed are terminated in the root tree memory. Thus, if the tree root is stored in memory 304-1, the process will likely converge faster.
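That termination profile invites a back-of-the-envelope estimate of average lookup latency. In the sketch below, only the 30% root-level figure comes from the text; the remaining hit rates and the per-level cycle counts are invented purely for illustration.

```c
/* Expected lookup cost when 30% of searches terminate in the root
 * memory; p[] gives the assumed probability of terminating at each
 * memory level and t[] the assumed cycles spent reaching it. */
static double expected_cycles(void)
{
    const double p[3] = { 0.30, 0.45, 0.25 };  /* terminate at level   */
    const double t[3] = { 1.0,  4.0,  20.0 };  /* cycles to that level */
    double e = 0.0;
    for (int i = 0; i < 3; i++)
        e += p[i] * t[i];
    return e;           /* 0.30*1 + 0.45*4 + 0.25*20 = 7.1 cycles      */
}
```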
In one embodiment, memory elements 304-1 and 304-2 reside on-chip (part of the processor integrated circuit), while memory element 304-3 (where N=3) resides off-chip (not part of the processor integrated circuit, but rather part of one or more other integrated circuits).
The use of three separate memory structures (or elements) is merely exemplary, as additional memory structures can also be employed for storing levels of the tree, i.e., N=4, 5, etc. Selection of the optimum number of memory elements, the memory access time requirements of each, and the tree levels stored in each memory element can be based on the probability that certain patterns will appear in the incoming data stream. The tree levels or sections of tree levels that are followed by the most probable data patterns are stored in the memory having the fastest access time. For example, all the input patterns traverse the lower levels of the tree; thus, these lower levels can be stored within a memory having a fast read cycle time to speed up the tree analysis process. In addition, more memory levels may be added to provide further memory bandwidth improvement, depending on the memory bandwidth requirement of the particular application.
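One way to reason about such probability-driven placement is to score candidate partitions by probability-weighted access cost, as in the following simplified model; all inputs are modeling assumptions, not values recited in the text.

```c
/* Score a candidate partition: tree level l is traversed with
 * probability prob[l] and assigned to memory mem_of[l], which has
 * read time cost[mem_of[l]]. Lower scores indicate better placement. */
static double partition_cost(const double *prob, const int *mem_of,
                             const double *cost, int num_levels)
{
    double score = 0.0;
    for (int l = 0; l < num_levels; l++)
        score += prob[l] * cost[mem_of[l]];
    return score;
}
```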
Principles of the present invention can also be applied to parallel processing of a tree structure, as in the illustrative parallel embodiment now described.
Advantageously, it is to be appreciated that the bandwidth of any memory level (internal or external memory) can be increased by having multiple copies/instances of that level of memory. Thus, the memory level including shared memory 408 can be configured to have multiple elements such as is done with the first (root) memory level. The external memory level can also be configured in this manner.
In this parallel embodiment, each of a plurality of processing engines is provided with its own dedicated root memory 406, while the upper levels of the tree are held in an internal shared memory 408 and, where needed, an external shared memory 410. Each engine can thus begin a traversal in its private root memory and continue into the shared memories only when a search progresses beyond the root levels.
Advantageously, since there are fewer tree branches at the root level, the capacity requirements for a root memory 406 are lower than the capacity requirements for an equivalent number of upper-level branches. As explained above, the latter are stored in internal shared memory 408 and external shared memory 410. While internal shared memory 408 can run at the same speed as a root memory 406, external shared memory 410 can run at a slower speed (resulting in a higher data latency and a lower memory bandwidth). But this latency factor has less impact on the speed at which the tree analysis process is executed, because these higher tree branches are not traversed as frequently. The use of internal memory and external memory allows each to be accessed in parallel by a pipelined processor. Also, use of the internal memory without external memory reduces the pin-out count of the integrated circuit incorporating the processor. Additionally, in applications that do not need external memory for storing higher levels of the tree, the external memory does not need to be populated, which reduces the system implementation cost.
Furthermore, depending on the structure of the particular tree, many of the pattern matching processes may terminate successfully at a lower level branch in the on-chip memory, and thereby avoid traversing the upper level branches stored in the slower memory.
In yet another embodiment, it is possible to store especially critical or frequently-used small trees entirely within the internal memory elements (406 alone, or 406 and 408), thus providing especially rapid tree processing for any tree that is located entirely on-chip. The segregation between the tree levels stored within the internal memories and the external memory can also be made on the basis of the probabilities of certain patterns in the input data.
Typically, the data input to a network processor using a tree-based characterization process is characterized according to several different attributes. There will therefore be a corresponding number of trees through which segments of the data packet or data block are processed to perform the characterization function. Accordingly, the lower-level branches are stored on-chip and the higher-level branches are stored off-chip. To perform the multiple characterizations, a pipelined processor will access a lower branch of a tree stored in one of the on-chip memories and then move to the off-chip memory as the tree analysis progresses. But since the off-chip access time is longer, the processor can, while waiting for the off-chip read cycle to complete, begin to characterize other input data or begin another processing thread. In this way, several simultaneous tree analyses can be performed by the processor, taking advantage of the faster on-chip access speeds while waiting for a response from a slower off-chip memory.
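The interleaving described here can be sketched as a simple round-robin scheduler over per-packet lookup contexts, where a lookup that crosses into the off-chip levels parks itself until its (simulated) read completes. The on-chip level count, the latency figure, and the context fields are all assumptions made for this sketch.

```c
/* Round-robin interleaving of several lookups: rather than stalling on
 * a slow external read, the engine advances other packets' traversals
 * while a countdown stands in for the off-chip latency. */
enum state { READY, WAITING, DONE };

struct lookup_ctx {
    enum state state;       /* callers initialize to READY             */
    int depth;              /* current tree level, 0 at the root       */
    int leaf_depth;         /* level at which this lookup terminates   */
    int wait;               /* simulated off-chip latency countdown    */
};

#define ONCHIP_LEVELS 3     /* assumed: first three levels on-chip     */

void run_lookups(struct lookup_ctx *ctx, int n)
{
    int remaining = n;
    while (remaining > 0) {
        for (int i = 0; i < n; i++) {          /* visit each in turn   */
            struct lookup_ctx *c = &ctx[i];
            if (c->state == WAITING && --c->wait <= 0)
                c->state = READY;              /* off-chip data back   */
            if (c->state != READY)
                continue;
            if (c->depth >= c->leaf_depth) {
                c->state = DONE;               /* reached a leaf       */
                remaining--;
            } else if (++c->depth >= ONCHIP_LEVELS) {
                c->wait = 5;                   /* assumed latency      */
                c->state = WAITING;            /* node lives off-chip  */
            }
        }
    }
}
```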
In another embodiment, certain portions of the tree (not necessarily an entire tree level) are stored within different memory elements. For example, the most frequently traversed paths can be stored in a fast on-chip or local memory and the less-frequently traversed paths stored in a slower remote or external memory.
A tree structure may also be adaptable to changing system configurations. Assume that the tree is processing a plurality of TCP/IP addresses. When the process begins, the tree is empty and therefore all of the input addresses default to the same output address. The tree process begins at the root and immediately proceeds to the default output address at the single leaf. Then, an intermediate instruction or decision node is added to direct certain input addresses to a first output address and all others to the default address. As more output addresses are added, the tree becomes deeper, i.e., having more branches or decision nodes. Accordingly, the growth of the tree can occur in both the local and the remote memory elements.
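The growth pattern described resembles inserting prefixes into a binary trie, one decision bit per level. The sketch below is a generic trie insert offered for illustration, not the structure recited above; allocation-failure handling is elided for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

/* A node either records a next hop (leaf behavior) or tests one
 * address bit. Inserting a prefix adds decision nodes along its bit
 * path, deepening the tree as routes are added. */
struct trie_node {
    struct trie_node *child[2];   /* bit == 0 / bit == 1               */
    int      has_hop;
    uint32_t next_hop;
};

static void insert_prefix(struct trie_node *root, uint32_t addr,
                          int prefix_len, uint32_t hop)
{
    struct trie_node *n = root;
    for (int i = 0; i < prefix_len; i++) {
        int bit = (addr >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = calloc(1, sizeof *n->child[bit]);
        n = n->child[bit];        /* one new decision level per bit    */
    }
    n->has_hop = 1;
    n->next_hop = hop;            /* decision recorded at the leaf     */
}
```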
Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.
The present application relates to co-pending U.S. patent application identified as Ser. No. 10/037,040, filed on Dec. 21, 2001, and entitled “Method of Improving the Lookup Performance of Tree-type Knowledge Base Searches,” the disclosure of which is incorporated by reference herein.