The invention relates to methods of searching for data, and more particularly, to methods for locating data within large databases.
Methods for locating a data result within a large data collection have existed for centuries, and have been applied to a wide range of fields. For small data collections, the typical method is to identify a search term or “key,” and then to successively match the key against each item in the data collection until the result is found. This direct approach is referred to herein as a “memory map.” Larger data collections are often organized into structures, such that certain elements of the key direct the search to a sub-group of the data collection, while other elements of the key are used to match particular items in the sub-group. For example, if one wished to find a biography of Napoleon in a library of bound volumes, one might mentally choose “Biography Napoleon” as the key. The first part of the key indicates the section of the library where the book will be located, while the second part indicates the particular book being sought, so that one would first proceed to locate the section of the library that contains “biographies,” would then proceed to find the appropriate shelf (for example, one containing titles beginning with L-P), and would finally match the name “Napoleon” against the individual titles on the shelf. Of course, one possible outcome could be that no biography of Napoleon is carried by that library, but the search itself would still be successful, in that it would conclusively determine whether or not a biography of Napoleon existed in the library.
Of course, most data searches are carried out by computers searching databases stored in non-transient memory. Applications range from internet searches (which are actually searches of very large databases assembled by “web-crawlers” that contain internet website links and contents of the websites), to color adjustment lookup tables in graphic display systems, to tables used by IP routers to direct packets from inputs to outputs.
As mentioned above, the “memory map” approach is to simply scan the entire contents of a database until a match is found, or the contents of the database are exhausted. In this approach, a single table contains the complete list of keys and corresponding results. Using a memory map can be satisfactory when time is not critical and/or when the database is not excessively large. However, this approach is not satisfactory for many high speed applications, such as IP packet routing.
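In software terms, the memory map approach amounts to a linear scan over key/result pairs. The following sketch is purely illustrative; the keys, results, and table contents are hypothetical and not taken from any actual embodiment:

```python
def memory_map_search(table, key):
    """Scan every (key, result) pair in order until the key matches;
    return None if the table is exhausted without a match."""
    for entry_key, result in table:
        if entry_key == key:
            return result
    return None

# Hypothetical table contents for illustration only.
table = [(0b0001, "port A"), (0b0010, "port B"), (0b0111, "port C")]
```

Because every entry may need to be examined, the search time grows linearly with the size of the database, which is why this approach is unsuitable for high speed applications such as packet routing.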
With reference to
The simple example of
Often, the approach of
Nevertheless, the approach of
While TCAM's can provide the speed and throughput required for some applications, they are expensive, high in power consumption, and can generate excessive heat, especially when large numbers of TCAM's are used for high speed searching of very large databases. This approach also excludes use of more widely available and less expensive pipeline processors that include on-chip memory units but do not include TCAM's.
Yet another approach is illustrated in
As can be seen from the simple example in
Use of the memory in the approach of
What is needed, therefore, is a method for high speed searching of a large database that does not require TCAM's or other dedicated searching processors, provides optimized memory usage, and yet provides throughput comparable to solutions that use TCAM's or other dedicated processors.
The present invention is a method for high speed searching of a large database without using TCAM's or other dedicated searching processors. The method provides optimized memory usage, while at the same time providing throughput comparable to solutions that use TCAM's or other dedicated processors. The objects of the present invention are achieved by using successive groups of bits from a search string or “key” to navigate through a “search tree” of tables in the database, in a manner similar to
Each link in the search “tree” of tables provides information that specifies both the type and size of the linked table. In this way, the processor is not only directed to the next subsequent table, but is also told how the linked table should be used and how many bits from the key should be provided to the linked table.
Specifically, tables in the database include data words that function as “pointer records” providing either direct or indirect links to other tables in the database. Each pointer record includes a “type” code that specifies to which of a group of “types” the next subsequent table in the tree belongs. At least one of the selectable table types uses bits from the key as an address offset, and the type codes in the pointer records that point to this type of “address offset” table further specify the size of the linked table and the number of key bits to be used as the address offset. Navigation through these address offset tables is highly efficient, because, in effect, the instruction pointer of the processor is being used as if it were a dedicated search co-processor.
In some embodiments, only address offset tables are included, and the type fields in the pointer records are used only to specify the sizes of the linked tables. Other embodiments of the present invention allow more than one type of table to be included within the database tree, such as memory mapped tables and string search tables, in addition to address offset tables. In these embodiments, the “type” code included in each pointer record includes bits that indicate the type of table to which the associated table link is pointing. In some of these embodiments, the first few bits of the type code indicate the general type of the linked table, and the remaining bits provide more specific information, depending on the type of linked table. For example, in some embodiments, if the first bit of the type code is a zero, then the linked table is an address offset table as described above, and the remaining bits in the type code indicate how many bits from the key should be used as the address offset for the linked table.
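As a hedged illustration of such a type code, the following sketch decodes a pointer record under assumed word widths: a 32-bit record whose most significant byte is the type code and whose low 24 bits are the link field. This layout is hypothetical and only demonstrates the principle of a leading type bit followed by a bit assignment:

```python
def decode_type_code(pointer_record):
    """Decode a pointer record assumed to be a 32-bit word. Illustrative
    layout: if the first (most significant) bit of the type code is zero,
    the linked table is an address offset table and the remaining seven
    type-code bits give its bit assignment; the low 24 bits are the link
    field (here, a base address of the linked table)."""
    type_code = (pointer_record >> 24) & 0xFF
    link_field = pointer_record & 0xFFFFFF
    if (type_code & 0x80) == 0:                  # first bit is zero
        bit_assignment = type_code & 0x7F        # remaining type-code bits
        return ("address_offset", bit_assignment, link_field)
    return ("other", None, link_field)           # some other table type
```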
In various embodiments, for at least some table types other than address offset tables, the number of key bits to be used by the linked table is indicated in the table itself, or in a separate pointer to the table (in the case of indirect pointing).
The present invention can be very powerful when applied to searches where the key is a series of bit groups of varying sizes having specific meanings, which is a common situation that arises in packet routing, color mapping, and other applications. By analogy, consider the routing of physical US mail using “zip” codes. A US zip code is a series of 9 decimal digits. The first three digits direct a letter to a sectional mail sorting facility for a certain area. The fourth and fifth digits represent a group of delivery addresses within the area, and the last four digits represent a geographic segment within a group of delivery addresses that typically is served by a single mail carrier. When directing a piece of mail, the initial three digits are considered first, and the letter is sent to the appropriate sectional facility, UNLESS it is already located in the area served by that sectional facility, in which case the next two digits are considered, and possibly the final four digits, before the letter is finally routed.
Of course, the process of physical mail routing does not fall within the scope of the present invention, but it may serve as a useful analogy for understanding some embodiments of the present invention, such as packet routing. A given packet router may be assigned to a certain section of the network, and may have outputs connected to local nodes in that section and/or to other local routers that serve smaller subsections within that section. The router may also have connections to other routers that serve other sub-sections. A specific group of bits in the packet delivery address (for example the most significant four bits) may indicate to which section of the network the packet is directed, while other groups of (less significant) bits may provide information regarding smaller subsections, and finally the address of the individual destination node. In such a case, if the destination address is in a different section of the network, then the output port will be assigned after consideration of only the first few bits of the address. On the other hand, if the destination address is in the same section as the router, then additional groups of bits will be considered. Note that the process of assigning a packet to a router output port is sometimes referred to as applying “rules” to the packet's destination address, but the process is equivalent to matching the packet's destination address with an entry in a database and retrieving the assigned output port from the database.
In applying the present invention to such a case, the table sizes would typically be assigned according to the bit groupings in the packet addresses. For example, if the first four bits indicate the primary section of the network where the destination node resides, the first table may contain sixteen entries corresponding to all possible combinations of the first four bits. Some entries may be empty, and the rest will direct the packet to the appropriate output leading to the router that handles that section, except when the first four bits refer to the section to which the router belongs. In that case, the table entry will point to another table whose size will correspond to the size of the next group of bits in the address. Several address offset tables may be quickly searched in succession simply by using the groups of key bits as address offsets. The search may also include one or more string searches and/or data lookups in one or more memory map tables. Finally, the search will terminate with an entry containing the output port ID to which the packet should be routed. In other applications, a similar search may terminate with a pointer to a location in external DDR memory where the information being sought is stored.
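A minimal sketch of such a search tree, assuming all tables are address offset tables represented here as Python dictionaries, might look as follows. The 8-bit key layout, table contents, and port names are hypothetical and chosen only to mirror the routing scenario described above:

```python
def tree_search(root, key, key_len):
    """Navigate a tree of address offset tables: each table consumes its
    assigned group of key bits (most significant bits first) as an
    address offset into its entries."""
    table, pos = root, key_len
    while True:
        nbits = table["bits"]                      # bit assignment of this table
        pos -= nbits                               # next group of key bits
        index = (key >> pos) & ((1 << nbits) - 1)  # key bits as address offset
        entry = table["entries"][index]
        if entry is None:
            return None                            # empty entry: no match exists
        kind, payload = entry
        if kind == "result":
            return payload                         # e.g. an output port ID
        table = payload                            # a link: follow to next table

# Hypothetical 8-bit key: the first 4 bits select a network section,
# the next 4 bits select a node within the router's own section (0x5).
inner = {"bits": 4,
         "entries": [("result", "port 7") if i == 0x3 else None
                     for i in range(16)]}
root = {"bits": 4,
        "entries": [("link", inner) if i == 0x5 else ("result", "uplink %d" % i)
                    for i in range(16)]}
```

A key whose first four bits name a foreign section resolves in a single table lookup, while a key addressed to the router's own section descends into the inner table, exactly as in the routing discussion above.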
The present invention is a method for performing a search within a data collection to locate a target result that corresponds to a key, the key including a plurality of key bits. The method includes creating a plurality of tables in non-transient media, each table being configured to process a group of bits from the key, where said processing yields a processing result, providing table links that connect the tables to form a search tree, each table link being at least part of a processing result for a preceding table, each table link including information specifying a location of a subsequent table pointed to by the table link and a bit assignment for the subsequent table, the bit assignment being a number of bits from the key to be processed by the subsequent table, the tables, table links, and bit assignments being selected according to a distribution of information within the data collection, processing a first group of bits from the key using a first table in the search tree to obtain a first processing result, if the first processing result is a first table link, obtaining from the first table link the location and the bit assignment of the subsequent table to which the first table link points, and processing a next group of key bits using the subsequent table pointed to by the first table link, the next group of key bits having a number of bits equal to the bit assignment of the subsequent table, and successively processing next groups of bits from the key using subsequent tables pointed to by table links until a processing result is obtained that provides the target result.
In embodiments, each table link includes a type field and a link field, the type field containing information specifying how the subsequent table pointed to by the table link will process bits from the key, the link field containing information that can be used to locate the subsequent table to which the table link points.
In some embodiments, at least one of the tables is an address offset table that processes bits from the key by using the bits as an address offset that is added to a base address of the address offset table to locate a data record in the address offset table from which a processing result can be derived. In some of these embodiments each entry in the address offset table is a single data word.
In other of these embodiments, at least one of the table links includes a type field containing data specifying that the table link is pointing to an address offset table, the type field further provides the bit assignment of the address offset table to which the table link is pointing, and the table link further includes a link field containing information that can be used to locate the address offset table to which the table link is pointing. And in some of these embodiments the table link is a single data record entry in a table, the type field includes one bit indicating that the table link is pointing to an address offset table and a plurality of bits that are a binary representation of the bit assignment of the address offset table to which the table link is pointing, and the link field contains a base address of the address offset table to which the table link is pointing.
In various embodiments, at least one of the tables is a string comparison table for which processing a group of bits from the key includes comparing the group of key bits with a comparison value and providing a first processing result if a match is found between the group of key bits and the comparison value. In some of these embodiments the string comparison table provides a second processing result if a match is not found between the key bits and the comparison value. In other of these embodiments if no match is found between the key bits and the comparison value, the key bits are compared with a second comparison value, and a second processing result is provided if a match is found between the key bits and the second comparison value.
In still other of these embodiments each string comparison table includes a first data word containing the bit assignment and the comparison value of the string comparison table, a second data word that can be used to obtain the processing result if a match is found between the key bits and the comparison value, and a third data word that specifies a next step in the search if a match is not found between the key bits and the comparison value. And in some of these embodiments the third data word is able to specify that the same key bits should be processed by a subsequent string comparison table if no match is found between the key bits and the comparison value.
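A sketch of this three-word string comparison layout, with hypothetical comparison values, results, and word packing, might look like the following. Here a table is modeled as a tuple of three "words," and a mismatch falls through to a subsequent comparison table over the same key bits:

```python
def string_compare_step(table, key_bits):
    """One string comparison table as three 'words' (illustrative):
    table[0] = (bit assignment, comparison value),
    table[1] = processing result to use on a match,
    table[2] = next step on a mismatch (another such table, or None)."""
    nbits, comparison = table[0]
    if (key_bits & ((1 << nbits) - 1)) == comparison:
        return ("match", table[1])
    return ("no_match", table[2])

def string_search(table, key_bits):
    """Chain through comparison tables until a match or no table remains."""
    while table is not None:
        status, payload = string_compare_step(table, key_bits)
        if status == "match":
            return payload
        table = payload                          # same key bits, next table
    return None

# Two hypothetical comparison tables chained over the same six key bits.
second = ((6, 0b101010), "result B", None)
first = ((6, 0b110011), "result A", second)
```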
In certain embodiments the search is performed by a processor and at least one of the tables is stored in a memory unit included within the processor. In some of these embodiments at least one of the tables is a memory mapped table located in memory not included within the processor, and the memory mapped table processes bits from the key by using the bits as an address offset that is added to a base address of the memory mapped table to locate a data record in the memory mapped table from which a processing result can be derived. In other of these embodiments each entry in the memory mapped table is a single data word.
In still other of these embodiments at least one of the table links includes a type field containing data specifying that the table link is pointing to a memory mapped table, the table link further including a link field containing information that can be used to locate a pointer that points to the memory mapped table to which the table link is pointing. And in some of these embodiments the table link is a single data record in a table and the pointer includes a plurality of data words that specify the bit assignment and the location of the memory mapped table.
In yet other of these embodiments a plurality of memory units are included within the processor, and at least one table located in a first of the memory units can provide a processing result that is an instruction to continue the search on a second of the memory units. In still other of these embodiments the processing result includes information that can be used to locate a table link in the second of the memory units, the table link pointing to a subsequent table in the second of the memory units where the search is to be continued.
In yet other of these embodiments the processing result includes a type field indicating that it is an instruction to move the search from the first memory unit to another memory unit and an offset field that specifies the identity of the second of the memory units.
And in some of these embodiments the processor is a pipeline processor that includes a plurality of internal memory units, and the method further includes using at least one table located on a first memory unit of the processor to begin a second search for a second target result corresponding to a second key as soon as the processing of bits from the key is completed on the first memory unit.
The features and advantages described herein are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and not to limit the scope of the inventive subject matter.
The present invention is a method for high speed searching of a large database without using TCAM's or other dedicated searching processors. The method provides optimized memory usage, while at the same time providing throughput comparable to solutions that use TCAM's or other dedicated processors. With reference to
Each link in the search “tree” of tables provides information that specifies both the type and size of the linked table. In this way, the processor is not only directed to the next subsequent table, but is also told how the linked table should be used and how many bits from the key should be provided to the linked table.
With reference to
The present invention can be very powerful when applied to searches where the key 201 is a series of bit groups of varying sizes having specific meanings, which is a common situation that arises in packet routing, color mapping, and other applications. For example, a given packet router may be assigned to a certain section of a network, and may have outputs connected to local nodes in that section and/or to other local routers that serve smaller subsections within that section. The router may also have connections to other routers that serve other sections. A specific group of bits in the packet delivery address (for example the most significant four bits) may indicate to which section of the network the packet is directed, while other groups of (less significant) bits may provide information regarding smaller subsections, and finally the address of the individual destination node. In such a case, if the destination address is in a different section of the network, then the output port will be assigned after consideration of only the first few bits of the address. On the other hand, if the destination address is in the same section as the router, then additional groups of bits will be considered.
In applying the present invention to such a case, the table sizes would typically be assigned according to the bit groupings in the packet addresses. For example, with reference again to
In the specific example of
With reference to
With reference to
With reference to
And with reference to
With reference to
With reference to
With reference to
In
In
In
Note that in some embodiments, the entire key is passed from MU to MU as the search progresses, while in other embodiments each MU receives only the bits that will be used to search tables stored on that MU. In the first case, processor overhead is reduced but the burden on the data links between the processing unit and the MU's is increased, while in the second case data communication between the processor and the MU's is reduced at the expense of more processor unit overhead in parsing the key and transmitting only specific bit groups to the MU's.
Note also that a tradeoff between MU usage, throughput, and latency can be optimized according to how many tables are stored in each MU. For example, throughput can be maximized at the expense of increased MU usage and latency if a large number of MU's are dedicated to the search, such that each MU contains only a single table. This allows a new key to be introduced into the MU pipeline after each table search. On the other hand, the search can be performed using fewer MU's and reduced latency, at the expense of lower throughput, if each MU contains a plurality of tables, so that a new key can be introduced into the pipeline only when the plurality of searches performed within a single MU has been completed.
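A back-of-envelope model can make this tradeoff concrete. The cycle time, the per-MU hop overhead, and the assumption of one table search per cycle are all illustrative values, not taken from any actual embodiment:

```python
def pipeline_model(total_tables, tables_per_mu, cycle_ns=1.0, hop_ns=0.5):
    """Toy model of the MU tradeoff: each MU performs one table search per
    cycle, a new key may enter the pipeline only after every table within
    a single MU has been searched, and each MU-to-MU transfer adds hop_ns
    to the total search latency."""
    mus_used = -(-total_tables // tables_per_mu)         # ceiling division
    latency_ns = total_tables * cycle_ns + mus_used * hop_ns
    keys_per_second = 1e9 / (tables_per_mu * cycle_ns)   # pipeline issue rate
    return mus_used, latency_ns, keys_per_second
```

Under these assumed numbers, dedicating one table to each of eight MU's maximizes the issue rate at the cost of eight MU's and the longest latency, while packing four tables per MU needs only two MU's and yields a lower latency but a quarter of the throughput, mirroring the tradeoff described above.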
The foregoing description of the embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of this disclosure. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.
This application claims the benefit of U.S. Provisional Application No. 61/710,198, filed Oct. 5, 2012, which is herein incorporated by reference in its entirety for all purposes.