The present disclosure relates to network security, methods of structuring data for high speed searching, pre-compiled exception list engines incorporating such structured data and network appliances including such engines.
This section provides background information related to the present disclosure which is not necessarily prior art.
There are various known methods of structuring a set of data elements and of searching the set of data elements to locate a desired element. The data elements may be left unstructured or may be structured according to some principle, such as numerically, alphabetically, etc. The data set may, for example, be searched linearly by looking at each item in the set in order or by using a binary search, which repeatedly divides the set in half and determines whether the desired element is above, below or equal to the dividing point of the set.
As the number of elements in a data set increases, many known methods of structuring and/or searching the data set become cumbersome. Such methods may become slower and may require more processor power and/or memory.
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
According to one aspect of the present disclosure, a computer executed method is disclosed for sorting a plurality of Internet protocol (IP) addresses, each of which has a numeric value within a range of numeric values. The method includes dividing the range into a plurality of clusters representing a plurality of contiguous sub-ranges. Each sub-range encompasses substantially the same number of numeric values of the range and each sub-range is associated with a different cluster. The method also includes assigning each IP address to the cluster associated with the sub-range that includes the numeric value of that IP address. Each cluster has a cluster size defined by the number of IP addresses assigned to that cluster. The IP addresses in each cluster are assigned to one of a plurality of pages. Each page has a page size limit defining the maximum number of IP addresses that can be assigned to that page. Each page has a page size defined by the number of IP addresses assigned to that page. If one of the pages has a page size less than its page size limit, the method includes duplicating on that page at least one of the IP addresses assigned to that page to increase the page size of that page. For each page, the IP addresses assigned to that page are ordered by numeric value.
According to another aspect of the present disclosure, a computer executed method is disclosed for sorting a plurality of Internet protocol (IP) addresses, each of which has a numeric value within a range of numeric values. The method includes dividing the range into a plurality of clusters representing a plurality of contiguous sub-ranges. Each sub-range encompasses substantially the same number of numeric values of the range and each sub-range is associated with a different cluster. The method also includes assigning each IP address to the cluster associated with the sub-range that includes the numeric value of that IP address. Each cluster has a cluster size defined by the number of IP addresses assigned to that cluster. The method includes ordering the clusters by cluster size. The IP addresses in each cluster are assigned to one of a plurality of pages. Each page has a same page size limit defining the maximum number of IP addresses that can be assigned to that page. Each page has a page size defined by the number of IP addresses assigned to that page. If one or more of said pages has a page size less than its page size limit, the method includes duplicating on said page one or more of the IP addresses assigned to that page to increase the page size of said page to its page size limit. For each page, the IP addresses assigned to that page are ordered by numeric value.
According to yet another aspect of the present disclosure a network appliance for connection to a first network is disclosed. The appliance includes at least one input coupled to the first network for receiving a packet from the first network. The packet includes an internet protocol (IP) address. The appliance also includes at least one processor for determining whether to allow the packet from the first network to proceed and at least one memory device storing instructions and data. The data includes a plurality of pages storing a plurality of excepted IP addresses. The excepted IP addresses each has a numeric value within a range. The range is divided into a plurality of contiguous sub-ranges and each page includes one or more of the excepted IP addresses having numeric values within one or more of the sub-ranges associated with that page. Each page has a page size defined by the number of IP addresses assigned to that page. The excepted IP addresses are assigned to each page ordered by numeric value. The at least one processor is configured via the instructions to identify the IP address of the packet from the first network, identify a target page that will include the IP address if the IP address is one of the plurality of excepted IP addresses, search the target page to determine if the IP address is one of the excepted IP addresses in the target page, and process the packet from the first network according to whether the IP address is an excepted IP address in the target page.
According to another aspect of the present application, a network appliance for connection to a first network includes at least one input coupled to the first network for receiving a packet from the first network. The packet includes an internet protocol (IP) address. The appliance includes at least one processor for determining whether to allow the packet from the first network to enter a second network and at least one memory device. The appliance also includes a first engine stored in the memory device. The first engine includes a plurality of pages storing a plurality of excepted IP addresses. The excepted IP addresses each has a numeric value within a range of numeric values and the range is divided into a plurality of contiguous sub-ranges. Each page includes one or more of the excepted IP addresses having numeric values within one or more of the sub-ranges associated with that page. Each page has a page size defined by the number of excepted IP addresses assigned to that page. The excepted IP addresses are assigned to each page ordered by numeric value. The first engine also includes a first finite state machine (FSM). The first FSM includes instructions executable by the processor to determine the page associated with the sub-range encompassing the IP address and output an indication of the page associated with the sub-range encompassing the IP address. The first engine also includes instructions executable by the processor to search the page associated with the sub-range encompassing the IP address to determine if the IP address is an excepted IP address, and output an indication of whether the IP address is an excepted IP address. The processor is also configured via instructions stored in the memory device to process the packet from the first network according to the indication from the first engine.
Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
Example embodiments will now be described more fully with reference to the accompanying drawings.
Example embodiments are provided so that this disclosure will be thorough, and will fully convey the scope to those who are skilled in the art. Numerous specific details are set forth such as examples of specific components, devices, and methods, to provide a thorough understanding of embodiments of the present disclosure. It will be apparent to those skilled in the art that specific details need not be employed, that example embodiments may be embodied in many different forms and that neither should be construed to limit the scope of the disclosure. In some example embodiments, well-known processes, well-known device structures, and well-known technologies are not described in detail.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a”, “an” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms “comprises,” “comprising,” “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed.
According to one aspect of the present disclosure, a method, generally indicated by the reference numeral 100 in
IP addresses are numerical identifiers of devices in a network. Each IP address typically has a numerical value and is stored as a binary number. There are multiple IP address versions, such as IPv4 and IPv6. IPv4 addresses are 32 bit numbers, while IPv6 addresses are 128 bit numbers. IPv4 addresses are often represented in human friendly decimal notation as four three digit numbers separated by decimal points (e.g. 000.000.000.000). Extra leading zeros are sometimes removed from the decimal notation, but are included above for explanatory purposes. Each of the numbers is actually represented by eight bits. An eight bit number can represent decimal numbers between 0 and 255. Thus, each of the four numbers of the IP address may be between 0 and 255. The method disclosed herein is not version specific and may be used with any version of IP addresses. The size of the range encompassing a set of all IP addresses is determined by the bit size of that version's IP addresses. For example, IPv4 addresses are 32 bits long, resulting in a range of 2^32 possible IP addresses. Similarly, the range of IPv6 addresses is 2^128 possible IP addresses. The range of the IP addresses according to the method disclosed herein may be the range of all possible IP addresses (e.g., 0 to 2^32−1 for IPv4 addresses) or any appropriate subset of all possible IP addresses.
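By way of illustration only, the relationship between the decimal notation and the numeric value of an IP address can be sketched with the Python standard library `ipaddress` module. The function names `ip_to_int` and `range_size` are hypothetical and are not part of the disclosed method.

```python
import ipaddress


def ip_to_int(text: str) -> int:
    """Return the numeric value of an IPv4 or IPv6 address given as text."""
    return int(ipaddress.ip_address(text))


def range_size(text: str) -> int:
    """Return the number of possible addresses for the version of `text`.

    max_prefixlen is 32 for IPv4 addresses and 128 for IPv6 addresses.
    """
    return 2 ** ipaddress.ip_address(text).max_prefixlen


if __name__ == "__main__":
    value = ip_to_int("208.77.188.166")
    # Each of the four decimal numbers occupies eight bits of the value.
    print(value, [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)])
```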
Whatever the range, the method includes dividing the range into a plurality of clusters representing a plurality of contiguous sub-ranges, an example of which is shown in
The plurality of contiguous sub-ranges may be determined in numerous different ways. As will be explained more fully below, after the clusters have been populated with the IP addresses from the plurality of IP addresses, the IP addresses will be assigned to pages by cluster. Each page may have a page size limit, also known as a maximum size, that may be expressed in terms of a number of IP addresses or a bit size. Although page sizes, cluster sizes, and page size limits will be discussed herein in terms of number of IP addresses, it should be understood that bit sizes may be interchangeably used. For example, a page size limit of one IP address is the same as a page size limit of 32 bits for IPv4 addresses or 128 bits for IPv6 addresses. Because of the page size limit, each cluster may have no more IP addresses assigned to it than the page size limit. One technique for determining the plurality of sub-ranges is to assign each sub-range the same number of numeric values, where the number of numeric values is equal to the page size limit. With such a division, no cluster could include more IP addresses than the page size limit. This way of dividing the range will work with any number of IP addresses, with any range of IP addresses, and with any distribution of IP addresses. However, such a division results in a very large number of clusters. Additionally, the plurality of IP addresses may be widely spread over the range and may not be linearly distributed over the range. Thus, there may be many clusters with no IP addresses in those clusters' sub-ranges.
Accordingly, in some embodiments, the range may be divided into as few clusters as possible while still avoiding any cluster having a cluster size greater than the page size limit. Various techniques, including various heuristic techniques, are available for determining the number of clusters and the size of the sub-ranges to minimize the number of clusters and maximize the size of the sub-ranges of the clusters. One technique to minimize the number of clusters is to begin with one cluster. When the IP addresses are assigned to the cluster, as soon as (or if) the number of IP addresses assigned to the cluster exceeds the page size limit, the assigning is stopped, the number of clusters is doubled (to two) and the assigning begins again. The process stops and the number of clusters is again doubled as soon as the number of IP addresses assigned to any cluster exceeds the page size limit. The process is repeated until the number of IP addresses assigned to each cluster is less than or equal to the page size limit. With a large number of IP addresses such a technique may require numerous iterations and a relatively long time.
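The doubling technique described above may be sketched as follows. This is a minimal, non-limiting illustration that assumes the range is 0 to 2^range_bits − 1, that the IP addresses are supplied as integers within that range, and that duplicate addresses are removed first. The function name `cluster_by_doubling` and its parameters are hypothetical.

```python
def cluster_by_doubling(addresses, range_bits, page_size_limit):
    """Divide [0, 2**range_bits) into equal sub-ranges, doubling the number
    of clusters until no cluster holds more than page_size_limit addresses.

    Returns a list of clusters; cluster i covers the i-th sub-range.
    """
    if page_size_limit < 1:
        raise ValueError("page_size_limit must be at least 1")
    distinct = sorted(set(addresses))
    cluster_bits = 0  # the number of clusters is 2**cluster_bits
    while True:
        shift = range_bits - cluster_bits  # each sub-range spans 2**shift values
        clusters = [[] for _ in range(1 << cluster_bits)]
        overflow = False
        for addr in distinct:
            bucket = clusters[addr >> shift]
            bucket.append(addr)
            if len(bucket) > page_size_limit:
                overflow = True  # stop assigning as soon as a cluster overflows
                break
        if not overflow:
            return clusters
        # With distinct addresses, a sub-range of one value never overflows,
        # so this loop ends no later than cluster_bits == range_bits.
        cluster_bits += 1
```

Because the sub-range size is always a power of two in this sketch, the cluster for an address is found with a single right shift rather than a division.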
Another technique assumes that the IP addresses are relatively linearly distributed and divides the number of IP addresses in the plurality of IP addresses by the page size limit to obtain the number of clusters. Thus, for example, if the page size limit is 2^10 and the number of IP addresses is 2^16, the number of clusters is 2^6. If the range is all IPv4 addresses, the range is 2^32. Dividing the range (2^32) by the number of clusters (2^6) indicates that each cluster has a sub-range of 2^26 IP addresses. According to some embodiments this sub-range is then incremented to the next power of two (i.e., incremented to 2^27 in this example) and the number of clusters is correspondingly decreased by a power of two (i.e. decreased to 2^5 in this example).
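The arithmetic of this estimate may be sketched as shown below. The sketch is illustrative only; the function name `estimate_clusters` is hypothetical, and it assumes that the size of the range is itself a power of two. Where the computed sub-range is not already a power of two, the sketch rounds it up to the next one, which is an assumption about how the incrementing would be applied in that case.

```python
def estimate_clusters(num_addresses, page_size_limit, range_size):
    """Estimate (number_of_clusters, sub_range_size) for a roughly linear
    distribution of addresses over a range whose size is a power of two."""
    clusters = max(1, num_addresses // page_size_limit)
    sub_range = range_size // clusters
    # Move to the next power of two strictly above sub_range
    # (for example 2**26 becomes 2**27), capped at the whole range.
    widened = min(1 << sub_range.bit_length(), range_size)
    return range_size // widened, widened


if __name__ == "__main__":
    # Values from the example in the text: 2**16 addresses, pages of 2**10.
    print(estimate_clusters(2 ** 16, 2 ** 10, 2 ** 32))
```

The estimate is only a starting point. The iterative doubling described earlier may still be needed if any resulting cluster exceeds the page size limit.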
However an initial number of clusters is determined, the iterative technique discussed above may be applied to ensure that no cluster has a cluster size larger than the page size limit. A flow diagram of the process is illustrated in
After the IP addresses have been assigned to the appropriate clusters, the clusters may be ordered. The clusters are ordered by cluster size. The ordering may be in order of increasing or decreasing cluster size. Additionally, or alternatively, a cluster list may be created. The cluster list contains all clusters and their associated data, including the IP addresses assigned to each cluster. The clusters may be ordered by cluster size in the cluster list.
The IP addresses in each cluster are assigned to one of a plurality of pages. In one embodiment, the pages are an array of contiguous memory bytes. Each page has a page size limit defining the maximum number of IP addresses that can be assigned to that page and a page size defined by the number of IP addresses assigned to that page. In other embodiments, pages can be implemented in various other ways, such as using binary trees.
Generally, the page size limit(s) may be any appropriate size. According to some embodiments, however, it is preferred that the page size limit be a size that will result in a page with a size small enough to be stored in cache memory of a processor with which the page will be used. Such preferred page sizes increase cache locality. Data stored in a processor's cache memory may be accessed, manipulated, etc. much faster than data stored remote from the processor. Thus, a page size limit small enough that a page may be entirely loaded into cache memory may increase the speed at which the pages can be searched. Different pages may have different page size limits. In some embodiments, however, the page size limit of each page is the same.
The IP addresses in the clusters may be assigned to pages according to various techniques. Initially, at least one page is created. Although the clusters can simply be assigned one cluster to one page, other techniques may be preferable. Thus, according to some embodiments, the clusters are assigned to pages according to either a first fit or a best fit algorithm.
Various implementations of first fit algorithms are appropriate and may be used as part of methods disclosed herein. One example implementation is illustrated by the flow diagram 400 in
Various implementations of best fit algorithms are appropriate and may be used as part of methods disclosed herein. One example implementation 500 is illustrated in
The assigning of clusters to pages, whether by best fit, first fit, or any other suitable algorithm, continues until each cluster has been assigned to a page.
According to one embodiment, assignment of IP addresses of the clusters to pages utilizes a cluster list and a page list. As described above, the cluster list contains all clusters and their associated data items. The cluster list is sorted in descending order by the number of entries in each cluster (i.e. the cluster size). A page list is created that will contain pages containing one or more clusters. The page list is sorted by the number of free entries (the page size limit minus the page size) available for each page. Particularly if a first fit algorithm is being used, the page list may be sorted by number of free entries (or empty slots) in ascending order. For each cluster, the cluster is retrieved and the page list is searched for a page that has enough free entries to hold all the IP addresses in that cluster. This search may be according to a first fit algorithm, a best fit algorithm or any other suitable algorithm. If a page is found with enough free entries, the cluster's IP addresses are assigned to the page and the page list is updated to reflect the free entries now remaining on the page. If the page list was ordered by number of free entries, the page list is reordered after each cluster is assigned. If, however, there are no pages with enough free entries, a new page is allocated and the IP addresses of that cluster are assigned to the new page. The page list is updated to indicate the new page and the number of free entries on the new page. The process repeats for subsequent clusters until all clusters have been assigned. According to some embodiments, all clusters that do not include any IP addresses may be assigned to the same page. This page may be page 0, a null page, or any other suitable page. This assignment to a null page may speed up searching the IP addresses. As soon as it is determined that an IP address being searched for is in a sub-range associated with a cluster assigned to the null page, the search can be stopped because the null page includes no IP addresses and therefore the searched for IP address is not part of the plurality of IP addresses that have been assigned to the pages.
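One possible way to keep the page list ordered by free entries is sketched below using the Python standard library `bisect` module. The sketch works on cluster sizes only; the name `assign_clusters`, the use of page number 0 as the null page, and the numbering of real pages from 1 are assumptions made for illustration.

```python
import bisect

NULL_PAGE = 0  # assumed identifier of the page shared by all empty clusters


def assign_clusters(cluster_sizes, page_size_limit):
    """Return a dict mapping cluster index to page number.

    Clusters are taken in descending order of size. The page list holds
    (free_entries, page_number) tuples kept in ascending order.
    """
    order = sorted(range(len(cluster_sizes)),
                   key=lambda i: cluster_sizes[i], reverse=True)
    page_list = []
    assignment = {}
    next_page = 1
    for idx in order:
        size = cluster_sizes[idx]
        if size == 0:
            assignment[idx] = NULL_PAGE
            continue
        if size > page_size_limit:
            raise ValueError("cluster larger than the page size limit")
        # First page in the ordered list with at least `size` free entries.
        pos = bisect.bisect_left(page_list, (size, 0))
        if pos < len(page_list):
            free, page = page_list.pop(pos)
        else:
            free, page = page_size_limit, next_page  # allocate a new page
            next_page += 1
        assignment[idx] = page
        bisect.insort(page_list, (free - size, page))  # reorder the page list
    return assignment
```

Because the list is in ascending order of free entries, the first page that fits is also the page with the least remaining room that fits, so a first fit scan over this ordering behaves like a best fit selection.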
According to some embodiments, a load factor is determined after all of the clusters have been assigned to pages. The load factor is a ratio of the sum of the page size limit of the plurality of pages to the number of IP addresses in the plurality of IP addresses. The load factor may be considered the ratio of resources used to amount of data stored. It represents the compactness or efficiency of the pages populated with the IP addresses. The lower the load factor the more compact the populated pages are. The more pages used for a given number of IP addresses, the higher the load factor. Similarly, more efficiently populated pages, i.e. pages having page sizes close to or equal to the page size limit when populated with the IP addresses, are more likely to result in fewer pages being needed and the load factor may be reduced. If the methods disclosed herein are utilized the load factor may be very low. The load factor, according to some embodiments, is compared to a load factor threshold. The load factor threshold can vary based on various factors, including desired speed of processing, amount of memory available in a system with which the pages will be used, amount of processing power available in a system with which the pages will be used, etc. According to at least one embodiment the load factor threshold is ten percent. According to at least one other embodiment, the load factor threshold is five percent. If the load factor exceeds the load factor threshold, the pages are not as compact as desired. According to some embodiments, when the load factor exceeds the load factor threshold, the number of clusters is doubled, thereby also decreasing the size of each of the sub-ranges by one-half. The IP addresses are reassigned to the new clusters and the IP addresses in the new clusters assigned to pages according to the methods disclosed above. 
Because the sub-range of each new cluster is one-half what it was previously, most new clusters will have a smaller cluster size. In many cases, this will permit the clusters to be fit more efficiently into the pages, reducing the unpopulated space in the pages, and decreasing the load factor. If the load factor remains above the load factor threshold, the number of clusters may again be doubled and this iterative process may continue until the load factor is reduced to, or below, the load factor threshold. Alternatively, or additionally, this process may be repeated for a set maximum number of iterations before being stopped. Such maximum number of iterations may be useful to prevent the process from being repeated too many times and creating more clusters than is desired.
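The ratio defined above can be computed directly, as in the following sketch. The function names are hypothetical. The disclosure does not spell out how a ratio of page capacity to stored addresses is compared with a percentage threshold, so the sketch assumes, purely for illustration, that the threshold applies to the amount by which the load factor exceeds one.

```python
def load_factor(page_size_limits, num_addresses):
    """Ratio of the summed page size limits to the number of IP addresses."""
    if num_addresses <= 0:
        raise ValueError("num_addresses must be positive")
    return sum(page_size_limits) / num_addresses


def exceeds_threshold(factor, threshold):
    """ASSUMPTION: the threshold (e.g. 0.10 for ten percent) is compared
    with the excess of the load factor over 1.0."""
    return (factor - 1.0) > threshold


if __name__ == "__main__":
    # Four pages with a page size limit of eight holding thirty addresses.
    factor = load_factor([8, 8, 8, 8], 30)
    print(round(factor, 4), exceeds_threshold(factor, 0.10))
```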
After the pages have been populated with IP addresses from the clusters, some of the pages may still have free entries remaining. Accordingly, if one of the pages has a page size less than its page size limit, the method includes duplicating on that page at least one of the IP addresses assigned to that page to increase the page size of that page. In some embodiments, the at least one IP address may be duplicated on the page until the page size is increased to about the page size limit. This duplication of IP addresses to increase the page size to about the page size limit may be referred to sometimes as padding the pages. The at least one IP address may be one IP address, possibly repeatedly duplicated, or it may be multiple IP addresses. The IP address or addresses to duplicate may be randomly selected or specifically selected. For example, only the first IP address in a page may be duplicated on the page, only the middle IP address may be duplicated, the IP addresses may be duplicated in order, a single randomly selected IP address may be duplicated, multiple IP addresses may be randomly selected and duplicated, etc. In at least one embodiment, the at least one IP address is a plurality of randomly selected IP addresses duplicated until the page size is about the page size limit.
The method also includes ordering, for each page, the IP addresses assigned to that page by numeric value. This ordering may make searching the page easier. Additionally, the padding of the pages, the IP addresses of which are then ordered by numeric value, may offer some benefits when the pages are searched. For example, if a page has a page size equal to the page size limit, the number of data elements (i.e. IP addresses) on the page is known. That number of IP addresses is the page size limit. Thus, a table lookup may be used during the searching. The midpoint of the page is a known spot in the page. If a page has 1024 IP addresses, the midpoint of a padded page is the 512th IP address regardless of what the values on the page are and regardless of how many distinct IP addresses are assigned to the page. Additionally, when all pages are so padded and the page size limit of each page is the same, the midpoint of every page is the same. Thus, the midpoint of every page, and accordingly every subsequent midpoint of the portion above or below the midpoint, is a known point, or element, in the page. Particularly using a binary search, sometimes referred to as a divide and conquer search, being able to identify the IP address that is the midpoint without calculation may increase the speed of the search.
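The padding and the fixed probe positions described above may be sketched as follows. This is one illustrative possibility, not a definitive implementation: it pads with randomly selected duplicates, and the search assumes that every padded page has the same power-of-two length so that the sequence of probe offsets is known in advance. The names `pad_page` and `search_padded_page` are hypothetical.

```python
import random


def pad_page(addresses, page_size_limit, rng=random):
    """Duplicate randomly selected addresses until the page holds exactly
    page_size_limit entries, then order the entries by numeric value."""
    if not addresses or len(addresses) > page_size_limit:
        raise ValueError("page must hold between 1 and page_size_limit addresses")
    padded = list(addresses)
    while len(padded) < page_size_limit:
        padded.append(rng.choice(addresses))
    padded.sort()
    return padded


def search_padded_page(page, target):
    """Binary search whose probe offsets depend only on the page length.

    Assumes len(page) is a power of two, as with fully padded pages.
    """
    if not page or len(page) & (len(page) - 1):
        raise ValueError("page length must be a power of two")
    pos = 0
    step = len(page) // 2  # first probe is the known midpoint
    while step:
        if page[pos + step] <= target:
            pos += step
        step //= 2
    return page[pos] == target
```

Duplicated entries do not affect correctness of the search, since the page remains ordered and a duplicate of a stored address is still a stored address.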
According to some embodiments a page record is generated. The page record includes all of the cluster sub-ranges and the page to which the IP addresses in that sub-range are assigned. This data may be arranged in a tuple of the first numeric value in the sub-range, the last numeric value in the sub-range and the number of the page with which the sub-range is associated. The page record, however, is not so limited and may be implemented using any appropriate method of indicating which sub-range or sub-ranges are associated with each page.
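A page record built from such tuples may be sketched as below. The class name `PageRecord`, the use of the standard library `bisect` module for the lookup, and the example page numbers are assumptions made for illustration only.

```python
import bisect


class PageRecord:
    """Holds (first_value, last_value, page_number) tuples, one per sub-range."""

    def __init__(self, entries):
        self.entries = sorted(entries)
        # Separate list of first values so bisect can locate the sub-range.
        self.firsts = [first for first, _last, _page in self.entries]

    def find_page(self, address):
        """Return the page number for the sub-range encompassing `address`,
        or None if no sub-range in the record includes it."""
        pos = bisect.bisect_right(self.firsts, address) - 1
        if pos < 0:
            return None
        _first, last, page = self.entries[pos]
        return page if address <= last else None


if __name__ == "__main__":
    # Hypothetical record: four sub-ranges of 64 values, page 0 as a null page.
    record = PageRecord([(0, 63, 1), (64, 127, 0), (128, 191, 2), (192, 255, 1)])
    print(record.find_page(130))
```

When all sub-ranges have the same size, the position in the record could instead be computed directly from the address, without a search.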
In some embodiments one or more bloom filters may also be created. Bloom filters act as a short-circuit. The bloom filter may be used to identify IP addresses that are not in the plurality of IP addresses assigned to the pages without actually searching the pages. A separate bloom filter may be created for each page or a bloom filter may be created for all of the pages. The bloom filter is a bit map having some number of bits, where each bit represents a certain number or combination of numbers. For example, the first bit of a bloom filter may represent the number 0 while the 256th bit represents the number 255. The bloom filter may be mapped to a portion of the IP addresses associated with a page, such as the bits representing the first decimal number, the last decimal number, etc., such that each bit in the bloom filter is associated with one number of the IP address. Alternatively, or additionally, each bit may be mapped to some combination of numbers of an IP address. The combinations of numbers may be the result of a function. Thus, for example, a bloom filter may represent the first two numbers of an IP address XORed with the last two numbers, or the first number ANDed with the last two numbers, etc. Each bit may be set to a 1 or a 0. If a bit is set to 1, an IP address associated with the page includes that bit's number. Conversely, a bit in the bitmap that is set to 0 indicates that no IP address on the page includes that bit's number. By way of example, if the IP address 208.77.188.166 is the only number included on a page, all bits of a bloom filter for the first number of the IP addresses on the page would be zeros except the 209th bit (representing the number 208), which would be set to 1. In use, bloom filters may give false positives (i.e. indicate that a number may be on a page when it is in fact not), but not false negatives (i.e. indicating that a number is not on a page when in fact it is).
Continuing the example above, if an IP address to be searched for is 208.70.XXX.XXX, the bloom filter indicates that the IP address may be one of the IP addresses associated with the page (even though it is not in this example). Thus the page would still need to be searched to determine that the address 208.70.XXX.XXX is not an IP address assigned to the page. Conversely, if the IP address to be searched for is 192.XXX.XXX.XXX, by simply looking at the 193rd bit of the bloom filter the bloom filter indicates, correctly, that the IP address is not one of the IP addresses associated with the page. Because the 193rd bit is a zero, there are no IP addresses that begin with 192. Accordingly, the page need not be searched. Additionally, a plurality of bloom filters may be used. Each of the plurality of bloom filters is based on a different number, group of numbers, function using the numbers, etc. of the IP addresses. The plurality of bloom filters may be checked sequentially until all bloom filters indicate that the IP address searched is not within any of the pages, until one of the bloom filters indicates the IP address may be in the pages, etc. As can be seen, the use of bloom filters may decrease the amount of time required to determine whether an IP address is present within a plurality of IP addresses processed according to the disclosed methods. This improved performance is particularly notable when a large number of the IP addresses to be searched for are not part of the plurality of IP addresses on the pages.
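The bit map keyed on the first number of an IPv4 address, as in the example above, may be sketched as follows. The sketch holds the 256 bits in a single Python integer, which is an implementation choice made for brevity. The function names are hypothetical, and the complete query addresses 208.70.0.1 and 192.0.2.1 are chosen only for illustration.

```python
import ipaddress


def first_number(text):
    """Return the first decimal number (top eight bits) of an IPv4 address."""
    return int(ipaddress.IPv4Address(text)) >> 24


def build_filter(addresses):
    """Set bit n of the bit map for every first number n present on the page."""
    bits = 0
    for text in addresses:
        bits |= 1 << first_number(text)
    return bits


def may_contain(bits, text):
    """False means the address is definitely absent from the page.
    True means the page still has to be searched."""
    return bool((bits >> first_number(text)) & 1)


if __name__ == "__main__":
    page_filter = build_filter(["208.77.188.166"])
    print(may_contain(page_filter, "208.70.0.1"))  # possible match, page must be searched
    print(may_contain(page_filter, "192.0.2.1"))   # definite miss, no page search needed
```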
After one or more of the methods disclosed herein has been performed, the pages containing the IP addresses are ready to be searched. Any combination of searching the pages directly, searching the page record to determine on which page to search and/or using the bloom filter to determine if searching the pages and/or the page record is even necessary may be used. According to some embodiments, the bloom filter is searched first to determine if an IP address is not an IP address associated with one of the pages. If the bloom filter indicates the IP address is not an IP address associated with one of the pages, further searching may not be necessary. If the bloom filter indicates the IP address may be an IP address associated with one of the pages, the page record may be searched to determine with which page the sub-range encompassing the IP address is associated. After such page is determined, the page itself is searched to determine whether the IP address is one of the IP addresses assigned to such page.
The searching according to embodiments discussed herein may be any appropriate method of searching for a data item. For example the search may be a binary search (sometimes referred to as a binary divide and conquer search), a linear search, an interpolation search, a search as described in co-owned and co-pending U.S. patent application Ser. No. 11/591,802, etc.
Additionally, or alternatively, the pages created by the methods described herein and/or the page record may be combined with appropriate instructions to create an engine for determining whether an input IP address is contained in the plurality of IP addresses processed according to the methods described above. The engine includes the pages with their assigned IP addresses, instructions operable to cause a processor to search the pages, and a finite state machine (FSM). The engine may sometimes be referred to as a pre-compiled exception list engine. The FSM includes the page record and instructions operable to cause a processor to search the page record and output an indication of which, if any, page will contain the input IP address if it is present. When an IP address is input to the engine, the FSM searches the page record to identify on which page the sub-range encompassing the input IP address is located and outputs an identifier of that page. The appropriate page is then searched to determine whether the IP address is located on the page. The engine then outputs a yes or no answer, for example a 1 or a 0, indicating that the input IP address is or isn't, respectively, one of the plurality of IP addresses. Some or all of the items discussed above, such as, for example bloom filters, may also, or alternatively, be included in the engine.
A simplified example of the use of the methods described herein will now be described with reference to
For sake of example, the range is initially divided into only two clusters. Each cluster has a sub-range of one half the range. The IP addresses are assigned to the clusters as discussed above. As is apparent, however, there is no way thirty IP addresses can be assigned to two clusters without at least one cluster including more than eight IP addresses. Therefore, the number of clusters is doubled and the size of the sub-ranges decreased by one-half. At this point there are four clusters. The IP addresses are assigned to them as discussed above. The result is a first cluster with a sub-range of (0-63) having twelve IP addresses (2, 3, 5, 11, 13, 33, 34, 41, 45, 50, 51, 60) assigned to it. A second cluster has a sub-range of (64-127) having six IP addresses (65, 66, 82, 83, 84, 85) assigned to it. A third cluster has a sub-range of (128-191) having five IP addresses (150, 151, 170, 175, 180) assigned to it. Finally, a fourth cluster has a sub-range of (192-255) having seven IP addresses (200, 205, 210, 225, 230, 235, 240) assigned to it. The first cluster still contains more than the page size limit of eight IP addresses and will not fit on a page. Therefore, the number of clusters is again doubled and the size of the sub-ranges is again decreased by one-half. The IP addresses are reassigned to the new clusters. The result is eight clusters 702-716. Each cluster 702-716 has a sub-range size of thirty-two possible IP addresses. The sub-range of each cluster 702-716 is indicated by SR in
Because each cluster 702-716 now has a cluster size less than or equal to the page size limit, the IP addresses in each cluster may be assigned to pages 718-724. The clusters are sorted in descending order according to cluster size. Accordingly, the first cluster is cluster 704, having a cluster size of seven, and the last cluster is cluster 708, having a cluster size of zero. In this example, the clusters are assigned according to a first fit algorithm. Beginning with the largest cluster, cluster 704, a page with enough room to hold all of the IP addresses of cluster 704 is searched for. There are no pages to begin with and, therefore, no page with room for at least seven IP addresses. Accordingly, a page 718 is created and the IP addresses from cluster 704 are assigned to it. The next largest cluster 706 has a cluster size of six. As the page size limit is eight and there are already seven IP addresses assigned to the first, and at this point only, page 718, there is not enough room on any existing page for the IP addresses of cluster 706. Thus, a second page 720 is created and the IP addresses from cluster 706 are assigned to the page 720. Similarly, neither the first page 718 nor the second page 720 has sufficient free entries to hold the five IP addresses of the next largest cluster 702. A third page 722 is, therefore, created and the IP addresses of the cluster 702 are assigned thereto. Again, none of the pages 718, 720 or 722 has enough free entries to hold the four IP addresses of cluster 716. A fourth page 724 is created, to which the IP addresses of the cluster 716 are assigned. The next to be assigned is cluster 712, with a cluster size of three. Neither the first page nor the second page has enough room for the three IP addresses of cluster 712. The third page 722, however, contains only five IP addresses of the eight that it can hold. Therefore, the IP addresses of cluster 712 can be, and are, assigned to the third page 722. 
When the next cluster 714 is assigned, the fourth page 724 is the first page having enough free entries to hold the IP addresses of cluster 714. At this point the first page 718 has one free entry, the second page has two free entries, the third page has zero free entries and the fourth page has one free entry. Cluster 710 is the next largest cluster with two IP addresses. The first one of the pages 718-724 that can hold the IP addresses in cluster 710 is the second page 720, to which the IP addresses are assigned. The last cluster 708 has no IP addresses and a cluster size of zero. The cluster 708 is, therefore, assigned to a null page (which may be page zero), indicating that it includes no IP addresses. All IP addresses in the clusters 702-716 have now been assigned to pages 718-724.
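The cluster doubling and the first fit assignment walked through above may be sketched as follows. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the function names `cluster_addresses` and `first_fit` are hypothetical, the addresses are treated as plain integers in the range 0-255 as in the example, the range size is assumed to be a power of two, and page number zero is assumed to denote the null page.

```python
def cluster_addresses(addresses, range_size, page_size_limit):
    """Double the number of equal-width clusters until every cluster has a
    cluster size no greater than the page size limit."""
    n_clusters = 2
    while True:
        width = range_size // n_clusters
        clusters = [[] for _ in range(n_clusters)]
        for addr in addresses:
            clusters[addr // width].append(addr)
        if max(len(c) for c in clusters) <= page_size_limit:
            return clusters, width
        if width == 1:
            # Assumption: distinct addresses and a limit of at least one,
            # so this is only reached on invalid input.
            raise ValueError("clusters cannot be made to fit on a page")
        n_clusters *= 2


def first_fit(clusters, page_size_limit):
    """Assign clusters, largest first, to the first page with enough room.

    Returns the pages and, for each cluster, its page number (0 = null page).
    """
    pages = []
    page_of_cluster = [0] * len(clusters)
    # sorted() is stable, so clusters of equal size keep their original order.
    order = sorted(range(len(clusters)), key=lambda i: -len(clusters[i]))
    for i in order:
        if not clusters[i]:
            continue  # empty cluster stays on the null page
        for p, page in enumerate(pages):
            if len(page) + len(clusters[i]) <= page_size_limit:
                break
        else:
            pages.append([])  # no existing page has room: create one
            p = len(pages) - 1
        pages[p].extend(clusters[i])
        page_of_cluster[i] = p + 1
    return pages, page_of_cluster


ADDRESSES = [2, 3, 5, 11, 13, 33, 34, 41, 45, 50, 51, 60, 65, 66, 82, 83,
             84, 85, 150, 151, 170, 175, 180, 200, 205, 210, 225, 230, 235, 240]
clusters, width = cluster_addresses(ADDRESSES, 256, 8)
pages, page_of_cluster = first_fit(clusters, 8)
```

Applied to the thirty example addresses with a page size limit of eight, the sketch stops at eight clusters of width thirty-two and produces four pages holding seven, eight, eight and seven addresses, matching the assignment described above.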
A load factor may be calculated at this point. As discussed above, the load factor is the ratio of the sum of the page size limits to the number of IP addresses in the plurality of IP addresses (or alternatively, but equivalently, to the sum of the page sizes). In this example, the page size limit is eight and there are four pages. The sum of the page size limits therefore is thirty-two. The sum of the page sizes and the number of IP addresses in the plurality of IP addresses 700 is thirty. Thus, the load factor is 32/30 or approximately 1.067. This load factor is acceptable (i.e., it is below the load factor threshold of 1.5). If the load factor were above the load factor threshold, the number of clusters would be doubled again as discussed above and the entire process repeated until the load factor is below the load factor threshold.
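The load factor arithmetic above may be expressed as a short sketch. The function names `load_factor` and `is_acceptable` are hypothetical, and a single page size limit shared by all pages is assumed, as in the example.

```python
def load_factor(page_sizes, page_size_limit):
    """Ratio of the sum of the page size limits to the sum of the page sizes."""
    return (len(page_sizes) * page_size_limit) / sum(page_sizes)


def is_acceptable(page_sizes, page_size_limit, threshold=1.5):
    """True when the load factor is below the load factor threshold."""
    return load_factor(page_sizes, page_size_limit) < threshold


# Page sizes of the four pages in the example, before duplicates are added.
example_load_factor = load_factor([7, 8, 8, 7], 8)
```

A load factor of exactly 1.0 would mean every page is full; larger values indicate page entries that will have to be filled with duplicates.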
The IP addresses assigned to each page 718-724 are next sorted in numerical order. The first page 718 and the fourth page 724 have page sizes (seven in both cases) less than the page size limit of eight. For each page 718, 724 a random IP address is chosen from the IP addresses assigned to that page 718, 724 and duplicated on that page 718, 724. For this example, thirty-three was duplicated on the first page 718, and two hundred ten was duplicated on the fourth page 724. The result is four pages 726-732 each having a page size equal to the page size limit of eight.
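The sorting and duplication step may be sketched as follows. The function name `sort_and_pad` is hypothetical, and the optional `rng` parameter is an assumption added only so that the random choice can be made repeatable; the addresses duplicated in a given run will generally differ from the thirty-three and two hundred ten chosen in the example.

```python
import random


def sort_and_pad(page, page_size_limit, rng=random):
    """Return the page in numeric order, padded to the page size limit with
    randomly chosen duplicates of addresses already assigned to it."""
    if not page:
        return []  # nothing to duplicate on an empty (null) page
    padded = list(page)
    while len(padded) < page_size_limit:
        padded.append(rng.choice(page))
    padded.sort()
    return padded


# First page of the example: seven addresses, page size limit of eight.
first_page = sort_and_pad([60, 33, 34, 41, 45, 50, 51], 8, random.Random(0))
```

Because a duplicate is always an address already on the page, padding changes neither the set of addresses a page represents nor the result of searching it.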
Either after the IP addresses have been assigned to the pages 726-732 or while the IP addresses are being assigned to pages 718-724, a page record may be created. The page record indicates each of the sub-ranges of the clusters 702-716 (typically by starting value and ending value) and the page to which the IP addresses encompassed by that sub-range were assigned. The page record can consist of a tuple for each sub-range formatted as (start value, end value, page number). Thus, the page record for this example would be: (0,31,3); (32,63,1); (64,95,2); (96,127,0); (128,159,2); (160,191,3); (192,223,4); (224,255,4). The plurality of IP addresses 700 may now be searched using the pages 726-732 and the page record as discussed above and/or an engine generated for searching the plurality of IP addresses.
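Construction of the page record in the tuple format described above may be sketched as follows. The function name `build_page_record` and the list `PAGE_OF_CLUSTER` are hypothetical; the sketch assumes equal-width, contiguous sub-ranges starting at zero and takes as input the page number assigned to each cluster in order, with zero denoting the null page.

```python
def build_page_record(page_of_cluster, width):
    """Return one (start value, end value, page number) tuple per sub-range."""
    return [(i * width, (i + 1) * width - 1, page)
            for i, page in enumerate(page_of_cluster)]


# Page numbers of clusters 702-716 from the example; width of 32 addresses.
PAGE_OF_CLUSTER = [3, 1, 2, 0, 2, 3, 4, 4]
page_record = build_page_record(PAGE_OF_CLUSTER, 32)
```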
Although primarily discussed in terms of internet protocol (IP) addresses, the methods disclosed herein are not so limited and may be applied to other types of data. For example, the methods may be applied to data such as street addresses, social security numbers, driver's license numbers, bank check numbers, etc.
Because of the speed at which very large sets of IP addresses may be searched according to the methods presented herein, one use for such methods is in connection with network devices (also sometimes referred to as appliances). These devices, including appliances such as firewalls, routers, servers, etc., may receive packets of data from a first network and process the packets of data. The processing may include processing internal to the appliance (such as, for example, allowing the packet to cause an application to run on the appliance, cause the appliance to transmit data, files, etc. back to the computer originating the packet, etc.) and/or may include passing the packets to a second network. These devices may include rules, policies, etc., for whether to allow certain packets to be processed or to enter the second network. For example, a firewall may be located between a network and the internet. The firewall may wish to block packets originating from and/or addressed to certain IP addresses. These IP addresses may be the IP addresses of known spammers, hackers, foreign governments, etc. Alternatively, or additionally, these appliances may wish to block all packets except those originating from certain known IP addresses and/or process packets from certain IP addresses differently (such as, for example, prioritizing transmission of the packets, limiting bandwidth, etc.).
Thus, according to one aspect of the present disclosure, and as illustrated in
The processing of the packet according to whether the IP address is an excepted IP address may include processing internal to the appliance 800 and/or may include passing the packets to a second network 804. If the processor 808 determines to allow the packet to proceed, processing internal to the appliance 800 may include allowing the packet to cause an application to run on the appliance 800, allowing the packet to cause the appliance 800 to transmit data, files, etc. back to the computer originating the packet, etc. Thus, for example, the appliance may be a Web server, network server, etc. that may determine whether to allow a remote user to access a webpage, run a program stored on the server, view/download files stored on the server, etc. The appliance 800 may also include at least one output 812 coupled to a second network 804 for transmitting the packet from the first network 802 to the second network 804 if the processor 808 determines to allow the packet from the first network 802 to enter the second network 804.
According to another aspect of the present application, a network appliance 800 for connection to a first network 802 includes at least one input 806 coupled to the first network 802 for receiving a packet from the first network 802. The packet includes an internet protocol (IP) address. The appliance 800 includes at least one processor 808 for determining whether to allow the packet from the first network 802 to proceed and at least one memory device 810. The appliance 800 also includes a first engine stored in the memory device 810. The first engine includes a plurality of pages storing a plurality of excepted IP addresses. The excepted IP addresses each have a numeric value within a range of numeric values and the range is divided into a plurality of contiguous sub-ranges. Each page includes one or more of the excepted IP addresses having numeric values within one or more of the sub-ranges associated with that page. Each page has a page size defined by the number of excepted IP addresses assigned to that page. The excepted IP addresses assigned to each page are ordered by numeric value. The first engine also includes a first finite state machine (FSM). The first FSM includes instructions executable by the processor 808 to determine the page associated with the sub-range encompassing the IP address. The engine includes instructions operable to cause the processor to search the page associated with the sub-range encompassing the IP address to determine if the IP address is an excepted IP address and output an indication of whether the IP address is an excepted IP address. The processor 808 is also configured via instructions stored in the memory device 810 to process the packet from the first network 802 according to the indication from the first engine.
The processing of the packet according to the indication from the first engine may include processing internal to the appliance 800 and/or may include passing the packets to a second network 804. If the processor 808 determines to allow the packet to proceed, processing internal to the appliance 800 may include allowing the packet to cause an application to run on the appliance 800, allowing the packet to cause the appliance 800 to transmit data, files, etc. back to the computer originating the packet, etc. Thus, for example, the appliance may be a Web server, network server, etc. that may determine whether to allow a remote user to access a webpage, run a program stored on the server, view/download files stored on the server, etc. The appliance 800 may also include at least one output 812 coupled to the second network 804 for transmitting the packet from the first network 802 to the second network 804 if the processor 808 determines to allow the packet from the first network 802 to enter the second network 804 via processing the packet according to the indication from the first engine.
The processor 808 may include cache memory 814. As discussed above, cache memory resides on the processor 808. Accessing, manipulating, acting upon, etc. data occurs much quicker when the data is stored in cache memory 814 than when it is stored in separate memory, such as memory device 810. For this reason, it may be preferable to have the page size limit of each page be selected such that the entire page will fit in the cache memory 814.
According to various embodiments, the excepted IP addresses may be IP addresses to be allowed entry to the second network, denied entry to the second network, and/or specially processed. Thus, in some embodiments, if a packet's IP address is an excepted address, the packet is prevented from entering the second network. In other embodiments, if a packet's IP address is an excepted address, the packet is allowed to enter the second network. In still other embodiments, if a packet's IP address is an excepted address, the packet is specially processed, such as being prioritized, rerouted to a different destination (whether within or without the second network), etc. The appliance 800 may operate with two or more sets of pages and page records and/or more than one engine as described above. For example, one engine may include pages having excepted IP addresses that are to be blocked, while a second engine includes excepted IP addresses to be allowed. There may also be more than one engine with one type of excepted IP address (e.g., blocked, allowed, special, etc.). Thus, a first engine may include excepted IP addresses of known hackers, while another engine includes excepted IP addresses of known spammers. Although only two engines are described above, there may be more than two. The engines, when there is more than one, may be prioritized in any appropriate manner. As one example, if an IP address is found by one engine (A), the appliance may restrict bandwidth available to packets from that IP address. IP addresses found by another engine (B) may be given high priority and increased bandwidth. If the IP address of an incoming packet is found by both engine (A) and engine (B), the appliance may need to apply priority rules to determine how to handle the packet. For example, engine (B) may be given highest priority and the packet may be given priority, or engine (A) may be given the higher priority and packets from the IP address will be permitted limited bandwidth. 
Additionally, or alternatively, more complex rules of priority may apply (such as A prioritized over B except when D is also true, etc.).
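The simple priority ordering between engine (A) and engine (B) described above may be sketched as follows. Everything named here is hypothetical: each engine is stood in for by a plain set of integer addresses rather than by pages and a page record, the action labels are illustrative strings, and the priority rule assumed is simply that the first engine in the list that finds the address determines the action.

```python
# Hypothetical stand-ins for two engines; real engines would search pages.
ENGINE_A = frozenset({45, 150})    # addresses whose bandwidth is restricted
ENGINE_B = frozenset({150, 210})   # addresses given high priority


def handle_packet(addr, prioritized_engines, default_action="no_exception"):
    """Return the action of the highest-priority engine that finds addr.

    prioritized_engines is a list of (address set, action) pairs ordered
    from highest to lowest priority.
    """
    for members, action in prioritized_engines:
        if addr in members:
            return action
    return default_action


# Two alternative priority orders for an address found by both engines.
B_FIRST = [(ENGINE_B, "prioritize"), (ENGINE_A, "limit_bandwidth")]
A_FIRST = [(ENGINE_A, "limit_bandwidth"), (ENGINE_B, "prioritize")]
```

More complex rules of priority, such as those conditioned on the result of a further engine, would require collecting the results of all engines before deciding and are not covered by this sketch.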
In various embodiments some or all of the elements of the method discussed above may be incorporated in or used with the appliance 800. Thus, for example, the data may include a page record. The processor 808 may be configured by the instructions to identify the target page by searching the page record. The processor 808 may be configured by the instructions to determine that there are no excepted addresses in the sub-range encompassing the IP address by searching the page record. The data may also include a null page associated with any sub-range encompassing no excepted addresses.
In one example embodiment, the appliance 800 is set up by a user (sometimes referred to as an administrator). The user may perform the method discussed above on a computer that is not the appliance 800. The page size limit should be selected such that an entire page will fit within the cache memory 814 of the processor 808 of the appliance 800. The resulting pages and page records, or the resulting engines, are then uploaded to the appliance 800 and stored in the at least one memory device 810. Alternatively, or additionally, the method may be performed within the appliance 800 itself if the appliance 800 has enough memory, processing power, etc. In operation, packets from the first network have their IP addresses fed to the engines before being allowed to enter the second network. As discussed above, the engines will perform the searching and output an indication of whether the IP address of the packet is in the pages of the engine (i.e., whether it is an excepted IP address). Depending on the result and the configuration of the appliance 800, the appliance 800 may allow, block, specially treat, etc. the packet, it may apply some other action to the packet (such as scanning it, copying it, etc.), or it may input the IP address to another engine. Alternatively, or additionally, the IP address of the packet may be simultaneously input to two or more engines (i.e., parallel processed).
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the invention, and all such modifications are intended to be included within the scope of the invention.
This application claims the benefit of U.S. Provisional Application No. 61/171,176 filed on Apr. 21, 2009. The entire disclosure of the above application is incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
20100268799 A1 | Oct 2010 | US |
Number | Date | Country | |
---|---|---|---|
61171176 | Apr 2009 | US |