Reference will now be made in detail to a particular embodiment of the invention an example of which is illustrated in the accompanying drawings. While the invention will be described in conjunction with the particular embodiment, it will be understood that it is not intended to limit the invention to the described embodiment. To the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.
Previous string matching techniques have suffered from one or more of the following disadvantages: false positives, high memory bandwidth requirements, large memory requirements, and unpredictable lookup latency. The described string-matching engine overcomes these drawbacks with low memory and logic requirements by combining up-front filtering to detect a mismatch (based, in part, upon the Bloom filter approach) with a low-memory, low-memory-bandwidth, high-speed mechanism for avoiding false positives. In this way, the described string matching engine provides efficient string matching using a low-memory, collision-free, hash-based lookup scheme with low average-case bandwidth and power requirements, overcoming prior art limitations by providing the ability to match against a large dictionary of long, arbitrary-length strings at line speed.
Generally speaking, the described embodiments utilize a primary memory (configured as a hash lookup table) arranged to store a transitively unique address for each entry of the dictionary S (a set of character strings of arbitrary length) stored in a secondary memory. For every dictionary entry, addition into the primary memory is performed by first checking whether any row hashed to by the new entry is free, that is, not already in use as a unique location by another entry. If no such row is found, an existing entry that collides with the new entry is transferred to an alternate location, freeing a unique address for the new entry. What is referred to as a unique bit (b0) at each primary memory address marks the use of that particular address as the unique location corresponding to a particular dictionary entry. A second bit, or collection of bits (b1 to bx), may be used to indicate whether that primary memory address was also hashed to by any other element of S, as well as to provide a counter that facilitates dictionary entry deletion. A third set of bits (bx+1 to bx+w) at a memory address location stores the address pointing to the secondary memory where the input key (or key fingerprint) and any data associated with the key are stored.
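The row layout just described can be sketched as follows. This is an illustrative model only; the class and method names are hypothetical, and the field widths for the counter (b1 to bx) and secondary-memory pointer (bx+1 to bx+w) are assumptions, not the claimed encoding.

```python
# Illustrative model of one primary-memory (hash lookup table) row.
class PrimaryRow:
    def __init__(self):
        self.unique = 0        # b0: row is the unique location of some entry
        self.share_count = 0   # b1..bx: counter of other entries hashing here
        self.secondary_addr = None  # bx+1..bx+w: pointer into secondary memory

    def mark_unique(self, addr):
        # Claim this row as the unique location for a dictionary entry and
        # record where its full string (and associated data) lives.
        self.unique = 1
        self.secondary_addr = addr

    def touch(self):
        # Another entry also hashed to this row; bump the counter so a
        # later deletion can decide whether the row may be cleared.
        self.share_count += 1

row = PrimaryRow()
row.mark_unique(0x2A)
row.touch()
```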
In order to facilitate the understanding of the description of selected embodiments, a brief discussion of hashing is presented.
The original hash-based architecture consists of a hash function H(i), a table T (nominally an array), and a bucket B (nominally a linked list) stored at each row of the table. Each row of the table has an address associated with it by which the row can be accessed. In the typical lookup scheme, the input (nominally called the key) is mapped to a table address using the hash function. Hash functions are designed to provide a random distribution over the range of table addresses; the more uniform the distribution (for the set S in particular), the better the hash function. The size of the table T, |T|=m, is generally greater than the size of the set S, |S|=n. With a typical hash function, multiple inputs, both in and external to S, may map to the same table address; this is nominally called a collision. The bucket serves as the means to resolve collisions. Nominally, the bucket at an address contains a list of all the elements of S (or fingerprints derived from them) that map to that address. When a key maps to an address, the corresponding bucket is traversed to determine whether the key (or its fingerprint) is stored in the bucket. If it is, the key is deemed to belong to S; otherwise, it is not in S. Note that the bucket can also store any range value alongside the key/fingerprint to implement an arbitrary function.
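A minimal sketch of this chained scheme follows, using Python's built-in hash as a stand-in for H; the table size and key names are illustrative assumptions.

```python
m = 8  # table size |T|; generally chosen greater than |S|

def H(key):
    # Stand-in hash function mapping a key to a table address.
    return hash(key) % m

# Table T: one bucket (here a list) per row, used to resolve collisions.
T = [[] for _ in range(m)]

def insert(key):
    bucket = T[H(key)]
    if key not in bucket:
        bucket.append(key)

def member(key):
    # Traverse the bucket at the hashed address; the exact comparison
    # decides membership in S.
    return key in T[H(key)]

insert("virus-sig-1")
insert("virus-sig-2")
```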
One extension was the so-called perfect hash function. A hash function is perfect if it maps each member of S to a unique address. With a perfect hash function, the size of each bucket is exactly one, and membership checking as well as function mapping can be done with predictable latency. An alternative to perfect hashing is the so-called “Cuckoo hashing” scheme. In this scheme, two or more hash functions are used (two, in the basic version), and the key is hashed into the table once per hash function. If neither address is unique to this key, the key is stored at one of the addresses and the existing key at that address is moved to its alternate address. This process continues until a key is moved to a vacant address. It has been demonstrated that if the size of the table is two or more times the size of S, the process completes in the typical case. The latency of lookups is predictable since each key requires only two hashes, and each bucket is of size one.
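The two-hash-function Cuckoo scheme can be sketched as follows. The hash functions, table size, and integer keys are toy assumptions chosen so the eviction chain is easy to follow; a real engine would hash the byte strings of dictionary entries and rehash the table if the kick limit were reached.

```python
M = 8  # table size; should be at least 2x the number of keys stored

# Two toy hash functions over small integer keys (illustrative only).
def h1(key):
    return key % M

def h2(key):
    return (key // M + 1) % M

table = [None] * M

def cuckoo_insert(key, max_kicks=32):
    for _ in range(max_kicks):
        for h in (h1, h2):
            if table[h(key)] is None:
                table[h(key)] = key
                return True
        # Both addresses occupied: evict the occupant of the first address;
        # the evicted key then tries its own two addresses on the next round.
        slot = h1(key)
        key, table[slot] = table[slot], key
    return False  # a full implementation would rehash at this point

def cuckoo_lookup(key):
    # Predictable latency: at most two probes per lookup.
    return table[h1(key)] == key or table[h2(key)] == key

for k in (3, 11, 19):  # all three collide at address 3 under h1
    cuckoo_insert(k)
```

Inserting 19 finds both of its addresses occupied, so it evicts the key 3, which then settles at its alternate address; every stored key remains reachable in at most two probes.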
It has also been shown that insertions complete in expected constant time if |T|>=2×|S|. The performance can be improved by using more alternative locations (more hash functions).
A Bloom Filter allows membership queries with no false negatives, a very low memory requirement, and some other useful properties, but with the tradeoff that it allows false positives with a low, but finite, probability. The basic scheme uses multiple (k) hash functions. The table consists of one bit per address and is initialized to all 0s. During insertion, the up to k addresses hashed to from a key are all set to 1. During lookup, a key is said to be absent from S if any of its k addresses holds a 0. If all k addresses hold a 1, there is a very high probability that the key is in S. The probability of a false positive is approximately (0.62)^(m/n), where m is the number of rows in the lookup table and n is the number of dictionary entries, as before. For m/n=30 and k=5, the false positive probability is about 0.000036. Although this may seem small, this non-zero false positive probability is statistically significant considering the large amount of data being looked up. Note that since only one bit is used per address, the memory requirement is quite low relative to the size of the dictionary even with m/n=30. A useful property of the Bloom Filter is that the table does not contain any dictionary contents, which makes it suitable for applications requiring a high degree of security. Another advantage is that the low memory requirement of the lookup table means it can be broadcast over a network, consuming much less bandwidth than broadcasting the entire dictionary would.
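The basic scheme can be sketched as follows; the table size, number of hash functions, and integer keys are toy assumptions (far below the m/n=30, k=5 figures above), and the hash functions are illustrative stand-ins for independent hashes of the key's bytes.

```python
m, k = 32, 3  # table rows and number of hash functions (toy sizes)

def addresses(key):
    # k toy hash functions over small integer keys (illustrative only).
    return [(key * p) % m for p in (1, 5, 7)]

bits = [0] * m  # the lookup table: one bit per address, initially all 0s

def bloom_insert(key):
    for a in addresses(key):
        bits[a] = 1

def bloom_query(key):
    # A 0 at any address proves absence (no false negatives); all 1s
    # means "present with high probability" (false positives possible).
    return all(bits[a] for a in addresses(key))

bloom_insert(6)
bloom_insert(9)
```

Note that the table holds only set bits, never dictionary contents, which is the security property mentioned above.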
Unfortunately, a disadvantage of the Bloom Filter is that deletions from S are not possible. The Counting Bloom Filter was proposed to overcome this: each address in the table holds a counter instead of just one bit. The counter is incremented during an insertion at that address and decremented during a deletion from that address. The memory requirement is increased, however, since each counter must have enough bits to prevent overflow.
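Extending the sketch above, deletion simply reverses insertion; the toy hash functions and sizes remain illustrative assumptions.

```python
m = 32
counters = [0] * m  # a counter per address instead of a single bit

def addresses(key):
    # Same toy hash functions a plain Bloom filter sketch would use.
    return [(key * p) % m for p in (1, 5, 7)]

def cbf_insert(key):
    for a in addresses(key):
        counters[a] += 1

def cbf_delete(key):
    # Deletion reverses insertion; counters must be wide enough that
    # repeated insertions at one address never overflow them.
    for a in addresses(key):
        counters[a] -= 1

def cbf_query(key):
    return all(counters[a] > 0 for a in addresses(key))

cbf_insert(6)
cbf_insert(9)
cbf_delete(9)
```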
Another approach referred to as a Bloomier Filter, and its variations, can be thought of as a combination of Cuckoo hashing and the Bloom filter. There are k hash functions as in the Bloom Filter, but membership is based on a function computed from the values stored at the k hash addresses rather than on the presence of a 0 at one of the addresses. The computed function is also used to point to a results table that stores the desired mapping from S to a range. Like the Cuckoo hashing scheme, the Bloomier Filter also relies on the availability of a transitively unique location per element of S for collision-free hashing. The ability to map S to an arbitrary range is not a new contribution, since all hash functions have this ability. Similarly, the use of the transitively unique location is also not new, given its prior proposal in Cuckoo hashing. Unfortunately, the Bloomier Filter and its associated variations either require a memory substantially bigger than the dictionary size for false positive resolution and function mapping, or require k multi-bit lookups in the primary hash table (the bit width of each lookup being the log of the number of entries in the dictionary), representing a substantial memory bandwidth and power requirement.
The described embodiments will now be described in terms of a string matching engine, system, and method useful in a number of applications where memory and computing resources are at a premium or high performance is desired. Such applications are typically found in portable devices such as the personal communication device 200 shown in the accompanying drawings.
The described string matching engine can be deployed as, or included in, a co-processor having its own memory and computing resources, separate from a central processing unit (CPU), arranged to filter any incoming traffic for character strings that have been identified as potential malware (i.e., a computer virus). In this way, malware detection can be off-loaded from the CPU, freeing up computing and memory resources otherwise required for detection of malware that would have the potential to severely disrupt the operation of the personal communication device 200. In some cases, the character strings stored in the string dictionary and used by the string matching engine to detect such malware are supplied and periodically updated by a third party, either on a subscription basis or as part of a service contract between a user and a service provider.
Referring back to
The cell phone 200 also includes a user input device 214 that allows a user to interact with the cell phone 200. For example, the user input device 214 can take a variety of forms, such as a button, keypad, dial, etc. Still further, the cell phone 200 includes a display 216 (screen display) that can be controlled by the processor 204 to display information to the user. A data bus can facilitate data transfer between at least the ROM 212, RAM 210, the processor 204, and a CODEC 218 that produces analog output signals for an audio output device 220 (such as a speaker). The speaker 220 can be a speaker internal to the cell phone 200 or external to the cell phone 200. For example, headphones or earphones that connect to the cell phone 200 would be considered an external speaker. A wireless interface 222 operates to receive information from the processor 204 that opens a channel (either voice or data) for transmission and reception typically using RF carrier waves.
During operation, the wireless interface 222 receives an RF transmission carrying an incoming data stream 224 in the form of data packets 226. Copies of the data packets are made and, in some cases, undergo additional processing prior to being forwarded to the co-processor for examination by the string matching engine 208 for possible inclusion of character strings associated with known computer malware. In the described embodiment, the group of stored character strings (referred to as a string dictionary) used by the string matching engine 208 is provided by a third party and is periodically updated with new character strings in order to detect new computer malware. It should be noted that the string matching engine is capable of matching multiple tokens separated by a fixed or variable offset. Furthermore, the inputs to the string matching engine do not need to be derived solely from traffic. For example, inputs to the string matching engine can take the form of files already resident in the cell phone memory (RAM 210, ROM 212).
The string-matching engine 208 will provide a match flag 228 in those situations where the incoming data stream 224 includes a character string 230 that matches one of the entries in the string dictionary. The match flag 228 notifies the CPU 204 that the cell phone 200 has been exposed to potentially harmful computer malware and that appropriate prophylactic measures must be taken. These measures can include malware sequestration, inoculation, quarantine, and the like, as provided by a security protocol.
Referring back to
It should be noted that the comparisons to the values in the secondary memory (string dictionary) are interleaved at the granularity of a byte (or a similarly small unit). As a result, the expected number of bits to be fetched of the non-matching values in the secondary memory will be much smaller than the total number of stored bits. Also, for the typical lookup, the expected number of unique locations in the hash lookup table (for which b0=1) will be small. The total number of unique locations in the hash lookup table is equal to |S|. As a result, as m/n (the number of rows in the hash lookup table divided by |S|) approaches the number of hash functions, k, the number of unique locations encountered in the typical lookup will approach 1. In this way, a low-memory, collision-free, hash-based lookup scheme with low average-case bandwidth and power requirements is provided, where the worst-case bandwidth requirement depends only on the width of the dictionary words.
The invention provides a number of advantages over the prior art. In particular, the Bloom bits remove the need for further lookup once a mismatch has been established: the address stored in the hash lookup table needs to be fetched, and a further lookup performed in the pattern memory, only when the unique bit is set to one, thereby considerably reducing the need to perform lookups in the string dictionary. In a particularly useful embodiment, a dual-memory architecture is used such that the hash lookup table stores only the addresses for the further lookup that establishes an actual match. This allows the hash lookup table to have a larger number of rows (hash buckets), bringing the expected number of rows with the unique bit set to 1 closer to one, and thereby reducing the number of rows to be looked up in the string dictionary. As a result, the architecture represents a high-bandwidth lookup scheme in which the expected lookup for a matching input requires fetching one pattern from the memory, with the worst-case latency bounded by k (˜5) pattern fetches.
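The dual-memory flow described above can be sketched in simplified form as follows. This is not the claimed implementation: the table sizes, toy hash functions, and integer keys are assumptions, and the insertion path omits the relocation of colliding entries described earlier.

```python
M = 32  # primary-table rows (toy size)

def addrs(key):
    # Toy hash functions over small integer keys (illustrative only).
    return [(key * p) % M for p in (1, 5, 7)]

# Primary memory: per row, a Bloom-style presence bit, the unique bit b0,
# and a pointer into the secondary memory (the string dictionary).
primary = [{"bloom": 0, "unique": 0, "ptr": None} for _ in range(M)]
secondary = []  # the secondary memory holding the full dictionary entries

def add_entry(key):
    # Simplified insertion: set the Bloom bits, then claim the first
    # hashed row not already in use as a unique location (the full
    # scheme relocates a colliding entry when no free row exists).
    secondary.append(key)
    ptr = len(secondary) - 1
    for a in addrs(key):
        primary[a]["bloom"] = 1
    for a in addrs(key):
        if not primary[a]["unique"]:
            primary[a]["unique"] = 1
            primary[a]["ptr"] = ptr
            return True
    return False

def lookup(key):
    rows = [primary[a] for a in addrs(key)]
    # Up-front Bloom filtering: a single 0 bit proves a mismatch with
    # no secondary-memory access at all.
    if not all(r["bloom"] for r in rows):
        return False
    # Otherwise fetch dictionary entries only for rows whose unique bit
    # is set; the exact comparison eliminates false positives.
    return any(r["unique"] and secondary[r["ptr"]] == key
               for r in rows)

add_entry(6)
add_entry(9)
```

A key such as 38 hashes to the same addresses as 6 in this toy setup, so it passes the Bloom check; the single secondary-memory comparison then rejects it, illustrating false-positive elimination at the cost of one pattern fetch.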
Further, the memory fetches from the string dictionary are performed at a fine granularity (possibly a byte), so that a mismatch can be declared as soon as the first mismatching byte is identified and the remainder of the row need not be fetched. Still further, the memory fetches from the string dictionary can be interleaved so that character comparison can be done in parallel for the rows to be looked up. With its emphasis on minimizing memory and the number of memory fetches, the inventive architecture also represents a scheme for minimizing the energy consumption per lookup relative to competing schemes. The architecture supports an O(1) delete from the dictionary and an expected O(1) time for incremental adds.
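The byte-granular, early-exit comparison can be sketched as follows; the function name and sample rows are hypothetical, and the sequential loop merely models the interleaved, parallel comparison described above while counting how many dictionary bytes are actually fetched.

```python
def match_with_early_exit(candidate_rows, key):
    # Compare the key against candidate dictionary rows one byte at a
    # time, dropping a row (and fetching no more of it) as soon as its
    # first mismatching byte is seen; surviving rows are compared in an
    # interleaved fashion, modeling the parallel hardware comparison.
    alive = {i: row for i, row in enumerate(candidate_rows)
             if len(row) == len(key)}
    fetched = 0  # bytes actually read from the string dictionary
    for pos, ch in enumerate(key):
        for i in list(alive):
            fetched += 1
            if alive[i][pos] != ch:
                del alive[i]  # first mismatching byte: stop fetching
        if not alive:
            break
    return bool(alive), fetched

rows = [b"malicious!", b"mallard000", b"zzzzzzzzzz"]
matched, fetched = match_with_early_exit(rows, b"malicious!")
```

Here the third row is dropped after one byte and the second after four, so only 15 of the 30 stored bytes are fetched.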
Embodiments of the invention, including the apparatus disclosed herein, can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Apparatus embodiments of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. Embodiments of the invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language.
Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
A number of implementations of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
This patent application takes priority under 35 U.S.C. 119(e) to (i) U.S. Provisional Patent Application No. 60/840,168, filed on Aug. 25, 2006 (Attorney Docket No. NETFP001P), entitled “STRING MATCHING ENGINE” by Choudhary et al. This application is also related to (i) the co-pending application entitled “STRING MATCHING ENGINE FOR ARBITRARY LENGTH STRINGS” by Ashar et al. (Attorney Docket No. NETFP002) having application Ser. No. ______ and filed ______ and (ii) the co-pending application entitled “REGULAR EXPRESSION MATCHING ENGINE” by Ashar et al. (Attorney Docket No. NETFP003) having application Ser. No. ______ and filed ______, each of which is incorporated by reference in its entirety for all purposes.