The present invention relates generally to the field of processing a stream of data symbols to determine whether any strings of the data symbol stream match a pattern.
Advances in network and storage-subsystem design continue to push the rate at which data streams must be processed between and within computer systems. Meanwhile, the content of such data streams is subjected to ever-increasing scrutiny, as components at all levels mine the streams for patterns that can trigger time-sensitive action. Patterns can include not only constant strings (e.g., “dog” and “cat”) but also specifications that denote credit card numbers, currency values, or telephone numbers, to name a few. A widely used pattern specification language is the regular expression language. Regular expressions and their implementation via deterministic finite automatons (DFAs) constitute a well-developed field. See Hopcroft and Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley, 1979, the entire disclosure of which is incorporated herein by reference. A DFA is a logical representation that defines the operation of a state machine, as explained below. However, the inventors herein believe that a need in the art exists for improving the use of regular expressions in connection with high-performance pattern matching.
For some applications, such as packet header filtering, the location of a given pattern may be anchored, wherein anchoring describes a situation where a match occurs only if the pattern begins or ends at a set of prescribed locations within the data stream. More commonly, in many applications, a pattern can begin or end anywhere within the data stream (e.g., unstructured data streams, packet payloads, etc.). Some applications require a concurrent imposition of thousands of patterns at every byte of a data stream. Examples of such applications include but are not limited to:
Today's conventional high-end workstations cannot keep pace with pattern matching applications given the speed of data streams originating from high speed networks and storage subsystems. To address this performance gap, the inventors herein turn to architectural innovation in the formulation and realization of DFAs in pipelined architectures (e.g., hardware logic, networked processors, or other pipelined processing systems).
A regular expression r denotes a regular language L(r), where a language is a (possibly infinite) set of (finite) strings. Each string comprises symbols drawn from an alphabet Σ. The syntax of a regular expression is defined inductively, with the following basic expressions:
As noted above, regular expressions find practical use in a plethora of searching applications including but not limited to file searching and network intrusion detection systems. Most text editors and search utilities specify search targets using some form of regular expression syntax. As an illustrative example, using perl syntax, the pattern shown in
Applications that use regular expressions to specify patterns of interest typically operate as follows: Given a regular expression r and a target string t (typically the contents of some input stream such as a file), find all substrings of t in L(r). The substrings are typically reported by their position within t. Thus, unless otherwise stated, it is generally intended that the pattern r is applied at every position in the target and that all matches are reported.
The simplest and most practical mechanism for recognizing patterns specified using regular expressions is the DFA, which is formally described as the 5-tuple:
(Q, Σ, q_0, δ, A)
where:
A DFA operates as follows. It begins in state q_0. If the DFA is in state q, then the next input symbol a causes a transition determined by δ(q, a). If the DFA effects a transition to a state q ∈ A, then the string processed up to that point is accepted and is in the language recognized by the DFA. As an illustrative example, the regular expression of
The construction of a DFA typically involves an intermediate step in which a nondeterministic finite automaton (NFA) is constructed. An NFA differs from a DFA in that a DFA is a finite state machine that allows at most one transition for each input symbol and state, whereas an NFA is a finite state machine that allows more than one transition for each input symbol and state. Also, every regular language has a canonical DFA that is obtained by minimizing the number of states needed to recognize that language. Unless specified otherwise herein, it should be assumed that all automata are in canonical (deterministic) form.
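By way of illustration only, the following Python sketch simulates the DFA operation just described; the two-state machine, alphabet, and acceptance condition are assumptions of this example rather than the automaton of any figure herein.

```python
def run_dfa(delta, q0, accepting, target):
    """Return True if the DFA (Q, Sigma, q0, delta, A) accepts `target`."""
    q = q0
    for symbol in target:
        q = delta[(q, symbol)]   # deterministic: one transition per (state, symbol)
    return q in accepting

# Illustrative two-state DFA over {'a', 'b'} accepting strings that end in 'b'.
delta = {('S', 'a'): 'S', ('S', 'b'): 'M',
         ('M', 'a'): 'S', ('M', 'b'): 'M'}
print(run_dfa(delta, 'S', {'M'}, 'aab'))   # True
print(run_dfa(delta, 'S', {'M'}, 'aba'))   # False
```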
However, for the purpose of pattern matching, the inventors herein believe that the DFA shown in
A DFA is typically implemented interpretively by realizing its transitions δ as a table: each row corresponds to a state of the DFA and each column corresponds to an input symbol. The transition table for the DFA of
The inventors herein believe that the pattern matching techniques for implementing DFAs in a pipelined architecture can be greatly improved via the novel pattern matching architecture disclosed herein. According to one aspect of the present invention, a pipelining strategy is disclosed that defers all state-dependent (iterative, feedback dependent) operations to the final stage of the pipeline. Preferably, transition table lookups operate to retrieve all transition table entries that correspond to the input symbol(s) being processed by the DFA. Retrievals of transition entries from a transition table memory will not be based on the current state of the DFA. Instead, retrievals from the transition table memory will operate to retrieve a set of stored transition entries based on data corresponding to the input symbol(s) being processed.
In a preferred embodiment where alphabet encoding is used to map the input symbols of the input data stream to equivalence class identifiers (ECIs), these transition table entries are indirectly indexed to one or more input symbols by data corresponding to ECIs.
This improvement allows for the performance of single-cycle state transition decisions, enables the use of more complex compression and encoding techniques, and increases the throughput and scalability of the architecture.
According to another aspect of the present invention, the transitions of the transition table preferably include a match flag that indicates whether a match of an input symbol string to the pattern has occurred upon receipt of the input symbol(s) that caused the transition. Similarly, the transitions of the transition table preferably include a match restart flag that indicates whether the matching process has restarted upon receipt of the input symbol(s) that caused the transition. The presence of a match flag in each transition allows for the number of states in the DFA to be reduced relative to traditional DFAs because the accepting states can be eliminated and rolled into the match flags of the transitions. The presence of a match restart flag allows the DFA to identify the substring of the input stream that matches an unanchored pattern. Together, the presence of these flags in the transitions contributes to another aspect of the present invention—wherein the preferred DFA is configured with an ability to scale upward in the number of bytes processed per cycle. State transitions can be triggered by a sequence of m input symbols, wherein m is greater than or equal to 1 (rather than being limited to processing only a single input symbol per clock cycle). Because of the manner by which the transitions include match flags and match restart flags, as disclosed herein, the DFA will still be able to detect when and where matches occur in the input stream as a result of the leading or an intermediate input symbol of the sequence of m input symbols that are processed together by the DFA as a group.
According to yet another aspect of the present invention, incremental scaling, compression and character-encoding techniques are used to substantially reduce the resources required to realize a high throughput DFA. For example, run-length coding can be used to reduce the amount of memory consumed by (i.e., compress) the DFA's transition table. Furthermore, the state selection logic can then operate on the run-length coded transitions to determine the next state for the DFA. Masking can be used in the state selection logic to remove from consideration portions of the transition table memory words that do not contain transitions that correspond to the ECI of the input symbol(s) being processed.
Also, according to yet another aspect of the present invention, a layer of indirection can be used to map ECIs to transitions in the transition table memory. This layer of indirection allows for the use of various optimization techniques that are effective to optimize the run-length coding process for the transition entries in the transition table memory and optimize the process of effectively packing the run-length coded transition entries into words of the transition table memory such that the number of necessary accesses to transition table memory can be minimized. With the use of indirection, the indirection entries in the indirection table memory can be populated to configure the mappings of ECIs to transition entries in the transition table memory such that those mappings take into consideration any optimization processes that were performed on the transition entries in the transition table memory.
Furthermore, according to another aspect of the present invention, disclosed herein is an optimization algorithm for ordering the DFA states in the transition table, thereby improving the DFA's memory requirements by increasing the efficiency of the run-length coded transitions.
Further still, disclosed herein is an optimization algorithm for efficiently packing the transition table entries into memory words such that the number of transition table entries sharing a common corresponding input symbol (or derivative thereof such as ECI) that span multiple memory words is minimized. This memory packing process operates to improve the DFA's throughput because the efficient packing of memory can reduce the number of memory accesses that are needed when processing one or more input symbols.
According to another aspect of the present invention, the patterns applied during a search can be changed dynamically without altering the logic of the pipeline architecture itself. A regular expression compiler need only populate the transition table memory, indirection table, ECI mapping tables, and related registers to reprogram the pattern matching pipeline to a new regular expression.
Based on the improvements to DFA design presented herein, the inventors herein believe that the throughput and density achieved by the preferred embodiment of the present invention greatly exceed other known pattern matching solutions.
These and other inventive features of the present invention are described hereinafter and will be apparent to those having ordinary skill in the art upon a review of the following specification and figures.
(a) and (b) depict a preferred transition table for the DFA of
(a) and (b) depict transition tables for the regular expression of
(a) and (b) depict an indirection table and a memory in which the run-length coded transition table of
(a) and (b) depict an indirection table and a memory in which the run-length coded transition table of
(a) and (b) depict an alternative formulation of the Indirection Table and TTM in which the run-length coded transition table of
(a) and (b) respectively illustrate an exemplary transition table that has been optimized by state re-ordering, and the run-length coded version of the state re-ordered transition table;
The data tables and relevant registers of the regular expression circuit are preferably populated by the output of the regular expression compiler 500. Regular expression compiler 500 operates to process a specified (preferably user-specified) regular expression to generate the DFA that is realized by the regular expression circuit 502 as described herein. Preferably, regular expression compiler 500 is implemented in software executed by a general purpose processor such as the CPU of a personal computer, workstation, or server.
Regular expression compiler 500 can be in communication with regular expression circuit 502 via any suitable data communication technique including but not limited to networked data communication, a direct interface, and a system bus.
The regular expression circuit 502 preferably realizes the DFA defined by one or more specified regular expressions via a plurality of pipelined stages. A first pipeline stage is preferably an alphabet encoding stage 504 that produces an ECI output from an input of m input symbols, wherein m can be an integer that is greater than or equal to one. A second pipeline stage is preferably an indirection table memory stage 506. The indirection table memory stage 506 can be addressed in a variety of ways. Preferably, it is directly addressed by the ECI output of the alphabet encoding stage 504. A third pipeline stage is the transition table logic stage 508 that operates to receive an indirection table entry from the output of the indirection table memory stage 506 and resolve the received indirection entry to one or more addresses in the transition table memory stage 510. The transition table logic stage 508 also preferably resolves the received indirection table entry to data used by the state selection logic stage 512 when the state selection logic stage processes the output from the transition table memory stage 510 (as described below in connection with the masking operations).
The transition table memory stage 510 stores the transitions that are used by the DFA to determine the DFA's next state and determine whether a match has been found. The state selection logic stage 512 operates to receive one or more of the transition entries that are output from the transition table memory stage 510 and determine a next state for the DFA based on the DFA's current state and the received transition(s). Optionally, the masking operations 514 and 516 within the state selection logic stage 512 that are described below can be segmented into a separate masking pipeline stage or two separate masking pipeline stages (an initial masking pipeline stage and a terminal masking pipeline stage). Additional details about each of these stages are presented herein.
High-Throughput DFAs
A conventional DFA processes one input symbol (byte) at a time, performing a table lookup on each byte to determine the next state. However, modern communication interfaces and interconnects often transport multiple bytes per cycle, which makes the conventional DFA a “bottleneck” in terms of achieving higher throughput. Throughput refers to the rate at which a data stream can be processed—the number of bytes per second that can be accommodated by a design and its implementation.
An extension of conventional DFAs is a DFA that allows for the performance of a single transition based on a string of m symbols. See Clark and Schimmel, “Scalable pattern matching for high speed networks”, IEEE Symposium on Field-Programmable Custom Computing Machines, April 2004, the entire disclosure of which is incorporated herein by reference. That is, the DFA processes the input stream in groups of m input symbols. Formally, this adaptation yields a DFA based on the alphabet Σ^m; the corresponding transition table is of size |Q||Σ|^m. This apparently dramatic increase in resource requirements is mitigated by the compression techniques described herein. For convenience, we let δ_m denote a transition function that operates on sequences of length m, with δ = δ_1.
As an illustrative example, consider doubling the effective throughput of the DFA shown in
In general, an algorithm for constructing δ_m for a given DFA is straightforward. The set of states is unchanged and the transition function (table) is computed by simulating progress from each state for every possible sequence of length m. That algorithm takes time Θ(|Q||Σ|^m · m) to compute a table of size Θ(|Q||Σ|^m). A faster algorithm can be obtained by the following form of dynamic programming. Consider
Then
An algorithm based on the above proposition is shown in
To obtain higher throughput DFAs, the algorithm in
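As a concrete software sketch of the straightforward construction described above (illustrative Python; the names are assumptions, and the faster dynamic-programming variant of the figure is not reproduced), δ_m can be built by simulating m steps of δ:

```python
from itertools import product

def build_delta_m(delta, states, alphabet, m):
    """Construct the stride-m transition function by simulating m steps of
    delta from every state; runtime is Theta(|Q| * |Sigma|^m * m), matching
    the analysis above."""
    delta_m = {}
    for q in states:
        for seq in product(alphabet, repeat=m):
            p = q
            for symbol in seq:
                p = delta[(p, symbol)]
            delta_m[(q, seq)] = p
    return delta_m
```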
High-Throughput DFAs: Accepts
Because the higher throughput DFA performs multiple transitions per cycle, it can traverse an accepting state of the original DFA during a transition. We therefore augment the transition function to include whether an accepting state is traversed in the trace:
δ_m: Q × Σ^m → Q × {0, 1}
The range's second component indicates whether the sequence of symbols that caused the transition contains a nonempty prefix that takes the original DFA through an accept state.
Transition functions of this form obviate the need for a set of accepting states A, because the “accept” (match) information is associated with edges of the higher throughput DFA. This is formalized via the modified DFA we define in the “Synergistic Combination of Stride and Encoding” section below.
For m>1, accepts are now imprecise because the preferred DFA does not keep track of which intermediate symbol actually caused an accept (match) to occur. To favor speed, the high-throughput DFA can be configured to allow imprecise accepts, relegating precise determination of the accept point to software postprocessing.
High-Throughput DFAs: Restarts
As previously discussed, a pattern-matching DFA for a regular expression is preferably augmented with transitions that allow matches to occur throughout the target string. Because matches can occur at any starting position in the target, an accept should report the origination point of the match in the target. It is not clear in the automaton of
The λ-transitions introduced to achieve position independence of matching result in an NFA that can be transformed into a DFA through the usual construction. The “Synergistic Combination of Stride and Encoding” section below describes how to modify that construction to identify transitions that serve only to restart the automaton's matching process.
Formally, the transition function is augmented once more, this time to indicate when restarts occur:
δ_m: Q × Σ^m → Q × {0, 1} × {0, 1}
The first flag indicates a restart transition (a “match restart” flag) and the second flag indicates an accept transition (a “match” flag). Accordingly, the DFA diagrams henceforth show restart transitions with green edges and accept transitions with red edges. For example,
The actions of a DFA with the colored edges are as follows. The automaton includes context variables b and e to record the beginning and end of a match; initially, b=e=0, and the index of the first symbol of the target is 1. These variables allow the location of a matching string to be found in the context buffer as shown in
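One plausible software rendering of these actions (shown for m = 1; the table layout and the exact update of b on restart edges are assumptions of this sketch) is:

```python
def scan(delta, q0, target):
    """Scan `target` with flagged transitions:
    delta[(state, symbol)] -> (next_state, restart_flag, match_flag)."""
    q, b, e = q0, 0, 0                  # b/e record a match's beginning/end
    matches = []
    for i, symbol in enumerate(target, start=1):   # first symbol has index 1
        q, restart, match = delta[(q, symbol)]
        if restart:
            b = i                       # green edge: matching restarts here
        if match:
            e = i                       # red edge: a match ends here
            matches.append((b, e))
    return matches
```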
The use of match flags and match restart flags is particularly useful when scaling the DFA to process multiple input symbols per cycle.
Alphabet Encoding
As described above, the size of the transition table (δ) increases exponentially with the length of the input sequence consumed in each cycle. In this section, techniques are presented to encode the symbol alphabet, the goal of which is to mitigate the transition table's size and thus maximize the number of symbols processed per cycle.
Frequently, the set of symbols used in a given regular expression is small compared with the alphabet Σ of the search target. Symbols present in the target but not denoted in the pattern will necessarily be given the same treatment in the DFA for the regular expression. More generally, it may be the case that the DFA's behavior on some set of symbols is identical for all symbols in that set. As an illustrative example, the regular expression in
While a regular expression may mention character classes explicitly, such as “[0-9]”, a more general approach is achieved by analyzing a DFA for equivalent state-transition behavior. Formally, if
(∃a ∈ Σ)(∃b ∈ Σ)(∀q ∈ Q) δ(q, a) = δ(q, b)
then it can be said that a and b are “transition equivalent.”
Given a transition table δ: Q × Σ → Q, an O(|Σ|^2|Q|) algorithm for partitioning Σ into equivalence classes is shown in
Computing equivalence classes using the DFA, rather than inspection of its associated regular expression, is preferred for the following reasons:
Formally, the function in
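For illustration, the partition can be computed in software by grouping symbols on their full transition signatures; this hash-based sketch yields the same equivalence classes as the pairwise O(|Σ|^2|Q|) comparison described above (the names are assumptions of the sketch):

```python
from collections import defaultdict

def partition_alphabet(delta, states, alphabet):
    """Group symbols a, b with delta(q, a) == delta(q, b) for all q in Q,
    and assign each resulting class an equivalence class identifier (ECI)."""
    classes = defaultdict(list)
    for a in alphabet:
        signature = tuple(delta[(q, a)] for q in states)
        classes[signature].append(a)
    return {eci: symbols for eci, symbols in enumerate(classes.values())}
```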
Based on the ideas presented thus far,
Each transition in the transition table is a 3-tuple that comprises a next state identifier, a match restart flag and a match flag. For example, the transition indexed by state D and ECI 0 is (B, 1, 0) wherein B is the next state identifier, wherein 1 is the match restart flag, and wherein 0 is the match flag. Thus, the transition from state D to B caused by ECI 0 can be interpreted such that ECI 0 did not cause a match to occur but did cause the matching process to restart.
Synergistic Combination of Stride and Encoding
The ideas of improving throughput and alphabet encoding discussed above are now combined to arrive at an algorithm that consumes multiple bytes per cycle and encodes its input to save time (in constructing the tables) and space (in realizing the tables at runtime).
Such a new high-throughput DFA_m can now be formally described as the 6-tuple (Q, Σ, q_0, K, κ, δ) where:
The set of transformations begins with a regular expression r and performs the following steps to obtain DFA_m:
The transition table for the preferred high-throughput DFA may contain |K|×|Q| entries. State minimization attempts to minimize |Q|, and the previous discussion regarding the combination of higher throughput and alphabet encoding attempts to minimize |K|. Nonetheless, storage resources are typically limited; therefore, a technique for accommodating as many tables as possible is needed. The following addresses this matter by explaining how to compress the table itself.
Based on the discussion above, a transition table cell contains the three-tuple: (next state, start flag, accept flag). Run-length coding is a simple technique that can reduce the storage requirements for a sequence of symbols that exhibits sufficient redundancy. The idea is to code the string a^n as the run-length n and the symbol a; the notation n(a) can be used. Thus, the string aaaabbbcbbaaa is run-length coded as 4(a)3(b)1(c)2(b)3(a). If each symbol and each run-length requires one byte of storage, then run-length coding reduces the storage requirements for this example by three bytes (from 13 bytes to 10 bytes).
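The coding itself is a one-liner in software; the following sketch reproduces the example above:

```python
from itertools import groupby

def run_length_encode(column):
    """Run-length code a sequence, e.g. 'aaaabbbcbbaaa' becomes
    [(4, 'a'), (3, 'b'), (1, 'c'), (2, 'b'), (3, 'a')],
    i.e., 4(a)3(b)1(c)2(b)3(a) in the notation above."""
    return [(len(list(group)), symbol) for symbol, group in groupby(column)]

print(run_length_encode("aaaabbbcbbaaa"))
```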
Examining the example of
While column compression can save storage, it appears to increase the cost of accessing the transition table to obtain a desired entry. Prior to compression, a row is indexed by the current state, and the column is indexed by the ECI. Once the columns are run-length coded, as shown in
If an entire compressed column is available, the circuit shown in
Supporting Variable-Length Columns in Memory
The storage layout shown in
Some architectures offer more flexibility than others with respect to the possible choices for x. For example, the bits of an FPGA block RAM can sometimes be configured in terms of the number of words and the length of each word. The following considerations generally apply to the best choice for x:
Once x is chosen, the compressed columns will be placed in the physical memory as compactly as possible.
By introducing a layer of indirection in the transition table, it is possible to leverage the memory efficiency provided by run-length coding and compact deployment of entries in the transition table memory (TTM).
Once the Indirection Table entry is retrieved using the input symbol ECI, the pointer in the retrieved entry is used to read the first memory word from the TTM. Recall that x is the number of entries per memory word in the TTM. An entire column is accessed by starting at the word addressed by the pointer and reading w consecutive words from the TTM, where w is given by:

w = ⌈(transition count + transition index) / x⌉  (1)
The transition index and transition count values determine which entries in the first and last memory words participate in the column. In the example in
Because 0 ≤ transition index < x, compact storage in the TTM increases the number of accesses by at most 1.
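In software terms, Equation (1) and the at-most-one extra access can be modeled as follows (a sketch; the reconstruction of Equation (1) above is assumed):

```python
def words_spanned(transition_count, transition_index, x):
    """Equation (1): number of consecutive TTM words w holding one compressed
    column, where x is the number of entries per word and transition_index is
    the column's offset within its first word."""
    return -(-(transition_count + transition_index) // x)   # ceiling division

# Since 0 <= transition_index < x, w exceeds ceil(transition_count / x),
# the minimum possible, by at most one word.
print(words_spanned(6, 3, 4))   # -> 3 words for a 6-entry column at offset 3
```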
As discussed below and shown in
Furthermore, the allocation of memory for the Indirection Table is relatively straightforward, as each entry is the same size and the number of entries is equal to the maximum number of input symbol equivalence classes.
Supporting Variable-Length Columns in Memory: State Selection
The implementation of a State Select logic circuit that preferably takes into account the efficient storage layout of the TTM and offers other optimizations is now described. While the TTM offers compact storage of the compressed columns, state selection logic becomes more complex. The logic shown in
The logic shown in
The amount of logic used to determine the beginning and end of the compressed column can also be reduced. The start of each column is specified in the Indirection table using the pointer and transition index fields, which provide the TTM word containing the first entry and the index within that word of the entry. The number of words w occupied by the compressed column is then given by Equation (1). Each fully occupied word contains x entries of the compressed column. In the last word, the largest index occupied by the compressed column is given by:
(count + index − 1) mod x  (2)
Logic could be deployed in the State Select circuit to compute Equation 2. However, x is a design-time parameter. By appropriate parameterization of Hardware Definition Language (HDL) code, Equation 2 can be computed when the Indirection and TTM tables are generated.
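For instance, a table generator might precompute Equations (1) and (2) into each Indirection Table entry along the following lines (a sketch; the field names are assumptions, as the stored-variable list of the figure is not reproduced here):

```python
def make_indirection_entry(pointer, transition_index, transition_count, x):
    """Precompute Equations (1) and (2) at table-generation time so the
    State Select circuit needs no mod-x arithmetic."""
    word_count = -(-(transition_count + transition_index) // x)       # Eq. (1)
    terminal_index = (transition_count + transition_index - 1) % x    # Eq. (2)
    return {"pointer": pointer,
            "transition_index": transition_index,
            "word_count": word_count,
            "terminal_transition_index": terminal_index}
```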
Thus, the amount of computational logic can be reduced by storing the following variables for each entry in the Indirection Table:
The State Select block logic that is shown in
The next stage “masks” the entries in the last memory word that are not part of the transition table column. The run-length sums for entries that are not part of the transition table column are forced to the value of the Max Run-Length Register. This value records the maximum number of entries in a transition table column (i.e., the number of columns in the uncoded transition table; also the value of the run-length sum for the last entry in each coded transition table column). If the current memory word is the last memory word spanned by the transition table column (the value of the Word Counter is equal to the Word Count), then the Terminal Transition Index is used as the address to the Terminal Transition Index Mask ROM. If this is not the case, then no entries are masked during this stage. Forcing the run-length sums of trailing entries to be the maximum run-length sum value simplifies the Priority Encoder that generates the select bits for the multiplexer that selects the next state. This masking process produces an output vector from the less-than comparisons with the following property: the index of the left-most ‘1’ bit is the index of the next state entry, and all bits to the right of this bit will be set to ‘1’. As previously referenced, it should be noted that the masking stages may be pipelined to increase throughput. In an alternative embodiment, only the less-than comparisons, priority encoder, and next state selection logic need occur in the final pipeline stage.
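A behavioral model of the selection step may clarify the masking and priority encoding (a Python sketch; the hardware performs the less-than comparisons in parallel rather than in a loop, and the entry layout is an assumption):

```python
def select_next_state(column, current_state):
    """Select the transition covering `current_state` from one run-length
    coded column of (run_length, next_state, restart, match) entries,
    assuming states are numbered by row position in the uncoded table."""
    running_sum = 0
    for run_length, next_state, restart, match in column:
        running_sum += run_length            # running sum of run lengths
        if current_state < running_sum:      # left-most '1' of the compare vector
            return next_state, restart, match
    raise ValueError("current state exceeds the column's maximum run-length sum")
```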
Optimizations
Achieving a high-throughput regular expression pattern-matching engine is the primary motivation for developing the high-throughput DFA, character encoding, and transition table compression techniques that are disclosed herein. In the following, techniques that optimize the throughput of the system at the expense of some memory efficiency are examined; thus, each of the following techniques is constrained by the TTM. Specifically, the TTM imposes the following constraints:
The optimization problems discussed in this section fall into the class of bin packing or knapsack problems. See Cormen et al., Introduction to Algorithms, Cambridge, Mass., The MIT Press, 1990, the entire disclosure of which is incorporated herein by reference. The number of entries per word defines the bin (or knapsack) size for the packing problems. The structure of the coded transition table may be altered to minimize the number of memory accesses by increasing the total number of entries and/or words required to represent the table, so long as the total number of entries or total number of words (bins or knapsacks) does not exceed the limits imposed by the Transition Table Memory.
Optimizations: Optimizing Throughput
The number of memory accesses required for a search is determined by the disposition of compressed columns in the TTM and the pattern by which those columns are accessed. In a DFA_m, m input symbols are resolved to an ECI, which induces one column lookup; the number of memory accesses thus depends on the length of the columns in the coded transition table and on the column access pattern, which in turn depends on the regular expression (or set of regular expressions) in the engine and the input data. The total number of memory accesses for a given search can be expressed as:

W = Σ_i w_i · f_i
where w_i is the number of words spanned by row i in the Transition Table Memory, f_i is the frequency with which row i is accessed, and N is the total number of equivalence class identifiers produced by the input data.
While there is no prior knowledge of the input data, there is an ability to alter the structure of the coded transition table. By re-ordering the rows in the direct-addressed transition table, one can affect the length of the columns in the coded transition table. The optimization problem is to choose the ordering that minimizes the total number of memory accesses, W. Assume that the transition table column access pattern follows a uniform distribution, f_i = N/|K|. In this case:

W = (N/|K|) · Σ_i w_i
Under these conditions, the optimization problem is to minimize the quantity:

Σ_i w_i = Σ_i ⌈count_i / x⌉
Recall that count_i is the transition count for row i, i.e., the number of entries in row i of the run-length coded transition table, and x is the number of entries per word in the TTM.
To simplify the optimization problem, one can assume that x = 1, so the quantity that now needs to be minimized is:

Σ_i count_i
This will generally yield similar results to minimizing the function with an arbitrary x.
There are many approaches to state reordering. One approach is to minimize the length of a single column of the coded transition table by ordering the rows of the direct-addressed table according to the sorted order of the entries in the row. This maximizes the efficiency of the run-length coding for that one column. However, the re-ordering may also decrease the efficiency of the run-length coding for other columns.
The preferred approach is a greedy one; preferably it is desired to maximize the length of the runs for the most columns, thereby minimizing the length of each encoded column.
One can start by creating a difference matrix which, given two states, indicates the number of ECIs on which their transitions differ and so will not continue a run. This algorithm is shown in
Next, one orders the states from some starting point based on the entries in the difference matrix. One preferably chooses the state that preserves the most run lengths to receive the next label. The starting state that is chosen is preferably the state that has the largest column-sum in the difference matrix. The idea behind picking that state first is that it is the state most different from all others. By moving that state to the end (rather than the middle), one preserves the longest runs. This algorithm is outlined in
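One reading of this greedy procedure is sketched below in Python (the difference-matrix definition follows the description above; tie-breaking details of the figures are not reproduced):

```python
def reorder_states(table, states, ecis):
    """Order states so adjacent rows of the direct-addressed table differ in
    few ECI columns, lengthening runs when each column is run-length coded."""
    # Difference matrix: number of ECIs on which two states' transitions differ.
    diff = {(s, t): sum(table[(s, c)] != table[(t, c)] for c in ecis)
            for s in states for t in states}
    # Start from the state most different from all others (largest column-sum).
    current = max(states, key=lambda s: sum(diff[(s, t)] for t in states))
    order, remaining = [current], set(states) - {current}
    while remaining:
        # Greedily append the state preserving the most runs (fewest differences).
        current = min(remaining, key=lambda t: diff[(order[-1], t)])
        order.append(current)
        remaining.remove(current)
    return order
```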
Optimizations: Memory Packing
Recall that the layer of indirection allows a column of the coded transition table to begin and end at any location in the TTM. Naïve packing of coded table columns into physical memory can thwart the aforementioned optimizations by incurring an extra memory access for each table column. Notice in
This memory packing problem is a variant of the classical fractional knapsack problem where w is the constraint or objective function. See Black, P. E., Dictionary of Algorithms and Data Structures, NIST, 2004, the entire disclosure of which is incorporated herein by reference. The difference in the preferred embodiment here is that we require contiguous storage of coded transition table columns. This imposes an additional constraint when partitioning an object (coded transition table column) across multiple knapsacks (memory words) in the classical problem.
One solution to this problem is based on subset sum. While subset sum is an NP-complete problem in the general case (see Garey and Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Co., 1979, the entire disclosure of which is incorporated herein by reference), there are certain conditions under which it runs in polynomial time, namely when the sum is much less than the number of elements that are to be chosen from to create the sum. In the preferred embodiment, the sum will always be the width of a memory word, so the preferred algorithm will run in polynomial time.
The basic idea is to find the longest run-length coded column and choose it first. One then will pack it into memory words guaranteeing that it achieves the best possible packing. One can then take the number of remaining entries in the last column and apply subset sum on it with the remaining run-length coded columns. This will pack the memory as full as possible without causing additional memory accesses. This process is repeated until no encoded columns remain. This algorithm is outlined in
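A simplified sketch of this packing heuristic follows (Python; the treatment of ties and of fillers larger than the slack is a simplification of the figure's algorithm):

```python
def pack_columns(column_lengths, x):
    """Choose a packing order: place the longest remaining column, then use
    subset sum over the other columns' lengths to exactly fill the slack in
    its last word, so later columns start word-aligned. `column_lengths`
    maps a column id to its entry count; x is the entries per TTM word."""
    remaining = dict(column_lengths)
    order = []
    while remaining:
        longest = max(remaining, key=remaining.get)
        order.append(longest)
        slack = -remaining.pop(longest) % x        # free entries in its last word
        for cid in _subset_with_sum(remaining, slack):
            order.append(cid)
            del remaining[cid]
    return order

def _subset_with_sum(lengths, target):
    """Return column ids whose lengths sum exactly to `target` (or []);
    polynomial here because `target` is at most the word width x."""
    reachable = {0: []}
    for cid, length in lengths.items():
        for total, ids in list(reachable.items()):
            t = total + length
            if t <= target and t not in reachable:
                reachable[t] = ids + [cid]
    return reachable.get(target, [])
```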
An Implementation Architecture
In this section, an implementation of a high-performance regular expression search system based on the preferred high-throughput DFA and pipelined transition table techniques is described. The focus of this implementation is a hybrid processing platform that includes multiple superscalar microprocessors and reconfigurable hardware devices with high-bandwidth interconnect to an array of high-speed disks.
The purpose of the architecture is to maximize throughput by embedding an array of regular expression engines in the reconfigurable hardware devices (e.g., FPGAs). The array of engines, supporting control logic, and context buffer(s) may be logically viewed as a single firmware module. In an embodiment wherein the regular expression circuits/engines are realized on an FPGA, these engines can be synthesized to a hardware definition language (HDL) representation and loaded onto the FPGA using known techniques.
Each regular expression engine's primary task is to recognize regular expressions in the input files streaming off of the high-speed disks. The set of regular expressions is preferably specified by the user through the user interface, compiled into high-throughput DFAs, and translated into a set of tables and register values by a collection of software components. The set of tables and register values are written to the firmware module prior to beginning a search. When a regular expression engine recognizes a pattern, it sends a message to the Results Processor that includes the context (portion of the file containing the pattern), starting and ending indexes of the pattern in the file, and the accepting state label. Depending on the operating environment and level of integration, the user interface may be a simple command line interface, Graphical User Interface (GUI), or language-specific API. The following subsections provide detailed descriptions of the remaining components.
An Implementation Architecture: Regular Expression Compiler
As detailed in
An Implementation Architecture: Results Processor
It is expected that any of a variety of techniques can be used to report the results of a search via a results processor. The preferred results processor can be configured to resolve the exact expression and input string segment for each match using the results produced by the regular expression circuits (engines). In a preferred embodiment such as that shown in
An Implementation Architecture: File I/O Controller
The file I/O controller is a component of the system that controls the input stream. In the exemplary system of
An Implementation Architecture: Regular Expression Firmware
The regular expression firmware module is the primary datapath component in the system architecture shown in
As previously mentioned, the throughput of a regular expression engine is fundamentally limited by the rate at which it can compute state transitions for the deterministic finite automaton. Resolving the next state based on the current state and input symbol is an inherently serial operation. In order to take advantage of the reconfigurable logic resources available on the preferred implementation platform, it is desired to maximize parallelism. Pipelining is a common technique for increasing the number of parallel operations in serial computations; however, it requires that the processing pipeline be free of feedback loops. The outputs of operations at a given stage of the pipeline cannot depend upon the results of a stage later in the pipeline. As shown in
Regular Expression Firmware: Alphabet Encoding
The Alphabet Encoding block assigns an Equivalence Class Identifier (ECI) for a set of m input symbols. If each input symbol is specified using i bits and an ECI is specified using p bits, then the Alphabet Encoding block essentially reduces the input from mi bits to p bits. A straightforward method for performing this operation is to perform pairwise combinations using direct-addressed tables. As shown in
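An illustrative software model of the pairwise combination (assuming, for this sketch only, that m is a power of two and that the per-level tables have been generated by the compiler) is:

```python
def encode_group(symbols, symbol_table, pair_tables):
    """Reduce m input symbols to one ECI via direct-addressed table lookups:
    symbol_table maps each raw symbol to its single-symbol class id, and
    pair_tables holds one table per combining level (log2(m) levels)."""
    ids = [symbol_table[s] for s in symbols]
    for table in pair_tables:
        ids = [table[(ids[k], ids[k + 1])] for k in range(0, len(ids), 2)]
    return ids[0]   # the final p-bit equivalence class identifier
```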
The logic required to implement subsequent stages of the
Regular Expression Firmware: Buffers
Each regular expression engine preferably includes small input and output buffers. The input buffers prevent a single engine from stalling every engine in the array when it must retrieve a transition table column that spans multiple words in the TTM. While the entire array must stall when any engine's input buffer fills, the input buffers help isolate the instantaneous fluctuations in file input rates. The output buffers allow a regular expression engine to continue processing after it has found a match and prior to the match being transmitted to the Results Processor. The Context Buffer preferably services the output buffers of the regular expression engines in round-robin fashion. If the output buffer of any engine fills, then that engine must stall prior to sending another result to the output buffer. As noted, the array preferably must stall if any engine's input buffer fills.
While the present invention has been described above in relation to its preferred embodiment, various modifications may be made thereto that still fall within the invention's scope. Such modifications to the invention will be recognizable upon review of the teachings herein. For example, while the transition tables have been described herein such that the rows correspond to states and the columns correspond to ECIs, it should be readily understood that the rows and columns of any of the tables described herein can be reversed. As such, the full scope of the present invention is to be defined solely by the appended claims and their legal equivalents.
This patent application is a divisional of patent application Ser. No. 11/293,619, filed Dec. 2, 2005, entitled “Method and Device for High Performance Regular Expression Pattern Matching”, now U.S. Pat. No. 7,702,629, the entire disclosure of which is incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---
3601808 | Vlack | Aug 1971 | A |
3611314 | Pritchard, Jr. et al. | Oct 1971 | A |
3729712 | Glassman | Apr 1973 | A |
3824375 | Gross et al. | Jul 1974 | A |
3848235 | Lewis et al. | Nov 1974 | A |
3906455 | Houston et al. | Sep 1975 | A |
4081607 | Vitols et al. | Mar 1978 | A |
4298898 | Cardot | Nov 1981 | A |
4314356 | Scarbrough | Feb 1982 | A |
4385393 | Chaure et al. | May 1983 | A |
4464718 | Dixon et al. | Aug 1984 | A |
4550436 | Freeman et al. | Oct 1985 | A |
4823306 | Barbic et al. | Apr 1989 | A |
4941178 | Chuang | Jul 1990 | A |
5023910 | Thomson | Jun 1991 | A |
5050075 | Herman et al. | Sep 1991 | A |
5101424 | Clayton et al. | Mar 1992 | A |
5140692 | Morita | Aug 1992 | A |
5163131 | Row et al. | Nov 1992 | A |
5179626 | Thomson | Jan 1993 | A |
5226165 | Martin | Jul 1993 | A |
5243655 | Wang | Sep 1993 | A |
5249292 | Chiappa | Sep 1993 | A |
5255136 | Machado et al. | Oct 1993 | A |
5265065 | Turtle | Nov 1993 | A |
5319776 | Hile et al. | Jun 1994 | A |
5327521 | Savic et al. | Jul 1994 | A |
5339411 | Heaton, Jr. | Aug 1994 | A |
5347634 | Herrell et al. | Sep 1994 | A |
5371794 | Diffie et al. | Dec 1994 | A |
5388259 | Fleischman et al. | Feb 1995 | A |
5396253 | Chia | Mar 1995 | A |
5418951 | Damashek | May 1995 | A |
5421028 | Swanson | May 1995 | A |
5432822 | Kaewell, Jr. | Jul 1995 | A |
5440723 | Arnold et al. | Aug 1995 | A |
5461712 | Chelstowski et al. | Oct 1995 | A |
5465353 | Hull et al. | Nov 1995 | A |
5481735 | Mortensen et al. | Jan 1996 | A |
5488725 | Turtle et al. | Jan 1996 | A |
5497488 | Akizawa et al. | Mar 1996 | A |
5544352 | Egger | Aug 1996 | A |
5546578 | Takada et al. | Aug 1996 | A |
5651125 | Witt et al. | Jul 1997 | A |
5701464 | Aucsmith | Dec 1997 | A |
5721898 | Beardsley et al. | Feb 1998 | A |
5740466 | Geldman et al. | Apr 1998 | A |
5774835 | Ozawa et al. | Jun 1998 | A |
5774839 | Shlomot | Jun 1998 | A |
5781772 | Wilkinson, III et al. | Jul 1998 | A |
5781921 | Nichols | Jul 1998 | A |
5805832 | Brown et al. | Sep 1998 | A |
5813000 | Furlani | Sep 1998 | A |
5819273 | Vora et al. | Oct 1998 | A |
5819290 | Fujita et al. | Oct 1998 | A |
5826075 | Bealkowski et al. | Oct 1998 | A |
5864738 | Kessler et al. | Jan 1999 | A |
5913211 | Nitta | Jun 1999 | A |
5930753 | Potamianos et al. | Jul 1999 | A |
5943421 | Grabon | Aug 1999 | A |
5943429 | Handel | Aug 1999 | A |
5978801 | Yuasa | Nov 1999 | A |
5991881 | Conklin et al. | Nov 1999 | A |
5995963 | Nanba et al. | Nov 1999 | A |
6023760 | Karttunen | Feb 2000 | A |
6028939 | Yin | Feb 2000 | A |
6044407 | Jones et al. | Mar 2000 | A |
6058391 | Gardner | May 2000 | A |
6067569 | Khaki et al. | May 2000 | A |
6070172 | Lowe | May 2000 | A |
6138176 | McDonald et al. | Oct 2000 | A |
6147976 | Shand et al. | Nov 2000 | A |
6169969 | Cohen | Jan 2001 | B1 |
6175874 | Imai et al. | Jan 2001 | B1 |
6226676 | Crump et al. | May 2001 | B1 |
6279113 | Vaidya | Aug 2001 | B1 |
6317795 | Malkin et al. | Nov 2001 | B1 |
6336150 | Ellis et al. | Jan 2002 | B1 |
6339819 | Huppenthal et al. | Jan 2002 | B1 |
6370645 | Lee et al. | Apr 2002 | B1 |
6377942 | Hinsley et al. | Apr 2002 | B1 |
6381242 | Maher, III et al. | Apr 2002 | B1 |
6389532 | Gupta et al. | May 2002 | B1 |
6397259 | Lincke et al. | May 2002 | B1 |
6397335 | Franczek et al. | May 2002 | B1 |
6412000 | Riddle et al. | Jun 2002 | B1 |
6430272 | Maruyama et al. | Aug 2002 | B1 |
6463474 | Fuh et al. | Oct 2002 | B1 |
6499107 | Gleichauf et al. | Dec 2002 | B1 |
6535868 | Galeazzi et al. | Mar 2003 | B1 |
6564263 | Bergman et al. | May 2003 | B1 |
6578147 | Shanklin et al. | Jun 2003 | B1 |
6625150 | Yu | Sep 2003 | B1 |
6704816 | Burke | Mar 2004 | B1 |
6711558 | Indeck et al. | Mar 2004 | B1 |
6765918 | Dixon et al. | Jul 2004 | B1 |
6772345 | Shetty | Aug 2004 | B1 |
6785677 | Fritchman | Aug 2004 | B1 |
6804667 | Martin | Oct 2004 | B1 |
6877044 | Lo et al. | Apr 2005 | B2 |
6901461 | Bennett | May 2005 | B2 |
6931408 | Adams et al. | Aug 2005 | B2 |
6944168 | Paatela et al. | Sep 2005 | B2 |
6978223 | Milliken | Dec 2005 | B2 |
6980976 | Alpha et al. | Dec 2005 | B2 |
6981054 | Krishna | Dec 2005 | B1 |
7019674 | Cadambi et al. | Mar 2006 | B2 |
7046848 | Olcott | May 2006 | B1 |
7093023 | Lockwood et al. | Aug 2006 | B2 |
7127510 | Yoda et al. | Oct 2006 | B2 |
7139743 | Indeck et al. | Nov 2006 | B2 |
7167980 | Chiu | Jan 2007 | B2 |
7181437 | Indeck et al. | Feb 2007 | B2 |
7181608 | Fallon et al. | Feb 2007 | B2 |
7222114 | Chan et al. | May 2007 | B1 |
7224185 | Campbell et al. | May 2007 | B2 |
7225188 | Gai et al. | May 2007 | B1 |
7257842 | Barton et al. | Aug 2007 | B2 |
7287037 | An et al. | Oct 2007 | B2 |
7305391 | Wyschogrod et al. | Dec 2007 | B2 |
7356498 | Kaminsky et al. | Apr 2008 | B2 |
7386564 | Abdo et al. | Jun 2008 | B2 |
7408932 | Kounavis et al. | Aug 2008 | B2 |
7411957 | Stacy et al. | Aug 2008 | B2 |
7444515 | Dharmapurikar et al. | Oct 2008 | B2 |
7454418 | Wang | Nov 2008 | B1 |
7457834 | Jung et al. | Nov 2008 | B2 |
7461064 | Fontoura et al. | Dec 2008 | B2 |
7467155 | McCool et al. | Dec 2008 | B2 |
7478431 | Nachenberg | Jan 2009 | B1 |
7480253 | Allan | Jan 2009 | B1 |
7558925 | Bouchard et al. | Jul 2009 | B2 |
7565525 | Vorbach et al. | Jul 2009 | B2 |
7636703 | Taylor | Dec 2009 | B2 |
7685254 | Pandya | Mar 2010 | B2 |
7701945 | Roesch et al. | Apr 2010 | B2 |
7702629 | Cytron et al. | Apr 2010 | B2 |
7783862 | Cameron | Aug 2010 | B2 |
7805392 | Steele et al. | Sep 2010 | B1 |
20010014093 | Yoda et al. | Aug 2001 | A1 |
20010052038 | Fallon et al. | Dec 2001 | A1 |
20010056547 | Dixon | Dec 2001 | A1 |
20020031125 | Sato | Mar 2002 | A1 |
20020069370 | Mack | Jun 2002 | A1 |
20020095512 | Rana et al. | Jul 2002 | A1 |
20020105911 | Pruthi et al. | Aug 2002 | A1 |
20020129140 | Peled et al. | Sep 2002 | A1 |
20020162025 | Sutton et al. | Oct 2002 | A1 |
20020166063 | Lachman et al. | Nov 2002 | A1 |
20030009693 | Brock et al. | Jan 2003 | A1 |
20030014662 | Gupta et al. | Jan 2003 | A1 |
20030023876 | Bardsley et al. | Jan 2003 | A1 |
20030037037 | Adams et al. | Feb 2003 | A1 |
20030043805 | Graham et al. | Mar 2003 | A1 |
20030051043 | Wyschogrod et al. | Mar 2003 | A1 |
20030065607 | Satchwell | Apr 2003 | A1 |
20030065943 | Geis et al. | Apr 2003 | A1 |
20030074582 | Patel et al. | Apr 2003 | A1 |
20030110229 | Kulig et al. | Jun 2003 | A1 |
20030115485 | Milliken | Jun 2003 | A1 |
20030163715 | Wong | Aug 2003 | A1 |
20030177253 | Schuehler et al. | Sep 2003 | A1 |
20030221013 | Lockwood et al. | Nov 2003 | A1 |
20040015633 | Smith | Jan 2004 | A1 |
20040028047 | Hou et al. | Feb 2004 | A1 |
20040049596 | Schuehler et al. | Mar 2004 | A1 |
20040054924 | Chuah et al. | Mar 2004 | A1 |
20040064737 | Milliken et al. | Apr 2004 | A1 |
20040100977 | Suzuki et al. | May 2004 | A1 |
20040105458 | Ishizuka | Jun 2004 | A1 |
20040111632 | Halperin | Jun 2004 | A1 |
20040162826 | Wyschogrod et al. | Aug 2004 | A1 |
20040177340 | Hsu et al. | Sep 2004 | A1 |
20040196905 | Yamane et al. | Oct 2004 | A1 |
20040205149 | Dillon et al. | Oct 2004 | A1 |
20050005145 | Teixeira | Jan 2005 | A1 |
20050086520 | Dharmapurikar et al. | Apr 2005 | A1 |
20050175010 | Wilson et al. | Aug 2005 | A1 |
20050187974 | Gong | Aug 2005 | A1 |
20050195832 | Dharmapurikar et al. | Sep 2005 | A1 |
20050229254 | Singh et al. | Oct 2005 | A1 |
20060031263 | Arrouye et al. | Feb 2006 | A1 |
20060036693 | Hulten et al. | Feb 2006 | A1 |
20060047636 | Mohania et al. | Mar 2006 | A1 |
20060053295 | Madhusudan et al. | Mar 2006 | A1 |
20060129745 | Thiel et al. | Jun 2006 | A1 |
20060242123 | Williams, Jr. | Oct 2006 | A1 |
20060269148 | Farber et al. | Nov 2006 | A1 |
20060294059 | Chamberlain et al. | Dec 2006 | A1 |
20070011183 | Langseth et al. | Jan 2007 | A1 |
20070067108 | Buhler et al. | Mar 2007 | A1 |
20070078837 | Indeck et al. | Apr 2007 | A1 |
20070112837 | Houh et al. | May 2007 | A1 |
20070118500 | Indeck et al. | May 2007 | A1 |
20070174841 | Chamberlain et al. | Jul 2007 | A1 |
20070209068 | Ansari et al. | Sep 2007 | A1 |
20070237327 | Taylor et al. | Oct 2007 | A1 |
20070260602 | Taylor | Nov 2007 | A1 |
20070277036 | Chamberlain et al. | Nov 2007 | A1 |
20070294157 | Singla et al. | Dec 2007 | A1 |
20080086274 | Chamberlain et al. | Apr 2008 | A1 |
20080109413 | Indeck et al. | May 2008 | A1 |
20080114724 | Indeck et al. | May 2008 | A1 |
20080114725 | Indeck et al. | May 2008 | A1 |
20080114760 | Indeck et al. | May 2008 | A1 |
20080126320 | Indeck et al. | May 2008 | A1 |
20080133453 | Indeck et al. | Jun 2008 | A1 |
20080133519 | Indeck et al. | Jun 2008 | A1 |
20090262741 | Jungck et al. | Oct 2009 | A1 |
Number | Date | Country |
---|---|---
0880088 | Nov 1996 | EP |
0851358 | Jul 1998 | EP |
0887723 | Dec 1998 | EP |
0911738 | Apr 1999 | EP |
02136900 | May 1990 | JP |
03014075 | Jan 1991 | JP |
2000286715 | Oct 2000 | JP |
2002101089 | Apr 2002 | JP |
9905814 | Feb 1999 | WO |
9955052 | Oct 1999 | WO |
0041136 | Jul 2000 | WO |
0122425 | Mar 2001 | WO |
0139577 | Jun 2001 | WO |
0161913 | Aug 2001 | WO |
0180082 | Oct 2001 | WO |
0180558 | Oct 2001 | WO |
02061525 | Aug 2002 | WO |
02082271 | Oct 2002 | WO |
03036845 | May 2003 | WO |
2004017604 | Feb 2004 | WO |
2004042560 | May 2004 | WO |
2004042561 | May 2004 | WO |
2004042562 | May 2004 | WO |
2004042574 | May 2004 | WO |
2005017708 | Feb 2005 | WO |
2005026925 | Mar 2005 | WO |
2005048134 | May 2005 | WO |
2006023948 | Mar 2006 | WO |
2006096324 | Sep 2006 | WO |
2007087507 | Aug 2007 | WO |
Number | Date | Country
---|---|---
20100198850 A1 | Aug 2010 | US |
Relation | Number | Date | Country
---|---|---|---
Parent | 11293619 | Dec 2005 | US |
Child | 12703388 | US |