The present invention concerns a system and method to be applied in a communications system for detecting the presence of one or more words out of a predetermined list in a string of data.
Such a checking procedure may be employed to detect and block packets of data containing virus signatures or to detect and block addresses of specific Internet sites.
When searching data for the presence of words out of a pre-defined list, the position of a word in the data is not known in advance, and usually there is no marking or sign that indicates the beginning of the searched words in the data. For example, if the data to be searched is a packet of byte data, the words to be searched may start at any byte position in the data. The number of searches to be done is thus very high, because each starting position is to be checked, and the total searching time may be very long.
Such a checking procedure, when used to check data, reduces the operating speed of the total system. In existing systems, the comparison is performed mainly in software, and the processing unit makes one comparison of one word from the list with one portion of the data at a time. Since there may be many words in the list, the processor divides the data into sections, each section having the size of one word of the list, and compares each section with the same word. This entire procedure is then repeated as many times as there are words in the list. The total time required for the data checking operation may therefore be very long, and will significantly reduce the operating speed of the communication or computer system.
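The cost of the software procedure described above may be illustrated by a minimal sketch. It assumes that a section is taken at every possible starting position of the data, and the function name `naive_search` is hypothetical and used for illustration only; it is not taken from any existing system.

```python
def naive_search(words, data):
    """Software-style search: one comparison per (word, start position)."""
    comparisons = 0
    hits = []
    for word in words:
        # Every starting position must be tried, since the position of the
        # word in the data is not known in advance.
        for start in range(len(data) - len(word) + 1):
            comparisons += 1
            if data[start:start + len(word)] == word:
                hits.append((word, start))
    return hits, comparisons


hits, comparisons = naive_search(["virus", "worm"], "xxvirusyyyyy")
```

The number of comparisons grows with the product of the number of words in the list and the length of the data, which is the source of the slowdown discussed above.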
It is therefore desirable to provide a checking system that is capable of detecting the presence in a batch of data of one or more words out of a pre-stored list of such words at high speed.
The present invention provides a system and method for checking the presence of one or several words from a given list in a string of sub-words. The list of words is stored in a memory array comprising one comparator for each memory cell storing one sub-word. The string of data to be checked is divided into a series of sub-strings. Each sub-string is loaded several times into a compare register, each time being roll-shifted by one sub-word. At each memory cell, simultaneous comparisons are done with the input sub-string. A logic circuit is associated with each memory cell to detect consecutive matching of sub-words of the input string with the sub-words of a word of the list. Whenever a match occurs for a full word of the list, a signal is set for this word. Finally a global Match signal is set, and a priority encoder may be used to output the address (position) of one of the matching words.
The present invention provides a method and system for performing a checking operation whereby the presence of one or more words out of a predetermined list of words may be detected in a string of data. In accordance with the inventive system and method, the list may contain words of various length.
In a communication system, it may be desired to check whether data flowing through the communication apparatus contains one or several words out of a predetermined list of words. For example, the list of words may be a list of virus signatures, and the novel system will be employed to detect and block packets of data containing such virus signatures. In a second example, the list of words may be a list of Internet addresses of sites to which access is to be blocked.
In accordance with another application in a computer system, it may be desired to store data in different areas, the selection of the areas being dependent on the kind of data to be stored. For this purpose, a list of words is stored, that characterize the data selection, and if one word of the list is present in the data, then data is stored in a selected area. Several different lists of such words can be defined, and a classification of the data can be done, according to the words found in it. Such storage enables searching for a text containing one of the words related to a given subject.
The above described search procedures tend to considerably lower the overall system operation speed.
In the present invention a checking system is proposed wherein a large number of comparisons are done in a single comparison cycle, resulting in a considerably increased speed of operation. The amount of data that can be checked per time unit is very large, and the inventive checking procedure can be used in communication or computer systems with minimal or no influence on the total system speed of operation.
A system built according to the invention can be used in many computer and communication systems, for example for virus detection, firewalls, intelligent routers, protection against intruders, data-base management, etc.
Such a search procedure, wherein a string of data is searched for the presence of one word from a list of words, as opposed to a procedure where one predefined word is searched among a list of words, shall be referred to herein as a “Reverse Search” system.
The invention will be described hereinbelow in detail in respect of a preferred embodiment. It will be understood however that many variations and modifications of the invention may be made without departing from the invention in its broader aspects and therefore the appended claims are to encompass within their scope all such changes and modifications as fall within the true spirit and scope of this invention.
In
An Input Buffer Register is used to store a number n of Sub-Words of the Input String, each Sub-Word being stored in one memory cell. A Buffer/Sectioner is used to divide the Input String into sub-strings of reduced size, each sub-string comprising a number of sub-words smaller than or equal to the number of sub-words that can be stored in the Input Buffer Register. This Buffer/Sectioner then sequentially writes the data of all the sub-strings of the Input String into the Input Buffer Register for the purpose of comparing that data with the data stored in the Memory Array. All sub-strings of the Input String are successively input into the Input Buffer Register. Each sub-string is then checked and compared to the “List of Words” by a procedure explained below. The whole Input String has been checked when all sections have been “passed and checked” through the Input Buffer Register. The Buffer/Sectioner function is not shown here. It can be implemented by means of a processor and/or a logic circuit, using common software and hardware techniques. In particular, this Buffer/Sectioner may operate on flowing data, i.e. receive the Input String progressively from a communication line. In that case, the Input String may be of infinite length. Each time n sub-words are received, the n sub-word sub-string is loaded into the Input Buffer Register and checked. In
Also shown in
Also shown in
The system shown in
As shown in
However each sub-word comprises a number of bits and, typically, one or two bit lines are needed per bit of the sub-word. It should be understood that per each sub-word, there are at least as many bit lines as there are bits in the sub-word.
For each memory cell, the comparator is designed to compare the Sub-Word stored in the Memory Cell with the Sub-Word of one Cell of the Compare register. This kind of comparator is of common use in Content Addressable Memories. The connection of the bit lines is cyclically arranged, so that the comparators of two adjacent memory cells k and k+1 receive as input the data of either two adjacent cells or of the last and first cell respectively, of the compare register. In the present specification a memory cell will be defined as “aligned” with a given cell of the Compare Register when the comparator of that cell of the memory array receives as input the data stored in the said cell of the compare register.
Referring again to
The required number of shift-roll operations in the preferred embodiment of the inventive method is equal to the number n of sub-words in the compare register. It will be understood however that the number of shift-roll operations may be smaller than n, depending on application requirements. It is envisaged within the framework of the inventive method that the inventive system may be operated in accordance with rules that define a restricted number of matching cases.
Thus, for example, it may be desired to operate the reverse search system at high speed. For that purpose a method of reverse search may be applied wherein each word of the list is loaded twice in the Memory array, and the two appearances of the same word are positioned such that the first sub-word of the first appearance of the word is aligned with a sub-word of the compare register that is at a distance of n/2 sub-words from the sub-word that is aligned with the first sub-word of the second appearance of the word. Due to this method of double loading the words of the list into the memory array, only n/2 roll/shifted positions will be needed when checking for the presence of one word of the list, since the eventual match may occur either on the first or on the second appearance of the word in the memory. It will be understood that where even faster operation is required, the list of words may be loaded in the Memory array more than twice.
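The reduction from n to n/2 roll/shifted positions can be checked with a short sketch. It assumes that a memory cell k is aligned with compare register cell k mod n and that a shift by r moves the sub-word at sub-string position j to compare cell (j + r) mod n; the function name `covered_start_positions` and the shift direction are illustrative assumptions, not part of the described embodiment.

```python
def covered_start_positions(n, copy_offsets, shifts):
    """Sub-string positions at which a word's first sub-word can be caught.

    copy_offsets: compare-register cells aligned with the first sub-word of
                  each stored copy of the word.
    shifts:       the roll/shift amounts that are actually applied.
    """
    covered = set()
    for shift in shifts:
        for offset in copy_offsets:
            # Position j lands in compare cell (j + shift) % n, so it meets
            # the copy at `offset` exactly when j == (offset - shift) % n.
            covered.add((offset - shift) % n)
    return covered


n = 8
single = covered_start_positions(n, [0], range(n // 2))          # one copy, n/2 shifts
double = covered_start_positions(n, [0, n // 2], range(n // 2))  # two copies, n/2 shifts
```

Under these assumptions, two copies spaced n/2 apart cover every starting position with only n/2 shifts, whereas a single copy covers only half of them.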
Also in
In this preferred embodiment, one delimiting line is used for each cell of the compare register. Setting this delimiting line to logical state 1 marks the first sub-word of the sub-string; the last sub-word of the sub-string is then automatically marked as the sub-word preceding the one whose delimiting line is set. In another preferred embodiment within the scope of this invention, two different lines may be used for each sub-word, one being set to mark the first sub-word of the sub-string and the other being set to mark the last sub-word. In such a case, the sub-string stored in the compare register may be of smaller size than the compare register itself, and both marks are conveyed to all cells of the memory array that are aligned with the cells of the compare register in which the data of these first and last sub-words are stored. The preferred embodiment is described here with one delimiting line per compare register cell for the purpose of clarity only.
In
We shall first describe the general function of the Reverse Search system, then show the details of the logic circuit for the preferred embodiment.
As explained before, the Input string is divided into sub-strings by the buffer/sectioner and all sub-strings are loaded one by one to the Input Buffer Register. Each sub-string, when loaded in the input Buffer register, is shifted/rolled by one sub-word and loaded into the Compare Register n times. Each time the sub-string is loaded in the compare register with a given shift/roll, all comparators of the memory array execute simultaneously a comparison between the sub-words stored in the memory array and the sub-word stored in the aligned cell of the Compare Register. If a match is found, then a Sub-Word-Match signal SWMi is issued at each matching memory cell of the array. These Sub-Word-Match signals are then logically combined, by means of the L circuits, with a) the match signal of the preceding cell, b) the Start and End of Word signals, c) the delimiting line signals and finally d) the “Partial Match signal” of the preceding cell, in order to output a Word Match signal if any series of sub-words of the input string matches the series of sub-words of any word of the list.
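The shift/roll and simultaneous comparison step may be sketched as follows. This is a behavioral illustration only, assuming that memory cell k is aligned with compare register cell k mod n and that a shift by r places the sub-word at sub-string position j into compare cell (j + r) mod n; the function name `sub_word_matches` is hypothetical.

```python
def sub_word_matches(memory, sub_string, shift):
    """Return the list of Sub-Word-Match (SWM) signals for one shift/roll."""
    n = len(sub_string)
    # Load the compare register with the sub-string rolled by `shift` cells.
    compare = [sub_string[(c - shift) % n] for c in range(n)]
    # In hardware, all comparators operate at once; here they are evaluated
    # in a loop, each against the compare cell it is aligned with.
    return [memory[k] == compare[k % n] for k in range(len(memory))]


memory = list("virusworm")   # two list words stored back to back
swm = sub_word_matches(memory, list("xxvi"), shift=2)
```

With a shift of 2, the sub-words "v" and "i" of the sub-string line up with the first two cells of the stored word, so only those two SWM signals are set.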
The principle of the function of the logical combination is as follows:
A Word Match signal is issued at the ending sub-word of a word of the list if all preceding sub-words, starting from the starting sub-word of the word, have matching signals. This is checked by the generation of a Combined Match intermediate signal at each Memory Cell. This signal is set when the stored sub-word of the memory cell is found matching and the preceding Combined Match signal is also set. In the case where the word is present in the input string but is split between several sub-strings, Partial Match signals are set each time a series of sub-words is found matching up to the end of the sub-string, the last sub-word of that sub-string being marked by the delimiting line. When the next sub-string is loaded and shifted, whenever the position of the first sub-word of this sub-string corresponds to the position next to that where a partial match was found, the partial match is used as a condition for checking the next sub-string. In the event of a partial match in the first sub-string, the comparison process will be continued into the second sub-string, whereas in the event that no match was found in the first sub-string, the comparison process for this specific word will be discontinued.
Where the comparison process reaches the end of the word with consecutive match results a Word Match Signal is issued.
The Partial Match signal, once set, should be reset after being used for the match checking of the next sub-string. This is done in the following way: for each cell of the memory array, each time the corresponding delimiting line is set, indicating that the aligned cell of the Compare Register contains the first sub-word of the sub-string, the Partial Match signal, if set, is first input to the L circuit and then reset to logical zero.
In
Each Li circuit outputs an intermediate combined signal, CBi, which is input to the next circuit, Li+1. This combined signal is set if one of the following three conditions is satisfied:
a) The signal CBi−1 of the preceding circuit Li−1 is set, the Sub-Word i is found matching (i.e. the comparator Ci outputs a SWMi signal), and the delimiting line is not set. This case indicates that the sub-string has been found matching for all preceding sub-words, starting from the first sub-word of the word.
Or b) The delimiting line is set, the Partial Match is set, and the Sub-Word is found matching (SWMi is set). This case occurs when the Partial Match has been set by a preceding operation on a preceding sub-string.
Or c) The sub-word is the first one of the Word (the Start Word mark SWi is set) and the Sub-Word is found matching (SWMi is set).
The Partial Match PMi is set if the CBi is set, meaning that all preceding Sub-Words of the Word have been found matching, that the sub-word i is aligned with the ending sub-word of the sub-string and that the end of the word has not been reached. For this purpose, an AND function is provided that combines the CBi signal, the DL signal routed to the next Memory Cell and the inverse signal of WEi. The output of this AND function is then used to set the Partial Match signal.
A “Word Match” (WMi) is output if the following conditions are fulfilled: CBi is set, meaning that all preceding Sub-Words of the Word have been found matching, and the sub-word is marked as the last “sub-word” by the word end signal (WEi is set).
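The conditions for CBi, PMi and WMi given above can be combined into a behavioral model. The following is a minimal sketch under stated assumptions, not a description of the actual circuit: memory cell k is taken to be aligned with compare cell k mod n, the words are stored back to back, the Partial Match set at cell i is assumed to be the one consumed by cell i+1 when its delimiting line is set, and a short final sub-string is padded with a `None` value that matches nothing. The names `l_cell` and `reverse_search` are hypothetical.

```python
def l_cell(cb_prev, swm, dl, dl_next, pm, sw, we):
    """One L circuit: returns (CB, PM-set, WM) per conditions a), b), c)."""
    cb = swm and ((cb_prev and not dl) or (dl and pm) or sw)
    pm_set = cb and dl_next and not we   # matched up to the end of the sub-string
    wm = cb and we                       # matched up to the end of the word
    return cb, pm_set, wm


def reverse_search(words, data, n):
    """Return the set of list words found anywhere in data."""
    cells, sw, we, owner = [], [], [], []
    for word in words:
        for j, sub_word in enumerate(word):
            cells.append(sub_word)
            sw.append(j == 0)                 # Start Word mark
            we.append(j == len(word) - 1)     # Word End mark
            owner.append(word)
    pm = [False] * len(cells)                 # stored Partial Match signals
    found = set()
    data = list(data)
    for start in range(0, len(data), n):
        sub = data[start:start + n]
        sub += [None] * (n - len(sub))        # assumption: pad a short tail
        for shift in range(n):
            compare = [sub[(c - shift) % n] for c in range(n)]
            updates, cb_prev = {}, False
            for k in range(len(cells)):
                dl = (k % n == shift)             # aligned with first sub-word
                dl_next = ((k + 1) % n == shift)  # aligned with last sub-word
                swm = cells[k] == compare[k % n]
                pm_in = dl and k > 0 and pm[k - 1]
                cb, pm_set, wm = l_cell(cb_prev, swm, dl, dl_next,
                                        pm_in, sw[k], we[k])
                if dl_next:
                    updates[k] = pm_set   # old value is used, then replaced
                if wm:
                    found.add(owner[k])
                cb_prev = cb
            for k, value in updates.items():
                pm[k] = value
    return found


result = reverse_search(["virus", "worm"], "xxvirusyyyyy", 4)
```

In this example the word "virus" is split between the first and second sub-strings, so it is found only through the Partial Match path of condition b). The loop over cells stands in for what the hardware evaluates in parallel.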
Finally, as shown in
Furthermore, as shown in
The reverse searching system and method of the invention have the advantage that a plurality of consecutive search operations, checking a string of sub-words for the presence at any position of one or more words of a list of words stored in a memory array, may be performed continuously, whereby a considerable saving in operation time is achieved.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL01/00915 | 9/30/2001 | WO |