1. Field of the Invention
The present invention generally relates to data communications and, more specifically, to a system and method for providing security during data transfers.
2. Description of the Related Art
Computer viruses and worms have caused millions of dollars in computer and network downtime, and they have made computer virus detection and elimination a thriving industry. Now, every computer is equipped with computer virus detection and prevention software, and every data network gateway is guarded with equally powerful virus detection and prevention software.
Computer viruses, bugs, and worms are undesirable software developed by computer hackers or computer whiz kids, who are either testing their programming skills or have other ulterior motives. Like any software, each of these undesired viruses, bugs, and worms has a unique digital signature. Once a virus becomes known, its digital signature is cataloged and made public. Once a virus's signature is known, computer virus prevention software can test incoming data in a data stream for this particular signature. If incoming data contains this signature, it is flagged as unsafe data and rejected.
The computer virus prevention software tests incoming data against the signatures of all known viruses, which number in the tens of thousands and are still growing. Comparing all incoming data against a growing database of known viruses can be time consuming and slows down data traffic. To ensure a virus-free environment, this comparison or screening of data is performed by all network gateways and on every single computer. This “global” comparison substantially slows down data traffic, even though the majority of the data traveling through a network at any given time is free of viruses, i.e., it is safe data.
Therefore, it is desirable to have an apparatus and method that enable rapid transfer of safe data in a data communication system, and it is to such an apparatus and method that the present invention is primarily directed.
Briefly described, an apparatus and method of the invention enable expeditious processing of incoming data by quickly identifying safe data and releasing it for further processing. In one embodiment, there is provided a method for a computing device to identify safe data in a data stream, wherein the data stream is received from a network and may contain unsafe data. Each unsafe datum is identified by a unique data signature and the computing device has a plurality of unsafe data signatures identifying unsafe data. The method includes creating at least one matrix that has a first number of elements, for each unsafe data signature in the plurality of the unsafe data signatures, analyzing a first predetermined portion of the unsafe data signature, marking a position in the at least one matrix for each analysis result of each unsafe data signature, analyzing the data stream, comparing an analysis result with the at least one matrix, and, if a position in the at least one matrix corresponding to the analysis result is un-marked, identifying the data stream as safe data.
In another embodiment, there is provided an apparatus for identifying safe data in a data stream, wherein the data stream is received from a network and may contain unsafe data and each unsafe datum is identified by a unique data signature. The apparatus includes a data receiver for receiving data from a data source, a plurality of filtering matrices, and a data analyzer for analyzing the received data against the plurality of filtering matrices. Each filtering matrix has a plurality of elements, and each element has two distinguished states, wherein a data signature of an unsafe datum is represented by a plurality of elements in a first state distributed among the plurality of filtering matrices. If the received data does not match any element in the first state in the plurality of the matrices, the received data is classified as safe data.
In yet another embodiment, there is provided an apparatus for identifying safe data in a data stream, wherein the data stream is received from a network and may contain unsafe data and each unsafe datum is identified by a unique data signature. The apparatus includes a data receiver for receiving data from a data source, a database of unsafe data with a plurality of entries, a plurality of filtering matrices, and a content pre-filtering engine for comparing received data with a predetermined portion of each unsafe datum. Each entry of the database has an unsafe datum, and each filtering matrix has a plurality of elements, wherein each element has two distinguished states. The predetermined portion is less than the entire unsafe datum.
The present system and methods are therefore advantageous as they enable rapid transfer of safe data in a data communication system. Other advantages and features of the present invention will become apparent after review of the hereinafter set forth Brief Description of the Drawings, Detailed Description of the Invention, and the Claims.
In this description, the term “application” is intended to encompass executable and nonexecutable software files, raw data, aggregated data, patches, and other code segments. The term “exemplary” is meant only as an example, and does not indicate any preference for the embodiment or elements described. Further, like numerals refer to like elements throughout the several views, and the articles “a” and “the” include plural references, unless otherwise specified in the description.
In overview, the present system and method enables fast transfer of safe data by identifying the safe data through comparison with a plurality of matrices.
The pre-filtering is done by comparing the signature of incoming data with the signatures of known unsafe data, which include viruses, spyware, attacks, and unauthorized content. However, instead of comparing the signature of the incoming data with the complete signature of every known unsafe datum, the pre-filtering compares it with a select portion of every unsafe datum. If there is no match, the incoming data is classified as safe data. If a portion of the signature of the incoming data matches the select portion of an unsafe datum, the incoming data is suspect, i.e., it may contain unsafe data. To further verify the incoming data, a subsequent portion of its signature is compared against a next select portion of every unsafe datum. If there is no match in this second comparison, the previous match was a false positive and the incoming data is safe. If the subsequent portion also matches the next select portion of an unsafe datum, the possibility that the incoming data is unsafe increases. The system can elect to perform a complete analysis of the incoming data if the possibility reaches a certain level. The possibility can be adjusted by controlling the number of comparisons performed on the incoming data: the larger the number of comparisons, the greater the possibility that incoming data matching all of them is unsafe.
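The effect of adding comparison stages can be illustrated with a rough estimate. The sketch below is not taken from the description; it assumes, purely for illustration, that safe data is uniformly random and that each stage flags an independent fraction of the possible portion values. The names pass_probability and stages_needed are hypothetical.

```python
from math import prod


def pass_probability(flagged_fractions):
    """Chance that one window of random safe data matches every stage.

    Each entry is the fraction of possible portion values flagged by that
    stage. Independence between stages is an assumption of this sketch.
    """
    return prod(flagged_fractions)


def stages_needed(fraction, target):
    """Smallest number of equal stages whose combined pass chance is <= target."""
    if not 0.0 < fraction < 1.0 or target <= 0.0:
        raise ValueError("fraction must be in (0, 1) and target must be positive")
    stages, chance = 0, 1.0
    while chance > target:
        chance *= fraction
        stages += 1
    return stages
```

Under these assumptions, a stage that flags one quarter of all portion values lets a random window through with probability 0.25, while three such stages in sequence reduce that to 0.25 × 0.25 × 0.25. Real traffic is not uniformly random, so the figures indicate the trend only.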
The comparisons may be accomplished in different ways. One expeditious way to perform the comparison is to create a matrix of M×N elements, where each element may be zero or one. Initially all elements are unset, and an element is set if its position corresponds to a select portion of the signature of an unsafe datum. When checking the incoming data, a predetermined portion of the signature of the incoming data is used to locate the corresponding element in the matrix. If that element is set, there is a possibility that the incoming data is unsafe, and further analysis may be warranted.
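A minimal sketch of such a matrix follows. It assumes M = N = 8 and assumes that the select portion is the first six bits of a signature, with the first three bits giving the row and the next three the column. These choices, and the names leading_position, build_matrix and may_be_unsafe, are illustrative assumptions rather than requirements of the description.

```python
SIZE = 8  # assumed M = N = 8, so that 3 bits select a row and 3 bits a column


def leading_position(data, bit_offset=0):
    """Return (row, col) taken from the 6 bits of `data` starting at bit_offset."""
    shift = len(data) * 8 - bit_offset - 6
    if shift < 0:
        raise ValueError("need at least 6 bits after the offset")
    window = (int.from_bytes(data, "big") >> shift) & 0b111111
    return window >> 3, window & 0b111


def build_matrix(signatures):
    """Start with every element unset, then set one element per signature."""
    matrix = [[0] * SIZE for _ in range(SIZE)]
    for signature in signatures:
        row, col = leading_position(signature)
        matrix[row][col] = 1
    return matrix


def may_be_unsafe(matrix, data):
    """True if the element addressed by the data's leading 6 bits is set."""
    row, col = leading_position(data)
    return matrix[row][col] == 1
```

A lookup touches a single element regardless of how many signatures were loaded, which is the property that makes the comparison fast. A set element only means that further analysis may be warranted.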
Octonary representations for all the entries in illustrated in
The matrices in
However, if a portion of the signature of the incoming data matches a set bit in the matrix 402, then a subsequent portion of the same signature is compared against the matrix 404 in a similar manner. If there is no match in the matrix 404, then a new shifted portion of the same signature is compared with the matrix 402 and the operations described above are repeated. On the other hand, if there is a match in the matrix 404, then another portion (a new shifted portion) of the signature is compared against the matrix 406. If there is a match again in the matrix 406, the incoming data is a good candidate for a complete analysis, in which the incoming data will be matched against all known viruses. If there is no match, another new portion of the same signature is compared with the matrix 402 and the operations described above are repeated.
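The cascade through the matrices can be sketched as follows. The sketch assumes that data is given as a string of '0' and '1' characters, that each portion is six bits wide, that successive portions are adjacent, and that the mask shifts by one bit after a miss. The description does not fix these details, and the function names are hypothetical.

```python
def position(bits, start):
    """Map the 6 bits at `start` to (row, col), 3 bits each."""
    return int(bits[start:start + 3], 2), int(bits[start + 3:start + 6], 2)


def prefilter(bits, matrices, window=6, step=1):
    """Return the bit offset where every matrix matched in turn, else None.

    matrices[0] plays the role of the first matrix, matrices[1] the second,
    and so on. A miss at any stage discards the earlier hits as a false
    positive and resumes with the next shifted window on the first matrix.
    """
    span = window * len(matrices)
    for start in range(0, len(bits) - span + 1, step):
        for stage, matrix in enumerate(matrices):
            row, col = position(bits, start + stage * window)
            if not matrix[row][col]:
                break  # no match at this stage: shift the mask and start over
        else:
            return start  # all stages matched: candidate for complete analysis
    return None  # nothing matched all stages: release as safe data


def marked(row, col):
    """Helper for examples: an 8x8 matrix with a single element set."""
    matrix = [[0] * 8 for _ in range(8)]
    matrix[row][col] = 1
    return matrix
```

A return value of None corresponds to releasing the data, and any offset corresponds to handing the data to the complete analysis, which may still find that it is clean.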
Having matched three matrices does not necessarily mean the incoming data contains a virus; it may be a false positive, where there are positive indications of the presence of a virus but further analysis proves that the incoming data does not contain any virus. The possibility of a false positive can be reduced by increasing the number of matrices used for comparison. Taking the example of
The matrices described above can be implemented either in hardware, for example using registers, or in software, for example using data arrays. The matrices can be reloaded at any time and the performance is not affected by the size of signatures.
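As one possible software counterpart to a hardware register, an 8×8 matrix fits in a single 64-bit integer. The sketch below assumes that size; the class name BitMatrix and its methods are hypothetical and only illustrate why a lookup or reload does not depend on signature length.

```python
class BitMatrix:
    """An 8x8 matrix of two-state elements packed into one 64-bit integer."""

    ROWS = COLS = 8

    def __init__(self, bits=0):
        self.bits = bits

    def _mask(self, row, col):
        # Element (row, col) is stored at bit index row * COLS + col.
        return 1 << (row * self.COLS + col)

    def set(self, row, col):
        self.bits |= self._mask(row, col)

    def is_set(self, row, col):
        return bool(self.bits & self._mask(row, col))

    def reload(self, positions):
        """Build the new contents first, then swap them in with one assignment."""
        new_bits = 0
        for row, col in positions:
            new_bits |= self._mask(row, col)
        self.bits = new_bits
```

Each test is a single mask-and-compare on a fixed-width value, so its cost is the same whether the underlying signatures are a few bytes or many kilobytes long.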
When there is a match, the incoming data stream is flagged as potentially having a virus and should be further checked. To reduce the possibility of a false positive, the next set of bits, 001 110, is checked against the next matrix 404. If the incoming data stream has a virus, it must include the entire signature of the virus. The signature of the next set of bits in the octonary system is 15 and is checked against the matrix 404. There is no match in the matrix 404 since the element at the position (1, 5) is not set. Because there is no match, the regular checking by shifting the mask is resumed and the bits 111 110 are selected for analysis against the matrix 402. The process continues until all of the incoming data has been checked against the matrices.
If there are matches against all three matrices, the incoming data is selected for a full comparison against the entire virus database. Since most data is virus free, the majority of data will be released for processing after passing through this pre-filtering stage. Only data that has matches in all three matrices will be analyzed in detail. This approach quickly frees up the majority of data for normal processing and thus increases the performance of a system.
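The routing of data between the pre-filtering stage and the full comparison can be sketched as below. The function route and both callables are hypothetical. The toy pre-filter in the usage example checks only a single byte and stands in for the matrix comparison; it is not the method described above.

```python
def route(items, prefilter_hit, full_scan):
    """Release items the pre-filter clears; fully scan only the suspects.

    prefilter_hit(item) -> bool is the cheap test applied to everything.
    full_scan(item) -> bool is the expensive comparison against the whole
    database, applied only when the pre-filter reports a hit.
    Returns (released, rejected, number_of_full_scans).
    """
    released, rejected, full_scans = [], [], 0
    for item in items:
        if prefilter_hit(item):
            full_scans += 1
            if full_scan(item):
                rejected.append(item)
                continue
        released.append(item)  # cleared by the pre-filter or by the full scan
    return released, rejected, full_scans


# Toy usage: one made-up signature, with its first byte as the cheap test.
SIGNATURES = [b"EVIL"]
items = [b"hello", b"HELLO EVERYONE", b"xxEVILxx"]
released, rejected, scans = route(
    items,
    prefilter_hit=lambda data: any(sig[:1] in data for sig in SIGNATURES),
    full_scan=lambda data: any(sig in data for sig in SIGNATURES),
)
```

In this toy run the first item is released without a full scan, the second is a false positive that the full scan clears, and only the third is rejected.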
If, when comparing a portion of the data with a first matrix, there is a match, then a second portion of the data is matched against a second matrix, step 714. If there is another match against the second matrix, then the chance of the data containing a virus increases and the data may be sent for a complete check against known viruses, step 718. If there is no match in this second matrix, then the mask is shifted to take a new portion of the data for analysis against the first matrix and the process repeats until the end of the data. When the entire data has been analyzed and no match was found, the data is sent for processing, step 720. Those skilled in the art will appreciate that the process illustrated in
In view of the method being executable on networking devices and servers, the method can be performed by a program resident in a computer readable medium, where the program directs a server or other computer device having a computer platform to perform the steps of the method. The computer readable medium can be the memory of the server, or can be in a connective database. Further, the computer readable medium can be in a secondary storage media that is loadable onto a networking computer platform, such as a magnetic disk or tape, optical disk, hard disk, flash memory, or other storage media as is known in the art.
In the context of
While the invention has been particularly shown and described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the present invention as set forth in the following claims. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.