The present invention relates to the detection of a data pattern by a computational system. The present invention more particularly relates to the rapid detection of a data pattern matching a signature, wherein the data pattern may be located within a formatted message or other data file.
Organizations, such as government departments and business enterprises, that are dependent upon information technology systems often seek to detect the presence of one or more specific data patterns within incoming messages, outgoing messages, data files or other accessible data.
This need to sift through volumes of data to detect the presence of particular data patterns is felt by numerous businesses, agencies and other organizations that possess proprietary communications networks that are communicatively coupled with the Internet or other external communications networks, such as a telephony network. This communicative engagement of these in-house communication networks typically enables the served organization to transmit and receive critical information and messages more rapidly and accessibly. In fact, many organizations could not function at an acceptable performance level without information technology communication from their internal network(s) to the Internet or other external communications system. However, access to the proprietary network by incoming messages and computer-readable media bearing software code sourced from outside of the network creates a potential for the network to accept particular pre-identified data patterns without detection by a system administrator.
Network computers are often tasked with simultaneously providing a bridge and a gate between a private network and an external network. In their bridging function, network computers enable transmission of data traffic, to include electronic messages, to and from a distinct network. In their gating function, network computers may be directed to examine data traffic and, under pre-established conditions, to impede or deny transmission of data traffic. As described below, network computers may be employed under the International Organization for Standardization (ISO) Open Systems Interconnection (OSI) network model to provide the most fundamental layers of connectivity between the private network and external information technology systems. As network computers may also be positioned within a private network to manage and enable communication among computational elements of the network, a set of network computers of a communications network can be positioned to monitor the nature of data traffic to and from, as well as within, a communications network.
Yet permitting electronic messages to pass from an external entity into a proprietary or private communications network (“network”) often creates the possibility of a security breach of the network by a computer software security exploit, such as a worm. It is well understood that a computer software virus is software that is executed by a computer without the knowledge or authorization of the computer user. The term virus as defined herein includes all forms of undesirable program or executable content, including spyware, worms, adware, and other software that penetrates a network or an element of the network, such as a computer, wherein this penetration is not desired by a computer user, network manager, or other party having an interest in the network, whether the intent of the exploit is malicious or not.
Upon activation, certain types of virus software will initiate an attack on the network by making unauthorized and unwanted modifications to one or more components of, or to information stored on, a computer or other element of the network. In particular, some computer viruses are capable of altering or destroying data stored on disk, scrambling characters or symbols on a monitor screen, displaying messages, and other damaging acts. Many viruses' attacks include attempts to propagate themselves (i.e., “amplify”) onto other elements of the network. This amplification may be directed in part to accessible computer-readable media, to include non-volatile memory such as portable memory devices, diskettes or hard disks.
To overcome the problems created by computer viruses, users have developed a variety of “anti-virus” programs that both detect and remove known viruses. Most anti-virus software programs search for certain characteristic behaviors of the known computer viruses. Once detected, the computer viruses are removed. Examples of commercially available anti-virus programs include Spy Sweeper™ by Webroot and AntiVirus by Symantec. The term “anti-virus software” is intended to include all such software, including those that inspect network traffic for malicious content and execute in a network computer as well as the aforementioned examples that execute on client and server end systems.
Viruses sometimes reside within a piece of executable code attached to a bona fide electronic message or computer software program. A network can be breached in many ways. A network can be penetrated by a properly authorized user installing a software program onto a computer from computer-readable media, whereby the virus can penetrate the network from a trusted element of the network, as well as by reception via a communications link from an external network. These user-introduced infections can be very difficult to detect and eradicate by prior art network computers, as the sheer volume of traffic to inspect can overwhelm many such systems.
Prior art anti-virus software employed to detect attempted or successful intrusions into a network can be effective but requires significant application of computational resources of the network. These anti-virus programs usually receive updates of signatures of newly active or identified viruses from a trusted outside source. The producers of anti-virus software maintain secure records of such signatures, which may be, for example, checksums.
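By way of illustration only, a checksum-style signature lookup of the kind mentioned above might be sketched as follows. The names `KNOWN_BAD_PAYLOADS`, `SIGNATURE_DIGESTS` and `matches_digest_signature` are hypothetical, the payload is placeholder data, and a SHA-256 digest is assumed here in place of whatever checksum a particular anti-virus producer actually uses.

```python
import hashlib

# Placeholder data standing in for a payload known to be malicious (assumption).
KNOWN_BAD_PAYLOADS = [b"EXAMPLE-MALICIOUS-PAYLOAD"]

# Hypothetical signature records: one digest per known payload.
SIGNATURE_DIGESTS = {hashlib.sha256(p).hexdigest() for p in KNOWN_BAD_PAYLOADS}


def matches_digest_signature(payload: bytes) -> bool:
    """Return True if the digest of the whole payload is a recorded signature."""
    return hashlib.sha256(payload).hexdigest() in SIGNATURE_DIGESTS
```

A whole-payload digest only matches exact copies of a known payload, which is one reason a bit-pattern comparison against a window of the data may be preferred.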
Many networks use an Open Systems Interconnection network model wherein a seven-layer networking framework implements specific protocols at each layer. Prior art anti-virus software is more demanding of network computational resources when it operates at the higher layers. The application layer is the highest level, or level seven. The application layer supports end-user processes and software application execution. In this level seven, sources and targets of communications are identified, quality of service is recognized, user authentication and privacy are addressed, and data syntax constraints are taken into account. The operations at level seven are application-specific. The application layer supports Telnet and FTP applications and includes tiered application architectures.
The sixth layer, or presentation layer, translates from application to network format, and vice versa, to provide independence from encryption formats and other differences in data representation. The presentation layer, also known as the syntax layer, avoids data format incompatibility by formatting and encrypting data to be sent across the network. Data is thereby transformed by the presentation layer into a form that the application layer can accept.
A session layer addresses session and connection coordination between applications. This fifth layer establishes, coordinates, and terminates conversations, exchanges, and dialogues and other communications activities between two or more applications.
The transport layer effectuates transfer of data between elements of the network. This fourth layer provides end-to-end error recovery and is responsible for complete data transfer.
The third layer, or network layer, creates virtual circuits for transmitting data from node to node by means of switching and routing actions. The network layer executes packet addressing and sequencing, routing and forwarding, internetworking, error handling, and congestion management.
At the second layer, or data link layer, data packets are encoded and decoded into bits. The data link layer handles errors in the physical layer, flow control and frame synchronization and provides transmission protocol knowledge and management to the network. A Media Access Control sublayer, or MAC sublayer, of the data link layer controls how computers and other elements of the network gain access to data and permission to transmit messages. An LLC sublayer controls frame synchronization, aspects of flow control, and error checking.
The physical layer conveys the bit streams into and out of the network, at the electrical and mechanical level. This first layer employs the hardware means of sending and receiving data on a carrier by delivering electrical impulses, light or radio signals to and from the network. The physical layer defines cables, cards, and other physical aspects of the network.
Generally, the higher the level at which anti-virus software functions, the greater the demand that the software imposes on network resources. There is therefore a long felt need for systems and software that can efficiently and rapidly detect a specified data pattern in messages and data files entering, leaving, stored within, or accessible to an information technology system or network. As a subset of this long felt need for pattern detection, there is a widely felt need to detect an attempted penetration of a virus into, or the presence of a virus within, a network at lower levels of the network model.
These and other objects will be apparent in light of the prior art and this disclosure. The present invention provides a method and system for detecting a pattern included within and/or derived from a data packet received from, or an electronic document accessible via, a source located off-chip and communicated to a pattern detection module. It is understood that the pattern detection module may be configured in part or entirely on a single semiconductor substrate, wherein an element of the pattern detection module may be located on-chip with one or more other elements of the pattern detection module.
In a first preferred embodiment of the method of the present invention a computational system is provided for detection of a data pattern comprised within a data file, such as a packet of an electronic message or other electronic document. A pattern detection module, configured as an intrusion detection module of the computational system, is informed of one or more patterns of data to seek in the data file. These sought for data patterns are referred to as signatures and are stored within or accessible to signature blocks of the intrusion detection module. It is understood that a signature may encode a data pattern that is not itself a portion of a worm or virus, but that instead indicates an actual or potential activity or attempted intrusion by or of a virus or worm.
It is further understood that seeking the presence of signatures in the data file may occur, in certain alternate preferred embodiments of the method, after the data of the data file has been modified by suitable techniques known in the art to seek obfuscated or otherwise arranged or encrypted data patterns.
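As one hedged illustration of such a modification, percent-escaped bytes of the kind used in URLs may be decoded before the signature comparison is made. The choice of percent-decoding, the function names `decode_percent_escapes` and `contains_after_decoding`, and the `max_rounds` limit are assumptions made for this sketch and are not recited in the disclosure.

```python
from urllib.parse import unquote_to_bytes


def decode_percent_escapes(data: bytes, max_rounds: int = 3) -> bytes:
    """Undo %XX escapes, repeating so that doubly encoded data is also exposed."""
    for _ in range(max_rounds):
        decoded = unquote_to_bytes(data)
        if decoded == data:  # nothing left to decode
            break
        data = decoded
    return data


def contains_after_decoding(data: bytes, signature: bytes) -> bool:
    """Seek the signature in the decoded form of the data rather than the raw form."""
    return signature in decode_percent_escapes(data)
```

In this sketch the raw bytes `/scripts/%63md.exe` do not contain `cmd.exe`, whereas the decoded bytes do.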
In a first preferred embodiment of the present invention a pattern detection module is configured as an intrusion detection module and is programmed and employed to detect intrusions and attempted intrusions of a computer software virus (“virus”) into a communications network of an information technology system. In certain various alternate preferred embodiments of the method of the present invention the pattern of the data packet sought is related to or derived from a universal resource locator (“URL”), a portion of content data, a traffic classification indicator, and/or other computer software screening techniques. The first preferred embodiment of the method of the present invention provides an intrusion detection system for detecting a virus by identifying its signature or bit pattern in a data packet, where the system includes a data packet normalization pipeline (“pipeline”), a signature block, and a shift register, where the pipeline accepts a data packet and generates a normalized data packet by hardware processing of the data packet. The normalized data packet is then sequenced through the shift register, and succeeding windows of the normalized data packet are compared with one or more virus signatures stored in the signature block. The normalization pipeline may optionally comprise one or more hardware normalization modules, to include a backslash converter circuit, a “/../” detector, a “/././” compressor, a numeric compressor, and/or a “whitespace” remover.
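The flow from normalization pipeline to shift register to signature comparison may be modeled in software as follows. This is a minimal sketch and not the hardware circuit itself: the function names are hypothetical, the exact behavior of each normalization module is assumed from its name, and the numeric compressor is omitted because its behavior is not specified in this passage.

```python
import re


def convert_backslashes(data: bytes) -> bytes:
    """Backslash converter: rewrite each backslash as a forward slash."""
    return data.replace(b"\\", b"/")


def has_parent_traversal(data: bytes) -> bool:
    """'/../' detector: only reports whether the sequence is present."""
    return b"/../" in data


def compress_dot_segments(data: bytes) -> bytes:
    """'/././' compressor: collapse runs such as '/././' into a single '/'."""
    return re.sub(rb"/(?:\./)+", b"/", data)


def remove_whitespace(data: bytes) -> bytes:
    """Whitespace remover: drop all whitespace bytes."""
    return re.sub(rb"\s+", b"", data)


def normalize(data: bytes) -> bytes:
    """Run the packet through each normalization stage in turn."""
    for stage in (convert_backslashes, compress_dot_segments, remove_whitespace):
        data = stage(data)
    return data


def scan(normalized: bytes, signatures: list[bytes]) -> list[tuple[int, bytes]]:
    """Shift the data through a window one byte at a time and compare each
    window against every signature. Returns (start offset, signature) pairs.
    Assumes at least one signature is supplied."""
    width = max(len(sig) for sig in signatures)
    shift_register = bytearray()
    hits = []
    for position, byte in enumerate(normalized):
        shift_register.append(byte)
        if len(shift_register) > width:
            del shift_register[0]  # oldest byte leaves the register
        for sig in signatures:
            if shift_register.endswith(sig):
                hits.append((position - len(sig) + 1, sig))
    return hits
```

For example, `normalize(b"GET \\cgi-bin\\.\\.\\ cmd.exe")` yields `b"GET/cgi-bin/cmd.exe"`, in which the signature `b"cmd.exe"` is then found by `scan`.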
Certain alternate preferred embodiments of the method of the present invention comprise a method for determining if a data packet evidences a virus signature where the method includes one or more of the following steps:
In certain still alternate preferred embodiments of the present invention, an information technology system has a CPU, a shift register for streaming through a plurality of packets of binary data, a first signature register and a second signature register, wherein a method of pattern detection is executed, the method comprising:
The first signature and/or second signature may be a pattern related to or derived from a virus, a URL, a traffic classification indicator, and/or a portion of content of a data packet. One or more value positions of a signature may be a “do not care” value or a case insensitive value.
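A signature having “do not care” and case insensitive value positions may be modeled as shown below. This is a sketch under stated assumptions: the `SignaturePosition` class, the helper names, and the use of `?` to mark a “do not care” position are all hypothetical conventions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignaturePosition:
    value: int = 0             # byte value expected at this position
    dont_care: bool = False    # position matches any byte
    ignore_case: bool = False  # ASCII letters match in either case


def make_signature(pattern: bytes, wildcard: int = ord("?"),
                   ignore_case: bool = False) -> list[SignaturePosition]:
    """Build a signature; the wildcard byte marks 'do not care' positions."""
    return [
        SignaturePosition(dont_care=True) if b == wildcard
        else SignaturePosition(value=b, ignore_case=ignore_case)
        for b in pattern
    ]


def position_matches(position: SignaturePosition, byte: int) -> bool:
    if position.dont_care:
        return True
    if position.ignore_case:
        return bytes([byte]).lower() == bytes([position.value]).lower()
    return byte == position.value


def window_matches(signature: list[SignaturePosition], window: bytes) -> bool:
    """Compare one window of the shift register against the signature."""
    return len(window) == len(signature) and all(
        position_matches(p, b) for p, b in zip(signature, window)
    )
```

With `make_signature(b"cmd?exe", ignore_case=True)`, the windows `b"CMD.EXE"` and `b"cmd_exe"` both match, while `b"cmd.exf"` does not.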
The CPU may optionally prevent the transmission of a data packet to an address specified by the data packet when a match is determined to exist between the instantaneous values of the shift register and either the first signature or the second signature.
Certain still alternate preferred embodiments of the present invention include one or more of the following steps:
Certain yet alternate preferred embodiments of the present invention include one or more of the following steps:
The information technology system may, in certain yet alternate preferred embodiments of the present invention, include:
Certain other alternate preferred embodiments of the present invention include an integrated circuit comprising a normalization pipeline, the normalization pipeline located within the substrate and communicatively coupled with the data source and the shift register, and the normalization pipeline for accepting the data stream from the data source, deriving a normalized binary pattern from a first packet of the data stream, and for providing the normalized binary pattern to the shift register, whereby the comparisons with the first signature and the second signature are made with a normalized binary pattern. The integrated circuit may further comprise a plurality of signature registers located within the substrate and communicatively coupled with the shift register, each register of the plurality of signature registers for accepting a portion of a plurality of portions of the first signature, wherein the plurality of portions of the first signature are sequentially stored in the plurality of signature registers, and the plurality of portions of the first signature are sequentially compared against the instantaneous values of the shift register, whereby a data packet of length equal to or less than the first signature is substantially simultaneously compared for a match with a first packet of the plurality of data packets. In still other preferred embodiments of the method of the present invention, a plurality of portions of the first signature are sequentially compared against the instantaneous values of the first packet and a second packet as sequenced through the shift register, whereby two data packets of summed length equal to or less than the first signature are substantially simultaneously compared for a match with a first signature.
Certain still alternate preferred embodiments of the present invention provide a computer-readable memory medium on which are stored a plurality of computer-executable instructions for performing aspects of the present invention as recited herein.
The information technology system, having a central processing unit (“CPU”), a shift register for streaming through binary data, and a first signature register and a second signature register, may execute a method of virus intrusion detection comprising:
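The sequential comparison of signature portions, including a match that spans two packets, may be sketched in software as follows. The class name `ChunkedSignatureMatcher`, its parameters, and the requirement that the signature length be a multiple of the register width are simplifying assumptions for this illustration and are not taken from the disclosure.

```python
from collections import deque


class ChunkedSignatureMatcher:
    """Match a long signature held as equal-width portions, one per register."""

    def __init__(self, signature: bytes, register_width: int):
        if not signature or len(signature) % register_width:
            raise ValueError("signature length must be a positive multiple "
                             "of register_width")
        self.width = register_width
        self.portions = [signature[i:i + register_width]
                         for i in range(0, len(signature), register_width)]
        self.shift_register = bytearray()
        # history[0] is the match state from `register_width` bytes ago;
        # bit i set means portions 0..i matched ending at that byte.
        self.history = deque([0] * register_width, maxlen=register_width)
        self.offset = 0  # offset in the stream, counted across packets

    def feed(self, packet: bytes) -> list[int]:
        """Stream one packet through; return stream offsets where the
        complete signature ends. State persists between packets."""
        ends = []
        final_bit = 1 << (len(self.portions) - 1)
        for byte in packet:
            self.shift_register.append(byte)
            if len(self.shift_register) > self.width:
                del self.shift_register[0]
            earlier = self.history[0]
            mask = 0
            for i, portion in enumerate(self.portions):
                preceded = i == 0 or earlier & (1 << (i - 1))
                if preceded and self.shift_register == portion:
                    mask |= 1 << i
            self.history.append(mask)
            if mask & final_bit:
                ends.append(self.offset)
            self.offset += 1
        return ends
```

Feeding `b"xxABCD"` and then `b"EFGHyy"` to a matcher built with `ChunkedSignatureMatcher(b"ABCDEFGH", 4)` reports a match ending at stream offset 9, even though neither packet contains the whole signature.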
Certain yet other alternate preferred embodiments of the present invention comprise a programmable logic device, such as a programmable gate array, to perform one or more of the steps or aspects of the present invention as recited herein.
Certain still alternate preferred embodiments of the method of the present invention enable and apply the intrusion detection module to detect the presence of data patterns wherein the data pattern is not a component of a virus or a worm, but indicates that an intrusion attempt may be in progress. Certain other alternate preferred embodiments of the method of the present invention enable and apply the intrusion detection module to detect the presence of data patterns wherein the data pattern is not a component of a pre-specified pattern, but where the detection of the data pattern does indicate a potential instantiation, presence, or attempted intrusion of a pre-specified data pattern.
Various modifications may be made without departing from the invention. It is understood that the invention has been disclosed herein in connection with certain examples and embodiments. However, such changes, modifications or equivalents as can be used by those skilled in the art are intended to be included. Accordingly, the disclosure is to be construed as exemplary, rather than limiting, and such changes within the principles of the invention as are obvious to one skilled in the art are intended to be included within the scope of the claims.
These, and further features of the invention, may be better understood with reference to the accompanying specification and drawings depicting the preferred embodiment, in which:
In describing the preferred embodiments, certain terminology will be defined. Such terminology is intended to encompass the recited embodiment, as well as all technical equivalents, which operate in a similar manner for a similar purpose to achieve a similar result.
The terms “computer” and “workstation” as used herein are defined to comprise an electronic computational or communications device that may communicate, or be configured to communicate, data or signals via a computer-readable medium, the Internet and/or other suitable computer networks known in the art, or may be communicatively linked with at least one computer-readable medium.
The term “computer-readable medium” as used herein refers to any suitable medium known in the art that participates in providing instructions to the network for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, tapes and thumb drives. Volatile media includes dynamic memory. Transmission media includes coaxial cables, copper wire and fiber optics. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the network for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to or communicatively linked with the network can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can provide the data to the network.
The term element is defined herein to include computers, workstations, data storage devices, wireless computational devices and other suitable computational and communications devices and systems known in the art.
In certain still alternate preferred embodiments of the present invention, the information technology system has a CPU, a shift register for processing streams of packets of binary data, a first signature register and a second signature register, wherein a method of pattern detection is executed, the method comprising:
The first signature and/or second signature may be patterns related to or derived from a virus, a universal resource locator, a traffic classification indicator, and/or a portion of content of a data packet. One or more value positions of a signature may be null values (i.e. “do not care” values) or case insensitive values.
The CPU may optionally prevent the transmission of the first data packet to an address specified by the first data packet when a match is determined to exist between the instantaneous values of the shift register and either the first signature or the second signature.
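The gating decision described above may be illustrated by the following minimal sketch. The function names, the `forward` callback, and the use of a simple substring test in place of the shift register comparison are assumptions made for brevity.

```python
from typing import Callable, Iterable


def contains_signature(payload: bytes, signatures: Iterable[bytes]) -> bool:
    """Stand-in for the shift register comparison: any signature in the payload."""
    return any(sig in payload for sig in signatures)


def gate_packet(payload: bytes, destination: str,
                signatures: Iterable[bytes],
                forward: Callable[[str, bytes], None]) -> bool:
    """Forward the packet to its destination unless a signature matches.
    Returns True if the packet was forwarded, False if it was withheld."""
    if contains_signature(payload, signatures):
        return False
    forward(destination, payload)
    return True
```

A caller would supply its own `forward` function; the packet reaches that function only when no signature is found.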
Certain other alternate preferred embodiments of the present invention include one or more of the following steps in detecting potential viruses in the data packets:
Certain yet alternate preferred embodiments of the present invention include one or more of the following steps in detecting potential viruses in the data packets:
The information technology system may further include a third signature register, where the third signature register records the value of a third signature, whereby the information technology system may substantially simultaneously compare the first signature and the third signature with the contents of the shift register after each advance of the first packet of the data stream through the shift register. The CPU may additionally be informed when a match is determined to exist between the instantaneous values of the shift register and either the first signature or the third signature.
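The comparison of several signature registers after each advance of the shift register, with notification of the CPU on a match, may be modeled as shown below. The function `stream_compare`, the register names, and the `notify_cpu` callback are hypothetical. The software loop visits the registers one after another, whereas hardware comparators could evaluate them in the same clock cycle.

```python
from typing import Callable


def stream_compare(data: bytes, registers: dict[str, bytes],
                   notify_cpu: Callable[[int, list[str]], None]) -> None:
    """Advance the data one byte at a time and, after each advance, compare
    the shift register against every signature register. Assumes at least
    one signature register is supplied."""
    width = max(len(sig) for sig in registers.values())
    shift_register = bytearray()
    for offset, byte in enumerate(data):
        shift_register.append(byte)
        if len(shift_register) > width:
            del shift_register[0]
        matched = [name for name, sig in registers.items()
                   if shift_register.endswith(sig)]
        if matched:
            notify_cpu(offset, matched)  # offset of the last matching byte
```

Each notification carries the offset at which the match completed and the names of the registers that matched at that step.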
The information technology system may, in certain yet alternate preferred embodiments of the present invention, include:
Certain other alternate preferred embodiments of the present invention include an integrated circuit comprising a normalization pipeline, the normalization pipeline located within the substrate and communicatively coupled with the data source and a shift register, and the normalization pipeline for accepting the data stream from the data source, deriving a normalized binary pattern from a first packet of the data stream, and for providing the normalized binary pattern to the shift register, whereby the comparisons with a first virus signature and a second virus signature are made with a normalized binary pattern. The integrated circuit may further comprise a plurality of signature registers located within the substrate and communicatively coupled with the shift register, each of the plurality of signature registers for accepting a portion of a plurality of portions of the first signature, wherein the plurality of portions of the first signature are sequentially stored in the plurality of signature registers, and the plurality of portions of the first signature are sequentially compared against the instantaneous values of the shift register, whereby a data packet of length equal to or less than the first signature is concurrently compared for a match with a first packet of the plurality of data packets. In still other preferred embodiments of the method of the present invention, a plurality of portions of the first signature are sequentially compared against the instantaneous values of the first packet and a second packet as sequenced through the shift register, whereby two data packets of summed length equal to or less than the first signature are substantially simultaneously compared for a match with a first signature.
Certain other alternate preferred embodiments of the present invention provide a computer-readable memory medium on which are stored a plurality of computer-executable instructions for performing aspects of the present invention as recited herein.
The information technology system, having a central processing unit (“CPU”), a shift register for processing binary data, and a first signature register and a second signature register, may execute a method of virus intrusion detection comprising:
Certain yet other alternate preferred embodiments of the present invention comprise a programmable logic device, such as a programmable gate array, to perform one or more of the steps or aspects of the present invention as recited herein.
The packet 33 is accepted from the packet normalization pipeline 38 by a virus signature comparison circuit 56. The virus signature comparison circuit 56 compares data derived from or otherwise related to content of the packet 33 with the virus signatures 40 stored in the signature blocks 42. A state payload 92, as described in
The data file 102 or the processed data file 104, or a portion of the data file 102 or processed data file 104, is loaded into the shift register 48 in Step A12. In optional step A14 the signature block 42 is configured to link two or more signatures 100 stored in the shift registers 42A, 42B & 42C whereby a plurality of signatures 100 are organized for comparison in series with the contents of the shift register 48. Optional step A14, in combination with the other steps of the method of
The network computer 10 determines in step A24 if the steps A6 through A24 should be again executed, or if the method of
Those skilled in the art will appreciate that various adaptations and modifications of the aforementioned described preferred embodiments can be configured without departing from the scope and spirit of the invention. Other suitable techniques and methods known in the art can be applied in numerous specific modalities by one skilled in the art and in light of the description of the present invention described herein. Therefore, it is to be understood that the invention may be practiced other than as specifically described herein. The above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the knowledge of one skilled in the art and in light of the disclosures presented above.