The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
For convenience of explanation, a method of generating a signature according to an embodiment of the present invention will be referred to as an optimizing set of signatures (OS2) method.
Referring to
The major elements and operation flow of the apparatus will now be described. First, the substring set generation unit 110 generates a substring set that is regarded as attacking content in a packet under examination. A substring set comparison unit 120 compares the generated substring set with existing signatures. If the generated substring set is already registered, a signature application unit 140 applies a security policy corresponding to the substring set. If the set is not registered, the substring set confirmation unit 150 verifies whether or not the generated substring set has the characteristics of a signature. The verified substring set, that is, the signature, is optimized in the signature optimization unit 160 and is registered in a signature database (DB) 130.
The substring set generation unit 110 combines substrings that appear more frequently than a predetermined number of times from among a plurality of substrings extracted from the packet, thereby generating a substring set. A detailed structure of the substring set generation unit 110 and a method of generating a substring set will be explained in more detail later with reference to
The substring set confirmation unit 150 examines the attacking characteristic of a packet having the substring set generated by the substring set generation unit 110, thereby confirming whether or not this substring set can be used as a signature for detecting an attacking packet.
In order to achieve this, the number of destination addresses of the packet may be examined, and if the number of destination addresses is equal to or greater than a predetermined value, the generated substring set may be determined as being the signature of an attacking packet, and used as a signature for detecting the attacking packet.
When a session success ratio of the packet is examined, if the session success ratio is equal to or less than a predetermined value, the generated substring set may be determined as being the signature of an attacking packet, and used as a signature for detecting the attacking packet.
Also, any combination (and/or) of the two criteria may be used for determination.
The signature optimization unit 160 minimizes the size of the confirmed substring set, i.e., the size of the signature, thereby performing optimization so as to increase the distinction and storage efficiency of a signature. The optimization method will be explained in more detail later with reference to
Referring to
Referring to
In the major operation flow of the method, first, a substring set regarded as attacking content is generated in operation S310 from a packet under examination. Here, substrings appearing more than a predetermined number of times are combined, from among a plurality of substrings extracted from the packet, thereby generating the substring set. The method of generating a substring set will be explained in more detail later with reference to
Then, in operation S320, the generated substring set is compared with existing signatures that are already registered. If the generated substring set is already registered, a security policy corresponding to the substring set is applied in operation S330. If the set is not registered, it is confirmed whether or not the generated substring set has a characteristic as a signature in operation S340. Here, by examining the attacking characteristic of the packet having the substring set, it is determined whether or not the substring set is to be used as a signature for detecting an attacking packet. The substring sets of packets classified as packets likely to attack are examined more precisely with respect to their behavioral characteristics. Here, the characteristics used for the examination include the distribution of destination addresses, and a session success ratio.
In this case, the number of destination addresses of the packet may be examined, and if the number of destination addresses is equal to or greater than a predetermined value, the generated substring set may be determined as being the signature of an attacking packet, and used as a signature for detecting the attacking packet.
Also, when the session success ratio of the packet is examined, if the session success ratio is equal to or less than a predetermined value, the generated substring set may be determined as being the signature of an attacking packet, and used as a signature for detecting the attacking packet.
In addition, any combination (and/or) of the two criteria may be used for determination.
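As a non-limiting illustration, the confirmation of operation S340 may be sketched as follows. The record `PacketStats`, the function `is_attack_signature`, and the default threshold values are hypothetical names and numbers chosen only for this sketch; the description above specifies only that predetermined values are used and that the two criteria may be combined with and/or.

```python
from dataclasses import dataclass


@dataclass
class PacketStats:
    """Behavioral statistics of the packets carrying one substring set (hypothetical record)."""
    distinct_destinations: int  # number of distinct destination addresses observed
    sessions_attempted: int     # sessions the packets tried to open
    sessions_succeeded: int     # sessions that were successfully established


def is_attack_signature(stats, min_destinations=50, max_success_ratio=0.2,
                        require_both=False):
    """Return True if the substring set may be used as a signature.

    min_destinations and max_success_ratio stand in for the
    'predetermined values'; their defaults are illustrative assumptions.
    """
    many_destinations = stats.distinct_destinations >= min_destinations
    if stats.sessions_attempted > 0:
        ratio = stats.sessions_succeeded / stats.sessions_attempted
        low_success = ratio <= max_success_ratio
    else:
        # Assumption of this sketch: with no session attempts there is
        # no evidence for the session success ratio criterion.
        low_success = False
    if require_both:
        return many_destinations and low_success  # "and" combination
    return many_destinations or low_success       # "or" combination
```

With `require_both=False` either criterion alone confirms the signature; with `require_both=True` both must hold.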
The signatures, based on the substring sets generated by the process described above, can effectively remove a part that can be incorrectly detected, such as a protocol header or a header of a predetermined application. However, when a substring set generated in relation to one packet is used for detecting attacks, the size of the signature and the number of signatures can become bigger than those of conventional methods, and it may cause degradation in the performance of a system. Accordingly, an optimization process for the signatures classified as attacking packets according to the process described above is performed.
After the optimization in which the size of each signature of the confirmed substring sets is minimized and the distinction and storage efficiency of a signature is increased, the automatic generation of signatures is completed in operation S350. The method of optimization will be explained in more detail later with reference to
Referring to
Each process illustrated in
First, in operation S410, substrings of a predetermined length are extracted from all packets arriving at a network device in which an object system is installed. A substring length of 2 bytes to 100 bytes is generally used. Here, a continuous or discontinuous byte string having the predetermined length in a packet is used as a substring.
Then, the hash value of each extracted substring is calculated using a widely used simple hashing algorithm in operation S420.
Here, a representative method that can be used for extraction of a substring and calculation of a hash value is the Karp-Rabin fingerprinting technique described above. In this technique, one document is divided into substrings of k-byte length, and a hash value with respect to each substring is calculated. At this time, each substring is divided according to a moving window method. For example, if the first substring is formed from the first byte to the k-th byte, the second substring is formed from the second byte to the (k+1)-th byte. Here, if the bytes of one substring are expressed as coefficients of a polynomial, the hash value of the next consecutive substring can be obtained from the previous hash value by a simple calculation. If the total size of a document is x bytes, the number of hash values to be generated is x−k+1, and the calculated (x−k+1) hash values represent the document.
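As a non-limiting illustration, the moving window hash calculation described above may be sketched as follows. The function name `rolling_hashes`, the polynomial base 257, and the modulus 2^61 − 1 are assumptions of this sketch and are not taken from the description.

```python
def rolling_hashes(data, k, base=257, mod=(1 << 61) - 1):
    """Return the x - k + 1 polynomial hashes of all k-byte windows of data.

    Each window b[0..k-1] is hashed as
    b[0]*base**(k-1) + b[1]*base**(k-2) + ... + b[k-1]  (mod mod).
    """
    if k <= 0 or len(data) < k:
        return []
    # Hash of the first window, computed directly.
    h = 0
    for byte in data[:k]:
        h = (h * base + byte) % mod
    hashes = [h]
    top = pow(base, k - 1, mod)  # weight of the byte that leaves the window
    for i in range(k, len(data)):
        h = (h - data[i - k] * top) % mod  # remove the outgoing byte
        h = (h * base + data[i]) % mod     # shift and add the incoming byte
        hashes.append(h)
    return hashes
```

For example, `rolling_hashes(b"abcabc", 3)` yields four hash values, and the first and last are equal because both windows contain the bytes `abc`.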
A comparison of all the calculated hash values is a major factor in degrading the performance of a system as described above. Accordingly, the calculated hash values are sampled by using sampling methods in operation S430.
Although a variety of sampling methods can be applied, the following four methods will be explained here.
First, there is a method of determining whether or not a predetermined character string exists in the documents being compared. For this, a modulo p operation is performed on each calculated hash value. Then, only the hash values yielding a predetermined result, for example, those whose value modulo p is 0, are selected for the substring set of the document. This method is simple and easy to apply, but it has a drawback in that the number of substrings selected varies depending on the contents and size of a document.
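As a non-limiting illustration, the modulo p selection may be sketched as follows. The function name `sample_mod_p` is a hypothetical name used only here.

```python
def sample_mod_p(hashes, p):
    """Keep only the (position, hash) pairs whose hash is 0 modulo p."""
    return [(i, h) for i, h in enumerate(hashes) if h % p == 0]
```

The drawback noted above is visible directly: `sample_mod_p([4, 7, 8, 10, 12], 4)` keeps three values, whereas `sample_mod_p([1, 3, 5], 4)` keeps none, so the number of samples depends entirely on the content.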
As a method of compensating for this, there is a winnowing technique. In the winnowing technique, instead of selecting predetermined values occurring in the modulus p operation, a window having a predetermined size is used, thereby selecting a minimum value from among hash values corresponding to the window. In this way, a minimum number of substring sets that a document of predetermined size can have is guaranteed and a substring set can be extracted more accurately.
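As a non-limiting illustration, the window-based selection may be sketched as follows. The function name `winnow` is hypothetical, and two details are assumptions of this sketch rather than statements of the description: when a window contains several equal minimum values the rightmost one is chosen, and a hash value selected by consecutive windows is recorded only once.

```python
def winnow(hashes, w):
    """Select the minimum hash of every window of w consecutive hashes.

    Returns a list of (position, hash) pairs in order of position.
    """
    if w <= 0:
        raise ValueError("window size must be positive")
    selected = []
    if not hashes:
        return selected
    # A sequence shorter than w is treated as a single window.
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        # Rightmost occurrence of the minimum inside this window.
        offset = len(window) - 1 - window[::-1].index(smallest)
        pos = start + offset
        # Selected positions never move backwards, so comparing with the
        # last recorded position is enough to avoid duplicates.
        if not selected or selected[-1][0] != pos:
            selected.append((pos, smallest))
    return selected
```

Because every window contributes a selected value, any run of w consecutive hash values is represented by at least one sample, which is the guaranteed minimum referred to above.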
As a method that is a little simpler than the winnowing technique, there is a method of selecting n minimum values among hash values occurring in each document. The selected hash values are expressed as a set of values representing the document, and by comparing sets representing each document, the resemblance between documents is calculated. This method has a problem in that when a bigger document includes a smaller document, it is difficult to determine whether the two documents are similar to one another or one document is included in the other.
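As a non-limiting illustration, the selection of the n minimum values and the comparison of the resulting sets may be sketched as follows. The function names `min_n_sketch` and `sketch_resemblance` are hypothetical, and measuring resemblance as the ratio of shared values to all values is an assumption of this sketch.

```python
def min_n_sketch(hashes, n):
    """Return the set of the n smallest distinct hash values of a document."""
    return set(sorted(set(hashes))[:n])


def sketch_resemblance(a, b):
    """Ratio of shared values to all values in two sketches (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For example, the sketches `{1, 3, 7}` and `{1, 3, 9}` share two of four distinct values, giving a resemblance of 0.5.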
Finally, there is a content-based payload partitioning (COPP) method in which a predetermined value in a document is found, and a predetermined number of bytes from the position of the value, or the contents from the position of the value to a position where a character string that is desired to be found appears for a second time, are used as a fingerprint.
In the present invention, sampling may be performed using the winnowing technique. By sampling substrings according to the winnowing technique, the drawbacks of value sampling, that is, changes in the number of samples and a high frequency of a predetermined character string, can be compensated for.
A method of determining the number of samples to be extracted from one packet may be performed by determining the number of samples in proportion to the length of the packet.
The substrings selected through sampling occupy predetermined positions in the substring distribution table 240 illustrated in
If a substring that is to be processed remains, the processes described above are repeatedly performed in operation S450.
Next, the frequency of substrings registered in the substring distribution table 240 is confirmed, thereby confirming whether a substring is an activated substring in operation S460. If substrings are extracted from an identical packet, substrings appearing more than a predetermined number of times are combined, thereby generating a substring set in operation S470. That is, based on the frequency of a substring registered in the substring distribution table 240 and a preset threshold, substrings appearing more than the predetermined number of times are determined as substrings that are likely to attack a network, and a combination of the substrings is used to generate a substring set.
Registered substrings are divided into active substrings and inactive substrings according to their frequencies. At this time, the criterion for classifying the substrings is determined according to the frequencies in the substring distribution table 240 and the preset threshold.
Methods of determining the threshold include a method using an average frequency of all substrings, and a method of setting the threshold experimentally, using the highest substring frequency recorded during a predetermined time for normal packets. The method using an average frequency further includes a method of obtaining the average of the i latest substrings by using an exponentially weighted moving average, and a method using an arithmetic average of all substring frequencies.
For example, when the average of the entire substrings is Aavg, a threshold Ath is β*Aavg (where β is a real number greater than 1), and if the frequency of a selected substring is greater than the threshold Ath, the substring is classified as an active substring.
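As a non-limiting illustration, the threshold Ath = β*Aavg and the classification of a substring as active may be sketched as follows. The function names and the smoothing factor `alpha` of the exponentially weighted moving average are assumptions of this sketch; the description does not specify a particular weighting.

```python
def arithmetic_average(freqs):
    """Arithmetic average of all substring frequencies."""
    return sum(freqs) / len(freqs)


def ewma(freqs, alpha=0.2):
    """Exponentially weighted moving average over frequencies, oldest first.

    alpha is a hypothetical smoothing factor in (0, 1].
    """
    avg = float(freqs[0])
    for f in freqs[1:]:
        avg = alpha * f + (1 - alpha) * avg
    return avg


def activation_threshold(a_avg, beta):
    """Ath = beta * Aavg, where beta is a real number greater than 1."""
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    return beta * a_avg


def is_active(freq, a_th):
    """A substring is active when its frequency is greater than Ath."""
    return freq > a_th
```

For example, with frequencies 2, 4 and 6 the arithmetic average is 4, and β = 1.5 gives Ath = 6, so a substring with frequency 7 is active while one with frequency 6 is not.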
Let Na be the total number of active substrings of one packet, that is, substrings that are generated from the packet, sampled and registered in the substring distribution table 240, and whose frequencies are greater than the threshold Ath. If Na is greater than a predefined threshold number (Sth) of substrings (where Sth is an integer greater than 1), the packet is classified as a packet that is likely to attack, and the Na substrings generated from the packet are stored in a separate space and combined as a substring set in operation S470.
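As a non-limiting illustration, the combination of active substrings of one packet into a substring set may be sketched as follows. The function name `build_substring_set` is hypothetical, a `collections.Counter` is used here as a stand-in for the substring distribution table 240, and the numeric values in the usage example are invented for illustration only.

```python
from collections import Counter


def build_substring_set(packet_hashes, table, a_th, s_th):
    """Combine the active substrings of one packet into a substring set.

    packet_hashes: sampled substring hashes of the packet, in packet order.
    table: frequency of each registered substring (stand-in for table 240).
    Returns a tuple of the Na active substrings if Na > Sth, otherwise None.
    """
    # dict.fromkeys removes repeats while keeping the packet order.
    active = [h for h in dict.fromkeys(packet_hashes) if table[h] > a_th]
    if len(active) > s_th:
        return tuple(active)
    return None


# Usage with invented frequencies: three substrings exceed Ath = 10.
table = Counter({601: 40, 603: 35, 625: 3, 630: 50})
substring_set = build_substring_set([601, 625, 603, 630], table, a_th=10, s_th=2)
```

The substrings are kept in the order in which they occur in the packet rather than sorted.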
In the current embodiment illustrated in
However, in another embodiment, the repetitive examination may be performed after operation S470, in which the activated substrings in an identical packet are combined. In this case, even without the flag, it can be immediately determined that a substring is an activated substring occurring in the packet currently being examined.
Referring to
The major purpose of the signature optimization is to prevent degradation of the distinction of a signature that can occur when a hash value is used to generate signatures, thereby minimizing incorrect detection. That is, if a generated signature includes a part that is commonly used in a plurality of packets, such as the header of a protocol or application, system resources, such as the storage space required for storing a signature and the processing power required for applying a signature, are unnecessarily used, thereby degrading the performance of the system. Accordingly, signature optimization is a technique for increasing the efficiency of a system by removing a part that is included in a plurality of signatures.
For this, all extracted signatures are examined as to whether or not a substring included in each signature is included in another signature in operation S510. That is, regarding a signature that is a substring set, as a set, and regarding substrings forming the substring set, as elements of the set, a comparison is made in order to determine whether or not common elements (substrings) exist.
At this time, considering a collision of a hashing function and scalability, the number of duplicate substrings appearing may be limited to d in operation S520. That is, in the optimization process, only when one substring occurs in d or more signatures is the corresponding substring deleted from each signature.
If the number of duplicate substrings is equal to or less than the preset value d, it is confirmed whether or not existing signatures available for comparison remain in operation S530, and the process is repeated for the next signature in operation S540.
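As a non-limiting illustration, the search for duplicate substrings in operations S510 and S520 may be sketched as follows. The function name `duplicated_substrings` is hypothetical, the sketch follows the "d or more signatures" condition stated above, and the contents of the existing signatures in the usage example are invented for illustration only.

```python
def duplicated_substrings(new_signature, existing_signatures, d):
    """Return the substrings of new_signature found in d or more existing signatures.

    Each signature is treated as a set whose elements are substrings
    (here, substring hash values or identifiers).
    """
    duplicates = set()
    for substring in new_signature:
        count = sum(1 for sig in existing_signatures if substring in sig)
        if count >= d:
            duplicates.add(substring)
    return duplicates


# Usage with invented signature contents.
existing = [{601, 603, 610}, {611, 612}, {617, 618}]
new_sig = [601, 603, 625, 630, 617]
candidates = duplicated_substrings(new_sig, existing, d=1)
```

With d = 1 the substrings 601, 603 and 617 are candidates for deletion; with d = 2 none are, since no substring appears in two existing signatures.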
Meanwhile, if deletion is performed in this way, continuously generated attacking signatures that differ from one another only in a very small part may be almost entirely deleted. For example, in the case of a polymorphic worm, which changes part of an attacking code little by little in each attack attempt, if the duplicate part is all deleted, only the very small part that is different remains. The result has a characteristic similar to a signature generated in a system that detects an attack by using only one substring, as in the Earlybird technique described above. Accordingly, this undermines the advantages of the present invention.
In order to prevent this, a method may be used in which if one signature is included in another signature or is similar to another signature by more than a predetermined level, deletion is not performed.
First, the inclusion degree (C) and resemblance degree (R) are calculated between signatures in operation S550. For the inclusion degree (C) and the resemblance degree (R), a concept that is usually employed in set theory is used. That is, with respect to two sets (signatures) A and B, the degree (C) to which set A is included in set B is calculated according to equation 1 below:
Also, the resemblance (R) between sets A and B is calculated according to equation 2 below:
That is, when the inclusion degree (C) of the two signatures is less than a threshold value Cth predetermined according to the characteristic of a security system in operation S560, and when the resemblance degree (R) of the two signatures is less than a threshold value Rth predetermined according to the characteristic of the security system in operation S570, the duplicate substring can be deleted from the two signatures in operation S580.
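As a non-limiting illustration, operations S550 through S580 may be sketched as follows. Equations 1 and 2 are not reproduced in this text, so the sketch assumes the customary set-theoretic definitions, namely C(A, B) = |A ∩ B| / |A| for the degree to which A is included in B and R(A, B) = |A ∩ B| / |A ∪ B| for the resemblance; these definitions, the function names, and the example sets are assumptions of this sketch.

```python
def inclusion_degree(a, b):
    """Assumed definition: C(A, B) = |A & B| / |A| (0.0 for an empty A)."""
    return len(a & b) / len(a) if a else 0.0


def resemblance_degree(a, b):
    """Assumed definition: R(A, B) = |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def may_delete_duplicates(a, b, c_th=0.5, r_th=0.5):
    """Duplicates may be deleted only if both C < Cth and R < Rth."""
    return inclusion_degree(a, b) < c_th and resemblance_degree(a, b) < r_th


# Usage with invented sets: 2 of 5 elements of sig_a also occur in sig_b.
sig_a = {1, 2, 3, 4, 5}
sig_b = {1, 2, 6}
deletable = may_delete_duplicates(sig_a, sig_b)
```

Here C is 0.4 and R is 2/6, both below 0.5, so the duplicate substrings may be deleted. If one signature were entirely contained in the other, C would be 1.0 and deletion would be suppressed.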
In this example, it is assumed that 1 is used as a variable d indicating the duplication degree of a substring forming a signature, and 0.5 is used for both Rth and Cth.
For example, a case where signatures 1, 2, and 3 are sequentially generated and signature 4 is, at present, newly registered will now be explained. Here, the signature 4 has substrings 601, 603, 625, 630, and 617 (substrings registered in one signature may be sorted for convenience of operations that are to be required later, but it may be a cause of incorrect detection when detecting an attack, and therefore, the substrings are not sorted in the current embodiment). Among the substrings, substrings 601 and 603 overlap the substrings of signature 1. Also, substring 617 overlaps the substring of signature 3. This means that the newly generated signature 4 has common parts with existing signatures 1, 2, and 3, and the newly generated signature 4 has a weak distinction.
In this example, since d is 1, the conditions for the operation S520 illustrated in
The technology for expressing the inclusion degree and the resemblance degree, which are used in the signature optimization, as numbers, can also be used for detecting an attack using a signature. In the case of the polymorphic worm, the contents of the packet may vary little by little in each attack. In this case, if conventional exact pattern matching is used, incorrect detection may occur. However, when the technology for expressing the inclusion degree and the resemblance degree as numbers, as described above, is used, if an unchanged part is included in a packet even when part of the contents of the packet has changed, the packet can be detected as an attacking packet.
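As a non-limiting illustration, detection based on the inclusion degree rather than on exact pattern matching may be sketched as follows. The function name `matches_signature`, the use of the fraction of signature substrings present in the packet as the inclusion degree, and the threshold value 0.5 are assumptions of this sketch.

```python
def matches_signature(packet_hashes, signature, c_th=0.5):
    """Return True if enough of the signature's substrings occur in the packet.

    packet_hashes: substring hash values sampled from the packet.
    signature: set of substring hash values forming the signature.
    c_th: hypothetical threshold on the fraction of the signature found.
    """
    if not signature:
        return False
    present = signature & set(packet_hashes)
    return len(present) / len(signature) >= c_th
```

For example, a packet containing three of the four substrings of a signature is still detected even though one substring has changed, whereas a packet containing only one of the four is not.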
The method of the present invention as described above may be implemented as a program and can be used as a part of a network router or a part of a network security device. Also, the method of the present invention can be implemented in hardware, for example, as an application-specific integrated circuit (ASIC) or a field programmable gate array (FPGA), in order to be used in an ultra high speed network.
According to the present invention, an attacking packet occurring in a high speed network is detected, and its signature is automatically generated, thereby protecting the network from an attack that may occur later.
Also, according to the present invention, instead of a pattern occurring in a part of a packet, a group of patterns occurring in a plurality of parts of the packet is used as an attacking signature, thereby minimizing incorrect detection. Also, the signature is optimized, thereby enabling the establishment of a security system in which generation, storage, management, and application of the signature are simplified.
The present invention can also be embodied as computer readable codes on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims. The preferred embodiments should be considered in descriptive sense only and not for purposes of limitation. Therefore, the scope of the invention is defined not by the detailed description of the invention but by the appended claims, and all differences within the scope will be construed as being included in the present invention.
Number | Date | Country | Kind |
---|---|---|---|
10-2006-0071654 | Jul 2006 | KR | national |