This application claims priority of Taiwan Patent Application No. 099102049 filed on Jan. 26, 2010, the entirety of which is incorporated by reference herein.
1. Technical Field
The invention relates to web mimicry attacks, and more particularly, to detection methods and devices for detecting web mimicry attacks.
2. Related Art
Presently, web sites are being developed to provide many application programs in order to provide diversified application services. However, this may put web servers at greater risk of malicious attacks.
Most web application attacks are script-based, so an attacker can vary an attack flexibly each time it is launched. This makes web mimicry attacks more serious. A web mimicry attack is a variable attack method through which hackers may gain access to web sites: the attack is disguised so that a web intrusion detection system deems it a normal action. As a result, the attack goes undetected, and hackers may access the web sites to manipulate data, steal data or maliciously attack the web sites.
Conventional web intrusion detection methods detect web attacks on the basis of characters. However, character-based detection makes web mimicry attacks easier to carry out. Subsequently, tokens were used in place of characters: a hypertext transfer protocol request is segmented into a token sequence and a model of normal actions is constructed for detecting attacks. However, this conventional method does not fully consider the probability of correlation among adjacent tokens.
Therefore, web mimicry attack detection methods and devices for effectively modeling correlation of adjacent tokens are desired.
One aspect of the present invention is to provide a web mimicry attack detection device, comprising: a first token sequence collector receiving a hypertext transfer protocol request and extracting string content of the hypertext transfer protocol request according to a token collection method to generate a token sequence corresponding to the hypertext transfer protocol request, wherein the token sequence comprises a plurality of the tokens; and a mimicry attack detector generating a label and a confidence score corresponding individually to the tokens according to the tokens and a conditional random field probability model, summing the confidence score individually corresponding to the tokens in the token sequence by a summary rule to generate a summary confidence score, and determining whether the hypertext transfer protocol request is an attack according to the summary confidence score and the label individually corresponding to the tokens.
Another aspect of the present invention is to provide a web mimicry attack detection method, comprising: constructing a conditional random field probability model; receiving a hypertext transfer protocol request by a first token sequence collector; extracting string content of the hypertext transfer protocol request according to a token collection method to generate a token sequence corresponding to the hypertext transfer protocol request, wherein the token sequence comprises a plurality of the tokens; generating a label and a confidence score corresponding individually to the tokens according to the tokens and a conditional random field probability model; summing the confidence score individually corresponding to the tokens in the token sequence by a summary rule to generate a summary confidence score; and determining whether the hypertext transfer protocol request is an attack according to the summary confidence score and the label individually corresponding to the tokens.
The advantage and spirit of the application will be better understood by the following recitations and the appended drawings.
The application can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
The first token sequence collector 102 in the web mimicry attack detection device 10 receives a hypertext transfer protocol request HR and extracts string content of the hypertext transfer protocol request HR according to a token collection method to generate a token sequence TS corresponding to the hypertext transfer protocol request HR, wherein the token sequence TS comprises a plurality of the tokens.
As shown in the
The web mimicry attack detector 103 in the web mimicry attack detection device 10 generates a label and a confidence score corresponding individually to the tokens according to all of the tokens of the token sequence TS and a conditional random field probability model CRFM generated by the token probability module 101, and sums the confidence scores individually corresponding to the tokens in the token sequence TS by a summary rule to generate a summary confidence score. Next, the web mimicry attack detector 103 determines whether or not the hypertext transfer protocol request is an attack according to the summary confidence score and the labels individually corresponding to the tokens.
For example, the web mimicry attacks detector 103 receives a hypertext transfer protocol request and a token sequence as shown in
In the token sequence shown in the
Therefore, as shown in the
The web mimicry attacks detector 103 determines a label and a confidence score for every one of the tokens in the token sequence according to the conditional random field probability model CRFM generated by the token probability module 101, wherein the label corresponding individually to the tokens is a normal or offensive classification name.
For example, the web mimicry attacks detector 103 determines a label “A1” and a confidence score “0.6” for the first token in the token sequence according to the conditional random field probability model CRFM, wherein the label “A1” and the confidence score “0.6” represent that the probability that the first token is a first type of attack is 60%.
As another example, the web mimicry attack detector 103 determines a label “A2” and a confidence score “0.4” for the second token in the token sequence according to the conditional random field probability model CRFM, wherein the label “A2” and the confidence score “0.4” represent that the probability that the second token is a second type of attack is 40%, and so on. The label “N” is the normal classification name and the labels “A1”˜“A7” are offensive classification names. For example, the label “A1” represents a first type of attack, the label “A2” represents a second type of attack, and so on. The invention is not limited to the first to seventh types of attack. A person skilled in the art can determine the classification of network attacks according to practical requirements.
Therefore, the web mimicry attack detector 103 determines a label and a confidence score for every one of the tokens in the token sequence according to the conditional random field probability model CRFM, and then determines whether the hypertext transfer protocol request HR is an attack, and the type of attack, according to the labels individually corresponding to the tokens and the summary confidence score obtained by summing all confidence scores. When the hypertext transfer protocol request HR is determined to be an attack, the attack warning signal AS is output, wherein the attack warning signal AS indicates the type of attack of the hypertext transfer protocol request HR.
The conditional random field probability model CRFM is generated by the token probability module 101. The token probability module 101 in the web mimicry attack detection device 10 comprises a normal/offensive string database 1011, a second token sequence collector 1012, a token sequence correlator 1013 and a probability modeler 1014.
The normal/offensive string database 1011 stores normal string data NSD and offensive string data ASD, wherein the normal string data NSD and the offensive string data ASD are first defined by experts and the normal string data NSD and the offensive string data ASD are used to construct the conditional random field probability model CRFM by the token probability module 101.
The second token sequence collector 1012 extracts the normal string data NSD and the offensive string data ASD according to the token collection method to generate a normal token sequence NTS corresponding to the normal string data NSD and an offensive token sequence ATS corresponding to the offensive string data ASD, wherein the token collection method defines a rule that a token must be either a special symbol or a string composed of letters and digits.
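The token collection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the regular expression, the function name `collect_tokens`, and the choice to treat white space purely as a separator are assumptions, not details given in this description.

```python
import re
from typing import List

# Assumed reading of the rule: a token is either a run of ASCII letters and
# digits, or a single special symbol. White space only separates tokens.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def collect_tokens(text: str) -> List[str]:
    """Split a string into alphanumeric runs and single special symbols."""
    return TOKEN_PATTERN.findall(text)


# Hypothetical request string, used only to show the shape of the output.
tokens = collect_tokens("/index.php?id=1' or '1'='1")
print(tokens)
```

Under this reading, every special symbol becomes its own token, so the quotation marks and equals signs of an injected string remain visible to the model as separate tokens.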
The token sequence correlator 1013 calculates probabilities of adjacent token correlations in the normal token sequence NTS and probabilities of adjacent token correlations in the offensive token sequence ATS, and then constructs an adjacent token correlations probability table to generate a plurality of model parameters.
The probability modeler 1014 constructs the conditional random field probability model CRFM according to the model parameters. As shown in the
For example, given the token x2, the probability that the token x1 appears in front of the token x2 and the probability that the token x3 appears behind the token x2 are gathered statistically. The adjacent token correlations probability table is constructed by considering, for every token in the token sequence in order, the probability of its correlation with the token in front of it and with the token behind it. The model parameters are then generated according to the adjacent token correlations probability table.
For example, given the appearance probability of the token x2, the probability that the token x1 appears in front of the token x2 and the probability that the token x3 appears behind the token x2 are gathered statistically. Given the appearance probability of the token x3, the probability that the token x2 appears in front of the token x3 and the probability that the token x4 appears behind the token x3 are gathered statistically. Given the appearance probability of the token x1, the probability that the token x2 appears behind the token x1 is gathered statistically.
Therefore, the adjacent token correlations probability table is generated by statistically gathering the correlations of every token in the normal token sequence NTS corresponding to the normal string data NSD and in the offensive token sequence ATS corresponding to the offensive string data ASD. The model parameters are then generated according to the adjacent token correlations probability table.
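The statistics-gathering step can be illustrated with a minimal Python sketch that estimates, for each token, the relative frequency of the tokens seen directly behind it and directly in front of it. The function names and the use of plain relative frequencies are assumptions made for illustration; the description does not specify how the table is stored, and this sketch does not perform conditional random field training.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Table = Dict[str, Dict[str, float]]


def _normalize(counts: Dict[str, Counter]) -> Table:
    """Turn raw neighbor counts into relative frequencies per token."""
    table: Table = {}
    for token, neighbors in counts.items():
        total = sum(neighbors.values())
        table[token] = {n: c / total for n, c in neighbors.items()}
    return table


def adjacent_token_tables(sequences: Iterable[List[str]]) -> Tuple[Table, Table]:
    """Return (behind, in_front): P(next | token) and P(previous | token)."""
    behind: Dict[str, Counter] = defaultdict(Counter)
    in_front: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            behind[left][right] += 1    # right appears behind left
            in_front[right][left] += 1  # left appears in front of right
    return _normalize(behind), _normalize(in_front)


# Hypothetical token sequences standing in for NTS / ATS.
behind, in_front = adjacent_token_tables([["id", "=", "1"], ["id", "=", "2"]])
print(behind["="], in_front["="])
```

In practice, the normal and offensive token sequences would likely be tabulated separately (or with their labels attached) so that the resulting parameters can distinguish the two classes.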
The first data variability reducer 1021 preprocesses the string content of the hypertext transfer protocol request HR by decoding strings, canceling repetitions and adding white space, and converting all letters of the string to lower case. The first token sequence generator 1022 extracts the preprocessed string content of the hypertext transfer protocol request HR according to the token collection method to generate the token sequence TS corresponding to the hypertext transfer protocol request HR.
The second data variability reducer 10121 preprocesses the string content of the normal string data NSD and the offensive string data ASD by decoding strings, canceling repetitions and adding white space, and converting all letters of the string to lower case. The second token sequence generator 10122 extracts the preprocessed string content of the normal string data NSD and the offensive string data ASD according to the token collection method to generate the normal token sequence NTS corresponding to the normal string data NSD and the offensive token sequence ATS corresponding to the offensive string data ASD.
It is noteworthy that if the label corresponding to any token in the token sequence belongs to any type of attack, the hypertext transfer protocol request is determined to be an attack. In other words, the hypertext transfer protocol request is determined to be a normal hypertext transfer protocol request only when the labels corresponding to the tokens in the token sequence are all the label “N”.
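A data variability reducer of this kind might look like the following Python sketch. The exact operations are not specified in this description, so the sketch makes three labeled assumptions: "decoding strings" is taken to mean URL percent-decoding, "canceling repetitions" is taken to mean collapsing runs of white space into one space, and the function name `reduce_variability` is hypothetical.

```python
import re
from urllib.parse import unquote_plus


def reduce_variability(raw: str) -> str:
    """Normalize a request string before token collection (assumed steps)."""
    # Assumption: decoding means percent-decoding, with '+' read as a space.
    decoded = unquote_plus(raw)
    # Assumption: repetitions are runs of white space, collapsed to one space.
    collapsed = re.sub(r"\s+", " ", decoded).strip()
    # Stated in the description: rewrite all letters in lower case.
    return collapsed.lower()


# Hypothetical encoded query fragment.
print(reduce_variability("ID%3D1%20%20%20OR+1"))
```

Applying the same reducer to both training strings and incoming requests keeps the token vocabulary consistent between the two token sequence collectors.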
The token T1 corresponds to a label “N”, the token T2 corresponds to a label “A1” and a confidence score “f2”, the token T3 corresponds to a label “A1” and a confidence score “f3”, the token T4 corresponds to a label “A2” and a confidence score “f4” and the token T5 corresponds to a label “A2” and a confidence score “f5”. The label “N” represents that the token corresponding to the label “N” is normal. The label “A1” represents that the token corresponding to the label “A1” is a first type of attack and the label “A2” represents that the token corresponding to the label “A2” is a second type of attack. The confidence score is the probability that the token belongs to a first type of attack or the probability that the token belongs to a second type of attack.
The web mimicry attacks detector 103 determines that the token sequence belongs to a type of attack according to all of the labels and all of the confidence scores corresponding to the tokens in the token sequence. For example, as shown in the
According to all confidence scores corresponding to the tokens in the token sequence, the confidence score “f2” and the confidence score “f3” belong to a first type of attack and the confidence score “f4” and the confidence score “f5” belong to a second type of attack. Therefore, the total confidence score with which the token sequence belongs to a first type of attack is f2+f3 and the total confidence score with which the token sequence belongs to a second type of attack is f4+f5. The web mimicry attack detector 103 determines that the token sequence belongs to a first type of attack when f2+f3>f4+f5, determines that the token sequence belongs to a second type of attack when f4+f5>f2+f3, and determines that the token sequence belongs to both a first type of attack and a second type of attack when f2+f3=f4+f5. However, a person skilled in the art knows that the condition f2+f3=f4+f5 may not occur.
In another example, the web mimicry attack detector 103 determines the type of attack to which the token sequence belongs according to the number of appearances of each label, and then according to the confidence scores when different labels appear the same number of times. For example, the web mimicry attack detector 103 determines that the token sequence belongs to a first type of attack when the label “A1” appears more times in the token sequence than any other label.
The web mimicry attack detector 103 determines the type of attack to which the token sequence belongs according to the total confidence scores when different labels appear the same number of times. For example, when the label “A1” and the label “A2” appear the same number of times in a token sequence and more times than any other label, the web mimicry attack detector 103 determines the type of attack according to the sum of the confidence scores corresponding to the label “A1” and the sum of the confidence scores corresponding to the label “A2”. The web mimicry attack detector 103 determines that the token sequence belongs to a first type of attack when the sum of the confidence scores corresponding to the label “A1” is larger than the sum of the confidence scores corresponding to the label “A2”, and determines that the token sequence belongs to a second type of attack when the sum of the confidence scores corresponding to the label “A1” is smaller than the sum of the confidence scores corresponding to the label “A2”. Note that the invention is not limited to the comparing order of the labels and the confidence scores or the comparing order of the labels and the weighted confidence scores.
Therefore, the web mimicry attack detector 103 determines whether the hypertext transfer protocol request is normal or belongs to a type of attack according to every label and every confidence score corresponding to the token sequence.
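The comparison of f2+f3 against f4+f5 can be sketched as a per-label sum in Python. The function name, the (label, score) pair representation, and the use of `math.isclose` to detect ties are assumptions for illustration only; the numeric scores below are made-up placeholders for f2 to f5.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def attack_types_by_total_confidence(labeled: List[Tuple[str, float]]) -> List[str]:
    """Return the attack label(s) with the largest summed confidence.

    An empty list means every token carried the normal label "N".
    Several labels are returned only when their totals tie.
    """
    totals: Dict[str, float] = defaultdict(float)
    for label, score in labeled:
        if label != "N":  # normal tokens do not contribute to any attack total
            totals[label] += score
    if not totals:
        return []
    best = max(totals.values())
    return sorted(l for l, t in totals.items() if math.isclose(t, best))


# Tokens T1..T5 with placeholder scores: A1 total 0.9, A2 total 0.6.
sequence = [("N", 0.9), ("A1", 0.6), ("A1", 0.3), ("A2", 0.4), ("A2", 0.2)]
print(attack_types_by_total_confidence(sequence))
```

Returning a list rather than a single label keeps the tie case explicit, matching the statement that the token sequence may belong to both types when the totals are equal.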
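The count-first, confidence-second ordering can be sketched in Python as a two-part sort key. This is a hedged illustration: the function name and data layout are hypothetical, and, as the text notes, other comparison orders or weighted scores are equally possible.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def attack_type_by_count_then_confidence(
    labeled: List[Tuple[str, float]],
) -> Optional[str]:
    """Pick the attack label that appears most often.

    Ties on the count are broken by the larger summed confidence score.
    Returns None when all tokens are labeled "N".
    """
    counts: Counter = Counter()
    totals: Dict[str, float] = defaultdict(float)
    for label, score in labeled:
        if label == "N":
            continue
        counts[label] += 1
        totals[label] += score
    if not counts:
        return None
    # Tuples compare element by element: count first, then total confidence.
    return max(counts, key=lambda l: (counts[l], totals[l]))


# "A1" wins on count even though "A2" has the higher single score.
print(attack_type_by_count_then_confidence([("A1", 0.2), ("A1", 0.2), ("A2", 0.9)]))
```

If two labels tie on both count and total confidence, this sketch returns whichever was encountered first; a real detector would need an explicit policy for that case.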
The detection step S61 comprises: receiving a hypertext transfer protocol request HR by the first token sequence collector in step S611; extracting string content of the hypertext transfer protocol request HR according to the token collection method to generate a token sequence TS corresponding to the hypertext transfer protocol request HR in step S612, wherein the token sequence TS comprises a plurality of the tokens; generating a label and a confidence score corresponding individually to the tokens according to the conditional random field probability model CRFM generated by the token probability module 101 (step S613); in step S614, summing the confidence score individually corresponding to the tokens in the token sequence TS by a summary rule to generate a summary confidence score; and in step S615, determining whether the hypertext transfer protocol request HR is an attack according to the summary confidence score and the label individually corresponding to the tokens in the token sequence TS and outputting an attack warning signal AS when determining that the hypertext transfer protocol request HR is an attack.
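Steps S611 to S615 can be strung together as in the following Python sketch. The conditional random field model is replaced here by a pluggable `labeler` callable, and `toy_labeler` is a made-up stand-in that flags single quotation marks; neither is part of this description. The summary rule used is a plain per-label sum, which is one assumed choice among the rules discussed above.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A labeler maps a token sequence to one (label, confidence) pair per token.
Labeler = Callable[[List[str]], List[Tuple[str, float]]]


def detect(request: str, labeler: Labeler) -> Optional[str]:
    """Return the detected attack type, or None for a normal request."""
    # S611/S612: receive the request and collect tokens (lower-cased first).
    tokens = re.findall(r"[a-z0-9]+|[^a-z0-9\s]", request.lower())
    # S613: one label and one confidence score per token.
    labeled = labeler(tokens)
    # S614: sum confidence scores per attack label (assumed summary rule).
    totals: Dict[str, float] = {}
    for label, score in labeled:
        if label != "N":
            totals[label] = totals.get(label, 0.0) + score
    # S615: the request is normal only if no token has an attack label.
    if not totals:
        return None
    return max(totals, key=lambda l: totals[l])


def toy_labeler(tokens: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for the CRF model: flags single quotes as A1."""
    return [("A1", 0.6) if t == "'" else ("N", 0.9) for t in tokens]


print(detect("/item?id=1", toy_labeler))
print(detect("/item?id=1' OR '1'='1", toy_labeler))
```

A non-None return value corresponds to the point at which the attack warning signal AS would be output with the attack type.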
While the invention has been described by way of example and in terms of the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Number | Date | Country | Kind |
---|---|---|---|
099102049 | Jan 2010 | TW | national |