This application is related to U.S. patent application Ser. No. 10/819,574, filed Apr. 6, 2004, which is incorporated by reference herein.
1. Field of the Invention
This invention pertains in general to computer security and in particular to securing computer databases against code injection attacks.
2. Description of the Related Art
Databases are widespread in modern computing environments. Companies and other enterprises rely on databases to store both public and private data. Many enterprises provide publicly-accessible interfaces to their databases. For example, an electronic commerce web site typically includes a “search” field that accepts search terms and allows an end-user to search items for sale on the site. This search field is a publicly-accessible interface to a database that stores data describing the items for sale.
At a technical level, many of these publicly-accessible databases work by having a web server provide a web browser executing on the client with an HTML and/or JavaScript-based form. The web browser displays this form on the client, and the end-user provides values for the fields in the form. The end-user performs an action, such as pressing a “Submit” button, that causes the web browser to send the entered values to the server. At this point, back-end logic at the server constructs a query to the database using the user-supplied values. This query executes on the database and the server returns the results to the client web browser.
In an SQL (Structured Query Language) injection attack, the attacker fills out the form using specially-crafted data. These data, when used by the server to generate a query to the database, result in a malicious query being sent to the database on behalf of the attacker. The malicious query executes on the database and results in a malicious action.
For example, assume a form asks an end-user for his name and password. A legitimate user might enter “Jim” as his name and “Pickle” as his password. When these values are returned to the server, the server places the values into two variables, for example “name$” and “pass$”. The back-end logic constructs a query using the values of these variables. Assume that the query having the variables is:
Query$=“SELECT*FROM USERS WHERE NAME=‘“+name$+”’ AND PASS=‘“+pass$+”’”.
The back-end logic replaces the variables with the user-supplied values and produces the query:
Query$=“SELECT*FROM USERS WHERE NAME=‘Jim’ AND PASS=‘Pickle’”.
This query, when executed on the database, validates that the end-user supplied a matching name/password pair by returning the user's information if the data are correct.
To understand an SQL injection attack, consider what would happen if the user supplied the specially-crafted string:
‘OR AGE>=0--
as the name and “any” as the password. The back-end logic will construct the query to the database as:
Query$=“SELECT*FROM USERS WHERE NAME=‘ ’OR AGE>=0--’ AND PASS=‘any’”
As it turns out, the “--” sequence denotes a comment in SQL, so the resulting query is interpreted as follows:
SELECT*FROM USERS WHERE NAME=‘ ’OR AGE>=0
This query will select all users from the USERS table where the user's name is equal to the empty string ‘ ’ OR where the user's AGE (another field in the database in this example) is greater than or equal to zero years old. Since every user is at least zero years old, this augmented query will select all users and return their results to the attacker.
By using the techniques illustrated in this example, the attacker can inject code to obtain access to credit card numbers and other confidential information, modify or delete information on the database, or perform other malicious actions. Thus, there is a need in the art for a way to detect malicious queries and prevent them from executing on the database.
The above need is met by a database server that compares an incoming query against a set of template queries in order to determine whether the incoming query is legitimate, malicious (i.e., includes injected code), or anomalous. The database server receives an incoming query from a web server or another entity and converts the query into its canonical form. The database server compares the canonical incoming query with stored template queries. The template queries are canonical forms of queries known to be legitimate. If the input query matches one of the stored template queries, then the query is legitimate. Accordingly, one embodiment of the database server executes the query on a database.
If the canonical incoming query does not match one of the stored template queries, then the database server determines whether the incoming query is malicious or anomalous. The database server identifies tokens in the incoming query that are not present in a similar template query. If the tokens have meaning in the language utilized to express the query, the database server declares the query malicious. Otherwise, the database server declares the query anomalous.
The figures depict an embodiment of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
The network 114 enables data communication between and among the entities connected to the network and in one embodiment is the Internet. The network 114 can also utilize dedicated or private communications links that are not necessarily part of the Internet. In one embodiment, the network 114 uses standard communications technologies and/or protocols. Thus, the network 114 can include links using technologies such as Ethernet, 802.11, integrated services digital network (ISDN), digital subscriber line (DSL), asynchronous transfer mode (ATM), etc. Similarly, the networking protocols used on the network 114 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), and the file transfer protocol (FTP). The data exchanged over the network 114 can be represented using technologies and/or formats including the hypertext markup language (HTML), the extensible markup language (XML), the simple object access protocol (SOAP) etc. In addition, all or some of links can be encrypted using conventional encryption technologies such as the secure sockets layer (SSL), Secure HTTP and/or virtual private networks (VPNs). In another embodiment, the entities can use custom and/or dedicated data communications technologies instead of, or in addition to, the ones described above.
The web server 110 exchanges data with the client computers 112 via the network 114. In one embodiment, the web server 110 is in communication with a database server 116 having a database 118. The database server 116 is shown within the web server 110 of
In one embodiment, the web server 110 exchanges data with the client computers 112 using HTTP and/or other protocols. The exchanged data provides a means by which the client computers can access the database 118. The web server 110 receives data from the client computers 112, uses the data to generate queries, and passes the queries to the database server 116. The database server 116 verifies that queries are legitimate, executes the legitimate queries on the database 118 and, in some embodiments, passes the results of the queries back to the web server 110. The web server 110 can pass the results of the queries back to the client computers 112.
For example, the web server 110 can use HTTP to provide the client computers 112 with HTML-encoded web pages having forms for entering data. The web server 110 receives query strings from the client computers 112 incorporating data entered into the forms. The web server 110 extracts the data from the query strings and uses the data to create queries which it then passes to the database server 116 for execution. In one embodiment, the queries are formulated in the Structured Query Language (SQL), but other embodiments can use other languages or techniques for representing the queries. The web server 110 provides the results of the query to the client computers 112 via new HTML-encoded web pages.
In another example, the web server 110 uses web services to interact with the client computers 112. The web server 110 utilizes an XML schema that is provided to, or otherwise known by, the client computers 112. The web server 110 receives messages encoded using the XML schema from the client computers 112 via SOAP over HTTP. The web server 110 parses the XML-encoded messages to extract data, and uses the data to create the queries executed by the database server 116.
The client computers 112 are utilized by end-users to interact with the web server 110. In one embodiment, a client computer 112 is a typical personal computer such as an IBM-PC or Apple Macintosh compatible computer. In another embodiment, a client computer 112 is another type of electronic device, such as a cellular telephone, personal digital assistant (PDA), portable email device, etc. In one embodiment, a client computer 112 executes a web browser that receives an HTML-encoded web page from the web server 110 and displays it to the end-user. The web page contains a form for accepting information. The end-user uses a keyboard or other user input device to provide information into the form, and the web browser encodes the information in a query string and sends it to the web server 110. In another embodiment, a client computer 112 executes a program for interfacing with the web server 110 using web services.
As is known in the art, the computer system 200 is adapted to execute computer program modules. As used herein, the term “module” refers to computer program logic for providing the specified functionality. A module can be implemented in hardware, firmware, and/or software. When utilized, the modules are loaded into the memory 206 and executed by the processor 202.
The database server 116 includes an input/output (I/O) module 310. The I/O module 310 receives incoming queries from the web server 110 and/or other entities and also provides the results of the queries and/or determinations of whether the queries are legitimate, malicious, or anomalous to the appropriate entities. In one embodiment, a query is defined as an ordered set of tokens and parameterizable fields. A token is an independent unit of meaning in the language in which the query is specified. For example, in the SQL embodiment the “SELECT” statement is a token. Other tokens include the comment symbol (--), mathematical operators (e.g., +, −, and =), etc.
A parameterizable field is a location in a query where a variable is replaced with a literal for execution. The literal is typically based on a user-defined value such as a string or number. For example, the SQL fragment
The results of the query output by the I/O module 310 in one embodiment are the data in the database 118 satisfying the query. The results can likewise include one or more return codes indicating whether the query successfully executed and/or other information about the query execution. In one embodiment, the I/O module 310 returns a value describing whether the database server 116 found the query to be legitimate, malicious, or anomalous.
A storage module 312 holds templates representing canonical forms of known legitimate queries executed by the database server 116. In one embodiment, the templates are generated using the technique described in U.S. patent application Ser. No. 10/612,198, which is incorporated herein by reference. According to this technique, the templates are generated by observing all incoming queries to the database during a training period. Each query is added to an approved list of queries, and each query in the approved list is canonicalized (i.e., generalized) so it can be used to match different instances of the same query.
In one embodiment, the queries are canonicalized by removing the literals (i.e., the specific string and/or numeric values) from the parameterizable fields. For example, consider the two following non-canonicalized SQL queries:
In one embodiment, the storage module 312 also stores a set of data describing tokens having meaning in the language used to specify the queries. For example, in the SQL embodiment the storage module 312 stores a set of data describing SQL keywords (e.g., “exists” “member,” “concat,” “and”), operators (e.g., “=”, “−,” “+”), and special strings (e.g., comment indicators like “--” and “/*”). In one embodiment, the set of data stores the tokens as text strings or in another format that supports rapid comparison with other text strings.
In one embodiment, an injection detection module 314 evaluates an incoming database query against the query templates and determines whether the incoming query is legitimate, malicious (i.e., contains injected code) or anomalous. In one embodiment, the injection detection module 314 canonicalizes the incoming query. A query that includes an injection attack might be malformed, in which case the canonicalization process can fail. In one embodiment, if the canonicalization is successful, the injection detection module 314 compares the canonicalized query to the query templates in the storage module 312. If the canonicalized incoming query exactly matches any of the template queries, then in one embodiment the incoming query is considered legitimate.
If the canonicalized incoming query does not match a template query, or the incoming query cannot be canonicalized, then the incoming query is either malicious (i.e., a code injection attack) or anomalous. The injection detection module 314 performs additional tests on the incoming query to determine whether it is malicious. These tests are described below in relation to
A query reporting module 316 reports the results of the analyses performed by the injection detection module 314. The query reporting module 316 is operator-configured to provide different types of alerts based on the classification of the query. For example, the query reporting module 316 can be configured to generate a high-priority alert when a malicious query is detected, and a low-priority alert when an anomalous query is detected. In one embodiment, the output of the query reporting module 316 is provided to the I/O module 310. In another embodiment, the query reporting module 316 provides its report to other internal and/or external modules.
A query execution module 318 executes the input query on the database 318. In one embodiment, the query execution module 318 utilizes the results of the analyses performed by the injection detection module 314 to determine whether to execute the query. For example, the query execution module 318 can be configured to execute all queries classified as legitimate, but to not execute any queries classified as anomalous or malicious. Other embodiments can respond to the classifications in different manners.
The database server 116 receives 110 an incoming query from the web server 110 or another entity. The database server 116 converts the incoming query into its canonical form 412, if possible. Next, the database server 116 compares the canonical incoming query with the stored template queries. If 414 the canonical incoming query matches one of the stored template queries, then the query is legitimate 416. Accordingly, one embodiment of the database server 116 executes 418 the query on the database 118. The database server 116 reports 420 the results of the query back to the web server 110 and/or another entity.
If 414 the canonical incoming query does not match one of the stored template queries, or the incoming query cannot be canonicalized, then one embodiment of the database server 116 performs 422 additional tests to determine whether the query is malicious or anomalous. If the additional tests show that the query is malicious 424, the database server 116 reports 420 this result. If the tests show that the query is anomalous 426, the database server 116 likewise reports 420 this result.
Recall that in one embodiment, the steps of
The database server 116 determines 512 if all of the tokens in the candidate template query are present in the non-canonicalized incoming query, in the same order. In one embodiment, this determination 512 is performed by selecting a token in the template query (e.g., “USERS”) and using string matching to determine whether the characters forming the token are present in the incoming query. This process is repeated for each token of the template query, and the tokens are selected in the order that they appear in the template query. In one embodiment, the database server 116 does not search for certain tokens in the incoming query. For example, it is unnecessary to search for tokens occurring prior to the first “WHERE” in an SQL statement because no code injections are possible prior to that point. In addition, the database server 116 does not search for tokens that are artifacts of the canonicalization process, such as wildcards, because such tokens are not necessarily found in the incoming query.
The database server 116 identifies 514 any strings in the incoming query that are found between two matching tokens and/or after the final matching token. Since these additional strings are found in the incoming query but are not present in the template query, the additional strings were possibly introduced through a code injection attack. In one embodiment, the database server 116 stores any additional strings for analysis after all of the tokens of the template query have been sought in the incoming query. In another embodiment, the database server 116 analyzes each additional string as it is found.
If 516 not every token in the template query is found in the incoming query or no additional strings are found, the database server 116 declares 426 the query anomalous. Otherwise, the database server 116 tokenizes 518 each additional string found in the incoming query by separating the string into character sequences. This tokenization can be performed, for example, by declaring that each set of characters between two spaces forms a token. The database server 116 then determines 520 whether any of the tokens from the additional strings have meaning within the query language. In one embodiment, this latter step is performed by determining whether any of the tokens are found in the set of meaningful SQL tokens stored in the storage module 312. If 520 any token is found in the set of tokens stored by the storage module 312, then the token was added by a code injection attack and the database server 116 declares the query malicious.
The operation shown in
SELECT*FROM USERS WHERE NAME=‘ ’ OR AGE>=0--’ AND PASS=‘any’.
This query includes a code injection attack embodied by the string “‘OR AGE>=0--” and will not match any of the template queries. The database server 116 will select as the most similar template query:
SELECT*FROM USERS WHERE NAME=‘ ’ AND PASS=‘ ’.
The database server 116 identifies the first token after the “WHERE” statement in the template query, “NAME,” and searches for it in the incoming query. This token is found in the incoming query. Likewise, the next two tokens, “=” and “AND” are found in the incoming query. However, the database server 116 detects that the incoming query contains the additional string “‘OR AGE>=0--” between these two tokens.
The database server 116 tokenizes the additional string and searches for each token in the storage module 312. In this example, each token (e.g., the single quote and “OR”) are found in the storage module 312 because these tokens have meaning in SQL. Therefore, the database server 116 declares the query malicious.
In one embodiment, the database server 116 performs other tests on incoming queries found to be anomalous and/or malicious in order to reduce false positive detections. One such test performed by an embodiment of the database server 116 is determining whether the one or more additional tokens in the incoming query occur near a parameterizable field in the template query. All code injections will occur around parameterizable fields because these fields are the only places that the attacker can introduce the code. Therefore, one embodiment of the database server 116 does not declare malicious incoming queries that have additional tokens at locations not near parameterizable fields. Further, one embodiment does not declare queries with extra tokens malicious unless the extra tokens occur immediately after a parameterizable field. Depending upon the embodiment, the database server 116 can declare these non-malicious queries as anomalous or legitimate. Other embodiments can use different tests in addition to, or instead of, the ones described herein.
The above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. The scope of the invention is to be limited only by the following claims. From the above discussion, many variations will be apparent to one skilled in the relevant art that would yet be encompassed by the spirit and scope of the invention.
| Number | Name | Date | Kind |
|---|---|---|---|
| 4959849 | Bhusri | Sep 1990 | A |
| 5040214 | Grossberg et al. | Aug 1991 | A |
| 5355474 | Thuraisngham et al. | Oct 1994 | A |
| 5546576 | Cochrane et al. | Aug 1996 | A |
| 5584024 | Shwartz | Dec 1996 | A |
| 5664172 | Antoshenkov | Sep 1997 | A |
| 5742806 | Reiner et al. | Apr 1998 | A |
| 5768422 | Yaeger | Jun 1998 | A |
| 5774888 | Light | Jun 1998 | A |
| 5812840 | Shwartz | Sep 1998 | A |
| 5826076 | Bradley et al. | Oct 1998 | A |
| 6088803 | Tso et al. | Jul 2000 | A |
| 6128740 | Curry et al. | Oct 2000 | A |
| 6212524 | Weissman et al. | Apr 2001 | B1 |
| 6282539 | Luca | Aug 2001 | B1 |
| 6311278 | Raanan et al. | Oct 2001 | B1 |
| 6314409 | Schneck et al. | Nov 2001 | B2 |
| 6356887 | Berenson et al. | Mar 2002 | B1 |
| 6584569 | Reshef et al. | Jun 2003 | B2 |
| 6598038 | Guay et al. | Jul 2003 | B1 |
| 6775657 | Baker | Aug 2004 | B1 |
| 6775827 | Harkins | Aug 2004 | B1 |
| 6839850 | Campbell et al. | Jan 2005 | B1 |
| 6928553 | Xiong et al. | Aug 2005 | B2 |
| 7050936 | Levy et al. | May 2006 | B2 |
| 7085780 | Sakamoto et al. | Aug 2006 | B2 |
| 7085928 | Schmid et al. | Aug 2006 | B1 |
| 7099871 | Faybishenko et al. | Aug 2006 | B2 |
| 7120645 | Manikutty et al. | Oct 2006 | B2 |
| 7120933 | Mattsson | Oct 2006 | B2 |
| 7185232 | Leavy et al. | Feb 2007 | B1 |
| 7237265 | Reshef et al. | Jun 2007 | B2 |
| 20020065896 | Burakoff et al. | May 2002 | A1 |
| 20020083343 | Crosbie et al. | Jun 2002 | A1 |
| 20020087882 | Schneider et al. | Jul 2002 | A1 |
| 20020133606 | Mitomo et al. | Sep 2002 | A1 |
| 20020157020 | Royer | Oct 2002 | A1 |
| 20030037251 | Frieder et al. | Feb 2003 | A1 |
| 20030051026 | Carter et al. | Mar 2003 | A1 |
| 20030069880 | Harrison et al. | Apr 2003 | A1 |
| 20030101355 | Mattsson | May 2003 | A1 |
| 20030133554 | Nykanen et al. | Jul 2003 | A1 |
| 20030145226 | Bruton et al. | Jul 2003 | A1 |
| 20030154402 | Pandit et al. | Aug 2003 | A1 |
| 20030167229 | Ludwig et al. | Sep 2003 | A1 |
| 20030188189 | Desai et al. | Oct 2003 | A1 |
| 20030204719 | Ben-Itzhak | Oct 2003 | A1 |
| 20030221123 | Beavers | Nov 2003 | A1 |
| 20040098617 | Sekar | May 2004 | A1 |
| 20040098623 | Scheidell | May 2004 | A1 |
| 20040139061 | Colossi et al. | Jul 2004 | A1 |
| 20040193656 | Pizzo et al. | Sep 2004 | A1 |
| 20040199535 | Zuk | Oct 2004 | A1 |
| 20040205360 | Norton et al. | Oct 2004 | A1 |
| 20040220915 | Kline et al. | Nov 2004 | A1 |
| 20040250127 | Scoredos et al. | Dec 2004 | A1 |
| 20040250134 | Kohler et al. | Dec 2004 | A1 |
| 20040254919 | Giuseppini | Dec 2004 | A1 |
| 20040260945 | Raikar et al. | Dec 2004 | A1 |
| 20050086529 | Buchsbaum | Apr 2005 | A1 |
| 20050097149 | Vaitzblit et al. | May 2005 | A1 |
| 20050108063 | Madill et al. | May 2005 | A1 |
| 20050138006 | Bennett et al. | Jun 2005 | A1 |
| 20050138426 | Styslinger | Jun 2005 | A1 |
| 20050154733 | Meltzer et al. | Jul 2005 | A1 |
| 20050203886 | Wong | Sep 2005 | A1 |
| 20050203921 | Newman et al. | Sep 2005 | A1 |
| 20050273859 | Chess et al. | Dec 2005 | A1 |
| 20050289187 | Wong et al. | Dec 2005 | A1 |
| 20060070128 | Heimerdinger et al. | Mar 2006 | A1 |
| 20060117386 | Gupta et al. | Jun 2006 | A1 |
| 20060212438 | Ng | Sep 2006 | A1 |
| 20060212941 | Bronnikov et al. | Sep 2006 | A1 |
| 20060242136 | Hammond et al. | Oct 2006 | A1 |
| 20060248440 | Rhoads et al. | Nov 2006 | A1 |
| 20060271708 | Bolles et al. | Nov 2006 | A1 |
| 20070074188 | Huang et al. | Mar 2007 | A1 |
| 20070094728 | Julisch et al. | Apr 2007 | A1 |
| 20070156666 | VanRiper et al. | Jul 2007 | A1 |
| 20070169194 | Church et al. | Jul 2007 | A1 |
| Number | Date | Country |
|---|---|---|
| 287310 | Oct 1988 | EP |
| WO 0171499 | Sep 2001 | WO |