This invention relates generally to the field of web applications and, more specifically, to a system and a method for providing a symbolic execution engine for validating the functionality of web applications.
Typically, a software application is validated through testing, in which a series of regression tests is run either manually or automatically after each modification of the software. Such testing techniques usually give poor functional coverage of the application under test and may be time consuming. To address these issues, formal verification techniques have emerged as an alternative technology for validating software systems. Such verification tools try to mathematically prove that a software application satisfies a specific requirement, or to obtain a counterexample in the form of a test case that breaks the requirement, thereby pointing to a bug.
A formal verification system used in software validation typically uses a state-based model checker as its internal proof engine. The checker requires non-deterministic user inputs in the drivers that feed the application being checked. Such model checkers cannot reason on a complete input space. For example, given the complete range of integers, strings, etc., such a checker can only evaluate the possible scenarios that are specified in the drivers.
Symbolic execution is a different, stateless type of model checking that treats all inputs to a program as symbols and creates complex equations by executing all possible paths in the program. These equations are then solved through a solver (generally called a decision procedure) to obtain error scenarios, if any. Thus far, symbolic execution has only been successful in handling primitive types like integers, floats, and Booleans in Java programs, which are used to create most web applications. However, in the case of web applications, most of the inputs and primitive types are strings. Hence, it is necessary to model strings in the symbolic execution algebra. Also, it may be necessary to symbolically model frequently used data structures in web applications like lists, maps, sets, etc. for performance reasons.
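The idea of collecting one equation per program path and handing it to a solver can be illustrated with a minimal Java sketch. Everything in it is hypothetical and chosen only for illustration: the class name `PathConditionSketch`, the toy program being analyzed (shown in the comment), and the use of a closed integer interval as a stand-in for a real decision procedure. Interval intersection is exact for this toy case only because every constraint compares a single symbolic integer against a constant.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: path conditions over one symbolic int, kept as an interval. */
final class PathConditionSketch {

    /** Closed interval [lo, hi] of values the symbolic input may take on a path. */
    static final class Interval {
        final long lo, hi;
        Interval(long lo, long hi) { this.lo = lo; this.hi = hi; }

        static Interval allInts() { return new Interval(Integer.MIN_VALUE, Integer.MAX_VALUE); }
        Interval atLeast(long c)     { return new Interval(Math.max(lo, c), hi); }
        Interval atMost(long c)      { return new Interval(lo, Math.min(hi, c)); }
        Interval greaterThan(long c) { return atLeast(c + 1); }
        Interval lessThan(long c)    { return atMost(c - 1); }
        boolean feasible()           { return lo <= hi; }
    }

    /*
     * Toy program under analysis (assumed for illustration):
     *
     *   void handle(int x) {
     *       if (x > 100) { if (x < 50)  fail(); }   // error path 1
     *       else         { if (x == 42) fail(); }   // error path 2
     *   }
     */
    static List<Long> errorWitnesses() {
        List<Long> witnesses = new ArrayList<>();
        Interval x = Interval.allInts();

        // Error path 1: x > 100 AND x < 50 (a false path, no solution)
        Interval p1 = x.greaterThan(100).lessThan(50);
        if (p1.feasible()) witnesses.add(p1.lo);

        // Error path 2: NOT (x > 100) AND x == 42
        Interval p2 = x.atMost(100).atLeast(42).atMost(42);
        if (p2.feasible()) witnesses.add(p2.lo);

        return witnesses;
    }

    public static void main(String[] args) {
        System.out.println("Inputs that reach fail(): " + errorWitnesses());
    }
}
```

In this sketch the first error path is reported as infeasible, while the second yields a concrete input that can serve as a test case, without any input value having been supplied by a driver.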
Therefore, the ability to solve verification problems in web applications creates an interesting challenge. As with all such processing operations, of critical importance are issues relating to speed, accuracy, and automation.
The present invention provides a method and a system for providing a symbolic execution engine for web applications that substantially eliminates or reduces at least some of the disadvantages and problems associated with previous methods and systems.
In accordance with a particular embodiment of the present invention, a method is offered that includes generating symbolic string manipulations for one or more web applications. The manipulations are generalized into a string manipulation symbolic algebra. The method also includes performing an integrated symbolic execution on other primitive data types like integers or Boolean values present in web applications. Typically, a Java model checker is augmented to check for certain types of properties while performing the symbolic execution. If an error scenario exists, a solution to a set of symbolic constraints is obtained, and the solution is mapped back to the source code to obtain an error trace.
In specific embodiments, a set of properties are identified that can be checked by symbolic execution type model checking, whereby properties are encoded through templates and checked using third party off-the-shelf decision procedures. The properties being checked can relate to security validation. Also, the symbolic execution can be customized and tuned for different types of Java-based web applications.
Technical advantages of particular embodiments of the present invention include: 1) exhaustive checking over the input domain and feasible program execution paths; 2) creating user inputs in drivers becomes unnecessary; 3) unexpected errors/behaviors can be uncovered; and 4) automatic test data generation is available to uncover bugs, if present.
Other technical advantages will be readily apparent to one skilled in the art from the following figures, descriptions, and claims. Moreover, while specific advantages have been enumerated above, various embodiments may include all, some or none of the enumerated advantages.
For a more complete understanding of particular embodiments of the invention and their advantages, reference is now made to the following descriptions, taken in conjunction with the accompanying drawings, in which:
In accordance with the teachings of example embodiments of the present invention, the architecture presented herein creates a new symbolic execution engine that is tuned to web applications. Off-the-shelf components (e.g., Java model checker 14 and decision procedure solver 26) can be used to check for certain types of requirements/properties that could not previously be checked.
As a result, the symbolic execution of this instrumented code can have better performance and need fewer computing resources. The instrumentation phase creates a symbolic model 36 of the web application. It should be appreciated that different web applications can be studied to compile a series of possible symbolic manipulation functions on primitive data types. In this embodiment of the invention, string manipulation functions such as concatenation, truncation, upper case/lower case, etc., are generalized into a string manipulation symbolic algebra. Common data structures used in web applications such as lists, maps, arrays, etc. have also been symbolically modeled. These symbolic data manipulation functions are stored in a symbolic class library 38.
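One way to picture such a string manipulation symbolic algebra is as an expression tree in which each string operation records itself rather than computing a concrete value. The following Java sketch is a hypothetical illustration only; the class names (`SymbolicStringSketch`, `SymString`, `Sym`, `Lit`, and so on) and method names are assumptions made here and are not the contents of symbolic class library 38.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a string manipulation symbolic algebra. */
final class SymbolicStringSketch {

    /** A string-valued expression over named symbolic inputs. */
    abstract static class SymString {
        /** Concretizes the expression under a binding of symbol names to values. */
        abstract String eval(Map<String, String> binding);
        /** Renders the recorded expression. */
        abstract String show();

        // Each operation builds a new node instead of computing a string.
        SymString concat(SymString other) { return new Concat(this, other); }
        SymString toUpperCase()           { return new Upper(this); }
        SymString truncate(int maxLen)    { return new Truncate(this, maxLen); }
    }

    /** A symbolic input such as a form field. */
    static final class Sym extends SymString {
        final String name;
        Sym(String name) { this.name = name; }
        String eval(Map<String, String> b) {
            String v = b.get(name);
            if (v == null) throw new IllegalArgumentException("unbound symbol: " + name);
            return v;
        }
        String show() { return name; }
    }

    /** A constant string appearing in the program text. */
    static final class Lit extends SymString {
        final String value;
        Lit(String value) { this.value = value; }
        String eval(Map<String, String> b) { return value; }
        String show() { return "\"" + value + "\""; }
    }

    static final class Concat extends SymString {
        final SymString left, right;
        Concat(SymString left, SymString right) { this.left = left; this.right = right; }
        String eval(Map<String, String> b) { return left.eval(b) + right.eval(b); }
        String show() { return "concat(" + left.show() + ", " + right.show() + ")"; }
    }

    static final class Upper extends SymString {
        final SymString arg;
        Upper(SymString arg) { this.arg = arg; }
        String eval(Map<String, String> b) { return arg.eval(b).toUpperCase(Locale.ROOT); }
        String show() { return "upper(" + arg.show() + ")"; }
    }

    static final class Truncate extends SymString {
        final SymString arg;
        final int maxLen;
        Truncate(SymString arg, int maxLen) { this.arg = arg; this.maxLen = maxLen; }
        String eval(Map<String, String> b) {
            String s = arg.eval(b);
            return s.length() <= maxLen ? s : s.substring(0, maxLen);
        }
        String show() { return "truncate(" + arg.show() + ", " + maxLen + ")"; }
    }
}
```

For example, `new SymbolicStringSketch.Lit("login=").concat(new SymbolicStringSketch.Sym("login").toUpperCase().truncate(3))` builds an expression over the unknown input `login`, and a later analysis can either inspect that expression or concretize it with `eval` once a candidate input is known.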
At this point, a traditional state-based Java model checker 40 is invoked to perform the symbolic execution on symbolic model 36, where the instrumented functions are interpreted using symbolic class library 38.
The result is a series of complex equations that model the non-string symbolic data and a series of finite-state machines (FSMs) that model the symbolic string data, shown as a symbolic equations and string FSMs component 42. These are fed into an off-the-shelf decision procedure that solves the non-string equations and an FSM intersector (shown as component 44), which intersects the sets of symbolic strings with an FSM representing error strings at a particular point in the application program. If the solution of the decision procedure or the FSM intersection is empty, then the requirement is validated. If not, an error scenario is generated that is mapped back to the application code to generate a test case that uncovers a bug. This is shown in validated or error trace component 46. A set of properties/requirements has been identified in web applications that can only be checked by this type of symbolic execution based model checking. There are many such examples in security-based properties that need exhaustive checking for complete confidence in the robustness of the web application.
At a program hotspot 54, where a requirement is to be checked, the string FSM is intersected with an FSM representing the set of error strings that should not occur at that point. This set of strings is obtained from a user requirement and is shown in a component 56. In addition, the symbolic equation that encodes the program path that leads to the hotspot is solved with an off-the-shelf decision procedure solver. If the decision procedure solution is empty, then it signifies an impossible path or false path in the program. Alternatively, if the intersection FSM is empty, then error strings are not possible at the hotspot. In either case, the requirement is validated. However, if the decision procedure returns a solution (signifying a true path) and the intersection FSM is non-empty, then error strings are possible at the hotspot, and a bug is found. This solution is mapped back to the application program and percolated all the way up to the driver inputs to create an error trace and a test case that catches the bug. This test case generation is fully automated, thereby reducing manual verification time. Moreover, such a test case may be missed if test cases are manually generated, which illustrates the usefulness of this technique.
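The FSM intersection and emptiness check at a hotspot can be sketched with a standard product construction over two deterministic automata. The Java sketch below is hypothetical: the class `FsmIntersectionSketch`, the two-character alphabet (`a` and `;`), and the three example automata are assumptions made for illustration, not the FSM intersector of the described architecture. A breadth-first search over reachable product states either finds a string accepted by both automata (a candidate error string) or shows that the intersection is empty.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch: intersect a program-string FSM with an error-string FSM. */
final class FsmIntersectionSketch {

    /** Partial DFA: start state is 0; a missing transition rejects the string. */
    static final class Dfa {
        final Map<Integer, Map<Character, Integer>> delta = new HashMap<>();
        final Set<Integer> accepting = new HashSet<>();

        Dfa edge(int from, char c, int to) {
            delta.computeIfAbsent(from, k -> new HashMap<>()).put(c, to);
            return this;
        }
        Dfa accept(int state) { accepting.add(state); return this; }
    }

    /** BFS over the product automaton; returns a shortest string accepted by both, if any. */
    static Optional<String> intersectionWitness(Dfa a, Dfa b) {
        Queue<int[]> queue = new ArrayDeque<>();
        Map<Long, String> reachedBy = new HashMap<>();   // product state -> string reaching it
        queue.add(new int[] {0, 0});
        reachedBy.put(key(0, 0), "");
        while (!queue.isEmpty()) {
            int[] s = queue.remove();
            String path = reachedBy.get(key(s[0], s[1]));
            if (a.accepting.contains(s[0]) && b.accepting.contains(s[1])) {
                return Optional.of(path);                // non-empty intersection
            }
            Map<Character, Integer> ta = a.delta.getOrDefault(s[0], Map.of());
            Map<Character, Integer> tb = b.delta.getOrDefault(s[1], Map.of());
            for (Map.Entry<Character, Integer> e : ta.entrySet()) {
                Integer nextB = tb.get(e.getKey());
                if (nextB == null) continue;             // symbol not allowed in b
                long k = key(e.getValue(), nextB);
                if (reachedBy.putIfAbsent(k, path + e.getKey()) == null) {
                    queue.add(new int[] {e.getValue(), nextB});
                }
            }
        }
        return Optional.empty();                         // intersection is empty
    }

    private static long key(int x, int y) { return ((long) x << 32) | (y & 0xffffffffL); }

    /** Error strings: anything over {a, ;} containing at least one ';'. */
    static Dfa errorStrings() {
        return new Dfa().edge(0, 'a', 0).edge(0, ';', 1).edge(1, 'a', 1).edge(1, ';', 1).accept(1);
    }

    /** Strings reaching the hotspot when ';' has been filtered out: a*. */
    static Dfa sanitizedInput() { return new Dfa().edge(0, 'a', 0).accept(0); }

    /** Strings reaching the hotspot with no filtering: any string over {a, ;}. */
    static Dfa unsanitizedInput() { return new Dfa().edge(0, 'a', 0).edge(0, ';', 0).accept(0); }
}
```

Under these assumptions, `intersectionWitness(sanitizedInput(), errorStrings())` is empty, which corresponds to a validated requirement, whereas `intersectionWitness(unsanitizedInput(), errorStrings())` returns a witness string that could be mapped back to a driver input.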
Recall that the formal verification engine used in the software validation framework is a state-based model checker. The checker requires non-deterministic user inputs in the drivers. These model checkers cannot reason on a complete input space, for example, in a case of the whole range of integers, strings, etc., but can evaluate only the possible scenarios that are specified in the drivers.
In the case of symbolic execution, the model checking is stateless and treats all inputs to a program as symbols, thereby covering the complete input space. Symbolic execution has only been successful in handling primitive data (like integers and Booleans in a Java program). However, in the case of web applications, most of the inputs and primitive types are strings. Hence, it is necessary to model strings in the symbolic execution algebra.
However, the decision procedure used as a solver at the backend of this method is both CPU-time and memory intensive. Thus, it is necessary to symbolically model frequently used data structures in web applications like maps, sets, etc. for better performance of the decision procedure solver. Also, the amount of code instrumentation needed to create the symbolic model is kept to a minimum by using static analysis techniques (like relevancy analysis). This helps in reducing the size of the symbolic equations that need to be solved and, further, keeps the decision procedure complexity manageable.
The resultant architecture of the present invention offers a methodology that eliminates the need to create user inputs in drivers. Additionally, unexpected errors/behaviors can be uncovered. Also, with use of the present invention, manual test case generation time is reduced by automatically generating interesting test cases based on user requirement. Finally, the methodology has the potential to actually validate requirements based on exhaustive program path and input coverage. This is not possible using traditional testing methods but can be of critical importance in cases like security validation.
Note that deficiencies in formal validation techniques for software include: 1) state-based formal model checkers require input data in drivers; 2) an inability to handle all types of properties that span across the whole integer range, string range, etc.; and 3) automatic checking is limited to non-deterministic input choices provided in drivers.
Suppose there is a requirement that asks: Is it possible to have an integer in the input space that causes the system to break? In state-based model checking, it is not always possible to get that integer. To get around this issue, designers typically select a specific or a random integer to test for many scenarios. However, the exact integer that would cause a break condition would not necessarily be identified. A similar application involves strings in an input space (e.g., a login or a password where a malicious string is provided that orders the application to break). Again, the result is that, in these state-based scenarios, a designer does not know which string will break the application, so many have to be attempted.
Additionally, current symbolic execution engines are restrictive, for example: 1) an algebra has been developed for integers, reals, and Booleans, but not for strings; 2) strings are the primary input values in web applications; and 3) certain data structures frequently used in web applications still need to be modeled (e.g., hash-map, set, etc.). Symbolic execution is able to uncover error scenarios. Thus, the present invention aims to provide a symbolic execution engine that is customized and tuned for web applications.
Illustrated in
In a normal usage configuration, the user submits a generic login “doe” and a pin “123.” [SELECT info FROM users WHERE login=doe AND pin=123.] In the case of malicious usage, an attacker submits a login of ‘; SHUTDOWN;--’ and a pin of ‘0’. [SELECT info FROM users WHERE login=; SHUTDOWN;-- AND pin=0.] The response in this scenario is that the database shuts down. This illustrates a piggy-back, stored procedure attack. This type of security attack, mounted on the web application database using only the web browser, is known as an SQL injection attack. Such malicious strings can be detected and, further, restricted from reaching the database by using the present invention. This is further detailed and discussed below.
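The mechanism behind this attack can be illustrated with a short Java sketch of a query built by string concatenation. The sketch is hypothetical: the class and method names are invented here, the single quotes placed around the login value follow common SQL practice rather than the queries quoted above, and the attacker string is spelled in plain ASCII as `'; SHUTDOWN; --`. The check shown is a deliberately crude scan for a statement separator outside a quoted literal; it is not the FSM-based analysis described in this document.

```java
/** Hypothetical sketch: how naive concatenation lets an input change the query structure. */
final class SqlInjectionSketch {

    /** Builds the query by plain concatenation (the vulnerable pattern). */
    static String buildQuery(String login, String pin) {
        return "SELECT info FROM users WHERE login='" + login + "' AND pin=" + pin;
    }

    /**
     * Returns true if a ';' occurs outside single-quoted literals, i.e. the text
     * contains more than the one intended statement. Crude on purpose: it does not
     * understand comments or vendor-specific escaping.
     */
    static boolean hasPiggybackedStatement(String sql) {
        boolean inQuote = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (c == ';' && !inQuote) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String benign = buildQuery("doe", "123");
        String malicious = buildQuery("'; SHUTDOWN; --", "0");
        System.out.println(benign + "  piggy-backed: " + hasPiggybackedStatement(benign));
        System.out.println(malicious + "  piggy-backed: " + hasPiggybackedStatement(malicious));
    }
}
```

The attacker's leading quote closes the string literal early, so the remainder of the input is interpreted as SQL rather than as data, and the trailing comment marker hides the rest of the original query.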
As is demonstrated in
Thus, a symbolic execution methodology and formal model checking techniques have been used to find security holes. There are several steps in the interaction of
In this scenario, a person can check not only expected inputs, but also unexpected ones. This can be accomplished using a sophisticated symbolic string manipulation library. For example, consider the input “declare @a char(20) select @a=0x73687574646f776e exec(@a)”, in which the hexadecimal literal is the ASCII encoding of the string ‘shutdown’. The symbolic string manipulation libraries can automatically check for this variant of the malicious string.
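The encoding of this variant can be confirmed with a few lines of Java that decode the hexadecimal literal byte by byte. The class and method names below are hypothetical and serve only to illustrate the decoding; they are not part of the symbolic string manipulation library.

```java
/** Hypothetical sketch: decode a hexadecimal literal into the ASCII string it encodes. */
final class HexLiteralSketch {

    static String decodeHexLiteral(String literal) {
        // Accept an optional "0x" / "0X" prefix.
        String hex = (literal.startsWith("0x") || literal.startsWith("0X"))
                ? literal.substring(2)
                : literal;
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + literal);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            // Each pair of hex digits is one byte, read here as one ASCII character.
            out.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeHexLiteral("0x73687574646f776e"));
    }
}
```

A check that looks only for the literal text of the command in the input would miss this form, which is why the string model has to account for such encodings.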
In terms of advantages, the custom symbolic execution engine offers exhaustive checking over the input domain and over all feasible paths in the application program. There is no need for user inputs in drivers. In this case, unexpected behaviors/errors can be found. Moreover, the system is coupled with a GUI-based, intuitive user interface for specifying requirements/properties. The architecture can be customized and tuned for Java-based web applications. Thus, such an optimized architecture offers a symbolic execution tuned to web applications, which includes string manipulation algebra and applications for security validation. Such types of property checks that need to reason on the complete input space are not possible other than through this technology.
It is critical to note that the components illustrated in
While the present invention has been described in detail with specific components being identified, various changes and modifications may be suggested to one skilled in the art and, further, it is intended that the present invention encompass any such changes and modifications as clearly falling within the scope of the appended claims.
Note also that, with respect to specific process flows disclosed, any steps discussed within the flows may be modified, augmented, or omitted without departing from the scope of the invention. Additionally, steps may be performed in any suitable order, or concurrently, without departing from the scope of the invention.
Numerous other changes, substitutions, variations, alterations, and modifications may be ascertained by one skilled in the art, and it is intended that the present invention encompass all such changes, substitutions, variations, alterations, and modifications as falling within the scope of the appended claims.