In order to better understand the present invention, the following definitions or working definitions are listed in Table I below:
Resources are not limited to files that comprise web pages. A resource may also be a JavaScript link that creates a page, or a configuration file or other file that does not serve content but rather performs some function. All substantial resource “types” are listed below in Table II.
Resource attributes may be illustrated by a resource (a web page) that contains images as well as content that comes from a database and requires a cookie in order to browse the page. In this example, three attributes need to be cataloged: images, a database connection, and a cookie. Further examples of resource attributes are listed below in Table III.
Examples of Interactive Resources include database-driven content, which is “interactive” because it requires the web server to communicate with the database and retrieve something specific. An attacker typically focuses on Interactive Resources because the attacker can modify the request the web server issues in order to attempt some form of attack by interacting with the backend systems that run the web site.
On the other hand, a non-interactive resource is typically a page that contains static text and perhaps a few images. A non-interactive resource does not require the web server to do anything other than feed the flat file to a browser. The user cannot do anything to this flat file because the web server does not interact with anything.
A crawler is responsible for, among other things, crawling the entire site. A crawler is the foundation for all scan activity since it provides the data subject to further processing by the present invention. If the crawler cannot build a proper catalog of all site contents, the present invention will not be able to do anything with it (i.e. attack it to perform a vulnerability assessment including the generation of a report).
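The cataloging performed by such a crawler can be sketched as a breadth-first traversal of the site's links. The sketch below is illustrative only and is not the claimed implementation: the fetching of a resource is abstracted behind a caller-supplied `fetch` function (an assumption made so the sketch stays self-contained), and only anchor, image, script, and form targets are followed.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects link targets from anchor, image, script, and form tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag in ("img", "script") and "src" in attrs:
            self.links.append(attrs["src"])
        elif tag == "form" and "action" in attrs:
            self.links.append(attrs["action"])

def crawl(start, fetch):
    """Breadth-first crawl from `start`, where `fetch(url)` returns the
    HTML body of a page (or None for non-page resources such as images).
    Returns a catalog mapping every discovered resource to its body."""
    catalog = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in catalog:
            continue  # already cataloged; avoid re-visiting
        body = fetch(url)
        catalog[url] = body
        if body is None:
            continue  # non-HTML resource: catalog it but do not parse it
        parser = LinkExtractor()
        parser.feed(body)
        for link in parser.links:
            if link not in catalog:
                queue.append(link)
    return catalog
```

In practice the crawler would also normalize URLs and restrict itself to the target site; those details are omitted here for brevity.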
Referring to
Upon completion of the crawl, the spidering engine 10 passes the collected links to an analysis engine 12 that identifies attributes (e.g. the attributes listed in Table III) that can be used to calculate exposure. Some of these attributes are cookies set by the “Set-Cookie” header, forms, hidden input fields, POST data, URL parameters, e-mail addresses, and HTML comments. The analysis engine 12 counts the raw number of attributes per link and the overall count for the application. Once the attributes have been identified, the exposure is then calculated. A report 14 is generated for analysis. The spidering engine 10 and the analysis engine 12 may be controlled by a micro-controller 16.
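The per-link and overall attribute counting of the analysis engine 12 can be sketched as follows. This is a minimal sketch, not the claimed engine: it tallies only a subset of the attributes named above (forms, hidden input fields, URL parameters, e-mail addresses, and HTML comments), and the e-mail pattern used is an illustrative assumption.

```python
import re
from html.parser import HTMLParser

class AttributeCounter(HTMLParser):
    """Tallies exposure-relevant attributes found in a single page."""
    def __init__(self):
        super().__init__()
        self.counts = {"forms": 0, "hidden_fields": 0, "url_params": 0,
                       "comments": 0, "emails": 0}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.counts["forms"] += 1
        elif tag == "input" and attrs.get("type") == "hidden":
            self.counts["hidden_fields"] += 1
        for value in attrs.values():
            if value and "?" in value:
                # count name=value pairs in any embedded URL query string
                self.counts["url_params"] += value.split("?", 1)[1].count("=")

    def handle_comment(self, data):
        self.counts["comments"] += 1

    def handle_data(self, data):
        self.counts["emails"] += len(re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", data))

def analyze(pages):
    """Returns per-link attribute counts plus the overall count for the
    application, given a mapping of link to HTML body."""
    per_link, overall = {}, {}
    for url, html in pages.items():
        parser = AttributeCounter()
        parser.feed(html or "")
        per_link[url] = parser.counts
        for name, n in parser.counts.items():
            overall[name] = overall.get(name, 0) + n
    return per_link, overall
```

The overall totals produced here are the raw inputs to the exposure calculation described below.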
Referring to
Referring to
A determination is made as to whether the resource cataloged is interactive or static (non-interactive) 40. All of the static, non-interactive resources are then discarded 42. What is left is the interactive content, referred to herein as Attack Points 44. Attack Points 44 are resources that possess attributes that an attacker could interact with (targeting the web server, application server or database), such as a form field, a database connection or a hidden field.
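The determination 40 and discard 42 steps can be sketched as a filter over the cataloged resources. The particular set of attribute names treated as interactive below is an assumption chosen for illustration, not an exhaustive list from the specification.

```python
# Attribute types an attacker could interact with; this specific set is
# an illustrative assumption, not the definitive list.
INTERACTIVE = {"forms", "hidden_fields", "url_params",
               "cookies", "database_connection"}

def attack_points(catalog):
    """Given a mapping of resource to its attribute counts, discard the
    static resources and keep only the Attack Points: resources with at
    least one attribute an attacker could interact with."""
    return {url: counts for url, counts in catalog.items()
            if any(counts.get(a, 0) > 0 for a in INTERACTIVE)}
```

For example, a page whose only attributes are images is discarded, while a page with a form survives as an Attack Point.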
As shown in
One often refers to application threat modeling as a “qualitative analysis” of the target site. It does not contain any discrete vulnerability information (what is often called “quantitative analysis”), but rather focuses on the structure and content of the site and how that may have an impact on future, or emerging, security threats. This is what the present invention teaches.
A good example of why Attack Points 44 are a concern is a site that has many form fields. While the application's processing of such form inputs may be secure at present, any change to the site (such as a new application or a modification to an existing one) could introduce a form-based attack vulnerability. Additionally, a new attack could be devised that affects form inputs interacting with such applications. Thus, even though such resources may currently be secure, their sheer existence (i.e. form fields on a web page) creates a persistent concern that must be monitored and considered throughout the application life-cycle.
Additionally, the application threat modeling of the present invention allows security personnel to understand what their application security program should include to best secure their web sites. Since not all web sites have the same security exposure or security concerns, it is important to make sure that the organization is aligning its security programs with its relevant security exposure. An exemplary technical explanation of the above, using two types of web sites, is shown below:
The above examples show that not all sites are created equal. The application threat modeling of the present invention is designed to communicate this information so that a company's security, development, and QA teams may understand how their online business model is affected by such security threats. Simply put, the present invention gives them the information they need, but previously did not have, in order to align their security-related efforts in securing their web business.
The crawler also catalogs response codes, web server platforms, and external site links (including whether the data is being sent via SSL or in plaintext).
As mentioned, once the present invention has catalogued all the interactive site content and its attributes, it then performs a calculation to determine the extent of “security exposure”. It is critical to point out that this calculation is subjective in that different people have different preconceived notions regarding the security field. Therefore, while a paranoid individual might find even the slightest bit of exposure to be an unacceptable threat, another individual might not care that 100% of the site can be hacked through an abundance of attack vectors.
The present invention creates a rudimentary exposure scoring calculation that provides a perceived level of security exposure. The exposure is correlated with otherwise unused information into report 14 which communicates or answers the questions of:
The exposure rating is calculated according to Equation 1:

Exposure=Σ min((APtotal×APweight), APceiling)  (Equation 1)

where, for each type of attack point, the total number of points present in the application is denoted by (APtotal), which is multiplied by a weighting factor (APweight) that is predetermined by a user. An attack point can contribute no more than a maximum value (APceiling) to the exposure rating; the minimum is taken between an attack point's weighted score and its ceiling. The sum of the scores over all attack point types is the exposure rating.
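The calculation above can be sketched directly. The example weights and ceilings in the usage below are arbitrary illustrative values; as noted, they are predetermined by the user.

```python
def exposure_rating(attack_points):
    """Equation 1: each attack-point type contributes
    min(APtotal * APweight, APceiling), and the sum over all types is
    the exposure rating. `attack_points` maps a type name to a
    (total, weight, ceiling) tuple, with the weights and ceilings
    predetermined by the user."""
    return sum(min(total * weight, ceiling)
               for total, weight, ceiling in attack_points.values())
```

For instance, ten forms at weight 2.0 would score 20 but are capped by a ceiling of 15, so with three hidden fields at weight 1.0 (ceiling 10) the rating is 15 + 3 = 18.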
Other technologies may capture the above-mentioned data in many forms; some may capture only part of the data, and others may capture all of it. The data itself, however, is not the invention herein. Rather, the invention lies in the correlation of how the site construction does or does not create a security concern, expressed through a novel report 14 that correlates the parameters of a site automatically.
A human user or technician can carry out the steps of the present invention. However, the present invention also teaches an automatic process wherein human intervention during processing is not necessary. In other words, the present invention teaches a computer-implemented method of automatic data processing in which no human operator is needed to generate the report 14 based upon Equation 1.
Unlike prior art systems, such as the '737 patent, which operates at OSI layers 4, 5, and 6, the Web Application Scanner of the present invention operates at layer 7 and generally only connects to the two web server ports (e.g. 80 and 443) in order to exercise the custom web application and the application's HTML pages. The present invention operates on a different network stack level, automating the manual input techniques an application tester would apply against the content of custom and dynamically generated HTML applications. In other words, the present invention does not test the layer 6 input of the server.
The present invention is associated with a Web Application Scanner. A Web Application Scanner generally only connects to the two web server ports (e.g. 80 and 443) in order to exercise the custom web application that is accessed through them. The present invention only scans the web application content at layer 7 of the network protocol stack and not the web server at layer 6 or lower. The packets for different layers are constructed differently and do not cross stack boundaries.
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions in addition to a variety of other forms. Further, the present invention applies equally, regardless of the particular type of signal bearing media that is actually used to carry out the distribution. Examples of computer readable media include recordable-type media such as a floppy disk, a hard disk drive, a RAM, a CD-ROM, a DVD-ROM, a flash memory card, and transmission-type media such as digital and analog communications links, or wired or wireless communication links using transmission forms such as radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
Accordingly, it is to be understood that the embodiments of the invention herein described are merely illustrative of the application of the principles of the invention. Reference herein to details of the illustrated embodiments is not intended to limit the scope of the claims, which themselves recite those features regarded as essential to the invention.