Almost every web page contains active content in the form of JavaScript, Java files, executable files, browser plugins, etc. Active content is necessary for creating dynamic web pages, but it also enables an attacker to launch attacks on visitors of malicious or compromised websites. Attacks can also be launched by exploiting vulnerabilities in a website that otherwise does not host malicious content. For example, a Cross Site Scripting (XSS) attack becomes feasible when the input from a user is not properly validated. An attacker can trick a user into clicking a specially crafted link that points to the vulnerable site. The XSS vulnerability causes the website to send malicious code (JavaScript provided by the attacker as part of the link) to the victim's machine. By exploiting XSS vulnerabilities, attackers can steal cookies or launch an exploit to install malware. These XSS attacks can be persistent or non-persistent. In 2007, XSS vulnerabilities accounted for 84% of all security vulnerabilities [1]. Two of the top ten risks are associated with XSS [2], and according to SANS it is the #1 software error [3].
Web content filters protect clients from web-based exploits by blocking access to known malicious websites and by scanning web pages for known malicious content. Web application firewalls (WAFs) prevent XSS attacks by scanning the incoming data and finding patterns that are consistent with an attack. Some WAFs rewrite URLs to prevent cross-site request forgery attacks. Kausik [4] describes a method to automate the classification of the URLs being accessed.
While web application firewalls can protect the server side from attacks, the clients remain vulnerable. An attacker can infect the client from a vulnerable website and then target banking applications by placing malware on the client computer or inside the browser. Preventing XSS attacks on the client side is much more difficult and is not addressed adequately by content filters, client security software, or network firewalls.
Some client security software [5] can disable scripting for untrusted websites, but that may interfere with the proper functioning of websites and still does not address compromised websites. Microsoft IE has a built-in XSS filter, but it is limited in its effectiveness [6]. Hegli et al. [7] describe a method for controlling access to an Internet resource, i.e. a server, based on a reputation index. Their method needs prior information on the content in order to classify it as “bad”, so new malicious content will likely evade detection. Davenport et al. [8] describe a method to detect malicious actions in web page content based on calls to functions that expose a vulnerability. This approach too is a “black-list” approach that defines execution of certain functions as “bad”. Dunagan et al. [9] attempt to prevent third-party active content in a web page from accessing private information by generating proxy representations of those objects. Their approach prevents some malicious actions by third-party scripts, but it is not a complete solution and it does not solve the XSS problem. Sterland et al. [10] propose a variation of Dunagan et al. by isolating the execution of untrusted scripts from trusted scripts. They prevent untrusted scripts that are downloaded at runtime from accessing sensitive resources.
Therefore, a need exists for systems and methods to protect clients from web-based attacks. The solution must not take away features of the web in order to improve security. The security mechanism should work seamlessly and without any input from the user. As the web becomes the dominant platform for applications, commerce, banking, etc., the security concerns increase. Such a solution will not only save corporations several billion dollars each year, but it will be critical in maintaining the integrity of government and financial network infrastructure and consumer computers.
An objective of the present invention is to protect client computers when accessing a vulnerable, compromised, or malicious website. A method and system are provided for white-listing the contents of web pages to protect clients from web-based attacks and exploits by removing harmful components from the web pages being accessed by the clients. The present invention overcomes the problems of traditional white-list and black-list based security solutions, which block access to entire websites, by authenticating the active components of individual web pages.
In accordance with an aspect of the invention, a web page received from a web server is scanned for active components; a hashing algorithm computes cryptographic hashes of the active components; the cryptographic hashes are matched against known cryptographic hashes for that web page; active content for which no cryptographic hash match was made is removed; and the modified web page is forwarded to its intended destination.
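The scan, hash, match, remove, and forward steps above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the description: only inline `<script>` elements are treated as active content, SHA-256 stands in for the unspecified hashing algorithm, a regular expression stands in for a real HTML parser, and the names `SCRIPT_RE`, `sha256_hex`, and `filter_active_content` are hypothetical.

```python
import hashlib
import re
from typing import Set

# Assumption: only <script> elements are considered. A regex is a
# simplification; a production scanner would use a proper HTML parser.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL
)


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_active_content(page: str, known_hashes: Set[str]) -> str:
    """Drop every script whose body hash is not in known_hashes."""

    def keep_or_drop(match: "re.Match[str]") -> str:
        body = match.group(1)
        # Keep the whole element only when its hash is white-listed.
        return match.group(0) if sha256_hex(body) in known_hashes else ""

    return SCRIPT_RE.sub(keep_or_drop, page)


if __name__ == "__main__":
    white_list = {sha256_hex("initMenu();")}
    page = (
        "<p>hi</p><script>initMenu();</script>"
        "<script>steal(document.cookie)</script>"
    )
    print(filter_active_content(page, white_list))
```

In this sketch an injected script is removed simply because its hash is unknown, without any attempt to recognize it as malicious.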
In accordance with another aspect of the invention, the validation of active content in a web page is performed at a point in the network by first decrypting the web page; the contents of the decrypted web page are scanned for active content; a hashing algorithm computes cryptographic hashes of the active components; the cryptographic hashes are matched against known cryptographic hashes for that web page; active content for which no cryptographic hash match was made is removed; and the modified web page is forwarded to its intended destination.
A benefit of authenticating the contents of web pages is that it can prevent attacks originating from compromised and vulnerable websites. A black and white list based method for blocking access to websites may not spot a recently compromised website and may permit access, which could result in attacks on the client computer. The task of generating a white list is simpler and can be automated much more efficiently than generating a black list of items to block.
Also described in this invention is a method for creating the white list rules for active content in a web page. In a deterministic approach for creating the rules, the web page for which the rule is to be created initiates the request and the rule server examines the page to create white-list rules for that page. Alternatively, the web pages can be scanned by a crawler to create a white list rule database. Finally, the communication between clients and web pages can be monitored and the collected information used for creating white list rules.
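As an illustration of how a rule server or crawler might derive white-list rules from a page, the sketch below hashes each script found in a copy of the page that is assumed to be trustworthy and emits one rule per distinct hash. The rule fields, the use of SHA-256, the regex-based extraction, and the function name `create_rules` are all assumptions made for this example.

```python
import hashlib
import re
from typing import Dict, List

# Assumption: active content is limited to <script> elements here.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL
)


def create_rules(url: str, trusted_page: str) -> List[Dict[str, str]]:
    """Build one white-list rule per distinct script body on the page."""
    rules: List[Dict[str, str]] = []
    seen = set()
    for body in SCRIPT_RE.findall(trusted_page):
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # the same script appears more than once
        seen.add(digest)
        rules.append({"url": url, "content_type": "script", "hash": digest})
    return rules


if __name__ == "__main__":
    page = "<script>a();</script><p>x</p><script>b();</script>"
    for rule in create_rules("http://example.com/", page):
        print(rule)
```

The same function could serve all three rule-creation paths described above, differing only in where the trusted copy of the page comes from.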
Another advantage of authenticating active content in web pages is that it can eliminate XSS attacks in an automated fashion. Instead of relying on heuristics to detect XSS attacks, which have limited effectiveness and can be bypassed, we can guarantee that no attacker-supplied malicious code can be executed.
Various embodiments of the present invention taught herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
It will be recognized that some or all of the Figures are schematic representations for purposes of illustration and do not necessarily depict the actual relative sizes or locations of the elements shown. The Figures are provided for the purpose of illustrating one or more embodiments of the invention with the explicit understanding that they will not be used to limit the scope or the meaning of the claims.
In the following paragraphs, the present invention will be described in detail by way of example with reference to the attached drawings. While this invention is capable of embodiment in many different forms, there is shown in the drawings and will herein be described in detail specific embodiments, with the understanding that the present disclosure is to be considered as an example of the principles of the invention and not intended to limit the invention to the specific embodiments shown and described. That is, throughout this description, the embodiments and examples shown should be considered as exemplars, rather than as limitations on the present invention. Descriptions of well-known components, methods and/or processing techniques are omitted so as to not unnecessarily obscure the invention. As used herein, the “present invention” refers to any one of the embodiments of the invention described herein, and any equivalents. Furthermore, reference to various feature(s) of the “present invention” throughout this document does not mean that all claimed embodiments or methods must include the referenced feature(s).
In one embodiment of the present invention, authentication of active components in a web page and removal of unauthenticated active components is achieved on the network via a network device. All connections to the Internet in a network of computing devices with a plurality of operating systems are monitored. In another embodiment of the present invention, authentication of active components in a web page and removal of unauthenticated active components is achieved at the client computer via a process.
For the embodiment illustrated in
In an embodiment of the invention, the validation server 150 executes a validation process 120 that may include several subcomponents, such as the content validation process (CVP) 122, the content monitoring process (CMP) 124, and the rule database (RDB) 126. The RDB 126 contains a list of rules and may be stored locally in the validation engine or reside at a remote server 160. The CVP 122 monitors HTTP requests and responses and is responsible for enforcing the rules for the web pages being accessed by the client computers. The CMP 124 also monitors HTTP requests and responses to assist in creating new rules and updating existing rules in the rule database 126. The rules in the RDB 126 may include a list of parameters that includes, but is not limited to, domain name, URL, active content type, active content cryptographic hash, and active content classification. The rule list in the RDB 126 can also be generated locally by monitoring active content from the web pages accessed, or it can be downloaded from a remote rule server 160.
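One way to picture the rule parameters listed above is as a small record type with a per-URL lookup. The following is a sketch only: the class names `Rule` and `RuleDatabase`, the in-memory dictionary storage, and the classification value "trusted" are hypothetical and are not specified by the description.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Rule:
    """One white-list entry; fields mirror the parameters named above."""
    domain: str
    url: str
    content_type: str
    content_hash: str
    classification: str  # assumed values, e.g. "trusted"


class RuleDatabase:
    """In-memory stand-in for the RDB, indexed by URL."""

    def __init__(self) -> None:
        self._by_url: Dict[str, List[Rule]] = defaultdict(list)

    def add(self, rule: Rule) -> None:
        self._by_url[rule.url].append(rule)

    def allowed_hashes(self, url: str) -> Set[str]:
        """Hashes of content classified as trusted for this URL."""
        return {
            r.content_hash
            for r in self._by_url.get(url, [])
            if r.classification == "trusted"
        }


if __name__ == "__main__":
    rdb = RuleDatabase()
    rdb.add(Rule("example.com", "http://example.com/", "script",
                 "abc123", "trusted"))
    print(rdb.allowed_hashes("http://example.com/"))
```

Indexing by URL reflects the per-page nature of the rules; a deployment that downloads rules from a remote rule server would populate the same structure from that source.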
The validation process 120 may be implemented in several ways.
Because the modification of a web page changes the size of the page, the in-line implementation of content validation is better achieved as a proxy server. The use of a proxy server overcomes the challenge associated with changes in individual network packet size when active content is removed from the packets. In another embodiment of the present invention where the in-line implementation of content validation is not a proxy server, the size of packets from which content is removed can be preserved by adding content that is not visible in web pages. When the content validation is implemented at the client, similar issues may arise if the implementation is at the transport layer or lower in the Open Systems Interconnection (OSI) stack. However, if the implementation is above the session/transport layer, then the process is greatly simplified because the filtering is performed on the re-assembled web page and not on packets that contain only part of the web page.
Web servers often encrypt web pages to improve the security and confidentiality of data being accessed by the clients. When the web pages are transmitted in encrypted form, the plain text of the web page is not accessible for validating the active content. SSL is the protocol commonly used for encrypting HTTP communications between the client and the web server. In one embodiment of the present invention, the validation server launches a man-in-the-middle (MITM) attack on all encrypted sessions to act as a proxy and gain access to the unencrypted plain text of the web page. In another embodiment of the present invention, the validation server uses a key escrow system to decrypt the encrypted communications.
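The size-preserving removal mentioned for the non-proxy in-line embodiment can be sketched as follows. This is an assumption-laden illustration: it works on a reassembled page rather than on individual packets, it uses space characters as the invisible filler, and the function name `blank_out_scripts` and the `is_allowed` callback are hypothetical.

```python
import re
from typing import Callable

# Assumption: active content is limited to <script> elements here.
SCRIPT_RE = re.compile(
    rb"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL
)


def blank_out_scripts(page: bytes,
                      is_allowed: Callable[[bytes], bool]) -> bytes:
    """Replace each disallowed script with spaces of equal byte length."""

    def pad(match: "re.Match[bytes]") -> bytes:
        element = match.group(0)
        if is_allowed(element):
            return element
        # Spaces are not rendered, and the byte count stays the same.
        return b" " * len(element)

    return SCRIPT_RE.sub(pad, page)


if __name__ == "__main__":
    page = b"<p>a</p><script>evil()</script><p>b</p>"
    cleaned = blank_out_scripts(page, lambda element: False)
    print(len(page) == len(cleaned), cleaned)
```

Working on bytes rather than decoded text keeps the length comparison meaningful for multi-byte encodings. Handling a script that spans several packets would need additional state that this sketch does not attempt.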
In one embodiment of the present invention, the rule database 126 is continually updated as client computers access web pages. As shown in
A potential cause of inconsistencies in the observed active content of a given web page is a legitimate update of the web page. In one embodiment of the present invention, the creator of the web server can request an update of the validation rules.
Thus, it is seen that systems and methods for validation of active content in web pages are provided. One skilled in the art will appreciate that the present invention can be practiced by other than the above-described embodiments, which are presented in this description for purposes of illustration and not of limitation. The specification and drawings are not intended to limit the exclusionary scope of this patent document. It is noted that various equivalents for the particular embodiments discussed in this description may practice the invention as well. That is, while the present invention has been described in conjunction with specific embodiments, it is evident that many alternatives, modifications, permutations and variations will become apparent to those of ordinary skill in the art in light of the foregoing description. Accordingly, it is intended that the present invention embrace all such alternatives, modifications and variations as fall within the scope of the appended claims. The fact that a product, process or method exhibits differences from one or more of the above-described exemplary embodiments does not mean that the product or process is outside the scope (literal scope and/or other legally-recognized scope) of the following claims.