1. Field of the Invention
The present invention relates to the field of data processing and in particular to a tool for monitoring rules for a rules-based transformation engine.
2. Related Art
The World Wide Web is the Internet's multimedia information retrieval system. In the web environment, client machines communicate with web servers using the HyperText Transfer Protocol (HTTP). The web servers provide users with access to files such as text, graphics, images, sound, video, etc., using a markup language such as HyperText Markup Language (HTML). HTML provides basic document formatting and allows the developer to specify connections known as hyperlinks to other servers and files. In the Internet paradigm, a network path to a server is identified by a resource address called a Uniform Resource Locator (URL) having a special syntax for defining a network connection. So-called web browsers, for example, Netscape Navigator (Netscape Navigator is a registered trademark of Netscape Communications Corporation) or Microsoft Internet Explorer (Microsoft and Internet Explorer are trademarks of Microsoft Corporation), which are applications running on a client machine, enable users to access information by specification of a link via the URL and to navigate between different HTML pages.
When the user of the web browser selects a link, the client machine issues a request to a naming service to map a hostname (in the URL) to a particular network IP (Internet Protocol) address at which the server machine is located. The naming service returns an IP address that can respond to the request. Using the IP address, the web browser establishes a connection to the server machine. If the server machine is available, it returns a web page. To facilitate further navigation within the site, a web page typically includes one or more hypertext references known as “links.”
For improved security, reverse proxy (also called IP-forwarding) topologies may be used. These use a reverse proxy server to represent a secure content server to outside clients. Outside clients are not allowed to access the content server directly. Instead, their requests are sent to the reverse proxy server, which forwards them to the content server. The content server forwards the requests to the applications or application servers for processing. The reverse proxy server returns the response to each completed request to the client while hiding the identity of the portal and application servers from the client. This prevents outside clients from obtaining direct, unmonitored access to the real content server.
Most reverse proxy systems use a simple configuration where HTML rewriting can be turned on or off and the definition of what is rewritten is “hard-wired.” For example, the IBM® WebSphere® Edge Server uses a “Junction Rewrite” setting.
Some reverse proxy servers use rules-based transformation engines to proxy the content from backend servers. A set of rules specifies which content is transformed and how it is transformed. For example, URLs referring to the content server are transformed to refer to the reverse proxy server, so that future requests from client systems will address the reverse proxy server.
During the development of such systems, administrators need to be able to find and correct errors in the set of rules. An HTTP packet tracking utility can be used to show what requests are made to a backend server, and what content is returned. However, there is a need for an improved tool which eases the burden of finding and correcting errors.
A first aspect of the invention provides a method of monitoring a transformation of source markup by a rules-based transformation engine. The method comprises storing a set of rules, scanning the source markup, generating edit information in accordance with the set of rules, and transforming the source markup into transformed markup in accordance with the rules. At least one of the source markup and the transformed markup is modified in accordance with the edit information, and the modified markup is rendered to highlight those portions affected by transformations.
A second aspect of the invention provides a rules-based transformation engine for transforming source markup in accordance with a set of rules. The transformation engine comprises a matching component for scanning the source markup and generating edit information in accordance with a set of rules. A transforming component transforms the source markup into transformed markup in accordance with the rules, and a text modifier receives the source markup, transformed markup, and edit information, and modifies at least one of the source markup and transformed markup in accordance with the edit information. A rendering component renders the modified markup to highlight those portions affected by transformations in a user display.
A further aspect of the present invention provides a tool for monitoring the transformation of source markup by a rules-based transformation engine as described above. The monitoring tool comprises a text modifier for receiving the source markup, transformed markup, and edit information and which modifies the source markup and/or transformed markup in accordance with the edit information. The modified markup can then be rendered by a rendering component in a format which highlights those portions affected by transformations.
The tool may be implemented in a reverse proxy mechanism and may also comprise a logging component for recording requests to a backend server and responses returned, as well as for storing the modified markup produced by the text modifier.
Embodiments of the invention provide a visual tool which shows how content has been transformed by a rules-based transformation engine, and by which particular rules, in order to debug the dynamic proxying of markup content sent by backend servers. Users can see all requests that are made to the backend server. For each request, the tool shows the time and URL of the request, a response code such as an HTTP response code, and the content type of the response, such as its Multipurpose Internet Mail Extensions (MIME) type. For responses transformed by the rules-based transformation engine, links are provided to representations of the source and transformed HTML in which the HTML content is modified to highlight any text affected by the transformation engine. Selecting a portion of highlighted text, for example by hovering a cursor over the text using a cursor control device such as a mouse, displays a “pop-up” message stating which rule was applied.
The present invention thus enables users to dynamically debug HTML content sent back by the backend servers and to see the requests made to the backend server, the status of each request, what content is transformed, how it is transformed, and by what particular rules.
Various embodiments of the present invention will now be described by way of example only, with reference to the accompanying drawings in which:
The reverse proxy server 14 is hosted on a portal server, which is operable to execute a web application which arranges web content into a portal page containing one or more portlets. A portlet is a web component which processes and generates dynamic web content. The portal aggregates this content, often called a fragment, with content from other portlets to form a portal page. The content generated by a portlet may vary from one user to another depending on the user configuration for the portlet. A portal can act as a gateway to one or more backend software applications. The portal can be used to deliver customised application content, such as forums, search engines, email and other information, within a standard template and using a common user interface mechanism. Users can be offered a single, personalised view of all the backend applications with which they work and can obtain access to a plurality of those backend applications through a single security sign-on.
The reverse proxy server 14 intercepts browser requests 38 from a browser on the client 10 and then redirects them to the backend server 12. When a portal page is requested by the browser, all portlets appearing on the requested page are called. The “real” address is determined and a request 40 is made to the backend server 12 by the reverse proxy server 14. The backend server 12 generates the requested content and sends a response 42 containing this content back to the reverse proxy server 14.
A transformation engine 18 (shown as “parser” in
In the reverse proxy example of this embodiment, the transformation engine 18 converts any references to the backend server 12 into references to the reverse proxy server 14. This ensures that future requests from the browser will be sent to, and thus intercepted by, the reverse proxy server 14. The transformed HTML 28 is then returned to the browser.
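As a minimal sketch of this conversion, the Python fragment below rewrites `href` and `src` attribute values that begin with the backend server's address so that they begin with the reverse proxy's address instead. The host names, and the restriction to `href` and `src` attributes, are assumptions for illustration only; in the described system this behaviour would be expressed as rules 20 rather than hard-coded.

```python
import re

# Hypothetical addresses, chosen only for illustration.
BACKEND = "http://backend.example.com"
PROXY = "https://proxy.example.com/app"

# Matches an href/src attribute whose value starts with the backend address,
# e.g. href="http://backend.example.com/page.html". Group 1 keeps the
# attribute name, the equals sign and the opening quote.
_BACKEND_REF = re.compile(
    r'((?:href|src)\s*=\s*["\'])' + re.escape(BACKEND), re.IGNORECASE
)


def rewrite_backend_refs(html: str) -> str:
    """Replace references to the backend server with references to the proxy."""
    # A function replacement avoids any backslash interpretation in PROXY.
    return _BACKEND_REF.sub(lambda m: m.group(1) + PROXY, html)


print(rewrite_backend_refs('<a href="http://backend.example.com/x.html">x</a>'))
```

Because the rewritten links now address the proxy, the browser's next request for `x.html` is again intercepted by the reverse proxy server.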
The reverse proxy server 14 also comprises a text modifier 22 that is used in conjunction with the transformation engine 18 and the rules 20 to record the transformations made when a debug feature is activated. When the debug feature is switched on, as the transformation engine 18 transforms the backend content (source HTML 26), the text modifier 22 simultaneously creates two additional documents, namely modified source HTML 34 and modified transformed HTML 36.
This process will be described in more detail with reference to
The rules 20 comprise a list of regular expression patterns, which can be used to identify particular patterns of code, each regular expression pattern having a corresponding “output model” which defines how a matched pattern of code is to be rewritten. The rules 20 indicate whether or not the search for each regular expression pattern in the received content is case sensitive. The regular expression patterns use certain characters, such as “.”, “*” and “?”, to represent wild card characters or wild card character strings (see, for example, http://jakarta.apache.org/regexp for more information).
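A rule of this kind, a regular expression pattern paired with an output model and a case-sensitivity setting, might be represented as follows. This is a hypothetical sketch: the `Rule` class and its field names do not come from the description, and it uses Python's `re` module rather than the Jakarta Regexp package referenced above, whose syntax differs in some details.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule layout: a pattern plus the output model that rewrites it."""
    name: str
    pattern: str            # regular expression identifying the code to rewrite
    output_model: str       # replacement template; \1, \2 ... refer to captured groups
    case_sensitive: bool = True

    def compiled(self) -> re.Pattern:
        """Compile the pattern, honouring the rule's case-sensitivity setting."""
        flags = 0 if self.case_sensitive else re.IGNORECASE
        return re.compile(self.pattern, flags)  # the re module caches compiled patterns

    def apply(self, text: str) -> str:
        """Rewrite every match of the pattern according to the output model."""
        return self.compiled().sub(self.output_model, text)


rule = Rule(
    name="absolute-href",
    pattern=r'href="http://backend\.example\.com/([^"]*)"',
    output_model=r'href="/proxy/\1"',
    case_sensitive=False,
)
print(rule.apply('<A HREF="http://BACKEND.example.com/a.html">'))
```

With `case_sensitive=False` the rule also matches the upper-case `HREF` and host name; with the default setting the same input would be left unchanged.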
One example of a matching pattern and its use by the matching and transforming components 60, 62 is explained below:
The matching component 60 may insert the markers and rule information into the source HTML 26 to provide a combined document 64, or it may provide this edit information separately from the source HTML 26. The source HTML 26 and edit information are passed to the transforming component 62 as well as to the text modifier 22. The transforming component 62 transforms the source HTML 26 according to the rules 20 using the edit information supplied, and outputs the transformed HTML 28. Additionally, it passes the transformed HTML 28 and its associated edit information 66 to the text modifier 22.
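The split between the matching component 60 and the transforming component 62, with the edit information kept separate from the source HTML 26, can be sketched as below. The `Edit` record, the function names, the `(name, pattern, template)` rule layout and the policy of discarding overlapping matches in favour of the earliest are all assumptions; the description does not say how overlapping matches are resolved.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Edit:
    """Edit information for one match: source span, rule name and its output."""
    start: int
    end: int
    rule: str
    replacement: str


# Hypothetical rule layout: (rule name, compiled pattern, output template).
RuleSpec = Tuple[str, re.Pattern, str]


def match(source: str, rules: List[RuleSpec]) -> List[Edit]:
    """Matching component: scan the source and record every rule match."""
    edits: List[Edit] = []
    for name, pattern, template in rules:
        for m in pattern.finditer(source):
            edits.append(Edit(m.start(), m.end(), name, m.expand(template)))
    edits.sort(key=lambda e: e.start)  # stable sort keeps rule order for ties
    # Keep only non-overlapping edits, preferring the earliest match.
    kept: List[Edit] = []
    for e in edits:
        if not kept or e.start >= kept[-1].end:
            kept.append(e)
    return kept


def transform(source: str, edits: List[Edit]) -> str:
    """Transforming component: apply edits, copying unmatched text unchanged."""
    out, pos = [], 0
    for e in edits:
        out.append(source[pos:e.start])
        out.append(e.replacement)
        pos = e.end
    out.append(source[pos:])
    return "".join(out)
```

Because `match` returns the edit list rather than the rewritten text, the same edits can be handed both to `transform` and to a text modifier that annotates the source.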
The text modifier 22 modifies the HTML 66, 64 it receives by escaping the HTML tags so that they can be displayed on screen as text. Escape sequences, also known as character entities, are used to insert into an HTML document characters that have special meanings in HTML, such as the left angle bracket (<), the right angle bracket (>), and the ampersand (&). The angle brackets mark the beginning and end of HTML tags, and the ampersand marks the beginning of an escape sequence. The text modifier escapes the tags by replacing the left and right angle brackets with their respective escape sequences: &lt; for <, and &gt; for >. A browser then displays the tags as part of the text rather than interpreting them.
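A minimal escaping routine is shown below. The description mentions only the angle brackets; this sketch also escapes the ampersand, and does so first, so that character entities already present in the source HTML (for example `&amp;`) are shown literally rather than decoded by the browser. Python's standard `html.escape(text, quote=False)` performs the same three replacements.

```python
def escape_markup(html: str) -> str:
    """Escape markup so a browser displays tags as text instead of interpreting them."""
    # Replace "&" first; otherwise the "&" inside the "&lt;" and "&gt;"
    # sequences inserted below would itself be escaped a second time.
    return html.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


print(escape_markup('<a href="x">A &amp; B</a>'))
```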
The text modifier 22 also uses the edit information to add new HTML tags to highlight the text which will be or has been transformed by the transformation engine 18. It may also add markup content to enable a pop-up message to be presented when a user selects a particular piece of highlighted text, for example by hovering a cursor over the text, the pop-up identifying which rule will be (in the modified source HTML 34) or was (in the modified transformed HTML 36) applied to transform that text by the transformation engine 18.
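One simple way to produce the highlighting and pop-up is to wrap each affected span in an element whose `title` attribute names the rule; browsers generally show the `title` text as a tooltip when the cursor hovers over the element. The sketch below combines this with escaping. The `(start, end, rule_name)` edit format and the `rp-highlight` class name are hypothetical, and a styled pop-up (rather than the browser's native tooltip) would need additional CSS or script.

```python
from html import escape
from typing import List, Tuple


def highlight(markup: str, edits: List[Tuple[int, int, str]]) -> str:
    """Return an escaped copy of `markup` with each edited span highlighted.

    `edits` holds (start, end, rule_name) triples, assumed sorted and
    non-overlapping. Text is escaped so tags are shown, while the added
    <span> elements are left as real markup so the browser renders them.
    """
    out, pos = [], 0
    for start, end, rule in edits:
        out.append(escape(markup[pos:start], quote=False))
        # The title attribute supplies the hover text naming the rule.
        out.append('<span class="rp-highlight" title="Rule: %s">' % escape(rule, quote=True))
        out.append(escape(markup[start:end], quote=False))
        out.append("</span>")
        pos = end
    out.append(escape(markup[pos:], quote=False))
    return "".join(out)
```

Applied to the source HTML 26 with its edit information, this yields a page in the style of the modified source HTML 34; applied to the transformed HTML 28 with edit positions in that text, it yields the modified transformed HTML 36.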
The text modifier 22 generates the modified source HTML 34 from the source HTML 26 and edit information 64, and generates the modified transformed HTML 36 from the transformed HTML 28 and edit information 66.
The system also comprises a logger 16 that logs relevant information including request info 30, response info 32, modified source HTML 34, and modified transformed HTML 36 in a log 24. The request info 30 comprises data identifying a particular request such as the time sent and URL to which it is addressed. The response info 32 comprises data such as HTTP response code and MIME-type of the response. HTTP response codes are grouped into a number of different series:
200-series HTTP response codes indicate that the request was processed without any error conditions;
300-series response codes indicate that the document requested has moved to some other location, or that the browser is being redirected for some other reason;
400-series messages indicate that the browser did something wrong; and
500-series messages indicate that something went wrong on the server.
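The request and response information shown for each request might be held in a record like the hypothetical `LogEntry` below, with `status_series` mapping a response code to the series listed above by its first digit. Field names are assumptions. The sketch also maps 100-series (informational) codes, which the list above does not mention.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Series name for each leading digit of an HTTP response code.
SERIES = {1: "informational", 2: "success", 3: "redirection",
          4: "client error", 5: "server error"}


def status_series(code: int) -> str:
    """Classify an HTTP response code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP response code: {code}")
    return SERIES[code // 100]


@dataclass
class LogEntry:
    """Hypothetical log record: request info 30, response info 32, annotated copies."""
    time: datetime                              # when the request was sent
    url: str                                    # backend URL the request addressed
    status: int                                 # HTTP response code
    mime_type: str                              # MIME type of the response
    modified_source: Optional[str] = None       # present only for transformed responses
    modified_transformed: Optional[str] = None

    def summary(self) -> str:
        """One line of a requests listing."""
        return (f"{self.time:%H:%M:%S} {self.url} {self.status} "
                f"({status_series(self.status)}) {self.mime_type}")
```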
In the debug feature, a user interface comprising preview and request pages 46, 44 may be provided to a system administrator. The preview page 46, an example of which is shown in
When the debug mode is active, the text modifier 22 creates modified source HTML 34 and modified transformed HTML 36 content. The logger 16 is then called to log the request info 30 and response info 32, the modified source HTML 34, and the modified transformed HTML 36 for this particular request.
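The effect of the debug switch can be sketched as below: the annotated copies are produced and logged only when debugging is enabled, so normal proxying does no extra work. `DebugLog`, `process` and the `annotate` callback (standing in for the text modifier 22) are hypothetical names; an implementation following the description would pass the edit information to the text modifier rather than a bare callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DebugLog:
    """Hypothetical in-memory stand-in for the logger 16 and log 24."""
    enabled: bool = False
    entries: List[Dict[str, str]] = field(default_factory=list)


def process(url: str, source: str, transform: Callable[[str], str],
            annotate: Callable[[str], str], log: DebugLog) -> str:
    """Transform backend content; in debug mode, also log annotated copies."""
    transformed = transform(source)
    if log.enabled:
        # Annotated copies are built only in debug mode.
        log.entries.append({
            "url": url,
            "modified_source": annotate(source),
            "modified_transformed": annotate(transformed),
        })
    return transformed
```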
The user may then switch to the requests page 44, an example of which is shown in
For example, for a given request, clicking on the “Source” link on the request page 44 will bring up a screen such as that shown in
For the same request, clicking on the “Transformed” link on the request page 44 will bring up a screen like that shown in
The transformation engine 18 can be implemented as a “reverse proxy portlet,” which can be installed on a portal page like any other portlet, and which acts as a window through which users interact with the back-end application. The reverse proxy portlet provides a highly customizable solution to reverse proxying, where rules can be created for every individual transformation requirement. The configuration rules of the portlet comprise the set of pattern matching rules to identify and rewrite URLs in received content. These rules can be configured for individual applications. The reverse proxy portlet rewrites all URLs contained in the source HTML 26 to point to the portlet itself rather than to the backend server.
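A sketch of this portlet-style rewriting follows. Each `href`, `src` or `action` URL is first resolved against the address of the backend page, then replaced by a URL addressed to the portlet that carries the original target as a query parameter. The portlet URL and the `target` parameter name are hypothetical (a real portlet would obtain its URLs from the portal's API), and a single regular expression does not cover unquoted attributes or URLs built by script.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical address of the reverse proxy portlet.
PORTLET_URL = "/wps/portal/rp"

# Quoted href/src/action attributes; group 2 captures the quote character
# so the closing quote can be matched with the backreference \2.
_URL_ATTR = re.compile(r'\b(href|src|action)=(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_to_portlet(html: str, base_url: str) -> str:
    """Point every href/src/action URL at the portlet, carrying the original target."""
    def repl(m: re.Match) -> str:
        # Resolve relative links against the backend page that contained them.
        absolute = urljoin(base_url, m.group(3))
        target = quote(absolute, safe="")  # percent-encode ":" and "/" as well
        return f"{m.group(1)}={m.group(2)}{PORTLET_URL}?target={target}{m.group(2)}"
    return _URL_ATTR.sub(repl, html)


print(rewrite_to_portlet('<a href="page2.html">', "http://backend.example.com/app/page1.html"))
```

Resolving relative links before encoding them matters here, because once the page is served through the portlet the browser would otherwise resolve them against the portal's address rather than the backend's.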
Portlets have a number of different modes which can be selected, some of which are available only to a portlet developer or system administrator. The normal mode of operation of a portlet is the view mode, which is how the portlet is usually initially displayed to a user. A portlet may also support a help mode, which may provide a help page to enable users to obtain more information about the portlet. In the configure mode of a portlet, a portal developer or administrator can alter the configuration rules of the portlet. In an embodiment of the present invention, in the configure mode of the reverse proxy portlet the administrator is able to select a new “debug” feature which functions as described above.
Insofar as embodiments of the invention described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.
Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory, magnetic memory such as disc or tape, optically or magneto-optically readable memory such as compact disk (CD) or Digital Versatile Disk (DVD), etc., and the processing device utilizes the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present invention.
It will be understood by those skilled in the art that, although the present invention has been described in relation to the preceding example embodiments, the invention is not limited thereto and that there are many possible variations and modifications which fall within the scope of the invention. For example, the tool may be used in any rules-based transformation engine, and although the preferred embodiment has been described in relation to the transformation of HTML, the tool could be applied to the transformation of any kind of markup or text. Such modifications and variations that may be apparent to a person skilled in the art are intended to be included within the scope of this invention as defined by the accompanying claims.