This is an original U.S. patent application.
The invention relates to data collection and analysis. More specifically, the invention relates to methods for tracking interactions between users and data servers over the Internet.
The Internet is a global system of interconnected computer networks that supports communication between endpoints and among participating entities. Many different protocols are used to send and receive a wide range of different data types, from simple command and control signals to text, audio, images and video. One common protocol is the Hypertext Transfer Protocol (“HTTP”), specified in a series of Request for Comments (“RFC”) documents, the most recent of which is RFC2616, published June 1999 by The Internet Society. HTTP is the basic workhorse protocol underlying the World Wide Web.
The World Wide Web is a system of interlinked hypertext documents that may be accessed via the Internet, often using a computer program called a “browser.” The hypertext documents are stored at (or generated by) computers (“servers”) located at various places in the system of interconnected computers, and are delivered to users at other computers (“clients”) in response to requests from those clients.
There is no centralized registry or monitoring service that indexes all the materials available via the Internet, or that tracks what clients request or servers deliver.[1] Users are relatively unconstrained in the materials they request and the order in which they request them, while content providers have only modest control over the materials they deliver (providers can refuse to send a requested item, or send something else instead, but cannot generally compel the user to browse from one document to the next). Further, providers have only a limited ability to track user activity: they can usually determine which documents and document sequences a particular user retrieves from their own servers, but not what the user viewed before visiting their servers, or where the user went after the visit.
[1] Internet “Search Engines” such as the service operated by Google, Inc. of Mountain View, Calif., do attempt to index resources available via the Internet, and these are an important source of information. However, while many content providers seek to be listed in search engines' databases, such listing is neither compulsory nor assured.
This tracking or history data is of great interest to many entities offering products, services and information through the Internet. An entire industry of web analytics tools has emerged to give web content providers a detailed view of how their content is consumed. These tools can report, for instance, how many users viewed a given page on a certain day, where in the world those users are, and which other web site, if any, referred them to the content producer's site. Web content providers rely on these tools to better understand their audience, and thus to better achieve their goals (e.g., more viewers, more profit, etc.).
A variety of techniques have been developed to improve tracking ability and accuracy, but many of these require cooperation among entities. Such cooperation exposes a website operator to financial, legal and technical liabilities,[2] which may outweigh the value of the information, or at least partially offset it.
[2] For example, the cooperating entity can also collect information about the website's visitors, and may charge a fee for its cooperation.
An independent website operator, acting alone, may have more limited information available to it, or may have to resort to technical measures for collecting information that adversely affect its business in other ways. Alternative methods of collecting information about website visitors may be of significant value in this field.
A website using an embodiment of the invention sends a client-side program to a browser, along with other materials requested by the browser. The client-side program dynamically alters the browser's handling of some hyperlinks in a document, without changing the textual representation of the hyperlinks, so that the browser reports activation of an altered hyperlink to the website, even when activation of the hyperlink causes the browser to retrieve resources from a different website.
Embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean “at least one.”
Embodiments of the invention track some website-visitor departures by transmitting a client-side executable program that dynamically modifies a Document Object Model (“DOM”) structure created by the visitor's browser in the course of displaying a requested document. The modified DOM causes the browser to report activation of outbound links. Since the DOM is modified dynamically, the document content (including any exit destinations) can be indexed properly by a search engine. The operator of a website that employs an embodiment of the invention can track site departures without the cooperation of the external (destination) site's administrators.
At 200, the user directs his browser to retrieve a first web resource. This initial request may come from activation of a hyperlink in another program (e.g., an email viewer or a computer game), or the user may enter a Uniform Resource Locator (“URL”) manually. The browser contacts a web server via the Internet (210), issues a request for the resource (220), and receives data comprising the resource (230). Steps 210, 220 and 230 may be repeated several times (2123) to obtain related resources that are necessary to prepare or display the resource that the user wishes to view. Some types of data (e.g., images, audio) can be displayed or played for the user directly (240), while others (principally Hypertext Markup Language [“HTML”] documents) are parsed to create an in-memory Document Object Model (“DOM”) (250) which is further processed to produce a formatted representation for display (260) and then presented to the user (270).
A DOM may direct the browser to retrieve additional resources (e.g., images, fonts, formatting information or executable code) for use in preparing the display, so the browser may automatically issue additional requests to the web server and process the additional resources appropriately. In other words, steps 250, 260 and 270 may cause additional excursions through steps 210, 220 and 230.
Once the requested resource has been retrieved, prepared and presented, the user can review it (280). A word, phrase or image may be configured as a hyperlink to further information, and if the user activates the link (by using a browser-supported control action) (290), the browser repeats the retrieving-and-displaying sequence to show the linked-to information (or “target” of the hyperlink). Note that some hyperlinks refer the browser to a different resource available from the same web server, while others refer to a resource available from a different web server. The latter type of link will be called an “exit” link, since the browser normally ceases its interactions with the first web server, and starts a new conversation with a second web server. (Servers are often grouped together by a “domain” within which they operate. Domains are apparent to users as part of the URL. For example, the two URLs “http://www.example.com/doc1.html” and “http://www.example.com/doc2.html” refer to two resources, doc1.html and doc2.html, which are available from servers in the same www.example.com domain—in fact, possibly from a single server in that domain. On the other hand, the URL “http://www.other-domain.com/whitepaper.pdf” refers to a third resource which is available from a server at a different domain. It is appreciated that, at the server end, a single server may respond to requests for resources from different domains, or requests for resources from the same domain may be redirected to different servers. However, generally speaking, an embodiment of the invention is most beneficial in tracking a client's destination as it browses from server(s) in one domain, to an unrelated server in a different domain.)
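The distinction between same-domain links and exit links can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: the function name `isExitLink` is hypothetical, and comparing host names alone is an assumption (a real deployment might treat several hosts as one domain).

```javascript
// Hypothetical helper: decide whether following `href` from the page at
// `pageUrl` would leave the current host. Relative links are resolved
// against pageUrl by the standard URL constructor.
function isExitLink(href, pageUrl) {
  const target = new URL(href, pageUrl);
  const current = new URL(pageUrl);
  return target.hostname !== current.hostname;
}

const page = "http://www.example.com/doc1.html";
console.log(isExitLink("http://www.example.com/doc2.html", page));
console.log(isExitLink("http://www.other-domain.com/whitepaper.pdf", page));
```

With the example URLs from the text, the first call reports a same-domain link and the second reports an exit link.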
Web analytics tools do not in general show exit destinations today, because the information is difficult to collect. One prior-art method of measuring exit destination involves creating an “intermediate” URL that is owned by the content provider. The purpose of this URL is to capture the exit destination and then forward the user to that destination. For example, if “mysite.com” wanted to create a link to send users to “scoutanalytics.com”, mysite.com would actually create a link of the form:
http://mysite.com/exit-track?destination=scoutanalytics.com
This link really goes to mysite.com, not (directly) to scoutanalytics.com. The server for mysite.com would track the destination, and then redirect the user to the intended destination.
This prior-art method has considerable drawbacks. First, the site owner must create and manage the tracking mechanism. Second, users can determine that the URLs do not point to the ultimate destination directly, which may create user confusion. Third, search engines will not correctly interpret these links as pointing to the remote sites.
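The prior-art intermediate-URL approach described above might be sketched as follows. The function names, the in-memory log and the shape of the returned redirect description are all assumptions made for illustration; only the URL form is taken from the example in the text.

```javascript
// Sketch of the prior-art "intermediate URL" approach (names are illustrative).
const TRACKER_BASE = "http://mysite.com/exit-track";

// Build the link that the content provider would publish in its pages.
function buildIntermediateUrl(destination) {
  const url = new URL(TRACKER_BASE);
  url.searchParams.set("destination", destination);
  return url.toString();
}

// Server-side handling: record the destination, then describe the redirect
// that would be sent back to the browser.
function handleExitTrack(requestUrl, exitLog) {
  const destination = new URL(requestUrl).searchParams.get("destination");
  if (!destination) {
    return { status: 400 }; // no destination supplied
  }
  exitLog.push(destination);
  return { status: 302, location: "http://" + destination };
}
```

The sketch makes the drawbacks visible: the published link names mysite.com rather than the real destination, and every exit passes through the site's own server before the user is forwarded.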
Embodiments of the invention provide a superior solution to the problem of tracking exit destinations, at least because:
An embodiment of the invention adds a small amount of executable code to the materials transmitted to the web browser in response to a request, to cause the web browser to perform additional operations (apart from the normal operations outlined in
At 110, an executable program is transmitted to a web browser. This program may be inserted into another resource being transmitted, or may be referred to from such a resource, so that the browser makes an additional request to obtain the program. The program may be executable instructions in a scripting language such as JavaScript™, byte codes compiled for an interpreter such as Java™, machine instructions for a processor implementing an instruction set such as Intel® 64, IA-32 or ARM®, or a combination of such instructions. In one preferred embodiment, the instructions are in JavaScript, and utilize the jQuery library of functions.
The executable program causes the web browser to perform additional activities while generating and processing a Document Object Model in preparation for display: first, the browser iterates over hyperlink objects in the DOM (120), and for at least some hyperlinks, a modification is made to cause the browser to perform additional actions if the hyperlink is activated (130). This modification is made dynamically, to an ephemeral, often in-memory DOM, rather than to the original resource (e.g., the HTML document) from which the DOM was created. Thus, neither the original resource (at the server) nor the resource (at the machine that retrieved it) is modified. This is an important difference: some prior-art methods of performing exit tracking require modification of hyperlinks in the source document from which the DOM is prepared. Such modified hyperlinks are often indexed differently (and unfavorably) by Internet search engines.
Finally, when one of the augmented or modified hyperlinks is activated by the user (140), the browser's default action (typically, to retrieve and display material from the hyperlink's target) is extended by transmitting an exit notification (150) before the browser navigates to the hyperlink destination (160).
At the other end of the exit-notification transmission (150), the web server that originally provided the executable program (or, sometimes, a different server) receives the notification (170) and records the information in a database (180) for later analysis.
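The iterate, modify and notify steps (120 through 150) can be sketched in plain JavaScript as shown here. This is a minimal sketch and not the jQuery-based code of the listings: the function name `instrumentLinks`, the `/exit-track` endpoint and the host-name test are hypothetical, and the standard `navigator.sendBeacon` call is substituted as one possible way to transmit the notification.

```javascript
// Attach an exit-notification handler to each link accepted by shouldTrack.
// `links` is any iterable of objects exposing `href` and `addEventListener`
// (in a browser, the result of document.querySelectorAll("a[href]")).
// Returns the number of links that were instrumented.
function instrumentLinks(links, shouldTrack, notify) {
  let count = 0;
  for (const link of links) {
    if (!shouldTrack(link.href)) continue;
    link.addEventListener("click", () => {
      // Report the exit; the default navigation is not cancelled,
      // so the browser still proceeds to the link's target.
      notify(link.href);
    });
    count += 1;
  }
  return count;
}

// Browser wiring (skipped when no DOM is present).
if (typeof document !== "undefined") {
  instrumentLinks(
    document.querySelectorAll("a[href]"),
    // Assumption: track only links whose host differs from the current page.
    (href) => new URL(href, location.href).hostname !== location.hostname,
    // Hypothetical endpoint on the originating web server.
    (href) => navigator.sendBeacon("/exit-track", href)
  );
}
```

Because the handlers are attached to the in-memory DOM, the hyperlinks in the source document keep their original targets.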
Listing 1 shows a simple code fragment that can be added to documents at a server to cause the browser to retrieve the executable program:
<script src="js/jquery-1.4.2.min.js" type="text/javascript"></script>
This code fragment can often be added on a site-wide basis by editing a single, commonly-included header file, or by inserting it as a customization to the framework code of a Content Management System (“CMS”). Alternatively, it can be added on an ad-hoc basis to particular files for which exit tracking is desired.
Listing 2 shows an example JavaScript method that can be attached to a hyperlink in the DOM to perform exit tracking:
And finally, Listing 3 shows an example JavaScript program to iterate over hyperlinks in a DOM and attach an exit-tracking function to (some of) them:
This example uses the jQuery library; the selector at line 30 identifies a subset of hyperlink (“anchor”) tags that should be tracked (in this example, exit URLs that start with “www.somesite.com”, that end with a “.pdf” extension, or that are present in a “videos” subdirectory).
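A selector of the kind described might look like the following sketch. The selector string, the `http://` prefix and the predicate `isTrackedHref` are assumptions made for illustration; the sketch relies only on standard CSS attribute selectors (`^=`, `$=`, `*=`), which jQuery also accepts.

```javascript
// Assumed selector built from standard CSS attribute selectors.
const TRACKED_SELECTOR = [
  'a[href^="http://www.somesite.com"]', // href starts with the site prefix
  'a[href$=".pdf"]',                    // href ends with ".pdf"
  'a[href*="/videos/"]',                // href contains a "videos" directory
].join(", ");

// Plain-JavaScript predicate mirroring the selector, usable without a DOM.
function isTrackedHref(href) {
  return (
    href.startsWith("http://www.somesite.com") ||
    href.endsWith(".pdf") ||
    href.includes("/videos/")
  );
}
```

In a browser with jQuery loaded, `$(TRACKED_SELECTOR)` would return the matching anchor elements, to which a tracking function could then be attached.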
Listing 4 shows a more-complicated jQuery routine that arranges for more detail about the exit destination to be passed to the exit tracking server than the saRthree( ) function shown above in Listing 2.
In some embodiments, the anchor-tag processing function (e.g., Listing 3) may be designed to attach different tracking functions to different subsets of hyperlinks in the DOM. For example, tags with a particular “id” or “class” specification may be outfitted with different functions. Different target (“href”) destinations may call for different functions. These different functions may report different information to the exit tracking server, or may report tracking information to different exit tracking servers (e.g., there may be separate tracking servers for video exits, PDF exits, and e-commerce site exits).
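One way to route different subsets of hyperlinks to different tracking servers is a small rule table, sketched below. The rule names, the `shop-link` class and the server URLs are hypothetical; `link` stands for any object with `href` and `className` properties, such as an anchor element.

```javascript
// Hypothetical rule table: the first matching rule decides which
// tracking server receives the exit report for a link.
const RULES = [
  {
    name: "video",
    matches: (link) => link.href.includes("/videos/"),
    server: "https://video-exits.example.com/track",
  },
  {
    name: "pdf",
    matches: (link) => link.href.endsWith(".pdf"),
    server: "https://pdf-exits.example.com/track",
  },
  {
    name: "shop",
    matches: (link) => link.className.split(/\s+/).includes("shop-link"),
    server: "https://shop-exits.example.com/track",
  },
];

// Return the first rule that applies to the link, or null when the
// link should not be tracked at all.
function chooseRule(link) {
  return RULES.find((rule) => rule.matches(link)) || null;
}
```

Keeping the rules in data rather than in code paths makes it straightforward to add a new category of exit without touching the iteration logic.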
The embodiments described to this point have been suited for deployment at a single site (or within a single domain). However, by adding a few extra elements, exit tracking can be provided on a “service bureau” basis. I.e., a web analytics firm can provide exit tracking (along with other visitor analyses) for a plurality of unrelated customers who operate websites at different, unrelated domains. Each customer can receive exit-tracking information, even if the exit destination site operators are not also clients of the service bureau.
To accomplish this, the website of a customer of the web-analytics service bureau transmits an executable program to its site visitors, just as if the customer were operating its own stand-alone embodiment. However, this executable program performs additional operations, as outlined in
Early in the creation or processing of the Document Object Model for the instrumented web page, executable code implementing an embodiment of the invention reports correlation data to the analytics server (310). This can be accomplished by adding a small, transparent image (a “pixel tag”) to the DOM, to cause the browser to retrieve the image from the analytics server. The data retrieved is relatively unimportant; this request from the browser is principally useful because it causes the browser to report information about the browser and the page (at the analytics-customer's web site) that is being displayed. The information often comprises a unique token or tracking string to help the analytics server distinguish between different browsers that happen to be viewing the same resource at the web server.
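The pixel-tag report (310) might be sketched as follows. The analytics host, the query parameter names `page` and `token`, and the token format are all assumptions; the text does not specify how the tracking string is generated or encoded.

```javascript
// Build the URL of the transparent image; the query string carries the
// page being viewed and a token identifying this browser's visit.
function buildPixelUrl(analyticsBase, pageUrl, token) {
  const url = new URL(analyticsBase);
  url.searchParams.set("page", pageUrl);
  url.searchParams.set("token", token);
  return url.toString();
}

// Illustrative token generator (not cryptographically strong).
function makeToken() {
  return Math.random().toString(36).slice(2) + Date.now().toString(36);
}

// Browser wiring (skipped when no DOM is present): adding the image to
// the DOM causes the browser to request it from the analytics server.
if (typeof document !== "undefined") {
  const img = document.createElement("img");
  img.width = 1;
  img.height = 1;
  img.alt = "";
  img.src = buildPixelUrl(
    "https://analytics.example.com/pixel.gif", // hypothetical analytics server
    location.href,
    makeToken()
  );
  (document.body || document.documentElement).appendChild(img);
}
```

The image content itself is irrelevant; the request and its parameters are what the analytics server records.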
Next, as in the self-hosted embodiments, the executable program iterates over hyperlinks in the DOM (320) and supplements or replaces the default action in some or all of the links (330). When one of the modified hyperlinks is activated (340), instead of navigating directly to the target URL, the browser executes additional code to issue another request to the analytics server (350). The request lists the current page as the referrer and the hyperlink's target URL as the source page. Additional details (e.g., the unique token or tracking string) may also be included in the request. The resource retrieved by this request is, again, relatively unimportant, but the fact that the request was issued allows the analytics server to record useful information such as the amount of time the user spent viewing the page and the exit destination URL. The executable code that handles the hyperlink activation may even abandon the HTTP request after issuing it (360).
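On the analytics server, the page-view request and the later exit request can be correlated through the tracking token to derive the time spent on the page. The sketch below assumes an in-memory map in place of a database, and hypothetical function names; the text does not say how the server stores or matches the two requests.

```javascript
// In-memory stand-in for the analytics database (hypothetical structure):
// token -> { pageUrl, viewedAt }
const visits = new Map();

// Called when the pixel-tag request for a page arrives.
function recordPageView(token, pageUrl, timestampMs) {
  visits.set(token, { pageUrl: pageUrl, viewedAt: timestampMs });
}

// Called when the exit request arrives; returns the combined record,
// or null when no page view with the same token was seen.
function recordExit(token, targetUrl, timestampMs) {
  const visit = visits.get(token);
  if (!visit) return null;
  return {
    pageUrl: visit.pageUrl,
    exitUrl: targetUrl,
    timeOnPageMs: timestampMs - visit.viewedAt,
  };
}
```

For example, a page view recorded at 1,000 ms and an exit recorded at 61,000 ms with the same token yield a time on page of 60,000 ms.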
Finally, the executable code directs the browser to retrieve the off-site target resource (370), and the browser's normal logic takes over to process and display the resource (380). In a practical implementation, the target-URL-processing code of an embodiment may include tests for conditions that could prevent the exit link from being opened as expected by the original anchor tag.
It is appreciated that instrumented or augmented hyperlinks need not refer to external or cross-domain resources; an embodiment may be applied to any hyperlinks found in the DOM.
Since the requests (450, 470) are issued in parallel, the user's experience is not delayed by the processing time of the analytics server, as would be the case with a conventional exit-tracking method involving a reporting-and-redirect process. Instead, the client's browser quickly proceeds to obtain, process and display the resource associated with the instrumented hyperlink's original target location.
Software engineers of ordinary skill will recognize that “parallel” execution, in the narrowest sense of “simultaneous performance of two different instruction sequences,” is often impossible in light of hardware and software limitations. However, most contemporary computers, operating systems, and software execution environments available within a web browser can simulate parallel or simultaneous execution by means of time sharing, threading, asynchronous callback notifications, and similar constructs. An embodiment of the invention may leverage such facilities to accomplish program behavior that appears (at a macro level) to be concurrent, simultaneous or otherwise parallel.
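The parallel issue of the reporting request and the navigation request can be sketched with injected callbacks, as below. The function name `reportAndNavigate` and its parameters are hypothetical; the point of the sketch is only that navigation starts without waiting for the analytics server to answer.

```javascript
// Start the exit report, then navigate immediately. `sendReport` may
// return a promise (for example from an asynchronous HTTP request);
// its outcome is deliberately not awaited.
function reportAndNavigate(targetUrl, sendReport, navigate) {
  try {
    const pending = sendReport(targetUrl);
    // Swallow late failures so they cannot surface as unhandled rejections.
    Promise.resolve(pending).catch(() => {});
  } catch (err) {
    // A failed report must never prevent the user from leaving.
  }
  navigate(targetUrl);
}
```

In a browser, `navigate` could simply assign the target to `location.href`, and `sendReport` could issue an asynchronous request to the analytics server. A conventional reporting-and-redirect method would instead make the navigation wait for the tracking server's response.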
An embodiment of the invention may be a machine-readable medium having stored thereon data and instructions to cause a programmable processor to perform operations as described above. In other embodiments, the operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmed computer components and custom hardware components.
Instructions for a programmable processor may be stored in a form that is directly executable by the processor (“object” or “executable” form), or the instructions may be stored in a human-readable text form called “source code” that can be automatically processed by a development tool commonly known as a “compiler” to produce executable code. Instructions may also be specified as a difference or “delta” from a predetermined version of a basic source code. The delta (also called a “patch”) can be used to prepare instructions to implement an embodiment of the invention, starting with a commonly-available source code package that does not contain an embodiment.
In some embodiments, the instructions for a programmable processor may be treated as data and used to modulate a carrier signal, which can subsequently be sent to a remote receiver, where the signal is demodulated to recover the instructions, and the instructions are executed to implement the methods of an embodiment at the remote receiver. In the vernacular, such modulation and transmission are known as “serving” the instructions, while receiving and demodulating are often called “downloading.” In other words, one embodiment “serves” (i.e., encodes and sends) the instructions of an embodiment to a client, often over a distributed data network like the Internet. The instructions thus transmitted can be saved on a hard disk or other data storage device at the receiver to create another embodiment of the invention, meeting the description of a machine-readable medium storing data and instructions to perform some of the operations discussed above. Compiling (if necessary) and executing such an embodiment at the receiver may result in the receiver performing operations according to a third embodiment.
In the preceding description, numerous details were set forth. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without some of these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Some portions of the detailed descriptions may have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the preceding discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, including without limitation any type of disk including floppy disks, optical disks, compact disc read-only memory (“CD-ROM”), and magneto-optical disks, read-only memories (“ROMs”), random access memories (“RAMs”), erasable, programmable read-only memories (“EPROMs”), electrically-erasable, programmable read-only memories (“EEPROMs”), magnetic or optical cards, or any type of media suitable for storing computer instructions.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will be recited in the claims below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
The applications of the present invention have been described largely by reference to specific examples and in terms of particular allocations of functionality to certain hardware and/or software components. However, those of skill in the art will recognize that website visitor exit tracking can also be produced by software and hardware that distribute the functions of embodiments of this invention differently than herein described. Such variations and implementations are understood to be captured according to the following claims.