The disclosed embodiments relate generally to tracking related events. In particular, the disclosed embodiments relate to a system and method for tracking a sequence of events preceding conversion events based on Internet traffic data.
Internet traffic data may be analyzed to gain insight into the behavior of Internet users. For example, search queries and corresponding user clicks on search results may be used to improve search results for future search queries. However, there is presently no way to track related search queries of a respective user that led to a click on a search result. Similarly, web analytics systems allow an operator of a web site to obtain statistics about requests for web pages made by visitors to the web site. The statistics may also include statistics about the effectiveness of advertisement campaigns. For example, an operator of a website may be interested in the number of impressions (i.e., the number of views of an advertisement campaign), the number of click-throughs (i.e., the number of clicks the advertisement campaign received), and the number of conversions (i.e., the number of people that performed a desired action associated with the advertisement campaign) for the advertisement campaign. Although these statistics are useful for gauging the success of an advertisement campaign, these statistics do not allow the operator of the website to understand the sequence of events that led up to a conversion.
Some embodiments provide a system, a computer-readable storage medium including instructions, and a computer-implemented method for tracking conversion events. Tracking events are stored in a history table of a database, wherein the tracking events include conversion events associated with predetermined actions performed by users on websites, and wherein a respective tracking event is associated with a respective user and a respective website. A conversion event stored in the history table of the database is then identified, wherein the conversion event is associated with a predetermined action performed by a user on a website. Next, a set of tracking events that are associated with the website, that are associated with the user, and that occurred prior in time to the conversion event is retrieved from the history table. In response to a request from a user, a report is generated for display on a client computer system, wherein the report includes the set of tracking events and the conversion event.
In some embodiments, a respective tracking event is selected from the group consisting of a conversion event that is generated when a user performs a predetermined action on a website, an impression event that is generated when an advertisement is displayed to a user, and a click-through event that is generated when a user clicks on an advertisement.
In some embodiments, the predetermined action performed by the user is selected from the group consisting of purchasing a product or service associated with the advertisement, visiting a website associated with the advertisement, and completing a survey.
In some embodiments, prior to storing the tracking events in the history table of the database, the tracking events are periodically obtained from log files.
In some embodiments, the database is a distributed database.
In some embodiments, the distributed database is a multi-dimensional sorted map.
In some embodiments, a respective tracking event is stored into the distributed database as follows. An event type of the respective tracking event is determined. A row name is generated based on an identifier of a respective website associated with the respective tracking event and an identifier of a user associated with the respective tracking event. Data for the respective tracking event is stored in a respective entry of the distributed database, wherein the respective entry has an index based on the row name, the event type, and a timestamp corresponding to a time when the respective tracking event was generated.
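As a minimal sketch of the storage scheme described above, the following Python models the history table as an in-memory dictionary whose index is the tuple (row name, event type, timestamp). The helper names `make_row_name` and `store_event`, the `website#user` row-name format, and the dictionary itself are illustrative assumptions and are not part of the disclosed embodiments.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the distributed database.
history_table = {}


def make_row_name(website_id: str, user_id: str) -> str:
    """Build a row name from the website and user identifiers (format assumed)."""
    return f"{website_id}#{user_id}"


def store_event(table, website_id, user_id, event_type, timestamp, data):
    """Store event data under an index of (row name, event type, timestamp)."""
    key = (make_row_name(website_id, user_id), event_type, timestamp)
    table[key] = data
    return key


# Example: an impression event for user "user-1" on website "site-1".
key = store_event(
    history_table,
    "site-1",
    "user-1",
    "impression",
    datetime(2010, 1, 10, tzinfo=timezone.utc),
    {"ad_id": "ad-1"},
)
```

Because the row name combines the website and user identifiers, all tracking events for one user on one website share a row in this sketch, which keeps the later lookup of events preceding a conversion confined to a single row.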
In some embodiments, locality groups of the distributed database are designated based on the event types of the tracking events.
In some embodiments, a first locality group includes conversion events, and a second locality group includes impression events and click-through events.
In some embodiments, the conversion event stored in the history table of the database is identified as follows. A conditional read against the first locality group is performed to retrieve one or more conversion events stored in the history table. The conversion event is then selected from the one or more conversion events.
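The locality-group read described above can be sketched as follows. This is an illustration only: the `HistoryTable` class, the group names `group_1` and `group_2`, and the predicate-based `read_conversions` method are assumptions standing in for whatever conditional-read facility the distributed database provides.

```python
from collections import defaultdict

# Assumed mapping of event types to locality groups.
LOCALITY_GROUP = {
    "conversion": "group_1",
    "impression": "group_2",
    "click_through": "group_2",
}


class HistoryTable:
    """Toy history table that keeps each locality group in separate storage."""

    def __init__(self):
        # locality group -> row name -> list of (timestamp, event type, data)
        self._groups = defaultdict(lambda: defaultdict(list))

    def put(self, row_name, event_type, timestamp, data=None):
        group = LOCALITY_GROUP[event_type]
        self._groups[group][row_name].append((timestamp, event_type, data))

    def read_conversions(self, predicate=lambda row_name, timestamp: True):
        """Conditional read that scans only the conversion locality group."""
        results = []
        for row_name, cells in self._groups["group_1"].items():
            for timestamp, _event_type, data in cells:
                if predicate(row_name, timestamp):
                    results.append((row_name, timestamp, data))
        return sorted(results, key=lambda r: (r[0], r[1]))


table = HistoryTable()
table.put("site-1#user-1", "impression", 100)
table.put("site-1#user-1", "click_through", 150)
table.put("site-1#user-1", "conversion", 200)
table.put("site-1#user-2", "impression", 120)

# Only the conversion group is scanned; impressions and clicks are never read.
conversions = table.read_conversions(lambda row_name, timestamp: timestamp >= 150)
```

The point of the separation is that the scan cost grows with the number of conversion events rather than with the total number of tracking events.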
In some embodiments, an aggregated view of tracking events for a respective website is periodically generated across all users that performed the predetermined action on the respective website.
In some embodiments, tracking events are periodically removed from the history table based on a garbage collection policy.
In some embodiments, the garbage collection policy is selected from the group consisting of a time-based garbage collection policy that removes tracking events older than a predetermined age, a user-based garbage collection policy that removes tracking events based on an identifier of a user, and a website-based garbage collection policy that removes tracking events based on an identifier of a website.
In some embodiments, the website is selected from the group consisting of an e-commerce website, an auction website, a multimedia-download website, a charitable contribution website, and a survey website.
In some embodiments, the set of tracking events that is retrieved from the history table includes only the tracking events that occurred within a predetermined time interval prior to the occurrence of the conversion event.
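The retrieval of tracking events that precede a conversion event, optionally limited to a predetermined time interval, might look like the following sketch. The `TrackingEvent` record, the `events_before_conversion` function, and the use of integer seconds for timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingEvent:
    website_id: str
    user_id: str
    event_type: str
    timestamp: int  # seconds since an arbitrary epoch (assumed unit)


def events_before_conversion(events, conversion, window_seconds=None):
    """Return events for the same user and website that precede the conversion.

    If window_seconds is given, only events within that interval before the
    conversion are kept.
    """
    earliest = None
    if window_seconds is not None:
        earliest = conversion.timestamp - window_seconds
    selected = [
        e
        for e in events
        if e.website_id == conversion.website_id
        and e.user_id == conversion.user_id
        and e.timestamp < conversion.timestamp
        and (earliest is None or e.timestamp >= earliest)
    ]
    return sorted(selected, key=lambda e: e.timestamp)


events = [
    TrackingEvent("site-1", "user-1", "impression", 1_000),
    TrackingEvent("site-1", "user-1", "click_through", 5_000),
    TrackingEvent("site-1", "user-2", "impression", 5_500),
    TrackingEvent("site-1", "user-1", "conversion", 6_000),
]
conversion = events[-1]

# Keep only events in the 2,000 seconds leading up to the conversion.
path = events_before_conversion(events, conversion, window_seconds=2_000)
```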
Like reference numerals refer to corresponding parts throughout the drawings.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the embodiments, it will be understood that the invention is not limited to these particular embodiments. On the contrary, the invention includes alternatives, modifications and equivalents that are within the spirit and scope of the appended claims. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
A client device 102 (also known as a “client”) may be any computer or similar device through which a user of the client device 102 can submit data access requests to and receive results or other services from the server system 106, web servers 130, and/or web server 140. Examples include, without limitation, desktop computers, laptop computers, tablet computers, mobile devices such as mobile phones, personal digital assistants, set-top boxes, or any combination of the above. A respective client 102 may contain at least one client application 112 for submitting requests to the server system 106, the web servers 130, and/or the web server 140. For example, the client application 112 can be a web browser or other type of application that permits a user to access the services provided by the server system 106, the web servers 130, and/or the web server 140.
In some embodiments, the client application 112 includes one or more client assistants 114. A client assistant 114 can be a software application that performs tasks related to assisting a user's activities with respect to the client application 112 and/or other applications. For example, the client assistant 114 may assist a user at the client device 102 with browsing information (e.g., web pages retrieved from the web servers 130 and/or 140), processing information (e.g., query results) received from the server system 106, and monitoring the user's activities on the query results. In some embodiments, the client assistant 114 is embedded in a web page (e.g., a query results web page) or other documents downloaded from the server system 106. In some embodiments, the client assistant 114 is a part of the client application 112 (e.g., a plug-in application of a web browser). The client 102 further includes a communication interface 118 to support the communication between the client 102 and other devices (e.g., the server system 106 or another client device 102).
The communication network(s) 104 can be any wired or wireless local area network (LAN) and/or wide area network (WAN), such as an intranet, an extranet, the Internet, or a combination of such networks. In some embodiments, the communication network 104 uses the Hypertext Transfer Protocol (HTTP) and the Transmission Control Protocol/Internet Protocol (TCP/IP) to transport information between different networks. HTTP permits client devices to access various information items available on the Internet via the communication network 104. The various embodiments of the invention, however, are not limited to the use of any particular protocol.
In some embodiments, the server system 106 includes a web interface 108 (also referred to as a “front-end server”), a server application 110 (also referred to as a “mid-tier server”), and a backend server 120. The web interface 108 receives data access requests from client devices 102 and forwards the requests to the server application 110. In response to receiving the requests, the server application 110 decides how to process the requests, including identifying data filters associated with a request, checking whether it has data available for the request, submitting queries to the backend 120 for data requested by the client, processing the data returned by the backend 120 that matches the queries, and returning the processed data as results to the requesting clients 102. After receiving a result, the client application 112 at a particular client 102 displays the result to the user who submitted the original request.
In some embodiments, the backend 120 is effectively a database management system including a database server 123 that is configured to manage a database 124. In some embodiments, the database 124 is stored at the server system 106. In some embodiments, the database 124 is located on a computer system that is separate and distinct from the server system 106. In some embodiments, the database 124 includes aggregate tables 125. Aggregate tables include data that is aggregated on a periodic basis and allow the server system 106 to quickly provide results for data that is commonly requested. In some embodiments, the database 124 includes data records 126. In response to a query submitted by the server application 110, the database server 123 identifies zero or more data records that satisfy the query and returns the data records to the server application 110 for further processing. In some embodiments, the database 124 includes a history table 127 that stores tracking events. In some embodiments, the tracking events include a conversion event that is generated when a user performs a predetermined action on a website, an impression event that is generated when an advertisement is displayed to a user, and/or a click-through event that is generated when a user clicks on an advertisement. In some embodiments, the website is selected from the group consisting of an e-commerce website, an auction website, a multimedia-download website, a charitable contribution website, and a survey website. These embodiments are described in more detail with respect to
In some embodiments, the database 124 is a distributed database. In some embodiments, the distributed database is a multi-dimensional sorted map. For example, the multi-dimensional sorted map may be a BigTable.
In some embodiments, the server system 106 is an application service provider (ASP) that provides web analytics services to its customers (e.g., a web site owner) by visualizing the traffic data generated at a web site in accordance with various user requests. To do so, the server system 106 may include an analytics system 150 adapted for processing the raw traffic data of a web server 130 and other types of traffic data generated by the web server 130 through techniques such as page tagging. Note that the traffic data may include any type of user traffic (e.g., requests for static or dynamic web pages, traffic from mobile applications, requests by and requests for Flash applications, etc.). In some embodiments, the traffic data includes tracking events produced from user actions on the web servers 130. In some embodiments, the server system 106 analyzes the traffic data to identify tracking events that lead up to a conversion event. For example, the server system 106 may identify a conversion event produced by actions of a user on a first website. Based on the conversion event, the server system 106 may then identify all (or a subset of) the tracking events (e.g., impression events, click-through events, and/or conversion events) associated with the user and the first website that occurred prior in time to the particular conversion event. Note that the tracking events can be generated in response to actions of a user on other websites (e.g., websites other than the first website).
In some embodiments, the raw traffic data is obtained from log files 136 of the web servers 130. In these embodiments, the web servers 130 provide access to the log files 136 to the analytics system 150.
In some embodiments, the raw traffic data is obtained from log files 144 of a web server 140. In these embodiments, content providers insert tracking code (e.g., a script) into documents (e.g., web pages 132) for which the content providers desire to obtain traffic data. When these documents are accessed by users, the tracking code is executed and a request for a tracking object 142 (e.g., a specified image file) on the web server 140 is generated. In some embodiments, the request for the tracking object 142 includes parameters that provide information about the page being requested. The request for the tracking object 142 is recorded in the log files 144, including any parameters associated with the request for the tracking object. In some embodiments, the web servers 130 include the tracking object 142 that the analytics system 150 uses to track hits to web pages 132. In these embodiments, the analytics system 150 obtains the log files from the web servers 130.
In some embodiments, the raw traffic data is transmitted directly from the client devices 102 to the analytics system 150. In these embodiments, content providers insert tracking code (e.g., a script) into documents (e.g., web pages 132) for which the content providers desire to obtain traffic data. When these documents are accessed by users, the tracking code is executed by the client devices 102 and a request for a tracking object 152 (e.g., a specified image file) on the server system 106 is generated. The analytics system 150 receives the request from the client devices 102, processes the raw traffic data, and stores attribute-value pairs associated with the raw traffic data in the database 124. In some embodiments, the request for the tracking object 152 includes parameters that provide information about the page being requested.
In some embodiments, the tracking object 142 (or 152) is a tracking object for an advertisement associated with a website. In these embodiments, when a client assistant (e.g., the client assistant 114) of a client device (e.g., the client device 102-1) displays the advertisement associated with the website, the client assistant executes code associated with the advertisement that generates a request for the tracking object 142 (or 152), wherein the request includes parameters indicating that the advertisement was displayed (i.e., an impression of the advertisement was produced). This request for the tracking object 142 (or 152) generates an impression event in the log files 144 (or an impression event on the server system 106). When a user of the client device clicks on the displayed advertisement, the client assistant executes code associated with the advertisement that generates a request for the tracking object 142 (or 152), wherein the request includes parameters indicating that the advertisement was clicked (i.e., a click-through of the advertisement was produced). This request for the tracking object 142 (or 152) generates a click-through event in the log files 144 (or a click-through event on the server system 106). When a user performs a predetermined action on the website associated with the advertisement, the website (or alternatively, the client assistant 114) generates a request for the tracking object 142 (or 152) that includes parameters indicating that the predetermined action on the website was performed by the user. This request for the tracking object 142 (or 152) generates a conversion event in the log files 144 (or a conversion event on the server system 106). Note that the user may have been shown the advertisement and/or the user may have clicked on the advertisement a number of times over a period of time prior to performing the predetermined action on the website associated with the advertisement (i.e., generating the conversion event).
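The tracking-object requests described above can be illustrated with a short sketch that encodes the event type and identifiers as request parameters and recovers them on the receiving side. The URL `https://tracker.example.com/pixel.gif` and the parameter names `event`, `site`, `uid`, and `ad` are hypothetical; the disclosure does not specify a parameter format.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical location of the tracking object (e.g., a small image file).
TRACKING_OBJECT_URL = "https://tracker.example.com/pixel.gif"


def build_tracking_request(event_type, website_id, user_id, ad_id):
    """Build a request URL whose parameters describe the tracking event."""
    params = {"event": event_type, "site": website_id, "uid": user_id, "ad": ad_id}
    return f"{TRACKING_OBJECT_URL}?{urlencode(params)}"


def parse_tracking_request(url):
    """Recover the tracking parameters from a logged request URL."""
    query = parse_qs(urlparse(url).query)
    return {name: values[0] for name, values in query.items()}


# An impression is reported when the advertisement is displayed.
url = build_tracking_request("impression", "site-1", "user-1", "ad-1")
event = parse_tracking_request(url)
```

The same pair of functions would serve click-through and conversion events by passing a different `event_type` value.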
The embodiments described herein disclose techniques for tracking the tracking events leading up to the conversion event.
Note that in any of the aforementioned techniques, the raw traffic data may be included in an activity file. For example, the activity file may be the log files 136, the log files 144, or the raw traffic data received directly from the client devices 102. Also note that for the sake of clarity, the disclosed embodiments are described with respect to using the web server 140 to track requests for web pages of a web site using the tracking object 142 and log files 144. However, any of the techniques for acquiring raw traffic data may be used. Furthermore, note that any technique for tracking raw traffic data may be used. For example, the raw traffic data may be stored in cookies on a client computer system that are periodically transmitted to the server system 106 for analysis, as described herein. Similarly, the raw traffic data may be stored on a client computer system (e.g., using a cookie, a database, etc.) and analyzed locally on the client computer system using the techniques described herein. The analyzed data may then be transmitted to the server system 106 for storage.
After the raw traffic data is obtained from the activity files, the raw web traffic data is first processed into a multidimensional dataset that includes multiple dimensions and multiple metric attributes (or measures) before the server system 106 can answer any data visualization requests through the web interface 108. A more detailed description of the processing of raw web traffic data can be found in the U.S. Provisional Patent Application No. 61/181,275, filed May 26, 2009, entitled “System and Method for Aggregating Analytics Data” (attorney docket no. 060963-5406-PR) and the U.S. Provisional Patent Application No. 61/181,276, filed May 26, 2009, entitled “Dynamically Generating Aggregate Tables” (attorney docket no. 060963-5409-PR), the contents of which are incorporated by reference herein in their entirety. For simplicity, it is assumed herein that the data records managed by the backend 120 and accessible to the server application 110 are not the raw web traffic data, but the data after being pre-processed. Note that the traffic data may be sessionized and/or aggregated.
For convenience and custom, the web traffic data of a user session (or a visit) is further divided into one or more hits 230A to 230N. Note that hits 230A to 230N are also referred to as “hit records” or “database hit records” 230A to 230N. Also note that the terms “session” and “visit” are used interchangeably throughout this application. In the context of web traffic, a hit typically corresponds to a request to a web server for a document such as a web page, an image, a JavaScript file, a Cascading Style Sheet (CSS) file, etc. Each hit 230A may be characterized by attributes such as type of hit 240A (e.g., transaction hit, etc.), referral URL 240B (i.e., the web page the visitor was on when the hit was generated), a timestamp 240C that indicates when the hit occurs and so on. Note that the session-level and hit-level attributes as shown in
Referring back to
The process of generating a web analytics report is described in detail in U.S. patent application Ser. No. 12/575,437, filed Oct. 7, 2009, entitled “Method and System for Generating and Sharing Dataset Segmentation Schemes,” the content of which is incorporated by reference herein in its entirety.
Attention is now directed to
For high-volume implementations of the server system 106, the history table 127 may include over a billion rows, of which on the order of a few million rows are conversion events. Since the conversion events are sparsely populated in the history table 127, identifying the relatively few conversion events within the history table 127 is a time-consuming task for a traditional relational database management system. Thus, in some embodiments, the history table 127 is stored in a distributed database. In some embodiments, the distributed database is a multi-dimensional sorted map (e.g., BigTable). In these embodiments, data is stored into the database using a mapping of: {row key, event type, timestamp}. For example, a mapping may be {(user ID 1, advertisement ID 1), impression, Jan. 10, 2010}, corresponding to an impression event that occurred on Jan. 10, 2010 and that is associated with a user having a user ID of “1” and an advertisement having an advertisement ID of “1”. In some embodiments, to further improve read performance of the distributed database, locality groups are defined based on event types of the tracking events. For example, as illustrated in
Returning to
After the report module 330 identifies rows having one or more conversion events, the report module 330 generates reports based on the conversion events, the impression events, and the click-through events.
In some embodiments, the report module 330 periodically reads from the history table 127. In these embodiments, the report module 330 only reads and analyzes tracking events that are new since the prior read from the history table 127.
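One way to read only the tracking events that are new since the prior read is to keep a high-water mark of the most recent write time already processed. The sketch below assumes each stored event carries a monotonically increasing write time; the `IncrementalReader` class and that write-time field are illustrative assumptions and not a description of the report module 330 itself.

```python
class IncrementalReader:
    """Returns only events written after the previous call to read_new."""

    def __init__(self):
        self._last_read = float("-inf")  # high-water mark of write times seen

    def read_new(self, events):
        """events: iterable of (write_time, event) pairs."""
        new = [(t, e) for t, e in events if t > self._last_read]
        if new:
            self._last_read = max(t for t, _ in new)
        return [e for _, e in sorted(new, key=lambda pair: pair[0])]


log = [(1, "impression"), (2, "click_through")]
reader = IncrementalReader()

first = reader.read_new(log)   # everything is new on the first read
log.append((3, "conversion"))
second = reader.read_new(log)  # only the event written since the first read
```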
In some embodiments, a garbage collection module 350 periodically removes tracking events from the history table based on a garbage collection policy. In some embodiments, the garbage collection policy is selected from the group consisting of a time-based garbage collection policy that removes tracking events older than a predetermined age, a user-based garbage collection policy that removes tracking events based on an identifier of a user, and a website-based garbage collection policy that removes tracking events based on an identifier of a website.
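The three garbage collection policies can be expressed as interchangeable predicates, as in the following sketch. The dictionary-based event records, the function names, and the integer timestamps are assumptions chosen for brevity.

```python
def time_based(max_age, now):
    """Policy: remove events older than max_age (same unit as timestamps)."""
    return lambda event: now - event["timestamp"] > max_age


def user_based(user_id):
    """Policy: remove events associated with a given user identifier."""
    return lambda event: event["user_id"] == user_id


def website_based(website_id):
    """Policy: remove events associated with a given website identifier."""
    return lambda event: event["website_id"] == website_id


def collect_garbage(events, should_remove):
    """Return the events that survive the given policy."""
    return [e for e in events if not should_remove(e)]


events = [
    {"user_id": "user-1", "website_id": "site-1", "timestamp": 100},
    {"user_id": "user-2", "website_id": "site-1", "timestamp": 900},
    {"user_id": "user-1", "website_id": "site-2", "timestamp": 950},
]

kept_recent = collect_garbage(events, time_based(max_age=500, now=1000))
kept_without_user = collect_garbage(events, user_based("user-1"))
kept_without_site = collect_garbage(events, website_based("site-1"))
```

Treating each policy as a predicate lets a periodic job apply whichever policy is configured without changing the removal loop.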
Each of the above-identified elements in
Attention is now directed to
Next, the event importer module 310 stores (904) tracking events in a history table of a database, wherein the tracking events include conversion events associated with predetermined actions performed by users on websites, and wherein a respective tracking event is associated with a respective user and a respective website. Attention is now directed to
Returning to
Returning to
In response to a request from a user, the report module 330 generates (910) a report for display on a client computer system, wherein the report includes the set of tracking events and the conversion event.
In some embodiments, the report module 330 generates (910) a report for display on a client computer system that includes statistics for conversion events. For example,
In some embodiments, the report module 330 periodically generates (912) an aggregated view of tracking events for a respective website across all users that performed the predetermined action on the respective website.
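An aggregated view of this kind could, for example, count tracking events by type across every user who converted on a given website. The following sketch shows one such aggregation; the tuple layout `(website_id, user_id, event_type, timestamp)`, the function name `aggregate_view`, and the choice of statistics are assumptions, since the disclosure does not fix the contents of the view.

```python
from collections import Counter


def aggregate_view(events, website_id):
    """Summarize tracking events for one website across all converting users.

    events: iterable of (website_id, user_id, event_type, timestamp) tuples.
    """
    site_events = [e for e in events if e[0] == website_id]
    converters = {user for _, user, etype, _ in site_events if etype == "conversion"}
    counts = Counter(etype for _, user, etype, _ in site_events if user in converters)
    return {"converting_users": len(converters), "events_by_type": dict(counts)}


events = [
    ("site-1", "user-1", "impression", 1),
    ("site-1", "user-1", "click_through", 2),
    ("site-1", "user-1", "conversion", 3),
    ("site-1", "user-2", "impression", 4),  # user-2 never converts
    ("site-1", "user-3", "impression", 5),
    ("site-1", "user-3", "conversion", 6),
    ("site-2", "user-1", "impression", 7),  # different website
]

view = aggregate_view(events, "site-1")
```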
The methods 900-1100 may be governed by instructions that are stored in a computer readable storage medium and that are executed by one or more processors of one or more servers. Each of the operations shown in
Note that although the embodiments described herein are directed to tracking conversion events for advertisements, the embodiments described herein may be applied to tracking other related events. In general, the embodiments described herein may be used to track any sequence of related events that lead to an event satisfying predetermined criteria. For example, the embodiments described herein may be used to track a sequence of search queries submitted by a user that leads to a click event on a particular search result (i.e., the event satisfying the predetermined criteria).
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.