Attention is now directed to the drawings, where like reference numerals or characters indicate corresponding or like components. In the drawings:
This document also includes a Large Table Appendix on a Compact Disk (disclosed above) as Appendix A, and Appendix B, that is attached to this document.
The present invention is related to systems and methods for behavioral targeting of users along a network such as the Internet, for various informational campaigns, such as advertising campaigns. The invention typically involves a two-phase process.
In a first phase, probabilities of one informational campaign, typically an advertising campaign, with respect to another informational campaign, typically an advertising campaign, are calculated, and values of expected revenue for each campaign are determined from the probabilities. The campaigns with the greatest expected revenues are then analyzed, to determine the extent of their correlation, in the second phase. By performing the process in two phases, false positives are nearly eliminated, and only the most relevant advertising campaigns are ultimately evaluated. This provides advertisers with a highly targeted audience, to whom to send their advertising communications, typically in the form of electronic mail (e-mail).
In the second phase, the correlation between two campaigns is determined. The correlation is expressed as a value. This phase involves determining a correlation coefficient between two campaigns, and analyzing the correlation coefficient for a lower confidence limit (LCL), expressed as a value, of a confidence interval.
The value of the correlation coefficient is used in determining if another informational campaign will be sent to the users, who received a previous informational campaign. The value of the correlation coefficient is in a range of −1 to 1. For example, the preferred values for the correlation coefficient are those as close as possible to 1.
From the correlation coefficient, a lower confidence limit (LCL) is calculated. The largest LCL (value for the LCL) is typically indicative of the campaigns considered to be the most correlated. Similarly, smaller LCLs or LCL values, are considered to have less correlated campaigns. When multiple paired campaigns are evaluated, the LCLs (LCL values) can be ranked, from largest to smallest, with the ranking indicative of the most correlated campaigns. Accordingly, the more correlated campaigns (high LCL) are typically sent to recipients (users) before the less correlated campaigns (low or lower LCL).
Throughout this document, numerous textual and graphical references are made to trademarks. These trademarks are the property of their respective owners, and are referenced only for explanation purposes herein.
Also throughout this document, references are made to “n” and “nth”, to indicate the last member, component, element, etc., of a series, sequence or the like.
There are, for example, numerous servers that are linked to the Internet 24, as part of the system 20. These servers typically include a Home Server (HS) 30, one or more content servers (CS) 34a-34n, as well as numerous other servers and devices. Depending on the content to be provided to users (in particular, to their computers or other computer-type devices or machines, through their e-mail clients) there may also be imaging servers, such as Imaging Server (IS) 38, that along with the servers and related components described herein, are detailed in commonly owned U.S. patent application Ser. No. 10/915,975, entitled: Method And System For Dynamically Generating Electronic Communications (U.S. Patent Application Publication No. 2005/0038861 A1). This patent application and Patent Application Publication are incorporated by reference herein. U.S. patent application Ser. No. 10/915,975, entitled: Method And System For Dynamically Generating Electronic Communications, and U.S. Patent Application Publication No. 2005/0038861 A1 are used interchangeably herein. All of the aforementioned servers are linked to the Internet 24, so as to be in communication with each other. The servers 30, 34a-34n and 38 (depending on the content being sent to users) include multiple components for performing the requisite functions as detailed below, and the components may be based in hardware, software, or combinations thereof. The aforementioned servers may also have internal storage media and/or be associated with external storage media.
The servers 30, 34a-34n, 38 of the system 20 are linked (either directly or indirectly) to an endless number of other servers and the like, via the Internet 24. Other servers, exemplary for describing the operation of the system 20, include a domain server 39 for the domain (for example, the domain “abc.com”) of the user 40 (for example, whose e-mail address is user1@abc.com), linked to the computer 41 (or other computer type device) of the user. Still other servers may include third party servers (TPS) 42a-42n, controlled by content providers and the like.
While various servers have been listed, this is exemplary only, as the present invention can be performed on an endless number of servers and associated components, that are in some way linked to a network, such as the Internet 24. Additionally, all of the aforementioned servers include components for accommodating various server functions, in hardware, software, or combinations thereof, and typically include storage media, either therein or associated therewith. Also in this document, the aforementioned servers, storage media and components can be linked to each other or to a network, such as the Internet 24, either directly or indirectly.
The home server (HS) 30 is of an architecture that includes components for handling electronic mail, to perform an electronic mail (e-mail) server functionality, including e-mail applications. The home server (HS) 30 also includes components for recording events, such as the status of e-mails, when e-mails are sent, whether or not there has been a response to an e-mail (a certain time after the e-mail has been sent), whether the e-mail has been opened, and whether the opened e-mail has been activated or “clicked”, such that the browser of the user is ultimately directed to the target web site, corresponding to the link that was “clicked.”
The architecture also includes components for providing numerous additional server functions and operations, for example, comparison and matching functions, policy and/or rules processing, various search and other operational engines. The home server (HS) 30 includes various processors, including microprocessors, for performing the aforementioned server functions and operations. The home server (HS) 30 may be associated with additional caches, databases, as well as numerous other additional storage media, both internal and external thereto. The home server (HS) 30 and all components associated therewith are, for example, in accordance with the home server (HS) 30, described in U.S. Patent Application Publication No. 2005/0038861 A1.
The home server (HS) 30 composes e-mails and sends them to intended recipients (for example, e-mail clients hosted by a computer, workstation or other computing device, etc., associated with a user), over the network, typically a wide area network (WAN), such as the Internet 24. The e-mail clients may be, for example, America Online® (AOL®), Outlook®, Eudora®, or other web-based clients. In this document, the client is an application that runs on a computer, workstation or the like and relies on a server to perform some operations, such as sending and receiving e-mail. Also, for explanation purposes, the Home Server (HS) 30 may have a uniform resource locator (URL) of, for example, www.homeserver.com.
The e-mails, sent by the home server (HS) 30, may be e-mails in accordance with those sent by the home server (HS) 30 in commonly owned U.S. Patent Application Publication No. 2005/0038861 A1. The e-mails may also be “static” e-mails, where the content and underlying links to target web sites are fixed when the e-mail is sent.
For example, the intended recipient or user 40 has a computer 41 (such as a multimedia personal computer with a Pentium® CPU, that employs a Windows® operating system), that uses an e-mail client. The computer 41 is linked to the Internet 24.
Content Servers (CS) 34a-34n (one or more) are also linked to the Internet 24. The content servers (CS) 34a-34n provide content, typically in text form, for the imaging server (IS) 38, typically through the Home Server (HS) 30, and typically, in response to a request from the Home Server (HS) 30, based on a designated keyword. These content servers (CS) 34a-34n may be, for example, Pay-Per-Click (PPC) servers of various content providers, such as internal providers, or external providers, for example, Overture Services, Inc. or Findwhat, Inc.
At least one imaging server (IS) 38 is linked to the Internet 24. The imaging server (IS) 38 functions to convert text (data in text format) from the content servers (CS) 34a-34n, as received through the Home Server (HS) 30, to an image (data in an image format). After conversion into an image, the image is typically sent back to the home server (HS) 30, to be placed into an e-mail opened by the user 40, as detailed below. Alternately, the imaging server (IS) 38 may send the image directly to the e-mail client associated with the user 40, over the Internet 24.
Turning also to
The “sent e-mail” as represented by text line 60, may be, for example, in Hypertext Markup Language (HTML), and may include one or more Hypertext Transport Protocol (HTTP) source requests. These HTTP source requests typically reference the Home Server (HS) 30.
The e-mails sent by the home server (HS) 30 may be in accordance with the e-mails of U.S. Patent Application Publication No. 2005/0038861 A1. They may also be conventional or static e-mails. The e-mail sought to be opened is then opened by activating a mouse or other pointing device on the corresponding text line 60, commonly known as “clicking” on the e-mail. The activation or click is indicated by the arrow 62, as shown in
With the e-mail now being opened, templates are built out, resulting in one of the two screen shots of the opened e-mail, as shown in
Both opened e-mails include buttons, locations or the like, on the image that covers the links 70 (
The targeted web site associated with the link is shown, for example, as the screen shot of
While
Attention is now directed to
In a first phase, probabilities of one informational campaign, typically, an advertising campaign, with respect to another campaign (informational, for example, advertising), are calculated, and values of expected revenue for each campaign are determined from the probabilities. The campaigns with the greatest expected revenues are then analyzed, to determine the extent of their correlation, in the second phase. By performing the process in two phases, false positives are nearly eliminated, and only the most relevant advertising campaigns are ultimately evaluated. This provides advertisers with a highly targeted audience, for whom to send their advertising communications, typically in the form of electronic mail.
To determine the probability of one advertising campaign, with respect to another, and the expected revenue for the respective campaigns, there will be, for example, five advertising campaigns established. These campaigns include: Campaign A, a campaign for automobiles; Campaign B, a campaign for boats; Campaign C, a campaign for carpet; Campaign D, a campaign for dog toys; and, Campaign E, a campaign for eggs. These campaigns are also referred to throughout this document by their shortened names, A, B, C, D and E. Every campaign is evaluated with respect to every other campaign. For example, A|B represents the probability that a user will respond to a communication, typically, an e-mail, for Campaign A, given that the user has responded to Campaign B in the past. By “responded”, it is meant that a user has either “opened”, or, “opened” and “clicked”, collectively “clicked”, the e-mail sent to him. Also, an e-mail is considered “sent” when it was sent but not responded to in a predetermined time period after its having been sent.
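The response categories defined above can be sketched in code as follows. This is a minimal illustration only; the names `Response` and `has_responded` are hypothetical and are not taken from the system described.

```python
from enum import Enum


class Response(Enum):
    """Outcome of one e-mail delivery for one campaign (hypothetical names)."""
    SENT = "sent"        # delivered, no response within the predetermined period
    OPENED = "opened"    # opened, but no link clicked
    CLICKED = "clicked"  # opened and a link clicked


def has_responded(response: Response) -> bool:
    """A user has 'responded' if the e-mail was opened, or opened and clicked."""
    return response in (Response.OPENED, Response.CLICKED)


print(has_responded(Response.SENT))     # False
print(has_responded(Response.CLICKED))  # True
```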
In looking at A|B (the probability that a user will respond to a communication, typically, an e-mail, for Campaign A, given that the user has responded to Campaign B in the past), Campaign A is the “target” campaign, while Campaign B is the “predictor” campaign, as shown in
In
In
$$P(A \mid B) = \frac{NN}{MM} = \frac{a + b}{a + b + d + e + g + h}$$
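As an illustration of the ratio above, the following sketch computes P(A|B) from a 3×3 table of user counts. The cell layout is an assumption inferred from the formula: rows are the target campaign's outcome (clicked, opened, sent), columns are the predictor campaign's outcome in the same order, and the cells are labelled a through i, row by row. The function name is hypothetical.

```python
def conditional_probability(counts):
    """P(target clicked | predictor responded) from a 3x3 count table.

    counts[row][col]: row = target outcome, col = predictor outcome,
    each ordered (clicked, opened, sent). Assumed layout:
        [[a, b, c],
         [d, e, f],
         [g, h, i]]
    """
    # NN: target clicked AND predictor responded (clicked or opened) -> a + b
    nn = counts[0][0] + counts[0][1]
    # MM: all users who responded to the predictor -> a + b + d + e + g + h
    mm = sum(row[0] + row[1] for row in counts)
    if mm == 0:
        raise ValueError("no users responded to the predictor campaign")
    return nn / mm


# Toy input with one user in every cell: (1 + 1) / 6
print(round(conditional_probability([[1, 1, 1], [1, 1, 1], [1, 1, 1]]), 2))  # 0.33
```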
By performing these calculations, the exemplary diagram and result list is obtained in
Using the probabilities from
For example, it has been determined that returns of $1.50 or more are sufficient for determining the correlation coefficient. Accordingly, only target campaigns A, B and C, include return amounts of at least $1.50, as indicated by the boxes CC1-CC6 of
Attention is now also directed to
The advertising campaigns are, for example, sent from the home server (HS) 30, and are received by the intended recipients, for example, USER 1 to USER n, in accordance with the dynamic or static e-mail described herein. For example, the sent e-mails may be opened, by the user clicking on the text bar, with this opening resulting in the screen shots of
Staying in
The charted responses of
The second phase of the process now begins. In this second phase, the correlation between informational or advertising campaigns is determined, as a correlation value is determined for two campaigns. This correlation value provides an indication of the correlation between two campaigns.
Initially, a correlation coefficient will be determined between two campaigns, and each correlation coefficient will be analyzed for a lower confidence limit (LCL), a value that is calculated. This LCL value will be useful in determining which campaigns to send to which users (recipients), and will allow for a ranking of correlated campaigns for sending to users (recipients).
Turning to
The relationship of the covariance, cov(x,y), of the two response vectors x and y to their standard deviations, σ(x) and σ(y), is expressed in the equation:
$$r = \frac{\operatorname{cov}(x, y)}{\sigma(x)\,\sigma(y)}$$
The equation will yield a value of “r”, the correlation coefficient, ranging from −1 to 1. A positive value of the correlation coefficient “r” typically indicates a positive correlation between the two campaigns. Here for example, correlation coefficients “r” are determined for the correlation of Campaign A to Campaign B, the correlation of Campaign B to Campaign C, and, the correlation of Campaign A to Campaign C. Typically, the closer the correlation coefficient (r) is to “1”, the greater the correlation between the two campaigns being analyzed. Also, it is typical that campaigns whose correlation coefficient (r) is negative are not further analyzed.
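A minimal sketch of this calculation is given below, using only the Python standard library. The response vectors are hypothetical toy data, coded numerically per user. Population (rather than sample) covariance and standard deviation are assumed; r is the same under either convention, provided it is applied consistently.

```python
from math import sqrt


def pearson_r(x, y):
    """r = cov(x, y) / (sigma(x) * sigma(y)) for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("r is undefined when either vector is constant")
    return cov / (sigma_x * sigma_y)


# Hypothetical per-user responses to two campaigns (1 = clicked, 0 = no response)
campaign_a = [1, 0, 1, 1, 0, 0]
campaign_b = [1, 0, 1, 0, 0, 0]
r = pearson_r(campaign_a, campaign_b)
print(round(r, 4))  # 0.7071
```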
The accuracy of the Pearson's Correlation Coefficient (r) between the two suitable campaigns, typically having a positive Pearson's Correlation Coefficient (r), is calculated, by applying the Lower Confidence Limit (LCL), expressed as r′, of this value (r). The lower confidence limit (LCL) of the Pearson's Correlation Coefficient (r) is used to rank order the campaigns in order of interest, typically from the highest value to the lowest value. The campaigns associated with the greatest LCL value (r′), are typically delivered first, as these campaigns are the best correlated campaigns, with delivery of the campaigns continuing until all ordered campaigns are exhausted.
The Lower Confidence Limit (LCL) for the Pearson's Correlation Coefficient is calculated, for example, in three steps, using the following method. For the Pearson's correlation coefficient (r), the Lower Confidence Limit (LCL) (r′) is simply the left bound of the confidence interval. The value (r′) for the LCL is typically a value less than 1, and due to the elimination of campaigns with negative correlation coefficients (r), the value for (r′) is typically between 0 and 1.
Convert the value of Pearson's correlation coefficient (r) to a confidence interval (z) as:
Calculate the confidence interval of z, expressed as z′, as:
Convert the confidence interval of z (expressed as z′) to the LCL value of r′ in accordance with the formula:
The values for the confidence intervals (r′) for the desired LCLs are ranked, with the greatest LCL (r′) values being the most correlated campaigns.
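The three conversion formulas are not reproduced in this text. The sketch below therefore assumes the standard Fisher transformation approach to a confidence limit on r: z = ½ ln((1 + r)/(1 − r)), z′ = z − z_crit/√(n − 3), and r′ = (e^(2z′) − 1)/(e^(2z′) + 1). The critical value 1.96 (a one-sided 97.5% limit), the campaign-pair values and the function names are all illustrative assumptions.

```python
from math import exp, log, sqrt


def fisher_z(r):
    """Fisher transformation of a correlation coefficient r, with -1 < r < 1."""
    return 0.5 * log((1 + r) / (1 - r))


def lcl_of_r(r, n, z_crit=1.96):
    """Lower confidence limit r' of r for a sample of n users (n > 3)."""
    z_prime = fisher_z(r) - z_crit / sqrt(n - 3)   # lower bound on the z scale
    e = exp(2 * z_prime)
    return (e - 1) / (e + 1)                       # back-transform to the r scale


# Hypothetical (pair, r, n) values; pairs with negative r are dropped first.
pairs = [("A-B", 0.62, 500), ("B-C", 0.65, 40), ("A-C", -0.10, 500)]
ranked = sorted(
    ((name, lcl_of_r(r, n)) for name, r, n in pairs if r > 0),
    key=lambda item: item[1],
    reverse=True,
)
for name, r_prime in ranked:
    print(name, round(r_prime, 4))
```

In this toy input, pair A-B ranks ahead of B-C even though its r is slightly lower, because its much larger sample gives a tighter lower limit.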
This Example references the Large Table Appendix (Appendix A) referenced above, and which is incorporated by reference herein. A portion of this Large Table Appendix is Table EX-A.
An Example data set is in the data file, attached to this document on a CD in ASCII format, as Appendix A. In this data set, which forms Table EX-A, there are nine columns representing nine advertising campaigns, from “Art Supplies” to “Vacations.” There are 10,000 rows representing 10,000 users (user01 to user10000). All users were sent all campaigns in e-mails, and have either responded to or not responded to the campaigns. Responses were classified as two kinds: an open, where the user opened the communication for the campaign, and an open and “click,” where the user opened the communication and clicked on a link in it. A user must open an e-mail to click.
A subset of the first ten records of the data set (the Large Table Appendix-Appendix A), for user01 to user10, is listed in Table EX-A′. In this Table, an e-mail delivery with no response (not opened) is denoted with a value of 0. A delivery with an open but no click is denoted with a value of 0.03, while an e-mail delivery with an open and a click is denoted with a value of 1, such that Table EX-A′ is as follows:
From Table EX-A (and Table EX-A′), user01 responded to the various e-mails for each campaign as follows:
Also from Table EX-A (and Table EX-A′), user04 responded to the various e-mails for each campaign as follows:
Next, pay per click (PPC) values were provided. A PPC value is the amount of money that will be paid by an advertiser to a search engine or the like for directing a user to the advertiser's target website, when the user clicks on a link to the target web site provided by the search engine. The PPC values for each campaign were provided in List 1, as follows:
A conditional probability Pcond of a user clicking on one campaign (C1), given they responded to another campaign (C2) is given by the following equation:
$$P_{\text{cond}} = \frac{\text{number of users that clicked on C1 AND responded to C2}}{\text{total number of users that responded to C2}}$$
Using the “Art Supplies” and “Books” campaigns, the conditional probability, Pcond(ArtSup-Books), of a user clicking on the Art Supplies campaign, given that they responded (opened OR opened and clicked) to the Books campaign, can be given by the following equation:
$$P_{\text{cond}}(\text{ArtSup-Books}) = \frac{\text{number of users that clicked on the “Art Supplies” campaign AND responded to the “Books” campaign}}{\text{number of users that responded to the “Books” campaign}}$$
From the Table (TABLE EX-A) of the Large Table Appendix, the following table, known as Table EX-C, was created, as follows:
Using the values from Table EX-C, the conditional probability of a user clicking on the Art Supplies campaign, given that they responded to the “Books” campaign Pcond(ArtSup-Books) is determined as follows:
$$P_{\text{cond}}(\text{ArtSup-Books}) = \frac{990 + 255}{990 + 239 + 0 + 255 + 2578 + 248} = 0.2889$$
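The same count-and-divide calculation can be sketched directly on per-user response values. The coding (0 = no response, 0.03 = opened only, 1 = opened and clicked) follows the description of Table EX-A′; the six-user data set and the function name are hypothetical. The final lines simply re-check the arithmetic of the Art Supplies/Books example.

```python
def p_cond(target, predictor):
    """P(user clicked target | user responded to predictor)."""
    # Keep only users who responded (opened, or opened and clicked) to the predictor.
    responded = [(t, p) for t, p in zip(target, predictor) if p > 0]
    if not responded:
        raise ValueError("no users responded to the predictor campaign")
    clicked_target = sum(1 for t, _ in responded if t == 1)
    return clicked_target / len(responded)


# Hypothetical toy data: one value per user for each campaign.
art_supplies = [1, 0, 0.03, 1, 0, 1]
books = [1, 0.03, 1, 0, 0, 0.03]
print(p_cond(art_supplies, books))   # 2 of the 4 Books responders clicked -> 0.5

# Arithmetic check of the worked example, using the counts quoted in the text.
numerator = 990 + 255
denominator = 990 + 239 + 0 + 255 + 2578 + 248
print(round(numerator / denominator, 4))   # 0.2889
```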
A value for expected revenue (ER) is now determined based on the probability of the user clicking on the Art Supply Campaign given they responded to the Books Campaign. This expected revenue (ER) value is determined by the formula:
$$ER = P_{\text{cond}} \times PPC$$
Here, for the specific campaigns of Art Supplies being delivered to users who responded to the “Books” campaign, the expected revenue (ER) is determined in accordance with the formula:
$$ER = P_{\text{cond}}(\text{ArtSup-Books}) \times PPC_{\text{ArtSupplies}}$$, or
$$ER = 0.2889 \times \$0.32 = \$0.09$$
Therefore, the expected revenue (ER) of the Art Supply Campaign as delivered to users who responded to the Books Campaign is $0.09.
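In code, the expected revenue calculation is a single multiplication. A brief sketch, with a hypothetical function name, using the figures from the example:

```python
def expected_revenue(p_cond, ppc):
    """Expected revenue per delivered e-mail: conditional probability x PPC value."""
    return p_cond * ppc


er = expected_revenue(0.2889, 0.32)   # Art Supplies sent to Books responders
print(f"${er:.2f}")                   # $0.09
```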
An important factor that was ignored in the calculation of Part 1 was the sample size. For example, suppose there was a pair of campaigns (Campaign A and B) with Table EX-D, listed as follows:
The probability P(A|B)1 that a user would click on A (ax, bx), given that they responded to B (ax, bx, dx, ex, gx, hx), would be: (1+1)/(1+1+1+1+1+1) = 2/6 = 0.33.
The same probability would come from the following table:
The probability P(A|B)2 that a user would click on A (ay, by), given that they responded to B (ay, by, dy, ey, gy, hy), would be:
(1000+1000)/(1000+1000+1000+1000+1000+1000)=2000/6000=0.33.
The estimate of the probability is the same in the above two cases, but the confidence in the estimate is different. In general, more data yields greater confidence in the estimate.
One method to quantify a level of certainty in an estimate is to establish a confidence interval (CI). A confidence interval (CI) is a range computed from a sample such that a stated proportion of such ranges, computed from repeated samples of a given size, may be expected to contain the true mean. For example, with a 90% confidence interval (CI), if samples are collected repeatedly and the confidence interval is computed for each, then, over time, 90% of these intervals would contain the true mean.
A 90% Lower Confidence Limit (LCL) is an interval that ranges from a first positive value, upward, to infinity. That is, 90% of the means would fall above the LCL. An important feature of this is that the LCL provides a level of certainty. The less certainty about the estimate, the lower the value must be to ensure that 90% of samples would be above this value. This property is used to account for variances in samples, such as those of Table A. The 90% Lower Confidence Limit (LCL) of the Binomial Distribution is calculated for the sample. This value is substituted for the probability.
Here, the 90% LCL was calculated as follows:
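$$LCL = P(A \mid B) - 1.645\left[\frac{P(A \mid B)\,\bigl(1 - P(A \mid B)\bigr)}{n}\right]^{1/2}$$
where n is the number of samples (here, 6 or 6000), so that:
$$LCL_{6\ \text{samples}} = \tfrac{1}{3} - 1.645\left[\frac{\tfrac{1}{3}\left(1 - \tfrac{1}{3}\right)}{6}\right]^{1/2} = 0.017$$
$$LCL_{6000\ \text{samples}} = \tfrac{1}{3} - 1.645\left[\frac{\tfrac{1}{3}\left(1 - \tfrac{1}{3}\right)}{6000}\right]^{1/2} = 0.323$$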
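The two results above can be reproduced with a short sketch. It applies the same formula, with the multiplier 1.645 taken from the text; the function name and the use of a general sample size n as the divisor are assumptions made for illustration.

```python
from math import sqrt


def binomial_lcl(p, n, z=1.645):
    """Lower confidence limit of a proportion p estimated from n samples."""
    return p - z * sqrt(p * (1 - p) / n)


print(round(binomial_lcl(1 / 3, 6), 3))      # 0.017
print(round(binomial_lcl(1 / 3, 6000), 3))   # 0.323
```

The point estimate is 1/3 in both cases; only the sample size changes, and the smaller sample is penalized far more heavily.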
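From List 1 above, the PPC for the Art Supplies Campaign is $0.32. Applying the same 90% LCL calculation to Pcond(ArtSup-Books)=0.2889, with the 4,310 users who responded to the Books campaign as the sample size, gives an adjusted probability of 0.2775065. The adjusted expected value is therefore: 0.2775065 × $0.32 = $0.0888, or $0.08 when truncated to whole cents.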
The above is sufficient to deliver e-mail, as it is above a predetermined threshold, here $0.001.
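The adjusted figure can be checked as follows. The sketch assumes that the adjusted probability is the 90% LCL of Pcond(ArtSup-Books), computed over the 4,310 users who responded to the Books campaign (the denominator of the earlier calculation); this assumption reproduces the value 0.2775065 quoted above. Variable names are hypothetical.

```python
from math import sqrt

p = (990 + 255) / 4310                     # Pcond(ArtSup-Books), about 0.2889
n = 4310                                   # users who responded to Books (assumed sample size)
lcl = p - 1.645 * sqrt(p * (1 - p) / n)    # 90% lower confidence limit
adjusted_er = lcl * 0.32                   # PPC for Art Supplies is $0.32
THRESHOLD = 0.001                          # predetermined delivery threshold

print(round(lcl, 4))                       # 0.2775
print(adjusted_er > THRESHOLD)             # True
```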
In an additional procedure, the campaigns were analyzed to provide users with the most relevant campaigns. Once the non-profitable campaigns were removed, based on the previous procedures, as detailed above, the Pearson's Correlation Coefficient (r) was calculated to determine what campaign the particular user was most interested in, regardless of PPC.
The Pearson's Correlation Coefficient (r) is expressed as follows:
Taking the data from Table A, the Pearson's Correlation Coefficient (r) between the Art Supplies and Books campaigns is calculated as 0.7812.
The accuracy of the Pearson's Correlation Coefficient (r) between the Art Supplies and Books campaigns is further analyzed, by applying the Lower Confidence Limit (LCL), expressed as r′ (below), of this value (r). The lower confidence limit (LCL) of the Pearson's Correlation Coefficient (r) is used to rank order the campaigns in order of user interest, typically from the highest value to the lowest value. The campaigns associated with the greatest LCL (r′) value, are typically delivered first, as these campaigns are the best correlated campaigns, with delivery of campaigns continuing until all ordered campaigns are exhausted.
The Lower Confidence Limit (LCL) (r′) for the Pearson's Correlation Coefficient (r) was calculated using the following method:
There are three steps to calculate the confidence interval on Pearson's correlation coefficient (r). The Lower Confidence Limit (LCL) (r′) is simply the left bound of the confidence interval.
Convert the value of Pearson's correlation coefficient (r) to a confidence interval (z) as:
Calculate the confidence interval of z, expressed as z′, as:
Convert the confidence interval of z (expressed as z′) to the LCL value of r′ in accordance with the formula:
The correlation coefficient of the target campaign and the predictor campaign was calculated as r=0.7812, based on 10,000 users. The 97.5% LCL was calculated by first using formula S1 to obtain a value of z, such that z=1.0484.
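Formula S1 is not reproduced in this text. Assuming it is the standard Fisher transformation, the quoted value of z follows from r = 0.7812:

```latex
z = \frac{1}{2}\ln\!\left(\frac{1 + r}{1 - r}\right)
  = \frac{1}{2}\ln\!\left(\frac{1.7812}{0.2188}\right)
  \approx 1.0484
```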
A 97.5% lower confidence interval of z, with z=1.0484 (from above), expressed as z′, is LCL (97.5%), using the formula S2, where,
whereby, the 97.5% confidence interval of r, expressed as r′, using the formula S3, where z′=0.9863 (from above), is:
The above-described processes including portions thereof can be performed by software, hardware and combinations thereof. These processes and portions thereof can be performed by computers, computer-type devices, workstations, processors, micro-processors, other electronic searching tools and memory and other storage-type devices associated therewith. The processes and portions thereof can also be embodied in programmable storage devices, for example, compact discs (CDs) or other discs including magnetic, optical, etc., readable by a machine or the like, or other computer usable storage media, including magnetic, optical, or semiconductor storage, or other source of electronic signals.
The processes (methods) and systems, including components thereof, herein have been described with exemplary reference to specific hardware and software. The processes (methods) have been described as exemplary, whereby specific steps and their order can be omitted and/or changed by persons of ordinary skill in the art to reduce these embodiments to practice without undue experimentation. The processes (methods) and systems have been described in a manner sufficient to enable persons of ordinary skill in the art to readily adapt other hardware and software as may be needed to reduce any of the embodiments to practice without undue experimentation and using conventional techniques.
While preferred embodiments of the present invention have been described, so as to enable one of skill in the art to practice the present invention, the preceding description is intended to be exemplary only. It should not be used to limit the scope of the invention, which should be determined by reference to the following claims.