The present invention relates to analyzing the impact of a particular feature or service, and more particularly, to automated impact analysis of services and features on on-line advertisers.
An on-line advertising system may provide advertisements to users when they visit certain web pages. When a particular advertisement is of interest to a user, the user may perform various actions, such as selecting or clicking on the advertisement, which may take the user to a web page belonging to the advertiser associated with the advertisement. Additional examples of user actions may include signing-up for services at the target web page, placing an order, etc. On-line advertising systems may charge advertisers based, at least in part, on a number of clicks an advertisement receives.
On-line advertising systems continually develop and implement new features and services for advertisers. It is important to identify whether such new features and services have a positive impact and to estimate the impact of a particular feature or service. However, properly attributing cause-effect relationships related to a particular feature or service is typically difficult in the presence of confounding factors that can lead to false attribution of cause and effect. For example, many issues, such as selection bias of the advertisers who have received the benefit of the feature or service, seasonality, and economic cycle, make it difficult to accurately analyze the actual impact of the particular feature or service. In many cases, traditional randomized controlled experiment designs are not realistic for analyzing the impact of a particular good or service.
The present invention provides a method and system for automated impact analysis of a treatment applied to a portion of a population. Embodiments of the present invention provide an automated method to measure the impact of a treatment (e.g., feature or service) independent of other factors, such as selection bias, seasonality, and economic cycle.
In one embodiment of the present invention, data related to a treatment of interest and a population including a treated group and a non-treated group is received. Propensity scores are estimated for the treated group and the non-treated group based on the data. Subgroups of the treated group and the non-treated group are matched based on the propensity scores. An outcome model is generated for each subgroup of the non-treated group, and an impact of the treatment on the treated group is generated based on estimated outcomes for each subgroup of the treated group, which are obtained using the outcome model generated for the matching subgroup of the non-treated group.
Outcome models may also be generated for the treated group and the non-treated group, and an impact of the treatment on the population may be generated based on the propensity scores and the outcome models for the treated group and the non-treated group.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The present invention is directed to a method and system for automatically analyzing the impact of a treatment of interest. As used herein a “treatment” is any service, feature, product, or program applied to a portion of a population. As described herein, embodiments of the present invention relate to automatically analyzing the impact of a treatment of interest on on-line advertisers. For example, embodiments of the present invention may be used to analyze the impact of services or features applied to certain on-line advertisers, such as particular sales activities directed to certain on-line advertisers, features offered to on-line advertisers to increase the effectiveness of their advertising, and marketing events offered to certain on-line advertisers. However, the present invention is not limited to treatment of on-line advertisers, and may be similarly applied to analyze the impact of various services, features, products, programs, etc., in various other industries and fields as well. For example, embodiments of the present invention can be applied to analyze the impact of certain medicines in clinical trials, and to analyze the impact of promotions in retail stores.
The user interface 104 provides an interface for a user to access and control the impact analysis tool 102 from a remote user device 110. The user device can connect to the server via a network 112, such as the Internet or a mobile network, using well known network protocols. In a possible implementation, the user interface 104 can be accessed through a web browser of the user device 110 in order to provide a web-based user interface for remote users. Through the user interface 104, the impact analysis tool 102 can receive information relating to a treatment of interest that is entered by a user. For example, a user can input customer information, such as customer IDs (CIDs) relating to customers in the treated group and the non-treated group, treatment information, such as the treatment dates, and other profile variables, such as parameters that indicate which metrics (e.g., clicks, money spent, etc.) to use to analyze the impact of the treatment. The user interface 104 may provide various menus, options, prompts, etc., in order to allow the user to easily input the necessary information.
The signal retrieval module 106 of the impact analysis tool 102 retrieves data necessary to perform the impact analysis from a signal repository 114. The signal repository 114 stores feature variables that are used as input signals to build propensity models and outcome models. The feature variables (i.e., input signals) can include both continuous and categorical variables. The signal repository 114 also stores outcome data. The signal repository 114 retrieves this data from various data sources 116, 118, and 120 and stores the data for a certain time frame. The data sources can include a customer database 116, an advertiser database 118, and an activity database 120. The customer database 116 stores records of outcome data, such as clicks, money spent, etc., for various customers. The advertiser database 118 stores CIDs for various customers, as well as other information relating to the customers, such as size, financial information, business type, etc., which can be used as input features. The activity database 120 may store activity/treatment information. For example, the activity database 120 may store information that indicates whether a certain CID received certain services/treatments and when. Although it is possible for the impact analysis tool 102 to query the data sources 116, 118, and 120 in real time to collect the necessary data, such a query would run against large amounts of data and may be inefficient.
According to an advantageous implementation, the signal repository 114 stores the signals for all CIDs for a certain time period, such as a week. For example, the signal repository 114 can store these signals in a distributed structured storage system, such as Bigtable. This allows the signal repository 114 to be quickly queried by the signal retrieval module 106 in order to retrieve the signal data and outcome variables necessary to perform a requested impact analysis. The signal repository 114 can be keyed by CID, and only one locality group is needed since the signals can all be pulled together by the signal retrieval module 106. Timestamps or dates can be the third dimension in the table. The two column families in the signal repository 114 can be: (1) features and (2) outcomes. The feature columns correspond to various feature variables and can store feature data in a raw format, such as strings. The outcome columns correspond to outcome variables. The signal repository 114 can be updated at a regular time interval, such as every week. For example, the signal repository 114 can be updated using various types of known scripts or protocols to retrieve the signals and outcomes from the data sources 116, 118, and 120.
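The table layout described above can be illustrated with a minimal in-memory sketch. It is not the storage system itself: the class name `SignalRepository`, its methods, and the sample CID, column names, and dates are all hypothetical and chosen only to show rows keyed by CID, two column families, and a date as the third dimension.

```python
from collections import defaultdict


class SignalRepository:
    """Hypothetical in-memory stand-in for the signal repository layout."""

    FAMILIES = ("features", "outcomes")

    def __init__(self):
        # Layout: cid -> column family -> column name -> date -> value
        self._rows = defaultdict(
            lambda: {family: defaultdict(dict) for family in self.FAMILIES}
        )

    def put(self, cid, family, column, date, value):
        """Store one cell; feature values may be kept as raw strings."""
        if family not in self.FAMILIES:
            raise ValueError("unknown column family: " + family)
        self._rows[cid][family][column][date] = value

    def get_signals(self, cid, date):
        """Return (features, outcomes) for one CID on one date."""
        row = self._rows.get(cid)
        if row is None:
            return {}, {}
        features = {c: v[date] for c, v in row["features"].items() if date in v}
        outcomes = {c: v[date] for c, v in row["outcomes"].items() if date in v}
        return features, outcomes


repo = SignalRepository()
repo.put("cid-1", "features", "country", "2011-01-03", "US")
repo.put("cid-1", "outcomes", "clicks", "2011-01-03", 120)
```

Because features and outcomes for a CID sit in the same row, a single lookup returns everything needed for that unit, which mirrors the reason given above for using one locality group.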
Once the feature variables (signals) and outcomes corresponding to a certain treatment of interest are retrieved by the signal retrieval module 106, the impact analysis module 108 uses the feature variables and outcomes to analyze the treatment of interest. According to various embodiments of the present invention, the impact analysis module 108 can estimate the effect of the treatment of interest on the treated group and the effect of the treatment of interest on the entire population. In particular, the signal retrieval module 106 utilizes the methods of
The impact analysis tool 102 then outputs the results of the treatment analysis to a user. For example, the impact analysis tool 102 can transmit the analysis results to the user device 110 over the network 112, where the results can be stored and/or viewed by the user. In a possible implementation, the results can be presented to the user in the user interface 104.
In order to understand the automated impact analysis of a treatment of interest on a portion of a population, a general framework of the automated impact analysis problem is first discussed. Suppose there is a random sample of size n from a large population. For each unit i (e.g., advertiser i) in the sample, let Zi indicate whether the treatment of interest was received. That is, Zi=1 if unit i received the treatment and Zi=0 if unit i did not receive the treatment. Two ways to measure the impact of the treatment are to measure the average impact of the treatment on the population,
$$\delta_{pop} = E[Y_{i1}] - E[Y_{i0}], \tag{1}$$
or the average impact on the treated group,
$$\delta_{tr} = E[Y_{i1} \mid Z_i = 1] - E[Y_{i0} \mid Z_i = 1], \tag{2}$$
where Yi1 is the outcome for unit i when unit i received the treatment and Yi0 is the outcome for unit i when unit i did not receive the treatment.
The difficulty in estimating δpop or δtr is that only Yi1 or Yi0 can be observed for a particular unit, but not both. In order to estimate effects of treatment or non-treatment on particular units, feature variables (input signals) Xi that include both continuous and categorical variables are used. For example, for a particular advertiser i, Xi can include static characteristics of the advertiser, such as vertical and country, and summaries of activities, such as weekly spend. Vertical refers to the category of an advertiser, i.e., whether the advertiser is advertising travel-related items, educational items, etc. The only restriction on Xi is that the variables should depend only on information that could be collected before the treatment started. For simplicity, let Yi=Zi×Yi1+(1−Zi)×Yi0. Then, the problem is to determine the estimate δ̂pop or δ̂tr using observed data (Yi, Zi, Xi) for all i ∈ {1, …, n}. For convenience, let (Y, Z, X) be random variables and let (Yi, Zi, Xi), i=1, …, n, be considered as observed values of (Y, Z, X).
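The observation problem described above can be made concrete with a small simulation. This is only an illustrative sketch: the four units and their outcome values are invented, and both potential outcomes are known here solely because the data is synthetic. The function names are hypothetical.

```python
def observed_outcome(z, y1, y0):
    """Y_i = Z_i * Y_i1 + (1 - Z_i) * Y_i0: only one potential outcome is seen."""
    return z * y1 + (1 - z) * y0


def true_effects(units):
    """Return (delta_pop, delta_tr) from units given as (z, y1, y0).

    Both potential outcomes are available only because this is a simulation.
    """
    n = len(units)
    delta_pop = sum(y1 - y0 for _, y1, y0 in units) / n
    treated = [(y1, y0) for z, y1, y0 in units if z == 1]
    delta_tr = sum(y1 - y0 for y1, y0 in treated) / len(treated)
    return delta_pop, delta_tr


def naive_difference(units):
    """Difference of observed group means; not a causal estimate in general."""
    treated = [observed_outcome(z, y1, y0) for z, y1, y0 in units if z == 1]
    control = [observed_outcome(z, y1, y0) for z, y1, y0 in units if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Synthetic units (z, y1, y0); larger advertisers happen to be the treated ones.
units = [(1, 12.0, 10.0), (1, 14.0, 11.0), (0, 5.0, 4.0), (0, 6.0, 6.0)]
delta_pop, delta_tr = true_effects(units)
naive = naive_difference(units)
```

In this synthetic data the naive difference of observed means is much larger than either true effect, because the treated units already had higher outcomes without treatment. That gap is the selection bias the feature variables Xi are meant to help remove.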
Embodiments of the present invention consider four possible scenarios for splitting a treated group and a control group for measuring the impact of a treatment of interest. Scenario 0 refers to randomized controlled experiment designs, which are traditionally the easiest way to measure the impact. In Scenario 0, it is sufficient to directly compare the outcome of test and control groups. However, in many cases, this scenario is not realistic.
In Scenario 1, test and control groups are randomly split before offering treatment. However, a unit i that is in the random test group may not get the treatment because unit i did not want to get the treatment or for other reasons which cannot be controlled. For example, a test group and a control group can be randomly split among all advertisers, and all advertisers in the test group can be contacted and offered a service. However, some of the contacted advertisers may not accept the service offer. In this case, Zi=1 if unit i accepts the service offer. One advantage of scenario 1 is that P[Z=1|X=x], the probability of accepting the service offer if contacted, can be estimated using the test (contacted) group. A classifier can be applied to select units in the control (not contacted) group who are likely to accept service offers (i.e., units whose P[Z=1|X=x] is large).
Scenario 2 is more realistic than scenario 1 but more difficult to analyze. In many observational studies, the treated group and non-treated group are split according to scenario 2. In scenario 2, the test and control groups cannot and should not be split randomly. For example, assume that the impact of a new treatment for a certain disease is to be measured. One cannot or should not necessarily choose patients who will get the treatment. Patients, themselves, should choose whether they will get the treatment or not based on factors such as their economic conditions or beliefs. In this case, Zi=1 if patient i gets the treatment. The issue to be considered is that there may be some difference between the test (treated) group and the control (not treated) group. That is, there may be some mechanism that leads certain units to adopt treatment and other units not to adopt treatment. Accordingly, it is not proper to simply compare the two groups directly to measure the impact of a new treatment.
Scenario 3 is a hybrid of scenario 1 and scenario 2. Similar to scenario 1, there are test and control groups in scenario 3. However, these groups are not determined from a random selection procedure. Accordingly, there may be unknown reasons why some units are in the test group and some are not. Further, within the test group, some units accept service offers and some units do not. In this case, Zi=1 if unit i accepts the treatment. Many whitelist trials are included in this scenario. For example, a customer service representative (CSR) can choose a set of advertisers who are first offered a new feature in the ads front-end. Some advertisers may be chosen because they have asked for the new feature, some may be chosen because the CSR believes that the new feature will benefit the advertiser, some may be chosen because the advertiser is not entirely satisfied, and some may be chosen for other reasons that may not be clear. Some of the advertisers offered the new feature by the CSR choose to use the new feature and some choose not to use the new feature. Scenario 3 is similar to scenario 1 except for the random sampling of test and control groups.
As described above, the impact analysis tool 102 running on the impact analysis server 100 utilizes various statistical algorithms to measure the impact of a treatment of interest on a treated group and/or a population. In particular, embodiments of the present invention utilize various statistical algorithms in building propensity score models and outcome models to remove selection bias and the effect of seasonality, economic cycle, etc. In order to build these models, the impact analysis tool 102 can retrieve various feature variables from the signal repository 114 and use the feature variables in the statistical algorithms. According to various embodiments of the present invention, different statistical algorithms can be used in various stages of the automated impact analysis based on the scenario associated with the treatment. An overview of various statistical algorithms that may be used in various embodiments of the present invention is provided below.
Propensity Score Models.
A propensity score p(x) can be defined as the conditional probability that an advertiser (unit) is in the status Z=1, where the advertiser has the characteristics x: p(x)=P[Z=1|X=x]. The propensity score p(x) can be used as a rule to make the best pairs of treated and non-treated units. For example, when unit A is in the status 0 (non-treated) and unit B is in the status 1 (treated), if the propensity scores p(x) for A and B are close, it can be assumed that the impacts of A and B are similar. The motivation for using propensity score methods is that the dimensionality of possible feature variables is high in many cases. When the dimension of feature variables is low, simple matching is straightforward. However, when the dimension is high, it is difficult to determine which feature variables should be used and which weighting scheme should be applied. The propensity score is useful under such circumstances because it provides variables and weights in a data-driven way. Also, the use of the propensity score is efficient in the sense that its computational cost is relatively low, especially when the dimension of feature variables is high and the number of samples is large.
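As a minimal illustration of the definition p(x)=P[Z=1|X=x], the sketch below estimates the propensity score by the empirical treated fraction within each distinct feature value. This is a deliberately simple stand-in that only works for a few categorical features; it is not the propensity model of the embodiments, which would be needed precisely when the feature dimension is high. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def estimate_propensity(features, treated):
    """Estimate P[Z=1 | X=x] as the treated fraction among units sharing x.

    features: list of hashable feature tuples, one per unit.
    treated:  list of 0/1 treatment indicators, one per unit.
    Returns a dict mapping each observed feature tuple to its estimate.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number total]
    for x, z in zip(features, treated):
        counts[x][0] += z
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


# Hypothetical (vertical, country) features and treatment flags.
X = [("travel", "US"), ("travel", "US"), ("travel", "US"),
     ("education", "DE"), ("education", "DE")]
Z = [1, 1, 0, 0, 1]
p_hat = estimate_propensity(X, Z)
```

Two units with the same estimated score, one treated and one not, are then candidates for pairing, which is the matching rule described above.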
Inverse Propensity Weighted (IPW) Estimation.
If input signal X contains enough information to remove selection bias (i.e., the no unmeasured confounders assumption holds: (Y0,Y1)⊥Z|X and 0<p(x)<1), then the observed outcomes can be expressed as:
$$E[Z\,Y_1 \mid X] = E[Y_1 \mid X]\,p(X) \tag{3}$$
$$E[(1-Z)\,Y_0 \mid X] = E[Y_0 \mid X]\,(1 - p(X)) \tag{4}$$
Combining (3) and (4) leads to the IPW estimation:
where p̂(x) is an estimate of p(x). IPW is advantageous in that it is asymptotically unbiased when p̂(x) is asymptotically unbiased. However, this requires the propensity model to be correct.
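The displayed IPW expression is not present in this text. Assuming it follows the standard form obtained by dividing the treated outcomes by p̂ and the non-treated outcomes by 1−p̂, it would read

$$\hat{\delta}_{IPW} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{Z_i Y_i}{\hat{p}(X_i)} - \frac{(1-Z_i)\,Y_i}{1-\hat{p}(X_i)}\right].$$

A short sketch of that assumed form follows; the function name `ipw_estimate` and the sample values are hypothetical.

```python
def ipw_estimate(y, z, p_hat):
    """Inverse propensity weighted estimate of the population impact.

    y:     observed outcomes Y_i
    z:     0/1 treatment indicators Z_i
    p_hat: estimated propensity scores for each unit, strictly inside (0, 1)
    """
    total = 0.0
    for yi, zi, pi in zip(y, z, p_hat):
        if not 0.0 < pi < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Treated units contribute Y/p, non-treated units contribute -Y/(1-p).
        total += zi * yi / pi - (1 - zi) * yi / (1.0 - pi)
    return total / len(y)


estimate = ipw_estimate([10.0, 12.0, 4.0, 6.0], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

The range check reflects the condition 0<p(x)<1: a score near 0 or 1 gives a very large weight, which is one source of the large variance attributed to IPW.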
Doubly Robust Estimator.
Suppose that the true relationship between the outcome Y (e.g., the difference between pre-treatment advertiser spend and post-treatment advertiser spend) and the pre-treatment input signals X is known. This relationship (the outcome model) is represented as E[Y|X]=m(X,β) for unknown β. Suppose further that the treatment effect δpop is the same for all advertisers. Then, it can be expressed:
$$E[Y \mid X, Z] = m(X, \beta) + Z\,\delta_{pop}. \tag{6}$$
It can be noted that:
Thus, if δpop is constant in X, an unbiased estimate of the regression coefficient δpop is an unbiased estimate of the average treatment effect. However, in practice, it is difficult to assume that δpop is constant in X. IPW estimation shows comparable performance when the propensity score model is correct. However, it is biased when the propensity score model is incorrect and its variance is large. Doubly Robust (DR) estimation is a combination of the two methods that is asymptotically unbiased even if one of the two models (the outcome models or the propensity model) is wrong. Let m̂1(x) and m̂0(x) be estimates of E[Y1|x] and E[Y0|x], respectively. Then, the DR estimator is defined as:
The DR estimator is acceptable to use when either the propensity model or the outcome model is correct. If the propensity model is correct, the DR estimator will have a smaller variance than IPW. If the outcome model is correct, the DR estimator may have a larger variance than just using the outcome model. However, the DR estimator provides protection in case the outcome model is not correct.
A simple estimate of the standard error of δ̂DR can be used to give confidence intervals of δ. Let
Then, the variance of δ̂DR can be estimated as:
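The displayed DR expression is not present in this text. A commonly used form, assumed here and written with the symbols defined above, is

$$\hat{\delta}_{DR} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{Z_i Y_i}{\hat{p}(X_i)} - \frac{Z_i-\hat{p}(X_i)}{\hat{p}(X_i)}\,\hat{m}_1(X_i)\right] - \frac{1}{n}\sum_{i=1}^{n}\left[\frac{(1-Z_i)\,Y_i}{1-\hat{p}(X_i)} + \frac{Z_i-\hat{p}(X_i)}{1-\hat{p}(X_i)}\,\hat{m}_0(X_i)\right].$$

The sketch below implements that assumed form. The function names are hypothetical, and the outcome-model predictions are passed in as plain lists so that any model can supply them.

```python
def dr_terms(y, z, p_hat, m1_hat, m0_hat):
    """Per-unit contributions to the assumed doubly robust estimator.

    m1_hat[i] and m0_hat[i] are the outcome-model predictions for unit i
    with and without treatment; p_hat[i] is its estimated propensity score.
    """
    terms = []
    for yi, zi, pi, m1, m0 in zip(y, z, p_hat, m1_hat, m0_hat):
        mu1 = zi * yi / pi - (zi - pi) / pi * m1
        mu0 = (1 - zi) * yi / (1.0 - pi) + (zi - pi) / (1.0 - pi) * m0
        terms.append(mu1 - mu0)
    return terms


def dr_estimate(y, z, p_hat, m1_hat, m0_hat):
    """Average of the per-unit terms."""
    terms = dr_terms(y, z, p_hat, m1_hat, m0_hat)
    return sum(terms) / len(terms)


# Two hypothetical units whose outcome models happen to be exact.
exact = dr_estimate([12.0, 4.0], [1, 0], [0.5, 0.25], [12.0, 5.0], [10.0, 4.0])
# With both outcome models set to zero the expression reduces to IPW.
as_ipw = dr_estimate([10.0, 12.0, 4.0, 6.0], [1, 1, 0, 0], [0.5] * 4, [0.0] * 4, [0.0] * 4)
```

When the outcome models are exact, each unit's term equals m̂1 minus m̂0 regardless of the propensity score, which is one way to see the protection against a wrong propensity model.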
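Neither displayed expression is present in this text. One standard choice, assumed here rather than taken from the source, defines the centered per-unit term

$$I_i = \left[\frac{Z_i Y_i}{\hat{p}(X_i)} - \frac{Z_i-\hat{p}(X_i)}{\hat{p}(X_i)}\,\hat{m}_1(X_i)\right] - \left[\frac{(1-Z_i)\,Y_i}{1-\hat{p}(X_i)} + \frac{Z_i-\hat{p}(X_i)}{1-\hat{p}(X_i)}\,\hat{m}_0(X_i)\right] - \hat{\delta}_{DR}$$

and estimates the variance as $\frac{1}{n^2}\sum_{i=1}^{n} I_i^2$. The sketch below follows that assumption; the function name is hypothetical, and the 1.96 multiplier is the usual normal-approximation value for a 95% interval.

```python
import math


def dr_with_standard_error(y, z, p_hat, m1_hat, m0_hat):
    """Return (estimate, standard_error, (ci_low, ci_high)) for the DR estimator."""
    n = len(y)
    terms = []
    for yi, zi, pi, m1, m0 in zip(y, z, p_hat, m1_hat, m0_hat):
        mu1 = zi * yi / pi - (zi - pi) / pi * m1
        mu0 = (1 - zi) * yi / (1.0 - pi) + (zi - pi) / (1.0 - pi) * m0
        terms.append(mu1 - mu0)
    delta = sum(terms) / n
    # I_i is each unit's term centered at the estimate.
    variance = sum((t - delta) ** 2 for t in terms) / (n * n)
    se = math.sqrt(variance)
    return delta, se, (delta - 1.96 * se, delta + 1.96 * se)


delta, se, ci = dr_with_standard_error(
    [12.0, 4.0], [1, 0], [0.5, 0.25], [12.0, 5.0], [10.0, 4.0]
)
```

With only two units the interval is of course not meaningful; the example exists to make the arithmetic easy to follow.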
At 504, a scenario relating to splitting the treated group and the non-treated group is detected. In particular, it is detected which of scenario 0, scenario 1, scenario 2, and scenario 3 applies to the treated and non-treated groups of the treatment of interest. The impact analysis tool 102 uses different statistical algorithms to measure the impact for the different scenarios. Accordingly, before the impact can be measured based on the data relating to a treatment of interest, it is determined which scenario applies to the data.
At step 606, it is determined whether there is a selection bias for the members of the test (contacted) group that accepted treatment. If there is no selection bias for the treated group, that is, the treated group and the non-treated group are randomly split, the method proceeds to step 610. If there is a selection bias for the treated group, that is, those who accepted treatment in the test group (i.e., the treated group) are not randomly selected, the method proceeds to step 612. At step 610, it is determined that scenario 0 applies to the treated and non-treated groups in the data. In this case, it is only necessary to calculate the impact of the treatment on the treated group, and this can be accomplished by simply comparing the outcomes between the test and control groups, for example using Difference in Difference (DnD) methods. At step 612, it is determined that scenario 1 applies to the treated and non-treated groups in the data. In this case, it is only necessary to measure the impact of the treatment on the treated group (step 506 of
At step 608, it is determined whether the test (contacted) group is the same as the treated group. If the test group is the same as the treated group, the method proceeds to step 614. If the test group is not the same as the treated group, that is some members of the test group do not accept treatment, the method proceeds to step 616. At step 614, it is determined that scenario 2 applies to the treated and non-treated groups in the data. At step 616, it is determined that scenario 3 applies to the treated and non-treated groups in the data. In the cases of both scenario 2 and scenario 3, the impact of the treatment on the treated group (step 506 of
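The decisions at steps 606 through 616 can be summarized as a small function. The first branch, on whether the test and control groups were randomly split, is an assumption inferred from the scenario definitions, since the step that routes to 606 or 608 does not appear in this text. The function and parameter names are hypothetical.

```python
def detect_scenario(random_split, treated_selection_bias=False, test_equals_treated=True):
    """Return the scenario number (0, 1, 2, or 3) for a treatment of interest.

    random_split:           test and control groups were split at random
                            (assumed first decision, not stated in the text).
    treated_selection_bias: members who accepted treatment are a non-random
                            subset of the test group (decision at step 606).
    test_equals_treated:    every member of the test group received the
                            treatment (decision at step 608).
    """
    if random_split:
        # Step 606: scenario 1 (step 612) if there is selection bias,
        # otherwise scenario 0 (step 610).
        return 1 if treated_selection_bias else 0
    # Step 608: scenario 2 (step 614) if test group equals treated group,
    # otherwise scenario 3 (step 616).
    return 2 if test_equals_treated else 3
```

For example, a whitelist trial in which a representative hand-picks advertisers and only some of them adopt the feature would be passed as `detect_scenario(False, test_equals_treated=False)`.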
Returning to
Referring to
The SRF algorithm is a non-parametric Random Forests algorithm that is modified to be robust to overfitting.
Returning to
At step 706, the subgroups of the treated group are matched with corresponding subgroups of the non-treated group based on propensity scores. That is, for each treated subgroup, a matching non-treated subgroup is defined having a matching range of propensity scores.
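A minimal sketch of this matching step follows. How the propensity ranges are chosen is not stated in this text, so the sketch assumes equal-width ranges over [0, 1]; the function name, the `n_bins` parameter, and the sample scores are hypothetical.

```python
def match_subgroups(treated_scores, control_scores, n_bins=5):
    """Match treated and non-treated units that share a propensity range.

    Returns a dict mapping a range index to a pair
    (treated unit indices, non-treated unit indices). Only ranges that
    contain at least one treated unit are kept.
    """
    def bin_of(p):
        # A score of exactly 1.0 is placed in the last range.
        return min(int(p * n_bins), n_bins - 1)

    groups = {}
    for i, p in enumerate(treated_scores):
        groups.setdefault(bin_of(p), ([], []))[0].append(i)
    for i, p in enumerate(control_scores):
        b = bin_of(p)
        if b in groups:
            groups[b][1].append(i)
    return groups


matched = match_subgroups([0.1, 0.55, 0.9], [0.05, 0.15, 0.5, 0.3], n_bins=5)
```

In this example the treated unit with score 0.9 has no non-treated counterpart in its range, which signals a region where no comparison data is available.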
At step 708, an outcome model is generated for each treated subgroup using the matching non-treated subgroup. The outcome model m0(x) for a treated subgroup is a model that predicts, based on the input feature values, the outcome for a member of the treated subgroup if the treatment had not been received. That is, m0(x)=E[Y0|X=x], where E[Y0|X=x] is the expected value for an outcome Y0 for a given feature vector X without receiving treatment. According to a possible embodiment, the outcome model m0(x), for a particular treated subgroup, can be generated using non-parametric estimations of E[Y0|X=x] using the matching non-treated subgroup as training data. For example, in an advantageous implementation, Random Forests, which are relatively resistant to irrelevant feature variables, can be used to generate the outcome model for each treated subgroup based on the corresponding matching non-treated subgroup. A separate outcome model is generated for each treated subgroup.
At step 710, the impact on the treated group is calculated using the outcome models. In particular, the outcome model for each treated subgroup can be used to estimate, based on the feature values for each member, the outcomes for the members of that treated subgroup if treatment was not received. The impact on the treated group can then be calculated as the difference between the mean of the outcomes of the treated group and the mean of the estimated outcomes for the treated group if treatment was not received (i.e., mean(Ytr1)−mean(m0(Xtr)), where Ytr1 denotes the outcomes for the treated group and Xtr denotes the feature values for the treated group). Accordingly, the impact δ̂tr on the treated group can be expressed as:
where Yi,tr is the outcome of sample i in the treated group, Xi,tr is the feature vector for sample i in the treated group, ni is the number of samples in the treated group, Sj denotes the subgroups of the treated group, J is the number of subgroups, and m̂0j is the outcome model for subgroup j of the treated group and is an estimation of m0(x) generated using the matching subgroup of the non-treated group.
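The displayed expression for the impact on the treated group is not present in this text. Read together with the symbol definitions above, and writing $n_{tr}$ for the number of samples in the treated group, it is presumably of the form

$$\hat{\delta}_{tr} = \frac{1}{n_{tr}}\sum_{j=1}^{J}\sum_{i \in S_j}\left(Y_{i,tr} - \hat{m}_{0j}(X_{i,tr})\right).$$

The sketch below computes that quantity. To stay dependency-free it swaps in a one-feature least-squares line for the outcome model; this is a plain stand-in and not the Random Forests model named above. All function names and sample values are hypothetical.

```python
def fit_linear(xs, ys):
    """Stand-in outcome model: a one-feature least-squares line.

    Returns a function that predicts the non-treated outcome for a feature value.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def impact_on_treated(subgroups):
    """Average of (observed outcome - predicted non-treated outcome) over treated units.

    subgroups: list of (treated, control) pairs, each a list of (x, y) tuples,
    where control is the matching non-treated subgroup. Treated subgroups with
    no matching non-treated units are skipped and not counted.
    """
    total = 0.0
    n_tr = 0
    for treated, control in subgroups:
        if not treated or not control:
            continue
        # One outcome model per treated subgroup, trained on its matched controls.
        m0 = fit_linear([x for x, _ in control], [y for _, y in control])
        for x, y in treated:
            total += y - m0(x)
            n_tr += 1
    return total / n_tr


subgroups = [
    ([(2.0, 5.0)], [(1.0, 2.0), (3.0, 6.0)]),
    ([(5.0, 13.0), (6.0, 12.0)], [(1.0, 10.0), (2.0, 10.0)]),
]
delta_tr_hat = impact_on_treated(subgroups)
```

Skipping unmatched subgroups is a choice made for this sketch only; it changes which treated units the average covers and would need to be reported alongside the estimate.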
Returning to
For scenario 2 and scenario 3, a DR estimator can be used to calculate the impact on the population. In the following discussion, let m1(x)=E[Y1|X=x] be the true outcome model for the treated group, and m0(x)=E[Y0|X=x] be the true outcome model for the non-treated group.
At step 904, an outcome model m̂1(x) is generated using the treated group. The outcome model m̂1(x) is an estimate of m1(x) that can be used to predict the outcome for a set of feature values if subjected to the treatment. The outcome model m̂1(x) can be generated using a Random Forest algorithm with the treated group as training data.
At step 906, an outcome model m̂0(x) is generated using the non-treated group. The outcome model m̂0(x) is an estimate of m0(x) that can be used to predict the outcome for a set of feature values if not subjected to the treatment. The outcome model m̂0(x) can be generated using a Random Forest algorithm with the non-treated group as training data.
At step 908, the impact of the treatment in the population is calculated using a doubly robust (DR) estimator. In particular, the impact in the population is calculated based on the estimated outcome models m̂0(x) and m̂1(x) and the estimated propensity model p̂(x) using the DR estimator expressed in Equation (11) above, where Zi is the status of a sample i (i.e., treated (1) or non-treated (0)), n is the total number of samples in the treated and non-treated groups, Yi is the outcome for sample i, and Xi is the feature vector for sample i. It can be noted that if there is little overlap between propensity scores from the treated and non-treated groups, any estimation that uses the controls to estimate the counterfactuals for the treated, or uses the treated to estimate the counterfactuals for the non-treated, may be suspect.
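The overlap caveat can be checked before trusting an estimate. The text does not prescribe a diagnostic, so the sketch below is only one simple possibility: it reports the range of propensity scores covered by both groups and the fraction of all units that fall inside it. The function name and sample scores are hypothetical.

```python
def common_support(treated_scores, control_scores):
    """Return (low, high, fraction_inside) for the shared propensity range.

    low/high bound the interval covered by both groups. If low > high the
    groups do not overlap at all and fraction_inside is 0.0.
    """
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    all_scores = list(treated_scores) + list(control_scores)
    inside = sum(1 for p in all_scores if low <= p <= high)
    return low, high, inside / len(all_scores)


low, high, fraction = common_support([0.4, 0.6, 0.9], [0.1, 0.3, 0.5])
```

A small fraction means most units have no comparable counterpart in the other group, which is the situation in which the counterfactual estimates described above become suspect.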
The above-described methods for analyzing impact of a treatment may be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. Further, the above described impact analysis server and impact analysis tool can also be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high level block diagram of such a computer is illustrated in
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.