The present invention relates generally to techniques for gauging whether customers and potential customers will take certain actions in the future. More particularly, the present invention relates to techniques for calculating a score that predicts customer activity in the future such as whether the customer will make a purchase, visit a store, etc., or how much money the customer will spend, how many times the customer will shop, etc.
Customer relationship management often attempts to predict future customer behavior. It is desirable to know how individuals and groups of customers will respond to marketing or other initiatives of a product or service. This response is a driving factor when developing strategies of how and when to market to different groups of customers.
When selecting targets for specific direct marketing events, analysts often try to predict the likelihood that an individual customer will respond. It is frequently desirable to include in the event the customers who have the highest expected response rates. Common techniques of selection include schemes based on one or a small number of variables representing past behavior (e.g., spend, number of transactions, type of transactions, frequency of activity, time since last activity), the classical Recency, Frequency, Monetary (RFM) scheme, and response models analyzing a similar marketing event from the past.
Selections based on a single variable, or a small number of variables (e.g., choosing all retail customers who have shopped in the last six months), although easy to implement, are typically not very powerful in terms of resulting Return On Investment (ROI).
The classical RFM scheme (which consists of dividing the customers in quintiles in each of the three dimensions and subsequently choosing certain parts of the resulting 125 segments), while somewhat more powerful than single- or few-variable based selections, is often difficult to implement because it is unclear as to which segments to choose, and how to choose within segments if certain target numbers are desired. Many of the existing variations of the classical RFM scheme have similar characteristics. Moreover, although both RFM and single-variable-based selection have the advantage of universality (i.e., they are independent of the specific marketing event that is being planned), which implies that they can be calculated once (within certain intervals) and used for all desired selections, with this convenience comes the disadvantage of reduced precision, since they are based on at most three variables.
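By way of illustration only, the quintile division underlying the classical RFM scheme can be sketched as follows. The field names (`days_since_last`, `n_purchases`, `total_spend`) and the tie-breaking rule are assumptions made for this sketch and are not taken from any particular implementation.

```python
def quintile_ranks(values):
    """Return a rank 1..5 for each value (5 = highest fifth); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position * 5 // len(values) + 1
    return ranks


def rfm_segments(customers):
    """Assign each customer a (recency, frequency, monetary) triple of quintile ranks."""
    # Recency is negated so that a more recent purchase earns a higher rank.
    r = quintile_ranks([-c["days_since_last"] for c in customers])
    f = quintile_ranks([c["n_purchases"] for c in customers])
    m = quintile_ranks([c["total_spend"] for c in customers])
    return list(zip(r, f, m))


# Hypothetical customers: the first is the most recent, most frequent, highest spender.
customers = [
    {"days_since_last": 10 * i, "n_purchases": 20 - i, "total_spend": 50.0 * (20 - i)}
    for i in range(1, 11)
]
segments = rfm_segments(customers)
possible_segments = 5 ** 3  # the 125 segments referred to above
```

The sketch only assigns segments. It leaves open which of the 125 segments to select and how to choose within a segment, which is the difficulty noted above.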
Because response models can be based on a multitude of variables available on a customer level, if based on events similar to an upcoming effort, they tend to predict the results of that effort more precisely than single-variable and RFM schemes. However, for this same reason, response models tend to be less universal than these schemes. Moreover, response models require a significantly larger effort to develop, which often makes them impractical to use for every type of marketing event a business may want to execute.
Other data based approaches to customer relationship management include lifecycle management and behavioral/demographic segmentations. The management of these customer segments is customer-centered, and, hence, represents an important advance over product-based management. However, because these segments are based on demographics, a few discrete behaviors, or the life-stage of the customer, these segments tend not to directly align with future behavior.
Thus, an approach that explicitly pursues the target of future customer activity is needed, such that the population can be segmented accordingly.
In accordance with the present invention, techniques for calculating a score that predicts customer activity in the future such as whether the customer will make a purchase, visit a store, etc., or how much money the customer will spend, how many times the customer will shop, etc., are provided. Techniques for using this score are also provided. Furthermore, the present invention encompasses systems that calculate and use the score.
In certain embodiments of the invention, methods for scoring customers for marketing are provided. These methods include: collecting demographic data and transactional data for each of the customers; summarizing at least one variable in the demographic data and/or the transactional data to form summary data, and attaching the summary data to each of the customers; applying a statistical algorithm to the demographic data, the transactional data, and the summary data to create a model of a target variable related to customer activity and/or loyalty; deriving a score for each of the customers from the model; selecting at least some of the customers based on the score; and marketing directly to the selected customers.
In other embodiments of the invention, systems for scoring customers for marketing are provided. These systems include: at least one database containing demographic data and transactional data for each of the customers; and a computer that receives from the at least one database the demographic data and the transactional data; summarizes at least one variable in the demographic data and/or the transactional data to form summary data, and attaches the summary data to each of the customers; applies a statistical algorithm to the demographic data, the transactional data, and the summary data to create a model of a target variable related to customer activity and/or loyalty; derives a score for each of the customers from the model; selects at least some of the customers based on the score for each of the customers; and markets directly to the selected customers.
In yet other embodiments of the invention, computer-readable media are provided. These media include instructions executable by a computer, the instructions including a software application for scoring customers for marketing, the instructions for implementing the steps of: collecting demographic data and transactional data for each of the customers; summarizing at least one variable in the demographic data and/or the transactional data to form summary data, and attaching the summary data to each of the customers; applying a statistical algorithm to the demographic data, the transactional data, and the summary data to create a model of a target variable related to customer activity and/or loyalty; deriving a score for each of the customers from the model; selecting at least some of the customers based on the score; and marketing directly to the selected customers.
Various objects, features, and advantages of the present invention can be more fully appreciated as the same become better understood with reference to the following detailed description of the present invention when considered in connection with the accompanying drawings, in which:
As described above, in accordance with various embodiments of the present invention, techniques are provided for creating customer-level activity scores that express with one number per customer the magnitude of his/her future activity. Techniques are also provided for applying these scores in marketing. Furthermore, the present invention encompasses systems that calculate and use the score.
More particularly, various embodiments contemplated by the present invention envision obtaining customer scores that state the predicted activity level of each customer during a future time period (e.g., the next 12 months). Just as credit scores provide a single number expressing a customer's risk of loan default, the scores of the present invention give a single number expressing future customer activity, and, hence, by strong association, may also predict the susceptibility of a customer to marketing. These scores can be as beneficial for marketing in various industries as credit scores are today for the lending business. The scope of this approach may be limited only by the availability of customer data. Thus, for example, with data from one retailer, one can score all of its customers. Similarly, for example, with data from all department and specialty stores, a party such as a tender provider like Visa or MasterCard, or a participant in a data sharing agreement, can score all shoppers and provide an industry-wide marketing tool.
A statistical model may be used to obtain the scores provided by the present invention. For example, certain implementations may use parametric models, such as logistic regression models, a linear regression model, a non-linear regression model, a generalized linear model, generalized estimating equations, linear discriminant analysis, and quadratic discriminant analysis. As another example, certain implementations may use non-parametric models such as neural networks, support vector machines, nearest-neighbor models, non-parametric regression models, a spline model, a kernel model, a patient rule induction method, and a tree algorithm. Where the model uses a tree algorithm, CART, CHAID, TreeNet, Random Forests, or any other suitable tree algorithm may be used.
A target variable may be used to measure activity. For example, a target variable may be binary, e.g., a flag indicating whether there is activity in a 12-month period, and thus whether a customer is likely to be active or inactive. Other binary events may include: a customer engaging in a given number of transactions in a given period; a customer spending a given amount in the given period; a customer making a given number of retail visits in the given period; a customer qualifying for a loyalty program; a customer showing purchase activity in a given period; and a customer purchasing or subscribing to a certain combination of products. Other target variables may be numeric, e.g., the number of transactions, the number of purchases, the number of retailer visits, the number of products and/or subscriptions bought, the spending volume, the number of visits to a Web site, the number of purchases of at least a certain amount, or any combination of these.
Variables for predicting a value for this target variable (i.e., predictor variables) may include past customer transactions, demographic data, account information, and any other relevant data available on a customer level. It may also be desirable to adjust, transform, and derive additional variables from these predictor variables to increase their predictive value. In addition, the predictor variable, as adjusted and/or transformed, and any derivatives, may be summarized at various levels (e.g., retail store, transaction location, zip code, county, state, country, population cluster, etc.), and these summaries attached to each customer in the respective category—hence creating additional predictor variables, which may improve prediction. For categorical base variables, the summary variables may be absolute frequency and/or relative frequency of every level in the variables, for example. For continuous base variables, the summary variables may be mean, standard deviation, and quantiles, for example.
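By way of illustration only, the summarizing and attaching of variables described above can be sketched as follows. The field names (`zip`, `spend`, `tender`) and the generated column names are hypothetical, and the median stands in for the quantiles mentioned above.

```python
from collections import Counter, defaultdict
from statistics import mean, median, pstdev


def attach_group_summaries(customers, group_key, numeric_key, categorical_key):
    """Summarize two variables per group and attach the summaries to each customer."""
    groups = defaultdict(list)
    for customer in customers:
        groups[customer[group_key]].append(customer)

    # Every level of the categorical variable gets a column, even if absent in a group.
    levels = sorted({c[categorical_key] for c in customers})

    summaries = {}
    for group, members in groups.items():
        values = [m[numeric_key] for m in members]
        counts = Counter(m[categorical_key] for m in members)
        summary = {
            f"{numeric_key}_mean": mean(values),
            f"{numeric_key}_std": pstdev(values),
            f"{numeric_key}_median": median(values),
        }
        for level in levels:
            # Relative frequency of this level within the group.
            summary[f"{categorical_key}_share_{level}"] = counts[level] / len(members)
        summaries[group] = summary

    # Each customer record gains the summary variables of its own group.
    return [{**c, **summaries[c[group_key]]} for c in customers]


customers = [
    {"id": 1, "zip": "10001", "spend": 100.0, "tender": "card"},
    {"id": 2, "zip": "10001", "spend": 300.0, "tender": "cash"},
    {"id": 3, "zip": "94105", "spend": 50.0, "tender": "card"},
]
enriched = attach_group_summaries(customers, "zip", "spend", "tender")
```

The same function may be applied once per summary level (store, county, state, and so on), each call adding a further set of predictor variables.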
As will be appreciated by one of ordinary skill in the art, the activity scores (i.e., prediction of target variable results) provided by the present invention may be valuable marketing tools, especially for direct marketing. For example, for a typical marketing event, one may want to select the customers with the highest scores, which may translate into higher response rates and sales volume. Use of activity scores in this way may provide a substantial increase in ROI when compared to less advanced selection methods. The scores may also enable targeting of specific customer-lifecycle and activity segments, and, therefore, facilitate a variety of marketing strategies. For example, an attrition prevention strategy could be implemented by direct marketing to active customers with low or declining scores.
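By way of illustration only, the two uses of activity scores mentioned above (selecting the highest-scoring customers, and finding active customers with low or declining scores) can be sketched as follows. The thresholds `low` and `drop` are hypothetical values chosen for this sketch.

```python
def top_scored(scores, n):
    """Return the n customer ids with the highest activity scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def attrition_candidates(current, previous, active, low=0.3, drop=0.2):
    """Return active customers whose score is low or has fallen by at least `drop`."""
    return sorted(
        cid
        for cid in active
        if current[cid] < low or previous.get(cid, current[cid]) - current[cid] >= drop
    )


# Hypothetical scores from the current and the previous scoring run.
current = {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.55}
previous = {"a": 0.9, "b": 0.8, "c": 0.25, "d": 0.6}
active = {"a", "b", "c"}

mailing_list = top_scored(current, 2)
at_risk = attrition_candidates(current, previous, active)
```

Here customer "b" is flagged because the score declined, and customer "c" because the score is low. Customer "d" is not considered because it is not in the active set.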
The target variable is preferably chosen at step 108 to measure customer activity. The specific choice depends on the goals of the implementation and therefore any suitable target variable may be chosen. At least some embodiments of the present invention may use as a target variable a flag for customer activity (e.g., the flag may indicate that the customer is (or has been predicted to be) active/not active in a certain time period), the number of transactions made by the customer, the number of purchases made by the customer, the number of (retail) visits by the customer, the number of products bought by the customer, the purchase volume of the customer, or any other suitable metric. A flag for customer activity or the number of visits by the customer may be chosen because these targets are generally good proxies for marketing response (see the discussion of
The forecast time period tF selected at step 104 may be set according to a specific objective of a specific implementation of the invention. Several points may be considered in making this selection. For example, it may be desirable that tF be a meaningful time period for the particular business or application, and be at least large enough that this process of score creation can be carried out and applied to the desired task. In such case, tF preferably will be large enough such that the universality of the activity scores can be taken advantage of, i.e., each scoring can serve several applications. As another example, because only customer data up to the current time t0 minus tF can be used for model fitting (see the description accompanying step 128 below), tF may need to be small enough to ensure the existence of such data. Although not strictly necessary, it may be desirable to have customer data for at least some members of the population going back at least as far as t0−2×tF and hence to have a time interval of historical data to be used for modeling of at least length tF (i.e., at least the interval from t0−2×tF to t0−tF should be used). More history can likely improve the precision of the model, and may therefore be preferable. In businesses where seasonal variation is sizable, selecting a time period tF large enough to encompass all seasons proportionally may be desirable. For example, where summer sales differ greatly from winter sales, or holiday sales from non-holiday sales, choosing a forecast period of one year may represent all seasons appropriately. One embodiment of this invention may use tF equal to one year, for example. However, for a fast-moving business an adequate forecast period may be a fraction of a day, whereas for a slower moving one, multiples of decades may be more appropriate.
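By way of illustration only, the time intervals discussed above can be sketched as follows for a forecast period measured in whole months. The function names are hypothetical, and the sketch assumes that t0 falls on the first day of a month.

```python
from datetime import date


def shift_months(d, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def modeling_windows(t0, tf_months):
    """Return the intervals implied by the current time t0 and the forecast period tF."""
    cutoff = shift_months(t0, -tf_months)              # t0 - tF
    history_start = shift_months(t0, -2 * tf_months)   # t0 - 2*tF
    return {
        # Minimum desirable predictor history used for model fitting.
        "predictor_history": (history_start, cutoff),
        # Period over which the known target variable is observed.
        "target_period": (cutoff, t0),
        # Future period that the resulting scores predict.
        "forecast_period": (t0, shift_months(t0, tf_months)),
    }


windows = modeling_windows(date(2005, 1, 1), 12)
```

With tF equal to 12 months, the sketch yields 24 months of data before t0, split evenly between predictor history and the target period.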
After setting the forecast time period and the target variable at steps 104 and 108, input elements for the statistical model fit may be prepared at steps 112, 116, and 120. At step 116, the target variable may be obtained or calculated for all customers at the current time, or at the most recent time point for which the target variable is available. This time point may be denominated t0. (Thus, at this particular step/point in time, the target variable is calculated or obtained rather than being predicted.) At step 120, a statistical algorithm may be chosen and at step 112, predictor variables may be prepared.
An example of a process 200 for the creation of the predictor variables at step 112 is outlined in
Turning more particularly to the steps of
Next, at step 232, the variables from the input data sets collected at steps 204, 208, 212, 216, 220, 224, and 228 may be prepared to serve as predictor variables in a statistical model. This step may determine what information the statistical model will be able to use. Quantitative knowledge about the problem and the data may be used to automatically select useful information based on preprogrammed parameters, or any other suitable process, or manually select useful information based upon user input. Some of the variables may need to be adjusted, transformed, converted from discrete to continuous values, or from continuous to discrete values, and combined or used to create new derived variables.
There are generally two aspects to keep in mind when creating variables at step 232. First, variables are preferably predictive of the target variable. At least some implementations of this invention may create large numbers of variables, including many that are suspected of being predictive. When in doubt, one may elect to err on the side of more variables. If the variables later turn out not to be predictive, the statistical model will usually remove them or weigh them down, without increasing the overall prediction error. Second, variables are preferably robust with respect to the time for which they are obtained—i.e., they should have the same meaning and predictive characteristics at times t0 and t0−tF, and also at different times t0 or t0−tF for different reruns of this process. Seasonal variations may present challenges to robustness. For example, total purchases in the voluminous month of December may be less predictive of January activity than June purchases are of July activity. Similarly, a $x purchase in December could have a very different meaning from a $x purchase in January. Hence, the variable “total purchases during the last month” may be predictive but not robust. However, it may be possible to make the variable robust, for example, by modifying it to “total purchases during the last month divided by average customer purchases during the last month.” This transformation will adjust for seasonal volume differences while hopefully maintaining most of the relevant information about the customer that was contained in the variable. Thus, at step 232, a customer-level data set (one record per customer) with a fairly sizable number of variables may be constructed.
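By way of illustration only, the seasonal adjustment described above (total purchases during the last month divided by average customer purchases during the last month) can be sketched as follows. The customer identifiers and amounts are hypothetical.

```python
def seasonally_adjusted_spend(last_month_spend):
    """Map customer id -> last-month spend divided by the average customer spend."""
    average = sum(last_month_spend.values()) / len(last_month_spend)
    if average == 0:
        # No purchases at all in the month: every customer gets a neutral zero.
        return {cid: 0.0 for cid in last_month_spend}
    return {cid: spend / average for cid, spend in last_month_spend.items()}


# The same three customers in a high-volume and a low-volume month.
december = {"a": 400.0, "b": 200.0, "c": 0.0}
june = {"a": 100.0, "b": 50.0, "c": 0.0}

december_adjusted = seasonally_adjusted_spend(december)
june_adjusted = seasonally_adjusted_spend(june)
```

Although the raw amounts differ by a factor of four between the two months, the adjusted variable is identical in both, so it carries the same meaning at t0 and at t0−tF.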
The remaining steps in
As shown in
Turning back to process 200 illustrated in
Continuing with process 100 illustrated in
Next, at step 128, a statistical model may be fit to the target variable. In at least some embodiments of this invention, this can be performed in one step by applying the algorithm straightforwardly or with the usual tweaks and parameter calibrations that skilled statisticians are familiar with. In other embodiments, to reduce computational cost and gain some insights, one may employ a multi-step strategy of variable selection. Such an approach fits the model independently with various subsets of predictor variables, where each predictor variable is present in at least one subset. Using the variable importance criterion of the algorithm, the most important variables may be chosen from each subset to form the set of predictor variables used in the next or final model fit. Sometimes, variable importance may be used again to further reduce the set of variables. Variations of this multi-step approach may be possible.
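By way of illustration only, one variation of the multi-step variable selection described above can be sketched as follows. The callable `fit_importance` is a hypothetical stand-in for fitting the chosen algorithm on a subset of predictor variables and reading off its variable importance; here it is replaced by a lookup of made-up values.

```python
def select_variables(variables, subset_size, keep_per_subset, fit_importance):
    """Fit on disjoint subsets and keep the most important variables from each."""
    selected = []
    for start in range(0, len(variables), subset_size):
        subset = variables[start:start + subset_size]
        # One independent model fit per subset; returns variable -> importance.
        importance = fit_importance(subset)
        ranked = sorted(subset, key=lambda v: importance[v], reverse=True)
        selected.extend(ranked[:keep_per_subset])
    return selected


# Made-up importance values, used only so that the sketch can run.
toy_importance = {
    "recency": 0.9, "frequency": 0.7, "zip_mean_spend": 0.2,
    "age": 0.1, "tenure": 0.5, "noise": 0.0,
}


def toy_fit(subset):
    """Hypothetical stand-in for a real model fit on the given subset."""
    return {v: toy_importance[v] for v in subset}


final_variables = select_variables(list(toy_importance), 3, 1, toy_fit)
```

Because the subsets partition the variable list, each predictor variable is present in at least one subset. The returned list would then feed the next or final model fit.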
At step 124, customer-level predictor variables are obtained up to the current time (i.e., time t0). These customer-level predictor variables may be obtained as described for step 112 above. Next, at step 132, the customers will be scored using the statistical model created at step 128 and predictor variables obtained at step 124. The resulting customer-level activity scores may be used to predict the target variable at the future time t0+tF.
Process 100 of
Beginning at step 104, the forecast period tF may be set to 12 months. This choice may avoid problems with seasonal effects and is a meaningful time period for most regular retail businesses. Note however that for tF=12 months, it may be useful to have at least 24 months of customer and transactional history available (see considerations above). With 13-24 months of history, predictions are technically still possible, but their quality may suffer.
At step 108, the target variable may be chosen to be an indicator of customer activity in a 12-month period referred to as “activity12”. Activity12 may be set to “1” if the customer made, makes, or will make at least one purchase in the corresponding 12-month period, and activity12 may be set to “0” otherwise.
As described above, t0 is the current time, which may be the last time with complete data refresh such as the end of the last month or the last week. At step 112, all customer-level variables may be compiled as detailed in
The target variable, activity12, may be obtained for the current time t0 at step 116. This simply means that activity12=1 is assigned to every customer who has made a purchase in the last 12 months, and activity12=0 is assigned to all customers who have not.
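By way of illustration only, the assignment of activity12 at the current time can be sketched as follows. The customer identifiers and purchase dates are hypothetical, and the window is taken as open at the start and closed at t0, which is an assumption of this sketch.

```python
from datetime import date


def activity_flag(purchase_dates, window_start, window_end):
    """Return 1 if any purchase falls in (window_start, window_end], else 0."""
    return int(any(window_start < d <= window_end for d in purchase_dates))


t0 = date(2005, 1, 1)            # current time
window_start = date(2004, 1, 1)  # t0 minus 12 months

purchases = {
    "a": [date(2004, 6, 15)],    # purchased within the last 12 months
    "b": [date(2003, 11, 2)],    # last purchase is older than 12 months
    "c": [],                     # no purchases on record
}
activity12 = {
    cid: activity_flag(dates, window_start, t0) for cid, dates in purchases.items()
}
```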
CART (Classification and Regression Trees) may then be chosen as the statistical algorithm at step 120. CART was originally developed by Leo Breiman (Department of Statistics, University of California Berkeley) together with Jerome Friedman, Richard Olshen, and Charles Stone, and was published in 1984. It is now part of many software packages, e.g., the CART package by SALFORD SYSTEMS. Note that many other choices of algorithms and software packages may be used.
At step 128, the CART algorithm selects variables out of the predictor variable set that are significant for distinguishing between the levels of the target variable (whether customer was active or not), and ultimately constructs a complex formula that can assign a probability of activity (a number between 0 and 1) to each possible combination of the predictor variables. In practice, one may split up this data set (predictors up to past time with target of current time) into a training, a validation, and a testing set, where the first is used for model fitting (“growing” the tree in case of CART), the second for adjusting certain fitting parameters (“pruning” in case of CART) and the third for evaluating the true predictive characteristics of the final model (i.e., error rates).
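By way of illustration only, the division of the modeling data set into training, validation, and testing sets can be sketched as follows. The 60/20/20 proportions and the fixed seed are assumptions of this sketch and are not prescribed above.

```python
import random


def split_records(records, train_share=0.6, validation_share=0.2, seed=0):
    """Shuffle the records and split them into training, validation, and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_share)
    n_validation = round(len(shuffled) * validation_share)
    train = shuffled[:n_train]                               # used to grow the tree
    validation = shuffled[n_train:n_train + n_validation]    # used to prune the tree
    test = shuffled[n_train + n_validation:]                 # used to estimate error rates
    return train, validation, test


# Hypothetical customer record identifiers 0..99.
train, validation, test = split_records(range(100))
```

Keeping the testing set out of both fitting and pruning is what allows it to reflect the predictive characteristics of the final model.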
Next, at step 124, the predictor variables may be compiled again, but now up to the current time (t0), using the same process outlined before for step 112. This predictor variable data set may then be fed into the formula of the statistical model at step 132, to form a score for each customer (the number between 0 and 1). In this example, this score is the predicted probability that the customer will make a purchase over the next 12 months. For example, a customer with a score of 0.9 has a “90% chance” of being active over the next year, whereas a customer with a score of 0.5 has only a “50% chance”.
Turning to
More particularly, as shown in
Process 600, as shown in
Process 700, as shown in
Process 800, as shown in
The processes described above in accordance with the present invention, as illustrated in
As described above, the resulting marketing indicators may be used to target marketing activity, and hence output devices may be used, for example, to generate mailing labels, to generate email or printed advertisements, to insert flyers into mailing (such as credit card statements), to route sales calls, etc. Thus, the output devices may include printers 928, email servers 932, inserting machines 936, telephone equipment 940 (e.g., computer telephony integration (CTI) or automatic call director (ACD)), or any other suitable equipment.
Although specific embodiments of the invention are described herein, it should be apparent to one of skill in the art that the present invention may be implemented with various alternatives within the spirit of the invention, and that the scope of the invention is limited only by the claims that follow.
This application claims the benefit of U.S. Provisional Patent Applications Nos. 60/636,128, filed Dec. 14, 2004, and 60/665,604, filed Mar. 25, 2005, which are both hereby incorporated by reference herein in their entireties.
Number | Date | Country
---|---|---
60636128 | Dec 2004 | US
60665604 | Mar 2005 | US