This patent specification relates to automated decisioning. More particularly, this patent specification relates to systems and methods for automated decisioning having variable learning rates.
Automated decisioning systems have been developed to aid people and businesses to make faster, fact-based decisions in business settings. Typically, automated decisioning systems enable the user to make real-time, informed decisions, while minimizing risk and increasing profitability. Decisioning systems can be used to quickly assess risk potential, streamline account application processes, and apply decision criteria more consistently for approving decisions and/or selling new products or services.
Conventionally, decision-making models or decisioning models have been manually or custom developed by human analysts. They have been deployed, often with the use of scoring software systems where the models score out incoming data. These conventional models do not use the data they were scoring out on to update themselves. Furthermore, they do not use the outcome of their decisions to update themselves. Since the incoming data characteristics in the real world tend to change over time, the models tend to degrade in performance unless they are updated. This updating process has also been conventionally undertaken manually by human analysts. The more quickly the trends and behavior patterns change, the shorter the lifespan of the model, and historic data becomes increasingly unreliable. Furthermore, conventional models do not normally take account of frequently changing lists of eligible choices.
An adaptive decisioning system for making decisions between available choices can be provided. The system includes a processor arranged and programmed to select a choice from the available choices based at least in part on evaluating a plurality of prior outcomes for the available choices, wherein the number of prior outcomes evaluated varies with time. According to certain embodiments, the system includes an input/output system in communication with the processor and arranged to communicate the selected choice to a user and to receive an outcome relating to the selected choice, and the processor automatically learns from the outcome by basing at least some subsequently calculated estimated probabilities on the outcome. According to further embodiments, the processor is further programmed to calculate estimated probabilities associated with each choice based at least in part on evaluating a number of prior outcomes for each choice, and the selection of a choice is based at least in part on the calculated estimated probabilities. The number of prior outcomes evaluated for each choice can be based at least in part on an estimate of drift of the estimated probability associated with that choice. The processor can be further programmed such that the selected choice is at least sometimes a sub-optimal choice, such that an outcome relating to the sub-optimal choice can be obtained, and the sub-optimal choice is selected at a rate that is proportional to an estimated probability associated with the sub-optimal choice.
According to other embodiments, a method for adaptively making decisions between available choices including at least a first choice and a second choice is provided. The method includes selecting a choice from the available choices; receiving an outcome relating to the selected choice; and automatically learning from the received outcome by incorporating the received outcome into subsequent steps of selecting a choice. The method can also include calculating a first estimated probability associated with the first choice; calculating a second estimated probability associated with the second choice, wherein the step of selecting a choice is based at least in part upon the calculated first and second estimated probabilities, and the received outcome is incorporated into subsequent steps of calculating the estimated probability associated with the selected choice. The automatic learning can be based on a learning rate which is variable with time, and which influences the degree to which prior outcomes are relied upon when calculating an estimated probability associated with a choice. The learning rate can be a function of time and of an estimate of drift of the probability associated with the selected choice.
Articles are also described that comprise a machine-readable medium embodying instructions that when performed by one or more machines result in operations described herein. Similarly, computer systems are also described that may include a processor and a memory coupled to the processor. The memory may encode one or more programs that cause the processor to perform one or more of the operations described herein.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
FIGS. 8a and 8b show decisioning scenarios where a decision is being made to make one of three different offers;
a shows error rates for nearest neighbor algorithms having different window sizes;
b shows the adaptively updated window size that were used for the data shown in
Adaptive analytics based algorithms can be used in statistical models to provide the capability of real-time automated updating of the models in deployment. It has been found that an important factor in self-updating models used for decisioning is the learning rate. The rate at which the model is updated is very important in balancing two considerations: (1) keeping the error rate (which leads to wrong decisions) relatively low; and (2) keeping the rate of learning relatively high whenever the environment (incoming data characteristics or the correct decision) changes, so as to adapt quickly to the change. It has been found that variable learning rate models perform well under many real-world decisioning situations to balance these two considerations.
One example of automated decisioning with a variable learning rate has been applied to decisions regarding making either an offer for a first product or service, or an offer for a second product or service, to a customer based on the customer's profile. The model recommends which product to offer to a customer based on known customer information. The feedback given back to the model includes whether the recommended offer was accepted or not. Learning occurs by updating the model with this feedback. With the update, there is an increase or decrease in the estimated probability that this offer is accepted by a customer with the same characteristic values. To understand whether different offers would be accepted by customers of a specific type or profile, alternate or non-optimal offers are sometimes made to customers, the feedback is received, and the model is updated.
Further detail of the statistical models is provided below. The model used to estimate the probability of accepting an offer i at time t can be represented as:
$$\hat{p}_i(t) = (1-\eta)\,\hat{p}_i(t-1) + \eta\, I_i(t)$$
where η is the learning rate parameter, which controls how much the past is relied upon. As η approaches 1, the past is weighted less, and as η approaches zero, the model parameters change slowly from the previous model. $I_i(t)$ is the feedback indicator function for offer i at the time t, which can be either 1 or 0:
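Consistent with the accept/reject feedback described above, the indicator is presumably 1 when offer i is accepted at time t and 0 otherwise. Under that assumption, a minimal Python sketch of the update rule follows; the function name `update_estimate`, the starting value 0.5, and the feedback sequence are hypothetical and chosen only for illustration.

```python
def update_estimate(p_prev: float, accepted: bool, eta: float) -> float:
    """One step of p_hat(t) = (1 - eta) * p_hat(t-1) + eta * I(t)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    # Assumed indicator: 1 if the offer was accepted, 0 if it was declined.
    indicator = 1.0 if accepted else 0.0
    return (1.0 - eta) * p_prev + eta * indicator


# Hypothetical feedback stream for one offer: accepted, declined, accepted.
p_hat = 0.5
for accepted in (True, False, True):
    p_hat = update_estimate(p_hat, accepted, eta=0.1)
```

With η = 1 the estimate simply equals the latest feedback, and with η = 0 it never moves, which matches the two limits described above.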
As described herein, separate models $\hat{p}(t)$ can be used for each combination of segment (for example, customer age, income, gender, etc.) and offer type (for example, offer to sell cellphone A, cellphone B, etc.). The optimum offer to make to the customer in a segment at a given time is then:
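Assuming that "optimum" here means the offer with the highest estimated acceptance probability within the segment (weighting each probability by the value of the offer would be a straightforward variant), the rule can be written as:

$$i^*(t) = \arg\max_i \hat{p}_i(t)$$

The symbol $i^*$ denotes the offer currently recommended by the model.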
A simple probability table can be used for the model $\hat{p}_i(t)$, where for every possible combination of input values there is an output probability value. The prediction model takes as inputs the values of characteristics of the object on which a prediction needs to be made. Shown in Table 1 and Table 2 is a simple model for two offers that takes as input the characteristics of a customer and produces the predicted acceptance rate for a given offer as the output. Tables 1 and 2 also correspond to models 210 and 220 respectively as shown in
A variable learning rate can be provided. One example of a dynamic learning rate in the context of prediction Bayesian networks, as opposed to decisioning systems, is described in I. Cohen, A. Bronstein, and F. Cozman, Adaptive Online Learning of Bayesian Network Parameters, HPL-2001-156.pdf (2001), and in United States Patent Application Pub. No. US2003/0115325, both of which are incorporated by reference herein. In order to understand the changes in the underlying model in real time, capture that change, and take action, the above formula is modified. The new formula makes the learning rate a function of both t (the count of observations) and how far the estimate is from its moving average over a number of runs. It has been found, for example, that for many applications, the deviation from a moving average of 100 runs is suitable.
The average and sample standard deviation of the adaptive learning algorithm $\hat{p}_i(t) = (1-\eta)\,\hat{p}_i(t-1) + \eta\, I_i(t)$ can be given by:
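Assuming these statistics are taken over the n most recent estimates (n = 100 in the example above), and writing $\bar{p}_i(t)$ and $s_i(t)$ for them (notation introduced here for illustration only), the standard sample definitions would be:

$$\bar{p}_i(t) = \frac{1}{n}\sum_{k=0}^{n-1}\hat{p}_i(t-k), \qquad s_i(t) = \sqrt{\frac{1}{n-1}\sum_{k=0}^{n-1}\left(\hat{p}_i(t-k)-\bar{p}_i(t)\right)^2}$$

A large value of $|\hat{p}_i(t)-\bar{p}_i(t)|$ relative to $s_i(t)$ then serves as an indication of drift.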
In other words, the learning parameter is both a function of time and of the estimated drift. In one variation, the foregoing formula is implemented using the following computer code:
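As a hedged illustration only, a learning rate that depends on both the observation count and the deviation from a moving average might be sketched in Python as follows. The class name, the parameters `window`, `k_sigma`, `eta_min`, and `eta_max`, and the specific decay and doubling rule are all assumptions made for this sketch; they are not details given in this specification.

```python
from collections import deque
from statistics import mean, stdev


class VariableRateEstimator:
    """Acceptance-probability estimate with a drift-sensitive learning rate.

    Assumed rule (illustrative only): eta decays along 1/2, 1/3, 1/4, ...
    while the estimate stays within k_sigma sample standard deviations of
    its moving average, and is doubled when it drifts further away.
    """

    def __init__(self, p0=0.5, window=100, k_sigma=2.0,
                 eta_min=0.01, eta_max=0.5):
        self.p_hat = p0
        self.history = deque([p0], maxlen=window)  # most recent estimates
        self.k_sigma = k_sigma
        self.eta_min = eta_min
        self.eta_max = eta_max
        self.eta = eta_max

    def update(self, accepted: bool) -> float:
        # Standard update with the current learning rate.
        self.p_hat = (1.0 - self.eta) * self.p_hat + self.eta * float(accepted)

        # Drift test: distance of the new estimate from the moving average.
        drift = False
        if len(self.history) >= 2:
            avg, sd = mean(self.history), stdev(self.history)
            drift = sd > 0 and abs(self.p_hat - avg) > self.k_sigma * sd

        if drift:
            # Drift suspected: rely less on the past.
            self.eta = min(2.0 * self.eta, self.eta_max)
        else:
            # No drift: eta = 1/n becomes 1/(n + 1), down to a floor.
            self.eta = max(self.eta / (1.0 + self.eta), self.eta_min)

        self.history.append(self.p_hat)
        return self.p_hat
```

The floor `eta_min` keeps the estimator responsive even after a long stable period, and the cap `eta_max` limits the variance introduced when drift is suspected.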
Active experimentation can be used in the learning process for decisioning systems where decisions are recommended for more than one offer. It has been found that the decisioning system should make non-optimal offers (i.e., alternate offers), going against what the model recommends, in order to generate new training data for non-optimal target values. In order to learn how a particular customer type would respond to offers that are non-optimal according to the model, such non-optimal offers need to be made at regular intervals. Without experimentation, the optimum offer (e.g., the one with the highest value or greatest probability of being accepted) is always selected. In this case it becomes difficult or impossible to detect changes in the other, non-optimal offers. Unless the non-optimal offers are very close to the optimal offer, the non-optimal offers will never be selected, and therefore those models will not detect changes with respect to those non-optimal offers. Thus, in real-world applications that make decisions among multiple offers whose probabilities change with time, simply selecting the optimal offer without experimentation does not allow accurate estimates of the acceptance probabilities of the non-optimal offers to be determined.
Note that when the second best offer is close to the most likely accepted offer, there will not be much loss with either a low or a high learning rate.
On the other hand, when the offers are quite different in terms of their response rates, a higher learning rate causes a faster capture of the change but also causes many more errors due to the high variance in the expected rate.
Making non-optimal offers involves a cost higher than that of making the optimal offer and hence should be minimized. At the same time, making non-optimal offers is required to detect changes in customer preferences. It has been found that the rate at which the alternate, or non-optimal, offers are made can be tied to the learning rate, which can be calculated as described above. Thus the rate at which alternate, or non-optimal, offers are made can be governed by the learning rate: increasing when the learning rate is high and decreasing when it is low.
As in the context of predicting with respect to a single offer, with two or more offers decisioning systems with slow learning have more exposure to systematic errors, such as shown in
Since the decision rule
does not drive estimates of the alternative offers, alternative offers need to be tried to estimate the alternative offer probabilities. In one variation, a simple method for experimentation is to select each offer with a probability equal to that offer's estimated acceptance probability divided by the sum of the estimated acceptance probabilities of all the offers. In other words, an offer j is selected with probability:
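Following the verbal description above, the selection probability is $\hat{p}_j / \sum_k \hat{p}_k$. A minimal Python sketch is given below; the function names, the offer labels, the example estimates, and the uniform fallback when all estimates are zero are assumptions added for illustration.

```python
import random


def selection_probabilities(p_hat: dict) -> dict:
    """Return P(j) = p_hat[j] / sum_k p_hat[k] for every offer j."""
    total = sum(p_hat.values())
    if total <= 0:
        # Assumed fallback: with no information, choose uniformly.
        return {offer: 1.0 / len(p_hat) for offer in p_hat}
    return {offer: p / total for offer, p in p_hat.items()}


def select_offer(p_hat: dict, rng: random.Random) -> str:
    """Draw one offer at random according to the selection probabilities."""
    probs = selection_probabilities(p_hat)
    offers = list(probs)
    weights = [probs[offer] for offer in offers]
    return rng.choices(offers, weights=weights, k=1)[0]


# Hypothetical estimated acceptance probabilities for two offers.
estimates = {"offer_a": 0.30, "offer_b": 0.10}
probs = selection_probabilities(estimates)
chosen = select_offer(estimates, random.Random(0))
```

In this example the weaker offer is still tried about a quarter of the time, which supplies the feedback needed to keep its estimate current.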
It has been found that, in general, more effective results are achieved when the learning rate is incorporated. Relying on the convergence of the estimate, the selection can be biased towards $i^*$ by weighting the sum by the learning parameter.
where γ is a scaling parameter. The above formula can be implemented using the following computer code:
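One possible reading of this scheme, offered purely as an assumption, is that alternative offers are tried at an overall rate proportional to γη, with the remaining probability mass assigned to the currently best offer $i^*$. The Python sketch below implements that reading; the function name and the exact functional form are hypothetical and are not taken from this specification.

```python
def biased_selection_probabilities(p_hat: dict, eta: float, gamma: float) -> dict:
    """Selection probabilities biased toward the current best offer i*.

    Assumed form (illustrative only), with e = min(1, gamma * eta):
      P(j)  = e * p_hat[j] / sum_k p_hat[k]   for j != i*
      P(i*) = 1 - sum of the probabilities of all other offers
    """
    best = max(p_hat, key=p_hat.get)  # i*: highest estimated probability
    total = sum(p_hat.values())
    explore = min(1.0, max(0.0, gamma * eta))
    probs = {}
    for offer, p in p_hat.items():
        if offer != best:
            probs[offer] = explore * p / total if total > 0 else 0.0
    probs[best] = 1.0 - sum(probs.values())
    return probs


# Hypothetical values: two offers, learning rate 0.1, scaling parameter 2.
probs = biased_selection_probabilities(
    {"offer_a": 0.30, "offer_b": 0.10}, eta=0.1, gamma=2.0
)
```

Under this reading, experimentation fades as the learning rate falls (the estimates having converged) and resumes automatically when drift pushes the learning rate back up.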
According to further alternative embodiments, the likelihood of error estimates can be used (as described above) to drive the decision to try alternatives, or the cost of “good” and “bad” decisions can be incorporated.
FIGS. 8a and 8b show decisioning scenarios where a decision is being made to make one of three different offers.
The adaptive learning rate and experimentation techniques described herein can be applied to different model types, such as decision trees and nearest neighbor models.
In one variation, the following code can be used to update the window size for a decision tree algorithm.
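By way of a hedged sketch only, a window-size update for a tree that is periodically refit on its most recent examples might look like the following Python function. The function name, all parameter names and default values, and the shrink-on-degradation rule are assumptions made for illustration and are not taken from this specification.

```python
def update_window_size(window: int, recent_error: float, baseline_error: float,
                       tolerance: float = 0.05, shrink: float = 0.5,
                       grow: int = 10, min_window: int = 20,
                       max_window: int = 2000) -> int:
    """Return the number of most recent examples to use for the next refit.

    Assumed rule: if the recent error rate exceeds the longer-run baseline
    by more than `tolerance`, drift is suspected and the window shrinks so
    that older examples are dropped; otherwise it grows slowly.
    """
    if recent_error > baseline_error + tolerance:
        return max(min_window, int(window * shrink))
    return min(max_window, window + grow)


# Hypothetical usage: the error rate rose from 0.20 to 0.30.
new_window = update_window_size(400, recent_error=0.30, baseline_error=0.20)
```

A small window plays the role of a high learning rate (fast adaptation, noisier estimates), while a large window corresponds to a low learning rate.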
One decision tree can be used for each offer. For a given input value, the trees for the different offers are evaluated and the offers are compared. When choosing from multiple offers using different trees, the best offer is not always chosen. The alternate offer selection mechanism is applied here as well, and non-optimal offers are chosen to obtain data points for those offers.
As more feedback is gathered with time, more and more examples or data points are placed on the neighborhood space shown in
The updates to the window size occur in a manner analogous to the way the learning rate is updated, as described above. When the window size is updated based on performance of the system, the nearest neighbor model adapts to the changes. The following code can be used to update the window size used in a nearest neighbor algorithm.
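As an illustrative sketch under stated assumptions, a nearest neighbor model restricted to a sliding window of recent examples could be written in Python as shown below. The class and method names, the Euclidean distance, the neutral 0.5 value returned when no examples exist, and the use of k nearest neighbors are hypothetical choices; how the new window size is chosen is left open here.

```python
import math
from collections import deque


class WindowedNearestNeighbor:
    """Nearest neighbor acceptance estimate over the newest examples only."""

    def __init__(self, window: int, k: int = 3):
        self.k = k
        # (features, accepted) pairs; the oldest are evicted automatically.
        self.examples = deque(maxlen=window)

    def add(self, features, accepted: bool) -> None:
        self.examples.append((tuple(features), bool(accepted)))

    def resize(self, window: int) -> None:
        # Rebuilding the deque keeps only the newest `window` examples.
        self.examples = deque(self.examples, maxlen=window)

    def acceptance_rate(self, features) -> float:
        if not self.examples:
            return 0.5  # assumed neutral value when nothing has been seen
        point = tuple(features)
        nearest = sorted(
            self.examples, key=lambda ex: math.dist(ex[0], point)
        )[: self.k]
        return sum(accepted for _, accepted in nearest) / len(nearest)


# Hypothetical usage with two-dimensional customer features.
model = WindowedNearestNeighbor(window=4, k=1)
model.add((0.0, 0.0), True)
model.add((1.0, 1.0), False)
rate = model.acceptance_rate((0.1, 0.1))
```

Shrinking the window with `resize` discards the oldest feedback first, so the model forgets outdated behavior when drift is detected.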
a. shows error rates for nearest neighbor algorithms having different window sizes. As can be seen from
Several example embodiments of variable learning rate decisioning systems will now be described in further detail. In a marketing setting, the objective is often to make the right offer to a customer who walks into a store or calls a customer service center. The same customer might not prefer the same thing at different times. Over time, the preferences of customers change, so an offer that would have worked in the past might not work later. To counter this problem, the decision system as described herein is used to decide what to offer. The system advantageously adapts to the changing reaction to offers and adjusts itself to detect and react to the changing preferences. In order to detect changes more efficiently, the system also performs experimentation by making non-optimal decisions as a means of exploring the various offers and seeing whether the response rates for the different offers have changed. This constant experimentation and adaptation enables the system to help with the offer recommendation decision even if the preferences change.
The decisioning techniques described herein can be applied to decisioning in the context of a buyer deciding which product or service to purchase or use.
Various implementations of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the subject matter described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user may provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The subject matter described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Although a few variations have been described in detail above, other modifications are possible. For example, while some of the variations described herein have been described for some applications, other uses of the adaptive decisioning systems include applications such as fraud detection systems, where an adaptive decisioning system is used to react quickly to emerging fraudulent behavior. In addition, the logic flow depicted in the accompanying figures and/or described herein does not require the particular order shown, or sequential order, to achieve desirable results. Other embodiments may be within the scope of the following claims.
The present application claims priority under 35 U.S.C. §119 to U.S. Provisional application Ser. No. 60/891,191, filed Feb. 22, 2007, the disclosure of which is incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
20010020236 | Cannon | Sep 2001 | A1 |
20030115325 | Cohen et al. | Jun 2003 | A1 |
20030208754 | Sridhar et al. | Nov 2003 | A1 |
20060200541 | Wikman et al. | Sep 2006 | A1 |
20070112615 | Maga et al. | May 2007 | A1 |
Number | Date | Country |
---|---|---|
WO 0070481 | Nov 2000 | WO |
Entry |
---|
I. Cohen, A. Bronstein, and F. Cozman, Adaptive Online Learning of Bayesian Network Parameters, http://www.hpl.hp.com/techreports/2001/HPL-2001-156.pdf (2001). |
Number | Date | Country | |
---|---|---|---|
20090164274 A1 | Jun 2009 | US |
Number | Date | Country | |
---|---|---|---|
60891191 | Feb 2007 | US |