The present invention relates generally to systems and methods for improving customer relations. More specifically, the present invention relates to a system and method for predicting customer attrition from an online service provider with a contractual subscription using dynamic user interaction data.
In the business world, maintaining customer satisfaction with the products and/or services provided by a business is paramount. This is particularly true for online service providers. Customer experiences using online service providers can generally be described as follows. First, a customer registers at a website. Next, the customer receives some free service and decides to subscribe for a certain contractual length, e.g., one month, three months, six months, twelve months, and the like, at a certain price for advanced services. Thereafter, the customer uses the services provided through the website for a period of time. Next, the customer decides whether to renew the contract for advanced services before expiration of the contract.
Although some online service providers, e.g., online dating companies and the like, record detailed customer service utilization data, this data has not generally been used to its full potential in predicting customer attrition. In particular, attrition and/or fading models are generally applied in many businesses using Customer Relationship Management (CRM) systems. Often such CRM systems include a single, static model that produces a one-time score for each customer. In some cases, the model score can be regenerated periodically with an update of some time series variables. However, these approaches are generally flawed for businesses with fixed length contractual subscription, flexible cancellation policies, adequate recorded service utilization patterns, and the like, as they generally fail to capture a customer's changing behavior in different periods of the whole subscription lifecycle. Consequently, the approaches currently implemented in the industry are typically not tuned to predict the attrition event early enough, e.g., before customers make decisions to abandon services. Therefore, these approaches generally do not fit well for risk mitigation.
Thus, a need exists not only for accurately predicting customer attrition, but also predicting attrition in such a manner that can predict the attrition event early enough in order for a service provider to intervene and/or save the customer. In particular, a need exists for predicting customer attrition so as to allow for targeted treatment opportunities to retain customers for a longer period of time. These and other needs are satisfied by the exemplary systems and methods disclosed herein.
The present invention relates to a system and method for predicting customer attrition using dynamic user interaction data. The system allows a user to load customer data into a database, processes the customer data using a scoring engine to calculate one or more attrition scores, and outputs and transmits the attrition scores to the user prior to expiration of a subscription of the user in order to increase a likelihood of renewal of the subscription by the user. The attrition scores can be utilized to predict customer attrition early and allow for a timely intervention to save a customer.
The foregoing features of the invention will be apparent from the following Detailed Description of the Invention, taken in connection with the accompanying drawings, in which:
The present invention relates to a system and method for predicting customer attrition using dynamic user interaction data, as discussed in detail below in connection with
Attrition modeling is widely applied in various industries. For businesses with contractual subscriptions, flexible cancellation policies, detailed customer service utilization data, and the like, the present disclosure provides an exemplary system and method for predicting and/or tracking customer attrition behaviors. The exemplary system and method disclosed herein provides the following features, among others: (1) population (customer) segmentation by subscription type and length; (2) an event of turning off auto renew is regarded as the attrition decision making signal in model development; (3) a short term hazard rate is predicted and treated as a target variable; (4) attrition signals from dynamic utilization data are generated from up-to-date user service utilization data to capture subtle customer behavior patterns and changes, including the comparison of customers with other users; (5) models are built for segmented user groups and distributed over time (e.g., days) and/or segments since subscription, over time of subscription, and the like, targeting various attrition behaviors at different stages of the subscription; (6) reason codes can be generated based on clustering of variables and/or visualized with a combination of score trends and/or a graphical chart of service utilization patterns; and (7) a production system implementing the steps described below can be utilized. The system disclosed herein thereby allows the capture of attrition signals early and, with the intuitive visualization design, provides the opportunity to intervene substantially on the spot.
An important aspect of the system and associated methods disclosed herein includes population segmentation. Although it is tempting to build a single model that covers all subscribed users, the entire population consists of groups with drastically different behaviors in business involving contractual subscriptions. Thus, population segmentation, i.e., customer segmentation, should be implemented. In general, there are at least three major user groups: (1) first time subscribers, (2) re-subscribers, and (3) renewal customers. Among each group, there are generally different contractual subscription periods. Thus, the attrition behavior of these groups is significantly different from each other, making a one-model-fit-all approach inadequate. Although the population segmentation described herein focuses on three month initial subscribers, in other embodiments, the exemplary methodology can be applied to different population segments.
An additional aspect of the system and associated methods disclosed herein includes an attrition decision time identification. Attrition is typically identified when a customer does not renew a service previously utilized. However, customers generally do not wait until the last moment to make the decision not to renew the service. For example, customers usually stop using the service in the middle of the contract. Thus, any customer service utilization data obtained after the decision was made generally leads to label leakage. Moreover, once the attrition decision has already been made, it is generally difficult to change. Therefore, it is important to predict the attrition event before the customer actually makes the decision.
The auto-renewal for a service contract is typically set to “on” by default by online service providers. When a customer actually turns the auto-renewal feature off in the middle of the subscription, it is a strong indicator that the attrition decision was made. This event can be called Renewal Turn-Off. For example, based on data obtained from an online dating company, about 98% of the customers who had a renewal turn-off event actually attrited, while all those who did not have a renewal turn-off event renewed their service contract. Therefore, all data after a renewal turn-off event should generally be ignored to avoid label leakage.
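The truncation rule described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `drop_post_turnoff`, the `(record_date, payload)` tuple layout, and the choice to exclude the event day itself are all assumptions made for the example.

```python
from datetime import date
from typing import List, Optional, Tuple

Record = Tuple[date, str]  # hypothetical layout: (record date, utilization payload)


def drop_post_turnoff(records: List[Record], turnoff_date: Optional[date]) -> List[Record]:
    """Keep only the records dated before the renewal turn-off event.

    If the customer never turned auto-renewal off (turnoff_date is None),
    every record is kept. Excluding the event day itself is an assumption
    made here to stay on the safe side of label leakage.
    """
    if turnoff_date is None:
        return list(records)
    return [(d, payload) for d, payload in records if d < turnoff_date]
```

Whether records from the event day itself should be kept depends on how the daily data are aggregated, so that boundary is a judgment call rather than something the text specifies.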
Another aspect of the system and associated methods disclosed herein includes an attrition target label. The exemplary attrition model discussed herein can be run on a daily basis to meet the business needs of timely intervention. To construct the model-building dataset, an attrition label should be assigned to each data record. In some exemplary embodiments, all records of a user may be labeled as positive, as long as the user has a renewal turn-off event anytime during the subscription. In this situation, the eventual attrition rate should be a monotone decreasing function, as shown by the exemplary real data represented by curve “a” in
The attrition event can be predicted by the system within a defined time window, e.g., whether or not a customer is going to have a renewal turn-off event within the next seven days. The collected data indicate that more attrition and/or renewal decisions are made closer to the end of a subscription period. Thus, the “7-day hazard rate” generally increases as a function of time, as shown by curve “b” in
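One way the windowed target label could be assigned to each daily record is sketched below. The function name `hazard_label`, the integer day representation, and the convention of returning `None` for records at or after the turn-off event are assumptions for illustration only.

```python
from typing import Optional


def hazard_label(day: int, turnoff_day: Optional[int], window: int = 7) -> Optional[int]:
    """Label one daily record for a short-term hazard model.

    day          -- days since subscription for this record
    turnoff_day  -- day of the renewal turn-off event, or None if none occurred
    window       -- length of the prediction window in days

    Returns 1 if the turn-off event falls within the next `window` days,
    0 if it does not, and None for records at or after the event, which
    are excluded from the training data to avoid label leakage.
    """
    if turnoff_day is None:
        return 0
    if day >= turnoff_day:
        return None
    return 1 if turnoff_day - day <= window else 0
```

For example, under these assumptions a record on day 10 for a customer who turns auto-renewal off on day 15 is labeled positive, while the same record for a customer who turns it off on day 30 is labeled negative.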
Another aspect of the system and associated methods disclosed herein includes the use of variables to categorize user profiles and behaviors. These variables can form part of the models implemented by the scoring engine of the system. At least two major categories of variables can be generated, e.g., static variables, dynamic variables, and the like. In general, static variables include user profile information that rarely changes during the entire subscription lifecycle of the user. In contrast, dynamic variables are those that reflect the user's experience and/or behavior at different stages of the entire lifecycle. The generation of dynamic variables typically requires the processing of time series data to capture various patterns and/or signals. The captured patterns and/or signals from dynamic variables can be implemented by the system.
Exemplary dynamic variables can include, e.g., service utilization quantity measures, ratio variables, peer comparison variables, self-comparison variables, and the like. Although the exemplary dynamic variables discussed herein are provided for an online dating service provider, those of ordinary skill in the art should understand that alternative dynamic variables can be implemented based on the type of service provider utilizing the system.
For an online dating service provider, service utilization quantity measures can include, e.g., the number of matches, communications, successful matches, and the like. The users of such clients constitute a special type of social network. Thus, matches and/or communications can be regarded as lines between different vertices that represent users. Further, these variables generally indicate the degree of each vertex. More advanced features, e.g., a loop count, and the like, may also be derived from these networks.
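The network view described above can be illustrated with a short sketch that counts the degree of each vertex. Treating matches as undirected edges, and the names `vertex_degrees` and `matches`, are assumptions for this example; the user IDs are made up.

```python
from collections import Counter
from typing import Iterable, Tuple


def vertex_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many edges touch each user.

    Each edge is a (user_a, user_b) pair representing one match or
    communication, treated here as undirected.
    """
    degree = Counter()
    for user_a, user_b in edges:
        degree[user_a] += 1
        degree[user_b] += 1
    return degree


# Illustrative data only: four users connected by four matches.
matches = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4")]
degrees = vertex_degrees(matches)
```

In this toy network, user `u3` takes part in three matches and therefore has the highest degree.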
Ratio variables can include, e.g., a response rate, an acceptance rate, a success rate, an effective match rate, and the like. These variables capture the interaction between variables and can bring deeper business insights. In particular, the guided communications on an exemplary online dating service platform provide interesting and/or useful user behavior patterns and/or hidden information regarding user experiences. For example, users generally go through a number of steps. The success rate can therefore be measured as the number of final-stage communications a user has reached divided by total first stage communications.
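The success rate defined above reduces to a guarded division. In the sketch below, the function name and the convention of returning 0.0 when a user has no first-stage communications are assumptions, since the text does not say how that case is handled.

```python
def success_rate(final_stage_count: int, first_stage_count: int) -> float:
    """Final-stage communications reached divided by first-stage communications."""
    if first_stage_count == 0:
        # Assumption: a user with no first-stage communications gets a rate of 0.
        return 0.0
    return final_stage_count / first_stage_count
```

For instance, a user who started twelve guided communications and reached the final stage in three of them would have a success rate of 0.25.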
Peer comparison variables, e.g., group normalization variables, can measure the engagement level of a customer relative to other customers. For example, the user experience can be normalized by the average of the renewed population in the same subscription period. This is important because users tend to use a service more intensively in the beginning phase and then use it less as time passes. For example, receiving three matches on day ten is fundamentally different from receiving three matches on day eighty. Thus, absolute values are generally less meaningful than ratio variables. To compensate for this, a renewal population may be utilized as a benchmark in the normalization. Therefore, a normalized value of 0.8 means that the customer's value is 80% of the renewal population average.
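A minimal sketch of this group normalization follows. The function name `peer_normalized` and the choice to return `None` when no usable benchmark exists are assumptions; the benchmark values are expected to come from renewed customers at the same point in the subscription period.

```python
from typing import Optional, Sequence


def peer_normalized(value: float, renewed_values: Sequence[float]) -> Optional[float]:
    """Divide a customer's value by the renewal-population average.

    renewed_values holds the same measure for renewed customers at the
    same day since subscription. Returns None when the benchmark is
    empty or averages to zero (an assumption made for this sketch).
    """
    if not renewed_values:
        return None
    benchmark = sum(renewed_values) / len(renewed_values)
    if benchmark == 0:
        return None
    return value / benchmark
```

With a renewal-population average of five matches, a customer who received four matches would be assigned a normalized value of 0.8.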
Self-comparison variables generally measure the engagement level of a customer compared with the customer's own longitudinal history. People are intrinsically different from each other. Some people are more proactive, while others are more conservative. Thus, the same amount of utilization has different ramifications for a proactive user than for a more passive user. The system can normalize dynamic variables by the customer's historical information. For example, the most recent engagement level of a customer may be compared with the customer's historical average. In some exemplary embodiments, normalization with a customer's most active engagement level may be implemented.
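The comparison of recent engagement with a customer's own history might look like the following sketch. The seven-day recent window, the use of the earlier days as the historical baseline, and the function name `self_comparison` are all assumptions; the text does not fix any of them.

```python
from statistics import mean
from typing import Optional, Sequence


def self_comparison(daily_counts: Sequence[float], recent_days: int = 7) -> Optional[float]:
    """Ratio of the recent average to the average over the earlier history.

    daily_counts is one customer's daily service-utilization counts,
    oldest first. Returns None when there is no earlier history to
    compare against or when the historical average is zero.
    """
    if len(daily_counts) <= recent_days:
        return None
    recent = mean(daily_counts[-recent_days:])
    baseline = mean(daily_counts[:-recent_days])
    if baseline == 0:
        return None
    return recent / baseline
```

A ratio below 1.0 indicates that the customer has recently been less engaged than their own baseline. Replacing `mean` with `max` for the baseline would give the variant that normalizes by the most active engagement level.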
As discussed above, the system and associated methods disclosed herein include one or more attrition models. For example, a segmented and distributed model may be implemented to address varying attrition behavior for different segments of a subscription and over time. Clustering methods can also be utilized to reveal detailed user segmentations. As an example, gender will be used to segment the user population. A traditional approach is to build one single model for each segment. This approach has several disadvantages in that, e.g., it requires a complex normalization of variables over the subscription period, it does not provide the flexibility to tailor predictive variables for different stages of a subscription, and the like.
Rather than building a single model, the system can construct a series of models distributed over time. In particular, a designated model can be built for each day of a subscription. With the models distributed over time, the predictive power can be optimized based on the number of days since subscription. Further, different variable sets can be utilized at various stages of a subscription period. Variable selection for the models generally indicates that the number of communications initiated was predictive in the early stages and became less so in later stages, when website activity and/or life cycle completeness became more significant. This is further discussed and confirmed below with respect to the exemplary results of reason code distribution.
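The idea of dispatching to a day-specific model can be sketched as a simple lookup. Everything concrete below is hypothetical: the two stand-in scoring functions, the feature names, the 90-day horizon for a three-month subscription, and the day-30 switch point are placeholders and not the disclosed models.

```python
from typing import Callable, Dict, Mapping

Features = Mapping[str, float]
Model = Callable[[Features], float]


def score_customer(models: Dict[int, Model], day: int, features: Features) -> float:
    """Score a customer with the model built for this day since subscription."""
    if day not in models:
        raise KeyError(f"no model registered for day {day}")
    return models[day](features)


# Hypothetical stand-ins for trained models: early days lean on
# communications initiated, later days on website activity.
def early_model(features: Features) -> float:
    return 0.5 / (1.0 + features.get("communications_initiated", 0.0))


def late_model(features: Features) -> float:
    return 0.5 / (1.0 + features.get("website_activity", 0.0))


# One entry per day of an assumed 90-day subscription.
models: Dict[int, Model] = {
    day: (early_model if day <= 30 else late_model) for day in range(1, 91)
}
```

Keeping one entry per day makes it straightforward to swap in a different variable set or model for any stage of the subscription without touching the scoring call.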
With reference to
An additional feature of the system and associated methods disclosed herein includes reason code generation and/or visualization. Reason codes provide an explanatory guide to the end-users of the system. The reason codes of the system can be based on variable clustering and/or business consideration. Exemplary reason codes can be, e.g., login activity, active service engagement activity, passive service engagement activity, positive experiences, negative experiences, price sensitivity, service quality, and the like.
In an online dating service example, eight reason codes were implemented by the system, as shown in Table 1 below. In particular, Table 1 shows the reason code distribution by month of a life cycle. All variables selected by the model were divided into the eight categories and each category indicates a distinct reason why the model generated a high score. In different periods of a user's life cycle, different reasons generally contribute to a higher-than-average score. Table 1 also shows the distribution of top reason codes across the user life cycle, which can offer, e.g., business directional guidance for a marketing campaign.
The score trend and bubble chart of
A sample output format generated by the system is provided in Table 2 below. The sample output can include, e.g., a user ID, days since subscription, a predicted probability value, a relative risk value, a precision value, an indication of at least one reason code, and the like.
As discussed previously, the two stages of the system include an initial data load stage and a daily scoring engine processing stage. In the exemplary process described above, the daily summary file summarizes the customers' historical information, thereby generating the “DNA” for each customer. To improve robustness, a copy of a current summary file can be made before it is updated. Also, to make the daily scoring more efficient, an exponentially decaying weighted moving average (EWMA) technique can be applied instead of a regular moving average for those service count signals, i.e., dynamic signals, previously discussed. The exemplary EWMA technique can be represented by Equation 1 below:
$$\mathrm{EWMA}_n = \mathrm{EWMA}_{n-1} \cdot k + \mathrm{sample} \cdot (1 - k) \qquad \text{(Equation 1)}$$
where k is a decay factor. The EWMA technique does not require the storing of historical data. Rather, the EWMA technique only requires storing the current EWMA and updating it with the most recent sample. This generally improves the time efficiency and the space efficiency of the system.
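The update in Equation 1 can be written as a one-line function, which makes the storage argument concrete: only the previous EWMA value is needed. The function name `ewma_update`, the range check on k, and the example values (an initial EWMA of 10.0 and k = 0.5) are assumptions for illustration.

```python
def ewma_update(previous: float, sample: float, k: float) -> float:
    """Apply Equation 1: new EWMA = previous EWMA * k + sample * (1 - k)."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("decay factor k is assumed to lie between 0 and 1")
    return previous * k + sample * (1.0 - k)


# Illustrative daily update: only the running value is carried forward.
ewma = 10.0  # assumed starting value, e.g., taken from the summary file
for sample in [10.0, 0.0, 0.0]:
    ewma = ewma_update(ewma, sample, k=0.5)
```

A larger k gives more weight to history and makes the signal react more slowly to a sudden drop in activity; a smaller k makes it track the most recent samples more closely.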
Still with reference to
The present invention can be embodied as an attrition prediction software module and/or engine 306, which can be embodied as computer-readable program code stored on the storage device 304 and executed by the CPU 310. The engine 306 could be programmed using any suitable high- or low-level computing language, such as, e.g., Java, C, C++, C#, .NET, and the like. The network interface 308 can include, e.g., an Ethernet network interface device, a wireless network interface device, any other suitable device which permits the processing server 302 to communicate via the network, and the like. The CPU 310 can include any suitable single- or multiple-core microprocessor of any suitable architecture that is capable of implementing and/or running the attrition prediction engine 306, e.g., an Intel processor, and the like. The random access memory 312 can include any suitable, high-speed, random access memory typical of most modern computers, such as, e.g., dynamic RAM (DRAM), and the like.
Having thus described the invention in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. It will be understood that the embodiments of the present invention described herein are merely exemplary and that a person skilled in the art may make any variations and modifications without departing from the spirit and scope of the invention. All such variations and modifications, including those discussed above, are intended to be included within the scope of the invention.
This application claims the benefit of U.S. Provisional Application Ser. No. 61/695,412 filed on Aug. 31, 2012, the entire disclosure of which is expressly incorporated herein by reference.
Number | Date | Country
---|---|---
61695412 | Aug 2012 | US