Predictive models have been used to identify customers who are at a high risk of churning. Such predictive models have involved the use of a machine learning model, such as a random forest classification model using hundreds of features culled from customer use of the service, including both static variables and dynamic variables derived from commercial/billing data. Weekly reports were generated to alert product managers and account managers regarding risk of churning, allowing the managers to take actions to attempt to retain customers.
As the amount of time series data increases, the number of features available for use by such models grows exponentially. At least one prior model utilized a recurrent neural network to model one or multiple time series of customer actions in order to predict customer churn. There is a desire to further improve model performance in predicting customer churn.
A method to predict churn includes obtaining static features representative of a customer of a service, obtaining time series features representative of the customer's interaction with the service, using a deep neural network to process the static features, using a recurrent neural network to process the time series features; and combining outputs from the deep neural network and the recurrent neural network to predict likelihood of customer churn.
A machine readable storage device has instructions for execution by a processor of the machine to perform operations. The operations include obtaining static features representative of a customer of a service, obtaining time series features representative of the customer's interaction with the service, using a deep neural network to process the static features, using a recurrent neural network to process the time series features, and combining outputs from the deep neural network and the recurrent neural network to predict likelihood of customer churn.
A device includes a processor and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations. The operations include obtaining static features representative of a customer of a service, obtaining time series features representative of the customer's interaction with the service, using a deep neural network to process the static features, using a recurrent neural network to process the time series features, and combining outputs from the deep neural network and the recurrent neural network to predict a likelihood of customer churn.
In the following description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present invention. The following description of example embodiments is, therefore, not to be taken in a limited sense, and the scope of the present invention is defined by the appended claims.
The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or a computer readable storage device, such as one or more non-transitory memories or other types of hardware based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware, or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server, or other computer system, turning such computer system into a specifically programmed machine.
Churn prediction and prevention is a critical component of cloud service based businesses. Churn or churning may be associated with customer turnover, defection, loss, or other form of customer attrition. Since churn is a rare event, and churn patterns may vary significantly across customers, predicting churn is a challenging task when using conventional machine learning techniques. However, a massive and rich amount of customer usage and billing data enables the exploitation of advanced machine learning techniques to create models to discover complex usage patterns for churn. Features extracted from time series data may grow exponentially and result in significant consumption of computing resources for prior customer relationship management machine learning systems.
A system and method utilize a deep neural network (DNN) and a recurrent neural network (RNN) to process static and time series data related to customer churn with respect to services. Both networks may include multiple layers to transform raw inputs into churn prediction outputs. The outputs of both neural networks are combined to provide a likelihood that a customer will elect to no longer participate in services. The stacking of the multiple layers of neural networks is referred to as architecture engineering. The combination of a deep neural network with a recurrent neural network provides increased accuracy as compared to prior use of separate random forest classification.
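The combined forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the patented implementation: the layer sizes are assumptions, a simple tanh recurrence stands in for the LSTM cells used in practice, and the highway connections and multi-layer stacking are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # Fully connected layer with ReLU activation (DNN branch building block).
    return np.maximum(0.0, x @ w + b)

def simple_rnn(seq, w_in, w_rec, b):
    # Toy recurrence standing in for LSTM: h_t = tanh(x_t W_in + h_{t-1} W_rec + b).
    h = np.zeros(w_rec.shape[0])
    for x_t in seq:
        h = np.tanh(x_t @ w_in + h @ w_rec + b)
    return h  # final hidden state summarizes the time series

def predict_churn(static_x, series_x, params):
    # DNN branch processes the static features.
    h_static = dense(static_x, params["w1"], params["b1"])
    # RNN branch processes the time series features.
    h_series = simple_rnn(series_x, params["w_in"], params["w_rec"], params["b_r"])
    # Combine both branch outputs and map to a churn likelihood.
    merged = np.concatenate([h_static, h_series])
    logit = merged @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> value in (0, 1)

# Illustrative dimensions: 10 static features, 56 daily steps x 18 usage metrics.
params = {
    "w1": rng.normal(size=(10, 16)) * 0.1, "b1": np.zeros(16),
    "w_in": rng.normal(size=(18, 32)) * 0.1,
    "w_rec": rng.normal(size=(32, 32)) * 0.1, "b_r": np.zeros(32),
    "w_out": rng.normal(size=(48,)) * 0.1, "b_out": 0.0,
}
score = predict_churn(rng.normal(size=10), rng.normal(size=(56, 18)), params)
print(0.0 < score < 1.0)
```

The key design point illustrated is that each branch consumes only the data type it suits: a feed-forward path for fixed-length static fields, a recurrent path for ordered telemetry.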
Static features may include data about customers stored in a database, such as static data describing the customer and account information stored in customer fields. Example static features include offer type, tenure age, and billing status. In one embodiment, more than ten customer status variables may be used as static features. Time series data may include data collected regarding use of the service by the customers, such as usage telemetry logs documenting access and use of cloud based services, as well as various meters. Example time series data, also called dynamic features, includes daily usage of cloud services such as network, storage, and virtual machine usage. In some embodiments, over 400 variables may be extracted from the time series data. The data in one embodiment is collected over an eight week usage period. The training data in one example included over 98,000 samples.
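Assembling the two input types might look like the following sketch. The field names and the per-metric normalization are illustrative assumptions; the eight-week window of daily metrics yields a 56-day series, consistent with the dimensions described above.

```python
import numpy as np

# Hypothetical static fields for one customer (names are illustrative).
static_record = {"offer_type": 2, "tenure_age_months": 14, "billing_status": 1}
static_features = np.array(list(static_record.values()), dtype=float)

# Eight weeks of daily usage telemetry: 56 days x 18 usage metrics per day
# (e.g. network, storage, and virtual machine meters).
days, metrics = 56, 18
usage_log = np.abs(np.random.default_rng(1).normal(size=(days, metrics)))

# Normalize each metric so the recurrent network sees comparable scales.
time_series_features = (usage_log - usage_log.mean(axis=0)) / (usage_log.std(axis=0) + 1e-8)

print(static_features.shape, time_series_features.shape)
```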
Data collection 125 occurs during a training stage to provide training data 130 that includes both static and time series features. Data 130 may also include whether or not the customer actually did churn. Data 130 will be used to train a hybrid deep learning network model 135 to predict customer churn.
Data collection 125 also occurs during use of the system service 120 to provide customer data 140 for determining the likelihood that a customer will churn by the network model 135 once trained. Customer data 140 includes static and time series features collected during actual use of the service system 120 and is processed by the network model 135 to determine likelihood of churn for individual customers.
Network model 135 divides the feature inputs into a static feature input 145 and a dynamic feature input 150. The static feature input 145 provides static features to multiple layers of deep neural networks 155. The dynamic feature input 150 provides the time series features to one or more recurrent neural network layers 160. Outputs of the deep neural networks 155 and recurrent neural network layers 160 are combined at a combiner 165 to provide an indication of the likelihood of customer churn for each customer.
The network model 135 may initially be trained by observing the features and correlating them to actual churn. Following training, the network model 135 may be run periodically against one or more customers to obtain the indication or prediction of whether or not the customer is likely to churn. In one embodiment, the indication may simply be binary, such as a “1” or a “0”, with “1” meaning that the customer is a churn candidate. The prediction may also be expressed as a value between 0 and 1 in further embodiments, such as 0.75, corresponding to a 75% likelihood of churn.
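Converting the model's continuous output into the binary indication described above reduces to thresholding. The 0.5 cutoff below is an illustrative assumption; in practice the threshold would be tuned against precision and recall requirements.

```python
def churn_indication(score, threshold=0.5):
    """Map a model score in [0, 1] to a binary churn-candidate flag."""
    return 1 if score >= threshold else 0

# A score of 0.75 corresponds to a customer estimated 75% likely to churn,
# which exceeds the illustrative 0.5 threshold and flags the customer.
print(churn_indication(0.75), churn_indication(0.25))
```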
An interface 170 may be used to provide the results generated by model 135. Results may be provided in periodic reports, such as weekly or monthly. Results may also be displayed in a dashboard, allowing timely contacting of customers likely to churn.
Network model 200 also has a dynamic feature input 240 that provides time series features to a first recurrent neural network (RNN) layer at 245, which is coupled to a second RNN layer at 250. The RNN layers may be identical and may utilize long short-term memory (LSTM) units. First RNN layer 245 in one embodiment has a 56×18 input dimension and a 56×18 output dimension. The second RNN layer 250 may have a 56×18 input dimension and a 128 output dimension. Each RNN layer may also have a highway-like architecture in one embodiment.
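The reduction from a full per-timestep sequence output to a single fixed-size vector (described later as the “lambda” step) can be sketched as simply taking the final hidden state. The 56×128 shape below is taken from the description of the RNN's full sequence output; using `seq[-1]` as the reduction is the assumed behavior.

```python
import numpy as np

# Full sequence output from an RNN layer: 56 time steps x 128 hidden units.
sequence_output = np.random.default_rng(3).normal(size=(56, 128))

# "Lambda" step: keep only the final 128-dimensional hidden state, which is
# what gets fed forward (e.g. into a highway link toward the merge layer).
final_state = sequence_output[-1]

print(final_state.shape)
```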
The results from the DNN layers and RNN layers are combined at a merge layer 255. With the highway-like architecture, results from each of the layers are included to be combined. The merged results are then processed via a multi-layer perceptron (MLP) layer 260 to approximate a risk score that is provided by output layer 265.
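A minimal sketch of the merge layer and MLP head follows. The branch output sizes, hidden width, and single hidden layer are illustrative assumptions; the sketch shows only the structural idea of concatenating collected branch outputs and mapping them to a risk score.

```python
import numpy as np

def mlp_risk_score(merged, w1, b1, w2, b2):
    # Small multi-layer perceptron approximating the risk score.
    hidden = np.maximum(0.0, merged @ w1 + b1)   # ReLU hidden layer
    logit = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))          # sigmoid output layer

rng = np.random.default_rng(2)
# Outputs collected from the DNN layers, RNN layers, and highway links
# (sizes are assumptions for illustration).
branch_outputs = [rng.normal(size=64), rng.normal(size=128), rng.normal(size=128)]
merged = np.concatenate(branch_outputs)          # merge layer: concatenation

w1, b1 = rng.normal(size=(merged.size, 32)) * 0.05, np.zeros(32)
w2, b2 = rng.normal(size=(32,)) * 0.05, 0.0
risk = mlp_risk_score(merged, w1, b1, w2, b2)
print(0.0 < risk < 1.0)
```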
A highway 325 is provided to obtain results from each DNN layer. The highway 325 in one embodiment collects the results from each DNN layer and provides the results to the merge layer 255. In
When highway connections are applied to the RNN, the final 128 state output of each RNN layer is fed into the highway links and propagates to the final output. The “action” of taking the final 128 state output from the RNN's entire 56×128 output is illustrated as the “lambda” blocks (540, 565) in
LSTM 530 provides its output as an input to LSTM 550 to provide output to dense layer 555. LSTM 530 also provides output to highway 515, which includes a lambda layer 565 and dense layer 570. The dense layers 545, 555, and 570 provide results to merge layer 255. The dense layers 545, 555, and 570 are also indicated in
FIG. 6 is a block diagram illustrating arrangements of
A comparison of model performance was based on a set of 98,247 training samples, with 21,054 validation samples and 21,053 test samples, taken over eight weeks. Time-series variables included 18 daily usage data points. Static variables included more than 10 customer status variables, and more than 400 dynamic variables were extracted from the time series data. Customers were labeled with a binary value, such as 0 for a customer that did not churn or 1 for a customer that churned.
When compared with a random forest model and a DNN operating on static variables only, significant increases were noted across all performance metrics for the combined model. A precision-recall curve is illustrated in
At 820, a deep neural network having multiple neural network layers coupled in series is used to process the static features. At 825, a recurrent neural network having two or more layers is used to process the time series features. Outputs from the deep neural network and the recurrent neural network are combined at 830 to predict customer churn.
In one embodiment, the deep neural network comprises multiple deep neural network layers wherein a first deep neural network layer receives the static features as inputs and each succeeding deep neural network layer processes an output of a previous layer. Each deep neural network layer may have the same output dimension. A last layer provides a last layer output to a merge function and the other layer outputs are provided via a highway-like architecture to the merge function to provide the deep neural network output for combining with the recurrent neural network output. In one embodiment, each layer at each deep neural network layer may perform batch normalization.
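The highway-like collection over the DNN layers can be sketched as follows. The layer count and width are illustrative assumptions, and the per-layer batch normalization described above is noted but omitted for brevity; the point shown is that every layer's output, not only the last, is collected for the merge function.

```python
import numpy as np

def highway_dnn(x, layers):
    # Each DNN layer has the same output dimension; the highway-like
    # architecture collects every layer's output for the merge function.
    collected = []
    h = x
    for w, b in layers:
        h = np.maximum(0.0, h @ w + b)
        # Batch normalization would be applied per layer here; omitted in this sketch.
        collected.append(h)
    return collected  # last entry is the last layer output; the rest go via the highway

rng = np.random.default_rng(4)
dim = 16  # assumed common output dimension of every DNN layer
layers = [(rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim)) for _ in range(3)]
outputs = highway_dnn(rng.normal(size=dim), layers)
print(len(outputs), outputs[0].shape)
```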
In one embodiment, the recurrent neural network comprises a first recurrent neural network layer and a second recurrent neural network layer. The first recurrent neural network layer receives the time series features and provides a first output as an input to the second recurrent neural network layer. The second recurrent neural network layer provides the recurrent neural network output.
Memory 1003 may include volatile memory 1014 and non-volatile memory 1008. Computer 1000 may include, or have access to, a computing environment that includes a variety of computer-readable media, such as volatile memory 1014 and non-volatile memory 1008, removable storage 1010 and non-removable storage 1012. Computer storage includes random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM) and electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices capable of storing computer-readable instructions for execution to perform functions described herein.
Computer 1000 may include or have access to a computing environment that includes input 1006, output 1004, and a communication connection 1016. Output 1004 may include a display device, such as a touchscreen, that also may serve as an input device. The input 1006 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 1000, and other input devices. The computer may operate in a networked environment using a communication connection to connect to one or more remote computers, such as database servers, including cloud based servers and storage. The remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common network node, or the like. The communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), cellular, Wi-Fi, Bluetooth, or other networks.
Computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1002 of the computer 1000. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. For example, a computer program 1018 may be used to cause processing unit 1002 to perform one or more methods or algorithms described herein.
Additional examples of the presently described method, system, and device embodiments include the following, non-limiting configurations. Each of the following non-limiting examples may stand on its own, or may be combined in any permutation or combination with any one or more of the other examples provided below or throughout the present disclosure.
A method to predict churn, the method comprising: obtaining static features representative of a customer of a service; obtaining time series features representative of the customer's interaction with the service; using a deep neural network to process the static features; using a recurrent neural network to process the time series features; and combining outputs from the deep neural network and the recurrent neural network to predict likelihood of customer churn.
The method of example 1 wherein the deep neural network comprises multiple deep neural network layers wherein a first deep neural network layer receives the static features as inputs and each succeeding deep neural network layer processes an output of a previous layer.
The method of example 2 wherein each deep neural network layer has a same output dimension.
The method of any of examples 2-3 wherein a last layer provides a last layer output to a merge function, and wherein the other layer outputs are provided via a highway-like architecture to the merge function to provide the deep neural network output for combining with the recurrent neural network output.
The method of any of examples 2-4 and further comprising performing batch normalization at each deep neural network layer.
The method of any of examples 1-5 wherein the recurrent neural network comprises a first recurrent neural network layer and a second recurrent neural network layer.
The method of example 6 wherein the first recurrent neural network layer receives the time series features and provides a first output as an input to the second recurrent neural network layer.
The method of example 7 wherein the second recurrent neural network layer provides the recurrent neural network output.
The method of any of examples 1-8 wherein combining outputs from the deep neural network and the recurrent neural network comprises: merging the outputs; using a multi-layer perceptron (MLP) to approximate a risk score; providing the risk score corresponding to a likelihood of churn for the customer.
The method of any of examples 1-9 wherein the static features include at least one of offer type, tenure age, and billing status.
The method of example 10 wherein the dynamic features include at least one of daily usage of cloud services including network, storage, and virtual machine.
A machine readable storage device having instructions for execution by a processor of the machine to perform operations comprising: obtaining static features representative of a customer of a service; obtaining time series features representative of the customer's interaction with the service; using a deep neural network to process the static features; using a recurrent neural network to process the time series features; and combining outputs from the deep neural network and the recurrent neural network to predict likelihood of customer churn.
The storage device of example 12 wherein the deep neural network comprises multiple deep neural network layers wherein a first deep neural network layer receives the static features as inputs and each succeeding deep neural network layer processes an output of a previous layer and wherein each deep neural network layer has a same output dimension.
The storage device of any of examples 12-13 wherein a last layer provides a last layer output to a merge function, and wherein the other layer outputs are provided via a highway-like architecture to the merge function to provide the deep neural network output for combining with the recurrent neural network output.
The storage device of any of examples 12-14 wherein the recurrent neural network comprises a first recurrent neural network layer and a second recurrent neural network layer, wherein the first recurrent neural network layer receives the time series features and provides a first output as an input to the second recurrent neural network layer, and wherein the second recurrent neural network layer provides the recurrent neural network output.
The storage device of any of examples 12-15 wherein combining outputs from the deep neural network and the recurrent neural network comprises: merging the outputs; using a multi-layer perceptron (MLP) to approximate a risk score; providing the risk score corresponding to a likelihood of churn for the customer.
The storage device of any of examples 12-16 wherein the static features include at least one of offer type, tenure age, and billing status and wherein the dynamic features include at least one of daily usage of cloud services including network, storage, and virtual machine.
A device comprising: a processor; and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations comprising: obtaining static features representative of a customer of a service; obtaining time series features representative of the customer's interaction with the service; using a deep neural network to process the static features; using a recurrent neural network to process the time series features; and combining outputs from the deep neural network and the recurrent neural network to predict a likelihood of customer churn.
The device of example 18 wherein the deep neural network comprises multiple deep neural network layers wherein a first deep neural network layer receives the static features as inputs and each succeeding deep neural network layer processes an output of a previous layer and wherein each deep neural network layer has a same output dimension, wherein the recurrent neural network comprises a first recurrent neural network layer and a second recurrent neural network layer, wherein the first recurrent neural network layer receives the time series features and provides a first output as an input to the second recurrent neural network layer, and wherein the second recurrent neural network layer provides the recurrent neural network output, and wherein combining outputs from the deep neural network and the recurrent neural network comprises: merging the outputs; using a multi-layer perceptron (MLP) to approximate a risk score; providing the risk score corresponding to a likelihood of churn for the customer.
The device of any of examples 18-19 wherein the static features include at least one of offer type, tenure age, and billing status and wherein the dynamic features include at least one of daily usage of cloud services including network, storage, and virtual machine.
Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.