The subject matter described herein relates to scoring customers based on an enhanced market basket analysis.
In the retail industry, a lot of resources are typically spent on marketing and sales activities. A primary form of marketing is provision of offers (for example, coupons) on products that become available for purchase by customers. The offers can be provided based on a purchase history of the customers. For example, if a customer has been historically purchasing a hair conditioner, further offers on the hair conditioner can be provided to the customer. However, such a provision does not take into account whether the purchase of the hair conditioner can be predicted based on an earlier purchase of a predictor product, such as a shampoo.
The current subject matter describes a generation of a score of a customer based on an enhanced market basket analysis (eMBA). An eMBA model can receive historical data characterizing historical purchases of a plurality of products over a specified time-period. In response, the eMBA model can generate baskets, which can be associated with a causal status and a predictive nature of each product in those baskets. The generated baskets can be provided as an input to a group generator. The group generator can then generate product groups and confidence values. The product groups and confidence values can be provided to a score generator. In run-time, the score generator can receive current product data, and in return, can use the product groups and confidence values to generate a score. The score can characterize a likelihood of a purchase of the product by a corresponding customer associated with the product group. Based on the score, a merchant can determine an appropriate offer (for example, a discount offer) on the product to be provided to the customer. Related apparatus, systems, techniques and articles are also described.
In one aspect, data characterizing a product available for purchase can be received. The product can be associated with at least one subgroup that includes the product. The at least one subgroup can be at least one of a plurality of groups of historical products that have been shown to be frequently purchased together. Each subgroup can be associated with one or more confidence values. The data characterizing the groups can include causal statuses of the historical products. Using the one or more confidence values, a score can be generated. The score can characterize a likelihood of a purchase of the product by a corresponding customer associated with the at least one subgroup. Data characterizing the score can be provided. The receiving, the associating, the generating, and the providing can be implemented by at least one data processor forming part of at least one computing system.
In some variations one or more of the following can optionally be included.
The data characterizing the product can be an identifier of the product. The data characterizing the product can include at least one of: identity of the product, name of the product, manufacturer of the product, and a stock keeping unit associated with the product.
The groups can be associated with a plurality of confidence values. The one or more confidence values associated with the at least one subgroup can be selected from the plurality of confidence values associated with the groups.
Each causal status can be one of a predictor and a target. A causal status of the product available for purchase can be a target. The product can be predicted based on one or more products that have a predictor causal status.
The score can be a highest confidence value in the one or more confidence values associated with each subgroup. In another implementation, the score can be a mathematical multiplication product of a predetermined number of top confidence values of each subgroup. In a further implementation, the score can be a mathematical average of a top predetermined number of confidence values.
The one or more confidence values can be generated by performing the following. Based on historical data collected over a time-period, baskets can be generated. The time-period can be a predetermined time-period that can be specified by the merchant. Each basket can characterize corresponding historical products purchased by a customer within the time-period. The historical data can characterize historical purchases of the historical products between customers and merchants. Using the baskets, the groups of products can be formed. The groups of products can be products that are frequently purchased together by a customer. One or more ratios for the at least one subgroup can be determined. Each ratio can be obtained by dividing a numerator by a denominator. The numerator can be a simultaneous occurrence of the one or more products and other products in the groups. The denominator can be an occurrence of the other products in the groups. The one or more ratios can characterize the one or more confidence values.
The baskets can be generated by performing the following. Transaction data can be extracted from the historical data. The transaction data can include a unique identification of a customer for each purchase, a date of each purchase, and a stock keeping unit associated with each purchase. A product map mapping each stock keeping unit with a respective product can be obtained. Using the transaction data and the product map, basket identifiers can be generated. The basket identifiers can identify the baskets and one or more product identifiers associated with each basket identifier. Each basket identifier can characterize a time-period when a corresponding customer made a purchase. The product identifier can characterize a product associated with the purchase and a causal status associated with the purchase.
The causal status can identify the purchased product as one of: a product used to predict a purchase of another product and a product obtained based on a purchase of another product.
The groups of products can be formed by performing the following. The baskets can be received. Each basket can be associated with respective products. A first table including each product and corresponding occurrence of each product in the baskets can be generated. A second table can be generated by removing, from the first table, one or more products that have values of occurrence below a first threshold. A third table can be generated by pairing each product in the second table with every other product in the second table to form product-sets including pairs of products. A fourth table can be generated, wherein the fourth table can include each product-set and an occurrence of the corresponding pair of products in the baskets. A fifth table can be generated by removing one or more product-sets that have values of occurrence below a second threshold. The product-sets in the fifth table can be the formed groups of products. The first threshold can be equal to the second threshold.
The generating of the score can be further based on a trend associated with the purchase. The trend can characterize a time-interval when the product is likely to be purchased. The trend can be determined based on a buffer window value provided by a merchant.
Computer program products are also described that include non-transitory computer readable media storing instructions, which, when executed by at least one data processor of one or more computing systems, cause the at least one data processor to perform the operations described herein. Similarly, computer systems are also described that may include one or more data processors and a memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors that either are within a single computing system or are distributed among two or more computing systems.
The subject matter described herein provides many advantages. For example, scores for customers can be generated fairly accurately based on historical data collected over a short time-period, such as about 2 to 3 months, as compared to longer time-periods, such as 1 to 2 years, as in conventional systems. Thus, merchants can provide accurate offers without requiring historical data collected over a long time-period. Such a collection over a short time-period can be advantageous for merchants that are new in the market and do not have access to historical data collected over a long time-period, as the current enhanced system allows an accurate provision of offers (for example, discount offers) even with a short history. The same advantage applies to merchants that sell products that can only have a short purchase history. Further, the enhanced system described herein can be easier to develop as compared to conventional systems. Additionally, the enhanced system allows a scoring and subsequent provision of offers based on a causal status and a predictive nature of a product, both of which can be taken into account while generating product baskets from the historical data. Such an accounting of causal status and predictive nature can advantageously cause accurate scoring of customers for a product that becomes available for purchase, thereby allowing an effective provision of offers. Such effective provision of offers can result in significant cost advantages, and other business advantages.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
Like reference symbols in the various drawings indicate like elements.
The score can be provided to a merchant over a network, such as the internet, a local area network, a wide area network, a Bluetooth network, or any other network. The score can be displayed to the merchant on a graphical user interface. Based on the score, the merchant can determine and subsequently provide an offer (for example, a discount offer) on the product to the customer.
The generation of the product groups and confidence values 110 can occur in design-time, and the generation of the score 116 can occur in run-time. The run-time can be a time when a current/new product becomes available in real-time for purchase at a sales location of a merchant for a plurality of customers. Herein, a current/new product refers to a product for which at least two months of historical transaction data are available. The score can characterize a likelihood of a purchase of the current/new product by a corresponding customer.
Historical data 102 can be collected over a past time-period, such as past one month, two months, six months, one year, two years, five years, or other predetermined period. In a case where a merchant may be newly established and does not have access to historical data and/or a case where a product is newly developed and does not have a long purchase history, the time-period for collection of data can be advantageously small, such as 2 or more months. Historical data can include historical purchases between merchants and customers. This historical data 102 can be received, at 202, at an enhanced market basket analysis model 104.
The enhanced market basket analysis model 104 can generate, at 204, baskets 106 of data. The baskets 106 can include causal and predictive data associated with the products in the baskets. For example, the data in the baskets 106 can indicate whether the purchase of a particular product can be used to predict purchase of other one or more products, and whether the purchase of a particular product can be predicted based on previous purchase of other one or more products. Such a generation of baskets 106 is described in more detail below with respect to diagram 300.
The baskets 106 can be provided to the group generator 108. The group generator 108 can use the baskets to form, at 206, groups of products that may be frequently purchased together by a customer. Such a forming of product groups is described in more detail below with respect to diagram 400.
One or more confidence values associated with each group can be generated, at 208. Each confidence value can be generated by dividing a numerator by a denominator, wherein the numerator is a simultaneous occurrence of the one or more products and other products in the groups, and the denominator is an occurrence of the other products in the groups. Such a generation of one or more confidence values is described in more detail below with respect to diagram 500.
The transaction data 302 can be extracted from the historical data 102. The transaction data can include a customer identifier 304 for each purchase of a respective product, a date 306 (including month, day, year, and/or time) of each purchase, and a stock keeping unit (SKU) 308 associated with each purchase.
A product map 310 can be obtained. The product map 310 can map each stock keeping unit 308 with a product identifier 312.
Using the transaction data 302 and the product map 310, basket data 314 for the baskets 106 can be generated. The basket data 314 can include basket identifiers 316 and enhanced product identifiers 318. The generating of the basket data 314 can be based on a buffer window value, which can characterize a future time-interval (also referred to as a future trend) for which a likelihood of purchase of the target product needs to be computed. A buffer window value of zero, as shown in diagram 300, can characterize that a prediction for the purchase of the target product is made for the time-interval immediately subsequent to the time-interval of purchase of the predictor product. For example, if the predictor time-interval for the purchase of the predictor product is a particular time-interval, the target time-interval for purchase of the target product is the immediately subsequent time-interval.
Although a buffer window value of zero has been described above, in some other implementations, other buffer window values can also be used, such as one, two, three, four, five, and so on. A buffer window value of “n” characterizes that a prediction for the purchase of the target product is made for the (n+1)th time-interval subsequent to the time-interval of purchase of the predictor product. For example, when n=1 and the predictor time-interval for the purchase of the predictor product is a particular time-interval, the target time-interval for purchase of a target product is the second subsequent time-interval after the predictor time-interval.
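The relationship between the buffer window value and the target time-interval can be sketched as follows. This is a minimal illustration only, assuming that time-intervals are numbered as consecutive integers (for example, months); the function name target_interval is hypothetical and is not part of the described system.

```python
def target_interval(predictor_interval: int, buffer_window: int = 0) -> int:
    """Return the index of the time-interval for which a target purchase is predicted.

    Assumption (for illustration): time-intervals are consecutive integers.
    A buffer window value of n selects the (n+1)th interval after the
    predictor interval, so n = 0 selects the immediately subsequent one.
    """
    if buffer_window < 0:
        raise ValueError("buffer_window must be zero or a positive integer")
    return predictor_interval + buffer_window + 1


# Predictor purchase in month 1:
print(target_interval(1, buffer_window=0))  # 2 (immediately subsequent month)
print(target_interval(1, buffer_window=1))  # 3 (second subsequent month)
```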
Each basket can be identified by a basket identifier 316. Each basket identifier 316 can characterize a time-period when a corresponding customer made a purchase. The basket identifier 316 can have the form CustomerID_MonthOfPurchaseOfPredictorProduct_MonthOfPurchaseOfTargetProduct. For example, the basket identifier A_1_2 can indicate that customer A purchased a predictor product in month 1, and purchased a target product in month 2. Further, the basket identifier A_2_ can indicate that customer A purchased a predictor product in month 2, and then did not purchase a target product. The basket identifier B__2 can indicate that customer B did not purchase a predictor product, and purchased a target product in month 2. Similarly, the basket identifier B_2_3 can indicate that customer B purchased a predictor product in month 2, and then purchased a target product in month 3. Further, the basket identifier B_3_ can indicate that customer B purchased a predictor product in month 3, and then did not purchase a target product. Furthermore, the basket identifier B__6 can indicate that customer B did not purchase a predictor product, and then purchased a target product in month 6.
A predictor product can be used to predict other target products. A target product can be predicted based on one or more predictor products. For example, an automobile can be a predictor product, and gasoline can be a target product.
Based on the basket identifier 316 and the data obtained from the transaction data 302 and the product map 310, the enhanced product identifiers 318 can be generated. The enhanced product identifier can indicate a causal status associated with the purchase and a product associated with the purchase. For example, the enhanced product identifier x_P1 can indicate that P1 is a predictor product for this basket. Further, the enhanced product identifier y_P2 can indicate that P2 is a target product for this basket. Similarly, for other enhanced product identifiers, “x” can indicate that the product is a predictor product, and “y” can indicate that the product is a target product.
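One possible way to derive basket identifiers and enhanced product identifiers from transaction data and a product map is sketched below. This is a hedged illustration rather than the described implementation: the function name build_baskets, the tuple layout of the transactions, the integer month numbering, and the first_month and last_month bounds of the collection period are all assumptions introduced for the example.

```python
def build_baskets(transactions, product_map, first_month, last_month, buffer_window=0):
    """Return {basket_id: sorted list of enhanced product identifiers}.

    transactions: iterable of (customer_id, month, sku) tuples (assumed layout).
    product_map:  dict mapping each SKU to a product identifier.
    """
    gap = buffer_window + 1  # target interval is the (n+1)th after the predictor interval

    # Products purchased by each customer in each month.
    purchases = {}
    for customer, month, sku in transactions:
        purchases.setdefault((customer, month), set()).add(product_map[sku])

    baskets = {}
    for (customer, month), products in purchases.items():
        # Case 1: this month is a predictor interval whose target interval
        # still lies inside the collection period.
        target_month = month + gap
        if month >= first_month and target_month <= last_month:
            targets = purchases.get((customer, target_month), set())
            suffix = str(target_month) if targets else ""
            baskets[f"{customer}_{month}_{suffix}"] = sorted(
                [f"x_{p}" for p in products] + [f"y_{p}" for p in targets]
            )
        # Case 2: this month is a target interval with no predictor purchase.
        predictor_month = month - gap
        if (first_month <= predictor_month and month <= last_month
                and (customer, predictor_month) not in purchases):
            baskets[f"{customer}__{month}"] = sorted(f"y_{p}" for p in products)
    return baskets


# Illustrative data (hypothetical SKUs and purchases).
product_map = {"sku1": "P1", "sku2": "P2"}
transactions = [("A", 1, "sku1"), ("A", 2, "sku2"),
                ("B", 2, "sku1"), ("B", 3, "sku2"), ("B", 6, "sku2")]
for basket_id, items in sorted(build_baskets(transactions, product_map, 1, 6).items()):
    print(basket_id, items)
```

Under these assumptions, the sample data yields the basket identifiers A_1_2, A_2_, B_2_3, B_3_, B__2, and B__6, with "x_" marking predictor products and "y_" marking target products in each basket.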
A database 402 including each basket and associated products can be obtained from the historical data 102. For example, products P1, P3, and P4 can exist in basket I; products P2, P3, and P5 can exist in basket II; and so on, as shown.
An occurrence of each product in the baskets can be determined to generate a first table 404. The occurrence of a product can be the number of baskets in which the product occurs. For example, if a customer purchases a shampoo in two baskets, the occurrence for the product shampoo is two.
From the first table 404, one or more products that have values of occurrence below a first threshold can be removed to generate a second table 406. In one implementation, the first threshold can be characterized by a minimum support value of 50%. In this implementation, the row with product P4 having an occurrence of 1 (that is, the row with product P4 occurring a single time) can be removed from the first table 404, as an occurrence of 1 is below the first threshold. Thus, the second table 406 can include the products that have an occurrence of 2 or more.
By pairing each product in the second table 406 with every other product in the second table 406, a third table 408 can be generated to form groups (for example, product-sets) including pairs of products. For example, product P1 is combined with each of P2, P3, and P5; P2 is combined with each of P1, P3, and P5; P3 is combined with each of P1, P2, and P5; and P5 is combined with each of P1, P2, and P3, as shown in the third table 408.
A fourth table 410 can be generated. The fourth table 410 can include the groups of the third table 408, and an occurrence of each group in the baskets of database 402.
The rows of one or more groups that have an occurrence below a second threshold in the fourth table 410 can be removed to generate a fifth table 412. The second threshold can be characterized by a minimum support value of 50%. In every iteration, a same threshold can be used. For example, the first threshold can be equal to the second threshold. In this implementation, the row with group {P1 P2} and the row with group {P1 P5} have an occurrence of 1, and can be removed from the fourth table 410 to generate the fifth table 412. Thus, the fifth table 412 can include the groups that have an occurrence of 2 or more in the baskets of database 402. The groups/product-sets in the fifth table 412 can be the product groups that are a part of 110.
It may be noted that while two iterations have been described to form the product groups, more iterations can be performed based on the obtained historical data. Further, while each illustrated product group in the fifth table 412 includes the same number of products, in some other implementations, the final product groups can have different numbers of products by changing the requirement regarding pairing of the products to form product groups. For example, in some implementations, four products may be selected for a first set of groups, three products may be selected for a second set of groups, and two products (that is, pairs) may be selected for a third set of groups, as noted below in table 502.
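The two iterations that produce the first through fifth tables can be sketched as follows. This is a minimal illustration, assuming four baskets: baskets I and II follow the description of database 402, while baskets III and IV are hypothetical and were chosen only so that the resulting occurrences are consistent with the description. The function name form_product_pairs is likewise an assumption.

```python
import math
from collections import Counter
from itertools import combinations


def form_product_pairs(baskets, min_support=0.5):
    """Return {pair: occurrence} for pairs meeting the minimum support."""
    # Minimum number of baskets a product or pair must occur in.
    threshold = math.ceil(min_support * len(baskets))

    # First table: number of baskets in which each product occurs.
    first = Counter(p for basket in baskets for p in set(basket))
    # Second table: products at or above the first threshold.
    second = sorted(p for p, n in first.items() if n >= threshold)
    # Third table: each surviving product paired with every other one.
    third = list(combinations(second, 2))
    # Fourth table: number of baskets containing both products of each pair.
    fourth = {pair: sum(1 for b in baskets if set(pair) <= set(b)) for pair in third}
    # Fifth table: pairs at or above the second threshold (same value here).
    return {pair: n for pair, n in fourth.items() if n >= threshold}


baskets = [
    {"P1", "P3", "P4"},        # basket I
    {"P2", "P3", "P5"},        # basket II
    {"P1", "P2", "P3", "P5"},  # basket III (assumed for illustration)
    {"P2", "P5"},              # basket IV (assumed for illustration)
]
print(form_product_pairs(baskets))
# {('P1', 'P3'): 2, ('P2', 'P3'): 2, ('P2', 'P5'): 3, ('P3', 'P5'): 2}
```

With a minimum support of 50% over four baskets, the threshold is two baskets, so P4 is dropped in the first iteration and the pairs {P1 P2} and {P1 P5} are dropped in the second.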
For example, consider the group i, which has products P1, P2, and P5, with a support value of 22%. The confidence values 504 of each possible association between these products can be determined as shown. The symbol “^” can characterize co-occurrence of the products on the left and right of it. The symbol “→” can characterize that the one or more products on the left of it are predictor products, and the one or more products on the right of it are target products. The confidence value for P5 in the association “P1 ^ P2 → P5” can be determined by dividing 2 (which is an occurrence of P5 with P1 and P2 in the table 502) by 4 (which is an occurrence of P1 and P2 in the table 502). Similarly, other confidence values can be generated for each association in each group.
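The ratio described above can be sketched in code as follows. The nine baskets below are hypothetical and are not the contents of table 502; they were chosen only so that P1 and P2 occur together in four baskets and P1, P2, and P5 occur together in two baskets (a support of 2/9, or about 22%). The function name confidence is an assumption.

```python
def confidence(baskets, predictors, targets):
    """Confidence of an association: predictor products -> target products.

    Numerator:   baskets containing the predictors and the targets together.
    Denominator: baskets containing the predictors.
    """
    predictors, targets = set(predictors), set(targets)
    denominator = sum(1 for b in baskets if predictors <= b)
    numerator = sum(1 for b in baskets if (predictors | targets) <= b)
    return numerator / denominator if denominator else 0.0


# Hypothetical baskets (illustrative only).
baskets = [
    {"P1", "P2", "P5"}, {"P2", "P4"}, {"P2", "P3"},
    {"P1", "P2", "P4"}, {"P1", "P3"}, {"P2", "P3"},
    {"P1", "P3"}, {"P1", "P2", "P3", "P5"}, {"P1", "P2", "P3"},
]
print(confidence(baskets, {"P1", "P2"}, {"P5"}))  # 0.5, that is, 2 divided by 4
```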
Although the score has been described as the highest of the confidence values, the score can be computed differently in other implementations. For example, in one implementation, the score can be an average of at least some (for example, top four, top five, top six, or the like) confidence values. In another implementation, the score can be a mathematical product obtained by a multiplication of at least some (for example, top four, top five, top six, or the like) confidence values.
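The three scoring variations (highest value, average of top values, and product of top values) can be compared with a short sketch. The function name score_from_confidences, the method labels, and the default of four top values are assumptions made for illustration.

```python
import math


def score_from_confidences(confidences, method="max", top_n=4):
    """Combine confidence values into a single score.

    method: "max" (highest value), "average" (mean of the top_n values),
            or "product" (multiplication of the top_n values).
    """
    if not confidences:
        return 0.0  # no applicable confidence value
    top = sorted(confidences, reverse=True)[:top_n]
    if method == "max":
        return top[0]
    if method == "average":
        return sum(top) / len(top)
    if method == "product":
        return math.prod(top)
    raise ValueError(f"unknown method: {method}")


values = [0.5, 0.4, 0.1]
print(score_from_confidences(values))                      # 0.5
print(score_from_confidences(values, "average", top_n=2))  # approximately 0.45
print(score_from_confidences(values, "product", top_n=2))  # approximately 0.2
```

Because confidence values do not exceed one, the product variation shrinks as more values are multiplied, so scores from different variations are not directly comparable with each other.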
For example, the customer A 710 is associated with products P1, P2, P3, and T. Out of these products, P1 is associated with a confidence value of 0.05, P2 is associated with a confidence value of 0.01, and P3 is not associated with any confidence value. Out of these confidence values, 0.05 is the highest confidence value. Accordingly, customer A is allocated a score of 0.05. The score of 0.05 can characterize a likelihood of purchase of the product T by the customer A. Further, if a basket 706 contains one or more products that are not in any of the rules 702, then the score can be zero, as noted for customer C. That is, customer C is not likely to purchase the product T.
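The scoring of customers A and C can be sketched as a lookup of each basket product in the rules, followed by selection of the highest matching confidence value. This is a simplified illustration: the dictionary representation of the rules 702, the function name score_customer, and the products listed for customer C (P7 and P8) are assumptions.

```python
def score_customer(basket_products, rule_confidences):
    """Return the highest confidence among rules matching the basket, or 0.0."""
    matched = [rule_confidences[p] for p in basket_products if p in rule_confidences]
    return max(matched, default=0.0)


# Confidence that a purchase of each predictor product leads to a purchase of T.
rule_confidences = {"P1": 0.05, "P2": 0.01}

customer_baskets = {
    "A": ["P1", "P2", "P3"],  # P3 has no associated confidence value
    "C": ["P7", "P8"],        # hypothetical products not present in any rule
}
for customer, products in customer_baskets.items():
    print(customer, score_customer(products, rule_confidences))
# A 0.05
# C 0.0
```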
Merchants can determine appropriate offers (for example, coupons for one or more products) for each customer based on a score of the customer. For example, customer A can be provided one or more offers based on the detected scores. As noted below, the offers provided based on such scores can be effective. Further, such a strategic score-based provision of offers can be advantageous, as the number of redeemed offers is significantly higher than the number of redeemed offers when the provision of offers is based on conventional marketing techniques. Such an increase in redemption of offers can advantageously increase revenue and profits of a merchant that provides the offers.
Various implementations of the subject matter described herein can be realized/implemented in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations can be implemented in one or more computer programs. These computer programs can be executable and/or interpreted on a programmable system. The programmable system can include at least one programmable processor, which can have a special purpose or a general purpose. The at least one programmable processor can be coupled to a storage system, at least one input device, and at least one output device. The at least one programmable processor can receive data and instructions from, and can transmit data and instructions to, the storage system, the at least one input device, and the at least one output device.
These computer programs (also known as programs, software, software applications or code) can include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As can be used herein, the term “machine-readable medium” can refer to any computer program product, apparatus and/or device (for example, magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that can receive machine instructions as a machine-readable signal. The term “machine-readable signal” can refer to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the subject matter described herein can be implemented on a computer that can display data to one or more users on a display device, such as a cathode ray tube (CRT) device, a liquid crystal display (LCD) monitor, a light emitting diode (LED) monitor, or any other display device. The computer can receive data from the one or more users via a keyboard, a mouse, a trackball, a joystick, or any other input device. To provide for interaction with the user, other devices can also be provided, such as devices operating based on user feedback, which can include sensory feedback, such as visual feedback, auditory feedback, tactile feedback, and any other feedback. The input from the user can be received in any form, such as acoustic input, speech input, tactile input, or any other input.
The subject matter described herein can be implemented in a computing system that can include at least one of a back-end component, a middleware component, a front-end component, and one or more combinations thereof. The back-end component can be a data server. The middleware component can be an application server. The front-end component can be a client computer having a graphical user interface or a web browser, through which a user can interact with an implementation of the subject matter described herein. The components of the system can be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks can include a local area network, a wide area network, internet, intranet, Bluetooth network, infrared network, or other networks.
The computing system can include clients and servers. A client and server can be generally remote from each other and can interact through a communication network. The relationship of client and server can arise by virtue of computer programs running on the respective computers and having a client-server relationship with each other.
Although a few variations have been described in detail above, other modifications can be possible. For example, the logic flows depicted in the accompanying figures and described herein do not require the particular order shown, or sequential order, to achieve desirable results. Other embodiments may be within the scope of the following claims.