One embodiment is directed generally to a computer system, and in particular to a computer system that generates a consumer decision tree.
Buyer decision processes are the decision making processes undertaken by consumers in regard to a potential market transaction before, during, and after the purchase of a product or service. More generally, decision making is the cognitive process of selecting a course of action from among multiple alternatives. Common examples include shopping and deciding what to eat.
In general there are three ways of analyzing consumer purchasing decisions: (1) Economic models—These models are largely quantitative and are based on the assumptions of rationality and near perfect knowledge. The consumer is seen to maximize their utility; (2) Psychological models—These models concentrate on psychological and cognitive processes such as motivation and need recognition. They are qualitative rather than quantitative and build on sociological factors like cultural influences and family influences; and (3) Consumer behavior models—These are practical models used by marketers. They typically blend both economic and psychological models.
One type of consumer behavior model is known as a “consumer decision tree” (“CDT”). A CDT is a graphical representation of a decision hierarchy of customers in a product attribute space for the purchase of an item in a given category. It models how customers consider different alternatives (based on attributes) within a category before narrowing down to the item of their choice, and helps to understand the purchasing decision of the customer. It is also commonly known as a “product segmentation and category structure”. CDTs are conventionally generated by brand manufacturers or third party market research firms based on surveys and other tools of market research. However, these methods lack accuracy and can lack authenticity since they may be based on biased data supplied by brand manufacturers.
One embodiment is a system that generates a consumer decision tree (“CDT”). The system receives customer purchasing data that includes transactions of a plurality of products each having at least one product attribute. For a product category, the system identifies a plurality of similar products from the purchasing data and one or more attributes corresponding to each similar product. The system assigns the product category as a current level of the CDT, and determines a most significant attribute of the plurality of attributes for the current level. The system forms a next level of the CDT by dividing the most significant attribute into a plurality of sub-sections, where each sub-section corresponds to an attribute value of the most significant attribute. The system then assigns each of the sub-sections as the current level, and repeats the determining of the most significant attribute and the forming of the next level of the CDT for each sub-section until a terminal node is identified.
One embodiment is a system that automatically generates a consumer decision tree (“CDT”) using a retailer's transaction data. The system first calculates similarities for each product and attribute value pair. The system then generates the CDT using the similarity values. Because the retailer can generate the CDT itself from its own data, the CDT is unbiased and objectively determined.
One embodiment generates a CDT to give retailers a “Customer-Centric” approach to managing categories of products. In the past, retailers planned their assortments and promotions based on a “Product-Centric” approach, which rewards best-selling products and punishes those with minimal sales. Inferences were typically drawn from historical sales data, which was heavily influenced by past assortments and promotions. Retailers gradually recognized the shortcomings of the Product-Centric approach and the need to include customers in the planning process. As a result, many retailers have shifted their strategy from the Product-Centric approach to the Customer-Centric approach, in which retailers identify target customer segments and tailor their assortments and promotions to them.
Customer segments are groups of customers with similar demographics or purchasing behavior. Product attribute types are the types of features of a given product, such as the product's brand, size, flavor, etc. Product attribute values are the actual features corresponding to a product attribute type, such as a brand of “premium”, a size of “large”, a flavor of “strawberry”, etc.
Known solutions for generating a CDT use an agglomerative (i.e., bottom-up) clustering of products, where the distance between any two products equals their mutual dissimilarity value. Each node and split in the resulting hierarchical structure then needs to be explained in terms of attributes. However, translating the group of products at every node into attributes can be a very tedious process that is hard to automate, and frequently a split in the tree cannot be explained by a single attribute. Therefore, these known solutions typically rely on human intervention. Further, no scoring approach is available to quantify the accuracy of a CDT generated using known solutions.
Computer readable media may be any available media that can be accessed by processor 120 and includes both volatile and nonvolatile media, removable and non-removable media, and communication media. The communication media may include computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Processor 120 may be further coupled via bus 110 to a display 150, such as a Liquid Crystal Display (“LCD”), for displaying information to the user. A keyboard 160 and a cursor control device 170, such as a computer mouse, may be further coupled to bus 110 to enable the user to interface with system 100.
In one embodiment, memory 130 may store software modules that can provide functionality when executed by processor 120. Modules may include an operating system (“OS”) 132 that provides OS functionality for system 100. Modules may further include a consumer decision tree generation module 134 that automatically generates a CDT from retailer consumer data, as disclosed in more detail below. System 100 can be part of a larger system, such as a retail management system or an enterprise resource planning (“ERP”) system. Therefore, system 100 may include one or more additional functional modules 136 to provide the additional functionality. In one embodiment, module 136 is the “E-Business Suite” ERP system from Oracle Corp. A database 180 is coupled to bus 110 to provide centralized storage for modules 134 and 136 and to store data used by modules 134 and 136. In one embodiment, the data includes customer purchasing data generated by a retailer, including, for each customer, the product attributes of the purchased products. This data can be generated by an ERP system and may be derived from a retailer's loyalty program, credit card information, etc. For example, a grocery store generates data through its loyalty card program identifying the specific products purchased by each customer. The data stored in database 180 includes product attribute types and their corresponding values.
CDT 200 provides a retailer with insight into the decision process of customers when purchasing yogurt. For example, CDT 200 indicates that the size 204-206 of the yogurt product 202 is generally the most important factor in the customers' decision-making process, since size is the first-level attribute beneath the yogurt category. Then, depending on the preferred size, either the brand or the production method is the second most important factor. For those who prefer a small size, the production method (e.g., organic 210 or non-organic 211) is the second most important factor. However, for those who prefer a medium or large item, the brand is the second most important factor, and the production method has no impact on the decision-making process. Similarly, flavor has no impact on the decision of those who prefer a small yogurt product, whereas flavor is considered by those who prefer a medium or large yogurt product from a mainstream brand.
In general, a CDT for an individual customer cannot be leveraged for any practical purpose due to scalability issues. Therefore, to provide meaningful information to a retailer, the CDT in accordance with one embodiment is extended to cover a group of customers with similar demographics or purchasing behaviors, referred to as a “customer segment.” Accordingly, a CDT is defined at the customer segment level.
In calculating the similarities at 310, the similarities between each product pair and each attribute value pair for a given category are determined. The similarity between a pair of products, or between a pair of attribute values for a given attribute, can be quantified based on the following principles:
1. If a pair of products was purchased by the same customer at any time during the customer's transaction history, those products are considered similar. The number of such customers determines the extent of the similarity.
2. If a pair of products was purchased in the same transaction, the products are considered dissimilar, since the customer is probably buying them with a variety-seeking objective.
3. The same logic from 1 and 2 above can be used to identify similarity between a pair of attribute values for any given attribute (e.g., the similarity between chocolate and vanilla).
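Principle 3 can be illustrated by rewriting each transaction in terms of attribute values, so that the product-pair logic applies unchanged. The following is a minimal Python sketch, assuming each customer's history is a list of transactions stored as sets of product IDs and that product attributes are held in a nested dictionary; `project_to_attribute` is a hypothetical helper name.

```python
from typing import Dict, List, Set

# One customer's history: a list of transactions, each a set of product IDs.
History = List[Set[str]]


def project_to_attribute(
    history: History,
    product_attrs: Dict[str, Dict[str, str]],
    attribute: str,
) -> History:
    """Replace every product in every transaction with its value for `attribute`.

    Products with no value for the attribute are dropped. The projected
    history can be fed to the same pairwise-similarity routine used for
    products (e.g. to compare "chocolate" with "vanilla").
    """
    projected: History = []
    for basket in history:
        values = {
            product_attrs[p][attribute]
            for p in basket
            if attribute in product_attrs.get(p, {})
        }
        projected.append(values)
    return projected
```

Keeping transaction boundaries intact matters here: principle 2 depends on whether two attribute values appeared in the same basket, which a flat list of purchases would lose.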
One known approach for determining the similarity between products A and B is as follows:
However, assume a customer has always purchased products A and B in separate transactions except for one time, when the customer bought them together. The above known approach will not count that customer towards product similarity, which can lead to inaccurate results. Therefore, one embodiment includes a factor for each customer that is equal to the proportion of transactions in which products A and B were bought separately, out of all transactions in which either A or B was purchased. The count of customers is then replaced by the sum of this factor over all customers. Thus, in one embodiment, the similarity between a pair of products or attribute values (determined at 310) is as follows:
Where A and B can be products or attribute values corresponding to any given attribute, and “F”=1 if the customer has bought both A and B at least once in the customer's transaction history, otherwise F=0:
F=δ(customer bought product A&B in his transaction history)
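Further, “f” is the per-customer factor described above, defined as follows:
f=(number of the customer's transactions in which A and B were bought separately)/(number of the customer's transactions in which A or B was purchased)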
The dissimilarity between product pairs can be determined as follows:
Dissimilarity=1−Similarity, where Similarity is always between 0 and 1.
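The similarity formula itself is not reproduced in this section, so the following Python sketch is built only from the prose definitions of F and f above. It assumes that the summed factor F·f is normalized by the number of customers who bought A or B at least once (an assumption chosen so that Similarity stays between 0 and 1), and that each customer's history is a list of transactions stored as sets of item IDs. All function names are hypothetical.

```python
from typing import List, Set

History = List[Set[str]]  # one customer's transactions, each a set of item IDs


def customer_factor(history: History, a: str, b: str) -> float:
    """Return F * f for one customer.

    F = 1 if the customer bought both a and b at least once, else 0.
    f = share of the customer's transactions containing a or b in which
        a and b were bought separately (not in the same basket).
    """
    with_either = [t for t in history if a in t or b in t]
    bought_a = any(a in t for t in with_either)
    bought_b = any(b in t for t in with_either)
    if not (bought_a and bought_b):
        return 0.0  # F = 0
    separate = sum(1 for t in with_either if not (a in t and b in t))
    return separate / len(with_either)


def similarity(histories: List[History], a: str, b: str) -> float:
    """Sum of F * f over customers divided by the number of customers who
    bought a or b (ASSUMED normalization, keeps the result in [0, 1])."""
    total, buyers = 0.0, 0
    for h in histories:
        if any(a in t or b in t for t in h):
            buyers += 1
            total += customer_factor(h, a, b)
    return total / buyers if buyers else 0.0


def dissimilarity(histories: List[History], a: str, b: str) -> float:
    return 1.0 - similarity(histories, a, b)
```

For a customer with transactions {A}, {B}, {A, B}, {A}, four transactions contain A or B and three of them contain only one of the two, so F·f = 0.75; under the known approach this customer would contribute nothing because A and B were once bought together.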
After all functional-fit attributes are identified, the functional-fit attributes are automatically placed at the top level of the CDT directly under the product category.
At 420 of
At 430, the items are divided into sub-sections, where each sub-section corresponds to a particular attribute value of the attribute identified at 420. For example, when the “form” product attribute is determined at 420 to be the most significant attribute for coffee, the coffee items are divided into three sub-sections, each corresponding to a particular value of form: “Bean,” “Ground,” and “Instant.” The sub-sections form a next level 530 of the CDT.
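Step 430 amounts to grouping the items at the current node by their value of the selected attribute. A minimal Python sketch, assuming product attributes are stored as a nested dictionary; `split_by_attribute` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_attribute(
    items: List[str],
    product_attrs: Dict[str, Dict[str, str]],
    attribute: str,
) -> Dict[str, List[str]]:
    """Group items into sub-sections keyed by their value of `attribute`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[product_attrs[item][attribute]].append(item)
    return dict(groups)
```

With coffee items whose “form” values are Bean, Ground, Instant, and Ground, this yields three sub-sections, the “Ground” one holding two items.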
As disclosed, the tree is expanded until a terminal node is identified. In one embodiment, the criteria for declaring a node terminal are as follows:
After the CDT has been generated, as shown in
The CDT generation process can be characterized as a constrained divisive clustering of products. Therefore, in one embodiment, the resulting CDT is scored by comparing it with the best and worst possible clustering solutions. In one embodiment, a weighted average dissimilarity (“WAD”) can be used to evaluate the quality of a clustering solution. WAD is the average of the per-cluster average dissimilarity over all clusters, weighted by the number of items in each cluster. For the CDT, the terminal nodes constitute the set of clusters. The WAD can be determined as follows:
WAD=(Σi Ni·ADi)/(Σi Ni)
where Ni and ADi are the number of items and the average dissimilarity in cluster “i”, respectively. The upper bound, or “worst” solution, for WAD is placing all items in one bucket (no clustering), so that the whole product group is presented as a single cluster. The lower bound, or “best” (optimum) solution, for WAD is an unconstrained clustering with the same number of clusters as the number of terminal nodes in the tree.
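The WAD computation can be sketched in Python as follows. It assumes that ADi is the mean dissimilarity over all distinct item pairs in cluster i, that a single-item cluster has ADi = 0, and that `diss` returns the pairwise dissimilarity (1 − Similarity); these conventions and the function names are assumptions rather than details stated above.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Diss = Callable[[str, str], float]


def average_dissimilarity(cluster: Sequence[str], diss: Diss) -> float:
    """Mean dissimilarity over all item pairs in one cluster (0.0 for a singleton, assumed)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(diss(a, b) for a, b in pairs) / len(pairs)


def wad(clusters: List[Sequence[str]], diss: Diss) -> float:
    """Per-cluster average dissimilarity, averaged with weights N_i = cluster size."""
    total_items = sum(len(c) for c in clusters)
    weighted = sum(len(c) * average_dissimilarity(c, diss) for c in clusters)
    return weighted / total_items
```

Passing the CDT's terminal nodes as `clusters` gives the actual WAD; passing a single cluster containing every item gives the “worst” bound.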
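In one embodiment, the CDT score is defined as the percentage of the gap between the worst solution and the best solution that is filled by the actual solution. This approach provides a more complete picture than merely reporting the absolute difference between the actual solution and the best solution. Therefore, the CDT score can be formulated as follows:
CDT Score=100%×(WADworst−WADactual)/(WADworst−WADbest)
where WADworst, WADbest, and WADactual are the WAD of the worst solution, the best solution, and the CDT's terminal nodes, respectively.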
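The score is a simple gap-closure ratio; a minimal Python sketch, with `cdt_score` as a hypothetical name and the three WAD values assumed to be computed beforehand:

```python
def cdt_score(wad_actual: float, wad_worst: float, wad_best: float) -> float:
    """Percentage of the gap between the worst (single-cluster) WAD and the
    best (unconstrained clustering) WAD that the CDT's terminal nodes close."""
    gap = wad_worst - wad_best
    if gap <= 0:
        raise ValueError("the worst WAD must exceed the best WAD")
    return 100.0 * (wad_worst - wad_actual) / gap
```

With hypothetical values WADworst = 3.5, WADbest = 1.5, and WADactual = 2.0, the CDT closes 1.5 of a 2.0 gap, giving a score of 75%. A score of 0% means the tree is no better than no clustering, and 100% means it matches the unconstrained optimum.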
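In one embodiment, in conjunction with 420, each product can be expressed as a function of its attributes. For example, for two products with three attributes each: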
P1→f(A11,A12,A13)
P2→f(A21,A22,A23)
Where Pi is the ith product and Aij is the jth attribute of the ith product. The dissimilarity between a pair of products, and between the products' values for each attribute, can be denoted as follows:
ΔP→diss(P1,P2),ΔAi=diss(A1i,A2i),ΔP→f(ΔA1,ΔA2,ΔA3), where
ΔAi=Dissimilarity between product pair with respect to attribute ‘i’. This can also be referred to as the pairwise dissimilarity of attribute ‘i’.
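The ΔP and ΔAi series that the correlation metrics compare can be assembled by walking over every product pair. A minimal Python sketch, assuming `product_diss` returns the product dissimilarity (1 − Similarity) and `value_diss` returns the dissimilarity between two values of one attribute, with identical values treated as dissimilarity 0; all names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def pairwise_vectors(
    products: List[str],
    product_attrs: Dict[str, Dict[str, str]],
    attributes: List[str],
    product_diss: Callable[[str, str], float],
    value_diss: Callable[[str, str, str], float],
) -> Tuple[List[float], Dict[str, List[float]]]:
    """Return ΔP over all product pairs and, for each attribute i, the aligned ΔA_i.

    value_diss(attribute, v1, v2) is an assumed callback giving the
    dissimilarity between two values of the same attribute.
    """
    delta_p: List[float] = []
    delta_a: Dict[str, List[float]] = {attr: [] for attr in attributes}
    for p1, p2 in combinations(products, 2):
        delta_p.append(product_diss(p1, p2))
        for attr in attributes:
            v1, v2 = product_attrs[p1][attr], product_attrs[p2][attr]
            delta_a[attr].append(0.0 if v1 == v2 else value_diss(attr, v1, v2))
    return delta_p, delta_a
```

Each position in `delta_p` lines up with the same position in every `delta_a[attr]` list, so the pairs form the data points (x, y) for the correlation metrics.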
The most significant attribute in one embodiment is the one whose dissimilarity has the maximum impact on overall product dissimilarity. To find it, the correlation or association metrics between product dissimilarity and the dissimilarity with respect to each attribute are calculated. The attribute whose dissimilarity has the highest correlation/association with product dissimilarity is then the most significant attribute.
Correlation is a linear dependence between two variables. “Pearson's Correlation Coefficient” is a well-known correlation metric. “Association” is a much broader term which encompasses any form of linkage between two variables. “Spearman Rank Correlation” is a well-known association metric. Other common metrics to find dependence between variables include “Entropy” and “Cosine” similarity. Embodiments of the present invention can use any known or other metrics for correlation and association.
Before these metrics can be evaluated, it is necessary to verify whether correlation or association actually exists. In one embodiment, the verification comprises performing hypothesis testing with the null hypothesis of “No Correlation” or “No Association”. Details of the hypothesis testing are as follows:
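The specific test statistics are not reproduced in this section. As one standard way to test a null hypothesis of no correlation, and purely as an illustrative substitute rather than the embodiment's own test, a permutation test can be sketched in Python: shuffling ΔAi breaks any link with ΔP, and the p-value is the fraction of shuffles scoring at least as high as the observed data. The one-sided direction (positive dependence), the default of 2000 shuffles, and all names are assumptions.

```python
import random
from typing import Callable, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def covariance_sum(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of co-deviations. Shuffling y leaves both variances unchanged, so
    ranking shuffles by this value is equivalent to ranking them by Pearson's r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def permutation_p_value(
    x: Sequence[float],
    y: Sequence[float],
    metric: Metric = covariance_sum,
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided permutation test of H0: no positive correlation/association."""
    rng = random.Random(seed)
    observed = metric(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if metric(x, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return (hits + 1) / (n_permutations + 1)
```

Under an assumed significance level such as 0.05, an attribute would be kept for ranking only when the returned p-value falls below that level. To test for association instead, a rank-based metric such as Spearman's ρ can be passed as `metric`.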
For the attributes for which correlation or association is verified by hypothesis testing, the metrics are obtained as follows:
Pearson's Correlation Coefficient:
r=Σi(xi−x̄)(yi−ȳ)/√(Σi(xi−x̄)²·Σi(yi−ȳ)²)
Where x and y are ΔP and ΔA, respectively, and x̄ and ȳ are their means.
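Pearson's coefficient follows directly from the formula above. A minimal Python sketch; returning 0.0 when either series is constant (where r is undefined) is an assumed convention, and `pearson` is a hypothetical name.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between ΔP (x) and ΔA_i (y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # r is undefined for a constant series; treated as no correlation
    return sxy / math.sqrt(sxx * syy)
```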
Spearman Rank Correlation:
ρ=1−(6·Σi di²)/(n(n²−1))
Where di is the difference between the ranks of data point i in fields ΔP and ΔA, and n is the total number of data points.
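The closed-form Spearman formula is exact only when there are no tied values, yet ΔAi often contains ties (for example, many product pairs share the same attribute value and get ΔAi = 0). The Python sketch below therefore assigns tied values their average rank and uses the closed form when no ties occur, falling back to Pearson's r on the ranks otherwise, which is the usual tie-aware definition. The function names and the 0.0 convention for constant input are assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between ΔP (x) and ΔA_i (y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    if len(set(rx)) == n and len(set(ry)) == n:
        # No ties: closed form 1 - 6 * sum(d_i^2) / (n (n^2 - 1)).
        d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
        return 1 - 6 * d2 / (n * (n * n - 1))
    # Ties present: Pearson's r computed on the average ranks.
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no rank information (assumed convention)
    return sxy / math.sqrt(sxx * syy)
```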
Embodiments rank the attributes based on these metrics, and the attribute with the largest value is the most significant attribute. If the attributes were independent of each other, the CDT would be symmetric, and the ranking of attributes by significance, and hence the tree, could be obtained relatively easily in one step. In reality, however, attributes are typically dependent on each other: the combinations of attribute values are constrained by the existing products, and not all combinations exist. Therefore, assuming that attributes are independent can lead to inaccurate results. For example, some brands of yogurt come only unflavored, which indicates a strong dependence of flavor on brand.
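The ranking step can be sketched in Python as follows, assuming the per-attribute scores (Pearson or Spearman values of ΔAi against ΔP) have already been computed and that `verified` holds the attributes whose null hypothesis was rejected; these inputs and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def rank_attributes(scores: Dict[str, float], verified: Set[str]) -> List[str]:
    """Order the attributes that passed hypothesis testing by score, largest first."""
    eligible = [attr for attr in scores if attr in verified]
    return sorted(eligible, key=lambda attr: scores[attr], reverse=True)


def most_significant(scores: Dict[str, float], verified: Set[str]) -> Optional[str]:
    """Return the top-ranked attribute, or None if no attribute was verified."""
    ranked = rank_attributes(scores, verified)
    return ranked[0] if ranked else None
```

Filtering by `verified` before ranking means an attribute with a high but statistically unsupported score is never chosen as the split.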
To deal with the interdependence of attributes, one embodiment follows a “greedy” approach. At each node, the most significant attribute is identified, and nodes at the next level are then created corresponding to the attribute values of the identified attribute. The same analysis is continued on subsequent nodes to grow the tree. The resulting tree may therefore be unbalanced or asymmetric.
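The greedy, top-down growth described above can be sketched as a short recursive Python function. Because the terminal-node criteria are not reproduced in this section, both the attribute selection (`pick_attribute`) and the stopping rule (`is_terminal`) are passed in as hypothetical callbacks; in practice `pick_attribute` would apply the correlation ranking to the items at the node. Removing the chosen attribute from the candidates for child nodes is an assumption made here so that the recursion always terminates.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Attrs = Dict[str, Dict[str, str]]  # product -> {attribute: value}


@dataclass
class Node:
    label: str
    items: List[str]
    split_attribute: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def build_cdt(
    label: str,
    items: List[str],
    product_attrs: Attrs,
    attributes: List[str],
    pick_attribute: Callable[[List[str], List[str]], Optional[str]],
    is_terminal: Callable[[List[str], List[str]], bool],
) -> Node:
    """Greedy construction: choose the most significant attribute at this node,
    split the items on its values, then recurse into each sub-section."""
    node = Node(label, items)
    if not attributes or is_terminal(items, attributes):
        return node
    best = pick_attribute(items, attributes)
    if best is None:
        return node
    node.split_attribute = best
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[product_attrs[item][best]].append(item)
    remaining = [a for a in attributes if a != best]
    for value, members in groups.items():
        node.children.append(
            build_cdt(value, members, product_attrs, remaining, pick_attribute, is_terminal)
        )
    return node
```

Because each child is analyzed separately, different branches can split on different attributes, as in CDT 200, where small yogurt splits on production method while medium and large sizes split on brand; this is how the unbalanced or asymmetric trees arise.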
As disclosed, embodiments automatically generate a consumer decision tree using a retailer's transaction data. The system first calculates similarities for each product and attribute value pair and then generates the CDT using similarity values.
Known solutions for generating a CDT use an agglomerative (i.e., bottom-up) clustering of products, where the distance between any two products equals their mutual dissimilarity value. In contrast, in embodiments of the present invention, each node and split in the hierarchical structure can be explained in terms of attributes. Embodiments involve constrained divisive clustering of products, where the partitioning (split) is always decided by attributes. Since splits are based on attributes, nodes can always be associated with an attribute, which allows embodiments to be fully automated and executed by a computer system. Further, embodiments can score the CDT to assess its quality. Embodiments can work for any product, but are most effective for retailers whose customers make frequent purchases, such as grocery stores, so that the data used to generate the CDT is robust.
Several embodiments are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the disclosed embodiments are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
Number | Name | Date | Kind |
---|---|---|---|
5974396 | Anderson et al. | Oct 1999 | A |
6029195 | Herz | Feb 2000 | A |
6836773 | Tamayo et al. | Dec 2004 | B2 |
7003517 | Seibel et al. | Feb 2006 | B1 |
7082427 | Seibel et al. | Jul 2006 | B1 |
7120629 | Seibel et al. | Oct 2006 | B1 |
7315861 | Seibel et al. | Jan 2008 | B2 |
7330850 | Seibel et al. | Feb 2008 | B1 |
7818286 | Chu et al. | Oct 2010 | B2 |
8412656 | Baboo et al. | Apr 2013 | B1 |
20010014868 | Herz et al. | Aug 2001 | A1 |
20020161778 | Linstedt | Oct 2002 | A1 |
20040024773 | Stoffel et al. | Feb 2004 | A1 |
20040133551 | Linstedt | Jul 2004 | A1 |
20040199484 | Smith et al. | Oct 2004 | A1 |
20060004731 | Seibel et al. | Jan 2006 | A1 |
20080294996 | Hunt et al. | Nov 2008 | A1 |
20080319829 | Hunt et al. | Dec 2008 | A1 |
20090006156 | Hunt et al. | Jan 2009 | A1 |
20090018996 | Hunt et al. | Jan 2009 | A1 |
20100145773 | Desai et al. | Jun 2010 | A1 |
20100228604 | Desai et al. | Sep 2010 | A1 |
20110264617 | Eggers et al. | Oct 2011 | A1 |
20120066125 | Ma et al. | Mar 2012 | A1 |
20130346352 | Tiwari et al. | Dec 2013 | A1 |
Number | Date | Country |
---|---|---|
20130346352 A1 | Dec 2013 | US |