Various embodiments described herein relate to apparatus, systems, and methods for selecting next actions given data relating individuals to various events.
Retailers, advertisers, and many other institutions are keenly interested in understanding consumer spending habits. These companies invest tremendous resources to identify and categorize consumer interests, in order to learn how consumers spend money. If the interests of an individual consumer can be determined, then it is believed that advertising and promotions related to these interests will be more successful in obtaining a positive consumer response, such as purchases of the advertised products or services.
Conventional means of determining consumer interests have generally relied on collecting demographic information about consumers, such as income, age, place of residence, occupation, and so forth, and associating various demographic categories with various categories of interests and merchants. Interest information may be collected from surveys, publication subscription lists, product warranty cards, and myriad other sources. The data collected is processed resulting in some demographic and interest description of each of a number of consumers.
This approach to understanding consumer behavior often misses the mark. The assumption is that consumers will spend money on their interests, as expressed by things like their subscription lists and their demographics. Yet, the data on which the determination of interests is made is typically only indirectly related to the actual spending patterns of the consumer. For example, most publications have developed demographic models of their readership, and offer their subscription lists for sale to others interested in the particular demographics of the publication's readers. But subscription to a particular publication is a relatively poor indicator of what the consumer's spending patterns will be in the future.
Even taking into account multiple different sources of data, such as by combining subscription lists, warranty registration cards, and so forth, still yields only an incomplete collection of unrelated data about a consumer.
One of the problems associated with these conventional approaches is the failure to recognize that spending patterns are time based. That is, consumers often spend money in a time-related manner. For example, a consumer who is a business traveler spends money on plane tickets, car rentals, hotel accommodations, restaurants, and entertainment in preparation for and during a single business trip. These purchases together more strongly describe the consumer's true interests and preferences than any single one of the purchases alone.
Yet another problem with conventional approaches is that categorization of purchases is often based on standardized industry classifications of merchants and businesses, such as the SIC codes. This set of classifications is entirely arbitrary, and has little to do with actual consumer behavior. Consumers do not decide which merchants to purchase from based on their SIC codes. Thus, the use of arbitrary classifications to predict financial behavior is doomed to failure, since the classifications have little meaning in the actual data of consumer spending.
Still another problem is that different groups of consumers spend money in different ways. For example, consumers who frequent high-end retailers have entirely different spending habits than consumers who are bargain shoppers. To deal with this problem, most systems focus exclusively on very specific, predefined types of consumers, in effect, assuming that the interests or types of consumers are known, and targeting these consumers with what are believed to be advertisements or promotions of interest to them. However, this approach essentially puts the cart before the horse: it assumes the interests and spending patterns of a particular group of consumers rather than discovering them from actual spending data. It thus begs the question as to whether the assumed group of consumers in fact even exists, or has the interests that are assumed for it.
Accordingly, what is needed is the ability to model consumer financial behavior based on actual historical spending patterns that reflect the time-related nature of each consumer's purchase. Further, it is desirable to extract meaningful classifications of merchants based on the actual spending patterns, and from the combination of these, predict future spending of an individual consumer in specific, meaningful merchant groupings.
One source of data now available to retailers is transaction data. Retailers typically sell and provide a wide variety of products to a large number of customers. Each of the transactions is recorded at a point of sale device and is used for accounting and other purposes. Many retailers retain data related to these transactions, which is sometimes referred to as transaction data. Transaction data includes all data related to a transaction including, for example, promotions, price changes, product features, store features, seasonal factors and customer loyalty data that may affect the transaction. The transaction data can also include demographics and firmographics. The transaction data includes data detailing an actual purchase, which is referred to as purchase data. Purchase data or transaction data can be used for a variety of purposes. Typically, purchase data is used to encourage repeat purchase behavior and to identify customers with high value growth potential. One challenge associated with transaction data or purchase data is the sheer volume of the data. While the purchase data or transaction data offers a huge opportunity for vital marketing information, the sheer volume of the data challenges the traditional statistical and mathematical techniques at the retailer's disposal. Retail data analysts use only limited online analytical processing (OLAP) capabilities to “slice and dice” the purchase data to extract basic statistical reports and use them and other domain language to make marketing decisions.
The invention is pointed out with particularity in the appended claims. However, a more complete understanding of the present invention may be derived by referring to the detailed description when considered in connection with the figures, wherein like reference numbers refer to similar items throughout the figures and:
The description set out herein illustrates the various embodiments of the invention and such description is not intended to be construed as limiting in any manner.
Entities can be any number of items associated with the data. In the instances where the data relates to transactions between customers and a retailer or retailers, entities include products and product groups. Entities are not limited to products and product groups, and can also represent a promotion, a change in price, portions of information about consumers, or other data. Entities can also be promotion histories, or purchase histories of a customer or group of customers. Determining insights between entities 110 includes finding products that are coherent or bridge to other products. Insights include all types of relationships, including relationships that may have previously been unknown to the retailer or group of retailers. Determining insights 110 includes determining relationships between products and consumers. In short, determining relationships between entities 110 allows marketers to gain insight into relationships between the various entities.
The method 100 also includes predicting the likelihood of the occurrence of a future event 112. In retail situations, the future event many times is the purchase of another product. For example, when a consumer buys a personal computer many times the consumer will follow with purchases of other hardware or software. The consumer may buy a printer or may buy a word processing program shortly after making a computer purchase. The future event can actually include other items, such as an in-store visit or the like.
In addition to predicting that an event will occur, in some embodiments of the invention, a time frame in which the event will occur is also predicted. In one embodiment, predicting the likelihood of the occurrence of a future event 112 is generally done as a risk factor over a number of selected times. This is referred to as predicting the time to the event. The risk factor is set for the various time frames. The time frames can be as short or as long as desired. For example, the time frame may be a second, or it may be several days. The risk factor is based on the risk that the action takes place over the time frame. The subsequent time frame presents yet another risk factor. The time frames can be equal or can be unequal. The method 100 also includes selecting at least one action based on the predicted likelihood of the occurrence of a future event 114. In marketing, most of the time the at least one action will have a monetary component. In other words, the actions will cost money to perform. In business, it is desirable to get the most effect for the dollar spent. Therefore, selecting the action 114 may also include optimization so that the predictions made can be leveraged across customers and products to meet business goals and objectives within the bounds of resource constraints placed by the business.
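As a minimal sketch of predicting the time to the event, assume that the risk factor for each time frame is read as the conditional probability that the event happens in that frame, given that it has not happened in an earlier frame. That reading, the function name, and the sample values are illustrative assumptions and are not taken from the embodiments.

```python
def event_probabilities(risk_factors):
    """Return (per_frame, cumulative) event probabilities.

    risk_factors[i] is ASSUMED to be the conditional probability that the
    event occurs in time frame i, given it has not occurred before frame i.
    The frames may be of equal or unequal length.
    """
    per_frame, cumulative = [], []
    survival = 1.0  # probability that the event has not occurred so far
    for risk in risk_factors:
        if not 0.0 <= risk <= 1.0:
            raise ValueError("each risk factor must lie in [0, 1]")
        per_frame.append(survival * risk)   # event first occurs in this frame
        survival *= 1.0 - risk              # event still has not occurred
        cumulative.append(1.0 - survival)   # event occurred by end of frame
    return per_frame, cumulative


# Hypothetical risk factors for three successive time frames.
per_frame, cumulative = event_probabilities([0.10, 0.20, 0.05])
```

The cumulative values show how likely the event is by the end of each frame, which is the quantity an action-selection step would typically compare against the cost of acting.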
In one particular embodiment of the method 100, the marketing action will be a recommendation for an action to be taken. The owner of the product, in this one embodiment, will pay a fee for a recommendation to be made. A retailer can make the recommendation to a particular customer. The source of the recommendation can also be other than the retailer. The method 100 also includes feeding back information regarding the occurrence of the event 116. This information is useful in determining or tweaking the relationships or insights between the entities associated with the data as well as predicting the likelihood of occurrence of a future event. Statistics can be kept as to the effectiveness of the predictions for the purpose of pricing the services. The statistics can also be used to determine the timing for retraining models for the predictive component, or to determine whether some relationships found are no longer significant or whether new ones have emerged. It should be noted that the discussion of business and marketing is one example application of the method for finding insights or relationships between events in a set of data and then predicting when a future event will occur. The method 100 is extendable to other situations.
The client warehouse data 410, mentioned above, represents terabytes of transaction and other data related to a sales entity. The client warehouse data 410 includes extra information that does not need to be used to perform the method for selecting the next best action, such as method 100 or method 200. As a result, the needed data is extracted, transformed and loaded into a more useable subset of the warehouse database called a solution data mart 450. The solution data mart 450 can be stored with the client warehouse data 410 or can be stored on a separate data server or other separate data location. The data associated with the solution data mart 450 is used or acted on to determine various relationships between entities.
The system 400 also includes an insight/relationship determination module 420, a future event prediction module 430, and a selection and optimization module 440. The relationships are determined after reviewing historical transaction data and producing a model based on the historical data for a number of entities. The model can then be used to project future actions of a person or consumer based on other entities, such as promotions or the product. The future event prediction module 430 is used to determine the possibility of a future event occurring within a number of time frames. The future event prediction module 430 determines the possibility of a future event over at least two selected time frames. The future event prediction module 430 uses a proportional hazard type model. The possibility that an event will occur within a time frame is set forth as a number. The number represents the possibility that the event will occur in the particular time frame. The number is between zero (where the event absolutely will not occur) and one (where the event will occur during that particular time frame). The number assigned is actually a probability of the event occurring. Assigning the probability for the various time frames may also be referred to as scoring the possibility or propensity of the future event happening during the time frame. The future event prediction module shifts the emphasis to when an event, such as a purchase, will occur. In other words, the emphasis is not merely a prediction that the event will occur but the prediction is made with finer granularity with respect to the timing of the future event.
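The scoring described above can be illustrated with a small sketch of one possible proportional hazard type model. It assumes a discrete-time (grouped) form in which a customer's survival through a frame is the baseline survival raised to the power exp(coefficients · features). The baseline hazards, coefficients, and feature values are hypothetical, and the actual model used by module 430 is not specified here.

```python
import math


def frame_probabilities(baseline_hazards, coefficients, features):
    """Probability that the event first occurs in each time frame.

    Assumed grouped proportional hazards form: survival through a frame is
    (1 - baseline_hazard) ** exp(sum(coefficient * feature)).
    """
    relative_risk = math.exp(sum(c * x for c, x in zip(coefficients, features)))
    probabilities = []
    survival = 1.0  # probability the event has not occurred before this frame
    for h0 in baseline_hazards:
        frame_survival = (1.0 - h0) ** relative_risk
        probabilities.append(survival * (1.0 - frame_survival))
        survival *= frame_survival
    return probabilities


# Hypothetical example: two time frames, one feature (e.g. "bought a PC").
baseline = [0.5, 0.5]
neutral = frame_probabilities(baseline, [0.0], [1.0])
elevated = frame_probabilities(baseline, [math.log(2.0)], [1.0])
```

Each returned value lies between zero and one, and a positive coefficient shifts probability toward earlier frames, which is the "when" emphasis described above.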
For each time frame, a propensity matrix including one or more customers and at least one product is formed. Several propensity matrices will be produced for each of the future time slots. This data is input to the selection module 440. The selection module 440 selects from among the best times to make a recommendation to the consumer. The selection module 440 can also be thought of as an optimization module for timing recommendations that will be the most effective in causing the future event. The recommendation or other marketing action is then output from the selection module 440 as a content offer. In some embodiments, a marketing channel is also recommended. For example, an offer may be made to a consumer by direct mail, or from a call center, or through a kiosk or over the internet. The recommendation or other marketing action data is transferred to a marketing execution platform 460 where the recommendation is fulfilled or made to the consumer. Of course, the purchase transactions can then be monitored to see if the consumer acts or buys the product. In other words, the process has a feedback loop which can be monitored for success of the recommendations or other marketing action. The new purchases, the marketing action, the product and the timing of these actions then become part of the historical data of client data warehouse 410 that will be extracted, transformed, and loaded for use in the next iteration of the method.
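To illustrate how per-time-frame propensity matrices might feed a timing decision, the following sketch picks, for each customer and product, the time frame with the highest propensity. The nested-dictionary layout, the function name, and the sample values are assumptions made for illustration. The selection module 440 may weigh other factors, such as channel and cost.

```python
def best_time_frames(propensity_by_frame):
    """Pick the highest-propensity time frame per (customer, product).

    propensity_by_frame[t][customer][product] is the (assumed) probability
    that the customer buys the product during time frame t.
    Returns {(customer, product): (frame_index, probability)}.
    """
    best = {}
    for t, matrix in enumerate(propensity_by_frame):
        for customer, row in matrix.items():
            for product, probability in row.items():
                key = (customer, product)
                # Keep the earliest frame on ties by requiring a strict gain.
                if key not in best or probability > best[key][1]:
                    best[key] = (t, probability)
    return best


# Hypothetical propensity matrices for two future time frames.
frames = [
    {"c1": {"printer": 0.10, "ink": 0.02}, "c2": {"printer": 0.30}},
    {"c1": {"printer": 0.25, "ink": 0.40}, "c2": {"printer": 0.30}},
]
timing = best_time_frames(frames)
```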
Thus, the system 400 is a closed-loop process incorporating data acquisition and management, measurement and reporting, analytics, and complex decisioning to serve the highest performing interaction to customers at the right time and through the right channel. The system 400 is scalable to meet both current and future client marketing objectives.
The insight/relationship determination module 420 includes a scalable, highly automated parallel computing data mining application. It takes a large amount of customer transaction data and produces individual (disaggregated) likelihoods for a specified set of events that customers may experience in the near future (week, month, next mouseclick, etc.). The future event predictor module 320, in one embodiment, uses a form of scorecards (one scorecard per predicted event), which tend to be interpretable.
The notion of events is very general. A user can specify which events to predict, after giving careful consideration to the business objectives and what's actionable. Predicting store visits, purchases in various departments, or of various products, can be exploited by sending brochures, discount coupons, or by means of a product recommendation engine. While purchase events are directly available from the data, it is also possible to define technical events as prediction targets.
The scorecards take into account previous transaction information (in the form of recency and frequency attributes), as well as seasonal information. This information is often very rich and predictive of future behavior. Other potential inputs are customer demographics, behavior summary features, marketing variables, pricing information, economic and competitor data, etc.
In operation, new transactions continuously stream into the data warehouse 410. The future events prediction module 430 regularly recomputes scores based on the latest information. The scores/event likelihoods may change over time, reflecting the changing needs and attitudes of the customers. The scores are input into the selection optimization module 440 and marketing execution platform 460, which use rules to turn scores into marketing or other decisions.
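As a rough sketch of how recency, frequency, and seasonal attributes could enter a scorecard, the following example derives the two transaction attributes from purchase dates and adds points per attribute bin. The bin boundaries and point values are invented for illustration and are not part of the described embodiments. A fitted scorecard would learn them from data.

```python
from datetime import date


def recency_frequency(purchase_dates, as_of):
    """Return (days since last purchase, number of purchases) up to as_of."""
    past = [d for d in purchase_dates if d <= as_of]
    if not past:
        return None, 0
    return (as_of - max(past)).days, len(past)


def scorecard_points(recency_days, frequency, month):
    """Additive scorecard with hypothetical bins and point values."""
    points = 0
    # Recency attribute: more recent activity earns more points.
    if recency_days is not None:
        if recency_days <= 7:
            points += 30
        elif recency_days <= 30:
            points += 15
    # Frequency attribute: 4 points per purchase, capped at 5 purchases.
    points += min(frequency, 5) * 4
    # Seasonal attribute: assumed holiday uplift in November and December.
    if month in (11, 12):
        points += 10
    return points


history = [date(2024, 1, 5), date(2024, 1, 20), date(2024, 3, 1)]
recency, frequency = recency_frequency(history, date(2024, 1, 25))
score = scorecard_points(recency, frequency, month=1)
```

Because each attribute contributes a visible number of points, the total remains easy to interpret, which matches the interpretability noted for scorecards.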
Due to expected changes in the environment (economy, competitors), changes in customer behavior (fashions), and the increasing information collected over time, it is important to occasionally update (some of) the underlying predictive models, both in terms of their structure and their parameters.
The output of the future event prediction module (330, 430) is a set of predictions, not decisions. To orchestrate smart decisions, (constrained) optimization techniques are used. The “propensity matrix” of purchase likelihoods of all customers for all products provides precise (accurate and timely) information for marketing optimization.
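One simple way to turn a propensity matrix into decisions under a resource constraint is a greedy heuristic, sketched below. It is only a stand-in for the (constrained) optimization mentioned above, whose actual form is not given here. The margins, the per-offer cost, the budget, and the one-offer-per-customer rule are all assumptions.

```python
def select_offers(propensity, margin, cost_per_offer, budget):
    """Greedy budgeted selection of one offer per customer.

    propensity[customer][product] is a purchase likelihood; margin[product]
    is an assumed profit per sale. Each customer's best product is ranked by
    expected value, and offers are funded until the budget runs out.
    """
    candidates = []
    for customer, row in propensity.items():
        product, value = max(
            ((prod, p * margin[prod]) for prod, p in row.items()),
            key=lambda pair: pair[1],
        )
        candidates.append((value, customer, product))
    candidates.sort(reverse=True)  # highest expected value first
    max_offers = int(budget // cost_per_offer)
    return [(customer, product) for _, customer, product in candidates[:max_offers]]


# Hypothetical inputs: three customers, two products, budget for two offers.
propensity = {
    "c1": {"printer": 0.5, "ink": 0.9},
    "c2": {"printer": 0.2, "ink": 0.1},
    "c3": {"printer": 0.8, "ink": 0.3},
}
margin = {"printer": 40.0, "ink": 10.0}
offers = select_offers(propensity, margin, cost_per_offer=1.0, budget=2.0)
```

A production system would more likely pose this as a linear or integer program so that constraints across products, channels, and customers can be enforced jointly.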
The system 400 and methods 100, 200 provide for highly personalized marketing campaigns by marketing the right products to the right customers at the right time and through the right channel.
Solutions should be designed to work on a large scale with many millions of customers and hundreds or thousands of products.
The discussion of
Insight/Relationship Determination Module
Referring now to
The insight/relationship determination module allows product associations to be analyzed in various contexts, e.g. within individual market baskets, or in the context of a next visit market basket, or across all purchases in an interval of time, so that different kinds of purchase behavior can be associated with different types of products and different types of customer segments can be revealed. Therefore, accurate customer-centric and product-centric decisions can be made. The insight/relationship determination module 320 can be scaled to very large volumes of data, and is capable of analyzing large numbers of products and even more transactions. The insight/relationship determination module 320 is interpretable and develops a graphical network structure that reveals the product associations and provides insight into the decisions generated by the analysis. It also enables a real-time customer-specific recommendation engine that can use a customer's past purchase behavior and current market basket to develop accurate, timely, and very effective cross-sell and up-sell offers.
The Insight/Relationship Determination Module 320 Framework
Traditional modeling frameworks in statistical pattern recognition and machine learning, such as classification and regression, seek optimal causal or correlation based mapping from a set of input features to one or more target values. The systems (input-output) approach suits a large number of decision analytics problems, such as fraud prediction and credit scoring. The transactional data in these domains is typically collected in, or converted to, a structured format with a fixed number of observed and/or derived input features from which to choose. There are a number of data and modeling domains, such as language understanding, image understanding, bioinformatics, web cow-path analysis etc., in which either (a) the data are not available in such a structured format or (b) we do not seek input-output mappings, where a new computational framework might be more appropriate. To handle the data and modeling complexity in such domains, the insight/relationship determination module 320 uses a semi-supervised insight discovery and data-driven decision analytics framework, known as Pair-wise Co-occurrence Consistency, that:
Each of the highlighted terms has a very specific meaning as it applies to different domains. Before describing these concepts as they apply to the retail domain, consider the details of the retail process and the retail data abstraction based on customer purchases.
Retail Transaction Data
At a high level, the retail process may be summarized as Customers buying products at retailers in successive visits, each visit resulting in the transaction of a set of one or more products (market basket). In its fundamental abstraction, as used in the insight/relationship determination module 320 framework, the retail transaction data is treated as a time stamped sequence of market baskets, as shown in
Transaction data are a mixture of two types of interspersed customer purchases:
1. Logical/Intentional Purchases (Signal)—Largely, customers tend to buy what they need/want and when they need/want them. These may be called intentional purchases, and may be considered the logical or signal part of the transaction data as there is a predictable pattern in the intentional purchases of a customer.
2. Emotional/Impulsive Purchases (Desirable Noise)—In case of most customers, the logical intentional purchase may be interspersed with emotion driven impulsive purchases. These appear to be unplanned and illogical compared to the intentional purchases. Retailers deliberately encourage such impulsive purchases through promotions, product placements, and other incentives because it increases their sales. But from an analytical and data perspective, impulsive purchases add noise to the intentional purchase patterns of customers. This makes the problem of finding logical patterns associated with intentional purchases more challenging.
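The abstraction of transaction data as a time-stamped sequence of market baskets per customer can be sketched as a small data structure. The class name, field names, and the input layout of (customer, timestamp, product) line items are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MarketBasket:
    """Set of products bought by one customer in one visit."""
    timestamp: datetime
    products: frozenset


def customer_histories(line_items):
    """Group (customer, timestamp, product) rows into ordered baskets.

    Returns {customer: [MarketBasket, ...]} sorted by visit time.
    """
    grouped = {}
    for customer, timestamp, product in line_items:
        grouped.setdefault(customer, {}).setdefault(timestamp, set()).add(product)
    return {
        customer: [
            MarketBasket(ts, frozenset(items)) for ts, items in sorted(visits.items())
        ]
        for customer, visits in grouped.items()
    }


# Hypothetical line items: one customer, two visits.
rows = [
    ("c1", datetime(2024, 3, 2, 10, 0), "printer"),
    ("c1", datetime(2024, 1, 5, 9, 30), "pc"),
    ("c1", datetime(2024, 1, 5, 9, 30), "mouse"),
]
histories = customer_histories(rows)
```

Nothing in this structure marks a purchase as intentional or impulsive. Both kinds appear side by side in a basket, which is why separating them is treated as an analysis problem.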
Key Challenges in Retail Data Analysis
Based on this abstraction of the transaction data that they are a mixture of both intentional and impulsive purchases, there are three key data mining challenges:
1. Separating Intentional (Signal) from Impulsive (Noise) Purchases—As in any other data mining problem, it is important to first separate the wheat from the chaff or signal from the noise. Therefore, the first challenge is to identify the purchase patterns embedded in the transaction data that are associated with intentional behaviors.
2. Complexity of Intentional Behavior—The intentional purchase part of the transaction data is not trivial. It is essentially a mixture of projections of (potentially time-elapsed) latent purchase intentions. In other words:
3. Matching the Right Impulses to the Right Intentions—As mentioned above, the customer's impulsive behavior is desirable for the retailer. Therefore instead of ignoring the noise associated with it, the retailers might be interested in finding patterns associating the right kind of impulsive buying purchases with specific intentional purchases.
In the following discussion, a high level overview of the insight determination module 320 framework is given. The insight determination module combs transaction data to find various relationships between entities associated with the data.
The terminology used to define the insight/relationship determination module 320 framework is described. The insight/relationship determination module 320 process and benefits of the insight/relationship determination module 320 framework are also provided.
Entities in Retail Domain
In the retail domain, there are a number of entity-types: Products, Customers, Customer segments, Stores, Regions, Channels, Web pages, Offers, etc. The insight/relationship determination module 320 primarily focuses on two main entity types: Products and Customers.
Products are goods and services sold by a retailer. We refer to the set of all products and their associated attributes including hierarchies, descriptions, properties, etc. by an abstraction called the product space. A typical product space exhibits the following four characteristics:
The set of all customers that have shopped in the past forms the retailer's customer base. Some retailers can identify their customers either through their credit cards or retailer membership card. However, most retailers lack this ability because customers are using either cash or they do not want to participate in a formal membership program. Apart from their transaction history, the retailer might also have additional information on customers, such as their demographics, survey responses, market segments, life stage, etc. The set of all customers, their possible organization in various segments, and all additional information known about the customers comprise the customer space. Similar to a product space, a typical customer space exhibits the following four characteristics:
Relationships in Retail Domain
There are different types of relationships in the retail domain. The three main types of relationships considered by the insight/relationship determination module 320 are:
1. First order, explicit purchase-relationships between customers and products, i.e. who purchased what, when, for how much, and how (channel, payment type, etc.)?
2. Second order, implicit consistency-relationships between two products, i.e. how consistently are two products co-purchased in a given context?
3. Second order, implicit similarity-relationships between two customers, i.e. how similar are the purchase behaviors exhibited by two customers?
While the purchase relationships are explicit in the transaction data, the insight/relationship determination module 320 framework is used primarily to infer the implicit product-product consistency relationships and customer-customer similarity relationships. To do this, the insight/relationship determination module 320 views products in terms of customers and views customers in terms of products.
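The idea of viewing products in terms of customers and customers in terms of products can be sketched by inverting the explicit purchase relationships and then comparing the resulting sets. The Jaccard overlap used here is only an illustrative stand-in for the consistency and similarity measures of the framework, which are not specified in this passage.

```python
from collections import defaultdict


def index_purchases(purchases):
    """Build both views from explicit (customer, product) purchase pairs."""
    products_by_customer = defaultdict(set)  # customer seen as a set of products
    customers_by_product = defaultdict(set)  # product seen as a set of customers
    for customer, product in purchases:
        products_by_customer[customer].add(product)
        customers_by_product[product].add(customer)
    return products_by_customer, customers_by_product


def jaccard(left, right):
    """Overlap of two sets; an assumed stand-in for similarity/consistency."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0


# Hypothetical first-order purchase relationships.
purchases = [
    ("ann", "pc"), ("ann", "printer"),
    ("bob", "pc"), ("bob", "printer"), ("bob", "ink"),
    ("cy", "milk"),
]
by_customer, by_product = index_purchases(purchases)
customer_similarity = jaccard(by_customer["ann"], by_customer["bob"])
product_consistency = jaccard(by_product["pc"], by_product["printer"])
```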
The Insight/Relationship Determination Module 320 Graphs
The most natural representation of pair-wise relationships between entities abstraction is a structure called Graph. Formally, a graph contains:
The insight/relationship determination module 320 graphs are the internal representation of the pair-wise relationships between entities abstraction. There are three parameters that define an insight/relationship determination module graph.
1. Customization defines the scope of the insight/relationship determination module graph by identifying the transaction data slice (customers and transactions) used to build the graph. For example, one might be interested in analyzing a particular customer segment or a particular region or a particular season or any combination of the three. Various types of customizations that are supported in the insight/relationship determination module are described below.
2. Context defines the nature of the relationships between products (and customers) in the insight/relationship determination module graphs. For example, one might be interested in analyzing relationships between two products that are purchased together or within two weeks of each other, or where one product is purchased three months after the other, and so on. As described below, the insight/relationship determination module 320 supports both market basket contexts and purchase sequence contexts.
3. Consistency defines the strength of the relationships between products in the product graphs. There are a number of consistency measures based on information theory and statistics that are supported in the insight/relationship determination module 320 analysis. Different measures have different biases. These are discussed further below.
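The three parameters above can be sketched together in a small graph builder. In this sketch the customization is a caller-supplied filter over baskets, the context is co-occurrence within the same market basket, and the consistency measure is pointwise mutual information. The choice of pointwise mutual information and the data layout are assumptions, since the specific measures are discussed elsewhere.

```python
import math
from collections import Counter
from itertools import combinations


def build_product_graph(baskets, keep=lambda basket: True):
    """Return {(product_a, product_b): consistency} with product_a < product_b.

    baskets: list of dicts with a 'products' set plus arbitrary attributes.
    keep:    customization filter selecting the transaction data slice.
    """
    selected = [set(b["products"]) for b in baskets if keep(b)]
    n = len(selected)
    single, pair = Counter(), Counter()
    for products in selected:
        single.update(products)
        pair.update(combinations(sorted(products), 2))  # market basket context
    return {
        (a, b): math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


# Hypothetical baskets tagged with a region attribute.
baskets = [
    {"region": "west", "products": {"pc", "printer"}},
    {"region": "west", "products": {"pc", "printer"}},
    {"region": "west", "products": {"milk"}},
    {"region": "west", "products": {"milk", "pc"}},
    {"region": "east", "products": {"milk", "printer"}},
]
west_graph = build_product_graph(baskets, keep=lambda b: b["region"] == "west")
```

Changing the `keep` filter yields a different graph from the same transactions, which is how two customer segments or regions could be compared.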
Insight-Structures in the Insight/Relationship Determination Module Graphs
As mentioned above, the insight/relationship determination module graphs may be mined to find insights or actionable patterns in the graph structure that may be used to create marketing decisions. These insights are typically derived from various structures embedded in the insight/relationship determination module graphs. The five main types of structures in the insight/relationship determination module graph that are explored are:
1. Sub-graphs—A sub-graph is a subset of the graph created by picking a subset of the nodes and edges from the original graph. There are a number of ways of creating a sub-graph from an insight/relationship determination module graph. These may be grouped into two types:
2. Neighborhoods—A neighborhood of a target product in an insight/relationship determination module graph is a special sub-graph that contains the target product and all the products that are connected to the target product with consistency strength above a threshold. This insight structure shows the top most affiliated products for a given target product. Decisions about product placement, store signage, and the like, can be made from such structures. A neighborhood structure may be seen with or without cross edges as shown in
3. Product Bundles—A bundle structure in the insight/relationship determination module graph is defined as a sub-set of products such that each product in the bundle has a high consistency connection with all the other products in the bundle. In other words, a bundle is a highly cohesive soft clique in an insight/relationship determination module graph. The standard market basket analysis tools seek to find Item-Sets with high support (frequency of occurrence). The insight/relationship determination module 320 product bundles are analogous to these item-sets, but they are created using a very different process and are based on a very different criterion known as bundleness that quantifies the cohesiveness of the bundle. The characterization of a bundle and the process involved in creating a product bundle exemplify the pair-wise relationships and are part of a suite of proprietary techniques that seek to discover higher order structures from pair-wise relationships.
4. Bridge Structures—The notion of a bridge structure is inspired from that of polyseme in language where a word might have more than one meaning (or belongs to more than one semantic family). For example, the word ‘can’ may belong to the semantic family {‘can’, ‘could’, ‘would’ . . . } or {‘can’, ‘bottle’, ‘canister’ . . . }. In retail, a bridge structure embedded in the insight/relationship determination module graph is a collection of two or more, otherwise disconnected, product groups (product bundle or an individual product) that are bridged by one or more bridge product(s). For example, a wrist-watch may be a bridge product between electronics and jewelry groups of products. A bridge pattern may be used to drive cross department traffic and diversify a customer's market basket through strategic promotion and placement of products. More details on bridge structures are given below.
5. Product Phrases—A product phrase is a product bundle across time, i.e. it is a sequence of products purchased consistently across time. For example, a PC purchase followed by a printer purchase in a month, followed by a cartridge purchase in three months is a product phrase. A product bundle is a special type of product phrase where the time-lag between successive products is zero. Consistent product phrases can be used to forecast customer purchases based on their past purchases to recommend the right product at the right time. More details about product phrases are given below.
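Two of the structures listed above can be sketched on a graph stored as a dictionary of edge strengths: a neighborhood of a target product, with or without cross edges, and a simple check that every pair in a candidate bundle is strongly connected. The hard all-pairs check is only a simplified stand-in for the bundleness criterion, which measures a soft clique and is not defined in this passage. The graph values are hypothetical.

```python
from itertools import combinations


def strength(graph, a, b):
    """Look up an undirected edge strength; missing edges count as 0."""
    return graph.get((a, b), graph.get((b, a), 0.0))


def neighborhood(graph, target, threshold, cross_edges=False):
    """Edges from target to products connected above threshold.

    With cross_edges=True, edges among the neighbors themselves are included
    when they also exceed the threshold.
    """
    members = set()
    for a, b in graph:
        if target in (a, b) and graph[(a, b)] > threshold:
            members.add(b if a == target else a)
    edges = {(target, m): strength(graph, target, m) for m in sorted(members)}
    if cross_edges:
        for a, b in combinations(sorted(members), 2):
            if strength(graph, a, b) > threshold:
                edges[(a, b)] = strength(graph, a, b)
    return edges


def is_hard_clique(graph, products, threshold):
    """True if every pair of products is connected above threshold."""
    return all(
        strength(graph, a, b) > threshold
        for a, b in combinations(sorted(products), 2)
    )


# Hypothetical consistency strengths between products.
graph = {
    ("pc", "printer"): 0.9, ("pc", "mouse"): 0.7, ("printer", "mouse"): 0.1,
    ("printer", "ink"): 0.8, ("pc", "milk"): 0.05,
}
pc_neighbors = neighborhood(graph, "pc", threshold=0.5)
pc_with_cross = neighborhood(graph, "pc", threshold=0.05, cross_edges=True)
```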
Logical vs. Actual Structures
All the structures discussed above are created by (1) defining a template-pattern for the structure and (2) efficiently searching for those patterns in the graphs of the insight/relationship determination module. One of the fundamental differences between the insight/relationship determination module 320 and conventional approaches is that the insight/relationship determination module 320 seeks logical structures in the graphs while conventional approaches, such as frequent item-set mining, seek actual structures directly in transaction data.
Consider, for example, a product bundle or an item-set shown in
The limitation that the transaction data do not contain an entire logical bundle poses a set of unique challenges for retail data mining in general, and item-set mining in particular. The insight/relationship determination module 320 addresses this problem. First, it uses these projections of the logical bundles by projecting them further down to their atomic pair-wise levels and strengthens only these relationships between all pairs within the actual market basket. Secondly, when the insight/relationship determination module graphs are ready, the insight/relationship determination module 320 discards the transaction data and tries to find these structures in these graphs directly. So even if edges between products A and B are strengthened because of one set of customers, between A and C by another set of customers, and between B and C by a third set of customers (because they all bought different projections of the logical bundle {A, B, C}), the high connection strengths between A-B, B-C, and A-C still result in the emergence of the logical bundle {A, B, C} in the insight/relationship determination module 320 and its graph. Thus, the two stage process of first creating the atomic pair-wise relationships between products and then creating higher order structures from them gives the insight/relationship determination module 320 a tremendous generalization capability that is not present in any retail mining framework. The same argument applies to other higher order structures such as bridges and phrases as well. This provides the insight/relationship determination module 320 a unique ability to find very interesting, novel, and actionable logical structures (bundles, phrases, bridges, etc.) that cannot be found otherwise.
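The {A, B, C} argument above can be reproduced with a toy example. No basket below contains all three products, so the item-set {A, B, C} has zero support in the transaction data, yet the bundle emerges once pair-wise edges are strengthened and the search runs on the graph alone. Raw co-occurrence counts stand in for the consistency strength, and the baskets and threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Stage 1: strengthen the edge of every product pair within each basket."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def logical_bundles(counts, size, min_count):
    """Stage 2: search the graph only, not the transactions, for groups in
    which every pair has an edge count of at least min_count."""
    products = sorted({p for pair in counts for p in pair})
    return [
        group
        for group in combinations(products, size)
        if all(counts[pair] >= min_count for pair in combinations(group, 2))
    ]


# Each customer buys only a projection of the logical bundle {A, B, C}.
baskets = [
    {"A", "B"}, {"A", "B"},
    {"B", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "C"},
    {"A", "D"},
]
itemset_support = sum({"A", "B", "C"} <= basket for basket in baskets)
bundles = logical_bundles(pair_counts(baskets), size=3, min_count=2)
```

Here `itemset_support` is zero while `bundles` contains the triple, which is the generalization that frequent item-set mining on the raw baskets would miss.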
The Insight/Relationship Determination Module Retail Mining Process
There are three stages in the insight/relationship determination module 320 retail mining process for extracting actionable insights and data-driven decisions from this transaction data:
1. Data Pre-processing—In this stage, the raw transaction data are (a) filtered and (b) customized for the next stage. Filtering cleans the data by removing the data elements (customers, transactions, line-items, and products) that are to be excluded from the analysis. Customization creates different slices of the filtered transaction data that may be analyzed separately and whose results may be compared for further insight generation, e.g. differences between two customer segments. This stage results in one or more clean, customized data slices on which further analyses may be done. Details of the Data Pre-processing stage are provided below.
2. Insight/Relationship Determination Module 320 Graph Generation—In this stage, the insight/relationship determination module 320 uses information theory and statistics to create insight/relationship determination module 320 graphs that exhaustively capture all pair-wise relationships between entities in a variety of contexts. There are several steps in this stage:
3. Insight Discovery and Decisioning from the Insight/relationship determination module Graphs—The insight/relationship determination module 320 graphs serve as the model or internal representation of the knowledge extracted from transaction data. They are used in two ways:
The Insight/Relationship Determination Module 320 Benefits
The insight/relationship determination module 320 framework integrates a number of desirable features that make it a compelling and powerful retail analytic approach. The insight/relationship determination module 320 framework is:
In the following discussion, a formal description of the retail data is presented. Mathematical notation is introduced to define products in the product space, customers in the customer space, and their properties. The data pre-processing step, which involves filtering and customization, is also described.
Product Space
A retailer's product space is comprised of all the products sold by the retailer. A typical large retailer may sell anywhere from tens of thousands to hundreds of thousands of products. These products are organized by the retailer in a product hierarchy in which the finest level products (SKU or UPC level) are grouped into higher product groups. The total numbers of products at the finest level change over time as new products are introduced and old products are removed. However, typically, the numbers of products at coarser levels are more or less stable. The number of hierarchy levels and the number of products at each level may vary from one retailer to another. The following notation is used to represent products in the product space:
$M_l : U_0 \rightarrow U_l$
In addition to these product sets and mappings, each product has a number of properties as described below.
Customer Space
The set of all customers who have shopped at a retailer in the recent past form the customer base of the retailer. A large retailer may have anywhere from hundreds of thousands to tens of millions of customers. These customers may be geographically distributed for large retail chains with stores across the nation or internationally. The customer base might be demographically, financially, and behaviorally heterogeneous. Finally, the customer base might be very dynamic in three ways:
1. new customers are added to the customer base over time,
2. old customers churn or move out of the customer base, and
3. existing customers change in their life stage and life style.
Due to the changing nature of the customer base, most retail analyses, including customer segmentation, must be repeated every so often to reflect the current status of the customer base. We use the following formal notation to represent customers in the customer space:
As described below, each customer is associated with additional customer properties that may be used in retail analysis.
Retail Transaction Data
As described earlier, transaction data are essentially a time-stamped sequence of market baskets and reflect a mixture of both intentional and impulsive customer behavior. A typical transaction data record is known as a line-item, one for each product purchased by each customer in each visit. Each line-item contains fields such as customer id, transaction date, SKU level product id, and associated values, such as revenue, margin, quantity, discount information, and the like. Depending on the retailer, a customer may on average make anywhere from two visits per year (e.g. electronics and sports retailers) to 50 visits per year (e.g. grocery and home improvement retailers). Each transaction may result in the regular purchase, promotional purchase, return, or replacement of one or more products. A line-item associated with the return of a product is generally identified by its negative revenue. Herein, we are concerned only with product purchases. We use the following formal notation to represent transactions:
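$x^{(n)} = \left( \langle t_1^{(n)}, x_1^{(n)} \rangle, \ldots, \langle t_q^{(n)}, x_q^{(n)} \rangle, \ldots, \langle t_{Q_n}^{(n)}, x_{Q_n}^{(n)} \rangle \right)$,
where $x_q^{(n)}$ is the qth market basket of the nth customer at level 0.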
Properties in Retail Data
There are four types of objects in the retail data:
1. Product—atomic level object in the product space
2. Line Item—each line (atomic level object) in transaction data
3. Transaction—collection of all line items associated with a single visit by a customer
4. Customer—collection of all transactions associated with a customer
Typically, each of these objects is further associated with one or more properties that may be used to (i) filter, (ii) customize, or (iii) analyze the results of various retail applications. Notation and examples of properties of these four types of objects are as follows:
Product Properties
The insight/relationship determination module 320 recognizes two types of product properties:
1. Given or Direct product properties that are provided in the product dictionary, e.g. manufacturer, brand name, product type (consumable, general merchandise, service, warranty, etc.), current inventory level in a store, product start date, product end date (if any), etc. These properties may also be level dependent; for example, the manufacturer code may be available only for the finest level.
2. Computed or Indirect product properties are summary properties that can be computed from the transaction data using standard OLAP summarizations, e.g. average product revenue per transaction, total margin in the last one year, average margin percent, etc. Indirect properties of a coarser level product may be computed by aggregating the corresponding properties of its finer level products.
Line Item Properties
Each line item is typically associated with a number of properties such as quantity, cost, revenue, margin, line item level promotion code, return flag, etc.
Transaction Properties
The insight/relationship determination module 320 recognizes two types of transaction properties:
1. Direct or Observed properties such as transaction channel, e.g. web, phone, mail, store id, etc., transaction level promotion code, transaction date, payment type used, etc. These properties are typically part of the transaction data itself.
2. Indirect or Derived properties such as aggregates of the line item properties, e.g. total margin of the transaction, total number of products purchased, and market basket diversity across higher level product categories, etc.
Customer Properties
The insight/relationship determination module 320 recognizes three types of customer properties:
1. Demographic Properties about each customer, e.g. age, income, zip code, occupation, household size, married/unmarried, number of children, owns/rent flag, etc., that may be collected by the retailer during an application process or a survey or from an external marketing database.
2. Segmentation Properties are essentially segment assignments of each customer (possibly with associated assignment weights) under various segmentation schemes, e.g. demographic segments, value-based segments (RFMV), or purchase-behavior-based segments.
3. Computed Properties are customer properties computed from customer transaction history, e.g. low vs. high value tier, new vs. old customer, angel vs. demon customer, early/late adopter, and the like.
Data Pre-Processing
As described herein, the first step in the insight/relationship determination module 320 process is data pre-processing. It involves two types of interspersed operations. As shown in
Filtering
Not everything in the transaction data may be useful in a particular analysis. The insight/relationship determination module 320 manages this through a series of four filters based on the four object types in the transaction data: products, line items, transactions, customers.
1. Product Filter—For some analyses, the retailer may not be interested in using all the products in the product space. A product filter allows the retailer to limit the products for an analysis in two ways:
2. Line Item Filter—For some analyses, the retailer may not be interested in using all the line items in a customer's transaction data. For example, he may not want to include products purchased due to a promotion, or products that are returned, etc. Rules based on line item properties may be defined to include or exclude certain line items in the analyses.
3. Transaction Filter—Entire transactions may be filtered out of the analyses based on transaction level properties. For example, one may be interested only in analyzing data from the last three years, or only transactions containing at least three products, or the like. Rules based on transaction properties may be used to include or exclude certain transactions from the analysis.
4. Customer Filter—Finally, transaction data from a particular customer may be included or excluded from the analysis. For example, the retailer may want to exclude customers who did not buy anything in the last six months or who are in the bottom 30% by value. Rules based on customer properties may be defined to include or exclude certain customers from the analysis.
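The four filters above can be sketched as a chain of rules applied in order. The following is a minimal illustration only: the `LineItem` fields, the `apply_filters` function, and the example rules are hypothetical names chosen here, not part of the described system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class LineItem:
    # Field names are illustrative assumptions; a real feed will differ.
    customer_id: str
    transaction_id: str
    transaction_date: date
    product_id: str
    revenue: float


def apply_filters(
    items: Iterable[LineItem],
    product_ok: Callable[[str], bool],
    line_item_ok: Callable[[LineItem], bool],
    transaction_ok: Callable[[List[LineItem]], bool],
    customer_ok: Callable[[List[LineItem]], bool],
) -> List[LineItem]:
    """Apply product, line-item, transaction, and customer rules in that order."""
    kept = [li for li in items if product_ok(li.product_id) and line_item_ok(li)]

    # Transaction rules see all surviving line items of one visit.
    by_txn = defaultdict(list)
    for li in kept:
        by_txn[li.transaction_id].append(li)
    kept = [li for txn in by_txn.values() if transaction_ok(txn) for li in txn]

    # Customer rules see all surviving line items of one customer.
    by_customer = defaultdict(list)
    for li in kept:
        by_customer[li.customer_id].append(li)
    return [li for group in by_customer.values() if customer_ok(group) for li in group]


raw = [
    LineItem("c1", "t1", date(2024, 1, 5), "milk", 3.0),
    LineItem("c1", "t1", date(2024, 1, 5), "bread", 2.0),
    LineItem("c1", "t2", date(2024, 2, 1), "milk", -3.0),  # a return
    LineItem("c2", "t3", date(2024, 1, 9), "gift-card", 25.0),
]

clean = apply_filters(
    raw,
    product_ok=lambda pid: pid != "gift-card",  # product filter
    line_item_ok=lambda li: li.revenue > 0,     # line item filter: drop returns
    transaction_ok=lambda txn: len(txn) >= 2,   # transaction filter
    customer_ok=lambda group: len(group) > 0,   # customer filter
)
```

Expressing each filter as a predicate keeps the rules declarative, so the same pipeline can be reused with different rule sets for different analyses.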
Customization
To create specific insights and/or tailored decisions, the insight/relationship determination module 320 allows customization of the analyses either by customer, e.g. for specific customer segments, or by transactions, e.g. for specific seasons or any combination of the two. This is achieved by applying the analyses on a customization specific sample of the transaction data, instead of the entire data.
1. Customer Customization—Retailers might be interested in customizing the analyses by different customer properties. One of the most common customer properties is the customer segment, which may be created from a combination of demographic, relationship (i.e. how the customer buys at the retailer: recency, frequency, and monetary value (RFMV)), and behavior (i.e. what the customer buys at the retailer) properties associated with the customer. Apart from customer segments, customizations may also be done, for example, based on: customer value (high, medium, low value), customer age (old, new customers), customer membership (whether or not they are members of the retailer's program), customer survey responses, and demographic fields, e.g. region, income level, etc. Comparing the analysis results of the insight/relationship determination module 320 across different customer customizations and across all customers generally leads to valuable insight discovery.
2. Transaction Customization—Retailers might be interested in customization of the analyses by different transaction properties. The two most common transaction customizations are: (a) Seasonal customization and (b) Channel customization. In seasonal customization the retailer might want to analyze customer behavior in different seasons and compare that to the overall behavior across all seasons. This might be useful for seasonal products, such as Christmas gifts or school supplies, etc. Channel customization might reveal different customer behaviors across different channels, such as store, web site, phone, etc.
Together, all these customizations may result in specific insights and accurate decisions regarding offers of the right products to the right customers at the right time through the right channel. At the end of the data pre-processing stage, the raw transaction data are cleaned and sliced into a number of processed transaction data sets, each associated with a different customization. Each of these now serves as a possible input to the next stages in the insight/relationship determination module 320 process.
Pair-Wise Contextual Co-Occurrences
As defined herein, the insight/relationship determination module 320 seeks pair-wise relationships between entities in specific contexts. In the following discussion, the notion of context is described in detail, especially as it applies to the retail domain. For each type of context, the notion of a context instance, a basic data structure extracted from the transaction data, is described. These context instances are used to count how many times a product pair co-occurred in a context instance. These co-occurrence counts are then used in creating pair-wise relationships between products.
Definition of a Context
The concept of context is fundamental to the framework of the insight/relationship determination module 320. A context is a way of defining the nature of the relationship between two entities by way of their juxtaposition in the transaction data. The types of available contexts depend on the domain and the nature of the transaction data. In the retail domain, where the transaction data are a time-stamped sequence of market baskets, there are a number of ways in which two products may be juxtaposed in the transaction data. For example, two products may be purchased in the same visit, e.g. milk and bread; or one product may be purchased three months after another, e.g. a printer purchased three months after a PC; or a product might be purchased within six months of another product, e.g. a surround sound system purchased within six months of a plasma TV; or a product may be purchased between two and four months after another, e.g. a cartridge purchased between two and four months after a printer or a previous cartridge. The insight/relationship determination module 320 retail mining framework is context rich, i.e. it supports a wide variety of contexts that may be grouped into two types as shown in
For every context, the insight/relationship determination module 320 uses a three step process to quantify pair-wise co-occurrence consistencies for all product pairs: (α,β)∈ Ul×Ul for each level l at which the analysis is to be done in the insight/relationship determination module 320.
1. Create context instances from the filtered and customized transaction data slice,
2. Count the number of times the two products co-occurred in those context instances, and
3. Compute information theoretic measures to quantify consistency between them.
These three steps are described for both the market basket and purchase sequence contexts next.
Market Basket Context
Almost a decade of research in retail data mining has focused on market basket analysis. Traditionally, a market basket is defined as the set of products purchased by a customer in a single visit. In the insight/relationship determination module 320, however, a market basket context instance is defined as a SET of products purchased on one or more consecutive visits. This definition generalizes the notion of a market basket context in a systematic, parametric way. The set of all products purchased by a customer (i) in a single visit, or (ii) in consecutive visits within a time window of (say) two weeks, or (iii) all visits of a customer are all valid parameterized instantiations of different market basket contexts. A versatile retail mining framework should allow such a wide variety of choices for a context for several reasons:
For a given market basket definition, conventional association rule mining techniques try to find high-support and high-confidence item-sets. As mentioned above, these approaches fail for two fundamental reasons. First, the logical product bundles or item-sets typically do not occur in full, because the transaction data are only a projection of logical behavior. Second, using frequency in a domain where different products have different purchase frequencies leads to a large number of spurious item-sets. The framework of the insight/relationship determination module 320 corrects these problems as described above. Consider the first two steps of creating pair-wise co-occurrence counts for the market basket context.
Creating Market Basket Context Instances
A parametric market basket context is defined by a single parameter: window width: ω. Technique 1 below describes how the insight/relationship determination module 320 creates market basket context instances, Bn, given:
The technique returns a (possibly empty) set of market basket context instances or a set of market baskets, B=Bn(ω). The parameter tlast is clarified later when we show how this function is used for the initial co-occurrence count and incremental co-occurrence updates since the last update.
The basic idea of Technique 1 is as follows: Consider a customer's transaction data shown in
Creating Market Basket Co-Occurrence Counts
The insight/relationship determination module 320 maintains the following four counts for each product level l at which the market basket analysis is done.
Note that the market basket context results in a symmetric co-occurrence count matrix. Also, the diagonal elements of the matrix are zero because the co-occurrence of a product with itself is not a useful quantity to define. A threshold is applied to each count such that, if the count is less than the threshold, it is considered zero. Also note that the single-visit market basket used in traditional market basket analysis tools is a special parametric case: ω=0.
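The parametric market basket context and its counts can be illustrated with a short sketch. It is not Technique 1 itself. It assumes one context instance per visit, formed as the union of all baskets bought within ω days of that visit, and it assumes the total counts context instances. The function names and the sample visits are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

Visit = Tuple[int, Set[str]]  # (day number, products bought on that visit)


def market_basket_instances(visits: Sequence[Visit], omega: int) -> List[FrozenSet[str]]:
    """Assumed windowing: one instance per visit, merging visits within omega days."""
    ordered = sorted(visits, key=lambda v: v[0])
    instances = []
    for i, (t_start, _) in enumerate(ordered):
        basket: Set[str] = set()
        for t, products in ordered[i:]:
            if t - t_start > omega:
                break
            basket |= products
        instances.append(frozenset(basket))
    return instances


def co_occurrence_counts(instances: Sequence[FrozenSet[str]]):
    """Return (total, margin counts, pair counts) over the context instances."""
    total = 0
    margin: Counter = Counter()
    pair: Counter = Counter()
    for basket in instances:
        total += 1
        margin.update(basket)  # each product counted once per instance
        for a, b in combinations(sorted(basket), 2):
            pair[(a, b)] += 1  # stored once with a < b; no diagonal entries
    return total, margin, pair


visits = [(0, {"milk", "bread"}), (3, {"eggs"}), (20, {"milk"})]
single_visit = market_basket_instances(visits, omega=0)  # traditional baskets
total, margin, pair = co_occurrence_counts(market_basket_instances(visits, omega=7))
```

Storing each unordered pair once reflects the symmetry of the matrix, and never pairing a product with itself reflects the zero diagonal. The count threshold described above is omitted for brevity.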
Purchase Sequence Context
While the market basket context is ubiquitous in the retail mining literature, it discards temporal information that establishes contexts across time: it ignores that information when single visits are used as market baskets, and loses it when consecutive visits are merged into market baskets. These purchase sequence contexts, as they are called in the insight/relationship determination module 320, may be critical in making not only precise decisions about what product to offer a particular customer, but also timely decisions about when the product should be offered. For example, in the grocery domain, one group of customers might buy milk every week while another group might buy milk once a month. For electronics retailers, where this is even more useful, one group of customers might use cartridges more quickly than others, or change their cell phones more frequently than others. Further, there might be important temporal relationships between two or more products, for example a PC purchase, followed by a new printer purchase, followed by the first cartridge purchase. There might be consistent product phrases that may result in important insights and in forecasting or prediction decisions about customers. The purchase sequence context type in the insight/relationship determination module 320 makes such analyses possible.
Creating Purchase Sequence Context Instances
Unlike a market basket context instance, which is simply a market basket or a single set of products, the purchase sequence context instance is a triplet ⟨a, b, Δt⟩ with three elements:
The time t in the transaction data is in days. Typically, it is not useful to create purchase sequence contexts at this resolution, because there may not be enough data at this resolution and because it may be finer than the resolution at which the retailer can make actionable decisions. Therefore, to allow a different time resolution, we introduce a parameter ρ that quantifies the number of days in each time unit (Δt). For example, if ρ=7, the purchase sequence context is computed at week resolution. Technique 2, below, describes the technique for creating a set of purchase sequence context instances, given:
The time in days is converted into the time units in Technique 2 using the function:
The technique returns a (possibly empty) set of purchase sequence context instances, i.e. a set of triplets ⟨a, b, Δt⟩, P=Pn(ρ). Again, the parameter tlast is clarified later when we show how this function is used for the initial co-occurrence count and incremental co-occurrence updates since the last update.
p ← p−1; // Skip
Break;
aq ← aq ∪ M(xp(n));
P ← P ⊕ ⟨aq, bq, Δtlast⟩
P ← P ⊕ ⟨aq, bq, Δtlast⟩
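Because only fragments of the listing survive, the following sketch illustrates the idea of purchase sequence context instances without reproducing Technique 2. It assumes that every ordered pair of visits by one customer yields a triplet, that days are converted to time units by floor division by ρ, and that lags above a maximum (named `max_lag` here) are dropped. All names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Set, Tuple

Visit = Tuple[int, Set[str]]  # (day number, products bought on that visit)
Triplet = Tuple[FrozenSet[str], FrozenSet[str], int]  # (from set, to set, lag)


def purchase_sequence_instances(visits: Sequence[Visit], rho: int, max_lag: int) -> List[Triplet]:
    """Emit (earlier basket, later basket, lag in time units) for each visit pair."""
    ordered = sorted(visits, key=lambda v: v[0])
    triplets: List[Triplet] = []
    for p, (t_a, a) in enumerate(ordered):
        for t_b, b in ordered[p + 1:]:
            lag = (t_b - t_a) // rho  # assumed conversion from days to time units
            if lag > max_lag:
                break  # visits are sorted, so later ones are even further away
            triplets.append((frozenset(a), frozenset(b), lag))
    return triplets


def sequence_counts(triplets: Sequence[Triplet]) -> Counter:
    """Non-symmetric counts indexed by (from product, to product, lag)."""
    counts: Counter = Counter()
    for a_set, b_set, lag in triplets:
        for a in a_set:
            for b in b_set:
                counts[(a, b, lag)] += 1
    return counts


history = [(0, {"pc"}), (10, {"printer"}), (95, {"cartridge"})]
weekly = purchase_sequence_instances(history, rho=7, max_lag=12)
counts = sequence_counts(weekly)
```

With ρ=7 the printer follows the PC at a lag of one week, and the PC-to-cartridge pair is dropped because its lag exceeds the assumed maximum.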
Creating Purchase Sequence Co-Occurrence Counts
In the market basket context, a symmetric 2-D matrix with a zero diagonal maintains the co-occurrence counts. In the purchase sequence context, a non-symmetric, three-dimensional matrix is used to hold the co-occurrence counts. The insight/relationship determination module 320 maintains the following matrices for the purchase sequence co-occurrence counts:
Note that:
Initial vs. Incremental Updates
Transaction data are collected on a daily basis as customers shop. When in operation, the insight/relationship determination module 320 co-occurrence count engine uses an initial computation of the four counts: totals, margins, and co-occurrence counts using one pass through the transaction data. After that incremental updates may be done on a daily, weekly, monthly, or quarterly basis depending on how the incremental updates are set up.
The time complexity of the initial update is
and the time complexity of the incremental update is
where In is the number of new transactions since the last update.
The insight/relationship determination module 320 framework does not use the raw co-occurrence counts (in either context) because the frequency counts do not normalize for the margins. Instead, the insight/relationship determination module 320 uses consistency measures based on information theory and statistics. A number of researchers have created a variety of pair-wise consistency measures with different biases that are available for use in the insight/relationship determination module 320. The following discussion describes how these consistency matrices may be computed from the sufficient statistics that have already been computed in the co-occurrence counts.
Definition of Consistency
Instead of using frequency of co-occurrence, consistency is used to quantify the strength of relationships between pairs of products. Consistency is defined as the degree to which two products are more likely to be co-purchased in a context than they are likely to be purchased independently. There are a number of ways to quantify this definition. The four counts, i.e. the total, the two margins, and the co-occurrence, are the sufficient statistics needed to compute pair-wise co-occurrence consistency.
In terms of these sets,
$\eta(\alpha,\beta) = |A \cap B|;\quad \eta(\cdot,\cdot) = |T|$
$\eta(\alpha,\cdot) = |A|;\quad \eta(\cdot,\beta) = |B|$
In the left and the right Venn diagrams, the overlap between the two sets is the same. However, in case of sets A′ and B′, the relative size of the overlap compared to the sizes of the two sets is higher than that for the sets A and B and hence by our definition, the consistency between A′, B′ is higher than the consistency between A, B.
For the purchase sequence context, the four counts are available at each time-lag therefore all the equations above and the ones that follow can be generalized to purchase sequence as follows: η(*,*)→η(*,*|Δτ), i.e. all pair-wise counts are conditioned on the time-lag in the purchase sequence context.
Co-Occurrence Counts: Sufficient Statistics
The counts, i.e. total, the margin(s), and the co-occurrence counts, are sufficient statistics to quantify all the pair-wise co-occurrence consistency measures in insight/relationship determination module 320. From these counts, the following probabilities can be computed:
There are three caveats in these probability calculations. First, if any of the co-occurrence or margin counts is less than a threshold, it is treated as zero. Second, smoothed versions of the counts may be used; these are not shown in the equations. Finally, if data sparsity leaves too few counts, smoothing from coarser class levels may also be applied.
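As an illustration of how the counts turn into probabilities, the sketch below divides each count by the total and applies the threshold caveat. It assumes the usual relative-frequency estimates. The function name and the example numbers are hypothetical, and no smoothing is applied.

```python
from typing import Tuple


def probabilities(total: int, margin_a: int, margin_b: int, cooc: int,
                  threshold: int = 0) -> Tuple[float, float, float]:
    """Return (P(a,b), P(a), P(b)) as relative frequencies of the four counts."""
    if total <= 0:
        raise ValueError("total count must be positive")

    def clip(count: int) -> int:
        # Counts below the threshold are treated as zero.
        return count if count >= threshold else 0

    return clip(cooc) / total, clip(margin_a) / total, clip(margin_b) / total


p_ab, p_a, p_b = probabilities(total=100, margin_a=20, margin_b=10, cooc=5, threshold=3)
```

For the purchase sequence context the same function would be called once per time-lag, using the counts conditioned on that lag.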
Consistency Measures Library
There are a number of measures of interestingness that have been developed in statistics, machine learning, and data mining communities to quantify the strength of consistency between two variables. All these measures use the probabilities discussed above. Examples of some of the consistency measures are given below.
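$\Phi = \{\phi(\alpha,\beta)\} : \forall \alpha,\beta \in U_l$
$\phi(\alpha,\beta) = f\big(\eta(\cdot,\cdot),\ \eta(\alpha,\cdot),\ \eta(\cdot,\beta),\ \eta(\alpha,\beta)\big)$
$\Phi = \{\phi(\alpha,\beta;\Delta\tau)\} : \forall \alpha,\beta \in U_l,\ \Delta\tau \in \{0 \ldots \Delta T\}$
$\phi(\alpha,\beta;\Delta\tau) = f\big(\eta(\cdot,\cdot;\Delta\tau),\ \eta(\alpha,\cdot;\Delta\tau),\ \eta(\cdot,\beta;\Delta\tau),\ \eta(\alpha,\beta;\Delta\tau)\big)$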
Before we go into the list of consistency measures, it is important to note some of the ways in which we can characterize a consistency measure. While all consistency measures normalize for product priors in some way, they may be:
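$\phi(\alpha,\beta) = \phi(\beta,\alpha)$ (Symmetric Market Basket Consistency)
$\phi(\alpha|\beta) \neq \phi(\beta|\alpha)$ (Asymmetric Market Basket Consistency)
$\phi(\alpha,\beta;\Delta t) = \phi(\beta,\alpha;-\Delta t)$ (Symmetric Purchase Sequence Consistency)
$\phi(\alpha|\beta;\Delta t) \neq \phi(\beta|\alpha;-\Delta t)$ (Asymmetric Purchase Sequence Consistency)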
These properties are highlighted as appropriate for each of the consistency measures in the library. For the sake of brevity, in the rest of this discussion, we use the following shorthand notation for the marginal probabilities:
$P(\alpha,\cdot) \equiv P(\alpha);\quad P(\cdot,\beta) \equiv P(\beta)$
Statistical Measures of Consistency
Pearson's Correlation Coefficient
The correlation coefficient quantifies the degree of linear dependence between two variables, which in our case are binary and indicate the presence or absence of two products. It is defined as:
Comments:
Goodman and Kruskal's λ-Coefficient
λ-coefficient minimizes the error of predicting one variable given the other. Hence, it can be used in both a symmetric and a non-symmetric version:
Asymmetric Versions:
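$M(\alpha|\beta) = \max\{P(\alpha,\beta),\ P(\bar{\alpha},\beta)\}$
$M(\beta|\alpha) = \max\{P(\alpha,\beta),\ P(\alpha,\bar{\beta})\}$
$M(\alpha) = \max\{P(\alpha),\ P(\bar{\alpha})\}$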
Symmetric Versions:
Comments:
Odds Ratio and Yule's Coefficients
The odds ratio compares the odds of two products both occurring or both not occurring with the odds of one occurring and the other not occurring. The odds ratio is given by:
The odds ratio may be unbounded, and hence two other measures based on the odds ratio are also proposed:
Yule's Q:
Yule's Y:
Piatetsky-Shapiro's
φ(α|β)=P(α,β)−P(α)P(β)
Added Value
Klosgen
Certainty Coefficients
Asymmetric Versions:
Symmetric Version:
Data Mining Measures of Consistency
Support
φ(α,β)=P(α,β)
Confidence
Asymmetric Version:
Symmetric Version:
Conviction
Asymmetric Version:
Symmetric Version:
Interest and Cosine
Collective Strength
Information Theoretic Measures of Consistency
Point-Wise Mutual Information
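Point-wise mutual information is conventionally the logarithm of the ratio between the joint probability and the product of the marginals. The sketch below computes it together with a few of the other measures named in this library, using their standard textbook forms. These forms are assumptions; the normalization used in the source may differ, and the function name and dictionary keys are hypothetical.

```python
import math
from typing import Dict


def consistency_measures(p_ab: float, p_a: float, p_b: float) -> Dict[str, float]:
    """Textbook forms of several pair-wise measures from P(a,b), P(a), P(b)."""
    if not (0 < p_a <= 1 and 0 < p_b <= 1):
        raise ValueError("marginal probabilities must lie in (0, 1]")
    independent = p_a * p_b  # joint probability expected under independence
    return {
        "support": p_ab,
        "confidence_a_given_b": p_ab / p_b,
        "piatetsky_shapiro": p_ab - independent,
        "interest": p_ab / independent,
        "cosine": p_ab / math.sqrt(independent),
        # PMI is undefined for a zero joint probability; -inf is used here.
        "pmi": math.log(p_ab / independent) if p_ab > 0 else float("-inf"),
    }


measures = consistency_measures(p_ab=0.05, p_a=0.2, p_b=0.1)
```

Every measure here compares the observed joint probability with the marginals in some way, which is what distinguishes consistency from raw co-occurrence frequency. Support is the exception, which is one reason frequency-based item-set mining produces spurious pairs.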
The Insight/Relationship Determination Module 320 Suite of Applications
The insight/relationship determination module 320 includes a general framework that allows formulation and solution of a number of different problems in retail. For example, it may be used to solve problems as varied as:
(i) customer segmentation using pair-wise similarity relationships between customers,
(ii) creating product bundles or consistent item-sets using pair-wise consistency between products purchased in market basket context, or
(iii) predicting the time and product of the next possible purchase of a customer using pair-wise consistency between products purchased in a purchase sequence context.
From a technology perspective, the various applications of the insight/relationship determination module 320 are divided into three categories:
The insight/relationship determination module 320 Product consistency graphs are the internal representation of the pair-wise co-occurrence consistency relationships created by the process described above. Once the graph is created, the insight/relationship determination module 320 uses graph theoretic and machine learning approaches to find patterns of interest in these graphs. While we could use the pair-wise relationships as such to find useful insights, the real power of the insight/relationship determination module 320 comes from its ability to create higher order structures from these pair-wise relationships in a very novel, scalable, and robust manner, resulting in tremendous generalization that is not possible to achieve by purely data driven approaches. The following discussion focuses on four important higher-order-structures that might constitute actionable insights:
1. Product neighborhood,
2. product bundles,
3. bridge structures, and
4. product phrases.
Before discussing these structures further, we define a useful abstraction called the Product Space.
Product Space Abstraction
The notion of product space was introduced above as a collection of products and their properties. Now having a way to quantify connection strength (co-occurrence consistency) between all pairs of products, this can be used to create a discrete, finite, non-metric product space where:
Product Neighborhood
The simplest kind of insight about a product concerns the products most consistently sold with the target product in the insight/relationship determination module 320 graph, i.e. the products nearest to it in the Product Space abstraction. This type of insight is captured in the product neighborhood analysis of the insight/relationship determination module 320 graph.
Definition of a Product Neighborhood
The neighborhood of a product is defined as an ordered set of products that are consistently co-purchased with it and satisfying all the neighborhood constraints. The neighborhood of a product γ is denoted by Nλ(γ|Φ), where:
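$N_\lambda(\gamma|\Phi) = \{x_1, x_2, \ldots, x_K\}$
$\phi(\gamma, x_k) \geq \phi(\gamma, x_{k+1}) : \forall k = 1 \ldots K-1$
$g_{scope}(x_k, \lambda_{scope}) = \mathrm{TRUE} : \forall k = 1 \ldots K$
$g_{size}(N_\lambda(\gamma|\Phi), \lambda_{size}) = \mathrm{TRUE} : \forall k = 1 \ldots K$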
Note that the set is ordered by the consistency between the target product and the neighborhood products: the most consistent product is the first neighbor of the target product, and so on. Also note that there are two kinds of constraints associated with a neighborhood:
Scope Constraint: This constraint filters the scope of the products that may or may not be part of the neighborhood. Essentially, these scope-filters are based on product properties and the parameter λscope encapsulates all the conditions. For example, someone might be interested in the neighborhood to be limited only to the target product's department or some particular department or to only high value products or only to products introduced in the last six months, etc. The function gscope(x,λscope) returns a true if the product x meets all the criteria in λscope.
Size Constraint: Depending on the nature of the context used, the choice of the consistency measure, and the target product itself the size of the product neighborhood might be large even after applying the scope constraints. There are three ways to control the neighborhood size:
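$g_{size}(N_\lambda(\gamma|\Phi), \lambda_{size}^{limit}) = |N_\lambda(\gamma|\Phi)| = K \leq \lambda_{size}^{limit}$
$g_{size}(N_\lambda(\gamma|\Phi), \lambda_{size}^{absolute\text{-}threshold}) = \phi(\gamma, x_K) \geq \lambda_{size}^{absolute\text{-}threshold}$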
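The neighborhood definition and its constraints can be sketched as a filter, a sort, and a cut. The example covers only the two size controls whose formulas survive above, the size limit and the absolute threshold. The dictionary layout, the function name, and the sample values are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple


def product_neighborhood(
    target: str,
    consistency: Dict[Tuple[str, str], float],
    in_scope: Callable[[str], bool],
    size_limit: Optional[int] = None,
    absolute_threshold: Optional[float] = None,
) -> List[str]:
    """Neighbors of target, ordered by decreasing consistency, under constraints."""
    candidates = [
        (x, phi)
        for (g, x), phi in consistency.items()
        if g == target and x != target and in_scope(x)  # scope constraint
    ]
    # Most consistent first; ties broken by product name for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    if absolute_threshold is not None:
        candidates = [(x, phi) for x, phi in candidates if phi >= absolute_threshold]
    if size_limit is not None:
        candidates = candidates[:size_limit]
    return [x for x, _ in candidates]


phi = {
    ("tv", "soundbar"): 0.9,
    ("tv", "hdmi"): 0.7,
    ("tv", "mount"): 0.6,
    ("tv", "socks"): 0.1,
}
top_two = product_neighborhood("tv", phi, in_scope=lambda x: x != "mount", size_limit=2)
strong = product_neighborhood("tv", phi, in_scope=lambda x: True, absolute_threshold=0.5)
```

Applying the scope constraint before the size constraint means the size limit counts only products that are eligible in the first place.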
Business Decisions Based on Product Neighborhoods
Product neighborhoods may be used in several retail business decisions. Examples of some are given below:
Neighborhood Based Product Properties
As discussed above, a number of direct and indirect product properties were introduced. The direct properties, such as manufacturer, hierarchy level, etc., are part of the product dictionary. Indirect properties, such as total revenue, margin percent per customer, etc., may be derived by simple online analytical processing (OLAP) statistics on transaction data. In the following discussion, two more product properties that are based on the neighborhood of the product in the product graph are introduced: Value-based Product Density and Value-based Product Diversity.
Value-Based Product Density
If the business goal for the retailer is to increase the sale of high margin products or high revenue products, a direct approach would be to promote those products more aggressively. An indirect approach would be to promote those products that influence the sale of high margin or high revenue products. This principle can be generalized whereby if the business goal is related to a particular product property then a value-based product density based on its product neighborhood may be defined for each product.
For a given product neighborhood, i.e. neighborhood constraints, consistency measure, and product value-property v (revenue, frequency, etc.), the value-density of a product is defined as the following linear combination:
Where:
An example of the Gibbs weight function is:
The parameter θ2 can be interpreted as the temperature for the Gibbs distribution. When the parameter θ1=0, the weights are normalized; otherwise, the weights take the consistency into account.
Value-based product densities may be used in a number of ways. In the recommendation engine post processing, for example, the value-based density may be used to adjust the recommendation score for different objective functions.
Value-Based Product Diversity
Sometimes the business objective of a retailer is to increase the diversity of a customer's shopping behavior, i.e. if the customer shops in only one department or category of the retailer, then one way to increase the customer's wallet share is to diversify the customer's purchases into other related categories. This can be accomplished in several ways, for example, by increasing (a) cross-traffic across departments, (b) cross-sell across multiple categories, or (c) diversity of the market basket. The graphs of the insight/relationship determination module 320 may be used to define the value-based product diversity of each product. In recommendation engine post-processing, this score may be used to push high diversity score products to specific customers.
For every product γ, product property ν, and product level l above the level of product γ, value based product diversity is defined as the variability in the product density along different categories at level l:
D_v(γ|λscope=u_l,Φ,θ) = D_v(γ|m,Φ,θ) : ∀m ∈ {1, . . . , M_l}
Diversity should be low (say zero) if all the neighbors of the product are in the same category as the product itself; otherwise the diversity is high. An example of such a function is:
Product Bundles
One of the most important types of insight in retail pertains to product affinities, or groupings of products that are “co-purchased” in the same context. The following discussion describes the application of the insight/relationship determination module 320 in finding what we call “product bundles” in a highly scalable, generalized, and efficient way, such that the results exceed both the quality and the efficiency of traditional frequency based market basket approaches. A large body of research in market-basket-analysis is focused on efficiently finding frequent item-sets, i.e. sets of products that are purchased in the same market basket. The support of an item-set is the number of market baskets in which it or its superset is purchased. The confidence of any subset of an item-set is the conditional probability that the subset will be purchased, given that the complementary subset is purchased. Techniques have been developed for breadth-first search of high support item-sets. For the reasons explained above, the results of such analysis have been largely unusable, because this frequency based approach misses the fundamental observation that customer behavior is a mixture of projections of latent behaviors. As a result, to find one actionable and insightful item-set, the support threshold has to be lowered so far that typically millions of spurious item-sets have to be examined.
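The support and confidence definitions used by the traditional frequency based approach can be made concrete with a short Python sketch. The basket contents and the function names `support` and `confidence` are hypothetical and chosen only for illustration.

```python
from typing import AbstractSet, Sequence


def support(itemset: AbstractSet[str], baskets: Sequence[AbstractSet[str]]) -> int:
    """Number of market baskets that contain the item-set (or a superset of it)."""
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(
    subset: AbstractSet[str],
    itemset: AbstractSet[str],
    baskets: Sequence[AbstractSet[str]],
) -> float:
    """P(subset is purchased | the complementary subset is purchased)."""
    complement = itemset - subset
    denominator = support(complement, baskets)
    return support(itemset, baskets) / denominator if denominator else 0.0


# Hypothetical transaction data: four market baskets.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
]
pair = {"bread", "butter"}
print(support(pair, baskets))
print(confidence({"butter"}, pair, baskets))
```

Every support computation here scans the baskets, which is the per-iteration pass through the data that the pair-wise consistency approach avoids.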
The insight/relationship determination module 320 uses transaction data to first create only pair-wise co-occurrence consistency relationships between products. These are then used to find logical bundles of more than two products. Product bundles of the insight/relationship determination module 320 and item-sets based on the standard market basket analysis technique are both product sets, but they are very different in the way they are created and characterized.
Definition of a Logical Product Bundle
A product bundle for the insight/relationship determination module 320 may be defined as a Soft Clique (completely connected sub-graphs) in the weighted graph of the insight/relationship determination module 320, i.e. a product bundle is a set of products such that the co-occurrence consistency strength between all pairs of products is high.
The insight/relationship determination module 320 uses a measure called bundleness to quantify the cohesiveness or compactness of a product bundle. The cohesiveness of a product bundle is considered high if every product in the product bundle is highly connected to every other product in the bundle. The bundleness in turn is defined as an aggregation of the contribution of each product in the bundle. There are two ways in which a product contributes to a bundle in which it belongs: (a) It can either be the principal or driver or causal product for the bundle or (b) it can be the peripheral or accessory product for the bundle. For example, in the bundle shown in
In general, the seedness of a product in a bundle is defined as the contribution or density of this product in the bundle. Thus the bundleness quantification is a two step process. In the first, seedness computation stage, the seedness of each product is computed and in the second, seedness aggregation stage, the seedness of all products is aggregated to compute the overall bundleness.
Seedness Computation
The seedness of a product in a bundle is loosely defined as the contribution or density of a product to a bundle. There are two roles that a product may play in a product bundle:
Borrowing terminology from the analysis of Web structure, Kleinberg's Hubs and Authorities formulation is used in the seedness computation as follows:
Φ(x)=[φi,j=φ(xi,xj)]
a(x|Φ)=(a1=a(x1|x,Φ), . . . , ai=a(xi|x,Φ), . . . , an=a(xn|x,Φ))
h(x|Φ)=(h1=h(x1|x,Φ), . . . , hi=h(xi|x,Φ), . . . , hn=h(xn|x,Φ))
These scores are initially set to 1 for all the products and are iteratively updated based on the following definitions: the Authority (Influencer) score of a product is high if it receives high support from important hubs (followers), and the Hubness score of a product is high if it gives high support to important authorities.
a,h=GenerateSeedness(x,Φ,εmin)
Normalize Hubness and Update Authority Measure
Normalize Authority and Update Hubness Measure
Technique 3: Computing the Hubs (Follower Score) and Authority (Influencer Score) in a Product Bundle
The hub and authority measures converge to the first eigenvectors of the following matrices:
a ≡ a^(∞) ← eig_1[Φ(x)Φ(x)^T]
h ≡ h^(∞) ← eig_1[Φ(x)^TΦ(x)]
Where: Φ(x)=[φi,j=φ(xi|xj)]
If the consistency matrices are symmetric, the hubs and authority scores are the same. If they are non-symmetric, the hubs and authority measures are different. We only consider symmetric consistency measures and hence would only consider authority measures to quantify bundleness of a product bundle.
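The iterative hub and authority update can be sketched in plain Python as follows. This is a minimal sketch rather than the module's implementation: the function name `generate_seedness` mirrors the GenerateSeedness signature above, while the update orientation (authority from Φh, hubness from Φ^T a, chosen to agree with the eigenvector expressions above), the Euclidean normalization, the stopping rule, and the sample matrix are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (unchanged if all zeros)."""
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def generate_seedness(
    phi: Sequence[Sequence[float]], eps_min: float = 1e-9, max_iter: int = 1000
) -> Tuple[List[float], List[float]]:
    """Power iteration for authority (a) and hub (h) scores of a bundle."""
    n = len(phi)
    a = [1.0] * n  # all scores start at 1
    h = [1.0] * n
    for _ in range(max_iter):
        # Normalize hubness and update authority: a = Phi h.
        h = _normalize(h)
        new_a = [sum(phi[i][j] * h[j] for j in range(n)) for i in range(n)]
        # Normalize authority and update hubness: h = Phi^T a.
        new_a = _normalize(new_a)
        new_h = [sum(phi[i][j] * new_a[i] for i in range(n)) for j in range(n)]
        change = max(abs(x - y) for x, y in zip(new_a, a))
        a, h = new_a, new_h
        if change < eps_min:  # stop once the authority scores are stable
            break
    return a, _normalize(h)


# Hypothetical symmetric consistency matrix for a three-product bundle.
phi = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.0],
]
authority, hubness = generate_seedness(phi)
print([round(v, 3) for v in authority])
```

Because the sample matrix is symmetric, the two score vectors coincide, which matches the observation that only authority scores are needed for symmetric consistency measures.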
Seedness Aggregation
There are several ways of aggregating the seedness values of all the products in the product bundle. The insight/relationship determination module 320 uses a Gibbs aggregation for this purpose:
Different settings of the temperature parameter λ yield different aggregation functions:
Although this defines a wide range of bundleness functions, by the definition of cohesiveness, i.e. every product should be highly connected to every other product in the product bundle, the most appropriate definition of bundleness would be based on the minimum temperature:
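One common form of Gibbs aggregation is a softmax-weighted average, sketched below in Python. The exact aggregation formula is not reproduced in this text, so this form, the function name `gibbs_bundleness`, and the sample seedness values are assumptions. Under this form, the temperature λ interpolates between the minimum (λ → −∞), the mean (λ = 0), and the maximum (λ → +∞) of the seedness values.

```python
import math
from typing import Sequence


def gibbs_bundleness(seedness: Sequence[float], lam: float) -> float:
    """Softmax-weighted average of seedness values with temperature `lam`."""
    if lam == float("inf"):
        return max(seedness)
    if lam == float("-inf"):
        return min(seedness)
    # Subtract the largest exponent for numerical stability.
    shift = max(lam * a for a in seedness)
    weights = [math.exp(lam * a - shift) for a in seedness]
    return sum(w * a for w, a in zip(weights, seedness)) / sum(weights)


# Hypothetical seedness values of three products in a bundle.
seedness = [0.2, 0.5, 0.8]
for lam in (float("-inf"), -50.0, 0.0, 50.0, float("inf")):
    print(lam, round(gibbs_bundleness(seedness, lam), 4))
```

With the minimum temperature, the bundleness equals the seedness of the weakest product, so a bundle scores high only if every product in it is well connected.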
Techniques for Finding Cohesive Product Bundles
Similar to automated item-set mining, the insight/relationship determination module 320 includes an affinity analysis engine that automatically finds high consistency, cohesive product bundles given the above definition of cohesiveness and a market basket co-occurrence consistency measure. Essentially, the goal is to find these optimal soft-cliques in the graphs of the insight/relationship determination module 320. First, the meaning of optimal in the context of a product bundle is defined, and it is noted that finding such bundles is an NP-hard problem. Following this, two broad classes of greedy techniques are described: depth first and breadth first methods.
Problem Formulation
The overall problem of finding all cohesive product bundles in a product space may be formulated in terms of the following simple problem: Given
F ⊂ C ⊂ U
F=Ø, C=U: All bundles at the product level of the universe
F=C: One bundle: F
The problem is to find a set of all locally optimal product bundles x={x1, . . . , xn} of size two or more such that:
F ⊂ x ⊂ C
π(x|Φ)≧π(x′|Φ):∀x′ ∈ BNeb(x|F,C)
Where:
The bundle-neighborhood of a bundle is the set of all feasible bundles that may be obtained by either removing a non-foundation product from it or by adding a single candidate product to it.
BNeb(x|F,C)=BNebGrow(x|F,C)∪ BNebShrink(x|F,C)
BNebGrow(x|F,C)={x′=x⊕x:∀x ∈ C−x}
BNebShrink(x|F,C)={x′=x\x:∀x ∈ x−F}
In other words, a bundle x is a local optimum for a given candidate set C if:
The definition of a bundle as a subset of products bounded by the foundation set F (as a subset of every product bundle) and a candidate set C (as a superset of every product bundle), together with the definition of the neighborhood function defined above, results in an abstraction called the Bundle Lattice-Space (BLS).
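The grow and shrink neighborhoods and the local-optimality condition defined above can be sketched in Python with frozensets. The bundleness function used here (minimum pair-wise consistency, with 0 for fewer than two products), the consistency values, and all function names are assumptions made only for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Set

Bundle = FrozenSet[str]


def bneb_grow(x: Bundle, candidates: Bundle) -> Set[Bundle]:
    """Bundles obtained by adding one candidate product not already in x."""
    return {x | {p} for p in candidates - x}


def bneb_shrink(x: Bundle, foundation: Bundle) -> Set[Bundle]:
    """Bundles obtained by removing one non-foundation product from x."""
    return {x - {p} for p in x - foundation}


def bneb(x: Bundle, foundation: Bundle, candidates: Bundle) -> Set[Bundle]:
    """Full bundle-neighborhood: union of the grow and shrink neighborhoods."""
    return bneb_grow(x, candidates) | bneb_shrink(x, foundation)


def is_local_optimum(
    x: Bundle, foundation: Bundle, candidates: Bundle,
    bundleness: Callable[[Bundle], float],
) -> bool:
    """True if no bundle in the neighborhood has strictly higher bundleness."""
    return all(bundleness(x) >= bundleness(n) for n in bneb(x, foundation, candidates))


# Hypothetical pair-wise consistency values for four products.
PHI = {
    frozenset("ab"): 0.9, frozenset("ac"): 0.8, frozenset("bc"): 0.85,
    frozenset("ad"): 0.1, frozenset("bd"): 0.2, frozenset("cd"): 0.1,
}


def min_pair_bundleness(x: Bundle) -> float:
    """Assumed bundleness: weakest pair-wise consistency within the bundle."""
    if len(x) < 2:
        return 0.0
    return min(PHI[frozenset(pair)] for pair in combinations(sorted(x), 2))


F: Bundle = frozenset()
C: Bundle = frozenset("abcd")
print(is_local_optimum(frozenset("ab"), F, C, min_pair_bundleness))
print(is_local_optimum(frozenset("abc"), F, C, min_pair_bundleness))
```

In this toy data the bundle {a, b} is locally optimal, while {a, b, c} is not, because removing c raises the minimum pair-wise consistency.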
The BNebGrow and BNebShrink sets may be further partitioned into two subsets each, depending on whether the neighboring bundle has a higher or lower bundleness as factored by a slack-parameter θ:
The condition for optimality may be stated in a number of ways:
For a given candidate set C and foundation set F, there are O(2^(|C|−|F|)) possible bundles to evaluate in an exhaustive approach. Finding a locally optimal bundle is NP-complete because the Clique problem is a special case of it: in the simple case, the Authority measure (used to calculate the bundleness metric) is “1” or “0”, depending on whether a node is fully connected to the other nodes in the bundle. The Clique problem (determining whether a graph has a clique of a certain size K) is NP-complete.
Depth First Greedy Techniques
The depth first class of techniques starts with a single bundle and applies a sequence of grow and shrink operations to find as many locally optimal bundles as possible. In addition to the consistency matrix, Φ, the candidate set, C, and the foundation set, F, a depth first bundle search technique also requires: (1) a Root Set, R, containing root-bundles from which to start each depth search, and (2) an Explored Set, Z, containing the product bundles that have already been explored. A typical depth first technique starts by creating a root-set. From this root-set, it picks one root at a time and performs a depth first search on it by adding or deleting a product until a local optimum is reached. In the process, it may create additional root-bundles and add them to the root set. The process finishes when all the roots have been exhausted. Technique 4 below describes how the insight/relationship determination module 320 uses the depth first search to create locally optimal product bundles.
A key observation that makes this technique efficient is that for each bundle x, any of its neighbors in the lattice space with bundleness less than the bundleness of x cannot be a local optimum. This is used to quickly prune a number of bundles and speed up the search. The technique is further sped up by data structures that support quick look-up in the explored set Z and quick retrieval of the maximum from the root set R. The parameter θ controls the stringency of the greediness. It may range from 0 to infinity, with 1 being the typical value.
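A depth first greedy search of this kind might be sketched in Python as below. This is not Technique 4 itself, which is not reproduced here: the function name `depth_first_bundles`, the rule that a neighbor counts as an improvement when its bundleness exceeds θ times the current bundleness, the minimum pair-wise bundleness, and the sample data are all assumptions, and the pruning optimization is omitted for brevity.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set

Bundle = FrozenSet[str]


def depth_first_bundles(
    candidates: Bundle,
    foundation: Bundle,
    bundleness: Callable[[Bundle], float],
    roots: Iterable[Bundle],
    theta: float = 1.0,
) -> Set[Bundle]:
    """Greedy depth first search for locally optimal bundles of size >= 2."""
    explored: Set[Bundle] = set()          # explored set Z
    optima: Set[Bundle] = set()
    root_list: List[Bundle] = list(roots)  # root set R
    while root_list:
        x = root_list.pop()
        while x not in explored:
            explored.add(x)
            grow = {x | {p} for p in candidates - x}
            shrink = {x - {p} for p in x - foundation}
            score = bundleness(x)
            better = [n for n in grow | shrink if bundleness(n) > theta * score]
            if not better:
                if len(x) >= 2:
                    optima.add(x)  # no improving neighbor: local optimum
                break
            # Follow the best neighbor; keep the other improving ones as roots.
            better.sort(key=lambda n: (bundleness(n), sorted(n)), reverse=True)
            root_list.extend(n for n in better[1:] if n not in explored)
            x = better[0]
    return optima


# Hypothetical consistency values; unlisted pairs default to 0.1.
PHI = {
    frozenset("ab"): 0.9, frozenset("ac"): 0.8,
    frozenset("bc"): 0.85, frozenset("de"): 0.7,
}


def bundleness(x: Bundle) -> float:
    """Assumed bundleness: weakest pair-wise consistency within the bundle."""
    if len(x) < 2:
        return 0.0
    return min(PHI.get(frozenset(p), 0.1) for p in combinations(sorted(x), 2))


products: Bundle = frozenset("abcde")
optima = depth_first_bundles(
    products, frozenset(), bundleness, [frozenset(p) for p in sorted(products)]
)
print(sorted(sorted(b) for b in optima if bundleness(b) >= 0.5))
```

With a minimum-based bundleness, weakly connected pairs can also be local optima, so the usage line filters the result by a bundleness threshold (0.5 here, an arbitrary choice).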
Breadth First Greedy Techniques
Another class of greedy techniques for finding locally optimal bundles is the breadth first approach. Here, the search for optimal bundles of size k+1 happens only after all the bundles of size k have been explored. There are two main differences between the approach of the insight/relationship determination module 320 and that used for standard market basket analysis:
1. Quality: the standard market basket analysis technique seeks actual high support item-sets while the insight/relationship determination module 320 seeks logical high consistency bundles. There is a large qualitative difference in the nature, interpretation and usability of the resulting bundles from the two methods. This distinction is already discussed above.
2. Efficiency: the standard market basket analysis technique requires a pass through the data after each iteration to compute the support of each item-set, while the insight/relationship determination module 320 uses the co-occurrence matrix to compute the bundleness without making a pass through the data. This makes the insight/relationship determination module 320 extremely efficient compared to the standard market basket analysis technique.
The breadth-first class of techniques of the insight/relationship determination module 320 for finding locally optimal product bundles starts from the foundation set and, in each iteration, maintains a list of potentially optimal bundles and grows it to the next bundle size. The monotonic property of the standard market basket analysis technique also applies to a class of bundleness functions where the parameter λ is low, for example π−∞(x|Φ). In other words, for such bundleness measures, a bundle may have high bundleness only if all of its subsets of one size less have high bundleness. This property is used, in a way similar to the standard market basket analysis technique, to find locally optimal bundles in Technique 5 described below. In addition to the consistency matrix, Φ, the candidate set, C, and the foundation set, F, a breadth first bundle search technique also requires a Potentials Set, P, of bundles of size s that have the potential to grow into an optimal bundle.
The breadth first and depth first search methods each have their trade-offs in terms of completeness vs. time/space complexity. While the depth first techniques are fast, the breadth first techniques may result in more coverage, i.e. find the majority of locally optimal bundles.
Business Decisions Based on Product Bundles
Product bundles may be used in several retail business decisions as well as in advanced analysis of retail data. Some examples are given below:
Business Projection Scores
Product bundles generated in the insight/relationship determination module 320 represent logical product associations that may or may not exist completely in the transaction data, i.e. a single customer may not have bought all the products in a bundle as part of a single market basket. These product bundles may be analyzed by projecting them along the transaction data and creating bundle projection-scores, defined by a bundle set, a market basket, and a projection scoring function:
The insight/relationship determination module 320 supports a large class of projection-scoring functions, for example:
A market basket can now be represented by a set of K bundle-features:
Such a fixed length, intention level feature representation of a market basket, e.g. single visit, recent visits, entire customer, may be used in a number of applications such as intention-based clustering, intention based product recommendations, customer migration through intention-space, intention-based forecasting, etc.
Bundle Based Product Recommendations
There are two ways of making decisions about which products should be promoted to which customer: (1) product-centric customer decisions about top customers for a given product and (2) customer-centric product decisions about top products for a given customer. Product bundles, in conjunction with customer transaction data and projection scores, may be used to make both types of decisions. Consider, for example, the coverage projection score. If we assume that (1) a product bundle represents a complete intention and (2) a customer eventually buys either all the products associated with an intention or none of them, then if a customer has partial coverage of a bundle, the rest of the products in the bundle may be promoted to the customer. This can be done by first computing a bundle based propensity score for each combination of customer n and product γ, defined as a weighted combination of coverage scores across all available bundles:
Where:
To make product centric customer decisions, we sort the scores across all customers for a particular product in a descending order and pick the top customers. To make customer centric product decisions, all products are sorted for each customer in descending order and top products are picked.
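The two decision types can be illustrated with a small Python sketch. The propensity formula is not reproduced in this text, so the version below is an assumption: coverage is taken as the fraction of a bundle's products already bought, and the propensity for a product is a weight-normalized average of coverage over the bundles containing that product. The bundles, weights, customers, and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set


def coverage(history: Set[str], bundle: FrozenSet[str]) -> float:
    """Fraction of the bundle's products that the customer already bought."""
    return len(history & bundle) / len(bundle) if bundle else 0.0


def propensity(
    history: Set[str], product: str,
    bundles: Sequence[FrozenSet[str]], weights: Sequence[float],
) -> float:
    """Weighted combination of coverage over bundles that contain `product`."""
    if product in history:
        return 0.0  # already bought: nothing left to promote
    num = den = 0.0
    for bundle, w in zip(bundles, weights):
        if product in bundle:
            num += w * coverage(history, bundle)
            den += w
    return num / den if den else 0.0


# Hypothetical bundles with weights, and two customer histories.
bundles = [
    frozenset({"tent", "sleeping_bag", "stove"}),
    frozenset({"stove", "pan", "kettle"}),
]
weights = [1.0, 0.5]
customers: Dict[str, Set[str]] = {
    "alice": {"tent", "sleeping_bag"},
    "bob": {"pan"},
}
all_products = sorted(set().union(*bundles))

# Product-centric customer decision: top customers for "stove".
top_customers: List[str] = sorted(
    customers, key=lambda c: -propensity(customers[c], "stove", bundles, weights)
)
# Customer-centric product decision: top products for "bob".
top_products: List[str] = sorted(
    all_products, key=lambda p: -propensity(customers["bob"], p, bundles, weights)
)
print(top_customers, top_products[:2])
```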
Bridge Structures in the Insight/Relationship Determination Module 320 Graphs
There are two extensions of the product bundle structures: (1) bridge structures, which essentially contain more than one product bundle sharing a very small number of products, and (2) product phrases, which are essentially bundles extended along time. The following discussion focuses on characterizing, discovering, analyzing, and using bridge structures.
Definition of a Logical Bridge Structure
In the insight/relationship determination module 320, a bridge structure is defined as a collection of two or more, otherwise disconnected or sparsely connected product groups, i.e. a product bundle or an individual product, that are connected by a single or small number of bridge product(s). Such structures may be very useful in increasing cross department traffic and strategic product promotions for increased lifetime value of a customer.
Motivation from Polysemy
The key motivation for bridge structures in product graphs from the insight/relationship determination module 320 comes from polysemy in language: a word may have more than one meaning. The right meaning is deduced from the context in which the word is used.
Bridgeness of a Bridge Structure
Earlier a measure of cohesiveness for a bundle i.e. the “bundleness” measure was defined. Similarly, for each bridge structure a measure called bridgeness is defined that depends on two types of cohesiveness measures:
The bridgeness of a bridge structure involving the first kmax groups of the bridge structure is defined to be high if the individual groups are relatively more cohesive i.e. their intra-group cohesiveness is higher, than the cohesiveness across the groups, i.e. their inter-group cohesiveness. Again a number of bridgeness measures can be created that satisfy this definition. For example:
Techniques for Finding Bridge Structure
A large number of graph theoretic techniques, e.g. shortest path, connected components, and network flow based techniques, may be used to find bridge structures as defined above. We describe two classes of techniques to efficiently find bridge structures in the insight/relationship determination module 320 graph: (1) a bundle aggregation technique that uses pre-computed bundles to create bridge structures and (2) a successive bundling technique that starts from scratch and uses depth first search to successively create more bundles to add to the bridge structure.
1. Bundle Overlap Technique
A bridge structure may be defined as a group of two or more bundles that share a small number of bridge products. An ideal bridge contains a single bridge product shared between two large bundles. Let B be the set of bundles found at any product level using the methods described above, from which to create bridge structures. The basic approach is to start with a root bundle, keep adding more and more bundles to it such that there is a non-zero overlap with the current set of bridge products.
This technique is very efficient because it uses pre-computed product bundles and only finds marginally overlapping groups, but it does not guarantee finding structures with high bridgeness and its performance depends on the quality of product bundles used. Finally, although it tries to minimize the overlap between groups or bundles, it does not guarantee a single bridge product.
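The bundle overlap approach can be sketched in Python as follows. This is one possible reading of the description rather than the referenced technique itself: the bridge product set starts as the root bundle, each added bundle must overlap it, the bundle with the smallest non-zero overlap is preferred, and the bridge set shrinks to the shared products. The function name `bundle_overlap_bridge`, the `max_bridge_size` parameter, and the sample bundles are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Bundle = FrozenSet[str]


def bundle_overlap_bridge(
    root: Bundle, bundles: Sequence[Bundle], max_bridge_size: int = 1
) -> Optional[Tuple[Bundle, List[Bundle]]]:
    """Grow a bridge structure from `root` using pre-computed bundles."""
    groups: List[Bundle] = [root]
    bridge = set(root)  # current set of potential bridge products
    remaining = [b for b in bundles if b != root]
    while True:
        # Only bundles with a non-zero overlap with the bridge set qualify.
        overlapping = [b for b in remaining if bridge & b]
        if not overlapping:
            break
        # Prefer the smallest overlap; break ties by product names.
        best = min(overlapping, key=lambda b: (len(bridge & b), sorted(b)))
        bridge &= best
        groups.append(best)
        remaining.remove(best)
    if len(groups) < 2 or len(bridge) > max_bridge_size:
        return None  # no bridge, or too many shared products
    return frozenset(bridge), groups


# Hypothetical pre-computed bundles.
b1 = frozenset({"coffee", "filter", "grinder"})
b2 = frozenset({"coffee", "cake_mix", "frosting"})
b3 = frozenset({"pasta", "sauce"})
result = bundle_overlap_bridge(b1, [b1, b2, b3])
print(result)
```

As noted above, an approach of this kind does not guarantee high bridgeness; a bridgeness measure would still be needed to rank the structures it returns.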
2. Successive Bundling Technique
The bundle aggregation approach depends on pre-created product bundles and, hence, may not be comprehensive, in the sense that not all bundles or groups associated with a group might be discovered, as the search for the groups is limited to the pre-computed bundles. In the successive bundling approach, the starting point is a product that is a potential bridge product. Product bundles are grown using the depth first approach such that the foundation set contains the product and the candidate set is limited to the neighborhood of the product. As a bundle is created and added to the bridge, it is removed from the neighborhood. In successive iterations, the reduced neighborhood is used as the candidate set and the process continues until all bundles are found. The process is then repeated for all products as potential bridges. This exhaustive yet efficient method yields a large number of viable bridges.
Before describing the successive bundling technique, the GrowBundle function (Technique 7) that it uses is defined. This function takes in a candidate set, a foundation set, and an initial or root set of products and applies a sequence of grow and shrink operations to find the first locally optimal bundle it can find in the depth first mode.
GrowBundle is called successively to find subsequent product bundles in a bridge structure, as shown in the Successive Bundling Technique 8 below. It requires a candidate set C from which the bridge and group products may be drawn (in general this could be all the products at a certain level), the consistency matrix, the bundleness function, a bundleness threshold θ to control the stringency, and the neighborhood parameter ν to control the scope and size of the bridge product neighborhood.
Special Bridge Structures
So far, no constraints have been imposed on how the bridge structures are created except for the candidate set. However, special bridge structures may be discovered by using appropriate constraints on the set of products that the bridge structure is allowed to grow from. One way to create special bridge structures is to define special candidate sets for different roles in the bridge structure, e.g. bridge product role and group product role, instead of using a single candidate set.
Technique 8 is modified to find special bridges as follows: instead of a single candidate set, there is now one candidate set for the bridge products and one candidate set for (possibly each of) the product groups. Using the depth first bundling technique, product bundles are created such that they must include a candidate bridge product, i.e. the foundation set contains the bridge product, and the remaining products of the bundle come from the candidate set of the corresponding group and are also neighbors of the potential bridge product. High bridgeness structures are selected from the Cartesian product of bundles across the groups.
Business Decisions from Bridge Structures
Bridge structures embedded in the insight/relationship determination module 320 graphs may provide insights about what products link otherwise disconnected products. Such insight may be used in a number of ways:
Bridge Projection Scores
Both product bundles and bridge structures are logical structures as opposed to actual structures. Therefore, typically, a single customer buys either none of the products or a subset of the products associated with such structures. Described earlier were several ways of projecting a customer against a bundle, resulting in various bundle-projection-scores that may be used either in making decisions directly or for further analysis. Similarly, bridge structures may also be used to create a number of bridge-projection-scores. These scores are defined by a bridge structure, a market basket, and a projection scoring function:
There are several projection scores that may be computed from a bridge structure and market basket combination. For example:
Product Phrases or Purchase Sequences
Product bundles are created using market basket context. The market basket context loses the temporal aspect of product relationships, however broad the time window it may use. The following discussion defines an extension of product bundles to another higher order structure, known as a product phrase or consistent purchase sequence, created using the insight/relationship determination module 320 framework. Essentially, a product phrase is the product bundle equivalent for purchase sequence context. Traditional frequency based methods extend the known standard market basket techniques to create high frequency purchase sequences. However, because transaction data is a mixture of projections of latent intentions that may extend across time, frequency based methods are limited in finding actionable, insightful, and logical product phrases. The same argument made for product bundles also applies to product phrases.
The insight/relationship determination module 320 uses transaction data first to create only pair-wise co-occurrence consistency relationships between products by including both the market basket and purchase sequence contexts. This combination gives the insight/relationship determination module 320 tremendous power for representing complex higher order structures, including product bundles, product phrases, and sequences of market baskets, and for quantifying their co-occurrence consistency. The following discussion defines a product phrase and presents techniques to create these phrases.
Definition of a Logical Product Phrase
A product phrase is defined as a logical product bundle across time. In other words, it is a consistent time-stamped sequence of products such that each product consistently co-occurs with all others in the phrase at their relative time-lags. In its most general definition, a logical phrase subsumes the definition of a logical bundle and uses both market basket and purchase sequence contexts, i.e. a combination that is referred to as the Fluid Context in the insight/relationship determination module 320, to create it.
Formally, a product phrase (x, Δt) is defined by two sets:
Time lags are measured in a time resolution unit which could be days, weeks, months, quarters, or years depending on the application and retailer. The time-lags must satisfy the following constraints:
The slack parameter εΔi determines how strictly these constraints are imposed depending on how far the products are in the phrase. Also, note that this definition includes product bundles as a special case where all time-lags are zero:
Fluid Context
The context-rich insight/relationship determination module 320 framework supports two broad types of contexts: market basket context and purchase sequence context. For exploring higher order structures as general as product phrases, as defined above, both of these context types need to be combined into a single context framework. This combination is known as the Fluid Context. Essentially, fluid context is obtained by concatenating the two-dimensional co-occurrence matrices along the time-lag dimension. The first frame in this fluid context video is the market basket context (Δτ=0) with a window size equal to the time resolution. Subsequent frames are the purchase sequence contexts with their respective Δτ's. Fluid context is created in three steps:
A fluid context is represented by a three dimensional matrix:
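As an illustration of the time-lag dimension, the following Python sketch collects raw co-occurrence counts indexed by (time-lag, first product, second product). It covers only the counting stage; converting counts into consistency values is not shown. The sparse dictionary representation, the function name `fluid_context_counts`, and the sample purchase histories are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Purchase = Tuple[int, str]  # (time in resolution units, product)


def fluid_context_counts(
    histories: Sequence[Sequence[Purchase]], max_lag: int
) -> Dict[Tuple[int, str, str], int]:
    """Count product pairs by time-lag; lag 0 is the market basket frame."""
    counts: Dict[Tuple[int, str, str], int] = defaultdict(int)
    for history in histories:
        for i, (t_i, x_i) in enumerate(history):
            for j, (t_j, x_j) in enumerate(history):
                lag = t_j - t_i
                # Skip self-pairs and anything outside the lag window.
                if i == j or not 0 <= lag <= max_lag:
                    continue
                counts[(lag, x_i, x_j)] += 1
    return dict(counts)


# Hypothetical histories of two customers, with time measured in weeks.
histories: List[List[Purchase]] = [
    [(0, "printer"), (0, "paper"), (2, "ink")],
    [(0, "printer"), (2, "ink"), (4, "ink")],
]
counts = fluid_context_counts(histories, max_lag=4)
print(counts[(2, "printer", "ink")])
```

Each fixed lag selects one two-dimensional frame of the three dimensional structure; the lag-0 frame is symmetric, while frames with positive lag are directed from the earlier to the later purchase.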
Cohesiveness of a Product Phrase: “Phraseness”
Cohesiveness of a phrase is quantified by a measure called phraseness which is akin to the bundleness measure of cohesiveness of a product bundle. The only difference is that in product bundles, market basket context is used and in phrases, fluid context is used. The three-stage process for computing phraseness is similar to the process of computing bundleness:
Techniques for Finding Cohesive Product Phrases
Techniques described earlier for finding product bundles using market basket context based insight/relationship determination module 320 graphs may be extended directly to find phrases by replacing the market basket context with fluid context and including an additional search along the time-lag.
Insights and Business Decisions from Product Phrases
Product phrases may be used in a number of business decisions that span across time. For example:
Recommendation Engine
Product neighborhoods, product bundles, bridge structures, and product phrases are all examples of product affinity applications of the insight/relationship determination module 320 framework. These applications seek relationships between pairs of products, resulting in a graph, and discover such higher order structures in it. Most of these applications are geared towards discovering actionable insights that span a large number of customers. The following discussion describes a highly (a) customer centric, (b) data driven, (c) transaction oriented purchase behavior application of the insight/relationship determination module 320 framework, i.e. the Recommendation Engine. A goal for a Recommendation Engine application is to offer the right product to the right customer at the right time at the right price through the right channel, so as to maximize the propensity that the customer actually takes up the offer and buys the product or products. A recommendation engine allows retailers to match their content with customer intent through a very systematic process that may be deployed in various channels and customer touch points.
The insight/relationship determination module 320 framework lends itself very naturally to a recommendation engine application because it captures a customer's purchase behavior in a very versatile, unique, and scalable manner in the form of insight/relationship determination module graphs. In the following discussion, the various dimensions of a recommendation engine application are introduced, and it is described how increasingly complex and more sophisticated recommendation engines can be created from the insight/relationship determination module 320 framework. These recommendation engines can tell not just what the right product is but also when the right time is to offer that product to a particular customer.
Definition of a Recommendation Engine Application
Typically, a recommendation engine attempts to answer the following business question: Given the transaction history of a customer, what are the most likely products the customer is going to buy next? In the insight/relationship determination module 320, this definition is taken one step further to try to answer not just what product the customer will buy next but also when the customer is most likely to buy it. Thus, the recommendation engine has three essential dimensions:
1. Products: those being considered for recommendation;
2. Customers: those to whom one or more products are recommended; and
3. Time: the time at which the recommendation of specific products to specific customers is made.
A general purpose recommendation engine should therefore be able to create a purchase propensity score for every combination of product, customer, and time, i.e. it takes the form of a three dimensional matrix:
{(t1, x1), . . . , (tL, xL)} = customer transaction history
Recommendation Process
1. Recommendation Engine—takes the raw customer transaction history, the set of products in the recommendation pool and the set of times at which recommendations have to be made. It then generates a propensity score matrix described above with a score for each combination of customer, product, and time. Business constraints, e.g. recommend only to customers who bought in the last 30 days or recommend products only from a particular product category, may be used to filter or customize the three dimensions.
2. Post-Processor—The recommendation engine uses only customer history to create propensity scores that capture potential customer intent. These scores do not capture the retailer's intent. The post-processor allows retailers to adjust the scores to reflect some of their business objectives. For example, a retailer might want to push seasonal products or products that lead to increased revenue, margin, market basket size, or diversity. The insight/relationship determination module 320 provides a number of post-processors that may be used individually or in combination to adjust the propensity scores.
3. Business Rules Engine—Some business constraints and objectives may be incorporated in the scores but others are implemented simply as business rules. For example, a retailer might want to limit the number of recommendations per product category, limit the total discount value given to a customer, etc. Such rules are implemented in the third stage where the propensity scores are used to create top R recommendations per customer.
4. Channel Specific Deployment—Once the recommendations are created for each customer, the retailer has a choice to deliver those recommendations using various channels, for example, through direct mail or e-mail campaigns, through its web-site, through in-store coupons at the entry kiosk or point of sale, or through a salesman. The decision about the right channel depends on the nature of the product being recommended and the customer's channel preferences. These decisions are made in the deployment stage.
Before the recommendation engine and the post-processing stages are described, some important deployment issues are considered.
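The third stage above, in which business rules turn propensity scores into the top R recommendations per customer, can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `top_r_recommendations`, the `category_of` mapping, and the per-category cap are hypothetical and are not taken from the module itself.

```python
def top_r_recommendations(scores, category_of, r, max_per_category):
    """Pick up to r products by descending propensity score, allowing at
    most max_per_category products from any one category.

    scores:      dict product -> propensity score for one customer
    category_of: dict product -> category name (hypothetical lookup)
    """
    picked = []
    used = {}  # category -> number of products already picked
    for product in sorted(scores, key=scores.get, reverse=True):
        if len(picked) >= r:
            break
        category = category_of[product]
        if used.get(category, 0) >= max_per_category:
            continue  # business rule: category quota already reached
        picked.append(product)
        used[category] = used.get(category, 0) + 1
    return picked


# Illustrative data only.
scores = {"milk": 0.9, "cheese": 0.8, "yogurt": 0.7, "bread": 0.6, "soap": 0.5}
category_of = {"milk": "dairy", "cheese": "dairy", "yogurt": "dairy",
               "bread": "bakery", "soap": "household"}
print(top_r_recommendations(scores, category_of, r=3, max_per_category=2))
```

With a cap of two products per category, the third dairy product is skipped in favor of the next best product from another category.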
Deployment Issues
There are several important issues that affect the nature of the deployment and functionality of a recommendation engine: (1) Recommendation Mode—products for a customer or customers for a product?; (2) Recommendation Triggers—Real-time vs. Batch mode?; and (3) Recommendation Scope—what aspects of a customer's transaction should be considered.
1. Recommendation Modes: Customer vs. Product vs. Time—The insight/relationship determination module 320 recommendation engine can be configured to work in three modes depending on the business requirements.
The insight/relationship determination module 320 definition of the recommendation engine allows all three modes.
2. Recommendation Triggers: Real-time vs. Batch-Mode—A recommendation decision might be triggered in a number of ways. Based on their decision time requirements, triggers may be classified as:
(a) Real-time or Near-Real time triggers require that the recommendation scores are updated based on the triggers. Examples of such triggers are:
(b) Batch-mode triggers require that the recommendation scores are updated based on pre-planned campaigns. An example of such a trigger is a weekly campaign where e-mails or direct mail containing customer-centric offers are sent out. A batch process may be used to generate and optimize the campaigns based on recent customer history.
3. Recommendation Scope: Defining History—Propensity scores depend on the customer history. There are a number of ways in which a customer history might be defined. An appropriate definition of customer history must be used in each business situation. Examples of some of the ways in which customer history may be defined are given below:
In the recommendation engines presented below, the goal is to cross-sell products that the customer did not purchase in the past. That is why the past purchased products are deliberately removed from the recommendation list. It is trivial to add them in, as discussed in one of the post-processing engines, later.
At the heart of the recommendation scoring is the problem of creating a propensity or likelihood score for what a customer might buy in the near or far away future based on his customer history. In the following discussion, we present two types of recommendation engines based on (a) the nature of the context used, (b) interpretation of customer history, and (c) temporal-scope of the resulting recommendations: The (1) Market Basket Recommendation Engine (MBRE) and (2) Purchase Sequence Recommendation Engine (PSRE).
Market Basket Recommendation Engine
When either the customer's historical purchases are unknown and only current purchases can be used for making recommendations, or when the customer history is to be interpreted as a market basket and recommendations for the near future have to be generated, then the insight/relationship determination module 320's Market Basket Recommendation Engine may be used. In MBRE, customer history is interpreted as a market basket, i.e. the current visit, the union of recent visits, or a history-weighted combination of all visits. Any future target product for which the recommendation score has to be generated is considered a part of the input market basket that is not in it yet. Note that the propensity score for MBRE
ρ(u,t|x,Φ)=ρ(u|x,Φ) recommends products that the customer would buy in the near future and, hence, the time dimension is not used here.
Creating the MBRE Recommendation Model
The market basket recommendation is based on coarse market basket context. A window parameter ω denotes the time window of each market basket. Described earlier was how the market basket co-occurrence counts matrix is created from the transaction data, given the window parameter and product level. This counts matrix is then converted into a consistency matrix using any of the consistency measures available in the insight/relationship determination module 320 library. This consistency matrix serves as the recommendation model for an MBRE. In general, this model depends on (a) the choice of the window parameter, (b) the choice of the consistency measure, and (c) any customizations, e.g. customer segment or seasonality, applied to the transaction data.
Generating the MBRE Recommendation Score
Given the input market basket customer history, x, the recommendation model in the form of the market basket based co-occurrence matrix, Φ, the propensity score ρ(u|x,Φ) for target product u may be computed in several ways, for example:
1. Gibbs Aggregated Consistency Score—The simplest class of scoring functions simply aggregates the consistencies between the products in the market basket and the target product. The insight/relationship determination module 320 uses a general class of aggregation functions known as Gibbs aggregation, based on the Gibbs distribution, that weights the different products in the market basket according to their consistency strength with the target product.
The parameter λ ∈[0,∞] controls the degree to which the higher consistency products are favored. While these scores are fast and easy to compute, they assume independence among the products in the market basket.
2. Single Bundle Normalized Score—Transaction data is a mixture of projections of multiple intentions. In this score, we assume that a market basket represents a single intention and treat it as an incomplete intention whereby adding the target product would make it more complete. Thus, a propensity score may be defined as the degree by which the bundleness increases when the product is added.
3. Mixture-of-Bundles Normalized Score—Although the single bundle normalized score accounts for dependence among products, it still assumes that the market basket is a single intention. In general, a market basket is a mixture of bundles or intentions. The mixture-of-bundles normalized score goes beyond the single bundle assumption. It first finds all the individual bundles in the market basket and then uses the bundle that maximizes the single bundle normalized score. It also compares these bundles against single products as well as the entire market basket, i.e. the two extremes.
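The first scoring function in the list above can be illustrated with a short sketch. The exact aggregation formula is not reproduced in this text, so the form used here, a weighted average of consistencies with weights proportional to exp(λ·φ), is an assumption chosen to match the description: λ = 0 gives a plain average, and large λ approaches the maximum consistency. The function name and the dictionary layout of Φ are hypothetical.

```python
import math


def gibbs_aggregated_score(basket, target, phi, lam=1.0):
    """Aggregate the consistencies between basket products and the target.

    basket: iterable of products already in the market basket
    target: candidate product to score
    phi:    dict (product, target) -> consistency (assumed layout)
    lam:    finite lambda >= 0; larger values favor stronger consistencies
    """
    basket = list(basket)
    if not basket or target in basket:
        return 0.0  # already-purchased products are not recommended here
    cons = [phi.get((x, target), 0.0) for x in basket]
    top = max(cons)
    # Subtracting the maximum keeps exp() numerically stable.
    weights = [math.exp(lam * (c - top)) for c in cons]
    return sum(c * w for c, w in zip(cons, weights)) / sum(weights)


# Illustrative consistencies only.
phi = {("chips", "salsa"): 0.8, ("soap", "salsa"): 0.2}
print(gibbs_aggregated_score(["chips", "soap"], "salsa", phi, lam=0.0))
print(gibbs_aggregated_score(["chips", "soap"], "salsa", phi, lam=50.0))
```

Because each basket product contributes independently, this sketch shares the independence assumption noted for this class of scores.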
Purchase Sequence Recommendation Engine
In the market basket based recommendation engine, the timing of the product is not taken into account. Both the input customer history and the target products are interpreted as market baskets. For retailers where timing of purchase is important, the insight/relationship determination module 320 framework provides the ability to use not just what was bought in the past but also when it was bought and use that to recommend not just what will be bought in the future by the customer but also when it is to be bought. As shown in
Creating the PSRE Recommendation Model
The PSRE recommendation model is essentially the Fluid Context matrix described earlier. It depends on (a) the time resolution (weeks, months, quarters, . . . ), (b) type of kernel and kernel parameter used for temporal smoothing of the fluid context counts, (c) consistency matrix used, and of course (d) customization or transaction data slice used to compute the fluid co-occurrence counts.
Generating the PSRE Recommendation Score
Given the input purchase sequence customer history:
x̃ = (⟨x_1,t_1⟩, ..., ⟨x_L,t_L⟩) = ⟨x,Δt⟩
x = {x_1, ..., x_L}; Δt = {Δt_ij = t_j − t_i}, 1 ≤ i < j ≤ L
and the fluid context matrix (recommendation model), Φ, the propensity score ρ(u,t|x̃,Φ) may be computed in several ways, for example:
1. Gibbs Aggregated Consistency Score—The simplest class of scoring functions used in MBRE is also applicable in the PSRE.
Note how the time-lag between a historical purchase at time tl and the recommendation time t, given by Δ(t,tl)=tl−t, is used to pick the time-lag dimension in the fluid context matrix. This is one application of the fluid context's time-lag dimension. Although it is fast to compute and easy to interpret, the Gibbs aggregate consistency score assumes that all past products and their times are independent of each other, which is not necessarily true.
2. Single-Phrase Normalized Score—Transaction data is a mixture of projections of multiple intentions spanning across time. In this score, we assume that a purchase history represents a single intention and treat it as an incomplete intention whereby adding the target product at the decision time t would make it more complete. Thus, a propensity score may be defined as the degree by which the phraseness increases when the product is added at the decision time.
3. Mixture-of-Phrases Normalized Score—Although the single-phrase normalized score accounts for dependence among products, it still assumes that the entire purchase history is a single intention. In general, a purchase sequence is a mixture of phrases or intentions across time. The mixture-of-phrases normalized score goes beyond the single phrase assumption. It first finds all the individual phrases in the purchase sequence and then uses the phrase that maximizes the single-phrase normalized score. It also compares the score against all the single element phrases as well as the entire phrase, i.e. the two extreme cases.
Post-Processing Recommendation Scores
The recommendation propensity scores obtained by the recommendation engines as described above depend only on the transaction history of the customer. The propensity scores do not yet incorporate the retailer's business objectives. In the following discussion, various possible business objectives are presented, along with ways to post-process or adjust the propensity scores obtained from the recommendation engines to reflect those objectives. The post-processing combines the recommendation scores with adjustment coefficients. Based on how these adjustment coefficients are derived, there are two broad types of score adjustments:
1. First order, transaction data driven score adjustments in which the adjustment coefficients are computed directly from the transaction data. Examples are seasonality, value, and loyalty adjustments.
2. Second order, consistency matrix driven score adjustments in which the adjustment coefficients are computed from the consistency matrices. Examples are density, diversity, and future customer value adjustments.
Some of the important score adjustments are described below:
(a) First Order: Seasonality Adjustment
In any retailer's product space, some products are more seasonal than others, and retailers might be interested in adjusting the recommendation scores such that products that have a higher likelihood of being purchased in a particular season are pushed up in the recommendation list in a systematic way. This is done in the insight/relationship determination module 320 by first computing a Seasonality Score for each product, for each season. This score is high if the product is sold in a particular season more than expected. There are a number of ways to create the seasonality scores. One of the simple methods is as follows:
Suppose seasons are defined by a set of time zones; for example, each week could be a time zone, or each month, each quarter, or each season (summer, back-to-school, holidays, etc.). A seasonal value of a product can then be computed in each season, as well as its expected value across all seasons. Deviation from the expected value quantifies the degree of seasonality adjustment. More formally:
be the total value of the product u across all seasons.
be the total normalizer across all seasons.
Two parameters may be used to create seasonality adjustments: the seasonal deviation of a product from the expected value, ΔV(u|sk), and the seasonality coefficient σλ(u) that indicates whether or not the product is seasonal. Because the unit of the recommendation score does not match the unit of the seasonality adjustment, adjustments in the relative scores or ranks may be used as follows:
Here α(γs,σ(u))∈[0,1] is the combination coefficient that depends on a user defined parameter γs ∈[0,1] that indicates the degree to which seasonality adjustment has to be applied and the seasonality coefficient σ(u) of the product u.
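The seasonality adjustment can be illustrated with a small sketch. The formulas are not reproduced in this text, so everything here is an assumption made for illustration: the expected seasonal value is taken as the product's total value multiplied by the season's share of all value, both scores are converted to normalized ranks so that their units match, and the combination coefficient is passed in directly as `alpha` rather than derived from γs and σ(u). All function names are hypothetical.

```python
def seasonal_deviation(value, product, season):
    """Observed minus expected value of a product in a season.

    value: dict (product, season) -> sales value (assumed layout)
    """
    total = sum(value.values())
    if total == 0:
        return 0.0
    product_total = sum(v for (p, _), v in value.items() if p == product)
    season_total = sum(v for (_, s), v in value.items() if s == season)
    expected = product_total * season_total / total
    return value.get((product, season), 0.0) - expected


def rank_normalize(scores):
    """Map scores to ranks in [0, 1]; the highest score gets 1.0."""
    ordered = sorted(scores, key=scores.get)
    if len(ordered) < 2:
        return {k: 1.0 for k in ordered}
    return {k: i / (len(ordered) - 1) for i, k in enumerate(ordered)}


def seasonally_adjusted(rec_scores, seasonal_scores, alpha):
    """Blend recommendation ranks with seasonality ranks; alpha in [0, 1]."""
    rec_rank = rank_normalize(rec_scores)
    sea_rank = rank_normalize(seasonal_scores)
    return {u: (1 - alpha) * rec_rank[u] + alpha * sea_rank.get(u, 0.0)
            for u in rec_scores}


# Illustrative data only.
value = {("sunscreen", "summer"): 80, ("sunscreen", "winter"): 20,
         ("bread", "summer"): 100, ("bread", "winter"): 100}
deviations = {u: seasonal_deviation(value, u, "summer")
              for u in ("sunscreen", "bread")}
print(deviations)
print(seasonally_adjusted({"sunscreen": 0.3, "bread": 0.6}, deviations, 0.5))
```

Setting `alpha` to 0 leaves the recommendation ranking unchanged, while larger values move seasonal products up the list.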
(b) First Order: Value Adjustment
A retailer might be interested in pushing high-value products to the customer. This up-sell business objective might be combined with the recommendation scores by creating a value-score for each product and each value property, i.e. revenue, margin, margin percent, etc. These value-scores are then normalized, e.g. by max, z-score, or rank, and combined with the recommendation score to increase or decrease the overall score of a high/low value product.
(c) First Order: Loyalty Adjustment
The recommendation scores are created only for the products that the customer did not purchase in the input customer history. This makes sense when the goal of recommendation is only to cross-sell and expand the customer's wallet share to products that the customer has not bought in the past. One of the business objectives, however, could be to increase customer loyalty and repeat visits. This is done safely by recommending to the customer those products that the customer bought in the recent past and encouraging more purchases of the same. For retailers where there are a lot of repeat purchases, for example grocery retailers, this is particularly useful.
The simplest way to do this is to create a value-distribution of each product that the customer purchased in the past. Compare this to the value-distribution of the average customer or the average value distribution of that product. If a customer showed higher value than average on a particular product then increase the loyalty-score for that product for that customer. More formally, let:
These deviations are used as loyalty coefficients. If a retailer is making R recommendations, then he may decide to use all of them based on history weighting or any fraction of them based on loyalty coefficients and the rest based on recommendation scores.
(d) Second Order: Density Adjustment
Introduced earlier was a consistency based density score for a product that uses the consistencies with its neighboring products to quantify how well this product goes with other products. The recommendation score may therefore be adjusted to push high density products for increased market basket sizes.
(e) Second Order: Diversity Adjustment
If the business objective is to increase the diversity of a customer's market basket along different categories or departments, then the diversity score may be used in the post-processing. How to compute the diversity score of a product was described earlier. There are other variants of the diversity score where it is specific to a particular department, i.e. if the retailer wants to increase the sale in a particular department, then products that have high consistency with that department get a higher diversity score. Appropriate variants of these diversity scores may be used to adjust the recommendation scores.
(f) Second Order: Life-Time Value Adjustment
There are some products that lead to the sale of other products either in the current or future visits. If the goal of the retailer is to increase the customer lifetime value, then such products should be promoted to the customer. Similar to the density measure, computed from market basket context, a life-time value for each product is computed from the purchase sequence context. These scores may be used to push such products that increase the life-time value of customers.
Combining Multiple Customizations in the Insight/Relationship Determination Module 320
Discussed above was the use of a single consistency matrix in either creating insights such as bridges, bundles, and phrases or generating decisions, such as with a recommendation engine. The insight/relationship determination module 320 also allows combining multiple consistency matrices as long as they are at the same product level and are created with the same context parameters. This is an important feature that may be used for either:
1. Dealing with Sparsity—It may happen that a particular customer segment may not have enough customers and the counts matrix does not have statistically significant counts to compute consistencies. In such cases a back-off model may be used where counts from the overall co-occurrence counts matrix based on all the customers are combined linearly with the counts of this segment's co-occurrence matrix, resulting in statistically significant counts.
2. Creating Interpolated Solutions—A retailer might be interested in comparing a particular segment against the overall population to find out what is unique in this segment's co-occurrence behavior. Additionally, a retailer might be interested in interpolating between a segment and the overall population to create more insights and improve the accuracy of the recommendation engine if it is possible.
The segment level and the overall population level analysis from the insight/relationship determination module 320 may be combined at several stages, each of which has its own advantages and disadvantages.
1. Counts Combination—Here the raw co-occurrence counts from all customers (averaged per customer) can be linearly combined with the raw co-occurrence counts from a customer segment. This combination helps in sparsity problems in this early stage of graph generation from the insight/relationship determination module 320.
2. Consistency Combination—Instead of combining the counts, the consistency measures of the co-occurrence consistency matrices can be combined. This is useful in both trying alternative interpolations of the insight generation, as well as the recommendation engines.
3. Recommendation Scores—For a recommendation engine application, the recommendation score may be computed for a customer based on the overall recommendation model as well as on the recommendation model for this customer's segment. These two scores may be combined in various ways to come up with potentially more accurate propensity scores.
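The first of the combination stages above, counts combination, amounts to a linear interpolation between two co-occurrence count tables. The sketch below is illustrative only; the function name, the dictionary layout, and the fixed mixing `weight` are assumptions, and the overall counts are taken to be already averaged to a scale comparable with the segment counts.

```python
def combine_counts(segment_counts, overall_counts, weight):
    """Linearly combine segment and overall co-occurrence counts.

    segment_counts, overall_counts: dict (product_a, product_b) -> count
    weight: share given to the segment counts, in [0, 1]
    """
    pairs = set(segment_counts) | set(overall_counts)
    return {
        pair: weight * segment_counts.get(pair, 0.0)
        + (1.0 - weight) * overall_counts.get(pair, 0.0)
        for pair in pairs
    }


# Illustrative counts only.
segment = {("a", "b"): 2}
overall = {("a", "b"): 10, ("a", "c"): 4}
print(combine_counts(segment, overall, weight=0.25))
```

The same interpolation could be applied to consistencies or to final recommendation scores, which corresponds to the second and third stages.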
Thus the insight/relationship determination module 320 provides a lot of flexibility in dealing with multiple product spaces both in comparing them and combining them.
Dealing with Data Sparsity in the Insight/Relationship Determination Module 320
The insight/relationship determination module 320 is data hungry, i.e. the more transaction data it gets, the better. A general rule of thumb in the insight/relationship determination module 320 is that as the number of products in the product space grows, the number of context instances should grow quadratically for the same degree of statistical significance. The number of context instances for a given context type and context parameters depends on: (a) number of customers, (b) number of transactions per customer, and (c) number of products per transaction. There might be situations where there is not enough data, such as when: (1) the number of customers in a segment is small, (2) the retailer is relatively new and has only recently started collecting transaction data, (3) a product is relatively new and not enough transaction data associated with the product, i.e. product margin, is available, (4) analysis is done at a fine product resolution with too many products relative to the transaction data or number of context instances, or (5) customer purchases at the retailer are sparse, e.g. furniture or high-end electronics retailers have very few transactions per customer. There are three ways of dealing with such sparsity in the insight/relationship determination module 320 framework.
1. Product Level Backoff Count Smoothing—If the number of products is large or the transaction data is not enough for a product for one or more of the reasons listed above then the insight/relationship determination module 320 uses the hierarchy structure of the product space to smooth out the co-occurrence counts. For any two products at a certain product resolution, if either the margin or co-occurrence counts are low, then counts from the coarser product level are used to smooth the counts at this level. The smoothing can use not just the parent level but also grand-parent level if there is a need. As the statistical significance at the desired product level increases due to, say, additional transaction data becoming available over a period of time, the contribution of the coarser levels decreases systematically.
2. Customization Level Backoff Smoothing—If the overall number of customers is large enough but an important customer segment, e.g. high value customers or a particular store or region, does not have enough customers, then the co-occurrence counts or consistencies based on all the customers may be used to smooth the counts or consistencies of this segment. If there is a multi-level customer hierarchy with segments and sub-segments and so on, then this approach is generalized to use the parent segment of a sub-segment to smooth the sub-segment counts.
3. Context Coarseness Smoothing—If the domain is such that the number of transactions per customer or number of products per transaction is low, then the context can be chosen at the right level of coarseness. For example, if for a retail domain a typical customer makes only two visits to the store per year then the window parameter for the market basket window may be as coarse as a year or two years and the time-resolution for the purchase sequence context may be as coarse as a quarter or six months. The right amount of context coarseness can result in statistical significance of the counts and consistencies.
Any combination of these techniques may be used in the insight/relationship determination module 320 framework depending on the nature, quantity, and quality (noise-to-signal ratio) of the transaction data.
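Product level backoff smoothing, the first technique above, can be sketched as a blend whose weight shifts toward the fine product level as its count grows. The weighting rule n/(n+k) and the constant `k` are assumptions chosen only to reproduce the stated behavior (the coarser level contributes less as more data becomes available); the actual smoothing rule is not given in this text. The parent-level estimate is assumed to be already rescaled to the fine level.

```python
def backoff_smoothed_count(fine_count, parent_estimate, k=10.0):
    """Blend a fine-level count with a coarser-level estimate.

    fine_count:      observed count at the desired product level (>= 0)
    parent_estimate: coarser-level count rescaled to the fine level (assumed)
    k:               hypothetical constant (> 0); larger k trusts the
                     parent level for longer
    """
    if k <= 0:
        raise ValueError("k must be positive")
    w = fine_count / (fine_count + k)  # weight on the fine level
    return w * fine_count + (1.0 - w) * parent_estimate


print(backoff_smoothed_count(0, 4.0))      # no fine-level data: parent only
print(backoff_smoothed_count(10, 4.0))     # equal trust in both levels
print(backoff_smoothed_count(10000, 4.0))  # fine level dominates
```

The same pattern could be applied one level higher by computing the parent estimate itself from a blend of parent and grand-parent counts.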
Predictive Time To Event Module
The insights and relationships found in the transaction data by the insight/relationship module 320 are then input to the predictive time to event module 330. The predictive time to event module 330 can be hardware, software or a combination of both hardware and software. The predictive time to event module 330 may also be called or termed an analytic engine, which may be a portion of the processor and software that forms the analytic engine for other modules or can be a separate processor and software.
The core requirement for the TTE component 320 and process is a dataset of discrete event data 2720 for a set of entities. The dataset must include N time series of discrete events/transactions (N could be the number of individuals tracked in a longitudinal study) and a unique match key for each individual. Also required are P discrete event types (P could be the number of behaviors exhibited by the individuals, or the number of actions taken on the individuals, or the number of external events that may matter for the analysis, or all together). Also required is a date/time stamp associated with each event. For example, a dataset containing a list of purchase transactions for different customers over a given time period would meet this requirement.
Additional inputs can also be accommodated. For example, other events may be defined by marketing actions on the customers, product price changes, public holidays, competitor actions, weather conditions, economic indicators, season and other time measures, and the like. These events could be collected in other databases, or gathered informally. Still other data can include individual information (demographics, credit information, etc.) and product information (size, color, etc.).
The event data 2810 is passed into a cleaning, statistics generating and feature generation process 2812. The feature generation process produces a unique independent training dataset 2814, 2815, 2816 for each target product which will be modeled. Each training data set includes many labeled examples used to train a scorecard. An example is given by a vector of numeric predictive feature values, and an associated binary outcome label. An example feature could be the recency of any particular event, or its frequency, or the current season, or an economic index, or the like. There are potentially thousands or even millions of features. The training dataset 2814, 2815, 2816 is appropriately down sampled and labeled for the target.
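One labeled training example of the kind described above can be built as in the following sketch. It is illustrative only: the function name, the feature naming scheme, and the use of a fixed cutoff date with a look-ahead horizon to define the binary outcome are assumptions, and only recency and frequency features are shown.

```python
from datetime import date


def make_example(events, target, cutoff, horizon_days):
    """Build (features, label) for one individual and one target event type.

    events:       list of (event_type, date) pairs for the individual
    target:       event type being modeled
    cutoff:       features use only events on or before this date
    horizon_days: label is 1 if the target occurs within this many days
                  after the cutoff, else 0
    """
    past = [(e, d) for e, d in events if d <= cutoff]
    future = [(e, d) for e, d in events
              if 0 < (d - cutoff).days <= horizon_days]
    features = {}
    for etype in sorted({e for e, _ in past}):
        dates = [d for e, d in past if e == etype]
        features[f"recency_days:{etype}"] = (cutoff - max(dates)).days
        features[f"frequency:{etype}"] = len(dates)
    label = int(any(e == target for e, _ in future))
    return features, label


# Illustrative event history only.
events = [("beer", date(2024, 1, 1)), ("beer", date(2024, 1, 20)),
          ("chips", date(2024, 1, 10)), ("salsa", date(2024, 2, 5))]
print(make_example(events, "salsa", date(2024, 1, 31), horizon_days=30))
```

Repeating this for many individuals and cutoff dates, once per target product, would yield one training dataset per target.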
Each training dataset 2814, 2815, 2816 is then put through a series of binning, variable reduction, model training, scoring and analyzing steps 2820. The analyzing steps include filtering out characteristics with little power to predict the outcome, maintaining a set of the most predictive characteristics, and automatic scorecard characteristic selection and fitting of the weights in the scorecard. This results in a final scorecard model for each target product 2824, 2825, 2826, with an accompanying performance measure and validation reports. In other words, P scorecards are developed. One scorecard is developed for each training data set. Lastly, all of the customers in the training dataset are scored using the developed models to produce the customer product propensity matrix 2730, which predicts the likelihood of each customer to buy each modeled product in the next time period.
The predictive time-to-event component 320 can also produce one or more propensity matrices (which are discussed in more detail below along with
TTE produces a set of models, one scorecard model per target product. These models can be used directly to score out datasets. TTE is an automated process of generating data for, and building, a large number of scorecards. In order to build the large number of models required by the TTE component, a large amount of processing power is required. To obtain this, multiple computers are used in parallel. In one embodiment, a large amount of underutilized computing power is used to run the various jobs required.
The result of the process associated with the TTE component 320 and the process 2710 is that a set of propensity matrices can be produced for several future time periods so as to define the relationship between the risk of an event occurring in each of several discrete time periods. It should be noted that the predictors can change their values in each of the future time periods so that a decision can be made to send a marketing offer when it has the highest probability of maturing into a sale.
The results as time moves on are fed back to both the insight/relationship determination component 310 and the predictive time-to-event component 320. Scoring is repeated at regular time intervals, as determined by the business (e.g. every night, every weekend, or the like). The score value of a particular individual and a particular event can change over the course of time, either due to recent events experienced by the individual, or due to the passage of time itself. The score values (i.e. likelihoods) of all individuals for all events of interest are input into a decision optimization. For example, a retailer may use the scores in a recommendation engine, which matches customers to products for which they have a high propensity.
In operation, statistics of model performance are automatically generated and tested against known and estimated distributions of the statistic. When the likelihood of observing a value for the statistic falls below an a-priori determined performance cutoff the models are deemed “stale” and automatically rebuilt.
These questions can be answered by fixing two of the three dimensions and picking the top scoring combination for the third dimension.
The propensity matrix can be optimized for various sets of given conditions. As mentioned above, one of the variables may be held constant and then the most likely propensities may be the basis for certain optimizations. For example, the propensities or risks associated with a sale of beer for a selected time can be input for making recommendations to a particular set of customers. By the same token, certain customers can be looked at for their propensities over a time frame. In each case, several time frames can also be looked at. Business rules can be applied as a set of restrictions to the propensity matrix. After application of the business rules the matrix can then be optimized. For example, the highest propensities may be selected over a three month period. Recommendations would be assigned a cost, and the highest propensity actions would be taken for a given budget.
For a selected set of constraints, propensity matrices can be reviewed for a number of time frames and the occurrences of time for customers for a set of events can be compiled into an optimized offer schedule.
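The use of the propensity matrix described above can be sketched in two steps: fixing two dimensions to pick the best value of the third, and taking the highest propensity actions that fit a budget after business rules are applied. The sketch is illustrative only; the function names, the dictionary layout, the per-product `cost` table, and the simple greedy selection are assumptions, and a real optimization could be considerably more elaborate.

```python
def best_time(propensity, customer, product):
    """Fix customer and product; return the highest scoring time period.

    propensity: dict (customer, product, time) -> score (assumed layout).
    Raises ValueError if no entry exists for the pair.
    """
    candidates = {t: s for (c, p, t), s in propensity.items()
                  if c == customer and p == product}
    return max(candidates, key=candidates.get)


def select_offers(propensity, cost, budget, allowed=lambda c, p, t: True):
    """Greedily take the highest propensity offers that fit the budget.

    cost:    dict product -> cost of making one recommendation (hypothetical)
    allowed: business rule predicate applied before selection
    """
    chosen, spent = [], 0.0
    ranked = sorted(propensity.items(), key=lambda kv: kv[1], reverse=True)
    for (c, p, t), _score in ranked:
        if not allowed(c, p, t):
            continue  # excluded by a business rule
        if spent + cost[p] > budget:
            continue  # would exceed the budget
        chosen.append((c, p, t))
        spent += cost[p]
    return chosen


# Illustrative data only.
propensity = {("ann", "beer", "week1"): 0.2, ("ann", "beer", "week2"): 0.7,
              ("bob", "beer", "week1"): 0.5, ("bob", "wine", "week1"): 0.9}
cost = {"beer": 1.0, "wine": 3.0}
print(best_time(propensity, "ann", "beer"))
print(select_offers(propensity, cost, budget=2.0))
```

Collecting the selected (customer, product, time) triples by time period gives a simple form of offer schedule.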
A method of selecting actions with respect to a plurality of customers includes storing transaction data, determining a relationship between a first entity, a second entity, and a third entity from information that includes the transaction data, ranking the possibility of a first future event occurring in a first selected time period for a first subset of the plurality of customers based on the relationship between the first entity, the second entity and the third entity; and ranking the possibility of a second future event occurring in a second selected time period for the first subset of the plurality of customers based on the relationship between the first entity, the second entity and the third entity. Some embodiments of the method further include ranking the possibility of a third future event occurring in a first selected time period for a second subset of the plurality of customers based on the relationship between the first entity, the second entity and the third entity, and ranking the possibility of a fourth future event occurring in a second selected time period for the second subset of the plurality of customers based on the relationship between the first entity, the second entity and the third entity. The method can also include selecting one of the first, second, third or fourth future events based on the ranking of those events possibly occurring. The method for selecting actions with respect to a plurality of customers also may include selecting a combination of the first, second, third or fourth future events based on the ranking of those events possibly occurring. In still another embodiment, the method for selecting actions with respect to a plurality of customers also includes selecting a combination of the first, second, third or fourth future events based on optimizing a select amount of resources associated with at least one of the first entity, the second entity and the third entity.
In one embodiment of the method at least one of the first entity, the second entity, and the third entity is a marketing action.
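The ranking and combination-selection steps of the method can be sketched, purely as an example, as shown below. The `Candidate` record, its `score` and `cost` fields, and the exhaustive search over combinations are assumptions introduced for illustration; the disclosure does not prescribe a particular scoring or optimization technique.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    event: str    # e.g. "first", "second", "third", "fourth"
    period: str   # selected time period the event is ranked for
    subset: str   # customer subset the ranking applies to
    score: float  # hypothetical likelihood score from the relationship
    cost: float   # hypothetical resource cost of acting on the event


def rank(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Order candidate future events from most to least likely."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def select_combination(
    candidates: Sequence[Candidate], budget: float
) -> List[Candidate]:
    """Exhaustively pick the combination with the highest total score
    whose total cost fits the budget (practical only for small sets)."""
    best: Sequence[Candidate] = ()
    best_score = 0.0
    for r in range(1, len(candidates) + 1):
        for combo in combinations(candidates, r):
            total_cost = sum(c.cost for c in combo)
            total_score = sum(c.score for c in combo)
            if total_cost <= budget and total_score > best_score:
                best, best_score = combo, total_score
    return list(best)


candidates = [
    Candidate("first", "period 1", "subset 1", 0.60, 3.0),
    Candidate("second", "period 2", "subset 1", 0.50, 2.0),
    Candidate("third", "period 1", "subset 2", 0.45, 2.0),
    Candidate("fourth", "period 2", "subset 2", 0.20, 1.0),
]
print([c.event for c in rank(candidates)])
print([c.event for c in select_combination(candidates, budget=4.0)])
```

With only four candidate events an exhaustive search is inexpensive; a larger candidate set would likely need a heuristic or a dedicated optimization routine instead.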
Technical Implementation
Exemplary Digital Data Processing Apparatus
A block diagram of a computer system 6000 that executes programming for performing the above methods is shown in
Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 6002 of the computer 6010. A hard drive, CD-ROM, and RAM are some examples of articles including a computer-readable medium. A machine-readable medium provides instructions that, when executed by a machine, cause the machine to read transaction data, determine a relationship between a first entity and a second entity from the transaction data, rank the possibility of a future event occurring in a first selected time period based on the relationship between the first entity and the second entity, and rank the possibility of a future action occurring in a second selected time period based on the relationship between the first entity and the second entity. The instructions, in some embodiments, further cause the machine to quantify the relationship between the first entity and the second entity. In another embodiment, the machine-readable medium provides instructions that, when executed by a machine, further cause the machine to select one of the first selected time period or the second selected time period based on the ranking of the possibility of a future event occurring in the first selected time period and the ranking of the possibility of a future event occurring in the second selected time period. The machine-readable medium, in still further embodiments, provides instructions that, when executed by a machine, further cause the machine to determine a relationship among the first entity, the second entity, and a third entity. The third entity may be a marketing action, demographic information, or the like.
Logic Circuitry
In contrast to the digital data processing apparatus or computer system 6000 discussed above, a different embodiment of this disclosure uses logic circuitry instead of computer-executed instructions to implement processing entities of the system. Depending upon the particular requirements of the application in the areas of speed, expense, tooling costs, and the like, this logic may be implemented by constructing an application-specific integrated circuit (ASIC). Such an ASIC may be implemented with CMOS, TTL, VLSI, or another suitable construction. Other alternatives include a digital signal processing chip (DSP), discrete circuitry (such as resistors, capacitors, diodes, inductors, and transistors), field programmable gate array (FPGA), programmable logic array (PLA), programmable logic device (PLD), and the like.
A system for selecting a next action includes a memory for storing transaction data, an insight/relationship determination module, and a rank module. The insight/relationship determination module determines a relationship between a first entity and a second entity from the transaction data. The rank module ranks the possibility of a future event occurring in a first selected time period based on the relationship between the first entity and the second entity, and ranks the possibility of a future action occurring in a second selected time period based on the relationship between the first entity and the second entity. In one embodiment, the insight/relationship determination module quantifies the relationship between the first entity and the second entity. Some embodiments also include a selection module for selecting one of the first selected time period or the second selected time period based on the ranking of the possibility of a future event occurring in the first selected time period and the ranking of the possibility of a future event occurring in the second selected time period.
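As one possible software analogue of the modules just described, the following sketch quantifies a relationship between two entities as counts of how often a transaction with the second entity follows a transaction with the first, grouped by time lag, and then ranks and selects a time period. The record layout `(customer, entity, day)`, the lag windows, and all function names are hypothetical and serve only to make the example concrete.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Transaction = Tuple[str, str, int]  # (customer, entity, day); assumed layout
Periods = Dict[str, Tuple[int, int]]  # period name -> [min_lag, max_lag) in days


def quantify_relationship(
    transactions: Sequence[Transaction], first: str, second: str, periods: Periods
) -> Dict[str, int]:
    """Relationship module: count `second` transactions that follow a
    `first` transaction by the same customer within each lag window."""
    by_customer: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for customer, entity, day in transactions:
        by_customer[customer].append((entity, day))

    counts = {name: 0 for name in periods}
    for events in by_customer.values():
        first_days = [d for e, d in events if e == first]
        second_days = [d for e, d in events if e == second]
        for fd in first_days:
            for sd in second_days:
                for name, (min_lag, max_lag) in periods.items():
                    if min_lag <= sd - fd < max_lag:
                        counts[name] += 1
    return counts


def rank_periods(counts: Dict[str, int]) -> List[str]:
    """Rank module: order time periods by how often the event fell in them."""
    return sorted(counts, key=lambda name: counts[name], reverse=True)


def select_period(counts: Dict[str, int]) -> str:
    """Selection module: choose the top-ranked time period."""
    return rank_periods(counts)[0]


transactions = [
    ("a", "hotel", 1), ("a", "restaurant", 3),
    ("b", "hotel", 2), ("b", "restaurant", 5),
    ("c", "hotel", 1), ("c", "restaurant", 20),
]
periods = {"first_period": (1, 8), "second_period": (8, 31)}
counts = quantify_relationship(transactions, "hotel", "restaurant", periods)
print(counts, select_period(counts))
```

Raw counts are used here for simplicity; an actual embodiment could substitute any quantified relationship on which the rank module operates.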
Signal-Bearing Media
Wherever the functionality of any operational components of the disclosure is implemented using one or more machine-executed program sequences, these sequences may be embodied in various forms of signal-bearing media. Such signal-bearing media may comprise, for example, the storage or another signal-bearing medium, such as a magnetic or optical disk, tape, non-volatile or volatile memory such as ROM (read only memory), EPROM (erasable programmable read only memory), flash PROM, EEPROM, or battery backup RAM, optical storage (e.g., CD-ROM, WORM, DVD, digital optical tape), or other suitable signal-bearing media, including analog or digital transmission media, analog and communication links, and wireless communications, as well as communications over the internet.
A machine-readable medium provides instructions that, when executed by a machine, cause the machine to read transaction data, determine a relationship between a first entity and a second entity from the transaction data, rank the possibility of a future event occurring in a first selected time period based on the relationship between the first entity and the second entity, and rank the possibility of a future action occurring in a second selected time period based on the relationship between the first entity and the second entity. The instructions, in some embodiments, further cause the machine to quantify the relationship between the first entity and the second entity. In another embodiment, the machine-readable medium provides instructions that, when executed by a machine, further cause the machine to select one of the first selected time period or the second selected time period based on the ranking of the possibility of a future event occurring in the first selected time period and the ranking of the possibility of a future event occurring in the second selected time period. The machine-readable medium, in still further embodiments, provides instructions that, when executed by a machine, further cause the machine to determine a relationship among the first entity, the second entity, and a third entity. The third entity may be a marketing action, demographic information, or the like.
The foregoing description of the specific embodiments reveals the general nature of the invention sufficiently that others can, by applying current knowledge, readily modify and/or adapt it for various applications without departing from the generic concept, and therefore such adaptations and modifications are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments.
It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Accordingly, the invention is intended to embrace all such alternatives, modifications, equivalents and variations as fall within the spirit and broad scope of the appended claims.