Consumer decision-making has been a focus for many years. Companies that are attempting to meet a particular need in the marketplace, or that are attempting to find out how products or services are being received by the consumer, will often conduct market research to attempt to quantify attributes or characteristics of a particular consumer segment. If performed well, the consumer data extracted from this research can inform companies about how their and others' products or services are perceived and bought by purchasers or potential purchasers in the marketplace, and how the companies' products or services can be changed to achieve the companies' business goals.
Traditionally, this information is collected by introducing products and/or services to a test panel, focus group or another set of actual consumers and querying whether they like the product and would be interested in purchasing or using the product or service. Such consumer interest/liking surveys are intended to give marketers a better idea about whether consumers would actually buy or like the products and/or services, how often (or likelihood of repeat purchases) and how many units or what size they would purchase, how much they would pay, etc. In addition, such surveys can also be used to determine interest in advertising and preferences for certain ingredients in foods or beverages, or in packaging types, such as paperboard, plastic, etc.
While consumer data can be very useful, the data can often give inaccurate expectations and predictions about the probable success of the product or service, thereby creating potentially skewed results compared with actual sales. Such a situation can be embarrassing for a manufacturer and agency that conducted the surveys if expected/predicted purchasing levels as suggested by the manufacturer and agency are not attained.
This inaccuracy may be due to test panel participants or subjects providing feedback that does not match their actual liking or purchasing habits. While a few consumers in a survey may intentionally supply incorrect answers because they want to be invited back for other surveys or test product sampling, most participants generally try to be as accurate as possible, but their answers may not exactly correspond to their actual behavior. This discrepancy may have a number of different causes. One such cause is that test panelists sometimes do not understand the survey questions or may find the questions to be confusing or misleading. For example, in the food context, panelists might confuse the terms “refrigerated” and “frozen,” and give a survey response that assumes an inaccurate product characteristic. Another cause may be that the panelist is flattered that someone is asking for their opinion, and consequently is overly polite to the interviewer and indicates interest in the product even though the consumer would not have enough interest in the actual product to seek it out and pay hard-earned money to buy it. Still other causes may include errors in inputting or compiling survey responses and other factors. All of the foregoing can lead to inaccurate or skewed data when trying to decide whether to continue supporting a product or service offering.
Many attempts have been made in the past to make marketing survey results more accurate. Accordingly, what is needed is a technique that takes the inaccuracies of conventional consumer preference assessments into account while nevertheless providing a more accurate assessment or predictor of consumer interest in products and services.
A method is provided for predicting consumer behavior in selected products. The method includes providing a first matrix associated with N products evaluated by a plurality of consumers in terms of several different responses, providing a second matrix associated with the N products characterized by at least one of an analytical profile or an evaluation by a plurality of experts, and correlating the first matrix to the second matrix to produce a relationship model. In one embodiment, the first matrix is compressed to a dimensionality comparable to the dimensionality of the second matrix by computing average values for each product and consumer response variable, either over all consumers, or separately for likers and non-likers.
In one embodiment, the method can further include displaying a score plot of the relationship model. The score plot can include a diagnostic of the strength of association and correlation between the first matrix and the second matrix.
In another embodiment, the method can further include predicting consumer responses for new products using the relationship model. The predicted responses can be displayed with a level of confidence. A measure of reliability of the predictions for new products can be displayed as characterized by the second matrix.
In another embodiment, the method can further include building a third matrix associated with the N products characterized by either an analytical profile or an evaluation by an expert sensory panel not chosen in building the second matrix, and relating the first matrix to the third matrix to produce a relationship model. The method can further include relating any two matrices to each other.
In another embodiment, each matrix can be preprocessed by at least one preprocessing element to transform the data into a suitable form for analysis. The preprocessing elements can include scaling of data, mean-centering, transformation and expansion, advanced scaling, and data correction and compression. In one embodiment, building the first matrix can include analyzing the preprocessed data using cross-validation to determine a number of significant components, inspecting the data for outliers, and removing the outliers from the data. The data can be displayed as scores to show indications of groups, trends, and outliers. In another embodiment, building the first relationship model can include analyzing the preprocessed data to determine a liking/non-liking model, cross-validating the liking/non-liking model to determine the number of significant components, and dividing the liking/non-liking model into liker data and non-liker data based on the number of significant components. Further, an average value can be computed for each product and for each liker and non-liker consumer response variable.
A system is provided for predicting consumer behavior in selected products, including a first matrix module for providing a first matrix associated with N products evaluated by a plurality of consumers in terms of several responses, a second matrix module for providing a second matrix associated with the N products characterized by at least one of an analytical profile or an evaluation by a plurality of experts, and a correlation module for correlating the first matrix to the second matrix to produce a relationship model. The system can further include a display module for displaying a score plot of the relationship model. The score plot can include a diagnostic of the strength of association and correlation between the first matrix and the second matrix.
In another embodiment, the system can include a prediction module for predicting consumer responses for new products using the relationship model. The system can also include a display module for displaying the predicted responses with a level of confidence and/or a display module for displaying a measure of reliability of the predictors for the new products as characterized by the second matrix.
In another embodiment, the system can include a third matrix module for building a third matrix associated with the N products characterized by either an analytical profile or an evaluation by an expert sensory panel not chosen in building the second matrix, and a relationship module for relating the first matrix to the third matrix to produce a relationship model. The system can relate any two matrices to each other.
In another embodiment, each matrix can be preprocessed by at least one preprocessing element to transform the data into a suitable form for analysis. The preprocessing elements can include scaling of data, mean-centering, transformation and expansion, advanced scaling, and data correction and compression.
In one embodiment, the system can include an analysis module for analyzing the preprocessed data of any matrix or pair of matrices using cross-validation to determine a number of significant components of the data, an inspection module for inspecting the number of significant components of the data for outliers, and an outlier module for removing the outliers from the data. The system can include a display module for displaying the data or scores to show indications of groups, trends, and outliers.
In one embodiment, building the first matrix includes analyzing the preprocessed data to determine a liking/non-liking model, cross-validating the liking/non-liking model to determine a number of significant components, and dividing the liking/non-liking model into liker data and non-liker data based on the significant components. The dividing module further computes an average value for each product and consumer response variable, either for all consumers testing the product, or separately for likers and non-likers.
A method of predicting consumer behavior in selected products includes means for providing a first matrix associated with N products evaluated by a plurality of consumers, means for providing a second matrix associated with the N products characterized by at least one of an analytical profile or an evaluation by a plurality of experts, and means for correlating the first matrix to the second matrix to produce a relationship model.
A computer readable medium having prediction software stored thereon that when executed on a computing device correlates matrix data to produce a predicted relationship model, includes correlating a first matrix to a second matrix to produce a relationship model, and displaying a score plot of the relationship model.
The method and system provide the advantages of predicting consumer responses without the need for additional consumer input.
The basic objectives are (a) to understand the consumer responses and liking of the products as well as a comparison between the products with respect to the consumer data, and (b) to find the relationships between on the one hand the data matrices A and P, and on the other hand C (
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
Generally, a system predicts consumer responses for N products and candidates as follows. At least two matrices are produced for the N products or candidates, one matrix based on consumer evaluation and the other matrix based on analytical profile characterization or expert panel evaluation. A third matrix can be produced based on analytical profile characterization or expert panel evaluation not used for building the other matrix. A relationship model is built by correlating the product candidate data evaluated by consumers with the same product candidate data evaluated or analyzed by an expert panel and/or an analytical profile. The relationship model is used to build a prediction model of consumer behavior from either analytical or expert panel data or both. The prediction model provides an understanding of the nature of consumer behavior in terms of physical, chemical, and other factors, and thus allows the modification of the product candidates to improve consumer liking.
Each matrix module (102, 104, 106) produces a respective matrix, each based on N observations and K variables from a set of products that is evaluated by a group of consumers and characterized by an analytical profile and/or a descriptive profile from a panel of experts. Note that the number of rows in C often exceeds the number of products N, since several consumers evaluate each product. Analogously, several experts evaluate each product, and each product sample may be subjected to the analytical instrument several times; hence the number of rows in matrices 204 and 206 may initially exceed N before preprocessing by averaging reduces the number of rows to N (the number of products).
The correlation module 110 correlates at least two matrices to produce a relationship model that represents the relationship between the at least two matrices using PLS analysis. In some embodiments, before the matrices can be correlated they are transformed by the preprocessing module 140 and the PCA module 150 or the PLS module 160 into a suitable form for analysis using preprocessing elements. The preprocessing elements can include scaling of data, mean-centering, transformation and expansion, advanced scaling, and data correction and compression.
The display module 120 displays graphical results of the relationship model on a display device. The displayed results assist a user in narrowing the data set to produce a more detailed model. It should be understood that the display device can be any known type of display device, such as a liquid crystal display (LCD), a cathode ray tube (CRT) or the like.
The prediction module 130 utilizes the relationship model to predict responses of other products and product candidates without the need for these to be evaluated by the group of consumers, but only by the analytical profile and/or the panel of experts.
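The prediction step can be illustrated with a minimal sketch. It assumes a single consumer response (for example, average overall liking per product) and only one PLS component, which is a simplification of the multi-response, multi-component models described here. The function names `fit_pls1` and `predict_pls1` and the toy data are hypothetical and are not taken from the described system.

```python
from math import sqrt


def fit_pls1(X, y):
    """Fit a one-component PLS model for a single response (NIPALS-style).

    X: list of rows (products x descriptor variables); y: list of responses,
    e.g. the average consumer liking per product.
    """
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[x - m for x, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: direction in X-space with maximal covariance with y.
    w = [sum(row[k] * yv for row, yv in zip(Xc, yc)) for k in range(len(x_mean))]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("X and y are uncorrelated; no PLS component found")
    w = [v / norm for v in w]

    # Scores t = Xc w, then regress y on t to get the inner coefficient q.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict_pls1(model, x_new):
    """Predict the response of a new product from its descriptor row."""
    t_new = sum((x - m) * w for x, m, w in zip(x_new, model["x_mean"], model["w"]))
    return model["y_mean"] + model["q"] * t_new


# Toy training set: 4 products, 2 panel or analytical variables, 1 response.
X_train = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y_train = [2.0, 4.0, 6.0, 8.0]
model = fit_pls1(X_train, y_train)

# A new candidate characterized only by its descriptor row, not by consumers.
predicted = predict_pls1(model, [5.0, 10.0])
```

In this toy data the response is an exact linear function of the descriptors, so the single component reproduces it; real data would normally call for several components chosen by cross-validation.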
The consumer matrix 202 represents the evaluation of each of the N products and candidates by a panel of consumers based on a set of KC criteria to build the matrix 202 with KC variables. The KC criteria can include an overall liking of each N products and candidates by each consumer, initially, after some time, e.g., 30 seconds, after some additional time, e.g., 2 minutes, etc., and specific likings or dislikings such as sourness, metallic taste, sweetness, juiciness, hardness initially, after, e.g., 30 seconds, after e.g., 2 minutes, etc.
In some embodiments, each consumer may evaluate only a fraction of the products and candidates, e.g., ½ or ⅓ of the candidates, where the selection of the products and candidates evaluated by each consumer can be made according to an incomplete block design or similar.
The analytical data matrix 204 represents the characterization of each of the N products and candidates by analytical profiles to build the matrix 204 with KA variables. Examples of analytical profiles include gas or liquid chromatography (LC) and/or mass spectroscopy (MS), other spectroscopies (NMR, IR, NIR, Raman, or other) and combinations thereof, e.g., LC-MS.
The expert panel data matrix 206 represents the evaluation of each of the N products and candidates by an expert sensory panel to build the matrix 206 with KP attributes. Examples of attributes include toughness, color, acidic taste, bitterness and metallic taste, taken at periodic time points, e.g., after 0, 30, 60, and 300 seconds, etc. In some embodiments, the expert sensory panel evaluation is made in duplicate or triplicate and then averaged so that the expert panel data matrix 206 has one matrix row per product.
The correlation module 110 correlates the compressed consumer matrix 202 with at least one of the analytical data matrix 204 and the expert panel data matrix 206 to produce the relationship model. In some embodiments, the correlation module 110 can correlate any two matrices. However, a complex correlation analysis is needed because each matrix is typically different in size. For example, the analytical data matrix 204 and the expert panel data matrix 206 usually have N rows, one for each product (averaging over several experts and/or several analyses may be needed as a preprocessing step), while the consumer matrix 202 usually has a different and larger number of rows, one for each responding consumer with respect to one product; the number of columns in the matrices is usually different; and in some instances the analytical matrix 204 is absent or incomplete and difficult to employ in further data analysis.
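One simple way to assign a fraction of the products to each consumer is a cyclic design, sketched below. This is only an illustration of the idea of an incomplete block design; the source does not specify which design is used, and the function name `assign_blocks` and the panel sizes are hypothetical.

```python
def assign_blocks(n_products, n_consumers, block_size):
    """Cyclic incomplete block assignment (illustrative, not from the source).

    Consumer c evaluates block_size consecutive products starting at
    index c mod n_products, wrapping around the end of the product list.
    """
    if not 0 < block_size <= n_products:
        raise ValueError("block_size must be between 1 and n_products")
    return [
        [(c + j) % n_products for j in range(block_size)]
        for c in range(n_consumers)
    ]


# Hypothetical panel: 6 products, 12 consumers, each tasting 1/3 of the products.
plan = assign_blocks(n_products=6, n_consumers=12, block_size=2)

# Count how often each product is evaluated across the panel.
counts = [0] * 6
for block in plan:
    for product in block:
        counts[product] += 1
```

When the number of consumers is a multiple of the number of products, as here, every product is evaluated equally often under this scheme.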
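The averaging of replicate evaluations into one row per product can be sketched as follows. The helper name `average_by_product`, the product labels, and the scores are made up for illustration.

```python
from collections import defaultdict


def average_by_product(rows):
    """Average replicate rows so that each product ends up with one row.

    rows: iterable of (product_id, [attribute values]) pairs.
    Returns a dict mapping product_id to the list of attribute means.
    """
    grouped = defaultdict(list)
    for product_id, values in rows:
        grouped[product_id].append(values)
    return {
        product_id: [sum(col) / len(col) for col in zip(*replicates)]
        for product_id, replicates in grouped.items()
    }


# Hypothetical duplicate panel scores for two attributes (bitterness, acidity).
panel_rows = [
    ("A", [2.0, 5.0]),
    ("A", [4.0, 7.0]),
    ("B", [1.0, 3.0]),
    ("B", [3.0, 3.0]),
]
panel_matrix = average_by_product(panel_rows)
```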
In some embodiments, the preprocessing module 140 is used to transform the data in each matrix (202,204,206) into a suitable form for analysis using preprocessing elements, such as scaling of data, mean-centering, transformation and expansion, advanced scaling, and data correction and compression.
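Two of the listed preprocessing elements, mean-centering and scaling to unit variance, can be sketched as below. This is a minimal illustration using only the Python standard library; the function name `autoscale` is an assumption, and the other preprocessing elements (transformation, expansion, correction, compression) are not shown.

```python
from statistics import mean, stdev


def autoscale(matrix):
    """Mean-center each column and scale it to unit (sample) variance.

    matrix: list of at least two rows, each a list of floats. Constant
    columns are only centered, since dividing by zero is undefined.
    Returns (scaled_matrix, column_means, column_stdevs).
    """
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    stdevs = [stdev(col) for col in columns]
    scaled = [
        [(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stdevs)]
        for row in matrix
    ]
    return scaled, means, stdevs


# Toy matrix with two variables on very different scales.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
X_scaled, col_means, col_stdevs = autoscale(X)
```

Returning the column means and standard deviations allows the same transformation to be applied later to new samples before prediction.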
In some embodiments, the PCA module 150 and/or the PLS module 160 is used on the preprocessed consumer matrix 202 to (1) understand the consumer likings of the products and candidates, and (2) to “compress” the number of rows NC to N to make it possible to relate the compressed consumer matrix 202 to the analytical data matrix 204 and/or the expert panel data matrix 206.
The resulting model is first used to compress the consumer data from NC rows (one per consumer and product evaluation) to N rows (one per product) by means of averages over the consumers for each of the N products and for each consumer response. Sometimes this compression is made separately for likers and non-likers as seen in the scores of the first PLS model. The liking module 170 (
In some embodiments, a band of “indifferent” consumers with “t” between −0.5 and 0.5 can be excluded from further analysis to make a more distinct separation of likers from non-likers. It should be understood that the bandwidth can be customized by the user.
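The compression of the consumer data with an excluded band of indifferent consumers can be sketched as follows. The sketch assumes that a score t above the band marks a liker and a score below it marks a non-liker; the source does not state the sign convention. The function name `compress_by_group` and the example data are hypothetical.

```python
from collections import defaultdict


def compress_by_group(evaluations, band=0.5):
    """Average consumer responses per product, separately by liking group.

    evaluations: iterable of (product_id, t_score, [responses]) tuples, one
    per consumer-and-product evaluation. Assumption: t > band marks a liker
    and t < -band a non-liker; scores inside the band are dropped.
    Returns {"likers": {product: means}, "non_likers": {product: means}}.
    """
    grouped = {"likers": defaultdict(list), "non_likers": defaultdict(list)}
    for product_id, t_score, responses in evaluations:
        if t_score > band:
            grouped["likers"][product_id].append(responses)
        elif t_score < -band:
            grouped["non_likers"][product_id].append(responses)
        # Otherwise the consumer is "indifferent" and is excluded.
    return {
        group: {
            product_id: [sum(col) / len(col) for col in zip(*rows)]
            for product_id, rows in by_product.items()
        }
        for group, by_product in grouped.items()
    }


# Hypothetical data: (product, score t, [overall liking, sweetness liking]).
evaluations = [
    ("A", 1.2, [8.0, 7.0]),
    ("A", 0.9, [6.0, 9.0]),
    ("A", 0.1, [5.0, 5.0]),   # inside the band, excluded
    ("A", -1.4, [2.0, 3.0]),
    ("B", -0.8, [3.0, 4.0]),
    ("B", 2.0, [9.0, 8.0]),
]
compressed = compress_by_group(evaluations)
```

The `band` parameter corresponds to the user-adjustable bandwidth; setting it to 0 keeps every consumer whose score is nonzero.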
The dividing module 172 (
The compressed matrix of liker-data is then correlated with the analytical data matrix 204 (
The analysis module 162 (
As shown in
In some embodiments, if the number of variables is large, typically larger than 50, the PCA and PLS analyses may be done hierarchically by dividing the variables into blocks, analyzing each block separately, and then using the resulting block-scores from all the block models as new variables in a second PCA or PLS model. The loadings and other coefficients of the second PCA or PLS model give information about the importance of, and the correlation between, the blocks. A drill-down into each block model is made for important blocks (with large coefficients in the second model) to see which individual variables are important (having large coefficients in the respective block model) and how they are correlated.
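The hierarchical approach can be sketched for the PCA case as follows. This is a simplified illustration that keeps only the first component of each block and of the second-level model, and it omits the scaling of block scores that a full implementation would likely apply. The function names, block names, and data are hypothetical.

```python
from math import sqrt


def first_component(X, n_iter=200):
    """First principal component of X via NIPALS-style iteration.

    X: list of rows; columns are mean-centered internally.
    Returns (scores, loadings) with loadings normalized to unit length.
    """
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]
    # Start from the centered column with the largest sum of squares.
    t = list(max(zip(*Xc), key=lambda col: sum(v * v for v in col)))
    for _ in range(n_iter):
        p = [sum(tv * row[k] for tv, row in zip(t, Xc)) for k in range(len(means))]
        norm = sqrt(sum(v * v for v in p))
        if norm == 0:
            raise ValueError("block has no variance")
        p = [v / norm for v in p]
        t = [sum(a * b for a, b in zip(row, p)) for row in Xc]
    return t, p


def hierarchical_pca(X, blocks):
    """Fit one model per variable block, then a second model on block scores.

    blocks: dict mapping a block name to the list of its column indices.
    Returns (block_scores, block_importance), where block_importance holds
    the loading of each block score in the second-level model.
    """
    block_scores = {}
    for name, idx in blocks.items():
        sub = [[row[k] for k in idx] for row in X]
        block_scores[name], _ = first_component(sub)
    names = list(blocks)
    super_X = [list(vals) for vals in zip(*(block_scores[n] for n in names))]
    _, super_loadings = first_component(super_X)
    return block_scores, dict(zip(names, super_loadings))


# Toy data: 5 products, a strongly varying block and a nearly constant block.
X = [
    [1.0, 2.0, 0.10, 0.00],
    [2.0, 4.0, 0.00, 0.10],
    [3.0, 6.0, 0.10, 0.10],
    [4.0, 8.0, 0.00, 0.00],
    [5.0, 10.0, 0.10, 0.05],
]
blocks = {"taste": [0, 1], "texture": [2, 3]}
block_scores, importance = hierarchical_pca(X, blocks)
```

A block with a large absolute loading in `importance` is a candidate for the drill-down described above, where the loadings of that block's own model show which individual variables matter.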
The PLS module 160 estimates the relationship between two matrices X and Y, which can be used in 130 to predict Y-values for new samples of products or product candidates. Also, a reliability measure of the X-data for each sample is given (both “training set” and prediction samples), i.e., a distance to the model plot 320 as shown in
In some embodiments, the score plot 300 as shown in
In some embodiments, a PLS analysis can be used to produce a product liking profile by transforming the consumer data matrix 202 (
In some embodiments, the plot can show a distribution of consumer liking (score t1) for each product; the plot can be colored or shaded by product to indicate which product has strong likers; the plot can show which product has few/many weak likers or non-likers; etc.
In some embodiments where the consumer analyzed only part of the products, such as ½ or ⅓, a special PCA analysis of the folded out data matrix including “holes” can be done to get estimates of the values of the “holes” (the matrix elements with no value). Thereafter, the matrix with the “holes” can be filled in and analyzed by the special 3-way analysis as described above, resulting in scores plotted with different colors or shades for different products.
In another embodiment, a second plot can show the loadings of the consumer scales displaying which scales contribute strongly to the product profiles and which scales contribute weakly or not at all.
In some embodiments, it may be useful to understand the differences between known groups or classes of products and candidates in either the analytical, panel, or consumer data. The user can apply a PLS-discriminant analysis (PLS-DA) to the consumer matrix 202 (
As shown in
In some embodiments, if the number of variables is large, typically larger than 50, the PLS-discriminant analysis can be done hierarchically by dividing the variables into blocks and analyzing each block separately, and then using the resulting block-scores from all block models as new variables in a second “super model” as shown in the super model plot 370 of
The above-described systems and methods can be implemented in digital electronic circuitry, in computer hardware, firmware, and/or software. The implementation can be as a computer program product (i.e., a computer program tangibly embodied in an information carrier). The implementation can, for example, be in a machine-readable storage device and/or in a propagated signal, for execution by, or to control the operation of, data processing apparatus. The implementation can, for example, be a programmable processor, a computer, and/or multiple computers.
A computer program can be written in any form of programming language, including compiled and/or interpreted languages, and the computer program can be deployed in any form, including as a stand-alone program or as a subroutine, element, and/or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site.
Method steps can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and an apparatus can be implemented as, special purpose logic circuitry. The circuitry can, for example, be an FPGA (field programmable gate array) and/or an ASIC (application-specific integrated circuit). Modules, subroutines, and software agents can refer to portions of the computer program, the processor, the special circuitry, software, and/or hardware that implement that functionality.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor receives instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer can include, or can be operatively coupled to receive data from and/or transfer data to, one or more mass storage devices for storing data (e.g., magnetic disks, magneto-optical disks, or optical disks).
To provide for interaction with a user, the above described techniques can be implemented on a computer having a display device. The display device can, for example, be a cathode ray tube (CRT) and/or a liquid crystal display (LCD) monitor. The interaction with a user can, for example, be a display of information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer (e.g., interact with a user interface element). Other kinds of devices can be used to provide for interaction with a user. Other devices can, for example, be feedback provided to the user in any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback). Input from the user can, for example, be received in any form, including acoustic, speech, and/or tactile input.
The above described techniques can be implemented in a distributed computing system that includes a back-end component. The back-end component can, for example, be a data server, a middleware component, and/or an application server. The above described techniques can also be implemented in a distributed computing system that includes a front-end component. The front-end component can, for example, be a client computer having a graphical user interface, a Web browser through which a user can interact with an example implementation, and/or other graphical user interfaces for a transmitting device. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, wired networks, and/or wireless networks.
The system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.