The present invention relates to an information processing system, an information processing method, and an information processing program that analyze a factor that can contribute to a prediction target.
A method of conducting various analyses based on a large amount of performance data is known. Point of sale (POS) data is an example of data representing the sales performance of each store. For example, in a case where a company developing 1000 retail stores nationwide aggregates the sales volume of 2000 types of commodities per store per month, the number of such POS data in one year sums up to 1000 (stores)×12 (months/year)×2000 (types/month and store)=24,000,000.
As a method of analyzing such POS data, for example, there is a method of using an aggregation tool having a function such as a pivot table of EXCEL (registered trademark). By causing such an aggregation tool to read the POS data, it is possible for a user to aggregate the number of sales of commodities from various viewpoints, such as for each store, for each season, and for each commodity and to freely analyze a factor that has contributed to the sales from a micro viewpoint to a macro viewpoint.
In addition, Tableau (registered trademark), Statistical Analysis System (SAS) (registered trademark), Statistical Package for Social Science (SPSS) (registered trademark), and the like are known as examples of software specialized for statistics.
In addition, PTL 1 describes a system that counts an unspecified large number of people using a plurality of pieces of data. The system described in PTL 1 acquires data on the number of visitors by counting visitors to a predetermined place based on input data and also acquires characteristic estimation data by estimating characteristics of visitors based on input data.
PTL 1: International Publication WO2009/041242
According to the technique described in PTL 1, it is possible to count the number of visitors to a predetermined place based on input data. However, the technique described in PTL 1 does not consider analyzing which factors contributed to the number of visitors to a predetermined place, or to what extent.
Accordingly, an object of the present invention is to provide an information processing system, an information processing method, and an information processing program capable of analyzing a factor that can contribute to a prediction target.
An information processing system according to the present invention is an information processing system configured to predict a prediction target specified by a plurality of classifications using a prediction model including a variable that affects the prediction target, the information processing system including: an accepting unit that accepts classifications that specify the prediction target; and an aggregating unit that specifies the prediction target by the accepted classifications and aggregates, for each of the variables, the degree of contribution determined by the prediction model corresponding to the prediction target.
An information processing method according to the present invention is an information processing method configured to predict a prediction target specified by a plurality of classifications using a prediction model including a variable that affects the prediction target, the information processing method including: accepting classifications that specify the prediction target; and specifying the prediction target by the accepted classifications and aggregating, for each of the variables, the degree of contribution determined by the prediction model corresponding to the prediction target.
An information processing program according to the present invention is an information processing program applied to a computer configured to predict a prediction target specified by a plurality of classifications using a prediction model including a variable that affects the prediction target, the information processing program causing the computer to execute: an accepting process of accepting classifications that specify the prediction target; and an aggregating process of specifying the prediction target by the accepted classifications and aggregating, for each of the variables, the degree of contribution determined by the prediction model corresponding to the prediction target.
According to the present invention, it is possible to analyze a factor that can contribute to a prediction target.
As described in PTL 1, it is common to use a large amount of past performance data in analyzing information. Meanwhile, in analyzing information, it is also conceivable to use prediction models learned for each prediction target based on past performance data as well as past performance data itself. It is considered that the prediction model properly learned based on the performance data appropriately reflects the properties of the performance data. Therefore, it becomes possible to analyze a factor that can contribute to a prediction target based on such a prediction model.
However, prediction models are generally used for predicting results, and it is not usual to use a large number of prediction models themselves for factor analysis. When a prediction model is learned for each prediction target, if there is a large number of prediction targets, a large number of prediction models also exists. The present inventors conceived the idea of analyzing a factor that can contribute to the prediction target by aggregating a large number of prediction models.
Hereinafter, exemplary embodiments of the present invention will be described with reference to the drawings. In the following description, it is assumed that each prediction target is predicted using a prediction model and the prediction model has already been learned with past performance data and the like in advance. In addition, one prediction model is associated with one prediction target.
The prediction model is information representing a correlation between an explanatory variable and an objective variable. The prediction model is, for example, a component for predicting a prediction target result by calculating a variable as an objective based on an explanatory variable. The prediction model is generated by a learning device using learning data for which a value of an objective variable has already been obtained and an arbitrary parameter as inputs. The prediction model may be represented by, for example, a function c that maps an input x to a correct solution y. The prediction model may predict a numerical value of the prediction target or may predict a label of the prediction target. The prediction model may output a variable describing a probability distribution of the objective variable. The prediction model may be denoted as “model”, “learning model”, “estimation model”, “prediction formula”, or “estimation formula”, for example.
In the present exemplary embodiment, the prediction model is represented by a prediction formula including one or more explanatory variables indicating factors that can contribute to the prediction result of the prediction target. In the prediction model, for example, the objective variable is represented by a linear regression equation including a plurality of explanatory variables. In the above example, the objective variable is equivalent to the correct solution y and the explanatory variable is equivalent to the input x. The maximum number of explanatory variables included in one prediction model may be restricted for the purpose of, for example, enhancing the interpretability of the prediction model and preventing over learning. As will be described later, the prediction formula used for predicting one prediction target is not limited to one, and a conditional predictor in which the prediction formula is selected according to the value of the explanatory variable may be used as the prediction model.
The prediction target is assumed to belong to one or more classifications designated by a user. The classification may be singular or may have a hierarchical structure. Taking a retail store as an example, the prediction target is, for example, “the number of sales of orange juice sold at a shop A in Tokyo”. In this case, the prediction target is specified by the classification of a sales store (Tokyo>shop A) and the classification of a commodity (beverage>fruit juice beverage>orange juice). Here, the symbol denoted by “>” indicates that the classification is in a hierarchical structure.
Besides, the prediction target is, for example, “the number of sales of a ballpoint pen of Company A's private brand sold in March 2016 at a shop B managed by Company A”. In this case, the prediction target is specified by the classification of a sales store (managed by Company A>shop B), the classification of timing of sales (2016>March 2016), and the classification of a commodity (Company A's private brand>stationery>ballpoint pen).
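The way a prediction target is specified by hierarchical classifications can be sketched as follows. This is only an illustrative sketch: the class name PredictionTarget, its field names, and the representation of each classification as a tuple running from the uppermost level to the lowest level are assumptions made for this example and are not part of the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionTarget:
    """Hypothetical record: one path per classification axis, top level first."""
    store: tuple       # e.g. ("Tokyo", "shop A")
    commodity: tuple   # e.g. ("beverage", "fruit juice beverage", "orange juice")

    def describe(self) -> str:
        # Render each hierarchy with ">" between levels, as in the text.
        return f"{' > '.join(self.store)} / {' > '.join(self.commodity)}"


target = PredictionTarget(
    store=("Tokyo", "shop A"),
    commodity=("beverage", "fruit juice beverage", "orange juice"),
)
print(target.describe())
```

A further axis, such as the timing of sales, could be added as another tuple field in the same manner.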
The storage unit 30 stores a prediction model for each prediction target.
In the example illustrated in
The storage unit 30 is implemented by, for example, a magnetic disk device. The output unit 40 outputs the aggregation result by the aggregating unit 20. The output unit 40 may also accept an input from the user for the output result. The output unit 40 is implemented by, for example, a display device or a touch panel.
The accepting unit 10 accepts a classification that specifies a prediction target. In other words, the accepting unit 10 accepts a classification for specifying a prediction target for which a factor is to be analyzed. The classification to be accepted is not limited to one, but may be plural. For example, when a factor for “apple juice” at each store in March 2016 is analyzed, the accepting unit 10 accepts “March 2016” and “apple juice” as classifications. In addition, when the classification is in a hierarchical structure, the accepting unit 10 may accept not only a lowest classification but also an upper classification. For example, the accepting unit 10 may cause the output unit 40 to display candidate classifications and accept one or more classifications selected by the user. Besides, the accepting unit 10 may accept a classification via a communication network.
The aggregating unit 20 specifies a prediction target based on the accepted classification and specifies a prediction model for the specified prediction target. Specifically, the aggregating unit 20 specifies a prediction model for the prediction target from within the storage unit 30.
Note that, when the accepting unit 10 accepts an upper classification in the hierarchical structure, the aggregating unit 20 may judge that all lower classifications belonging to that classification are designated and specify all prediction targets of the matching classifications.
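The handling of an upper classification may be sketched, for example, as a prefix match on classification paths. The function names matches and select_targets, the tuple representation of the hierarchy, and the sample targets are assumptions introduced only for illustration.

```python
def matches(path, accepted):
    """Return True if `accepted` is an upper (or identical) classification of `path`."""
    accepted = tuple(accepted)
    return tuple(path[:len(accepted)]) == accepted


# Hypothetical stored prediction targets: (store path, commodity path).
targets = [
    (("Tokyo", "shop A"), ("beverage", "fruit juice beverage", "orange juice")),
    (("Tokyo", "shop A"), ("beverage", "fruit juice beverage", "apple juice")),
    (("Tokyo", "shop B"), ("beverage", "tea", "green tea")),
]


def select_targets(targets, store=(), commodity=()):
    """Select every target whose paths fall under the accepted classifications.

    An empty tuple means that no classification was accepted for that axis,
    so every target matches on that axis.
    """
    return [t for t in targets
            if matches(t[0], store) and matches(t[1], commodity)]


# Accepting the upper classification "fruit juice beverage" selects all
# lower classifications (orange juice and apple juice) belonging to it.
selected = select_targets(targets, commodity=("beverage", "fruit juice beverage"))
print(len(selected))
```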
For example, in the example illustrated in
Then, the aggregating unit 20 aggregates the weights of the explanatory variables for each explanatory variable included in the specified prediction models. Specifically, the aggregating unit 20 calculates the total sum of the weights for each explanatory variable included in the specified prediction models, thereby aggregating the weight of each explanatory variable. When the prediction formula is represented by a linear regression equation, since the weight of the explanatory variable corresponds to a coefficient, the aggregating unit 20 aggregates the coefficients of the explanatory variables for each explanatory variable.
Since the explanatory variable with a larger weight has a higher degree of contribution to the prediction result, in the following description, the weight specified for each explanatory variable or an aggregated value of the weights aggregated from a predetermined viewpoint is referred to as the degree of contribution of the explanatory variable. Note that the degree of contribution of the explanatory variable may be simply referred to as the degree of contribution hereinafter in some cases.
In addition, in the following description, the total sum of the weights for each explanatory variable included in the prediction models for the specified prediction targets is referred to as a first degree of contribution.
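As one possible sketch of this aggregation, the first degree of contribution may be computed by summing the weights of each explanatory variable over the specified prediction models. The model names, the coefficient values, and the choice of the absolute value of the coefficient as the weight are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical linear prediction models: model name -> {variable: coefficient}.
models = {
    "Y1": {"x1": 1.5, "x2": -0.5},
    "Y2": {"x1": 0.5, "x3": 2.0},
}


def first_degree_of_contribution(models):
    """Sum the weight of each explanatory variable over all given models.

    The weight is assumed here to be the absolute value of the coefficient.
    """
    totals = defaultdict(float)
    for coefficients in models.values():
        for variable, coefficient in coefficients.items():
            totals[variable] += abs(coefficient)
    return dict(totals)


first = first_degree_of_contribution(models)
print(first)
```

A variable that does not appear in a given prediction model simply contributes nothing for that model.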
The aggregating unit 20 calculates the total sum of the weights of each explanatory variable. In the example illustrated in
Note that the value of the coefficient may be used as the weight instead of the absolute value of the coefficient. That is, the weight may be a signed value. In this case, the aggregating unit 20 may calculate the total sum of the weights of each explanatory variable while canceling out a positive coefficient and a negative coefficient (that is, by performing addition and subtraction in line with the signs). In addition, the aggregating unit 20 may separately aggregate a positive degree of contribution and a negative degree of contribution for one explanatory variable. In this manner, since the aggregating unit 20 aggregates the degree of contribution of one explanatory variable for each sign, it is possible to treat one explanatory variable as if it were two explanatory variables, one for the positive contribution and one for the negative contribution.
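The two signed variants may be sketched as follows. The function names signed_totals and split_by_sign and the coefficient values are assumptions for illustration only.

```python
# Hypothetical signed coefficients: model name -> {variable: coefficient}.
models = {
    "Y1": {"x1": 2.0, "x2": -1.0},
    "Y2": {"x1": -0.5, "x2": -1.5},
}


def signed_totals(models):
    """Sum signed coefficients so that positive and negative values cancel out."""
    totals = {}
    for coefficients in models.values():
        for variable, coefficient in coefficients.items():
            totals[variable] = totals.get(variable, 0.0) + coefficient
    return totals


def split_by_sign(models):
    """Aggregate positive and negative coefficients of each variable separately."""
    positive, negative = {}, {}
    for coefficients in models.values():
        for variable, coefficient in coefficients.items():
            bucket = positive if coefficient >= 0 else negative
            bucket[variable] = bucket.get(variable, 0.0) + coefficient
    return positive, negative


net = signed_totals(models)
positive, negative = split_by_sign(models)
print(net, positive, negative)
```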
Note that the aggregating unit 20 may standardize the coefficients included in each prediction formula. Specifically, the aggregating unit 20 may correct each coefficient such that the total value of the coefficients in each prediction formula becomes one (that is, the average becomes zero and the dispersion becomes one). For example, in the case of the prediction formula Y1 exemplified in
In addition, the aggregating unit 20 may calculate the ratio of the calculated degree of contribution (first degree of contribution) of each explanatory variable. Specifically, the aggregating unit 20 may calculate, for each explanatory variable, the ratio of the first degree of contribution of each explanatory variable to the total sum of the first degrees of contribution. For example, it is assumed that the prediction formula exemplified in
Furthermore, the aggregating unit 20 may standardize the degrees of contribution calculated for each explanatory variable. Specifically, the aggregating unit 20 may correct each degree of contribution such that the total value of the degrees of contribution of the respective explanatory variables becomes one (that is, the average becomes zero and the dispersion becomes one). For example, in the case of the example illustrated in
In this manner, the aggregating unit 20 standardizes the coefficients in each prediction formula or calculates the ratio of the degree of contribution, such that comparison with the degree of contribution of another explanatory variable becomes easy.
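The calculation of the ratio of the degree of contribution may be sketched as follows. Only the ratio calculation is shown; the function name contribution_ratios and the sample values are assumptions for illustration.

```python
def contribution_ratios(contributions):
    """Divide each degree of contribution by the total of all degrees.

    `contributions` maps an explanatory variable to its (non-negative)
    first degree of contribution.
    """
    total = sum(contributions.values())
    if total == 0:
        raise ValueError("total degree of contribution is zero")
    return {variable: value / total for variable, value in contributions.items()}


# Hypothetical first degrees of contribution.
ratios = contribution_ratios({"x1": 2.0, "x2": 0.5, "x3": 1.5})
print(ratios)
```

Because the ratios sum to one, the degrees of contribution of different explanatory variables can be compared on a common scale.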
The accepting unit 10, the aggregating unit 20, and the output unit 40 are implemented by a central processing unit (CPU) of a computer working in accordance with a program (information processing program). For example, the program may be stored in the storage unit 30 such that the CPU reads this program and works as the accepting unit 10 and the aggregating unit 20 in accordance with the program. Alternatively, the functions of the information processing system may be provided in a software as a service (SaaS) format.
In addition, the accepting unit 10, the aggregating unit 20, and the output unit 40 may each be implemented by dedicated hardware. Furthermore, some or all of constituent elements of each device may be implemented by a general-purpose or dedicated circuitry, processor, or the like, or a combination thereof. These mechanisms may be constituted by a single chip or may be constituted by a plurality of chips connected via a bus. Some or all of constituent elements of each device may be implemented by a combination of the above-mentioned circuitry or the like and the program.
In a case where some or all of the constituent elements of each device are implemented by a plurality of information processing devices, circuitry, or the like, the plurality of information processing devices, the circuitry, or the like may be arranged in a concentrated manner or in a distributed manner. For example, the information processing devices, circuitry, or the like may be implemented in a mode in which the respective members are connected via a communication network, such as a client and server system or a cloud computing system.
Next, the action of the information processing system of the present exemplary embodiment will be described.
Next, an action of specifying a prediction model from the accepted classification will be described.
The aggregating unit 20 specifies a prediction target associated with the accepted classification from the table exemplified in
As described above, in the present exemplary embodiment, the accepting unit 10 accepts a classification that specifies a prediction target and the aggregating unit 20 deals with the prediction target specified by the accepted classification to aggregate, for each variable, the degree of contribution determined by the prediction model corresponding to that prediction target. Therefore, a factor that can contribute to the prediction result can be analyzed.
That is, in the present exemplary embodiment, the accepting unit 10 accepts the classifications of the prediction targets, such that the aggregating unit 20 can narrow down the analysis targets. In addition, since the aggregating unit 20 performs aggregation by focusing on the weights (coefficients) of each explanatory variable, which are factors that can contribute to the prediction target, the user can grasp a degree of influence (a degree of contribution) of each factor.
Hereinafter, effects of the present exemplary embodiment will be described in detail with specific examples.
In the invention of the present application, a situation in which a large number of prediction models are created is assumed. That is, in the present exemplary embodiment, a prediction model is created for each finely sorted prediction target and a factor is analyzed by aggregating the plurality of created prediction models.
For example, a situation is supposed in which there are the classification of “fruit juice beverage” and only three types of “orange juice”, “grape juice”, and “apple juice” as lower classifications of “fruit juice beverage”. When a factor is analyzed by focusing on “fruit juice beverage”, conceivable methods are (1) a method of analyzing a factor based on a prediction model created for whole fruit juice beverages and (2) a method of analyzing a factor by aggregating the prediction models created for each of orange juice, grape juice, and apple juice.
As in the invention of the present application, in a case where a prediction model is created for each finely sorted prediction target, the accuracy of factor analysis is enhanced when a factor is analyzed by aggregating the prediction models created for the individual prediction targets as in the above (2). This is because factor analysis on individual prediction models created with fine granularity can take finer factors (explanatory variables) into consideration than factor analysis for the entire “fruit juice beverage”. For example, it is assumed that a campaign A is made for orange juice and another campaign B is made for apple juice. In this case, the prediction models for orange juice and apple juice can each include the corresponding campaign as an explanatory variable. In particular, when an upper limit of the types of explanatory variables included in the prediction model is restricted in order to make the model easier to interpret and to prevent over learning, more remarkable effects are exerted.
In addition, by creating prediction models in fine units, the effect of being able to aggregate freely from various viewpoints (stores, commodities, timing, and the like) also can be obtained.
Note that the aggregating unit 20 may standardize common coefficients of the explanatory variable. Specifically, the aggregating unit 20 may correct each coefficient such that the total value of the coefficients of each explanatory variable becomes one (the average becomes zero and the dispersion becomes one). For example, in the case of the explanatory variable x1 exemplified in
In addition, the aggregating unit 20 may calculate the ratio of the coefficient of the explanatory variable between the respective prediction formulas. Specifically, the aggregating unit 20 may calculate the ratio of the coefficient of the explanatory variable to the calculated total sum of the coefficients of the explanatory variable for each prediction target. For example, the ratio of the coefficient of the explanatory variable x1 exemplified in
In this manner, the aggregating unit 20 standardizes the coefficients of each explanatory variable or calculates the ratio of the coefficient, such that the degree of contribution to the same explanatory variable can be compared for each prediction target.
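The comparison of one explanatory variable across prediction targets may be sketched as follows. The function name ratio_across_formulas, the coefficient values, and the use of the absolute value of the coefficient are assumptions for illustration.

```python
# Hypothetical prediction formulas: formula name -> {variable: coefficient}.
models = {
    "Y1": {"x1": 3.0, "x2": 1.0},
    "Y2": {"x1": 1.0},
    "Y3": {"x2": 2.0},
}


def ratio_across_formulas(models, variable):
    """For one explanatory variable, return each formula's share of the
    total coefficient of that variable (absolute values are assumed)."""
    weights = {name: abs(coefficients[variable])
               for name, coefficients in models.items()
               if variable in coefficients}
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: weight / total for name, weight in weights.items()}


x1_ratios = ratio_across_formulas(models, "x1")
print(x1_ratios)
```

In this sketch, a formula that does not include the explanatory variable is omitted from the result rather than being reported with a ratio of zero.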
Next, a second exemplary embodiment of the information processing system according to the present invention will be described. The configuration of the second exemplary embodiment is the same as the configuration of the first exemplary embodiment. However, the present exemplary embodiment is different from the first exemplary embodiment in that an aggregating unit 20 calculates the degree of contribution including the actually measured value of the explanatory variable. Note that the action of an accepting unit 10 is the same as that of the first exemplary embodiment.
In the present exemplary embodiment, it is assumed that the prediction model is represented by a linear regression equation including a plurality of explanatory variables. The aggregating unit 20 specifies a prediction target based on the accepted classification and specifies a prediction model for the specified prediction target. In addition, the aggregating unit 20 also specifies an actually measured value of an explanatory variable included in that prediction model based on the accepted classification. The actually measured value is stored, for example, in a storage unit 30.
The aggregating unit 20 calculates the product of the weight (coefficient) of the explanatory variable in the linear regression equation and the actually measured value of this explanatory variable for each explanatory variable. Then, the aggregating unit 20 calculates the total sum of the calculated products for each explanatory variable to employ as the degree of contribution. In the following description, the total sum of the products calculated for each explanatory variable is referred to as a second degree of contribution.
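The calculation of the second degree of contribution may be sketched as follows. The data layout (one dictionary of coefficients and one dictionary of actually measured values per prediction model) and all numerical values are assumptions for illustration.

```python
# Hypothetical coefficients: model name -> {variable: coefficient}.
models = {
    "Y1": {"x1": 2.0, "x2": 0.5},
    "Y2": {"x1": 1.0},
}

# Hypothetical actually measured values of the explanatory variables,
# keyed in the same way as the coefficients.
measured = {
    "Y1": {"x1": 3.0, "x2": 4.0},
    "Y2": {"x1": 5.0},
}


def second_degree_of_contribution(models, measured):
    """Sum, for each explanatory variable, coefficient * measured value
    over all specified prediction models."""
    totals = {}
    for name, coefficients in models.items():
        for variable, coefficient in coefficients.items():
            product = coefficient * measured[name][variable]
            totals[variable] = totals.get(variable, 0.0) + product
    return totals


second = second_degree_of_contribution(models, measured)
print(second)
```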
The aggregating unit 20 calculates the product of the coefficient and the actually measured value of the explanatory variable for each explanatory variable. In the example illustrated in
Note that, as in the first exemplary embodiment, the aggregating unit 20 may standardize the products of the coefficients and the actually measured values of the explanatory variables calculated by each prediction formula. Specifically, the aggregating unit 20 may correct each product such that the total value of the products becomes one (the average becomes zero and the dispersion becomes one). Note that the standardization may be performed after the total sum of the products of the respective explanatory variables is calculated.
In addition, the aggregating unit 20 may calculate the ratio of the calculated degree of contribution (second degree of contribution) of each explanatory variable. Specifically, the aggregating unit 20 may calculate, for each explanatory variable, the ratio of the second degree of contribution of each explanatory variable to the total sum of the second degrees of contribution.
Next, the action of an information processing system of the present exemplary embodiment will be described.
As described above, in the present exemplary embodiment, the aggregating unit 20 calculates the product of the coefficient that is the weight of the explanatory variable in the linear regression equation and the actually measured value of this explanatory variable for each explanatory variable and calculates the total sum of the calculated products for each explanatory variable as the second degree of contribution. Therefore, in addition to the effects of the first exemplary embodiment, analysis that reflects the performance value is enabled.
Hereinafter, effects of the present exemplary embodiment will be described in detail with specific examples.
For example, it is assumed that “the number of sales of orange juice on a certain day in March 2016 at a shop A” is explained by the following prediction formula. Here, the explanatory variables are represented in the parentheses.
Number of sales=−11.3*(highest temperature in the month in the vicinity of the shop A)+60*(total precipitation of the day in the vicinity of the shop A)+130
Judging from the above formula alone, it seems at a glance that the total precipitation of the day contributes greatly to the number of sales of orange juice on the certain day in March at the shop A, as the value of the coefficient thereof is large. However, it is assumed that, in reality, it did not rain at all in the vicinity of the shop A on the certain day in March. In that case, it can be said that, in reality, the total precipitation of the day in the vicinity of the shop A did not contribute at all to the number of sales of orange juice on the certain day in March at the shop A.
Accordingly, in the present exemplary embodiment, as compared with the first exemplary embodiment, the degree of contribution of a specific explanatory variable is calculated by the value of the product of “the value of the coefficient in the prediction formula” and “the actually measured value of the explanatory variable concerning this coefficient”, such that analysis that reflects the performance value is enabled.
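The above example can be worked through numerically as follows. The coefficients and the intercept are taken from the prediction formula above, and the precipitation of zero corresponds to the assumption that it did not rain. The highest temperature of 10.0 is a value assumed only for this illustration.

```python
# Coefficients and intercept of the prediction formula given in the text.
coefficients = {
    "highest_temperature": -11.3,
    "total_precipitation": 60.0,
}
intercept = 130.0

# Actually measured values. The precipitation of 0.0 reflects "it did not
# rain"; the temperature of 10.0 is a hypothetical value for illustration.
measured = {
    "highest_temperature": 10.0,
    "total_precipitation": 0.0,
}

# Degree of contribution of each explanatory variable:
# coefficient * actually measured value.
contributions = {name: coefficients[name] * measured[name]
                 for name in coefficients}
predicted = intercept + sum(contributions.values())

print(contributions)
print(predicted)
```

Although the coefficient of the total precipitation is the largest in magnitude, its product with the actually measured value is zero, so it is the highest temperature that accounts for the whole of the variable-dependent part of the prediction in this sketch.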
Note that, as in the first exemplary embodiment, the aggregating unit 20 may standardize the products of the coefficients and the actually measured values of the explanatory variable with respect to a common explanatory variable. Specifically, the aggregating unit 20 may correct the value of each product such that the total value of the products for each explanatory variable becomes one (the average becomes zero and the dispersion becomes one).
In addition, the aggregating unit 20 may calculate, for each explanatory variable, the ratio of the product of the coefficient and the actually measured value of the explanatory variable between the respective prediction formulas. Specifically, the aggregating unit 20 may calculate, for each prediction formula, the ratio of the product of each explanatory variable to the total sum of the products calculated for the explanatory variables.
Next, modifications of the second exemplary embodiment will be described. In the second exemplary embodiment, the method of calculating the degree of contribution using the actually measured value has been described. Meanwhile, the result also can be predicted using the prediction model. In this case, it is possible to specify a difference (error) between a prediction result predicted based on the prediction model and an actual measurement result actually acquired. Therefore, the aggregating unit 20 may correct the degree of contribution using the error as a difference between the prediction result predicted based on the prediction model and the actual measurement result actually acquired.
For example, the aggregating unit 20 may correct, for each prediction target, the degree of contribution of each explanatory variable by the same percentage based on a difference between the prediction result and the actual measurement result. For example, when the actual measurement result takes a value of twice the value of the prediction result, the aggregating unit 20 may double each of the degrees of contribution of the respective explanatory variables.
Besides, for example, the aggregating unit 20 may provide a new explanatory variable indicating a difference between the prediction result and the actual measurement result and employ the difference as the degree of contribution of the new explanatory variable.
Note that a method by which the aggregating unit 20 corrects the degree of contribution according to the error is not limited to the above example. The aggregating unit 20 may change the percentage by which the degree of contribution is corrected, or two or more new explanatory variables may be provided.
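The two corrections described above may be sketched as follows. The function names, the name prediction_error given to the new explanatory variable, and the numerical values are assumptions for illustration.

```python
def scale_by_actual(contributions, predicted, actual):
    """Correct every degree of contribution by the same percentage,
    namely the ratio of the actual measurement result to the prediction result."""
    if predicted == 0:
        raise ValueError("prediction result is zero; ratio is undefined")
    factor = actual / predicted
    return {variable: value * factor for variable, value in contributions.items()}


def add_error_variable(contributions, predicted, actual, name="prediction_error"):
    """Provide a new explanatory variable whose degree of contribution is
    the difference between the actual measurement and the prediction."""
    corrected = dict(contributions)
    corrected[name] = actual - predicted
    return corrected


# Hypothetical degrees of contribution for one prediction target.
contributions = {"x1": 30.0, "x2": 20.0}

# The actual measurement result is twice the prediction result,
# so each degree of contribution is doubled.
scaled = scale_by_actual(contributions, predicted=50.0, actual=100.0)
with_error = add_error_variable(contributions, predicted=50.0, actual=100.0)
print(scaled, with_error)
```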
Next, a third exemplary embodiment of the information processing system according to the present invention will be described. In the first and second exemplary embodiments, the method of calculating the degree of contribution for each explanatory variable has been described. Meanwhile, it is also supposed that the number of explanatory variables used for prediction will grow extremely large. That is, if the factors used for analysis are sorted too finely, the number of types of explanatory variables becomes enormous when consolidated and there is a possibility that the interpretability may be impaired.
The reason why the number of types of explanatory variables becomes enormous will be explained below with specific examples. For example, in a case where a company developing 1000 retail stores nationwide predicts the sales volume of 2000 types of commodities per store per month, the number of prediction models therefor in one year sums up to 1000 (stores)×12 (months/year)×2000 (types/month and store)=24,000,000.
Here, it is assumed that an operator wishes to analyze a factor of sales with respect to the sales of a specific commodity in a specific month nationwide. In this case, an accepting unit 10 accepts, from the operator, the classification of “the number of sales of orange juice on a certain day in March 2016” as a classification that specifies a prediction target. According to the classification accepted by the accepting unit 10, prediction models for 1000 stores are specified. That is, a prediction model that predicts the number of sales of orange juice on a certain day in March 2016 at each of 1000 stores is specified.
Meanwhile, as the number of prediction models increases, the types of explanatory variables included in these prediction models also increase. This will be explained by taking the prediction models illustrated in
For example, in the example illustrated in
When all these factors are aggregated, it can be seen that the sales of orange juice in March 2016 at the shop A to the shop D are affected by the factors indicated by the explanatory variables x2, x3, x4, x5, x6, x7, x9, x10, x11, x12, x13, x15, x16, and x17. However, having too many explanatory variables to consider may impair the interpretability. As a result, if the aggregating unit 20 performs an aggregating process on a large number of prediction models, a person may have difficulty in interpreting the result of such aggregation because the number of types of explanatory variables included in the prediction models grows too large. That is, even if the number of explanatory variables constituting one prediction formula is not so large, as the number of prediction formulas becomes larger, the types of explanatory variables included therein may increase. Accordingly, the present exemplary embodiment will describe a method that can analyze a factor that can contribute to a prediction target from a more global viewpoint.
In the present exemplary embodiment, a category indicating the properties of a variable is set in each explanatory variable. However, the category may be set in the explanatory variables of the first and second exemplary embodiments.
For example, when explanatory variables such as “TV advertisement”, “Internet posting”, and “leaflet distribution” are included in a prediction model, the category of “advertisement” is set in these explanatory variables, for example. In addition, for example, assuming that a prediction target is predicted every day, when explanatory variables such as “whether it is Sunday”, “whether it is a holiday”, and “whether it is the day before the holiday” are included in a prediction model, the category of “calendar” is set in these explanatory variables, for example. Furthermore, for example, assuming that a prediction target is predicted every day, when explanatory variables such as “whether it is rainy day”, “highest temperature”, and “daylight amount” are included in a prediction model, the category of “weather” is set in these explanatory variables, for example. A relationship between the explanatory variable and a category to which this explanatory variable belongs is assumed to be, for example, preset.
The configuration of the third exemplary embodiment is also the same as the configurations of the first and second exemplary embodiments. However, the present exemplary embodiment is different from the other exemplary embodiments in that an aggregating unit 20 calculates the degree of contribution by summarizing the explanatory variables into each category set in the explanatory variables. Note that whether the degree of contribution is calculated for each category or the degree of contribution is calculated for each explanatory variable may be determined in advance or a method for calculating the degree of contribution may be accepted by the accepting unit 10.
First, the aggregating unit 20 calculates the degree of contribution for each explanatory variable. The aggregating unit 20 may calculate the first degree of contribution described in the first exemplary embodiment as the degree of contribution for each explanatory variable or may calculate the second degree of contribution described in the second exemplary embodiment as the degree of contribution for each explanatory variable.
Next, the aggregating unit 20 aggregates the calculated degrees of contribution for each category of the explanatory variable. For example, when the explanatory variable x1 and the explanatory variable x2 exemplified in
Also in the present exemplary embodiment, the aggregating unit 20 may standardize the degrees of contribution aggregated for each category. Specifically, the aggregating unit 20 may correct each degree of contribution such that the total value of the degrees of contribution aggregated for each category becomes one (that is, the average becomes zero and the dispersion becomes one).
In addition, the aggregating unit 20 may calculate the ratio of the degree of contribution (third degree of contribution) aggregated for each category. Specifically, the aggregating unit 20 may calculate, for each category, the ratio of the third degree of contribution of each category to the total sum of the third degrees of contribution.
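The category aggregation and ratio calculation described above can be sketched as follows. This is only an illustrative sketch: the numerical values, the dictionary names, and the helper functions `aggregate_by_category` and `ratios` are assumptions introduced here, and the total sum of the third degrees of contribution is assumed to be nonzero.

```python
from collections import defaultdict

# Hypothetical per-variable degrees of contribution (illustrative values only).
contribution_by_variable = {
    "TV advertisement": 2.0,
    "Internet posting": 1.0,
    "leaflet distribution": 1.0,
    "whether it is Sunday": 3.0,
    "highest temperature": 1.0,
}

# Preset correspondence between each explanatory variable and its category.
category_of = {
    "TV advertisement": "advertisement",
    "Internet posting": "advertisement",
    "leaflet distribution": "advertisement",
    "whether it is Sunday": "calendar",
    "highest temperature": "weather",
}


def aggregate_by_category(contributions, category_of):
    """Sum the per-variable contributions within each category."""
    totals = defaultdict(float)
    for variable, value in contributions.items():
        totals[category_of[variable]] += value
    return dict(totals)


def ratios(totals):
    """Ratio of each category's contribution to the total sum (assumed nonzero)."""
    grand_total = sum(totals.values())
    return {category: value / grand_total for category, value in totals.items()}


third_degree = aggregate_by_category(contribution_by_variable, category_of)
third_degree_ratio = ratios(third_degree)
```

With these assumed values, the “advertisement” category collects the contributions of its three explanatory variables, so a reader sees three categories rather than five variables.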
Next, the action of an information processing system of the present exemplary embodiment will be described.
As described above, in the present exemplary embodiment, the aggregating unit 20 aggregates the degrees of contribution calculated for the respective explanatory variables for each category of these explanatory variables and thereby calculates the third degrees of contribution. Therefore, in addition to the effects of the first or second exemplary embodiment, analysis from a more global viewpoint is enabled.
Note that, as in the first or second exemplary embodiment, the aggregating unit 20 may standardize the degrees of contribution aggregated for each category by respective prediction formulas. Specifically, the aggregating unit 20 may correct each degree of contribution such that the total value of the degrees of contribution for the respective categories becomes one (the average becomes zero and the dispersion becomes one).
In addition, the aggregating unit 20 may calculate the ratio of the degree of contribution for each category between the respective prediction formulas. Specifically, the aggregating unit 20 may calculate, for each prediction formula, the ratio of the degree of contribution of each category to the total sum of the degrees of contribution calculated for the respective categories.
Next, a fourth exemplary embodiment of the information processing system according to the present invention will be described. The configuration of the fourth exemplary embodiment is also the same as the configuration of the first exemplary embodiment. However, the present exemplary embodiment will describe a method of calculating the degree of contribution using a prediction model in which a prediction formula is specified according to the value (actually measured value) of a variable to be applied. As the prediction model in which a prediction formula is specified according to the actually measured value, for example, there is a conditional predictor that specifies one prediction formula according to a sample. Note that the action of an accepting unit 10 is the same as that of the first exemplary embodiment.
The aggregating unit 20 uses a prediction model in which a prediction formula is specified according to the value of a variable to be applied (that is, the conditional predictor) to calculate the degree of contribution for each explanatory variable. Specifically, the aggregating unit 20 specifies a matching prediction formula for each sample to be used, using the above conditional predictor.
Thereafter, the aggregating unit 20 may calculate the first degree of contribution indicated in the first exemplary embodiment (that is, the total sum of the weights of the explanatory variable included in the prediction models for the specified prediction targets), or the second degree of contribution indicated in the second exemplary embodiment (that is, the total sum of the products calculated for each explanatory variable). In addition, the aggregating unit 20 may calculate the third degree of contribution indicated in the third exemplary embodiment (that is, the degree of contribution aggregated for each category).
For example, when the first degree of contribution is calculated, the aggregating unit 20 calculates, for each prediction formula, the percentage of a sample used for specifying the prediction formula. In the example illustrated in
Next, the aggregating unit 20 corrects the coefficient according to the calculated percentage. Specifically, the aggregating unit 20 multiplies the coefficient of the corresponding prediction formula by the calculated percentage. Then, the aggregating unit 20 aggregates the coefficients of the explanatory variable for each explanatory variable included in the specified prediction formula. This serves as the degree of contribution of each explanatory variable for one prediction target.
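The percentage-based correction of the coefficients can be sketched as follows. This is a minimal sketch under stated assumptions: the two prediction formulas, the gating rule in `select_formula`, the sample values, and the function name `first_degree` are all hypothetical and are not taken from the embodiment.

```python
from collections import Counter, defaultdict

# Hypothetical prediction formulas of a conditional predictor
# (coefficients per explanatory variable; illustrative values only).
formulas = {
    "f1": {"x1": 2.0, "x2": 4.0},
    "f2": {"x2": 8.0, "x3": -4.0},
}


def select_formula(sample):
    """Assumed gating rule: pick one prediction formula according to the sample."""
    return "f1" if sample["x1"] > 0 else "f2"


samples = [
    {"x1": 1.0, "x2": 0.5, "x3": 2.0},
    {"x1": 2.0, "x2": 1.0, "x3": 0.0},
    {"x1": 3.0, "x2": 0.0, "x3": 1.0},
    {"x1": -1.0, "x2": 0.25, "x3": 2.0},
]


def first_degree(formulas, samples, select_formula):
    """Weight each formula's coefficients by the share of samples that selected it,
    then sum the weighted coefficients for each explanatory variable."""
    counts = Counter(select_formula(sample) for sample in samples)
    contribution = defaultdict(float)
    for name, count in counts.items():
        share = count / len(samples)
        for variable, coefficient in formulas[name].items():
            contribution[variable] += coefficient * share
    return dict(contribution)


degree = first_degree(formulas, samples, select_formula)
```

Here three of the four samples select `f1` and one selects `f2`, so the coefficients of `f1` are multiplied by 0.75 and those of `f2` by 0.25 before being summed.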
When the second degree of contribution is calculated, the aggregating unit 20 calculates, for each explanatory variable, the product of the coefficient of the explanatory variable in the prediction formula specified according to the sample and the value of the explanatory variable in this sample. Then, the aggregating unit 20 calculates, for each explanatory variable, the total sum of the calculated products and employs this sum as the degree of contribution. This serves as the degree of contribution of each explanatory variable for one prediction target.
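The sample-wise calculation of the second degree of contribution can be sketched as follows. The formulas, the gating rule, the sample values, and the function name `second_degree` are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict

# Hypothetical prediction formulas (coefficients per explanatory variable).
formulas = {
    "f1": {"x1": 2.0, "x2": 4.0},
    "f2": {"x2": 8.0, "x3": -4.0},
}


def select_formula(sample):
    """Assumed gating rule of the conditional predictor."""
    return "f1" if sample["x1"] > 0 else "f2"


samples = [
    {"x1": 1.0, "x2": 0.5, "x3": 2.0},
    {"x1": -1.0, "x2": 0.25, "x3": 2.0},
]


def second_degree(formulas, samples, select_formula):
    """For each sample, multiply the coefficients of the selected formula by the
    sample's values, then sum the products for each explanatory variable."""
    contribution = defaultdict(float)
    for sample in samples:
        coefficients = formulas[select_formula(sample)]
        for variable, coefficient in coefficients.items():
            contribution[variable] += coefficient * sample[variable]
    return dict(contribution)


degree = second_degree(formulas, samples, select_formula)
```

In this sketch, a variable that does not appear in the formula selected for a sample contributes nothing for that sample, even if the sample holds a value for it.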
When the third degree of contribution is calculated, the aggregating unit 20 simply calculates the first degree of contribution or the second degree of contribution and then aggregates the degrees of contribution for each group of the explanatory variables of the common category.
As described above, in the present exemplary embodiment, the aggregating unit 20 calculates the degree of contribution for each explanatory variable using a prediction model in which a prediction formula is specified according to the value of a variable to be applied.
Therefore, in addition to the effects of the first to third exemplary embodiments, the degree of contribution can be calculated even using such a prediction model that a prediction formula is selected according to a sample.
Next, a specific example of the information processing system according to the invention of the present application will be described.
First, a method in which a user performs an aggregating process from various viewpoints on about 10 to 100 prediction models specified based on the classification accepted by the accepting unit 10 will be described in a first specific example. In the first specific example, it is assumed that the prediction models specified from the information exemplified in
In addition, in the example illustrated in
In addition, in the example illustrated in
On the screen S1, a radio button R1 is also provided in order to designate an aggregation method, which is used to select whether to aggregate for each factor (that is, each explanatory variable) or to aggregate for each category. Furthermore, a radio button R2 is also provided on the screen S1, which is used to select whether to display the weight of the explanatory variable described in the first exemplary embodiment as the degree of contribution or to display the product of the explanatory variable and the performance value described in the second exemplary embodiment by considering also the actually measured value as the degree of contribution.
When a user selects a classification and an aggregation method and presses an execution button B1 exemplified in
Hereinafter, an example of aggregation results in the case of accepting factor analysis from two types of viewpoints from the user will be described. The first type is a factor analysis of the sales of orange juice at all stores in Tokyo (that is, a shop A, a shop B, a shop C, and a shop D) in March 2016, whereas the second type is a factor analysis of the sales of whole fruit juice beverages (apple juice, orange juice, pineapple juice, grape juice, and peach juice) at a specific store (shop A) in March 2016.
As exemplified in
Note that, as illustrated in
In addition, when an upper classification is designated, the output unit 40 may display the aggregation result for each classification included in the lower rank.
Next, a second specific example of the information processing system according to the invention of the present application will be described. In the second specific example, a method of visualizing factors of various prediction targets as a list will be described. In the second specific example, six categories of “location”, “weather”, “calendar”, “shelf allocation”, “price” and “advertisement” are supposed as categories to which the explanatory variables belong. In addition, “TV advertisement”, “Internet posting”, and “leaflet distribution” are supposed as three explanatory variables belonging to the “advertisement” category.
Furthermore, it is assumed that prediction targets for which sales are predicted are summarized into seven groups of “whole beverages”, “fruit juice beverage”, “coffee”, “350 ml can, single item”, “350 ml can, set”, “500 ml plastic bottle, single item”, and “500 ml plastic bottle, set”. “Orange juice”, “grape juice”, and “apple juice” are assumed to be included in “fruit juice beverage” and the shop A is assumed to be situated in Tokyo included in the Kanto area. Additionally, sales in the Kanto area in January are supposed as an initial classification.
The output unit 40 may output the aggregation result exemplified in
In addition, the output unit 40 may output the aggregation result exemplified in
In addition, the output unit 40 may display a result of aggregation for a category including a directly controllable explanatory variable and a result of aggregation for a category including an explanatory variable not directly controllable in a form distinguishable from each other.
In the example illustrated in
Note that the results of aggregation for each category are output in the example illustrated in
In addition, the output unit 40 may visualize the ratio of the degree of contribution of each explanatory variable to the calculated total sum of the degrees of contribution of the explanatory variables.
Furthermore, in the invention of the present application, since the degrees of contribution are aggregated by summarizing the prediction models (prediction formulas) provided for each prediction target, it is possible to implement the display by development and consolidation in both of a direction of the category of the explanatory variable and a direction of the classification of the prediction target.
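Consolidation in both directions can be sketched as follows. This is an illustrative sketch only: the contribution values, the two mapping dictionaries, and the function `consolidate` are assumptions introduced here, with rows standing for prediction targets and columns for explanatory variables.

```python
from collections import defaultdict

# Hypothetical degrees of contribution keyed by (prediction target, explanatory variable).
contributions = {
    ("orange juice", "TV advertisement"): 2.0,
    ("orange juice", "highest temperature"): 1.0,
    ("apple juice", "TV advertisement"): 1.0,
    ("apple juice", "leaflet distribution"): 0.5,
}

# Assumed upper classification of each prediction target.
group_of_target = {
    "orange juice": "fruit juice beverage",
    "apple juice": "fruit juice beverage",
}

# Assumed category of each explanatory variable.
category_of_variable = {
    "TV advertisement": "advertisement",
    "leaflet distribution": "advertisement",
    "highest temperature": "weather",
}


def consolidate(cells, row_map=None, column_map=None):
    """Roll the table up along the rows, the columns, or both.
    A direction whose map is None is left in its developed (detailed) form."""
    table = defaultdict(float)
    for (row, column), value in cells.items():
        row_key = row_map[row] if row_map else row
        column_key = column_map[column] if column_map else column
        table[(row_key, column_key)] += value
    return dict(table)


by_category = consolidate(contributions, column_map=category_of_variable)
by_group_and_category = consolidate(contributions, group_of_target, category_of_variable)
```

Passing only one of the two maps consolidates one direction while keeping the other developed, which corresponds to switching between a detailed view and a summarized view of the same aggregation result.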
In addition, the examples illustrated in
Note that the above-described specific examples have described a case where the sales relating to the commodity are treated as prediction targets. However, a case where a target relating to the service is assigned as a prediction target can be similarly handled. For example, the number of visitors to a facility providing a certain service can be cited as a prediction target relating to the service.
In addition, the above-described specific examples have exemplified the contents and properties of the commodity and the place where the commodity is provided as the classifications of the prediction target, but the classifications of the prediction target are not limited to these contents. For example, the classification may be provided from the viewpoint of a seller or purchaser, or may be provided from the viewpoint of the time the commodity is provided. Furthermore, such a classification is not limited to a case where the prediction target is a target relating to the commodity and can be adopted similarly also in a case where the prediction target is a target relating to the service.
For example, it is assumed that a factor of the number of visitors to a facility F providing a certain service is analyzed. In this case, it is conceivable to set the timing (for example, March 2015) as the classification. In addition, advertisements (for example, the number of times a commercial featuring talent A is broadcast in the Kansai region, and the number of appearances in hanging advertisements inside a predetermined train) may be used as factors (explanatory variables).
Besides, for example, it is assumed that a factor of a certain lifestyle disease is analyzed. In this case, for example, age group (forties), gender (male), and the like can be cited as classifications.
In addition, from such a viewpoint, the information processing system of the invention of the present application can be used not only for sales predictions at retail stores, but also for a wide range of industries and prediction targets such as production predictions for manufacturing industries, predictions of the number of passengers for railway companies, and demand predictions for electric utilities.
Next, the outline of the present invention will be described.
With such a configuration, a factor that can contribute to the prediction target can be analyzed.
In addition, the information processing system 80 may further include a storage unit (for example, the storage unit 30) that stores a prediction target specified by a plurality of classifications in association with a prediction model including a variable that affects the prediction target. Then, the aggregating unit 82 may aggregate for the prediction target specified by the accepted classifications among a plurality of prediction targets stored in the storage unit.
In addition, the aggregating unit 82 may aggregate the degree of contribution (for example, the third degree of contribution) for each of categories based on a correspondence relationship between a variable and one of the categories to which this variable belongs. With such a configuration, analysis from a more global viewpoint is enabled.
Specifically, the aggregating unit 82 may aggregate a weight of the variable as the degree of contribution. In addition, the aggregating unit 82 may calculate, for each variable, a total sum of the weights of the variable included in prediction models for specified prediction targets as a first degree of contribution. With such a configuration, it is possible to analyze a contributable factor (explanatory variable) by summarizing a plurality of prediction targets.
In addition, the prediction model may be represented by a linear regression equation including a plurality of variables. At this time, the aggregating unit 82 may aggregate coefficients of the variables included in the prediction model as the weights of these variables.
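The selection of prediction models by accepted classifications and the calculation of the first degree of contribution can be sketched as follows. The stored models, their coefficients, and the names `matches` and `first_degree` are hypothetical; the sketch assumes that each stored prediction model is a linear regression equation represented by its coefficients.

```python
from collections import defaultdict

# Hypothetical contents of the storage unit: each prediction target is specified
# by classifications and associated with the coefficients of its prediction model.
stored_models = [
    {"classification": {"store": "shop A", "commodity": "orange juice"},
     "coefficients": {"x2": 1.5, "x3": 0.5}},
    {"classification": {"store": "shop B", "commodity": "orange juice"},
     "coefficients": {"x2": 2.0, "x5": -1.0}},
    {"classification": {"store": "shop A", "commodity": "apple juice"},
     "coefficients": {"x2": 4.0}},
]


def matches(classification, accepted):
    """True when the prediction target satisfies every accepted classification."""
    return all(classification.get(key) == value for key, value in accepted.items())


def first_degree(stored_models, accepted):
    """Sum the coefficients of each variable over the models of the specified targets."""
    totals = defaultdict(float)
    for model in stored_models:
        if matches(model["classification"], accepted):
            for variable, coefficient in model["coefficients"].items():
                totals[variable] += coefficient
    return dict(totals)


orange_juice = first_degree(stored_models, {"commodity": "orange juice"})
shop_a = first_degree(stored_models, {"store": "shop A"})
```

Changing only the accepted classifications changes the viewpoint of the analysis, for example from one commodity across stores to all commodities at one store.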
In addition, when the prediction model is represented by a linear regression equation including a plurality of variables, the aggregating unit 82 may calculate products of coefficients of the variables included in the prediction model and actually measured values of these variables for each of the variables and calculate a total sum of the calculated products for each of the variables as a second degree of contribution. With such a configuration, analysis that reflects a performance value is enabled.
At that time, the aggregating unit 82 may correct the degree of contribution based on an error which is a difference between a predicted value and an actually measured value of the prediction target. In addition, the aggregating unit 82 may aggregate an error which is a difference between a predicted value and an actually measured value of the prediction target as the degree of contribution of a variable indicating this error.
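The second degree of contribution, together with the aggregation of the error as the contribution of a variable indicating the error, can be sketched as follows. This sketch rests on several assumptions not stated in the text: the models and measured values are hypothetical, an intercept term is included only to obtain the predicted value, and the error is taken as the actually measured value minus the predicted value.

```python
from collections import defaultdict

# Hypothetical linear regression models for two prediction targets.
models = {
    "shop A": {"intercept": 1.0, "coefficients": {"x1": 2.0, "x2": 0.5}},
    "shop B": {"intercept": 0.0, "coefficients": {"x2": 1.5, "x3": -1.0}},
}

# Hypothetical actually measured values of the variables and of the prediction target.
observations = {
    "shop A": {"values": {"x1": 3.0, "x2": 4.0}, "actual": 10.0},
    "shop B": {"values": {"x2": 2.0, "x3": 1.0}, "actual": 1.5},
}


def second_degree_with_error(models, observations):
    """Sum coefficient-times-measured-value products for each variable and
    aggregate the prediction error under a separate "error" variable."""
    totals = defaultdict(float)
    for target, model in models.items():
        observed = observations[target]
        predicted = model["intercept"]
        for variable, coefficient in model["coefficients"].items():
            product = coefficient * observed["values"][variable]
            totals[variable] += product
            predicted += product
        # Assumed sign convention: actually measured value minus predicted value.
        totals["error"] += observed["actual"] - predicted
    return dict(totals)


result = second_degree_with_error(models, observations)
```

Keeping the error as its own entry lets a reader see how much of the measured result is not explained by the variables of the prediction models.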
In addition, the aggregating unit 82 may standardize the degrees of contribution calculated for each variable. For example, in the case of the example illustrated in
In addition, the aggregating unit 82 may calculate a ratio of the degree of contribution of a variable to a calculated total sum of the degrees of contribution of variables for each of these variables. For example, in the case of the example illustrated in
Meanwhile, the aggregating unit 82 may standardize weights of a variable common to respective prediction formulas for each variable. For example, in the case of the example illustrated in
In addition, the aggregating unit 82 may calculate a ratio of a weight of a common variable to a total sum of weights of the variable for each prediction target. For example, in the case of the example illustrated in
In addition, the aggregating unit 82 may calculate the degree of contribution for each variable using a prediction model (for example, the conditional predictor) in which a prediction formula is specified according to a value of a variable (for example, a sample) to be applied.
Note that the prediction target may be a target relating to a commodity or a service. Additionally, the classification may be information indicating any one of contents or properties of the commodity or the service, a seller or a purchaser of the commodity or the service, and a place or time at which the commodity or the service is provided.
In addition, the information processing system may include an output unit (for example, the output unit 40) that displays a result of aggregation for a directly controllable variable (for example, “advertisement”, “price”, and “shelf allocation” exemplified in
In addition, a case where the prediction model is a linear regression equation has been described thus far. However, the prediction model is not limited to the linear regression equation. The present invention can be applied as long as the prediction model is made up of a variable that affects the prediction target and the degree of contribution to the prediction target is determined by the prediction model.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/001751 | 3/25/2016 | WO | 00 |