This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2022-086979, filed on May 27, 2022, the entire contents of which are incorporated herein by reference.
A certain aspect of embodiments described herein relates to a learning method of a value calculation model, a non-transitory computer-readable recording medium, and a selection probability estimation method.
It is desirable to control the transportation of people in order to reduce CO2 emissions and alleviate traffic congestion. For example, when there is a plurality of transportation options for a pair (OD) of an origin (O) and a destination (D), changing the fare for each transportation option will cause people to change the transportation option that they select. Therefore, by setting the fare for each transportation option appropriately, it is possible to appropriately control the transportation of people.
Conventionally, when predicting at what rate each of options having attribute values of different scales such as cost and time is selected, a numerical value (value) that can express each option with a single scale is obtained by inputting the attribute values into a predetermined formula (for example, a linear formula). Then, the degree to which each option is selected (selection probability) is predicted from the relative relationship between the obtained values of the options. Note that the art related to the present disclosure is also disclosed in Japanese Patent Application Laid-Open No. 2015-114988.
According to an aspect of the embodiments, there is provided a learning method of a value calculation model for calculating a value of an option used when a person acts from an attribute value of the option, implemented by a computer, the learning method including: acquiring input data in which a selection probability indicating a rate at which each option is selected from a plurality of options and attribute values of the plurality of options when the selection probability is obtained are associated with each other; and acquiring, for each combination of two options that can be extracted from the plurality of options, a relationship between selection probabilities of the two options included in each combination from the input data, and adjusting the value calculation model so that a relationship between values calculated when attribute values of the two options included in each combination are input to the value calculation model and a relationship between the selection probabilities corresponding to each combination are close to each other.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
To learn a value calculation model for calculating a value from attribute values by machine learning or the like, data in which the attribute values of each option and the value of each option are associated with each other is necessary as learning data. However, only the rate at which each option was selected (selection probability) can be obtained as the observed value of each option. Even when the selection probability of each option is obtained, the value of each option cannot be obtained from the selection probability, and thus the selection probability alone is insufficient as learning data.
Hereinafter, an embodiment will be described in detail with reference to
The information processing apparatus 10 of the present embodiment is an apparatus that determines and outputs an appropriate toll (billing amount) when a user wants to set road pricing (toll) for eliminating congestion on a road. For example, as illustrated in
As illustrated in
The transportation data acquisition unit 20 acquires transportation data illustrated in
The selection probability calculation unit 22 calculates the rate at which each option (car, train, bus) was selected for each OD from the transportation data of
The attribute value acquisition unit 24 acquires attribute values (in the present embodiment, cost and time) of each option for each OD. Here, the cost is a fare when using a train or a bus, a toll for a road when using a car, or the like. Time is the time required to move between OD. The attribute value acquisition unit 24 acquires attribute value data illustrated in
The learning data generation unit 26 generates learning data using the selection probability data (
The model learning unit 28 executes a process of learning the value calculation model using the learning data stored in the learning data storage unit 44.
The target selection probability acquisition unit 30 acquires a target value of the selection probability of each option for a certain OD (object OD) input by the user. The data of the target value input by the user is target selection probability data illustrated in
The optimum billing amount calculation unit 32 acquires the attribute value data (see
$$P_1 = \frac{V_1}{V_1 + V_2 + V_3} \tag{1}$$
In the example illustrated in
The output unit 34 outputs the optimum billing amount notified from the optimum billing amount calculation unit 32 to the display unit 93.
Here, an outline of learning of the model learning unit 28 will be described.
As illustrated in
As a result of intensive studies, the inventor has focused on the fact that a relationship (ratio) between values can be obtained from the selection probabilities, which are observed values. For example, as illustrated in
Next, a process executed by the information processing apparatus 10 will be described in detail. The information processing apparatus 10 executes the “learning preparation and learning process” illustrated in
When the process of
Then, in step S12, the selection probability calculation unit 22 calculates the selection probability of each option in each OD with reference to the transportation data (
Then, in step S14, the learning data generation unit 26 selects one unselected OD. When there are three ODs (OD1 to OD3) as illustrated in
Then, in step S16, the learning data generation unit 26 executes a learning data generation process. In this step S16, a process according to the flowchart of
In the process of
Then, in step S32, the learning data generation unit 26 acquires respective attribute values of the two options. The learning data generation unit 26 refers to the attribute value storage unit 42 and acquires the attribute values (cost and time) of, for example, the option (car) and the option (bus) for OD1 from the attribute value data illustrated in
Then, in step S34, the learning data generation unit 26 calculates the ratio of the selection probabilities of the two options. The learning data generation unit 26 refers to the selection probability storage unit 40, acquires, for example, the respective selection probabilities (50% and 17%) of the option (car) and the option (bus) for OD1 from the selection probability data in
Then, in step S36, the learning data generation unit 26 records the acquired attribute values and the calculated ratio of the selection probabilities as the learning data. In the above example, the learning data generation unit 26 stores the data with the learning data ID=“001” in
Then, in step S38, the learning data generation unit 26 determines whether all combinations of options have been selected. When the determination at step S38 is negative, the process returns to step S30, and the processes in and after step S30 are repeatedly executed. On the other hand, when the determination in step S38 is affirmative, the process proceeds to step S18 of
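The learning data generation of steps S30 to S38, repeated for each OD, can be sketched as follows. This is a minimal illustration and not the embodiment's actual implementation: the dictionary layout, the function name `generate_learning_data`, and all numerical values other than the 50% (car) and 17% (bus) selection probabilities for OD1 are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical selection probabilities per OD (as fractions). Only the OD1
# values for car (0.50) and bus (0.17) come from the text; the rest are made up.
selection_prob = {
    "OD1": {"car": 0.50, "train": 0.33, "bus": 0.17},
    "OD2": {"car": 0.40, "train": 0.40, "bus": 0.20},
    "OD3": {"car": 0.25, "train": 0.60, "bus": 0.15},
}

# Hypothetical attribute values per OD and option: (cost, time).
attributes = {
    "OD1": {"car": (500, 30), "train": (400, 40), "bus": (300, 60)},
    "OD2": {"car": (800, 25), "train": (600, 35), "bus": (300, 55)},
    "OD3": {"car": (700, 50), "train": (450, 30), "bus": (350, 70)},
}


def generate_learning_data(selection_prob, attributes):
    """Build one record per pair of options per OD."""
    records = []
    for od, probs in selection_prob.items():          # one OD at a time
        options = list(probs)                          # keeps car, train, bus order
        for a, b in combinations(options, 2):          # S30: pick two options
            records.append({
                "id": f"{len(records) + 1:03d}",
                "od": od,
                "options": (a, b),
                # S32: attribute values of the two options
                "attrs": (attributes[od][a], attributes[od][b]),
                # S34: ratio of the two selection probabilities
                "prob_ratio": probs[a] / probs[b],
            })                                         # S36: record as learning data
    return records


learning_data = generate_learning_data(selection_prob, attributes)
for record in learning_data[:3]:
    print(record)
```

With three options there are three pairs per OD, so three ODs yield nine records. Note that the ratio is undefined when the second option's selection probability is 0; this sketch does not handle that case.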
In step S18 of
In step S20, the model learning unit 28 executes a value calculation model learning process. In this step S20, a process according to the flowchart of
When the process of
Then, when the process proceeds to step S42, the model learning unit 28 selects one unselected piece of the learning data. For example, the model learning unit 28 selects the learning data that is in the top (learning data ID=001) in
Then, in step S44, the model learning unit 28 inputs the attribute values of the selected piece of the learning data to the value calculation model, and calculates the respective values (V1 and V2 in
Then, in step S46, the model learning unit 28 calculates the ratio (V1/V2) of the values of the options.
Then, in step S48, the model learning unit 28 calculates a difference ((V1/V2)−(P1/P2)) between the ratio of the values of the options and the ratio of the selection probabilities of the selected piece of the learning data and records the difference as a residual.
Then, in step S50, the model learning unit 28 determines whether all pieces of the learning data have been selected. When the determination in step S50 is negative, the process returns to step S42, and the processes in steps S42 to S50 are repeatedly executed until the residuals of all the pieces of the learning data are calculated. On the other hand, when the determination in step S50 is affirmative, the model learning unit 28 proceeds to step S52.
In step S52, the model learning unit 28 determines whether the sum of the residuals calculated in step S48 is equal to or less than the threshold value. When the determination in step S52 is negative, it is necessary to adjust the value calculation model, and thus the process proceeds to step S54.
In step S54, the model learning unit 28 updates the parameters of the value calculation model. In addition, the model learning unit 28 sets all the pieces of the learning data as unselected and deletes all the recorded residuals. Thereafter, the model learning unit 28 repeatedly executes the processes of steps S42 to S54 using the updated value calculation model. Then, when the sum of the residuals becomes equal to or less than the threshold value, the determination in step S52 becomes affirmative, and the model learning unit 28 proceeds to step S56.
In step S56, the model learning unit 28 stores the parameters of the value calculation model in the model parameter storage unit 46. Thus, the process of
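The loop of steps S42 to S54 can be illustrated with a small sketch. Several choices here are assumptions not stated in the text: the value model is a hypothetical `V = exp(w1*cost + w2*time)` (chosen so that values stay positive and the ratio V1/V2 is always defined), the residuals are squared before summing so that positive and negative residuals cannot cancel, and the parameters are updated by plain gradient descent. The learning data is synthetic, generated from known weights so that an exact fit exists.

```python
import math


def value(weights, attrs):
    """Hypothetical value model: V = exp(w1*cost + w2*time), always positive."""
    return math.exp(sum(w * x for w, x in zip(weights, attrs)))


def train(learning_data, lr=0.05, threshold=1e-6, max_epochs=20000):
    """Fit the weights so that value ratios approach probability ratios."""
    weights = [0.0, 0.0]
    loss = float("inf")
    for _ in range(max_epochs):
        residuals = []
        grad = [0.0, 0.0]
        for attrs1, attrs2, prob_ratio in learning_data:      # S42
            v1 = value(weights, attrs1)                        # S44
            v2 = value(weights, attrs2)
            value_ratio = v1 / v2                              # S46
            residual = value_ratio - prob_ratio                # S48
            residuals.append(residual)
            # Gradient of residual**2 with respect to each weight.
            for k in range(2):
                grad[k] += 2.0 * residual * value_ratio * (attrs1[k] - attrs2[k])
        loss = sum(r * r for r in residuals)                   # assumption: squared sum
        if loss <= threshold:                                  # S52
            break
        weights = [w - lr * g for w, g in zip(weights, grad)]  # S54
    return weights, loss


# Synthetic data: (cost in thousands of yen, time in hours) for pairs of options.
true_weights = [-1.0, -2.0]
pairs = [
    ((0.5, 0.5), (0.4, 0.7)),
    ((0.5, 0.5), (0.3, 1.0)),
    ((0.4, 0.7), (0.3, 1.0)),
    ((0.8, 0.4), (0.6, 0.6)),
    ((0.8, 0.4), (0.3, 0.9)),
    ((0.6, 0.6), (0.3, 0.9)),
]
learning_data = [(a, b, value(true_weights, a) / value(true_weights, b)) for a, b in pairs]

weights, loss = train(learning_data)
print(weights, loss)
```

Because only ratios of values are fitted, any model of this kind is identified only up to a common positive scale factor on V; the exponential form absorbs that factor, which is why no bias term appears.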
Next, the billing amount determination process will be described with reference to the flowchart of
When the process of
Then, in step S72, the optimum billing amount calculation unit 32 selects one unselected option. For example, the optimum billing amount calculation unit 32 selects the option (car) from the option (car), the option (train), and the option (bus).
Then, in step S74, the optimum billing amount calculation unit 32 calculates the value of the selected option using the value calculation model that has been learned through the processes of
Then, in step S76, the optimum billing amount calculation unit 32 determines whether all options have been selected. When the determination in step S76 is negative, the process returns to step S72, and the processes in steps S72 to S76 are repeated until the values of all the options are calculated. When the determination in step S76 is affirmative, the optimum billing amount calculation unit 32 proceeds to step S78.
In step S78, the optimum billing amount calculation unit 32 calculates (estimates) the selection probability of each option from the calculated value of each option. Specifically, the optimum billing amount calculation unit 32 calculates (estimates) the selection probability of each option using the above equation (1).
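As a small illustration of equation (1), the selection probability of each option is its value divided by the sum of the values of all options. The function name and the example values below are hypothetical.

```python
def selection_probabilities(values):
    """Equation (1): P_i = V_i / (V_1 + V_2 + ... + V_n)."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("the sum of the values must be positive")
    return {name: v / total for name, v in values.items()}


# Hypothetical values for the three options.
probs = selection_probabilities({"car": 3.0, "train": 2.0, "bus": 1.0})
print(probs)
```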
Then, in step S80, the optimum billing amount calculation unit 32 determines whether the calculated selection probability matches the target selection probability. When the difference between the calculated selection probability and the target selection probability falls within a predetermined range, the optimum billing amount calculation unit 32 may determine that the calculated selection probability and the target selection probability match each other. When the determination in step S80 is negative, the process proceeds to step S82, where the optimum billing amount calculation unit 32 updates the cost of the option (car), and then returns to step S72. Thereafter, the optimum billing amount calculation unit 32 repeats the processes in and after step S72 until the determination in step S80 becomes affirmative, at which point the process proceeds to step S84.
In step S84, the output unit 34 outputs, as the optimum billing amount, the cost of the option (car) at the time when the determination in step S80 became affirmative. By checking the output optimum billing amount, the user can confirm what toll for the car is appropriate in order to match the selection probability of each option with the corresponding target selection probability.
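The billing amount determination of steps S72 to S84 can be sketched as a one-dimensional search over the cost of the option (car). The text does not specify how the cost is updated in step S82; the sketch below assumes bisection, which relies on the car's selection probability decreasing as its cost rises. The learned model is replaced by a hypothetical `V = exp(w1*cost + w2*time)` with made-up weights, and only the car's target probability is matched, since a single adjustable cost gives one degree of freedom.

```python
import math

# Hypothetical learned parameters (cost in thousands of yen, time in hours).
WEIGHTS = (-1.0, -2.0)


def value(attrs):
    """Hypothetical learned value model: V = exp(w1*cost + w2*time)."""
    return math.exp(WEIGHTS[0] * attrs[0] + WEIGHTS[1] * attrs[1])


def car_probability(toll, attrs):
    """Selection probability of the car when its cost is set to `toll`."""
    trial = dict(attrs)
    trial["car"] = (toll, attrs["car"][1])
    values = {name: value(a) for name, a in trial.items()}    # S72 to S76
    return values["car"] / sum(values.values())                # S78, equation (1)


def find_toll(attrs, target_car_prob, low=0.0, high=10.0, tol=1e-4, max_iter=100):
    """Bisection on the car cost until the car's probability matches the target."""
    toll, p = low, car_probability(low, attrs)
    for _ in range(max_iter):
        toll = (low + high) / 2.0
        p = car_probability(toll, attrs)
        if abs(p - target_car_prob) <= tol:                    # S80
            break
        if p > target_car_prob:                                # S82: raise the toll
            low = toll
        else:                                                  # S82: lower the toll
            high = toll
    return toll, p                                             # S84


attrs = {"car": (0.5, 0.5), "train": (0.4, 0.7), "bus": (0.3, 1.0)}
toll, p = find_toll(attrs, target_car_prob=0.30)
print(toll, p)
```

If the target lies outside the range reachable within `[low, high]`, the search simply returns the closest bound it converged toward; a fuller implementation would report that the target is unattainable.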
As described above in detail, the information processing apparatus 10 of the present embodiment acquires the selection probability of each option that can be used when a person moves between OD, and the data (
The value calculation model used in the present embodiment is a neural network (MLP or the like) in which the inputs are the attribute values of each option and the output is the value of each option. This allows the value calculation model to be learned automatically, without assuming in advance a linear equation or the like as the value calculation model.
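For illustration only, a forward pass of a very small MLP that maps the attribute values (cost, time) of one option to a single value might look like the following. The layer sizes, the tanh activation, the random initialization, and the softplus output (used here so that the value is always positive) are all assumptions; the text only states that the model is a neural network such as an MLP.

```python
import math
import random


def init_mlp(n_inputs=2, n_hidden=4, seed=0):
    """Create randomly initialized parameters for a one-hidden-layer MLP."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def mlp_value(params, attrs):
    """Map attribute values (cost, time) of one option to a positive value V."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, attrs)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    z = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return math.log1p(math.exp(z))  # softplus keeps the value positive


params = init_mlp()
v_car = mlp_value(params, (0.5, 0.5))    # hypothetical (cost, time) for the car
v_bus = mlp_value(params, (0.3, 1.0))    # hypothetical (cost, time) for the bus
print(v_car, v_bus, v_car / v_bus)
```

The same parameters are applied to every option, so the two values can be compared on a single scale.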
In the present embodiment, the model learning unit 28 calculates the differences (residuals) between the relationships (ratio) between the values of the options obtained from the learning data (learning data IDs=001 to 009) and the relationships (ratio) between the selection probabilities. Then, the model learning unit 28 adjusts the parameters of the value calculation model so that the sum of the differences is equal to or less than the threshold value (S42 to S54 in
Further, in the present embodiment, the optimum billing amount calculation unit 32 calculates the value of each option by inputting the attribute values of each option to the value calculation model learned by the processes of
Further, in the present embodiment, the optimum billing amount calculation unit 32 adjusts at least part of the attribute values of each option so that the estimated selection probability of each option approaches the corresponding target selection probability (S82). Accordingly, for example, by adjusting the cost of the option (car) so that the selection probability of the option (car) is reduced, it is possible to determine the optimum toll (road pricing) for eliminating the congestion of the road.
In the above-described embodiment, the optimum billing amount calculation unit 32 optimizes the toll of the road. However, this does not intend to suggest any limitation, and the fare (cost) of the train or the bus may be adjusted so that the selection probability of each option approaches the corresponding target selection probability. The options of transportation may include other options of transportation (a motorcycle, a ship, an airplane, or the like) in addition to or instead of at least one of a car, a train, or a bus.
In the above-described embodiment, the case where the relative evaluation of
In this case, the relationship between P1 and P2 can be expressed by the following equation (3).
From the above equation (3), it can be seen that when the logit model is used as the relative evaluation, the difference between the values can be used as the relationship between the values. Additionally, it can be seen that the value calculation model is to be learned so that the difference between the values approaches the relationship between the selection probabilities (the difference between the numerical values of the natural logarithms of the selection probabilities).
In the case of the present variation, since the difference between V1 and V2 is used as presented in the above equation (3), the calculation can be performed even when the numerical values of the values V1 and V2 are 0, for example. Accordingly, since the loss function of the machine learning does not have a singular point, calculation in the machine learning can be stabilized.
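The relationship described for this variation can be checked with a short derivation. The sketch below assumes that the logit model takes its standard form, in which each selection probability is proportional to the exponential of the corresponding value; the residual r is a hypothetical symbol introduced here for illustration.

```latex
% Assumed standard logit model for n options:
P_i = \frac{\exp(V_i)}{\sum_{j=1}^{n} \exp(V_j)}

% The common denominator cancels in the ratio of two selection probabilities:
\frac{P_1}{P_2} = \frac{\exp(V_1)}{\exp(V_2)} = \exp(V_1 - V_2)

% Taking the natural logarithm of both sides:
V_1 - V_2 = \ln P_1 - \ln P_2

% A residual analogous to (V_1 / V_2) - (P_1 / P_2) in the ratio case:
r = (V_1 - V_2) - (\ln P_1 - \ln P_2)
```

Unlike the ratio V1/V2, the difference V1 − V2 is defined for every pair of real values, including 0, which is consistent with the stability argument above. The logarithms still require that both selection probabilities be greater than 0.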
In the above-described embodiment, the case where the value calculation model is a model of a neural network such as an MLP has been described, but this does not intend to suggest any limitation. As the value calculation model, a linear equation such as V=w1×cost+w2×time (w1 and w2 are weight coefficients, and V is value) may be used.
In the above-described embodiment, transportation options (car, train, bus) have been described as an example of the option used when people act, but this does not intend to suggest any limitation. There are various options used when people act, and for example, net shopping and an actual store used when people shop also correspond to options used when people act. That is, in a situation where a person selects an option from among a plurality of options when performing a certain action, when the value of each option and the selection probability of each option are obtained, the above-described embodiment can be appropriately modified and used. In the above-described embodiment, the case in which the attribute values are cost and time has been described, but the attribute values may be something other than cost or time.
In the above-described embodiment, the case in which the information processing apparatus 10 used by the user has the functions of
The above-described processing functions are implemented by a computer. In this case, a program in which processing details of the functions that a processing device is to have are written is provided. The aforementioned processing functions are implemented in the computer by the computer executing the program.
The program in which the processing details are written can be stored in a computer-readable recording medium (however, excluding carrier waves).
When the program is distributed, it may be sold in the form of a portable storage medium such as a digital versatile disc (DVD) or a compact disc read only memory (CD-ROM) storing the program. The program may be stored in a storage device of a server computer, and the program may be transferred from the server computer to another computer over a network.
A computer executing the program stores the program stored in a portable storage medium or transferred from a server computer in its own storage device. The computer then reads the program from its own storage device, and executes processes according to the program. The computer may directly read the program from a portable storage medium, and execute processes according to the program. Alternatively, the computer may successively execute a process, every time the program is transferred from a server computer, according to the received program.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2022-086979 | May 2022 | JP | national |