This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2022-013094, filed on Jan. 31, 2022, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein is related to an information processing method and an information processing apparatus.
Replicator dynamics is used as a technique for deriving an evolutionarily stable strategy in a two-player strategic form game or the like. In the following descriptions, the two-player strategic form game or the like will be simply referred to as a game.
An equilibrium solution is obtained by applying the replicator dynamics to the game and repeatedly updating the strategy in an evolutionary manner. It is known that the equilibrium solution obtained in this way is an evolutionarily stable strategy.
Therefore, it is possible to derive an evolutionarily stable strategy by actually applying the replicator dynamics to the game and observing the equilibrium to which it converges.
The manufacturer_A1, manufacturer_A2, . . . , and manufacturer_An want to sell the product at the highest possible price, whereas the retailer_B1, retailer_B2, . . . , and retailer_Bn want to buy the product at the lowest possible price.
The replicator dynamics is applied when reviewing the institutional design of the auction as appropriate, by investigating what kind of equilibrium (product transaction price and purchase volume) is reached when the retailer_B1, retailer_B2, . . . , and retailer_Bn make bids on the basis of the concept illustrated in
Here, an example of the replicator dynamics will be described. The replicator dynamics is defined by an equation (1). The differential of xi is represented by xi(dot).
xi(dot)=xi(pi−pTx) (1)
In the equation (1), xi represents the selection probability of the i-th strategy, or the proportion of the population that has selected the i-th strategy. An equation (2) defines x. Here, the possible range of xi is “0≤xi≤1”, and xi satisfies an equation (3).
In the equation (1), pi represents reward for the i-th strategy. An equation (4) defines p.
p=[p1 . . . pn]T (4)
When the replicator dynamics defined by the equation (1) is discretized for implementation in a computer, it may be expressed by an equation (5). In the equation (5), k represents a time (or the number of steps or the number of generations). An update width is represented by h. With the update width h set larger, the time until the value of x reaches the equilibrium solution and converges may be shortened. Accordingly, by increasing the update width h, for example, improvement in the convergence speed may be expected.
xi(k+1)=xi(k)+hxi(k)(pi(k)−p(k)Tx(k)) (5)
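As a concrete illustration, the discretized update of the equation (5) can be sketched in a few lines of code. The Hawk-Dove payoff matrix, the gain rule p = Ax, the initial probabilities, and the update width below are illustrative assumptions of this sketch, not part of the embodiment.

```python
import numpy as np

def replicator_step(x, p, h):
    """One discretized replicator update, the equation (5):
    xi(k+1) = xi(k) + h*xi(k)*(pi(k) - p(k)^T x(k))."""
    return x + h * x * (p - p @ x)

# Illustrative two-strategy game (Hawk-Dove with V=2, C=4); the payoff
# matrix A and the gain rule p = A x are assumptions of this sketch.
A = np.array([[-1.0, 2.0],
              [0.0, 1.0]])
x = np.array([0.2, 0.8])
for _ in range(500):
    x = replicator_step(x, A @ x, h=0.1)
# x approaches the mixed equilibrium [0.5, 0.5]
```

Note that the changes hxi(k)(pi(k)−p(k)Tx(k)) sum to zero whenever the probabilities sum to one, so the update preserves the constraint of the equation (3).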
International Publication Pamphlet No. WO 2007/066787 and Japanese Laid-open Patent Publication No. 2006-227754 are disclosed as related art.
According to an aspect of the embodiment, a non-transitory computer-readable recording medium stores a program for causing a computer to execute a process, the process includes in a case of calculating an equilibrium solution of selection probabilities of a plurality of strategies using replicator dynamics, calculating a differential value of a calculation result based on the replicator dynamics using, as an input, respective selection probabilities of the plurality of strategies and respective gains when a game is performed with the respective selection probabilities, and adjusting the respective selection probabilities after elapse of a predetermined time based on the differential value.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
As described above, while an increase in the update width h may be expected to improve the convergence speed, a larger update width h may lead to fluctuation.
The horizontal axes of the graphs G1, G2, and G3 correspond to the time, and the vertical axes correspond to the proportion. As an example, the relationship between the time and the proportion will be described for x1 and x2, which are the selection probabilities of the first and second strategies. A line l1 is a line corresponding to x1. A line l2 is a line corresponding to x2. The time taken for the value of x to fall within ±5% of a steady-state value may be referred to as a “settling time”.
While the settling time is Time=2 (s) in the graph G1, the settling time is Time=0.3 (s) in the graph G2. That is, as far as the graphs G1 and G2 are concerned, it may be said that an increase in the update width shortens the time until convergence.
However, as illustrated in the graph G3, the values of x1 and x2 fluctuate and do not converge when the update width h is increased. Furthermore, when the update width h is increased, the condition “0≤xi≤1” may not be satisfied. Accordingly, the existing technique of simply increasing the update width h fails to reduce the time, the number of steps, or the number of generations until convergence to the equilibrium solution.
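This failure mode is easy to reproduce with the discretized update of the equation (5). The payoff matrix and the deliberately large update width below are illustrative assumptions, not the game of the embodiment.

```python
import numpy as np

A = np.array([[-1.0, 2.0],   # illustrative Hawk-Dove payoff matrix (assumption)
              [0.0, 1.0]])
x = np.array([0.2, 0.8])
p = A @ x                    # gains p(k) = [1.4, 0.8]
h = 20.0                     # deliberately large update width
x = x + h * x * (p - p @ x)  # one step of the equation (5)
# x is now approximately [2.12, -1.12]; the condition "0 <= xi <= 1" is violated
```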
Hereinafter, an embodiment of an information processing method and an information processing apparatus disclosed in the present application will be described in detail with reference to the drawings. Note that the present embodiment does not limit the present disclosure.
Prior to explaining an information processing apparatus according to the present embodiment, a device based on an existing technique for calculating replicator dynamics will be described. In the following descriptions, the device based on the existing technique will be referred to as an existing device.
Upon reception of an input of the strategy selection probability xi(k), the game execution unit 51 executes a predetermined game, and calculates gain p(k) of each strategy with respect to the strategy selection probability xi(k). The game execution unit 51 outputs the gain p(k) of each strategy to the update unit 52.
The update unit 52 outputs a strategy selection probability xi(k+1) with a time advanced by one step on the basis of the strategy selection probability xi(k) and the gain p(k) of each strategy. The update unit 52 includes control blocks 53, 54, and 55.
The control block 53 calculates ui(k) on the basis of the strategy selection probability xi(k), the gain p(k) of each strategy, and an equation (6). The control block 53 outputs ui(k) to the control block 54.
ui(k)=xi(k)(pi(k)−x(k)Tp(k)) (6)
The control block 54 multiplies ui(k) by the update width h, thereby calculating yi(k). Here, yi(k) corresponds to a value obtained by subtracting the current strategy selection probability xi(k) from the strategy selection probability xi(k+1) with the time advanced by one step, and corresponds to a change amount of xi. The control block 54 outputs yi(k) to the control block 55.
The control block 55 multiplies yi(k) by (1/(z−1)), thereby calculating the strategy selection probability xi(k+1) for the next time. Here, z is an operator that advances the time by one step. Note that multiplying yi(k) by (1/(z−1)) is the same as executing calculation expressed by an equation (7).
xi(k+1)=xi(k)+yi(k) (7)
According to the existing device 50 described with reference to
Next, exemplary processing of the information processing apparatus according to the present embodiment will be described.
Upon reception of an input of the strategy selection probability xi(k), the game execution unit 61 executes a predetermined game, and calculates gain p(k) of each strategy with respect to the strategy selection probability xi(k). The game execution unit 61 outputs the gain p(k) of each strategy to the update unit 62.
The update unit 62 outputs the strategy selection probability xi(k+1) with the time advanced by one step on the basis of the strategy selection probability xi(k) and the gain p(k) of each strategy. The update unit 62 includes control blocks 63, 64, and 65.
The control block 63 calculates ui(k) on the basis of the strategy selection probability xi(k), the gain p(k) of each strategy, and the equation (6). The control block 63 outputs ui(k) to the control block 64.
The control block 64 is a controller H(k) that calculates a differential value of ui(k) and calculates yi(k) with respect to the strategy selection probability xi(k) on the basis of the differential value. Here, yi(k) corresponds to a value obtained by subtracting the current strategy selection probability xi(k) from the strategy selection probability xi(k+1) with the time advanced by one step, and corresponds to a change amount. The control block 64 outputs yi(k) to the control block 65.
The control block 65 multiplies yi(k) by (1/(z−1)), thereby calculating the strategy selection probability xi(k+1) for the next time. Here, z is an operator that advances the time by one step. Note that multiplying yi(k) by (1/(z−1)) is the same as executing calculation expressed by an equation (7).
Next, an example of the control block 64 illustrated in
The control block 71a multiplies the input ui(k) by a proportional gain Kp, thereby calculating Kpui(k). The proportional gain Kp is set in advance. The control block 71a outputs Kpui(k) to the adder 73.
The control block 71b multiplies the input ui(k) by a proportional gain Kd, thereby calculating Kdui(k). The proportional gain Kd is set in advance. The control block 71b outputs Kdui(k) to the control block 72.
The control block 72 performs approximate differentiation on Kdui(k). For example, the control block 72 multiplies Kdui(k) by (s/(Ns+1)) to obtain a differential value. Here, s represents a differential operator, and N represents a preset parameter. In the following descriptions, the value obtained by multiplying Kdui(k) by (s/(Ns+1)) will be referred to as a “differential value”. The control block 72 outputs the differential value to the adder 73.
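The approximate differentiation s/(Ns+1) is specified only as a continuous-time transfer function. One possible discrete realization, obtained with the backward-difference substitution s ≈ (1−z⁻¹)/T for an assumed sampling period T, is sketched below; both the substitution and T are assumptions of this sketch.

```python
class ApproxDifferentiator:
    """Approximate differentiation s/(N*s + 1) of the control block 72,
    discretized with the backward-difference substitution s ~ (1 - z^-1)/T.
    The sampling period T and this discretization are assumptions of the
    sketch; the embodiment only specifies the transfer function s/(Ns+1)."""

    def __init__(self, N, T=1.0):
        self.N, self.T = N, T
        self.prev_u = 0.0   # previous input
        self.prev_y = 0.0   # previous output

    def step(self, u):
        # y(k) = (N*y(k-1) + u(k) - u(k-1)) / (N + T)
        y = (self.N * self.prev_y + u - self.prev_u) / (self.N + self.T)
        self.prev_u, self.prev_y = u, y
        return y
```

Fed a constant input, the output decays toward zero (the derivative of a constant); fed a unit-slope ramp, it settles toward one, as expected of a filtered differentiator.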
The adder 73 adds Kpui(k) and the differential value, and outputs the addition result to the control block 74.
The control block 74 multiplies the addition result obtained from the adder 73 by an adjustment gain, thereby calculating yi(k). For example, the adjustment gain is expressed by an expression (8). The control block 74 outputs yi(k) to the control block 65. In the expression (8), p(k)T represents a transpose of a gain vector p(k). The inner product of the transpose of the gain vector p(k) and a strategy selection probability x(k) is represented by p(k)Tx(k).
1/(p(k)Tx(k)) (8)
Here, the control block 64 (controller H(k)) is set in advance to satisfy a relationship of an equation (9) such that a relationship of an equation (10) is satisfied. For example, the control block 74 adjusts yi(k) using the adjustment gain in such a manner that the sum of the change amounts yi(k) with respect to the strategy selection probability xi(k) becomes zero to satisfy the relationship of the equation (10).
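Putting the control blocks 71a, 71b, 72, 73, and 74 together, one step of the controller H(k) can be sketched as follows. The backward-difference discretization of the approximate differentiator, the values of Kp, Kd, and N, and the illustrative payoff-matrix game are all assumptions of this sketch.

```python
import numpy as np

def update_step(x, p, state=None, Kp=1.0, Kd=0.5, N=0.1, T=1.0):
    """One update by the controller H(k): ui(k) from the equation (6),
    a proportional path Kp*ui(k), an approximate-derivative path applied
    to Kd*ui(k), and the adjustment gain 1/(p(k)^T x(k)) of the
    expression (8). Returns xi(k+1) and the differentiator state."""
    if state is None:
        state = (np.zeros_like(x), np.zeros_like(x))
    prev_v, prev_d = state
    u = x * (p - x @ p)                       # equation (6)
    v = Kd * u                                # control block 71b
    d = (N * prev_d + v - prev_v) / (N + T)   # control block 72 (assumed
                                              # backward-difference form)
    y = (Kp * u + d) / (p @ x)                # adder 73 and control block 74
    return x + y, (v, d)                      # equation (7)

# Illustrative Hawk-Dove game (an assumption of this sketch)
A = np.array([[-1.0, 2.0],
              [0.0, 1.0]])
x, st = np.array([0.2, 0.8]), None
for _ in range(100):
    x, st = update_step(x, A @ x, st)
```

Because the ui(k) sum to zero when the xi(k) sum to one, and both the linear derivative path and the shared adjustment gain preserve that property, the change amounts yi(k) sum to zero and the probabilities keep summing to one at every step.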
As described above, the information processing apparatus 100 according to the present embodiment calculates the differential value of ui(k) calculated on the basis of the strategy selection probability xi(k) and the gain p(k) of each strategy, and adjusts the selection probability of a plurality of strategies after elapse of a predetermined time on the basis of the differential value. For example, as described with reference to
In the example illustrated in the graph G4, the settling time is 0.2 (s). For example, the settling time may be reduced by approximately 33% as compared with the settling time of 0.3 (s) of the existing technique described with reference to the graph G2 of
Meanwhile, when the replicator dynamics is applied to a game to obtain the equilibrium solution with the existing device, fluctuation may occur depending on the game even when the update width h is made smaller.
The horizontal axes of the graphs G11, G12, and G13 correspond to the time, and the vertical axes correspond to the proportion. As an example, the relationship between the time and the proportion will be described for x1, x2, and x3, which are the selection probabilities of the first, second, and third strategies. A line l1 is a line corresponding to x1. A line l2 is a line corresponding to x2. A line l3 is a line corresponding to x3. As illustrated in
When the information processing apparatus 100 according to the present embodiment applies the replicator dynamics to a game in which fluctuation occurs at any update width and the equilibrium solution may not be obtained by the existing technique described with reference to
As illustrated in the graph G14, with the information processing apparatus 100 provided with the controller H(k) performing the processing, the selection probabilities x1, x2, and x3 converge, and the equilibrium solution may be calculated.
Next, an exemplary configuration of the information processing apparatus 100 according to the present embodiment will be described.
The communication unit 110 performs data communication with an external device via a network.
The input unit 120 is an input device that receives an operation from a user, and is implemented by, for example, a keyboard, a mouse, or the like. The user operates the input unit 120 to input information related to game settings, the proportional gains Kp and Kd, a parameter N to be used to perform approximate differentiation, and the like.
The display unit 130 is a display device for outputting a result of equilibrium solution calculation and the like, and is implemented by, for example, a liquid crystal monitor, a printer, or the like.
The storage unit 140 is a storage device that stores various types of information, and is implemented by, for example, a semiconductor memory element such as a random access memory (RAM), a flash memory, or the like, or a storage device such as a hard disk, an optical disk, or the like. For example, the storage unit 140 stores setting information of the game to which the replicator dynamics is applied, the proportional gains Kp and Kd, information of the parameter N to be used to perform the approximate differentiation, and the like. Furthermore, the storage unit 140 stores the initial value of the strategy selection probability xi(k).
The control unit 150 is implemented by a processor such as a central processing unit (CPU), a micro processing unit (MPU), or the like, executing various programs stored in a storage device inside the information processing apparatus 100 using the RAM or the like as a work area. Furthermore, the control unit 150 may be implemented by an integrated circuit (IC) such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or the like.
The control unit 150 executes the processing described with reference to
The game execution processing unit 151 performs processing corresponding to that of the game execution unit 61 described with reference to
When the game execution processing unit 151 obtains the updated strategy selection probability with the time advanced by one step from the update processing unit 152, it repeatedly performs the process of executing the game again and calculating the gain of each strategy with respect to the updated strategy selection probability. The game execution processing unit 151 obtains the initial value of the strategy selection probability xi(k) from the storage unit 140.
The update processing unit 152 performs processing corresponding to that of the update unit 62 described with reference to
The update processing unit 152 calculates ui(k) on the basis of the strategy selection probability xi(k), the gain p(k) of each strategy, and the equation (6). The update processing unit 152 calculates a differential value of ui(k), and calculates yi(k) with respect to the strategy selection probability xi(k) on the basis of the differential value. The update processing unit 152 multiplies yi(k) by (1/(z−1)), thereby calculating the strategy selection probability xi(k+1) for the next time.
Furthermore, the update processing unit 152 determines whether or not the strategy selection probability has converged. For example, the update processing unit 152 determines that the strategy selection probability has converged when a difference between the previous strategy selection probability xi(k) and the current strategy selection probability xi(k+1) is less than a threshold value.
When the update processing unit 152 determines that the strategy selection probability has converged, it outputs an equilibrium solution to the equilibrium solution output unit 153 with the strategy selection probability xi(k+1) calculated this time as the equilibrium solution.
On the other hand, when the update processing unit 152 determines that the strategy selection probability has not converged, it outputs the strategy selection probability xi(k+1) calculated this time to the game execution processing unit 151. The update processing unit 152 repeatedly performs the process described above until the strategy selection probability converges.
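The loop between the game execution processing unit 151 and the update processing unit 152 can be sketched as follows. Here the plain discretized update of the equation (5) stands in for the update, a payoff-matrix game stands in for the game execution, and the threshold value is chosen freely; all three are assumptions of this sketch.

```python
import numpy as np

def find_equilibrium(A, x0, h=0.1, tol=1e-8, max_steps=10_000):
    """Repeat game execution and strategy update until the difference
    between successive strategy selection probabilities falls below a
    threshold, then return that probability as the equilibrium solution."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        p = A @ x                            # game execution: gain p(k)
        x_next = x + h * x * (p - p @ x)     # update (equation (5))
        if np.max(np.abs(x_next - x)) < tol: # convergence check
            return x_next                    # equilibrium solution
        x = x_next
    return x

# Illustrative Hawk-Dove game (an assumption): its mixed equilibrium is [0.5, 0.5]
eq = find_equilibrium(np.array([[-1.0, 2.0], [0.0, 1.0]]), [0.2, 0.8])
```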
The equilibrium solution output unit 153 outputs information regarding the equilibrium solution to the display unit 130 when the equilibrium solution is obtained from the update processing unit 152.
Next, an exemplary processing procedure of the information processing apparatus 100 according to the present embodiment will be described.
The game execution processing unit 151 executes a game on the basis of the strategy selection probability xi(k), and calculates a gain p(k) (step S102). The update processing unit 152 of the information processing apparatus 100 calculates ui(k) on the basis of the strategy selection probability xi(k) and the gain p(k) (step S103).
The update processing unit 152 calculates a multiplication result Kpui(k) of the proportional gain Kp and ui(k) (step S104). The update processing unit 152 calculates a differential value for the multiplication result of the proportional gain Kd and ui(k) (step S105).
The update processing unit 152 multiplies the addition result of Kpui(k) and the differential value by the adjustment gain to calculate yi(k) (step S106). The update processing unit 152 calculates the strategy selection probability xi(k+1) on the basis of yi(k) (step S107).
If the strategy selection probability has not converged (No in step S108), the update processing unit 152 proceeds to step S102. On the other hand, if the strategy selection probability has converged (Yes in step S108), the update processing unit 152 proceeds to step S109.
The equilibrium solution output unit 153 of the information processing apparatus 100 outputs an equilibrium solution to the display unit 130 (step S109).
Next, effects of the information processing apparatus 100 according to the present embodiment will be described. The information processing apparatus 100 calculates the differential value of ui(k) calculated on the basis of the strategy selection probability xi(k) and the gain p(k) of each strategy, and adjusts the selection probability of a plurality of strategies after elapse of a predetermined time on the basis of the differential value. For example, as described with reference to
For example, comparing the graph G4 according to the present embodiment described with reference to
Furthermore, as described with reference to
Meanwhile, although the case has been described where the controller H(k) (control block 64) of the information processing apparatus 100 according to the present embodiment is implemented by the PD controller and the adjustment gain, the embodiment is not limited to this. For example, the information processing apparatus 100 may implement the controller H(k) using a phase-lead compensator.
Although an exemplary controller H(k) has been described with reference to
Next, an exemplary hardware configuration of a computer that implements functions similar to those of the information processing apparatus 100 described in the embodiment above will be described.
As illustrated in
The hard disk drive 207 stores a game execution processing program 207a, an update processing program 207b, and an equilibrium solution output program 207c. Furthermore, the CPU 201 reads the individual programs 207a to 207c, and loads them into the RAM 206.
The game execution processing program 207a functions as a game execution processing process 206a. The update processing program 207b functions as an update processing process 206b. The equilibrium solution output program 207c functions as an equilibrium solution output process 206c.
Processing of the game execution processing process 206a corresponds to the processing of the game execution processing unit 151. Processing of the update processing process 206b corresponds to the processing of the update processing unit 152. Processing of the equilibrium solution output process 206c corresponds to the processing of the equilibrium solution output unit 153.
Note that the individual programs 207a to 207c may not necessarily be stored in the hard disk drive 207 from the beginning. For example, each of the programs may be stored in a “portable physical medium” to be inserted in the computer 200, such as a flexible disk (FD), a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a magneto-optical disk, or an IC card. Then, the computer 200 may read and execute each of the programs 207a to 207c.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---
2022-013094 | Jan 2022 | JP | national |