The present invention relates to an information processing apparatus, an incentive measure calculation method, and a program.
Although healthy actions, learning actions, and similar activities need to be performed continuously, people sometimes find it difficult to keep them up voluntarily. Maintaining a high level of motivation so that such actions continue is necessary for people to live better lives.
For example, Non-Patent Literature 1 discloses an experiment that encourages a healthy action using a monetary incentive, in which the following three incentive designs are used.
Non-Patent Literature 2 discloses a method of helping heart disease patients who wear a wearable device make exercise a habit. An exercise goal is set every day; when the goal is achieved, a monetary incentive given in advance is kept, and each time the goal is not achieved, a fixed amount of that incentive is forfeited (intervention by a loss-framed incentive). Using a loss-framed incentive makes it possible to incorporate the following four psychological effects.
Each of Non-Patent Literature 1 to 3 describes that the success rate of a task increases when a loss-framed incentive is used. However, none of them shows that the success rate is highest when the loss-framed incentive is a fixed amount every time, and the amount of loss-framed incentive that maximizes the success rate of a task has not yet been determined.
An object of the disclosed technology is to calculate an incentive measure that increases a success rate of a task.
The disclosed technology is an information processing apparatus including: a data acquisition unit that acquires data indicating a relationship between the success or failure of a task and the amount of a loss-framed incentive, which is an incentive forfeited in a case where the task is not successful, and information indicating a motivation function representing the motivation of a user for the task; a parameter determination unit that determines a parameter of the motivation function based on the data; and an incentive measure calculation unit that calculates an incentive measure indicating an amount of a loss-framed incentive, based on the determined parameter and the motivation function.
An incentive measure that increases a success rate of a task can be calculated.
Hereinafter, an embodiment of the present invention (present embodiment) will be described with reference to the drawings. The embodiment described below is merely an example, and embodiments to which the present invention is applied are not limited to the embodiment described below.
An information processing device (information processing apparatus) according to the present embodiment acquires experimental data and motivation function information, determines parameters of a motivation function, and calculates a loss-framed incentive measure that increases a success rate of a task.
The data acquisition unit 11 acquires experimental data 101 and motivation function information 102.
The experimental data 101 is data indicating the results of an experiment in which one or more users perform a task under intervention by a loss-framed incentive. The experiment is conducted, for example, as follows.
On the first day of an experimental period of $T$ days, each user is given $p_0$ as a temporary acquisition incentive. The temporary acquisition incentive is an incentive that the user holds provisionally and is certain to keep as long as the task succeeds. Every day, before the success or failure of the task is determined, the following declaration is made: “if the task on the $t$-th day succeeds, the temporary acquisition incentive is maintained at $p_{t-1}$; if it fails, $x_t$ is forfeited from $p_{t-1}$.” The loss-framed incentive $x_t$ presented at this time is a value randomly selected from a set $X$, provided that $0 < x_t < p_{t-1}$. Accordingly, the temporary acquisition incentive is updated as follows.

$$p_t = \begin{cases} p_{t-1} & (\text{the task on day } t \text{ succeeds}) \\ p_{t-1} - x_t & (\text{the task on day } t \text{ fails}) \end{cases}$$
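The protocol above can be illustrated with a small simulation that produces one record $(x_t, p_{t-1}, y_t)$ per day. This is a minimal sketch under stated assumptions, not the experiment itself: `run_experiment` and `success_prob` are hypothetical names, `success_prob` stands in for the user's unknown true behaviour, and ending the run once no admissible $x_t$ remains is an assumption of the sketch.

```python
import random
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, float, int]  # one day's observation (x_t, p_{t-1}, y_t)


def run_experiment(p0: float, T: int, X: Sequence[float],
                   success_prob: Callable[[float, float], float],
                   rng: random.Random) -> List[Record]:
    """Simulate the loss-framed incentive protocol for one user over T days.

    success_prob(x, p) is a hypothetical stand-in for the user's unknown
    probability of succeeding when x is at stake and p is currently held.
    """
    records: List[Record] = []
    p = p0  # temporary acquisition incentive p_{t-1}
    for _ in range(T):
        # x_t is drawn at random from X, restricted to 0 < x_t < p_{t-1}.
        candidates = [x for x in X if 0 < x < p]
        if not candidates:
            break  # assumption: the run ends when no admissible x_t remains
        x = rng.choice(candidates)
        y = 1 if rng.random() < success_prob(x, p) else 0
        records.append((x, p, y))
        if y == 0:
            p = p - x  # failure: x_t is forfeited; success keeps p_t = p_{t-1}
    return records


data = run_experiment(1000, 30, [50, 100, 200], lambda x, p: 0.5 + x / 1000, random.Random(0))
```

Each record pairs the amount at stake with the amount held before the outcome, which is the form of observation value used for parameter estimation.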
The experimental data 101 is data obtained from the above-described experiment and includes, for each user, for example, the loss-framed incentives $x_1, \ldots, x_T$ presented in each step, the temporary acquisition incentives $p_1, \ldots, p_T$, and the task outcomes $y_1, \ldots, y_T$, where $y_t \in \{1, 0\}$ indicates success (1) or failure (0). Note that the experimental data 101 may instead be obtained from theory, inference, or the like, as long as it indicates a relationship between the amount of a loss-framed incentive and the success or failure of the task.
The motivation function information 102 is information indicating a function that represents the motivation of a user for the task (hereinafter referred to as a motivation function). Denoting the success probability of the task in step $t$ for each user by $u_t$, the motivation function is expressed, for example, as follows.
The term for the presented loss-framed incentive satisfies the following conditions.
(1) As the presented loss-framed incentive increases, motivation also increases.
(2) Marginal utility is diminishing.
(3) When the presented loss-framed incentive exceeds a certain threshold, motivation increases sharply.
(4) The term is normalized.
The term for the temporary acquisition incentive satisfies the following conditions.
(1) As the temporary acquisition incentive increases, motivation also increases.
(2) Marginal utility is diminishing.
(3) When the temporary acquisition incentive exceeds a certain threshold, motivation increases sharply.
(4) The term is normalized.
As an example satisfying (1) and (2) in each term, the power functions $x_t^{\alpha}$ and $p_{t-1}^{\beta}$ may be used, where $0 < \alpha < 1$ and $0 < \beta < 1$. As an example satisfying (3) in each term, a sigmoid function of the following formula may be used, with the gain denoted by $a$ in the term for the presented loss-framed incentive and by $b$ in the term for the temporary acquisition incentive.
As an example of satisfying (4) in each term, a min function may be used.
That is, an example of the motivation function ut is expressed as follows.
Here, $A$, $B$, $\alpha$, $\beta$, $a$, $b$, $x_{\mathrm{loss}}$, and $p_{\mathrm{temp}}$ are user-specific parameters. $A$ and $B$ represent the relative influence of the presented loss-framed incentive and of the temporary acquisition incentive, respectively, and satisfy $A, B > 0$ and $A + B = 1$. For a user with $A > B$, the influence of the temporary acquisition incentive is weak and that of the presented loss-framed incentive is strong, so the user's effort toward task success changes markedly as the presented loss-framed incentive is increased or decreased. For a user with $A < B$, the influence of the temporary acquisition incentive is strong, so once task failures have reduced the temporary acquisition incentive below a certain threshold, the user makes no effort toward task success no matter how much the presented loss-framed incentive is increased or decreased.
$x_{\mathrm{loss}}$ and $p_{\mathrm{temp}}$ are the thresholds above which motivation increases sharply.
The advantage of defining the motivation function in this manner is that the relative magnitude of the influence of the presented loss-framed incentive and of the temporary acquisition incentive becomes explicit, which in turn makes the user's characteristics clear.
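Since the exact formula is not reproduced here, the following is only one hypothetical way to combine the building blocks named above: each term multiplies a power function by a logistic sigmoid, caps the result at 1 with `min`, and the two terms are weighted by $A$ and $B = 1 - A$. Centring each sigmoid on $x_{\mathrm{loss}}$ or $p_{\mathrm{temp}}$ with gain $a$ or $b$ is an assumption of this sketch, as are the function names `sigmoid` and `motivation`.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def motivation(x: float, p: float, *, A: float, alpha: float, a: float, x_loss: float,
               beta: float, b: float, p_temp: float) -> float:
    """Hypothetical u_t(x_t, p_{t-1}); B is derived as 1 - A so that A + B = 1."""
    B = 1.0 - A
    # Power term (conditions (1), (2)) times a sigmoid with gain a centred on x_loss
    # (condition (3)), capped at 1 by min (condition (4)).
    loss_term = min(1.0, x ** alpha * sigmoid(a * (x - x_loss)))
    # Same construction for the temporary acquisition incentive p_{t-1}.
    temp_term = min(1.0, p ** beta * sigmoid(b * (p - p_temp)))
    return A * loss_term + B * temp_term
```

Because each capped term lies in $[0, 1]$ and $A + B = 1$, the output can be used directly as a success probability $u_t$.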
The parameter determination unit 12 determines eight parameters included in the motivation function based on the experimental data 101 and the motivation function information 102.
Based on the experimental data 101, the parameter determination unit 12 uses the loss-framed incentive presented in each step, the temporary acquisition incentive, and the success or failure of the task ($y_t \in \{1, 0\}$) as observation values. That is, the observation values are $(x_1, p_0, y_1), \ldots, (x_T, p_{T-1}, y_T)$.
A probability model indicating the probability of success or failure of the task in each step is expressed by a binomial distribution (a single trial with success probability $u_t$), as in the following formula.

$$P(y_t \mid u_t) = u_t^{\,y_t} (1 - u_t)^{1 - y_t}$$
Here, $P(y_t \mid u_t) = P(y_t \mid u_t(s))$, where $s$ represents the user-specific parameters $A$, $B$, $\alpha$, $\beta$, $a$, $b$, $x_{\mathrm{loss}}$, and $p_{\mathrm{temp}}$.
The likelihood $L(s)$ of the probability model is expressed by the following formula.

$$L(s) = \prod_{t=1}^{T} P(y_t \mid u_t(s))$$
The parameter determination unit 12 determines each of the parameters by maximum likelihood estimation, as indicated in the following formula.

$$\hat{s} = \arg\max_{s} L(s)$$
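The estimation step can be sketched as maximizing the Bernoulli log-likelihood $\sum_t [y_t \log u_t(s) + (1 - y_t) \log(1 - u_t(s))]$, which has the same maximizer as $L(s)$. This is a minimal sketch, assuming a generic `model(params, x, p)` callable that returns $u_t(s)$; `log_likelihood`, `fit_mle`, the clipping constant, and the use of plain random search (chosen only so the sketch needs no third-party libraries) are all assumptions. The constraint $A + B = 1$ can be enforced inside the model, for example by setting $B = 1 - A$.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[float, float, int]  # observation (x_t, p_{t-1}, y_t)
Params = Dict[str, float]
Model = Callable[[Params, float, float], float]  # returns u_t(s) in [0, 1]


def log_likelihood(params: Params, model: Model, data: List[Record],
                   eps: float = 1e-9) -> float:
    """log L(s) = sum_t [y_t log u_t(s) + (1 - y_t) log(1 - u_t(s))]."""
    total = 0.0
    for x, p, y in data:
        # Clip u_t away from 0 and 1 so the logarithm stays finite.
        u = min(max(model(params, x, p), eps), 1.0 - eps)
        total += math.log(u) if y == 1 else math.log(1.0 - u)
    return total


def fit_mle(model: Model, data: List[Record], bounds: Dict[str, Tuple[float, float]],
            n_iter: int = 2000, seed: int = 0) -> Tuple[Optional[Params], float]:
    """Approximate argmax_s log L(s) by uniform random search inside `bounds`."""
    rng = random.Random(seed)
    best: Optional[Params] = None
    best_ll = -math.inf
    for _ in range(n_iter):
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        ll = log_likelihood(candidate, model, data)
        if ll > best_ll:
            best, best_ll = candidate, ll
    return best, best_ll
```

With eight parameters, random search converges slowly; a gradient-based or quasi-Newton optimizer over the same objective would normally be preferred.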
An example of the parameters to be determined is as follows.
The incentive measure calculation unit 13 calculates an incentive measure that increases the motivation of a user. For example, the incentive measure calculation unit 13 optimizes the amount of the loss-framed incentive so as to maximize the user's motivation and calculates the resulting loss-framed incentive.
Specifically, the incentive measure calculation unit 13 calculates an incentive measure based on the motivation function information 102 and the determined parameters. The incentive measure is a function $f$ that takes as inputs the task success or failure $y_{t-1}$ on the $(t-1)$-th day, the temporary acquisition incentive $p_{t-1}$, and the current time step $t$, and outputs the loss-framed incentive amount $x_t$ on the $t$-th day, as indicated in the following formula.

$$x_t = f(y_{t-1}, p_{t-1}, t)$$
The optimum incentive measure is the measure that maximizes the expected total number of task successes over the $T$ days, as indicated in Formula (1).

$$f^* = \arg\max_{f} \mathbb{E}\left[\sum_{t=1}^{T} y_t\right] \qquad (1)$$
Here, E[·] represents an expected value.
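The objective in Formula (1), the expected total number of successes over $T$ days, can be estimated for any given measure by Monte Carlo simulation. This is a minimal sketch: `expected_successes` and `success_prob` are hypothetical names, `success_prob(x, p)` stands in for a fitted motivation function, and passing `None` as $y_{t-1}$ on the first day is an assumption, since no previous outcome exists then.

```python
import random
from typing import Callable, Optional

# A measure f(y_{t-1}, p_{t-1}, t) -> x_t; y_{t-1} is None on day 1 (assumption).
Measure = Callable[[Optional[int], float, int], float]


def expected_successes(f: Measure, success_prob: Callable[[float, float], float],
                       p0: float, T: int, n_runs: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[sum_{t=1}^T y_t] when amounts are chosen by f."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        p: float = p0
        y_prev: Optional[int] = None
        for t in range(1, T + 1):
            x = f(y_prev, p, t)
            y = 1 if rng.random() < success_prob(x, p) else 0
            total += y
            if y == 0:
                p -= x  # the presented amount is forfeited on failure
            y_prev = y
    return total / n_runs
```

Comparing this estimate across candidate measures, such as a fixed amount every day versus an adaptive rule, gives a direct check of which one better serves Formula (1).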
Under the above-described motivation function, the success or failure $y_t$ of the task evolves according to the following Markov decision process (hereinafter referred to as an MDP).
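State on the $t$-th day: $V_t = (t, y_{t-1}, p_{t-1})$
Incentive set that can be taken on the $t$-th day: $\mathcal{X}_t = \{a_i \in \{a_1, \ldots, a_N\} \mid a_i \le p_{t-1}\}$
Probability that the state $V_{t+1} = (t+1, y_t, p_t)$ on the $(t+1)$-th day is generated, given the state on the $t$-th day and an incentive $x_t \in \mathcal{X}_t$:

$$P(V_{t+1} \mid V_t, x_t) = \begin{cases} u_t & (y_t = 1,\ p_t = p_{t-1}) \\ 1 - u_t & (y_t = 0,\ p_t = p_{t-1} - x_t) \end{cases}$$

Reward on the $t$-th day: $y_t$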
Here, the possible values of the loss-framed incentive $x_t$ are those among $N$ discrete values $\{a_1, a_2, \ldots, a_N\}$ prepared in advance that are equal to or less than the temporary acquisition incentive $p_{t-1}$. It is known that, in an MDP, the measure that maximizes the expected value of the reward sum $\mathbb{E}\left[\sum_{t=1}^{T} y_t\right]$ is obtained by solving the Bellman optimality equation.
Therefore, the incentive measure calculation unit 13 obtains a measure $f^*$ that satisfies Formula (1) by likewise solving the Bellman optimality equation. There are several methods for solving the Bellman optimality equation; one example is Deep Q Network, which uses a neural network (Reference Literature 1).
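When the set of reachable temporary acquisition incentives is small, the Bellman optimality equation of this finite-horizon MDP can also be solved exactly by backward induction instead of Deep Q Network. Writing $u_t = u(x_t, p_{t-1})$, the recursion is

$$V_t(p_{t-1}) = \max_{x_t} \left[ u_t \bigl(1 + V_{t+1}(p_{t-1})\bigr) + (1 - u_t)\, V_{t+1}(p_{t-1} - x_t) \right], \qquad V_{T+1} \equiv 0.$$

The sketch below assumes, as in the motivation function described above, that $u_t$ depends only on $x_t$ and $p_{t-1}$, so the $y_{t-1}$ component of the state is dropped. `solve_bellman` and `success_prob` are hypothetical names, and falling back to $x_t = 0$ when no $a_i \le p_{t-1}$ exists is an assumption.

```python
from functools import lru_cache
from typing import Callable, Sequence, Tuple


def solve_bellman(success_prob: Callable[[float, float], float],
                  actions: Sequence[float],
                  T: int) -> Callable[[int, float], Tuple[float, float]]:
    """Exact finite-horizon backward induction for the incentive MDP.

    Returns opt(t, p) -> (V_t(p), f*(t, p)), where p is p_{t-1}.
    """

    @lru_cache(maxsize=None)
    def opt(t: int, p: float) -> Tuple[float, float]:
        if t > T:
            return 0.0, 0.0  # no reward after the last day
        # Admissible amounts a_i <= p_{t-1}; fall back to 0 if none (assumption).
        allowed = [x for x in actions if x <= p] or [0.0]
        best_value, best_x = float("-inf"), allowed[0]
        for x in allowed:
            u = success_prob(x, p)
            # Expected reward y_t plus optimal value of the successor state.
            value = u * (1.0 + opt(t + 1, p)[0]) + (1.0 - u) * opt(t + 1, p - x)[0]
            if value > best_value:
                best_value, best_x = value, x
        return best_value, best_x

    return opt
```

The returned amount is the greedy choice with respect to the exact action value, which is the quantity Deep Q Network approximates with a neural network. For instance, with $T = 2$, amounts $\{1, 3\}$, $p_0 = 5$, and a toy success probability $x/10$ that drops to 0 once $p_{t-1} < 4$, the optimal first-day amount is 1 (value 0.40) rather than the myopically better 3 (value 0.39), because a failure at 3 would push the temporary acquisition incentive below the toy threshold. Memoized recursion depth grows with $T$, so very long horizons would call for an iterative formulation.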
The incentive measure calculation unit 13 outputs incentive measure information 103 indicating the calculated incentive measure $f^*$. For example, in a case where Deep Q Network is used, $f^*$ can be given by the following formula using an action value function $Q(V_t, x_t)$ approximated by a neural network, where the maximum is taken over the incentives that can be taken on the $t$-th day.

$$f^*(V_t) = \arg\max_{x_t} Q(V_t, x_t)$$
In this case, the output incentive measure information 103 is a model parameter group of the neural network.
Next, an operation example of the information processing device 10 will be described with reference to the drawings. The information processing device 10 starts incentive measure calculation processing in response to a user's operation or the like.
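First, the data acquisition unit 11 acquires the experimental data 101 and the motivation function information 102 (step S11). Next, the parameter determination unit 12 determines the parameters based on the experimental data 101 and the motivation function information 102 (step S12).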
The incentive measure calculation unit 13 calculates an incentive measure based on the motivation function information 102 and the determined parameters (step S13). The incentive measure calculation unit 13 outputs the incentive measure information 103 indicating the calculated incentive measure.
The information processing device 10 can be implemented, for example, by causing a computer to execute a program describing the processing content described in the present embodiment. Note that the “computer” may be a physical machine or a virtual machine in a cloud. In a case where a virtual machine is used, the “hardware” described herein is virtual hardware.
The program can be stored and distributed by being recorded on a computer-readable recording medium (portable memory or the like). The program can also be provided through a network such as the Internet or by electronic mail.
The program for implementing processing in the computer is provided through a recording medium 1001 such as a CD-ROM or a memory card, for example. When the recording medium 1001 that stores the program is set in the drive device 1000, the program is installed from the recording medium 1001 into the auxiliary storage device 1002 via the drive device 1000. However, the program is not necessarily installed from the recording medium 1001 and may be downloaded from another computer via a network. The auxiliary storage device 1002 stores the installed program and also stores necessary files, data, and the like.
In a case where an instruction to start the program is issued, the memory device 1003 reads the program from the auxiliary storage device 1002, and stores the program therein. The CPU 1004 implements a function related to the device in accordance with the program stored in the memory device 1003. The interface device 1005 is used as an interface for connection to the network. The display device 1006 displays a graphical user interface (GUI) or the like according to the program. The input device 1007 includes a keyboard and a mouse, buttons, a touch panel, or the like, and is used to input various operation instructions. The output device 1008 outputs a computation result. Note that the computer may include a graphics processing unit (GPU) or a tensor processing unit (TPU) instead of the CPU 1004, and may include a GPU or a TPU in addition to the CPU 1004. In such a case, for example, processing may be shared and executed such that the GPU or the TPU executes processing requiring special computation and the CPU 1004 executes other processing.
The information processing device 10 according to the present embodiment acquires the experimental data 101 and the motivation function information 102, determines the parameters of the motivation function, and calculates a loss-framed incentive measure that increases the success rate of a task. As a result, an incentive measure that increases the success rate of a task can be calculated.
The information processing device 10 may calculate the amount of a loss-framed incentive that maximizes the average motivation in a period in which a task is executed. As a result, the task success rate can be further increased.
The information processing device 10 may input the output incentive measure information 103 to a design device that performs task design and the like. Based on the input incentive measure information 103, the design device can design a task with a high success rate.
This specification describes at least the information processing device (information processing apparatus), the incentive measure calculation method, and the program set out in the following items.
An information processing device (information processing apparatus) including:
The information processing device according to the item 1,
The information processing device according to the item 1 or 2,
The information processing device according to any one of items 1 to 3,
An incentive measure calculation method performed by an information processing device, including:
A program for causing a computer to function as each unit in the information processing device according to any one of items 1 to 4.
Although the present embodiment has been described so far, the present invention is not limited to such a specific embodiment, and various modifications and changes can be made within the scope of the present invention disclosed in the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/034346 | 9/17/2021 | WO |