The present invention relates to a point process learning method, a point process learning apparatus, and a program.
Predicting the occurrence of future events is important in various applications, and a model called a point process has conventionally been widely used for this purpose. Note that events are certain phenomena; examples thereof include device failures, human behaviors, crimes, earthquakes, infectious diseases, and the like.
Although many pieces of event data (i.e., event data representing a history of events that occurred in the past) and prior knowledge are required in order to predict the occurrence of future events by a point process, it may be difficult in reality to prepare these. For example, it is difficult to prepare many pieces of event data in a case where the phenomenon is a new one (e.g., an infectious disease caused by an unknown virus, usage status of a new service, etc.) and there are few events that have occurred in the past. Moreover, it is difficult to prepare the prior knowledge in a case where it is assumed that the occurrence tendency of the event is different from the past (e.g., a case where a service given in a region A is deployed in another region B, a case where a new law is enforced, etc.), for example.
An embodiment of the present invention has been made in view of the above points, and an object thereof is to accurately predict the occurrence of future events.
In order to achieve the above object, according to an embodiment, a point process learning method executed by a computer includes: an input procedure of inputting a learning data set including at least first event data representing a series of occurrences of first events; a division procedure of dividing the first event data included in the learning data set by using a prediction time observation area including at least a time series when predicting future event occurrence; and a learning procedure of learning a model parameter including a parameter of an intensity function of a predetermined point process model by using a divided learning data set divided in the division procedure.
It is possible to accurately predict the occurrence of future events.
Hereinafter, an embodiment of the present invention will be described. In the present embodiment, a point process learning apparatus 10 capable of accurately predicting the occurrence of future events by a point process even in a case where there is a small number of pieces of past event data and there is no prior knowledge regarding an event to be predicted will be described. Note that a learning time at which a parameter of a model (which will be hereinafter also referred to as a “prediction model”) is learned and a prediction time at which the occurrence of future events is predicted from a prediction model using a learned parameter exist in the point process learning apparatus 10 according to the present embodiment.
First, a hardware configuration of the point process learning apparatus 10 according to the present embodiment will be described with reference to
As illustrated in
The input device 11 is, for example, a keyboard, a mouse, a touch panel, or the like. The display device 12 is, for example, a display or the like. Note that the point process learning apparatus 10 may not have, for example, at least one of the input device 11 or the display device 12.
The external I/F 13 is an interface with an external device such as a recording medium 13a. The point process learning apparatus 10 can perform reading, writing, and the like to the recording medium 13a via the external I/F 13. Note that examples of the recording medium 13a include a compact disc (CD), a digital versatile disk (DVD), a secure digital memory card (SD memory card), a universal serial bus (USB) memory card, and the like.
The communication I/F 14 is an interface for connecting the point process learning apparatus 10 with a communication network. The processor 15 is, for example, one of various arithmetic devices such as a central processing unit (CPU) and a graphics processing unit (GPU). The memory device 16 is, for example, one of various storage devices such as a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a read only memory (ROM), and a flash memory.
The point process learning apparatus 10 according to the present embodiment can implement learning processing and prediction processing to be described later by having the hardware configuration illustrated in
Next, symbols and the like to be used in the present embodiment are prepared.
The data set is denoted by D=(De, {Dc}cϵC). Here, De is event data, and Dc is auxiliary data related to the attribute cϵC. That is, the data set D includes the event data De and |C| pieces of auxiliary data.
The event data De is obtained by sorting a series of events in order of occurrence thereof, and is represented as:
D_e = \{x_n\}_{n=1}^{N}   [Math. 1]
N is the number of pieces of data (i.e., the number of occurrences of events) included in the event data, and xn represents an n-th event that has occurred. The xn is a d-dimensional real vector, that is:
x_n \in \mathbb{R}^d   [Math. 2]
Examples of xn and an event include:
Hereinafter, the above examples are assumed for the cases of d = 1 and d = 3. Moreover, in the following description, the element representing time among the elements of x_n is denoted by t, and the remaining elements are denoted by r.
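As an illustrative aid (not part of the claimed configuration; the variable names are hypothetical), the event data D_e for the case of d = 3 can be held as an N x 3 array whose rows are x_n = (time, latitude, longitude), sorted by time:

```python
import numpy as np

# Hypothetical event data D_e for d = 3: each row is x_n = (t, r1, r2),
# sorted in order of occurrence (first column is the time t).
D_e = np.array([
    [0.5, 0.10, 0.20],   # x_1
    [1.2, 0.40, 0.35],   # x_2
    [2.9, 0.15, 0.80],   # x_3
])
N, d = D_e.shape

# The element representing time is denoted t; the remaining elements are r.
t = D_e[:, 0]
r = D_e[:, 1:]

assert d == 3
assert np.all(np.diff(t) >= 0)  # events are sorted in order of occurrence
```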
The auxiliary data Dc is data other than the event, and is represented as:
D_c = \{(x_{cn}, a_{cn})\}_{n=1}^{N_c}   [Math. 3]
N_c is the number of pieces of data included in the auxiliary data regarding the attribute c ∈ C. Moreover, (x_{cn}, a_{cn}) represents the n-th pair of x_{cn} and a_{cn} with respect to the attribute c, where:
x_{cn} \in \mathbb{R}^{d_c}, \quad a_{cn} \in \mathbb{R}^{d_c^a}   [Math. 4]
Here, d_c (where d_c ≤ d) is the number of dimensions of x_{cn}, and d_c^a is the number of dimensions of a_{cn}.
Examples of x_{cn} and a_{cn} include:
However, d_c = 0 is allowed as a special case, in which a_{cn} is associated with the entire series (i.e., all x_n).
Note that, although the prediction accuracy is expected to improve when auxiliary data is present, the auxiliary data may be absent (in which case C = ∅).
Moreover, it is assumed that the values of x_n (and x_{cn}) are normalized for each data set so as to have a common domain across data sets. For example, in the case of d = 3, the time t is normalized to represent the elapsed time from the observation start time point of the events (t = 0 at the start). Moreover, the latitude and longitude are normalized to [0, 1] (i.e., 0 ≤ r1, r2 ≤ 1, where the latitude is denoted by r1 and the longitude by r2).
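A minimal sketch of the normalization just described (the helper name and coordinate ranges are assumptions for illustration): time is shifted so the observation start becomes t = 0, and latitude/longitude are min-max scaled into [0, 1] per data set.

```python
import numpy as np

def normalize_events(events, lat_range, lon_range):
    """Normalize raw (time, lat, lon) events to a common domain.

    Time becomes the elapsed time from the observation start (t = 0);
    latitude/longitude are min-max scaled into [0, 1] using the data
    set's coordinate ranges (hypothetical helper, d = 3 case).
    """
    out = np.asarray(events, dtype=float).copy()
    out[:, 0] -= out[:, 0].min()  # t = 0 at the observation start
    out[:, 1] = (out[:, 1] - lat_range[0]) / (lat_range[1] - lat_range[0])
    out[:, 2] = (out[:, 2] - lon_range[0]) / (lon_range[1] - lon_range[0])
    return out

raw = [[100.0, 35.0, 139.0], [103.0, 35.5, 139.5], [107.0, 36.0, 140.0]]
norm = normalize_events(raw, lat_range=(35.0, 36.0), lon_range=(139.0, 140.0))
# First event is now at t = 0; coordinates lie in [0, 1].
assert norm[0, 0] == 0.0
assert np.all((norm[:, 1:] >= 0.0) & (norm[:, 1:] <= 1.0))
```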
It is assumed that the following two areas are given as d-dimensional areas.
Prediction time observation area: X_o \subset \mathbb{R}^d
Prediction target area: X_t \subset \mathbb{R}^d   [Math. 5]
The prediction time observation area is an area in which the occurrence of events is observed at the time of prediction (i.e., at the time of predicting the occurrence of future events). On the other hand, the prediction target area is an area for which the occurrence of future events is predicted. Note that outline characters are displayed as normal characters in the text of the specification; for example, the prediction time observation area is denoted by Xo, and the prediction target area is denoted by Xt.
Examples of the prediction time observation area Xo and the prediction target area Xt in the case of d=3 include the following.
X_o = \{(t, r_1, r_2) \mid 0 \le t \le 5,\ 0 \le r_1, r_2 \le 1\}
X_t = \{(t, r_1, r_2) \mid 5 \le t \le 1000,\ 0 \le r_1, r_2 \le 1\}   [Math. 6]
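These areas can be represented programmatically as simple membership predicates (an illustrative sketch; the function names are not from the specification):

```python
# Illustrative membership tests for the d = 3 areas above.
def in_X_o(x):
    """Prediction time observation area: 0 <= t <= 5, coords in [0, 1]."""
    t, r1, r2 = x
    return 0 <= t <= 5 and 0 <= r1 <= 1 and 0 <= r2 <= 1

def in_X_t(x):
    """Prediction target area: 5 <= t <= 1000, coords in [0, 1]."""
    t, r1, r2 = x
    return 5 <= t <= 1000 and 0 <= r1 <= 1 and 0 <= r2 <= 1

# An event observed at t = 3 lies in the prediction time observation area;
# an event at t = 42 lies only in the prediction target area.
assert in_X_o((3.0, 0.2, 0.9)) and not in_X_t((3.0, 0.2, 0.9))
assert in_X_t((42.0, 0.2, 0.9)) and not in_X_o((42.0, 0.2, 0.9))
```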
It is assumed that |S| data sets {Ds}sϵS are given at the time of learning. Here,
D^s = (D_e^s, \{D_c^s\}_{c \in C})
D_e^s = \{x_n^s\}_{n=1}^{N^s}
D_c^s = \{(x_{cn}^s, a_{cn}^s)\}_{n=1}^{N_c^s}   [Math. 7]
Note that the data set {Ds}sϵS is also referred to as a “learning data set”.
At the time of prediction, it is assumed that a data set Ds* (where s* is an element not included in S) and a prediction target area Xt are given. Here,
D^{s*} = (D_e^{s*}, \{D_c^{s*}\}_{c \in C})
D_e^{s*} = \{x_n^{s*}\}_{n=1}^{N^{s*}}
D_c^{s*} = \{(x_{cn}^{s*}, a_{cn}^{s*})\}_{n=1}^{N_c^{s*}}   [Math. 8]
However, Ns* is a relatively small natural number (e.g., Ns*=5, Ns*=10, or the like). Note that the data set Ds* is also referred to as a “prediction data set”.
At this time, the object is to accurately predict the events

\{x_n^{s*}\}_{n = N^{s*}+1}   [Math. 9]

that occur in the prediction target area Xt.
Note that each of the event data Des is a series of occurrences of first events used for learning of the prediction model, and the event data Des* is a series of occurrences of second events to be predicted. In the present embodiment, it is assumed that the first events and the second events are different events.
Hereinafter, the prediction model will be described. The prediction model includes the following latent vector z and intensity function λ, and the occurrence of events is predicted at the time of prediction by the prediction method described below.
The latent vector z is defined below.
z = f_z([f_e(\{x_n\}_{n=1}^{N}), \{f_c(\{(x_{cn}, a_{cn})\}_{n=1}^{N_c})\}_{c \in C}])   [Math. 10]
Here, [·,·] represents vector concatenation.
Moreover, fe is a function that outputs a ke-dimensional vector with an arbitrary number of events as an input. As fe, for example, a recurrent neural network (RNN), an attention model-based neural network, or the like can be used.
The fc is a function that outputs a kc-dimensional vector with auxiliary data as an input. A specific function to be used as fc depends on the format of the auxiliary data. In the case of the above-described image such as a satellite image, for example, a convolutional neural network (CNN) or the like is used as fc. Moreover, in the case of the series data (e.g., sensor data, etc.), for example, CNN, RNN, or the like is used as fc. In addition, a fully connected layer neural network, attention model-based neural network, or the like may be used as fc according to the format of the auxiliary data.
The fz is a function that outputs a K-dimensional vector with a (ke+ΣcϵCkc)-dimensional vector as an input. As fz, for example, a fully connected layer neural network can be used.
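A minimal numpy sketch of the latent vector computation (Formula 10) under simplifying assumptions: f_e is a tiny vanilla RNN over the event series, f_c a linear map over mean-pooled auxiliary pairs for a single attribute, and f_z a one-layer fully connected network. All weights are random placeholders rather than learned parameters; every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_a, k_e, k_c, K = 3, 2, 4, 4, 5

# f_e: a tiny vanilla RNN mapping a variable-length event series to R^{k_e}.
W_h = rng.normal(size=(k_e, k_e)) * 0.1
W_x = rng.normal(size=(k_e, d)) * 0.1
def f_e(events):
    h = np.zeros(k_e)
    for x in events:                   # the series may have any length N
        h = np.tanh(W_h @ h + W_x @ x)
    return h

# f_c: linear map over mean-pooled pairs (x_cn, a_cn) -> R^{k_c}.
W_c = rng.normal(size=(k_c, d + d_a)) * 0.1
def f_c(aux_pairs):
    pooled = np.mean([np.concatenate([x, a]) for x, a in aux_pairs], axis=0)
    return W_c @ pooled

# f_z: one fully connected layer from the (k_e + k_c)-dim concatenation to R^K.
W_z = rng.normal(size=(K, k_e + k_c)) * 0.1
def f_z(v):
    return np.tanh(W_z @ v)

events = rng.uniform(size=(6, d))
aux = [(rng.uniform(size=d), rng.uniform(size=d_a)) for _ in range(4)]
z = f_z(np.concatenate([f_e(events), f_c(aux)]))   # Formula 10
assert z.shape == (K,)
```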
Note that the definition of the latent vector z represented in the above Formula 10 is an example; for example, the event data may not be used, that is, the term

f_e(\{x_n\}_{n=1}^{N})   [Math. 11]

may be omitted from the concatenation.
The intensity function λ is defined below.
\lambda(x \mid \{x_n\}_{n=1}^{N}, \{\{(x_{cn}, a_{cn})\}_{n=1}^{N_c}\}_{c \in C}, z; \theta)   [Math. 12]
Here, θ is all the parameters in the intensity function.
Note that the definition of the intensity function λ represented in the above Formula 12 is an example; for example, the auxiliary data

\{\{(x_{cn}, a_{cn})\}_{n=1}^{N_c}\}_{c \in C}   [Math. 13]

may be used only partially or not used at all.
Moreover, although the intensity function λ is a function that characterizes a point process model, the present embodiment is applicable to an arbitrary point process model. As an example, a point process model and an intensity function λ that characterizes the point process model are shown below.
An extension of the Hawkes process using a neural network.
At this time, the intensity function λ is represented as follows.
\lambda(x \mid \{x_n\}_{n=1}^{N}, z; \theta) = f_b(z) + \sum_{x_n < x} g(x, x_n; z)   [Math. 14]
Here,
g(x, x'; z) = \exp(-\| f_l([x, z]) - f_l([x', z]) \|^2)   [Math. 15]
Moreover, f_l (the subscript is a lowercase L) is an arbitrary neural network, and f_b is an arbitrary neural network whose output is a positive scalar value.
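Under the same placeholder assumptions as before (random, unlearned weights; temporal case d = 1), the Hawkes-type intensity above with the kernel g of Math. 15 can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5
z = rng.normal(size=K) * 0.1  # latent vector (placeholder)

# f_l: arbitrary network on the concatenation [x, z]; here one linear layer.
W_l = rng.normal(size=(4, 1 + K)) * 0.1
def f_l(x):
    return W_l @ np.concatenate([[x], z])

# f_b: network with a positive scalar output; softplus guarantees positivity.
w_b = rng.normal(size=K) * 0.1
def f_b(z_vec):
    return np.log1p(np.exp(w_b @ z_vec))  # softplus > 0

def g(x, x_prime):
    """Kernel of Math. 15: similarity of x and x' in the f_l feature space."""
    diff = f_l(x) - f_l(x_prime)
    return np.exp(-np.dot(diff, diff))

def intensity(x, history):
    """Base rate plus excitation from all past events x_n < x."""
    return f_b(z) + sum(g(x, xn) for xn in history if xn < x)

history = [0.3, 0.9, 2.1]
lam = intensity(2.5, history)
assert lam > 0.0  # an intensity is always positive
```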
A spatiotemporal extension of the above.
It is represented as x=(t, r) with t as time and r as position coordinates (e.g., latitude and longitude). At this time, the intensity function λ is represented as follows.
\lambda((t, r) \mid \{x_n\}_{n=1}^{N}, z; \theta) = f_b([r, z]) + \sum_{t_n < t} g_1(r, r_n; z)\, g_2(t, t_n; z)   [Math. 16]
Here,
g_1(r, r'; z) = \exp(-\| f_{l_1}([r, z]) - f_{l_1}([r', z]) \|^2)
g_2(t, t'; z) = \exp(-\| f_{l_2}([t, z]) - f_{l_2}([t', z]) \|^2)   [Math. 17]
Moreover,

f_{l_1}, f_{l_2}   [Math. 18]

are arbitrary neural networks.
In the process of predicting the occurrence of events, the occurrence of the events may be predicted by the prediction likelihood determined from the above intensity function λ, or may be predicted by simulation using the above intensity function λ.
The prediction likelihood determined from the above intensity function λ is defined below.
p(\{x_n \mid x_n \in X_t\} \mid \{x_n \mid x_n \in X_o\}, \{\{(x_{cn}, a_{cn}) \mid x_{cn} \in X_o\}\}_{c \in C})   [Math. 19]
On the other hand, as a simulation using the above intensity function λ, an existing technique described in, for example, the reference literature "Ogata, Y., 'On Lewis' simulation method for point processes,' IEEE Transactions on Information Theory, 27(1), 23-31 (1981)" or the like may be used.
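The thinning (rejection) method of the cited Ogata (1981) reference can be sketched as follows for a temporal intensity with a known upper bound; this is an illustrative implementation with hypothetical names, not the one used by the apparatus:

```python
import math
import random

def simulate_thinning(intensity, lam_max, t_end, history=None, seed=0):
    """Ogata-style thinning: propose candidate times from a homogeneous
    process of rate lam_max, and accept each candidate t with probability
    intensity(t, history) / lam_max. The intensity must never exceed
    lam_max on [0, t_end]."""
    rnd = random.Random(seed)
    history = list(history or [])
    t = 0.0
    accepted = []
    while True:
        t += rnd.expovariate(lam_max)  # candidate inter-arrival time
        if t > t_end:
            break
        if rnd.random() <= intensity(t, history) / lam_max:
            accepted.append(t)
            history.append(t)  # accepted events can feed back into the intensity
    return accepted

# Example: a deterministic bounded intensity (history unused), lam_max = 2.5.
def lam(t, hist):
    return 1.5 + math.cos(t)  # always in [0.5, 2.5]

events = simulate_thinning(lam, lam_max=2.5, t_end=10.0)
assert all(0.0 < a <= 10.0 for a in events)
assert events == sorted(events)
```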
Next, a functional configuration of the point process learning apparatus 10 at the time of learning will be described with reference to
As illustrated in
Moreover, the point process learning apparatus 10 at the time of learning has a storage unit 110. The storage unit 110 is implemented by, for example, the memory device 16. However, the storage unit 110 may be implemented by, for example, a storage device (e.g., a database server, etc.) connected with the point process learning apparatus 10 via a communication network.
The storage unit 110 stores a learning data set {Ds}sϵS for learning a parameter (which will be hereinafter referred to as a "model parameter") of the prediction model.
The selection unit 101 randomly selects one data set Ds from the learning data set {Ds}sϵS stored in the storage unit 110.
The division unit 102 determines a learning observation area Xo′ from the prediction time observation area Xo, and uses the learning observation area Xo′ to divide the event data Des and the auxiliary data {Dcs}cϵC included in the data set Ds = (Des, {Dcs}cϵC). At this time, the division unit 102 divides the data into three parts: the event data Deso′ and auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′, the event data Dest later than the learning observation area Xo′, and the other data.
Note that a specific division method will be described later.
The feature extraction unit 103 calculates the latent vector zso by the above Formula 10 using the event data Deso′ and the auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′.
The intensity function estimation unit 104 calculates the intensity function λ by the above Formula 12 using the event data Deso′ and the auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′ and the latent vector zso.
The parameter update unit 105 updates the model parameters (i.e., the parameters of the neural network such as fe, fc, and fz, and the parameter θ of the intensity function λ) so as to minimize an error from the event data Dest later than the learning observation area Xo′. At this time, when the prediction likelihood is used, the negative log likelihood of p(Dest|Deso′, {Dcso′}cϵC) may be minimized. Note that the prediction likelihood may be p(Dest, Deso′|Deso′, {Dcso′}cϵC) (that is, Deso′ may be used at the time of calculating the likelihood). On the other hand, in the case of prediction by simulation, an error between the result and Dest may be minimized.
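For a temporal point process, the negative log likelihood referred to above takes the standard form -Σ_n log λ(t_n) + ∫ λ(t) dt over the target window. A minimal numerical sketch (illustrative only; the grid-based integration and function names are assumptions, not the apparatus's implementation):

```python
import numpy as np

def negative_log_likelihood(intensity, events, t_start, t_end, n_grid=1000):
    """NLL of a temporal point process on [t_start, t_end]:
    -sum_n log lambda(t_n) + integral of lambda over the window,
    with the integral approximated by the trapezoidal rule."""
    events = np.asarray(events, dtype=float)
    log_term = float(np.sum(np.log([intensity(t) for t in events])))
    grid = np.linspace(t_start, t_end, n_grid)
    vals = np.array([intensity(t) for t in grid])
    integral = float(np.sum((vals[1:] + vals[:-1]) * 0.5 * np.diff(grid)))
    return -log_term + integral

# Sanity check with a homogeneous intensity lambda = 2 on [0, 5]:
# the NLL is exactly -N log 2 + 2 * 5.
nll = negative_log_likelihood(lambda t: 2.0, events=[0.7, 1.9, 4.2],
                              t_start=0.0, t_end=5.0)
assert abs(nll - (10.0 - 3 * np.log(2.0))) < 1e-6
```

Minimizing this quantity with respect to the model parameters (e.g., by the gradient method) is one concrete instance of the update performed by the parameter update unit 105.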
Next, the learning processing according to the present embodiment will be described with reference to
First, the selection unit 101 randomly selects one data set Ds from the learning data set {Ds}sϵS stored in the storage unit 110 (step S101).
Next, the division unit 102 determines a learning observation area Xo′ from the prediction time observation area Xo (step S102). Here, the learning observation area Xo′ is determined by the following determination method with reference to the prediction time observation area Xo.
As an example, an example of the learning observation area Xo′ in a case where Xo={(t, r1, r2)|0≤t≤5, 0≤r1, r2≤1} is satisfied will be described below.
X_o' = \{(t, r_1, r_2) \mid 3 \le t \le 8,\ 0 \le r_1, r_2 \le 1\}
X_o' = \{(t, r_1, r_2) \mid 4 \le t \le 9,\ 0 \le r_1, r_2 \le 1\}
X_o' = \{(t, r_1, r_2) \mid 5 \le t \le 10,\ 0 \le r_1, r_2 \le 1\}
Next, the division unit 102 divides the event data Des and the auxiliary data {Dcs}cϵC included in the data set Ds = (Des, {Dcs}cϵC) using the learning observation area Xo′ (step S103). That is, the division unit 102 divides the event data Des into three parts: event data Deso′ corresponding to the learning observation area Xo′, event data Dest later than the learning observation area Xo′, and other data. Similarly, the division unit 102 divides the auxiliary data {Dcs}cϵC into auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′ and other data. The three pieces of data Deso′, Dest, and {Dcso′}cϵC are used in the processing described later; the other data are not used.
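Steps S102 and S103 can be sketched as follows for the temporal dimension, assuming (as in the examples above) that Xo′ is a window of the same time width as Xo placed at a shifted start; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_window(width=5.0, max_start=5.0):
    """Pick a learning observation window X_o' = [t0, t0 + width] whose
    time width matches the prediction time observation area X_o."""
    t0 = rng.uniform(0.0, max_start)
    return t0, t0 + width

def divide(events, t0, t1):
    """Split event data by time into: D_eso' (inside X_o'),
    D_est (later than X_o'), and the remaining (earlier) data."""
    t = events[:, 0]
    inside = events[(t0 <= t) & (t <= t1)]
    later = events[t > t1]
    other = events[t < t0]
    return inside, later, other

t0, t1 = choose_window()
assert abs((t1 - t0) - 5.0) < 1e-9  # same width as X_o

# Synthetic events at t = 0, 1, ..., 12 with dummy coordinates.
events = np.column_stack([np.linspace(0.0, 12.0, 13),
                          np.zeros(13), np.zeros(13)])
inside, later, other = divide(events, 3.0, 8.0)
assert len(inside) + len(later) + len(other) == len(events)
assert np.all(later[:, 0] > 8.0)
```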
Next, the feature extraction unit 103 calculates the latent vector zso by the above Formula 10 using the event data Deso′ and the auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′(step S104). That is, the feature extraction unit 103 calculates the latent vector zso by the following formula.
z^{so} = f_z([f_e(D_e^{so'}), \{f_c(D_c^{so'})\}_{c \in C}])
Note that, as described above, the latent vector zso may be calculated without using the event data Deso′ in a case where auxiliary data is given, or the latent vector zso may be calculated only using the event data Deso′ in a case where no auxiliary data is given.
Next, the intensity function estimation unit 104 calculates the intensity function λ by the above Formula 12 using the event data Deso′ and the auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′ and the latent vector zso (step S105). That is, the intensity function estimation unit 104 calculates λ(x|Deso′, {Dcso′}cΣC, zso). Note that, as described above, the auxiliary data {Dcso′}cϵC may be used only partially or not used at all.
Next, the parameter update unit 105 calculates an error from event data Dest later than the learning observation area Xo′ (step S106). Note that, as described above, the negative log likelihood of the prediction likelihood p(Dest|Deso′, {Dcso′}cϵC) may be used as the error, or the error between the simulation result and Dest may be used as the error.
Then, the parameter update unit 105 updates the model parameter so as to minimize the error calculated in the above step S106 using, for example, the gradient method (step S107).
As described above, the point process learning apparatus 10 according to the present embodiment can learn the parameters (i.e., the parameters of the neural network such as fe, fc, and fz, and the parameter θ of the intensity function λ) of the prediction model. At this time, as described in the above steps S102 to S103, the point process learning apparatus 10 according to the present embodiment divides the data set Ds using the learning observation area Xo′ determined from the prediction time observation area Xo, and then calculates the intensity function, the prediction likelihood, and the like using the divided data set. As a result, it is possible to accurately predict the occurrence of future events even if the number of pieces of event data given at the time of prediction is small.
Next, a functional configuration of the point process learning apparatus 10 at the time of prediction will be described with reference to
As illustrated in
Moreover, the point process learning apparatus 10 at the time of prediction has the storage unit 110. The storage unit 110 is implemented by, for example, the memory device 16. However, the storage unit 110 may be implemented by, for example, a storage device (e.g., a database server, etc.) connected with the point process learning apparatus 10 via a communication network.
The storage unit 110 stores a prediction data set Ds* for predicting events that occur in the prediction target area Xt.
The feature extraction unit 103 calculates the latent vector zs* by the above Formula 10 using the event data Des* and the auxiliary data {Dcs*}cϵC included in the prediction data set Ds*. Here, the already learned parameters of the neural networks such as fe, fc, and fz are used.
The intensity function estimation unit 104 calculates the intensity function λ by the above Formula 12 using the event data Des* and the auxiliary data {Dcs*}cϵC included in the prediction data set Ds*, and the latent vector zs*. Here, the already learned parameter θ of the intensity function λ is used.
The prediction unit 106 predicts events that occur in the prediction target area Xt by the intensity function λ.
Next, prediction processing according to the present embodiment will be described with reference to
First, the feature extraction unit 103 calculates the latent vector zs* by the above Formula 10 using the event data Des* and the auxiliary data {Dcs*}cϵC included in the prediction data set Ds* (step S201). That is, the feature extraction unit 103 calculates the latent vector zs* by the following formula.
z^{s*} = f_z([f_e(D_e^{s*}), \{f_c(D_c^{s*})\}_{c \in C}])
Note that, as described above, the latent vector zs* may be calculated without using the event data Des* in a case where auxiliary data is given, or the latent vector zs* may be calculated only using the event data Des* in a case where no auxiliary data is given.
Next, the intensity function estimation unit 104 uses the event data Des* and the auxiliary data {Dcs*}cϵC included in the prediction data set Ds* and the latent vector zs* to calculate the intensity function λ by the above Formula 12 (step S202). That is, the intensity function estimation unit 104 calculates λ(x|Des*, {Dcs*}cϵC, zs*). Note that, as described above, the auxiliary data {Dcs*}cϵC may be used only partially or not used at all.
Then, the prediction unit 106 predicts events that occur in the prediction target area Xt by the intensity function λ(x|Des*, {Dcs*}cϵC, zs*) (step S203).
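As one illustrative use of the estimated intensity function by the prediction unit 106 (the specification leaves the concrete method open to likelihood- or simulation-based approaches), the expected number of events in a temporal prediction target area can be computed as the integral of λ over that area; the names below are assumptions:

```python
import numpy as np

def expected_event_count(intensity, t_start, t_end, n_grid=1000):
    """Expected number of events in a temporal prediction target area
    [t_start, t_end]: the integral of the intensity over the window,
    approximated by the trapezoidal rule."""
    grid = np.linspace(t_start, t_end, n_grid)
    vals = np.array([intensity(t) for t in grid])
    return float(np.sum((vals[1:] + vals[:-1]) * 0.5 * np.diff(grid)))

# With a constant intensity of 0.4 events per unit time on X_t = [5, 15],
# about 4 events are expected.
count = expected_event_count(lambda t: 0.4, 5.0, 15.0)
assert abs(count - 4.0) < 1e-6
```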
As described above, the point process learning apparatus 10 according to the present embodiment can predict events that occur in the prediction target area Xt using the prediction data set Ds* including a relatively small number of pieces of data.
<Extension to Marked Point Process>
The embodiment described above can be easily extended to an arbitrary marked point process. In the marked point process, the event data De is given below.
D_e = \{(x_n, y_n)\}_{n=1}^{N}   [Math. 20]
Note that the mark y_n may be discrete or continuous, and may be multidimensional.
By replacing the event data De in the embodiment described above with the event data De represented in the above Formula 20, the embodiment is extended to an arbitrary marked point process.
As an example of the above embodiment, example data for a case where the events to be predicted are "occurrences of people infected with a new infectious disease B* in a region A* in the next half year" are shown below. At this time, the event data De = {xn} has xn = (time, latitude, longitude).
Example of a learning data set: series of occurrences of events of people infected with other infectious diseases B1, . . . , BN′ in other regions A1, . . . , AN′ (e.g., one year of data for each)
Example of auxiliary data: real-time demographic data, map data showing public transportation, and climate information (e.g., the highest temperature, the lowest temperature, the humidity, and the like in the region)
Example of a mark when applied to a marked point process: the gender, age, and occupation of the infected person
Example of a prediction data set: a series of occurrences of events of people infected with the new infectious disease B* in the region A* for the past one week, together with the above-described auxiliary data for the same period or independent of time (e.g., real-time demographic data and climate information as auxiliary data for the same period as the event series, and map data indicating public transportation as time-independent auxiliary data)
The present invention is not limited to the above embodiment specifically disclosed, and various modifications and changes, combinations with known technique, and the like can be made without departing from the scope of the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/045033 | 12/3/2020 | WO |