POINT PROCESS LEARNING METHOD, POINT PROCESS LEARNING APPARATUS AND PROGRAM

Information

  • Patent Application Publication Number
    20230385638
  • Date Filed
    December 03, 2020
  • Date Published
    November 30, 2023
Abstract
According to an embodiment, a point process learning method executed by a computer includes: an input procedure of inputting a learning data set including at least first event data representing a series of occurrences of first events; a division procedure of dividing the first event data included in the learning data set by using a prediction time observation area including at least a time series when predicting future event occurrence; and a learning procedure of learning a model parameter including a parameter of an intensity function of a predetermined point process model by using a divided learning data set divided in the division procedure.
Description
TECHNICAL FIELD

The present invention relates to a point process learning method, a point process learning apparatus, and a program.


BACKGROUND ART

Predicting the occurrence of future events is important in various applications, and a model called a point process has often been used conventionally for this purpose. Note that the events are certain phenomena, and examples thereof include device failures, human behaviors, crimes, earthquakes, infectious diseases, and the like.


CITATION LIST
Non Patent Literature



  • Non Patent Literature 1: Edwards, Harrison, and Amos Storkey. “Towards a neural statistician.” arXiv preprint arXiv:1606.02185 (2016).

  • Non Patent Literature 2: Du, Nan, et al. “Recurrent marked temporal point processes: Embedding event history to vector.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.



SUMMARY OF INVENTION
Technical Problem

Although many pieces of event data (i.e., event data representing a history of events that occurred in the past) and prior knowledge are required in order to predict the occurrence of future events by a point process, it may be difficult in reality to prepare these. For example, it is difficult to prepare many pieces of event data in a case where the phenomenon is a new one (e.g., an infectious disease caused by an unknown virus, usage status of a new service, etc.) and there are few events that have occurred in the past. Moreover, it is difficult to prepare the prior knowledge in a case where it is assumed that the occurrence tendency of the event is different from the past (e.g., a case where a service given in a region A is deployed in another region B, a case where a new law is enforced, etc.), for example.


An embodiment of the present invention has been made in view of the above points, and an object thereof is to accurately predict the occurrence of future events.


Solution to Problem

In order to achieve the above object, according to an embodiment, a point process learning method executed by a computer includes: an input procedure of inputting a learning data set including at least first event data representing a series of occurrences of first events; a division procedure of dividing the first event data included in the learning data set by using a prediction time observation area including at least a time series when predicting future event occurrence; and a learning procedure of learning a model parameter including a parameter of an intensity function of a predetermined point process model by using a divided learning data set divided in the division procedure.


Advantageous Effects of Invention

It is possible to accurately predict the occurrence of future events.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is a diagram illustrating an example of a hardware configuration of a point process learning apparatus according to the present embodiment.



FIG. 2 is a diagram illustrating an example of a functional configuration of a point process learning apparatus at the time of learning.



FIG. 3 is a flowchart illustrating an example of learning processing according to the present embodiment.



FIG. 4 is a diagram for explaining an example of data division.



FIG. 5 is a diagram illustrating an example of a functional configuration of a point process learning apparatus at the time of prediction.



FIG. 6 is a flowchart illustrating an example of prediction processing according to the present embodiment.



FIG. 7 is a diagram illustrating a comparative example with conventional technique.





DESCRIPTION OF EMBODIMENTS

Hereinafter, an embodiment of the present invention will be described. In the present embodiment, a point process learning apparatus 10 capable of accurately predicting the occurrence of future events by a point process even in a case where there is a small number of pieces of past event data and there is no prior knowledge regarding an event to be predicted will be described. Note that a learning time at which a parameter of a model (which will be hereinafter also referred to as a “prediction model”) is learned and a prediction time at which the occurrence of future events is predicted from a prediction model using a learned parameter exist in the point process learning apparatus 10 according to the present embodiment.


<Hardware Configuration>

First, a hardware configuration of the point process learning apparatus 10 according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of a hardware configuration of the point process learning apparatus 10 according to the present embodiment.


As illustrated in FIG. 1, the point process learning apparatus 10 according to the present embodiment is implemented by a hardware configuration of a general computer or computer system, and has an input device 11, a display device 12, an external I/F 13, a communication I/F 14, a processor 15, and a memory device 16. These hardware devices are communicably connected via a bus 17.


The input device 11 is, for example, a keyboard, a mouse, a touch panel, or the like. The display device 12 is, for example, a display or the like. Note that the point process learning apparatus 10 may not have, for example, at least one of the input device 11 or the display device 12.


The external I/F 13 is an interface with an external device such as a recording medium 13a. The point process learning apparatus 10 can perform reading, writing, and the like to the recording medium 13a via the external I/F 13. Note that examples of the recording medium 13a include a compact disc (CD), a digital versatile disk (DVD), a secure digital memory card (SD memory card), a universal serial bus (USB) memory card, and the like.


The communication I/F 14 is an interface for connecting the point process learning apparatus 10 with a communication network. The processor 15 is, for example, one of various arithmetic devices such as a central processing unit (CPU) and a graphics processing unit (GPU). The memory device 16 is, for example, one of various storage devices such as a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a read only memory (ROM), and a flash memory.


The point process learning apparatus 10 according to the present embodiment can implement learning processing and prediction processing to be described later by having the hardware configuration illustrated in FIG. 1. Note that the hardware configuration illustrated in FIG. 1 is an example, and the point process learning apparatus 10 may have another hardware configuration. For example, the point process learning apparatus 10 may have a plurality of processors 15 or a plurality of memory devices 16.


<Preparation>

Next, symbols and the like to be used in the present embodiment are prepared.


The data set is denoted by D=(De, {Dc}cϵC). Here, De is event data, and Dc is auxiliary data related to the attribute cϵC. That is, the data set D includes the event data De and |C| pieces of auxiliary data.


The event data De is obtained by sorting a series of events in order of occurrence thereof, and is represented as:






$\mathcal{D}_e = \{x_n\}_{n=1}^{N}$  [Math. 1]


N is the number of pieces of data (i.e., the number of occurrences of events) included in the event data, and xn represents an n-th event that has occurred. The xn is a d-dimensional real vector, that is:






$x_n \in \mathbb{R}^{d}$  [Math. 2]


Examples of xn and an event include:

    • In the case of d=1, xn is time, and the event is a behavior of a person (e.g., to walk or to eat) or the like.
    • In the case of d=3, xn is spatiotemporal space (time, latitude, and longitude), and the event is cluster occurrence of an infectious disease or the like.


Hereinafter, the above example is assumed as an example in the case of d=1, 3. Moreover, in the following description, an element representing time among the elements of xn is denoted by t, and the remaining elements are denoted by r.


The auxiliary data Dc is data other than the event, and is represented as:






$\mathcal{D}_c = \{(x_{cn}, a_{cn})\}_{n=1}^{N_c}$  [Math. 3]


Nc is the number of pieces of data included in the auxiliary data regarding the attribute c∈C. Moreover, (xcn, acn) is a pair of xcn and acn with respect to the attribute c, where:






$x_{cn} \in \mathbb{R}^{d_c}, \quad a_{cn} \in \mathbb{R}^{d_c^a}$  [Math. 4]


Here, dc (where dc≤d) is the number of dimensions of xcn, and dca is the number of dimensions of acn.


Examples of xcn and acn include:

    • In the case of d=1, it is assumed that dc=0 and dca=1, and acn is gender (e.g., the gender is represented by a categorical variable, acn ∈ {0, 1}) or the like.


However, dc=0 is a special case, and acn is associated with the entire series (i.e., all xn).

    • In the case of d=1, it is assumed that dc=1 and dca=1, and xcn is time and acn is heart rate or the like.
    • In the case of d=3, it is assumed that dc=2 and dca=the number of pixels, and xcn is latitude and longitude, and acn is a pixel value (i.e., pixel value at the latitude and longitude of a satellite image, for example) or the like.
    • In the case of d=3, it is assumed that dc=3 and dca=1, and xcn is time, and the latitude and longitude of the temperature sensor, and acn is temperature or the like.


Note that, although the prediction accuracy is expected to be improved when the auxiliary data is present, the auxiliary data may not be present (this case means C=∅).


Moreover, it is assumed that the value of xn (and xcn) is normalized or the like for each data set so as to have a common domain between data sets. For example, in the case of d=3, the time t is normalized to represent the time that has elapsed from a reference with the observation start time point of the event as the reference (t=0). Moreover, the latitude and longitude are normalized by [0, 1] (i.e., 0≤r1, r2≤1 are satisfied where the latitude is denoted by r1 and the longitude is denoted by r2, for example).
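As a concrete illustration of the normalization described above, the following Python sketch (the function name and array layout are illustrative, not part of the embodiment) shifts the time element so that the observation start becomes t = 0 and min-max scales the latitude and longitude into [0, 1] for the d = 3 case:

```python
import numpy as np

def normalize_events(events):
    """Normalize d=3 events (time, latitude, longitude):
    time becomes elapsed time from the observation start (t = 0), and the
    latitude/longitude columns are min-max scaled into [0, 1].
    `events` is an (N, 3) array-like sorted by occurrence time."""
    events = np.asarray(events, dtype=float)
    out = events.copy()
    out[:, 0] -= events[0, 0]                      # elapsed time from the first observation
    for j in (1, 2):                               # latitude, longitude
        lo, hi = events[:, j].min(), events[:, j].max()
        out[:, j] = (events[:, j] - lo) / (hi - lo) if hi > lo else 0.0
    return out

# Example: three events with raw timestamps and coordinates
raw = [(100.0, 35.0, 139.0), (102.5, 35.5, 139.5), (105.0, 36.0, 140.0)]
norm = normalize_events(raw)
```

After this normalization the domain is common between data sets, as assumed in the text.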


It is assumed that the following two areas are given as d-dimensional areas.





Prediction time observation area $\mathcal{X}_o \subset \mathbb{R}^{d}$; prediction target area $\mathcal{X}_t \subset \mathbb{R}^{d}$  [Math. 5]


The prediction time observation area is an area where the occurrence of events is observed at the time of prediction (i.e., at the time of predicting the occurrence of future events). On the other hand, the prediction target area is a prediction target area for which the occurrence of future events is predicted. Note that an outline character is displayed as a normal character in the text of the specification. For example, the prediction time observation area is denoted by Xo, and the prediction target area is denoted by Xt.


Examples of the prediction time observation area Xo and the prediction target area Xt in the case of d=3 include the following.






$\mathcal{X}_o = \{(t, r_1, r_2) \mid 0 \le t \le 5,\ 0 \le r_1, r_2 \le 1\}$

$\mathcal{X}_t = \{(t, r_1, r_2) \mid 5 \le t \le 1000,\ 0 \le r_1, r_2 \le 1\}$


<<Time of Learning>>

It is assumed that |S| data sets {Ds}sϵS are given at the time of learning. Here,






$\mathcal{D}^{s} = (\mathcal{D}_e^{s}, \{\mathcal{D}_c^{s}\}_{c \in C})$

$\mathcal{D}_e^{s} = \{x_n\}_{n=1}^{N^{s}}$

$\mathcal{D}_c^{s} = \{(x_{cn}^{s}, a_{cn}^{s})\}_{n=1}^{N_c^{s}}$  [Math. 6]


Note that the data set {Ds}sϵS is also referred to as a “learning data set”.


<<Time of Prediction>>

At the time of prediction, it is assumed that a data set Ds* (where s* is an element not included in S) and a prediction target area Xt are given. Here,






$\mathcal{D}^{s*} = (\mathcal{D}_e^{s*}, \{\mathcal{D}_c^{s*}\}_{c \in C})$

$\mathcal{D}_e^{s*} = \{x_n\}_{n=1}^{N^{s*}}$

$\mathcal{D}_c^{s*} = \{(x_{cn}^{s*}, a_{cn}^{s*})\}_{n=1}^{N_c^{s*}}$  [Math. 7]


However, Ns* is a relatively small natural number (e.g., Ns*=5, Ns*=10, or the like). Note that the data set Ds* is also referred to as a “prediction data set”.


At this time, it is an object to accurately predict events





$\{x_n\}_{n=N^{s*}+1}^{N^{s*}+N_{\mathcal{X}_t}}$  [Math. 8]

that occur in the prediction target area Xt. Here,

$N_{\mathcal{X}_t}$  [Math. 9]

is the number of events that occur in the prediction target area Xt.


Note that each of the event data Des is a series of occurrences of first events used for learning of the prediction model, and the event data Des* is a series of occurrences of second events to be predicted. In the present embodiment, it is assumed that the first events and the second events are different events.


<Prediction Model>

Hereinafter, the prediction model will be described. The prediction model includes the following latent vector z and the intensity function λ, and the occurrence of events is predicted at the time of prediction by a prediction method described below.


<<Latent Vector>>

The latent vector z is defined below.






$z = f_z([f_e(\{x_n\}_{n=1}^{N}), \{f_c(\{(x_{cn}, a_{cn})\}_{n=1}^{N_c})\}_{c \in C}]) \in \mathbb{R}^{K}$  [Math. 10]


Here, [,] represents vector concatenation.


Moreover, fe is a function that outputs a ke-dimensional vector with an arbitrary number of events as an input. As fe, for example, a recurrent neural network (RNN), an attention model-based neural network, or the like can be used.


The fc is a function that outputs a kc-dimensional vector with auxiliary data as an input. A specific function to be used as fc depends on the format of the auxiliary data. In the case of the above-described image such as a satellite image, for example, a convolutional neural network (CNN) or the like is used as fc. Moreover, in the case of the series data (e.g., sensor data, etc.), for example, CNN, RNN, or the like is used as fc. In addition, a fully connected layer neural network, attention model-based neural network, or the like may be used as fc according to the format of the auxiliary data.


The fz is a function that outputs a K-dimensional vector with a (ke + Σc∈C kc)-dimensional vector as an input. As fz, for example, a fully connected layer neural network can be used.


Note that the definition of the latent vector z represented in above Formula 10 is an example, and, for example, event data may not be used, that is,






$f_e(\{x_n\}_{n=1}^{N})$  [Math. 11]

may not be used.
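The structure of Formula 10 can be sketched as follows. Note that this is only a shape-level illustration under assumed stand-ins: the embodiment uses an RNN or attention model for fe, a CNN or RNN for fc, and a fully connected network for fz, whereas here each is replaced by a random linear map with mean pooling simply to show the concatenation [,] and the dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k_e, k_c, K = 3, 4, 4, 8

# Stand-ins for the neural networks in Formula 10.  Simple linear maps with
# mean pooling are used here only to show the input/output shapes; they are
# not the embodiment's networks.
W_e = rng.normal(size=(d, k_e))
W_c = rng.normal(size=(2, k_c))        # auxiliary pair (x_cn, a_cn), both scalar here
W_z = rng.normal(size=(k_e + k_c, K))

def f_e(events):                        # arbitrary number of events -> k_e-dim vector
    return np.tanh(np.asarray(events) @ W_e).mean(axis=0)

def f_c(aux):                           # auxiliary data -> k_c-dim vector
    return np.tanh(np.asarray(aux) @ W_c).mean(axis=0)

def f_z(v):                             # concatenated vector -> K-dim latent vector z
    return np.tanh(v @ W_z)

events = rng.normal(size=(10, d))       # {x_n}, N = 10
aux = rng.normal(size=(5, 2))           # {(x_cn, a_cn)}, N_c = 5
z = f_z(np.concatenate([f_e(events), f_c(aux)]))   # Formula 10 with |C| = 1
```

Because fe pools over events, it accepts an arbitrary number of events, matching the requirement stated above.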


<<Intensity Function>>

The intensity function λ is defined below.





$\lambda(x \mid \{x_n\}_{n=1}^{N}, \{(x_{cn}, a_{cn})\}_{n=1}^{N_c}, z; \theta)$  [Math. 12]


Here, θ is all the parameters in the intensity function.


Note that the definition of the intensity function λ represented in above Formula 12 is an example, and, for example, auxiliary data





$\{(x_{cn}, a_{cn})\}_{n=1}^{N_c}$  [Math. 13]

may be used only partially or not used at all.


Moreover, although the intensity function λ is a function that characterizes a point process model, the present embodiment is applicable to an arbitrary point process model. As an example, a point process model and an intensity function λ that characterizes the point process model are shown below.

    • In the case of d=1


An extension of the Hawkes process using a neural network.


At this time, the intensity function λ is represented as follows.





$\lambda(x \mid \{x_n\}_{n=1}^{N}, z; \theta) = f_b(z) + \sum_{x_i < x} g(x, x_i; z)$  [Math. 14]


Here,






$g(x, x'; z) = \exp(-\lVert f_l([x, z]) - f_l([x', z]) \rVert^{2})$  [Math. 15]


Moreover, fl (the subscript l is a lowercase L) is an arbitrary neural network, and fb is a neural network whose output is a positive scalar value.
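A minimal numerical sketch of Formulas 14 and 15 is shown below. The networks fl and fb are replaced by a fixed linear map and a softplus (illustrative stand-ins, not the embodiment's networks); the sketch shows that the intensity is a positive base rate plus non-negative excitation from past events:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4                                   # latent dimension
z = rng.normal(size=K)

A = rng.normal(size=(1 + K, 3))         # stand-in weights for the network f_l

def f_l(x, z):
    # f_l takes the concatenation [x, z]; a linear map stands in for the network
    return np.concatenate([[x], z]) @ A

def f_b(z):
    # f_b must output a positive scalar; softplus guarantees positivity
    return np.log1p(np.exp(z.sum()))

def g(x, x_prime, z):                   # Math 15: a similarity kernel in (0, 1]
    diff = f_l(x, z) - f_l(x_prime, z)
    return np.exp(-np.dot(diff, diff))

def intensity(x, history, z):           # Math 14: base rate + excitation by past events
    return f_b(z) + sum(g(x, xi, z) for xi in history if xi < x)

history = [0.5, 1.2, 2.0]
lam = intensity(2.5, history, z)
```

With no history the intensity reduces to the base rate fb(z), and each past event adds a strictly positive excitation term, which is the self-exciting behavior of the Hawkes process.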

    • In the case of d=3


A spatiotemporal extension of the above model.


It is represented as x=(t, r) with t as time and r as position coordinates (e.g., latitude and longitude). At this time, the intensity function λ is represented as follows.





$\lambda((t, r) \mid \{x_n\}_{n=1}^{N}; \theta) = f_b([r, z]) + \sum_{t_i < t} g_1(r, r_i; z)\, g_2(t, t_i; z)$  [Math. 16]


Here,






$g_1(r, r'; z) = \exp(-\lVert f_{l1}([r, z]) - f_{l1}([r', z]) \rVert^{2})$

$g_2(t, t'; z) = \exp(-\lVert f_{l2}([t, z]) - f_{l2}([t', z]) \rVert^{2})$  [Math. 17]


Moreover,






$f_{l1}, f_{l2}$  [Math. 18]

are arbitrary neural networks, and fb is a neural network whose output is a positive scalar value.


<<Prediction Method>>

In the process of predicting the occurrence of events, the occurrence of the events may be predicted by the prediction likelihood determined from the above intensity function λ, or may be predicted by simulation using the above intensity function λ.


The prediction likelihood determined from the above intensity function λ is defined below.






$p(\{x_n \mid x_n \in \mathcal{X}_t\} \mid \{x_n \mid x_n \in \mathcal{X}_o\}, \{\{(x_{cn}, a_{cn}) \mid x_{cn} \in \mathcal{X}_o\}\}_{c \in C})$  [Math. 19]


On the other hand, as a simulation using the above intensity function λ, an existing technique described in, for example, the reference literature “Ogata, Y. ‘On Lewis' simulation method for point processes.’ IEEE Transactions on Information Theory 27(1), 23-31 (1981)” or the like may be used.
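For reference, the thinning method of the above reference literature can be sketched as follows for a temporal intensity with a known upper bound (the function names are illustrative, not part of the embodiment):

```python
import numpy as np

def simulate_thinning(intensity, t_end, lam_max, rng):
    """Simulate a temporal point process on [0, t_end) by thinning:
    propose candidate points from a homogeneous process with rate lam_max
    (an upper bound on the intensity) and accept each candidate t with
    probability intensity(t, history) / lam_max."""
    t, history = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)     # next homogeneous candidate
        if t >= t_end:
            return history
        if rng.uniform() * lam_max <= intensity(t, history):
            history.append(t)                   # candidate accepted as an event

rng = np.random.default_rng(42)
# Illustrative intensity: constant rate 2.0 (no history dependence), so the
# simulation reduces to a homogeneous Poisson process on [0, 100).
events = simulate_thinning(lambda t, h: 2.0, t_end=100.0, lam_max=2.0, rng=rng)
```

The same routine accepts a history-dependent intensity such as the neural Hawkes intensity above, as long as a valid upper bound lam_max is supplied.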


<Functional Configuration at the Time of Learning>

Next, a functional configuration of the point process learning apparatus 10 at the time of learning will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of a functional configuration of the point process learning apparatus 10 at the time of learning.


As illustrated in FIG. 2, the point process learning apparatus 10 at the time of learning has a selection unit 101, a division unit 102, a feature extraction unit 103, an intensity function estimation unit 104, and a parameter update unit 105. Each of these units is implemented by, for example, processing executed by the processor 15 by one or more programs installed in the point process learning apparatus 10.


Moreover, the point process learning apparatus 10 at the time of learning has a storage unit 110. The storage unit 110 is implemented by, for example, the memory device 16. However, the storage unit 110 may be implemented by, for example, a storage device (e.g., a database server, etc.) connected with the point process learning apparatus 10 via a communication network.


The storage unit 110 stores a learning data set {Ds}s∈S for learning a parameter (which will be hereinafter referred to as a “model parameter”) of the prediction model.


The selection unit 101 randomly selects one data set Ds from the learning data set {Ds}sϵS stored in the storage unit 110.


The division unit 102 determines a learning observation area Xo′ from the prediction time observation area Xo, and uses the learning observation area Xo′ to divide the event data Des and the auxiliary data {Dcs}c∈C included in the data set Ds=(Des, {Dcs}c∈C). At this time, the division unit 102 divides the data into three parts: event data Deso′ and auxiliary data {Dcso′}c∈C corresponding to the learning observation area Xo′, event data Dest later than the learning observation area Xo′, and other data.


Note that a specific division method will be described later.


The feature extraction unit 103 calculates the latent vector zso by the above Formula 10 using the event data Deso′ and the auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′.


The intensity function estimation unit 104 calculates the intensity function λ by the above Formula 12 using the event data Deso′ and the auxiliary data {Dcso′}cϵC corresponding to the learning observation area Xo′ and the latent vector zso.


The parameter update unit 105 updates the model parameters (i.e., the parameters of the neural networks such as fe, fc, and fz, and the parameter θ of the intensity function λ) so as to minimize an error from the event data Dest later than the learning observation area Xo′. At this time, when the prediction likelihood is used, the negative log likelihood of p(Dest|Deso′, {Dcso′}c∈C) may be minimized. Note that the prediction likelihood may be p(Dest, Deso′|Deso′, {Dcso′}c∈C) (that is, Deso′ may be used at the time of calculating the likelihood). On the other hand, in the case of prediction by simulation, an error between the simulation result and Dest may be minimized.


<Learning Processing>

Next, the learning processing according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a flowchart illustrating an example of the learning processing according to the present embodiment. Note that the following steps S101 to S107 are repeatedly executed until a predetermined termination condition is satisfied. Examples of such a termination condition include that the number of repetitions has reached a predetermined number of times, that the value of the model parameter has converged (e.g., that the update amount of the model parameter becomes less than the predetermined threshold before and after the repetition), and the like.


First, the selection unit 101 randomly selects one data set Ds from the learning data set {Ds}sϵS stored in the storage unit 110 (step S101).


Next, the division unit 102 determines a learning observation area Xo′ from the prediction time observation area Xo (step S102). Here, the learning observation area Xo′ is determined by the following determination method with reference to the prediction time observation area Xo.

    • The learning observation area Xo′ has the same size as the size of the prediction time observation area Xo (however, for example, only the time direction may be lengthened or conversely shortened).
    • A start point of time of the learning observation area Xo′ is randomly determined (however, for example, the determination may be made on the basis of a certain rule such as adding 1 to the start point of time for each repetition after the initial value of the start point of time is set).


As an example, candidates for the learning observation area Xo′ in a case where Xo = {(t, r1, r2) | 0 ≤ t ≤ 5, 0 ≤ r1, r2 ≤ 1} is satisfied are shown below.






$\mathcal{X}_{o'} = \{(t, r_1, r_2) \mid 3 \le t \le 8,\ 0 \le r_1, r_2 \le 1\}$

$\mathcal{X}_{o'} = \{(t, r_1, r_2) \mid 4 \le t \le 9,\ 0 \le r_1, r_2 \le 1\}$

$\mathcal{X}_{o'} = \{(t, r_1, r_2) \mid 5 \le t \le 10,\ 0 \le r_1, r_2 \le 1\}$


Next, the division unit 102 divides the event data Des and the auxiliary data {Dcs}c∈C included in the data set Ds=(Des, {Dcs}c∈C) using the learning observation area Xo′ (step S103). That is, the division unit 102 divides the event data Des into three parts: event data Deso′ corresponding to the learning observation area Xo′, event data Dest later than the learning observation area Xo′, and other data. Similarly, the division unit 102 divides the auxiliary data {Dcs}c∈C into two parts: auxiliary data {Dcso′}c∈C corresponding to the learning observation area Xo′ and other data. Only three pieces of data are used in the processing described later: Deso′, Dest, and {Dcso′}c∈C; no other data is used. FIG. 4 schematically illustrates this division. In FIG. 4, the area later than Xo′ is denoted by Xt, and the event data Dest corresponding to this area Xt is used as so-called teacher data (or correct answer data). Note that hatched portions are data that are not used. Moreover, c1 and c2 are elements of C.
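Steps S102 to S103 for the time direction can be sketched as follows, assuming for illustration that each event xn is a tuple whose first element is the time t (the function names are not part of the embodiment):

```python
import numpy as np

def divide_events(events, x_o_len, t_start):
    """Split event data by a learning observation area Xo' = [t_start,
    t_start + x_o_len): events inside Xo' become the observed data,
    events after Xo' become the teacher data, and earlier events are
    not used."""
    t = np.asarray([e[0] for e in events])      # the time element of each x_n
    t_end = t_start + x_o_len
    observed = [e for e, ti in zip(events, t) if t_start <= ti < t_end]
    target = [e for e, ti in zip(events, t) if ti >= t_end]
    unused = [e for e, ti in zip(events, t) if ti < t_start]
    return observed, target, unused

rng = np.random.default_rng(7)
events = [(ti,) for ti in sorted(rng.uniform(0.0, 20.0, size=50))]
# Xo has length 5; the start time of Xo' is chosen at random (step S102)
t_start = rng.uniform(0.0, 15.0)
observed, target, unused = divide_events(events, x_o_len=5.0, t_start=t_start)
```

The three returned lists correspond to Deso′, Dest, and the hatched (unused) portion in FIG. 4, respectively.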


Next, the feature extraction unit 103 calculates the latent vector zso by the above Formula 10 using the event data Deso′ and the auxiliary data {Dcso′}c∈C corresponding to the learning observation area Xo′ (step S104). That is, the feature extraction unit 103 calculates the latent vector zso by the following formula.






$z^{so} = f_z([f_e(\mathcal{D}_e^{so'}), \{f_c(\mathcal{D}_c^{so'})\}_{c \in C}])$


Note that, as described above, the latent vector zso may be calculated without using the event data Deso′ in a case where auxiliary data is given, or the latent vector zso may be calculated only using the event data Deso′ in a case where no auxiliary data is given.


Next, the intensity function estimation unit 104 calculates the intensity function λ by the above Formula 12 using the event data Deso′ and the auxiliary data {Dcso′}c∈C corresponding to the learning observation area Xo′ and the latent vector zso (step S105). That is, the intensity function estimation unit 104 calculates λ(x|Deso′, {Dcso′}c∈C, zso). Note that, as described above, the auxiliary data {Dcso′}c∈C may be used only partially or not used at all.


Next, the parameter update unit 105 calculates an error from event data Dest later than the learning observation area Xo′ (step S106). Note that, as described above, the negative log likelihood of the prediction likelihood p(Dest|Deso′, {Dcso′}cϵC) may be used as the error, or the error between the simulation result and Dest may be used as the error.


Then, the parameter update unit 105 updates the model parameter so as to minimize the error calculated in the above step S106 using, for example, the gradient method (step S107).
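As a sanity check of the objective handled in steps S106 to S107, the negative log likelihood of a temporal point process on an observation window is −Σi log λ(xi) + ∫ λ(x) dx; for a constant intensity this is minimized at λ = N/T. A minimal sketch follows (a grid search stands in for the gradient method, purely for illustration):

```python
import math

def point_process_nll(lam_const, times, t_end):
    """Negative log-likelihood of a temporal point process with constant
    intensity lam_const observed on [0, t_end):
        NLL = -sum_i log lam(x_i) + integral of lam over the window."""
    return -sum(math.log(lam_const) for _ in times) + lam_const * t_end

times = [0.4, 1.1, 2.5, 3.0, 4.9]      # 5 events on [0, 5)
# For constant intensity the NLL is minimized at lam = N / T = 1.0,
# mirroring how step S107 adjusts the model parameters to reduce the error.
candidates = [0.5, 0.8, 1.0, 1.3, 2.0]
best = min(candidates, key=lambda lam: point_process_nll(lam, times, 5.0))
```

In the embodiment the intensity is the neural model of Formula 12 and the minimization is done by the gradient method, but the shape of the objective is the same.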


As described above, the point process learning apparatus 10 according to the present embodiment can learn the parameters (i.e., the parameters of the neural network such as fe, fc, and fz, and the parameter θ of the intensity function λ) of the prediction model. At this time, as described in the above steps S102 to S103, the point process learning apparatus 10 according to the present embodiment divides the data set Ds using the learning observation area Xo′ determined from the prediction time observation area Xo, and then calculates the intensity function, the prediction likelihood, and the like using the divided data set. As a result, it is possible to accurately predict the occurrence of future events even if the number of pieces of event data given at the time of prediction is small.


<Functional Configuration at the Time of Prediction>

Next, a functional configuration of the point process learning apparatus 10 at the time of prediction will be described with reference to FIG. 5. FIG. 5 is a diagram illustrating an example of a functional configuration of the point process learning apparatus 10 at the time of prediction.


As illustrated in FIG. 5, the point process learning apparatus 10 at the time of prediction has the feature extraction unit 103, the intensity function estimation unit 104, and a prediction unit 106. Each of these units is implemented by, for example, processing executed by the processor 15 by one or more programs installed in the point process learning apparatus 10.


Moreover, the point process learning apparatus 10 at the time of prediction has the storage unit 110. The storage unit 110 is implemented by, for example, the memory device 16. However, the storage unit 110 may be implemented by, for example, a storage device (e.g., a database server, etc.) connected with the point process learning apparatus 10 via a communication network.


The storage unit 110 stores a prediction data set Ds* for predicting events that occur in the prediction target area Xt.


The feature extraction unit 103 calculates the latent vector zs* by the above Formula 10 using the event data Des* and the auxiliary data {Dcs*}c∈C included in the prediction data set Ds*. Here, the parameters of the neural networks such as fe, fc, and fz that have already been learned are used.


The intensity function estimation unit 104 uses the event data Des* and the auxiliary data {Dcs*}c∈C included in the prediction data set Ds* and the latent vector zs* to calculate the intensity function λ by the above Formula 12. Here, the parameter θ of the intensity function λ that has already been learned is used.


The prediction unit 106 predicts events that occur in the prediction target area Xt by the intensity function λ.


<Prediction Processing>

Next, prediction processing according to the present embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart illustrating an example of prediction processing according to the present embodiment.


First, the feature extraction unit 103 calculates the latent vector zs* by the above Formula 10 using the event data Des* and the auxiliary data {Dcs*}c∈C included in the prediction data set Ds* (step S201). That is, the latent vector zs* is calculated by the following formula.






$z^{s*} = f_z([f_e(\mathcal{D}_e^{s*}), \{f_c(\mathcal{D}_c^{s*})\}_{c \in C}])$


Note that, as described above, the latent vector zs* may be calculated without using the event data Des* in a case where auxiliary data is given, or the latent vector zs* may be calculated only using the event data Des* in a case where no auxiliary data is given.


Next, the intensity function estimation unit 104 uses the event data Des* and the auxiliary data {Dcs*}cϵC included in the prediction data set Ds* and the latent vector zs* to calculate the intensity function λ by the above Formula 12 (step S202). That is, the intensity function estimation unit 104 calculates λ(x|Des*, {Dcs*}cϵC, zs*). Note that, as described above, the auxiliary data {Dcs*}cϵC may be used only partially or not used at all.


Then, the prediction unit 106 predicts events that occur in the prediction target area Xt by the intensity function λ(x|Des*, {Dcs*}cϵC, zs*) (step S203).
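One simple use of the estimated intensity function in step S203 is to compute the expected number of events in the prediction target area Xt as the integral of λ over Xt. A sketch with an illustrative stand-in intensity (not the embodiment's model) and a midpoint-rule integral:

```python
import numpy as np

def expected_event_count(intensity, t_lo, t_hi, n_grid=10000):
    """Approximate the expected number of events in a prediction target
    area X_t = [t_lo, t_hi) as the integral of the intensity over X_t,
    using a simple midpoint rule."""
    ts = np.linspace(t_lo, t_hi, n_grid, endpoint=False) + (t_hi - t_lo) / (2 * n_grid)
    return float(np.mean(intensity(ts)) * (t_hi - t_lo))

# Illustrative learned intensity: lam(t) = 0.5 + 0.1 * t (a stand-in with a
# closed-form integral, used only to check the numerical integration)
lam = lambda ts: 0.5 + 0.1 * ts
count = expected_event_count(lam, 5.0, 10.0)
# closed form: integral of 0.5 + 0.1 t over [5, 10] = 2.5 + 0.05 * (100 - 25) = 6.25
```

Individual event occurrences in Xt can instead be drawn by the thinning simulation described earlier, using the same estimated intensity.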


As described above, the point process learning apparatus 10 according to the present embodiment can predict events that occur in the prediction target area Xt using the prediction data set Ds* including a relatively small number of pieces of data.


<Comparative Example with Conventional Technique>



FIG. 7 illustrates a comparative example of the point process learning apparatus 10 (proposed technique) according to the present embodiment and conventional technique. As illustrated in FIG. 7, a relatively large area is required as the prediction time observation area Xo in order to accurately predict events that occur in the prediction target area Xt in conventional technique, whereas the point process learning apparatus 10 according to the present embodiment can accurately predict events in a relatively small area as the prediction time observation area Xo. Therefore, it becomes possible with the point process learning apparatus 10 according to the present embodiment to accurately predict the occurrence of future events even in a case where only a relatively small number of pieces of event data can be observed (e.g., in a case where it is assumed that the occurrence tendency of a new phenomenon or event is different from the past, or the like).


<Extension to Marked Point Process>

The embodiment described above can be easily extended to an arbitrary marked point process. In the marked point process, the event data De is represented as follows.






$\mathcal{D}_e = \{(x_n, y_n)\}_{n=1}^{N}$  [Math. 20]


Note that yn may be discrete, continuous, or multidimensional.


By replacing the event data De in the embodiment described above with the event data De represented in the above Formula 20, the embodiment is extended to an arbitrary marked point process.


EXAMPLES

As an example of the above embodiment, an example of data of a case where events to be predicted are set as “the occurrence of infected people of new infectious disease B* in region A* occurring in the next half year” is shown below. At this time, the event data De={xn} is xn=(time, latitude, longitude).


Example of learning data set: Series of occurrences of events of infected people with other infectious diseases B1, . . . , BN′ in other regions A1, . . . , AN′ (e.g., each in one year or the like)


Example of auxiliary data: Real-time demographic data, map data showing public transportation, and climate information (e.g., the highest temperature, the lowest temperature, the humidity, and the like in the region)


Example of a mark when applied to marked point process: Gender, age, and occupation of infected person


Example of prediction data set: Series of occurrences of events of infected people with the new infectious disease B* in region A* for the past one week, together with the above-described auxiliary data for the same period or independent of time (e.g., real-time demographic data and climate information as auxiliary data for the same period as the series of occurrences of the events, map data showing public transportation as auxiliary data independent of time, etc.)
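The example data above can be sketched as a simple record type (illustrative only; the field names and values are assumptions, not mandated by the embodiment), where each event xn=(time, latitude, longitude) optionally carries the mark (gender, age, occupation):

```python
from dataclasses import dataclass

@dataclass
class InfectionEvent:
    # Event x_n = (time, latitude, longitude).
    time: float          # e.g., days since the start of observation
    latitude: float
    longitude: float
    # Optional mark (gender, age, occupation) for the marked point process.
    gender: str = ""
    age: int = 0
    occupation: str = ""

# A prediction data set: events observed in region A* over the past week.
prediction_events = [
    InfectionEvent(time=1.0, latitude=35.68, longitude=139.76,
                   gender="F", age=34, occupation="teacher"),
    InfectionEvent(time=3.5, latitude=35.69, longitude=139.70),
]
```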


The present invention is not limited to the above embodiment specifically disclosed, and various modifications and changes, combinations with known technique, and the like can be made without departing from the scope of the claims.


REFERENCE SIGNS LIST






    • 10 point process learning apparatus


    • 11 input device


    • 12 display device


    • 13 external I/F


    • 13a recording medium


    • 14 communication I/F


    • 15 processor


    • 16 memory device


    • 17 bus


    • 101 selection unit


    • 102 division unit


    • 103 feature extraction unit


    • 104 intensity function estimation unit


    • 105 parameter update unit


    • 106 prediction unit


    • 110 storage unit




Claims
  • 1. A point process learning method executed by a computer, the point process learning method comprising: inputting a learning data set including at least first event data representing a series of occurrences of first events; dividing the first event data included in the learning data set by using a prediction time observation area including at least a time series when predicting future event occurrence to obtain a divided learning data set; and learning a model parameter including a parameter of an intensity function of a predetermined point process model by using the divided learning data set.
  • 2. The point process learning method according to claim 1, further comprising: inputting a prediction data set including at least second event data representing a series of occurrences of second events to be predicted, and predicting a series of occurrences of second events in a prediction target area that is an area later than the prediction time observation area by the point process model by using the prediction data set and the learned model parameter.
  • 3. The point process learning method according to claim 2, wherein the learning data set and the prediction data set include one or more pieces of auxiliary data that is auxiliary information other than an event occurrence series.
  • 4. The point process learning method according to claim 2, wherein a mark for each of the first events and the second events is added to the first event data and the second event data.
  • 5. The point process learning method according to claim 1, wherein the point process model includes a function implemented by one or more neural networks and the intensity function, and the learning includes learning model parameters including a parameter of the neural network and a parameter of the intensity function.
  • 6. The point process learning method according to claim 1, wherein the dividing of the first event data includes: creating a learning observation area in which a time series included in the prediction time observation area is changed; and dividing the first event data included in the learning data set into a first data group corresponding to the learning observation area, a second data group corresponding to a time series later than the time series included in the learning observation area, and a remaining data group.
  • 7. A point process learning apparatus comprising: a memory; and a processor coupled to the memory and configured to input a learning data set including at least first event data representing a series of occurrences of first events; divide the first event data included in the learning data set by using a prediction time observation area including at least a time series when predicting future event occurrence to obtain a divided learning data set; and learn a model parameter including a parameter of an intensity function of a predetermined point process model by using the divided learning data set.
  • 8. A non-transitory computer-readable recording medium storing a program for causing a computer to execute the point process learning method according to claim 1.
PCT Information
Filing Document Filing Date Country Kind
PCT/JP2020/045033 12/3/2020 WO