The present invention relates to a learning device, a predicting device, a learning method, a predicting method, and a program.
Techniques for predicting events such as equipment failures, human behavior, crimes, earthquakes, and infectious diseases using point processes are being researched. It is known that prediction by a point process is performed by learning using previous data of the sequence to be predicted and calculating an intensity function which indicates the likelihood of events occurring in a future time period.
In addition, methods using meta-learning are being studied to save the trouble of learning for each sequence. For example, NPL 1 discloses a meta-learning technique based on model-agnostic meta-learning (MAML).
The technique in the related art has a problem that it is difficult to appropriately ascertain a relationship of previous events with a small amount of calculation in meta-learning for prediction by a point process.
An object of the disclosed technique is to appropriately ascertain a relationship of previous events with a small amount of calculation in meta-learning for prediction by a point process.
The disclosed technique is a learning device for predicting an occurrence of an event which includes: a dividing unit which divides a support set extracted from a set of previous data for learning into a plurality of sections; a latent expression extracting unit which outputs a first latent vector based on each of the plurality of divided sections and outputs a second latent vector based on each of the output first latent vectors; and an intensity function derivation unit which outputs an intensity function indicating a likelihood of an event occurring based on the second latent vector.
In meta-learning for prediction by a point process, a relationship of previous events can be ascertained appropriately with a small amount of calculation.
An embodiment of the present invention (present embodiment) will be described below with reference to the drawings. The embodiment described below is merely an example and embodiments to which the present invention is applied are not limited to the following embodiment.
A learning device 1 according to the present embodiment is a device which performs meta-learning for predicting an occurrence of an event by a point process. An event

$t \in \mathbb{R}$ [Math. 1]

represents a time at which the event occurred, and the observation start time of the sequence is set to 0.
$E = (\{t_i\}_{i=1}^{I},\ t_e)$ [Math. 2]

is a sequence of I events. Here, te is an observation end time. The number of events may differ depending on the sequence.

$D = \{E_j\}_{j=1}^{J}$ [Math. 3]
is a data set of J sequences. For prediction, the observation period is Ts*=[0, ts*], the prediction period is Tq*=(ts*, tq*], and the prediction target sequence is E*. At this time, any event ti included in E* satisfies 0≤ti≤ts*. The goal of prediction is to obtain an intensity function λ(t) (ts*<t≤tq*) that indicates the likelihood of an event occurring during the prediction period Tq* of the prediction target sequence E*.
The extracting unit 11 randomly selects a sequence Ej (hereinafter also referred to as E by omitting j) from a data set D, which is a set of previous data for learning.
Subsequently, the extracting unit 11 determines ts and tq (0<ts<tq≤te). The determination method may be random, or ts* and tq* assumed at the time of prediction may be used. Also, the extracting unit 11 extracts, from the sequence E, a support set Es={ti|0≤ti≤ts} and a query set Eq={ti|ts<ti≤tq}. Note that the extracting unit 11 may extract the query set Eq from {ti|0≤ti≤tq}.
The dividing unit 12 divides the support set Es into a plurality of sections based on defined rules. Examples of dividing methods include specified time intervals (for example, [0,ts/3), [ts/3,2ts/3), [2ts/3,ts]), and equalizing the expected value of the number of events included in each section expressed as follows:
$\mathbb{E}_{j,k}\left[\,\left|E_{s_k}^{j}\right|\,\right]$ [Math. 4]
Hereinafter, the dividing unit 12 divides the support set Es into K sections, and the sequence of events included in the kth section is defined as Esk.
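As a non-limiting illustration of the extraction and division described above, the following Python sketch extracts a support set and a query set and divides the support set into K sections by equal time intervals; the function names and the array-based representation of a sequence are assumptions made only for this illustration.

```python
import numpy as np

def extract_support_query(event_times, t_s, t_q):
    """Split one sequence into a support set Es (events in [0, t_s])
    and a query set Eq (events in (t_s, t_q])."""
    event_times = np.asarray(event_times, dtype=float)
    support = event_times[event_times <= t_s]
    query = event_times[(event_times > t_s) & (event_times <= t_q)]
    return support, query

def divide_support(support, t_s, K):
    """Divide the support set into K sections of equal time length."""
    edges = np.linspace(0.0, t_s, K + 1)
    sections = []
    for k in range(K):
        lo, hi = edges[k], edges[k + 1]
        # all sections are half-open except the last, which includes t_s
        mask = (support >= lo) & ((support < hi) if k < K - 1 else (support <= hi))
        sections.append(support[mask])
    return sections

# Example: t_s = 9.0 and K = 3 give sections [0, 3), [3, 6), and [6, 9].
support, query = extract_support_query([0.5, 2.1, 3.3, 7.8, 8.9, 10.2], t_s=9.0, t_q=12.0)
sections = divide_support(support, t_s=9.0, K=3)
```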
The latent expression extracting unit 13 inputs each of the divided support sets

$\{E_{s_k}\}_{k=1}^{K}$ [Math. 5]

to an NN1 corresponding to each of the sections to obtain the latent vector

$\{z_k\}_{k=1}^{K}$ [Math. 6]

(first latent vector). NN1 is a model (first model) that can handle variable-length inputs, such as Deepset, Transformer, or RNN.
The latent expression extracting unit 13 also inputs a latent vector zk of each section output from each NN1 to NN2 to obtain a latent vector z (second latent vector). NN2 (second model) can be any neural network if K is constant, and a neural network that can handle variable-length inputs if K is variable.
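As a non-limiting sketch of the latent expression extracting unit 13, the following PyTorch-style code assumes a Deepset-like NN1 for each section and a simple feed-forward NN2 for a fixed K; all class names, layer sizes, and the use of mean pooling are assumptions for illustration, not requirements of the embodiment.

```python
import torch
import torch.nn as nn

class SectionEncoder(nn.Module):
    """NN1: encodes one variable-length section of event times into a latent vector zk
    (Deepset-style: a per-event MLP followed by permutation-invariant mean pooling)."""
    def __init__(self, hidden_dim=32, latent_dim=16):
        super().__init__()
        self.latent_dim = latent_dim
        self.phi = nn.Sequential(nn.Linear(1, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, latent_dim))

    def forward(self, section_times):          # section_times: tensor of shape (num_events,)
        if section_times.numel() == 0:         # an empty section yields a zero vector
            return torch.zeros(self.latent_dim)
        h = self.phi(section_times.unsqueeze(-1))   # (num_events, latent_dim)
        return h.mean(dim=0)                        # zk

class Aggregator(nn.Module):
    """NN2: combines the K section vectors {zk} into one latent vector z
    (a plain MLP, assuming K is fixed)."""
    def __init__(self, K=3, latent_dim=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(K * latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, z_list):                 # z_list: list of K tensors of shape (latent_dim,)
        return self.mlp(torch.cat(z_list, dim=-1))
```

Since an NN1 corresponds to each of the sections, one SectionEncoder instance may be created per section, and the resulting K vectors are passed to the Aggregator.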
The intensity function derivation unit 14 inputs a latent vector z and a time t to NN3 to obtain an intensity function λ(t). NN3 (third model) is a neural network whose output is a positive scalar value.
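A non-limiting sketch of NN3 follows, constraining the output to a positive scalar with a softplus activation; the layer sizes and the choice of softplus are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntensityNet(nn.Module):
    """NN3: maps the latent vector z and a time t to a positive scalar intensity lambda(t)."""
    def __init__(self, latent_dim=16, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + 1, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))

    def forward(self, z, t):                   # z: (latent_dim,), t: scalar tensor
        x = torch.cat([z, t.reshape(1)], dim=-1)
        return F.softplus(self.net(x)).squeeze(-1)   # softplus keeps the output positive
```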
The parameter updating unit 15 calculates a negative log-likelihood from the intensity function λ(t) and the query set Eq, and updates the parameters of the models (NN1, NN2, and NN3) of the latent expression extracting unit 13 and the intensity function derivation unit 14 using an error backpropagation method or the like.
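The negative log-likelihood of a temporal point process over (ts, tq] is commonly written as −Σ_{ti∈Eq} log λ(ti) + ∫_{ts}^{tq} λ(t) dt. The following sketch computes this quantity with a Monte Carlo approximation of the integral; the function signature and the number of samples are assumptions for illustration.

```python
import torch

def negative_log_likelihood(intensity_net, z, query_times, t_s, t_q, num_mc=100):
    """Point-process NLL: -sum_i log lambda(t_i) + integral of lambda over (t_s, t_q],
    with the integral approximated by Monte Carlo sampling."""
    log_term = sum(torch.log(intensity_net(z, t)) for t in query_times)
    mc_times = t_s + (t_q - t_s) * torch.rand(num_mc)
    integral = (t_q - t_s) * torch.stack([intensity_net(z, t) for t in mc_times]).mean()
    return -log_term + integral
```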
The learning device 1 performs learning processing according to a user's operation or a predefined schedule. The extracting unit 11 randomly selects a sequence Ej from the data set D (Step S101). Also, the extracting unit 11 determines ts, tq(0<ts<tq≤te) (Step S102). Subsequently, the extracting unit 11 extracts the support set Es and the query set Eq from the sequence E (Step S103).
The dividing unit 12 divides the support set Es into a plurality of (K) sections (Step S104). The latent expression extracting unit 13 inputs each divided section Esk to NN1 corresponding to each of the sections to obtain a latent vector zk (Step S105). Furthermore, the latent expression extracting unit 13 inputs each latent vector zk to NN2 to obtain a latent vector z (Step S106).
Subsequently, the intensity function derivation unit 14 inputs the latent vector z and the time t to NN3 to obtain the intensity function λ(t) (Step S107). The parameter updating unit 15 updates the parameters of each model (Step S108).
The learning device 1 determines whether an end condition is satisfied as a result of updating the parameters (Step S109). The end condition is, for example, a condition in which a difference between values before and after updating is less than a predetermined threshold or a condition that the number of updates reaches a predetermined number.
When determining that the end condition is not satisfied (Step S109: No), the learning device 1 returns to Step S101. Alternatively, when determining that the end condition is satisfied (Step S109: Yes), the learning device 1 ends the learning process.
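Putting the foregoing steps together, the following non-limiting sketch runs one iteration of the learning process (Steps S101 to S108) using the illustrative components sketched above; the data layout, the fixed choice of ts and tq, and the optimizer are assumptions for illustration.

```python
import random
import torch

# Assumed illustrative components from the sketches above:
# extract_support_query, divide_support, SectionEncoder (NN1),
# Aggregator (NN2), IntensityNet (NN3), negative_log_likelihood.

K = 3
nn1_list = [SectionEncoder() for _ in range(K)]   # one NN1 per section
nn2 = Aggregator(K=K)
nn3 = IntensityNet()
params = [p for m in nn1_list + [nn2, nn3] for p in m.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

def training_step(dataset):
    """dataset: list of (event_times, t_e) pairs."""
    event_times, t_e = random.choice(dataset)                        # Step S101
    t_s, t_q = 0.6 * t_e, t_e                                        # Step S102 (one possible choice)
    support, query = extract_support_query(event_times, t_s, t_q)    # Step S103
    sections = divide_support(support, t_s, K)                       # Step S104
    z_list = [nn1_list[k](torch.tensor(sections[k], dtype=torch.float32))
              for k in range(K)]                                     # Step S105
    z = nn2(z_list)                                                  # Step S106
    loss = negative_log_likelihood(                                  # Steps S107-S108
        nn3, z, torch.tensor(query, dtype=torch.float32), t_s, t_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```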
Also, a predicting device 2 according to the present embodiment is a device configured to predict an occurrence of an event by a point process using models of NN1, NN2, and NN3 whose parameters have been updated by the learning device 1.
The dividing unit 21 regards the prediction sequence E* as Es* and divides Es* into a plurality of sections Esk* as in the dividing unit 12 of the learning device 1.
As in the latent expression extracting unit 13 of the learning device 1, the latent expression extracting unit 22 inputs each of the divided support sets to NN1 (first model) corresponding to each of the sections to obtain the latent vector zk* (first latent vector). Also, the latent expression extracting unit 22 inputs the latent vector zk* of each section output from each NN1 to NN2 (second model) to obtain a latent vector z* (second latent vector).
As in the intensity function derivation unit 14 of the learning device 1, the intensity function derivation unit 23 inputs the latent vector z* and a time t to NN3 (third model) to obtain the intensity function λ(t).
The predicting unit 24 predicts a situation of occurrences of events during the prediction period Tq* using the intensity function λ(t).
The predicting device 2 may generate an event by simulation and output a prediction result (Y. Ogata, “On Lewis' simulation method for point processes”, IEEE Transactions on Information Theory, Volume 27, Issue 1, January 1981, pp. 23 to 31).
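As a non-limiting sketch of generating events by the thinning-based simulation referred to above (Ogata's method), the following code assumes that an upper bound lam_max on the intensity over the prediction period is available; the function name and arguments are assumptions for illustration.

```python
import torch

def simulate_events(intensity_net, z, t_s, t_q, lam_max, max_events=1000):
    """Thinning simulation: propose candidate times from a homogeneous Poisson
    process with rate lam_max, accept each with probability lambda(t)/lam_max."""
    events, t = [], t_s
    for _ in range(max_events):
        t = t + torch.distributions.Exponential(lam_max).sample().item()
        if t > t_q:
            break
        lam = intensity_net(z, torch.tensor(t, dtype=torch.float32)).item()
        if torch.rand(1).item() <= lam / lam_max:
            events.append(t)
    return events
```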
The dividing unit 21 of the predicting device 2 regards the prediction sequence E* as Es* (Step S201). Also, the dividing unit 21 determines ts* and tq* (Step S202). Subsequently, the dividing unit 21 divides the support set Es* into a plurality of sections (Step S203).
The latent expression extracting unit 22 inputs each divided section Esk* to NN1 to obtain a latent vector zk* (Step S204). Furthermore, the latent expression extracting unit 22 inputs each latent vector zk* to NN2 to obtain a latent vector z* (Step S205).
Subsequently, the intensity function derivation unit 23 inputs the latent vector z* and each time t within the prediction period Tq* to NN3 to obtain the intensity function λ(t) (Step S206).
Here, in the method in the related art in which the support set is input to NN1 without being divided, when NN1 is, for example, a Deepset, there is a problem that a relationship of previous events cannot be ascertained. Furthermore, when NN1 is a Transformer, there is a problem that the amount of calculation is proportional to the square of the number of previous events and therefore becomes enormous. In addition, when NN1 is an RNN, although a relationship of adjacent events can be ascertained, there is a problem that it is difficult to ascertain a relationship of distant events. Furthermore, when NN1 is a Transformer or an RNN, the input is assumed to be time-series data at regular intervals. Thus, there is a problem that it is difficult to ascertain characteristics which are required to be ascertained, such as the sparseness or density of previous data and the input for each occurrence of an event.
According to the learning device 1 or the predicting device 2 relating to the present embodiment, the average length of a sequence to be processed by NN1 is 1/K of that in the method in the related art in which the support set is input to NN1 without being divided, so that the amount of calculation can be reduced.
Furthermore, the learning device 1 or the predicting device 2 can perform parallel distributed processing for each section. In this respect, for example, when NN1 is an RNN, it is necessary to perform sequential processing in the method in the related art.
Furthermore, the learning device 1 or the predicting device 2 can ascertain a sequential relationship of events according to the sections in which the respective events are included. In this respect, when NN1 is, for example, a Deepset, there is a problem that a relationship of previous events cannot be ascertained.
Furthermore, the learning device 1 or the predicting device 2 can directly ascertain whether occurrence intervals of events are sparse or dense for each section.
Marks or additional information may be added to the event data. For example, let the event data be (t,m). m is a mark or additional information. In this case, the learning device 1 or the predicting device 2 may perform learning processing and prediction processing using a neural network NN4 suitable for marks or additional information before NN1, as follows:
$\mathrm{NN1}\left(\left\{\,\left[\,t_i,\ \mathrm{NN4}(m_i)\,\right]\,\right\}_{(t_i,\,m_i)\in E_{s_k}}\right)$
Here, [ ] is a symbol indicating concatenation.
Furthermore, additional information ‘a’ may be added to the sequence. In this case, the learning device 1 or the predicting device 2 may perform learning processing or prediction processing using neural networks (NN5, NN6) suitable for additional information before NN3. That is to say, the learning device 1 or the predicting device 2 inputs the latent vector z′ obtained by the following expression to NN3.
$z' = \mathrm{NN6}\left(\left[\,z,\ \mathrm{NN5}(a)\,\right]\right)$
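A non-limiting sketch of handling a mark m with NN4 before NN1, and sequence additional information a with NN5 and NN6 before NN3, is given below under the same illustrative assumptions as above; the mark dimensionality, embedding sizes, and layer choices are assumptions. Note that with marks, the per-event input dimensionality of NN1 becomes 1 plus the size of NN4's output.

```python
import torch
import torch.nn as nn

nn4 = nn.Sequential(nn.Linear(4, 8), nn.ReLU())        # NN4: embeds a mark m_i (here 4-dimensional)
nn5 = nn.Sequential(nn.Linear(2, 8), nn.ReLU())        # NN5: embeds sequence additional information a
nn6 = nn.Sequential(nn.Linear(16 + 8, 16), nn.ReLU())  # NN6: fuses z with NN5(a)

def event_input(t_i, m_i):
    """Concatenate the event time with NN4(m_i); the result is the per-event input to NN1."""
    return torch.cat([t_i.reshape(1), nn4(m_i)], dim=-1)

def fuse_sequence_info(z, a):
    """z' = NN6([z, NN5(a)]); z' is input to NN3 in place of z."""
    return nn6(torch.cat([z, nn5(a)], dim=-1))
```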
Also, although the dimensionality of the event is one in the present embodiment, it may be extended to any dimensionality (for example, three dimensions in time and space).
The learning device 1 and the predicting device 2 can be implemented, for example, by causing a computer to execute a program describing the processing details described in the present embodiment. Note that the “computer” may be a physical machine or a virtual machine in the cloud. When using a virtual machine, the “hardware” described herein is virtual hardware.
The above program can be recorded in a computer-readable recording medium (portable memory or the like), saved, or distributed. It is also possible to provide the above program through a network such as the Internet or e-mail.
A program for realizing processing by the computer is provided by, for example, a recording medium 1001 such as a CD-ROM or a memory card. When a recording medium 1001 storing a program is set in the drive device 1000, the program is installed from the recording medium 1001 to the auxiliary storage device 1002 via the drive device 1000. Here, the program need not necessarily be installed from the recording medium 1001 and may be downloaded from another computer via the network. The auxiliary storage device 1002 stores installed programs as well as necessary files and data.
The memory device 1003 reads the program from the auxiliary storage device 1002 and stores it when a program activation instruction is received. The CPU 1004 implements functions relating to the device according to programs stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network. The display device 1006 displays a graphical user interface (GUI) or the like by a program. The input device 1007 is composed of a keyboard, a mouse, buttons, a touch panel, or the like and is used for inputting various operational instructions. The output device 1008 outputs the calculation result. Note that the computer may include a graphics processing unit (GPU) or a tensor processing unit (TPU) instead of the CPU 1004 or may include a GPU or TPU in addition to the CPU 1004. In that case, the processing may be divided and executed such that the GPU or the TPU executes processing which requires special computation, such as a neural network, and the CPU 1004 executes other processing.
As an example of the present embodiment, for example, it is possible to predict an occurrence of a user's future purchasing behavior on an electronic commerce (EC) site as an event. In this case, each sequence corresponds to one user's information, and the mark or additional information which can be added to an event may be product information, a payment method, or the like relating to the purchasing behavior of that user. Furthermore, the additional information of the sequence may be attributes such as the user's gender and age.
In this case, as Example 1, the learning data may be an existing user event sequence of an EC site and the prediction data may be a new user's sequence for one week. Also, as Example 2, the learning data may be an event sequence of each user at various EC sites and the prediction data may be an event sequence of users at another EC site.
The examples described above are merely examples and the learning device 1 and the predicting device 2 according to the present embodiment can be used to predict occurrences of various events.
This specification describes at least a learning device, a predicting device, a learning method, a predicting method, and a program described in each of the following items.
A learning device for predicting an occurrence of an event, including:
The learning device according to Item 1, further including:
The learning device according to Item 1 or 2, wherein the latent expression extracting unit outputs the first latent vector based on each of the plurality of divided sections by parallel distributed processing.
A predicting device for predicting an occurrence of an event, including:
The predicting device according to Item 4, further comprising:
A learning method performed by a learning device including:
A predicting method performed by a predicting device including:
A program causing a computer to function as each unit in the learning device according to any one of Items 1 to 3 or a program causing a computer to function as each unit in the predicting device according to Item 4 or 5.
Although the present embodiment has been described above, the present invention is not limited to such a specific embodiment and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2021/017568 | 5/7/2021 | WO |