The present invention relates to a data prediction apparatus, a method, and a program for predicting data for a prediction target time.
High-dimensional array data can be represented by using a tensor.
Here, the high-dimensional array data refers to data with values for a plurality of indexes. Now, it is assumed that n pieces of R-dimensional array data
(i1,i2, . . . ,iR,yi)
are given.
Such data can be represented by an Rth-order tensor:
[Math. 1]

Xi1,i2, . . . ,iR = yi  (1)
Tensor factorization such as CP decomposition or Tucker decomposition is used to analyze data represented by a tensor (Non Patent Literature 1).
In tensor factorization, a data tensor is decomposed into the form of a product of a plurality of matrices, and therefore a low-dimensional representation of the data is obtained. These matrices are called "factor matrices" and represent latent patterns corresponding to each dimension of the tensor. If the tensor contains missing values, the factor matrices are first estimated by using only the non-missing data. At the time of prediction, missing values are complemented by multiplying the learned factor matrices together to restore the original values. However, these methods have the problem that external information that affects the data to be predicted cannot be taken into account. Thus, a tensor simultaneous factorization method (Non Patent Literature 2) has been proposed. This is a technique for simultaneously decomposing a plurality of tensors corresponding to a plurality of types of data.
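The factorization and missing-value completion described above can be sketched as follows. This is a minimal illustrative example, not the method of the embodiment itself; all sizes, names, and values are assumptions.

```python
import numpy as np

# Illustrative sketch: a rank-K third-order tensor is a sum of K outer
# products of factor-matrix columns; a "missing" entry can then be
# complemented by reading the low-rank reconstruction.
rng = np.random.default_rng(0)
K, I1, I2, I3 = 2, 4, 5, 6
U = rng.random((I1, K))  # mode-1 factor matrix
V = rng.random((I2, K))  # mode-2 factor matrix
W = rng.random((I3, K))  # mode-3 factor matrix

# CP form: X[i, j, l] = sum_k U[i, k] * V[j, k] * W[l, k]
X = np.einsum('ik,jk,lk->ijl', U, V, W)

# Complement a "missing" value at (1, 2, 3) from the factors alone:
x_hat = sum(U[1, k] * V[2, k] * W[3, k] for k in range(K))
```

Here the reconstruction reproduces the entry exactly because the tensor is exactly rank K; with real data the factors are fit to the observed entries and the reconstruction is an approximation.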
As a result, it is possible to make a prediction while taking into account the influence of external factors.
However, the method described in Non Patent Literature 2 considers all external information equally, and it is not possible to select information.
Thus, there is a problem that the prediction accuracy is reduced when auxiliary information not related to the data to be predicted is used.
As described above, in the method of the related art, it is not possible to separate the external information that affects the target data and the external information that does not affect the target data.
For this reason, there is a problem that the prediction accuracy is reduced when external information without attributes common to the data to be predicted is included.
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a data prediction apparatus, a method, and a program capable of accurately predicting data for a prediction target time.
In order to achieve the object described above, a data prediction apparatus according to an embodiment of the present invention is configured to include: an operation unit that receives high-dimensional array data representing data at each time, and external information data, which is a tensor or a matrix representing external information and which is correlated with the high-dimensional array data; a parameter estimation unit that decomposes the high-dimensional array data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank in tensor factorization and decomposes the external information data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank and a plurality of factor matrices including a factor matrix common to the high-dimensional array data, under a sparse constraint of the weighting parameters for each rank; and a prediction unit that predicts the data for a prediction target time based on the weighting parameters for each rank and the plurality of factor matrices for each rank, obtained for the high-dimensional array data by the parameter estimation unit.
In addition, a data prediction method according to the present invention is configured to include: receiving, by an operation unit, high-dimensional array data representing data at each time, and external information data which is a tensor or a matrix that represents external information at each time and which is correlated with the high-dimensional array data; decomposing, by a parameter estimation unit, the high-dimensional array data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank in tensor factorization and decomposing the external information data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank and a plurality of factor matrices including a factor matrix common to the high-dimensional array data, under a sparse constraint of the weighting parameters for each rank; and predicting, by a prediction unit, the data for a prediction target time based on the weighting parameters for each rank and the plurality of factor matrices for each rank, obtained for the high-dimensional array data by the parameter estimation unit.
Further, a program of the present invention is a program for causing a computer to function as each unit of the above-described data prediction apparatus.
As described above, the data prediction apparatus, method, and program of the present invention exhibit an effect of being capable of accurately predicting data for a prediction target time by decomposing the high-dimensional array data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank in tensor factorization and decomposing the external information data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank and a plurality of factor matrices including a factor matrix common to the high-dimensional array data, under a sparse constraint of the weighting parameters for each rank.
Hereinafter, an embodiment of the present invention will be described in detail with reference to drawings.
In the embodiment of the present invention, selection of external information is performed by imposing a sparse constraint during simultaneous factorization of tensors. With the tensor simultaneous factorization technique, a tensor (data tensor) representing data and a tensor (or matrices) representing external information are simultaneously decomposed while sharing a factor matrix, and therefore an indirect relationship between the data and the external information can be captured. At that time, the data tensor is approximated as a product of a plurality of factor matrices. In the embodiment of the present invention, weighting parameters corresponding to each factor matrix are introduced, and the data tensor is approximated as a product of each factor matrix and its weighting parameter. By imposing a sparse constraint such as an L1 norm on the weighting parameters, unnecessary parameters can be shrunk to exactly 0, and the data tensor can be reconstructed at the time of prediction with reference to only some of the factor matrices.
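The effect of an L1-type sparse constraint can be illustrated with its proximal operator, soft-thresholding. This is a general sparse-modeling sketch rather than the update rule of the embodiment; the threshold and weight values are made-up examples.

```python
import numpy as np

# Soft-thresholding: the proximal operator of the L1 norm. Weighting
# parameters whose magnitude is below the threshold become exactly 0,
# so the corresponding factor matrices are ignored at prediction time.
def soft_threshold(b, thresh):
    return np.sign(b) * np.maximum(np.abs(b) - thresh, 0.0)

b = np.array([0.9, 0.05, -0.4, 0.02])  # per-rank weighting parameters
b_sparse = soft_threshold(b, 0.1)       # -> [0.8, 0.0, -0.3, 0.0]
```

Note that the small weights are set to exactly 0 rather than merely made small, which is what allows the model to select factor matrices.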
Configuration of Data Prediction Apparatus According to the Embodiment of Present Invention
Next, the configuration of the data prediction apparatus according to the embodiment of the present invention will be described. As illustrated in
The operation unit 10 receives various operations from a user on data stored in a high-dimensional array data storage device 12 and an external information storage device 14 described below. The various operations include operations for registering, correcting, and deleting information stored in the high-dimensional array data storage device 12 and the external information storage device 14.
The input means of the operation unit 10 may be anything such as a keyboard, a mouse, a menu screen, or a touch panel. The operation unit 10 can be realized by a device driver of an input unit such as a mouse, or control software for a menu screen.
The search unit 20 receives information on time (week, day, time) that is a prediction target and the location (mesh area). The input means of the search unit 20 may be anything, such as a keyboard, a mouse, a menu screen, or a touch panel. The search unit 20 can be realized by a device driver of an input unit such as a mouse or control software for a menu screen.
The high-dimensional array data storage device 12 stores history information of high-dimensional array data that can be analyzed by the apparatus, and reads the history information of the high-dimensional array data according to a request from the apparatus and transmits the information to the data prediction apparatus 100. The high-dimensional array data is, for example, the transition of population in an arbitrary mesh area in a geographic space, and is composed of a set
{(ti, yi)}i=1…N
of time ti and the number of people yi. Here, N is the number of pieces of data. If the week, the day of the week, and the time slot corresponding to the time ti are set to i1, i2, and i3, respectively, the population transition can be rewritten as a tuple series
{(i1, i2, i3, yi)}i=1…N
including four components (see
Such data is represented by a third-order tensor composed of three axes of week i1, day i2, and time slot i3.
It is assumed that the tensor in the j-th mesh area is X(j). Each component of X(j) corresponds to:

X(j)i1,i2,i3 = yi
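Packing the tuple series into such a third-order tensor might look like the following sketch; the sizes and tuples are made-up illustrative values.

```python
import numpy as np

# Illustrative sketch: pack the tuple series {(i1, i2, i3, y_i)} into
# a third-order tensor X(j) with axes week x day-of-week x time slot.
n_weeks, n_days, n_slots = 4, 7, 24
X_j = np.zeros((n_weeks, n_days, n_slots))

# (week index, day index, slot index, population count)
tuples = [(0, 1, 9, 120.0), (0, 1, 10, 135.0), (2, 6, 18, 80.0)]
for i1, i2, i3, y in tuples:
    X_j[i1, i2, i3] = y  # X(j)[i1, i2, i3] = y_i
```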
The high-dimensional array data storage device 12 is a Web server that stores Web pages, a database server that has a database, or the like.
The external information storage device 14 stores external information that can be analyzed by the apparatus, reads out the external information according to a request from the apparatus, and transmits the information to the data prediction apparatus 100. The external information is data related to an external factor affecting the high-dimensional array data, and is, for example, a set

{X(j′)}j′∈N(j)

of population data in nearby mesh areas (see

Here, N(j) is a set of mesh areas adjacent to the j-th area. Such data is represented by a fourth-order tensor W composed of four axes obtained by adding the index j′ of the mesh area to the week i1, the day of the week i2, and the time slot i3.
The external information storage device 14 is a Web server that stores Web pages, a database server that has a database, or the like.
The parameter estimation unit 16 extracts a low-dimensional expression of the information on the basis of the information stored in the high-dimensional array data storage device 12 and the external information storage device 14, and estimates progression over time. The procedure will be described by using the above example. The procedure considers applying tensor factorization to a tensor representing history information of high-dimensional array data. Tensor factorization is a method of approximating a tensor by a product of factor matrices. The goal of the present embodiment is to find a set of factor matrices that reproduces the original tensor well. The data tensor X(j) of the high-dimensional array data is decomposed as follows.

[Math. 2]

X(j) ≈ Σk=1K bk vk(1) ◯ vk(2) ◯ vk(3)  (2)

Here, vk(1), vk(2), vk(3) are factor matrices, b = {bk}k=1K is a weighting parameter for each factor, and K is the number of ranks of the tensor factorization, which is manually given on the basis of prior knowledge or determined by cross validation. ◯ represents the outer product of vectors. Similarly, the tensor W representing external information is decomposed as follows.

[Math. 3]

W ≈ Σk=1K ak vk(1) ◯ vk(2) ◯ vk(3) ◯ vk(4)  (3)
Here, a = {ak}k=1K is a weighting parameter for each factor, and vk(1), vk(2), vk(3), vk(4) are factor matrices. The factor matrices of the tensor X(j) of high-dimensional array data and the factor matrices of the tensor W of external information are shared. This enables tensor factorization in consideration of external information.
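A minimal sketch of the two weighted CP forms with shared factor matrices may help; the data and sizes below are random illustrative assumptions, and V1 to V4, a, and b mirror the notation above.

```python
import numpy as np

# Illustrative sketch of the shared-factor decomposition: the data
# tensor uses weights b and factors v(1..3); the external tensor
# reuses the SAME v(1..3) plus v(4) for the mesh-area axis, with
# its own weights a.
rng = np.random.default_rng(1)
K, I1, I2, I3, J = 3, 4, 7, 24, 5
V1, V2, V3 = (rng.random((d, K)) for d in (I1, I2, I3))
V4 = rng.random((J, K))
b = rng.random(K)  # per-rank weights for the data tensor
a = rng.random(K)  # per-rank weights for the external information

# X_hat[i,j,l]   = sum_k b_k V1[i,k] V2[j,k] V3[l,k]
# W_hat[i,j,l,m] = sum_k a_k V1[i,k] V2[j,k] V3[l,k] V4[m,k]
X_hat = np.einsum('k,ik,jk,lk->ijl', b, V1, V2, V3)
W_hat = np.einsum('k,ik,jk,lk,mk->ijlm', a, V1, V2, V3, V4)
```

Because V1, V2, V3 appear in both reconstructions, fitting W_hat to external information constrains the same latent patterns used to reconstruct the data tensor.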
In order to select factor matrices, a sparse constraint is imposed on the weighting parameters a, b. Following typical sparse modeling procedures, regularization terms ψ(a), ψ(b) for a, b are introduced into a likelihood function L.

[Math. 4]

L = D(X(j) | X̂(j)) + D(W | Ŵ) + λψ(a) + λψ(b)  (4)

Here, X̂(j) and Ŵ denote the reconstructions of equations (2) and (3), and λ is a hyperparameter that controls the effect of a regularization term.
Although the forms of the regularization terms ψ(a), ψ(b) are not limited, the present embodiment introduces the least absolute shrinkage and selection operator (LASSO) that is generally used when selecting features of a regression problem.

[Math. 5]

ψ(a) = ∥a∥1, ψ(b) = ∥b∥1  (7)
This is a constraint that works in the direction of setting some elements of the vectors a, b to 0, and the effect of extracting only those matrices that well explain the target data among the latent matrices shared with the external information can be expected. The likelihood function of this model can be written as:

L = Dβ(X(j) | X̂(j)) + Dβ(W | Ŵ) + λ∥a∥1 + λ∥b∥1

Here, Dβ(x | y) is an arbitrary distance measure representing the distance between x and y, and is defined by the sum of the divergence for each element. dβ(x | y) is the divergence between x and y, and is defined by the following equation for β ∉ {0, 1}:

dβ(x | y) = (x^β + (β − 1)y^β − βxy^(β−1)) / (β(β − 1))

The β divergence includes Euclidean distance (β = 2) and KL divergence (β = 1), which are generally used in tensor factorization, as special cases. The following discussion holds for any value of β. The goal of the present embodiment is to estimate the set {v(1), v(2), v(3), v(4)} of factor matrices and the weighting parameters a, b that minimize the value of the likelihood function L. For optimizing the parameters, for example, the alternating direction method of multipliers (ADMM) (Non Patent Literature 3) can be used.
Non Patent Literature 3 Huang, Kejun, Nicholas D. Sidiropoulos, and Athanasios P. Liavas. “A flexible and efficient algorithmic framework for constrained matrix and tensor factorization”. IEEE Transactions on Signal Processing 64. 19 (2016): 5052-5065.
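The element-wise β divergence discussed above can be sketched as follows; the function name and test values are illustrative, and the β = 0 (Itakura-Saito) branch is included only for completeness.

```python
import numpy as np

# Element-wise beta-divergence, summed over elements. The generic
# formula applies for beta not in {0, 1}; beta = 1 gives KL divergence
# and beta = 0 gives the Itakura-Saito divergence.
def beta_div(x, y, beta):
    x, y = np.asarray(x, float), np.asarray(y, float)
    if beta == 1:  # KL divergence
        return np.sum(x * np.log(x / y) - x + y)
    if beta == 0:  # Itakura-Saito divergence
        return np.sum(x / y - np.log(x / y) - 1)
    return np.sum((x**beta + (beta - 1) * y**beta
                   - beta * x * y**(beta - 1)) / (beta * (beta - 1)))

x, y = np.array([1.0, 2.0]), np.array([1.5, 1.5])
d2 = beta_div(x, y, 2)  # reduces to half the squared Euclidean distance
```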
In accordance with the ADMM procedure, the parameter optimization problem of the proposed model is rewritten as the following equations.

[Math. 9]

minimize Dβ(X(j) | X̂(j)) + Dβ(W | Ŵ) + λ∥ha∥1 + λ∥hb∥1  (13)

subject to X̂(j) = Σk bk vk(1) ◯ vk(2) ◯ vk(3), Ŵ = Σk ak vk(1) ◯ vk(2) ◯ vk(3) ◯ vk(4), a = ha, b = hb  (14)
The likelihood function can be rewritten as follows.
Here, the α's are Lagrangian multipliers, and ρ is a hyperparameter that controls the step size. Thereafter, the above equations may be alternately optimized for each of the parameter sets {vk(1), vk(2), vk(3), vk(4)}k=1K, a, b, ha, hb, X̂(j), Ŵ, and the Lagrangian multipliers α according to the following equations.
When KL divergence is used as a cost function (β = 1), the update equations for X̂(j) and Ŵ can be written as the following equations. The description of the update equations for the Lagrangian multipliers α is omitted.
As described above, the parameter estimation unit 16 decomposes the high-dimensional array data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank in tensor factorization and decomposes the external information data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank and a plurality of factor matrices including a factor matrix common to the high-dimensional array data, under a sparse constraint of the weighting parameters for each rank. In other words, the parameter estimation unit 16 repeats updating the weighting parameters for each rank and the plurality of factor matrices for each rank for the high-dimensional array data, and the weighting parameters for each rank and the plurality of factor matrices for each rank for the external information data according to the above equations (17) to (23) in order to optimize a distance between the high-dimensional array data and the weighted sum of products of the plurality of factor matrices for each rank using the weighting parameters for each rank, a distance between the external information data and the weighted sum of products of the plurality of factor matrices for each rank using the weighting parameters for each rank, and the value of the likelihood function L of the above equation (4) expressed by using the regularization terms of the weighting parameters for each rank.
The parameter storage unit 18 stores a set of optimal parameters obtained by the parameter estimation unit 16. The parameter storage unit 18 may be anything as long as the set of estimated parameters is stored and can be restored. For example, the set of estimated parameters is stored in a specific area of a database or a general-purpose storage device (memory or hard disk device) provided in advance.
The prediction unit 22 predicts data for a prediction target time and the location on the basis of the information on a prediction target time and the location received by the operation unit 10, and the weighting parameters for each rank for the high-dimensional array data and a plurality of factor matrices for each rank stored in the parameter storage unit 18.
For example, in the case of the above example, the population at the time corresponding to a prediction target time (day i2 of the i1-th week, time slot i3) can be estimated by the following equation.

ŷ = Σk=1K bk vk(1)[i1] vk(2)[i2] vk(3)[i3]  (24)
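This prediction step can be sketched as follows, assuming stored factor matrices and per-rank weights; the values here are random illustrative stand-ins for the learned parameters.

```python
import numpy as np

# Illustrative sketch of the prediction step: reconstruct the
# population for (week i1, day i2, slot i3) from the per-rank weights
# b and the factor matrices learned for the data tensor.
rng = np.random.default_rng(2)
K, I1, I2, I3 = 3, 4, 7, 24
V1, V2, V3 = (rng.random((d, K)) for d in (I1, I2, I3))
b = rng.random(K)  # per-rank weighting parameters (some may be 0)

def predict(i1, i2, i3):
    # y_hat = sum_k b_k * v_k(1)[i1] * v_k(2)[i2] * v_k(3)[i3]
    return float(np.sum(b * V1[i1] * V2[i2] * V3[i3]))

y_hat = predict(1, 3, 9)  # week 1, day 3, time slot 9
```

Ranks whose weight bk was shrunk to 0 by the sparse constraint contribute nothing to this sum, which is how the selected factor matrices alone drive the prediction.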
The output unit 24 outputs the result predicted by the prediction unit 22. Here, the output is a concept including displaying on a display, printing on a printer, sound output, transmission to an external apparatus, and the like. The output unit 24 may or may not include an output device such as a display or a speaker. The output unit 24 can be realized by driver software for an output device, or driver software for an output device and an output device.
Operation of Data Prediction Apparatus According to Embodiment of Present Invention
Next, the operation of the data prediction apparatus 100 according to the embodiment of the present invention will be described.
Learning Processing Routine
First, when history information of the high-dimensional array data is input from the operation unit 10, the data prediction apparatus 100 stores the history information of the high-dimensional array data in the high-dimensional array data storage device 12, and when external information is input by the operation unit 10, stores the external information in the external information storage device 14. Then, the data prediction apparatus 100 executes a learning processing routine illustrated in
First, in step S100, each of the parameter sets {vk(1), vk(2), vk(3), vk(4)}k=1K, a, b, ha, hb, X̂(j), Ŵ, and α is initialized.
In step S102, on the basis of the parameter sets {vk(1), vk(2), vk(3), vk(4)}k=1K, a, b, ha, hb, X̂(j), Ŵ, and α, the weighting parameters a, b, ha, hb are updated according to the above equations (17) to (20).
In step S104, on the basis of the parameter sets {vk(1), vk(2), vk(3), vk(4)}k=1K, a, b, ha, hb, X̂(j), Ŵ, and α, the factor matrices {vk(1), vk(2), vk(3), vk(4)}k=1K are updated according to the above equation (21).
In step S106, on the basis of the parameter sets {vk(1), vk(2), vk(3), vk(4)}k=1K, a, b, ha, hb, X̂(j), Ŵ, and α, the tensors X̂(j) and Ŵ are updated according to the above equations (22) and (23). Also, the Lagrangian multipliers α are updated.
In step S108, it is determined whether a predetermined convergence determination condition is satisfied, and if the convergence determination condition is not satisfied, the process returns to step S102, and on the other hand, if the convergence determination condition is satisfied, the process proceeds to step S110.
As the convergence determination condition, a condition that the amount of change in each estimated parameter is equal to or less than a threshold, or that a predetermined number of repetitions has been reached, may be used.
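One possible way to code such a convergence test is sketched below; the function name, tolerance, and iteration cap are illustrative assumptions, not values prescribed by the embodiment.

```python
import numpy as np

# Illustrative convergence test for step S108: stop when every
# parameter's change is at or below a threshold, or when a
# predetermined number of repetitions has been reached.
def converged(params_old, params_new, it, tol=1e-6, max_iter=500):
    if it >= max_iter:
        return True
    return all(np.max(np.abs(new - old)) <= tol
               for old, new in zip(params_old, params_new))
```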
In step S110, the parameter sets {vk(1), vk(2), vk(3), vk(4)}k=1K, a, b, ha, hb, X̂(j), Ŵ, and α finally updated in steps S102 to S106 are stored in the parameter storage unit 18, and the learning processing routine ends.
Data Prediction Processing Routine
Next, the data prediction processing routine illustrated in
After the learning processing routine is executed and the parameter sets {vk(1), vk(2), vk(3), vk(4)}k=1K, a, b, ha, hb, X̂(j), Ŵ, and α are stored in the parameter storage unit 18, when information on a prediction target time and the location is input, the data prediction apparatus 100 executes a data prediction processing routine illustrated in
In step S120, the operation unit 10 receives information on the prediction target time and the location.
In step S122, the parameters vk(1), vk(2), vk(3), and b for the high-dimensional array data stored in the parameter storage unit 18 are read.
In step S124, on the basis of the parameter sets read in step S122, the population for the week, day of the week, time slot, and location corresponding to the prediction target time is predicted according to the above equation (24).
In step S126, the output unit 24 outputs, as a result, the population for the week, day of the week, time slot, and location corresponding to the prediction target time predicted in step S124, and the data prediction processing routine ends.
As described above, according to the data prediction apparatus according to the embodiment of the present invention, it is possible to select only information that explains the data well from a plurality of types of external information and accurately predict the data for a prediction target time by decomposing the high-dimensional array data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank in tensor factorization and decomposing the external information data into a weighted sum of products of a plurality of factor matrices for each rank using weighting parameters for each rank and a plurality of factor matrices including a factor matrix common to the high-dimensional array data, under a sparse constraint of the weighting parameters for each rank.
The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the present invention.
For example, in the above embodiment, the case where a tensor representing external information is used has been described as an example, but the invention is not limited to the case, and a matrix representing external information may be used.
Further, the above-described data prediction apparatus 100 includes a computer system inside, but the “computer system” includes an environment for providing homepages (or environment for displaying homepages) if a WWW system is used.
In addition, in the specification of the present application, the embodiment in which the program is installed in advance has been described, but the program can be provided by being stored in a computer-readable recording medium, or can be provided via a network.
Number | Date | Country | Kind
---|---|---|---
2018-071497 | Apr 2018 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2019/011877 | 3/20/2019 | WO | 00