The present invention relates to a time-specific area population estimation method, a time-specific area population estimation apparatus, and a program.
Location information on people obtained from the global positioning system (GPS) or the like may, out of privacy considerations, be provided as time-specific area population data from which individuals cannot be tracked. Here, time-specific area population data are information on the number of people in each area at each time step, where the areas are obtained, for example, by dividing a geographic space into a grid. Such data are observed at constant time intervals, but there is a need to estimate the population at times at which no observation is performed.
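To make the data format concrete, the following sketch aggregates position records into counts per time step and square grid cell. It is purely illustrative: the `(timestamp, x, y)` record layout, the function name `area_population`, and the assumption that each record is one person observed once in a time step are hypothetical and not taken from the source.

```python
from collections import Counter
from math import floor


def area_population(records, cell_size, time_step):
    """Count records per (time step, grid cell).

    records: iterable of (timestamp, x, y); each record is assumed to be
    one person observed once in that time step (hypothetical layout).
    Returns a Counter keyed by (step, (col, row)).
    """
    counts = Counter()
    for ts, x, y in records:
        step = floor(ts / time_step)                         # time step index
        cell = (floor(x / cell_size), floor(y / cell_size))  # grid cell index
        counts[(step, cell)] += 1
    return counts


records = [(0, 0.2, 0.3), (10, 0.7, 0.1), (20, 1.5, 0.4), (70, 1.6, 0.2)]
pop = area_population(records, cell_size=1.0, time_step=60)
```

Here `pop[(0, (0, 0))]` is 2, because the first two records fall into the same cell in the same time step; only such counts, not individual trajectories, are retained.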
In the related art, population prediction techniques based on supervised learning (NPL 1), semi-supervised estimation using Wasserstein propagation (NPL 2), and the like have been proposed.
However, there are two problems with the related art.
(1) Schemes based on supervised learning require various types of external information as feature quantities for estimation, and a large amount of training data to train the model.
(2) Existing semi-supervised estimation schemes require a cost function that measures the distance between distributions to be determined manually in advance. This is difficult to do well when data are limited, and when an appropriate cost is not selected, the output solution is likely to differ greatly from reality.
The present invention has been made in view of the above points, and an object of the present invention is to make it possible to efficiently estimate the population at a time at which no observation is performed.
Thus, to solve the above problems, a computer executes a movement probability estimation procedure of estimating time-specific interareal movement probabilities based on the observed time-specific population in each area and a set of candidate areas to which movement from that area is possible in a unit time, and a time-specific area population estimation procedure of estimating the population in the area at a time at which no observation is performed, using a cost function learned in the estimation of the movement probabilities.
A population can be efficiently estimated at a time at which no observation is performed.
Hereinafter, embodiments of the present invention will be described based on the drawings.
A program that implements processing in the time-specific area population estimation apparatus 10 is provided by a recording medium 101 such as a CD-ROM. When the recording medium 101 storing the program is set in the drive device 100, the program is installed in the auxiliary storage device 102 from the recording medium 101 via the drive device 100. However, the program need not be installed from the recording medium 101, and the program may be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program and also stores necessary files, data, and the like.
When an instruction to activate the program is received, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The processor 104 is a CPU, a graphics processing unit (GPU), or both, and executes functions of the time-specific area population estimation apparatus 10 according to the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network.
The operation unit 11 is an interface for operation from the outside. Through the operation unit 11, for example, the input data in the observation-time-specific area population storage unit 121 can be stored and corrected by operating the input unit 12, the movement probability estimation unit 13 can be instructed to start estimating movement probabilities, the time-specific area population estimation unit 14 can be instructed to start estimating the area population at a time at which no observation is performed, and the output unit 15 can be instructed to output an estimation result.
The input unit 12 stores the observed time-specific area population data in the observation-time-specific area population storage unit 121 and corrects the data.
The movement probability estimation unit 13 reads a time-specific area population data group from the observation-time-specific area population storage unit 121 and estimates time-specific interareal movement probabilities from this data group using the collective flow diffusion model (CFDM) (A. Kumar, D. Sheldon, B. Srivastava. Diffusion Over Networks: Models and Inference. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence. 2013).
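Symbols are defined as follows. V is the set of areas, and Γi ⊆ V is the set of candidate areas to which a person in area i can move in a unit time. Nti is the observed population in area i at time t, and Mtij is the number of people who move from area i to area j ∈ Γi between time t and time t+1, with Mti = {Mtij | j ∈ Γi}. θij is the probability that a person in area i moves to area j in a unit time. In CFDM, people move independently of one another, so Mti follows a multinomial distribution with Nti trials using the movement probabilities from i, θi = {θij | j ∈ Γi}. Thus, when N = {Nti | t ∈ [T], i ∈ V} and θ = {θi | i ∈ V} are given, the posterior probability of M = {Mti | t ∈ [T−1], i ∈ V} becomes

$$P(M \mid N, \theta) = \prod_{t\in[T-1]} \prod_{i\in V} \frac{N_{ti}!}{\prod_{j\in\Gamma_i} M_{tij}!} \prod_{j\in\Gamma_i} \theta_{ij}^{M_{tij}} \qquad (2)$$

Further, the following constraints, which express a number-of-people conservation law, are satisfied:

$$\sum_{j\in\Gamma_i} M_{tij} = N_{ti} \qquad (t \in [T-1],\ i \in V) \qquad (3)$$

$$\sum_{i\in V:\, j\in\Gamma_i} M_{tij} = N_{t+1,j} \qquad (t \in [T-1],\ j \in V) \qquad (4)$$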
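As an illustration of the CFDM multinomial form, the following minimal sketch evaluates the logarithm of one time step's factor of the posterior, log N_ti! − Σ_j log M_tij! + Σ_j M_tij log θ_ij summed over areas i, after checking the conservation constraints. The dictionary-based representation and the name `log_posterior_t` are illustrative assumptions.

```python
from math import lgamma, log


def log_posterior_t(M, N_t, N_t1, theta):
    """Log of one time step's factor of the multinomial posterior.

    M, theta: dicts mapping (i, j) -> M_tij / theta_ij for j in Gamma_i.
    N_t, N_t1: dicts mapping area -> population at times t and t+1.
    Raises ValueError if the conservation constraints are violated.
    """
    out_flow, in_flow = {}, {}
    for (i, j), m in M.items():
        out_flow[i] = out_flow.get(i, 0) + m
        in_flow[j] = in_flow.get(j, 0) + m
    # outflow of each area must equal its population at t
    if any(out_flow.get(i, 0) != n for i, n in N_t.items()):
        raise ValueError("outflow does not match N_t")
    # inflow of each area must equal its population at t+1
    if any(in_flow.get(j, 0) != n for j, n in N_t1.items()):
        raise ValueError("inflow does not match N_t+1")
    # log N_ti! for each area (multinomial normalizer)
    lp = sum(lgamma(n + 1) for n in N_t.values())
    for (i, j), m in M.items():
        lp -= lgamma(m + 1)               # - log M_tij!
        if m > 0:
            lp += m * log(theta[(i, j)])  # + M_tij log theta_ij
    return lp


N_t, N_t1 = {"a": 2, "b": 1}, {"a": 1, "b": 2}
M = {("a", "a"): 1, ("a", "b"): 1, ("b", "a"): 0, ("b", "b"): 1}
theta = {("a", "a"): 0.5, ("a", "b"): 0.5, ("b", "a"): 0.2, ("b", "b"): 0.8}
lp = log_posterior_t(M, N_t, N_t1, theta)
```

For this toy input the result is log 0.4, the product of the two per-area multinomial probabilities 0.5 and 0.8. Summing the function over t ∈ [T−1] gives log P(M | N, θ), and its negative is the quantity the estimation minimizes.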
Further, the movement probability θ is assumed to be parameterized by a parameter β.
The movement probability estimation unit 13 estimates time- and area-specific movement probabilities based on CFDM (Relationships (2) to (4)), and outputs the estimated movement probability to the estimated movement probability storage unit 122.
An example of a specific processing procedure that is executed by the movement probability estimation unit 13 is as follows.
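The estimation is performed by minimizing the negative logarithm of the posterior probability,

$$L(M, \theta) = -\log P(M \mid N, \theta) = \sum_{t\in[T-1]} \sum_{i\in V} \Big[ \sum_{j\in\Gamma_i} \big( \log M_{tij}! - M_{tij} \log \theta_{ij} \big) - \log N_{ti}! \Big] \qquad (5)$$

under constraints (3) and (4). That is, the optimization problem to be solved is

$$\min_{M,\,\theta}\ L(M, \theta) \quad \text{s.t. (3), (4), and } M_{tij} \in \mathbb{Z}_{\ge 0} \qquad (6)$$

where ℤ≥0 is the set of all integers equal to or greater than 0. The objective function L(M, θ) is minimized by alternately minimizing it with respect to M and θ.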
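To update M, the optimization problem

$$\min_{\{M_{tij}\}}\ \sum_{i\in V} \sum_{j\in\Gamma_i} \big( \log M_{tij}! - M_{tij} \log \theta_{ij} \big) \quad \text{s.t. (3), (4), and } M_{tij} \in \mathbb{Z}_{\ge 0} \qquad (7)$$

may be solved independently for each t ∈ [T−1].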
First, the movement probability estimation unit 13 performs preprocessing so that $\sum_{i\in V} N_{t,i} = \sum_{i\in V} N_{t+1,i}$ holds. To achieve this, a virtual area v is added to V: when $\sum_{i\in V} N_{t,i} < \sum_{i\in V} N_{t+1,i}$, $N_{t,v} = \sum_{i\in V} N_{t+1,i} - \sum_{i\in V} N_{t,i}$ and $N_{t+1,v} = 0$ may be set, and when $\sum_{i\in V} N_{t,i} > \sum_{i\in V} N_{t+1,i}$, $N_{t,v} = 0$ and $N_{t+1,v} = \sum_{i\in V} N_{t,i} - \sum_{i\in V} N_{t+1,i}$ may be set. After this processing, the movement probability estimation unit 13 sets $F = \sum_{i\in V} N_{t,i} = \sum_{i\in V} N_{t+1,i}$.
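A minimal sketch of this balancing step for one pair of consecutive time steps, assuming populations are held in dictionaries keyed by area; the function name `add_virtual_area` and the label `"v"` for the virtual area are hypothetical.

```python
def add_virtual_area(N_t, N_t1, virtual="v"):
    """Balance the totals at t and t+1 by adding a virtual area.

    N_t, N_t1: dicts mapping area -> population. Returns copies with the
    virtual area added when needed, plus the common total F.
    """
    s_t, s_t1 = sum(N_t.values()), sum(N_t1.values())
    N_t, N_t1 = dict(N_t), dict(N_t1)  # do not modify the caller's data
    if s_t < s_t1:
        # surplus at t+1: the extra people start in the virtual area at t
        N_t[virtual], N_t1[virtual] = s_t1 - s_t, 0
    elif s_t > s_t1:
        # deficit at t+1: the missing people end in the virtual area at t+1
        N_t[virtual], N_t1[virtual] = 0, s_t - s_t1
    return N_t, N_t1, max(s_t, s_t1)


a, b, F = add_virtual_area({"a": 3, "b": 2}, {"a": 4, "b": 3})
```

How the candidate sets Γ should be extended for the virtual area is not specified above, so the sketch leaves that choice to the caller.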
Here, Stirling's approximation $\log M_{tij}! \approx M_{tij} \log M_{tij} - M_{tij}$ is applied to the objective function of problem (7), and $M_{tij}$ is continuously relaxed, which yields the optimization problem

$$\min_{\{M_{tij} \ge 0\}}\ \sum_{i\in V} \sum_{j\in\Gamma_i} M_{tij} \big( \log M_{tij} - \log \theta_{ij} \big) \quad \text{s.t. (3), (4)}$$

The term $\sum_{i\in V}\sum_{j\in\Gamma_i} M_{tij}$ of the objective function is omitted because it equals the constant F under the constraints. Because this optimization problem is known to be solvable by the Sinkhorn-Knopp algorithm (P. A. Knight. The Sinkhorn-Knopp algorithm: convergence and applications. SIAM Journal on Matrix Analysis and Applications. 2008), the movement probability estimation unit 13 uses this algorithm to solve it.
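The Sinkhorn-Knopp step for the relaxed problem, minimizing Σ_i Σ_j M_tij (log M_tij − log θ_ij) with row sums N_ti and column sums N_{t+1,j}, can be sketched as follows. Setting the derivative of the Lagrangian to zero shows that the optimum has the form M_tij = u_i θ_ij w_j, so the iteration alternately rescales the rows and columns of θ until both marginals match. This is a minimal pure-Python sketch under stated assumptions: θ is a dense square matrix over V with θ_ij = 0 for j ∉ Γi, the totals are already balanced, and the problem is feasible (no scaling denominator becomes zero). The name `sinkhorn_flows` is hypothetical.

```python
def sinkhorn_flows(theta, r, c, n_iter=1000, tol=1e-10):
    """Entropic flow estimate between two population snapshots.

    Minimizes sum_ij M_ij (log M_ij - log theta_ij) subject to row sums r
    (population at t) and column sums c (population at t+1).
    theta: dense square list of lists; theta[i][j] = 0 outside Gamma_i.
    The optimum has the form M_ij = u_i * theta_ij * w_j.
    """
    n = len(r)
    u, w = [1.0] * n, [1.0] * n
    for _ in range(n_iter):
        # rescale rows to match r, then columns to match c
        u = [r[i] / sum(theta[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = [c[j] / sum(theta[i][j] * u[i] for i in range(n)) for j in range(n)]
        # column sums are now exact; stop when row sums are also close
        err = max(abs(sum(u[i] * theta[i][j] * w[j] for j in range(n)) - r[i])
                  for i in range(n))
        if err < tol:
            break
    return [[u[i] * theta[i][j] * w[j] for j in range(n)] for i in range(n)]


M = sinkhorn_flows([[0.5, 0.5], [0.2, 0.8]], [2.0, 1.0], [1.0, 2.0])
```

Because M_ij / θ_ij factors as u_i w_j, the cross ratio M_00 M_11 / (M_01 M_10) of the result equals θ_00 θ_11 / (θ_01 θ_10), which is 4 in this example.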
Minimization with respect to θ can be performed by adjusting the parameter β with a Lagrange multiplier method, a gradient method, or the like.
The movement probability estimation unit 13 alternately optimizes M and θ by the procedure described above until the objective function value converges, and outputs the finally obtained (learned) $\hat{\theta}$ to the estimated movement probability storage unit 122 as the estimated movement probability.
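For the special case in which θ is its own parameter (β = θ) and does not vary with time, an assumption made here only for illustration, the Lagrange multiplier method gives a closed form: minimizing −Σ_t Σ_j M_tij log θ_ij subject to Σ_j θ_ij = 1 yields θ_ij = Σ_t M_tij / Σ_t Σ_{j'∈Γi} M_tij'. The sketch below implements that update; the name `update_theta` and the uniform fallback for areas with no outflow are hypothetical choices.

```python
def update_theta(M_list, gamma):
    """Closed-form theta-step when theta is its own parameter (beta = theta).

    M_list: list over t of dicts (i, j) -> M_tij (current flow estimates).
    gamma: dict i -> list of candidate areas Gamma_i.
    Returns dict (i, j) -> theta_ij with sum over j in Gamma_i equal to 1.
    """
    theta = {}
    for i, cands in gamma.items():
        # total flow from i to each candidate j, summed over time steps
        flows = {j: sum(M.get((i, j), 0.0) for M in M_list) for j in cands}
        total = sum(flows.values())
        for j in cands:
            # hypothetical fallback: uniform probabilities if i has no outflow
            theta[(i, j)] = flows[j] / total if total > 0 else 1.0 / len(cands)
    return theta


M_list = [{("a", "a"): 1, ("a", "b"): 1, ("b", "b"): 1},
          {("a", "a"): 2, ("b", "a"): 1, ("b", "b"): 1}]
th = update_theta(M_list, {"a": ["a", "b"], "b": ["a", "b"]})
```

When θ depends on β through a richer model, this closed form no longer applies and a gradient method on β is the natural alternative, as the text notes.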
The time-specific area population estimation unit 14 reads the observed time-specific area population data from the observation-time-specific area population storage unit 121 and the estimated movement probability from the estimated movement probability storage unit 122, and calculates, based on them, a cost function for movement, that is, a cost function between time-specific population distributions. Based on this cost function, the time-specific area population estimation unit 14 estimates the population in each area at a time at which no observation is performed and outputs the estimation result to the estimation-time-specific area population storage unit 123. An example of a specific processing procedure executed by the time-specific area population estimation unit 14 is as follows.
A cost function Cij for moving from area i to area j is defined as $C_{ij} := -\log \hat{\theta}_{ij}$ using the estimated movement probability $\hat{\theta}$. Under this definition, the cost is smaller when the probability of moving from area i to area j is higher, and larger when that probability is lower. With such a cost function, the estimation allocates more moving people to pairs of areas between which the movement probability is estimated to be high. The cost function Cij is computed from $\hat{\theta}_{ij}$, and θij is learned from the observed time-specific area population data as described above; thus, the cost function Cij can be said to be learned from the observed time-specific area population data.
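A direct transcription of the cost definition follows. Representing θ̂ as a dense matrix and mapping zero-probability moves (pairs outside Γi) to an infinite cost are assumptions of this sketch; the name `cost_matrix` is hypothetical.

```python
from math import inf, log


def cost_matrix(theta_hat):
    """C_ij = -log(theta_hat_ij); pairs with zero probability get infinite cost."""
    return [[-log(p) if p > 0 else inf for p in row] for row in theta_hat]


C = cost_matrix([[0.5, 0.5], [0.0, 1.0]])
```

A consequence of this choice: with an entropic regularization weight of 1, the Gibbs kernel exp(−Cij) used by Sinkhorn-type solvers equals θ̂ij itself, and it is 0 wherever the cost is infinite.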
The time-specific area population estimation unit 14 uses this cost function to estimate the population in each area at a time at which no observation is performed. For example, suppose that the population distribution Nτ at a time τ (t < τ < t+1) between time t and time t+1 is to be obtained. The value of τ may be input by the user. Consider the set $P = \{p \in \mathbb{R}^V \mid \sum_{i\in V} p_i = F,\ p_i \ge 0\ (i \in V)\}$, where ℝ is the set of real numbers, and, for ν, μ ∈ P, consider an optimal transport problem under the cost C whose optimal value is expressed as fC(ν, μ), a function of ν and μ. In this case, an estimated value of Nτ is obtained as a solution of the following optimization problem:
This problem is known as the Wasserstein barycenter problem with entropic regularization, for which fast solution methods are known (M. Cuturi, A. Doucet. Fast Computation of Wasserstein Barycenters. In Proceedings of the 31st International Conference on Machine Learning. 2014). The time-specific area population estimation unit 14 uses such a method to solve the problem.
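Because the formula of the estimation problem is not reproduced above, this sketch assumes one natural form: Nτ is the minimizer over p ∈ P of (1 − s) fC(Nt, p) + s fC(p, Nt+1), with s = τ − t (unit spacing between observations), where fC is optimal transport under C with entropic regularization weight `eps`. It solves this with iterative Bregman projections (Benamou et al., 2015), a standard alternative to the Cuturi-Doucet method cited above, not that method itself. All names are hypothetical.

```python
from math import exp, log


def _mv(K, x):
    """(K x)_i = sum_j K_ij x_j."""
    return [sum(k * xj for k, xj in zip(row, x)) for row in K]


def _mtv(K, y):
    """(K^T y)_j = sum_i K_ij y_i."""
    return [sum(K[i][j] * y[i] for i in range(len(K))) for j in range(len(K[0]))]


def _div(x, y):
    """Elementwise x / y, using 0 where y == 0 (unreachable areas)."""
    return [a / b if b > 0 else 0.0 for a, b in zip(x, y)]


def _geo(q1, q2, w1, w2):
    """Weighted geometric mean q1**w1 * q2**w2, elementwise."""
    out = []
    for x, y in zip(q1, q2):
        if (w1 > 0 and x <= 0) or (w2 > 0 and y <= 0):
            out.append(0.0)
        else:
            lx = w1 * log(x) if w1 > 0 else 0.0
            ly = w2 * log(y) if w2 > 0 else 0.0
            out.append(exp(lx + ly))
    return out


def interpolate_population(C, N_t, N_t1, s, eps=1.0, n_iter=2000, tol=1e-10):
    """Entropic barycenter estimate of the population at tau = t + s.

    ASSUMED problem: min_p (1 - s) f(N_t, p) + s f(p, N_t1), where f is
    entropic optimal transport with cost C (C[i][j]: moving i -> j) and
    regularization weight eps. Solved by iterative Bregman projections.
    """
    n = len(N_t)
    K = [[exp(-c / eps) for c in row] for row in C]  # infinite cost -> 0
    w1, w2 = 1.0 - s, s
    b1, a2 = [1.0] * n, [1.0] * n
    p = list(N_t)
    for _ in range(n_iter):
        a1 = _div(N_t, _mv(K, b1))     # plan N_t -> p: rows sum to N_t
        b2 = _div(N_t1, _mtv(K, a2))   # plan p -> N_t1: columns sum to N_t1
        Ka1, Kb2 = _mtv(K, a1), _mv(K, b2)
        q1 = [b * k for b, k in zip(b1, Ka1)]  # p-side marginal of plan 1
        q2 = [a * k for a, k in zip(a2, Kb2)]  # p-side marginal of plan 2
        p = _geo(q1, q2, w1, w2)
        if max(abs(x - y) for x, y in zip(q1, q2)) < tol:
            break
        b1, a2 = _div(p, Ka1), _div(p, Kb2)  # project both plans onto p
    return p
```

With `eps=1` and s = 0, this sketch returns θ̂ᵀNt, the one-step CFDM expectation, rather than Nt itself, a consequence of the entropic term; the weights and regularization actually used in the estimation problem may differ from these assumptions.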
The time-specific area population estimation unit 14 outputs the obtained Nτ to the estimation-time-specific area population storage unit 123.
The output unit 15 reads the data stored in the estimation-time-specific area population storage unit 123 and outputs the data. The output method is not limited to any particular one; for example, the data may be displayed on a display apparatus or stored in the auxiliary storage device 102 or the like.
As described above, according to the embodiment, the population at a time at which no observation is performed can be estimated from the time-specific area population data alone, without requiring external information as feature quantities or a large amount of training data for model learning. Thus, the population at a time at which no observation is performed can be estimated efficiently.
Furthermore, the cost function that measures the distance between pieces of time-specific area population data is learned automatically from the input time-specific area population data, so that highly accurate estimation can be performed without manually designing the cost function.
Although the embodiments of the present invention have been described above in detail, the present invention is not limited to such specific embodiments, and various modifications and changes can be made within a scope of the gist of the present invention described in the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/023481 | 6/15/2020 | WO |  |