The present disclosure relates to an estimation apparatus, an estimation method, and an estimation program.
Generally, positional information of individuals obtained from a global positioning system (GPS) or the like is provided as spatiotemporal population data in which individuals cannot be tracked in consideration of privacy. Spatiotemporal population data is data indicating populations of areas at individual time steps where the areas refer to, for example, areas obtained by dividing a geographical space into grid cells.
As a technique for estimating, from such spatiotemporal population data, the number of people who have moved between areas at individual time steps, a maximum a posteriori (MAP) estimation technique on a collective graphical model (CGM) in a path graph is known, for example.
NPL 1: Yasunori Akagi, Takuya Nishimura, Takeshi Kurashima, Hiroyuki Toda, “A Fast and Accurate Method for Estimating People Flow from Spatiotemporal Population Data,” Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pp. 3293-3300, 2018.
NPL 2: D. R. Sheldon and T. G. Dietterich, “Collective Graphical Models,” In Proceedings of the 24th International Conference on Neural Information Processing Systems, pp. 1161-1169, 2011.
However, in the case of the above estimation technique, Stirling's approximation is applied to a factorial part of an objective function, and hence, a solution far from a correct solution may be output when the total number of samples is small. This is because Stirling's approximation is not accurate when the total number of samples is small.
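The loss of accuracy for small counts can be checked numerically. The following is a minimal sketch, assuming the leading-order form of Stirling's approximation, log x! ≈ x log x − x; the function names are illustrative and are not part of the disclosure.

```python
import math


def log_factorial(x: int) -> float:
    """Exact log(x!) computed through the log-gamma function."""
    return math.lgamma(x + 1)


def stirling(x: int) -> float:
    """Leading-order Stirling approximation x*log(x) - x (taken as 0 at x = 0)."""
    return x * math.log(x) - x if x > 0 else 0.0


def relative_error(x: int) -> float:
    """Relative error of the approximation; only meaningful for x >= 2."""
    exact = log_factorial(x)
    return abs(exact - stirling(x)) / exact


for x in (2, 5, 10, 100, 1000):
    print(f"x={x:5d}  exact={log_factorial(x):10.3f}  "
          f"approx={stirling(x):10.3f}  rel_err={relative_error(x):.4f}")
```

In this sketch the relative error shrinks as x grows, which is consistent with the observation that the approximation is least reliable when the total number of samples is small.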
Further, in the case of the above estimation technique, a feasible region is continuously relaxed (that is, a constraint of taking only integer values is removed) when optimizing the objective function, and hence, a solution of a non-integer value (a non-sparse solution) may be output.
Thus, upon estimating the number of people who have moved between areas at individual time steps based on spatiotemporal population data, an estimation method is required that is capable of outputting a more accurate sparse solution in a MAP estimation on a CGM in a path graph.
It is an object of the present disclosure to improve the estimation accuracy when estimating the number of people who have moved between areas at individual time steps based on spatiotemporal population data.
According to an aspect of the present disclosure, an estimation apparatus includes an input unit configured to receive spatiotemporal population data and a probability of movement between areas as input, a construction unit configured to construct a CGM in a path graph for estimating a number of people who have moved between areas from the spatiotemporal population data and the probability of movement between areas, a generation unit configured to generate an instance of a minimum cost flow problem for performing MAP estimation in the constructed CGM, an estimation unit configured to solve the instance of the minimum cost flow problem to estimate the number of people who have moved between areas at individual time steps, and an output unit configured to output the estimated number of people who have moved between areas at the individual time steps.
According to the present disclosure, the estimation accuracy when estimating the number of people who have moved between areas at individual time steps based on spatiotemporal population data can be improved.
Hereinafter, embodiments will be described with reference to the accompanying drawings. In the present specification and drawings, components having substantially the same functional configuration are denoted by the same reference numerals and duplicate description thereof will be omitted.
First, an overview of an estimation apparatus according to the present embodiment will be described. In performing MAP estimation on a CGM to improve the estimation accuracy when estimating the number of people who have moved between areas at individual time steps from spatiotemporal population data, the estimation apparatus according to the present embodiment
That is, the estimation apparatus according to the present embodiment replaces an optimization problem of an objective function with a minimum cost flow problem, to implement an estimation method in which Stirling's approximation is not applied to the factorial part of the objective function and a feasible region is not continuously relaxed when optimizing the objective function. Thereby, the estimation apparatus according to the present embodiment can output a sparse solution with higher accuracy.
Thus, in the following,
First, an overview of MAP estimation on a CGM in a path graph will be described.
Here, let H=(N, A) be an undirected graph, and consider a CGM expressed by a probability mass function shown in the following formulas (1) and (2) (see reference numeral 110 in
In the above formulas (1) and (2), the symbols are,
$\phi_{ij}(x_i, x_j \mid \theta)$: Local potential defined for the pair of random variables $(X_i, X_j)$
$Z(\theta)$: Normalization constant (partition function) [Math. 2]
wherein a random variable $X_i$ takes values in a finite set $\mathcal{X}_i$ [Math. 3] and a random variable $X_j$ takes values in a finite set $\mathcal{X}_j$ [Math. 4].
Denoting samples of random variables in the above CGM as $X^{(1)}, \ldots, X^{(M)}$ [Math. 5], contingency tables for vertices $\mathbf{n}_i = (n_i(x_i) \mid x_i \in \mathcal{X}_i)$ [Math. 6] and contingency tables for edges $\mathbf{n}_{ij} = (n_{ij}(x_i, x_j) \mid x_i \in \mathcal{X}_i, x_j \in \mathcal{X}_j)$ [Math. 7] can be defined by the following formulas (3) and (4) (see reference numeral 120 in
In the above formulas (3) and (4), $\mathbb{I}(\cdot)$ is an indicator function.
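As an illustration, the contingency tables can be computed by counting. The following is a minimal sketch, assuming that the vertex table counts how many of the M samples take each value at vertex i and that the edge table counts each pair of values on consecutive vertices of a path graph; the function name and the data layout are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def contingency_tables(
    samples: Sequence[Sequence[int]],
) -> Tuple[List[Counter], List[Counter]]:
    """Count vertex values and consecutive value pairs over all samples.

    samples[m][i] is the value of X_i in the m-th sample (assumed layout).
    Returns (vertex_counts, edge_counts), where vertex_counts[i][x] plays
    the role of n_i(x) and edge_counts[i][(x, x2)] that of n_{i,i+1}(x, x2).
    """
    num_vertices = len(samples[0])
    vertex_counts = [Counter() for _ in range(num_vertices)]
    edge_counts = [Counter() for _ in range(num_vertices - 1)]
    for sample in samples:
        for i, value in enumerate(sample):
            vertex_counts[i][value] += 1
        for i in range(num_vertices - 1):
            edge_counts[i][(sample[i], sample[i + 1])] += 1
    return vertex_counts, edge_counts


# Three samples on a path graph with three vertices and values in {1, 2}.
vertex_counts, edge_counts = contingency_tables([(1, 2, 2), (1, 1, 2), (2, 2, 1)])
print(vertex_counts[0], edge_counts[0])
```

Summing an edge table over its second argument recovers the vertex table of the first vertex, which is the consistency that a CGM distribution relies on.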
With these definitions, a distribution of n (referred to as a CGM distribution) can be expressed by the following formulas (6), (7), and (8) (see reference numeral 130 in
Meanwhile, observed values y are generated from a certain probability distribution p(y|n) representing observation noise. Typical examples include a model in which values nij of edges are observed (formula (9) below) and a model in which values ni of vertices are observed (formula (10) below) (see reference numeral 210 in
In the following, the model in which values ni of vertices are observed will be considered for ease of explanation.
A posterior distribution of n is given by
p(n|y;θ)∝p(n;θ)·p(y|n) [Math. 11]
(see reference numeral 220 in
$\max_{n} p(n \mid y; \theta)$ [Math. 12]
(see reference numeral 230 in
Here, considering a minimization problem of
−log p(n|y;θ) [Math. 13]
(see reference numeral 240 in
Here, the objective function in the above formula (11) is as shown in the following formula (12) (see reference numeral 310 in
Here, for ease of explanation, $\mathcal{X}_i = \{1, 2, \ldots, R\}$ [Math. 16] is defined for any i=1, 2, . . . , |N|, and
$n_{ijk} := n_{i,i+1}(j,k)$
$\phi_{ijk} := \phi_{i,i+1}(j,k)$
$n_{ij} := n_i(j)$
$y_{ij} := y_i(j)$ [Math. 17]
are defined for any i=1, 2, . . . , |N|; then, in the case of a CGM in a path graph, vi is expressed as follows:
and hence, the objective function of the formula (12) can be expressed by the following formula (14) (see reference numeral 320 in
Further, defining the symbols in the above formula as follows:
$f_{ijk}(z) = \log z! - z \cdot \log \phi_{ijk}$
$g(z) = -\log z!$
$h_{ij}(z) = -\log\left[p_{i,j}(y_{ij} \mid z)\right]$ [Math. 20]
the objective function of the above formula (14) can be expressed by the following formula (15) (see reference numeral 330 in
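The three functions can be evaluated exactly, without any approximation of the factorial. The following is a minimal sketch, assuming that the objective is the sum of f over all transitions, g over the interior time steps, and h over all time steps (this matches the edge costs used when the minimum cost flow instance is generated), and assuming Gaussian observation noise for h purely for illustration. All names, the 0-based indexing, and the noise model are assumptions and are not taken from the disclosure.

```python
import math
from typing import Sequence


def f(z: int, phi: float) -> float:
    """f_ijk(z) = log z! - z * log(phi_ijk), with log z! computed exactly."""
    return math.lgamma(z + 1) - z * math.log(phi)


def g(z: int) -> float:
    """g(z) = -log z!."""
    return -math.lgamma(z + 1)


def h_gaussian(z: int, y: float, sigma: float = 1.0) -> float:
    """Assumed noise model: negative log density of N(y | z, sigma^2)."""
    return 0.5 * ((y - z) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))


def objective(
    n_edge: Sequence[Sequence[Sequence[int]]],   # n_edge[i][j][k], 0-based
    n_vertex: Sequence[Sequence[int]],           # n_vertex[i][j], 0-based
    phi: Sequence[Sequence[Sequence[float]]],    # phi[i][j][k]
    y: Sequence[Sequence[float]],                # y[i][j]
    sigma: float = 1.0,
) -> float:
    """Sum of f over transitions, g over interior steps, h over all steps."""
    num_steps, num_areas = len(n_vertex), len(n_vertex[0])
    total = 0.0
    for i in range(num_steps - 1):
        for j in range(num_areas):
            for k in range(num_areas):
                total += f(n_edge[i][j][k], phi[i][j][k])
    for i in range(1, num_steps - 1):            # interior time steps only
        total += sum(g(n_vertex[i][j]) for j in range(num_areas))
    for i in range(num_steps):
        total += sum(h_gaussian(n_vertex[i][j], y[i][j], sigma)
                     for j in range(num_areas))
    return total


# Two time steps, one area, two people who stay in place.
print(objective([[[2]]], [[2], [2]], [[[1.0]]], [[2.0], [2.0]]))
```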
Here, the objective function of the above formula (15) can be explained in a framework of estimating the number of people who have moved between areas at individual time steps from spatiotemporal population data, with the following variables:
Note that as described above, conventionally, when solving the optimization problem (of equation (11)), Stirling's approximation
log x!≈x log x−x [Math. 22]
is applied to the factorial part of the objective function. Therefore, a solution far from a correct solution is output when the total number of samples is small; and
In contrast, in order to solve such problems, the estimation apparatus according to the first embodiment generates an instance of a minimum cost flow problem for performing MAP estimation on a CGM as described above.
Capacity constraint of each edge $(i,j) \in E$: $u_{ij} \in \mathbb{Z}_{\ge 0}$
Cost function of each edge $(i,j) \in E$: $c_{ij}: \mathbb{Z}_{\ge 0} \to \mathbb{R}$
Demand of vertex $i \in V$: $b_i \in \mathbb{Z}$ [Math. 23]
are assigned as shown in
Here, letting xij be a flow flowing through an edge (i, j), the minimum cost flow problem can be formulated as in the following formula (16) (see reference numeral 410 in
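For orientation, one standard way of writing a minimum cost flow problem with these quantities is given below. It is a sketch that assumes integer flows and the sign convention in which the net outflow of vertex i equals its demand b_i; it is not a reproduction of the formula in the drawings.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{(i,j)\in E} c_{ij}(x_{ij}) \\
\text{s.t.}\quad & \sum_{j:(i,j)\in E} x_{ij} - \sum_{j:(j,i)\in E} x_{ji} = b_i \quad (i \in V), \\
& 0 \le x_{ij} \le u_{ij},\quad x_{ij} \in \mathbb{Z} \quad ((i,j)\in E).
\end{aligned}
```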
Then, in order to replace the optimization problem of the objective function (of formula (11)) with the formulated minimum cost flow problem, the estimation apparatus according to the present embodiment generates an instance of the minimum cost flow problem in a directed graph G=(V, E) according to the following procedure (see reference numeral 420 in
1) Let the vertex set be $V = \{o\} \cup (u^{(1)} \cup v^{(1)}) \cup (u^{(2)} \cup v^{(2)}) \cup \cdots \cup (u^{(|N|)} \cup v^{(|N|)}) \cup \{d\}$. Here, $u^{(i)} := \{u_j^{(i)}\}_{j=1}^{R}$ and $v^{(i)} := \{v_j^{(i)}\}_{j=1}^{R}$.
2) Edges (0, +∞), that is, edges with cost 0 and capacity +∞, are created from vertex o to vertices $u_j^{(1)}$ for j∈[R].
3) Edges (0, +∞) are created from vertices $v_j^{(|N|)}$ to vertex d for j∈[R].
4) Edges ($h_{ij}(z)$, +∞) are created from vertices $u_j^{(i)}$ to vertices $v_j^{(i)}$ for i=1, |N|, j∈[R].
5) Edges ($h_{ij}(z) + g(z)$, +∞) are created from vertices $u_j^{(i)}$ to vertices $v_j^{(i)}$ for i=2, . . . , |N|−1, j∈[R].
6) Edges ($f_{ijk}(z)$, +∞) are created from vertices $v_j^{(i)}$ to vertices $u_k^{(i+1)}$ for i∈[|N|−1], j∈[R], k∈[R].
7) $b_o = M$, $b_d = -M$, and $b_v = 0$ ($v \in V \setminus \{o, d\}$) are set. [Math. 25]
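The procedure above can be sketched in code as follows. This is a minimal illustration, assuming that each edge is stored as a (tail, head, cost function) triple with infinite capacity and that the interior edges carry the cost h_ij(z) + g(z), in agreement with g(z) = −log z! and with the worked example for time stamp 2. The vertex encoding and the signatures of f, g, and h are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Vertex = Tuple                          # ("o",), ("d",), ("u", i, j) or ("v", i, j)
CostFn = Callable[[int], float]
Edge = Tuple[Vertex, Vertex, CostFn]    # every edge has capacity +infinity


def build_instance(
    T: int,                                     # number of time steps |N|
    R: int,                                     # number of areas
    M: int,                                     # total number of people
    f: Callable[[int, int, int, int], float],   # f(i, j, k, z)
    g: Callable[[int], float],                  # g(z)
    h: Callable[[int, int, int], float],        # h(i, j, z)
) -> Tuple[List[Edge], Dict[Vertex, int]]:
    """Build the edges and demands of the instance (1-based i, j, k)."""
    o, d = ("o",), ("d",)
    edges: List[Edge] = []
    for j in range(1, R + 1):
        edges.append((o, ("u", 1, j), lambda z: 0.0))     # step 2
        edges.append((("v", T, j), d, lambda z: 0.0))     # step 3
    for i in range(1, T + 1):
        for j in range(1, R + 1):
            if i in (1, T):                               # step 4: end steps
                cost = lambda z, i=i, j=j: h(i, j, z)
            else:                                         # step 5: interior steps
                cost = lambda z, i=i, j=j: h(i, j, z) + g(z)
            edges.append((("u", i, j), ("v", i, j), cost))
    for i in range(1, T):                                 # step 6: transitions
        for j in range(1, R + 1):
            for k in range(1, R + 1):
                edges.append((("v", i, j), ("u", i + 1, k),
                              lambda z, i=i, j=j, k=k: f(i, j, k, z)))
    demand = {o: M, d: -M}     # step 7: all other vertices have demand 0
    return edges, demand
```

With three time steps and three areas, this sketch yields 2·3 + 3·3 + 2·3·3 = 33 edges on 20 vertices.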
Further, the estimation apparatus according to the present embodiment obtains,
[Math. 26]
$n^*_{ijk}$ := (amount of flow flowing through the edge from vertex $v_j^{(i)}$ to vertex $u_k^{(i+1)}$) (17)
$n^*_{ij}$ := (amount of flow flowing through the edge from vertex $u_j^{(i)}$ to vertex $v_j^{(i)}$) (18)
Here, the solution n* is an optimum solution in the optimization problem of the objective function (of formula (11)). As described above, the estimation apparatus according to the first embodiment replaces the optimization problem of the objective function with an instance of a minimum cost flow problem and solves the instance of the minimum cost flow problem to perform MAP estimation on a CGM.
Next, a shortest path iteration method used when solving an instance of a minimum cost flow problem will be described.
The shortest path iteration method is one of the methods of solving the minimum cost flow problem. Specifically, first, a residual graph is constructed for the minimum cost flow problem and flows are initialized for all edges (i, j) (see reference numeral 510 in
Next, in the residual graph, a shortest path from each vertex i that satisfies
$b_i - \left(\sum_{j:(i,j)\in E} x_{ij} - \sum_{j:(j,i)\in E} x_{ji}\right) > 0$ [Math. 27]
to each vertex j that satisfies
$b_j - \left(\sum_{k:(j,k)\in E} x_{jk} - \sum_{k:(k,j)\in E} x_{kj}\right) < 0$ [Math. 28]
is searched for. Then, the flows are updated according to the found shortest paths (see reference numeral 530 in
When searching for a shortest path, edges with negative costs must also be taken into account, which would ordinarily require the comparatively slow Bellman-Ford method. However, by retaining values defined for the vertices, called potentials, while the flows are repeatedly updated, the faster Dijkstra method (of Reference 3) can be applied.
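The iteration can be sketched as follows. This is a simplified illustration rather than the implementation of the embodiment: it assumes a single source and a single sink, infinite edge capacities, and discrete convex edge cost functions; it sends one unit of flow per iteration; and it uses the plain Bellman-Ford method instead of the Dijkstra method with potentials. The function name and the edge representation are hypothetical.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, Callable[[int], float]]   # (tail, head, cost)


def shortest_path_iteration(
    edges: List[Edge], source: Hashable, sink: Hashable, M: int
) -> List[int]:
    """Send M units from source to sink, one unit per shortest residual path.

    Returns the integer flow on each edge, in the order of `edges`.
    """
    flow = [0] * len(edges)
    vertices = {v for tail, head, _ in edges for v in (tail, head)}
    for _unit in range(M):
        # Residual arcs carry marginal costs: (tail, head, cost, edge, step).
        arcs = []
        for e, (tail, head, cost) in enumerate(edges):
            x = flow[e]
            arcs.append((tail, head, cost(x + 1) - cost(x), e, +1))
            if x > 0:                      # flow can be pushed back
                arcs.append((head, tail, cost(x - 1) - cost(x), e, -1))
        # Bellman-Ford: tolerates the negative costs of backward arcs.
        dist: Dict[Hashable, float] = {v: math.inf for v in vertices}
        pred: Dict[Hashable, Tuple[Hashable, int, int]] = {}
        dist[source] = 0.0
        for _round in range(len(vertices) - 1):
            changed = False
            for tail, head, w, e, step in arcs:
                if dist[tail] + w < dist[head] - 1e-12:
                    dist[head] = dist[tail] + w
                    pred[head] = (tail, e, step)
                    changed = True
            if not changed:
                break
        if math.isinf(dist[sink]):
            raise ValueError("the sink is unreachable from the source")
        # Update the flows along the found path by one unit.
        v = sink
        while v != source:
            tail, e, step = pred[v]
            flow[e] += step
            v = tail
    return flow


# Two routes from s to t: a convex cost z^2 on one, a linear cost 3z on the other.
example = [("s", "a", lambda z: z * z), ("a", "t", lambda z: 0.0),
           ("s", "b", lambda z: 3 * z), ("b", "t", lambda z: 0.0)]
result = shortest_path_iteration(example, "s", "t", 3)
print(result, sum(c(x) for (_, _, c), x in zip(example, result)))
```

Because every augmentation moves whole units, the returned flows are integers, so no continuous relaxation is involved. Optimality of this unit-by-unit scheme relies on the convexity assumption stated above.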
Next, the estimation apparatus according to the first embodiment that implements the above estimation method will be described in detail.
(1) Hardware Configuration of Estimation Apparatus
First, a hardware configuration of the estimation apparatus according to the first embodiment will be described.
The processor 601 includes various arithmetic/logic devices such as a central processing unit (CPU) and a graphics processing unit (GPU). The processor 601 reads various programs (for example, an estimation program that will be described later) onto the memory 602 and executes them.
The memory 602 includes main memory devices such as a read-only memory (ROM) and a random access memory (RAM). The processor 601 and the memory 602 form a so-called computer, and the computer implements various functions by the processor 601 executing various programs loaded in the memory 602.
The auxiliary storage device 603 stores various programs and various data used when the various programs are executed by the processor 601. The auxiliary storage device 603 implements, for example, a movement probability storage unit 711, a spatiotemporal population data storage unit 712, and an estimated moved people count storage unit 713 that will be described later.
The interface device 604 is a connection device for connecting the estimation apparatus 600 with an operation device 610 and a display device 611 which are examples of external devices. The interface device 604 receives an operation on the estimation apparatus 600 through the operation device 610. The interface device 604 also outputs results of processing by the estimation apparatus 600 and displays the results on the display device 611.
The communication device 605 is a communication device for communicating with other devices via a network.
The drive device 606 is a device for setting a recording medium 612 therein. The recording medium 612 referred to here includes media that record information optically, electrically, or magnetically such as a CD-ROM, a flexible disk, and a magneto-optical disc. The recording medium 612 may also include semiconductor memories and the like that electrically record information such as a ROM and a flash memory.
Note that various programs are installed in the auxiliary storage device 603, for example, by setting a distributed recording medium 612 in the drive device 606 and reading the various programs recorded on the recording medium 612 by the drive device 606. Alternatively, various programs may be installed in the auxiliary storage device 603 by being downloaded from the network via the communication device 605.
(2) Functional Configuration of Estimation Apparatus
Next, a functional configuration of the estimation apparatus 600 according to the first embodiment will be described.
The operation unit 701 provides an interface for receiving various instructions for operating the respective units of the estimation apparatus 600. A user of the estimation apparatus 600 inputs various instructions to the estimation apparatus 600 via the interface provided by the operation unit 701.
Instructions input by the user of the estimation apparatus 600 in the first embodiment include
The input unit 702 acquires movement probability data and spatiotemporal population data, which the operation unit 701 has issued an instruction to input, and stores the acquired movement probability data and spatiotemporal population data in the movement probability storage unit 711 and the spatiotemporal population data storage unit 712, respectively.
The CGM construction unit 703 is an example of a construction unit. In response to receiving an instruction to execute a process of estimating the number of people who have moved between areas at individual time steps from the operation unit 701, the CGM construction unit 703 reads the movement probability data and the spatiotemporal population data from the movement probability storage unit 711 and the spatiotemporal population data storage unit 712.
The CGM construction unit 703 also constructs a CGM in a path graph based on the read movement probability data and spatiotemporal population data, and formulates an optimization problem for estimating the number of people who have moved (which corresponds to reference numerals 110 to 330 in
The minimum cost flow problem construction unit 704 is an example of a generation unit. The minimum cost flow problem construction unit 704 formulates a minimum cost flow problem based on both the CGM constructed by the CGM construction unit 703 and the optimization problem for estimating the number of people who have moved, and then, generates an instance of the minimum cost flow problem (which corresponds to
The shortest path iteration unit 705 is an example of an estimation unit. The shortest path iteration unit 705 solves the generated instance of the minimum cost flow problem using the shortest path iteration method to estimate the number of people who have moved (which corresponds to
In response to receiving an instruction to output an estimation result from the operation unit 701, the output unit 706 reads the number of people who have moved between areas at individual time steps stored in the estimated moved people count storage unit 713 and the movement probability data stored in the movement probability storage unit 711 and outputs them to the user.
(3) Specific Example of Data Stored in Storage Units of Estimation Apparatus
Next, a specific example of data stored in the storage units (the movement probability storage unit 711, the spatiotemporal population data storage unit 712, and the estimated moved people count storage unit 713) of the estimation apparatus 600 according to the first embodiment will be described.
Movement probability data 810 is data indicating probabilities of movement between areas at individual time steps. As shown in
The fields of “departure time stamp” store time steps (time points at hourly intervals in the example of
Spatiotemporal population data 820 is data indicating the population of each area at individual time steps. As shown in
The fields of “time stamp” store time steps. The field of “area ID” stores an identifier indicating an area where a population has been observed at a time step stored in a corresponding field of “time stamp”. The field of “population information” stores a population observed in an area stored in a corresponding field of “area ID” at a time step stored in a corresponding field of “time stamp”.
Estimated moved people count data 830 is data indicating the number of people who have moved that the shortest path iteration unit 705 has estimated by solving the minimum cost flow problem. As shown in
The fields of “departure time stamp” store time steps. The field of “departure area” stores an identifier indicating a source area from which people have moved at a time step stored in a corresponding field of “departure time stamp”. The field of “arrival area” stores an identifier indicating a destination area to which people have moved at a time step stored in a corresponding field of “departure time stamp”. The field of “estimated moved people count” stores an estimation result of the number of people who have moved from a movement source stored in a corresponding field of “departure area” to a movement destination stored in a corresponding field of “arrival area” at a time step stored in a corresponding field of “departure time stamp”.
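The records of the spatiotemporal population data 820 and the estimated moved people count data 830 can be pictured as simple typed rows. The following is a minimal sketch based only on the field descriptions above; the class names, the attribute names, the use of strings for time steps and area identifiers, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PopulationRecord:
    """One row of spatiotemporal population data (fields as described above)."""
    time_stamp: str      # time step of the observation
    area_id: str         # area where the population was observed
    population: int      # observed population ("population information")


@dataclass(frozen=True)
class MovedPeopleRecord:
    """One row of estimated moved people count data."""
    departure_time_stamp: str
    departure_area: str
    arrival_area: str
    estimated_moved_people_count: int


def observations_by_time_and_area(
    records: Iterable[PopulationRecord],
) -> Dict[Tuple[str, str], int]:
    """Index the observed populations by (time stamp, area ID)."""
    return {(r.time_stamp, r.area_id): r.population for r in records}


# Hypothetical sample rows, not taken from the drawings.
rows = [PopulationRecord("07:00", "A1", 35), PopulationRecord("07:00", "A2", 12)]
print(observations_by_time_and_area(rows)[("07:00", "A1")])
```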
(4) Example of Generating Instance of Minimum Cost Flow Problem by Estimation Apparatus
Next, an example of generating an instance of the minimum cost flow problem by the minimum cost flow problem construction unit 704 of the estimation apparatus 600 according to the first embodiment will be described.
1) A vertex set includes a source 900, a sink 940, vertices 910, vertices 920, and vertices 930.
2) Edges (0, +∞) are created from the source 900 to vertices 910_u1(1), 920_u2(1), and 930_u3(1) of time stamp 1 (i=1) (see reference numeral 951).
3) Edges (0, +∞) are created from vertices 910_v1(3), 920_v2(3), and 930_v3(3) of time stamp 3 (i=3) to the sink 940 (see reference numeral 952).
4) For time stamp 1 (i=1), edges (h1j(z), +∞) are created from vertices 910_u1(1), 920_u2(1), and 930_u3(1) to vertices 910_v1(1), 920_v2(1), and 930_v3(1) (see reference numeral 953). For time stamp 3 (i=3), edges (h3j(z), +∞) are created from vertices 910_u1(3), 920_u2(3), and 930_u3(3) to vertices 910_v1(3), 920_v2(3), and 930_v3(3) (see reference numeral 954).
5) For time stamp 2 (i=2), edges (h2j(z)+g(z), +∞) are created from vertices 910_u1(2), 920_u2(2), and 930_u3(2) to vertices 910_v1(2), 920_v2(2), and 930_v3(2) (see reference numeral 955).
6) For time stamp 1 (i=1), edges (f1jk(z), +∞) are created
In this way, the minimum cost flow problem construction unit 704 generates an instance of the minimum cost flow problem.
(5) Flow of Moved People Count Estimation Process Performed by Estimation Apparatus
Next, a flow of a moved people count estimation process performed by the estimation apparatus 600 will be described.
In step S1001, the input unit 702 acquires movement probability data based on an instruction from the operation unit 701, and stores the movement probability data in the movement probability storage unit 711.
In step S1002, the input unit 702 acquires spatiotemporal population data based on an instruction from the operation unit 701, and stores the spatiotemporal population data in the spatiotemporal population data storage unit 712.
In step S1003, the CGM construction unit 703 reads the movement probability data and the spatiotemporal population data from the movement probability storage unit 711 and the spatiotemporal population data storage unit 712, respectively, and constructs a CGM in a path graph.
In step S1004, the CGM construction unit 703 formulates an optimization problem of an objective function, and the minimum cost flow problem construction unit 704 formulates a minimum cost flow problem.
In step S1005, the minimum cost flow problem construction unit 704 generates an instance of the minimum cost flow problem for estimating the number of people who have moved based on both the constructed CGM and the optimization problem for estimating the number of people who have moved.
In step S1006, the shortest path iteration unit 705 estimates the number of people who have moved by solving the minimum cost flow problem using a shortest path iteration method.
In step S1007, the output unit 706 outputs the number of people who have moved between areas at individual time steps and the input movement probability data.
As is apparent from the above description, the estimation apparatus 600 according to the first embodiment,
As described above, the estimation apparatus 600 according to the first embodiment replaces the optimization problem of the objective function with an instance of a minimum cost flow problem to implement an estimation method in which
As a result, according to the estimation apparatus 600 according to the first embodiment, it is possible to avoid problems that are caused in the MAP estimation on a CGM when estimating the number of people who have moved between areas at each time stamp from spatiotemporal population data, where the problems are:
That is, the estimation apparatus 600 according to the first embodiment can improve the estimation accuracy when estimating the number of people who have moved between areas at individual time steps based on spatiotemporal population data.
Although the first embodiment has been described above assuming that a shortest path iteration method is used when solving an instance of a minimum cost flow problem, the method of solving an instance of a minimum cost flow problem is not limited to the shortest path iteration method.
Note that the present invention is not limited to the configurations shown here, which include the configurations or the like described in the above embodiments, combinations thereof with other elements, and the like. In these regards, configurations can be changed without departing from the gist of the present invention and can be appropriately defined according to the application form thereof.
600: Estimation apparatus
701: Operation unit
702: Input unit
703: CGM construction unit
704: Minimum cost flow problem construction unit
705: Shortest path iteration unit
706: Output unit
810: Movement probability data
820: Spatiotemporal population data
830: Estimated moved people count data
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/021223 | 5/28/2020 | WO |