The disclosed technique relates to a parameter estimation device, a parameter estimation method, and a parameter estimation program.
The Markov process is a highly versatile model that can represent a variety of dynamic systems and is used in a variety of applications, such as analysis of human or traffic flow in cities, analysis of ticket window queues, and the like.
Because the transition probability and the initial state probability, which are the parameters of a Markov process, are generally unknown, they must be estimated from observation data. If ideal observation data obtained by observing transitions between states are available, the transition probability can be estimated from the number of transitions between the states (NPL 1).
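As an illustrative sketch of this count-based estimation (the function name and data layout here are assumptions for illustration, not part of NPL 1), the transition probability and initial state probability can be obtained by normalizing counts:

```python
import numpy as np

def estimate_from_counts(sequences, n_states):
    """Maximum-likelihood estimate of transition and initial-state
    probabilities from fully observed state sequences (cf. NPL 1)."""
    counts = np.zeros((n_states, n_states))
    init = np.zeros(n_states)
    for seq in sequences:
        init[seq[0]] += 1
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    # Normalize each row by its total number of outgoing transitions.
    P = counts / counts.sum(axis=1, keepdims=True)
    q = init / init.sum()
    return P, q

P, q = estimate_from_counts([[0, 1, 0, 1], [1, 0, 0, 1]], 2)
```

As described next, this simple estimator fails when some states never appear in the data.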
NPL 1: Patrick Billingsley, “Statistical Methods in Markov Chains”, The Annals of Mathematical Statistics, pp. 12-40, 1961.
However, observation data collected in a real environment are expressed as transition data (hereinafter referred to as “censored transition data”) in which observation is partially aborted due to the presence of unobservable states. Existing parameter estimation techniques cannot estimate parameters of an original Markov chain having observable states and unobservable states from censored transition data. Because unobservable states do not appear at all in observation data, an estimation result showing that the probability of transition to an unobservable state is 0 is obtained.
The disclosed technique has been made in view of the foregoing, and has an object to provide a parameter estimation device, method, and program for estimating parameters of a Markov chain model including unobservable states.
A first aspect of the present disclosure is a parameter estimation device including: an input unit configured to receive input data including a state set of a Markov chain to be estimated, a set of observable states, and censored transition data represented by transitions between the observable states and initial states of the observable states; an estimation unit configured to estimate a parameter by optimizing an objective function including a term representing a degree of match between a transition probability of a first Markov chain generating the censored transition data received by the input unit and a transition probability of a second Markov chain made, by using the parameter, from a model representing the Markov chain to be estimated and the set of the observable states; and an output unit configured to output the parameter estimated by the estimation unit.
A second aspect of the present disclosure is a parameter estimation method including: receiving, by an input unit, input data including a state set of a Markov chain to be estimated, a set of observable states, and censored transition data represented by transitions between the observable states and initial states of the observable states; estimating, by an estimation unit, a parameter by optimizing an objective function including a term representing a degree of match between a transition probability of a first Markov chain generating the censored transition data received by the input unit and a transition probability of a second Markov chain made, by using the parameter, from a model representing the Markov chain to be estimated and the set of the observable states; and outputting, by an output unit, the parameter estimated by the estimation unit.
A third aspect of the present disclosure is a parameter estimation program for causing a computer to function as: an input unit configured to receive input data including a state set of a Markov chain to be estimated, a set of observable states, and censored transition data represented by transitions between the observable states and initial states of the observable states; an estimation unit configured to estimate a parameter by optimizing an objective function including a term representing a degree of match between a transition probability of a first Markov chain generating the censored transition data received by the input unit and a transition probability of a second Markov chain made, by using the parameter, from a model representing the Markov chain to be estimated and the set of the observable states; and an output unit configured to output the parameter estimated by the estimation unit.
According to the disclosed techniques, it is possible to estimate a parameter of a Markov chain model including unobservable states.
Hereinafter, one example of embodiments of the disclosed technique will be described with reference to the drawings. Note that, in the drawings, the same reference numerals are given to the same or equivalent constituent elements and parts. Dimensional ratios in the drawings are exaggerated for convenience of description and thus may differ from actual ratios.
First, prior to describing the details of the embodiments, censored transition data will be described.
As noted above, observation data collected in a real environment is expressed as data in which some states cannot be observed, i.e., censored transition data where observation is partially aborted, because there are unobservable states.
A case in which some states cannot be observed will be described in detail using examples. First, a first example is movement history data of a vehicle in an area provided by a taxi company or the like. The movement history data is data obtained by converting location information such as Global Positioning System (GPS) data, for example. In this case, the movement of the vehicle can be expressed as a Markov chain where each point in the range of travel of the vehicle is a state and each movement of the vehicle between the points is a state transition.
Meanwhile, as shown in
Thus, as indicated by solid arrows and a dashed arrow in
A second example of a case in which some states cannot be observed is movement history data from a railway or bus operating company. The movement history data in this case is data indicating a history of movement between stations, between bus stops, and between stations and bus stops, recorded when users present IC cards or the like at the time of entering/exiting a station or getting on/off a bus.
As an ideal situation, as shown in
Thus, similarly to the example described above, the observation data in this example is also expressed as censored transition data, which represents transitions between observable states only, as indicated by solid arrows and dashed arrows in
As noted above, existing parameter estimation techniques cannot estimate parameters of an original Markov chain having observable and unobservable states from censored transition data. Thus, the disclosed technology proposes an approach to estimating parameters of an original Markov chain from censored transition data. In the disclosed technique, a theory related to a Markov chain (hereinafter referred to as “censored Markov chain”) having unobservable states is utilized. Embodiments according to the disclosed technique will be described in detail after the Markov chain and the censored Markov chain are described.
Note that in the present specification, “<<A>>” represents the letter A in cursive in mathematical equations (A is an arbitrary symbol), and “<A>” represents a bold letter A in mathematical equations.
Assume that <<X>>={1, 2, . . . , |<<X>>|} is a set of states. The Markov chain at discrete times on the state set <<X>> is defined as a stochastic process {Xt; t=1, 2, . . . } having the Markov property shown in Equation (1) below.
A Markov chain can be defined by a set of three elements {<<X>>, <<P>>, q}. Here, <<P>>: <<X>>×<<X>>→[0, 1] is the transition probability and q: <<X>>→[0, 1] is the initial state probability, which are defined as in Equation (2) below.
[Math. 2]
<<P>>(xnext|x) := Pr(Xt+1=xnext|Xt=x) and q(x0) := Pr(X0=x0) (2)
Hereinafter, a Markov chain is assumed to be an irreducible Markov chain.
Further, the definition of a censored Markov chain is given. A censored Markov chain may be referred to as a censored process, a watched Markov chain, or an induced chain (References 1 to 3).
Reference 1: John G Kemeny, J Laurie Snell, and Anthony W Knapp, “Denumerable Markov Chains”, Vol. 40. Springer-Verlag New York, 1976.
Reference 2: David A Levin and Yuval Peres, “Markov Chains and Mixing Times”, Vol. 107. American Mathematical Soc., 2017.
Reference 3: Y Quennel Zhao and Danielle Liu, “The Censored Markov Chain and the Best Augmentation”, Journal of Applied Probability, Vol. 33, No. 3, pp. 623-629, 1996.
It is assumed that <<O>> is a subset of the state set <<X>>, where <<O>>⊆<<X>>. <<O>> represents the set of observable states. Similarly, the set of unobservable states is written as <<U>>. The censored Markov chain {Xct; t=1, 2, . . . } is defined such that the state Xct at time t is the observable state that appears at the t-th position when the states that are unobservable in the original Markov chain {Xt′; t′=1, 2, . . . } are ignored. The times at which observable states appear in the original Markov chain are written as σ0, σ1, . . . , σt, so that Xct=Xσt. Intuitively, the censored Markov chain can be said to consist of only the observable states extracted from the original Markov chain. The strict definition is as follows.
[Math. 3]
The sequence of times {σt; t=0, 1, 2, . . . } at which Xt ϵ <<O>> is defined as:
σ0=0 (if X0 ϵ <<O>>), σ0=inf{m≥1: Xm ϵ <<O>>} (otherwise), σt=inf{m>σt−1: Xm ϵ <<O>>}.
The sequence Xct:=Xσt obtained by observing Xt only at the times σt is referred to as a censored Markov chain.
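Intuitively, this definition simply drops the unobservable states from an observed path. The following illustrative sketch (a hypothetical helper, not part of the embodiments) makes the construction of the censored sequence concrete:

```python
def censor(path, observable):
    """Extract the censored sequence X^c_t: keep the states belonging
    to the observable set O, in order, dropping unobservable states."""
    return [x for x in path if x in observable]

# States 0 and 1 are observable; state 2 is unobservable.
censored = censor([0, 2, 2, 1, 0, 2, 1], {0, 1})
```

For the path above, the censored sequence is [0, 1, 0, 1]; the two-step excursion through state 2 collapses into a single apparent transition from 0 to 1.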
Hereinafter, it is assumed that the states are rearranged without loss of generality so that the observable states come first, and the matrix representation <P> of the transition probability of the Markov chain, where (<P>)xx′=<<P>>(x′|x), and the vector representation <q> of the initial state probability, where (<q>)x=q(x), are given by Equation (3) below.
The matrices <P>oo, <P>ou, <P>uo, and <P>uu have sizes of |<<O>>|×|<<O>>|, |<<O>>|×|<<U>>|, |<<U>>|×|<<O>>|, and |<<U>>|×|<<U>>|, respectively.
The following results are shown for the censored Markov chain.
(Theorem 1) The censored Markov chain is a Markov chain in accordance with the transition probability matrix shown in Equation (4) below.
[Math. 5]
<R> := <P>oo+<P>ou(I−<P>uu)−1<P>uo (4)
The following theorem can be derived for the initial state probability by a proof substantially similar to that described above.
(Theorem 2) The initial state probability of the censored Markov chain is the initial state vector shown in Equation (5) below.
[Math. 6]
<s> := <q>o+<q>u(I−<P>uu)−1<P>uo (5)
Theorems 1 and 2 show that the censored Markov chain made from the Markov chain {<<X>>, <<P>>, q} and the set of observable states <<O>> is a Markov chain {<<O>>, <<R>>, s}. Here, <<R>> is the transition probability according to the transition probability matrix <R> described above, and s is the initial state probability according to the initial state vector <s> described above.
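Equations (4) and (5) can be computed directly with linear algebra. The following sketch uses a hypothetical 3-state chain in which state 2 is unobservable (the numerical values are illustrative only), and confirms that the resulting (<R>, <s>) form a valid Markov chain on <<O>>:

```python
import numpy as np

# Hypothetical 3-state chain; states 0 and 1 observable, state 2 not.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.6, 0.4, 0.0]])
q = np.array([0.3, 0.3, 0.4])

o = [0, 1]                       # observable indices (set O)
u = [2]                          # unobservable indices (set U)
Poo = P[np.ix_(o, o)]
Pou = P[np.ix_(o, u)]
Puo = P[np.ix_(u, o)]
Puu = P[np.ix_(u, u)]
I = np.eye(len(u))

# Equation (4): transition matrix of the censored Markov chain.
R = Poo + Pou @ np.linalg.inv(I - Puu) @ Puo
# Equation (5): initial-state vector of the censored Markov chain.
s = q[o] + q[u] @ np.linalg.inv(I - Puu) @ Puo
```

The term (I−<P>uu)−1 sums the probabilities of all excursions of any length through the unobservable states, so each row of R and the vector s again sum to 1.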
Hereinafter, embodiments according to the disclosed techniques will be described.
Next, a hardware configuration of the parameter estimation device 10 according to the present embodiment will be described.
As illustrated in
The CPU 11 is a central processing unit that executes various programs and controls each unit. In other words, the CPU 11 reads a program from the ROM 12 or the storage 14 and executes the program using the RAM 13 as a work area. The CPU 11 performs control of each of the components described above and various arithmetic operation processes in accordance with a program stored in the ROM 12 or the storage 14. In the present embodiment, a parameter estimation program for executing the parameter estimation process described below is stored in the ROM 12 or the storage 14.
The ROM 12 stores various programs and various kinds of data. The RAM 13 is a work area that temporarily stores a program or data. The storage 14 is constituted by a hard disk drive (HDD) or a solid state drive (SSD) and stores various programs including an operating system and various kinds of data.
The input device 15 includes a pointing device such as a mouse and a keyboard and is used for performing various inputs.
The display device 16 is, for example, a liquid crystal display and displays various kinds of information. The display device 16 may employ a touch panel system and function as the input device 15.
The communication I/F 17 is an interface for communicating with other devices and, for example, uses a standard such as Ethernet (trade name), FDDI, or Wi-Fi (trade name).
Next, a functional configuration of the parameter estimation device 10 will be described.
As illustrated in
The input unit 101 receives input data and stores the input data in the input data storage unit 201. The input data includes the following data (i) to (iii).
Nij represents the number of transitions from an observable state i ϵ <<O>> to an observable state j ϵ <<O>>, and Nkini represents the number of times that the observable state k ϵ <<O>> is observed as an initial state.
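These counts can be tallied from the censored trajectories in a single pass. The following is an illustrative sketch (the function name and trajectory format are assumptions):

```python
from collections import Counter

def count_statistics(trajectories):
    """Tally N_ij (transitions between observable states) and
    N^ini_k (initial observable states) from censored trajectories."""
    N = Counter()
    N_ini = Counter()
    for traj in trajectories:
        if traj:
            N_ini[traj[0]] += 1
            for i, j in zip(traj[:-1], traj[1:]):
                N[(i, j)] += 1
    return N, N_ini

N, N_ini = count_statistics([['a', 'b', 'a'], ['b', 'a']])
```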
The input unit 101 receives setting parameters (details described below) and stores the setting parameters in the setting parameter storage unit 202.
The estimation unit 102 estimates the parameters of the model to be estimated, by using the input data stored in the input data storage unit 201 and the setting parameters stored in the setting parameter storage unit 202. The estimation unit 102 stores the estimated parameters in the model parameter storage unit 203.
Any model that represents the transition probability and the initial state probability of the original Markov chain can be utilized for the model to be estimated. The parameters of the model are written as θ=(η, λ), the model of the transition probability is written as Pη, and the model of the initial state probability is written as qλ. A specific example of the model will be described below. The transition probability and initial state probability of the original Markov chain when this model is used are written as in Equation (6) below.
[Math. 7]
Pr(Xt+1=xj|Xt=xi,θ)=(Pη)ij, Pr(X0=xi|θ)=(qλ)i. (6)
Similarly to Equation (3), it is assumed that states are rearranged without loss of generality, and the matrix representation of the transition probability of the Markov chain, and the vector representation of the initial state probability are given by Equation (7) below.
The estimation unit 102 estimates the parameters by optimizing an objective function. Any function that takes smaller values as the true data-generating distribution and the probability distribution of the model become closer, such as the Kullback-Leibler divergence (KL divergence), can be utilized as the objective function. The following describes a case in which the KL divergence is utilized.
The censored transition data, which is the input data, may be considered to be derived from the censored Markov chain {<<O>>, <R>*, <s>*}. <R>* and <s>* are unknown true parameters. From Theorems 1 and 2, the transition probability and the initial state probability of the censored Markov chain made from the model Pη, qλ and observable states <<O>> are given by <R>η and <s>ηλ in the following Equation (8).
[Math. 9]
Pr(Xt+1c=xj|Xtc=xi,θ)=(Rη)ij, Rη := Pooη+Pouη(I−Puuη)−1Puoη, Pr(X0c=xi|θ)=(sη,λ)i, sη,λ := qoλ+quλ(I−Puuη)−1Puoη. (8)
Thus, the linear sum of the KL divergence between <R>η and <R>*, the KL divergence between <s>η,λ and <s>*, and a regularization term that prevents divergence of the estimated parameters can be utilized as the objective function. Omitting terms that do not depend on the parameters, the objective function can be defined by Equation (9) below.
[Math. 10]
<<L>>(θ)=−Σij Nij log(Rη)ij−α Σk Nkini log(sη,λ)k+βΩ(θ). (9)
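Equation (9) can be sketched as follows, taking the regularization term Ω(θ) to be the squared L2 norm (one of the choices named below; all variable shapes here are illustrative assumptions):

```python
import numpy as np

def objective(N, N_ini, R, s, theta, alpha=1.0, beta=0.01):
    """Equation (9): weighted negative log-likelihoods of the censored
    transition counts N and initial-state counts N_ini under the
    model's censored chain (R, s), plus a regularizer on theta."""
    trans_term = -np.sum(N * np.log(R))
    init_term = -alpha * np.sum(N_ini * np.log(s))
    reg_term = beta * np.sum(theta ** 2)   # Omega(theta): squared L2 norm
    return trans_term + init_term + reg_term

# Toy values: a uniform censored chain over two observable states.
val = objective(np.ones((2, 2)), np.ones(2),
                np.full((2, 2), 0.5), np.full(2, 0.5), np.zeros(3))
```

Minimizing the first two terms over θ is equivalent to minimizing the KL divergences to the empirical transition and initial-state frequencies, since the omitted terms do not depend on θ.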
Here, Ω(θ) is a regularization term for the parameters, and any regularizer, such as the L2 norm, can be used. α and β are hyperparameters that define the contribution of each term to the objective function.
Any optimization technique, such as a gradient method or Newton's method, can be applied to the optimization of the objective function. In a case where a gradient method is utilized, the parameters are simply updated repeatedly according to Expression (10) below at the k-th optimization step.
[Math. 11]
θk+1←θk−γk∇θ<<L>>(θk). (10)
Here, γk is a learning rate parameter. The gradient ∇θ<<L>>(θ) of the objective function may be computed from an analytically derived expression, or may be approximated numerically.
Here, examples of the models Pη and qλ are given. The model Pη for the transition probability may use the model shown in Equation (11) below, having a parameter η={<v>base, <v>ftr}.
[Math. 12]
(Pη)ij=exp{g(i,j;η)}/Σkexp{g(i,k;η)}, (11)
Here, g(i, j; η) is a score function defined by g(i, j; η)=(<v>base)ij+φ(i, j)T<v>ftr, where φ(i, j) is a feature vector. The feature vector φ(i, j) is a vector of arbitrary attribute information relating to the states i and j, and may represent, for example, a geographic distance between the states.
Similarly, the model qλ for the initial state probability may use the model shown in Equation (12) below having a parameter λ={<w>base, <w>ftr}.
[Math. 13]
(qλ)i=exp{h(i;λ)}/Σkexp{h(k;λ)}, (12)
Here, h(i; λ) is a score function defined by h(i; λ)=(<w>base)i+Ψ(i)T<w>ftr, where Ψ(i) is a feature vector. The feature vector Ψ(i) is a vector of arbitrary attribute information relating to the state i, and may represent, for example, whether or not the state is in a commercial region.
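The softmax models of Equations (11) and (12) can be sketched as follows. The feature choices (distance |i−j| for φ(i, j), the state index for Ψ(i)) and all numerical values are illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

n = 3
v_base = np.zeros((n, n))                    # per-pair bias v^base
v_ftr = np.array([-1.0])                     # feature weight v^ftr
# phi(i, j): a single illustrative feature, the distance |i - j|.
phi = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))[..., None].astype(float)

G = v_base + phi @ v_ftr                     # score g(i, j; eta)
P_eta = softmax(G, axis=1)                   # Equation (11): row-wise softmax

w_base = np.zeros(n)
w_ftr = np.array([0.5])
psi = np.arange(n, dtype=float)[:, None]     # Psi(i): illustrative feature
q_lam = softmax(w_base + psi @ w_ftr)        # Equation (12)
```

With the negative distance weight, nearby states receive higher transition probability, which shows how attribute information shapes the estimated chain even for states never observed directly.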
The output unit 103 reads out and outputs the model parameter θ=(η, λ) from the model parameter storage unit 203. From this model parameter θ, the transition probability Pη and the initial state probability qλ of the original Markov chain are obtained.
Note that in a case where all of the states are observable states <<X>>=<<O>>, the problem setting in the present embodiment is a problem of estimating the parameters from normal transition data in an ideal environment, rather than censored transition data (NPL 1).
Next, the operation of the parameter estimation device 10 will be described.
At step S101, the CPU 11 receives, as the input unit 101, the state set of the original Markov chain <<X>>, the set of observable states <<O>>, and the censored transition data D, which are input data, and stores the input data in the input data storage unit 201. The CPU 11 receives, as the input unit 101, setting parameters such as hyperparameters α and β of the objective function, and the learning rate parameter γk used during optimization, and stores the parameters in the setting parameter storage unit 202.
Next, at step S102, the CPU 11 reads, as the estimation unit 102, the input data from the input data storage unit 201, reads out the setting parameters from the setting parameter storage unit 202, and defines the objective function as illustrated in Equation (9), for example.
Next, at step S103, the CPU 11 initializes, as the estimation unit 102, the model parameter θ within the objective function defined at step S102 above.
Next, at step S104, the CPU 11 calculates, as the estimation unit 102, the gradient ∇θ<<L>>(θ) of the objective function in the model parameter θ, and updates θ by Expression (10).
Next, at step S105, the CPU 11 increments, as the estimation unit 102, the count of the number of repetitions of the optimization step of the objective function by one.
Next, at step S106, the CPU 11 determines, as the estimation unit 102, whether or not the number of repetitions exceeds a predetermined maximum number of repetitions. In a case where the number of repetitions exceeds the maximum number, the process proceeds to step S107. In a case where the number of repetitions does not exceed the maximum number, the process returns to step S104.
At step S107, the CPU 11 stores, as the estimation unit 102, the estimated model parameter θ in the model parameter storage unit 203. Then, the CPU 11 reads out and outputs, as the output unit 103, the model parameter θ stored in the model parameter storage unit 203, and the parameter estimation process ends.
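The processing of steps S102 to S107 can be sketched end to end as follows. This is an illustrative implementation under stated assumptions: the model is parameterized directly by per-entry logits rather than the feature-based models of Equations (11) and (12), the gradient is approximated by central finite differences rather than derived analytically, and all counts are toy values:

```python
import numpy as np

def censored_params(theta, n, obs):
    """Build P (row-softmax over logits) and q from theta, then apply
    Equations (4)-(5) to obtain the censored chain (R, s)."""
    logits_p = theta[:n * n].reshape(n, n)
    logits_q = theta[n * n:]
    P = np.exp(logits_p) / np.exp(logits_p).sum(axis=1, keepdims=True)
    q = np.exp(logits_q) / np.exp(logits_q).sum()
    u = [x for x in range(n) if x not in obs]
    A = np.linalg.inv(np.eye(len(u)) - P[np.ix_(u, u)]) @ P[np.ix_(u, obs)]
    R = P[np.ix_(obs, obs)] + P[np.ix_(obs, u)] @ A   # Equation (4)
    s = q[obs] + q[u] @ A                             # Equation (5)
    return R, s

def loss(theta, N, N_ini, n, obs, alpha=1.0, beta=1e-3):
    """Objective of Equation (9) with a squared-L2 regularizer."""
    R, s = censored_params(theta, n, obs)
    return (-np.sum(N * np.log(R)) - alpha * np.sum(N_ini * np.log(s))
            + beta * np.sum(theta ** 2))

def estimate(N, N_ini, n, obs, steps=300, lr=0.05, eps=1e-5):
    """Steps S103-S106: initialize theta, then repeat the gradient
    update of Expression (10) for a fixed number of repetitions."""
    theta = np.zeros(n * n + n)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(theta.size):   # central finite-difference gradient
            d = np.zeros_like(theta)
            d[i] = eps
            grad[i] = (loss(theta + d, N, N_ini, n, obs)
                       - loss(theta - d, N, N_ini, n, obs)) / (2 * eps)
        theta -= lr * grad            # Expression (10)
    return theta

# Toy censored counts N_ij and N^ini_k over observable states O = {0, 1}
# of a hypothetical 3-state chain (state 2 unobservable).
N = np.array([[4.0, 6.0], [7.0, 3.0]])
N_ini = np.array([5.0, 5.0])
theta = estimate(N, N_ini, 3, [0, 1])
R, s = censored_params(theta, 3, [0, 1])
```

After optimization, the censored chain (R, s) implied by the estimated θ approaches the empirical frequencies in the counts, while the full matrix P also assigns probability mass to the unobservable state, which the count-based estimator of NPL 1 cannot do.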
As described above, the parameter estimation device according to the present embodiment receives the input data including the state set <<X>> of the Markov chain to be estimated, the set of observable states <<O>>, and the censored transition data D. Then, the parameter estimation device estimates the parameter θ=(η, λ) by optimizing an objective function including terms representing the degree of match between the transition probabilities and between the initial state probabilities of the following two censored Markov chains. The first is the censored Markov chain, with transition probability <R>* and initial state probability <s>*, that generates the censored transition data D. The second is the censored Markov chain, with transition probability <R>η and initial state probability <s>η,λ, made from the model representing the Markov chain to be estimated by using the parameter θ=(η, λ) and the set of observable states <<O>>. In this way, the parameter estimation device according to the present embodiment can estimate parameters of an original Markov chain including unobservable states from censored transition data. This allows a system represented by such a Markov chain to be analyzed in greater detail.
Note that in the embodiments described above, a case has been described in which a gradient method is used in the optimization of the objective function for estimation of the model parameters, but the present invention is not limited thereto, and any optimization technique, such as Newton's method, can be used. The model of the state transition probability, the model of the initial state probability, and the regularization term of the objective function in the embodiments described above are merely examples, and any other suitable models and regularization terms can be used.
In the embodiments described above, a case has been described in which both the term representing the degree of match of the transition probability and the term representing the degree of match of the initial state probability are included in the objective function, but the objective function according to the disclosed techniques may include at least a term representing the degree of match of the transition probability.
Note that, in each of the embodiments described above, various processors other than the CPU may execute the parameter estimation process that the CPU executes by reading software (a program). Examples of the processor in such a case include a programmable logic device (PLD) such as a field-programmable gate array (FPGA), whose circuit configuration can be changed after manufacturing, a dedicated electric circuit such as an application specific integrated circuit (ASIC), which is a processor having a circuit configuration designed exclusively for executing a specific process, and the like. The parameter estimation process may be executed by one of such various processors or may be executed by a combination of two or more processors of the same type or different types (for example, a plurality of FPGAs, a combination of a CPU and an FPGA, or the like). More specifically, the hardware structure of such various processors is an electrical circuit obtained by combining circuit devices such as semiconductor devices.
In each of the embodiments described above, although a form in which the parameter estimation program is stored (installed) in the ROM 12 or the storage 14 in advance has been described, the form is not limited thereto. The program may be provided in the form of being stored in a non-transitory storage medium such as a compact disc read only memory (CD-ROM), a digital versatile disc read only memory (DVD-ROM), or a universal serial bus (USB) memory. The program may also be in a form that is downloaded from an external device via a network.
With respect to the above embodiments, the following supplements are further disclosed.
A parameter estimation device including:
A non-transitory recording medium storing a program executable by a computer to perform a parameter estimation process,
10 Parameter estimation device
11 CPU
12 ROM
13 RAM
14 Storage
15 Input device
16 Display device
17 Communication I/F
19 Bus
101 Input unit
102 Estimation unit
103 Output unit
200 Storage unit
201 Input data storage unit
202 Setting parameter storage unit
203 Model parameter storage unit
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/025472 | 6/26/2019 | WO | 00 |