DYNAMIC DISTRIBUTION ESTIMATION DEVICE, METHOD, AND PROGRAM

Information

  • Patent Application
  • 20210035000
  • Publication Number
    20210035000
  • Date Filed
    February 08, 2019
    5 years ago
  • Date Published
    February 04, 2021
    3 years ago
Abstract
An object is to estimate parameters of a model including censored data at high speed, with a saved memory, and with the parameters having temporal continuity. A responsibility update unit 19 updates a responsibility based on data of a newly observed sample. Then, a moment update unit 20 updates a moment based on the data of the newly observed sample. Then, a statistic update unit 21 updates each statistic based on the responsibility or the moment. Then, a parameter update unit 22 updates a parameter related to a component based on the statistics.
Description
TECHNICAL FIELD

The present invention relates to dynamic distribution estimation device, method, and program.


BACKGROUND ART

Censored data refers to data in which, for a sample whose observed value is equal to or greater than a certain threshold (or equal to or less than a certain value), only information indicating that the value is equal to or greater than the threshold but is observed as no specific value is obtained. As such censored data, there are represented a variety of data such as clinical data describing the onset of a disease, death of a person, and the like, contract history data of an Internet line user, and use history data of a service on an e-commerce site. Similarly to the above examples, represented as censored data is also data that is related to the arrival times of spectators in the vicinity of the location of an event, such as a music concert of a famous artist or an international game of a popular sport and is collected on the day of the event. FIG. 7 illustrates a specific example. The total number of expected visitors, which is obtained from the total ticket sales, is represented as N, and the number of visitors observed up to the present time on the day of a concert is represented as M. The arrival time data is obtained for the M people who have arrived, but it is only known that the remaining N-M people have not arrived up to the present time. This is typical censored data.


Techniques for estimating the parameters of a mixture model (batchwise) from censored data are proposed in NPL 2 and NPL 3. Here, as an example, an existing technique of a mixture normal distribution which is one of typical mixture models will be described.


<Model>


Consider a situation in which input data is right censored. Right censoring refers to a condition in which, of samples, a value of a sample the value of which is equal to or greater than a known threshold






C∈
custom-character
  [Formula 1]


is unknown. All the data obtained are represented as






custom-character={di}i=1N  [Formula 2]


, where di represents the i-th data, and di=(wi, Xi) is composed of two: a variable wi∈{0, 1} representing whether or not the value of the i-th sample is observed and an observed value,






X
i(≤C)  [Formula 3]


For wi=1, it indicates that a value is observed, and for wi=0, it indicates that no value is observed. Here, N is the total number of samples (including those whose values are not observed), and the number of samples whose values are observed is represented as






M(≤N)  [Formula 4]


In the setting considered in this study, a threshold C is known, and two of X and W are observation variables. In general, a probability density function of a mixture model is defined by the following equation.





[Formula 5]






P(x|θ={π,φ})=Σk=1Kπkfk(x|φk)  (1)


, where K represents the number of components, and





π=(π1, . . . ,πK),φ=(φ1, . . . ,φK)  [Formula 6]


represent parameters of the model.





πkk  [Formula 7]


represent a mixture ratio of the k-th component and a parameter of the component, respectively. In this paper, consider especially a case where a normal distribution is adopted as the component. (The following discussion is also true for a case of considering a mixture model of a distribution belonging to any exponential distribution family such as an exponential distribution.) A probability density function of the normal distribution is given by the following equation using parameters of two types of components, an average μk and a standard deviation σk.









[

Formula





8

]












f


(

x
|

ϕ
k


)


=


1


2

π






σ
k
2







exp


(

-



(

x
-

μ
k


)

2


2


σ
k
2




)


.






(
2
)







Hereinafter, a cumulative density function of the normal distribution is represented by a function F:





[Formula 9]






F(C|φk)=∫−∞Cf(x|φk)dx,  (3)


A process of generating censored data includes the following four steps. First, for each data i, a latent variable representing the component to which the i-th data belongs:






z
i=(zi1, . . . ,ziK)  [Formula 10]


is generated according to the following polynomial distribution. Note that zik=1 for the i-th data belonging to the k-th component, and zik=0 for other k′≠k.





[Formula 11]






P(zi|π)=Mult(zi|π)=Πk=1Kπkzik  (4)


Next, an observation variable wi indicating whether or not a value is observed is generated according to a Bernoulli distribution having a cumulative density function of the following corresponding component as a parameter.














[

Formula





12

]













P


(



w
i

|

z
i


,
φ

)


=





k
=
1

K




Bernoulli


(


w
i

|

F


(

C
|

ϕ
k


)



)



z
ik



=




k
=
1

K




(



F


(

C
|

ϕ
k


)



w
i





(

1
-

F


(

C
|

ϕ
k


)



)


1
-

w
i




)


z
ik








(
5
)







Note that the cumulative density






F(C|φk)  [Formula 13]


represents a probability that the random variable is equal to or less than the threshold C.


Further, for data i with wi=1, that is, with being observable, an observation variable






x
i∈(−∞,C)  [Formula 14]


is generated according to a truncated normal distribution.





[Formula 15]






P(xi|wi=1,zi,ϕ)=Πk=1Kcustom-charactercustom-character(xik,−∞,C)zikwi  (6)


Note that the truncated normal distribution






custom-character
custom-character(x|μ,σ2,a,b)  [Formula 16]


is defined by the following probability density function that does not take a value outside a range [a, b].









[

Formula





17

]















(


x
|
μ

,

σ
2

,
a
,
b

)


=


f


(


x
|
μ

,

σ
2


)




F


(


b
|
μ

,

σ
2


)


-

F


(


a
|
μ

,

σ
2


)








(
7
)







Finally, for the data i with wi=0, that is, with being unobservable, a latent variable yi is generated according to the truncated normal distribution.









[

Formula





18

]












P


(




y
i

|

w
i


=
0

,

z
i

,
ϕ

)


=




k
=
1

K







(



y
i

|

ϕ
k


,
C
,


)




z
ik



(

1
-

w
i


)








(
8
)







By repeating the above for all data i, observed variables X and W and latent variables Z and Y are generated.


Hereafter, for the sake of simplicity of notation, it is assumed that the generated data is rearranged so that for





1≤i≤M  [Formula 19]


, wi=1, and for






M+1≤j≤N  [Formula 20]


, wj=0. Accordingly, a likelihood function of the complete data is given by the following equation using Equations (4), (5), (6), and (8):














[

Formula





21

]













P


(

X
,
W
,
Y
,

Z
|
θ


)


=



(




i
=
1






P


(


z
i

|
π

)




P


(



w
i

|

z
i


,
ϕ

)




)

·

(




i
=
1

M



P


(



x
i

|

w
i


,

z
i

,
ϕ

)



)








(




j
=

M
+
1






P


(



y
j

|

w
j


,

z
j

,
ϕ

)



)






(
9
)







<Batch Type EM Algorithm>


The Expectation-Maximization (EM) algorithm is a technique widely used for estimating a model including latent variables. It is composed of two steps: an E-step of calculating posterior probabilities of latent variables and calculating expected values using them, and an M-step of maximizing a function called a Q-function, which is an average of log likelihood functions with respect to posterior probabilities of latent variables.


In the E-step of this model, two posterior probabilities, P(zi|xi, wi=1, θ) when an observed value is obtained and P(zi|xi, wi=0, θ) when no observed value is obtained, are required, and each is given by the following equation:












P


(



z
i

|

x
i


,


w
i

=
1

,
θ

)


=



P


(


z
i

,



x
i

|

w
i


=
1

,
θ

)



P


(




x
i

|

w
i


=
1

,
θ

)



=



π
k



f


(


x
i

|

ϕ
k


)







k






π
k



f


(


x
i

|

ϕ

k




)







,






P


(


z
j

,



y
j

|

w
j


=
0

,
θ

)


=


P


(




z
j

|

w
j


=
0

,
θ

)




P


(




y
j

|

w
j


=
0

,

z
j

,
φ

)












P


(




z
j

|

w
j


=
0

,
θ

)


=



P


(


z
j

,


w
j

=

0
|
θ



)



P


(


z
j

=

0
|
θ


)



=




π
k



(

1
-

F


(

C
|

ϕ
k


)



)






k






π

k





(

1
-

F


(

C
|

ϕ

k




)



)




.







[

Formula





22

]







Using the above posterior probabilities, responsibilities γ, η of zi, zj and a moment {vk, ξk} of yj can be calculated by the following equation.









[

Formula





23

]













γ


(

z
ik

)


=



π
k



f


(


x
i

|

ϕ
k


)







k






π
k



f


(


x
i

|

ϕ

k




)






,




(
10
)








η


(

z
jk

)


=



π
k



(

1
-

F


(

C
|

ϕ
k


)



)






k






π

k





(

1
-

F


(

C
|

ϕ

k




)



)





,




(
11
)









v
k



(


y
j



:


C

)


=





y
j




[

y
j

]


=


μ
k

+



(

σ
k

)

2




f


(

C
|

ϕ
k


)



1
-

F


(

C
|

ϕ
k


)








,




(
12
)









ξ
k



(


y
j



:


C

)


=





y
j




[

y
j
2

]


=


μ
k
2

+

σ
k
2

+



(

σ
k

)

2





(

C
+

μ
k


)



f


(

C
|

ϕ
k


)




1
-

F


(

C
|

ϕ
k


)








,




(
13
)







Here,






custom-character
y

j
  [Formula 24]


represents an average related to a posterior probability,






P(yj|wj=0,zj,ϕ)  [Formula 25]


For this averaging operation, the results of the first moment and the second moment of the truncated normal distribution are used. Also, as is apparent from Equations (12) and (13),






v
k(yj;C),ξk(yj;C)  [Formula 26]


is hereinafter represented as






v
k(y;C),ξk(y;C)  [Formula 27]


because it does not depend on the subscript j. Using these, a Q-function to be maximized in the M-step is represented by the following equation.














[

Formula





28

]


















Q


(

θ
,

θ
old


)


=




Z
,

Y
|
X

,
W
,

θ
old





[

log






P


(

X
,
W
,
Y
,

Z
|
θ


)



]







(
14
)






=





k
=
1

K




N
k






log






π
k



+




k
=
1

K




-


N
k

2




log


(

2





π






σ
k
2


)




+




k
=
1

K




-

1

2






σ
k
2






{


S

k





2


-

2






S

k





1




μ
k


+


M
k



μ
k
2



}



+




k
=
1

K




-

1

2






σ
k
2







{


U

k





2


-

2






U

k





1




μ
k


+


(


N
k

-

M
k


)



μ
k
2



}

.








(
15
)







Here,





[Formula 29]






M
kj=1Mγ(z1k),Nk=Mk+(Nk−Mk)η(zjk),  (16)






S
k1i=1Mγ(zik)xi,Sk2i=1Mγ(zik)ri2,  (17)






U
k1j=M+1Mγ(zjk)vk(y)=(Nk−Mk)vk(y;C),  (18)






U
k2j=M+1Mγ(zjkk(y)=(Nk−Mkk(y;C),  (19)


Parameters that maximize the Q-function when solved with the partial derivatives set to zero are given as









[

Formula





30

]













π
k
new

=


N
k

N


,




(
20
)








μ
k
new

=


1

N
k




(


S

k





1


+

U

k





1



)



,




(
21
)









(

σ
k
new

)

2

=



1

N
k




(


S

k





2


+

U

k





2



)


-


1

N
k
2





(


S

k





1


+

U

k





1



)

2




,




(
22
)







As a result, a batch type EM algorithm for a mixture model for censored data has been desired. Procedures are summarized in Algorithm 1 shown below. Updating the parameters is repeated through the E-step and the M-step, and in each iteration, the log likelihood function monotonically increases so that convergence to the (local) optimal solution is guaranteed.









TABLE 1





Algorithm 1 Batch type EM Algorithm for Gaussian mixture model


for Censored data

















Input: X, W: input data, C: (right) censored value, K: number



of mixed models



Output: θ = {πk, μk, σk}Kk=1: parameters for maximizing



likelihood function



 Initialize parameters {πk, μk, σk}



 repeat



  E-step: Update responsibilities γ(zi), η(zj) using current



  parameters according to Equations (10) (11), and moments



  νk {yj; C), ξk(yj; C) according to Equations (12) (13)



  M-step: Calculate statistics Mk, Nk, Sk1, Sk2, Uk1, Uk2 using



  current responsibilities according to Equations



  (16) (17) (18) (19), and Update parameter θ according to



  Equations (20) (21) (22)



 until Repeat until convergence conditions are satisfied









CITATION LIST
Non Patent Literature

[NPL 1] Didier Chauveau., “Astochastic em algorithm for mixtures with censored data.”, Journal of statistical planning inference, 46(1): p. 1-25, 1995.

  • [NPL 2] Gyemin Lee and Clayton Scott. “Em algorithms for multivariate gaussian mixture models with truncated and censored data.”, Computational Statistics & Data Analysis, 56(9): p. 2816-2829, 2012.


SUMMARY OF THE INVENTION
Technical Problem

Existing techniques are only able to perform batch type estimation on censored data.


The present invention has been made in view of the above circumstances, and it is an object of the present invention to provide dynamic distribution estimation device, method, and program capable of estimating parameters of a model including censored data at high speed, with a saved memory, and with the parameters having temporal continuity.


Means for Solving the Problem

In order to achieve the above object, a dynamic distribution estimation device according to the present invention is for estimating parameters of a mixture model online, the mixture model being a mixture of arbitrary distributions belonging to an exponential distribution family and representing a distribution of data to be observed. The dynamic distribution estimation device includes an expected value update unit that updates, in a case of assuming that data of each of unobserved samples belongs to each component, an expected value for a distribution, having a sufficient statistic for the data of the unobserved sample, of a truncated component, based on data of a newly observed sample; a statistic update unit that updates a statistic related to each component based on the data of the newly observed sample and the expected value updated by the expected value update unit; and a parameter update unit that updates, for each component, a parameter related to the component based on the statistic updated by the statistic update unit, wherein updating by the expected value update unit, updating by the statistic update unit, and updating by the parameter update unit are repeated each time a predetermined parameter update timing is reached.


Further, a dynamic distribution estimation device according to the present invention is for estimating parameters of a Gaussian mixture model online, the Gaussian mixture model being for a mixture of a plurality of components and representing a distribution of data to be observed. The dynamic distribution estimation device includes a responsibility update unit that updates, based on data of a newly observed sample, a responsibility that indicates a degree to which the data of the newly observed sample belongs to each component, and a responsibility that indicates a degree to which data of each of unobserved samples belongs to each component; a moment update unit that updates, in a case of assuming that the data of each of the unobserved samples belongs to each component, a moment of the data of the unobserved sample; a statistic update unit that updates a statistic for the number of samples belonging to each component among observed samples based on the responsibility that indicates a degree to which the data of the newly observed sample belongs to each component, updates a statistic for the number of samples belonging to each component among all samples based on the responsibility that indicates a degree to which the data of each of the unobserved samples belongs to each component, updates, for each component, a statistic for data of the observed sample belonging to the component based on the responsibility that indicates a degree to which the data of the newly observed sample belongs to each component, and updates, for each component, in a case of assuming that data of each newly observed sample belongs to each component, a statistic for data of the unobserved sample belonging to the component based on the moment of the data of the unobserved sample, the statistic for the number of samples belonging to the component among the observed samples, and the statistic for the number of samples belonging to the component among all the samples; and a parameter update unit that updates, for each component, a parameter related to the component based on the statistic for the number of samples belonging to the component among all the samples, the statistic for the data of the observed sample belonging to the component, and the statistic for the data of the unobserved sample belonging to the component, wherein updating by the responsibility update unit, updating by the moment update unit, updating by the statistic update unit, and updating by the parameter update unit are repeated each time a predetermined parameter update timing is reached.


A dynamic distribution estimation device according to the present invention is for estimating parameters of a mixture model online, the mixture model being a mixture of arbitrary distributions that each represent a distribution of data to be observed and belong to an exponential distribution family. The dynamic distribution estimation device includes a latent variable parameter update unit that updates, based on data of a newly observed sample, a parameter of a variational distribution related to a latent variable of each component for data of the newly observed sample and a parameter of a variational distribution related to a latent variable of each component for data of a set of already observed samples including the newly observed sample; a statistic update unit that updates a statistic for the number of samples belonging to each component among observed samples based on the parameter of the variational distribution for each component for the data of the newly observed sample, updates a statistic for the number of samples belonging to each component among unobserved samples based on the parameter of the variational distribution for each component for the data of the set of already observed samples, updates, for each component, a statistic for data of the observed sample belonging to the component based on the parameter of the variational distribution for each component for the data of the newly observed sample, and updates, based on the data of the set of already observed samples and the statistic for the number of samples belonging to each component among the unobserved samples, the statistic for the data of the unobserved sample belonging to the component; and a parameter update unit that updates, for each component, a parameter of a variational distribution related to a parameter of the component based on the number of samples belonging to the component among all samples, the statistic for the data of the observed sample belonging to the component, and the statistic for the data of the unobserved sample belonging to the component, wherein updating by the latent variable parameter update unit, updating by the statistic update unit, and updating by the parameter update unit are repeated each time a predetermined parameter update timing is reached.


The parameter update timing of the present invention can be any one of a timing at which the data of the newly observed sample is obtained, a timing at which a predetermined number of pieces of data of the newly observed sample are obtained, and a timing when a predetermined update time is reached.


A dynamic distribution estimation method of the present invention is a dynamic distribution estimation device for estimating parameters of a mixture model online, the mixture model being a mixture of arbitrary distributions belonging to an exponential distribution family and representing a distribution of data to be observed. The dynamic distribution estimation method includes the steps of: by an expected value update unit, updating, in a case of assuming that data of each of unobserved samples belongs to each component, an expected value for a distribution, having a sufficient statistic for the data of the unobserved sample, of a truncated component, based on data of a newly observed sample; by a statistic update unit, updating a statistic related to each component based on the data of the newly observed sample and the expected value updated by the expected value update unit; and by a parameter update unit, updating, for each component, a parameter related to the component based on the statistic updated by the statistic update unit, wherein updating by the expected value update unit, updating by the statistic update unit, and updating by the parameter update unit are repeated each time a predetermined parameter update timing is reached.


A dynamic distribution estimation method of the present invention is for a dynamic distribution estimation device estimating parameters of a Gaussian mixture model online, the Gaussian mixture model being for a mixture of a plurality of components and representing a distribution of data to be observed. The dynamic distribution estimation method includes the steps of: by a responsibility update unit, updating, based on data of a newly observed sample, a responsibility that indicates a degree to which the data of the newly observed sample belongs to each component, and a responsibility that indicates a degree to which data of each of unobserved samples belongs to each component; by a moment update unit, updating, in a case of assuming that the data of each of the unobserved samples belongs to each component, a moment of the data of the unobserved sample; by a statistic update unit, updating a statistic for the number of samples belonging to each component among observed samples, based on the responsibility that indicates a degree to which the data of the newly observed sample belongs to each component, updating a statistic for the number of samples belonging to each component among all samples, based on the responsibility that indicates a degree to which the data of each of the unobserved samples belongs to each component, updating, for each component, a statistic for data of the observed sample belonging to the component based on the responsibility that indicates a degree to which the data of the newly observed sample belongs to each component, and updating, for each component, in a case of assuming that data of each newly observed sample belongs to each component, a statistic for data of the unobserved sample belonging to the component based on the moment of the data of the unobserved sample, the statistic for the number of samples belonging to the component among the observed samples, and the statistic for the number of samples belonging to the component among all the samples; and by a parameter update unit, updating, for each component, a parameter related to the component based on the statistic for the number of samples belonging to the component among all the samples, the statistic for the data of the observed sample belonging to the component, and the statistic for the data of the unobserved sample belonging to the component, wherein updating by the responsibility update unit, updating by the moment update unit, updating by the statistic update unit, and updating by the parameter update unit are repeated each time a predetermined parameter update timing is reached.


A dynamic distribution estimation method of the present invention is for a dynamic distribution estimation device estimating parameters of a mixture model online, the mixture model being a mixture of arbitrary distributions that each represent a distribution of data to be observed and belong to an exponential distribution family. The dynamic distribution estimation method includes the steps of: by a latent variable parameter update unit, updating, based on data of a newly observed sample, a parameter of a variational distribution related to a latent variable of each component for data of the newly observed sample and a parameter of a variational distribution related to a latent variable of each component for data of a set of already observed samples including the newly observed sample; by a statistic update unit, updating a statistic for the number of samples belonging to each component among observed samples based on the parameter of the variational distribution for each component for the data of the newly observed sample, updating a statistic for the number of samples belonging to each component among unobserved samples based on the parameter of the variational distribution for each component for the data of the set of already observed samples, updating, for each component, a statistic for data of the observed sample belonging to the component based on the parameter of the variational distribution for each component for the data of the newly observed sample, and updating, based on the data of the set of already observed samples and the statistic for the number of samples belonging to each component among the unobserved samples, the statistic for the data of the unobserved sample belonging to the component; and by a parameter update unit, updating, for each component, a parameter of a variational distribution related to a parameter of the component based on the number of samples belonging to the component among all samples, the statistic for the data of the observed sample belonging to the component, and the statistic for the data of the unobserved sample belonging to the component, wherein updating by the latent variable parameter update unit, updating by the statistic update unit, and updating by the parameter update unit are repeated each time a predetermined parameter update timing is reached.


A program according to the present invention is a program for functioning as the units of the dynamic distribution estimation device of the present invention.


Effects of the Invention

As described above, according to the dynamic distribution estimation device, method, and program of the present invention, by estimating parameters of an arbitrary distribution being for a mixture of a plurality of components and belonging to an exponential distribution family, an effect is obtained, capable of estimate parameters of a model including censored data at high speed, with a saved memory, and with the parameters having temporal continuity.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is an explanatory diagram for describing a sequential update online algorithm.



FIG. 22 is an explanatory diagram for describing an update timing.



FIG. 3 is a schematic diagram illustrating a configuration example of a dynamic distribution estimation device according to a first embodiment.



FIG. 4 is a flowchart illustrating the contents of a dynamic distribution estimation process routine in the dynamic distribution estimation device according to the first embodiment.



FIG. 5 is a schematic diagram illustrating a configuration example of a dynamic distribution estimation device according to a second embodiment.



FIG. 6 is a flowchart illustrating the contents of a dynamic distribution estimation process routine in the dynamic distribution estimation device according to the second embodiment.



FIG. 7 is an explanatory diagram for describing censored data.





DESCRIPTION OF EMBODIMENTS

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.


<Outline of Embodiments of the Present Invention>


In a situation where data is being collected on the day of an event, as time passes, data on newly arrived spectators can be observed, and the data is constantly updated. In estimating the model parameters of an arrival time distribution in such a situation, an online algorithm is useful which updates parameters sequentially with newly arrived data reflected (e.g., see a reference (Olivier Cappe and Eric Moulines. “On-line expectation-maximization algorithm for latent data models.”, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3) p. 593-613, 2009)).


Therefore, in an embodiment of the present invention, an online EMCM algorithm (online EM algorithm for Censored Mixture models), which is an algorithm for estimating the model parameters of the arrival time online from censored data that is constantly updated, is constructed. This technique has an advantage over the batch type method in the following three points.


(1) Memory Saving


The proposed algorithm of the present embodiment can update the parameters only with sufficient statistics held, and does not need to hold all the arrival times of the arrived spectators. This is also good from a privacy standpoint.


(2) High Speed


The proposed algorithm of the present embodiment is updated using the above-described statistics and the amount calculated from newly observed data. The proposed algorithm of the present embodiment can perform a parameter update process in less steps than the batch type method in which calculation is performed using all data. Particularly, in concerts and sport events as described above, the number of visitors is the order of tens of thousands, and it is preferable to avoid the batch process using all data at each time.


(3) Estimated Parameters Having Temporal Continuity


A parameter output at each time by the proposed algorithm of the present embodiment is a value continuously varying from the parameter at the previous time. If a process of re-applying the batch type method is performed at each time, a local optimal solution having a different objective function may be reached, resulting in an output of a parameter completely different from that at the previous time, which is not preferable in practice. By contrast, the present embodiment does not have such a problem.


In the present embodiment, three types of algorithms: (a) sequential update type, (b) mini-batch type, and (c) schedule type, which have different parameter update timings so as to match more various real system implementations are provided. These three are all algorithms having the above-described three advantages. This makes it possible to use the present technique in any of the cases where the parameters are updated immediately when new data is obtained, where the parameters are updated after some pieces of related data have been collected, and where the parameters are updated at a fixed timing. Also, although the above-described example is for estimation of the arrival time distribution, the present embodiment can be widely used for estimation of the parameters of censored data.


Note that, in the present embodiment, a mixture model is adopted as a model of the arrival time. This is because it is expected that, in the events as described above, a distribution of the arrival times of spectators has multi-modality depending on whether they purchase goods such as artist goods and uniforms before the start of the event, or they are just in time for the start of the event.


Note that the present embodiment can be applied almost in the same manner even for a mixture model of a probability distribution other than a normal distribution such as an exponential distribution or a lognormal distribution.


The batch type EM algorithm of Algorithm 1 calculates responsibilities for all data in the memory in the E-step, and uses them to repeatedly calculate statistics,






N
k
,S
k1
,S
k2  [Formula 31]


in the M-step. This means that it is necessary to hold all the values of data X in the memory and to read out the entire memory at each iteration. On the other hand, the online EMCM algorithm, which is our proposed algorithm (online Expectation-Maximization algorithm for Censored Mixture models), uses only newly observed data without having held all the data in the memory, and calculates responsibilities and statistics to update the parameters.


<Sequential Update Type Online EM Algorithm>


First, a sequential update type algorithm that updates parameters each time data xt is newly observed will be described. In the algorithm, the data observation and the update timing coincide as illustrated in FIG. 1. The proposed algorithm is derived taking advantage of the batch type algorithm enabling statistics to be expressed in sequential form. When a statistic at the time when data xt-1 is observed is represented by a superscript (t−1), specific sequential calculation equations are given by the following:






M
k
(t)
=M
k
(t-1)+γ(ztk),  (23)






N
k
(t)
=M
k
(t)+(N−M)η(zjk),  (24)






S
k1
(t)
=S
k1
(t-1)+γ(ztk)xt,  (25)






S
k2
(t)
=S
k2
(t-1)+γ(ztk)xt2,  (26)






U
k1
(t)=(Nk(t)−Mk(t))vk(y:xt),  (27)






U
k2
(t)=(Nk(t)−Mk(t)k(yj:xt),  (28)


First, in the E-step, responsibilities to be calculated using the new observation data xt are calculated. Observation of this data indicates that unobserved data at that time is also equal to or greater than xt, and moments are calculated accordingly. After that, in the M-step, the statistics are calculated using the above-described sequential equations to update the parameters. By performing these each time new data arrives, the parameters are estimated. A procedure of the proposed algorithm is summarized in Algorithm 2.









TABLE 2





Algorithm 2 Online EM Algorithm (Sequential type) for Censored


data

















Input: N: number of data, K: number of mixed models



Output: {θ(t)}Nt=1



 1: Initialize parameters {πk(0), μk(0), σk(0)} and statistics



Mk(0), Sk1(0), Sk2(0)



 2: Initialize number of observed data M ← 0



 3: for t = 1 to N do



 4: Acquire data xt



 5: Update number of data and threshold: M ← M + 1, C ← xt



 6: E-step: Calculate responsibilities γ(zi), η(zj) of data



  xt according to Equations (10) (11) , and moments νk(yj; xt),



  ξk(yj; xt) according to Equations (12) (13)



 7: M-step: Update/Calculate statistics Mk(t), Nk(t), Sk1(t),



  Sk2(t), Uk1(t), Uk2(t) using the responsibilities according



  to Equations (23) (24) (25) (26) (27) (28), and Update



  parameter θ(t) according to Equations (20) (21) (22) using



  the statistics



 8: end for









<Mini-Batch Type and Schedule Type Algorithms>


In the previous section, the parameters are updated each time the data is updated. However, data every time is not necessarily required, and algorithms with different parameter update timings can be derived so as to be adapted to various real system implementations. Therefore, in this section, in addition to (a) the sequential update type in the previous section, two types of algorithms, (b) a mini-batch type and (c) a schedule type illustrated in FIG. 2 are provided.


First, the mini-batch type will be described. In this method, the number B of data to be stored before parameter update (which is referred to as a mini-batch size) is determined in advance, and the parameter update is performed at the time when the number of data is stored. In the calculation of the E-step, the M-step obtains data on responsibilities and moment calculated from the mini-batch as described below, and updates the statistics as described below. Since the number of executions of the M-step is reduced as compared with the sequential update type, the processing time can be further reduced.









[

Formula





33

]













M
k

(

)


=


M
k

(


-
1

)


+




t









γ


(

z
tk

)





,




(
29
)








N
k

(

)


=


M
k

(

)


+


(

N
-
M

)



η


(

z
jk

)





,




(
30
)








S

k





1


(

)


=


S

k





1


(


-
1

)


+




t










γ


(

z
tk

)




x
t





,




(
31
)








S

k





2


(

)


=


S

k





2


(


-
1

)


+




t










γ


(

z
tk

)




x
t
2





,




(
32
)








U

k





1


(

)


=


(


N
k

(

)


-

M
k

(

)



)




v
k



(

y


:









max







x
t














x
t



)




,




(
33
)








U

k





2


(

)


=


(


N
k

(

)


-

M
k

(

)



)




ξ
k



(


y
j



:








max


x
t














x
t



)




,




(
34
)







A procedure of the proposed algorithm is summarized in Algorithm 3. Note that the symbol,





└·┘  [Formula 34]


represents a floor function that returns an integer not exceeding an input value. Next, (c) the schedule type will be described. The algorithm is almost the same as (b) the mini-batch type.









TABLE 3





Algorithm 3 Online EM Algorithm (Mini-batch type) for Censored


data

















Input: N: number of data, K: number of mixed models, B:



mini-batch size



Output: {θ(l)}Ll=1



 1: Initialize parameters {πk(0), μk(0), σk(0)} and statistics



Mk(0), Sk1(0), Sk2(0)



 2: Initialize number of observed data M ← 0



 3: for l = 1 to [N/B] do



 4: Acquire data D1



 5: Update number of data and threshold: M ← M + B, C ←



max_(xtϵD1) xt



 6: E-step: Calculate responsibilities γ(zi), η(zj) of data



  xt according to Equations (10) (11), and moments νk(yj; C),



  ξk(yj; C) according to Equations (12) (13)



 7: M-step: Update/Calculate statistics Mk(l), Nk(l), Sk1(l),



  Sk2(l), Uk1(l), Uk2(l) using the responsibilities according



  to Equations (29) (30) (31) (32) (33) (34), and Update



  parameter θ(1) according to Equations (20) (21) (22) using



  the statistics



 8: end for









Here, the statistics,






custom-character,custom-character  [Formula 35]


are different.





[Formula 36]






custom-character=(custom-charactercustom-character)vk(y;Tl),  (35)






custom-character=(custom-charactercustom-characterk(yj;Tl).  (36)


A procedure of the proposed algorithm is summarized in Algorithm 4. Note that a method in which all or some of the update methods following the above three types of algorithms are mixed, for example, a method using both the mini-batch and the update schedule can be similarly constructed, but is omitted herein.









TABLE 4





Algorithm 4 Online EM Algorithm (Schedule type) for Censored


data

















Input: N: number of data, K: number of mixed models, L: number



of updates, Update schedule: T1, T2, . . . , TL



Output: {θ(l)}Ll=1



 1: Initialize parameters {πk(0), μk(0), σk(0)} and statistics



Mk(0), Sk1(0), Sk2(0)



 2: Initialize number of observed data M ← 0



 3: for l = 1 to L do



 4: Acquire data D1



 5: Update number of data and threshold: M ← M + |D1|, C ←



T1



 6: E-step: Calculate responsibilities γ(zi), η(zj) of data



  xt according to Equations (10) (11) , and moments νk(yj; C),



  ξk(yj; C) according to Equations (12) (13)



 7: M-step: Update/Calculate statistics Mk(l), Nk(l), Sk1(l),



  Sk2(l), Uk1(l), Uk2(l) using the responsibilities according



  to Equations (29) (30) (31) (32) (35) (36), and Update



  parameter θ(l) according to Equations (20) (21) (22) using



  the statistics



 8: end for









<Sequential Update Online Variational Bayes Algorithm>


In the above description, the online EM algorithm that is an extension of the batch type EM algorithm is shown. In addition to the EM algorithm, there is an algorithm called a variational Bayes (VB) algorithm as an estimation algorithm fora mixture model, and the VB algorithm can be an online algorithm through the same approach as the present invention. Therefore, the scope of the present invention is not limited to the online EM algorithm, but includes all online estimation algorithms for a mixture model for censored data. Hereinafter, an example of deriving a sequential update type algorithm based on the batch type algorithm of the VB algorithm will be described below.


<Batch Type Variational Bayes (VB) Algorithm>


In the variational Bayes algorithm, it is considered that, for the parameters of the model,





θ={π,μ,λ}  [Formula 37]


, a prior distribution,






P(θ)=P(π)P(μ,λ)  [Formula 38]


is set. Here,





λ={λk}k=1K  [Formula 39]


is a precision parameter, and in this chapter, in place of a standard deviation,





σk2  [Formula 40]


, a precision,





λk  [Formula 41]


is used so that










f


(


x
|

μ
k


,

λ
k

-
1



)


=




λ
k


2





π





exp


(


-


λ
k

2





(

x
-

μ
k


)

2


)







[

Formula





42

]







represents a probability density function of the normal distribution.






P(π) and P(μ,λ)  [Formula 43]


are a (symmetric) Dirichlet distribution and a normal-gamma distribution, respectively, and are defined by the following equations.














[

Formula





44

]



















P


(
π
)


=


Dirichlet


(

π
|

α
0


)


=



Γ


(

K






α
0


)




Γ


(

α
0

)


K







k
=
1

K



π
k


α
0

-
1






,





(
37
)












P


(

μ
,
λ

)


=




k
=
1

K







(



μ
k

|

μ
0


,


(


τ
0



λ
k


)


-
1



)




Gam


(


λ
k

,

a
0

,

b
0


)









(
38
)






=




k
=
1

K







τ
0



λ
k



2





π





exp


(


-



τ
0



λ
k


2





(


μ
k

-

μ
0


)

2


)




1

Γ


(

a
0

)





b
0

a
0




λ
k


a
0

-
1





exp


(


-

b
0




λ
k


)


.







(
39
)







By combining the above equation and Equation (9), a generation probability of the parameters and the complete data is represented by the following equation.





[Formula 45]






P(X,W,Y,Z,θ)=P(X,W,Y,Z|θ)P(θ)  (40)


The (batch type) VB algorithm is a method of estimating a variational distribution that approximates a posterior distribution of the parameters and the latent variables. In the VB algorithm for censored data, it is considered that, under the condition that the variational distribution is decomposed into






q(π,μ,λ,Y,Z)=q(π)q(μ,λ)q(Y,Z),  [Formula 46]


, a functional,






custom-character[q]  [Formula 47]


is minimized to estimate the variational distribution.











F
_



[
q
]


=



q



[

log







q


(
π
)



q


(

μ
,
λ

)




q


(

Y
,
Z

)




p


(

X
,
W
,
Y
,
Z
,
θ

)




]






(
41
)







Analysis by a variational method indicates that a desired variational distribution needs to satisfy the following optimality conditions:





[Formula 49]






q(π)∝exp(custom-characterq(μ,λ)q(Y,Z)[log P(X,W,Y,Z,θ)]),






q(μ,λ)∝exp(custom-characterq(π)q(Y,Z)[log P(X,W,Y,Z,θ)]),






q(Y,Z)∝exp(custom-characterq(π)q(μ,λ)[log P(X,W,Y,Z,θ)]),  (42)


Calculating the above, (optimal) variational distributions of





π,μ,λ,Y,Z  [Formula 50]


are shown to be given by the following Dirichlet distribution, normal-gamma distribution, and polynomial-truncated normal distribution, respectively:









[

Formula





51

]


















q


(
π
)


=

Dirichlet


(

π
|

α
k


)



,





(
43
)













α
k

=


α
0

+


M
_

k

+


L
_

k



,





(
44
)













q


(

μ
|
λ

)


=




k
=
1

K






(



μ
k

|


μ
_

k


,


(


λ
k



τ
k


)


-
1



)




,





(
45
)














μ
_

k

=




τ
0



μ
0


+



k





1


+



k





1






M
_

k

+


L
_

k

+

τ
0




,





(
46
)













τ
k

=



M
_

k

+


L
_

k

+

τ
0



,





(
47
)













q


(
λ
)


=




k
=
1

K






(



λ
k

|

a
k


,

b
k


)




,





(
48
)













a
k

=





M
_

k

+


N
_

k

+
1

2

+

a
0



,





(
49
)













b
k

=


b
0

+


1
2



{



τ
0



μ
0
2


+



k





2


+



k





2



}




,





(
50
)













q


(

Y
,
Z

)


=


q


(
Z
)




q


(

Y
|
Z

)




,





(
51
)













q


(
Z
)


=




i
=
1

M




Mult


(


y
i

|

ρ
ik


)







j
=

M
+
1






Mult


(


y
j

|

ϱ
k


)






,





(
52
)








γ
ik

=




q



[

log






π
k


]


-


1
2



τ
k

-
1



+


1
2





q



[

log






λ
k


]



-


1
2


log







λ
_

k


+

log






f


(



x
i

|


μ
_

k


,


λ
_

k

-
1



)





,




(
53
)









η
_

k

=




q



[

log






π
k


]


-


1
2



τ
k

-
1



+


1
2





q



[

log






λ
k


]



-


1
2


log







λ
_

k


+

log


(

1
-

F


(


C
|


μ
_

k


,


λ
_

k

-
1



)



)




,




(
54
)













ρ
ik



exp


(

γ
ik

)



,


ϱ
k



exp


(

η
k

)



,





(
55
)












q


(

Y
|
Z

)


=




i
=

M
+
1



N
-
M








(



y
j

|


μ
_


z
i



,


λ
_


z
j


,
C
,


)


.







(
56
)







The statistics and others in the above equations are as follows:









[

Formula





52

]














M
_

k

=




i
=
1

M




z
_

ik



,



L
_

k

=




j
=

M
+
1


N




z
_

jk



,




(
57
)









z
_

ik

=

ρ
ik


,



z
_

jk

=

ϱ
k


,




(
58
)










k





1


=




i
=
1

M





z
_

ik



x
i




,




k





1


=




j
=

M
+
1


N





z
_

jk







y
i

|
k




[

y
j

]





,




(
59
)










k





2


=




i
=
1

M





z
_

ik



x
i
2




,




k





2


=




j
=

M
+
1


N





z
_

jk







y
j

|
k




[

y
j
2

]





,




(
60
)












y
j

|
k




[

y
j

]


=



μ
_

k

+


1

τ
k






f


(


C
|


μ
_

k


,

τ
k

-
1



)


)



1
-

F


(


C
|


μ
_

k


,

τ
k

-
1



)



)





,




(
61
)












y
j

|
k




[

y
j
2

]


=



μ
_

k
2

+

1

τ
k


+


1

τ
k







(

C
+


μ
_

k


)



f


(


C
|


μ
_

k


,

τ
k

-
1



)



)


1
-

F


(


C
|


μ
_

k


,

τ
k

-
1



)







,




(
62
)









λ
_

k

=


a
k



/



b
k



,




q



[

log






λ
k


]


=


Ψ


(

a
k

)


-

log


(

b
k

)




,




(
63
)










q



[

log






π
k


]


=


Ψ


(

α
k

)


-

Ψ


(





k


=
1

K



α

k




)




,




(
64
)







Here,





Ψ(·)  [Formula 53]


represents a digamma function. As a result, a batch type VB algorithm is constructed as shown in Algorithm 5.









TABLE 5





Algorithm 5 Batch type VB Algorithm for Gaussian mixture model


for Censored data

















Input: X, W: input data, C: (right) censored value, K: number



of mixed models, α0, μ0, τ0, a0, b0: hyper parameters



Output: {αk, μk, τk, ak, bk}k: parameters for minimizing



functional



 Initialize variational parameters {αk, μk, τk, ak, bk}



repeat



 VB E-step: Calculate variational parameters ρ1k, Qk related



 to latent variables of observed data and censored data



 using current parameters according to Equation (55)



 VB M-step: Update/Calculate statistics Mk, Lk, Sk1, Sk2,



 Uk1, Uk2 according to Equations (57) (59) (60), and Update



 variational parameters {αk, μk, τk, ak, bk} according to



 Equations (44) (46) (47) (49) (50) using the statistics



until Repeat until convergence conditions are satisfied









<Sequential Update Type VB Algorithm>


An online algorithm is derived based on the above batch type VB algorithm. Statistics,







M

k
,L
k
,S
k1
,U
k1
,S
k2
,S
k2  [Formula 54]


are expressed in sequential form, as in the online EM algorithm. When a statistic at the time when data xt-1 is observed is represented by a superscript (t−1), specific sequential calculation equations are given by the following:





[Formula 55]






M
k
(t)
=M
k
(t-1)
+z
ik
,L
k
(t)=(N−M)Qk,  (65)






S
k1
(t)
=S
k1
(t-1)
+z
ik
x
t
,U
k1
(t)
=L
k
(t)
custom-character
y|k[y],  (66)






S
k2
(t)
=S
k2
(t-1)
+z
ik
x
t
2
,U
k2
(t)
=L
k
(t)
custom-character
y|k[y2],  (67)


Therefore, a sequential update type algorithm is constructed as shown in Algorithm 6. Similarly, it is possible to derive mini-batch type and schedule type algorithms based on the VB algorithm, but they are omitted.









TABLE 6





Algorithm 6 Online VB Algorithm (Sequential update type) for


Censored data

















Input: N: input data, K: number of mixed models, α0, μ0, τ0,



a0, b0: hyper parameters



Output: {αk(t), μk(t), τk(t), ak(t), bk(t)}k,t



 1: Initialize variational parameters {αk(0), μk(0), τk(0),



ak(0), bk(0)} and statistics Mk(0), Lk(0), Sk1(0), Sk2(0), Sk2(0),



Sk2(0), Uk1(0), Uk2(0)



 2: Initialize number of observed data M ← 0



 3: for t = 1 to N do



 4: Acquire data xt



 5: Update number of data and threshold: M ← M + 1, C ← xt



 6: VB E-step: Calculate variational paramenters ρtk, Qk



  related to latent variables of observed data xt and



  censored data according to Equation (55)



 7: VB M-step: Update/Calculate statistics Mk(t), Lk(t),



  Sk1(t), Sk2(t), Uk1(t), Uk2(t) according to Equations



  (65) (66) (67), and Update variational parameters {αk(t),



  μk(t), τk(t), ak(t), bk(t)} according to Equations



  (44) (46) (47) (49) (50) using the statistics



 8: end for









<Configuration of Dynamic Distribution Estimation Device 100 of First Embodiment>


A dynamic distribution estimation device 100 according to a first embodiment estimates parameters using a sequential update type online EM algorithm.


As illustrated in FIG. 3, the dynamic distribution estimation device 100 according to the first embodiment is configured to include a computer 10 and an external device 30, and the computer 10 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory) storing a program for executing a dynamic distribution estimation routine described later. The computer 10 functionally includes a storage unit 12, an initialization processing unit 17, an update processing unit 18, a parameter processing unit 23, and an input/output unit 24.


The storage unit 12 includes a parameter recording unit 13, an observation data number recording unit 14, a threshold recording unit 15, and a statistic recording unit 16.


In the parameter recording unit 13, parameters of a model,





k(0)k(0)k(0)}  [Formula 56]


are stored.


In the observation data number recording unit 14, the number M of observed data is stored.


In the threshold recording unit 15, the threshold C for observed data is stored.


In the statistic recording unit 16, statistics,






M
k
(t)
,N
k
(t)
,S
k1
(t),






S
k2
(t)
,U
k1
(t)
,U
k2
(t)  [Formula 57]


are stored.


The initialization processing unit 17 initializes the variational parameters stored in the parameter recording unit 13 and the statistics stored in the statistic recording unit 16.


The update processing unit 18 estimates online the parameters of a Gaussian mixture model which is for a mixture of a plurality of components and represents a distribution of data to be observed. The update processing unit 18 includes a responsibility update unit 19, a moment update unit 20, a statistic update unit 21, and a parameter update unit 22. The moment update unit 20 is an example of an expected value update unit.


The responsibility update unit 19 updates, based on data xt of a newly observed sample, a responsibility representing a degree to which data xt of the newly observed sample belongs to each component,





γ(ztk)  [Formula 58]


, and a responsibility representing a degree to which the data of each sample not yet observed belongs to each component,





η(zjk)  [Formula 59]


, according to Equations (23) and (24).


The moment update unit 20 updates, in a case of assuming that the data of each unobserved sample belongs to each component, a moment of the data of the unobserved sample,






v
k(yj;C),ξk(yj;C)  [Formula 60]


, according to Equations (12) and (13).


The statistic update unit 21 updates, based on a responsibility representing a degree to which data xt of the newly observed sample belongs to each component,





γ(ztk)  [Formula 61]


, a statistic for the number of samples belonging to each component among the observed samples,






M
k
(t)  [Formula 62]


, according to Equation (23).


The statistic update unit 21 updates a responsibility representing a degree to which the data of each sample not yet observed belongs to each component,





η(zjk)  [Formula 63]


, and a statistic for the number of samples belonging to each component among all the samples,






N
k
(t)  [Formula 64]


, according to Equation (24).


The statistic update unit 21 updates, for each component, based on a responsibility representing a degree to which the newly observed sample data belongs to each component,





γ(ztk)  [Formula 65]


, statistics for the data of the observed sample belonging to the corresponding component,






S
k1
(t),






S
k2
(t)  [Formula 66]


, according to Equations (25) and (26).


The statistic update unit 21 updates, for each component, in a case of assuming that the data of each unobserved sample belongs to the corresponding component, based on moments of the data of the unobserved sample,






v
k(yj;C),ξk(yj;C)  [Formula 67]


, a statistic for the number of samples belonging to the corresponding component among the observed samples,






M
k
(t)  [Formula 68]


, and a statistic for the number of samples belonging to the corresponding component among all the samples,






N
k
(t)  [Formula 69]


, statistics for the data of the unobserved samples belonging to the corresponding component,






U
k1
(t)
,U
k2
(t)  [Formula 70]


, according to Equations (27) and (28).


The parameter update unit 22 updates, for each component, based on a statistic for the number of samples belonging to the component, which is updated by the statistic update unit 21, among all the samples,






N
k
(t)  [Formula 71]


, statistics for the data of the observed sample belonging to the component,






S
k1
(t),






S
k2
(t)  [Formula 72]


, and statistics for the data of the unobserved samples belonging to the component,






U
k1
(t)
,U
k2
(t)  [Formula 73]


, parameters related to the component,





θ={πkkk}k=1K  [Formula 74]


, according to Equations (20) to (22).


The input/output unit 24 outputs parameters πnewk, μnewk, and (σnewk)2 updated by the parameter update unit 22 to the external device 30.


The external device 30 outputs the parameters output from the input/output unit 24 as a result.


<Operation of Dynamic Distribution Estimation Device 100>


Next, the operation of the dynamic distribution estimation device 100 according to the present embodiment will be described. First, the initialization processing unit 17 of the dynamic distribution estimation device 100 initializes the parameters stored in the parameter recording unit 13 and the statistics stored in the statistic recording unit 16. Then, the dynamic distribution estimation device 100 estimates the parameters of the model, the number of observation data, the threshold, and the statistics using the EM algorithm based on the already observed data, and stores them in the parameter recording unit 13, the observation data number recording unit 14, the threshold recording unit 15, and the statistic recording unit 16. Then, when the newly observed data xt is input, the dynamic distribution estimation device 100 executes a dynamic distribution estimation process routine illustrated in FIG. 4.


First, in step S100, the newly observed data xt is acquired.


In step S102, the number M of observation data and the threshold C are updated.


In step S104, the responsibility update unit 19 updates a responsibility γ(zt) according to Equation (10) based on data xt acquired in step S100. Further, the responsibility update unit 19 updates the responsibility η(zj) according to Equation (11) based on data xt acquired in step S100.


In step S106, the moment update unit 20 updates moments vk(yj; xt) and ξk(yj; xt) according to Equations (12) and (13) based on data xt acquired in step S100.


In step S108, the statistic update unit 21 updates a statistic M(t) k for the number of samples belonging to each component among the observed samples according to Equation (23) based on the responsibility γ(zt) updated in step S104. Further, the statistic update unit 21 updates a statistic N(t)k for the number of samples belonging to each component among all the samples according to Equation (24) based on the responsibility η(zj) updated in step S104. Further, the statistic update unit 21 updates, for each component, statistics S(t)k1 and S(t)k2 of the data of the observed sample belonging to the corresponding component according to Equations (25) and (26) based on the responsibility γ(zt) updated in step S104. Further, the statistic update unit 21 updates, for each component, the statistics U(t)k1 and U(t)k2 of the data of the not yet observed sample belonging to the corresponding component according to Equations (27) and (28) based on the moments vk(yj; xt) and ξk(yj; xt) updated in step S106 and the updated statistics M(t)k and N(t)k.


In step S110, the input/output unit 24 outputs the parameters πnewk, μnewk and (σnewk)2 updated in step S110 to the external device 30, and then the processing ends.


As described above, according to the dynamic distribution estimation device according to the first embodiment, the parameters of the Gaussian mixture model that is for a mixture of a plurality of components and represents a distribution of data to be observed data, are estimated online. Specifically, the dynamic distribution estimation device according to the first embodiment updates a responsibility based on data xt of the newly observed sample, updates a moment of the data of the unobserved sample, updates each statistic based on at least one of the responsibility and the moment, and updates a parameter related to a component based on each statistic. As a result, it is possible to estimate the parameters of the model at high speed, with a saved memory, and with the parameters having temporal continuity, using an online algorithm for censored data.


Thus, the present embodiment has three advantages of high speed, memory saving, and parameters having temporal continuity, as compared to the batch type estimation algorithm.


Note that the present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the scope and spirit of the present invention.


For example, the description in the above-described first embodiment is for a case where the sequential update online EM algorithm is used in which the parameter update timing is a timing at which data of a newly observed sample is obtained by way of example, but does not limit any embodiment. For example, the mini-batch type online EM algorithm may be used in which the parameter update timing is a timing at which a predetermined number of pieces of data of a newly observed sample are obtained. In this case, the responsibility and the moment may be updated based on the predetermined number of pieces of data xt of the newly observed sample according to Algorithm 3 described above, each statistic may be updated based on at least one of the responsibility and the moment, and parameters related to a component may be updated based on each statistic.


Further, a mini-batch type online EM algorithm may be used in which the parameter update timing is a timing at which a predetermined update time is reached. In this case, the responsibility and the moment may be updated based on data xt of samples newly observed until the update time is reached according to Algorithm 4 described above, each statistic may be updated based on at least one of the responsibility and the moment, and parameters related to a component may be updated based on each statistic.


<Configuration of Dynamic Distribution Estimation Device of Second Embodiment>


A dynamic distribution estimation device according to a second embodiment estimates parameters using a sequential update type online VB algorithm.


As illustrated in FIG. 5, a dynamic distribution estimation device 200 according to the second embodiment is configured to include a computer 210 and an external device 30, and the computer 210 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory) storing a program for executing a dynamic distribution estimation routine described later. The computer 210 functionally includes a storage unit 212, an initialization processing unit 217, an update processing unit 218, a parameter processing unit 23, and an input/output unit 24.


The storage unit 12 includes a parameter recording unit 213, an observation data number recording unit 14, a threshold recording unit 15, and a statistic recording unit 216.


In the parameter recording unit 213, variational parameters,





k(t),μk(t)k(t),ak(t),bk(t)}k,t  [Formula 75]


are stored.


In the statistic recording unit 216, statistics,







M

k
(t)
,L
k
(t)
,S
k1
(t)
,S
k2
(t)
,U
k1
(t)
,U
k2
(t)  [Formula 76]


are stored.


The initialization processing unit 217 initializes the variational parameters stored in the parameter recording unit 213 and the statistics stored in the statistic recording unit 16.


The update processing unit 218 estimates online the parameters of a Gaussian mixture model which is for a mixture of a plurality of components and represents a distribution of data to be observed. The update processing unit 218 includes a latent variable parameter update unit 219, a statistic update unit 221, and a parameter update unit 222.


The latent variable parameter update unit 219 updates, based on data xt of a newly observed sample, a parameter of a variational distribution related to a latent variable of each component for the newly observed sample data xt,







z

ikik  [Formula 77]


, and a parameter of a variational distribution related to a latent variable of each component for data of a set of already observed samples including the newly observed sample,







z

jk=custom-characterk  [Formula 78]


, according to Equation (55).


The statistic update unit 221 updates, based on a parameter of the variational distribution updated by the latent variable parameter update unit 219,







z

ikik  [Formula 79]


, a statistic for the number of samples belonging to each component among the observed samples,







M

k  [Formula 80]


, according to Equation (57).


Further, the statistic update unit 221 updates, based on a parameter of the variational distribution updated by the latent variable parameter update unit 219,







z

jk=custom-characterk  [Formula 81]


, a statistic for the number of samples belonging to each component among the unobserved samples,







L

k  [Formula 82]


, according to Equation (57).


Further, the statistic update unit 221 updates, based on a parameter of the variational distribution updated by the latent variable parameter update unit 219,







z

ikik  [Formula 83]


, statistics for the data of the observed sample belonging to the corresponding component,






S
k1
,S
k2  [Formula 84]


, according to Equations (59) and (60).


Further, the statistic update unit 221 updates, based on







z

jk=custom-characterk  [Formula 85]


, statistics for the data of the unobserved samples belonging to the corresponding component,






U
k1
,U
k2  [Formula 86]


, according to Equations (59) and (60).


The parameter update unit 222 updates, for each component, based on a statistic for the number of samples updated by the statistic update unit 221, among all the samples,







M

k







L

k  [Formula 87]


, statistics of the data of the observed sample belonging to the corresponding component and updated by the statistic update unit 221,






S
k1
,S
k2  [Formula 88]


, and statistics for the data of the unobserved samples belonging to the corresponding component,






U
k1
,U
k2  [Formula 89]


, parameters of the variational distribution related to the parameters of the corresponding component,





k,μkk,ak,bk}  [Formula 90]


, according to Equations (44) to (50).


The parameter processing unit 223 controls the related units so that the updating by the latent variable parameter update unit 219, the updating by the statistic update unit 221, and the updating by the parameter update unit 222 are repeated every time a predetermined parameter update timing is reached. For example, the parameter processing unit 223 controls the units so that the updating by the latent variable parameter update unit 219, the updating by the statistic update unit 221, and the updating by the parameter update unit 222 are repeated at the time when data of a newly observed sample is acquired as the predetermined parameter update timing.


<Operation of Dynamic Distribution Estimation Device 200>


Next, the operation of the dynamic distribution estimation device 200 according to the present embodiment will be described. First, the initialization processing unit 17 of the dynamic distribution estimation device 200 initializes the parameters stored in the parameter recording unit 213 and the statistics stored in the statistic recording unit 216. Then, the dynamic distribution estimation device 200 estimates the parameters of the model, the number of observation data, the threshold, and the statistics using the VB algorithm based on the already observed data, and stores them in the parameter recording unit 213, the observation data number recording unit 14, the threshold recording unit 15, and the statistic recording unit 216. Then, when the newly observed data xt is input, the dynamic distribution estimation device 200 executes a dynamic distribution estimation process routine illustrated in FIG. 6.


First, in step S100, the newly observed data xt is acquired.


In step S102, the number M of observation data and the threshold C are updated.


In step S204, the latent variable parameter update unit 219 updates, based on data xt of the newly observed sample, parameters of the variational distribution related to the latent variables,







z

ikik







z

jk=custom-characterk  [Formula 91]


, according to Equation (55).


In step S206, the statistic update unit 221 updates, based on the parameters of the variational distribution updated in step S204.


In step S208, the statistic update unit 21 updates, based on the responsibility γ(zt) updated in step S104, statistics,







M

k
(t)
,L
k
(t)
,S
k1
(t)
,S
k2
(t)
,U
k1
(t)
,U
k2
(t)  [Formula 92]


are updated according to Equations (44) to (50).


In step S210, the input/output unit 24 outputs the parameters updated in step S208,





k(t),μk(t)k(t),ak(t),bk(t)}  [Formula 93]


to the external device 30, and then the processing ends.


As described above, according to the dynamic distribution estimation device according to the second embodiment, the parameters of the Gaussian mixture model that is for a mixture of a plurality of components and represents a distribution of data to be observed data, are estimated online. Specifically, the dynamic distribution estimation device according to the second embodiment updates parameters of a variational distribution related to latent variables and variational parameters based on data xt of the newly observed sample, updates each statistic based on at least one of: the parameters of the variational distribution related to the latent variables and the variational parameters, and updates the parameters related to the components based on the respective statistics. As a result, when a VB algorithm is used, it is possible to estimate the parameters of the model at high speed, with a saved memory, and with the parameters having temporal continuity.


Note that the present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the scope and spirit of the present invention.


For example, the description in the above-described second embodiment is for a case where the sequential update online VB algorithm is used in which the parameter update timing is a timing at which data of a newly observed sample is obtained by way of example, but does not limit any embodiment. For example, the mini-batch type online VB algorithm may be used in which the parameter update timing is a timing at which a predetermined number of pieces of data of a newly observed sample are obtained. In this case, the parameters of a variational distribution related to latent variables and variational parameters may be updated based on a predetermined number of pieces of data xt of the newly observed sample, each statistic may be updated based on at least one of: the parameters of the variational distribution related to the latent variables and the variational parameters, and the parameters related to the components may be updated based on the respective statistics.


Further, a mini-batch type online VB algorithm may be used in which the parameter update timing is a timing at which a predetermined update time is reached. In this case, the parameters of a variational distribution related to latent variables and variational parameters may be updated based on data xt of samples newly observed until the update time is reached, each statistic may be updated based on at least one of: the parameters of the variational distribution related to the latent variables and the variational parameters, and the parameters related to the components may be updated based on the respective statistics.


Further, the description in the above-described embodiments is for a case where the Gaussian distribution is used as a distribution for a component by way of example. However, it does not limit any embodiment but includes a case where an arbitrary exponential distribution family is used. The exponential distribution family, a density function is represented by the following Equation (67).





[Formula 94]






f
E(x|φ)=h(x)exp(φ·T(x)−A(φ))  (67)





φ  [Formula 95]


is a natural parameter, T(x) is a sufficient statistic, h(x) is a base measure,






A(φ)  [Formula 96]


is a known function called a logarithmic distribution function, and the symbol “•” in the equation represents an inner product of vectors. The Gaussian distribution is also a probability distribution belonging to the exponential distribution family, and when a natural parameter, a sufficient statistic, a base measure, and a logarithmic distribution function are defined as in the following Equation (68), the density function of the Gaussian distribution represented in Equation (2) is equal to Equation (67).














[

Formula





97

]














T


(
x
)


=


(

x
,

x
2


)

T


,





ϕ
=


(



μ
k



/



σ
k
2


,


-
1



/



(

2






σ
k
2


)



)

T


,






h


(
x
)


=

1


2





π




,






A


(
ϕ
)


=



ϕ
1
2


4






ϕ
2



-


1
2



log


(


-
2







ϕ
2


)









(
68
)







Based on this, the moments of the truncated normal distribution calculated in the estimation algorithm of the Gaussian mixture model (Equations (12) and (13), (61) and (62)) are, when the normal distribution is expressed in the form of an exponential distribution family, regarded as expected values calculated when x of the values corresponding to the respective dimensions (x and square of x) of the sufficient statistic T (x) follows the truncated normal distribution. Even when a distribution belonging to an exponential distribution other than the Gaussian distribution is used as the component distribution, replacing the process of calculating the moments with a process of calculating the expected values for the truncated distribution of the values corresponding to the respective dimensions of the sufficient statistic makes it possible to perform the estimation in the same manner as in the case of the Gaussian mixture model. Note that the statistic update unit 221 that calculates Equations (61) and (62) is an example of an expected value update unit.


Further, the present invention can be realized by installing a program on a known computer via a medium or a communication line.


Further, the above-described device includes a computer system inside, but such a “computer system” includes a homepage providing environment (or display environment) if a WWW system is used.


Further, in the description of the present application, the embodiments are described in which the program is installed in advance. However, the program may be stored in a computer-readable recording medium and provided.


REFERENCE SIGNS LIST




  • 10, 210 Computer


  • 12, 212 Storage unit


  • 13, 213 Parameter recording unit


  • 14 Observation data number recording unit


  • 15 Threshold recording unit


  • 16, 216 Statistic recording unit


  • 17, 217 Initialization processing unit


  • 18, 218 Update processing unit


  • 19 Responsibility update unit


  • 20 Moment update unit


  • 21, 221 Statistic update unit


  • 22, 222 Parameter update unit


  • 23, 223 Parameter processing unit


  • 24 Input/output unit


  • 30 External device


  • 100, 200 Dynamic distribution estimation device


  • 219 Latent variable parameter update unit


Claims
  • 1-8. (canceled)
  • 9. A computer-implemented method for dynamically estimating distribution of data with truncated data based on a mixture model, the method comprising: receiving a set of newly observed data;updating, based on the received set of newly observed data, a number of the received set of observed data;updating, based on the received set of newly observed data, a threshold of truncated data;updating one or more expectation values for the distribution including components with truncated data;updating one or more statistics of one or more components of data based on the updated one or more expectation values;updating one or more parameters of a distribution model based on the updated one or more statistics; andproviding the updated one or more parameters of the distribution model as an estimated distribution of data with the truncated data.
  • 10. The computer-implemented method of claim 9, wherein the mixture model is one of a Gaussian mixture model, or a mixture of arbitrary distributions belonging to an exponential distribution family.
  • 11. The computer-implemented method of claim 9, the method further comprising: updating, based on the received set of newly observed data, one or more responsibility values according to the expectation-maximization (EM) algorithm, wherein the one or more responsibility values indicate a degree of;updating, based on the received set of newly observed data, one or more moment values according to the EM algorithm, wherein the one or more expectation values include the one or more responsibility values and the one or more moment values;updating, based at least on the updated one or more moment values, statistics of unobserved data in the one or more components; andupon a predetermined timing, repeating the updating the one or more responsibility values, the updating the one or more moment values, and the updating the statistics of the unobserved data.
  • 12. The computer-implemented method of claim 11, wherein each of the one or more responsibility of data indicates one of: a first degree to which each of the set of the newly observed data belongs to each component, ora second degree to which each of the set of the unobserved data belongs to each component.
  • 13. The computer-implemented method of claim 11, wherein the predetermined timing is one or more of: upon receiving the newly observed data;upon a number of accumulating newly observed data reaching a predetermined number of accumulated the newly observed data; anda predetermined time.
  • 14. The computer-implemented method of claim 9, the method further comprising: updating, based on the received set of newly observed data, one or more parameters of a variational distribution that relates to latent variable according to variational Bayesian (VB) algorithm; andupon a predetermined timing, repeating the updating the one or more parameters of the variational distribution.
  • 15. The computer-implemented method of claim 14, wherein the predetermined timing is one or more of: upon receiving the newly observed data;upon a number of accumulating newly observed data reaching a predetermined number of accumulated the newly observed data; anda predetermined time.
  • 16. A system for dynamically estimating distribution of data with truncated data based on a mixture model, the system comprising: a processor; anda memory storing computer-executable instructions that when executed by the processor cause the system to: receive a set of newly observed data;update, based on the received set of newly observed data, a number of the received set of observed data;update, based on the received set of newly observed data, a threshold of truncated data;update one or more expectation values for the distribution including components with truncated data;update one or more statistics of one or more components of data based on the updated one or more expectation values;update one or more parameters of a distribution model based on the updated one or more statistics; andprovide the updated one or more parameters of the distribution model as an estimated distribution of data with the truncated data.
  • 17. The system of claim 16, wherein the mixture model is one of a Gaussian mixture model, or a mixture of arbitrary distributions belonging to an exponential distribution family.
  • 18. The system of claim 16, the computer-executable instructions when executed further causing the system to: update, based on the received set of newly observed data, one or more responsibility values according to the expectation-maximization (EM) algorithm, wherein the one or more responsibility values indicate a degree of;update, based on the received set of newly observed data, one or more moment values according to the EM algorithm, wherein the one or more expectation values include the one or more responsibility values and the one or more moment values;update, based at least on the updated one or more moment values, statistics of unobserved data in the one or more components; andupon a predetermined timing, repeat the updating the one or more responsibility values, the updating the one or more moment values, and the updating the statistics of the unobserved data.
  • 19. The system of claim 18, wherein each of the one or more responsibility of data indicates one of: a first degree to which each of the set of the newly observed data belongs to each component, ora second degree to which each of the set of the unobserved data belongs to each component.
  • 20. The system of claim 18, wherein the predetermined timing is one or more of: upon receiving the newly observed data;upon a number of accumulating newly observed data reaching a predetermined number of accumulated the newly observed data; anda predetermined time.
  • 21. The system of claim 16, the computer-executable instructions when executed further causing the system to: update, based on the received set of newly observed data, one or more parameters of a variational distribution that relates to latent variable according to variational Bayesian (VB) algorithm; andupon a predetermined timing, repeat the updating the one or more parameters of the variational distribution.
  • 22. The system of claim 21, wherein the predetermined timing is one or more of: upon receiving the newly observed data;upon a number of accumulating newly observed data reaching a predetermined number of accumulated the newly observed data; anda predetermined time.
  • 23. A computer-readable non-transitory recording medium storing computer-executable instructions that when executed by a processor cause a computer system to: receive a set of newly observed data;update, based on the received set of newly observed data, a number of the received set of observed data;update, based on the received set of newly observed data, a threshold of truncated data;update one or more expectation values for the distribution including components with truncated data;update one or more statistics of one or more components of data based on the updated one or more expectation values;update one or more parameters of a distribution model based on the updated one or more statistics; andprovide the updated one or more parameters of the distribution model as an estimated distribution of data with the truncated data.
  • 24. The computer-readable non-transitory recording medium of claim 23, wherein the mixture model is one of a Gaussian mixture model, or a mixture of arbitrary distributions belonging to an exponential distribution family.
  • 25. The computer-readable non-transitory recording medium of claim 23, the computer-executable instructions when executed further causing the system to: update, based on the received set of newly observed data, one or more responsibility values according to the expectation-maximization (EM) algorithm, wherein the one or more responsibility values indicate a degree of;update, based on the received set of newly observed data, one or more moment values according to the EM algorithm, wherein the one or more expectation values include the one or more responsibility values and the one or more moment values;update, based at least on the updated one or more moment values, statistics of unobserved data in the one or more components; andupon a predetermined timing, repeat the updating the one or more responsibility values, the updating the one or more moment values, and the updating the statistics of the unobserved data.
  • 26. The computer-readable non-transitory recording medium of claim 25, wherein each of the one or more responsibility of data indicates one of: a first degree to which each of the set of the newly observed data belongs to each component, ora second degree to which each of the set of the unobserved data belongs to each component.
  • 27. The computer-readable non-transitory recording medium of claim 25, wherein the predetermined timing is one or more of: each time receiving the newly observed data;a time of a number of stored newly observed data reaching a predetermined number of the newly observed data; anda predetermined time.
  • 28. The computer-readable non-transitory recording medium of claim 23, the computer-executable instructions when executed further causing the system to: update, based on the received set of newly observed data, one or more parameters of a variational distribution that relates to latent variable according to variational Bayesian (VB) algorithm; andupon a predetermined timing, repeat the updating the one or more parameters of the variational distribution.
Priority Claims (1)
Number Date Country Kind
2018-023593 Feb 2018 JP national
PCT Information
Filing Document Filing Date Country Kind
PCT/JP2019/004677 2/8/2019 WO 00