The present invention relates to an estimation apparatus, an estimation system, an estimation method, and a program.
A Gaussian Cox process is known as a technique for estimating the occurrence rate of an event (hereinafter referred to as a “rate function”) at each point in a one-dimensional or multidimensional space using data representing events that have occurred in the space. For example, NPL 1 proposes a method of approximately estimating a rate function using a variational Bayesian method.
However, approximation using the variational Bayesian method may sometimes cause an error or bias in the estimation result and may possibly output estimated values significantly different from those of a true rate function. It is also difficult to quantitatively evaluate the magnitude of the errors of the estimated values and incorrect estimated values may be possibly adopted as correct ones.
One embodiment of the present invention has been made in view of the above points, and has an object of the present invention to analytically estimate a rate function.
To achieve the above object, an estimation apparatus according to an embodiment includes a rate function estimation unit that takes a set of observation points representing events observed in a space region of a predetermined dimensionality in an observation, an observation count representing the number of times the observation is performed, a first function satisfying a predetermined condition, and a parameter of the first function as inputs, and analytically estimates a rate function for obtaining occurrence rates of the events in the space region.
It is possible to analytically estimate a rate function.
Hereinafter, an embodiment of the present invention will be described. In the present embodiment, an estimation apparatus 10 that can analytically estimate a rate function based on a Gaussian Cox process (hereinafter abbreviated as “GCP”) will be described.
First, a first example of the present embodiment will be described.
A functional configuration of the estimation apparatus 10 according to the first example will be described with reference to
As illustrated in
It is assumed that the estimation target points 1000 are a set of P points (coordinate points) {s1, s2, . . . , sP} for which the value of the rate function is to be estimated. These points sp (p=1, 2, . . . , P) are points for which the value of the rate function is to be estimated in a space region T that will be described later.
The point event data 1100 includes the space region T where observation has been performed, an observation count R indicating the number of times the observation has been performed, and a set of points (coordinate points) {t1, t2, . . . tN} of N events that have been observed in the space region T. The space region T is a region in a space of any dimensionality (one-dimensional or multidimensional) specified in advance.
The GCP kernel 1200 is a function h(t, t′) that satisfies the following conditions 1 and 2 where t and t′ are any points in the space region T.
Condition 1: his a symmetric function. That is, h(t, t′)=h(t′, t) holds.
Condition 2: The eigenvalue of an integral operator is 0 or greater and 1/R or less when using the function h(t, t′) as a kernel of the integral operator where the eigenvalue ν of the function h(t, t′) is defined by the following equation (1):
[Math. 1]
∫Th(t,t′)ϕ(t′)dt′=νϕ(t) (1)
where φ(t) is an eigenfunction.
Hereinafter, a function satisfying the above conditions 1 and 2 will be referred to as a “GCP kernel” or a “GCP kernel function”. Examples of a GCP kernel are shown below.
Example 1 of GCP kernel: A function h expressed as the following equation (2) defined by a positive semi-definite kernel k(t, t′) and an upper limit of its eigenvalue kmax is a GCP kernel.
where kmax is evaluated as a maximum value of a function obtained by integrating the positive semi-definite kernel k(t, t′) with respect to one of the variables (t′ in equation (3)) as expressed as the following equation (3):
As the positive semi-definite kernel k(t, t′), a Gaussian kernel kGauss(t, t′) having an intensity α and a covariance matrix Σ as parameters, a Matern kernel kMatern(t, t′) having an intensity α and a scale β as parameters, or the like may be considered. The Gaussian kernel kGauss(t, t′) is expressed as the following equation (4) and the Matern kernel kMatern(t, t′) is expressed as the following equation (5):
where τ represents a transpose operation for vectors and matrices. Note that other than the Gaussian kernel kGauss(t, t′) or the Matern kernel kMatern(t, t′), for example, a Wiener kernel kWiener(t, t′) can also be used.
Example 2 of GCP kernel: A solution of a Fredholm integral equation of the second kind defined using the positive semi-definite kernel k(t, t′) is a GCP kernel. That is, a solution h(t, t′) of an integral equation expressed as the following equation (6) is a GCP kernel.
[Math. 6]
h(t,t′)+2R∫Tk(t,s)h(s,t′)ds=k(t,t′) (6)
The GCP kernel parameters 1300 are parameter values of the GCP kernel h(t, t′). The number of parameters of the GCP kernel h(t, t′) and ranges of values that the parameters can have vary depending on the definition of the GCP kernel h(t, t′). For example, when the function h shown in the above equation (2) is adopted as a GCP kernel using a Gaussian kernel kGauss(t, t′) as the positive semi-definite kernel k(t, t′), the parameters are an intensity α and a covariance matrix Σ. On the other hand, for example, when the function h shown in the above equation (2) is adopted as a GCP kernel using a Matern kernel kMatern(t, t′) as the positive semi-definite kernel k(t, t′), the parameters are an intensity α and a scale β.
The GCP kernel parameters 1300 also include a parameter μ representing the square root of the average value of the rate function. It is assumed that the parameter μ takes a non-negative value.
The value of each parameter included in the GCP kernel parameters 1300 are usually determined based on various prior knowledge, restrictions, and the like of each case to which the estimation apparatus 10 according to the present embodiment is applied. For example, it is conceivable that the parameter β is a value representing several weeks when the function h shown in the above equation (2) is adopted as a GCP kernel using the Matern kernel kMatern(t, t′) for events of a user of an EC site purchasing a highly seasonal product. It is also conceivable, for example, to estimate a rate function using some candidate parameter values, compare the estimation results, and adopt a parameter value that is subjectively considered appropriate.
The estimator generation unit 101 takes the point event data 1100, the GCP kernel 1200, and the GCP kernel parameters 1300 given to the estimation apparatus 10 as inputs, and generates and outputs a rate function estimator 1400.
The rate function estimator 1400 is a function λ(t) for obtaining an estimated value of the rate function at any point t in the space region T. This function λ(t) is expressed as the following equation (7):
[Math. 7]
λ(t)=[μ(1−Rh(t))+h(t,:)τz]2,t∈T (7)
where
h
(t) [Math. 8]
is a function obtained by integrating the GCP kernel h(t, t′) in the space region T, h(t, :) is a vector composed of the values of the GCP kernel at the coordinate points t1, t2, . . . tN of N events, and z is a vector composed of the reciprocals of the square roots of the estimated values of the rate function at the coordinate points t1, t2, . . . tN of the N events. That is,
h
(t)≡∫Th(t,t′)dt′
h(t,:)≡(h(t,t1), . . . ,h(t,tN))τ
z=(z1, . . . ,zN)t≡(λ−1/2(t1), . . . ,λ−1/2(tN))τ [Math. 9]
As shown in the above equation (7), the rate function estimator 1400 (that is, λ(t)) is constituted with the following five elements:
R,μ,h(t),h(t,:),z [Math. 10]
Thus, the estimator generation unit 101 outputs the rate function estimator 1400 by outputting these five elements. At this time, the estimator generation unit 101 outputs the rate function estimator 1400 after generating (calculating) z, which is an unknown number among the five elements, through calculation. Here, because each element zn=λ−1/2(tn) of the N-dimensional vector z is a function of an estimated value of the rate function, and the estimated value of the rate function is a function of z according to the above equation (7), there exists a conditional expression for the relationship between the two to hold consistently, and z is calculated by solving the conditional expression. This conditional expression is a system of simultaneous quadratic equation with N unknowns expressed as the following equation (8):
where n′ also satisfies 1≤n′≤N.
The estimation unit 102 takes the estimation target points 1000 given to the estimation apparatus 10 and the rate function estimator 1400 output by the estimator generation unit 101 as inputs, and calculates and outputs estimated values 1500. That is, the estimation unit 102 substitutes the P points s1, s2, . . . , sP into λ(t) shown in the above equation (7) to calculate a set of estimated values of the rate function {λ(s1), λ(s2), . . . , λ(sP)}. This set {λ(s1), (s2), . . . , λ(sP)} is the estimated values 1500.
Next, a flow of estimation processing executed by the estimation apparatus 10 according to the first example for obtaining the estimated values 1500 will be described with reference to
Step S101: First, the estimator generation unit 101 takes the point event data 1100, the GCP kernel 1200, and the GCP kernel parameters 1300 given as inputs, and generates and outputs a rate function estimator 1400.
Step S102: Then, the estimation unit 102 takes the given estimation target points 1000 and the rate function estimator 1400 output in the above step S101 as inputs, and calculates and outputs the estimated values 1500. As a result, a set of estimated values of the rate function {λ(s1), λ(s2), . . . , λ(sP)} is obtained. At this time, because the estimation apparatus 10 according to the first example estimates the rate function analytically based on GCP without using approximation, it is possible to prevent a situation in which an unintended error or bias is included in the estimation result.
Note that the processing of step S101 may be executed in advance before the processing of step S102 is executed (in this case, the rate function estimator 1400 is stored in a storage device or the like, and in the processing of step S101, the rate function estimator 1400 is read from the storage device or the like). At this time, if the processing of the above step S102 is executed a plurality of times using the same rate function estimator 1400, the processing of the above step S101 may be executed once. Alternatively, for example, if the processing of the above step S101 is executed on another apparatus and a rate function estimator 1400 is given from the other apparatus, the estimation apparatus 10 may execute only the processing of the above step S102 (in this case, the system may be constituted with the other apparatus and the estimation apparatus 10, and the other apparatus may include the estimator generation unit 101 whereas the estimation apparatus 10 may not include the estimator generation unit 101).
Next, a second example of the present embodiment will be described. The second example will be described with respect to the case where the value of each parameter included in the GCP kernel parameters 1300 are determined without subjectivity rather than being determined subjectively.
A functional configuration of the estimation apparatus 10 according to the second example will be described with reference to
As illustrated in
Because the estimator generation unit 101 and the estimation unit 102 are substantially the same as those in the first example, description thereof will be omitted. However, the estimator generation unit 101 receives as input GCP kernel parameters 1300 output by the parameter determination unit 103.
The parameter determination unit 103 takes the point event data 1100 and the GCP kernel 1200 given to the estimation apparatus 10 as inputs, and outputs the GCP kernel parameters 1300. That is, the parameter determination unit 103 optimizes the value of each parameter included in the GCP kernel parameters 1300 through cross validation using the space region T, the observation count R, the set of points of N events {t1, t2, . . . , tN}, and the GCP kernel h(t, t′). Specifically, the parameter determination unit 103 divides, for example, the set of points of N events {t1, t2, . . . , tN} into K batches (subsets) to use K−1 batches for training and the remaining one batch for validation, and repeats change and validation of parameter values while switching the batch for validation to optimize each parameter value. Note that K may be determined according to the space region T and the observation count R.
Next, a flow of estimation processing executed by the estimation apparatus 10 according to the second example for obtaining the estimated values 1500 will be described with reference to
Step S201: First, the parameter determination unit 103 takes the point event data 1100 and the GCP kernel 1200 given as inputs, and outputs the GCP kernel parameters 1300.
Step S202: Next, the estimator generation unit 101 takes the given point event data 1100 and GCP kernel 1200, and the GCP kernel parameters 1300 output in the above step S201 as inputs, and generates and outputs a rate function estimator 1400.
Step S203: Then, the estimation unit 102 takes the given estimation target points 1000 and the rate function estimator 1400 output in the above step S202 as inputs, and calculates and outputs the estimated values 1500. As a result, a set of estimated values of the rate function {λ(s1), (s2), . . . , λ(sP)} is obtained. Moreover, because the estimation apparatus 10 according to the second example can optimize the GCP kernel parameters 1300, it is possible to obtain accurate estimated values of the rate function even for users who lack enough knowledge and experience regarding prior knowledge, restrictions, and the like of the case to which the estimation apparatus 10 according to the present embodiment is applied.
Note that as in the first example, the processing of the above steps S201 and S202 may be executed in advance or the processing of the above steps S201 and S202 may be executed on another apparatus.
Next, a third example of the present embodiment will be described. In the third example, the case where errors of the estimated values of the rate function are also calculated, will be described.
A functional configuration of the estimation apparatus 10 according to the third example will be described with reference to
As illustrated in
Because the estimator generation unit 101 and the estimation unit 102 are substantially the same as those in the first example, description thereof will be omitted.
The error calculation unit 104 takes the estimation target points 1000 given to the estimation apparatus 10, the rate function estimator 1400 output by the estimator generation unit 101, and the estimated values 1500 output by the estimation unit 102 as inputs, and calculates and outputs errors 1600. The errors 1600 are estimation errors (a covariance matrix) at P points s1, s2, . . . , sP to be estimated with respect to the square root of the rate function. These estimation errors σ(t) are calculated by the following equation (9):
where A and H are N×N matrices wherein each (n, n′) component of Λ is Λnn′=λ(tn)δnn′ and each (n, n′) component of H is Hnn′=h(tn, tn′). Note that δnn′ is a function that is 1 when n=n′ and 0 otherwise.
Next, a flow of estimation processing executed by the estimation apparatus 10 according to the third example for obtaining the estimated values 1500 and the errors 1600 will be described with reference to
Step S301: First, the estimator generation unit 101 takes the point event data 1100, the GCP kernel 1200, and the GCP kernel parameters 1300 given as inputs, and generates and outputs a rate function estimator 1400.
Step S302: Next, the estimation unit 102 takes the given estimation target points 1000 and the rate function estimator 1400 output in the above step S301 as inputs, and calculates and outputs the estimated values 1500.
Step S303: Then, the error calculation unit 104 takes the given estimation target points 1000, the rate function estimator 1400 output in the above step S301, and the estimated values 1500 output in the above step S302 as inputs, and calculates and outputs errors 1600. As a result, not only the set of estimated values of the rate function {λ(s1), λ(s2), . . . , λ(sP)} but also a set of estimation errors {σ(s1), σ(s2), . . . , σ(sP)} are obtained. Thus, the estimation apparatus 10 according to the third example can quantitatively evaluate the magnitude of the errors of the estimated values of the rate function.
Note that as in the first example, the processing of the above step S301 may be executed in advance or the processing of the above step S301 may be executed on another apparatus.
Next, application examples in which the estimation apparatus 10 according to the present embodiment is applied to specific cases will be described.
A situation where a person in charge of advertising strategy on an EC site identifies users to whom the next summer sale advertisement for a certain product is to be preferentially sent will be considered. First, purchase time data obtained by extracting a purchase time sequence for each user from purchase history data (a set of data items each represented by a pair of a user ID and a purchase time) regarding the product on the EC site is prepared. The purchase time data of one user is a set of points of events {t1, t2, . . . , tN}. Here, it is assumed that a time takes a value of a one-year cycle at intervals of one second (01-01 00:00:00 to 12-31 23:59:59). At this time, for example, the space region T is a one-dimensional space, and the observation count R is three if using purchase history data for the past three years.
By giving point event data 1100 including the purchase time data of one user, the space region T, and the observation count R and a GCP kernel 1200 and GCP kernel parameters 1300 selected by the person in charge of advertising strategy to the estimation apparatus 10, a rate function estimator 1400 is output.
Next, the person in charge of advertising strategy gives the estimation apparatus 10 a sequence of time points {s1, s2, . . . , sP} obtained by dividing a period during which advertisement is to be sent into intervals of one second, as estimation target points 1000. As a result, estimated values 1500 (that is, λ(s1), (s2), . . . , λ(sP)) of the probabilities of purchase occurrence of the user at the time points in the period are obtained.
For example, under a hypothesis that advertisement is highly effective for users with a high probability of purchase occurrence, the person in charge of advertising strategy can select users each having a high probability of purchase occurrence on average, and set up a schedule for preferentially sending advertisements to the selected users.
Note that when there is no prior knowledge on the business regarding parameter values of the GCP kernel, the parameter values may be determined by the parameter determination unit 103 described in the second example. In addition to the purchase occurrence probabilities, estimation errors may be calculated by the error calculation unit 104 described in the third example. By calculating the estimation errors together with the purchase occurrence probabilities, for example, probabilities that the purchase occurrence probabilities take a certain value or greater, rather than estimated values of the purchase occurrence probabilities, can be calculated for each user. The probabilities also take into account errors of the estimated values, and hence, may be used as more robust indices for identifying users with a high probability of purchase occurrence.
A situation where a national government, a local government, or a security company formulates an annual deployment plan of security guards in a target area will be considered. First, a person in charge prepares past crime occurrence history data (a set of data items each being a three-dimensional vector represented by an occurrence time and the latitude and longitude of an occurrence position). This crime occurrence history data is a set of points of events {t1, t2, . . . , tN}. Here, it is assumed that a time takes a value of a one-year cycle at intervals of one second (01-01 00:00:00 to 12-31 23:59:59) and the latitude and longitude take decimal values at intervals of 0.000001 degrees. At this time, for example, the space region T is a three-dimensional space, and the observation count R is five, if using crime occurrence history data for the past five years.
By giving point event data 1100 including the crime occurrence history data, the space region T, and the observation count R; and a GCP kernel 1200 and GCP kernel parameters 1300 selected by the person in charge to the estimation apparatus 10, a rate function estimator 1400 is output.
Next, the person in charge gives the estimation apparatus 10 a sequence of points in time and space {s1, s2, . . . , sP}, obtained by dividing one year into intervals of one second and dividing the target area into intervals of 0.000001 degrees, as estimation target points 1000. As a result, estimated values 1500 of the crime occurrence rate at the points in time and space (that is, λ(s1), (s2), . . . , λ(sP)) are obtained. Thus, the person in charge can formulate an effective annual deployment plan of security guards by identifying hours and areas where the crime occurrence rate is high.
Note that, for example, it is also possible to estimate and utilize the infectious disease occurrence rate by the same method as in the case of estimating the crime occurrence rate for formulating an annual deployment plan of security guards. In this case, past infectious disease occurrence history data of a target area may be used. At this time, the infectious disease occurrence history data may include, for example, location information (latitude, longitude, and altitude) of the target area, time information (hours in a day, a season in a year, etc.), weather information (sunny weather, rainy weather, etc.) (that is, the infectious disease occurrence history data may be, for example, a set of multidimensional vectors each including information on, for example, latitude, longitude, altitude, hours, weather, etc.). This makes it possible to estimate a rate function for the occurrence of infectious diseases under various conditions and to formulate effective health and hygiene measures.
Evaluation
Next, an evaluation of the estimation processing method (the proposed method) performed by the estimation apparatus 10 according to the present embodiment will be described. This evaluation was performed using an artificial data set provided at “URL: https://github.com/VirgiAgl/STVB” on the Internet, similar to the method described in “4 Experiments” in a reference “Aglietti, V., Bonilla, E. V., Damoulas, T., and Cripps, S. Structured variational inference in continuous cox process models. In Advances in Neural Information Processing Systems, pp. 12437-12447, 2019”.
The method described in NPL 1 above was adopted as a conventional method to be compared with the proposed method. Average L2 norm errors between estimated values and true values of the rate function were adopted as an evaluation index where the averages were averages over 10 trials. The evaluation results are shown in Table 1 below.
Here, PPGauss, PPMatern, and PPWiener represent proposed methods adopting a Gaussian kernel kGauss(t, t′), a Matern kernel kMatern(t t′), and a Wiener kernel kWiener(t, t′), respectively, as the positive semi-definite kernel k(t, t′) where the function h shown in the above equation (2) is adopted as a GCP kernel. VBPP represents the method described in NPL 1 above.
λ1(t), λ2(t), and λ3(t) in Table 1 above are intensity functions used to generate sequences of points included in the artificial dataset, similar to those described in “4 Experiments” in the above reference. Numbers in parentheses in Table 1 above are standard L2 norm errors between the estimated values and the true values of the rate function over 10 trials.
As shown in Table 1 above, it can be seen that, in the case of using a sequence of points generated by the intensity function λ2(t), PPWiener is slightly less accurate than VBPP, whereas in other cases, the proposed methods can accurately estimate rate functions compared to the conventional method.
Hardware Configuration
Finally, a hardware configuration of the estimation apparatus 10 according to the present embodiment will be described with reference to
As illustrated in
The input device 201 includes, for example, a keyboard, a mouse, and a touch panel. The display device 202 is, for example, a display. Note that the estimation apparatus 10 may not include at least one of the input device 201 and the display device 202.
The external interface 203 is an interface with an external device such as a recording medium 203a. The estimation apparatus 10 can perform reading and writing from and to the recording medium 203a or the like via the external interface 203. The recording medium 203a may store, for example, one or more programs that implement each of the functional units (the estimator generation unit 101, the estimation unit 102, the parameter determination unit 103, and the error calculation unit 104) included in the estimation apparatus 10. Examples of the recording medium 203a include a compact disc (CD), a digital versatile disk (DVD), a secure digital (SD) memory card, and a universal serial bus (USB) memory card.
The communication interface 204 is an interface for connecting the estimation apparatus 10 to the communication network. One or more programs that implement each functional unit of the estimation apparatus 10 may be acquired (downloaded) from a predetermined server device or the like via the communication interface 204.
The processor 205 is any of various arithmetic/logic units such as, for example, a central processing unit (CPU). Each functional unit of the estimation apparatus 10 is implemented, for example, by processing that one or more programs stored in the memory device 206 cause the processor 205 to execute.
The memory device 206 is any of various storage devices such as, for example, an HDD, an SSD, a random access memory (RAM), a read only memory (ROM), or a flash memory.
The estimation apparatus 10 according to the present embodiment can implement the estimation processing described above by having the hardware configuration illustrated in
The present invention is not limited to the specific embodiment disclosed above and various modifications, changes, combinations with known techniques, and the like can be made without departing from the scope of the claims. The examples can be combined. For example, the second and third examples can be combined to configure an estimation apparatus 10 having the estimator generation unit 101, the estimation unit 102, the parameter determination unit 103, and the error calculation unit 104.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/018115 | 4/28/2020 | WO |