1. Technical Field
The invention relates to an observation value prediction device and an observation value prediction method, which are used in robots and the like.
2. Related Art
For example, a method of acquiring physical knowledge has been developed in which, in a case where a robot performs an operation on an object and the object is moved as a result, a hidden Markov model is used to learn a relation between the operation of the robot and the track of the object based on time series information of the robot itself and time series information of the visually observed object (for example, Komei Sugiura, Naoto Iwahashi, Hideki Kashioka, "HMM Synthesis by Penalized Likelihood Maximization for Object Manipulation Tasks," Departmental Lecture, SICE System Integration, pp. 2305-2306, 2012). In methods according to the related art, including the above method, a track is generated by generalizing and reproducing the learned track. Therefore, the methods according to the related art cannot generate an unknown track of the object from an unknown operation of the robot which has not been learned. In other words, when the track of the object is taken as the observation target, an unknown observation value that has not been learned can hardly be predicted. As described above, no prediction device or prediction method which can predict an unknown observation value that has not been learned has been developed in the related art.
Such a prediction device and prediction method which can predict an unknown observation value that has not been learned have not been put to practical use. Therefore, there is a need for a prediction device and a prediction method which can predict an unknown observation value that has not been learned.
A prediction device according to a first aspect of the invention includes: an observation unit configured to acquire an observation value of an observation target object; a learning unit configured to learn a transition probability and a probability distribution of a model from time series data of the observation value, wherein the model represents states of the observation target object and includes the transition probability between a plurality of states and the probability distribution of the observation value corresponding to each state; and a prediction unit configured to predict, using the time series data of the observation value before a predetermined time, a state at the predetermined time based on the transition probability, and to predict an observation value corresponding to the state at the predetermined time based on the probability distribution.
According to the prediction device of the aspect, the unknown observation value not learned can be predicted by using the model representing states of the observation target object and including the transition probability between the plurality of states and the probability distribution of the observation value which corresponds to each state.
In the prediction device according to a first embodiment of the first aspect of the invention, the prediction unit is configured to obtain the state at the predetermined time and a plurality of sampling values of the observation value corresponding to the state, and to set the average value of the plurality of sampling values as the prediction value of the observation value.
According to the embodiment, a prediction value can be simply obtained by setting the average value of the plurality of sampling values to the prediction value of the observation value.
In the prediction device according to a second embodiment of the first aspect of the invention, the observation value includes a position and a speed of the observation target object, and the prediction unit is configured to perform the prediction using the probability distribution of the position of the observation target object.
According to the embodiment, since a position of the object satisfying a dynamic constraint can be generated, a smooth track of the object can be generated.
In the prediction device according to a third embodiment of the first aspect of the invention, the model is a hierarchical Dirichlet process-hidden Markov model and the learning unit is configured to perform learning by Gibbs sampling.
According to the embodiment, there is no need to determine the number of states in advance, and the optimal number of states can be estimated according to the complexity of the learning data.
A prediction method according to a second aspect of the invention predicts an observation value using a model, in which the model represents states of an observation target object and includes a transition probability between a plurality of states and a probability distribution of an observation value which corresponds to each state. The prediction method includes obtaining an observation value of the observation target object, learning the transition probability and the probability distribution of the model from time series data of the observation value, and predicting, using the time series data of the observation value before a predetermined time, a state at the predetermined time based on the transition probability, and predicting an observation value corresponding to the state at the predetermined time based on the probability distribution.
According to the prediction method of the aspect, the unknown observation value not learned can be predicted by using the model representing states of the observation target object and including the transition probability between the plurality of states and the probability distribution of the observation value which corresponds to each state.
As an example, in a case where a robot performs an operation on an object using an arm, the arm and the object become the observation target objects. For example, an axis in the lateral direction when the robot is viewed from the front is taken as an x axis, and an axis in the longitudinal direction is taken as a y axis. The x coordinate and the y coordinate in front of the robot and the differences in these coordinates are used as 4-dimensional information (the observation value) of the arm, and similarly, the x coordinate and the y coordinate of the object and the differences in these coordinates are used as 4-dimensional information (the observation value) of the object.
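The 4-dimensional observation described above (coordinates plus frame-to-frame differences) can be sketched in code; the sample track and the helper name `to_observations` are hypothetical illustrations, not part of the invention.

```python
# Sketch: building the 4-dimensional observation vectors described above.
# The positions are hypothetical sample data; x and y are coordinates in
# front of the robot, and the last two components are frame-to-frame
# differences used as dynamic features.

def to_observations(track):
    """Turn a list of (x, y) positions into 4-d observations
    (x, y, dx, dy), using a zero difference for the first frame."""
    obs = []
    prev = track[0]
    for x, y in track:
        obs.append((x, y, x - prev[0], y - prev[1]))
        prev = (x, y)
    return obs

arm_track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]  # hypothetical arm positions
y1 = to_observations(arm_track)
```

The object's observation y2 would be built the same way from the object's track.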
The observation unit 101 is configured to acquire the observation values of the arm and the object using an image pickup device or various types of sensors of the robot. In other words, the observation unit 101 acquires the observation value of an observation target object (for example, the object), and also acquires other data (for example, position information of the arm of the robot) if necessary.
When the robot touches the object, the prediction device 100 observes the movement of the robot itself and the movement of the object and learns and predicts the relation between these movements. Through the learning, the robot can gain "knowledge" such as that a round object rolls when touched, that a round object rolls farther away when touched with a stronger force, or that a square object and a heavy object are hard to roll. Of course, the movement of the object can be predicted with high accuracy through a physical simulation. However, the physical simulation requires parameters which are difficult to observe directly, such as the mass of the object, a friction factor, and the like. On the other hand, a person can predict the movement (track) of an object by using knowledge gained through experience based on visually acquired information, without using such parameters. Therefore, learning and predicting by the above-mentioned prediction device 100 are important also for the robot.
As described above, the prediction device 100 uses time series information on the position of the arm and time series information on the position of the object obtained from the observation unit 101. Hitherto, a hidden Markov model (HMM) has been used for the learning of the track of the object, the operation of the robot, and the like (Komei Sugiura, Naoto Iwahashi, Hideki Kashioka, "HMM Synthesis by Penalized Likelihood Maximization for Object Manipulation Tasks," Departmental Lecture, SICE System Integration, pp. 2305-2306, 2012). In the HMM, the number of states has to be given in advance. However, in the embodiment, since the optimal number of states differs according to the operation of the robot and the object, it is difficult to set the number of states in advance. Thus, the prediction device 100 employs a hierarchical Dirichlet process-hidden Markov model (HDP-HMM), in which a hierarchical Dirichlet process (HDP) is introduced into the HMM (M. J. Beal, Z. Ghahramani, and C. E. Rasmussen, "The infinite hidden Markov model," Advances in Neural Information Processing Systems, pp. 577-584, 2001). The HDP-HMM is a model in which the number of states is not determined in advance and the optimal number of states can be estimated according to the complexity of the learning data. In the embodiment, the HDP-HMM is further expanded to a multimodal HDP-HMM (MHDP-HMM), in which a plurality of pieces of time series information, such as the track of the object and the operation (that is, the movement of the arm) of the robot itself, can be learned, and unsupervised learning of the operation of the robot itself and the track of the object is performed.
Such learning of a plurality of pieces of information using the MHDP-HMM enables a stochastic prediction of other, not yet observed information based on one piece of information. For example, even before the robot actually moves, it is possible to predict the movement of the object based only on the movement to be made by the robot. The prediction of the track of the object can be realized by predicting a future state based on the obtained information and by generating a track of the object corresponding to that state.
(s0,s1, . . . , sT) [Mathematical Formula 1]
(y11,y12, . . . , y1T) [Mathematical Formula 2]
(y21,y22, . . . , y2T) [Mathematical Formula 3]
(where, y1* is information of the arm of the robot, and y2* is information of the object.)
Each state st (t=0, . . . , T) [Mathematical Formula 4] can take an infinite number of states k (=0, . . . , ∞) [Mathematical Formula 5].
(where, πk represents a probability to transition from state k to each state.)
The probability πk is calculated based on β which is generated by a GEM distribution (Stick Breaking Process) having γ as a parameter and the Dirichlet Process (DP) having α as a parameter (Daichi Mochihashi, “Recent Advances and Applications on Bayesian Theory (III): An Introduction to Nonparametric Bayesian Models” http://www.ism.ac.jp/˜daichi/paper/ieice10npbayes.pdf, Naonori Ueda, and another, “Introduction to Nonparametric Bayesian Models” http://www.kecl.ntt.co.jp/as/members/yamada/dpm_ueda_yamada2007.pdf, Yee Whye Teh, and three others, “Hierarchical Dirichlet Processes” http://www.cs.berkeley.edu/˜jordan/papers/hdp.pdf).
[Mathematical Formula 6]
β˜GEM(γ) (1)
πk˜DP(α,β) (2)
Herein, regarding α and γ, a gamma distribution is assumed as a prior distribution, and sampling is performed based on the posteriori probability (Yee Whye Teh, and three others, "Hierarchical Dirichlet Processes" http://www.cs.berkeley.edu/˜jordan/papers/hdp.pdf).
State st at time t is determined by state st−1 at time t−1 and a transition probability πk. Further, θ* is a parameter of a probability distribution to generate an observation value y*t, and in this case an average and a dispersion of the Gaussian distribution are assumed. Moreover, a Gaussian Wishart distribution is assumed as a prior distribution of the Gaussian distribution, and the parameter is denoted by H*. In other words, the following relations are established.
[Mathematical Formula 7]
st˜M(πst−1) (3)
θ*k˜P(θ*k|H*) (4)
y*t˜N(y|θ*st) (5)
(where, M represents a multinomial distribution, P of Equation (4) represents a Gaussian Wishart distribution, and N represents a Gaussian distribution.)
In the model 105, the transition probability πk and the parameter θ*k of the Gaussian distribution are obtained by learning.
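The generative process of Equations (1) to (5) can be sketched as follows; the truncation level K, the hyperparameter values, and the state means are hypothetical choices made only so the sketch is finite and runnable (the actual model allows an unbounded number of states).

```python
import random

# Sketch of the generative model in Equations (1)-(5), truncated to a
# finite number of states K. gamma_, alpha, K, and the state means mu
# are hypothetical illustration values, not values from the text.

random.seed(0)
K, gamma_, alpha = 5, 1.0, 1.0

# beta ~ GEM(gamma): stick-breaking weights over states.
beta, stick = [], 1.0
for _ in range(K - 1):
    v = random.betavariate(1.0, gamma_)
    beta.append(stick * v)
    stick *= 1.0 - v
beta.append(stick)

# pi_k ~ DP(alpha, beta), approximated here by a finite Dirichlet
# distribution with concentration alpha * beta.
def sample_dirichlet(conc):
    g = [random.gammavariate(max(c, 1e-6), 1.0) for c in conc]
    s = sum(g)
    return [x / s for x in g]

pi = [sample_dirichlet([alpha * b for b in beta]) for _ in range(K)]

def sample_state(probs):
    r, acc = random.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if r < acc:
            return k
    return len(probs) - 1

# s_t ~ M(pi_{s_{t-1}}), y_t ~ N(mu_{s_t}, 1): one short trajectory.
mu = [float(k) for k in range(K)]          # hypothetical state means
s, states, ys = 0, [], []
for _ in range(10):
    s = sample_state(pi[s])
    states.append(s)
    ys.append(random.gauss(mu[s], 1.0))
```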
Next, the learning of the model 105 will be described. The learning is realized by sampling state st at each time t using Gibbs sampling. In the Gibbs sampling, st is sampled from the following conditional probability, conditioned on all the variables excluding st.
[Mathematical Formula 8]
P(st|s−t, β, Y1, Y2, α, H1, H2)∝P(st|s−t, β, α)P(y1t|st, s−t, Y1,−t, H1)×P(y2t|st, s−t, Y2,−t, H2) (6)
In this case, each of Y1 and Y2 is a set of all the observation data. Further, the subscript −t denotes all the elements excluding the one at time t. In other words, s−t represents the states at all times excluding st, and Y1,−t and Y2,−t represent the sets in which y1t and y2t are excluded from Y1 and Y2, respectively. The following Expression 9 in Equation (6) can be expressed by the following Expression 10 through Bayesian inference.
P(y*t|st, s−t, Y*,−t, H*) [Mathematical Formula 9]
[Mathematical Formula 10]
P(y*t|st, s−t, Y*,−t, H*)=∫P(y*t|st, θst)P(θst|s−t, Y*,−t, H*)dθst (7)
Further, Expression 11 is a state transition probability.
P(st|s−t, β, α) [Mathematical Formula 11]
Expression 11 can be expressed by the following Expression 12, where nij represents the number of transitions from state i to state j.
[Mathematical Formula 12]
P(st=k|s−t, β, α)∝(nst−1,k+αβk)(nk,st+1+αβst+1)/(nk·+α) (k=1, . . . , K), P(st=K+1|s−t, β, α)∝αβK+1βst+1
Herein, K is the number of current states, and the case of k=K+1 means that a new state is generated.
In Equation (6), a spatial constraint expressed by Equation (7) and a time constraint expressed by the state transition probability (Expression 11) are taken into consideration.
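The time-constraint factor of the state transition probability can be sketched numerically; the transition counts, the β vector, and the helper name `transition_weights` are hypothetical, and the sketch keeps only the commonly used incoming-count term n(st−1, k)+αβk, with the last weight standing for a newly generated state.

```python
# Sketch of the time-constraint factor in the Gibbs conditional: the
# weight of each state k given the previous state, with the extra entry
# k = K + 1 opening a new state. n[i][k] are hypothetical transition
# counts and beta ends with the remaining stick-breaking mass.

def transition_weights(prev, n, beta, alpha):
    """Normalized weights over states 1..K plus a new state K+1."""
    K = len(n)
    w = [n[prev][k] + alpha * beta[k] for k in range(K)]
    w.append(alpha * beta[K])          # k = K + 1: a new state
    total = sum(w)
    return [x / total for x in w]

n = [[3, 1], [0, 2]]                   # hypothetical counts, K = 2
beta = [0.5, 0.3, 0.2]                 # last entry: unused stick mass
probs = transition_weights(0, n, beta, alpha=1.0)
```

States that have often followed the previous state receive most of the mass, while the new-state entry keeps a small but nonzero probability.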
The learning starts from a random initial value, and by repeating the sampling according to Equation (6), the transition probability (Expression 13) and the probability distribution (Expression 14), which outputs an observation value according to a state, are obtained.
P(s|s, β, α) [Mathematical Formula 13]
P(y*t|s, Y*,−t, H*) [Mathematical Formula 14]
Further, in the embodiment, the hyperparameters α and β are also estimated through the sampling (Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet processes," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1566-1581, 2006).
Herein, the parameter of the posteriori distribution of the Gaussian distribution corresponding to state st is denoted by θ′st. In other words, the following equation is established.
P(y*t|st, s−t, Y*,−t, H*)=∫P(y*t|st, θst)P(θst|θ′st)dθst [Mathematical Formula 15]
Further, updating the parameter of the posteriori distribution by adding an observation data item y is denoted by the following Expression 16.
θ′st⊕y [Mathematical Formula 16]
On the contrary, updating the parameter of the posteriori distribution by excluding the observation data item y is denoted by the following Expression 17.
θ′st⊖y [Mathematical Formula 17]
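The add/exclude updates of Expression 16 and Expression 17 can be sketched with a simplified 1-dimensional model; a conjugate Normal prior with known observation variance is assumed here in place of the Gaussian-Wishart distribution, and the class name and prior values are hypothetical.

```python
# Sketch of the add/exclude updates of Expression 16 and Expression 17,
# using a 1-d Gaussian with known variance and a conjugate Normal prior
# for brevity. The posteriori parameters are kept as sufficient
# statistics (count and sum), so adding and then excluding the same
# observation restores the original state, which is what the Gibbs
# sampler relies on when it resamples one state at a time.

class GaussPosterior:
    def __init__(self, mu0=0.0, kappa0=1.0):
        self.mu0, self.kappa0 = mu0, kappa0   # prior mean / pseudo-count
        self.n, self.total = 0, 0.0           # sufficient statistics

    def add(self, y):                          # Expression 16
        self.n += 1
        self.total += y

    def remove(self, y):                       # Expression 17
        self.n -= 1
        self.total -= y

    def mean(self):
        """Posteriori mean of the Gaussian's mean parameter."""
        return (self.kappa0 * self.mu0 + self.total) / (self.kappa0 + self.n)

post = GaussPosterior()
post.add(2.0)
post.add(4.0)
m_with = post.mean()       # posteriori mean with both observations
post.remove(4.0)
m_without = post.mean()    # posteriori mean after excluding 4.0
```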
In Steps S1010 to S1070 of the learning sequence, the learning described above is carried out: starting from a random initial value, the state st at each time t is sampled according to Equation (6) while the parameters of the posteriori distributions are updated by Expression 16 and Expression 17, and the sampling is repeated until it converges.
Next, the prediction of the position of the object using the model 105 will be described. In a case where position p2,t−1 of the object at time t−1 is given, position p2,t of the object at time t can be calculated by the following Equation (8). Here, in consideration of the positional difference from the position at the previous time as a dynamic feature, the following Expression 18 is established.
y2,t={p2,tT, (p2,t−p2,t−1)T}T [Mathematical Formula 18]
[Mathematical Formula 19]
N(y2,t|Σst, μst) (8)
Σst, μst [Mathematical Formula 20]
In this case, Expression 20 represents a dispersion and an average of the Gaussian distribution corresponding to state st. Herein, assuming that position p2,t−1 is already known, Equation (8) can be modified into an equation depending only on position p2,t.
N(y2,t|Σst, μst)∝N(p2,t|Σ′, μ′) [Mathematical Formula 21]
In this case, Σ′ and μ′ (Expression 23) are the dispersion and the average of the Gaussian distribution over p2,t that results from combining the position term and the positional-difference term of Equation (8) with p2,t−1 fixed.
It is possible to generate position p2,t of the object satisfying a dynamic constraint by performing the sampling from the Gaussian distribution having the average and the dispersion. In other words, the following equation is established.
[Mathematical Formula 25]
p2,t˜P(p2,t|st, p2,t−1)=N(p2,t|Σ′, μ′) (14)
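Equation (14) can be sketched in one dimension under a hypothetical diagonal covariance: the state's Gaussian over (position, difference) collapses, once p2,t−1 is fixed, into a single Gaussian over p2,t whose average and dispersion play the roles of μ′ and Σ′. All numeric parameters below are illustrative assumptions.

```python
import random

# Sketch of Equation (14) in one dimension with a diagonal covariance:
# the state's Gaussian over (position, difference) induces a Gaussian
# over p_t alone once p_{t-1} is known. mu_p, var_p, mu_d, var_d are
# hypothetical state parameters, not values from the text.

def constrained_gaussian(mu_p, var_p, mu_d, var_d, p_prev):
    """Combine the position term N(p | mu_p, var_p) and the dynamic
    term N(p - p_prev | mu_d, var_d) into one Gaussian (mu', var')."""
    lam = 1.0 / var_p + 1.0 / var_d           # combined precision
    mu = (mu_p / var_p + (p_prev + mu_d) / var_d) / lam
    return mu, 1.0 / lam

random.seed(1)
mu_, var_ = constrained_gaussian(mu_p=3.0, var_p=1.0,
                                 mu_d=0.5, var_d=1.0, p_prev=2.0)
p_t = random.gauss(mu_, var_ ** 0.5)          # a position sample
```

Because the dynamic term pulls the sample toward p_prev + mu_d, consecutive positions stay close, which is why the generated track is smooth.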
In a case where the state sequence is already known, it is possible to generate a track by repeating sequential sampling using Equation (14). However, the operation applied to the object is not necessarily limited to the tracks included in the learning data. Therefore, generation of a track when the state is uncertain will be considered. In a case where state st−1 at time t−1 and position p2,t−1 of the object at that moment are given, the expected value of position p2,t of the object at time t is expressed by the following equation.
[Mathematical Formula 26]
p̂2,t=∫∫p2,t P(p2,t|st, p2,t−1)×P(st|st−1, p2,t−1)dp2,tdst (15)
In this way, a track can be generated even when the state is uncertain. However, since it is difficult to solve the integration analytically, an approximation is performed using a Monte Carlo method. First, the following sampling is repeated N times, and N sampling values are obtained at time t.
(p1, . . . , pn, . . . , pN) [Mathematical Formula 27]
[Mathematical Formula 28]
sn˜P(sn|st−1, p2,t−1) (16)
pn˜P(pn|sn, p2,t−1) (17)
However, Expression 29 in Equation (16) is obtained using a part of the state transition probability (Expression 30) as follows.
P(sn|st−1, p2,t−1) [Mathematical Formula 29]
P(st|s−t, β, α) [Mathematical Formula 30]
P(sn|st−1, p2,t−1)∝nst−1,sn+αβsn [Mathematical Formula 31]
Expression 32 in Equation (17) uses Equation (14), which takes the dynamic constraint into consideration.
P(pn|sn, p2,t−1) [Mathematical Formula 32]
Finally, the average value of the N sampling values is taken as the prediction value of the position of the object at time t.
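The Monte Carlo approximation of Equation (15) by Equations (16) and (17) can be sketched as follows; the two-state setup, its weights, and the per-state Gaussians are hypothetical illustration data.

```python
import random

# Sketch of the Monte Carlo approximation of Equation (15): sample a
# state (Eq. (16)), then a position given that state (Eq. (17)),
# N times, and average. The two-state weights, means, and standard
# deviation are hypothetical illustration values.

random.seed(2)
N = 2000
state_probs = [0.7, 0.3]            # stands for P(s_n | s_{t-1}, p_{2,t-1})
means, sd = [1.0, 5.0], 0.1         # per-state position Gaussians

samples = []
for _ in range(N):
    s = 0 if random.random() < state_probs[0] else 1   # Eq. (16)
    samples.append(random.gauss(means[s], sd))          # Eq. (17)

p_pred = sum(samples) / N           # prediction value: the average
```

With these weights the average lands near 0.7·1.0 + 0.3·5.0 = 2.2, blending the two possible futures in proportion to their probability.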
Herein, assuming that only the track of the arm from time 0 to time Tarm is observed and that the probability P(sTarm=k) of being in state k at time Tarm and an initial value p2,Tarm of the object are given, the track of the object is generated. The state at time Tarm is expressed by the following Expression 33.
P(sTarm=k) [Mathematical Formula 33]
In Steps S2010 to S2030 of the prediction sequence, the N sampling values are initialized as follows.
[Mathematical Formula 35]
sn˜P(sTarm=k) for all n (19)
pn=p2,t−1 for all n (20)
In Steps S2040 to S2080 of the prediction sequence, a next state and a next position are sampled for each of the N sampling values as follows.
[Mathematical Formula 36]
sn˜P(s|sn, p2,t−1) for all n (21)
pn˜P(pn|sn, p2,t−1) for all n (22)
Herein, Equation (21) corresponds to Equation (16), and Equation (22) corresponds to Equation (17).
In Steps S2090 to S2120 of the prediction sequence, the average value of the N position samples is taken as the prediction value of the position at time t, the time is advanced, and the sampling is repeated until a track of a desired length is generated.
Next, a simulation experiment of the prediction device 100 according to the embodiment will be described. The track of the arm and the track of the object when the arm of the robot touches the object are obtained by a simulator. The simulator is created with a physics calculation engine, the Open Dynamics Engine (ODE) (http://www.ode.org/). With ODE, collisions, friction, and the like of the object can be simulated, and various types of information such as the position and the speed of the object on the simulator can be obtained.
In the embodiment, assuming a sphere having a radius of 10 centimeters as the object, the track of the arm and the track of the object are obtained by ODE in a case where the robot applies a force to the object from the side and in a case where a force is applied from above.
Actually, as a result of learning the tracks obtained by the simulator, the model 105 described above was obtained. Next, the track of the object is generated according to the prediction sequence described above.
Next, as a prediction on an unknown track, a track is predicted in a case where the arm obliquely collides with the object.
In the above description, the case where y1 is information of the arm of the robot and y2 is information of the object (for example, a ball) has been given as an example. However, the invention can also be applied to other cases, of course. Herein, another specific example to which the invention is applicable will be described.
In the first place, a case where the invention is applied to relations between an object and an object, a person and a person, a vehicle and a person, a vehicle and a vehicle, and the like may be considered. By setting the 4-dimensional data of the position and the speed of one member of each pair to y1 and the 4-dimensional data of the position and the speed of the other to y2, it is possible to learn the relation between y1 and y2 and to predict information of one member of each pair from the other. For example, considering a case where a person (y1) and a person (y2) pass by one another, it is possible to predict various behaviors of a person; for example, if y1 unexpectedly steps aside to the left side, y2 goes to the opposite side, or if y1 keeps walking in the middle of the road, y2 is likely to step aside in some direction.
Next, there is considered a case where the invention is applied to the relation between the color of a traffic signal at an intersection and the speed of a vehicle. In this case, the position and the speed of the vehicle are y1, and the color of the signal is y2. Since the color of the signal is one of three values (red, blue, and yellow), θ2 is set as a parameter of a multinomial distribution, and H2 is set as a parameter of a Dirichlet distribution. The position and the speed of the vehicle y1 are considered, for example, in a coordinate system whose origin is the center of the intersection. The relation between y1 and y2 is then learned according to the method of the invention, and, for example, in a case where the color (y2) of the signal changes to yellow given the current position and the current speed (y1) of the vehicle, a future position and a future speed (y1) of the vehicle can be predicted. Furthermore, the track of the vehicle can be predicted according to the invention. In addition, the change of the behavior (y1) of the vehicle can also be learned according to the timing at which the color (y2) of the signal changes.
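The signal-color channel described above can be sketched as a categorical observation with a Dirichlet prior; the color counts and the helper name `posterior_theta` are hypothetical, and the sketch shows only the posterior mean of θ2 under H2.

```python
# Sketch of the signal-color observation channel described above: a
# three-valued categorical variable (multinomial parameter theta2) with
# a Dirichlet prior H2, updated by counting observed colors. The counts
# are hypothetical illustration data.

COLORS = ("red", "blue", "yellow")

def posterior_theta(counts, h2):
    """Posterior mean of theta2 under a Dirichlet(h2) prior."""
    total = sum(counts[c] for c in COLORS) + sum(h2)
    return {c: (counts[c] + h2[i]) / total
            for i, c in enumerate(COLORS)}

counts = {"red": 6, "blue": 3, "yellow": 1}   # hypothetical observations
theta2 = posterior_theta(counts, h2=(1.0, 1.0, 1.0))
```

The same conjugate update would apply to the added categorical channels y3 to y5 with their Dirichlet priors H3 to H5.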
Further, the gender (y3) of a driver, the model (y4) of a vehicle, the age (y5) of a driver, and the like may be added as observation information, and thus the relation among y1 to y5 can be grasped. In this case, θ3 to θ5 become parameters of multinomial distributions having as many categories as the possible values of these elements, and H3 to H5 become parameters of Dirichlet prior distributions.
Number | Date | Country | Kind |
---|---|---|---|
2013-181269 | Sep 2013 | JP | national |