RELATIONSHIP EXTRACTION APPARATUS, RELATIONSHIP EXTRACTION METHOD, AND PROGRAM

Information

  • Patent Application
  • Publication Number
    20230082140
  • Date Filed
    February 22, 2021
  • Date Published
    March 16, 2023
  • CPC
    • G06N20/10
  • International Classifications
    • G06N20/10
Abstract
A relationship extraction device includes a memory; and a processor configured to execute obtaining a set of data {x0, . . . , xT−1}⊆X each having multiple elements and a set of data {y0=f(x0), . . . , yT−1=f(xT−1)}⊆Y each having multiple elements, where f is any mapping; generating an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ1(xt)=φ2(yt) for t=0, . . . , T−1, wherein φ1 is a feature mapping with respect to a positive definite kernel function k1 on X×X that takes C*-algebra values, and φ2 is a feature mapping with respect to a positive definite kernel function k2 on Y×Y that takes C*-algebra values; obtaining data xt and xs as targets of relationship extraction; and extracting a relationship between each element of xt and each element of xs by using the approximate operator.
Description
TECHNICAL FIELD

The present invention relates to a relationship extraction device, a method of extracting relationship, and a program.


BACKGROUND ART

For data having multiple elements, correlations between elements have been investigated in various technical fields (e.g., in the fields of statistics, machine learning, molecular dynamics, etc.).


For example, in the field of statistics and machine learning, techniques have been proposed in which vectors having multiple elements of data arranged are mapped to a space called vv-RKHS (vector-valued reproducing kernel Hilbert space), to approximate a function that represents a relationship between the elements on the vv-RKHS (Non-patent document 1). As the vv-RKHS is a space of vector-valued functions, it has an advantage of being capable of approximating the relationship among multiple elements at once. Note that techniques have been also proposed that extract information on cyclic components from time series data that represents change in time of the relationship by using the vv-RKHS for time series data (Non-patent document 2).


Also, for example, in the field of physics and molecular dynamics, techniques have been proposed that extract information on collective oscillations by a method called phase reduction (Non-patent document 3). Also, for example, in the field of machine learning, methods have been proposed that extract variables in a causality relationship by a method called Granger causality (Non-patent document 4).


The vv-RKHS described above is a generalization of the RKHS (reproducing kernel Hilbert space) used for analyzing data having a single element. By using the RKHS, data exhibiting complex behavior can be converted into data exhibiting simple behavior. Using this property, techniques have been studied that approximate complex time series data with a simple function on the RKHS (Non-patent document 5).


Here, as another generalization of the RKHS, a space called RKHM (reproducing kernel Hilbert C*-module) has been proposed, and theoretical analysis has been conducted in the field of physics (Non-patent document 6). The RKHM is a space of functions having values in a space called C*-algebra, and hence, can be used for approximating a C*-algebra-valued function. Note that C*-algebra is a generalization of a set of all complex numbers and a set of all matrices, and is a space having the concepts of conjugation and norm.


RELATED ART DOCUMENTS
Non-Patent Document



  • [Non-patent document 1] Mauricio A. Alvarez, Lorenzo Rosasco, and Neil D. Lawrence, ‘Kernels for vector-valued functions: a review,’ Computer Science and Artificial Intelligence Laboratory Technical Report, MIT-CSAIL-TR-2011-033 CBCL-301, 2011.

  • [Non-patent document 2] Keisuke Fujii, Yoshinobu Kawahara, ‘Dynamic mode decomposition in vector-valued reproducing kernel Hilbert spaces for extracting dynamical structure among observables,’ Neural Networks 117, pp. 94-103, 2019.

  • [Non-patent document 3] Hiroya Nakao, Sho Yasui, Masashi Ota, Kensuke Arai and Yoji Kawamura, ‘Phase reduction and synchronization of a network of coupled dynamical elements exhibiting collective oscillations,’ Chaos 28, 045103, 2018.

  • [Non-patent document 4] Songting Li, Yanyang Xiao, Douglas Zhou and David Cai, ‘Causal inference in nonlinear systems: Granger causality versus time-delayed mutual information,’ Phys. Rev. E 97, 052216, 2018.

  • [Non-patent document 5] Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda, Yoichi Matsuo and Yoshinobu Kawahara, ‘Krylov Subspace Method for Nonlinear Dynamical Systems with Random Noise,’ arXiv: 1909.03634, 2019.

  • [Non-patent document 6] Jaeseong Heo, ‘Reproducing kernel Hilbert C*-modules and kernels associated with cocycles,’ J. Math. Phys. 49, 103507, 2008.



SUMMARY OF THE INVENTION
Problem to be Solved by the Invention

Meanwhile, the RKHS is only capable of handling data having a single element, and hence, cannot describe a relationship among multiple elements. Also, the phase reduction aims at approximating collective behavior of data, and hence, cannot represent a relationship among elements. On the other hand, although the vv-RKHS takes a relationship among multiple elements into consideration, the proximity between vector-valued functions included in the vv-RKHS is measured in complex values. Therefore, for example, in the case where the purpose is to completely extract information on the relationship of any two elements from among the multiple elements, the number of relationships between two data items each having n elements becomes n², and hence, it becomes necessary to represent the proximity of the functions corresponding to these data items by n² complex numbers.


In contrast, if using the RKHM, the proximity of functions can be measured by a C*-algebra value of a matrix or the like. However, there is no framework of using the RKHM that aims at extracting relationships between elements of data having multiple elements.


One embodiment of the present invention was devised in view of the above points, and has an object to extract relationships between elements held in data using the RKHM.


Means for Solving Problem

As described above, in order to achieve the object, a relationship extraction device according to one embodiment includes a first obtaining means configured to obtain a set of data {x0, . . . , xT−1}⊆X each having multiple elements and a set of data {y0=f(x0), . . . , yT−1=f(xT−1)}⊆Y each having multiple elements, where f is any mapping; a generation means configured to generate an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ1(xt)=φ2(yt) for t=0, . . . , T−1, wherein φ1 is a feature mapping with respect to a positive definite kernel function k1 on X×X that takes C*-algebra values, and φ2 is a feature mapping with respect to a positive definite kernel function k2 on Y×Y that takes C*-algebra values; a second obtaining means configured to obtain data xt and xs as targets of relationship extraction; and an extraction means configured to extract a relationship between each element of xt and each element of xs by using the approximate operator.


Advantageous Effects of the Invention

Relationships between elements held in data can be extracted using the RKHM.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a diagram illustrating an example of a functional configuration of a relationship extraction device according to a present embodiment;



FIG. 2 is a flow chart illustrating an example of an approximate operator generation process according to the present embodiment;



FIG. 3 is a flow chart illustrating an example of a relationship extraction process according to the present embodiment;



FIG. 4A is a diagram (part 1) illustrating an example of an evaluation result;



FIG. 4B is a diagram (part 1) illustrating an example of an evaluation result;



FIG. 5A is a diagram (part 2) illustrating an example of an evaluation result;



FIG. 5B is a diagram (part 2) illustrating an example of an evaluation result;



FIG. 5C is a diagram (part 2) illustrating an example of an evaluation result;



FIG. 5D is a diagram (part 2) illustrating an example of an evaluation result;



FIG. 6A is a diagram (part 3) illustrating an example of an evaluation result;



FIG. 6B is a diagram (part 3) illustrating an example of an evaluation result;



FIG. 7A is a diagram (part 4) illustrating an example of an evaluation result;



FIG. 7B is a diagram (part 4) illustrating an example of an evaluation result; and



FIG. 8 is a diagram illustrating an example of a hardware configuration of a relationship extraction device according to the present embodiment.





EMBODIMENTS FOR CARRYING OUT THE INVENTION

In the following, one embodiment of the present invention will be described. In the present embodiment, a relationship extraction device 10 will be described that, when data including one or more items each having multiple elements is given, can extract a relationship (i.e., interrelation) between any two elements held in the data by using an RKHM.


<Theoretical Construction>

First, a theoretical construction of the present embodiment will be described.


<<Settings>>

Let X be a space to which data having multiple elements belongs, and let {x0, x1, . . . }⊆X be a set of given data. Let A be a C*-algebra, and consider an A-valued positive definite kernel k:X×X→A. Here, a mapping k:X×X→A is called an A-valued positive definite kernel when it satisfies the following Condition 1 and Condition 2.


(Condition 1) For any x,y∈X, k(x,y)=k(y,x)* where * denotes conjugate.


(Condition 2) Let m be any natural number, for any x0, x1, . . . , xm-1∈X and any c0, c1, . . . cm-1∈A, the following double summation is positive.









\sum_{t=0}^{m-1} \sum_{s=0}^{m-1} c_t^* \, k(x_t, x_s) \, c_s  [Math. 1]













Here, “positive” means being a positive element of the C*-algebra, which is a generalization of a Hermitian matrix all of whose eigenvalues are greater than or equal to 0 (i.e., a Hermitian positive semi-definite matrix).


Given an A-valued positive definite kernel k, a mapping φ from X to an A-valued function is defined by φ(x)=k(⋅,x). This mapping φ is also referred to as a feature map.


For a natural number m, x0, x1, . . . , xm-1∈X, and c0, c1, . . . , cm-1∈A, let Mk,0 be a space configured from the entirety of the following linear combination.









\sum_{t=0}^{m-1} \phi(x_t) c_t  [Math. 2]












Also, for natural numbers m and m′, x0, x1, . . . , xm-1, y0, y1, . . . , ym′-1∈X, and c0, c1, . . . , cm-1, d0, d1, . . . , dm′-1∈A, an operation <⋅, ⋅>k with respect to Mk,0 is defined as follows:









\left\langle \sum_{t=0}^{m-1} \phi(x_t) c_t,\ \sum_{t=0}^{m'-1} \phi(y_t) d_t \right\rangle_k = \sum_{t=0}^{m-1} \sum_{s=0}^{m'-1} c_t^* \, k(x_t, y_s) \, d_s  [Math. 3]














The operation <⋅,⋅>k defined in this way has the properties of the A-valued inner product. In other words, the operation has the following four properties with respect to u,v,w∈Mk,0 and c,d∈A:

    • <u,v>k=<v,u>k*
    • <u,u>k is positive
    • <u,u>k=0 is equivalent to u=0
    • <u,vc+wd>k=<u,v>kc+<u,w>kd


By using this inner product <⋅,⋅>k, a complex-valued norm can be defined as follows:









\| v \|_k = \left\| \langle v, v \rangle_k \right\|^{1/2}  [Math. 4]












A space in which Mk,0 is completed with respect to this norm is denoted as Mk, and is referred to as a reproducing kernel Hilbert C*-module (RKHM) with respect to k. Mk can be configured uniquely. Also, in Mk, the magnitude of an A value |⋅|k can also be defined as follows:









| v |_k = \langle v, v \rangle_k^{1/2}  [Math. 5]












Assuming that each of xt (t=0, 1, . . . ) being an element of X has n elements, xt is denoted as xt=[xt,0, . . . , xt,n-1]. In the case where a C*-algebra A is the entirety of n×n matrices, an A-valued positive definite kernel k can be configured by using the following complex-valued positive definite kernel:





\tilde{k}  [Math. 6]


Note that in the text of the present description, for the sake of convenience, a symbol having “˜” added to the top of x is written as “˜x”.


In fact, if each (i,j) component of an n×n matrix k(xt,xs) is defined by ˜k(xt,i,xs,j) with respect to the elements xt and xs of X, it can be shown that k is an n×n matrix-valued positive definite kernel. ˜k(xt,i,xs,j) represents the proximity of xt,i and xs,j; therefore, each (i,j) component of k(xt,xs) (i.e., the inner product of φ(xt) and φ(xs)) represents the proximity of the i-th component xt,i of xt and the j-th component xs,j of xs.
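The construction above can be illustrated with a minimal NumPy sketch. The Laplacian kernel used as ˜k, the parameter `gamma`, the sample values, and the helper names `matrix_kernel` and `block_gram` are assumptions chosen for illustration and do not appear in the description. The sketch builds the n×n matrix k(xt,xs) and checks Condition 1 and Condition 2 numerically on a small data set.

```python
import numpy as np

def matrix_kernel(x, y, gamma=1.0):
    """n x n matrix whose (i, j) entry is the scalar kernel exp(-gamma * |x_i - y_j|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.exp(-gamma * np.abs(x[:, None] - y[None, :]))

def block_gram(xs, ys, gamma=1.0):
    """Block matrix whose (t, s) block is matrix_kernel(xs[t], ys[s])."""
    return np.block([[matrix_kernel(x, y, gamma) for y in ys] for x in xs])

# Three hypothetical data items, each with n = 2 elements.
data = np.array([[0.0, 1.5], [0.4, 2.1], [0.9, 2.8]])

# Condition 1: k(x, y) equals the conjugate transpose of k(y, x).
k01 = matrix_kernel(data[0], data[1])
k10 = matrix_kernel(data[1], data[0])

# Condition 2: sum_{t,s} c_t^* k(x_t, x_s) c_s = C^* G C is positive
# for any stacked coefficient blocks C, which holds when G is positive.
G = block_gram(data, data)
rng = np.random.default_rng(0)
C = rng.standard_normal((G.shape[0], 2))   # stacked c_0, c_1, c_2 (each 2 x 2)
S = C.T @ G @ C
min_eig_G = np.linalg.eigvalsh(G).min()
min_eig_S = np.linalg.eigvalsh(S).min()
```

Each entry `k01[i, j]` is the proximity between the i-th element of the first item and the j-th element of the second item, which is the quantity the later sections approximate.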


<<Relationship of Data in RKHM>>

Let X and Y be spaces to which data belongs, and assume that the following Formula (1) holds for x0, x1, . . . , xT−1∈X and y0, y1 . . . , yT−1∈Y.






y_t = f(x_t)  (1)


where f is a mapping from X to Y that is nonlinear in general.


Let k1 be a positive definite kernel on X, let k2 be a positive definite kernel on Y, let φ1 be a feature map with respect to k1, and let φ2 be a feature map with respect to k2. In order to express Formula (1) described above as a formula in the following spaces,


M_{k_1}, M_{k_2}  [Math. 7]


assume that the following mapping,


K: M_{k_1} \to M_{k_2}  [Math. 8]


satisfies the following Formula (2):


K\phi_1(x_t) = \phi_2(y_t)  (2)


Such K is referred to as a Perron-Frobenius operator. In the case where x0, x1, . . . , xT−1∈X constitute time series data, if setting X=Y, yt=xt+1, and k1=k2, then, f is a mapping representing time evolution, and thereby, K is also a mapping representing time evolution.


<<Approximation of Perron-Frobenius Operator by Orthonormal Projection>>

In the following, it is assumed that an element xt (t=0, 1, . . . ) of X has n elements, and is expressed as xt=[xt,0, . . . , xt,n-1]. Also, assume that a C*-algebra A is the entirety of n×n matrices, and as described above, an A-valued positive definite kernel k is configured using a complex-valued positive definite kernel ˜k.


At this time, consider approximating K that satisfies Formula (2) described above, to analyze f by using the approximated K, predict yt from a given xt, and compare matrix-valued inner products of elements of X obtained by such analysis and prediction (i.e., measure the proximity). The value of an inner product (proximity) takes a matrix value, and its components represent the proximity of elements; thereby, relationships between the elements can be extracted. In the following, (i) a case of applying a Perron-Frobenius operator K when X=Y, yt=xt+1, and k1=k2=k; and (ii) a case of applying the Perron-Frobenius operator K when X≠Y, will be described.


(i) The case of X=Y, yt=xt+1, and k1=k2=k


In this case, by solving a minimization problem in Formula (3) shown later,





\hat{K}  [Math. 9]


is obtained to approximate K. Note that in the text of the present description, for the sake of convenience, a symbol having “^” added to the top of x is written as “^x”.









\min_{\phi(x_{t+1}) = \hat{K}\phi(x_t)\ (t = 0, \ldots, T-2),\ \hat{K} \in L(V_T)} \left| \phi(x_T) - \hat{K}\phi(x_{T-1}) \right|_k  (3)  [Math. 10]







where VT is a set of all linear combinations expressed as in the following formula:









\sum_{t=0}^{T-1} \phi(x_t) c_t \quad (c_t \in A)  [Math. 11]












Also, L(VT) is a set of all A-linear operators from VT to VT (i.e., L that satisfies L(vc)=(Lv)c for any c∈A and any v∈Mk).


In order to solve the minimization problem shown in Formula (3) described above, an orthonormal projection from Mk to VT is calculated. Here, an orthonormal projection P from Mk to VT is an A-linear operator from Mk to VT that satisfies P²=P and P=P*. P can be calculated by configuring an orthonormal system {q0, q1, . . . , qT−1} of VT. Here, an orthonormal system {q0, q1, . . . , qT−1} of VT is a set of elements qt∈VT such that <qt,qs>k=0 for s≠t and, whenever <qt,qt>k is not 0, <qt,qt>k is a Hermitian matrix c satisfying c²=c (in this case, qt is called normal).


Given time series data x0, x1, . . . , xT−1∈X, an orthonormal system {q0, q1, . . . , qT−1} of VT can be configured by sequentially executing the following Step 1 and Step 2 for t=0, 1, . . . ,T−1.


Step 1: If t=0, set ˜q0=φ(x0). On the other hand if t≠0, for s=0, . . . , t−1, set rs,t=<φ(xt), qs>k, and set ˜qt as follows:









\tilde{q}_t = \phi(x_t) - \sum_{s=0}^{t-1} r_{s,t} q_s  [Math. 12]














Step 2: Next, let ε be a real number greater than or equal to 0. If ∥˜qt∥k≤ε, set qt=0; otherwise, the following is executed. Let the eigenvalues of <˜qt, ˜qt>k be λt,0≥ . . . ≥λt,n-1, and let mt be the maximum index i that satisfies λt,i>ε². Also, let UtDtUt* be the eigendecomposition of <˜qt,˜qt>k. Here, Dt is a matrix having diagonal components of λt,0, . . . , λt,n-1 and non-diagonal components of all zero, and Ut is a matrix in which eigenvectors corresponding to the respective eigenvalues λt,0, . . . , λt,n-1 are arranged in this order. At this time, <˜qt, ˜qt>k is a Hermitian positive semi-definite matrix whose largest eigenvalue equals ∥˜qt∥k²; hence, if ∥˜qt∥k>ε, it has at least one eigenvalue greater than ε², and mt is well defined.


Therefore, let ^Dt be a matrix having the following diagonal components,


\frac{1}{\sqrt{\lambda_{t,0}}}, \ldots, \frac{1}{\sqrt{\lambda_{t,m_t}}}, 0, \ldots, 0  [Math. 13]


and having non-diagonal components of all zero, and set bt=Ut^DtUt*. Further, set qt=˜qtbt. qt is normal, and hence, is an orthonormal vector.
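Step 1 and Step 2 can be sketched in coordinates as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: an element Σφ(xt)ct of VT is represented by its stacked coefficient blocks, so that the A-valued inner product of two elements is Cu*GCv, where G is the block Gram matrix with blocks k(xt,xs). The Laplacian kernel, the sample values, and the names `matrix_kernel`, `block_gram`, and `orthonormalize` are hypothetical. The sketch uses r_{s,t}=<qs, φ(xt)>k with coefficients acting from the right, which is the ordering consistent with the right-linear inner product defined earlier.

```python
import numpy as np

def matrix_kernel(x, y, gamma=1.0):
    """n x n matrix whose (i, j) entry is exp(-gamma * |x_i - y_j|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.exp(-gamma * np.abs(x[:, None] - y[None, :]))

def block_gram(xs, ys, gamma=1.0):
    """Block matrix whose (t, s) block is matrix_kernel(xs[t], ys[s])."""
    return np.block([[matrix_kernel(x, y, gamma) for y in ys] for x in xs])

def orthonormalize(G, n, eps=1e-10):
    """Block Gram-Schmidt in the basis phi(x_0), ..., phi(x_{T-1}).

    Returns Qc (nT x nT); its t-th block column holds the coefficients of q_t.
    """
    T = G.shape[0] // n
    Qc = np.zeros((n * T, n * T), dtype=complex)
    for t in range(T):
        cols = slice(t * n, (t + 1) * n)
        e_t = np.zeros((n * T, n), dtype=complex)
        e_t[cols, :] = np.eye(n)                      # coefficients of phi(x_t)
        # Step 1: remove the components along q_0, ..., q_{t-1}.
        prev = Qc[:, : t * n]
        q_tilde = e_t - prev @ (prev.conj().T @ G @ e_t)
        # Step 2: normalise via the eigendecomposition of <q~_t, q~_t>_k.
        gram = q_tilde.conj().T @ G @ q_tilde
        gram = (gram + gram.conj().T) / 2             # symmetrise against round-off
        lam, U = np.linalg.eigh(gram)                 # ascending order; order is irrelevant for b_t
        if lam.max() <= eps ** 2:
            continue                                  # norm of q~_t is at most eps: q_t = 0
        inv_sqrt = np.zeros_like(lam)
        keep = lam > eps ** 2
        inv_sqrt[keep] = 1.0 / np.sqrt(lam[keep])
        b_t = U @ np.diag(inv_sqrt) @ U.conj().T
        Qc[:, cols] = q_tilde @ b_t
    return Qc

data = np.array([[0.0, 1.5], [0.4, 2.1], [0.9, 2.8]])   # T = 3 items, n = 2 elements
G = block_gram(data, data)
Qc = orthonormalize(G, n=2)
gram_of_q = Qc.conj().T @ G @ Qc    # blocks <q_t, q_s>_k
```

For this full-rank example every block <qt,qt>k is the identity and the off-diagonal blocks vanish, so `gram_of_q` is numerically the identity matrix.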


Let QT be an A-linear mapping that maps a vector [c0, . . . , cT−1] of arrayed T elements of the C*-algebra A to the following linear combination:


\sum_{t=0}^{T-1} q_t c_t  [Math. 14]


and let ΦT be the A-linear mapping defined in the same way with φ(xt) in place of qt.












Also, let BT be a matrix having diagonal components of b0, . . . , bT−1 and non-diagonal components of all zero, and let RT be a T×T matrix having rs,t as the (s,t) component. Note that each component of RT is an element of A. By executing Step 1 and Step 2 described above, it can be shown that QT=ΦTBT−QTRT; therefore, QT=ΦTBT(I+RT)−1 where I is an identity matrix.


For QT configured as described above, QTQT* is an orthonormal projection from Mk to ˜VT (i.e., if setting P=QTQT*, P is an orthonormal projection) where ˜VT is a set of all linear combinations expressed as in the following formula:









\sum_{t=0}^{T-1} q_t c_t \quad (c_t \in A)  [Math. 15]












The orthonormal projection minimizes the difference, i.e., for any element v of Mk and any element w of ˜VT, |v−w|k−|v−Pv|k is positive. Also, in the case of setting ε=0 at Step 2 described above, any element v of VT can be expressed as follows:









v = \sum_{t=0}^{T-1} q_t c_t  [Math. 16]













Therefore, it can be shown VT=˜VT. Here, ct is an n×n matrix.


Therefore, it can be understood that ^K fulfilling Formula (3) described above satisfies ^Kφ(xT−1)=QTQT*φ(xT), and satisfies ^Kφ(xt)=QTQT*φ(xt+1) (t=0, . . . , T−1). Meanwhile, for an element v of Mk, an element of VT that minimizes the difference is QTQT*v. Therefore, Kv is approximated with ^KQTQT*v. Here, ^KQTQT*v can be expressed as follows:






\hat{K} Q_T Q_T^* v = \hat{K} \Phi_T B_T (I+R_T)^{-1} Q_T^* v = Q_T Q_T^* \Phi_{T+1} B_T (I+R_T)^{-1} Q_T^* v = Q_T (I+R_T)^{-*} B_T^* \Phi_T^* \Phi_{T+1} B_T (I+R_T)^{-1} Q_T^* v  [Math. 17]


where, −* denotes the Hermitian transposition of an inverse matrix.


By QT, a vector of arrayed T elements of A and an element of VT can be considered as identical, and hence, QT can be regarded as an operator representing a coordinate transformation. Therefore, K is approximated with a T×T matrix in which components expressed as in the following formula,





(I+R_T)^{-*} B_T^* \Phi_T^* \Phi_{T+1} B_T (I+R_T)^{-1}  [Math. 18]


are elements of A. ΦT*ΦT+1 is a T×T matrix whose (s, t) component is k(xs, xt+1)∈A, and hence, the formula,





(I+R_T)^{-*} B_T^* \Phi_T^* \Phi_{T+1} B_T (I+R_T)^{-1}  [Math. 19]


can be calculated in practice. Therefore, ˜KT is set as follows:






\tilde{K}_T = (I+R_T)^{-*} B_T^* \Phi_T^* \Phi_{T+1} B_T (I+R_T)^{-1}  [Math. 20]


and this ˜KT is referred to as an “approximate Perron-Frobenius operator”.


Thus, by using this approximate Perron-Frobenius operator ˜KT, Kv can be approximated as QT˜KTQT*v for any v∈Mk.
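The approximate Perron-Frobenius operator of case (i) can be sketched numerically as follows. Because QT=ΦTBT(I+RT)−1, the matrix Qc=BT(I+RT)−1 holds the coefficients of the orthonormal system, and ˜KT=Qc*(ΦT*ΦT+1)Qc. As a stated simplification, this sketch obtains a valid Qc from a Cholesky factor of the Gram matrix instead of running Step 1 and Step 2; that shortcut is an assumption that only applies when the Gram matrix is positive definite (the case ε=0 with full rank). The kernel, the sample values, and all names are hypothetical.

```python
import numpy as np

def matrix_kernel(x, y, gamma=1.0):
    """n x n matrix whose (i, j) entry is exp(-gamma * |x_i - y_j|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.exp(-gamma * np.abs(x[:, None] - y[None, :]))

def block_gram(xs, ys, gamma=1.0):
    """Block matrix whose (t, s) block is matrix_kernel(xs[t], ys[s])."""
    return np.block([[matrix_kernel(x, y, gamma) for y in ys] for x in xs])

# Hypothetical time series x_0, ..., x_T with T = 3 and n = 2 elements per item.
samples = np.array([[0.0, 1.5], [0.4, 2.1], [0.9, 2.8], [1.3, 3.3]])
n = samples.shape[1]
T = samples.shape[0] - 1

G = block_gram(samples[:-1], samples[:-1])        # Phi_T^* Phi_T, blocks k(x_s, x_t)
G_shift = block_gram(samples[:-1], samples[1:])   # Phi_T^* Phi_{T+1}, blocks k(x_s, x_{t+1})

# Coefficients of an orthonormal system: Qc^* G Qc = I (Cholesky shortcut, see lead-in).
L = np.linalg.cholesky(G)
Qc = np.linalg.inv(L).conj().T

# Approximate Perron-Frobenius operator as an nT x nT complex matrix.
K_tilde = Qc.conj().T @ G_shift @ Qc

# Coordinates Q_T^* phi(x_t) and Q_T^* phi(x_{t+1}) for t = 0.
coords_x0 = Qc.conj().T @ G[:, :n]
coords_x1 = Qc.conj().T @ G_shift[:, :n]
```

In this setting `K_tilde @ coords_x0` reproduces `coords_x1`, that is, the operator advances the coordinates of φ(xt) by one time step.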


(ii) The Case of X≠Y

In this case, let VT be a set of all linear combinations expressed as in the following formula,









\sum_{t=0}^{T-1} \phi_1(x_t) c_t \quad (c_t \in A)  [Math. 21]












and let WT be a set of all linear combinations expressed as in the following formula:









\sum_{t=0}^{T-1} \phi_2(y_t) c_t \quad (c_t \in A)  [Math. 22]












Further, in substantially the same way as in (i) described above, an orthonormal system {q0, q1, . . . , qT−1} of VT is configured, and by using this orthonormal system {q0, q1, . . . , qT−1}, QT is configured. Further, let ^K be a linear mapping from VT to WT that satisfies ^Kφ1(xt)=φ2(yt), to approximate Kv with ^KQTQT*v. Therefore, also for WT, an orthonormal system is configured in substantially the same way as in (i) described above, and by using this orthonormal system, PT is configured by a method substantially the same as the method of configuring QT described above.


Also, in substantially the same way as in (i) described above, QT is decomposed as QT=ΦTBT(I+RT)−1 where ΦT is an A-linear mapping that maps a vector [c0, . . . , cT−1] of arrayed T elements of A to the following linear combination:









\sum_{t=0}^{T-1} \phi_1(x_t) c_t  [Math. 23]












In substantially the same way, PT is decomposed as PT=ΨTCT(I+ST)−1 where ΨT is an A-linear mapping that maps a vector [c0, . . . , cT−1] of arrayed T elements of A to the following linear combination:









\sum_{t=0}^{T-1} \phi_2(y_t) c_t  [Math. 24]












Also, CT is a T×T matrix with respect to WT, configured by a method substantially the same as the method of configuring BT described above. Similarly, ST is a T×T matrix with respect to WT, configured by a method substantially the same as the method of configuring RT described above.


At this time, as ^Kφ1(xt)=φ2(yt) is satisfied, ^KΦT=ΨT is derived; therefore, K is approximated with a T×T matrix that has components of elements of A, and is expressed as follows:


P_T^* \hat{K} Q_T = (I+S_T)^{-*} C_T^* \Psi_T^* \Psi_T B_T (I+R_T)^{-1}  [Math. 25]


In other words, the approximate Perron-Frobenius operator is set as follows:


\tilde{K}_T = (I+S_T)^{-*} C_T^* \Psi_T^* \Psi_T B_T (I+R_T)^{-1}  [Math. 26]


<<Decomposition of Approximate Perron-Frobenius Operator>>

As described above, an A-valued positive definite kernel k is configured with a complex-valued positive definite kernel ˜k, and is an n×n matrix in which each component takes a complex value. Therefore, letting C be the complex number field, the approximate Perron-Frobenius operator ˜KT can be regarded as ˜KT∈C^{nT×nT}.


Here, assume that there exist eigenvalues ˜λ0, . . . , ˜λnT-1 and corresponding eigenvectors ˜v0, . . . , ˜vnT-1 for the approximate Perron-Frobenius operator ˜KT. By setting vm=[˜vm, 0, . . . , 0] and λm=diag{˜λm, 0, . . . , 0}, ˜KTvm=vmλm is satisfied. Also, if [˜v0, . . . , ˜vnT-1] is invertible, the following formula holds:









Q_T^* \phi(x_0) = \sum_{m=0}^{nT-1} v_m c_m \quad (c_m \in A)  [Math. 27]













Here, by the definition of K, φ(xt)=K^t φ(x0) holds. Therefore, by using an approximate Perron-Frobenius operator, φ(xt) is approximated with QT(˜KT)^t QT*φ(x0). Similarly, by using an approximate Perron-Frobenius operator, φ(xs) is approximated with QT(˜KT)^s QT*φ(x0).


Then, k (xt, xs)=<φ(xt), φ(xs)>k can be approximated as in the following Formula (4):









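\langle \phi(x_t), \phi(x_s) \rangle_k \approx \left\langle Q_T \tilde{K}_T^{t} \sum_{m=0}^{nT-1} v_m c_m,\ Q_T \tilde{K}_T^{s} \sum_{m=0}^{nT-1} v_m c_m \right\rangle_k  (4)
= \left\langle \tilde{K}_T^{t} \sum_{m=0}^{nT-1} v_m c_m,\ \tilde{K}_T^{s} \sum_{m=0}^{nT-1} v_m c_m \right\rangle
= \left\langle \sum_{m=0}^{nT-1} v_m \lambda_m^{t} c_m,\ \sum_{m=0}^{nT-1} v_m \lambda_m^{s} c_m \right\rangle
= \sum_{m,m'=0}^{nT-1} c_m^* (\lambda_m^*)^{t} \langle v_m, v_{m'} \rangle \lambda_{m'}^{s} c_{m'}
= \sum_{m,m'=0}^{nT-1} \overline{\tilde{\lambda}_m}^{\,t} \tilde{\lambda}_{m'}^{\,s} (\tilde{v}_m^* \tilde{v}_{m'}) c_m^* c_{m'}  [Math. 28]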









where for um and vm, <um, vm>=um*vm.


By approximation and decomposition executed in this way, for example, it becomes possible to analyze the behavior when having s,t→∞; the cycle of change in k(xt,xs) (i.e., a matrix in which each (i,j) component represents the proximity between the i-th component of xt and the j-th component of xs); and the like.


<Functional Configuration of Relationship Extraction Device 10>

Next, a functional configuration of the relationship extraction device 10 according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of a functional configuration of the relationship extraction device 10 according to the present embodiment.


As illustrated in FIG. 1, the relationship extraction device 10 according to the present embodiment includes an approximate operator generation processing unit 100, a relationship extraction processing unit 200, and a storage unit 300.


The storage unit 300 stores a set of data {x0, x1, . . . ,xT−1} each having multiple elements. Also, in the storage unit 300, an approximate Perron-Frobenius operator ˜KT generated by the approximate operator generation processing unit 100, and relationships extracted by the relationship extraction processing unit 200 are stored (i.e., an n×n matrix as an approximation result shown in Formula (4) described above).


The approximate operator generation processing unit 100 takes as input a set of data {x0, x1, . . . , xT−1} each having multiple elements, and executes an approximate operator generation process of generating an approximate Perron-Frobenius operator ˜KT. Here, the approximate operator generation processing unit 100 includes an obtaining unit 101 and an approximate operator generation unit 102. The obtaining unit 101 obtains the set of data {x0, x1, . . . , xT−1} each having multiple elements from the storage unit 300. The approximate operator generation unit 102 generates an approximate Perron-Frobenius operator ˜KT from {x0, x1, . . . , xT−1} obtained by the obtaining unit 101.


The relationship extraction processing unit 200 takes as input data xs and xt as targets of relationship extraction, and executes a relationship extraction process to extract relationships between the data. Here, the relationship extraction processing unit 200 includes an obtaining unit 201 and a relationship extraction unit 202.


The obtaining unit 201 obtains the data xs and xt as targets of relationship extraction from the storage unit 300. The relationship extraction unit 202 extracts relationships between xs and xt obtained by the obtaining unit 201.


Note that the configuration of the relationship extraction device 10 illustrated in FIG. 1 is an example, and another configuration may be adopted. For example, the approximate operator generation processing unit 100 and the relationship extraction processing unit 200 may be included in different devices or equipment.


<Approximate Operator Generation Process>

Next, an approximate operator generation process according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a flow chart illustrating an example of an approximate operator generation process according to the present embodiment.


The obtaining unit 101 of the approximate operator generation processing unit 100 obtains data each having multiple elements from the storage unit 300 (also obtains y0, y1, . . . , yT−1 in the case of (ii) described above) (Step S101).


Next, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 sets t←0 where t is an index indicating the data obtained at Step S101 described above (Step S102).


Next, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 generates an orthonormal vector qt by using φ(x0), . . . , φ(xt) as described in the above (i) and (ii) (φ1(x0), . . . , φ1(xt) in the case of (ii) described above) (Step S103). Note that an orthonormal vector of WT is also generated in the case of (ii) described above.


Next, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 sets t←t+1 (Step S104). Further, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 determines whether t<T (Step S105).


If t<T is determined at Step S105 described above, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 returns to Step S103. Thus, for t=0, . . . , T−1, Step S103 described above is executed, and the orthonormal system {q0, q1, . . . , qT−1} is obtained. Note that in the case of (ii) described above, the orthonormal system of WT is also obtained.


If t<T is not determined at Step S105 described above, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 generates an approximate Perron-Frobenius operator ˜KT by using the orthonormal system {q0, q1, . . . , qT−1} as described in the above (i) and (ii) (also using the orthonormal system of WT in the case of (ii)) (Step S106).


<Relationship Extraction Process>

Next, a relationship extraction process according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a flow chart illustrating an example of a relationship extraction process according to the present embodiment.


The obtaining unit 201 of the relationship extraction processing unit 200 obtains the data xs and xt as targets of relationship extraction from the storage unit 300 (Step S201).


Next, the relationship extraction unit 202 of the relationship extraction processing unit 200 extracts relationships between xs and xt obtained at Step S201 described above (Step S202). In other words, the relationship extraction unit 202 approximates k(xt,xs)=<φ(xt), φ(xs)>k by Formula (4) described above. Accordingly, an n×n matrix is obtained in which each (i,j) component represents the proximity between the i-th component of xt and the j-th component of xs (i.e., the relationship between xt,i and xs,j), and the relationships between xs and xt are extracted.


Application Examples

In the following, several application examples using the approximate Perron-Frobenius operator will be described.


<<Anomaly Detection>>

Suppose that each of x0, x1, . . . , xT−1 ∈X has n items of time series data. In other words, suppose that xt includes xt,0, . . . , xt,n-1 as n items of time series data, denoted as xt=[xt,0, . . . , xt,n-1]. In the case where φ(xt) has been obtained, φ(xt+1) can be predicted by using an approximate Perron-Frobenius operator ˜KT obtained by the method described in (i) described above. This prediction can be obtained by QT˜KTQT*φ(xt) as described above.


At this time, assuming that the following equation holds,


$$\tilde{K}_T Q_T^{*} \phi(x_t) = \sum_{s=0}^{T-1} \phi(x_s)\, c_s$$  [Math. 29]


Each (j,j) component of the following formula,


$$\left| \sum_{s=0}^{T-1} \phi(x_s)\, c_s - \phi(x_t) \right|_k^2$$  [Math. 30]


is equivalent to the following:


$$\left\| \sum_{s=0}^{T-1} \sum_{i=0}^{n-1} (c_s)_{i,j}\, \tilde{\phi}(x_{s,i}) - \tilde{\phi}(x_{t,i}) \right\|_{\tilde{k}}$$  [Math. 31]


where ˜φ is a feature map with respect to ˜k, and (cs)i,j is the (i,j) component of cs being an n×n matrix.


Therefore, in the case where the (j,j) component of the following formula is large,


$$\left| \sum_{s=0}^{T-1} \phi(x_s)\, c_s - \phi(x_t) \right|_k^2$$  [Math. 32]


it can be understood that an anomaly occurs in the j-th data item in the n items of time series data.
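The diagonal of the residual in Math. 32 can be evaluated from kernel matrices alone. The following is a minimal sketch, not the embodiment's implementation. It assumes the inner-product convention <φ(x)c, φ(y)d>k = c*k(x,y)d, the scalar kernel ˜k from the Evaluation section, and coefficient matrices cs that are already available; here they are filled with a hypothetical stand-in rather than computed from ˜KT. The name `anomaly_scores` is also hypothetical.

```python
import numpy as np


def matrix_kernel(x, y):
    """(i, j) entry is ~k(x[i], y[j]) = exp(-|e^{i x_i} - e^{i y_j}|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.exp(-np.abs(np.exp(1j * x[:, None]) - np.exp(1j * y[None, :])))


def anomaly_scores(xs, coeffs, x_t):
    """Diagonal of |sum_s phi(x_s) c_s - phi(x_t)|_k^2.

    xs: (T, n) samples, coeffs: (T, n, n) matrices c_s, x_t: (n,) observation.
    Assumes <phi(x) c, phi(y) d>_k = c^* k(x, y) d.
    """
    T = len(xs)
    total = matrix_kernel(x_t, x_t).astype(complex)
    for s in range(T):
        c_s = coeffs[s]
        total -= c_s.conj().T @ matrix_kernel(xs[s], x_t)
        total -= matrix_kernel(x_t, xs[s]) @ c_s
        for r in range(T):
            total += c_s.conj().T @ matrix_kernel(xs[s], xs[r]) @ coeffs[r]
    return np.real(np.diag(total))


rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 2.0 * np.pi, size=(3, 4))  # T = 3 samples, n = 4 items
coeffs = np.zeros((3, 4, 4))
coeffs[1] = np.eye(4)  # stand-in prediction: "the next state equals x_1"
x_t = xs[1].copy()
x_t[2] += 1.0  # perturb only item j = 2
scores = anomaly_scores(xs, coeffs, x_t)  # one score per item
```

In this toy setup only the perturbed item receives a non-zero score, so taking the largest diagonal entry points at the j-th data item in which the anomaly occurs.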


<<Causal Estimation (Part 1)>>

For n items of time series data, x0, x1, . . . , xT−1∈X are defined such that xs,i+m″n is data at time s+m″ of the i-th item of the time series data. At this time, consider the case of t=s in Formula (4) described above. For ˜λm having a magnitude close to 1,






$$\tilde{\lambda}_m^{s}\, \tilde{\lambda}_m^{s}\, (\tilde{v}_m^{*} \tilde{v}_m)\, c_m^{*} c_m$$  [Math. 33]


is unchanged by the change in s; therefore, for ˜λm having the magnitude close to 1, in the sum of Formula (4) described above, by calculating only the following,






$$\tilde{\lambda}_m^{s}\, \tilde{\lambda}_m^{s}\, (\tilde{v}_m^{*} \tilde{v}_m)\, c_m^{*} c_m$$  [Math. 34]


the part of the approximation of k(xs,xs) (the proximity between xs and xs) that does not change with s can be extracted. Therefore, if the value of the (i,j+m″n) component of the sum is large, then xs,i and xs,j+m″n are close regardless of s; conversely, if the value of the (i,j+m″n) component is small, then xs,i and xs,j+m″n are distant regardless of s. In other words, it can be understood that the change in the i-th data from among the n items of time series data is a cause of the change in the j-th data.


<<Causal Estimation (Part 2)>>

Suppose that each of x0, x1, . . . , xT−1∈X has n items of time series data. In the case where the change in the j-th data from among the n items of time series data is a cause of the change in the i-th data, consider data ˜x0, ˜x1, . . . , ˜xT−1 each obtained by removing the j-th component from xt (t=0, . . . , T−1). In other words, let ˜xt=[xt,0, . . . , xt,j−1, xt,j+1, . . . , xt,n−1].


At this time, consider generating ˜KT from ˜x0, ˜x1, . . . , ˜xT−1 and predicting ˜xS for S≥T. This prediction is calculated by QT˜KTQT*φ(˜xS−1); however, among the components of ˜xS, the component corresponding to the i-th data is expected to be approximated poorly. Therefore, by comparing the components of the following formula,





$$\left| Q_T \tilde{K}_T Q_T^{*} \phi(\tilde{x}_{S-1}) - \phi(\tilde{x}_S) \right|_k^2$$  [Math. 35]


data that changes due to the change in the j-th data as the cause can be identified. In other words, in the case where the (i,i) component of the following formula is large,





$$\left| Q_T \tilde{K}_T Q_T^{*} \phi(\tilde{x}_{S-1}) - \phi(\tilde{x}_S) \right|_k^2$$  [Math. 36]


it can be understood that the change in the j-th data is the cause of the change in the i-th data. Granger causality assumes a linear relationship between items of data in time series data, whereas the method according to the present embodiment can estimate causes with good precision even for a nonlinear relationship.


<<Behavior of Proximity Between Elements when t→∞>>


In Formula (4) described above, the term corresponding to ˜λm=1 becomes a constant value when t→∞, and a term corresponding to |˜λm|<1 becomes zero when t→∞. Therefore, in Formula (4) described above, by setting ˜λm not being 1 to 0, the behavior of the proximity between elements when t→∞ can be understood.
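The effect described above can be illustrated on a plain list of eigenvalues. The following is a minimal sketch only: the eigenvalues are hypothetical, the tolerance `tol` is an assumption, and Formula (4) itself is not reproduced. It shows that zeroing every eigenvalue not equal to 1 gives the same result as letting t grow large.

```python
import numpy as np


def keep_near_one(eigvals, tol=1e-2):
    """Set every eigenvalue that is not within `tol` of 1 to zero."""
    eigvals = np.asarray(eigvals, dtype=complex)
    return np.where(np.abs(eigvals - 1.0) <= tol, eigvals, 0.0)


# Hypothetical eigenvalues: one equal to 1, two with magnitude below 1.
eigvals = np.array([1.0, 0.9, 0.5 + 0.5j], dtype=complex)

filtered = keep_near_one(eigvals)  # eigenvalues after the replacement
late_powers = eigvals ** 500       # the t-th powers for a large t
```

For these values, `late_powers` agrees with `filtered` up to rounding, which is the sense in which the replacement exposes the behavior when t→∞.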


<Other Data Analysis Methods Using RKHM>

Kernel PCA will be described as one of modified examples of the present embodiment. Let x0, x1, . . . , xT−1∈X be data each having n elements. By using the same notations as used in Step 2 described above, ˜bt is defined as ˜bt=UtDtUt where Dt is a matrix having diagonal components of





$$\sqrt{\lambda_{t,0}},\ \ldots,\ \sqrt{\lambda_{t,m_t}},\ 0,\ \ldots,\ 0$$  [Math. 37]


and having off-diagonal components that are all zero. Let ˜Bm be a matrix whose diagonal components are ˜b0, . . . , ˜bm−1 and whose off-diagonal components are all zero. In the case of setting ε=0, Φm=Qm(˜Bm+Rm) holds. Also, it can be shown that Cm that satisfies Qm*Qm=CmCm* exists. Therefore, by calculating the singular value decomposition as Cm*Rm=UmΣmVm*, and setting w1=QmCmu1, it can be shown that under a condition of vt being normal, w1 is a vector that maximizes the following formula:


$$\sum_{t=0}^{m-1} \left\| w \langle w, v_t \rangle_k \right\|_k^2$$  [Math. 38]


where ut represents a t-th column of Um. Also, vt is expressed as follows:


$$v_t = \phi(x_t) - \sum_{s=0}^{m-1} \phi(x_s)$$  [Math. 39]


Therefore, it can be stated that w1 is a vector that best approximates the residual on the RKHM, and this w1 will be referred to as the first principal vector. Similarly, wt=QmCmut will be referred to as the t-th principal vector. Denoting the non-zero eigenvalues of Φm*Φm (a T×T matrix whose (s, t) component is k(xs, xt+1)∈A) as λ0≥ . . . ≥λ1>0, and the corresponding eigenvectors as v0, . . . , v1, it can be shown that wt=λt^(−1/2)Φmvt; therefore, the calculation is carried out in this way in practice. The proximity between data φ(xs) and the t-th principal vector can be expressed as <wt,φ(xs)>k, and hence <wt,φ(xs)>k can be regarded as the t-th principal component of φ(xs). However, <wt,φ(xs)>k takes a matrix value; therefore, a distribution of the data can be visualized by using, for example, ∥<wt,φ(xs)>k∥ instead of <wt,φ(xs)>k. For example, visualization in the two-dimensional plane can be achieved by taking ∥<w1,φ(xs)>k∥ on the horizontal axis and ∥<wt,φ(xs)>k∥ on the vertical axis, and plotting the data. Also, by replacing φ(xs) with the following formula,


$$\phi(x_s) - \sum_{t=0}^{m-1} \phi(x_t)$$  [Math. 40]


a centralized kernel PCA can be executed as in the case of general kernel PCA using the RKHS.


<Evaluation>

Next, evaluation of the method according to the present embodiment will be described.


<<Goodness of Prediction>>

A Kuramoto model on [0, 2π) shown in the following Formula (5) was considered.


$$\frac{d\theta_i}{dt} = \omega_i + \frac{\kappa}{n} \sum_{j=0}^{n-1} \sin(\theta_j - \theta_i) \qquad (5)$$  [Math. 41]


where θi(0) was assumed to be a random number following a uniform distribution on [0, 2π), and ωi was also assumed to be a random number following the uniform distribution on [0, 2π).


A dynamical system shown in the following Formula (6) obtained by discretizing Formula (5) described above was considered.


$$x_{t,i} = x_{t-1,i} + \Delta t\, \omega_i + \Delta t\, \frac{\kappa}{n} \sum_{j=0}^{n-1} \sin(x_{t-1,j} - x_{t-1,i}) \qquad (6)$$  [Math. 42]


Here, on [0, 2π), the following function was considered,


$$\tilde{k}(x, y) = e^{-|e^{ix} - e^{iy}|}$$  [Math. 43]


where the (i,j) component of k(xt,xs) was set to ˜k(xt,i,xs,j), and Δt=0.01. The parameters were set as n=200, T=10, and mt=jt upon normalization. At this time, for S=100, the magnitude |QT˜KTQT*φ(xS−1)|k of a predicted value was calculated for each of κ=1 and κ=10, where the parameter κ represents the strength of interrelation.
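The reference data for this evaluation can be reproduced by iterating Formula (6) directly. The following is a minimal sketch under stated assumptions: the random seed, the function names `kuramoto_step`, `simulate_kuramoto`, and `matrix_kernel`, and the choice not to wrap phases back into [0, 2π) are all hypothetical (the kernel ˜k is 2π-periodic, so wrapping does not change the kernel values). The sketch does not construct ˜KT; it only generates xt and the exact k(xS,xS) against which the prediction is compared.

```python
import numpy as np


def kuramoto_step(prev, omega, kappa, dt=0.01):
    """One step of Formula (6) for all n elements at once."""
    n = prev.shape[0]
    # diff[i, j] = prev[j] - prev[i]; summing over j gives the coupling on i.
    diff = prev[None, :] - prev[:, None]
    return prev + dt * omega + dt * (kappa / n) * np.sin(diff).sum(axis=1)


def simulate_kuramoto(n, steps, kappa, dt=0.01, seed=0):
    """Return an array of shape (steps + 1, n) holding x_0, ..., x_steps."""
    rng = np.random.default_rng(seed)
    omega = rng.uniform(0.0, 2.0 * np.pi, size=n)  # natural frequencies
    x = np.empty((steps + 1, n))
    x[0] = rng.uniform(0.0, 2.0 * np.pi, size=n)   # initial phases
    for t in range(1, steps + 1):
        x[t] = kuramoto_step(x[t - 1], omega, kappa, dt)
    return x


def matrix_kernel(x_t, x_s):
    """(i, j) entry is ~k(x_t[i], x_s[j]) = exp(-|e^{i x_t[i]} - e^{i x_s[j]}|)."""
    return np.exp(-np.abs(np.exp(1j * x_t[:, None]) - np.exp(1j * x_s[None, :])))


# Settings from the text: n = 200, dt = 0.01, S = 100, kappa = 1 and 10.
traj_weak = simulate_kuramoto(n=200, steps=100, kappa=1.0)
traj_strong = simulate_kuramoto(n=200, steps=100, kappa=10.0)
mean_weak = matrix_kernel(traj_weak[100], traj_weak[100]).mean()
mean_strong = matrix_kernel(traj_strong[100], traj_strong[100]).mean()
```

Comparing `mean_weak` with `mean_strong` gives a single-number summary of how closely the elements are aligned at time S for each κ; the full n×n kernel matrices correspond to what is plotted per component.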


A result of plotting the values of the respective components of |QT˜KTQT*φ(xS−1)|k in the case of κ=1 is illustrated in FIG. 4A. Also, a result of plotting the values of the respective components of |QT˜KTQT*φ(xS−1)|k in the case of κ=10 is illustrated in FIG. 4B. QT˜KTQT*φ(xS−1) is an approximation of φ(xS); therefore, the (i,j) component of |QT˜KTQT*φ(xS−1)|k is considered to be an approximation of the (i,j) component of k(xS,xS), i.e., of ˜k(xS,i,xS,j). Therefore, the closer xS,i and xS,j are to each other, the greater the (i,j) component of |QT˜KTQT*φ(xS−1)|k should become, and the farther apart xS,i and xS,j are, the smaller the (i,j) component should become.


In FIG. 4B, the (i,j) components are uniformly greater than those in FIG. 4A (i.e., a greater value of κ resulted in uniformly greater (i,j) components). Therefore, the values of the components of the predicted value at time S are more closely aligned for the greater κ. In the Kuramoto model, once a certain length of time has elapsed, a greater κ results in better aligned values of the elements; therefore, it can be understood that the approximation was obtained precisely.


In fact, in the cases of κ=1 and κ=10, results of calculating k(x10, x10) and k(x100, x100) for x10 and x100, respectively, obtained directly from Formula (6) described above are illustrated in FIGS. 5A to 5D. Comparing FIG. 4A with FIG. 5C, and FIG. 4B with FIG. 5D, respectively, it can be understood that close values are obtained. Also, comparing FIG. 4B with FIG. 5B and FIG. 5D, it can be understood that, although the elements are not yet completely synchronized at t=10, the sufficiently synchronized state at t=100 is predicted by using ˜KT approximated from data up to t=10.


<<Behavior of Proximity Between Elements when t→∞>>


A Kuramoto model on [0, 2π) shown in the following Formula (7) was considered.


$$\frac{d\theta_i}{dt} = \omega_i + \frac{1}{n} \sum_{j=0}^{n-1} \kappa_{i,j} \sin(\theta_j - \theta_i) \qquad (7)$$  [Math. 44]


where θi(0) was assumed to be a random number following a uniform distribution on [0, 2π), and ωi was also assumed to be a random number following the uniform distribution on [0, 2π).


A dynamical system shown in the following Formula (8) obtained by discretizing Formula (7) described above was considered.


$$x_{t,i} = x_{t-1,i} + \Delta t\, \omega_i + \Delta t\, \frac{1}{n} \sum_{j=0}^{n-1} \kappa_{i,j} \sin(x_{t-1,j} - x_{t-1,i}) \qquad (8)$$  [Math. 45]


Here, on [0, 2π), the following function was considered,


$$\tilde{k}(x, y) = e^{-|e^{ix} - e^{iy}|}$$  [Math. 46]


where the (i,j) component of k(xt,xs) was set to ˜k(xt,i,xs,j), and Δt=0.01. Also, ˜KT was calculated with n=50, T=10, and mt=jt upon normalization. Further, under each of the following Setting 1 and Setting 2, Formula (4) described above was calculated. Here, when calculating Formula (4) described above, every term other than those with ˜λm close to 1 was set to zero.


Setting 1: In the case of i>25 and j>25, κi,j=100; otherwise κi,j=0


Setting 2: In the case of (i<25 or i>35) and (j<25 or j>35), κi,j=100; otherwise κi,j=0
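The two coupling patterns and one step of Formula (8) can be written down as follows. This is a minimal sketch with assumptions: indices i and j are taken to run from 0 to n−1 as in the sums of Formulas (7) and (8), and the function names `coupling_setting_1`, `coupling_setting_2`, and `kuramoto_step_weighted` are hypothetical.

```python
import numpy as np


def coupling_setting_1(n=50, strength=100.0):
    """Setting 1: kappa[i, j] = strength if i > 25 and j > 25, else 0."""
    idx = np.arange(n)
    active = (idx > 25).astype(float)
    return strength * np.outer(active, active)


def coupling_setting_2(n=50, strength=100.0):
    """Setting 2: kappa[i, j] = strength if both i and j are < 25 or > 35."""
    idx = np.arange(n)
    active = ((idx < 25) | (idx > 35)).astype(float)
    return strength * np.outer(active, active)


def kuramoto_step_weighted(prev, omega, kappa, dt=0.01):
    """One step of Formula (8); kappa[i, j] weights sin(prev[j] - prev[i])."""
    n = prev.shape[0]
    diff = prev[None, :] - prev[:, None]  # diff[i, j] = prev[j] - prev[i]
    return prev + dt * omega + dt * (kappa * np.sin(diff)).sum(axis=1) / n


kappa_1 = coupling_setting_1()
kappa_2 = coupling_setting_2()
```

Plotting `kappa_1` and `kappa_2` with i on the vertical axis and j on the horizontal axis shows which pairs of elements interact under each setting.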


A result of calculating Formula (4) described above and plotting values of the respective components of the calculated result under Setting 1 described above is illustrated in FIG. 6A. Also, a result of calculating Formula (4) described above and plotting values of the respective components of the calculated result under Setting 2 described above is illustrated in FIG. 6B. Also, taking i in the vertical axis and j in the horizontal axis, a result of plotting the magnitudes of κi,j under Setting 1 described above is illustrated in FIG. 7A. Similarly, a result of plotting the magnitudes of κi,j under Setting 2 described above is illustrated in FIG. 7B. In the Kuramoto model, elements that interact with each other (i.e., elements having large values of κi,j) take closer values once sufficient time has elapsed. Therefore, it can be understood that the behavior of the proximity when t→∞ can be approximated.


<Hardware Configuration of Relationship Extraction Device 10>

Finally, a hardware configuration of the relationship extraction device 10 according to the present embodiment will be described with reference to FIG. 8. FIG. 8 is a diagram illustrating an example of a hardware configuration of the relationship extraction device 10 according to the present embodiment.


As illustrated in FIG. 8, the relationship extraction device 10 according to the present embodiment is implemented by a generic computer or computer system, and includes an input device 401, a display device 402, an external I/F 403, a communication I/F 404, a processor 405, and a memory device 406. These hardware components are connected via a bus 407 so as to be capable of communicating with each other.


The input device 401 is, for example, a keyboard, a mouse, a touch panel, and the like. The display device 402 is, for example, a display or the like. Note that the relationship extraction device 10 may or may not have at least one of the input device 401 and the display device 402.


The external I/F 403 is an interface with an external device. The external device includes a recording medium 403a or the like. The relationship extraction device 10 can read from and write to the recording medium 403a via the external I/F 403. The recording medium 403a may store, for example, one or more programs that implement the approximate operator generation processing unit 100 and the relationship extraction processing unit 200.


Note that the recording medium 403a includes, for example, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), a USB (Universal Serial Bus) memory card, and the like.


The communication I/F 404 is an interface for connecting the relationship extraction device 10 to a communication network. Note that one or more programs that implement the approximate operator generation processing unit 100 and the relationship extraction processing unit 200 may be obtained (downloaded) from a predetermined server device or the like via the communication I/F 404.


The processor 405 is any of various types of arithmetic/logic devices, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and the like. The approximate operator generation processing unit 100 and the relationship extraction processing unit 200 are implemented by, for example, processes that one or more programs stored in the memory device 406 cause the processor 405 to execute.


The memory device 406 is any of various types of storage devices such as, for example, an HDD (Hard Disk Drive), SSD (Solid State Drive), RAM (Random Access Memory), ROM (Read-Only Memory), flash memory, and the like. The storage unit 300 is implemented by, for example, the memory device 406. However, the storage unit 300 may be implemented by, for example, a storage device connected to the relationship extraction device 10 through a communication network.


By having the hardware configuration illustrated in FIG. 8, the relationship extraction device 10 according to the present embodiment can implement the approximate operator generation process and the relationship extraction process described above. Note that the hardware configuration illustrated in FIG. 8 is an example, and the relationship extraction device 10 may have another hardware configuration. For example, the relationship extraction device 10 may have multiple processors 405 or multiple memory devices 406.


The present invention is not limited to the embodiments described above that have been specifically disclosed, and various modifications, changes, combinations with known techniques, and the like can be made within a range not deviating from the description of the claims.


The present application is based on a base application No. 2020-035051 filed in Japan on Mar. 2, 2020, the entire contents of which are hereby incorporated by reference.


List of Reference Numerals




  • 10 relationship extraction device


  • 100 approximate operator generation processing unit


  • 101 obtaining unit


  • 102 approximate operator generation unit


  • 200 relationship extraction processing unit


  • 201 obtaining unit


  • 202 relationship extraction unit


  • 300 storage unit


  • 401 input device


  • 402 display device


  • 403 external I/F


  • 403a recording medium


  • 404 communication I/F


  • 405 processor


  • 406 memory device


  • 407 bus


Claims
  • 1. A relationship extraction device comprising: a memory; and a processor configured to execute obtaining a set of data {x0, . . . , xT−1}⊆X each having multiple elements and a set of data {y0=f(x0), . . . , yT−1=f(xT−1)}⊆Y each having multiple elements, where f is any mapping; generating an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ1(xt)=φ2(yt) for t=0, . . . , T−1, wherein φ1 is a feature mapping with respect to a positive definite kernel function k1 on X×X that takes C*-algebra values, and φ2 is a feature mapping with respect to a positive definite kernel function k2 on Y×Y that takes C*-algebra values; obtaining data xt and xs as targets of relationship extraction; and extracting a relationship between each element of xt and each element of xs by using the approximate operator.
  • 2. The relationship extraction device as claimed in claim 1, wherein the extracting extracts the relationship for analyzing anomaly detection or causal estimation.
  • 3. The relationship extraction device as claimed in claim 1, wherein the extracting extracts a C*-algebra value representing the relationship, by approximating an inner product <xt, xs>k defined on an RKHM (reproducing kernel Hilbert C*-module) with respect to the positive definite kernel function k1, by the approximate operator.
  • 4. The relationship extraction device as claimed in claim 1, wherein in a case of X=Y, yt=xt+1, k1=k2=k, and φ1=φ2=φ, the generating generates the approximate operator by using an operator ^K with which ^Kφ(xT−1) approximates φ(xT), and an orthonormal projection from an RKHM with respect to a positive definite kernel function K to a space represented by a linear combination of φ(xt) and C*-algebra values.
  • 5. The relationship extraction device as claimed in claim 1, wherein in a case of X≠Y, the generating generates the approximate operator by using a linear mapping ^K with which ^Kφ1(xt) approximates φ2(yt), and an orthonormal projection from an RKHM with respect to the positive definite kernel function k1 to a space represented by a linear combination of φ1(xt) and C*-algebra values.
  • 6. A method of extracting relationship executed by a computer including a memory and a processor, the method comprising: obtaining a set of data {x0, . . . , xT−1}⊆X each having multiple elements and a set of data {y0=f(x0), . . . , yT−1=f(xT−1)}⊆Y each having multiple elements, where f is any mapping; generating an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ1(xt)=φ2(yt) for t=0, . . . , T−1, wherein φ1 is a feature mapping with respect to a positive definite kernel function k1 on X×X that takes C*-algebra values, and φ2 is a feature mapping with respect to a positive definite kernel function k2 on Y×Y that takes C*-algebra values; obtaining data xt and xs as targets of relationship extraction; and extracting a relationship between each element of xt and each element of xs by using the approximate operator.
  • 7. A non-transitory computer-readable recording medium having computer-readable instructions stored thereon, which when executed, cause a computer to function as the relationship extraction device as claimed in claim 1.
Priority Claims (1)
Number Date Country Kind
2020-035051 Mar 2020 JP national
PCT Information
Filing Document Filing Date Country Kind
PCT/JP2021/006689 2/22/2021 WO