The present disclosure relates to an analysis apparatus, an analysis method, and a program.
Data that appears in nature fundamentally involves randomness, and data analysis techniques that take randomness into account have been studied conventionally. As a framework for dealing with such randomness in data analysis, kernel mean embedding has been known. Randomness is formulated by a probability measure, which is a set function representing the likelihood of occurrence of an event. Kernel mean embedding is a method in which a concept of “proximity” such as an inner product or a norm is imparted to this probability measure, and the proximity between probability measures is determined by an inner product in a space referred to as an RKHS (reproducing kernel Hilbert space). As many data analysis methods are based on the concept of proximity, this makes it possible to apply general data analysis to data having randomness, for example, to measure the proximity of data items including randomness or to estimate the probability measure from which data having certain randomness is generated.
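Note that, as a reference for the description below, the following is a minimal sketch in Python of kernel mean embedding in an RKHS: two samples are mapped to their empirical kernel means, and the proximity of the underlying probability measures is measured by the squared RKHS distance between these means (the maximum mean discrepancy). The Gaussian kernel, the parameter gamma, and the sample data are merely illustrative choices and are not part of the embodiment.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Complex- (here real-) valued positive definite kernel on R^d."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def mmd_squared(xs, ys, gamma=1.0):
    """Squared RKHS distance between the empirical kernel means of two samples:
    ||m_X - m_Y||^2 = mean k(x, x') - 2 mean k(x, y) + mean k(y, y')."""
    kxx = np.mean([gaussian_kernel(a, b, gamma) for a in xs for b in xs])
    kxy = np.mean([gaussian_kernel(a, b, gamma) for a in xs for b in ys])
    kyy = np.mean([gaussian_kernel(a, b, gamma) for a in ys for b in ys])
    return kxx - 2.0 * kxy + kyy

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(20, 2))   # sample from one distribution
ys = rng.normal(0.5, 1.0, size=(20, 2))   # sample from a shifted distribution
print(mmd_squared(xs, ys))                # larger values indicate less similar measures
```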
Meanwhile, as an analysis technique for data that does not include randomness and as a framework that takes interactions of multiple data items into account, a technique that uses an RKHM (reproducing kernel Hilbert C*-module) has been known. An RKHM is an extension of an RKHS in which the inner product, instead of taking a complex value as usual, takes a value in a space referred to as a C*-algebra, which is a generalization of matrices and linear operators, so that analysis can be executed while preserving information on interactions. Accordingly, it becomes possible to precisely analyze data having interactions and to extract information on the interactions.
Meanwhile, it is often the case that data is generated by interactions of multiple random data items. Also, in the field of quantum computation or the like where a quantum is handled, the state of a quantum is represented by multiple probabilities, i.e., probabilities of observations. Although probability measures are used for formulating randomness, in the existing framework of data analysis, probability measures take complex values and cannot handle multiple randomness properties simultaneously. Meanwhile, in quantum mechanics, probability measures that take values of linear operators in a Hilbert space are used for formulating the state of a quantum represented by multiple probabilities (for example, Non-Patent Document 1). Also, in the field of pure mathematics, a concept referred to as a vector measure, which is a more generalized measure, is being studied theoretically (for example, Non-Patent Document 2).
However, Non-Patent Document 1 and Non-Patent Document 2 described above are still at the stage of theoretical study, and in practical data analysis, no framework using a probability measure that takes a value of a linear operator has existed. Recently, studies that analyze data obtained from quantum systems by using machine learning techniques have also attracted attention, and from such a viewpoint, a framework that uses, in data analysis, a probability measure taking a value of a linear operator and that is capable of handling multiple randomness properties simultaneously is considered to be important.
One embodiment of the present invention has been made in view of the points described above, and has an object to implement analysis of data having multiple randomness properties.
In order to achieve the above object, an analysis apparatus according to one embodiment includes: an obtainment unit configured to obtain a data set of multiple data items having randomness; and an analysis unit configured to calculate, as an inner product or a norm of probability measures μ and ν being probability measures on the data set and taking values in a von Neumann algebra, by using a mapping Φ that extends kernel mean embedding, an inner product or a norm of Φ(μ) and Φ(ν) mapped onto an RKHM.
Analysis of data having multiple randomness properties can be implemented.
In the following, one embodiment of the present invention will be described. In the present embodiment, an analysis apparatus 10 that can analyze data having multiple randomness properties will be described. By using the analysis apparatus 10 according to the present embodiment, analysis of data having multiple randomness properties can be executed, in particular, for example, visualization, anomaly detection, and the like of data in which multiple random data items interact with one another and of data representing the state of a quantum. Note that, in addition to analysis such as visualization and anomaly detection, the analysis apparatus 10 according to the present embodiment may, for example, execute control based on the analysis result (in particular, an anomaly detection result or the like), such as stopping a device, equipment, a program, or the like indicated by data in which an anomaly is detected.
First, the theoretical construction and application examples of the present embodiment will be described. In the present embodiment, kernel mean embedding is extended so that a concept of “proximity” such as an inner product and a norm is imparted to a probability measure that takes a value of a linear operator. Here, in order to execute analysis that preserves as much information as possible on the multiple randomness properties, the value of the inner product is not a complex value but a value of a linear operator. For this purpose, kernel mean embedding using an RKHM is used instead of the known kernel mean embedding using an RKHS.
Let X be a space to which data (data having randomness) belongs, let A be a von Neumann algebra, and consider an A-valued positive definite kernel k:X×X→A. Here, a mapping k:X×X→A is said to be an A-valued positive definite kernel when it satisfies the following Condition 1 and Condition 2. Note that as specific examples of the von Neumann algebra, the set of all bounded linear operators on a Hilbert space, the set of all square matrices of a fixed size, and the like may be enumerated.
(Condition 1) For any x, y∈X, k(x,y)=k(y,x)* (where * denotes the adjoint).
(Condition 2) For any natural number m, any x0, x1, . . . , xm-1∈X, and any c0, c1, . . . , cm-1∈A, the following double summation is positive:
Σi,j ci*k(xi,xj)cj (where i and j each run over 0, 1, . . . , m−1)
Here, “positive” means being a positive element of the von Neumann algebra, which is a generalization of a Hermitian matrix whose eigenvalues are all greater than or equal to 0 (i.e., a Hermitian positive semidefinite matrix), or the like.
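Note that Condition 2 can be checked numerically for a concrete kernel. The following sketch assumes the matrix-valued kernel k(x,y)=exp(−c∥x−y∥)I that is also used as an example later in this description, together with arbitrary sample points and coefficient matrices, forms the double summation of Condition 2, and confirms that its eigenvalues are non-negative.

```python
import numpy as np

m, d, n, c = 2, 3, 4, 0.5
rng = np.random.default_rng(1)

def k(x, y):
    """C^{m x m}-valued kernel exp(-c * ||x - y||) * I (illustrative choice)."""
    return np.exp(-c * np.linalg.norm(x - y)) * np.eye(m)

xs = rng.normal(size=(n, d))                                        # arbitrary points x_i
cs = rng.normal(size=(n, m, m)) + 1j * rng.normal(size=(n, m, m))   # arbitrary c_i in A

# Condition 2: the double sum over i, j of c_i^* k(x_i, x_j) c_j is a positive element of A.
s = sum(cs[i].conj().T @ k(xs[i], xs[j]) @ cs[j] for i in range(n) for j in range(n))
print(np.linalg.eigvalsh(s))   # all eigenvalues are >= 0 up to rounding error
```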
Given an A-valued positive definite kernel k, a mapping φ from X to an A-valued function is defined by φ(x)=k(⋅,x). This mapping φ is also referred to as a feature map.
For a natural number m; x0, x1, . . . , xm-1∈X; and c0, c1, . . . , cm-1∈A, a space referred to as an RKHM can be constructed from the entirety of linear combinations of the following form:
Σi φ(xi)ci (where i runs over 0, 1, . . . , m−1)
This space is denoted as Mk. In Mk, an A-valued inner product ⟨⋅,⋅⟩k and an A-valued absolute value |⋅|k can be defined.
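Note that, for two elements of Mk given as finite linear combinations of the feature maps, the reproducing property ⟨φ(x),φ(y)⟩k=k(x,y) reduces ⟨f,g⟩k to a finite double sum of terms of the form ci*k(xi,yj)dj, and |f|k is the positive square root of ⟨f,f⟩k. The following is a minimal sketch of these computations; the kernel, the points, and the coefficient matrices are illustrative choices only.

```python
import numpy as np
from scipy.linalg import sqrtm

m, d, c = 2, 3, 0.5

def k(x, y):
    """C^{m x m}-valued kernel exp(-c * ||x - y||) * I (illustrative choice)."""
    return np.exp(-c * np.linalg.norm(x - y)) * np.eye(m)

def rkhm_inner(xs, cs, ys, ds):
    """A-valued inner product <sum_i phi(x_i) c_i, sum_j phi(y_j) d_j>_k."""
    return sum(cs[i].conj().T @ k(xs[i], ys[j]) @ ds[j]
               for i in range(len(xs)) for j in range(len(ys)))

def rkhm_abs(xs, cs):
    """A-valued absolute value |f|_k = (<f, f>_k)^(1/2)."""
    return sqrtm(rkhm_inner(xs, cs, xs, cs))

rng = np.random.default_rng(2)
xs, ys = rng.normal(size=(3, d)), rng.normal(size=(2, d))         # arbitrary points
cs, ds = rng.normal(size=(3, m, m)), rng.normal(size=(2, m, m))   # arbitrary coefficients
print(rkhm_inner(xs, cs, ys, ds))   # an m x m matrix, i.e., an element of A
print(rkhm_abs(xs, cs))             # the A-valued absolute value of f
```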
An A-valued measure on X is a function μ from the subsets of X referred to as measurable sets to A that satisfies, for countably many measurable sets E1, E2, . . . of which no two intersect, the following equation:
μ(E1∪E2∪ . . . )=μ(E1)+μ(E2)+ . . .
For an A-valued measure, an integral with respect to the measure can be considered. When an A-valued function f is represented as a limit of a sequence of functions referred to as simple functions as follows:
{si}i=1∞ [Math. 4]
the integral of f with respect to μ is defined as a limit of the integrals of si with respect to μ. Here, a simple function s is, for a certain finite number of measurable sets E1, . . . , En of which no two intersect, and c1, . . . , cn∈A, expressed as follows:
s(x)=χE1(x)c1+ . . . +χEn(x)cn
where χEi is the indicator function of the set Ei.
At this time, the value obtained by integrating s(x) with μ from the left is defined as follows:
μ(E1)c1+ . . . +μ(En)cn
and expressed as follows:
∫x∈Xdμ(x)s(x) [Math. 8]
Similarly, the value obtained by integrating s(x) with μ from the right is defined as follows:
c1μ(E1)+ . . . +cnμ(En)
and expressed as follows:
∫x∈Xs(x)dμ(x) [Math. 10]
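Note that, because A is in general noncommutative, the left integral and the right integral are different elements of A. The following sketch assumes arbitrary matrix values μ(E1), . . . , μ(En) of an A-valued measure on disjoint measurable sets together with arbitrary coefficients c1, . . . , cn of a simple function, computes both integrals, and shows that they generally do not coincide.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2
mu_E = rng.normal(size=(n, m, m))   # mu(E_1), ..., mu(E_n): values of an A-valued measure
cs   = rng.normal(size=(n, m, m))   # c_1, ..., c_n: coefficients of the simple function s

left_integral  = sum(mu_E[i] @ cs[i] for i in range(n))   # value of the left integral
right_integral = sum(cs[i] @ mu_E[i] for i in range(n))   # value of the right integral

print(np.allclose(left_integral, right_integral))   # generally False: A is noncommutative
```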
Under the settings described above, a mapping Φ that maps finite A-valued measures to elements in the RKHM is defined as follows:
Φ(μ)=∫x∈Xϕ(x)dμ(x) [Math. 11]
which is referred to as kernel mean embedding. As the A-valued inner product between elements in the RKHM is determined, if Φ is injective, the A-valued inner product of finite A-valued measures μ and ν can be defined by the A-valued inner product of Φ(μ) and Φ(ν).
For example, for X=Rd and A=Cm×m, define k:X×X→A as follows:
k(x,y)=exp(−c∥x−y∥E)I [Math. 12]
where ∥⋅∥E is the Euclidean norm on Rd, c>0 is a constant, and I is the identity matrix of order m. Also, R represents the set of all real numbers and C represents the set of all complex numbers. At this time, it can be shown that the Φ determined from this k is injective.
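Note that, for a finitely supported A-valued measure μ that places a matrix weight cl on each of finitely many points xl, right integration gives Φ(μ)=φ(x1)c1+ . . . +φ(xL)cL, so that the A-valued inner product of two such embeddings reduces to a finite double sum of kernel evaluations. The following sketch illustrates this with the kernel defined above; the point masses and weights are arbitrary and are not part of the embodiment.

```python
import numpy as np

m, d, c = 2, 3, 1.0

def k(x, y):
    """k(x, y) = exp(-c * ||x - y||_E) * I, a C^{m x m}-valued kernel on R^d."""
    return np.exp(-c * np.linalg.norm(x - y)) * np.eye(m)

def embed_inner(mu, nu):
    """<Phi(mu), Phi(nu)>_k for finitely supported A-valued measures.

    A measure is given as a list of (point, matrix weight) pairs,
    i.e. mu = sum_l delta_{x_l} c_l, so Phi(mu) = sum_l phi(x_l) c_l."""
    return sum(cl.conj().T @ k(xl, yl) @ dl for xl, cl in mu for yl, dl in nu)

rng = np.random.default_rng(4)
mu = [(rng.normal(size=d), rng.normal(size=(m, m))) for _ in range(3)]
nu = [(rng.normal(size=d), rng.normal(size=(m, m))) for _ in range(4)]
print(embed_inner(mu, nu))   # an element of A = C^{m x m}
```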
An A-valued distance between finite A-valued measures μ and ν is defined as follows:
Υ(μ,ν)=|Φ(μ)−Φ(ν)|k
At this time, if Φ is injective, then, for example, ∥Υ(μ,ν)∥ satisfies the properties of a distance. In other words, ∥Υ(μ,ν)∥=∥Υ(ν,μ)∥; ∥Υ(μ,ν)∥=0 if and only if μ=ν; and ∥Υ(μ,ν)∥≤∥Υ(μ,λ)∥+∥Υ(λ,ν)∥ is satisfied for any finite A-valued measures μ, ν, and λ.
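Note that Υ(μ,ν) and its operator norm can be computed from inner products of embeddings alone, because ⟨Φ(μ)−Φ(ν),Φ(μ)−Φ(ν)⟩k expands into four inner-product terms. The following sketch, again using finitely supported measures and the illustrative kernel, numerically confirms the symmetry and the triangle inequality stated above.

```python
import numpy as np
from scipy.linalg import sqrtm

m, d, c = 2, 3, 1.0

def k(x, y):
    """C^{m x m}-valued kernel exp(-c * ||x - y||) * I (illustrative choice)."""
    return np.exp(-c * np.linalg.norm(x - y)) * np.eye(m)

def inner(mu, nu):
    """<Phi(mu), Phi(nu)>_k for finitely supported A-valued measures (lists of (point, weight))."""
    return sum(cl.conj().T @ k(xl, yl) @ dl for xl, cl in mu for yl, dl in nu)

def upsilon(mu, nu):
    """A-valued distance Upsilon(mu, nu) = |Phi(mu) - Phi(nu)|_k."""
    g = inner(mu, mu) - inner(mu, nu) - inner(nu, mu) + inner(nu, nu)
    return sqrtm(g)

def op_norm(a):
    return np.linalg.norm(a, 2)   # operator (spectral) norm

rng = np.random.default_rng(5)
rand_measure = lambda: [(rng.normal(size=d), rng.normal(size=(m, m))) for _ in range(3)]
mu, nu, lam = rand_measure(), rand_measure(), rand_measure()

print(np.isclose(op_norm(upsilon(mu, nu)), op_norm(upsilon(nu, mu))))   # symmetry
print(op_norm(upsilon(mu, nu))
      <= op_norm(upsilon(mu, lam)) + op_norm(upsilon(lam, nu)) + 1e-9)  # triangle inequality
```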
Two examples of finite A-valued measures are presented below.
(Example 1)
Let A=Cm×m, and consider two sets of m random variables X1, . . . , Xm and Y1, . . . , Ym that take values in X. Let P be the probability measure on the space on which these random variables are defined, let μX be the A-valued measure whose (i, j) element is the measure (Xi,Xj)*P representing the covariance of Xi and Xj (or a centered version of this measure expressed in the following formula), and define μY from Y1, . . . , Ym in the same way.
(Xi,Xj)*P−Xi*P⊗Xj*P [Math. 13]
At this time, Υ(μX,μY)=0 is equivalent to the condition that, for any bounded functions f and g, the covariance of f(Xi) and g(Xj) is equal to the covariance of f(Yi) and g(Yj). Therefore, by executing the Kernel PCA described later on such A-valued measures, a lower-dimensional space can be obtained in which information on the covariances between data items is preserved.
In practice, when data {x1,1, x1,2, . . . , x1,N}, . . . , {xm,1, xm,2, . . . , xm,N} obtained from X1, . . . , Xm and data {y1,1, y1,2, . . . , y1,N}, . . . , {ym,1, ym,2, . . . , ym,N} obtained from Y1, . . . , Ym are given, the (i, j) element of the inner product ⟨Φ(μX),Φ(μY)⟩k of Φ(μX) and Φ(μY) is approximated by the following formula (1):
Here, a case is considered in which k(x, y) is a Cm×m-valued positive definite kernel every element of which is a complex-valued positive definite kernel on X×X, denoted as follows:
k̃(x,y) [Math. 15]
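Note that, as a reference, the following is a sketch of one generic way to estimate a single (i, j) entry empirically: the centered pushforward measures (Xi,Xj)*P−Xi*P⊗Xj*P and (Yi,Yj)*P−Yi*P⊗Yj*P are embedded into the RKHS of k̃ from their samples, and the inner product of the two embeddings is computed. For concreteness, the sketch assumes that k̃ factorizes into a product of Gaussian kernels in each argument; this kernel choice and the resulting estimator are assumptions for illustration and do not necessarily coincide with formula (1).

```python
import numpy as np

def gauss(a, b, gamma=1.0):
    """Gram matrix of a Gaussian kernel between two 1-D samples a and b."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def centered_embedding_inner(xi, xj, yi, yj, gamma=1.0):
    """Empirical inner product, in the RKHS of a product kernel on X x X, of the
    embeddings of (Xi, Xj)*P - Xi*P (x) Xj*P and (Yi, Yj)*P - Yi*P (x) Yj*P."""
    N, M = len(xi), len(yi)
    Gi = gauss(xi, yi, gamma)   # N x M Gram matrix for the first components
    Gj = gauss(xj, yj, gamma)   # N x M Gram matrix for the second components
    t1 = (Gi * Gj).sum() / (N * M)
    t2 = (Gi.sum(axis=1) * Gj.sum(axis=1)).sum() / (N * M * M)
    t3 = (Gi.sum(axis=0) * Gj.sum(axis=0)).sum() / (N * N * M)
    t4 = Gi.sum() * Gj.sum() / (N * N * M * M)
    return t1 - t2 - t3 + t4

rng = np.random.default_rng(6)
x1, x2 = rng.normal(size=100), rng.normal(size=100)               # independent pair
y1 = rng.normal(size=100); y2 = y1 + 0.1 * rng.normal(size=100)   # strongly dependent pair
print(centered_embedding_inner(x1, x2, x1, x2))   # one diagonal-type entry for the X data
print(centered_embedding_inner(y1, y2, y1, y2))   # one diagonal-type entry for the Y data
```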
(Example 2)
In quantum mechanics, A is taken to be the set of all bounded linear operators on a Hilbert space. A state of a quantum is represented by a linear operator, and its observation is represented by an A-valued measure μ. Therefore, with respect to linear operators ρ1 and ρ2 representing states of the quantum and A-valued measures μ1 and μ2 representing observations, the proximity of the observations μ1ρ1 and μ2ρ2 of the states can be represented by the inner product of Φ(μ1ρ1) and Φ(μ2ρ2).
For example, let A=Cm×m and X=Cm, and for i=1, . . . , s, let |ψi⟩∈X be a normalized vector. Under these settings, consider observations (i.e., A-valued measures on X) expressed as follows:
At this time, for states ρ1, ρ2∈Cm×m, an inner product of Φ(μρ1) and Φ(μρ2) can be calculated by the following formula (2):
(Kernel PCA)
Let A=Cm×m. For multiple A-valued measures μ1, . . . , μn, let G be the matrix having ⟨Φ(μi),Φ(μj)⟩k∈A as its (i, j) block. Then, G is a Hermitian positive semidefinite matrix, and hence there exist eigenvalues λ1≥ . . . ≥λmn≥0 and corresponding orthonormal eigenvectors v1, . . . , vmn. The i-th principal axis is defined as follows:
√λi[Φ(μ1), . . . ,Φ(μn)][vi,0, . . . ,0] [Math. 18]
and is denoted as pi. Then, p1, . . . , ps satisfy the following formula (3) for any s=1, . . . , mn.
In other words, p1, . . . , ps can be regarded as the s elements (normally s<<n) that represent Φ(μ1), . . . , Φ(μn) with the smallest error. Therefore, by approximating Φ(μi) with the following formula,
p1⟨p1,Φ(μi)⟩k+ . . . +ps⟨ps,Φ(μi)⟩k
μ1, . . . , μn can be visualized; or, for a certain A-valued measure μ0, by regarding the following formula,
|Φ(μ0)−(p1⟨p1,Φ(μ0)⟩k+ . . . +ps⟨ps,Φ(μ0)⟩k)|k
as a value indicating to what extent μ0 deviates from μ1, . . . , μn, anomaly detection can be executed. Also, as described above, dimensionality reduction can be executed while preserving information on the covariances between data items.
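Note that the principal axes and the reconstruction error can be computed entirely from the Gram blocks ⟨Φ(μa),Φ(μb)⟩k. The following sketch builds the block Gram matrix G for finitely supported A-valued measures, eigendecomposes it, and evaluates a Cm×m-valued reconstruction error for a new measure μ0 as an anomaly score. In the sketch, each axis is normalized by 1/√λj so that ⟨pj,pj⟩k becomes a projection; this normalization, the kernel, and the data are assumptions for illustration and may differ from the scaling shown in [Math. 18].

```python
import numpy as np
from scipy.linalg import sqrtm

m, d, c, n, s = 2, 3, 1.0, 5, 2
rng = np.random.default_rng(7)

def k(x, y):
    """C^{m x m}-valued kernel exp(-c * ||x - y||) * I (illustrative choice)."""
    return np.exp(-c * np.linalg.norm(x - y)) * np.eye(m)

def inner(mu, nu):
    """<Phi(mu), Phi(nu)>_k for finitely supported A-valued measures (lists of (point, weight))."""
    return sum(cl.conj().T @ k(xl, yl) @ dl for xl, cl in mu for yl, dl in nu)

def rand_measure(npts=3):
    return [(rng.normal(size=d), rng.normal(size=(m, m))) for _ in range(npts)]

measures = [rand_measure() for _ in range(n)]

# Block Gram matrix G with <Phi(mu_a), Phi(mu_b)>_k as its (a, b) block.
G = np.block([[inner(ma, mb) for mb in measures] for ma in measures])
lam, V = np.linalg.eigh(G)
lam, V = lam[::-1], V[:, ::-1]   # eigenvalues in descending order, matching eigenvectors

def reconstruction_error(mu0):
    """A-valued error |Phi(mu0) - sum_{j<=s} p_j <p_j, Phi(mu0)>_k|_k, with axes
    normalized by 1/sqrt(lambda_j) so that <p_j, p_j>_k is a projection (assumption)."""
    g = np.vstack([inner(mb, mu0) for mb in measures])   # stacked blocks <Phi(mu_b), Phi(mu0)>_k
    h = inner(mu0, mu0)
    proj = sum(np.outer(V[:, j], V[:, j].conj()) / lam[j] for j in range(s))
    return sqrtm(h - g.conj().T @ proj @ g)

mu0 = rand_measure()
print(np.linalg.norm(reconstruction_error(mu0), 2))   # scalar anomaly score for mu0
```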
Existing methods in machine learning and statistics that use kernel mean embedding in an RKHS can be applied to data having multiple elements dependent on one another, by generalizing kernel mean embedding of probability measures in the RKHS to kernel mean embedding of measures representing covariances in an RKHM as described in the above Example 1. For example, the following examples may be considered.
Also, by using the inner product of kernel mean embedding for measures representing the state of a quantum described in Example 2 above, the state of the quantum can be analyzed using machine learning or statistical methods.
Next, a hardware configuration of the analysis apparatus 10 according to the present embodiment will be described with reference to
As illustrated in
The input device 11 is, for example, a keyboard, a mouse, a touch panel, and the like. The display device 12 is, for example, a display or the like. Note that the analysis apparatus 10 may or may not have at least one of the input device 11 and the display device 12.
The external I/F 13 is an interface with an external device. The external device includes a recording medium 13a or the like. The analysis apparatus 10 can execute reading and writing with the recording medium 13a via the external I/F 13. Note that the recording medium 13a includes, for example, CD (Compact Disc), DVD (Digital Versatile Disk), SD memory card (Secure Digital memory card), USB (Universal Serial Bus) memory card, and the like.
The communication I/F 14 is an interface for connecting the analysis apparatus 10 to a communication network. The processor 15 includes various types of arithmetic/logic devices, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and the like. The memory device 16 is various types of storage devices such as, for example, an HDD (Hard Disk Drive), SSD (Solid State Drive), RAM (Random Access Memory), ROM (Read-Only Memory), flash memory, or the like.
By having the hardware configuration illustrated in
Next, a functional configuration of the analysis apparatus 10 according to the present embodiment will be described with reference to
As illustrated in
The storage unit 103 stores data to be analyzed (e.g., elements in X to be analyzed and A-valued measures of these, and further, linear operators representing the state of a quantum in the case of applying to Example 2 described above).
The obtainment unit 101 obtains data to be analyzed from the storage unit 103. The analysis unit 102 analyzes data obtained by the obtainment unit 101 (i.e., for example, calculation of the inner product and the norm, and visualization and anomaly detection using the calculation results, and the like).
Next, a flow of data analysis processing executed by the analysis apparatus 10 according to the present embodiment will be described with reference to
First, the obtainment unit 101 obtains data to be analyzed (i.e., elements in X to be analyzed and A-valued measures of these; linear operators representing states of a quantum in the case of applying to Example 2 described above; and the like) from the storage unit 103 (Step S101).
Then, the analysis unit 102 analyzes the data obtained at Step S101 described above (Step S102). Note that examples of the data analysis include calculation of the inner product and the norm described in “2. Applications of kernel mean embedding using RKHM”, visualization and anomaly detection using the calculation results, comparison of data items with one another, data generation, learning, and the like. Note that specific examples of methods of calculating the inner product are as expressed in the above formula (1) in the case of a measure representing the covariances between multiple data items having randomness, and as expressed in the above formula (2) in the case of a measure representing the state of a quantum.
As described above, the analysis apparatus 10 according to the present embodiment can execute analysis of data having multiple randomness properties (in particular, visualization of data in the case where multiple random data items are interacting and data representing the state of a quantum, anomaly detection, and the like).
Finally, experimental results in the case where the analysis apparatus 10 according to the present embodiment was applied to Example 1 and Example 2 described in “2.1 Distance between A-valued measures” will be described.
With settings of X=R and Ω=R5, data was generated from random variables on Ω that take values in X, expressed as in the following formulas (4) to (6).
[Math. 22]
X1(ω)=ω1, X2(ω)=ω2, X3(ω)=ω3 (4)
Y1(ω)=ω4 cos(0.1ω4), Y2(ω)=eω4, Y3(ω)=√|ω5| (5)
Z1(ω)=eω
Let μX be the A-valued measure whose (i, j) element is the measure representing the covariance of Xi and Xj expressed as follows, and let μY and μZ be defined in the same way from the Yi and the Zi, respectively:
(Xi,Xj)*P−Xi*P⊗Xj*P [Math. 23]
At this time, each of the inner product of Φ(μX) and Φ(μY), the inner product of Φ(μY) and Φ(μZ), and the inner product of Φ(μX) and Φ(μZ) was calculated by the above formula (1), and μX, μY, and μZ were visualized with the first principal axis and the second principal axis by Kernel PCA. The result is illustrated in
(Comparison with Existing Method)
Independent data items according to [X1, X2, X3] defined by the above formula (4), and independent data items according to [Y1, Y2, Y3] defined by the above formula (5) were prepared, and the two-sample test described in the above Reference material 1 was executed. Note that the two-sample test is a test that determines whether two types of samples follow the same probability distribution.
Comparison was made between the result of executing the two-sample test on distances between data items measured by the analysis apparatus 10 according to the present embodiment (the proposed method, i.e., distances measured by |Φ(μX)−Φ(μY)|k) and the results of executing the two-sample test on distances measured by conventional methods. As the conventional methods, the RKHS-based distance described in Reference material 1, and the Kantorovich metric and the Dudley metric described in Reference material 4 “B. K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G. R. G. Lanckriet, On the empirical estimation of integral probability metrics. Electronic Journal of Statistics, 6:1550-1599, 2012”, were adopted. Also, in each of the following Case 1 and Case 2, tests were executed 50 times with different data sets for each of the proposed method and the conventional methods, and the rate of results in which the two types of samples were determined to follow the same distribution was calculated. The results are illustrated in Table 1 below.
Case 1: 10 independent data items according to [X1, X2, X3] and 10 independent data items according to [X1, X2, X3]
Case 2: 10 independent data items according to [X1, X2, X3] and 10 independent data items according to [Y1, Y2, Y3]
It can be stated that the determination problem is accurately solved when the rate at which the two types of samples are determined to follow the same distribution is high in Case 1 and low in Case 2. With the proposed method, a high rate in Case 1 and a low rate in Case 2 were achieved simultaneously, and it can be stated that accurate determination could be made in both cases.
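Note that, as a reference, the following is a sketch of a generic permutation-based two-sample test that can be driven by any distance between two samples; it is not necessarily the test of Reference material 1, and the placeholder distance function, the significance level, and the data are illustrative assumptions only.

```python
import numpy as np

def permutation_two_sample_test(xs, ys, distance, n_perm=200, alpha=0.05, seed=0):
    """Generic permutation two-sample test: 'same distribution' is rejected when the
    observed distance is unusually large compared with distances obtained after
    re-splitting the pooled data at random."""
    rng = np.random.default_rng(seed)
    observed = distance(xs, ys)
    pooled = np.concatenate([xs, ys])
    n = len(xs)
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        null.append(distance(pooled[perm[:n]], pooled[perm[n:]]))
    p_value = (1 + sum(d >= observed for d in null)) / (1 + n_perm)
    return p_value, p_value >= alpha   # True: judged to follow the same distribution

# Placeholder distance (difference of sample means); in the experiments above, the
# distance between the embedded measures would be used instead.
mean_dist = lambda a, b: abs(a.mean() - b.mean())
rng = np.random.default_rng(1)
print(permutation_two_sample_test(rng.normal(size=30), rng.normal(size=30), mean_dist))
```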
In Example 2 above, assume that m=2 and s=4. In addition, assume the following ranges:
At this time, for a1,i=0.25 (where i=1, 2, 3, 4), set ρ1 as follows:
Also, for a2,1=0.4, a2,4=0.1, a2,2=a2,3=0.25, set ρ2 as follows:
Further, μ is defined as in Example 2 described above. A small amount of noise was added to each of ρ1 and ρ2, and 50 samples were prepared for each.
At this time, a first principal axis p1 was determined so as to minimize the error (reconstruction error) expressed in the above formula (3) over the 50 samples ρ1,i (where i=1, . . . , 50) related to ρ1, and then, for each of the 100 samples ρj,i (where j=1, 2 and i=1, . . . , 50) related to ρ1 and ρ2, a Cm×m-valued reconstruction error was calculated as follows:
|Φ(ρj,iμ)−p1⟨p1,Φ(ρj,iμ)⟩k|k [Math. 27]
Then, values of the norms were plotted. The plotted results are illustrated in
As illustrated in
The present invention is not limited to the embodiments described above that have been specifically disclosed, and various modifications, changes, combinations with known techniques, and the like can be made within a scope not deviating from the description of the claims.
The present application is based on a base application No. 2020-122352 in Japan, filed on Jul. 16, 2020, the entire contents of which are hereby incorporated by reference.
Number | Date | Country | Kind
---|---|---|---
2020-122352 | Jul 2020 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2021/026531 | 7/14/2020 | WO