The present invention relates to a computer-implemented method of performing an inference task on a knowledge graph comprising semantic triples of entities, wherein the entity types are subject, object and predicate, and wherein each semantic triple comprises one entity of each entity type, using a quantum computing device, wherein a first entity of a first type and a second entity of a second type are given and the inference task is to infer a third entity of a third type.
The invention also relates to a computing system for performing said computer-implemented method.
Semantic knowledge graphs (KGs) are graph-structured databases consisting of semantic triples (subject, predicate, object), where subject and object are nodes in the graph and the predicate is the label of a directed link between subject and object. An existing triple normally represents a fact, e.g., (California, located in, USA) with “California” being the subject, “located in” being the predicate and “USA” being the object. Missing triples stand for triples known to be false (closed-world assumption) or with unknown truth value. In recent years a number of sizable knowledge graphs have been developed. For example, the currently largest KGs contain more than 100 billion facts and hundreds of millions of distinguishable entities.
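For illustration, such a knowledge graph may be represented on a classical computer as a list of semantic triples or, equivalently, as a binary three-way adjacency tensor. The following minimal Python sketch (with made-up example entities, not taken from any particular knowledge graph) shows this correspondence:

```python
# A minimal sketch: a toy knowledge graph as a set of semantic triples and,
# equivalently, as a binary 3-way tensor chi[subject, predicate, object].
# All entity and relation names are illustrative.
import numpy as np

subjects = ["California", "Munich"]      # index i1
predicates = ["located_in"]              # index i2
objects = ["USA", "Germany"]             # index i3

triples = [("California", "located_in", "USA"),
           ("Munich", "located_in", "Germany")]

chi = np.zeros((len(subjects), len(predicates), len(objects)))
for s, p, o in triples:
    chi[subjects.index(s), predicates.index(p), objects.index(o)] = 1.0

# Under the closed-world assumption, zero entries are treated as false facts.
print(chi[subjects.index("California"), 0, objects.index("USA")])  # 1.0
```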
The large number of facts and entities in a knowledge graph makes it particularly difficult to scale learning and inference algorithms so that inference can be performed on the entire knowledge graph.
In the fields of tensor decomposition and matrix factorization, the following algorithms, among others, have been developed:
In the publication by Jie Chen and Yousef Saad, “On the tensor svd and the optimal low rank orthogonal approximation of tensors”, SIAM Journal on Matrix Analysis and Applications, 30(4):1709-1734, 2009 (hereafter cited as “Chen et al.”), the singular value decomposition of tensors is described.
In the publication by Dimitris Achlioptas and Frank McSherry, “Fast computation of low-rank matrix approximations”, Journal of the ACM (JACM), 54(2):9, 2007 (hereafter cited as “Achlioptas et al.”), methods for fast computation of low-rank matrix approximations of large matrices are described.
In the publication by Maximilian Nickel et al., “A review of relational machine learning for knowledge graphs”, Proceedings of the IEEE, 104(1):11-33, 2016 (hereafter cited as “Nickel et al.”), some fundamental mathematical properties regarding knowledge graphs are described.
In the field of quantum computing, several methods and algorithms have been developed so far.
For example, the publication by S. Lloyd et al., “Quantum principal component analysis”, arXiv: 1307.0401v2 of Sep. 16, 2013 (cited hereafter as “Lloyd et al.”), describes a method of using a density matrix (or: a density operator) ρ for determining eigenvectors of unknown states.
In the publication by I. Kerenidis et al., “Quantum Recommendation Systems”, arXiv: 1603.08675v3 of Sep. 22, 2016 (cited hereafter as “Kerenidis et al.”), a quantum algorithm for recommendation systems is described.
In the publication by P. Rebentrost, “Quantum singular value decomposition of non-sparse low-rank matrices”, arXiv: 1607.05404v1 of Jul. 19, 2016 (cited hereafter as “Rebentrost et al.”), a method for exponentiating non-sparse indefinite low-rank matrices on a quantum computer is proposed.
In the publication by A. Kitaev, “Quantum measurements and the Abelian Stabilizer Problem”, arXiv: quant-ph/9511026v1 of Nov. 20, 1995 (hereafter cited as “Kitaev”), a polynomial quantum algorithm for the Abelian stabilizer problem is proposed. The method is based on a procedure for measuring an eigenvalue of a unitary operator.
In the publication by V. Giovannetti et al., “Quantum random access memory”, arXiv: 0708.1879v2 of Mar. 26, 2008 (hereafter cited as “Giovannetti et al.”), a method for implementing a robust quantum random access memory, qRAM, algorithm is proposed.
In the publication by Ma et al., “Variational Quantum Circuit Model for Knowledge Graphs Embedding”, arXiv: 1903.00556v1 of Feb. 19, 2019 (hereafter cited as “Ma et al.”), variational quantum circuits for knowledge graph embedding and related methods are proposed.
A basic textbook about quantum information theory is the textbook by Nielsen et al., “Quantum computation and quantum information”, Cambridge University Press, ISBN 9780521635035, hereafter cited as “Nielsen et al.”.
It is an object of the present invention to provide an improved method of performing an inference task and an improved system for performing an inference task, in particular by utilizing the intrinsically parallel computing power of quantum computation and by providing quantum algorithms which can dramatically accelerate the inference task.
Thanks to the rapid development of quantum computing technologies, quantum machine learning is becoming an active research area which attracts researchers from different communities. In general, quantum machine learning exhibits great potential for accelerating classical algorithms.
Most quantum machine learning algorithms contain subroutines for singular value decomposition, singular value estimation and singular value projection of data matrices that are prepared and presented as quantum density matrices. However, unlike in the case of matrices, most tensor problems are NP-hard, and there is no existing quantum computation method which can handle tensorized data. Since knowledge graphs comprise at least triples of entities, and are thus modelled by at least three-dimensional tensors, such a quantum computation method is desired.
The above objectives are solved by the subject-matter of the independent claims. Advantageous options, refinements and variants are described in the dependent claims.
Therefore, the present invention provides, according to a first aspect, a computer-implemented method of performing an inference task on a knowledge graph comprising semantic triples of entities, wherein the entity types are subject, object and predicate, and wherein each semantic triple comprises one entity of each entity type, using a quantum computing device, wherein a first entity of a first type and a second entity of a second type are given and the inference task is to infer a third entity of a third type, comprising at least the steps of:
modelling the knowledge graph as a partially observed tensor χ̂ in a classical computer-readable memory structure;
providing a cutoff threshold τ;
creating, in a quantum random access memory of the quantum computing device, a density operator from the partially observed tensor χ̂ via a tensor contraction scheme;
preparing a unitary operator based on the density operator and applying it to a first entity state prepared from the partially observed tensor χ̂;
performing a quantum phase estimation on a clock register;
performing a computation to retrieve the singular values;
performing a quantum singular value projection based on the cutoff threshold τ and tracing out the clock register; and
measuring the resulting quantum state and post-selecting on the second entity in order to infer the third entity.
Therefore, in this work a quantum machine learning method on tensorized data is proposed, e.g., on data derived from large knowledge graphs. The presented tensor factorization method advantageously has a polylogarithmic runtime complexity.
Quantum machine learning on two-dimensional matrix data, such as a preference matrix in a recommendation system, may be performed in any known way, for example as has been described in “Kerenidis et al.”.
The partially observed tensor χ̂ may be interpreted as a sub-sampled (or: sparsified) tensor of a theoretically completely filled tensor χ which comprises the information about all semantic triples for all subjects, objects and predicates of given sets.
Providing the cutoff threshold τ may comprise calculating the cutoff threshold as will be described in the following.
Creating the density operator in the quantum random access memory may be performed in any known way, for example as has been described in “Giovannetti et al.”.
The quantum computing device may be implemented in any known way, for example as has been described in “Ma et al.”.
Preparing the unitary operator and applying it to the first entity state may be performed in any known way, for example as has been described in “Lloyd et al.” and/or “Rebentrost et al.”.
Performing the quantum phase estimation on the clock register may be performed in any known way, for example as has been described in “Kitaev”.
Performing the computation to retrieve the singular values may be performed in any known way, for example as has been described in “Nielsen et al.”.
The quantum singular value projection and/or the tracing out of the clock register may be performed in any known way, for example as has been described in “Rebentrost et al.”.
The method described herein is highly advantageous because it shows how a meaningful cutoff for a useful approximation of a partially known tensor can be achieved. In matrix cutoff schemes based on matrix singular value decomposition, essentially the singular values larger than, or equal to, a cutoff threshold are kept and those that are smaller are disregarded.
However, in the tensor case, negative singular values can arise. This ordinarily creates the problem that, in a normal descending ordering, singular values with large absolute values but negative sign would be arranged behind singular values with positive sign but small absolute values. The classical cutoff scheme is then no longer meaningful, as it would disregard singular values with large negative values which may potentially be important.
In the present invention, this issue is overcome by performing the method with the described steps so that the cutoff threshold is applied to the squares of the singular values, thus avoiding the above-discussed sign problem.
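The following small numerical sketch (with made-up singular values) illustrates why a cutoff on the squares is preferable to a naive descending sort when negative singular values occur:

```python
# Illustrative sketch: a cutoff applied to squared singular values keeps
# large-magnitude negative values that a naive descending sort would discard.
# The values below are made up for demonstration.
import numpy as np

singular_values = np.array([5.0, -4.8, 0.3, 0.1])  # negatives may arise for tensors
tau = 1.0

naive_keep = np.sort(singular_values)[::-1][:2]               # drops -4.8
squared_keep = singular_values[singular_values**2 >= tau**2]  # keeps 5.0 and -4.8

print(naive_keep)    # [5.  0.3]
print(squared_keep)  # [ 5.  -4.8]
```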
In some advantageous embodiments or refinements of embodiments, the partially observed tensor χ̂ is obtained such that each of its entries is, with a probability p, directly proportional to the corresponding entry of a complete tensor χ modelling a complete knowledge graph, and is equal to 0 with a probability of 1−p, with p being smaller than 1. This makes it possible to determine the cutoff threshold τ in a suitable way so that the required computing power and memory are minimized.
In some advantageous embodiments or refinements of embodiments, the cutoff threshold is chosen to be smaller than or equal to a quantity which is inversely proportional to the probability p. The inventors have found this to be a useful criterion for choosing a suitable cutoff threshold.
In some advantageous embodiments or refinements of embodiments, the probability p is chosen to be larger than or equal to a maximum value out of a set of values. That set may be designated as a “lower bound set”.
In some advantageous embodiments or refinements of embodiments, the set of values comprises at least a value of 0.22.
In some advantageous embodiments or refinements of embodiments, the partially observed tensor χ̂ is expressible as the sum of the complete tensor χ and a noise tensor N. A desired value ε̃>0 may be defined such that the Frobenius norm ∥⋅∥F of a rank-r approximation Nr of the noise tensor is bounded such that ∥Nr∥F ≤ ε̃∥χ∥F. The set of values comprises at least one value that is proportional to r and inversely proportional to ε̃ to the n-th power, with n integer and n≥1.
In some advantageous embodiments or refinements of embodiments, the set of values comprises at least one value that is proportional to r and that is inversely proportional to the square of ε̃.
In some advantageous embodiments or refinements of embodiments, the set of values comprises at least one value that is proportional to a square root of r and that is inversely proportional to ε̃.
In some advantageous embodiments or refinements of embodiments, the set of values comprises at least one value that is independent of r and that is inversely proportional to the square of ε̃.
The present invention also provides, according to a second aspect, a computing system comprising a classical computing device and a quantum computing device, wherein the computing system is configured to perform the method according to any embodiment of the method according to the first aspect of the present invention. The computing system comprises an input interface for receiving an inference task (or: query, i.e. a first entity of a first entity type and a second entity of a second entity type), and an output interface for outputting the inferred third entity of the third entity type.
The computing device may be realised as any device, or any means, for computing, in particular for executing software, an app, or an algorithm. For example, the computing device may comprise a central processing unit (CPU) and a memory operatively connected to the CPU. The computing device may also comprise an array of CPUs, an array of graphical processing units (GPUs), at least one application-specific integrated circuit (ASIC), at least one field-programmable gate array, or any combination of the foregoing.
Some, or even all, parts of the computing device may be implemented by a cloud computing platform.
A storage/memory may be a data storage like a magnetic storage/memory (e.g. magnetic-core memory, magnetic tape, magnetic card, magnetic stripe, magnetic bubble storage, drum storage, hard disc drive, floppy disc or removable storage), an optical storage/memory (e.g. holographic memory, optical tape, Tesa tape, Laserdisc, Phasewriter (Phasewriter Dual, PD), Compact Disc (CD), Digital Video Disc (DVD), High Definition DVD (HD DVD), Blu-ray Disc (BD) or Ultra Density Optical (UDO)), a magneto-optical storage/memory (e.g. MiniDisc or Magneto-Optical Disk (MO-Disk)), a volatile semiconductor/solid state memory (e.g. Random Access Memory (RAM), Dynamic RAM (DRAM) or Static RAM (SRAM)), a non-volatile semiconductor/solid state memory (e.g. Read Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically EPROM (EEPROM), Flash-EEPROM (e.g. USB stick), Ferroelectric RAM (FRAM), Magnetoresistive RAM (MRAM) or Phase-change RAM) or a data carrier/medium.
The invention will be explained in yet greater detail with reference to exemplary embodiments depicted in the drawings as appended.
The accompanying drawings are included to provide a further understanding of the present invention and are incorporated in and constitute a part of the specification. The drawings illustrate the embodiments of the present invention and together with the description serve to explain the principles of the invention. Other embodiments of the present invention and many of the intended advantages of the present invention will be readily appreciated as they become better understood by reference to the following detailed description. Like reference numerals designate corresponding similar parts.
The numbering of method steps is intended to facilitate understanding and should not be construed, unless explicitly stated otherwise, to mean that the designated steps have to be performed according to the numbering of their reference signs. In particular, several or even all of the method steps may be performed simultaneously, in an overlapping way or sequentially.
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. Generally, this application is intended to cover any adaptations or variations of the specific embodiments discussed herein.
This description contains two parts. The first part contributes to the classical binary tensor sparsification method. In particular, a first binary tensor sparsification condition is derived under which the original tensor can be well approximated by a truncated (or: projected) tensor SVD of its subsampled tensor.
The second part contributes to the method of performing knowledge graph inference on universal quantum computers. In order to handle the tensorized data, a quantum tensor contraction subroutine is described. Then, a quantum sampling method on knowledge graphs using quantum principal component analysis, quantum phase estimation and quantum singular value projection is described. The runtime complexity is analyzed, and it is shown that this sampling-based quantum computation method provides exponential acceleration with respect to the size of the knowledge graph during inference.
All state-of-the-art algorithms for statistical relational learning on knowledge graphs are implemented on classical computing hardware, e.g., CPUs or GPUs. The major difference between a classical approach and a quantum approach is that classical algorithms are learning-based, e.g., by back-propagating the gradients of a loss function, while the proposed quantum algorithm is sampling-based. The present method for implicit knowledge inference on knowledge graphs is implemented at least by measuring the quantum states returned by the quantum algorithm, without requiring any particular loss function or gradient update rules.
In the first part, the conditions are shown under which the classical tensor singular value decomposition (tSVD) can be applied to recover a subsampled tensor. These conditions ensure that the quantum counterpart is feasible and has good performance in comparison with benchmarking classical algorithms. The second part explains the method of implicit knowledge inference from tensorized data on universal quantum computers. Furthermore, the runtime complexity of the quantum method is analyzed.
As an overview, first some theoretical and practical foundations of and considerations for the method are described, and then the method according to an embodiment of the first aspect of the present invention is described in more detail. In addition, a computing system according to an embodiment of the second aspect of the present invention is also described.
Part 1: Classical Tensor Singular Value Decomposition
First, singular value decomposition (SVD) of matrices is described. Then, a tensor SVD is introduced and it is shown that a given tensor can be reconstructed with a small error from the low-rank tensor SVD of the subsampled tensor.
The singular value decomposition, SVD, can be defined as follows: let A ∈ ℝ^{m×n}; the SVD is a factorization of A of the form A=UΣVᵀ, where Σ is a rectangular diagonal matrix with the singular values on the diagonal, and U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} are orthogonal matrices with UᵀU=UUᵀ=Im and VᵀV=VVᵀ=In, wherein Im is an m×m identity matrix.
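As an illustration of this definition, the following short Python sketch numerically checks the factorization and the orthogonality relations using the standard NumPy SVD routine:

```python
# A short numerical check of the matrix SVD A = U Σ V^T (standard NumPy API).
import numpy as np

A = np.random.default_rng(0).normal(size=(4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: 4x4, s: 3 values, Vt: 3x3

Sigma = np.zeros((4, 3))                          # rectangular diagonal matrix
Sigma[:3, :3] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)             # A = U Σ V^T
assert np.allclose(U.T @ U, np.eye(4))            # U^T U = I_m
assert np.allclose(Vt @ Vt.T, np.eye(3))          # V^T V = I_n
```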
An N-way tensor is defined as 𝒜 = (a_{i1…iN}) ∈ ℝ^{d1×…×dN}. The Frobenius inner product of two N-way tensors is defined as ⟨𝒜, ℬ⟩F := Σ_{i1…iN} a_{i1…iN} b_{i1…iN}. The Frobenius norm is defined as ∥𝒜∥F := √⟨𝒜, 𝒜⟩F. The spectral norm ∥𝒜∥σ of the tensor 𝒜 is defined as
∥𝒜∥σ = max{𝒜 ⊗₁ x₁ … ⊗_N x_N | x_k ∈ S^{dk−1}, k = 1, …, N},
where the tensor-vector product is defined as
𝒜 ⊗₁ x₁ … ⊗_N x_N = Σ_{i1…iN} a_{i1…iN} (x₁)_{i1} … (x_N)_{iN},
and S^{dk−1} denotes the unit sphere in ℝ^{dk}.
In the following, a tensor singular value decomposition (tensor SVD) is described. In analogy to the matrix singular value decomposition, the tensor singular value decomposition is described in detail e.g. in “Chen et al.”.
Definition 1. If a tensor 𝒜 ∈ ℝ^{d1×…×dN} can be written as a sum of r rank-one terms,
𝒜 = Σ_{i=1}^{r} σi u1(i) ⊗ … ⊗ uN(i),
with σ1 ≥ σ2 ≥ … ≥ σr > 0 and, for each mode k, orthonormal vectors uk(1), …, uk(r), then this decomposition is called a tensor singular value decomposition of 𝒜 with rank r, and the σi are called the singular values of 𝒜.
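For illustration, the following Python sketch constructs a small 3-way tensor from an assumed rank-2 tensor SVD of the above form; the dimensions and singular values are chosen arbitrarily:

```python
# Sketch: building a 3-way tensor from an (assumed) tensor SVD of rank r,
# A = sum_i sigma_i u1_i ⊗ u2_i ⊗ u3_i, with orthonormal factor vectors.
import numpy as np

rng = np.random.default_rng(1)
d1, d2, d3, r = 5, 4, 6, 2

# Orthonormal columns via QR decomposition, one factor matrix per mode.
U1 = np.linalg.qr(rng.normal(size=(d1, r)))[0]
U2 = np.linalg.qr(rng.normal(size=(d2, r)))[0]
U3 = np.linalg.qr(rng.normal(size=(d3, r)))[0]
sigma = np.array([3.0, 1.5])                         # sigma_1 >= sigma_2 > 0

A = np.einsum("i,ai,bi,ci->abc", sigma, U1, U2, U3)  # sum of outer products
print(A.shape)  # (5, 4, 6)
```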
In real-world applications, only part of the non-zero entries of a given tensor 𝒜 can be observed, and the task is to infer unobserved non-zero entries with high probability. This corresponds to recommending items to users given an observed preference matrix, or to implicit knowledge inference given a partially observed relational database. In other words, a “partially observed” tensor representing a knowledge graph is only partially known, since not all semantic triples are known a priori, and the inference task is to infer interesting entities (subject, predicate, or object) of semantic triples which are not contained in the “partially observed” tensor but which would be obtained in a hypothetical complete tensor 𝒜. The partially observed tensor is herein also designated as sub-sampled or sparsified, denoted 𝒜̂. Particularly, without further specifying the dimensionality of the tensor, the following subsampling and rescaling scheme proposed in “Achlioptas et al.” is used: each entry â_{i1…iN} of 𝒜̂ is set to a_{i1…iN}/p with probability p, and to 0 with probability 1−p.
This means that the non-zero elements of a hypothetical complete tensor 𝒜 are independently and identically sampled with the probability p and rescaled afterwards. The sub-sampled tensor can be rewritten as 𝒜̂ = 𝒜 + 𝒩, where 𝒩 is a noise tensor. Entries of 𝒩 are thus independent random variables with distribution
n_{i1…iN} = a_{i1…iN}(1−p)/p with probability p, and n_{i1…iN} = −a_{i1…iN} with probability 1−p.
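The following Python sketch implements this subsampling and rescaling scheme; the tensor sizes and the probability p are illustrative only:

```python
# Sketch of the subsampling and rescaling scheme described above: each entry
# is kept and rescaled by 1/p with probability p, and set to 0 otherwise,
# so that the subsampled tensor is an unbiased estimate of the original one.
import numpy as np

def subsample(A: np.ndarray, p: float, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    mask = rng.random(A.shape) < p
    return np.where(mask, A / p, 0.0)

A = np.random.default_rng(2).integers(0, 2, size=(50, 40, 60)).astype(float)
A_hat = subsample(A, p=0.3)
N = A_hat - A                              # noise tensor
print(abs(A_hat.mean() - A.mean()))        # small: E[A_hat] = A entrywise
```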
In the following, a 3-dimensional semantic tensor χ is of particular interest as one example of such a tensor. The present method builds on the assumption that the original semantic tensor χ modelling the complete knowledge graph (or, from a different viewpoint, the tensor χ as the complete knowledge graph) has a low-rank approximation, denoted χr, with small rank r.
This is a plausible assumption if the knowledge graph contains global and well-defined relational patterns, as has been shown in “Nickel et al.”. Therefore, the question may be posed under what conditions the original tensor χ can be reconstructed approximately from the low-rank approximation of the subsampled semantic tensor χ̂ derived from the incomplete knowledge graph. In the following, χ̂r denotes the rank-r approximation of the subsampled tensor χ̂, and χ̂_{|⋅|>τ} denotes the projection of χ̂ onto the subspaces whose absolute singular values are larger than a predefined threshold τ>0.
Thus, with reference to the appended drawings, in a step S10, a knowledge graph is modelled as a partially observed tensor χ̂ in a classical computer-readable memory structure.
This classical computer-readable memory structure may be any volatile or non-volatile computer-readable memory structure such as a hard drive, a solid state drive and/or the like. The computer-readable memory structure may be part of a classical computing device which in turn may be part of a computing system for performing an inference task on a knowledge graph.
It shall be understood that the method that is being described with respect to the appended drawings is only one exemplary embodiment, and that variations as described in the foregoing and in the following are possible.
In the further description of the method, the following exemplary inference task is considered.
Suppose that we infer on a knowledge graph given the query (s, p, ?), i.e. subject s and predicate p are given, and the inference task at hand is to infer the object o which completes the semantic triple—or, in other words, which makes the semantic statement (s, p, o) true.
In the present context, the subjects are a first entity type which corresponds to a dimension of size d1 in χ and {circumflex over (χ)}; the predicates are a second entity type which corresponds to a dimension of size d2 in χ and {circumflex over (χ)}; and the objects are a third entity type which corresponds to a dimension of size d3 in χ and {circumflex over (χ)}.
Then, given an incomplete semantic triple (or: query, or: inference task) such as (s, p, ?), the running time for inferring the correct objects to the query scales, in classical systems, as O(d3). This is because the same algorithm is repeated at least d3 times in order to determine possible answers, leading to a huge waste of computing power, especially since the sizes of knowledge graphs are constantly growing.
Advantageously, only the top-n returns from the reconstructed tensor, written as χ̂_{sp1}, …, χ̂_{spn}, are read out, where n is a small integer corresponding to the commonly used Hits@n metric (see e.g. “Ma et al.”). The inference is called successful if the correct object corresponding to the query can be found in the returned list χ̂_{sp1}, …, χ̂_{spn}. It can be proven that the probability of a successful inference is high if the reconstruction error is small enough. Therefore, in the following, sub-sampling conditions are provided under which the reconstruction error is sufficiently small.
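A minimal sketch of this top-n readout, with made-up scores and a hypothetical helper hits_at_n, may look as follows:

```python
# Sketch of the Hits@n readout: given reconstructed scores for the query
# (s, p, ?), only the n highest-scoring objects are returned, and the
# inference counts as successful if the correct object is among them.
import numpy as np

def hits_at_n(scores_sp: np.ndarray, true_object: int, n: int = 3) -> bool:
    top_n = np.argsort(scores_sp)[::-1][:n]       # indices of top-n objects
    return true_object in top_n

scores = np.array([0.1, 0.7, 0.05, 0.6, 0.2])     # illustrative chi_hat[s, p, :]
print(hits_at_n(scores, true_object=3, n=2))      # True: object 3 is in top-2
```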
Without further specifying the dimensions of the tensor, let us consider a high-dimensional tensor 𝒜. Theorem 1 gives the condition for the subsample probability under which the original tensor 𝒜 can be reconstructed approximately from 𝒜̂r.
Theorem 1. Let 𝒜 ∈ {0, 1}^{d1×…×dN}, and let the subsample probability p be larger than or equal to the maximum value of the lower bound set discussed below, where N0 = log 3/2. Then the original tensor 𝒜 can be reconstructed from the truncated tensor SVD of the subsampled tensor 𝒜̂. The error satisfies ∥𝒜 − 𝒜̂r∥F ≤ ε∥𝒜∥F with probability at least 1−δ, where ε is a function of ε̃. Especially, ε̃ together with the sample probability controls the norm of the noise tensor.
In particular, it is desired that the Frobenius norm ∥⋅∥F of a rank-r approximation 𝒩r of the noise tensor is bounded such that ∥𝒩r∥F ≤ ε̃∥𝒜∥F.
Now, it is briefly discussed why the projected tensor 𝒜̂_{|⋅|>τ} is introduced, before describing the reconstruction error caused by it. Note that quantum algorithms are fundamentally different from classical algorithms. For example, classical algorithms for matrix factorization approximate a low-rank matrix by projecting it onto a subspace spanned by the eigenspaces possessing the top-r singular values with predefined small r. Quantum methods, e.g., quantum singular value estimation, on the other hand, can read and store all singular values of a unitary operator into a quantum register.
However, singular values stored in the quantum register cannot be read out and compared simultaneously, since the quantum state collapses after one measurement; measuring the singular values one by one would also break the quantum advantage. Therefore, we perform a projection onto the union of the operator's subspaces whose singular values are larger than a threshold; this step can be implemented on the quantum register without destroying the superposition. Moreover, since herein quantum principal component analysis is used as a subroutine, which ignores the sign of the singular values during the projection, the reconstruction error given by the projection 𝒜̂_{|⋅|≥τ} for the quantum algorithm may be analyzed.
The following Theorem 2 gives the condition under which 𝒜 can be reconstructed approximately from 𝒜̂_{|⋅|>τ}.
Theorem 2. Let 𝒜 ∈ {0, 1}^{d1×…×dN}, let the subsample probability p be larger than or equal to the maximum value of the lower bound set {0.22, p1, p2, p3} discussed below, wherein p<1, and let the projection threshold τ be chosen as described below, wherein
N0 = log 3/2, and l1 denotes the largest index of singular values of the tensor 𝒜̂ with σ_{l1} ≥ τ.
Then the original tensor 𝒜 can be reconstructed from the projected tensor SVD 𝒜̂_{|⋅|≥τ} of 𝒜̂. The error satisfies ∥𝒜 − 𝒜̂_{|⋅|≥τ}∥F ≤ ε∥𝒜∥F with probability at least 1−δ, where ε is a function of ε̃ and ε1. Especially, ε̃ together with p1 and p2 determines the norm of the noise tensor, and ε1 together with p3 controls the value of 𝒜̂'s singular values that are located outside the projection boundary.
Thus, as discussed above, the threshold τ is advantageously chosen to be smaller than or equal to a quantity which is inversely proportional to the probability p and/or smaller than or equal to a quantity which is inversely proportional to ε̃.
On the other hand, the probability p is advantageously chosen to be larger than or equal to a maximum value out of a set of values, which set of values in the foregoing example comprises four values: 0.22, p1, p2, and p3. In other words, p will always be larger than or equal to at least 0.22. Instead of 0.22, another value in the range of 0.2 to 0.24 may be chosen. However, experiments by the inventors have shown that 0.22 is an ideal value in order to ensure desirable properties for the threshold τ. Specifically, experiments as well as a numerical proof by the inventors have shown that 0.22 is the minimal subsample probability at which a subsampled tensor can be reconstructed with a bounded error.
Thus, the set of values comprises (see also the sketch following this list):
a) at least one value in the range of between 0.2 and 0.24, preferably between 0.21 and 0.23, most preferably of 0.22;
b) at least one value p2 that is proportional to r and inversely proportional to ε̃ to the n-th power, with n integer and n≥1, in particular proportional to r and inversely proportional to the square of ε̃;
c) at least one value p3 that is proportional to a square root of r and that is inversely proportional to ε̃;
and/or
d) at least one value p1 that is independent of r and that is inversely proportional to the square of ε̃.
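The following hedged Python sketch illustrates the selection logic only; the exact expressions for p1, p2 and p3, and the proportionality constant of the threshold bound, are not reproduced here and enter as assumed inputs:

```python
# Hedged sketch of the selection logic described above. The exact expressions
# for p1, p2 and p3 are not reproduced here; they enter as values assumed to
# have been computed elsewhere. Only the max-rule for p and the assumed
# inverse proportionality of the threshold bound are illustrated.

def minimal_subsample_probability(p1: float, p2: float, p3: float) -> float:
    # p must be larger than or equal to the maximum of {0.22, p1, p2, p3}.
    return max(0.22, p1, p2, p3)

def cutoff_threshold_bound(p: float, eps_tilde: float, c: float = 1.0) -> float:
    # Assumed form of an upper bound for tau: inversely proportional to p
    # and to eps_tilde; the constant c is a placeholder, not from the patent.
    return c / (p * eps_tilde)

p = minimal_subsample_probability(p1=0.1, p2=0.25, p3=0.18)
print(p)                                # 0.25
print(cutoff_threshold_bound(p, 0.5))   # 8.0
```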
In the statements of Theorems 1 and 2 there exist data-dependent parameters r and l1 which are unknown a priori. These parameters can be estimated by performing the tensor SVD on the original and subsampled tensors explicitly. However, in practice, mostly only the subsampled tensor is given, without the subsample probability being known. For example, given an incomplete semantic tensor, it is usually not known what percentage of information is missing, and therefore the entries of the incomplete tensor cannot easily be rescaled. Fortunately, unlike the prior art, the present invention provides a rational initial guess for the subsample probability numerically, and from it in turn an initial guess for the low rank r and the projection threshold τ as well.
Part 2: Inference on Knowledge Graphs Using Quantum Computers
Quantum Mechanics
For ease of understanding, the Dirac notation of quantum mechanics as used herein is briefly described. Under Dirac's convention, quantum states can be represented as complex-valued vectors in a Hilbert space ℋ. For example, a two-dimensional complex Hilbert space ℂ² can describe the quantum state of a spin-1/2 particle, which provides the physical realization of a qubit.
By default, the basis states in ℂ² for a spin-1/2 qubit read |0⟩ = [1, 0]ᵀ and |1⟩ = [0, 1]ᵀ. The Hilbert space of an n-qubit system has dimension 2ⁿ, and its computational basis can be chosen as the canonical basis |i⟩ ∈ {|0⟩, |1⟩}^⊗n, where ⊗ represents the tensor product. Hence any quantum state |ϕ⟩ ∈ ℂ^{2ⁿ} can be written as the superposition |ϕ⟩ = Σi ϕi|i⟩ with Σi |ϕi|² = 1,
wherein the squared coefficients |ϕi|² can also be interpreted as the probability of observing the canonical basis state |i⟩ after measuring |ϕ⟩ in the canonical basis.
Moreover, ⟨ϕ| is used to represent the conjugate transpose of |ϕ⟩, i.e., (|ϕ⟩)† = ⟨ϕ|. Given two states |ϕ⟩ and |ψ⟩, the inner product on the Hilbert space satisfies ⟨ϕ|ψ⟩* = ⟨ψ|ϕ⟩. A density matrix is a positive semidefinite operator with unit trace which is used to describe the statistics of a quantum system. For example, the density operator of the mixed state obtained from |ϕ⟩ in the canonical basis reads ρ = Σ_{i=1}^{2ⁿ} |ϕi|² |i⟩⟨i|.
The time evolution of a quantum state is generated by the Hamiltonian of the system. The Hamiltonian H is a Hermitian operator with H† = H. Let |ϕ(t)⟩ denote the quantum state at time t under the evolution of a time-invariant Hamiltonian H. Then, according to the Schrödinger equation, |ϕ(t)⟩ = e^{−iHt}|ϕ(0)⟩,
where the unitary operator e^{−iHt} can be written as the matrix exponentiation of the Hermitian matrix H, i.e.,
e^{−iHt} = Σ_{n=0}^{∞} (−iHt)ⁿ/n!.
Eigenvectors of the Hamiltonian H, denoted |ui⟩, also form a basis of the Hilbert space. The spectral decomposition of the Hamiltonian H then reads H = Σi λi|ui⟩⟨ui|, where λi is the eigenvalue or the energy level of the system. Therefore, the evolution operator of a time-invariant Hamiltonian can be rewritten as
e^{−iHt} = Σi e^{−iλit}|ui⟩⟨ui|,
where the observation (|ui⟩⟨ui|)ⁿ = |ui⟩⟨ui| for n = 1, …, ∞ is used.
When applying it to an arbitrary initial state |ϕ(0)⟩, we obtain |ϕ(t)⟩ = e^{−iHt}|ϕ(0)⟩ = Σi e^{−iλit}⟨ui|ϕ(0)⟩|ui⟩.
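This spectral form of the time evolution can be checked numerically; the following sketch (using NumPy and SciPy on a random 4×4 Hermitian matrix) verifies that both expressions for |ϕ(t)⟩ coincide:

```python
# Numerical check of the spectral form of the time evolution: e^{-iHt}|phi(0)>
# equals sum_i e^{-i lambda_i t} <u_i|phi(0)> |u_i> for a Hermitian H.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (M + M.conj().T) / 2                     # Hermitian Hamiltonian
t = 0.7

phi0 = rng.normal(size=4) + 1j * rng.normal(size=4)
phi0 /= np.linalg.norm(phi0)                 # normalized initial state

lam, U = np.linalg.eigh(H)                   # eigenvalues and eigenvectors
direct = expm(-1j * H * t) @ phi0
spectral = U @ (np.exp(-1j * lam * t) * (U.conj().T @ phi0))

assert np.allclose(direct, spectral)
```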
The present invention concerns a method for the inference on knowledge graphs using a quantum computing device 200. In the following, we focus on the binary semantic tensor χ ∈ {0, 1}^{d1×d2×d3} modelling the knowledge graph, and on its sub-sampled counterpart χ̂.
Since knowledge graphs contain global relational patterns, χ can be approximated by a lower-rank tensor χr thereof, reconstructed approximately from χ̂ via tensor SVD according to Theorems 1 and 2. Since our quantum method is sampling-based instead of learning-based, without loss of generality we consider sampling the correct objects given the query (s, p, ?) as an example and discuss the runtime complexity of one inference. Herein, we therefore designate the given subject as a first entity of a first entity type (“subjects”), the predicate as a second entity of a second entity type (“predicates”) and the unknown object as a third entity of a third entity type (“objects”).
The preference matrix of a recommendation system normally contains multiple nonzero entries in a given user-row; item recommendations are made according to the nonzero entries in the user-row by assuming that the user is ‘typical’. However, in a knowledge graph there might be only one nonzero entry in the row (s, p, ⋅). Therefore, advantageously, for the inference on a knowledge graph, the quantum algorithm samples triples with the given subject s and then post-selects on the predicate p. This is a feasible step especially if the number of semantic triples with s as subject and p as predicate is O(1).
The present method contains the preparation and exponentiation of a density matrix derived from the tensorized classical data. One of the challenges of quantum machine learning is loading classical data as quantum states and measuring the states, since reading or writing high-dimensional data from quantum states might obliterate the quantum acceleration. Therefore, the technique of quantum Random Access Memory (qRAM) was developed (see “Giovannetti et al.”), which can load classical data into quantum states with exponential acceleration. For details about the qRAM technique, reference is made to “Giovannetti et al.”. The basic idea of the present method is to project the observed data onto the eigenspaces of χ̂ whose corresponding singular values have an absolute value larger than a threshold τ. Therefore, an operator needs to be created which can reveal the eigenspaces and singular values of χ̂.
As mentioned in the foregoing, in a step S10, a knowledge graph is modelled as a partially observed tensor χ̂ in a classical computer-readable memory structure 110 of a classical computing device 100, see the appended drawings.
In a step S12, which does not necessarily have to be performed in this order, a cutoff threshold τ is provided, which is preferably determined as has been described in the foregoing.
In a step S20, the following density operator (or: density matrix) is created, on the quantum computing device 200, from χ̂ via a tensor contraction scheme:
ρ_χ̂ = χ̂(1)ᵀ χ̂(1),
where χ̂(1) denotes the flattening of χ̂ along the first dimension, so that the product
means tensor contraction along the first dimension (here: the subject dimension, since the exemplary inference task is (s, p, ?)); a normalization factor is neglected temporarily.
Especially, ρ_χ̂ can be prepared in time
O(polylog(d1d2d3)) (6)
in the following way: First, the quantum state
|χ̂⟩ = (1/∥χ̂∥F) Σ_{i1i2i3} χ̂_{i1i2i3} |i1⟩⊗|i2⟩⊗|i3⟩
is prepared via qRAM, which can be implemented in time O(polylog(d1d2d3)), where |i1⟩⊗|i2⟩⊗|i3⟩ represents the tensor product of index registers in the canonical basis.
The corresponding density matrix of the quantum state reads |χ̂⟩⟨χ̂|.
After preparation, a partial trace implemented on the first index register of the density matrix
|χ̂⟩⟨χ̂|
gives the desired operator ρ_χ̂ up to normalization, i.e., ρ_χ̂ ∝ Tr₁(|χ̂⟩⟨χ̂|).
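Classically, this contraction and its partial-trace interpretation can be emulated as follows; the sketch below (with an arbitrary random binary tensor) assumes the mode-1 unfolding form of ρ_χ̂ described above:

```python
# Classical sketch of the tensor contraction in step S20: the density operator
# rho is obtained from the mode-1 unfolding X1 of chi_hat as X1^T X1
# (contraction along the subject dimension), normalized to unit trace.
# Equivalently, it is the partial trace over the first index register of the
# pure state |chi_hat><chi_hat|.
import numpy as np

rng = np.random.default_rng(4)
d1, d2, d3 = 5, 3, 4
chi_hat = rng.integers(0, 2, size=(d1, d2, d3)).astype(float)

X1 = chi_hat.reshape(d1, d2 * d3)             # flattening along first dimension
rho = X1.T @ X1                               # contraction over the subject index
rho /= np.trace(rho)                          # normalization

# Cross-check against the partial trace of the pure state |chi_hat><chi_hat|.
psi = chi_hat.reshape(-1) / np.linalg.norm(chi_hat)
full = np.outer(psi, psi).reshape(d1, d2 * d3, d1, d2 * d3)
rho_pt = np.einsum("iaib->ab", full)          # trace out the first register
assert np.allclose(rho, rho_pt)
```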
Suppose that χ̂ has a tensor SVD approximation with
χ̂ ≈ Σ_{i=1}^{R} σi u1(i) ⊗ u2(i) ⊗ u3(i).
Then the spectral decomposition of the density operator can be written as
ρ_χ̂ = Σ_{i=1}^{R} σi² (|u2(i)⟩⊗|u3(i)⟩)(⟨u2(i)|⊗⟨u3(i)|).
Especially, the eigenstates |u2(i)⟩⊗|u3(i)⟩ of ρ_χ̂ have the squared singular values σi² as their eigenvalues.
Then, the singular values of ρ_χ̂ need to be read out.
In order to write the singular values into a quantum register, in a step S30 the unitary operator
U = Σ_{k=0}^{K−1} |kΔt⟩⟨kΔt|_C ⊗ e^{−ikΔt ρ̃_χ̂}
is prepared, which combines a clock register C, initialized in the maximally mixed state
(1/K) Σ_{k=0}^{K−1} |kΔt⟩⟨kΔt|_C,
with the exponentiation of the rescaled density matrix
ρ̃_χ̂ = ρ_χ̂ / Tr(ρ_χ̂).
Especially, the clock register C is needed for the phase estimation and Δt determines the precision of the estimated singular values.
Recall that the query on the knowledge graph is (s, p, ?) and that the present method should return triples with subject s. Hence, in a step S40, the quantum state |χ̂s(1)⟩_I is created (or: generated) via qRAM in an input data register I, where χ̂s(1) denotes the s-row of the flattened tensor χ̂ along the first dimension.
After the preparation S40 of the quantum state |χ̂s(1)⟩_I, in a step S50 the prepared unitary operator U is applied onto the clock register C and the input data register I.
Implementing the unitary operator U is nontrivial, since the exponent ρ̃_χ̂ is a non-sparse low-rank matrix. However, using the density matrix exponentiation proposed in “Lloyd et al.” and “Rebentrost et al.”, the operator e^{−iρ̃_χ̂ t}
can be applied to any quantum state up to an arbitrary simulation time t. The total number of steps for the simulation is
O(t² ε⁻¹ T_ρ̃), (7)
where ε is the desired accuracy, and T_ρ̃ is the time for accessing the density matrix ρ̃_χ̂. Hence, the unitary operator U can be applied to any quantum state given simulation time t in
O(t² ε⁻¹ polylog(d1d2d3))
steps on quantum computers.
After applying the unitary operator U onto the joint state of the clock register C and the input register state
|χ̂s(1)⟩⟨χ̂s(1)|_I,
we have the following quantum state (written, for readability, with the clock register in a uniform superposition):
(1/√K) Σ_{k=0}^{K−1} Σ_{i=1}^{R} βi e^{−ikΔtλi} |kΔt⟩_C ⊗ |u2(i)⟩⊗|u3(i)⟩_I,
where
λi = σi² / (Σ_{j=1}^{R} σj²)
are the rescaled singular values of ρ̃_χ̂ and βi = (⟨u2(i)|⊗⟨u3(i)|)|χ̂s(1)⟩ are the overlaps with the input state.
In a step S60, a quantum phase estimation on the clock register C is performed, preferably using the quantum phase estimation algorithm proposed in “Kitaev”. The resulting state after phase estimation reads Σ_{i=1}^{R} βi|λi⟩_C ⊗ |u2(i)⟩_I ⊗ |u3(i)⟩_I, where the λi are now stored in the clock register C and the βi are the overlaps defined above.
In fact, it can be shown that the probability amplitude of measuring the register C is maximized when the measured value equals the nearest-integer approximation └KΔtλi/(2π)┐ of the rescaled eigenvalue, where └⋅┐ represents the nearest integer. Therefore, the small time step Δt determines the accuracy of the quantum phase estimation. We may choose Δt inversely proportional to the desired accuracy, and the total run time is then polylogarithmic in d1d2d3 and polynomial in the inverse accuracy, according to Eq. (6) and Eq. (7).
In a step S70, a computation on the clock register C is performed to recover the original singular values σi² of ρ_χ̂.
For example, in this step S70, the λi stored in the clock register C may be transformed into the σi², λi being a function of σi². The threshold operations discussed in the following as applied to the σi² may therefore also, in an alternative formulation, be applied to the λi, with the threshold τ being appropriately rescaled.
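A classical analogue of this rescaling step, assuming that λi = σi²/Tr(ρ_χ̂) as described above, may be sketched as follows:

```python
# Classical analogue of step S70: the eigenvalues lambda_i of the normalized
# operator rho are rescaled back to the squared singular values sigma_i^2 of
# the unnormalized contraction, using the known normalization factor.
import numpy as np

rng = np.random.default_rng(5)
chi_hat = rng.integers(0, 2, size=(5, 3, 4)).astype(float)
X1 = chi_hat.reshape(5, 12)

rho_unnorm = X1.T @ X1
trace = np.trace(rho_unnorm)                  # equals ||chi_hat||_F^2
lam = np.linalg.eigvalsh(rho_unnorm / trace)  # eigenvalues of normalized rho

sigma_squared = lam * trace                   # lambda_i is sigma_i^2 / trace
s = np.linalg.svd(X1, compute_uv=False)       # singular values of the unfolding
assert np.allclose(np.sort(sigma_squared)[::-1][:len(s)], s**2)
```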
In a step S90, a quantum singular value projection on the quantum state obtained from the last step S70 is performed. Notice that, classically, this step corresponds to projecting χ̂ onto the subspace χ̂_{|⋅|≥τ}. In this way, observed entries will be smoothed and unobserved entries will be boosted, from which unobserved triples (s, p, ?) in the test dataset can be inferred (see Theorem 2).
Quantum singular value projection given the threshold τ>0 can be implemented in the following way. Therefore, in a step S80, a new auxiliary register R is created on the quantum computing device 200 using an auxiliary qubit, and
a unitary operation is applied that maps |σi²⟩_C⊗|0⟩_R to |σi²⟩_C⊗|1⟩_R only if σi² < τ²; otherwise |0⟩_R remains unchanged. This step of projection gives the state
Σ_{i: σi²≥τ²} βi|σi²⟩_C ⊗ |u2(i)⟩⊗|u3(i)⟩_I ⊗ |0⟩_R + Σ_{i: σi²<τ²} βi|σi²⟩_C ⊗ |u2(i)⟩⊗|u3(i)⟩_I ⊗ |1⟩_R.
In other words, the step S90 means performing, on the result of the computation S70 to recover the singular values, a singular value projection conditioned on the state of the auxiliary register R, such that eigenstates whose squared singular values are to one side of the squared cutoff threshold τ² (here: smaller than τ²) are entangled with a first eigenstate |1⟩_R of the auxiliary register R, and such that eigenstates whose squared singular values are to the other side of the squared cutoff threshold τ² (here: larger than τ²), or equal to it, are entangled with the second eigenstate |0⟩_R of the auxiliary register R.
One of the major advantages here is that the decisions are not based on the individual singular values but only on their squares. This means that possible negative singular values, which may occur in the case of tensors (unlike in the case of matrices), do not have any negative impact on the present method.
In a step S100, the new register R is measured and post-selected on the state |0⟩_R. This gives the projected state (up to normalization)
Σ_{i: σi²≥τ²} βi|σi²⟩_C ⊗ |u2(i)⟩⊗|u3(i)⟩_I.
In a step S110, the clock register C is traced out, such that, on the input register I, a state supported only on the eigenstates |u2(i)⟩⊗|u3(i)⟩ with σi² ≥ τ² is obtained, corresponding to the projection of |χ̂s(1)⟩ onto the union of these eigenspaces.
The tracing out may be performed e.g. as has been described in “Nielsen et al.”.
In a step S120, the resulting quantum state from the last step S110 is measured in the canonical basis of the input register I to get the triples with subject s.
In a step S130, they are post-selected on the predicate p. This will return objects to the inference (s, p, ?) after a number of steps that is polylogarithmic in d1d2d3.
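The overall sampling-based inference may be emulated classically for small examples; the following sketch (with illustrative sizes and threshold, and without any quantum speed-up) mirrors steps S40 to S130 by projecting χ̂s(1) onto the eigenspaces of ρ_χ̂ whose squared singular values reach τ²:

```python
# Classical end-to-end emulation of the sampling-based inference (steps
# S40-S130): project the subject row chi_hat_s(1) onto the eigenspaces of rho
# whose squared singular values reach the threshold tau^2, then post-select
# on the predicate p and read off object scores. Sizes and tau are illustrative.
import numpy as np

rng = np.random.default_rng(6)
d1, d2, d3, tau = 6, 3, 5, 1.0
chi_hat = rng.integers(0, 2, size=(d1, d2, d3)).astype(float)

X1 = chi_hat.reshape(d1, d2 * d3)
lam, V = np.linalg.eigh(X1.T @ X1)             # sigma_i^2 and eigenvectors of rho
keep = lam >= tau**2                           # squared-threshold projection

s_idx, p_idx = 0, 1                            # the query (s, p, ?)
x_s = X1[s_idx]                                # row chi_hat_s(1)
projected = V[:, keep] @ (V[:, keep].T @ x_s)  # projection onto kept eigenspaces

scores = projected.reshape(d2, d3)[p_idx]      # post-select on predicate p
print(np.argsort(scores)[::-1])                # object candidates, best first
```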
The quantum algorithm is also summarized in the following table, Algorithm 1.
One of the main advantages of the present invention is that a method for implementing implicit knowledge inference from tensorized data, e.g., relational databases such as knowledge graphs, on quantum computing devices is proposed.
The present method shows that knowledge inference from tensorized data can be implemented with exponential acceleration on quantum computing devices. As has been shown, this is much faster, and thus less resource-consuming, than classical methods.
We also test the classical part of our method, namely the tensor singular value decomposition, on classical devices, since due to technical challenges current quantum devices only have a few universal physical qubits. The simulation results are comparable to those of other benchmarking algorithms, which supports the expected performance of implementing the quantum tSVD on future quantum computers.
The acceleration is given by the intrinsic parallel computing of quantum computing devices as described in the foregoing which, however, is only made applicable by the specific technical implementation of the present invention.
In some sense, the present method is based on finding the corresponding quantum counterpart of classical tensor singular value decomposition method. To show that tensor singular value decomposition has comparable performance with other classical algorithms, the present method is verified by investigating the performance of classical tensor SVD on benchmark datasets: Kinship and FB15k-237, see e.g. the scientific publication by Kristina Toutanova and Danqi Chen, “Observed versus latent features for knowledge base and text inference.”, in: Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57-66, 2015.
Given a semantic triple (s, p, o), the value function of the tSVD is defined, in accordance with the tensor SVD factorization, as the trilinear form
η(s, p, o) = Σ_{i=1}^{r} σi (us)_i (up)_i (uo)_i,
where us, up, uo are vector representations of s, p, o, respectively. The tSVD is trained by minimizing an objective function on this value function, augmented by a penalty term weighted by a hyper-parameter γ, via stochastic gradient descent. The hyper-parameter γ is used to encourage the orthonormality of the embedding matrices for subjects, predicates, and objects.
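A hedged sketch of such a value function and orthonormality penalty is given below; the exact loss used for training is not reproduced here, and the trilinear form of the score is an assumption implied by the tensor SVD factorization:

```python
# Hedged sketch of a tSVD-style value function and orthonormality penalty.
# Assumed: the trilinear score eta(s,p,o) = sum_i sigma_i (u_s)_i (u_p)_i (u_o)_i
# implied by the tensor SVD, and a gamma-weighted penalty ||U^T U - I||_F^2.
import numpy as np

def value(sigma, U1, U2, U3, s, p, o):
    # Trilinear score of a triple under the tensor SVD factorization.
    return np.sum(sigma * U1[s] * U2[p] * U3[o])

def orthonormality_penalty(U):
    # ||U^T U - I||_F^2, encouraging orthonormal embedding matrices.
    r = U.shape[1]
    return np.linalg.norm(U.T @ U - np.eye(r)) ** 2

rng = np.random.default_rng(7)
r, (d1, d2, d3) = 4, (10, 5, 12)
sigma = rng.random(r)
U1, U2, U3 = (rng.normal(size=(d, r)) for d in (d1, d2, d3))

gamma = 0.1
penalty = gamma * sum(orthonormality_penalty(U) for U in (U1, U2, U3))
print(value(sigma, U1, U2, U3, s=0, p=1, o=2), penalty)
```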
In the following Table 1, the performance of the tensor SVD model is compared with that of other benchmark models, e.g., RESCAL (proposed in Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel, “A three-way model for collective learning on multi-relational data”, in: ICML, volume 11, pages 809-816, 2011), Tucker (L. R. Tucker, “Some mathematical notes on three-mode factor analysis”, Psychometrika, September 1966, Vol. 31, Issue 3, pp. 279-311), and ComplEx (Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard, “Complex embeddings for simple link prediction”, in: International Conference on Machine Learning, pages 2071-2080, 2016).
Noisy intermediate-scale quantum processing units (or: quantum computing devices) are expected to be commercially available in the near future. With the help of these quantum computing devices and the present method, learning and inference on the ever-increasing industrial knowledge graphs can be dramatically accelerated compared to conventional computers.
In short, the invention provides a computer-implemented method of performing an inference task on a knowledge graph comprising semantic triples of entities, wherein entity types are subject, object and predicate, and wherein each semantic triple comprises one of each entity type, using a quantum computing device, wherein a first entity of a first type and a second entity of a second type are given and the inference task is to infer a third entity of the third type.
By performing specific steps and choosing values according to specific prescriptions, an efficient and resource-saving method is developed that utilizes the power of quantum computing systems for inference tasks on large knowledge graphs. In particular, an advantageous value for a cutoff threshold based on the singular values of a tensor singular value decomposition is prescribed, and a sequence of steps is developed in which only the squares of the singular values are of consequence and their signs are not.
Number | Name | Date | Kind |
---|---|---|---|
20180137155 | Majumdar | May 2018 | A1 |
20200242444 | Zhang | Jul 2020 | A1 |
Entry |
---|
Ciliberto, “Quantum machine learning: a classical perspective”, Proc. R. Soc. A 474: 20170551, 2018. (Year: 2018). |
Ma, “Quantum Machine Learning on Knowledge Graphs”, 32nd Conference on Neural Information Processing Systems (NIPS Dec. 2018), Montréal, Canada. (Exception as prior art under 102(b)(1)(A)—listed for completeness of record). (Year: 2018). |
Ma, “Quantum Machine Learning Algorithm for Knowledge Graphs” 2020. (Date and authors precludes as prior art usage—listed for completeness of record and showing the NPL research of the application). (Year: 2020). |
Ma et al., “Variational Quantum Circuit Model for Knowledge Graphs Embedding”, arXiv: 1903.00556v1 of Feb. 19, 2019. |
Maximilian Nickel et al., “A review of relational machine learning for knowledge graphs”, Proceedings of the IEEE, 104(1):11-33, 2016. |
I. Kerenidis et al., “Quantum Recommendation Systems”, arXiv: 1603.08675v3 of Sep. 22, 2016. |
A. Kitaev, “Quantum measurements and the Abelian Stabilizer Problem”, arXiv: quant-ph/9511026v1 of Nov. 20, 1995. |
Nielsen et al., “Quantum computation and quantum information”, Cambridge University Press, ISBN 9780521635035. |
Dimitris Achlioptas and Frank McSherry, “Fast computation of low-rank matrix approximations”, Journal of the ACM (JACM), 54(2):9, 2007. |
S. Lloyd et al., “Quantum principal component analysis”, arXiv: 1307.0401v2 of Sep. 16, 2013. |
Jie Chen and Yousef Saad, “On the tensor svd and the optimal low rank orthogonal approximation of tensors”, SIAM Journal on Matrix Analysis and Applications, 30(4):1709-1734, 2009. |
P. Rebentrost, “Quantum singular value decomposition of non-sparse low-rank matrices”, arXiv: 1607.05404v1 of Jul. 19, 2016. |
V. Giovannetti et al., “Quantum random access memory”, arXiv: 0708.1879v2 of Mar. 26, 2008. |
Number | Date | Country | |
---|---|---|---|
20200364599 A1 | Nov 2020 | US |