QUANTUM PROCESSING OF PROBABILISTIC NUMERIC CONVOLUTIONAL NEURAL NETWORKS

INTRODUCTION

Aspects of the present disclosure relate to quantum processing in machine learning.

Quantum computing promises to unlock previously inconceivable computing performance for various types of processing tasks. For example, quantum computing can perform extremely fast and power efficient matrix multiplication operations. Thus, quantum computing is an excellent candidate for computationally intensive tasks, like machine learning.

Unfortunately, applying quantum computing to conventional machine learning presents many challenges, and thus various state of the art machine learning architectures have not been successfully implemented on quantum computing hardware.

Accordingly, what is needed are techniques for implementing machine learning architectures in quantum computing hardware, such as optical quantum computing hardware.

BRIEF SUMMARY

Certain aspects provide a method, comprising: performing a probabilistic convolution operation with an optical quantum computer, wherein input signals to the probabilistic convolution operation are encoded in light beams.

Other aspects provide a method, comprising simulating a quantum probabilistic convolution operation using a non-quantum processing system.

Other aspects provide processing systems configured to perform the aforementioned methods as well as those described herein; non-transitory, computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned methods as well as those described herein; a computer program product embodied on a computer readable storage medium comprising code for performing the aforementioned methods as well as those further described herein; and a processing system comprising means for performing the aforementioned methods as well as those further described herein.

The following description and the related drawings set forth in detail certain illustrative features of one or more embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

The appended figures depict certain aspects of the one or more embodiments and are therefore not to be considered limiting of the scope of this disclosure.

FIG. 1 depicts an example covariance ellipse representing a quantum blob.

FIG. 2 depicts an example method for performing quantum probabilistic convolution.

FIG. 3 depicts an example processing flow using quantum optical processing equipment.

FIG. 4A depicts an example of a quantum processing system.

FIG. 4B depicts an example processing system for performing simulation of quantum processing of probabilistic numeric convolutional neural networks on non-quantum computing hardware.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.

DETAILED DESCRIPTION

Aspects of the present disclosure provide apparatuses, methods, processing systems, and computer-readable media for quantum processing of probabilistic numeric convolutional neural networks, such as by quantum computers. Further aspects relate to simulating quantum processing of probabilistic numeric convolutional neural networks on non-quantum processing systems.

In particular, aspects described herein apply quantum field theory to machine learning by encoding input signals to Gaussian states (a generalization of Gaussian processes that encodes the uncertainty about an input signal), representing linear and nonlinear layers of a machine learning model as unitary quantum gates, and interpreting the fundamental excitations of the quantum model as particles. Thus, aspects described herein provide a quantum field interpretation of classical probabilistic neural network model architectures, which beneficially allows for processing using quantum processing hardware, such as optical quantum processing systems.

Beneficially, the use of Gaussian states instead of conventional Gaussian processes allows for encoding and accounting for uncertainty due to discretization and sampling errors in, for example, light used by an optical quantum computer. Further, the implementation of nonlinear functions using unitary quantum gates that leverage quantum effects significantly reduces required processing resources compared to conventional methods.

Further, aspects described herein enable a quantum implementation and generalization of a probabilistic numeric convolutional neural network.

Aspects described herein may be particularly beneficial for irregularly sampled data, such as time-series data, superpixel image data, and the like.

Probabilistic Numeric Convolutional Neural Networks

Probabilistic numeric convolutional neural networks (PNCNNs) may be used for classifying an input signal with missing data. Such models generally use a Gaussian process on a continuous space X=ℝ^D to interpolate the input signal, and define the convolutional neural network based on the Gaussian process representation.

Consider a finite dimensional input space wherein the input to a PNCNN is D={(x_i, y_i)}_{i=1}^N, which corresponds to the observations y_i of an input signal (field) φ:X→ℝ at locations x_i∈X. The underlying finite input grid is X={1, . . . , l}^{×d}, where d is the dimensionality of the input data (e.g., d=1 for time series data, d=2 for image data, etc.) and ×d indicates the dimensionality of the grid X. The signal on this grid may be denoted by φ, a vector with components φ_x. Further, a prior Gaussian process (GP) with mean equal to zero and kernel k may be used to compute a posterior GP (μ′, k′) to interpolate the input signal as follows:

$\begin{matrix} μ^{'}_{x} = k_{x}^{T} A^{- 1} y, \quad k^{'}_{x, x^{'}} = k_{x, x^{'}} - k_{x}^{T} A^{- 1} k_{x^{'}} & (1) \end{matrix}$

where k_x={k_{x,x_i}}_{i=1}^N, A_{i,j}=k_{x_i,x_j}+δ_{i,j}σ_n², and σ_n² is the variance of the measurement noise. A sequence of convolutional layers and nonlinearities may then be applied on the input random vector φ⁽⁰⁾∼GP(μ′, k′) according to:

$\begin{matrix} φ^{(ℓ + 1)} = σ (B^{(ℓ)} (e^{\mathcal{D}^{(ℓ)}} φ^{(ℓ)})) & (2) \end{matrix}$

where φ^(ℓ+1) is the input to the next layer in a convolutional neural network. In the intermediate layers, φ has shape N_C×|X| and components φ_{a,x}, with N_C being the number of channels and |X| being the number of points in the set X. Further, σ is a pointwise nonlinearity, B^(ℓ) adds a bias, and e^{𝒟^(ℓ)} is linear, with:

$\begin{matrix} \mathcal{D}^{(ℓ)} = \sum_{k} W_{k}^{(ℓ)} D_{k} & (3) \end{matrix}$

where [W_k^(ℓ)]^{ab} is a matrix of parameters, a, b index the channels, and

$\begin{matrix} D_{k} = \sum_{i_{1}, \dots, i_{d} \geq 0} α_{i_{1}, \dots, i_{d}} \partial_{1}^{i_{1}} \cdots \partial_{d}^{i_{d}} & (4) \end{matrix}$

where the unit vector in direction μ∈{1, . . . , d} is denoted by e_μ, (∂_μφ)_x=φ_{x+e_μ}−φ_x, and α_{i_1, . . . , i_d} is constant. The operator e^{𝒟^(ℓ)}, with 𝒟^(ℓ) as in Equation 3, above, is the formal solution to an ordinary differential equation that represents the convolution operation.

After L transformations according to Equation 2, the output feature undergoes a global average pooling, φ^(L+1)=Pφ^(L), where:

$\begin{matrix} {(P φ)}_{a, x} = \begin{cases} \frac{1}{|X|} \sum_{x^{'} \in X} φ_{a, x^{'}} & \text{first element} \\ φ_{a, x} & \text{else} \end{cases} & (5) \end{matrix}$

This implements an invertible version of the global average pooling layer. Finally, the first element of φ^(L+1) is passed through a linear layer that produces an output whose mean is interpreted as logits for classification. Denoting 𝒜^(ℓ)=B^(ℓ)∘e^{𝒟^(ℓ)}, the chain of operations of the PNCNN is thus:

Φ=𝒜^(L)∘P∘σ∘𝒜^(L−1)∘ . . . ∘σ∘𝒜^(0) (6)

where ∘ denotes composition of operations.
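By way of a non-limiting illustration, the Gaussian process interpolation of Equation 1 may be sketched numerically as follows. The squared-exponential kernel, the helper names `rbf_kernel` and `gp_posterior`, the noise variance, and the sample data are assumptions made only for this example and are not prescribed by the present disclosure.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel k(x, x') on 1-D locations (an assumed choice)."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_grid, x_obs, y_obs, noise_var=1e-2):
    """Posterior mean and covariance of Equation 1 evaluated on x_grid."""
    A = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))  # A_ij
    K_gx = rbf_kernel(x_grid, x_obs)              # each row is k_x^T
    mu = K_gx @ np.linalg.solve(A, y_obs)         # k_x^T A^{-1} y
    cov = rbf_kernel(x_grid, x_grid) - K_gx @ np.linalg.solve(A, K_gx.T)
    return mu, cov

# Irregularly sampled observations of a signal (illustrative data only).
x_obs = np.array([0.0, 1.5, 4.0])
y_obs = np.sin(x_obs)
x_grid = np.linspace(0.0, 4.0, 9)
mu, cov = gp_posterior(x_grid, x_obs, y_obs)
```

In this sketch, the posterior variance on the diagonal of `cov` is small near the observed locations and approaches the prior variance away from them, which is the uncertainty that the subsequent layers propagate.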

Brief Introduction to Quantum Mechanics

In the quantum formalism, every classical configuration of a random field φ={φ_x}_{x∈X}∈ℝ^{|X|} (where channels are ignored here for simplicity) is associated with a vector |φ⟩. The span of all these vectors is a vector space H, with elements given by superpositions |Ψ⟩ according to:

|Ψ⟩=∫D(φ)ψ(φ)|φ⟩, D(φ)=Π_{x∈X}dφ_x (7)

Note that ψ(φ)∈ℂ are the complex coefficients that are used to combine the vectors |φ⟩. The Hilbert space H is equipped with a scalar product ⟨φ|φ′⟩=δ(φ−φ′), so that ⟨φ|Ψ⟩=ψ(φ). A linear operator on H is called a quantum field (indicated with a hat). The algebra of quantum fields is generated by the pairs {φ̂_x, π̂_x}_{x∈X} as follows:

$\begin{matrix} \langle φ | {\hat{φ}}_{x} | Ψ \rangle = φ_{x} ψ (φ) & (8) \\ \langle φ | {\hat{π}}_{x} | Ψ \rangle = - i \frac{\partial}{\partial φ_{x}} ψ (φ) & (9) \end{matrix}$

Note that the operators φ̂_x and π̂_x defined by Equations 8 and 9 are self-adjoint and satisfy the canonical commutation relations:

[φ̂_x,π̂_x′]:=φ̂_xπ̂_x′−π̂_x′φ̂_x=iδ_{x,x′} (10)

There will often be a need to compute analytic functions of operators O, e.g., f(O)=e^O, which may be defined by the Taylor expansion of f. O may be a function of φ̂ and π̂, such as the Hamiltonian of Equation 31, below.

Generally, a quantum state is a normalized superposition: ⟨Ψ|Ψ⟩=1. The expectation value of an operator O can thus be defined by its matrix element in a state, ⟨Ψ|O|Ψ⟩, which reduces to a classical expectation value when O is diagonal according to ⟨Ψ|O|Ψ⟩=∫D(φ)D(φ′)⟨Ψ|φ⟩⟨φ|O|φ′⟩⟨φ′|Ψ⟩=𝔼_{φ∼|ψ(φ)|²}[O(φ)].

Quantum dynamics needs to preserve the norm of quantum states and acts by unitary operators Û^t=e^{−itĤ}, where Ĥ is a self-adjoint Hamiltonian. Instead of evolving states, the observables Â to be measured can be equivalently evolved according to Â(t)=(Û^t)†ÂÛ^t, which can be computed using the Baker-Campbell-Hausdorff formula:

$\begin{matrix} e^{+ it \hat{H}} \hat{A} e^{- it \hat{H}} = \hat{A} + it [\hat{H}, \hat{A}] - \frac{t^{2}}{2} [\hat{H}, [\hat{H}, \hat{A}]] + \dots & (11) \end{matrix}$

Equivalently, time evolution is described by the Heisenberg equation of motion:

$\begin{matrix} \frac{d \hat{A} (t)}{dt} = i [\hat{H}, \hat{A} (t)] & (12) \end{matrix}$

Measurements generally reduce a quantum superposition to a classical configuration. This projection |Ψ⟩→|φ⟩ occurs with probability |ψ(φ)|².

Consider the following proposition: let |Ψ⟩ be a prior state in H=H₁⊗H₂. If a measurement on H₁ gives outcome y₁, then the posterior state is:

|ζ_{y₁}⟩=(P_{y₁}⊗1₂)|Ψ⟩/‖(P_{y₁}⊗1₂)|Ψ⟩‖

with P_y=|y⟩⟨y|, the projector on state |y⟩.

Notably, Bayes' rule follows from this proposition. Specifically, a subsequent measurement of an observable on H₂ with outcome y₂ on the state |ζ_{y₁}⟩ will give outcome probability:

$\begin{matrix} p (y_{2} | y_{1}) = \langle ζ_{y_{1}} | 1_{1} \otimes P_{y_{2}} | ζ_{y_{1}} \rangle = \frac{\langle Ψ | P_{y_{1}} \otimes P_{y_{2}} | Ψ \rangle}{\langle Ψ | P_{y_{1}} \otimes 1_{2} | Ψ \rangle} & (13) \end{matrix}$

which coincides with p(y₁, y₂)/p(y₁).

Introduction to Gaussian States

Consider the 2|X| dimensional vector of operators:

R̂=(φ̂₁, . . . , φ̂_{|X|}, π̂₁, . . . , π̂_{|X|}) (14)

After introducing the symplectic form J, Equation 10 reads as:

$\begin{matrix} [{\hat{R}}_{i}, {\hat{R}}_{j}] = i J_{ij}, \quad J = (\begin{matrix} 0 & 1_{|X|} \\ - 1_{|X|} & 0 \end{matrix}) & (15) \end{matrix}$

Gaussian states are specified uniquely by their mean and covariance defined as:

m=⟨Ψ|R̂|Ψ⟩ (16)

½C_ij=⟨Ψ|½(R̂_iR̂_j+R̂_jR̂_i)|Ψ⟩−m_im_j (17)

Here and below, the first and second |X| components, related to the φ̂ and π̂ sectors, are denoted by 1 and 2, respectively:

$\begin{matrix} m = (m^{1}, m^{2}), C = (\begin{matrix} C^{1 1} & C^{1 2} \\ C^{2 1} & C^{2 2} \end{matrix}) & (18) \end{matrix}$

The covariance matrix C satisfies:

C=C^T, C>0, C+iJ≥0 (19)

The condition C+iJ≥0 encodes the uncertainty principle and distinguishes quantum Gaussian states from classical Gaussian distributions on phase space. FIG. 1 gives a visualization of this fact. In particular, FIG. 1 depicts a covariance ellipse 102 given by ½(z−m)^TC(z−m)≤1, also known as a “quantum blob”, in a 2D phase space with coordinates z=(φ, π). The area of covariance ellipse 102 is proportional to Det(C)>1 per the uncertainty principle.
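As a non-limiting illustration of the conditions of Equation 19, the following sketch tests whether a given matrix is an admissible covariance of a quantum Gaussian state. The function names and the three example matrices are hypothetical and chosen only for this example.

```python
import numpy as np

def symplectic_form(n_modes):
    """J = [[0, 1], [-1, 0]] in the (phi, pi) ordering of Equation 14."""
    eye = np.eye(n_modes)
    zero = np.zeros((n_modes, n_modes))
    return np.block([[zero, eye], [-eye, zero]])

def is_quantum_covariance(C, tol=1e-10):
    """True if C is symmetric, positive definite, and C + iJ >= 0 (Equation 19)."""
    J = symplectic_form(C.shape[0] // 2)
    symmetric = np.allclose(C, C.T)
    positive = np.all(np.linalg.eigvalsh(C) > 0)
    # C + iJ is Hermitian because C is symmetric and J is antisymmetric.
    uncertainty_ok = np.all(np.linalg.eigvalsh(C + 1j * J) >= -tol)
    return bool(symmetric and positive and uncertainty_ok)

vacuum = np.eye(2)                # unit covariance, saturates the bound
squeezed = np.diag([4.0, 0.25])   # squeezed in pi, stretched in phi
too_sharp = np.diag([0.5, 0.5])   # a valid classical covariance only

checks = [is_quantum_covariance(c) for c in (vacuum, squeezed, too_sharp)]
```

The third matrix is a legitimate covariance of a classical Gaussian distribution on phase space, yet it fails the condition C+iJ≥0, which is the distinction the covariance ellipse is meant to visualize.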

A Gaussian state may be indicated as |m, C⟩. The wave-function ⟨φ|m, C⟩ of a Gaussian state is a Gaussian function of φ, albeit with complex quadratic form.

Thus, it can be shown that the following unitary transformations, whose Hamiltonians are at most quadratic in R̂_i, implement the most general transformations among Gaussian states:

$\begin{matrix} \hat{D} (ξ) = e^{i {\hat{R}}^{T} J ξ}, \quad ξ \in ℝ^{2 |X|} & (20) \\ \hat{ω} (S) = e^{\frac{i}{2} {\hat{R}}^{T} J X \hat{R}}, \quad S = e^{X} \in Sp_{2 |X|} (ℝ) & (21) \end{matrix}$

The Lie group Sp_{2|X|}(ℝ) is the symplectic group of matrices satisfying SJS^T=J, which includes rotations and scalings. An element X of its Lie algebra, with S=e^X, satisfies JX+X^TJ=JX−(JX)^T=0, so:

(R̂^TJXR̂)†=R̂^TJXR̂ (22)

which ensures unitarity of ω̂(S). D̂ and ω̂ implement symmetry transformations (respectively, translations and linear symplectic transforms) on H.

It is possible to show that R̂ intertwines their action on H and the fundamental action on ℝ^{2|X|}. Consider that the unitaries of Equations 20 and 21 represent symplectic affine transformations of the canonical operators:

D̂(ξ)†R̂D̂(ξ)=R̂+ξ (23)

ω̂(S)†R̂ω̂(S)=SR̂ (24)

The effect on the mean and covariance of Gaussian states can then be derived. Specifically, under the unitaries of Equations 20 and 21, the Gaussian states transform as:

D̂(ξ)|m,C⟩=|m+ξ,C⟩ (25)

ω̂(S)|m,C⟩=|Sm,SCS^T⟩ (26)

Thus, for any |Ψ⟩ with mean m and covariance C:

⟨Ψ|D̂(ξ)†R̂D̂(ξ)|Ψ⟩=⟨Ψ|(R̂+ξ)|Ψ⟩=m+ξ

⟨Ψ|ω̂(S)†R̂ω̂(S)|Ψ⟩=Sm=m′

⟨Ψ|ω̂(S)†½(R̂_iR̂_j+R̂_jR̂_i)ω̂(S)|Ψ⟩−m′_im′_j=Σ_{k,l}S_{ik}½C_{kl}S_{jl}=½(SCS^T)_{ij}

The result follows by taking |Ψ⟩ to be the Gaussian state |m, C⟩.
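As a non-limiting illustration, the action of Equations 25 and 26 on the mean and covariance can be tracked with ordinary matrix algebra. In the following sketch, the matrix `A`, the bias `b`, and the helper names are assumptions made for this example; the block-diagonal form A⊕(A⁻¹)^T is one simple way to obtain a symplectic matrix.

```python
import numpy as np

def symplectic_form(n_modes):
    """J in the (phi, pi) ordering of Equation 14."""
    eye, zero = np.eye(n_modes), np.zeros((n_modes, n_modes))
    return np.block([[zero, eye], [-eye, zero]])

def block_symplectic(A):
    """S = A (+) (A^{-1})^T, a symplectic matrix that does not mix phi with pi."""
    zero = np.zeros_like(A)
    return np.block([[A, zero], [zero, np.linalg.inv(A).T]])

def apply_linear_layer(m, C, xi, S):
    """Mean and covariance after omega(S) followed by D(xi), per Equations 25-26."""
    return S @ m + xi, S @ C @ S.T

# Arbitrary invertible weights and bias (illustrative values only).
A = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
b = np.array([0.5, -1.0, 0.0])
S = block_symplectic(A)
xi = np.concatenate([b, np.zeros(3)])
J = symplectic_form(3)

m0, C0 = np.zeros(6), np.eye(6)   # zero mean, unit covariance
m1, C1 = apply_linear_layer(m0, C0, xi, S)
```

Because S satisfies SJS^T=J, the transformed covariance `C1` continues to satisfy the conditions of Equation 19.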

Quantum Extensions of Probabilistic CNNs
The following introduces a series of quantum operations that generalize the classical layers of a PNCNN. As described below, each of these operations has a natural implementation on a quantum optical computer.

State Preparation
As an example of how to perform Bayesian inference with Gaussian states, let |0, C⟩ be a Gaussian prior state such that (C¹¹)_{x,x′}=k_{x,x′}. Given data D={(x_i,y_i)}_{i=1}^N, the posterior may be represented as:

$\begin{matrix} | ζ_{D} \rangle = \frac{P_{D} | 0, C \rangle}{\| P_{D} | 0, C \rangle \|}, \quad P_{D} = \prod_{i = 1}^{N} {| y_{i} \rangle}_{x_{i}} {}_{x_{i}} \langle y_{i} | & (27) \end{matrix}$

where |y⟩_x is an eigenstate of φ̂_x. Then:

|ζ_D⟩=|m′,C′⟩, (m′¹)_x=μ′_x, (C′¹¹)_{x,x′}=k′_{x,x′} (28)

where μ′, k′ are as in Equation 1.

This follows from the proposition above with |Ψ⟩=|0, C⟩ and H₁ the space corresponding to the locations x_i, together with the formulas for Gaussian conditionals.

Thus, a quantum GP inference step according to Equations 27 and 28 allows for encoding a classical signal onto a quantum state in such a way that the quantum correlations represent uncertainty about discretization errors.

Quantum Linear Layers
Next, it is possible to show how to perform the quantum equivalent of a linear layer, which performs on the quantum fields R̂ the same transformation that a classical linear layer would perform on a classical field R.

A quantum linear layer may be defined as the unitary:

Û_lin(ξ,S)=D̂(ξ)ω̂(S) (29)

where D̂(ξ) and ω̂(S) generalize the bias and the multiplication by the weight matrix, respectively.

Quantum Nonlinearity
Similar to classical nonlinearities, a quantum nonlinearity acts pointwise on quantum fields. This restricts the associated Hamiltonian to be Σ_{x,a}Ĥ_{x,a}, where Ĥ_{x,a} acts non-trivially only on the quantum fields at x, a. As a design principle, the following class of time evolutions, which map φ̂_{x,a} to a function σ(φ̂_{x,a}), may be considered.

Consider that under the time evolution generated by:

Û_σ=exp(−iΣ_{x,a}Ĥ_{x,a}) (30)

Ĥ_{x,a}=½(π̂_{x,a}f(φ̂_{x,a})+f(φ̂_{x,a})π̂_{x,a}) (31)

the fields evolve according to the equations of motion:

∂_tφ̂_{x,a}(t)=f(φ̂_{x,a}(t)) (32)

∂_tπ̂_{x,a}(t)=−½(π̂_{x,a}(t)f′(φ̂_{x,a}(t))+h.c.) (33)

where “h.c.” means the Hermitian conjugate of the expression preceding it.

The following relates σ to f of the previous proposition. The ordinary differential equations for Equations 32 and 33 have solutions:

$\begin{matrix} {\hat{φ}}_{x, a} (t) = F^{- 1} (F ({\hat{φ}}_{x, a} (0)) + t) & (34) \\ {\hat{π}}_{x, a} (t) = \frac{1}{2} ({\hat{π}}_{x, a} (0) \frac{f ({\hat{φ}}_{x, a} (0))}{f ({\hat{φ}}_{x, a} (t))} + h . c .) & (35) \end{matrix}$

where F′(x)=1/f(x). This can be directly confirmed by differentiating with respect to t to show that the time-evolved fields satisfy the equations of motion. Rewriting the first equation as F(φ̂_{x,a}(t))=F(φ̂_{x,a}(0))+t and differentiating the left hand side gives:

$\begin{matrix} \partial_{t} F ({\hat{φ}}_{x, a} (t)) = F^{'} ({\hat{φ}}_{x, a} (t)) \partial_{t} {\hat{φ}}_{x, a} (t) = \frac{\partial_{t} {\hat{φ}}_{x, a} (t)}{f ({\hat{φ}}_{x, a} (t))} & (36) \end{matrix}$

which equals ∂_t(F(φ̂_{x,a}(0))+t)=1, showing that φ̂_{x,a}(t) satisfies Equation 32. For the second equation, the first term in the parentheses is differentiated as:

$\begin{matrix} {\hat{π}}_{x, a} (0) f ({\hat{φ}}_{x, a} (0)) \partial_{t} {(f ({\hat{φ}}_{x, a} (t)))}^{- 1} & (37) \\ = - {\hat{π}}_{x, a} (0) \frac{f ({\hat{φ}}_{x, a} (0))}{f ({\hat{φ}}_{x, a} (t))} \frac{f^{'} ({\hat{φ}}_{x, a} (t))}{f ({\hat{φ}}_{x, a} (t))} \partial_{t} {\hat{φ}}_{x, a} (t) & (38) \\ = - {\hat{π}}_{x, a} (t) f^{'} ({\hat{φ}}_{x, a} (t)) & (39) \end{matrix}$

which shows that π̂_{x,a}(t) satisfies Equation 33.

This derivation gives a general, albeit implicit, solution to the problem of constructing a quantum nonlinearity. However, an explicit solution for the case of softplus, a smooth version of ReLU, may be constructed. Specifically, the softplus nonlinearity with temperature parameter β is:

$\begin{matrix} σ_{β} (x) = \frac{1}{β} \log (1 + e^{β x}) & (40) \end{matrix}$

which corresponds to time evolution from time 0 to time 1 under the Hamiltonian:

$\begin{matrix} {\hat{H}}_{x, a} = \frac{1}{2 β} ({\hat{π}}_{a, x} e^{- β {\hat{φ}}_{a, x}} + e^{- β {\hat{φ}}_{a, x}} {\hat{π}}_{a, x}) & (41) \end{matrix}$
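As a non-limiting sanity check, comparing Equation 41 with Equation 31 gives f(φ)=e^{−βφ}/β, so that F(φ)=e^{βφ} and Equation 34 at t=1 yields (1/β)log(1+e^{βφ}). The following sketch integrates the corresponding classical flow numerically and compares the result with the softplus of Equation 40. It treats φ as an ordinary real variable rather than an operator, and the function names, step count, and test values are assumptions made for this example.

```python
import numpy as np

def softplus(phi, beta):
    """Softplus with temperature parameter beta (Equation 40)."""
    return np.log1p(np.exp(beta * phi)) / beta

def flow_velocity(phi, beta):
    """f(phi) = exp(-beta * phi) / beta, read off from Equations 31 and 41."""
    return np.exp(-beta * phi) / beta

def integrate_flow(phi0, beta, n_steps=1000):
    """Fourth-order Runge-Kutta integration of d(phi)/dt = f(phi) from t=0 to t=1."""
    phi = np.asarray(phi0, dtype=float).copy()
    h = 1.0 / n_steps
    for _ in range(n_steps):
        k1 = flow_velocity(phi, beta)
        k2 = flow_velocity(phi + 0.5 * h * k1, beta)
        k3 = flow_velocity(phi + 0.5 * h * k2, beta)
        k4 = flow_velocity(phi + h * k3, beta)
        phi = phi + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return phi

beta = 2.0
phi0 = np.linspace(-1.0, 2.0, 7)
phi1 = integrate_flow(phi0, beta)
max_error = np.max(np.abs(phi1 - softplus(phi0, beta)))
```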

Quantum Neural Network Architecture
A quantum neural network architecture may be defined by applying the preceding derivations, such that:

Û_NN=Û_lin(ξ^(L),S^(L))Û_PÛ_σÛ_lin(ξ^(L−1),S^(L−1)) . . . Û_σÛ_lin(ξ^(0),S^(0)) (42)

where Û_σ and Û_lin are as defined above, while:

Û_P=ω̂(S=P⊕(P⁻¹)^T) (43)

is a global average pooling operator, where P is as in Equation 5. To make a prediction, the spatial locations that have not been aggregated over by averaging are discarded, a final linear classifier is applied, and the means of φ̂_c for the c=1, . . . , C classes are measured according to:

l_c=⟨ζ_D|Û_NN†φ̂_cÛ_NN|ζ_D⟩ (44)

These l_c are interpreted as the logits for the classification task at hand. For simplicity, the state |ζ_D⟩ was introduced above without referring to the channel dimension. To make sense of the action of Û_NN on the channel dimension, extra registers may be added for the channel dimension. These registers are initialized to the vacuum state, which is a Gaussian state with zero mean and unit covariance. While only a global average pooling at the end is considered in this example, it is possible to pool features in intermediate layers as well by simply discarding registers associated with the modes to be discarded.

Computing the logits of Equation 44 is in general intractable classically and thus requires a quantum computer. The definition of a quantum neural network herein has several advantages. First, Gaussian states are used for data interpolation. Second, a unitary gate is used for implementing the nonlinearity. In particular, the unitary gate implementation of the nonlinearity is far more efficient than other methods, such as using a non-unitary quantum channel to implement a nonlinearity of the type φ̂_{x,a}→σ(φ̂_{x,a}). Further, employing Hamiltonians, such as:

Ĥ_cubic=φ̂_{a,x}³, Ĥ_Kerr=(φ̂_{a,x}²+π̂_{a,x}²)² (45)

as nonlinearities, which correspond to low-degree polynomial nonlinearities, is known to be inefficient for classical neural network approximation.

Symmetries in Neural Network Models
Generally, symmetries in classical neural networks are realized as linear maps g∈G that act on the activations φ as ρ(g)φ, where ρ is a representation matrix. In addition to translations, prominent examples of G in machine learning include rotations and permutations. Having replaced the linear action on activations with ω̂, unitary representations of G on quantum states may be defined by ω̂(S_g:=ρ(g)⊕ρ*(g)), where ρ*(g)=ρ(g⁻¹)^T is the dual representation, ensuring symplecticity of S_g. Thus:

$\begin{matrix} {\hat{ω} (S_{g})}^{†} (\begin{matrix} \hat{φ} \\ \hat{π} \end{matrix}) \hat{ω} (S_{g}) = (\begin{matrix} ρ (g) \hat{φ} \\ ρ^{*} (g) \hat{π} \end{matrix}) & (46) \end{matrix}$

For example, in the case of translations along the μ∈{1, . . . , d} axis, (τ_μφ)_{a,x}=φ_{a,x+e_μ} and S_{τ_μ}=τ_μ⊕τ_μ, which translates both the φ̂ and π̂ variables.

Equivariance of a quantum linear layer ω̂(S) now amounts to the commutation relations:

ω̂(S)ω̂(S_g)=ω̂(S_g)ω̂(S)⇒SS_g=S_gS (47)

where the second formula follows from the group homomorphism property ω̂(S)ω̂(S′)=ω̂(SS′). The characterization of symmetries presented herein completely solves the problem of designing equivariant quantum linear layers by reducing the problem to designing equivariant classical linear layers with symplectic weight matrices S which intertwine the action of ρ(g)⊕ρ*(g). Similar to the classical case, since the nonlinearities act pointwise, they will be invariant under operations that permute the coordinates (φ̂_{x,a}, π̂_{x,a})→(φ̂_{x′,a′}, π̂_{x′,a′}), such as spatial symmetries, ensuring equivariance of the whole architecture.

As an illustration, with the parametrization S=e^X∈Sp_{2M}(ℝ), the condition SS_{τ_μ}=S_{τ_μ}S restricts X to the form:

$\begin{matrix} X = (\begin{matrix} A & B \\ C & D \end{matrix}) & (48) \end{matrix}$

where each M×M block of X is a convolution.
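As a non-limiting illustration of Equation 48 on a one-dimensional periodic grid, the following sketch builds each block of X as a cyclic convolution (circulant matrix) and checks that X commutes with the translation τ⊕τ. The periodic boundary, the kernel values, and the helper names are assumptions made for this example. The blocks are additionally chosen so that JX is symmetric, which places X in the symplectic Lie algebra.

```python
import numpy as np

def shift_matrix(n):
    """Cyclic translation on n grid points: (tau phi)_x = phi_{x+1}."""
    return np.roll(np.eye(n), -1, axis=0)

def circulant(kernel):
    """Matrix of a cyclic convolution; entry (i, j) depends only on (j - i) mod n."""
    kernel = np.asarray(kernel, dtype=float)
    return np.array([np.roll(kernel, i) for i in range(len(kernel))])

def direct_sum(top, bottom):
    """Block-diagonal matrix top (+) bottom."""
    n, m = top.shape[0], bottom.shape[0]
    return np.block([[top, np.zeros((n, m))], [np.zeros((m, n)), bottom]])

M = 5
A = circulant([0.3, -0.1, 0.0, 0.2, 0.4])
B = circulant([1.0, 0.5, 0.2, 0.2, 0.5])   # symmetric kernel, so B = B^T
C = circulant([0.7, 0.1, 0.3, 0.3, 0.1])   # symmetric kernel, so C = C^T
X = np.block([[A, B], [C, -A.T]])          # D = -A^T makes JX symmetric

tau = shift_matrix(M)
S_tau = direct_sum(tau, tau)
J = np.block([[np.zeros((M, M)), np.eye(M)], [-np.eye(M), np.zeros((M, M))]])

commutes = np.allclose(X @ S_tau, S_tau @ X)
in_lie_algebra = np.allclose(J @ X + X.T @ J, 0.0)
```

Since S=e^X is a power series in X, commutation of X with τ⊕τ carries over to S.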

Particle Interpretation of Quantum Formalism for Neural Networks
The quantum formalism described herein is amenable to a particle interpretation for neural networks. To that end, the following operators are introduced:

$\begin{matrix} \hat{a} = \frac{1}{\sqrt{2}} (\hat{φ} + i \hat{π}), {\hat{a}}^{†} = \frac{1}{\sqrt{2}} (\hat{φ} - i \hat{π}) & (49) \end{matrix}$

A particle basis (e.g., a Fock space) may then be used for H, where the Gaussian state with zero mean and unit covariance is identified with the state of no particles (vacuum), |Ω⟩=|0,1⟩, such that â|Ω⟩=0, and an orthogonal basis is created by acting with different monomials (â₁†)^{n₁} . . . (â_m†)^{n_m} on |Ω⟩, with n_i being the number of particles at the location a,x indexed by i. In the quantum optical setting, the particles are called photons. In the neural network context, these fundamental excitations may be referred to as “Hintons.”

Deep Linear Quantum Networks
Despite producing entangled states, the quantum linear layers acting on Gaussian states can be simulated efficiently on a classical computer. At this level, the only difference between a quantum evolution and a probabilistic classical evolution of a Gaussian Liouville measure in phase space is the covariance condition C+iJ≥0 coming from the non-commutativity of position and momenta in quantum mechanics. Notably, this condition is preserved by classical evolution thanks to the symplectic nature of classical mechanics.

Embedding Classical Probabilistic Neural Networks in a Quantum Architecture
Now it is possible to show how a classical probabilistic neural network can be embedded in the quantum model discussed above with respect to Equations 42-44.

First, it can be shown that the following representation of the push forward of a Gaussian process under a generic classical (invertible) map is a unitary operation.

Let |m, C⟩ be a Gaussian state and Û a unitary such that:

Û†φ̂Û=F(φ̂) (50)

Then:

|⟨φ|Û|m,C⟩|²=(F#GP(m¹,C¹¹))(φ) (51)

where F#p denotes the push forward of p under F.

It is already known that Û_σ has this property, so the quantum linear layers only need to be constrained so that they do not mix φ̂ with π̂.

Consider the unitary of Equation 42 with:

$\begin{matrix} S^{(ℓ)} = (\begin{matrix} A^{(ℓ)} & 0 \\ 0 & {({(A^{(ℓ)})}^{- 1})}^{T} \end{matrix}), ξ^{(ℓ)} = (b^{(ℓ)}, 0) & (52) \end{matrix}$

Then there is the quantum-classical duality:

|⟨φ|Û_NN|ζ_D⟩|²=(Φ#GP(μ′,k′))(φ) (53)

where the right hand side of Equation 53 is the push forward of the Gaussian process posterior of Equation 1 under the PNCNN map Φ of Equation 6. This shows that the logits of Equation 44 computed by the quantum neural network coincide with those computed by a PNCNN with weights and biases A^(ℓ), b^(ℓ).

The Semiclassical Limit
Let R=(φ,π) denote the classical fields corresponding to the quantum operators introduced above (where the quantum operators include a hat accent).

The preceding discussion has established two tractable limits of the quantum model. First, as above, the quantum model restricted to linear layers reduces to the push forward of a Gaussian Liouville distribution with constrained covariance C+iJ≥0 under the linear layer action. Second, restricting the linear layers to block diagonal matrices leads to the classical model, which also corresponds to the push forward of an initial Gaussian Liouville distribution under a neural network, but this time only involving the φ field.

A nonlinearity may be modified in such a way that the modified model corresponds to the push forward of an initial Gaussian measure under a neural network, involving both the φ and the π fields. To do that, Û_σ may be replaced with a classical Hamiltonian evolution under which the phase space measure evolves into a new classical phase space measure.

Under the classical time evolution generated by the Hamiltonian:

H=Σ_{x,a}π_{x,a}f(φ_{x,a}) (54)

the fields transform as:

$\begin{matrix} φ_{x, a} (t) = F^{- 1} (F (φ_{x, a} (0)) + t) & (55) \\ π_{x, a} (t) = π_{x, a} (0) \frac{f (φ_{x, a} (0))}{f (φ_{x, a} (t))} & (56) \end{matrix}$

where F′(x)=1/f (x).

Notably, the classical and quantum equations of motion and their solutions look identical. This is a consequence of the correspondence between quantum and classical mechanics under the identification [Â, Ĥ]↔iℏ{A, H}.

A neural network may then be defined that pushes forward the input Gaussian Liouville distribution on phase space to an output distribution p_out by chaining linear and nonlinear classical layers. Its mean is interpreted as the logits for classification:

logit_c=𝔼_{φ∼p_out}(φ_c) (57)

The resulting model can then be interpreted as a semi-classical limit of the quantum model since it uses elements of quantum mechanics (uncertainty relation for the covariance) as well as classical mechanics (for the nonlinearity).

The classical counterpart of the quantum softplus defined above may be obtained by replacing operators with classical variables as:

$\begin{matrix} U_{σ} : (\begin{matrix} φ \\ π \end{matrix}) \mapsto (\begin{matrix} \frac{1}{β} \log (1 + e^{βφ}) \\ π (1 + e^{- βφ}) \end{matrix}) & (58) \end{matrix}$

Notably, the nonlinearity of Equation 58, above, can lead to very large values of π when an entry of φ is large and negative, due to the exponential e^{−βφ}. To cure this problem, a given nonlinearity σ can be associated with the following symplectic map.

For any smooth function σ, the following nonlinear map is symplectic:

$\begin{matrix} U_{σ} : (\begin{matrix} φ \\ π \end{matrix}) \mapsto (\begin{matrix} σ (φ) \\ π {(σ^{'} (φ))}^{- 1} \end{matrix}) & (59) \end{matrix}$

To prove this, it suffices to show that the Jacobian is symplectic over the whole space for any smooth σ with nonvanishing derivative. The Jacobian is diagonal in the channel and X space, and each 2×2 block is:

$\begin{matrix} (\begin{matrix} σ^{'} (φ_{a, x}) & 0 \\ - π_{a, x} \frac{σ^{''} (φ_{a, x})}{{(σ^{'} (φ_{a, x}))}^{2}} & {(σ^{'} (φ_{a, x}))}^{- 1} \end{matrix}) & (60) \end{matrix}$

Since any 2×2 matrix with unit determinant is symplectic, this proves that the nonlinear map above is symplectic. Therefore, to cure the divergence problem of the symplectic softplus, a leaky version may be defined as:

$\begin{matrix} σ (φ) = \frac{\log (1 + e^{(β - α) φ})}{β - α} + αφ & (61) \end{matrix}$

so that:

$\begin{matrix} U_{σ} : (\begin{matrix} φ \\ π \end{matrix}) \mapsto (\begin{matrix} \frac{\log (1 + e^{(β - α) φ})}{β - α} + αφ \\ π (e^{βφ} + e^{αφ}) {((1 + α) e^{βφ} + α e^{αφ})}^{- 1} \end{matrix}) & (62) \end{matrix}$

Testing the semiclassical neural network with the leaky softplus nonlinearity on a simple proof of principle task showed similar performance to that of a classical neural network, which validates the use of the leaky softplus as nonlinearity for this task.
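As a non-limiting illustration, the leaky softplus of Equation 61 and the associated symplectic map of Equation 59 may be sketched as follows, together with a finite-difference check that the Jacobian has unit determinant. The parameter values α=0.1 and β=1, the step size, and the function names are assumptions made for this example.

```python
import numpy as np

def leaky_softplus(phi, alpha, beta):
    """Leaky softplus of Equation 61."""
    return np.log1p(np.exp((beta - alpha) * phi)) / (beta - alpha) + alpha * phi

def leaky_softplus_grad(phi, alpha, beta):
    """Derivative of Equation 61: sigmoid((beta - alpha) * phi) + alpha."""
    return 1.0 / (1.0 + np.exp(-(beta - alpha) * phi)) + alpha

def symplectic_nonlinearity(phi, pi, alpha, beta):
    """Map of Equation 59: (phi, pi) -> (sigma(phi), pi / sigma'(phi))."""
    return (leaky_softplus(phi, alpha, beta),
            pi / leaky_softplus_grad(phi, alpha, beta))

def jacobian_det(phi, pi, alpha, beta, eps=1e-6):
    """Central finite-difference determinant of the 2x2 Jacobian at (phi, pi)."""
    f = lambda p, q: np.array(symplectic_nonlinearity(p, q, alpha, beta))
    d_phi = (f(phi + eps, pi) - f(phi - eps, pi)) / (2.0 * eps)
    d_pi = (f(phi, pi + eps) - f(phi, pi - eps)) / (2.0 * eps)
    return d_phi[0] * d_pi[1] - d_pi[0] * d_phi[1]

alpha, beta = 0.1, 1.0
dets = [jacobian_det(phi, 0.7, alpha, beta) for phi in (-5.0, 0.0, 3.0)]

# For very negative phi the momentum is rescaled by at most 1 / alpha.
_, pi_out = symplectic_nonlinearity(-50.0, 1.0, alpha, beta)
```

In this sketch, the slope α bounds the rescaling of π from above by 1/α, which is how the leaky variant avoids the divergence of the plain softplus map for large negative φ.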

Quantum Optical Implementation
By identifying â† as a photon creation operator, it is possible to map a unitary operator onto a set of quantum optical gates. Optical devices are attractive since matrix multiplication can be implemented very efficiently. A graphical depiction of the proposed implementation is shown in FIG. 3. The following reviews the implementation of linear layers in quantum optical implementations, and then explains how to implement the quantum softplus nonlinearity and the data embedding in a Gaussian state.

Linear Layer of Quantum Optical Implementation
A unitary ω̂(S) may be implemented in a quantum optical computer according to the following steps. First, the unitary is decomposed in terms of elementary linear optical gates. To this end, the group homomorphism property ω̂(S)ω̂(S′)=ω̂(SS′) may be used together with the Bloch-Messiah decomposition S=KΣL, with K, L symplectic and orthogonal and Σ=diag(e^{r_1}, . . . , e^{r_M}, e^{−r_1}, . . . , e^{−r_M}). Σ can be implemented directly using optical parametric amplifiers. The orthogonal matrices K, L can be further decomposed using Givens rotations as a product of rotations of two components and implemented in terms of beamsplitters and phase shifters.
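As a non-limiting illustration, for the special case of a block-diagonal matrix S=A⊕(A⁻¹)^T, a decomposition of the form S=KΣL can be read off from the singular value decomposition A=U diag(s) V^T. The following sketch covers only this special case; a general symplectic S would need a full Bloch-Messiah routine, which is not shown. The matrix `A` and the helper names are assumptions made for this example.

```python
import numpy as np

def direct_sum(top, bottom):
    """Block-diagonal matrix top (+) bottom."""
    n, m = top.shape[0], bottom.shape[0]
    return np.block([[top, np.zeros((n, m))], [np.zeros((m, n)), bottom]])

def bloch_messiah_block_diagonal(A):
    """Factor S = A (+) (A^{-1})^T as K @ Sigma @ L using the SVD of A."""
    U, s, Vt = np.linalg.svd(A)
    K = direct_sum(U, U)                            # orthogonal and symplectic
    Sigma = np.diag(np.concatenate([s, 1.0 / s]))   # squeezing, s_i = exp(r_i)
    L = direct_sum(Vt, Vt)                          # orthogonal and symplectic
    return K, Sigma, L

A = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
S = direct_sum(A, np.linalg.inv(A).T)
K, Sigma, L = bloch_messiah_block_diagonal(A)
squeezing_parameters = np.log(np.diag(Sigma)[:3])   # the r_i of the text
```

In this sketch, `K` and `L` correspond to the passive elements (beamsplitters and phase shifters) and `Sigma` corresponds to the squeezing stage.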

State Preparation of Quantum Optical Implementation
Encoding the input signal D={(x_i, y_i)}_{i=1}^N starts from the vacuum state |Ω⟩, as defined above, which can be created using lasers. Then, the operator ω̂(S=A⊕(A⁻¹)^T) acts as discussed above. The state ω̂(S=A⊕(A⁻¹)^T)|Ω⟩ is the prior Gaussian state with prior kernel k=AA^T. There is no loss of generality, since any positive matrix k can always be factorized as k=AA^T. The input is conditioned on data by measuring the operators φ̂ at x_i. The state |y⟩_x=|y₁⟩_{x₁}⊗ . . . ⊗|y_N⟩_{x_N} is then prepared as the limit of squeezed states, which can be implemented efficiently.
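As a non-limiting illustration of the factorization k=AA^T, the following sketch uses a Cholesky factor of a kernel matrix to build S=A⊕(A⁻¹)^T and confirms that applying it to the unit covariance of the vacuum produces a covariance whose φ block equals k. The squared-exponential kernel, the grid, the small diagonal jitter, and the function name are assumptions made for this example.

```python
import numpy as np

def rbf_kernel_matrix(x, length_scale=1.0, jitter=1e-8):
    """Squared-exponential kernel matrix with a small jitter for stability."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2) + jitter * np.eye(len(x))

x = np.arange(6, dtype=float)          # grid locations (illustrative)
k = rbf_kernel_matrix(x)

A = np.linalg.cholesky(k)              # lower-triangular factor, k = A A^T
zero = np.zeros_like(A)
S = np.block([[A, zero], [zero, np.linalg.inv(A).T]])

C_vacuum = np.eye(2 * len(x))          # zero mean, unit covariance
C_prior = S @ C_vacuum @ S.T           # covariance after applying omega(S)

phi_block = C_prior[:6, :6]            # equals the prior kernel k
pi_block = C_prior[6:, 6:]             # equals the inverse of k
```

The Cholesky factor is only one possible choice of A; any A with AA^T=k gives the same φ block.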

Nonlinearity of Quantum Optical Implementation
Quantum computers can perform arbitrary computations if given a set of universal gates. For quantum optical computers, one can take as universal gates those generated by quadratic Hamiltonians, together with the cubic gate, whose Hamiltonian is $\hat{\varphi}^3$. The following is an example of how to implement a nonlinearity with a Hamiltonian (as in Equation 54) where the Taylor series of the function $f$ is truncated to order $K$:

Ĥ=Σ
_x,a

Ĥ
_l, =½({circumflex over (π)}_x,a+{circumflex over (π)}_x,a) (63)

Then, the standard procedure for quantum simulation can be used to implement $e^{i\hat{H}^{(K)}}$.

A first step is to trade $e^{im\epsilon\hat{H}}$ for $m$ successive applications of $e^{i\epsilon\hat{H}}$, so that the problem reduces to how to implement $e^{i\epsilon\hat{H}}$. This is explained by the following.
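The step of trading one long evolution for m short ones can be checked numerically on a small matrix. The sketch below uses a random finite-dimensional Hermitian matrix as a stand-in for the Hamiltonian, which is an assumption for illustration; the actual operators act on an infinite-dimensional optical Hilbert space. The values of m and the step size are likewise hypothetical.

```python
import numpy as np

def expm_hermitian(H, t):
    """Return exp(i t H) for a Hermitian matrix H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    # Scale column j of V by exp(i t w_j), then rotate back.
    return (V * np.exp(1j * t * w)) @ V.conj().T

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (B + B.conj().T) / 2                 # hypothetical Hermitian stand-in

m, eps = 8, 0.05                         # hypothetical step count and step size
long_step = expm_hermitian(H, m * eps)   # exp(i m eps H)
short_step = expm_hermitian(H, eps)      # exp(i eps H)
repeated = np.linalg.matrix_power(short_step, m)

max_difference = np.max(np.abs(long_step - repeated))
```

The identity is exact for a single Hamiltonian; approximation errors only enter once the short step itself is built from simpler gates.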

Initially, define the unitaries:

=exp(i∈) (64)

Then, define =−1/(4(+2)), such that:

=+O(∈^3/4) (65)

= (66)

= (67)

Since $\hat{U}_{1,\epsilon}$ has a quadratic Hamiltonian, the quantum gates can be built recursively using only $\hat{\pi}^2$ and $\hat{\varphi}^3$.

Thus, an explicit decomposition may be derived using only the universal Hamiltonians $\hat{\pi}_{x,a}$, $\hat{\pi}_{x,a}^2$, and $\hat{\varphi}_{x,a}^3$, which allows implementation on a quantum optical device.

Example Method for Performing Quantum Probabilistic Convolution

FIG. 2 depicts an example method 200 for performing quantum probabilistic convolution, such as with a quantum probabilistic numeric convolutional neural network as described herein.

Method 200 begins at step 202 with determining prior states of input signals. In some cases, the input signals may be laser light beams encoding input data.

Method 200 then proceeds to step 204 with conditioning on the input signals. For example, the conditioning may be performed by measuring the input signals at various times.

Method 200 then proceeds to step 206 with applying weights to the input signals to generate weighted input signals. For example, unitary quantum gates may be used to apply the weights to the input signals.

In some cases, such as where the input signals are encoded in light, optical components, such as beam splitters, phase shifters, optical parametric amplifiers, movable mirrors, movable lenses, and the like, may be used to apply the weights to the light beams (e.g., physical input signals).
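As described for the linear layer, an orthogonal factor can be written as a product of rotations that each act on only two components, which is what makes a beamsplitter-style realization possible. The following is a minimal sketch of such a Givens decomposition for a real orthogonal matrix. The function names and the random test matrix are assumptions for illustration, and the sketch does not model phase shifters or any specific optical layout.

```python
import numpy as np

def givens(n, i, j, theta):
    """Rotation by angle theta in the (i, j) plane of an n-dimensional space."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

def givens_decompose(Q):
    """Return (rotations, signs) with Q = G_1 @ ... @ G_k @ diag(signs)."""
    n = Q.shape[0]
    R = Q.copy()
    rotations = []
    for col in range(n - 1):
        for row in range(n - 1, col, -1):
            # Rotate components (row-1, row) so that R[row, col] becomes zero.
            theta = np.arctan2(R[row, col], R[row - 1, col])
            G = givens(n, row - 1, row, -theta)
            R = G @ R
            rotations.append(G.T)        # inverse rotation, kept for reconstruction
    # An orthogonal upper-triangular matrix is diagonal with entries +1 or -1.
    signs = np.sign(np.diag(R))
    return rotations, signs

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # hypothetical orthogonal matrix

rotations, signs = givens_decompose(Q)

rebuilt = np.eye(4)
for G in rotations:
    rebuilt = rebuilt @ G
rebuilt = rebuilt @ np.diag(signs)

reconstruction_error = np.max(np.abs(rebuilt - Q))
num_rotations = len(rotations)
```

For an n-by-n matrix this procedure uses n(n-1)/2 two-component rotations, so the count of elementary elements grows quadratically with the number of modes.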

Method 200 then proceeds to step 208 with applying a quantum nonlinearity to the weighted input signals to generate quantum activations.

In some cases, the quantum nonlinearity comprises a quantum softplus nonlinearity. In some cases, the quantum nonlinearity comprises a Hamiltonian according to Equation 54.
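The quantum optical implementation described above truncates the Taylor series of the nonlinearity to a finite order. As a purely classical illustration of what such a truncation does, the sketch below evaluates the Taylor polynomial of the ordinary softplus function, log(1 + exp(x)), around zero. Whether the function entering Equation 54 is softplus itself or a related function is not shown in this section, so using softplus here is an assumption, as are the function names and the evaluation point.

```python
import math

# Taylor coefficients of softplus(x) = log(1 + exp(x)) around x = 0, up to x^4.
SOFTPLUS_TAYLOR = [math.log(2.0), 0.5, 0.125, 0.0, -1.0 / 192.0]

def softplus(x):
    """Classical softplus, log(1 + exp(x))."""
    return math.log1p(math.exp(x))

def softplus_truncated(x, order):
    """Evaluate the Taylor polynomial of softplus truncated to the given order (0..4)."""
    if not 0 <= order < len(SOFTPLUS_TAYLOR):
        raise ValueError("order must be between 0 and 4")
    return sum(c * x ** n for n, c in enumerate(SOFTPLUS_TAYLOR[: order + 1]))

# Absolute truncation error at a hypothetical evaluation point x = 0.5.
errors = {
    order: abs(softplus(0.5) - softplus_truncated(0.5, order))
    for order in range(5)
}
```

The error shrinks as the order grows, but only near the expansion point, which is one reason the truncation order is a design parameter.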

Method 200 then proceeds to step 210 with making an inference based on the quantum activations.

FIG. 3 depicts an example flow through quantum optical processing equipment. FIG. 3 may thus be an example of implementing method 200 on a quantum processing system.

For example, laser emitters 302 may emit input signals in the form of laser light beams.

Prior state determination component 304 may use, for example, an interferometer to determine the prior state of each of the light-based input signals.

Measurement component 306 may condition on the input signals by measuring the input signals at various times, such as shown at 314. Note that the measured samples are not complete (e.g., there is a y₁ and y₃ measurement, but no measurement y₂); thus, a probabilistic-type convolution operation is necessary. In some examples, measurement component 306 may be implemented as a quantum Gaussian process to create a posterior state.
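A classical Gaussian process gives a simple picture of conditioning on incomplete samples. The sketch below conditions on two measured values and queries a location with no measurement, returning a posterior mean and uncertainty there. It is a non-quantum analogue under stated assumptions: the squared-exponential kernel, the noise level, and the sample values are hypothetical, and the function names are introduced only for this example.

```python
import numpy as np

def rbf(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, lengthscale=1.0, noise=1e-6):
    """Posterior mean and covariance of a zero-mean GP at the query locations."""
    K = rbf(x_obs, x_obs, lengthscale) + noise * np.eye(len(x_obs))
    K_s = rbf(x_query, x_obs, lengthscale)
    K_ss = rbf(x_query, x_query, lengthscale)
    mean = K_s @ np.linalg.solve(K, y_obs)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

# Hypothetical data: samples exist at x = 1 and x = 3, but not at x = 2.
x_obs = np.array([1.0, 3.0])
y_obs = np.array([0.8, -0.4])
x_query = np.array([1.0, 2.0, 3.0])

mean, cov = gp_posterior(x_obs, y_obs, x_query)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

The posterior reproduces the measured values with near-zero uncertainty and reports a larger uncertainty at the unmeasured location, which is the information a probabilistic convolution can then carry forward.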

Quantum linear component 308 may apply weights to the input signals, such as described above. Quantum linear component 308 thus acts like the quantum equivalent to a classical convolution layer.

In some examples, applying weights to light-based input signals may include the use of various optical components, such as beam splitters, movable mirrors, phase shifters, and others.

Quantum nonlinear component 310 may then apply a quantum nonlinearity to the weighted input signals, such as described above, to generate quantum activation data.

Prediction component 312 may then generate a prediction, such as a class prediction, based on the quantum activation data. For example, an observable may be measured with a detector to determine a resulting class, C, for classifying the input signal.

Note that FIG. 3 represents a simplified flow of the process of FIG. 2. In other examples, additional quantum linear operators (e.g., 308) and quantum nonlinear operators (e.g., 310) may be used to generate a quantum equivalent of a probabilistic numeric convolutional neural network.

Example Processing System for Performing Quantum Processing Of Probabilistic Numeric Convolutional Neural Networks

FIG. 4A depicts an example of a quantum processing system 400.

Quantum processing system 400 includes a quantum processing unit 402, which may be configured to perform processing of quantum computing data.

Quantum processing system 400 further includes an optical signal transceiver configured for sending and receiving optical signals, such as laser light beams.

Quantum processing system 400 further includes state determination component 406, such as described above with respect to 304 in FIG. 3. In some examples, state determination component 406 may include an optical interferometer.

Quantum processing system 400 further includes measuring component 408, such as described above with respect to 306 in FIG. 3. For example, measuring component 408 may be configured to determine characteristics, including quantum characteristics, of various signals, such as input signals. As above, the input signals may be laser light beam-based signals in some examples.

Quantum processing system 400 further includes input and output components 410, such as screens, touch-sensitive surfaces (including touch-sensitive displays), physical buttons, speakers, microphones, and the like.

Quantum processing system 400 further includes quantum linear component 412, such as described above with respect to 308 in FIG. 3. In some examples, quantum linear component 412 may include beam splitters, phase shifters, optical parametric amplifiers, movable mirrors, movable lenses, and other optical components.

Quantum processing system 400 further includes quantum nonlinear component 414, such as described above with respect to 310 in FIG. 3.

Quantum processing system 400 further includes prediction component 416, such as described above with respect to 312 in FIG. 3.

The depicted components, and others not depicted, may be configured to perform various aspects of the methods described herein, including method 200 of FIG. 2.

Notably, in other embodiments, aspects of quantum processing system 400 may be omitted, and other aspects may be added.

FIG. 4B depicts an example processing system 450 for performing simulation of quantum processing of probabilistic numeric convolutional neural networks on non-quantum computing hardware.

Processing system 450 includes a central processing unit (CPU) 452, which in some examples may be a multi-core CPU. Instructions executed at the CPU 452 may be loaded, for example, from a program memory associated with the CPU 452 or may be loaded from a memory partition 470.

Processing system 450 also includes additional processing components tailored to specific functions, such as a graphics processing unit (GPU) 454, a digital signal processor (DSP) 456, and a neural processing unit (NPU) 458.

An NPU, such as 458, is generally a specialized circuit configured for implementing all the necessary control and arithmetic logic for executing machine learning algorithms, such as algorithms for processing artificial neural networks (ANNs), deep neural networks (DNNs), random forests (RFs), and the like. An NPU may sometimes alternatively be referred to as a neural signal processor (NSP), tensor processing unit (TPU), neural network processor (NNP), intelligence processing unit (IPU), vision processing unit (VPU), or graph processing unit.

NPUs, such as 458, are configured to accelerate the performance of common machine learning tasks, such as image classification, machine translation, object detection, and various other predictive models. In some examples, a plurality of NPUs may be instantiated on a single chip, such as a system on a chip (SoC), while in other examples they may be part of a dedicated neural-network accelerator.

NPUs may be optimized for training or inference, or in some cases configured to balance performance between both. For NPUs that are capable of performing both training and inference, the two tasks may still generally be performed independently.

NPUs designed to accelerate training are generally configured to accelerate the optimization of new models, which is a highly compute-intensive operation that involves inputting an existing dataset (often labeled or tagged), iterating over the dataset, and then adjusting model parameters, such as weights and biases, in order to improve model performance. Generally, optimizing based on a wrong prediction involves propagating back through the layers of the model and determining gradients to reduce the prediction error.

NPUs designed to accelerate inference are generally configured to operate on complete models. Such NPUs may thus be configured to input a new piece of data and rapidly process it through an already trained model to generate a model output (e.g., an inference).

In one implementation, NPU 458 is a part of one or more of CPU 452, GPU 454, and/or DSP 456.

Processing system 450 may also include one or more input and/or output devices 460, such as screens, touch-sensitive surfaces (including touch-sensitive displays), physical buttons, speakers, microphones, and the like.

Processing system 450 also includes memory 470, which is representative of one or more static and/or dynamic memories, such as a dynamic random access memory, a flash-based static memory, and the like. In this example, memory 470 includes computer-executable components, which may be executed by one or more of the aforementioned processors of processing system 450.

In particular, in this example, memory 470 includes probabilistic numeric convolutional neural network processing component 472 and quantum simulation component 474. Quantum simulation component 474 may generally be configured to simulate quantum processing of a probabilistic numeric convolutional neural network on non-quantum hardware. The depicted components, and others not depicted, may be configured to perform various aspects of the methods described herein.

Processing system 450 further comprises quantum simulation circuit 462, which may generally be configured to simulate quantum processing of a probabilistic numeric convolutional neural network on non-quantum hardware.

Notably, in other embodiments, aspects of processing system 450 may be omitted, and other aspects may be added.

Example Clauses
Implementation examples are described in the following numbered clauses:

Clause 1: A method, comprising: performing a probabilistic convolution operation with an optical quantum computer, wherein input signals to the probabilistic convolution operation are encoded in light beams.

Clause 2: The method of Clause 1, wherein performing the probabilistic convolution operation comprises determining one or more prior states of the input signals using an interferometer.

Clause 3: The method of any one of Clauses 1-2, wherein performing the probabilistic convolution operation further comprises conditioning the input signals by measuring the light beams at a plurality of times.

Clause 4: The method of any one of Clauses 1-3, wherein performing the probabilistic convolution operation further comprises projecting weights onto the input signals using a plurality of unitary quantum gates to generate weighted input signals.

Clause 5: The method of Clause 4, wherein projecting weights onto the input signals further comprises applying one or more of a beam splitter and a phase shifter to the light beams encoding the input signals.

Clause 6: The method of any one of Clauses 1-4, wherein performing the probabilistic convolution operation further comprises applying a quantum nonlinearity to the weighted input signals to generate quantum activations.

Clause 7: The method of Clause 6, wherein the quantum nonlinearity comprises a quantum softplus nonlinearity.

Clause 8: The method of Clause 7, wherein the quantum nonlinearity comprises a Hamiltonian according to Equation 54.

Clause 9: The method of any one of Clauses 6-8, further comprising performing a prediction based on the quantum activations.

Clause 10: A method, comprising simulating a quantum probabilistic convolution operation using a non-quantum processing system.

Clause 11: The method of Clause 10, wherein simulating the quantum probabilistic convolution operation comprises determining one or more prior states of one or more input signals.

Clause 12: The method of any one of Clauses 10-11, wherein simulating the quantum probabilistic convolution operation further comprises conditioning the input signals using a Gaussian process.

Clause 13: The method of any one of Clauses 10-12, wherein simulating the quantum probabilistic convolution operation further comprises projecting weights onto the input signals using a plurality of unitary quantum gates to generate weighted input signals.

Clause 14: The method of any one of Clauses 10-13, wherein simulating the quantum probabilistic convolution operation further comprises applying a quantum nonlinearity to the weighted input signals to generate quantum activations.

Clause 15: The method of Clause 14, wherein the quantum nonlinearity comprises a quantum softplus nonlinearity.

Clause 16: The method of Clause 14, wherein the quantum nonlinearity comprises a Hamiltonian according to Equation 54.

Clause 17: The method of any one of Clauses 10-16, further comprising performing a prediction based on the quantum activations.

Clause 18: A processing system, comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to perform a method in accordance with any one of Clauses 1-17.

Clause 19: A processing system, comprising means for performing a method in accordance with any one of Clauses 1-17.

Clause 20: A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method in accordance with any one of Clauses 1-17.

Clause 21: A computer program product embodied on a computer-readable storage medium comprising code for performing a method in accordance with any one of Clauses 1-17.

Additional Considerations
The preceding description is provided to enable any person skilled in the art to practice the various embodiments described herein. The examples discussed herein are not limiting of the scope, applicability, or embodiments set forth in the claims. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. For example, changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For instance, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Also, features described with respect to some examples may be combined in some other examples. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.

As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.

As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).

As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” may include resolving, selecting, choosing, establishing and the like.

The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.

The following claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims. Within a claim, reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.” All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
