This application concerns quantum computing.
This disclosure describes embodiments of a quantum classifier training strategy. In some examples, optimal parameters are learned by coordinate ascent using closed-form equations for the absolute conditional maximum of the utility function in one parameter at a time. In such examples, this strategy helps ensure monotonic convergence to local maxima in a parameter space at predictable convergence rates and eliminates overhead due to hyperparameter sweeps. Embodiments of the strategy use asymptotically fewer measurements and post-selection steps than other possible approaches. Embodiments of the described methods are also applicable to circuit compression. Numerical simulations are also disclosed that demonstrate embodiments of the technology.
In certain embodiments, a circuit classifier is trained based on variational quantum circuits. The training can comprise, for example: receiving a set of labeled data; performing a coordinate-wise ascent to learn the circuit classifier for the labeled data, wherein the coordinate-wise ascent is performed on a classical computing device and thereby trains the circuit classifier; and executing the trained circuit classifier on a quantum computer. In some implementations, the circuit classifier has a structure with variational parameters. The circuit classifier can comprise a plurality of single-qubit and/or two-qubit gates that have variational parameters defining unitary action of the circuit on quantum states. In some examples, the training further comprises modifying variational parameters one at a time. In further examples, the training comprises maximizing a utility function by selection of one or more variational parameters. The utility function can, for example, apply a non-degenerate observable that has two different eigenvalues. In some examples, the variational parameters are fixed one by one, according to a predetermined schedule that visits each parameter at least once. In other examples, the variational parameters are fixed one by one, according to a randomized schedule that visits each parameter at least once. In some examples, the training comprises splitting the training data into smaller batches that are used by the utility function to update the variational quantum circuits.
Some example embodiments comprise receiving a batch of training samples, a variational quantum circuit skeleton for learning one or more variational parameters, and a set of initial values for the variational parameters; by a classical computer, generating a classical description of a quantum program to be implemented by a quantum circuit; by the classical computer, training the quantum circuit described by the quantum program using the batch of training samples and incrementally adjusting the variational parameters to improve prediction of a set of test data; and implementing the trained quantum circuit described by the quantum program on a quantum computing device.
In some examples, one or more of the following are received: parameter tolerance bounds, or a bound on a maximum number of iterations to be performed during the training. In some implementations, the training is performed iteratively by computing analytic expressions for expectation values of the training data to increase a probability of correct identification of training labels. In some implementations, the expectation value is inferred by computing overlaps between quantum states. For example, the overlap computation can be performed by a Hadamard test to infer the real and imaginary components of the overlap. In further examples, the training sample is given by a qubit encoding or an amplitude encoding in which data is represented as amplitudes or phases of a state vector of qubits of the quantum circuit.
Another example embodiment is a system, comprising: a quantum computing device; and a classical computing device in communication with the quantum computing device, the classical computing device being programmed to predict a class label using a quantum computer that applies a trained quantum circuit to a representation of the input data, measures the quantum state, and generates a sampled bit for inferring the class label. In certain implementations, the representation of the input data is given by an amplitude encoding of the data or a qubit encoding of the data. In further implementations, the classical computing device is programmed to train the quantum computer to predict a class label using a pre-trained classifier circuit. In further implementations, the coordinate ascent procedure uses hyperparameters. In some implementations, the set of hyperparameters includes one or more of: (a) a depth of the quantum circuit that is being trained; (b) a size of a mini-batch in training the quantum circuit; (c) a maximum number of iterations used for training; (d) a number of random restarts that are applied in the training. In certain implementations, a program run on a classical computer is used for sweeping through feasible values of hyperparameters and post-selecting quantum circuit(s) with the optimal hyperparameter choices.
Any of the disclosed embodiments can be implemented by one or more computer-readable media storing computer-executable instructions, which when executed by a computer cause the computer to perform any of the disclosed methods. Also disclosed herein are systems for performing embodiments of the disclosed methods, the systems comprising a classical computer configured to program, control, and/or measure a quantum computing device.
The foregoing and other objects, features, and advantages of the disclosed technology will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
As used in this application, the singular forms “a,” “an,” and “the” include the plural forms unless the context clearly dictates otherwise. Additionally, the term “includes” means “comprises.” Further, the term “coupled” does not exclude the presence of intermediate elements between the coupled items. Further as used herein, the term “and/or” means any one item or combination of any items in the phrase.
Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed systems, methods, and apparatus can be used in conjunction with other systems, methods, and apparatus. Additionally, the description sometimes uses terms like “produce” and “provide” to describe the disclosed methods. These terms are high-level abstractions of the actual operations that are performed. The actual operations that correspond to these terms will vary depending on the particular implementation and are readily discernible by one of ordinary skill in the art.
With reference to
The computing environment can have additional features. For example, the computing environment 100 includes storage 140, one or more input devices 150, one or more output devices 160, and one or more communication connections 170. An interconnection mechanism (not shown), such as a bus, controller, or network, interconnects the components of the computing environment 100. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 100, and coordinates activities of the components of the computing environment 100.
The storage 140 can be removable or non-removable, and includes one or more magnetic disks (e.g., hard drives), solid state drives (e.g., flash drives), magnetic tapes or cassettes, CD-ROMs, DVDs, or any other tangible non-volatile storage medium which can be used to store information and which can be accessed within the computing environment 100. The storage 140 can also store instructions for the software 180 implementing any of the disclosed techniques. The storage 140 can also store instructions for the software 180 for generating and/or synthesizing any of the described techniques, systems, or quantum circuits.
The input device(s) 150 can be a touch input device such as a keyboard, touchscreen, mouse, pen, trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 100. The output device(s) 160 can be a display device (e.g., a computer monitor, laptop display, smartphone display, tablet display, netbook display, or touchscreen), printer, speaker, or another device that provides output from the computing environment 100.
The communication connection(s) 170 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
As noted, the various methods and techniques for performing any of the disclosed technologies, for controlling a quantum computing device, to perform circuit design or compilation/synthesis as disclosed herein can be described in the general context of computer-readable instructions stored on one or more computer-readable media. Computer-readable media are any available media (e.g., memory or storage device) that can be accessed within or by a computing environment. Computer-readable media include tangible computer-readable memory or storage devices, such as memory 120 and/or storage 140, and do not include propagating carrier waves or signals per se (tangible computer-readable memory or storage devices do not include propagating carrier waves or signals per se).
Various embodiments of the methods disclosed herein can also be described in the general context of computer-executable instructions (such as those included in program modules) being executed in a computing environment by a processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, and so on, that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
An example of a possible network topology 200 (e.g., a client-server network) for implementing a system according to the disclosed technology is depicted in
Another example of a possible network topology 300 (e.g., a distributed computing environment) for implementing a system according to the disclosed technology is depicted in
With reference to
The environment 400 includes one or more quantum processing units 402 and one or more readout device(s) 408. The quantum processing unit(s) execute quantum circuits that are precompiled and described by the quantum computer circuit description. The quantum processing unit(s) can be one or more of, but are not limited to: (a) a superconducting quantum computer; (b) an ion trap quantum computer; (c) a fault-tolerant architecture for quantum computing; and/or (d) a topological quantum architecture (e.g., a topological quantum computing device using Majorana zero modes). The precompiled quantum circuits, including any of the disclosed circuits, can be sent into (or otherwise applied to) the quantum processing unit(s) via control lines 406 at the control of quantum processor controller 420. The quantum processor controller (QP controller) 420 can operate in conjunction with a classical processor 410 (e.g., having an architecture as described above with respect to
With reference to
In other embodiments, compilation and/or verification can be performed remotely by a remote computer 460 (e.g., a computer having a computing environment as described above with respect to
Variational quantum circuits (VQCs) are a rapidly developing technology crucial for many machine learning applications. The technology is especially desirable for solutions involving noisy intermediate-scale quantum devices. In this disclosure, embodiments are disclosed that substantially simplify and streamline the design of variational quantum circuits for classification. These embodiments allow for the closed-form analysis of partial derivatives of a given variational circuit with respect to any chosen parameter.
In more detail, and in comparison to other approaches, embodiments of the disclosed technology: (1) asymptotically reduce the number of training epochs used; (2) within each epoch, asymptotically reduce the number of queries to a quantum coprocessor used; (3) provide for a robust monotonic convergence to a local optimum; and/or (4) asymptotically reduce the number of destructive measurements used.
In embodiments of the disclosed technology, simplified variational quantum circuits refer to a composition of generalized rotations $R(\theta, P)=e^{-i\theta P}$ or generalized controlled rotations $CR(\theta, P, \Pi)=\Pi^{\perp}+\Pi e^{-i\theta P}$, where $\Pi$ is a projector, $\Pi^{\perp}$ is its orthogonal complement ($\Pi^2=\Pi$, $\Pi+\Pi^{\perp}=I$), and $P$ and $\Pi$ form a commuting pair: $[P,\Pi]=0$. The uncontrolled generalized rotation is a special case of the controlled one: $R(\theta, P)=CR(\theta, P, I)$. It is also noted that the customary notion of “controlled” is also a special case, where $\Pi$ projects onto a certain half-space spanned by a half of the standard computational basis.
Remark 1. The commutativity requirement [P, Π]=0 in the above definition of CR(θ, P, Π) is adopted for convenience only. In the absence of commutativity, one can define CR as CR(θ, P, Π)=(Π⊥+Πe−iθPΠ), which defines a unitary operator in this more general case. Since this would lead to somewhat more intricate mathematics in the following sections, the commutativity requirement is adopted here. It is satisfied in all further scenarios of interest.
In this example description of the coordinate ascent circuit strategy, one can start with the problem of approximating a target unitary operator and then proceed to develop a flavor of the strategy that trains variational quantum classifier circuits. The training of variational quantum classifiers is a desirable goal.
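In both of the following sections, and further in this disclosure, the following ansatz for a variational quantum circuit $U([\theta],[P],[\Pi])$ is used:
$$U([\theta],[P],[\Pi])=\prod_{j=1}^{L} CR(\theta_j,P_j,\Pi_j), \qquad (1)$$
where for each $j\in[L]$, $\theta_j$ is a real-valued parameter, $P_j^2=I$ and $\Pi_j^2=\Pi_j$.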
A typical example of a rapidly-entangling variational quantum circuit, in five qubits is shown in
In this subsection:
Let $S$ be a finite set of data samples, where each sample $\chi\in S$ comes with an efficient encoding into a vector $|\chi\rangle$ of some Hilbert space $\mathcal{H}$ that is common to all the data samples.
Let $A$ be a unitary operator on the Hilbert space $\mathcal{H}$; the goal is to approximate the action of the operator $A$ on the ensemble of the encoded data samples from $S$.
To do this, one can pick a sufficiently large $L$ and look for a circuit of the form (1) (where all $P_j$, $\Pi_j$ operate on $\mathcal{H}$) that optimally emulates the action of $A$ on $S$.
One can define the optimal circuit as a circuit that maximizes (or otherwise improves) the following utility function:
This utility function has the upper bound of 1, which is reached if and only if the circuit U([θ], [P], [Π]) provides precise emulation of A on the sample set. Otherwise, it measures the mean cosine similarity between the images of the data samples under the action of U([θ], [P], [Π]) and A, respectively.
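For a chosen $j\in[L]$, one can split the variational circuit (1) into
$$U([\theta],[P],[\Pi])=V_j\,CR(\theta_j,P_j,\Pi_j)\,W_j, \qquad (3)$$
where $V_j$ and $W_j$ are the products of the factors of (1) on either side of the $j$-th factor (so that neither depends on $\theta_j$), and where one can explicitly use the representation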
CR(θj,Pj,Πj)=Πj⊥+Πj(cos(θj)I−i sin(θj)Pj). (4)
By usual convention, a product expression (as shown above) defaults to an identity operator when the number of factors is less than one.
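As a non-limiting illustration of representation (4), the following Python sketch builds a generalized controlled rotation numerically and checks it against the exponential form. The two-qubit choice of P (Pauli Y on the second qubit) and Π (first qubit in the state 1), the angle, and all function names are assumptions made for this illustration only.

```python
import numpy as np

def controlled_rotation(theta, P, Pi):
    """Representation (4): Pi_perp + Pi (cos(theta) I - i sin(theta) P)."""
    identity = np.eye(P.shape[0], dtype=complex)
    Pi_perp = identity - Pi
    return Pi_perp + Pi @ (np.cos(theta) * identity - 1j * np.sin(theta) * P)

def exp_minus_i_theta(theta, P):
    """e^{-i theta P}, computed independently from the eigendecomposition of Hermitian P."""
    eigvals, eigvecs = np.linalg.eigh(P)
    return eigvecs @ np.diag(np.exp(-1j * theta * eigvals)) @ eigvecs.conj().T

# Illustrative (assumed) two-qubit gate: control on qubit 0 being |1>, Pauli Y on qubit 1.
Y = np.array([[0, -1j], [1j, 0]])
I2 = np.eye(2, dtype=complex)
proj_one = np.array([[0, 0], [0, 1]], dtype=complex)
P = np.kron(I2, Y)          # P squares to the identity
Pi = np.kron(proj_one, I2)  # projector that commutes with P
theta = 0.73

CR = controlled_rotation(theta, P, Pi)
reference = (np.eye(4) - Pi) + Pi @ exp_minus_i_theta(theta, P)

commutator_norm = np.linalg.norm(P @ Pi - Pi @ P)
unitarity_error = np.linalg.norm(CR.conj().T @ CR - np.eye(4))
representation_error = np.linalg.norm(CR - reference)
print(commutator_norm, unitarity_error, representation_error)
```

All three printed quantities are expected to be at the level of floating-point round-off, reflecting that (4) agrees with the exponential form whenever P squares to the identity and commutes with Π.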
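The dependence of the utility function on $\theta_j$ can now be explicitly written out: by the above split and representation (4), it is a term that does not depend on $\theta_j$ plus $B_j(\tilde{\theta}_j)\cos(\theta_j)+C_j(\tilde{\theta}_j)\sin(\theta_j)$.
Here, $\tilde{\theta}_j$ is introduced to denote the “punctured” list of parameters, in particular $\tilde{\theta}_j=[\theta_1,\ldots,\theta_{j-1},\theta_{j+1},\ldots,\theta_L]$.
It follows that the partial derivative of the utility function with respect to $\theta_j$ is $-B_j(\tilde{\theta}_j)\sin(\theta_j)+C_j(\tilde{\theta}_j)\cos(\theta_j)$.
Since each of the angles $\theta_j$ spans the compact smooth circumference $S^1$, the conditional maximizer $\theta_j^{*}$ of the utility function over $0\le\theta_j<2\pi$ is attained at a critical point, where this derivative vanishes, so that
$\tan(\theta_j^{*})=C_j(\tilde{\theta}_j)/B_j(\tilde{\theta}_j)$.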
Thus, for any choice of fixed values in θ˜j, one can compute the conditional maximum of the utility function in θj (given these values) in closed form. This eliminates problems related to a bad choice of learning rates, problems with vanishing stochastic gradients, and/or, more generally, problems with slow or unstable convergence of stochastic gradient ascent.
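The closed-form conditional maximum can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the utility is taken to be the mean real part of the overlap between the images of each sample under the circuit and under A, the factors of the circuit are multiplied in list order, and the three gates, the sample set, and all function names are hypothetical.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
I4 = np.eye(4, dtype=complex)
P1 = np.diag([0, 1]).astype(complex)  # projector onto |1>

# Hypothetical two-qubit circuit skeleton: (P_j, Pi_j) pairs with [P_j, Pi_j] = 0.
gates = [(np.kron(Y, I2), I4), (np.kron(I2, X), np.kron(P1, I2)), (np.kron(Z, Y), I4)]

def cr(theta, P, Pi):
    return (I4 - Pi) + Pi @ (np.cos(theta) * I4 - 1j * np.sin(theta) * P)

def product(mats):
    # An empty product defaults to the identity operator.
    return reduce(np.matmul, mats, I4)

def circuit(thetas):
    return product([cr(t, P, Pi) for t, (P, Pi) in zip(thetas, gates)])

def utility(thetas, samples, A):
    # Assumed utility: mean real part of <U x | A x> over the sample set.
    U = circuit(thetas)
    return np.mean([np.vdot(U @ x, A @ x).real for x in samples])

def coordinate_update(thetas, j, samples, A):
    """Closed-form conditional argmax in theta_j: tan(theta*) = C / B."""
    V = product([cr(t, P, Pi) for t, (P, Pi) in zip(thetas[:j], gates[:j])])
    W = product([cr(t, P, Pi) for t, (P, Pi) in zip(thetas[j + 1:], gates[j + 1:])])
    P, Pi = gates[j]
    B = np.mean([np.vdot(V @ Pi @ W @ x, A @ x).real for x in samples])
    C = np.mean([np.vdot(-1j * (V @ Pi @ P @ W @ x), A @ x).real for x in samples])
    return np.arctan2(C, B)  # selects the maximizing branch of tan(theta) = C / B

A = circuit(rng.uniform(0, 2 * np.pi, len(gates)))  # target operator
samples = []
for _ in range(5):
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    samples.append(v / np.linalg.norm(v))

thetas = rng.uniform(0, 2 * np.pi, len(gates))
j = 1
updated = thetas.copy()
updated[j] = coordinate_update(thetas, j, samples, A)
utility_before = utility(thetas, samples, A)
utility_after = utility(updated, samples, A)
print(utility_before, utility_after)
```

Using the two-argument arctangent, rather than inverting the tangent directly, is a design choice of this sketch: it picks the critical point that is a maximum rather than the one that is a minimum.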
Consider now a mathematical description of a variational quantum classifier. Such a classifier strives to learn an optimal observable such that the expected value of that observable on a data sample predicts the class label to be assigned to that sample.
Let $A$ be some standard observable (for example, a Pauli operator) on the Hilbert space $\mathcal{H}$, expressed in the computational basis, and let $U([\theta])=U([\theta],[P],[\Pi])$ be a variational circuit of the form (1). For the quantum encoding $|\chi\rangle$ of an input data sample, one can look to use
$$\langle U([\theta],[P],[\Pi])\chi\,|\,A\,|\,U([\theta],[P],[\Pi])\chi\rangle$$
as the inference expectation.
For simplicity, it is assumed here and below that the observable $A$ is non-degenerate and has two eigenvalues $\lambda_1,\lambda_2\in\mathbb{R}$. Suppose, without loss of generality, that $\lambda_1>\lambda_2$.
Let $S$ be a finite set of data samples, and $\ell: S\to\{\lambda_1,\lambda_2\}$ be a two-level labeling function.
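By the spectral theorem,
$$A=\lambda_1\Pi_{\lambda_1}+\lambda_2\Pi_{\lambda_2},$$
where $\Pi_{\lambda_1}$ and $\Pi_{\lambda_2}$ are the projectors onto the eigenspaces of $A$ corresponding to $\lambda_1$ and $\lambda_2$, respectively.
Note that, because $\lambda_1>\lambda_2$, one can rewrite the two projectors as:
$$\Pi_{\lambda_1}=\frac{A-\lambda_2 I}{\lambda_1-\lambda_2},\qquad \Pi_{\lambda_2}=\frac{\lambda_1 I-A}{\lambda_1-\lambda_2}.$$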
One can infer the class label of an input data sample $\chi$ based on the overlap of its quantum encoding $|\chi\rangle$ with either of the two eigenspaces of the observable $U([\theta])^{\dagger}AU([\theta])$.
The mean probability of inferring the right label by this method can be expressed as:
Using the above formulae for the projectors, this can be rewritten as:
Expression (8) is what is going to be used as the classifier utility function. Again, because the angle parameters [θ] span a smooth compact manifold, the utility function reaches its maximum at one of the critical points in the interior of the parameter space.
For a chosen j∈[L], fix all the parameters in θ˜j=(θ1, . . . , θj−1, θj+1, . . . , θL). In order to find conditional maximum maxθ
Recall the split:
U([θ], [P], [Π])=VjCR(θj, Pj, Πj)Wj.
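In view of the explicit representation (4), one can rewrite each term of equation (8) as follows:
$$\langle\chi|U([\theta])^{\dagger}AU([\theta])|\chi\rangle=\langle V_j(\Pi_j^{\perp}+\Pi_j(\cos(\theta_j)I-i\sin(\theta_j)P_j))W_j\chi\,|\,A\,|\,V_j(\Pi_j^{\perp}+\Pi_j(\cos(\theta_j)I-i\sin(\theta_j)P_j))W_j\chi\rangle$$
and expand the right-hand side as
$$a_j(\tilde{\theta}_j,\chi)+b_j(\tilde{\theta}_j,\chi)\cos(\theta_j)+c_j(\tilde{\theta}_j,\chi)\sin(\theta_j)+d_j(\tilde{\theta}_j,\chi)\cos^2(\theta_j)+e_j(\tilde{\theta}_j,\chi)\sin(\theta_j)\cos(\theta_j)+f_j(\tilde{\theta}_j,\chi)\sin^2(\theta_j)$$
where:
$$a_j(\tilde{\theta}_j,\chi)=\langle V_j\Pi_j^{\perp}W_j\chi\,|\,A\,|\,V_j\Pi_j^{\perp}W_j\chi\rangle,$$
$$b_j(\tilde{\theta}_j,\chi)=2\,\Re\langle V_j\Pi_j^{\perp}W_j\chi\,|\,A\,|\,V_j\Pi_jW_j\chi\rangle,$$
$$c_j(\tilde{\theta}_j,\chi)=2\,\Im\langle V_j\Pi_jP_jW_j\chi\,|\,A\,|\,V_j\Pi_j^{\perp}W_j\chi\rangle,$$
$$d_j(\tilde{\theta}_j,\chi)=\langle V_j\Pi_jW_j\chi\,|\,A\,|\,V_j\Pi_jW_j\chi\rangle,$$
$$e_j(\tilde{\theta}_j,\chi)=2\,\Im\langle V_j\Pi_jP_jW_j\chi\,|\,A\,|\,V_j\Pi_jW_j\chi\rangle,$$
$$f_j(\tilde{\theta}_j,\chi)=\langle V_j\Pi_jP_jW_j\chi\,|\,A\,|\,V_j\Pi_jP_jW_j\chi\rangle.$$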
where, for each (S, s)∈{(B, b), (C, c), (D, d), (E, e), (F, f)}:
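In order to obtain closed-form solutions at the critical points, where the partial derivative of the utility function with respect to $\theta_j$ vanishes (equation (9)), one can introduce the intermediate variable $t=\tan(\theta_j/2)$ and, by direct manipulation, find that (9) is equivalent to
$$(E_j(\tilde{\theta}_j)-C_j(\tilde{\theta}_j))\,t^4-2\,(B_j(\tilde{\theta}_j)-2D_j(\tilde{\theta}_j)+2F_j(\tilde{\theta}_j))\,t^3-6E_j(\tilde{\theta}_j)\,t^2 \qquad (10)$$
$$-\,2\,(B_j(\tilde{\theta}_j)+2D_j(\tilde{\theta}_j)-2F_j(\tilde{\theta}_j))\,t+(E_j(\tilde{\theta}_j)+C_j(\tilde{\theta}_j))=0. \qquad (11)$$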
Therefore, in certain cases, there are at most four candidate values for the conditional argmax of the utility function that can be inspected in a purely classical loop once the coefficients in (9) are collected. In principle, these candidates could be written out in closed form (using the Ferrari or Euler formulae). However, note that, in certain embodiments, the focus is on the real roots of the quartic in t, and note also that the coefficients of that quartic are typically noisy; therefore, an approximate search for real roots only is well justified.
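One possible form of the purely classical loop over candidate roots is sketched below in Python. It assumes that the coefficients B, C, D, E, F have already been collected, that the θ-dependent part of the utility is B cos θ + C sin θ + D cos²θ + E sin θ cos θ + F sin²θ, and that an approximate numerical root finder (here `numpy.roots`) is acceptable; the function names and the sample coefficient values are hypothetical. Because the substitution t = tan(θ/2) cannot represent θ = π, that angle is added as an extra candidate.

```python
import numpy as np

def trig_utility(theta, B, C, D, E, F):
    """Theta-dependent part of the utility for one coordinate."""
    s, c = np.sin(theta), np.cos(theta)
    return B * c + C * s + D * c * c + E * s * c + F * s * s

def conditional_argmax(B, C, D, E, F, imag_tol=1e-8):
    """Inspect the real roots of the quartic (10)-(11) in t = tan(theta / 2)."""
    quartic = [E - C,
               -2.0 * (B - 2.0 * D + 2.0 * F),
               -6.0 * E,
               -2.0 * (B + 2.0 * D - 2.0 * F),
               E + C]
    roots = np.roots(quartic)                       # leading zeros are trimmed by numpy
    real_t = roots[np.abs(roots.imag) < imag_tol].real
    candidates = np.append(2.0 * np.arctan(real_t), np.pi)  # theta = pi is t = infinity
    values = trig_utility(candidates, B, C, D, E, F)
    return candidates[np.argmax(values)]

# Hypothetical coefficient values, for illustration only.
B, C, D, E, F = 0.3, -0.2, 0.5, 0.4, -0.1
theta_star = conditional_argmax(B, C, D, E, F)
print(theta_star, trig_utility(theta_star, B, C, D, E, F))
```

Evaluating the utility at each candidate, instead of classifying critical points by a second derivative, keeps the loop simple and tolerant of noisy coefficients.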
When the j-th factor of U(θ) is an uncontrolled unitary, e.g., $\Pi_j=I$, $\Pi_j^{\perp}=0$, then equation (9) collapses into a much simpler equation that has been discussed, for instance, in M. Ostaszewski, E. Grant, and M. Benedetti, “Quantum circuit structure learning,” (2019), arXiv:1905.09692.
In the definition of the generalized controlled rotation $CR(\theta, P, \Pi)=\Pi^{\perp}+\Pi e^{-i\theta P}$ above, a general notion of a projector is allowed. To make the preceding computations constructive, however, it is desirable to bind the projector with a unitary operator that is efficiently computed.
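For example, suppose $G$ is some easy-to-compute unitary gate with two eigenvalues $\mu_1,\mu_2$, and
$$G=\mu_1\Pi_1+\mu_2\Pi_2$$
is the spectral decomposition of $G$ (with $\Pi_1$, $\Pi_2$ being the projectors onto the corresponding eigenspaces).
Then
$$\Pi_1=\frac{G-\mu_2 I}{\mu_1-\mu_2},\qquad \Pi_2=\frac{G-\mu_1 I}{\mu_2-\mu_1}.$$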
Under this assumption, any overlap involving projectors can be computed as a linear combination of purely unitary expectations. For example, in the above derivation
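In particular, a one-qubit control on qubit number $c$ can be written with $G_c=I\otimes\cdots\otimes Z_c\otimes\cdots\otimes I$, which has eigenvalues $\mu_1=1$, $\mu_2=-1$. Here, for $\Pi=\Pi_2$, one has $\Pi=\tfrac{1}{2}(I-G_c)$ and $\Pi^{\perp}=\tfrac{1}{2}(I+G_c)$.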
Similarly, if one wants projectors Π, Π⊥ to implement two-level control with control qubits c1, c2, then one uses the two-qubit controlled Z gate CZc
Any number of control levels can be implemented in this fashion.
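The binding of projectors to efficiently computed unitaries can be illustrated with the following Python sketch. It relies on the standard identity that, for a unitary G with exactly two distinct eigenvalues μ1 and μ2, the eigenprojectors are (G − μ2 I)/(μ1 − μ2) and (G − μ1 I)/(μ2 − μ1). The register size, the control qubit index, the random test state, and the function names are assumptions of this illustration.

```python
import numpy as np

def eigenprojectors(G, mu1, mu2):
    """Eigenprojectors of a unitary G that has exactly two distinct eigenvalues."""
    eye = np.eye(G.shape[0], dtype=complex)
    Pi1 = (G - mu2 * eye) / (mu1 - mu2)
    Pi2 = (G - mu1 * eye) / (mu2 - mu1)
    return Pi1, Pi2

def z_on_qubit(c, n):
    """G_c = I x ... x Z_c x ... x I on an n-qubit register."""
    Z = np.diag([1.0, -1.0]).astype(complex)
    op = np.eye(1, dtype=complex)
    for q in range(n):
        op = np.kron(op, Z if q == c else np.eye(2, dtype=complex))
    return op

n, c = 3, 1                                 # illustrative register size and control qubit
G = z_on_qubit(c, n)
Pi1, Pi2 = eigenprojectors(G, 1.0, -1.0)    # Pi2 = (I - Z_c) / 2: control qubit in |1>

# An overlap involving the projector, rewritten as a purely unitary expectation.
rng = np.random.default_rng(0)
psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi /= np.linalg.norm(psi)
overlap_direct = np.vdot(psi, Pi2 @ psi).real
overlap_from_unitary = 0.5 * (1.0 - np.vdot(psi, G @ psi).real)

# Two-level control: the projector onto |11> obtained from the controlled-Z gate.
CZ = np.diag([1.0, 1.0, 1.0, -1.0]).astype(complex)
Pi_11 = eigenprojectors(CZ, 1.0, -1.0)[1]
print(overlap_direct, overlap_from_unitary)
```

The two printed overlaps are expected to agree up to round-off, which is the sense in which a projector overlap reduces to a linear combination of unitary expectations.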
In many of the disclosed embodiments, variational circuits are composed (in some cases, solely) of single-qubit and two-qubit gates. However, the above discussion explains how these designs can be readily generalized (if needed) to involve more sophisticated multi-qubit unitaries.
Let $\chi=[\chi_1,\ldots,\chi_N]$ be some classically defined vector in $\mathbb{R}^N$. Assume, for simplicity, that $N=2^n$ where $n$ is some integer. A quantum amplitude encoding of $\chi$ is a quantum state $|\chi\rangle$ such that the probability of measuring some $|j\rangle$, $j\in[N]$, in this state is $\chi_j^2/|\chi|^2$. The task of a desirable amplitude encoding given $\chi$ is the task of finding a quantum circuit $C_\chi$ that constructs a quantum amplitude encoding of $\chi$ from some non-informative “free” state, for example $C_\chi|0\rangle=|\chi\rangle$.
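At the level of state vectors, the amplitude encoding described above amounts to normalizing the classical vector, as in the following Python sketch. The sketch only computes the target amplitudes classically; it does not construct the encoder circuit, and the function name and sample vector are hypothetical.

```python
import numpy as np

def amplitude_encode(x):
    """Return the amplitudes of |x> and the qubit count n for a vector of length 2**n."""
    x = np.asarray(x, dtype=float)
    size = x.size
    if size == 0 or size & (size - 1) != 0:
        raise ValueError("vector length must be a power of two")
    norm = np.linalg.norm(x)
    if norm == 0.0:
        raise ValueError("the zero vector has no amplitude encoding")
    return x / norm, size.bit_length() - 1

x = [3.0, 0.0, 4.0, 0.0]
state, n_qubits = amplitude_encode(x)
probabilities = state ** 2      # probability of measuring |j> is x_j^2 / |x|^2
print(n_qubits, state, probabilities)
```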
It is known that for a vector χ in a general position, a desirable encoder circuit Cχ will contain Ω(N) single- and two-qubit gates. Since in an interesting machine learning training loop, encoder circuits for the classical data samples will be invoked millions, if not billions, of times, it makes sense to, in some embodiments, preprocess the classical samples once and associate the shortest possible amplitude encoder circuit with each data sample. Due to the above-mentioned asymptotic lower bound in Ω(N), there is a limit on how much an encoder circuit can be compressed. Since the data in a machine learning problem is almost always inherently noisy and since the inference in such problems is almost never expected to be perfect, it makes sense to look for resource-optimal approximate quantum encoders for classical data.
For example, suppose $f\gg 0$ is the desired fidelity of an approximate encoding. If $C_\chi$ is an $n$-qubit exact encoder for $\chi$, $C_\chi|0\rangle=|\chi\rangle$, and if $\tilde{C}_\chi$ is an inexact encoder, then the fidelity goal can be written as
$$\langle C_\chi 0\,|\,\tilde{C}_\chi 0\rangle>f$$
Assuming, again, that $\dim\chi=N=2^n$ and both circuits $C$ and $\tilde{C}$ are $n$-qubit circuits, one can frame the task of approximate quantum encoding as the task of finding a variational quantum circuit $\tilde{C}$ that approximates the known fixed unitary operator $C$ on a one-sample reference set $S_0=\{[1, 0, \ldots, 0]\}$. In this setting, the problem of finding a desirable (e.g., optimal) $\tilde{C}$ is a special case of an operator approximation problem that has been solved in section IIIB.
In many cases, however, the desired approximate encoder circuit $\tilde{C}$ can be made quadratically shorter if one were allowed to use clean ancillary qubits.
In some embodiments, for instance, suppose one added $m$ clean ancillary qubits to the register and the goal now is to approximately emulate the fixed $n$-qubit unitary operator $C_\chi$ using a circuit $\tilde{C}$ on the full register of $n+m$ qubits. The emulation task can be defined as the task of approximating some product state $|\chi\rangle\otimes|\alpha\rangle$, where $|\alpha\rangle$ is an arbitrary $m$-qubit ancillary state. In case one wanted to represent the desired encoding $|\chi\rangle$ exactly, one would have $(C_\chi^{\dagger}\otimes I_m)\tilde{C}|0\rangle^{\otimes(n+m)}=|0\rangle^{\otimes n}\otimes|\alpha\rangle$. Thus, for the exact encoding, the state $(C_\chi^{\dagger}\otimes I_m)\tilde{C}|0\rangle^{\otimes(n+m)}$ lies entirely in the $2^m$-dimensional subspace spanned by the states of the form $|0\rangle^{\otimes n}\otimes|\alpha\rangle$. In the approximate encoding setting, one wants the overlap of $(C_\chi^{\dagger}\otimes I_m)\tilde{C}|0\rangle^{\otimes(n+m)}$ with that subspace to be as close to 1 as possible.
Let $\Pi_\alpha$ be the orthogonal projector of the full $2^{n+m}$-dimensional state space onto the subspace spanned by the states of the form $|0\rangle^{\otimes n}\otimes|\alpha\rangle$. Let $f$ be some fidelity threshold that is asymptotically close to 1. Then ultimately the approximate emulator circuit $\tilde{C}$ at fidelity $f$ can be defined as one satisfying the condition
$$\|\Pi_\alpha(C_\chi^{\dagger}\otimes I_m)\tilde{C}|0\rangle^{\otimes(n+m)}\|^2>f$$
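In order to derive an embeddable unitary version of this condition, consider the $n$-qubit reflection operator $R_{|0\rangle}$ that flips the amplitude sign of the single basis state $|0\rangle^{\otimes n}$ (naturally, $R_{|0\rangle}$ is an adjoint of the $(n-1)$-times controlled Pauli Z). Then $\Pi_\alpha=\tfrac{1}{2}(I_{n+m}-R_{|0\rangle}\otimes I_m)$,
$$\|\Pi_\alpha(C_\chi^{\dagger}\otimes I_m)\tilde{C}|0\rangle^{\otimes(n+m)}\|^2=\tfrac{1}{2}\left(1-\langle(C_\chi^{\dagger}\otimes I_m)\tilde{C}\,0^{\otimes(n+m)}\,|\,(R_{|0\rangle}\otimes I_m)\,|\,(C_\chi^{\dagger}\otimes I_m)\tilde{C}\,0^{\otimes(n+m)}\rangle\right),$$
and the task of approximating to fidelity $f$ can be reinterpreted as the task of minimizing $\langle(C_\chi^{\dagger}\otimes I_m)\tilde{C}\,0^{\otimes(n+m)}\,|\,(R_{|0\rangle}\otimes I_m)\,|\,(C_\chi^{\dagger}\otimes I_m)\tilde{C}\,0^{\otimes(n+m)}\rangle$ so that
$$\langle(C_\chi^{\dagger}\otimes I_m)\tilde{C}\,0^{\otimes(n+m)}\,|\,(R_{|0\rangle}\otimes I_m)\,|\,(C_\chi^{\dagger}\otimes I_m)\tilde{C}\,0^{\otimes(n+m)}\rangle<1-2f. \qquad (12)$$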
If one can choose a trainable template for the desired $\tilde{C}$ according to the ansatz (1), then (12) can be solved with the tools developed in section IIIC, with the stipulation that the sample set now consists of the single “sample” $|0\rangle^{\otimes(n+m)}$.
In practice, the ancilla count m=2 is almost always sufficient for quadratic compression of an approximate state encoder, compared to ancilla-free approximations of the same encoder.
Embodiments of the disclosed technology are not dependent on learning rate scheduling and do not require an asymptotically significant number of epochs to converge.
However, since embodiments of the disclosed variational circuit learning technique improve (e.g., optimize) essentially the same utility function with the same optimization landscape, the technique may not provide a different principled resolution to the problem of finding the global optimum. (The problem of global optimization over non-convex landscapes is, in general, NP-hard.) Therefore, it is desirable to explore the global utility function landscape either by ad hoc sampling in the parameter space or by more sophisticated strategies, such as simulated annealing or Bayesian inference.
Here, one can start with the subalgorithm that obtains an approximation of some local maximum of the utility function that is, in some sense, close to a given starting point in the parameter space. The pseudo-code for the algorithm is shown in
The argmaxσ
In order to compare the coordinate ascent algorithm to other algorithms, one can measure performance in terms of the quantum overlap values computed. Given a measurable observable $A$ and some small tolerance value $\delta>0$, the computation unit that is referred to as a quantum overlap unit is the task of computing either $\Re\langle\chi|A|y\rangle$ or $\Im\langle\chi|A|y\rangle$, where the quantum states $\chi$ and $y$ can be prepared at some constant cost $c_{prep}$. (The preparation cost will be roughly the same for all the data samples involved in estimating the quantum utility function.)
Given the circuits $C_\chi$, $C_y$ to prepare quantum encodings of the two data samples, $|\chi\rangle=C_\chi|0\rangle$, $|y\rangle=C_y|0\rangle$, either the real or the imaginary part of the overlap can be estimated by estimating $\langle 0|C_\chi^{\dagger}AC_y|0\rangle$ to the required precision $\delta$ using one ancillary qubit and a version of the Hadamard test. In this context, the cost of an instance of the Hadamard test is fixed, and $O(1/\delta^2)$ instances of the test are used to estimate either the real or the imaginary part of the overlap.
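The sampling cost of the Hadamard test can be illustrated with a small classical simulation. The Python sketch below is a minimal illustration under stated assumptions: the observable A is taken to be a Pauli operator (hence also unitary), the encoder circuits are stand-in random unitaries, the ancilla statistics are computed from the state vector and then sampled, and all names are hypothetical. The real part uses the plain Hadamard test; the imaginary part uses the variant with an additional phase of −i on the controlled branch.

```python
import numpy as np

def hadamard_test_p0(W, psi, imaginary=False):
    """Probability that the ancilla reads 0 in a Hadamard test of <psi|W|psi>."""
    phase = -1j if imaginary else 1.0
    ancilla_zero_branch = 0.5 * (psi + phase * (W @ psi))  # after the final Hadamard
    return float(np.vdot(ancilla_zero_branch, ancilla_zero_branch).real)

def estimate_overlap_part(W, psi, delta, rng, imaginary=False):
    """Estimate Re or Im of <psi|W|psi> from about 1/delta^2 ancilla samples."""
    shots = int(np.ceil(1.0 / delta ** 2))
    p0 = hadamard_test_p0(W, psi, imaginary)
    zeros = rng.random(shots) < p0
    return 2.0 * zeros.mean() - 1.0        # P(0) = (1 + Re or Im of the overlap) / 2

def random_unitary(dim, rng):
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
    return q

rng = np.random.default_rng(7)
dim = 4
C_x, C_y = random_unitary(dim, rng), random_unitary(dim, rng)   # stand-in encoders
A = np.kron(np.diag([1.0, -1.0]), np.eye(2)).astype(complex)    # Pauli Z on the first qubit
W = C_x.conj().T @ A @ C_y                                      # <x|A|y> = <0|W|0>
psi0 = np.zeros(dim, dtype=complex)
psi0[0] = 1.0

delta = 0.01
exact = W[0, 0]
re_est = estimate_overlap_part(W, psi0, delta, rng)
im_est = estimate_overlap_part(W, psi0, delta, rng, imaginary=True)
print(exact, re_est, im_est)
```

With about 1/δ² samples, the standard error of each estimate is at most δ, which is consistent with the O(1/δ²) count of test instances.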
In some embodiments, for a selected parameter index $j\in[0,\ldots,L-1]$, the corresponding coefficients $B_j, C_j, D_j, E_j, F_j$ that inform equations (9) and (10) are aggregated over the entire sample set $S$, and thus the overall cost of generating these coefficients is in $O(|S|/\delta^2)$. It is noted that the values of the coefficients $B_j, C_j, D_j, E_j, F_j$ can be collected independently given a sufficient width of the quantum register. Further parallelization is possible by computing the constituent overlaps simultaneously. (For example, $\langle V_jW_j\chi|A|V_jW_j\chi\rangle$, $\langle V_jGW_j\chi|A|V_jGW_j\chi\rangle$, and $\Re(\mu_1\langle V_jW_j\chi|A|V_jGW_j\chi\rangle)$, which inform the coefficient $d_j$, can be computed simultaneously.) A possible shortage of qubits for complete parallelization affects only the constant of the complexity term $O(|S|/\delta^2)$, but not the shape of this term.
In a typical scenario, each of the trainable parameters is visited at least once. The cost of one complete iteration over the parameter set is thus in $O(L|S|/\delta^2)$. Maximization with respect to several parameters is strongly related to commutation relations between the constituent quantum gates.
Consider the task of estimating a local optimum of the utility function to a precision δ by coordinate ascent starting at some initial guess θ[0]. This task will be referred to as a sprite(θ[0], δ). It is currently understood that for a θ[0] in general position the sprite(θ[0], δ) requires an iteration count limit maxIter in $O(\log(1/\delta))$. The numerical simulations corroborate this understanding. With this understanding, the overall cost of exploring one local maximum of the utility is in $O(L|S|\log(1/\delta)/\delta^2)$ when measured as a total count of the required number of instances of the Hadamard test.
It is notable that the sequence of improvements of the utility function under coordinate ascent is almost strictly monotonic. Ideally, if all the quantum gates and measurements are completely precise and if all the required state overlaps are computed with infinite precision, then the utility function can never decrease upon a coordinate update. In practice, due to accumulated imprecision, the utility function can occasionally experience small regressions by some amount in O(δ). Such regressions are observed infrequently.
Embodiments of the disclosed technology present one or more advantages over other techniques. For example:
First, in other approaches, convergence can be sensitive to the choice of learning rate. Poor choices of learning rates or learning schedules can lead to a non-converging process. Even when convergence is established, the precision of the achieved proxy to a local minimum depends strongly on the learning rate strategy. Therefore, to achieve a sufficient precision one typically needs to perform sweeps and model selection over significant (often, rather large) numbers of candidate learning rate strategies.
Second, some other approaches suffer from a pathology known as a “vanishing gradient”. This happens at parameter configurations where the cost function forms saddle points or “barren plateaus”. There are often no succinct criteria to distinguish true local minima from temporary gridlocks caused by vanishing gradients. This may lead to practical obstacles for reaching a global minimum in otherwise common scenarios. Coordinate methods are much less susceptible to problems of this kind.
Suppose the dimensionality of feature vectors that represent classical data samples is $N\sim 2^n$. The information-theoretical lower bound for the number of parameters required for preparing a faithful amplitude encoding of a random N-dimensional vector is N. However, meaningful data vectors seldom have maximum entropy. It should be understood that the number of parameterized quantum gates required for preparing a high-fidelity encoding of a classical vector scales as $O(\text{entropy}\cdot\log N)$, assuming a constant number of available ancillary qubits.
A motivation for this understanding is given by the following:
Proposition 2. Let n be the desired qubit count. Given a k-sparse vector in ℝ^(2^n) (e.g., a real-valued vector with k non-zero elements), an n-qubit quantum amplitude encoding of this vector can be prepared by a quantum circuit with O(k n) single- and two-qubit quantum gates using a quantum register with at most (n+2) qubits (i.e., assuming at most two additional ancillary qubits).
Proof. Recall that a two-level RY rotation RY^(j,ℓ)(θ) rotates the span(|j⟩, |ℓ⟩) as follows: |j⟩ ↦ cos(θ/2)|j⟩ + sin(θ/2)|ℓ⟩; |ℓ⟩ ↦ −sin(θ/2)|j⟩ + cos(θ/2)|ℓ⟩. It leaves the orthogonal complement of the span(|j⟩, |ℓ⟩) stationary.
A quantum encoding of a k-sparse vector can be prepared starting from |0…0⟩ by consecutive use of at most k two-level RY rotations, each of which moves amplitude from one basis state of the support to the next.
It is now desirable to estimate the cost of the two-level rotation RY^(j,ℓ)(θ) for an angle θ in general position. Because an asymptotic argument is being made, one can take n>5 in order to use a lemma from A. Barenco et al., Phys. Rev. A 52, 3457 (1995). Since RY(θ) = S H RZ(θ) H S†, the cost of RY^(j,ℓ)(θ) is asymptotically the same as that of the two-level rotation RZ^(j,ℓ)(θ). The latter, however, is an (n−1)-times multiplexed single-qubit diagonal unitary RZ(θ). It is well known that with one clean ancillary qubit the latter can be emulated with two single-qubit RZ rotations and three n-times multiplexed X gates using an (n+1)-qubit quantum register. With one additional ancillary qubit, each of these ((n+2)−2)-times multiplexed X gates can be emulated using O(n) 3-qubit Toffoli gates. Since a 3-qubit Toffoli gate has an exact representation of constant depth and size, each two-level rotation costs O(n) gates, and the O(k n) bound of the proposition follows for the at most k rotations.
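The first step of the proof, preparing a k-sparse vector with at most k two-level RY rotations, can be checked numerically. The following is a minimal sketch under stated assumptions: it works with dense matrices on a small register, it does not decompose the two-level rotations into Toffoli and single-qubit gates, and the function names and the particular choice of rotation angles are hypothetical rather than taken from this disclosure.

```python
import numpy as np

def two_level_ry(dim, j, l, theta):
    """Two-level RY rotation: |j> -> cos|j> + sin|l>, |l> -> -sin|j> + cos|l>."""
    u = np.eye(dim)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    u[j, j], u[j, l] = c, -s
    u[l, j], u[l, l] = s, c
    return u

def sparse_encoding_rotations(v):
    """Return (j, l, theta) triples that map |0...0> to the normalized sparse v.

    For a vector with a single non-zero entry the target is matched only up
    to a global sign.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    support = [int(i) for i in np.flatnonzero(v)]
    rotations = []
    if support[0] != 0:
        # A rotation by pi moves all amplitude from |0...0> to the first support state.
        rotations.append((0, support[0], np.pi))
    remaining = 1.0  # amplitude currently held by the "carrier" basis state
    for m in range(len(support) - 1):
        j, l = support[m], support[m + 1]
        if m == len(support) - 2:
            tail = v[l]  # last step: the sign of the final entry is set directly
        else:
            tail = np.sqrt(max(remaining**2 - v[j]**2, 0.0))
        # Leave v[j] on |j> and pass the rest of the amplitude on to |l>.
        rotations.append((j, l, 2 * np.arctan2(tail, v[j])))
        remaining = tail
    return rotations

def apply_rotations(dim, rotations):
    """Apply the rotations in order to the basis state |0...0>."""
    state = np.zeros(dim)
    state[0] = 1.0
    for j, l, theta in rotations:
        state = two_level_ry(dim, j, l, theta) @ state
    return state
```

For example, for a 3-sparse vector on n = 3 qubits (dimension 8), `sparse_encoding_rotations` returns at most three rotations, and `apply_rotations(8, rotations)` reproduces the normalized vector. Each rotation would then be compiled at O(n) gate cost as argued in the proof.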
While it can be shown that Θ(k n) is the asymptotically optimal gate cost of quantum encoding for k-sparse vectors, a question of practical importance for quantum machine learning is developing methods for the best effective approximate state preparation. The goal here can be described as finding an efficiently computable constant c_prep such that, for small δ>0, a δ-approximation of the desired encoding of a k-sparse vector can be prepared using a circuit with at most c_prep · k · n · log2(1/δ) single- and two-qubit rotation gates.
One can do this using methods developed in section IIID.
In some examples, the method further comprises receiving parameter tolerance bounds and/or a bound on the maximum number of iterations to be performed during the training. In some implementations, the training is performed iteratively by computing analytic expressions for expectation values of the training data to increase the probability of correct identification of training labels. In some implementations, the expectation value is inferred by computing overlaps between quantum states. For example, the overlap computation can be performed by a Hadamard test to infer the real and imaginary components of the overlap. In further examples, the training sample is given by a qubit encoding or an amplitude encoding in which data is represented as amplitudes or phases of a state vector of qubits of the quantum circuit.
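The Hadamard test mentioned above can be sketched with a small state-vector simulation. This is an illustrative sketch only: it computes exact outcome probabilities rather than sampling measurements, it places the ancilla as the most significant qubit, and the function names are hypothetical. For a unitary U and a state |ψ⟩, the test estimates ⟨ψ|U|ψ⟩, which is the overlap between |ψ⟩ and U|ψ⟩.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S_DAG = np.diag([1, -1j])  # inverse phase gate, used for the imaginary part

def hadamard_test(u, psi, imaginary=False):
    """Return P(ancilla=0) - P(ancilla=1), i.e. Re or Im of <psi|U|psi>."""
    dim = len(psi)
    eye = np.eye(dim)
    # Ancilla starts in |0>; the data register starts in |psi>.
    state = np.kron(np.array([1, 0], dtype=complex), psi)
    state = np.kron(H, eye) @ state
    if imaginary:
        state = np.kron(S_DAG, eye) @ state
    # Controlled-U: apply U to the data register only when the ancilla is |1>.
    zeros = np.zeros((dim, dim))
    controlled_u = np.block([[eye, zeros], [zeros, u]])
    state = controlled_u @ state
    state = np.kron(H, eye) @ state
    p0 = float(np.sum(np.abs(state[:dim]) ** 2))  # probability of ancilla = 0
    return 2 * p0 - 1

def overlap(u, psi):
    """Combine the two runs of the test into the complex value <psi|U|psi>."""
    return hadamard_test(u, psi) + 1j * hadamard_test(u, psi, imaginary=True)
```

On hardware, the probability p0 would be estimated from repeated measurements of the ancilla, so the result carries statistical error that shrinks with the number of repetitions.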
In certain implementations, the representation of the input data is given by an amplitude encoding of the data or a qubit encoding of the data. In further implementations, the classical computing device is programmed to train the quantum computer to predict a class label using a pre-trained classifier circuit. In further implementations, the coordinate ascent procedure uses hyperparameters. In some implementations, the set of hyperparameters includes one or more of: (a) a depth of the quantum circuit that is being trained; (b) a size of a mini-batch in training the quantum circuit; (c) a maximum number of iterations used for training; (d) a number of random restarts that are applied in the training. In certain implementations, a program run on a classical computer is used for sweeping through feasible values of the hyperparameters and post-selecting the quantum circuit(s) with the optimal hyperparameter choices.
Having described and illustrated the principles of the disclosed technology with reference to the illustrated embodiments, it will be recognized that the illustrated embodiments can be modified in arrangement and detail without departing from such principles. For instance, elements of the illustrated embodiments shown in software may be implemented in hardware and vice-versa. Also, the technologies from any example can be combined with the technologies described in any one or more of the other examples. For example, alternative embodiments can use a stochastic gradient descent approach that converges to a local minimum. It will be appreciated that procedures and functions such as those described with reference to the illustrated examples can be implemented in a single hardware or software module, or separate modules can be provided. The particular arrangements above are provided for convenient illustration, and other arrangements can be used.