The invention relates to a method and system for an extreme learning machine using a quantum processor or a quantum simulation processor.
One of the most promising directions for applying quantum computing in the quest for quantum advantage is machine learning (ML). There has been some effort in establishing possible routes to integrate quantum components into modern ML algorithms to ensure a training speedup and/or a performance enhancement, as is reported in Mujal et al., "Opportunities in Quantum Reservoir Computing and Extreme Learning Machines," published 10 Jul. 2021 on Arxiv.org (arXiv:2102.11831v2).
One of the challenges in designing quantum components for machine learning algorithms is the need to deal with decoherence and correct for the unavoidable presence of hardware noise. Quantum Extreme Learning Machine (QELM) is a hybrid classical-quantum framework that tackles the matter from a different angle and takes advantage of the complex and noisy dynamics of current NISQ devices to improve the learning effectiveness of ML tasks. This strategy has seen some exploration for temporal tasks involving time series but an adaptation for complex non-temporal classification tasks is still missing. This document sets out a method and system for this purpose.
This document describes a method and system for implementing a quantum-based machine learning system.
The quantum-based machine learning system uses noise in a quantum substrate (comprising a number of qubits). This noise causes the qubits to lose their quantum mechanical properties. The noise and the resulting loss of information in the qubits have traditionally been seen as a disadvantage. In the quantum-based machine learning system set out in this document, however, this noise is found to have a positive effect on information processing. In other words, the noise of the qubits can be used to enhance classical machine learning methods. The quantum-based machine learning system can leverage the rich dynamics that noisy quantum systems exhibit with their considerable number of degrees of freedom. The noise enables the generation of complex and interesting output states which are fed to the training layers.
Quantum noise is a strong limiting factor in gate-based quantum computing, so a paradigm like QELM shows a lot of potential and lends itself very well to current gate-based NISQ systems. A gate-based implementation, for example on superconducting quantum processors, can be realized and explained in two parts, data encoding and construction of the quantum substrate.
This document describes a method for implementing a quantum-based extreme learning machine using a quantum processor. The method comprises uploading an input features vector and applying, to the input features vector, a vector of optimal weights $\hat{\beta}$ previously generated from training data using the quantum processor, thereby generating a vector of expectation values to enable prediction of a new data point from the input features vector.
A method for training a quantum-based extreme learning machine using a quantum processor implementing a quantum substrate and a set of training data is also disclosed. The training data comprises input features vectors with a plurality of N parameters and a true labels vector. The method comprises uploading the training data to the quantum processor and encoding the uploaded training data. The input features vector is divided into a plurality of subsets and the plurality of subsets are passed through the quantum substrate to obtain a plurality of output vectors of expectation values. The plurality of output vectors of expectation values is concatenated to construct a matrix and an inverse matrix is computed from the matrix. This inverse matrix is multiplied by the true labels vector to obtain a vector of optimal weights which is used in the implementation of the extreme learning machine.
In one aspect, the inverse matrix is a Moore-Penrose pseudo inverse matrix.
The encoding is one of basis encoding, amplitude encoding, angle encoding, qsample encoding and Hamiltonian encoding and the encoding may be done redundantly.
The quantum substrate comprises n qubits, wherein n < N.
A computing system for implementing a quantum-based extreme learning machine is also disclosed and comprises a plurality of input/output devices for inputting training data and outputting the vector of optimal weights $\hat{\beta}$. The computing system has a gate-based quantum processor for implementing the extreme learning machine (ELM), which has an input layer, a quantum substrate with a plurality of noisy quantum gates, a connection layer and an output layer. In one aspect, the noisy quantum gates are a plurality of controlled-NOT gates (C-NOT or CX).
The invention will now be described on the basis of the drawings. It will be understood that the embodiments and aspects of the invention described herein are only examples and do not limit the protective scope of the claims in any way. The invention is defined by the claims and their equivalents. It will be understood that features of one aspect or embodiment of the invention can be combined with a feature of a different aspect or aspects and/or embodiments of the invention.
A graphics processing unit 35 for processing vector calculations and a field programmable gate array (FPGA) 40 for control logic can also be connected to the central processing unit 20. A quantum processor 50 (also termed a quantum accelerator) is connected to the central processing unit 20. In an alternative embodiment, the quantum processor 50 is simulated on a classical processor.
In one implementation of the computing system 10, the quantum processor 50 can be a gate-based quantum processor, such as one developed by IBM, but this is not limiting of the invention. The computing system 10 is connected to a computer network 60, such as the Internet. It will be appreciated that the computing system 10 is only one example of a suitable computing system.
The inspiration for the quantum extreme learning machine method and system set out in this document is its classical counterpart, the Extreme Learning Machine (ELM). The (classical) ELM is a feed-forward neural network that, unlike most neural network learning methods, does not require gradient-descent-based backpropagation. The idea of the ELM is to pass the data through a complex fixed dynamical system, i.e., a classical neural network, termed a "substrate", and then to perform the training process without tuning the substrate.
The ELM 200 has minimal requirements for learning (also termed "training"), mainly because the learning process is performed without an iterative tuning of the hidden nodes 230 in the feed-forward part of the substrate 220. The values of the hidden nodes 230 can be randomly assigned and are not updated or otherwise optimized during the learning process. It is only necessary to adjust the weights (connections) between the substrate 220 and the output layer 240 via, for example, a connection layer 250, which is a simple linear trainable layer that uses the Moore-Penrose pseudo-inverse method (explained later). In other words, the ELM offloads the bulk of the processing to a fixed complex system (i.e., the substrate 220), and desired input-output maps are achieved by only adjusting how the state of the ELM is post-processed between the substrate 220 and the output layer 240 in the connection layer 250. This post-processing between the substrate 220 and the output layer 240 enables the ELM 200 to avoid multiple iterations and local minimization during training, while maintaining good generalization and fast training times.
The ELM 200 is trained as follows. Training data 260 is first normalized so that the values of the parameters 262 in the training data take a value between 0 and 1 before being input into the ELM 200 through the input layer 210. The training data 260 comprises vectors with a plurality of the (normalized) parameters 262 and a corresponding true labels vector 265. The true labels vector 265 contains the results that have been observed when the parameters 262 are measured. For example, the parameters 262 might be sensor values measured over time in an industrial process and the true labels vector 265 would contain the outcomes of the industrial process. Alternatively, the parameters 262 might be financial data, such as option values or ticker prices, with the outcomes being the values of the underlying stocks or bonds.
The Moore-Penrose pseudo inverse method used in the connection layer 250 is a linear algebra technique used to approximate an inverse of non-invertible matrices and generalizes the notion of matrix inversion to arbitrary matrices. The Moore-Penrose pseudo inverse method can be used in the connection layer 250 of the ELM 200 because of its robustness and the minimal computational resources required for performing the method. After the fixed feed-forward part in the substrate 220, a matrix H is built in the connection layer 250 from the output of the substrate 220. The pseudo-inverse matrix $H^{+}$ of this matrix is computed and multiplied by the true labels vector 265 to obtain the optimal weights $\hat{\beta} = H^{+}T$, which are used to predict new data examples as:

$\hat{T} = H\hat{\beta}$

where $\hat{T}$ is the vector of predicted labels and $H$ is the matrix built from the substrate outputs for the new data to be classified.
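For illustration only, the classical training and prediction procedure described above can be sketched in Python using NumPy. The hidden-layer size, the sigmoid activation and the helper names below are illustrative assumptions and are not part of the method of this document.

# Minimal sketch of a classical ELM: fixed random substrate, trained only
# through the Moore-Penrose pseudo-inverse of the substrate output matrix H.
import numpy as np

def train_elm(X, T, n_hidden=64, seed=0):
    """X: (samples, N) normalized features in [0, 1]; T: (samples, classes) true labels."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # fixed random substrate weights (never tuned)
    b = rng.normal(size=n_hidden)                  # fixed random biases (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))         # substrate output matrix H
    beta = np.linalg.pinv(H) @ T                   # optimal weights via Moore-Penrose pseudo-inverse
    return W, b, beta

def predict_elm(X_new, W, b, beta):
    H_new = 1.0 / (1.0 + np.exp(-(X_new @ W + b)))
    return H_new @ beta                            # prediction: T_hat = H beta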
The substrate 220 with the hidden nodes 230 is replaced in the quantum implementation by a quantum substrate 220. The quantum substrate 220 is a dynamical quantum system with a number of qubits.
The training method for the Quantum ELM is outlined in the drawings.
In this method, an input features vector 400x, with $x = (x_1, \ldots, x_N)$, is input through the input/output device 30 to the quantum processor 50. The input features vector 400x has N features or parameters. The input features vector 400x is then uploaded in step S610 using single-qubit rotation gates applied to the initialized state as follows:

$|x\rangle = U(x_i)\,|0\rangle^{\otimes n}$

where $U$ is a unitary operation parameterized by $x_i$, such as an $R_x$, $R_y$ or $R_z$ rotation gate. This encoding strategy encodes the N features of the input feature vector 400x into n qubits with a constant circuit depth, where n ≥ N, so it is very efficient in terms of operations (see, for example, LaRose and Coyle, 2020, "Robust data encodings for quantum classifiers", Phys. Rev. A 102, 032420), but it is not optimal in the number of qubits used (as noted by Grant et al., 2018, "Hierarchical quantum classifiers", npj Quantum Information 4:65, pages 1-8). This limitation of the prior art means that, with the limited number of qubits available on current devices, it is not possible to encode all the N features of the input feature vectors 400x at once, which is a requirement for prior art extreme learning machine methods.
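For illustration only, one possible realisation of this rotation encoding of step S610 is sketched below in Python, assuming the Qiskit library; the helper name encode_features, the choice of the Ry gate (any of Rx/Ry/Rz would serve) and the scaling of the normalized features by pi are merely examples and not limiting of the invention.

# Sketch of angle encoding: one single-qubit rotation per feature of the subset.
import numpy as np
from qiskit import QuantumCircuit

def encode_features(x):
    """Encode a normalized feature subset x (values in [0, 1]) into len(x) qubits."""
    qc = QuantumCircuit(len(x))
    for qubit, value in enumerate(x):
        qc.ry(np.pi * value, qubit)   # rotation angle parameterized by the feature value
    return qc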
The quantum substrate 220 is programmed in the quantum processor 50 and is constructed from a cascade of controlled-X (CX) gates which generate entanglement between the qubits in step S630. A measurement of a suitable observable M is made for all the qubits, or for a subset of the qubits, in step S640. The measurement must result in a real number, which means that the corresponding quantum operator must be Hermitian, i.e., an operator whose eigenvalues are real. Non-limiting examples of such operators are the Pauli X/Y/Z operators. In this case, the average of the Z observable is measured on each qubit over many runs of the quantum circuit of step S630. The final output in step S650 is then a vector of real expectation values 420.
The circuit in the quantum substrate 220 is depicted in the drawings.
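For illustration only, one possible realisation of the quantum substrate 220 and of the measurement of the Z expectation values (steps S630 to S650) is sketched below, again assuming Qiskit. The linear nearest-neighbour arrangement of the CX cascade is one example and other arrangements are possible; on real hardware the expectation values would be estimated from repeated measurements rather than from the statevector.

# Sketch of the substrate: CX cascade followed by per-qubit <Z> expectation values.
from qiskit.quantum_info import Statevector, Pauli

def substrate_expectations(encoded_circuit):
    qc = encoded_circuit.copy()
    n = qc.num_qubits
    for q in range(n - 1):
        qc.cx(q, q + 1)               # entangling CX cascade (step S630)
    state = Statevector.from_instruction(qc)
    # <Z> on each qubit (step S640); simulated here via the statevector.
    return [state.expectation_value(Pauli("Z"), [q]).real for q in range(n)]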
As noted above, the passage of all the examples of the training data 260 cannot be done directly using a quantum dynamical system because the dataset containing the training data 260 is large and the amount of training data 260 that can be passed is limited by the capabilities of quantum systems with the limited number of qubits available. In the method of this document, an intermediate matrix is constructed such that the entire dataset with the training data 260 can be passed through the quantum substrate 220, and training is performed in one step, as in the original scheme, in a quantum-classical fashion. This step is outlined in the drawings.

The feature vectors 400x of N features are passed, in subsets, through the quantum substrate 220 in step S410, and the resulting output vectors of expectation values 420 are concatenated to construct the intermediate matrix.

This intermediate representation obtained from the application of the quantum substrate 220 is used to compute the Moore-Penrose pseudo inverse in step S440, and this pseudo inverse is multiplied by the true labels vector 265 to obtain the vector of optimal weights $\hat{\beta}$.
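For illustration only, the construction of the intermediate matrix and the training and prediction steps can be sketched as follows, reusing the encode_features and substrate_expectations helpers sketched above; the subset size is an illustrative parameter and not limiting of the invention.

# Sketch of steps S410 to S440: build the intermediate matrix H from subset-wise
# substrate outputs, then obtain the optimal weights via the pseudo-inverse.
import numpy as np

def build_H(X, subset_size):
    rows = []
    for sample in X:                                      # each training example
        outputs = []
        for start in range(0, len(sample), subset_size):
            subset = sample[start:start + subset_size]    # subset of the N features
            qc = encode_features(subset)
            outputs.extend(substrate_expectations(qc))    # expectation values 420
        rows.append(outputs)                              # concatenated per example
    return np.array(rows)

def train_qelm(X, T, subset_size=4):
    H = build_H(X, subset_size)
    return np.linalg.pinv(H) @ T                          # optimal weights (step S440)

def predict_qelm(X_new, beta, subset_size=4):
    return build_H(X_new, subset_size) @ beta             # T_hat = H beta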
The concept has been evaluated on standard classification datasets from sklearn using the IBM simulator. Better results were obtained than with the classical ELM counterpart, and results comparable to the default Random Forest classifier of sklearn were achieved. The results are shown in the drawings.
As already noted above, it is possible to introduce redundancy into the data encoding in step S620 by using more qubits n (as the hidden nodes 230) than the number of features N which are to be encoded. A simple example will illustrate this. Suppose that there are two features (N = 2) and four qubits (n = 4). In this case, the "extra" two qubits can redundantly encode the same features as the other two qubits.

In one aspect, the redundantly encoded data can be modified in some manner. For example, the data could be multiplied by the qubit number before it is encoded. This means that the data is projected in a more dispersed way on the Bloch sphere. This is of interest in the quantum case because the redundant qubits encoding the same feature are then entangled through the controlled-X (CX) cascade in the quantum substrate 220. Simulations indicate an improvement using this redundancy technique, especially in the quantum case where entanglement is also present.
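For illustration only, one possible redundant encoding is sketched below; the cyclic reuse of the features on the spare qubits and the scaling of each feature by the qubit number are illustrative choices and not limiting of the invention.

# Sketch of redundant angle encoding with n qubits and N < n features: spare qubits
# repeat existing features, each copy scaled by its qubit number so the copies land
# at different points on the Bloch sphere before the entangling CX cascade.
import numpy as np
from qiskit import QuantumCircuit

def encode_redundant(x, n_qubits):
    qc = QuantumCircuit(n_qubits)
    for qubit in range(n_qubits):
        feature = x[qubit % len(x)]                   # reuse features cyclically
        qc.ry(np.pi * feature * (qubit + 1), qubit)   # multiply by the qubit number before encoding
    return qc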