The embodiments of the invention relate generally to the field of quantum computing. More particularly, these embodiments relate to an apparatus and method for loading classical data into quantum computers in short depth and with controllable errors.
Quantum computing refers to the field of research related to computation systems that use quantum mechanical phenomena to manipulate data. These quantum mechanical phenomena, such as superposition (in which a quantum variable can simultaneously exist in multiple different states) and entanglement (in which multiple quantum variables have related states irrespective of the distance between them in space or time), do not have analogs in the world of classical computing, and thus cannot be implemented with classical computing devices.
A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention described below. It will be apparent, however, to one skilled in the art that the embodiments of the invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form to avoid obscuring the underlying principles of the embodiments of the invention.
A quantum computer uses quantum-mechanical phenomena such as superposition and entanglement to perform computations. In contrast to digital computers which store data in one of two definite states (0 or 1), quantum computation uses quantum bits (qbits), which can be in superpositions of states. Qbits may be implemented using physically distinguishable quantum states of elementary particles such as electrons and photons. For example, the polarization of a photon may be used where the two states are vertical polarization and horizontal polarization. Similarly, the spin of an electron may have distinguishable states such as “up spin” and “down spin.”
Qbit states are typically represented by the bracket notations |0⟩ and |1⟩. In a traditional computer system, a bit is exclusively in one state or the other, i.e., a ‘0’ or a ‘1.’ However, qbits in quantum mechanical systems can be in a superposition of both states at the same time, a trait that is unique and fundamental to quantum computing.
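In standard notation, a single-qubit state is a normalized linear combination of the two basis states, and a measurement returns 0 or 1 with probabilities given by the squared magnitudes of the amplitudes:

```latex
\begin{aligned}
|\psi\rangle &= \alpha\,|0\rangle + \beta\,|1\rangle, \qquad \alpha,\beta \in \mathbb{C}, \quad |\alpha|^2 + |\beta|^2 = 1,\\
P(0) &= |\alpha|^2, \qquad P(1) = |\beta|^2 .
\end{aligned}
```

For example, α = β = 1/√2 gives an equal superposition that is measured as 0 or 1 with probability 1/2 each.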
Quantum computing systems execute algorithms containing quantum logic operations performed on qubits. The sequence of operations is statically compiled into a schedule and the qubits are addressed using an indexing scheme. This algorithm is then executed a sufficiently large number of times until the confidence level of the computed answer exceeds a threshold (e.g., ~95% or higher). Reaching the threshold means that the desired algorithmic result has been obtained.
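The repeat-until-confident rule can be sketched as follows. This is a minimal illustration under stated assumptions: each run returns a single bit, "confidence" is taken to be a normal-approximation (Wald) 95% interval on the frequency of the majority outcome, and execution stops once that interval lies entirely above 1/2. The function name `run_until_confident` and the simulated routine are hypothetical.

```python
import math
import random


def run_until_confident(sample_shot, z=1.96, batch=100, max_shots=100_000):
    """Repeat a probabilistic computation until its majority answer is statistically established.

    sample_shot: zero-argument callable returning one measured outcome (0 or 1).
    z: normal quantile for the confidence level (1.96 is roughly 95%, two-sided).
    Returns (answer, shots); answer is None if max_shots is reached first.
    """
    ones = 0
    shots = 0
    while shots < max_shots:
        for _ in range(batch):
            ones += sample_shot()
        shots += batch
        majority = 1 if ones * 2 >= shots else 0
        p_hat = ones / shots if majority == 1 else 1 - ones / shots
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / shots)
        # Stop when the whole interval for the majority frequency lies above 1/2.
        if p_hat - half_width > 0.5:
            return majority, shots
    return None, shots


rng = random.Random(7)
# Hypothetical routine that returns the correct answer (1) with probability 0.7 per run.
answer, shots = run_until_confident(lambda: 1 if rng.random() < 0.7 else 0)
print(answer, shots)
```

Running in batches keeps the stopping test cheap; a real system would pick the batch size and interval method to suit its error model.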
Qbits have been implemented using a variety of different technologies which are capable of manipulating and reading quantum states. These include, but are not limited to, quantum dot devices (spin based and spatial based), trapped-ion devices, superconducting quantum computers, optical lattices, nuclear magnetic resonance computers, solid-state NMR Kane quantum devices, electrons-on-helium quantum computers, cavity quantum electrodynamics (CQED) devices, molecular magnet computers, and fullerene-based ESR quantum computers, to name a few. Thus, while a quantum dot device is described below in relation to certain embodiments of the invention, the underlying principles of the invention may be employed in combination with any type of quantum computer including, but not limited to, those listed above. The particular physical implementation used for qbits is orthogonal to the embodiments of the invention described herein.
Quantum dots are small semiconductor particles, typically a few nanometers in size. Because of this small size, quantum dots operate according to the rules of quantum mechanics, having optical and electronic properties which differ from macroscopic entities. Quantum dots are sometimes referred to as “artificial atoms” to connote the fact that a quantum dot is a single object with discrete, bound electronic states, as is the case with atoms or molecules.
The quantum dot device 100 of
Generally, the quantum dot devices 100 disclosed herein may further include a source of magnetic fields (not shown) that may be used to create an energy difference in the states of a quantum dot (e.g., the spin states of an electron spin-based quantum dot) that are normally degenerate, and the states of the quantum dots (e.g., the spin states) may be manipulated by applying electromagnetic energy to the gate lines to create quantum bits capable of computation. The source of magnetic fields may be one or more magnet lines, as discussed below. Thus, the quantum dot devices 100 disclosed herein may, through controlled application of electromagnetic energy, be able to manipulate the position, number, and quantum state (e.g., spin) of quantum dots in the quantum well stack 146.
In the quantum dot device 100 of
Multiple parallel second gate lines 104 may be disposed over and between the first gate lines 102. As illustrated in
Multiple parallel third gate lines 106 may be disposed over and between the first gate lines 102 and the second gate lines 104. As illustrated in
Although
Not illustrated in
After Richard Feynman asked in 1982 whether quantum physics could be simulated efficiently using a quantum computer, much quantum computing research has focused on universality and on efficiency relative to classical computation. One such example is David Deutsch's quantum Turing machine (1985), which can be programmed to perform any computational task that can be performed by any physical object.
In contrast to theories and algorithms, quantum physical machines are still in their infancy. Efforts to build quantum information processing systems have resulted in modest success to date. Small quantum computers, capable of performing a small set of quantum operations on very few qubits, represent the state of the art in quantum computation. In addition, quantum states are fragile in the sense that they remain coherent only for a limited duration. This gap between algorithms and physical machines has driven the effort to invent hybrid classical-quantum algorithms. Some recent quantum algorithm developments have focused on short-depth quantum circuits to carry out quantum computations formed as subroutines embedded in a larger classical optimization loop, such as the variational quantum eigensolver (P. J. J. O'Malley, 2016). Quantum languages, tools, and flows have been developed, providing software layers/stacks to translate and optimize applications to the quantum physical layer to cope with the stringent resource constraints in quantum computing (Frederic T. Chong, 2017, 14 Sep.).
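As a minimal illustration of a short-depth circuit embedded in a classical optimization loop (the pattern behind the variational quantum eigensolver), the sketch below minimizes ⟨Z⟩ for a one-parameter, single-qubit circuit Ry(θ)|0⟩. The quantum subroutine is simulated exactly rather than estimated from measurements, and the choice of Hamiltonian (Z), learning rate, and step count are illustrative assumptions, not taken from the cited work.

```python
import math


def expectation_z(theta: float) -> float:
    """Stand-in for the short-depth quantum subroutine.

    Prepares Ry(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1> and returns <Z> = P(0) - P(1).
    """
    p0 = math.cos(theta / 2) ** 2
    p1 = math.sin(theta / 2) ** 2
    return p0 - p1


def parameter_shift_grad(energy, theta: float) -> float:
    # Parameter-shift rule for rotation gates: exact gradient from two circuit evaluations.
    return (energy(theta + math.pi / 2) - energy(theta - math.pi / 2)) / 2


def variational_minimize(energy, theta: float = 0.1, lr: float = 0.4, steps: int = 200) -> float:
    # Classical outer loop: plain gradient descent on the circuit parameter.
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(energy, theta)
    return theta


theta_opt = variational_minimize(expectation_z)
# Converges toward theta = pi, where <Z> reaches its minimum of -1 (the |1> state).
print(round(theta_opt, 4), round(expectation_z(theta_opt), 4))
```

Each iteration calls the "quantum" routine twice and does only cheap classical arithmetic in between, which is why tight coupling between classical and quantum execution matters for this class of algorithm.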
On the hardware side, classical computers have been used to perform error correction for quantum computations. The “quantum co-processor” model is the prevailing execution model, in which a classical CPU controls a quantum processing unit in a manner similar to how CPUs in modern computer systems interact with GPUs. As described in (X. Fu, 2016 May) and (X. Fu, 2018), the microarchitecture for experimental superconducting quantum co-processors included features such as an arbiter on the code fetch data path to steer classical instructions to the host CPU and quantum instructions to the quantum co-processor, an exchange register file to synchronize register files between the host CPU and the quantum co-processor, and a quantum instruction cache.
The microarchitectures for these mechanisms, however, are not well defined and explicit support for hybrid classical-quantum programs is lacking. Consequently, it is unclear how a quantum co-processor would be implemented within a quantum computer, particularly one which is required to run a diverse set of quantum programs. A flexible and programmable model has yet to be developed for executing hybrid classical-quantum algorithms.
One embodiment of the invention adds a set of quantum instructions to an instruction set architecture (ISA) of a processor such as a CPU. By way of example, these instructions may be included in an extension to the ISA (e.g., the AVX-512 extensions for the x86 platform). In addition, in one embodiment, a quantum engine is added to the processor's execution unit and the new quantum instructions are fetched, decoded, scheduled, and executed on the functional units of the quantum engine. In one embodiment, the quantum engine interacts with the classical execution engines using a shared register file and/or system memory. Upon executing the quantum instructions (or quantum uops in certain embodiments described herein), the quantum execution engine generates control signals to manipulate the state of the qubits within the quantum processor. The quantum engine also executes instructions to take a measurement of specified sets of qubits and store the results. In these embodiments, a quantum/classical interface provides connectivity between the quantum engine of the classical processor and the quantum processor.
Quantum and non-quantum instructions 201A-B are fetched from memory 205 at the front end of the instruction pipeline and stored in a Level 1 (L1) instruction cache 201. Instructions and data may also be stored within a Level 2 or Level 3 cache within a cache/memory subsystem 215, which manages memory requests and cache coherency.
A decoder 202 decodes the instructions 201A-B into micro-operations or uops 203A which are scheduled for execution by a scheduler 203 and executed by execution circuitry 204. In one embodiment, certain stages of the pipeline are enhanced to include hardware support for processing the quantum instructions 201A while other stages are unaltered. For example, quantum decode circuitry 202A may be added to the decoder 202 for decoding the quantum instructions 201A, just as non-quantum decode circuitry 202B decodes non-quantum instructions 201B. Although illustrated as separate components in
In one embodiment, the decoder 202 generates a sequence of uops 203A in response to decoding the instructions 201A-B. In an implementation with quantum and non-quantum instructions, the uops may include a mixture of quantum uops and non-quantum uops, which are then scheduled for execution by an instruction scheduler 203.
The quantum and non-quantum uops 203A generated by the decoder 202 may initially be queued for execution within one or more uop queues of the scheduler 203, which dispatches the uops from the uop queue(s) in accordance with dependencies and/or execution resource availability. The embodiments of the invention may be implemented on various different types of processors with different types of schedulers. For example, in one embodiment, a set of execution “ports” couple the scheduler 203 to the execution circuitry 204, where each execution port is capable of issuing uops to a particular set of functional units 204C-E. In the example architecture shown in
In the particular embodiment shown in
In an embodiment in which quantum uops are mixed with non-quantum uops, the quantum uops are issued over one or more quantum ports to a set of quantum engine functional units 204E, which execute the quantum uops to perform the underlying quantum operations. For example, the quantum engine functional units 204E, in response to the quantum uops, may generate control signals over a quantum-classical interface 206 to manipulate and take measurements of the qubits of a quantum processor 207.
The quantum-classical interface 206 includes digital-to-analog (D-A) circuitry to convert the digital quantum control signals generated by the quantum engine functional units 204E to analog signals required to control the quantum processor 207 (e.g., the codeword triggered pulse generation (CTPG) units and Arbitrary Waveform Generator (AWG) described below) and also includes analog-to-digital (A-D) circuitry to convert the physical qubit measurements to digital result data.
In one embodiment, the quantum-classical interface 206 is integrated on the same semiconductor chip as the other components of the instruction processing pipeline (e.g., the execution circuitry 204, scheduler 203, decoder 202, etc.). As discussed in detail below, different types of circuit/logic components may be used depending on the particular physical implementation of the quantum processor 207.
The operands for the quantum and non-quantum uops are stored in a set of shared registers 321 (as described above) and accessed by the quantum functional units 320 when executing the uops. The Q-C interface 206, in response to the quantum uops, controls the operation of the quantum processor 207.
Different examples of a quantum-classical interface 206 are illustrated in
The Q-C interface 206 shown in
To further guide the analysis and discussion, a concrete example is illustrated in
One example of a quantum program that uses this circuit for a portion of its computation is illustrated in
This program structure shows how classical operations and quantum operations may be tightly intertwined and executed on the classical-quantum processing architectures described herein. The most efficient way to execute this program is to process all instructions in a pipeline such as those described above, with the quantum engine functional units 204E for controlling qubits configured as an execution engine peer to the other classical execution engines 204A-B (such as integer, floating point, etc.).
A method in accordance with one embodiment of the invention is illustrated in
At 701 source code containing quantum instructions is compiled to generate runtime program code with quantum and non-quantum instructions. At 702 the quantum/non-quantum instructions are fetched from memory and stored in a local cache (e.g., the L1 instruction cache) or instruction buffer. As mentioned, quantum instructions may be freely mixed with non-quantum instructions within the pipeline.
At 703 the quantum and non-quantum instructions are decoded into sets of quantum and non-quantum uops, respectively, and stored in a queue prior to execution. At 704 the quantum/non-quantum uops are scheduled for execution based on uop and/or resource dependencies. For example, if a first uop is dependent on the results of a second uop, then the first uop may be scheduled for execution only when the data produced by the second uop is available in one of the registers. Similarly, if a particular functional unit is busy, then the scheduler may wait for an indication that the functional unit is available before scheduling a uop which requires that functional unit. Various other/additional scheduling techniques may be implemented (e.g., scheduling based on priority, register load, etc.).
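Steps 703 and 704 can be illustrated with a toy dispatcher. This sketch is a hypothetical model, not the scheduler 203 itself: the uop names, functional-unit labels, and latencies are invented for illustration; registers are assumed to be renamed so only read-after-write dependencies matter; each functional unit accepts one uop per cycle; and qubit state is modeled as a pseudo-register so that a measurement waits for the preceding gate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uop:
    name: str
    unit: str               # functional-unit class, e.g. "int" or "quantum" (illustrative labels)
    dst: Optional[str]      # destination register (or qubit pseudo-register), if any
    srcs: tuple             # source registers read by the uop
    latency: int = 1        # cycles until dst is readable (illustrative values)


def schedule(uops):
    """Return {uop name: dispatch cycle}.

    A uop is dispatched only when every source produced by another uop is available
    and its functional unit has not already accepted a uop in the current cycle.
    """
    produced = {u.dst for u in uops if u.dst}
    ready_at = {}           # register -> first cycle its value can be read
    pending = list(uops)
    dispatched = {}
    cycle = 0
    while pending:
        busy = set()        # functional units already used this cycle
        for uop in list(pending):
            deps_ready = all(
                src not in produced or ready_at.get(src, cycle + 1) <= cycle
                for src in uop.srcs
            )
            if deps_ready and uop.unit not in busy:
                busy.add(uop.unit)
                dispatched[uop.name] = cycle
                if uop.dst:
                    ready_at[uop.dst] = cycle + uop.latency
                pending.remove(uop)
        cycle += 1
    return dispatched


program = [
    Uop("mov_rdi", "int", "rdi", ()),                               # load a qubit index into RDI
    Uop("qirotx", "quantum", "q_state", ("rdi",)),                  # X rotation on qubit [RDI]
    Uop("add_rbx", "int", "rbx", ()),                               # independent classical work
    Uop("qmeas", "quantum", "r8", ("rdi", "q_state"), latency=3),   # measure qubit [RDI] into R8
    Uop("cmp_r8", "int", "flags", ("r8",)),                         # classical test of the result
]
print(schedule(program))
# {'mov_rdi': 0, 'qirotx': 1, 'add_rbx': 1, 'qmeas': 2, 'cmp_r8': 5}
```

Note that the independent classical uop `add_rbx` issues in the same cycle as the quantum rotation, which is the kind of overlap a shared pipeline makes possible.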
At 705 the quantum uops and non-quantum uops are executed on their respective functional units within the execution circuitry. As mentioned, the shared register set may be used to store the source and destination operands required by these uops.
At 706, the results generated by the execution of the quantum uops may be used as input to an interface unit to control the quantum state of the qubits in a quantum processor. In one embodiment, a series of codewords or command packets may be generated which identify a quantum channel, one or more qubits within a quantum processor, a qubit type and/or a command state. The specific physical operations performed in response to the codeword or command packet are based on the underlying type of quantum processor used.
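The codeword fields listed at 706 can be packed into a single machine word. The 32-bit layout, the field widths, and the `Codeword` class below are assumptions chosen for illustration; the description does not specify an encoding.

```python
from dataclasses import dataclass

# Hypothetical 32-bit codeword layout (widths are illustrative only):
#   bits 31..28 channel | bits 27..12 qubit ID | bits 11..8 qubit type | bits 7..0 command
FIELDS = (("channel", 28, 4), ("qubit", 12, 16), ("qubit_type", 8, 4), ("command", 0, 8))


@dataclass(frozen=True)
class Codeword:
    channel: int
    qubit: int
    qubit_type: int
    command: int

    def pack(self) -> int:
        """Shift each field into place, rejecting values that do not fit their width."""
        word = 0
        for name, shift, width in FIELDS:
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word |= value << shift
        return word

    @classmethod
    def unpack(cls, word: int) -> "Codeword":
        """Recover each field by shifting down and masking to its width."""
        return cls(**{name: (word >> shift) & ((1 << width) - 1) for name, shift, width in FIELDS})


cw = Codeword(channel=2, qubit=5, qubit_type=1, command=0x11)
print(hex(cw.pack()))  # 0x20005111
```

A fixed layout like this lets the interface unit decode fields with simple shifts and masks; how many qubits and commands it can express depends entirely on the chosen widths.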
The embodiments described herein integrate quantum instructions within an existing processor pipeline. Because of the tight integration, these embodiments significantly reduce the various overheads/bottlenecks associated with current co-processor designs. These overheads/bottlenecks include, for example, the communication between the classical computation layers/modules and the quantum computation layers/modules in the software stack and between the classical CPU and the quantum chip via the message queue. Given the relatively small size of quantum routines, the current GPU-like co-processor implementations are inefficient.
Due to increased classical processing capabilities, hybrid co-processor models reduce some of the overhead. In one particular implementation which supports the hybrid co-processor model, many new micro-architectural mechanisms were introduced. However, these micro-architectural mechanisms were ambiguously defined, as was the boundary between the classical CPU and the quantum co-processor.
In contrast, in the hybrid architecture described herein, the classical computation pipeline is equipped to fully support a defined set of quantum instructions which may be freely mixed with non-quantum instructions both at the front end of the pipeline (i.e., at the macroinstruction level) and within the back-end of the pipeline (e.g., where quantum uops are mixed with non-quantum uops) and executed on functional units within the execution circuitry of the processor.
In quantum computing, a qubit is a unit of quantum information which is the quantum analogue of a classical binary bit. The computation is achieved by applying quantum gates, representing quantum logical operations, directly to qubits. Mathematically, this computing process is described as qubits undergoing unitary transformations. Upon completion of computation, qubits are measured to gain information about the qubit states.
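The gate model can be illustrated with a two-amplitude state vector. The sketch below applies the standard X and Hadamard (H) matrices and computes measurement probabilities with the Born rule; it is a classical simulation for illustration only, not a description of how a quantum processor executes gates.

```python
import math

# Single-qubit gates as 2x2 unitary matrices (standard definitions).
X = ((0, 1), (1, 0))
H = ((1 / math.sqrt(2), 1 / math.sqrt(2)), (1 / math.sqrt(2), -1 / math.sqrt(2)))


def apply(gate, state):
    """Apply a 2x2 gate to a single-qubit state given as (amplitude of |0>, amplitude of |1>)."""
    a, b = state
    return (gate[0][0] * a + gate[0][1] * b, gate[1][0] * a + gate[1][1] * b)


def probabilities(state):
    """Measurement probabilities for outcomes 0 and 1 (squared amplitude magnitudes)."""
    return tuple(abs(amp) ** 2 for amp in state)


zero = (1, 0)
print(probabilities(apply(X, zero)))            # X flips |0> to |1>
print(probabilities(apply(H, zero)))            # H creates an equal superposition
print(probabilities(apply(H, apply(H, zero))))  # H is its own inverse: back to |0>
```

Because every gate is unitary, the probabilities always sum to 1; only the final measurement turns amplitudes into classical outcomes.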
Therefore, to describe a quantum operation, it is necessary to identify the qubit or set of qubits to which the operation is applied. In a quantum program, each quantum instruction needs to encode both an operation to be performed and one or more qubits on which to perform the operation. In existing quantum instruction set architectures (e.g., QASM, OpenQASM, QIS, etc.) register operands are normally encoded in the opcode of an instruction. This scheme works for classical computing because the number of registers is very limited (e.g., 16, 32, 64, etc.). However, this scheme is not scalable for quantum computing, as quantum instructions will ultimately need to address a very large number of qubits. Consequently, encoding qubit addresses in the opcode field of quantum instructions would explode the instruction width.
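The scaling argument is simple arithmetic: naming one of N registers or qubits takes ⌈log2 N⌉ bits per operand field, and a two-qubit gate needs two such fields. The one-million-qubit figure below is an arbitrary illustrative scale, and `operand_bits` is a hypothetical helper.

```python
def operand_bits(num_addressable: int) -> int:
    """Bits needed to name one of num_addressable registers or qubits in an instruction field."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, computed exactly on integers.
    return max(1, (num_addressable - 1).bit_length())


for name, count in [("16 registers", 16), ("32 registers", 32), ("1,000,000 qubits", 1_000_000)]:
    bits = operand_bits(count)
    # A two-operand instruction (e.g. a two-qubit gate) needs this field twice.
    print(f"{name}: {bits} bits per operand, {2 * bits} bits for two operands")
```

A register operand fits comfortably in a few opcode bits, whereas 40 bits for a single two-qubit gate at that scale would dominate the instruction encoding.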
As described above, in one embodiment, quantum instructions and non-quantum instructions are processed together within a shared processor pipeline. As such, the quantum instructions may rely on the same addressing modes as those available to the non-quantum instructions. The qubits in this embodiment are therefore addressed in a similar manner as non-quantum instructions which access system memory, providing a sufficiently large address space to accommodate a large number of qubits.
As illustrated in
The QIG 802 may operate in accordance with different addressing modes supported by the processor. In one embodiment, the instruction identifies one of the shared registers 321 which contains the qubit index value (sometimes also referred to as a qubit ID). It may then use the qubit index value to identify the qubit within the codeword/command packet 606 and/or perform an operation using the qubit index value to generate one or more additional qubit index values. For example, it may add the qubit ID value to an integer specified by the uop to generate a second qubit ID.
The following examples demonstrate one way in which the QIG 802 generates qubit IDs in response to uops using an x86 assembly syntax. These operations may be performed within an x86 pipeline extended to support quantum instructions. However, the same general principles may be implemented on any processor architecture.
The single-qubit instruction “QIROTX [RDI], 1” applies an X gate to the qubit number stored in RDI. Thus, if RDI contains 5, the X gate is applied to qubit number 5. In this example, the QIG 802 determines the qubit ID simply by reading the value stored in RDI (which is one of the shared registers 321 in this example). In this embodiment, the RDI value was stored previously by another uop. As another example, if the architecture register RBX contains a value of 2, then the two-qubit instruction “QCNOTUP [RBX+3]” applies a CNOT operation with qubit 2 (q[2]) being the control qubit and qubit 5 (q[5]) being the target qubit. The QIG interprets the [RBX+3] notation as follows: the ID of the control qubit is stored in RBX, and the control qubit ID plus 3 is the target qubit ID. Thus, the addressing scheme is extended so that two different qubits can be addressed with a single instruction (i.e., CNOT). In contrast, in classical computing, only one memory location is addressed per instruction.
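The index generation described for QIROTX and QCNOTUP can be modeled in a few lines. This is a hypothetical sketch: the operand grammar (`[REG]` or `[REG+imm]`), the regular expression, and the `qubit_ids` helper are assumptions for illustration, with the shared register file represented as a dictionary.

```python
import re

# Hypothetical operand grammar: "[REG]" or "[REG+imm]".
OPERAND = re.compile(r"\[(?P<reg>[A-Z0-9]+)(?:\+(?P<imm>\d+))?\]")


def qubit_ids(operand: str, regs: dict) -> tuple:
    """Return the qubit IDs addressed by a memory-style operand.

    "[RDI]"   -> one qubit: the ID held in RDI.
    "[RBX+3]" -> two qubits: control = value of RBX, target = value of RBX + 3.
    """
    m = OPERAND.fullmatch(operand.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported operand: {operand!r}")
    base = regs[m["reg"]]                  # qubit ID read from the shared register file
    if m["imm"] is None:
        return (base,)
    return (base, base + int(m["imm"]))    # second ID derived from the base ID


regs = {"RDI": 5, "RBX": 2}
print(qubit_ids("[RDI]", regs))    # (5,)
print(qubit_ids("[RBX+3]", regs))  # (2, 5)
```

Because the IDs come from registers at run time, the instruction encoding stays the same size no matter how many qubits the machine has.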
The quantum error correction unit 808 may implement various techniques for detecting and correcting quantum errors. For example, in one embodiment, an error decoder (within the QEC unit 808) decodes a multi-qubit measurement from the quantum processor 207 to determine whether an error has occurred and, if so, implements corrective measures (if possible). The error measurements may be taken from multiple qubits in a manner which does not disturb the quantum information in the encoded state of the qubits (e.g., using ancilla qubits). In response, the QEC unit 808 generates error syndrome data from which it may identify the errors that have occurred and implement corrective operations. In one embodiment, the error syndrome data is derived from a stabilizer code such as a surface code. In some cases, the response may simply be to reinitialize the qbits and start over. In other cases, however, modifications to the quantum algorithm implemented in the quantum program code 205C can be made to stabilize the region of the quantum processor responsible for the error (e.g., where compiler 205B includes a just-in-time (JIT) compiler). In either case, the CTPGs 402A perform the underlying physical operations under the control of the codewords/command packets 606 generated by the QEFU 204E. For example, the CTPG 402A may generate electromagnetic pulses to adjust the phase of one or more qbits in accordance with the detected phase error, or to reset the phase/spin of all qbits if re-initialization is required.
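To show how syndrome data identifies an error without reading out the encoded information, the sketch below uses the three-qubit bit-flip repetition code, the simplest stabilizer code, in place of the surface code named above. It tracks only computational-basis bit values under single X (bit-flip) errors, which is a simplification; on hardware, the parities Z0Z1 and Z1Z2 would be measured through ancilla qubits.

```python
# Syndrome -> index of the qubit to flip (None means no correction is needed).
SYNDROME_TABLE = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}


def syndrome(bits):
    """Parities checked by the stabilizers Z0Z1 and Z1Z2."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])


def correct(bits):
    """Look up the syndrome and undo the single bit flip it points to."""
    bits = list(bits)
    flip = SYNDROME_TABLE[syndrome(bits)]
    if flip is not None:
        bits[flip] ^= 1
    return tuple(bits)


for error_pos in (None, 0, 1, 2):
    codeword = [1, 1, 1]              # logical |1> encoded as |111>
    if error_pos is not None:
        codeword[error_pos] ^= 1      # inject a single bit-flip error
    print(error_pos, syndrome(codeword), correct(codeword))
```

The syndrome depends only on which qubit flipped, not on the logical value, which is why it can be measured without disturbing the encoded state.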
Addressing qubits in a manner similar to how classical CPUs address memory provides the scalability required for future quantum processor implementations. In particular, the above-described embodiments provide qubit indexing which is seamlessly integrated within an existing processor ISA and scales to systems with a large number of qubits. These embodiments also remove pressure from the quantum instruction opcode space by way of a quantum extension to x86 or other architectures to address the qubit space and integrate quantum operations into existing processor pipelines.
A method in accordance with one embodiment of the invention is illustrated in
At 901 quantum and non-quantum instructions from runtime program code are fetched and decoded, generating quantum and non-quantum uops. At 902 an index generation unit evaluates quantum uops including register identifiers and optionally one or more values included with the uops to determine qubit index values. As described above, the indices may be generated using a variety of techniques including reading qubit index values from registers identified by the uops and generating additional qubit index values using integer values included with the uops.
At 903, the quantum execution circuitry generates a codeword specifying the quantum operations to be performed on the qubits identified by the calculated qubit index values. At 905, qubit measurements are performed in response to another codeword generated based on additional uops. At 906, the analog measurements made on one or more of the qubits are converted to digital values. Error correction and/or flow control may then be performed based on the resulting digital values stored in a register file of the processor.
During two-qubit operations in a quantum computing system, an exchange or interaction mechanism is typically employed which adds a drift term to the phase of the interacting qubits. This drift term tends to degrade qubit coherence exponentially over sequences of two-qubit operations, resulting in a shorter T2 (dephasing) time. This limits the amount of time available for quantum operations and reduces the robustness and usefulness of the quantum computing system.
The resilience of a quantum computing system can be improved using corrective pulse sequences transmitted along with the quantum operations. These corrective pulse sequences are generated statically by a compiler for later replay on quantum experimental hardware. Hand generated pulse sequences that compensate for decoherence in the quantum circuit may also be programmed directly into the system.
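As an illustration of how a corrective pulse sequence can cancel phase drift, the sketch below simulates a Hahn spin echo on one qubit prepared in |+⟩. It assumes a constant, unknown frequency offset (`detuning`) and ideal instantaneous X pulses; random, time-varying noise would only be partially refocused. The numeric values are arbitrary.

```python
import cmath
import math


def free_evolution(state, detuning, t):
    """Idle qubit with an uncorrected frequency offset: |1> accumulates phase detuning * t."""
    a, b = state
    return (a, b * cmath.exp(1j * detuning * t))


def x_pulse(state):
    """Ideal X (pi) pulse: swaps the |0> and |1> amplitudes."""
    a, b = state
    return (b, a)


def overlap_with_plus(state):
    """Probability of still finding the qubit in |+> = (|0> + |1>)/sqrt(2)."""
    a, b = state
    return abs((a + b) / math.sqrt(2)) ** 2


plus = (1 / math.sqrt(2), 1 / math.sqrt(2))
detuning, total_time = 0.8, 2.0        # arbitrary illustrative values

idle = free_evolution(plus, detuning, total_time)

half = total_time / 2
echo = free_evolution(plus, detuning, half)
echo = x_pulse(echo)                   # refocusing pi pulse at the midpoint
echo = free_evolution(echo, detuning, half)
echo = x_pulse(echo)                   # second X returns the state to the original frame

print(f"no echo:   {overlap_with_plus(idle):.4f}")   # equals cos^2(detuning * total_time / 2)
print(f"with echo: {overlap_with_plus(echo):.4f}")   # phase drift cancelled (up to a global phase)
```

The phase gained in the first half is undone in the second half because the X pulse swaps which amplitude accumulates it; longer circuits need such pulses repeatedly, which is the storage and bandwidth cost discussed next.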
However, long trains of pulse sequences require exponential memory resources to store the waveforms prior to replay at the hardware level. In addition, bandwidth to feed the pulse train into the system hardware limits scalability to low circuit depth algorithms because of the overhead of sending corrective pulse sequences between each quantum gate operation. Hand-generated pulse sequences are tedious and not scalable to a large number of qubits or long circuit depth algorithms.
To build a more resilient quantum microcode for a general purpose quantum computing system, the issues of decoherence and incorrectly shaped control pulses need to be addressed. Decoherence refers to the fact that qubits decohere through loss of phase information encoded in them just by sitting idle. Imperfectly shaped control pulses can cause qubits to lose phase alignment, resulting in the qubits moving off resonance. The next quantum operation on that qubit will be only partially effective which results in a certain amount of error in the computation.
To address the above problems, one embodiment of the invention uses a lookup table or other indexed data structure (simply referred to below as a “lookup table”) to store sequences of corrective operations associated with different quantum operations. When a quantum instruction is received in the decoder unit, the lookup table is accessed to determine whether there is a corrective sequence available for this quantum operation. The unique opcode of the macroinstruction or combinations of uops resulting from the macroinstruction may be used as an index to the lookup table, to identify any corrective actions. If a corrective pulse sequence is found, then a corresponding set of corrective uops specifying the pulse sequence are injected in the instruction stream in place of (and/or in combination with) the uops for the qubit operations.
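A minimal model of the decoder-side lookup follows. Everything in it is hypothetical: the uop names, the table contents, the `(opcode, qubits)` key with an opcode-wide fallback, and the `append`/`replace` modes standing in for injecting corrective uops in combination with, or in place of, the original uops.

```python
# Hypothetical macroinstruction -> base uop expansion (names are illustrative).
DECODE = {
    "QIROTX": lambda qs: [("x_pulse", qs)],
    # CNOT expressed as H(target), CZ(control, target), H(target).
    "QCNOTUP": lambda qs: [("h_pulse", qs[1:]), ("cz_pulse", qs), ("h_pulse", qs[1:])],
}

# Corrective lookup table. Key: (opcode, qubit tuple), or (opcode, None) for an entry that
# applies to every qubit. Value: (mode, function producing the corrective uops).
CORRECTIVE = {
    ("QCNOTUP", (2, 5)): ("append", lambda qs: [("echo_x", qs[:1]), ("echo_x", qs[:1])]),
    ("QIROTX", None): ("replace", lambda qs: [("x_pulse_shaped", qs)]),
}


def decode(opcode, qubits):
    """Expand a quantum macroinstruction, injecting corrective uops if the table has an entry."""
    uops = DECODE[opcode](qubits)
    # Prefer an entry for this exact qubit set, then fall back to an opcode-wide entry.
    entry = CORRECTIVE.get((opcode, qubits)) or CORRECTIVE.get((opcode, None))
    if entry is None:
        return uops
    mode, make_fix = entry
    fix = make_fix(qubits)
    return fix if mode == "replace" else uops + fix


print(decode("QIROTX", (5,)))       # base pulse replaced by a reshaped one
print(decode("QCNOTUP", (2, 5)))    # base uops followed by corrective echo pulses on qubit 2
print(decode("QCNOTUP", (0, 1)))    # no entry: base uops only
```

Keying on the qubits as well as the opcode lets a correction be specific to one misbehaving qubit pair while leaving other pairs untouched.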
The corrective uops are forwarded to the quantum execution unit, which executes the corrective uops to generate the corrective set of pulses. In one embodiment, the corrective uops are uniquely tailored to each specific qubit as well as different combinations of qubits (e.g., for two-qubit operations). In one embodiment, the corrective set of uops to generate the corrective pulses may be compiled over time based on observations made with respect to specific qubits, sets of qubits, and/or specific operations. If a particular qubit or set of qubits is showing problems with decoherence, for example, then one or more uops may be automatically added to the lookup table to correct this issue.
The decoherence problem may be identified by a quantum error correction unit which, in one embodiment, includes a machine-learning engine to identify the decoherence problem based on an analysis of quantum calculations over a period of time. It may then identify a specific set of uops and operand values needed to correct the problem. Thus, one embodiment of the invention includes a quantum processor, an instruction decoder, a micro-op sequencer, and a quantum micro-code execution engine along with a look-up table that contains some preconfigured pulse sequences for each type of quantum gate which is supported by the instruction set.
Regardless of how it is generated, the corrective uop sequence is scheduled for execution on the quantum engine functional units 204E which executes the new composite pulse sequence via the Q-C interface 206.
In one embodiment, the spin-echo sequence table 1005 is statically generated based on calibration tests run on the quantum processor 207. After the initial static update, the corrective sequence management circuitry/logic 1000 dynamically updates the spin-echo sequence table 1005 over time, as new errors are associated with the various qubits of the quantum processor 207. In one embodiment, the error detection and machine-learning logic/circuit 1008 may continuously analyze results generated by the quantum processor 207 during runtime and specify corrective actions to be taken by the corrective sequence management circuitry/logic 1000, which then updates the spin-echo sequence table 1005 with new corrective uop sequences and/or new operand values needed to make the corrections. Decoherence, for example, may be identified by repeated errors related to the state of a particular qubit or a particular combination of qubits.
In one embodiment, when the error detection and machine-learning logic/circuit 1008 detects an error syndrome which it has not seen before, it will attempt to identify any correlations between the new error syndrome and previously learned models. Based on these correlations, it may generate a new entry in the spin-echo sequence table 1005 with a set of correction uops. If the corrective recommendation does not resolve the error, the error detection and machine-learning logic/circuit 1008 will make another attempt until the desired results are achieved, at which point it will keep the corrective uops entered in the spin-echo sequence table 1005.
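The retry procedure can be sketched as below. The similarity measure (Hamming distance between syndrome bit patterns), the `resolves` callback standing in for re-running and re-measuring, the `learn_correction` helper, and the table contents are all assumptions; the text does not specify how correlations with learned models are computed. Unlike the described logic, which keeps trying until the error is resolved, this sketch gives up after exhausting the known corrections.

```python
def hamming(a, b):
    """Number of positions where two syndrome bit patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def learn_correction(new_syndrome, table, resolves):
    """Try corrections from the most similar known syndromes until one works.

    table: dict mapping syndrome -> corrective uop sequence (updated in place on success).
    resolves: callable(correction) -> bool, standing in for re-running and re-measuring.
    """
    if new_syndrome in table:
        return table[new_syndrome]
    candidates = sorted(table, key=lambda known: hamming(known, new_syndrome))
    for known in candidates:
        correction = table[known]
        if resolves(correction):
            table[new_syndrome] = correction   # keep the entry that worked
            return correction
    return None                                # no known correction helped


table = {(1, 0, 0): ("echo_q0",), (0, 1, 0): ("echo_q1",), (0, 0, 1): ("echo_q2",)}
# Hypothetical experiment outcome: only the qubit-1 correction fixes the new syndrome.
fix = learn_correction((1, 1, 0), table, resolves=lambda c: c == ("echo_q1",))
print(fix, table[(1, 1, 0)])
```

Ordering candidates by similarity means the most plausible corrections are tried first, which keeps the number of calibration runs small.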
Thus, in one embodiment, the machine-learning logic/circuit 1008 performs unsupervised learning of new errors as they occur. Unsupervised learning is particularly beneficial for working with a quantum processor 207 because the physical responses of the individual qbits may change over time and may also vary from one quantum processor to another. In one implementation, the error detection and machine-learning logic/circuit 1008 is initially equipped with a set of basic models which are commonly used to detect and correct certain types of errors. Starting with this base set of models, the error detection and machine-learning logic/circuit 1008 continually trains itself in response to detecting new errors and updates the models and the spin-echo sequence table 1005 accordingly. As a result, the error detection and machine-learning logic/circuit 1008 will become familiar with the particular characteristics of the quantum processor 207 with which it is associated and will learn to correct different types of errors, some of which may be unique to this quantum processor 207.
A method in accordance with one embodiment of the invention is illustrated in
At 1101 a corrective training sequence may be executed where the qubits of a quantum processor are evaluated through a series of operations and measurements to determine corrective operations. Based on the results, a corrective sequence table (e.g., the spin-echo sequence table described above) is updated with entries specifying corrective operations to be performed on this particular quantum processor in response to certain instructions. As described above, the corrective entries may be stored in a microcode ROM and may identify sequences of uops to be executed in place of or in addition to the uncorrected quantum uops.
At 1103, in response to a quantum macroinstruction, the corrective sequence table is queried to identify corrective uops associated with the quantum operation and/or the specific qubits which will be used. At 1104, the specified quantum operations are performed on specified qubits over the classical-quantum interface. At 1105, qubit measurements are performed in response to a codeword specifying measurement(s). At 1106, the analog measurements are converted to digital values which are subject to error detection/correction and, in one embodiment, machine learning. The machine learning, for example, may identify changes to the corrective sequence table to improve the corrective uop sequences. The measurement values may also be stored in the shared register file where they may be further processed.
Small-scale quantum information processors have been realized with various physical architectures. These processors include racks of classical control electronics in addition to the physical quantum chip placed inside a dilution refrigerator.
As quantum devices continue to mature, there is an emerging need to efficiently organize and orchestrate all elements of the control electronics stack so that the quantum physical chip can be manipulated (electrical controls, microwaves, flux) and measured with acceptable precision, allowing quantum experiments and programs to be conducted in a reliable and repeatable manner.
Research efforts have started moving towards a more compact form of the control electronics stack and classical computing components. However, in all current proposals, the quantum computer is built from physically separate and independently designed components including a classical CPU, a quantum co-processor, and control electronics. Because these components are designed with more flexible and generalized interfaces, the communication between these components includes significant energy overhead, which negatively impacts the control and operational efficiency of the quantum processor.
To solve these problems, one embodiment of the invention, illustrated in
A quantum-classical interface 206 is also integrated on the quantum control stack chip 1210. The interface 206 includes a quantum operation analog signal generator 1201 comprising an analog/RF component 1201B, which generates analog signals to control the qubits of the quantum processor 207 based on digital waveforms received from the digital portion 1201A of the interface. In addition, qubit measurement circuitry 1202 includes an analog/RF measurement component 1202B for taking qubit measurements in response to signals received from a digital measurement component 1202A (e.g., responsive to execution of one or more measurement uops).
In one embodiment, the integrated quantum control stack chip 1210 has power/performance characteristics which allow it to be included within the room temperature stage floor 1250 of the quantum system and closely coupled to the quantum processor 207, which is maintained within the millikelvin stage floor 1251. In an alternate embodiment, a low temperature stage floor 1250 may be used (e.g., a 4 K stage floor).
Thus, this embodiment eliminates any inter-module interface and communication overhead at the architecture level by directly coupling the quantum control stack chip 1210 to the quantum processor 207. By contrast, individually designed chips communicate through standard interface protocols. For example, current implementations have control and measurement ICs which use low-bandwidth buses, such as a serial peripheral interface (SPI) bus, to communicate with the primary controller chip. When the primary control chip and the control/management ICs are integrated, the interface between these components can be removed. Integration enables a highly efficient pipeline and data path to be designed to communicate control and data between functional units.
In addition, the inter-module communication may be optimized at the architecture level, to pass operations and receive data between the commander and responder. One example of an architecture-level protocol optimization is in the queue-based signal crossing between the non-deterministic timing domain of the digital quantum control stack chip 1210 and the deterministic timing domain of the quantum processor 207. Optimizations may also be employed between clock domains.
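The queue-based crossing between the two timing domains can be pictured with a small cycle-level model: the non-deterministic side drops operations into a FIFO whenever they are ready, and the deterministic side removes each one exactly on its scheduled cycle. This is a hypothetical sketch, not the circuit of the embodiment; `simulate_crossing`, the `(arrival_cycle, scheduled_cycle, op)` tuples, and the op names are illustrative, and the producer is assumed to emit operations in schedule order.

```python
from collections import deque


def simulate_crossing(arrivals, num_cycles):
    """arrivals: (arrival_cycle, scheduled_cycle, op) tuples sorted by arrival.

    Non-deterministic side: each op lands in the FIFO at its (jittery) arrival cycle.
    Deterministic side: each op leaves the FIFO exactly at its scheduled cycle.
    Returns (issued, late): ops issued on time, and ops that missed their slot.
    """
    fifo = deque()
    pending = deque(arrivals)
    issued, late = [], []
    for cycle in range(num_cycles):
        # Producer side: enqueue everything that has arrived by this cycle.
        while pending and pending[0][0] <= cycle:
            _, scheduled, op = pending.popleft()
            fifo.append((scheduled, op))
        # Consumer side: an op whose slot has already passed cannot be issued on time.
        while fifo and fifo[0][0] < cycle:
            late.append(fifo.popleft()[1])
        # Issue every op scheduled for exactly this cycle, in FIFO order.
        while fifo and fifo[0][0] == cycle:
            issued.append((cycle, fifo.popleft()[1]))
    return issued, late


arrivals = [(0, 2, "X_q0"), (1, 3, "Y_q1"), (5, 4, "MEAS_q0")]
issued, late = simulate_crossing(arrivals, num_cycles=8)
print(issued)   # [(2, 'X_q0'), (3, 'Y_q1')]
print(late)     # ['MEAS_q0']
```

The FIFO decouples when an operation is produced from when it is executed, so the only timing requirement on the non-deterministic side is to stay ahead of the schedule.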
In general, the embodiment illustrated in
While one embodiment integrates the digital processor 1210 with the control electronics 206 that drive analog control signals to the quantum physical chip 207 to manipulate qubits, all such control electronics functionality need not be integrated at the same time. For example, the integration can be staged to pull in certain integrated circuits which have been thoroughly tested first, and then other components when they become mature. By way of example, and not limitation, the DC electronics and flux AWG integration within the quantum-classical interface 206 may be performed at a later time.
Several prominent quantum algorithms that exhibit a quantum speedup do so only if the step of “loading” the classical data is not considered (Aaronson, Read the Fine Print, Nature Physics volume 11, pages 291-293 (2015)). Often, because it takes so much time to simply load the data, any quantum speedup from the algorithm itself is effectively cancelled out. This is sometimes referred to as the “input problem” and is a widely discussed obstacle in the quantum community. The input problem can be a steep obstacle in quantum algorithms for machine learning (ML) and artificial intelligence (AI), and for partial differential equations (e.g., for fluids, mechanics, finance, etc.).
In the most general case, loading does indeed take an exponential number of operations with respect to the number of qubits (Poulin et al., Quantum simulation of time-dependent Hamiltonians and the convenient illusion of Hilbert space (Feb. 7, 2011)). To illustrate the point, assume a classical vector of length N = 1 billion (10^9). This is not an unusually large classical data size, and it can be stored in just m = log2(10^9) ≈ 30 qubits. In the worst case, the number of required quantum gates is O(poly(N)), or equivalently O(exp(m)). In other words, the worst case scales polynomially in the size of the data, which is equivalent to exponential scaling in the number of qubits. Since the goal is to use exponentially faster quantum algorithms on the loaded data, this worst-case loading time might often cancel out any quantum advantage from running the algorithm.
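The arithmetic above, and one standard way to see why the general case is exponential, can be checked in a few lines. The parameter count is a textbook counting argument rather than something taken from the cited work: a generic real m-qubit target has 2^m - 1 independent amplitudes, so if every gate carries a bounded number of continuous parameters, the gate count for a generic target must grow in proportion to 2^m. The function names are hypothetical.

```python
def qubits_needed(n):
    """Smallest m with 2**m >= n, i.e. ceil(log2(n)), using exact integer math."""
    return max(1, (n - 1).bit_length())


def free_real_parameters(m, real_amplitudes=True):
    """Independent real numbers needed to specify an m-qubit target state."""
    dim = 2 ** m
    # A real unit vector loses one parameter to normalization; a complex
    # state additionally loses its global phase.
    return dim - 1 if real_amplitudes else 2 * dim - 2


N = 10 ** 9
m = qubits_needed(N)
print(m)                        # 30
print(free_real_parameters(m))  # 1073741823, i.e. 2**30 - 1
```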
Though the most general input vector cannot be efficiently implemented, the technical challenge is to develop algorithms that can automatically and efficiently load as many classes of inputs as possible in as short a circuit depth as possible.
Embodiments of the invention (a) introduce automated tensor-network-based short-depth techniques for loading arbitrary classical data, and (b) provide early numerical results that demonstrate “solving” of the input problem for image processing, a very common class of data.
These embodiments are applicable to any application which processes classical data with quantum computers, broadening the types of classical data that can be efficiently loaded, leading to advantages in quantum algorithms for classical problems, including but not limited to image processing, financial analysis, fluid mechanics, and molecular dynamics.
While some have claimed to have created efficient data loaders, they have actually solved a different (significantly easier) problem: that of loading a length-N vector into significantly more than log2(N) qubits. One paper describes its data loader as follows: the loader “is a ‘parallel’ unary loader that loads a data point with d features, each of which can be a real number, with exactly d qubits . . . ” (Johri, et al., Nearest Centroid Classification on a Trapped Ion Quantum Computer (Dec. 9, 2020)). Additionally, their demonstration loads an 8-dimensional problem into 8 qubits (not into log2(8) = 3 qubits). Hence, they are not solving the same problem, i.e., they are not loading data in a compact way. Using the example above, they would require 10^9 qubits to load a vector of length 10^9. In contrast, embodiments of the invention store, for example, a 10^9-dimensional vector in just 30 qubits.
The following general types of approaches have been introduced for the input problem: extremely low-depth methods that work for specific classes of smooth functions (see, e.g., Adam Holmes, A. Y. Matsuura, Efficient Quantum Circuits for Accurate State Preparation of Smooth, Differentiable Functions (May 9, 2020)); pristine very-high-depth general input state preparation methods, which scale exponentially (see, e.g., Maria Schuld, Francesco Petruccione, Supervised Learning with Quantum Computers (2018), chapter 5); methods based on neural networks from IBM (see, e.g., Christa Zoufal, et al., Quantum Generative Adversarial Networks for learning and loading random distributions (Nov. 22, 2019)); and methods that load data in a non-compact way (see, e.g., Sonika Johri et al., Nearest Centroid Classification on a Trapped Ion Quantum Computer (Dec. 9, 2020)).
The above-mentioned low-depth methods work well for a specific set of commonly used functions, but they are not expected to perform well with real-world data, which is more chaotic. We have confirmed numerically that this single-layer method does not accurately reproduce image data. Further, its errors are not controllable.
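The difference between compact and non-compact loading is easy to see in code. The sketch below assumes a plain amplitude encoding (zero-padding to the next power of two, then normalizing) as the compact target, and models the non-compact alternative simply as one qubit per feature. Both helper names are hypothetical, and neither function builds a circuit; they only show the target state and the qubit budget.

```python
import numpy as np


def amplitude_encode(x):
    """Compact encoding: pad x with zeros to the next power of two and
    normalize, giving the amplitudes of an m-qubit state, m = ceil(log2(len(x)))."""
    x = np.asarray(x, dtype=float)
    m = max(1, (len(x) - 1).bit_length())
    padded = np.zeros(2 ** m)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("the zero vector cannot be encoded as a quantum state")
    return padded / norm, m


def unary_qubit_count(d):
    """Non-compact (one qubit per feature) encoding."""
    return d


amps, m = amplitude_encode(range(1, 9))
print(m, unary_qubit_count(8))              # 3 8
print(max(1, (10 ** 9 - 1).bit_length()))   # 30, versus 10**9 qubits for unary
```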
The high-depth exact state-preparation methods scale exponentially; exponentially deep quantum circuits are likely to eliminate any expected quantum advantage.
The above-mentioned neural network-based methods (a) must have been trained on data that is similar to the desired input state, (b) require substantial computational resources simply to train, and (c) produce errors which cannot be easily controlled.
Embodiments of the invention include error-controllable techniques for loading classical data into quantum computers, thus specifically addressing the input problem. At least some of these techniques are based on tensor network theory using repetitions of singular value decomposition (SVD), truncation, and automated choice of tensor structure. Importantly, the algorithms' resulting tensor networks take the shape of a quantum circuit of one-qubit and two-qubit gates.
These implementations are controllable in two senses: (a) an arbitrarily low error can be achieved and (b) an arbitrary quantum circuit shape may be specified by the user. Moreover, at least some embodiments are automated, whereas previously, special classes of algorithms required human input.
In some embodiments, the circuit structure itself is arbitrary and modifiable, meaning that the techniques described herein can be tailored to arbitrary hardware topologies. At least one implementation is “deterministic”: no training, optimization, or search of the space is necessary, and the number of classical steps in the preparation is highly predictable.
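The repeated-SVD-and-truncation idea can be illustrated with the standard tensor-train (matrix product state) decomposition of a length-2^m vector: split off one qubit at a time with an SVD and keep at most `max_bond` singular values per cut. This is a minimal sketch of that generic step only. It stops at the MPS and does not perform the further conversion of MPS tensors into one- and two-qubit gates described in this document. `vector_to_mps`, `mps_to_vector`, and `max_bond` are hypothetical names, and qubit 0 is assumed to be the most significant index.

```python
import numpy as np


def vector_to_mps(psi, max_bond):
    """Split a length-2**m vector into m MPS cores of shape (left, 2, right)
    by repeated SVD, keeping at most max_bond singular values per cut."""
    psi = np.asarray(psi, dtype=float)
    m = psi.size.bit_length() - 1
    if 2 ** m != psi.size:
        raise ValueError("length must be a power of two")
    cores = []
    rest = psi.reshape(1, -1)                  # (left bond, remaining qubits)
    for _ in range(m - 1):
        left = rest.shape[0]
        mat = rest.reshape(left * 2, -1)       # group (left bond, current qubit)
        u, s, vh = np.linalg.svd(mat, full_matrices=False)
        chi = min(max_bond, s.size)            # truncation of this bond
        cores.append(u[:, :chi].reshape(left, 2, chi))
        rest = s[:chi, None] * vh[:chi]        # push singular values to the right
    cores.append(rest.reshape(rest.shape[0], 2, 1))
    return cores


def mps_to_vector(cores):
    """Contract the cores back into a dense vector (for checking only)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(-1)


rng = np.random.default_rng(0)
psi = rng.normal(size=2 ** 6)
psi /= np.linalg.norm(psi)

exact = mps_to_vector(vector_to_mps(psi, max_bond=8))    # 8 = 2**3 means no truncation here
approx = mps_to_vector(vector_to_mps(psi, max_bond=2))
print(np.linalg.norm(exact - psi))                       # round-off level
print(np.dot(approx / np.linalg.norm(approx), psi) ** 2) # fidelity, typically below 1
```

Lowering `max_bond` is what trades accuracy for a smaller, shallower circuit. The discarded singular values at each cut show how much error that trade introduces.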
Embodiments of the invention include an apparatus and method for inputting classical data into a quantum computer based on tensor network representations developed for compiling the desired quantum circuit. Referring to
The above description is similar to Ran, Encoding of matrix product states into quantum circuits of one- and two-qubit gates (Mar. 9, 2020), which was applied to approximate quantum states (not classical input states). A tensor may be made unitary by first expanding its size while keeping the original scalar values fixed, and subsequently either (a) determining vectors spanning the null space of the original vectors in the tensor and inserting these new vectors into the new positions, or (b) varying the new scalar values variationally until the tensor is unitary. Because these new vectors (in the expanded tensors) may have arbitrary values, they are referred to below as “gauge” degrees of freedom. The process can be trivially generalized and applied iteratively to any number of qubits.
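Option (a) above can be sketched directly. Given a matrix whose columns are orthonormal (an isometry, such as an MPS core reshaped to (left bond × physical) rows by (right bond) columns), the remaining columns of a full SVD span the orthogonal complement, and appending them yields a unitary that keeps every original entry fixed. The function name `complete_to_unitary` is hypothetical. How the new columns map onto the gate's input legs (for example, which input is an ancilla prepared in |0>) is a convention assumed here and not specified by the text.

```python
import numpy as np


def complete_to_unitary(iso):
    """Append d - k orthonormal columns to a d x k isometry so the result is
    a d x d unitary whose first k columns are the original, unchanged ones."""
    iso = np.asarray(iso)
    d, k = iso.shape
    if not np.allclose(iso.conj().T @ iso, np.eye(k), atol=1e-10):
        raise ValueError("columns must be orthonormal")
    # In a full SVD, the last d - k left-singular vectors span the orthogonal
    # complement of the isometry's column space (its "null space" directions).
    u, _, _ = np.linalg.svd(iso, full_matrices=True)
    return np.hstack([iso, u[:, k:]])


rng = np.random.default_rng(1)
# A 4 x 2 isometry, e.g. a core with (left bond 2 x physical 2) rows and bond 2 columns.
iso, _ = np.linalg.qr(rng.normal(size=(4, 2)))
gate = complete_to_unitary(iso)
print(np.allclose(gate.conj().T @ gate, np.eye(4)))   # True
print(np.allclose(gate[:, :2], iso))                  # True: original entries kept
```

Any orthonormal basis of the complement would serve equally well. That freedom is exactly the "gauge" choice that later paragraphs exploit.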
For an arbitrary input vector, at least one embodiment of the invention performs the operations in
Some embodiments of the invention use tensor network decompositions that are more complex than the one-at-a-time layer described above. In contrast to building up the circuit one layer at a time, the circuit may be decomposed into a specific network pattern, an example of which is illustrated in
The benefits of directly (as opposed to indirectly, layer by layer) constructing a deeper tensor network are that (a) it is often more efficient to compute, and (b) it is likely to produce greater accuracy for the same circuit depth, because of the well-known optimality of the SVD truncation procedure. In the example of the direct construction of a multi-layer quantum circuit, the SVD decompositions, truncations, and contractions are performed so that the result matches the pattern of the “goal” tensor network 1601. In a final operation (not shown), the appropriate m−1 tensors are expanded, with the new values assigned such that each tensor is unitary, as described above.
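The optimality mentioned above is the Eckart–Young theorem: keeping the largest χ singular values gives the best rank-χ approximation in the Frobenius norm, and the error equals the root-sum-square of the discarded singular values. The sketch below checks both facts numerically for one random matrix. The helper name `truncate` and the random comparison subspace are illustrative choices.

```python
import numpy as np


def truncate(mat, chi):
    """Best rank-chi approximation of mat (Frobenius norm) via truncated SVD."""
    u, s, vh = np.linalg.svd(mat, full_matrices=False)
    approx = (u[:, :chi] * s[:chi]) @ vh[:chi]
    discarded = np.sqrt(np.sum(s[chi:] ** 2))   # predicted truncation error
    return approx, discarded


rng = np.random.default_rng(2)
mat = rng.normal(size=(8, 8))
approx, discarded = truncate(mat, chi=2)
print(np.isclose(np.linalg.norm(mat - approx), discarded))   # True

# Any other rank-2 approximation does no better, e.g. projecting onto
# a random 2-dimensional subspace:
q, _ = np.linalg.qr(rng.normal(size=(8, 2)))
other = q @ (q.T @ mat)
print(np.linalg.norm(mat - other) >= discarded)              # True
```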
Embodiments of the invention may tailor the tensor decomposition algorithm to an arbitrarily-defined quantum circuit ansatz such as those illustrated in
The previously discussed quantum circuit TN structure is illustrated at 1701. A shorter-depth quantum circuit TN structure is shown at 1702, and a quantum circuit TN structure that was tailored to the specific hardware connectivity shown is illustrated at 1703, taking full advantage of the native gates and hardware structure. Embodiments of the invention can be used to produce the input-loading quantum circuit on this hardware-specific ansatz.
A related embodiment uses a classical computer to simulate the variational quantum eigensolver (VQE) with any arbitrary circuit ansatz, where the cost function is the overlap (i.e. inner product) of the quantum circuit's state and the desired classical state vector. This embodiment does not have predictable runtime and may take a long time to converge; however, this process may require fewer total classical operations for some simple classical input vectors. When the algorithm has chosen optimal angles for the quantum gates, the resulting quantum circuit is the desired input-loading circuit.
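A classical simulation of this variational approach fits in a short sketch. It makes several assumptions that are not in the text: a hypothetical real-valued ansatz (layers of RY rotations on every qubit joined by a chain of CNOTs), a real target vector, and an optimizer chosen for simplicity. The optimizer updates one angle at a time in closed form, in the spirit of "Rotosolve"-style methods. Because the state depends on a single RY angle t as cos(t/2)·A + sin(t/2)·B, the overlap with the target is a·cos(t/2) + b·sin(t/2), where a and b are the overlaps at t = 0 and t = π. Each update therefore never lowers the fidelity, but convergence to the global optimum is not guaranteed.

```python
import numpy as np


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(state, gate, q, m):
    psi = np.tensordot(gate, state.reshape([2] * m), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)


def apply_cnot(state, c, t, m):
    out = state.copy()
    for i in range(2 ** m):
        if (i >> (m - 1 - c)) & 1:                  # control bit set (qubit 0 = MSB)
            out[i] = state[i ^ (1 << (m - 1 - t))]  # flip the target bit
    return out


def ansatz_state(thetas, m, layers):
    """Hypothetical ansatz: layers of RY on every qubit, joined by CNOT chains."""
    state = np.zeros(2 ** m)
    state[0] = 1.0
    k = 0
    for layer in range(layers):
        for q in range(m):
            state = apply_1q(state, ry(thetas[k]), q, m)
            k += 1
        if layer < layers - 1:
            for q in range(m - 1):
                state = apply_cnot(state, q, q + 1, m)
    return state


def fidelity(thetas, target, m, layers):
    return float(np.dot(target, ansatz_state(thetas, m, layers)) ** 2)


def maximize_overlap(target, m, layers, sweeps=30, seed=0):
    """Coordinate-wise maximization of |<target|psi(thetas)>|^2."""
    thetas = np.random.default_rng(seed).uniform(0, 2 * np.pi, m * layers)
    for _ in range(sweeps):
        for k in range(thetas.size):
            t0, tpi = thetas.copy(), thetas.copy()
            t0[k], tpi[k] = 0.0, np.pi
            a = np.dot(target, ansatz_state(t0, m, layers))
            b = np.dot(target, ansatz_state(tpi, m, layers))
            thetas[k] = 2 * np.arctan2(b, a)        # closed-form optimum for angle k
    return thetas, fidelity(thetas, target, m, layers)


target = np.array([0.5, 0.5, 0.5, -0.5])
thetas, f = maximize_overlap(target, m=2, layers=2)
print(f)
```

The resulting angles define the input-loading circuit for this ansatz. Unlike the SVD-based construction, the number of sweeps needed is not known in advance.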
An additional hardware-related consideration is the effect that a quantum operation has on nearby qubits not meant to be affected. Some embodiments use virtual compensation when implementing the input-loading quantum circuit. Using cross-talk models based on data from the quantum device, virtual compensation introduces pulses on these nearby qubits, in order to mitigate errors introduced by the operation. In this implementation, virtual compensation is taken into account in the “gauge” degrees of freedom for the input-loading algorithm; in other words, the arbitrary choices in the design of quantum gates (when they are made to be unitary as described above) may be tailored to ensure that as little virtual compensation as possible is required.
An additional hardware-tailored process is used in some embodiments, based on adapting the algorithm to the specific native gate set of a particular quantum device. One particular embodiment uses a spin qubit quantum computer. However, various other quantum computer types may be used including, but not limited to, ion traps and superconducting quantum hardware. A spin qubit quantum computer may have multiple native two-qubit gates, e.g., all controlled-Z-rotation gates of arbitrary angle (Watson et al., A programmable two-qubit quantum processor in silicon (Feb. 14, 2018)), all SWAP^k gates where k is an arbitrary real number (Li et al., A crossbar network for silicon quantum dot qubits (Jul. 6, 2018)), or CNOT gates (Zajac et al., Resonantly driven CNOT gate for electron spins (Dec. 7, 2017)). The “gauge” degrees of freedom during the quantum circuit compilation may be chosen such that the decomposition into these native gate sets uses as few total gates as possible.
In an example embodiment, the above techniques were implemented in software to generate the following results for an example length-8 state: [0.363281, 0.462119, 0.042107, 0.671827, 0.350777, 0.117648, 0.248069, 0.054076].
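As a small illustration of rewriting into a native gate set, the sketch checks two standard identities. First, a CNOT equals a controlled-phase with angle π (a CZ) conjugated by Hadamards on the target. Second, a controlled-Z-rotation differs from a controlled-phase only by a single-qubit phase gate on the control. The matrix conventions (control is the first tensor factor, RZ(φ) = diag(e^{-iφ/2}, e^{iφ/2})) are assumptions, and the cited devices may define their native gates differently.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])


def cphase(phi):
    """Controlled-phase diag(1, 1, 1, e^{i*phi}); phi = pi gives CZ."""
    return np.diag([1, 1, 1, np.exp(1j * phi)])


def crz(phi):
    """Controlled-RZ(phi), with RZ(phi) = diag(e^{-i*phi/2}, e^{i*phi/2})."""
    rz = np.diag([np.exp(-0.5j * phi), np.exp(0.5j * phi)])
    return np.block([[I2, np.zeros((2, 2))], [np.zeros((2, 2)), rz]])


# CNOT = (I x H) CZ (I x H): Hadamards on the target turn Z into X.
print(np.allclose(np.kron(I2, H) @ cphase(np.pi) @ np.kron(I2, H), CNOT))  # True

# A controlled-Z-rotation equals a controlled-phase up to a phase gate on the control.
phi = 0.7
control_phase = np.kron(np.diag([1, np.exp(0.5j * phi)]), I2)
print(np.allclose(control_phase @ crz(phi), cphase(phi)))                   # True
```

Single-qubit corrections like these Hadamards and phase gates can often be absorbed into neighboring one-qubit gates, which is one way the gauge choice can reduce the total gate count.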
The resulting quantum gate sequence is:
This implementation was also used to study real-world data.
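The example length-8 state quoted above can be inspected directly. The sketch below loads it onto 3 qubits and confirms that it is normalized up to the rounding of its entries. It then lists the Schmidt coefficients across each cut, since their number and size determine the bond dimension and truncation error that any MPS-based loader must handle. The bit ordering (amplitude i on the basis state whose bitstring is i in binary) is an assumed convention. The printed values are computed here and are not results reported in this document.

```python
import numpy as np

# The example length-8 state quoted above.
state = np.array([0.363281, 0.462119, 0.042107, 0.671827,
                  0.350777, 0.117648, 0.248069, 0.054076])
m = state.size.bit_length() - 1          # 8 amplitudes -> 3 qubits
print(np.linalg.norm(state))             # very close to 1 (entries are rounded)

# Assumed convention: amplitude i sits on the basis state whose bitstring is i in binary.
for i, amp in enumerate(state):
    print(f"|{i:0{m}b}>  {amp:+.6f}")

# Schmidt coefficients across each cut (first k qubits | remaining m - k);
# the number of nonzero ones is the bond dimension an exact MPS needs there.
for k in range(1, m):
    schmidt = np.linalg.svd(state.reshape(2 ** k, -1), compute_uv=False)
    print(k, np.round(schmidt, 6))
```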
In the above detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown, by way of illustration, embodiments that may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. Therefore, the following detailed description is not to be taken in a limiting sense.
Various operations may be described as multiple discrete actions or operations in turn in a manner that is most helpful in understanding the claimed subject matter. However, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations may not be performed in the order of presentation. Operations described may be performed in a different order from the described embodiment. Various additional operations may be performed, and/or described operations may be omitted in additional embodiments. Terms like “first,” “second,” “third,” etc. do not imply a particular ordering, unless otherwise specified.
For the purposes of the present disclosure, the phrase “A and/or B” means (A), (B), or (A and B). For the purposes of the present disclosure, the phrase “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C). The term “between,” when used with reference to measurement ranges, is inclusive of the ends of the measurement ranges. As used herein, the notation “A/B/C” means (A), (B), and/or (C).
The following are example implementations of different embodiments of the invention.
Example 1. A method, comprising: receiving an input tensor corresponding to a quantum state; performing a sequence of tensor network operations on the input tensor to determine an output tensor representing an ordered list of quantum gates to be executed by a specified target quantum device, wherein the sequence of tensor network operations includes a plurality of singular value decomposition (SVD) operations and one or more operations to ensure that the output tensor is unitary.
Example 2. The method of example 1 wherein the sequence of tensor network operations is performed based on specified hardware capabilities of the target quantum device to execute the ordered list of quantum gates.
Example 3. The method of example 2 wherein the hardware capabilities of the target quantum device include a particular number of qubits and a particular connectivity between the qubits.
Example 4. The method of example 1 wherein the plurality of SVD operations include one or more truncations, contractions, and/or decompositions.
Example 5. The method of example 1 wherein the one or more operations to ensure that the output tensor is unitary comprises adding indices to one or more gates of the ordered list of quantum gates.
Example 6. The method of example 1 wherein the one or more operations to ensure that the output tensor is unitary comprises expanding a size of the input tensor while keeping one or more original scalar values fixed, and either (a) determining vectors of a null space of original vectors in the input tensor and inserting these new vectors into new positions; or (b) varying new scalar values variationally until the tensor is unitary.
Example 7. The method of example 1 wherein, for a quantum device comprising m qubits, the input tensor is limited to a size of N = 2^m.
Example 9. The method of example 4 wherein each of the plurality of SVD operations is performed in view of a defined tensor network goal.
Example 10. The method of example 1 further comprising: performing one or more virtual compensation operations to mitigate errors when loading the ordered list of quantum gates on the target quantum device, the virtual compensation operations comprising generating pulses on one or more qubits based on data associated with the quantum device.
Example 11. A machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: receiving an input tensor corresponding to a quantum state; performing a sequence of tensor network operations on the input tensor to determine an output tensor representing an ordered list of quantum gates to be executed by a specified target quantum device, wherein the sequence of tensor network operations includes a plurality of singular value decomposition (SVD) operations and one or more operations to ensure that the output tensor is unitary.
Example 12. The machine-readable medium of example 11 wherein the sequence of tensor network operations is performed based on specified hardware capabilities of the target quantum device to execute the ordered list of quantum gates.
Example 13. The machine-readable medium of example 12 wherein the hardware capabilities of the target quantum device include a particular number of qubits and a particular connectivity between the qubits.
Example 14. The machine-readable medium of example 11 wherein the plurality of SVD operations include one or more truncations, contractions, and/or decompositions.
Example 15. The machine-readable medium of example 11 wherein the one or more operations to ensure that the output tensor is unitary comprises adding indices to one or more gates of the ordered list of quantum gates.
Example 16. The machine-readable medium of example 11 wherein the one or more operations to ensure that the output tensor is unitary comprises expanding a size of the input tensor while keeping one or more original scalar values fixed, and either (a) determining vectors of a null space of original vectors in the input tensor and inserting these new vectors into new positions; or (b) varying new scalar values variationally until the tensor is unitary.
Example 17. The machine-readable medium of example 11 wherein, for a quantum device comprising m qubits, the input tensor is limited to a size of N = 2^m.
Example 19. The machine-readable medium of example 14 wherein each of the plurality of SVD operations is performed in view of a defined tensor network goal.
Example 20. The machine-readable medium of example 11 further comprising program code to cause the machine to perform the operation of: performing one or more virtual compensation operations to mitigate errors when loading the ordered list of quantum gates on the target quantum device, the virtual compensation operations comprising generating pulses on one or more qubits based on data associated with the quantum device.
Example 21. An apparatus comprising: a memory to store program code and data; a processor to process the program code and data to perform a plurality of operations, comprising: receiving an input tensor corresponding to a quantum state; performing a sequence of tensor network operations on the input tensor to determine an output tensor representing an ordered list of quantum gates to be executed by a specified target quantum device, wherein the sequence of tensor network operations includes a plurality of singular value decomposition (SVD) operations and one or more operations to ensure that the output tensor is unitary.
The description uses the phrases “in an embodiment” or “in embodiments,” which may each refer to one or more of the same or different embodiments. Furthermore, the terms “comprising,” “including,” “having,” and the like, as used with respect to embodiments of the present disclosure, are synonymous.
Embodiments of the invention may include various steps, which have been described above. The steps may be embodied in machine-executable instructions which may be used to cause a general-purpose or special-purpose processor to perform the steps. Alternatively, these steps may be performed by specific hardware components that contain hardwired logic for performing the steps, or by any combination of programmed computer components and custom hardware components.
As described herein, instructions may refer to specific configurations of hardware such as application specific integrated circuits (ASICs) configured to perform certain operations or having a predetermined functionality or software instructions stored in memory embodied in a non-transitory computer readable medium. Thus, the techniques shown in the figures can be implemented using code and data stored and executed on one or more electronic devices (e.g., an end station, a network element, etc.). Such electronic devices store and communicate (internally and/or with other electronic devices over a network) code and data using computer machine-readable media, such as non-transitory computer machine-readable storage media (e.g., magnetic disks; optical disks; random access memory; read only memory; flash memory devices; phase-change memory) and transitory computer machine-readable communication media (e.g., electrical, optical, acoustical or other form of propagated signals, such as carrier waves, infrared signals, digital signals, etc.).
In addition, such electronic devices typically include a set of one or more processors coupled to one or more other components, such as one or more storage devices (non-transitory machine-readable storage media), user input/output devices (e.g., a keyboard, a touchscreen, and/or a display), and network connections. The coupling of the set of processors and other components is typically through one or more busses and bridges (also termed as bus controllers). The storage device and signals carrying the network traffic respectively represent one or more machine-readable storage media and machine-readable communication media. Thus, the storage device of a given electronic device typically stores code and/or data for execution on the set of one or more processors of that electronic device. Of course, one or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware. Throughout this detailed description, for the purposes of explanation, numerous specific details were set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without some of these specific details. In certain instances, well known structures and functions were not described in elaborate detail in order to avoid obscuring the subject matter of the present invention. Accordingly, the scope and spirit of the invention should be judged in terms of the claims which follow.
Number | Date | Country
---|---|---
63414847 | Oct 2022 | US