Disclosed is a ray-based classifier apparatus for tuning a device using machine learning with a ray-based classification framework, the ray-based classifier apparatus comprising: a machine learning module in communication with an autotuning module and that communicates a device state to the autotuning module, the machine learning module comprising: a training data generator module that produces fingerprint data; and a machine learning trainer module in communication with the training data generator module and that receives the fingerprint data from the training data generator module and produces the device state; and the autotuning module comprising: a recognition module in communication with the machine learning trainer module and a measurement module and that receives the device state from the machine learning trainer module, receives ray-based data from the measurement module, and produces recognition data based on the device state and the ray-based data; a comparison module in communication with the recognition module and that receives the recognition data from the recognition module and produces comparison data based on comparing the recognition data with a target state of the device; a prediction module in communication with the comparison module and that receives the comparison data from the comparison module and produces prediction data for the device based on the comparison data; a gate voltage controller in communication with the prediction module and the device and that receives the prediction data from the prediction module, produces controller data and device control data based on the prediction data, controls the device with the device control data, and communicates the controller data to a measurement module; and the measurement module in communication with the gate voltage controller, the device, and the recognition module and that receives the controller data from the gate voltage controller, receives device data from the device, produces ray-based 
data based on the controller data and the device data, and communicates the ray-based data to the recognition module, such that the recognition module performs recognition on the ray-based data using the device state, wherein the machine learning module and the autotuning module comprise one or more of logic hardware and a non-transitory computer readable medium storing computer executable code.
Disclosed is a process for tuning a device using machine learning with a ray-based classification framework and an autotuning module, the process comprising: generating, by a training data generator module using logic hardware, fingerprint data for the device; receiving, by a machine learning trainer module, the fingerprint data from the training data generator module; performing, by the machine learning trainer module using logic hardware, machine learning training and producing a device state of the device from the fingerprint data; receiving, by a recognition module, the device state from the machine learning trainer module; recognizing, by the recognition module using logic hardware, the state of the device from the device state using a trained deep neural network and producing recognition data based on the device state; receiving, by a comparison module, the recognition data from the recognition module; comparing, by the comparison module using logic hardware, a target state of the device with the recognition data and producing comparison data as a result of the comparison; receiving, by a prediction module, the comparison data from the comparison module; producing, by the prediction module using logic hardware, prediction data based on the comparison data; receiving, by a gate voltage controller, the prediction data from the prediction module; producing, by the gate voltage controller using logic hardware, controller data and device control data based on the prediction data; receiving, by the device, the device control data from the gate voltage controller, controlling the device with the device control data to modify the state of the device, and producing device data in response to controlling the device with the device control data; receiving, by a measurement module, the controller data from the gate voltage controller and device data from the device; producing, by the measurement module using logic hardware, ray-based data based on the controller data and 
the device data; and receiving, by the recognition module, the ray-based data from the measurement module and performing recognition on the ray-based data using the device state from the machine learning trainer module.
Disclosed is a process for tuning a device using machine learning with a ray-based classification framework and an action-based navigator module, the process comprising: generating, by a training data generator module using logic hardware, fingerprint data for the device; receiving, by a machine learning trainer module, the fingerprint data from the training data generator module; performing, by the machine learning trainer module using logic hardware, machine learning training and producing a device state of the device from the fingerprint data; setting, by a charging module using logic hardware, the charging energy for each quantum well of the device and defining a state action for each of the quantum wells by sending charging data to the device using logic hardware; acquiring, by a data acquisition module using logic hardware, state data from the device for a selected state recognizer; receiving, by a data checker module in communication with the data acquisition module, the state data from the data acquisition module and checking quality of the state data; and receiving, by a state estimator module in communication with the data checker module and the machine learning trainer module, the state data from the data checker module and the device state from the machine learning trainer module; estimating, by the state estimator module using logic hardware, the state of the device, determining whether to tune the device based on the state data relative to an estimation for the state of the device, and producing charging data and tuning the device according to the charging data based on the number of quantum dots of the device.
The following description cannot be considered limiting in any way. Various objectives, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the disclosed subject matter when considered in connection with the following drawings, in which like reference numerals identify like elements.
A detailed description of one or more embodiments is presented herein by way of exemplification and not limitation.
Aspects of the present disclosure may be embodied as an apparatus, system, method, or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, and the like), or an embodiment combining software and hardware aspects that may generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable storage media having computer readable program code embodied thereon.
Many of the functional units described in this specification have been labeled as modules, to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit including custom VLSI circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module can be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like.
Modules also can be implemented in software for execution by various types of processors. An identified module of executable code may, e.g., include one or more physical or logical blocks of computer instructions that can, e.g., be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together but can include disparate instructions stored in different locations that, when joined logically together, include the module and achieve the stated purpose for the module.
Indeed, a module of executable code can be a single instruction, or many instructions, and can be distributed over several different code segments, among different programs, or across several memory devices. Similarly, operational data can be identified and illustrated herein within modules, and can be embodied in any suitable form and organized within any suitable type of data structure. The operational data can be collected as a single data set or can be distributed over different locations including over different storage devices and can exist, at least partially, as electronic signals on a system or network. Where a module or portions of a module are implemented in software, the software portions are stored on one or more computer readable storage media. It should be appreciated that executable code can be implemented in logical hardware that includes applicable circuit elements and communication media.
Any combination of one or more computer readable storage media can be used. A computer readable storage medium can be, e.g., but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing or elements known in the art.
Exemplary computer readable storage media can include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a Blu-ray disc, an optical storage device, a magnetic tape, a Bernoulli drive, a magnetic disk, a magnetic storage device, a punch card, integrated circuits, other digital processing apparatus memory devices, or any suitable combination of the foregoing, but would not include propagating signals. In the context of this document, a computer readable storage medium can be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations for aspects of the present disclosure can be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Python, C++, or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer, e.g., through the Internet using an Internet Service Provider.
Furthermore, the described features, structures, or characteristics of the disclosure can be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of programming, software modules, user selections, network transactions, database queries, database structures, hardware modules, hardware circuits, hardware chips, and the like, to provide a thorough understanding of embodiments of the disclosure. However, the disclosure can be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Aspects of the present disclosure are described below with reference to schematic flowchart diagrams or schematic block diagrams of methods, apparatuses, systems, and computer program products according to embodiments of the disclosure. It will be understood that each block of the schematic flowchart diagrams or schematic block diagrams and combinations of blocks in the schematic flowchart diagrams or schematic block diagrams can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the functions or acts specified in the schematic flowchart diagrams or schematic block diagrams block or blocks.
These computer program instructions can be stored in a computer readable storage medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable storage medium produce an article of manufacture including instructions that implement the function or act specified in the schematic flowchart diagrams or schematic block diagrams block or blocks.
The computer program instructions can be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions or acts specified in the flowchart or block diagram block or blocks.
The schematic flowchart diagrams or schematic block diagrams in the Figures illustrate architecture, functionality, and operation of possible implementations of apparatuses, systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the schematic flowchart diagrams or schematic block diagrams can represent a module, segment, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the Figures. For example, two blocks shown in succession can be executed substantially concurrently, or the blocks sometimes can be executed in the reverse order, depending upon the functionality involved. Other steps and methods can be conceived that are equivalent in function, logic, or effect to one or more blocks, or portions thereof, of the illustrated Figures.
Although various arrow types and line types may be employed in the flowchart or block diagrams, they are understood not to limit the scope of the corresponding embodiments. Indeed, some arrows or other connectors can be used to indicate the logical flow of the depicted embodiment. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted embodiment. It will also be noted that each block of the block diagrams or flowchart diagrams, and combinations of blocks in the block diagrams or flowchart diagrams, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or combinations of special purpose hardware and computer instructions.
A machine learning algorithm is an algorithm that can learn based on a set of data. Embodiments of machine learning algorithms can be designed to model high-level abstractions within a data set. For example, image recognition algorithms can be used to determine to which of several categories a given input belongs; regression algorithms can output a numerical value given an input; and pattern recognition algorithms can be used to generate translated text or perform text-to-speech or speech recognition.
An exemplary type of machine learning algorithm is a neural network. There are many types of neural networks: a simple type of neural network is a feedforward network. A feedforward network can be implemented as an acyclic graph in which the nodes are arranged in layers. Typically, a feedforward network topology includes an input layer and an output layer that are separated by at least one hidden layer. The hidden layer transforms input received by the input layer into a representation that is useful for generating output in the output layer. The network nodes are fully connected via edges to the nodes in adjacent layers, but there are no edges between nodes within each layer. Data received at the nodes of an input layer of a feedforward network are propagated (i.e., fed forward) to the nodes of the output layer via an activation function that calculates the states of the nodes of each successive layer in the network based on coefficients (weights) that are respectively associated with each of the edges connecting the layers. Depending on the specific model being represented by the algorithm being executed, the output from the neural network algorithm can take various forms.
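By way of illustration only, the feedforward propagation just described can be sketched in a few lines of Python; the network size, the weights, and the choice of a sigmoid activation are arbitrary assumptions made for the sketch, not part of any embodiment:

```python
import math

def feedforward(x, w_hidden, w_out):
    """Propagate an input vector through one hidden layer to a single output.
    The sigmoid activation and all weights are illustrative assumptions."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    # Hidden layer: each node activates a weighted sum of the input vector.
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
    # Output layer: the same weighted-sum-and-activate step on the hidden states.
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)))
```

Consistent with the description above, each layer's node states are computed only from the previous layer's states through the weighted edges, with no edges between nodes within a layer.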
Before a machine learning algorithm can be used to model a particular problem, the algorithm is trained using a training data set. Training a neural network involves selecting a network topology, using a set of training data representing a problem being modeled by the network, and adjusting the weights until the network model performs with a minimal error for all instances of the training data set. For example, during a supervised learning training process for a neural network, the output produced by the network in response to the input representing an instance in a training data set is compared to the correct labeled output for that instance, an error signal representing the difference between the output and the labeled output is calculated, and the weights associated with the connections are adjusted to minimize that error as the error signal is backward propagated through the layers of the network. The network is considered trained when the errors for each of the outputs generated from the instances of the training data set are minimized.
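The supervised training loop described above can be illustrated with a deliberately minimal example: a single linear unit fitted by gradient descent on squared error. The dataset, learning rate, and epoch count are illustrative assumptions, not parameters of any embodiment:

```python
def train(data, lr=0.1, epochs=200):
    """Fit a single linear unit (w * x + b) by stochastic gradient descent on
    squared error. Dataset, learning rate, and epoch count are illustrative."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in data:
            out = w * x + b          # output for this training instance
            err = out - label        # error signal vs. the labeled output
            w -= lr * err * x        # adjust weights to reduce the error
            b -= lr * err
    return w, b
```

As in the description above, each instance's output is compared to its labeled output, and the weights are repeatedly adjusted until the error over the training data set is minimized.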
The accuracy of a machine learning algorithm can be affected significantly by the quality of the data set used to train the algorithm. The training process can be computationally intensive and can involve a significant amount of time on a conventional general-purpose processor. Accordingly, parallel processing hardware is used to train many types of machine learning algorithms. This can be particularly useful for optimizing the training of neural networks, as the computations performed in adjusting the coefficients in neural networks lend themselves naturally to parallel implementations. Specifically, many machine learning algorithms and software applications have been adapted to make use of the parallel processing hardware within general-purpose graphics processing devices.
Hardware acceleration for machine learning application 272 can be enabled via machine learning framework 273. Machine learning framework 273 can provide a library of machine learning primitives. Machine learning primitives are basic operations that are commonly performed by machine learning algorithms. Without machine learning framework 273, developers of machine learning algorithms would be required to create and optimize the main computational logic associated with the machine learning algorithm, then re-optimize the computational logic as new parallel processors are developed. Instead, the machine learning application can be configured to perform the necessary computations using the primitives provided by machine learning framework 273. Exemplary primitives include tensor convolutions, activation functions, and pooling, which are computational operations that are performed while training a convolutional neural network (CNN). Machine learning framework 273 can provide primitives to implement basic linear algebra subprograms performed by many machine-learning algorithms, such as matrix and vector operations.
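As an illustration of the kind of primitive such a framework can supply, the following sketch implements non-overlapping one-dimensional max pooling; the function name and signature are hypothetical and do not correspond to the API of any particular framework:

```python
def max_pool_1d(signal, window):
    """Illustrative pooling primitive (hypothetical name and signature).
    Slides a non-overlapping window over the input and keeps each window's
    maximum, reducing the length of the signal by the window factor."""
    return [max(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, window)]
```

A framework-supplied, hardware-optimized version of such a primitive spares the application developer from re-optimizing this computational logic for each new parallel processor.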
Machine learning framework 273 can process input data received from machine learning application 272 and generate the appropriate input to compute framework 274. Compute framework 274 can abstract the underlying instructions provided to GPGPU driver 275 to enable machine learning framework 273 to take advantage of hardware acceleration via GPGPU hardware 276 without requiring machine learning framework 273 to have intimate knowledge of the architecture of GPGPU hardware 276. Additionally, compute framework 274 can enable hardware acceleration for machine learning framework 273 across a variety of types and generations of GPGPU hardware 276.
The computing architecture provided by embodiments described herein can be configured to perform the types of parallel processing that are particularly suited for training and deploying neural networks for machine learning. A neural network can be generalized as a network of functions having a graph relationship. A variety of types of neural network implementations are used in machine learning. An exemplary type of neural network is the feedforward network, as previously described.
A second exemplary type of neural network is the Convolutional Neural Network (CNN). A CNN is a specialized feedforward neural network for processing data having a known, grid-like topology, such as image data. Accordingly, CNNs are commonly used for computer vision and image recognition applications. The nodes in the CNN input layer can be organized into a set of filters (feature detectors inspired by the receptive fields found in the retina), and the output of each set of filters is propagated to nodes in successive layers of the network. The computations for a CNN include applying the convolution mathematical operation to each filter to produce the output of that filter. Convolution is a specialized kind of mathematical operation performed on two functions to produce a third function that is a modified version of one of the two original functions. In convolutional network terminology, the first function of the convolution can be referred to as the input, while the second function can be referred to as the convolution kernel. The output can be referred to as the feature map. For example, the input to a convolution layer can be a multidimensional array of data that defines the various components, e.g., colors or contrasts, of an input image. The convolution kernel can be a multidimensional array of parameters, where the parameters are adapted by the training process for the neural network.
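The convolution operation just described can be illustrated for the one-dimensional case; the input and kernel values below are arbitrary, and the sketch follows the mathematical definition of convolution (with the kernel flipped) rather than the convention of any particular framework:

```python
def convolve1d(inp, kernel):
    """Valid-mode one-dimensional convolution. Following the terminology in
    the text, `inp` is the input, `kernel` is the convolution kernel, and the
    returned list is the feature map; all values are illustrative."""
    k = len(kernel)
    flipped = kernel[::-1]  # flip per the mathematical definition
    return [sum(inp[i + j] * flipped[j] for j in range(k))
            for i in range(len(inp) - k + 1)]
```

For example, convolve1d([1, 2, 3, 4], [1, 0]) yields the feature map [2, 3, 4]: the kernel shifts the input by one position, a modified version of the original input function.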
Recurrent neural networks (RNNs) are a family of neural networks that include feedback connections between layers. RNNs enable modeling of sequential data by sharing parameter data across different parts of the neural network. The architecture for an RNN includes cycles. The cycles represent the influence of a present value of a variable on its own value at a future time, as at least a portion of the output data from the RNN is used as feedback for processing subsequent input in a sequence. This feature makes RNNs particularly useful in dynamical systems where the state of the system changes, such as for language processing due to the variable nature in which language data can be composed.
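In the simplest case, the cycle described above reduces to a hidden state that feeds back into its own update while the same parameters are shared across every element of the sequence. The scalar weights below are illustrative assumptions, not values from any embodiment:

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent update: the previous hidden state feeds back into the
    present one, so a present value influences its own future value."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_sequence(xs, w_x=0.5, w_h=0.8, b=0.0):
    """Apply the same shared parameters to every element of the sequence."""
    h = 0.0  # initial hidden state
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)
    return h
```

The final hidden state summarizes the whole sequence, which is what makes this structure suitable for sequential data such as language.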
The figures described below include a general process for respectively training and deploying various types of networks. It will be understood that these descriptions are exemplary and non-limiting as to any specific embodiment described herein, and the concepts illustrated can be applied generally to deep neural networks and machine learning techniques in general.
The exemplary neural networks described above can be used to perform deep learning. Deep learning is machine learning using deep neural networks. The deep neural networks used in deep learning are artificial neural networks composed of multiple hidden layers, as opposed to shallow neural networks that include only a single hidden layer. Deeper neural networks are generally more computationally intensive to train. However, the additional hidden layers of the network enable multistep pattern recognition that results in reduced output error relative to shallow machine learning techniques.
Deep neural networks used in deep learning typically include a front-end network to perform feature recognition coupled to a back-end network which represents a mathematical model that can perform operations (e.g., object classification, speech recognition, and the like) based on the feature representation provided to the model. Deep learning enables machine learning to be performed without requiring hand-crafted feature engineering to be performed for the model. Instead, deep neural networks can learn features based on statistical structure or correlation within the input data. The learned features can be provided to a mathematical model that can map detected features to an output. The mathematical model used by the network is generally specialized for the specific task to be performed, and different models will be used to perform different tasks.
Once the neural network is structured, a learning model can be applied to the network to train the network to perform specific tasks. The learning model describes how to adjust the weights within the model to reduce the output error of the network. Backpropagation of errors is a common method used to train neural networks. An input vector is presented to the network for processing. The output of the network is compared to the desired output using a loss function and an error value is calculated for each of the neurons in the output layer. The error values are then propagated backwards until each neuron has an associated error value which roughly represents its contribution to the original output. The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the neural network.
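A single stochastic gradient descent update for one sigmoid output neuron makes the weight-adjustment step concrete; the squared-error loss and learning rate here are illustrative choices, not features of any embodiment:

```python
import math

def sgd_update(w, x, target, lr=0.5):
    """One stochastic gradient descent update for a single sigmoid neuron
    trained with squared-error loss (illustrative loss and learning rate)."""
    out = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    # Error value at the output, scaled by the activation's slope (the
    # quantity propagated backwards in backpropagation).
    delta = (out - target) * out * (1.0 - out)
    # Each weight moves against its own input's contribution to the error.
    return [wi - lr * delta * xi for wi, xi in zip(w, x)]
```

In a multilayer network the same delta quantity is propagated backwards layer by layer, so that each weight's update reflects its contribution to the output error.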
In some embodiments, a device is tuned to form quantum dots having a selected electron occupancy. Such a device, with selectively tailorable arrangements of quantum dots that are addressed via gate electrodes under control of individual gate electrode potentials, can be used in quantum computing. In quantum computing, there is a need for means of controlling and coupling single charges and spins, which the processes and articles described herein provide.
For encoding and manipulation of quantum information, what is required is confinement of single electrons. The spin degree of freedom of the electron provides a natural two-level quantum system to encode the information in the form of a quantum bit (qubit), the fundamental unit of quantum information. In this case, the qubit includes a spin up state (state 0), a spin down state (state 1), and interim states that are a superposition of both the spin up and spin down states at the same time. The states of a qubit can be represented as points on the surface of a sphere (the Bloch sphere) as shown in
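The superposition just described can be written in standard bra-ket notation; this expression is supplied for illustration, with α and β denoting complex amplitudes:

```latex
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,
\qquad |\alpha|^2 + |\beta|^2 = 1
```

Here |α|² and |β|² give the probabilities of measuring the spin up and spin down states, respectively; the pure states 0 and 1 sit at the poles of the Bloch sphere, and superpositions lie on its surface between them.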
Of the variety of approaches to confining electron spins, confinement of a single electron spin in the solid state is sought with the goal of integration with solid-state (micro-) electronics. A quantum dot (QD) provides such confinement by using, in some implementations, electric control gates on a semiconductor substrate. Frequently used substrates include silicon (Si), aluminum gallium arsenide heterostructures (AlGaAs/GaAs), silicon germanium heterostructures (Si/SiGe), and indium arsenide (InAs).
Quantum computation can be performed with spin qubits from a plurality of quantum dots. Quantum computation is generally represented as a sequence of operations involving precise functionalities from a physical circuit. A sequence is represented in
An array of quantum dots (QDs) is used, and in some implementations their reservoir (R), like in
The next part is initializing qubits to a known state. It is performed in some implementations by applying an external magnetic field to polarize the spins, as in
A readout of some or all of the qubits determines the result of a quantum calculation. In some implementations, this can be obtained by spin dependent tunneling to the reservoir, where the electron occupation in the dot remains one if the spin is up, and becomes zero if the spin is down. The change in occupation is detected by charge sensing (
Control of coherent electron spin states in quantum dots can be limited by short coherence times due to a short stability of the superposition state. In this sense, qubits are fragile entities. The challenge is to protect the state of a qubit from the surrounding environment long enough to achieve a sufficient number of logic operations on the quantum state for useful calculations. In order to achieve this feat, the surrounding environment is controlled. Isotopically enriched 28Si substrates can provide sufficiently long coherence times for robust quantum computing with spin qubits in quantum dots.
Architectures for quantum dots include arena designs and local accumulation designs. Arena designs rely on electrostatic gates to deplete regions of a two-dimensional electron gas (2DEG), formed by a heterostructure or by a global accumulation gate.
In local accumulation designs, dots or reservoirs can be formed directly by local accumulation gates instead of a combination of a global accumulation gate and electrostatic gate areas. Additional gates increase the confinement of accumulated regions and control the tunnel barriers. Here, the quantum dot well can be provided using an accumulation gate, while the reservoir can be provided using a depletion mode tunnel barrier gate and confinement in the well is enhanced using various depletion mode gates.
It has been discovered that the ray-based classifier apparatus and tuning a device using machine learning with a ray-based classification framework provide a machine learning algorithm that, trained on a dataset of simulated measurements of a device that includes a plurality of quantum dots, can tune the device to operate in single- and few-electron configurations. In this respect, a deep neural network-based classification framework uses a minimal collection of one-dimensional measurements, referred to as rays, initiated at a given point to make a fingerprint of the device's state. The ray-based classifier apparatus and tuning the device substantially reduce the time and number of measurements needed to characterize the device's state when compared to conventional two-dimensional scans. In an aspect, the training dataset is generated using a selected physical model to create qualitative agreement. The physical model can be a Thomas-Fermi based electron density model. By varying the defining physical parameters in this model, a range of possible experimental configurations and realizations is sampled, making the classifier device-agnostic. This trained system is then used to identify and tune real-world devices.
The ray-based classifier apparatus and tuning use a framework for assessing the state of multi-parameter devices that combines reduced measurements (i.e., ray-based measurements) with artificial intelligence (AI). The state recognition framework is based on a deep neural network trained on geometric information extracted from the ray-based measurements and simulations of the target physical system. This information, in combination with an optimization algorithm, allows the device state to be tuned to specified, useful parameter regimes.
For quantum dot-based devices, measurements of QD states can be represented visually as various shapes in the N-dimensional space, where the response variable peaks at the boundaries of the shapes (corresponding to changes in the occupation of the QDs). Here, N is the number of electrostatic gates that define the QDs. The specific geometry of these shapes corresponds to the number of populated QDs, which is valuable information in the process of tuning a QD system. For the simple case of double QD devices, the states are characterized by a series of parallel lines of a certain slope (when only a single dot is formed), honeycomb-like shapes (when two coupled dots are formed), or an absence of regular features (when no dot is formed). For devices with more dots, the states are characterized by different bounded and unbounded polytopes in the N-dimensional space. As used herein, “dot” refers to a quantum dot, and each quantum dot provides an isolated island of electron density.
A conventional calibration process for QD devices involves a series of measurements in which one or more voltages on electrostatic gates that control various device parameters, including the number and occupation of QDs, are swept while a single response variable is monitored. For systems with N>>3 electrostatic gates, as needed to create the large number of dots necessary for quantum computing, it is imperative to have a reliable automated method to find a stable, desirable electron configuration in an array of quantum dots.
As the number of gates increases, heuristic classification and tuning of the system becomes increasingly difficult, as does the time it takes to fully explore the voltage space of all relevant gates. Rather than using dense, multi-dimensional data, the ray-based classifier apparatus and tuning process described herein includes a DNN classification framework that uses a minimal collection of rays as one-dimensional representations to construct the fingerprint of the structure. The ray-based classifier apparatus and tuning process sample a small set of one-dimensional lines to determine volumetric information about the high dimensional space.
Ray-based classifier apparatus 200 tunes device 217 using machine learning with a ray-based classification framework. In an embodiment, with reference to
In an embodiment for action-based automated double dot navigation, with reference to
In an embodiment, with reference to
In an embodiment, with reference to
In an embodiment, device 217 can include double quantum dots. Here, double QD devices were analyzed using a physics-based simulator developed to mimic the behavior of actual experimental systems. A dataset of over 27 k fingerprints was generated over 20 different simulated devices. Specifically, devices were defined with five electrostatic gates (two plunger gates designed for QD formation, separated by three barrier gates controlling the movement of electrons) and can operate in one of five possible configurations: no dot (i.e., no island of electron density); a single dot primarily coupled to either the right or the left plunger gate, or a single central dot (a single island of electron density formed over the right or left plunger or centrally, respectively); and double dot (two islands of electron density).
A fully connected DNN can identify the state of the device. This trained network can be used to make predictions on data the DNN has never encountered before. The ray-based classifier decreases computational cost and the amount of data needed as compared with conventional technology. The trained network can be combined with numerical optimization routines to identify and tune a series of devices into a desired regime of operation.
Tuning device 217 using machine learning with the ray-based classification framework can include generating a simulated dataset of experimental results, training a neural network to learn certain characteristics from this dataset, and then using the trained neural network to tune a physical device into proper regimes of operation. Tuning device 217 relies on the existence of a good-quality dataset or simulation that can qualitatively mimic the device under operation. Training of the machine learning algorithm and its performance on real device data depend on whether the physical model that has gone into simulating the dataset has the right assumptions connecting it with real operation of device 217. Moreover, with an increasing number of quantum dots, simulation of the dataset can become prohibitively expensive, so there is a need for different approaches to dataset generation, which ray-based classifier apparatus 200 and the tuning described here provide. Advantageously, tuning a device using machine learning with a ray-based classification framework reduces the experimental and simulation time and data cost. Finally, tuning a device using machine learning with a ray-based classification framework provides a closed-loop system that tunes QDs without intervention of a human experimenter.
Conventional adjustment of experimental devices often relies on heuristics developed by researchers. Tuning a device using machine learning with a ray-based classification framework eliminates such a dependence and instead substitutes a fully automated routine with the heuristics gained from a dataset. Moreover, conventional tuning techniques rely on measuring 2D scans, which do not scale with an increasing number of QDs. Tuning a device using machine learning with a ray-based classification framework provides an AI algorithm that is trained on data generated for a range of the defining physical parameters in the model, so the classifier becomes device agnostic. As such, the trained system can be used to identify and tune various types and architectures of experimental devices, e.g., gate-defined QDs or dopants in semiconductors. The only thing that changes between the different devices is which gates need to be controlled by the tuner. Moreover, tuning a device using machine learning with a ray-based classification framework can be applied to efficient estimation of the states of solid-state and atomic experimental systems, as well as control problems in a variety of quantum computing architectures.
Ray-based classifier apparatus 200 and tuning a device using machine learning with a ray-based classification framework auto-tune quantum dot devices to a specific electron state that can be used to form quantum-dot-based qubits. This framework combines a data quality control module; machine-learning-based state assessment with data collected either in a traditional 2D format or using the ray-based approach described above; and an action-based approach to device calibration that combines small-scale ray-based measurements with physics knowledge about the device characteristics to bring the device to the desired electronic state. Ray-based classifier apparatus 200 and tuning a device using machine learning with a ray-based classification framework provide reliable automation of the calibration process while significantly reducing the time and number of measurements necessary for characterization compared to conventional approaches.
Ray-based classifier apparatus 200 and tuning a device using machine learning with a ray-based classification framework provide autonomous navigation of the voltage space of QD devices that exploits the features characteristic of the measurement space. QD qubit systems can include multiple electrostatic gates to isolate, control, and sense each qubit. Depending on the type of QD device, specific gates can be designed to accumulate electrons into QDs (plungers) and to control the tunneling between QDs (barriers). There can be at least three metallic gates that are voltage-adjustable to isolate each dot to the single-electron regime and to realize qubit performance.
Ray-based classifier apparatus 200 and tuning a device using machine learning with a ray-based classification framework can include modules for fine-tuning electrostatic gates to reach the device operating point. One module uses machine learning (ML) to identify the device state and the known effects of the gates on QD states to navigate to the N-QD region, where N is the number of charge islands possible in a QD device. Successful termination of this module can directly progress to a next module. The next module leverages calibrated physics-based actions and peak finding on sample-efficient 1-dimensional data (rays) to navigate to the area of the previous region where each charge island has a single charge.
The first module takes advantage of the designed effect of a device's gates to navigate voltage space. In contrast, conventional approaches for this level of tuning do not use the geometry of the manifolds defining QD states. For a QD device, the operating region includes a distinct island of electrons at the location of each plunger gate, separated by the electrostatic potential of the barrier gate. To reach this region, each plunger gate needs to be set to a high enough voltage to induce an electron island, but not too high relative to the barrier potential that the islands merge. Likewise, the barrier voltages need to be high enough to separate charge islands but not so high that no islands can form or that the interdot coupling is not possible. To determine which gates need to be changed and in what capacity, ray-based classifier apparatus 200 and tuning a device using machine learning with a ray-based classification framework combine physical knowledge about the gates with information about the state of the device through ML recognition of 2D data, of 1D data, or other methods such as pattern matching.
For a double QD device that includes two quantum dots, a no dot state indicates that no electrons are in the device, so both plunger gate voltages must be increased. A left or right dot state indicates that only one side of the device is occupied, so the voltage of the opposite plunger gate must be increased. A central dot indicates that too many electrons are in the device, so both plunger gate voltages must be decreased. A double dot state is the target, so no change is needed in this case. To address tuning in transitional regions where multiple states are present, the action taken is the average of the actions for the states, weighted by the state percentages. For example, 50% central dot (decrease both plungers) and 50% left dot (increase right plunger) yield a decrease of the left plunger voltage.
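The state-to-action mapping and the weighted averaging described above can be sketched in a few lines of Python; the state names and the sign-vector encoding of actions as (left plunger, right plunger) pairs are illustrative assumptions, not the exact implementation:

```python
# Hypothetical sketch of the state-weighted action rule for double QD navigation.
# Each action is a sign vector (change to left plunger, change to right plunger).
ACTIONS = {
    "no_dot":      (+1.0, +1.0),   # no electrons: raise both plungers
    "left_dot":    ( 0.0, +1.0),   # left dot only: raise the opposite (right) plunger
    "right_dot":   (+1.0,  0.0),   # right dot only: raise the opposite (left) plunger
    "central_dot": (-1.0, -1.0),   # too many electrons: lower both plungers
    "double_dot":  ( 0.0,  0.0),   # target state: no change
}

def weighted_action(state_probs):
    """Average the per-state actions, weighted by the state percentages."""
    dv_left = sum(p * ACTIONS[s][0] for s, p in state_probs.items())
    dv_right = sum(p * ACTIONS[s][1] for s, p in state_probs.items())
    return dv_left, dv_right

# The example from the text: 50% central dot and 50% left dot
# yields a net decrease of the left plunger only.
print(weighted_action({"central_dot": 0.5, "left_dot": 0.5}))  # (-0.5, 0.0)
```

The weighted average degenerates to the single-state action when the classifier is certain, and interpolates smoothly in transitional regions.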
The second module uses data-efficient 1-dimensional scans to unload each charge island to single-electron occupation. This is a departure from conventional approaches that rely on 2D scans and ML. Changes in electron occupation are indicated by sharp changes in charge, which can be autonomously detected using peak detection algorithms. However, in the presence of noise, this peak detection can be unreliable. Moreover, each plunger gate has unintended effects on nearby quantum dots, so the direction of 1D scans must be carefully chosen to ensure the desired outcome. Ray-based classifier apparatus 200 and tuning a device using machine learning with a ray-based classification framework use automated quality assessment and redundancy to avoid failure due to unreliable peak detection. Ray-based classifier apparatus 200 and tuning a device using machine learning with a ray-based classification framework ensure that 1D scans affect the QD only as intended by measuring the effect of each gate on each dot before initiating the unloading process. This module greatly reduces the data needed to tune to the single-occupation state while remaining effective as compared with conventional technology.
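A minimal sketch of the sharp-change detection on a 1D charge trace follows; the threshold value and the example trace are illustrative assumptions (a practical implementation would add the quality assessment and redundancy described above to cope with noise):

```python
def find_transitions(trace, threshold=0.5):
    """Flag indices where the charge signal jumps sharply between neighboring
    points, indicating an electron transition. Threshold is an assumed parameter."""
    diffs = [abs(b - a) for a, b in zip(trace, trace[1:])]
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# A noiseless charge trace with two sharp steps (unloading two electrons):
trace = [2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(find_transitions(trace))  # [3, 6]
```

Counting the detected transitions along a ray tells the unloading routine how many electrons remain to be removed before the single-occupation state is reached.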
The articles and processes herein are illustrated further by the following Examples, which are non-limiting.
While classification of arbitrary structures in high dimensions may require complete quantitative information, for simple geometrical structures, low-dimensional qualitative information about the boundaries defining the structures can suffice. Rather than using dense, multi-dimensional data, we propose a deep neural network (DNN) classification framework that utilizes a minimal collection of one-dimensional representations, called rays, to construct the “fingerprint” of the structure(s) based on substantially reduced information. We empirically study this framework using a synthetic dataset of double and triple quantum dot devices and apply it to the classification problem of identifying the device state. We show that the performance of the ray-based classifier is already on par with traditional 2D images for low dimensional systems, while significantly cutting down the data acquisition cost.
Deep learning is applicable to physical problems in the classification of arbitrary convex geometrical shapes embedded in an N-dimensional space. Having a mathematical framework to understand this class of problems and a solution that scales efficiently with the dimension N is essential. With increasing effective dimensionality of the system, including parameters and data, determining the geometry with measurements across the full parameter space may become prohibitively expensive. However, as we show, qualitative information about the boundaries defining the structures of interest may suffice for classification.
A new framework for classifying simple high-dimensional geometrical structures herein is referred to as ray-based classification. Rather than working with the full N-dimensional data tensor, we train a fully connected DNN using one-dimensional representations in RN, called “rays”, to recognize the relative position of features defining a given structure. We position the boundaries of this structure relative to a point of interest, effectively “fingerprinting” its neighborhood in the RN space. The ray-based classifier is motivated primarily by experiments, particularly those in which dense data collection is impractical. Our approach not only reduces the amount of data that needs to be collected, but also can be implemented in situ and in an online learning setting, where data is acquired sequentially.
We test the proposed framework using a modified version of the “Quantum dot data for machine learning” dataset developed to study the application of convolutional neural networks (CNNs) to enhance calibration of semiconductor quantum dot devices for use as qubits. Tuning these devices requires a series of measurements of a single response variable as a function of voltages on electrostatic gates. As the number of gates increases, heuristic classification and tuning becomes increasingly difficult, as does the time it takes to fully explore the voltage space of all relevant gates. The specific geometry of the response in gate-voltage space corresponds to the number and position of populated quantum dots, which is valuable information in the process of tuning of these systems.
An image-based CNN classifier for 2D volumes, i.e., solid images, combined with conventional optimization routines, can assist experimental efforts in tuning quantum dot devices between zero-, single- and double-dot states. Here, we consider a double- and triple-dot system. We show that using ray-based classification, the quantity of data required (and thus the time required) for identifying the state of the quantum dot system can be drastically reduced compared to an image-based classifier.
Consider Euclidean space RN with its conventional 2-norm distance function d, and a polytope function p:RN→{0, 1}. The set of points where p(x)=1 constitutes the boundary of a collection of polytopes. For example, a polytope function producing a square in R2 centered at the origin is p(x1, x2)={1 if |x1|+|x2|=1; 0 elsewhere}, where (x1, x2)ϵR2. In our quantum dot applications a value of p=1 indicates the location where an electron is transferred in or out of a dot.
Definition 1 (Rays). Given xo, xfϵRN, the ray Rxo,xf emanating from xo and terminating at xf is the set {x|x=(1−t)xo+txf, tϵ[0, 1]} (see
In practical applications, rays have a natural granularity that depends on the system as well as the data collection density. For quantum dots, the device parameters define an intrinsic separation between critical features that gives the scale of the problem. We refer to granularity of rays in terms of pixels.
To assess the geometry of a polytope enclosing any given point xo, we consider a collection of rays of a fixed length r centered at xo. The rays are uniquely determined by a set of M points on the sphere S^(N−1) of radius r centered at xo, P:={xm ϵ S^(N−1)_xo(r) | 1≤m≤M}. We call a set of M rays, RM:={Rxo,xm | xm ϵ P}, an M-projection (see
Definition 2 (Feature). Given a ray Rxo,xf and a polytope function p, a point xϵRxo,xf is a feature if p(x)=1.
The assumption that the weight function γ is monotonic in distance lets us define a ray's critical feature as the point xϵFxo,xf with highest (i.e., critical) weight Wxo,xf=γ(d(x, xo)). If Fxo,xf=ø, we put Wxo,xf=0. This allows us to “fingerprint” the space surrounding point xo.
Definition 3 (Point fingerprint). Let xo ϵ RN be a point from which a collection of rays RM={Rxo,xf1, . . . , Rxo,xfM} emanate. The point fingerprint of xo is the M-dimensional vector consisting of the rays' critical weights: Fxo=(Wxo,xf1, . . . , Wxo,xfM).
This point fingerprint Fxo of xo is the primary object of the ray-based classification framework. If sufficiently many rays in appropriate directions are chosen from xo, the fingerprint is sufficient, at least in principle, to qualitatively determine the geometry of the convex polytope enclosing xo. Due to the cost of experimental data acquisition, determining how few rays are sufficient for a machine learning algorithm to make this determination is of crucial importance. Looking to establish a correspondence between the fingerprint Fxo of point xo and the class of the polytope enclosing this point, we define the following problem:
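Using the square polytope function from the earlier example (p(x1, x2)=1 where |x1|+|x2|=1), the fingerprinting procedure can be sketched as follows; the discrete sampling along each ray, the step count, and the boundary tolerance are illustrative assumptions needed to evaluate p on a grid:

```python
import math

def p_square(x1, x2, tol=0.02):
    """Polytope function from the text: boundary of the diamond |x1| + |x2| = 1.
    The tolerance is an assumed discretization of the exact boundary condition."""
    return 1 if abs(abs(x1) + abs(x2) - 1.0) < tol else 0

def fingerprint(xo, M=6, r=2.0, steps=200):
    """Point fingerprint: for each of M evenly spaced rays of length r from xo,
    find the nearest feature (p = 1) and record the critical weight 1/d,
    or 0 if the ray encounters no feature."""
    fp = []
    for m in range(M):
        angle = 2 * math.pi * m / M
        weight = 0.0
        for k in range(1, steps + 1):
            t = k / steps
            x1 = xo[0] + t * r * math.cos(angle)
            x2 = xo[1] + t * r * math.sin(angle)
            if p_square(x1, x2):
                weight = 1.0 / (t * r)   # critical weight gamma(d) = 1/d
                break
        fp.append(weight)
    return fp

fp = fingerprint((0.0, 0.0), M=6)
```

For a point at the origin, all six rays intersect the diamond boundary, so every entry of the fingerprint is nonzero; for an unbounded region, rays that escape without crossing a boundary would contribute zeros, which is what lets the fingerprint distinguish bounded from unbounded polytopes.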
Problem 1. Given a set of bounded and unbounded convex polytopes filling an N-dimensional space and belonging to C distinct classes, CϵN, and a point xoϵRN, determine to which of the classes the polytope enclosing xo belongs.
A solution to this problem in the supervised learning setting can be obtained by training a DNN with the input being the point fingerprint and the output identifying an appropriate class. The procedural steps for the proposed classification algorithm for N-dimensional data in the form of pseudocode are presented in Algorithm 1 shown in
The ray-based data is generated using a physics-based simulator of quantum dot devices. An example of a simulated measurement, like the ones typically seen in the laboratory, is shown in
To test the ray-based classification framework in 2D, we use 20 realizations of 2D maps qualitatively comparable to the one shown in
To test the proposed framework with triple-dot systems, we generated a dataset by sampling 17,576 fingerprints from a single simulated device with three dot gates. We varied the number of rays between 6 and 18, while keeping the length of the rays fixed at 60 voxels. For each configuration, we performed N=10 training and validation runs (with data divided 80:20). As shown in
Quantum dots (QDs) defined with electrostatic gates are a leading platform for a scalable quantum computing implementation. However, with increasing numbers of qubits, the complexity of the control parameter space also grows. Traditional measurement techniques, relying on complete or near-complete exploration via two-parameter scans (images) of the device response, quickly become impractical with increasing numbers of gates. We circumvent this challenge by introducing a measurement technique relying on one-dimensional projections of the device response in the multidimensional parameter space. Dubbed the “ray-based classification (RBC) framework,” this machine learning approach implements a classifier for QD states, enabling automated recognition of qubit-relevant parameter regimes. We show that RBC surpasses the 82% accuracy benchmark from the experimental implementation of image-based classification techniques from prior work, while reducing the number of measurement points needed by up to 70%. The reduction in measurement cost is a significant gain for time-intensive QD measurements and is a step forward toward the scalability of these devices. We also discuss how the RBC-based optimizer, which tunes the device to a multiqubit regime, performs when tuning in the two-dimensional and three-dimensional parameter spaces defined by the plunger and barrier gates that control the QDs. This work provides experimental validation of both efficient state identification and optimization with machine learning techniques for nontraditional measurements in quantum systems with high-dimensional parameter spaces and time-intensive measurements.
The ease of control, fast measurement, and long coherence of semiconductor quantum dots (QDs) make them a promising platform for quantum computing. Individual qubits can be built from single QDs or multiple QDs coupled together. At present, most QD qubit systems require multiple electrostatic gates to isolate, control, and sense each qubit. Often, there are specific gates designed to accumulate electrons into QDs (plungers), gates to control the tunneling between QDs (barriers), and gates to deplete electrons elsewhere (screening gates). As QD devices grow in the number of qubits and complexity, so do the number of gate voltages to be controlled and tuned.
Although current few-qubit devices are mostly still tuned manually, there are several emerging automated approaches to various steps in the process of tuning QDs. Depending on the specific device design, each of these tuning steps requires specialized approaches for automation. Some automation techniques focus on tuning devices ab initio to a voltage space where QDs can form. Others focus on tuning the configuration of QDs, that is, from single QDs to coupled double QDs.
There are also methods to achieve a specific number of electrons in each QD or to measure and modify the couplings in multiple-QD systems. These various automation techniques have used many different tools: convolutional neural networks (CNNs), deep generative modeling, classical feature extraction (e.g., a Hough transformation), and many custom fitting models.
Motivated by the success of image-based autotuning, here we present an alternative approach that uses the recently proposed ray-based classification (RBC) framework to distinguish between different electron configurations. The RBC framework was originally proposed as an approach for classifying simple bounded and unbounded convex geometrical shapes. It thus naturally applies to identifying QD states that manifest themselves as distinct geometrical patterns in the charge sensor response as a function of the gate voltages. Here we present the classification of a Si/SixGe1-x QD device using this new method, both in a “live” measurement session during the experiment and “off-line” using a dataset of large stability diagrams taken from the device after tuning.
We explore how the hyperparameters of the RBC, such as the number of rays, ray length, and the choice of the weight function, affect the classification accuracy of experimental data. We find a favorable comparison with image-based classification in terms of accuracy and the quantity of data required. Furthermore, we show an off-line implementation of the RBC framework within an optimizer-based autotuner for a QD system, tuning between single and double QDs in a space of three gate voltages.
A visual inspection of the large scan of experimental data (differential charge sensing) presented in
The RBC framework focuses on data acquisition efficiency: rather than using full 2D images capturing a small region of the voltage space, it relies on a collection of evenly distributed one-dimensional traces (“rays”) originating from a single point xo and measured in multiple directions in the voltage space to describe the neighborhood of xo (see
A Si/SixGe1-x quadruple-QD device is used to create a double-QD charge sensed by a single sensing QD whose current readout is connected to a cryogenic amplifier. The device is a linear array of four QDs, opposing two charge sensors. The nearby gates (reservoir gates, depletion gates, and tunnel-barrier gates) are pretuned to allow single-QD and double-QD formation under the two leftmost plunger gates, P1 and P2 (see the inset in
To assess the geometry of the transition lines surrounding a given point xoϵ(VP1, VP2), we consider a collection of M rays of a fixed length centered at xo called the M-projection (see
Regardless of the ray data generation method, we collect complex voltage data from the lock-in amplifier
Once an M-projection for a given point xo is acquired, traditional signal processing techniques are used to test each ray for the presence of transition lines. While the noiseless simulation results in binary rays, with transitions easily identifiable along the rays, the noise present in the experimental data makes the transitions harder to detect. In the ac measurement, transitions manifest themselves as peaks along the ray (called “features” in the RBC framework, see
Finally, a “weight” function Γ is applied elementwise to scale the vector of critical features to a [0, 1] range, with rays having no peaks being assigned a default value of 0:

Fxo=Γ(d1, . . . , dM)=[γ(d1), . . . , γ(dM)],
where γ:N>0→[0, 1] is a normalizing decreasing function. The normalized vector of distances Fxo is called the “point fingerprint”. Because of the differences in the geometry of the transition lines for different QD states, distinct point fingerprints are encoded for the different states and a classifier trained on point fingerprint data suffices for the QD state identification. We use a simple deep neural network (DNN) classifier with three hidden layers for this purpose.
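A forward pass of such a DNN classifier can be sketched in plain Python; the layer widths and random (untrained) weights are illustrative assumptions, and a real classifier would be trained on the simulated fingerprint dataset:

```python
import math, random

random.seed(0)  # deterministic illustrative weights

def dense(x, w, b):
    """Fully connected layer: one weight row per output unit."""
    return [sum(xi * wij for xi, wij in zip(x, row)) + bj
            for row, bj in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def make_layer(n_in, n_out):
    return ([[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

# Illustrative sizes: a 6-ray fingerprint in, three hidden layers, five states out.
sizes = [(6, 32), (32, 32), (32, 32), (32, 5)]
layers = [make_layer(i, o) for i, o in sizes]

def classify(fingerprint):
    """Map a point fingerprint to the probability vector over the five states."""
    v = fingerprint
    for w, b in layers[:-1]:
        v = relu(dense(v, w, b))
    w, b = layers[-1]
    return softmax(dense(v, w, b))   # [p_ND, p_SDL, p_SDC, p_SDR, p_DD]

probs = classify([0.2, 0.5, 1.0, 0.9, 0.4, 0.1])
```

The softmax output matches the form of the probability vector p(xo) described below; training against the one-hot simulated labels would fit the weights that are randomized here.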
The flow of the RBC algorithm is shown in
The output of the classifier is a probability vector,
p(xo)=[pND, pSDL, pSDC, pSDR, pDD],
quantifying the current state of the device, with ND de-noting no QDs formed, SDL, SDC, and SDR denoting the left, central, and right single QD, respectively, and DD denoting the double-QD state.
The RBC framework was developed and tested originally on a dataset of simulated double-QD devices. An average accuracy of 96.4(4) % (averaged over N=50 models) with just six rays and a weight function γ(x)=1/x was reported for double QDs, where the accuracy is defined as the fraction of correctly classified points from a test dataset. This is on par with the more-data-demanding CNN-based classification framework, while requiring 60% fewer data. Given the success of the RBC framework on simulated devices, its performance on experimental data is of particular interest, as the reduced data requirement translates to a reduction of the measurement time in the experiment.
To assess the performance of the RBC framework with experimental data, we use an ensemble of 20 DNN classifiers pretrained using a modified version of the “Quantum dot data for machine learning” dataset. This avoids the need to manually label experimental data for training purposes. To prepare the DNNs, we rely on a dataset of 2.7×104 point fingerprints, sampled over 20 simulated QD devices. A number of parameters, such as the device geometry, gate positions, lever arms, and screening lengths, are varied between simulations to reflect the minimum qualitative features across a range of devices. For training purposes, each fingerprint Fxo is tagged with a label identifying the state of the device at point xo. The labels are generated as part of the simulation. Before training, the labels are converted to one-hot vectors (i.e., vectors of length equal to the number of classes with a single nonzero element indicating the true class) and treated as the probabilities p(xo) that xo is in any of the five possible states.
To test the performance of the RBC, we establish an off-line dataset of 311 labeled fingerprints using two measurement scans qualitatively comparable to the one presented in
Using the fingerprinting configuration of six evenly spaced rays of length 60 pixels (30 mV) and a weight function γ(x)=1/x, we achieve an average accuracy of 87.1(2.0) % (N=20 models). The number of rays, their length, and the choice of the weight function are all considered free parameters of the RBC framework. To optimize the machine learning process, we start by testing the effect of the weight function on the performance of the classifier. We use the four most promising combinations of the number of rays and the ray length: five and six rays of length 50 pixels (25 mV) and of length 60 pixels (30 mV). In our analysis, we consider a collection of three decreasing weight functions with varying decay rates: γ(x)=1/x, γ(x)=exp(−x), and γ(x)=1−x̂, where x̂=(x−min x)/(max x−min x) denotes the min−max normalization. In addition, we consider two nondecreasing functions: the min−max normalization γ(x)=x̂
Finding no difference in performance when using sim-ulated data, we test all functions using the test set of off-line experimental data.
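The candidate weight functions compared above can be evaluated with a small helper; the function and parameter names are illustrative, not from the original implementation:

```python
import math

def min_max_norm(xs):
    """x_hat = (x - min x) / (max x - min x), mapping distances into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def weights(distances, kind):
    """Apply one of the decreasing weight functions gamma from the text."""
    if kind == "inverse":          # gamma(x) = 1/x
        return [1.0 / d for d in distances]
    if kind == "exp":              # gamma(x) = exp(-x)
        return [math.exp(-d) for d in distances]
    if kind == "one_minus_hat":    # gamma(x) = 1 - x_hat
        return [1.0 - xh for xh in min_max_norm(distances)]
    raise ValueError(kind)

d = [1.0, 2.0, 4.0]                # distances to critical features, in pixels
print(weights(d, "inverse"))       # [1.0, 0.5, 0.25]
```

All three map near features to weights close to 1 and far features toward 0, differing only in how quickly the weight decays with distance.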
With the measurement efficiency in mind, we also test the effect of the number of rays and their length on the performance. We use M-projections with M=5, 6, 7, 9, and 12 rays and with lengths ranging between 20 pixels (10 mV) and 80 pixels (40 mV), sampled every four pixels (2 mV). Since the ray length directly affects the fingerprints (i.e., shorter rays will naturally miss a transition line that would be detected with a longer ray), the rays in the simulated dataset used to train DNNs are adjusted appropriately to ensure compatibility. As
To test the RBC in situ, we develop a measurement routine that enables live acquisition of ray data. After selection of a point xo, voltages on gates P1 and P2 are changed in tandem to achieve straight voltage rays emanating from xo. This is, in effect, virtual gating of the (VP1, VP2) voltage space. The performance of the classifier for live measurement of 36 points is shown in
To assess the performance for a larger set of points, we run the RBC off-line for a set of 2,500 points presampled from a large scan. The performance is shown in
The RBC combined with an optimization loop can be used to tune the device from one state to another (e.g., from a single-QD state to a double-QD state). We perform off-line tuning by initializing the device at a given point in the space of plunger voltages xo=(VP1,VP2) and then optimizing a fitness function over a premeasured scan to mimic an actual tuning run. The fitness function quantifies how close the probability vector returned by the RBC is to the desired target state. We use the fitness function:
δ(ptarget, p(xo))=∥ptarget−p(xo)∥2+ε(xo), (3)
where ∥⋅∥ is the Euclidean norm, ptarget is the probability vector for the target state, p(xo) is the probability vector returned by the RBC at xo, and ε(xo) is a penalty function for tuning to larger plunger voltages. We use ε(xo)∝{tanh[(VP1−VP1⁰)/V0]+tanh[(VP2−VP2⁰)/V0]}, where VP1⁰ and VP2⁰ are previously determined pinch-off values and V0 is a voltage scale normalizing the argument of the tanh function. We use V0=20 mV, approximately equal to the charging energy of the QDs. The penalty function acts as a regularization function for the bare Euclidean distance between the current and target state probability vectors. In particular, it adds a smooth gradient to the background as well as helps the optimizer escape from local minima.
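The fitness function of Eq. (3), with its tanh penalty term, can be sketched as follows; the pinch-off values and the proportionality constant in ε(xo) are illustrative assumptions (only V0=20 mV is stated in the text):

```python
import math

V0 = 20.0                     # mV, approximately the QD charging energy (from the text)
VP1_0, VP2_0 = 100.0, 100.0   # hypothetical pinch-off voltages, mV
EPS_SCALE = 0.1               # assumed proportionality constant for the penalty

def penalty(vp1, vp2):
    """Regularizer disfavoring tuning to larger plunger voltages."""
    return EPS_SCALE * (math.tanh((vp1 - VP1_0) / V0) +
                        math.tanh((vp2 - VP2_0) / V0))

def fitness(p_target, p_current, vp1, vp2):
    """Squared Euclidean distance to the target probability vector plus penalty."""
    dist2 = sum((t - c) ** 2 for t, c in zip(p_target, p_current))
    return dist2 + penalty(vp1, vp2)

# Target: double-QD state (one-hot over [ND, SDL, SDC, SDR, DD]).
target = [0.0, 0.0, 0.0, 0.0, 1.0]
print(fitness(target, [0.0, 0.0, 0.0, 0.0, 1.0], 100.0, 100.0))  # 0.0 at the target
```

Because tanh saturates, the penalty adds a bounded, smooth slope over the voltage plane rather than overwhelming the distance term near the target state.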
We use the Nelder-Mead optimizer implemented in SciPy. The optimizer maintains a set of objective function values at a simplex of n+1 points in n-dimensional space; in our case this amounts to evaluation on the vertices of a triangle in the 2D gate space. The orientation of the initial simplex is chosen dynamically on the basis of the initial state returned by the RBC and is obtained by changing the voltages on each of the plungers by 40 mV. The optimizer works by moving the simplex toward a minimum of the objective function on the basis of the function values at the simplex vertices. Since we lack analytic information about the derivative of the fitness function (Eq. 3 in this example), the Nelder-Mead optimizer is well suited for our purpose, as it relies only on function evaluations.
We perform off-line tuning on a sample premeasured large 2D scan to test the viability of the RBC framework in tuning the device state. The final state to be tuned to is set to the double-QD state. The initial points are uniformly sampled in a square grid over a range of 200 mV, which encompasses approximately 18 electron transitions [highlighted in
We perform off-line tuning in a three-dimensional (3D) space formed by a series of scans in the plunger gates space taken at different values of the middle barrier gate. As can be seen in
The failure modes for the tuning process in both two dimensions and three dimensions include landing at transition lines where the fingerprint does not correspond to either a single-QD state or a double-QD state, as well as converging to local minima of the fitness function. Although the addition of the regularization ε(xo) mitigates the latter to some extent, further work on optimization algorithms is necessary to increase the tuning success rate. Incorporation of a CNN-based classifier to verify the final state and, if necessary, reinitiate the autotuner would likely help alleviate the former issue. In comparison with the tuning results reported with CNNs, the RBC framework requires a comparable number of iterations to achieve the same end goal, leading to a significant reduction in data acquisition (approximately 60%) with the use of rays instead of 2D scans.
An experimental implementation of the ray-based classification framework using double-quantum-dot devices was examined. We propose a measurement scheme relying on one-dimensional projections in the plunger gates space as a means to “fingerprint” the device states. With measurement efficiency in mind, we consider various combinations of the number of rays and the length of rays, as well as multiple weight functions, to determine an optimal balance between measurement load and classification accuracy. We show that for the device used, the performance accuracy remains at about 87% regardless of whether six, seven, or nine rays are used. This translates to an up to approximately 70% reduction in the number of measured points needed for classification compared with the CNN-based approach. Increasing the number of rays to 12 results in an accuracy of about 90%, while reducing the number of points measured by 40%. See
We also show how the RBC framework can be implemented to tune the QD device in 2D and 3D gate space. We perform autotuning on a series of premeasured scans in 2D and 3D gate voltage spaces, reliably tuning the device from one state to another. In this work, we focus on automated tuning of a QD device into a voltage space with coupled double QDs. It is also important to note that this tuning scheme does not achieve a specific occupation of each QD, but rather achieves a few-electron double-QD regime. Depending on the intended functionality (single-electron qubit, multielectron qubit, etc.), additional methods are required to achieve an exact occupation for each QD.
With the noisy intermediate-scale quantum technology era on the horizon [38], it is important to consider the practical aspect of implementing automated control as part of the device itself, in an “on-chip” fashion. The network architecture necessary for RBC is significantly simpler and smaller than for CNN-based classification, making it more suitable for implementation on miniaturized hardware with low power consumption. In particular, the neural network used to train the RBC comprises only four fully connected dense layers with 128, 64, 32, and 5 units, respectively. The total number of parameters necessary for the RBC is about 1.2×10^4.
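A minimal NumPy sketch of this small network illustrates the quoted sizes. The 12-dimensional input (one distance per ray, matching the 12-ray configuration above) is an assumption on our part; with it, the four dense layers of 128, 64, 32, and 5 units give 12,165 weights and biases, consistent with the stated total of about 1.2×10^4.

```python
import numpy as np

# Four fully connected layers with 128, 64, 32, and 5 units, as described.
# The 12-dimensional input is an assumption (one value per ray for 12 rays).
rng = np.random.default_rng(0)
sizes = [12, 128, 64, 32, 5]
weights = [rng.standard_normal((m, n)) * 0.05 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """ReLU hidden layers, softmax output over the 5 state classes."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)
    logits = x @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())
    return e / e.sum()

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
# n_params = 12165, i.e. about 1.2e4
```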
With increasing complexity of QD devices in both QD number and gate geometry, the need for automated state identification and tuning will increase. With the development of QD-based spin qubits using industrial technologies, a technique that enables efficient and scalable characterization of QDs for qubit applications is necessary; the RBC framework provides such a measurement-cost-effective solution for state classification and tuning.
The problem of classifying high-dimensional shapes in real-world data grows in complexity as the dimension of the space increases. For the case of identifying convex shapes of different geometries, a new classification framework has recently been proposed in which the intersections of a set of one-dimensional representations, called rays, with the boundaries of the shape are used to identify the specific geometry. This ray-based classification (RBC) has been empirically verified using a synthetic dataset of two- and three-dimensional shapes and has been validated experimentally. Here, we establish a bound on the number of rays necessary for shape classification, defined by key angular metrics, for arbitrary convex shapes. For two dimensions, we derive a lower bound on the number of rays in terms of the shape's length, diameter, and exterior angles. For convex polytopes in R^N, we generalize this result to a similar bound given as a function of the dihedral angle and the geometrical parameters of polygonal faces. This result enables a different approach for estimating high-dimensional shapes using substantially fewer data elements than volumetric or surface-based approaches.
The problem of recognizing objects within images has received immense and growing attention in the literature. Aside from visual object recognition in two and three dimensions in real-world applications, such as in medical image segmentation or in self-driving cars, recognizing and classifying objects in N dimensions can be important in scientific applications. A problem arises in cases where data is costly to procure; another problem arises in higher dimensions, where shapes rapidly become more varied and complicated and classical algorithms for object identification quickly become difficult to produce. We combine machine learning algorithms with sparse data collection techniques to help overcome both problems.
The method we explore here is the ray-based classification (RBC) framework, which utilizes information about large N-dimensional data sets encoded in a collection of one-dimensional objects, called rays. Ultimately, we wish to explore the theoretical limits of how little data—how few rays, in our case—is required for resolving features of various sizes and levels of detail. In this paper, we determine these limits when the objects to be classified are convex polytopes.
The RBC framework measures convex polytopes by choosing a so-called observation point within the polytope, shooting a number of rays as evenly spaced as possible from this point, and recording the distance it takes for each ray to encounter a face. While it is reasonable to expect that an explicit algorithm for recognizing polygons in a plane can be developed, in arbitrary dimension such an explicit algorithm would be tedious to produce and theoretically unenlightening. We leave the actual classification to a machine learning algorithm.
The process here is applicable to quantum information systems, e.g., in calibrating the state of semiconductor quantum dots to work as qubits. The various device configurations create an irregular polytopal tiling of a configuration space, and the specific shape of a polytope conveys useful information about the corresponding device state. We map these shapes as cost-effectively as possible. Here, the cost arises because polytope edges are detected through electron tunneling events, which places hard physical limits on data acquisition rates. Apart from this original application, the techniques we developed should be valuable in any situation where object classification must be done despite constraints on data acquisition.
In the broad field of data classification in N=2, 3, 4, etc. dimensions, there are many unique approaches, often tailored to the constraints of the problem at hand. For example, higher dimensional data can be projected onto lower dimensions to employ standard deep learning techniques such as 3D ConvNets. Multiple low dimensional views of higher dimensional data can be collected to ease data collection and recognition. Models such as ShapeNets directly work with 3D voxel data. Data collected using depth sensors can be presented as RGB-D data or point clouds representing the topology of features present. Often, depth information is sparsely collected due to limitations of the depth sensors themselves. Within the field of representing 3D or higher dimensional data as point clouds, data can be treated in various ways such as simply N-dimensional coordinates in space, patches, meshed polygons, or summed distances of the data to evenly spaced central points. Critically, the RBC approach is suited for an environment in which data can be collected in any vector direction in N dimensional space while even coarse data collection of the total space would be practically too expensive or unfeasible.
The complexity of any classification problem intensifies in higher dimensions. This is the so-called curse of dimensionality, which has a negative impact on generalizing good performance of algorithms into higher dimensions. In general, with each feature and dimension, the minimum data requirement increases exponentially. This can be seen in the present work: according to Theorem 4.2, the data requirement increases like √N e^(αN). At the same time, in many applications data acquisition is very expensive, resulting in datasets with a large number of features and a relatively small number of samples per feature (so-called High Dimension Low Sample Size datasets).
Begin with a convex region Q ⊂ R^N along with a point x_o, the observation point, in the interior of Q. Given a unit vector v, the ray based at x_o in the direction v is
R_{x_o, v} = {x_o + tv | t ∈ [0, ∞)}. (3.1)
The set of directions v at x_o is naturally parameterized by the unit sphere S^(N−1). M many directions v_1, . . . , v_M ∈ S^(N−1) produce M many rays {R_i}_{i=1}^M, R_i = R_{x_o, v_i}, based at x_o. Because Q is convex, in the direction v_i there will be a unique distance t_i at which the boundary ∂Q is encountered. Given a set of directions and an observation point, the corresponding collection of distances is called the point fingerprint.
Definition 3.1. Given a convex region Q, a point x_o ∈ Q, and a set of directions {v_i}_{i=1}^M ⊆ S^(N−1), the corresponding point fingerprint is the vector
F(x_o, {v_i}_{i=1}^M) ≡ (t_1, . . . , t_M),
where t_i ∈ (0, ∞] is the unique value with x_o + t_i v_i ∈ ∂Q.
In practice, there will be an upper bound on what values the t_i may take, which we call T. If the ray does not intersect ∂Q prior to distance T, one would record t_i = ∞, indicating the region's boundary is effectively infinitely far away in that direction.
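The fingerprint computation above can be sketched for a convex region given as an intersection of half-planes {x : n_j · x ≤ b_j}: along each direction, the exit distance t_i is the smallest positive distance to a bounding line, with ∞ recorded when no boundary is met within the cutoff T. The half-plane representation and the function name are our own illustrative choices.

```python
import numpy as np

def fingerprint(x_o, directions, normals, offsets, T=10.0):
    """Point fingerprint (t_1, ..., t_M) of a convex region {x : n_j.x <= b_j}."""
    x_o = np.asarray(x_o, dtype=float)
    ts = []
    for v in directions:
        rates = normals @ v                # n_j . v for each face
        gaps = offsets - normals @ x_o     # b_j - n_j . x_o (>= 0 inside Q)
        # Only faces the ray approaches (rates > 0) can be exited through.
        cand = np.where(rates > 1e-12,
                        gaps / np.where(rates > 1e-12, rates, 1.0),
                        np.inf)
        t = cand.min()
        ts.append(t if t <= T else np.inf)
    return np.array(ts)

# Example: unit square [0,1]^2, observation point at its center,
# four axis-aligned ray directions.
normals = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
offsets = np.array([1, 0, 1, 0], dtype=float)
dirs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
        np.array([0.0, 1.0]), np.array([0.0, -1.0])]
fp = fingerprint([0.5, 0.5], dirs, normals, offsets)
# fp = [0.5, 0.5, 0.5, 0.5]
```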
The fingerprinting process is depicted in
With an eye toward eventually approximating arbitrary regions with polytopes, we define the following polytope classes.
Definition 3.2. Given N ∈ {2, 3, . . . } and d, l, α > 0, let Q(N, d, l, α) be the class of convex polytopes in R^N that have diameter at most d, all face inscription sizes at least l, and all exterior dihedral angles at most α.
The “inscription size” of a polytope face is the diameter of the largest possible (N−1)-disk inscribed in that face. In the case N=2, polytopes are just polygons and polytope faces are line segments. In this case the inscription size of a face is just its length. For the case of N=3, the inscription size of a face is the diameter of the largest possible disk inscribed in this face, see
Problem 3.1 (The identification problem). Given a polytope Q ∈ Q(N, d, l, α), determine the smallest M so that, no matter where x_o ∈ Q is placed, a fingerprint made from no more than M many rays is sufficient to completely characterize Q.
Again, the actual identification is done with a machine learning algorithm. In R^2, we actually solve this problem and find an optimal value of M. In higher dimensions we find a value for M that works, but which could be sharpened in some applications.
Hidden in Problem 3.1 is another problem we call the ray placement problem. To explain this, note that a large number of rays may be placed at x_o, but if the rays are clustered in some poor fashion, very little information about the polytope's overall geometry will be contained in the fingerprint. This means that before one can determine how many rays are needed, one must already know where to place the rays.
In R^2, this placement problem is easily solved: choosing a desired offset v_0, the v_i are placed at intervals of 2π/M along the unit circle. In higher dimensions the placement problem is much more difficult, and we have to work with suboptimally spaced rays. In fact, as we discuss later in this paper, even in R^3 an optimal placement is out of reach. To overcome this problem, we propose a general placement algorithm that works in arbitrary dimension and is reasonably sharp. As we show, the proposed algorithm is sufficient to enable concrete estimates on the numbers of rays required to resolve elements in Q(N, d, l, α).
In many practical applications, such as the calibration of quantum dot devices mentioned earlier, Problem 3.1 is much too strict. We may not need to reconstruct polytopes exactly but only classify them to within approximate specifications. For example, we may only wish to know if a triangle is “approximately” a right triangle, without needing enough data to fully reconstruct it. Or we may wish to distinguish triangles and hexagons, and not care about other polyhedra. Theoretically, this involves separating the full polytope set Q(N, d, l, α) into disjoint subclasses C_1, . . . , C_K ⊆ Q(N, d, l, α), with possibly a “leftover” set C_L = Q(N, d, l, α) \ ∪_{i=1}^K C_i of unclassifiable or perhaps unimportant objects. The idea is that an object's importance might not lie in its exact specifications, but in some characteristic it possesses.
Problem 3.2 (The classification problem). Assume Q(N, d, l, α) has been partitioned into classes {C_i}_{i=1}^K. Given a polytope Q, identify the C_i for which Q ∈ C_i.
The classification problem is eminently more suitable for machine learning than the full identification problem. This is in part because the outputs are more discrete (we can arrange it so the algorithm returns the integer i when QϵCi), and in part because machine learning usually produces systems good at identifying whole classes of examples that share common features, while ignoring unimportant details. Importantly, a satisfactory treatment of the classification problem can lead to solutions of more complicated problems, such as classifying compound items like tables, chairs, etc. in a 3D environment or geometrical objects obtained through measurements of an experimental variable in some parameter space. Depending on the origin or purpose of such objects, they naturally belong to different categories. For example, in the 3D real world, furniture and plants define two distinct classes that, if needed, can be further subdivided (e.g., a subclass of chairs, tables). Objects belonging to a single class, in principle, share common characteristics or similar geometric features of some kind.
In the quantum computing application, boundaries are identified by measuring discrete tunneling events, and there is little ambiguity in determining when a boundary was crossed. Since the fingerprinting method relies on identifying boundary crossings, in other circumstances boundary detection might require some other resolution. Here, machine learning methods compensate for boundaries that are indistinct or partially undetectable, as such algorithms often remain robust in the presence of noise.
A solution to Problem 3.2 in the supervised learning setting is obtained by training a deep neural network (DNN) with the input being the point fingerprint and an output identifying an appropriate class. A priori, it is unclear how many rays are necessary for a fingerprint-based procedure to reliably differentiate between polytopes. With data acquisition efficiency being the focus of this work, we want to theoretically determine the lower bound on the number of rays needed. Such a bound is fully within reach for polygons in R^2 (Theorem 4.1), and can be approximated in all higher dimensions (Theorem 4.2).
For a polytope face to be visible in a fingerprint, at least one ray must intersect it. To establish not only the presence of a face but also its orientation in N-space, at least N many rays must intersect it. The smaller a face is, the further away from the observation point x_o it is, or the more highly skewed its orientation is, the more difficult it is for a ray to intersect it. We address the case of polygons in R^2 first, as we obtain the most complete information there.
Recall that Q(2, d, l, α) is the class of polygons in the plane with diameter at most d, all edge lengths at least l, and all exterior angles at most α.
Theorem 4.1 (Polygon identification in R^2). Assume Q is a polygon in Q(2, d, l, α), and let x_o be a point in the polygon's interior, from which M many evenly spaced rays emanate. If
then two or more rays will intersect each boundary segment of Q, and one segment will be hit at least 3 times. The notation above indicates the usual ceiling function.
Knowing the location of two points on each edge is almost, but not quite, sufficient for identifying the polygon. There remains an ambiguity between the polygon and its dual; see
Identification in R^N follows a largely similar theory, with two substantial changes. The first is that we must change what is meant by the angular span of a face; the second is that we must deal with the ray placement problem mentioned above. The notion of angular span is relatively easily adjusted (see
Definition 4.1 (Angular span). If Q is a convex polytope in R^N, N ≥ 2, x_o is an observation point in Q, and L is a face of Q, the angular span of L is the cone angle of the largest circular cone based at x_o such that the cross-section of the cone created by the plane containing L lies entirely within L.
We create a solution for the ray placement problem with an inductive algorithm, but first we require some spherical geometry. Given two points v, w ∈ S^(N−1), let Dist_{S^(N−1)}(v, w) be the great-circle distance between them (see
B_v(r) = {w ∈ S^(N−1) | Dist_{S^(N−1)}(v, w) ≤ r}.
For example, a ball B_v(π) of radius π is the entire sphere itself, and any ball of the form B_v(π/2) is a hemisphere centered on v. It will be important to know the (N−1)-area of the unit sphere S^(N−1), and also the (N−1)-area of any ball B_v(r) ⊆ S^(N−1). The standard area formulas from differential geometry are
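The display itself does not appear above; as a sketch, the standard differential-geometry results (reproduced from common knowledge, not recovered from the source) read:

```latex
\left|S^{N-1}\right| = \frac{2\pi^{N/2}}{\Gamma(N/2)},
\qquad
\left|B_v(r)\right| = \left|S^{N-2}\right| \int_0^r \sin^{N-2}(t)\, dt .
```

For N = 2 these reduce to |S^1| = 2π and |B_v(r)| = 2r, the circumference of the unit circle and the length of an arc of radius r.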
Definition 4.2 (Density of points in S^(N−1)). Let P ⊆ S^(N−1) be a finite collection of points P = {v_1, . . . , v_k}, v_i ∈ S^(N−1) for 1 ≤ i ≤ k. We say that the set P is ϕ-dense in S^(N−1) if, whenever v ∈ S^(N−1), there is some v_i ∈ P with Dist_{S^(N−1)}(v, v_i) ≤ ϕ.
We can now give a solution to the ray placement problem on S^(N−1). We use an inductive point-picking process. Pick a value ϕ; this will be the density one desires for the resulting set of directions on S^(N−1). Begin the induction with any arbitrary point v_1 ∈ S^(N−1). If ϕ is small enough that B_{v_1}(ϕ) is not the entire sphere, then we select a second point v_2 to be any arbitrary point not in B_{v_1}(ϕ). Continuing, if points v_1, . . . , v_i have been selected, let v_{i+1} be any arbitrary point chosen under the single constraint that it is not in any B_{v_j}(ϕ), j ≤ i. That is, choose v_{i+1} arbitrarily under the constraint
v_{i+1} ∈ S^(N−1) \ (B_{v_1}(ϕ) ∪ · · · ∪ B_{v_i}(ϕ)),
should such a point exist. Should such a point not exist, meaning B_{v_1}(ϕ) ∪ · · · ∪ B_{v_i}(ϕ) already covers S^(N−1), the process terminates, and we have our collection P = {v_1, . . . , v_i}.
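The inductive point-picking process can be sketched with random candidate directions: a candidate is kept only if it lies farther than ϕ (in great-circle distance) from every point kept so far. Terminating after many consecutive rejections is a practical stand-in, assumed here, for the exact check that the balls cover the sphere.

```python
import numpy as np

def greedy_phi_dense(N=3, phi=np.pi / 4, max_rejects=2000, seed=0):
    """Greedily pick points on S^(N-1), each > phi from all previous picks."""
    rng = np.random.default_rng(seed)
    points, rejects = [], 0
    while rejects < max_rejects:
        v = rng.standard_normal(N)
        v /= np.linalg.norm(v)             # uniform random direction on S^(N-1)
        if all(np.arccos(np.clip(v @ w, -1.0, 1.0)) > phi for w in points):
            points.append(v)               # candidate avoids all B_{v_j}(phi)
            rejects = 0
        else:
            rejects += 1                   # candidate fell inside some ball
    return np.array(points)

P = greedy_phi_dense()
```

By construction, the accepted points are pairwise more than ϕ apart; when the rejection budget is exhausted, the balls B_{v_j}(ϕ) (approximately) cover the sphere, so the set is roughly ϕ-dense in the sense of Definition 4.2.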
Whether an algorithm terminates or not is always a vital question. This one does, and Lemma 4.1 gives a numerical bound on its maximum number of steps. This process requires numerous arbitrary choices—each point v_i is chosen arbitrarily except for the single constraint that it not be in any of the B_{v_j}(ϕ), j < i—so it does not produce a unique or standard placement of points. This contrasts with the very orderly choice of directions v_i = v_0 + 2πi/M on S^1 that we relied on in Theorem 4.1. Nevertheless, a set selected in this manner does have valuable properties, which we summarize in the following lemma.
Lemma 4.1 (Properties of the placement algorithm). Let P = {v_1, v_2, . . . } ⊆ S^(N−1) be any set of points chosen using the inductive algorithm above. Then
Theorem 4.2 (Polytope identification in R^N). Assume Q ∈ Q(N, d, l, α). It is possible to choose a set of M many directions {v_i}_{i=1}^M so that, given any observation point x_o ∈ Q, the corresponding rays R_i = R_{x_o, v_i} have the following properties: (1) The collection of rays {R_i}_{i=1}^M strikes each polytope face N or more times. (2) The number of rays M is no greater than
The estimate (4.10) can be improved if our solution for the placement problem can be improved. The optimal placement problem is unsolved in general; this and related problems go by several names, such as the hard spheres problem, the spherical codes problem, the Fejes Tóth problem, or any of a variety of packing problems. Theoretical bounds in any dimension, benchmarking, and comparisons are available for these problems. Empirical codes can include a look-up table once a particular setting has been chosen.
Problem 3.2 in the context of the quantum dot dataset studied considers electrons that are held within two potential wells of depths d1 and d2, which can be adjusted. Depending on these values, electrons might be confined, might be able to tunnel between the two wells or travel freely between them, and might be able to tunnel out of the wells into the exterior electron reservoir. Individual tunneling events can be measured, and, when plotted in the d1-d2 plane, create an irregular tiling of the plane by polygons. The polygonal chambers represent discrete quantum configurations, and their boundaries represent tunneling thresholds. The shape of a chamber provides information about the quantum state it represents.
One can map the (d1, d2) configurations onto the quantum states of the device by taking advantage of the geometry of these polygons. With scalability being the overall objective, it was essential that the mapping require as little input data as possible. For theoretical reasons it is known that each of the lattice's polygons belongs to one of six classes; roughly speaking, these are quadrilateral, hexagon, open cell (no boundaries at all), and three types of semi-open cells. Further, the hexagons themselves are known to be rather symmetric: they have center-point symmetry, with four longer edges typically of similar length, and two shorter edges of equal length (see
In the language of Problem 3.2, the interesting subclasses of polygons are C1: the hexagons with the symmetry attributes we described, including the quadrilaterals, which are “hexagons” with a=0; C2, C3, C4: three kinds of semi-open cells contained between parallel or almost parallel lines; and C5: the open cell, which has no boundaries at all. The three classes of polygon C2, C3, C4 are distinguished from one another by their slopes in the d1-d2 plane: polygons in class C2 are between parallel lines with slopes between about 0 and −½, in class C3 between about −½ and about −2, and in class C4 between about −2 and −∞. All other polygon types, for these purposes, are unimportant and can go in the “leftover” CL category. The question is how few rays are required to distinguish among the polygons within these classes.
In the quantum dot dataset, we must address one additional complication: the “aperture,” that is the shortest segment in
Proposition 4.1. Let x_o be an observation point which might be within a polygon of type C1-C5. Five rays are needed to distinguish these types. If the short segment is undetectable and the hexagon has the dimensions indicated in
many rays are needed to distinguish these types.
The theoretical bound given by Eq. (4.12) is compared with the performance of a neural network trained to recognize the difference between strips and hexagons, and the neural network approaches the theoretical ideal. In actual quantum dot environments, values of a lie between about 0 (where the hexagon degenerates to a quadrilateral) and about 1 w. For these values of a/w, Eq. (4.12) gives theoretical bounds on the necessary number of rays between six and about nine. Training experiments confirm that six rays and a relatively small DNN are in fact sufficient to obtain a classification accuracy of 96.4% (averaged over 50 training and testing runs, standard deviation σ=0.4%). This performance is on par with a ConvNet-based classifier using two-dimensional (2D) images of the shapes, for which the average accuracy is 95.9% (σ=0.6%). RBC has been verified using experimental data, both off-line (i.e., by sampling rays from premeasured large 2D scans) and on-line (i.e., by directly measuring the device response in a ray-based fashion). The RBC outperformed the more traditional 2D image-based classification of experimental quantum dot data that relied on a convolutional neural network, while requiring up to 70% fewer data points.
With respect to the ray-based classification framework for convex polytopes, a lower bound on the number of rays for shape identification in two dimensions has been described, and the results have been generalized to arbitrary higher dimensions.
Since objects in N-dimensional space can be approximated by convex polytopes, provided they are suitably rectifiable, this technique opens the way to generalization. The problem of dividing a complicated object into a set of approximating polytopes can be considered a form of salience recognition and data compression—of detecting and storing the most useful or important features of the object. When the data itself is scarce or costly to procure, one seeks methods that economize on input data while retaining salience recognition, even at the expense of some accuracy loss or of requiring heavy computing resources. RBC incorporating multiple intersections of the rays can be extended to solve problems where multiple nested shapes enclosing the observation point are present. Ray-based data acquisition combined with machine learning provides a path forward.
Conventional autotuning approaches for quantum dot (QD) devices, while showing some success, lack an assessment of data reliability. This leads to unexpected failures when noisy data is processed by an autonomous system. In this example, we describe a framework for robust autotuning of QD devices that combines a machine learning (ML) state classifier with a data quality control module. The data quality control module acts as a “gatekeeper” system, ensuring that only reliable data is processed by the state classifier. Lower data quality results in either device recalibration or termination. To train both ML systems, we enhance the QD simulation by incorporating synthetic noise typical of QD experiments. We confirm that the inclusion of synthetic noise in the training of the state classifier significantly improves the performance, resulting in an accuracy of 95.1(7) % when tested on experimental data. We then validate the functionality of the data quality control module by showing the state classifier performance deteriorates with decreasing data quality, as expected. Our results establish a robust and flexible ML framework for autonomous tuning of noisy QD devices.
Gate-defined semiconductor quantum dots (QDs) are a quantum computing technology that has potential for scalability due to their small device footprint, operation at few Kelvin temperatures, and fabrication with scalable techniques. However, minute fabrication inconsistencies present in current devices mean that every qubit must be individually calibrated or tuned. To enable more efficient scaling, this requirement can be met with automated methods.
Automated tuners, both ML- and non-ML-based, make many sequential decisions based on limited data acquired at each step. In such a framework, small error rates can quite rapidly compound into high failure rates. One failure mode of QD autotuning algorithms is signal-to-noise ratio (SNR) reductions during the tuning process. One way to avoid tuning failure and to promote trust in ML-based automation is to use assessment techniques to verify the quality of data before moving forward with tuning.
In this example, a framework for robust automated tuning of QD devices that combines a convolutional neural network (CNN) for device state estimation with a CNN for assessing the data quality is described. Synthetic noise characteristic of QD devices is used to train these two networks. To establish the validity of the noisy dataset, we first train a CNN module to classify device states and achieve an accuracy of 94.8(9) % on experimental data—an improvement of 47% over the mean accuracy of neural networks trained on noiseless simulations. We then use the noisy simulations to train a data quality control module for determining whether the data is feasible for state classification. We show not only that the latter makes intuitive predictions, but also that the predicted quality classes correlate with changes in classifier performance. These results establish a scalable framework for robust automated tuning and manipulation of QD devices.
Conventional automation proposals for QDs lack an assessment of the prediction reliability. This largely stems from a lack of such measures for ML, though for some approaches the “quantitative” rather than “qualitative” nature of labels further complicates this issue. The quantitative nature of prediction means that partial state identification is not only expected but might be necessary for successful operation. A two-state prediction for a given scan should indicate that the scan captures a transition between those states, which is used for tuning. At the same time, if the SNR is low or in the presence of unknown fabrication defects, such a mixed prediction might instead indicate model confusion. In the latter case, if such confusion is not accounted for and corrected, it is likely to result in autotuning failure.
To overcome this issue, we describe a framework that involves a device state estimation module (DSE) combined with an ML-based data quality control module (DQC) to alert the autotuning system when the measured scan is unsuitable for classification. A flow of the framework is shown in
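The gatekeeper control flow described above can be sketched as follows. The quality labels, module interfaces, and recalibration budget are hypothetical illustrations of the DSE/DQC interaction, not the authors' actual API.

```python
# Hypothetical sketch: the data quality control module (DQC) screens each
# measured scan before the device state estimation module (DSE) classifies it.
def autotune_step(scan, dqc, dse, recalibrate, max_recalibrations=3):
    for _ in range(max_recalibrations):
        quality = dqc(scan)            # e.g. "high", "moderate", "low" (assumed labels)
        if quality == "high":
            return dse(scan)           # reliable data: classify the device state
        if quality == "moderate":
            scan = recalibrate(scan)   # recalibrate the device and remeasure
        else:
            break                      # quality too low to recover
    raise RuntimeError("terminating: data quality insufficient for tuning")
```

In this sketch, only scans the DQC deems reliable ever reach the state classifier; persistent low quality terminates the run, matching the "gatekeeper" behavior described above.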
Relatively shallow CNN-based noise estimation models can be used for some image processing and denoising tasks. However, the ability to develop and prepare such estimators hinges on the availability of training data. The noise features present in QD devices can be complex and vary significantly between devices. A reliable training dataset has to account for the different types and magnitudes of noise that can be encountered experimentally. While full control over the noise is unfeasible experimentally, it can be achieved with synthetic data, where the different types and magnitudes of physical noises can be controllably altered.
To establish a benchmark performance for comparison with CNN classifiers trained on synthetic noise, we use a dataset of about 1.6×10^4 noiseless measurements. The QD simulator we use is based on a simple model of the electrical gates and a self-consistent potential calculation and capacitance model to determine the stable charge configuration. This simulator is capable of generating current maps and charge stability diagrams as a function of various gate voltages that reproduce the qualitative features of experimental charge stability diagrams. The simulated data represent an idealized device in which the charge state is sensed with perfect accuracy.
To validate the synthetic noise and test the performance of the state classifiers, we generate a dataset of 756 manually labeled experimental images. This data was acquired using two quadruple QD devices, both fabricated on a Si/SixGe1-x heterostructure in an accumulation-mode overlapping aluminum gate architecture and operated in a double dot configuration. The gate-defined QD devices use electric potentials defined by metallic gates to trap single electrons either in one central potential or in potentials on the left and right sides of the device. Changes in the charge state are sensed by a single electron transistor (SET) charge sensor. The charge states of the device correspond to the presence and relative locations of trapped electrons: no dot (ND), single left (LD), central (CD), or right (RD) dot, and double dot (DD). We use experimental data consisting of two different datasets of 82 and 503 images, respectively, as well as data collected from a different device resulting in 171 images. All images were manually labeled by two team members, and any conflicting labels were reconciled through discussions with the researcher responsible for data collection.
There are multiple sources of noise in experimental data: dangling bonds at interfaces or defects in oxides lead to noise at the device level; thermal noise, shot noise, and defects in electronics throughout the readout chain result in noise at the readout level. In many QD devices, changes in the device state are sensed by conductance shifts in an SET due to their sensitivity to transitions with no change in net charge. The response of an SET is nonlinear, which causes variation in the signal of charge transitions. The various types of noise manifest themselves in the measurement through distortion that might obscure or deform the features indicating the state of the device (borders between stable charge regions).
To prepare a dataset for the DQC module, we extend the QD simulator to incorporate the most common sources of experimental noise. We consider five types of noise: dot jumps, Coulomb peak effects, white noise, 1/f (pink) noise, and sensor jumps. Experimentally, white noise, 1/f noise, and sensor and dot jumps appear due to different electronic fluctuations affecting an SET charge sensor. White noise can be attributed to thermal and shot noise, while the 1/f noise can have contributions from various dynamic defects in the device and readout circuit. We modeled the charge sensor with a linear response, though in reality it has a nonlinear response due to the shape of the Coulomb blockade peak. We account for this with a simple model of an SET in the weak coupling regime. Physically, dot jumps and sensor jumps are two manifestations of the same process: electrons populating and depopulating charge traps in the device, which we model as two level systems with characteristic excited and ground state lifetimes. Dot jumps are the effect of these fluctuations on the quantum dot while sensor jumps are the effect on the SET charge sensor. We provide additional details on how we implement these synthetic noises below.
Each of the modeled noises can obscure or mimic charge transition line features, potentially confusing ML models. White noise and 1/f noise both generate high frequency components that can be picked up in the charge sensor gradient. Additionally, the 1/f noise can generate shapes that look similar to charge transition lines. Sensor jumps cause large gradients where they occur. By reducing the gradient, Coulomb peak movement can reduce the visibility of charge transitions. Finally, dot jumps can distort the shapes of charge transition lines. Panels B-F in
For each type of noise, we generate a distinct dataset of about 1.6×104 simulated measurements using the same device parameters as were used for the noiseless dataset. The initial noise magnitudes are set to produce images qualitatively similar to moderately noisy experimental data. The final magnitudes are optimized through a semi-structured grid search over a range of values centered around the initial noise levels. At each step, the correlation between the noise level and model performance on a subset of experimental images from one of the devices is used to guide the search. The datasets used to train models for each noise type are generated by varying each noise parameter with a standard deviation of 1% of the parameter's value. Panel G in
The final noisy simulated dataset is generated by fixing the relative magnitudes of white noise, 1/f noise, and sensor jumps and varying the magnitudes together in a normal distribution. The means of the magnitudes are set to the optimized values and the standard deviation is one third of each magnitude's value. Fixing the relative magnitudes and varying them together allows this distribution of noise levels to approximate a range of SNR encountered in experiments.
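The joint variation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; the function name and the noise-type keys are assumptions. A single normally distributed scale factor (mean 1, standard deviation 1/3) multiplies every optimized magnitude, so the relative proportions stay fixed while the overall SNR varies:

```python
import numpy as np

def sample_noise_magnitudes(base, rel_std=1.0 / 3.0, rng=None):
    """Scale a set of optimized noise magnitudes together by one normally
    distributed factor (mean 1, std = rel_std), keeping their relative
    proportions fixed.  `base` maps noise type -> optimized magnitude."""
    if rng is None:
        rng = np.random.default_rng()
    factor = max(rng.normal(loc=1.0, scale=rel_std), 0.0)  # no negative levels
    return {name: level * factor for name, level in base.items()}
```

Because all magnitudes share the same factor, the ratio between any two noise types is preserved in every sampled realization.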
Because the QD state labels are quantitative, a mixed label indicates an intermediate state rather than model confusion, and so a simple entropy of a model's prediction cannot be used as a measure of confusion. Rather, an alternative quality measure needs to be established. To achieve this, we leverage the simulated noise framework established in the previous section to perform a controlled analysis of the DSE module performance as noise levels are varied.
In the framework presented in
By evaluating a state classifier on this dataset we determine the relationship between the noise level and performance within each class. From the correlations between noise level and performance, we establish per-QD state data quality thresholds. The thresholds are chosen to ensure high performance of the state classifier for the high quality data, an expected degradation of performance for data with moderate quality, and poor performance on data with low quality. Specifically, we set the cutoffs using the relationship between the model's mean absolute error (MAE) and noise level, shown in
We set these cutoff levels at relatively conservative amounts of noise, which would enable a fairly risk-averse tuning algorithm. This parameter choice could be adjusted to the needs of a given application depending on the error sensitivity of an autotuning method. To ensure that images in the low noise class are very reliably identified, we set the threshold between low and moderate noise classes to be at the noise level where the average MAE has gone up by 2.5% of the full range, which is similar to a 2 sigma cutoff for the lower tail of a normal distribution. We set the threshold between moderate and high noise where the average MAE has reached 50% of its full range, where the model is roughly equally likely to be wrong as right for a single state image.
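The threshold rule above reduces to reading two crossings off the MAE-versus-noise curve. A minimal sketch (the function name and the assumption that the curve is given as paired arrays are ours, not the source's):

```python
import numpy as np

def noise_thresholds(noise_levels, mean_mae, low_frac=0.025, high_frac=0.5):
    """Return the (low/moderate, moderate/high) noise cutoffs: the first
    noise levels at which the mean MAE rises above its minimum by
    `low_frac` (2.5%) and `high_frac` (50%) of its full range."""
    mae = np.asarray(mean_mae, dtype=float)
    lo, span = mae.min(), mae.max() - mae.min()
    low_cut = noise_levels[np.argmax(mae >= lo + low_frac * span)]
    high_cut = noise_levels[np.argmax(mae >= lo + high_frac * span)]
    return low_cut, high_cut
```

The sketch assumes the averaged MAE increases with noise level overall; a noisier empirical curve would first be smoothed.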
With these thresholds, state labels, and the known amount of noise added, we then assign quality classes to the simulated data for DQC module training. For this training we use a distinct dataset with the same distribution of noise used to set the noise class thresholds.
To prepare the data quality control module (DQC in
To determine how the considered noise types affect the performance of the DSE classifier, we modify the simulation with each type of noise individually and evaluate models trained with that data on the experimental test dataset. For initial testing, we optimize a CNN architecture defining the simplistic model used for state recognition on noiseless data using the Keras Tuner API. As a baseline, we include the 52.3(5.1) % test accuracy for models trained on simulated data without noise added. As expected, the high classification accuracy of 93.6(9) % achieved during training drops significantly when the models are used to classify noisy experimental images. Some data processing techniques used to suppress experimental noise might help with the performance. Our analysis confirms that preprocessing of experimental data improves the average accuracy and reduces the variance between models. However, the observed accuracy of 59.7(3.1) % (box plot) on the experimental dataset is still much lower than necessary for reliable state assessment.
When looking at the various types of noise individually, analysis reveals that 1/f noise, white noise, and sensor jumps most significantly improve the model performance, with 71.1(5.6) %, 70.9(6.5) %, and 75.3(6.9) % accuracy, respectively. Coulomb peaks and dot jumps turn out to be unhelpful on their own. The latter seems to affect the performance negatively. Combining all types of noise results in a significant improvement in both the performance and variation of the result, with an accuracy of 92.5(7) %. For comparison, in the context of simulated transport data, previous work found that only the sensor jumps, 1/f, and white noise improved classifier performance, though the observed improvements were not significant. When combining the noises, a varied SNR was used by varying sensor jumps, 1/f, and white noise together. This uniformly tunes the SNR between simulated images as a replacement for the explicit Coulomb peak. Effectively, this results in a varying visibility of charge transition lines but with more uniformity.
Since the model architecture we use was optimized for a noiseless dataset, we re-optimize the CNN architecture using the noisy simulated dataset. This allows us to find a model that is structurally best suited to that type of data and thus further improve the performance. With these changes, we find an increase in the classification accuracy by about 2.5% to 95.1(7) %, box plot Gopt in
To confirm the validity of the thresholds used to define the three quality classes, we use the experimental dataset. The DQC module applied to the experimental images classified 607 images as high quality, 135 images as moderate quality, and 14 images as low quality.
We assess the viability of the proposed framework by performing tests of the DSE and DQC modules over two large experimental scans shown in
We use a series of 60 mV by 60 mV scans sampled at every pixel within the large scans, leaving a 30 mV margin at the boundary to ensure that each sampled scan is within the full scan boundaries. From
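The sub-scan sampling can be sketched as follows. This is a sketch under assumed conventions (the function name and the use of scan-center coordinates in mV are ours): with 60 mV windows, a 30 mV margin keeps every window center at least half a window away from the edge of the full scan.

```python
import numpy as np

def subscan_centers(v_min, v_max, margin=30.0, step=2.0):
    """Centers (in mV) of square sub-scans sampled at every pixel of a
    large scan spanning [v_min, v_max] in each axis; `margin` keeps each
    sub-scan inside the full scan, `step` is the mV-per-pixel resolution."""
    axis = np.arange(v_min + margin, v_max - margin + step / 2.0, step)
    cx, cy = np.meshgrid(axis, axis)
    return np.stack([cx.ravel(), cy.ravel()], axis=1)
```

For example, a 0-100 mV scan sampled every 10 mV with a 30 mV margin yields a 5×5 grid of 25 window centers.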
While areas with mixed labels are produced by both models, for the robust model, they are primarily indicative of transitions between states. For the simplistic model, mixed labels are assigned also within single-state parts of the scans. Such labels should not be used for autotuning as they will degrade the optimization step (see
A side-by-side comparison of panels (e) and (g) (as well as (f) and (h)) in
Results show that adding physical noise to simulated data can dramatically improve the performance of machine learning algorithms on experimental data. Importantly, we are able to achieve high level performance without any preprocessing or denoising of the data. We also show how the synthetic noise can be used to develop ML tools to assess the quality of experimental data and that the assigned data quality correlates with state classifier performance, as desired. Combining these tools enables a framework we outlined in
We note that the thresholds used to establish the quality classes in the data quality control module were chosen to provide meaningful separation. However, depending on the application's risk tolerance, these thresholds can be adjusted to obtain the error rates needed to prevent failure of an autotuning algorithm. Beyond the classification of the data quality, our flexible synthetic noise model allows for extensions in which the data is labeled by the exact type and level of noise rather than the overall quality. ML models can then be trained to predict the predominant types of noise, which in turn would enable tailored recalibration actions to mitigate them.
Broadly, our noise augmentation approach confirms that perturbing simulated data with realistic, physics-based noise can vastly improve the performance of simulation-trained ML models. This may be a useful insight for other research combining ML and physics. From a transfer learning perspective, the observed performance increase could be attributed to the physical noise augmentation shifting the training data distribution nearer to the experimental test distribution. Additionally, our data quality control module presents a paradigm for ML reliability estimation in which physically-motivated noise models are used to determine whether to move forward with data classification.
Five different types of noise were added to the simulated data: dot jumps, Coulomb peak effects, 1/f noise, white noise, and sensor jumps. Of these, the white noise is the simplest to implement by adding normally distributed noise with zero mean and fixed standard deviation at every pixel. The standard deviation value is determined as part of the noise optimization process. The 1/f noise is generated in Fourier space with random phase sampled uniformly over [0, 2π). The Coulomb peak effect is applied using a simple model of a quantum dot in the weak coupling regime which yields a conductance lineshape of the form:
G/Gmax = cosh−2(A(V − Vmin))
where G is the conductance, Gmax is the peak conductance of the line, A is a parameter that controls the linewidth and is determined during noise optimization, Vmin is the peak center, and V is the signal seen by the simulated sensor due to the quantum dots. Dot jumps and sensor jumps are generated using the same underlying physics principles. We model them as charge traps with characteristic excited and ground state lifetimes necessary for capturing or ejecting electrons. We achieve this by performing Bernoulli trials to determine if a jump occurs at a given pixel. This allows the jumps to follow a geometric distribution, the discrete analogue of an exponential distribution. Magnitudes of sensor jumps are drawn from a normal distribution with zero mean and fixed standard deviation determined during noise optimization. Magnitudes of dot jumps are drawn from a Poissonian distribution with fixed rate also determined during noise optimization.
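A minimal NumPy sketch of these noise models follows. Function names, the 1/f amplitude normalization, and all default parameter values are illustrative assumptions; only the mathematical forms (zero-mean Gaussian pixels, 1/f Fourier amplitudes with uniform random phases, the cosh−2 lineshape, and Bernoulli-trial jumps with Gaussian sizes) come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

def white_noise(shape, sigma):
    """Zero-mean Gaussian noise with fixed standard deviation per pixel."""
    return rng.normal(0.0, sigma, size=shape)

def pink_noise(shape, sigma):
    """1/f noise built in Fourier space: amplitudes ~ 1/f, phases
    sampled uniformly over [0, 2*pi)."""
    ny, nx = shape
    f = np.hypot(np.fft.fftfreq(ny)[:, None], np.fft.fftfreq(nx)[None, :])
    f[0, 0] = np.inf                       # suppress the DC component
    phase = rng.uniform(0.0, 2.0 * np.pi, size=shape)
    noise = np.real(np.fft.ifft2((1.0 / f) * np.exp(1j * phase)))
    return sigma * noise / noise.std()     # rescale to the target level

def coulomb_peak(v, amplitude=1.0, a=1.0, v_min=0.0):
    """Sensor conductance G/Gmax = cosh^-2(A (V - Vmin)) in the weak
    coupling regime."""
    return amplitude / np.cosh(a * (v - v_min)) ** 2

def sensor_jumps(signal, p_jump, sigma_jump):
    """Add random sensor offsets: a Bernoulli trial at each pixel decides
    whether a jump occurs (so waiting times are geometric); jump sizes
    are zero-mean Gaussian and offsets accumulate along the sweep."""
    flat = signal.ravel().copy()
    jumps = rng.random(flat.size) < p_jump
    flat += np.cumsum(rng.normal(0.0, sigma_jump, size=flat.size) * jumps)
    return flat.reshape(signal.shape)
```

Dot jumps would follow the same Bernoulli-trial pattern as `sensor_jumps`, with magnitudes drawn from a Poisson distribution and the offset applied to the dot potential rather than the sensor signal.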
To provide better clarity on how we determine the noise level thresholds for training the DQC module, here we show plots of the data used to set these thresholds. The top row in
The dashed lines in the bottom row of
Since we found no clear dependence of the MAE for ND on the noise level, the ND thresholds were set separately. Above the 50% thresholds, the DSE has trouble distinguishing between ND and any other state, making the ND predictions unreliable. Thus, the upper threshold for ND was set based on the threshold determined for the remaining four states. For consistency, the lower threshold for ND was determined in an analogous fashion.
Both machine learning modules are built and trained using the TensorFlow (v.2.4.1) Keras Python API. We use three different model architectures: two for testing the DSE for noiseless and noisy data, and a third one in the DQC module. All architectures are optimized to ensure high performance using the Keras Tuner and the Optuna hyperparameter tuner.
The optimized neural network architectures are presented in
As used herein, "autotuning" refers to finding a range of gate voltages where the device is in a particular "global configuration" (i.e., a no-dot, single-dot, or double-dot regime). Steps of the experimental implementation of the autotuner are presented in
Step 0: Preparation. Before the ML systems are engaged, the device is cooled down, and the gates are manually checked for response and pinch-off voltages. Furthermore, the charge sensor and the barrier gates are also tuned using traditional techniques.
Step 1: Measurement. A two-dimensional (2D) measurement of the charge-sensor response over a fixed range of gate voltages. The position for the initial measurement (given as a center and a size of the scan in millivolts) is provided by a user.
Step 2: Data processing. Resizing of the measured 2D scan VR and filtering of the noise (if necessary) to assure compatibility with the neural network.
Step 3: Network analysis. Analysis of the processed data. The CNN identifies the state of the device for VR and returns a probability vector p(VR).
Step 4: Optimization. An optimization of the fitness function δ(ptarget, p(VR)), given in Eq. (2), resulting either in a position of the consecutive 2D scan or a decision to terminate the autotuning.
Step 5: Gate-voltage adjustment. An adjustment of the gate voltages as suggested by the optimizer. The position of the consecutive scan is given as a center of the scan (in millivolts).
The preparation step results in a range of acceptable voltages for gates, which allows “sandboxing” by limiting the two plunger voltages controlled by the autotuning protocol within these ranges to prevent device damage, as well as in establishment of the appropriate voltage level at which the barrier gates are fixed throughout the test runs (precalibration). The charge-sensing dot is also tuned manually at this stage. The sandbox also helps define the size of the regions used for state recognition. Proper scaling of the measurement scans is crucial for meaningful network analysis: scans that are too small may not contain enough features necessary for state classification, while scans that are too large may result in probability vectors that are not useful in the optimization phase.
Steps 1-5 mentioned above are repeated until the desired global state is reached. In other words, we formulate the autotuning as an optimization problem over the state of the device in the space of gate voltages, where the function to be optimized is a fitness function δ between probability vectors of the current and the desired measurement outcomes. The autotuning is considered successful if the optimizer converges to a voltage range that gives the expected dot configuration.
QDs are defined by electrostatically confining electrons using voltages on metallic gates applied above a 2D electron gas (2DEG) present at the interface of a semiconductor heterostructure. Realization of good qubit performance is achieved via precise electrostatic confinement, band-gap engineering, and dynamically adjusted voltages on nearby electrical gates. A false-color scanning electron micrograph of a Si/SixGe1-x quadruple-dot device identical to the one measured is shown in
To automate the tuning process and eliminate the need for human intervention, we incorporate ML techniques into the software controlling the experimental apparatus. In particular, we use a pretrained CNN to determine the current global state of the device. To prepare the CNN, we rely on a data set of 1001 quantum-dot devices generated using a modified Thomas-Fermi approximation to model a set of reference semiconductor systems comprising a quasi-1D nanowire with a series of depletion gates, the voltages of which determine the number of dots, the charges on each of those dots, and the conductance through the wire. The data set is constructed to be agnostic about the details of a particular geometry and material platform used for fabricating dots. To reflect the minimum qualitative features across a wide range of devices, a number of parameters are varied between simulations, such as the device geometry, gate positions, lever arm, and screening length, to name a few. The idea behind varying the device parameters when generating the training data set is to enable the use of the same pretrained network on different experimental devices.
The synthetic data set contains full-size simulated 2D measurements of the charge-sensor readout and the state labels at each point as functions of plunger gate voltages (VP1,VP2) (at a pixel level). For training purposes, we generate an assembly of 10 010 random charge-sensor measurement realizations (ten samples per full-size scan), with charge-sensor response data stored as (30×30) pixel maps from the space of plunger gates (for examples of simulated single- and double-dot regions, respectively, see the right-hand column in
The CNN assigns each measured region VR a probability vector

p(VR) = [1 − (|SD| + |DD|)/N, |SD|/N, |DD|/N], (1)

where |SD| and |DD| are the numbers of pixels with a single-dot and a double-dot state label, respectively, and N is the size of the image VR in pixels. As such, p(VR) can be thought of as a probability vector that a given measurement captures each of the possible states (i.e., no dot, single dot, or double dot). The resulting probability vector for a given region VR, p(VR), is an implicit function of the plunger gate voltages defining VR. It is important to note that, while CNNs are traditionally used to simply classify images into a number of predefined global classes (which can be thought of as a qualitative classification), we use the raw probability vectors returned by the CNN (i.e., quantitative classification).
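The ideal (label-derived) probability vector can be computed directly from a pixel-level label map. This sketch assumes a hypothetical label convention (0 = no dot, 1 = single dot, 2 = double dot), which is not specified in the text:

```python
import numpy as np

def state_probability_vector(labels):
    """Fraction of pixels in each state for a measured region V_R,
    returned as p = (p_none, p_SD, p_DD)."""
    labels = np.asarray(labels)
    n = labels.size
    p_sd = np.count_nonzero(labels == 1) / n
    p_dd = np.count_nonzero(labels == 2) / n
    return np.array([1.0 - p_sd - p_dd, p_sd, p_dd])
```

By construction the three components are non-negative and sum to one, matching the interpretation of p(VR) as a probability vector.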
The CNN architecture consists of two convolutional layers (each followed by a pooling layer) and four fully connected layers with 1024, 512, 256, and 3 units, respectively. The convolutional and pooling layers are used to reduce the size of the feature maps while extracting the most important characteristics of the data. The fully connected layers, on the other hand, allow for nonlinear combinations of these characteristics and classification of the data. We use the Adam optimizer with a learning rate η=0.001, 5000 steps per training and a batch size of 50. The accuracy of the network on the test set is 97.7%.
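The described architecture can be sketched in Keras as follows. The filter counts, kernel sizes, activations, and loss are assumptions (the text specifies only two convolutional layers each followed by pooling, dense layers of 1024/512/256/3 units, and Adam with η=0.001):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_state_classifier(input_shape=(30, 30, 1)):
    """CNN sketch: two conv+pool stages followed by four dense layers
    with 1024, 512, 256, and 3 units; trained with Adam (lr=0.001)."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(3, activation="softmax"),  # no dot / single / double
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

The softmax output is the raw probability vector p(VR) used quantitatively by the optimizer, rather than being collapsed to a single class label.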
The optimization step of the autotuning process (Step 4 in
δ(ptarget, p(VR)) = ∥ptarget − p(VR)∥2 + γ(VR), (2)
where ∥·∥2 is the L2 norm and the penalty function γ is defined as
γ(VR) = αg(pnone) + βg(pSD), (3)
where g(x) is the arctangent shifted and scaled to assure that the penalty is non-negative [i.e., g(x)≥0] and that the increase in penalty is more significant once a region is classified as predominantly non-double dot (i.e., the inflection point is at x=0.5). Parameters α and β are used to weight penalties coming from no dot and single dot, respectively.
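One way to realize such a penalty kernel is sketched below. The exact shift, scale, and steepness are not given in the text, so the normalization and the steepness value here are assumptions; the sketch only reproduces the stated properties (g(x) ≥ 0, inflection at x = 0.5, steeper growth once a state dominates):

```python
import numpy as np

def g(x, steepness=10.0):
    """Penalty kernel: an arctangent shifted and scaled so that
    g(x) >= 0 and the inflection point sits at x = 0.5."""
    return (np.arctan(steepness * (x - 0.5)) + np.pi / 2.0) / np.pi

def gamma(p_none, p_sd, alpha=1.0, beta=1.0):
    """Penalty gamma(V_R) = alpha*g(p_none) + beta*g(p_SD), as in Eq. (3)."""
    return alpha * g(p_none) + beta * g(p_sd)
```

With this form, regions dominated by the no-dot or single-dot state (probability above 0.5) incur a rapidly growing penalty, steering the optimizer toward double-dot regions.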
For optimization, we use the Nelder-Mead method implemented in PYTHON. The Nelder-Mead algorithm works to find a minimum of an objective function by evaluating it at initial simplex points—a triangle in the case of the 2D gate space in this work. Depending on the values of the objective function at the simplex points, the subsequent points are selected to move the overall simplex toward the function minimum. In our case, the initial simplex is defined by the fitness value of the starting region VR and two additional regions obtained by lowering the voltage on each of the plungers one at a time by 75 mV.
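The Nelder-Mead setup described above, with an initial simplex built from the starting region and two 75 mV plunger offsets, can be sketched with SciPy (the function name and convergence tolerances are our assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def autotune(fitness, start, plunger_step=75.0):
    """Nelder-Mead sketch: the initial simplex is the starting point plus
    two points obtained by lowering each plunger voltage in turn by
    `plunger_step` mV.  `fitness` maps (V_P1, V_P2) -> delta, Eq. (2)."""
    v0 = np.asarray(start, dtype=float)
    simplex = np.array([v0,
                        v0 - [plunger_step, 0.0],
                        v0 - [0.0, plunger_step]])
    result = minimize(fitness, v0, method="Nelder-Mead",
                      options={"initial_simplex": simplex,
                               "xatol": 1.0, "fatol": 1e-3})
    return result.x
```

In the real system each fitness evaluation triggers a measurement at the proposed gate voltages; here any callable over (V_P1, V_P2) stands in for that loop.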
To evaluate the autotuner in an experimental setup, a Si/SixGe1-x quadruple quantum-dot device (see
In the second phase, we evaluate the performance of the trained network on hand-labeled experimental data. The data set includes (30×30)mV scans with 1 mV per pixel and (60×60)mV with 2 mV per pixel. Prior to analysis, all scans are flattened with an automated filtering function to assure compatibility with the neural network (see the left-hand column in
In the third phase, we perform a series of trial runs of the autotuning algorithm in the (VP1,VP2) plunger space, as shown in
We initialize 45 autotuning runs, out of which seven are terminated by the user due to technical problems (e.g., stability of the sensor). Of the remaining 38 completed runs, in 13 cases the scans collected at an early stage of the tuning process are found to be incompatible with the CNN. In particular, while there are three possible realizations of the single-dot state (coupled strongly to the left plunger, the right plunger, or equally coupled, forming a “central dot”), the training data set includes predominantly realizations of the “central dot” state. As a result, whenever the single left or right plunger dot is measured, the scan is labeled incorrectly. When a sequence of consecutive “single-plunger-dot” scans is used in the optimization step, the optimizer misidentifies the scans as double dot and fails to tune away from this region. These runs are removed from further analysis, as with the incorrect labels, the autotuner terminates each time in a region classified as double dot (i.e., a success from the ML perspective) which in reality is a single dot (i.e., a failure for practical purposes). We discuss the performance of the autotuner based on the remaining 25 runs.
While tuning, it is observed that the autotuner tends to fail when initiated further away from the target double-dot region. An inspection of the test runs confirms that whenever both plungers are set at or above 375 mV, the tuner becomes stuck in the plateau area of the fitness function and does not reach the target area (with two exceptions). Out of the 25 completed runs, 14 are initiated with at least one plunger set below 375 mV. Out of these, two cases fail, both due to instability of the charge sensor resulting in unusually noisy data that is incorrectly labeled by the CNN and thus leads to an inconsistent gradient direction. The overall success rate here is 85.7% (for a summary of the performance for each initial point from this class, see
Tuning “off-line”—tuning within a premeasured scan for a large range of gate voltages that captures all possible state configurations—allows for the study of how the various parameters of the optimizer impact the functioning of the autotuner and further investigation of the reliability of the tuning process while not taking up experimental time. The scan that we use spans 125-525 mV for plunger P1 and 150-550 mV for P2, measured in 2-mV-per-pixel resolution.
The deterministic nature of the CNN classification (i.e., assigning a fixed probability to a given scan) assures that the performance of the tuner will be affected solely by changes made to the optimizer. On the other hand, with static data, for any starting point the initial simplex and the consecutive steps are fully deterministic, making a reliability test challenging. To address this issue, rather than repeating a number of autotuning tests for a given starting point (VP1,VP2), we initiate tuning runs for points sampled from a (9×9) pixels region around (VP1,VP2), resulting in 81 test runs for each point.
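Sampling the starting points over a (9×9)-pixel region can be sketched as follows (the function name is hypothetical; the 2 mV-per-pixel step comes from the premeasured scan resolution):

```python
import numpy as np

def sampled_starts(center, half_width=4, step=2.0):
    """All starting points in a (9 x 9)-pixel region around (V_P1, V_P2);
    with 2 mV per pixel this yields the 81 test runs per configuration."""
    offs = np.arange(-half_width, half_width + 1) * step
    dx, dy = np.meshgrid(offs, offs)
    return np.stack([center[0] + dx.ravel(),
                     center[1] + dy.ravel()], axis=1)
```

Since the classifier and optimizer are deterministic on static data, perturbing the start point is what turns a single configuration into a meaningful reliability statistic.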
We assess the reliability of the autotuning protocol for the seven experimentally tested configurations listed in
To assess the performance of the autotuning protocol for a wider range of initial configurations, we perform off-line tuning over a set of premeasured scans. Using four scans spanning 100-500 mV for plunger P1 and 150-550 mV for P2, measured in 2-mV-per-pixel resolution, we initiate N=784 test runs per scan, sampling every 10 mV and leaving a margin that is big enough to ensure that the initial simplex is within the full scan boundaries. A heat map representing the performance of the autotuner is presented in
While a standardized fully automated approach to tuning quantum-dot devices is essential for their scalability, present-day approaches to tuning rely heavily on human heuristic and algorithmic protocols that are specific to a particular device and cannot be used across devices without fine readjustments. To address this issue, we are developing a tuning paradigm that combines synthetic data from a physical model with ML and optimization techniques to establish an automated closed-loop system of experimental device control. Here, we report on the performance of the proposed autotuner when tested in situ.
In particular, we verify that, within certain constraints, the proposed approach can automatically tune a QD device to a desired double-dot configuration. In the process, we confirm that a ML algorithm, trained using exclusively synthetic noiseless data, can be used to successfully classify images coming from experiment, where noise and imperfections typical of real measurements are present. This work also enables us to identify areas in which further work is necessary to improve the overall reliability of the autotuning system. A new training data set is necessary to account for all three possible single-dot states. The size of the initial simplex also seems to contribute to the mobility of the tuner out of the SD plateau. For comparison, in
These results serve as a baseline for future investigation of fine-grain device control (i.e., tuning to a desired charge configuration) and of “cold-start” autotuning (i.e., complete tuning without any precalibration of the device).
To use QD qubits in quantum computers, it is necessary to develop a reliable automated approach to control QD devices, independent of human heuristics and intervention. Working with experimental devices with high-dimensional parameter spaces poses many challenges, from performing reliable measurements to identifying the device state to tuning into a desirable configuration. By combining theoretical, computational, and experimental efforts, this interdisciplinary research sheds light on how modern ML techniques can assist experiments.
While one or more embodiments have been shown and described, modifications and substitutions may be made thereto without departing from the spirit and scope of the invention. Accordingly, it is to be understood that the present invention has been described by way of illustrations and not limitation. Embodiments herein can be used independently or can be combined.
All ranges disclosed herein are inclusive of the endpoints, and the endpoints are independently combinable with each other. The ranges are continuous and thus contain every value and subset thereof in the range. Unless otherwise stated or contextually inapplicable, all percentages, when expressing a quantity, are weight percentages. The suffix (s) as used herein is intended to include both the singular and the plural of the term that it modifies, thereby including at least one of that term (e.g., the colorant(s) includes at least one colorant). Option, optional, or optionally means that the subsequently described event or circumstance can or cannot occur, and that the description includes instances where the event occurs and instances where it does not. As used herein, combination is inclusive of blends, mixtures, alloys, reaction products, collection of elements, and the like.
As used herein, a combination thereof refers to a combination comprising at least one of the named constituents, components, compounds, or elements, optionally together with one or more of the same class of constituents, components, compounds, or elements.
All references are incorporated herein by reference.
The use of the terms “a,” “an,” and “the” and similar referents in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. It can further be noted that the terms first, second, primary, secondary, and the like herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. It will also be understood that, although the terms first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. For example, a first current could be termed a second current, and, similarly, a second current could be termed a first current, without departing from the scope of the various described embodiments. The first current and the second current are both currents, but they are not the same current unless explicitly stated as such.
The modifier about used in connection with a quantity is inclusive of the stated value and has the meaning dictated by the context (e.g., it includes the degree of error associated with measurement of the particular quantity). The conjunction or is used to link objects of a list or alternatives and is not disjunctive; rather the elements can be used separately or can be combined together under appropriate circumstances.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/083,368 (filed Sep. 25, 2020), which is herein incorporated by reference in its entirety.
This invention was made with United States Government support from the National Institute of Standards and Technology (NIST), an agency of the United States Department of Commerce. The Government has certain rights in this invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/052248 | 9/27/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63083368 | Sep 2020 | US |