For a long time, the industry focused on improving general-purpose systems, increasing computing power by orders of magnitude. In recent years, however, special-purpose designs have been increasingly adopted for their efficacy in solving specific types of tasks, such as encryption and network operations. At the same time, a wide array of modern problems calls for better computational mechanisms. Machine learning has become a new focus, and many specialized architectures have been proposed to accelerate its operations.
Many accelerator architectures have been proposed. They have low operation costs and control overheads, but most remain largely within the von Neumann paradigm. In a related but distinct line of work, researchers are trying to leverage physical processes that evolve to a final state that naturally corresponds to a solution of some problem. In principle, such nature-based computing systems can significantly outperform conventional accelerators in performance and efficiency. Hence, there is much interest in these systems, particularly in the physics community. Quantum computers marketed by D-Wave Systems are prominent examples, but many other prototypes exist, and some already show exciting capabilities.
In a nutshell, the Ising model concerns a system with many nodes (e.g., atoms), each with a spin σi that takes one of two values (+1 or −1). Every pair of spins (σi and σj) has a specific coupling coefficient Jij, and each spin also interacts with an external field μ with coefficient hi. The total energy of the system is thus given by the Ising formula,

$$H = -\sum_{i<j} J_{ij}\,\sigma_i\sigma_j \;-\; \mu\sum_{i} h_i\,\sigma_i \qquad (1)$$

or, if the external field is ignored, the Hamiltonian simplifies to

$$H = -\sum_{i<j} J_{ij}\,\sigma_i\sigma_j \qquad (2)$$
This simplified version is the more useful one for the purpose of the present disclosure. Henceforth, “the Ising model” or “the Ising formula” refers to Equation 2.
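A minimal Python sketch of Equation 1 (with the external field) and Equation 2 (without it), assuming spins are given as a list of +1/−1 values and that only the upper triangle (i < j) of the coupling matrix J is meaningful. The function name ising_energy is hypothetical:

```python
def ising_energy(J, sigma, h=None, mu=0.0):
    """Ising energy of a spin configuration.

    J     : N x N nested list; only entries with i < j are used.
    sigma : list of N spins, each +1 or -1.
    h, mu : optional field coefficients h_i and field strength mu (Equation 1);
            omit them to evaluate Equation 2.
    """
    n = len(sigma)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):          # sum over pairs i < j only
            energy -= J[i][j] * sigma[i] * sigma[j]
    if h is not None:                      # external-field term of Equation 1
        energy -= mu * sum(hi * si for hi, si in zip(h, sigma))
    return energy

J = [[0, 1, -1],
     [0, 0, 1],
     [0, 0, 0]]
print(ising_energy(J, [1, 1, -1]))                          # -1.0
print(ising_energy(J, [1, 1, -1], h=[1, 0, 0], mu=0.5))     # -1.5
```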
A physical system with such a Hamiltonian naturally tends towards low-energy states. Hence it can be used to solve an optimization problem whose formulation is equivalent to the Ising formula, provided its parameters (e.g., Jij) can be selected to match those of the problem.
Many optimization problems map naturally to an Ising machine. Perhaps the most straightforward is the Max-Cut problem. Given a graph G = (V, E), a “cut” is a partition of the vertices into two sets, say V+ and V−, where V− = V \ V+. The goal is to find a cut that maximizes the combined weight of the edges spanning the two sets of vertices. In other words, the maximum cut is

$$\max_{V^{+}\subseteq V}\; \sum_{i\in V^{+},\; j\in V^{-}} W_{ij} \qquad (3)$$
where Wij is the weight of the edge (i,j).
It is easy to see the resemblance between Equation 2 and Equation 3.
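If the coupling weight Jij is set to the negative of the edge weight (−Wij), then the Ising formula is simply twice the negative cut value plus a problem-specific constant (ΣWij), as follows (for notational simplicity, Wij is set to 0 for i ≥ j):

$$H = -\sum_{i<j} J_{ij}\,\sigma_i\sigma_j = \sum_{i<j} W_{ij}\,\sigma_i\sigma_j = \sum_{i<j} W_{ij} \;-\; 2\sum_{i<j,\ \sigma_i\neq\sigma_j} W_{ij} \qquad (4)$$

where the last sum is exactly the weight of the cut with V+ = {i : σi = +1} and V− = {i : σi = −1}.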
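The identity behind this mapping can be checked exhaustively on a small graph. The sketch below is illustrative (function names are hypothetical); it stores weights in the upper triangle only, in line with the convention Wij = 0 for i ≥ j, and also brute-forces the ground state, which is only feasible for a handful of spins:

```python
from itertools import product

def cut_value(W, sigma):
    """Weight of edges (i < j) whose endpoints receive different spins."""
    n = len(sigma)
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n) if sigma[i] != sigma[j])

def maxcut_energy(W, sigma):
    """Equation 2 with J_ij = -W_ij, i.e. H = sum_{i<j} W_ij * s_i * s_j."""
    n = len(sigma)
    return sum(W[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(i + 1, n))

def identity_holds(W):
    """Check H = sum(W) - 2 * cut for every one of the 2**n spin configurations."""
    n = len(W)
    total = sum(W[i][j] for i in range(n) for j in range(i + 1, n))
    return all(maxcut_energy(W, s) == total - 2 * cut_value(W, s)
               for s in product((1, -1), repeat=n))

def brute_force_ground_state(W):
    """Exhaustively find a lowest-energy configuration and its cut value."""
    n = len(W)
    best = min(product((1, -1), repeat=n), key=lambda s: maxcut_energy(W, s))
    return best, cut_value(W, best)

# A 4-cycle 0-1-2-3-0 with unit weights; only the upper triangle (i < j) is filled.
W = [[0, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
print(identity_holds(W))             # True
print(brute_force_ground_state(W))   # ((1, -1, 1, -1), 4)
```

The ground state alternates spins around the cycle, so all four edges are cut, which is the maximum cut.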
Hence, if the machine finds the ground state of the Hamiltonian, it returns the maximum cut. Finding the maximum cut of an arbitrary graph is NP-hard, so practical algorithms only try to find a good answer. Similarly, existing Ising machines (including the disclosed design) are all Ising sampling machines that typically provide a good sample of a low-energy state, with no guarantee of optimality.
Because of the trivial mapping of the Max-Cut problem to the Ising formula, designers of Ising machines often focus on this optimization problem. However, other optimization problems can also be mapped to an Ising machine.
Building a fully programmable spin system, however, is a challenging task: the different realizations of spin systems so far all have some non-trivial drawbacks (e.g., requiring cryogenic operating conditions), which manifest to end users as high operating costs and/or limited problem-solving capabilities. The recently-disclosed bistable resistively-coupled Ising Machine (BRIM) design is the first Ising machine free of such major drawbacks. More information about BRIM may be found in U.S. patent application Ser. No. 17/996,283, filed on Oct. 14, 2022; International Application No. PCT/US2021/070402, filed on Apr. 16, 2021; and U.S. Provisional Application No. 63/011,245, filed on Apr. 16, 2020, all of which are incorporated herein by reference.
To facilitate computation for diverse workloads, researchers are trying to map an entire algorithm onto physical processes such that the resulting state represents an answer to the mapped problem. An example is the quantum annealer from D-Wave Systems (Bunyk, et al., IEEE Transactions on Applied Superconductivity, 2014), in which a combinatorial optimization problem is mapped to a system of qubits such that solving the problem amounts to minimizing the system's Hamiltonian. Careful control is required for the system to settle into an equilibrium (i.e., the optimal solution), and the qubit states are then read out as the solution to the mapped problem.
It is not definitive whether D-Wave's systems achieve any quantum speedup, but it is clear that these Ising machines can indeed find good solutions to an optimization problem in a very short amount of time (millisecond or microsecond latencies). Recently, several designs have shown good-quality solutions for non-trivial problem sizes. Ising machines share common properties: a problem is mapped to the physical setup, the internal state evolves according to machine-dependent physics, the evolution optimizes a particular formula (the Ising model), and the machine's physical state is read out to obtain a solution to the mapped problem.
Unlike von Neumann machines, these nature-based machines follow no explicit algorithm: nature effectively carries out the computation. Recently, diverse Ising machines have been implemented in a variety of ways. They differ in the complexity of their underlying physical principles, and it is not clear whether any form provides a fundamental advantage over the others in a large-scale system. Note that these machines are not guaranteed to reach the ground state (the optimal solution) in practice. Rather, they can find good solutions at high speed and with good overall energy efficiency.
Despite their potential to outperform conventional computers in power and time to solution, building a robust and general Ising machine substrate remains challenging, and scalability is one of the key factors to consider. As the size of the problem increases, the amount of required hardware grows rapidly (with all-to-all coupling, the number of coupling units grows quadratically with the number of spins). To sustain a small form factor and a chip area comparable to (or better than) that of a conventional computing substrate, simple building blocks are preferred for an Ising machine's implementation. Disclosed herein is a resistively-coupled Ising machine with bistable nodes and quantized nodal interactions (QuBRIM), which reduces hardware complexity by directly adjusting the interactions among neighboring nodes.
To understand the rationale behind the disclosed designs, the present disclosure first analyzes why all-to-all coupling is essential by showing in more detail the true costs of near-neighbor coupling (see below). Having all-to-all coupling per se is not enough. After all, one can simulate any system with von Neumann computation. It is shown herein that such emulation is inefficient both in terms of time and energy cost. Only physical all-to-all coupling can make a dynamical system perform computation at speeds orders of magnitude faster than conventional computation.
In one aspect, a computation node comprises an input, an output, a bistable node comprising an input and an output, the output configured to have at least two equilibrium output voltages, a first buffer circuit, having an input and an output, the buffer input connected to the output of the bistable node and the buffer output connected to the output of the computation node, the buffer circuit configured to provide a first output voltage when a voltage at the buffer input is below a threshold, and a second output voltage when the voltage at the buffer input is above the threshold, and a current conveyor node having an input connected to the input of the computation node and an output connected to the input of the bistable node, the current conveyor configured to hold its input at a constant voltage and mirror current received at the input into the input of the bistable node.
In one embodiment, the first buffer circuit comprises an inverter. In one embodiment, the computation node further comprises a second, inverting output and a second buffer circuit having an input connected to an inverting output of the bistable node, the inverting output of the bistable node configured to have an inverse value of the output of the bistable node, the second buffer circuit further comprising an output connected to the second inverting output of the computation node. In one embodiment, the first and second buffer circuits each comprise an inverter. In one embodiment, the computation node further comprises a second, inverting input; and a second current conveyor node having an input connected to the second inverting input of the computation node and an output connected to a second input of the bistable node, the current conveyor configured to hold its input at a constant voltage and mirror current received at the input into the second input of the bistable node.
In one aspect, a network comprises at least first and second computation nodes as defined herein, further comprising a coupling node connecting the output of the first computation node to the input of the second computation node. In one embodiment, the coupling node comprises a resistor. In one embodiment, the coupling node comprises a pseudo-resistor.
In one aspect, a network of resistively-coupled computation nodes comprises at least one computation node, each comprising an input, an output, a bistable node, comprising an input and an output, the output configured to have at least two equilibrium output voltages, a first inverter having an input and an output, the inverter input connected to the output of the bistable node and the inverter output connected to the output of the computation node, and a current conveyor node having an input connected to the input of the computation node and an output connected to the input of the bistable node, and a resistive coupling connected to the output of the computation node.
In one embodiment, the at least one computation node comprises first and second computation nodes, wherein the input of the second computation node is connected to the input of the first computation node. In one embodiment, the output of the first computation node and the output of the second computation node are connected to a multiplication node configured to multiply the outputs and provide the multiplied output as a multiplication node output. In one embodiment, the multiplication node comprises an exclusive NOR gate.
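Assuming a spin of +1 is encoded as logic 1 and a spin of −1 as logic 0 (the encoding is not specified above), the product of two spins is +1 exactly when the two bits agree, which is the XNOR function. A short behavioral check in Python (not a circuit description; the helper names are hypothetical):

```python
def spin_to_bit(s):
    """Assumed encoding: spin +1 -> logic 1, spin -1 -> logic 0."""
    return 1 if s == 1 else 0

def xnor(a, b):
    """Exclusive NOR of two bits."""
    return 1 - (a ^ b)

def multiply_via_xnor(s_i, s_j):
    """Return the spin product s_i * s_j computed with a single XNOR gate."""
    return 1 if xnor(spin_to_bit(s_i), spin_to_bit(s_j)) == 1 else -1

for s_i in (1, -1):
    for s_j in (1, -1):
        print(s_i, s_j, multiply_via_xnor(s_i, s_j))
```

Under the opposite encoding (+1 as logic 0), the same product would be computed by XOR instead.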
The foregoing purposes and features, as well as other purposes and features, will become apparent with reference to the description and accompanying figures below, which are included to provide an understanding of the invention and constitute a part of the specification, in which like numerals represent like elements, and in which:
It is to be understood that the figures and descriptions of the present invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for the purpose of clarity, many other elements found in related systems and methods. Those of ordinary skill in the art may recognize that other elements and/or steps are desirable and/or required in implementing the present invention. However, because such elements and steps are well known in the art, and because they do not facilitate a better understanding of the present invention, a discussion of such elements and steps is not provided herein. The disclosure herein is directed to all such variations and modifications to such elements and methods known to those skilled in the art.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, exemplary methods and materials are described.
As used herein, each of the following terms has the meaning associated with it in this section.
The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
“About” as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass variations of ±20%, ±10%, ±5%, ±1%, and ±0.1% from the specified value, as such variations are appropriate.
Throughout this disclosure, various aspects of the invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, 6 and any whole and partial increments therebetween. This applies regardless of the breadth of the range.
In some aspects of the present invention, software executing the instructions provided herein may be stored on a non-transitory computer-readable medium, wherein the software performs some or all of the steps of the present invention when executed on a processor.
Aspects of the invention relate to algorithms executed in computer software. Though certain embodiments may be described as written in particular programming languages, or executed on particular operating systems or computing platforms, it is understood that the system and method of the present invention is not limited to any particular computing language, platform, or combination thereof. Software executing the algorithms described herein may be written in any programming language known in the art, compiled or interpreted, including but not limited to C, C++, C#, Objective-C, Java, JavaScript, MATLAB, Python, PHP, Perl, Ruby, or Visual Basic. It is further understood that elements of the present invention may be executed on any acceptable computing platform, including but not limited to a server, a cloud instance, a workstation, a thin client, a mobile device, an embedded microcontroller, a television, or any other suitable computing device known in the art.
Parts of this invention are described as software running on a computing device. Though software described herein may be disclosed as operating on one particular computing device (e.g. a dedicated server or a workstation), it is understood in the art that software is intrinsically portable and that most software running on a dedicated server may also be run, for the purposes of the present invention, on any of a wide range of devices including desktop or mobile devices, laptops, tablets, smartphones, watches, wearable electronics or other wireless digital/cellular phones, televisions, cloud instances, embedded microcontrollers, thin client devices, or any other suitable computing device known in the art.
Similarly, parts of this invention are described as communicating over a variety of wireless or wired computer networks. For the purposes of this invention, the words “network”, “networked”, and “networking” are understood to encompass wired Ethernet, fiber optic connections, wireless connections including any of the various 802.11 standards, cellular WAN infrastructures such as 3G, 4G/LTE, or 5G networks, Bluetooth®, Bluetooth® Low Energy (BLE) or Zigbee® communication links, or any other method by which one electronic device is capable of communicating with another. In some embodiments, elements of the networked portion of the invention may be implemented over a Virtual Private Network (VPN).
Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
The storage device 120 is connected to the CPU 150 through a storage controller (not shown) connected to the bus 135. The storage device 120 and its associated computer-readable media provide non-volatile storage for the computer 100. Although the description of computer-readable media contained herein refers to a storage device, such as a hard disk or CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available media that can be accessed by the computer 100.
By way of example, and not to be limiting, computer-readable media may comprise computer storage media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
According to various embodiments of the invention, the computer 100 may operate in a networked environment using logical connections to remote computers through a network 140, such as a TCP/IP network (e.g., the Internet or an intranet). The computer 100 may connect to the network 140 through a network interface unit 145 connected to the bus 135. It should be appreciated that the network interface unit 145 may also be utilized to connect to other types of networks and remote computer systems.
The computer 100 may also include an input/output controller 155 for receiving and processing input from a number of input/output devices 160, including a keyboard, a mouse, a touchscreen, a camera, a microphone, a controller, a joystick, or other type of input device. Similarly, the input/output controller 155 may provide output to a display screen, a printer, a speaker, or other type of output device. The computer 100 can connect to the input/output device 160 via a wired connection including, but not limited to, fiber optic, Ethernet, or copper wire or wireless means including, but not limited to, Wi-Fi, Bluetooth, Near-Field Communication (NFC), infrared, or other suitable wired or wireless connections.
As mentioned briefly above, a number of program modules and data files may be stored in the storage device 120 and/or RAM 110 of the computer 100, including an operating system 125 suitable for controlling the operation of a networked computer. The storage device 120 and RAM 110 may also store one or more applications/programs 130. In particular, the storage device 120 and RAM 110 may store an application/program 130 for providing a variety of functionalities to a user. For instance, the application/program 130 may comprise many types of programs such as a word processing application, a spreadsheet application, a desktop publishing application, a database application, a gaming application, internet browsing application, electronic mail application, messaging application, and the like. According to an embodiment of the present invention, the application/program 130 comprises a multiple functionality software application for providing word processing functionality, slide presentation functionality, spreadsheet functionality, database functionality and the like.
The computer 100 in some embodiments can include a variety of sensors 165 for monitoring the environment surrounding and the environment internal to the computer 100. These sensors 165 can include a Global Positioning System (GPS) sensor, a photosensitive sensor, a gyroscope, a magnetometer, thermometer, a proximity sensor, an accelerometer, a microphone, biometric sensor, barometer, humidity sensor, radiation sensor, or any other suitable sensor.
Ising machines can map and solve optimization problems expressed in the Ising formulation. Such problems can equivalently be expressed in the quadratic unconstrained binary optimization (QUBO) formulation. Ising machines are still at an early stage, in which the emphasis is on demonstrating the underlying physical principle. Because coupling a large number of spins is technically very challenging, near-neighbor coupling is a common theme among designs. Moreover, since existing Ising machines largely avoid all-to-all connections, designers of newer efforts understandably feel justified in also avoiding the challenge. As shown herein, however, a careful analysis makes clear that the problems near-neighbor coupling creates are numerous and significant.
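The equivalence between the QUBO and Ising formulations follows from the substitution x_i = (1 + σ_i)/2. The sketch below is one way to perform the conversion, assuming an upper-triangular QUBO matrix and an Ising form with a field term (μ = 1) plus a constant offset; the helper names are hypothetical. The final loop verifies that both objectives agree on every assignment:

```python
from itertools import product

def qubo_to_ising(Q):
    """Map an upper-triangular QUBO matrix Q to (J, h, offset) via x_i = (1 + s_i) / 2.

    QUBO objective : f(x) = sum_i Q[i][i] x_i + sum_{i<j} Q[i][j] x_i x_j
    Ising objective: H(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i + offset
    """
    n = len(Q)
    J = [[0.0] * n for _ in range(n)]
    h = [0.0] * n
    offset = 0.0
    for i in range(n):
        # Linear term: Q_ii x_i = Q_ii / 2 + (Q_ii / 2) s_i
        h[i] -= Q[i][i] / 2
        offset += Q[i][i] / 2
        for j in range(i + 1, n):
            # Quadratic term: Q_ij x_i x_j = Q_ij (1 + s_i + s_j + s_i s_j) / 4
            q = Q[i][j]
            J[i][j] = -q / 4
            h[i] -= q / 4
            h[j] -= q / 4
            offset += q / 4
    return J, h, offset

def qubo_value(Q, x):
    """f(x) for binary x (x_i * x_i == x_i, so the diagonal acts linearly)."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def ising_value(J, h, offset, s):
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(h[i] * s[i] for i in range(n)) + offset

Q = [[-1, 2, 0],
     [0, -1, 2],
     [0, 0, -1]]
J, h, offset = qubo_to_ising(Q)
for s in product((1, -1), repeat=3):
    x = [(1 + si) // 2 for si in s]     # spin +1 -> 1, spin -1 -> 0
    assert abs(qubo_value(Q, x) - ising_value(J, h, offset, s)) < 1e-12
```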
The first challenge is time cost in preprocessing. When an Ising machine provides a limited connectivity scheme, an arbitrary problem will need to go through an embedding process before it can be mapped to the hardware.
The second challenge is cost in hardware resources. Even if the embedding time is ignored, the number of physical spins needed fundamentally grows as O(N²) for any fixed-degree topology. This is because a generic QUBO problem with N binary variables is specified with O(N²) control parameters. In an Ising machine with a fixed-degree topology, the number of programmable edges is proportional to the number of physical spins, so obtaining O(N²) programmable edges necessarily requires O(N²) physical spins. As a result, an Ising machine with fixed-degree couplings merely provides a large number of nominal spins. This is shown in a concrete example in Table 1.
Table 1 is an example showing a lower bound on the number of spins and coupling units (cu).
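Under the optimistic assumption that every coupler expresses one problem coefficient, and writing d for the maximum number of couplers per physical spin (so that N_phys physical spins provide at most d·N_phys/2 couplers), the O(N²) bound follows from a simple count:

```latex
\frac{d\,N_{\text{phys}}}{2} \;\ge\; \frac{N(N-1)}{2}
\quad\Longrightarrow\quad
N_{\text{phys}} \;\ge\; \frac{N(N-1)}{d} \;=\; O(N^2)
```

For example, with d = 4 (a two-dimensional mesh) and N = 1000, this gives N_phys ≥ 999,000/4 = 249,750 physical spins.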
Existing Ising Machines may be divided into three categories based on the technology used for their design: Quantum, Optical, and Electronic annealers.
The latest quantum Ising annealer manufactured by D-Wave supports up to 2000 qubits. The qubits are coupled to form a Chimera graph. As a result of this local coupling, the D-Wave machine can map a fully connected graph of at most 64 nodes. These annealers are also susceptible to noise, necessitating cryogenic operating conditions that consume much power (25 kW for the D-Wave 2000Q).
The most popular optical Ising machine is the Coherent Ising Machine (CIM). It uses an optical parametric oscillator (OPO) to generate and manipulate a signal representing one spin. Unlike in D-Wave's machines, CIM nodes are all-to-all coupled. The machine has two components: an optical cavity built from kilometers of fiber, and an auxiliary computer that implements the coupling between nodes. The amplitude and phase of every pulse in the optical cavity are detected, and its interaction with all other pulses is calculated by the auxiliary computer (an FPGA). The result is then used to modulate new pulses that are injected back into the optical cavity. Strictly speaking, the current implementation is a nature-simulation hybrid Ising machine. Thus, beyond the challenge of constructing the cavity, CIM also requires a significant supporting structure involving fast conversions between optical and electrical signals.
The operating principle of CIM can be viewed as a Kuramoto model (see Takeda, et al., Quantum Science and Technology, 2017), so in theory one can achieve a similar goal using other oscillators. This led to the design of electronic Oscillator-based Ising Machines (OIM). These systems use LC tanks as spins and (programmable) resistors as coupling units. However, inductors are often a source of practical challenges for on-chip integration. They are area-intensive and have undesirable parasitics that reduce the quality factor and increase phase noise, which makes it difficult to maintain frequency uniformity and phase synchronicity among thousands of on-chip oscillators.
The BRIM design, discussed in more detail above, is an electronic design with resistive coupling, in which each Ising spin is implemented as a capacitor voltage controlled by a feedback circuit that makes it bistable. Since it uses voltage (as opposed to phase) to represent spin, it enables a straightforward interface to additional architectural support for computational tasks.
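The BRIM node circuit and coupling equations are not given in this section, so the following is only a hypothetical toy model in the same spirit: each node voltage has a cubic feedback term that makes ±1 its stable values when uncoupled, and symmetric couplings J act like currents injected from the other nodes. It qualitatively illustrates a bistable-node Ising machine settling into a low-energy state; it is not a model of the actual circuit:

```python
def simulate_bistable_network(J, v0, alpha=2.0, dt=0.01, steps=2000):
    """Toy model (hypothetical, not the BRIM circuit equations):
        dv_i/dt = sum_{j != i} J_ij v_j + alpha * (v_i - v_i**3)
    J must be symmetric. The cubic feedback term gives each isolated node two
    stable voltages (+1 and -1); the coupling term biases nodes toward spin
    configurations with low Ising energy. Integrated with forward Euler.
    """
    n = len(v0)
    v = list(v0)
    for _ in range(steps):
        dv = [sum(J[i][j] * v[j] for j in range(n) if j != i)
              + alpha * (v[i] - v[i] ** 3)
              for i in range(n)]
        v = [v[i] + dt * dv[i] for i in range(n)]
    return v

def read_out_spins(v):
    """Threshold each node voltage at 0 to obtain a +1/-1 spin."""
    return [1 if x >= 0 else -1 for x in v]

# Max-Cut on a single unit-weight edge: J_01 = J_10 = -W_01 = -1.
J = [[0, -1], [-1, 0]]
v = simulate_bistable_network(J, v0=[0.1, 0.05])
print(read_out_spins(v))   # [1, -1]
```

Although both nodes start slightly positive, the coupling drives them to opposite polarities, cutting the edge. Like a physical Ising machine, such dynamics can also settle into a local minimum on larger graphs, so the read-out is a good sample rather than a guaranteed optimum.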
Researchers have also designed chips to accelerate simulated annealing, or variants of the classic algorithm. In these designs, the spins are virtual: they are bits in memory manipulated by an algorithm (simulated annealing). These machines are specially built to accelerate that algorithm, and are hence referred to herein as Accelerated Simulated Annealers (ASA). They are fundamentally different from regular Ising machines in that they follow a particular algorithm, whereas regular Ising machines are guided by physical laws themselves.
Consider a fully-connected problem with 1000 spin variables and thus around 500,000 coupling coefficients. For a nearest-neighbor (mesh) system, a very loose lower bound on the size of the system that can map the problem can be calculated by assuming that every coupler is used to express a problem coefficient. This is optimistic because, in reality, many of the couplers simply connect nearby spins to form a logical spin. Even in this case, an array of about 500×500 is necessary to have enough couplers for the coupling coefficients. That is, the nearest-neighbor architecture requires 250,000 spins just to obtain enough coupling units, which is what is truly needed. While the cost of 250,000 spin units is real and substantial, the notion that the system possesses that many nominal spins is meaningless. In fact, instead of arbitrarily describing the problem as having 1000 spin variables, it could simply be described as having half a million coupling coefficients. In that case, the most resource-efficient system is an all-to-all architecture, with only 1000 spins needed.
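The arithmetic of this bound can be reproduced in a few lines of Python, assuming a square 4-neighbor mesh (an n×n grid has 2n(n−1) couplers) and the same optimistic rule that every coupler carries one coefficient; the function names are illustrative:

```python
def mesh_couplers(n):
    """Number of nearest-neighbor couplers in an n x n 4-neighbor mesh."""
    return 2 * n * (n - 1)      # n(n-1) horizontal + n(n-1) vertical couplers

def min_square_mesh(num_logical):
    """Smallest n x n mesh with at least as many couplers as a fully connected
    problem on num_logical spins has coefficients. Optimistic: assumes every
    coupler expresses one problem coefficient and none are spent on chains."""
    coefficients = num_logical * (num_logical - 1) // 2
    n = 1
    while mesh_couplers(n) < coefficients:
        n += 1
    return n, coefficients

n, coefficients = min_square_mesh(1000)
print(coefficients)    # 499500
print(n, n * n)        # 501 251001
```

The exact minimum is a 501×501 array (about 251,000 physical spins), consistent with the roughly 500×500 array and 250,000 spins quoted above, versus 1000 spins for an all-to-all design.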
Another challenge is cost in solution quality. Fixed-degree topologies are wasteful in the resources needed to provide the same computational capacity. Worse still, mapping a single logical spin to a large number of physical spins creates subtle problems that result in an additional problem: poor solution quality. The present disclosure first shows an empirical observation and then explains why the loss in solution quality is fundamental.
Two example machines using fixed-degree coupling are first considered: D-Wave's 2000Q, which uses a Chimera graph, and an accelerated simulated annealer (ASA) that uses a King's graph.
In an embedded problem, a logical spin from the original problem is represented as a (large) chain of physical spins that are strongly coupled together so that (hopefully) they have the same polarity. This is done by assigning a coupling strength of k (assuming the problem coefficients are already normalized to 1). Note that increasing k only reduces the chance a chain is “broken” (physical spins within it have both polarities) but cannot guarantee the absence of broken chains.
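To make the role of k concrete, the following sketch builds the physical couplings of a minor embedding, assuming chains are simple paths of physically adjacent spins and that one physical coupler has already been chosen for each logical edge (in practice an embedding tool makes these choices). The layout and helper names are hypothetical. For legal states (unbroken chains), the physical energy equals the logical energy shifted by a constant:

```python
from itertools import product

def embed_couplings(logical_J, chains, edge_map, k):
    """Build physical couplings for a minor embedding (hypothetical helper).

    logical_J : {(i, j): J_ij} logical couplings
    chains    : chains[i] lists the physical spins of logical spin i, in path order
    edge_map  : {(i, j): (p, q)} one physical coupler chosen for each logical edge
    k         : chain strength; with H = -sum J s_p s_q, J = +k favors alignment
    """
    phys_J = {}
    for chain in chains:
        for p, q in zip(chain, chain[1:]):          # intra-chain couplers
            phys_J[(p, q)] = phys_J.get((p, q), 0) + k
    for edge, w in logical_J.items():               # problem couplers
        p, q = edge_map[edge]
        phys_J[(p, q)] = phys_J.get((p, q), 0) + w
    return phys_J

def energy(J, spins):
    """Ising energy H = -sum_{(p, q)} J_pq s_p s_q over the listed couplers."""
    return -sum(w * spins[p] * spins[q] for (p, q), w in J.items())

def expand(chains, logical_spins, num_physical):
    """Copy each logical spin onto every physical spin of its chain (a legal state)."""
    spins = [0] * num_physical
    for chain, s in zip(chains, logical_spins):
        for p in chain:
            spins[p] = s
    return spins

# Triangle of logical spins embedded onto 5 physical spins (illustrative layout).
logical_J = {(0, 1): -1, (0, 2): -1, (1, 2): -1}
chains = [[0, 1], [2], [3, 4]]
edge_map = {(0, 1): (1, 2), (0, 2): (0, 3), (1, 2): (2, 4)}
k = 4
phys_J = embed_couplings(logical_J, chains, edge_map, k)
num_chain_couplers = sum(len(c) - 1 for c in chains)
for s in product((1, -1), repeat=3):
    legal = expand(chains, s, 5)
    # Legal states: logical energy shifted by -k per intra-chain coupler.
    assert energy(phys_J, legal) == energy(logical_J, s) - k * num_chain_couplers
```

Breaking a chain flips the sign of at least one intra-chain term, which raises that coupler's contribution by 2k. This is a penalty, not a prohibition, which is why a larger k reduces but cannot eliminate broken chains.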
To provide a more concrete illustration of the problem, consider a set of very small programs of 16 to 32 logical spins, converted through minor embedding into a physical layout with Chimera-couplings. The transformed energy landscape may then be navigated using a state-of-the-art simulated annealing solver (Isakov, et al., Computer Physics Communications, 2015).
From
One might think that sampling an illegal solution is not a problem, or that in any case increasing k can easily mitigate it. Both are true, but only to some extent. First, obtaining an illegal solution is not an issue per se: it is not difficult to transform a raw solution into a legal solution of the original problem. A common practice is to use a majority vote to decide each chain's overall polarity. Increasing k clearly reduces the percentage of broken chains, as shown in
Considering a concrete example shows that the observations in the figure are quite understandable. A 32-node graph g (representing a 32-spin Ising formulation), for example, has a phase space of size 2³². Through minor embedding, the transformed graph ge has 320 nodes, and thus 2³²⁰ phase points, of which only 2³² are legal, i.e., correspond directly back to those of g. In other words, for every legal phase point, there are roughly 2²⁸⁸ illegal ones. When a solver explores the phase space of ge and obtains a solution (call it s), it is entirely understandable that s is an illegal solution, despite the fact that illegal phase points generally have higher energy: the numerical odds (1 in 2²⁸⁸) are simply overwhelming.
When converting s into a legal solution with a majority vote, one essentially substitutes s with a legal solution s′. The merit of s′ is that it is the closest legal solution to s (in Hamming distance in the enlarged phase space of ge). The implied assumption is that if s is a good solution (low energy), changing a few spins to obtain s′ will still provide a reasonably good solution. But s′ being the closest legal solution does not imply that it is close in any absolute sense: roughly 2²⁸⁸ phase points share each legal phase point as their closest legal phase point, and most of them are a non-trivial distance from it. In general, the greater the distance, the less correlated the energies of s′ and s are. As a concrete example, in one 32-node graph with k=1, the majority-vote-derived solution s′ has a Hamming distance of 112 from the original solution s.
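The majority-vote repair and the Hamming distance discussed above can be sketched as follows. The chain layout and raw sample are made up for illustration, and ties (possible for even-length chains) are resolved to +1, which is an assumption:

```python
def majority_vote(chains, phys_spins):
    """Collapse each chain to one logical spin by majority; ties go to +1 (assumption)."""
    return [1 if sum(phys_spins[p] for p in chain) >= 0 else -1 for chain in chains]

def count_broken_chains(chains, phys_spins):
    """A chain is broken when its physical spins do not all share one polarity."""
    return sum(1 for chain in chains if len({phys_spins[p] for p in chain}) > 1)

def hamming_to_legal(chains, phys_spins, logical_spins):
    """Physical spins that must flip to turn the raw state into the re-embedded legal one."""
    return sum(1 for chain, s in zip(chains, logical_spins)
               for p in chain if phys_spins[p] != s)

chains = [[0, 1, 2], [3, 4], [5]]
raw = [1, -1, 1, -1, -1, 1]                      # chain 0 is broken
logical = majority_vote(chains, raw)
print(logical)                                   # [1, -1, 1]
print(count_broken_chains(chains, raw))          # 1
print(hamming_to_legal(chains, raw, logical))    # 1
```

Because each chain is resolved independently, the majority-vote state is a closest legal state in Hamming distance, yet that distance can still be large when many chains are broken or nearly evenly split.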
This problem can be mitigated to some extent. Using a larger k, for instance, increases the penalty for illegal solutions. However, this again comes at a cost. Increasing k beyond a certain point is counterproductive, as it puts too much emphasis on not breaking any chain and not enough on the quality difference between legal solutions. Empirically, for these test problems, the optimal value is found to be k=4. Given the finite dynamic range of coupling coefficients in realistic analog hardware, a large k simply takes usable range away from expressing the actual problem. In the disclosed example, k=4 has the effect of reserving 2 bits for lowering the impact of broken chains. Unfortunately, as the graph increases in size, the optimal value of k increases, as can be seen in the figure. This means that in real-world problems with thousands of spins, more of the system's dynamic range will be needed to program k.
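Assuming a coupler with b bits of uniformly spaced magnitude levels whose full scale must reach the chain strength k, while problem coefficients are normalized to magnitudes of at most 1, only the lowest 1/k of the range remains for the problem itself, which is where the 2-bit figure comes from:

```latex
b_{\text{problem}} \;=\; b - \log_2 k,
\qquad
k = 4 \;\Longrightarrow\; b_{\text{problem}} = b - \log_2 4 = b - 2
```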
As shown above, the embedding process not only changes the number of spins required to map the problem and demands significant preprocessing; it also significantly degrades solution quality, because it forces a search on a proxy energy landscape rather than on the original problem's energy landscape. All in all, near-neighbor coupling creates a host of genuine problems that are often ignored in the literature. Every indication suggests that it is not a solution to the challenge of increasing a machine's problem-solving capacity. While it lowers the barrier to building proof-of-concept prototypes, next-generation machines need to efficiently support all-to-all coupling.
One apparent alternative to providing physical all-to-all coupling is providing emulated coupling. In coherent Ising machines (CIM) (see Inagaki, et al., Science, 2016; Yamamoto, et al., npj Quantum Information, 2017; McMahon, et al., Science, 2016; Takata, et al., Scientific Reports, 2016; and Böhm, et al., Nature Communications, 2019), this is achieved by measuring a pulse i, numerically computing the coupling effect to another pulse j, and subsequently injecting a corresponding signal that represents the computed value back into the dynamical system. While this approach also facilitates building a proof-of-concept prototype, it creates problems for high-performance systems, as the computation delay and energy both limit the overall speed and energy efficiency of the system.
A recent comparison between a computational solution, Simulated Bifurcation (SB), and CIM is revealing (see Tatsumura, et al., 2019 29th International Conference on Field Programmable Logic and Applications (FPL), 2019).
In SB, the entire dynamical system is simulated computationally. Every round of interaction requires O(N^2) multiply-accumulate (MAC) operations in both SB and CIM. Although one might expect CIM to have an advantage, since it contains dedicated hardware whereas SB is purely a simulator, SB turns out to be much faster. For a commonly used problem (K2000), SB needed 25 rounds (steps) to reach a given solution quality, while CIM needed more than 150 rounds. In other words, the software-based SB is both faster and more energy efficient than a hybrid Ising machine. This example casts serious doubt on the utility of von Neumann-emulated coupling in Ising machines.
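To make the O(N^2)-MACs-per-round point concrete, below is a minimal software sketch of simulated bifurcation. It uses the ballistic variant (bSB) update rules as commonly published, which may differ from the exact SB configuration benchmarked by Tatsumura et al.; the parameter values and the c0 scaling heuristic are assumptions. The only coupling work per step is one dense matrix-vector product, i.e., N^2 MACs, the same count a measurement-feedback CIM must perform outside the optical system.

```python
import numpy as np


def ballistic_sb(J, steps=1000, dt=0.5, a0=1.0, c0=None, seed=0):
    """Ballistic simulated bifurcation (bSB) sketch for H = -1/2 * s^T J s.

    J must be symmetric with a zero diagonal. Returns (spins, MAC count).
    """
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    if c0 is None:
        # Common scaling heuristic (an assumption here) that normalizes the coupling term.
        c0 = 0.5 / (np.sqrt(n) * np.sqrt((J ** 2).sum() / (n * (n - 1))))
    x = rng.uniform(-0.1, 0.1, n)   # positions (soft spins)
    y = rng.uniform(-0.1, 0.1, n)   # momenta
    macs = 0
    for k in range(steps):
        a = a0 * k / steps                      # pump ramps linearly from 0 to a0
        y += (-(a0 - a) * x + c0 * (J @ x)) * dt
        macs += n * n                           # one dense N x N mat-vec per step
        x += a0 * y * dt
        wall = np.abs(x) > 1.0                  # inelastic walls at |x| = 1
        x[wall] = np.sign(x[wall])
        y[wall] = 0.0
    return np.where(x >= 0, 1, -1), macs


n = 8
J = np.ones((n, n)) - np.eye(n)                 # ferromagnetic all-to-all toy problem
spins, macs = ballistic_sb(J)
print(spins, macs)                              # all spins equal; 64000 MACs
```

Whatever hardware performs the matrix-vector product, the MAC count per round is fixed by N, which is why emulated coupling inherits von Neumann costs.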
Finally, a more recent incarnation of CIM reaches an impressive 100,000 spins, due in part to a piezo-based fiber stretcher that helps maintain a stable phase relationship among the large number of pulses in the 5-km fiber. A closer inspection of the computational substrate used to perform the so-called digitally assisted mutual interaction makes the issue with such an approach clear. First, the entire machine is constructed for a special type of graph in which the coupling weights are only 1 or −1. Despite supporting only low-bit-precision calculations, the von Neumann computation required 54 top-of-the-line FPGAs (XCVU13P-2FHGB2104I). Together, the computational elements alone consume about 6 kW (see Honjo, et al., Science Advances, 2021).
Intuitively, nature-based computational systems leverage the computation done by nature to achieve extraordinary speed and energy efficiency. Such advantages may disappear completely, or at least be significantly diminished, if von Neumann computing is relied upon to emulate nature.
Though existing Ising machines seem to offer a non-trivial number of spins, they fall into two categories, neither of which represents a promising future:
(1) Architectures with near-neighbor coupling, which create a larger number of nominal spins. There is no indication that these genuinely improve the machine's problem-solving capacity (effective spins). Instead, they create an extremely expensive preprocessing demand, use hardware resources poorly, and fundamentally lower the solution quality. Each of these three technical problems is extremely challenging to address and may have no solution.
(2) Emulated all-to-all coupling, which creates an enormous demand for traditional (von Neumann) computing and a concomitant energy overhead. At present, the computing demand, and thus the energy required, is far worse than that of traditional solvers.
Therefore, the best (if not the only) approach is to deliver hardware coupling mechanisms so efficient (in area and energy) that genuine all-to-all coupling is feasible at the hardware level. Only then can the dynamical system leverage nature to perform certain types of tasks orders of magnitude faster and more energy efficiently than conventional von Neumann solvers. All indications from preliminary research suggest that this is achievable. The focus of the disclosed hardware system is exactly to deliver such hardware, at scale. In the vast majority of cases, the disclosed global all-to-all architecture has no disadvantages relative to less demanding ones. In a few unlikely situations, current near-neighbor topologies may be worth considering.
One such situation arises when, for a given spin technology, it is technically impossible to couple many nodes together (such a technology would not be a promising one to explore). Another potentially suitable use case for a near-neighbor topology is a custom architecture built for fixed problems whose connection matrix is very sparse. In those cases, the disclosed technology can also be used with a sparse-coupling architecture, resulting in a correspondingly smaller system.
Intuitively, in the Ising model, when two nodes (i and j) are strongly and positively coupled (Jij is large and positive), their spins are likely to be parallel (σi=σj). Similarly, a strong negative coupling will likely lead to anti-parallel spins (σi=−σj), while |Jij|≈0 suggests that the two spins are more likely to be independent. This behavior can be easily mimicked with resistively coupled capacitors, where the spin of a node is represented by the polarity of the voltage vi across capacitor Ci, and the coupling between nodes is through resistors Rij=Rc/|Jij|, where Rc is a scaling constant equal to the minimum resistance achievable (or allowed) in a given technology. The sign of the coupling is set by connecting either the same or opposite polarities of the corresponding capacitors.
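A minimal sketch of the J-to-resistor mapping described above, assuming a symmetric J with zero diagonal. The default Rc = 60 kΩ borrows the 45 nm example value quoted later in this disclosure purely for illustration, and the function names are hypothetical. The node-current helper models the "same or opposite polarity" connection as a sign applied to the neighbor's voltage.

```python
import numpy as np


def coupling_network(J, r_c=60e3):
    """Map couplings to resistances R_ij = R_c/|J_ij| and connection polarity.

    polarity[i, j] = +1: same-polarity (parallel) connection,
                     -1: opposite-polarity (anti-parallel) connection,
                      0: no resistor (J_ij == 0, R_ij = inf).
    """
    J = np.asarray(J, dtype=float)
    R = np.full(J.shape, np.inf)
    nonzero = J != 0
    R[nonzero] = r_c / np.abs(J[nonzero])
    return R, np.sign(J).astype(int)


def node_currents(v, R, polarity):
    """Current into each capacitor: sum_j (polarity_ij * v_j - v_i) / R_ij."""
    v = np.asarray(v, dtype=float)
    return ((polarity * v[None, :] - v[:, None]) / R).sum(axis=1)


J = np.array([[0.0, 2.0, -0.5],
              [2.0, 0.0, 0.0],
              [-0.5, 0.0, 0.0]])
R, pol = coupling_network(J)
print(R[0, 1], R[0, 2], pol[0, 2])        # 30000.0 120000.0 -1
print(node_currents([1, 1, -1], R, pol))  # zero current: every coupling is satisfied
```

A stronger coupling maps to a smaller resistor and therefore a larger corrective current, which is what pulls the node polarities toward the preferred alignment.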
Bistability: The oversimplified design confirms the intuition but is far from a robust design. For example, in the presence of a trivial equilibrium state (vi=0), frustrations (i.e., more than one equilibrium with the same energy is present), or leakage on the capacitors, the equilibrium voltages can be 0 or just too low for reliable readout.
A local feedback circuit is therefore needed to keep the nodes away from 0 V and closer to the power rails, i.e., to make the nodes bistable. Such active feedback may be achieved in some embodiments by a ZIV diode, whose name comes from its slanted “Z” shaped IV-curve, gZIV(vi), as shown in
where P(vi) is obtained by integrating gZIV(vi) over vi. It is important to note that the particular IV characteristic of a ZIV diode will give P(v) a double-well profile (black curve 701 in
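The double-well shape can be illustrated with a stand-in I-V model. The cubic g_ZIV(v) = g0(v^3/v0^2 − v) below is a hypothetical polynomial chosen only because it has the negative-slope region around 0 V that produces a slanted-Z curve; it is not the actual device characteristic, and g0 and v0 are illustrative parameters. Integrating it gives a P(v) with wells at ±v0, and a node driven only by this element relaxes into one of them.

```python
import numpy as np


def g_ziv(v, g0=1.0, v0=1.0):
    """Hypothetical cubic ZIV-like current: negative slope near 0, positive beyond."""
    return g0 * (v ** 3 / v0 ** 2 - v)


def p_ziv(v, g0=1.0, v0=1.0):
    """P(v) = integral of g_ziv from 0 to v: a double well with minima at +/- v0."""
    return g0 * (v ** 4 / (4 * v0 ** 2) - v ** 2 / 2)


def relax(v, dt=1e-2, steps=5000):
    """Isolated node, C dv/dt = -g_ziv(v) with C = 1: slides downhill in P(v)."""
    for _ in range(steps):
        v -= dt * g_ziv(v)
    return v


print(p_ziv(np.array([-1.0, 0.0, 1.0])))   # wells of depth -0.25 at -1 and +1, 0 at v = 0
print(relax(0.05), relax(-0.05))           # close to +1 and -1
```

The barrier at v = 0 is what keeps a node from settling at the trivial equilibrium that the purely resistive network admits.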
A more complete system design is illustrated in
Coupling unit (CU): With an array of BRIM nodes, one has the flexibility to couple them in any desired topology. As discussed quantitatively above, an all-to-all coupling is computationally more useful and is thus the better choice. The coupling in BRIM is achieved through an array of CUs each containing several programmable/variable resistors.
Although coupling resistors may be implemented in standard CMOS using a single MOSFET biased in the triode region, the linearity is expected to be unsatisfactory due to the large voltages applied (e.g., the largest anticipated voltage across the programmable resistors would span rail-to-rail). For this reason, a double-PMOS-based resistor (referred to herein as R-PMOS) is used. As illustrated at the top of
In the disclosed system, the coupling is bidirectional and, in principle, could be achieved by using a total of N(N−1)/2 CUs. However, to achieve bipolar coupling between any two nodal capacitors (both positive/parallel and negative/anti-parallel coupling) and to accommodate fully-differential operation, each CU in the N(N−1)/2 array would require four R-PMOS resistors (a pair for each coupling polarity choice), a 2×2 crossbar switch to select between parallel or anti-parallel connection as well as a memory cell and logic gates to store and control the state of the crossbar switch. A more area efficient design which alleviates the need for the crossbar switch and local memory cell is to divide each bidirectional CU of the N(N−1)/2 design into two smaller units each handling only one type of polarity. Although this approach results in a total of N(N-1) CUs (as shown in
Both the initial nodal values and the coupling resistances are programmable. To the right of the coupling unit array in
where Rc is the minimum achievable resistance of the R-PMOS. The activation of CUij vs. CUji is accomplished by the MUX units in the ith and jth rows. Once the programming of all columns is completed, the row-level DACs and MUX units are shut down to save power. Similar to DRAM cells, the R-PMOS is anticipated to exhibit gate leakage currents that, if not refreshed, will discharge the R-PMOS capacitances over time (on the order of milliseconds), changing the preset resistance values. For this reason, in some embodiments, thick-oxide PMOS devices, which exhibit orders of magnitude lower leakage than thin-oxide (low-voltage) devices, may be used to implement the R-PMOS. It is also anticipated that the programming steps will need to be repeated several times during operation to “refresh” the coupling values to the desired values.
It is apparent from Equation 6 and
As pointed out above, the BRIM system can become stuck in a local maximum determined by the initial state. To allow BRIM to escape a local maximum, some embodiments use a strategy similar to that of simulated annealing, namely deploying perturbations in the state variables. For example, performing a “spin flip” on selected nodes (i.e., changing the spin to its opposite value) moves the system to a neighboring state in the global phase space. This allows the system to escape a basin of attraction (e.g., a local maximum) and explore new regions. In simulated annealing, the probability of such bit flips is a function of both the energy difference due to the bit flip and the current temperature. For implementation convenience, some embodiments use only the temperature to decide the probability/frequency of spin flips. In some embodiments, the temperature follows an exponential annealing schedule; in other words, the frequency of spin flips decays exponentially. With this support, BRIM may be used similarly to other Ising machines: first, program the weights/coupling coefficients; then, select the annealing time; and finally, read out the state of the nodes after the annealing schedule completes. The annealing schedule for the ZIV diodes and spin perturbations may be controlled either globally for all nodes or locally for each node individually.
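The three-step usage model (program the weights, choose the annealing time, read out) can be sketched in a normalized behavioral model. Everything here is an assumption for illustration: time is in units of Rc·C, node voltages are normalized to ±1 rails, the bistable element is the hypothetical cubic g(v) = v^3 − v scaled by a `ziv_gain` parameter, and spin flips occur at a per-node rate decaying as exp(−t/τ), standing in for the temperature schedule.

```python
import numpy as np


def flip_rate(t, rate0=0.5, tau=2.0):
    """Spin-flip rate per node per unit time, decaying exponentially (temperature proxy)."""
    return rate0 * np.exp(-t / tau)


def anneal_brim(J, t_end=20.0, dt=1e-2, rate0=0.5, tau=2.0, ziv_gain=1.0,
                v_init=None, seed=0):
    """Normalized BRIM-style sketch: program J, anneal for t_end, then read out spins.

    ziv_gain should be comparable to sum_j |J_ij| so the nodes stay bistable.
    """
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    v = rng.uniform(-0.1, 0.1, n) if v_init is None else np.array(v_init, dtype=float)
    leak = np.abs(J).sum(axis=1)             # sum_j |J_ij|, from R_ij = Rc/|J_ij|
    for k in range(int(round(t_end / dt))):
        t = k * dt
        i_couple = J @ v - leak * v          # sum_j (sgn(J_ij) v_j - v_i) |J_ij|
        i_ziv = -ziv_gain * (v ** 3 - v)     # hypothetical bistable element
        v = np.clip(v + dt * (i_couple + i_ziv), -1.0, 1.0)
        flip = rng.random(n) < flip_rate(t, rate0, tau) * dt
        v[flip] = -v[flip]                   # perturbation: jump to the mirror state
    spins = np.where(v >= 0, 1, -1)
    return spins, -0.5 * spins @ J @ spins


J = np.array([[0.0, 1.0], [1.0, 0.0]])
spins, energy = anneal_brim(J, rate0=0.0)    # no perturbations: plain relaxation
print(spins, energy)                         # the two spins end up aligned, energy -1.0
```

Using only the schedule, rather than the energy difference, to gate flips is what makes the perturbation logic cheap: no node needs to evaluate its local energy change.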
There are several circuit non-idealities that might affect the coupling accuracy and dynamic range (DR), for example, inherent kT/C noise, non-linearity in the R-PMOS resistance, gate oxide leakage current, and charge injection from the CU's access switch. Disclosed below is a preliminary analysis of the DR and precision, but it should be noted that a more detailed analysis exploring the many degrees of freedom and relevant trade-offs is warranted.
Assuming a CU area of 1 μm × 1 μm in 45 nm standard CMOS and a total capacitance of 4 fF at the gates of the R-PMOS (including the gate capacitance and an additional 2 fF metal-insulator-metal (MiM) capacitor per R-PMOS), the expected kT/C noise is about 1 mVrms, resulting in errors of 0.1% and 0.38% in channel resistances of 60 kΩ and 7 MΩ, respectively. Since |Jij| ∝ 1/Rij, the DR of the coupling coefficients |Jij| may be defined as the ratio of the upper bound on |Jij|, corresponding to the smallest programmable R-PMOS resistance (minimum Rij = 60 kΩ for this example CU in 45 nm), to the lower bound on |Jij|, which is set by the error in the high-end resistance values. The relative error of the R-PMOS resistance due to kT/C noise flattens at about 0.38% around 7 MΩ, resulting in DR = 20 log10(7M/(0.0038 × 60k)) ≈ 89.7 dB, or about 15 bits of resolution. On the other hand, the accuracy limit due to kT/C noise is set by the peak relative error of about 0.38%, which translates into about 8 bits of effective resolution.
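The noise and dynamic-range arithmetic above can be reproduced directly. The sketch assumes T = 300 K (the temperature is not stated in the text) and uses the 4 fF, 60 kΩ, 7 MΩ, and 0.38% figures given above; one bit is taken as 20·log10(2) ≈ 6.02 dB.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
T = 300.0                 # assumed temperature, K
C_GATE = 4e-15            # total capacitance at the R-PMOS gates, F

v_noise = math.sqrt(K_B * T / C_GATE)          # kT/C noise, V rms
r_min, r_max, rel_err = 60e3, 7e6, 0.0038      # 45 nm example values from the text

# |J| is proportional to 1/R: the largest |J| sits at r_min, and the smallest
# resolvable |J| is set by the 0.38% error floor at r_max.
dr_db = 20 * math.log10(r_max / (rel_err * r_min))
dr_bits = dr_db / (20 * math.log10(2))
accuracy_bits = math.log2(1 / rel_err)

print(f"{v_noise * 1e3:.2f} mV rms, DR {dr_db:.1f} dB = {dr_bits:.1f} bits, "
      f"accuracy {accuracy_bits:.1f} bits")
# 1.02 mV rms, DR 89.7 dB = 14.9 bits, accuracy 8.0 bits
```

The gap between the two figures (about 15 bits of range versus about 8 bits of accuracy) reflects that range is bounded by the ratio of extremes, while accuracy is bounded by the worst-case relative error.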
Another important issue to consider that may limit the overall coupling accuracy is the nonlinearity of the R-PMOS resistors. The preliminary non-linearity modeling in 45 nm CMOS indicates that the relative error (calculated as the ratio between the RMS deviation from the best fit line and the slope of the best fit line in graph 903 in
It is also important to note that an extensive statistical analysis is needed to fully understand the extent of non-linearity effects on BRIM's solution quality. More specifically, the problems solved by Ising machines, including BRIM, are often such that the optimal solution is not guaranteed and only sub-optimal solutions are expected. Therefore, the true effect of non-linearity on solution quality can only be evaluated through a thorough analysis on a large set of graph problems.
Coupling between nodes is a key factor in building robust Ising machines because the problem to be solved is embedded directly in it. Although the spin takes only two discrete values, the physical quantity that represents the spin can experience a large swing, which may pose many issues. For example, in resistively coupled Ising machines, this may cause a large variation in the coupling resistance, so that the machine is very likely solving different problems at different times. In addition, the trade-off among the size, power, and precise functionality of the building blocks limits scalability.
In addition to reducing the maximum voltage swing across the R-PMOS (e.g., by lowering the power supply voltage), another approach to reducing non-linearity effects is to adopt an alternative BRIM design (so-called Quantized BRIM, or QuBRIM, described in more detail below) in which the voltage magnitudes across all R-PMOS are kept constant while the node-to-node coupling becomes entirely current based. However, a QuBRIM design requires additional circuitry, such as comparators/inverters and current conveyors for current summation at each node, which scales as O(N). In addition, since the coupling in QuBRIM is unidirectional (as opposed to BRIM's bidirectional coupling), each coupling requires two CUs, doubling the overall size of the CU array compared to the baseline BRIM.
The disclosed modified BRIM design (QuBRIM) demands more area and power than the baseline BRIM, but it not only improves the linearity and accuracy of the coupling coefficients, it also creates a clear path toward CMOS-based Ising machines with multi-body interactions (e.g., cubic or higher-order terms in the Ising formula). The QuBRIM design is derived from the baseline BRIM by quantizing nodal variables (e.g., capacitor voltages vi) to binary values (e.g., ±1, or rail-to-rail levels) before they are coupled to the capacitors of other nodes.
The resulting dynamical system is governed by Equation 7 below with a Lyapunov function shown in Equation 8, where Q(vi) is a quantization function equal to +1 for vi≥0 and −1, otherwise (or alternatively Q(vi)=Vdd for vi≥ Vdd/2 and 0V, otherwise).
Considered another way, the dynamics of QuBRIM can be described by the change in each nodal voltage caused by the other nodes connected to it. Without loss of generality, the following focuses on the dynamics of a single node for clarity; the analysis extends readily to the whole system [v1(t), v2(t), …, vN(t)]^T.
The dynamics of node i are expressed as
where the summation on the right side represents the currents flowing into node i from other nodes, with sgn(Jij) indicating the polarity of the coupling. Note that these currents do not depend on vi, but only on the quantized voltages of the other nodes, i.e., Q(vj), where Q(v)=sgn(v). Recalling that Rij=R/|Jij|, the above equation becomes
Given the set of differential equations above describing QuBRIM as a dynamic system, it can be shown that a Lyapunov function in the following form exists:
Because the quantity Q(vi) in Equation 11 takes only the values {−1,+1} and can directly represent the spin of node i, the Lyapunov function in Equation 11 relates directly to the Ising Hamiltonian in Equation 2; i.e., by minimizing the Lyapunov function in Equation 11, QuBRIM simultaneously seeks a minimum of the Ising Hamiltonian.
To demonstrate the optimum-seeking ability of QuBRIM, a Lyapunov stability analysis is presented below. The Lyapunov criterion states that if a function H(v) can be found such that $H(v) > H(v_e)$ for $v \neq v_e$ and $\dot{H}(v) \le 0$ in the region around $v_e$, then $v_e$ is a stable equilibrium point. In the disclosed case, the equilibrium point $v_e$ corresponds to the solution of the equation set $dv_i/dt = 0$, $i = 1, \ldots, N$.
Taking the time derivative of Equation 11, the following result is obtained after applying the chain rule:
where the last line follows directly from Equation 10. More importantly, note that the product in the brackets is non-negative because vi and Q(vi) always change in the same direction. As a result, the time derivative of H(v) is non-positive (i.e., H(v) is non-increasing), so QuBRIM indeed satisfies the Lyapunov stability criterion. In other words, once the system enters such a region, it inevitably evolves toward lower H(v) until it reaches the equilibrium point ve. Consequently, minimizing H(v) is equivalent to minimizing the Ising formula in Equation 2.
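The Lyapunov argument can be checked numerically on a toy problem. Because the bodies of Equations 10 and 11 are not reproduced here, the sketch assumes the simplified forms C dv_i/dt = (1/R) Σ_j J_ij Q(v_j) (ZIV current and the Vdd/2 scale omitted, R = C = 1) and H = −½ Σ_ij J_ij Q(v_i) Q(v_j), and it clips the node voltages to the rails. It is an illustration, not the disclosed circuit model.

```python
import numpy as np


def quantize(v):
    """Q(v) = +1 for v >= 0 and -1 otherwise."""
    return np.where(v >= 0, 1.0, -1.0)


def lyapunov(J, q):
    """Assumed Lyapunov/Ising form H = -1/2 * sum_ij J_ij Q(v_i) Q(v_j)."""
    return -0.5 * q @ J @ q


def simulate_qubrim(J, v0, rc=1.0, dt=1e-2, steps=500):
    """Forward-Euler sketch of C dv_i/dt = (1/R) sum_j J_ij Q(v_j) with C = 1."""
    v = np.array(v0, dtype=float)
    energies = [lyapunov(J, quantize(v))]
    for _ in range(steps):
        # Node i sees only the *quantized* states of its neighbors, never v_i itself.
        v = np.clip(v + dt * (J @ quantize(v)) / rc, -1.0, 1.0)  # bounded by the rails
        energies.append(lyapunov(J, quantize(v)))
    return quantize(v), np.array(energies)


J = np.array([[0.0, 1.0], [1.0, 0.0]])        # one ferromagnetic edge
q_final, energies = simulate_qubrim(J, v0=[0.5, -0.1])
print(q_final, energies[0], energies[-1])     # [1. 1.] 1.0 -1.0
```

The node that flips is the one whose voltage is driven across zero by its local field, so each crossing aligns a spin with that field and can only lower H, mirroring the sign argument in the derivation.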
However, the above analysis does not guarantee that the system will converge to a global minimum, as convergence depends on the energy landscape and the initial conditions. This is similar to other annealers: none has strong guarantees of reaching the ground state in a typical usage scenario (as opposed to under ideal conditions), or of doing so efficiently. For example, it has been shown that simulated annealing can reach the ground state in a system with a finite phase space; however, the time required to do so may be longer than that needed to enumerate the space.
As contemplated herein, in some embodiments, the quantization of a QuBRIM node can be performed by simple logic gates (e.g., an inverter 1201 as shown in
Quantizing nodal variables vi to binary values before they are connected to the CU array and other nodes allows straightforward multiplication of the quantized variables Q(vi) using, for example, XNOR logic gates, thereby achieving multi-body interactions. For example, assuming a 4-spin objective function with cubic terms to be minimized, H=−J123σ1σ2σ3−J124σ1σ2σ4−J134σ1σ3σ4−J234σ2σ3σ4, the 4-node QuBRIM that minimizes H is constructed from the set of differential equations shown in Equation 13 below, where
XNOR(a,b) is a 2-input exclusive-NOR logic gate, and
VCM is a common-mode voltage typically equal to half of the power supply voltage. The left-hand side of Equation 13 represents a current into the nodal capacitance of node Ni, which is equal to the sum of all currents from coupling resistors Rijk and the ZIV diode's current.
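The XNOR approach rests on a simple identity: if logic 1 (Vdd) encodes spin +1 and logic 0 (0 V) encodes −1, then the XNOR of two bits encodes the product of the two spins. The sketch below checks that identity and computes, for the 4-spin cubic H above, the drive into node i as Σ J_ijk σ_j σ_k (i.e., −∂H/∂σ_i) using only XNOR outputs. The function names and the dictionary encoding of J_ijk are illustrative assumptions, and the analog current summation of Equation 13 is reduced to an integer sum.

```python
import itertools


def xnor(a, b):
    """2-input exclusive-NOR on logic levels 0/1."""
    return int(a == b)


def spin(bit):
    """Logic level to spin: 1 (Vdd) -> +1, 0 (0 V) -> -1."""
    return 1 if bit else -1


def cubic_energy(J3, bits):
    """H = -sum over triples of J_ijk * s_i * s_j * s_k."""
    s = [spin(b) for b in bits]
    return -sum(w * s[i] * s[j] * s[k] for (i, j, k), w in J3.items())


def node_drive(J3, bits, i):
    """Drive into node i: sum of J_ijk * (s_j * s_k), each product taken from an XNOR."""
    total = 0
    for triple, w in J3.items():
        if i in triple:
            j, k = (m for m in triple if m != i)
            total += w * spin(xnor(bits[j], bits[k]))
    return total


# XNOR of logic levels reproduces the spin product for all four input combinations.
assert all(spin(xnor(a, b)) == spin(a) * spin(b) for a in (0, 1) for b in (0, 1))

J3 = {(0, 1, 2): 1, (0, 1, 3): 1, (0, 2, 3): 1, (1, 2, 3): 1}   # J123, J124, J134, J234
best = min(itertools.product((0, 1), repeat=4), key=lambda b: cubic_energy(J3, b))
print(best, cubic_energy(J3, best))   # (1, 1, 1, 1) -4
```

Because node i's drive uses only the other nodes' bits, each cubic term needs one XNOR per pair of partner nodes, and a single CU turns that logic level into a weighted current.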
This approach can be extended to construct a QuBRIM that minimizes an objective function containing Ising terms not only of arbitrary order (higher than two) but also of mixed order. It should also be understood that an all-to-all QuBRIM with, for example, cubic terms would require O(N^3) hardware resources in terms of CUs and O(N^2) in terms of XNOR gates. Several area-efficient designs exist for XNOR gates (e.g., the 3-transistor XNOR shown in S, et al., 2016 3rd International Conference on Advanced Computing and Communication Systems (ICACCS), 2016), indicating that the area penalty would be largely due to the O(N^3) CU array.
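The scaling claim can be made concrete with a small counting sketch. It assumes one CU per (node i, unordered pair {j, k} of the other nodes), matching the three cubic terms per node in the 4-spin example, and one XNOR per unordered pair (j, k) whose output is shared by every node that needs it; both are assumptions about one possible layout, not a stated design.

```python
from math import comb


def cubic_qubrim_resources(n):
    """Hypothetical CU and XNOR counts for an all-to-all QuBRIM with cubic terms."""
    coupling_units = n * comb(n - 1, 2)   # node i couples to each pair {j, k}: O(N^3)
    xnor_gates = comb(n, 2)               # one shared XNOR per pair (j, k): O(N^2)
    return coupling_units, xnor_gates


for n in (4, 32, 1000):
    print(n, cubic_qubrim_resources(n))
```

At 1000 nodes the CU count is roughly a thousand times the XNOR count, consistent with the conclusion that the CU array dominates the area.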
An exemplary QuBRIM system is illustrated in
The second subsystem is the programming units. Both the initial nodal values and the coupling resistances are programmable. To the right of the coupling unit array is the programming array, which consists of digital memory 1309 storing the weights that drive an array of digital-to-analog converters (DACs) 1307. A small number of such DACs 1307 is sufficient to program all the coupling units in a time-interleaved fashion. In the figure, N DACs are shown programming the N×(N−1) coupling units. In such a configuration, corresponding column selectors and pulldown logic 1303, 1306 are needed, which are shown above and below the coupling units.
The third subsystem is the perturbation units. As the complexity of graphs grows and the number of local minima in the Ising formulation increases, a machine that follows a gradient descent is more likely to end up at a local energy minimum. One possible way to escape local minima is by flipping the polarity of some of the spins. However, this scheme requires some monitoring blocks to detect and change the polarity of each node, which dramatically increases the hardware complexity. The disclosed device instead randomly chooses a node and briefly clips it to one of the supply rails. The nodal spin then either maintains its current state or swaps to the opposite state in a certain time interval. This scheme is referred to as “spin-fix annealing.” The control signals are sent by the Annealer Control unit 1304 in
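Spin-fix annealing can be sketched on the same kind of normalized behavioral model. The assumptions are illustrative: voltages normalized to ±1 rails, C dv_i/dt = Σ_j J_ij Q(v_j) with R = C = 1, one node at a time clipped to a randomly chosen rail for a fixed hold time, and a spin-fix probability that decreases linearly, as in the experiments described later; the parameter names are hypothetical. Unlike a spin flip, the clipped node may land on its current rail, in which case its spin is unchanged.

```python
import numpy as np


def spin_fix_anneal(J, steps=4000, dt=1e-2, p_start=0.02, hold=20, seed=0):
    """Behavioral QuBRIM sketch with spin-fix annealing; returns (spins, Ising energy)."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    v = rng.uniform(-1.0, 1.0, n)
    node, rail, hold_left = 0, 1.0, 0
    for k in range(steps):
        # Probability of starting a new spin-fix event falls linearly to 0 at 80% of the run.
        p = p_start * max(0.0, 1.0 - k / (0.8 * steps))
        if hold_left == 0 and rng.random() < p:
            node, rail, hold_left = rng.integers(n), rng.choice([-1.0, 1.0]), hold
        q = np.where(v >= 0, 1.0, -1.0)
        v = np.clip(v + dt * (J @ q), -1.0, 1.0)
        if hold_left > 0:
            v[node] = rail          # node clipped to a supply rail while the fix is held
            hold_left -= 1
    s = np.where(v >= 0, 1, -1)
    return s, -0.5 * s @ J @ s


rng = np.random.default_rng(1)
upper = np.triu(rng.choice([-1.0, 1.0], size=(12, 12)), 1)
J = upper + upper.T                 # symmetric +/-1 couplings, zero diagonal
spins, energy = spin_fix_anneal(J)
print(spins, energy)
```

Clipping to a rail needs only a switch and no knowledge of the node's current polarity, which is the hardware saving over a monitored spin flip.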
The following is a discussion of one embodiment of a detailed circuit implementation of the disclosed design, focusing mainly on the analog core, i.e., a QuBRIM node and its couplers.
Although embodiments may refer to one specific implementation, for example the implementation shown in
With reference to
The current conveyor is a class AB current conveyor (see Wilson, International Journal of Electronics, 1992), denoted 1401 and formed by M1-M8 in
As the name of the disclosed design suggests, the voltage signal on the capacitor of one node is quantized before being sent to the neighboring nodes. Because the system operates in continuous time, the quantizer 1402 can be constructed simply from two inverters connected in series. To increase the driving ability and decrease delay, the fan-out design metric is adopted. As shown in
Two pairs of switches (1403, 1404) are connected to the two capacitors in one QuBRIM node to configure the initial spins of the system before the machine's states are allowed to evolve toward an energy minimum. These switches are also used to apply spin-fix perturbations. SF1+ and SF1− are non-overlapping signals used to control S1 and S2, respectively, whereas, due to differential operation, the same signals control S2 and S1, respectively, in the other half circuit of the QuBRIM node.
A detail view of an exemplary coupling unit is shown in
For a fully-connected network, the number of coupling units grows quadratically with the number of nodes, and the coupling units are expected to occupy the largest portion of the chip area. Therefore, implementing coupling elements as on-chip physical resistors, which tend to be large in area, is impractical. Instead, two back-to-back connected p-type transistors are used in the disclosed embodiment to form a pseudo-resistor that shares a similar I-V characteristic as a physical resistor, as shown on the right in
One drawback of this implementation is that it provides only moderate linearity. As the terminal voltage exceeds a certain range, its resistance varies considerably, which may affect the machine's coupling precision and overall solution quality. However, when used in QuBRIM, the pseudo-resistor becomes inherently linear because the magnitude of its terminal voltage remains constant at Vdd/2 at all times. As can be seen in the schematics, one terminal of the pseudo-resistor is always fixed at VCM=Vdd/2, while the other terminal is connected to the quantizer's output, which can only take the values Vdd or 0 V, resulting in a voltage of ±Vdd/2 across the pseudo-resistor.
The invention is further described in detail by reference to the following experimental examples. These examples are provided for purposes of illustration only, and are not intended to be limiting unless otherwise specified. Thus, the invention should in no way be construed as being limited to the following examples, but rather, should be construed to encompass any and all variations which become evident as a result of the teaching provided herein.
Without further description, it is believed that one of ordinary skill in the art can, using the preceding description and the following illustrative examples, make and utilize the system and method of the present invention. The following working examples therefore specifically point out the exemplary embodiments of the present invention and are not to be construed as limiting in any way the remainder of the disclosure.
Simulations were run using the disclosed design to evaluate its performance, at both the circuit level and the behavioral level. Because of the straightforward connection to the Ising model, the Max-Cut problem was adopted as the benchmark. First the design was validated on the circuit level on small-size graph problems, then performance was compared between the disclosed design and other existing Ising machines on large graphs.
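The Max-Cut connection follows from a short identity. As a sketch, assume the Ising formula of Equation 2 has the form H = −Σ_{i<j} J_ij σ_i σ_j, let w_ij be the edge weights, and let W be their total; an edge contributes to the cut exactly when σ_i ≠ σ_j:

```latex
\mathrm{cut}(\sigma) = \sum_{i<j} w_{ij}\,\frac{1-\sigma_i\sigma_j}{2}
                     = \frac{W - \sum_{i<j} w_{ij}\,\sigma_i\sigma_j}{2},
\qquad W = \sum_{i<j} w_{ij},
\\[4pt]
J_{ij} = -w_{ij} \;\Rightarrow\; H(\sigma) = \sum_{i<j} w_{ij}\,\sigma_i\sigma_j
\;\Rightarrow\; \mathrm{cut}(\sigma) = \frac{W - H(\sigma)}{2}.
```

Since W is a constant, minimizing the Ising energy with J_ij = −w_ij is the same as maximizing the cut, which is why Max-Cut maps onto the machine without any auxiliary spins.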
A 32-node QuBRIM was designed in a 45 nm generic process design kit (GPDK) and simulated in the Virtuoso Analog Design Environment (ADE), both provided by Cadence. The test bench contained 20 fully-connected graphs with binary weights, whose best Max-Cuts are known. First, spin-fix annealing was disabled and each graph was simulated 100 times with different initial spin configurations. In the next simulations, spin-fix annealing was enabled. For a fair comparison, the same initial spin configurations were used, but different spin-fix sequences were applied in each run. The probability of a spin-fix event was set to decrease linearly over time, as discussed above.
Results indicate that the disclosed machine always settles in a local minimum, whether spin-fix annealing is on or off, which is expected. The distances to the best solution were then used as the figure of merit and were plotted over time in
For those that are not the best solutions, the average distance reduces from 8.09 to 3.75 after applying spin-fix annealing.
The transient behavior of the system was also explored. A graph was randomly chosen from the above test bench and the change of the Ising energy as time advances was captured, as shown in the top graph of
On the other hand, when spin-fix annealing is enabled, the machine explores many more attraction regions of the energy landscape. Once it reaches the vicinity of the global minimum, a spin-fix of a single node or a few nodes only temporarily increases the energy level; the fixed nodes are quickly pulled back to the global minimum by the rest of the nodes. This behavior can also be observed in the voltage waveforms near the end of the annealing time.
Comparison to other Ising Machines
It is also worth comparing the disclosed design with other existing Ising machines. The following machines were evaluated:
1) D-Wave: The 2000Q, which is the latest quantum annealer, was used. Jobs were launched using the API provided by D-Wave. For each graph problem, 50 samples were collected. No timing constraints were specified, and the D-Wave default values were used, specifically: 20 μs, 198 μs, 21 μs, and 11.7 ms for annealing, data readout, inter-sample delay, and qubit programming, respectively.
2) OIM: an electronic Oscillator-based Ising Machine. For the simulation, the Kuramoto model was used, based on code provided in Wang, et al., 2019.
3) ASA: refers to a number of related designs of Accelerated Simulated Annealers. These accelerators use virtual spins and are straightforward to model based on the descriptions in the literature. The version used in this example had 30,000 nominal spins. In this design, the coupling followed a near-neighbor pattern dubbed the King's graph. All nodes were divided into 4 groups. In every annealing step (0.22 μs), the nodes in one of the groups were processed in parallel: each read its neighbors' spins and the associated weights to determine whether keeping or inverting its current spin yields lower energy in the neighborhood. In addition, random bit flips similar to those in standard simulated annealing algorithms were also adopted.
Because most of the compared designs are based on behavioral models, a behavioral model was also created for the disclosed design to ensure a fair comparison. Fully-connected tiny graphs were generated with binary edge weights and node counts ranging from 16 to 32 (in increments of 4). Each node count had 20 sample graphs, for a total of 100 graphs. For these graphs, all possible spin combinations were enumerated to determine the actual maximum cut.
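A minimal exhaustive Max-Cut solver for small graphs, in the spirit of the enumeration described above. It fixes the first spin to +1 (a cut and its mirror image have the same value) and evaluates all remaining 2^(N−1) assignments at once with NumPy. The binary {0, 1} weights, the random graph generator, and the function name are assumptions; for 32-node graphs the 2^31 assignments would call for a compiled or chunked implementation rather than this in-memory sketch.

```python
import numpy as np


def max_cut_bruteforce(W):
    """Exact Max-Cut of a small graph (symmetric weights W, zero diagonal) by enumeration."""
    n = W.shape[0]
    # Fix spin 0 to +1: a cut and its mirror image have the same value.
    codes = np.arange(2 ** (n - 1))
    bits = (codes[:, None] >> np.arange(n - 1)) & 1
    spins = np.hstack([np.ones((codes.size, 1)), 2.0 * bits - 1.0])
    total = W.sum() / 2                                   # sum of w_ij over i < j
    ising = np.sum((spins @ W) * spins, axis=1) / 2       # sum_{i<j} w_ij s_i s_j
    cuts = (total - ising) / 2
    best = int(np.argmax(cuts))
    return cuts[best], spins[best].astype(int)


rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 2, size=(16, 16)), 1)   # binary {0, 1} weights (assumed)
W = (upper + upper.T).astype(float)
cut, assignment = max_cut_bruteforce(W)
print(cut, assignment)
```

The cut is computed through the Ising energy rather than by counting crossing edges, so the same code also yields the ground-state energy used to score an Ising machine's distance from optimal.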
Finally, the machine was simulated with even larger graphs to test its scalability. Again, the behavioral model was adopted because of the non-trivial amount of time required to simulate large graphs at the circuit level. The original Gset graphs from Ye (Stanford), https://web.stanford.edu/~yyye/yyye/Gset, were used. These graphs have between 800 and 20,000 nodes. The edges, as well as their weights, were generated probabilistically, with weights sometimes drawn from {+1, −1} and sometimes all +1. QuBRIM was compared with an ASA given unlimited nodes to map the problem, as well as with the classic Simulated Annealing (SA) algorithm. All tests were run on server cluster nodes with Intel Xeon Platinum 8268 CPUs at 2.9 GHz and 371 GB of RAM. Table 2 below summarizes the results for the Gset benchmarks with no more than 1000 nodes. For QuBRIM, the distances ranged from 0 to 16, with a mean of 5.45 and a median of 4. In contrast, the distance for ASA (1 ms) ranged from 0 to 42, with a mean of 15.1 and a median of 14. The distance for SA (2 s) ranged from 0 to 2, with a mean of 0.3 and a median of 0. SA obtained slightly better results, but it takes six orders of magnitude longer than QuBRIM. Increasing the annealing time for QuBRIM may improve quality, but it becomes extremely expensive to simulate.
A bistable, resistively coupled Ising machine with quantized nodal interaction is disclosed herein. The simulation results demonstrate its capability to solve challenging computational problems such as Max-Cut. Because it uses only simple building blocks, the disclosed topology is more scalable and less expensive than other desk-size (or even room-size) Ising machines. With the help of spin-fix annealing, the performance of the disclosed machine is further improved: not only does the probability of reaching the Max-Cut increase by a factor of 2.6, but the solution quality is also significantly better when the global optimum is not reached.
The disclosures of each and every patent, patent application, and publication cited herein are hereby incorporated herein by reference in their entirety. While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims are intended to be construed to include all such embodiments and equivalent variations.
The following publications are incorporated herein by reference in their entirety.
A. Sharma, R. Afoakwa, Z. Ignjatovic, and M. Huang, “Increasing Ising machine capacity with multi-chip architectures,” in Proceedings of the 49th Annual International Symposium on Computer Architecture, ser. ISCA '22. New York, NY, USA: Association for Computing Machinery, 2022, p. 508-521. [Online]. Available: https://doi.org/10.1145/3470496.3527414
Afoakwa, R., Zhang, Y., Vengalam, U., Ignjatovic, Z. & Huang, M. BRIM: Bistable resistively coupled Ising machines. In International Symposium on High-Performance Computer Architecture (2021).
B. Erbagci, N. E. C. Akkaya, C. Teegarden, and K. Mai, “A 275 Gbps AES encryption accelerator using ROM-based S-boxes in 65 nm,” in 2015 IEEE Custom Integrated Circuits Conference (CICC), 2015, pp. 1-4.
B. Wilson, “Tutorial review trends in current conveyor and current-mode amplifier design,” International Journal of Electronics, vol. 73, no. 3, pp. 573-583, 1992. [Online]. Available: https://doi.org/10.1080/00207219208925692
F. Böhm, G. Verschaffelt, and G. Van der Sande, “A poor man's coherent Ising machine based on opto-electronic feedback systems for solving optimization problems,” Nature Communications, vol. 10, p. 3538, 2019. [Online]. Available: https://doi.org/10.1038/s41467-019-11484-3
C. Johnson, D. H. Allen, J. Brown, S. Vanderwiel, R. Hoover, H. Achilles, C.-Y. Cher, G. A. May, H. Franke, J. Xenedis et al., “A wire-speed POWER™ processor: 2.3 GHz 45 nm SOI with 16 cores and 64 threads,” in 2010 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2010, pp. 104-105.
C. Roques-Carmes, Y. Shen, C. Zanoci, M. Prabhu, F. Atieh, L. Jing, T. Dubcek, C. Mao, M. R. Johnson, V. Ceperic, J. D. Joannopoulos, D. Englund, and M. Soljacic, “Heuristic recurrent algorithms for photonic Ising machines,” Nature Communications, vol. 11, no. 1, p. 249, 2020.
F. Ma, J.-K. Hao, and Y. Wang, “An effective iterated tabu search for the maximum bisection problem,” Computers & Operations Research, vol. 81, pp. 78-89, 2017.
H. Kaul, M. A. Anders, S. K. Mathew, G. Chen, S. K. Satpathy, S. K. Hsu, A. Agarwal, and R. K. Krishnamurthy, “14.4 A 21.5 M-query-vectors/s 3.37 nJ/vector reconfigurable k-nearest-neighbor accelerator with adaptive precision in 14 nm tri-gate CMOS,” in 2016 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2016, pp. 260-261.
T. Honjo et al., “100,000-spin coherent Ising machine,” Science Advances, vol. 7, p. eabh0952, 2021.
T. Inagaki et al., “A coherent Ising machine for 2000-node optimization problems,” Science, vol. 354, pp. 603-606, 2016.
S. Isakov, I. Zintchenko, T. Rønnow, and M. Troyer, “Optimised simulated annealing for Ising spin glasses,” Computer Physics Communications, vol. 192, pp. 265-271, 2015. [Online]. Available: http://dx.doi.org/10.1016/j.cpc.2015.02.015
J. Cai, W. G. Macready, and A. Roy, “A practical heuristic for finding graph minors,” ArXiv, vol. abs/1406.2741, 2014.
K. Takata, A. Marandi, R. Hamerly, Y. Haribara, D. Maruo, S. Tamate, H. Sakaguchi, S. Utsunomiya, and Y. Yamamoto, “A 16-bit coherent Ising machine for one-dimensional ring and cubic graph problems,” Scientific Reports, vol. 6, no. 1, p. 34089, 2016.
A. Lucas, “Ising formulations of many NP problems,” Frontiers in Physics, vol. 2, p. 5, 2014.
M. Vidyasagar, Nonlinear System Analysis. SIAM, 1978.
M. Yamaoka, C. Yoshimura, M. Hayashi, T. Okuyama, H. Aoki, and H. Mizuno, “24.3 20k-spin Ising chip for combinational optimization problem with CMOS annealing,” in 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, February 2015, pp. 1-3.
N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon, “In-Datacenter Performance Analysis of a Tensor Processing Unit,” SIGARCH Comput. Archit. News, vol. 45, no. 2, pp. 1-12, June 2017.
P. I. Bunyk, E. M. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare, A. J. Berkley, R. Harris, J. P. Hilton, T. Lanting, A. J. Przybysz, and J. Whittaker, “Architectural Considerations in the Design of a Superconducting Quantum Annealing Processor,” IEEE Transactions on Applied Superconductivity, vol. 24, no. 4, pp. 1-10, 2014.
P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, K. Aihara, R. L. Byer, M. M. Fejer, H. Mabuchi, and Y. Yamamoto, “A fully programmable 100-spin coherent Ising machine with all-to-all connections,” Science, vol. 354, no. 6312, pp. 614-617, 2016.
Q. Wu and J.-K. Hao, “Memetic search for the max-bisection problem,” Computers & Operations Research, vol. 40, no. 1, pp. 166-179, 2013.
R. Hamerly, T. Inagaki, P. L. McMahon, D. Venturelli, A. Marandi, T. Onodera, E. Ng, C. Langrock, K. Inaba et al., “Scaling advantages of all-to-all connectivity in physical annealers: the coherent Ising machine vs. D-Wave 2000Q,” arXiv:1807.00089, 2018.
R. Hamerly, T. Inagaki, P. L. McMahon, D. Venturelli, A. Marandi, T. Onodera, E. Ng, C. Langrock, K. Inaba, T. Honjo et al., “Experimental investigation of performance differences between coherent Ising machines and a quantum annealer,” Science Advances, vol. 5, no. 5, p. eaau0823, 2019.
R. Harris, M. W. Johnson, T. Lanting, A. J. Berkley, J. Johansson, P. Bunyk, E. Tolkacheva, E. Ladizinsky, N. Ladizinsky, T. Oh, F. Cioata, I. Perminov, P. Spear, C. Enderud, C. Rich, S. Uchaikin, M. C. Thom, E. M. Chapple, J. Wang, B. Wilson, M. H. S. Amin, N. Dickson, K. Karimi, B. Macready, C. J. S. Truncik, and G. Rose, “Experimental investigation of an eight-qubit unit cell in a superconducting optimization processor,” Phys. Rev. B, vol. 82, p. 024511, July 2010.
S, J., M. B. P., and MP, S., “An ultra-low area and full-swing output 3T XNOR gate using 45 nm technology,” in 2016 3rd International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 01, 2016, pp. 1-4.
S. Dutta, A. Khanna, A. S. Assoa, H. Paik, D. G. Schlom, Z. Toroczkai, A. Raychowdhury, and S. Datta, “An Ising Hamiltonian solver based on coupled stochastic phase-transition nano-oscillators,” Nature Electronics, vol. 4, no. 7, pp. 502-512, 2021.
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671-680, 1983.
S. Song, W. Tang, T. Chen, and Z. Zhang, “LEIA: A 2.05 mm2 140 mW lattice encryption instruction accelerator in 40 nm CMOS,” in 2018 IEEE Custom Integrated Circuits Conference (CICC), 2018, pp. 1-4.
T. G. J. Myklebust, “Solving maximum cut problems by simulated annealing,” 2015.
T. Wang and J. Roychowdhury, “OIM: Oscillator-Based Ising Machines for Solving Combinatorial Optimisation Problems,” 2019.
A. Tajalli, Y. Leblebici, and E. Brauer, “Implementing ultra-high-value floating tunable CMOS resistors,” Electronics Letters, vol. 44, 2008.
T. Takemoto, M. Hayashi, C. Yoshimura, and M. Yamaoka, “2.6 A 2×30k-spin multichip scalable annealing processor based on a processing-in-memory approach for solving large-scale combinatorial optimization problems,” in IEEE International Solid-State Circuits Conference, 2019.
K. Tatsumura, A. R. Dixon, and H. Goto, “FPGA-based simulated bifurcation machine,” in 2019 29th International Conference on Field Programmable Logic and Applications (FPL), 2019, pp. 59-66.
Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam, “DaDianNao: A Machine-Learning Supercomputer,” in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, December 2014, pp. 609-622.
Y. Takeda, S. Tamate, Y. Yamamoto, H. Takesue, T. Inagaki, and S. Utsunomiya, “Boltzmann sampling for an XY model using a nondegenerate optical parametric oscillator network,” Quantum Science and Technology, vol. 3, no. 1, p. 014004, November 2017.
Y. Yamamoto, K. Aihara, T. Leleu, K.-i. Kawarabayashi, S. Kako, M. Fejer, K. Inoue, and H. Takesue, “Coherent Ising machines-optical neural networks operating at the quantum limit,” npj Quantum Information, vol. 3, no. 1, p. 49, 2017.
Y. Ye, “G-set benchmark.” [Online]. Available: https://web.stanford.edu/~yyye/yyye/Gset
Z. Varty, “Simulated annealing overview,” 2017. [Online]. Available: https://www.lancaster.ac.uk/pg/varty/RTOne.pdf
This application is a U.S. national phase application filed under 35 U.S.C. § 371 claiming benefit to International Patent Application No. PCT/US2023/060567, filed Jan. 12, 2023, which claims priority to U.S. Provisional Application No. 63/298,859 filed on Jan. 12, 2022, incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2023/060567 | 1/12/2023 | WO |

Number | Date | Country
---|---|---
63298859 | Jan 2022 | US