Many problems in logistics, financial portfolio management, drug discovery, and other application domains require finding an assignment of values to their inputs (typically called variables) with the goal of optimizing an objective. For instance, such problems include “combinatorial optimization problems”. Unlike other areas of optimization, combinatorial optimization relates to problems where variables take values from a finite set. For example, valid assignments could be binary (e.g. whether to make an investment or not), from a limited set (e.g. one of three available routes to pick), or, in general from a finite subset of the integers. In such problems, there is a finite set of ways for combining the values of each variable. In principle, it is possible to enumerate all possible combinations and find the optimal assignment. In practice, however, such exhaustive search is infeasible for problems of even moderate sizes, as the set of combinations is extremely large (exponential in the number of variables).
There has been extensive work towards understanding the structure of such problems. A subset of combinatorial optimization problems belongs to the category of problems known as NP-complete (where NP stands for nondeterministic polynomial time). NP-completeness is a concept from computational complexity theory: any NP-complete problem can be transformed into any other NP-complete problem in polynomial time. An efficient solver for any one NP-complete problem therefore implies that every NP-complete problem can be solved efficiently. All NP-complete problems also belong to a larger class of problems known as ‘NP-hard’; any NP-complete problem can be transformed into any NP-hard problem, so an NP-hard problem is at least as difficult as every NP-complete problem.
The term “efficient” in this setting means finding a solution to the problem without enumerating all possibilities. Specifically, an efficient solution to a graph optimization problem described herein is one for which the time taken to find the solution scales polynomially with the number of variables of the problem (such as graph vertices), whereas the time taken to enumerate all possible solutions scales exponentially with the number of variables. However, it is widely believed that no such efficient solver exists for NP-hard problems. Instead, work in this area has focused on devising algorithms that find solutions that are “good enough”; often there are no assurances that such approximation algorithms will indeed provide an answer which is close enough to the exact solution.
A variety of combinatorial optimization problems exist, and, as described above, NP-complete problems may be transformed to other NP-complete problems. For example, the traveling salesman problem is defined as follows: for a given set of cities and pairwise distances between cities, the problem is to find a path via all the cities, wherein each city is visited exactly once, such that the path has the shortest total length.
A general form of combinatorial optimization problem is the quadratic unconstrained binary optimization (QUBO) problem, which is defined by a set of binary variables V={v_1, v_2, …, v_N}, each taking a value of either 0 or 1, and a formula Σ_i Σ_j Q_ij·v_i·v_j to be minimised, where the coefficient Q_ij defines the interaction between variables v_i and v_j. The travelling salesman problem can be formulated as a QUBO problem by defining a variable for each combination of a city and a position in the path (for example, the first variable may indicate whether or not London is the first city visited), and with the distances between cities encoded in the matrix Q, such that the total distance is minimised subject to the constraint that all cities are visited exactly once. QUBO is a type of polynomial unconstrained binary optimization (PUBO) problem, which assigns values to a set of binary variables V={v_1, v_2, …, v_N} so as to minimise a formula Σ_{V′⊆V} Q_{V′}·Π_{v∈V′} v, where in this case the coefficients Q may encode interactions between any number of variables. As mentioned above, it is possible to transform between different formulations of such problems. In particular, it is possible to transform PUBO problems to QUBO by introducing auxiliary variables and terms in the formula.
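As a minimal sketch of the QUBO definition above, the following Python evaluates Σ_i Σ_j Q_ij·v_i·v_j and finds its minimum by exhaustive enumeration. The matrix `Q` is a made-up three-variable instance and the function names are illustrative only. Enumeration of this kind is workable only for very small N, which is the scaling limitation described earlier.

```python
from itertools import product


def qubo_energy(Q, v):
    """Return sum_i sum_j Q[i][j] * v[i] * v[j] for a 0/1 assignment v."""
    n = len(v)
    return sum(Q[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Enumerate all 2**N assignments and return the one with lowest energy."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda v: qubo_energy(Q, v))
    return best, qubo_energy(Q, best)


# Hypothetical instance: diagonal entries reward setting a variable to 1,
# off-diagonal entries penalise setting neighbouring variables together.
Q = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]

best_v, best_e = brute_force_qubo(Q)
print(best_v, best_e)
```

Because each v_i is 0 or 1, v_i·v_i = v_i, so the diagonal entries of Q act as linear terms.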
A formulation called the Ising model, used in physics to model ferromagnetism and other physical processes, is equivalent to the QUBO problem defined above. The Ising model is described in terms of a physical system with variables that can exist in two discrete states, where these variables can interact with each other, and the total energy of the system is given by H(σ) = −Σ_{i,j} J_ij σ_i σ_j − μ Σ_i h_i σ_i. In the Ising formulation, the binary variables (sometimes referred to as ‘spins’) are typically assigned to one of +1/−1, rather than 1/0 or any other binary assignment. However, the Ising formulation can easily be mapped to the QUBO formulation for Boolean variables by applying the formula σ_i = 2v_i − 1.
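The substitution σ_i = 2v_i − 1 can be carried through explicitly. The sketch below is one way to do this, assuming μ = 1 and keeping the diagonal of Q inside J (σ_i·σ_i = 1 makes those terms constant). The helper name `qubo_to_ising` and the example matrix are hypothetical. The check at the end confirms that the QUBO value and the Ising energy differ only by a constant offset, so both have the same minimiser.

```python
from itertools import product


def qubo_energy(Q, v):
    n = len(v)
    return sum(Q[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def ising_energy(J, h, s):
    """H(s) = -sum_ij J_ij s_i s_j - sum_i h_i s_i, with mu taken as 1."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    field = sum(h[i] * s[i] for i in range(n))
    return -pair - field


def qubo_to_ising(Q):
    """Substitute v_i = (s_i + 1) / 2 into sum_ij Q_ij v_i v_j."""
    n = len(Q)
    J = [[-Q[i][j] / 4 for j in range(n)] for i in range(n)]
    h = [-sum(Q[i][j] + Q[j][i] for j in range(n)) / 4 for i in range(n)]
    offset = sum(Q[i][j] for i in range(n) for j in range(n)) / 4
    return J, h, offset


# Hypothetical QUBO instance used only to exercise the mapping.
Q = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]
J, h, offset = qubo_to_ising(Q)

# Largest disagreement between the two formulations over all assignments.
max_gap = max(
    abs(qubo_energy(Q, v) - (ising_energy(J, h, [2 * b - 1 for b in v]) + offset))
    for v in product((0, 1), repeat=len(Q))
)
print(max_gap)
```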
Note that the notation used above is slightly different between the QUBO and the Ising formulation, with the variables to be assigned represented by σ and the interaction coefficients represented by J for the Ising model. For simplicity, in this application the notation σ will be used for variables to which values are assigned and J will be used to denote arrays of interaction coefficients in the context of an Ising solver. However, Q and v may also be used herein in general to denote a matrix of weights and a variable, respectively.
The second term −μ Σ_i h_i σ_i in the above expression for the total energy represents the effect of an external ‘field’ or some external effect on the system being modelled. For example, in a ferromagnetic material, the first term represents the energy contribution for interactions between magnetic dipoles, while the second term represents the energy of the system due to an external magnetic field. Many problems are modelled as Ising problems without external fields, as it is much simpler to solve the Ising problem in the case of no external field. However, it is possible to convert a problem with an external field to a problem without an external field by introducing an extra spin and additional edges with weights chosen carefully. Any problems or models referred to as Ising problems in the description below assume no external field, or a problem already converted to one with no external field.
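One common way to perform this conversion, sketched here under the assumption that μ is absorbed into h, is to add a spin σ_0 coupled to each original spin σ_i with weight h_i. When σ_0 = +1 the augmented energy equals the original energy of σ. When σ_0 = −1 it equals the original energy of −σ. The original solution can therefore be read out as σ_0·σ. The function names and the two-spin instance are illustrative only.

```python
from itertools import product


def energy_with_field(J, h, s):
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -pair - sum(h[i] * s[i] for i in range(n))


def energy_no_field(J, s):
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def absorb_field(J, h):
    """Return an (N+1)x(N+1) coupling matrix whose spin 0 carries the field."""
    n = len(h)
    J_aug = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        J_aug[0][i + 1] = h[i]  # edge between the extra spin and spin i
        for j in range(n):
            J_aug[i + 1][j + 1] = J[i][j]
    return J_aug


# Hypothetical two-spin antiferromagnetic instance with a small field.
J = [[0.0, -1.0], [-1.0, 0.0]]
h = [0.5, 0.25]
J_aug = absorb_field(J, h)

states = list(product((-1, 1), repeat=len(h)))
best_original = min(states, key=lambda s: energy_with_field(J, h, s))

aug_states = list(product((-1, 1), repeat=len(h) + 1))
best_aug = min(aug_states, key=lambda s: energy_no_field(J_aug, s))

# Multiply by the extra spin to undo a possible global flip.
recovered = tuple(best_aug[0] * s for s in best_aug[1:])
print(best_original, recovered)
```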
Until recently, algorithms for combinatorial optimization have typically been implemented in digital hardware, such as commodity CPUs, FPGAs, GPUs, and ASICs. Digital hardware has great advantages with respect to flexibility (i.e. the ability to program different algorithms), and reliability. However, digital solutions are also limited by the speed of execution and power consumption. In the past, improved computational power and reduced power consumption could be achieved with each generation of digital hardware. It is widely predicted that improving the performance of digital hardware will become increasingly difficult as fundamental physical limits are approached. Searching for better answers to combinatorial optimization problems, or tackling larger instances of them, will come at a greater hardware cost.
However, more recently, there have been attempts to solve such problems using hardware based on non-digital physical processes. A popular realization of the Ising model with a physical process uses quantum annealers. In existing systems, the problem variables are represented by quantum bits (qubits) taking values +1 and −1, usually referred to as “spins”. However, the topology of these systems does not allow full connectivity. Instead, the qubits interconnect in an architecture comprising sets of connected unit cells, each with four horizontal qubits connected to four vertical qubits via couplers. Unit cells are tiled vertically and horizontally with adjacent qubits connected, creating a lattice of sparsely connected qubits. The limited connectivity of this architecture has undesirable implications, resulting in inefficient mappings of the problem variables onto the spins, i.e. the number of qubits required for the physical system to represent the problem is much higher than the number of original variables.
Due to this inherent physical limitation of quantum annealer hardware, algorithms have been developed which can run on classical hardware and are inspired by the physical properties of quantum systems. For example, Microsoft Azure has developed Quantum Inspired Optimization (QIO) algorithms, which have shown good promise for approximately solving PUBO problems.
In an optical solver, light signals are used to represent the input variables (e.g. σ_i for i=1, …, N in the Ising problem), and an optical element is used to combine the signals in a way that models the interaction between the variables (e.g. the matrix J in the Ising problem). Optical elements that perform a vector-by-vector multiplication in the optical domain, such as a liquid crystal display or a ring-resonator, are known in the art. The summation (Σ) can be implemented using a photodetector that can perform coherent or incoherent addition of signals falling upon the photodetector's light sensing element.
In the cases where the inputs to the solver (i.e. the variables whose values are to be determined) can take binary positive and negative values, such as −1 and +1, or −½ and +½, then these are sometimes referred to as “spins” merely by analogy with the quantum property of spin. However in such a context, this does not actually mean the quantum property of spin. Instead the two possible “spin” values simply refer to two possible values of a binary variable, and could be represented using, for example, two different values of the amplitude, or phase, of the light.
State of the art solutions based on or inspired by optics propose either digital approaches only (see Toshiba's Solving Traveling Salesman Problem with SBM [Simulated Bifurcation Machine], Ikuko Hasumi https://medium.com/toshiba-sbm/solving-traveling-salesman-problem-with-sbm-simulated-bifurcation-machine-89740c83ed37) or hybrid approaches, as per Böhm, Fabian, Guy Verschaffelt, and Guy Van der Sande. “A poor man's coherent Ising machine based on opto-electronic feedback systems for solving optimization problems.” Nature communications 10.1 (2019): 1-9 (https://www.nature.com/articles/s41467-019-11484-3) and Inagaki, Takahiro, et al. “A coherent Ising machine for 2000-node optimization problems.” Science 354.6312 (2016): 603-606 (https://science.sciencemag.org/content/354/6312/603).
In hybrid approaches, a building block to generate a signal representing the variable values is typically implemented in optical hardware, but the logic to compute the variable interactions is implemented using digital hardware and hardware to convert between optical and digital domains. By contrast, in ‘all-analogue’ solvers, non-digital hardware is instead used to convert a signal between optical (i.e. a light signal) and analogue electronic domains. An advantage of all-analogue solvers is the speed at which optical and analogue electronic signals can be transmitted (digital electronics are inherently much slower due to the need to clock sequences of bits through flip-flops). Implementing part of the iteration in the digital domain, by contrast, defeats the point of an all-analogue solver, which is the speed of transmission compared to digital electronics. The speed of the system is limited by its slowest part, so the inclusion of any digital electronics negates the speed benefit of optical solvers.
An all-optical solution has been proposed and demonstrated with all-to-all connectivity for 4 spins/variables and partial connectivity for 16 spins/variables (Marandi, A., Wang, Z., Takata, K., Byer, R. L. & Yamamoto, Y. Network of time-multiplexed optical parametric oscillators as a coherent Ising machine. Nat. Photonics 8, 937-942 (2014); K. Takata et al. “A 16-bit Coherent Ising Machine for One-Dimensional Ring and Cubic Graph Problems”, Scientific Reports (2016)).
State of the art all-optical solvers generate variables using optical signals in a time-division multiplexing architecture. I.e. the signals are multiplexed in series into the same beam of light, and a different delay path is introduced for each variable so that the signals can then be combined in order to model the interactions between the variables. However, for time-division multiplexing, because spin generation is carried out in series, the time complexity of the solver is linear in the number of variables being modelled.
Solvers which are implemented wholly in the analogue domain use optical or analogue electronic vector multipliers to model the ‘spin’ interactions of Ising systems. Implementing optical vector multipliers has the advantage pointed out above, that it leverages the speed of optical transmission.
The present disclosure relates to a solver architecture implemented in an all-analogue system (either in the optical domain, analogue electronic domain or a combination) which is configured to determine an approximate solution to a problem comprising vector-by-matrix multiplication, such as an Ising problem.
As mentioned above, existing all-optical solvers use time division multiplexing to generate the signals representing the different variables of the problem (i.e. the “spin signals”) in series, and then use delay-lines of different path length to combine them to model the interaction between the variables (e.g. see
The present disclosure on the other hand discloses an all-analogue solver, implemented in optical and/or analogue electronic domain, which employs a space-division architecture (e.g. see
The architecture is configured such that each variable to be modelled is associated with a different channel, and optical or analogue electronic hardware is configured to compute a contribution for each variable to the update of an overall function to be optimized. The advantage of using optical solvers or analogue electronic solvers over digital solvers is that computation can be carried out more quickly, as computation makes use of the speed of optical or electrical signals, limited only by the time taken to detect these signals, but not limited by some of the power and speed considerations of digital hardware (such as the need to shift bits through latches).
Furthermore, the division of variables into multiple parallel channels (i.e. space-division multiplexing rather than time-division multiplexing used in existing solver architectures) is advantageous as it can be scaled to a larger number of variables without increasing the time complexity of the problem significantly. Existing time-division-based optical solvers are limited in the number of variables that can be modelled due to the requirement that a delay path is introduced for each variable in order to compute interactions between variables, and the time taken per iteration for time-division all-optical solvers scales linearly with the number of variables of the problem. Furthermore, existing hybrid solvers are eventually limited in speed by the same constraints as all digital solvers.
A first aspect of the present disclosure provides a system for estimating values of a vector of variables that optimize a function, the function comprising a weighted sum of a plurality of terms, each term comprising a product of a corresponding subset of the variables from said vector and each term being weighted by a corresponding weight from a matrix of weights that models interactions between the variables; wherein the system comprises a plurality of parallel hardware channels arranged to operate simultaneously with one another, each arranged to model a contribution of a respective one of the variables to the function, each of the parallel channels comprising: a respective signal generator configured to generate a respective modelling signal having a modulated property modelling a value of the respective variable; a respective splitter arranged to supply an instance of the respective modelling signal to each of the parallel channels, each channel thus receiving a vector of signals modelling the vector of variables; respective interaction logic comprising a respective vector multiplier configured to multiply the received vector of signals by a respective vector of weights from the matrix of weights modelling an interaction between the respective variable and the vector of variables, the interaction logic thereby generating a respective feedback signal representing the contribution of the respective variable modelled by the respective channel; and a respective feedback path arranged to return the feedback signal to the respective signal generator, wherein the respective signal generator is configured to adapt the respective modelling signal in dependence on the feedback signal; wherein each channel, including the respective signal generator, splitter, interaction logic and feedback path in each channel, is implemented only using optical and/or analogue electronic components.
Another aspect of the present disclosure provides a method for estimating values of a vector of variables that optimize a function, the function comprising a weighted sum of a plurality of terms, each term comprising a product of a corresponding subset of the variables from said vector and each term being weighted by a corresponding weight from a matrix of weights that models interactions between the variables; the method comprising, at each of a plurality of parallel hardware channels: generating, by a respective signal generator, a respective modelling signal having a modulated property modelling a value of a respective variable; supplying an instance of the respective modelling signal to each of the parallel channels, each channel thus receiving a vector of signals modelling the vector of variables; multiplying, at respective interaction logic, the received vector of signals by a respective vector of weights from the matrix of weights modelling an interaction between the respective variable and the vector of variables, thereby generating a respective feedback signal representing the contribution of the respective variable modelled by the respective channel; returning the feedback signal to the respective signal generator; and adapting, by the respective signal generator, the respective modelling signal in dependence on the feedback signal; wherein the method is implemented only using optical and/or analogue electronic hardware.
For a better understanding of the present disclosure, and to show how embodiments of the same may be put into effect, reference is made to the accompanying figures in which:
A general combinatorial optimisation problem can be solved by first mapping said problem to a QUBO problem, which can then be mapped to an Ising problem. For many problems, there is a known mapping to the QUBO formulation. For others, a mapping may have to be derived. The mapping of general NP-hard problems to a QUBO or Ising formulation is a topic that, in itself, will be understood by a person skilled in the art of mathematics. For example, a problem expressed in PUBO form, such as a cubic unconstrained binary optimization problem with the formula Σ_{i,j,k} Q_ijk·v_i·v_j·v_k, may be expressed as a QUBO problem by introducing extra variables and terms, and may thus be solved by the Ising solver disclosed herein. The solver disclosed herein provides a solution to an Ising problem, which can be used to solve any NP-hard problem for which a mapping to that problem can be found.
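To illustrate how a cubic term can be reduced by introducing an extra variable, the sketch below uses a standard substitution (often attributed to Rosenberg): an auxiliary binary variable w stands in for the product v1·v2, and a penalty v1·v2 − 2·v1·w − 2·v2·w + 3·w, which is zero exactly when w = v1·v2 and at least 1 otherwise, enforces the substitution. The objective, the coefficient `C` and the penalty strength `M` are hypothetical choices for illustration, not values taken from the disclosure.

```python
from itertools import product

C = 2  # coefficient of the cubic term (hypothetical)
M = 5  # penalty strength, chosen larger than |C|


def pubo(v1, v2, v3):
    """Cubic objective: C*v1*v2*v3 - v1 - v2 - v3."""
    return C * v1 * v2 * v3 - v1 - v2 - v3


def qubo(v1, v2, v3, w):
    """Quadratic objective over (v1, v2, v3, w) with w standing in for v1*v2."""
    substituted = C * w * v3 - v1 - v2 - v3
    penalty = v1 * v2 - 2 * v1 * w - 2 * v2 * w + 3 * w
    return substituted + M * penalty


# For every assignment of the original variables, minimising over the
# auxiliary variable should reproduce the cubic objective exactly.
reduction_holds = all(
    min(qubo(v1, v2, v3, w) for w in (0, 1)) == pubo(v1, v2, v3)
    for v1, v2, v3 in product((0, 1), repeat=3)
)
print(reduction_holds)
```

Every term in `qubo` involves at most two variables (linear terms such as 3·w sit on the diagonal because w·w = w), so the result fits the QUBO form.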
In doing this, the problem is mapped to a physical system whose total energy is given by the Ising Hamiltonian, i.e. −Σ_{i,j} J_ij σ_i σ_j (assuming no external field). To map the given problem to an Ising system, the matrix J needs to be determined such that minimizing the total energy −Σ_{i,j} J_ij·σ_i·σ_j (i.e. maximizing Σ_{i,j} J_ij·σ_i·σ_j) is equivalent to optimizing the problem.
The Hamiltonian is therefore a sum of a plurality of terms J_ij·σ_i·σ_j, each being a product of a respective subset of the variables σ_i, σ_j, with a corresponding weight J_ij (so the first term is the subset of one variable σ_1 multiplied with itself and the weight J_11, the second term is the subset σ_1, σ_2 multiplied together and with the weight J_12, etc.).
As can be seen, this sum can be broken down into a series of vector-by-vector (dot product) multiplications, by taking σi out to the left of the sum:
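Written out for the no-external-field Hamiltonian given above, the decomposition takes the following form. This is a reconstruction from the surrounding definitions, with one line per variable:

```latex
\begin{aligned}
-\sum_{i,j} J_{ij}\,\sigma_i\,\sigma_j
  &= -\sum_{i} \sigma_i \sum_{j} J_{ij}\,\sigma_j \\
  &= -\bigl[\, \sigma_1\,(\sigma_1 J_{11} + \sigma_2 J_{12} + \dots + \sigma_N J_{1N}) \\
  &\qquad + \sigma_2\,(\sigma_1 J_{21} + \sigma_2 J_{22} + \dots + \sigma_N J_{2N}) \\
  &\qquad + \dots \\
  &\qquad + \sigma_N\,(\sigma_1 J_{N1} + \sigma_2 J_{N2} + \dots + \sigma_N J_{NN}) \,\bigr]
\end{aligned}
```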
In this final representation, the sum in each line is an individual vector multiplication which represents a contribution of a different respective one of the variables to the energy in the Hamiltonian. I.e. the vector multiplication (σ_1 J_11 + σ_2 J_12 + … + σ_N J_1N) is the contribution of σ_1 toward the energy, and (σ_1 J_21 + σ_2 J_22 + … + σ_N J_2N) is the contribution of σ_2, etc. The weights J represent the interactions between variables (so J_11 is the interaction of σ_1 with itself, J_12 is the interaction between σ_1 and σ_2, etc.). The weights are set depending on the problem being modelled (and for any given problem some weights may be zero). In the system of
An example application is the travelling salesperson problem. In a simple example, imagine there are three cities the salesperson needs to visit: London, Edinburgh and Cardiff. These can be modelled with nine variables in a QUBO problem: v_1 represents London being visited first, v_2 represents London second, v_3 represents London third, v_4 represents Edinburgh first, v_5 represents Edinburgh second, v_6 represents Edinburgh third, v_7 represents Cardiff first, v_8 represents Cardiff second, and v_9 represents Cardiff third. The elements of the matrix Q_ij represent the penalties of travelling between corresponding pairs of cities. So Q_15 (London first and Edinburgh second) is the distance penalty for London to Edinburgh, etc. Note that some weights, such as Q_19 (London first then Cardiff third), are set to zero since they are not meaningful in this problem, as the total distance is determined by the distances between consecutive cities only. Other weights, such as Q_12 or Q_13 (London first then London second, or London first then London third), may be set to large penalty values so as to impose the constraint that each city is visited once. The QUBO problem (minimizing Σ_i Σ_j Q_ij·v_i·v_j) can then be transformed into an Ising problem (minimizing the energy given by the Hamiltonian −Σ_{i,j} J_ij σ_i σ_j) and solved using the solver system of
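The three-city encoding can be sketched in Python as follows. The distances are invented for illustration and the penalty value is a hypothetical choice that merely needs to exceed the longest possible path. Beyond the pairwise penalties described above, this sketch also adds a negative diagonal term for each variable, obtained by expanding squared one-hot constraints of the form (Σ v − 1)². That addition is an assumption of the sketch, made so that the all-zero assignment is not the minimum. The QUBO is solved here by enumeration rather than by an Ising solver.

```python
from itertools import permutations, product

CITIES = ["London", "Edinburgh", "Cardiff"]
# Illustrative distances only, not real mileages.
DIST = {
    ("London", "Edinburgh"): 530,
    ("London", "Cardiff"): 210,
    ("Edinburgh", "Cardiff"): 500,
}
N = len(CITIES)
PENALTY = 2000  # hypothetical; larger than any possible path length


def dist(a, b):
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]


def index(city, pos):
    """0-based version of the text's numbering: v_1..v_3 London, and so on."""
    return N * city + pos


def build_q():
    size = N * N
    Q = [[0] * size for _ in range(size)]
    for c in range(N):
        for p in range(N):
            i = index(c, p)
            Q[i][i] -= 2 * PENALTY  # from expanding both one-hot constraints
            for p2 in range(p + 1, N):
                Q[i][index(c, p2)] += 2 * PENALTY  # same city, two positions
            for c2 in range(c + 1, N):
                Q[i][index(c2, p)] += 2 * PENALTY  # two cities, same position
            if p + 1 < N:
                for c2 in range(N):
                    if c2 != c:  # distance between consecutive positions
                        Q[i][index(c2, p + 1)] += dist(CITIES[c], CITIES[c2])
    return Q


def energy(Q, v):
    n = len(v)
    return sum(Q[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


Q = build_q()
best = min(product((0, 1), repeat=N * N), key=lambda v: energy(Q, v))
route = [CITIES[c] for p in range(N) for c in range(N) if best[index(c, p)] == 1]
route_length = sum(dist(route[k], route[k + 1]) for k in range(len(route) - 1))
shortest = min(
    sum(dist(perm[k], perm[k + 1]) for k in range(N - 1))
    for perm in permutations(CITIES)
)
print(route, route_length, shortest)
```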
Another example is a molecular similarity problem for estimating the molecular similarity between two molecules. E.g. this could be used to estimate that one molecule is likely to block another for use in a drug. Modelling molecular similarity as a QUBO problem is, in itself, known in the art.
An update rule may be derived for adapting the signals generated by the system in the direction that minimizes the Hamiltonian of the Ising system modelling the problem. A possible update equation for the Ising model may be written as follows:
where xi is the value of the modelling signal generated by the system to model the variable σi in the Ising model. This update equation is derived from the Hamiltonian of the Ising model. To derive the update equation, the update of each spin may be defined based on the expected effect that changing that spin's value has on the total energy. This can be used to derive an expression for the update of:
where the inside of the brackets above can be evaluated as 2 Σ_j J_ij ⟨x_j⟩. The terms of the update may be multiplied by constants σ and β in order to control the size of the update of each spin at each iteration, to ensure the system as a whole is adapted towards a minimum. The term ζ_i in the above equation is Gaussian noise, applied at each iteration to perturb the system to avoid getting ‘stuck’ at a non-optimal solution. Finally, the cos²( ) term in equation 1 may be derived by observing that the Taylor expansion of
is approximately equal to the update equation above for appropriately chosen constants σ and β. Cosine squared is a useful approximation for optical spin generation in particular, due to specific hardware that can readily compute this function, described later. However, other approximations of Equation 2 above may be used. For example, analogue electronic components may be used to evaluate terms of a Taylor expansion of Equation 2 directly and generate an analogue signal for the updated value of x_i. For example, a cubic or quintic approximation of Equation 2 could be used (an expansion only up to the cubic or 5th-order term, respectively). Indeed, the reason for using cos²(x−π/4)−½ in Equation 1 is that it approximates x−x³ for small enough x. In other examples, any other formula that provides a similar approximation (i.e. one that is linear for x around 0) could also work.
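The small-x behaviour of the cos²(x−π/4)−½ term can be checked numerically. By the double-angle identity the expression equals ½·sin(2x), whose expansion begins x − (2/3)·x³. It therefore has the linear-minus-cubic shape referred to above, with a cubic coefficient of 2/3. The function names below are illustrative.

```python
import math


def cos2_nonlinearity(x):
    """The nonlinearity cos^2(x - pi/4) - 1/2."""
    return math.cos(x - math.pi / 4) ** 2 - 0.5


def cubic_approximation(x):
    """First two terms of the Taylor expansion about x = 0."""
    return x - (2.0 / 3.0) * x ** 3


samples = [k / 100 for k in range(-20, 21)]  # x in [-0.2, 0.2]

# Deviation from the exact identity 0.5 * sin(2x): rounding error only.
identity_error = max(
    abs(cos2_nonlinearity(x) - 0.5 * math.sin(2 * x)) for x in samples
)
# Deviation from the cubic approximation: grows like x**5.
cubic_error = max(
    abs(cos2_nonlinearity(x) - cubic_approximation(x)) for x in samples
)
print(identity_error, cubic_error)
```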
As shown in
The above formula will be described in further detail later. Other formulations are possible. Whatever formulations are used, the underlying property of the update equation is that it pushes or adapts the signals x_i such that the physical system being modelled tends towards a minimum energy (i.e. the minimum of the Hamiltonian given above). This is driven by the term β Σ_j J_ij x_j[k], which represents a contribution of the given signal to the energy of the overall system, and whose sign determines the direction of the update. In other words, this term provides feedback to the signal generator 100 to adapt the respective modelling signal x_i in the respective channel 102. The sign of this feedback causes an adaptation in the respective modelling signal x_i which drives that signal in a direction which reduces the overall energy in the Hamiltonian of the system. The value of this feedback determines the degree of the adaptation (optionally damped relative to the signal x_i by the coefficients σ and β).
Note that, while the solver determines signals directly representing Ising variables σ_i, this is equivalent to finding an optimal assignment of the QUBO variables v_i, which may in turn be transformed into a different set of variables in the form of the original problem. However, it is important that a mapping exists between a set of Ising variables (spins) which can be determined by the solver and a set of variables optimizing the original problem. Note that in the below description, either of v_i or σ_i may be used to denote a binary variable modelled by the solver.
An all-analogue solver can be implemented which models the value of the binary variables σi as optical or electrical analogue signals, and which performs the above update for each modelled variable using a combination of non-digital hardware components. The solver generates an initial set of signals representing a given assignment of variables and generates new signals in a series of iterative steps based on a feedback signal computed using interaction logic implemented in analogue electronic or optical hardware. An example implementation of a solver architecture for an Ising problem is described in more detail later.
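The iterative scheme can be imitated in software to get a feel for its behaviour. The sketch below is a digital simulation only, not the analogue hardware. It assumes an update of the form x_i ← cos²(g·x_i + β Σ_j J_ij x_j + ζ_i − π/4) − ½, in the style of the opto-electronic feedback scheme of the Böhm et al. reference cited earlier. Here `self_gain` (g) plays the role of the constant the text calls σ, renamed to avoid confusion with the spins. The gains, noise level, step count and the four-spin coupling matrix are all hypothetical choices.

```python
import math
import random
from itertools import product


def ising_energy(J, spins):
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j] for i in range(n) for j in range(n))


def simulate(J, self_gain=0.9, beta=0.2, noise_std=0.01, steps=200, seed=0):
    """Iterate the assumed cos^2 update for every channel in lock-step."""
    rng = random.Random(seed)
    n = len(J)
    x = [rng.uniform(-0.01, 0.01) for _ in range(n)]  # near-zero start
    for _ in range(steps):
        # Every channel reads the same snapshot of x, mimicking parallel channels.
        feedback = [sum(J[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [
            math.cos(
                self_gain * x[i]
                + beta * feedback[i]
                + rng.gauss(0.0, noise_std)
                - math.pi / 4
            ) ** 2 - 0.5
            for i in range(n)
        ]
    return x


def read_spins(x):
    """Positive amplitude maps to +1, negative amplitude to -1."""
    return [1 if xi >= 0 else -1 for xi in x]


# Hypothetical instance: spins {0, 1} and {2, 3} attract within each pair
# and repel across pairs.
J = [
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
]

spins = read_spins(simulate(J))
ground = min(ising_energy(J, s) for s in product((-1, 1), repeat=len(J)))
print(spins, ising_energy(J, spins), ground)
```

With these gains the zero state is unstable only along the pattern that matches the lowest-energy configuration, so the signals settle with spins 0 and 1 aligned and opposite to spins 2 and 3.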
There are many choices of solver configuration which may be arranged according to the present disclosure, each configuration generating a feedback signal which encourages the signals generated over time into a set of signals which minimize the total energy of an ‘Ising’ system, which can be mapped to an optimal assignment of variables for the given problem definition.
The present disclosure provides a novel architecture for solving combinatorial optimization problems which can be mapped to Ising problems of N variables (sometimes referred to as ‘spins’), wherein the variables of the problem are modelled by a set of N distinct hardware channels, and updated iteratively based on feedback provided by signal interaction logic modelling the interaction of the variables according to the given problem definition. The system operates only in the optical and analogue electronic domains, and the signal interaction logic may be implemented in either optical or analogue electronic hardware. This will now be described in further detail with reference to
A first channel 102 is configured to compute a modelling signal x_1 corresponding to an Ising variable σ_1 taking either a positive or a negative “spin” value, with the modelling signal x_1 updated based on the feedback received at each iteration of the optimization. Note that while the variable σ being modelled may be binary, the modelling signal x may take a soft value that can vary between the two possible binary values of the variable. The process of determining the contribution by each channel will now be described. Note that each channel comprises hardware components which carry out the same steps to compute its respective contribution to the function.
Each channel 102 comprises a signal generator 100, a splitter 106 and signal interaction logic 104, each of which may comprise one or more hardware components. Note that ‘logic’ as used herein in this context does not refer to digital logic, but rather refers to signal operations carried out using analogue or optical hardware. The signal generator 100 generates a modelling signal for σ_i, with a measurable property of the signal representing a binary value of the variable σ_i. The signal may, for example, be an optical signal generated by a light source such as a laser. An optical modulator may be used to modulate a property of the optical signal to model the variable σ_i. For a binary variable σ_i to be encoded in the value of a property such as amplitude, a mapping should be defined between the possible modulated property values (amplitude) and the binary values (e.g. 1 and −1). For example, x_i may lie in the range [−a, +a], where a is some constant, and where a positive amplitude maps to an Ising variable σ_i=1, and a negative amplitude maps to an Ising variable σ_i=−1. Once a modulated signal modelling the variable σ_i has been generated (this may be referred to herein as a ‘modelling signal’ x_i), this signal can be copied by applying a splitter 106, to generate multiple instances of that modelling signal x_i encoding the same variable σ_i, which can be communicated to other channels as shown by the arrows in
The signal interaction logic 104 receives multiple modelling signals, representing a vector of variables, with each signal received from the splitter 106 of a respective channel j. The interaction logic 104 comprises a vector-by-vector multiplier that combines the modelling signals x_j into a signal representing a weighted sum of the modelled variables, with the weights corresponding to the relevant elements of the matrix J defining the spin interaction for the Ising problem. There are various possible hardware configurations that may be used to perform vector-by-vector multiplication. One example disclosed herein is a wavelength selective switch (WSS). This is described in further detail later. Optical vector-by-vector multiplication may alternatively be carried out by other, known optical technology including spatial light modulators (SLM), ring resonators or Mach-Zehnder interferometers (MZIs), or some combination of such technologies or other suitable optical components. As another alternative, the vector-by-vector multiplication operation may also be implemented in the analogue electronic domain (i.e. using electrical signals), for example by using memristors.
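Numerically, the operation performed by the interaction logic of channel i is a dot product between row i of J and the vector of modelling signals. The sketch below uses made-up weights and signal values and a hypothetical helper name. It also checks that summing each channel's signal times its feedback reproduces the double sum in the Hamiltonian, evaluated here on the soft signal values rather than on binary spins.

```python
def channel_feedback(weights_row, signals):
    """Vector-by-vector (dot) product computed by one channel's interaction logic."""
    return sum(w * x for w, x in zip(weights_row, signals))


# Made-up interaction weights and soft signal values for three channels.
J = [
    [0, 1, -1],
    [1, 0, 2],
    [-1, 2, 0],
]
x = [0.3, -0.2, 0.4]

# One weighted sum per channel; in hardware these are formed in parallel.
feedback = [channel_feedback(J[i], x) for i in range(len(J))]

# The per-channel products recombine into the double sum over i and j.
energy_from_channels = -sum(x[i] * feedback[i] for i in range(len(x)))
energy_direct = -sum(
    J[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x))
)
print(feedback, energy_from_channels, energy_direct)
```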
Note that, while
The feedback signal is passed back along a feedback path 108 to the signal generator 100, which determines a new signal according to the hardware of the system. The updated signal may be generated, for example, by passing the feedback signal to a modulator to modulate the input signal from a light source, and detecting the resulting optical signal with a photodiode. Alternatively, in some embodiments, an analogue electronic signal encoding the feedback signal may be generated directly using analogue electronic components, for example, by using memristors. Either way, the system is designed such that over time it tends to a stable state which maps to an optimal assignment of the variables which minimize the energy function for the given problem formula.
Each channel updates its signals according to the same scheme described above, until a stable state is reached for all signals, corresponding to a particular assignment of variables to values. The pairwise interactions of an arbitrary number of variables σ1, . . . , σN may be modelled in this way, by setting up N channels and splitting each signal to N identical copies of the signal, one to be sent to each channel.
Each channel 102 iteratively generates an updated modelling signal xi according to a feedback signal until the system settles into a stable set of states, representing an optimal assignment of variables according to the optimization problem to be solved. As described above, an update of the signal is given by the update equation, for example:
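One form of update equation consistent with the definitions that follow, and with the later description of the hardware (a squared cosine obtained by direct detection, plus a constant term added afterwards), is sketched here as an assumption. The phase bias −π/4 and the constant −1/2 in particular are illustrative choices rather than values stated in the text. The unsubscripted σ denotes the multiplicative constant, which is distinct from the variable σi.

```latex
x_i[k+1] = \cos^2\!\left(\sigma\, x_i[k] + \beta \sum_{j} J_{ij}\, x_j[k] + \zeta_i[k] - \frac{\pi}{4}\right) - \frac{1}{2}
```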
where xi[k] is the modelling signal at the kth iteration, Jij is the coefficient defining the interaction between the ith and jth variables according to the given problem as mapped to an Ising system, σ and β are multiplicative constants, and ζi[k] is a Gaussian noise term. The factors σ and β are chosen so as to control the size of the update of each variable, where a large σ relative to β causes the signal to move slowly in the direction given by the β term, i.e. β·ΣjJijxj[k]. This is important in a system of many variables, as large updates at each step can prevent convergence of the full system to a suitable local optimum. Similarly, the noise term provides a perturbation to the signal at each step to ensure that the system does not become ‘stuck’ in a local minimum that is a poor approximation of an optimal set of variables. The above equation may be derived mathematically by applying known principles based on the Hamiltonian of the Ising model and using sensible approximations. In particular, the cos²( ) term approximates the optimal update, which is easily applied using particular optical hardware, described later. The operation of a single channel 102 will now be described with reference to
An initial signal is generated by the spin generation hardware 300 representing an initial binary value of the variable σi modelled by the given channel. Note that ‘spin’ is used herein to refer to a signal representing a binary variable of an Ising system, and should not be confused with the quantum mechanical definition of spin. An example implementation of the hardware components of the spin generation hardware 300 is described in more detail below, with reference to
Note that in this embodiment, the spin generator 300 comprises only part of the signal generator 100 of
Along the first path, the signal is combined with the output of a light source 302, which is a laser at a specific wavelength, in a modulator 304 to modulate the laser beam, thereby generating a modelling signal xi, as described above with reference to
The modulator 304 sends the modelling signal to a 1-to-N splitter 306, which communicates an identical optical signal to a vector-by-vector multiplier 314 (VVM) in each of the N channels of the system. In the example of
In embodiments, the signal output by the VVM 314 remains in the optical domain, as shown by the unbroken lines in
Along the second path, the signal i is output to an amplifier which amplifies the electrical signal, representing the multiplication of the variable σi by a constant σ, shown in Eq. 1. This is added to the sum
which is communicated along a feedback path from the analogue addition hardware 318, to obtain a signal
Finally, the updated signal is determined in the spin generation hardware 300, which modulates an optical signal based on the feedback signal 108 to compute a cosine of the feedback signal, detecting this signal at a photodetector and adding a second adaptive term, in order to evaluate the full expression of Eq. 1 and output an analogue electronic signal. Note that direct detection by the photodetector generates the square of the cosine in Eq. 1, as the photodetector measures intensity of the optical signal which is proportional to the square of the signal itself. For this reason, direct detection cannot be used for phase-modulated signals, as all phase information is lost in the detection of light intensity.
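The relation between the detected intensity and the squared cosine can be written out as a short worked step. It assumes, for illustration only, that the modulator imposes a field amplitude proportional to cos φ on a carrier of amplitude E_0, where φ is set by the feedback signal; the symbols E_0 and φ are introduced here and do not appear in the text.

```latex
E(\varphi) = E_0 \cos\varphi
\quad\Longrightarrow\quad
I(\varphi) \propto |E(\varphi)|^2 = E_0^2 \cos^2\varphi = \frac{E_0^2}{2}\left(1 + \cos 2\varphi\right)
```

The detected value is never negative and is unchanged under φ → φ + π, so a sign flip of the field cannot be recovered from the intensity alone.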
An example of the evaluation of the update equation by the spin generator 300 is described below with reference to
Note that multiple components operating together in
Each channel i is implemented in hardware which computes updates to that channel's signal in parallel. Updates continue until the system is stopped, for example after a predetermined stopping point of M iterations. Alternatively, the signals may be measured periodically, and the system stopped if there are no changes observed between subsequent measurements. An approximate solution is found when the system stabilizes, i.e. the set of variables modeled by the generated signals stays constant from one iteration to the next. This stable set of signals may then be mapped directly to an assignment of N variables which approximate the solution for the given Ising problem.
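The iteration and stopping rules described above can be emulated in software to see how a stable sign pattern emerges. The following is a minimal sketch under stated assumptions, not the hardware behaviour itself: the update uses an assumed squared-cosine form with a π/4 bias and a −1/2 offset, `self_gain` stands for the constant written σ in the text, `coupling_gain` stands for β, and the function name, default gains and `patience` rule are hypothetical.

```python
import math
import random


def run_solver(J, x0, self_gain=0.9, coupling_gain=0.2, noise_std=0.0,
               max_iters=500, patience=10, seed=0):
    """Iterate an assumed cos^2 update on all channels until the signs settle.

    J: square list-of-lists of interaction weights.
    x0: initial signal amplitude for each channel.
    Returns (spins, iterations_run).
    """
    rng = random.Random(seed)
    n = len(J)
    x = list(x0)
    spins = [1 if v >= 0 else -1 for v in x]
    unchanged = 0
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        # Every channel reads the same snapshot of x, mimicking parallel hardware.
        feedback = [
            self_gain * x[i] + coupling_gain * sum(J[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        # Assumed nonlinearity: squared cosine with a pi/4 bias and a -1/2 offset.
        x = [
            math.cos(f + rng.gauss(0.0, noise_std) - math.pi / 4) ** 2 - 0.5
            for f in feedback
        ]
        new_spins = [1 if v >= 0 else -1 for v in x]
        unchanged = unchanged + 1 if new_spins == spins else 0
        spins = new_spins
        # Stop early once the sign pattern has been stable for `patience` steps.
        if unchanged >= patience:
            break
    return spins, iterations


# Two coupled variables: a positive weight favours equal signs,
# a negative weight favours opposite signs.
aligned, _ = run_solver([[0, 1], [1, 0]], x0=[0.05, -0.01])
opposed, _ = run_solver([[0, -1], [-1, 0]], x0=[0.05, 0.01])
```

Both stopping rules from the text appear here: `max_iters` plays the role of the predetermined M iterations, and `patience` corresponds to stopping when periodic measurements show no change.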
The example embodiment shown in
During each iteration of the example solver shown in
The signal-to-signal interaction logic 502 of
Note that in the example embodiment described above, the vector-by-vector multiplication operation of the signal interaction logic 504 is implemented in the optical domain, e.g. by a wavelength selective switch, described later. However, in other embodiments, the signal interaction may be implemented in the analogue electronic domain. Similarly, in some embodiments, other arithmetic operations, such as addition of signals, may be carried out in the optical domain rather than the analogue electronic domain. The process shown in
As described above, an advantage of an architecture described herein is that it uses a ‘space-division’ multiplexing architecture, meaning that a system of N variables is modelled using separate hardware for each variable. Some state-of-the-art solvers, by contrast, use a time-division multiplexing architecture.
By contrast, the space-division multiplexing architecture shown in
As described with reference to
One possible detection scheme that allows detection of positive and negative values is coherent detection, which measures the amplitude and phase information of the received optical signal, which can be either positive or negative. However, a disadvantage of coherent detection is that it is more complex to implement than direct detection of light intensity. Coherent detection schemes often require digital signal processing. Some of the advantages of processing signals in the optical and analogue electronic domains, such as the speed of transmission of the signal, are lost or diminished if the signal is converted back to the digital domain to carry out coherent detection.
An alternative detection method uses direct detection, i.e. detection of light intensity, which does not require the system complexity of coherent detection. Direct detection measures a positive-only signal in the analogue electronic domain, which may then be offset in the analogue electronic domain by adding or subtracting adaptive terms to correct the range of the signal to allow positive or negative values. This may be referred to as ‘differential detection’. Similar detection schemes are used in telecommunications to detect binary phase shift keying signals, which are real-valued.
A schematic illustration of this direct detection scheme is shown in
This signal is converted into an analogue signal by detecting it at photodetector 308. However, the detected signal is restricted to be positive only, as the photodetector 404 measures light intensity, which cannot have negative values. To correct this, the output signals of the VVM operation are corrected to allow positive or negative values by adding a DC offset term, shown in
This enables measurement of positive and negative signals required by the solver by adjusting the signal in the analogue domain. This differential detection scheme is simpler than a coherent one and can be implemented easily to convert the signal directly from the optical to the analogue electronic domain. However, for the VVM output, if the given input signals have different wavelengths, attention should be paid that the path lengths of all signals are matched. Incoherent addition of signals will be described in more detail below in the context of the operation of a wavelength selective switch.
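To make the idea of offset correction concrete, the sketch below shows one way a signed weighted sum could be recovered from a positive-only reading. It is an illustration under assumptions, not the scheme of the described embodiment: it assumes that signed signals and weights in [−1, 1] are both encoded affinely as non-negative levels in [0, 1], and all function names are hypothetical. Under that encoding the correction contains a term that depends on the current signals, which is one reading of why the offset terms are called adaptive.

```python
def encode_signed(value):
    """Map a signed value in [-1, 1] to a non-negative level in [0, 1]."""
    return (value + 1.0) / 2.0


def detected_intensity(weights, signals):
    """Positive-only reading: attenuations times intensities, summed at the detector."""
    w = [encode_signed(v) for v in weights]
    p = [encode_signed(v) for v in signals]
    return sum(wj * pj for wj, pj in zip(w, p))


def signed_dot_from_reading(reading, weights, signals):
    """Recover sum_j J_j * x_j from the positive reading by subtracting offsets.

    Since ((J+1)/2) * ((x+1)/2) = (J*x + J + x + 1) / 4, summing over j gives
    4 * reading = sum(J*x) + sum(J) + sum(x) + n.
    """
    n = len(weights)
    # sum(weights) and n are fixed for a given problem; sum(signals) changes
    # every iteration and would have to be tracked by the hardware.
    offset = sum(weights) + sum(signals) + n
    return 4.0 * reading - offset


weights = [0.5, -1.0, 0.25]
signals = [1.0, -1.0, -0.5]
reading = detected_intensity(weights, signals)          # never negative
recovered = signed_dot_from_reading(reading, weights, signals)
```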
While this differential detection scheme is described above in relation to the present solver architecture, direct detection with adaptive offset terms can be used for any application in which optical vector-by-vector multiplication operations taking real positive and negative values can be implemented. For example, this may be used in machine learning applications, such as deep neural networks, in which input vectors may be multiplied by network weights. This differential detection scheme may be applied to applications using various types of optical VVMs such as spatial light modulators (SLM), ring resonators, or wavelength selective switches, described in more detail below. This differential detection method has the advantage of allowing operations to be carried out in the optical domain, providing a significant speed improvement over digital operations, while enabling the desired range of real valued signals to be modelled, without requiring the difficult implementation of coherent detection schemes. Such a differential detection scheme may be implemented without requiring phase sensitivity of the system if different wavelengths are used for the input signals of the OVM, such as in a wavelength-selective switch or ring resonator VVM.
Note that in the described embodiments, the solver models x and the corresponding feedback signal in the form of positive and negative “spin” signals representing Ising variables, e.g. −1/1, as a method to solve QUBO problems which are easily mapped to Ising problems. The sign of the feedback signal represents the direction in which to drive the modelling signal x to reduce the energy of the Hamiltonian. However, in other embodiments, it is not excluded that purely positive signals could be used. Instead the matrix J may include positive and negative weights. In such embodiments the DC offset 310, 320 is not necessarily required. For example, QUBO variables 0/1 may be modelled directly. In this case, the positive signals generated by direct detection may not need to be corrected.
As described above, each channel may implement a respective vector-by-vector multiplier as part of the interaction logic 104. Various possible vector-by-vector multiplier configurations may be used with the solver architecture disclosed herein. Some VVMs may be implemented entirely in the optical domain, such as spatial light modulators, ring resonators, and Mach-Zehnder Interferometers. Other VVMs may be implemented in the analogue electronic domain, for example using memristors to compute the weighted sum of electrical signals.
One example of an optical VVM (OVVM) which is disclosed herein for use in some embodiments of the solver architecture disclosed herein is a wavelength selective switch (WSS). WSSs are used in telecommunication applications, where they allow signals at different wavelengths to be independently optimized to guarantee that all the signals are transmitted at the same power, as well as allowing signals of different wavelengths to be combined together in a single optical fiber or vice versa for add or drop functions at transmission nodes.
The use of a WSS for optical vector multiplication relies on two of its capabilities: it emulates the product function by attenuating (weighting) each individual wavelength, and it emulates the addition function by combining different wavelengths into a single fiber, where the combined signal is subsequently detected by at least one photodetector.
A vector-by-matrix multiplication can be broken down into a series of vector-by-vector products of the following form:
Each element of the output vector o is a sum of elements of the ith row of the weight matrix W applied to the input vector y.
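In the notation of the sentence above, each output element is oi = Σj Wij·yj. The decomposition of a vector-by-matrix product into one vector-by-vector product per row can be sketched as follows; the function names are illustrative only.

```python
def vector_by_vector(weights_row, y):
    """Weighted sum of the inputs: one row of W applied to the vector y."""
    return sum(w * v for w, v in zip(weights_row, y))


def vector_by_matrix(W, y):
    """One vector-by-vector product per row of W gives the output vector o."""
    return [vector_by_vector(row, y) for row in W]


W = [[1, 2],
     [3, 4]]
y = [5, 6]
o = vector_by_matrix(W, y)  # o[i] is the product for row i
```

In the solver, each row corresponds to one channel, so the N row products can be evaluated at the same time by N separate multipliers.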
The configuration of
The operation of a wavelength selective switch to perform vector-by-vector or vector-by-matrix multiplication based on the above principle will now be described with reference to
The input vector v is represented by a set of optical signals 800 of different wavelengths, which may be, for example, a set of modelling signals {x1, . . . , xN} received from the N channels of a solver such as the one shown in
Note: for simplicity of illustration the fibers 808, 818 for only one channel 102 of the solver are shown in
The corresponding elements of the weight matrix Q are implemented in a spatial light modulator (SLM) 810, one example of which is a liquid crystal on silicon spatial light modulator (LCoS-SLM), which modulates each input optical signal 800 by a specific factor as described above. In this case, the signals are modulated by a factor dependent on the wavelength of the input, where each column of the SLM 810 corresponds to a different incident wavelength. The input signals 808 are passed through a lens to ensure that each of the signals reaches the SLM in the correct horizontal position for its respective wavelength.
The output signal for a given channel is obtained by combining the modulated optical signals into a single beam 818, which is then detected at a photodetector 820. The combination of the various optical signals, each having a different wavelength, into a single beam at the photodetector 820 may be referred to as wavelength-division multiplexing (WDM). This is facilitated by an arrangement of one or more lenses 816 and/or dispersive element(s) 814 (e.g. diffraction elements such as prisms or diffraction gratings), while the SLM provides independent weights for each individual wavelength.
The photodetector 820 performs incoherent addition of the various constituent light signals of different wavelengths. In order for the incoherent detection to compute the sum of the intensities of the constituent signals, it should be ensured that the difference in frequency of the respective signals being combined is much larger than the frequency bandwidth of the photodetector, meaning that the photodetector does not detect cross-terms from the interaction of the signals with each other. Incoherent detection of signals of different wavelengths does not require the signals to be phase matched. By contrast, if using a VVM architecture that takes as input light sources of the same wavelength, coherent addition must be performed at the detector, which has the difficult requirement of requiring all signals to be phase matched.
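The role of the frequency separation can be checked numerically. The sketch below is illustrative and not taken from the described hardware: it models each optical signal as a complex envelope with a frequency offset and a phase, and models the detector as an average of the intensity over an integration window. The function name, the units (offsets in multiples of 1/window) and the sampling scheme are assumptions.

```python
import cmath
import math


def detector_reading(amplitudes, freq_offsets, phases, window=1.0, samples=20000):
    """Average |sum of fields|^2 over the detector's integration window."""
    total = 0.0
    for k in range(samples):
        t = window * k / samples
        field = sum(
            a * cmath.exp(1j * (2.0 * math.pi * f * t + p))
            for a, f, p in zip(amplitudes, freq_offsets, phases)
        )
        total += abs(field) ** 2
    return total / samples


# Frequencies far apart compared with 1/window: the cross-term averages out
# and the reading is close to 0.8**2 + 0.6**2, whatever the relative phase.
incoherent = detector_reading([0.8, 0.6], [0.0, 100.3], [0.0, 1.0])

# Same frequency: the reading depends strongly on the relative phase.
in_phase = detector_reading([0.8, 0.6], [0.0, 0.0], [0.0, 0.0])
out_of_phase = detector_reading([0.8, 0.6], [0.0, 0.0], [0.0, math.pi])
```

The same-frequency case shows why coherent addition needs phase matching: the two readings differ only because of the relative phase of the inputs.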
An architecture similar to that shown in
As described above, the solver described herein for Ising problems may be implemented in one of two architectures. In the first, as shown in
However, in the second architecture, a global vector-by-matrix multiplier (VMM) may be implemented, wherein the channels of the solver each provide their modelling signal xi to the VMM to form an input vector, the matrix in full being implemented in this VMM. The solver architecture shown in
An example WSS architecture is now described which extends the architecture of the WSS vector-by-vector multiplier to carry out vector-by-matrix operations. This architecture has the advantage of being capable of processing many more spins simultaneously than the vector-by-vector WSS described above.
To use a spatial light modulator for vector-by-matrix multiplication in this example solver architecture, the vertical axis of the SLM needs to provide different weights even for the same wavelength, so that the whole functionality of the vector-by-matrix multiplication is achieved. This is because, for matrix multiplication, the input vector needs to be multiplied by each row of the matrix Q to generate the full output vector. The SLM 902 is a modified version of that shown in
In the example solver architecture with a global VMM, a single input array 908 comprises the modelling signals xi generated at each channel. This vector is passed through a lenslet array 900 having a particular geometry that causes the signals to spread out vertically, while collimating the beam in the horizontal direction of the SLM 902 corresponding to that signal's wavelength. This allows more input signals of different wavelengths to be processed at a single SLM. Moving from a single lens as in
Note that in the architecture of
The SLM 902 comprises a 2D array of modulators, each element of the array applying a respective weight to the received input signal, in contrast with the SLM described for the vector-by-vector multiplier in
In embodiments, the output signals may be directed from the element 814 via one or more lenses, to direct the signals into a beam at the correct vertical height to be detected using incoherent addition at the photodetector corresponding to the output vector element represented by that beam. E.g. another lenslet array may also be included between the dispersive element 814 and the multiple channels (potentially fibers) at the end of the system.
The photodetector array 904 is arranged as a set of photodetectors in a vertical array, each combined signal directed from the dispersive element 814 corresponding with the output signal for a different channel.
A solver which uses a vector-by-matrix multiplier architecture described above allows simultaneous processing of the interaction of spins for all channels using a single hardware arrangement such as the one shown in
Optical vector multiplication has also been implemented by a number of existing technologies, such as spatial light modulators which do not use wavelength division multiplexing, ring resonators, and Mach-Zehnder interferometers. Such technologies are described in detail for example in K. Kitayama et al, “Novel frontier of photonics for data processing—Photonic accelerator”, APL Photonics 2019, https://doi.org/10.1063/1.5108912, which is incorporated herein by reference in its entirety. The wavelength-selective switch implementation combines the spatial modulation of SLMs with the wavelength division of ring resonators, but where a ring resonator implementation requires the input signal to be passed through a series of ring resonators, the SLM only requires each signal to be passed through a single modulator, which is an advantage in terms of system losses. SLM VMM implementations do not use wavelength division, and instead use a single optical source, and use coherent addition at the photodetectors to compute the weighted sum for each element of the output array. The wavelength selective switch combines the advantages of both these techniques.
The above description of wavelength-selective switches refers to their implementation in a solver architecture such as that described herein. However, vector-by-matrix multiplication has many applications, particularly in machine learning, for example to apply weights of a neural network to input vectors. The wavelength selective switch described herein may be used in such applications. Similarly, the wavelength selective switch VMM may be applied to other solver architectures, such as the time-division multiplexing architecture shown in
The techniques disclosed herein can be applied to a wide range of applications; in particular, the solver implementation disclosed herein can be used to solve any NP-hard problem for which a known transformation to the Ising formulation exists. A well-known example of such problems is the Travelling Salesman problem. The solver may also be used for problems in other fields, for example, in determining molecular similarity, for which work has been done to find a transformation of a graph similarity problem of graphical representations of molecules into a QUBO formulation. This work is described in Hernandez, Maritza, et al. “A quantum-inspired method for three-dimensional ligand-based virtual screening.” Journal of Chemical Information and Modeling 59.10 (2019): 4475-4485.
It will be appreciated that the above embodiments have been described by way of example. Other variants and applications of the disclosed techniques may become apparent to a person skilled in the art once given the disclosure of the concepts herein.
More generally, according to one aspect disclosed herein, there is provided a system for estimating values of a vector of variables that optimize a function, the function comprising a weighted sum of a plurality of terms, each term comprising a product of a corresponding subset of the variables from said vector and each term being weighted by a corresponding weight from a matrix of weights that models interactions between the variables; wherein the system comprises a plurality of parallel hardware channels arranged to operate simultaneously with one another, each arranged to model a contribution of a respective one of the variables to the function, each of the parallel channels comprising: a respective signal generator configured to generate a respective modelling signal having a modulated property modelling a value of the respective variable; a respective splitter arranged to supply an instance of the respective modelling signal to each of the parallel channels, each channel thus receiving a vector of signals modelling the vector of variables; respective interaction logic comprising a respective vector multiplier configured to multiply the received vector of signals by a respective vector of weights from the matrix of weights modelling an interaction between the respective variable and the vector of variables, the interaction logic thereby generating a respective feedback signal representing the contribution of the respective variable modelled by the respective channel; and a respective feedback path arranged to return the feedback signal to the respective signal generator, wherein the respective signal generator is configured to adapt the respective modelling signal in dependence on the feedback signal; wherein each channel, including the respective signal generator, splitter, interaction logic and feedback path in each channel, is implemented only using optical components and/or analogue electronic components.
In embodiments, each of the variables is binary.
In embodiments, each of the variables can take either a positive or a negative value, and the function comprises a Hamiltonian of the form:
where σ1 . . . N are the variables of the vector, Jij is the matrix of weights, i and j are indices for enumerating instances of the variables between 1 and N, and Σi,j represents a sum over all the subsets (σi, σj) of variables included in the function; wherein the system comprises a respective one of said channels for each i, and the respective vector multiplier is configured to perform the multiplication Σjσj·Jij, thus modelling the respective contribution of σi to the function; and wherein the signal generators are configured to optimize the function by performing the adaptation of the modelling signals so as to minimize an energy of the Hamiltonian.
In embodiments, the positive and negative values are +½ and −½, or +1 and −1.
In embodiments, each of the variables is Boolean, and the function comprises a quadratic unconstrained binary optimization, QUBO, problem of the form:
where v1 . . . N are the variables of the vector, Qij is the matrix of weights, i and j are indices for enumerating instances of the variables between 1 and N, and Σi,j represents a sum over all the subsets (vi, vj) of variables included in the function; wherein the system comprises a respective one of said channels for each i, and the respective vector-by-matrix multiplier is configured to perform the multiplication Σjvj·Qij, thus modelling the respective contribution of vi to the function; and wherein the signal generators are configured to optimize the function by performing the adaptation of the modelling signals so as to minimize said function.
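The relationship between the Boolean (QUBO) and positive/negative (Ising) formulations can be checked with a small worked example. The sketch below assumes the QUBO objective Σi,j Qij·vi·vj with vi in {0, 1} and the substitution vi = (1 + σi)/2. The sign convention, the linear field term `h` and the constant `offset` are choices made here for illustration; none of the function names come from the text.

```python
from itertools import product


def qubo_value(Q, v):
    """QUBO objective: sum over i, j of Q[i][j] * v[i] * v[j], with v[i] in {0, 1}."""
    n = len(v)
    return sum(Q[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def qubo_to_ising(Q):
    """Coefficients (J, h, offset) obtained by substituting v = (1 + s) / 2."""
    n = len(Q)
    # Off-diagonal couplings; the diagonal folds into the constant because s*s = 1.
    J = [[0.0 if i == j else Q[i][j] / 4.0 for j in range(n)] for i in range(n)]
    h = [sum(Q[i][j] + Q[j][i] for j in range(n)) / 4.0 for i in range(n)]
    offset = sum(Q[i][j] for i in range(n) for j in range(n)) / 4.0
    offset += sum(Q[i][i] for i in range(n)) / 4.0
    return J, h, offset


def ising_value(J, h, offset, s):
    """Ising-form objective with s[i] in {-1, +1}."""
    n = len(s)
    quad = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return quad + sum(h[i] * s[i] for i in range(n)) + offset


def all_assignments_agree(Q, tol=1e-12):
    """Compare both forms on every Boolean assignment (small n only)."""
    J, h, offset = qubo_to_ising(Q)
    for v in product([0, 1], repeat=len(Q)):
        s = [2 * b - 1 for b in v]
        if abs(qubo_value(Q, v) - ising_value(J, h, offset, s)) > tol:
            return False
    return True


Q = [[1.0, -2.0, 0.0],
     [0.0, 3.0, 1.0],
     [0.0, 0.0, -1.0]]
agree = all_assignments_agree(Q)
```

The exhaustive comparison is only practical for a handful of variables; it serves here to confirm the algebra, not as a solving method.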
In embodiments, in each channel the respective feedback path is arranged to introduce a respective noise component into the respective feedback signal before return to the respective signal generator.
In embodiments, the modulated property of each modelling signal used to model the value of the respective variable comprises one of: an amplitude of the signal, or a phase of the signal.
In embodiments, in each channel the respective modelling signal generated by the respective signal generator comprises a light signal, and wherein the respective vector multiplier comprises an optical multiplier configured to perform its respective multiplication in the optical domain.
In embodiments, the light signal generated by each respective signal generator has a different respective optical wavelength.
In embodiments, the optical multiplier in each vector multiplier comprises one of:
In embodiments, each interaction logic further comprises a respective light detector arranged to detect a light output of the respective optical multiplier for producing the feedback signal to the respective signal generator in analogue electronic form.
In embodiments, each respective signal generator comprises: a respective light source; a respective spin generator configured to generate a respective spin, in the form of an analogue electronic signal, in dependence on the respective feedback signal; and a respective modulator arranged to modulate the respective spin into the modulated property of the respective light signal.
In embodiments, the respective spin generator in each channel comprises a further light source, a further modulator arranged to modulate light from the further light source in dependence on the respective feedback signal, and a further light detector arranged to detect the modulated light from the further modulator and generate the spin in dependence thereon.
In embodiments, in each channel: the property of each signal used to model the value of the respective variable comprises: an amplitude of the signal; each respective spin generator is arranged to generate the respective spin in the form of an analogue electronic signal with amplitude modulated between positive and negative levels to represent the spin; the respective modulator is configured to modulate the spin into the amplitude of the respective light signal, wherein the amplitude of the light signal and the output of the respective optical multiplier can only take positive levels, not negative levels; the respective light detector of the interaction logic in each channel is configured to detect the output of the respective optical multiplier by incoherent detection, thereby generating a respective reading; and the respective interaction logic in each channel is configured to add a DC offset to the respective reading from the respective light detector in order to produce the respective feedback signal modulated between positive and negative levels.
According to another aspect disclosed herein, there is provided a method for estimating values of a vector of variables that optimize a function, the function comprising a weighted sum of a plurality of terms, each term comprising a product of a corresponding subset of the variables from said vector and each term being weighted by a corresponding weight from a matrix of weights that models interactions between the variables; the method comprising, at each of a plurality of parallel hardware channels: generating, by a respective signal generator, a respective modelling signal having a modulated property modelling a value of a respective variable; supplying an instance of the respective modelling signal to each of the parallel channels, each channel thus receiving a vector of signals modelling the vector of variables; multiplying, at respective interaction logic, the received vector of signals by a respective vector of weights from the matrix of weights modelling an interaction between the respective variable and the vector of variables, thereby generating a respective feedback signal representing the contribution of the respective variable modelled by the respective channel; returning the feedback signal to the respective signal generator; and adapting, by the respective signal generator, the respective modelling signal in dependence on the feedback signal; wherein the method is implemented only using optical and/or analogue electronic hardware.
In embodiments the method may further comprise steps in accordance with any of the system features disclosed herein.
Number | Date | Country | Kind |
---|---|---|---|
21155447.2 | Feb 2021 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/014172 | 1/28/2022 | WO |