The present invention relates to a signal processing method in a neural network. The teachings of the present invention may be used to implement fast computation in physical substrates with slow components. The present invention also relates to a computer program product comprising instructions for implementing the steps of the method, as well as to a neural network configured to carry out the proposed method.
Physical systems composed of large collections of simple, but intricately connected elements can exhibit powerful collective computational properties. Animals' nervous systems, and most prominently the human brain, are prime examples. The brain's computational prowess has motivated a large, cross-disciplinary and ongoing endeavour to emulate aspects of its structure and dynamics in artificial substrates, with the aim of ultimately being able to replicate its function. The speed of information processing in such a system depends on the response time of its components; for neurons, for example, this can be the integration time scale determined by their membrane time constant.
If we consider hierarchically organised neural networks composed of such elements, each layer in the hierarchy causes a response lag with respect to a changing stimulus. This lag introduces two related critical issues. For one, the inference speed in these systems decreases with their depth. This in turn induces timing mismatches between instructive signals and neural activity, which disrupts learning. For example, recent proposals for bio-plausible implementations of error backpropagation (BP) in the brain require some form of relaxation, both for inference and during learning. Notably, this also affects some purely algorithmic methods involving auxiliary variables. To deal with this inherent property of physical dynamical systems, two approaches have been suggested: either phased plasticity that is active only following a certain relaxation period, or long stimulus presentation times with small learning rates. Both of these solutions entail significant drawbacks: the former is challenging to implement in asynchronous, distributed systems, such as cortical networks or neuromorphic hardware, while the latter results, by construction, in slow learning. This has prompted the critique that any algorithm requiring such a settling process is too slow to describe complex brain function, particularly when involving real-time responses. To the best of our knowledge this fundamental problem affects all modern models of approximate BP in biological substrates.
The present invention aims to overcome at least some of the above-identified problems. More specifically, to overcome the above problems, it proposes a novel signal processing framework in neural networks that allows fast computation and learning in physical substrates with slow components.
According to a first aspect of the invention, a method of processing signals in a neural network is provided, as recited in claim 1.
The speed of information processing in physical systems depends crucially on the reaction speed of their microscopic components; for example, springs need time to decompress, or capacitors need time to charge. In analogue neuronal networks, both biological and bio-inspired/neuromorphic, neurons (which operate similarly to capacitors) need time to accumulate and react to stimuli. In many information processing scenarios, such as deep neuronal networks, input needs multiple stages of processing. This is usually achieved by consecutive layers of neurons, so at every stage of processing, the delays induced by “slow” neurons accumulate. This slows down inference and, much more problematically, disrupts learning. The present invention provides a framework that effectively alleviates this problem. We can endow neurons with a mechanism that speeds up their reaction by guessing their future state at any point in time. In principle, this allows arbitrarily slow neurons to respond arbitrarily quickly to inputs, making local information processing effectively instantaneous.
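By way of illustration only, the following sketch shows the core idea in a few lines of Python; the leaky-integrator form, the step input and all constants are merely illustrative assumptions and not part of the claimed method. The neuronal output is read out from the prospective potential u+τu̇ rather than from the membrane potential u itself, and therefore follows a step input immediately, while u still relaxes slowly:

```python
import numpy as np

tau, dt, T = 10e-3, 1e-4, 0.05            # membrane time constant, time step, duration
u = 0.0
for k in range(int(T / dt)):
    I = 1.0 if k * dt > 0.01 else 0.0     # step input switched on at t = 10 ms
    du = (-u + I) / tau                   # leaky-integrator dynamics: tau*du/dt = -u + I
    u_breve = u + tau * du                # prospective potential; here it equals I exactly
    u += dt * du                          # the membrane itself still relaxes slowly
print(f"membrane u = {u:.3f} (still relaxing), prospective output = {u_breve:.3f}")
```

For these dynamics the prospective potential u+τu̇ equals the instantaneous input exactly, so the printed prospective value is already at 1.0 while the membrane potential is still approaching it.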
Note that we differentiate explicitly between the processing/computation speed of individual units (or neurons) and the communication speed between them. Our proposed method refers to the former. The latter is always finite and constrained by physics, i.e. by the speed with which signals can travel between units, which is ultimately limited by the speed of light.
In addition to the above-mentioned mechanism, two additional components are developed according to the present invention. The first one is a full microscopic (i.e., at the level of individual neurons and synapses) realisation of inference and learning using a network structure inspired by cortex that is amenable to extremely efficient implementation in neuromorphic hardware. The second is a way to alleviate the effects of substrate variability and temporal noise in analogue circuits by neuronal adaptation and synaptic filtering; both of these are local computations carried out by simple components. Such robustness to spatiotemporal noise is an important prerequisite for a viable realisation in silico.
The proposed solution also provides a biologically plausible approximation of BP in deep cortical networks with continuous-time, leaky neuronal dynamics and local, continuous plasticity. Moreover, the proposed model is easy to implement in both software and hardware and is well-suited for distributed, asynchronous systems.
In the proposed framework, inference can be arbitrarily fast (up to finite simulation resolution or finite communication speed across physical distances) despite a finite response time of individual system components; downstream responses to input changes thus become effectively instantaneous. Conversely, responses to instructive top-down input that generate local error signals are also near-instantaneous, thus effectively removing the need for any relaxation phase. This allows truly phase-free learning from signals that change on much faster time scales than the response speed of individual network components. Similarly to some other approaches, the present method derives neuron and synapse dynamics from a joint energy function. However, the energy function according to the present invention is designed to effectively disentangle these dynamics, thus removing the disruptive co-dependencies that otherwise arise during relaxation. This is achieved by introducing a simple but important new ingredient: neuronal outputs that try to predict their future state based on their current information, a property that in the present description is referred to as “prospective”. Thereby, the proposed framework also establishes an intimate relationship between such “slow” neuronal networks and artificial neural networks (ANNs), thus enabling the application of various auxiliary methods from deep learning. To differentiate between biologically plausible, leaky neurons and abstract neurons with instantaneous response, we respectively use the terms “neuronal” and “neural”.
According to a second aspect of the invention, a computer program product is provided, comprising instructions for implementing the steps of the method according to the first aspect of the present invention when loaded and run on the computing means of an electronic device.
According to a third aspect of the invention, a neural network is provided, which is configured to implement the method according to the first aspect of the present invention.
Other aspects of the invention are recited in the dependent claims attached hereto.
The invention will now be described in more detail with reference to the attached drawings, in which:
It should be noted that the figures are provided merely as an aid to understanding the principles underlying the invention, and should not be taken as limiting the scope of protection sought. Where the same reference numbers are used in different figures, these are intended to indicate similar or corresponding features. It should not be assumed, however, that the use of different reference numbers is intended to indicate any particular degree of difference between the features to which they refer. As used herein, unless otherwise specified, the use of the ordinal adjectives “first”, “second”, “third”, etc. to describe a common object merely indicates that different instances of like or different objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
The teachings of the present invention may be implemented in a neural network comprising a set of neurons. An example neuron structure is illustrated in
To illustrate the issue with relaxation, we consider a simple network of two model neurons arranged in a chain as shown in
Besides slow inference, this delayed response leads to critical issues during learning from downstream instructive signals. Consider the typical scenario where such a target signal is present in the output layer and plasticity is continuously active (and not phased according to some complicated schedule). If the system fulfils its task correctly, the target signal corresponds to the output of the relaxed system and no learning should take place. However, due to delays in neuronal responses, the output signal differs from the target during relaxation, which causes plasticity to adapt synaptic weights in an effort to better match the target. As a result, the system “overshoots” during early relaxation and has to undo these synaptic changes through further learning in the late relaxation phase (
Inspired by the canonical coordinates from classical mechanics, we describe the state of a neuronal network by its position in phase space $(u, \dot{u})$, with the generalised position $u$ and the generalised momentum $\dot{u}$. The relation between these two components describes the physics of the system. To obtain the leaky integrator dynamics that characterise biological neurons, we first define the abstract network state $\breve{u}^m$ as $\breve{u}^m := u + \tau_m\,\dot{u}$ (Equation 1), with the membrane time constant $\tau_m$.
We next define an energy function $E$ that characterises this state: $E := \frac{1}{2}\,\big\|\breve{u}^m - W\varphi(\breve{u}^m) - b\big\|^2 + \beta\mathcal{L}$ (Equation 2).
This energy represents a global view of the network from which neuron and weight dynamics will be derived. Here, $W$ and $b$ represent the weight matrix and bias vector, respectively, $\varphi$ is the neuronal activation function and the loss $\mathcal{L}$ is scaled by a constant $\beta$. Bold font is used for matrices, vectors, and vector-valued functions. This formulation of the energy function is very generic and can, by appropriate choice of parameters, apply to different network topologies, including multilayer perceptrons, convolutional or recurrent networks. It is to be noted that this energy can be written as a neuron-wise sum over mismatch energies, $E = \sum_i E_i + \beta\mathcal{L}$ with $E_i = \frac{1}{2}\,\big[\breve{u}^m_i - W_i\, r_{i,\text{pre}} - b_i\big]^2$,
where $r_{i,\text{pre}} := \varphi(\breve{u}^m_{i,\text{pre}})$ is the presynaptic input vector for the $i$th neuron in the network.
We can now derive neuronal dynamics as extrema of the energy function from Equation 2: $\tau_m\,\dot{u} = -u + W\varphi(\breve{u}^m) + b + e$ (Equation 3),
with presynaptic bottom-up input $W\varphi(\breve{u}^m)$ and top-down error signals $e = \varphi'(\breve{u}^m)\odot W^T\big[\breve{u}^m - W\varphi(\breve{u}^m) - b\big]$ (Equation 4)
for hidden neurons, and $e = -\beta\,\nabla_{\breve{u}}\mathcal{L}$ for neurons which directly contribute to the loss. Using Equation 3, it is easy to see that in hierarchical networks these errors can be expressed recursively over layers as
$e_\ell = \varphi'(\breve{u}^m_\ell)\odot W_{\ell+1}^T\, e_{\ell+1}$,
thus instantiating a variant of BP. Equation 3 can be interpreted as the dynamics of a structured pyramidal neuron receiving presynaptic, bottom-up input via its basal dendrites and top-down input via its apical tree. We later provide a more detailed description of our model's biophysical implementation. We note that the proposed approach contrasts with previous work that introduced neuron dynamics via gradient descent on an energy function. Indeed, this difference is crucial for solving the relaxation problem, as discussed below. Since for a given input our network moves, by construction, within a constant-energy manifold, we refer to our model as latent equilibrium (LE).
We can now revisit our choice of $\breve{u}^m$ from a functional point of view. Instead of the classical output rate $\varphi(u)$, our neurons fire with $\varphi(\breve{u}^m)$, which depends on both $u$ and $\dot{u}$ (Equation 1). This represents a known, though often neglected feature of biological neurons and has also been considered in other models of bio-plausible BP derived from a stationary action. As neuron membranes are low-pass filters (Equation 3), $\breve{u}^m$ can be viewed as a prospective version of $u$: when firing, the neuron tries to look into the future and predict the state of its membrane potential after relaxation. The prospective nature of $\breve{u}^m$ also holds in a strict mathematical sense: the breve operator $\breve{(\cdot)}^m := (1 + \tau_m\,d/dt)$ is the exact inverse of an exponential low-pass filter. While neuronal membranes continue to relax slowly towards their steady states, neuronal outputs use membrane momenta to compute a correct instantaneous reaction to their inputs, even in the case of jumps (
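The exact-inverse property of the breve operator can be checked numerically; the following sketch (with an arbitrary test signal and illustrative constants of our choosing) low-pass filters a signal and then applies $(1 + \tau_m\,d/dt)$ to the result, recovering the original signal up to discretisation error:

```python
import numpy as np

tau, dt = 10e-3, 1e-5
t = np.arange(0, 0.1, dt)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)  # test signal

y = np.zeros_like(x)                        # exponential low-pass filter of x
for k in range(1, len(t)):
    y[k] = y[k - 1] + dt * (x[k - 1] - y[k - 1]) / tau

x_rec = y + tau * np.gradient(y, dt)        # breve operator applied to the filtered signal
print("max reconstruction error:", np.max(np.abs(x_rec[10:-10] - x[10:-10])))
```

The printed error is on the order of the integration step, confirming that the prospective operation undoes the neuronal low-pass filter.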
It is to be noted that, even for functionally feedforward networks, our resulting network structure is recurrent, with backward coupling induced by error inputs to the apical tree. As a non-linear recurrent network, it cannot settle instantaneously into the correct state; rather, in numerical simulations, it jumps quickly towards an estimated stationary activity state and reaches equilibrium within several such jumps (of infinitesimal duration). In practice, saturating activation functions can help avoid pathological behaviour under strong coupling. Moreover, we can introduce a very short exponential low-pass filter τs on top-down signals, slightly larger than the temporal resolution of the simulation. Thus, in physical systems operating in continuous time, τs can effectively become infinitesimal as well and does not affect the speed of information propagation through the network. In particular, as explained below, the perpetual consistency between input and output allows our model to continuously learn to reduce the loss, obviating the need for network relaxation phases and the associated global control of precisely timed plasticity mechanisms.
Based on our prospective energy function (Equation 2), we define synaptic weight dynamics, i.e., learning, as time-continuous stochastic gradient descent on $E$ with learning rate $\eta_W$: $\dot{W}_i = -\eta_W\,\frac{\partial E}{\partial W_i} = \eta_W\,\big[\breve{u}^m_i - W_i\, r_{i,\text{pre}} - b_i\big]\, r_{i,\text{pre}}^T$ (Equation 5).
Thus, weights evolve continuously in time, driven by local error signals, without requiring any particular schedule. Neuronal biases are adapted according to the same principle. It is to be noted that this rule only uses quantities that are available at the locus of the synapse (as also explained later). Intuitively, this locality is enabled by the recurrent nature of the network: errors in the output units spread throughout the system, attributing credit locally through changes in neuronal membrane potentials. These changes are then used by synapses to update their weight in order to reduce the network loss. However, our learning rule is not an exact replica of BP, although it does approximate it in the limit of infinitely weak supervision $\beta \to 0$ (often referred to as “nudging”); strictly speaking, it minimises the energy function $E$, which implicitly minimises the loss $\mathcal{L}$. This form of credit assignment can be related to previous models which similarly avoid a separate artificial backward pass (as necessary in classical BP) by allowing errors to influence neuronal activity. Plasticity in the weights projecting to output neurons depends on the choice of $\mathcal{L}$; for example, for an L2 loss, plasticity in the output layer corresponds to the classical delta rule: $\dot{W}_N = \eta\beta\,[r^*_N - r_N]\,r_{N-1}^T$.
Despite similarities to previous work, learning in our framework does not suffer from many of the shortcomings noted above. Since activity propagates quasi-instantaneously throughout the network, our plasticity can be continuously active without disrupting learning performance. This is true by construction and most easily visualised for a sequence of (piecewise constant) input patterns: following a change in the input, membrane dynamics take place in a constant-energy manifold (Equation 3) across which synaptic weight dynamics remain unchanged, i.e., they equally and simultaneously pull downwards on all points of this manifold. This disentangling of membrane and synaptic weight dynamics constitutes an important difference to previous work, where the neuronal mismatch energies Ei change as dynamics evolve and thus cannot represent the true errors in the network before reaching a fixed point. We further note that LE also alleviates a second form of catastrophic forgetting in these other models: due to the misrepresentation of errors during relaxation, continuous learning changes synaptic weights even in perfectly trained networks (
An example physical implementation of the proposed system is explained in the following. Due to the simplicity of their implementation, the principles of LE can be applied to models of approximate BP in the brain in order to alleviate the issues discussed above. Here we demonstrate how a network of hierarchically organised dendritic microcircuits can make use of our theoretical advances to significantly increase both inference and training speed, thus removing several critical shortcomings towards its viability as a scalable model of cortical processing. The resulting dynamical system represents a detailed and biologically plausible version of BP, with real-time dynamics, and phase-free, continual local learning able to operate on effectively arbitrary sensory timescales.
The fundamental building block of the neural network 13 is a cortical microcircuit model consisting of pyramidal cells 15 or neurons and interneurons 17 as shown in
where i is the neuron index, El the leak potential, vibas and viapi the basal and apical membrane potentials, respectively, and gbas and gapi the dendro-somatic couplings. Due to the conductance-based interaction between compartments, the effective time constant of the soma is τeff:=Cm/(gl+gbas+gapi). For somatic membrane potentials, assuming that apical dendrites encode errors (see below) and basal dendrites represent the input, this corresponds directly to Equation 3. Plasticity in basal synapses is driven by the local error signal given by the discrepancy between the somatic and dendritic membrane potentials uisom and vibas:
which is analogous to Equation 5 up to a monotonic transformation on the voltages.
In this architecture, plasticity serves two purposes. For pyramidal-to-pyramidal feedforward synapses 9, it implements error-correcting learning as a time-continuous approximation of BP. For pyramidal-to-interneuron synapses 9, it drives interneurons 17 to mimic their pyramidal partners 15 in the layers above. Thus, in a well-trained network, apical compartments 5 of pyramidal cells 15 are at rest, reflecting zero error, as top-down and lateral inputs cancel out. When an output error propagates through the network 13, these two inputs can no longer cancel out and their difference represents the local error ei. This architecture does not rely on the transpose of the forward weight matrix, improving viability for implementation in distributed asynchronous systems. Here, we keep feedback weights fixed, realising a variant of feedback alignment. In principle, these weights could also be learned in order to further improve the local representation of errors.
Incorporating the principles of LE is straightforward and requires only that neurons fire prospectively: φ(u)→φ(ǔeff). While we have already addressed the evidence for prospective neuronal activity, we note that our plasticity also uses these prospective signals, which constitutes an interesting prediction for future in-vivo experiments.
The proposed microcircuit model can learn perfect classification even for very short presentation times. In contrast, traditional models without the prospective mechanism stagnate at high error rates even for this simple task. This can be traced back to the learning process being disrupted during relaxation. Without prospective dynamics, traditional models require presentation times on the order of 100τeff to achieve perfect accuracy. In contrast, LE only degrades for presentation times below 0.1τeff, which is due to the limited resolution of our numerical integration method. Thus, incorporating LE into cortical microcircuits can bring the required presentation times into biologically plausible regimes, allowing networks to deal with rich sensory data.
Computer simulations often assume perfectly homogeneous parameters across the network. Models can hence inadvertently rely on this homogeneity, resulting in unpredictable behaviour and possibly fatal dysfunction when faced with the physics of analogue substrates which are characterised by both heterogeneity in their components as well as temporal perturbation of their dynamics. Therefore, we consider robustness to spatio-temporal noise to represent a necessary prerequisite for any mathematical model aspiring to physical implementation, be it biological or artificial.
Spatial noise reflects the individuality of cortical neurons or the heterogeneity arising from device mismatch in hardware. Here, we focus on the heterogeneity of time constants; in contrast to, for example, variability in synaptic parameters or activation functions, these variations cannot be “trained away” by adapting synaptic weights. The two time constants that govern neuron dynamics in our model, namely integration (Equation 3) and prospective coding (Equation 1), previously assumed to be identical, are affected independently by such variability. To differentiate between the two, we assign the prospective dynamics their own time constant: $\breve{u}^r := u + \tau_r\,\dot{u}$. We can now model heterogeneity as independent, multiplicative Gaussian noise on all time constants: $\tau_{m/r} \to (1 + \xi)\,\tau_{m/r}$, with $\xi \sim \mathcal{N}(0, \sigma_\tau^2)$; we use multiplicative noise to emphasise that our model is agnostic to absolute time scales, so only relative relationships between specific time constants matter.
Due to the resulting mismatches between the timing of neuronal input and output, neuronal outputs suffer from exponential transients, leading to relaxation issues similar to the ones we already addressed in detail. However, depending on the transmission of top-down signals, the effects on learning performance can be very different. According to the formal theory, backward errors use the correct prospective voltages: $e \propto [\breve{u}^m - W\varphi(\breve{u}^r)]$ (Equations 4 and 5); this allows robust learning even for relatively large perturbations in the forward signals. In contrast, in biophysical implementations such as the microcircuits discussed above, neurons can only transmit a single output signal $\varphi(\breve{u}^r_i)$, which consequently also affects the errors: $e \propto [\breve{u}^r - W\varphi(\breve{u}^r)]$. Since deviations due to a mismatch between integration and prospective time constants persist on the time scale of $\tau_m$, even small amounts of mismatch can lead to a significant loss in performance.
Here we address this issue by introducing a neuron-local adaptation mechanism that corrects the difference between the prospective voltages $\breve{u}^m$ and $\breve{u}^r$ induced by mismatched time constants: $\dot{\tau}_m = \eta_\tau\,\big[\breve{u}^r - u^{\text{bas}} - b\big]$ and $\dot{\tau}_r = -\eta_\tau\,\big[\breve{u}^r - u^{\text{bas}} - b\big]$, the time-continuous counterpart of the updates in step 111 of the flowchart described below.
In biological substrates, this could, for example, correspond to an adaptation of the transmembrane ion channel densities, whereas on many neuromorphic substrates, neuronal time constants can be adapted individually. Before training the network, we allow it to go through a “developmental phase” in which individual neurons are not affected by top-down errors and learn to match their prospective firing to their membrane dynamics, thus recovering the performance of networks with perfectly matched time constants. This developmental phase merely consists of presenting input samples to the network, for a duration that depends on the required matching precision. Here, we achieved mismatches below 1‰ within the equivalent of 20 training epochs. It is to be noted that, as observed, for example, in vivo, neuronal time constants remain diverse after this phase, but are matched in a way that enables fast reaction of downstream areas to sensory stimuli—an ability that is certainly helpful for survival.
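The following is a minimal sketch of such a developmental phase for a single neuron. It is to be noted that, for brevity, only the prospective time constant is adapted here, and a gradient-style rule of our own construction (modulating the mismatch by the membrane derivative) is used for illustration; the method's own neuron-local updates are the ones given in step 111 of the flowchart below:

```python
import numpy as np

dt, eta = 1e-4, 1e-2
tau_m = 12e-3                   # fixed integration time constant
tau_r = 6e-3                    # mismatched prospective time constant, to be adapted
u = 0.0
for k in range(50000):                      # 5 s of simulated "development"
    s = np.sin(2 * np.pi * 3 * k * dt)      # sensory drive, no top-down errors
    du = (-u + s) / tau_m
    err = (u + tau_r * du) - s              # mismatch between prospective output and input
    tau_r -= dt * eta * err * du            # hypothetical gradient-style adaptation rule
    u += dt * du
print(f"adapted tau_r = {tau_r*1e3:.2f} ms (target tau_m = {tau_m*1e3:.1f} ms)")
```

After this phase the prospective time constant has converged to the integration time constant, so the neuron's prospective firing again matches its membrane dynamics.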
We next model additive temporal noise on neuronal outputs, as might be induced, for example, by noisy transmission channels: $r \to r + \xi$, with $\xi \sim \mathcal{N}(0, \sigma_r^2)$. Formally, this turns membrane dynamics into a Langevin process. These perturbations add up over consecutive layers and can also accumulate over time due to recurrent interactions in the network. This introduces noise to the weight updates that can impair learning performance. Due to their slow integration of inputs, traditional neuron models filter out this noise, but our prospective mechanism effectively removes this filter. We thus optionally use an additional denoising mechanism, for which we again turn to biology: by introducing synaptic filtering with a short time constant $\tau_s$, synaptic input is denoised before reaching the membrane; formally, $r = \varphi(\breve{u}^m)$ is replaced by a low-pass-filtered
Networks equipped with synaptic filters learn reliably even in the presence of significant noise levels. However, introducing this filter affects the quasi-instantaneity of computation in our networks, which then require longer input presentation times. Even so, these presentation times need only be “long” with respect to the characteristic time constant of relaxation mismatches—in this case, $\tau_s$. Thus, for the typical scenario of white noise described above, minuscule $\tau_s$ on and even below biological time scales can achieve effective denoising, without significantly affecting the advantages conferred by prospective coding. In conclusion, the demonstrated robustness of our model to spatial and temporal substrate imperfections, achieved by simple, biologically inspired mechanisms, makes it a promising candidate for implementation in analogue physical systems.
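A minimal sketch of the synaptic filter acting on a noisy, otherwise constant rate is given below; the noise level and the filter time constant are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, tau_s, sigma = 1e-4, 1e-3, 0.3
s, trace = 0.0, []
for _ in range(20000):
    r_noisy = 1.0 + sigma * rng.standard_normal()   # noisy transmission of a unit rate
    s += dt * (-s + r_noisy) / tau_s                # synaptic low-pass filter
    trace.append(s)
print(f"filtered input: mean {np.mean(trace[5000:]):.3f}, std {np.std(trace[5000:]):.3f}")
```

The filtered input fluctuates with a standard deviation reduced by roughly the square root of 2τs/Δt (here from 0.3 to about 0.07), while its mean is unchanged.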
A new framework was introduced above for inference and learning in physical systems composed of computational elements with finite response times. Our model rests on four simple axioms: prospective coding (Equation 1), neuronal mismatch energy (Equation 2), energy conservation under neuronal dynamics (Equation 3) and gradient descent on the energy under synaptic plasticity (Equation 5). In particular, incorporating the simple, biologically inspired mechanism of prospective coding allows us to avoid critical issues and scalability bottlenecks inherent to many current models of approximate BP in cortex. Furthermore, the resulting implementations can be made robust to substrate imperfections, a prerequisite for deployment in analogue neuronal systems, both biological and artificial.
Our framework carries implications both for neuroscience and for the design of neuromorphic hardware. The prospective mechanism described here would allow biological circuits to respond much faster than previously assumed. Furthermore, our framework suggests that both inference and learning take place on prospective, rather than instantaneous neuronal quantities. From a hardware perspective, this lifts the previously perceived limitations of slow analogue components (as compared to digital ones) without relinquishing their power efficiency.
The flowchart of
In step 101, the synaptic inputs (i.e. the synaptic input signals) are updated, i.e. s(t)←W(t−Δt)r(t−Δt). In other words, the synaptic inputs are updated so that they become equal to a weighted sum over neuronal outputs (i.e. neuronal output signals). Thus, these updated synaptic inputs for a given neuron depend on a synaptic weight vector multiplied by a corresponding presynaptic signal vector. Optionally, if in this step a synaptic filter is applied, e.g. according to the teachings of section “Robustness to substrate imperfections”, then the synaptic inputs follow the Euler update s(t)←s(t−Δt)+Δt·(τs)−1·[−s(t−Δt)+W(t−Δt)r(t−Δt)]. Multiple operations in this step can be carried out in parallel. It is to be noted that when a signal, an item or any value more generally is updated, the outcome is an updated signal, item or value even if this is not explicitly stated in this context.
In step 103, apical and basal potentials of the neurons 1 are updated. More specifically, the apical potentials are updated either using target values for a set of output neurons or by backpropagating the neuronal outputs from all other neurons. The basal potentials are updated using the updated synaptic inputs. In other words, ua(t)←etgt(t) if k∈O, where tgt denotes the target, O denotes the set of output neurons, and ua(t)←φ′(ŭr(t−Δt))·WT(t−Δt) [ŭr(t−Δt)−s(t)] otherwise, and ub(t)←s(t). Multiple operations in this step can be carried out in parallel. One apical potential and one basal potential are updated per neuron. It is to be noted that the teachings of the present invention are also applicable even if only one or two of the three update processes takes place in steps 101 and 103.
In step 105, potential differentials of the neurons 1 are updated (one potential differential per neuron). More specifically, potential differentials are updated using a filter function ℱ of their previous values, the apical and basal potentials and optionally any other neuron parameters. In other words, Δu(t)←ℱ(u(t−Δt), ua(t), ub(t) | W(t−Δt), b(t−Δt), τm(t−Δt), . . . ). If no apical and/or basal potentials are updated in step 103, then the respective potential differential is updated by using at least a respective updated synaptic input signal.
In step 107, somatic potentials of the neurons 1 (one somatic potential per somatic compartment, i.e. per neuron) are updated. More specifically, the somatic potentials are updated via an Euler step using their previous values and their differential, in other words u(t)←u(t−Δt)+Δt Δu(t). Multiple operations in this step can be carried out in parallel.
In step 109, prospective potentials of the neurons (one prospective potential per neuron) are updated, which is also one of the main contributions of the present invention. More specifically, the prospective potentials are updated using an inverse filter function ℱ−1 of the somatic potentials, their potential differentials and optionally any other neuron parameters. This means that ŭr(t)←ℱ−1(u(t), Δu(t) | τr(t−Δt), . . . ).
Then optionally in step 111 and before training, time constants are updated for a subset of neurons or all of the neurons. More specifically, prospective and/or membrane time constants for a subset of neurons are updated using their previous values and the difference between the prospective potentials and the sum of basal potentials and neuronal biases. In other words, for membrane time constants τm(t)←τm(t−Δt)+Δt ητ·[ŭr(t)−ub(t)−b(t−Δt)], and for prospective time constants τr(t)←τr(t−Δt)−Δt ητ·[ŭr(t)−ub(t)−b(t−Δt)]. One aim would be to make the time constants τm(t) and τr(t) equal or substantially equal. Multiple operations in this step can be carried out in parallel.
In step 113, neuronal outputs (in this example one output signal per neuron) are updated. More specifically, neuronal outputs are updated as nonlinear functions of the prospective potentials. Accordingly, in this step, r(t)←φ(ŭr(t)). Multiple operations in this step can be carried out in parallel.
As soon as the above steps have been carried out for all the neurons or for the desired number of neurons, then the process can be terminated. Optionally, the process may be continued so that in step 115 synaptic plasticity is implemented. More specifically, the synaptic weights (one weight per synapse) and neuronal biases (one bias per neuron) are updated using their previous values, their local dendritic potentials and their neuron's somatic potentials and/or output. For example, the synaptic weights can now be updated during training as follows: W(t)←W(t−Δt)+Δt ηW·ua(t) rT(t) (the apical potential being the difference between the somatic and the basal potentials). Similarly, the neuronal biases can now be updated during training: b(t)←b(t−Δt)+Δt ηb·ua(t). Multiple operations in this step can be carried out in parallel.
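A compact, runnable sketch of how steps 101 to 115 may interlock is given below. It is emphasised that this is only one possible reading of the flowchart: the layer sizes, the tanh activation, the L2-type target nudging via β and all constants are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, tau_m, eta_W, beta = 1e-4, 10e-3, 5.0, 1.0
sizes = [2, 4, 1]                                        # input, hidden, output widths
W = [rng.normal(0, 0.5, (sizes[l + 1], sizes[l])) for l in range(2)]
b = [np.zeros(sizes[l + 1]) for l in range(2)]
u = [np.zeros(sizes[l + 1]) for l in range(2)]
phi, dphi = np.tanh, lambda v: 1 - np.tanh(v) ** 2

x = np.array([1.0, -1.0])                                # fixed input pattern
r_star = np.array([0.5])                                 # target for the output neuron
r, u_breve = [phi(u[0]), phi(u[1])], [u[0].copy(), u[1].copy()]

for step in range(30000):                                # 3 s of simulated time
    s = [W[0] @ x + b[0], W[1] @ r[0] + b[1]]            # step 101: synaptic inputs
    # step 103: apical error potentials (backpropagated error and output nudging)
    ua = [dphi(u_breve[0]) * (W[1].T @ (u_breve[1] - s[1])),
          beta * dphi(u_breve[1]) * (r_star - r[1])]
    du = [(-u[l] + s[l] + ua[l]) / tau_m for l in range(2)]   # step 105
    u = [u[l] + dt * du[l] for l in range(2)]                 # step 107: Euler step
    u_breve = [u[l] + tau_m * du[l] for l in range(2)]        # step 109: prospective
    r = [phi(u_breve[0]), phi(u_breve[1])]                    # step 113: outputs
    W[0] += dt * eta_W * np.outer(ua[0], x)                   # step 115: plasticity,
    W[1] += dt * eta_W * np.outer(ua[1], r[0])                # continuously active
    b[0] += dt * eta_W * ua[0]
    b[1] += dt * eta_W * ua[1]

print("output:", r[1], "target:", r_star)                # output approaches the target
```

Because the outputs are computed from prospective potentials, the network tracks its self-consistent state at every time step, and plasticity can remain active throughout without the overshoot-and-undo cycle described earlier.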
The flowchart of
According to a second special case, the network uses harmonic oscillators. In this case, step 105 can be divided into two sub-steps. More specifically, first the 2nd order potential differentials are updated: Δ2u(t)←−ku(t−Δt)+ub(t)+ua(t)+b(t−Δt). Subsequently, the 1st order potential differentials are updated: Δu(t)←Δu(t−Δt)+Δt Δ2u(t). According to the second special case, the prospective potentials are updated in step 109 in the following manner: ŭr(t)←k·u(t)+Δ2u(t).
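The second special case can be illustrated with the following sketch for a single undamped node under a constant drive; all constants are illustrative. The membrane oscillates indefinitely, yet the prospective potential k·u+Δ²u matches the drive at every time step (up to the integration step):

```python
dt, k, I = 1e-4, 1.0, 1.0        # time step, stiffness, constant drive (basal+apical+bias)
u, du = 0.0, 0.0
for _ in range(100000):          # 10 s of simulated time
    d2u = -k * u + I             # step 105, first sub-step: 2nd order differential
    du += dt * d2u               # step 105, second sub-step: 1st order differential
    u += dt * du                 # step 107: Euler step on the potential
    u_breve = k * u + d2u        # step 109: prospective potential
print(f"membrane u = {u:.3f} (oscillating), prospective = {u_breve:.3f}")
```

The printed prospective value is 1.0, the value of the drive, although the membrane potential itself never settles.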
According to a third special case, the network uses neuronal cables. According to this case, the potential differentials are updated in step 105 as follows: Δu(t)←[τm(t−Δt)]−1·[−u(t−Δt)+ub(t)+ua(t)+b(t−Δt)−λim(t)]. Furthermore, according to the third special case, the prospective potentials are updated in step 109 as follows: ŭr(t)←u(t)+τr(t−Δt)·Δu(t)−λim(t).
An example process of running the proposed method in a cortical microcircuit is next explained in more detail with reference to
In step 203, dendritic potentials (one potential per compartment) are updated. More specifically, pyramidal apical potentials are updated either using prospective target values for a set of output neurons or using the pyramidal apical synaptic input otherwise, upa(t)←ŭtgtr(t) if k∈O, where O denotes the set of output neurons, upa(t)←spa(t) otherwise. Pyramidal basal potentials are updated using the pyramidal basal synaptic input, upb(t)←spb(t). Interneuron basal potentials are updated using the interneuron basal synaptic input, uib(t)←sib(t). Multiple operations in this step can be carried out in parallel.
In step 205, steady-state potentials (one steady-state potential per neuron) are updated. More specifically, pyramidal steady-state potentials are updated using a convex combination of their leak, basal and apical potentials, weighted by their respective conductances, upeff(t)←[gl·El+gb·upb(t)+ga·upa(t)]/(gl+gb+ga). Interneuron steady-state potentials are updated using a convex combination of their leak, basal and nudging potentials, weighted by their respective conductances, uieff(t)←[gl·El+gb·uib(t)+gnudge·ŭpr(t−Δt)]/(gl+gb+gnudge). Multiple operations in this step can be carried out in parallel.
In step 207, potential differentials (one potential differential per neuron) are updated. More specifically, pyramidal and interneuron potential differentials are updated using the difference between their steady-state potentials and the previous values of their somatic potentials, their total conductance and their membrane capacitance. In other words, pyramidal differentials are updated as follows: Δup(t)←[upeff(t)−up(t−Δt)]·(gl+gb+ga)/Cm, while the interneuron potential differentials are updated as follows: Δui(t)←[uieff(t)−ui(t−Δt)]·(gl+gb+gnudge)/Cm.
In step 209, somatic potentials (one somatic potential per somatic compartment) are updated. More specifically, pyramidal and interneuron somatic potentials are updated via an Euler step using their previous values and their differentials. In other words, pyramidal somatic potentials are updated as follows: up(t)←up (t−Δt)+Δt Δup(t), while the interneuron somatic potentials are updated as follows: ui(t)←ui(t−Δt)+Δt Δui(t). Multiple operations in this step can be carried out in parallel.
In step 211, prospective potentials (one prospective potential per neuron) are updated. More specifically, pyramidal and interneuron prospective potentials are updated using their somatic potentials, their prospective time constants and their potential differentials. In other words, the pyramidal prospective potentials are updated as follows: ŭpr(t)←up(t)+τpr(t−Δt)·Δup(t), while the interneuron prospective potentials are updated as follows: ŭir(t)←ui(t)+τir(t−Δt)·Δui(t).
In step 213, neuronal outputs (in this example one output signal per neuron) are updated. More specifically, pyramidal and interneuron outputs are updated as nonlinear functions of the prospective potentials. In other words, the pyramidal outputs are updated as follows: rp(t)←φ(ŭpr(t)), while the interneuron outputs are updated as follows: ri(t)←φ(ŭir(t)). Multiple operations in this step can be carried out in parallel.
In step 215, synaptic weights (one weight per synapse) are updated. More specifically, forward synaptic weights are updated using their previous values, the pyramidal outputs and the pyramidal basal potentials, W(t)←W(t−Δt)+Δt ηW·[rp(t)−φ(α·upb(t))]rpT(t). Pyramidal-to-interneuron synaptic weights are updated using their previous values, the pyramidal and interneuron outputs, and the interneuron basal potentials, Wip(t)←Wip(t−Δt)+Δt ηip·[ri(t)−φ(α·uib(t))]rpT(t). Interneuron-to-pyramidal synaptic weights are updated using their previous values, the interneuron outputs and the pyramidal apical potentials, Wpi(t)←Wpi(t−Δt)+Δt ηpi·[−upa(t)]riT(t).
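For concreteness, the conductance-based part of this loop (steps 205 to 211) is sketched below for a single pyramidal neuron with clamped dendritic potentials; the conductance values are illustrative assumptions:

```python
g_l, g_b, g_a, C_m, E_l, dt = 0.1, 1.0, 0.8, 1.0, 0.0, 1e-4
u_pb, u_pa = 0.7, 0.1                  # clamped basal and apical dendritic potentials
u_p = 0.0
for _ in range(10000):                 # 1 s of simulated time
    u_eff = (g_l * E_l + g_b * u_pb + g_a * u_pa) / (g_l + g_b + g_a)   # step 205
    du = (u_eff - u_p) * (g_l + g_b + g_a) / C_m                        # step 207
    u_p += dt * du                                                      # step 209
    tau_eff = C_m / (g_l + g_b + g_a)                                   # effective tau
    u_breve = u_p + tau_eff * du                                        # step 211
print(f"somatic u = {u_p:.3f}, steady state = {u_eff:.3f}, prospective = {u_breve:.3f}")
```

The prospective potential coincides with the steady-state potential from the first time step onwards, while the somatic potential is still relaxing towards it with the effective time constant τeff.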
The flowcharts described above have two interpretations—one as an algorithm, the other as dynamical equations for physical systems. The proposed algorithm is suitable for undoing neuronal filtering and enabling quasi-instantaneous information propagation, and this can be achieved in physical systems that obey our proposed dynamics. The method steps described above may be carried out by suitable circuits or circuitry when the process is implemented in hardware or using hardware for individual steps. However, the method may also be implemented in software using an artificial neural network comprising artificial neurons and artificial synapses. The terms “circuits” and “circuitry” refer to physical electronic components or modules (e.g. hardware), and any software and/or firmware (“code”) that may configure the hardware, be executed by the hardware, and/or otherwise be associated with the hardware. The circuits may thus be operable (i.e. configured) to carry out, or they comprise means for carrying out, the required method steps as described above.
Since the proposed hardware implementation combines state-of-the-art computational capabilities with very low power consumption and a high degree of robustness to noise, the teachings of the present invention may be used for a large variety of edge-computing scenarios where energy and power supplies are strongly limited, such as small drones, spacecraft and, in particular, neuroprosthetics and bio-monitoring devices, both implanted and wearable. The products could be chips that can be connected to sensors and learn to perform particular tasks on their output data stream in real-time, such as navigation or diagnostics. It is to be noted that one key problem of existing neuromorphic hardware is the following dilemma: gain speed (often replacing analogue by digital circuits) but sacrifice power efficiency, or reduce the power consumption (using analogue components, especially neurons) at the cost of reducing speed and increasing noise. It is believed that the proposed approach resolves this dilemma and can thus harness the best of both worlds on a mixed-signal substrate (analogue neuron and synapse dynamics, digital communication between neurons).
Some additional examples are briefly explained next, namely networks of harmonic oscillators, learning of backward weights, and temporal processing.
According to the case of networks of harmonic oscillators, each node in the network is a harmonic oscillator. As previously, we obtain the node dynamics from Dui=Ii, where Ii=uia+uib+bi and D is the differential operator of a damped harmonic oscillator, D=d²/dt²+2ζω0·d/dt+ω0².
Here ζ and ω0 are the damping ratio and the angular frequency constant, respectively. These determine the time scale τ=(πω0)−1 on which the oscillator reacts to input. The special case of the undamped harmonic oscillator is obtained for ζ=0.
To illustrate this system's behaviour, we consider the somatic potential u and its prospective version ǔ=Du in the case of a quickly changing stimulus. The first part of the stimulus, i.e. the external input Iext in
As far as the example of learning of backward weights is concerned, to support successful learning in hierarchical networks of many layers, backward weights B need to propagate errors correctly. In the ideal case, they correspond to the transpose of the forward weight matrix, B=WT. Under these conditions (and in the limit of weak target signals), the algorithm implements error backpropagation. However, in physical substrates with time-continuous dynamics, such a requirement is difficult to realise due to the locality of information.
Using the proposed method, it is possible to learn useful backward weights in dynamical systems. To achieve this, at each hidden layer, Ornstein-Uhlenbeck noise ξl is added to the somatic potential ul. Output rates, calculated from these noisy somatic potentials, are then transmitted to the subsequent layer. At every backward synapse Bl,l+1, a high-pass filtered rate is computed which extracts the noise signal. Backward weights are then adjusted by combining an alignment term derived from this high-pass filtered rate with a weight decay −αBl,l+1. This algorithm approximately aligns the backward weights with the transpose of the forward weights.
Fundamentally, this algorithm relies on a separation of signal and noise in frequency space. It is thus essential that all frequency components can propagate through the network uninhibited. Without the proposed method, each node in the network would introduce a low-pass filter, thus making the learning of backward weights via this algorithm infeasible.
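The following sketch illustrates the idea in a deliberately reduced, linear setting. Since the exact update term is not reproduced above, the rule used here, B += η·ξ·r̂ᵀ − αB with r̂ the high-pass filtered downstream rate, is our own assumption in the spirit of the described noise-correlation scheme, and the network carries only noise (no signal):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(0, 1, (3, 4))                 # forward weights (kept fixed here)
B = rng.normal(0, 1, (4, 3))                 # backward weights, to be aligned with W.T
dt, tau_ou, tau_hp, eta, alpha = 1e-3, 10e-3, 20e-3, 0.5, 0.05
xi, r_lp = np.zeros(4), np.zeros(3)
for _ in range(200000):
    xi += dt * (-xi / tau_ou) + np.sqrt(dt) * rng.standard_normal(4)  # OU noise
    r_next = W @ xi                          # downstream rates (signal omitted)
    r_lp += dt * (-r_lp + r_next) / tau_hp   # low-pass ...
    r_hp = r_next - r_lp                     # ... and high-pass filtered rate
    B += dt * (eta * np.outer(xi, r_hp) - alpha * B)  # assumed alignment rule
cos = np.sum(B * W.T) / (np.linalg.norm(B) * np.linalg.norm(W))
print(f"alignment cos(B, W.T) = {cos:.2f}")  # approaches 1 as B aligns with W.T
```

Correlating the injected noise with the high-pass filtered downstream response extracts, on average, the transpose of the forward weights, which the decay term then normalises.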
To illustrate the differences between optimal backward weights, fixed random backward weights and our algorithm for learning backward weights, we consider the learning of representations in the bottleneck of an autoencoder trained on hand-written digits. The bottleneck comprises only two units, whose activity we can visualise for all data samples. While optimal and learned backward weights lead to representations in which different digits are approximately separated, fixed random backward weights fail to produce such separated representations.
The case of temporal processing is explained next. The proposed method leads to instantaneous reactions of network activity to stimulus changes. This implies that network outputs are constrained to compute with instantaneous stimulus values only. In general, however, a task may require the network to make use of past and present stimulus information to determine its output.
We consider an extension of the proposed method in which we decouple the integration and prospective time constants, i.e., the differential operator becomes 1+τm·d/dt,
while the prospective membrane potential is computed as ǔ=u+τr·u̇. In the case of a pure sinusoidal stimulus driving a neuron with different integration and prospective time constants, the neuron's output experiences both a modulation of its amplitude and a phase shift compared to the stimulus. The phase shift illustrates that decoupling the integration and prospective time constants indeed allows the neuron to use delayed or advanced stimuli to compute its output. This enables a population of neurons with different ratios of integration and prospective time constants to transform temporal information in a stimulus into spatial information across the population. This decoupling of time constants thus extends the ability of the proposed method to signal processing in the temporal domain.
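This frequency-dependent phase shift can be observed in a few lines; the stimulus frequency and the two time constants below are illustrative assumptions:

```python
import numpy as np

dt, tau_m, tau_r, f = 1e-5, 10e-3, 2e-3, 20.0
t = np.arange(0, 0.5, dt)
x = np.sin(2 * np.pi * f * t)                       # pure sinusoidal stimulus
u = np.zeros_like(x)
for k in range(1, len(t)):                          # leaky integration with tau_m
    u[k] = u[k - 1] + dt * (x[k - 1] - u[k - 1]) / tau_m
u_breve = u + tau_r * np.gradient(u, dt)            # prospective output with tau_r
w = 2 * np.pi * f                                   # phase estimate by projection
phase = np.arctan2(np.dot(u_breve, np.cos(w * t)), np.dot(u_breve, np.sin(w * t)))
print(f"output phase shift: {np.degrees(phase):.1f} deg")  # 0 only if tau_r == tau_m
```

For the chosen constants the output lags the stimulus by roughly 37 degrees (arctan(ωτr) − arctan(ωτm)); neurons with different ratios of the two time constants therefore report differently time-shifted views of the same stimulus.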
We demonstrate the temporal processing capabilities of a network containing such neurons by computing a continuous-time analogue of the exclusive-or (XOR) operation. We provide a layer of neurons with mismatched time constants with a pure sinusoid. From this population, which contains information about present as well as past stimulus values, a network consisting of two layers of instantaneous neurons computes the mismatch between the original stimulus and a time-shifted version thereof. Indeed, if the ratio of integration and prospective time constants in the first layer is chosen correctly, this network learns to correctly map the stimulus input to the target values (
To summarise the above teachings, the present invention according to one example concerns a method of processing signals in a neural network comprising a set of neurons interconnected by a set of synapses, with each neuron comprising a soma and optionally a set of apical and/or basal dendrites. The method comprises: updating at least one synaptic input signal configured to be received at input nodes of one or more neurons, and/or updating the corresponding basal potential of at least one of the basal dendrites, and/or updating the corresponding apical potential of at least one of the apical dendrites; updating a potential differential for at least one neuron by using the corresponding synaptic input signal, and/or the updated apical potential and/or the updated basal potential; updating the somatic potential of at least one of the somas by using at least the corresponding updated potential differentials; updating the prospective potential of at least one neuron by using at least the corresponding updated somatic potentials and potential differentials; and generating a neuronal output signal for at least one neuron by using the corresponding updated prospective potentials.
While the invention has been illustrated and described in detail in the drawings and the foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive, the invention being not limited to the disclosed embodiments. Other embodiments and variants can be understood and achieved by those skilled in the art when carrying out the claimed invention, based on a study of the drawings, the disclosure and the appended claims. New embodiments may be obtained by combining any of the teachings above.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used. Any reference signs in the claims should not be construed as limiting the scope of the invention.
Priority application: 21199018.9 | Sep 2021 | EP | regional
International filing: PCT/EP2022/075196 | 9/11/2022 | WO