The present disclosure and attachments hereto generally relate to neural networks. Among the various aspects of the present disclosure is the provision of a neuromorphic and deep-learning system.
A single action potential generated by a biological neuron is not optimized for energy and consumes significantly more power than an equivalent floating-point operation in a graphics processing unit (GPU) or, for example, a tensor processing unit (TPU). However, a population of coupled neurons in the human brain, using around 100 giga coarse neural spikes or operations, can learn and implement diverse functions compared to application-specific deep-learning platforms that typically use around 1 peta 8-bit/16-bit floating-point operations or more. Furthermore, unlike in many deep-learning processors, learning in biological networks occurs in real-time and with no clear separation between training and inference phases. As a result, biological networks as small as a fly brain, with less than a million neurons, can achieve a large functional and learning diversity at remarkable levels of energy-efficiency. Comparable silicon implementations are orders of magnitude less efficient, both in terms of energy dissipation and functional diversity.
In neuromorphic engineering, neural populations are generally modeled in a bottom-up manner, where individual neuron models are connected through synapses to form large-scale spiking networks. Alternatively, a top-down approach treats the process of spike generation and neural representation of excitation in the context of minimizing some measure of network energy. However, these approaches typically define the energy functional in terms of some statistical measure of spiking activity, such as firing rates, which does not allow independent control and optimization of neuro-dynamical parameters. A spiking neuron and population model in which the dynamical and spiking responses of neurons can be derived directly from a network objective or energy functional of continuous-valued neural variables, such as the membrane potential, would provide numerous advantages.
In one aspect, a spiking neural network is provided that includes a plurality of neurons implemented in respective circuits. Each neuron is configured to produce a continuous-valued membrane potential according to a Growth Transform bounded by an extrinsic energy constraint. The continuous-valued membrane potential is defined as a function of spiking current received from another neuron in the plurality of neurons, and a received electrical current stimulus. Each neuron is further configured to include a network energy function representing network energy consumed by the plurality of neurons. The spiking neural network includes a neuromorphic framework configured to minimize network energy consumed by the plurality of neurons to determine the extrinsic energy constraint, model synaptic connections among the plurality of neurons as respective transconductances that regulate magnitude of spiking currents received from each of the plurality of neurons by each other of the plurality of neurons, and encode the received electrical current stimulus in corresponding continuous-valued membrane potentials of the plurality of neurons.
In another aspect, a method of operating a neural network is described. The method includes implementing a plurality of neurons in respective circuits, producing a continuous-valued membrane potential according to a Growth Transform bounded by an extrinsic energy constraint, defining a function of spiking current received from another neuron in the plurality of neurons, and a received electrical current stimulus, as a continuous-valued membrane potential, representing network energy consumed by the plurality of neurons as a network energy function, minimizing network energy consumed by the plurality of neurons to determine the extrinsic energy constraint, modeling synaptic connections among the plurality of neurons as respective transconductances that regulate magnitude of spiking currents received from each of the plurality of neurons by each other of the plurality of neurons, and encoding the received electrical current stimulus in corresponding continuous-valued membrane potentials of the plurality of neurons.
In yet another aspect, at least one non-transitory computer-readable storage medium having computer-executable instructions embodied thereon for operating a neural network is described. When executed by at least one processor, the computer-executable instructions cause the processor to implement a plurality of neurons in respective circuits, produce a continuous-valued membrane potential according to a Growth Transform bounded by an extrinsic energy constraint, define a function of spiking current received from another neuron in the plurality of neurons, and a received electrical current stimulus, as a continuous-valued membrane potential, represent network energy consumed by the plurality of neurons as a network energy function, minimize network energy consumed by the plurality of neurons to determine the extrinsic energy constraint, model synaptic connections among the plurality of neurons as respective transconductances that regulate magnitude of spiking currents received from each of the plurality of neurons by each other of the plurality of neurons, and encode the received electrical current stimulus in corresponding continuous-valued membrane potentials of the plurality of neurons.
Spiking neural networks that emulate biological neural networks are often modeled as a set of differential equations that govern the temporal evolution of their state variables, i.e., neuro-dynamical parameters such as the membrane potential and the conductances of ion channels that mediate changes in the membrane potential via the flux of ions across the cell membrane. The neuron model is then implemented, for example, in a silicon-based circuit and connected to numerous other neurons via synapses to form a spiking neural network. This design approach is sometimes referred to as a "bottom-up" design and, consequently, does not optimize network energy.
The disclosed spiking neural network utilizes a “top-down” design approach under which the process of spike generation and neural representation of excitation is defined in terms of minimizing some measure of network energy, e.g., total extrinsic power that is a combination of power dissipation in coupling between neurons, power injected to or extracted from the system as a result of external stimulation, and power dissipated due to neural responses. The network energy function, or objective, is implemented at a network level in a population model with a neuromorphic framework. The top-down design accurately emulates a biological network that tends to self-optimize to a minimum-energy state at a network level.
Top-down designs are typically implemented as a digital, or binary, system in which the state of a neuron is, for example, spiking or not spiking. Such designs generally lack the ability to effectively control various neuro-dynamical parameters, such as, for example, the shape of the action potential, bursting activity, or adaptation in neural activity without affecting the network solution. The disclosed neuron model utilizes analog connections that enable continuous-valued variables, such as membrane potential. Each neuron is implemented as an asynchronous mapping based on polynomial Growth Transforms, which are fixed-point algorithms for optimizing polynomial functions under linear or bound constraints. The disclosed Growth Transform neurons (GT neurons) can solve binary classification tasks while producing stable and unique neural dynamics (e.g., noise-shaping, spiking, and bursting) that can be interpreted using a classification margin. In certain embodiments, all neuro-dynamical properties are encoded at a network level in the network energy function. Alternatively, at least some of the neuro-dynamical properties are implemented within a given GT neuron.
In the disclosed neuromorphic framework, the synaptic interactions, or connections, in the spiking neural network are mapped such that the network solution, or steady-state attractor, is encoded as a first-order condition of an optimization problem, e.g., network energy minimization. The disclosed GT neurons and, more specifically, their state variables evolve stably in a constrained manner based on the bounds set by the network energy minimization, thereby reducing network energy relative to, for example, bottom-up synaptic mapping. The disclosed neuromorphic framework integrates learning and inference using a combination of short-term and long-term population dynamics of spiking neurons. Stimuli are encoded as high-dimensional trajectories that are adapted, over the course of real-time learning, according to an underlying optimization problem, e.g., minimizing network energy.
In certain embodiments, gradient discontinuities are introduced to the network energy function to modulate the shape of the action potential while maintaining the local convexity and the location of the steady-state attractor (i.e., the network solution). The disclosed GT neuron model is generalized to a continuous-time dynamical system that enables such modulation of spiking dynamics and population dynamics for certain neurons or regions of neurons without affecting network convergence toward the steady-state attractor. The disclosed neuromorphic framework and GT neuron model enable implementation of a spiking neural network that exhibits memory, global adaptation, and other useful population dynamics. Moreover, by decoupling spiking dynamics from the network solution (via the top-down design) and by controlling spike shapes, the disclosed neuromorphic framework and GT neuron model enable a spiking associative memory network that can recall a large number of patterns with high accuracy and with fewer spikes than traditional associative memory networks.
where vi,n≡vi(nΔt) and vi,n+1≡vi((n+1)Δt), Δt is the time increment between time steps, yi,n represents the depolarization due to an external stimulus that can be viewed as yi,n=Rm
EQ. 2 is derived assuming the spiking function Ψ( ) is a smooth function of the membrane potential that continuously tracks the net electrical input at every instant. For a non-smooth spiking function, the temporal expectation of the membrane potential encodes the net electrical input over a sufficiently large time window, with Q=−(1−γ)W−1.
The result of optimizing, i.e., minimizing, the energy function in EQ. 2 is an asymptotic first-order condition corresponding to a typical input/output response of a neural network with a non-linearity represented by Ψ−1( ). A modulation function controls the rate of convergence to the first-order condition but does not affect the final steady-state solution. The modulation function enables control of the network spiking rate and, consequently, of the network energy, independent of the network learning task. The energy minimization problem represented in EQ. 2 is solved, under the bound constraint vc, using a dynamical system based on polynomial Growth Transforms. Growth Transforms are multiplicative updates derived from the Baum-Eagon inequality that optimize a Lipschitz-continuous cost function under linear or bound constraints on the optimization variables. Each neuron in the network implements a continuous mapping based on Growth Transforms, ensuring that the network evolves over time to reach an optimal solution of the energy functional within the constraint manifold. The Growth Transform dynamical system is represented by the system equation:
where EQ. 8 satisfies the following constraints for all time-indices n: (a) |vi,n|≤vc; (b) H({vi,n+1})≤H({vi,n}) in domains where H is continuous; and (c) the gradient term vanishes asymptotically, i.e., limn→∞∂H/∂vi,n→0 ∀i, n. Here H is a function of vi, i=1, . . . , M with bounded partial derivatives; λ ∀i, n is a parameter; and the initial condition for the dynamical system satisfies |vi,0|≤vc ∀i.
Interpreting the n-th iteration of EQ. 8 as the n-th time step for neuron i, EQ. 8 can be expressed in terms of the objective function for the neuron model, i.e., the network energy function shown in EQ. 2:
For a barrier function Ψ( )=0, the energy function is smooth, and the dynamics resulting from EQ. 8 cause the neural variables vi,n to converge to a local minimum, such that limn→∞vi,n=vi*. Accordingly, the third constraint (c) on the dynamical system in EQ. 8 can be expressed as:
Thus, as long as the constraint on membrane potential is enforced, the gradient term tends to zero, ensuring the dynamical system converges to the optimal solution within the domain defined by the bound constraints. The dynamical system represented by EQ. 8 ensures the steady-state neural responses, i.e., the membrane potentials, are maintained below vc. In the absence of the barrier term, the membrane potentials can converge to any value between −vc and +vc based on the effective inputs to individual neurons.
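For illustration, a Growth Transform dynamical system of this general type may be sketched numerically. The sketch below is an illustrative assumption, not the disclosed EQ. 8: it uses one standard Baum-Eagon-derived multiplicative update for bound constraints, vi←(vc2gi+λvi)/(givi+λ) with gi=−∂H/∂vi, a smooth quadratic energy H(v)=½vTQv−bTv with no barrier term, and arbitrary parameter values.

```python
import numpy as np

def growth_transform_step(v, Q, b, vc=1.0, base_lam=10.0):
    """One multiplicative update for minimizing
    H(v) = 0.5*v.T @ Q @ v - b.T @ v subject to |v_i| <= vc.
    lam is chosen each step so every denominator stays positive,
    which keeps the iterates inside the bound constraints."""
    g = -(Q @ v - b)                          # g_i = -dH/dv_i
    lam = base_lam + vc * np.max(np.abs(g))
    return (vc**2 * g + lam * v) / (g * v + lam)

rng = np.random.default_rng(0)
M = 5
A = rng.normal(size=(M, M))
Q = A @ A.T / M + np.eye(M)                   # positive-definite coupling
b = rng.normal(size=M)                        # external current stimuli

def H(v):
    return 0.5 * v @ Q @ v - b @ v

v = np.zeros(M)
energies = []
for _ in range(5000):
    v = growth_transform_step(v, Q, b)
    assert np.all(np.abs(v) <= 1.0 + 1e-9)    # |v_i| <= vc at every step
    energies.append(H(v))
```

With a smooth energy the iterates settle toward the constrained minimum while remaining bounded at every step; introducing a gradient discontinuity into H, as described below, instead produces sustained limit-cycle (spiking) behavior about the same attractor.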
si,n=vi,n+CΨ(vi,n)  EQ. 12
where the trans-impedance parameter C>0 Ω determines the magnitude of the spike and incorporates the hyperpolarization part of the spike as a result of vi oscillating around the gradient discontinuity. Thus, a refractory period is automatically incorporated between two spikes.
This implies that asymptotically the network exhibits limit-cycles about a single attractor, or fixed-point, such that the time-expectations of its state variables encode this optimal solution. A similar stochastic first-order framework was previously used to derive a dynamical system corresponding to ΣΔ modulation for tracking low-dimensional manifolds embedded in high-dimensional analog signal spaces. Combining EQ. 9 and EQ. 13 yields
Rearranging the terms in EQ. 14 yields EQ. 7. The penalty function R(vi) in the network energy functional in effect models the power dissipation due to spiking activity. For this form of R(.), the power dissipation due to spiking is zero below the threshold and increases linearly above the threshold.
The penalty term R(vi) of the form presented above acts analogously to a barrier function, penalizing the energy functional whenever vi,n exceeds the threshold. This transforms the time-evolution of vi,n into a spiking mode above the threshold, while keeping the sub-threshold dynamics similar to the non-spiking case. The Growth Transform dynamical system ensures that the membrane potentials are bounded, thereby implementing a squashing, or compressive, function on the neural responses. How the model encodes an external stimulus as a combination of spiking and bounded dynamics is described next. The average spiking activity of the i-th neuron encodes the error between the average input and the weighted sum of the membrane potentials. For a single, uncoupled neuron where
Qii=Q0 and Qij=0 for i≠j, there is, with ⟨·⟩ denoting the temporal expectation over a sufficiently large time window,
⟨Ψ(vi[n])⟩+Q0⟨vi[n]⟩=⟨bi[n]⟩.  EQ. 17
Multiplying EQ. 17 on both sides by C Ω, where C=1/Q0, it becomes
C⟨Ψ(vi[n])⟩+⟨vi[n]⟩=C⟨bi[n]⟩,  EQ. 18
or, ⟨si[n]⟩=C⟨bi[n]⟩,  EQ. 19
where EQ. 12 has been used.
EQ. 19 indicates that through a suitable choice of the trans-impedance parameter C, the sum of sub-threshold and supra-threshold responses encodes the external input to the neuron. This is also the rationale behind adding a spike to the sub-threshold response vi,n, as illustrated in
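This encoding property can be illustrated with a small simulation. The sketch below again assumes the hypothetical bound-constrained multiplicative update described earlier (not the disclosed EQ. 8), a single neuron with self-coupling Q0, and a one-sided penalty whose gradient Ψ(v)=K·I[v>0] is discontinuous at the threshold; all parameter values are illustrative.

```python
import numpy as np

# Single uncoupled neuron: H(v) = 0.5*Q0*v**2 - b*v + K*max(v, 0).
# The penalty gradient Psi(v) = K*(v > 0) switches on above the
# threshold v = 0, so v oscillates (spikes) about the discontinuity
# instead of settling to a fixed point.
Q0, b, K, vc, lam = 0.1, 0.5, 1.0, 1.0, 20.0

def step(v):
    g = -(Q0 * v - b + K * float(v > 0))     # -dH/dv (one-sided)
    return (vc**2 * g + lam * v) / (g * v + lam)

v, vs = 0.0, []
for _ in range(20000):
    v = step(v)
    vs.append(v)
vs = np.array(vs[2000:])                     # discard the transient

# In temporal expectation the first-order condition of the energy
# minimization, Q0*<v> + <Psi(v)> ~= <b>, holds on average.
avg_v = vs.mean()
avg_psi = K * (vs > 0).mean()
assert abs(Q0 * avg_v + avg_psi - b) < 0.05
```

With the trans-impedance chosen as C=1/Q0, the composite response si,n=vi,n+CΨ(vi,n) then tracks C times the average input, consistent with EQ. 19.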
In the limit Q0→0, EQ. 17 reduces to ⟨Ψ(vi[n])⟩=⟨bi[n]⟩, where the average spiking activity tracks the stimulus. Thus, by defining the coupling matrix in various ways, different encoding schemes for the network can be obtained.
The remapping from standard coupled conditions of a spiking neural network to the proposed formulation admits a geometric interpretation of neural dynamics. The activity of individual neurons in a network can be visualized with respect to a network hyperplane, and this geometric interpretation can then be used to understand network dynamics in response to different stimuli. If the matrix Q is assumed, like a Hessian, to be positive-definite about a local attractor, there exists a set of vectors xi∈D, i=1, . . . , M such that each of the elements Qij can be written as an inner product between two vectors, Qij=xi.xj, 1≤i, j≤M. This is similar to kernel methods that compute similarity functions between pairs of vectors in the context of support vector machines. This associates the i-th neuron in the network with a vector xi, mapping it onto an abstract metric space D and essentially providing an alternate geometric representation of the neural network. From EQ. 14, the spiking activity of the i-th neuron for the n-th time-window can then be represented as
Single neurons show a vast repertoire of response characteristics and dynamical properties that lend richness to their computational properties at the network level. The proposed model may be extended into a continuous-time dynamical system, which enables it to reproduce a vast majority of such dynamics and also allows exploration of interesting properties in coupled networks. The continuous-time version of the dynamical system is derived using a special property of Growth Transforms.
The operation of the proposed neuron model is therefore governed by two sets of dynamics: (a) minimization of the network energy functional H; and (b) modulation of the trajectory using a time-constant τi(t), also referred to as the modulation function. Fortunately, the evolution of τi(t) can be made as complex as desired without affecting the asymptotic fixed-point solution of the optimization process. It can be made a function of local variables like vi and {dot over (v)}i, or a function of global/network variables like H and {dot over (H)}.
The proposed approach enables decoupling of the three following aspects of the spiking neural network: (a) fixed points of the network energy functional, which depend on the network configuration and external inputs; (b) the nature and shape of neural responses, without affecting the network minimum; and (c) spiking statistics and transient neural dynamics at the cellular level, without affecting the network minimum or spike shapes. This makes it possible to independently control and optimize each of these neuro-dynamical properties without affecting the others. The first two aspects arise directly from an appropriate selection of the energy functional, as demonstrated above. Next, it is shown how the modulation function loosely models cell excitability and can be varied to tune transient firing statistics based on local and/or global variables. This allows the same optimal solution to be encoded using widely different firing patterns across the network, which has unique potential benefits for neuromorphic applications.
First, it is shown how a number of single-neuron response characteristics can be reproduced by changing the modulation function τi(t) in the neuron model. For this, an uncoupled network is considered, where
These dynamics may be extended to build coupled networks with properties like memory and global adaptation for energy-efficient neural representation. Results are representative of the types of dynamical properties the proposed model can exhibit, but are by no means exhaustive.
When stimulated with a constant current stimulus bi, a vast majority of neurons fire single, repetitive action potentials for the duration of the stimulus, with or without adaptation. The proposed model shows tonic spiking without adaptation when the modulation function τi(t)=τ, where τ>0 s.
Bursting neurons fire discrete groups of spikes interspersed with periods of silence in response to a constant stimulus. Bursting arises from an interplay of fast ionic currents responsible for spiking, and slower intrinsic membrane currents that modulate the spiking activity, causing the neuron to alternate between activity and quiescence. Bursting response can be simulated in the proposed model by modulating τi(t) at a slower rate compared to the generation of action potentials, in the following way:
where τ1>τ2>0 s, B is a parameter and the count variable ci(t) is updated according to
I[.] being an indicator function.
When presented with a prolonged stimulus of constant amplitude, many cortical cells initially respond with high-frequency spiking that decays to a lower steady-state frequency. This adaptation in the firing rate is caused by a negative feedback to the cell excitability due to the gradual inactivation of depolarizing currents or activation of slow hyperpolarizing currents upon depolarization, and occurs at a time-scale slower than the rate of action potential generation. Spike-frequency adaptation was modeled by varying the modulation function according to
τi(t)=τ−2ϕ(h(t)*Ψ(vi(t))) EQ. 26
where h(t)*Ψ(vi(t)) is a convolution operation between a continuous-time first-order smoothing filter h(t) and the spiking function Ψ(vi(t)), and
is a compressive function that ensures 0≤τi(t)≤τ s. The parameter τ determines the steady-state firing rate for a particular stimulus.
The proposed framework can be extended to a network model where the neurons, apart from external stimuli, receive inputs from other neurons in the network. First, Q is considered to be a positive-definite matrix, which gives a unique solution of EQ. 2. Although the elements of the coupling matrix Q already capture the interactions among neurons in a coupled network, the modulation function may be further defined as follows to make the proposed model behave as a standard spiking network
with the compressive-function ϕ(.) given by EQ. 27. EQ. 28 ensures that Qij>0 corresponds to an excitatory coupling from the pre-synaptic neuron j, and Qij<0 corresponds to an inhibitory coupling, as shown in
Apart from the pre-synaptic adaptation that changes individual firing rates based on the input spikes received by each neuron, neurons in the coupled network can be made to adapt according to the global dynamics by changing the modulation function as follows
with the compressive-function ϕ(.) given by EQ. 27. The new function F(.) is used to capture the dynamics of the network cost-function. As the network starts to stabilize and converge to a fixed-point, the function τi(.) adapts to reduce the spiking rate of the neuron without affecting the steady-state solution.
where F0>0 is a tunable parameter. This feature is important in designing energy-efficient spiking networks where energy is only dissipated during transients.
Next, a small network of neurons on a two-dimensional co-ordinate space is considered, and arbitrary inputs are assigned to the neurons. A Gaussian kernel is chosen for the coupling matrix Q as follows
Qij=exp(−γ∥xi−xj∥22).  EQ. 31
This clusters neurons with stronger couplings between them closer to each other on the co-ordinate space, while placing neurons with weaker couplings far away from each other.
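Construction of such a kernel coupling matrix from neuron coordinates may be sketched as follows; the coordinates, network size, and γ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, gamma = 8, 0.5
x = rng.uniform(size=(M, 2))       # neuron coordinates on a 2-D space

# Q_ij = exp(-gamma * ||x_i - x_j||_2^2), per EQ. 31.
d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
Q = np.exp(-gamma * d2)

assert np.allclose(Q, Q.T)                 # symmetric couplings
assert np.allclose(np.diag(Q), 1.0)        # self-coupling is exp(0) = 1
assert Q[0, 1] >= np.exp(-gamma * 2.0)     # coords lie in the unit square
```

Because the kernel decays with squared distance, nearby neurons receive stronger couplings than distant ones, which is what produces the clustering described above.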
The Growth Transform neural network inherently shows a number of encoding properties that are commonly observed in biological neural networks. For example, the firing rate averaged over a time window is a popular rate-coding technique, which posits that the spiking frequency or rate increases with stimulus intensity. A temporal code like time-to-first-spike posits that a stronger stimulus brings a neuron to the spiking threshold faster, generating an earlier spike, and hence relative spike arrival times contain critical information about the stimulus. These coding schemes can be interpreted under the umbrella of network coding using the same geometric representation considered above. Here, the responsiveness of a neuron is closely related to its proximity to the hyperplane: neurons that exhibit more spiking are located at a greater distance from the hyperplane.
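The geometric representation invoked here relies on the factorization Qij=xi.xj introduced earlier; for any positive-definite Q this factorization can be computed explicitly, e.g., via an eigendecomposition. The matrix below is an arbitrary illustrative example.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6
A = rng.normal(size=(M, M))
Q = A @ A.T + M * np.eye(M)        # positive-definite coupling matrix

# Eigendecomposition Q = U diag(w) U^T with w > 0 yields the
# factorization Q = X X^T, i.e., Q_ij = x_i . x_j, with X = U*sqrt(w).
w, U = np.linalg.eigh(Q)
assert np.all(w > 0)               # positive-definite => eigenvalues > 0
X = U * np.sqrt(w)                 # row i of X is the vector x_i
assert np.allclose(X @ X.T, Q)     # inner products reproduce Q
```

Each row of X supplies the coordinate vector xi that places the i-th neuron in the abstract metric space D described above.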
The encoding of a stimulus in the spatiotemporal evolution of activity in a large population of neurons is often represented by a unique trajectory in a high-dimensional space, where each dimension accounts for the time-binned spiking activity of a single neuron. Projection of the high-dimensional activity onto two or three critical dimensions using dimensionality reduction techniques like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) has been widely used across organisms and brain regions to shed light on how a neural population response evolves when a stimulus is delivered. For example, in identity coding, trajectories corresponding to different stimuli evolve toward different regions in the reduced neural subspace, often becoming more discriminable with time and remaining stable over repeated presentations of a particular stimulus. This can be explained in the context of the geometric interpretation.
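This dimensionality-reduction step can be sketched as follows, using PCA via a singular value decomposition on a synthetic stand-in for time-binned population activity; the latent trajectory, network size, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T, M = 200, 50                     # time bins x neurons
# Synthetic population activity: a low-dimensional trajectory
# embedded in M dimensions plus noise (stand-in for time-binned
# spike counts).
t = np.linspace(0, 1, T)
latent = np.stack([np.sin(2 * np.pi * t), t], axis=1)     # (T, 2)
mixing = rng.normal(size=(2, M))
R = latent @ mixing + 0.1 * rng.normal(size=(T, M))

# PCA via SVD of the mean-centered activity matrix.
Rc = R - R.mean(axis=0)
U, S, Vt = np.linalg.svd(Rc, full_matrices=False)
traj2d = Rc @ Vt[:2].T             # trajectory in the top-2 PC subspace

var_explained = (S[:2] ** 2).sum() / (S ** 2).sum()
assert traj2d.shape == (T, 2)
assert var_explained > 0.9         # two PCs capture the embedded trajectory
```

The rows of traj2d trace the population trajectory in the reduced neural subspace; distinct stimuli would produce distinct such trajectories.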
For the same network as above, the experiment starts from the same baseline and perturbs the stimulus vector in two different directions. This pushes the network hyperplane in two different directions, exciting different subsets of neurons, as illustrated in
due to the presence of more than one attractor state. This is demonstrated by considering two different stimulus histories in a network of four neurons, where a stimulus “Stim 1a” precedes another stimulus “Stim 2” in
Associative memories are neural networks that can store memory patterns in the activity of neurons in a network through a Hebbian modification of their synaptic weights, and can recall a stored pattern when stimulated with a partial fragment or a noisy version of the pattern. An associative memory network of Growth Transform neurons shows how network trajectories can be used to recall stored patterns, and how global adaptation enables recall using very few spikes while maintaining high accuracy.
The network comprises M=100 neurons, out of which a randomly selected subset m=10 are active for any stored memory pattern. The elements of the transconductance coupling matrix are set according to the following standard Hebbian learning rule
where k is a scaling factor and ts∈[0, 1]M, s=1, . . . , S, are the binary patterns stored in the network. During the recall phase, only half of the cells active in the original memory are stimulated with a steady depolarizing input, and the spiking pattern across the network is recorded. Instead of determining the active neurons during recall through thresholding and directly comparing with the stored binary pattern, the recall performance of the network was quantitatively measured by computing the mean distance between each pair of original-recall spiking dynamics as they unfold over time. This accounts for the firing of neurons that belong to the pattern but are not directly stimulated, and also exploits contributions from the rest of the neurons in making the spiking dynamics more dissimilar to recalls of other patterns.
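One standard Hebbian rule consistent with this description, a Hopfield-style bipolar outer product assumed here as a stand-in for EQ. 33, can be sketched with the network sizes quoted above:

```python
import numpy as np

rng = np.random.default_rng(4)
M, m, S, k = 100, 10, 5, 1.0 / 100

# S stored binary patterns t_s in [0,1]^M, each with m active neurons.
patterns = np.zeros((S, M))
for s in range(S):
    patterns[s, rng.choice(M, size=m, replace=False)] = 1.0

# Assumed Hebbian rule: Q_ij = k * sum_s (2*t_si - 1)*(2*t_sj - 1),
# i.e., neurons that are co-active (or co-silent) across stored
# patterns develop positive couplings.
bipolar = 2 * patterns - 1
Q = k * (bipolar.T @ bipolar)
np.fill_diagonal(Q, 0.0)           # no self-coupling

assert np.allclose(Q, Q.T)
```

Stimulating half of a stored pattern's active cells then drives the remaining pattern cells through these learned positive couplings, which is the pattern-completion behavior evaluated above.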
When the network is made to globally adapt according to the system dynamics, the steady-state trajectories can be encoded using very few spikes.
where rn, ΔtΨn and Δrn are the vectors of mean firing rates, mean inter-spike intervals and changes in the mean firing rates for the n-th bin for the entire network, respectively. The mean inter-spike interval is set equal to the bin length if there is a single spike over the entire bin length, and equal to twice the bin length if there are none. Note that the inter-spike interval computed for one time-bin may be different from (1/r), particularly for low firing rates, and hence encodes useful information. The similarity metric between the u-th stored pattern and the v-th recall pattern is given by
su,v=1−distu,v,  EQ. 35
where distu,v is the mean Euclidean distance between the two decoding vectors over the total number of time-bins, normalized between [0, 1]. To estimate the capacity of the network, the mean recall accuracy over 10 trials for a varying number of stored patterns is calculated, both with and without global adaptation.
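The decoding and similarity computation can be sketched as follows. The inter-spike-interval conventions for bins with one or zero spikes follow the description above; the squashing of the mean distance into [0, 1] by d/(1+d) is an illustrative assumption, since the exact normalization is not specified.

```python
import numpy as np

def decode(spikes, n_bins):
    """Per-bin decoding vectors for a (T, M) binary spike raster:
    mean firing rate, mean inter-spike interval (set to the bin
    length for a single spike in a bin and to twice the bin length
    for none), and change in mean rate between consecutive bins."""
    T, M = spikes.shape
    L = T // n_bins                       # bin length in samples
    r, isi = np.zeros((n_bins, M)), np.zeros((n_bins, M))
    for n in range(n_bins):
        chunk = spikes[n * L:(n + 1) * L]
        r[n] = chunk.sum(axis=0) / L
        for i in range(M):
            times = np.flatnonzero(chunk[:, i])
            if len(times) >= 2:
                isi[n, i] = np.diff(times).mean()
            elif len(times) == 1:
                isi[n, i] = L
            else:
                isi[n, i] = 2 * L
    dr = np.diff(r, axis=0, prepend=r[:1])
    return np.concatenate([r, isi, dr], axis=1)

def similarity(spk_u, spk_v, n_bins=10):
    """s_uv = 1 - dist_uv (EQ. 35); dist_uv is the mean Euclidean
    distance between decoding vectors over the time-bins, squashed
    into [0, 1] here by an assumed d/(1+d) normalization."""
    du, dv = decode(spk_u, n_bins), decode(spk_v, n_bins)
    d = np.linalg.norm(du - dv, axis=1).mean()
    return 1.0 - d / (1.0 + d)

rng = np.random.default_rng(5)
spk = (rng.uniform(size=(100, 20)) < 0.2).astype(float)
other = (rng.uniform(size=(100, 20)) < 0.2).astype(float)
assert similarity(spk, spk) == 1.0        # identical dynamics
assert similarity(spk, other) < 1.0       # dissimilar recalls score lower
```

An original-recall pair with matching spiking dynamics thus scores near 1, while recalls of other patterns score lower.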
Aside from pattern completion, associative networks are also commonly used for identifying patterns from their noisy counterparts. A similar associative memory network was used to classify images from the MNIST dataset that were corrupted with additive white Gaussian noise at different signal-to-noise ratios (SNRs), and that were, unlike in the previous case, unseen by the network before the recall phase. The network size in this case was M=784, the number of pixels in each image, and the connectivity matrix was set using a separate, randomly selected subset of 5,000 binary, thresholded images from the training dataset according to EQ. 33. Unseen images from the test dataset were corrupted at different SNRs and fed to the network after binary thresholding.
The test accuracies and mean spike counts for a test image are plotted and shown in
A geometric approach to achieve primal-dual mapping is illustrated in
A GTNN can also combine learning and noise-shaping, whereby the neurons that are important for discrimination are closer to the stimuli hyperplane and exhibit a more pronounced noise-shaping effect. This is illustrated in
Rather than assuming the synaptic matrix Q to be fixed, investigating real-time learning algorithms for the GT neural network requires a dynamical systems approach to adapt Q such that the network can solve a recognition task. In this regard, most learning algorithms are energy-based, developed with the primary objective of minimizing the error in inference, and the energy functional captures dependencies among variables, for example, features and class labels. Learning in this case consists of finding an energy function that associates low energies to observed configurations of the variables and high energies to unobserved ones. The energy function models the average error in prediction made by a network when it has been trained on some input data, so that inference is reduced to the problem of finding a good local minimum in this energy landscape. In the proposed approach, learning means optimizing the network for error in addition to energy, which in this case models a different system objective, namely that of finding optimal neural responses. This is illustrated in
To achieve this task, a min-max optimization technique is investigated that will balance the short-term dynamics determined by the activity of neurons with the long-term dynamics determined by the activity of the synapses. The formulation is summarized as
where the metabolic cost function H will be maximized with respect to the synaptic matrix Q, in addition to being minimized with respect to the neuronal responses. The normalization constraint shown in EQ. 36 is imposed to ensure that the synaptic weights are sparse and do not decay to zero. This also ensures that growth transformations can be used for adapting Q, as shown in
For instance, if the normalization constraint Qij+Qji=1 is imposed, then the form of adaptation will model not only conventional Hebbian learning but also learning due to spike-timing-dependent plasticity (STDP). The time-constant τ is chosen to ensure a balance between the short-term neuronal dynamics and the long-term synaptic dynamics.
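Synaptic adaptation under the pairwise normalization Qij+Qji=1 can be sketched with a growth-transform-style multiplicative update driven by a time-lagged activity product. The response model, the lagged-product gradient, and all parameters below are illustrative assumptions, not the disclosed EQ. 37.

```python
import numpy as np

rng = np.random.default_rng(6)
M, lam = 4, 2.0

# Initialize synapses so that Q_ij + Q_ji = 1, the pairwise
# normalization constraint discussed above.
Q = rng.uniform(0.25, 0.75, size=(M, M))
Q = Q / (Q + Q.T)
np.fill_diagonal(Q, 0.0)

# Growth-transform-style multiplicative update on each synaptic pair,
# driven by a time-lagged activity product (an assumed stand-in for
# an STDP-like gradient of the metabolic cost with respect to Q).
v_prev = rng.normal(scale=0.3, size=M)
for _ in range(50):
    v = np.tanh(Q @ v_prev + 0.1)       # illustrative neural response
    G = np.outer(v, v_prev)             # G[i, j] ~ post(now) * pre(before)
    num = Q * (G + lam)                 # lam > |G| keeps weights positive
    den = num + num.T
    np.fill_diagonal(den, 1.0)          # avoid 0/0 on the unused diagonal
    Q = num / den                       # renormalize each (i, j),(j, i) pair
    v_prev = v

off = ~np.eye(M, dtype=bool)
assert np.allclose((Q + Q.T)[off], 1.0)  # constraint preserved
assert np.all(Q[off] > 0.0)              # weights do not decay to zero
```

Because the update is multiplicative and each (i, j),(j, i) pair is renormalized to sum to one, the constraint is maintained at every step while the asymmetry between Qij and Qji evolves with the lagged pre/post correlation, loosely mirroring an STDP-like asymmetry.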
The min-max optimization is investigated for a simple learning problem comprising a one-layer auto-encoder network that performs inherent clustering in a feature space, followed by a layer of supervised learning to match inputs with the correct class labels. It is then investigated how the same general form of the cost function can perform both clustering and linear separation by introducing varying amounts of non-linearity at each layer. For example, normalization of neural responses, along with a weaker regularization, introduces a compressive non-linearity in the neural responses at the hidden layer. This is a by-product of optimizing the network objective function itself, and not a result of imposing an explicit non-linearity such as sigmoid or tanh at each neuron. Moreover, through proper initialization of the weights, each hidden neuron can be made to preferentially respond to one region of the feature space by tuning the weights in that direction. As such, this eliminates the need to modify the network objective function, the reconstruction error, to include cluster-assignment losses, as is usually done in auto-encoder-based clustering techniques to ensure that similar data are grouped into the same clusters. The learning framework is then applied to a large-scale network comprising 10-100 hidden layers and 1 million to 100 million neurons. A GPU-based GT neural network accelerator is capable of simulating 20 million neurons in near real-time. Since the network as a whole solves for both the optimal neural responses and the synaptic weights through an optimization process, the non-linear neural responses at any layer converge not in a single step of computation, but in a continuous manner. By appropriately adapting the modulation function for the neurons and the synaptic time-constant, intermediate responses are ensured to be continuously transmitted to the rest of the network through the synapses.
This ensures that all layers in the network can be updated simultaneously and in an asynchronous fashion. This provides a new way of bypassing the temporal non-locality of information that is often a bottleneck in neural networks, allowing dependent layers to continuously update even when the preceding layers have not converged to their final values for a specific input.
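The asynchronous, continuous-update style described above can be sketched as follows. In this illustrative two-layer example, every layer takes one small relaxation step per tick using the current, possibly unconverged, output of the layer below it; the tanh target map, relaxation rate, and layer sizes are assumptions standing in for the actual growth-transform dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.standard_normal(8)               # input
W1 = rng.standard_normal((6, 8)) * 0.1   # hidden-layer weights
W2 = rng.standard_normal((4, 6)) * 0.1   # output-layer weights
h1 = np.zeros(6)                         # hidden-layer state
h2 = np.zeros(4)                         # output-layer state
eta = 0.2                                # relaxation rate (assumed)

for _ in range(200):                     # both layers update every tick
    h1 += eta * (np.tanh(W1 @ x) - h1)   # relax toward the hidden target
    h2 += eta * (np.tanh(W2 @ h1) - h2)  # uses h1 *now*, not at convergence
```

At steady state, both layers agree with a sequential feed-forward pass, even though neither layer ever waited for the other to converge.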
While the dynamics of the proposed GTNN could be simulated using a GPU-based digital hardware accelerator, the underlying energy-efficiency advantages can only be realized when the architecture is mapped onto analog hardware, which underscores the importance of investigating the scalability of GTNN on analog platforms. This mapping results in several variants of analog support-vector machines. However, one of the challenges of implementing a large-scale analog neural network is scaling the synaptic connectivity to millions or billions of neurons. Another limitation arises from the lack of a reliable analog memory and the underlying cost of updating the contents of that memory. Overcoming both of these limitations by exploiting unique properties of the proposed fully coupled continuous-time architecture is important to GTNN performance.
The common approach to implementing a large-scale spiking neuromorphic processor is to use an address event representation (AER) technique to route spikes over a digital bus. A spike generated by a neuron on one processor is routed to a neuron located on another processor by packetizing the source/destination information using an AER encoder/decoder, typically co-located on every neuromorphic processor. This encoding is not possible for transmitting analog signals, which are only practical for connectivity across neighboring processors/chips. This is illustrated in
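The AER routing scheme described above can be sketched as follows. The packet fields, routing-table layout, and function names here are illustrative assumptions; actual AER implementations vary in addressing scheme and arbitration.

```python
from collections import namedtuple

# A spike is packetized as a (source address, timestamp) event on a
# shared digital bus; a routing table at the receiver fans it out to
# the destination neurons.
Event = namedtuple("Event", ["source", "timestamp"])

routing_table = {          # source neuron -> destination neurons
    0: [10, 11],
    1: [12],
}

def aer_encode(source, timestamp):
    return Event(source, timestamp)

def aer_decode(event, table):
    # Fan the packetized spike out to every destination neuron
    return [(dst, event.timestamp) for dst in table.get(event.source, [])]

bus = [aer_encode(0, 5), aer_encode(1, 7)]   # spikes on the digital bus
delivered = [d for ev in bus for d in aer_decode(ev, routing_table)]
# delivered == [(10, 5), (11, 5), (12, 7)]
```

Because only discrete (address, time) events traverse the bus, the scheme scales to many chips for spikes but cannot carry continuous analog signal values, as noted above.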
Reliable and scalable analog memory has been one of the challenges in the implementation of analog and continuous-time processors. While some emerging memories, such as memristors or synaptic elements, could be used to implement analog memories and in-memory computation, complexity arises from heterogeneous integration into CMOS and, in some cases, an elaborate addressing mechanism for programming. The programming procedure not only dissipates significant energy, but in many cases the non-linearity and hysteresis in the device complicate the underlying update rule. A Growth Transform network may be used as an associative memory, where the synaptic junction Qij is updated according to EQ. 37. By ensuring that Q is locally non-positive definite, the network can be controlled to converge to a local minimum, as shown in
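The associative-memory behavior described above can be sketched with a classical Hopfield-style recall, used here only as an illustrative stand-in for the growth-transform update of EQ. 37: stored patterns define a synaptic matrix Q, and an iterative update descends the energy E(v) = -0.5 vᵀQv toward a stored pattern at a local minimum. The storage rule and patterns are assumptions for illustration.

```python
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]], dtype=float)
Q = patterns.T @ patterns / patterns.shape[1]   # outer-product storage
np.fill_diagonal(Q, 0.0)

def energy(v):
    return -0.5 * v @ Q @ v

v = np.array([1, -1, 1, 1, 1, -1], dtype=float)  # corrupted first pattern
for _ in range(10):
    v = np.sign(Q @ v)           # energy-descending recall update
    v[v == 0] = 1.0              # break ties deterministically
```

Starting from a corrupted cue, the iteration settles at the nearest stored pattern, i.e., a local minimum of the network energy.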
Next, an integrated circuit prototype implementing growth transform neurons and the corresponding synaptic update rules is developed. For this purpose, sub-threshold and translinear/current-mode analog circuit topologies are leveraged. Examples of system-on-chip prototypes are shown in
Evaluation included validation of both the GTNN algorithms and the GTNN processor on benchmark real-time learning tasks. The current version of the GTNN software is able to simulate a network of 20 million neurons in real-time (spiking rate of 1-100 Hz) on a GPU-based accelerator. Enhancements include incorporating long-term and real-time learning dynamics for different network topologies and connectivities. Even though the proposed framework is applicable and scalable to most real-time learning tasks, the system was validated on a real-time speaker recognition task. A universal challenge has been learning to identify new speakers under varying environmental conditions. First, a benchmark GTNN network was trained to recognize 10 speakers out of a possible 100 speakers from the YOHO dataset. During run-time, features corresponding to the remaining 90 speakers were presented sequentially in time. In some cases, labels, or reinforcement, were presented to the network in addition to the features, as shown in
A new spiking neuron and population model based on the Growth Transform dynamical system has been described. The system minimizes an appropriate energy functional under realistic physical constraints to produce emergent spiking activity in a population of neurons. This disclosure treats the spike generation and transmission processes in a spiking network as an energy-minimization problem involving continuous-valued neural state variables like the membrane potential. The neuron model and its response are tightly coupled to the network objective, and are flexible enough to incorporate different neural dynamics that have been observed at the cellular level in electrophysiological recordings.
The general approach described above offers an elegant way to design neuromorphic machine learning algorithms by bridging the gap that currently exists between bottom-up models, which can simulate biologically realistic neural dynamics but lack a network-level representation, and top-down machine learning models, which start with a network loss function but reduce the problem to the model of a non-spiking neuron with static non-linearities. In this regard, machine learning models are primarily developed with the objective of minimizing the error in inference by designing a loss function that captures dependencies among variables, for example, features and class labels. Learning in this case consists of adapting weights in order to associate low energies, or losses, with observed configurations of variables, and high energies with unobserved ones. The non-differentiable nature of spiking dynamics makes it difficult to formulate loss functions involving neural variables. Neuromorphic algorithms currently work around this problem in different ways, including mapping deep neural networks to spiking networks through rate-based techniques, formulating loss functions that penalize the difference between actual and desired spike-times, or approximating the derivatives of spike signals through various means. Formulating the spiking dynamics of the entire network using an energy function involving neural state variables across the network would enable direct use of the energy function itself for learning weight parameters. Since the proposed energy function encompasses all the neurons in the network, and not just the “visible neurons” as in most neuromorphic machine learning algorithms, it can potentially enable easier and more effective training of hidden neurons in deep networks. Moreover, it would allow incorporation of, and experimentation with, biologically relevant neural dynamics that could have significant performance and energy benefits.
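The energy-minimization view described above can be sketched with a toy network energy over continuous membrane-potential-like variables. The quadratic form E(v) = 0.5 vᵀQv - bᵀv, its parameters, and the gradient-descent dynamics below are illustrative assumptions, not the disclosed growth-transform dynamics; they show only how a network state can settle at the minimum of an energy functional of continuous neural variables.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)        # positive definite -> unique minimum
b = rng.standard_normal(n)

def energy(v):
    return 0.5 * v @ Q @ v - b @ v

v = np.zeros(n)                    # continuous neural state variables
for _ in range(500):
    v -= 0.01 * (Q @ v - b)        # descend the gradient of E

v_star = np.linalg.solve(Q, b)     # closed-form minimizer for comparison
```

The descended state matches the analytical minimizer, illustrating how the network minimum itself, rather than any single neuron's output, encodes the solution.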
This neuron-to-network energy-efficiency gap can be addressed by using a fully-coupled, analog neuromorphic architecture where the entire learning network is designed as a unified dynamical system that encodes information using short-term and long-term network dynamics.
A method for designing energy-efficient, real-time neuromorphic processors using growth-transform neural network (GTNN) based dynamical systems is described.
Technical advantages of this disclosure include allowing independent control over three neuro-dynamical properties: (a) control over the steady-state population dynamics that encodes the minimum of an exact network energy functional; (b) control over the shape of the action potentials generated by individual neurons in the network without affecting the network minimum; and (c) control over spiking statistics and transient population dynamics without affecting the network minimum or the shape of action potentials. At the core of the model are different variants of Growth Transform dynamical systems that produce stable and interpretable population dynamics, irrespective of the network size and the type of neuronal connectivity (inhibitory or excitatory). This model has been configured to produce different types of single-neuron dynamics as well as population dynamics. The network is shown to adapt such that it encodes the steady-state solution using a reduced number of spikes upon convergence to the optimal solution. This network constructs a spiking associative memory that uses fewer spikes compared to conventional architectures, while maintaining high recall accuracy at high memory loads.
Embodiments of the disclosure may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. The computer-executable instructions may be organized into one or more computer-executable components or modules. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the disclosure may be implemented with any number and organization of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other embodiments of the disclosure may include different computer-executable instructions or components having more or less functionality than illustrated and described herein. Aspects of the disclosure may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The computer systems, computing devices, and computer-implemented methods discussed herein may include additional, less, or alternate actions and/or functionalities, including those discussed elsewhere herein. The computer systems may include or be implemented via computer-executable instructions stored on non-transitory computer-readable media. The methods may be implemented via one or more local or remote processors, transceivers, servers, and/or sensors (such as processors, transceivers, servers, and/or sensors mounted on vehicle or mobile devices, or associated with smart infrastructure or remote servers), and/or via computer executable instructions stored on non-transitory computer-readable media or medium.
In some aspects, a computing device is configured to implement machine learning, such that the computing device “learns” to analyze, organize, and/or process data without being explicitly programmed. Machine learning may be implemented through machine learning (ML) methods and algorithms. In one aspect, a machine learning (ML) module is configured to implement ML methods and algorithms. In some aspects, ML methods and algorithms are applied to data inputs and generate machine learning (ML) outputs. Data inputs may include but are not limited to: images or frames of a video, object characteristics, and object categorizations. Data inputs may further include: sensor data, image data, video data, telematics data, authentication data, authorization data, security data, mobile device data, geolocation information, transaction data, personal identification data, financial data, usage data, weather pattern data, “big data” sets, and/or user preference data. ML outputs may include but are not limited to: a tracked shape output, categorization of an object, categorization of a type of motion, a diagnosis based on motion of an object, motion analysis of an object, and trained model parameters. ML outputs may further include: speech recognition, image or video recognition, medical diagnoses, statistical or financial models, autonomous vehicle decision-making models, robotics behavior modeling, fraud detection analysis, user recommendations and personalization, game AI, skill acquisition, targeted marketing, big data visualization, weather forecasting, and/or information extracted about a computer device, a user, a home, a vehicle, or a party of a transaction. In some aspects, data inputs may include certain ML outputs.
In some aspects, at least one of a plurality of ML methods and algorithms may be applied, which may include but are not limited to: linear or logistic regression, instance-based algorithms, regularization algorithms, decision trees, Bayesian networks, cluster analysis, association rule learning, artificial neural networks, deep learning, dimensionality reduction, and support vector machines. In various aspects, the implemented ML methods and algorithms are directed toward at least one of a plurality of categorizations of machine learning, such as supervised learning, unsupervised learning, and reinforcement learning.
In one aspect, ML methods and algorithms are directed toward supervised learning, which involves identifying patterns in existing data to make predictions about subsequently received data. Specifically, ML methods and algorithms directed toward supervised learning are “trained” through training data, which includes example inputs and associated example outputs. Based on the training data, the ML methods and algorithms may generate a predictive function which maps outputs to inputs and utilize the predictive function to generate ML outputs based on data inputs. The example inputs and example outputs of the training data may include any of the data inputs or ML outputs described above. For example, a ML module may receive training data comprising customer identification and geographic information and an associated customer category, generate a model which maps customer categories to customer identification and geographic information, and generate a ML output comprising a customer category for subsequently received data inputs including customer identification and geographic information.
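The supervised-learning pattern described above, training pairs of example inputs and example outputs yielding a predictive function, can be sketched as follows. A nearest-centroid classifier stands in for the (unspecified) ML method, and the two-dimensional "customer" features and categories are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Training data: example inputs with associated example outputs
X_train = np.vstack([rng.normal(0.0, 0.5, (20, 2)),   # category 0 examples
                     rng.normal(3.0, 0.5, (20, 2))])  # category 1 examples
y_train = np.array([0] * 20 + [1] * 20)

# "Training": learn one centroid per category from the labeled examples
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    # Predictive function: map a new input to the nearest category
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
```

A subsequently received input is then mapped to a category by the learned predictive function, e.g. `predict(np.array([3.2, 2.9]))` falls in category 1's region.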
In another aspect, ML methods and algorithms are directed toward unsupervised learning, which involves finding meaningful relationships in unorganized data. Unlike supervised learning, unsupervised learning does not involve user-initiated training based on example inputs with associated outputs. Rather, in unsupervised learning, unlabeled data, which may be any combination of data inputs and/or ML outputs as described above, is organized according to an algorithm-determined relationship. In one aspect, a ML module receives unlabeled data comprising customer purchase information, customer mobile device information, and customer geolocation information, and the ML module employs an unsupervised learning method such as “clustering” to identify patterns and organize the unlabeled data into meaningful groups. The newly organized data may be used, for example, to extract further information about a customer's spending habits.
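The unsupervised "clustering" step described above can be sketched with a standard k-means iteration over unlabeled points; the data, the number of clusters, and the deterministic initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Unlabeled data containing two natural groups
X = np.vstack([rng.normal(0.0, 0.3, (25, 2)),
               rng.normal(4.0, 0.3, (25, 2))])

k = 2
centers = X[[0, 25]]          # deterministic initialization (one seed per region)
for _ in range(10):
    # Assign each point to its nearest center, then move each center to
    # the mean of its assigned points (standard k-means iteration)
    labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
    centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
```

Without any labels, the algorithm organizes the data into meaningful groups determined purely by the relationships within the data.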
In yet another aspect, ML methods and algorithms are directed toward reinforcement learning, which involves optimizing outputs based on feedback from a reward signal. Specifically, ML methods and algorithms directed toward reinforcement learning may receive a user-defined reward signal definition, receive a data input, utilize a decision-making model to generate a ML output based on the data input, receive a reward signal based on the reward signal definition and the ML output, and alter the decision-making model so as to receive a stronger reward signal for subsequently generated ML outputs. The reward signal definition may be based on any of the data inputs or ML outputs described above. In one aspect, a ML module implements reinforcement learning in a user recommendation application. The ML module may utilize a decision-making model to generate a ranked list of options based on user information received from the user and may further receive selection data based on a user selection of one of the ranked options. A reward signal may be generated based on comparing the selection data to the ranking of the selected option. The ML module may update the decision-making model such that subsequently generated rankings more accurately predict a user selection.
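The ranked-recommendation loop described above can be sketched as follows. The hidden user preferences, reward definition, and score-update rule are illustrative assumptions; the point is only that the decision-making model is altered so that later rankings earn a stronger reward.

```python
options = ["A", "B", "C"]
true_pref = {"A": 0.2, "B": 0.9, "C": 0.5}   # hidden user preference (simulated)
scores = {o: 0.0 for o in options}            # decision-making model
lr = 0.3                                      # update rate (assumed)

def simulate_user_choice():
    # The simulated user always picks their most-preferred option
    return max(options, key=lambda o: true_pref[o])

for _ in range(50):
    ranking = sorted(options, key=lambda o: scores[o], reverse=True)
    choice = simulate_user_choice()
    # Reward signal: higher when the chosen option was ranked nearer the top
    reward = 1.0 - ranking.index(choice) / len(options)
    scores[choice] += lr * reward             # reinforce the chosen option

final_ranking = sorted(options, key=lambda o: scores[o], reverse=True)
```

After repeated feedback, the model's top-ranked option coincides with the user's actual preference, so subsequently generated rankings predict the selection.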
As will be appreciated based upon the foregoing specification, the above-described aspects of the disclosure may be implemented using computer programming or engineering techniques including computer software, firmware, hardware or any combination or subset thereof. Any such resulting program, having computer-readable code means, may be embodied or provided within one or more computer-readable media, thereby making a computer program product, i.e., an article of manufacture, according to the discussed aspects of the disclosure. The computer-readable media may be, for example, but is not limited to, a fixed (hard) drive, diskette, optical disk, magnetic tape, semiconductor memory such as read-only memory (ROM), and/or any transmitting/receiving medium, such as the Internet or other communication network or link. The article of manufacture containing the computer code may be made and/or used by executing the code directly from one medium, by copying the code from one medium to another medium, or by transmitting the code over a network.
Definitions and methods described herein are provided to better define the present disclosure and to guide those of ordinary skill in the art in the practice of the present disclosure. Unless otherwise noted, terms are to be understood according to conventional usage by those of ordinary skill in the relevant art.
In some embodiments, numbers expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth, used to describe and claim certain embodiments of the present disclosure are to be understood as being modified in some instances by the term “about.” In some embodiments, the term “about” is used to indicate that a value includes the standard deviation of the mean for the device or method being employed to determine the value. In some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the present disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the present disclosure may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements. The recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein.
Also, in the embodiments described herein, additional input channels may be, but are not limited to, computer peripherals associated with an operator interface such as a mouse and a keyboard. Alternatively, other computer peripherals may also be used that may include, for example, but not be limited to, a scanner. Furthermore, in the exemplary embodiment, additional output channels may include, but not be limited to, an operator interface monitor.
In some embodiments, the terms “a” and “an” and “the” and similar references used in the context of describing a particular embodiment (especially in the context of certain of the following claims) can be construed to cover both the singular and the plural, unless specifically noted otherwise. In some embodiments, the term “or” as used herein, including the claims, is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive.
The terms “comprise,” “have” and “include” are open-ended linking verbs. Any forms or tenses of one or more of these verbs, such as “comprises,” “comprising,” “has,” “having,” “includes” and “including,” are also open-ended. For example, any method that “comprises,” “has” or “includes” one or more steps is not limited to possessing only those one or more steps and can also cover other unlisted steps. Similarly, any composition or device that “comprises,” “has” or “includes” one or more features is not limited to possessing only those one or more features and can cover other unlisted features.
All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the present disclosure and does not pose a limitation on the scope of the present disclosure otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the present disclosure.
Groupings of alternative elements or embodiments of the present disclosure disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.
All publications, patents, patent applications, and other references cited in this application are incorporated herein by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent application or other reference was specifically and individually indicated to be incorporated by reference in its entirety for all purposes. Citation of a reference herein shall not be construed as an admission that such is prior art to the present disclosure.
Having described the present disclosure in detail, it will be apparent that modifications, variations, and equivalent embodiments are possible without departing from the scope of the present disclosure defined in the appended claims. Furthermore, it should be appreciated that all examples in the present disclosure are provided as non-limiting examples. The systems and methods described herein are not limited to the specific embodiments described herein; rather, components of the systems and/or steps of the methods may be utilized independently and separately from other components and/or steps described herein.
Although specific features of various embodiments of the disclosure may be shown in some drawings and not in others, this is for convenience only. In accordance with the principles of the disclosure, any feature of a drawing may be referenced and/or claimed in combination with any feature of any other drawing.
This written description uses examples to provide details on the disclosure, including the best mode, and also to enable any person skilled in the art to practice the disclosure, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the disclosure is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
This application claims the benefit of priority to U.S. Provisional Patent Application No. 62/865,703 filed Jun. 24, 2019, and titled “Method for Designing Scalable and Energy-efficient Analog Neuromorphic Processors,” the entire contents of which are hereby incorporated by reference herein.
Number | Date | Country
---|---|---
62865703 | Jun 2019 | US