A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The disclosure relates generally to the field of neuromorphic computing, as well as neural networks. More particularly, the disclosure is directed to methods and apparatus for spiking neural network computing based on e.g., a multi-layer kernel architecture, shared dendritic encoding, and/or thresholding of accumulated spiking signals.
Traditionally, computers include at least one processor and some form of memory. Computers are programmed by writing a program composed of processor-readable instructions to the computer's memory. During operation, the processor reads the stored instructions from memory and executes various arithmetic, data path, and/or control operations in sequence to achieve a desired outcome. Even though the traditional compute paradigm is simple to understand, computers have rapidly improved and expanded to encompass a variety of tasks. In modern society, they have permeated everyday life to an extent that would have been unimaginable only a few decades ago.
While the general compute paradigm has found great commercial success, modern computers are still no match for the human brain. Transistors (the components of a computer chip) can process many times faster than a biological neuron; however, this speed comes at a significant price. For example, the fastest computers in the world can perform nearly a quadrillion computations per second (10^16 bits/second) at a cost of 1.5 megawatts (MW). In contrast, a human brain contains ~80 billion neurons and can perform approximately the same magnitude of computation at only a fraction of the power (about 10 watts (W)).
Incipient research is directed to so-called “neuromorphic computing” which refers to very-large-scale integration (VLSI) systems containing circuits that mimic the neuro-biological architectures present in the brain. While neuromorphic computing is still in its infancy, such technologies already have great promise for certain types of tasks. For example, neuromorphic technologies are much better at finding causal and/or non-linear relations in complex data when compared to traditional compute alternatives. Neuromorphic technologies could be used, for example, to perform speech and image recognition within power-constrained devices (e.g., cellular phones). Conceivably, neuromorphic technology could integrate energy-efficient intelligent cognitive functions into a wide range of consumer and business products, from driverless cars to domestic robots.
Neuromorphic computing draws from hardware and software models of a nervous system. In many cases, these models attempt to emulate the behavior of biological neurons within the context of existing software processes and hardware structures (e.g., transistors, gates, etc.). Unfortunately, some synergistic aspects of nerve biology have been lost in existing neuromorphic models. For example, biological neurons minimize energy by only sparingly emitting spikes to perform global communication. Additionally, biological neurons distribute spiking signals to dozens of targets at a time via localized signal propagation in dendritic trees. Neither of these aspects is mimicked within existing neuromorphic technologies due to issues of scale and variability.
To these ends, novel neuromorphic structures are needed to efficiently emulate nervous system functionality. Ideally, such solutions should enable mixed-signal neuromorphic circuitry to compensate for one or more of component mismatches and temperature variability, thereby enabling low-power operation for large scale neural networks. More generally, improved methods and apparatus are needed for spiking neural network computing.
The present disclosure satisfies the foregoing needs by providing, inter alia, methods and apparatus for spiking neural network computing based on e.g., a multi-layer kernel architecture, shared dendritic encoding, and/or thresholding of accumulated spiking signals.
In one aspect, a method for spiking neural network computing within a multi-layer kernel is disclosed. In one embodiment, the method includes: encoding a first vector based on a first matrix sub-computation associated with a first layer of the multi-layer kernel; decoding a second vector based on a second matrix sub-computation associated with a second layer of the multi-layer kernel; and generating a third vector based on the decoded second vector.
In one variant, encoding the first vector based on the first matrix sub-computation comprises connecting to one or more spatial locations within the first layer of the multi-layer kernel. In one such variant, connecting to one or more spatial locations within the first layer of the multi-layer kernel is an excitatory connection or an inhibitory connection. In one exemplary variant, encoding the first vector based on the first matrix sub-computation further comprises generating an electrical current based on the excitatory connection or the inhibitory connection.
In another variant, decoding the second vector based on the second matrix sub-computation comprises converting a received current to a digital spike. In one such variant, decoding the second vector based on the second matrix sub-computation further comprises multiplying the digital spike by a decoding weight.
In another aspect, a multi-layer kernel apparatus is disclosed. In one embodiment, the multi-layer kernel includes: a first layer comprising a population of somas configured to generate a plurality of spike trains; a second layer comprising one or more accumulator apparatus configured to decode at least one spike train of the plurality of spike trains; and a third layer comprising a shared dendrite configured to encode the at least one spike train to various ones of the population of somas.
In one variant, the one or more accumulator apparatus further comprises memories configured to store one or more decoding weight values. In one exemplary variant, the one or more accumulator apparatus further comprises digital logic configured to: multiply the at least one spike train of the plurality of spike trains by the one or more decoding weight values; and accumulate the multiplied at least one spike train.
In another variant, the shared dendrite further comprises a diffuser network. In one exemplary variant, the diffuser network attenuates current as a function of a spatial assignment. In one such variant the population of somas are further configured to receive a plurality of electrical currents via the diffuser network.
In yet another embodiment, the multi-layer kernel apparatus includes: a first stage comprising an analog processing domain configured to convert a first set of digital spikes into electrical currents for distribution according to an encoding matrix; and a second stage comprising a digital processing domain configured to convert the electrical currents into a second set of digital spikes according to a decoding matrix.
In one variant, the encoding matrix assigns the electrical currents to one or more spatial locations of a diffuser network.
In one variant, the decoding matrix assigns one or more decoding weights to the second set of digital spikes.
In one variant, the multi-layer kernel further includes a threshold accumulator that generates a temporally deprecated output vector based on the second set of digital spikes. In one such variant, the temporally deprecated output vector corresponds to an output vector for use by a user space application. In one exemplary implementation, the first set of digital spikes corresponds to an input vector generated by the user space application. In another such variant, the temporally deprecated output vector is fed back to the first stage. In still another variant, the temporally deprecated output vector is fed to a second analog processing domain configured to convert the temporally deprecated output vector into electrical currents for distribution according to a second encoding matrix.
In another aspect, a processor and non-transitory computer-readable medium implementing one or more of the foregoing aspects is disclosed and described. In one embodiment, the non-transitory computer-readable medium includes one or more instructions which when executed by the processor: encodes a first vector based on a first matrix sub-computation associated with a first layer of the multi-layer kernel; decodes a second vector based on a second matrix sub-computation associated with a second layer of the multi-layer kernel; and generates a third vector based on the decoded second vector.
In another aspect, a processor and non-transitory computer-readable medium implementing one or more of the foregoing aspects is disclosed and described. In one embodiment, the non-transitory computer-readable medium includes one or more instructions which when executed by the processor: receives a first and a second matrix sub-computation; assigns the first matrix sub-computation to a first layer; and assigns a second matrix sub-computation to a second layer.
In another aspect, an integrated circuit (IC) device implementing one or more of the foregoing aspects is disclosed and described. In one embodiment, the IC device is embodied as a SoC (system on chip) device. In another embodiment, an ASIC (application specific IC) is used as the basis of the device. In yet another embodiment, a chip set (i.e., multiple ICs used in coordinated fashion) is disclosed.
Other features and advantages of the present disclosure will immediately be recognized by persons of ordinary skill in the art with reference to the attached drawings and detailed description of exemplary embodiments as given below.
All figures © Copyright 2018-2019 Stanford University, All rights reserved.
Reference is now made to the drawings, wherein like numerals refer to like parts throughout.
Exemplary embodiments of the present disclosure are now described in detail. While these embodiments are primarily discussed in the context of spiking neural network computing, it will be recognized by those of ordinary skill that the present disclosure is not so limited. In fact, the various aspects of the disclosure are useful in any device or network of devices that is configured to perform neural network computing, as is disclosed herein.
Many characterizations of neural networks treat neuron operation in a “virtualized” or “digital” context; each idealized neuron is individually programmed with various parameters to create different behaviors. For example, biological spike trains are emulated with numeric parameters that represent spiking rates, and synaptic connections are realized with matrix multipliers of numeric values. Idealized neuron behavior can be emulated precisely and predictably, and such systems can be easily understood by artisans of ordinary skill.
As shown in
During operation, a vector of continuous signals (a) representing spiking output for the first ensemble is transformed into an input vector (b) for a second ensemble via a weighting matrix (W) operation. Existing implementations of neural networks perform the weighting matrix (W) operation as a matrix multiplication. The matrix multiplication operations include memory reads of the values of each neuron 102A of the first ensemble, memory reads of the corresponding weights for each connection to a single neuron 102B of the second ensemble, and a multiplication and sum of the foregoing. The result is written to the neuron 102B of the second ensemble. The foregoing process is performed for each neuron 102B of the second ensemble.
As used in the present context, the term “rank” refers to the dimension of the vector space spanned by the columns of a matrix. A linearly independent matrix has linearly independent rows and columns. Thus, a matrix with four (4) columns can have up to a rank of four (4) but may have a lower rank. A “full rank” matrix has the largest possible rank for a matrix of the same dimensions. A “deficient,” “low rank” or “reduced rank” matrix has one or more rows or columns that are not linearly independent.
Any single matrix can be mathematically “factored” into a product of multiple constituent matrices. Specifically, a “factorized matrix” is a matrix that can be represented as a product of multiple factor matrices. Only matrices characterized by a deficient rank can be “factored” or “decomposed” into a “reduced rank structure”.
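The rank definition above can be illustrated with a short sketch. The following Python is a minimal example only, assuming exact rational arithmetic and plain Gaussian elimination; the function name matrix_rank and the two sample matrices are hypothetical and do not appear in the disclosure.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Return the rank of a matrix given as a list of equal-length rows."""
    # Exact fractions avoid any floating-point tolerance when testing for zero.
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a non-zero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column adds no new independent direction
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical 4x4 examples: an identity matrix (full rank) and a matrix
# whose second row is twice the first and whose fourth row is the sum of
# the first and third rows (deficient rank).
full = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
deficient = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 0, 1], [1, 3, 3, 5]]

print(matrix_rank(full), matrix_rank(deficient))
```

In this sketch the deficient matrix has only two independent rows, which is the kind of matrix that admits a reduced rank structure with two intermediary values.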
Referring now to
Notably, each connection is implemented with physical circuitry and corresponds to a number of logical operations. For example, the number of connections between each layer may directly correspond to the number of e.g., computing circuits, memory components, processing cycles, and/or memory accesses. Consequently, even though a full rank matrix could be factored into mathematically identical full rank factor matrices, such a decomposition would increase system complexity (e.g., component cost, and processing/memory complexity) without any corresponding benefit.
More directly, there is a cost trade-off between connection complexity and matrix factorization. To illustrate the relative cost of matrix factorization as a function of connectivity, consider two (2) sets of neurons N1, N2. A non-factorized matrix has a connection between each one of the neurons (i.e., N1×N2 connections). In contrast, a factorized matrix has connections between each neuron of the first set (N1) and intermediary memories D, and connections between each neuron of the second set (N2) and the intermediary memories (i.e., N1×D+N2×D; or (N1+N2)×D connections). Mathematically, the cost/benefit “crossover” in connection complexity occurs where the number of connections for a factorized matrix equals the number of connections for its non-factorized matrix counterpart. In other words, the crossover point (Dcrossover) is given by N1×N2/(N1+N2). Factorized systems with a larger D than Dcrossover are inefficient compared to their non-factorized counterparts (i.e., with N1×N2 connections); systems with a smaller D than Dcrossover are more efficient.
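The connection-count comparison above reduces to simple arithmetic. The following Python is a minimal sketch of that arithmetic; the function names are hypothetical and chosen only for illustration.

```python
def direct_connections(n1, n2):
    """Connections for a non-factorized matrix between two sets of neurons."""
    return n1 * n2

def factorized_connections(n1, n2, d):
    """Connections when both sets connect only to d intermediary memories."""
    return (n1 + n2) * d

def crossover_d(n1, n2):
    """Value of d at which both structures need the same number of connections."""
    return (n1 * n2) / (n1 + n2)

# Illustrative sizes only: two sets of 64 neurons each.
n1, n2 = 64, 64
d_cross = crossover_d(n1, n2)
for d in (8, 32, 48):
    print(d, factorized_connections(n1, n2, d), direct_connections(n1, n2),
          "more efficient" if d < d_cross else "not more efficient")
```

For two sets of four neurons each, crossover_d returns 2.0, so a factorization with two intermediary memories needs exactly as many connections (sixteen) as the direct structure.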
As one such example, consider the systems 200 and 210 of
As used herein, the terms “decompose”, “decomposition”, “factor”, “factorization” and/or “factoring” refer to a variety of techniques for mathematically dividing a matrix into one or more factor (constituent) matrices. Matrix decomposition may be mathematically identical or mathematically similar (e.g., characterized by a bounded error over a range, bounded derivative/integral of error over a range, etc.)
As used herein, the term “kernel” refers to an association of ensembles via logical layers. Each logical layer may correspond to one or more neurons, intermediary memories, and/or other sequentially distinct entities. The exemplary neural network 200 is a “two-layer” kernel, whereas the exemplary neural network 210 is a “three-layer” kernel. While the following discussion is presented within the context of two-layer and three-layer kernels, artisans of ordinary skill in the related arts will readily appreciate, given the contents of the present disclosure, that the various principles described herein may be more broadly extended to any higher order kernel (e.g., a four-layer kernel, five-layer kernel, etc.).
Even though the two-layer and three-layer kernels are mathematically identical, the selection of kernel structure has significant implementation and/or practical considerations. As previously noted, each neuron 202 receives and/or generates a continuous signal representing its corresponding spiking rate. In the two-layer kernel, the first ensemble is directly connected to the second ensemble. In contrast, the three-layer kernel interposes an intermediate summation stage 204. During three-layer kernel operation, the first ensemble updates the intermediate summation stage 204, and the intermediate summation stage 204 updates the second ensemble. The kernel structure determines the number of values to store in memory, the number of reads from memory for each update, and the number of mathematical operations for each update.
Each neuron 202 has an associated value that is stored in memory, and each intermediary stage 204 has a corresponding value that is stored in memory. For example, in the illustrated two-layer kernel network 200 there are four (4) neurons 202A connected to four (4) neurons 202B, resulting in sixteen (16) distinct connections that require memory storage. Similarly, the three-layer kernel has four (4) neurons 202A connected to two (2) intermediate summation stages 204, which are connected to four (4) neurons 202B, also resulting in sixteen (16) distinct connections that require memory storage.
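The equivalence of the two kernel structures can be checked numerically. The following Python is a minimal sketch, assuming a four-neuron first ensemble, two intermediate summation stages, and a four-neuron second ensemble; the integer weights in U and V are arbitrary placeholders, not values from the disclosure.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*right))
    return [[sum(l * r for l, r in zip(row, col)) for col in cols] for row in left]

# Hypothetical factor matrices (placeholder integer weights).
V = [[1, 0, 2, -1],
     [0, 1, -1, 3]]      # 2x4: first ensemble -> intermediate summation stages
U = [[1, 2],
     [0, 1],
     [3, -1],
     [2, 2]]             # 4x2: intermediate summation stages -> second ensemble
W = matmul(U, V)         # 4x4: equivalent direct (two-layer) weighting matrix

a = [3, 1, 4, 1]                            # activity of the first ensemble
b_two_layer = matvec(W, a)                  # direct connection
b_three_layer = matvec(U, matvec(V, a))     # via the intermediate stage

two_layer_connections = len(W) * len(W[0])
three_layer_connections = len(V) * len(V[0]) + len(U) * len(U[0])
print(b_two_layer, b_three_layer)
print(two_layer_connections, three_layer_connections)
```

Both paths yield the same vector for the second ensemble, and for these particular dimensions both structures use sixteen connections, matching the counts given above.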
The total number of neurons 202 (N) and the total number of intermediary stages 204 (D) that are implemented directly correspond to memory reads and mathematical operations. For example, as shown in the two-layer kernel 200, a signal generated by a single neuron 202 results in updates to N distinct connections. Specifically, an inner product is calculated, which corresponds to N separate read and multiply-accumulate operations. Thus, the inner product results in N reads and N multiply-accumulates.
For a three-layer kernel 210 of
As illustrated within
Heterogeneous neuron programming is necessary to emulate the natural diversity present in biological and analog-hardware neurons (e.g., both vary widely in behavior and characteristics). The Neural Engineering Framework (NEF) is one exemplary theoretical framework for computing with heterogeneous neurons. Various implementations of the NEF have been successfully used to model visual attention, inductive reasoning, reinforcement learning, and many other tasks. One commonly used open-source implementation of the NEF is Neural Engineering Objects (NENGO), although other implementations of the NEF may be substituted with equivalent success by those of ordinary skill in the related arts given the contents of the present disclosure.
As previously noted, existing neural networks individually program each idealized neuron with various parameters to create different behaviors. However, such granularity is generally impractical to be manually configured for large scale systems. The NEF allows a human programmer to describe the various desired functionality at a comprehensible level of abstraction. In other words, the NEF is functionally analogous to a compiler for neuromorphic systems. Within the context of the NEF, complex computations can be mapped to a population of neurons in much the same way that a compiler implements high-level software code with a series of software primitives.
As a brief aside, the NEF enables a human programmer to define and manipulate input/output data structures in the “problem space” (also referred to as the “user space”); these data structures are at a level of abstraction that ignores the eventual implementation within native hardware components. However, a neuromorphic processor cannot directly represent problem space data structures (e.g., floating point numbers, integers, multiple-bit values, etc.); instead, the problem space vectors must be synthesized to the “native space” data structures. Specifically, input data structures must be converted into native space computational primitives, and native space computational outputs must be converted back to problem space output data structures.
In one such implementation of the NEF, a desired computation may be decomposed into a system of sub-computations that are functionally cascaded or otherwise coupled together. Each sub-computation is assigned to a single group of neurons (a “pool”). A pool's activity encodes the input signal as spike trains. This encoding is accomplished by giving each neuron of the pool a “preferred direction” in a multi-dimensional input space specified by an encoding vector. As used herein, the term “preferred direction” refers to directions in the input space where a neuron's activity is maximal (i.e., directions aligned with the encoding vector assigned to that neuron). In other words, the encoding vector defines a neuron's preferred direction in a multi-dimensional input space. A neuron is excited (e.g., receives positive current) when the input vector's direction “points” in the preferred direction of the encoding vector; similarly, a neuron is inhibited (e.g., receives negative current) when the input vector points away from the neuron's preferred direction.
Given a varied selection of encoding vectors and a sufficiently large pool of neurons, the neurons' non-linear responses can form a basis set for approximating arbitrary multi-dimensional functions of the input space by computing a weighted sum of the responses (e.g., as a linear decoding). For example,
Consider an illustrative example of a robot that moves within three-dimensional (3D) space. The input problem space could be the location coordinates in 3D space for the robot. In this scenario, for a system of ten (10) neurons and an input space having a cardinality of three (3), the encoding matrix has dimensions 3×10. During operation, the input vector is multiplied by the encoding matrix to generate the native space inputs. In other words, the location coordinates can be translated to inputs for the system of neurons. Once in native space, the neuromorphic processor can process the native space inputs via its native computational primitives.
The decoding matrix enables the neuromorphic processor to translate native space output vectors back into the problem space for subsequent use by the user space. In the foregoing robot in 3D space scenario, the output problem space could be the voltages to drive actuators in 3D space for the robot. For a system of ten (10) neurons and an output space with a cardinality of three (3), the decoding matrix would have the dimensions 10×3.
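The matrix dimensions in the robot scenario can be traced with a short sketch. The following Python is illustrative only: the random encoding and decoding weights are placeholders (a real synthesis tool would solve for the decoding weights), the rectified response stands in for an unspecified neuron model, and all names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the placeholder weights are repeatable
N_NEURONS, N_DIMS = 10, 3

# Encoding matrix (3x10): column j is a placeholder preferred direction for neuron j.
encoding = [[random.uniform(-1, 1) for _ in range(N_NEURONS)] for _ in range(N_DIMS)]
# Decoding matrix (10x3): placeholder weights, one row per neuron.
decoding = [[random.uniform(-1, 1) for _ in range(N_DIMS)] for _ in range(N_NEURONS)]

def vecmat(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]

position = [0.5, -0.2, 0.8]              # problem space input (3 values)
currents = vecmat(position, encoding)    # native space input (10 values)
# Assumed neuron model: positive current excites, negative current yields no output.
rates = [max(0.0, c) for c in currents]
actuators = vecmat(rates, decoding)      # problem space output (3 values)

print(len(position), len(currents), len(actuators))
```

The sketch only demonstrates how a three-element input becomes ten native space values and then three output values; it makes no claim about the accuracy of the decoded result.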
As shown in
The aforementioned technique can additionally be performed recursively and/or hierarchically. For example, recurrently connecting the output of a pool to its input can be used to model arbitrary multidimensional non-linear dynamic systems with a single pool. Similarly, large network graphs can be created by connecting the output of decoders to the inputs of other decoders. In some cases, linear transforms may additionally be interspersed between decoders and encoders.
Within the context of NEF based computations, errors can arise from either: (i) poor function approximation due to inadequate basis functions (e.g., using too small of a population of neurons) and/or (ii) spurious spike coincidences (e.g., Poisson noise). As demonstrated in
Spurious spiking coincidences (e.g., Poisson noise) are a function of a synaptic time constant and the neurons' spike rates. Poisson noise follows a Poisson distribution, i.e., a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space when the events occur with a constant rate and independently of the time since the last event. Specifically, Poisson noise is reduced with longer synaptic time constants. However, cascading stages with long synaptic time constants results in longer computational time.
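The relationship between synaptic time constant and spike noise can be observed in a small simulation. The following Python is a rough sketch under stated assumptions: spikes are drawn with a per-step Bernoulli approximation of a Poisson process, the synapse is modeled as a first-order low-pass filter, and the rate, step size, and time constants are arbitrary illustrative values.

```python
import random
import statistics

def filtered_rate_std(rate_hz, tau_s, duration_s=20.0, dt=0.001, seed=1):
    """Standard deviation of a low-pass filtered random spike train."""
    rng = random.Random(seed)
    y = rate_hz  # start at the expected steady-state value to skip the transient
    samples = []
    for _ in range(int(duration_s / dt)):
        # Bernoulli approximation of a Poisson process (assumes rate_hz * dt << 1).
        spike = 1.0 if rng.random() < rate_hz * dt else 0.0
        # First-order synapse: decay toward zero, each spike adds 1 / tau_s.
        y += (-y * dt + spike) / tau_s
        samples.append(y)
    return statistics.pstdev(samples)

short_tau_noise = filtered_rate_std(100.0, tau_s=0.01)
long_tau_noise = filtered_rate_std(100.0, tau_s=0.1)
print(short_tau_noise, long_tau_noise)
```

Under these assumptions the filtered estimate fluctuates less with the longer time constant, at the price of a filter that takes longer to settle, which is the latency trade-off noted above.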
Artisans of ordinary skill in the related arts will readily appreciate given the foregoing discussion that the foregoing techniques (cascaded factoring and longer synaptic time constants) are in conflict for high-dimensional functions with latency constraints. In other words, factoring may improve approximation, but spike noise will increase if the synaptic time-constant must be reduced so as to fit within a specific latency.
Incipient research is directed to further improving neuromorphic computing with mixed-signal hardware when used in conjunction with heterogeneous neuron programming frameworks described herein. For example, rather than using an “all-digital” network that is individually programmed with various parameters to create different behaviors, a “mixed-signal” network advantageously could treat the practical heterogeneity of real-world components as desirable sources of diversity. For example, transistor mismatch and temperature sensitivity could be used to provide an inherent variety of basis functions.
Various aspects of the present disclosure are presented in greater detail hereinafter. Specifically, methods and apparatus for spiking neural network computing based on e.g., a multi-layer kernel architecture, shared dendritic encoding, and/or thresholding of accumulated spiking signals are disclosed in greater detail hereinafter.
In one exemplary aspect, digital communication is sparsely distributed in space (spatial sparsity) and/or time (temporal sparsity) to efficiently encode and decode signaling within a mixed analog-digital substrate.
In one exemplary embodiment, temporal sparsity may be achieved by combining weighted spike (“delta”) trains via a thresholding accumulator. The thresholding accumulator reduces the total number of delta transactions that propagate through subsequent layers of the kernels. Various disclosed embodiments are able to achieve the same and/or acceptable levels of signal-to-noise ratio (SNR) at a lower output rate than existing techniques.
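The behavior of a thresholding accumulator can be sketched in a few lines. The following Python is a minimal illustration, assuming a signed accumulator that emits one output delta each time the accumulated value crosses a positive or negative threshold; the class name, the threshold of 1.0, and the weight of 0.25 are hypothetical choices rather than parameters from the disclosure.

```python
class ThresholdingAccumulator:
    """Accumulates weighted input deltas and emits sparser output deltas."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.value = 0.0

    def add(self, weight):
        """Add one weighted input delta; return the signed number of output deltas."""
        self.value += weight
        out = 0
        # Emit a positive delta for each full threshold that has accumulated.
        while self.value >= self.threshold:
            self.value -= self.threshold
            out += 1
        # Emit a negative delta for each full negative threshold.
        while self.value <= -self.threshold:
            self.value += self.threshold
            out -= 1
        return out

acc = ThresholdingAccumulator(threshold=1.0)
# Sixteen input spikes, each scaled by an assumed decoding weight of 0.25.
outputs = [acc.add(0.25) for _ in range(16)]
input_events = len(outputs)
output_events = sum(1 for o in outputs if o != 0)
print(input_events, output_events, sum(outputs))
```

In this sketch sixteen weighted input deltas produce four output deltas, so fewer transactions propagate to the next layer while the accumulated total is preserved.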
In another exemplary embodiment, spatial sparsity may be achieved by representing encoders as a sparse set of digitally programmed locations in an array of analog neurons. In one exemplary implementation, the array of analog neurons is a two-dimensional (2D) array and the sparse set of locations are distributed (tap-points) within the array; where each tap-point is characterized by a particular preferred direction. In one such implementation, neurons in the 2D array receive input from the tap-points through a “diffuser” (e.g., a transistor-based implementation of a resistive mesh). Functionally, the diffuser array performs a mathematical convolution via analog circuitry (e.g., resistances).
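The spatial spreading performed by the diffuser can be approximated in software. The following Python is a simplified sketch, assuming that the current injected at each tap-point decays exponentially with Euclidean distance; the actual attenuation profile of a resistive mesh is not specified here, and the grid size, tap locations, and length scale are hypothetical.

```python
import math

def diffuse(taps, width, height, length_scale=1.5):
    """Spread signed tap-point currents over a 2D array of neurons.

    taps maps (x, y) grid locations to injected currents. Each neuron
    receives the sum of all tap currents, attenuated with distance
    (exponential decay is an assumption of this sketch).
    """
    grid = [[0.0] * width for _ in range(height)]
    for (tx, ty), current in taps.items():
        for y in range(height):
            for x in range(width):
                distance = math.hypot(x - tx, y - ty)
                grid[y][x] += current * math.exp(-distance / length_scale)
    return grid

# Two sparse tap-points on an 8x8 array: one excitatory, one inhibitory.
taps = {(1, 1): +1.0, (6, 6): -1.0}
currents = diffuse(taps, width=8, height=8)

for row in currents:
    print(" ".join(f"{value:+.2f}" for value in row))
```

Neurons near the excitatory tap-point receive positive current that falls off with distance, while neurons near the inhibitory tap-point receive negative current, so two programmed locations give the sixty-four neurons a graded variety of inputs.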
As used in the present context, the terms “sparse” and “sparsity” refer to a dimensional distribution that skips elements of and/or adds null elements to a set. While the present disclosure is primarily directed to sparsity in temporal or spatial dimensions, artisans of ordinary skill in the related arts will readily appreciate that other schemes for adding sparsity may be substituted with equivalent success, including within other dimensions or spaces.
In still another exemplary embodiment, a heterogeneous neuron programming framework can leverage temporal and/or spatial (or other) sparsity within the context of a cascaded multi-layer kernel to provide energy-efficient computations heretofore unrealizable.
While the illustrated embodiment is shown with a specific tessellation and/or combination of elements, artisans of ordinary skill in the related arts given the contents of the present disclosure will readily appreciate that other tessellations and/or combinations may be substituted. For example, other implementations may use a 1:1 (direct), 2:1 or 1:2 (paired), 3:1 or 1:3, and/or any other N:M mapping of somas to synapses. Similarly, while the present diffuser is shown with a “square” grid, other polygon-based connectivity may be used with equivalent success (e.g., triangular, rectangular, pentagonal, hexagonal, and/or any combination of polygons (e.g., hexagons and pentagons in a “soccer ball” patterning)), or yet other complex shapes or patterns.
Additionally, while the processing fabric 300 of
In one exemplary embodiment, a “soma” includes one or more analog circuits that are configured to generate spike signaling based on a value. In one such exemplary variant, the value is represented by an electrical current. In one exemplary implementation, the soma is configured to receive a first value that corresponds to a specific input spiking rate, and/or to generate a second value that corresponds to a specific output spiking rate. In some such variants, the first and second values are integer values, although they may be portions or fractional values.
In one exemplary embodiment, the input spiking rate and output spiking rate are based on a dynamically configurable relationship. For example, the dynamically configurable relationship may be based on one or more mathematical models of biological neurons that can be configured at runtime, and/or during runtime. In other embodiments, the input spiking rate and output spiking rate are based on a fixed or predetermined relationship. For example, the fixed relationship may be part of a hardened configuration (e.g., so as to implement known functionality).
In one exemplary embodiment, a “soma” includes one or more analog-to-digital conversion (ADC) components or logic configured to generate spiking signaling within a digital domain based on one or more values. In one exemplary embodiment, the soma generates spike signaling having a frequency that is directly based on one or more values provided by a synapse. In other embodiments, the soma generates spike signaling having a pulse density that is directly based on one or more values provided by a synapse. Still other embodiments may utilize generation of spike signaling having a pulse width, pulse amplitude, or any number of other spike signaling techniques.
In one exemplary embodiment, a “synapse” includes one or more digital-to-analog conversion (DAC) components or logic configured to convert spiking signaling in the digital domain into one or more values (e.g., current) in the analog domain. In one exemplary embodiment, the synapse receives spike signaling having a frequency that is converted into one or more current signals that can be provided to a soma. In other embodiments, the synapse may convert spike signaling having a pulse density, pulse width, pulse amplitude, or any number of other spike signaling techniques into the aforementioned values for provision to the soma.
In one exemplary embodiment, the ADC and/or DAC conversion between spiking rates and values may be based on a dynamically configurable relationship. For example, the dynamically configurable relationship may enable spiking rates to be accentuated or attenuated. More directly, in some configurations, a synapse may be dynamically configured to receive/generate a greater or fewer number of spikes corresponding to the range of values used by the soma. In other words, the synapse may emulate a more or less sensitive connectivity between somas. In other embodiments, the ADC and/or DAC conversion is a fixed configuration. In yet other embodiments, a plurality of selectable predetermined discrete values of “sensitivity” are utilized.
In one exemplary embodiment, a “diffuser” includes one or more diffusion elements that couple each synapse to one or more somas and/or synapses. In one exemplary variant, the diffusion elements are characterized by resistance that attenuates values (current) as a function of spatial separation. In other variants, the diffusion elements may be characterized by active components that actively amplify signal values (current) as a function of spatial separation. While the foregoing diffuser is presented within the context of spatial separation, artisans of ordinary skill in the related arts will appreciate, given the contents of the present disclosure, that other parameters may be substituted with equivalent success. For example, the diffuser may attenuate/amplify signals based on temporal separation, parametric separation, and/or any number of other schemes.
In one exemplary embodiment, the diffuser comprises one or more transistors which can be actively biased to increase or decrease their pass-through conductance. In some cases, the transistors may be entirely enabled or disabled so as to isolate (cut off) one synapse from another synapse or soma. In one exemplary variant, the entire diffuser fabric is biased with a common bias voltage. In other variants, various portions of the diffuser fabric may be selectively biased with different voltages. Artisans of ordinary skill in the related arts given the contents of the present disclosure will readily appreciate that other active components may be substituted with equivalent success; other common examples of active components include without limitation e.g.: diodes, memristors, field effect transistors (FET), and bipolar junction transistors (BJT).
In other embodiments, the diffuser comprises one or more passive components that have a fixed or characterized impedance. Common examples of such passive components include without limitation e.g., resistors, capacitors, and/or inductors. Moreover, various other implementations may be based on a hybrid configuration of active and passive components. For example, some implementations may use resistive networks to reduce overall cost, with some interspersed MOSFETs to selectively isolate portions of the diffuser from other portions.
Referring now to
In one exemplary embodiment, the spiking neural network 400 includes a digital computing substrate that combines somas 402 emulating spiking neuron functionality with synapses 408 that generate currents for distribution via an analog diffuser 410 (shared dendritic network) to other somas 402. As described in greater detail herein, the combined analog-digital computing substrate advantageously enables, inter alia, the synthesis of spiking neural nets of unprecedented scale.
In one exemplary embodiment, computations are mapped onto the spiking neural network 400 by using an exemplary Neural Engineering Framework (NEF) synthesis tool. During operation, the NEF synthesis assigns encoding and decoding vectors to various ensembles. As previously noted, encoding vectors define how a vector of continuous signals is encoded into an ensemble's spiking activity. Decoding vectors define how a mathematical transformation of the vector is decoded from an ensemble's spiking activity. This transformation may be performed in a single step by combining decoding and encoding vectors to obtain synaptic weights that connect one ensemble directly to another and/or back to itself (for a dynamic transformation). This transformation may also be performed in multiple steps according to the aforementioned factoring property of matrix operations.
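By way of illustration only, the following Python sketch shows the factoring property referred to above: applying the combined synaptic weight matrix in a single step gives the same result as decoding and then encoding in two steps. The sizes (N=256, D=8), the random vectors, and all variable names are hypothetical and are not taken from the NEF synthesis tool. The memory-cell counts assume N×N stored weights for the single-step form and N×D decoding plus N×D encoding components for the factored form.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 256, 8                                  # hypothetical ensemble size and signal dimension

decoders = rng.uniform(-1.0, 1.0, size=(N, D)) # one decoding vector per neuron (illustrative values)
encoders = rng.standard_normal((N, D))         # one encoding vector per neuron (illustrative values)
rates = rng.uniform(0.0, 100.0, size=N)        # spike rates of the source ensemble

# Single step: combine encoders and decoders into an N x N weight matrix of rank <= D.
weights = encoders @ decoders.T
one_step = weights @ rates

# Two steps: decode to a D-dimensional vector, then encode for the next ensemble.
decoded = decoders.T @ rates
two_step = encoders @ decoded

# Memory cells needed to store each form (assumed counts, see lead-in).
two_layer_cells = N * N
three_layer_cells = 2 * N * D
cell_ratio = two_layer_cells / three_layer_cells   # equals (1/2) * N / D
```

Under these assumptions the ratio of stored cells evaluates to ½N/D, which matches the factor discussed elsewhere in this disclosure.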
The illustrated mixed analog-digital substrate of
In one exemplary embodiment, a transformation of a vector of continuous signals is decoded from an ensemble's spike activity by weighting a decoding vector (d) assigned to each soma 402 by its spike rate value and summing the results across the ensemble. This operation is performed in the digital domain on spiking inputs to the thresholding accumulators 406. The resulting vector is assigned connectivity to one or more synapses 408, and encoded for the next ensemble's spike activity by taking the resulting vector's inner-product with encoding vectors (e) assigned to that ensemble's neurons via the assigned connectivity. As previously noted, the decoding and encoding operations result in a mathematical kernel with three layers. Specifically, the decoding vectors define weights between the first and the second layers (the somas 402 and the thresholding accumulators 406) while encoding tap-weights define connectivity between the second and third layers (the synapses 408 and the shared dendrite 410).
In one exemplary embodiment, the decoding weights are granular weights which may take on a range of values. For example, decoding weights may be chosen or assigned from a range of values. In one such implementation, the range of values may span positive and negative ranges. In one exemplary variant, the decoding weights are assigned to values within the range of +1 to −1.
In one exemplary embodiment, connectivity is assigned between the accumulator(s) 406 and the synapse(s) 408. In one exemplary variant, connectivity may be excitatory (+1), not present (0), or inhibitory (−1). Various other implementations may use other schemes, including e.g., ranges of values, fuzzy logic values (e.g., “on”, “neutral” “off”), etc. Other schemes for decoding and/or connectivity will be readily appreciated by artisans of ordinary skill given the contents of the present disclosure.
In one exemplary embodiment, decoding vectors are chosen to closely approximate the desired transformation by minimizing an error metric. For example, one such metric may include e.g., the mean squared-error (MSE). Other embodiments may choose decoding vectors based on one or more of a number of other considerations including without limitation: accuracy, power consumption, memory consumption, computational complexity, structural complexity, and/or any number of other practical considerations.
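As a non-limiting sketch of the error-minimization described above, the following Python example solves for decoding weights by ordinary least squares, which minimizes the mean squared-error between the decoded output and a target function. The rectified-linear tuning curves, the gain and bias ranges, the target function (x squared), and all names are assumptions chosen purely for illustration; an actual synthesis tool may use measured neuron responses and a different solver.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                   # hypothetical number of neurons
x = np.linspace(-1.0, 1.0, 201)          # sample points of a one-dimensional signal

# Hypothetical tuning curves: rectified-linear spike rates (assumption, not from the disclosure).
direction = rng.choice([-1.0, 1.0], size=N)
gain = rng.uniform(50.0, 150.0, size=N)
bias = rng.uniform(-50.0, 50.0, size=N)
activity = np.maximum(0.0, gain * direction * x[:, None] + bias)   # shape (samples, N)

# Desired transformation of the represented signal.
target = x ** 2

# Least-squares decoders: minimize the squared error of (activity @ decoders - target).
decoders, *_ = np.linalg.lstsq(activity, target, rcond=None)

mse = np.mean((activity @ decoders - target) ** 2)
```

Other error metrics or regularized solvers could be substituted in the same structure.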
In one exemplary embodiment, encoding vectors may be chosen randomly from a uniform distribution on the D-dimensional unit hypersphere's surface. In other embodiments, encoding vectors may be assigned based on specific properties and/or connectivity considerations. For example, certain encoding vectors may be selected based on known properties of the shared dendritic fabric. Artisans of ordinary skill in the related arts will readily appreciate given the contents of the present disclosure that decoding and encoding vectors may be chosen based on a variety of other considerations including without limitation e.g.: desired error rates, distribution topologies, power consumption, processing complexity, spatial topology, and/or any number of other design specific considerations.
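One common way to draw vectors uniformly from the surface of a D-dimensional unit hypersphere is to normalize isotropic Gaussian samples. The following Python sketch illustrates that approach; the function name, the seed, and the example sizes are hypothetical.

```python
import numpy as np

def sample_unit_encoders(num_neurons, dims, seed=0):
    """Draw encoding vectors uniformly from the surface of the unit hypersphere."""
    rng = np.random.default_rng(seed)
    # The direction of an isotropic Gaussian sample is uniformly distributed on the sphere.
    raw = rng.standard_normal((num_neurons, dims))
    # Normalize each row to unit length.
    return raw / np.linalg.norm(raw, axis=1, keepdims=True)

encoders = sample_unit_encoders(num_neurons=256, dims=3)
```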
Under existing technologies, a two-layer kernel's memory-cell count exceeds a three-layer kernel's by a factor of ½N/D (i.e., half the number of neurons (N) divided by the number of continuous signals (D)). However, an all-digital three-layer kernel implements more memory reads (communication) and multiplications (computation) by a factor of D. In contrast, the reduced rank structure of the exemplary spiking neural network 400 does not suffer the same penalties of an all-digital three-layer kernel because the thresholding accumulators 406 can reduce downstream operations without a substantial loss in fidelity (e.g., SNR). In one exemplary embodiment, the thresholding accumulators 406 reduce downstream operations by a factor equal to the average number of spikes required to trip the accumulator. Unlike a non-thresholding accumulator that updates its output with each incoming spike, the exemplary thresholding accumulator's output is only updated after multiple spikes are received. In one such exemplary variant, the average number of input spikes required to trigger an output (k), is selected to balance a loss in SNR of the corresponding continuous signal in the decoded vector, with a corresponding reduction in memory reads.
As a brief aside, several dozen neurons are needed to represent each continuous signal (N/D). The exact number depends on the desired amplitude precision and temporal resolution. For example, representing a continuous signal with 28.3 SNR (signal-to-noise ratio) at a temporal resolution of 100 milliseconds (ms) requires thirty two (32) neurons firing at 125 spikes per second (spike/s) (assuming that each neuron fires independently and that their corresponding decoding vectors' components have similar amplitudes).
Consider a scenario where the incoming point process (e.g., the spike train to be accumulated) obeys a Poisson distribution and the outgoing spike train obeys a Gamma distribution. The SNR ($r \equiv \lambda/\sigma$) of a Poisson point process filtered by an exponentially decaying synapse is $r_{poi} = \sqrt{2\tau_{syn}\lambda_{poi}}$, where $\tau_{syn}$ is the synaptic time-constant and $\lambda_{poi}$ is the mean spike rate. Feeding this point process to the thresholding accumulator yields a Gamma point process with $r_{gam} \approx r_{poi}/\sqrt{1 + k^2/(3r_{poi}^2)}$, after it is exponentially filtered (assuming $r_{poi}^2 \gg 1$ and $k^2 \gg 1$). Thus, the SNR deteriorates negligibly if $r_{poi} \gg k$. Under such circumstances, the number of downstream operations may be minimized by setting the thresholding accumulator's 406 threshold to a value that offsets the drops in SNR by the reduction in traffic. In one exemplary embodiment, k can be selected such that the average number of spikes required to trip it is $k = (4r)^{2/3}$, where r is the desired SNR. The desired SNR of 28.3 can be achieved by setting k=23.4; this threshold effectively cuts the accumulator updates 19.7-fold without any deterioration in SNR. Other variants may use more or less aggressive values of k in view of the foregoing trade-offs.
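The foregoing relationships can be evaluated numerically. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that the synaptic time-constant equals the 100 ms temporal resolution mentioned above and that the aggregate input rate is 32 neurons firing at 125 spikes per second.

```python
import math

def poisson_snr(tau_syn, rate):
    """SNR of an exponentially filtered Poisson spike train: sqrt(2 * tau_syn * rate)."""
    return math.sqrt(2.0 * tau_syn * rate)

def gamma_snr(r_poi, k):
    """Approximate SNR after the thresholding accumulator (valid for r_poi**2 >> 1, k**2 >> 1)."""
    return r_poi / math.sqrt(1.0 + k ** 2 / (3.0 * r_poi ** 2))

def threshold_for_snr(r):
    """Average number of input spikes per output spike: k = (4 * r) ** (2 / 3)."""
    return (4.0 * r) ** (2.0 / 3.0)

# Assumed operating point: tau_syn = 0.1 s, 32 neurons at 125 spikes/s each.
r_poi = poisson_snr(tau_syn=0.1, rate=32 * 125)
k = threshold_for_snr(28.3)
r_gam = gamma_snr(r_poi, k)
```

With these assumed inputs, `r_poi` evaluates to approximately 28.3 and `k` to approximately 23.4, consistent with the values given above.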
Referring back to
In contrast, the shared dendrite 410 provides weighting within the analog domain as a function of spatial distance. In other words, rather than encoding synaptic weights, the NEF assigns spatial locations that are weighted relative to one another as a function of the shared dendrite 410 resistances. Replacing encoding vectors with dimension-to-tap-point assignments (spatial location assignments) cuts memory accesses since the weights are a function of the physical location within the shared dendrite. Similarly, the resistance loss is a physical feature of the shared dendrite resistance. Thus, no memory is required to store encoding weights, no memory reads are required to retrieve these weights, and no multiply-accumulate operations are required to calculate inner-products. When compared with the two-layer kernel's hardware, memory words are cut by a factor of $N^2/(D(N+T)) \approx N/D$, where T is the number of tap-points per dimension, since $T \ll N$. When used in conjunction with the aforementioned thresholding accumulator 406 (and its associated k-fold event-rate drop), memory reads are cut by a factor of $(N/D)/(1+T/k)$.
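As a worked example of the two reduction factors stated above, the following Python sketch evaluates them for hypothetical values N=256, D=8, T=8, and k=23.4. The function names and the chosen values are assumptions for illustration.

```python
def memory_word_reduction(n, d, t):
    """Factor by which stored memory words are cut: N^2 / (D * (N + T))."""
    return n ** 2 / (d * (n + t))

def memory_read_reduction(n, d, t, k):
    """Factor by which memory reads are cut with a k-fold event-rate drop: (N/D) / (1 + T/k)."""
    return (n / d) / (1.0 + t / k)

# Hypothetical sizes: 256 neurons, 8 dimensions, 8 tap-points per dimension, threshold k = 23.4.
N, D, T, K = 256, 8, 8, 23.4
word_factor = memory_word_reduction(N, D, T)
read_factor = memory_read_reduction(N, D, T, K)
```

For these values the word reduction is slightly below N/D=32 because T is small but not negligible relative to N.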
Furthermore, instead of performing N×D multiplications and additions for inner product calculations, each of the D accumulator values is simply copied to each of the T tap-points assigned to that particular dimension.
While the foregoing discussion is presented within the context of a reduced rank spiking network 400 that combines digital threshold accumulators 406 to provide temporal sparsity and analog diffusers 410 to provide spatial sparsity, artisans of ordinary skill in the related arts will readily appreciate given the contents of the present disclosure that a variety of other substitutions and/or modifications may be made with equivalent success. For example, the various techniques described herein may be combined with singular value decomposition (SVD) to compress matrices with less than full rank; for example, a synaptic weight matrix (e.g., between adjacent layers of a deep neural network) may be transformed into an equivalent set of encoding and decoding vectors. Using these vectors, a two-layer kernel may be mapped onto a reduced rank implementation that uses less memory for weight storage.
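The SVD-based transformation mentioned above may be sketched as follows. This Python example constructs a hypothetical rank-deficient weight matrix, factors it with NumPy's `np.linalg.svd`, and recovers a set of encoding and decoding vectors whose product reproduces the original matrix. The sizes, the rank tolerance, and the choice to fold the singular values into the encoders are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 128, 4                                   # hypothetical layer width and true rank

# Hypothetical N x N synaptic weight matrix with rank D (less than full rank).
W = rng.standard_normal((N, D)) @ rng.standard_normal((D, N))

# Thin SVD: W = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Keep only singular values that are numerically non-zero (tolerance is an assumption).
rank = int(np.sum(s > 1e-9 * s[0]))

encoders = U[:, :rank] * s[:rank]               # N x rank, singular values folded into encoders
decoders = Vt[:rank].T                          # N x rank

W_rebuilt = encoders @ decoders.T
stored_full = N * N
stored_factored = 2 * N * rank
```

In this sketch the factored form stores 2×N×rank values instead of N×N.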
Referring now to the shared dendritic operation, various aspects of the present disclosure leverage the inherent redundancy of the encoding process by using the analog diffuser to efficiently fan out and mix outputs from a spatially sparse set of tap-points, rather than via parameterized weighting. As previously alluded to, the greatest fan out takes place during encoding because the encoders form an over-complete basis for the input space. Implementing this fan out within parameterized weighting is computationally expensive and/or difficult to achieve via traditional paradigms. Specifically, the encoding process for all-digital networks required memory to store weighting definitions for each encoding vector. In order to encode stimulus for an ensemble's neurons, prior art neural networks calculated a D-dimensional stimulus vector's inner-product with each of the N D-dimensional encoding vectors assigned to the ensemble's neurons. Performing the inner-product calculation within the digital domain disadvantageously requires memory, communication and computation resources to store N×D vector components, read the N×D words from memory, and perform N×D multiplications and/or additions.
In contrast, the various embodiments described throughout use tap-points that are sparsely distributed in physical location within the analog diffuser. This provides substantial benefits because, inter alia, each neuron's resulting encoder is a physical property of the diffuser's summation of the “anchor encoders” of nearby tap-points, modulated by an attenuation (weight) dependent on the neuron's physical distance to those tap-points. Using this approach, it is possible to assign varied encoders to all neurons without specifying and implementing each one with digital parameterized weights. Additionally, encoding weights may be implemented via a semi-static spatial assignment of the diffuser (a location); thus, encoding weights are not retrieved via memory accesses.
As previously noted, the number of encoding vectors (i.e., preferred directions) should be greater than the input dimension to preserve precision. However, higher order spaces can be factored and cascaded from substantially lower order input. Consequently, in one exemplary embodiment, higher order input is factored such that the resulting input has sufficiently low dimension to be encoded with a tractable number of tap-points (e.g., 10, 20, etc.) to achieve a uniform encoder distribution. In one exemplary embodiment, anchor encoders are selected to be standard-basis vectors that take advantage of the sparse encode operation. Alternatively, in some embodiments, anchor encoders may be assigned arbitrarily e.g., by using an additional transform.
As a brief aside, any projection in D-dimensional space can be minimally represented with D orthogonal vectors. Multiple additional vectors may be used to represent non-linear and/or higher order stimulus behaviors. Within the context of neural network computing, encoding vectors are typically chosen randomly from a uniform distribution on a D-dimensional unit hypersphere's surface as the number of neurons in the ensemble (N) greatly exceeds the number of continuous signals (D) it encodes.
Referring now to
As used herein, the term “tap-points” refers to spatial locations on the diffuser (e.g., a resistive grid emulated with transistors where currents proportional to the stimulus vector's components are injected). This diffuser communicates signals locally while scaling them with an exponentially decaying spatial profile.
In the case of standard-basis anchor vectors, the amplitude of the component (e) of a neuron's encoding vector is determined by its distances from the T tap-points assigned to the corresponding dimension. For example, synapse 508A has distinct paths to soma 502A and soma 502B, etc., each characterized by different resistances and corresponding magnitudes of currents (e.g., iAA, iAB, etc.). Similarly, synapse 508B has distinct paths to soma 502A and soma 502B, etc., and corresponding magnitudes of currents (e.g., iBA, iBB, etc.). By attenuating synaptic spikes with resistances in the analog domain (rather than calculating inner-products in the digital domain), the shared dendrite eliminates N×D multiplications entirely, and memory reads drop by a factor of N/T. For a network of 256 neurons (N=256), and 8 tap-points (T=8), the corresponding reduction in memory reads is 32-fold.
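The manner in which a neuron's encoding vector emerges from its distances to the tap-points can be modeled in software. The following Python sketch is a simplified model only: it assumes a 16×16 grid of neurons, standard-basis anchor encoders, randomly placed tap-points split evenly between positive and negative currents, and an attenuation of exp(−distance/space constant) based on Euclidean distance. A physical diffuser's profile may differ, and all names and sizes are hypothetical.

```python
import numpy as np

def diffuser_encoders(neuron_xy, tap_xy, tap_dim, tap_sign, dims, space_constant):
    """Model each neuron's encoder as the distance-attenuated sum of tap-point anchors."""
    # Pairwise distances between neurons and tap-points, shape (neurons, taps).
    dist = np.linalg.norm(neuron_xy[:, None, :] - tap_xy[None, :, :], axis=2)
    # Assumed exponentially decaying spatial profile.
    attenuation = np.exp(-dist / space_constant)
    # Standard-basis anchor encoders: each tap drives one dimension with sign +1 or -1.
    anchors = np.zeros((len(tap_xy), dims))
    anchors[np.arange(len(tap_xy)), tap_dim] = tap_sign
    return attenuation @ anchors                 # shape (neurons, dims)

rng = np.random.default_rng(3)
side, dims, taps_per_dim = 16, 2, 8              # 256 neurons, 2 dimensions, 8 taps each

# Neurons on a regular grid; tap-points at randomly chosen grid locations.
gx, gy = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
neuron_xy = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
tap_idx = rng.choice(len(neuron_xy), size=dims * taps_per_dim, replace=False)
tap_xy = neuron_xy[tap_idx]
tap_dim = np.repeat(np.arange(dims), taps_per_dim)
tap_sign = np.tile([1.0, -1.0], dims * taps_per_dim // 2)   # half sources, half sinks

encoders = diffuser_encoders(neuron_xy, tap_xy, tap_dim, tap_sign, dims, space_constant=3.0)

# Only the tap assignments are stored, not one weight per neuron per dimension.
stored_assignments = len(tap_idx)
digital_weights = len(neuron_xy) * dims
```

In this model every neuron receives a distinct encoder even though only the tap-point assignments are specified.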
In one embodiment, randomly assigning a large number of tap-points per dimension can yield encoding vectors that are fairly uniformly distributed on the hypersphere for ensembles. In other embodiments, selectively (non-randomly) assigning a smaller number of tap-points per dimension may be preferable where uniform distribution is undesirable or unnecessary; for example, selective assignments may be used to create a particular spatial or functional distribution. More generally, while the foregoing shared dendrite uses randomly assigned tap-points, more sophisticated strategies can be used to assign dimensions to tap-point location. Such strategies can be used to optimize the distribution of encoding vector directions for specific computations, minimize placement complexity, and/or vary encoding performances. Depending on configuration of the underlying grid (e.g., capacity for reconfigurability), these assignments may also be dynamic in nature.
In one exemplary variant, the dimension-to-tap-point assignment includes assigning a connectivity for different tap-points for the current. For example, as shown therein, accumulators 506A and 506B can be assigned to connect to various synapses e.g., 508A, 508B. In some cases, the assignments may be split evenly between positive currents (source) and negative currents (sink). In other words, positive currents may be assigned to a different spatial location than negative currents. In other variants, positive and negative currents may be represented within a single synapse.
In one exemplary embodiment, a diffuser is a resistive mesh implemented with transistors that sits between the synapse's outputs and the soma's inputs, spreading each synapse's output currents among nearby neurons according to their physical distance from the synapse. In one such variant, the space-constant of this kernel is tunable by adjusting the gate biases of the transistors that form the mesh. Nominally, the diffuser implements a convolutional kernel on the synapse outputs, and projects the results to the neuron inputs.
Referring now to
In one exemplary embodiment, the first biases may be selected to attenuate signal propagation as a function of distance from the various tap-points. By increasing the first bias, signals farther away from the originating synapse will experience more attenuation. In contrast, by decreasing the first bias, a single synapse can affect a much larger group of somas.
In one exemplary embodiment, the second biases may be selected to attenuate the amount of signal propagated to each soma. By increasing the second bias, a stronger signal is required to register as spiking activity; conversely decreasing the second bias results in more sensitivity.
Another set of transistors has a binary enable/disable setting thereby enabling “cuts” in the diffuser grid to subdivide the neural array into multiple logical ensembles. Isolating portions of the diffuser grid can enable a single array to perform multiple distinct computations. Additionally, isolating portions of the diffuser grid can enable the grid to selectively isolate e.g., malfunctioning portions of the grid.
While the illustrated embodiment shows a first and second set of biases, various other embodiments may allow such biases to be individually set or determined. Alternatively, the biases may be communally set. Still other variants of the foregoing will be readily appreciated by those of ordinary skill in the related arts, given the contents of the present disclosure. Similarly, various other techniques for selective enablement of the diffuser grid will be readily appreciated by those of ordinary skill given the contents of the present disclosure.
Furthermore, while the foregoing discussion is presented within the context of a two-dimensional diffuser grid, artisans of ordinary skill in the related arts will readily appreciate given the contents of the present disclosure that a variety of other substitutions and/or modifications may be made with equivalent success. For example, higher order diffuser grids may be substituted by stacking chips using TSVs (through-silicon-vias) to transmit analog signals between neighboring chips. In some such variants, additional dimensions may result in a more uniform distribution of encoding vectors on a hypersphere without increasing the number of tap-points per dimension.
As a brief aside, so-called “linear” decoders (commonly used in all-digital neural network implementations) decode a vector's transformation by scaling the decoding vector assigned to each neuron by that neuron's spike rate. The resulting vectors for the entire ensemble are summed. Historically, linear decoders were used because it was easy to find decoding vectors that closely approximate the desired transformation by e.g., minimizing the mean squared-error (MSE). However, as previously noted, linear decoders currently update the output for each incoming spike; more directly, as neural networks continue to grow in size, linear decoders require exponentially more memory accesses and/or computations.
However, empirical evidence has shown that when neuronal activity is conveyed as spike trains, linear decoding may be performed probabilistically. For example, consider an incoming spike of a spike train that is passed with a probability equal to the corresponding component of its neuron's decoding vector. Probabilistically passing the spike trains of the ensemble's neurons results in a point process that is characterized by a rate (r) that is proportional to the corresponding continuous signal in the transformed vector. Such memory-less schemes produce Poisson point processes, characterized by an SNR (signal-to-noise ratio) that grows only as a square root of the rate (r). In other words, to double the SNR, the rate (r) must be quadrupled (√4=2); by extension, reducing the rate (r) by a factor of four (4) only reduces the SNR by a factor of two (i.e., to ½ of its original value).
Referring now to
In slightly more detail, the weighted spike train is accumulated within the thresholding accumulator 706 via addition or subtraction according to weights stored within the decode weight memory 704; once the accumulated value breaches a threshold value (+C or −C), an output spike is generated for transmission via the assigned connectivity to synapses 708 and tap-points within the dendrite 710, and the accumulated value is decremented (or incremented) by the corresponding threshold value. In other variants, when the accumulated value breaches a threshold value, an output spike is generated, and the thresholding accumulator returns to zero.
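The accumulate-and-threshold behavior described above can be sketched in software as follows. This Python example is a minimal behavioral model, not a description of the circuit: the class name is hypothetical, integer weights are assumed, a "breach" is taken to mean reaching or exceeding the threshold, and each weight is assumed to have a magnitude no larger than the threshold so that at most one output spike is produced per input spike. It implements the variant in which the accumulated value is decremented (or incremented) by the threshold rather than reset to zero.

```python
class ThresholdingAccumulator:
    """Behavioral model of a thresholding accumulator with thresholds +C and -C."""

    def __init__(self, threshold):
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold
        self.value = 0

    def update(self, weight):
        """Accumulate one weighted input spike; return +1, -1, or 0 (no output spike)."""
        self.value += weight
        if self.value >= self.threshold:
            # Positive breach: emit an excitatory spike and subtract the threshold.
            self.value -= self.threshold
            return 1
        if self.value <= -self.threshold:
            # Negative breach: emit an inhibitory spike and add the threshold back.
            self.value += self.threshold
            return -1
        return 0


# Eight input spikes with decode weight 16 and threshold C = 64 (hypothetical values).
acc = ThresholdingAccumulator(threshold=64)
positive_outputs = [acc.update(16) for _ in range(8)]

# Four input spikes with decode weight -32 on a fresh accumulator.
acc_neg = ThresholdingAccumulator(threshold=64)
negative_outputs = [acc_neg.update(-32) for _ in range(4)]
```

In this example every fourth positive input produces one unit output spike, so the output stream carries one quarter of the input events.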
Replacing a linear decoding summation scheme with the thresholding accumulator as detailed herein greatly reduces traffic and avoids hardware multipliers, while simplifying the analog synapse's circuit design. Specifically, the thresholding accumulator sums the rates of deltas instead of superposing them. Accumulation is functionally equivalent to linear decoding via summation, since the NEF encodes the values of delta trains by their filtered rates. However, rather than using multilevel inputs which require a digital-to-analog converter (DAC) that can be costly in terms of area, exemplary embodiments use accumulator deltas that are unit-area deltas with signs denoting excitatory and inhibitory inputs (e.g., +1, −1). In this manner, streams of variable-area deltas generated from somas 702 can be converted back to a stream of unit-area deltas before being delivered to the synapses 708 via the accumulator 706. Operating on delta rates restricts the area of each delta in the accumulator's output train to +1 or −1, so that value is encoded by modulating only the rate and sign of the outputs. More directly, information is conveyed via a rate and sign, rather than by signal value (which requires multiply-accumulates to process).
For the usual case of weights smaller than one (1), the accumulator produces a lower-rate output stream, reducing traffic compared to the superposition techniques of linear decoding. As previously alluded to, linear decoding conserves spikes from input to output. Thus, $O(D_{in})$ deltas entering a $D_{in} \times D_{out}$ matrix will result in $O(D_{in} \times D_{out})$ deltas being output. This multiplication of traffic compounds with each additional weight matrix layer. For example, an N-D-D-N cascading architecture performs a cascaded decode-transform-encode such that $O(N)$ deltas from the neurons results in $O(N^2 D^2)$ deltas delivered to the synapses. In contrast, the exemplary inventive accumulator yields $O(N \times D)$ deltas to the synapses of the equivalent network.
In one exemplary embodiment, the thresholding accumulator 706 is implemented digitally for decoding vector components (stored digitally). In one such variant, the decoding vector components are eight (8) bit integer values. In other embodiments, the thresholding accumulator 706 may be implemented in analog via other storage type devices (e.g., capacitors, inductors, memristors, etc.)
In one exemplary embodiment, the accumulator's threshold (C) determines the number of incoming spikes (k) required to trigger an outgoing spike event. In one such variant, C is selected to significantly reduce downstream traffic and associated memory reads.
Mechanistically, the accumulator 706 operates as a deterministic thinning process that yields less noisy outputs than prior probabilistic approaches for combining weighted streams of deltas. The accumulator decimates the input delta train to produce its outputs, performing the desired weighting and yielding an output that more efficiently encodes the input, preserving most of the input's SNR while using fewer deltas.
The accumulator's SNR performance can be adjusted by increasing or decreasing decimation rates (SNR=E[X]/√var(X), where X is the filtered waveform). As shown in
Advantageously, the various principles described herein may be generalized and applied to many different types of applications and scenarios.
One such principle is specifically directed to a multi-layer kernel that synergistically leverages different characteristics of its constituent stages to perform neuromorphic computing. For example, a first stage may leverage the diversity inherent to analog circuitry to enable efficient shared dendritic encoding, whereas a second stage may use digital processing to enable e.g., threshold accumulation. More generally, analog domain processing inexpensively provides diversity, speed, and efficiency, whereas digital domain processing enables a variety of complex logical manipulations (e.g., digital noise rejection, error correction, arithmetic manipulations, etc.). Isolating these functional differences between different layers of a multi-layer (e.g., three-layer) kernel results in substantial operational efficiencies over two-layer kernels (e.g., an “all-digital kernel”). These and other benefits of the present disclosure will be made readily apparent to those of ordinary skill in the related arts, given the contents of the present disclosure.
As used herein, the term “mixed-signal” refers without limitation to circuitry that includes multiple “domains.” Further, as used herein, the term “domain” refers without limitation to a set of circuitries having a common set of processing characteristics. For example, a mixed-signal processor may have an analog domain and a digital domain. Other common examples of domains may include e.g., clock domains, power domains, logic domains, etc.
In one exemplary embodiment, each “layer” of a kernel operates in a functionally distinct domain. For example, a three-layer kernel can isolate an analog domain and a digital domain. In the previously described implementations, the analog domain handles a first processing stage, and the digital domain handles a second processing stage; however, other alternate or more complex configurations may be substituted with equal success. For instance, some layers may contain multiple stages that are logically isolated. Such implementations may have two distinct digital domains characterized by e.g., different threshold accumulation, etc. Other such implementations may have two distinct analog domains characterized by e.g., different tessellations, etc.
As a brief aside, “analog domain processing” refers to signal processing that is based on continuous physical values; common examples of continuous physical values are e.g., electrical charge, voltage, and current. For example, synapses generate analog current values that are distributed through a shared dendrite to somas. In contrast, “digital domain processing” refers to signal processing that is performed on symbolic logical values; logical values may include e.g., logic high (“1”) and logic low (“0”). For example, spike signaling in the digital domain uses data packets to represent a spike.
While exemplary embodiments have been described in the context of a three (3) layer kernel implementing one analog stage and a digital stage, artisans of ordinary skill in the related arts given the contents of the present disclosure will readily appreciate that any number of stages and/or types of domain may be substituted with equal success (including permutation of ordering). For example, a processor may cascade a myriad of digital domains (e.g., a multi-layer kernel that is composed of four (4) or more layers). Still other implementations may use other mixed-signal technologies e.g., electro-mechanical (e.g., piezo electric, surface acoustic wave, etc.). Moreover, while the foregoing discussions are presented in the context of a 2D array, incipient manufacturing technologies may enable more complex dimensions (e.g., 3D, 4D, higher dimensions).
Additionally, while the aforementioned exemplary embodiments describe spiking neural network computing, artisans of ordinary skill in the related arts, given the contents of the present disclosure, will readily appreciate that the principles described herein may be applied to any neuromorphic applications that benefit from diverse layering so as to effectuate one or more desired behaviors or functionalities; e.g., error-tolerant computing, reduced power consumption, and/or any other functionally unique computational primitives.
At step 902 of the method 900, a first matrix sub-computation and a second matrix sub-computation are received from a heterogeneous neuron programming framework. In one embodiment, the matrix sub-computations may be generated using the exemplary Neural Engineering Framework (NEF). For example, a user may call the NEF synthesis tool e.g., to solve a problem in user space.
The heterogeneous neuron programming framework can generate any number of matrix sub-computations; however, the heterogeneous neuron programming framework may consider (or be constrained to) relevant limitations of a physical device, application, and/or use constraint. In one exemplary embodiment, the exemplary NEF may consider physical parameters associated with a mixed-signal circuit. For example, the matrix sub-computations may be generated based on implementation limitations of the specific mixed-signal circuit. Common examples of such implementation limitations may include, without limitation, the number, type, spatial location, and/or other parameters associated with the computational primitives of the mixed-signal circuit. In another such example, the matrix sub-computations may be generated based on user application limitations; for example, a limited operational power budget may require reduced accuracy and/or robustness of a target dynamic.
In one embodiment, the matrix sub-computation describes one or more connections between neuromorphic elements and/or the corresponding magnitude and nature (e.g., excitatory, inhibitory, etc.) of connectivity. Examples of neuromorphic computing elements can include without limitation: neurons, somas, synapses, accumulators, routing elements, and/or any other mixed-signal component emulating neuromorphic functionality.
In one embodiment, the matrix sub-computations are two or more factor matrices of a factorized matrix. In one such implementation, the factorized matrix is a reduced rank matrix. In some implementations, the factorized matrix may be a full rank matrix that has been expanded and/or sparsified (e.g., the addition of additional rows and/or columns which are not linearly independent).
In some implementations, the neuron programming framework may generate matrix sub-computations randomly. Alternative neuron programming frameworks may use pseudo-random, deterministic, and/or predetermined techniques to generate the matrix sub-computations. Still other implementations may mathematically determine matrix sub-computations based on e.g., linearly mixing computational primitive behaviors to achieve a target dynamic. In some instances, the matrix sub-computations may be determined as a combination of multiple techniques e.g., a first matrix sub-computation may be randomly generated, and a second matrix sub-computation may be solved-for.
Returning again to FIG. 9A, at step 904 of the method 900, a first matrix sub-computation is assigned to a first layer of a multi-layer kernel architecture. In one embodiment, the first matrix sub-computation is an “encode” matrix that assigns input signals (e.g., from the user space) to the computational primitives (e.g., of the native space). In one specific implementation, the encode matrix assigns digital spikes to taps (spatial locations/coordinates) of one or more analog domain diffusers.
The analog domain diffuser(s) may perform physical manipulations on current. For example, the analog domain diffuser may distribute currents from one or more synapses to their associated somas via a network of impedance elements. In one exemplary embodiment, the diffuser network provides impedance as a function of spatial distribution. Exemplary diffusers are described within U.S. patent application Ser. No. ______, filed contemporaneously herewith on Jul. 10, 2019 and entitled “METHODS AND APPARATUS FOR SPIKING NEURAL NETWORK COMPUTING BASED ON RANDOMIZED SPATIAL ASSIGNMENTS”, previously incorporated supra.
In one embodiment, the first matrix sub-computation is assigned in a spatially sparse manner (e.g., the neuromorphic elements are distributed in space, and multiple neuromorphic elements are not connected). In one variant, the spatially sparse assignments are random. In some such implementations, the random assignments are based on a distribution; for example, a uniform distribution on a D-dimensional unit hypersphere's surface. In other variants, spatially sparse assignments may be generated based on specific properties and/or connectivity considerations.
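One common way to draw points from a uniform distribution on the surface of a D-dimensional unit hypersphere is to normalize a vector of independent Gaussian samples. The following is a minimal sketch of that technique; the function name `sample_unit_hypersphere`, the dimension, and the seed are assumptions made for illustration, and the disclosure does not specify how the distribution is sampled.

```python
import math
import random

def sample_unit_hypersphere(dim, rng):
    """Draw one point uniformly from the surface of the unit hypersphere.

    Normalizing a vector of independent standard Gaussians gives a
    uniformly distributed direction, because the joint Gaussian density
    depends only on the length of the vector.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:   # guard against the (extremely unlikely) zero vector
            return [c / norm for c in v]

rng = random.Random(0)   # fixed seed (an assumption) so the sketch is repeatable
encoders = [sample_unit_hypersphere(3, rng) for _ in range(5)]
```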
At step 906 of the method 900, a second matrix sub-computation is assigned to a second layer of a multi-layer kernel architecture. In one embodiment, the second matrix sub-computation is a “decode” matrix that linearly mixes outputs from the native space into the output vectors (e.g., of the user space). In one specific implementation, decoding includes assigning various decoding weights such that the linear mix of native space signaling approximates a target dynamic to within a desired tolerance.
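As a hedged sketch of how decoding weights might be chosen so that a linear mix approximates a target to within a tolerance, the example below solves a regularized least-squares problem. Everything specific here is an assumption for illustration: the rectified-linear `tuning` response, the eight hypothetical somas, the target function f(x) = x, the sample grid, and the regularizer `lam`. The disclosure does not prescribe this solver.

```python
# Sketch only: tuning curves, sample grid, and regularizer are assumptions.

def tuning(x, sign, intercept):
    """Hypothetical rectified-linear soma response to a scalar input x."""
    return max(0.0, sign * x - intercept)

def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

# Eight somas: four with sign +1 and four with sign -1.
somas = [(s, c) for s in (+1, -1) for c in (-0.6, -0.2, 0.2, 0.6)]
xs = [i / 20.0 - 1.0 for i in range(41)]               # inputs in [-1, 1]
target = xs                                            # target: f(x) = x

acts = [[tuning(x, s, c) for (s, c) in somas] for x in xs]   # 41 x 8
n = len(somas)
lam = 1e-3                                             # ridge regularizer
gram = [[sum(row[i] * row[j] for row in acts) + (lam if i == j else 0.0)
         for j in range(n)] for i in range(n)]
rhs = [sum(row[i] * t for row, t in zip(acts, target)) for i in range(n)]
decoders = solve(gram, rhs)

# Root-mean-square error of the decoded estimate against the target.
estimate = [sum(d * a for d, a in zip(decoders, row)) for row in acts]
rms = (sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(xs)) ** 0.5
```

The regularizer keeps the system solvable when soma responses are linearly dependent, at the cost of a small residual error.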
In one embodiment, the second matrix sub-computation is assigned to a digital domain of the mixed-signal circuit. The digital domain may perform, e.g., logical manipulations on digital spikes. For example, in one exemplary embodiment, digital spike values are multiplied with their corresponding weight values and accumulated within one or more threshold accumulators based on a plurality of decoding weights. Artisans of ordinary skill in the related arts will readily appreciate that a variety of logical operations may be substituted with equivalent success, the foregoing being purely illustrative.
In some implementations, assigning the decoding matrix may entail programming digital logic to achieve a desired arithmetic function. For example, in one such implementation, the digital domain may include a threshold accumulator that may trade off accuracy and/or robustness for other desirable traits. Reducing the spiking rate with a threshold accumulator may reduce power consumption while balancing loss in fidelity (e.g., signal to noise ratio (SNR)). In another such implementation, a decoding matrix may be configured to sum and/or weight a greater or fewer number of somas to achieve a target dynamic by trading precision for power consumption and/or complexity. More somas can be used to achieve higher precision, whereas fewer somas may be used where lower precision is acceptable.
In one exemplary embodiment, threshold accumulators may introduce temporal sparsity by reducing the spiking rates between the matrix sub-computations in the digital domain. In one such implementation, a thresholding accumulator can be used as an intermediary layer to reduce spiking rates, such as via the exemplary methods and apparatus described within U.S. patent application Ser. No. ______, filed contemporaneously herewith on Jul. 10, 2019 and entitled “METHODS AND APPARATUS FOR SPIKING NEURAL NETWORK COMPUTING BASED ON THRESHOLD ACCUMULATION”, previously incorporated supra.
Once assigned and programmed, the mixed-signal processor executes/implements the matrix sub-computations to perform neuromorphic computations.
At step 952 of the method 950, an input vector is encoded according to a first matrix sub-computation. In one embodiment, the input vector is a data structure in the “problem space” or “user space”. For instance, the data structure may comprise a “spike” that is represented as a data packet. The data packet may include e.g., an address, and a payload. The address may identify the computational primitive to which the spike is addressed. The payload may identify whether or not the spike is excitatory and/or inhibitory. More complex data structures may incorporate other constituent data within the payload. For example, such alternative payloads may include e.g., programming data, formatting information, metadata, error correction data, and/or any other ancillary data.
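The spike data packet described above can be sketched as a small data structure. The class and field names (`SpikePacket`, `Polarity`, `extra`) and the example addresses are hypothetical and are used only to illustrate an address plus a payload that marks the spike as excitatory or inhibitory.

```python
from dataclasses import dataclass
from enum import Enum

class Polarity(Enum):
    EXCITATORY = +1
    INHIBITORY = -1

@dataclass(frozen=True)
class SpikePacket:
    address: int          # computational primitive the spike is addressed to
    polarity: Polarity    # payload: excitatory or inhibitory
    extra: bytes = b""    # optional ancillary payload (e.g., metadata)

# A hypothetical input vector composed of two spike packets.
input_vector = [
    SpikePacket(address=3, polarity=Polarity.EXCITATORY),
    SpikePacket(address=7, polarity=Polarity.INHIBITORY),
]

# Tally the signed spike count delivered to each address.
net = {}
for pkt in input_vector:
    net[pkt.address] = net.get(pkt.address, 0) + pkt.polarity.value
```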
While various aspects of the present disclosure are primarily directed to an input vector composed of data packets, artisans of ordinary skill—given the contents of the present disclosure—will readily appreciate that the principles described herein are not so limited. A myriad of other data structures may be used within the user space and/or native space of the mixed-signal processor. Other common examples of data structures which may be encoded into the native space may include e.g., Booleans, signed/unsigned numeric values, integers, floating points, and/or any other data structure common within the digital processing arts.
In some implementations, the input vector may be a feed-forward signal received from an input to the mixed-signal processor. Common examples of inputs to the mixed-signal processor include without limitation: network interfaces, user interfaces, sensors, processing interfaces, memory interfaces, and/or any other similar source of problem space data. In other implementations, the input vector may be fed back from another computational primitive of the mixed-signal processor (e.g., so as to effectuate dynamic behaviors via recurrent or iterative neuromorphic networks). For example, a recurrent neural network may tie the outputs of a soma back to another synapse, soma, dendrite, threshold accumulator, and/or any other neuromorphic entity.
In one embodiment, the input vector is encoded based on an assigned weighting defined by the first matrix sub-computation. In previously described embodiments, the assigned weighting is provided via a connectivity that may be for example randomly chosen from a uniform distribution on a D-dimensional unit hypersphere's surface. In other words, the connectivity corresponds to a weighting of unconnected (0), excitatory (+1), or inhibitory (−1). Alternative implementations may assign weighting with other techniques; for example, weighting may be a programmable gradient within a range (e.g., from −1 to +1), random (e.g., distributed via a physical substrate), and/or otherwise sufficiently diversified to provide a sufficient basis set for approximating arbitrary multi-dimensional functions of the problem space.
In one exemplary embodiment, the first matrix sub-computation leverages the diversity of manufacturing tolerances of analog components to provide a diverse population of inexpensive physical manipulations. Specifically, different spatial locations (taps) of an analog diffuser may provide a variety of different physical properties. Other implementations may substitute any other sources of diversity, e.g., as a function of technology. For example, alternative schemes for introducing digital diversity may be based on explicit programming (via an LFSR or similar pseudo-random component) or lack thereof (uninitialized digital components often have an unknown state; for example, an uninitialized DRAM may have latent charges stored therein). Similarly, more esoteric technologies may have randomness by virtue of their manufacture (e.g., randomized taps in a piezo-electric or surface acoustic substrate, etc.).
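For the explicitly programmed case, a linear-feedback shift register (LFSR) can supply repeatable pseudo-random assignments. The sketch below is an assumption-laden illustration: the 16-bit register width, the tap positions (16, 14, 13, 11), the seed, and the mapping from bit pairs to weights are all chosen for the example and are not taken from the disclosure.

```python
def lfsr16(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def ternary_assignments(count, seed=0xACE1):
    """Map successive pairs of LFSR output bits to weights in {0, +1, -1}.

    The mapping table is arbitrary: two of the four bit pairs map to 0,
    which leaves roughly half of the assignments unconnected.
    """
    table = {0b00: 0, 0b01: +1, 0b10: -1, 0b11: 0}
    state = seed
    weights = []
    for _ in range(count):
        bits = 0
        for _ in range(2):               # collect two output bits
            bits = (bits << 1) | (state & 1)
            state = lfsr16(state)
        weights.append(table[bits])
    return weights

weights = ternary_assignments(64)
```

Because the sequence is fully determined by the seed, the same assignments can be regenerated rather than stored.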
In one exemplary embodiment, the first matrix sub-computation may physically manipulate electrical currents as a function of e.g., spatial distribution within a diffuser. In one implementation, the electrical current is distributed based on spatial locations (taps) within a diffuser element. More directly, the current between any selection of taps varies as a function of physical distance (e.g., due to the impedance of the underlying diffuser network). Performing the manipulation with passive electronics (e.g., attenuation via the I-V properties of a resistive component) is much more efficient as compared to arithmetic alternatives (e.g., digital processing). Additionally, manufacturing differences in the resistive mesh can contribute desirable sources of diversity at very reasonable cost.
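The distance-dependent distribution of current can be modeled in software with a simple stand-in. The sketch below assumes a one-dimensional line of taps and an exponential decay of current with distance; the decay law, the `space_constant` parameter, and the function name `diffuse` are assumptions for illustration and are not a description of the actual diffuser circuit.

```python
import math

def diffuse(tap_currents, soma_positions, space_constant=2.0):
    """Return the current reaching each soma from all driven taps.

    tap_currents maps a tap position along a 1-D line to the signed
    current injected at that tap. Each contribution is attenuated by
    exp(-distance / space_constant), and contributions are summed.
    """
    out = []
    for s in soma_positions:
        total = 0.0
        for pos, cur in tap_currents.items():
            total += cur * math.exp(-abs(s - pos) / space_constant)
        out.append(total)
    return out

taps = {0.0: +1.0, 6.0: -1.0}   # one excitatory tap, one inhibitory tap
somas = [0.0, 3.0, 6.0]         # hypothetical soma positions
currents = diffuse(taps, somas)
```

In this example the soma midway between the two taps receives equal and opposite contributions that cancel, while the somas at either end receive net currents of opposite sign.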
More generally however, any other manipulation may be substituted to accomplish the encoding functionality. Common examples of alternative manipulations may include analog processing such as: amplification, attenuation, filtering, mixing, splitting, and/or any other linear or non-linear signal manipulation. Examples of digital manipulations may include: scaling, filtering, multiplication/division, addition/subtraction, decimation, duplication and/or any other arithmetic manipulation.
Returning to
In one embodiment, the native space vector is an electrical current received at the computational primitive (e.g., soma) of the mixed-signal processor. The electrical current may be a linear superposition of electrical currents received from multiple neuromorphic computational primitives (e.g., somas) via a shared medium (e.g., a shared dendrite). While the present disclosure is described primarily with reference to the electrical current's attenuation (magnitude), artisans of ordinary skill in the related arts given the contents of the present disclosure will readily appreciate that other implementations may incorporate e.g., phase, timing, rate, decay, and/or other physical manipulations.
In one exemplary embodiment, digital spikes are generated for the digital domain based on analog current. In one such implementation, each soma element of a mixed-signal processor receives a current signal and generates “digital spikes” for use within the digital domain. The digital spikes are represented by packets which identify the firing soma (based on a logical address). Exemplary schemes for generating digital spikes are described in greater detail within U.S. patent application Ser. No. 16/358,501 filed Mar. 19, 2019 and entitled “METHODS AND APPARATUS FOR SERIALIZED ROUTING WITHIN A FRACTAL NODE ARRAY,” previously incorporated supra.
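A software analogue of a soma converting current into addressed digital spikes is sketched below. The integrate-and-fire rule, the threshold, the clamping of charge at zero, and the name `soma_spikes` are assumptions for illustration; the referenced application describes the actual spike generation schemes.

```python
def soma_spikes(address, current_samples, threshold=1.0):
    """Integrate sampled input current and emit one addressed spike each
    time the accumulated charge reaches the threshold.

    Returns a list of (step, address) pairs; the step index records only
    the order in which the spikes were generated.
    """
    spikes = []
    charge = 0.0
    for step, current in enumerate(current_samples):
        charge = max(0.0, charge + current)   # no negative charge in this sketch
        if charge >= threshold:
            spikes.append((step, address))
            charge -= threshold               # keep the residual charge
    return spikes

spikes = soma_spikes(address=5, current_samples=[0.5] * 8)
```

With a constant input of 0.5 per step and a threshold of 1.0, the hypothetical soma at address 5 fires on every second step.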
More generally, the computational primitive may convert the native space vector back to a problem space data structure (e.g., a data packet representing a spike) for decoding via the second matrix sub-computation. In other embodiments, the computational primitive may directly decode the native space vector. For example, all-digital multi-layer kernel architectures may use spikes in the native space. Similarly, cascaded analog layers may directly operate on weighting analog currents (e.g., via a series of amplifiers/attenuators, RC circuits, etc.) Still other variants may implement a variety of other decoding techniques based on e.g., magnitude, phase, timing, rate, decay, and/or any other neuromorphic property.
In one embodiment, the decoding is based on an assigned weighting defined by the second matrix sub-computation. In previously described embodiments, the assigned weighting is based on decoding weights that approximate a specific target dynamic to within a desired tolerance. The decoding weights may for instance be read from a decoding memory and used within a multiply-accumulate logic (such as a threshold accumulator) to generate spikes. Alternative implementations may decode native space vectors with other techniques; for example, decode weights may be based on a programmable gradient, time decay, binary, etc.
The second matrix sub-computation leverages the pristine digital domain to provide flexible, reliable, and/or complex logical manipulations. In particular, the second matrix sub-computation may implement a variety of different logical and/or processing operations. Common examples of logical and/or arithmetic operations include without limitation: add, subtract, multiply, divide, bit shift, accumulate, etc. Other examples of operations that can be performed may include e.g., matrix manipulations, error correction, error recovery, error detection, noise rejection, and/or any number of other arithmetic functions.
In one embodiment, the digital spikes may be arithmetically multiplied by a decoding weight and/or accumulated. As a brief aside, spike-based signaling may be implemented as edge-based logic; in other words, the spike may be only present or not present (binary) and has no timing relative to other spikes. In some more complex variants, spike-based signaling may additionally include polarity information (e.g., excitatory or inhibitory). Information may be conveyed either as a spike or a number of spikes (e.g., a spike train); for example, a spike train may be used to convey a spike rate (a number of spikes within a period of time). Notably, the binary and/or signed nature of spike signaling is particularly suitable for digital domain processing because of its immunity to noise and arithmetic nature (binary and/or signed).
At step 956 of the method 950, the decoded output may be further accumulated to generate an output vector (in user space). In one exemplary embodiment, an output vector is generated when an accumulated value breaches a prescribed threshold value. In one variant, the accumulated value exceeds a positive threshold and/or falls below a negative threshold.
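The thresholding behavior can be sketched as follows. The class name `ThresholdAccumulator`, the decode weight values, the choice to fire when the value reaches (rather than strictly exceeds) the threshold, and the choice to subtract the threshold after firing are assumptions made for this illustration.

```python
class ThresholdAccumulator:
    """Accumulate weighted spikes; emit a signed spike on a threshold breach."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.value = 0.0

    def add(self, weight, polarity=+1):
        """Accumulate one weighted input spike; return +1, -1, or 0."""
        self.value += weight * polarity
        if self.value >= self.threshold:
            self.value -= self.threshold    # keep the residual
            return +1
        if self.value <= -self.threshold:
            self.value += self.threshold
            return -1
        return 0

# Hypothetical decode weights, keyed by the address of the firing soma.
decode_weights = {0: 0.25, 1: -0.5}
acc = ThresholdAccumulator(threshold=1.0)

in_spikes = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]   # soma addresses, in order
out = [acc.add(decode_weights[a]) for a in in_spikes]
out_spike_count = sum(1 for o in out if o != 0)
```

Here twelve input spikes yield four output spikes, which illustrates how accumulation lowers the spike rate passed to subsequent layers.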
In some embodiments, the accumulating layer of a multi-layer kernel isolates different layers from one another. For example, the threshold accumulator of a three-layer kernel isolates decoding and encoding layers from one another. In other words, transactions from the encoding layer need not be immediately populated to the decoding layer. The isolation qualities of the threshold accumulator advantageously enable, inter alia, the digital domain to arithmetically manipulate digital spikes, and the analog domain to distribute electrical current via a physical diffuser device with reduced interaction.
More directly, the threshold accumulator in one variant presents a lossy interface between the different domains of the first and second matrix sub-computations. Functionally, some amount of loss may be desirable, such as where input vector activity provides more fidelity than is required to generate the output vector. Exemplary threshold accumulators are described in greater detail within U.S. patent application Ser. No. ______, filed contemporaneously herewith on Jul. 10, 2019 and entitled “METHODS AND APPARATUS FOR SPIKING NEURAL NETWORK COMPUTING BASED ON THRESHOLD ACCUMULATION”, previously incorporated supra.
It will be recognized that while certain embodiments of the present disclosure are described in terms of a specific sequence of steps of a method, these descriptions are only illustrative of the broader methods described herein, and may be modified as required by the particular application. Certain steps may be rendered unnecessary or optional under certain circumstances. Additionally, certain steps or functionality may be added to the disclosed embodiments, or the order of performance of two or more steps permuted. All such variations are considered to be encompassed within the disclosure and claimed herein.
Moreover, in the present specification, an implementation showing a singular component should not be considered limiting; rather, the disclosure is intended to encompass other implementations including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein.
Further, the present disclosure encompasses present and future known equivalents to the components referred to herein by way of illustration.
While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the art without departing from principles described herein. The foregoing description is of the best mode presently contemplated. This description is in no way meant to be limiting, but rather should be taken as illustrative of the general principles described herein. The scope of the disclosure should be determined with reference to the claims.
This application claims the benefit of priority to U.S. Provisional Patent Application Ser. No. 62/696,713 filed Jul. 11, 2018 and entitled “METHODS AND APPARATUS FOR SPIKING NEURAL NETWORK COMPUTING”, which is incorporated herein by reference in its entirety. This application is related to U.S. patent application Ser. No. ______ filed contemporaneously herewith on Jul. 10, 2019 and entitled “METHODS AND APPARATUS FOR SPIKING NEURAL NETWORK COMPUTING BASED ON THRESHOLD ACCUMULATION”, U.S. patent application Ser. No. ______, filed contemporaneously herewith on Jul. 10, 2019 and entitled “METHODS AND APPARATUS FOR SPIKING NEURAL NETWORK COMPUTING BASED ON RANDOMIZED SPATIAL ASSIGNMENTS”, and U.S. patent application Ser. No. 16/358,501 filed Mar. 19, 2019 and entitled “METHODS AND APPARATUS FOR SERIALIZED ROUTING WITHIN A FRACTAL NODE ARRAY”, each of the foregoing being incorporated herein by reference in its entirety.
This invention was made with Government support under contract N00014-15-1-2827 awarded by the Office of Naval Research, under contract N00014-13-1-0419 awarded by the Office of Naval Research and under contract NS076460 awarded by the National Institutes of Health. The Government has certain rights in the invention.
Number | Date | Country
---|---|---
62696713 | Jul 2018 | US