An artificial neural network (ANN) is a computational model (algorithm) based on the neuron model of the human brain. A simple model of a neuron has one or more input nodes (an input layer), followed by computational processing (in a hidden layer), which leads to one or more output nodes (an output layer). In the human nervous system, neurons are connected to each other and to inputs via synapses. An ANN is a simplified abstraction of its biological counterpart, the biological neural network (BNN). In an ANN, synapses are weights assigned to the input nodes; the value of a synapse weight is thus a measure of the strength of the corresponding input's contribution in determining the output nodes.
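As a purely illustrative sketch of the weight concept (not part of the claimed subject matter; all values and names are hypothetical), a single artificial neuron can be modeled as a weighted sum of its inputs followed by a threshold:

```python
# Minimal model of one artificial neuron: each input-node value is multiplied
# by its synapse weight, the contributions are summed, and a threshold decides
# the output-node value. All numbers are illustrative.
inputs  = [0.5, 0.2, 0.9]        # input-node values
weights = [0.8, -0.4, 0.3]       # synapse weights (contribution strengths)

activation = sum(x * w for x, w in zip(inputs, weights))
output = 1 if activation > 0.0 else 0
print(activation, output)        # ~0.59, 1
```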
ANNs are widely used for machine learning tasks such as stock market prediction, character recognition, speech recognition, threat recognition, machine vision (also known as computer vision), and image compression, to name just a few applications. In general, neural networks are useful for modeling scenarios in which the input-output relationship follows complicated patterns that are difficult to visualize and model.
The importance of the role of neurons in human vision is readily apparent when one compares current computer-based image processing systems to the way humans interpret image stimuli (i.e., visible light). Human vision is almost instantaneous upon receiving light, whereas computer image processing is much slower: acquired data must be off-loaded to a processor external to the focal plane array for processing. If, on the other hand, the pixels in a focal plane array could perform some basic image processing tasks themselves, machine vision could approach the speed of human vision. Such pixels would behave, in a rough approximation, like the neurons responsible for human vision. A focal plane array with this capability is a neuromorphic focal plane array.
Similarly, speech recognition using computers can potentially be accelerated by orders of magnitude if sound signals can be analyzed (at least partially) at the sensor level without post-processing using external processors.
This invention concerns neural architectures and systems. In one example, they could be part of a neuromorphic focal plane array capable of real time vision processing.
The invention describes methods by which a mathematical algorithm is realized in a neural network architecture and then instantiated in neuromorphic circuits and hardware for processing information. “Neuromorphic” here denotes that elements of the network and its physical realization (i.e., circuit or hardware) behave like biological neurons since they accept inputs and process them into intermediate outputs at the neuron level in real-time or near real-time.
In addition to computer vision, such architectures can also be used for sound processing. However, the emphasis in this invention is on neuromorphic pixels and focal plane arrays of pixels.
This invention shows neuromorphic models of pixels and operations of pixel arrays, and presents a method for translating them into a circuit board.
In general, according to one aspect, the invention disclosed here describes a method and a system for implementing a mathematical algorithm as a neural architecture and realizing that architecture on an integrated circuit (IC) chip. The neural architecture has neurons that are capable of converting current to frequency, voltage to frequency, frequency to frequency, and time to frequency. The neurons can have multi-sensor (multiple-synapse) inputs for either scaling or inhibiting neuron outputs.
The mathematical algorithm-to-neural architecture-to-hardware conversion method and system is general in nature but has been specifically demonstrated on mathematical algorithms designed for image processing applications.
In general, according to another aspect, the invention features a neuromorphic system for processing signals from a sensor. The system comprises a synapse and a neuron that receives a current from the synapse and produces a frequency output.
Preferably, the synapse is controlled by a bias voltage. It might receive a current from a photodetector.
In some examples, the synapse or neuron is controlled by a second sensor. Also, multiple synapses can feed into the same neuron.
In embodiments, the neuron comprises a capacitor that is charged by the current from the synapse. A comparator compares the voltage on the capacitor to a threshold voltage and resets the capacitor when the capacitor voltage reaches the threshold.
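A discrete-time software sketch of this integrate-and-fire behavior (illustrative only; the capacitance, threshold, and time step are assumed, normalized values rather than parameters of the disclosed circuit):

```python
# The synapse current charges a capacitor; a comparator checks the capacitor
# voltage against a threshold and resets the capacitor when it is reached.
def lif_spike_count(i_syn, c=1.0, v_th=1.0, dt=0.01, steps=1000):
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += (i_syn / c) * dt    # integrate: dV = (I/C) * dt
        if v >= v_th:            # comparator trips at the threshold voltage
            spikes += 1
            v = 0.0              # reset discharges the capacitor
    return spikes

# Spike count (i.e., output frequency) tracks the synapse current linearly.
print(lif_spike_count(0.5), lif_spike_count(1.0))   # 5 10
```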
To process information from an image sensor, frequency outputs from multiple neurons are collected to perform a convolution.
In general, according to another aspect, the invention features a method for embedding in an integrated circuit chip a neuromorphic architecture. The method comprises providing multiple neuromorphic circuit elements and performing a convolution with the circuit elements.
In general, according to another aspect, the invention features a method of embedding a mathematical algorithm into a neuromorphic architecture. The method comprises providing a desired algorithm, and generating a hardware-optimized algorithm, generating a neuron-optimized algorithm, and providing a Verilog description of chip design obtained from neural network definition of neuron-optimized algorithm, which in turn is obtained from the hardware-optimized algorithm.
The above and other features of the invention including various novel details of construction and combinations of parts, and other advantages, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular method and device embodying the invention are shown by way of illustration and not as a limitation of the invention. The principles and features of this invention may be employed in various and numerous embodiments without departing from the scope of the invention.
In the accompanying drawings, reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis has instead been placed upon illustrating the principles of the invention. Of the drawings:
The invention now will be described more fully hereinafter with reference to the accompanying drawings, in which illustrative embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Further, the singular forms and the articles “a”, “an” and “the” are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms: includes, comprises, including and/or comprising, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Further, it will be understood that when an element, including component or subsystem, is referred to and/or shown as being connected or coupled to another element, it can be directly connected or coupled to the other element or intervening elements may be present.
It will be understood that although terms such as “first” and “second” are used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. Thus, an element discussed below could be termed a second element, and similarly, a second element may be termed a first element without departing from the teachings of the present invention.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The basic element is a linear integrate-and-fire (LIF) neuron as exhibited in
This LIF node is capable of several types of data processing and transformations depending on the synapse's gate and source stimulus and the comparator's configuration. Furthermore, the synapse enables weighting of the integrated charge through numerous methods, e.g., FET width scaling, multiple synaptic paths, and adaptable gate voltage bias via wired control or programmable floating-gate. This can be used to perform scalar or non-linear functions allowing for features like per-neuron gain control or more complex mathematical operations like logarithmic transformations.
In summary, the core LIF node has several interesting characteristics: 1) Capability to process voltage, current, frequency, or time information, 2) Output in frequency or time, 3) Direct interface with digital logic or subsequent LIF stages enabling further quantization or computation, 4) Input scaling via synapse modulation, 5) Linear or non-linear input-to-output relationship (as configured), and 6) Very low power consumption.
When applied to large sensor systems, such as image sensors, this node can provide a variety of valuable features: 1) Low power data conversion between sensors and digital layer, 2) Reconfigurable synaptic modulation for real-time scaling changes, 3) Multi-modal processing—processing multiple sensor streams at the same time, and 4) Low power pre-processing, e.g., 2-D convolution, multiplication/division, saliency.
Data Conversion Capabilities of LIF:
Although the output of a LIF neuron is a frequency (voltage spikes per second), its input can be current, voltage, frequency, or time, since these quantities are mathematically related as discussed below.
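Under an idealized model of the LIF integrator (an assumed form consistent with the scaling behavior attributed to Eq. 1 below; Eq. 1 itself is not reproduced here), a constant synapse current I_syn charges the capacitor C to the comparator threshold V_th in a fixed time, so:

```latex
T_{fire} = \frac{C \, V_{th}}{I_{syn}}
\qquad\Rightarrow\qquad
F_{out} = \frac{1}{T_{fire}} = \frac{I_{syn}}{C \, V_{th}}
```

That is, the output frequency rises linearly with the synapse current and is independently scaled by the comparator threshold.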
Type 1 Current to Frequency (
In this mode, shown in
Type 2 Voltage to Frequency (
In this mode, a sensor 114 provides voltage information, and the resulting integrated current is modulated by the resistance of the synapse. A depiction is given in
Because of the voltage source, more current-scaling options can be employed, e.g., FET widening and multiple synaptic paths per input. This allows current gains greater than 1 to be achieved.
Type 3 Frequency to Frequency (
In this mode, the voltage source 114 of the synapse is fixed, and fixed-width pulse trains Fin are used to stimulate the synaptic channel. The comparator COMP and reset FET 112 are tuned to generate equivalently sized pulse widths upon firing. An example circuit (
Type 4 Time to Frequency (
This mode differs from the Type 3 mode only in that the input is time Tin rather than frequency. An example is given in
Output Scaling:
From Eq. 1, frequency can be independently scaled on a per-pixel basis via synapse current and/or Vth (COMP threshold) modulation. Synapse current can be modulated by Vbias and/or additional sink or source paths depending on whether the synapse is current or voltage sourced, respectively. Current-sourced synapses allow for a current scaling factor from 0 to 1. Voltage-sourced synapses allow for current scaling factors greater than 1. Vth modulation works as an independent scale factor linearly affecting the output frequency.
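A numerical illustration of these two independent scaling knobs, assuming the idealized relation sketched above (all values normalized and hypothetical):

```python
# Output frequency under the assumed ideal relation F = I_syn / (C * V_th).
def f_out(i_syn, v_th, c=1.0):
    return i_syn / (c * v_th)

base = f_out(1.0, 1.0)
print(f_out(0.5, 1.0) / base)   # 0.5: current-sourced synapse scaling in (0, 1]
print(f_out(2.0, 1.0) / base)   # 2.0: voltage-sourced synapse scaling above 1
print(f_out(1.0, 2.0) / base)   # 0.5: V_th modulation as an independent factor
```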
Multi-Sensor Processing:
To summarize the information presented thus far, the core LIF node is capable of operating on a wide range of information types. Any combination of voltage/current/frequency/time input to frequency/time output is achievable through configurations of the synapse and neuron. Furthermore, since the integration mechanism is the same for all modes, additional synaptic pathways controlled by disparate (or similar) sensor types can be added. These additional pathways operate according to Kirchhoff's current law, enabling multi-sensor interaction.
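A sketch of the multi-synapse interaction, assuming the pathway currents simply sum at the integration node per Kirchhoff's current law, with excitatory paths sourcing current and inhibitory paths sinking it (all names and values are illustrative):

```python
# Multiple synaptic pathways into one integration node: the net charging
# current is the signed sum of the pathway currents (Kirchhoff's current law).
def net_current(excitatory, inhibitory):
    return sum(excitatory) - sum(inhibitory)

# Two sensor streams driving one neuron; the inhibitory stream competes
# against the excitatory one before integration begins.
print(net_current(excitatory=[0.8, 0.3], inhibitory=[0.4]))   # ~0.7
```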
An example is shown in
A sensor stream can be used to scale or directly compete against another stream as exemplified by the two images of
In
In
The above examples show that several linear operations, e.g., addition, subtraction, and scaling, are directly realizable via LIF variations. These operations form the foundation of 2-D convolution, a common pre-processing step to a large variety of algorithms and a near-ubiquitous step in imaging applications.
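For reference, a plain software statement of the 2-D convolution these linear operations compose into (a generic sketch, independent of any particular circuit; image and kernel values are illustrative):

```python
# 2-D convolution at one pixel with a 3x3 kernel: nine scalings (synapse
# weights) followed by a signed sum, i.e., exactly the linear operations above.
def conv3x3_at(image, kernel, r, c):
    acc = 0.0
    for i in range(3):
        for j in range(3):
            acc += kernel[i][j] * image[r + i - 1][c + j - 1]
    return acc

image  = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]   # Laplacian-like example
print(conv3x3_at(image, kernel, 1, 1))           # 4*5 - (2+4+6+8) = 0.0
```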
Pre-Processing Computation:
Linear, Frequency Mode:
Eqs. 6 through 9 detail the mechanics of this operation. As is shown, Fout is proportional to the weighted sum of Fin1 and Fin2 multiplied by a scaling factor. To allow easier handling of the equations, this operation can be modeled more simply with Eqs. 10 and 11. This simpler model is depicted in
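Since Eqs. 6 through 11 are not reproduced here, the following is only a schematic rendering of the simplified model just described, with hypothetical weights w1, w2 and scale factor k standing in for the quantities of Eqs. 10 and 11:

```python
# Simplified linear frequency-mode model: the output frequency is a scaled,
# weighted sum of the two input frequencies.
def f_out_model(f_in1, f_in2, w1, w2, k):
    return k * (w1 * f_in1 + w2 * f_in2)

print(f_out_model(1000.0, 500.0, w1=0.6, w2=0.4, k=0.18))   # 0.18*(600+200) = 144.0
```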
In addition, a mathematical algorithm can be identified and implemented in neuromorphic hardware. The key components of this process are shown in
The mathematical algorithm ST1 can be provided in a variety of forms, but preferably in a high-level language such as Matlab. Once the high-level algorithm is defined, the hardware optimization begins by identifying functional pieces of the algorithm that may not be amenable to hardware implementation. In the current process this is done manually, but in the preferred embodiment automated tools assist in this identification. Certain mathematical processes, such as argmax, which denotes the input value at which a function is maximum, or more complex mathematical functions (e.g., von Mises distributions, Bessel functions, etc.), do not lend themselves to straightforward hardware implementations. At this stage, they are replaced with hardware-friendly approximations or substituted by simpler functions that retain the key aspects of the algorithm (for instance, substituting the L1 norm for an L2 norm). The result of this step ST2 is the hardware-optimized algorithm.
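As a concrete instance of such a substitution (the disclosure names this example, but the code below is only an illustrative sketch), replacing an L2 norm with an L1 norm eliminates the squaring and square-root operations that are awkward in hardware:

```python
import math

# The L2 norm needs multiplications and a square root; the L1 norm needs only
# absolute values and additions, which map far more easily onto hardware.
def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def l1_norm(v):
    return sum(abs(x) for x in v)

v = [3.0, -4.0, 1.0]
print(l2_norm(v), l1_norm(v))   # ~5.10 vs 8.0; orderings typically agree
```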
The disclosed neuromorphic hardware affords unique advantages in terms of power savings and computational performance. To provide the best performance of the hardware-optimized algorithm in the neuromorphic hardware, the next step looks at where in the hardware-optimized algorithm one can get further optimizations based on the performance characteristics of the neurons themselves. This can take the form of identifying areas in the algorithms that can take advantage of reduced precision computation (i.e., such as quantizing kernel weights) and time-mode processing inherent in neural systems. The result is a neuron-optimized version ST3 of the mathematical algorithm.
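A sketch of one such reduced-precision rewrite, assuming simple uniform quantization of convolution kernel weights (the level count and rounding scheme are assumptions, not taken from the disclosure):

```python
# Uniformly quantize kernel weights to a small set of signed levels, the kind
# of reduced-precision step identified at this stage.
def quantize(weights, levels=7):
    peak = max(abs(w) for w in weights)
    step = peak / (levels // 2)              # symmetric signed levels
    return [round(w / step) * step for w in weights]

kernel = [-1.0, 0.25, 1.0, 0.5, -0.125, 0.75, 0.0, -0.5, 0.3]
print(quantize(kernel))                      # each weight snapped to a level
```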
The translation from the neuron-optimized code to the neural network definition involves taking that neuron-optimized code and expanding on the capabilities of neuromorphic computing by performing calculations through chains of synthetic neurons, thereby increasing the number of inputs to the system beyond the limiting number of synapses. A detailed description of this process with a specific example is discussed with respect to scaling below. This results in a neural network definition ST4 that can be realized on neuromorphic hardware.
The translation of the neural network definition to the Verilog description ST5 begins the automated process of implementing the algorithm in hardware. The neural network definition is similar to a netlist. The Verilog tools take that netlist and translate it to a Verilog description ST5 which is reviewed and modified to produce a chip design ST6 in adherence to fabrication rules. That chip design is then fabricated resulting in a fabricated IC chip ST7 that has analog neuromorphic circuitry that implements the original mathematical algorithm.
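The disclosure does not fix a file format for the neural network definition; purely to illustrate its netlist-like character, it might be represented as a list of neuron instances with their synapse sources and weights (the format, names, and values below are hypothetical):

```python
# Hypothetical netlist-like neural network definition: each entry names a
# neuron instance and lists (source, weight) pairs for its synapses,
# analogous to component/net pairs in a circuit netlist.
network = [
    {"neuron": "n0", "synapses": [("pix_0_0", 0.6), ("pix_0_1", 0.4)]},
    {"neuron": "n1", "synapses": [("pix_1_0", 0.6), ("pix_1_1", 0.4)]},
    {"neuron": "n2", "synapses": [("n0", 1.0), ("n1", -1.0)]},  # subtraction
]
for inst in network:
    print(inst["neuron"], "<-", inst["synapses"])
```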
Most techniques try to go directly from the mathematical algorithm to neuromorphic hardware without being fully optimized. Opportunities for optimization may be missed by not performing the first four steps of the process of
Others have tried to solve the problem, but their fundamental approach is different in that it is either all digital or mixed signal. Furthermore, they tend to go directly from the algorithm to hardware without searching for possible areas of optimization.
The analog neuromorphic hardware affords power savings and computational capability that other approaches (digital, mixed signal) do not have. This opens opportunities for implementing algorithms in analog neuromorphic hardware that other approaches lack. Thus, the first four steps, while manual, afford optimizations that other approaches do not.
Scaling:
Currently, one neuron with a set number of synapses is used to perform one calculation, such as the 2-D convolution with a 3×3 kernel. For convolution, a neuron with 9 synapses is used, one for each element of the kernel. However, in some cases, the number of inputs to a system will outnumber the synapses of the available neuron. In order to expand on the capabilities of neuromorphic computing, calculations are performed through chains of synthetic neurons, thereby increasing the number of inputs to the system beyond the limiting number of synapses. Here a methodology is introduced to implement this. As a test case, the methodology computes a 2-D convolution of an image with a 3×3 kernel using only synthetic neurons with 2 synapses.
For a test case a chip with 4 neurons, each of which has 2 synapses, was used. The neurons are labeled n0, n1, n2 and n3. The inputs and synapse weights of each neuron are configured by an FPGA. Each neuron will be used for multiple calculations, and the intermediate results will be stored in the FPGA to be passed as inputs to the next cycle of calculations.
To replicate the function of a single 9-synapse neuron, a tree of 2-synapse neurons is built, as shown in
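A software sketch of the tree, assuming each 2-synapse neuron simply adds its two weighted inputs; the nine kernel products are reduced pair by pair until one value remains (all values illustrative):

```python
# Reduce nine weighted kernel products through a tree of 2-input additions,
# mimicking chained 2-synapse neurons. An odd operand is carried forward.
def pairwise_tree_sum(values):
    while len(values) > 1:
        nxt = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:              # unpaired element passes to next level
            nxt.append(values[-1])
        values = nxt
    return values[0]

weights = [1, 0, -1, 2, 0, -2, 1, 0, -1]         # illustrative 3x3 kernel
pixels  = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # illustrative 3x3 patch
products = [w * p for w, p in zip(weights, pixels)]
print(pairwise_tree_sum(products))               # -80, equal to sum(products)
```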
For each pixel, one first determines the synapse weights and inputs for each neuron based on the location of the pixel in the image and the convolution kernel. Due to edge effects, neuron inputs need to be duplicated when the filter kernel is run on edge or corner pixels. The inputs for each element of the kernel operating on a corner or edge of the image are listed below; a software sketch of the duplication follows the list.
For clarity, p is omitted in
The side pixels for the xth column will be represented as follows: p_{x,0} = top and p_{x,m} = bottom. The side pixels for the yth row are p_{0,y} = left and p_{n,y} = right.
Upper left (
Upper right (
Lower left (
Lower right (
Top, column. x (
Bottom, column x (
Side left, row y (
Side right, row y (
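A sketch of this input duplication using replicate ("edge") padding, which reproduces the corner and edge listings above in a single step (NumPy is assumed here purely for brevity):

```python
import numpy as np

# Replicating border pixels gives a 3x3 kernel a full set of nine inputs even
# at corners and edges, matching the corner/edge input listings above.
image = np.arange(16, dtype=float).reshape(4, 4)
padded = np.pad(image, pad_width=1, mode="edge")
print(padded)
# The 3x3 neighborhood of corner pixel p_{0,0} now contains p_{0,0} four
# times, p_{0,1} and p_{1,0} twice each, and p_{1,1} once.
```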
Initially, all 9 inputs were added pair-wise without regard to whether the synapse weights were inhibitory or excitatory. However, this led to cases where a large inhibitory input was added to a small excitatory input, resulting in an output of 0, since the neurons are incapable of outputting negative values. This introduced a large amount of error. It was corrected by sorting the inputs into inhibitory and excitatory groups and performing pair-wise addition on each type of input separately. For this addition, all inhibitory inputs were treated as excitatory; only in the final step were the inhibitory inputs negated and the subtraction performed. This ensures that large inhibitory inputs are not masked by addition with small excitatory inputs.
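A sketch of the corrected ordering, under the stated constraint that neuron outputs clamp at zero: inhibitory magnitudes are accumulated as if excitatory, and the single subtraction happens last (values illustrative):

```python
# Neurons cannot output negative rates, so naive mixed-sign pairwise addition
# can clamp to zero mid-tree and discard inhibitory contributions. Instead,
# accumulate excitatory and inhibitory magnitudes separately and subtract once.
def signed_sum(weighted_inputs):
    excit = sum(v for v in weighted_inputs if v > 0)
    inhib = sum(-v for v in weighted_inputs if v < 0)   # treated as excitatory
    return max(excit - inhib, 0.0)                      # single final subtraction

print(signed_sum([10, -30, 80, -120, 70, -90]))   # clamps only at the last step
```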
Each addition through a neuron also introduced a scale factor, indicated by square brackets in Eq. 9, to the output. Under the operating conditions of the chip, the scale factor equaled approximately 0.18. As the tree structure required passing an input through multiple neurons, this reduced the output magnitude by up to 10^4. To compensate, an inverse scale factor (rounded to the nearest integer) was introduced through the FPGA, and all intermediates were multiplied by this factor. To ensure that all inputs are subject to the same number of scalings, the pair-wise addition tree was rebuilt such that each input passes through 4 neurons.
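A worked illustration of the compensation arithmetic, using the approximately 0.18 per-neuron scale factor reported above (the per-stage rounding choice is an assumption):

```python
# Four chained neurons attenuate a signal by 0.18**4; an integer inverse
# scale factor applied to each intermediate in the FPGA compensates.
scale = 0.18
per_stage_inverse = round(1 / scale)        # 1/0.18 ~ 5.56, rounds to 6
print(scale ** 4)                           # ~1.05e-3 net attenuation
print((scale * per_stage_inverse) ** 4)     # ~1.36 residual gain after compensation
```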
In addition, the synthetic neurons reported higher spike counts than expected when given lower synapse weights. In these cases, the error caused by the higher spike counts was compensated by decreasing the compensating scale factor. Given the following weight selections, the scaling factors for the output of the neuron are as follows:
Additionally, a bias was found in the subtraction operation as performed by the synthetic neurons. When adding an excitatory input and an inhibitory input, the neuron requires the sinking inhibitory input to be larger than the sourcing excitatory input to produce a result of 0. At the final subtraction, the excitatory input must therefore be scaled down to obtain an accurate output. This was compensated for by exploiting the variance in the performance of the 4 neurons on the chip: neuron n1, when given an input of between 3300 and 5000 spikes in 2 ms, undercounts by 15-20%. The final addition of the excitatory inputs was therefore routed through this neuron to scale down its output relative to the output from the addition of the inhibitory outputs. The residual bias left after this adjustment was removed in post-processing of the image by subtracting a constant from all recorded FPGA outputs.
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
This application claims the benefit under 35 USC 119(e) of U.S. Provisional Application No. 62/474,353 filed on Mar. 21, 2017, which is incorporated herein by reference in its entirety.