This application is related to co-owned U.S. patent application Ser. No. 13/238,932 filed Sep. 21, 2011, entitled “ADAPTIVE CRITIC APPARATUS AND METHODS”, now issued as U.S. Pat. No. 9,156,165, U.S. patent application Ser. No. 13/313,826, filed Dec. 7, 2011, entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, U.S. patent application Ser. No. 13/314,018, filed Dec. 7, 2011, entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, and U.S. patent application Ser. No. 13/314,066, filed Dec. 7, 2011, entitled “NEURAL NETWORK APPARATUS AND METHODS FOR SIGNAL CONVERSION”, each of the foregoing incorporated herein by reference in its entirety.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
1. Technological Field
The present disclosure relates to machine learning apparatus and methods, and in particular, to learning with analog and/or spiking signals in artificial neural networks.
2. Description of Related Art
Artificial spiking neural networks are frequently used to gain an understanding of biological neural networks and for solving artificial intelligence problems. These networks typically employ pulse-coded mechanisms, which encode information using the timing of pulses. Such pulses (also referred to as “spikes” or “impulses”) are short-lived (typically on the order of 1-2 ms) discrete temporal events. Several exemplary embodiments of such encoding are described in commonly owned and co-pending U.S. patent application Ser. No. 13/152,084 entitled “APPARATUS AND METHODS FOR PULSE-CODE INVARIANT OBJECT RECOGNITION”, filed Jun. 2, 2011, and U.S. patent application Ser. No. 13/152,119, filed Jun. 2, 2011, entitled “SENSORY INPUT PROCESSING APPARATUS AND METHODS”, now issued as U.S. Pat. No. 8,942,466, each incorporated herein by reference in its entirety.
Spiking neural networks offer several benefits over other classes of neural networks, including without limitation: greater information and memory capacity, richer repertoire of behaviors (tonic and/or phasic spiking, bursting, spike latency, spike frequency adaptation, resonance, threshold variability, input accommodation and bi-stability), as well as efficient hardware implementations.
Biological neurons may be classified according to their electrophysiological characteristics and discharge patterns. Similarly, in artificial spiking neuron networks, tonic or regular spiking may be used to describe neuron behavior where the neuron is typically constantly (or tonically) active. Phasic or bursting spiking may be used to describe neuron behavior where the neuron fires in bursts.
In various implementations of spiking neural networks, it may be assumed that weights are the parameters that can be adapted. This process of adjusting the weights is commonly referred to as “learning” or “training”.
Supervised learning is often used with spiking neural networks. In supervised learning, a set of example pairs (x, yd), x∈X, yd∈Y is given, where X is the input domain and Y is the output domain, and the aim is to find a function ƒ: X→Y in the allowed class of functions that matches the examples. In other words, the aim is to infer the mapping implied by the data. The learning process is evaluated using a so-called “cost function”, which quantifies the mismatch between the mapping and the data, and implicitly contains prior knowledge about the problem domain. A commonly used cost function is the mean-squared error, which measures the average squared error between the network's output y and the target value yd over all the example pairs.
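For concreteness, with N example pairs the mean-squared error cost may be written as follows (the 1/N averaging convention shown here is one common choice, not mandated by the text):

$$E = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - y^{d}_k\right)^2.$$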
In various control applications (e.g., when controlling a motor actuator), it may be required to gate analog and/or spiking signals based on a spiking and/or analog input. When implementing gating functionality, most existing methodologies for implementing learning for analog and spiking signals in artificial neural networks employ different node types and learning algorithms, each configured to process only one specific signal type, for example, only analog or only spiking signals. Such an approach has several shortcomings: for example, the necessity to provide and maintain learning rules and nodes of different types, and node duplication and proliferation in circumstances in which the network is configured to process signals of mixed types (analog and spiking). Network configurations comprising nodes of different types therefore prevent dynamic node reconfiguration and reuse during network operation. Furthermore, learning methods of the prior art that are suitable for analog signals are not suitable for spike-timing encoded signals. Similarly, learning rules for spike-based signals are not efficient in training neural networks to process analog signals.
Based on the foregoing, there is a salient need for apparatus and methods that implement a unified approach to learning and training of artificial neuronal networks comprising spiking neurons capable of processing spiking and/or analog inputs and generating spiking and/or analog outputs.
The present disclosure satisfies the foregoing needs by providing, inter alia, apparatus and methods for implementing learning in artificial neural networks.
In one aspect of the disclosure, a method of operating a node in a computerized neural network is disclosed. In one embodiment, the method includes: (i) causing the node to generate tonic spiking output using a learning rule which combines at least one spiking input signal and at least one analog input, and (ii) causing the node to suppress output generation for a period of time using a teaching signal associated with the learning rule.
In a second aspect of the disclosure, a computer implemented method of implementing learning in a neural network is disclosed. In one embodiment, the method includes: (i) processing (e.g., at a node of the network) at least one spiking input signal and at least one analog input signal using a parameterized rule, (ii) modifying a state of the node in accordance with the parameterized rule, based on the spiking signal and the analog signal, and (iii) generating a spiking output signal at the node based on the modification of the node state.
In a third aspect of the disclosure, a computer-implemented method of synaptic gating in a network is disclosed. In one embodiment, the method is performed by one or more processors configured to execute computer program modules.
In one variant, the method includes: (i) generating an output at a node of the network, the output configured to inhibit a gating unit of the network, (ii) based on at least one spiking input signal, at least one continuous input signal and a teaching signal, pausing generation of the output; and (iii) based on the pausing, activating the gating unit, thereby effectuating the synaptic gating.
In a fourth aspect of the disclosure, a non-transitory computer-readable apparatus configured to store one or more processes thereon is disclosed. In one embodiment, the one or more processes are configured to implement a learning rule on a neural network. The one or more processes comprise in one variant a plurality of instructions configured to, when executed: (i) receive, at a node of the neural network, at least one discrete input signal and at least one continuous input signal, (ii) based at least in part on the at least one discrete signal and the at least one continuous signal, adjust at least one characteristic of the node in accordance with the learning rule, and (iii) based at least in part on the adjustment, generate at least one of (a) a discrete output and (b) a continuous output at the node.
In a fifth aspect of the disclosure, a neural network configured to implement synaptic gating in accordance with at least one parameterized rule is disclosed. In one embodiment, the network includes: (i) a plurality of connections configured to facilitate transmission of spiking and non-spiking signals and (ii) a plurality of mixed-mode nodes in operative communication with said plurality of connections.
In one variant, the nodes are configured to: (i) generate an output configured to inhibit one or more synaptic gates, and (ii) cease generation of the output for a duration based on a parameterized rule. In response to the cessation, the one or more synaptic gates are configured to switch one or more signals transmitted via the plurality of connections.
In a sixth aspect of the disclosure, neuronal network logic is disclosed. In one implementation, the neuronal network logic comprises a series of computer program steps or instructions executed on a digital processor. In another implementation, the logic comprises hardware logic (e.g., embodied in an ASIC or FPGA).
In a seventh aspect of the disclosure, a computer readable apparatus is disclosed. In one implementation the apparatus comprises a storage medium having at least one computer program stored thereon. The program is configured to, when executed, implement learning in a mixed signal artificial neuronal network.
In an eighth aspect of the disclosure, a system is disclosed. In one implementation, the system comprises an artificial neuronal (e.g., spiking) network having a plurality of “universal” nodes associated therewith, and a controlled apparatus (e.g., robotic or prosthetic apparatus).
In a ninth aspect of the disclosure, a universal node for use in a neural network is disclosed. In one implementation, the node is capable of dynamically adjusting or learning with respect to heterogeneous (e.g., spiking and non-spiking) inputs.
Further features of the present disclosure, its nature and various advantages will be more apparent from the accompanying drawings and the following detailed description.
All Figures disclosed herein are © Copyright 2012-2013 Brain Corporation. All rights reserved.
Exemplary implementations of the present disclosure will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the disclosure. Notably, the figures and examples below are not meant to limit the scope of the present disclosure to a single implementation. Rather, other implementations are possible by way of interchange of or combination with any or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or similar parts.
Where certain elements of these implementations can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present disclosure will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the disclosure.
In the present specification, an implementation showing a singular component should not be considered limiting; rather, the disclosure is intended to encompass other implementations including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein.
Further, the present disclosure encompasses present and future known equivalents to the components referred to herein by way of illustration.
As used herein, the terms “computer”, “computing device”, and “computerized device” may include one or more of personal computers (PCs) and/or minicomputers (e.g., desktop, laptop, and/or other PCs), mainframe computers, workstations, servers, personal digital assistants (PDAs), handheld computers, embedded computers, programmable logic devices, personal communicators, tablet computers, portable navigation aids, J2ME equipped devices, cellular telephones, smart phones, personal integrated communication and/or entertainment devices, and/or any other device capable of executing a set of instructions and processing an incoming data signal.
As used herein, the term “computer program” or “software” may include any sequence of human and/or machine cognizable steps which perform a function. Such program may be rendered in a programming language and/or environment including one or more of C/C++, C#, Fortran, COBOL, MATLAB™, PASCAL, Python, assembly language, markup languages (e.g., HTML, SGML, XML, VoXML), object-oriented environments (e.g., Common Object Request Broker Architecture (CORBA)), Java™ (e.g., J2ME, Java Beans), Binary Runtime Environment (e.g., BREW), and/or other programming languages and/or environments.
As used herein, the terms “connection”, “link”, “transmission channel”, “delay line”, and “wireless” may include a causal link between any two or more entities (whether physical or logical/virtual), which may enable information exchange between the entities.
As used herein, the term “memory” may include an integrated circuit and/or other storage device adapted for storing digital data. By way of non-limiting example, memory may include one or more of ROM, PROM, EEPROM, DRAM, Mobile DRAM, SDRAM, DDR/2 SDRAM, EDO/FPMS, RLDRAM, SRAM, “flash” memory (e.g., NAND/NOR), memristor memory, PSRAM, and/or other types of memory.
As used herein, the terms “integrated circuit”, “chip”, and “IC” may be meant to refer to an electronic circuit manufactured by the patterned diffusion of trace elements into the surface of a thin substrate of semiconductor material. By way of non-limiting example, integrated circuits may include field programmable gate arrays (e.g., FPGAs), a programmable logic device (PLD), reconfigurable computer fabrics (RCFs), application-specific integrated circuits (ASICs), and/or other types of integrated circuits.
As used herein, the terms “microprocessor” and “digital processor” may be meant generally to include digital processing devices. By way of non-limiting example, digital processing devices may include one or more of digital signal processors (DSPs), reduced instruction set computers (RISC), general-purpose (CISC) processors, microprocessors, gate arrays (e.g., field programmable gate arrays (FPGAs)), PLDs, reconfigurable computer fabrics (RCFs), array processors, secure microprocessors, application-specific integrated circuits (ASICs), and/or other digital processing devices. Such digital processors may be contained on a single unitary IC die, or distributed across multiple components.
As used herein, the term “network interface” refers to any signal, data, and/or software interface with a component, network, and/or process. By way of non-limiting example, a network interface may include one or more of FireWire (e.g., FW400, FW800, etc.), USB (e.g., USB2), Ethernet (e.g., 10/100, 10/100/1000 (Gigabit Ethernet), 10-Gig-E, etc.), MoCA, Coaxsys (e.g., TVnet™), radio frequency tuner (e.g., in-band or OOB, cable modem, etc.), Wi-Fi (802.11), WiMAX (802.16), PAN (e.g., 802.15), cellular (e.g., 3G, LTE/LTE-A/TD-LTE, GSM, etc.), IrDA families, and/or other network interfaces.
As used herein, the terms “node”, “neuron”, and “neuronal node” may be meant to refer, without limitation, to a network unit (e.g., a spiking neuron and a set of synapses configured to provide input signals to the neuron) having parameters that are subject to adaptation in accordance with a model.
As used herein, the terms “state” and “node state” may be meant generally to denote a full (or partial) set of dynamic variables used to describe the dynamics of the node.
As used herein, the terms “synaptic channel”, “connection”, “link”, “transmission channel”, “delay line”, and “communications channel” include a link between any two or more entities (whether physical (wired or wireless), or logical/virtual) which enables information exchange between the entities, and may be characterized by one or more variables affecting the information exchange.
As used herein, the term “Wi-Fi” may include one or more of IEEE-Std. 802.11, variants of IEEE-Std. 802.11, standards related to IEEE-Std. 802.11 (e.g., 802.11 a/b/g/n/s/v/ac), and/or other wireless standards.
As used herein, the term “wireless” means any wireless signal, data, communication, and/or other wireless interface. By way of non-limiting example, a wireless interface may include one or more of Wi-Fi, Bluetooth, 3G (3GPP/3GPP2), HSDPA/HSUPA, TDMA, CDMA (e.g., IS-95A, WCDMA, etc.), FHSS, DSSS, GSM, PAN/802.15, WiMAX (802.16), 802.20, narrowband/FDMA, OFDM, PCS/DCS, LTE/LTE-A/TD-LTE, analog cellular, CDPD, satellite systems, millimeter wave or microwave systems, acoustic, infrared (i.e., IrDA), and/or other wireless interfaces.
Overview
In one aspect of the disclosure, apparatus and methods for a universal node design directed at implementing a universal learning rule in a neural network are disclosed. This approach advantageously allows, inter alia: simultaneous processing of different input signal types (e.g., spiking and non-spiking, such as analog) by the nodes; generation of spiking and non-spiking signals by the node; and dynamic reconfiguration of universal nodes in response to changing input signal type and/or learning input at the node, capabilities not available in existing spiking network solutions. These features are enabled, at least in part, through the use of a parameterized universal learning model configured to automatically adjust node model parameters responsive to the input types during training. The approach is especially useful in mixed signal (heterogeneous) neural network applications.
In one implementation, at one instance, the node apparatus, operable according to the parameterized universal learning model, receives a mixture of analog and spiking inputs, and generates a spiking output based on the node parameter that is selected by the parameterized model for that specific mix of inputs. At another instance, the same node receives a different mix of inputs (which may comprise only analog or only spiking inputs) and generates an analog output based on a different value of the node parameter that is selected by the model for the second mix of inputs.
In another implementation, the node apparatus may change its output from analog to spiking responsive to a training input for the same inputs.
Thus, unlike traditional artificial neuronal networks, the universal spiking node of the exemplary embodiment of the present disclosure may be configured to process a mixed set of inputs that may change over time, using the same parameterized model. This configuration advantageously facilitates training of the spiking neural network, and allows node reuse when the representation of the input and output signals (spiking vs. non-spiking) changes.
In a broader sense, the disclosure provides methods and apparatus for implementing a universal learning mechanism that operates on different types of signals, including but not limited to firing rate (analog) and spiking signals.
Detailed descriptions of the various aspects, implementations and variants of the apparatus and methods of the disclosure are now provided.
The disclosure finds broad practical application. Implementations of the disclosure may be, for example, deployed in a hardware and/or software implementation of a computer-controlled system, provided in one or more of a prosthetic device, a robotic device, and/or other specialized apparatus. In one such implementation, a control system may include a processor embodied in an application specific integrated circuit (ASIC), a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific instruction-set processor (ASIP), or another general purpose multiprocessor, which can be adapted or configured for use in an embedded application such as controlling a robotic device. However, it will be appreciated that the innovation is in no way limited to the foregoing applications and/or implementations.
Principles of the present disclosure may advantageously be applicable to various control applications (such as, for example, robot navigation, automatic drone stabilization, robot arm control, etc.) that use a spiking neural network as the controller and comprise a set of sensors and actuators that produce signals of different types. Some sensors may communicate their state data using analog variables, whereas other sensors employ spiking signal representation.
By way of example, a set of such heterogeneous sensors may comprise, without limitation, the following:
Similarly, some of the actuators (e.g., electric DC motors, pneumatic or hydraulic cylinders, etc.) may be driven by analog signals, while other actuators may be driven by analog or spiking signals (e.g., stepper motors and McKibben artificial muscles). In such a heterogeneous system, the spiking controller may be required to integrate and concurrently process analog and spiking signals, and similarly to produce spiking and analog signals on its different outputs.
In some applications the encoding method may change dynamically depending on additional factors, such as user input, a timing event, or an external trigger. In the example described supra, such a situation occurs when the sensors/motors operate in different regimes such that, for example, in one region of the sensor/actuator operational state space a spiking signal representation may be more appropriate for data encoding, whereas in another region of operation an analog signal encoding may be more appropriate (e.g., as in the case of the accelerometer, as described above).
In some existing implementations of mixed signal networks, e.g., such as described in U.S. patent application Ser. No. 13/313,826, entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, filed Dec. 7, 2011, incorporated supra, neurons may communicate via pulses. As discussed above, such pulses (also referred to as “spikes” or “impulses”) may comprise short-lived (typically on the order of 1-2 ms) discrete temporal events. In some implementations, spikes may correspond to messages, memory content (e.g., binary '0'/'1'), and/or other event indicators. When using spike communication, information may be encoded into the latency (e.g., a time interval) between two or more spikes and/or with respect to a reference.
In one or more applications, when the duration of the period of activity and/or inactivity must be controlled, it may be advantageous to encode information using pauses in a spiking neuron network. As used herein, the term “pause” may be used to describe an absence of neuron activity (e.g., absence of spikes) for a variable period of time. A pause may be characterized by a duration and/or timing (e.g., the onset time). Information encoding using pauses may be of use when processing multiple sensory and/or action sources and when filtering out redundant and/or unnecessary stimuli.
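By way of a brief illustrative sketch (the function name and the min_gap threshold are ours, not from the disclosure), pauses in a list of spike times may be characterized by onset and duration as follows:

```python
def find_pauses(spike_times, min_gap=0.05):
    """Return (onset, duration) pairs for inter-spike intervals that
    exceed min_gap seconds, i.e., the pauses that may carry information."""
    pauses = []
    for t0, t1 in zip(spike_times[:-1], spike_times[1:]):
        if t1 - t0 > min_gap:
            pauses.append((t0, t1 - t0))
    return pauses

# Example: a pause of roughly 0.17 s with onset at t = 0.03 s.
print(find_pauses([0.01, 0.02, 0.03, 0.20, 0.21]))  # [(0.03, ~0.17)]
```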
The control block 102 may provide a control signal 104 (e.g., a motor control signal to operate an actuator). The gating block 120 may be configured to pass (e.g., gate) the control signal 104 and to generate a gated signal 124 based on a gating input 114 from the learning block 110. In one or more implementations, signal gating may be utilized during selection of one or more of several signals and/or actions. In some control applications where one or more controllers may compete for the same actuators, the gating mechanism may be employed for selecting an action associated with individual controller (and/or from a subset of controllers). In some variants, signal gating may comprise a process in which a predetermined set of conditions, when established, permits another process to occur.
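The pass/block relationship between the control signal 104, the inhibitory gating input 114, and the gated signal 124 may be sketched as follows; this is a minimal illustration with assumed names, not the disclosure's implementation:

```python
import numpy as np

def gate(control_signal, inhibit):
    """Pass control samples only where the inhibitory gating input
    is absent (i.e., during pauses in the inhibitory output)."""
    control_signal = np.asarray(control_signal, dtype=float)
    inhibit = np.asarray(inhibit, dtype=bool)
    return np.where(inhibit, 0.0, control_signal)

# The control signal is blocked while inhibition is active and
# passes through during the pause.
print(gate([1.0, 1.0, 1.0, 1.0], [True, True, False, False]))  # [0. 0. 1. 1.]
```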
The network of the learning block may be adapted in accordance with a learning rule and a teaching signal 108. In some implementations, inputs into the learning block may comprise the control signal (denoted by the broken line 105 in FIG. 1).
The network of the learning block 110 (and blocks 102, 120) of the apparatus in FIG. 1 may be operated in accordance with a neuronal dynamic process that may be expressed in the general form:

$$\frac{d\vec{q}}{dt} = \vec{F}(\vec{q}, u(t), t), \tag{Eqn. 1}$$

where:

$\vec{q}$ is a vector of state variables describing the neuron; and

$u(t)$ is the input into the neuron.

For example, for the integrate and fire model of the neuron, the state vector reduces to the membrane voltage u(t), and the state model may be expressed as:

$$C\frac{du(t)}{dt} = -u(t) + i(t), \qquad u \to u_{res} \text{ upon output spike generation},$$

where C is a membrane constant, and ures is a value to which voltage is set after an output spike (the reset value). Accordingly, Eqn. 1 becomes:

$$\frac{du(t)}{dt} = \frac{1}{C}\bigl(-u(t) + i(t)\bigr).$$

In some implementations, the neuron process of Eqn. 1 may be expressed as a two-variable model:

$$\frac{dv}{dt} = 0.04v^2 + 5v + 140 - u + I, \qquad \frac{du}{dt} = a(bv - u),$$

with the after-spike reset: if v ≥ 30 mV, then v←c and u←u+d, where v is the membrane potential, u is the recovery variable, I is the input, and a, b, c, d are parameters of the model.
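A minimal Euler-integration sketch of the integrate-and-fire update above; the step size, constants, and function name are illustrative assumptions, not values from the disclosure:

```python
def if_step(u, i_syn, dt=1e-3, C=1.0, u_thresh=1.0, u_res=0.0):
    """One Euler step of C du/dt = -u(t) + i(t); on threshold
    crossing, the voltage is set to the reset value u_res and a
    spike is reported."""
    u = u + dt * (-u + i_syn) / C
    spike = u >= u_thresh
    if spike:
        u = u_res
    return u, spike
```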
Some algorithms for spike-time learning (especially, reinforcement learning) in spiking neural networks may be represented using the following general equation described, for example, in co-pending and co-owned U.S. patent application Ser. No. 13/487,499 entitled “STOCHASTIC APPARATUS AND METHODS FOR IMPLEMENTING GENERALIZED LEARNING RULES”, incorporated supra:

$$\frac{d\theta_i(t)}{dt} = \eta(t)\, F(t)\, e_i(t),$$

where:

$\theta_i(t)$ is an adapted parameter of the network (e.g., a synaptic weight);

$\eta(t)$ is a learning rate;

$F(t)$ is a performance function (e.g., derived from a reinforcement signal); and

$e_i(t)$ is an eligibility trace associated with the parameter $\theta_i$.
An exemplary eligibility trace may comprise a temporary record of the occurrence of an event (such as visiting of a state, taking an action, or receipt of pre-synaptic input). The trace marks the parameters associated with the event (e.g., the synaptic connection, or pre- and post-synaptic neuron IDs) as eligible for undergoing learning changes. In one approach, when a teaching signal (e.g., a reward/punishment) occurs, only eligible states or actions may be ‘assigned credit’ for success or ‘blamed’ for an error. Thus, the eligibility traces aid in bridging the gap between the events and the training information.
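A possible discrete-time sketch of such a trace (the exponential decay form and the names are assumptions, not taken from the referenced application):

```python
import math

def update_trace(e, event, dt=1e-3, tau_e=0.1, increment=1.0):
    """Exponentially decaying eligibility trace: decay toward zero,
    then mark the associated parameter as eligible when its event
    (e.g., arrival of a pre-synaptic spike) occurs."""
    e = e * math.exp(-dt / tau_e)
    if event:
        e = e + increment
    return e

# On arrival of a teaching signal F(t), only parameters with non-zero
# traces receive credit, e.g., dw = eta * F * e.
```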
The network of the block 110 may be operated in accordance with a learning process. A target signal {ydj} (e.g., the signal 108 in FIG. 1) may be utilized by the learning process in order to adapt the network.
During training, the state of the network (neurons and/or synapses) may be adapted in accordance with a learning rule. For example, the neuron state adjustment may comprise a firing threshold adjustment, output signal generation, and/or node susceptibility or excitability modifications according to a variety of methods, such as those described in co-owned U.S. patent application Ser. No. 13/152,105 filed on Jun. 2, 2011, and entitled “APPARATUS AND METHODS FOR TEMPORALLY PROXIMATE OBJECT RECOGNITION”, now issued as U.S. Pat. No. 9,122,994, incorporated herein by reference in its entirety.
In some implementations, the neuron process may be characterized by a time constant τn=RC, where R is the input resistance and C is the membrane capacitance defined above. The firing threshold υ may be used to determine output signal generation (firing) of a neuron. In a deterministic neuron, the neuron may generate the output (i.e., fire a spike) whenever the neuronal state variable u(t) exceeds the threshold υ. In a stochastic neuron, firing probability may be described by a probabilistic function of (υ−u(t)), e.g.:

p(υ−u(t)) = exp(u(t)−υ), where u(t) < υ.

When the stochastic neuron generates an output, the state variable u(t) may be reset to a predetermined reset value ureset(t) < υ. In an exemplary embodiment, the neuron state variable u(t) may be held at the reset level for a period of time trefr, referred to as the refractory period. In the absence of any subsequent inputs to the neuron, the neuron state settles at the resting potential ures(t).
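A per-time-step sketch of the stochastic firing and reset behavior described above (interpreting p as a per-step firing probability is our assumption):

```python
import math
import random

def stochastic_fire(u, v=1.0, u_reset=0.0):
    """Stochastic output generation per time step: firing probability
    p = exp(u - v) for u < v (and p = 1 once u >= v); on firing, the
    state is set to the reset value (to be held for t_refr)."""
    p = 1.0 if u >= v else math.exp(u - v)
    if random.random() < p:
        return u_reset, True
    return u, False
```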
In various implementations, the synaptic connection adjustment includes modification of synaptic weights and/or synaptic delays according to a variety of applicable synaptic rules, such as for example those described in co-owned and co-pending U.S. patent application Ser. No. 13/239,255 filed on Sep. 21, 2011, and entitled “APPARATUS AND METHODS FOR SYNAPTIC UPDATE IN A PULSE-CODED NETWORK”, incorporated herein by reference in its entirety.
Referring now to FIG. 3, one exemplary implementation of a universal node is described. In the implementation illustrated in FIG. 3, the universal node 302 may receive spiking and/or analog inputs via the connections 304.
The universal learning rule of the node 302 may be, for the exemplary embodiment, described as follows:

$$\frac{dw_{ij}(t)}{dt} = \gamma\left[\bar{y}^{\,d}_j(t) - \bar{y}_j(t)\right]\bar{x}_i(t), \tag{Eqn. 3}$$

where:

$w_{ij}$ is the efficacy (e.g., weight) of the connection from the i-th input to neuron j;

$\gamma$ is a learning rate;

$\bar{y}^{\,d}_j(t)$ is the low-pass filtered version of the target spike train for neuron j, with a filter time constant τdj;

$\bar{y}_j(t)$ is the low-pass filtered version of the output spike train from neuron j, with a filter time constant τj; and

$\bar{x}_i(t)$ is the low-pass filtered version of the i-th input to neuron j, with a filter time constant τi.
In some implementations (including the exemplary embodiment above), the low-pass filtered version of the spike train may be expressed as:

$$\bar{S}(t) = \int_{0}^{\infty} a(s)\, S(t-s)\, ds,$$

with a(s) being a smoothing kernel. In one or more variants, the smoothing kernel may comprise an exponential, Gaussian, and/or another function of time, configured using one or more parameters. Further, the parameters may comprise a filter time constant τ. An example of an exponential smoothing kernel is:

$$a_k(s) = \exp(-s/\tau),$$

where τ is the kernel time constant.
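A discrete-time sketch of low-pass filtering a spike train with the exponential kernel above (the recursive leaky-integrator form is a standard equivalent of the convolution; names and constants are ours):

```python
import math

def low_pass(spike_train, dt=1e-3, tau=20e-3):
    """Recursive implementation of filtering with the exponential
    kernel a(s) = exp(-s / tau):
    y[n] = exp(-dt / tau) * y[n-1] + s[n]."""
    decay = math.exp(-dt / tau)
    y, out = 0.0, []
    for s in spike_train:
        y = decay * y + s
        out.append(y)
    return out
```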
In one or more implementations, the learning rate may be configured to vary with time, as described in detail in U.S. patent application Ser. No. 13/722,769 filed Dec. 20, 2012, and entitled “APPARATUS AND METHODS FOR STATE-DEPENDENT LEARNING IN SPIKING NEURON NETWORKS”, now issued as U.S. Pat. No. 8,990,133, the foregoing being incorporated herein by reference in its entirety.
The learning rule given by Eqn. 3 may be applicable to both online learning and batch learning, and the learning rule signal regime (i.e., analog vs. spiking) may be determined by changing just one parameter (or a defined parameter set), as described below. The signals x̄i(t), ȳj(t), and ȳdj(t) of Eqn. 3 denote the low-pass filtered versions of the input, output, and target signals, respectively.
In one approach, connections may be modeled as a low-pass filter that delivers an input signal (the synaptic response i(t)) into the post-synaptic neuron in response to receiving input spikes S(t), described as:

$$\tau_s \frac{di(t)}{dt} = -i(t) + S(t). \tag{Eqn. 11}$$

The synaptic time constant of the filter corresponds to the parameter τs in Eqn. 11. The synapse may be characterized by a synaptic delay d that defines a delay between the input spikes and the synaptic response i(t). In one variant, this may be achieved by using the delayed input to the synapse, S(t−d), in Eqn. 11.
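A sketch of the delayed synaptic response consistent with the filter above (the unit-jump spike handling, names, and constants are assumptions):

```python
import math

def synaptic_response(spikes, dt=1e-3, tau_s=5e-3, delay_steps=2):
    """Exponential synapse with delay: between spikes the response i(t)
    decays with time constant tau_s; each input spike, delayed by
    d = delay_steps * dt, produces a unit jump in i(t)."""
    decay = math.exp(-dt / tau_s)
    i, out = 0.0, []
    for n in range(len(spikes)):
        i *= decay
        if n >= delay_steps and spikes[n - delay_steps]:
            i += 1.0
        out.append(i)
    return out
```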
In some implementations, transmission of spikes by synapses may be described using a deterministic model so that every input (or every input with one or more predetermined characteristics) spike generates a synaptic response i(t). In other implementations, transmission of spikes by synapses may be described using a stochastic approach, where synaptic responses are determined using a stochastic function (e.g., a response probability) based on the input.
Case 1: Learning in the Spike-Timing Domain (Spiking Inputs/Spiking Outputs)
Components of the rule of Eqn. 3 may, in the limit of τj→0, τdj→0, and with τi equal to the corresponding time constant of the i-th input signal, be expressed as:

$$\bar{y}_j(t) \to S_j(t), \qquad \bar{y}^{\,d}_j(t) \to S^d_j(t).$$

Accordingly, the learning rule of Eqn. 3 may be expressed as:

$$\frac{dw_{ij}(t)}{dt} = \gamma\left[S^d_j(t) - S_j(t)\right]\bar{x}_i(t). \tag{Eqn. 4}$$

The learning rule of Eqn. 4 may be used to effectuate learning whereby, for a given subset of the input signals, the neuron learns to reproduce target signals encoded in precise spike timing.
Case 2: Learning in the Firing-rate Domain (Analog Inputs, Analog Outputs)
Components of the rule of Eqn. 3, in the limit where the time constants τj, τdj, τi are long enough that the signals ȳj(t), ȳdj(t), x̄i(t) approximate the firing rates of the corresponding spike trains, may be expressed as:

$$\bar{y}_j(t) \approx y_j(t), \qquad \bar{y}^{\,d}_j(t) \approx y^d_j(t), \qquad \bar{x}_i(t) \approx x_i(t).$$

In this case, the learning rule of Eqn. 3 may take the form:

$$\frac{dw_{ij}(t)}{dt} = \gamma\left[y^d_j(t) - y_j(t)\right]x_i(t). \tag{Eqn. 7}$$

In Eqn. 7 the signals yj(t), ydj(t), and xi(t) may be represented by floating-point values.
Case 3: Spiking Inputs, Analog Outputs
The time constants τj, τdj, τi can also be set up such that the spike-based and rate-based (analog) encoding methods are combined by a single universal neuron, e.g., the neuron 302 of FIG. 3. In the limit where τj, τdj are long enough that ȳj(t)≈yj(t) and ȳdj(t)≈ydj(t), and τi→0, the learning rule of Eqn. 3 may take the following form:

$$\frac{dw_{ij}(t)}{dt} = \gamma\left[y^d_j(t) - y_j(t)\right]S_i(t), \tag{Eqn. 8}$$

which may be appropriate for learning in configurations where the input signals to the neuron 302 are encoded using precise spike timing, whereas the target signal ydj and output signals yj use the firing-rate-based encoding. In one variant, the analog output signals yj are represented using the floating-point computer format, although other types of representations appreciated by those of ordinary skill given the present disclosure may be used consistent with the disclosure as well.
Case 4: Analog Inputs, Spiking Outputs
In yet another case, applicable to firing rate based (analog) inputs and spiking outputs, the time constants τj, τdj corresponding to the output and target signals are infinitesimal (i.e., τj→0, τdj→0), such that:

$$\bar{y}_j(t) \to S_j(t), \qquad \bar{y}^{\,d}_j(t) \to S^d_j(t).$$

Further, the time constant τi may be much larger than τj, τdj, such that:

$$\bar{x}_i(t) \approx x_i(t).$$

Accordingly, the learning rule of Eqn. 3 may take the following form:

$$\frac{dw_{ij}(t)}{dt} = \gamma\left[S^d_j(t) - S_j(t)\right]x_i(t).$$

This form may be appropriate for training of neurons receiving signals encoded in the neural firing rate and producing signals encoded in precise spike timing.
Other combinations of the spike-based and firing-based encoding within a single trained neuron are also possible. In one such implementation, by setting the time constants τi individually for each synaptic input 304, some inputs 304 become configured to respond to precise spike timing signals, while other inputs become configured to respond only to the firing rate signals.
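The regime switching described in Cases 1-4 may be sketched as follows; the class, names, and unnormalized filters are our assumptions rather than the disclosure's implementation:

```python
import math

class UniversalSynapse:
    """Sketch of the universal rule dw/dt = gamma * (yd_bar - y_bar) * x_bar
    (Eqn. 3), where each signal is low-pass filtered with its own time
    constant. Short time constants preserve precise spike timing; long
    ones make the filtered value track the firing rate, selecting among
    Cases 1-4. Kernel normalization is omitted for brevity."""

    def __init__(self, gamma=0.01, tau_x=1e-3, tau_y=1e-3, tau_yd=1e-3, dt=1e-3):
        self.gamma, self.dt = gamma, dt
        self.decay = {k: math.exp(-dt / tau)
                      for k, tau in (('x', tau_x), ('y', tau_y), ('yd', tau_yd))}
        self.bar = {'x': 0.0, 'y': 0.0, 'yd': 0.0}
        self.w = 0.0

    def step(self, x, y, yd):
        """One online update: filter each signal, then apply the rule."""
        for key, value in (('x', x), ('y', y), ('yd', yd)):
            self.bar[key] = self.decay[key] * self.bar[key] + value
        self.w += self.dt * self.gamma * (self.bar['yd'] - self.bar['y']) * self.bar['x']
        return self.w

# Case 1 (spiking in / spiking out): all time constants near one step.
# Case 2 (analog in / analog out): all time constants long.
# Case 3 (spiking in / analog out): tau_x small; tau_y, tau_yd long.
# Case 4 (analog in / spiking out): tau_x long; tau_y, tau_yd small.
```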
During learning, model and node network parameter updates may be effectuated upon receiving and processing a particular input by the node, and prior to receipt of a subsequent input. This update mode may be referred to as online learning. In some implementations, parameter updates may be computed, buffered, and implemented at once in accordance with an event. In other implementations, the event may correspond to a trigger generated upon receipt of a particular (pre-selected or dynamically configured) number of inputs, timer expiration, and/or an external event. This mode of network operation is often termed “batch learning”.
The learning method described by Eqn. 3 may be generalized to apply to an arbitrary synaptic learning rule as follows:

$$\frac{dw_{ij}(t)}{dt} = F\left(\bar{x}_i(t),\, \bar{y}_j(t),\, \bar{y}^{\,d}_j(t)\right), \tag{Eqn. 16}$$

where:

F is a parameterized function determining the dynamics of the learning rule; and

x̄i(t), ȳj(t), ȳdj(t) are the low-pass filtered input, output, and target signals, respectively.
The parameterized functions (e.g., the filtered signals of Eqn. 16) may be configured, by an appropriate choice of the individual filter time constants, in accordance with the representation (spiking and/or analog) of the respective signals, as described above.
The approach described by Eqn. 16 provides learning continuity for input signals comprising both analog and spiking components, and for input signals that change their representation from one type (e.g., analog or spiking) to another over time.
As noted for the specific implementation of the rule described by Eqn. 8, the general approach also permits the training of neural networks that combine different representations of signals processed within networks.
A neural network trained according to one or more implementations may be capable of, inter alia, processing mixed sets of inputs that may change their representation (e.g., from analog to spiking and vice versa) over time, using the same neuron model. In some implementations, a single node may be configured to receive input signals, wherein some sets of inputs to the node carry information encoded in spike timing, while other sets of inputs carry information encoded using analog representation (e.g., firing rate).
In one or more implementations, training of the spiking neural network may enable the same nodes to learn to process different signal types, thereby facilitating node reuse and simplifying network architecture and operation. By using the same nodes for different signal inputs, a requirement for duplicate node populations and duplicate control paths (e.g., one for the analog and one for the spiking signals) is removed and a single population of universal nodes may be adjusted in real time to dynamically changing inputs and outputs. These advantages may be traded for a reduced network complexity, size and cost, or increased network throughput for the same network size.
In reinforcement learning, the input data x(t) may not be given a priori, but may instead be generated via an interaction between a learning agent and the environment. For an individual time instance t, the agent may perform an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim of reinforcement learning is to discover a policy for selecting actions that minimizes some measure of a long-term cost, i.e., the expected cumulative cost. In some implementations, environmental dynamics and/or the long-term cost-function associated with individual learning rules (the policy) may be unknown in advance. In such implementations, the environmental dynamics and/or the long-term cost may be estimated through experience and learning.
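With a discount factor γ ∈ (0,1] (notation assumed here for illustration), the expected cumulative cost to be minimized may be written as:

$$J(\pi) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_t \;\middle|\; \pi\right],$$

where π denotes the action-selection policy.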
By way of a non-limiting illustration, actions that may be associated with a negative reinforcement signal (e.g., received through the environment) may be avoided whenever a similar context is experienced; e.g., a rover that hits an obstacle during learning may learn to select a different action (e.g., turn right) when a similar context (e.g., visual input) occurs again. Conversely, actions that result in a positive reinforcement signal may be more likely to be executed again in the same context. In one or more implementations, the context may comprise sensory input received by the robot, robot state (e.g., position, speed), and/or other parameters.
In one implementation, training of the neural network using a reinforcement learning approach is used to control an apparatus (e.g., a robotic device) in order to achieve a predefined goal, such as for example finding the shortest pathway in a maze. This is predicated on the assumption or condition that there is an evaluation function that quantifies control attempts made by the network in terms of the cost function. Reinforcement learning methods like those described in detail in co-owned U.S. patent application Ser. No. 13/238,932 filed Sep. 21, 2011, and entitled “ADAPTIVE CRITIC APPARATUS AND METHODS”, now issued as U.S. Pat. No. 9,156,165, incorporated supra, can be used to minimize the cost and hence to solve the control task, although it will be appreciated that other methods may be used consistent with the disclosure as well.
Reinforcement learning is typically used in applications such as control problems, games and other sequential decision making tasks. However, such learning is in no way limited to the foregoing.
In some implementations, the methodology described herein may be utilized with unsupervised learning. In machine learning, unsupervised learning may refer to finding hidden structure in unlabeled data (e.g., clustering and/or dimensionality reduction). Other tasks where unsupervised learning is used may include, without limitation, estimation of statistical distributions, data compression, and filtering.
The learning rules described by Eqn. 8 and Eqn. 12-Eqn. 15 may be used to implement a neuron network (e.g., the network of the learning block 100 of FIG. 1) configured to learn to generate pauses in its activity, as described below.
Traces 220, 222 in FIG. 2 depict exemplary input signals (comprising spiking and/or analog representations) into the network.
Trace 250 depicts activity of one or more of the network neurons. The activity of trace 250 comprises several periods of spiking activity (e.g., the spike trains 252, 254) separated by periods of inactivity (e.g., the pauses 256, 258).
Trace 260 depicts output activity of the network (e.g., the gated output 124 of FIG. 1).
In some implementations, the active neurons, corresponding to the trace 250, may be configured to provide an inhibitory signal. By way of a non-limiting example, a neuron of the network (that is associated with the trace 250, hereinafter referred to as the neuron 250) may be configured to generate tonic output (e.g., the spike trains 252, 254) that may provide inhibitory input into one or more neurons associated with the activity trace 260. In the absence of the teaching signal (the default base state), the activity of the trace 250 may inhibit the activity of the trace 260, as indicated by the pauses (e.g., absence of spikes) 266, 268.
Upon receiving one or more teaching spikes 242, 244, the neuron 250 may transition to an inhibited state where the output is suppressed (as indicated by absence of spikes during pauses 256, 258, respectively). Absence of spiking activity on trace 250 may cause spiking activity on trace 260, depicted by spike trains 262, 264.
The spiking activity 262, 264 may correspond to delivery (pass through) of the control signal 230 (e.g., the signal 104 in FIG. 1).
Trace 224 in FIG. 2 depicts another exemplary input into the network.
Trace 270 illustrates activity of one or more of the neurons of the learning network (e.g., of the block 110 in FIG. 1).
Trace 280 illustrates output activity of the gating apparatus (e.g., the signal 124 in FIG. 1).
In some implementations, the active neurons, corresponding to the trace 270, may be configured to provide an inhibitory signal.
Upon receiving one or more teaching spikes 243, 245, the neuron 270 may transition to an inhibited state where the output is suppressed (as indicated by absence of spikes during pauses 276, 278, respectively). Absence of spiking activity on trace 270 may cause activity on trace 280, depicted by curves 282, 284.
Referring now to FIG. 4, exemplary operation of a network comprising universal nodes 402, 412, 422, 432 is described.
At time t2: (i) the node 402 may receive mixed inputs 438 via the connections 404 and produce analog output y1(t) 440; (ii) the node 412 may receive a group of mixed inputs 448 via the connections 414 and produce spiking output s2(t) 450; (iii) the node 422 may receive a group of spiking inputs 458 via the connections 424 and produce analog output y3(t) 460; and (iv) the node 432 may receive a group of spiking inputs 478 via the connections 434 and produce analog output y4(t) 480.
It can be seen from FIG. 4 that the same universal nodes may receive and generate signals of different types at different times, using the same parameterized model. The exemplary simulation results described below with respect to FIG. 5 were obtained using a network comprising:

1. A single spiking neuron (e.g., the neuron 302 of FIG. 3);

2. 600 input connections (e.g., the connections 304 in FIG. 3); and

3. A connection (e.g., the connection 312 in FIG. 3) configured to provide the teaching signal.
After a certain period of time subsequent to the occurrence of the teaching spike 514, the tonic network output is resumed, as shown in FIG. 5.
As shown in the panel 610, at the beginning of training (e.g., the output area 616) the network output is little affected in the vicinity of the training input 612. As the training progresses, the network gradually learns to pause its output (as shown by the output area 618) in the vicinity of the training input 612.
As shown in the panel 710, during time periods corresponding to peaks in the training signal (e.g., at about 0.12, 0.3, and 0.4 s), the level of the output signal 714 is reduced, compared to the output level prior to and subsequent to the teaching signal peaks. In some implementations of analog output encoding, such a reduction in the analog output signal level may correspond to pause generation.
In one or more implementations, methods 800 and 900 of FIGS. 8-9, described below, may be performed by one or more processors configured to execute computer program modules. Referring now to FIG. 8, an exemplary method of operating a learning network configured to generate pauses in its output is described.
At operation 802 of method 800, the state of the network may be configured to produce tonic output. In one or more implementations, the tonic output by the network may comprise one or more signals encoding spiking and/or analog representation. In some variants, the output may be configured based on a sensory input comprising spiking and/or analog signals (e.g., the signals 220, 222 in FIG. 2).
At operation 804 a determination may be made as to whether a training signal is present. When the training signal is not present, the method may proceed to operation 802.
When the training signal is present, the method may proceed to operation 806, where the network state may be adjusted in accordance with a learning process associated with the training signal. In one or more implementations, the network adjustment may be configured to suppress the output in accordance with one or more applicable learning rules (e.g., the rules of Eqn. 3-Eqn. 8, described above). In one or more implementations, the duration and the timing (e.g., time of onset) of the pause may be configured based on a teaching signal timing (e.g., a teaching spike) and/or parameters of the learning process, such as the time constants of the low-pass filters, adjusted individually for individual signals.
At operation 902 of method 900, action input may be received. In one or more implementations, the action input may comprise one or more actions (e.g., motor commands and/or sensor activation commands, such as, for example, those depicted by traces 230, 232 in FIG. 2).
At operation 904 a determination may be made as to whether a gating activity signal is present. In an exemplary embodiment, the gating activity may correspond to the tonic activity of operation 802 of method 800, and/or the signal 250 of FIG. 2.
When the gating signal is present, the method may proceed to operation 906, where (at least a portion of) the action input may be provided to a desired destination. In one or more implementations, the destination may comprise a motor, a motor controller, and/or another network block. Further, it will be appreciated that in some cases, absence of inhibition (associated with the gating activity), e.g., the pause 256 in FIG. 2, may enable provision of the action input to the destination.
In one or more variants, the action input may comprise two or more action signals comprising, for example, brake activation and acceleration activation commands for a robotic rover controller. The gating activity may be used to gate individual action signals to cause the rover to brake and/or accelerate at appropriate time instances.
At operation 808 a gated output may be provided. The gated output may be based on a control input provided to the network, and may comprise one or more portions of the control input that correspond to the time periods of the suppressed output (e.g., the pauses) effectuated at operation 806. In one or more implementations, the tonic output of operation 802 may be used to inhibit activity of a network gating portion. In some variants, absence of the tonic output (during a pause) may activate the gating portion, thereby enabling provision of, for example, the gated control signal at operation 808.
In some implementations, the adaptive gating methodology described herein may be utilized in control applications, telecommunications, and neurobiology, where it may be of benefit to adaptively learn the timing and/or duration of gating events.
Gating mechanisms may be used for selecting appropriate actions (based, for example, on a learning policy) and/or for resolving conflicts when multiple functional areas of a control system may attempt to access common resources. By way of a non-limiting example of an autonomous rover equipped with a tracking camera: the camera controller may attempt to command the rover platform to move back (in order, for example, to capture the full scene), the rover navigation controller may attempt to command the rover platform to move forward, and/or the rover obstacle avoidance controller may instruct the rover platform not to move back (due to an obstacle being present). In such implementations, the gating mechanism may be utilized for selecting an action in accordance with a control policy. In one or more implementations, such policies may comprise rover platform preservation (safety), speed of data acquisition (e.g., when performing an emergency and/or rescue mission), energy conservation for long autonomous missions, and/or other policies.
Consider, in some implementations, the example of a mobile robot comprising a navigation controller and an obstacle avoidance controller. When a conflict arises between actions requested by the individual controllers (e.g., move left for target approach vs. move right to avoid an obstacle), a gating mechanism such as the one described in this disclosure may be employed in order to allow an individual action to be executed. In one or more implementations, such action selection may be based, e.g., on learned priorities and a learned duration of the gating.
Similar gating mechanisms can be used in telecommunications, where several signals are to be transmitted through a single communication channel with limited information capacity. Gating can be used there to allow only selected signals to propagate through the communication channel, and it is likewise advantageous to adaptively learn the priorities and the time window of access for particular signals.
In one or more implementations, the adaptive gating described in this disclosure may provide a useful mechanism for operating control systems that are characterized by a resource ‘bottleneck’ (e.g., a competition between multiple processes for the same resource(s)).
Apparatus and methods implementing universal learning rules of the disclosure advantageously allow for an improved network architecture and performance. Unlike traditional artificial neuronal networks, the universal spiking node/network of the present disclosure is configured to process a mixed set of inputs that may change their representation (from analog to spiking, and vice versa) over time, using the same parameterized model. This configuration advantageously facilitates training of the spiking neural network, allows the same nodes to learn processing of different signal types, thereby facilitating node reuse and simplifying network architecture and operation. By using the same nodes for different signal inputs, a requirement for duplicate node populations and duplicate control paths (e.g., one for the analog and one for the spiking signals) is removed, and a single population of universal nodes may be adjusted in real time to dynamically changing inputs and outputs. These advantages may be traded for a reduced network complexity, size and cost for the same capacity, or increased network throughput for the same network size.
In one implementation, the universal spiking network is implemented as a software library configured to be executed by a computerized spiking network apparatus (e.g., containing a digital processor). In another implementation, the universal node comprises a specialized hardware module (e.g., an embedded processor or controller). In some implementations, the spiking network apparatus may be embodied in a specialized or general purpose integrated circuit, such as, for example ASIC, FPGA, or PLD). Myriad other configurations exist that will be recognized by those of ordinary skill given the present disclosure.
Advantageously, the present disclosure can be used to simplify and improve control tasks for a wide assortment of control applications including without limitation industrial control, navigation of autonomous vehicles, and robotics. Exemplary implementations of the present disclosure are useful in a variety of devices including without limitation prosthetic devices (such as artificial limbs), industrial control, autonomous and robotic apparatus, HVAC, and other electromechanical devices requiring accurate stabilization, set-point control, trajectory tracking functionality or other types of control. Examples of such robotic devices include manufacturing robots (e.g., automotive), military devices, and medical devices (e.g. for surgical robots). Examples of autonomous vehicles include rovers (e.g., for extraterrestrial exploration), unmanned air vehicles, underwater vehicles, smart appliances (e.g. ROOMBA®), etc. The present disclosure can advantageously be used also in all other applications of artificial neural networks, including: machine vision, pattern detection and pattern recognition, signal filtering, data segmentation, data compression, data mining, optimization and scheduling, or complex mapping.
It will be recognized that while certain aspects of the disclosure are described in terms of a specific sequence of steps of a method, these descriptions are only illustrative of the broader methods of the disclosure, and may be modified as required by the particular application. Certain steps may be rendered unnecessary or optional under certain circumstances. Additionally, certain steps or functionality may be added to the disclosed implementations, or the order of performance of two or more steps permuted. All such variations are considered to be encompassed within the disclosure disclosed and claimed herein.
While the above detailed description has shown, described, and pointed out novel features of the disclosure as applied to various implementations, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the art without departing from the disclosure. The foregoing description is of the best mode presently contemplated of carrying out the disclosure. This description is in no way meant to be limiting, but rather should be taken as illustrative of the general principles of the disclosure. The scope of the disclosure should be determined with reference to the claims.
Number | Name | Date | Kind |
---|---|---|---|
5063603 | Burt | Nov 1991 | A |
5092343 | Spitzer et al. | Mar 1992 | A |
5245672 | Wilson et al. | Sep 1993 | A |
5355435 | DeYong et al. | Oct 1994 | A |
5388186 | Bose | Feb 1995 | A |
5408588 | Ulug | Apr 1995 | A |
5467428 | Ulug | Nov 1995 | A |
5638359 | Peltola | Jun 1997 | A |
5673367 | Buckley | Sep 1997 | A |
5875108 | Hoffberg | Feb 1999 | A |
6009418 | Cooper | Dec 1999 | A |
6014653 | Thaler | Jan 2000 | A |
6169981 | Werbos | Jan 2001 | B1 |
6363369 | Liaw et al. | Mar 2002 | B1 |
6458157 | Suaning | Oct 2002 | B1 |
6532454 | Werbos | Mar 2003 | B1 |
6545705 | Sigel | Apr 2003 | B1 |
6545708 | Tamayama | Apr 2003 | B1 |
6546291 | Merfeld | Apr 2003 | B2 |
6581046 | Ahissar | Jun 2003 | B1 |
6601049 | Cooper | Jul 2003 | B1 |
6643627 | Liaw et al. | Nov 2003 | B2 |
6917925 | Berenji et al. | Jul 2005 | B2 |
7395251 | Linsker | Jul 2008 | B2 |
7426501 | Nugent | Sep 2008 | B2 |
7672920 | Ito et al. | Mar 2010 | B2 |
7752544 | Cheng | Jul 2010 | B2 |
7849030 | Ellingsworth | Dec 2010 | B2 |
8015130 | Matsugu | Sep 2011 | B2 |
8103602 | Izhikevich | Jan 2012 | B2 |
8315305 | Petre | Nov 2012 | B2 |
8467623 | Izhikevich | Jun 2013 | B2 |
8655815 | Palmer | Feb 2014 | B2 |
8751042 | Lee | Jun 2014 | B2 |
20020038294 | Matsugu | Mar 2002 | A1 |
20030050903 | Liaw et al. | Mar 2003 | A1 |
20040193670 | Langan | Sep 2004 | A1 |
20050015351 | Nugent | Jan 2005 | A1 |
20050036649 | Yokono | Feb 2005 | A1 |
20050283450 | Matsugu | Dec 2005 | A1 |
20060161218 | Danilov | Jul 2006 | A1 |
20070022068 | Linsker | Jan 2007 | A1 |
20070176643 | Nugent | Aug 2007 | A1 |
20070208678 | Matsugu | Sep 2007 | A1 |
20080024345 | Watson | Jan 2008 | A1 |
20080162391 | Izhikevich | Jul 2008 | A1 |
20090043722 | Nugent | Feb 2009 | A1 |
20090287624 | Rouat | Nov 2009 | A1 |
20100086171 | Lapstun | Apr 2010 | A1 |
20100166320 | Paquier | Jul 2010 | A1 |
20100169098 | Patch | Jul 2010 | A1 |
20100198765 | Fiorillo | Aug 2010 | A1 |
20110016071 | Guillen | Jan 2011 | A1 |
20110119214 | Breitwisch | May 2011 | A1 |
20110119215 | Elmegreen | May 2011 | A1 |
20110160741 | Asano | Jun 2011 | A1 |
20120011090 | Tang | Jan 2012 | A1 |
20120011093 | Aparin | Jan 2012 | A1 |
20120036099 | Venkatraman | Feb 2012 | A1 |
20120109866 | Modha | May 2012 | A1 |
20120303091 | Izhikevich | Nov 2012 | A1 |
20120308076 | Piekniewski | Dec 2012 | A1 |
20120308136 | Izhikevich | Dec 2012 | A1 |
20130073080 | Ponulak | Mar 2013 | A1 |
20130073491 | Izhikevich | Mar 2013 | A1 |
20130073493 | Modha | Mar 2013 | A1 |
20130073496 | Szatmary | Mar 2013 | A1 |
20130073500 | Szatmary | Mar 2013 | A1 |
20130151448 | Ponulak | Jun 2013 | A1 |
20130151449 | Ponulak | Jun 2013 | A1 |
20130151450 | Ponulak | Jun 2013 | A1 |
20130204820 | Hunzinger | Aug 2013 | A1 |
20130218821 | Szatmary | Aug 2013 | A1 |
20130251278 | Izhikevich | Sep 2013 | A1 |
20130297541 | Piekniewski | Nov 2013 | A1 |
20130325766 | Petre | Dec 2013 | A1 |
20130325768 | Sinyavskiy | Dec 2013 | A1 |
20130325773 | Sinyavskiy | Dec 2013 | A1 |
20130325774 | Sinyavskiy | Dec 2013 | A1 |
20130325775 | Sinyavskiy | Dec 2013 | A1 |
20130325776 | Ponulak et al. | Dec 2013 | A1 |
20130325777 | Petre | Dec 2013 | A1 |
20140016858 | Richert | Jan 2014 | A1 |
20140025613 | Ponulak | Jan 2014 | A1 |
20140032458 | Sinyavskiy | Jan 2014 | A1 |
20140081895 | Coenen et al. | Mar 2014 | A1 |
20140193066 | Richert | Jul 2014 | A1 |
20140222739 | Ponulak | Aug 2014 | A1 |
Number | Date | Country |
---|---|---|
102226740 | Oct 2011 | CN |
1089436 | Apr 2001 | EP |
4087423 | Mar 1992 | JP |
2108612 | Oct 1998 | RU |
2406105 | Dec 2010 | RU |
2008083335 | Jul 2008 | WO |
2008132066 | Nov 2008 | WO |
Entry |
---|
Aleksandrov (1968), Stochastic optimization, Engineering Cybernetics, 5, 11-16. |
Amari (1998), Why natural gradient?, Acoustics, Speech and Signal Processing. (pp. 1213-1216). Seattle, WA, USA. |
Baras, D. et al. “Reinforcement learning, spike-time-dependent plasticity, and the BCM rule.” Neural Computation vol. 19 No. 8 (2007): pp. 2245-2279. |
Bartlett et al., (2000) “A Biologically Plausible and Locally Optimal Learning Algorithm for Spiking Neurons” Retrieved from http://arp.anu.edu.au/ftp/papers/jon/brains.pdf.gz. |
Baxter et al. (2000.). Direct gradient-based reinforcement learning. In Proceedings of the international Symposium on Circuits. |
Bennett, M.R., (1999), The early history of the synapse: from Plato to Sherrington.Brain Res. Bull., 50(2): 95-118. |
Bohte et al., “A Computational Theory of Spike-Timing Dependent Plasticity: Achieving Robust Neural Responses via Conditional Entropy Minimization” 2004. |
Bohte, (2000). SpikeProp: backpropagation for networks of spiking neurons. In Proceedings of ESANN'2000, (pp. 419-424). |
Bohte, ‘Spiking Neural Networks’ Doctorate at the University of Leiden, Holland, Mar. 5, 2003, pp. 1-133 [retrieved on Nov. 14, 2012]. |
Booij (Jun. 2005). A Gradient Descent Rule for Spiking Neurons Emitting Multiple Spikes. Information Processing Letters, v. 95, n. 6, 552-558. |
Bouganis et al., (2010) “Training a Spiking Neural Network to Control a 4-DoF Robotic Arm based on Spike Timing-Dependent Plasticity”. Proceedings of WCCI 2010 IEEE World Congress on Computational Intelligence, CCIB, Barcelona, Spain, Jul. 18-23, 2010, pp. 4104-4111. |
Breiman et al., “Random Forests” 33 pgs., Jan. 2001. |
Brette et al., Brian: a simple and flexible simulator for spiking neural networks, The Neuromorphic Engineer, Jul. 1, 2009, pp. 1-4, doi: 10.2417/1200906.1659. |
Capel, “Random Forests and Ferns” LPAC, Jan. 11, 2012, 40 pgs. |
Cuntz et al., ‘One Rule to Grow Them All: A General Theory of Neuronal Branching and Its Practical Application’ PLoS Computational Biology, 6 (8), Published Aug. 5, 2010. |
Davison et al., PyNN: a common interface for neuronal network simulators. Frontiers in Neuroinformatics, Jan. 2009, pp. 1-10, vol. 2, Article 11. |
D'Cruz, Brendan (May 1998), Reinforcement Learning in Intelligent Control: A Biologically-Inspired Approach to the Relearning Problem. |
de Queiroz, M. et al. “Reinforcement learning of a simple control task using the spike response model.” Neurocomputing vol. 70 No. 1 (2006): pp. 14-20. |
Djurfeldt, Mikael, The Connection-set Algebra: a formalism for the representation of connectivity structure in neuronal network models, implementations in Python and C++, and their use in simulators, BMC Neuroscience, Jul. 18, 2011, 12(Suppl 1):P80. |
El-Laithy (2011), A reinforcement learning framework for spiking networks with dynamic synapses, Comput Intell Neurosci. |
Fidjeland et al., Accelerated Simulation of Spiking Neural Networks Using GPUs [online], 2010 [retrieved on Jun. 15, 2013]. Retrieved from the Internet: <URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5596678&tag=1>. |
Fletcher (1987), Practical methods of optimization, New York, NY: Wiley-Interscience. |
Floreano et al., “Neuroevolution: From Architectures to Learning” Evol. Intel. Jan. 2008 1:47-62 (retrieved online on Apr. 24, 2013 from http://infoscience.epfl.ch/record/112676/files/FloreanoDuerrMattiussi2008.pdf). |
Florian (2005), A reinforcement learning algorithm for spiking neural networks, SYNASC '05 Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. |
Florian, R.V. (2007), Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity, Neural Computation 19, 1468-1502, Massachusetts Institute of Technology. |
Fremaux et al., “Functional Requirements for Reward-Modulated Spike-Timing-Dependent Plasticity”, The Journal of Neuroscience, Oct. 6, 2010, 30 (40):13326-13337. |
Froemke et al., Temporal modulation of spike-timing-dependent plasticity, Frontiers in Synaptic Neuroscience, vol. 2, Article 19, pp. 1-16 [online] Jun. 2010 [retrieved on Dec. 16, 2013]. Retrieved from the internet: <frontiersin.org>. |
Fu (2005) Stochastic Gradient Estimation, Technical Research Report. |
Fu (2008), What You Should Know About Simulation and Derivatives, Naval Research Logistics, vol. 55, No. 8, 723-736. |
Fyfe et al., (2007), Reinforcement Learning Reward Functions for Unsupervised Learning, ISNN '07 Proceedings of the 4th international symposium on Neural Networks: Advances in Neural Networks. |
Gerstner (2002), Spiking neuron models: single neurons, populations, plasticity, Cambridge, U.K.: Cambridge University Press. |
Gewaltig et al., ‘NEST (Neural Simulation Tool)’, Scholarpedia, 2007, pp. 1-15, 2(4): 1430. doi:10.4249/scholarpedia.1430. |
Gleeson et al., NeuroML: A Language for Describing Data Driven Models of Neurons and Networks with a High Degree of Biological Detail, PLoS Computational Biology, Jun. 2010, pp. 1-19 vol. 6 Issue 6. |
Glynn (1995), Likelihood ratio gradient estimation for regenerative stochastic recursions, Advances in Applied Probability, 27, 4, 1019-1053. |
Goodman et al., Brian: a simulator for spiking neural networks in Python, Frontiers in Neuroinformatics, Nov. 2008, pp. 1-10, vol. 2, Article 5. |
Gorchetchnikov et al., NineML: declarative, mathematically-explicit descriptions of spiking neuronal networks, Frontiers in Neuroinformatics, Conference Abstract: 4th INCF Congress of Neuroinformatics, doi:10.3389/conf.fninf.2011.08.00098. |
Graham, Lyle J., The Surf-Hippo Reference Manual, http://www.neurophys.biomedicale.univ-paris5.fr/~graham/surf-hippo-files/Surf-Hippo%20Reference%20Manual.pdf, Mar. 2002, pp. 1-128. |
Ho, “Random Decision Forests” Int'l Conf. Document Analysis and Recognition, 1995, 5 pgs. |
Izhikevich (2007), Solving the distal reward problem through linkage of STDP and dopamine signaling, Cerebral Cortex, vol. 17, pp. 2443-2452. |
Izhikevich et al., ‘Relating STDP to BCM’, Neural Computation (2003) 15, 1511-1523. |
Izhikevich, ‘Simple Model of Spiking Neurons’, IEEE Transactions on Neural Networks, vol. 14, No. 6, Nov. 2003, pp. 1569-1572. |
Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore, “Reinforcement learning: A survey.” arXiv preprint cs/9605103 (1996). |
Kalal et al. “Online learning of robust object detectors during unstable tracking” published in 3rd On-line Learning for Computer Vision Workshop 2009, Kyoto, Japan, IEEE CS. |
Karbowski et al., ‘Multispikes and Synchronization in a Large Neural Network with Temporal Delays’, Neural Computation 12, 1573-1606 (2000). |
Doya, Kenji (2000), Reinforcement Learning in Continuous Time and Space, Neural Computation, 12:1, 219-245. |
Kiefer (1952), Stochastic Estimation of the Maximum of a Regression Function, Annals of Mathematical Statistics 23, #3, 462-466. |
Klampfl, (2009). Spiking neurons can learn to solve information bottleneck problems and extract independent components, Neural Computation, 21(4), pp. 911-959. |
Kleijnen et al., “Optimization and sensitivity analysis of computer simulation models by the score function method”, Invited Review European Journal of Operational Research, Mar. 1995. |
Klute et al., (2002). Artificial Muscles: Actuators for Biorobotic Systems. The International Journal of Robotics Research 21:295-309. |
Larochelle et al., (2009), Exploring Strategies for Training Deep Neural Networks, J. of Machine Learning Research, v. 10, pp. 1-40. |
Laurent, ‘Issue 1—nnql—Refactor Nucleus into its own file—Neural Network Query Language’ [retrieved on Nov. 12, 2013]. Retrieved from the Internet: <URL: https://code.google.com/p/nnql/issues/detail?id=1>. |
Laurent, ‘The Neural Network Query Language (NNQL) Reference’ [retrieved on Nov. 12, 2013]. Retrieved from the Internet: <URL: https://code.google.com/p/nnql/issues/detail?id=1>. |
Legenstein et al., (2008), A learning theory for reward-modulated spike timing-dependent plasticity with application to biofeedback. PLoS Computational Biology, 4(10): 1-27. |
Lendek et al., (2006) State Estimation under Uncertainty: A Survey. Technical report 06-004, Delft Center for Systems and Control Delft University of Technology. |
Masakazu et al., “Convolutional Spiking Neural Network Model for Robust Face Detection”, 2002 Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02), vol. 2, pp. 660-664. |
Morrison et al. (2008), Phenomenological models of synaptic plasticity based on spike timing, accepted Apr. 9, 2008. |
Nikolic et al., (2011) High-sensitivity silicon retina for robotics and prosthetics. |
Ojala et al., “Performance Evaluation of Texture Measures with Classification Based on Kullback Discrimination of Distributions” 1994 IEEE, pp. 582-585. |
Ozuysal et al., “Fast Keypoint Recognition in Ten Lines of Code” CVPR 2007. |
Ozuysal et al., “Fast Keypoint Recognition Using Random Ferns” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, No. 3, Mar. 2010, pp. 448-461. |
Paugam-Moisy et al., “Computing with spiking neuron networks” G. Rozenberg, T. Back, J. Kok (Eds.), Handbook of Natural Computing, Springer-Verlag (2010) [retrieved Dec. 30, 2013], [retrieved online from link.springer.com]. |
Pavlidis et al., Spiking neural network training using evolutionary algorithms. In: Proceedings 2005 IEEE International Joint Conference on Neural Networks (IJCNN'05), 2005, vol. 4, pp. 2190-2194. Publication Date Jul. 31, 2005 [online] [retrieved on Dec. 10, 2013]. Retrieved from the Internet <URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.5.4346&rep=rep1&type=pdf>. |
PCT International Search Report and Written Opinion for International Appl. No. PCT/US2013/044124 dated Sep. 12, 2013. |
PCT International Search Report for International Application PCT/US2013/060352 dated Jan. 16, 2014. |
PCT International Search Report for International Application PCT/US2013/026738 dated Jul. 21, 2014 (10 pgs). |
PCT International Search Report for PCT/US2013/052136 dated Nov. 30, 2013. |
Pfister (2003), Optimal Hebbian Learning: A Probabilistic Point of View, In ICANN Proceedings. Springer, pp. 92-98. |
Pfister (2006), Optimal Spike-Timing Dependent Plasticity for Precise Action Potential Firing in Supervised Learning, Neural Computation, ISSN 0899-7667, 18(6). |
Ponulak (2006) Supervised Learning in Spiking Neural Networks with ReSuMe Method. Doctoral Dissertation Poznan, Poland. |
Ponulak et al., (2010) Supervised Learning in Spiking Neural Networks with ReSuMe: Sequence Learning, Classification and Spike-Shifting. Neural Comp., 22 (2): 467-510. |
Ponulak, (2005), ReSuMe—New supervised learning method for Spiking Neural Networks. Technical Report, Institute of Control and Information Engineering, Poznan University of Technology. |
Prokhorov, Danil V., and Lee A. Feldkamp, “Primitive adaptive critics.” Neural Networks, 1997, International Conference on, vol. 4, IEEE, 1997. |
Reiman et al. (1989). Sensitivity analysis for simulations via likelihood ratios. Oper Res 37, 830-844. |
Robbins (1951), A Stochastic Approximation Method, Annals of Mathematical Statistics 22, #3, 400-407. |
Rosenstein et al., (2002), Supervised learning combined with an actor-critic architecture, Technical Report 02-41, Department of Computer Science, University of Massachusetts, Amherst. |
Rumelhart (1986), Learning internal representations by error propagation, Parallel distributed processing, vol. 1 (pp. 318-362), Cambridge, MA: MIT Press. |
Rumelhart et al., (1986), Learning representations by back-propagating errors, Nature 323 (6088), pp. 533-536. |
Schemmel et al., Implementing synaptic plasticity in a VLSI spiking neural network model, in Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN'06), IEEE Press (2006), Jul. 16-21, 2006, pp. 1-6 [online], [retrieved on Dec. 10, 2013]. Retrieved from the Internet <URL: http://www.kip.uni-heidelberg.de/veroeffentlichungen/download.cgi/4620/ps/1774.pdf>. |
Schreiber et al., (2003). A new correlation-based measure of spike timing reliability. Neurocomputing, 52-54, 925-931. |
Seung, H. “Learning in spiking neural networks by reinforcement of stochastic synaptic transmission.” Neuron vol. 40 No. 6 (2003): pp. 1063-1073. |
Simulink® model [online], [retrieved on Dec. 10, 2013]. Retrieved from <URL: http://www.mathworks.com/products/simulink/index.html> (2 pgs). |
Sinyavskiy et al. ‘Reinforcement learning of a spiking neural network in the task of control of an agent in a virtual discrete environment’ Rus. J. Nonlin. Dyn., 2011, vol. 7, No. 4 (Mobile Robots), pp. 859-875, chapters 1-8 (Russian Article with English Abstract). |
Sinyavskiy O. Yu.: ‘Obuchenie s podkrepleniem spaikovoy neironnoy seti v zadache upravleniya agentom v diskretnoy virtualnoy srede’ [Reinforcement learning of a spiking neural network in the task of control of an agent in a discrete virtual environment], Nelineinaya Dinamika, vol. 7, No. 4, 2011, pp. 859-875. |
Sinyavskiy et al., “Generalized Stochastic Spiking Neuron Model and Extended Spike Response Model in Spatial-Temporal Impulse Pattern Detection Task”, Optical Memory and Neural Networks (Information Optics), 2010, vol. 19, No. 4, pp. 300-309. |
Sjostrom et al., ‘Spike-Timing Dependent Plasticity’ Scholarpedia, 5(2):1362 (2010), pp. 1-18. |
Stein, (1967). Some models of neural variability. Biophys. J., 7: 37-68. |
Sutton R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning 3(1), 9-44. |
Szatmary et al., ‘Spike-timing Theory of Working Memory’ PLoS Computational Biology, vol. 6, Issue 8, Aug. 19, 2010 [retrieved on Dec. 30, 2013]. Retrieved from the Internet: <URL: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000879#>. |
Tishby et al., (1999), The information bottleneck method, In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds., pp. 368-377, University of Illinois. |
Toyoizumi (2007), Optimality Model of Unsupervised Spike-Timing Dependent Plasticity: Synaptic Memory and Weight Distribution, Neural Computation, 19 (3). |
Toyoizumi et al., (2005), Generalized Bienenstock-Cooper-Munro rule for spiking neurons that maximizes information transmission, Proc. Natl. Acad. Sci. USA, 102, (pp. 5239-5244). |
Vasilaki et al., “Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail” PLoS Computational Biology, vol. 5, Issue 12, Dec. 2009. |
Vasilaki, et al., “Learning flexible sensori-motor mappings in a complex network” Biol Cybern (2009) 100:147-158. |
Vision Systems Design, “In search of the artificial retina” [online], Apr. 1, 2007. |
Weaver (2001), The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, UAI '01 Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (pp. 538-545). Morgan Kaufmann Publishers. |
Weber et al., (2009), Goal-Directed Feature Learning, In: Proc. International Joint Conference on Neural Networks, 3319-3326. |
Weber, C. et al. ‘Robot docking with neural vision and reinforcement.’ Knowledge-Based Systems vol. 17 No. 2 (2004): pp. 165-172. |
White et al., (Eds.) (1992) Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, New York. |
Widrow et al., (1960) Adaptive Switching Circuits. IRE WESCON Convention Record 4: 96-104. |
Williams (1992), Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning 8, 229-256. |
Xie et al., (2004) “Learning in neural networks by reinforcement of irregular spiking”, Physical Review E, vol. 69, letter 041909, pp. 1-10. |
Yi (2009), Stochastic search using the natural gradient, ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA. |
Number | Date | Country | |
---|---|---|---|
20140222739 A1 | Aug 2014 | US |