A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
1. Technological Field
The present disclosure relates to adaptive control of robotic devices.
2. Background
Robotic devices are used in a variety of applications, such as manufacturing, medical, safety, military, exploration, and/or other applications. Some existing robotic devices (e.g., manufacturing assembly and/or packaging robots) may be programmed in order to perform desired functionality. Some robotic devices (e.g., surgical robots) may be remotely controlled by humans, while some robots (e.g., iRobot Roomba®) may learn to operate via exploration.
Robotic devices may comprise one or more actuators configured to enable the robot to perform various tasks. Two or more contemporaneously occurring tasks may attempt to utilize the same hardware resources (e.g., actuators). In some uses, different tasks may attempt to issue competing and/or discordant control instructions to actuators (e.g., an aiming controller may command a steady state platform motion while an obstacle avoidance controller may command an evasive maneuver). Selecting (e.g., allowing) a given action from multiple competing and/or conflicting actions may be required. Action selection may be described as the task of resolving conflicts between competing behavioral alternatives. A robotic controller may be configured to perform a repertoire of actions. Based on knowledge of its internal state and sensory information related to the robot's environment, the robotic controller may be required to decide what action (or action sequence) to perform in order for the agent to accomplish a target task.
One aspect of the disclosure relates to a non-transitory computer readable medium having instructions embodied thereon. The instructions may be executable by a processor to perform a method for controlling a robotic platform. The method may comprise: providing a first activation signal, a second activation signal, a first motor control signal, and a second motor control signal, the first activation signal being associated with the first motor control signal and the second activation signal being associated with the second motor control signal; selecting the first motor control signal for execution, the selection based on a comparison of the first activation signal and the second activation signal; and instructing the robotic platform to perform a first action based on execution of the first motor control signal.
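The selection step recited above may be illustrated with a minimal, non-spiking Python sketch; the magnitude-based comparison and all names below are illustrative assumptions rather than features of the disclosure:

```python
def select_motor_signal(activation_1, activation_2, motor_1, motor_2):
    """Select for execution the motor control signal whose
    associated activation signal wins the comparison (here,
    illustratively, the larger magnitude)."""
    return motor_1 if activation_1 >= activation_2 else motor_2

# The platform then performs the action of the selected signal:
selected = select_motor_signal(0.8, 0.3, "turn_left", "turn_right")
```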
Another aspect of the disclosure relates to a robotic controller apparatus. The apparatus may comprise one or more processors configured to execute computer program modules. The computer program modules may be executable to cause the one or more processors to: provide a first control signal and a second control signal, the first control signal being configured to operate a first controllable element of a robotic platform and the second control signal being configured to operate a second controllable element of the robotic platform; provide a first activation signal and a second activation signal, the first activation signal and the second activation signal each being configured to enable actuation of the first and second controllable elements, respectively; determine an enable signal based on competitive information associated with the first control signal and the second control signal; and enable execution of one and only one of the first control signal or the second control signal based on the enable signal.
In some implementations, the first control signal and the second control signal may be provided based on sensory input into the controller apparatus. The sensory input may convey information associated with one or both of the robotic platform or the surroundings. The execution of the first control signal or the second control signal by the controllable element may be configured to enable the platform to accomplish a first task or a second task. Execution of the second task may be incompatible with execution of the first task.
In some implementations, the computer program modules may be executable to cause the one or more processors to cause the first control signal to be provided to a first relay neuron and the second control signal to be provided to a second relay neuron. The first relay neuron may be configured to prevent execution by the controllable element of the first control signal absent the enable signal, and the second relay neuron may be configured to prevent execution by the controllable element of the second control signal absent the enable signal.
In some implementations, the prevention of the execution may be based on a first inhibitory signal and a second inhibitory signal being provided to the first relay neuron and the second relay neuron, respectively. The execution of the first control signal or the second control signal by the controllable element may be configured based on an interruption of one of the first or the second inhibitory signals. The interruption may be based on the enable signal.
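The relay-and-inhibition arrangement described in the preceding paragraphs may be sketched as follows; the class and signal names are hypothetical, and the Boolean gating stands in for the spiking inhibitory mechanism:

```python
class RelayNeuron:
    """Gates a control signal: tonic inhibition blocks relay of the
    signal until the enable signal interrupts that inhibition."""
    def __init__(self, control_signal):
        self.control_signal = control_signal
        self.inhibited = True          # inhibitory input present by default

    def set_enable(self, enabled):
        # The enable signal interrupts the inhibitory input.
        self.inhibited = not enabled

    def output(self):
        # Absent the enable signal, execution is prevented (None).
        return None if self.inhibited else self.control_signal

relay_1 = RelayNeuron("approach_command")
relay_2 = RelayNeuron("avoid_command")
relay_2.set_enable(True)    # arbitration enables only the second relay
```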
In some implementations, the sensory input may comprise a video signal conveying a representation of one or both of a target or an obstacle. The first task may comprise a target approach maneuver. The second task may comprise an obstacle avoidance maneuver.
In some implementations, the first control signal and the first activation signals may be provided by a first task controller operable in accordance with the sensory input to execute the first task. The second control signal and the second activation signals may be provided by a second task controller operable in accordance with the sensory input to execute the second task. The interruption may be effectuated by activation of one of a first selection neuron or a second selection neuron. Activation of a given one of the first selection neuron or the second selection neuron may be effectuated based on the enable signal.
In some implementations, the activation of a given one of the first selection neuron or the second selection neuron may be effectuated based on the enable signal being configured based on evaluation of a parameter of the first activation signal versus the parameter of the second activation signal.
In some implementations, the evaluation may comprise comparing an onset of the first activation signal to the onset of the second activation signal. The parameter may comprise onset time.
In some implementations, the evaluation may comprise comparing a magnitude of the first activation signal to a magnitude of the second activation signal. The parameter may comprise activation signal magnitude.
In some implementations, the competitive information associated with the first control signal and the second control signal may include information based on an operation of a first selection neuron and a second selection neuron operable in accordance with a winner-take-all (WTA) process configured to produce the enable signal according to: configuring the enable signal to activate the first task based on the WTA process indicating the first activation signal as a winning signal; and configuring the enable signal to activate the second task based on the WTA process indicating the second activation signal as the winning signal.
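A minimal sketch of such a WTA process, under the assumption that the winning signal is simply the one with the largest value (ties resolving to the earlier signal), might look like:

```python
def winner_take_all(activations):
    """Return the index of the strongest activation signal; ties
    resolve to the earlier signal (an illustrative choice)."""
    winner = 0
    for i, a in enumerate(activations):
        if a > activations[winner]:
            winner = i
    return winner

def enable_signal(first_activation, second_activation):
    # Configure the enable signal to activate the task whose
    # activation signal the WTA process indicates as the winner.
    wta = winner_take_all([first_activation, second_activation])
    return "first_task" if wta == 0 else "second_task"
```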
Yet another aspect of the disclosure relates to a method of providing a selected control signal of a plurality of control signals to an actuator. The method may comprise: coupling individual ones of the plurality of control signals to a plurality of relays, individual ones of the plurality of relays being configured, responsive to being activated, to provide a respective control signal to the actuator; preventing provision of all but one of the plurality of control signals to the actuator by deactivating all but one of the plurality of relays; based on a plurality of activation signals associated with individual ones of the plurality of control signals, determining a relay selection signal configured to activate a relay of the plurality of relays; and effectuating provision of the selected control signal responsive to activation of the relay of the plurality of relays based on the relay selection signal.
In some implementations, individual ones of the plurality of control signals may comprise a spiking signal.
In some implementations, the selected control signal and the selection signal may comprise spiking signals. Individual ones of the plurality of control signals may comprise at least one of a spiking signal or an analog signal.
In some implementations, individual ones of the plurality of activation signals may comprise a binary signal.
In some implementations, individual ones of the plurality of activation signals may comprise a digital signal characterized by two or more bits.
These and other objects, features, and characteristics of the present invention, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
All Figures disclosed herein are © Copyright 2013 Brain Corporation. All rights reserved.
Implementations of the present technology will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the technology. Notably, the figures and examples below are not meant to limit the scope of the present disclosure to a single implementation, but other implementations are possible by way of interchange of or combination with some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts.
Where certain elements of these implementations can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present technology will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the disclosure.
In the present specification, an implementation showing a singular component should not be considered limiting; rather, the disclosure is intended to encompass other implementations including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein.
Further, the present disclosure encompasses present and future known equivalents to the components referred to herein by way of illustration.
As used herein, the term “bus” is meant generally to denote all types of interconnection or communication architecture that is used to access the synaptic and neuron memory. The “bus” may be optical, wireless, infrared, and/or another type of communication medium. The exact topology of the bus could be, for example, a standard “bus”, hierarchical bus, network-on-chip, address-event-representation (AER) connection, and/or other type of communication topology used for accessing, e.g., different memories in a pulse-based system.
As used herein, the terms “computer”, “computing device”, and “computerized device” may include one or more of personal computers (PCs) and/or minicomputers (e.g., desktop, laptop, and/or other PCs), mainframe computers, workstations, servers, personal digital assistants (PDAs), handheld computers, embedded computers, programmable logic devices, personal communicators, tablet computers, portable navigation aids, J2ME equipped devices, cellular telephones, smart phones, personal integrated communication and/or entertainment devices, and/or any other device capable of executing a set of instructions and processing an incoming data signal.
As used herein, the term “computer program” or “software” may include any sequence of human and/or machine cognizable steps which perform a function. Such program may be rendered in a programming language and/or environment including one or more of C/C++, C#, Fortran, COBOL, MATLAB™, PASCAL, Python, assembly language, markup languages (e.g., HTML, SGML, XML, VoXML), object-oriented environments (e.g., Common Object Request Broker Architecture (CORBA)), Java™ (e.g., J2ME, Java Beans), Binary Runtime Environment (e.g., BREW), and/or other programming languages and/or environments.
As used herein, the terms “connection”, “link”, “transmission channel”, “delay line”, and “wireless” may include a causal link between any two or more entities (whether physical or logical/virtual), which may enable information exchange between the entities.
As used herein, the term “memory” may include an integrated circuit and/or other storage device adapted for storing digital data. By way of non-limiting example, memory may include one or more of ROM, PROM, EEPROM, DRAM, Mobile DRAM, SDRAM, DDR/2 SDRAM, EDO/FPMS, RLDRAM, SRAM, “flash” memory (e.g., NAND/NOR), memristor memory, PSRAM, and/or other types of memory.
As used herein, the terms “integrated circuit”, “chip”, and “IC” are meant to refer to an electronic circuit manufactured by the patterned diffusion of trace elements into the surface of a thin substrate of semiconductor material. By way of non-limiting example, integrated circuits may include field programmable gate arrays (e.g., FPGAs), a programmable logic device (PLD), reconfigurable computer fabrics (RCFs), application-specific integrated circuits (ASICs), and/or other types of integrated circuits.
As used herein, the terms “microprocessor” and “digital processor” are meant generally to include digital processing devices. By way of non-limiting example, digital processing devices may include one or more of digital signal processors (DSPs), reduced instruction set computers (RISC), general-purpose (CISC) processors, microprocessors, gate arrays (e.g., field programmable gate arrays (FPGAs)), PLDs, reconfigurable computer fabrics (RCFs), array processors, secure microprocessors, application-specific integrated circuits (ASICs), and/or other digital processing devices. Such digital processors may be contained on a single unitary IC die, or distributed across multiple components.
As used herein, the term “network interface” refers to any signal, data, and/or software interface with a component, network, and/or process. By way of non-limiting example, a network interface may include one or more of FireWire (e.g., FW400, FW800, etc.), USB (e.g., USB2), Ethernet (e.g., 10/100, 10/100/1000 (Gigabit Ethernet), 10-Gig-E, etc.), MoCA, Coaxsys (e.g., TVnet™), radio frequency tuner (e.g., in-band or OOB, cable modem, etc.), Wi-Fi (802.11), WiMAX (802.16), PAN (e.g., 802.15), cellular (e.g., 3G, LTE/LTE-A/TD-LTE, GSM, etc.), IrDA families, and/or other network interfaces.
As used herein, the terms “node”, “neuron”, and “neuronal node” are meant to refer, without limitation, to a network unit (e.g., a spiking neuron and a set of synapses configured to provide input signals to the neuron) having parameters that are subject to adaptation in accordance with a model.
As used herein, the terms “state” and “node state” are meant generally to denote a full (or partial) set of dynamic variables (e.g., a membrane potential, firing threshold, and/or other) used to describe the state of a network node.
As used herein, the terms “synaptic channel”, “connection”, “link”, “transmission channel”, “delay line”, and “communications channel” include a link between any two or more entities (whether physical (wired or wireless), or logical/virtual) which enables information exchange between the entities, and may be characterized by one or more variables affecting the information exchange.
As used herein, the term “Wi-Fi” includes one or more of IEEE-Std. 802.11, variants of IEEE-Std. 802.11, standards related to IEEE-Std. 802.11 (e.g., 802.11a/b/g/n/s/v), and/or other wireless standards.
As used herein, the term “wireless” means any wireless signal, data, communication, and/or other wireless interface. By way of non-limiting example, a wireless interface may include one or more of Wi-Fi, Bluetooth, 3G (3GPP/3GPP2), HSDPA/HSUPA, TDMA, CDMA (e.g., IS-95A, WCDMA, etc.), FHSS, DSSS, GSM, PAN/802.15, WiMAX (802.16), 802.20, narrowband/FDMA, OFDM, PCS/DCS, LTE/LTE-A/TD-LTE, analog cellular, CDPD, satellite systems, millimeter wave or microwave systems, acoustic, infrared (i.e., IrDA), and/or other wireless interfaces.
One aspect of the disclosure relates to apparatus and methods for robotic controller design directed at implementing an action selection mechanism. The robotic controller may comprise multiple task controllers configured to implement multiple control actions. Two or more actions may occur contemporaneously with one another. The approach disclosed herein may advantageously allow for coordination between individual task controllers based on the concept of priorities. Responsive to activation of two or more task controllers, an arbitrator block may be utilized in order to regulate (e.g., gate) execution of competing actions based on priorities associated with individual actions. In some implementations, a task with a higher priority may be executed ahead of a task with a lower priority.
In some implementations, the task priority indication (e.g., priority status) may be configured separate from the control signal itself (e.g., motor control). Such an implementation may make the process of action selection independent of the representation and/or strength of the control signal. By way of illustration: a weaker control signal accompanied by a higher priority indication may be executed in place of (and/or ahead of) a stronger control signal having a lower priority indication.
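This separation of priority from signal strength may be sketched as follows; the tuple representation and the numeric priority values are illustrative assumptions:

```python
def arbitrate(requests):
    """Select the command whose priority indication is highest,
    independent of control-signal strength.
    requests: list of (priority, signal_strength, command) tuples."""
    return max(requests, key=lambda r: r[0])[2]

# A weak evasive command with a high priority indication preempts
# a strong aiming command with a low priority indication:
command = arbitrate([(2, 0.9, "aim"), (5, 0.3, "evade")])
```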
The disclosure finds broad practical application. Implementations of the disclosure may be, for example, deployed in a hardware and/or software implementation of a computer-controlled system, provided in one or more of a prosthetic device, robotic device, and/or other apparatus. In some implementations, a control system may include a processor embodied in an application specific integrated circuit (ASIC), a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP) or an application specific processor (ASIP) or other general purpose multiprocessor, which can be adapted or configured for use in an embedded application such as controlling a robotic device. However, it will be appreciated that the disclosure is in no way limited to the applications and/or implementations described herein.
Principles of the present disclosure may be applicable to various control applications that use a spiking neural network as the controller and comprise a set of sensors and actuators that produce signals of different types. Examples of such applications may include one or more of robot navigation control, automatic drone stabilization, robot arm control, and/or other applications. Some sensors may communicate their state data using analog variables, whereas other sensors may employ a spiking signal representation.
The controller 102 may be operable in accordance with a learning process (e.g., reinforcement learning and/or supervised learning). In one or more implementations, the controller 102 may optimize performance (e.g., performance of the system 100 of
Learning process of adaptive controller (e.g., 102 of
Individual spiking neurons may be characterized by an internal state. The internal state may, for example, comprise a membrane voltage of the neuron, conductance of the membrane, and/or other parameters. The neuron process may be characterized by one or more learning parameters, which may comprise input connection efficacy, output connection efficacy, training input connection efficacy, response generating (firing) threshold, resting potential of the neuron, and/or other parameters. In one or more implementations, some learning parameters may comprise probabilities of signal transmission between the units (e.g., neurons) of the network.
In some implementations, the training input (e.g., 104 in
During operation (e.g., subsequent to learning), data (e.g., spike events) arriving at neurons of the network may cause changes in the neuron state (e.g., increase neuron membrane potential and/or other parameters). Changes in the neuron state may cause the neuron to generate a response (e.g., output a spike). Teaching data may be absent during operation, while input data are required for the neuron to generate output.
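The state dynamics described above may be illustrated with a leaky integrate-and-fire (LIF) sketch, one common neuron model; the parameter values are illustrative and are not taken from the disclosure:

```python
def lif_step(v, input_current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron: arriving
    input raises the membrane potential; crossing the firing
    threshold yields an output spike followed by a reset."""
    v = v + dt * (-v / tau + input_current)
    if v >= v_thresh:
        return v_reset, True    # response (spike) generated
    return v, False

# With steady input, the potential climbs until the neuron fires:
v, fired = 0.0, False
for _ in range(4):
    v, fired = lif_step(v, input_current=0.3)
```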
In one or more implementations, such as object recognition and/or obstacle avoidance, the input 106 may comprise a stream of pixel values associated with one or more digital images. In one or more implementations of, e.g., video, radar, sonography, x-ray, magnetic resonance imaging, and/or other types of sensing, the input may comprise electromagnetic waves (e.g., visible light, IR, UV, and/or other types of electromagnetic waves) entering an imaging sensor array. In some implementations, the imaging sensor array may comprise one or more of RGCs, a charge coupled device (CCD), an active-pixel sensor (APS), and/or other sensors. The input signal may comprise a sequence of images and/or image frames. The sequence of images and/or image frames may be received from a CCD camera via a receiver apparatus and/or downloaded from a file. The image may comprise a two-dimensional matrix of RGB values refreshed at a 25 Hz frame rate. It will be appreciated by those skilled in the arts that the above image parameters are merely exemplary, and many other image representations (e.g., bitmap, CMYK, HSV, HSL, grayscale, and/or other representations) and/or frame rates are equally useful with the present invention. Pixels and/or groups of pixels associated with objects and/or features in the input frames may be encoded using, for example, latency encoding described in U.S. patent application Ser. No. 12/869,583, filed Aug. 26, 2010 and entitled “INVARIANT PULSE LATENCY CODING SYSTEMS AND METHODS”; U.S. Pat. No. 8,315,305, issued Nov. 20, 2012, entitled “SYSTEMS AND METHODS FOR INVARIANT PULSE LATENCY CODING”; U.S. patent application Ser. No. 13/152,084, filed Jun. 2, 2011, entitled “APPARATUS AND METHODS FOR PULSE-CODE INVARIANT OBJECT RECOGNITION”; and/or latency encoding comprising a temporal winner-take-all mechanism described in U.S. patent application Ser. No. 13/757,607, filed Feb.
1, 2013 and entitled “TEMPORAL WINNER TAKES ALL SPIKING NEURON NETWORK SENSORY PROCESSING APPARATUS AND METHODS”, each of the foregoing being incorporated herein by reference in its entirety.
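As one illustration of latency encoding generally (not of the specific mechanisms of the applications cited above), stronger pixel intensities may be mapped to earlier spike times; the linear mapping and time scale below are assumptions:

```python
def latency_encode(pixels, t_max=100.0):
    """Map pixel intensities in [0, 1] to spike latencies in
    [0, t_max]: stronger features spike earlier."""
    return [t_max * (1.0 - p) for p in pixels]

# A bright pixel (1.0) spikes immediately; a dark one (0.0) spikes last:
latencies = latency_encode([1.0, 0.5, 0.0])
```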
In one or more implementations, object recognition and/or classification may be implemented using a spiking neuron classifier comprising conditionally independent subsets as described in co-owned U.S. patent application Ser. No. 13/756,372 filed Jan. 31, 2013, and entitled “SPIKING NEURON CLASSIFIER APPARATUS AND METHODS” and/or co-owned U.S. patent application Ser. No. 13/756,382 filed Jan. 31, 2013, and entitled “REDUCED LATENCY SPIKING NEURON CLASSIFIER APPARATUS AND METHODS”, each of the foregoing being incorporated herein by reference in its entirety.
In one or more implementations, encoding may comprise adaptive adjustment of neuron parameters, such as neuron excitability, described in U.S. patent application Ser. No. 13/623,820 entitled “APPARATUS AND METHODS FOR ENCODING OF SENSORY DATA USING ARTIFICIAL SPIKING NEURONS”, filed Sep. 20, 2012, the foregoing being incorporated herein by reference in its entirety.
In some implementations, analog inputs may be converted into spikes using, for example, kernel expansion techniques described in co-pending U.S. patent application Ser. No. 13/623,842 filed Sep. 20, 2012, and entitled “SPIKING NEURON NETWORK ADAPTIVE CONTROL APPARATUS AND METHODS”, the foregoing being incorporated herein by reference in its entirety. In one or more implementations, analog and/or spiking inputs may be processed by mixed signal spiking neurons, such as described in U.S. patent application Ser. No. 13/313,826 entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, filed Dec. 7, 2011, and/or co-pending U.S. patent application Ser. No. 13/761,090 entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, filed Feb. 6, 2013, each of the foregoing being incorporated herein by reference in its entirety.
The rules may be configured to implement synaptic plasticity in the network. In some implementations, the plasticity rules may comprise one or more spike-timing dependent plasticity (STDP) rules, such as a rule comprising feedback described in co-owned and co-pending U.S. patent application Ser. No. 13/465,903 entitled “SENSORY INPUT PROCESSING APPARATUS IN A SPIKING NEURAL NETWORK”, filed May 7, 2012; rules configured to modify feed-forward plasticity due to activity of neighboring neurons, described in co-owned U.S. patent application Ser. No. 13/488,106, entitled “SPIKING NEURON NETWORK APPARATUS AND METHODS”, filed Jun. 4, 2012; conditional plasticity rules described in U.S. patent application Ser. No. 13/541,531, entitled “CONDITIONAL PLASTICITY SPIKING NEURON NETWORK APPARATUS AND METHODS”, filed Jul. 3, 2012; plasticity configured to stabilize neuron response rate as described in U.S. patent application Ser. No. 13/691,554, entitled “RATE STABILIZATION THROUGH PLASTICITY IN SPIKING NEURON NETWORK”, filed Nov. 30, 2012; activity-based plasticity rules described in co-owned U.S. patent application Ser. No. 13/660,967, entitled “APPARATUS AND METHODS FOR ACTIVITY-BASED PLASTICITY IN A SPIKING NEURON NETWORK”, filed Oct. 25, 2012, U.S. patent application Ser. No. 13/660,945, entitled “MODULATED PLASTICITY APPARATUS AND METHODS FOR SPIKING NEURON NETWORKS”, filed Oct. 25, 2012, and U.S. patent application Ser. No. 13/774,934, entitled “APPARATUS AND METHODS FOR RATE-MODULATED PLASTICITY IN A SPIKING NEURON NETWORK”, filed Feb. 22, 2013; and multi-modal rules described in U.S. patent application Ser. No. 13/763,005, entitled “SPIKING NETWORK APPARATUS AND METHOD WITH BIMODAL SPIKE-TIMING DEPENDENT PLASTICITY”, filed Feb. 8, 2013, each of the foregoing being incorporated herein by reference in its entirety.
In one or more implementations, neuron operation may be configured based on one or more inhibitory connections providing input configured to delay and/or depress response generation by the neuron, as described in U.S. patent application Ser. No. 13/660,923, entitled “ADAPTIVE PLASTICITY APPARATUS AND METHODS FOR SPIKING NEURON NETWORK”, filed Oct. 25, 2012, the foregoing being incorporated herein by reference in its entirety.
Connection efficacy updates may be effectuated using a variety of applicable methodologies such as, for example, event-based updates described in detail in co-owned U.S. patent application Ser. No. 13/239, filed Sep. 21, 2011, entitled “APPARATUS AND METHODS FOR SYNAPTIC UPDATE IN A PULSE-CODED NETWORK”; U.S. patent application Ser. No. 13/588,774, entitled “APPARATUS AND METHODS FOR IMPLEMENTING EVENT-BASED UPDATES IN SPIKING NEURON NETWORK”, filed Aug. 17, 2012; and U.S. patent application Ser. No. 13/560,891 entitled “APPARATUS AND METHODS FOR EFFICIENT UPDATES IN SPIKING NEURON NETWORKS”, each of the foregoing being incorporated herein by reference in its entirety.
A neuron process may comprise one or more learning rules configured to adjust the neuron state and/or generate neuron output in accordance with neuron inputs.
In some implementations, one or more learning rules may comprise state-dependent learning rules described, for example, in U.S. patent application Ser. No. 13/560,902, entitled “APPARATUS AND METHODS FOR STATE-DEPENDENT LEARNING IN SPIKING NEURON NETWORKS”, filed Jul. 27, 2012 and/or pending U.S. patent application Ser. No. 13/722,769 filed Dec. 20, 2012, and entitled “APPARATUS AND METHODS FOR STATE-DEPENDENT LEARNING IN SPIKING NEURON NETWORKS”, each of the foregoing being incorporated herein by reference in its entirety.
In one or more implementations, the one or more learning rules may be configured to comprise one or more of reinforcement learning, unsupervised learning, and/or supervised learning as described in co-owned and co-pending U.S. patent application Ser. No. 13/487,499 entitled “STOCHASTIC APPARATUS AND METHODS FOR IMPLEMENTING GENERALIZED LEARNING RULES”, incorporated supra.
In one or more implementations, the one or more learning rules may be configured in accordance with focused exploration rules such as described, for example, in U.S. patent application Ser. No. 13/489,280 entitled “APPARATUS AND METHODS FOR REINFORCEMENT LEARNING IN ARTIFICIAL NEURAL NETWORKS”, filed Jun. 5, 2012, the foregoing being incorporated herein by reference in its entirety.
Adaptive controller (e.g., the controller apparatus 102 of
Initially, the rover 210 may proceed on a trajectory portion 206 towards the target 214. The robotic controller may execute a target approach (TA) task associated with the trajectory portion 206. Upon detecting a presence of an obstacle and/or a threat 212, the controller may adaptively alter task execution at a time corresponding to the location 218 in
In some implementations, an exclusive control of a given controllable element (e.g., an actuator) may be realized. A given controller with the highest priority may be activated (e.g., allowed to control the element) responsive to a priority determination. In some implementations, the actuator may comprise one or more of a motor for moving or controlling a mechanism or system; a component operated by a source of energy (e.g., electric current, hydraulic fluid pressure, pneumatic pressure, and/or other source of energy), which converts that energy into motion; an electromechanical actuator; a fixed mechanical or electronic system; and/or other actuator. Such an actuator may be associated with specific software (e.g., a printer driver, robot control system, and/or other software). In some implementations, an actuator may be a human or other agent.
The teaching agent may provide teaching input to the rover 210 during training. The teaching agent may comprise a human trainer of the robot. The trainer may utilize a remote control apparatus in order to provide training input to the rover, e.g., during the trajectory alteration events 218 in
Operation of the robotic apparatus 210 of
The controller 300 may comprise a task controller block 310. The task controller block 310 may comprise multiple task controllers. Individual task controllers (e.g., 302) may be configured to generate control output 306 based on input 304. The input 304 may comprise one or more of sensory input, estimated system state, input from a control entity, and/or other input. In some implementations, robotic platform feedback may comprise proprioceptive signals. Examples of proprioceptive signals may include one or more of readings from servo motors, joint position, torque, and/or other proprioceptive signals. In some implementations, the sensory input may correspond to the controller sensory input 106, described with respect to
In one or more implementations, the control entity may comprise a human trainer. The human trainer may communicate with the robotic controller via user interface. Examples of such a user interface may include one or more of a remote controller, a joystick, and/or other user interface. In one or more implementations, the control entity may comprise a computerized agent. Such a computerized agent may include a multifunction adaptive controller operable using reinforcement, supervised, and/or unsupervised learning and capable of training other robotic devices for one and/or multiple tasks.
In one or more implementations, individual task controllers may comprise a predictor apparatus, for example such as an adaptive predictor described in U.S. patent application Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, incorporated supra.
The control signals (e.g., 306) of individual task controllers may be provided to the relay block 340. The relay block 340 may comprise one or more neurons 346. Connections coupling the signal 306 from the task control block 310 to the relay block 340 may be characterized by connection efficacy 344. In some implementations, the efficacy 344 may comprise a positive value causing excitatory input to the neurons of the relay block 340. Connection efficacy may in general refer to a magnitude and/or probability of input spike influence on neuronal response (i.e., output spike generation or firing). Connection efficacy may comprise, for example, a synaptic weight parameter by which one or more state variables of a post-synaptic unit are changed. During operation of the pulse-code network, synaptic weights may be dynamically adjusted using various forms of machine learning or biologically-inspired learning methods, for example, by what is referred to as spike-timing dependent plasticity (STDP). In one or more implementations, the STDP mechanism may comprise a rate-modulated plasticity mechanism described, for example, in U.S. patent application Ser. No. 13/774,934, entitled “APPARATUS AND METHODS FOR RATE-MODULATED PLASTICITY IN A SPIKING NEURON NETWORK”, and/or a bi-modal plasticity mechanism, for example, such as described in U.S. patent application Ser. No. 13/763,005, entitled “SPIKING NETWORK APPARATUS AND METHOD WITH BIMODAL SPIKE-TIMING DEPENDENT PLASTICITY”, each of the foregoing being incorporated herein by reference in its entirety. In some implementations, learning may be goal-oriented and realized for example by the use of reward-modulated STDP, e.g., as described in U.S. patent application Ser. No. 13/554,980, entitled “APPARATUS AND METHODS FOR REINFORCEMENT LEARNING IN LARGE POPULATIONS OF ARTIFICIAL SPIKING NEURONS”, filed on Jul. 20, 2012, the foregoing being incorporated herein by reference in its entirety.
The task controller 302 may be configured to provide an activation signal 308 associated with the control signal 306. In one or more implementations, the activation signal may comprise a spiking signal, a fixed point/floating point value stored in a register, a binary signal, and/or another representation. In one or more implementations, the activation signal may take binary values. For example, such binary values may include a ‘1’ corresponding to the active state of the controller and a ‘0’ corresponding to the inactive controller state. The controller may be configured to provide the activation signal ‘1’ based on a detection of a relevant context. In some implementations, the relevant context may correspond to a detection of an object representation (e.g., green ball) in the video input. For example, for the obstacle avoidance (OA) controller, the appropriate context may correspond to the sensory input signals indicating nearby obstacles in front of the robot. In this case, the OA controller may send the activation signal ‘1’; otherwise it may send ‘0’.
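By way of illustration, the binary activation logic described above may be sketched as follows; the distance threshold and function names are illustrative assumptions rather than part of the disclosure.

```python
# Sketch of a binary activation signal for an obstacle-avoidance (OA)
# task controller: emit '1' (active) when the sensory context is
# relevant, i.e., an obstacle is detected nearby, and '0' otherwise.
# The 0.5 m "nearby" threshold is an illustrative assumption.

OBSTACLE_RANGE_M = 0.5  # hypothetical range below which the context is relevant

def oa_activation(front_range_m: float) -> int:
    """Return 1 if an obstacle lies within range of the robot, else 0."""
    return 1 if front_range_m < OBSTACLE_RANGE_M else 0
```

A controller detecting an obstacle 0.2 m ahead would thus assert its activation signal, while a clear path (e.g., 2 m) would leave it inactive.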
In some implementations, the activation signal 308 may comprise a continuous real value, e.g., as described in detail in U.S. patent application Ser. No. 13/761,090 entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, filed Feb. 6, 2013, incorporated supra.
The activation signal 308 may be provided to a selector block 320. In some implementations, the selector block 320 may comprise a winner-take-all network comprising one or more neurons 326. The neurons 326 may be configured to adjust efficacy 324 of connections delivering the activation signals 308 from the block 310 in order to determine which of the control actions from the task controllers should be executed at a given time. In one or more implementations, the activation signals may be characterized by two states: the low state (task controller inactive) and the high state (task controller active). High state activation signals may compete with one another within the selector block. The winner may be determined by priorities, which may be encoded by the block 320. In some implementations, the task priority may be encoded using the efficacy 324. In one or more implementations the efficacy (and thus the priorities) may be adjusted so as to optimize the overall behavior of the system towards accomplishing a given task. The task accomplishment may be based on a control policy (e.g., lowest energy use, fastest time, lowest mean deviation from a target trajectory, and/or other policies). For example, in a task where a mobile robot may be configured to collect a set of certain objects (target objects) and navigate a given environment so as not to run into other objects (obstacles), the control policy (or a cost function) may be configured to minimize the time necessary to collect the target objects without hitting any obstacles. Various optimization methods, for example reinforcement learning, may be used in order to adjust efficacies from the target approach (TA) and obstacle avoidance (OA) controllers so as to meet the control policy for the given task.
In some implementations, activation of selection block neurons (e.g., the neuron 326 in
Ai = wi ai (Eqn. 1)
Task selection may be implemented by selecting the largest weighted activation signal Ai, e.g., as:
max(Ai), i:{1, . . . , N} (Eqn. 2)
when more than one activation signal ai is present at a given time. In some implementations, individual activation signals (e.g., 324) of two or more task controllers may be configured at a given fixed magnitude (e.g., binary ‘0’ or ‘1’). In some implementations, the activation signal may be configured with varying magnitude. By way of illustration, when implementing a target approach with a control policy configured based on target size and/or weight, the control signal for a given target may be configured based on target size. For an obstacle avoidance control policy configured based on obstacle value and/or fragility, the control signal for a given obstacle may be configured based on obstacle value and/or fragility. Information about targets and/or obstacles (e.g., size, weight, fragility, value, and/or other parameters) may be acquired by the robotic device in real time based, for example, on sensory input, a teaching signal, and/or other information. In some implementations, such data may be provided a priori and/or received via a remote link responsive to, for example, a request from a robot (e.g., “Is this a valuable target?”).
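The selection rule of Eqn. 1 and Eqn. 2 may be sketched as follows; the specific weight and activation values in the usage example are illustrative assumptions.

```python
def select_task(weights, activations):
    """Select the task with the largest weighted activation
    Ai = wi * ai (Eqn. 1), i.e., the argmax over Ai (Eqn. 2).
    Returns the index of the winning task, or None when no
    activation signal is present."""
    A = [w * a for w, a in zip(weights, activations)]  # Eqn. 1
    if max(A, default=0) <= 0:
        return None  # no active task controller
    return max(range(len(A)), key=lambda i: A[i])  # Eqn. 2
```

For instance, with priority weights (0.7, 0.3) and both controllers active (a = 1), the first task wins; when only the second controller is active, the second task is selected despite its lower priority.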
In some implementations, efficacy 324 may comprise weights configured in accordance with a control policy of the control task. By way of illustration, in some implementations of a vacuum cleaning robot (e.g., operating target approach and/or obstacle avoidance tasks), the obstacle avoidance task may be given a greater weight in order to reduce the probability of a collision (e.g., for a robot operating in a museum, a stock room, and/or another application wherein a collision may cause damage to the robot and/or environment). In some implementations, e.g., when cleaning a toy room, cleaning may be designated as the greater priority while a given number of collisions is tolerated.
The selection block 320 may be coupled to the inhibitor block 330 via one or more connections 338 characterized by inhibitory efficacy 334. The inhibitor block 330 may comprise one or more neurons 336 operated in accordance with a tonically active process. The tonically active process of a given neuron 336 may be characterized by generation of an output comprising a plurality of spikes. In some implementations, tonic output may be characterized by a number of spikes within a given time duration (e.g., average spike rate) and/or an inter-spike interval. The output of neurons 336 may be delivered to the relay block 340 via connections 348. The connections 348 may be characterized by inhibitory efficacy 344 configured to inhibit and/or suppress output generation by neurons 346 of the relay block 340.
In one or more implementations, inhibition of neurons 326 may be effectuated by deactivating their output (e.g., by suppressing the output of the inhibited neurons for a period of time and/or by temporarily disconnecting excitatory inputs to the inhibited neurons). In some implementations, inhibition may be realized by affecting the state of the inhibited neuron models so as to decrease the probability of their firing.
Based on inhibitory efficacy 334 breaching a threshold, one or more of the neurons 336 may become inhibited. An inhibited neuron 336 may reduce and/or altogether stop generation of tonic output on the connection 348. Responsive to absence of tonic output on connection 348, neurons 346 of the relay layer may become disinhibited (e.g., active). Active neurons 346 may propagate input control signals 306 from the respective (e.g., the winning) controller 302 to the output 352 of the relay block 340. The aggregation block 350 may be used to combine outputs of the relay block, thereby generating the controller output. In some implementations, the aggregation block may comprise an aggregation neuron 356 configured so as to ensure that command signals of a winning controller may be further relayed to a common and/or individually assigned destination modules. In some implementations, the destination modules may comprise robot actuators and/or motor controllers (e.g. PID controllers). It will be appreciated by those skilled in the arts that the controller configuration shown in
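The inhibitor/relay/aggregation chain described above may be sketched as follows, with the spiking dynamics abstracted into boolean inhibition flags (an assumption made for clarity; the actual blocks operate on spike trains).

```python
def route_control(control_signals, winner):
    """Sketch of the inhibitor/relay/aggregation chain: every relay
    channel is tonically inhibited except the winner's, whose tonic
    inhibitor neuron is itself suppressed by the selector; only the
    winning controller's command reaches the aggregated output."""
    # True = relay neuron still receives tonic inhibition
    inhibited = [i != winner for i in range(len(control_signals))]
    # Disinhibited relay neurons propagate their control signal
    relayed = [0.0 if inh else s
               for s, inh in zip(control_signals, inhibited)]
    # Aggregation block combines the relay outputs
    return sum(relayed)
```

With three controllers commanding (0.4, -0.9, 0.1) and the second controller winning the selection, only -0.9 appears on the aggregated output.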
In the exemplary network associated with
In the exemplary network associated with
In the exemplary network associated with
In the exemplary network associated with
In the exemplary network associated with
In the exemplary network associated with
In some implementations, methods 600, 700, 800 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information and/or executing computer program modules). The one or more processing devices may include one or more devices executing some or all of the operations of methods 600, 700, 800 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of methods 600, 700, 800.
Referring now to
At operation 602 of method 600, sensory context may be determined. In some implementations, the context may comprise one or more aspects of sensory input (e.g., 106) and/or robotic platform feedback (112 in
At operation 604, a teaching input may be received. In some implementations, the teaching input may comprise a control command (e.g., rotate right/left wheel and/or other command) configured based on the sensory context (e.g., appearance of a target in field of view of the robot's camera, and/or other sensory context) and provided by a human user. In one or more implementations, the teaching input signal may comprise a portion of sensory input, e.g., a signal from a touch sensor indicating an obstacle.
At operation 606, the teaching input and the context may be analyzed. In one or more implementations, the analysis of operation 606 may comprise generation of one or more control signals (e.g., 306 of
At operation 608 of method 600, a control signal may be selected from multiple control signals. In some implementations, the control signal selection may be based on evaluation of multiple activation signals using adaptive arbitration methodology described, for example, with respect to
At operation 610, a task associated with the control signal selected at operation 608 may be executed. In one or more implementations, an obstacle 212 avoidance task may be executed, as described for example, with respect to
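Operations 602 through 610 may be sketched as a single training/arbitration step; the controller callables, the context dictionary, and the priority weights below are illustrative assumptions, not part of the disclosure.

```python
def train_step(context, teaching_input, controllers, priorities):
    """One pass of method 600: each task controller generates a
    (control, activation) pair from the sensory context and teaching
    input (operations 602-606); the control signal with the largest
    weighted activation is then selected for execution (operation 608)."""
    signals = []
    for ctrl in controllers:
        control, activation = ctrl(context, teaching_input)
        signals.append((control, activation))
    best = max(range(len(signals)),
               key=lambda i: priorities[i] * signals[i][1])
    return signals[best][0]  # operation 610 would execute this signal
```

For example, with a target approach controller and a higher-priority obstacle avoidance controller both active, the avoidance command would be selected; with no obstacle present, the approach command wins.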
At operation 702, based on an identification of two tasks to be executed, two control signals S1, S2 may be generated. The tasks associated with the signals S1, S2 may each be configured to operate a given controllable element (e.g., a wheel or an arm motor). The control signals may be provided to a relay block (e.g., 340 in
At operation 704, an execution selection signal may be determined based on two activation signals associated with the two control signals S1, S2, respectively. In one or more implementations, the execution selection signal may comprise a signal transmitted via connection 338 as described above with respect to
At operation 706, a relay block may be configured to impede delivery of the first of the two control signals S1, S2 to the controllable element.
At operation 708, the relay block may be configured to provide the second of the two control signals to the controllable element. The provision of the second of the control signals S1, S2 to the controllable element may be based on activation of the relay block by the execution selection signal determined at operation 704.
At operation 802, a portion of the network may be configured to produce multiple tonic signals. In one or more implementations, individual tonic signals may correspond to spiking signals shown in
At operation 804, the multiple tonic signals may be provided to respective relay neurons via connections characterized by inhibitory efficacy, e.g., connections 348 in
At operation 806, all but one of the relay neurons may be inhibited based on the inhibitory efficacy associated with the multiple tonic signals breaching a threshold.
At operation 808, control output, corresponding to the selection signal, may be provided to a controllable element by the non-inhibited relay neuron(s).
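Operations 802 through 808 may be sketched as follows, with the tonic spike trains abstracted into scalar spike rates (an illustrative assumption for clarity).

```python
def arbitrate(tonic_rates, selection):
    """Method 800 sketch: each relay neuron receives a tonic inhibitory
    spike train (operations 802-804); the selection signal suppresses
    the tonic source of the winning channel, whose rate falls below
    threshold and disinhibits the corresponding relay neuron
    (operations 806-808)."""
    THRESHOLD = 1.0  # illustrative spike-rate threshold
    rates = [0.0 if i == selection else r
             for i, r in enumerate(tonic_rates)]
    # True = relay neuron disinhibited and able to pass control output
    return [r < THRESHOLD for r in rates]
```

With three tonic channels firing at 10 Hz and the third channel selected, only the third relay neuron becomes disinhibited.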
One or more objects (e.g., an obstacle 1174, a target 1176, and/or other objects) may be present in the camera field of view. The motion of the objects may result in a displacement of pixels representing the objects within successive frames, such as described in U.S. patent application Ser. No. 13/689,717, entitled “APPARATUS AND METHODS FOR OBJECT DETECTION VIA OPTICAL FLOW CANCELLATION”, filed Nov. 30, 2012, incorporated supra.
When the robotic apparatus 1160 is in motion, such as indicated by arrow 1164 in
Various exemplary computerized robotic apparatus may be utilized with the action selection methodology of the disclosure. In some implementations, the robotic apparatus may comprise one or more processors configured to execute the adaptation methodology described herein. In some implementations, an external processing entity (e.g., a cloud service, computer station and/or cluster, and/or other processing entity) may be utilized in order to perform computations during operation of the robot (e.g., operations of methods 600, 700, 800).
Action selection methodology described herein may enable autonomous operation of robotic controllers. In some implementations, training of the robot may be based on a collaborative training approach wherein the robot and the user collaborate on performing a task, e.g., as described in co-owned U.S. patent application Ser. No. 13/918,338 entitled “ROBOTIC TRAINING APPARATUS AND METHODS”, filed Jun. 14, 2013. In one or more implementations, operation of robotic devices may be aided by an adaptive controller apparatus, e.g., as described in U.S. patent application Ser. No. 13/918,298 entitled “HIERARCHICAL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Jun. 14, 2013, U.S. patent application Ser. No. 13/918,620 entitled “PREDICTIVE ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Jun. 14, 2013, and/or U.S. patent application Ser. No. 13/907,734 entitled “ADAPTIVE ROBOTIC INTERFACE APPARATUS AND METHODS”, filed May 31, 2013, incorporated supra. In some implementations, the adaptive controller may comprise one or more adaptive predictor controllers such as, for example, those described in U.S. patent application Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, incorporated supra.
Separating priority (or activation) signals from the control signals may enable action selection for a broader variety of control signals including for example, control signals based on binary, real-valued, message-based communication, use of pauses, and/or other communication methods.
Action selection methodology described herein may provide additional flexibility for selecting one or more actions by a robotic system performing a complex task. In some implementations, such complex task may comprise a composite task (e.g., serve a beverage by a mobile robot). Such target task (serving the beverage) may be decomposed into multiple sub-tasks, including, for example, one or more of identifying a beverage container, grasping the container, placing it under a dispenser, dispensing the beverage, carrying the container to a target destination, navigating obstacles during the traverse, placing the beverage at a target location, and/or other actions. Such decomposition may provide a mechanism allowing a developer of a robotic controller to decompose the problem of building intelligent systems into relatively simple components or modules and then reintegrate or coordinate the overall behavior.
In some implementations, the action selection methodology described herein may be employed for control of intelligent robotic devices configured to select one or more behaviors from a repertoire of behaviors (e.g. target approach versus obstacle avoidance versus exploration). These robotic devices may comprise mobile robots, robotic arms, and/or other robotic devices. In some implementations, the adaptive action selection controller may be configured to control a multi-agent system for coordinating behavior of the particular agents in the system. By way of illustration, the multi-agent system may comprise a plurality of autonomous mobile robots configured to operate in collaboration with one another in order to accomplish a task. In one or more implementations, the task may comprise exploring a certain environment for surveillance, cleaning, inspection, and/or other applications. Some applications may include disaster response, agriculture, factory maintenance, border security, and/or other applications.
Individual robots may be configured to prioritize and/or coordinate their actions with other robots. For example, the exploration domain of a first robot may be configured not to overlap with the exploration domain of a second robot. The second robot may configure its exploration route to avoid the first domain based on the exploration of the first domain by the first robot. In some implementations, individual robots may communicate to one another the locations of obstacles, hazards, charging stations, and/or other information about the environment.
Actions executed by individual robots that may be related to achievement of the overall task (e.g., cleaning refuse in a building) may be referred to as ‘global actions’ and denoted as TG1, TG2, . . . , TGN. Examples of the global actions TG1, TG2, . . . , TGN may include “explore room A”, “collect an object from room B”, and/or other actions. One or more actions may be communicated to individual robots by a system coordinator based on information provided by one or more robots. In some implementations, the system coordinator may comprise one of the robots, a human operator, and/or a centralized and/or distributed computerized controller.
Actions executed by individual robots that may be related to fulfillment of a given global action by the robot may be referred to as local actions and denoted as TL1, TL2, . . . , TLM. Examples of local actions TL1, TL2, . . . , TLM may include operations related to safety of the robot, its energy resource status, and/or other parameters. The robot may be configured to execute operations TL1, TL2, . . . , TLM autonomously.
In some implementations, the adaptive action selection methodology described herein may be employed in order to arbitrate behavior of the individual robot by adaptively coordinating action selection over an action set consisting of {{TG1, TG2, . . . , TGN}, {TL1, TL2, . . . , TLM}}.
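Such arbitration over the combined global/local action set may be sketched as follows; the action names and urgency values are illustrative assumptions, not part of the disclosure.

```python
def arbitrate_actions(global_actions, local_actions, urgency):
    """Sketch of adaptive action selection over the combined set
    {{TG1, ..., TGN}, {TL1, ..., TLM}}: the action with the highest
    urgency is selected, so that a safety-related local action
    (e.g., a low-battery recharge) can preempt a coordinator-assigned
    global action."""
    candidates = list(global_actions) + list(local_actions)
    return max(candidates, key=lambda a: urgency.get(a, 0.0))
```

For instance, a robot assigned “explore room A” would defer that global action when its local recharge action carries a higher urgency.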
An exemplary list of actions to be arbitrated may include:
The action arbitrator may be implemented using a spiking neural network as described herein. In one or more implementations, the adaptive controller's capability to generate a control signal prior to, or in lieu of, the teaching input may enable autonomous operation of the robot and/or obviate provision of the teaching input. In some applications, wherein a teacher may be configured to control and/or train multiple entities (e.g., multiple controllers 300 of
It will be recognized that while certain aspects of the disclosure are described in terms of a specific sequence of steps of a method, these descriptions are only illustrative of the broader methods of the invention, and may be modified as required by the particular application. Certain steps may be rendered unnecessary or optional under certain circumstances. Additionally, certain steps or functionality may be added to the disclosed implementations, or the order of performance of two or more steps permuted. All such variations are considered to be encompassed within the disclosure disclosed and claimed herein.
While the above detailed description has shown, described, and pointed out novel features of the disclosure as applied to various implementations, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the art without departing from the disclosure. The foregoing description is of the best mode presently contemplated of carrying out the invention. This description is in no way meant to be limiting, but rather should be taken as illustrative of the general principles of the invention. The scope of the disclosure should be determined with reference to the claims.
This application is related to co-pending and co-owned U.S. patent application Ser. No. 13/918,338 entitled “ROBOTIC TRAINING APPARATUS AND METHODS”, filed Jun. 14, 2013, U.S. patent application Ser. No. 13/918,298 entitled “HIERARCHICAL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Jun. 14, 2013, U.S. patent application Ser. No. 13/918,620 entitled “PREDICTIVE ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Jun. 14, 2013, U.S. patent application Ser. No. 13/907,734 entitled “ADAPTIVE ROBOTIC INTERFACE APPARATUS AND METHODS”, filed May 31, 2013, U.S. patent application Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, U.S. patent application Ser. No. 13/842,562 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS FOR ROBOTIC CONTROL”, filed Mar. 15, 2013, U.S. patent application Ser. No. 13/842,616 entitled “ROBOTIC APPARATUS AND METHODS FOR DEVELOPING A HIERARCHY OF MOTOR PRIMITIVES”, filed Mar. 15, 2013, U.S. patent application Ser. No. 13/842,647 entitled “MULTICHANNEL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Mar. 15, 2013, U.S. patent application Ser. No. 13/842,583 entitled “APPARATUS AND METHODS FOR TRAINING OF ROBOTIC DEVICES”, filed Mar. 15, 2013, and U.S. patent application Ser. No. 13/761,090 entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, filed Feb. 6, 2013 each of the foregoing being incorporated herein by reference in its entirety.