A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The present disclosure relates to, inter alia, computerized apparatus and methods for training of robotic devices to perform path navigation tasks.
Users may train a robot with supervised learning to perform a task (e.g., navigation, manipulation, and/or other tasks). Users may want to know the performance of the robot while it executes a task (e.g., does it bump into obstacles, how precise is its manipulation, and/or other measurements of performance). A user who knows a robot's performance may speed up the training process or make training more precise. Users may allow a robot to perform the task autonomously, but that may be expensive (e.g., time consuming, the robot may damage itself, and/or other expenses or disadvantages).
According to conventional approaches, given a task in which a user wants to train a robot to navigate along a path from location A to location B (A and B may be the same location, in which case the path takes the form of a loop), the user may first control the robot one time or multiple times to move along the desired path. This may constitute the training of the robot. Thereafter, the robot may be expected to perform the same navigation autonomously.
One typical approach may be to store the motor commands that were executed during the training phase, and then simply replay them. This, however, may not work well in practice, at least because there may be some variability in how the motor commands translate into actual movement in physical space. In general, if the robot is slightly off course, it may continue to drift more and more off course.
One aspect of the disclosure relates to a non-transitory machine-readable storage medium having instructions embodied thereon, the instructions configured for execution by one or more processors, which when executed cause the one or more processors to: determine an occurrence of a feature in a sensory input provided by a sensor component of the robotic apparatus; obtain a training input for a learning prediction component. In one such embodiment, the learning prediction component is configured to produce a control output for the robotic apparatus; at a first time instance: evaluate a state of an activation indication; and based on an active state of the activation indication: cause the learning prediction component to produce a first version of the control output consistently with the feature and the training input; and adjust a parameter of the learning prediction component based on a discrepancy measure between the first version of the control output and the training input, the parameter adjustment configured to enable the learning prediction component to produce a second version of the control output subsequent to the adjustment, the second version of the control output characterized by a smaller discrepancy measure relative to the training input. 
Additionally, at a second time instance subsequent to the first time instance, the one or more processors: evaluate the state of an activation indication; and based on the active state of the activation indication: cause the learning prediction component to produce the second version of the control output in accordance with the occurrence of the feature; wherein the second version of the control output is configured to cause the robotic apparatus to perform an action consistent with the occurrence of the feature; and the instruction execution is configured to operate a hierarchical control process comprising a higher level component and the learning prediction component forming the lower level component with respect to the higher level component; and the activation indication is produced by the higher level component based on the determination of the occurrence of the feature.
In another aspect of the present disclosure, a method of reducing energy use/increasing duration of autonomous operation by a robotic device is disclosed. In one such embodiment, the robotic device comprises a first and a second actuator, a sensor component, a feature detection apparatus, and an output prediction apparatus comprising a first and a second predictor component. In one exemplary embodiment, the method includes: obtaining via the sensor component, one or more sensor data related to an environment of the robotic device; determining, by the feature detection apparatus, an occurrence of a given feature in the one or more sensor data, the determining of the given feature occurrence characterized by a plurality of computation operations; providing information related to the given feature to the first predictor component; evaluating a state of an activation indication; and based on the activation indication being in a first task state, operating the first predictor component to produce a first output based on the information, the first output configured to cause the robot to execute the first task comprising activation of the first actuator; wherein: based on the activation indication being in a second task state, operating the second predictor component to produce a second output based on the information, the second output configured to cause the robot to execute the second task comprising activation of the second actuator; and the operation of the second predictor component is characterized by an absence of additional computational operations by the feature detection component to determine the given feature occurrence in excess of the plurality of computation operations; where the absence of additional computational operations reduces energy use.
In one variant, the output prediction apparatus comprises a plurality of prediction components configured in a hierarchy comprising an upper and a lower level; the lower level of the hierarchy comprises the first and the second predictor components; the upper level of the hierarchy comprises a third predictor component of the plurality of predictor components; and the third predictor component is configured to produce the activation indication based on analysis of the sensor data. In one such variant, the activation indication comprises an output of a winner takes all process and the activation indication is configured to enable operation of one of the first or the second predictor components.
In another variant, the operation of the determination, by the feature detection apparatus, of the occurrence of the given feature in the one or more sensor data comprises: analyzing the one or more sensor data to determine a first plurality of input features of a first type and a second plurality of input features of a second type; determining a subset of features by randomly selecting a portion of the first input features and at least one feature from the second input features; comparing individual features of the subset of features to corresponding features of a plurality of training feature sets, individual ones of the plurality of training feature sets comprising a number of training features, the number being equal to or greater than the quantity of features within the subset of features; based on the comparison, determining a similarity measure for a given training set of the plurality of training feature sets, the similarity measure characterizing a similarity between the individual features of the subset and the corresponding features of the plurality of training feature sets; and responsive to the similarity measure breaching a threshold, selecting one or more training sets from the plurality of training sets.
In another such variant, the method includes determining one or more potential control outputs, individual ones of the one or more potential control outputs being associated with a corresponding training set of the plurality of training feature sets; and determining the first output based on a transformation obtained from the one or more potential control outputs; wherein: individual ones of the plurality of training feature sets comprise features of the first type and at least one feature of the second type; individual ones of the plurality of training feature sets are obtained during training operation of the robotic device, the training operation being performed responsive to receiving a training signal from the robotic device; and individual ones of the one or more potential control signals are determined based on the training signal and the features of the given training set.
In one variant, the similarity measure is determined based on a difference between one or more first values of the features of the subset and one or more second values of the features of the given training feature set.
In another variant, the similarity measure is determined based on a distance metric between individual features of the subset of features and corresponding features of the given training feature set. In some cases, selecting one or more training feature sets comprises selecting N training feature sets associated with a lowest percentile of the distance metric, N being greater than two.
In still other variants, the transformation comprises a statistical operation performed on individual ones of the one or more potential control signals associated with the selected N training sets. For example, the statistical operation may be selected from the group including mean and percentile.
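For illustration only, the feature-matching and transformation operations described in the preceding variants resemble a nearest-neighbor lookup over stored training feature sets: select the N training sets closest to the current features under a distance metric, then apply a statistical operation (e.g., mean) to their associated control signals. A minimal sketch under that reading (the function and variable names are hypothetical and not taken from the disclosure):

```python
import numpy as np

def predict_control(query_features, training_sets, training_controls, n_nearest=3):
    """Select the N training feature sets closest to the query features
    and average their associated control outputs (a 'mean' transformation)."""
    # Distance metric between the query features and each stored training set
    distances = np.array([np.linalg.norm(query_features - np.asarray(ts))
                          for ts in training_sets])
    # Keep the N sets in the lowest percentile of the distance metric
    nearest = np.argsort(distances)[:n_nearest]
    # Statistical operation (here: mean) over the associated control signals
    return np.mean(np.asarray(training_controls)[nearest], axis=0)
```

In this sketch the similarity threshold of the variants above is replaced by a fixed N; either selection rule fits the described framework.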
In another aspect of the present disclosure, a robotic apparatus is disclosed. In one embodiment, the robotic apparatus includes: an energy source characterized by an energy capacity; a first actuator; a sensor component configured to obtain sensor data related to an environment of the robotic apparatus; a feature detection component configured to determine an occurrence of a feature in the sensor data, the determination of the given feature occurrence being based on the feature detection component performing a plurality of computation operations; and an output prediction component, comprising a first predictor, and a second predictor. In one exemplary embodiment, the output prediction component is configured to: operate the first predictor to produce a task activation output based on information related to the given feature occurrence; based on the task activation corresponding to a first task, operate the second predictor to produce a control output based on the information related to the given feature occurrence; and provide the output to the actuator thereby causing/enabling the robotic apparatus to execute the first task; wherein: the operation of the first predictor and the second predictor are based on a single instance of the feature detection component performing the plurality of computation operations, the single instance configured to reduce energy use associated with the first task execution.
In some variants, the output prediction component comprises a plurality of predictors configured in a hierarchy comprising an upper level and a lower level configured such that an output of a predictor of the upper level comprises an activation indication for a predictor of the lower level. In certain cases, the upper hierarchy level comprises the first predictor; and the lower hierarchy level comprises the second predictor. In one such exemplary implementation the robotic apparatus includes: a second actuator; wherein: the output prediction component comprises a third predictor, the third predictor corresponding to the lower hierarchy level, the output prediction component further configured to: based on the task activation corresponding to a second task, operate the third predictor to produce another control output based on the information related to the occurrence of the feature; and provide the output to the second actuator thereby causing/enabling the robot to execute the second task.
In some cases, the robotic apparatus is a mobile platform comprising a wheel and a manipulator, the wheel being coupled to the first actuator, the manipulator being coupled to the second actuator; the first task comprises activation of the first actuator; the second task comprises activation of the second actuator; and execution of the first and the second tasks comprises the single instance of the feature detection component performing the plurality of computation operations. In some variants, the activation indication is configured to enable operation of one and only one of the second or the third predictors based on the detection of the occurrence of the feature.
In some cases, the robotic apparatus is an autonomous motorized platform comprising at least three wheels. In some such variants, the robotic apparatus includes a manipulator comprising first and second segments and at least one motorized joint, the motorized joint configured to modify an angle formed between the first and the second segments. In other variants, the feature comprises a representation of a target object; the first task comprises a target approach operation; and the second task comprises a target collision avoidance operation.
These and other objects, features, and characteristics of the system and/or method disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
All Figures disclosed herein are © Copyright 2014-2018 Brain Corporation. All rights reserved.
Implementations of the present disclosure will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the present technology. Notably, the figures and examples below are not meant to limit the scope of the present disclosure to a single implementation, but other implementations are possible by way of interchange of or combination with some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts.
Although the system(s) and/or method(s) of this disclosure have been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any implementation may be combined with one or more features of any other implementation.
In the present disclosure, an implementation showing a singular component should not be considered limiting; rather, the disclosure is intended to encompass other implementations including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein.
Further, the present disclosure encompasses present and future known equivalents to the components referred to herein by way of illustration.
As used herein, the term “bus” is meant generally to denote all types of interconnection or communication architecture that is used to access the synaptic and neuron memory. The “bus” could be optical, wireless, infrared or another type of communication medium. The exact topology of the bus could be for example standard “bus”, hierarchical bus, network-on-chip, address-event-representation (AER) connection, or other type of communication topology used for accessing, e.g., different memories in pulse-based system.
As used herein, the terms “computer”, “computing device”, and “computerized device”, include, but are not limited to, personal computers (PCs) and minicomputers, whether desktop, laptop, or otherwise, mainframe computers, workstations, servers, personal digital assistants (PDAs), handheld computers, embedded computers, programmable logic device, personal communicators, tablet or “phablet” computers, portable navigation aids, J2ME equipped devices, smart TVs, cellular telephones, smart phones, personal integrated communication or entertainment devices, or literally any other device capable of executing a set of instructions and processing an incoming data signal.
As used herein, the term “computer program” or “software” is meant to include any sequence of human or machine cognizable steps which perform a function. Such program may be rendered in virtually any programming language or environment including, for example, C/C++, C#, Fortran, COBOL, MATLAB™, PASCAL, Python, assembly language, markup languages (e.g., HTML, SGML, XML, VoXML), and the like, as well as object-oriented environments such as the Common Object Request Broker Architecture (CORBA), Java™ (including J2ME, Java Beans), Binary Runtime Environment (e.g., BREW), and other languages.
As used herein, the terms “connection”, “link”, “synaptic channel”, “transmission channel”, “delay line”, are meant generally to denote a causal link between any two or more entities (whether physical or logical/virtual), which enables information exchange between the entities.
As used herein, the term “feature” may refer to a representation of an object edge, determined by change in color, luminance, brightness, transparency, texture, and/or curvature. The object features may comprise, inter alia, individual edges, intersections of edges (such as corners), orifices, and/or curvature.
As used herein, the term “memory” includes any type of integrated circuit or other storage device adapted for storing digital data including, without limitation, ROM, PROM, EEPROM, DRAM, Mobile DRAM, SDRAM, DDR/2 SDRAM, EDO/FPMS, RLDRAM, SRAM, “flash” memory (e.g., NAND/NOR), memristor memory, and PSRAM.
As used herein, the terms “processor”, “microprocessor” and “digital processor” are meant generally to include all types of digital processing devices including, without limitation, digital signal processors (DSPs), reduced instruction set computers (RISC), general-purpose (CISC) processors, microprocessors, gate arrays (e.g., field programmable gate arrays (FPGAs)), PLDs, reconfigurable computer fabrics (RCFs), array processors, secure microprocessors, and application-specific integrated circuits (ASICs). Such digital processors may be contained on a single unitary IC die, or distributed across multiple components.
As used herein, the term “network interface” refers to any signal, data, or software interface with a component, network or process including, without limitation, those of the FireWire (e.g., FW400, FW800, and/or other FireWire implementation.), USB (e.g., USB2), Ethernet (e.g., 10/100, 10/100/1000 (Gigabit Ethernet), 10-Gig-E, etc.), MoCA, Coaxsys (e.g., TVnet™), radio frequency tuner (e.g., in-band or OOB, cable modem, etc.), Wi-Fi (802.11), WiMAX (802.16), PAN (e.g., 802.15), cellular (e.g., 3G, LTE/LTE-A/TD-LTE, GSM, and/or other cellular interface implementation) or IrDA families.
As used herein, the terms “pulse”, “spike”, “burst of spikes”, and “pulse train” are meant generally to refer to, without limitation, any type of a pulsed signal, e.g., a rapid change in some characteristic of a signal, e.g., amplitude, intensity, phase or frequency, from a baseline value to a higher or lower value, followed by a rapid return to the baseline value and may refer to any of a single spike, a burst of spikes, an electronic pulse, a pulse in voltage, a pulse in electrical current, a software representation of a pulse and/or burst of pulses, a software message representing a discrete pulsed event, and any other pulse or pulse type associated with a discrete information transmission system or mechanism.
As used herein, the term “receptive field” is used to describe sets of weighted inputs from filtered input elements, where the weights may be adjusted.
As used herein, the term “Wi-Fi” refers to, without limitation, any of the variants of IEEE-Std. 802.11 or related standards including 802.11 a/b/g/n/s/v and 802.11-2012.
As used herein, the term “wireless” means any wireless signal, data, communication, or other interface including without limitation Wi-Fi, Bluetooth, 3G (3GPP/3GPP2), HSDPA/HSUPA, TDMA, CDMA (e.g., IS-95A, WCDMA, and/or other wireless interface implementation.), FHSS, DSSS, GSM, PAN/802.15, WiMAX (802.16), 802.20, narrowband/FDMA, OFDM, PCS/DCS, LTE/LTE-A/TD-LTE, analog cellular, CDPD, RFID or NFC (e.g., EPC Global Gen. 2, ISO 14443, ISO 18000-3), satellite systems, millimeter wave or microwave systems, acoustic, and infrared (e.g., IrDA).
In one or more implementations, object recognition and/or classification may be implemented using a spiking neuron classifier comprising conditionally independent subsets as described in co-owned U.S. patent application Ser. No. 13/756,372 filed Jan. 31, 2013, and entitled “SPIKING NEURON CLASSIFIER APPARATUS AND METHODS” and/or co-owned U.S. patent application Ser. No. 13/756,382 filed Jan. 31, 2013, and entitled “REDUCED LATENCY SPIKING NEURON CLASSIFIER APPARATUS AND METHODS”, each of the foregoing being incorporated herein by reference in its entirety.
In one or more implementations, encoding may comprise adaptive adjustment of neuron parameters, such as neuron excitability which is described in U.S. patent application Ser. No. 13/623,820 entitled “APPARATUS AND METHODS FOR ENCODING OF SENSORY DATA USING ARTIFICIAL SPIKING NEURONS”, filed Sep. 20, 2012, the foregoing being incorporated herein by reference in its entirety.
In some implementations, analog inputs may be converted into spikes using, for example, kernel expansion techniques described in co-owned U.S. patent application Ser. No. 13/623,842 filed Sep. 20, 2012, and entitled “SPIKING NEURON NETWORK ADAPTIVE CONTROL APPARATUS AND METHODS”, the foregoing being incorporated herein by reference in its entirety. The term continuous signal may be used to describe a non-spiking signal (e.g., analog, n-ary digital signal characterized by n-bits of resolution, n>1). In one or more implementations, analog and/or spiking inputs may be processed by mixed signal spiking neurons, such as co-owned U.S. patent application Ser. No. 13/313,826 entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING FOR ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, filed Dec. 7, 2011, and/or co-owned U.S. patent application Ser. No. 13/761,090 entitled “APPARATUS AND METHODS FOR GATING ANALOG AND SPIKING SIGNALS IN ARTIFICIAL NEURAL NETWORKS”, filed Feb. 6, 2013, each of the foregoing being incorporated herein by reference in its entirety.
In some implementations of robotic navigation in an arbitrary environment, the sensor component 166 may comprise a camera configured to provide an output comprising a plurality of digital image frames refreshed at, e.g., 25 Hz frame rate. The sensor output may be processed by a learning controller, e.g., as illustrated and described with respect to
In some implementations of robotic vehicle navigation, output of the sensor 166 in
Persistent switcher apparatus and methods are disclosed herein, in accordance with one or more implementations. Exemplary implementations may completely or partially alleviate the drift problem described above by using a hierarchy of behaviors and a stateful switcher. The switcher may learn which sensory contexts should be associated with changes in behavior, and which should not. The example task may then be trained with simple predictors based on the immediate visual input. Human operator knowledge about how to best divide a task into elementary behaviors may be leveraged.
In some implementations, a user (e.g., human operator) may train the system to switch between tasks based on the sensory context.
The predictor may be configured to output a vector of real values. Individual ones of those values may correspond to a possible task to perform. These values may be interpreted as the “priorities” of the different tasks. In some implementations, the priorities may be non-negative and add up to 1 (e.g., via a soft-max layer).
In some implementations, the predictor may output one value for each possible pair of tasks to switch between (e.g., there may be m² outputs for m available tasks). This may be useful when a given context needs to be associated with different tasks depending on the task currently being performed.
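By way of hypothetical illustration, the m² pairwise outputs may be arranged as an m×m matrix in which row i holds the values for pairs whose first task is task i; at selection time only the row for the current task is consulted (the function name and layout convention are illustrative, not prescribed by the disclosure):

```python
import numpy as np

def pairwise_priorities(flat_outputs, current_task, m):
    """Reshape m*m pairwise predictor outputs into an (m, m) matrix and
    return the switching priorities for pairs whose first task is the
    currently selected task."""
    matrix = np.asarray(flat_outputs).reshape(m, m)
    return matrix[current_task]
```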
The user may correct the system by providing an indication as to which task the system should be performing in a certain context. Assuming that the combiner (see below) is of the “overriding” type, user corrections may come as a vector with as many elements as tasks, with value “0” for all elements except “1” for the element corresponding to the task to be associated with the context.
In some contexts the user may want to signify that the system should not switch from whatever task it is performing. In some implementations, this may be done (e.g., in the case of an overriding combiner) by sending a vector of corrections with uniform values not breaching the threshold of the “persistent winner takes all” block. If all the predictor outputs do not breach the threshold in a given context, the “Persistent WTA” block may keep selecting the same task (see below).
The user corrections may be processed in specific ways before entering the combiner depending on the type of combiner and predictor used. For example, if the combiner is overriding and the predictor is a neural network with softmax output, it may be preferable to send [0.9, 0.05, 0.05] instead of [1, 0, 0] as a correction vector to avoid driving the network to saturation.
In some implementations, the combiner for this system may include the override combiner. Responsive to the user sending a correction, the combiner may output the correction, otherwise passing through the predictor signal. An additive combiner may be implemented when the user is aware of the current output of the predictor before it passes through the combiner and the persistent WTA.
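As an illustrative sketch of the two combiner types described above (the function names are hypothetical, not from the disclosure): the override combiner passes the predictor output through unless a user correction is present, while the additive combiner sums the two.

```python
def override_combiner(predictor_output, user_correction=None):
    """Pass the predictor output through unless the user supplies a
    correction, in which case the correction overrides it."""
    return user_correction if user_correction is not None else predictor_output

def additive_combiner(predictor_output, user_correction=None):
    """Add the user's correction (if any) to the predictor output."""
    if user_correction is None:
        return predictor_output
    return [p + c for p, c in zip(predictor_output, user_correction)]
```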
The persistent WTA selects from the output of the combiner (a vector of priorities), frame by frame, which of the available tasks should be performed. In some implementations, the persistent WTA may make such selection based on the following rules:
If the maximum of the input priorities is above a certain threshold, switch to the corresponding task.
Otherwise select the same task that had been selected in the previous frame.
The threshold parameter may be tuned to make the system more or less prone to switching. With a high threshold, the system may need very strong sensory evidence to switch from the current task, and vice-versa with a low threshold.
If the predictor outputs values for each possible pair of tasks to switch between, instead of just one value per task, the persistent WTA may work the same but may consider only the values of the pairs whose first task is the current one (the task that was selected in the previous frame).
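The switching rules above may be sketched, for illustration only, as a small stateful selector (the class name and interface are hypothetical):

```python
class PersistentWTA:
    """Stateful winner-takes-all: switch to the highest-priority task only
    when its priority breaches the threshold; otherwise keep the task
    selected in the previous frame."""

    def __init__(self, threshold, initial_task=0):
        self.threshold = threshold
        self.current_task = initial_task

    def select(self, priorities):
        # Find the task with the maximum input priority
        best = max(range(len(priorities)), key=lambda i: priorities[i])
        # Rule 1: switch only if the maximum breaches the threshold
        if priorities[best] > self.threshold:
            self.current_task = best
        # Rule 2: otherwise persist with the previously selected task
        return self.current_task
```

With a high threshold the selector rarely switches; uniform sub-threshold corrections (as described above) leave the current task unchanged.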
Apparatus and methods for using cost of user feedback during training of robots are disclosed herein, in accordance with one or more implementations. According to exemplary implementations, a user may want to know about a robot's performance without actually letting the robot perform the task autonomously. This may wholly or partially alleviate one or more disadvantages discussed above.
One starting point to solve this task may be to measure a current cost function C of a predictor while it is learning to do a task:
C(t)=d(yd(t),y(t)) (Eqn. 1)
where C(t) represents the current cost function at time t, y(t) represents the output of the predictor (e.g., the component 422 of
The value of C(t) may be provided or shown to the user as a number and/or in any other graphical form (e.g., progress bar, intensity of an LED, and/or other techniques for conveying a quantity). Based on this number, the user may try to determine whether the user's corrections and the predictions of the system are close, which may indicate how well the robot has learned the task.
When a user shows the robot how to perform the task, he may do it in different ways on different occasions. For example, a user may teach the robot one or more possible obstacle avoidance trajectories which are close to each other. The system may generalize those examples and choose a single trajectory. In some implementations, if the user gives a new example of trajectory and measures costs according to Eqn. (1), the system may provide a large value indicating a mistake, even if on average the robot performs obstacle avoidance very well.
A possible solution may be to time-average (e.g., compute running average or sliding average) the costs so that all occasional prediction errors are not presented to the user. The user may receive a number that represents how many mistakes a predictor did on average for a given time interval (e.g., 1 second, 5 minutes, and/or other time interval).
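For illustration, Eqn. (1) with a mean-squared-error distance function (one of several distances the text mentions), together with the sliding average described above, may be sketched as follows (names are hypothetical):

```python
from collections import deque

def cost(y_desired, y_predicted):
    """Eqn. (1) with a mean-squared-error distance d(.,.)."""
    return sum((d - p) ** 2 for d, p in zip(y_desired, y_predicted)) / len(y_desired)

class SlidingAverageCost:
    """Sliding average of recent costs, so that occasional prediction
    errors are not presented to the user as large swings."""

    def __init__(self, window):
        self.window = deque(maxlen=window)

    def update(self, c):
        self.window.append(c)
        return sum(self.window) / len(self.window)
```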
The numeric values of costs may depend on one or more factors including one or more of the task, the predictor, the robot, the distance function in Eqn. (1), and/or other factors. For example, if a robot is trained to follow a trajectory with a constant linear velocity, then the costs may include costs of predicting angular velocity (e.g., costs on linear velocity may be small because it may be easy to predict a constant value). However, if a task is obstacle avoidance with backing up from obstacles, then predicting linear velocity may contribute to the costs. Different predictors may achieve different costs on different tasks. If a robot is trained with eight degrees of freedom, the range of costs may differ from the costs during training of navigation with two degrees of freedom (e.g., a (v, w) control space). A mean square error distance function used in Eqn. (1) may provide costs in different ranges compared to a cross entropy distance function.
In some implementations, in order to present costs to the user, it may be useful to normalize them to the interval (0, 1) by the maximum achievable costs in this task (or by some fixed number if the maximum costs are infinite, as in the cross entropy case). Normalizing may make the cost value more independent of the distance function and the robot. Normalized costs may still depend on the task and on the predictor. However, numbers from (0, 1) may be readily presented to the user and compared against each other.
Some tasks may differ from others in complexity and/or in statistical properties of a teacher signal. For example, compare a task A: navigating through a “right angle” path which consists of a straight line and then sudden turn and then straight line again and a task B: navigating a
To decrease sensitivity to the variations in the complexity and other properties of the task, a performance measure pb relative to a “blind” predictor may be introduced. A “blind” predictor may be used that does not take into account the sensory input of the robot and only predicts average values of the control signal. It may compute a running (or sliding) average of the control signal. In some implementations, the “blind” performance measure pb may be expressed as:
pb(t)=1−C(t)/Cb(t) (Eqn. 2)
where C(t) represents costs computed using Eqn. (1) for a main predictor, and Cb(t) represents costs computed using Eqn. (1) for a “blind” predictor. In some implementations, if pb(t) is close to 1, then the prediction process may perform better than the baseline cost of the “blind” predictor. If pb(t) is negative, then the main predictor may perform worse than the baseline.
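The “blind” baseline and the measure of Eqn. 2 may be sketched as follows (a minimal illustration assuming mean-square-error costs for Eqn. (1); class and function names are assumptions):

```python
# Illustrative sketch of Eqn. 2: a "blind" predictor that ignores sensory
# input and tracks a running average of the teacher's control signal.

def mse(a, b):
    return (a - b) ** 2

class BlindPredictor:
    def __init__(self, alpha=0.1):
        self.avg = 0.0
        self.alpha = alpha  # running-average coefficient

    def predict(self):
        return self.avg

    def update(self, teacher):
        self.avg += self.alpha * (teacher - self.avg)

def blind_performance(main_cost, blind_cost):
    """p_b(t) = 1 - C(t)/C_b(t): close to 1 means the main predictor
    beats the blind baseline; negative means it does worse."""
    return 1.0 - main_cost / blind_cost

blind = BlindPredictor()
for teacher in [1.0, 1.0, 1.0, 1.0]:
    blind.update(teacher)
main_cost = mse(0.95, 1.0)              # main predictor nearly matches teacher
blind_cost = mse(blind.predict(), 1.0)  # blind average still lags
print(blind_performance(main_cost, blind_cost) > 0)  # True: beats baseline
```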
In the example of training a “right angle” path, a blind predictor may provide low costs yet not be able to perform the task the main predictor has to perform (which in this case means also performing a turn and not only going straight). For a
A problem with presenting the costs to the user may be that costs can change over time in a highly non-linear fashion:
The user may prefer a presentation of costs that changes in a linear fashion (e.g., a feedback number gradually moving from 0 to 1 during the training). Otherwise the user may see huge progress during a sudden decrease of the cost function and then almost no progress at all.
The general shape of the costs curve may be universal (or nearly so) among tasks and predictors. A reference predictor may be selected, which is trained in parallel to the main predictor (i.e., the predictor that the robot actually uses to perform actions). A relative performance number may be expressed as:
pr(t)=1−C(t)/Cr(t) (Eqn. 3)
where C(t) represents costs computed using Eqn. (1) for a main predictor, and Cr(t) represents costs computed using Eqn. (1) for a reference predictor. If pr(t) is close to 1, then the main predictor may perform better than the reference predictor. If pr(t) is negative, then the main predictor may perform worse than the reference.
A reference predictor may be selected such that it generally behaves worse than a main predictor but still follows the dynamics of costs of the main predictor (e.g., curves on
If there is noise in the teacher signal, noise in the environment, and/or the robot has changed, costs may increase because the main predictor has not yet adapted accordingly. However, if relative costs are used, this effect of noise (or robot change) may be diminished because costs of reference predictor may also increase, but relative performance may not change significantly.
Different predictors may perform differently with different tasks. Sometimes a user may try different predictors on the same task to determine which predictor is better for that task. Sometimes a user may train a robot to do different tasks using the same predictor. To disentangle variations in the predictors from variations in the tasks, a relative performance number prb may be introduced that is independent of the main predictor:
prb(t)=1−Cr(t)/Cb(t) (Eqn. 4)
where Cb(t) represents costs computed using Eqn. (1) for a “blind” predictor, and Cr(t) represents costs computed using Eqn. (1) for a reference predictor.
The performance number prb may not depend on the main predictor the user chose to perform a task. If the reference predictor is fixed, prb may be used to characterize the task complexity. Consider a case where the reference predictor is a linear perceptron. If prb is close to 1, then the task may be non-trivial, so that the blind predictor cannot learn it, but simple enough for the linear predictor to learn. If prb is close to zero, then either the task may be too complex for the linear predictor to learn, or it is trivial, so that the blind predictor achieves good performance on it.
In some situations, it may be important to show the user that something in the training process went wrong (e.g., changes in the environment such as lighting conditions and/or other environmental conditions, the user changing a training protocol without realizing it, and/or other ways in which the training process can be compromised). To achieve that, changes may be detected in the flow of relative performance values (prb(t), pr(t), pb(t)) using step detection algorithms. For example, a sliding average of p(t) may be determined and subtracted from the current value, then normalized either by division by some maximum value or by passing through a sigmoid function. The value may be presented to the user. An average of steps for different performance values may be determined and presented to the user. If the value is large, then something may have gone wrong, according to some implementations. For example, with prb(t), if the environment changed but the task is the same, then performance of the “blind” predictor may stay the same because it may be unaffected by environment changes, but performance of the reference predictor may drop.
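The sliding-average step detection described above may be sketched as follows (an illustrative reading; the window size and the sigmoid normalization choice are assumptions):

```python
# Illustrative step detector: compare the current relative-performance
# value with its sliding average and squash the difference into (0, 1)
# with a sigmoid; values well above 0.5 suggest a sudden drop.

import math
from collections import deque

class StepDetector:
    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def step(self, p):
        avg = sum(self.history) / len(self.history) if self.history else p
        self.history.append(p)
        diff = avg - p  # positive when performance dropped below its average
        return 1.0 / (1.0 + math.exp(-diff))  # sigmoid-normalized step size

det = StepDetector()
for p in [0.8, 0.81, 0.79, 0.8]:
    det.step(p)                 # steady performance: output stays near 0.5
print(det.step(0.2) > 0.6)      # True: sudden drop flagged to the user
```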
In the case of using several reference predictors [p0 . . . pn] that are trained in parallel to the main one, performance numbers may be determined from any pair of them:
pij(t)=1−Ci(t)/Cj(t) (Eqn. 5)
where Ci(t) represents costs computed using Eqn. (1) for the i-th reference predictor, and Cj(t) represents costs computed using Eqn. (1) for the j-th reference predictor.
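The pairwise performance numbers of Eqn. 5 may be sketched as a matrix computation (an illustrative helper; the function name and the example cost values are assumptions):

```python
# Illustrative sketch of Eqn. 5: given per-predictor costs, build the
# matrix p_ij(t) = 1 - C_i(t) / C_j(t) over a set of reference predictors.

def performance_matrix(costs):
    n = len(costs)
    return [[1.0 - costs[i] / costs[j] for j in range(n)] for i in range(n)]

# Hypothetical costs for, e.g., blind, linear, and look-up-table predictors.
costs = [4.0, 2.0, 1.0]
p = performance_matrix(costs)
print(p[1][0])  # linear vs. blind: 1 - 2/4 = 0.5 (linear is better)
print(p[0][1])  # blind vs. linear: 1 - 4/2 = -1.0 (blind is worse)
```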
Depending on the properties of those reference predictors, performance numbers may characterize the task, the main predictor, and/or the whole training process differently. For example, [p0 . . . pn] may include a sequence of predictors such that a subsequent predictor is more “powerful” than a previous one (e.g., “blind”, linear, quadratic, . . . , look up table). The set of performance numbers may characterize how difficult the task is (e.g., a task where only the look-up-table predictor gets a good score vs. a task where the linear predictor is already doing fine).
Reference predictors [p0 . . . pn] may include a sequence of predictors similar to the main predictor but with different parameters (e.g., learning coefficient). Performance numbers may be indicative of how noisy the teacher signals and/or environment are. In some implementations, if there is a lot of noise, only predictors with a small learning coefficient may be able to learn the task. If training signals and features are clean (i.e., low or no noise), then a predictor with a high learning coefficient may be able to learn the task.
A matrix of reference numbers pij(t) for a given set of predictors [p0 . . . pn] for different tasks may be provided to a clustering algorithm, which may uncover clusters of similar tasks. After that, during training of a new task, the clustering algorithm may provide to the user feedback that the current task is similar in properties to a task already seen (e.g., so that the user can make a decision on which training policy to pick).
Predictor apparatus and methods are disclosed herein, in accordance with one or more implementations.
In some implementations, the predictor 422 and the combiner 414 components may be configured to operate a plurality of robotic platforms. The control signal 420 may be adapted by a decoder component 424 in accordance with a specific implementation of a given platform 430. In one or more implementations of robotic vehicle control, the adaptation by the decoder 424 may comprise translating the binary signal representation 420 into one or more formats (e.g., pulse code modulation) that may be utilized by a given robotic vehicle. U.S. patent application Ser. No. 14/244,890 entitled “APPARATUS AND METHODS FOR REMOTELY CONTROLLING ROBOTIC DEVICES”, filed Apr. 3, 2014 describes some implementations of control signal conversion.
In some implementations of the decoder 424 corresponding to the analog control and/or analog corrector 412 implementations, the decoder may be further configured to rescale the drive and/or steering signals to a range appropriate for the motors and/or actuators of the platform 430.
In some implementations of the discrete state space control implementation of the corrector 412, the decoder 424 may be configured to convert an integer control index into a corresponding steering/drive command using, e.g. a look up table approach described in detail in, e.g., U.S. patent application Ser. No. 14/265,113 entitled “TRAINABLE CONVOLUTIONAL NETWORK APPARATUS AND METHODS FOR OPERATING A ROBOTIC VEHICLE”, filed Apr. 29, 2014, the foregoing being incorporated herein by reference in its entirety.
The corrector 412 may receive a control input 428 from a control entity. The control input 428 may be determined based on one or more of (i) sensory input 402 and (ii) feedback from the platform (not shown). In some implementations, the feedback may comprise proprioceptive signals, such as feedback from servo motors, joint position sensors, and/or torque resistance. In some implementations, the sensory input 402 may correspond to the sensory input, described, e.g., with respect to
The corrector 412 may be operable to generate control signal 408 using a plurality of approaches. In some implementations of analog control for robotic vehicle navigation, the corrector output 408 may comprise target vehicle velocity and target vehicle steering angle. Such implementations may comprise an “override” functionality configured to cause the robotic platform 430 to execute action in accordance with the user-provided control signal instead of the predicted control signal.
In one or more implementations of analog correction provision for robotic vehicle navigation, the control signal 408 may comprise a correction to the target trajectory. The signals 408 may comprise a target “correction” to the current velocity and/or steering angle of the platform 430. In one such implementation, when the corrector output 408 comprises a zero signal (or substantially a null value), the platform 430 may continue its operation unaffected.
In some implementations of state space for vehicle navigation, the actions of the platform 430 may be encoded using, e.g., a 1-of-10 integer signal, where eight (8) states indicate 8 possible directions of motion (e.g., forward-left, forward, forward-right, left, right, back-left, back, back-right), one state indicates “stay-still”, and one state indicates “neutral”. The neutral state may comprise a default state. When the corrector outputs a neutral state, the predictor may control the robot directly. It will be appreciated by those skilled in the arts that various other encoding approaches may be utilized in accordance with controlled configuration of the platform (e.g., controllable degrees of freedom).
In some implementations of control for a vehicle navigation, the action space of the platform 430 may be represented as a 9-element state vector, e.g., as described in, e.g., the above referenced U.S. patent application Ser. No. 14/265,113. Individual elements of the state vector may indicate the probability of the platform being subjected to (i.e., controlled within) a given control state. In one such implementation, output 418 of the predictor 422 may be multiplied with the output 408 of the corrector 412 in order to determine probability of a given control state.
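The probabilistic combination described above may be sketched as follows (an illustrative reading, shortened to 3 states for brevity; the function name, the renormalization step, and the zero-agreement fallback are assumptions):

```python
# Illustrative sketch: combine predictor and corrector outputs when the
# action space is a vector of per-state probabilities, by multiplying
# element-wise and renormalizing so the result sums to 1.

def combine_state_probabilities(predictor, corrector):
    product = [p * c for p, c in zip(predictor, corrector)]
    total = sum(product)
    if total == 0:  # no agreement at all: fall back to the corrector
        return list(corrector)
    return [x / total for x in product]

predictor = [0.1, 0.6, 0.3]  # 3 states for illustration (9 in the text)
corrector = [0.2, 0.2, 0.6]
combined = combine_state_probabilities(predictor, corrector)
print(combined.index(max(combined)))  # 2: corrector tips the balance
```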
The adaptive predictor 422 may be configured to generate predicted control signal 418 based on one or more of (i) the sensory input 402 and the platform feedback (not shown). The predictor 422 may be configured to adapt its internal parameters, e.g., according to a supervised learning rule, and/or other machine learning rules.
Predictor realizations comprising platform feedback, may be employed in applications such as, for example, where: (i) the control action may comprise a sequence of purposefully timed commands (e.g., associated with approaching a stationary target (e.g., a cup) by a robotic manipulator arm), or where (ii) the platform may be characterized by platform state parameters (e.g., arm inertia, and/or motor response time) that change faster than the rate of action updates. Parameters of a subsequent command within the sequence may depend on the control plant state; a “control plant” refers to the logical combination of the process being controlled and the actuator (often expressed mathematically). For example, control plant feedback might be the exact location and/or position of the arm joints which can be provided to the predictor.
In some implementations, the robotic platform may comprise a manipulator arm comprising one or more segments (limbs) and a motorized joint, e.g., as shown and described in U.S. patent application Ser. No. 13/907,734 entitled “ADAPTIVE ROBOTIC INTERFACE APPARATUS AND METHODS”, filed May 31, 2013. As described in the above referenced application '734, the joint may be utilized to modify an angle of the manipulator segment and/or an angle between two segments.
In some implementations, the predictor 422 may comprise a convolutional network configured to predict the output 420 of the combiner 414 given the input 402. The convolutional network may be combined with other components that learn to predict the corrector signal given other elements of the sensory context. When the corrector 412 output comprises a zero signal (or null value), the combiner output 420 may equal the predictor output 418. When the corrector provides a non-zero signal, a discrepancy may occur between the prediction 418 and the output 420 of the combiner 414. The discrepancy may be utilized by the predictor 422 in order to adjust parameters of the learning process in order to minimize future discrepancies during subsequent iterations.
The sensory input and/or the plant feedback may collectively be referred to as sensory context. The sensory context may be utilized by the predictor 422 to produce the predicted output 418. By way of a non-limiting illustration, one exemplary scenario of obstacle avoidance by an autonomous rover uses an image of an obstacle (e.g., wall representation in the sensory input 402) combined with rover motion (e.g., speed and/or direction) to generate Context_A. When the Context_A is encountered, the control output 420 may comprise one or more commands configured to avoid a collision between the rover and the obstacle. Based on one or more prior encounters of the Context_A—avoidance control output, the predictor may build an association between these events as described in detail below.
The combiner 414 may implement a transfer function h(x) where x includes the control signal 408 and the predicted control signal 418. In some implementations, the combiner 414 operation may be expressed, e.g., as described in detail in co-owned U.S. patent application Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, as follows:
{circumflex over (u)}=h(u,uP). (Eqn. 6)
Various realizations of the transfer function of Eqn. 6 may be utilized. In some implementations, the transfer function may comprise one or more of: addition, multiplication, union, a logical ‘AND’ operation, a logical ‘OR’ operation, and/or other operations.
In one or more implementations, the transfer function may comprise a convolution operation, e.g., a dot product. In spiking network realizations of the combiner function, the convolution operation may be supplemented by use of a finite support kernel (i.e., a mapping function for linear space to a non-linear space) such as Gaussian, rectangular, exponential, and/or other kernels. In one embodiment, a finite support kernel may implement a low pass filtering operation of input spike train(s). In some implementations, the transfer function h may be characterized by a commutative property:
h(u,uP)=h(uP,u). (Eqn. 7)
In one or more implementations, the transfer function of the combiner 414 may be configured as follows:
h(0,uP)=uP. (Eqn. 8)
In some implementations, the transfer function h may be configured as:
h(u,0)=u. (Eqn. 9)
In some implementations, the transfer function h may be configured as a combination of realizations of Eqn. 8-Eqn. 9 as:
h(0,uP)=uP, and h(u,0)=u. (Eqn. 10)
In one exemplary implementation, the transfer function satisfying Eqn. 10 may be expressed as:
h(u,uP)=(1−u)×(1−uP)−1. (Eqn. 11)
In one such realization, the combiner transfer function is configured according to Eqn. 8-Eqn. 11, to implement additive feedback. In other words, output of the predictor (e.g., 418) may be additively combined with the control signal (408) and the combined signal 420 may be used as the teaching input (404) for the predictor. In some implementations, the combined signal 420 may be utilized as an input (context) into the predictor 422, e.g., as described in co-owned U.S. patent application Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, incorporated supra.
In some implementations, the combiner transfer function may be characterized by a delay expressed as:
{circumflex over (u)}(ti+1)=h(u(ti),uP(ti)), (Eqn. 12)
where û(ti+1) denotes combined output (e.g., 420 in
As used herein, symbol ti may be used to refer to a time instance associated with individual controller update events (e.g., as expressed by Eqn. 12), for example ti denoting time of the first control output, e.g., a simulation time step and/or a sensory input frame step. In some implementations of training autonomous robotic devices (e.g., rovers, bi-pedaling robots, wheeled vehicles, aerial drones, robotic limbs, and/or other robotic devices), the update periodicity Δt may be configured to be between 1 ms and 1000 ms.
In some implementations, the combiner transfer function may be configured to implement override functionality (e.g., an override combiner). The “override” combiner may detect a non-zero signal provided by the corrector, and provide the corrector signal as the combined output. When a zero (or no) corrector signal is detected, the predicted signal may be routed by the combiner as the output. In some implementations, the zero corrector signal may be encoded as not-a-number (NaN); the non-zero signal may comprise a valid value rather than NaN.
In one or more implementations of a multi-channel controller, the corrector may simultaneously provide “no” signal on some channels and “a” signal on others, allowing the user to control one degree of freedom (DOF) of the robotic platform while the predictor may control another DOF.
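The override combiner and its per-channel variant described above may be sketched as follows (an illustrative reading, assuming NaN encodes “no correction” consistently with the text; function names are assumptions):

```python
# Illustrative "override" combiner: a numeric corrector value overrides
# the prediction; NaN (no correction) lets the predicted signal through.

import math

def override_combine(corrector, predicted):
    return predicted if math.isnan(corrector) else corrector

def override_combine_channels(corrector, predicted):
    # Per-channel override: the teacher may control one degree of freedom
    # while the predictor controls another.
    return [override_combine(c, p) for c, p in zip(corrector, predicted)]

NO_SIGNAL = float("nan")
print(override_combine(NO_SIGNAL, 0.4))  # 0.4: predictor drives the robot
print(override_combine(-0.2, 0.4))       # -0.2: teacher overrides
print(override_combine_channels([NO_SIGNAL, 0.1], [0.3, 0.9]))  # [0.3, 0.1]
```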
It will be appreciated by those skilled in the art that various other realizations of the transfer function of the combiner 414 may be applicable (e.g., comprising a Heaviside step function, a sigmoid function, such as the hyperbolic tangent, Gauss error function, logistic function, and/or a stochastic operation). Operation of the predictor 422 learning process may be aided by a teaching signal 404. As shown in
u
d(ti)=h(u(ti−1),uP(ti−1)). (Eqn. 13)
The training signal ud at time ti may be utilized by the predictor in order to determine the predicted output uP at a subsequent time corresponding to the context (e.g., the sensory input x) at time ti:
u
P(ti+1)=F[xi,W(ud(ti))]. (Eqn. 14)
In Eqn. 14, the function W may refer to a learning process implemented by the predictor, e.g., a perceptron, and/or a look-up table.
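The update cycle of Eqn. 12–Eqn. 14 may be sketched as follows (a minimal illustration with an additive combiner and a one-entry look-up-table learning process; the class, the teacher policy, and the learning rate are assumptions):

```python
# Illustrative sketch of Eqn. 12-14: the combined output at t_i becomes
# the teaching signal used to adjust the prediction issued at t_{i+1}.

class TablePredictor:
    def __init__(self):
        self.table = {}  # context -> predicted control (the process W)

    def predict(self, context):
        return self.table.get(context, 0.0)

    def learn(self, context, teaching_signal, rate=0.5):
        current = self.predict(context)
        self.table[context] = current + rate * (teaching_signal - current)

def combiner(u, u_p):
    return u + u_p  # additive combiner, h(u, u_p) = u + u_p

predictor = TablePredictor()
context = "wall_ahead"
for _ in range(20):
    # Hypothetical teacher: corrects while the prediction is below target.
    u = 0.5 if predictor.predict(context) < 0.49 else 0.0
    u_p = predictor.predict(context)
    u_hat = combiner(u, u_p)          # Eqn. 12: combined output
    predictor.learn(context, u_hat)   # Eqn. 13/14: teaching signal adjusts W
print(round(predictor.predict(context), 2))  # converges to 0.5
```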
In one or more implementations, such as illustrated in
Output 420 of the combiner e.g., 414 in
In one such realization of spiking controller output, the control signal 408 may comprise positive spikes indicative of a control command and configured to be combined with the predicted control signal (e.g., 418); the control signal 408 may comprise negative spikes, where the timing of the negative spikes is configured to communicate the control command, and the (negative) amplitude sign is configured to communicate the combination inhibition information to the combiner 414 so as to enable the combiner to ‘ignore’ the predicted control signal 418 for constructing the combined output 420.
In some implementations of spiking signal output, the combiner 414 may comprise a spiking neuron network; and the control signal 408 may be communicated via two or more connections. One such connection may be configured to communicate spikes indicative of a control command to the combiner neuron; the other connection may be used to communicate an inhibitory signal to the combiner network. The inhibitory signal may inhibit the one or more input neurons of the combiner network, thereby effectively removing the predicted control signal from the combined output (e.g., 420 in
The gating information may be provided to the combiner by another entity (e.g., a human operator controlling the system with a remote control and/or external controller) and/or from another output from the corrector 412 (e.g., an adaptation block, an optimization controller). In one or more implementations, the gating information may comprise one or more of: a command, a memory address of a register storing a flag, a message, an inhibitory efficacy, a value (e.g., a weight of zero to be applied to the predicted control signal by the combiner), and/or other information capable of conveying gating instructions to the combiner.
The gating information may be used by the combiner network to inhibit and/or suppress the transfer function operation. The suppression (or ‘veto’) may cause the combiner output (e.g., 420) to be comprised solely of the control signal portion 418, e.g., configured in accordance with Eqn. 9. In one or more implementations the gating information may be used to suppress (‘veto’) provision of the context signal to the predictor without affecting the combiner output 420. In one or more implementations the gating information may be used to suppress (‘veto’) the feedback from the platform.
In one or more implementations, the gating signal may comprise an inhibitory indication that may be configured to inhibit the output from the combiner. Zero combiner output may, in some realizations, cause a zero teaching signal (e.g., 414 in
The gating signal may be used to veto predictor output 418 based on, for example, the predicted control output 418 being away from the target output by more than a given margin. The margin may be configured based on an application and/or state of the trajectory. For example, a smaller margin may be applicable in navigation applications wherein the platform is proximate to a hazard (e.g., a cliff) and/or an obstacle. A larger error may be tolerated when approaching one (of many) targets.
In one or more implementations, the gating/veto functionality may be implemented on a “per-channel” basis in a multi-channel controller wherein some components of the combined control vector may comprise predicted components, while some components may comprise the corrector components.
By way of a non-limiting illustration, if the turn is to be completed and/or aborted (due to, for example, a trajectory change and/or sensory input change), and the predictor output still produces turn instructions to the plant, the gating signal may cause the combiner to veto (ignore) the predictor contribution and pass through the controller contribution.
Predicted control signal 418 and the control input 408 may be of opposite signs. In one or more implementations, a positive predicted control signal (e.g., 418) may exceed the target output that may be appropriate for performance of a task. The control signal 408 may be configured to include negative signaling in order to compensate for over-prediction by the predictor.
Gating and/or sign reversal of controller outputs may be useful, for example, where the predictor output is incompatible with the sensory input (e.g., navigating towards a wrong target). Rapid changes in the environment (compared to the predictor learning time scale caused by e.g., appearance of a new obstacle, target disappearance), may require an “override” capability for the controller (and/or supervisor) to ‘override’ predictor output. In one or more implementations compensation for over-prediction may be controlled by a graded form of the gating signal.
In some implementations, the predictor learning process may be configured based on one or more look-up tables (LUT). Table 1 and Table 2 illustrate the use of look up tables for learning obstacle avoidance behavior.
Table 1 and Table 2 present exemplary LUT realizations characterizing the relationship between sensory input (e.g., distance to obstacle d) and control signal (e.g., turn angle a relative to current course) obtained by the predictor during training. Columns labeled N in Table 1 and Table 2, present use occurrence N (i.e., how many times a given control action has been selected for a given input, e.g., distance). Responsive to the selection of a given control action (e.g., turn of 15°) based on the sensory input (e.g., distance from an obstacle of 0.7 m), the counter N for that action may be incremented. In some implementations of learning comprising opposing control actions (e.g., right and left turns shown by rows 3-4 in Table 2), responsive to the selection of one action (e.g., turn of +15°) during learning, the counter N for that action may be incremented while the counter for the opposing action may be decremented.
As seen from the example shown in Table 1, the controller may produce a turn command as a function of the distance to an obstacle falling to a given level (e.g., 0.7 m). As shown, a 15° turn is most frequently selected during the training sequence. In some implementations, the predictor may be configured to store the LUT (e.g., Table 1) data for use during subsequent operation. During operation, the most frequently used response (e.g., a turn of 15°) may be output for a given sensory input. In some implementations, the predictor may output an average of stored responses (e.g., an average of rows 3-5 in Table 1).
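The look-up-table learning with use-occurrence counters described above may be sketched as follows (an illustrative reconstruction from the description of Table 1; class name, distance bucketing, and default action are assumptions):

```python
# Illustrative LUT predictor: count how often each turn angle was selected
# for a given obstacle distance during training, then replay the most
# frequently used response during autonomous operation.

from collections import defaultdict

class LutPredictor:
    def __init__(self):
        # distance bucket -> {turn angle: occurrence count N}
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, distance, angle):
        self.counts[round(distance, 1)][angle] += 1

    def act(self, distance):
        actions = self.counts.get(round(distance, 1))
        if not actions:
            return 0  # no stored response: keep current course
        return max(actions, key=actions.get)  # most frequent response

lut = LutPredictor()
for angle in [15, 15, 10, 15, 20]:  # demonstrations at d = 0.7 m
    lut.train(0.7, angle)
print(lut.act(0.7))  # 15: the most frequently selected turn
```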
In some implementations, the predictor 422 learning process may be configured to detect targets and/or obstacles based on sensory input (e.g., 402 in
Training apparatus and methods are disclosed herein, in accordance with one or more implementations. Exemplary implementations may facilitate identifying multiple solutions (also referred to herein as teaching modes) that have value when training a robot. Depending on one or more of the type of robot, the task, the state of training, and/or other information, the teacher may switch from one teaching mode to another to teach a behavior in the most effective manner.
In some implementations, the control signal may include a combination of a correction signal and a prediction signal. The correction signal may be given by a teacher (e.g., a human user controlling the robot and/or an algorithm mastering the task). The prediction signal may be learned while performing the task by a module called Predictor. The combination of the two signals may be performed by the combiner (e.g., ModeCombiner in the diagram below).
There may be multiple behaviors the robot can perform when the teacher sends a correction signal. Examples of those behaviors may include one or more of:
There may be one or more ways the robot can behave when the teacher is not sending any correction. Examples of those behaviors may include one or more of:
Some implementations may provide five different modes that combine what the robot does when the teacher sends a correction with what the robot does when no correction is sent. Those five combinations may assist teaching a behavior in the most effective manner.
In some implementations, the available modes may include one or more of Control, Listen, Override, Correct, Autonomous, and/or other modes. Exemplary implementations of various modes are described in the table below.
In some implementations, the available modes may be embodied in and/or effectuated by the combiner (also referred to herein as ModeCombiner).
The combiner mode may be changed either by the teacher (e.g., the human teaching the robot) or by an internal mechanism that determines the state the combiner should be in based on the internal state of the system.
According to some implementations, the teacher may switch from one teaching mode to another one using the iPhone App, as depicted in the figure below.
In control mode, the robot may be remote controlled. Responsive to the teacher sending a correction, the robot may execute the command but may not learn the association. If the teacher is not sending any correction, then the robot may stay still. This mode may be useful when the teacher wants to control the robot without teaching (e.g., if the teacher is repositioning the robot to a starting position, but the teacher does not want the robot to do that on its own).
In listen mode, the robot may be “listening” to or otherwise observing what the teacher teaches, and may not do anything on its own. However, the robot may learn an association. But if the teacher stops sending a command, the robot may stay still and wait for the next command. In some implementations, teaching the robot may begin with the listen mode. Once enough examples have been provided and the robot has learned something, the override mode may be used.
In the override mode, the robot may execute what it has learned, unless a command is sent by the teacher. As soon as the teacher starts sending commands, the robot may stop taking the initiative and may let the teacher control it. For example, if the robot is turning left but the teacher wants the robot to turn right and provides a right turn command, then the robot may heed the teacher's command, perform the action, and try to remember it for the next time the same situation occurs. Once a behavior only needs fine tuning, the correct mode may be used.
In the correct mode, the robot may integrate what the teacher commands with what the robot already knows. In some implementations, the robot may sum the teacher's command with what the robot already knows to get a final motor command. The teacher's correction may operate in this case as a deviation from the predicted command determined by the robot.
By way of non-limiting illustration, the robot may be driving full speed on a course. The teacher may want to teach the robot not to go so fast. A natural reaction for the teacher might be to press a “go-back button” on a gamepad used to provide commands to the robot. Doing that in the override mode may tell the robot to drive backward, not to decrease its speed (the teacher still wants the robot to move forward in this context). The correct mode may be appropriate for this situation. The robot might say, “I like this blue trash bin over there, I am driving there as fast as I can,” and the teacher may say, “Hey Champ, you are going a little bit too fast, I would suggest that you reduce your speed.” Both variables may be added or otherwise combined, and at the end the robot might think something like, “Well, I still like this bin, but maybe I should go there a little bit more carefully.”
The autonomous mode may not provide a way for the teacher to send a correction to the robot. In this mode, the learned behavior may be expressed without any changes or learning.
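The five modes described above may be sketched as follows (an illustrative reading of the ModeCombiner named in the text; the function signature, the None encoding of “no correction”, and the returned learn flag are assumptions):

```python
# Illustrative ModeCombiner logic: returns (motor_command, learn?) given
# the mode, the teacher's correction (None = no correction), and the
# predicted command.

def mode_combine(mode, correction, prediction):
    teaching = correction is not None
    if mode == "CONTROL":     # execute correction, never learn
        return (correction if teaching else 0.0), False
    if mode == "LISTEN":      # learn from corrections, stay still otherwise
        return (correction if teaching else 0.0), teaching
    if mode == "OVERRIDE":    # correction replaces the prediction
        return (correction if teaching else prediction), teaching
    if mode == "CORRECT":     # correction is a delta on the prediction
        return (prediction + (correction or 0.0)), teaching
    if mode == "AUTONOMOUS":  # learned behavior only, no learning
        return prediction, False
    raise ValueError(mode)

print(mode_combine("OVERRIDE", None, 0.3))  # (0.3, False): robot acts alone
print(mode_combine("CORRECT", -0.25, 0.5))  # (0.25, True): slow down a bit
```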
At operation 1902, the module may operate in the CONTROL mode. The teacher may tele-operate the robot and position it in a desired state.
At operation 1904, the teacher may switch to the LISTEN mode to initiate learning. The teacher may show a few examples of the task to the robot, but may not want the robot to interfere with the teacher's teaching during this process.
At operation 1906, after a stage of training, the teacher may switch to the OVERRIDE mode. The teacher may let the robot operate autonomously while retaining capability of providing correction(s) when the robot is not expressing the target behavior.
At operation 1908, the teacher may switch to the CORRECT mode. In this mode, the teacher may only provide small corrections (e.g., delta corrections) to optimize the behavior.
At operation 1910, once the teacher determines that the behavior has been learned by the robot with a sufficient accuracy (e.g., based on an error determined from a comparison of a target action performance and actual action performance), the teacher may switch the robot to the AUTONOMOUS mode, which may prevent any intervention from the teacher, and also provide a validation mode to test performance level.
In some implementations, switching from one mode to another may be done manually by the teacher (e.g., through a Smartphone App and/or other control mechanism). In some implementations, switching between modes may be based on an internal variable representing the state of the system (e.g., time, number of corrections given, amplitude of the last n corrections, quality of predictions, and/or other information).
Apparatus and methods for hierarchical learning are disclosed herein, in accordance with one or more implementations. In supervised learning, a user may train a robot to perform one or more tasks by demonstration. To achieve a given level of performance by the robot when performing a composite task (e.g., comprising multiple subtasks), the user may provide additional information to the robot, in addition to examples. In particular, the user may organize “low level” tasks (also referred to as behaviors) into hierarchies. Higher level control processes (e.g., classifiers and/or switchers) may be trained to control which lower level tasks/behaviors may be active in a given context. In some implementations, the user may select which sensory information (context) may be considered as relevant for a given task. Behavior training may comprise determination of associations between a given sensory context and motor output for the task. The user may select which particular prediction method (e.g., random k-nearest neighbors (RKNN) classifier, a perceptron, a look up table, a multilayer neural network, a fern classifier, a decision tree, a Gaussian process, a probabilistic graphical model, and/or other classification approach) may be used for training a particular behavior.
A task of playing fetch may be referred to as a composite task, as described below with respect to
If the target is in the robot's gripper, the robot 2702 may navigate to the base 2740 while avoiding obstacles (e.g., wall 2744, and/or furniture items 2716, 2710 in
In some implementations, wherein the robot may be trained to execute the whole fetch task, the robotic device may employ a highly computationally intensive classifier (predictor) in order to infer the right sequence of actions from user demonstrations while maintaining resistance to noise. Such a whole-task classifier may be configured to determine which particular input may be relevant for a given task. For example, a state of the gripper may be regarded as not relevant to target search behavior. Additional training may be needed to communicate to the classifier that the search behavior should be performed independently of the state of the gripper.
According to some exemplary implementations, the user may train a subtask (component) of the fetch behavior at a given time. For example, at one time interval, the user may train the classifier how to search for a target and/or how to approach it while avoiding obstacles (e.g., “target approach”). At another time interval, the user may train the robot to perform a grasping behavior (e.g., “grasping a target”). Grasping training may correspond to a variety of contexts, e.g., objects of different shapes, colors, weight, material, object location relative to the gripper, and/or other characteristics. With the target object in the gripper, the user may train the robotic device to return to the base (“base approach”), e.g., via trajectory 2746 in
After achieving a target level of performance (e.g., adjudged by elapsed task execution time, presence and/or number of collisions, success rate, and/or other performance criteria) for individual tasks, the user may create a hierarchy from these behaviors.
According to the hierarchy depicted in
Hierarchies of subtasks (control actions) may be employed for operating robotic devices (e.g., the robotic vehicle 160 of
The apparatus 1000 may comprise an adaptive controller 1012 configured to determine and provide control output to channels 1020, 1022, 1024. In one or more implementations, individual channels 1020, 1022, 1024 may comprise one or more of motor actuators, electrical components (e.g., LED), transducers (e.g., sound transducer), and/or other electrical, mechanical, and/or electromechanical components. Output 1014 of the feature extractor 1010 may be provided to adaptive controller 1012. In some implementations of robotic navigation applications, the output 1014 may comprise, e.g., position of an object of interest represented by, e.g., coordinates (x, y, width, height) of a bounding box around the object, extracted contours of obstacles from the image, information related to motion of the robotic platform (e.g., speed, direction of motion of the rover 160 in
In some implementations of robotic manipulation, the output 1014 may comprise angular position (e.g., angle) of one or more joints of the manipulator, and/or velocity of manipulator components (e.g., angular velocity of a joint, velocity of a manipulator segment (limb)) obtained, e.g., using a set of radial basis functions for added redundancy. In some implementations, the basis function decomposition may employ a basis kernel comprised of between 100 and 200 basis function components. In some implementations, the basis function decomposition may comprise kernel expansion methodology described in, e.g., U.S. patent application Ser. No. 13/623,842 entitled “SPIKING NEURON NETWORK ADAPTIVE CONTROL APPARATUS AND METHODS”, filed Sep. 20, 2012, the foregoing being incorporated herein by reference in its entirety. In one or more implementations, feature detection may be effectuated using saliency detection methodology such as described in e.g., U.S. patent application Ser. No. 14/637,138 entitled “SALIENT FEATURES TRACKING APPARATUS AND METHODS USING VISUAL INITIALIZATION”, filed Mar. 3, 2015, Ser. No. 14/637,164 entitled “APPARATUS AND METHODS FOR TRACKING SALIENT FEATURES”, filed Mar. 3, 2015, and Ser. No. 14/637,191 entitled “APPARATUS AND METHODS FOR SALIENCY DETECTION BASED ON COLOR OCCURRENCE ANALYSIS”, filed Mar. 3, 2015, each of the foregoing also incorporated herein by reference in its entirety.
The adaptive controller 1012 may operate a learning process configured based on teaching input 1018 and the output 1014 of the feature extractor. In some implementations, the learning process may comprise k-nearest neighbor (KNN) classifier, multilayer neural network, a fern classifier, a decision tree, a Gaussian process, probabilistic graphical model, and/or another classification process. During training, the user may use a remote interface device (e.g., a gamepad, a smartphone, a tablet, and/or another user interface device) configured to provide user commands 1018 to the learning process. The teaching input may 1018 may comprise one or more commands 1018 configured to cause the robot to approach the target while avoiding the obstacles thereby effectuating training of behavior 1 (subtask 1). During training and/or execution of behavior 1, the adaptive controller may utilize input from target tracker (feature extractor 1 1006) and from obstacle detector (feature extractor N 1004). Execution of behavior 1 by the adaptive controller 1012 may comprise provision of output to channels 1020, 1022. The learning process of the adaptive controller 1012 may be adapted based on the outcome of the behavior 1 execution. In one or more implementations, learning process adaptation may comprise modifying entries of a KNN classifier, e.g., as described in U.S. patent application Ser. No. 14/588,168 entitled “APPARATUS AND METHODS FOR TRAINING OF ROBOTS”, filed Dec. 31, 2014; modifying weights of an artificial neuron network (ANN), and/or one or more look-up tables, e.g., as described in U.S. patent application Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, Ser. No. 13/842,562 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS FOR ROBOTIC CONTROL”, filed Mar. 15, 2013, Ser. No. 13/842,616 entitled “ROBOTIC APPARATUS AND METHODS FOR DEVELOPING A HIERARCHY OF MOTOR PRIMITIVES”, filed Mar. 15, 2013, Ser. No. 
13/842,647 entitled “MULTICHANNEL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Mar. 15, 2013, and Ser. No. 13/842,583 entitled “APPARATUS AND METHODS FOR TRAINING OF ROBOTIC DEVICES”, filed Mar. 15, 2013, each of the foregoing being incorporated herein by reference in its entirety.
Based on the learning process adaptation based on training, the adaptive controller 1012 may be capable of providing the output 1020, 1022 based on occurrence of features detected by the feature extractor 1006, 1007 and in the absence of the teaching input 1018. By way of an illustration, the controller 1012 may learn to approach a target (e.g., 2706 in
In some implementations, operation of the adaptive controller 1012 may be configured based on an activation indication 1016. The activation indication 1016 may be used to convey to the adaptive controller 1012 whether it may be operable (e.g., actively executing the learning process) and/or providing the output 1020, 1022. In some implementations, the activation indication 1016 may convey a binary state (where “0” may be used to indicate that the component 1012 may be inactive and “1” may be used to indicate that the component 1012 may be active, as shown in
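The binary activation gating described above may be sketched as follows. This is a hypothetical illustration; the class name, the predictor-as-callable interface, and the adaptation hook are assumptions rather than the disclosed implementation:

```python
# Hypothetical sketch: a learning component gated by a binary activation
# indication ("0" inactive, "1" active). When inactive, the component
# neither learns nor produces output.
class GatedController:
    def __init__(self, predictor):
        self.predictor = predictor   # any callable: features -> output
        self.active = 0              # binary activation indication

    def step(self, features, teaching_input=None):
        if not self.active:
            return None              # inactive: no learning, no output
        output = self.predictor(features)
        if teaching_input is not None:
            # Adaptation hook: adjust the predictor toward the teaching
            # input (details depend on the chosen learning process,
            # e.g., KNN entry insertion or ANN weight update).
            pass
        return output
```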
The user may have an ability to perform managing operations on behavior 1. For example, the user may perform operations including one or more of changing a name of a module, resetting a module to a naive state, reverting a module to a particular point in history, deleting a module, saving/loading locally, uploading, downloading, and/or other operations. The user may change outputs and FE1. Such changes may depend on whether they are compatible with a trained behavior (e.g., dimensionality of features and outputs is preserved), in some implementations. The user may train a target approach. The user may change input from a target tracker to a base tracker so that the robot will approach the base. A robot with multiple grippers may be trained to grasp with gripper 1, and the user may wire output of the adaptive module from gripper 1 to gripper 2.
In some implementations, a control apparatus (e.g., 1000 in
The apparatus 1030 of
The apparatus 1030 may be configured to learn to execute a plurality of tasks (e.g., learn a plurality of behaviors). The apparatus 1030 may comprise a plurality of adaptive controller components configured to determine and provide control output to channels 1052, 1054, 1055. In some implementations, the adaptive components 1046, 1050 may comprise individual hardware and/or software components. In one or more implementations, the components 1046, 1050 may comprise logical components of a learning control process.
In one or more implementations, the component 1046 may comprise component 1012 described above with respect to
Learning process of the component 1050 in
In one or more implementations, individual channels 1052, 1054, 1056 may comprise one or more of motor actuators, electrical components (e.g., LED), transducers (e.g., sound transducer), and/or other electrical, mechanical, and/or electromechanical components.
In some implementations, the learning process of component 1050 may be configured differently (e.g., may comprise a perceptron classifier) compared to the learning process of the component 1046.
Subsequent to learning process 1050 adaptation based on training, the adaptive controller 1050 may be capable of executing behavior 1 (by providing the output 1052, 1054 based on occurrence of features 1032, 1040 detected by the feature extractors 1036, 1034) in the absence of the teaching input 1048; and executing behavior 2 (by providing the output 1054, 1055 based on occurrence of features 1032, 1041 detected by the feature extractors 1036, 1038) in the absence of the teaching input 1049.
For example, the user may train the apparatus 1030 using a gamepad (or other input device used to provide teaching input 1049) to turn and grasp the target (“Behavior 2”). The learning process of the adaptive controller 1050 (e.g., perceptron classifier) may learn to turn the robot appropriately and grasp the target by autonomously effectuating correct torques on the robot's wheel (output 1054) and gripper (output 1055). During execution of subtask 2 (object grasping) controlling of another wheel (output 1052) may remain inactive. Sensory input 1002 in
Operation of adaptive controller components 1046, 1050 may be activated and/or deactivated using activation indications 1044, 1053, respectively. The activation indication 1044 may be used to convey to the adaptive controller 1046 whether it may be operable (e.g., actively executing the learning process) and/or providing the output 1052, 1054. In some implementations, the activation indication 1044 may convey a binary state (where “0” may be used to indicate that the component 1046 may be inactive, as shown in
The activation indication 1053 may be used to convey to the adaptive controller 1050 whether it may be operable (e.g., actively executing the learning process) and/or providing the output 1055, 1054. In some implementations, the activation indication 1053 may convey a binary state (where “0” may be used to indicate that the component 1050 may be inactive; and “1” may be used to indicate that the component 1050 may be active, as shown in
If the adaptive controller component is activated (e.g., 1050 in
The user may perform the managing operation of the apparatus 1030. The user may select (e.g., using a user interface) which component (e.g., 1046, 1050) may be active at a given time by providing appropriate activation indications 1044, 1053. Component 1050 in
The user may have an ability to perform managing operations of behaviors 1 and 2 of the apparatus 1030. For example, the user may perform operations including one or more of changing a name of a module, resetting a module to a naive state, reverting a module to a particular point in history, deleting a module, saving/loading locally, uploading, downloading, and/or other operations. The user may change output channel assignment and/or feature extractor components in accordance with a given application. Such changes may depend on whether they are compatible with a trained behavior (e.g., dimensionality of features and outputs is preserved), in some implementations. The user may train a target approach. The user may change input from a target tracker to a base tracker so that the robot will approach the base. A robot with multiple grippers may be trained to grasp with gripper 1, and the user may wire output of the adaptive module from gripper 1 to gripper 2.
A control apparatus of a robot may be configured to execute a plurality of subtasks (behaviors). A task comprising two or more subtasks (behaviors) may be referred to as a composite task or composite behavior. Upon learning to execute a given complex behavior, the robotic control apparatus may be trained to execute another complex behavior, as described below with respect to
The apparatus 1060 of
The apparatus 1060 may comprise a feature extractor component 1061 comprised of a plurality of individual feature extractors 1062, 1063, 1064. Individual feature extractors 1062, 1063, 1064 may be configured to determine occurrence of a respective feature (e.g., a representation of a target, gripper configuration) in sensory input 1002.
Adaptive controller components 1070, 1071, 1073, 1075 may receive output of the feature extractor component. Connectivity map denoted by broken arrows 1065 in
In some implementations, output 1066, 1067, 1068 of the higher level adaptive controller component 1070 may be provided as activation input (e.g., input 1016, 1044, 1053 in
Using the activation indication methodology, the controller apparatus 1060 of
Upon training, the apparatus 1060 may be configured to automatically execute a plurality of behaviors (e.g., “Target approach”, “Base approach”, and “Grasp the target”) based on the content of sensory input and learned configuration. The configuration of
In some implementations, a control apparatus may comprise a hierarchy of adaptive controller components configured to execute a plurality of complex tasks, e.g., as shown and described with respect to
The apparatus 1084 may comprise two higher hierarchy level adaptive controller components 1097, 1086, also referred to as switchers. In some implementations, the component 1097 is configured to implement a fetch composite task, described above with respect to
In one or more implementations (as shown in
During training, the user may activate different high-level behaviors (e.g., using a user interface) depending on what the objective is. For example, the first high level behavior (e.g., implemented by the component 1097) may be configured to find an object and bring it to the base. The second high level behavior (e.g., implemented by the component 1086) may be configured to grasp the object near the base and bring it to a closet. During operation of the first and the second higher level behaviors, one or more lower level behaviors (e.g., object grasping 1093) may be reused. The user may train the apparatus 1084 to switch between the two higher level behaviors based on a variety of parameters, e.g., the time of the day, number of objects in the closet, number and/or identity of objects near the base, and/or other considerations.
Switching between the higher level behaviors may be effectuated using activation indications 1090, 1085. As shown in
In some implementations, arbitrarily deep (e.g., within processing and/or energy capabilities of a given computational platform) hierarchies may be developed.
After training switcher components of one level in the hierarchy (e.g., 1097, 1086 in
In some implementations, activation indication output produced by a component of one level (e.g. M-th) in the hierarchy may be provided to components of the subsequent lower level (e.g., M−1). By way of an illustration of control hierarchy of the control apparatus 1100 shown in
In some implementations wherein the activations may comprise binary indications, the hierarchy may be represented using a graph. At a given time interval, an active path within the graph may be used to provide an activation indication from a higher level switcher (behavior) to the outputs. Control components in an active path (e.g., 1130, 1132, 1142, 1134) may be executed. Inactive components may remain not-executed (e.g., dormant and/or disabled).
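The active-path execution described above may be sketched as a recursive traversal of the hierarchy graph. This is a hypothetical illustration; the graph encoding (node id mapped to a component callable and its children) and the switcher-returns-child-id convention are assumptions for illustration:

```python
# Hypothetical sketch: with binary activations, the hierarchy forms a
# graph; only components on the currently active path are executed,
# while inactive components remain dormant.
def run_active_path(graph, outputs, node, context):
    """graph: node -> (component, children). A switcher component returns
    the id of the child to activate; a leaf component returns a control
    output value, which is recorded in `outputs`."""
    component, children = graph[node]
    result = component(context)
    if not children:                    # leaf: apply control output
        outputs[node] = result
        return
    # Only the selected child is executed; siblings stay dormant.
    run_active_path(graph, outputs, result, context)
```

For example, a root switcher selecting a left-turn or right-turn behavior based on target position would execute exactly one of the two lower-level components per update.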
The apparatus 1150 of
Components of the learning controller may be configured to receive output 1178 of the feature extractor components as shown in
Individual learning components e.g., 1176, 1180, 1152, 1154, 1156, 1158, 1160, 1162, 1164 of the hierarchy of
By way of an illustration of training dog-like behaviors, composite level two fetch 1154 and/or bring to center 1156 behaviors may be trained. The fetch behavior 1154 may activate one or more level one (simple) behaviors, e.g., target approach 1158, base approach 1160 and/or grasp the target 1164 behaviors.
One or more higher level (e.g., level three) composite behaviors may be trained. By way of an illustration, the third level switcher (behavior) 1180 may comprise training the robot to execute fetch behavior if ambient light is above a given level (e.g., sunrise to sunset and/or when lights are on). Training of the apparatus 1150 may comprise training the third level switcher (behavior) 1156 to cause the robot to execute bring to center behavior if ambient light is below a given level (e.g., when it is dark). Training one or more behaviors of a given level (e.g., 1152 of level three) may comprise bypassing activation of behaviors of the level immediately below (e.g., level two). As shown in
Another switcher (e.g., fourth level 1176) may be trained to select between tasks: e.g., fetch if the robot is outside, or otherwise bring the object to the center if inside the house.
Individual modules may support an interface of getting features, human teaching signal, and activation signal. Different modules may include different logic.
Some implementations may include float activations (e.g., from 0 to 1) so that a module scales its output accordingly. This scheme may allow organizing an adaptive weighted mixture of behaviors depending on the context.
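The float-activation scheme may be sketched as a normalized weighted sum of behavior outputs. This is a hypothetical illustration; the function name, the callable-behavior interface, and the normalization choice are assumptions:

```python
# Hypothetical sketch: float activations in [0, 1] scale each behavior's
# output, yielding an adaptively weighted mixture of behaviors that
# depends on the sensory context.
def mix_behaviors(behaviors, activations, context):
    """behaviors: list of callables mapping context -> control value;
    activations: matching list of floats in [0, 1]."""
    total = sum(activations)
    if total == 0:
        return 0.0                   # no behavior active
    weighted = sum(a * b(context) for a, b in zip(activations, behaviors))
    return weighted / total          # normalized weighted mixture
```

With binary activations (0/1) this reduces to selecting a single active behavior, so the scheme generalizes the switching described earlier.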
In some implementations, e.g., as shown and described above with respect to
In some implementations, task hierarchy illustrated in
At a given time interval, feature extractor component 1161 may be operated. Individual feature extractors (e.g., target tracker, gripper state) may be updated. Output of individual feature extractors (e.g., feature detected or not, location of the feature, and/or other information) may be determined.
Subsequently, output of feature extractors may be propagated to target destinations. By way of an illustration, components 1164, 1156, 1154 may be informed that the target tracker feature extractor has detected the feature. In some implementations, the propagation process may be configured based on a message queue, a semaphore, a register, software/hardware logical state (e.g., 0,1), and/or other approaches.
Subsequently, components of the hierarchy may be updated. In some implementations, the hierarchy update may comprise update of components associated with an active hierarchy tree. That is, active components (e.g., with active incoming activation indication) may be updated; inactive components (e.g., with inactive incoming activation indication) may remain unaffected and/or not executed.
The output of a feature extraction component may be configured in accordance with a format of input to one or more predictor component(s) (e.g., output 1032 consistent with input format for component 1046, 1050). In some implementations, a feature extractor may provide output comprising a data structure describing a bounding box of the tracked object with a given probability (e.g., 95%) that it is the object of interest and not a false positive. The predictor in the module may be configured to parse the data structure and interpret bounding box and probability separately.
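The structured feature-extractor output described above may be sketched as follows. This is a hypothetical illustration; the field names, the dataclass encoding, and the 95% threshold default are assumptions based on the example in the text:

```python
# Hypothetical sketch: a feature extractor emits a structured record
# (bounding box plus detection probability); the predictor parses the
# bounding box and the probability separately.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    x: float             # bounding-box top-left x
    y: float             # bounding-box top-left y
    width: float
    height: float
    probability: float   # confidence it is the object of interest

def parse_for_predictor(obs, min_prob=0.95):
    """Return bounding-box features only when confidence is adequate;
    otherwise treat the detection as a likely false positive."""
    if obs.probability < min_prob:
        return None
    return (obs.x, obs.y, obs.width, obs.height)
```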
In some implementations, the following considerations may be employed for interface between a feature extractor and a predictor component in order to obtain a given level of interoperability between the feature extractor and predictors:
In some implementations, in order to reduce energy use during operation of a robotic controller apparatus (e.g., 1150 of
In one or more implementations, one or more feature extractors whose output may not be presently utilized (e.g., no subscribing active components) may nevertheless remain operational in order to improve their performance. Operating unsubscribed (unclaimed) feature extractor(s) may expose learning processes of such unclaimed feature extractor(s) to additional sensory input. In some implementations wherein a feature extractor may be operated in accordance with an unsupervised learning process, additional sensory input may improve feature extractor performance by, e.g., reducing the number of false positives of feature detection.
Individual learning controller components (e.g., 1012 in
In some implementations, a fetch behavior comprising “target approach”, “base approach” and “grasping” may be implemented using a BrainOS™ approach, e.g., such as described in U.S. patent application Ser. No. 14/244,888 entitled “LEARNING APPARATUS AND METHODS FOR CONTROL OF ROBOTIC DEVICES VIA SPOOFING”, filed Apr. 3, 2014, Ser. No. 14/244,890 entitled “LEARNING APPARATUS AND METHODS FOR CONTROL OF ROBOTIC DEVICES”, filed Apr. 3, 2014, Ser. No. 13/918,338 entitled “ROBOTIC TRAINING APPARATUS AND METHODS”, filed Jun. 14, 2013, Ser. No. 13/918,298 entitled “HIERARCHICAL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Jun. 14, 2013, Ser. No. 13/907,734 entitled “ADAPTIVE ROBOTIC INTERFACE APPARATUS AND METHODS”, filed May 31, 2013, Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, Ser. No. 13/842,562 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS FOR ROBOTIC CONTROL”, filed Mar. 15, 2013, Ser. No. 13/842,616 entitled “ROBOTIC APPARATUS AND METHODS FOR DEVELOPING A HIERARCHY OF MOTOR PRIMITIVES”, filed Mar. 15, 2013, Ser. No. 13/842,647 entitled “MULTICHANNEL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Mar. 15, 2013, and Ser. No. 13/842,583 entitled “APPARATUS AND METHODS FOR TRAINING OF ROBOTIC DEVICES”, filed Mar. 15, 2013, each of the foregoing being incorporated herein by reference in its entirety.
For example, the component 1154 of
In some implementations, components of a control hierarchy (e.g., individual learning controller components such as 1012 in
In some implementations wherein a given feature may be used by multiple components of a hierarchy (e.g., output 1032 of component 1036 in
In some implementations, a control hierarchy may be configured to operate using floating activation indications (e.g., configured in the range between 0 and 1). Such approach may allow for an adaptively weighted mixture of behaviors depending on the sensory context.
The apparatus 2800 may comprise an adaptive controller comprising a hierarchy of components. Components of the hierarchy of
Components of the hierarchy of
Activation output of the lowest level in the hierarchy (e.g., the output 2826 of level 2816) may comprise a plurality of active indications configured to enable simultaneous activation of multiple output channels (e.g., 2852, 2854) of level 2818 (e.g., multiple joints of a robotic arm).
In some implementations wherein the activations may comprise binary indications, the hierarchy may be represented using a graph. At a given time interval, an active path within the graph may be used to provide an activation indication from a higher level switcher (behavior) to the outputs. Control components in an active path may be executed. Inactive components may remain not-executed (e.g., dormant and/or disabled).
Components of the hierarchy of
The apparatus of
By way of an illustration of
A hierarchy comprising programmed and trained behaviors may be utilized to implement a variety of tasks. In some implementations, repeatable tasks may be pre-programmed (e.g., prepare coffee at 10 am daily); other tasks may be more readily trained (e.g., obstacle avoidance). Combining programming and training may provide individual users with flexibility to implement a variety of target application using the same general framework.
Notice that, for ease of use, some features may be hidden or optimized by the UI: e.g., automatic selection of module type for switchers, and no feature extractor selection, in order to avoid possible training bugs.
APPENDIX B.1-B.7 present computer code illustrating an exemplary implementation of a multi-level learning controller apparatus hierarchy comprising adaptive and pre-configured components, according to one or more implementations.
APPENDIX B.1 illustrates a BrainOS™ back-end component comprising module dynamic graph execution (lines 718 through 885) configured to implement a feature extractor loop, propagation of activation indications through the hierarchy graph, application of outputs, and/or other components.
APPENDIX B.2 illustrates implementation of BrainOS™ components, comprising, e.g., loading of previously trained behavior configurations, combiner implementation, predictor implementations, and/or other components.
APPENDIX B.3 illustrates one implementation of a bypass switching component configured to update the selection choice only if a sufficiently strong correction is received. Such implementations enable the control system to maintain the selected behavior in the absence of input.
APPENDIX B.4 illustrates one implementation of environment control components configured to redirect corrections and activations to the environmental control.
APPENDIX B.5 illustrates one implementation of BrainOS™ backend interface. The component i_brainos_backend.py describes an interface that shows a set of actions possible on the modules.
APPENDIX B.6 illustrates one implementation of BrainOS™ predictor and/or combiner components.
APPENDIX B.7 illustrates configuring a hierarchy of components in accordance with one or more implementations. The implementation of APPENDIX B.7 provides for creation, training, saving and loading of a hierarchy of behaviors for effectuating a target approach behavior. If the target is on the right, a controller may provide appropriate torques to the wheels in order to turn right; if the target is on the left, a controller may provide appropriate output (motor torque) to turn left. The controller of APPENDIX B.7 may comprise a two level hierarchy comprising two lower-level components configured to provide appropriate output to execute left/right turns and a higher-level switcher component configured to activate (select) one of the lower level behaviors based on the position of the target.
In some implementations, e.g., such as described above in connection with
The RKNN process may utilize a plurality of sensory inputs in order to predict motor commands for controlling operation of a robot. In some implementations, the sensory input may comprise inputs characterized by different degrees of redundancy. In some implementations, the redundancy may be characterized by the number of degrees of freedom (e.g., independent states) that may be conveyed by the input. By way of an illustration, a binary input (for example “ON”/“OFF”) indicative of wheel rotation (or lack thereof), proximity sensor output (ON, OFF), battery level below threshold, and/or other binary input may be characterized by a lower level of redundancy compared to other inputs (e.g., video, audio). In some implementations of robotic vision based navigation, the input space may be regarded as having high dimensionality and/or being highly redundant, compared to other inputs (e.g., audio, touch). In one or more implementations, an input characterized by a number of dimensions at least 10 times greater than the number of dimensions of another input may be referred to as highly dimensional and/or highly redundant, compared to the other input.
When a highly redundant input is augmented with data of lower redundancy, the highly redundant data may overwhelm the less redundant data when determining the response of a KNN classifier.
The RKNN process may partition available data into subsets comprising a given number of features from the lower-dimension/lower redundancy data. The given number of features associated with lower-dimension/lower redundancy data may be referred to as the mandatory feature(s). As used herein the term feature may be used to describe one or more integer or floating point values characterizing the input, e.g., the presence or absence of an edge, corner, shape, texture, color, object, at particular locations in the image, values of pixels in an image, patches of color texture, brightness in the image, and/or in the image as a whole; properties of the mentioned features, such as size, orientation, intensity, predominance with respect to the surround, of an edge, corner, shape, texture, color, object; the position of one of the features in the image or the relative position of two or more of the above mentioned features; changes in features across consecutive frames—changes in position (optic flow), intensity, size, orientation; the pitch, intensity, spectral energy in specific bands, formants of sounds, the temporal changes thereof, disparity measure between two or more images, input from proximity sensors (e.g., distance, proximity alarm, and/or other), motor feedback (e.g., encoders position), motion sensor input (e.g., gyroscope, compass, accelerometer), previous motor commands or switching commands, a binary/Boolean categorical variable, an enumerated type, a character/string, and/or practically any characteristic of the sensory input.
The mandatory-feature RKNN approach may be utilized for determining associations between occurrence of one or more features (also referred to as context) and control output configured to cause an action by a robotic device.
Predicted output associated with individual subsets may be combined (e.g., averaged) to produce predicted output of the RKNN process. Selecting the number of neighbors within a subset, the subset size, and/or the number of subsets may be used to trade-off between speed of computations, and accuracy of the prediction.
By way of an illustration of operation of a robotic device controller (e.g., 400 in
During training, for a given occurrence of the input x (e.g., sensory features) and the output y (e.g., training input/correction signal) the associations may be determined using methodology described with respect to
The selection process may comprise, for a given classifier Ci of the plurality of classifiers (i=1 . . . N):
During operation, in order to compute the output y for a given input x, one or more (k) entries within individual classifiers Ci may be used to determine N output values yi of the output y. For a given classifier Ci, individual output yi may be determined based on a first statistical operation of the k-values of y obtained during training. In one or more implementations, the first statistical operation may comprise determination of a mean, median, mode, adaptively weighted mean, and/or other operation. The output y may be determined using a second statistical operation configured based on the N outputs yi of individual classifiers. In one or more implementations the second statistical operation may comprise determination of a mean, median, mode, adaptively weighted mean, and/or other operation.
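The two-stage combination described above may be sketched as follows. Python is used for illustration only; the classifier data structure, the choice of the median for both statistical operations, and the name `predict_rknn` are assumptions for this sketch, not the specific implementation of the disclosure.

```python
import numpy as np

def predict_rknn(classifiers, x, k=3):
    """Two-stage RKNN prediction.

    Each classifier C_i holds a feature-index subset and the (x, y) pairs
    stored during training.  Its output y_i is a first statistical operation
    (here, the median) over the y values of its k nearest stored entries.
    The ensemble output y is a second statistical operation (again, the
    median) over the N per-classifier outputs y_i."""
    per_classifier = []
    for feat_idx, xs, ys in classifiers:
        # Distance computed only over this classifier's feature subset.
        d = np.linalg.norm(xs[:, feat_idx] - x[feat_idx], axis=1)
        nearest = np.argsort(d)[:k]                  # k nearest training entries
        per_classifier.append(np.median(ys[nearest]))  # first statistical operation
    return np.median(per_classifier)                 # second statistical operation
```

Increasing k and the number of classifiers N trades computation speed for prediction accuracy, as noted above.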
For a given time instance, the inputs X1,1, X2,1, X3,1 in rows 1502, 1532, 1562, respectively, may be produced using a respective plurality of input features (e.g., the input 1402 in
Hashed rectangles in
The dimension d of the subset xi may be determined based on the dimension D of the input x as follows, in some implementations:
d=floor(√D). (Eqn. 15)
By selecting processing parameters (e.g., d, N, k, and/or the statistical operations), a trade-off between speed and accuracy may be adjusted.
With heterogeneous, multimodal feature vectors, adjusting processing parameters (e.g., d, N, k) may cause modification of the relative impact of the different types of features. By way of an illustration, if D=1024×1024×3+3, d may be determined using Eqn. 15 (d=1773). Accordingly, an individual classifier may be characterized by a probability of p=0.0017 of using an audio feature. In order for an audio feature to be of influence with a given level of certainty (e.g., greater than 50%), an impractically large ensemble size N may be required to see any effects of the audio features.
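The figures in this illustration may be reproduced as follows; this is a sketch, and the function name `subset_dim` and the at-least-one-audio-feature approximation are illustrative assumptions.

```python
import math

def subset_dim(D):
    """Eqn. 15: dimension d of each classifier's random feature subset."""
    return math.floor(math.sqrt(D))

D = 1024 * 1024 * 3 + 3           # RGB pixel features plus 3 audio features
d = subset_dim(D)                  # 1773 for this illustration
# Probability that one random subset of size d contains at least one of the
# 3 audio features (approximating sampling without replacement):
p_one = d / D                      # chance a particular feature is selected
p_audio = 1 - (1 - p_one) ** 3     # approximately 0.0017, as in the text
```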
In some implementations of on-line learning for robot navigation, the input vector x may be configured by concatenating the RGB values of the pixels in an image (e.g., obtained using video camera 166 in
In order to facilitate contributions from different types of signals for determining a distance measure between features in a metric space (e.g., Euclidean distance), data from highly redundant input (e.g., the RGB pixel values) may be normalized. Various other distance measures (metrics) may be utilized, e.g., Mahalanobis, Manhattan, Hamming, Chebyshev, Minkowski, and/or other metrics.
In some implementations, the normalization may comprise shifting and/or scaling input features to a given value range (e.g., A1=64 to A2=196 for an 8-bit pixel value, a 0 to 1 range, and/or another range). In one or more implementations, the normalization may be configured based on determining an on-line estimate of the mean and standard deviation of feature values to obtain a z-score for individual features (pixels). In one such implementation, for a given pixel (e.g., the pixel at location (i1,i2)) a pair of values may be stored in history memory: one for the pixel mean and another for the pixel standard deviation. In some implementations, one or more parameters related to the history of the input (e.g., pixel statistics) may be computed over a given interval and/or the total duration of training. In one or more implementations, the learning process may be configured to enable a user to reset the contents of the parameter (e.g., pixel statistics).
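A minimal sketch of the on-line z-score normalization follows. Welford's running mean/variance algorithm is an assumed choice here, and the class name and reset interface are illustrative.

```python
import numpy as np

class OnlineZScore:
    """Per-feature (per-pixel) running mean/std via Welford's algorithm.

    `update()` accumulates statistics frame by frame; `normalize()` returns
    the z-scored frame; `reset()` clears the accumulated history, as the
    user-reset capability described above."""
    def __init__(self, shape):
        self.reset(shape)

    def reset(self, shape=None):
        if shape is not None:
            self._shape = shape
        self.n = 0
        self.mean = np.zeros(self._shape)
        self.m2 = np.zeros(self._shape)   # sum of squared deviations

    def update(self, frame):
        self.n += 1
        delta = frame - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (frame - self.mean)

    def normalize(self, frame):
        std = np.sqrt(self.m2 / max(self.n - 1, 1))
        # Guard against zero std for constant pixels.
        return (frame - self.mean) / np.where(std > 0, std, 1.0)
```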
In some implementations, data for one or more inputs may be scaled by a parameter NF, where NF is configured based on the overall number of features of a given feature type (e.g., the number of pixels in a subset). In some implementations, the scaling parameter may be selected from the range between √NF and 10×NF.
In some implementations, the feature scaling operation may comprise determining an average distance measure for a plurality of input feature instances (e.g., the distance between 2-100 images for images acquired at 25 fps) and scaling the input in accordance with the average distance measure. Various scaling implementations may be employed, e.g., scaling the less redundant input, scaling the highly redundant input, and/or a combination thereof. The scaling operation may enable reducing disparity between contributions to the distance determination from a highly redundant input (e.g., video and/or other input) and less redundant input (e.g., audio, touch sensor, binary, and/or other input).
The feature scaling may be configured based on an observed and/or an expected characteristic or characteristics of a feature that may be salient to the action. By way of an illustration of an implementation of vision based robotic navigation, size of a target, e.g., number of pixels and/or cumulative pixel value corresponding to a ball 174 in
When determining feature-action associations, traditional RKNN methodologies of the prior art may discount data provided via sensor modalities (e.g., audio, touch) characterized by fewer dimensions (fewer features) compared to other modalities (e.g., video). In some implementations of the present disclosure, a normalization operation may be applied to data of individual sensory modalities. The normalization operation may be used to increase and/or decrease the contribution of data of one modality relative to the contribution of data of another modality to the RKNN distance determination. In some implementations, the normalization may comprise selecting a given number of mandatory features (e.g., the feature xl described above with respect to
In some applications wherein data from two modalities with a greatly different number of features (e.g., video and audio) may be used with RKNN, the distance between any two samples may be dominated by the sensory modality with the greater number of features (e.g., video).
Equalization may be applied so that the contribution of an individual sensory modality to expected distances is comparable to the contribution from another modality's data. In some implementations, the equalization may comprise determining an on-line estimate of the mean and standard deviation of individual features, and using these estimates to calculate a normalizing constant Cs for an individual sensory modality s such that the expected Euclidean distance between two samples, measured using only the features in modality s, is 1.0. Weights applied to data of a given modality (to further reduce the mean squared error) may be treated as training parameters that may be optimized during training.
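The per-modality equalization may be sketched as follows. This sketch assumes the expected squared distance is used as a proxy for the expected distance: for independent samples, E[(x_j − x′_j)²] = 2·var_j per feature, so scaling each feature of modality s by Cs = 1/√(2·Σ var_j) makes the root-mean-square distance within that modality equal to 1.0. The function names are illustrative.

```python
import numpy as np

def modality_constant(samples):
    """Normalizing constant C_s for one sensory modality.

    `samples` is an (n_samples, n_features) array of observations of that
    modality's features.  Scaling every feature by C_s makes the expected
    squared Euclidean distance between two independent samples equal 1.0."""
    var = np.var(samples, axis=0, ddof=1)        # per-feature variance estimate
    return 1.0 / np.sqrt(2.0 * var.sum())

def equalize(samples_by_modality):
    """Scale each modality by its constant and concatenate the features,
    so no modality dominates the RKNN distance determination."""
    return np.hstack([modality_constant(s) * s for s in samples_by_modality])
```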
The RKNN approach may be employed for determining the relative importance of the features for producing a given output. Feature relevance may be determined based on an error measure produced by individual KNN classifiers that contain those features. In some implementations, a more relevant (e.g., “better”) feature for a given output may correspond to a lower error of the individual KNN classifier(s) that contain that feature.
In some implementations, e.g., such as described with respect to
In one or more implementations, the computational load for a classification system may be characterized by being able to perform between 10 and 20 classifications per second (CPS) processing video input comprising a sequence of RGB frames of 12×12 pixel resolution refreshed at 25 frames per second. The processing system may comprise an embedded computer system comprising a processing component (e.g., Qualcomm Snapdragon 805/806) comprising a CPU component capable of delivering 210 Mega-Floating-point Operations Per Second (MFLOPS) and a GPU component capable of delivering 57 GFLOPS with maximum combined power draw of no more than about 2.5 W.
In some implementations, the RKNN may be utilized in order to determine a feature ranking parameter at a target rate (e.g., 15 CPS) while conforming to the processing load capacity and/or power draw limit by periodically re-initializing individual KNN classifiers on a rotating basis (i.e., not all at once) with a random set of features.
In order to re-populate the KNN classifier subsets (e.g., 1404, 1406, 1408 in
In some implementations of RKNN classifiers, feature assignment for a KNN classifier may be biased using a random process. By way of an illustration, the random process used for selection of indexes for a classifier may be biased to increase the probability of selecting features with a higher utility within the input (e.g. 1402 in
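The biased selection may be sketched as sampling without replacement with probability proportional to a utility estimate; the utility values themselves would come from the per-classifier error ranking described above. The function name and the proportional-sampling rule are assumptions for this sketch.

```python
import numpy as np

def biased_feature_subset(utility, d, rng=None):
    """Draw a feature subset of size d without replacement, with selection
    probability proportional to each feature's utility estimate, so that
    higher-utility features are more likely to seed a re-initialized
    classifier."""
    rng = rng or np.random.default_rng()
    p = np.asarray(utility, dtype=float)
    p = p / p.sum()                              # normalize to a distribution
    return rng.choice(len(p), size=d, replace=False, p=p)
```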
In one or more implementations of RKNN, ensemble evolutionary algorithms (EA) may be employed. The evolving population may comprise subsets of the classifiers. The genotype/phenotype characterizing the EA process may comprise the particular subset of features chosen for a given classifier. Low-utility classifiers may be culled from the population. New classifiers may be produced by recombining and/or mutating the existing genotypes in the population of classifiers. The EA approach may produce a higher-performing ensemble of KNN classifiers, compared to existing approaches.
Apparatus and methods for behavioral undo during training of robots are disclosed herein, in accordance with one or more implementations. In some implementations, a robotic device may comprise a controller operating a software component (e.g., the BrainOS® software platform) configured to enable training. A user may control/train the robot with a remote device (e.g., comprising a Gamepad® controller and an iOS® application, and/or a handset device (e.g., a smartphone)). Training of the robot's controller may be based on the user observing the robot's actions and sending one or more target control commands to the robot via the training handset. The trained controller of the robot may comprise a trained configuration configured to enable autonomous operation (e.g., without teaching input) by the robotic device. The trained configuration may be stored. A saved configuration may be loaded into the robot being trained, thereby providing one or more trained behaviors to the robot. In some implementations, the trained configuration may be loaded to one or more other robots in order to provide learned behaviors. Subsequent to loading of the saved configuration, the controller learning process may match the process configuration that was present during saving of the configuration.
In some implementations, the BrainOS configuration may be stored (saved) automatically based on timer expiration (e.g., periodic saving) and/or based on an event (e.g., triggered by a user and/or based on a number of issued control commands).
The autosave timer interval T may be configured by the user via, e.g., an interface of the training handset. In some implementations, the user may configure the controller process to save the BrainOS configuration when the user issues a command (correction) to the robot using the training handset. In one or more implementations, the training configuration may be saved upon receipt of n commands from the user (n≥1).
In some implementations, user commands (corrections) may arrive in one or more clusters (e.g., a plurality of commands) that may be interleaved by periods of user inactivity (e.g., training a race car to traverse a racetrack). In one or more implementations, a given command (e.g., the first, the last, and/or other command) in the cluster may trigger saving of the configuration.
In one or more implementations, the BrainOS may be configured to execute periodic and event-based autosave mechanisms contemporaneously with one another.
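The contemporaneous periodic and event-based autosave mechanisms may be sketched as follows. The class name, the injected clock, and the every-n-th-command rule are assumptions for this sketch, not the BrainOS API.

```python
import time

class AutosaveTrigger:
    """Combines a periodic timer (interval T seconds) with an event trigger
    that fires on every n-th user command; either condition causes a save."""
    def __init__(self, T=60.0, n=1, clock=time.monotonic):
        self.T, self.n, self.clock = T, n, clock
        self.last_save = clock()
        self.commands = 0

    def on_command(self):
        """Call when a user command (correction) is received."""
        self.commands += 1
        return self._maybe_save(self.commands % self.n == 0)

    def poll(self):
        """Call periodically to honor the timer-based trigger."""
        return self._maybe_save(False)

    def _maybe_save(self, event):
        if event or self.clock() - self.last_save >= self.T:
            self.last_save = self.clock()
            self.commands = 0
            return True       # caller saves the configuration here
        return False
```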
Trained behaviors of the robotic device may be configured based on learning of associations between sensory context (e.g., presence of an obstacle in front of the robotic vehicle) and a respective action (e.g., right turn) during training.
It may be beneficial to remove one or more trained behaviors from the trained configuration of the controller. In some implementations, the trained behavior removal may be based on one or more of performance below a target level, changes of the robot configuration (e.g., replacement of a wheel with a skate), changes in the robot's environment, learning of erroneous associations, and/or other causes.
The BrainOS software platform may be configured to save one or more parameters characterizing the learning process and/or the learned behaviors. In some implementations, the saved parameters may be used to produce (recreate) the BrainOS instance, for example, by specifying the sensory processing algorithms used for learning and describing the learning algorithms. In one or more implementations, the saved parameters may be used to characterize learning parameters (e.g., the learning rate, weights in an artificial neuron network, entries in a look up table, and/or other parameters).
For example, the configuration saving may comprise storing of weights of a neural network that may characterize the mapping of the sensory input to motor outputs, and/or weights of a feature extractor network component that may be used to process the sensory input.
The BrainOS software platform may be configured to enable users to selectively remove a learned behavior (and/or a portion thereof) via an undo and/or time machine operation.
At a given time, a user indication may be used to trigger an UNDO operation. In some implementations, the UNDO operation may comprise loading of the previously saved configuration. By loading at time t1 the configuration saved at time t0<t1, the robot controller effectively ‘forgets’ what it learned in time interval t0<t<t1.
The user UNDO indication may be configured based on one or more of the user activating a user interface element (e.g., a physical and/or virtual touch-screen button), a voice command, a gesture, and/or other actions, in one or more implementations. One or more undo indications may be utilized in order to remove multiple behaviors (and/or multiple versions of a given behavior). By way of an illustration, pressing Ctrl+Z in MS Word® may effectuate UNDO of successive edits. Similarly, providing a plurality of UNDO indications may cause removal of multiple learned associations.
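The successive-undo behavior may be sketched as a stack of saved configurations; each UNDO indication restores the previous snapshot, so learning accrued after that save is effectively forgotten. The class name is illustrative, and a real configuration would hold, e.g., network weights or look-up-table entries rather than strings.

```python
class ConfigHistory:
    """Saved learning-process configurations in time order."""
    def __init__(self, initial_config):
        self._stack = [initial_config]

    def save(self, config):
        """Record a snapshot (periodic or event-based autosave)."""
        self._stack.append(config)

    def undo(self):
        """Discard learning since the last save; return the configuration
        to load into the robot.  Repeated calls walk further back in time."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self._stack[-1]
```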
In one or more implementations, the undo operation may be effectuated using a timeline comprising, e.g., a plurality of bookmarks (e.g., shown in
Combiner apparatus and methods are disclosed herein, in accordance with one or more implementations. In some implementations of supervised training of robots, control instructions (also referred to as corrections) produced by the trainer (e.g., a human) may be combined with control instructions produced by the robot controller (predictions).
In some implementations, the trainer may be provided with the control of the robot during training. Upon completion of the training, the robot may be configured to operate autonomously. In one or more implementations, training may comprise periods of autonomous operation and periods of learning, wherein trainer's control input may be combined with the robot's internally generated control.
The BrainOS software platform may be configured to enable online learning wherein trainer's input may be combined with the internally produced control instructions in real time during operation of the robotic device. That is, the input from the trainer may be applied to have an “on-line” effect on the robot's state during training. The robot not only learns to move forward in this sensory context, but it also actually moves forward into some new sensory context, ready to be taught from the new location or configuration.
By way of an illustration, when training a remotely controlled car using a joystick, the car may be trained to navigate a straight trajectory (e.g., autonomously move forward). Subsequently, a trainer may elect to commence training of one or more turn behaviors (e.g., turn left/right/turnaround/drive in a circle and/or other maneuvers). The trainer may use the joystick to provide left/right turn commands to the car to train it to turn. In one or more implementations, the trainer may assume the control during the turn action and/or provide the turn instructions incrementally (e.g., in three 30° increments to complete 90° turn).
Conversely, the car may be trained to follow a circle. In order to train the car to follow a straight line the trainer may utilize the joystick to provide the training input. In some implementations, the trainer may utilize the joystick forward position in order to override the car internal control input and to cause it to proceed forward. In one or more implementations, the trainer may utilize the joystick left/right position in order to provide an additive control input so as to guide the car to proceed in a straight line.
A controller of the robot may comprise a combiner component configured to effectuate the process of combining the training input (correction) with the internally generated control (prediction). In some implementations, the combiner may be configured to allocate a greater priority (e.g., larger weight) to the correction input, e.g., to implement “the trainer is always right” mode of operation. When the robotic platform (e.g., the car) comprises multiple degrees of freedom (DOF), the training process may be configured to operate (e.g., train) a given DOF at a given time.
In some implementations, the combiner component may be operable in accordance with a Full Override process, wherein input by the trainer takes precedence over (e.g., overrides) the internally generated (predicted) control signal. When operable in the override mode, the controller may learn the context-action association and produce a predicted control signal. However, the prediction may not be acted upon. By way of an illustration of training a robot to traverse an obstacle course, the full override combiner may enable the trainer to communicate to the controller of the robot which actions to execute in a given portion of the course given the corresponding sensory context (e.g., position of obstacles). Use of the Full Override combiner process may reduce the number of trials required to attain a target level of performance and reduce the probability of collisions with obstacles, thereby preventing damage to the robot.
In some implementations, the combiner component may be operable in accordance with an Additive Combiner process. When operable in the Additive Combiner mode, the trainer's control input may be combined with the predictor output. In some implementations, the trainer's input and the predicted control may be configured in “delta” space wherein the controllable parameter (e.g., correction 408 in
For example, if the target angle is 45°, the trainer's input may initially exceed the target angle in order to reduce learning time. Subsequently as the robot begins to move its current trajectory towards the target (e.g., towards 45°), the trainer may reduce the input angle in order to prevent overshooting the target trajectory angle.
The Additive Combiner process may advantageously enable training of one DOF at a given time instance, thereby facilitating training of robotic devices characterized by multiple DOF. During training of the robot using the Additive Combiner process, both the trainer and the robot contribute to the output (executed action). The trainer may adjudge the learning progress based on a comparison of the trainer's contribution and the action by the robot. The Additive Combiner process may facilitate provision of small corrections (e.g., a heading change of a few degrees to direct the robot trajectory along a 45° heading). In some implementations, the default state of the robot's controller may be capable of providing control output that operates the robot within a range of the target trajectory (e.g., drive forward in a straight line); in such implementations, the Additive Combiner may provide an economical approach to correcting the default trajectory to the target trajectory. By way of an illustration, the natural predisposition of a randomly-initialized neural network may be sufficient for some behaviors (e.g., the neural network may have a tendency to turn away from certain obstacles without training). This means that memory resources (e.g., weights) of the learning controller process may not have to be modified in some cases. When the predictor selects an action that is acceptable to the trainer, network memory modifications may not be required. The network may be idiosyncratic in the way it performs certain tasks or actions, but reduced computational resources are required for achieving performance.
During training of a robot by a human trainer using the Additive Combiner, the teacher may find the experience appealing as the robot begins to take over (assist) as the training progresses. Such an experience may encourage the trainer (particularly a novice) to perform training of robots.
In some implementations, the combiner (e.g., 418 of the controller 400 in
In some implementations, e.g., such as illustrated in
In some implementations, e.g., such as illustrated in
In some implementations, the touchfader combiner may comprise an overriding control method; the user can implement a “virtual additive” function by touching the screen just a bit to the left or to the right of the slider's current position.
In one or more implementations, the combiner (e.g., 414 in
If p>R×c,
b=c;
else
b=p+c; (Eqn. 16)
where b denotes the combiner output (e.g., 430 in
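Eqn. 16 may be sketched directly. The default value R=1.0 is an assumption for illustration; R is the threshold ratio parameter of the combiner, and the function name is illustrative.

```python
def combine_eqn16(p, c, R=1.0):
    """Combiner of Eqn. 16: when the prediction p exceeds R times the
    correction c, the correction overrides the output (b = c); otherwise
    the two signals are combined additively (b = p + c)."""
    return c if p > R * c else p + c
```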
In some implementations, the interpolation may be expressed as follows:
where
p, predictor signal in [−1 1];
b, motor control (combiner output) signal in [−1 1];
c, corrector signal in [−1 1].
In some implementations, the combiner may be operable in accordance with the Threshold Nonlinearity (TN) process. The TN combiner process may be configured to provide additive and override functionality depending on the relative magnitude of the correction and prediction components. In some implementations, the TN combiner operation may be configured as follows:
b=p+c,
b=1 when b>1;
b=−1 when b<−1; (Eqn. 18)
where
p, predictor signal in [−1 1];
b, motor control (combiner output) signal in [−1 1];
c, corrector signal in [−2 2] range.
The combiner of Eqn. 18 may be operated to provide additive functionality. A threshold nonlinearity of the combiner of Eqn. 18 may be configured such that a large corrector input (in excess of the maximum magnitude of the predicted component, e.g., 2) may be used to override the predictor component. By way of an illustration of an autonomous robot approaching an obstacle, when the predicted output (e.g., −1) may cause a collision with the obstacle, an additive combiner with a maximum correction signal value of 1 may be unable to prevent the collision. A corrector signal range (e.g., from −2 to 2) may therefore be used that exceeds the predictor signal range (e.g., from −1 to 1) and the combined signal range (e.g., from −1 to 1). In the above example, the correction input of 2 may be used to effectively override the (erroneous) predicted output and guide the robot away from the obstacle.
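The Threshold Nonlinearity combiner of Eqn. 18 may be sketched as follows; the function name is illustrative.

```python
def combine_tn(p, c):
    """TN combiner of Eqn. 18: additive combination b = p + c with the
    output clipped to [-1, 1].  Because the corrector range [-2, 2] exceeds
    the predictor range [-1, 1], a full-magnitude correction (|c| = 2)
    always saturates the output, overriding an erroneous prediction."""
    return max(-1.0, min(1.0, p + c))
```

For the obstacle illustration above, an erroneous prediction of −1 combined with a correction of 2 yields a saturated output of 1, steering the robot away from the obstacle.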
The combiner of Eqn. 18 may be employed with the delta control process wherein the controllable parameter (e.g., correction 408) may be used to modify the current value of the system state (e.g., vehicle acceleration, motor torque, and/or other parameter) rather than indicating a target value (setpoint). In some implementations, the delta control approach may be utilized with a continuously varying robot state parameter (e.g., speed, orientation). In one or more implementations, the delta control approach may be used for manipulating a discrete state space controller (e.g., controlling an elevator, a pick and place manufacturing robot, a shelf-stocking robot, and/or other control application).
Systems and methods for training path navigation are disclosed herein. In some implementations, a robot may be trained to follow a path. The image shift determination may inform the robot of whether the robot is too far off to the left or right. The robot may adjust its heading to compensate. A PID controller may be used to add necessary negative feedback to make the system stable in following the path, in some implementations. Prior information about where in the training sequence the robot is currently operating may guide the robot in making correct inferences about new camera images, and may help the robot narrow the search space to gain computational efficiency.
One or more implementations described herein may provide a mechanism for enabling a robot to learn navigating a target trajectory while reducing deviation from a target path. In some implementations, the robot may comprise a robotic vehicle (e.g., 160 in
In one or more implementations, images 2000, 2010 may be obtained with a camera 166 mounted on a robotic vehicle 160 of
During training, images (e.g., raw and/or pre-processed) may be stored in a memory buffer (training buffer). In one or more implementations, preprocessing operations may comprise resampling, cropping, light balancing, and/or feature extraction. Motor commands issued by a trainer corresponding to time instances when the images are acquired may be stored. Additional sensory information (e.g., vehicle motion information, ambient environment information, vehicle operational parameters) corresponding to time instances when the images are acquired may be stored.
During autonomous operation, the control process of the robot may be configured to compare a given (e.g., the most recent, current) image with one or more of the images from the training buffer. In some implementations, the matching process may comprise comparing the given image to every image in the training buffer.
For computational efficiency reasons, it may not be desirable and/or feasible to compare each new camera image with every one of the stored images seen during training, according to some implementations. The robot may take advantage of the prior information about what are the likely regions of the path where it might be located, and only search those regions. The robot may search a random sample of other regions in case the prior information is inaccurate or invalidated for some reason.
In order to reduce computational requirements of the image match process, the given image may be compared to a subset of images from the training buffer using image match process described in detail below.
In some implementations, the search space may be narrowed using a form of particle filtering, wherein the robot maintains a plurality of particles indicating the likely parts of the path. That is, individual particles point at particular images from the training buffer. As a new camera image arrives, the robot may search those images in the training buffer which are close to the particles. Individual particles may be moved to a nearby location in the training buffer where the stored image matches closely with the newly arrived image. Particles with a poor match with the new image may be deleted. New particles may be created, either in the vicinity of the other particles, or from randomly sampled locations in the training buffer, shown in
The comparison subset of images may comprise a plurality of previously matched images and a plurality of randomly selected images (e.g., 20 in some implementations). The previously matched images may correspond to one or more tracked sequences (also referred to as particles). The particle characterized by the best match (e.g., comprising the previously used image) may be referred to as the primary particle. In some implementations, the best match image may be complemented by one or more second best image matches, corresponding to secondary particles.
The given image may be compared to images of the primary particle set. In some implementations, the primary particle set may comprise a previously used image I0 (e.g., 2000 in
In one or more implementations, the given image may be compared to images of one or more secondary particle set(s). A secondary particle set may comprise the previously identified second best IS1 and one or more (e.g., 2) images following the IS1 image in time in the training buffer. In some implementations, the secondary particle set may further comprise one or more (e.g., 2) images preceding the IS1 image in time in the training buffer. In one or more implementations, additional secondary particle sets of images may be configured in the manner that is described above. The particle sets and the randomly selected images may be referred to as the match search set.
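The assembly of the match search set may be sketched as follows: for the primary and each secondary particle (a best-match index into the training buffer), include the stored image plus a few frames before and after it in time, then add randomly sampled indices in case the prior information is inaccurate. The function name, the window size, and the random-sample count are assumptions for this sketch.

```python
import random

def match_search_set(train_len, primary, secondaries, n_random=20,
                     window=2, rng=None):
    """Indices into the training buffer to compare against the new image."""
    rng = rng or random.Random()
    idx = set()
    for particle in [primary, *secondaries]:
        # Include the particle's image and `window` frames before/after it.
        lo = max(0, particle - window)
        hi = min(train_len - 1, particle + window)
        idx.update(range(lo, hi + 1))
    # Random samples guard against invalid prior information.
    idx.update(rng.randrange(train_len) for _ in range(n_random))
    return sorted(idx)
```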
In some implementations, the given image may be compared to images (e.g., 10-50) that may be randomly selected from images in the training buffer.
The image match process may be configured as follows, in accordance with one or more implementations. The amount of shift (e.g., in x and/or in y directions) between the given image and individual images of the match search set may be determined using the phase correlation approach. To determine whether the new image is shifted left or right compared with a stored image, a cross-correlation between the two images (e.g., 2000, 2010 in
In some implementations, the cross-correlation between two images may be determined by utilizing the spatial frequency domain. A windowing function (e.g., Hann, Gaussian, cosine, Hamming, and/or other windowing function) may be applied to individual images to produce windowed image and reduce edge effects. A fast-Fourier transform (FFT) may be performed on the windowed images to obtain a spatial frequency representation of the images. Normalized cross-power spectrum may be determined from the two spatial frequency representations. An inverse FFT may be applied to transform the cross spectrum to x,y domain and to obtain the cross-correlation. The argmax of the cross-correlation may be determined in order to obtain x,y coordinates (shift values) corresponding to maximum cross-correlation. In some implementations wherein x,y dimension may correspond to integer values (e.g., 1 pixel), the cross-correlation matrix may be interpolated onto a grid with greater resolution (e.g., 0.5 or 0.25 pixel grid).
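The sequence of steps above may be sketched with NumPy as follows. This sketch operates at integer-pixel resolution and omits the sub-pixel interpolation step mentioned above; the Hann window and the function name are illustrative choices.

```python
import numpy as np

def phase_correlation_shift(img_a, img_b):
    """Estimate the (dy, dx) shift between two equal-size grayscale frames:
    window both images, take their FFTs, form the normalized cross-power
    spectrum, inverse-transform to the x,y domain, and take the argmax of
    the resulting cross-correlation surface."""
    win = np.outer(np.hanning(img_a.shape[0]), np.hanning(img_a.shape[1]))
    Fa = np.fft.fft2(img_a * win)
    Fb = np.fft.fft2(img_b * win)
    cross = np.conj(Fa) * Fb
    cross /= np.abs(cross) + 1e-12            # normalized cross-power spectrum
    corr = np.real(np.fft.ifft2(cross))       # cross-correlation surface
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped indices to signed shifts.
    if dy > img_a.shape[0] // 2:
        dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2:
        dx -= img_a.shape[1]
    return dy, dx
```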
Image shift parameters determined from the image correlation operation may be used when determining which image(s) from the match search set may be considered as a match to the given image. In some implementations, the given image may be shifted by the amount determined from the image matching operation. By way of an illustration, image 2010 may be shifted to the right by the amount depicted by arrow 2015 in
A similarity metric may be determined between the shifted/trimmed frames (e.g., 2020, 2040 in
As the robot is following a learned path, it may expect to receive approximately the same camera images in the same order as seen during training. In practice, the robot may not be expected to instantaneously jump from one part of the path to another part. It may be useful to determine and take into account prior information about which sequence number(s) of the training buffer are the most likely to be selected as the best match. The assigned likelihood of a new camera image actually being taken from the same location as a particular image in the buffer of training images may be related to how well the new image matches up with the stored image, as well as how likely that location was in the first place according to the prior information, as shown and described with respect to
In some implementations, the history of the image matching process may be utilized in order to determine the best match image. By way of an illustration, if a match search set image with the best match score (e.g., the lowest norm) belongs to the primary particle set, then it may be selected as the best match. If the image with the best match score belongs to a secondary particle set, then it may be selected based on an evaluation of an image history parameter. In some implementations, the image history parameter evaluation may be performed as follows:
(i) a running-window average match score may be determined by averaging over the last N images within individual particle sets. In some implementations, the averaging window size may be selected equal to 3 for video images acquired at 40 ms intervals and vehicle navigation speeds between 0.1 and 2 m/s. Other window lengths (e.g., 4-20 images) may be utilized and/or configured in accordance with the expected navigation speed and/or video acquisition rate; (ii) the average match score for individual secondary particle sets may be compared to individual match scores from the match search set; (iii) the best match image from a secondary particle set may be selected if it has the best match score (e.g., the lowest norm) of the individual match scores from the match search set and its window-averaged match score is better (e.g., a lower norm) compared to the window-averaged match score of the primary particle set.
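Steps (i)-(iii) above may be sketched as follows. Class and function names are illustrative, and lower scores are treated as better matches (lower norm), consistent with the text.

```python
from collections import deque

class ParticleTrack:
    """Running-window match-score history for one particle set (a sketch)."""
    def __init__(self, window=3):
        # Keep only the last N match scores, per step (i).
        self.scores = deque(maxlen=window)

    def add(self, score):
        self.scores.append(score)

    def window_average(self):
        return sum(self.scores) / len(self.scores)

def select_best_match(primary, secondaries, current_scores):
    """current_scores: dict mapping set name -> this frame's match score
    (lower = better). Returns the name of the winning particle set."""
    best_set = min(current_scores, key=current_scores.get)
    if best_set == 'primary':
        return 'primary'
    # Per step (iii): a secondary wins only if its window-averaged score
    # also beats the primary's window-averaged score.
    if secondaries[best_set].window_average() < primary.window_average():
        return best_set
    return 'primary'
```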
The primary and/or secondary particle sets may be discontinued (discarded). In some implementations, the discarding may be configured based on a comparison of the match score for a given particle with the match scores for randomly selected images. If the image match score for a given particle is worse than the individual scores for the randomly selected images, the given particle may be discontinued. The discontinued particle may be replaced with the random image associated with the best score.
Using this method, the estimate that the vehicle is in a given location may be based on data associated with previous frames, as accrued by each particle. For example, assuming independent noise across frames, a more robust estimate of the error in position could be achieved by calculating the product, over recent frames, of the likelihood that the sensor data came from a given particle. Likelihood may be approximated using an exponentiated energy model, or explicitly calculated with a parametric statistical model. Particle deletion may be implemented using a temporally decaying cumulative log probability that deletes a given particle when the probability is lower than a fixed threshold. Additional rejection-sampling techniques (e.g., similar to a Metropolis-Hastings process) may be used to define a threshold.
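The temporally decaying cumulative log probability and the fixed deletion threshold may be sketched as follows. The decay constant and threshold value are illustrative assumptions; the disclosure does not specify them.

```python
import math

def update_cumulative_log_prob(cum_log_p, frame_log_likelihood, decay=0.9):
    """Temporally decaying cumulative log probability for one particle:
    older evidence is down-weighted geometrically by the decay factor."""
    return decay * cum_log_p + frame_log_likelihood

def should_delete(cum_log_p, threshold=math.log(1e-6)):
    """Delete the particle when its decayed cumulative log probability
    falls below a fixed threshold."""
    return cum_log_p < threshold
```

With a geometric decay, the cumulative value settles near `frame_log_likelihood / (1 - decay)`, so a particle whose recent frames are consistently unlikely crosses the threshold and is discontinued, while occasional bad frames are forgotten.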
The best match image obtained using image match methodology (e.g., such as described herein) may be used to determine changes (corrections) to motor commands during path navigation by a robotic vehicle. By way of an illustration, if the best match image (e.g., 2000 in
Exemplary implementations of the methodology described herein may be applicable to controlling the trajectory of a robotic device due to (i) position mismatch (e.g., the robot being located physically to the left of a target location); and/or (ii) orientation mismatch (e.g., the robot being in the same physical location while oriented to the left of the target trajectory). To illustrate, assuming the camera faces straight ahead, the center of the image may be the spot toward which the robot is headed. Thus, if this spot is to the left of the spot toward which the robot is supposed to be headed (as defined by the camera image seen during training), then the robot may need to adjust its heading rightwards.
During operation, when the robot follows the target trajectory, the shift amount determined using the image matching process may be close to 0 (this configuration may be referred to as "the robot stays on track"). In some implementations, the shift amount may be utilized as an error metric by the control process of the robot. The steering signal (which may be adjusted leftwards or rightwards) may be selected as the control variable for the process. A negative feedback loop may be used in order to reduce the error metric to (and/or maintain it at) a target level during operation of the robot. In some implementations, the target error level may comprise zero displacement.
A PID controller may be used in order to reduce/maintain the error metric during operation of the robot. In some implementations, motor commands at a given time step may be obtained by taking the stored motor commands from the training buffer that may correspond to the best matching stored image. Those motor commands may be combined with the output from the PID controller in order to stabilize operation of the robot.
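The negative feedback loop and the combination of replayed training-buffer commands with the PID output may be sketched as follows. Gain values, the frame interval `dt`, and function names are illustrative assumptions.

```python
class PID:
    """Minimal PID controller driving the image-shift error toward zero."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

def steering_command(stored_steering, pixel_shift, pid, dt=0.04):
    """Combine the stored steering command from the training buffer (for the
    best matching image) with a PID correction driven by the image-shift
    error. Target shift is 0, i.e., 'the robot stays on track'."""
    correction = pid.step(-pixel_shift, dt)  # negative feedback on the shift
    return stored_steering + correction
```

For example, a positive pixel shift (image features displaced rightward, robot heading left of the trained path) produces a negative correction that steers the robot back toward the path.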
Systems and methods for providing VOR (vestibulo-ocular reflex-like stabilization) for robots are disclosed herein, in accordance with one or more implementations. Exemplary implementations may provide VOR-like functionality for a robot. In some implementations, VOR for a robot may refer to the stabilization of the camera image while the robotic body is moving. In existing robotic platforms where the movement of the system might be subject to unexpected disturbances (e.g., a quadcopter, a two-wheeled robot (e.g., a Segway-type configuration), and/or other robotic platforms), this stabilization may improve the quality of the camera signal. Exemplary implementations may, for example, reduce blurring associated with the motion of a camera. The cleaned camera image may later be used for various applications (e.g., recording of stable video footage, cleaner sensor data for better post-processing, and/or other applications).
Image stabilization (IS) may include a family of techniques used to compensate for pan, tilt, and roll (e.g., angular movement, equivalent to yaw, pitch and roll) of the imaging device. That family of techniques may include one or more of optical image stabilization, digital image stabilization, stabilization filters, orthogonal transfer CCD, camera stabilizer, and/or other techniques.
In some implementations, a camera stabilizer may utilize a gimbal device. According to some implementations, a gimbal may be a pivoted support that allows the rotation of an object about a single axis. A set of three gimbals mounted with orthogonal pivot axes may be used to allow an object mounted on the innermost gimbal to remain independent of the rotation of its support.
The system may use a physical camera stabilizer to solve the problem of stabilizing the camera mount, in some implementations. This approach may enable VOR-like functionality on a robot with low cost sensors (e.g., gyroscope, accelerometer, compass, and/or other sensors) and low cost actuators (e.g., an open loop control system, no feedback from the servos, and/or other actuators). In comparison, existing systems typically either use a fairly complex and expensive mechanical system (e.g., a gimbal camera) and/or a computationally expensive software solution that is not adapted to small robots with embedded low-powered processing boards.
Exemplary implementations may not be computationally expensive and may provide one or more of the following properties: changing the center of the visual field dynamically, compensating selectively for unexpected movements versus desired movements, dynamically activating and deactivating the VOR-like functionality, compensating for sensory-motor delays if coupled with a predictive model, and/or other properties.
Some implementations may assume that the camera to be stabilized is mounted on a set of one, two, or three servos, wherein an individual servo is allowed to rotate the camera about one axis (e.g., pan, tilt, or roll). The combination of servos may provide up to three degrees of freedom for the stabilization of the movement of the camera.
The figure below illustrates an exemplary architecture used to accomplish the VOR-like functionality stabilization of a camera image, in accordance with one or more implementations.
The VOR-like functionality module may integrate inputs from sensors (e.g., state of the system, blue box) and higher level signal (e.g., sensorimotor control systems, red box) to determine the correction and desired position of the camera to stabilize the image (e.g., camera servos position, right part of the diagram).
The state of the robot may be provided by one or more sensors that provide the global orientation of the robot and/or a derivative of the global orientation in multiple axes. Some implementations may include one or more of a gyroscope, an accelerometer, a magnetometer, and/or other sensors. A gyroscope may include a device that measures orientation changes based on the principles of angular momentum. Some implementations may utilize a three-axis gyroscope, which may provide the velocity of change in the three directions x, y, and z. An accelerometer may include an electromechanical device that measures acceleration forces. These forces may be static, like the constant force of gravity, or dynamic, caused by moving or vibrating the accelerometer. By measuring the amount of static acceleration due to gravity, the angle at which the device is tilted with respect to the earth may be determined. A magnetometer may include a device that measures the direction of the magnetic field at a point in space. In some implementations, the system may include a three-axis magnetometer.
The higher level inputs may be provided by a sensorimotor control process, which may control the desired movement of the robot (e.g., output of the motor control system) and/or the desired focus point of the camera (e.g., output of the vision control system).
The motor control system may represent any process and/or device configured to send a motor command to the robot. A motor command may, for example, be represented in a different space (e.g., a desired set point, a new desired linear and angular velocity for a wheeled robot, a torque command, and/or other representations). A motor control system may, for example, include one or more of a wireless joystick connected to the robot, a process configured to follow a pre-defined path, a learning system, and/or other control mechanisms.
The vision control system may represent any process and/or device configured to update the focus point of the camera to be stabilized, and/or to switch the VOR-like functionality module on and off. In some implementations, a vision control system may include a handheld computing device (e.g., a tablet computer, a Smartphone, and/or other handheld device) where the user can tap, on the screen displaying the camera stream, the position where the camera image should be centered, and/or an automatic tracker that follows an object of interest in the visual field.
At individual time steps, the VOR-like functionality module may receive the change of orientation since the last step, as well as the new motor commands. In this stage, the focus point may be assumed to be fixed and be set for each servo.
In some implementations, the process may run in an infinite loop, and may exit the loop responsive to the main program of the robot being stopped. Before entering the loop, the desired position for individual servos may be set to the actual position of the servo. This may suggest that, in the absence of movement, the servo should not be moved.
If the VOR module is activated, new sensor values may be provided, the orientation of the robot may be updated, and the change of orientation over dt may be determined, according to some implementations. The motor command may be sent to the robot, and signals may be provided to the next module in order to update the new desired position.
The next stage of the VOR process may be to update the new desired position of individual servos. The desired position may account for (i) unexpected movement (such displacement should be compensated) versus (ii) desired movement (for which the VOR-like functionality should be counter-compensated). For a given servo i, this may be achieved by a twofold process, in some implementations. First, the desired position of the given servo may be added to or otherwise combined with the velocity of change for the particular axis multiplied by dt and a gain that is servo dependent (k1[i]). Second, the amplitude of the desired movement along individual axes, multiplied by dt and a gain that is also servo dependent (k2[i]), may be subtracted. Some implementations may assume knowledge of how a given motor command will affect the camera movement in each direction.
The new desired position may be provided to individual servos of the camera mount. The desired position may be decayed so that it slowly returns to the focus point over time. This may facilitate compensating, over time, for drift due to measurement error stemming from noise in the sensors. The gains k1 and k2 may not have to be perfect, in some implementations.
In some implementations, k1 and/or k2 may not be constants that achieve perfect compensation; instead, the system may exhibit a slow drift toward the focus point.
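The twofold per-servo update described above, together with the decay toward the focus point, may be sketched as a single step function. The gain names k1 and k2 follow the text; the decay constant, function name, and one-axis-per-servo pairing are illustrative assumptions.

```python
def update_servo_targets(desired, focus, gyro_rate, commanded_rate,
                         k1, k2, dt, decay=0.02):
    """One VOR-like update step for a list of camera-mount servos.

    desired:        current desired position per servo
    focus:          focus point per servo (where the camera should settle)
    gyro_rate:      measured angular velocity per axis (unexpected movement)
    commanded_rate: desired movement amplitude from the motor control system
    k1, k2:         servo-dependent gains, per the text
    """
    updated = []
    for i in range(len(desired)):
        d = desired[i]
        d += k1[i] * gyro_rate[i] * dt        # compensate unexpected movement
        d -= k2[i] * commanded_rate[i] * dt   # counter-compensate desired movement
        d += decay * (focus[i] - d)           # slow decay back to the focus point
        updated.append(d)
    return updated
```

Because of the decay term, an imperfect k1/k2 produces only a transient offset: in the absence of further disturbances, the desired position drifts back to the focus point, which also absorbs accumulated gyroscope measurement noise.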
In some implementations, the focus point of the camera may change dynamically by another process using the VOR module. Some implementations may include coupling the VOR system with a tracker (e.g., OpenTLD, MIL, and/or other tracker) such that the image is stabilized on the object of interest. Some implementations may involve coupling the VOR system with a user interface to control camera position. Such an interface may be a physical interface (e.g., a head-mounted device such as an Oculus Rift) configured to allow the user to move his/her head to define the new position and receive feedback from the camera on the head-mounted screen. Some implementations may include coupling the VOR system with a vision control system, ensuring that the robot will look in a direction perpendicular to the acceleration vector (toward the horizon).
The focus position of the camera may be a variable that can be updated by the vision control system. In this case, in the absence of unexpected movement, the “decay desired position” module may cause the camera to drift to the new position.
Compensation for sensory-motor delays may be included in implementations of the system. Some implementations may include a predictive module configured to compensate for sensorimotor delays and/or components of undesired movement that can be predicted based on the input of other sensors (once integrated). For example, according to some implementations, if the system goes into an oscillatory behavior, most of the oscillation may be predicted and compensated once the predictive module kicks in.
In some implementations, information from the gyroscope may be utilized to compensate for movement. In some implementations, a sensor fusion process may be utilized to integrate that information and improve the compensation.
The sensor fusion module may obtain a measurement from one or more of an accelerometer, magnetometer, gyroscope, and/or other source. The sensor fusion module may integrate the measurement(s) using a sensor fusion process to give an accurate estimation of the orientation of the system in space. The following figure illustrates an exemplary sensor fusion process, in accordance with one or more implementations.
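One common sensor fusion process consistent with the description above is a complementary filter, which fuses fast-but-drifting gyroscope integration with the noisy-but-drift-free tilt estimate from the accelerometer's static gravity component. This is a one-axis illustrative sketch, not the specific fusion process of the disclosure; the blend factor `alpha` is an assumed constant.

```python
import math

def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Fuse the gyro-integrated angle (fast, drifts over time) with the
    accelerometer tilt estimate (noisy, but anchored to gravity)."""
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle

def accel_tilt(ax, az):
    """Tilt angle (radians) from the static gravity components measured by
    the accelerometer, per the static-acceleration discussion above."""
    return math.atan2(ax, az)
```

At each time step the filter mostly trusts the gyroscope over the short term, while the small accelerometer contribution slowly pulls the estimate back toward the gravity-referenced angle, removing accumulated drift. A magnetometer can anchor the yaw axis the same way, since gravity carries no heading information.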
APPENDIX A presents exemplary code in the Python language that may be utilized with a two-wheeled, self-balancing, robotic platform (e.g., similar to a Segway-type configuration), compensating for pan and tilt, in accordance with one or more implementations.
Implementations of the principles of the disclosure may be applicable to a wide assortment of applications including computer-human interaction (e.g., recognition of gestures, voice, posture, face, and/or other interactions), controlling processes (e.g., processes associated with an industrial robot, autonomous and other vehicles, and/or other processes), augmented reality applications, access control (e.g., opening a door based on a gesture, opening an access way based on detection of an authorized person), and detecting events (e.g., for visual surveillance, people or animal counting, or tracking).
A video processing system of the disclosure may be implemented in a variety of ways such as, for example, a software library, an IP core configured for implementation in a programmable logic device (e.g., an FPGA), an ASIC, or a remote server comprising a computer-readable apparatus storing computer-executable instructions configured to perform feature detection. Myriad other applications exist that will be recognized by those of ordinary skill given the present disclosure.
Although the system(s) and/or method(s) of this disclosure have been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation.
This application is a continuation of, and claims the benefit of priority to, co-owned and co-pending U.S. patent application Ser. No. 14/694,901 of the same title filed Apr. 23, 2015, which claims the benefit of priority to co-owned U.S. Provisional Patent Application Ser. No. 62/059,039 entitled “APPARATUS AND METHODS FOR TRAINING OF ROBOTS”, filed Oct. 2, 2014, each of which is incorporated herein by reference in its entirety. This application is related to co-pending and co-owned U.S. patent application Ser. No. 14/607,018 entitled “ROBOTIC APPARATUS AND METHOD FOR TRAINING PATH NAVIGATION”, filed Jan. 27, 2015, Ser. No. 14/588,168 entitled “MULTIMODAL RANDOM KNN ENSEMBLES ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Dec. 31, 2014, Ser. No. 14/244,890 entitled “APPARATUS AND METHODS FOR REMOTELY CONTROLLING ROBOTIC DEVICES”, filed Apr. 3, 2014, Ser. No. 13/918,338 entitled “ROBOTIC TRAINING APPARATUS AND METHODS”, filed Jun. 14, 2013, U.S. patent application Ser. No. 13/918,298 entitled “HIERARCHICAL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Jun. 14, 2013, Ser. No. 13/907,734 entitled “ADAPTIVE ROBOTIC INTERFACE APPARATUS AND METHODS”, filed May 31, 2013, Ser. No. 13/842,530 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS”, filed Mar. 15, 2013, Ser. No. 13/842,562 entitled “ADAPTIVE PREDICTOR APPARATUS AND METHODS FOR ROBOTIC CONTROL”, filed Mar. 15, 2013, Ser. No. 13/842,616 entitled “ROBOTIC APPARATUS AND METHODS FOR DEVELOPING A HIERARCHY OF MOTOR PRIMITIVES”, filed Mar. 15, 2013, Ser. No. 13/842,647 entitled “MULTICHANNEL ROBOTIC CONTROLLER APPARATUS AND METHODS”, filed Mar. 15, 2013, and Ser. No. 13/842,583 entitled “APPARATUS AND METHODS FOR TRAINING OF ROBOTIC DEVICES”, filed Mar. 15, 2013, each of the foregoing being incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62059039 | Oct 2014 | US

Number | Date | Country
---|---|---
Parent 14694901 | Apr 2015 | US
Child 16031950 | | US