The invention relates to a method for automatic decision-making about the execution of actions in a situational context. The invention further relates to a program-controlled machine for performing the method. The method can be used in an autonomous system, such as a robot, that can perform one or more actions, in order to decide which of these actions the robot is to execute at a given time. The method is suitable for deciding on the execution of actions whose execution requirements depend not only on current measured values but also on their temporal course.
Conventional automatic decision-making machines are known in the art.
It is assumed that the situational context is defined by at least one measured variable M, which can be detected by at least one sensor. The sensor delivers measured values M(tk) specific to the measured variable, which become available over time at defined times t0, ..., tm.
At a current time ta, a first function V1(ta), or reward value, can be derived via an artificial neural network from the measured values M(tk) (k = a−1, ..., a−m) up to the time ta. The function V1(ta) reflects the current need for executing the action at time ta.
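To make the role of V1 concrete, the following is a minimal sketch of a reward function realized as a tiny feedforward network with one hidden layer. The window length of three samples, the tanh activation, the name `reward_v1` and all weights are hypothetical and hand-picked rather than trained; they were chosen only so that small measured values (for example, little precipitation) yield a positive reward and large ones a negative reward.

```python
import math
from typing import Sequence

# Hypothetical, hand-picked weights (NOT trained) for a 3-input network with
# two tanh hidden units and one linear output. Input: the last m = 3 measured
# values M(t_{a-1}), M(t_{a-2}), M(t_{a-3}).
W_HIDDEN = [[-0.5, -0.3, -0.2],
            [0.4, 0.4, 0.4]]
B_HIDDEN = [0.1, -0.2]
W_OUT = [1.0, -1.0]
B_OUT = 0.0


def reward_v1(window: Sequence[float]) -> float:
    """Return the reward value V1(ta) for a window of past measured values."""
    if len(window) != len(W_HIDDEN[0]):
        raise ValueError("window length must match the network input size")
    # Hidden layer: weighted sum of the inputs plus bias, squashed by tanh.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, window)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    # Output layer: linear combination of the hidden activations.
    return sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT


print(reward_v1([0.0, 0.0, 0.0]))     # dry period: positive reward
print(reward_v1([10.0, 10.0, 10.0]))  # heavy precipitation: negative reward
```

In a real system the weights would be obtained by training, and the window would cover the samples available up to ta.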
Furthermore, a second function V2(ta), or basic reward value, can be assigned to the action at time ta. It is calculated by a first algorithm (Algo1) from the first function V1(ta) and the temporally preceding value V2(ta-1). The function V2(ta) reflects the cumulative need for executing the action at time ta.
The two functions V1(ta) and V2(ta) can also be created and improved by manually guiding the program-controlled machine or a part of it, in particular a teach-tool. This enables automatic sequence generation and continuous improvement of the system.
The decision on the execution of the action at time ta is made by a second algorithm (Algo2) realizing a third function F(ta, M(ta), V2(ta), P1, P2) -> {0, 1}, which at time ta compares the measured value M(ta) with a first parameter P1 and the value of the second function V2(ta) with a second parameter P2. Here, P1 is a parameter specific to the action and the measured variable, or limit measured value, which represents an upper or a lower threshold depending on the measured variable, and P2 is an action-specific parameter, or limit reward value.
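The comparison performed by the third function F can be sketched as follows. The sketch assumes, as in the irrigation embodiment, that the two comparisons are combined with a logical OR, that "falls below" and "exceeds" are strict comparisons, and that a flag selects whether P1 acts as a lower or an upper threshold; the names `decide` and `Bound` are hypothetical.

```python
from enum import Enum


class Bound(Enum):
    """Whether P1 acts as a lower or an upper threshold for the measured value."""
    LOWER = "lower"  # trigger when M(ta) falls below P1
    UPPER = "upper"  # trigger when M(ta) rises above P1


def decide(t_a: float, m_a: float, v2_a: float, p1: float, p2: float,
           bound: Bound = Bound.LOWER) -> int:
    """Third function F(ta, M(ta), V2(ta), P1, P2) -> {0, 1}.

    t_a is kept to mirror the signature of F; it is unused here because
    P1 and P2 are constants in this sketch.
    """
    if bound is Bound.LOWER:
        measured_trigger = m_a < p1
    else:
        measured_trigger = m_a > p1
    reward_trigger = v2_a > p2  # cumulative basic reward exceeds the limit
    return 1 if measured_trigger or reward_trigger else 0


print(decide(0.0, 2.0, 0.0, p1=5.0, p2=10.0))   # 1: M below lower threshold
print(decide(0.0, 8.0, 12.0, p1=5.0, p2=10.0))  # 1: V2 exceeds P2
print(decide(0.0, 8.0, 3.0, p1=5.0, p2=10.0))   # 0: neither condition holds
```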
The essential advantage of the method according to the invention is therefore that the decision to execute an action is not derived solely from comparing a current measured value with a limit measured value, which must be exceeded or fallen below for the action to be executed, but also from a cumulative basic reward value aggregated from the current reward values. Since a current reward value can also be negative, the cumulative basic reward value can decrease as well as increase over time. The action is therefore also executed if the cumulative basic reward value exceeds a limit reward value.
In addition, values generated by manually guiding the program-controlled machine or a part of it, in particular a teach-tool, can also be used to calculate the functions V1(ta) and V2(ta). This enables automatic sequence generation and continuous improvement of the system; that is, sequence generation can learn from manual intervention (feedback loops), so that, for example, past failures can be avoided in the future.
The method according to the invention is used for the automatic decision-making of a program-controlled machine about the execution of at least one action A in a situational context. The program-controlled machine comprises,
In an advantageous embodiment of the invention, the first algorithm (Algo1) calculates the value of the second function V2(ta) at the time ta as the sum of the value of the first function V1(ta) at the time ta and the value of V2(ta-1) at the preceding time ta-1: V2(ta):=V1(ta)+V2(ta-1). Of course, it is also possible that the first algorithm (Algo1) calculates the value of the second function V2(ta) at the time ta as the product or difference of the value of the first function V1(ta) at the time ta and the value of V2(ta-1) at the preceding time ta-1.
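The three variants of the first algorithm can be sketched as interchangeable combination functions applied step by step. The function names are hypothetical, and for the difference variant the order V1(ta) - V2(ta-1) is an assumption, since the text does not fix it.

```python
from typing import Callable, Iterable, List


def algo1_sum(v1: float, v2_prev: float) -> float:
    return v1 + v2_prev  # V2(ta) := V1(ta) + V2(ta-1)


def algo1_product(v1: float, v2_prev: float) -> float:
    return v1 * v2_prev  # V2(ta) := V1(ta) * V2(ta-1)


def algo1_difference(v1: float, v2_prev: float) -> float:
    return v1 - v2_prev  # assumed order: V1(ta) - V2(ta-1)


def run_algo1(v1_values: Iterable[float], v2_init: float,
              combine: Callable[[float, float], float] = algo1_sum) -> List[float]:
    """Apply Algo1 repeatedly, starting from the initial value V2(t0)."""
    v2 = v2_init
    history = []
    for v1 in v1_values:
        v2 = combine(v1, v2)
        history.append(v2)
    return history


print(run_algo1([1.0, -0.5, 2.0], 0.0))           # [1.0, 0.5, 2.5]
print(run_algo1([2.0, 3.0], 1.0, algo1_product))  # [2.0, 6.0]
```

In the sum variant, the negative reward value -0.5 lowers the cumulative value, which is how the basic reward value can decrease over time.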
It is also possible that the first parameter P1 and/or the second parameter P2 is time-dependent and/or dependent on another variable, in particular the location.
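Time- and location-dependent parameters can be modelled by replacing the constants P1 and P2 with functions. In the following sketch, the seasonal schedule for P2, the per-zone values of P1, the zone names and the helper names are all hypothetical examples, not values from the text; the OR combination and lower-threshold reading of P1 follow the irrigation embodiment.

```python
from datetime import datetime

# Hypothetical per-location limit measured values P1 (e.g. mm of precipitation).
P1_BY_ZONE = {"lawn": 5.0, "greenhouse": 8.0}
DEFAULT_P1 = 5.0


def p1_at(zone: str) -> float:
    """Location-dependent limit measured value P1."""
    return P1_BY_ZONE.get(zone, DEFAULT_P1)


def p2_at(t: datetime) -> float:
    """Time-dependent limit reward value P2: lower in June to August."""
    return 6.0 if t.month in (6, 7, 8) else 10.0


def decide_at(t: datetime, zone: str, m: float, v2: float) -> int:
    """F with P1 = P1(location) and P2 = P2(time); lower threshold, OR-combined."""
    return 1 if m < p1_at(zone) or v2 > p2_at(t) else 0


summer, winter = datetime(2016, 7, 1), datetime(2016, 1, 1)
print(decide_at(summer, "lawn", 6.0, 7.0))  # 1: V2 = 7 exceeds summer P2 = 6
print(decide_at(winter, "lawn", 6.0, 7.0))  # 0: V2 = 7 is below winter P2 = 10
```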
In a particularly advantageous embodiment, a plurality of measured variables M is detected by a plurality of sensors, and the execution of a single action A is decided. It is also possible for a single measured variable M to be detected by one or more sensors and for the execution of several actions A to be decided. Of course, it is also possible for a plurality of measured variables M to be detected by a plurality of sensors and for the execution of a plurality of actions A to be decided.
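For several measured variables and several actions, each action can keep its own limit measured values (one per measured variable), its own limit reward value and its own cumulative value. The text does not specify how the comparisons for several measured variables are combined; the sketch below assumes that crossing any single limit measured value suffices. The class, field and variable names, and all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ActionRule:
    """Parameters and state of one action A."""
    p1: Dict[str, float]    # measured variable -> limit measured value P1
    below: Dict[str, bool]  # True: P1 is a lower threshold, False: an upper one
    p2: float               # limit reward value P2
    v2: float = 0.0         # current cumulative basic reward value V2


def decide_all(rules: Dict[str, ActionRule],
               measured: Dict[str, float]) -> Dict[str, int]:
    """Evaluate F for every action; assumption: any crossed P1 triggers."""
    decisions = {}
    for action, rule in rules.items():
        crossed = any(
            measured[var] < limit if rule.below[var] else measured[var] > limit
            for var, limit in rule.p1.items()
        )
        decisions[action] = 1 if crossed or rule.v2 > rule.p2 else 0
    return decisions


rules = {
    "sprinkler": ActionRule(p1={"precipitation": 5.0, "humidity": 40.0},
                            below={"precipitation": True, "humidity": True},
                            p2=10.0),
    "drip": ActionRule(p1={"temperature": 30.0},
                       below={"temperature": False}, p2=8.0, v2=9.0),
}
print(decide_all(rules, {"precipitation": 6.0, "humidity": 55.0,
                         "temperature": 25.0}))
# {'sprinkler': 0, 'drip': 1}
```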
Advantageously, the parameter P1 represents an upper threshold value or a lower threshold value.
Finally, the program-controlled machine, by which the method according to the invention is performed, is a permanently installed machine or a mobile machine, in particular a robot.
The invention also relates to a program-controlled machine for performing a method, wherein the program-controlled machine comprises:
The invention will be explained in more detail hereinafter with reference to the drawings.
The method according to the invention is now described in more detail with reference to an embodiment and the diagram according to
In the embodiment, the method is used to decide on the execution of a single action A on the basis of a single measured variable M. Of course, the method according to the invention can also be used to decide on the execution of one or more actions A on the basis of one or more measured variables M.
The method according to the invention could be used, for example, in an automatic irrigation system for a garden, which represents a program-controlled machine in the sense of the invention. The possible action A could be irrigating the garden via a sprinkler system. A possible measured variable M would be the amount of precipitation over the past 100 hours. This measured variable M could be detected by a sensor that delivers the corresponding measured values M(tk) at defined times t0, ..., tm.
A first parameter P1, or limit measured value, would have to be defined for the action A (irrigation of the garden) and the measured variable M. A second parameter P2, or limit reward value, would also have to be defined for action A. An appropriately trained artificial neural network (ANN) would derive a first function V1(ta), or reward value, from the measured values M(tk) of the sensor at any time ta. V1(ta) would be positive at times of little or no precipitation in the past 100 hours; conversely, V1(ta) would be negative after significant precipitation. The reward value represented by the first function V1(ta) would therefore reflect the current need for action A at time ta.
From the past reward values, the first algorithm (Algo1) could calculate a second function V2(ta), or basic reward value, at time ta from the value of the first function V1(ta) at time ta and the temporally preceding value V2(ta-1). The basic reward value represented by the second function V2(ta) would therefore reflect the cumulative need for action A at time ta.
The second algorithm (Algo2) would decide on irrigation at time ta if the measured amount of precipitation falls below the irrigation-specific first parameter P1 (limit measured value) at time ta, or if the irrigation-specific second function V2(ta) (basic reward value) exceeds the defined second parameter P2 (limit reward value). This decision would be realized by a third function F(ta, M(ta), V2(ta), P1, P2) -> {0, 1}; when F delivers the value 1, the action A is executed and the second function V2(ta) is reset.
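The irrigation example can be traced end to end with the following sketch, which combines the sum variant of Algo1, the OR decision of Algo2 and the reset of V2. The reward values V1 are passed in directly as a stand-in for the trained ANN. Resetting V2 to its initial value V2(t0) is an assumption, since the text only says that V2 is reset; the function name and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def irrigation_run(precip_100h: Sequence[float], v1_values: Sequence[float],
                   p1: float, p2: float,
                   v2_init: float = 0.0) -> List[Tuple[int, float]]:
    """Return (sample index, V2) for every sample at which irrigation runs.

    precip_100h[k] is the measured value (precipitation over the past
    100 hours); v1_values[k] is the reward value V1, which the text derives
    with a trained ANN and which is supplied directly here.
    """
    v2 = v2_init                    # initial value V2(t0)
    events = []
    for k, (m, v1) in enumerate(zip(precip_100h, v1_values)):
        v2 = v1 + v2                # Algo1 (sum): V2(ta) := V1(ta) + V2(ta-1)
        if m < p1 or v2 > p2:       # Algo2: F(ta, M(ta), V2(ta), P1, P2) == 1
            events.append((k, v2))  # execute action A: irrigate
            v2 = v2_init            # reset V2 (assumed: back to V2(t0))
    return events


precip = [6.0, 6.0, 6.0, 6.0, 6.0, 2.0]
rewards = [2.0, 2.0, -3.0, 4.0, 4.0, 1.0]
print(irrigation_run(precip, rewards, p1=5.0, p2=6.0))
# [(4, 9.0), (5, 1.0)]
```

During the first five samples the precipitation never falls below P1 = 5, yet irrigation is triggered at sample 4 because the cumulative value (2, 4, 1, 5, 9) exceeds P2 = 6, after dipping while the reward value was negative. At sample 5, V2 has been reset and the action is triggered by the measured value alone.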
Furthermore, the first algorithm (Algo1) could be modified such that it calculates the value of the second function V2(ta) at time ta as the sum of the value of the first function V1(ta) at time ta and the value V2(ta-1) at the preceding time ta-1: V2(ta) := V1(ta) + V2(ta-1). An initial value V2(t0) is assigned to the second function.
A further modification of the method could be that the first parameter P1 and/or the second parameter P2 are each time-dependent.
An extended embodiment relates to a garden irrigation system with several actions, namely irrigation via a sprinkler system and irrigation via a drip system. In addition to the amount of precipitation over the past 100 hours, the air temperature, air pressure and air humidity could be used as further measured variables, for which corresponding sensors deliver measured values at defined times.
This application is a U.S. National Stage application of International Application No. PCT/EP2016/076754, filed Nov. 4, 2016, which claims priority to U.S. Application No. 62/251,756, filed Nov. 6, 2015, the contents of each of which are hereby incorporated herein by reference.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2016/076754 | 11/4/2016 | WO | 00

Number | Date | Country
---|---|---
62251756 | Nov 2015 | US