The present disclosure relates to the field of assisting an operator of an ego-agent. In particular, a method for assisting the operator of an ego-agent, a corresponding program comprising program code, a corresponding non-transitory computer readable storage medium, and an assistance system for assisting the operator of an ego-agent are proposed.
When people operate ego-agents in an environment where other agents are also present, there is generally a risk of a collision between the agents. Traffic density has increased significantly in recent years, and it is quite challenging for operators to observe all the other agents in the environment and to react to their behavior in due time so that collisions can be avoided. Further, there are other risks encountered in operating a vehicle, for example curve risks or regulatory risks. The performance of modern processors and the availability of sensors such as radar sensors, cameras, LIDAR sensors or the like, which provide the processor with information on the environment of the ego-agent, enable assistance for operators in their operation of the ego-agent. However, the style of operating an ego-agent varies significantly from operator to operator. Thus, in many situations, the assistance system proposes behaviors or even interferes (e.g., in the case of semi-automated driving) with the control operation by the operator in a way that distracts the operator.
Some approaches have tried to adapt assistance systems to an individual operator's behavior so that the operator is not bothered and acceptance of the assistance provided by the system is thereby increased. U.S. Pat. No. 9,623,878 B2, for example, proposes a driver assistance system which is personalized and learns from the driver's habits. The proposed system, however, merely adjusts a control parameter, for example a target distance of an ACC (adaptive cruise control), to the driver's particular driving style. Unfortunately, such an approach leads to an assistance system that reflects the individual driving style but has no effect on the behavior of the driver.
Quite a number of different approaches have been developed for generating control signals for autonomous or partially autonomous driving. One approach is described in U.S. Pat. No. 9,463,797 B2, in which a future trajectory for an ego-agent is predicted and, from this prediction, a plurality of trajectory alternatives for the ego-agent are generated. Further, a hypothetical future trajectory for another agent is determined and, based on at least one pair of trajectories of the ego-agent and the other agent, risk functions are calculated over time or along the calculated hypothetical trajectory alternatives of the ego-agent. These risk functions are then combined in a risk map, which is then analyzed to generate the control signal.
U.S. Pat. No. 10,627,812 B2 relates to another problem of known assistance systems. For making a correct prediction, it is necessary that information on the environment used in the prediction algorithm is correct. However, since most of the information is obtained using one or more sensors, certain areas in the environment relevant for correctly assessing the traffic situation may be occluded. U.S. Pat. No. 10,627,812 B2 mitigates the risk of such blind spots by assuming a virtual traffic entity in the area for which the confidence for the sensor data is below a certain threshold or no sensor data is available at all, and lets the virtual traffic entity interact with a predicted behavior of the ego-agent. A respective risk measure is estimated and the estimated risk measure is taken into consideration for a controlling action of the ego-agent.
US 2020/0231149 A1 supports a driver of an ego-agent by considering priority relationships between the ego-agent and at least one other traffic participant and selecting a respective prediction model for the traffic participant. The selection of the prediction model thus takes into consideration the determined priority relationship between the involved agents and thereby improves the assistance by a more precise prediction of the future behavior of the other agent.
EP 4 068 153 A1 describes an advanced driver assistance system in which, based on information received from sensors, a feature of the environment is determined and the risk zone of the feature is estimated. The feature and its risk zone are then displayed on a display, together with the environment of the vehicle, in a map.
As the above examples of known assistance systems clearly show, there are several different approaches for estimating a risk and providing an operator of an agent with information on that risk, or for adapting control of an agent to the habits of its operator. However, there is still the problem that the style of one operator in a traffic situation may significantly differ from the style of another operator. This may lead to misinterpretations of a specific situation, which is difficult not only for a human operator but also for automated systems. Thus, there is still a need to find a way of harmonizing the operating styles of the plurality of operators involved in a traffic situation. Such a harmonization could significantly increase the quality of predictions and their confidence and would thus directly improve traffic safety.
With the present invention this problem is solved by adapting the estimation of a future risk based on the habits of the individual operator but, contrary to what is known from the prior art, not only by trying to meet the operator's expectations, but by further considering a difference between the style of the operator and a target style. This is achieved by an adaptation in the estimation of future risks so that the communicated output finally educates the operator (driver) towards the target style. The target style is, e.g., an average driving style, and if operation in line with the target style could be achieved for every operator of an agent, the dangerous differences in driving style between different agents could be reduced.
According to the invention, this is achieved by a method, a corresponding program, a computer-readable storage medium and an assistance system for assisting an operator by communicating an estimated risk to the operator, with the estimation of the risk being adapted to influence operation of the ego-agent starting from an analysis of the operator's driving habits. First, a behavior planning algorithm is executed using a first value of a parameter and a second value of the parameter in its cost function in order to determine a first and a second planned behavior. Further, the current state of the ego-agent is determined in order to derive the actual behavior of the ego-agent operated by its operator. A personalized parameter value reflecting this actual behavior is then estimated based on the relation of the first and the second planned behavior and the actual behavior of the ego-agent. Using such a personalized parameter value in the behavior planning algorithm would result in a planned behavior corresponding to the habits of the operator of the ego-agent. Since this does not lead to any change in the operator's habits but only models the operator's style in the cost function used for behavior planning, the invention proposes to correct the personalized parameter value using a parameter correction value. Determining the parameter correction value is based at least on the estimated personalized parameter value and a target parameter value. For example, knowing the target parameter value and the personalized parameter value estimated for the individual operator makes it possible to shift the personalized parameter value towards the target parameter value, thereby generating an adapted parameter value by correcting the personalized parameter value using the parameter correction value.
More than two parameter values can also be used to determine more than two planned behaviors. An improved estimation of the personalized parameter value can accordingly be made using these more than two planned behaviors, for example by a better interpolation.
This adapted parameter value is then applied in the behavior planning algorithm and a risk is estimated. This risk is finally communicated to the operator. The risk communicated to the operator thus corresponds more closely to the risk that would be communicated to an operator with an average style. Since the estimation and communication of the risk no longer precisely reflects the style of the operator, this causes a training effect on the assisted operator, finally harmonizing the styles of all assisted operators.
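By way of illustration only, the following Python sketch summarizes this overall loop under simplifying assumptions; the functions plan_behavior, observe_actual_behavior, estimate_personalized_value, estimate_risk and communicate_risk are hypothetical placeholders and not part of the claimed method, and only the handling of the parameter values follows the description above.

def assistance_step(param_low, param_high, param_target, k,
                    plan_behavior, observe_actual_behavior,
                    estimate_personalized_value, estimate_risk, communicate_risk):
    # Execute the behavior planner with a first and a second parameter value.
    planned_low = plan_behavior(param_low)
    planned_high = plan_behavior(param_high)

    # Determine the actual behavior produced by the operator.
    actual = observe_actual_behavior()

    # Estimate the personalized parameter value from the relation of the two
    # planned behaviors and the actual behavior (e.g. by interpolation).
    param_personal = estimate_personalized_value(planned_low, planned_high, actual,
                                                 param_low, param_high)

    # Correct the personalized value towards the target value.
    param_adapted = param_personal + (param_target - param_personal) * k

    # Estimate the risk with the adapted parameter value and communicate it.
    communicate_risk(estimate_risk(param_adapted))
    return param_adapted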
The description of embodiments refers to the enclosed figures.
According to an embodiment, the personalized parameter value is estimated by interpolating the actual behavior between the first and second planned behavior. Using interpolation for estimating the personalized parameter value has the advantage that the estimation can be made online, which means during operation of the ego-agent. As a consequence, even when the style of the operator changes, the guidance will remain effective.
Further, the personalized parameter value can be estimated using a comparison of the acceleration of the actual behavior with the accelerations according to the first and second planned behavior. Acceleration can easily be sensed for the ego-agent and no or almost no preprocessing of the sensor values is necessary. This reduces the computational costs and improves real-time application, for example in a vehicle.
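A minimal sketch of such an acceleration-based estimation is given below (Python), assuming that each of the first planned behavior, the second planned behavior and the actual behavior can be summarized by a single acceleration value and that a linear interpolation between the two parameter values is used; the function and variable names are illustrative only.

def estimate_personalized_alpha(acc_low, acc_high, acc_actual, alpha_low, alpha_high):
    """Map the observed acceleration onto the parameter interval [alpha_low, alpha_high]."""
    if acc_high == acc_low:                # planned behaviors indistinguishable
        return 0.5 * (alpha_low + alpha_high)
    ratio = (acc_actual - acc_low) / (acc_high - acc_low)
    ratio = min(max(ratio, 0.0), 1.0)      # clamp to the parameter interval
    return alpha_low + ratio * (alpha_high - alpha_low)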
According to an embodiment of the invention, the parameter correction value is calculated based on a difference between the target parameter value and the personalized parameter value, and a correction coefficient. This has the advantage that, on the one hand, how much the actual behavior differs from the target behavior is directly considered, resulting in a greater correction when a greater difference is recognized. On the other hand, the correction coefficient allows a further adjustment of how strong the correction is. It is in particular possible to determine the value of the correction coefficient based on at least one of: the magnitude of the difference between the personalized parameter value and the target parameter value, and an operator condition. So, starting from the difference between the actual behavior and the target behavior and a constant correction coefficient, even more aspects can be taken into consideration.
According to another advantageous embodiment, the estimation of the personalized parameter value is repeatedly executed during operation of the ego-agent. Repeatedly estimating the personalized parameter value allows the correction to be adjusted to the actual behavior, which may change during operation of the ego-agent.
On the other hand, it might be preferred to estimate the personalized parameter value based on a plurality of actual parameter values, each estimated based on the relation of the first and second planned behavior and an actual behavior of the ego-agent. Taking into consideration a plurality of actual parameter values prevents the personalized parameter value from following every change of an estimated actual parameter value. For example, a moving average could be calculated from a certain number of actual parameter values, or a hysteresis could be applied, filtering out small changes in the estimated actual parameter values. The personalized parameter value is then updated only if a significant change in the style of the operator is recognized.
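The following sketch (Python) illustrates one possible implementation of such a filtering, assuming a fixed moving-average window and a fixed hysteresis band; both values are chosen for illustration only.

from collections import deque

class PersonalizedParameterFilter:
    """Moving average over recent actual parameter values with a hysteresis band."""

    def __init__(self, window=20, hysteresis=0.05):
        self.recent = deque(maxlen=window)   # last estimated actual parameter values
        self.hysteresis = hysteresis         # minimum change considered significant
        self.personalized = None

    def update(self, actual_value):
        self.recent.append(actual_value)
        average = sum(self.recent) / len(self.recent)
        # Follow the average only when the recognized style change is significant.
        if self.personalized is None or abs(average - self.personalized) > self.hysteresis:
            self.personalized = average
        return self.personalized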
According to another preferred embodiment, the last personalized parameter value estimated during a previous operation of the ego-agent is used as the personalized parameter value for the risk estimation of a current operation. If the starting point is the personalized parameter value estimated during a previous operation of the ego-agent, it is specifically preferred to update this last personalized parameter value with the personalized parameter value estimated during the current operation of the ego-vehicle; the updated personalized parameter value is then used as the personalized parameter value for the risk estimation of the current operation. In order to enable such use of a previously estimated personalized parameter value, the system comprises a non-volatile memory in which the latest personalized parameter value is stored.
Actual parameter values can be estimated repeatedly based on the relation of the first and second planned behavior and an actual behavior of the ego-agent, and a confidence measure can be calculated for the latest actual parameter value. This latest actual parameter value is taken over as the personalized parameter value when its confidence measure exceeds a confidence threshold. The confidence threshold thereby, for example, determines how frequently the personalized parameter value changes.
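Purely as an illustration, the sketch below (Python) assumes that the confidence measure is derived from the stability of the recent actual parameter values; the concrete confidence definition and the threshold value are assumptions and are not prescribed by the description above.

import statistics

def maybe_update_personalized(current_personalized, recent_actual_values,
                              confidence_threshold=0.8):
    """Take over the latest actual value only if its confidence is high enough."""
    latest = recent_actual_values[-1]
    # Illustrative confidence: high when the recent estimates agree with each other,
    # so that the personalized value does not change too frequently.
    spread = statistics.pstdev(recent_actual_values) if len(recent_actual_values) > 1 else 0.0
    confidence = 1.0 / (1.0 + spread)
    return latest if confidence > confidence_threshold else current_personalized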
The parameter value lies in an interval and, for determining the first and the second planned behavior, the lower limit of the interval is used as the first value of the parameter and the upper limit of the interval is used as the second value of the parameter. If the extreme values defined by the limits of the interval are used to determine the first and the second planned behavior, it is easily possible to use an interpolation for estimating the personalized parameter value with high accuracy. The interpolation may use a step function, a linear function, two concatenated linear functions or a sigmoid function for estimating the actual parameter values.
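The following sketches (Python) show possible forms of the named interpolation functions, each mapping a normalized position r between 0 and 1 (relative to the first and second planned behavior) onto the parameter interval [lo, hi]; the knee position and the sigmoid steepness are illustrative assumptions.

import math

def step(r, lo, hi):
    # Step function: switch from the lower to the upper limit at the midpoint.
    return hi if r >= 0.5 else lo

def linear(r, lo, hi):
    # Linear interpolation over the whole interval.
    return lo + r * (hi - lo)

def two_piece_linear(r, lo, hi, knee=0.5):
    # Two concatenated linear segments meeting at the knee position.
    mid = lo + 0.5 * (hi - lo)
    if r < knee:
        return lo + (r / knee) * (mid - lo)
    return mid + ((r - knee) / (1.0 - knee)) * (hi - mid)

def sigmoid(r, lo, hi, steepness=10.0):
    # Smooth S-shaped transition; approaches lo and hi at the interval ends.
    s = 1.0 / (1.0 + math.exp(-steepness * (r - 0.5)))
    return lo + s * (hi - lo)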
It is to be noted that the personalized parameter value and an actual parameter value can be used interchangeably for embodiments in which no processing of the actual parameter values is made to derive the personalized parameter value; in that case the estimation of the personalized parameter value is the same as the estimation of the actual parameter value. However, for the explanation of embodiments that process the actual parameter values to determine the personalized parameter value, a distinction will be made between "personalized" and "actual".
The cost function preferably comprises, in addition to a risk component, at least one of a utility component and a comfort component, each component being weighted with a dedicated parameter. Having such a cost function comprising a plurality of different components allows the operator of the ego-agent to be guided towards the desired behavior not only with respect to risk but also with respect to other aspects such as utility or comfort. In such a case, the method is executed in respect of the risk component and, in addition, in respect of at least one of the utility component and the comfort component.
Further, the development of the correction value over time can be evaluated and feedback on the evaluation result can be provided to the operator. The correction values are then stored in a non-volatile memory and, after a predetermined time interval or upon request (for example from the operator), an evaluation of the course of the correction values over time is made.
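One possible, purely illustrative evaluation is sketched below (Python), assuming the stored correction values are available as a time-ordered list; the trend criterion (comparing the averages of the earlier and the later half) and the feedback texts are assumptions chosen only for the sake of explanation.

def evaluate_corrections(corrections):
    """Give simple feedback on whether the correction values have decreased over time."""
    half = len(corrections) // 2
    if half == 0:
        return "Not enough data yet."
    early = sum(corrections[:half]) / half
    late = sum(corrections[half:]) / (len(corrections) - half)
    if abs(late) < abs(early):
        return "Your driving style has moved closer to the target style."
    return "Your driving style still deviates from the target style."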
It is to be noted that the present invention can be applied to all kinds of ego-agents, including vehicles such as planes, boats and micro-mobility robots, and even pedestrians. In most cases, the operator of the ego-agent will be the driver of an ego-vehicle, the pilot of a plane and so on. However, the operator may even control the ego-agent from a remote position.
Before all the estimation and determination steps are explained in more detail, the general background of the present invention and the effect of educating the operator shall be explained.
The individual style of an operator with which the operator operates the ego-agent can be modelled using coefficients weighting the individual components in a cost function that is used in behavior planning. The cost function can be described as:
cost=α*risk−β*utility+γ*comfort
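Expressed as a simple Python sketch of this cost function (the component values risk, utility and comfort are assumed to be pre-computed scalars for a candidate behavior; the names are illustrative only):

def behavior_cost(risk, utility, comfort, alpha, beta, gamma):
    """Weighted cost used by the behavior planner: cost = alpha*risk - beta*utility + gamma*comfort."""
    return alpha * risk - beta * utility + gamma * comfort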
The parameters α, β and γ adjust the overall cost function and can have values from an interval from a lower limit to an upper limit. It is to be noted that the interval does not need to be the same for all the parameters. The influence of the values a parameter can take is illustrated in the enclosed figures.
The cost function and the resulting planned behavior obtained when a behavior planning algorithm is executed using the cost function with given parameters allow, on the other hand, an operating style of an operator to be modelled. So, while originally the values of the three parameters define in which way a behavior in a certain situation is planned, it is also possible to determine from an observed actual behavior of the ego-agent the corresponding parameter values, which describe the habits and style of the operator. The first parameter α, for example, describes how close an operator (ego-agent) approaches other agents or how long he/she stays close to other agents. The second parameter β describes how fast an operator wants to reach the destination or how often he/she changes lanes. Finally, the third parameter γ describes how often the operator accelerates or decelerates and how strong the acceleration or deceleration is.
It is to be noted that apart from behavior planning, risk maps may also be used in order to create warnings and communicate the identified risks to the operator of the ego-agent. Such communication of risks or outputting of warnings and also behavior planning with risk maps is known in the art.
The trajectories are calculated in a known manner using information on vehicle states that is obtained from sensors mounted on or carried by the agent. This information can be enhanced with agent-to-agent information. The obtained information is processed by a processor which performs the computational steps for the planning as explained above but, with respect to the present invention, also the estimation, correction and all other computations described below.
For the following explanations, the aspect “risk” will be mainly used for the explanations. However, these explanations are similarly valid for estimating a personalized parameter value β, γ for the components “utility” and/or “comfort”.
Based on risk maps that are used in a behavior planner as explained above, a personalized parameter value can be estimated from the cost function as given below.
As explained previously, the values of the three parameters result in different planned behaviors. The first parameter α is particularly relevant with respect to a risk of a collision when another agent is involved. It may be understood as a measure for the size of the risk zone, as illustrated in the upper part of the figure.
Then, based on the two planned behaviors 12, 13 and the actual behavior of the ego-agent, an interpolation 14 is performed using the current vehicle states, which are also used for the determination of the planned behaviors 12, 13, in order to derive from the vehicle states the current behavior of the ego-agent 4 corresponding to the actual control performed by the operator and, as a result, the actual parameter value. It is to be noted that the actual parameter value is the same as the personalized parameter value in case no further processing of the estimated value is made. This interpolation may be made, for example, by comparing acceleration values of the planned behaviors and the actual behavior.
Once the personalized parameter value is estimated, a correction is performed in order to generate an adapted parameter value, which is used to estimate the risk by generating a risk map using the adapted parameter value. This process is illustrated in the figures.
The adapted parameter value is calculated as:
αadapted(k) = αestimated + (αnormal − αestimated) * k
k is a correction coefficient allowing an adjustment of how strong the correction is for a given difference between the personalized parameter value αestimated and the target value αnormal. The basis for the calculation of the adapted parameter value αadapted(k) is the deviation of the estimated parameter value αestimated from the normal parameter value αnormal, and k can be freely set in an interval, for example between 0 and 1, according to the desired effect. A correction coefficient of k=0.1, for example, may slowly guide the driver towards the average safe driving style.
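For illustration, the correction can be written as the following Python sketch; the numeric values in the example are assumptions chosen only to demonstrate the effect.

def adapt_parameter(alpha_estimated, alpha_normal, k):
    """alpha_adapted(k) = alpha_estimated + (alpha_normal - alpha_estimated) * k"""
    return alpha_estimated + (alpha_normal - alpha_estimated) * k

# Illustrative numbers only: an estimated style of 0.3, a target of 0.6 and
# k = 0.1 give an adapted value of approximately 0.33, i.e. a gentle shift
# of the risk estimation towards the target style.
print(adapt_parameter(0.3, 0.6, 0.1))  # ~0.33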
In order to adjust the intensity of the effect, the correction coefficient may be adjusted. For example, if it is recognized that the driver has low awareness, is sleepy and/or stressed, it may be desirable to apply a smaller correction. Since the difference between the estimated parameter value αestimated and the normal parameter value αnormal remains the same, the correction coefficient k is then adjusted towards 0. On the other hand, in case the operator's awareness is high and the operator is not sleepy or stressed, a stronger correction might be desired. By adjusting the coefficient to be higher (k → 1), this effect can be achieved.
Another way of adapting the correction coefficient k is to take into consideration the magnitude of the deviation of the estimated parameter value αestimated from the normal parameter value αnormal. For example, the correction coefficient k can be increased for smaller differences and decreased for large differences. The reduction of the correction coefficient k for large differences avoids overstraining the driver. On the other hand, the increase of the correction coefficient k for small differences ensures that the adaptation of the parameter value still has an effect; otherwise, the effect would become negligible the closer the operator's style is to the normal style. A table associating the correction coefficients with the calculated differences can be stored in a memory and retrieved by the processor executing all the calculations and determinations explained above in order to estimate the risk.
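A purely illustrative sketch of such an adjustment is given below (Python); the breakpoints, the coefficient values and the binary operator condition are assumptions and would in practice be chosen and stored, for example in a table, as described above.

def correction_coefficient(deviation, operator_alert=True):
    """Select k from the magnitude of the deviation and the operator condition."""
    magnitude = abs(deviation)
    # Smaller deviations get a larger k so the adaptation still has an effect;
    # larger deviations get a smaller k so the operator is not overstrained.
    if magnitude < 0.1:
        k = 0.5
    elif magnitude < 0.3:
        k = 0.2
    else:
        k = 0.05
    # Reduce the correction when the operator has low awareness, is sleepy or stressed.
    if not operator_alert:
        k *= 0.5
    return k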
It is to be noted that the explanations given so far do not distinguish between the personalized parameter value and an actual parameter value for the current behavior of the ego-agent. This means that the actual parameter value estimated as explained above, for example by an interpolation, is directly used as the personalized parameter value for further processing and estimating a risk to be communicated to the operator. However, it might be preferred to estimate a plurality of actual parameter values and, based thereon, determine a personalized parameter value. This prevents fluctuations of the actual parameter value from directly affecting the estimation of the risk.