The subject disclosure relates to autonomous vehicles and their methods of operation and, in particular, to a system and method for performing an action at an autonomous vehicle that reduces an uncertainty and anxiety in a human passenger of the autonomous vehicle.
An autonomous vehicle performs various maneuvers that are based on a state of the vehicle and a traffic scenario. The vehicle plans the maneuvers in order to move itself safely through traffic. However, an action chosen by the vehicle can be different than an action that a human would select in the same situation or an action that the human would expect the vehicle to select. Thus, a user traveling in the vehicle may develop a level of surprise, uncertainty and/or anxiety when the vehicle performs the action. Accordingly, it is desirable to provide a system and method for determining a difference or gap between an expectation of the user in a given traffic scenario and an intended action of the vehicle in the scenario.
In one exemplary embodiment, a method of operating a vehicle is disclosed. A machine-selected action for the vehicle in a current state and an actual next state for the vehicle resulting from the machine-selected action are determined. Using a user model, a user-expected action for the vehicle in the current state and a user-expected next state for the vehicle resulting from applying the machine-selected action are determined. A gap value is determined based on at least one of the user-expected action, the machine-selected action, the actual next state, and the user-expected next state. A signal is output when the gap value meets a threshold.
In addition to one or more of the features described herein, the user model includes a first model characterizing the user-expected action for the vehicle in the current state and a second model characterizing the user-expected next state. Determining the gap value further includes at least one of determining a difference between the user-expected action and the machine-selected action, determining the difference between the user-expected next state and the actual next state, determining the difference between a distribution over the user-expected action and the machine-selected action, and determining the difference between the distribution over the user-expected next state and the actual next state. The method further includes creating the user model by at least one of polling a reaction of a test subject to a traffic scenario, and applying constraints on a Markov Decision Process to create a free energy model having one or more hyperparameters and polling the reaction of the test subject to determine the values of the one or more hyperparameters. The method further includes adjusting the value of the one or more hyperparameters of the user model to fit a behavior of a selected user. Outputting the signal further comprises at least one of providing an explanation to a user about the gap value, adjusting the machine-selected action to correspond to the user-expected action, transferring control of the vehicle to the user, and providing the gap value to a traffic controller. The method further includes adjusting the user model to suit a knowledge of a user.
In another exemplary embodiment, a system for operating a vehicle is disclosed. The system includes a processor configured to determine a machine-selected action for the vehicle in a current state and an actual next state for the vehicle resulting from the machine-selected action, determine, using a user model, a user-expected action for the vehicle in the current state and a user-expected next state for the vehicle resulting from applying the machine-selected action, determine a gap value based on at least one of the user-expected action, the machine-selected action, the actual next state and the user-expected next state, and output a signal when the gap value meets a threshold.
In addition to one or more of the features described herein, the user model includes a first model characterizing the user-expected action for the vehicle in the current state and a second model characterizing the user-expected next state. The processor is further configured to determine the gap value by determining at least one of a difference between the user-expected action and the machine-selected action, the difference between the user-expected next state and the actual next state, the difference between a distribution over the user-expected action and the machine-selected action, and the difference between the distribution over the user-expected next state and the actual next state. The processor is further configured to create the user model by at least one of polling a reaction of a test subject to a traffic scenario, and applying constraints on a Markov Decision Process to create a free energy model having one or more hyperparameters and polling the reaction of the test subject to determine the values of the one or more hyperparameters. The processor is further configured to adjust the value of the one or more hyperparameters of the user model to fit a behavior of a selected user. The processor is further configured to output the signal by performing at least one of providing an explanation to a user about the gap value, adjusting the machine-selected action to correspond to the user-expected action, transferring control of the vehicle to the user, and providing the gap value to a traffic controller. The processor is further configured to adjust the user model to suit a knowledge of a user.
In another exemplary embodiment, a vehicle is disclosed. The vehicle includes a processor configured to determine a machine-selected action for the vehicle in a current state and an actual next state for the vehicle resulting from the machine-selected action, determine, using a user model, a user-expected action for the vehicle in the current state and a user-expected next state for the vehicle resulting from applying the machine-selected action, determine a gap value based on at least one of the user-expected action, the machine-selected action, the actual next state and the user-expected next state, and output a signal when the gap value meets a threshold.
In addition to one or more of the features described herein, the user model includes a first model characterizing the user-expected action for the vehicle in the current state and a second model characterizing the user-expected next state. The processor is further configured to determine the gap value by determining at least one of a difference between the user-expected action and the machine-selected action, the difference between the user-expected next state and the actual next state, the difference between a distribution over the user-expected action and the machine-selected action, and the difference between the distribution over the user-expected next state and the actual next state. The processor is further configured to create the user model by at least one of polling a reaction of a test subject to a traffic scenario, and applying constraints on a Markov Decision Process to create a free energy model having one or more hyperparameters and polling the reaction of the test subject to determine the values of the one or more hyperparameters. The processor is further configured to output the signal to perform at least one of providing an explanation to a user about the gap value, adjusting the machine-selected action to correspond to a user-expected action, transferring control of the vehicle to the user, and providing the gap value to a traffic controller. The processor is further configured to adjust the user model to suit a knowledge of a user.
The above features and advantages, and other features and advantages of the disclosure are readily apparent from the following detailed description when taken in connection with the accompanying drawings.
Other features, advantages and details appear, by way of example only, in the following detailed description, the detailed description referring to the drawings in which:
The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.
In accordance with an exemplary embodiment,
The autonomous vehicle 10 generally includes at least a navigation system 20, a propulsion system 22, a transmission system 24, a steering system 26, a brake system 28, a sensor system 30, an actuator system 32, and a controller 34. The navigation system 20 determines a road-level route plan for automated driving of the autonomous vehicle 10. The propulsion system 22 provides power for creating a motive force for the autonomous vehicle 10 and can, in various embodiments, include an internal combustion engine, an electric machine such as a traction motor, and/or a fuel cell propulsion system. The transmission system 24 is configured to transmit power from the propulsion system 22 to two or more wheels 16 of the autonomous vehicle 10 according to selectable speed ratios. The steering system 26 influences a position of the two or more wheels 16. While depicted as including a steering wheel 27 for illustrative purposes, in some embodiments contemplated within the scope of the present disclosure, the steering system 26 may not include a steering wheel 27. The brake system 28 is configured to provide braking torque to the two or more wheels 16. In various embodiments, the autonomous vehicle 10 can be an electric vehicle. In other embodiments, the autonomous vehicle 10 can include an autonomous vessel, a plane, or a machine used for agricultural purposes.
The sensor system 30 includes a radar system 40 that senses objects in an exterior environment of the autonomous vehicle 10 and determines various parameters of the objects useful in locating the position and relative velocities of various remote vehicles in the environment of the autonomous vehicle. Such parameters can be provided to the controller 34. In operation, the transmitter 42 of the radar system 40 sends out a radio frequency (RF) reference signal 48 that is reflected back at the autonomous vehicle 10 by one or more objects 50 in the field of view of the radar system 40 as one or more echo signals 52, which are reflected signals received at receiver 44. The one or more echo signals 52 can be used to determine various parameters of the one or more objects 50, such as a range of the object, Doppler frequency or relative radial velocity of the object, and azimuth, etc. The sensor system 30 includes additional sensors, such as digital cameras, for identifying road features, Lidar, etc.
A driver monitoring system 46 monitors a driver, user, or passenger of the autonomous vehicle 10. The driver monitoring system 46 records actions taken by the user, a direction of attention of the user (by observing eye location or movement), a facial expression of the user, etc., in order to determine a reaction of the user to vehicle movement. In other embodiments, the autonomous vehicle can be without a driver monitoring system 46. The use of a driver monitoring system is not meant to be a limitation on the invention.
The controller 34 builds a trajectory for the autonomous vehicle 10 based on the output of sensor system 30. The controller 34 can provide the trajectory to the actuator system 32 to control the propulsion system 22, transmission system 24, steering system 26, and/or brake system 28 in order to navigate the autonomous vehicle 10 with respect to the object 50.
The controller 34 includes a processor 36 and a computer readable storage device or computer readable storage medium 38. The storage medium includes programs or instructions 39 that, when executed by the processor 36, perform the methods disclosed herein for operating the autonomous vehicle 10 based on sensor system outputs. The computer readable storage medium 38 may further include programs or instructions 39 that when executed by the processor 36, provide information that can be used to allow the autonomous vehicle to navigate through traffic in a manner that reduces a level of uncertainty, surprise, or anxiety in the passenger or other vehicle user.
With respect to the traffic scenario shown in
The methods disclosed herein determine a difference between a possible action that a vehicle plans to take in a given traffic scenario and a possible action that a human would take in the same traffic scenario or that a human would expect the vehicle to take in the traffic scenario. A difference between these actions can cause surprise or uncertainty in the human. In one embodiment, the difference, whether in the actions or in the driver's expectations, can be used to adjust the vehicle's planned action so as to mitigate the difference. Alternatively, the difference can be used to provide a notification or explanation to the user about the reasons the vehicle behaves as it does.
The traffic scenario 402 is sent to a user model 412 to generate a user-expected action. The user model 412 is a model of a user's probable actions for a given state of the host vehicle 202 and a model of a user's expectations of the next state of the host vehicle 202 given a selected action. The model of the user's probable action can be a probability distribution over a domain of possible actions for the traffic scenario 402. Similarly, the model of the user's expectations of the next state of the vehicle given a selected action can be a probability distribution.
The traffic scenario 402 is input to the user model 412 to select a user-expected action 414 (represented as a) to the traffic scenario and to output a user-expected next state (represented as s″). For the illustrative traffic scenario of
A gap detector 416 receives the optimal action (a*) selected by the vehicle and the actual next state s′ of the vehicle given the optimal action. The gap detector 416 also receives the user-expected action a selected using the user model and the user-expected next state s″. The gap detector 416 determines whether a gap is significant (box 418) or whether a gap is not significant (box 420). A gap can refer to a difference between a user-expected action (or its distribution) and a machine-selected action, or a difference between the actual next state and a user-expected next state (or its distribution), or a combination thereof. The gap detector 416 compares the difference to a threshold. When the difference meets a criterion or is greater than the threshold, the method returns that the gap is true (e.g., a significant difference between the expected action or state and the action selected by the vehicle and resulting state). When the gap is less than the threshold, the method returns that the gap is false or that there is little or no significant difference between the expected action or state and the action selected by the vehicle and resulting state.
Box 508 includes a gap detector. The machine-selected action (a*) and the actual next state (s′) are output from the vehicle model to the gap detector. Also, the user-expected action (a) and the user-expected next state (s″) are output to the gap detector from the user model.
In box 508, the gap detector determines or estimates a gap or difference between the actions and between the states. There are two types of gaps: gaps in actions and gaps in states. Each type of gap can be determined either by comparing the probability distribution from the user model to the machine-selected action or actual next state, or by comparing the expected action or expected next state (according to the user model) to the machine-selected action or actual next state. A gap exists if any one of these methods indicates the existence of the gap. For each method, the gap detector compares the gap to a threshold value to determine whether the vehicle behavior is sufficiently different from the user-expected behavior to cause alarm to the user (TRUE, box 510) or is close enough to the user-expected behavior that the user has a level of certainty regarding the vehicle's behavior (FALSE, box 512).
The user model of box 502 can be created using at least two methods. A first method includes testing of subjects by exposing them to simulations and collecting their responses. A second method includes solving a constrained Markov Decision Process with a different set of hyperparameters for different scenarios and then running the user model through test subjects to find a suitable range for the hyperparameters that balances an expected reward vs. uncertainty for the test subjects, at each scenario.
The user action probability model can be a probabilistic model or probability distribution indicating a probability of the user for taking an action in a given traffic scenario. Similarly, the expected state probability model can be a probabilistic model or probability distribution indicating a probability that a user expects to be in a next state of the vehicle given an action.
In one embodiment, the user study includes showing users a movie of a driving maneuver in different driving scenarios. Users then provide answers to questions. In one example, the users are asked to express their levels of trust and satisfaction during the maneuvers presented to them. In one embodiment, a trust and satisfaction interview can be used such as discussed in XAI metrics (“Metrics for Explainable AI: Challenges and Prospects”, Robert R. Hoffman, Shane T. Mueller, Gary Klein, Jordan Litman (2019)). In another example, users can be asked to specify a next expected action in a certain scenario given that a vehicle performs a certain maneuver or action.
The user action probability model, P_u(a|s), gives the probability that an action ‘a’ will be preferred by a user (indicated by subscript ‘u’) when the vehicle is in state ‘s’. The user actions can be determined by polling the test subjects. The user action probability model is computed from the votes of the test subjects, or from the action with maximal average trust and satisfaction, as shown in Eq. (1):
$$P_u(a \mid s) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{vote}(a, i) \tag{1}$$
where N is the number of test subjects polled, i is the index of the test subject, vote(a,i)=1 if the ith test subject chooses action ‘a’ when the vehicle is in state ‘s’, otherwise, vote(a,i)=0.
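The vote-based estimate can be illustrated with a minimal sketch. It assumes that P_u(a|s) is the fraction of the N test subjects who voted for action ‘a’ in state ‘s’, which is how the definitions of N and vote(a,i) read. The function name, the action labels, and the poll data are hypothetical and serve only as illustration.

```python
from collections import Counter


def user_action_model(votes, actions):
    """Estimate P_u(a|s) for a single state s from test-subject votes.

    votes:   list holding the action chosen by each of the N test subjects
    actions: the domain of possible actions for the traffic scenario
    """
    n = len(votes)
    if n == 0:
        raise ValueError("at least one test subject is required")
    counts = Counter(votes)  # counts[a] equals the sum over i of vote(a, i)
    return {a: counts[a] / n for a in actions}


# Hypothetical poll of N = 5 test subjects for one state s.
votes = ["brake", "brake", "change_lane", "brake", "keep_speed"]
model = user_action_model(votes, ["brake", "change_lane", "keep_speed"])
```

Because every subject casts exactly one vote, the estimated probabilities sum to one over the action domain.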
The user model (both the user action probability model and the expected state probability model) can be maintained by repeating the user study at periodic intervals. Also, the test subjects for the user study can be selected so as to personalize the user model to a specific user or set of users. For example, the probability values of the user model can be filtered by age group, gender or any other characteristic that might affect the expectation of the user. The user study can also be applied by observing a user or passenger during a driving experience.
The driver monitoring system 46 can be used to detect a reaction of the passenger in order to gauge a level of comfort of the user with the vehicle's action. An example of a passenger's reaction includes the passenger taking manual control of the vehicle to disengage the vehicle from automated driving. The vehicle can record that the passenger has taken control, thereby determining that the passenger does not trust the action of the vehicle. The user can also provide direct feedback to the vehicle by pushing a button or other input device. The driver monitoring system 46 can also recognize facial gestures to record a passenger's satisfaction or discontent with the machine-selected action.
The free energy model of box 704 solves an optimization problem as shown in Eq. (1):
where V is the expected reward for policy π. The expected reward is as shown in Eq. (2):
$$V^{\pi}(s) = \sum_{a \in A} \pi(a \mid s) \sum_{s' \in S} \Pr(s' \mid s, a)\,\left[R(s, z) + V^{\pi}(s')\right] \tag{2}$$
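The recursion of Eq. (2) can be evaluated numerically by repeated substitution. The following is a minimal sketch, not the disclosed implementation. It assumes a small tabular model, at least one absorbing terminal state whose value is zero (so that the undiscounted sum is finite), and a reward that can be evaluated from the state and the chosen action. All names and the two-state example are hypothetical.

```python
def evaluate_policy(states, actions, policy, transition, reward, terminal, sweeps=200):
    """Iteratively apply the recursion of Eq. (2) until the values settle.

    policy[s][a]        -> pi(a|s)
    transition[(s, a)]  -> dict mapping next state s' to Pr(s'|s, a)
    reward(s, a)        -> immediate reward (assumed to depend on s and a)
    terminal            -> set of absorbing states whose value stays 0
    """
    value = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if s in terminal:
                continue
            value[s] = sum(
                policy[s][a]
                * sum(p * (reward(s, a) + value[s2])
                      for s2, p in transition[(s, a)].items())
                for a in actions
            )
    return value


# Hypothetical two-state scenario: the vehicle either goes or waits.
states = ["approach", "done"]
actions = ["go", "wait"]
policy = {"approach": {"go": 0.5, "wait": 0.5}}
transition = {
    ("approach", "go"): {"done": 1.0},
    ("approach", "wait"): {"approach": 0.5, "done": 0.5},
}
rewards = {"go": 2.0, "wait": -1.0}
values = evaluate_policy(states, actions, policy, transition,
                         lambda s, a: rewards[a], terminal={"done"})
```

In this example the fixed point satisfies V = 0.5 + 0.25·V, so the value of the "approach" state converges to 2/3.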
The information in Eq. (1) includes an uncertainty H and a divergence P. A constraint equation for the uncertainty H for a given policy π is given in Eq. (3):
$$H^{\pi}(s) = E\left\{-\log \Pr(s' \mid s, a) + H^{\pi}(s')\right\} \le C_1 \tag{3}$$
and a constraint for a KL-divergence of the policy from a uniform distribution is given in Eq. (4):
In box 802, the driver monitoring system 46 obtains measurements that indicate where the driver or passenger is looking. The measurements allow knowledge of what information the passenger is or is not aware of. In box 804, a system transition model used by the vehicle is provided. In box 806, the system transition model is adjusted using the information of the user's awareness, resulting in an adjusted state and transition model (adjustments to the MDP). In box 808, the free energy model is applied to the adjusted MDP to create a user model policy, shown in box 810. In box 812, the passenger is monitored to identify any intervening actions by the passenger. In box 814, the hyperparameters of the user model are adjusted based on the passenger reactions. The original set of hyperparameters is the set determined during the user model creation process discussed with respect to
Different user model policies can be calculated off-line for different values of parameters. A precalculated user model policy can be applied to a particular user's behavior (i.e., more aggressive, more conservative) by applying a suitable set of hyperparameters, thereby tailoring the user model to the particular user. The tailored user model can be created either offline or online.
When a new user faces some current state s, the user model can be applied using various methods. The current state s is compared to various stored states to determine a set of closest or most similar stored states si. An expected action is determined for each state si. In a first implementation, a vote is taken over the expected actions and the expected action with the most votes is selected. In a second implementation, an expected trust and satisfaction for the current state s is determined as the average values of trust and satisfaction for all the similar states si. The action associated with the state having the maximal trust and satisfaction is selected as the expected action. In a third implementation, the next action is selected using models for users that are similar to the new user. In a fourth implementation, a classifier is trained with data and used to predict an action and next state for a given traffic scenario and current state. The user expectation for an action is the one that maximizes a predicted trust and satisfaction, as shown in Eq. (5):
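The first implementation (voting over the most similar stored states) can be sketched as follows. The sketch assumes that states are numeric vectors and that similarity is measured by Euclidean distance; the disclosure does not fix a similarity measure, so this choice, the parameter k, and the example data are illustrative assumptions.

```python
import math
from collections import Counter


def expected_action_by_vote(state, stored, k=3):
    """Pick the expected action by majority vote over the k nearest stored states.

    state:  current state s as a tuple of numbers
    stored: list of (stored_state, expected_action) pairs
    k:      number of similar states s_i to consult (hypothetical parameter)
    """
    nearest = sorted(stored, key=lambda item: math.dist(state, item[0]))[:k]
    votes = Counter(action for _, action in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored states: (distance to lead vehicle in m, speed in m/s).
stored = [
    ((10.0, 30.0), "brake"),
    ((12.0, 28.0), "brake"),
    ((11.0, 35.0), "keep_speed"),
    ((60.0, 30.0), "keep_speed"),
    ((55.0, 25.0), "accelerate"),
]
choice = expected_action_by_vote((11.0, 30.0), stored)
```

Here the three nearest stored states vote "brake", "brake", and "keep_speed", so "brake" is selected.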
Thus, the policy π performs the action (a) that maximizes the reward R.
The gap detector can perform at least one of four different comparisons. In box 1202, a comparison is made between action probabilities. In box 1204, a comparison is made between expected next state probabilities. In box 1206, a comparison is made between different action selections (i.e., between a user-expected action from the user model and a machine-selected action). In box 1208, a comparison is made between different next states (i.e., between a user-expected next state based on the optimal action selected by the vehicle and the actual next state). Each comparison result can be compared to a respective threshold, which is supplied to the gap detector at box 1210. The threshold helps to determine whether the difference between the compared actions or states is large enough to be of concern to a user.
In box 1202, a difference is determined between the user's action probability distribution and the system's chosen action. An action threshold θa>=0 is assumed.
In a first implementation of box 1202, the probability P_u(a*|s) is compared to the maximum value of P_u(a|s) over all actions. If the maximizing action is sufficiently different from the optimal action and the difference in probabilities is greater than the threshold θa, a TRUE value is returned. Otherwise, FALSE is returned.
In a second implementation of box 1202, a maximum value of P_u(a|s) is found over all actions a that are within a selected neighborhood of the optimal action a*. This neighborhood can be determined by a semantic distance or other measure of subjective perception indicating that a user would not see any difference between these actions if they were chosen. The maximizing action is then marked as a**. The probability P_u(a**|s) is then compared to the maximum value of P_u(a|s) over all possible actions. If the maximizing action is sufficiently different from a** and the difference in probabilities is greater than the threshold θa, then TRUE is returned. Otherwise, FALSE is returned.
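Both implementations of box 1202 can be sketched in one function. The sketch assumes that "sufficiently different" means the user's most probable action lies outside the neighborhood of a*; with an empty neighborhood this reduces to the first implementation. The function name, the `neighbors` mapping, and the action labels are hypothetical.

```python
def action_probability_gap(p_user, a_star, theta_a, neighbors=None):
    """Return True when the user's action distribution disagrees with a*.

    p_user:    dict mapping each action a to P_u(a|s)
    a_star:    machine-selected (optimal) action a*
    theta_a:   action threshold, theta_a >= 0
    neighbors: optional dict mapping an action to the set of actions a user
               cannot tell apart from it (assumed neighborhood definition)
    """
    near = {a_star} | set((neighbors or {}).get(a_star, ()))
    a_best = max(p_user, key=p_user.get)                # most probable user action
    p_near = max(p_user.get(a, 0.0) for a in near)      # P_u(a**|s)
    return a_best not in near and (p_user[a_best] - p_near) > theta_a


p_user = {"brake": 0.7, "slow_down": 0.2, "keep_speed": 0.1}

# First implementation: no neighborhood, a* compared directly.
gap_plain = action_probability_gap(p_user, "keep_speed", theta_a=0.3)

# Second implementation: "brake" is treated as indistinguishable from "slow_down".
gap_near = action_probability_gap(p_user, "slow_down", theta_a=0.3,
                                  neighbors={"slow_down": {"brake"}})
```

In the first call the user strongly prefers braking while the vehicle keeps its speed, so a gap is reported. In the second call the preferred action falls inside the neighborhood of a*, so no gap is reported.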
In box 1204, a gap is determined between expected next states. Assuming a probability distribution P_u(s″|s,a*) for the user model and the actual next state s′, two methods can be used to calculate the gap. In a first method, the maximizing state of P_u(s″|s,a*) is found and its probability is compared with P_u(s′|s,a*). If the difference is greater than the threshold and the maximizing state is sufficiently different from s′, TRUE is returned. Otherwise, FALSE is returned. In a second method, a maximum value of P_u(s″|s,a*) is found over all states s″ that are within a selected neighborhood of the actual state s′. The maximizing state is denoted as s*. P_u(s*|s,a*) is compared to the maximum of P_u(s″|s,a*) over all possible states. If this difference is greater than a threshold, TRUE is returned. Otherwise, FALSE is returned.
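The two methods of box 1204 can be sketched in the same way. The sketch assumes that states are numeric tuples and that the neighborhood of s′ contains every state whose coordinates all lie within a chosen radius of s′; a radius of zero gives the first method. These encodings and names are illustrative assumptions.

```python
def next_state_gap(p_user_next, s_actual, theta_s, radius=0.0):
    """Return True when the user's next-state distribution disagrees with s'.

    p_user_next: dict mapping each candidate next state s'' to P_u(s''|s, a*)
    s_actual:    actual next state s'
    theta_s:     threshold, theta_s >= 0
    radius:      neighborhood size around s' (hypothetical parameter)
    """
    def close(s1, s2):
        return all(abs(x - y) <= radius for x, y in zip(s1, s2))

    s_best = max(p_user_next, key=p_user_next.get)      # state the user most expects
    p_near = max((p for s, p in p_user_next.items() if close(s, s_actual)),
                 default=0.0)                           # P_u(s*|s, a*)
    return (not close(s_best, s_actual)) and (p_user_next[s_best] - p_near) > theta_s


# Hypothetical states: (speed in m/s, lane index).
p_next = {(20.0, 0.0): 0.8, (25.0, 0.0): 0.15, (25.0, 1.0): 0.05}
gap_exact = next_state_gap(p_next, (25.0, 1.0), theta_s=0.5)
gap_loose = next_state_gap(p_next, (25.0, 1.0), theta_s=0.5, radius=5.0)
```

With a zero radius the user expects a different state with much higher probability, so a gap is reported. With a radius of 5 the most expected state falls inside the neighborhood of s′ and no gap is reported.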
In box 1206, a difference is determined between selected actions. An action threshold θa>=0 is assumed. In a first implementation of box 1206, when the action threshold θa=0, if a* and a are different, then TRUE is returned. Otherwise, FALSE is returned. A second implementation of box 1206 includes a personalized or human-centered approach by determining whether a different action is considered distinguishable to a human. For example, reducing a vehicle's speed by 1-2 mph might be negligible while driving on a highway. If the difference is distinguishable, TRUE is returned. Otherwise, FALSE is returned.
In box 1208, a gap is determined between the user-expected next state (s″) and the actual next state (s′). An expectation threshold θs>=0 is assumed. In a first implementation, when the threshold θs=0, if the actual next state s′ and the user-expected next state s″ based on action a* are not the same, then TRUE is returned. Otherwise, FALSE is returned. A second implementation of box 1208 includes a personalized or human-centered approach by determining whether a different state is considered distinguishable to a human.
For the second implementation, a class of states is defined. The class is given by a set of similar states. A state is similar to another state when, for example, all of the parameter values are within a selected distance θs of each other. If the actual next state s′ and the user-expected next state s″ do not belong to the same class of states, then TRUE is returned. Otherwise, FALSE is returned.
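The class-of-states test can be sketched as follows, assuming states are numeric tuples and taking the example similarity rule literally: two states are similar when every parameter differs by at most θs. The function names and sample values are hypothetical.

```python
def same_class(s1, s2, theta_s):
    """Two states are similar when every parameter is within theta_s."""
    return len(s1) == len(s2) and all(abs(x - y) <= theta_s for x, y in zip(s1, s2))


def state_gap(s_actual, s_expected, theta_s=0.0):
    """Return True when s' and s'' are distinguishable to the user."""
    return not same_class(s_actual, s_expected, theta_s)


# Hypothetical states: (speed in mph, lane index).
strict = state_gap((30.0, 1.0), (31.5, 1.0), theta_s=0.0)   # any difference counts
lenient = state_gap((30.0, 1.0), (31.5, 1.0), theta_s=2.0)  # 1.5 mph is negligible
```

With θs=0 the function reproduces the first implementation, in which any difference between s′ and s″ is reported as a gap.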
In box 1212, an OR function is applied to the values (TRUE, FALSE) output from boxes 1202, 1204, 1206 and 1208 to return a final gap determination value. The OR function returns either TRUE (box 1214) or FALSE (box 1216).
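The combination step of box 1212 can be sketched as a logical OR over whichever comparisons were performed. Representing a comparison that was not performed as `None` is an assumption of this sketch, made because the gap detector may perform only some of the four comparisons.

```python
def final_gap(box_results):
    """OR together the TRUE/FALSE outputs of boxes 1202, 1204, 1206 and 1208.

    box_results: dict mapping a box number to True, False, or None
                 (None marks a comparison that was not performed)
    """
    performed = [r for r in box_results.values() if r is not None]
    if not performed:
        raise ValueError("at least one comparison must be performed")
    return any(performed)


result = final_gap({1202: False, 1204: True, 1206: None, 1208: False})
```

A single TRUE from any performed comparison is enough for the final determination to be TRUE.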
Once the results of the comparison are generated, various signals can be output. In one embodiment, the output signal indicates either that the machine-selected action can be used at the vehicle as is or that an adjustment can be made to the machine-selected action. In another embodiment, the signal provides an explanation to the user to inform the user about the differences between the user's expectations and the applied action. The signal can also cause the vehicle to adjust the machine-selected action to correspond to a user-expected action. Alternatively, the signal can cause the vehicle to transfer control over to the user. The signal can also cause the difference to be provided to a traffic controller to allow the traffic controller to make suitable adjustments.
For example, the current state s of the machine and actual next state s′ are represented by vectors shown in Eq. (7a) and Eq. (7b), respectively:
$$s = (x_1, x_2, \ldots, x_k) \tag{7a}$$
$$s' = (x_1', x_2', \ldots, x_k') \tag{7b}$$
The coordinates in Eq. (8) are considered to be not available to the user:
$$(x_l, x_{l+1}, \ldots, x_k) \tag{8}$$
The smoothed transition probability model is given as shown in Eq. (9)
where the summations in the numerator are over the unavailable coordinates of the current state and next state of Eqs. (7a) and (7b) and where:
$$\hat{s} = (x_1, x_2, \ldots, x_{l-1}) \tag{10}$$
represents the states available in the user model (composed of the available coordinates).
While the above disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from its scope. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the present disclosure not be limited to the particular embodiments disclosed, but will include all embodiments falling within the scope thereof.