The invention relates to a method for supervising operation of a motor vehicle. The invention also relates to a device for supervising operation of a motor vehicle. The invention also relates to a computer program implementing the aforementioned method. The invention lastly relates to a recording medium on which such a program is recorded.
Development of autonomous vehicles has made it necessary to be able to guarantee the safety of the automated systems employed in this type of vehicle. This in particular implies the ability to evaluate the effectiveness of automated systems in any type of situation, including critical situations such as sudden lane changes, collision avoidance, sensor failure, aggressive maneuvers by surrounding vehicles, etc.
Document US20200174471 discloses a method for evaluating the performance of an autonomous vehicle by analyzing the overall behavior of the vehicle relative to its environment, this method using a reinforcement learning algorithm by way of reward system.
However, this solution has drawbacks. In particular, because the behavior of the vehicle is evaluated as a whole, the sub-system or sub-systems that could be the cause of a malfunction cannot be pinpointed.
The aim of the invention is to provide a device and method for supervising operation of a motor vehicle that remedy the above drawbacks and that improve on the supervising devices and methods known in the prior art. In particular, the invention makes it possible to provide a device and method that are simple and reliable and that allow the sub-system or sub-systems that are the cause of a malfunction to be pinpointed.
To this end, the invention relates to a method for supervising operation of a motor vehicle comprising an ordered set of at least two automated systems. The method comprises:
In one embodiment, iterations of the supervising second step are interrupted when an automated system obtains a negative score.
In one embodiment, the ordered set of at least two automated systems is formed, in order, of the following systems: a system for controlling movement of the vehicle, then a decision-making system, then a system for processing perception data.
In one embodiment, the motor vehicle comprises communication systems including vehicle-to-vehicle communication systems and/or vehicle-to-infrastructure communication systems, the motor vehicle further being equipped with a human-machine interface,
and the first step of activating supervision comprises receiving an evaluation of a behavior of the motor vehicle either from the human-machine interface or from the communication systems.
In one embodiment, the method comprises, following receipt of an evaluation of a behavior from the communication systems,
In one embodiment, the third step of updating at least one automated system comprises implementing a reinforcement learning algorithm or a switching system.
The invention further relates to a device for supervising operation of a motor vehicle, the vehicle being equipped with an ordered set of at least two automated systems. The device comprises hardware and/or software elements implementing the method such as defined above, in particular hardware and/or software elements designed to implement the method such as defined above, and/or the device comprises means for implementing the method such as defined above.
The invention also relates to a computer program product comprising program code instructions recorded on a computer-readable medium for implementing the steps of the method such as defined above when said program is run on a computer. The invention also relates to a computer program product that is downloadable from a communication network and/or recorded on a data medium that is readable by a computer and/or executable by a computer, comprising instructions that, when the program is executed by the computer, lead the latter to implement the method such as defined above.
The invention also relates to a computer-readable data recording medium on which is recorded a computer program comprising program code instructions for implementing the method such as defined above. The invention also relates to a computer-readable recording medium comprising instructions that, when they are executed by a computer, cause the latter to implement the method such as defined above.
The invention also relates to a signal of a data carrier, carrying the computer program product such as defined above.
The appended drawing shows, by way of example, one embodiment of a supervising device according to the invention and one mode of execution of a supervising method according to the invention.
One example of a motor vehicle 100 equipped with one embodiment of a device for supervising operation of a motor vehicle will now be described with reference to
The motor vehicle 100 may be a motor vehicle of any type, in particular a passenger vehicle, a commercial vehicle, a truck or even a means of public transport such as a bus or a shuttle. According to the described embodiment, the motor vehicle 100 is an autonomous vehicle and will be designated the “autonomous vehicle” in the remainder of the description.
This illustration is therefore given non-limitingly. In particular, the motor vehicle could be a non-autonomous vehicle equipped with an advanced driver-assistance system, in particular an advanced driver-assistance system corresponding to a level greater than or equal to level 2 autonomy, i.e. corresponding to a partially autonomous vehicle.
The autonomous vehicle 100 mainly comprises the following elements:
The sensors 1 of the environment of the autonomous vehicle 100 may comprise a set of cameras and/or lidars and/or radars for observing the environment all the way around, i.e. 360 degrees around, the autonomous vehicle 100. They may further comprise a GPS and an inertial measurement unit, which are used to locate the autonomous vehicle 100. The data delivered by the sensors 1 are transmitted to the system 5 for processing perception data.
The communication systems 2 comprise vehicle-to-vehicle communication systems (V2V systems) or vehicle-to-infrastructure communication systems (V2i systems) allowing vehicles to exchange information with one another and with infrastructure, in particular information regarding meteorological conditions.
The human-machine interface 3 is intended for the driver or user of the autonomous vehicle 100. It allows the driver or user of the autonomous vehicle 100 to evaluate the behavior of her or his own vehicle or of a surrounding vehicle. It also makes it possible to inform the driver when responsibility for controlling the vehicle has been transferred to her or him following failure of at least one automated system.
The actuators 4 effect the movement of the autonomous vehicle 100; they comprise an engine/motor torque actuator, a brake actuator and an actuator of rotation of the steered wheels.
The system 5 for processing perception data processes the data delivered by the sensors 1 and by the communication systems 2, and constructs a representation of the environment of the autonomous vehicle 100. The system 5 delivers as output the position of the vehicle and a description of all the relevant objects surrounding it.
The decision-making system 6 receives the data delivered by the system 5 for processing perception data, which inform it of the current situation of the vehicle and of its environment. Depending on these data, the system 6 adapts the behavior of the vehicle to the present situation. In particular, the system 6 may determine a decision of the autonomous vehicle 100, relating for example to a lane change maneuver, and/or to remaining in lane with a change of speed.
The system 6 comprises a navigation system whose role is to generate the motion and to plan the behavior of the vehicle. It acts upstream of the system 7 for controlling movement of the vehicle, to adapt the response of the vehicle to current scenarios.
The system 7 for controlling movement of the vehicle consists mainly of a longitudinal control sub-system, a lateral control sub-system and the chassis. The system 7 generates commands with a view to minimizing an error between the actual path of the vehicle and the path defined by the navigation system, for example for the purposes of lane keeping, speed tracking, etc.
The system 5 for processing perception data, the decision-making system 6 and the system 7 for controlling movement of the vehicle are capable of receiving rewards or scores from the microprocessor 81.
In the described embodiment, the autonomous vehicle 100 therefore comprises an ordered set ENS of automated systems, the ordered set comprising, in order, the following systems: the system 7 for controlling the movement of the vehicle, then the decision-making system 6, then the system 5 for processing perception data.
In the embodiment of the invention, the computer 81 makes it possible to execute a software package comprising the following modules, which communicate with one another:
One mode of execution of the method for controlling an autonomous vehicle will now be described with reference to
In a first step E1, supervision of the ordered set ENS of automated systems is activated.
In a first embodiment, activation of supervision is automatic. For example, it occurs on start-up of the autonomous vehicle 100, then the set of sub-systems is periodically supervised, with a period P that may be set or dependent on the navigation context of the vehicle.
In the first embodiment of step E1, it is for example possible to use a time delay TEMPO associated with step E1. The time delay is recorded in the local memory 82. It may take two states: a state called the inactive state, and a state called the active state. By default the time delay is in the inactive state. When a time delay is started, it enters the active state. Next, when the time delay has elapsed, the time delay enters the inactive state.
In the first embodiment of step E1, the state of the time delay TEMPO is tested.
If the time delay is inactive, then the time delay TEMPO is activated and assigned a duration P corresponding to the supervision period. The time delay then enters the active state for a time P. The method then passes to step E2.
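By way of non-limiting illustration, the time-delay logic of this first embodiment of step E1 may be sketched as follows. The class name, the use of a monotonic clock and the period value are illustrative assumptions, not features of the claimed method.

```python
import time

class TimeDelay:
    """Two-state time delay (TEMPO): inactive by default, active for a duration P."""

    def __init__(self):
        self._deadline = None  # None encodes the inactive state

    @property
    def active(self):
        # The delay remains active until its deadline has elapsed.
        if self._deadline is not None and time.monotonic() < self._deadline:
            return True
        self._deadline = None  # elapsed: return to the inactive state
        return False

    def start(self, period_p):
        # Starting the delay puts it in the active state for a time P.
        self._deadline = time.monotonic() + period_p

def should_supervise(tempo, period_p):
    """Step E1, first embodiment: trigger supervision each time the delay elapses."""
    if not tempo.active:
        tempo.start(period_p)  # re-arm the delay for the next period P
        return True            # pass to step E2
    return False
```

On start-up the delay is inactive, so supervision triggers immediately and then repeats with the period P, which may be set or dependent on the navigation context of the vehicle.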
In a second embodiment, which is complementary or an alternative to the first embodiment, activation of supervision may comprise receipt of an evaluation of a behavior of the autonomous vehicle 100 delivered by either the human-machine interface 3 or the communication systems 2.
In other words, an evaluation of a behavior of the autonomous vehicle 100 has been issued
In the second embodiment, the way in which the vehicle 100 is being controlled is evaluated. Advantageously, the way in which the vehicle is being controlled is recorded in the local memory 82 and kept up to date. The vehicle 100 may be being controlled
If the vehicle is being controlled in the first way T1, this means that the evaluation of a behavior of the vehicle is meant for a human driver of the vehicle. A score is then assigned to the human driver depending on the evaluation of the behavior EC, and is transmitted to her or him via the human-machine interface 3. The method then returns to step E1.
If the vehicle is being controlled in the second way T2, this means that the evaluation of a behavior of the vehicle must be analyzed depending on the automated systems of the set ENS. The method then passes to step E2.
In the second step E2, the automated systems of the set ENS are successively supervised, the order of supervision of the systems being set. In the embodiment of the set ENS described, this amounts to supervising first the system 7 for controlling movement of the vehicle, then the decision-making system 6, then the system 5 for processing perception data.
Supervision of an automated system Si comprises two sub-steps:
In the remainder of the document, the terms “score” or “reward” are used interchangeably to designate a numerical evaluation of the operation of an automated system Si.
The processing operations performed in sub-steps E21 and E22 depend on the evaluated sub-system Si. Sub-steps E21 and E22 are iterated on the various systems Si.
For example, with regard to supervision of the system 7 for controlling movement of the vehicle, in sub-step E21, it is possible to compute a difference between a first path applied by the autonomous vehicle 100 and an ideal second path determined beforehand by the decision-making module 6. In sub-step E22, a positive, negative or zero score is assigned depending on the computed deviation.
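The scoring of the movement-controlling system 7 may be sketched as follows, by way of example. The point-by-point path representation and the tolerance thresholds are illustrative assumptions.

```python
def score_movement_control(actual_path, ideal_path, tol=0.2, fail=1.0):
    """Sub-steps E21/E22 for the movement-controlling system 7 (sketch).

    Computes the maximum lateral deviation between the first path, applied
    by the vehicle, and the ideal second path determined by the
    decision-making system, then maps it to a positive, zero or negative
    score. The thresholds `tol` and `fail` (in metres) are illustrative.
    """
    # E21: deviation between the two paths, sampled point by point
    deviation = max(abs(a - b) for a, b in zip(actual_path, ideal_path))
    # E22: score assigned depending on the computed deviation
    if deviation <= tol:
        return +1   # path followed closely: positive score
    if deviation <= fail:
        return 0    # acceptable tracking: zero score
    return -1       # excessive deviation: negative score
```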
With regard to supervision of the decision-making system 6, in sub-step E21, it is possible to verify whether a decision defined by the decision-making module 6 meets the constraints determined by the perceiving system 5. For example, it is possible to check whether the decision respects the configuration of the surrounding traffic, the state of traffic lights, etc. Next, in sub-step E22, if the decision respects the configuration of the surrounding traffic, then a positive or zero score is assigned to the decision-making module 6. Otherwise, a negative score is assigned to the decision-making module 6.
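The corresponding check for the decision-making system 6 may be sketched as follows; the representation of a decision as a value tested against a list of constraint predicates is an illustrative assumption.

```python
def score_decision(decision, constraints):
    """Sub-steps E21/E22 for the decision-making system 6 (sketch).

    Each constraint is a predicate derived from the perceiving system
    (configuration of the surrounding traffic, state of traffic lights,
    ...). A decision that violates any constraint receives a negative
    score; otherwise a positive score is assigned.
    """
    # E21: verify the decision against every perception-derived constraint
    respected = all(check(decision) for check in constraints)
    # E22: positive score if the constraints are respected, negative otherwise
    return +1 if respected else -1
```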
With regard to supervision of the perceiving system 5, in sub-step E21, it is possible to check the consistency between current perceptions delivered by the perceiving system 5 at supervision time T, and the perceptions delivered by the perceiving system 5 at a previous time T−dT. For example, it is possible to check whether ghost tracks have appeared between the times T−dT and T, or it is possible to detect an uncertainty in the location of a vehicle or of an object between the times T−dT and T. It is moreover possible to detect images that are degraded due to poor meteorological conditions. Next, in sub-step E22, depending on the results of sub-step E21, a positive or negative or zero score is assigned to the perceiving system.
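A consistency check of this kind for the perceiving system 5 may be sketched as follows. The track representation (identifier mapped to lateral position) and the jump threshold are illustrative assumptions; a full implementation would also cover ghost-track detection and image degradation.

```python
def score_perception(tracks_prev, tracks_now, max_jump=2.0):
    """Sub-steps E21/E22 for the perceiving system 5 (sketch).

    Compares object tracks delivered at supervision time T with those
    delivered at the previous time T-dT: a track whose position jumps by
    more than `max_jump` metres between the two times is treated as a
    location inconsistency and yields a negative score.
    """
    for track_id, pos in tracks_now.items():
        prev = tracks_prev.get(track_id)
        # E21: location consistency of tracks matched between T-dT and T
        if prev is not None and abs(pos - prev) > max_jump:
            return -1   # implausible jump: uncertainty in the location
    # E22: perceptions consistent: positive score
    return +1
```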
Thus, each automated system Si supervised in step E2 obtains a positive, negative or zero score Ni. In one embodiment, the scores Ni obtained are recorded in the local memory 82 in order to be processed during the subsequent execution of step E3 of updating at least one automated system Si.
In addition, in one preferred embodiment, iteration of the second step E2 is interrupted as soon as an automated system Si obtains a negative score Ni. This preferred embodiment of step E2 is illustrated by
In a first step 140, it is tested whether the system 7 for controlling movement of the vehicle has achieved its objectives:
In the third step 150, it is tested whether the decision-making system 6 has achieved its objectives:
In the sixth step 160, it is tested whether the sensors 1 are faulty:
Thus, during the iteration of step E2 applied to the system 7 for controlling movement of the vehicle, if the system 7 obtains a negative score, the method passes directly to the step E3 of updating the system 7, without supervising either the decision-making system 6 or the perceiving system 5.
Likewise, if, during the iteration of step E2 applied to the decision-making system 6, the decision-making system 6 obtains a negative score, the method passes directly to the step E3 of updating the decision-making system 6, without supervising the perceiving system 5.
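This early-exit supervision of the ordered set ENS may be sketched as follows, by way of example. Representing each automated system as a name paired with a scoring function is an illustrative assumption.

```python
def supervise(ordered_systems):
    """Step E2, preferred embodiment: supervise the ordered set ENS and
    interrupt at the first negative score.

    `ordered_systems` is a list of (name, score_fn) pairs in the order of
    the described embodiment: movement control (7), then decision-making
    (6), then perception (5). Returns the first faulty system (or None)
    together with the scores Ni obtained before the interruption.
    """
    scores = {}
    for name, score_fn in ordered_systems:
        score = score_fn()      # sub-steps E21/E22 applied to system Si
        scores[name] = score
        if score < 0:
            return name, scores  # interrupt: system `name` passes to step E3
    return None, scores          # no fault found among the supervised systems
```

When no system obtains a negative score despite a negative external evaluation, the `None` result corresponds to passing to step E4.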
When a negative evaluation of a behavior of the autonomous vehicle 100 (delivered by the human-machine interface 3 or by the communication systems 2) has been processed in step E1, step E2 is expected to determine which automated system Si was the origin of the negatively evaluated behavior. However, it may happen that no automated system Si receives a negative score during execution of step E2. In that case, the supervision performed in step E2 identifies no possible source of the behavior of the autonomous vehicle, and therefore no means of correcting the negatively evaluated behavior. The method then passes to step E4 of transferring the task of driving to a human driver.
In step E3, at least one automated system Si of the ordered set ENS is updated depending on a score Ni assigned to the at least one automated system Si.
Step E3 comprises implementing an adaptation of each automated system Si that obtained a score Ni during execution of step E2. In one embodiment, the adaptation comprises implementing a reinforcement learning algorithm RLi and/or the adaptation comprises implementing a switching system SWi.
In the remainder of the document, the term “adaptation” of a system Si designates training or improving or updating the system Si by taking into account a score Ni obtained during execution of step E2.
As a variant, the learning algorithm RLi could use a learning method of the Q-learning type, which allows the system Si to learn a strategy for determining which action Ai to perform in each state Ei of the system. It functions by learning a function Qi that makes it possible to determine the potential gain Qi(Ei(t),Ai(t)), i.e. the long-term reward gained by choosing an action Ai(t) in a state Ei(t) in accordance with an optimal policy. One of the advantages of the Q-learning method is that it does not depend on an evolution model or control strategy defined beforehand by the user, but is based directly on the interaction of the system with its environment and the reward received at each step.
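By way of example, one tabular Q-learning update of this type may be sketched as follows; the learning rate, discount factor and epsilon-greedy exploration policy are illustrative assumptions.

```python
from collections import defaultdict
import random

def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One Q-learning step on a tabular function Qi:

    Q(E,A) <- Q(E,A) + alpha * (N + gamma * max_A' Q(E',A') - Q(E,A))

    where the reward N is the score Ni assigned in step E2. No prior
    model of the system is needed, only the observed transition.
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def choose_action(Q, state, actions, epsilon=0.1):
    """Epsilon-greedy policy over the learned Q function."""
    if random.random() < epsilon:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: Q[(state, a)])   # exploit the learned gains
```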
Thus, when adaptation of an automated system Si comprises implementing a learning algorithm RLi, a positive or zero score obtained by the system Si during execution of step E2 makes it possible to improve the robustness of the system, in particular by saving to the local memory 82 the parameters of the system Si that were applied during the scenario associated with that score, notably the data received as input and the data delivered as output by the system Si.
Alternatively, adaptation of an automated system Si may comprise implementing a switching system SWi. A switching system SWi consists of a set of sub-systems SSWik and of a logical law Li (or switching controller Li) that indicates which sub-system SSWik is active.
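Such a switching system may be sketched as follows, by way of example; the sub-systems used and the law selecting between them are illustrative assumptions.

```python
class SwitchingSystem:
    """Sketch of a switching system SWi: a set of sub-systems SSWik and a
    logical law Li (switching controller) that selects the active one."""

    def __init__(self, subsystems, law):
        self.subsystems = subsystems  # the SSWik, e.g. alternative controller tunings
        self.law = law                # Li: maps the current state to a sub-system key

    def step(self, state):
        # The logical law Li indicates which sub-system SSWik is active...
        active = self.law(state)
        # ...and only that sub-system processes the current state.
        return active, self.subsystems[active](state)
```

For instance, the law Li could switch from a nominal to a degraded lateral controller when the score Ni obtained in step E2 is negative, preserving the stability required of the system 7.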
In one embodiment, a Q-learning algorithm could interact with a switching controller Li designed to supervise the switching system SWi.
The choice between a first type of adaptation, implementing a reinforcement learning algorithm RLi, and a second type of adaptation, implementing a switching system SWi, may be guided by the need for the system Si to exhibit stability. For example, in the case of the system 7 for controlling movement of the vehicle, it is essential for the system to exhibit stability for reasons of safety of the movement of the vehicle. In this case, an adaptation of the second type, i.e. one implementing a switching system SWi, will preferably be used.
Alternatively, when the system Si to be adapted manages a large number of data, it may be more advantageous to use an adaptation of the first type, i.e. one implementing a reinforcement learning algorithm RLi. For example, it is particularly advantageous to use a reinforcement learning algorithm to adapt the decision-making system 6. Specifically, this type of algorithm makes it possible to manage a large number of physical and measured data, then to evaluate relationships between data describing the environment of the vehicle and a decision made by the decision-making system 6.
In other words, the complexity of the task of driving and the unpredictability of the environment make it difficult to model the current situation and the risk associated therewith. Reinforcement learning algorithms RLi offer an alternative to such modeling: they allow the increasing complexity of the system to be managed by providing an intermediate solution comprising a first, exploratory processing part and a second processing part that exploits the data, the two parts together making it possible to infer a behavior of the system Si.
One example of implementation of a supervising method according to the invention is illustrated in
The graph G1 of
The target path 201 is represented by the straight line Y=0 in the graph G1.
The autonomous vehicle 100 initiates a lane change maneuver when it is located at a lateral distance from the target path 201 equal to 3 meters, as represented by point A in
The system 5 for processing perception data identifies the presence of a truck located at a y-coordinate 202 equal to −1 meter, i.e. at a lateral distance of 1 meter from the target path and at a lateral distance of 4 meters from the autonomous vehicle 100 at the start of the maneuver (i.e. at point A).
On the basis of the information delivered by the system 5 for processing perception data, the decision-making system 6 transmits to the movement-controlling system 7
Depending on the setpoints determined by the decision-making system 6, the objective of the movement-controlling system 7 is to follow the path smoothly (in particular by applying a small steering force) while respecting the required lateral constraints.
Consequently, in the iteration of the supervising step E2 applied to the movement-controlling system 7, and more particularly in sub-step E21, the system 7 is evaluated according to the following criteria:
As long as the vehicle meets the first and second criteria,
In the example illustrated in
In step E22, the first version 71 of the system 7 therefore obtains a highly negative score N71, in particular −6, as is illustrated in
In step E3, the score N71 is then transmitted to the system 7 which is thus informed that it must be improved.
A second version 72 of the system 7 is then used instead of the first version 71.
In step E3, the score N72 is then transmitted to the system 7 which is thus informed that it must still be improved.
A third version 73 of the system 7 is then used instead of the second version 72.
In a step E4, control of the vehicle is transferred to a user of the autonomous vehicle 100. The automated systems Si are then no longer active. The way in which the vehicle is being controlled is then updated in the local memory 82, i.e. to reflect that the first way T1 is now being used.
Finally, the supervising method according to the invention makes it possible to evaluate, validate and improve any autonomous advanced driver-assistance system (ADAS) or any automated system of an autonomous vehicle.
To this end, the supervising method assigns individual scores (or rewards) to the behavior of each supervised system, these scores potentially being positive or negative. The first effect thereof is to determine the source of a potential malfunction and thus to allow measures to be taken to increase the safety of operation of each supervised system individually. The second effect thereof is to improve operation of each supervised system, through storage of data reflecting the experience of the autonomous vehicle, in particular when a supervised system obtains a positive score. The experience of each supervised system may be capitalized upon through use, for example, of a reinforcement learning algorithm or a switching system.
The supervising method according to the invention may be implemented in various circumstances.
Firstly, supervision may occur during the phase of calibrating the autonomous vehicle, in order to train each automated system of the autonomous vehicle 100 before the vehicle is put on sale.
Secondly, supervision may occur during the everyday use of the autonomous vehicle 100 by a user. Supervision may then be periodic. In addition or alternatively, supervision may be triggered on an ad hoc basis following a negative evaluation of a behavior of the autonomous vehicle 100, the evaluation possibly coming from data delivered by V2V or V2i networks, or from a human-machine interface of the autonomous vehicle 100.
When it generates a positive score, the supervision provides an opportunity to learn from experience, for example when a decision made by the autonomous vehicle was particularly well suited to a delicate driving situation.
When it generates a negative score, the supervision signals a problem to be solved and leads to improvement of the supervised system and/or transfer of the task of driving to a user of the autonomous vehicle 100.
The supervising method is also applicable to a human driver of the autonomous vehicle 100. In this case, the supervising method transmits scores to the human driver via a human-machine interface, these scores potentially being determined depending on evaluations delivered by a surrounding vehicle or piece of infrastructure.
The supervising method according to the invention thus has a number of advantages.
Firstly, it improves the performance, reliability and safety of the vehicle, in particular by improving the adaptability of the vehicle to normal and critical situations.
In addition, the supervising method according to the invention allows the vehicle to improve its operation as it is used, by acquiring knowledge of the situations encountered.
The supervising method according to the invention is applicable to any automated system with which the vehicle is equipped. It may also be applied to a human driver of the vehicle.
Number | Date | Country | Kind |
---|---|---|---|
FR2113515 | Dec 2021 | FR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2022/085294 | 12/12/2022 | WO |