The present disclosure pertains to methods for evaluating the performance of trajectory planners in real or simulated scenarios, and computer programs and systems for implementing the same. Such planners are capable of autonomously planning ego trajectories for fully/semi-autonomous vehicles or other forms of mobile robot. Example applications include ADS (Autonomous Driving System) and ADAS (Advanced Driver Assist System) performance testing.
There have been major and rapid developments in the field of autonomous vehicles. An autonomous vehicle (AV) is a vehicle which is equipped with sensors and control systems which enable it to operate without a human controlling its behaviour. An autonomous vehicle is equipped with sensors which enable it to perceive its physical environment, such sensors including for example cameras, radar and lidar. Autonomous vehicles are equipped with suitably programmed computers which are capable of processing data received from the sensors and making safe and predictable decisions based on the context which has been perceived by the sensors. An autonomous vehicle may be fully autonomous (in that it is designed to operate with no human supervision or intervention, at least in certain circumstances) or semi-autonomous. Semi-autonomous systems require varying levels of human oversight and intervention. An Advanced Driver Assist System (ADAS) and certain levels of Autonomous Driving System (ADS) may be classed as semi-autonomous. There are different facets to testing the behaviour of the sensors and control systems aboard a particular autonomous vehicle, or a type of autonomous vehicle.
A “level 5” vehicle is one that can operate entirely autonomously in any circumstances, because it is always guaranteed to meet some minimum level of safety. Such a vehicle would not require manual controls (steering wheel, pedals etc.) at all.
By contrast, level 3 and level 4 vehicles can operate fully autonomously but only within certain defined circumstances (e.g. within geofenced areas). A level 3 vehicle must be equipped to autonomously handle any situation that requires an immediate response (such as emergency braking); however, a change in circumstances may trigger a “transition demand”, requiring a driver to take control of the vehicle within some limited timeframe. A level 4 vehicle has similar limitations; however, in the event the driver does not respond within the required timeframe, a level 4 vehicle must also be capable of autonomously implementing a “minimum risk maneuver” (MRM), i.e. some appropriate action(s) to bring the vehicle to safe conditions (e.g. slowing down and parking the vehicle). A level 2 vehicle requires the driver to be ready to intervene at any time, and it is the responsibility of the driver to intervene if the autonomous systems fail to respond properly at any time. With level 2 automation, it is the responsibility of the driver to determine when their intervention is required; for level 3 and level 4, this responsibility shifts to the vehicle's autonomous systems and it is the vehicle that must alert the driver when intervention is required.
Safety is an increasing challenge as the level of autonomy increases and more responsibility shifts from human to machine. In autonomous driving, the importance of guaranteed safety has been recognized. Guaranteed safety does not necessarily imply zero accidents, but rather means guaranteeing that some minimum level of safety is met in defined circumstances. It is generally assumed this minimum level of safety must significantly exceed that of human drivers for autonomous driving to be viable.
According to Shalev-Shwartz et al. “On a Formal Model of Safe and Scalable Self-driving Cars” (2017), arXiv:1708.06374 (the RSS Paper), which is incorporated herein by reference in its entirety, human driving is estimated to cause of the order of 10^-6 severe accidents per hour. On the assumption that autonomous driving systems will need to reduce this by at least three orders of magnitude, the RSS Paper concludes that a minimum safety level of the order of 10^-9 severe accidents per hour needs to be guaranteed, noting that a pure data-driven approach would therefore require vast quantities of driving data to be collected every time a change is made to the software or hardware of the AV system.
The RSS Paper provides a model-based approach to guaranteed safety. A rule-based Responsibility-Sensitive Safety (RSS) model is constructed by formalizing a small number of “common sense” driving rules:
The RSS model is presented as provably safe, in the sense that, if all agents were to adhere to the rules of the RSS model at all times, no accidents would occur. The aim is to reduce, by several orders of magnitude, the amount of driving data that needs to be collected in order to demonstrate the required safety level.
A safety model (such as RSS) can be used as a basis for evaluating the quality of trajectories that are planned or realized by an ego agent in a real or simulated scenario under the control of an autonomous system (stack). The stack is tested by exposing it to different scenarios, and evaluating the resulting ego trajectories for compliance with rules of the safety model (rules-based testing). A rules-based testing approach can also be applied to other facets of performance, such as comfort or progress towards a defined goal.
A first aspect herein is directed to a computer-implemented method of evaluating the performance of a trajectory planner for a mobile robot in a scenario, in which the trajectory planner is used to control an ego agent of the scenario responsive to at least one other agent of the scenario, the method comprising: determining a scenario parameter set (set of one or more scenario parameters) for the scenario and a likelihood of the set of scenario parameters; computing an impact score for a dynamic interaction (failure event or near failure event) between the mobile robot and the other agent occurring in the scenario, the impact score quantifying severity of the dynamic interaction; and computing a risk score for the instance of the scenario based on the impact score and the likelihood of the set of scenario parameters.
The method allows the most material instances of failure (or near failure) to be easily pinpointed. That is to say, scenario instances that are both likely to occur in reality and that result in serious failure (or near failure). The risk score quantifies the significance of the scenario instance in a manner that depends on both the severity of failure (or near failure) and the likelihood of the scenario instance occurring in the real world (the most significant scenarios being those that are likely and result in the worst instances of failure or near-failure). “Risk” in this context refers to any measure that takes into account both the likelihood of the scenario instance and the severity of its outcome. The terms risk and significance are generally used interchangeably herein, unless otherwise indicated.
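By way of illustration only, the combination of impact and likelihood might be sketched as follows. The sketch assumes the risk score is the simple product of the two quantities; the disclosure requires only that the measure take both into account, so the product, the function name `risk_score` and the example values are all assumptions made for the purpose of the example.

```python
def risk_score(impact: float, likelihood: float) -> float:
    """Combine severity and likelihood into one score (assumed here: a simple product)."""
    if impact < 0 or likelihood < 0:
        raise ValueError("impact and likelihood must be non-negative")
    return impact * likelihood


# Hypothetical scenario instances: (name, impact score, likelihood of parameter set).
instances = [
    ("rare_severe", 10.0, 0.001),
    ("common_moderate", 4.0, 0.05),
    ("common_minor", 0.5, 0.2),
]

# Rank the instances so that the most significant ones come first.
ranked = sorted(
    instances,
    key=lambda inst: risk_score(inst[1], inst[2]),
    reverse=True,
)
ranked_names = [name for name, _, _ in ranked]
```

Under this assumed product form, a severe but very unlikely instance can rank below a moderate but likely one, which is the kind of prioritisation the risk score is intended to support.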
In embodiments, the scenario may be simulated (a simulated scenario instance run in a simulator according to the scenario parameter set). The mobile robot is a simulated ego agent of the scenario in this case.
The likelihood may be determined from at least one distribution associated with the set of scenario parameters.
For example, with a simulated scenario, the scenario parameter set may be sampled for simulating the scenario based on the at least one parameter distribution, and the likelihood may be determined from the at least one distribution from which the parameter set is sampled.
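A minimal sketch of sampling a parameter set and determining its likelihood from the same distributions is given below. It assumes, purely for illustration, two independent Gaussian scenario parameters; the parameter names, means and standard deviations are hypothetical, and the "likelihood" computed here is a joint probability density rather than a probability.

```python
import math
import random

# Hypothetical parameter distributions: name -> (mean, standard deviation).
PARAM_DISTRIBUTIONS = {
    "cut_in_distance_m": (20.0, 5.0),
    "other_agent_speed_mps": (15.0, 3.0),
}


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a Gaussian distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample_parameter_set(rng: random.Random) -> dict:
    """Draw one value per scenario parameter from its distribution."""
    return {
        name: rng.gauss(mean, std)
        for name, (mean, std) in PARAM_DISTRIBUTIONS.items()
    }


def likelihood(params: dict) -> float:
    """Joint density of the parameter set, assuming independent parameters."""
    density = 1.0
    for name, value in params.items():
        mean, std = PARAM_DISTRIBUTIONS[name]
        density *= normal_pdf(value, mean, std)
    return density


rng = random.Random(0)  # fixed seed so the sketch is repeatable
params = sample_parameter_set(rng)
params_likelihood = likelihood(params)
```

The sampled `params` would be passed to the simulator, and `params_likelihood` would be retained for the later risk computation.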
The risk score may be stored in association with the scenario parameter set on which the instance of the scenario is based.
The risk score may be outputted on a graphical user interface.
The method may comprise generating display data for controlling a display to render a visualization of multiple scenario parameter sets, and a risk score may be computed for each scenario parameter set.
The dynamic interaction could be a failure event, such as a collision between the ego agent and the other agent. However, other forms of interaction event, such as a “near miss”, may be considered. In the latter case, the impact score could quantify “how close” the agents have come to a failure event (such as a collision), e.g. based on the minimum distance between the agents. For example, the dynamic interaction could be one agent approaching the other, or cutting in in front of it. The term ‘interaction’ is used in a broad sense, and does not necessarily imply that the agents are specifically reacting to each other. The impact score generally quantifies the severity of failure and, in some embodiments, how close the trajectory planner came to failure (for example, the impact score may depend on how close the system came to a failure event, such as a collision, and the severity of that type of failure event).
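One possible way of deriving a near-miss impact score from the minimum distance between the agents is sketched below. The linear mapping, the thresholds `collision_dist` and `safe_dist`, and the representation of each trace as a list of (x, y) positions sampled at common time steps are assumptions made for the example only.

```python
import math


def min_distance(ego_trace, other_trace) -> float:
    """Smallest distance between the agents over time-aligned position samples."""
    return min(math.dist(p, q) for p, q in zip(ego_trace, other_trace))


def near_miss_impact(ego_trace, other_trace,
                     collision_dist: float = 0.5,
                     safe_dist: float = 5.0) -> float:
    """Map minimum distance to an impact in [0, 1] (assumed linear mapping).

    1.0 means a collision (distance at or below collision_dist);
    0.0 means the agents never came closer than safe_dist.
    """
    d = min_distance(ego_trace, other_trace)
    if d <= collision_dist:
        return 1.0
    if d >= safe_dist:
        return 0.0
    return (safe_dist - d) / (safe_dist - collision_dist)


# Hypothetical traces: the other agent closes in on the ego agent.
ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(0.0, 5.0), (1.0, 3.0), (2.0, 2.0)]
impact = near_miss_impact(ego, other)
```

Here the closest approach is 2 m, so the impact lies strictly between the "no interaction" and "collision" extremes.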
For example, the impact score may be a robustness score computed for a performance evaluation rule (or combination of rules), or be derived from a robustness score.
The method may comprise applying one or more performance evaluation rules to a trace of the ego agent and a trace of the other agent generated in the instance of the scenario.
The dynamic interaction may, for example, be a failure on at least one performance evaluation rule (or near failure).
Each performance evaluation rule may be associated with an importance value, and the impact score may be computed based on the importance value(s) of the at least one performance evaluation rule on which failure or near failure occurs.
The scenario parameter set may be sampled for running the simulated scenario based on the at least one parameter distribution used to determine the likelihood of the scenario parameter set.
For example, the impact score may be computed in respect of a single rule, e.g. which is equal to that rule's importance value if the rule is failed at any point in the scenario instance, and zero otherwise.
As another example, the impact score may be computed for multiple rules, e.g. by summing or otherwise aggregating the importance values of any rules that are failed.
The importance value associated with a rule quantifies how critical that rule is relative to other rules. For example, collision avoidance rules may be assigned higher importance values than comfort rules.
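The two importance-based examples above (a single rule, and an aggregation over multiple rules by summation) might be sketched as follows. The rule names and importance values are hypothetical, and `results` is assumed to map each rule to whether it was passed throughout the scenario instance.

```python
# Hypothetical importance values: higher means more critical.
IMPORTANCE = {
    "no_collision": 10.0,
    "safe_distance": 5.0,
    "comfort_jerk": 1.0,
}


def single_rule_impact(rule: str, results: dict) -> float:
    """The rule's importance value if the rule was failed, zero otherwise."""
    return 0.0 if results[rule] else IMPORTANCE[rule]


def aggregate_impact(results: dict) -> float:
    """Sum of the importance values of all failed rules."""
    return sum(IMPORTANCE[rule] for rule, passed in results.items() if not passed)


# Hypothetical outcome of one scenario instance: rule -> passed throughout?
results = {"no_collision": True, "safe_distance": False, "comfort_jerk": False}

impact_single = single_rule_impact("safe_distance", results)
impact_total = aggregate_impact(results)
```

Summation is only one aggregation choice; taking the maximum importance over the failed rules would be an equally simple alternative.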
A second aspect herein is directed to a computer-implemented method of evaluating the performance of a trajectory planner for a mobile robot in a scenario, in which the trajectory planner is used to control the mobile robot responsive to at least one other agent of the scenario, thereby generating a trace of the mobile robot and a trace of the other agent, the method comprising: determining a scenario parameter set for the scenario and a likelihood of the set of scenario parameters; applying one or more performance evaluation rules to the traces, thereby obtaining a set of performance evaluation results for the trajectory planner; computing an impact score based on the performance evaluation results; and computing a significance score for the instance of the scenario based on the impact score and the likelihood of the set of scenario parameters.
In embodiments, the scenario may be simulated (in this case, the mobile robot is an ego agent of the simulated scenario, which is run as a scenario instance according to the scenario parameter set in a simulator).
Another aspect provides a computer-implemented method of evaluating the performance of a trajectory planner for a mobile robot in a simulated scenario, the method comprising: determining a scenario parameter set for the simulated scenario and a likelihood of the set of scenario parameters; running an instance of the scenario according to the scenario parameter set in a simulator, in which the trajectory planner is used to control an ego agent of the scenario responsive to at least one other agent of the scenario; computing an impact score for a failure event or other dynamic interaction between the ego agent and the other agent occurring in the scenario instance, the impact score quantifying severity of the failure event; and computing a risk score for the instance of the scenario based on the impact score and the likelihood of the set of scenario parameters.
Further aspects provide a computer system comprising one or more computers configured to implement the method of the first, second or third aspect or any embodiment thereof, and executable program instructions for programming a computer system to implement the same.
The method may comprise using the risk score to identify and mitigate an issue in the trajectory planner.
In any of the above, the trajectory planner may be tested in isolation or in combination with one or more other components (such as a controller, prediction system and/or planning system).
In that case, the risk score may be used to identify and mitigate an issue in another such component.
For a better understanding of the present disclosure, and to show how embodiments of the same may be carried into effect, reference is made by way of example only to the following figures in which:
The described embodiments provide a testing pipeline to facilitate rules-based testing of mobile robot stacks in real or simulated scenarios. Agent (actor) behaviour in real or simulated scenarios is evaluated by a test oracle based on defined performance evaluation rules. Such rules may evaluate different facets of safety. For example, a safety rule set may be defined to assess the performance of the stack against a particular safety standard, regulation or safety model (such as RSS), or bespoke rule sets may be defined for testing any aspect of performance. The testing pipeline is not limited in its application to safety, and can be used to test any aspects of performance, such as comfort or progress towards some defined goal. A rule editor allows performance evaluation rules to be defined or modified and passed to the test oracle.
A “full” stack typically involves everything from processing and interpretation of low-level sensor data (perception), feeding into primary higher-level functions such as prediction and planning, as well as control logic to generate suitable control signals to implement planning-level decisions (e.g. to control braking, steering, acceleration etc.). For autonomous vehicles, level 3 stacks include some logic to implement transition demands and level 4 stacks additionally include some logic for implementing minimum risk maneuvers. The stack may also implement secondary control functions e.g. of signalling, headlights, windscreen wipers etc.
The term “stack” can also refer to individual sub-systems (sub-stacks) of the full stack, such as perception, prediction, planning or control stacks, which may be tested individually or in any desired combination. A stack can refer purely to software, i.e. one or more computer programs that can be executed on one or more general-purpose computer processors.
Whether real or simulated, a scenario requires an ego agent to navigate a real or modelled physical context. The ego agent is a real or simulated mobile robot that moves under the control of the stack under testing. The physical context includes static and/or dynamic element(s) that the stack under testing is required to respond to effectively. For example, the mobile robot may be a fully or semi-autonomous vehicle under the control of the stack (the ego vehicle). The physical context may comprise a static road layout and a given set of environmental conditions (e.g. weather, time of day, lighting conditions, humidity, pollution/particulate level etc.) that could be maintained or varied as the scenario progresses. An interactive scenario additionally includes one or more other agents (“external” agent(s), e.g. other vehicles, pedestrians, cyclists, animals etc.).
The following examples consider applications to autonomous vehicle testing. However, the principles apply equally to other forms of mobile robot.
Scenarios may be represented or defined at different levels of abstraction. More abstracted scenarios accommodate a greater degree of variation. For example, a “cut-in scenario” or a “lane change scenario” are examples of highly abstracted scenarios, characterized by a maneuver or behaviour of interest, that accommodate many variations (e.g. different agent starting locations and speeds, road layout, environmental conditions etc.). A “scenario run” refers to a concrete occurrence of an agent(s) navigating a physical context, optionally in the presence of one or more other agents. For example, multiple runs of a cut-in or lane change scenario could be performed (in the real world and/or in a simulator) with different agent parameters (e.g. starting location, speed etc.), different road layouts, different environmental conditions, and/or different stack configurations etc. The terms “run” and “instance” are used interchangeably in this context.
In the following examples, the performance of the stack is assessed, at least in part, by evaluating the behaviour of the ego agent in the test oracle against a given set of performance evaluation rules, over the course of one or more runs. The rules are applied to “ground truth” of the (or each) scenario run which, in general, simply means an appropriate representation of the scenario run (including the behaviour of the ego agent) that is taken as authoritative for the purpose of testing. Ground truth is inherent to simulation; a simulator computes a sequence of scenario states, which is, by definition, a perfect, authoritative representation of the simulated scenario run. In a real-world scenario run, a “perfect” representation of the scenario run does not exist in the same sense; nevertheless, suitably informative ground truth can be obtained in numerous ways, e.g. based on manual annotation of on-board sensor data, automated/semi-automated annotation of such data (e.g. using offline/non-real time processing), and/or using external information sources (such as external sensors, maps etc.) etc.
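As a rough sketch of how rules might be applied to the ground truth of a run, the following evaluates each rule at every time step of a sequence of scenario states. The state fields, the two rules and their thresholds are hypothetical and are not taken from any particular test oracle implementation.

```python
from typing import Callable, Dict, List

State = Dict[str, float]        # hypothetical per-time-step scenario state
Rule = Callable[[State], bool]  # a rule returns True when it is satisfied


def min_gap_rule(state: State) -> bool:
    """Ego agent keeps at least 2 m from the other agent (assumed threshold)."""
    return state["gap_m"] >= 2.0


def speed_limit_rule(state: State) -> bool:
    """Ego agent stays at or below 13.4 m/s (assumed limit)."""
    return state["ego_speed_mps"] <= 13.4


RULES: Dict[str, Rule] = {
    "min_gap": min_gap_rule,
    "speed_limit": speed_limit_rule,
}


def evaluate_run(states: List[State], rules: Dict[str, Rule]) -> Dict[str, List[bool]]:
    """Apply every rule to every time step; returns rule -> pass/fail series."""
    return {name: [rule(s) for s in states] for name, rule in rules.items()}


# Hypothetical ground truth of a short run.
ground_truth = [
    {"gap_m": 10.0, "ego_speed_mps": 12.0},
    {"gap_m": 3.0, "ego_speed_mps": 13.0},
    {"gap_m": 1.5, "ego_speed_mps": 12.5},
]
results = evaluate_run(ground_truth, RULES)
failed_rules = [name for name, series in results.items() if not all(series)]
```

Keeping a per-time-step series, rather than a single pass/fail flag, makes it possible to locate when in the run a rule was failed.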
The scenario ground truth typically includes a “trace” of the ego agent and any other (salient) agent(s) as applicable. A trace is a history of an agent's location and motion over the course of a scenario. There are many ways a trace can be represented. Trace data will typically include spatial and motion data of an agent within the environment. The term is used in relation to both real scenarios (with real-world traces) and simulated scenarios (with simulated traces). The trace typically records an actual trajectory realized by the agent in the scenario. With regards to terminology, a “trace” and a “trajectory” may contain the same or similar types of information (such as a series of spatial and motion states over time). The term trajectory is generally favoured in the context of planning (and can refer to future/predicted trajectories), whereas the term trace is generally favoured in relation to past behaviour in the context of testing/evaluation.
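There are many ways a trace can be represented; one simple possibility is a time-ordered list of agent states, sketched below. The class names and the particular fields (position, heading, speed) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentState:
    """Spatial and motion state of an agent at one time instant (assumed fields)."""
    t: float        # time in seconds
    x: float        # position in metres
    y: float
    heading: float  # radians
    speed: float    # metres per second


@dataclass
class Trace:
    """History of one agent's states over the course of a scenario run."""
    agent_id: str
    states: List[AgentState]

    def duration(self) -> float:
        """Time spanned by the trace, or 0.0 if it is empty."""
        if not self.states:
            return 0.0
        return self.states[-1].t - self.states[0].t

    def max_speed(self) -> float:
        """Highest recorded speed; requires a non-empty trace."""
        return max(s.speed for s in self.states)


ego_trace = Trace(
    agent_id="ego",
    states=[
        AgentState(t=0.0, x=0.0, y=0.0, heading=0.0, speed=10.0),
        AgentState(t=0.5, x=5.0, y=0.0, heading=0.0, speed=11.0),
        AgentState(t=1.0, x=10.5, y=0.0, heading=0.0, speed=12.0),
    ],
)
```

The same structure could hold a planned trajectory, reflecting the observation that traces and trajectories may contain similar types of information.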
In a simulation context, a “scenario description” is provided to a simulator as input. For example, a scenario description may be encoded using a scenario description language (SDL), or in any other form that can be consumed by a simulator. A scenario description is typically a more abstract representation of a scenario, that can give rise to multiple simulated runs. Depending on the implementation, a scenario description may have one or more configurable parameters that can be varied to increase the degree of possible variation. The degree of abstraction and parameterization is a design choice. For example, a scenario description may encode a fixed layout, with parameterized environmental conditions (such as weather, lighting etc.). Further abstraction is possible, however, e.g. with configurable road parameter(s) (such as road curvature, lane configuration etc.). The input to the simulator comprises the scenario description together with a chosen set of parameter value(s) (as applicable). The latter may be referred to as a parameterization of the scenario. The configurable parameter(s) define a parameter space (also referred to as the scenario space), and the parameterization corresponds to a point in the parameter space. In this context, a “scenario instance” may refer to an instantiation of a scenario in a simulator based on a scenario description and (if applicable) a chosen parameterization.
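The relationship between a scenario description, its parameter space and a parameterization might be sketched as follows. This is not a scenario description language; the class, the field names and the example parameters are hypothetical, and the parameter space is assumed to be a simple box of per-parameter ranges.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ScenarioDescription:
    """Abstract scenario with fixed elements and configurable parameters."""
    name: str
    fixed: Dict[str, str]                            # e.g. a fixed road layout
    parameter_space: Dict[str, Tuple[float, float]]  # name -> (min, max)

    def instantiate(self, parameterization: Dict[str, float]) -> Dict[str, object]:
        """Build simulator input for one point in the parameter space."""
        if set(parameterization) != set(self.parameter_space):
            raise ValueError("parameterization must assign every configurable parameter")
        for name, value in parameterization.items():
            low, high = self.parameter_space[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        return {"scenario": self.name, **self.fixed, **parameterization}


cut_in = ScenarioDescription(
    name="cut_in",
    fixed={"road_layout": "two_lane_straight"},
    parameter_space={
        "road_curvature_per_m": (0.0, 0.05),
        "rain_intensity": (0.0, 1.0),
    },
)

# One parameterization, i.e. one point in the parameter (scenario) space.
instance = cut_in.instantiate({"road_curvature_per_m": 0.01, "rain_intensity": 0.2})
```

Each distinct parameterization accepted by `instantiate` corresponds to a different scenario instance of the same description.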
For conciseness, the term scenario may also be used to refer to a scenario run, as well as a scenario in the more abstracted sense. The meaning of the term scenario will be clear from the context in which it is used.
Trajectory planning is an important function in the present context, and the terms “trajectory planner”, “trajectory planning system” and “trajectory planning stack” may be used interchangeably herein to refer to a component or components that can plan trajectories for a mobile robot into the future. Trajectory planning decisions ultimately determine the actual trajectory realized by the ego agent (although, in some testing contexts, this may be influenced by other factors, such as the implementation of those decisions in the control stack, and the real or modelled dynamic response of the ego agent to the resulting control signals).
A trajectory planner may be tested in isolation, or in combination with one or more other systems (e.g. perception, prediction and/or control). Within a full stack, planning generally refers to higher-level autonomous decision-making capability (such as trajectory planning), whilst control generally refers to the lower-level generation of control signals for carrying out those autonomous decisions. However, in the context of performance testing, the term control is also used in the broader sense. For the avoidance of doubt, when a trajectory planner is said to control an ego agent in simulation, that does not necessarily imply that a control system (in the narrower sense) is tested in combination with the trajectory planner.
To provide relevant context to the described embodiments, further details of an example form of AV stack will now be described.
In a real-world context, the perception system 102 receives sensor outputs from an on-board sensor system 110 of the AV, and uses those sensor outputs to detect external agents and measure their physical state, such as their position, velocity, acceleration etc. The on-board sensor system 110 can take different forms but generally comprises a variety of sensors such as image capture devices (cameras/optical sensors), lidar and/or radar unit(s), satellite-positioning sensor(s) (GPS etc.), motion/inertial sensor(s) (accelerometers, gyroscopes etc.) etc. The onboard sensor system 110 thus provides rich sensor data from which it is possible to extract detailed information about the surrounding environment, and the state of the AV and any external actors (vehicles, pedestrians, cyclists etc.) within that environment. The sensor outputs typically comprise sensor data of multiple sensor modalities such as stereo images from one or more stereo optical sensors, lidar, radar etc. Sensor data of multiple sensor modalities may be combined using filters, fusion components etc.
The perception system 102 typically comprises multiple perception components which co-operate to interpret the sensor outputs and thereby provide perception outputs to the prediction system 104.
In a simulation context, depending on the nature of the testing, and depending in particular on where the stack 100 is “sliced” for the purpose of testing (see below), it may or may not be necessary to model the on-board sensor system 110. With higher-level slicing, simulated sensor data is not required, and therefore complex sensor modelling is not required.
The perception outputs from the perception system 102 are used by the prediction system 104 to predict future behaviour of external actors (agents), such as other vehicles in the vicinity of the AV.
Predictions computed by the prediction system 104 are provided to the planner 106, which uses the predictions to make autonomous driving decisions to be executed by the AV in a given driving scenario. The inputs received by the planner 106 would typically indicate a drivable area and would also capture predicted movements of any external agents (obstacles, from the AV's perspective) within the drivable area. The drivable area can be determined using perception outputs from the perception system 102 in combination with map information, such as an HD (high definition) map.
A core function of the planner 106 is the planning of trajectories for the AV (ego trajectories), taking into account predicted agent motion. This may be referred to as trajectory planning. A trajectory is planned in order to carry out a desired goal within a scenario. The goal could for example be to enter a roundabout and leave it at a desired exit; to overtake a vehicle in front; or to stay in a current lane at a target speed (lane following). The goal may, for example, be determined by an autonomous route planner (not shown).
The controller 108 executes the decisions taken by the planner 106 by providing suitable control signals to an on-board actor system 112 of the AV. In particular, the planner 106 plans trajectories for the AV and the controller 108 generates control signals to implement the planned trajectories. Typically, the planner 106 will plan into the future, such that a planned trajectory may only be partially implemented at the control level before a new trajectory is planned by the planner 106. The actor system 112 includes “primary” vehicle systems, such as braking, acceleration and steering systems, as well as secondary systems (e.g. signalling, wipers, headlights etc.).
Note, there may be a distinction between a planned trajectory at a given time instant, and the actual trajectory followed by the ego agent. Planning systems typically operate over a sequence of planning steps, updating the planned trajectory at each planning step to account for any changes in the scenario since the previous planning step (or, more precisely, any changes that deviate from the predicted changes). The planning system 106 may reason into the future, such that the planned trajectory at each planning step extends beyond the next planning step. Any individual planned trajectory may, therefore, not be fully realized (if the planning system 106 is tested in isolation, in simulation, the ego agent may simply follow the planned trajectory exactly up to the next planning step; however, as noted, in other real and simulation contexts, the planned trajectory may not be followed exactly up to the next planning step, as the behaviour of the ego agent could be influenced by other factors, such as the operation of the control system 108 and the real or modelled dynamics of the ego vehicle). In many testing contexts, the actual trajectory of the ego agent is what ultimately matters; in particular, whether the actual trajectory is safe, as well as other factors such as comfort and progress. However, the rules-based testing approach herein can also be applied to planned trajectories (even if those planned trajectories are not fully or exactly realized by the ego agent). For example, even if the actual trajectory of an agent is deemed safe according to a given set of safety rules, it might be that an instantaneous planned trajectory was unsafe; the fact that the planner 106 was considering an unsafe course of action may be revealing, even if it did not lead to unsafe agent behaviour in the scenario. Instantaneous planned trajectories constitute one form of internal state that can be usefully evaluated, in addition to actual agent behaviour in the simulation. 
Other forms of internal stack state can be similarly evaluated.
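The distinction between instantaneous planned trajectories and the trajectory actually realized can be illustrated with a toy one-dimensional planning loop. The planner below is a deliberately simplistic stand-in (it moves towards a goal position by at most a fixed step), and the ego agent is assumed to follow each plan exactly up to the next planning step, as in the planner-in-isolation case described above.

```python
def plan(position: float, goal: float, horizon: int = 5, max_step: float = 1.0):
    """Toy planner: future positions moving towards the goal, one per planning step."""
    trajectory = []
    for _ in range(horizon):
        delta = max(-max_step, min(max_step, goal - position))
        position += delta
        trajectory.append(position)
    return trajectory


def run(start: float, goals, horizon: int = 5):
    """Replan at every step; goals[i] is the goal seen at planning step i."""
    position = start
    planned = []      # every instantaneous planned trajectory (internal state)
    actual = [start]  # the trajectory actually realized by the ego agent
    for goal in goals:
        trajectory = plan(position, goal, horizon)
        planned.append(trajectory)
        # Only the first step of each plan is executed before the next replan.
        position = trajectory[0]
        actual.append(position)
    return planned, actual


# The goal changes after two planning steps, so the first plan is never completed.
planned, actual = run(0.0, goals=[5.0, 5.0, 0.0, 0.0])
```

In this run the first plan extends to position 5.0, but the realized trajectory never goes beyond 2.0; both `planned` and `actual` are available for rule evaluation.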
The example of
The extent to which the various stack functions are integrated or separable can vary significantly between different stack implementations; in some stacks, certain aspects may be so tightly coupled as to be indistinguishable. For example, in some stacks, planning and control may be integrated (e.g. such stacks could plan in terms of control signals directly), whereas other stacks (such as that depicted in
It will be appreciated that the term “stack” encompasses software, but can also encompass hardware. In simulation, software of the stack may be tested on a “generic” off-board computer system, before it is eventually uploaded to an on-board computer system of a physical vehicle. However, in “hardware-in-the-loop” testing, the testing may extend to underlying hardware of the vehicle itself. For example, the stack software may be run on the on-board computer system (or a replica thereof) that is coupled to the simulator for the purpose of testing. In this context, the stack under testing extends to the underlying computer hardware of the vehicle. As another example, certain functions of the stack 100 (e.g. perception functions) may be implemented in dedicated hardware. In a simulation context, hardware-in-the-loop testing could involve feeding synthetic sensor data to dedicated hardware perception components.
Scenarios can be obtained for the purpose of simulation in various ways, including manual encoding. The system is also capable of extracting scenarios for the purpose of simulation from real-world runs, allowing real-world situations and variations thereof to be re-created in the simulator 202.
In the present off-board context, there is no requirement for the traces to be extracted in real-time (or, more precisely, no need for them to be extracted in a manner that would support real-time planning); rather, the traces are extracted “offline”. Examples of offline perception algorithms include non-real time and non-causal perception algorithms. Offline techniques contrast with “on-line” techniques that can feasibly be implemented within an AV stack 100 to facilitate real-time planning/decision making.
For example, it is possible to use non-real time processing, which cannot be performed on-line due to hardware or other practical constraints of an AV's onboard computer system. For example, one or more non-real time perception algorithms can be applied to the real-world run data 140 to extract the traces. A non-real time perception algorithm could be an algorithm that it would not be feasible to run in real time because of the computation or memory resources it requires.
It is also possible to use “non-causal” perception algorithms in this context. A non-causal algorithm may or may not be capable of running in real-time at the point of execution, but in any event could not be implemented in an online context, because it requires knowledge of the future. For example, a perception algorithm that detects an agent state (e.g. location, pose, speed etc.) at a particular time instant based on subsequent data could not support real-time planning within the stack 100 in an on-line context, because it requires knowledge of the future (unless it was constrained to operate with a short look ahead window). For example, filtering with a backwards pass is a non-causal algorithm that can sometimes be run in real-time, but requires knowledge of the future.
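The difference between causal and non-causal processing can be illustrated with two simple smoothing filters. A centred moving average is used here as a simpler stand-in for filtering with a backwards pass; the function names and window sizes are assumptions made for the example.

```python
def causal_average(samples, window: int = 3):
    """Each output uses only the current and earlier samples (usable on-line)."""
    out = []
    for i in range(len(samples)):
        past = samples[max(0, i - window + 1): i + 1]
        out.append(sum(past) / len(past))
    return out


def noncausal_average(samples, half_window: int = 1):
    """Each output also uses later samples, so it needs knowledge of the future."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half_window)
        hi = min(len(samples), i + half_window + 1)
        neighbourhood = samples[lo:hi]
        out.append(sum(neighbourhood) / len(neighbourhood))
    return out


# Hypothetical position measurements of an agent moving at constant speed.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
causal = causal_average(positions)
noncausal = noncausal_average(positions)
```

On this constant-speed example the causal filter lags behind the true positions, whereas the non-causal filter reproduces the interior samples exactly, at the cost of needing data that is only available offline.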
The term “perception” generally refers to techniques for perceiving structure in the real-world data 140, such as 2D or 3D bounding box detection, location detection, pose detection, motion detection etc. For example, a trace may be extracted as a time-series of bounding boxes or other spatial states in 3D space or 2D space (e.g. in a birds-eye-view frame of reference), with associated motion information (e.g. speed, acceleration, jerk etc.). In the context of image processing, such techniques are often classed as “computer vision”, but the term perception encompasses a broader range of sensor modalities.
Further details of the testing pipeline and the test oracle 252 will now be described. The examples that follow focus on simulation-based testing. However, as noted, the test oracle 252 can equally be applied to evaluate stack performance on real scenarios, and the relevant description below applies equally to real scenarios. The following description refers to the stack 100 of
As described previously, the idea of simulation-based testing is to run a simulated driving scenario that an ego agent must navigate under the control of the stack 100 being tested. Typically, the scenario includes a static drivable area (e.g. a particular static road layout) that the ego agent is required to navigate, typically in the presence of one or more other dynamic agents (such as other vehicles, bicycles, pedestrians etc.). To this end, simulated inputs 203 are provided from the simulator 202 to the stack 100 under testing.
The slicing of the stack dictates the form of the simulated inputs 203. By way of example,
By contrast, so-called “planning-level” simulation would essentially bypass the perception system 102. The simulator 202 would instead provide simpler, higher-level inputs 203 directly to the prediction system 104. In some contexts, it may even be appropriate to bypass the prediction system 104 as well, in order to test the planner 106 on predictions obtained directly from the simulated scenario (i.e. “perfect” predictions).
Between these extremes, there is scope for many different levels of input slicing, e.g. testing only a subset of the perception system 102, such as “later” (higher-level) perception components, e.g. components such as filters or fusion components which operate on the outputs from lower-level perception components (such as object detectors, bounding box detectors, motion detectors etc.).
Whatever form they take, the simulated inputs 203 are used (directly or indirectly) as a basis for decision-making by the planner 106. The controller 108, in turn, implements the planner's decisions by outputting control signals 109. In a real-world context, these control signals would drive the physical actor system 112 of the AV. In simulation, an ego vehicle dynamics model 204 is used to translate the resulting control signals 109 into realistic motion of the ego agent within the simulation, thereby simulating the physical response of an autonomous vehicle to the control signals 109.
Alternatively, a simpler form of simulation assumes that the ego agent follows each planned trajectory exactly between planning steps. This approach bypasses the control system 108 (to the extent it is separable from planning) and removes the need for the ego vehicle dynamics model 204. This may be sufficient for testing certain facets of planning.
To the extent that external agents exhibit autonomous behaviour/decision making within the simulator 202, some form of agent decision logic 210 is implemented to carry out those decisions and determine agent behaviour within the scenario. The agent decision logic 210 may be comparable in complexity to the ego stack 100 itself or it may have a more limited decision-making capability. The aim is to provide sufficiently realistic external agent behaviour within the simulator 202 to be able to usefully test the decision-making capabilities of the ego stack 100. In some contexts, this does not require any agent decision making logic 210 at all (open-loop simulation), and in other contexts useful testing can be provided using relatively limited agent logic 210 such as basic adaptive cruise control (ACC). One or more agent dynamics models 206 may be used to provide more realistic agent behaviour if appropriate.
A scenario is run in accordance with a scenario description 201a and (if applicable) a chosen parameterization 201b of the scenario. A scenario typically has both static and dynamic elements which may be “hard coded” in the scenario description 201a or configurable and thus determined by the scenario description 201a in combination with a chosen parameterization 201b. In a driving scenario, the static element(s) typically include a static road layout.
The dynamic element(s) typically include one or more external agents within the scenario, such as other vehicles, pedestrians, bicycles etc.
The extent of the dynamic information provided to the simulator 202 for each external agent can vary. For example, a scenario may be described by separable static and dynamic layers. A given static layer (e.g. defining a road layout) can be used in combination with different dynamic layers to provide different scenario instances. The dynamic layer may comprise, for each external agent, a spatial path to be followed by the agent together with one or both of motion data and behaviour data associated with the path. In simple open-loop simulation, an external actor simply follows the spatial path and motion data defined in the dynamic layer in a non-reactive manner, i.e. it does not react to the ego agent within the simulation. Such open-loop simulation can be implemented without any agent decision logic 210. However, in closed-loop simulation, the dynamic layer instead defines at least one behaviour to be followed along a static path (such as an ACC behaviour). In this case, the agent decision logic 210 implements that behaviour within the simulation in a reactive manner, i.e. reactive to the ego agent and/or other external agent(s). Motion data may still be associated with the static path but in this case is less prescriptive and may for example serve as a target along the path. For example, with an ACC behaviour, target speeds may be set along the path which the agent will seek to match, but the agent decision logic 210 might be permitted to reduce the speed of the external agent below the target at any point along the path in order to maintain a target headway from a forward vehicle.
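By way of illustration only, a basic ACC-style behaviour of the kind described above might be sketched as follows. The function name, the use of a time headway and the default of two seconds are assumptions made for this sketch; the disclosure does not prescribe a particular headway model.

```python
from typing import Optional


def acc_commanded_speed(target_speed: float,
                        gap_to_forward_vehicle: Optional[float],
                        target_headway_s: float = 2.0) -> float:
    """Return the speed an external agent should adopt at one time step.

    target_speed: speed set along the path in the dynamic layer (m/s).
    gap_to_forward_vehicle: distance to the forward vehicle (m), or None
        when there is no forward vehicle.
    target_headway_s: assumed time headway to maintain (s).
    """
    if gap_to_forward_vehicle is None:
        # Nothing ahead: simply match the target speed from the scenario.
        return target_speed
    # Highest speed at which gap / speed still meets the target headway.
    headway_limited_speed = gap_to_forward_vehicle / target_headway_s
    # The agent may drop below the target speed but never exceeds it.
    return max(0.0, min(target_speed, headway_limited_speed))
```

With a target speed of 20 m/s and a 30 m gap, this sketch commands 15 m/s, i.e. the agent drops below its target in order to restore the headway.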
As will be appreciated, scenarios can be described for the purpose of simulation in many ways, with any degree of configurability. For example, the number and type of agents, and their motion information may be configurable as part of the scenario parameterization 201b.
The output of the simulator 202 for a given simulation includes an ego trace 212a of the ego agent and one or more agent traces 212b of the one or more external agents (traces 212). Each trace 212a, 212b is a complete history of an agent's behaviour within a simulation having both spatial and motion components. For example, each trace 212a, 212b may take the form of a spatial path having motion data associated with points along the path such as speed, acceleration, jerk (rate of change of acceleration), snap (rate of change of jerk) etc.
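By way of illustration only, one possible in-memory representation of a trace is sketched below. The class and function names, the fixed time step and the use of backward finite differences are assumptions made for this sketch rather than features of the simulator 202.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TracePoint:
    t: float                       # time (s)
    position: Tuple[float, float]  # point on the spatial path (m)
    speed: float                   # m/s
    acceleration: float            # m/s^2
    jerk: float                    # m/s^3 (rate of change of acceleration)


def finite_difference(values: Sequence[float], dt: float) -> List[float]:
    """Backward difference; the first sample has no predecessor, so 0.0 is assumed."""
    return [0.0] + [(b - a) / dt for a, b in zip(values, values[1:])]


def build_trace(positions: Sequence[Tuple[float, float]],
                speeds: Sequence[float],
                dt: float) -> List[TracePoint]:
    """Associate motion data with each point along a spatial path."""
    accelerations = finite_difference(speeds, dt)
    jerks = finite_difference(accelerations, dt)
    return [
        TracePoint(i * dt, p, v, a, j)
        for i, (p, v, a, j) in enumerate(zip(positions, speeds, accelerations, jerks))
    ]


# Hypothetical ego samples at one-second intervals.
ego_trace = build_trace(
    positions=[(0.0, 0.0), (0.5, 0.0), (2.5, 0.0), (7.0, 0.0)],
    speeds=[0.0, 1.0, 3.0, 6.0],
    dt=1.0,
)
```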
Additional information is also provided to supplement and provide context to the traces 212. Such additional information is referred to as “contextual” data 214. The contextual data 214 pertains to the physical context of the scenario, and can have both static components (such as road layout) and dynamic components (such as weather conditions to the extent they vary over the course of the simulation). To an extent, the contextual data 214 may be “passthrough” in that it is directly defined by the scenario description 201a or the choice of parameterization 201b, and is thus unaffected by the outcome of the simulation. For example, the contextual data 214 may include a static road layout that comes from the scenario description 201a or the parameterization 201b directly. However, typically the contextual data 214 would include at least some elements derived within the simulator 202. This could, for example, include simulated environmental data, such as weather data, where the simulator 202 is free to change weather conditions as the simulation progresses. In that case, the weather data may be time-dependent, and that time dependency will be reflected in the contextual data 214.
The test oracle 252 receives the traces 212 and the contextual data 214, and scores those outputs in respect of a set of performance evaluation rules 254. The performance evaluation rules 254 are shown to be provided as an input to the test oracle 252.
The rules 254 are categorical in nature (e.g. pass/fail-type rules). Certain performance evaluation rules are also associated with numerical performance metrics used to “score” trajectories (e.g. indicating a degree of success or failure or some other quantity that helps explain or is otherwise relevant to the categorical results). The evaluation of the rules 254 is time-based: a given rule may have a different outcome at different points in the scenario. The scoring is also time-based: for each performance evaluation metric, the test oracle 252 tracks how the value of that metric (the score) changes over time as the simulation progresses. The test oracle 252 provides an output 256 comprising a time sequence 256a of categorical (e.g. pass/fail) results for each rule, and a score-time plot 256b for each performance metric, as described in further detail later. The results and scores 256a, 256b are informative to the expert 122 and can be used to identify and mitigate performance issues within the tested stack 100. The test oracle 252 also provides an overall (aggregate) result for the scenario (e.g. overall pass/fail). The output 256 of the test oracle 252 is stored in a test database 258, in association with information about the scenario to which the output 256 pertains. For example, the output 256 may be stored in association with the scenario description 201a (or an identifier thereof), and the chosen parameterization 201b. As well as the time-dependent results and scores, an overall score may also be assigned to the scenario and stored as part of the output 256. For example, an aggregate score may be assigned for each rule (e.g. overall pass/fail) and/or an aggregate result (e.g. pass/fail) may be assigned across all of the rules 254.
A number of “later” perception components 102B form part of the sub-stack 100S to be tested and are applied, during testing, to simulated perception inputs 203. The later perception components 102B could, for example, include filtering or other fusion components that fuse perception inputs from multiple earlier perception components.
In the full stack 100, the later perception components 102B would receive actual perception inputs 213 from earlier perception components 102A. For example, the earlier perception components 102A might comprise one or more 2D or 3D bounding box detectors, in which case the simulated perception inputs provided to the later perception components 102B could include simulated 2D or 3D bounding box detections, derived in the simulation via ray tracing. The earlier perception components 102A would generally include component(s) that operate directly on sensor data. With the slicing of
Such perception error models may be referred to as Perception Statistical Performance Models (PSPMs) or, synonymously, “PRISMs”. Further details of the principles of PSPMs, and suitable techniques for building and training them, may be found in International Patent Publication Nos. WO2021037763, WO2021037760, WO2021037765, WO2021037761, and WO2021037766, each of which is incorporated herein by reference in its entirety. The idea behind PSPMs is to efficiently introduce realistic errors into the simulated perception inputs provided to the sub-stack 100S (i.e. that reflect the kind of errors that would be expected were the earlier perception components 102A to be applied in the real-world). In a simulation context, “perfect” ground truth perception inputs 203G are provided by the simulator, but these are used to derive more realistic (ablated) perception inputs 203 with realistic error introduced by the perception error model(s) 208. The perception error model(s) 208 serve as a “surrogate model” (being a surrogate for the perception system 102, or part of the perception system 102A, but operating on lower-fidelity inputs).
As described in the aforementioned references, a PSPM can be dependent on one or more variables representing physical condition(s) (“confounders”), allowing different levels of error to be introduced that reflect different possible real-world conditions. Hence, the simulator 202 can simulate different physical conditions (e.g. different weather conditions) by simply changing the value of the weather confounder(s), which will, in turn, change how perception error is introduced.
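By way of illustration only, the role of a confounder can be sketched with a deliberately simplified error model. The Gaussian position noise, the noise levels per weather condition and all names below are assumptions made for this sketch; they do not reproduce the PSPMs of the referenced publications, which are learned from data.

```python
import random
from typing import Tuple

# Hypothetical position-noise standard deviations (metres) per weather confounder.
POSITION_NOISE_STD_M = {"clear": 0.1, "rain": 0.3, "fog": 0.6}


def sample_perception_input(ground_truth_xy: Tuple[float, float],
                            weather: str,
                            rng: random.Random) -> Tuple[float, float]:
    """Derive a noisy perception input from a 'perfect' ground truth position."""
    std = POSITION_NOISE_STD_M[weather]
    x, y = ground_truth_xy
    return (x + rng.gauss(0.0, std), y + rng.gauss(0.0, std))


def mean_abs_error(weather: str, n: int = 2000, seed: int = 0) -> float:
    """Average absolute x-error introduced under a given weather confounder."""
    rng = random.Random(seed)
    truth = (10.0, 5.0)
    total = 0.0
    for _ in range(n):
        x, _y = sample_perception_input(truth, weather, rng)
        total += abs(x - truth[0])
    return total / n
```

Changing only the `weather` argument changes the level of error introduced, without any change to the simulated ground truth itself.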
The later perception components 102B within the sub-stack 100S process the simulated perception inputs 203 in exactly the same way as they would process the real-world perception inputs 213 within the full stack 100, and their outputs, in turn, drive prediction, planning and control.
Alternatively, PRISMs can be used to model the entire perception system 102, including the later perception components 102B, in which case a PSPM(s) is used to generate realistic perception outputs that are passed as inputs to the prediction system 104 directly.
Depending on the implementation, there may or may not be a deterministic relationship between a given scenario parameterization 201b and the outcome of the simulation for a given configuration of the stack 100 (i.e. the same parameterization may or may not always lead to the same outcome for the same stack 100). Non-determinism can arise in various ways. For example, when simulation is based on PRISMs, a PRISM might model a distribution over possible perception outputs at each given time step of the scenario, from which a realistic perception output is sampled probabilistically. This leads to non-deterministic behaviour within the simulator 202, whereby different outcomes may be obtained for the same stack 100 and scenario parameterization because different perception outputs are sampled. Alternatively, or additionally, the simulator 202 may be inherently non-deterministic, e.g. weather, lighting or other environmental conditions may be randomized/probabilistic within the simulator 202 to a degree. As will be appreciated, this is a design choice: in other implementations, varying environmental conditions could instead be fully specified in the parameterization 201b of the scenario. With non-deterministic simulation, multiple scenario instances could be run for each parameterization. An aggregate pass/fail result could be assigned to a particular choice of parameterization 201b, e.g. as a count or percentage of pass or failure outcomes.
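By way of illustration only, aggregating outcomes over multiple instances of a non-deterministic scenario might be sketched as follows. The stand-in run function, its parameter names and the pass criterion are assumptions made for this sketch; in practice each instance would be a full simulation scored by the test oracle 252.

```python
import random
from typing import Dict, Tuple


def run_scenario_instance(parameterization: Dict[str, float],
                          rng: random.Random) -> bool:
    """Stand-in for one non-deterministic simulation run; True means pass."""
    # Hypothetical outcome: a sampled perception error perturbs the gap
    # that the stack believes it has to another agent.
    perceived_gap = parameterization["gap_m"] + rng.gauss(
        0.0, parameterization["noise_std_m"])
    return perceived_gap > parameterization["min_safe_gap_m"]


def aggregate_pass_rate(parameterization: Dict[str, float],
                        n_runs: int,
                        seed: int = 0) -> Tuple[int, float]:
    """Run several instances of one parameterization; return (passes, pass rate)."""
    rng = random.Random(seed)
    passes = sum(run_scenario_instance(parameterization, rng) for _ in range(n_runs))
    return passes, passes / n_runs
```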
A test orchestration component 260 is responsible for selecting scenarios for the purpose of simulation. For example, the test orchestration component 260 may select scenario descriptions 201a and suitable parameterizations 201b automatically, which may be based on the test oracle outputs 256 from previous scenarios and/or other criteria.
The performance evaluation rules 254 are constructed as computational graphs (rule trees) to be applied within the test oracle. Unless otherwise indicated, the term “rule tree” herein refers to the computational graph that is configured to implement a given rule. Each rule is constructed as a rule tree, and a set of multiple rules may be referred to as a “forest” of multiple rule trees.
Each assessor node 304 is shown to have at least one child object (node), where each child object is one of the extractor nodes 302 or another one of the assessor nodes 304. Each assessor node receives output(s) from its child node(s) and applies an assessor function to those output(s). The output of the assessor function is a time-series of categorical results. The following examples consider simple binary pass/fail results, but the techniques can be readily extended to non-binary results. Each assessor function assesses the output(s) of its child node(s) against a predetermined atomic rule. Such rules can be flexibly combined in accordance with a desired safety model.
In addition, each assessor node 304 derives a time-varying numerical signal from the output(s) of its child node(s), which is related to the categorical results by a threshold condition (see below).
A top-level root node 304a is an assessor node that is not a child node of any other node. The top-level node 304a outputs a final sequence of results, and its descendants (i.e. nodes that are direct or indirect children of the top-level node 304a) provide the underlying signals and intermediate results.
Signals extracted directly from the scenario ground truth 310 by the extractor nodes 302 may be referred to as “raw” signals, to distinguish from “derived” signals computed by assessor nodes 304. Results and raw/derived signals may be discretized in time.
A rule editor 400 is provided for constructing rules to be implemented with the test oracle 252. The rule editor 400 receives rule creation inputs from a user (who may or may not be the end-user of the system). In the present example, the rule creation inputs are coded in a domain specific language (DSL) and define at least one rule graph 408 to be implemented within the test oracle 252. The rules are logical rules in the following examples, with TRUE and FALSE representing pass and failure respectively (as will be appreciated, this is purely a design choice).
The following examples consider rules that are formulated using combinations of atomic logic predicates. Examples of basic atomic predicates include elementary logic gates (OR, AND etc.), and logical functions such as “greater than” (Gt(a,b)), which returns TRUE when a is greater than b, and FALSE otherwise.
A Gt function is used to implement a safe lateral distance rule between an ego agent and another agent in the scenario (having agent identifier “other_agent_id”). Two extractor nodes (latd, latsd) apply LateralDistance and LateralSafeDistance extractor functions respectively. Those functions operate directly on the scenario ground truth 310 to extract, respectively, a time-varying lateral distance signal (measuring a lateral distance between the ego agent and the identified other agent), and a time-varying safe lateral distance signal for the ego agent and the identified other agent. The safe lateral distance signal could depend on various factors, such as the speed of the ego agent and the speed of the other agent (captured in the traces 212), and environmental conditions (e.g. weather, lighting, road type etc.) captured in the contextual data 214.
An assessor node (is_latd_safe) is a parent to the latd and latsd extractor nodes, and is mapped to the Gt atomic predicate. Accordingly, when the rule tree 408 is implemented, the is_latd_safe assessor node applies the Gt function to the outputs of the latd and latsd extractor nodes, in order to compute a true/false result for each timestep of the scenario, returning TRUE for each time step at which the latd signal exceeds the latsd signal and FALSE otherwise. In this manner, a “safe lateral distance” rule has been constructed from atomic extractor functions and predicates; the ego agent fails the safe lateral distance rule when the lateral distance reaches or falls below the safe lateral distance threshold. As will be appreciated, this is a very simple example of a rule tree. Rules of arbitrary complexity can be constructed according to the same principles.
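By way of illustration only, the safe lateral distance rule tree described above might be sketched as follows. The node names latd, latsd and is_latd_safe follow the description; the Python classes, the dictionary-based ground truth and the sample values are assumptions made for this sketch and do not represent the DSL accepted by the rule editor 400.

```python
from typing import Callable, Dict, List, Sequence

Signal = List[float]
GroundTruth = Dict[str, Signal]


class Extractor:
    """Leaf node: extracts a time-varying signal from the scenario ground truth."""

    def __init__(self, extract: Callable[[GroundTruth], Signal]):
        self.extract = extract

    def evaluate(self, ground_truth: GroundTruth) -> Signal:
        return self.extract(ground_truth)


class Assessor:
    """Applies an atomic predicate to its children's outputs at each time step."""

    def __init__(self, predicate: Callable[..., bool], children: Sequence[Extractor]):
        self.predicate = predicate
        self.children = children

    def evaluate(self, ground_truth: GroundTruth) -> List[bool]:
        signals = [child.evaluate(ground_truth) for child in self.children]
        return [self.predicate(*values) for values in zip(*signals)]


def gt(a: float, b: float) -> bool:
    """Atomic 'greater than' predicate: TRUE when a exceeds b."""
    return a > b


latd = Extractor(lambda g: g["lateral_distance"])
latsd = Extractor(lambda g: g["lateral_safe_distance"])
is_latd_safe = Assessor(gt, [latd, latsd])

# Hypothetical ground truth signals (metres) over four time steps.
scenario = {
    "lateral_distance": [2.0, 1.5, 1.0, 0.8],
    "lateral_safe_distance": [1.0, 1.0, 1.0, 1.0],
}
results = is_latd_safe.evaluate(scenario)  # [True, True, False, False]
```

The third time step fails because the lateral distance has reached, rather than exceeded, the safe lateral distance.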
The test oracle 252 applies the rule tree 408 to the scenario ground truth 310, and provides the results via a user interface (UI) 418.
The numerical output of the top-level node could, for example, be a time-varying robustness score.
Different rule trees can be constructed, e.g. to implement different rules of a given safety model, to implement different safety models, or to apply rules selectively to different scenarios (in a given safety model, not every rule will necessarily be applicable to every scenario; with this approach, different rules or combinations of rules can be applied to different scenarios). Within this framework, rules can also be constructed for evaluating comfort (e.g. based on instantaneous acceleration and/or jerk along the trajectory), progress (e.g. based on time taken to reach a defined goal) etc.
The above examples consider simple logical predicates evaluated on results or signals at a single time instance, such as OR, AND, Gt etc. However, in practice, it may be desirable to formulate certain rules in terms of temporal logic.
Hekmatnejad et al., “Encoding and Monitoring Responsibility Sensitive Safety Rules for Automated Vehicles in Signal Temporal Logic” (2019), MEMOCODE '19: Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design (incorporated herein by reference in its entirety) discloses a signal temporal logic (STL) encoding of the RSS safety rules. Temporal logic provides a formal framework for constructing predicates that are qualified in terms of time. This means that the result computed by an assessor at a given time instant can depend on results and/or signal values at another time instant(s).
For example, a requirement of the safety model may be that an ego agent responds to a certain event within a set time frame. Such rules can be encoded in a similar manner, using temporal logic predicates within the rule tree.
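By way of illustration only, a bounded "respond within a set time frame" predicate over discretized results might be sketched as follows. The function names and the window measured in time steps are assumptions made for this sketch; it is not the STL encoding of the cited reference.

```python
from typing import List, Sequence


def eventually_within(signal: Sequence[bool], t: int, window: int) -> bool:
    """TRUE if the signal holds at some step in [t, t + window]."""
    return any(signal[t:t + window + 1])


def responds_within(event: Sequence[bool],
                    response: Sequence[bool],
                    window: int) -> List[bool]:
    """Per-step result: whenever the event occurs, a response must follow in time.

    The result at step t depends on values at later steps, which is what
    distinguishes a temporal predicate from a single-instant one.
    """
    return [
        (not occurred) or eventually_within(response, t, window)
        for t, occurred in enumerate(event)
    ]


# Hypothetical event and response signals over seven time steps.
event = [False, True, False, False, False, True, False]
response = [False, False, False, True, False, False, False]
results = responds_within(event, response, window=2)
```

Here the first event is answered two steps later and so passes, whereas the second event receives no response before the run ends and so fails at that step.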
In the above examples, the performance of the stack 100 is evaluated at each time step of a scenario. An overall test result (e.g. pass/fail) can be derived from this—for example, certain rules (e.g. safety-critical rules) may result in an overall failure if the rule is failed at any time step within the scenario (that is, the rule must be passed at every time step to obtain an overall pass on the scenario). For other types of rule, the overall pass/fail criteria may be “softer” (e.g. failure may only be triggered for a certain rule if that rule is failed over some number of sequential time steps), and such criteria may be context dependent.
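By way of illustration only, the strict and "softer" overall criteria described above might be sketched as follows. The function names and the example threshold are assumptions made for this sketch.

```python
from typing import Sequence


def overall_pass_strict(results: Sequence[bool]) -> bool:
    """Safety-critical style: the rule must be passed at every time step."""
    return all(results)


def overall_pass_soft(results: Sequence[bool], n_sequential: int) -> bool:
    """Softer style: fail only if the rule is failed over n sequential steps."""
    consecutive_failures = 0
    for passed in results:
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures >= n_sequential:
            return False
    return True


# Hypothetical per-step results for one rule.
step_results = [True, False, True, False, False, True]
```

For these results the strict criterion gives an overall failure, the soft criterion with a threshold of three sequential steps gives an overall pass, and a threshold of two gives an overall failure.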
Certain rules apply only to the ego agent (an example being a comfort rule that assesses whether or not some maximum acceleration or jerk threshold is exceeded by the ego trajectory at any given time instant).
Other rules pertain to the interaction of the ego agent with other agents (for example, a “no collision” rule or the safe distance rule considered above). Each such rule is evaluated in a pairwise fashion between the ego agent and each other agent. As another example, a “pedestrian emergency braking” rule may only be activated when a pedestrian walks out in front of the ego vehicle, and only in respect of that pedestrian agent.
Not every rule will necessarily be applicable to every scenario, and some rules may only be applicable for part of a scenario. Rule activation logic 422 within the test oracle 252 determines if and when each of the rules 254 is applicable to the scenario in question, and selectively activates rules as and when they apply. A rule may, therefore, remain active for the entirety of a scenario, may never be activated for a given scenario, or may be activated for only some of the scenario. Moreover, a rule may be evaluated for different numbers of agents at different points in the scenario. Selectively activating rules in this manner can significantly increase the efficiency of the test oracle 252.
The activation or deactivation of a given rule may be dependent on the activation/deactivation of one or more other rules. For example, an “optimal comfort” rule may be deemed inapplicable when the pedestrian emergency braking rule is activated (because the pedestrian's safety is the primary concern), and the former may be deactivated whenever the latter is active.
Rule evaluation logic 424 evaluates each active rule for any time period(s) it remains active. Each interactive rule is evaluated in a pairwise fashion between the ego agent and any other agent to which it applies.
There may also be a degree of interdependency in the application of the rules. For example, another way to address the relationship between a comfort rule and an emergency braking rule would be to increase a jerk/acceleration threshold of the comfort rule whenever the emergency braking rule is activated for at least one other agent.
Whilst pass/fail results have been considered, rules may be non-binary. For example, two categories for failure—“acceptable” and “unacceptable”—may be introduced. Again, considering the relationship between a comfort rule and an emergency braking rule, an acceptable failure on a comfort rule may occur when the rule is failed but at a time when an emergency braking rule was active. Interdependency between rules can, therefore, be handled in various ways.
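By way of illustration only, a non-binary comfort result that depends on whether an emergency braking rule is active might be sketched as follows. The enumeration and function names are assumptions made for this sketch.

```python
from enum import Enum
from typing import List, Sequence


class Result(Enum):
    PASS = "pass"
    ACCEPTABLE_FAILURE = "acceptable failure"
    UNACCEPTABLE_FAILURE = "unacceptable failure"


def classify_comfort(comfort_passed: bool, emergency_braking_active: bool) -> Result:
    """A comfort failure is acceptable while emergency braking is active."""
    if comfort_passed:
        return Result.PASS
    if emergency_braking_active:
        return Result.ACCEPTABLE_FAILURE
    return Result.UNACCEPTABLE_FAILURE


def classify_series(comfort: Sequence[bool], braking: Sequence[bool]) -> List[Result]:
    """Apply the classification at each time step of a run."""
    return [classify_comfort(c, b) for c, b in zip(comfort, braking)]
```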
The activation criteria for the rules 254 can be specified in the rule creation code provided to the rule editor 400, as can the nature of any rule interdependencies and the mechanism(s) for implementing those interdependencies.
A first selectable element 534a is provided for each time-series of results. This allows lower-level results of the rule tree to be accessed, i.e. as computed lower down in the rule tree.
A second selectable element 534b is provided for each time-series of results, that allows the associated numerical performance scores to be accessed.
An impact score can be computed for a failure event (or near-failure event) between agents in various ways, e.g. based on the robustness score of a failed (or near-failed) rule(s), some parameter(s) relating to the failure (or near-failure) event and/or an importance value assigned to a failed (or near-failed) rule(s). The impact score quantifies the severity of the failure (or near-failure) event, and can be defined in various ways.
In addition, a probability can be computed for a given scenario, indicating how likely that scenario is to occur.
An overall risk (or significance) score is computed for a run as a function of the impact score and the likelihood of that run.
In the case of a collision event, the impact score could be computed as an impact velocity or a function thereof, e.g. defined as the absolute velocity of the ego agent, or its velocity relative to the other agent, when a collision event occurs between those agents. Other metrics can also be used (for example, based on more sophisticated modelling of the collision event).
Alternatively or additionally, severity levels may be assigned to individual rules (all rules or a subset of one or more rules), and the severity level of a failed rule may be used in determining the impact score. This approach assigns a severity level to each performance evaluation rule, quantifying the severity of failure on that rule. For example, each rule may be assigned a severity level from one to five (e.g. with collision avoidance rules assigned level 5, unsafe lane change rules assigned level 4, comfort rules assigned level 1 or 2 etc.). Each severity level is, in turn, associated with an importance value for calculating a risk score.
In this case, an impact score can be computed for a single rule, e.g. which is equal to the rule severity if that rule is failed in a given run, and zero otherwise.
Alternatively or in addition, an overall impact score may be computed for a run, which is aggregated across all rules (e.g. as the sum of the rule severities for any rules that are failed in a given run).
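By way of illustration only, severity-based impact scoring might be sketched as follows. The rule names are hypothetical; the severity levels follow the one-to-five example given above.

```python
from typing import Dict, Set

# Hypothetical rule names with severity levels on a one-to-five scale.
RULE_SEVERITY: Dict[str, int] = {
    "no_collision": 5,
    "safe_lane_change": 4,
    "comfort": 2,
}


def rule_impact(rule: str, failed_rules: Set[str]) -> int:
    """Impact for a single rule: its severity if failed in the run, else zero."""
    return RULE_SEVERITY[rule] if rule in failed_rules else 0


def overall_impact(failed_rules: Set[str]) -> int:
    """Sum of the severities of every rule failed in a given run."""
    return sum(rule_impact(rule, failed_rules) for rule in RULE_SEVERITY)
```

A run that fails only the lane change and comfort rules would, under these assumed severities, have an overall impact score of six.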
For a scenario characterized by a combination of scenario parameters, the scenario probability indicates how likely that combination of parameters is to occur in the real-world. The probability (likelihood) of a scenario may be modelled as a distribution (or distributions) over variables of the scenario, such as agent starting location, speed etc., from which the scenario parameters (values of those variables) are sampled.
Whether real or simulated, a scenario may be characterized by a set of parameters (variables) V (such as lane widths, road curvature, agent locations, speeds, accelerations etc.). This, in turn, allows the probability of the scenario to be derived as the probability of a particular combination of parameter values, by treating the scenario parameters V as random variables, and determining a probability distribution F over those variables. In mathematical notation, V˜F is used to mean a set of scenario variables V with distribution F. With independent scenario variables, F can be decomposed into multiple distributions over individual scenario parameters (or individual subsets of scenario parameters).
For example, given a sufficiently large data set of real scenario data, the probability distribution F assigned to the scenario parameters V can be determined through a statistical analysis of the dataset (building the distribution F to match the probability of occurrence observed across the real data). For example, the distribution F may itself be parameterized by distribution parameters θ, such that P(V=s|F)=ƒ(s;θ) where ƒ is a function of s parameterized by θ. In this case, the distribution F can be determined by fitting the distribution parameters θ to the dataset. Suppose the dataset includes observed scenarios 1, . . . , N described by parameter values S={s1, . . . , sN} respectively (the observations). Then, the distribution F may, for example, be determined by maximizing the likelihood of the observations, i.e., finding distribution parameters θ*=argmaxθ L(θ), where the likelihood L(θ) is equal to the joint probability of the observations: L(θ)=P(s1, . . . , sN|F)=ƒ(s1;θ)× . . . ×ƒ(sN;θ), where the latter equality assumes independence between observations. Having determined the distribution F in this way, given a (real or simulated) scenario characterized by parameter values s, the probability of that scenario may be determined as P(s|θ*)=ƒ(s;θ*). (Note that, in this specific context, the term ‘likelihood’ is used in a specific statistical sense; elsewhere in this disclosure, the term likelihood is used in the broader everyday sense of the word).
As another example, scenario parameters of simulated scenarios could be assigned distribution(s) at a design stage. Given a parameter distribution F at the design stage, the probability of a given scenario instance may similarly be expressed as P(V=s|F) where V denotes the parameters (variables) of the scenario and s denotes a particular set of values. For a real scenario run, the parameters V, the distribution F and the parameter values may be derived from the underlying scenario data, in order to characterize the real scenario for the purpose of determining its probability P(s|F). For a simulated run, the parameters V and the distribution F may be used to generate the simulated run, by sampling the parameter values from the distribution as s˜F, which would also inherently provide the scenario probability P(s|F). In the examples below, this is expressed as a set of constraints C placed on the scenario variables V, where the probability of the scenario is given by P(V=s|C).
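By way of illustration only, a maximum likelihood fit for a single scenario parameter might be sketched as follows. The choice of a Gaussian for ƒ, the sample values and the function names are assumptions made for this sketch. For a continuous parameter, ƒ(s;θ*) is a probability density rather than a probability.

```python
import math
from statistics import NormalDist, fmean, pstdev
from typing import Sequence


def fit_gaussian_mle(observations: Sequence[float]) -> NormalDist:
    """Maximum likelihood Gaussian: sample mean and population standard deviation."""
    return NormalDist(mu=fmean(observations), sigma=pstdev(observations))


def log_likelihood(dist: NormalDist, observations: Sequence[float]) -> float:
    """Log of the joint density, assuming independent observations."""
    return sum(math.log(dist.pdf(s)) for s in observations)


# Hypothetical observed agent speeds (m/s) from real scenario data.
observed_speeds = [10.0, 12.0, 14.0]
fitted = fit_gaussian_mle(observed_speeds)

# Density assigned to a new scenario whose agent speed is 12 m/s.
scenario_density = fitted.pdf(12.0)
```

Working with the log-likelihood turns the product over observations into a sum, which is numerically better behaved and has the same maximizer.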
The top row of
Thus, a scenario run in the 90% bucket with an impact score of 0.8 is assigned a risk score of 0.72 (top right cell).
An (impact score, probability) pair forms an input to the lookup table and returns the associated risk score.
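By way of illustration only, such a lookup might be sketched as follows. Only the cell for an impact score of 0.8 in the 90% bucket (risk score 0.72) is taken from the description; the remaining impact levels and probability buckets, and the assumption that every cell holds the product of its impact score and probability, are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical table axes, apart from the 0.8 impact / 0.9 probability cell.
IMPACT_LEVELS = (0.2, 0.5, 0.8)
PROBABILITY_BUCKETS = (0.1, 0.5, 0.9)

# Each cell is assumed to be the product impact x probability, rounded
# to two decimal places to avoid floating-point noise in the table.
RISK_TABLE: Dict[Tuple[float, float], float] = {
    (impact, probability): round(impact * probability, 2)
    for impact in IMPACT_LEVELS
    for probability in PROBABILITY_BUCKETS
}


def risk_score(impact: float, probability_bucket: float) -> float:
    """Return the risk score stored for an (impact score, probability) pair."""
    return RISK_TABLE[(impact, probability_bucket)]
```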
A risk score computed for a run in which a failure event occurs may be rendered on the test GUI 500 as part of the test output, which in turn assists the expert in identifying and mitigating issues in the stack 100 under testing that have given rise to the failure event. A high risk score indicates a high priority issue, as it generally indicates a relatively severe failure event in a relatively probable scenario.
The impact score can also be defined in a way that accounts for near-failure. For example, the rule severity of a given rule could be scaled according to its robustness score (the latter quantifying how close a rule was to being passed/failed). In this case, the impact score quantifies how close the system came to failing, and also the severity of that failure or near-failure.
Robustness scores are computed by the test oracle 252 as set out above. A set of performance evaluation rules is determined that is applicable to a scenario. A numerical performance metric is computed for each performance evaluation rule, in the form of a robustness score that quantifies the extent or success of failure. The robustness score may be normalized, so that zero represents the boundary between pass and fail (a score of zero means the rule has only ‘just’ been failed), with the score preferably normalized to a predetermined range, e.g. [−1,1], with −1 denoting maximum failure and 1 denoting maximum pass. As an example, for a “no collision rule” defined between a pair of agents, the robustness score could be in terms of distance and/or intersection. For example, a collision area may be defined around each agent, with a collision occurring if those areas intersect. A rule failure occurs when the intersection is non-zero and the severity of failure may, for example, be quantified as the impact velocity or some function thereof (preferably normalized). When the intersection is zero, the rule is passed. In this case, a measure such as the (minimum) distance between the collision areas could be used to quantify ‘how close’ the agents are to colliding (robustness score of zero in the above example). It is appropriate to use the risk score in this case, as the risk score is defined in a manner than quantifies severity of failure.
In the previous example, the robustness score itself is used to quantify the impact. For example, as noted, an overall robustness score can be used to score ego performance on a given rule over a run as a whole. For example, the overall robustness score for each rule may be equal to the minimum robustness score on that rule across all time steps of the run (the minimum robustness score denoting the worst failure point on that rule if negative, or the closest the ego came to failing that rule if positive or zero). In such cases, the impact score may, for example, be defined as zero when the minimum robustness score is zero or positive (positive robustness score means the rule is passed, thus no failure event on that rule), and equal to the magnitude of robustness score (or some transformation thereof) when the minimum robustness score is negative.
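A minimal sketch of this robustness-based impact score is given below, assuming per-time-step robustness scores for a single rule are available as a list of numbers. The function names are hypothetical, and the identity transformation is used for the magnitude.

```python
from typing import Sequence


def overall_robustness(step_scores: Sequence[float]) -> float:
    """Overall score for one rule: the minimum score across all time steps."""
    if not step_scores:
        raise ValueError("a run must contain at least one time step")
    return min(step_scores)


def impact_from_robustness(overall: float) -> float:
    """Zero when the rule is passed (score >= 0), else the score's magnitude."""
    return 0.0 if overall >= 0.0 else abs(overall)


run_scores = [0.8, 0.35, -0.2, 0.1]  # illustrative values only
worst = overall_robustness(run_scores)
print(worst, impact_from_robustness(worst))
```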
However, in other instances, the impact score may quantify the severity of a failure event independently of the robustness score (such that the impact score and the robustness score relate to each other only in so far as the robustness score indicates whether or not a failure event has occurred). In this case, the robustness score only indicates the existence of a failure event on a rule (if it falls below zero), but the value of the robustness score is not used to quantify the severity of the failure event. For example, as discussed above, each rule may be assigned a severity level (e.g. from 1 to 5), and the impact score may be determined e.g. as the maximum severity level of any failed rule, or a sum (or other aggregation) of the severity levels of all failed rules. In this case, if a rule assigned a severity level 5 were failed, the impact score would be based on this severity level, and would be the same irrespective of whether the minimum robustness score on that rule were, e.g., −0.01, −0.1 or −1. For example, safety rules may have higher severity ratings than comfort rules.
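The severity-level approach may be sketched as follows. The rule names, the assigned levels and the function name are hypothetical; only the sign of each minimum robustness score is used, so the result does not depend on how far below zero a score falls.

```python
from typing import Dict


def severity_impact(
    min_scores: Dict[str, float],
    severity: Dict[str, int],
    mode: str = "max",
) -> int:
    """Aggregate the severity levels of all failed rules (score below zero)."""
    failed = [severity[rule] for rule, score in min_scores.items() if score < 0.0]
    if not failed:
        return 0  # no failure event on any rule
    if mode == "max":
        return max(failed)
    if mode == "sum":
        return sum(failed)
    raise ValueError(f"unknown aggregation mode: {mode}")


# Hypothetical rules: a safety rule rated above a comfort rule.
levels = {"no_collision": 5, "max_jerk": 2}
for score in (-0.01, -0.1, -1.0):
    print(severity_impact({"no_collision": score, "max_jerk": 0.4}, levels))
```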
In another example, the two approaches may be combined, whereby the impact score depends on the severity level assigned to a failed rule and its robustness score (e.g. overall robustness score). For example, the impact score may be obtained by scaling the magnitude of a rule's overall robustness score based on the rule's severity level, e.g., so that a fail on a rule of level 5 severity has impact score of 0.05, 0.5 or 5 for an overall robustness score of −0.01, −0.1 or −1 respectively.
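The combined approach may be sketched as a simple product of the severity level and the magnitude of the overall robustness score, which reproduces the figures given above. The function name is hypothetical, and a linear scaling is only one possible choice.

```python
def scaled_impact(overall_robustness: float, severity_level: int) -> float:
    """Impact of a failed rule: severity level times robustness magnitude."""
    if overall_robustness >= 0.0:
        return 0.0  # rule passed, so no failure event on this rule
    return severity_level * abs(overall_robustness)


for score in (-0.01, -0.1, -1.0):
    print(score, scaled_impact(score, severity_level=5))
```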
The principles could also be extended to ‘near failure’ events, whose impact is assessed and influences the risk score. A near failure event may be defined, for example, in terms of a second threshold applied to the robustness score (e.g. with a score<0 defined as failure, and a score between 0 and +0.1 defined as near failure). In this case, a near failure event may give rise to a non-zero impact score.
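A two-threshold classification of this kind might be sketched as follows. The function name and the treatment of the boundary values (a score of exactly 0 or exactly +0.1 counted as near failure) are assumptions made for this sketch.

```python
def classify_outcome(score: float, near_threshold: float = 0.1) -> str:
    """Classify a robustness score as failure, near failure or pass."""
    if score < 0.0:
        return "failure"
    if score <= near_threshold:
        return "near_failure"
    return "pass"


for score in (-0.3, 0.05, 0.6):
    print(score, classify_outcome(score))
```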
As noted above, in one implementation, distributions are assigned to scenario variables of simulated scenarios at the design stage, and those distributions are used to sample specific instances (runs) and assign probabilities of occurrence to those instances. An extension to the platform to accommodate parameter distributions at the design stage will now be described.
The following description draws a distinction between a “scenario model”, a “scenario” and a “scenario run” (or instance).
A “scenario model” defines a class of scenarios probabilistically, in terms of one or more distributions associated with scenario variable(s) from which value(s) of those scenario variable(s) may be sampled. A scenario variable that can take different values with probabilities defined by a distribution is referred to as a probabilistic variable for conciseness. Scenario variables may describe characteristics of the road layout (e.g. number of lanes, lane characteristics, curvature, markings, surface type etc.) as well as dynamic agent(s) (e.g. agent lane, agent type, starting position, motion characteristics etc.).
In this context, a scenario is consumed by a simulator and may be generated from a scenario model by sampling value(s) of any probabilistic scenario variables of the scenario model. A single scenario model can be used to generate multiple scenarios, with different sampled value(s). In the described implementation, a scenario is represented as a scenario description that may be provided to a simulator as input. A scenario description may be encoded using a scenario description language (SDL), or in any other form that can be consumed by whichever component(s) require it. For example, a road network of a scenario may be stored in a format such as ASAM OpenDRIVE®, and ASAM OpenSCENARIO® may be used to describe dynamic content. Other forms of scenario description may be used, including bespoke languages and formats, and the present techniques are not limited to any particular SDL, storage format, schema or standard.
A scenario is run in a simulator, resulting in a scenario run. Multiple runs may be obtained from the same scenario, e.g. with different configurations of the AV stack under testing. Hence, a scenario model may result in multiple scenarios (with different sampled parameter values), each of which could potentially result in multiple simulated runs.
The design GUI 1006 allows a user to select scenario variables, from a set of available, predetermined scenario variables 1007 and assign constraints to those variables. The predetermined scenario variables 1007 are associated with predetermined scenario generation rules, according to which scenarios are generated, subject to any constraints placed on those variables in the scenario model.
The selected scenario variables (V) and the assigned constraints (C) are embodied in a scenario model 1000. The system allows the user to assign constraints that are probabilistic in nature, allowing multiple scenarios to be sampled from the scenario model probabilistically. The scenario model 1000 may be characterized as a probabilistic form of “abstract” scenario (being a more abstracted/higher-level scenario description) from which different “concrete” scenarios (less-abstracted/lower-level scenarios) may be generated via sampling. The scenario model 1000 can also be characterized as a generative model that generates different scenarios with some probability. Mathematically, this may be expressed as:
s ∼ S(V,C),
where s denotes a scenario and S(V,C) denotes a scenario model 1000 defined by a set of scenario variables V and a set of constraints C on those variables V. The probability of generating a given scenario s given a scenario model S(V,C) is denoted P(V=s|C). A valid instance of the scenario is defined by a set of values s for the variables V such that the constraints C are satisfied.
The user can define scenario elements of different types (e.g. road, junction, agent etc.) and different scenario variables may be applicable to different types of elements. Certain scenario variables may pertain to multiple element types (e.g. variables such as road length, number of lanes etc. are applicable to both road and junction elements).
The model editing component 1016 creates, in the memory 1014, the scenario model 1000 and modifies the scenario model 1000 according to model creation inputs. The scenario model 1000 is stored in the form of a scenario specification, which is an encoding of the selected scenario variables and their assigned constraints. Such constraints can be formulated as deterministic values assigned to scenario variables or distributions assigned to scenario variables from which deterministic values of those scenario variables can be subsequently sampled. Constraints may also be formulated in terms of relationships between different scenario variables, where those relationships may be deterministic or probabilistic in nature (probabilistic relationships may also be defined in terms of distributions). Examples of different scenario models are described in detail below.
The sampling component 1002 has access to the scenario model 1000 and can generate different scenarios based on the scenario model 1000. To the extent the scenario variables defined in the scenario model 1000 are constrained probabilistically (rather than deterministically), the generation of a scenario 1004 includes the sampling of deterministic values of scenario variable(s) from associated distribution(s) that define the probabilistic constraints. By way of example, the scenario model 1000 is shown to comprise first, second and third scenario variables 1024a, 1024b, 1024c, which are associated with first, second and third distributions 1026a, 1026b, 1026c respectively. The scenario 1004 generated from the scenario model 1000 is shown to comprise respective values 1028a, 1028b, 1028c assigned to the first, second and third scenario variables 1024a, 1024b, 1024c, which have been sampled from the first, second and third distributions 1026a, 1026b, 1026c respectively.
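Purely as an illustrative sketch, a scenario model with a mixture of deterministic and probabilistic constraints could be represented as a mapping from variable names to either fixed values or sampling functions. The variable names, value ranges and the `generate_scenario` function are hypothetical and do not reflect the actual encoding of the scenario specification.

```python
import random
from typing import Any, Dict


def generate_scenario(model: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Generate one scenario: sample probabilistic variables, copy fixed ones."""
    scenario = {}
    for name, constraint in model.items():
        # A callable constraint is treated as a distribution to sample from;
        # anything else is treated as a deterministic value.
        scenario[name] = constraint(rng) if callable(constraint) else constraint
    return scenario


# Hypothetical scenario model with one deterministic and three
# probabilistic scenario variables.
model = {
    "surface_type": "asphalt",
    "num_lanes": lambda rng: rng.choice([1, 2, 3]),
    "road_curvature": lambda rng: rng.uniform(0.0, 0.02),
    "agent_initial_speed": lambda rng: rng.gauss(13.0, 2.0),
}

rng = random.Random(0)  # seeded so that sampling is repeatable
scenario_a = generate_scenario(model, rng)
scenario_b = generate_scenario(model, rng)  # a second, different scenario
print(scenario_a)
print(scenario_b)
```

Each call yields a scenario with concrete values for every variable, so a single model can produce many scenarios.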
As described in further detail below, a scenario variable could pertain to a road layout or a dynamic agent. For example, a road curvature variable might be assigned some distribution, from which different road curvature values may be sampled for different scenarios. Similarly, a lane number variable might be associated with a distribution, to allow scenarios with different numbers of lanes to be generated, where the number of lanes in each scenario is sampled from that distribution. An agent variable might correspond to a position or initial speed of an agent, which can be similarly assigned a distribution, from which different starting positions or speeds etc. can be sampled for different scenarios.
Relationships may be imposed between variables of different types, e.g. a scenario designer could use a road layout variable to define or constrain a dynamic variable e.g. as agent_position=[0 . . . lane width].
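A relationship of this kind could be sketched by sampling the road layout variable first and then using its sampled value to bound the dynamic variable. The lane width range and the choice of uniform distributions are assumptions made for illustration only.

```python
import random
from typing import Dict


def sample_lane_and_agent(rng: random.Random) -> Dict[str, float]:
    """Sample a lane width, then an agent position constrained by that width."""
    lane_width = rng.uniform(2.5, 3.7)  # hypothetical range, in metres
    # Dynamic variable constrained by the road layout variable:
    # agent_position = [0 . . . lane width]
    agent_position = rng.uniform(0.0, lane_width)
    return {"lane_width": lane_width, "agent_position": agent_position}


print(sample_lane_and_agent(random.Random(42)))
```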
To assist the designer (user) who is creating or editing the scenario model 1000, the scenario 1004 is generated in the memory 1014 and provided to the scenario rendering component 1018, which in turn renders a scenario visualization on the design GUI 1006. The scenario visualization comprises at least one image representation of the scenario 1004 generated from the scenario model 1000 (scenario image), which may be a static image or video (moving) image.
The scenario visualisation within the GUI 1006 can be “refreshed”, which means rendering a scenario image(s) of a new scenario generated from the scenario model 1000, e.g. to replace the previous scenario image(s).
The set of available scenario variables 1007 is also shown as an input to the model rendering component 1020, and the available scenario variables may be rendered on the design GUI 1006, for example, in a drop-down list or other GUI component in which the available scenario variables are rendered as selectable elements for selective inclusion in the scenario model 1000.
The scenario 1004 could be in SDL format (which is to say the SDL format may be generated directly from the scenario model 1000), or the scenario 1004 could be encoded in some ‘intermediate’ format used for the purpose of visualization, and which can be converted to SDL format subsequently.
Once finalised, the scenario model 1000 may be exported to a scenario database 1001 for use in subsequent simulation-based testing.
A converter 1106 is shown, which receives the generated scenario 504 and converts it to an SDL representation 148 (this assumes the scenario 504 is initially generated in an intermediate format; if generated directly in SDL, the converter 1106 may be omitted). The SDL representation 148 is a scenario description that is consumable by the simulator 202 and may for example conform to the ASAM OpenSCENARIO format or any other scenario description format conducive to simulation.
The scenario description 148 can, in turn, be used as a basis for one or (more likely) multiple simulated runs, in the manner described above. Those simulated runs may have different outcomes, even though the underlying scenario description 148 is the same, not least because the stack 100 under testing might be different and the outcome of each simulated run depends on decisions taken within the stack 100 and the manner in which those decisions are implemented in the simulation environment. The result of each simulated run is a set of scenario ground truth 150, which in turn can be provided to the test oracle 252 of
The sampling component 1102 additionally assigns a probability 1108 to the scenario 1104, which is the probability P(V=s|C) where V and C are, respectively, the variables and constraints of the scenario model 1100, and s denotes the values of the variables V sampled by the sampling component 1102 to generate the scenario 1104.
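For the simple case of discrete scenario variables that are sampled independently of each other, the probability P(V=s|C) is the product of the probabilities of the individual sampled values. The following sketch makes that assumption explicit; the variable names, the probabilities and the function name are hypothetical, and dependent or continuous variables would need a different treatment (e.g. conditional probabilities or densities).

```python
import random
from typing import Any, Dict, Tuple


def sample_with_probability(
    model: Dict[str, Dict[Any, float]], rng: random.Random
) -> Tuple[Dict[str, Any], float]:
    """Sample a scenario s and return it together with P(V=s|C).

    Assumes every variable has a discrete distribution and that the
    variables are independent, so the joint probability is a product.
    """
    scenario: Dict[str, Any] = {}
    probability = 1.0
    for name, distribution in model.items():
        values = list(distribution)
        weights = [distribution[v] for v in values]
        value = rng.choices(values, weights=weights, k=1)[0]
        scenario[name] = value
        probability *= distribution[value]
    return scenario, probability


# Hypothetical scenario model with two discrete scenario variables.
model = {
    "num_lanes": {1: 0.2, 2: 0.5, 3: 0.3},
    "agent_type": {"car": 0.7, "truck": 0.2, "cyclist": 0.1},
}
scenario, probability = sample_with_probability(model, random.Random(1))
print(scenario, probability)
```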
Referring to
Whilst the above examples consider AV stack testing, the techniques can be applied to test components of other forms of mobile robot. Other mobile robots are being developed, for example for carrying freight supplies in internal and external industrial zones. Such mobile robots would have no people on board and belong to a class of mobile robot termed UAV (unmanned autonomous vehicle). Autonomous air mobile robots (drones) are also being developed.
References herein to components, functions, modules and the like, denote functional components of a computer system which may be implemented at the hardware level in various ways. A computer system comprises execution hardware which may be configured to execute the method/algorithmic steps disclosed herein and/or to implement a model trained using the present techniques. The term execution hardware encompasses any form/combination of hardware configured to execute the relevant method/algorithmic steps. The execution hardware may take the form of one or more processors, which may be programmable or non-programmable, or a combination of programmable and non-programmable hardware may be used. Examples of suitable programmable processors include general purpose processors based on an instruction set architecture, such as CPUs, GPUs/accelerator processors etc. Such general-purpose processors typically execute computer readable instructions held in memory coupled to or internal to the processor and carry out the relevant steps in accordance with those instructions. Other forms of programmable processors include field programmable gate arrays (FPGAs) having a circuit configuration programmable through circuit description code. Examples of non-programmable processors include application specific integrated circuits (ASICs). Code, instructions etc. may be stored as appropriate on transitory or non-transitory media (examples of the latter including solid state, magnetic and optical storage device(s) and the like). The subsystems 102-108 of the runtime stack
Number | Date | Country | Kind |
---|---|---|---|
2115738.3 | Nov 2021 | GB | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2022/080564 | 11/2/2022 | WO |