The present disclosure pertains to methods for evaluating the performance of a trajectory planner in a real or simulated scenario, and computer programs and systems for implementing the same. Example applications include ADS (Autonomous Driving System) and ADAS (Advanced Driver Assist System) performance testing.
There have been major and rapid developments in the field of autonomous vehicles. An autonomous vehicle (AV) is a vehicle which is equipped with sensors and control systems which enable it to operate without a human controlling its behaviour. An autonomous vehicle is equipped with sensors which enable it to perceive its physical environment, such sensors including for example cameras, radar and lidar. Autonomous vehicles are equipped with suitably programmed computers which are capable of processing data received from the sensors and making safe and predictable decisions based on the context which has been perceived by the sensors. An autonomous vehicle may be fully autonomous (in that it is designed to operate with no human supervision or intervention, at least in certain circumstances) or semi-autonomous. Semi-autonomous systems require varying levels of human oversight and intervention, such systems including Advanced Driver Assist Systems and level three Autonomous Driving Systems. There are different facets to testing the behaviour of the sensors and control systems aboard a particular autonomous vehicle, or a type of autonomous vehicle.
Safety is an increasing challenge as the level of autonomy increases. In autonomous driving, the importance of guaranteed safety has been recognized. Guaranteed safety does not necessarily imply zero accidents, but rather means guaranteeing that some minimum level of safety is met in defined circumstances. It is generally assumed this minimum level of safety must significantly exceed that of human drivers for autonomous driving to be viable.
According to Shalev-Shwartz et al. “On a Formal Model of Safe and Scalable Self-driving Cars” (2017), arXiv:1708.06374 (the RSS Paper), which is incorporated herein by reference in its entirety, human driving is estimated to cause of the order of 10⁻⁶ severe accidents per hour. On the assumption that autonomous driving systems will need to reduce this by at least three orders of magnitude, the RSS Paper concludes that a minimum safety level of the order of 10⁻⁹ severe accidents per hour needs to be guaranteed, noting that a pure data-driven approach would therefore require vast quantities of driving data to be collected every time a change is made to the software or hardware of the AV system.
The RSS paper provides a model-based approach to guaranteed safety. A rule-based Responsibility-Sensitive Safety (RSS) model is constructed by formalizing a small number of “common sense” driving rules:
The RSS model is one example of a rule-based safety model for assessing autonomous behaviour. An aim herein is to provide a flexible testing platform that can be tailored to different safety models and/or scenarios with minimal effort.
A first aspect herein provides a computer system for evaluating the performance of a trajectory planner in a real or simulated scenario, the computer system comprising: at least one input configured to receive scenario data, the scenario data generated using the trajectory planner to control an ego agent responsive to at least one other agent in the real or simulated scenario; a test oracle configured to provide predetermined extractor functions for extracting time-varying numerical signals from the scenario data and predetermined assessor functions for assessing the extracted time-varying signals. The test oracle is configured to apply, to the scenario data, a rule graph comprising extractor nodes and assessor nodes. Each extractor node is configured to apply one of the predetermined extractor functions to the scenario data to extract an output in the form of a time-varying numerical signal. Each assessor node has one or more child nodes, each child node being one of the extractor nodes or another of the assessor nodes, the assessor node configured to apply one of the predetermined assessor functions to the output(s) of its child node(s) to compute an output therefrom. The test oracle is configured to provide an output graph comprising the output of at least one of the assessor nodes and the output(s) of at least one of its child node(s).
The predetermined extractor and assessor functions within the test oracle constitute a set of modular “building blocks”. A rule editor allows custom rules of arbitrary complexity to be constructed from these atomic functions in a hierarchical fashion. The custom rule graph is a computational graph of nodes, at which selected atomic functions are applied, and edges (parent-child relationships) that can be flexibly defined.
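By way of illustration only, such a computational graph might be sketched in Python as follows. The class names, the dictionary-based scenario format and the numerical thresholds are hypothetical and are not taken from the present disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Union

Signal = List[float]   # one numerical value per time step
Results = List[bool]   # one pass/fail result per time step


@dataclass
class ExtractorNode:
    """Leaf node: applies an extractor function to the scenario data."""
    fn: Callable[[Dict[str, Signal]], Signal]

    def evaluate(self, scenario: Dict[str, Signal]) -> Signal:
        return self.fn(scenario)


@dataclass
class AssessorNode:
    """Parent node: applies an assessor function to its children's outputs."""
    fn: Callable[..., Results]
    children: Sequence[Union["ExtractorNode", "AssessorNode"]]

    def evaluate(self, scenario: Dict[str, Signal]) -> Results:
        return self.fn(*(child.evaluate(scenario) for child in self.children))


# Hypothetical atomic "building blocks".
def longitudinal_gap(scenario):
    return [o - e for e, o in zip(scenario["ego_x"], scenario["other_x"])]


def ego_speed(scenario):
    return list(scenario["ego_speed"])


def greater_than(threshold):
    return lambda signal: [v > threshold for v in signal]


def less_than(threshold):
    return lambda signal: [v < threshold for v in signal]


def all_of(*results):
    return [all(step) for step in zip(*results)]


scenario = {
    "ego_x": [0.0, 5.0, 10.0],
    "other_x": [20.0, 22.0, 14.0],
    "ego_speed": [5.0, 5.0, 5.0],
}

# Two atomic rules combined hierarchically under a top-level assessor node.
gap_ok = AssessorNode(greater_than(5.0), [ExtractorNode(longitudinal_gap)])
speed_ok = AssessorNode(less_than(10.0), [ExtractorNode(ego_speed)])
root = AssessorNode(all_of, [gap_ok, speed_ok])

top_level = root.evaluate(scenario)
intermediate = {"gap_ok": gap_ok.evaluate(scenario),
                "speed_ok": speed_ok.evaluate(scenario)}
```

In this sketch the top-level result and the intermediate results are both retained, mirroring the idea of an output graph that exposes the outputs of child nodes as well as the final result.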
In embodiments, the computer system may comprise a rule editor configured to create the rule graph responsive to rule creation inputs specifying the predetermined extractor function of each extractor node, the predetermined assessor function of each assessor node, and parent-child relationships between the extractor nodes and the assessor nodes.
The computer system may comprise a rule editor configured to create the rule graph, wherein: each extractor node may be created in response to a node creation input comprising an identifier of the predetermined extractor function; and each assessor node may be created in response to a node creation input comprising an identifier of the assessor function and an identifier(s) of the one or more child nodes.
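A minimal sketch of identifier-based node creation is given below, purely as an example. The registries, the identifiers and the `RuleEditor` and `apply_rule_graph` names are hypothetical stand-ins for the predetermined functions and components described above.

```python
from typing import Callable, Dict, List

# Hypothetical registries standing in for the predetermined functions
# that a test oracle would provide.
EXTRACTORS: Dict[str, Callable[[dict], List[float]]] = {
    "ego_speed": lambda s: list(s["ego_speed"]),
    "speed_limit": lambda s: list(s["speed_limit"]),
}
ASSESSORS: Dict[str, Callable[..., List[bool]]] = {
    "less_or_equal": lambda a, b: [x <= y for x, y in zip(a, b)],
}


class RuleEditor:
    """Builds a rule graph from node creation inputs that carry identifiers."""

    def __init__(self):
        self.nodes = {}

    def create_extractor(self, node_id, function_id):
        if function_id not in EXTRACTORS:
            raise KeyError(f"unknown extractor function: {function_id}")
        self.nodes[node_id] = ("extractor", function_id, [])

    def create_assessor(self, node_id, function_id, child_ids):
        if function_id not in ASSESSORS:
            raise KeyError(f"unknown assessor function: {function_id}")
        missing = [c for c in child_ids if c not in self.nodes]
        if missing:
            raise KeyError(f"unknown child nodes: {missing}")
        self.nodes[node_id] = ("assessor", function_id, list(child_ids))


def apply_rule_graph(nodes, node_id, scenario):
    """Recursively evaluate the node identified by node_id."""
    kind, function_id, child_ids = nodes[node_id]
    if kind == "extractor":
        return EXTRACTORS[function_id](scenario)
    inputs = [apply_rule_graph(nodes, c, scenario) for c in child_ids]
    return ASSESSORS[function_id](*inputs)


editor = RuleEditor()
editor.create_extractor("v", "ego_speed")
editor.create_extractor("limit", "speed_limit")
editor.create_assessor("within_limit", "less_or_equal", ["v", "limit"])

scenario = {"ego_speed": [8.0, 12.0, 9.0], "speed_limit": [10.0, 10.0, 10.0]}
result = apply_rule_graph(editor.nodes, "within_limit", scenario)
```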
The time-series of results computed by the assessor node may, for example, be a series of categorical results over multiple time steps (e.g. binary “pass/fail” results), and the derived time-varying numerical signal may exceed a threshold when a first type of result is computed (e.g. “pass”), but not when any other type of result (e.g. “fail”) is computed.
In embodiments, the output graph may comprise the outputs of some or all of the assessor nodes.
Alternatively or additionally, the output graph may comprise the output(s) of one, some or all of the extractor nodes.
The computer system may be configured to provide a graphical user interface (GUI) for accessing the output graph, via which a visualization of each output of the output graph is accessible.
The GUI may be configured to initially display a visual representation of the output of the assessor node, wherein, responsive to a graph expansion input, the GUI is configured to display a visual representation of the output of the child node.
The output of each assessor node may comprise at least one of: a time-series of categorical results, and a derived time-varying numerical signal.
For example, the output of the assessor node may comprise a time-series of categorical results and a derived time-varying numerical signal, wherein the derived time-varying signal satisfies a threshold condition when and only when a first type of categorical result is computed.
The above GUI may be configured to initially display a visual representation of the time-series of categorical results, wherein, responsive to a node expansion input, the GUI may be configured to display a visual representation of the derived time-varying signal.
The derived time-varying signal may be displayed with a visual indication of any portion(s) that satisfy the threshold condition.
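The relationship between the categorical results and the derived signal might, for example, be sketched as follows. The minimum-gap rule, the choice of zero as the threshold and the function names are assumptions made for illustration only.

```python
def assess_min_gap(gap, min_gap):
    """Return (categorical results, derived signal) for a minimum-gap rule.

    The derived signal is the margin gap - min_gap; it exceeds the
    threshold of zero when, and only when, the result is "pass".
    """
    derived = [g - min_gap for g in gap]
    results = ["pass" if d > 0.0 else "fail" for d in derived]
    return results, derived


def intervals_where(results, label):
    """Return (start, end) index pairs, end exclusive, of runs equal to label."""
    intervals, start = [], None
    for i, r in enumerate(results):
        if r == label and start is None:
            start = i
        elif r != label and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(results)))
    return intervals


gap = [12.0, 8.0, 4.0, 3.0, 9.0]
results, derived = assess_min_gap(gap, min_gap=5.0)
passing = intervals_where(results, "pass")
```

The intervals returned by `intervals_where` could be used by a GUI to mark the portions of the derived signal that satisfy the threshold condition.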
At least one of the assessor and/or extractor functions may have one or more configurable parameters, and the rule editor may be configured to receive one or more parameter configuration input(s) for configuring the parameters.
The test oracle may be configured to only partially compute the outputs as required for a current configuration of the parameter(s), and store the partially-computed outputs in a cache. Responsive to a change in the configuration of the parameters, the test oracle may be configured to determine an extent to which the cached outputs are unaffected by the change, determine an extent to which re-computation and/or further computation of the outputs is required, (re-)compute the outputs as required, and combine the (re-)computed outputs with the unaffected cached outputs.
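One simple way to realise such caching, given here as a sketch only, is to key each cached output on the node and on the parameter values that affect it, so that a parameter change invalidates only the dependent outputs. The `CachingOracle` class and its methods are hypothetical.

```python
class CachingOracle:
    """Caches node outputs keyed by (node name, relevant parameter values)."""

    def __init__(self, scenario):
        self.scenario = scenario
        self.cache = {}
        self.computations = 0  # counts cache misses, for illustration

    def _cached(self, key, compute):
        if key not in self.cache:
            self.cache[key] = compute()
            self.computations += 1
        return self.cache[key]

    def gap(self):
        # Extractor output: independent of any configurable parameter.
        return self._cached(
            ("gap",),
            lambda: [o - e for e, o in zip(self.scenario["ego_x"],
                                           self.scenario["other_x"])],
        )

    def gap_ok(self, min_gap):
        # Assessor output: depends on the configurable parameter min_gap.
        return self._cached(
            ("gap_ok", min_gap),
            lambda: [g > min_gap for g in self.gap()],
        )


oracle = CachingOracle({"ego_x": [0.0, 5.0, 10.0],
                        "other_x": [20.0, 22.0, 14.0]})

first = oracle.gap_ok(5.0)    # computes the gap signal and the assessment
second = oracle.gap_ok(18.0)  # parameter changed: gap is reused from cache
third = oracle.gap_ok(5.0)    # original configuration: fully cached
```

Here the extracted gap signal is unaffected by the change to `min_gap` and is therefore computed once, whereas the assessor output is recomputed for each new parameter value.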
The test oracle may be configured to apply the rule graph to the scenario data by: for at least one of the assessor nodes having multiple child nodes, computing the output(s) of a first subset of one or more of the multiple child nodes, determining from those output(s) that the output of the assessor node is computable without computing, or by only partially computing, the output(s) of the remaining child nodes, and computing the output of the assessor node without computing, or by only partially computing, the output(s) of the remaining child nodes.
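For a logical AND assessor, this kind of partial computation might be sketched as below. The per-time-step callables, the example signals and the call counter are illustrative assumptions.

```python
def lazy_and(first, second, num_steps):
    """Evaluate first AND second per time step.

    first and second are callables mapping a time index to a bool.
    The second child is evaluated only at time steps where the first
    child passes, because the result is otherwise already known.
    """
    results = []
    for t in range(num_steps):
        if not first(t):
            results.append(False)  # second child not needed at this step
        else:
            results.append(second(t))
    return results


in_lane = [True, False, False, True]
gap = [8.0, 2.0, 9.0, 3.0]
calls = {"expensive": 0}


def cheap_child(t):
    return in_lane[t]


def expensive_child(t):
    calls["expensive"] += 1  # track how often this child is evaluated
    return gap[t] > 5.0


results = lazy_and(cheap_child, expensive_child, num_steps=4)
```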
The rule editor may be configured to receive inputs denoting at least one scenario condition for the rule graph, and the test oracle may be configured to: assess the scenario data at multiple time steps or time intervals, to determine whether or not the scenario condition is satisfied at that time step or time interval; partially compute the output of at least one of the assessor nodes, in respect of any time step(s) or time interval(s) for which the scenario condition is satisfied, the output(s) of its child node(s) being only partially computed as needed to partially compute the output of the assessor node.
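As a sketch under stated assumptions, a scenario condition might gate evaluation as follows, with `None` used here (hypothetically) to mark time steps at which the rule is not applicable and is therefore not computed.

```python
def evaluate_when(condition, extractor, assessor, num_steps):
    """Apply the rule only at time steps where the scenario condition holds.

    condition, extractor: callables taking a time index.
    assessor: callable taking the extracted value and returning a bool.
    The extractor is not invoked at time steps where the condition fails.
    """
    results = []
    for t in range(num_steps):
        if condition(t):
            results.append(assessor(extractor(t)))
        else:
            results.append(None)  # rule not applicable at this step
    return results


# Hypothetical condition: a speed rule that applies only on a motorway.
on_motorway = [False, True, True, False]
speed = [30.0, 25.0, 35.0, 40.0]

results = evaluate_when(
    condition=lambda t: on_motorway[t],
    extractor=lambda t: speed[t],
    assessor=lambda v: v <= 31.0,
    num_steps=4,
)
```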
The test oracle may be configured to store the partially-computed output(s) of the child node(s) in a cache, and reuse at least some of the cached outputs when evaluating another rule graph on the scenario data, and/or re-evaluating the rule graph on the scenario data responsive to a change in at least one configurable parameter of the rule graph, the cached output(s) being combined with partially computed output(s) of those node(s) for at least one further time interval or time period.
The GUI may be configured to display an initial visualization of the rule graph that is updated in response to changes in the node creation inputs.
The node creation inputs may be embodied in rule creation code, and the rule editor may be configured to receive and interpret the rule creation code.
The rule creation code may be interpreted according to a domain specific language.
At least one of the assessor functions may comprise a temporal or non-temporal logic operator.
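For example, temporal operators such as “always” and “eventually” might be evaluated over a finite time-series of results as sketched below. The finite-trace semantics adopted here are an assumption; the disclosure does not prescribe a particular temporal logic.

```python
def always(results):
    """out[t] is True if results holds at every step from t onwards."""
    out, acc = [], True
    for r in reversed(results):
        acc = acc and r
        out.append(acc)
    return out[::-1]


def eventually(results):
    """out[t] is True if results holds at some step from t onwards."""
    out, acc = [], False
    for r in reversed(results):
        acc = acc or r
        out.append(acc)
    return out[::-1]


def logical_not(results):
    """Non-temporal operator: negate the result at each time step."""
    return [not r for r in results]


safe = [True, False, True, True]
stopped = [False, True, False, False]

always_safe = always(safe)
eventually_stopped = eventually(stopped)
never_stopped = always(logical_not(stopped))
```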
Another aspect herein provides a rule editor for creating rules for evaluating scenario data generated using a trajectory planner to control an ego agent responsive to at least one other agent in a real or simulated scenario, the rule editor embodied in transitory or non-transitory media as program instructions which, when executed on one or more computer processors, cause the one or more processors to: create a custom rule graph comprising extractor nodes and assessor nodes, wherein: each extractor node is created in response to a node creation input comprising an identifier of one of a plurality of predetermined extractor functions provided by a test oracle, the extractor node configured to apply the identified extractor function to scenario data to extract an output in the form of a time-varying numerical signal; and each assessor node is created in response to a node creation input comprising an identifier of one of a plurality of predetermined assessor functions provided by the test oracle and an identifier(s) of one or more child nodes, each child node being one of the extractor nodes or another of the assessor nodes, the assessor node configured to apply the identified assessor function to the output(s) of its child node(s) to compute an output therefrom.
A further aspect herein provides a computer system for evaluating the performance of a trajectory planner for an autonomous vehicle in a real or simulated scenario based on at least one driving rule, the computer system comprising: at least one input configured to receive scenario data, the scenario data generated using the trajectory planner to control the autonomous vehicle responsive to at least one other agent in the real or simulated scenario; a rule editor configured to receive as input a driving rule to be applied to the scenario data, the driving rule defined in the form of a temporal or non-temporal logic predicate evaluated on one or more extractor functions; a test oracle configured to apply the driving rule to the scenario by applying the one or more extractor functions to the scenario data to compute one or more extracted signals therefrom, and evaluating the logic predicate on the one or more extracted signals at multiple timesteps of the scenario, thereby computing a top-level output in the form of a time-series of categorical results; and a graphical user interface configured to display an output graph visualizing: the top-level output, multiple intermediate outputs, each being a time-series of categorical results used to derive the top-level output, each computed by evaluating a component predicate of the driving rule, and a set of hierarchical relationships between the top-level output and the multiple intermediate outputs.
In embodiments, the output graph may comprise a visual representation of a derived signal correlated with the top-level output or one of the multiple intermediate outputs.
In embodiments, the output graph may comprise a visual representation of: at least one extracted signal of the one or more extracted signals, and a hierarchical relationship between the at least one extracted signal and the multiple intermediate outputs.
A further aspect herein provides a computer system for evaluating the performance of a trajectory planner in a real or simulated scenario, the computer system comprising: at least one input configured to receive scenario data, the scenario data generated using the trajectory planner to control an ego agent responsive to at least one other agent in the real or simulated scenario; a test oracle configured to provide predetermined extractor functions for extracting time-varying numerical signals from the scenario data and predetermined assessor functions for assessing the extracted time-varying signals; a rule editor configured to create a custom rule graph comprising extractor nodes and assessor nodes, wherein: each extractor node is created in response to a node creation input comprising an identifier of one of the extractor functions, the extractor node configured to apply the identified extractor function to the scenario data to extract an output in the form of a time-varying numerical signal; and each assessor node is created in response to a node creation input comprising an identifier of one of the assessor functions and an identifier(s) of one or more child nodes, each child node being one of the extractor nodes or another of the assessor nodes, the assessor node configured to apply the identified assessor function to the output(s) of its child node(s) to compute an output therefrom; wherein the test oracle is configured to apply the custom rule graph to the scenario data, and provide an output graph comprising the output of at least one of the assessor nodes and the output(s) of at least one of its child node(s).
Another aspect herein provides executable program instructions for programming a computer system to implement any of the functionality described herein.
For a better understanding of the present disclosure, and to show how embodiments of the same may be carried into effect, reference is made by way of example only to the following figures in which:
Herein, a “scenario” can be real or simulated and involves an ego agent (an ego vehicle or other mobile robot) moving within an environment (e.g. within a particular road layout), typically in the presence of one or more other agents (other vehicles, pedestrians, cyclists, animals etc.). A “trace” is a history of an agent's (or actor's) location and motion over the course of a scenario. There are many ways a trace can be represented. Trace data will typically include spatial and motion data of an agent within the environment. The term is used in relation to both real scenarios (with physical traces) and simulated scenarios (with simulated traces). The following description considers simulated scenarios but the same techniques can be applied to assess performance on real-world scenarios.
In a simulation context, the term scenario may be used in relation to both the input to a simulator (such as an abstract scenario description) and the output of the simulator (such as the traces). It will be clear in context which is referred to.
A typical AV stack includes perception, prediction, planning and control (sub)systems. The term “planning” is used herein to refer to autonomous decision-making capability (such as trajectory planning) whilst “control” is used to refer to the generation of control signals for carrying out autonomous decisions. The extent to which planning and control are integrated or separable can vary significantly between different stack implementations—in some stacks, these may be so tightly coupled as to be indistinguishable (e.g. such stacks could plan in terms of control signals directly), whereas other stacks may be architected in a way that draws a clear distinction between the two (e.g. with planning in terms of trajectories, and with separate control optimizations to determine how best to execute a planned trajectory at the control signal level). Unless otherwise indicated, the planning and control terminology used herein does not imply any particular coupling or separation of those aspects. An example form of AV stack will now be described in further detail, to provide relevant context to the subsequent description.
In a real-world context, the perception system 102 would receive sensor outputs from an on-board sensor system 110 of the AV, and use those sensor outputs to detect external agents and measure their physical state, such as their position, velocity, acceleration etc. The on-board sensor system 110 can take different forms but generally comprises a variety of sensors such as image capture devices (cameras/optical sensors), lidar and/or radar unit(s), satellite-positioning sensor(s) (GPS etc.), motion/inertial sensor(s) (accelerometers, gyroscopes etc.) etc. The onboard sensor system 110 thus provides rich sensor data from which it is possible to extract detailed information about the surrounding environment, and the state of the AV and any external actors (vehicles, pedestrians, cyclists etc.) within that environment. The sensor outputs typically comprise sensor data of multiple sensor modalities such as stereo images from one or more stereo optical sensors, lidar, radar etc. Sensor data of multiple sensor modalities may be combined using filters, fusion components etc.
The perception system 102 typically comprises multiple perception components which co-operate to interpret the sensor outputs and thereby provide perception outputs to the prediction system 104.
In a simulation context, depending on the nature of the testing, and depending in particular on where the stack 100 is “sliced” for the purpose of testing, it may or may not be necessary to model the on-board sensor system 110. With higher-level slicing, simulated sensor data is not required, and therefore complex sensor modelling is not required.
The perception outputs from the perception system 102 are used by the prediction system 104 to predict future behaviour of external actors (agents), such as other vehicles in the vicinity of the AV.
Predictions computed by the prediction system 104 are provided to the planner 106, which uses the predictions to make autonomous driving decisions to be executed by the AV in a given driving scenario. The inputs received by the planner 106 would typically indicate a drivable area and would also capture predicted movements of any external agents (obstacles, from the AV's perspective) within the drivable area. The drivable area can be determined using perception outputs from the perception system 102 in combination with map information, such as an HD (high definition) map.
A core function of the planner 106 is the planning of trajectories for the AV (ego trajectories), taking into account predicted agent motion. This may be referred to as trajectory planning. A trajectory is planned in order to carry out a desired goal within a scenario. The goal could for example be to enter a roundabout and leave it at a desired exit; to overtake a vehicle in front; or to stay in a current lane at a target speed (lane following). The goal may, for example, be determined by an autonomous route planner (not shown).
The controller 108 executes the decisions taken by the planner 106 by providing suitable control signals to an on-board actor system 112 of the AV. In particular, the planner 106 plans trajectories for the AV and the controller 108 generates control signals to implement the planned trajectories. Typically, the planner 106 will plan into the future, such that a planned trajectory may only be partially implemented at the control level before a new trajectory is planned by the planner 106.
Scenarios can be obtained for the purpose of simulation in various ways, including manual encoding. The system is also capable of extracting scenarios for the purpose of simulation from real-world runs, allowing real-world situations and variations thereof to be re-created in the simulator 202.
Further details of the testing pipeline and the test oracle 252 will now be described. The examples that follow focus on simulation-based testing. However, as noted, the test oracle 252 can equally be applied to evaluate stack performance on real scenarios, and the relevant description below applies equally to real scenarios. The following description refers to the stack 100 of
The idea of simulation-based testing is to run a simulated driving scenario that an ego agent must navigate under the control of a stack (or sub-stack) being tested. Typically, the scenario includes a static drivable area (e.g. a particular static road layout) that the ego agent is required to navigate in the presence of one or more other dynamic agents (such as other vehicles, bicycles, pedestrians etc.). Simulated inputs feed into the stack under testing, where they are used to make decisions. The ego agent is, in turn, caused to carry out those decisions, thereby simulating the behaviour of an autonomous vehicle in those circumstances.
Simulated inputs 203 are provided to the stack under testing. “Slicing” refers to the selection of a set or subset of stack components for testing. This, in turn, dictates the form of the simulated inputs 203.
By way of example,
By contrast, so-called “planning-level” simulation would essentially bypass the perception system 102. The simulator 202 would instead provide simpler, higher-level inputs 203 directly to the prediction system 104. In some contexts, it may even be appropriate to bypass the prediction system 104 as well, in order to test the planner 106 on predictions obtained directly from the simulated scenario.
Between these extremes, there is scope for many different levels of input slicing, e.g. testing only a subset of the perception system, such as “later” perception components, i.e., components such as filters or fusion components which operate on the outputs from lower-level perception components (such as object detectors, bounding box detectors, motion detectors etc.).
By way of example only, the description of the testing pipeline 200 makes reference to the runtime stack 100 of
Whatever form they take, the simulated inputs 203 are used (directly or indirectly) as a basis for decision-making by the planner 106.
The controller 108, in turn, implements the planner's decisions by outputting control signals 109. In a real-world context, these control signals would drive the physical actor system 112 of the AV.
In simulation, an ego vehicle dynamics model 204 is used to translate the resulting control signals 109 into realistic motion of the ego agent within the simulation, thereby simulating the physical response of an autonomous vehicle to the control signals 109.
To the extent that external agents exhibit autonomous behaviour/decision making within the simulator 202, some form of agent decision logic 210 is implemented to carry out those decisions and determine agent behaviour within the scenario. The agent decision logic 210 may be comparable in complexity to the ego stack 100 itself or it may have a more limited decision-making capability. The aim is to provide sufficiently realistic external agent behaviour within the simulator 202 to be able to usefully test the decision-making capabilities of the ego stack 100. In some contexts, this does not require any agent decision making logic 210 at all (open-loop simulation), and in other contexts useful testing can be provided using relatively limited agent logic 210 such as basic adaptive cruise control (ACC). One or more agent dynamics models 206 may be used to provide more realistic agent behaviour.
A simulation of a driving scenario is run in accordance with a scenario description 201, having both static and dynamic layers 201a, 201b.
The static layer 201a defines static elements of a scenario, which would typically include a static road layout.
The dynamic layer 201b defines dynamic information about external agents within the scenario, such as other vehicles, pedestrians, bicycles etc. The extent of the dynamic information provided can vary. For example, the dynamic layer 201b may comprise, for each external agent, a spatial path to be followed by the agent together with one or both of motion data and behaviour data associated with the path. In simple open-loop simulation, an external actor simply follows the spatial path and motion data defined in the dynamic layer in a non-reactive manner, i.e. it does not react to the ego agent within the simulation. Such open-loop simulation can be implemented without any agent decision logic 210. However, in closed-loop simulation, the dynamic layer 201b instead defines at least one behaviour to be followed along a static path (such as an ACC behaviour). In this case, the agent decision logic 210 implements that behaviour within the simulation in a reactive manner, i.e. reactive to the ego agent and/or other external agent(s). Motion data may still be associated with the static path but in this case is less prescriptive and may for example serve as a target along the path. For example, with an ACC behaviour, target speeds may be set along the path which the agent will seek to match, but the agent decision logic 210 might be permitted to reduce the speed of the external agent below the target at any point along the path in order to maintain a target headway from a forward vehicle.
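The ACC example might be sketched as follows, assuming a deliberately simplified time-headway law. The function name, the units and the control law itself are illustrative assumptions and do not represent the agent decision logic of any particular simulator.

```python
def acc_speed(target_speed, gap, target_headway):
    """Return a speed command for an ACC-like external agent.

    target_speed:   speed set along the path (m/s), treated as an upper bound
    gap:            distance to the forward vehicle (m)
    target_headway: desired time headway to the forward vehicle (s)

    The command never exceeds the target speed, and is reduced where
    necessary so that gap / speed is at least the target headway.
    """
    headway_limited_speed = max(0.0, gap / target_headway)
    return min(target_speed, headway_limited_speed)


free_road = acc_speed(target_speed=20.0, gap=60.0, target_headway=2.0)
close_following = acc_speed(target_speed=20.0, gap=30.0, target_headway=2.0)
```

With a large gap the agent simply tracks the target speed, whereas with a smaller gap the commanded speed falls below the target in order to maintain the headway.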
The output of the simulator 202 for a given simulation includes an ego trace 212a of the ego agent and one or more agent traces 212b of the one or more external agents (traces 212).
A trace is a complete history of an agent's behaviour within a simulation having both spatial and motion components. For example, a trace may take the form of a spatial path having motion data associated with points along the path such as speed, acceleration, jerk (rate of change of acceleration), snap (rate of change of jerk) etc.
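As a simple illustration, the motion components of a trace might be derived from sampled speeds by repeated finite differencing. The fixed time step and the forward-difference scheme are assumptions made for this sketch; a trace may be represented in many other ways.

```python
def differentiate(values, dt):
    """Forward finite difference of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


dt = 0.5                              # sampling interval in seconds
speed = [0.0, 1.0, 3.0, 6.0, 10.0]    # m/s at successive points on the path

acceleration = differentiate(speed, dt)   # m/s^2
jerk = differentiate(acceleration, dt)    # m/s^3, rate of change of acceleration
snap = differentiate(jerk, dt)            # m/s^4, rate of change of jerk
```

Each differentiation shortens the signal by one sample, so higher derivatives are available at fewer time steps.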
Additional information is also provided to supplement and provide context to the traces 212. Such additional information is referred to as “environmental” data 214 which can have both static components (such as road layout) and dynamic components (such as weather conditions to the extent they vary over the course of the simulation). To an extent, the environmental data 214 may be “passthrough” in that it is directly defined by the scenario description 201 and is unaffected by the outcome of the simulation. For example, the environmental data 214 may include a static road layout that comes from the scenario description 201 directly. However, typically the environmental data 214 would include at least some elements derived within the simulator 202. This could, for example, include simulated weather data, where the simulator 202 is free to change weather conditions as the simulation progresses. In that case, the weather data may be time-dependent, and that time dependency will be reflected in the environmental data 214.
The test oracle 252 receives the traces 212 and the environmental data 214, and scores those outputs in the manner described below. The scoring is time-based: for each performance metric, the test oracle 252 tracks how the value of that metric (the score) changes over time as the simulation progresses. The test oracle 252 provides an output 256 comprising a score-time plot for each performance metric, as described in further detail later. The metrics 254 are informative to an expert and the scores can be used to identify and mitigate performance issues within the tested stack 100.
A number of “later” perception components 102B form part of the sub-stack 100S to be tested and are applied, during testing, to simulated perception inputs 203. The later perception components 102B could, for example, include filtering or other fusion components that fuse perception inputs from multiple earlier perception components.
In the full stack 100, the later perception components 102B would receive actual perception inputs 213 from earlier perception components 102A. For example, the earlier perception components 102A might comprise one or more 2D or 3D bounding box detectors, in which case the simulated perception inputs provided to the later perception components could include simulated 2D or 3D bounding box detections, derived in the simulation via ray tracing. The earlier perception components 102A would generally include component(s) that operate directly on sensor data.
With this slicing, the simulated perception inputs 203 would correspond in form to the actual perception inputs 213 that would normally be provided by the earlier perception components 102A. However, the earlier perception components 102A are not applied as part of the testing, but are instead used to train one or more perception error models 208 that can be used to introduce realistic error, in a statistically rigorous manner, into the simulated perception inputs 203 that are fed to the later perception components 102B of the sub-stack 100S under testing.
Such perception error models may be referred to as Perception Statistical Performance Models (PSPMs) or, synonymously, “PRISMs”. Further details of the principles of PSPMs, and suitable techniques for building and training them, may be found in International Patent Application Nos. PCT/EP2020/073565, PCT/EP2020/073562, PCT/EP2020/073568, PCT/EP2020/073563, and PCT/EP2020/073569, each of which is incorporated herein by reference in its entirety. The idea behind PSPMs is to efficiently introduce realistic errors into the simulated perception inputs provided to the sub-stack 100S (i.e. errors that reflect the kind of errors that would be expected were the earlier perception components 102A to be applied in the real world). In a simulation context, “perfect” ground truth perception inputs 203G are provided by the simulator, but these are used to derive more realistic perception inputs 203 with realistic error introduced by the perception error model(s) 208.
As described in the aforementioned references, a PSPM can be dependent on one or more variables representing physical condition(s) (“confounders”), allowing different levels of error to be introduced that reflect different possible real-world conditions. Hence, the simulator 202 can simulate different physical conditions (e.g. different weather conditions) by simply changing the value of one or more weather confounders, which will, in turn, change how perception error is introduced.
The later perception components 102B within the sub-stack 100S process the simulated perception inputs 203 in exactly the same way as they would process the real-world perception inputs 213 within the full stack 100, and their outputs, in turn, drive prediction, planning and control. Alternatively, PSPMs can be used to model the entire perception system 102, including the later perception components 102B.
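The role of a confounder can be illustrated with a toy error model, sketched below. A real PSPM would be trained on the outputs of actual perception components; the hand-set Gaussian noise levels, the weather labels and the function names used here are purely hypothetical.

```python
import random

# Hypothetical position-noise standard deviations (metres) per weather value.
NOISE_STD_M = {"clear": 0.1, "rain": 0.3}


def sample_perceived_position(true_position, weather, rng):
    """Add confounder-dependent Gaussian error to a ground truth position."""
    return rng.gauss(true_position, NOISE_STD_M[weather])


def mean_abs_error(ground_truth, weather, seed=0):
    """Average absolute perception error over a ground truth sequence."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    perceived = [sample_perceived_position(p, weather, rng)
                 for p in ground_truth]
    return sum(abs(a - b) for a, b in zip(perceived, ground_truth)) / len(ground_truth)


ground_truth = [10.0, 12.0, 14.0, 16.0]
error_clear = mean_abs_error(ground_truth, "clear")
error_rain = mean_abs_error(ground_truth, "rain")
```

Changing only the weather confounder changes the level of error introduced into the simulated perception inputs, without any change to the underlying ground truth.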
Rules are constructed within the test oracle 252 as computational graphs (rule graphs).
Each assessor node 304 is shown to have at least one child object (node), where each child object is one of the extractor nodes 302 or another one of the assessor nodes 304. Each assessor node receives output(s) from its child node(s) and applies an assessor function to those output(s). The output of the assessor function is a time-series of categorical results. The following examples consider simple binary pass/fail results, but the techniques can be readily extended to non-binary results. Each assessor function assesses the output(s) of its child node(s) against a predetermined atomic rule. Such rules can be flexibly combined in accordance with a desired safety model.
In addition, each assessor node 304 derives a time-varying numerical signal from the output(s) of its child node(s), which is related to the categorical results by a threshold condition (see below).
A top-level root node 304a is an assessor node that is not a child node of any other node. The top-level node 304a outputs a final sequence of results, and its descendants (i.e. nodes that are direct or indirect children of the top-level node 304a) provide the underlying signals and intermediate results.
Signals extracted directly from the scenario ground truth 310 by the extractor nodes 302 may be referred to as “raw” signals, to distinguish from “derived” signals computed by assessor nodes 304. Results and raw/derived signals may be discretised in time.
A rule editor 400 is provided, which receives rule creation inputs from a user. The rule creation inputs are coded in a domain specific language (DSL), and an example section of rule creation code 406 is depicted. The rule creation code 406 defines a custom rule graph 408 of the kind depicted in
Within the code 406, an extractor node creation input is depicted and labelled 411. The extractor node creation input is shown to comprise an identifier 412 of one of the predetermined extractor functions 402.
An assessor node creation input 413 is also depicted, and is shown to comprise an identifier 414 of one of the predetermined assessor functions 404. Here, the input 413 instructs an assessor node to be created with two child nodes, having node identifiers 415a, 415b (which happen to be extractor nodes in this example, but could be assessor nodes, extractor nodes or a combination of both in general).
The nodes of the custom rule graph are objects in the object-oriented programming (OOP) sense. A node factory class (Nodes( )) is provided within the test oracle 252. To implement the custom rule graph 408, the node factory class 410 is instantiated, and a node creation function (add_node) of the resulting factory object 410 (node-factory) is called with the details of the node to be created.
The following examples consider atomic rules that are formulated as atomic logic predicates. Examples of basic atomic predicates include elementary logic gates (OR, AND etc.), and logical functions such as “greater than” (Gt(a,b)), which returns true when a is greater than b, and false otherwise.
The example rule creation code 406 uses a Gt building block to implement a safe lateral distance rule between an ego agent and another agent in the scenario (having agent identifier “other_agent_id”). Two extractor nodes (latd, latsd) are defined in the code 406, and mapped to predetermined LateralDistance and LateralSafeDistance extractor functions respectively. Those functions operate directly on the scenario ground truth 310 to extract, respectively, a time-varying lateral distance signal (measuring a lateral distance between the ego agent and the identified other agent), and a time-varying safe lateral distance signal for the ego agent and the identified other agent. The safe lateral distance signal could depend on various factors, such as the speed of the ego agent and the speed of the other agent (captured in the traces 212), and environmental conditions (e.g. weather, lighting, road type etc.) captured in the environmental data 214. This is largely invisible to an end-user, who simply has to select the desired extractor function (although, in some implementations, one or more configurable parameters of the function may be exposed to the end-user).
An assessor node (is_latd_safe) is defined as a parent to the latd and latsd extractor nodes, and is mapped to the Gt atomic predicate. Accordingly, when the rule graph 408 is implemented, the is_latd_safe assessor node applies the Gt function to the outputs of the latd and latsd extractor nodes, in order to compute a true/false result for each timestep of the scenario, returning true for each time step at which the latd signal exceeds the latsd signal and false otherwise. In this manner, a “safe lateral distance” rule has been constructed from atomic extractor functions and predicates; the ego agent fails the safe lateral distance rule when the lateral distance reaches or falls below the safe lateral distance threshold. As will be appreciated, this is a very simple example of a custom rule. Rules of arbitrary complexity can be constructed according to the same principles.
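By way of a non-limiting illustration, the safe lateral distance rule may be sketched as a small graph of extractor and assessor objects. The classes `ExtractorNode` and `AssessorNode`, the function `gt` and the dictionary layout of the scenario are hypothetical and are not the DSL or node factory interface of the rule editor 400; only the node names latd, latsd and is_latd_safe are taken from the example above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ExtractorNode:
    """Leaf node: extracts a time-varying signal from the scenario ground truth."""
    name: str
    fn: Callable[[dict], List[float]]

    def evaluate(self, scenario: dict, outputs: dict) -> list:
        outputs[self.name] = self.fn(scenario)
        return outputs[self.name]


@dataclass
class AssessorNode:
    """Internal node: applies an assessor function to its children's outputs."""
    name: str
    fn: Callable[..., list]
    children: Sequence

    def evaluate(self, scenario: dict, outputs: dict) -> list:
        child_outputs = [c.evaluate(scenario, outputs) for c in self.children]
        outputs[self.name] = self.fn(*child_outputs)
        return outputs[self.name]


def gt(a, b):
    """Atomic predicate: true at each time step where a exceeds b."""
    return [x > y for x, y in zip(a, b)]


# Hypothetical ground truth: one sample per time step.
scenario = {
    "lateral_distance": [2.0, 1.2, 0.8, 1.5],
    "lateral_safe_distance": [1.0, 1.0, 1.0, 1.0],
}

latd = ExtractorNode("latd", lambda s: s["lateral_distance"])
latsd = ExtractorNode("latsd", lambda s: s["lateral_safe_distance"])
is_latd_safe = AssessorNode("is_latd_safe", gt, [latd, latsd])

# The output graph keeps the output of every node, not only the root.
output_graph = {}
results = is_latd_safe.evaluate(scenario, output_graph)
print(results)               # [True, True, False, True]
print(sorted(output_graph))  # ['is_latd_safe', 'latd', 'latsd']
```

Because the output of every node is retained, a failure at the third time step can be traced to the latd signal falling below the latsd signal at that step.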
The test oracle 252 applies the custom rule graph 408 to the scenario ground truth 310, and provides the results in the form of an output graph 417; that is to say, the test oracle 252 does not simply provide top-level outputs, but provides the output computed at each node of the custom rule graph 408. In the “safe lateral distance” example, the time-series of results computed by the is_latd_safe node are provided, but the underlying signals latd and latsd are also provided in the output graph 417, allowing the end-user to easily investigate the cause of a failure on a particular rule at any level in the graph. In this example, the output graph 417 is a visual representation of the custom rule graph 408 that is displayed via a user interface (UI) 418; each node of the custom rule graph is augmented with a visualization of its output (see
The numerical output of the top-level node may be referred to as a time-varying ‘robustness’ score. A robustness score denotes the extent of success/failure (that is, in the event a vehicle passed a rule at a given time instant, ‘how close’ it was to failing, and in the event it failed the rule, how close it was to passing). The robustness score is preferably normalized, e.g. to a scale of [−1,+1] and scaled so that the pass/fail threshold corresponds to a robustness score of zero. Such normalization and scaling makes the output highly intuitive, and facilitates easy and meaningful comparison of the results on different rules (or different components of the same rule). For example, in the case of a distance rule defined with respect to some threshold, a robustness score of zero might denote the point at which that threshold is reached, decreasing to −1 as distance decreases below the threshold, and increasing to +1 as distance increases above the threshold. For more complex rules, such as a rule defined in terms of the maximum or minimum of two distance functions (e.g. lateral and longitudinal distance), the robustness score may be defined in terms of whichever distance applies at a given time step.
A predefined scoring function may be associated with each assessor function. For an atomic predicate whose children are also assessor(s), the scoring function may be defined as a function of its children's score(s). For an assessor function whose children are extractor function(s), the scoring function may be defined as a function of its children's extracted signal(s). For an assessor function with both assessor and extractor children, the scoring function may be defined as a function of the score(s) and signal(s) provided by its children.
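By way of a non-limiting illustration, scoring functions of this kind may be sketched as follows. The particular choices made here are assumptions rather than the scoring functions of the test oracle 252: the score for a “greater than” predicate is taken as the signed margin divided by a hypothetical `scale` parameter and clipped to [−1,+1], and the scores for AND and OR are taken as the pointwise minimum and maximum of the children's scores.

```python
def gt_score(a, b, scale):
    """Score for Gt(a, b) on extracted signals: signed margin (a - b),
    normalised by `scale` and clipped to [-1, +1]; zero is the threshold."""
    return [max(-1.0, min(1.0, (x - y) / scale)) for x, y in zip(a, b)]


def and_score(*child_scores):
    """Score for AND over child assessors: the weakest child dominates."""
    return [min(values) for values in zip(*child_scores)]


def or_score(*child_scores):
    """Score for OR over child assessors: the strongest child dominates."""
    return [max(values) for values in zip(*child_scores)]


def to_results(score):
    """Threshold condition: a score of zero or below denotes failure."""
    return [s > 0.0 for s in score]


# Hypothetical signals, one sample per time step.
latd, latsd = [3.0, 1.5, 0.5], [1.0, 1.0, 1.0]
lond, lonsd = [10.0, 4.0, 6.0], [5.0, 5.0, 5.0]

lat_score = gt_score(latd, latsd, scale=2.0)   # [1.0, 0.25, -0.25]
lon_score = gt_score(lond, lonsd, scale=8.0)   # [0.625, -0.125, 0.125]

safe_either = or_score(lat_score, lon_score)   # [1.0, 0.25, 0.125]
safe_both = and_score(lat_score, lon_score)    # [0.625, -0.125, -0.25]

print(to_results(safe_either))  # [True, True, True]
print(to_results(safe_both))    # [True, False, False]
```

With these choices the categorical result at every node is recovered from the numerical score by the same threshold at zero, so the scores of different rules can be compared on a common scale.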
The rule editor 400 allows rules to be tailored, e.g. to implement different safety models, or to apply rules selectively to different scenarios (in a given safety model, not every rule will necessarily be applicable to every scenario; with this approach, different rules or combinations of rules can be applied to different scenarios).
The above examples consider simple logical predicates evaluated on results or signals at a single time instance, such as OR, AND, Gt etc. However, in practice, it may be desirable to formulate certain rules in terms of temporal logic.
Hekmatnejad et al., “Encoding and Monitoring Responsibility Sensitive Safety Rules for Automated Vehicles in Signal Temporal Logic” (2019), MEMOCODE '19: Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design (incorporated herein by reference in its entirety) discloses a signal temporal logic (STL) encoding of the RSS safety rules. Temporal logic provides a formal framework for constructing predicates that are qualified in terms of time. This means that the result computed by an assessor at a given time instant can depend on results and/or signal values at other time instant(s).
For example, a requirement of the safety model may be that an ego agent responds to a certain event within a set time frame. Such rules can be encoded as temporal logic predicates.
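By way of a non-limiting illustration, a “respond within a set time frame” requirement may be sketched over time-discretised boolean results. The functions `eventually_within`, `always` and `implies` are hypothetical helpers with simple finite-trace semantics, assumed here for illustration; they are not the temporal operators of the test oracle 252.

```python
def eventually_within(results, horizon):
    """True at step t if `results` is true at some step in [t, t + horizon]."""
    n = len(results)
    return [any(results[t:min(n, t + horizon + 1)]) for t in range(n)]


def always(results):
    """True at step t if `results` is true at every step from t to the end."""
    return [all(results[t:]) for t in range(len(results))]


def implies(a, b):
    """Pointwise logical implication a -> b."""
    return [(not x) or y for x, y in zip(a, b)]


# Hypothetical result sequences: an event occurs at steps 1 and 5,
# and the ego agent responds at step 3 only.
event = [False, True, False, False, False, True, False]
response = [False, False, False, True, False, False, False]

# "Whenever the event occurs, the ego agent responds within 2 steps."
responds_in_time = implies(event, eventually_within(response, horizon=2))
print(responds_in_time)
# [True, True, True, True, True, False, True]

print(always(responds_in_time))
# [False, False, False, False, False, False, True]
```

The result at step 1 depends on the response at step 3, which shows how the output of a temporal assessor at one time instant can depend on values at other time instants.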
The first output graph is depicted in a collapsed form, and only the time-series of binary pass/fail results for the root node is visualized (as a simple colour-coded horizontal bar within the first visual element 502). However, the first visual element 502 is selectable to expand the visualization to lower-level node(s) and their output(s).
The second output graph is depicted in an expanded form, accessed by selecting the second visual element 504. Visual elements 506, 508 represent lower-level assessor nodes within the applicable rule graph, and their results are visualized in the same way. Visual elements 510, 512 represent extractor nodes within the graph.
The visualization of each node is also selectable to render an expanded view of that node. The expanded view provides a visualization of the time-varying numerical signal computed or extracted at that node. The second visual element 504 is shown in an expanded state, with a visualization of its derived signal displayed in place of its binary sequence of results. The derived signal is colour-coded based on the failure threshold (as noted, the signal dropping to zero or below denotes failure on the applicable rule).
The visualizations 510, 512 of the extractor nodes are expandable in the same way to render visualizations of their raw signals.
Below, a section of code is provided that defines a custom rule graph (ALKS_01) as a temporal logic predicate, using an alternative syntax.
In the above example, LongitudinalDistance( ) and VelocityAlongRoadLateralAxis( ) are predetermined extractor functions, and functions such as “and”, Eventually( ), Next( ) and Always( ) are atomic assessor functions. The function AgentIsOnSameLane( ) is an assessor function applied directly to the scenario that determines whether a given agent is in the same lane as the ego agent.
Here, NearbyAgents( ) is a time-varying iterable identifying any other agents that satisfy some distance threshold to the ego agent.
A node creation input 411, 413 may additionally set value(s) for one or more configurable parameter(s) (such as thresholds, time intervals etc.) of the associated assessor or extractor function.
In certain embodiments, increased computational efficiency may be achieved via selective evaluation of a rule graph. For example, within the graph of
An assessor or extractor function may have one or more configurable parameters. For example, the latsd and lonsd nodes may have configurable parameter(s) that specify how the threshold distances are extracted from the scenario ground truth 310, e.g. as configurable functions of ego velocity.
Further efficiency gains can be obtained by caching and reusing results to the extent possible.
For example, when a user modifies the graph or some parameter, only the outputs of affected nodes may be recomputed (and, in some cases, only to the extent necessary to compute the top-level result—see above).
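By way of a non-limiting illustration, recomputing only the affected nodes may be sketched with a node that caches its output and, when one of its parameters changes, invalidates itself and its ancestors. The class `CachedNode`, its `eval_count` counter and the `margin` parameter are hypothetical, and the sketch assumes a single scenario per graph.

```python
class CachedNode:
    """Rule-graph node that caches its output until it is invalidated."""

    def __init__(self, name, fn, children=(), params=None):
        self.name = name
        self.fn = fn                      # fn(scenario, child_outputs, params)
        self.children = list(children)
        self.params = dict(params or {})
        self.parents = []
        self._cache = None
        self.eval_count = 0               # number of actual recomputations
        for child in self.children:
            child.parents.append(self)

    def set_param(self, key, value):
        """Change a parameter and invalidate this node and its ancestors."""
        self.params[key] = value
        self.invalidate()

    def invalidate(self):
        self._cache = None
        for parent in self.parents:
            parent.invalidate()

    def evaluate(self, scenario):
        if self._cache is None:
            inputs = [c.evaluate(scenario) for c in self.children]
            self._cache = self.fn(scenario, inputs, self.params)
            self.eval_count += 1
        return self._cache


scenario = {"latd": [2.0, 1.2, 0.8]}
latd = CachedNode("latd", lambda s, i, p: s["latd"])
latsd = CachedNode("latsd",
                   lambda s, i, p: [p["margin"]] * len(s["latd"]),
                   params={"margin": 1.0})
is_latd_safe = CachedNode("is_latd_safe",
                          lambda s, i, p: [a > b for a, b in zip(i[0], i[1])],
                          children=[latd, latsd])

first = is_latd_safe.evaluate(scenario)    # [True, True, False]
latsd.set_param("margin", 0.5)             # affects latsd and is_latd_safe only
second = is_latd_safe.evaluate(scenario)   # [True, True, True]

print(latd.eval_count, latsd.eval_count, is_latd_safe.eval_count)  # 1 2 2
```

The latd signal is extracted once and reused, whereas the latsd node and its parent are recomputed after the parameter change.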
Whilst the above examples consider outputs in the form of time-varying signals and/or time-series of categorical (e.g. PASS/FAIL or TRUE/FALSE) results, other types of output can, alternatively or additionally, be passed between nodes. For example, time-varying iterables (i.e. objects that can be iterated over in a for loop) may be passed between nodes.
Variables may be assigned and/or passed through the tree and bound at runtime. The combination of runtime variables and iterables provides control of loops and runtime (scenario-relevant) parameterisation, whilst the tree itself remains ‘static’.
For loops can define scenario-specific conditions under which rules apply, for example “for agents in front” or “for each traffic light at this junction” etc. Variables are needed to implement such loops (e.g. to implement the loop ‘for each nearby agent’ based on an ‘other_agent’ variable), but variables can also be defined (stored) in a current context and then accessed (loaded) by other blocks (nodes) further below in the tree.
Time periods may only be computed as required (also in a top-down manner), and results may be cached and merged for newly required time periods.
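By way of a non-limiting illustration, a ‘for each nearby agent’ loop over a time-varying iterable, with a variable bound at runtime, may be sketched as follows. The class `ForEach`, the functions `nearby_agents` and `is_distance_safe`, the thresholds and the scenario layout are all hypothetical.

```python
def nearby_agents(scenario, t, context):
    """Time-varying iterable: agents within a distance threshold at step t."""
    return [agent for agent, dist in scenario["distance"].items()
            if dist[t] <= 10.0]


def is_distance_safe(scenario, t, context):
    """Body rule: loads the loop variable from the current context."""
    agent = context["other_agent"]
    return scenario["distance"][agent][t] > 1.0


class ForEach:
    """Static tree node: binds `var` to each item at runtime and requires
    the body to pass for every item (vacuously true for no items)."""

    def __init__(self, var, iterable_fn, body):
        self.var = var
        self.iterable_fn = iterable_fn
        self.body = body

    def evaluate(self, scenario, t, context=None):
        context = context or {}
        results = []
        for item in self.iterable_fn(scenario, t, context):
            child_context = {**context, self.var: item}  # store the variable
            results.append(self.body(scenario, t, child_context))
        return all(results)


# Hypothetical distances (metres) from the ego agent, one sample per step.
scenario = {"distance": {"a": [5.0, 3.0, 0.5], "b": [20.0, 4.0, 6.0]}}

rule = ForEach("other_agent", nearby_agents, is_distance_safe)
rule_results = [rule.evaluate(scenario, t) for t in range(3)]
print(rule_results)  # [True, True, False]
```

The tree itself is fixed in advance; only the set of agents iterated over, and hence the value bound to `other_agent`, varies with the scenario and the time step.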
For example, one rule (rule graph) might require an acceleration to be computed for a forward vehicle to check against an adaptive cruise control headway. Separately, another rule (rule tree) might require the acceleration of all vehicles around the ego agent (‘nearby’ agents).
Where the applicable time periods overlap, one tree may be able to re-use the other's acceleration data (e.g. in the case that the duration for which an ‘other vehicle’ is considered ‘forward’ is a subset of the duration for which it is considered ‘nearby’).
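By way of a non-limiting illustration, the re-use of signal values across overlapping time periods may be sketched with a cache that computes only the time steps not already held. The class `SignalCache`, the finite-difference acceleration and the velocity values are hypothetical.

```python
class SignalCache:
    """Caches per-time-step values of a signal and computes only the
    steps that have not been requested before."""

    def __init__(self, compute):
        self.compute = compute   # compute(t) -> value at time step t
        self.values = {}         # time step -> cached value
        self.computed = 0        # number of steps actually computed

    def get(self, start, stop):
        """Return the signal over steps [start, stop), merging new steps
        into the cache as they are computed."""
        for t in range(start, stop):
            if t not in self.values:
                self.values[t] = self.compute(t)
                self.computed += 1
        return [self.values[t] for t in range(start, stop)]


# Hypothetical velocity trace of another vehicle (m/s, unit time step).
velocity = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
acceleration = SignalCache(lambda t: velocity[t + 1] - velocity[t])

# One rule needs the acceleration only while the vehicle is 'forward'.
forward = acceleration.get(1, 3)   # [2.0, 3.0], computes 2 steps
# Another rule needs it for the longer period in which it is 'nearby'.
nearby = acceleration.get(0, 5)    # computes only the 3 missing steps

print(forward, nearby, acceleration.computed)
# [2.0, 3.0] [1.0, 2.0, 3.0, 4.0, 5.0] 5
```

Five steps are computed in total rather than seven, because the ‘forward’ period is a subset of the ‘nearby’ period.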
In one implementation, parameters of a rule tree may be encoded as hierarchical parameter objects, whose fields may themselves be parameter objects (nested parameter objects). Nested parameter objects can give rise to complex hierarchies of parameters. Rather than exposing the nested parameter objects directly, the hierarchy may be exposed only to the extent necessary to resolve name clashes. To this end, a mapping component maps parameters within the nested parameter object to minimal, non-conflicting qualified variable names. The minimal names are exposed via the rule editor 400.
For example, a given parameter may be referred to by its name at the deepest level only, unless this name clashes with another parameter.
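By way of a non-limiting illustration, one way of assigning minimal, non-conflicting names is sketched below. This is not the algorithm of Annex A: it assumes the nested parameter object is represented as nested dictionaries, qualifies names with a dot separator, and uses the hypothetical functions `flatten` and `minimal_names`.

```python
def flatten(params, prefix=()):
    """Yield (path, value) for every leaf of a nested parameter dictionary."""
    for key, value in params.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def minimal_names(params):
    """Map each leaf to the shortest trailing part of its path that no
    other leaf shares, falling back to the full path if none is unique."""
    paths = [path for path, _ in flatten(params)]
    names = {}
    for path in paths:
        for k in range(1, len(path) + 1):
            suffix = path[-k:]
            clash = any(q != path and q[-k:] == suffix for q in paths)
            if not clash:
                break
        names[".".join(suffix)] = path
    return names


# Hypothetical nested parameters of two extractor nodes.
params = {
    "latsd": {"threshold": 1.0, "speed_gain": 0.1},
    "lonsd": {"threshold": 5.0, "reaction_time": 0.5},
}

exposed = minimal_names(params)
print(sorted(exposed))
# ['latsd.threshold', 'lonsd.threshold', 'reaction_time', 'speed_gain']
```

Here `speed_gain` and `reaction_time` are exposed by their deepest-level names, whereas the two `threshold` parameters clash and are therefore qualified by their parent names.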
A further aspect herein provides a computer system for evaluating the performance of a trajectory planner in a real or simulated scenario, the computer system comprising: at least one input configured to receive scenario data, the scenario data generated using the trajectory planner to control an ego agent responsive to at least one other agent in the real or simulated scenario; a test oracle configured to provide extractor functions for extracting time-varying numerical signals from the scenario data and assessor functions for assessing the extracted time-varying signals; wherein the test oracle is configured to apply, to the scenario data, a rule graph comprising extractor nodes and assessor nodes; wherein each extractor node is configured to apply an extractor function to the scenario data to extract an output in the form of a time-varying numerical signal; wherein each assessor node has one or more child nodes, each child node being one of the extractor nodes or another of the assessor nodes, the assessor node configured to apply an assessor function to the output(s) of its child node(s) to compute an output therefrom; wherein the test oracle is configured to provide an output graph comprising the output of at least one of the assessor nodes and the output(s) of at least one of its child node(s).
In embodiments, a rule editor of the kind described above may be provided, to allow custom rule graphs to be created.
A further aspect herein provides a computer system for evaluating the performance of a trajectory planner in a real or simulated scenario, the computer system comprising: at least one input configured to receive scenario data, the scenario data generated using the trajectory planner to control an ego agent responsive to at least one other agent in the real or simulated scenario; a test oracle configured to apply (predetermined and/or custom) rules to the scenario data for evaluating the performance of the trajectory planner in the real or simulated scenario.
In embodiments, the test oracle may have one or more configurable parameters, the computer system configured to receive one or more parameter configuration input(s) for configuring the parameters.
For example, the parameters could be assessor and/or extractor node parameters, if some or all of the rules are implemented as rule graphs.
For example, the configurable parameters may be encoded hierarchically in a parameter object that defines parent-child relationships between the parameters, wherein each child parameter is identified by reference to a parent parameter.
A mapping component may be configured to map each child parameter to a unique, minimal (or simplified) non-conflicting parameter name, wherein the computer system is configured to expose the minimal non-conflicting parameter name (e.g. to a user or programmer) for configuring the parameter.
For each child parameter having a unique name in the parameter object, the unique, minimal/simplified non-conflicting parameter name may be based on the unique name of the child parameter only.
For two or more child parameters having the same name (name conflict), their unique minimal non-conflicting parameter names may be assigned based on the respective names of their respective parent and/or grandparent parameters.
Annex A shows code of an example algorithm that may be implemented by the mapping component in order to assign a unique, minimal, non-conflicting name to each nested parameter.
Whilst the above examples consider AV stack testing, the techniques can be applied to test components of other forms of mobile robot. Other mobile robots are being developed, for example for carrying freight supplies in internal and external industrial zones. Such mobile robots would have no people on board and belong to a class of mobile robot termed UAV (unmanned autonomous vehicle). Autonomous air mobile robots (drones) are also being developed.
A computer system comprises execution hardware which may be configured to execute the method/algorithmic steps disclosed herein and/or to implement a model trained using the present techniques. The term execution hardware encompasses any form/combination of hardware configured to execute the relevant method/algorithmic steps. The execution hardware may take the form of one or more processors, which may be programmable or non-programmable, or a combination of programmable and non-programmable hardware may be used. Examples of suitable programmable processors include general purpose processors based on an instruction set architecture, such as CPUs, GPUs/accelerator processors etc. Such general-purpose processors typically execute computer readable instructions held in memory coupled to or internal to the processor and carry out the relevant steps in accordance with those instructions. Other forms of programmable processors include field programmable gate arrays (FPGAs) having a circuit configuration programmable through circuit description code. Examples of non-programmable processors include application specific integrated circuits (ASICs). Code, instructions etc. may be stored as appropriate on transitory or non-transitory media (examples of the latter including solid state, magnetic and optical storage device(s) and the like). The subsystems 102-108 of the runtime stack
Number | Date | Country | Kind |
---|---|---|---|
2102006.0 | Feb 2021 | GB | national |
2105838.3 | Apr 2021 | GB | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2022/053406 | 2/11/2022 | WO | |