The present description relates to a system and method for indicating performance of planning stacks (or portions of planning stacks) in autonomous vehicles.
There have been major and rapid developments in the field of autonomous vehicles. An autonomous vehicle is a vehicle which is equipped with sensors and autonomous systems which enable it to operate without a human controlling its behaviour. The term autonomous herein encompasses semi-autonomous and fully autonomous behaviour. The sensors enable the vehicle to perceive its physical environment, and may include for example cameras, radar and lidar. Autonomous vehicles are equipped with suitably programmed computers which are capable of processing data received from the sensors and making safe and predictable decisions based on the context which has been perceived by the sensors. There are different facets to testing the behaviour of the sensors and autonomous systems aboard a particular autonomous vehicle, or a type of autonomous vehicle. AV testing can be carried out in the real world or based on simulated driving scenarios. A vehicle under testing (real or simulated) may be referred to as an ego vehicle.
One approach to testing in the industry relies on “shadow mode” operation. Such testing seeks to use human driving as a benchmark for assessing autonomous decisions. An autonomous driving system (ADS) runs in shadow mode on inputs captured from a sensor-equipped but human-driven vehicle. The ADS processes the sensor inputs of the human-driven vehicle, and makes driving decisions as if it were notionally in control of the vehicle. However, those autonomous decisions are not actually implemented, but are simply recorded with the aim of comparing them to the actual driving behaviour of the human. “Shadow miles” are accumulated in this manner typically with the aim of demonstrating that the ADS could have performed more safely or effectively than the human.
Existing shadow mode testing has a number of drawbacks. Shadow mode testing may flag a scenario where the available test data indicates that an ADS would have performed differently from the human driver. This currently requires manual analysis of the test data. The “shadow miles” for each scenario need to be evaluated in comparison with the human driver miles for the same scenario.
One aspect of the present disclosure addresses such challenges. According to one aspect of the invention, there is provided a computer implemented method of evaluating planner performance for an ego robot, the method comprising: extracting scenario data from first run data of a first run to generate scenario data defining a scenario, the run data generated by applying a planner in the scenario of that run to generate an ego trajectory taken by an ego robot in the scenario; providing the scenario data to a simulator configured to execute a simulation using the scenario data and implement a second planner to generate second run data; comparing the first run data and the second run data to determine a difference in at least one performance parameter; and generating a performance indicator associated with the run, the performance indicator indicating a level of the determined difference between the first run data and the second run data.
The method, when carried out for a plurality of runs, may comprise generating a respective performance indicator for each run of the plurality of runs.
The performance indicator of each level may be associated with a visual indication which is visually distinct from performance indicators of other levels.
The method may further comprise supplying the scenario data to the simulator configured to execute a third planner and to generate third run data, wherein the performance indicator is generated based on a comparison between the first run data and the third run data.
The method may also comprise rendering on a graphical user interface a visual representation of the performance indicators. In such an embodiment, the method may further comprise assigning a unique run identifier to each run of the plurality of runs, the unique run identifier associated with a position in the visual representation of the performance indicators when rendered on a graphical user interface.
The second planner may comprise a modified version of the first planner, wherein the modified version of the first planner comprises a modification affecting one or more of its perception ability, prediction ability and computer execution resource.
The visually distinct visual indications may comprise different colours.
One way of carrying out a performance comparison is to use juncture point recognition as described in our UK Application no GB2107645.0, the contents of which are incorporated by reference. The performance card can be used in a cluster of examination cards, as further described herein.
In some embodiments, the method comprises rendering on the graphical user interface, a plurality of examination cards, each of which comprises a plurality of tiles, where each tile provides a visual indication of a metric indicator for a respective different run, wherein for one of the examination cards, the tiles of that examination card provide the visual representation of the performance indicators.
In some embodiments, the method comprises rendering on a graphical user interface a key, which identifies the levels and their corresponding visual indications.
In some embodiments, the method comprises supplying the scenario data to the simulator configured to execute a third planner to generate third run data, wherein the performance indicator is generated based on a comparison between the first run data and the third run data.
In some embodiments, the second planner comprises a modified version of the first planner, wherein the modified version of the first planner comprises a modification affecting one or more of its perception ability, prediction ability and computer execution resource.
In some embodiments, the comparing the first run data and the second run data to determine a difference in at least one performance parameter comprises using juncture point recognition to determine if there is a juncture in performance.
In some embodiments, the run data comprises one or more of: sensor data; perception outputs captured/generated onboard one or more vehicles; and data captured from external sensors.
According to a second aspect, there is provided a computer program comprising a set of computer readable instructions, which when executed by a processor cause the processor to perform a method according to the first aspect or any embodiment thereof.
According to a third aspect, there is provided a non-transitory computer readable medium storing the computer program according to the second aspect.
According to a fourth aspect, there is provided an apparatus comprising a processor; and a code memory configured to store computer readable instructions for execution by the processor to: extract scenario data from first run data of a first run to generate scenario data defining a scenario, the run data generated by applying a planner in the scenario of that run to generate an ego trajectory taken by an ego robot in the scenario; provide the scenario data to a simulator configured to execute a simulation using the scenario data and implement a second planner to generate second run data; compare the first run data and the second run data to determine a difference in at least one performance parameter; and generate a performance indicator associated with the run, the performance indicator indicating a level of the determined difference between the first run data and the second run data.
In some embodiments, the apparatus comprises a graphical user interface.
In some embodiments, the processor is configured to execute the computer readable instructions to: perform the extracting scenario data and the providing scenario data for each of a plurality of runs; and generate a respective performance indicator for each run of the plurality of runs.
In some embodiments, the processor is configured to execute the computer readable instructions to render on a graphical user interface, a visual representation of the performance indicators.
In some embodiments, the processor is configured to execute the computer readable instructions to assign a unique run identifier to each run of the plurality of runs, the unique run identifier associated with a position in the visual representation of the performance indicators when rendered on a graphical user interface.
In some embodiments, the processor is configured to execute the computer readable instructions to render on the graphical user interface, a plurality of examination cards, each of which comprises a plurality of tiles, where each tile provides a visual indication of a metric indicator for a respective different run, wherein for one of the examination cards, the tiles of that examination card provide the visual representation of the performance indicators.
In some embodiments, the performance indicator of each level is associated with a visual indication, which is visually distinct from performance indicators of other levels.
In some embodiments, the visually distinct visual indications comprise different colours.
In some embodiments, the processor is configured to execute the computer readable instructions to render on a graphical user interface a key, which identifies the levels and their corresponding visual indications.
In some embodiments, the processor is configured to execute the computer readable instructions to supply the scenario data to the simulator configured to execute a third planner to generate third run data, wherein the performance indicator is generated based on a comparison between the first run data and the third run data.
In some embodiments, the second planner comprises a modified version of the first planner, wherein the modified version of the first planner comprises a modification affecting one or more of its perception ability, prediction ability and computer execution resource.
In some embodiments, the comparing the first run data and the second run data to determine a difference in at least one performance parameter comprises using juncture point recognition to determine if there is a juncture in performance.
In some embodiments, the run data comprises one or more of: sensor data; perception outputs captured/generated onboard one or more vehicles; and data captured from external sensors.
For a better understanding of the present invention and to show how the same may be carried into effect, reference will now be made by way of example to the accompanying drawings in which:
A performance evaluation tool is described herein that enables different ‘runs’ to be compared. A so-called “performance card” is generated to provide an accessible indication of the performance of a particular planning stack (or particular portions of a planning stack). A performance card is a data structure comprising a plurality of performance indicator regions, each performance indicator region indicating a performance parameter associated with a particular run. The performance indicator regions are also referred to herein as tiles. A performance card is capable of being visually rendered on a display of a graphical user interface to allow a viewer to quickly discern the performance parameter for each tile. Before describing performance cards in detail, a system with which they may be utilized is first described.
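By way of illustration only, a performance card of this kind might be represented as a simple data structure mapping tiles to performance indicators. The following Python sketch is hypothetical; the class and field names are assumptions introduced here and do not correspond to any particular implementation described in this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class Tile:
    run_id: str     # unique identifier of the run this tile represents
    indicator: str  # performance indicator level assigned to the run
    colour: str     # visual indication used when the card is rendered

@dataclass
class PerformanceCard:
    metric: str                       # the metric this card reports on
    tiles: list[Tile] = field(default_factory=list)

    def add_run(self, run_id: str, indicator: str, colour: str) -> None:
        """Append a performance indicator region (tile) for a run."""
        self.tiles.append(Tile(run_id, indicator, colour))

# Example: a card holding two runs with visually distinct indicators.
card = PerformanceCard(metric="improvement potential")
card.add_run("run-001", indicator="no improvement", colour="dark green")
card.add_run("run-002", indicator="major improvement", colour="orange")
```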
In a real-world context, the perception system 102 would receive sensor inputs from an on-board sensor system 110 of the AV and uses those sensor inputs to detect external agents and measure their physical state, such as their position, velocity, acceleration etc. The on-board sensor system 110 can take different forms but generally comprises a variety of sensors such as image capture devices (cameras/optical sensors), lidar and/or radar unit(s), satellite-positioning sensor(s) (GPS etc.), motion sensor(s) (accelerometers, gyroscopes etc.) etc., which collectively provide rich sensor data from which it is possible to extract detailed information about the surrounding environment and the state of the AV and any external actors (vehicles, pedestrians, cyclists etc.) within that environment. The sensor inputs typically comprise sensor data of multiple sensor modalities such as stereo images from one or more stereo optical sensors, lidar, radar etc.
The perception system 102 comprises multiple perception components which co-operate to interpret the sensor inputs and thereby provide perception outputs to the prediction system 104. External agents may be detected and represented probabilistically in a way that reflects the level of uncertainty in their perception within the perception system 102.
The perception outputs from the perception system 102 are used by the prediction system 104 to predict future behaviour of external actors (agents), such as other vehicles in the vicinity of the AV. Other agents are dynamic obstacles from the perspective of the EV. The outputs of the prediction system 104 may, for example, take the form of a set of predicted obstacle trajectories.
Predictions computed by the prediction system 104 are provided to the planner 106, which uses the predictions to make autonomous driving decisions to be executed by the AV in a given driving scenario. A scenario is represented as a set of scenario description parameters used by the planner 106. A typical scenario would define a drivable area and would also capture any static obstacles as well as predicted movements of any external agents within the drivable area.
A core function of the planner 106 is the planning of trajectories for the AV (ego trajectories) taking into account any static and/or dynamic obstacles, including any predicted motion of the latter. This may be referred to as trajectory planning. A trajectory is planned in order to carry out a desired goal within a scenario. The goal could for example be to enter a roundabout and leave it at a desired exit; to overtake a vehicle in front; or to stay in a current lane at a target speed (lane following). The goal may, for example, be determined by an autonomous route planner (not shown). In the following examples, a goal is defined by a fixed or moving goal location and the planner 106 plans a trajectory from a current state of the EV (ego state) to the goal location. For example, this could be a fixed goal location associated with a particular junction or roundabout exit, or a moving goal location that remains ahead of a forward vehicle in an overtaking context. A trajectory herein has both spatial and motion components, defining not only a spatial path planned for the ego vehicle, but a planned motion profile along that path.
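Purely as an illustrative sketch (the field names below are assumptions and not the planner's actual interface), a planned trajectory combining a spatial path and a motion profile towards a goal location might be represented as follows:

```python
from dataclasses import dataclass

@dataclass
class TrajectoryPoint:
    x: float      # spatial position (m)
    y: float
    speed: float  # planned speed at this point (m/s)
    accel: float  # planned acceleration at this point (m/s^2)

@dataclass
class Trajectory:
    points: list[TrajectoryPoint]  # spatial path plus planned motion profile along it
    goal_x: float                  # fixed or moving goal location the plan aims for
    goal_y: float

# A minimal lane-following plan towards a goal 50 m ahead at a 10 m/s target speed.
plan = Trajectory(
    points=[TrajectoryPoint(x=float(i), y=0.0, speed=10.0, accel=0.0) for i in range(51)],
    goal_x=50.0,
    goal_y=0.0,
)
```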
The planner 106 is required to navigate safely in the presence of any static or dynamic obstacles, such as other vehicles, bicycles, pedestrians, animals etc.
Returning to
In a physical AV, the actor system 112 comprises motors, actuators or the like that can be controlled to effect movement of the vehicle and other physical changes in the real-world ego state.
Control signals from the controller 108 are typically low-level instructions to the actor system 112 that may be updated frequently. For example, the controller 108 may use inputs such as velocity, acceleration, and jerk to produce control signals that control components of the actor system 112. The control signals could specify, for example, a particular steering wheel angle or a particular change in force to a pedal, thereby causing changes in velocity, acceleration, jerk etc., and/or changes in direction.
Embodiments herein have useful applications in simulation-based testing. Referring to the stack 100 by way of example, in order to test the performance of all or part of the stack 100 through simulation, the stack is exposed to simulated driving scenarios. The examples below consider testing of the planner 106, both in isolation and in combination with one or more other sub-systems or components of the stack 100.
In a simulated driving scenario, an ego agent implements decisions taken by the planner 106, based on simulated inputs that are derived from the simulated scenario as it progresses. Typically, the ego agent is required to navigate within a static drivable area (e.g. a particular static road layout) in the presence of one or more simulated obstacles of the kind a real vehicle needs to interact with safely. Dynamic obstacles, such as other vehicles, pedestrians, cyclists, animals etc. may be represented in the simulation as dynamic agents.
The simulated inputs are processed in exactly the same way as corresponding physical inputs would be, ultimately forming the basis of the planner's autonomous decision making over the course of the simulated scenario. The ego agent is, in turn, caused to carry out those decisions, thereby simulating the behaviours of a physical autonomous vehicle in those circumstances. In simulation, those decisions are ultimately realized as changes in a simulated ego state. There is thus a two-way interaction between the planner 106 and the simulator, where decisions taken by the planner 106 influence the simulation, and changes in the simulation affect subsequent planning decisions. The results can be logged and analysed in relation to safety and/or other performance criteria.
Turning to the outputs of the stack 100, there are various ways in which decisions of the planner 106 can be implemented in testing. In “planning-level” simulation, the ego agent may be assumed to exactly follow the portion of the most recent planned trajectory from the current planning step to the next planning step. This is a simpler form of simulation that does not require any implementation of the controller 108 during the simulation. More sophisticated simulation recognizes that, in reality, any number of physical conditions might cause a real ego vehicle to deviate somewhat from planned trajectories (e.g. because of wheel slippage, delayed or imperfect response by the actor system 112, or inaccuracies in the measurement of the vehicle's own state, etc.). Such factors can be accommodated through suitable modelling of the ego vehicle dynamics. In that case, the controller 108 is applied in simulation, just as it would be in real life, and the control signals are translated to changes in the ego state using a suitable ego dynamics model (in place of the actor system 112) in order to more realistically simulate the response of an ego vehicle to the control signals.
In that case, as in real life, the portion of a planned trajectory from the current planning step to the next planning step may be only approximately realized as a change in ego state.
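The difference between planning-level simulation and simulation with an ego dynamics model can be sketched as follows; the simple kinematic model and the actuation-lag parameter are illustrative assumptions only, not part of the described system:

```python
from dataclasses import dataclass

@dataclass
class EgoState:
    x: float  # position along the path (m)
    v: float  # speed (m/s)

def planning_level_step(planned_next: EgoState) -> EgoState:
    """'Planning-level' simulation: the ego agent exactly adopts the next
    planned state; no controller or dynamics model is involved."""
    return planned_next

def dynamics_model_step(state: EgoState, commanded_accel: float, dt: float,
                        actuation_lag: float = 0.8) -> EgoState:
    """Dynamics-model simulation: a control signal (here a commanded
    acceleration) is passed through a crude ego dynamics model, so the
    realised state only approximates the planned one."""
    realised_accel = actuation_lag * commanded_accel  # imperfect actuator response
    v = state.v + realised_accel * dt
    x = state.x + state.v * dt + 0.5 * realised_accel * dt ** 2
    return EgoState(x=x, v=v)

# The plan assumes reaching 10.5 m/s; the dynamics model falls slightly short.
planned = EgoState(x=1.05, v=10.5)
print(planning_level_step(planned))
print(dynamics_model_step(EgoState(x=0.0, v=10.0), commanded_accel=0.5, dt=0.1))
```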
The testing pipeline 200 is shown to comprise a simulator 202, a test oracle 252 and an ‘introspective’ oracle 253. The simulator 202 runs simulations for the purpose of testing all or part of an AV run time stack.
By way of example only, the description of the testing pipeline 200 makes reference to the runtime stack 100 of
The simulated perception inputs 203 are used as a basis for prediction and, ultimately, decision-making by the planner 106. However, it should be noted that the simulated perception inputs 203 are equivalent to data that would be output by a perception system 102. For this reason, the simulated perception inputs 203 may also be considered as output data. The controller 108, in turn, implements the planner's decisions by outputting control signals 109. In a real-world context, these control signals would drive the physical actor system 112 of the AV. The format and content of the control signals generated in testing are the same as they would be in a real-world context. However, within the testing pipeline 200, these control signals 109 instead drive the ego dynamics model 204 to simulate motion of the ego agent within the simulator 202.
A simulation of a driving scenario is run in accordance with a scenario description 201, having both static and dynamic layers 201a, 201b.
The static layer 201a defines static elements of a scenario, which would typically include a static road layout.
The dynamic layer 201b defines dynamic information about external agents within the scenario, such as other vehicles, pedestrians, bicycles etc. The extent of the dynamic information provided can vary. For example, the dynamic layer 201b may comprise, for each external agent, a spatial path to be followed by the agent together with one or both motion data and behaviour data associated with the path.
In simple open-loop simulation, an external actor simply follows the spatial path and motion data defined in the dynamic layer. This is non-reactive, i.e. the actor does not react to the ego agent within the simulation. Such open-loop simulation can be implemented without any agent decision logic 210.
However, in “closed-loop” simulation, the dynamic layer 201b instead defines at least one behaviour to be followed along a static path (such as an ACC behaviour). In this case, the agent decision logic 210 implements that behaviour within the simulation in a reactive manner, i.e. reactive to the ego agent and/or other external agent(s). Motion data may still be associated with the static path but in this case is less prescriptive and may for example serve as a target along the path. For example, with an ACC behaviour, target speeds may be set along the path which the agent will seek to match, but the agent decision logic 210 might be permitted to reduce the speed of the external agent below the target at any point along the path in order to maintain a target headway from a forward vehicle.
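A minimal sketch of such closed-loop, ACC-style agent decision logic is given below; the headway rule and parameter values are assumptions chosen purely for illustration:

```python
def acc_target_speed(path_target_speed: float,
                     gap_to_forward_vehicle: float,
                     target_headway_s: float = 2.0) -> float:
    """Closed-loop agent decision logic: track the target speed set along the
    path, but reduce speed whenever needed to maintain a target time headway
    behind a forward vehicle."""
    # Largest speed at which the current gap still gives the desired headway.
    headway_limited_speed = gap_to_forward_vehicle / target_headway_s
    return min(path_target_speed, headway_limited_speed)

# 40 m behind a forward vehicle with a 2 s headway target caps speed at 20 m/s.
print(acc_target_speed(path_target_speed=25.0, gap_to_forward_vehicle=40.0))
```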
The output of the simulator 202 for a given simulation includes an ego trace 212a of the ego agent and one or more agent traces 212b of the one or more external agents (traces 212).
A trace is a complete history of an agent's behaviour within a simulation having both spatial and motion components. For example, a trace may take the form of a spatial path having motion data associated with points along the path such as speed, acceleration, jerk (rate of change of acceleration), snap (rate of change of jerk) etc.
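As an illustrative sketch only, the motion components of a trace could be derived from sampled speeds by finite differences; the function below is a hypothetical helper rather than part of the described system:

```python
def motion_derivatives(speeds: list[float], dt: float) -> dict[str, list[float]]:
    """Derive acceleration, jerk (rate of change of acceleration) and snap
    (rate of change of jerk) from a speed history sampled at interval dt.
    A simple finite-difference sketch, not a production implementation."""
    def diff(series: list[float]) -> list[float]:
        return [(b - a) / dt for a, b in zip(series, series[1:])]

    accel = diff(speeds)
    jerk = diff(accel)
    snap = diff(jerk)
    return {"acceleration": accel, "jerk": jerk, "snap": snap}

# Example trace motion data: speed samples taken every 0.1 s.
print(motion_derivatives([10.0, 10.2, 10.5, 10.9, 11.0], dt=0.1))
```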
Additional information is also provided to supplement and provide context to the traces 212. Such additional information is referred to as “environmental” data 214 which can have both static components (such as road layout) and dynamic components (such as weather conditions to the extent they vary over the course of the simulation).
To an extent, the environmental data 214 may be “passthrough” in that it is directly defined by the scenario description 201 and is unaffected by the outcome of the simulation. For example, the environmental data 214 may include a static road layout that comes from the scenario description 201 directly. However, typically the environmental data 214 would include at least some elements derived within the simulator 202. This could, for example, include simulated weather data, where the simulator 202 is free to change weather conditions as the simulation progresses. In that case, the weather data may be time-dependent, and that time dependency will be reflected in the environmental data 214.
The test oracle 252 receives the traces 212 and the environmental data 214, and scores those outputs against a set of predefined numerical metrics 254. The metrics 254 may encode what may be referred to herein as a “Digital Highway Code” (DHC) or digital driving rules.
The scoring is time-based: for each performance metric, the test oracle 252 tracks how the value of that metric (the score) changes over time as the simulation progresses. The test oracle 252 provides an output 256 comprising a score-time plot for each performance metric.
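For illustration, a time-based metric score of this kind might be computed as sketched below; the “keep a safe distance” metric and its threshold are assumptions introduced here and not one of the metrics 254 defined by this disclosure:

```python
def score_over_time(ego_trace: list[tuple[float, float]],
                    agent_trace: list[tuple[float, float]],
                    safe_distance: float = 10.0) -> list[float]:
    """Time-based scoring sketch: at each timestep, score a simple
    'keep a safe distance' metric between the ego trace and an agent trace.
    A score of 1.0 means the safe distance is fully respected; lower scores
    indicate the agents are closer than the threshold."""
    scores = []
    for (ex, ey), (ax, ay) in zip(ego_trace, agent_trace):
        distance = ((ex - ax) ** 2 + (ey - ay) ** 2) ** 0.5
        scores.append(min(distance / safe_distance, 1.0))
    return scores  # one score per timestep, i.e. a score-time series

ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(12.0, 0.0), (11.0, 0.0), (8.0, 0.0)]
print(score_over_time(ego, other))
```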
The metrics 254 are informative to an expert and the scores can be used to identify and mitigate performance issues within the tested stack 100.
The processor 50 executes the computer readable instructions from the code memory 60 to execute a triaging function which may be referred to herein as an examination card function 63. The examination card function accesses the memory 54 to receive the run data as described further herein. Examination cards which are generated by the examination card function 63 are supplied to a memory 56. It will be appreciated that the memory 56 and the memory 54 could be provided by common computer memory or by different computer memories. The introspective oracle 253 further comprises a graphical user interface (GUI) 300 which is connected to the processor 50. The processor 50 may access examination cards which are stored in the memory 56 to render them on the graphical user interface 300 for the purpose further described herein. A visual rendering function 66 may be used to control the graphical user interface 300 to present the examination cards and associated information to a user.
Each run is associated with the same tile position in every examination card 401. In the example of
Each run may be subject to analysis under multiple metrics. For each metric analysed, a corresponding indicator may be generated, and the indicator represented in a tile of an examination card 401. The quantity of cards 401 therefore corresponds to the number of metrics under which each run is analysed. Tile positions within an examination card 401 will be referred to herein using coordinates such as T(a,b), where “a” is the tile row starting from the top of the card, and “b” is the tile column starting from the left. For example, coordinate T(1,20) would refer to the tile in the top right position in a card.
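A simple sketch of this coordinate convention, assuming an 80-tile card laid out as 4 rows of 20 columns, is as follows (the function name is hypothetical):

```python
def tile_coordinate(run_index: int, columns: int = 20) -> tuple[int, int]:
    """Map a zero-based run index to a tile coordinate T(a, b), where 'a' is
    the row counted from the top of the card and 'b' is the column counted
    from the left. A 20-column layout is assumed for an 80-tile card."""
    row = run_index // columns + 1
    col = run_index % columns + 1
    return row, col

assert tile_coordinate(0) == (1, 1)    # first run: top-left tile
assert tile_coordinate(19) == (1, 20)  # twentieth run: top-right tile
assert tile_coordinate(79) == (4, 20)  # eightieth run: bottom-right tile
```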
In an examination card 401, a tile may include a representation of the indicator assigned to that tile for the metric of that card 401; the representation may be a visual representation such as a colour. In a particular card 401, tiles which have been assigned the same indicator will therefore include the same representation of that indicator. If each indicator in the group of indicators for the metric of the card 401 is associated with a different visual representation, such as a colour, then tiles associated with a particular indicator will be visually distinguishable from the tiles which are not associated with that particular indicator.
Tiles with the same indicator in the same card represent a cluster 405. A cluster 405 is therefore a set of runs sharing a common indicator for the metric associated with the card. Each examination card 401 may identify one or more clusters 405. Runs in the same cluster may be identified by a user through visual inspection because they share a visual representation when displayed on the GUI 300. Alternatively, clusters may be automatically identified by matching the indicators to group tiles with a common indicator.
For each examination card, there may be an associated cluster key 409 generated by the processor and rendered on the GUI 300, which identifies the clusters 405 and their corresponding visual representations. A user may therefore quickly identify, by visual inspection, runs which have similar characteristics with respect to the metric of each examination card 401. As mentioned, an automated tool may be programmed to recognise where tiles share a common value and are therefore in a cluster 405. Tiles in a cluster 405 can indicate where a focus may be needed.
The system may be capable of multi-axis cluster recognition. A multi-axis cluster may comprise a quantity of runs which are in the same cluster in multiple examination cards 401. That is, a multi-axis cluster comprises runs which are similar with respect to multiple metrics.
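As an illustration of multi-axis cluster recognition, the sketch below groups runs that share the same indicator on every supplied examination card; the data layout and names are assumptions for illustration only:

```python
from collections import defaultdict

def multi_axis_clusters(cards: dict[str, dict[str, str]]) -> dict[tuple, set[str]]:
    """Group runs that share the same indicator on every examination card
    supplied. 'cards' maps a metric name to a {run_id: indicator} mapping.
    Runs with an identical tuple of indicators form one multi-axis cluster."""
    clusters: dict[tuple, set[str]] = defaultdict(set)
    metrics = sorted(cards)
    run_ids = set.intersection(*(set(cards[m]) for m in metrics))
    for run_id in run_ids:
        key = tuple(cards[m][run_id] for m in metrics)
        clusters[key].add(run_id)
    return dict(clusters)

cards = {
    "location": {"run-1": "Location A", "run-2": "Location A", "run-3": "Location B"},
    "road rules": {"run-1": "violation", "run-2": "violation", "run-3": "no violation"},
}
print(multi_axis_clusters(cards))  # run-1 and run-2 form a multi-axis cluster
```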
A first examination card 401a is a location card. The location card 401a comprises 80 tiles, each tile representing a different run. For each run, a location indicator has been generated and assigned to the corresponding tiles. The location indicator can be identified in the run data when it is uploaded to the introspective oracle 253, or can be inferred from data gathered during the run. In the example of
Runs which share the same location value may then be identified as being in a cluster 405. For example, the runs in positions T(1,1), T(1,2), T(3,2) and T(3,5) of location card 401a are represented by brown tiles which, as seen in the cluster key 409 associated with the location card 401a, identifies those runs as taking place in “Location A.”
A second examination card 401b is a driving situation card. The driving situation card 401b comprises 80 tiles, each tile position representing the same run as in the corresponding tile position on the location card 401a. For each run, a situation indicator has been generated and assigned to the corresponding tiles. The situation indicator can be identified in the run data when it is uploaded to the introspective oracle 253, or can be inferred from data gathered during the run. In the example of
For example, the runs in positions T(1,1), T(1,2), T(3,2) and T(3,5) of the driving situation card 401b are represented by grey tiles which, as seen in the cluster key 409 associated with the situation card 401b, identifies those runs as taking place in an “unknown” driving situation.
The cards 401a and 401b are associated with qualitative situation or context metrics of the run scenarios. The examination cards described next, 401c and 401d, are associated with outcome metrics, which assess outcomes evaluated during a run, such as a road rule failure identified by the test oracle as described earlier.
A third examination card 401c is a road rules card. Road rules card 401c comprises 80 tiles, each tile position representing the same run as in the corresponding tile position on the location card 401a and the driving situation card 401b. Each run is assigned a road rules indicator from a predetermined group of road rules indicators. Each indicator in the group thereof may represent a quantisation level with respect to the road rules metric. In the example of
For example, the runs in positions T(3,19), T(4,19) and T(4,20) of road rules card 401c are represented by red tiles which, as seen in the cluster key 409 associated with the road rules card 401c, indicates that those runs included at least one road rule violation.
A fourth examination card 401d is a performance card. The performance card 401d comprises 80 tiles, each tile position representing the same run as in the corresponding tile position on the location card 401a, the driving situation card 401b and the road rules card 401c. The clusters associated with the performance card differentiate each run based on the extent to which each run is susceptible of improvement. The visual indicators 407 of the performance card 401d define a scale with which a user can visually identify the improvement potential of each run. For example, a dark green, light green, blue, orange or red tile would respectively indicate that the associated run is susceptible of no, minimal, minor, major or extreme improvement. The performance card is described in more detail later.
Further, the described report may be presented to a user by displaying it on the graphical user interface (GUI) 300. Each tile 403 in an examination card 401 may be a selectable feature which, when selected on the GUI 300, opens a relevant page for the associated run. For example, selecting a tile 403 in the road rules card 401c may open a corresponding test oracle evaluation page. The above described report may be received, for example, by email. Users may receive a report as an interactive file through which they can access test oracle pages or other relevant data.
For example, with reference to
The unique identifiers 501 in the consistent outliers category 505 of
The summary section of
The summary section of
The summary section of
The manner in which these visual indications are determined is described in more detail in the following. When the performance card is rendered to a user on a graphical user interface, a user may select a particular tile, for example based on the visual indication (colour) of that tile. When a tile is selected, details of the potential performance improvement may be displayed to the viewer on the graphical user interface. In certain embodiments, an interactive visualization with metrics and automated analysis of the results may be presented to aid a user in understanding the reasons for indicating a certain level of potential performance improvement.
The performance card is particularly useful in enabling a user to understand the performance of his planning stack (or certain portions of his planning stack). For this application, details of a user run are required.
The run data from the simulation is supplied to a performance comparison function 156. The ground truth actual run data is also supplied to the performance comparison function 156. The performance comparison function 156 determines whether there is a difference in performance between the real run and the simulated run. This may be done in a number of different ways, as further described herein. One novel technique discussed herein and discussed in UK patent application no: GB2107645.0 is juncture point recognition. The performance difference of the runs is used to generate a visual indication for the tile associated with this run in the performance card. If there is no difference, a visual indication indicating that no improvement has been found is provided (for example, dark green). This means that the comparison system has failed to find any possible improvement for this scenario, even when run against a “reference planner stack”; that is, the original planner stack A performed as well as it could be expected to, or no significant way could be found to improve its performance. This information in itself is useful to a user of stack A.
If a significant difference is found in the performance between the real run and the simulated run, an estimate may be made of how much the performance could be improved. A visual indication is provided for each level of estimate in a quantized estimation scale.
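A sketch of such a quantized estimation scale is given below, reusing the example colours mentioned above for the performance card; the numeric thresholds are assumptions introduced purely for illustration:

```python
def improvement_level(estimated_improvement: float) -> tuple[str, str]:
    """Quantise an estimated performance improvement (a fraction, where 0.0
    means no improvement was found) into a level and the colour used as its
    visual indication on the performance card. Thresholds are assumptions."""
    if estimated_improvement <= 0.0:
        return "no improvement", "dark green"
    if estimated_improvement <= 0.05:
        return "minimal improvement", "light green"
    if estimated_improvement <= 0.15:
        return "minor improvement", "blue"
    if estimated_improvement <= 0.35:
        return "major improvement", "orange"
    return "extreme improvement", "red"

print(improvement_level(0.0))  # ('no improvement', 'dark green')
print(improvement_level(0.4))  # ('extreme improvement', 'red')
```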
As illustrated in
An exemplary workflow is shown in
As mentioned, one possible technique for comparing performance is to use juncture point recognition. When a juncture is identified, it is possible to identify either semi-automatically or fully automatically how the performance may be improved. In certain embodiments, this may be performed by “input ablation”. The term “input ablation” is used herein to denote analysis of a system by comparing it with the same system but with a modification to reconfigure it. Specifically, the reconfiguring can involve removing some aspect of the system or some performance element of the system. For example, it is possible to use perception input ablation, in which case the performance of a stack is analysed without relying on ground truth perception. Instead, realistic perception data is utilized, with the expectation that this will show a lower performance.
A specific example of perception input ablation is discussed further herein. As explained above with reference to
Other forms of “ablation” may be utilized to allow a user to be assisted in determining when a line of investigation may be helpful or not. For example, certain prediction parameters may be ablated. In another example, resource constraints may be modified, for example, limits may be imposed on the processing resource, memory resource or operating frequency of the planning stack.
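By way of illustration, an ablation configuration covering these axes might look like the following sketch; the field names and defaults are assumptions introduced here rather than an actual configuration schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AblationConfig:
    """Illustrative configuration for 'input ablation' runs. Field names are
    assumptions: each field reconfigures one aspect of the stack under test."""
    use_ground_truth_perception: bool = True    # False => realistic (ablated) perception
    ablate_prediction: bool = False             # e.g. remove selected prediction parameters
    cpu_limit_fraction: Optional[float] = None  # cap on processing resource, if any
    memory_limit_mb: Optional[int] = None       # cap on memory resource, if any

# Perception input ablation: run the same stack without ground-truth perception.
baseline = AblationConfig()
ablated = AblationConfig(use_ground_truth_perception=False)
```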
Ablation may be performed in a binary (on/off) manner as described in our PCT patent application No: PCT/EP2020/073563, the contents of which are incorporated by reference. This PCT application provides an approach to simulation-based safety testing using what are referred to herein as “Perception Statistical Performance Models” (PSPMs). PSPMs model perception errors in terms of probabilistic uncertainty distributions, based on a robust statistical analysis of actual perception outputs computed by a perception component or components being modelled. A unique aspect of PSPMs is that, given a perception ground truth (i.e. a “perfect” perception output that would be computed by a perfect but unrealistic perception component), a PSPM provides a probabilistic uncertainty distribution that is representative of realistic perception outputs that might be provided by the perception component(s) it is modelling. For example, given a ground truth 3D bounding box, a PSPM modelling a 3D bounding box detector will provide an uncertainty distribution representative of realistic 3D object detection outputs. Even when a perception system is deterministic, it can be usefully modelled as stochastic to account for epistemic uncertainty of the many hidden variables on which it depends in practice.
Perception ground truths will not, of course, be available at runtime in a real-world AV (this is the reason complex perception components are needed that can interpret imperfect sensor outputs robustly). However, perception ground truths can be derived directly from a simulated scenario run in a simulator. For example, given a 3D simulation of a driving scenario with an ego vehicle (the simulated AV being tested) in the presence of external actors, ground truth 3D bounding boxes can be directly computed from the simulated scenario for the external actors based on their size and pose (location and orientation) relative to the ego vehicle. A PSPM can then be used to derive realistic 3D bounding box detection outputs from those ground truths, which in turn can be processed by the remaining AV stack just as they would be at runtime.
A PSPM for modelling a perception slice of a runtime stack for an autonomous vehicle or other robotic system may be used e.g. for safety/performance testing. A PSPM is configured to receive a computed perception ground truth, and determine from the perception ground truth, based on a set of learned parameters, a probabilistic perception uncertainty distribution, the parameters learned from a set of actual perception outputs generated using the perception slice to be modelled. A simulated scenario is run based on a time series of such perception outputs (with modelled perception errors), but can also be re-run based on perception ground truths directly (without perception errors). This can, for example, be a way to ascertain whether perception error was the cause of some unexpected decision within the planner, by determining whether such a decision is also taken in the simulated scenario when perception error is “switched off”.
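As a greatly simplified, hypothetical stand-in for a PSPM (a real PSPM uses uncertainty distributions learned from actual perception outputs, not the fixed Gaussian error model assumed here), sampling a realistic detection from a perception ground truth might be sketched as:

```python
import random

def sample_realistic_detection(ground_truth_box: tuple[float, float, float],
                               position_sigma: float = 0.3,
                               size_sigma: float = 0.1) -> tuple[float, float, float]:
    """Given a ground-truth box (centre_x, centre_y, size) taken directly from
    the simulator, sample a 'realistic' detection by drawing from a Gaussian
    error distribution. The Gaussian form and its parameters are assumptions
    standing in for a learned PSPM distribution."""
    cx, cy, size = ground_truth_box
    return (cx + random.gauss(0.0, position_sigma),
            cy + random.gauss(0.0, position_sigma),
            size + random.gauss(0.0, size_sigma))

# Ground truth derived from the simulated scenario; sampled detection fed to the stack.
print(sample_realistic_detection((12.0, 3.5, 4.2)))
```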
The examples discussed in the present disclosure enable, at each juncture, an interactive visualization to be presented to a user with metrics and automated analysis of the results, to aid the user in understanding the differences between the two runs which are being compared, for example:
As already mentioned, while two runs have been compared here for ease of explanation, a user may be comparing multiple scenarios in a multidimensional performance comparison against multiple planner stacks/input ablations/original scenarios.
The examples described herein are to be understood as illustrative examples of embodiments of the invention. Further embodiments and examples are envisaged. Any feature described in relation to any one example or embodiment may be used alone or in combination with other features. In addition, any feature described in relation to any one example or embodiment may also be used in combination with one or more features of any other of the examples or embodiments, or any combination of any other of the examples or embodiments. Furthermore, equivalents and modifications not described herein may also be employed within the scope of the invention, which is defined in the claims.