This disclosure relates generally to generating realistic sensor-fusion detection estimates of objects.
In general, developing an autonomous or semi-autonomous vehicle presents many challenges. To assist with its development, the autonomous or semi-autonomous vehicle often undergoes numerous tests based on various scenarios. In this regard, simulations are often used since they are more cost-effective to perform than actual driving tests. However, in many instances, simulations do not accurately represent real use-cases. For example, some simulated camera images may look more like video game images than actual camera images. In addition, some types of sensors produce sensor data that is difficult and costly to simulate. For example, radar detections are known to be difficult to simulate with accuracy. As such, simulations with these types of inaccuracies may not provide the proper conditions for the development, testing, and evaluation of autonomous and semi-autonomous vehicles.
The following is a summary of certain embodiments described in detail below. The described aspects are presented merely to provide the reader with a brief summary of these certain embodiments and the description of these aspects is not intended to limit the scope of this disclosure. Indeed, this disclosure may encompass a variety of aspects that may not be explicitly set forth below.
In an example embodiment, a system for generating a realistic simulation includes at least a non-transitory computer readable medium and a processing system. The non-transitory computer readable medium includes a visualization of a scene that includes a template of a simulation object within a region. The processing system is communicatively connected to the non-transitory computer readable medium. The processing system includes at least one processing device, which is configured to execute computer-readable data to implement a method that includes generating a sensor-fusion representation of the template upon receiving the visualization as input. The method includes generating a simulation of the scene with a sensor-fusion detection estimate of the simulation object instead of the template within the region. The sensor-fusion detection estimate includes object contour data indicating bounds of the sensor-fusion representation. The sensor-fusion detection estimate represents the bounds or shape of an object as would be detected by a sensor-fusion system.
In an example embodiment, a computer-implemented method includes obtaining, via a processing system with at least one computer processor, a visualization of a scene that includes a template of a simulation object within a region. The method includes generating, via the processing system, a sensor-fusion representation of the template upon receiving the visualization as input. The method includes generating, via the processing system, a simulation of the scene with a sensor-fusion detection estimate of the simulation object instead of the template within the region. The sensor-fusion detection estimate includes object contour data indicating bounds of the sensor-fusion representation. The sensor-fusion detection estimate represents the bounds or shape of an object as would be detected by a sensor-fusion system.
In an example embodiment, a non-transitory computer readable medium includes computer-readable data that, when executed by a computer processor, is configured to implement a method. The method includes obtaining a visualization of a scene that includes a template of a simulation object within a region. The method includes generating a sensor-fusion representation of the template upon receiving the visualization as input. The method includes generating a simulation of the scene with a sensor-fusion detection estimate of the simulation object instead of the template within the region. The sensor-fusion detection estimate includes object contour data indicating bounds of the sensor-fusion representation. The sensor-fusion detection estimate represents the bounds or shape of an object as would be detected by a sensor-fusion system.
These and other features, aspects, and advantages of the present invention are discussed in the following detailed description in conjunction with the accompanying drawings, throughout which like characters represent similar or like parts.
The embodiments described herein have been shown and described by way of example, and many of their advantages will be understood from the foregoing description. It will be apparent that various changes can be made in the form, construction, and arrangement of the components without departing from the disclosed subject matter or sacrificing one or more of its advantages. Indeed, the described forms of these embodiments are merely explanatory. These embodiments are susceptible to various modifications and alternative forms, and the following claims are intended to encompass and include such changes and not be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.
In an example embodiment, the simulation system 100 includes a memory system 120, which comprises any suitable memory configuration that includes at least one non-transitory computer readable medium. For example, the memory system 120 includes semiconductor memory, random access memory (RAM), read only memory (ROM), virtual memory, electronic storage devices, optical storage devices, magnetic storage devices, memory circuits, any suitable memory technology, or any combination thereof. The memory system 120 is configured to include local, remote, or both local and remote components with respect to the simulation system 100. The memory system 120 stores various computer readable data. For example, in
In an example embodiment, the simulation system 100 also includes at least a communication network 130, an input/output interface 140, and other functional modules. The communication network 130 is configured to enable communications between and/or among one or more components of the simulation system 100. The communication network 130 includes wired technology, wireless technology, any suitable communication technology, or any combination thereof. For example, the communication network 130 enables the processing system 110 to communicate with the memory system 120 and the input/output interface 140. The input/output interface 140 is configured to enable communication between one or more components of the simulation system 100 and one or more components of the application system 10. For example, in
In an example embodiment, the application system 10 is configured to receive realistic simulations from the simulation system 100. In an example embodiment, for instance, the application system 10 relates to a vehicle 20, which is autonomous, semi-autonomous, or highly-autonomous. Alternatively, the simulations can be applied to a non-autonomous vehicle. For example, in
In an example embodiment, the data collection process 210 includes obtaining and storing a vast amount of collected data from the real-world. More specifically, for instance, the data collection process 210 includes collecting sensor-based data (e.g., sensor data, sensor-fusion data, etc.) via various sensing devices that are provided on various mobile machines during various real-world drives. In this regard, for example,
In an example embodiment, the vehicle 220 includes a vehicle processing system 220B with non-transitory computer-readable memory. The computer-readable memory is configured to store various computer-readable data including program instructions, sensor-based data (e.g., raw sensor data, sensor-fusion data, etc.), and other related data (e.g., map data, localization data, etc.). The other related data provides relevant information (e.g., context) regarding the sensor-based data. In an example embodiment, the vehicle processing system 220B is configured to process the raw sensor data and the other related data. Additionally or alternatively, the processing system 220B is configured to generate sensor-fusion data based on the processing of the raw sensor data and the other related data. After obtaining this sensor-based data and other related data, the processing system 220B is configured to transmit or transfer a version of this collected data from the vehicle 220 to the memory system 230 via communication technology, which includes wired technology, wireless technology, or both wired and wireless technology.
In an example embodiment, the data collection process 210 is not limited to this data collection technique involving the vehicle 220, but can include other data gathering techniques that provide suitable real-world sensor-based data. In addition, the data collection process 210 includes collecting other related data (e.g., map data, localization data, etc.), which corresponds to the sensor-based data that is collected from the vehicles 220. In this regard, for example, the other related data is advantageous in providing context and/or further details regarding the sensor-based data.
In an example embodiment, the memory system 230 is configured to store the collected data in one or more non-transitory computer readable media, which includes any suitable memory technology in any suitable configuration. For example, the memory system 230 includes semiconductor memory, RAM, ROM, virtual memory, electronic storage devices, optical storage devices, magnetic storage devices, memory circuits, cloud storage system, any suitable memory technology, or any combination thereof. For instance, in an example embodiment, the memory system 230 includes at least non-transitory computer readable media in at least a computer cluster configuration.
In an example embodiment, after this collected data has been stored in the memory system 230, then the process 200 includes ensuring that a processing system 240 trains the machine-learning model with appropriate training data, which is based on this collected data. In an example embodiment, the processing system 240 includes at least one processor (e.g., CPU, GPU, processing circuits, etc.) with one or more modules, which include hardware, software, or a combination of hardware and software technology. For example, in
In an example embodiment, upon obtaining the collected data, the pre-processing module 240A is configured to provide suitable training data for the machine-learning model. In
In an example embodiment, the processing module 240B is configured to train at least one machine-learning model to generate sensor-fusion detection estimates for objects based on real-world training data according to real-use cases. In
In an example embodiment, the processing module 240B is configured to train machine-learning technology (e.g., machine-learning algorithms) to generate sensor-fusion detection estimates for objects in response to receiving object data for these objects. In this regard, for example, the memory system 230 includes machine-learning data such as neural network data. More specifically, in an example embodiment, for instance, the machine-learning data includes a generative adversarial network (GAN). In an example embodiment, the processing module 240B is configured to train the GAN model to generate new objects based on different inputs. For example, the GAN is configured to transform one type of image (e.g., a visualization, a computer graphics-based image, etc.) into another type of image (e.g., a real-looking image such as a sensor-based image). The GAN is configured to modify at least parts of an image. As a non-limiting example, for instance, the GAN is configured to transform or replace one or more parts (e.g., extracted object data) of an image with one or more items (e.g., sensor-fusion detection estimates). In this regard, for example, with the appropriate training, the GAN is configured to change at least one general attribute of an image.
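As a non-limiting illustration of this kind of image-to-image transformation, the following sketch shows a minimal encoder-decoder generator (in the spirit of a conditional GAN such as pix2pix) that maps a three-channel scene visualization to a per-pixel occupancy estimate. The framework (PyTorch), layer sizes, and channel counts are illustrative assumptions rather than a required architecture.

```python
# Minimal, illustrative PyTorch sketch of an image-to-image GAN generator
# (pix2pix-style). Layer sizes and names are assumptions for illustration,
# not the architecture disclosed herein.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a 3-channel scene visualization to a 1-channel occupancy estimate."""
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, out_channels, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # per-pixel occupancy probability
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Example: a 512x512 three-channel visualization in, occupancy estimate out.
scene = torch.rand(1, 3, 512, 512)
occupancy_estimate = Generator()(scene)  # shape: (1, 1, 512, 512)
```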
In
In an example embodiment, the generation of sensor-fusion detection estimates of objects includes the generation of sensor-fusion representations, which indicate bounds of detections corresponding to those objects. More specifically, in
In an example embodiment, the processing module 240B is configured to train the GAN to transform the extracted object data corresponding to the objects into sensor-fusion detection estimates, separately or collectively. For example, the processing module 240B is configured to train the GAN to transform object data of selected objects into sensor-fusion detection estimates on an individual basis (e.g., one at a time). Also, the processing module 240B is configured to train the GAN to transform one or more sets of object data of selected objects into sensor-fusion detection estimates, simultaneously. As another example, instead of performing transformations, the processing module 240B is configured to train the GAN to generate sensor-fusion detection estimates from object data of selected objects on an individual basis (e.g., one at a time). Also, the processing module 240B is configured to train the GAN to generate sensor-fusion detection estimates from object data of one or more sets of object data of selected objects, simultaneously.
At step 302, in an example embodiment, the processing system 240 is configured to obtain training data. For instance, as shown in
At step 304, in an example embodiment, the processing system 240 is configured to train the neural network to generate realistic sensor-fusion detection estimates. The processing system 240 is configured to train the neural network (e.g., at least one GAN model) based on training data, which includes at least real-world sensor-fusion detections of objects and corresponding annotations. In an example embodiment, the training includes steps 306, 308, and 310. In addition, the training includes determining whether or not this training phase is complete, as shown at step 312. Also, the training can include other steps, which are not shown in
At step 306, in an example embodiment, the processing system 240 is configured to generate sensor-fusion detection estimates via at least one machine-learning model. In an example embodiment, the machine-learning model includes a GAN model. In this regard, upon receiving the training data, the processing system 240 is configured to generate sensor-fusion detection estimates via the GAN model. In an example embodiment, a sensor-fusion detection estimate of an object provides a representation that indicates the general bounds of sensor-fusion data that is identified as that object. Non-limiting examples of these representations include data structures, graphical renderings, any suitable detection agents, or any combination thereof. For instance, the processing system 240 is configured to generate sensor-fusion detection estimates for objects that include polygonal representations, which comprise data structures with polygon data (e.g., coordinate values) and/or graphical renderings of the polygon data that indicate the polygonal bounds of detections amongst the sensor-fusion data for those objects. Upon generating sensor-fusion detection estimates for objects, the processing system 240 is configured to proceed to step 308.
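As a non-limiting illustration, one possible data structure for such a polygonal representation is sketched below; the field names are assumptions for illustration, and the coordinate values reuse the example contour points given later in this description.

```python
# One possible (illustrative) data structure for a polygonal sensor-fusion
# detection estimate: an object identifier plus ordered 2D vertices that
# bound the detection. Field names are assumptions for illustration.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SensorFusionDetectionEstimate:
    object_id: int
    # Ordered (x, y) vertices, e.g. in meters in the vehicle frame,
    # outlining the bounds of the detection.
    contour: List[Tuple[float, float]] = field(default_factory=list)

estimate = SensorFusionDetectionEstimate(
    object_id=7,
    contour=[(1.2, 0.8), (1.22, 0.6), (2.11, 0.46),
             (2.22, 0.50), (2.41, 0.65), (1.83, 0.70)],
)
```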
At step 308, in an example embodiment, the processing system 240 is configured to compare the sensor-fusion detection estimates with the real-world sensor-fusion detections. In this regard, the processing system 240 is configured to determine discrepancies between the sensor-fusion detection estimates of objects and the real-world sensor-fusion detections of those same objects. For example, the processing system 240 is configured to perform at least one difference calculation or loss calculation based on a comparison between a sensor-fusion detection estimate and a real-world sensor-fusion detection. This feature is advantageous in enabling the processing system 240 to fine-tune the GAN model such that a subsequent iteration of sensor-fusion detection estimates is more realistic and more attuned to the real-world sensor-fusion detections than the current iteration of sensor-fusion detection estimates. Upon performing this comparison, the processing system 240 is configured to proceed to step 310.
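As a non-limiting illustration, the comparison at step 308 could be expressed as a loss of the following form, where an adversarial term is combined with a per-pixel L1 difference between the estimate and the real-world detection; the particular loss terms and weighting are assumptions for illustration.

```python
# Illustrative comparison between a generated occupancy estimate and the
# real-world sensor-fusion detection for the same scene. An L1 pixel
# difference combined with an adversarial term is one common choice for
# GAN training; the specific loss used here is an assumption.
import torch
import torch.nn.functional as F

def generator_loss(estimate, real_detection, disc_logits_on_fake, l1_weight=100.0):
    # Adversarial term: the generator tries to make the discriminator
    # label its estimates as real (target = 1).
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    # Reconstruction term: per-pixel difference to the real detection.
    recon = F.l1_loss(estimate, real_detection)
    return adv + l1_weight * recon
```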
At step 310, in an example embodiment, the processing system 240 is configured to update the neural network. More specifically, the processing system 240 is configured to update the model parameters based on comparison metrics obtained from the comparison, which is performed at step 308. For example, the processing system 240 is configured to improve the trained GAN model based on results of one or more difference calculations or loss calculations. Upon performing this update, the processing system 240 is configured to proceed to step 306 to further train the GAN model in accordance with the updated model parameters upon determining that the training phase is not complete at step 312. Alternatively, the processing system is configured to end this training phase at step 314 upon determining that this training phase is sufficient and/or complete at step 312.
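As a non-limiting illustration, one training iteration corresponding to steps 306, 308, 310, and 312 might be organized as sketched below, using, for example, the generator and loss sketched above; the optimizer choice, the completion criterion, and the threshold value are assumptions for illustration.

```python
# Illustrative training iteration covering steps 306-312: generate estimates,
# compare against real detections, update parameters, and check a simple
# completion criterion. The optimizer and threshold are assumptions.
import torch

def training_step(generator, discriminator, optimizer, scene_batch, real_batch, loss_fn):
    """One iteration of steps 306-310: generate, compare, update."""
    optimizer.zero_grad()
    estimates = generator(scene_batch)                  # step 306: generate estimates
    disc_logits = discriminator(estimates)
    loss = loss_fn(estimates, real_batch, disc_logits)  # step 308: compare to real detections
    loss.backward()                                     # step 310: update model parameters
    optimizer.step()
    return loss.item()

def training_complete(recent_losses, threshold=0.05):
    """Step 312: one simple completion criterion -- recent losses within a threshold."""
    return len(recent_losses) > 0 and max(recent_losses) < threshold
```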
At step 312, in an example embodiment, the processing system 240 is configured to determine whether or not this training phase is complete. In an example embodiment, for instance, the processing system 240 is configured to determine that the training phase is complete when the comparison metrics are within certain thresholds. In an example embodiment, the processing system 240 is configured to determine that the training phase is complete upon determining that the neural network (e.g., at least one GAN model) has been trained with a predetermined amount of training data (or a sufficient amount of training data). In an example embodiment, the training phase is determined to be sufficient and/or complete when accurate and reliable sensor-fusion detection estimates are generated by the processing system 240 via the GAN model. In an example embodiment, the processing system 240 is configured to determine that the training phase is complete upon receiving a notification that the training phase is complete.
At step 314, in an example embodiment, the processing system 240 is configured to end this training phase. In an example embodiment, upon completing this training phase, the neural network is deployable for use. For example, in
At step 402, in an example embodiment, the processing system 110 is configured to obtain simulation data, which includes a simulation program with at least one visualization of at least one simulated scene. In an example embodiment, for instance, the visualization of the scene includes at least a three-channel pixel image. More specifically, as a non-limiting example, a three-channel pixel image is configured to include, for example, in any order, a first channel with a location of the vehicle 20, a second channel with locations of simulation objects (e.g., dynamic simulation objects), and a third channel with map data. In this case, the map data includes information from a high-definition map. The use of a three-channel pixel image in which the simulation objects are provided in a distinct channel is advantageous in enabling efficient handling of the simulation objects. Also, in an example embodiment, each visualization includes a respective scene, scenario, and/or condition (e.g., snow, rain, etc.) from any suitable view (e.g., top view, side view, etc.). For example, a visualization of the scene with a two-dimensional (2D) top view of template versions of simulation objects within a region is relatively convenient and easy to generate compared to other views while also being relatively convenient and easy for the processing system 110 to handle.
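As a non-limiting illustration, such a three-channel top-view visualization might be assembled as sketched below; the 512×512 grid, the 0.2 m-per-pixel resolution, and the placeholder map content are assumptions for illustration.

```python
# Illustrative assembly of a three-channel top-view visualization:
# channel 0 marks the ego vehicle location, channel 1 marks simulation
# objects, channel 2 carries rasterized map data. The 512x512 grid and
# 0.2 m/pixel resolution are assumptions for illustration.
import numpy as np

H = W = 512
RESOLUTION = 0.2  # meters per pixel (assumed)
visualization = np.zeros((3, H, W), dtype=np.float32)

def to_pixel(x_m, y_m):
    """Map metric coordinates (ego at the image center, +y forward) to pixel indices."""
    return int(H / 2 - y_m / RESOLUTION), int(W / 2 + x_m / RESOLUTION)

# Channel 0: ego vehicle at the image center.
visualization[0, H // 2, W // 2] = 1.0

# Channel 1: template simulation objects (e.g., a vehicle 12 m ahead of the ego vehicle).
row, col = to_pixel(0.0, 12.0)
visualization[1, row - 5:row + 5, col - 2:col + 2] = 1.0

# Channel 2: map data, e.g. drivable lanes rasterized from a high-definition map
# (here simply a straight lane corridor as a placeholder).
visualization[2, :, W // 2 - 10:W // 2 + 10] = 1.0
```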
In an example embodiment, the simulation objects are representations of real-world objects (e.g., pedestrians, buildings, animals, vehicles, etc.), which may be encountered in a region of that environment. In an example embodiment, these representations are model versions or template versions (e.g., non-sensor-based versions) of these real-world objects, and are thus not accurate or realistic input for the vehicle processing system 30 compared to real-world detections, which are captured by sensors 220A of the vehicle 220 during a real-world drive. In an example embodiment, the template versions include at least various attribute data of an object as defined within the simulation. For example, the attribute data can include size data, shape data, location data, other features of an object, any suitable data, or any combination thereof. In this regard, the generation of visualizations of scenes that include template versions of simulation objects is advantageous as this allows various scenarios and scenes to be generated quickly and inexpensively since these visualizations can be developed without having to account for how various sensors would detect these simulation objects in the environment. As a non-limiting example, for instance, in
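As a non-limiting illustration, a template version of a simulation object carrying such attribute data might be represented as sketched below; the field names and attribute set are assumptions for illustration.

```python
# Illustrative template (non-sensor-based) representation of a simulation
# object, holding attribute data as defined within the simulation.
from dataclasses import dataclass

@dataclass
class SimulationObjectTemplate:
    object_type: str   # e.g. "pedestrian", "vehicle", "building"
    x: float           # location data in the scene, meters
    y: float
    heading: float     # orientation, radians
    length: float      # size data, meters
    width: float

template = SimulationObjectTemplate("vehicle", x=0.0, y=12.0, heading=0.0,
                                    length=4.5, width=1.9)
```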
At step 404, in an example embodiment, the processing system 110 is configured to generate a sensor-fusion detection estimate for each simulation object. For example, in response to receiving the simulation data (e.g., a visualization of a scene) as input, the processing system 110 is configured to implement or employ at least one trained GAN model to generate sensor-fusion representations and/or sensor-fusion detection estimates in direct response to the input. More specifically, the processing system 110 is configured to implement a method to provide simulations with sensor-fusion detection estimates. In this regard, for instance, two different methods are discussed below: a first method involving image-to-image transformation and a second method involving image-to-contour transformation.
As a first method, in an example embodiment, the processing system 110 together with the trained GAN model is configured to perform image-to-image transformation such that a visualization of a scene with at least one simulation object is transformed into an estimate of a sensor-fusion occupancy map with sensor-fusion representations of the simulation object. In this case, the estimate of the sensor-fusion occupancy map is a machine-learning based representation of a real-world sensor-fusion occupancy map that a mobile machine (e.g., vehicle 20) would generate during a real-world drive. For example, the processing system 110 is configured to obtain simulation data with at least one visualization of at least one scene that includes a three-channel image or any suitable image. More specifically, in an example embodiment, the processing system 110, via the trained GAN model, is configured to transform the visualization of a scene with simulation objects into a sensor-fusion occupancy map (e.g., 512×512 pixel image or any suitable image) with corresponding sensor-fusion representations of those simulation objects. As a non-limiting example, for instance, the sensor-fusion occupancy map includes sensor-fusion representations with one or more pixels having pixel data (e.g., pixel colors) that indicates object occupancy (and/or probability data relating to object occupancy for each pixel). In this regard, for example, upon obtaining a visualization of a scene (e.g., image 800A of
Also, for this first method, after generating the sensor-fusion occupancy map with sensor-fusion representations corresponding to simulation objects, the processing system 110 is configured to perform object contour extraction. More specifically, for example, the processing system 110 is configured to obtain object information (e.g., size and shape data) from the occupancy map. In addition, the processing system 110 is configured to identify pixels with an object indicator or an object marker as being sensor-fusion data that corresponds to a simulation object. For example, the processing system 110 is configured to identify one or more pixel colors (e.g., dark pixel colors) as having a relatively high probability of being sensor-fusion data that represents a corresponding simulation object and cluster those pixels together. Upon identifying pixels of a sensor-fusion representation that corresponds to a simulation object, the processing system 110 is then configured to obtain an outline of the clusters of pixels of sensor-fusion data that correspond to the simulation objects and present the outline as object contour data. In an example embodiment, the processing system 110 is configured to provide the object contour data as a sensor-fusion detection estimate for the corresponding simulation object.
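As a non-limiting illustration, the object contour extraction described above might be implemented as sketched below, where occupancy probabilities are thresholded, occupied pixels are clustered, and their outlines are traced; the use of OpenCV (4.x API) and the threshold value are assumptions for illustration.

```python
# Illustrative object-contour extraction from a generated occupancy map:
# threshold the per-pixel occupancy probabilities, cluster occupied pixels
# into connected regions, and trace their outlines as object contour data.
import cv2
import numpy as np

def extract_object_contours(occupancy_map, occupancy_threshold=0.5):
    """occupancy_map: 2D float array of per-pixel occupancy probabilities."""
    # Mark pixels with a relatively high occupancy probability.
    occupied = (occupancy_map >= occupancy_threshold).astype(np.uint8) * 255
    # Trace the outline of each cluster of occupied pixels (OpenCV 4.x API).
    contours, _ = cv2.findContours(occupied, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Each contour is an (N, 1, 2) array of pixel coordinates; flatten to (N, 2).
    return [c.reshape(-1, 2) for c in contours]

# Example: a toy 512x512 occupancy map with one occupied blob.
occupancy = np.zeros((512, 512), dtype=np.float32)
occupancy[200:220, 300:330] = 0.9
object_contours = extract_object_contours(occupancy)
```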
As a second method, in an example embodiment, the processing system 110 is configured to receive a visualization of a scene with at least one simulation object. For instance, as a non-limiting example of input, the processing system 110, via the at least one trained GAN model, is configured to receive a visualization of a scene that includes at least one simulation object in a center region with a sufficient amount of contextual information regarding the environment. As another example of input, the processing system 110, via the at least one trained GAN model, is configured to receive a visualization of a scene that includes at least one simulation object along with additional information provided in a data vector. For instance, in a non-limiting example, the data vector is configured to include additional information relating to the simulation object such as a distance from that simulation object to the vehicle 20, information regarding other vehicles between the simulation object and the vehicle 20, environmental conditions (e.g., weather information), other relevant information, or any combination thereof.
Also, for this second method, upon receiving simulation data as input, the processing system 110 via the trained GAN model is configured to transform each simulation object from the visualization directly into a corresponding sensor-fusion detection estimate, which includes object contour data. In this regard, for instance, the object contour data includes a suitable number of points that identify an estimate of an outline of bounds of the sensor-fusion data that represents that simulation object. For instance, as a non-limiting example, the processing system 110 is configured to generate object contour data, which is scaled in meters for 2D space and includes the following points: (1.2, 0.8), (1.22, 0.6), (2.11, 0.46), (2.22, 0.50), (2.41, 0.65), and (1.83, 0.70). In this regard, the object contour data advantageously provides an indication of estimates of bounds of sensor-fusion data that represent object detections as would be detected by a sensor-fusion system in an efficient manner with relatively low memory consumption.
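As a non-limiting illustration, the direct image-to-contour transformation of this second method might be realized with a model that regresses a fixed number of 2D contour points (in meters) from the visualization and an optional conditioning data vector, as sketched below; the network structure, the six-point contour, and the contents of the data vector are assumptions for illustration.

```python
# Sketch of the second method's output stage: a model regresses a fixed
# number of 2D contour points (in meters) directly from the scene
# visualization and a conditioning data vector. Shapes are assumptions.
import torch
import torch.nn as nn

class ContourRegressor(nn.Module):
    def __init__(self, num_points=6, condition_dim=4):
        super().__init__()
        self.num_points = num_points
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=4), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=4, stride=4), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse image features with the data vector (distance to the ego
        # vehicle, occluding vehicles, weather, ... -- contents assumed).
        self.head = nn.Linear(32 + condition_dim, num_points * 2)

    def forward(self, visualization, condition):
        feats = self.features(visualization)
        points = self.head(torch.cat([feats, condition], dim=1))
        return points.view(-1, self.num_points, 2)

scene = torch.rand(1, 3, 512, 512)
condition = torch.tensor([[12.0, 1.0, 0.0, 0.0]])  # assumed data vector
contour_m = ContourRegressor()(scene, condition)    # (1, 6, 2) points in meters
```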
For the first method or the second method associated with step 404, the processing system 110 is configured to generate or provide an appropriate sensor-fusion detection estimate for each simulation object in accordance with how a real-world sensor-fusion system would detect such an object in that scene. In an example embodiment, the processing system 110 is configured to generate each sensor-fusion detection estimate for each simulation object on an individual basis. As another example, the processing system 110 is configured to generate or provide sensor-fusion detection estimates for one or more sets of simulation objects at the same time. As yet another example, the processing system 110 is configured to generate or provide sensor-fusion detection estimates for all of the simulation objects simultaneously. In an example embodiment, the processing system 110 is configured to provide object contour data as sensor-fusion detection estimates of simulation objects. After obtaining one or more sensor-fusion detection estimates, the processing system 110 proceeds to step 406.
At step 406, in an example embodiment, the processing system 110 is configured to apply the sensor-fusion detection estimates to at least one simulation step. More specifically, for example, the processing system 110 is configured to generate a simulation scene, which includes at least one visualization of at least one scene with at least one sensor-fusion detection estimate in place of the template of the simulation object. In this regard, the simulation may include the visualization of the scene with a transformation of the extracted object data into sensor-fusion detection estimates or a newly generated visualization of the scene with sensor-fusion detection estimates in place of the extracted object data. Upon applying or including the sensor-fusion detection estimates as a part of the simulation, the processing system 110 is configured to proceed to step 408.
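As a non-limiting illustration, the substitution of sensor-fusion detection estimates in place of templates might be organized as sketched below; the dictionary-based scene layout is an assumption for illustration.

```python
# Illustrative substitution step: each template simulation object is replaced
# in the emitted simulation frame by its sensor-fusion detection estimate
# (object contour data). The dictionary-based layout is an assumption.
def build_simulation_frame(templates, estimates_by_id):
    """templates: list of dicts with an 'object_id' key;
    estimates_by_id: mapping object_id -> contour points in meters."""
    objects = []
    for template in templates:
        contour = estimates_by_id.get(template["object_id"])
        if contour is not None:
            # Provide the estimate in place of the template.
            objects.append({"object_id": template["object_id"], "contour": contour})
        else:
            # Fall back to the template if no estimate was generated.
            objects.append(template)
    return {"objects": objects}

frame = build_simulation_frame(
    templates=[{"object_id": 7, "x": 0.0, "y": 12.0}],
    estimates_by_id={7: [(1.2, 0.8), (1.22, 0.6), (2.11, 0.46),
                         (2.22, 0.50), (2.41, 0.65), (1.83, 0.70)]},
)
```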
At step 408, in an example embodiment, the processing system 110 is configured to transmit the simulation to the application system 10 so that the simulation is executed on one or more components of the application system 10, such as the vehicle processing system 30. For example, the processing system 110 is configured to provide this simulation to a trajectory system, a planning system, a motion control system, a prediction system, a vehicle guidance system, any suitable system, or any combination thereof. More specifically, for instance, the processing system 110 is configured to provide the simulations with the sensor-fusion detection estimates to a planning system or convert the sensor-fusion detection estimates into a different data structure or a simplified representation for faster processing. With this realistic input, the application system 10 is provided with information, such as feedback data and/or performance data, which enables one or more components of the application system 10 to be evaluated and improved based on simulations involving various scenarios in a cost-effective manner.
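As a non-limiting illustration, one such simplified representation could be an axis-aligned bounding box derived from the object contour data, as sketched below.

```python
# Illustrative simplification of a contour estimate into an axis-aligned
# bounding box (x_min, y_min, x_max, y_max), one possible simplified
# representation for faster downstream processing by a planning system.
def contour_to_bbox(contour):
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    return (min(xs), min(ys), max(xs), max(ys))

contour_to_bbox([(1.2, 0.8), (1.22, 0.6), (2.11, 0.46),
                 (2.22, 0.50), (2.41, 0.65), (1.83, 0.70)])
# -> (1.2, 0.46, 2.41, 0.8)
```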
As described herein, the simulation system 100 provides a number of advantageous features, as well as benefits. For example, when applied to the development of an autonomous or a semi-autonomous vehicle 20, the simulation system 100 is configured to provide simulations as realistic input to one or more components of the vehicle 20. For example, the simulation system 100 is configured to provide simulations to a trajectory system, a planning system, a motion control system, a prediction system, a vehicle guidance system, any suitable system, or any combination thereof. Also, by providing simulations with sensor-fusion detection estimates, which are the same as or remarkably similar to real-world sensor-fusion detections that are obtained during real-world drives, the simulation system 100 is configured to contribute to the development of an autonomous or a semi-autonomous vehicle 20 in a safe and cost-effective manner while also reducing safety-critical behavior.
In addition, the simulation system 100 employs a trained machine-learning model, which is advantageously configured for sensor-fusion detection estimation. More specifically, as discussed above, the simulation system 100 includes a trained machine-learning model (e.g., GAN, DNN, etc.), which is configured to generate sensor-fusion representations and/or sensor-fusion detection estimates in accordance with how a mobile machine, such as a vehicle 20, would provide such data via a sensor-fusion system during a real-world drive. Although the sensor-fusion detections of objects via a mobile machine vary in accordance with various factors (e.g., distance, sensor locations, occlusion, size, other parameters, or any combination thereof), the trained GAN model is nevertheless trained to generate or predominately contribute to the generation of realistic sensor-fusion detection estimates of these objects in accordance with real-use cases, thereby accounting for these various factors and providing realistic simulations to one or more components of the application system 10.
Furthermore, the simulation system 100 is configured to provide various representations and transformations via the same trained machine-learning model (e.g. trained GAN model), thereby improving the robustness of the simulation system 100 and its evaluation. Moreover, the simulation system 100 is configured to generate a large number of simulations by transforming or generating sensor-fusion representations and/or sensor-fusion detection estimates in place of object data in various scenarios in an efficient and effective manner, thereby leading to faster development of a safer system for an autonomous or semi-autonomous vehicle 20.
The above description is intended to be illustrative, not restrictive, and is provided in the context of a particular application and its requirements. Those skilled in the art can appreciate from the foregoing description that the present invention may be implemented in a variety of forms, and that the various embodiments may be implemented alone or in combination. Therefore, while the embodiments of the present invention have been described in connection with particular examples thereof, the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the described embodiments. The true scope of the embodiments and/or methods of the present invention is not limited to the embodiments shown and described, since various modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims. For example, components and functionality may be separated or combined differently than in the manner of the various described embodiments, and may be described using different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.