Embodiments generally relate to sensors. More particularly, embodiments relate to realistic sensor simulation and probabilistic measurement correction.
Simulators may be used to model real world environments for a wide variety of applications. For example, the design of an automated navigation system for an autonomous vehicle (AV) might involve testing the system in a simulator prior to deployment. In such a case, the simulator may use a virtual lidar (light detection and ranging) sensor to emulate the operation of a physical lidar sensor to be mounted to the AV in the real world environment. The virtual lidar sensor typically uses ray tracing techniques to detect and/or classify nearby objects in the virtual world. The ray tracing techniques may rely on geometric principles that fail to account for the effect that environmental conditions (e.g., weather, sunlight, temperature) and object material characteristics (e.g., diffraction, absorption, etc.) have on the measurements of the physical lidar sensor. Accordingly, the sensor output may be too idealized, which gives rise to reliability concerns. For example, the automated navigation system might be trained on sensor data that is more accurate than the sensor data available upon deployment in the real world environment.
The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Turning now to
In the illustrated example, the view is from the perspective of a vehicle cabin (e.g., driver seat, pilot seat, etc.). In an embodiment, a physical ranging sensor such as, for example, a physical lidar sensor, radar (radio detection and ranging) sensor, sonar (sound navigation and ranging) sensor, etc., is used to automatically detect and/or classify objects in the real world environment 20. For example, a physical lidar sensor (e.g., mounted to the roof of the AV) may emit a focused light beam (e.g., outbound optical pulse transmitted in a 360° Field of View/FOV) and measure the time-of-flight of one or more optical reflections (e.g., inbound reflections) to detect the presence, shape, etc., of a physical stop sign 24 in the real world environment 20.
By contrast, a virtual ranging sensor such as, for example, a virtual lidar, radar, sonar, etc., may be used to render objects in the virtual world environment 22. In an embodiment, the virtual ranging sensor uses predetermined information regarding the virtual world environment 22 and ray tracing techniques to determine whether objects such as, for example, a virtual stop sign 26 in the virtual world environment 22 are visible from the perspective of the vehicle cabin or occluded from view.
Of particular note is that environmental conditions such as, for example, weather, sunlight, temperature, etc., may have an impact on the visual appearance of the real world environment 20. These environmental factors might be partially accounted for in the simulation, but differences between the performance of the virtual sensor and the performance of the physical sensor may still arise. In the illustrated example, the presence of fog 28 results in the physical stop sign 24 being partially occluded from view. Accordingly, the output of the physical ranging sensor may be relatively noisy. More particularly, the sensor noise is due to the fog 28 changing the path of outbound optical pulses and/or inbound reflections, outbound radio pulses and/or inbound reflections, and so forth. In an embodiment, the appearance of the real world environment 20 is also dependent on other environmental conditions such as temperature. For example, temperature variations throughout the day, from season to season, or across years (e.g., due to climate change), may change the way light, radio waves and/or sound is scattered in the real world environment 20. Indeed, other conditions such as, for example, object material characteristics (e.g., diffraction, absorption, etc.) may also have an impact on the output of the physical ranging sensor. For example, wood typically exhibits different scattering behavior than metal.
The virtual ranging sensor typically does not consider the various environmental conditions that might arise in the real world environment 20. For example, the type of material that the physical stop sign 24 is made of may not be known or taken into consideration when ray tracing is conducted with respect to the virtual stop sign 26. Accordingly, the illustrated virtual world environment 22 is rendered without the impact of, for example, the fog 28, and objects such as the virtual stop sign 26 are not partially occluded from view. Even if a probability density function were to be applied to the output of the virtual ranging sensor, the result would still depend only on the geometry of the virtual world environment 22.
As will be discussed in greater detail, machine learning (e.g., convolutional neural network/CNN) technology may learn, infer and/or estimate the difference between the measured output of the sensor and the simulated output associated with the sensor. For example, a non-autonomous vehicle equipped with the physical ranging sensor and one or more cameras might be operated through the real world environment 20, wherein the camera(s) generate reference data such as, for example, color image samples, depth map samples, semantic classification samples, and so forth. A neural network may be trained with the measured output of the sensor and the reference data (e.g., color image samples, depth map samples, semantic classification samples, etc., from the simulator), wherein the output of the neural network estimates (e.g., infers) the difference between the measured output of the sensor and the simulated output associated with the sensor. Moreover, the difference may be used to adjust the simulated output associated with the sensor so that the virtual world environment 22 is more realistic. Indeed, an embodiment involves using the trained neural network to correct subsequent measurements of the physical sensor in a probabilistic fashion.
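By way of illustration only, the following minimal Python sketch shows how such an estimated difference might be added to the simulated output; the names (e.g., apply_learned_difference, delta_net) are hypothetical and not part of the embodiments.

```python
import numpy as np

def apply_learned_difference(sim_points, delta_net):
    """Hypothetical helper: adjust a simulated lidar point cloud by adding the
    per-point difference estimated by a trained "delta" network.

    sim_points : (3, N) array of simulated lidar points.
    delta_net  : callable that returns a (3, N) array of estimated per-point
                 displacements (measured output minus simulated output).
    """
    estimated_delta = delta_net(sim_points)   # difference inferred by the network
    return sim_points + estimated_delta       # more realistic simulated output
```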
For each set s^t, K samples of the same scene are captured with the lidar sensor 32 and the camera(s) 36. The lidar sensor 32 generates N points for each sample k, where N depends on the sensor specifications. The camera(s) 36 have a resolution of W×H pixels, where W and H are the width and height of the image, respectively. The kth sample of the tth set is defined as:
s_k^t = {(P, I, D, S) : P ∈ ℝ^(3×N), I ∈ ℝ^(W×H×3), D ∈ ℝ^(W×H), S ∈ ℕ₀^(W×H)}

with t = 0, . . . , T−1 and k = 0, . . . , K−1, where the class labels in S may be unsigned integers (ℕ₀).
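For illustration, a minimal Python sketch of this sample structure is given below; the names and the example dimensions are assumptions that merely encode the shapes defined above.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Sample:
    """One sample s_k^t of a scene (field names are illustrative)."""
    P: np.ndarray  # lidar point cloud, shape (3, N)
    I: np.ndarray  # color image, shape (W, H, 3)
    D: np.ndarray  # depth map, shape (W, H)
    S: np.ndarray  # semantic class labels (unsigned integers), shape (W, H)

def random_sample(N=100_000, W=1280, H=720, num_classes=20):
    """Create a randomly filled sample with the shapes defined above."""
    return Sample(
        P=np.random.randn(3, N),
        I=np.random.rand(W, H, 3),
        D=np.random.rand(W, H),
        S=np.random.randint(0, num_classes, size=(W, H), dtype=np.uint8),
    )
```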
Each data point in the dataset s^t is obtained by collapsing its K samples as follows. First, the average values of each image vector are computed (Ī, D̄), and a per-point histogram of the semantic class labels, hist(S), is generated, so that

s^t = {(P, Ī, D̄, hist(S))}
Each of the components of s^t contains per-pixel information and is presented to a “delta” neural network (ΔNN) as stacked data channels.
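A minimal Python sketch of this collapsing step is shown below, assuming the Sample structure sketched earlier; the helper name collapse_samples and the channel ordering are assumptions for illustration only.

```python
import numpy as np

def collapse_samples(samples):
    """Collapse the K samples of one set s^t into stacked per-pixel channels.

    The color images and depth maps are averaged and a per-pixel histogram of
    the semantic class labels is built, mirroring the averaging/histogram steps
    described for the training procedure.
    """
    I_bar = np.mean([s.I for s in samples], axis=0)          # (W, H, 3) mean color
    D_bar = np.mean([s.D for s in samples], axis=0)          # (W, H) mean depth
    labels = np.stack([s.S for s in samples])                # (K, W, H) class labels
    num_classes = int(labels.max()) + 1
    hist = np.stack([(labels == c).sum(axis=0) for c in range(num_classes)],
                    axis=-1)                                 # (W, H, num_classes)
    # stack the per-pixel channels that are presented to the "delta" network
    return np.concatenate([I_bar, D_bar[..., None], hist.astype(float)], axis=-1)
```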
Turning now to
For example, computer program code to carry out operations shown in the method 48 may be written in any combination of one or more programming languages, including an object oriented programming language such as JAVA, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. Additionally, logic instructions might include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, state-setting data, configuration data for integrated circuitry, state information that personalizes electronic circuitry and/or other structural components that are native to hardware (e.g., host processor, central processing unit/CPU, microcontroller, etc.).
Illustrated processing block 50 provides for setting a controllable covariance parameter of the neural network. More particularly, the covariance inferred for each point after training may be used to dynamically generate stochastic variations for each single scan of the sensor. Accordingly, using a distance metric such as, for example, the Mahalanobis distance (e.g., quantifying how many standard deviations away a point P is from the mean of a distribution D), for each point it is possible to draw a family of samples Pj based on the mean point Pi := (μx, μy, μz) and the covariance matrix Σj. The advantage of deriving the point from such a formulation is that the uncertainty can be parametrically controlled by a scalar factor λ and the eigenvalues from the singular value decomposition SVD(Σj)u:
Pj(Pi, λ) = (μx^i, μy^i, μz^i) + λ·(SVD(Σj)u)^(1/2)·diag(rand[−1, 1], rand[−1, 1], rand[−1, 1])
As will be discussed in greater detail, this formulation may enable the accuracy decay and robustness of each system (e.g., automated navigation system instance) to be analyzed in a sound and formally complete manner. In other words, the simulated output associated with the sensor may be transformed into a more realistic model, characterized by the parameter λ.
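For illustration, a minimal Python sketch of drawing such point samples is given below; the helper name draw_point_samples, the use of NumPy, and the exact factorization of the covariance are assumptions, not a definitive implementation of the formulation above.

```python
import numpy as np

def draw_point_samples(mu, cov, lam, num_samples=10, rng=None):
    """Draw stochastic variations of a lidar point from its mean and covariance,
    with the spread scaled parametrically by the scalar factor lam (λ)."""
    rng = np.random.default_rng() if rng is None else rng
    U, s, _ = np.linalg.svd(cov)            # singular values of the 3x3 covariance
    scale = U @ np.diag(np.sqrt(s))         # square-root factor of the covariance
    samples = []
    for _ in range(num_samples):
        r = rng.uniform(-1.0, 1.0, size=3)  # diag(rand[-1,1], rand[-1,1], rand[-1,1])
        samples.append(mu + lam * (scale @ r))
    return np.array(samples)                # (num_samples, 3)

# Example usage with an arbitrary mean point and covariance:
# pts = draw_point_samples(np.array([1.0, 2.0, 0.5]),
#                          np.diag([0.01, 0.02, 0.005]), lam=1.0)
```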
A forward (e.g., input to output) propagation of the neural network is conducted at illustrated block 52 based on reference data (e.g., color image samples, depth map samples, semantic classification samples) associated with the simulator. In an embodiment, block 52 includes computing a gradient of a loss function with respect to current weights of the neural network layers. More particularly, block 52 may include averaging color image samples on a per point basis, averaging depth map samples on a per point basis, and generating a histogram of semantic classification samples on a per point basis. Additionally, a backward (e.g., output to input) propagation of the neural network may be conducted at block 54 based on the actual output of the sensor. In an embodiment, block 54 includes updating the current weights in a manner that reduces the value of the loss function. Block 56 determines whether the loss function has converged to an acceptable value. If not, the illustrated method 48 returns to block 52. Otherwise, the method 48 may terminate. The resulting neural network is configured to estimate the difference between the measured output of the sensor and the simulated output associated with the sensor.
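The following is a minimal training-loop sketch of the procedure described above, written in Python with PyTorch as an assumed framework; the loss function, optimizer, and the choice of the measured-minus-simulated difference as the training target are assumptions rather than requirements of the embodiments.

```python
import torch

def train_delta_nn(delta_nn, loader, epochs=50, lr=1e-3, tol=1e-4):
    """Train the "delta" network until the loss converges to an acceptable value.

    `loader` is assumed to yield (reference, target_delta) pairs, where
    `reference` holds the stacked reference channels (averaged color, averaged
    depth, class-label histogram) and `target_delta` is the difference between
    the measured sensor output and the simulated output.
    """
    opt = torch.optim.Adam(delta_nn.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()                 # loss choice is an assumption
    for _ in range(epochs):
        total = 0.0
        for reference, target_delta in loader:
            pred = delta_nn(reference)           # forward pass on reference data
            loss = loss_fn(pred, target_delta)   # loss w.r.t. current weights
            opt.zero_grad()
            loss.backward()                      # backward pass based on measured data
            opt.step()                           # update weights to reduce the loss
            total += loss.item()
        if total / max(len(loader), 1) < tol:    # check convergence (block 56)
            return delta_nn
    return delta_nn
```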
Turning now to
Illustrated processing block 80 obtains a neural network such as, for example, the neural network 46 (
More particularly, the illustrated solution exploits the learned uncertainty and noise behavior of the physical lidar sensor 32, via ΔNN training, to improve the measurements of the lidar sensor 32. This enhancement procedure may be accomplished in two phases.
First, using the RGBD cameras and associated semantic segmentation information, the neural network 46 generates an expected probabilistic point cloud. In this set, each point is described as a three-dimensional (3D) Gaussian distribution. Thus, it is possible to compare (e.g., via registration and closest point calculation in Euclidean distance) each point obtained with the physical lidar sensor 32 against the corresponding point generated by the neural network 46. In an embodiment, the deviation distance is once again the Mahalanobis distance. Therefore, the normalized confidence Γ(Pi) for each point may be expressed as a decreasing function of that distance.
A relatively large Mahalanobis distance indicates low confidence, whereas a relatively small Mahalanobis distance indicates a high confidence. This confidence information, which is determined on a per point basis, may be very valuable for many systems (e.g., automated navigation systems) because it expresses which regions of the environment have been sampled more reliably. Indeed, the confidence information may be incorporated into the probabilistic frameworks of localization, segmentation and motion estimation technology. Moreover, the point-wise nature of the confidence information may enable the full frame rate of the sensor to be leveraged in a real-time setting. The confidence value per point Pi may also lead to the second phase.
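By way of illustration, a minimal Python sketch of such a per-point confidence is given below; because the exact normalization is not reproduced above, a simple Gaussian-style mapping exp(−d²/2) of the Mahalanobis distance is assumed.

```python
import numpy as np

def point_confidence(measured_point, mu, cov):
    """Per-point confidence: large Mahalanobis distances map to low confidence,
    small distances to high confidence (normalization is an assumption)."""
    diff = measured_point - mu
    d2 = diff @ np.linalg.inv(cov) @ diff   # squared Mahalanobis distance
    return float(np.exp(-0.5 * d2))         # normalized confidence in (0, 1]

# A measured point that lands on the predicted mean gets confidence 1.0:
# point_confidence(np.array([1.0, 2.0, 0.5]),
#                  np.array([1.0, 2.0, 0.5]), np.eye(3) * 0.01)
```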
The second phase is to use the inverse of the equation from the off-line mode to correct (e.g., remove uncertainty from) the points captured with the physical lidar sensor 32, as already noted. The correction parameter is Ω = λ^(−1). This approach therefore enables the enhancement of real sensing signals with the deep-learned stochastic behavior of each specific sensor instance in order to improve the overall digitalization in terms of accuracy and reliability.
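A minimal Python sketch of this correction phase follows; the exact inverse form is not reproduced above, so a simple shrinkage of the measured deviation toward the predicted mean by Ω = 1/λ is assumed for illustration.

```python
import numpy as np

def correct_measured_point(measured_point, mu, lam):
    """Pull a captured lidar point toward the expected point predicted by the
    network, scaled by the correction parameter omega = 1 / lam (Ω = λ^-1)."""
    omega = 1.0 / lam
    deviation = measured_point - mu          # uncertainty observed in the capture
    return mu + omega * deviation            # corrected point
```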
The method 100 may generally be implemented in a physical sensor such as, for example, the lidar sensor 32 (
Illustrated processing block 102 obtains a neural network such as, for example, the neural network 46 (
Turning now to
The host processor 164 may include logic 186 (e.g., logic instructions, configurable logic, fixed-functionality hardware logic, etc., or any combination thereof) to perform one or more aspects of the method 48 (
Additionally, the logic 186 may obtain a neural network output that estimates a point-wise difference between a first measured output of the sensor 162 and a simulated output of the sensor 162, and determine, on a per point basis, the confidence of a second measurement output of the sensor based on the difference. In an embodiment, the logic 186 also subtracts the point-wise difference from the second measurement output of the sensor. The illustrated system 160 therefore exhibits enhanced performance due to more realistic simulation of the sensor 162 and/or more accurate readings from the sensor 162. Although the logic 186 is shown in the host processor 164, the logic 186 may be located elsewhere in the system 160.
Example 1 includes a semiconductor apparatus comprising one or more substrates and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic coupled to the one or more substrates to obtain a neural network output that estimates a difference between a measured output of a sensor and a simulated output associated with the sensor and add the difference to the simulated output associated with the sensor.
Example 2 includes the semiconductor apparatus of Example 1, wherein the neural network output includes mean displacement data and parametrically controllable covariance data.
Example 3 includes the semiconductor apparatus of Example 1, wherein the logic coupled to the one or more substrates is to train a neural network with the measured output of the sensor and reference data, and wherein the neural network output is obtained from the trained neural network.
Example 4 includes the semiconductor apparatus of Example 3, wherein the logic coupled to the one or more substrates is to conduct one or more forward propagations of the neural network based on the reference data, and conduct one or more backward propagations of the neural network based on the measured output of the sensor to train the neural network.
Example 5 includes the semiconductor apparatus of Example 3, wherein the reference data is associated with a simulator and includes one or more of color image samples, depth map samples or semantic classification samples.
Example 6 includes the semiconductor apparatus of Example 5, wherein the logic is to average the color image samples on a per point basis, average the depth map samples on a per point basis, and generate a histogram of the semantic classification samples on a per point basis.
Example 7 includes the semiconductor apparatus of Example 1, wherein the simulated output is a simulated point cloud and the measured output is a measured point cloud.
Example 8 includes the semiconductor apparatus of Example 1, wherein the logic coupled to the one or more substrates is to input reference data to a trained neural network to obtain the neural network output.
Example 9 includes at least one computer readable storage medium comprising a set of instructions, which when executed by a computing system, cause the computing system to obtain a neural network output that estimates a difference between a measured output of a sensor and a simulated output associated with the sensor, and add the difference to the simulated output associated with the sensor.
Example 10 includes the at least one computer readable storage medium of Example 9, wherein the neural network output includes mean displacement data and parametrically controllable covariance data.
Example 11 includes the at least one computer readable storage medium of Example 9, wherein the instructions, when executed, cause the computing system to train a neural network with the measured output of the sensor and reference data, and wherein the neural network output is obtained from the trained neural network.
Example 12 includes the at least one computer readable storage medium of Example 11, wherein the instructions, when executed, cause the computing system to conduct one or more forward propagations of the neural network based on the reference data, and conduct one or more backward propagations of the neural network based on the measured output of the sensor to train the neural network.
Example 13 includes the at least one computer readable storage medium of Example 11, wherein the reference data is associated with a simulator and includes one or more of color image samples, depth map samples or semantic classification samples.
Example 14 includes the at least one computer readable storage medium of Example 13, wherein the instructions, when executed, cause the computing system to average the color image samples on a per point basis, average the depth map samples on a per point basis, and generate a histogram of the semantic classification samples on a per point basis.
Example 15 includes the at least one computer readable storage medium of Example 9, wherein the simulated output is a simulated point cloud and the measured output is a measured point cloud.
Example 16 includes the at least one computer readable storage medium of Example 9, wherein the instructions, when executed, cause the computing system to input reference data to a trained neural network to obtain the neural network output.
Example 17 includes a semiconductor apparatus comprising one or more substrates, and logic coupled to the one or more substrates, wherein the logic is implemented at least partly in one or more of configurable logic or fixed-functionality hardware logic, the logic coupled to the one or more substrates to obtain a neural network output that estimates a point-wise difference between a first measured output of a sensor and a simulated output associated with the sensor, and determine, on a per point basis, a confidence of a second measurement output of the sensor based on the difference.
Example 18 includes the semiconductor apparatus of Example 17, wherein the logic coupled to the one or more substrates is to subtract the point-wise difference from a second measurement output of the sensor.
Example 19 includes the semiconductor apparatus of Example 17, wherein the neural network output includes mean displacement data and parametrically controllable covariance data.
Example 20 includes the semiconductor apparatus of Example 17, wherein the logic coupled to the one or more substrates is to input reference data to a trained neural network to obtain the neural network output.
Example 21 includes at least one computer readable storage medium comprising a set of instructions, which when executed by a computing system, cause the computing system to obtain a neural network output that estimates a point-wise difference between a first measured output of a sensor and a simulated output associated with the sensor, and determine, on a per point basis, a confidence of a second measurement output of the sensor based on the difference.
Example 22 includes the at least one computer readable storage medium of Example 21, wherein the instructions, when executed, cause the computing system to subtract the point-wise difference from a second measurement output of the sensor.
Example 23 includes the at least one computer readable storage medium of Example 21, wherein the neural network output includes mean displacement data and parametrically controllable covariance data.
Example 24 includes the at least one computer readable storage medium of Example 21, wherein the instructions, when executed, cause the computing system to input reference data to a trained neural network to obtain the neural network output.
Thus, technology described herein may improve the realism of synthetically generated data in simulators by generating a model that learns the behavior of real sensors and transforms the deterministic virtual sensor output into more realistic sensor data. Such an approach enables the capture of realistic sensor data in a simulator without affecting the virtual world rendering. Furthermore, a reverse application of the approach can be used on real data captures to reduce signal noise perceived by sensors.
Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the computing system within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
As used in this application and in the claims, a list of items joined by the term “one or more of” may mean any combination of the listed terms. For example, the phrases “one or more of A, B or C” may mean A; B; C; A and B; A and C; B and C; or A, B and C.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.