The field of the disclosure relates generally to autonomous vehicles and, more specifically, to systems and methods for signal processing optimizations in autonomous vehicle perception systems employing light detection and ranging (LiDAR) sensors.
Autonomous vehicles employ fundamental technologies such as perception, localization, behaviors and planning, and control. Perception technologies enable an autonomous vehicle to sense and process its environment. Perception technologies process a sensed environment to identify and classify objects, or groups of objects, in the environment, for example, pedestrians, vehicles, or debris. Localization technologies determine, based on the sensed environment, where in the world, or on a map, the autonomous vehicle is. Localization technologies process features in the sensed environment to correlate, or register, those features to known features on a map. Localization technologies may rely on inertial navigation system (INS) data. Behaviors and planning technologies determine how to move through the sensed environment to reach a planned destination. Behaviors and planning technologies process data representing the sensed environment and localization or mapping data to plan maneuvers and routes to reach the planned destination for execution by a controller or a control module. Controller technologies use control theory to determine how to translate desired behaviors and trajectories into actions undertaken by the vehicle through its dynamic mechanical components, including steering, braking, and acceleration.
LiDAR sensing, which scans the environment of the autonomous vehicle, is a broadly adopted perception technology. LiDAR sensors emit pulses of light into the environment and then examine the returned light. Since the emitted light may encounter unforeseen obstacles (e.g., reflections from multiple surfaces, or atmospheric conditions such as fog, rain, etc.) before it returns to a detector of the LiDAR sensor, extracting useful information from the signal received at the detector is a challenging and critical task performed by a digital signal processor (DSP). After the DSP decodes the signal, the decoded data is used to generate three-dimensional (3D) point clouds for downstream 3D vision modelling. Existing LiDAR DSPs are based upon cascades of parametrized operations requiring tuning of configuration parameters for improved 3D object detection, intersection over union (IoU) losses, or depth error metrics. However, known LiDAR sensor systems are generally available as fixed black boxes, and interfacing with DSP hyperparameters for tuning configuration parameters is not straightforward.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure described or claimed below. This description is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light and not as admissions of prior art.
In one aspect, a system including at least one memory storing instructions and at least one processor in communication with the at least one memory is disclosed. The at least one processor is configured to execute the stored instructions to: (i) control a light detection and ranging (LiDAR) sensor to emit a pulse into an environment of the LiDAR sensor; (ii) generate temporal histograms corresponding to a signal detected by a detector of the LiDAR sensor for the pulse emitted by the LiDAR sensor; (iii) denoise a temporal waveform generated based on the temporal histograms; (iv) estimate ambient light; (v) determine a noise threshold corresponding to the ambient light; (vi) determine a peak of a plurality of peaks that has a maximum intensity; and (vii) add the peak to a point cloud.
In another aspect, a computer-implemented method is disclosed. The computer-implemented method includes (i) controlling a light detection and ranging (LiDAR) sensor to emit a pulse into an environment of the LiDAR sensor; (ii) generating temporal histograms corresponding to a signal detected by a detector of the LiDAR sensor for the pulse emitted by the LiDAR sensor; (iii) denoising a temporal waveform generated based on the temporal histograms; (iv) estimating ambient light; (v) determining a noise threshold corresponding to the ambient light; (vi) determining a peak of a plurality of peaks that has a maximum intensity; and (vii) adding the peak to a point cloud.
In yet another aspect, a vehicle including at least one light detection and ranging (LiDAR) sensor, at least one memory storing instructions, and at least one processor in communication with the at least one memory is disclosed. The at least one processor is configured to execute the stored instructions to: (i) control the LiDAR sensor to emit a pulse into an environment of the LiDAR sensor; (ii) generate temporal histograms corresponding to a signal detected by a detector of the LiDAR sensor for the pulse emitted by the LiDAR sensor; (iii) denoise a temporal waveform generated based on the temporal histograms; (iv) estimate ambient light; (v) determine a noise threshold corresponding to the ambient light; (vi) determine a peak of a plurality of peaks that has a maximum intensity; and (vii) add the peak to a point cloud.
Various refinements exist of the features noted in relation to the above-mentioned aspects. Further features may also be incorporated in the above-mentioned aspects as well. These refinements and additional features may exist individually or in any combination. For instance, various features discussed below in relation to any of the illustrated examples may be incorporated into any of the above-described aspects, alone or in any combination.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure. The disclosure may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings. Although specific features of various examples may be shown in some drawings and not in others, this is for convenience only. Any feature of any drawing may be referenced or claimed in combination with any feature of any other drawing.
The following detailed description and examples set forth preferred materials, components, and procedures used in accordance with the present disclosure. This description and these examples, however, are provided by way of illustration only, and nothing therein shall be deemed to be a limitation upon the overall scope of the present disclosure. The following terms are used in the present disclosure as defined below.
An autonomous vehicle: An autonomous vehicle is a vehicle that is able to operate itself to perform various operations such as controlling or regulating acceleration, braking, steering wheel positioning, and so on, without any human intervention. An autonomous vehicle has an autonomy level of level-4 or level-5 recognized by the National Highway Traffic Safety Administration (NHTSA).
A semi-autonomous vehicle: A semi-autonomous vehicle is a vehicle that is able to perform some of the driving related operations such as keeping the vehicle in lane and/or parking the vehicle without human intervention. A semi-autonomous vehicle has an autonomy level of level-1, level-2, or level-3 recognized by NHTSA.
A non-autonomous vehicle: A non-autonomous vehicle is a vehicle that is neither an autonomous vehicle nor a semi-autonomous vehicle. A non-autonomous vehicle has an autonomy level of level-0 recognized by NHTSA.
Various embodiments described herein correspond with systems and methods for optimizing LiDAR sensing and DSP parameters (or hyperparameters) for a downstream task such as a 3D object detection task, a vehicle localization task, a road surface detection task, or a lane geometry identification task. As described herein, optimization of LiDAR system parameters is performed using a realistic LiDAR simulation method generating raw waveforms as input to a LiDAR DSP pipeline. Additionally, LiDAR parameters (or hyperparameters) are optimized for 3D object detection IoU losses or depth error metrics, or both, by solving a nonlinear multi-objective optimization (MOO) problem with a 0th-order stochastic algorithm. In some embodiments, and by way of a non-limiting example, the methods described herein for 3D object detection tasks may outperform manual expert tuning by up to about 39.5% mean Average Precision (mAP), or more.
Various embodiments in the present disclosure are described with reference to the accompanying drawings.
The vehicle 100 may be an autonomous vehicle, in which case the vehicle 100 may omit the steering wheel and the steering column otherwise used to steer the vehicle 100. Rather, the vehicle 100 may be operated by an autonomy computing system (not shown) of the vehicle 100 based on data collected by a sensor network (not shown in
In the example embodiment, sensors 202 may include various sensors such as, for example, radio detection and ranging (RADAR) sensors 210, light detection and ranging (LiDAR) sensors 212, cameras 214, acoustic sensors 216, temperature sensors 218, or inertial navigation system (INS) 220, which may include one or more global navigation satellite system (GNSS) receivers 222 and one or more inertial measurement units (IMU) 224. Other sensors 202 not shown in
Cameras 214 are configured to capture images of the environment surrounding autonomous vehicle 100 in any aspect or field of view (FOV). The FOV can have any angle or aspect such that images of the areas ahead of, to the side, behind, above, or below autonomous vehicle 100 may be captured. In some embodiments, the FOV may be limited to particular areas around autonomous vehicle 100 (e.g., forward of autonomous vehicle 100, to the sides of autonomous vehicle 100, etc.) or may surround 360 degrees of autonomous vehicle 100. In some embodiments, autonomous vehicle 100 includes multiple cameras 214, and the images from each of the multiple cameras 214 may be processed for 3D object detection in the environment surrounding autonomous vehicle 100. In some embodiments, the image data generated by cameras 214 may be sent to autonomy computing system 200 or other aspects of autonomous vehicle 100 or a hub or both.
LiDAR sensors 212 generally include a laser generator and a detector that send and receive a LiDAR signal such that LiDAR point clouds (or “LiDAR images”) of the areas ahead of, to the side, behind, above, or below autonomous vehicle 100 can be captured and represented in the LiDAR point clouds. RADAR sensors 210 may include short-range RADAR (SRR), mid-range RADAR (MRR), long-range RADAR (LRR), or ground-penetrating RADAR (GPR). One or more sensors may emit radio waves, and a processor may process received reflected data (e.g., raw RADAR sensor data) from the emitted radio waves. In some embodiments, the system inputs from cameras 214, RADAR sensors 210, or LiDAR sensors 212 may be used in combination in perception technologies of autonomous vehicle 100.
GNSS receiver 222 is positioned on autonomous vehicle 100 and may be configured to determine a location of autonomous vehicle 100, which may be embodied as GNSS data. GNSS receiver 222 may be configured to receive one or more signals from a global navigation satellite system (e.g., Global Positioning System (GPS) constellation) to localize autonomous vehicle 100 via geolocation. In some embodiments, GNSS receiver 222 may provide an input to or be configured to interact with, update, or otherwise utilize one or more digital maps, such as an HD map (e.g., in a raster layer or other semantic map). In some embodiments, GNSS receiver 222 may provide direct velocity measurement via inspection of the Doppler effect on the signal carrier wave. Multiple GNSS receivers 222 may also provide direct measurements of the orientation of autonomous vehicle 100. For example, with two GNSS receivers 222, two attitude angles (e.g., roll and yaw) may be measured or determined. In some embodiments, autonomous vehicle 100 is configured to receive updates from an external network (e.g., a cellular network). The updates may include one or more of position data (e.g., serving as an alternative or supplement to GNSS data), speed/direction data, orientation or attitude data, traffic data, weather data, or other types of data about autonomous vehicle 100 and its environment.
IMU 224 is a micro-electro-mechanical systems (MEMS) device that measures and reports one or more features regarding the motion of autonomous vehicle 100, although other implementations are contemplated, such as mechanical, fiber-optic gyro (FOG), or FOG-on-chip (SiFOG) devices. IMU 224 may measure an acceleration, angular rate, or an orientation of autonomous vehicle 100 or one or more of its individual components using a combination of accelerometers, gyroscopes, or magnetometers. IMU 224 may detect linear acceleration using one or more accelerometers, rotational rate using one or more gyroscopes, and attitude information using one or more magnetometers. In some embodiments, IMU 224 may be communicatively coupled to one or more other systems, for example, GNSS receiver 222, and may provide input to and receive output from GNSS receiver 222 such that autonomy computing system 200 is able to determine the motive characteristics (acceleration, speed/direction, orientation/attitude, etc.) of autonomous vehicle 100.
In the example embodiment, autonomy computing system 200 employs vehicle interface 204 to send commands to the various aspects of autonomous vehicle 100 that actually control the motion of autonomous vehicle 100 (e.g., engine, throttle, steering wheel, brakes, etc.) and to receive input data from one or more sensors 202 (e.g., internal sensors). External interfaces 206 are configured to enable autonomous vehicle 100 to communicate with an external network via, for example, a wired or wireless connection, such as Wi-Fi 226 or other radios 228. In embodiments including a wireless connection, the connection may be a wireless communication signal (e.g., Wi-Fi, cellular, LTE, 5G, 6G, Bluetooth, etc.).
In some embodiments, external interfaces 206 may be configured to communicate with an external network via a wired connection 244, such as, for example, during testing of autonomous vehicle 100 or when downloading mission data after completion of a trip. The connection(s) may be used to download and install various lines of code in the form of digital files (e.g., HD maps), executable programs (e.g., navigation programs), and other computer-readable code that may be used by autonomous vehicle 100 to navigate or otherwise operate, either autonomously or semi-autonomously. The digital files, executable programs, and other computer readable code may be stored locally or remotely and may be routinely updated (e.g., automatically, or manually) via external interfaces 206 or updated on demand. In some embodiments, autonomous vehicle 100 may deploy with all of the data it needs to complete a mission (e.g., perception, localization, and mission planning) and may not utilize a wireless connection or other connections while underway.
In the example embodiment, autonomy computing system 200 is implemented by one or more processors and memory devices of autonomous vehicle 100. Autonomy computing system 200 includes modules, which may be hardware components (e.g., processors or other circuits) or software components (e.g., computer applications or processes executable by autonomy computing system 200), configured to generate outputs, such as control signals, based on inputs received from, for example, sensors 202. These modules may include, for example, a calibration module 230, a mapping module 232, a motion estimation module 234, a perception and understanding module 236, a behaviors and planning module 238, a control module or controller 240, and a multi-objective optimization (MOO) module 242. The MOO module 242, for example, may be embodied within another module, such as behaviors and planning module 238, or perception and understanding module 236, or separately. These modules may be implemented in dedicated hardware such as, for example, an application specific integrated circuit (ASIC), field programmable gate array (FPGA), a digital signal processor (DSP), or microprocessor, or implemented as executable software modules, or firmware, written to memory and executed on one or more processors onboard autonomous vehicle 100.
The MOO module 242 may perform one or more tasks including, but not limited to, setting or generating a LiDAR wavefront simulation environment that models realistic transient scene responses, implementing an optimization method for balanced multi-objective black-box optimization of LiDAR sensor hyperparameters, or validating end-to-end LiDAR sensing and DSP optimization for 3D object detection and depth error estimation.
Autonomy computing system 200 of autonomous vehicle 100 may be completely autonomous (fully autonomous) or semi-autonomous. In one example, autonomy computing system 200 can operate under Level 5 autonomy (e.g., full driving automation), Level 4 autonomy (e.g., high driving automation), or Level 3 autonomy (e.g., conditional driving automation). As used herein, the term "autonomous" includes both fully autonomous and semi-autonomous.
Computing system 300 also includes I/O devices 316, which may include, for example, a communication interface such as a network interface controller (NIC) 318, or a peripheral interface for communicating with a peripheral device 320 over a peripheral link 322. I/O devices 316 may include, for example, a GPU for image signal processing, a serial channel controller or other suitable interface for controlling a sensor peripheral such as one or more acoustic sensors, one or more LiDAR sensors, one or more cameras, one or more weight sensors, a keyboard, or a display device, etc.
As described herein, environment perception for autonomous drones and vehicles requires precise depth sensing for safety-critical control decisions. Scanning LiDAR sensors have been broadly adopted in autonomous driving as they provide high temporal and spatial resolution, and recent advances in MEMS scanning and photodiode technology have reduced their cost and form factor.
In the 3D detection methods described herein, 3D point cloud (PC) data is taken as input. The 3D PC data is produced by a LiDAR and digital signal processor (DSP) pipeline with many measurements and processing steps. As described herein, typical LiDAR sensors operate by emitting a laser pulse and measuring the temporal response through a detector, e.g., an Avalanche Photo Diode (APD) detector. This temporal wavefront signal is fed to a DSP that extracts peaks corresponding to echoes from candidate targets within the environment. As such, DSP processing may result in a 1000-fold data reduction for a single emitted beam, producing single or multiple 3D points per beam. Compressing the waveform into points in 3D space with minimal information loss is challenging because of object discontinuities, sub-surface scattering, multipath reflections, and scattering media, etc.
Generally, LiDAR sensor systems are black boxes with configuration parameters hidden from the user. In some embodiments, to account for noisy point cloud measurements with spurious artifacts, simulated adverse effects and point cloud degradations that model rain, fog, and snow may be added to LiDAR datasets of LiDAR systems that are referenced in the present disclosure as black-box LiDAR systems. Additionally, downstream vision models are retrained for predictions using augmented point clouds that are more robust to point cloud data corruption. Additionally, or alternatively, synthetic measurements from 3D scenes may be generated using rendering engines. However, currently known methods avoid simulating transient light propagation and signal processing by converting 3D scene depth directly into a point cloud. As a result, known methods lack physically realistic modeling of fluctuations arising from multipath effects or measurement noise. Further, known simulation methods that alter measurements or generate synthetic point clouds generally do not optimize sensing or DSP parameters for downstream vision performance. Embodiments described herein address these shortcomings of known LiDAR systems and DSP optimization methods.
In some embodiments, LiDAR pulse configuration and DSP hyperparameters for end-to-end downstream 3D object detector losses and PC depth quality metrics may be optimized as described herein. Optimization of LiDAR pulse configuration and DSP hyperparameters is a challenging task because the hyperparameter space generally involves tens to hundreds of categorical, discrete, and effectively continuous parameters affecting downstream tasks in complex nonlinear ways via an intermediate point cloud. Examples of categorical hyperparameters include Velodyne LiDAR sensor return modes, which configure internal wave-front peak selection algorithms for point cloud formation; an example of a continuous hyperparameter is rotation velocity, which impacts angular resolution.
As described herein, grid search optimization is impractical because of combinatorial explosion. A 0th-order stochastic algorithm can find camera DSP hyperparameters that improve downstream 2D object detectors. An optimization method for LiDAR sensing and DSP hyperparameters, as described herein, may minimize end-to-end domain-specific losses such as the root mean squared error (RMSE) of the measured depth against ground truth and the IoU measured on downstream 3D object detection. In some embodiments, and by way of a non-limiting example, a LiDAR simulation method based on the Car Learning to Act (CARLA) engine that models a LiDAR DSP as well as the full transient noisy waveform formed by multiple laser echoes may be used, in which sensing and DSP hyperparameters are optimized by solving a Multi-Objective black-box Optimization (MOO) problem with a novel Covariance Matrix Adaptation-Evolution Strategy (CMA-ES) that relies on a max-rank multi-objective scalarization loss to dynamically improve scale matching between different loss components. Additionally, a balanced Pareto-optimal solution, for which no loss component has a comparatively poor value, may also be sought for LiDAR optimization with multiple objectives using the proposed LiDAR simulation method. In some embodiments, the proposed optimization method may be validated for 3D object detection and point cloud depth estimation both in simulation and using an off-the-shelf experimental LiDAR sensor.
In other words, embodiments in the present disclosure describe (i) a LiDAR wavefront simulation for the CARLA simulation environment that models realistic transient scene responses; (ii) a multi-objective optimization method for balanced MOO of LiDAR parameters; and (iii) a method for validating end-to-end LiDAR sensing and DSP optimization for 3D object detection and depth estimation through simulation and with a real system.
Optimization of sensors and DSPs for downstream vision tasks is disclosed. In contrast, known methods target camera image signal processors (ISPs) and optics. Instead of hyperparameters being tuned manually by experts, the optimization methods described herein may optimize the hyperparameters automatically, based upon one or more downstream performance metrics. As digital signal processor (DSP) and sensor hyperparameters can be categorical and losses are often non-convex and noisy, diverse optimization methods are described in the present disclosure.
Some optimization methods target specific processing blocks as differentiable programs or in a reduced parameter space, or rely on differentiable pipeline proxies or 0th-order optimization, alone or in combination with block coordinate descent. One advantage of 0th-order optimizers is that they handle black-box hardware and DSPs. 0th-order solvers used to optimize camera systems include MOEA/D and CMA-ES. These approaches successfully tackle camera pipeline optimization from the optics to downstream detectors. However, in the present disclosure, for end-to-end LiDAR system optimization, a loss-driven method is described in which LiDAR hyperparameter optimization is performed automatically to improve the performance of downstream depth and detection tasks.
LiDAR sensors produce point clouds by emitting pulses of light into a scene and measuring the round trip time of sensor returns. Extracting a point cloud from time-of-flight measurements is a complex process that depends on measurement procedure specifics like beam formation, scanning, pulse/continuous wave generation, and peak finding within the acquired temporal laser response. LiDAR sensors differ in their scanning pattern, beam steering technology, wavelength, pulse profile, coherence of the measurement step, detector technology and DSP capabilities to process the measurement echoes.
As described herein, LiDAR sensors can extract single or multiple peaks resulting from multi-path interreflections in the scene. By way of a non-limiting example, for a single Lambertian reflector in the scene, the temporal resolution and signal-to-noise ratio (SNR) of the measurement are tied to laser power. Accordingly, in some optimization methods, automated runtime laser power adjustment may be used to maximize SNR while preventing oversaturation. Additionally, or alternatively, other approaches for adaptive beam steering may also be used. In some embodiments, beam configuration optimization may be performed via reinforcement learning methods driven by a downstream 3D detection loss, which predicts beam patterns, e.g., where to place sparse samples. Additionally, or alternatively, DSP hyperparameters including, but not limited to, sensing, pulse power, and scanning parameters may be optimized.
To assess and validate the optimization method, a LiDAR simulation method that plugs directly into, for example, an open-source CARLA simulator may generally be used; several simulation environments rely on such simulation frameworks. Simulation frameworks enable creation of multimodal synthetic datasets, e.g., PreSIL, SHIFT, AIODrive, and SYNTHIA. However, the underlying simulation methods employ heuristic forward models, and none of the datasets include the full waveform returns that allow simulating LiDAR point cloud generation. For example, the AIODrive dataset, in which multiple peaks are returned via depth image convolution and Single Photon Avalanche Diode (SPAD) quantization, bakes transients into SPAD image formation, which falls short of enabling realistic transient simulation. Similarly, real PC dataset augmentation methods are employed to tackle rarely occurring events like obstacles, traffic, rain, fog, or snow. However, such augmentation methods fail to facilitate modeling of the DSP pipeline because the underlying datasets do not include the raw wavefronts. The disclosed simulation method simulates full wavefront signals that, when combined with a realistic DSP model, produce PC data representative of real systems.
In some embodiments, a single laser pulse is emitted by a LiDAR unit or a LiDAR sensor into a 3D scene, and the returned signal is detected by a SPAD detector. The SPAD detector then sends temporal histograms to the sensor DSP. For channel n at time t, the sensor-incident temporal photon flux may be defined as:
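By way of a non-limiting sketch consistent with the definitions that follow, and using f(n)(t) to denote the sensor-incident temporal photon flux (the symbol f(n) is an assumed notation), Eq. 1 may take the form:

```latex
f^{(n)}(t) = \left(H * g^{(n)}\right)(t) + a(t) \qquad \text{(cf. Eq. 1)}
```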
In Eq. 1 above, g(n) is the temporally varying photon flux emitted by the laser channel n, H is the transient response from the scene, a(t) is the ambient photon flux, and * is the temporal convolution operator.
The transient scene response H includes multipath returns from scene interreflections and scattering. The detector measures the returned signal and digitizes the temporal measurement into temporal wavefronts processed by the DSP. For low photon counts or path lengths above a few meters in automotive scenes, the binning process may be modeled as a Poisson random process. Consequently, the wavefront's number of photons r(n) detected within the integration time Δ in channel n's time bin k may be modelled using Eq. 2 below.
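As a non-limiting sketch consistent with the Poisson model described above (detector efficiency factors, if any, and the exact bin boundaries are assumptions, and 𝒫 denotes a Poisson distribution), Eq. 2 may take the form:

```latex
r_k^{(n)} \sim \mathcal{P}\!\left(\int_{k\Delta}^{(k+1)\Delta} f^{(n)}(t)\,dt\right) \qquad \text{(cf. Eq. 2)}
```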
Based upon a linear model for direct laser reflections in the LiDAR context, the incident transient response H*g(n) of Eq. 1 may be modeled as Eq. 3 below.
In Eq. 3, R is the distance between the sensor and the observed point, c is the speed of light, C is a proportionality constant, independent of t and R, describing the system, and 2T(n) is the total pulse duration for channel n. Path length may be converted to time with t=R/c, and the pulse shape may be defined as Eq. 4 below.
In Eq. 4, p0(n) is channel n's pulse power magnitude. The transient scene response H embedded in Eq. 3 includes geometric attenuation of the light, proportional to 1/(2R)2, and the scene response. For a single opaque point object i, the latter is proportional to its reflectance ρi and the Dirac function δ(R−Ri), where Ri is the object distance to the sensor. Reformulating Eq. 3 for a single echo from the single opaque point object i may yield:
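A non-limiting sketch of the single-echo form, consistent with the geometric attenuation and Dirac scene response described above (the exact constants and argument conventions are assumptions), is:

```latex
\left(H * g^{(n)}\right)(t) \;=\; \frac{C\,\rho_i}{(2R_i)^{2}}\; p^{(n)}\!\left(t - \frac{2R_i}{c}\right)
```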
In Eq. 7 above, s, d, and α∈[0,1] refer to the specular, diffuse, and roughness properties of a surface material, and k=(α+1)2/8. To render realistic textures without a large texture database, s, d, and α may be approximated through CARLA's Phong-like parameters. Because these parameters are not directly accessible, they are extracted by projecting targeted hit points onto custom camera images encoding their values, as illustrated in
The ambient light a(t) in Eq. (1) is modeled at a location i as projected on the red channel of a rendered RGB camera image in which shadows and reflections are properly accounted for; denote this image by Ired. Further, a(t) may be approximated as a constant over waveform time bins, that is, a(t)≡ai=ρi(Ired), where ai is independent of t.
Multipath transients for laser beams hitting object discontinuities may be taken as primary artifact sources in automotive scenarios. Multipath transients may be modeled as linear combinations of neighboring waveforms. Specifically, a supersampled collection of {Ri} and channels may be computed using direct illumination only; then, for each LiDAR channel and each horizontal angle, a downsampled waveform ψj(m) may be obtained as:
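A non-limiting sketch of the linear combination described above (the index conventions of the normalized weights k are assumptions) is:

```latex
\psi_j^{(m)} = \sum_{i \in N(j)} \sum_{n \in N(m)} k_{i,n}\; r_i^{(n)} \qquad \text{(cf. Eq. 8)}
```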
In Eq. 8, N(j) and N(m) define the spatial neighborhood of the target point j and the channel m, and the k are normalized weights that may be interpreted as a beam spatial intensity profile.
In some embodiments, with compressed notation, LiDAR sensing may be modelled as a function Φ(θ) with hyperparameters θ=(P0(m), T(m), V(m)). Laser power P0(m) and pulse duration T(m) are functions of the channel m and determine the emitted pulse g(m). The DSP denoises the measured waveform rj(m) by convolving it with the emitted pulse g(m), and ambient light may be estimated by removing the waveform's median from rj(m), which allows the DSP to find adequate noise thresholds V(m) since ambient light varies strongly throughout the scene. The DSP uses a rising edge detector that finds peaks along the denoised waveform by identifying intersections with V(m). In some examples, multiple peaks may arise; the peak with the highest intensity may be added to the point cloud O. Additionally, by way of a non-limiting example, the maximum intensity may be compensated for the emitted laser pulse, with the pulse half width T(m)/2 and power level P0(m) used as scaling factors to recover the true intensities, as shown in Eq. 3.
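By way of a non-limiting illustration, the per-channel DSP steps described above (matched-filter denoising, median-based ambient light estimation, rising-edge thresholding, and selection of the maximum-intensity peak) may be sketched as follows; function and variable names are illustrative assumptions, and the intensity compensation is a simplified placeholder rather than an exact implementation of Eq. 3:

```python
import numpy as np

def process_channel(waveform, pulse, threshold, pulse_half_width, power_level):
    """Sketch of a per-channel LiDAR DSP: denoise, estimate ambient light,
    detect rising-edge peaks against a noise threshold, and keep the
    maximum-intensity peak."""
    # Denoise by convolving the measured waveform with the emitted pulse.
    denoised = np.convolve(waveform, pulse, mode="same")
    # Estimate ambient light as the waveform median and subtract it.
    ambient = np.median(denoised)
    signal = denoised - ambient
    # Rising-edge detection: indices where the signal crosses the threshold upward.
    above = signal > threshold
    rising = np.flatnonzero(~above[:-1] & above[1:]) + 1
    if rising.size == 0:
        return None  # no echo detected for this beam
    # For each rising edge, keep the local maximum until the signal falls below threshold.
    peaks = []
    for start in rising:
        end = start
        while end < len(signal) and signal[end] > threshold:
            end += 1
        idx = start + int(np.argmax(signal[start:end]))
        peaks.append((idx, signal[idx]))
    # Multiple peaks may arise; keep the one with the maximum intensity.
    idx, intensity = max(peaks, key=lambda p: p[1])
    # Compensate the measured intensity with the pulse half width and power level
    # used as scaling factors (simplified placeholder for the compensation of Eq. 3).
    true_intensity = intensity / (pulse_half_width * power_level)
    return idx, true_intensity
```

The returned time-bin index may then be converted to a range (e.g., via the bin duration and the speed of light) and added as a point to the point cloud O.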
By way of an example, LiDAR sensors may have upward of 128 channels. Accordingly, optimizing every channel hyperparameter individually may be prohibitive. The Velodyne HDL-64 bundles lasers in two groups, and similarly LiDAR models may group lower and upper lasers. Within each group, hyperparameter modulation may be an affine function of the channel number with tunable slope and bias. The edge threshold may be modelled as a continuous parameter V(m) ∈[0,2]. In contrast, power levels P0(m) may take one of a predesignated number of values (e.g., 11 values), such that the lowest power level may make the peak almost indistinguishable from ambient noise and the highest power level may likely saturate at close range. Pulse duration T(m) may take discrete values ranging from 3 to 15 ns in 1 ns increments.
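As a non-limiting sketch of the per-group affine modulation described above (group boundaries, level grids, and function names are illustrative assumptions):

```python
import numpy as np

def modulated_value(channel, slope, bias, allowed_values=None):
    """Affine modulation of a hyperparameter over a channel group: the raw value
    slope*channel + bias is optionally snapped to the nearest allowed level."""
    raw = slope * channel + bias
    if allowed_values is None:   # continuous hyperparameter, e.g., V(m) in [0, 2]
        return raw
    allowed = np.asarray(allowed_values, dtype=float)
    return float(allowed[np.argmin(np.abs(allowed - raw))])

# Illustrative discrete grids consistent with the description above.
power_levels = np.arange(11)            # 11 allowed power levels
pulse_durations_ns = np.arange(3, 16)   # 3-15 ns in 1 ns increments
```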
A multi-objective optimization (MOO) method described herein finds Pareto-Optimal LiDAR hyperparameters. Using the MOO method, as described herein, high-quality point clouds for depth precision and optimal mAP may be generated and inputted to a downstream object detection module.
In some embodiments, hyperparameters may be optimized with loss-driven end-to-end optimization such that the system as a whole, including hardware functionality (for example, the laser beam power of each channel) together with DSP functionality, may be optimized. Consider the operation of the LiDAR imaging pipeline Φ that is modulated by P hyperparameters θ=(θ1, . . . , θP) with ranges of values normalized to the unit interval [0, 1]. With T>>2T(m), each of the J channels Φj of Φ=(Φ1, . . . , ΦJ) may be modeled as:
In Eq. 9, Oj may be a mapping from the unit sphere S (a proxy for projective geometry) to nonnegative distances, where 0 may be interpreted as “undefined,” so that each Φj may reconstruct a portion Oj of the overall point cloud O from a waveform r truncated to the time interval [−T, 0]. The overall θ-modulated LiDAR pipeline may be defined as Eq. 10 below:
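A non-limiting sketch of the form Eq. 10 may take, consistent with the channel-wise description above, is:

```latex
\Phi^{\theta}(r) = \left(\Phi_1^{\theta}(r_1), \ldots, \Phi_J^{\theta}(r_J)\right) = O \qquad \text{(cf. Eq. 10)}
```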
The LiDAR pipeline as defined in Eq. 10 maps the set of truncated waveforms r=(r1, . . . , rJ) to the point cloud O including the compressed information available to downstream detectors about the changing scene H. Pareto-optimality with respect to the MOO loss vector ℒ=(ℒ1, . . . , ℒL) may be defined as:
In Eq. 11 above, loss components may not directly use the point cloud O; for example, ℒ1 may use data tapped out of the pipeline (e.g., a channel's waveform rj) or the output of a downstream detector (e.g., a deep convolutional neural network (CNN)) that ingests the point cloud O (e.g., mAP). The Pareto front, including Pareto-optimal compromises between losses, may be the solution set of Eq. 11, from which a single “champion” may be selected using additional criteria. In the present disclosure, the term “champion” refers to the best possible state returned by the optimization, performing equally well on all metrics under investigation.
In some embodiments, similar to the convex combination
which boils down to the l1-norm of the loss vector with unit weights wl, scalarizations may be used to combine multiple objectives so that a single objective optimizer may yield MOO solutions. Scalarization weights may be difficult to choose when loss variations are not commensurate. However, the max-rank loss may address this issue. In the context of a generation-based algorithm like the algorithm shown in
In Eq. 12 above, ranks are counted from 0 and loss component value ties are resolved by left bisection. The weighted (left-bisection) max-rank loss Mq,m,n of the hyperparameter vector θq,m at the end of generation n may be as shown below:
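A non-limiting sketch of the weighted max-rank loss, with Rl,n(·) denoting the (left-bisection) rank of Eq. 12 computed over all values of ℒl observed through generation n (the notation Rl,n is an assumption), is:

```latex
M_{q,m,n} = \max_{1 \le l \le L} \; w_l \, R_{l,n}\!\left(\mathcal{L}_l(\theta_{q,m})\right) \qquad \text{(cf. Eq. 13)}
```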
By way of a non-limiting example, the max-rank loss may be dynamic, and for a given θq,m, its values may be monotone non-decreasing with respect to the addition of data. Because weights multiply ranks, they are non-dimensional and may be used to dial in the relative importance of loss components. Each wl may be scaled by the (damped) running proportion of individuals that “fail” to pass a user-defined threshold; however, such adaptive weights may break monotonicity. When the wl are kept fixed, averaging the left and right bisection ranks may stabilize Mq,m,n with respect to loss value tie breaking arising from, e.g., noise or quantization, defining a stable (dynamically monotone) max-rank loss scalarization. Further, if the left bisection rank is 0, the average may be set to 0.
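By way of a non-limiting illustration, the stable max-rank scalarization described above may be sketched as follows; the exact tie-handling and history bookkeeping conventions are assumptions:

```python
import bisect

def stable_max_rank(loss_history, candidate_losses, weights):
    """Weighted max-rank scalarization of a candidate's loss vector.
    loss_history[l] is the sorted list of all values of loss component l
    observed so far; ranks are counted from 0, and left/right bisection
    ranks are averaged to stabilize the loss against value ties."""
    scalar = 0.0
    for l, value in enumerate(candidate_losses):
        left = bisect.bisect_left(loss_history[l], value)
        right = bisect.bisect_right(loss_history[l], value)
        # If the left bisection rank is 0, the averaged rank is set to 0.
        rank = 0.0 if left == 0 else 0.5 * (left + right)
        scalar = max(scalar, weights[l] * rank)
    return scalar
```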
Besides more refined seatbelting, the algorithm illustrated in
Additionally, or alternatively, the loss of the weighted centroid of every generation may be evaluated (standard CMA-ES only generates Gaussian sample clouds around the centroids), as shown by the greedy branch in lines 18-20 of the algorithm illustrated in
The LiDAR simulation model described in the present disclosure may be validated by jointly optimizing depth estimation and downstream 3D object detection within scenes. The proposed optimization algorithm may be compared with other 0th-order solvers using an off-the-shelf hardware LiDAR system.
As described herein, hyperparameters affect the wavefront and DSP, and the DSP rising edge threshold V(m). In some embodiments, and by way of a non-limiting example, a predesignated number of LiDAR hyperparameters (e.g., 10 hyperparameters) may control low-level sensing parameters including the laser power P0(m) and laser pulse width T(m) for each of the 128 channels.
In some embodiments, point clouds may be optimized for depth and intensity with an optimizer described herein. An average root mean square error (RMSE) of the depth and an average RMSE of the intensity may be minimized using:
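A non-limiting sketch of the forms Eq. 14 and Eq. 15 may take, with Of denoting the point set of frame f and the superscript gt denoting ground truth (both notational assumptions), is:

```latex
\mathcal{L}_{\text{depth}} = \frac{1}{F}\sum_{f=1}^{F}\sqrt{\frac{1}{|O_f|}\sum_{i \in O_f}\left(d_{i,f} - d_{i,f}^{\text{gt}}\right)^{2}},
\qquad
\mathcal{L}_{\text{int.}} = \frac{1}{F}\sum_{f=1}^{F}\sqrt{\frac{1}{|O_f|}\sum_{i \in O_f}\left(I_{i,f} - I_{i,f}^{\text{gt}}\right)^{2}}
```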
In Eq. 14 and Eq. 15 above, F corresponds with the number of frames in the validation set. The depth loss ℒdepth rewards accurate point cloud depth estimates over the full range, whereas ℒint. ensures that accurate intensities are measured with the pulse power P0(j). Generally, high output power may result in more accurate point clouds at farther distances but may also lead to excessive saturation. Further, ℒint. penalizes saturation.
In some embodiments, and by way of a non-limiting example, an optimization for object detection and classification may be performed using Average Precision (AP) as an additional optimization objective, in which AP is maximized for cars and pedestrians at 40 recall positions over an optimization set with F=100 frames, as shown in Eq. 16 below.
Using standard Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) IoU thresholds, the CV loss is evaluated over a 0-80 m range. A PointVoxel Region-based Convolutional Neural Network (PV-RCNN) for 3D object detection may be trained on 5900 full-range point clouds collected from the simulation environment with 8 different expert-tuned parametrizations θ.
Off-the-shelf LiDAR Optimization
The optimization algorithm shown in
Accordingly, the in-the-loop black-box 0th-order optimization method for LiDAR sensing pipelines described herein finds optimal parameters for depth and intensity estimation and 3D object detection and classification. To assess the in-the-loop black-box 0th-order optimization method, a LiDAR simulation method may be integrated into the CARLA simulator. Optimizing the LiDAR sensing pipeline may significantly improve depth, intensity, and object detection and classification compared to manual expert tuning. Specifically, for 3D object detection, the optimization method described herein may result in a major increase of 51.9% AP for cars and 27.2% AP for pedestrians, compared to fine-tuning a detector on expert-optimized LiDAR vendor parameters. Further, real-time scene-dependent optimization of LiDAR scanning parameters may be performed, which may potentially lead to adaptive sensing in adverse weather in urban and highway scenarios.
The method operations may also include generating 1304 temporal histograms corresponding to a signal detected by a detector of the LiDAR sensor for the pulse emitted 1302 by the LiDAR sensor. The detector of the LiDAR sensor may be a SPAD detector. Further, a temporal waveform generated based on the generated 1304 temporal histograms may be denoised 1306 by convolving the waveform with the pulse emitted by the LiDAR sensor, and ambient light may be estimated 1308 by removing the temporal waveform's median from noisy and saturated waveforms. A noise threshold corresponding to the estimated 1308 ambient light may be determined 1310, and a peak of a plurality of peaks having the maximum intensity may be determined 1312. The determined 1312 peak may be added to a point cloud. Additionally, or alternatively, the true intensity of the peak may be recovered by compensating the maximum intensity of the peak using a half pulse width and power level of the pulse as scaling factors. Further, an edge threshold may be determined as a continuous parameter having a value between 0 and 2. The determined continuous parameter is the determined 1310 noise threshold.
The method operations may include constructing 1406 a max-rank loss scalarization for the signal using the optimized 1404 pipeline, and computing 1408 transients using centroid weights based upon the max-rank loss scalarization. Further, corresponding to the computed 1408 transients and based upon a CMA-ES, upon determining a new centroid, replacing 1410 a centroid with the new centroid. The max-rank loss scalarization may be dynamic. Additionally, or alternatively, the max-rank loss scalarization may be a stable max-rank loss scalarization based on keeping a weight associated with each loss vector fixed or stabilizing the weighted max-rank loss with respect to loss value tie breaking from noise and quantization.
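By way of a non-limiting illustration, the generation-based optimization loop described above may be sketched as follows; this simplified sketch uses an isotropic Gaussian sampler in place of full covariance matrix adaptation, the function names are illustrative assumptions, and evaluate_losses is assumed to run the simulation, DSP, and downstream detector and return the loss components:

```python
import bisect
import numpy as np

def max_rank(history, losses, weights):
    """Weighted max-rank scalarization (see the stable max-rank sketch above)."""
    ranks = []
    for l, v in enumerate(losses):
        left = bisect.bisect_left(history[l], v)
        right = bisect.bisect_right(history[l], v)
        ranks.append(weights[l] * (0.0 if left == 0 else 0.5 * (left + right)))
    return max(ranks)

def optimize_hyperparameters(evaluate_losses, dim, weights,
                             generations=50, population=16, sigma=0.2, seed=0):
    """Simplified generation-based loop: sample candidates around the centroid,
    score them with the max-rank loss over all values observed so far, and
    greedily replace the centroid only when a sampled candidate (or the
    re-evaluated centroid) improves on the current best."""
    rng = np.random.default_rng(seed)
    centroid = np.full(dim, 0.5)                  # hyperparameters normalized to [0, 1]
    history = [[] for _ in weights]               # per-loss-component value history

    def record(theta):
        losses = evaluate_losses(theta)           # e.g., (depth RMSE, intensity RMSE, -AP)
        for l, v in enumerate(losses):
            bisect.insort(history[l], v)
        return losses

    best_theta, best_losses = centroid.copy(), record(centroid)
    for _ in range(generations):
        candidates = [np.clip(centroid + sigma * rng.standard_normal(dim), 0.0, 1.0)
                      for _ in range(population)]
        candidates.append(centroid.copy())        # also evaluate the weighted centroid
        candidate_losses = [record(theta) for theta in candidates]
        scores = [max_rank(history, losses, weights) for losses in candidate_losses]
        winner = int(np.argmin(scores))
        # Greedy branch: replace the centroid only when the winner improves on it.
        if scores[winner] <= max_rank(history, best_losses, weights):
            centroid = candidates[winner].copy()
            best_theta, best_losses = candidates[winner], candidate_losses[winner]
    return best_theta
```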
Various functional operations of the embodiments described herein may be implemented using machine learning algorithms, and performed by one or more local or remote processors, transceivers, servers, and/or sensors, and/or via computer-executable instructions stored on non-transitory computer-readable media or medium.
In some embodiments, the machine learning algorithms may be implemented, such that a computer system “learns” to analyze, organize, and/or process data without being explicitly programmed. Machine learning may be implemented through machine learning methods and algorithms (“ML methods and algorithms”). In one exemplary embodiment, a machine learning module (“ML module”) is configured to implement ML methods and algorithms. In some embodiments, ML methods and algorithms are applied to data inputs and generate machine learning outputs (“ML outputs”). Data inputs may include, but are not limited to, images. ML outputs may include, but are not limited to, identified objects, item classifications, and/or other data extracted from the images. In some embodiments, data inputs may include certain ML outputs.
In some embodiments, at least one of a plurality of ML methods and algorithms may be applied, which may include but are not limited to: linear or logistic regression, instance-based algorithms, regularization algorithms, decision trees, Bayesian networks, cluster analysis, association rule learning, artificial neural networks, deep learning, combined learning, reinforced learning, dimensionality reduction, and support vector machines. In various embodiments, the implemented ML methods and algorithms are directed toward at least one of a plurality of categorizations of machine learning, such as supervised learning, unsupervised learning, and reinforcement learning.
In one embodiment, the ML module employs supervised learning, which involves identifying patterns in existing data to make predictions about subsequently received data. Specifically, the ML module is “trained” using training data, which includes example inputs and associated example outputs. Based upon the training data, the ML module may generate a predictive function which maps outputs to inputs and may utilize the predictive function to generate ML outputs based upon data inputs. The example inputs and example outputs of the training data may include any of the data inputs or ML outputs described above. In the exemplary embodiment, a processing element may be trained by providing it with a large sample of images with known characteristics or features or with a large sample of other data with known characteristics or features. Such information may include, for example, information associated with a plurality of images and/or other data of a plurality of different objects, items, or property.
In another embodiment, a ML module may employ unsupervised learning, which involves finding meaningful relationships in unorganized data. Unlike supervised learning, unsupervised learning does not involve user-initiated training based upon example inputs with associated outputs. Rather, in unsupervised learning, the ML module may organize unlabeled data according to a relationship determined by at least one ML method/algorithm employed by the ML module. Unorganized data may include any combination of data inputs and/or ML outputs as described above.
In yet another embodiment, a ML module may employ reinforcement learning, which involves optimizing outputs based upon feedback from a reward signal. Specifically, the ML module may receive a user-defined reward signal definition, receive a data input, utilize a decision-making model to generate a ML output based upon the data input, receive a reward signal based upon the reward signal definition and the ML output, and alter the decision-making model so as to receive a stronger reward signal for subsequently generated ML outputs. Other types of machine learning may also be employed, including deep or combined learning techniques.
In some embodiments, generative artificial intelligence (AI) models (also referred to as generative machine learning (ML) models) may be utilized with the present embodiments, and the voice bots or chatbots discussed herein may be configured to utilize artificial intelligence and/or machine learning techniques. For instance, the voice bot or chatbot may be a ChatGPT chatbot. The voice bot or chatbot may employ supervised or unsupervised machine learning techniques, which may be followed by, and/or used in conjunction with, reinforced or reinforcement learning techniques. The voice bot or chatbot may employ the techniques utilized for ChatGPT. The voice bot, chatbot, ChatGPT-based bot, ChatGPT bot, and/or other bots may generate audible or verbal output, text or textual output, visual or graphical output, output for use with speakers and/or display screens, and/or other types of output for user and/or other computer or bot consumption.
In some embodiments, various functional operations of the embodiments described herein may be implemented using an artificial neural network model. The artificial neural network may include multiple layers of neurons, including an input layer, one or more hidden layers, and an output layer. Each layer may include any number of neurons. It should be understood that neural networks of a different structure and configuration may be used to achieve the methods and systems described herein.
In the exemplary embodiment, the input layer may receive different input data. For example, the input layer includes a first input a1 representing training images, a second input a2 representing patterns identified in the training images, a third input a3 representing edges of the training images, and so on. The input layer may include thousands or more inputs. In some embodiments, the number of elements used by the neural network model changes during the training process, and some neurons are bypassed or ignored if, for example, during execution of the neural network, they are determined to be of less relevance.
In some embodiments, each neuron in hidden layer(s) may process one or more inputs from the input layer, and/or one or more outputs from neurons in one of the previous hidden layers, to generate a decision or output. The output layer includes one or more outputs, each indicating a label, confidence factor, weight describing the inputs, an output image, or a point cloud. In some embodiments, however, outputs of the neural network model may be obtained from hidden layers in addition to, or in place of, output(s) from the output layer(s).
In some embodiments, each layer has a discrete, recognizable function with respect to input data. For example, if n is equal to 3, a first layer analyzes the first dimension of the inputs, a second layer the second dimension, and the final layer the third dimension of the inputs. Dimensions may correspond to aspects considered strongly determinative, then those considered of intermediate importance, and finally those of less relevance.
In some embodiments, the layers may not be clearly delineated in terms of the functionality they perform. For example, two or more of hidden layers may share decisions relating to labeling, with no single layer making an independent decision as to labeling.
Based upon these analyses, the processing element may learn how to identify characteristics and patterns that may then be applied to analyzing and classifying objects. The processing element may also learn how to identify attributes of different objects in different lighting. This information may be used to determine which classification models to use and which classifications to provide.
Some embodiments involve the use of one or more electronic processing or computing devices. As used herein, the terms “processor” and “computer” and related terms, e.g., “processing device,” and “computing device” are not limited to just those integrated circuits referred to in the art as a computer, but broadly refer to a processor, a processing device or system, a general purpose central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a microcomputer, a programmable logic controller (PLC), a reduced instruction set computer (RISC) processor, a field programmable gate array (FPGA), a digital signal processor (DSP), an application specific integrated circuit (ASIC), and other programmable circuits or processing devices capable of executing the functions described herein, and these terms are used interchangeably herein. These processing devices are generally “configured” to execute functions by programming or being programmed, or by the provisioning of instructions for execution. The above examples are not intended to limit in any way the definition or meaning of the terms processor, processing device, and related terms.
The various aspects illustrated by logical blocks, modules, circuits, processes, algorithms, and algorithm steps described above may be implemented as electronic hardware, software, or combinations of both. Certain disclosed components, blocks, modules, circuits, and steps are described in terms of their functionality, illustrating the interchangeability of their implementation in electronic hardware or software. The implementation of such functionality varies among different applications given varying system architectures and design constraints. Although such implementations may vary from application to application, they do not constitute a departure from the scope of this disclosure.
Aspects of embodiments implemented in software may be implemented in program code, application software, application programming interfaces (APIs), firmware, middleware, microcode, hardware description languages (HDLs), or any combination thereof. A code segment or machine-executable instruction may represent a procedure, a function, a subprogram, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to, or integrated with, another code segment or an electronic hardware by passing or receiving information, data, arguments, parameters, memory contents, or memory locations. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
The actual software code or specialized control hardware used to implement these systems and methods is not limiting of the claimed features or this disclosure. Thus, the operation and behavior of the systems and methods were described without reference to the specific software code, it being understood that software and control hardware can be designed to implement the systems and methods based on the description herein.
When implemented in software, the disclosed functions may be embodied, or stored, as one or more instructions or code on or in memory. In the embodiments described herein, memory includes non-transitory computer-readable media, which may include, but is not limited to, media such as flash memory, a random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and non-volatile RAM (NVRAM). As used herein, the term “non-transitory computer-readable media” is intended to be representative of any tangible, computer-readable media, including, without limitation, non-transitory computer storage devices, including, without limitation, volatile and non-volatile media, and removable and non-removable media such as a firmware, physical and virtual storage, CD-ROM, DVD, and any other digital source such as a network, a server, cloud system, or the Internet, as well as yet to be developed digital means, with the sole exception being a transitory propagating signal. The methods described herein may be embodied as executable instructions, e.g., “software” and “firmware,” in a non-transitory computer-readable medium. As used herein, the terms “software” and “firmware” are interchangeable and include any computer program stored in memory for execution by personal computers, workstations, clients, and servers. Such instructions, when executed by a processor, configure the processor to perform at least a portion of the disclosed methods.
As used herein, an element or step recited in the singular and preceded by the word “a” or “an” should be understood as not excluding plural elements or steps unless such exclusion is explicitly recited. Furthermore, references to “one embodiment” of the disclosure or an “exemplary” or “example” embodiment are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Likewise, limitations associated with “one embodiment” or “an embodiment” should not be interpreted as limiting to all embodiments unless explicitly recited.
Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is generally intended, within the context presented, to disclose that an item, term, etc. may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Likewise, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, is generally intended, within the context presented, to disclose at least one of X, at least one of Y, and at least one of Z.
The disclosed systems and methods are not limited to the specific embodiments described herein. Rather, components of the systems or steps of the methods may be utilized independently and separately from other described components or steps.
This written description uses examples to disclose various embodiments, which include the best mode, to enable any person skilled in the art to practice those embodiments, including making and using any devices or systems and performing any incorporated methods. The patentable scope is defined by the claims and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
This application claims priority to U.S. Provisional Patent Application No. 63/508,781, filed Jun. 16, 2023, entitled “SIGNAL PROCESSING OPTIMIZATION FOR AUTONOMOUS VEHICLE PERCEPTION SYSTEMS,” the entire content of which is hereby incorporated by reference in its entirety.