This disclosure relates generally to deep learning and, more particularly, to calibrating uncertainty for regression and continuous structured model prediction tasks.
In recent years, the field of deep learning in artificial intelligence has provided significant value in extracting important information from large data sets. As data continues to be generated at ever increasing rates, the ability to make intelligent decisions based on large sets of data is vital to increase the efficiency of data analysis. Deep learning applications are useful across many industries that process large amounts of data, such as autonomous driving. The predictions of data-learned models may be calibrated for uncertainty. A well-calibrated model is expected to show low uncertainty when predictions are accurate and higher uncertainty when predictions are less accurate.
The figures are not to scale. Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc., are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.
As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events. As used herein, “processor circuitry” is defined to include (i) one or more special purpose electrical circuits structured to perform specific operation(s) and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors), and/or (ii) one or more general purpose semiconductor-based electrical circuits programmed with instructions to perform specific operations and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors). Examples of processor circuitry include programmed microprocessors, Field Programmable Gate Arrays (FPGAs) that may instantiate instructions, Central Processor Units (CPUs), Graphics Processor Units (GPUs), Digital Signal Processors (DSPs), XPUs, or microcontrollers and integrated circuits such as Application Specific Integrated Circuits (ASICs). For example, an XPU may be implemented by a heterogeneous computing system including multiple types of processor circuitry (e.g., one or more FPGAs, one or more CPUs, one or more GPUs, one or more DSPs, etc., and/or a combination thereof) and application programming interface(s) (API(s)) that may assign computing task(s) to whichever one(s) of the multiple types of the processing circuitry is/are best suited to execute the computing task(s).
Artificial intelligence (AI), including machine learning (ML), deep learning (DL), and/or other artificial machine-driven logic, enables machines (e.g., computers, logic circuits, etc.) to use a model to process input data to generate an output based on patterns and/or associations previously learned by the model via a training process. For instance, the model may be trained with data to recognize patterns and/or associations and follow such patterns and/or associations when processing input data such that other input(s) result in output(s) consistent with the recognized patterns and/or associations.
Many different types of machine learning models and/or machine learning architectures exist. In some examples disclosed herein, a Neural Network (NN) model is used. Using a Neural Network (NN) model enables the interpretation of data in which patterns can be recognized. In general, machine learning models/architectures that are suitable for use in the example approaches disclosed herein are Convolutional Neural Networks (CNNs) and/or Deep Neural Networks (DNNs), in which interconnections are not visible outside of the model. However, other types of machine learning models could additionally or alternatively be used, such as Recurrent Neural Networks (RNNs), Support Vector Machines (SVMs), Gated Recurrent Units (GRUs), Long Short Term Memory (LSTM) networks, etc.
In general, implementing a ML/AI system involves two phases, a learning/training phase and an inference phase. In the learning/training phase, a training algorithm is used to train a model to operate in accordance with patterns and/or associations based on, for example, training data. In general, the model includes internal parameters that guide how input data is transformed into output data, such as through a series of nodes and connections within the model to transform input data into output data. Additionally, hyperparameters are used as part of the training process to control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). Hyperparameters are defined to be training parameters that are determined prior to initiating the training process.
Different types of training may be performed based on the type of ML/AI model and/or the expected output. For example, supervised training uses inputs and corresponding expected (e.g., labeled) outputs to select parameters (e.g., by iterating over combinations of select parameters) for the ML/AI model that reduce model error. As used herein, labelling refers to an expected output of the machine learning model (e.g., a classification, an expected output value, etc.). Alternatively, unsupervised training (e.g., used in deep learning, a subset of machine learning, etc.) involves inferring patterns from inputs to select parameters for the ML/AI model (e.g., without the benefit of expected (e.g., labeled) outputs).
In examples disclosed herein, ML/AI models are trained using known vehicle trajectories (e.g., ground truth trajectories). Training is performed using hyperparameters that control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.).
Conventional deep learning models often make unreliable predictions, and a measure of uncertainty is not provided in regression tasks with such models. Uncertainty estimation is crucial for informed decision making, in particular for safety-critical tasks such as autonomous driving. For a reliable model, the model uncertainty should correlate with its prediction error. Uncertainty calibration is applied to improve the quality of uncertainty estimates so that more informed decision making is possible based on the model prediction during inference. A well-calibrated model indicates low uncertainty about its prediction when the model is accurate and indicates high uncertainty when it is likely to be inaccurate (see
The existing approaches for uncertainty calibration have been applied for classification tasks or post-hoc finetuning. For example, current differentiable accuracy versus uncertainty calibration loss functions are limited in application to classification tasks. Additionally, current post-hoc uncertainty calibration methods do not provide well calibrated uncertainties under distributional shifts in real world applications. Continuous structured prediction introduces greater complexities compared to regression problems because it is based on time series analysis. Various approaches exist to estimate uncertainty in neural network predictions including Bayesian and non-Bayesian methods.
Examples are disclosed herein to calibrate error aligned uncertainty for regression and continuous structured prediction tasks/optimizations. The example optimizations disclosed herein are orthogonal and can be applied in conjunction with methods described above to further improve uncertainty estimates.
Error aligned uncertainty calibrations can be applied to many different use cases across industries, such as autonomous driving, robotics, industrial manufacturing, etc. Uncertainty estimation is commonly utilized with safety-critical tasks that involve image and other sensor inputs. For ease of explanation, the examples described below focus on an autonomous driving application but can be applied to any other application that involves uncertainty estimations.
In some examples, the example uncertainty quantification calibration circuitry 102 receives (e.g., obtains) input 106 for a regression (e.g., prediction) model circuitry 104. The regression model circuitry 104 may include processor circuitry and memory that instantiates a regression model. The input 106 for the example regression model circuitry 104 is a single scene (e.g., a series of images) context x consisting of static input features (e.g., a map of the environment that can be augmented with extra information such as crosswalk occupancy, lane availability, direction, and speed limit) and time-dependent input features (e.g., occupancy, velocity, acceleration, and yaw for vehicles and pedestrians in the scene). In some examples, the output 120 of the regression model circuitry 104 is D top trajectory predictions (y^(d) | d ∈ 1, . . . , D) for the future movements of the target vehicle together with their corresponding confidence scores (c^(d) | d ∈ 1, . . . , D) or uncertainty scores (u^(d) | d ∈ 1, . . . , D), as shown in
In some examples, a training set to train the regression model circuitry 104 for vehicle motion prediction is denoted as D_train = {(x_i, y_i)}_{i=1}^{N}. In some examples, y denotes the ground truth trajectories paired with high-dimensional features x of the corresponding scenes. Each example y = (s_1, . . . , s_T) corresponds to the trajectory of a given vehicle observed by the automated vehicle perception stack, and each state s_t corresponds to the dx- and dy-displacement of the vehicle at timestep t, such that y ∈ R^(T×2). In some examples, the training set (e.g., inputs like input 106) includes images (e.g., a series of images that make up a scene or multiple scenes) and/or data associated with images that provide information on vehicle locations and trajectories over time.
In some examples, a given scene is M seconds long and divided into K seconds of context features and L seconds of ground truth targets for prediction separated by the time T=0. The goal is to predict the movement trajectory of vehicles at time T ∈ (0, L] based on the information available for time T ∈ [−K, 0].
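As an illustrative sketch of the scene split described above (the array shapes, sampling rate, and function name below are assumptions introduced only for explanation), the context/target split around T=0 may be expressed as:

    import numpy as np

    def split_scene(states, sample_rate_hz=5, context_s=5, target_s=5):
        # states: per-timestep dx/dy displacements for one vehicle, shape (M * sample_rate_hz, 2)
        k_steps = context_s * sample_rate_hz        # K seconds of context features
        l_steps = target_s * sample_rate_hz         # L seconds of ground truth targets
        context = states[:k_steps]                  # information available for T in [-K, 0]
        target = states[k_steps:k_steps + l_steps]  # trajectory to predict, y in R^(T x 2)
        return context, target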
In some examples, the uncertainty quantification calibration circuitry 102 includes a neural network architecture circuitry 108. The neural network architecture circuitry 108 instantiates one or more of any type of artificial neural networks (ANN) (e.g., a deep neural network (DNN)) that includes nodes, layers, weights, etc. to be utilized to train the regression model. The neural network architecture circuitry 108 may include processor circuitry and memory that instantiates a neural network.
Motion prediction is a multi-modal task. In some examples, incorporation of uncertainty into motion prediction includes introducing two types of uncertainty quantification metrics:
Per-trajectory confidence-aware metrics: For a given input x, an example stochastic model accompanies its D top trajectory predictions with scalar per-trajectory confidence scores (c(i)|i ∈ 1, . . . , D) based on e.g., log-likelihood.
Per-prediction request confidence-aware metrics: U is computed by aggregating the D top per-trajectory confidence scores into a single uncertainty score (e.g., U = −(Σ_{i=1}^{D} c^(i))/D).
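As a minimal sketch of the aggregation described above (the function name is an assumption; the formula follows the expression given for U):

    import numpy as np

    def per_request_uncertainty(confidences):
        # U = -(sum of the D per-trajectory confidence scores) / D
        return -np.asarray(confidences, dtype=float).mean()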
In some examples, an automated vehicle associates a high per-prediction request uncertainty with the presence of an unfamiliar or high-risk scene context. However, since uncertainties do not have ground truth, assessing the quality of these uncertainty measures is challenging.
In some examples, robustness to distributional shift is assessed via metrics of predictive performance such as Average Displacement Error (ADE) or Mean Square Error (MSE) in case of continuous structured prediction and regression tasks, respectively. In some examples, ADE is a standard performance metric for time-series data and measures the quality of a prediction y with respect to the ground truth y* as:
where y = (s_1, . . . , s_T).
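The ADE equation itself is not reproduced above; the conventional definition, assumed here to correspond to the referenced metric, averages the per-timestep Euclidean distance between the prediction and the ground truth:

    import numpy as np

    def ade(pred, gt):
        # pred, gt: trajectories of shape (T, 2); mean Euclidean displacement per timestep
        pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
        return float(np.linalg.norm(pred - gt, axis=-1).mean())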
In some examples, the analysis is done with two types of evaluation datasets, which are the in-distribution and shifted datasets. Models which have a smaller degradation in performance on the shifted data are considered more robust.
In some examples, there are situations where a model performs well on shifted data and poorly on in-distribution data. Thus, in some examples, joint assessment of the quality of uncertainty estimates and robustness to distributional shift is utilized. Joint analysis enables an understanding of whether measures of uncertainty correlate well with the presence of an incorrect prediction or a high degree of error.
In some examples, error and F1 retention curves are utilized for joint assessment. The area under error retention curve (R-AUC) can be decreased either by improving the model such that it has lower overall error, or by providing better estimates of uncertainty such that predictions with more errors are rejected earlier. In some examples, for F1-retention curves, a higher area under curve (F1-AUC) indicates better calibration performance. In some examples, the dataset used contains both an ‘in-distribution’ and a distributionally shifted subset.
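As a hedged sketch of how an error retention curve and its area may be computed (one common convention is assumed: predictions are rejected in order of decreasing uncertainty and rejected predictions contribute zero error; the names below are illustrative):

    import numpy as np

    def error_retention_auc(errors, uncertainties):
        errors = np.asarray(errors, dtype=float)
        order = np.argsort(uncertainties)              # most certain samples retained first
        n = len(errors)
        # mean error over the full set when only the r most-certain samples are retained
        retained_error = np.cumsum(errors[order]) / n
        fractions = np.arange(1, n + 1) / n
        return float(np.trapz(retained_error, fractions))  # R-AUC: lower is better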
In the illustrated example of
In some examples, for regression and continuous structured prediction tasks, robustness is measured in terms of MSE and ADE, respectively, instead of accuracy score. Lower MSE and ADE indicate more accurate results.
In some examples, two metrics are used to classify predictions of samples (e.g., sample sequences of images used from a scene): certainty and accuracy. As used herein, the following annotations are used to show the count of each of the four possible classifications of predictions: the number of accurate and certain samples (nLC), the number of inaccurate and certain samples (nHC), the number of accurate and uncertain samples (nLU) and the number of inaccurate and uncertain samples (nHU). This classification grid is illustrated in Table 1 below.
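Table 1 is not reproduced above; a reconstruction consistent with the four categories described is:

    TABLE 1
                  Certain      Uncertain
    Accurate      nLC          nLU
    Inaccurate    nHC          nHU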
In some examples, the regression model is more certain about predictions when it is accurate and less certain about inaccurate predictions. In some examples, the goal is to have a greater number of certain samples when the predictions are accurate (LC) vs. inaccurate (HC) and to have a greater number of uncertain samples when the predictions are inaccurate (HU) vs. accurate (LU). Thus, in some examples, a reliable and well-calibrated model provides a higher EaU measure (EaU ∈ [0, 1]). An example Equation 2 illustrates how the EaU measure is calculated (e.g., an EaU indicator function).
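Equation 2 is not reproduced above; one form consistent with this description (the fraction of samples whose certainty is aligned with their accuracy) is given below as an assumption:

    def eau_measure(n_lc, n_hc, n_lu, n_hu):
        # Fraction of samples where certainty aligns with accuracy; EaU in [0, 1]
        return (n_lc + n_hu) / (n_lc + n_hc + n_lu + n_hu)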
An example chart of predictive uncertainty 122 for a well-calibrated model is shown in
An example Equation 3 illustrates how to count and/or calculate the number of samples that fall into each of four accuracy-certainty classification categories. In some examples, the example set of equations in Equation 3 may change based on the nature of the certainty parameters provided (e.g., “less than” may switch to “greater than” if uncertainty parameters are provided).
In some examples, the average displacement error (ade_i) is used as the robustness measure to classify a sample as accurate or inaccurate by comparing it with a task-dependent threshold (ade_th). In some examples, the ade_th is determined upon evaluation of a pre-training result. In some examples, the samples are classified as certain or uncertain according to the confidence score c of each sample. The c_i is based on the log likelihood in the continuous structured prediction task. Similarly, in some examples, the log likelihood of each sample, which is the certainty measure, is compared with a task-dependent threshold c_th.
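A minimal sketch of this threshold-based classification and counting (the function name is an assumption; the comparison directions may be reversed when uncertainty rather than confidence scores are provided, as noted above):

    import numpy as np

    def count_classifications(ade_vals, conf_vals, ade_th, c_th):
        ade_vals, conf_vals = np.asarray(ade_vals), np.asarray(conf_vals)
        accurate = ade_vals <= ade_th                # low-error samples
        certain = conf_vals >= c_th                  # high-confidence samples
        n_lc = int(np.sum(accurate & certain))       # accurate and certain
        n_hc = int(np.sum(~accurate & certain))      # inaccurate and certain
        n_lu = int(np.sum(accurate & ~certain))      # accurate and uncertain
        n_hu = int(np.sum(~accurate & ~certain))     # inaccurate and uncertain
        return n_lc, n_hc, n_lu, n_hu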
As the equations in Equation 3 are not differentiable, the loss calculation circuitry 110 includes a trainable uncertainty calibration loss (LEaUC) calculation circuitry 114 and a sample classification counting and calculation circuitry 112 to provide differentiable approximations (e.g., proxy functions) for the indicator functions illustrated in Equations 2 and 3. The LEaUC serves as the utility-dependent penalty term within the loss-calibrated approximate inference framework for regression and continuous structured prediction tasks. In some examples, the LEaUC calculation circuitry 114 calculates the LEaUC using the calculation function shown in Equation 4. In some examples, the sample classification counting and calculation circuitry 112 calculates the counts of samples of each classification type using the calculation functions shown in Equation 5.
where:
In some examples, the sample classification counting and calculation circuitry 112 uses a hyperbolic tangent function as a bounding function to scale the error and/or uncertainty measures to the range [0, 1]. The example approximate functions show that the bounded error tanh(ade)→0 when the predictions are accurate and tanh(ade)→1 when inaccurate. To scale the robustness and uncertainty measures to the appropriate range for the bounding function or to be used directly, the sample classification counting and calculation circuitry 112 applies post-processing on the robustness measure ade_i and the uncertainty measure c_i with x and y, shown in Equation 4, respectively. In some examples, the post-processing steps are adapted according to each performed task based on the results of initial training epochs. In some examples, the LEaUC is a secondary loss and is added to the standard negative log likelihood loss (LNLL).
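Equations 4 and 5 are not reproduced above; the following is only a plausible differentiable sketch consistent with the description (soft counts built from a tanh-bounded error and a confidence score scaled to [0, 1], combined in a ratio-style loss in the spirit of accuracy-versus-uncertainty calibration losses). The exact functional form, the scaling weights, and the optional s weighting of the LC term (see the discussion of Equation 7 below) are assumptions:

    import torch

    def eauc_loss(ade_vals, conf_vals, x_scale=0.5, s_weight=1.0):
        # ade_vals: per-sample average displacement errors (tensor)
        # conf_vals: per-sample confidence scores post-processed to [0, 1] (tensor)
        err = torch.tanh(x_scale * ade_vals)     # -> 0 when accurate, -> 1 when inaccurate
        conf = conf_vals                         # -> 1 when certain, -> 0 when uncertain
        n_lc = torch.sum((1.0 - err) * conf)           # accurate and certain (soft count)
        n_hc = torch.sum(err * conf)                   # inaccurate and certain
        n_lu = torch.sum((1.0 - err) * (1.0 - conf))   # accurate and uncertain
        n_hu = torch.sum(err * (1.0 - conf))           # inaccurate and uncertain
        # Penalize misaligned samples relative to well-aligned ones; s_weight > 1
        # emphasizes the accurate-and-certain class.
        return torch.log(1.0 + (n_hc + n_lu) / (s_weight * n_lc + n_hu + 1e-8))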
In the illustrated example of
LFinal = LNLL + (β × LEaUC)   (Equation 6)
In some examples, to have a significant impact from the secondary loss, the LEaUC value may be weighted with a β hyperparameter in the final loss calculation, which is determined by comparing/analyzing the primary loss value (LNLL) to the initially calculated LEaUC value. In some examples, under ideal conditions, the proxy functions defined in Equations 4 and 5 are equivalent to the indicator functions defined in Equations 2 and 3.
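A minimal sketch of combining the primary and secondary losses inside a training step (the β value, the names, and the use of an eauc_loss helper such as the sketch above are assumptions):

    def final_loss(nll_loss, eauc_loss_value, beta=200.0):
        # LFinal = LNLL + (beta * LEaUC); beta weights the secondary loss so that it
        # has a meaningful impact relative to the primary loss.
        return nll_loss + beta * eauc_loss_value

    # Example usage within one training iteration (model, optimizer, and the loss
    # terms are assumed to be defined elsewhere):
    #   loss = final_loss(nll, eauc_loss(ade_vals, conf_vals), beta=200.0)
    #   loss.backward()
    #   optimizer.step()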
In safety-critical scenarios, it is important to be certain when predictions are accurate. In some examples, the sample classification counting and calculation circuitry 112 and the LEaUC calculation circuitry 114 provide higher weights to the class of LC samples while calculating Equations 4 and 5. Equation 7 illustrates how high weights are assigned by the LEaUC calculation circuitry 114 to these samples in the loss, where s>1.
In some examples, the uncertainty quantification calibration circuitry 102 includes an optimization circuitry 118 to calibrate the regression (prediction) model circuitry 104 using the LFINAL calculation function results (e.g., during training of the model) for an increased robustness of predictions.
In some examples, the uncertainty quantification calibration circuitry 102 includes means for instantiating a regression model. For example, the means for instantiating a regression model may be implemented by regression (prediction) model circuitry 104. In some examples, the regression (prediction) model circuitry 104 may be instantiated by processor circuitry such as the example processor circuitry 312 of
In some examples, the uncertainty quantification calibration circuitry 102 includes means for instantiating one or more artificial neural networks (ANNs) (e.g., a deep neural network (DNN)) that include nodes, layers, weights, etc. to be utilized to train the regression model. For example, the means for instantiating the one or more ANNs may be implemented by neural network architecture circuitry 108. In some examples, the neural network architecture circuitry 108 may be instantiated by processor circuitry such as the example processor circuitry 312 of
In some examples, the uncertainty quantification calibration circuitry 102 includes means for calculating a total certainty loss for the regression model circuitry's 104 prediction that includes a loss attributed to an error aligned uncertainty calibration (EaUC). For example, the means for calculating a total certainty loss for the regression model circuitry's 104 prediction that includes a loss attributed to an error aligned uncertainty calibration (EaUC) may be implemented by loss calculation circuitry 110. In some examples, the loss calculation circuitry 110 may be instantiated by processor circuitry such as the example processor circuitry 312 of
In some examples, the uncertainty quantification calibration circuitry 102 includes means for calculating the counts of samples of each classification type. For example, the means for calculating the counts of samples of each classification type may be implemented by sample classification counting and calculation circuitry 112. In some examples, the sample classification counting and calculation circuitry 112 may be instantiated by processor circuitry such as the example processor circuitry 312 of
In some examples, the uncertainty quantification calibration circuitry 102 includes means for calculating the uncertainty calibration loss (LEaUC). For example, the means for calculating the uncertainty calibration loss (LEaUC) may be implemented by LEaUC calculation circuitry 114. In some examples, the LEaUC calculation circuitry 114 may be instantiated by processor circuitry such as the example processor circuitry 312 of
In some examples, the uncertainty quantification calibration circuitry 102 includes means for calculating the final loss (LFINAL) from the combined results of the standard negative log likelihood loss (LNLL) and the LEaUC. For example, the means for calculating the final loss (LFINAL) from the combined results of the standard negative log likelihood loss (LNLL) and the LEaUC may be implemented by LFINAL calculation circuitry 116. In some examples, the LFINAL calculation circuitry 116 may be instantiated by processor circuitry such as the example processor circuitry 312 of
In some examples, the uncertainty quantification calibration circuitry 102 includes means for calibrating the regression (prediction) model circuitry 104 using the LFINAL calculation function results (e.g., during training of the model) for an increased robustness of predictions. For example, the means for calibrating the regression (prediction) model circuitry 104 may be implemented by optimization circuitry 118. In some examples, the optimization circuitry 118 may be instantiated by processor circuitry such as the example processor circuitry 312 of
While an example manner of implementing the uncertainty quantification calibration circuitry 102 is illustrated in
A flowchart representative of example hardware logic circuitry, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the uncertainty quantification calibration circuitry 102 of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., as portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc., in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and/or stored on separate computing devices, wherein the parts when decrypted, decompressed, and/or combined form a set of machine executable instructions that implement one or more operations that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc., in order to execute the machine readable instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example operations of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc., may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, or (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” object, as used herein, refers to one or more of that object. The terms “a” (or “an”), “one or more”, and “at least one” are used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., the same entity or object. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
The machine readable instructions and/or operations 200 of
At block 204, the example LEaUC calculation circuitry 114 calculates the trainable uncertainty calibration loss (LEaUC) with the calculated counts of samples of each of the accuracy-certainty classification categories. In some examples, the LEaUC calculation circuitry 114 uses the LEaUC calculation function illustrated in Equation 4. In other examples, the LEaUC calculation circuitry 114 uses the LEaUC calculation function illustrated in Equation 2.
At block 206, the example LFINAL calculation circuitry 116 calculates the final differentiable loss value. In some examples, the LFINAL calculation circuitry 116 uses the LFINAL calculation function illustrated in Equation 6. In other examples, the LFINAL calculation circuitry 116 uses the LEaUC calculation function illustrated in Equation 7.
At block 208, the optimization circuitry 118 calibrates the prediction model (e.g., regression model) using the calculated final differentiable loss value. At this point the process concludes.
The processor platform 300 of the illustrated example includes processor circuitry 312. The processor circuitry 312 of the illustrated example is hardware. For example, the processor circuitry 312 can be implemented by one or more integrated circuits, logic circuits, FPGAs, microprocessors, CPUs, GPUs, DSPs, and/or microcontrollers from any desired family or manufacturer. The processor circuitry 312 may be implemented by one or more semiconductor based (e.g., silicon based) devices. In this example, the processor circuitry 312 implements the example uncertainty quantification calibration circuitry 102, the example regression model circuitry 104, the example neural network architecture circuitry 108, the example loss calculation circuitry 110, the example sample classification counting and calculation circuitry 112, the example LEaUC calculation circuitry 114, the example LFINAL circuitry 116, and the example optimization circuitry 118.
The processor circuitry 312 of the illustrated example includes a local memory 313 (e.g., a cache, registers, etc.). The processor circuitry 312 of the illustrated example is in communication with a main memory including a volatile memory 314 and a non-volatile memory 316 by a bus 318. The volatile memory 314 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of RAM device. The non-volatile memory 316 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 314, 316 of the illustrated example is controlled by a memory controller 317.
The processor platform 300 of the illustrated example also includes interface circuitry 320. The interface circuitry 320 may be implemented by hardware in accordance with any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, a PCI interface, and/or a PCIe interface.
In the illustrated example, one or more input devices 322 are connected to the interface circuitry 320. The input device(s) 322 permit(s) a user to enter data and/or commands into the processor circuitry 312. The input device(s) 322 can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, an isopoint device, and/or a voice recognition system.
One or more output devices 324 are also connected to the interface circuitry 320 of the illustrated example. The output devices 324 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or speaker. The interface circuitry 320 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip, and/or graphics processor circuitry such as a GPU.
The interface circuitry 320 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) by a network 326. The communication can be by, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-site wireless system, a cellular telephone system, an optical connection, etc.
The processor platform 300 of the illustrated example also includes one or more mass storage devices 328 to store software and/or data. Examples of such mass storage devices 328 include magnetic storage devices, optical storage devices, floppy disk drives, HDDs, CDs, Blu-ray disk drives, redundant array of independent disks (RAID) systems, solid state storage devices such as flash memory devices, and DVD drives.
The machine executable instructions 332, which may be implemented by the machine readable instructions of
The cores 402 may communicate by an example bus 404. In some examples, the bus 404 may implement a communication bus to effectuate communication associated with one(s) of the cores 402. For example, the bus 404 may implement at least one of an Inter-Integrated Circuit (I2C) bus, a Serial Peripheral Interface (SPI) bus, a PCI bus, or a PCIe bus. Additionally or alternatively, the bus 404 may implement any other type of computing or electrical bus. The cores 402 may obtain data, instructions, and/or signals from one or more external devices by example interface circuitry 406. The cores 402 may output data, instructions, and/or signals to the one or more external devices by the interface circuitry 406. Although the cores 402 of this example include example local memory 420 (e.g., Level 1 (L1) cache that may be split into an L1 data cache and an L1 instruction cache), the microprocessor 400 also includes example shared memory 410 that may be shared by the cores (e.g., Level 2 (L2) cache) for high-speed access to data and/or instructions. Data and/or instructions may be transferred (e.g., shared) by writing to and/or reading from the shared memory 410. The local memory 420 of each of the cores 402 and the shared memory 410 may be part of a hierarchy of storage devices including multiple levels of cache memory and the main memory (e.g., the main memory 314, 316 of
Each core 402 may be referred to as a CPU, DSP, GPU, etc., or any other type of hardware circuitry. Each core 402 includes control unit circuitry 414, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 416, a plurality of registers 418, the L1 cache 420, and an example bus 422. Other structures may be present. For example, each core 402 may include vector unit circuitry, single instruction multiple data (SIMD) unit circuitry, load/store unit (LSU) circuitry, branch/jump unit circuitry, floating-point unit (FPU) circuitry, etc. The control unit circuitry 414 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 402. The AL circuitry 416 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 402. The AL circuitry 416 of some examples performs integer based operations. In other examples, the AL circuitry 416 also performs floating point operations. In yet other examples, the AL circuitry 416 may include first AL circuitry that performs integer based operations and second AL circuitry that performs floating point operations. In some examples, the AL circuitry 416 may be referred to as an Arithmetic Logic Unit (ALU). The registers 418 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry 416 of the corresponding core 402. For example, the registers 418 may include vector register(s), SIMD register(s), general purpose register(s), flag register(s), segment register(s), machine specific register(s), instruction pointer register(s), control register(s), debug register(s), memory management register(s), machine check register(s), etc. The registers 418 may be arranged in a bank as shown in
Each core 402 and/or, more generally, the microprocessor 400 may include additional and/or alternate structures to those shown and described above. For example, one or more clock circuits, one or more power supplies, one or more power gates, one or more cache home agents (CHAs), one or more converged/common mesh stops (CMSs), one or more shifters (e.g., barrel shifter(s)) and/or other circuitry may be present. The microprocessor 400 is a semiconductor device fabricated to include many transistors interconnected to implement the structures described above in one or more integrated circuits (ICs) contained in one or more packages. The processor circuitry may include and/or cooperate with one or more accelerators. In some examples, accelerators are implemented by logic circuitry to perform certain tasks more quickly and/or efficiently than can be done by a general purpose processor. Examples of accelerators include ASICs and FPGAs such as those discussed herein. A GPU or other programmable device can also be an accelerator. Accelerators may be on-board the processor circuitry, in the same chip package as the processor circuitry and/or in one or more separate packages from the processor circuitry.
More specifically, in contrast to the microprocessor 400 of
In the example of
The interconnections 510 of the illustrated example are conductive pathways, traces, vias, or the like that may include electrically controllable switches (e.g., transistors) whose state can be changed by programming (e.g., using an HDL instruction language) to activate or deactivate one or more connections between one or more of the logic gate circuitry 508 to program desired logic circuits.
The storage circuitry 512 of the illustrated example is structured to store result(s) of the one or more of the operations performed by corresponding logic gates. The storage circuitry 512 may be implemented by registers or the like. In the illustrated example, the storage circuitry 512 is distributed amongst the logic gate circuitry 508 to facilitate access and increase execution speed.
The example FPGA circuitry 500 of
Although
In some examples, the processor circuitry 312 of
A block diagram illustrating an example software distribution platform 605 to distribute software such as the example machine readable instructions 332 of
The performance of the apparatus and method to calibrate error aligned uncertainty for regression and continuous structured prediction tasks/optimizations is discussed below. The performance of both the continuous structured prediction and the regression tasks were evaluated using publicly available data sets.
The Error aligned Uncertainty Calibration (EaUC) loss benefits regression models by improving the quality of predictive uncertainty. The calibration method described was adapted to a more challenging continuous structured prediction task, vehicle motion prediction. The Shifts vehicle motion prediction dataset and benchmark, collected by the Yandex Self Driving Group, was utilized because it is a real-world task and representative of an actual industrial application. In this task, distributional shift is ubiquitous, and the data is affected by real, 'in-the-wild' distributional shifts that pose challenges for uncertainty estimation.
Shifts Dataset has data collected from six geographical locations, three seasons, three times of day, and four weather conditions to evaluate the quality of uncertainty under distributional shift. Currently it is the largest vehicle motion prediction dataset, containing 600,000 scenes. It consists of both in-distribution and shifted datasets.
In the Shifts benchmark, optimization is done based on the NLL objective, and results are reported for two baseline architectures, which are the stochastic Behavioral Cloning (BC) model and the Deep Imitative Model (DIM). The results are reported incorporating the 'Error Aligned Uncertainty Calibration' loss LEaUC as a secondary loss in the Shifts pipeline as shown in Equation 6.
The aim is to learn distributions capturing uncertainty during training to better estimate uncertainty during inference through sampling, and to predict trajectories for the next 5 seconds from data collected at a 5 Hz sampling rate, which makes the prediction length 25 timesteps.
During training, for each of the BC and DIM models, the density estimator (likelihood model) is generated by teacher-forcing (e.g., from the distribution of ground truth trajectories). The model is trained with the AdamW optimizer with a learning rate (LR) of 1e-4, using a cosine annealing LR schedule with 1 epoch warmup, and gradient clipping at 1. Training is stopped after 100 epochs in each experiment.
During inference, Robust Imitative Planning is applied. Sampling is applied on the likelihood model considering a predetermined number of predictions G=10. The top D=5 predictions of the model (or of multiple models in the case of ensembles) are selected according to their log likelihood. The predictive performance of the model using the weightedADE metric is shown. The quality of the relative weighting of the D trajectories with their corresponding normalized per-trajectory confidence scores c̃_d, computed by applying softmax to the log likelihood scores for each prediction, is assessed by calculating the weightedADE metric:
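The weightedADE equation is not reproduced above; one formulation consistent with the description (a confidence-weighted average of the per-trajectory ADEs, using softmax-normalized confidence scores) is sketched below as an assumption:

    import numpy as np

    def weighted_ade(trajectories, log_likelihoods, gt):
        # trajectories: D predicted trajectories, each of shape (T, 2); gt: ground truth (T, 2)
        ll = np.asarray(log_likelihoods, dtype=float)
        c_tilde = np.exp(ll - ll.max())
        c_tilde /= c_tilde.sum()                        # softmax over the D predictions
        ades = np.array([np.linalg.norm(np.asarray(t) - np.asarray(gt), axis=-1).mean()
                         for t in trajectories])
        return float(np.sum(c_tilde * ades))            # weightedADE = sum_d c~_d * ADE(y^(d))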
The joint quality assessment of uncertainty and robustness is achieved using both error retention curves and F1-weightedADE retention curves. The error metric is weightedADE, and the retention fraction is based on the per-prediction uncertainty score U in the retention curves. Mean averaging is applied while computing U based on the per-plan log-likelihoods as well as for the aggregation of ensemble results.
The secondary loss incentivizes the model to align the uncertainty with the average displacement error (ADE) while training the model. Experiments are conducted by setting β (see Equation 7) to 200, and adeth and uth to 0.8 and 0.6, respectively, for both the BC and DIM models.
The tanh function is applied as the bounding function for the robustness measure ade after scaling it with the weight x (see Equation 5) to make the values applicable to the bounding function. x is set to 0.5 (x=0.5) so that samples with an ADE below 1.6 are classified as accurate. In the F1-retention evaluations, the acceptable prediction threshold is selected as 1.6 as well.
The uncertainty metric is the confidence value based on the log likelihood. To get a meaningful representation of uncertainty in the loss, likelihood scores were clipped to the 0 to 100 range (numbers <0 set to eps and numbers >100 set to 100). The confidence is then normalized to the [0, 1] range, and the output is directly used as the uncertainty measure (c_i in Equation 5).
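A minimal sketch of the described clipping and normalization of the likelihood scores (the eps value is an assumption):

    import numpy as np

    def normalized_confidence(likelihood_scores, eps=1e-8, max_val=100.0):
        # Clip to the [eps, 100] range, then normalize to [0, 1] for use as the
        # certainty measure c_i in the calibration loss.
        clipped = np.clip(np.asarray(likelihood_scores, dtype=float), eps, max_val)
        return clipped / max_val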
R-AUC decreases, and F1-AUC and F1@95% increase, for both models on the Full, In-distribution, and Shifted datasets with the LEaUC loss, which indicates better calibration performance using all three metrics. The example apparatus and method disclosed herein to calibrate error aligned uncertainty for regression and continuous structured prediction tasks/optimizations outperform the results on the two baselines, which indicates the approach disclosed herein provides well-calibrated uncertainties.
In addition to improving the quality of uncertainty, the approach to calculate calibration loss herein improves the model performance by reducing the weightedADE by 1.69% and 4.69% for BC and DIM, respectively.
weightedADE is observed to be higher for Shifted dataset compared to In-distribution dataset, which indicates that error is higher for out-of-distribution data.
Setting the accurate prediction threshold as 1.6, for the binary classification of samples as accurate and inaccurate, AUROC increases from 0.763 to 0.813, and from 0.761 to 0.822, when LEaUC is incorporated into the BC and DIM models, respectively (see
Impact of Assigning Higher Weights to the Class of Accurate and Certain Samples (LC) in the EaUC Loss:
In safety-critical model prediction scenarios, it is important to have certainty in predictions when the predictions are accurate.
BC-EaUC/DIM-EaUC and BC-EaUC*/DIM-EaUC* denote the results according to Equation 5 and according to Equation 7, respectively. BC-EaUC* and DIM-EaUC* provide better performance in terms of robustness (weightedADE) and model calibration (R-AUC) compared to BC-EaUC and DIM-EaUC. Thus, experiments reported in
Additionally, even though BC-EaUC and DIM-EaUC do not provide improved robustness (weightedADE) compared to their corresponding baseline performances (BC and DIM in
The disclosed method herein was evaluated on UCI regression datasets. A Bayesian neural network (BNN) is used with Monte Carlo dropout approximate Bayesian inference. In this setup, the neural network has two fully-connected hidden layers with 100 neurons each and ReLU activations. A dropout layer with a probability of 0.5 is used after each hidden layer, with 20 Monte Carlo samples for approximate Bayesian inference. The optimal hyperparameters for each dataset are found using Bayesian optimization with HyperBand, and the models are trained with an SGD optimizer and a batch size of 128. The predictive variance from the Monte Carlo forward passes is used as the uncertainty measure within the error aligned uncertainty calibration (EaUC) loss.
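A minimal sketch of the described Monte Carlo dropout setup (the hidden layer sizes, dropout probability, and number of Monte Carlo samples follow the text; the input/output dimensions, names, and remaining details are assumptions):

    import torch
    import torch.nn as nn

    class MCDropoutRegressor(nn.Module):
        # Two fully-connected hidden layers of 100 neurons with ReLU activations and
        # a dropout layer (p = 0.5) after each hidden layer.
        def __init__(self, in_dim, out_dim=1, p=0.5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 100), nn.ReLU(), nn.Dropout(p),
                nn.Linear(100, 100), nn.ReLU(), nn.Dropout(p),
                nn.Linear(100, out_dim))

        def forward(self, x):
            return self.net(x)

    def mc_predict(model, x, n_samples=20):
        # Keep dropout active at inference time and use the predictive variance of
        # the Monte Carlo forward passes as the uncertainty measure.
        model.train()
        preds = torch.stack([model(x) for _ in range(n_samples)])
        return preds.mean(dim=0), preds.var(dim=0)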
From the foregoing, it will be appreciated that example systems, methods, apparatus, and articles of manufacture have been disclosed that calibrate error aligned uncertainty for regression and continuous structured prediction tasks/optimizations. The disclosed systems, methods, apparatus, and articles of manufacture improve the efficiency of using a computing device by improving the calibration of an uncertainty prediction model to make the model more robust. The disclosed systems, methods, apparatus, and articles of manufacture are accordingly directed to one or more improvement(s) in the operation of a machine such as a computer or other electronic and/or mechanical device.
Further examples and combinations thereof include the following:
Example 1 includes an apparatus, comprising a prediction model, at least one memory, instructions, and processor circuitry to at least one of execute or instantiate the instructions to calculate a count of samples corresponding to an accuracy-certainty classification category, calculate a trainable uncertainty calibration loss value based on the calculated count, calculate a final differentiable loss value based on the trainable uncertainty calibration loss value, and calibrate the prediction model with the final differentiable loss value.
Example 2 includes the apparatus of example 1, wherein the accuracy-certainty classification category contains one of accurate and certain samples, inaccurate and certain samples, accurate and uncertain samples, or inaccurate and uncertain samples.
Example 3 includes the apparatus of example 1, wherein the count of samples corresponding to the accuracy-certainty classification category is determined using a regression model.
Example 4 includes the apparatus of example 1, wherein a standard negative log likelihood loss is calculated as a primary loss value.
Example 5 includes the apparatus of example 4, wherein the standard negative log likelihood loss is added to the trainable uncertainty calibration loss to calculate the final differentiable loss value.
Example 6 includes the apparatus of example 1, wherein a robustness score is calculated and used to calibrate the prediction model with the final differentiable loss value.
Example 7 includes the apparatus of example 6, wherein the robustness score is calculated using an Average Displacement Error (ADE).
Example 8 includes a non-transitory computer readable medium comprising instructions that, when executed, cause a machine to at least calculate a count of samples corresponding to an accuracy-certainty classification category, calculate a trainable uncertainty calibration loss value based on the calculated count, calculate a final differentiable loss value based on the trainable uncertainty calibration loss value, and calibrate a prediction model with the final differentiable loss value.
Example 9 includes the non-transitory computer readable medium of example 8, wherein the accuracy-certainty classification category contains one of accurate and certain samples, inaccurate and certain samples, accurate and uncertain samples, or inaccurate and uncertain samples.
Example 10 includes the non-transitory computer readable medium of example 8, wherein the count of samples corresponding to the accuracy-certainty classification category is determined using a regression model.
Example 11 includes the non-transitory computer readable medium of example 8, wherein a standard negative log likelihood loss is calculated as a primary loss value.
Example 12 includes the non-transitory computer readable medium of example 11, wherein the standard negative log likelihood loss is added to the trainable uncertainty calibration loss to calculate the final differentiable loss value.
Example 13 includes the non-transitory computer readable medium of example 8, wherein a robustness score is calculated and used to calibrate the prediction model with the final differentiable loss value.
Example 14 includes the non-transitory computer readable medium of example 13, wherein the robustness score is calculated using an Average Displacement Error (ADE).
Example 15 includes a method for uncertainty calibration, the method comprising calculating a count of samples corresponding to an accuracy-certainty classification category, calculating a trainable uncertainty calibration loss value based on the calculated count, calculating a final differentiable loss value based on the trainable uncertainty calibration loss value, and calibrating a prediction model with the final differentiable loss value.
Example 16 includes the method of example 15, wherein the accuracy-certainty classification category contains one of accurate and certain samples, inaccurate and certain samples, accurate and uncertain samples, or inaccurate and uncertain samples.
Example 17 includes the method of example 15, wherein the count of samples corresponding to the accuracy-certainty classification category is determined using a regression model.
Example 18 includes the method of example 15, wherein a standard negative log likelihood loss is calculated as a primary loss value.
Example 19 includes the method of example 18, wherein the standard negative log likelihood loss is added to the trainable uncertainty calibration loss to calculate the final differentiable loss value.
Example 20 includes the method of example 15, wherein a robustness score is calculated and used to calibrate the prediction model with the final differentiable loss value.
Example 21 includes the method of example 20, wherein the robustness score is calculated using an Average Displacement Error (ADE).
Although certain example systems, methods, apparatus, and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all systems, methods, apparatus, and articles of manufacture fairly falling within the scope of the claims of this patent.
The following claims are hereby incorporated into this Detailed Description by this reference, with each claim standing on its own as a separate embodiment of the present disclosure.