The embodiments relate generally to natural language processing and machine learning systems, and more specifically to an online conformal prediction method.
Machine learning systems have been widely used in decision making, for example, outputting predictions using past experience such as weather forecasting, power usage forecasting, and/or the like. In high-stakes decision-making tasks, machine learning systems should not only provide predictions, but also quantify a certainty level of these predictions. Conformal prediction is one of the tools for quantifying uncertainty in predictions: it generates prediction sets that associate each input with a set of candidate labels, such as prediction intervals for regression and label sets for classification. In an online setting where the data distribution may vary arbitrarily over time, online conformal prediction techniques may leverage regret minimization algorithms to learn prediction sets with approximately valid coverage and small regret. However, for uncertainty quantification, traditional regret minimization can be insufficient for handling changing environments, where performance guarantees may be desired not only over the full time horizon but also within all (sub-)intervals of time.
Therefore, there is a need for improving online conformal prediction.
Embodiments of the disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures, wherein showings therein are for purposes of illustrating embodiments of the disclosure and not for purposes of limiting the same.
As used herein, the term “network” may comprise any hardware or software-based framework that includes any artificial intelligence network or system, neural network or system and/or any training or learning models implemented thereon or therewith.
As used herein, the term “module” may comprise a hardware or software-based framework that performs one or more functions. In some embodiments, the module may be implemented on one or more neural networks.
Conformal prediction is one of the tools for quantifying uncertainties for predictions, to generate prediction sets that associate each input with a set of candidate labels, such as prediction intervals for regression, and label sets for classification. For example, consider training examples (x, y) ∈ 𝒳 × 𝒴 and the task of predicting a label y ∈ 𝒴 from an input x ∈ 𝒳. A conformal predictor generates a prediction set C: 𝒳 → 2^𝒴, which is a set-valued function that maps any input x to a set of predicted labels C(x) ⊂ 𝒴. Two prevalent examples are prediction intervals for regression, in which 𝒴 = ℝ and C(x) is an interval, and label prediction sets for (m-class) classification, in which 𝒴 = [m] and C(x) is a subset of [m]. Prediction sets are a popular approach to quantify the uncertainty associated with the point prediction ŷ=f(x) of a black-box model, e.g., in weather forecasts, power usage forecasts, market forecasts, and/or the like. Conformal prediction may be applied to compute the best and worst scenarios to help in making sensible decisions.
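For example, in the regression case above, a prediction set is simply an interval around a point prediction, and coverage of a label can be checked directly. The sketch below uses a hypothetical base model f and radius s for illustration; it is not the full conformal procedure:

```python
# Sketch of a regression prediction set C(x) = [f(x) - s, f(x) + s]
# (hypothetical base model f and radius s; not the full conformal procedure).

def predict_set(f, x, s):
    """Interval prediction set of radius s around the point prediction f(x)."""
    y_hat = f(x)
    return (y_hat - s, y_hat + s)

def covers(interval, y):
    """True if the label y falls inside the prediction set."""
    lo, hi = interval
    return lo <= y <= hi

f = lambda x: 2.0 * x              # assumed black-box point predictor
interval = predict_set(f, x=1.5, s=0.5)
print(interval, covers(interval, 3.1))
```

A larger radius s always yields a wider interval, which is the nestedness property used throughout the disclosure.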
Specifically, when data arrives in a sequential order, conformal prediction may be applied in an online fashion to the sequentially received data input. At each time step, the online conformal predictor outputs a prediction set parameterized by a single radius parameter that controls the size of the set. After receiving the true label, the predictor adjusts this parameter adaptively via regret minimization techniques, such as Online Gradient Descent (OGD) on a certain quantile loss over the radius. These methods are shown to achieve an empirical coverage frequency close to 1−α, regardless of the data distribution, where 1−α is the target coverage level.
While traditional regret minimization techniques achieve coverage and regret guarantees, they may fall short in more dynamic environments where strong performance is desired not just over the entire time horizon (as captured by the regret), but also within every sub-interval of time. For example, if the data distribution shifts abruptly a few times, strong performance is desired within each contiguous interval between two consecutive shifts, in addition to over the entire horizon. Some existing systems adopt the Fully Adaptive Conformal Inference (FACI) algorithm, a meta-algorithm that aggregates multiple experts (base learners) that are OGD instances with different learning rates. However, these methods may not be best suited for achieving such interval-based guarantees, as each expert still runs over the full time horizon and is not truly localized. In other words, regret minimization is limited because the regret measures performance over the entire time horizon [T], which may be insufficient when the algorithm encounters changing environments. For example, define the “true radius” of the prediction set at time t as St:=inf{s∈ℝ: Yt∈Ĉt(Xt, s)}, i.e., the smallest radius s such that the prediction set Ĉt covers Yt. If St=1 for 1≤t≤T/2 and St=100 for T/2<t≤T, then achieving small regret on all (sub-)intervals of size T/2 is a much stronger guarantee than achieving small regret over [T]. This is also reflected in the fact that FACI achieves a near-optimal O(√k) regret within intervals of a fixed length k, but is unable to achieve this over all lengths k∈[T] simultaneously. For this reason, localized guarantees over all intervals simultaneously are desired, to prevent worst-case scenarios such as significant miscoverage or a large radius within a specific interval.
In view of the need for learning prediction sets with a valid coverage and small regret in response to the online setting, embodiments described herein provide a Strongly Adaptive Online Conformal Prediction (SAOCP) framework that manages multiple experts each for predicting a respective prediction radius, while each expert only operates on its own active interval. An aggregated prediction radius may be computed as a weighted sum of the predicted radii, each weighted by the respective probability that the respective expert is active at the time step. Specifically, each expert may be operated with a Scale-Free OGD (SF-OGD) method to update the generated predicted radius. A base conformal predictor may then generate a prediction set using the aggregated radius at the time step.
In this way, the SAOCP framework achieves a near-optimal strongly adaptive regret of O(√k) over all intervals of length k simultaneously, and both SAOCP and SF-OGD achieve approximately valid coverage. Accuracy of prediction is thereby largely improved. Furthermore, the online conformal prediction method consistently attains better coverage and smaller prediction sets on real-world tasks, such as time series forecasting and image classification under distribution shift, as further illustrated in
In one embodiment, each radius predictor 110a-n may implement a Scale-Free OGD (SF-OGD) that decays its effective learning rate based on cumulative past gradient norms, as further described in relation to
In one embodiment, online samples (X1, Y1), . . . , (XT, YT) may arrive sequentially. At each time step t∈[T], the online sample of input Xt 105a and the corresponding output label Yt 105b may be observed. A new radius predictor may be instantiated at time t with an active interval [t, t+L(t)−1], where L(t) is its lifetime:
and g≥1 is a multiplier for the lifetime of each expert 110a-n. Therefore, at time t, an active set 130 of radius predictors 110a-b may be determined. It is noted that the number of radius predictors 110a-n, and/or the number of active radius predictors 110a-b at time t, are for illustrative purposes only, and any number of (active) radius predictors may be engaged. It is also noted that at most g⌊log2 t⌋ experts are active at any time t under choice (1), granting the SAOCP framework 100 a total runtime of O(T log T) for any g=Θ(1), which improves system efficiency.
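The active-set bookkeeping can be sketched as follows. Since the lifetime rule of Eq. (1) is not reproduced in this excerpt, the dyadic rule below (L(t) equals g times the largest power of two dividing t) is an assumption for illustration; it yields the stated logarithmically sized active set:

```python
def lifetime(t, g=8):
    """Assumed dyadic lifetime rule: L(t) = g * (largest power of 2 dividing t)."""
    n = 0
    while t % (2 ** (n + 1)) == 0:
        n += 1
    return g * (2 ** n)

def active_set(t, g=8):
    """Experts i <= t whose active interval [i, i + L(i) - 1] still contains t."""
    return [i for i in range(1, t + 1) if i + lifetime(i, g) - 1 >= t]

experts = active_set(100)
print(len(experts))   # a small, O(log t)-sized set rather than all 100 experts
```

Experts started at times divisible by a high power of two live longer, so long and short active intervals coexist at every time step.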
In one embodiment, at each time t∈[T], each radius predictor 110a-b in the active set 130 may generate a predicted radius parameter 116a-b, respectively, at the time step, e.g., ŝi,t∈ℝ, where i indexes the ith active radius predictor. Then, at any time t, the predicted radius ŝt 116 is obtained by aggregating the predictions of the active radius predictors 110a-b:
where the weights {pi,t}i∈[t] rely on the {wi,t}i∈[t] computed by the coin betting scheme, as further described at lines 4-6 of Alg. 1 in FIG. 2A.
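The aggregation step can be sketched as follows. The coin-betting weights wi,t themselves come from Alg. 1 and are treated here as given inputs; normalizing them into probabilities pi,t is an assumed standard form:

```python
def aggregate_radius(radii, weights):
    """Aggregated radius s_t = sum_i p_{i,t} * s_{i,t}, with probabilities
    p_{i,t} obtained by normalizing the (assumed nonnegative) expert weights."""
    total = sum(weights)
    if total <= 0:
        # fall back to a uniform average if every expert has zero weight
        probs = [1.0 / len(radii)] * len(radii)
    else:
        probs = [w / total for w in weights]
    return sum(p * s for p, s in zip(probs, radii))

print(aggregate_radius([1.0, 3.0], [1.0, 3.0]))  # the heavier expert dominates
```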
Given the predicted radius ŝt, the base conformal predictor 150 generates a prediction set Ĉt=Ĉt(Xt) 118 based on the current input Xt 105a and past observations {(Xi, Yi)}i≤t−1, before observing the true label Yt. For example, over the time interval, the family (Ĉt)t∈[T] is generated through one or more base predictors 150, for example, {circumflex over (f)}t (e.g., {circumflex over (f)}t=f can be a fixed pretrained model). In regression, the base conformal predictor 150 builds on a base predictor {circumflex over (f)}t: 𝒳→ℝ, and chooses Ĉt(Xt, s):=[{circumflex over (f)}t(Xt)−s, {circumflex over (f)}t(Xt)+s] to be a prediction interval around {circumflex over (f)}t(Xt), in which case the radius s is the half-width of the interval. In one example, the Ĉt 118 are nested sets in the sense that Ĉt(x, s)⊆Ĉt(x, s′) for all x∈𝒳 and s≤s′, so that a larger radius always yields a larger set.
An example property of the prediction set 118 is to achieve valid coverage: ℙ[Yt∈Ĉt(Xt)]=1−α, where 1−α∈(0,1) is the target coverage level pre-determined by the user. Example choices for α include {0.1, 0.05}, which correspond to {90%, 95%} target coverage, respectively.
In one embodiment, the SAOCP framework 100 adopts online learning techniques to learn the predicted radius ŝt based on past observations. For example, defining the “true radius” St:=inf{s∈ℝ: Yt∈Ĉt(Xt, s)} (i.e., the smallest radius s such that Ĉt covers Yt), the (1−α)-quantile loss 120 between St and any predicted radius ŝ by the active radius predictors 110a-b is computed by:
It is assumed that all true radii are bounded: St∈[0, D] almost surely for all t∈[T].
Therefore, after observing input Xt 105a, predicting the radius ŝt 116, and observing the label Yt 105b (and hence computing the true radius St), the gradient ∇ℓ(t)(ŝt) of the quantile loss 120 can be computed as:
where errt is the indicator of miscoverage at time t (errt=1 if Ĉt did not cover Yt). An Online Gradient Descent (OGD) step is then performed to obtain ŝt+1:
where η>0 is a learning rate, and the algorithm is initialized at some ŝ1∈ℝ. Update (3) increases the predicted radius if Ĉt did not cover Yt (errt=1), and decreases the radius otherwise. This makes intuitive sense as an approach for adapting the radius to recent observations.
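The quantile loss, its gradient, and update (3) described above can be sketched as follows (the specific numbers are illustrative only):

```python
def quantile_loss(s_hat, s_true, alpha):
    """(1 - alpha)-quantile (pinball) loss between the true radius S_t and s_hat."""
    diff = s_true - s_hat
    return max((1 - alpha) * diff, -alpha * diff)

def quantile_grad(s_hat, s_true, alpha):
    """Gradient in s_hat: alpha - err_t, where err_t = 1 on miscoverage
    (the predicted radius was too small, s_hat < S_t)."""
    err = 1.0 if s_hat < s_true else 0.0
    return alpha - err

def ogd_step(s_hat, s_true, alpha, eta):
    """Update (3): the radius grows after a miss and shrinks after a cover."""
    return s_hat - eta * quantile_grad(s_hat, s_true, alpha)

s = ogd_step(1.0, s_true=2.0, alpha=0.1, eta=0.5)   # miss: radius grows
print(s)
```

After a miss the radius grows by η(1−α); after a cover it shrinks by ηα, so the long-run miss frequency is driven toward α.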
As illustrated, the method 200 includes a number of enumerated steps, but aspects of the method 200 may include additional steps before, after, and in between the enumerated steps. In some aspects, one or more of the enumerated steps may be omitted or performed in a different order.
At step 202, from a memory (e.g., 420 in
For example, at each time step, a new radius predictor may be instantiated with an active interval, e.g., corresponding to line 2 of Alg. 1 in
In one implementation, for each online radius predictor (including the newly initialized radius predictor), a respective lifetime based on a current time instance is computed, e.g., according to Eq. (1). The active set of online radius predictors is then selected at the current time instance based on lifetimes of the set of online radius predictors from the current time instance, e.g., corresponding to line 3 of Alg. 1 in
At step 204, the active set of online radius predictors may generate a predicted radius (e.g., 116 in
In one embodiment, a prediction set (e.g., 118 in
In steps 206-218, a meta loss and per-expert losses are computed to update the experts (radius predictors). At step 206, a ground-truth radius may be computed based on a ground-truth prediction corresponding to the real-time input variable and a prediction set (e.g., 118 in
At step 208, a quantile loss (e.g., 120 in
For all the online radius predictors in the active set, the online radius predictors are trained based on the quantile loss at step 210. The trained respective online radius predictor may then generate a next predicted radius at step 212, e.g., according to line 11 of Alg. 1 in
At step 214, for each online radius predictor in the active set, a respective predictor quantile loss is computed between the ground-truth radius and a respective predicted radius from the respective online radius predictor according to the target coverage level, and then at step 216, a gradient is computed based on a difference between the quantile loss and the respective predictor quantile loss corresponding to the respective radius predictor, e.g., according to line 12 of Alg. 1 in
At step 218, parameters of the respective online radius predictor are updated based on the computed gradient, e.g., according to line 13 of Alg. 1 in
As illustrated, the method 300 includes a number of enumerated steps, but aspects of the method 300 may include additional steps before, after, and in between the enumerated steps. In some aspects, one or more of the enumerated steps may be omitted or performed in a different order.
At step 302, the real-time input variable (e.g., 105a in
At step 304, each respective online radius predictor (e.g., 110a-b in
At step 306, a respective ground-truth radius may be computed based on the ground-truth prediction and the respective prediction set, e.g., according to line 4 of Alg. 2 in
At step 308, for the radius predictor, a respective quantile loss may be computed between the respective ground-truth radius and the respective predicted radius according to the target coverage level, e.g., according to line 5 of Alg. 2 in
At step 310, the respective predicted radius may be updated for a next time instance based on the respective predicted radius at the current time instance and a gradient of the respective quantile loss, e.g., according to line 6 of Alg. 2 in
For example, the predicted radius for the next time step may be updated by:
Method 300 of the SF-OGD may be implemented as a strong regret minimization algorithm itself. In other words, SF-OGD can also be run independently (over [T]) as an algorithm for online conformal prediction to generate the prediction set at step 304 and then update the predicted radius at step 310 at each time step. On the quantile loss (3) (executed over the full horizon [T] with learning rate η=Θ(D); η=D/√3 is optimal), SF-OGD enjoys an anytime regret guarantee:
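The scale-free update behind this guarantee can be sketched as follows (assuming the standard SF-OGD form, in which the effective step size is divided by the cumulative past gradient norm; the constants are illustrative):

```python
import math

class SFOGD:
    """Scale-Free OGD on the quantile loss: the effective learning rate
    eta / sqrt(sum of past squared gradients) decays automatically."""

    def __init__(self, alpha, eta, s_init=0.0):
        self.alpha, self.eta, self.s = alpha, eta, s_init
        self.grad_sq_sum = 0.0

    def update(self, s_true):
        err = 1.0 if self.s < s_true else 0.0   # miscoverage indicator
        grad = self.alpha - err                  # pinball-loss gradient
        self.grad_sq_sum += grad ** 2
        self.s -= self.eta * grad / math.sqrt(self.grad_sq_sum)
        return self.s

ogd = SFOGD(alpha=0.1, eta=1.0)
for s_true in [2.0, 2.0, 2.0]:
    ogd.update(s_true)
print(ogd.s)  # the radius grows toward the true radius 2.0
```

Because the step size shrinks with accumulated gradient mass, no horizon-dependent tuning is needed, which is what makes the guarantee hold at any time.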
Further, the SAOCP method 200 in
The O(D√k) rate achieved by SAOCP is near-optimal for general online convex optimization problems, due to the standard regret lower bound Ω(D√k) over any fixed interval of length k.
Therefore, the SARegret guarantee of the SAOCP method 200 improves substantially over the traditional FACI algorithm (Gibbs & Candès, 2022), an extension of ACI. Concretely, the SARegret bound for the SAOCP method 200 holds simultaneously for all lengths k. By contrast, FACI only achieves SAReg(T, k)≤O(D²/η+ηk), where η>0 is their meta-algorithm learning rate. This can imply the same O(D√k) rate for a single k by optimizing η, but not for multiple values of k simultaneously.
Also, in terms of algorithm styles, while both SAOCP and FACI are meta-algorithms that maintain multiple experts (base algorithms), a main difference between them is that all experts in FACI differ in their learning rates and are all active over [T], whereas experts in SAOCP differ in their active intervals.
Memory 420 may be used to store software executed by computing device 400 and/or one or more data structures used during operation of computing device 400. Memory 420 may include one or more types of machine-readable media. Some common forms of machine-readable media may include floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
Processor 410 and/or memory 420 may be arranged in any suitable physical arrangement. In some embodiments, processor 410 and/or memory 420 may be implemented on a same board, in a same package (e.g., system-in-package), on a same chip (e.g., system-on-chip), and/or the like. In some embodiments, processor 410 and/or memory 420 may include distributed, virtualized, and/or containerized computing resources. Consistent with such embodiments, processor 410 and/or memory 420 may be located in one or more data centers and/or cloud computing facilities.
In some examples, memory 420 may include non-transitory, tangible, machine readable media that includes executable code that when run by one or more processors (e.g., processor 410) may cause the one or more processors to perform the methods described in further detail herein. For example, as shown, memory 420 includes instructions for a conformal prediction module 430 that may be used to implement and/or emulate the systems and models, and/or to implement any of the methods described further herein. The conformal prediction module 430 may receive input 440 such as an input training data (e.g., documents and/or photos) via the data interface 415 and generate an output 450 which may be a prediction set for conformal prediction models.
The data interface 415 may comprise a communication interface, a user interface (such as a voice input interface, a graphical user interface, and/or the like). For example, the computing device 400 may receive the input 440 (such as a training dataset) from a networked database via a communication interface. Or the computing device 400 may receive the input 440, such as a photo, a question, a sentence, an article, and a document, from a user via the user interface.
In some embodiments, the SAOCP module 430 is configured to provide an output 450 in the form of an online conformal prediction set (e.g., 118 in
In one embodiment, the SAOCP module 430 may store parameters and/or weights of the submodules 431 and 432. The SAOCP module 430 may further comprise processor-executable instructions to perform the method 200 illustrated in
In one embodiment, the SAOCP module 430 and one or more of its submodules 431 and 432 may be implemented via an artificial neural network. The neural network comprises a computing system that is built on a collection of connected units or nodes, referred as neurons. Each neuron receives an input signal and then generates an output by a non-linear transformation of the input signal. Neurons are often connected by edges, and an adjustable weight is often associated to the edge. The neurons are often aggregated into layers such that different layers may perform different transformations on the respective input and output transformed input data onto the next layer. Therefore, the neural network may be stored at memory 420 as a structure of layers of neurons, and parameters describing the non-linear transformation at each neuron and the weights associated with edges connecting the neurons.
In one embodiment, the neural-network-based SAOCP module 430 and one or more of its submodules 431 and 432 may be trained by updating the underlying parameters of the neural network based on a loss (e.g., the quantile loss computed in Eq. (3)). For example, the loss is a metric that evaluates how far away a neural network model's predicted output value is from its target output value (also referred to as the “ground-truth” value), e.g., the “truth output” label 105b in
Some examples of computing devices, such as computing device 400 may include non-transitory, tangible, machine readable media that include executable code that when run by one or more processors (e.g., processor 410) may cause the one or more processors to perform the processes of method. Some common forms of machine-readable media that may include the processes of method are, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
The user device 510, data vendor servers 545, 570 and 580, and the server 530 may communicate with each other over a network 560. User device 510 may be utilized by a user 540 (e.g., a driver, a system admin, etc.) to access the various features available for user device 510, which may include processes and/or applications associated with the server 530 to receive an output data anomaly report.
User device 510, data vendor server 545, and the server 530 may each include one or more processors, memories, and other appropriate components for executing instructions such as program code and/or data stored on one or more computer readable mediums to implement the various applications, data, and steps described herein. For example, such instructions may be stored in one or more computer readable media such as memories or data storage devices internal and/or external to various components of system 500, and/or accessible over network 560.
User device 510 may be implemented as a communication device that may utilize appropriate hardware and software configured for wired and/or wireless communication with data vendor server 545 and/or the server 530. For example, in one embodiment, user device 510 may be implemented as an autonomous driving vehicle, a personal computer (PC), a smart phone, laptop/tablet computer, wristwatch with appropriate computer hardware resources, eyeglasses with appropriate computer hardware (e.g., GOOGLE GLASS®), other type of wearable computing device, implantable communication devices, and/or other types of computing devices capable of transmitting and/or receiving data, such as an IPAD® from APPLE®. Although only one communication device is shown, a plurality of communication devices may function similarly.
User device 510 of
In various embodiments, user device 510 includes other applications 516 as may be desired in particular embodiments to provide features to user device 510. For example, other applications 516 may include security applications for implementing client-side security features, programmatic client applications for interfacing with appropriate application programming interfaces (APIs) over network 560, or other types of applications. Other applications 516 may also include communication applications, such as email, texting, voice, social networking, and IM applications that allow a user to send and receive emails, calls, texts, and other notifications through network 560. For example, the other application 516 may be an email or instant messaging application that receives a prediction result message from the server 530. Other applications 516 may include device interfaces and other display modules that may receive input and/or output information. For example, other applications 516 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 540 to view a prediction output.
User device 510 may further include database 518 stored in a transitory and/or non-transitory memory of user device 510, which may store various applications and data and be utilized during execution of various modules of user device 510. Database 518 may store user profile relating to the user 540, predictions previously viewed or saved by the user 540, historical data received from the server 530, and/or the like. In some embodiments, database 518 may be local to user device 510. However, in other embodiments, database 518 may be external to user device 510 and accessible by user device 510, including cloud storage systems and/or databases that are accessible over network 560.
User device 510 includes at least one network interface component 517 adapted to communicate with data vendor server 545 and/or the server 530. In various embodiments, network interface component 517 may include a DSL (e.g., Digital Subscriber Line) modem, a PSTN (Public Switched Telephone Network) modem, an Ethernet device, a broadband device, a satellite device and/or various other types of wired and/or wireless network communication devices including microwave, radio frequency, infrared, Bluetooth, and near field communication devices.
Data vendor server 545 may correspond to a server that hosts database 519 to provide training datasets, including Wikipedia, CommonCrawl, Open-Domain Question Answering datasets, to the server 530. The database 519 may be implemented by one or more relational database, distributed databases, cloud databases, and/or the like.
The data vendor server 545 includes at least one network interface component 526 adapted to communicate with user device 510 and/or the server 530. In various embodiments, network interface component 526 may include a DSL (e.g., Digital Subscriber Line) modem, a PSTN (Public Switched Telephone Network) modem, an Ethernet device, a broadband device, a satellite device and/or various other types of wired and/or wireless network communication devices including microwave, radio frequency, infrared, Bluetooth, and near field communication devices. For example, in one implementation, the data vendor server 545 may send asset information from the database 519, via the network interface 526, to the server 530.
The server 530 may be housed with the conformal prediction module 130 and its submodules described in
The database 532 may be stored in a transitory and/or non-transitory memory of the server 530. In one implementation, the database 532 may store data obtained from the data vendor server 545. In one implementation, the database 532 may store parameters of the conformal prediction module 130. In one implementation, the database 532 may store previously generated prediction output/prediction set, and the corresponding input feature vectors.
In some embodiments, database 532 may be local to the server 530. However, in other embodiments, database 532 may be external to the server 530 and accessible by the server 530, including cloud storage systems and/or databases that are accessible over network 560.
The server 530 includes at least one network interface component 533 adapted to communicate with user device 510 and/or data vendor servers 545, 570 or 580 over network 560. In various embodiments, network interface component 533 may comprise a DSL (e.g., Digital Subscriber Line) modem, a PSTN (Public Switched Telephone Network) modem, an Ethernet device, a broadband device, a satellite device and/or various other types of wired and/or wireless network communication devices including microwave, radio frequency (RF), and infrared (IR) communication devices.
Network 560 may be implemented as a single network or a combination of multiple networks. For example, in various embodiments, network 560 may include the Internet or one or more intranets, landline networks, wireless networks, and/or other appropriate types of networks. Thus, network 560 may correspond to small scale communication networks, such as a private or local area network, or a larger scale network, such as a wide area network or the Internet, accessible by the various components of system 500.
In some embodiments, the SAOCP framework 100 shown in
, where the base predictor {circumflex over (f)} uses the history Xt:=y1:t to predict H steps into the future, i.e., {circumflex over (f)}(Xt)={{circumflex over (f)}(h)(Xt)}h∈[H]={ŷt+h(h)}h∈[H], where ŷt+h(h) is a prediction for yt+h. Using {circumflex over (f)}(Xt), each SAOCP 610a-n may produce the prediction sets 612a-n as fixed-width prediction intervals:
where ŝt(h) 611a-n is predicted by an independent copy of the SAOCP framework 100 for each h∈[H] (so that there are H such algorithms running in parallel). The online setting may be formed using a standard rolling-window evaluation loop, wherein each batch consists of predicting all H intervals {Ĉt(h)}h∈[H], observing all H true values {yt+h}h∈[H], and moving to the next batch by setting t→t+H. For each h∈[H], only yt+h is evaluated against the one interval Ĉt(h)(Xt, ŝt(h)). After the evaluation is done, all pairs {(yt+k, ŷt+k(h))}k∈[H] are compared to update ŝt(h)→ŝt+H(h).
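The H-parallel arrangement can be sketched as follows, with hypothetical forecasts and a simplified per-horizon update standing in for each horizon's full SAOCP instance:

```python
# Sketch of the H-parallel setup (hypothetical base forecasts; each horizon h
# keeps its own radius s[h], maintained by its own independent learner).

H = 3
s = [0.5] * H                      # one radius per horizon

def intervals(y_hat, s):
    """Fixed-width interval per horizon: [y_hat[h] - s[h], y_hat[h] + s[h]]."""
    return [(y_hat[h] - s[h], y_hat[h] + s[h]) for h in range(H)]

def update_radius(s_h, y, y_hat_h, alpha=0.1, eta=0.25):
    """Placeholder per-horizon update: grow on a miss, shrink on a cover."""
    miss = abs(y - y_hat_h) > s_h
    return s_h + eta * ((1 - alpha) if miss else -alpha)

y_hat = [1.0, 1.1, 1.2]            # assumed H-step forecasts at time t
y_true = [1.2, 2.0, 1.1]           # later-observed true values
sets = intervals(y_hat, s)
s = [update_radius(s[h], y_true[h], y_hat[h]) for h in range(H)]
print(sets)
print(s)
```

Each horizon adapts its own radius, so a systematically harder horizon (e.g., further into the future) ends up with wider intervals.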
In one embodiment, each SAOCP 610a-n may employ base predictors (e.g., similar to 150 in
Example datasets for time series forecasting may include 5111 time series: the hourly (414 time series), daily (4227 time series), and weekly (359 time series) subsets of the M4 Competition, a dataset of time series from many domains including industries, demographics, environment, finance, and transportation (Makridakis et al., The m4 competition: Results, findings, conclusion and way forward, International Journal of Forecasting, 34(4):802-808, 2018); and NN5, a dataset of 111 time series of daily banking data (Ben Taieb et al., A review and comparison of strategies for multi-step ahead time series forecasting based on the nn5 forecasting competition. Expert Systems with Applications, 39(8):7067-7083, 2012). Each time series is normalized to lie in [0, 1].
In the example experiments, horizons H of 24, 30, and 26 for hourly, daily, and weekly data, are adopted, respectively. Each time series of length L is split into a training set of length L-120 with 80% for training the base predictor and 20% for initializing the UQ methods, and a test set of length 120 to test the UQ methods.
For each experiment, the following statistics are averaged across all time series: global coverage, median width, worst-case local coverage error, and strongly adaptive regret SAReg(T, k), referred to as SARegk. In all cases, an interval length of k=20 is used. The average mean absolute error (MAE) of each base predictor is also used as a performance metric, as shown in
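The worst-case local coverage error can be sketched as follows, assuming the natural definition consistent with the description above: the largest deviation of windowed empirical coverage from the 1−α target over all length-k windows:

```python
def local_coverage_error(covered, k, alpha):
    """Worst deviation of windowed empirical coverage from the 1 - alpha target.

    covered: list of 0/1 flags, covered[t] = 1 if Y_t fell inside the set C_t.
    """
    target = 1 - alpha
    worst = 0.0
    for start in range(len(covered) - k + 1):
        window = covered[start:start + k]
        worst = max(worst, abs(sum(window) / k - target))
    return worst

flags = [1] * 15 + [0] * 5   # a burst of misses at the end of the horizon
print(local_coverage_error(flags, k=5, alpha=0.1))
```

A method can have near-perfect global coverage while still scoring poorly on this metric, which is exactly the failure mode the interval-based guarantees target.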
Two regimes may be considered: sudden shifts where the corruption level alternates between 0 (the base test set) and 5, and gradual shifts where the corruption level increases in the order of {0, 1, . . . , 5}. 500 data points are randomly sampled from the input image 702a for each corruption level before changing to the next level.
The SAOCP framework 705 may generate a prediction set 708 for each data point in the input image 702a as follows. Let {circumflex over (f)}: ℝd→Δm be a classifier that outputs a probability distribution on the m-simplex. At each t, Ut˜Unif[0,1] is sampled and let
where π is the permutation that ranks {circumflex over (f)}(x) in decreasing order, π(ky)=y, and λ and kreg are regularization parameters designed to reduce the size of the prediction set. For TinyImageNet, λ=0.01 and kreg=20. For ImageNet, λ=0.01 and kreg=10.
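A score of this regularized, randomized form can be sketched as follows. Since the displayed formula is elided above, the RAPS-style score below (cumulative mass of higher-ranked classes, a randomized share of the label's own mass, and a rank penalty λ(rank − kreg)+) is an assumed standard form for illustration:

```python
import random

def raps_score(probs, y, u, lam=0.01, k_reg=5):
    """RAPS-style conformity score for label y (assumed standard form)."""
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    rank = order.index(y) + 1                       # rank of y, 1-indexed
    mass_above = sum(probs[j] for j in order[:rank - 1])
    return mass_above + u * probs[y] + lam * max(rank - k_reg, 0)

def prediction_set(probs, s, lam=0.01, k_reg=5, u=None):
    """Labels whose score is at most the radius s."""
    u = random.random() if u is None else u
    return {y for y in range(len(probs))
            if raps_score(probs, y, u, lam, k_reg) <= s}

probs = [0.6, 0.3, 0.08, 0.02]
print(prediction_set(probs, s=0.8, u=0.5))  # {0, 1}
```

The rank penalty discourages very large sets: once a label's rank exceeds k_reg, its score grows linearly in the rank, so low-probability tail classes are excluded unless the radius is large.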
When evaluating the UQ methods, the local coverage and prediction set size (PSS) of each method are considered using an interval length of k=100, comparing the local coverage to a target of 1−α, while the local PSS is compared to the 1−α empirical quantile of the oracle set sizes PSS*t=|{y: St(Xt, y)≤St(Xt, Yt)}|. These targets are the “best fixed” values in each window. The worst-case local coverage error LCE100 is also considered.
As shown in
There are multiple instances where all of SCP/NExCP/FACI fail to attain global coverage in (0.85, 0.95) (
All methods besides SCP predict sets of similar sizes, though the prediction set sizes of FACI, FACI-S, and NExCP adapt more slowly to changes in the best fixed size (e.g., t∈[500, 700] for gradual shift in
Embodiments described herein provide an improved conformal prediction method using models which minimize regret to provide a prediction set with valid coverage and small regret.
This description and the accompanying drawings that illustrate inventive aspects, embodiments, implementations, or applications should not be taken as limiting. Various mechanical, compositional, structural, electrical, and operational changes may be made without departing from the spirit and scope of this description and the claims. In some instances, well-known circuits, structures, or techniques have not been shown or described in detail in order not to obscure the embodiments of this disclosure. Like numbers in two or more figures represent the same or similar elements.
In this description, specific details are set forth describing some embodiments consistent with the present disclosure. Numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It will be apparent, however, to one skilled in the art that some embodiments may be practiced without some or all of these specific details. The specific embodiments disclosed herein are meant to be illustrative but not limiting. One skilled in the art may realize other elements that, although not specifically described here, are within the scope and the spirit of this disclosure. In addition, to avoid unnecessary repetition, one or more features shown and described in association with one embodiment may be incorporated into other embodiments unless specifically described otherwise or if the one or more features would make an embodiment non-functional.
Although illustrative embodiments have been shown and described, a wide range of modification, change and substitution is contemplated in the foregoing disclosure and in some instances, some features of the embodiments may be employed without a corresponding use of other features. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Thus, the scope of the invention should be limited only by the following claims, and it is appropriate that the claims be construed broadly and, in a manner, consistent with the scope of the embodiments disclosed herein.
This application is a nonprovisional of and claims priority under 35 U.S.C. 119 to co-pending and commonly-owned U.S. provisional application No. 63/481,564, filed Jan. 25, 2023, which is hereby expressly incorporated by reference herein in its entirety.