Not applicable.
Not applicable.
The present invention relates to the analysis of data where outlier elements are removed (or filtered) from the development of the analysis. The analysis may involve the computation of simple statistics or more complex operations involving mathematical models that use data in their development. The purpose of outlier data filtering may be to perform data quality and data validation operations, or to compute representative standards, statistics, or data groups for use in subsequent analyses, such as regression analysis, time series analysis, or the development of qualified data sets for mathematical models.
Removing outlier data in standards or data-driven model development is an important part of the pre-analysis work to ensure a representative and fair analysis is developed from the underlying data. For example, developing equitable benchmarking of greenhouse gas standards for carbon dioxide (CO2), ozone (O3), water vapor (H2O), hydrofluorocarbons (HFCs), perfluorocarbons (PFCs), chlorofluorocarbons (CFCs), sulfur hexafluoride (SF6), methane (CH4), nitrous oxide (N2O), carbon monoxide (CO), nitrogen oxides (NOx), and non-methane volatile organic compounds (NMVOCs) emissions requires that collected industrial data used in the standards development exhibit certain properties. Extremely good or bad performance by a few of the industrial sites should not bias the standards computed for other sites. It may be judged unfair or unrepresentative to include such performance results in the standard calculations. In the past, the performance outliers were removed via a semi-quantitative process requiring subjective input. The present system and method is a data-driven approach that performs this task as an integral part of the model development, and not at the pre-analysis or pre-model development stage.
The removal of bias can be a subjective process wherein justification is documented in some form to substantiate data changes. However, any form of outlier removal is a form of data censoring that carries the potential for changing calculation results. Such data filtering may or may not reduce bias or error in the calculation, and in the spirit of full analysis disclosure, strict data removal guidelines and documentation need to be included with the analysis results. Therefore, there is a need in the art to provide a new system and method for objectively removing outlier data bias using a dynamic statistical process useful for the purposes of data quality operations, data validation, statistical calculations, or mathematical model development. The outlier bias removal system and method can also be used to group data into representative categories where the data is applied to the development of mathematical models customized to each group. In a preferred embodiment, coefficients are defined as multiplicative and additive factors in mathematical models and also as other numerical parameters that are nonlinear in nature. For example, in the mathematical model f(x,y,z) = a*x + b*y^c + d*sin(e^z) + f, the terms a, b, c, d, e, and f are all defined as coefficients. The values of these terms may be fixed or determined as part of the development of the mathematical model.
A preferred embodiment includes a computer implemented method for reducing outlier bias comprising the steps of: selecting a bias criteria; providing a data set; providing a set of model coefficients; selecting a set of target values; (1) generating a set of predicted values for the complete data set; (2) generating an error set for the dataset; (3) generating a set of error threshold values based on the error set and the bias criteria; (4) generating, by a processor, a censored data set based on the error set and the set of error threshold values; (5) generating, by the processor, a set of new model coefficients; and (6) using the set of new model coefficients, repeating steps (1)-(5), unless a censoring performance termination criteria is satisfied. In a preferred embodiment, the set of predicted values may be generated based on the data set and the set of model coefficients. In a preferred embodiment, the error set may comprise a set of absolute errors and a set of relative errors, generated based on the set of predicted values and the set of target values. In another embodiment, the error set may comprise values calculated as the difference between the set of predicted values and the set of target values. In another embodiment, the step of generating the set of new coefficients may further comprise the step of minimizing the set of errors between the set of predicted values and the set of actual values, which can be accomplished using a linear, or a non-linear optimization model. In a preferred embodiment, the censoring performance termination criteria may be based on a standard error and a coefficient of determination.
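By way of illustration only, the loop described in steps (1)-(6) above can be sketched in Python. This is a minimal sketch, assuming a least-squares model of the form y ≈ X·β, percentile-based bias criteria applied to both the absolute and relative errors, and a termination test reduced to the change in standard error; the function and variable names are illustrative and are not part of the described method.

```python
import numpy as np

def outlier_bias_reduction(X, y, bias_percentile=80.0, tol=1e-3, max_iter=100):
    """Iteratively refit y ~ X @ beta while censoring high-error records."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]                 # initial model coefficients
    keep = np.ones(len(y), dtype=bool)
    prev_se = np.inf
    for _ in range(max_iter):
        pred = X @ beta                                          # (1) predicted values, complete data set
        abs_err = (pred - y) ** 2                                # (2) error set: absolute ...
        rel_err = ((pred - y) / y) ** 2                          #     ... and relative errors (nonzero targets assumed)
        abs_thr = np.percentile(abs_err, bias_percentile)        # (3) error threshold values
        rel_thr = np.percentile(rel_err, bias_percentile)
        keep = (abs_err < abs_thr) & (rel_err < rel_thr)         # (4) censored data set
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # (5) new model coefficients
        se = np.sqrt(np.mean((X[keep] @ beta - y[keep]) ** 2))   # censoring performance value
        if abs(prev_se - se) < tol:                              # (6) termination criterion (standard error only)
            break
        prev_se = se
    return beta, keep
```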
Another embodiment includes a computer implemented method for reducing outlier bias comprising the steps of: selecting an error criteria; selecting a data set; selecting a set of actual values; selecting an initial set of model coefficients; generating a set of model predicted values based on the complete data set and the initial set of model coefficients; (1) generating a set of errors based on the model predicted values and the set of actual values for the complete dataset; (2) generating a set of error threshold values based on the complete set of errors and the error criteria for the complete data set; (3) generating an outlier removed data set, wherein the filtering is based on the complete data set and the set of error threshold values; (4) generating a set of new coefficients based on the filtered data set and the set of previous coefficients, wherein the generation of the set of new coefficients is performed by the computer processor; (5) generating a set of outlier bias reduced model predicted values based on the filtered data set and the set of new model coefficients, wherein the generation of the set of outlier bias reduced model predicted values is performed by a computer processor; (6) generating a set of model performance values based on the model predicted values and the set of actual values; repeating steps (1)-(6), while substituting the set of new coefficients for the set of coefficients from the previous iteration, unless: a performance termination criteria is satisfied; and storing the set of model predicted values in a computer data medium.
Another embodiment includes a computer implemented method for reducing outlier bias comprising the steps of: selecting a target variable for a facility; selecting a set of actual values of the target variable; identifying a plurality of variables for the facility that are related to the target variable; obtaining a data set for the facility, the data set comprising values for the plurality of variables; selecting a bias criteria; selecting a set of model coefficients; (1) generating a set of predicted values based on the complete data set and the set of model coefficients; (2) generating a set of censoring model performance values based on the set of predicted values and the set of actual values; (3) generating an error set based on the set of predicted values and the set of actual values for the target variable; (4) generating a set of error threshold values based on the error set and the bias criteria; (5) generating, by a processor, a censored data set based on the data set and the set of error thresholds; (6) generating, by the processor, a set of new model coefficients based on the censored data set and the set of model coefficients; (7) generating, by the processor, a set of new predicted values based on the data set and the set of new model coefficients; (8) generating a set of new censoring model performance values based on the set of new predicted values and the set of actual values; using the set of new coefficients, repeating steps (1)-(8) unless a censoring performance termination criteria is satisfied; and storing the set of new model predicted values in a computer data medium.
Another embodiment includes a computer implemented method for reducing outlier bias comprising the steps of: determining a target variable for a facility, wherein the target variable is a metric for an industrial facility related to its production, financial performance, or emissions; identifying a plurality of variables for the facility, wherein the plurality of variables comprises: a plurality of direct variables for the facility that influence the target variable; and a set of transformed variables for the facility, each transformed variable being a function of at least one direct facility variable that influences the target variable; selecting an error criteria comprising: an absolute error, and a relative error; obtaining a data set for the facility, wherein the data set comprises values for the plurality of variables; selecting a set of actual values of the target variable; selecting an initial set of model coefficients; generating a set of model predicted values based on the complete data set and the initial set of model coefficients; generating a complete set of errors based on the set of model predicted values and the set of actual values, wherein the relative error is calculated using the formula: Relative Error_m = ((Predicted Value_m − Actual Value_m)/Actual Value_m)², wherein ‘m’ is a reference number, and wherein the absolute error is calculated using the formula: Absolute Error_m = (Predicted Value_m − Actual Value_m); generating a set of model performance values based on the set of model predicted values and the set of actual values, wherein the set of overall model performance values comprises: a first standard error, and a first coefficient of determination; (1) generating a set of errors based on the model predicted values and the set of actual values for the complete dataset; (2) generating a set of error threshold values based on the complete set of errors and the error criteria for the complete data set; (3) generating an outlier removed data set by removing data with error values greater than or equal to the error threshold values, wherein the filtering is based on the complete data set and the set of error threshold values; (4) generating a set of outlier bias reduced model predicted values based on the outlier removed data set and the set of model coefficients by minimizing the error between the set of predicted values and the set of actual values using at least one of: a linear optimization model, and a nonlinear optimization model, wherein the generation of the new model predicted values is performed by a computer processor; (5) generating a set of new coefficients based on the outlier removed data set and the previous set of coefficients, wherein the generation of the set of new coefficients is performed by the computer processor; (6) generating a set of overall model performance values based on the set of new predicted model values and the set of actual values, wherein the set of model performance values comprises: a second standard error, and a second coefficient of determination; repeating steps (1)-(6), while substituting the set of new coefficients for the set of coefficients from the previous iteration, unless: a performance termination criteria is satisfied, wherein the performance termination criteria comprises: a standard error termination value and a coefficient of determination termination value, and wherein satisfying the performance termination criteria comprises: the standard error termination value is greater than the difference between the first and second standard error, and the coefficient of determination termination value is greater than the difference between the first and second coefficient of determination; and storing the set of new model predicted values in a computer data medium.
Another embodiment includes a computer implemented method for reducing outlier bias comprising the steps of: selecting an error criteria; selecting a data set; selecting a set of actual values; selecting an initial set of model predicted values; determining a set of errors based on the set of model predicted values and the set of actual values; (1) determining a set of error threshold values based on the complete set of errors and the error criteria; (2) generating an outlier removed data set, wherein the filtering is based on the data set and the set of error threshold values; (3) generating a set of outlier bias reduced model predicted values based on the outlier removed data set and the previous model predicted values, wherein the generation of the set of outlier bias reduced model predicted values is performed by a computer processor; (4) determining a set of errors based on the set of new model predicted values and the set of actual values; repeating steps (1)-(4), while substituting the set of new model predicted values for the set of model predicted values from the previous iteration, unless: a performance termination criteria is satisfied; and storing the set of outlier bias reduced model predicted values in a computer data medium.
Another embodiment includes a computer implemented method for reducing outlier bias comprising the steps of: determining a target variable for a facility; identifying a plurality of variables for the facility, wherein the plurality of variables comprises: a plurality of direct variables for the facility that influence the target variable; and a set of transformed variables for the facility, each transformed variable being a function of at least one direct facility variable that influences the target variable; selecting an error criteria comprising: an absolute error, and a relative error; obtaining a data set, wherein the data set comprises values for the plurality of variables, and selecting a set of actual values of the target variable; selecting an initial set of model coefficients; generating a set of model predicted values by applying a set of model coefficients to the data set; determining a set of performance values based on the set of model predicted values and the set of actual values, wherein the set of performance values comprises: a first standard error, and a first coefficient of determination; (1) generating a set of errors based on the set of model predicted values and the set of actual values for the complete dataset, wherein the relative error is calculated using the formula: Relative Error_m = ((Predicted Value_m − Actual Value_m)/Actual Value_m), wherein ‘m’ is a reference number, and wherein the absolute error is calculated using the formula: Absolute Error_m = (Predicted Value_m − Actual Value_m); (2) generating a set of error threshold values based on the complete set of errors and the error criteria for the complete data set; (3) generating an outlier removed data set by removing data with error values greater than or equal to the set of error threshold values, wherein the filtering is based on the data set and the set of error threshold values; (4) generating a set of new coefficients based on the outlier removed data set and the set of previous coefficients; (5) generating a set of outlier bias reduced model predicted values based on the outlier removed data set and the set of new model coefficients by minimizing the error between the set of predicted values and the set of actual values using at least one of: a linear optimization model, and a nonlinear optimization model, wherein the generation of the model predicted values is performed by a computer processor; (6) generating a set of updated performance values based on the set of outlier bias reduced model predicted values and the set of actual values, wherein the set of updated performance values comprises: a second standard error, and a second coefficient of determination; repeating steps (1)-(6), while substituting the set of new coefficients for the set of coefficients from the previous iteration, unless: a performance termination criteria is satisfied, wherein the performance termination criteria comprises: a standard error termination value, and a coefficient of determination termination value, and wherein satisfying the performance termination criteria comprises the standard error termination value is greater than the difference between the first and second standard error, and the coefficient of determination termination value is greater than the difference between the first and second coefficient of determination; and storing the set of outlier bias reduction factors in a computer data medium.
Another embodiment includes a computer implemented method for assessing the viability of a data set as used in developing a model comprising the steps of: providing a target data set comprising a plurality of data values; generating a random target data set based on the target data set; selecting a set of bias criteria values; generating, by a processor, an outlier bias reduced target data set based on the target data set and each of the selected bias criteria values; generating, by the processor, an outlier bias reduced random data set based on the random data set and each of the selected bias criteria values; calculating a set of error values for the outlier bias reduced data set and the outlier bias reduced random data set; calculating a set of correlation coefficients for the outlier bias reduced data set and the outlier bias reduced random data set; generating bias criteria curves for the data set and the random data set based on the selected bias criteria values and the corresponding error value and correlation coefficient; and comparing the bias criteria curve for the data set to the bias criteria curve for the random data set. The outlier bias reduced target data set and the outlier bias reduced random target data set are generated using the Dynamic Outlier Bias Removal methodology. The random target data set can comprise randomized data values developed from values within the range of the plurality of data values. Also, the set of error values can comprise a set of standard errors, and the set of correlation coefficients can comprise a set of coefficient of determination values. Another embodiment can further comprise the step of generating automated advice regarding the viability of the target data set to support the developed model, and vice versa, based on comparing the bias criteria curve for the target data set to the bias criteria curve for the random target data set. Advice can be generated based on parameters selected by analysts, such as a correlation coefficient threshold and/or an error threshold. Yet another embodiment further comprises the steps of: providing an actual data set comprising a plurality of actual data values corresponding to the model predicted values; generating a random actual data set based on the actual data set; generating, by a processor, an outlier bias reduced actual data set based on the actual data set and each of the selected bias criteria values; generating, by the processor, an outlier bias reduced random actual data set based on the random actual data set and each of the selected bias criteria values; generating, for each selected bias criteria, a random data plot based on the outlier bias reduced random target data set and the outlier bias reduced random actual data; generating, for each selected bias criteria, a realistic data plot based on the outlier bias reduced target data set and the outlier bias reduced actual target data set; and comparing the random data plot with the realistic data plot corresponding to each of the selected bias criteria.
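A hedged sketch of this viability assessment follows: several bias criteria values are swept, the same censoring rule is applied to the realistic target/actual data and to a randomized counterpart, and the resulting error and coefficient-of-determination curves are compared. Simple percentile censoring stands in here for the full Dynamic Outlier Bias Removal filter, and all data and names are synthetic and illustrative.

```python
import numpy as np

def bias_criteria_curve(predicted, actual, criteria=(100, 95, 90, 85, 80, 70, 60)):
    """Return (bias criteria, standard error, r^2) for each censoring level."""
    curve = []
    for pct in criteria:
        rel = ((predicted - actual) / actual) ** 2
        ab = (predicted - actual) ** 2
        keep = (rel <= np.percentile(rel, pct)) & (ab <= np.percentile(ab, pct))
        err = np.sqrt(np.mean((predicted[keep] - actual[keep]) ** 2))     # standard error
        r2 = np.corrcoef(predicted[keep], actual[keep])[0, 1] ** 2        # coefficient of determination
        curve.append((pct, err, r2))
    return curve

rng = np.random.default_rng(0)
actual = rng.normal(50.0, 10.0, 500)
predicted = actual + rng.normal(0.0, 5.0, 500)                     # "realistic" target data
random_pred = rng.uniform(predicted.min(), predicted.max(), 500)   # randomized target data

real_curve = bias_criteria_curve(predicted, actual)
random_curve = bias_criteria_curve(random_pred, actual)
# If the realistic curve's r^2 remains well above the random curve's as the bias
# criteria tighten, the target data set is judged viable for model development.
```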
A preferred embodiment includes a system comprising: a server, comprising: a processor, and a storage subsystem; a database stored by the storage subsystem comprising: a data set; and a computer program stored by the storage subsystem comprising instructions that, when executed, cause the processor to: select a bias criteria; provide a set of model coefficients; select a set of target values; (1) generate a set of predicted values for the data set; (2) generate an error set for the dataset; (3) generate a set of error threshold values based on the error set and the bias criteria; (4) generate a censored data set based on the error set and the set of error threshold values; (5) generate a set of new model coefficients; and (6) using the set of new model coefficients, repeat steps (1)-(5), unless a censoring performance termination criteria is satisfied. In a preferred embodiment, the set of predicted values may be generated based on the data set and the set of model coefficients. In a preferred embodiment, the error set may comprise a set of absolute errors and a set of relative errors, generated based on the set of predicted values and the set of target values. In another embodiment, the error set may comprise values calculated as the difference between the set of predicted values and the set of target values. In another embodiment, the step of generating the set of new coefficients may further comprise the step of minimizing the set of errors between the set of predicted values and the set of actual values, which can be accomplished using a linear, or a non-linear optimization model. In a preferred embodiment, the censoring performance termination criteria may be based on a standard error and a coefficient of determination.
Another embodiment of the present invention includes a system comprising: a server, comprising: a processor, and a storage subsystem; a database stored by the storage subsystem comprising: a data set; and a computer program stored by the storage subsystem comprising instructions that, when executed, cause the processor to: select an error criteria; select a set of actual values; select an initial set of coefficients; generate a complete set of model predicted values from the data set and the initial set of coefficients; (1) generate a set of errors based on the model predicted values and the set of actual values for the complete dataset; (2) generate a set of error threshold values based on the complete set of errors and the error criteria for the complete data set; (3) generate an outlier removed data set, wherein the filtering is based on the complete data set and the set of error threshold values; (4) generate a set of outlier bias reduced model predicted values based on the outlier removed data set and the set of coefficients, wherein the generation of the set of outlier bias reduced model predicted values is performed by a computer processor; (5) generate a set of new coefficients based on the outlier removed data set and the set of previous coefficients, wherein the generation of the set of new coefficients is performed by the computer processor; (6) generate a set of model performance values based on the outlier bias reduced model predicted values and the set of actual values; repeat steps (1)-(6), while substituting the set of new coefficients for the set of coefficients from the previous iteration, unless: a performance termination criteria is satisfied; and store the set of overall outlier bias reduction model predicted values in a computer data medium.
Yet another embodiment includes a system comprising: a server, comprising: a processor, and a storage subsystem; a database stored by the storage subsystem comprising: a target variable for a facility; a set of actual values of the target variable; a plurality of variables for the facility that are related to the target variable; a data set for the facility, the data set comprising values for the plurality of variables; and a computer program stored by the storage subsystem comprising instructions that, when executed, cause the processor to: select a bias criteria; select a set of model coefficients; (1) generate a set of predicted values based on the data set and the set of model coefficients; (2) generate a set of censoring model performance values based on the set of predicted values and the set of actual values; (3) generate an error set based on the set of predicted values and the set of actual values for the target variable; (4) generate a set of error threshold values based on the error set and the bias criteria; (5) generate a censored data set based on the data set and the set of error thresholds; (6) generate a set of new model coefficients based on the censored data set and the set of model coefficients; (7) generate a set of new predicted values based on the data set and the set of new model coefficients; (8) generate a set of new censoring model performance values based on the set of new predicted values and the set of actual values; using the set of new coefficients, repeat steps (1)-(8) unless a censoring performance termination criteria is satisfied; and storing the set of new model predicted values in the storage subsystem.
Another embodiment includes a system comprising: a server, comprising: a processor, and a storage subsystem; a database stored by the storage subsystem comprising: a data set for a facility; and a computer program stored by the storage subsystem comprising instructions that, when executed, cause the processor to: determine a target variable; identify a plurality of variables, wherein the plurality of variables comprises: a plurality of direct variables for the facility that influence the target variable; and a set of transformed variables for the facility, each transformed variable being a function of at least one direct variable that influences the target variable; select an error criteria comprising: an absolute error, and a relative error; select a set of actual values of the target variable; select an initial set of coefficients; generate a set of model predicted values based on the data set and the initial set of coefficients; determine a set of errors based on the set of model predicted values and the set of actual values, wherein the relative error is calculated using the formula: Relative Error_m = ((Predicted Value_m − Actual Value_m)/Actual Value_m), wherein ‘m’ is a reference number, and wherein the absolute error is calculated using the formula: Absolute Error_m = (Predicted Value_m − Actual Value_m); determine a set of performance values based on the set of model predicted values and the set of actual values, wherein the set of performance values comprises: a first standard error, and a first coefficient of determination; (1) generate a set of errors based on the model predicted values and the set of actual values; (2) generate a set of error threshold values based on the complete set of errors and the error criteria for the complete data set; (3) generate an outlier removed data set by filtering data with error values outside the set of error threshold values, wherein the filtering is based on the data set and the set of error threshold values; (4) generate a set of new model predicted values based on the outlier removed data set and the set of coefficients by minimizing an error between the set of model predicted values and the set of actual values using at least one of: a linear optimization model, and a nonlinear optimization model, wherein the generation of the outlier bias reduced model predicted values is performed by a computer processor; (5) generate a set of new coefficients based on the outlier removed data set and the set of previous coefficients, wherein the generation of the set of new coefficients is performed by the computer processor; (6) generate a set of performance values based on the set of new model predicted values and the set of actual values, wherein the set of model performance values comprises: a second standard error, and a second coefficient of determination; repeat steps (1)-(6), while substituting the set of new coefficients for the set of coefficients from the previous iteration, unless: a performance termination criteria is satisfied, wherein the performance termination criteria comprises: a standard error termination value, and a coefficient of determination termination value, and wherein satisfying the performance termination criteria comprises: the standard error termination value is greater than the difference between the first and second standard error, and the coefficient of determination termination value is greater than the difference between the first and second coefficient of determination; and store the set of new model predicted values in a computer data medium.
Another embodiment of the present invention includes a system comprising: a server, comprising: a processor, and a storage subsystem; a database stored by the storage subsystem comprising: a data set, a computer program stored by the storage subsystem comprising instructions that, when executed, cause the processor to: select an error criteria; select a data set; select a set of actual values; select an initial set of model predicted values; determine a set of errors based on the set of model predicted values and the set of actual values; (1) determine a set of error threshold values based on the complete set of errors and the error criteria; (2) generate an outlier removed data set, wherein the filtering is based on the data set and the set of error threshold values; (3) generate a set of outlier bias reduced model predicted values based on the outlier removed data set and the complete set of model predicted values, wherein the generation of the set of outlier bias reduced model predicted values is performed by a computer processor; (4) determine a set of errors based on the set of outlier bias reduction model predicted values and the corresponding set of actual values; repeat steps (1)-(4), while substituting the set of outlier bias reduction model predicted values for the set of model predicted values unless: a performance termination criteria is satisfied; and store the set of outlier bias reduction factors in a computer data medium.
Another embodiment of the present invention includes a system comprising: a server, comprising: a processor, and a storage subsystem; a database stored by the storage subsystem comprising: a data set; and a computer program stored by the storage subsystem comprising instructions that, when executed, cause the processor to: determine a target variable; identify a plurality of variables for the facility, wherein the plurality of variables comprises: a plurality of direct variables for the facility that influence the target variable; and a set of transformed variables for the facility, each transformed variable being a function of at least one primary facility variable that influences the target variable; select an error criteria comprising: an absolute error, and a relative error; obtain a data set, wherein the data set comprises values for the plurality of variables, and select a set of actual values of the target variable; select an initial set of coefficients; generate a set of model predicted values by applying the set of model coefficients to the data set; determine a set of performance values based on the set of model predicted values and the set of actual values, wherein the set of performance values comprises: a first standard error, and a first coefficient of determination; (1) determine a set of errors based on the set of model predicted values and the set of actual values, wherein the relative error is calculated using the formula: Relative Error_k = ((Predicted Value_k − Actual Value_k)/Actual Value_k)², wherein ‘k’ is a reference number, and wherein the absolute error is calculated using the formula: Absolute Error_k = (Predicted Value_k − Actual Value_k)²; (2) determine a set of error threshold values based on the set of errors and the error criteria for the complete data set; (3) generate an outlier removed data set by removing data with error values greater than or equal to the error threshold values, wherein the filtering is based on the data set and the set of error threshold values; (4) generate a set of new coefficients based on the outlier removed data set and the set of previous coefficients; (5) generate a set of outlier bias reduced model values based on the outlier removed data set and the set of coefficients by minimizing an error between the set of predicted values and the set of actual values using at least one of: a linear optimization model, and a nonlinear optimization model; (6) determine a set of updated performance values based on the set of outlier bias reduced model predicted values and the set of actual values, wherein the set of updated performance values comprises: a second standard error, and a second coefficient of determination; repeat steps (1)-(6), while substituting the set of new coefficients for the set of coefficients from the previous iteration, unless: a performance termination criteria is satisfied, wherein the performance termination criteria comprises: a standard error termination value, and a coefficient of determination termination value, and wherein satisfying the performance termination criteria comprises the standard error termination value is greater than the difference between the first and second standard error, and the coefficient of determination termination value is greater than the difference between the first and second coefficient of determination; and store the set of outlier bias reduction factors in a computer data medium.
Yet another embodiment includes a system for assessing the viability of a data set as used in developing a model comprising: a server, comprising: a processor, and a storage subsystem; a database stored by the storage subsystem comprising: a target data set comprising a plurality of model predicted values; and a computer program stored by the storage subsystem comprising instructions that, when executed, cause the processor to: generate a random target data set; select a set of bias criteria values; generate outlier bias reduced data sets based on the target data set and each of the selected bias criteria values; generate an outlier bias reduced random target data set based on the random target data set and each of the selected bias criteria values; calculate a set of error values for the outlier bias reduced target data set and the outlier bias reduced random target data set; calculate a set of correlation coefficients for the outlier bias reduced target data set and the outlier bias reduced random target data set; generate bias criteria curves for the target data set and the random target data set based on the corresponding error value and correlation coefficient for each selected bias criteria; and compare the bias criteria curve for the target data set to the bias criteria curve for the random target data set. The processor generates the outlier bias reduced target data set and the outlier bias reduced random target data set using the Dynamic Outlier Bias Removal methodology. The random target data set can comprise randomized data values developed from values within the range of the plurality of data values. Also, the set of error values can comprise a set of standard errors, and the set of correlation coefficients can comprise a set of coefficient of determination values. In another embodiment, the program further comprises instructions that, when executed, cause the processor to generate automated advice based on comparing the bias criteria curve for the target data set to the bias criteria curve for the random target data set. Advice can be generated based on parameters selected by analysts, such as a correlation coefficient threshold and/or an error threshold. In yet another embodiment, the system's database further comprises an actual data set comprising a plurality of actual data values corresponding to the model predicted values, and the program further comprises instructions that, when executed, cause the processor to: generate a random actual data set based on the actual data set; generate an outlier bias reduced actual data set based on the actual data set and each of the selected bias criteria values; generate an outlier bias reduced random actual data set based on the random actual data set and each of the selected bias criteria values; generate, for each selected bias criteria, a random data plot based on the outlier bias reduced random target data set and the outlier bias reduced random actual data; generate, for each selected bias criteria, a realistic data plot based on the outlier bias reduced target data set and the outlier bias reduced actual target data set; and compare the random data plot with the realistic data plot corresponding to each of the selected bias criteria.
The following disclosure provides many different embodiments, or examples, for implementing different features of the disclosed system and method for dynamic outlier bias reduction. Specific examples of components, processes, and implementations are described to help clarify the invention. These are merely examples and are not intended to limit the invention from that described in the claims. Well-known elements are presented without detailed description so as not to obscure the preferred embodiments of the present invention with unnecessary detail. For the most part, details unnecessary to obtain a complete understanding of the preferred embodiments of the present invention have been omitted inasmuch as such details are within the skills of persons of ordinary skill in the relevant art.
A mathematical description of one embodiment of Dynamic Outlier Bias Reduction is shown as follows:
Initial Step 1: Using initial model coefficient estimates, β̂_1, compute initial model predicted values by applying the model to the complete data set:
Q̂_1 = M(X̂ : β̂_1)
Initial Step 2: Compute initial model performance results:
Ω̂_1 = f(Q̂_1, Â, k=0, r², standard error, etc.)
Initial Step 3: Compute model error threshold value(s):
E_1 = F(Ψ, C)
Initial Step 4: Filter the data records to remove outliers:
X̂_1 = {∀ x ∈ X̂ | Ψ(Q̂_1) < E_1}
Iterative Computations, k>0
Iteration Step 1: Compute predicted values by applying the model to the accepted data set:
Q̂_{k+1} = M(X̂_k : β̂_{k→k+1})
Iteration Step 2: Compute model performance results:
Ω̂_{k+1} = f(Q̂_{k+1}, Â, k, r², standard error, etc.)
If the termination criteria are achieved, stop; otherwise proceed to Iteration Step 3.
Iteration Step 3: Compute results for the removed data, X̂^C_k = {∀ x ∈ X̂ | x ∉ X̂_k}, using the current model:
Q̂^C_{k+1} = M(X̂^C_k : β̂_{k→k+1})
Iteration Step 4: Compute model error threshold values:
E_{k+1} = F(Ψ(Q̂_{k+1} + Q̂^C_{k+1}), C)
Iteration Step 5: Filter the data records to remove outliers:
X̂_{k+1} = {∀ x ∈ X̂ | Ψ(Q̂_{k+1} + Q̂^C_{k+1}) < E_{k+1}}
Another mathematical description of one embodiment of Dynamic Outlier Bias Reduction is shown as follows:
Nomenclature:
Initial Computation, k=0
Iterative Computations, k>0
Increment k and proceed to Iteration Step 1.
After each iteration where new model coefficients are computed from the current censored dataset, the removed data from the previous iteration plus the current censored data are recombined. This combination encompasses all data values in the complete dataset. The current model coefficients are then applied to the complete dataset to compute a complete set of predicted values. The absolute and relative errors are computed for the complete set of predicted values and new bias criteria percentile threshold values are computed. A new censored dataset is created by removing all data values where the absolute or relative errors are greater than the threshold values and the nonlinear optimization model is then applied to the newly censored dataset computing new model coefficients. This process enables all data values to be reviewed every iteration for their possible inclusion in the model dataset. It is possible that some data values that were excluded in previous iterations will be included in subsequent iterations as the model coefficients converge on values that best fit the data.
In one embodiment, variations in GHG emissions can result in overestimation or underestimation of emission results leading to bias in model predicted values. These non-industrial influences, such as environmental conditions and errors in calculation procedures, can cause the results for a particular facility to be radically different from similar facilities, unless the bias in the model predicted values is removed. The bias in the model predicted values may also exist due to unique operating conditions.
The bias can be removed manually by simply excluding a facility's data from the calculation if analysts are confident that the facility's calculations are in error or possess unique, extenuating characteristics. Yet, when measuring facility performance across many different companies, regions, and countries, precise a priori knowledge of the data details is not realistic. Therefore, any analyst-based data removal procedure has the potential to add undocumented, non-data-supported biases to the model results.
In one embodiment, Dynamic Outlier Bias Reduction is applied in a procedure that uses the data and a prescribed overall error criteria to determine statistical outliers that are removed from the model coefficient calculations. This is a data-driven process that identifies outliers using a data-produced global error criterion, for example, the percentile function. The use of Dynamic Outlier Bias Reduction is not limited to the reduction of bias in model predicted values, and its use in this embodiment is illustrative and exemplary only. Dynamic Outlier Bias Reduction may also be used, for example, to remove outliers from any statistical data set, including use in the calculation of, but not limited to, arithmetic averages, linear regressions, and trend lines. The outlier facilities are still ranked from the calculation results, but the outliers are not used in the filtered data set applied to compute model coefficients or statistical results.
A standard procedure commonly used to remove outliers is to compute the standard deviation (σ) of the data set and simply define all data outside a 2σ interval of the mean, for example, as outliers. This procedure has statistical assumptions that, in general, cannot be tested in practice. The description of the Dynamic Outlier Bias Reduction method applied in an embodiment of this invention is outlined in
Relative Error_m = ((Predicted Value_m − Actual Value_m)/Actual Value_m)²   (1)
Absolute Error_m = (Predicted Value_m − Actual Value_m)²   (2)
In Step 110, the analyst specifies the error threshold criteria that will define outliers to be removed from the calculations. For example, using the percentile operation as the error function, a percentile value of 80 percent could be set for both the relative and absolute errors. This means that data values whose relative error is below the 80th percentile value and whose absolute error is below the 80th percentile value will be included, and the remaining values are removed or treated as outliers. In this example, for a data value to avoid being removed, both its relative and absolute errors must be less than the corresponding 80th percentile values. However, the percentile thresholds for relative and absolute error may be varied independently, and, in another embodiment, only one of the percentile thresholds may be used.
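Eqns. (1) and (2) together with the Step 110 thresholding rule can be transcribed as the short sketch below; it assumes nonzero actual values and uses illustrative names, applying the "both errors below their 80th percentile values" rule from the example above.

```python
import numpy as np

def error_sets(predicted, actual):
    # Eqn. (1): squared relative error; Eqn. (2): squared absolute error.
    relative = ((predicted - actual) / actual) ** 2
    absolute = (predicted - actual) ** 2
    return relative, absolute

def inlier_mask(relative, absolute, percentile=80.0):
    # Step 110 rule: a record is kept only if BOTH its relative and absolute
    # errors fall below the chosen percentile threshold values.
    rel_thr = np.percentile(relative, percentile)
    abs_thr = np.percentile(absolute, percentile)
    return (relative < rel_thr) & (absolute < abs_thr)
```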
In Step 120, the model standard error and coefficient of determination (r²) percent change criteria are specified. While the values of these statistics will vary from model to model, the percent change between successive iterations can be preset, for example, at 5 percent. These values can be used to terminate the iteration procedure. Another termination criterion could be a simple iteration count.
In Step 130, the optimization calculation is performed, which produces the model coefficients and predicted values for each facility.
In Step 140, the relative and absolute errors for all facilities are computed using Eqns. (1) and (2).
In Step 150, the error function with the threshold criteria specified in Step 110 is applied to the data computed in Step 140 to determine outlier threshold values.
In Step 160, the data is filtered to include only facilities where the relative error, absolute error, or both errors, depending on the chosen configuration, are less than the error threshold values computed in Step 150.
In Step 170, the optimization calculation is performed using only the outlier removed data set.
In Step 180, the percent changes of the standard error and r² are compared with the criteria specified in Step 120. If the percent change is greater than the criteria, the process is repeated by returning to Step 140. Otherwise, the iteration procedure is terminated in Step 190, and the resultant model computed by this Dynamic Outlier Bias Reduction procedure is complete. The model results are applied to all facilities, regardless of whether their data were removed or admitted during the iterative procedure.
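The percent-change termination test of Steps 120 and 180 might be expressed as the following sketch, using the 5 percent criteria from the example above; the names are illustrative.

```python
def should_terminate(prev_std_error, std_error, prev_r2, r2, pct_criteria=5.0):
    # Step 180: terminate when the percent change of both the standard error and
    # r^2 relative to the preceding iteration falls below the Step 120 criteria.
    se_change = abs(std_error - prev_std_error) / abs(prev_std_error) * 100.0
    r2_change = abs(r2 - prev_r2) / abs(prev_r2) * 100.0
    return se_change < pct_criteria and r2_change < pct_criteria
```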
In another embodiment, the process begins with the selection of certain iterative parameters, specifically:
(1) an absolute error and relative error percentile value wherein one, the other or both may be used in the iterative process,
(2) a coefficient of determination (also known as r²) improvement value, and
(3) a standard error improvement value.
The process begins with an original data set, a set of actual data, and either at least one coefficient or a factor used to calculate predicted values based on the original data set. A coefficient or set of coefficients will be applied to the original data set to create a set of predicted values. The set of coefficients may include, but is not limited to, scalars, exponents, parameters, and periodic functions. The set of predicted data is then compared to the set of actual data. A standard error and a coefficient of determination are calculated based on the differences between the predicted and actual data. The absolute and relative error associated with each one of the data points is used to remove data outliers based on the user-selected absolute and relative error percentile values. Ranking the data is not necessary, as all data falling outside the range associated with the percentile values for absolute and/or relative error are removed from the original data set. The use of absolute and relative errors to filter data is illustrative and for exemplary purposes only, as the method may be performed with only absolute or relative error or with another function.
The data associated with the absolute and relative error within a user-selected percentile range is the outlier removed data set, and each iteration of the process will have its own filtered data set. This first outlier removed data set is used to determine predicted values that will be compared with actual values. At least one coefficient is determined by optimizing the errors, and then the coefficient is used to generate predicted values based on the first outlier removed data set. The outlier bias reduced coefficients serve as the mechanism by which knowledge is passed from one iteration to the next.
After the first outlier removed data set is created, the standard error and coefficient of determination are calculated and compared with the standard error and coefficient of determination of the original data set. If the difference in standard error and the difference in coefficient of determination are both below their respective improvement values, then the process stops. However, if at least one of the improvement criteria is not met, then the process continues with another iteration. The use of standard error and coefficient of determination as checks for the iterative process is illustrative and exemplary only, as the check can be performed using only the standard error or only the coefficient of determination, a different statistical check, or some other performance termination criteria (such as number of iterations).
Assuming that the first iteration fails to meet the improvement criteria, the second iteration begins by applying the first set of outlier bias reduced coefficients to the original data to determine a new set of predicted values. The original data is then processed again, establishing the absolute and relative errors for the data points as well as the standard error and coefficient of determination values for the original data set while using the first outlier removed data set coefficients. The data is then filtered to form a second outlier removed data set, and coefficients are determined based on the second outlier removed data set.
The second outlier removed data set, however, is not necessarily a subset of the first outlier removed data set, and it is associated with a second set of outlier bias reduced model coefficients, a second standard error, and a second coefficient of determination. Once those values are determined, the second standard error will be compared with the first standard error and the second coefficient of determination will be compared against the first coefficient of determination.
If the improvement value (for standard error and coefficient of determination) exceeds the difference in these parameters, then the process will end. If not, then another iteration will begin by processing the original data yet again; this time using the second outlier bias reduced coefficients to process the original data set and generate a new set of predicted values. Filtering based on the user-selected percentile value for absolute and relative error will create a third outlier removed data set that will be optimized to determine a set of third outlier bias reduced coefficients. The process will continue until the error improvement or other termination criteria are met (such as a convergence criterion or a specified number of iterations).
The output of this process will be a set of coefficients or model parameters, wherein a coefficient or model parameter is a mathematical value (or set of values), such as, but not limited to, a model predicted value for comparing data, slope and intercept values of a linear equation, exponents, or the coefficients of a polynomial. The output of Dynamic Outlier Bias Reduction will not be an output value in its own right, but rather the coefficients that will modify data to determine an output value.
In another embodiment, illustrated in
In Step 210 the initial data is listed in any order.
Step 220 constitutes the function or operation that is performed on the dataset. In this embodiment example, the function and operation is the ascending ranking of the data followed by successive arithmetic average calculations where each line corresponds to the average of all data at and above the line.
Step 230 computes the relative and absolute errors from the data using successive values from the results of Step 220.
Step 240 allows the analyst to enter the desired outlier removal error criteria (%). The Quality Criteria Value is the resultant value from the error calculations in Step 230 based on the data in Step 220.
Step 250 shows the data quality outlier filtered dataset. Specific values are removed if the relative and absolute errors exceed the specified error criteria given in Step 240.
Step 260 shows the arithmetic average calculation comparison between the complete and outlier removed datasets. As in all applied mathematical or statistical calculations, the final step belongs to the analyst, who judges whether the identified outlier removed data elements are actually of poor quality. The Dynamic Outlier Bias Reduction system and method eliminates the analyst from directly removing data, but best practice guidelines suggest the analyst review and check the results for practical relevance.
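One plausible reading of Steps 210-260, sketched for illustration only: the data are ranked in ascending order, successive arithmetic averages are formed, errors are measured against those running averages, and values whose relative and absolute errors both exceed the chosen criteria are removed before the averages are compared. The exact error definition applied to the successive averages may differ from the embodiment, and the names and sample data are illustrative.

```python
import numpy as np

def data_quality_filter(values, criteria_pct=90.0):
    x = np.sort(np.asarray(values, dtype=float))            # Step 220: ascending ranking
    running_avg = np.cumsum(x) / np.arange(1, len(x) + 1)   # successive arithmetic averages
    # Step 230: relative and absolute errors of each value against the running average.
    absolute = (x - running_avg) ** 2
    relative = ((x - running_avg) / running_avg) ** 2
    # Steps 240-250: remove values whose relative AND absolute errors exceed the criteria.
    rel_thr = np.percentile(relative, criteria_pct)
    abs_thr = np.percentile(absolute, criteria_pct)
    keep = ~((relative > rel_thr) & (absolute > abs_thr))
    # Step 260: compare the complete and outlier removed arithmetic averages.
    return x.mean(), x[keep].mean(), x[~keep]

full_avg, filtered_avg, removed = data_quality_filter([3.1, 2.9, 3.0, 3.2, 2.8, 14.7, 3.05])
```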
In another embodiment illustrated in
In Step 310, the paired data is listed in any order.
Step 320 computes the relative and absolute errors for each ordered pair in the dataset.
Step 330 allows the analyst to enter the desired data validation criteria. In the example, both 90% relative and absolute error thresholds are selected. The Quality Criteria Value entries in Step 330 are the resultant absolute and relative error percentile values for the data shown in Step 320.
Step 340 shows the outlier removal process, where data that may be invalid is removed from the dataset using the criterion that the relative and absolute error values both exceed the values corresponding to the user-selected percentile values entered in Step 330. In practice, other error criteria may be used, and when multiple criteria are applied, as shown in this example, any combination of error values may be used to determine the outlier removal rules.
Step 350 computes the statistical results for the data validated and original data values, in this case the Pearson correlation coefficient. These results are then reviewed for practical relevance by the analyst.
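A sketch of the paired-data validation of Steps 310-350, assuming the 90 percent thresholds from the example and the sample Pearson correlation coefficient as the comparison statistic; the names are illustrative.

```python
import numpy as np

def validate_pairs(predicted, actual, criteria_pct=90.0):
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Step 320: relative and absolute errors for each ordered pair.
    relative = ((predicted - actual) / actual) ** 2
    absolute = (predicted - actual) ** 2
    # Steps 330-340: remove pairs whose relative AND absolute errors both exceed
    # the 90th-percentile Quality Criteria Values.
    keep = ~((relative > np.percentile(relative, criteria_pct)) &
             (absolute > np.percentile(absolute, criteria_pct)))
    # Step 350: Pearson correlation for the original and the validated data.
    r_original = np.corrcoef(predicted, actual)[0, 1]
    r_validated = np.corrcoef(predicted[keep], actual[keep])[0, 1]
    return r_original, r_validated, keep
```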
In another embodiment, Dynamic Outlier Bias Reduction is used to perform a validation of an entire data set. The standard error improvement value, the coefficient of determination improvement value, and the absolute and relative error thresholds are selected, and then the data set is filtered according to the error criteria. Even if the original data set is of high quality, there will still be some data with error values that fall outside the absolute and relative error thresholds. Therefore, it is important to determine whether any removal of data is necessary. If the outlier removed data set passes the standard error improvement and coefficient of determination improvement criteria after the first iteration, then the original data set has been validated, since the filtering produced changes in the standard error and coefficient of determination that are too small to be considered significant (e.g., below the selected improvement values).
In another embodiment, Dynamic Outlier Bias Reduction is used to provide insight into how the iterations of data outlier removal are influencing the calculation. Graphs or data tables are provided to allow the user to observe the progression in the data outlier removal calculations as each iteration is performed. This stepwise approach enables analysts to observe unique properties of the calculation that can add value and knowledge to the result. For example, the speed and nature of convergence can indicate the influence of Dynamic Outlier Bias Reduction on computing representative factors for a multi-dimensional data set.
As an illustration, consider a linear regression calculation over a poor quality data set of 87 records. The form of the equation being regressed is y=mx+b. Table 1 shows the results of the iterative process for 5 iterations. Notice that using relative and absolute error criteria of 95%, convergence is achieved in 3 iterations. Changes in the regression coefficients can be observed, and the Dynamic Outlier Bias Reduction method reduced the calculation data set to 79 records. The relatively low coefficient of determination (r² = 39%) suggests that a lower (<95%) criterion should be tested to study the additional outlier removal effects on the r² statistic and on the computed regression coefficients.
In Table 2 the results of applying Dynamic Outlier Bias Reduction are shown using relative and absolute error criteria of 80%. Notice that a 15 percentage point (95% to 80%) change in outlier error criteria produced a 35 percentage point (39% to 74%) increase in r² with a 35% additional decrease in admitted data (from 79 to 51 records included). The analyst can use a graphical view of the changes in the regression lines for the outlier removed data, together with the numerical results of Tables 1 and 2, to communicate the outlier removed results to a wider audience and to provide more insight regarding the effects of data variability on the analysis results.
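The regression illustration can be approximated with a sketch such as the one below, which refits y = m·x + b on the admitted records until the admitted set stops changing. The data here are synthetic, so the record counts and r² values will not reproduce Tables 1 and 2; all names are illustrative.

```python
import numpy as np

def dobr_regression(x, y, criteria_pct, max_iter=10):
    keep = np.ones_like(x, dtype=bool)
    for _ in range(max_iter):
        m, b = np.polyfit(x[keep], y[keep], 1)               # regress on the admitted records
        pred = m * x + b                                      # apply the model to ALL records
        relative = ((pred - y) / y) ** 2                      # nonzero actual values assumed
        absolute = (pred - y) ** 2
        new_keep = ((relative < np.percentile(relative, criteria_pct)) &
                    (absolute < np.percentile(absolute, criteria_pct)))
        if np.array_equal(new_keep, keep):                    # admitted set has converged
            break
        keep = new_keep
    r2 = np.corrcoef(m * x[keep] + b, y[keep])[0, 1] ** 2
    return m, b, int(keep.sum()), r2

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 100.0, 87)                               # 87 synthetic records
y = 2.0 * x + 5.0 + rng.normal(0.0, 25.0, 87)
for pct in (95, 80):
    print(pct, dobr_regression(x, y, pct))
```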
As illustrated in
As illustrated in
As illustrated in
As
As illustrated in
In addition to the above-described quantitative analysis facilitated by the illustrative graph of
The foregoing disclosure and description of the preferred embodiments of the invention are illustrative and explanatory thereof and it will be understood by those skilled in the art that various changes in the details of the illustrated system and method may be made without departing from the scope of the invention.
This application is a continuation application and claims the benefit, and priority benefit, of U.S. patent application Ser. No. 16/145,544, filed Sep. 28, 2018 by Richard B. Jones and entitled “Dynamic Outlier Bias Reduction System and Method,” which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5339392 | Risberg et al. | Aug 1994 | A |
6085216 | Huberman et al. | Jul 2000 | A |
6832205 | Aragones et al. | Dec 2004 | B1 |
6847976 | Peace | Jan 2005 | B1 |
6988092 | Tang et al. | Jan 2006 | B1 |
7039654 | Eder | May 2006 | B1 |
7233910 | Hileman et al. | Jun 2007 | B2 |
7239984 | Moessner | Jul 2007 | B2 |
7313550 | Kulkarni et al. | Dec 2007 | B2 |
7447611 | Fluegge et al. | Nov 2008 | B2 |
7469228 | Bonissone et al. | Dec 2008 | B2 |
7536364 | Subu et al. | May 2009 | B2 |
7966150 | Smith et al. | Jun 2011 | B2 |
8050889 | Fluegge et al. | Nov 2011 | B2 |
8055472 | Fluegge et al. | Nov 2011 | B2 |
8060341 | Fluegge et al. | Nov 2011 | B2 |
8195484 | Jones et al. | Jun 2012 | B2 |
8346691 | Subramanian et al. | Jan 2013 | B1 |
8548833 | Jones et al. | Oct 2013 | B2 |
8554588 | Jones et al. | Oct 2013 | B2 |
8554589 | Jones et al. | Oct 2013 | B2 |
8595036 | Jones et al. | Nov 2013 | B2 |
8676610 | Jones et al. | Mar 2014 | B2 |
8686364 | Little, III et al. | Apr 2014 | B1 |
8719059 | Jones et al. | May 2014 | B2 |
8812331 | Jones et al. | Aug 2014 | B2 |
9069725 | Jones | Jun 2015 | B2 |
9111212 | Jones | Aug 2015 | B2 |
9536364 | Talty et al. | Jan 2017 | B2 |
9646262 | Phillipps et al. | May 2017 | B2 |
9659254 | Achin et al. | May 2017 | B2 |
10198339 | Salunke et al. | Feb 2019 | B2 |
10317854 | Nakagawa et al. | Jun 2019 | B2 |
10339695 | Petkov et al. | Jul 2019 | B2 |
10409891 | Jones | Sep 2019 | B2 |
10452992 | Lee et al. | Oct 2019 | B2 |
10557840 | Jones | Feb 2020 | B2 |
10638979 | Gupta et al. | May 2020 | B2 |
10739741 | Wenzel et al. | Aug 2020 | B2 |
11007891 | Kamal et al. | May 2021 | B1 |
11288602 | Jones | Mar 2022 | B2 |
11328177 | Jones | May 2022 | B2 |
11334645 | Jones | May 2022 | B2 |
11550874 | Jones | Jan 2023 | B2 |
20030171879 | Pittalwala et al. | Aug 2003 | A1 |
20030216627 | Lorenz et al. | Nov 2003 | A1 |
20040122625 | Nasser et al. | Jun 2004 | A1 |
20040172401 | Peace | Sep 2004 | A1 |
20040186927 | Eryurek et al. | Sep 2004 | A1 |
20040254764 | Wetzer et al. | Dec 2004 | A1 |
20050022168 | Zhu et al. | Jan 2005 | A1 |
20050038667 | Hileman et al. | Feb 2005 | A1 |
20050125322 | Lacomb et al. | Jun 2005 | A1 |
20050131794 | Lifson | Jun 2005 | A1 |
20050187848 | Bonissone et al. | Aug 2005 | A1 |
20060080040 | Garczarek et al. | Apr 2006 | A1 |
20060247798 | Subbu et al. | Nov 2006 | A1 |
20060259352 | Hileman et al. | Nov 2006 | A1 |
20060271210 | Subbu et al. | Nov 2006 | A1 |
20070035901 | Albrecht et al. | Feb 2007 | A1 |
20070105238 | Mandl et al. | May 2007 | A1 |
20070109301 | Smith | May 2007 | A1 |
20080015827 | Tryon et al. | Jan 2008 | A1 |
20080069437 | Baker | Mar 2008 | A1 |
20080104624 | Narasimhan et al. | May 2008 | A1 |
20080201181 | Hileman et al. | Aug 2008 | A1 |
20080300888 | Dell'Anno et al. | Dec 2008 | A1 |
20090093996 | Fluegge et al. | Apr 2009 | A1 |
20090143045 | Graves et al. | Jun 2009 | A1 |
20090287530 | Watanabe et al. | Nov 2009 | A1 |
20090292436 | D'Amato et al. | Nov 2009 | A1 |
20100036637 | Miguelanez et al. | Feb 2010 | A1 |
20100152962 | Bennett et al. | Jun 2010 | A1 |
20100153328 | Cormode et al. | Jun 2010 | A1 |
20100262442 | Wingenter | Oct 2010 | A1 |
20110153270 | Hoffman | Jun 2011 | A1 |
20110246409 | Mitra | Oct 2011 | A1 |
20120296584 | Itoh | Nov 2012 | A1 |
20130046727 | Jones | Feb 2013 | A1 |
20130173325 | Coleman et al. | Jul 2013 | A1 |
20130231904 | Jones | Sep 2013 | A1 |
20130262064 | Mazzaro et al. | Oct 2013 | A1 |
20150278160 | Jones | Oct 2015 | A1 |
20150294048 | Jones | Oct 2015 | A1 |
20150309963 | Jones | Oct 2015 | A1 |
20150309964 | Jones | Oct 2015 | A1 |
20150331023 | Hwang et al. | Nov 2015 | A1 |
20160239749 | Peredriy | Aug 2016 | A1 |
20170178332 | Lindner et al. | Jun 2017 | A1 |
20180189667 | Tsou et al. | Jul 2018 | A1 |
20180307741 | Kida | Oct 2018 | A1 |
20180329865 | Jones | Nov 2018 | A1 |
20190034473 | Jha et al. | Jan 2019 | A1 |
20190050510 | Mewes et al. | Feb 2019 | A1 |
20190102703 | Belyaev | Apr 2019 | A1 |
20190108561 | Shivashankar et al. | Apr 2019 | A1 |
20190213446 | Tsou et al. | Jul 2019 | A1 |
20190271673 | Jones | Sep 2019 | A1 |
20190287039 | Ridgeway | Sep 2019 | A1 |
20190296547 | Kelly et al. | Sep 2019 | A1 |
20190303662 | Madhani et al. | Oct 2019 | A1 |
20190303795 | Khiari et al. | Oct 2019 | A1 |
20190313963 | Hillen | Oct 2019 | A1 |
20200004802 | Jones | Jan 2020 | A1 |
20200074269 | Trygg et al. | Mar 2020 | A1 |
20200104651 | Jones | Apr 2020 | A1 |
20200160180 | Lehr et al. | May 2020 | A1 |
20200160229 | Atcheson | May 2020 | A1 |
20200167466 | Cheng | May 2020 | A1 |
20200175424 | Kursun | Jun 2020 | A1 |
20200182847 | Jones | Jun 2020 | A1 |
20200201727 | Nie et al. | Jun 2020 | A1 |
20200210781 | Desilets-Benoit et al. | Jul 2020 | A1 |
20200311615 | Jammalamadaka et al. | Oct 2020 | A1 |
20200349434 | Zhang et al. | Nov 2020 | A1 |
20200364561 | Ananthanarayanan et al. | Nov 2020 | A1 |
20200364583 | Pedersen | Nov 2020 | A1 |
20200387833 | Kursun | Dec 2020 | A1 |
20200387836 | Nasr-Azadani et al. | Dec 2020 | A1 |
20200402665 | Zhang et al. | Dec 2020 | A1 |
20210034581 | Boven et al. | Feb 2021 | A1 |
20210042643 | Hong et al. | Feb 2021 | A1 |
20210110313 | Jones | Apr 2021 | A1 |
20210136006 | Casey et al. | May 2021 | A1 |
20220092346 | Jones | Mar 2022 | A1 |
20220277058 | Jones | Sep 2022 | A1 |
20220277232 | Jones | Sep 2022 | A1 |
Number | Date | Country |
---|---|---|
2845827 | Feb 2013 | CA |
1199462 | Nov 1998 | CN |
1553712 | Dec 2004 | CN |
1770158 | May 2006 | CN |
102081765 | Jun 2011 | CN |
103077428 | May 2013 | CN |
102576311 | Apr 2014 | CN |
104090861 | Oct 2014 | CN |
10425488 | Dec 2014 | CN |
106471475 | Mar 2017 | CN |
104254848 | Apr 2017 | CN |
106919539 | Jul 2017 | CN |
106933779 | Jul 2017 | CN |
109299156 | Feb 2019 | CN |
ZL201410058245.X | Jun 2019 | CN |
110378386 | Oct 2019 | CN |
110411957 | Nov 2019 | CN |
110458374 | Nov 2019 | CN |
110543618 | Dec 2019 | CN |
110909822 | Mar 2020 | CN |
111080502 | Apr 2020 | CN |
111157698 | May 2020 | CN |
111709447 | Sep 2020 | CN |
112257963 | Jan 2021 | CN |
2745213 | Jun 2014 | EP |
2770442 | Aug 2014 | EP |
3129309 | Feb 2017 | EP |
3483797 | May 2019 | EP |
3493079 | Jun 2019 | EP |
3514700 | Jul 2019 | EP |
2004-068729 | Mar 2004 | JP |
2004-145496 | May 2004 | JP |
2004-191359 | Jul 2004 | JP |
2004-530967 | Oct 2004 | JP |
2007-522477 | Aug 2007 | JP |
2007-522658 | Aug 2007 | JP |
3968039 | Aug 2007 | JP |
2008-503277 | Feb 2008 | JP |
4042492 | Feb 2008 | JP |
2008-166644 | Jul 2008 | JP |
2008-191900 | Aug 2008 | JP |
2009-098093 | May 2009 | JP |
2009-253362 | Oct 2009 | JP |
2010-502308 | Jan 2010 | JP |
2010-250674 | Nov 2010 | JP |
2011-048688 | Mar 2011 | JP |
2012-155684 | Aug 2012 | JP |
2014-170532 | Sep 2014 | JP |
5592813 | Sep 2014 | JP |
5982489 | Aug 2016 | JP |
2017-514252 | Jun 2017 | JP |
6297855 | Mar 2018 | JP |
2018113048 | Jul 2018 | JP |
2018116712 | Jul 2018 | JP |
2018116713 | Jul 2018 | JP |
2018116714 | Jul 2018 | JP |
2018136945 | Aug 2018 | JP |
2018139109 | Sep 2018 | JP |
6527976 | May 2019 | JP |
6613329 | Nov 2019 | JP |
6626910 | Dec 2019 | JP |
6626911 | Dec 2019 | JP |
6636071 | Jan 2020 | JP |
6686056 | Apr 2020 | JP |
10-2008-0055132 | Jun 2008 | KR |
10-1010717 | Jan 2011 | KR |
10-2012-0117847 | Oct 2012 | KR |
10-1329395 | Nov 2013 | KR |
20140092805 | Jul 2014 | KR |
10-2024953 | Sep 2019 | KR |
10-2052217 | Dec 2019 | KR |
2007117233 | Oct 2007 | WO |
2008126209 | Jul 2010 | WO |
2011080548 | Jul 2011 | WO |
2011089959 | Jul 2011 | WO |
2013028532 | Feb 2013 | WO |
2015157745 | Oct 2015 | WO |
2019049546 | Mar 2019 | WO |
2020260927 | Dec 2020 | WO |
2021055847 | Mar 2021 | WO |
2022060411 | Mar 2022 | WO |
Entry |
---|
North American Electric Reliability Council; Predicting Unit Availability: Top-Down Analyses for Predicting Electric Generating Unit Availability; Predicted Unit Availability Task Force, North American Electric Reliability Council; US; Jun. 1991; 26 pages. |
Cipolla, Roberto et al.; Motion from the Frontier of Curved Surfaces; 5th International Conference on Computer Vision; Jun. 20-23, 1995; pp. 269-275. |
Richwine, Robert R.; Optimum Economic Performance: Reducing Costs and Improving Performance of Nuclear Power Plants; Rocky Mountain Electrical League, AIP-29; Keystone Colorado; Sep. 13-15, 1998; 11 pages. |
Richwine, Robert R.; Setting Optimum Economic Performance Goals to Meet the Challenges of a Competitive Business Environment; Rocky Mountain Electrical League; Keystone, Colorado; Sep. 13-15, 1998; 52 pages. |
Int'l Atomic Energy Agency; Developing Economic Performance Systems to Enhance Nuclear Power Plant Competitiveness; International Atomic Energy Agency; Technical Report Series No. 406; Vienna, Austria; Feb. 2002; 92 pages. |
Richwine, Robert R.; Optimum Economic Availability; World Energy Council; Performance of Generating Plant Committee—Case Study of the Month Jul. 2002; London, UK; Jul. 2002; 3 pages. |
World Energy Council; Performance of Generating Plant: New Realities, New Needs; World Energy Council; London, UK; Aug. 2004; 309 pages. |
Richwine, Robert R.; Maximizing Availability May Not Optimize Plant Economics; World Energy Council, Performance of Generating Plant Committee—Case Study of the Month Oct. 2004; London, UK; Oct. 2004; 5 pages. |
Curley, Michael et al.; Benchmarking Seminar; North American Electric Reliability Council; San Diego, CA; Oct. 20, 2006; 133 pages. |
Richwine, Robert R.; Using Reliability Data in Power Plant Performance Improvement Programs; ASME Power Division Conference Workshop; San Antonio, TX; Jul. 16, 2007; 157 pages. |
Gang, Lu et al.; Balance Programming Between Target and Chance with Application in Building Optimal Bidding Strategies for Generation Companies; International Conference on Intelligent Systems Applications to Power System; Nov. 5-8, 2007; 8 pages. |
U.S. Patent and Trademark Office; Notice of Allowance and Fee(s) Due; issued in connection with U.S. Appl. No. 11/801,221; dated Jun. 23, 2008; 8 pages; US. |
U.S. Patent and Trademark Office; Supplemental Notice of Allowability; issued in connection with U.S. Appl. No. 11/801,221; dated Sep. 22, 2008; 6 pages; US. |
U.S. Patent and Trademark Office; Non-Final Office Action, issued against U.S. Appl. No. 12/264,117; dated Sep. 29, 2010; 19 pages; US. |
U.S. Patent and Trademark Office; Non-Final Office Action, issued against U.S. Appl. No. 12/264,127; dated Sep. 29, 2010; 18 pages; US. |
U.S. Patent and Trademark Office; Non-Final Office Action, issued against U.S. Appl. No. 12/264,136; dated Sep. 29, 2010; 17 pages; US. |
U.S. Patent and Trademark Office; Interview Summary, issued in connection with U.S. Appl. No. 12/264,117; dated Mar. 3, 2011; 9 pages; US. |
U.S. Patent and Trademark Office; Interview Summary, issued in connection with U.S. Appl. No. 12/264,136; dated Mar. 4, 2011; 9 pages; US. |
U.S. Patent and Trademark Office; Ex Parte Quayle Action, issued in connection with U.S. Appl. No. 12/264,136; Apr. 28, 2011; 7 pages; US. |
U.S. Patent and Trademark Office; Notice of Allowance and Fee(s) Due; issued in connection with U.S. Appl. No. 12/264,136; dated Jul. 26, 2011; 8 pages; US. |
U.S. Patent and Trademark Office; Notice of Allowance and Fee(s) Due; issued in connection with U.S. Appl. No. 12/264,117; dated Aug. 23, 2011; 13 pages; US. |
U.S. Patent and Trademark Office; Notice of Allowance and Fee(s) Due; issued in connection with U.S. Appl. No. 12/264,127; dated Aug. 25, 2011; 12 pages; US. |
U.S. Patent and Trademark Office; Non-Final Office Action, Issued against U.S. Appl. No. 13/772,212; dated Apr. 9, 2014; 20 pages; US. |
European Patent Office, PCT International Search Report and Written Opinion, issued in connection to PCT/US2012/051390; dated Feb. 5, 2013; 9 pages; Europe. |
Japanese Patent Office; Office Action, Issued against Application No. JP2014-527202; dated Oct. 13, 2015; Japan. |
European Patent Office; Extended European Search Report, issued in connection to EP14155792.6; dated Aug. 18, 2014; 5 pages; Europe. |
European Patent Office; Communication Pursuant to Article 94(3) EPC, issued in connection to EP14155792.6; dated May 6, 2015; 2 pages; Europe. |
European Patent Office; Invitation Pursuant to Rule 137(4) EPC and Article 94(3) EPC, issued in connection to EP14155792.6; dated Jan. 3, 2018; 4 pages; Europe. |
European Patent Office, Result of Consultation, issued in connection to EP14155792.6; dated Jun. 19, 2018; 3 pages; Europe. |
European Patent Office; Communication Pursuant to Article 94(3) EPC, issued in connection to EP12769196.2; dated May 6, 2015; 5 pages; Europe. |
State Intellectual Property Office of the People's Republic of China; Notification of the First Office Action, issued in connection to CN201710142639.7; dated Sep. 29, 2018; 8 pages; China. |
State Intellectual Property Office of the People's Republic of China; Notification of the First Office Action, issued in connection to CN201710142741.7; dated Sep. 4, 2018; 8 pages; China. |
Canadian Intellectual Property Office; Examiner's Report, issued in connection to CA2845827; dated Jan. 28, 2019; 5 pages; Canada. |
Canadian Intellectual Property Office; Examiner's Report, issued in connection to CA2845827; dated May 10, 2018; 5 pages; Canada. |
Korean Intellectual Property Office; Notification of Provisional Rejection, issued in connection to KR10-2014-7007293; dated Oct. 16, 2018; 3 pages; Korea. |
State Intellectual Property Office of the People's Republic of China; Notification of the Second Office Action, issued in connection to CN201410058245.X; dated Aug. 6, 2018; 11 pages; China. |
State Intellectual Property Office of the People's Republic of China; Notification of the First Office Action, issued in connection to CN201410058245.X; dated Sep. 5, 2017; 20 pages; China. |
Japanese Patent Office; Office Action, issued in connection to JP2018-029938; dated Dec. 4, 2018; 5 pages; Japan. |
Daich Takatori, Improvement of Support Vector Machine by Removing Outliers and its Application to Shot Boundary Detection, Institute of Electronics, Information and Communication Engineers (IEICE) 19th Data Engineering Workshop Papers [online], Japan, IEICE Data Engineering Research Committee, Jun. 25, 2009, 1-7, ISSN 1347-4413. |
The International Bureau of WIPO; PCT International Preliminary Report on Patentability, issued in connection to PCT/US2012/051390; dated Mar. 6, 2014; 6 pages; Switzerland. |
United States Patent and Trademark Office; PCT International Search Report and Written Opinion, Issued in connection to PCT/US15/25490; dated Jul. 24, 2015; 12 pages; US. |
The International Bureau of WIPO; PCT International Preliminary Report on Patentability, issued in connection to PCT/US15/25490; dated Oct. 20, 2016; 7 pages; Switzerland. |
European Patent Office; Communication Pursuant to Rules 70(2) and 70a(2) EPC, issued in connection to EP15776851.6; dated Mar. 13, 2018; 34 pages; Europe. |
European Patent Office; Extended European Search Report, issued in connection to EP15776851.6; dated Feb. 22, 2018; 3 pages; Europe. |
State Intellectual Property Office of the People's Republic of China; Notification of the First Office Action, issued in connection to CN201580027842.9; dated Jul. 12, 2018; 9 pages; China. |
Japanese Patent Office; Notification of Reason for Rejection, issued in connection to JP2017-504630; dated Jan. 1, 2019; 13 pages; Japan. |
Ford, AP et al.; IEEE Standard definitions for use in reporting electric generating unit reliability, availability and productivity; IEEE Power Engineering Society, Mar. 2007; 78 pages. |
North American Electric Reliability Council; Predicting Generating Unit Reliability; Dec. 1995; 46 pages. |
European Patent Office; Extended European Search Report, issued in connection to EP19153036.9; 8 pages; dated May 8, 2019; Europe. |
Japanese Patent Office; Notice of Reasons for Rejection, issued in connection to JP2018-029939; 5 pages; dated Apr. 17, 2019; Japan. |
European Patent Office; Decision to refuse a European patent application, issued in connection to application No. EP12769196.2; dated Jul. 4, 2018; 4 pages; Europe. |
China National Intellectual Property Administration; Notification of Third Office Action, issued in connection to application No. 201580027842.9; dated Mar. 15, 2022; 10 pages; China. |
Canadian Intellectual Property Office; Examination Report, issued in connection to application No. 3116974; dated May 16, 2022; 5 pages; Canada. |
Canadian Intellectual Property Office; Examination Report, issued in connection to application No. 2945543; dated May 12, 2020; 4 pages; Canada. |
Korean Intellectual Property Office; Notification of Provisional Rejection, issued in connection to KR10-2016-7031635; dated Aug. 20, 2020; 10 pages; Korea. |
Korean Intellectual Property Office; Notification of Provisional Rejection, issued in connection to KR10-2022-7002919; dated Feb. 22, 2022; 8 pages; Korea. |
Kassidas, Athanassios et al.; Synchronization of Batch Trajectories Using Dynamic Time Warping; Process Systems Engineering; Jan. 9, 1998; 12 pages. |
United States Patent and Trademark Office; Non-Final Office Action, issued in connection to U.S. Appl. No. 17/025,889; dated Apr. 15, 2021; 15 pages; US. |
United States Patent and Trademark Office; Non-Final Office Action, issued in connection to U.S. Appl. No. 17/572,566; dated Jul. 18, 2022; 25 pages; US. |
United States Patent and Trademark Office; Non-Final Office Action, issued in connection to U.S. Appl. No. 17/204,940; dated Jun. 10, 2021; 12 pages; US. |
European Patent Office; Communication pursuant to Rules 161(1) and 162 EPC, issued in connection to application No. EP20785887.9; dated Apr. 14, 2022; 3 pages; Europe. |
Indian Patent Office; First Examination Report, issued in connection to application No. 202237016971; dated Aug. 5, 2022; 6 pages; India. |
Russian Patent Office; Official Action, issued in connection to application No. 2022110167/28; dated Dec. 2, 2022; 19 pages; Russia. |
Japanese Patent Office; Notice of Reasons for Rejection, issued in connection to JP2018-029940; 4 pages; dated Apr. 17, 2019; Japan. |
Japanese Patent Office; Notice of Reasons for Rejection, issued in connection to JP2018-029941; 7 pages; dated Apr. 17, 2019; Japan. |
Japanese Patent Office; Notice of Reasons for Rejection, issued in connection to JP2018-029944; 4 pages; dated May 14, 2019; Japan. |
Japanese Patent Office; Notice of Reasons for Rejection, issued in connection to JP2018-029943; 7 pages; dated May 8, 2019; Japan. |
European Patent Office; Extended European Search Report, issued in connection to EP18192489.5; 7 pages; dated Apr. 26, 2019; Europe. |
State Intellectual Property Office of the People's Republic of China; Notification of the Second Office Action, issued in connection to CN201580027842.9; dated May 28, 2019; 9 pages; China. |
China National Intellectual Property Administration; Notification of the Second Office Action and Search Report, issued in connection with CN201710142639.7; dated Aug. 20, 2019; 23 pages; China. |
Qing, Liu; Measurement Error of Thermistor Compensated by CMAC-Based Nonlinear Inverse Filter; Journal of Instrument; vol. 26, No. 10; Oct. 30, 2005; pp. 1077-1080; China. |
Hui, Liu; RSSI-Based High-Precision Indoor Three-Dimensional Spatial Positioning Algorithm; Chinese Master's Theses Full-Text Database Information Science and Technology, No. 8; Aug. 15, 2011; 4 pages. |
Korean Intellectual Property Office; Notification of Provisional Rejection, issued in connection to application No. 10-2014-0019597; dated Jun. 18, 2019; 7 pages; Korea. |
Japanese Patent Office; Notice of Final Decision of Rejection, issued in connection to application No. 2017-504630; dated Dec. 3, 2019; 11 pages; Japan. |
Indian Patent Office; First Examination Report, issued in connection to application No. 389/KOLNP/2014; dated Jan. 27, 2020; 7 pages; India. |
China National Intellectual Property Administration; Rejection Decision, issued in connection to application No. 201710142741.7; dated Jan. 15, 2020; 8 pages; China. |
Canadian Intellectual Property Office; Examination Report, issued in connection to application No. 2843276; dated Feb. 3, 2020; 5 pages; Canada. |
Indian Patent Office; First Examination Report, issued in connection to application No. 215/KOL/2014; dated Feb. 21, 2020; 7 pages; India. |
Canadian Intellectual Property Office; Examiner's Report, issued in connection to application No. 2845827; dated Feb. 19, 2020; 7 pages; Canada. |
China National Intellectual Property Administration; Rejection Decision, issued in connection to application No. 201580027842.9; dated Feb. 3, 2020;—pages; China. |
China National Intellectual Property Administration; Third Office Action, issued in connection to application No. 201710142639.7; dated Mar. 5, 2020; 8 pages; China. |
Intellectual Property Office of India; Examination Report, issued in connection to application No. 201637036024; dated Feb. 25, 2021; 8 pages; India. |
Canadian Intellectual Property Office; Examination Report, issued in connection to application No. 2843276; dated Feb. 22, 2021; 4 pages; Canada. |
Canadian Intellectual Property Office; Examiner's Report, issued in connection to application No. 2,945,543; dated May 12, 2020; 4 pages; Canada. |
Raj S, Stephen et al.; Detection of Outliers in Regression Model for Medical Data; 2017; International Journal of Medical Research and Health Sciences; pp. 50-56. |
Korean Intellectual Property Office; Notification of Provisional Rejection, issued in connection to application No. 10-2016-7031635; dated Aug. 20, 2020; 11 pages; Korea. |
China National Intellectual Property Administration; Rejection Decision, issued in connection to application No. 201710142639.7; dated Aug. 25, 2020; 10 pages; China. |
Nguyen, Minh N.H. et al.; Outliers Detection and Correction for Cooperative Distributed Online Learning in Wireless Sensor Network; Jan. 11, 2017; pp. 349-353; IEEE Xplore. |
Santhanam, T. et al.; Comparison of K-Means Clustering and Statistical Outliers in Reduction Medical Datasets; Nov. 27, 2014; 6 pages; IEEE Xplore. |
Yoon, Tae Bok et al.; Improvement of Learning Styles Diagnosis Based on Outliers Reduction of User Interface Behaviors; Dec. 10, 2007; pp. 497-502; IEEE Xplore. |
European Patent Office; PCT International Search Report, issued in connection to application No. PCT/US2020/051627; dated Jan. 15, 2021; 4 pages; Europe. |
European Patent Office; PCT Written Opinion of the International Searching Authority, issued in connection to application No. PCT/US2020/051627; dated Jan. 15, 2021; 5 pages; Europe. |
Sato, Danilo et al.; Continuous Delivery for Machine Learning; Sep. 19, 2019; 41 pages. |
Saucedo, Alejandro; The Anatomy of Production ML; Dec. 2020; 58 pages. |
Sadik, MD. Shiblee; Online Detection of Outliers for Data Streams; 2013; A Dissertation Submitted to the Graduate Faculty; 289 pages. |
Suresh, Anuganti; How to Remove Outliers for Machine Learning; Nov. 30, 2020; 33 pages. |
Sharifzadeh, Mahdi et al.; Machine-Learning Methods for Integrated Renewable Power Generation: A Comparative Study of Artificial Neural Networks, Support Vector Regression, and Gaussian Process Regression; Jul. 25, 2018; 26 pages; Elsevier Ltd. |
Japanese Patent Office; Office Action, issued in connection to application No. 2020-065773; dated Mar. 9, 2021; 4 pages; Japan. |
Japanese Patent Office; Office Action, issued in connection to application No. 2020-065773; dated Jun. 22, 2021; 9 pages; Japan. |
European Patent Office; PCT International Search Report, issued in connection to application No. PCT/US2021/022861; dated Jul. 9, 2021; 5 pages; Europe. |
European Patent Office; PCT Written Opinion of the International Searching Authority, issued in connection to application No. PCT/US2021/022861; dated Jul. 9, 2021; 9 pages; Europe. |
Rubin, Cynthia et al.; Machine Learning for New York City Power Grid; IEEE Transactions on Pattern Analysis and Machine Intelligence; vol. 34, No. 2; IEEE Computer Society; Feb. 1, 2022; pp. 3285-3345; U.S. |
Lazarevic, Aleksandar et al.; Feature Bagging for Outlier Detection; Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Aug. 21, 2005; pp. 157-166; U.S. |
Zhao, Yue et al.; DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles; Cornell University Library; Nov. 23, 2019; 9 pages; U.S. |
European Patent Office; Communication Pursuant to Article 94(3) EPC, issued in connection to application No. EP18192489.5; dated Jul. 12, 2021; 8 pages; Europe. |
European Patent Office; Communication Pursuant to Article 94(3) EPC, issued in connection to application No. EP15776851.6; dated Jun. 4, 2021; 5 pages; Europe. |
Canadian Intellectual Property Office; Examiner's Report, issued in connection to application No. 2,845,827; dated Aug. 10, 2021; 5 pages; Canada. |
China National Intellectual Property Administration; Board Opinion, issued in connection to application No. 201710142741.7; Jan. 13, 2022; 7 pages; China. |
China National Intellectual Property Administration; Board Opinion, issued in connection to application No. 201710142639.7; Dec. 8, 2022; 14 pages; China. |
Canadian Intellectual Property Office; Examination Report, issued in connection to application No. 2843276; dated Nov. 25, 2022; 3 pages; Canada. |
European Patent Office; Communication Pursuant to Article 94(3) EPC, issued in connection to application No. EP19153036.9; dated Jun. 1, 2022; 7 pages; Europe. |
Japanese Patent Office; Notice of Reasons for Rejection, issued in connection to application No. 2021-183813; dated Oct. 25, 2022; 5 pages; Japan. |
Number | Date | Country | |
---|---|---|---|
20230091421 A1 | Mar 2023 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16145544 | Sep 2018 | US |
Child | 17949743 | US |