This invention relates generally to the field of information theory. More particularly, it relates to an information theory method that joins Fano's equality with the Data Processing Inequality in a Markovian channel construct for the study of component-level uncertainty and information loss within a system.
Recent innovations in radio frequency (RF) sensing component technology, particularly in the area of remote target signature measurement and exploitation, include multi-channel spatially diverse antennas, sensitive receivers, fast analog-to-digital converters, adaptive transmit waveforms, and sparse sampling approaches. These innovations support new signature information sensing functions such as calibrated target measurements, feature processing, and inference-based decision algorithms. The ability to characterize target information extraction under the effects of system uncertainties is critical to the full application of the scientific method in the expanding trade space of these new functional capabilities, particularly regarding waveform design and the analysis of radar signatures and radar systems. Regardless of the application, the success of any information systems theory model will largely depend on its ability to address several challenges: the ability to (1) characterize the performance of modular systems within critical regions in the space of inputs while under the effects of various sources of uncertainty; (2) propagate the effects of these uncertainty sources acting on individual components within the system to the predicted system performance measures; (3) effectively minimize the overall loss in the information flow while trading costs associated with component design; and (4) operate effectively within the nonlinear high dimensional spaces inherent in many systems such as signature sensor systems.
A variety of information theoretic approaches have been formulated and applied to the area of RF sensing, particularly to the analysis and design of waveforms and radar systems such as the new radar architecture referred to as MIMO (Multiple Input Multiple Output) radar. For example, information theory-based frameworks employing a variety of techniques have been presented in the field of radar analysis, including application of the Fano bound to train and develop target classifiers in automatic target recognition (ATR) systems and use of mutual information (MI) as a similarity measure for evaluating the suitability of radar signature training surrogates. Other approaches, such as the information bottleneck approach, have presented the radar system in terms of a Markov Chain within a channel configuration and characterized the information flow from source to sink in order to, for example, study information loss. However, existing systems theory prototypes frequently fall short in their ability to fully characterize the flow of information through the components of a sensing system while that system is subjected to the effects of system uncertainty. The ability to isolate the effects of uncertainty within the components of the system allows for component design trade methods that lead to optimal information flow.
In engineering scenarios, the error associated with system parameters is of interest. For example, the tolerances of machined components in a mechanical system are a key consideration in the manufacturing process, impacting the amount of testing and measurement needed to ensure compliance, as well as contributing to overall system assembly expense (generally, the more stringent the fabrication requirements, the more expensive the end product). Similarly, the confidence a user has in a meter reading value output by a system is also of critical importance. For example, a pilot needs to know whether the fuel gauge in an airplane cockpit indicates that the aircraft can reach its destination with 50%, 90%, or 100% confidence.
The uncertainty associated with a system parameter is typically due to many sources. In traditional linear signal processing models with additive Gaussian noise, sources of uncertainty (noise) are assumed to be statistically independent. Because the sum of Gaussians is a Gaussian, the final overall uncertainty for a system output value is easily tabulated from the individual component uncertainties. Real-life systems, however, often have nonlinear behavior. In addition, the noise may not be Gaussian, additive, or statistically independent. These deviations from the linear, additive, independent Gaussian noise model quickly make uncertainty and error estimation analytically intractable. As a recourse, engineers frequently resort to numerical simulation methods such as Monte Carlo-based techniques. However, real-life systems have a large number of degrees of freedom, and numerical simulation in such situations must be carefully addressed. Hence the need arises for accurate, analytically based methods for uncertainty estimation and propagation analysis.
The description of the illustrative embodiments can be read in conjunction with the accompanying figures. It will be appreciated that for simplicity and clarity of illustration, elements illustrated in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements are exaggerated relative to other elements. Embodiments incorporating teachings of the present disclosure are shown and described with respect to the figures presented herein, in which:
The present disclosure includes a method for identifying and characterizing component-level information loss in a nonlinear system comprising a plurality of components, one or more of which are subject to at least one source of uncertainty that each comprises a plurality of system uncertainty parameters. The method comprises the steps of: a) determining discrete decision states for the nonlinear system that comprise a true object state H and a decision state Q, with the discrete decision states being characterized in a Markovian channel model comprising a plurality of links that each correspond to one component of the nonlinear system; b) modeling the system uncertainty parameters to create a plurality of distributions that each comprise a plurality of values ranging from a theoretical maximum entropy to a theoretical minimum entropy for one system uncertainty parameter, in which at least one of the system uncertainty parameters is unknown; c) calculating an entropy at each component, H(H), H(X), H(Y), . . . H(Q), that is directly related to an amount of uncertainty at each component; d) computing an amount of mutual information between H and Q, I(H;Q), in which I(H;Q) is used to characterize a total system performance and the one or more sources of uncertainty increase a total amount of entropy in the nonlinear system, thereby decreasing I(H;Q) and degrading the total system performance; e) calculating an amount of cumulative component information loss from H to Q, ILX, ILY, . . . ILQ, in which ILQ is equal to a sum of the component-level information loss that occurs at each component, ILXΔ, ILYΔ, . . . 
ILQΔ, and component-level information loss occurs only within the Markovian channel model; f) correlating, using Fano's equality, at least one of I(H;Q) and ILQ to the total amount of entropy to generate at least one overall probability of error, Pe, for the nonlinear system; g) estimating, using the Data Processing Inequality together with Fano's equality, a component-level probability of error, PeX, PeY, . . . PeQ; and h) correlating the component-level probability of error to the component-level information loss.
The present disclosure further includes a method for computing a component-level performance reliability and attributing a contribution of each system uncertainty parameter to the component-level performance reliability by: a) determining a real world statistical variation of the system uncertainty parameters; b) performing a Monte Carlo simulation of a plurality of the statistical uncertainty parameters for a plurality of settings by iteratively performing, according to the present disclosure, the step of modeling the system uncertainty parameters through the step of correlating the component-level probability of error to the component-level information loss; c) calculating a component-level probability of error statistical distribution at each component; d) determining the component-level performance reliability based on a standard deviation of each component-level probability of error statistical distribution; and e) correlating the contribution of each system uncertainty parameter to the component-level performance reliability. In some embodiments, the step of performing the Monte Carlo simulation further comprises determining a proper ensemble sample size.
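A compressed numerical sketch of steps (a) through (d) follows, with a single hypothetical uncertainty parameter (the crossover probability of a binary symmetric channel, drawn from an assumed Gaussian spread) standing in for the full real-world parameter set. The distribution parameters are illustrative assumptions only.

```python
import math
import random

def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def pe_reliability(n_draws=5000, seed=0):
    """Monte Carlo sketch: draw one uncertainty parameter per trial,
    record the resulting probability of error, and report the mean and
    standard deviation of the Pe distribution.  The standard deviation
    serves as the performance-reliability measure described above."""
    rng = random.Random(seed)
    pes, mis = [], []
    for _ in range(n_draws):
        # hypothetical parameter spread, clipped to a valid crossover range
        eps = min(max(rng.gauss(0.1, 0.02), 0.0), 0.5)
        pes.append(eps)                          # equiprobable BSC: optimal Pe = eps
        mis.append(1.0 - binary_entropy(eps))    # I(H;Q) for that draw
    mean = sum(pes) / len(pes)
    std = (sum((p - mean) ** 2 for p in pes) / (len(pes) - 1)) ** 0.5
    return mean, std, sum(mis) / len(mis)
```

The returned standard deviation tightens as the parameter spread narrows, which is the sense in which component reliability improves with reduced uncertainty.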
The present disclosure further includes a method for determining at least one component-level ensemble sampling requirement comprising the steps of: a) determining a set of test criteria for a maximum allowable sampling uncertainty of the component-level information loss relative to the component-level probability of error statistical distributions; b) determining a sample ensemble size NM for the component-level information loss using a phase transition method; and c) computing the component-level performance reliability using a numerical simulation method on the sample ensemble size NM. In some embodiments, the numerical simulation method comprises Monte Carlo modeling.
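The phase transition method itself is not detailed in this section. As a simple stand-in for illustration only, the ensemble can be grown until the standard error of a sample-mean estimate satisfies the maximum allowable sampling uncertainty; the growth schedule, starting size, and cap below are all assumptions of this sketch.

```python
import math
import random

def ensemble_size_for_tolerance(draw, tol, n0=64, seed=0, max_n=1 << 20):
    """Grow the sample ensemble until the standard error of the
    sample-mean estimate falls below the allowable sampling
    uncertainty `tol`.  `draw` maps a random.Random instance to one
    sample of the quantity being estimated (e.g. a component-level
    information loss)."""
    rng = random.Random(seed)
    n = n0
    while n <= max_n:
        xs = [draw(rng) for _ in range(n)]
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        if math.sqrt(var / n) < tol:   # standard error of the mean
            return n
        n *= 2                          # double the ensemble and retry
    return n
```

For a unit-variance quantity and tol = 0.05, the rule settles near the N for which 1/√N drops below the criterion.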
The present disclosure further includes a method for determining an optimal component design for a nonlinear system comprising a plurality of components, one or more of which are subject to at least one source of uncertainty that each comprises a plurality of system uncertainty parameters. The method comprises the steps of: a) establishing an information loss budget comprising a desired PeQ; b) calculating, according to the present disclosure, the component-level information loss, ILXΔ, ILYΔ, . . . ILQΔ; c) calculating, according to the present disclosure, component probability of error, PeX, PeY, . . . PeQ, to generate a calculated PeQ; d) comparing the calculated PeQ with the desired PeQ; e) identifying at least one source of information reduction that comprises component-level information loss and/or information flow reduction; f) determining the optimal component design to minimize the calculated PeQ that includes at least one tradeoff between information flow and component design, in which the tradeoff decreases the at least one source of information reduction; and g) repeating the step of calculating component-level information loss through the step of determining the optimal component design until the calculated PeQ is equal to or less than the desired PeQ. In some embodiments, the method further comprises identifying at least two sources of information reduction that comprise component-level information loss and/or information flow reduction, ranking the two or more sources of information reduction according to impact on the calculated PeQ to identify at least one dominant source of information reduction, and determining the optimal component design to minimize the calculated PeQ, in which the optimal component design includes at least one tradeoff between information flow and component design that reduces the at least one dominant information loss source.
The present disclosure includes theoretical models and methods for identifying and quantifying information loss in a system due to uncertainty and analyzing the impact on the reliability of (or confidence in) system performance. These models and methods join Fano's equality, which is derived from Fano's inequality, with the Data Processing Inequality in a Markovian channel construct. In particular, the presently disclosed invention allows for the study of information flow and the effects of uncertainty on the information flow within the various components of a system. The present disclosure allows the determination of risk and characterization of system performance upper bounds based on the information loss attributed to each component. Taking an information theoretic view, degrading effects are considered as sources of entropy, which may be used to represent propagating uncertainty within an information channel. Treating the system as an information flow pipeline from input to output, the propagating effects of various sources of uncertainty (i.e. entropy) degrade the mutual information (MI) between the input and output. Development and application of a systems theory model allows for performing component-level design trades within the information sensing application based on a component-level information loss budget (Bits). Demonstration of the max flow in conjunction with the Data Processing Inequality further identifies information flow bottlenecks and provides analysis of these bottlenecks in the information flow pipeline.
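The pipeline view above can be sketched with a minimal cascade model. Here each component is assumed to be a binary symmetric channel (BSC) acting on an equiprobable binary source H (so H(H) = 1 bit); the link noise levels are illustrative assumptions. Consistent with the Data Processing Inequality, the mutual information can only decrease along the chain, and the per-link differences play the role of the component-level losses.

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def cumulative_mi(crossovers):
    """Cumulative mutual information I(H; link output) after each link
    of a cascade of binary symmetric channels, for an equiprobable
    binary source H.  Per the Data Processing Inequality, the values
    can only decrease along the chain."""
    e_cum, mis = 0.0, []
    for e in crossovers:
        # crossover probability of the composed channel up to this link
        e_cum = e_cum * (1.0 - e) + e * (1.0 - e_cum)
        mis.append(1.0 - binary_entropy(e_cum))
    return mis

# hypothetical per-link noise levels for a three-component pipeline
mis = cumulative_mi([0.05, 0.10, 0.02])
# per-component loss: first link's loss, then successive MI differences
losses = [1.0 - mis[0]] + [mis[i - 1] - mis[i] for i in range(1, len(mis))]
```

In this toy setting `losses` corresponds to the component-level losses and sums to the end-to-end loss 1 − I(H;Q), illustrating how entropy injected at each link degrades the input-output MI.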
The presently disclosed models and methods may be particularly useful within radar signature exploitation systems, and as such, key attributes of the presently disclosed theoretical models and methods are demonstrated under the constraints of a radar high range resolution (HRR) sensor system example. Simplified target scattering models are used to illustrate the value of component-level analysis under the effects of various sources of uncertainty in sensing systems. While the present disclosure is often referenced throughout with relation to radar and radar systems, one of ordinary skill in the art will appreciate that these models and methods may be employed in the design and analysis of a wide variety of systems and structures, including production/assembly lines, communications systems, and virtually any other multi-component system containing sources of uncertainty.
The use of information theoretic principles in the presently disclosed models and methods affords several advantages in dealing with the challenges associated with a variety of systems, particularly those in the areas of information sensing and exploitation. First, information theory prototypes enable the study of the propagating effects of various sources of uncertainty on system performance at the point of noise infiltration. For example, using Fano's inequality, the max flow criterion bounds the optimal Bayes error. Entropy and MI are analytically connected to the probability of error (Pe), and more generally the Neyman-Pearson criterion, allowing the rate of noise infiltration to be related to the rate of entropy growth and ultimately to the rate of degradation of system performance. The information loss associated with uncertainty sources can then be characterized in terms of a confidence interval about the predicted system performance at each component of the system. The Data Processing Inequality affords a method to determine information loss points and maximize information flow via component trades within a system information loss budget.
Second, the convexity of MI yields a unique solution and enables rapid numerical convergence (low computational complexity) to maximum MI configurations. MI affords the optimization of a scalar quantity, while classical Bayes likelihood ratio techniques involve optimizing on non-convex surfaces over high dimensional signature spaces. On a convex surface, the use of highly efficient search algorithms such as the Conjugate Gradient method will converge on the order of N operations (N dimensional problem). While entropy-based methods operate non-parametrically such that the probability does not have to be estimated, complicating factors can include numerical computation issues that occur within high dimensional processes (Bellman's Curse of Dimensionality). It can be shown, however, that computing the entropy of the multivariate sensor signature processes is also O(N). As a consequence of the law of large numbers, the asymptotic equipartition property asserts that there are large regions within the entropic signature subspace that will never occur under the decision hypotheses. Thus, the information theoretic approach holds the potential to exploit entropy-based methods operating within this “typical” signature subspace.
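The convexity property referenced above (concavity, when maximizing MI over the input distribution) can be checked numerically. As an illustrative sketch, the binary symmetric channel and its crossover probability below are assumptions; for a fixed channel, MI is concave in the input prior, so a search over the prior has a single maximum at the capacity-achieving prior of 0.5.

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def mi_bsc(prior, eps=0.1):
    """I(input; output) of a binary symmetric channel as a function of
    the input prior.  Concave in the prior, so hill-climbing has a
    unique optimum to converge to."""
    q = prior * (1.0 - eps) + (1.0 - prior) * eps   # P(output = 1)
    return binary_entropy(q) - binary_entropy(eps)

# sample the MI surface over the input prior
ps = [i / 100 for i in range(1, 100)]
vals = [mi_bsc(p) for p in ps]
```

The numerical check below confirms midpoint concavity and that the maximum sits at the uniform prior, equal to the channel capacity 1 − H_b(0.1).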
Third, classical statistical pattern recognition approaches use the maximum likelihood (ML) decision criterion, which includes only the second order statistics present in the training process. The use of MI in nonlinear processing affords advantages over linear processing in that it accounts for higher-order statistics within the design of nonlinear optimal decision rules and in the optimization of features. In the context of radar systems, nonlinear scattering phenomena resulting from the interaction of individual target mechanisms can also reduce the effectiveness of second order techniques in the optimization of diverse transmit waveforms. The use of MI as a nonlinear signal processing method for optimizing waveform design addresses this phenomenon. It is these inherent benefits that distinguish the presently disclosed information theoretic models and methods over traditional statistical pattern recognition methods.
The present disclosure additionally includes methods for estimating the sampling requirements for entropic quantities based, for example, on a characterization of the typical set underlying the sufficient statistics of a random signature process. Interdependencies among multivariate target signatures can significantly impede information extraction, and the expansion of the signature statistical support is related to incremental increases in uncertainty. Baseline statistical support (in the native coordinate system) associated with the resolved radio frequency target scattering is characterized for specified states of certainty. The performance estimate variance associated with lower sample counts within a Monte Carlo experiment may be scaled (via central limit theorem) to the estimate variance associated with higher sample counts.
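The central-limit scaling mentioned above (estimate variance scaling inversely with sample count) can be sketched as follows; the Gaussian stand-in for the performance quantity and the sample counts are illustrative assumptions.

```python
import random

def variance_of_mean_estimator(n, trials, rng):
    """Empirical variance of a sample-mean performance estimate formed
    from n samples, measured over repeated trials."""
    means = []
    for _ in range(trials):
        means.append(sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n)
    m = sum(means) / trials
    return sum((x - m) ** 2 for x in means) / (trials - 1)

rng = random.Random(1)
v_low = variance_of_mean_estimator(50, 400, rng)    # variance at a low sample count
v_high_predicted = v_low * 50.0 / 800.0             # CLT scaling from N = 50 to N = 800
v_high_measured = variance_of_mean_estimator(800, 400, rng)
```

The predicted high-count variance, scaled from the cheap low-count experiment, tracks the directly measured value, which is the scaling relation the text invokes.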
The present disclosure further includes methods for relating sampling uncertainty to sensing uncertainty to better understand the entropic effects within the information sensing system and to ensure confidence estimates are of sufficiently low variance. Referring to radar signature analysis, both sensor uncertainty and model training uncertainty are propagated into a classifier algorithm where uncertain decisions are inferred from uncertain observations. The uncertainty (i.e. the increase in entropy) is ultimately realized in the form of confidence or reliability intervals about the estimated system performance. A sensitivity analysis is performed to study the relative significance of various “unknown” operating conditions to the reliability of the performance estimate at each component of the system. The effects of sampling uncertainty are contrasted to reliability of performance estimates in order to study the variance effects in performance estimation within high dimensional signature processes subject to unknown operating conditions.
Uncertainty Analysis: In the sensor measurement community, “accuracy” generally refers to the agreement between a measured value and the true or correct value, while “precision” generally refers to the repeatability of a measurement. “Error” refers to the disagreement between the measured value and the true or accepted value. The “uncertainty” in a stated measurement is the interval of confidence around the measured value such that the measured value is expected not to lie outside this stated interval. This use of the term “uncertainty” implies that the true or correct value may not be known and can be stated along with a probability, which recognizes the deterministic nature of error and the stochastic nature of uncertainty. However, this definition is often insufficient to address the full range of issues within an information sensing system containing multiple sources of uncertainty.
For example, radar systems produce signature measurements that, when combined with the effects of various system uncertainties, are realized as a random signature process. Conclusions are inferred by applying instances taken from this random measured signature process to a decision rule. The "unknowable" nature of parameters affecting the measured signature process leads to challenges in developing a signature process model that will generate the optimal decision rule for inferring information. The combined effects of these uncertainties limit the exploitation of physics-based features and result in a loss in information that can be extracted from target signature measurements. The resulting decision uncertainty is driven by both the distorted measurements and the degree of agreement between the signature process under measurement and the process used to train the optimal decision rule.
As a specific example, measurement of airborne moving objects using high range resolution (HRR) waveforms is complicated by several sources of uncertainty. As shown in Table 1, two classes of system uncertainty are introduced into the system: sensing uncertainty and uncertainty resulting from decision rule training limitations. Sensing uncertainty is further divided into three subcategories: (a) signature measurement uncertainty due to sensor design/limitations; (b) object tracking position and motion uncertainty; and (c) uncertainty due to interference.
The object under measurement by the sensing radar system can be viewed as a collection of scattered field sources filling an electrically large volume in space. The system measurement of this object is subject to the uncertainty identified in source 1(a), generating the statistical support underlying a random signature process at a fixed position in time. Target fixed body motion within the measurement interval induces scintillation within the scattering sources, resulting in an additional increase in entropy. Imperfect knowledge of target position, velocity, and aspect also alters the statistical characterization of the random signature process (source 1(b)), and the random signature process interacts with an external environment (source 1(c)) to further impact the statistical nature of the measured signature process.
These sources of uncertainty, along with limitations within the training process, result in a decision rule design that is less than optimal with respect to system performance. The exploitation of this signature process using a decision algorithm requires the training (generally via supervised learning) of an optimal decision rule that operates within the entropy produced by sources 1(a)-(c), but only a subset of the phenomena (parameters) underlying source 1 can be modeled and/or characterized within the statistical decision rule training process. While uncertainty source 1(a) is generally epistemic and may be modeled and characterized, uncertainty sources 1(b), 1(c), and 2 are aleatoric in nature and are generally considered “unknowable.” As such, uncertainty sources 1(b), 1(c), and 2 may generally only be characterized statistically and may result in a reduction in certainty from the highest certainty state.
The sources of uncertainty associated with source 2 in Table 1 are traceable to the corresponding effect within the decision rule subspace in the classical statistical pattern recognition approach to the binary hypothesis testing. The decision rule design (threshold d) is based on statistical training support resulting from the uncertainties in Table 1. If the sensing uncertainties within source 1 are adequately represented in the statistics of the training process, the decision rule design should provide optimal performance; however, the effects due to many of the uncertainties in Table 1 are unavoidable. For example, realizations are often formed through the integration of many sequential measurements. Intra-measurement object motion can cause distortion and induce uncertainty in the decision rule subspace that is not accounted for in the decision rule training process. In another example, the object under measurement may be configured differently than that represented in the training data (extra fuel tanks, wing flaps up, or damaged surface for example).
Information Theoretic Decision Rule Subspace: Referring now to the drawings, like reference numerals may designate like or corresponding parts throughout the several views. One approach to viewing the decision rule subspace is shown in
Information Theoretic Radar Channel Model: The concept of uncertainty introduced in
Referring to
One or more of the components of the system comprising links or stages within the Markovian channel model, referred to here as X, Y, . . . , is subject to at least one source of uncertainty. In Step 310, these sources of uncertainty, each of which may comprise a variety of governing system uncertainty parameters or variables, are modeled. Modeling the parameters creates a series of distributions, each of which represents a set of values ranging from the theoretical maximum value of entropy to the theoretical minimum value for each parameter. For example, in a radar system, these variables may include values that are constantly changing and/or that are unknowable or aleatoric, such as the target aspect angle, the leading edge location, thermal noise, the presence of jamming frequencies, etc. These aleatoric variables must generally be characterized statistically, and these characterizations may be in the form of statistical distributions or, in the case of radar systems, range bins.
Continuing with the radar system example, X in
The multivariate sample feature {right arrow over (Y)}i is extracted from the ith instance test sample {right arrow over (X)}i to support the desired function of the exploitation system. Given the random nature of {right arrow over (X)}, the extracted signature feature {right arrow over (Y)} is also random. The training feature process {right arrow over (Y)}′ is developed from the set of typical signatures within a decision rule training process {right arrow over (X)}′ (not separately shown). {right arrow over (X)}′ (and thus {right arrow over (Y)}′) is developed offline using a surrogate process and is used to determine the ‘optimal’ decision rule d. The decision algorithm applies {right arrow over (Y)}i to the decision rule d, yielding the decision Q (instance of Q) declaring which of the hypotheses has occurred.
Referring still to
Following determination of the sample ensemble size NM (Step 315), the next step is calculating an amount of entropy at each component of the system (Step 320). An instance of each parameter is drawn from the distributions created in Step 310, and based on the NM ensemble of samples calculated in Step 315, the entropy is determined for each component of the system, H(H), H(X), H(Y), . . . H(Q). The amount of entropy at each component of the system is equal to H(H), H(X), H(Y), . . . H(Q), respectively, and a total amount of entropy for the system is equal to the sum of the entropies at each component. The next step is computing the amount of MI between H and all other components, including the output of the system at Q, I(H;X), I(H;Y), . . . I(H;Q) (Step 325). The one or more sources of uncertainty may cause a degradation of performance by increasing the amount of entropy in the nonlinear system and thus decreasing I(H;Q). I(H;Q) may be mathematically related to the total system performance as described herein, thereby allowing a correlation between increases in entropy, decreases in MI, and changes in system performance.
Following calculation of MI in Step 325, the next step is determining an amount of cumulative component information loss from H to each component, including the output component at Q, ILX, ILY, . . . ILQ, as well as component-level information loss ILXΔ, ILYΔ, . . . ILQΔ (Step 330). ILQ is equal to an end-to-end sum of the component-level information loss that is occurring at each component. Information cannot be gained and can only be lost within the Markovian channel. In the context of the radar system example, these sources of component information loss may include, for example, loss due to uncertainty in the sensing and/or feature extraction processes, as well as loss occurring from the decision process due to imperfect training. ILXΔ, ILYΔ, . . . ILQΔ may be determined by apportioning the ILQ determined in Step 330 among each component. Because ILQ is equal to a sum of the component-level information loss, i.e. the information loss that occurs at each link or component within the system, ILX, ILY, . . . ILQ may be used to determine ILXΔ, ILYΔ, . . . ILQΔ.
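The apportionment in Step 330 is a successive differencing of the cumulative losses, which can be sketched directly; the numeric loss values below are hypothetical.

```python
def component_losses(cumulative):
    """Apportion cumulative information losses [ILX, ILY, ..., ILQ]
    (bits, accumulated from H through each successive link) into the
    per-component losses [ILXΔ, ILYΔ, ..., ILQΔ] by differencing."""
    deltas = [cumulative[0]]
    deltas += [cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))]
    return deltas

# hypothetical cumulative losses at X, Y, and Q (bits)
deltas = component_losses([0.05, 0.20, 0.35])
```

The per-component losses necessarily sum back to ILQ, consistent with the statement that information can only be lost, never gained, within the Markovian channel.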
The next step in the method 300 is calculating a predicted overall system performance Pe and the predicted link performance (i.e. the component probability of error PeX, PeY, . . . PeQ) (Step 335). Using Fano's equality, cumulative information loss, which may be, for example, 1-I(H;Q) and/or ILQ, is correlated with the total amount of entropy associated with system uncertainties as determined in Step 320, with this correlation being characterized as at least one overall probability of error Pe for the nonlinear system, i.e. the overall system performance. Using Fano's equality and the Data Processing Inequality together, Pe may be used to estimate a component-level probability of error PeX, PeY, . . . PeQ for each component in the system. In Step 337, PeX, PeY, . . . PeQ may then be correlated to ILXΔ, ILYΔ, . . . ILQΔ.
Steps 310 to 335 may be repeated until the desired number of instances of the statistically distributed uncertainty parameters modeled in Step 310 has been realized. In one embodiment, this iterative sampling process may comprise a Monte Carlo method in which L draws are taken from the distribution of modeled uncertainty parameters (Step 310).
Once a sufficient number of samples (L) have been obtained, the method 300 continues with determining the statistical distribution of each component probability of error due to the various sources of parameter uncertainty (Step 340). The variance and mean of the cumulative component information loss ILX, ILY, . . . ILQ calculated in Step 330 are used to determine the variance on the predicted performance at each link or component. A distribution is created for each source of uncertainty, providing the random mapping to the performance estimate Pe at each component, PeX, PeY, . . . PeQ.
The statistical distribution of the component probability of error is then used to compute the reliability of the component probability of error (Step 345), followed by termination of the method. The standard deviation σPe of each component-level probability of error statistical distribution provides the measure of the component-level performance reliability.
In addition, the presently disclosed method allows the determination of the relative contribution of each system uncertainty parameter to the component-level performance reliability, as well as a comparison of the performance reliability estimates determined in Step 345 to real world uncertainty sources. This determination can be very helpful in tracing the effects of uncertainty on the reliability of performance. The disclosed method of decomposition will allow designers to identify where the uncertainty is having the most detrimental effect on performance reliability. The ability to perform this traceability at the component level will further allow component designers to design for the minimum effects of uncertainty.
In one embodiment, a method for computing component-level performance reliability and attributing the contribution of each system uncertainty parameter to the component-level performance reliability may begin with determining a real world statistical variation of the system uncertainty parameters. In this context, "real world" refers to information obtained using actual events and/or experiments. For example, continuing with the radar system example, a variety of uncertainties exist, many of which occur due to chance and are hence unknowable. Samples obtained under real world conditions are subject to a variety of these system uncertainty parameters (known and unknown), and the statistical variation of the system uncertainty parameters may be calculated as described above. Following determination of real world statistical variation of the system uncertainty parameters, the method continues with performing Monte Carlo modeling of a plurality of the statistical uncertainty parameters for a plurality of settings to determine component-level information loss. This step may occur, for example, through many iterations of Steps 310 to 335 in
The contribution of each system uncertainty parameter is then correlated to the component-level performance reliability. This calculation may be performed, for example using Eq. (34) described herein. The independent nature of the individual system uncertainty parameters allows the effects of each parameter to be seen. Using the Data Processing Inequality, a decomposition of reliability may be obtained so that the decomposition of reliability effects may be seen at each component. These calculations may be used to determine how well the information sensing system performs when real world data is used and to determine the acceptability of performance reliability with respect to real world uncertainty sources.
The presently disclosed invention further includes methods for determining the optimal design of components of a nonlinear system in order to minimize information loss, while maximizing information flow and MI. Referring to
However, if the calculated PeQ exceeds the information loss budget (“No”), the method may continue with identifying one or more sources of information loss and information flow reduction, i.e., bottlenecks (Step 370). In some embodiments, there are two or more sources of information loss and information flow reduction, and the method further includes the step of identifying the dominant source(s) of information loss and information flow reduction (Step 370). These dominant sources may be identified by ranking the various sources of uncertainty at each link/component (for example, ILXΔ, ILYΔ) based on their individual impact on cumulative component information loss and performance of the system.
The next step is determining the optimal component design to minimize PeQ and ILQ, while maximizing I(H;Q) (Step 375) within the information budget via one or more tradeoffs between information flow and component design (described in more detail herein). Following determination of the optimal component design in Step 375, the method returns to Step 355 to continue component design iterations guided by relative levels of component information loss within a system component/link loss budget until a component design is determined that keeps the calculated PeQ within the desired information loss budget established in Step 350.
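The iterative budget loop of Steps 350 through 375 can be sketched as follows; this is an illustrative outline, not the patented implementation, and the three callables (`evaluate_pe_q`, `link_losses`, `redesign`) are hypothetical stand-ins for the underlying system model:

```python
def design_loop(components, pe_budget, evaluate_pe_q, link_losses, redesign, max_iters=20):
    """Iterate component designs (Steps 355-375) until Pe_Q fits the loss budget.

    evaluate_pe_q(components) -> float : end-to-end probability of error Pe_Q
    link_losses(components) -> dict    : information loss IL per link/component
    redesign(components, worst) -> dict: Step 375 design trade on the worst link
    All three callables are hypothetical stand-ins for the system model.
    """
    for _ in range(max_iters):
        pe_q = evaluate_pe_q(components)
        if pe_q <= pe_budget:                     # budget met: design accepted
            return components, pe_q
        losses = link_losses(components)          # Step 370: locate bottlenecks
        worst = max(losses, key=losses.get)       # dominant source of information loss
        components = redesign(components, worst)  # Step 375: redesign that component
    return components, evaluate_pe_q(components)
```

The loop terminates either when the budget established in Step 350 is satisfied or when the iteration limit is reached, mirroring the return from Step 375 to Step 355.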
Fano-Based Information Theoretic Method (FBIT) and Data Processing Inequality: Fano's Inequality and the Data Processing Inequality, both of which are theorems from information theory, may be used in Step 335 of the method in
Fano's Inequality provides a mathematical means to relate the MI between H and Q, I(H;Q), to a lower bound on Pe. Fano's Inequality may be written as an equality as in Equation (Eq.) (1):
H(Pe)=δ−Pe·log(Nc−1)+H(H/Q) (1)
In Eq. (1), Pe is a real random variable between 0 and 0.5 representing the probability of error of the decision algorithm. Nc is the discrete size of the alphabet of H and Q. H(H) is the Shannon entropy of the discrete random variable H. δ is a bias offset derived from asymmetries in the data and decision algorithm. Typically, δ is small and to a first approximation, may be neglected.
Theorem I: For Nc=2, Fano's equality can be written as H(Pe)=1−I(H;Q) +I(Q;V), where V is the binary discrete random variable representing the probability that the decision rule makes a correct decision. Using I(H;Q)=H(H)−H(H/Q) and Eq. (1), Eq. (2) may be obtained:
H(Pe)=δ−Pe·log(Nc−1)+H(H)−I(H;Q) (2)
The asymmetry factor in Eq. (2) may be computed directly from the output of the decision algorithm. Let δ=I(Q;V) for Nc=2; where V is the binary discrete random variable representing the probability that the decision rule makes a correct decision. V=1 when H=Q; otherwise V=0. Eq. (2) can then be written more completely for Nc=2 as in Eq. (3):
H(Pe)=1−I(H;Q)+I(Q;V) (3)
Eq. (3) may be written in terms of the inverse entropy function, F, as shown in Eq. (4):
Pe=F(H(H)−I(H;Q)+I(Q;V)) (4)
In Eq. (4), F is a deterministic, strictly monotonically increasing function that maps information theoretic quantities into the Pe at the corresponding operating point. The relationship of Pe to F(x) where x ∈ [0, 0.5] is shown in
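A minimal numerical sketch of Eqs. (3)-(4) for Nc=2 follows; the plug-in entropy and mutual information estimators and the bisection-based realization of F are illustrative helpers, not part of the disclosure:

```python
import math
from collections import Counter

def entropy(labels):
    """Plug-in Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def F(x):
    """Inverse binary entropy: maps x in [0, 1] bits to Pe in [0, 0.5] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if binary_entropy(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def fano_pe(h_labels, q_labels):
    """Eq. (4) for Nc = 2: Pe = F(H(H) - I(H;Q) + I(Q;V))."""
    v = [1 if h == q else 0 for h, q in zip(h_labels, q_labels)]  # correctness variable V
    loss = (entropy(h_labels) - mutual_information(h_labels, q_labels)
            + mutual_information(q_labels, v))
    return F(min(max(loss, 0.0), 1.0))
```

When Q reproduces H exactly, the loss term is zero and F maps it to Pe≈0; when Q is statistically independent of H, the loss approaches H(H) and Pe approaches 0.5.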
ILQ=H(H)−I(H;Q)+I(Q;V) (5)
In general, minimizing the information loss minimizes the system Pe. The entropic quantity H(H) is determined by the a priori probabilities of the outcomes of the random variable H corresponding to the different target classes. δ is fixed by architectural considerations. Since F is a known function, the deterministic relation Pe=F(H(H)−I(H;Q)+I(Q;V)), for fixed H(H) and δ, determines the MI, I(H;Q), needed to achieve a specified Pe. For example, for an equiprobable binary hypothesis scenario, H(H)=1 Bit and I(Q;V)≈0, an approximation for Pe can be written as Eq. (6):
Pe≈F(1−I(H;Q)) (6)
Specifying a desired Pe determines the amount of allowed ILQ. How the ILQ budget is “spent” as information cascades from the input space at H to the classifier output space at Q can be traded off via component (link) design.
The Data Processing Inequality states that information can only be lost in the channel as shown in Eq. (7):
I(H;{right arrow over (X)})≥I(H;{right arrow over (Y)})≥I(H;Q) (7)
Using the relationship in Eqs. (4) and (5), the loss associated with each link within the channel can be characterized as in Eq. (8):
H(H)−I(H;{right arrow over (X)})≤H(H)−I(H;{right arrow over (Y)})≤H(H)−I(H;Q) (8)
The approximation to the cumulative information loss at each link in the channel can then be written as below applying Eq. (5):
IL{right arrow over (X)}≈H(H)−I(H;{right arrow over (X)}); {right arrow over (X)} ∈ |X| (9.a)
IL{right arrow over (Y)}≈H(H)−I(H;{right arrow over (Y)}); {right arrow over (Y)} ∈ |Y| (9.b)
ILQ≈H(H)−I(H;Q); Q ∈ |Q| (9.c)
Theorem II: The respective information loss due to each link within a Markov chain H→X→Y→Q can then be approximated using Eqs. (10a-10c):
Loss due to Sensing≡ILSΔ≈H(H)−I(H; {right arrow over (X)}) (10.a)
Loss due to Feature Extraction≡ILFΔ≈I(H; {right arrow over (X)})−I(H; {right arrow over (Y)}) (10.b)
Loss due to Decision Rule≡ILDΔ≈I(H; {right arrow over (Y)})−I(H;Q) (10.c)
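The link losses of Eqs. (10.a)-(10.c) can be illustrated with a toy discrete Markov chain H→X→Y→Q; the binary channel flip probabilities below are invented for the example and the estimators are sample-based plug-ins, not the patented computation:

```python
import math, random
from collections import Counter

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mi(a, b):  # plug-in mutual information in bits
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

random.seed(1)
flip = lambda v, p: v if random.random() > p else 1 - v
H = [random.randint(0, 1) for _ in range(20000)]   # equiprobable target classes
X = [flip(h, 0.10) for h in H]                     # noisy sensing link
Y = [flip(x, 0.05) for x in X]                     # lossy feature-extraction link
Q = [flip(y, 0.02) for y in Y]                     # imperfect decision link

il_sensing  = entropy(H) - mi(H, X)   # Eq. (10.a): loss due to sensing
il_features = mi(H, X) - mi(H, Y)     # Eq. (10.b): loss due to feature extraction
il_decision = mi(H, Y) - mi(H, Q)     # Eq. (10.c): loss due to decision rule
```

By construction the three link losses telescope to the cumulative loss H(H)−I(H;Q) of Eq. (9.c), and by the Data Processing Inequality each is nonnegative up to sampling error.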
Thus, the probability of error can be estimated at various points in the channel using the approximation in Eq. (6):
PeX≈F(H(H)−I(H; {right arrow over (X)})) (11.a)
PeY≈F(H(H)−I(H; {right arrow over (Y)})) (11.b)
PeQ≈F(H(H)−I(H;Q)) (11.c)
Uncertainty In the Information Channel: The feature extraction f and decision rule d in
Referring to the radar system example, the loss at {right arrow over (X)}, ILSΔ, is due solely to the sensing process (source 1 in Table 1). The sensing uncertainty inherently alters the statistical support associated with {right arrow over (X)}n, generating statistical independence between {right arrow over (X)}n and {right arrow over (X)}, thus degrading the performance of the signature sensing process as quantified by PeX in Eq. (11a). The loss in information due to sensing uncertainty is then realized at {right arrow over (X)} as ILSΔ in Eq. (10a) and is quantified by the entropy H(PeX):
H(PeX)≈H(H)−I(H;{right arrow over (X)}) (12)
The level of statistical agreement between {right arrow over (X)} and {right arrow over (X)}′ will directly affect the loss in the channel due solely to the decision process (source 2 in Table 1), which is closely tied to the surrogate training process {right arrow over (X)}′. The sensing uncertainty sources in Table 1 are to some degree reproducible in the decision rule training process {right arrow over (X)}′. However, sources 1(b) and 1(c) in Table 1 are not fully reproducible in {right arrow over (X)}′. The dissimilarity between {right arrow over (X)} and {right arrow over (X)}′ results in a decision rule d that is less than optimal. The application of d to the feature process {right arrow over (Y)} induces a loss in the channel due to imperfect training. The effects of decision uncertainty within the decision rule subspace are realized at Q as ILDΔ as illustrated in
H(PeQ)≈H(H)−I(H;Q) (13)
The resulting H(PeQ) provides the best possible performance for a given component design (radar sensor design, feature selection, algorithm design, and decision rule design). As stated above, {right arrow over (X)} is often not completely observable and a training surrogate {right arrow over (X)}′ is used to develop f and d. Under conditions such as those listed in uncertainty source 2 in Table 1, the surrogate representation {right arrow over (X)}′ used in the training of the decision rule results in a non-optimal d. This is represented by the altered entropic quantity H(Q′) and more importantly I(H;Q′). The alternate Markov chain H→{right arrow over (X)}→{right arrow over (Y)}→Q′ is shown as the dotted subspace H(Q′) in
H(Pe′)=1−I(H;Q′)+I(Q′;V) (14)
Therefore, since H(Pe′)≥H(Pe), it follows that I(H;Q′)−I(Q′;V)≤I(H;Q)−I(Q;V).
Corollary I: Information loss due to imperfect training, ILTΔ, is then mathematically quantified in terms of the increase in entropy ΔH(Pe) resulting from a non-optimal design of f and d:
If it can be shown that I(Q;V)≅I(Q′;V) and that I(Q;V)<<H(H)−I(H;Q) and I(Q′;V)<<H(H)−I(H;Q′), then:
Imperfect Training Loss≡ILTΔ≅I(H;Q)−I(H;Q′) (16)
The decrease in information flow due to imperfect training is illustrated in
Theorem III: The total loss in the channel is equal to the sum of link information loss:
ILTotal=ILSΔ+ILFΔ+ILDΔ+ILTΔ (17)
Definition 1: Any phenomenon producing an increase in I(H;Q) and a subsequent reduction in H(Pe) can be defined as a “system information gain” within the information channel. Any phenomenon producing a decrease in I(H;Q) resulting in an increase in H(Pe) is defined as a “system information loss.”
Propagating Effects of Uncertainty: Uncertainty propagation is the study of how uncertainty in the output of a model (numerical or otherwise) can be allocated to different sources of uncertainty in the model inputs, which are used in Step 310 of
The distributions associated with the input parameters in {right arrow over (V)}E and {right arrow over (V)}t are estimated from experimental data. The estimated parameters become factors within a Monte Carlo simulation. The cumulative link information loss as quantified within Eq. (5) and approximated in Eqs. (9.a), (9.b), (9.c) then become random variables as shown:
IL{right arrow over (X)}≈H(H)−I(H; {right arrow over (X)}(, {right arrow over (V)}E, {right arrow over (V)}t)) (18.a)
IL{right arrow over (Y)}≈H(H)−I(H; {right arrow over (Y)}(, {right arrow over (V)}E, {right arrow over (V)}t)) (18.b)
ILQ≈H(H)−I(H; Q(, {right arrow over (V)}E, {right arrow over (V)}t)) (18.c)
Similarly, the link information loss ILSΔ, ILFΔ, and ILDΔ in Eqs. (10.a), (10.b), and (10.c) also become random variables.
The unknowable characteristics of the observed signature process {right arrow over (X)} are realized within the input variables to the modeled training process {right arrow over (X)}′ (, {right arrow over (V)}E ,{right arrow over (V)}Δt). If it is assumed that ≠, {right arrow over (V)}E′≠{right arrow over (V)}E, {right arrow over (V)}t′≠{right arrow over (V)}t, then the mapping to the non-optimal decision rule will be d(, {right arrow over (V)}E′, {right arrow over (V)}t′), which will be written as d for brevity. The decision rule d is applied to {right arrow over (Y)} (, {right arrow over (V)}E, {right arrow over (V)}t) generating Q′(, {right arrow over (V)}E, {right arrow over (V)}t′), written as Q′, while the optimal decision rule dopt generates Q(, {right arrow over (V)}E, {right arrow over (V)}t). Each realization of d and dopt resulting from each ensemble {right arrow over (X)}′({right arrow over (v)}c′, {right arrow over (V)}E′, {right arrow over (V)}t′) and {right arrow over (X)}(, {right arrow over (V)}E, {right arrow over (V)}t), respectively, in the Monte Carlo simulation will result in the randomization of the imperfect training loss function in Eq. (19) and the randomization of the cumulative loss function in Eq. (20):
ILTΔ≡I(H;Q)−I(H;Q′) (19)
ILQ′≈H(H)−I(H;Q′) (20)
In Eq. (19), for the special case of {right arrow over (V)}E′={right arrow over (V)}E and {right arrow over (V)}t′={right arrow over (V)}t, the loss due to the optimal training of d=dopt yields ILTΔ=0 and ILQ′=ILQ. To narrow the focus of analysis, the training space (, {right arrow over (V)}E′, {right arrow over (V)}t′) will be considered fixed and thus will become a component of the system control parameter . Therefore, d becomes fixed by design as d.
Independent Sources of Uncertainty Loss: Loss due to isolated sources of uncertainty within the channel can be computed to provide a means to characterize the relative impacts to information flow at various points in the channel. The various sources of sensing uncertainty induce information loss in the channel as characterized by the random link loss functions ILSΔ, ILFΔ, ILDΔ, and ILTΔ. The prior distributions on the random parameters within {right arrow over (V)}E and {right arrow over (V)}t are propagated to the respective loss functions using Monte Carlo simulation.
Definition 2: The expected value of the link information loss can be written as the expected values of the individual random loss components as in Eqs. (21.a)-(21.d):
The sensing uncertainty factors within {right arrow over (V)}E and {right arrow over (V)}t are assumed to be independent. Given that the total loss function ILTotal can account for multiple independent sources of uncertainty within the parameter space of (, {right arrow over (V)}E, {right arrow over (V)}t), the variance on ILTotal is the sum of the individual variances within the components of ILTotal.
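The additivity of variance for independent uncertainty sources can be checked with a small Monte Carlo sketch; the linear loss model and the two Gaussian factors below are assumptions chosen purely for illustration, not the patented loss functions:

```python
import random, statistics

random.seed(7)

def total_loss(v_e, v_t):
    # hypothetical linear link-loss model driven by two independent factors
    return 0.2 + 0.5 * v_e + 0.3 * v_t

N = 50000
ve = [random.gauss(0.0, 0.1) for _ in range(N)]   # environmental factors V_E
vt = [random.gauss(0.0, 0.2) for _ in range(N)]   # position/timing factors V_t
il = [total_loss(a, b) for a, b in zip(ve, vt)]   # Monte Carlo loss ensemble

var_total = statistics.pvariance(il)
# independence: total variance decomposes into per-source contributions
var_sum = 0.5 ** 2 * statistics.pvariance(ve) + 0.3 ** 2 * statistics.pvariance(vt)
```

Because the two factors are drawn independently, `var_total` matches the sum of the per-source variance contributions up to Monte Carlo sampling error, which is the property exploited in the variance decompositions of Corollaries II and III.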
Corollary II: Assuming ne factors within {right arrow over (V)}E and nt factors within {right arrow over (V)}t, the link loss variance can be decomposed as given in Eqs. (22.a)-(22.d):
Definition 3: The expected value of the cumulative link information loss can then be written as the expected values of the individual random cumulative loss components as in Eqs. (23.a)-(23.d), which may be used to obtain the cumulative component information loss as in Step 330 in
μIL{right arrow over (X)}=μILSΔ (23.a)
μIL{right arrow over (Y)}=μILSΔ+μILFΔ (23.b)
μILQ=μILSΔ+μILFΔ+μILDΔ (23.c)
μILQ′=μILSΔ+μILFΔ+μILDΔ+μILTΔ (23.d)
Corollary III: Assuming ne factors within {right arrow over (V)}E and nt factors within {right arrow over (V)}t, the cumulative link loss variance can be decomposed as given in Eqs. (24.a)-(24.d):
Propagating Link Loss to Link Performance: The variance and mean of the random cumulative loss components IL{right arrow over (X)}, IL{right arrow over (Y)}, ILQ and ILQ′ are used directly to determine the variance on the performance at the random link performance components Pe{right arrow over (X)}, Pe{right arrow over (Y)}, PeQ, and PeQ′. The Maximum Likelihood Estimate (MLE) of Pe is inferred at each realization of the sufficient statistical support about (, {right arrow over (V)}E, {right arrow over (V)}t), providing the random mapping to performance Pe at each link as in Step 335 in
Corollary IV: Given sufficient sampling of the space of {right arrow over (V)}E and {right arrow over (V)}t within the finite alphabet |x| and |Y|, the environmental and position estimate uncertainty factors result in the respective random performance at {right arrow over (X)} and {right arrow over (Y)} given by functions PeX(, {right arrow over (V)}E, {right arrow over (V)}t) and PeY(, {right arrow over (V)}E, {right arrow over (V)}t) as in Eqs. (25) and (26):
Pe{right arrow over (X)}≡PeX(, {right arrow over (V)}E, {right arrow over (V)}t)≈F(IL{right arrow over (X)}) (25)
Pe{right arrow over (Y)}≡PeY(, {right arrow over (V)}E, {right arrow over (V)}t)≈F(IL{right arrow over (Y)}) (26)
If the conditions of Corollary IV hold and perfect training conditions are assumed where =, {right arrow over (V)}E′={right arrow over (V)}E, {right arrow over (V)}t′={right arrow over (V)}t, then the mapping to the decision rule dopt will be optimal.
Corollary V: The output of the discrete random variable Q (from the finite alphabet |Q|) is driven by the inferred decision out of the application of each realization of {right arrow over (Y)} to dopt. The random performance function PeQ(, {right arrow over (V)}E, {right arrow over (V)}t) can be expressed as a random realization of the information loss in the channel, ILQ in Eq. (18.c). Using the approximation form of Eq. (13) (assume I(Q;V)≈0), the random performance function PeQ is given by Eq. (27).
PeQ≡PeQ(, {right arrow over (V)}E, {right arrow over (V)}t)≈F{ILQ} (27)
The approximation in Eq. (27) can be replaced by an equality using the full representation in Eq. (4):
PeQ=F{ILQ+I(Q;V)} (28)
In Eqs. (27) and (28), the relaxation of the constraint {right arrow over (V)}E′={right arrow over (V)}E and {right arrow over (V)}t′={right arrow over (V)}t expands the study of the effects of uncertainty to the loss due to the non-optimal training of d.
Corollary VI: The output of the discrete random variable Q′ (from the finite alphabet |Q|) is driven by the inferred decision out of the application of each realization of {right arrow over (Y)} to d. The random performance function PeQ′(, {right arrow over (V)}E, {right arrow over (V)}t) can be expressed as a random realization of the information loss in the channel, H(H)−I(H;Q′). Fixing the suboptimal decision rule d(=, βc, {right arrow over (V)}E′=βE, {right arrow over (V)}t′=βt) and using the approximation form of Eq. (4) (assume I(Q′;V′)≈0), the random performance function PeQ′ is given by Eq. (29):
PeQ′≡PeQ′(, {right arrow over (V)}E, {right arrow over (V)}t)≈F{ILQ′}=F{H(H)−I(H;Q′)} (29)
The approximation in Eq. (29) is replaced by an equality using the full representation in Eq. (4):
PeQ′≡PeQ′(, {right arrow over (V)}E, {right arrow over (V)}t)=F{H(H)−I(H;Q′)+I(Q′;V′)} (30)
Definition 4: The expected link performance under control parameters and in the presence of sensing uncertainty ({right arrow over (V)}E, {right arrow over (V)}t) is defined as the expectation of the random link performance components Pe{right arrow over (X)}, Pe{right arrow over (Y)}, PeQ, and PeQ′.
Given a sufficient number of Monte Carlo samples over the random parameters in {right arrow over (V)}E and {right arrow over (V)}t, the standard deviation of the random link component performance function is used as a measure of reliability. Reliability is interpreted as 95% confidence that any estimate would fall within the bounds of one standard deviation.
Definition 5: Reliability in predicted link performance is defined as the standard deviation (σPe) of the random link performance component at each respective link.
Uncertainty in Performance: The independent sources of uncertainty contributing to σIL
It is possible to approximate the inverse entropy function (F) by a linear relationship about the mean of IL{right arrow over (X)}: F(IL{right arrow over (X)})=a+b·(IL{right arrow over (X)}). The mean and variance of the approximation are then E[F(IL{right arrow over (X)})]=a+b·μIL{right arrow over (X)} and Var[F(IL{right arrow over (X)})]=b2·σIL{right arrow over (X)}2.
Using established approximation techniques, the first order Taylor expansion of F around the mean μIL{right arrow over (X)} is given in Eq. (32):
F(IL{right arrow over (X)})≈F(μIL{right arrow over (X)})+F′(μIL{right arrow over (X)})·(IL{right arrow over (X)}−μIL{right arrow over (X)}) (32)
Using the Taylor Series expansion in Eq. (32), the approximations for E[F(IL{right arrow over (X)})] and Var[F(IL{right arrow over (X)})] are:
E[F(IL{right arrow over (X)})]=E[Pe{right arrow over (X)}]≈F(μIL{right arrow over (X)})=H−1(μIL{right arrow over (X)}) (33)
Var[F(IL{right arrow over (X)})]=σPe{right arrow over (X)}2≈[F′(μIL{right arrow over (X)})]2·σIL{right arrow over (X)}2 (34)
where F′(μIL{right arrow over (X)}) is the derivative of F evaluated at the mean μIL{right arrow over (X)}.
Assuming ne factors within {right arrow over (V)}E and nt factors within {right arrow over (V)}t, the cumulative link loss variance components given in Eq. (24.a) are applied to Eq. (34):
The variance on the performance estimate Pe{right arrow over (X)} is then decomposed into the individual sources of sensing uncertainty being propagated through the decision space at {right arrow over (X)}.
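The first-order (delta-method) propagation of Eq. (34) can be checked numerically against direct Monte Carlo propagation through the inverse entropy function; the loss mean and standard deviation below are assumed values for illustration:

```python
import math, random, statistics

def h2(p):  # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def F(x):   # inverse of h2 on [0, 0.5], by bisection
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if h2(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

random.seed(3)
mu_il, sigma_il = 0.6, 0.02                        # assumed loss mean / std (bits)
il = [random.gauss(mu_il, sigma_il) for _ in range(10000)]

var_mc = statistics.pvariance([F(x) for x in il])  # direct Monte Carlo propagation
dF = (F(mu_il + 1e-5) - F(mu_il - 1e-5)) / 2e-5    # numerical derivative F'(mu)
var_delta = dF ** 2 * sigma_il ** 2                # Eq. (34) delta-method variance
```

For a loss standard deviation this small relative to the curvature of F, the linearized variance `var_delta` agrees with the Monte Carlo result to within a few percent, which is the regime of validity discussed under "Stability of the Linear Approximation."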
Similar methods are applied to the independent contributions to the sensing uncertainty of {right arrow over (V)}E and {right arrow over (V)}t comprising the variances at {right arrow over (Y)}, Q, and Q′, respectively.
Stability of the Linear Approximation: The validity of the linear approximation in Eq. (34) requires σIL
is plotted in
Dimensionality and Computing: The computation of the entropy of {right arrow over (X)} involves the joint probability mass function (PMF) of the random multivariate {right arrow over (X)} and is complicated by the large dimensional nature of the observation mapping H→{right arrow over (X)}. It is desired to compute the discrete entropy for {right arrow over (X)} absent any assumption regarding dependence between the respective dimensions of {right arrow over (X)}. If the {right arrow over (X)} space consists of K random variables (dependent or independent) and the random variable Xk; k ∈ {1, K} has nb distinct bins (statistical divisions), then the size of the alphabet of {right arrow over (X)}, |{right arrow over (x)}| is given in Eq. (37):
|{right arrow over (x)}|=Πk=1K nk (37)
For example, if K=3 and nk=2=nb for all k, |{right arrow over (x)}|=2·2·2=8.
The joint PMF of {right arrow over (X)}, p(xkj); j ∈ {1: nb}, k ∈ {1: K}, must therefore be estimated over this alphabet.
A high dimensional problem is one where the alphabet of {right arrow over (X)}, |{right arrow over (x)}|, underlying the random process far exceeds the number of samples observed (N), i.e., |{right arrow over (x)}|>>N. Sensing systems typically operate within this high dimensional signature data space of |{right arrow over (x)}|. The high dimension arises due to factors within the space {right arrow over (X)}(, {right arrow over (V)}E, {right arrow over (V)}t). Hypothesis testing and inference within the high dimensional space of {right arrow over (X)} in turn leads to large sampling requirements to adequately determine the underlying statistical nature of the phenomenon under study. Without accurate determination of the underlying system statistics, poorly performing hypothesis tests and/or parameter estimation result (Bias/Variance tradeoff).
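The undersampling effect can be demonstrated with a short sketch of Eq. (37); the dimensions K and nb below are chosen only for illustration, and the plug-in (MLE) entropy estimator stands in for the document's estimation machinery:

```python
import math, random
from collections import Counter

def mle_entropy(samples):
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

K, nb = 6, 4
alphabet = nb ** K                 # Eq. (37): |X| = prod_k nk = nb^K = 4096
random.seed(11)
draw = lambda n: [tuple(random.randrange(nb) for _ in range(K)) for _ in range(n)]

h_true = math.log2(alphabet)       # 12 bits for a uniform joint PMF
h_small = mle_entropy(draw(500))       # N << |X|: estimate collapses toward log2(N)
h_large = mle_entropy(draw(200000))    # N >> |X|: estimate approaches the true value
```

With N far below the alphabet size, nearly every sample lands in its own bin and the plug-in estimate saturates near log2(N) bits, badly underestimating the true 12 bits; with N well above |X| the estimate converges.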
The number of statistical bins, nb, within the discrete sampling of the K element joint PMF of {right arrow over (X)} also has a significant effect on |{right arrow over (x)}| as well as the entropy computation of {right arrow over (X)}. An increase in size of nb in {right arrow over (X)} will result in an increase in the entropy of {right arrow over (X)}. However, in the limit, the value for I(H; {right arrow over (X)}) as a function of nb asymptotes to a constant value after one reaches the full intrinsic dimensionality of the subspace of I(H; {right arrow over (X)}). This will be true for I(H; {right arrow over (Y)}), I(H;Q), and I(H;Q′) as well. A method for determining the intrinsic dimensionality of {right arrow over (X)} is then needed to guide the selection of N.
Sample Size and Minimum Sampling Requirements: The methods used to determine the minimum sampling requirements for entropy estimation and the variance parameters of these entropy estimations (Step 315 in
The link performance variability estimate at each of the respective links, σPe, carries its own N sample estimation variance, or “sampling uncertainty,” associated with the true variability σPe.
For the high dimensional problem, N must be large enough for:
The objective then is to produce link reliability estimates that are within this regime. The choice of N must be selected to ensure the uncertainty of the entropic estimate is much less than the reliability limits realized due to various factors within (, {right arrow over (V)}E, {right arrow over (V)}t) under study. That is, the ensemble size N of {right arrow over (X)}, {right arrow over (Y)}, Q, and Q′ should be sufficiently large to ensure that the variance of the estimate falls within three significant digits of the variability levels
Thus for the case of variability at {right arrow over (X)}, it is desired that
As stated above, |{right arrow over (x)}|, in particular, can grow to large levels and as such the number of samples required will grow as well. Given that the sampling ensemble size N of {right arrow over (X)} is likewise imposed on {right arrow over (Y)} and Q, the following analysis is focused on the process at {right arrow over (X)}.
From
Phase Transitions and the Typical Set: The entropy computation requires the development of the joint mass function associated with the multi-variate {right arrow over (X)}, p(xkj); j ∈ {1: nb}, k ∈ {1: K}. The development of this mass function makes no assumption of independence between the K indices of {right arrow over (X)} and is performed using a “linked list” approach to limit the memory requirements during computation. A doubly linked list implementation with a hash table search approach yields a computational complexity of O(N). The Miller-Madow estimate provides faster convergence than the MLE method for finite sample estimates.
Maximum Likelihood Estimate of H({right arrow over (X)}k): ĤMLE(Xk)=−Σj {circumflex over (p)}(xkj)·log2 {circumflex over (p)}(xkj)
Miller-Madow Estimate of H({right arrow over (X)}k) (note: M+=number of statistical bins for which {circumflex over (p)}(xkj)>0): ĤMM(Xk)=ĤMLE(Xk)+(M+−1)/(2N·ln 2)
The N sample estimates for ĤMLE and ĤMM converge toward the true entropy as N increases, with the Miller-Madow correction converging faster for finite N.
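A brief sketch of the two estimators on a small uniform sample follows; the Miller-Madow correction is written in bits (divided by ln 2), and the 64-bin test distribution is an invented example:

```python
import math, random
from collections import Counter

def h_mle(samples):
    """Plug-in (maximum likelihood) entropy estimate in bits."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def h_miller_madow(samples):
    """Miller-Madow bias-corrected entropy estimate in bits."""
    n = len(samples)
    m_plus = len(Counter(samples))     # M+: bins with nonzero observed probability
    return h_mle(samples) + (m_plus - 1) / (2 * n * math.log(2))

random.seed(5)
data = [random.randrange(64) for _ in range(400)]   # 64 uniform bins, modest N
```

On this sample the plug-in estimate sits below the true log2(64)=6 bits, and the Miller-Madow correction moves the estimate upward toward it, illustrating the faster finite-sample convergence noted above.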
Phase transitions within the growth trajectory of the estimated entropy with increasing N are useful in defining the alphabet size |{right arrow over (x)}|. The following illustration demonstrates the usefulness of this approach. The signature process under evaluation will be constructed by design such that the actual entropy value is known. The multivariate random signature vector {right arrow over (X)} is modeled to be uniformly distributed (standard uniform {0,1}) with nb=6 (all indices of {right arrow over (X)}) and K=3. The theoretical maximum value of the entropy of {right arrow over (X)} is then log2(nb^K)=log2(6^3)=7.7549 Bits. In
Initially, the samples are filling the open high dimensional space of {right arrow over (X)} in a uniform fashion. The linear dashed line represents the log2(N) growth of the entropy associated with this uniform distribution. Note that the actual achieved entropy computation begins to diverge from a uniform distribution. Only after the samples of {right arrow over (X)} begin to accumulate in the bin space of the joint mass function of {right arrow over (X)} does this transition occur. This phase transition point represents the point at which the fundamental statistics of {right arrow over (X)} change.
The phase transition point is determined from the intersection of the line tangent to the linear portion of the typical set profile and the line tangent to the asymptotic portion of the profile. The number of samples coinciding with this phase transition point is NT. For the example here, NT is found to be approximately 250 as illustrated in
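An idealized version of this construction can be sketched as follows; here the asymptotic tangent is taken at the theoretical maximum entropy, a simplification under which the two tangent lines intersect at log2(NT)=log2(216), the same order of magnitude as the NT≈250 read from the profile:

```python
import math, random
from collections import Counter

def h_mle(samples):
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

random.seed(2)

def draw(n):
    # standard-uniform 3-vector, quantized into nb = 6 bins per index
    return [tuple(min(5, int(random.random() * 6)) for _ in range(3)) for _ in range(n)]

sizes = [2 ** k for k in range(4, 13)]          # N = 16 ... 4096
profile = [(n, h_mle(draw(n))) for n in sizes]  # typical-set growth profile

h_max = math.log2(6 ** 3)                       # asymptote: 7.7549 bits
n_t = 2 ** h_max                                # idealized tangent intersection: ~216 samples
```

For small N the estimate rides the log2(N) line (samples rarely collide); for large N it bends toward the 7.7549-bit asymptote, and the knee of the profile marks the phase transition.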
Sampling Uncertainty for Probability of Error Estimate: Since the random estimation error variable is essentially the sum of many independently distributed random variables, the estimation error is Gaussian. The standard deviation of the Gaussian distribution of Î(H;{right arrow over (X)}) will then scale as a function of 1/N. Thus the variance on the estimate Î(H; {right arrow over (X)}) can be scaled to a large sample size.
{circumflex over (P)}eX≈H−1(H(H)−Î(H; {right arrow over (X)})) (42)
For the equal probable binary hypothesis case, H(H) is equal to 1 Bit. Therefore the sampling uncertainty is a function only of σÎ(H;{right arrow over (X)}).
As previously noted, the inverse entropy function in Eq. (42) is a transcendental function, and as such the variance on the estimate {circumflex over (P)}eX can be very difficult to determine analytically. Following a similar line of analysis found in Eqs. (33) and (34), the mean and variance of {circumflex over (P)}eX can be calculated as:
The use of Eq. (44) requires an estimate of the mean of Î(H;{right arrow over (X)}), which is taken to be the sample mean μÎ(H;{right arrow over (X)}) from a low sample estimate of the mean of Î(H; {right arrow over (X)}). Manipulating Eq. (44) above yields the required σÎ(H;{right arrow over (X)}) in Eq. (45). To ensure the target sampling uncertainty is achieved, the relationship in Eq. (45) is essential.
The regime of interest is where Î(H; {right arrow over (X)}) is close to 1, and thus {circumflex over (P)}eX is small. The derivative of the estimate in this regime is on the order of 0.25 as illustrated in
Therefore, errors in the estimate of μÎ(H;{right arrow over (X)}) can have a significant impact on the estimate of the number of samples required to reach a target sampling uncertainty. This means that a conservative approach is needed to estimate E[Î(H; {right arrow over (X)})] based on a small number of samples. Instead of using the sample mean as an estimate of the expectation E{Î(H; {right arrow over (X)})}, a value somewhat less than the sample mean should be chosen. Depending on the level of confidence required in the estimate of the number of samples N, a higher confidence estimate can be achieved by replacing the sample mean with a value at the lower bound of its confidence interval.
As discussed above, the variance on the estimate Î(H; {right arrow over (X)}) can be scaled to large sample size. The mean of the estimate of Î(H; {right arrow over (X)}) and the standard deviation can be estimated using the low number of samples (N=2NT).
Sampling Uncertainty versus Variability in Performance: The expression in Eq. (45) provides guidance on the level of sampling uncertainty associated with Î(H; {right arrow over (X)}) that is required to achieve the corresponding sampling uncertainty in {circumflex over (P)}eX. A more important question relevant to the study of uncertainty and performance estimation is the relationship introduced in Eq. (44) and written in general form:
The variable α may be set to limit the degree of sampling uncertainty to be realized in the performance confidence analysis. Using Eq. (44), Eq. (34), and the fact that σIL
The factor β(N,NT) in Eq. (47) is given as:
Thus, the expression in Eq. (48) may be used to test for conditions specified in Eq. (46):
The FBIT model provides a platform for the study and analysis of the relationship of the level of sampling uncertainty to the level of performance uncertainty. Incremental values for the ratio on the left side of Eq. (48) can be computed for increasing N. The point at which the inequality is obeyed is related to the phase transition minimum sample methods described previously.
The following examples and methods are presented as illustrative of the present disclosure or methods of carrying out the invention, and are not restrictive or limiting of the scope of the invention in any manner.
An Information Flow Numerical Example: The application of the FBIT method to the study of uncertainty propagation is now illustrated within a simple radar sensor example. An information loss budget is constructed for a baseline design. Selected forms of uncertainty in Table 1 are introduced into the system to demonstrate the analysis of the effects of propagating uncertainty through the information sensing channel.
Observed Target Scattering Model: In the high frequency regime used to obtain HRR signatures, the target may be approximated as a collection of scattering centers valid over a limited aspect window and frequency band. These scattering centers may be considered to be localized to a point and may represent a variety of scattering phenomena ranging from specular reflection to diffraction phenomena such as edge and tip diffraction. The fields radiated by these point scatterers depend upon both temporal and spatial frequencies (angular dependence). Because the radar illuminating the target has finite bandwidth and is a one dimensional imaging system, the target is seen as a collection of contiguous swaths of range, with each range swath corresponding to a particular range. The extent of each range swath, range resolution, depends upon the signal bandwidth. For a typical extended target of interest, each range swath contains a number of scattering centers which can be widely spaced in cross-range.
The electromagnetic field obtained as a result of the interference of the scattered fields from the scattering centers appears as the signal corresponding to a particular range bin of the target signature. The target signature may be considered to be a one dimensional image of the reflectivity (or scattering) profile of the target for a given azimuth/elevation aspect angle (θ, ϕ) and bandwidth. The mathematical definition of the radar signature is developed from the normalized scattered field in Eq. (49), where {right arrow over (E)}s and {right arrow over (E)}i are the scattered field and the incident field, respectively:
Using scattering center modeling and the far field approximation, Eq. (49) can be written in terms of the target aspect angle and the transmitted wavelength as shown in Eq. (50):
In Eq. (50), SE is the band-limited frequency response of the target comprised of M scattering centers at the respective ranges Rm. Conditioned on the target hypothesis H at a fixed aspect angle (θi, ϕi), {right arrow over (S)}E(θi, ϕi)=SE(θi, ϕi, λ), λ∈ {λl, λl+1, . . . λf} defines the band-limited frequency response of the normalized scattered field measurements given in Eq. (50). Clusters of simple scattering centers are chosen for targets of interest at X-band frequencies (8-12 GHz) in the following development. The targets are electrically large, with dimensions in range and cross-range of many wavelengths. The target cluster of M isotropic scatterers occupies the target volume within the radar sensor coordinate system illustrated in
The three-dimensional target scattering center configurations for the two targets examined in the following example occupy an approximate cubic volume of {x=2, y=3, z=2.5} meters and are positioned at a line-of-sight, {right arrow over (l)}os, of (θt, ϕt)=(10°, 7.5°). Both targets are comprised of 100 scattering centers of unity amplitude and three strong localized scattering clusters of amplitude 5. Target 1 differs from target 2 in that target 1 is shorter than target 2 in the Y dimension by 0.5 meters. One of the localized scattering clusters is also displaced by (0.2, 0.2, 0) meters.
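The scattering-center model described above can be sketched numerically. Eq. (50) is not reproduced in this extract, so the standard far-field scattering-center form S_E(f) = Σm Am·exp(−j4πfRm/c) is assumed; the function and parameter names below are illustrative, not the patent's own.

```python
import numpy as np

def scattering_response(amplitudes, ranges_m, freqs_hz):
    """Band-limited frequency response of a point-scatterer cluster.

    Assumes the standard far-field scattering-center form
    S_E(f) = sum_m A_m * exp(-j*4*pi*f*R_m / c), with R_m the
    down-range position of scatterer m along the line of sight."""
    c = 3.0e8  # speed of light, m/s
    freqs = np.asarray(freqs_hz, float)[:, None]    # shape (F, 1)
    ranges = np.asarray(ranges_m, float)[None, :]   # shape (1, M)
    amps = np.asarray(amplitudes, float)[None, :]
    return (amps * np.exp(-1j * 4 * np.pi * freqs * ranges / c)).sum(axis=1)

# X-band stepped-frequency sweep, 8-12 GHz (64 steps, illustrative)
freqs = np.linspace(8e9, 12e9, 64)
rng = np.random.default_rng(0)
ranges = rng.uniform(0.0, 3.0, 100)   # scatterers spread over ~3 m of range
amps = np.ones(100)
amps[:3] = 5.0                        # a few strong localized scatterers
S_E = scattering_response(amps, ranges, freqs)
```

The interference of the M complex returns across the frequency sweep is what produces the range-bin structure of the signature after the Fourier transform described in the sensor model.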
Radar Sensor Model: Applying matched filter processing and the discrete Fourier transform to the observed signature {right arrow over (S)}E (θi, ϕi) in additive noise, the measured HRR signature can be modeled for a range of frequencies present in the transmitted waveform. The multidimensional encoded source {right arrow over (X)}Ei is defined here as the vector form of the time delay transformation of the band-limited frequency response {right arrow over (S)}E (θi, ϕi). The measured random signature process {right arrow over (X)}ni is then defined as in Eq. (51), where {right arrow over (n)} is additive white noise:
{right arrow over (X)}ni={right arrow over (X)}Ei+{right arrow over (n)}  (51)
The process {right arrow over (X)}ni is modeled at the output of a radar step frequency measurement sensor system for the specified target aspect angle (θi, ϕi). The additive noise process {right arrow over (n)} is modeled as the sum of thermal white noise and quantization noise components. The quantization error component is treated as a random process uncorrelated with both the signal and the thermal noise. The complete radar step frequency measurement model system parameters are summarized in Table 3.
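The measurement model of Eq. (51) can be sketched as follows. This is a minimal stand-in: a uniform quantization-noise model and placeholder parameter values are assumed rather than the exact Table 3 settings, and all names are illustrative.

```python
import numpy as np

def measure_signature(x_e, snr_db, n_bits, rng):
    """Model Eq. (51): X_n = X_E + n, where n is the sum of thermal
    white noise (set by the SNR) and quantization noise from a B-bit
    converter, modeled as uniform and uncorrelated with the signal.
    Parameter names and the uniform-quantizer model are assumptions."""
    sig_pow = np.mean(np.abs(x_e) ** 2)
    noise_pow = sig_pow / 10 ** (snr_db / 10)
    # complex thermal white noise at the specified SNR
    n = np.sqrt(noise_pow / 2) * (rng.standard_normal(x_e.shape)
                                  + 1j * rng.standard_normal(x_e.shape))
    # quantization error: uniform over one LSB at full scale, per I/Q rail
    q = np.abs(x_e).max() / 2 ** (n_bits - 1)
    e = q * (rng.uniform(-0.5, 0.5, x_e.shape)
             + 1j * rng.uniform(-0.5, 0.5, x_e.shape))
    return x_e + n + e

rng = np.random.default_rng(0)
x_e = np.ones(4096, dtype=complex)     # placeholder encoded signature
x_n = measure_signature(x_e, snr_db=20.0, n_bits=8, rng=rng)
```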
The sensing of {right arrow over (X)}ni in a dynamic, real world environment is subject to the uncertainties listed in area 1 of Table 1 leading to the random signature process {right arrow over (X)} as previously outlined and as summarized in Table 2. Given the dynamic nature of the phenomenon underlying these uncertainties, the statistics associated with the dimensions of {right arrow over (X)} are often time varying. The target statistics are assumed to be stationary (constant with time), thus, the sample signatures associated with this random vector correspond to a stationary random process. Given the short measurement times associated with radar measurements of the nature under study, this assumption is appropriate.
Modeling Pose Angle Estimation Uncertainty: The observed object aspect angle estimate can be viewed as lying within a solid cone angle centered on the true object aspect angle (θt, ϕt). The parameter σt is defined as the uncertainty associated with the sensor estimate of (θt, ϕt). The parameters σt and μt are elements of {right arrow over (V)}t and are the standard deviation and bias of the object aspect angle estimate, respectively.
The variation in measured signature phenomenology due to the uncertainties in target aspect angle is generated in the signal model in Eq. (50) through the introduction of distributions on θ and ϕ. The parameters θ and ϕ are both modeled as Gaussian random variables, each with variance σt2 and with means θt+μt and ϕt+μt, respectively. The bias parameter μt is assumed to be unknown and is modeled as uniformly distributed over the interval [−1, 1] degrees.
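A single Monte Carlo draw of the uncertain aspect angle under this model can be sketched as below. The assumption that the unknown bias μt is shared by both angle means is an illustrative reading of the text; names are hypothetical.

```python
import numpy as np

def draw_pose(theta_t, phi_t, sigma_t, rng, bias_lim=1.0):
    """One Monte Carlo draw of the uncertain aspect angle, in degrees.

    theta and phi are Gaussian with standard deviation sigma_t; the
    unknown bias mu_t is uniform on [-bias_lim, +bias_lim] and applied
    to both angle means (an assumed, shared-bias reading of the model)."""
    mu_t = rng.uniform(-bias_lim, bias_lim)
    theta = rng.normal(theta_t + mu_t, sigma_t)
    phi = rng.normal(phi_t + mu_t, sigma_t)
    return theta, phi

rng = np.random.default_rng(1)
draws = np.array([draw_pose(10.0, 7.5, 0.75, rng) for _ in range(20000)])
# the bias is zero-mean, so the sample means sit near the nominal aspect,
# while the spread reflects both sigma_t and the uniform bias
```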
Modeling Leading Edge Position Estimation Uncertainty: The leading edge location estimate will vary under real world sensing conditions. Thus, the range alignment (along {right arrow over (l)}os) of the measured signature process {right arrow over (X)} to the decision rule training process {right arrow over (X)}′ is imperfect and can be modeled as an uncertainty source. The alignment of {right arrow over (X)} to {right arrow over (X)}′ is modeled through a positive bias applied to the phase center of the scattering cluster underlying {right arrow over (X)}. The bias parameter μr is assumed to be unknown and is modeled as uniformly distributed between [0, 2] meters. Note that μr is another element of {right arrow over (V)}t.
Modeling Imperfect Training: The training process component {right arrow over (X)}′ in
Feature Discriminate and Decision Rule Design: The function f used to compute the feature discriminate {right arrow over (Y)} in
{right arrow over (Y)}1=[|{right arrow over (X)}1|−{right arrow over (μ)}{right arrow over (X)}′]·[|{right arrow over (X)}1|−{right arrow over (μ)}{right arrow over (X)}′]
{right arrow over (Y)}2=[|{right arrow over (X)}2|−{right arrow over (μ)}{right arrow over (X)}′]·[|{right arrow over (X)}2|−{right arrow over (μ)}{right arrow over (X)}′]
{right arrow over (Y)}=[{right arrow over (Y)}1, {right arrow over (Y)}2]
The Maximum Likelihood estimator is used to determine the optimal decision rule d:
Assuming equally likely priors on each of the binary hypotheses H1 and H2 in {right arrow over (X)} and {right arrow over (Y)}, the samples from {right arrow over (Y)} are applied to the decision rule d. Samples with {right arrow over (Y)}<d are declared from H1 (denoted Q1) and samples with {right arrow over (Y)}>d are declared from H2 (denoted Q2). The in-class and out-of-class scoring system is given by the conditional probabilities within α, β, γ, and κ as provided below:
α=p({right arrow over (X)}1)·p(Q1|{right arrow over (X)}1), β=p({right arrow over (X)}1)·p(Q2|{right arrow over (X)}1)
γ=p({right arrow over (X)}2)·p(Q1|{right arrow over (X)}2), κ=p({right arrow over (X)}2)·p(Q2|{right arrow over (X)}2)
The output of the decision algorithm Q as formed from the scoring system above can be summarized by the confusion matrix for the binary classifier given in Table 4:
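The confusion-matrix scoring above feeds directly into the channel mutual information I(H;Q). The following sketch treats the α, β, γ, κ entries of Table 4 as a 2×2 joint distribution over (hypothesis, declaration); the example probabilities are illustrative, not the patent's measured values.

```python
import numpy as np

def mutual_information_bits(joint):
    """I(H;Q) in bits from a joint pmf over (hypothesis, declaration).

    For the binary classifier of Table 4, joint is the 2x2 matrix
    [[alpha, beta], [gamma, kappa]] defined in the text."""
    joint = np.asarray(joint, float)
    p_h = joint.sum(axis=1, keepdims=True)   # marginal p(H)
    p_q = joint.sum(axis=0, keepdims=True)   # marginal p(Q)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (p_h @ p_q)[mask])).sum())

# illustrative numbers: equal priors, 90% correct declarations per class
alpha, beta = 0.5 * 0.9, 0.5 * 0.1
gamma, kappa = 0.5 * 0.1, 0.5 * 0.9
I_HQ = mutual_information_bits([[alpha, beta], [gamma, kappa]])
Pe = beta + gamma   # overall probability of error
```

With these numbers I(H;Q) comes out just over 0.5 bits of the 1 bit entering the binary channel, the same kind of accounting that drives the information loss budget later in the example.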
Certainty States: The most certain state achievable for the HRR radar example presented here is the case of the observed deterministic multivariate signal in noise ({right arrow over (X)}ni) when accompanied by perfect training ({right arrow over (X)}′={right arrow over (X)}ni). Table 5 relates selected combinations of measurement and training uncertainty sources from Table 1. Cases 1-6 identified in Table 5 represent the certainty states of interest within the system. Unknown parameters are shown in bold.
Assuming sufficient sampling to completely determine the probability density function (pdf) associated with the additive noise, the resulting statistical characteristics of the random performance functions will resemble the delta function, and thus the reliability in predicted link performance (such as σPe) will be high.
Case 1 of Table 5 represents an observed process {right arrow over (X)}n of a stationary object of known aspect angle with perfect training. Case 1 conditions correspond to the highest certainty state possible. Case 2 corresponds to the observed process {right arrow over (X)} of an object that is moving slowly enough to appear stationary during the measurement interval. The aspect estimation uncertainty is σt=0.75 degrees with an unknown bias (μt), and again the training is perfect. Case 3 conditions are similar, with an unknown leading edge position bias μr.
The signal-to-noise ratio (SNR) parameter is treated as an unknown parameter in Case 4. Case 5 is a combined condition of the unknown parameters in Cases 2, 3, and 4. In Case 6, a form of imperfect training is presented where the measurement parameter uncertainty provided in Case 5 is combined with training level B (μr=0 and μt=0).
Sampling and FBIT Analysis: The amplitude response for the N sample ensemble of HRR signatures for a “baseline” set of conditions, defined as Case 2 (μr=0 and μt=0), is provided in
Sampling Uncertainty Example: The sampling uncertainty previously defined is illustrated using the baseline uncertainty conditions and multiple target ensembles similar to those previously discussed. Using the Monte Carlo simulation, the typical set for {right arrow over (X)}1, {right arrow over (X)}2, and {right arrow over (X)} is computed for increasing values of N. Multiple ensembles of each are simulated at each value of N to generate both the mean and variance of the entropy estimate within the typical set.
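The mean-and-variance-versus-N procedure can be sketched with a simplified stand-in: a small discrete source and a plug-in entropy estimator replace the full signature ensembles and typical-set computation, but the shrinking spread of the estimate with growing N is the same sampling-uncertainty effect.

```python
import numpy as np

def entropy_estimate_stats(p, n_samples, n_ensembles, rng):
    """Mean and standard deviation of a plug-in entropy estimate over
    repeated ensembles of size n_samples. A simplified stand-in for
    the ensemble study in the text; names are illustrative."""
    estimates = []
    for _ in range(n_ensembles):
        draws = rng.choice(len(p), size=n_samples, p=p)
        freqs = np.bincount(draws, minlength=len(p)) / n_samples
        nz = freqs > 0
        estimates.append(float(-(freqs[nz] * np.log2(freqs[nz])).sum()))
    return float(np.mean(estimates)), float(np.std(estimates))

rng = np.random.default_rng(2)
p = np.array([0.5, 0.5])                      # 1-bit source (binary H)
m_small, s_small = entropy_estimate_stats(p, 100, 200, rng)
m_big, s_big = entropy_estimate_stats(p, 10000, 200, rng)
# the spread (sampling uncertainty) of the estimate shrinks as N grows,
# while the mean converges to the true 1-bit entropy
```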
The sampling uncertainty associated with entropic estimation at {right arrow over (X)} is realized within the estimate Î(H; {right arrow over (X)}).
In Eq. (47), Corollary IV and V are used to compute the sampling uncertainty associated with the estimate of the probability of error. The following figures demonstrate the accuracy of Corollary IV and V using Eq. (44), which is applied at each link in the radar channel. Note that each application of Eq. (44) is conducted with 2×NT=6×10³ samples as the basis for the scaling. The approximation for the standard deviation of the probability of error is computed for the complete range of ensemble sizes out to N=3×10⁴.
The application of Eq. (44) at each draw of the Monte Carlo simulation will generate an approximation of the sampling uncertainty within the probability of error estimate.
Eq. (48) provides the test for minimum sampling based on low sample ensemble sizes. In
The Fano Equality: It is important to demonstrate the validity of Theorem I as written in Eq. (3). Using the radar example,
Experiments: The experiments conducted are given in Table 6:
Information Flow and Design Trades within the Radar Channel: The value of the Data Processing Inequality is readily seen from
The signal-to-noise ratio of the signatures resulting from sensor measurements depends in part on the noise figure of the system. In
It is also of interest how the dynamic range of the sensor affects the information flow through the channel. Specifically, the sensitivity of I(H;Q), and ultimately Pe, to the dynamic range of the sensor is of interest. The A/D conversion of the radar intermediate frequency (IF) signal to a digital representation must preserve the amplitude and phase information contained in the radar return with minimum error. The effects of quantization at each measurement point (quantization event) due to two's-complement rounding error are assumed to be zero mean white noise processes. The A/D conversion and associated quantization noise are modeled as an additive noise component {right arrow over (e)} added to the measured signature process.
{right arrow over (X)}ni={right arrow over (X)}Ei+{right arrow over (n)}+{right arrow over (e)} (52)
The maximum dynamic range supportable by a “B-bit” quantizer is the ratio of the largest representable magnitude to the smallest nonzero representable magnitude. The dynamic range for two's complement and magnitude encoding for a “B-bit” quantizer is
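That ratio can be sketched numerically. The patent's exact expression is not reproduced in this extract, so the common convention of 2**(B−1) LSBs of representable magnitude is assumed here.

```python
import math

def dynamic_range_db(n_bits):
    """Dynamic range of a B-bit two's-complement quantizer, taken here
    as the ratio of the largest representable magnitude (2**(B-1) LSBs)
    to the smallest nonzero magnitude (1 LSB), expressed in dB. This is
    the common convention, assumed in place of the patent's expression."""
    return 20.0 * math.log10(2 ** (n_bits - 1))

# roughly 6 dB per bit: B = 4 gives about 18 dB of dynamic range,
# consistent with the B = 4 operating point selected later in the text
dr4 = dynamic_range_db(4)
dr8 = dynamic_range_db(8)
```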
The dynamic range trade in
The analysis of the bandwidth trade in
One would then expect that there should be a ‘bump’ in information flow when the bandwidth reaches levels that support the resolution necessary to resolve the peaks associated with these two scatterers. The theoretical resolution to achieve this feature separation would be approximately 800 MHz using the fundamental bandwidth relationship;
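The fundamental bandwidth relationship referred to above is taken here to be the standard ΔR = c/(2·BW); the helper names are illustrative.

```python
def range_resolution_m(bandwidth_hz, c=3.0e8):
    """Fundamental range-resolution relationship dR = c / (2 * BW)."""
    return c / (2.0 * bandwidth_hz)

def required_bandwidth_hz(resolution_m, c=3.0e8):
    """Inverse form: bandwidth needed to resolve a given range separation."""
    return c / (2.0 * resolution_m)

# ~800 MHz supports roughly 0.19 m of range resolution, on the order of
# the 0.2 m scattering-cluster displacement distinguishing the two targets
bw = required_bandwidth_hz(0.1875)
```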
In
In each figure, it can be seen that the MI decreases as links move further down the channel. With one Bit going into the channel (binary classification problem), Table 7 tabulates the information loss budget for each trade study at the selected baseline operating point.
The study of Table 7 reveals several key points. First, in this particular example problem, the targets separate very well at {right arrow over (X)}, and much of the loss occurs within the feature extraction and at the application of the decision rule. The loss at link {right arrow over (Y)}, amounting to 0.3-0.4 Bits at the feature extraction function, is the dominant information-limiting component in the system. By contrast, signature measurement and signature processing contribute only 0.1 Bits of loss. This is critical information for the effective optimization of system design for information sensing: little gain can be expected from expanding the sensing degrees of freedom (DOF) to improve the overall performance of the system.
Also, the loss due to the decision component of the system is in the range of 0.1-0.2 Bits. Depending on the performance requirements of the system, improvements to the decision stage of the system may or may not be warranted. At the decision stage of the system, 0.4-0.5 bits of loss have been sustained resulting in an “upper bound” in performance of something in the area of Pe=0.1. No improvements to the classifier design within the decision component of the system can improve upon this performance level. Improvements appear to be best directed toward the feature extraction stage.
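The "upper bound" quoted above follows from the binary form of Fano's equality, H(H|Q) = h(Pe), where h is the binary entropy function. Inverting it recovers Pe from the bits of cumulative loss; the bisection helper below is an illustrative sketch, not the patent's procedure.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def pe_from_loss_bits(loss_bits, tol=1e-12):
    """Invert the binary Fano equality H(H|Q) = h(Pe) on [0, 0.5] by
    bisection: given the bits lost in a one-bit channel, return the
    implied probability of error. h is monotone increasing on [0, 0.5]."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < loss_bits:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 0.4-0.5 bits of cumulative loss implies Pe near 0.1, matching the
# bound quoted in the text
pe = pe_from_loss_bits(0.469)
```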
An optimal design operating point may for example include the following component selections: (i) A/D converter with B=4 Bits; (ii) receiver design that achieves 20 dB SNR under tactically significant conditions; and (iii) transmit waveform with BW >800 MHz.
Information Flow and System Uncertainty: The study of the effects of sources of uncertainty on system performance confidence, under given control parameters and in the presence of sensing uncertainty ({right arrow over (V)}E, {right arrow over (V)}t), is of particular interest. For a fully sampled signature process with negligible sampling uncertainty per Eq. (46), the FBIT method can be applied to study the independent sources of uncertainty. The effects of each independent source of uncertainty can be studied at each link in the channel. Eq. (36) is demonstrated for links {right arrow over (X)}, {right arrow over (Y)}, and Q under Case 5 conditions defined in Table 5. Under these conditions, three independent sources of uncertainty are introduced in the system under perfect training conditions. An unknown bias in target aspect estimation and an unknown bias in leading edge range bias estimation are assumed. The target range is also unknown, and as such a third uncertainty is introduced in the SNR of the measured signature. All assumed statistics associated with the uncertainties are as defined under Case 5 of Table 5 and as described previously.
Using Monte Carlo simulation, L independent draws of an NM sample ensemble from {right arrow over (X)} are generated. The FBIT method is applied at each draw to generate the decomposition of the performance estimate reliability in Eq. (36) at {right arrow over (X)}, {right arrow over (Y)}, and Q. In
The corresponding impacts to the reliability in link performance can be generated through the application of Corollary IV and V. In
In
The implications of imperfect training are realized in the final stage of the channel at Q′ as shown in
A summary of the expected link loss, expected link performance, reliability in link performance, and results of respective sampling uncertainty tests in
From Table 8 it can be seen that gains in performance due to component design trades must also take into account the reliability level associated with predicted performance. In this example problem, changes within two significant digits of the expected performance should be studied in the context of the reliability of the performance estimates based on uncertainty factors introduced in the system.
By virtue of the foregoing, a method is provided for identifying and characterizing component-level information loss in a nonlinear system comprising a plurality of components, wherein at least one of the components of the nonlinear system is subject to at least one source of uncertainty, each source of uncertainty comprising a plurality of system uncertainty parameters, the method comprising the steps of: (a) determining discrete decision states for the nonlinear system, wherein the discrete decision states comprise a true object state H and a decision state Q, the discrete decision states being characterized in a Markovian channel model comprising a plurality of links, wherein each link corresponds to one component of the nonlinear system; (b) modeling the system uncertainty parameters to create a plurality of distributions, wherein each distribution comprises a plurality of values ranging from a theoretical maximum entropy to a theoretical minimum entropy for one system uncertainty parameter, wherein at least one of the system uncertainty parameters is unknown; (c) calculating an entropy at each component, H(H), H(X), H(Y), . . . H(Q), wherein the entropy is directly related to an amount of uncertainty at each component; (d) computing an amount of mutual information between H and Q, I(H;Q), wherein I(H;Q) is used to characterize a total system performance and wherein the at least one source of uncertainty increases a total amount of entropy in the nonlinear system, thereby decreasing I(H;Q) and degrading the total system performance; (e) calculating an amount of cumulative component information loss from H to Q, ILX, ILY, . . . ILQ, wherein ILQ is equal to a sum of the component-level information loss that occurs at each component, ILXΔ, ILYΔ, . . . 
ILQΔ, and wherein component-level information loss occurs only within the Markovian channel model; (f) correlating, using Fano's equality, at least one of I(H;Q) and ILQ to the total amount of entropy to generate at least one overall probability of error Pe for the nonlinear system; (g) estimating, using the Data Processing Inequality together with Fano's equality, a component-level probability of error, PeX, PeY, . . . PeQ; and (h) correlating the component-level probability of error to the component-level information loss.
In one or more embodiments, the method further comprises computing a component-level performance reliability and attributing a contribution of each system uncertainty parameter to the component-level performance reliability, the method comprising the steps of: (a) determining a real world statistical variation of the system uncertainty parameters; (b) performing a Monte-Carlo simulation of a plurality of the statistical uncertainty parameters for a plurality of settings through iteration of steps 1b) to 1h); (c) calculating a component-level probability of error statistical distribution at each component; (d) determining the component-level performance reliability based on a standard deviation of each component-level probability of error statistical distribution; and (e) correlating the contribution of each system uncertainty parameter to the component-level performance reliability.
In an exemplary embodiment, the step of performing the Monte-Carlo simulation further comprises determining a proper ensemble sample size. In an exemplary embodiment, the method further comprises determining at least one component-level ensemble sampling requirement for the method of claim 1, the method comprising the steps of: (a) determining a set of test criteria for a maximum allowable sampling uncertainty of the component-level information loss relative to the component-level probability of error statistical distributions; (b) determining a sample ensemble size NM for the component-level information loss using a phase transition method; and (c) computing the component-level performance reliability using a numerical simulation method on the sample ensemble size NM. In a particular embodiment, the numerical simulation method comprises Monte Carlo modeling.
In another aspect of the present disclosure, a method is provided for determining an optimal component design for a nonlinear system comprising a plurality of components, wherein at least one of the components of the nonlinear system is subject to at least one source of uncertainty, each source of uncertainty comprising a plurality of system uncertainty parameters, the method comprising the steps of: (a) establishing an information loss budget comprising a desired PeQ; (b) calculating component-level information loss, ILXΔ, ILYΔ, . . . ILQΔ, according to claim 1; (c) calculating component probability of error, PeX, PeY, . . . PeQ, according to claim 1 to generate a calculated PeQ; (d) comparing the calculated PeQ with the desired PeQ; (e) identifying at least one source of information reduction, wherein the at least one source of information reduction comprises at least one of component-level information loss and information flow reduction; (f) determining the optimal component design to minimize the calculated PeQ, wherein the optimal component design includes at least one tradeoff between information flow and component design, wherein the at least one tradeoff decreases the at least one source of information reduction; and (g) repeating steps 6b) to 6g) until the calculated PeQ is equal to or less than the desired PeQ.
In one or more embodiments, the method further comprises identifying at least two sources of information reduction, wherein the at least two sources of information reduction comprise at least one of component-level information loss and information flow reduction; ranking the at least two sources of information reduction according to impact on the calculated PeQ, wherein at least one dominant source of information reduction is identified; and determining the optimal component design to minimize the calculated PeQ, wherein the optimal component design includes at least one tradeoff between information flow and component design, wherein the at least one tradeoff decreases the at least one dominant source of information reduction.
Although specific embodiments have been described in detail in the foregoing description and illustrated in the drawings, various other embodiments, changes, and modifications to the disclosed embodiment(s) will become apparent to those skilled in the art. All such other embodiments, changes, and modifications are intended to come within the spirit and scope of the appended claims.
While the disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the disclosure. In addition, many modifications may be made to adapt a particular system, device or component thereof to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiments disclosed for carrying out this disclosure, but that the disclosure will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. do not denote any order or importance, but rather the terms first, second, etc. are used to distinguish one element from another.
In the preceding detailed description of exemplary embodiments of the disclosure, specific exemplary embodiments in which the disclosure may be practiced are described in sufficient detail to enable those skilled in the art to practice the disclosed embodiments. For example, specific details such as specific method orders, structures, elements, and connections have been presented herein. However, it is to be understood that the specific details presented need not be utilized to practice embodiments of the present disclosure. It is also to be understood that other embodiments may be utilized and that logical, architectural, programmatic, mechanical, electrical and other changes may be made without departing from general scope of the disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims and equivalents thereof.
References within the specification to “one embodiment,” “an embodiment,” “embodiments”, or “one or more embodiments” are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearance of such phrases in various places within the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
Appendices A-C attached hereto are incorporated by reference in their entirety to the present application.
It is understood that the use of specific component, device and/or parameter names and/or corresponding acronyms thereof, such as those of the executing utility, logic, and/or firmware described herein, are for example only and not meant to imply any limitations on the described embodiments. The embodiments may thus be described with different nomenclature and/or terminology utilized to describe the components, devices, parameters, methods and/or functions herein, without limitation. References to any specific protocol or proprietary name in describing one or more elements, features or concepts of the embodiments are provided solely as examples of one implementation, and such references do not limit the extension of the claimed embodiments to embodiments in which different element, feature, protocol, or concept names are utilized. Thus, each term utilized herein is to be given its broadest interpretation given the context in which that term is utilized.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the disclosure. The described embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
The present application is a continuation-in-part of U.S. patent application Ser. No. 14/315,365, entitled “Fano-Based Information Theoretic Method (FBIT) for Design and Optimization of Nonlinear Systems”, [Docket AFD-1296] filed 26 June 2014, which in turn claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Application Ser. No. 61/914,429 entitled “TITLE,” [Docket AFD-1296P] filed 11 December 2013, the contents of all of which are incorporated herein by reference in their entirety.
The invention described herein was made by employees of the United States Government and may be manufactured and used by or for the Government of the United States of America for governmental purposes without the payment of any royalties thereon or therefore.
Related U.S. application data:
Provisional application 61914429, filed Dec 2013, US.
Parent application 14315365, filed Jun 2014, US; child application 16666516, US.