This application relates generally to systems and methods for searching spectral databases and identifying unknown materials. More particularly, this application relates to outlier detection and spectral library augmentation.
The challenge of integrating multiple data types into a comprehensive database searching algorithm has yet to be adequately solved. Existing data fusion and database searching algorithms used in the spectroscopic community suffer from key disadvantages. Most notably, competing methods such as interactive searching are not scalable, and are at best semi-automated, requiring significant user interaction. For instance, the BioRAD KnowItAll® software claims an interactive searching approach that supports searching up to three different types of spectral data using the search strategy most appropriate to each data type. Results are displayed in a scatter plot format, requiring visual interpretation and restricting the scalability of the technique. Also, this method does not account for mixture component searches. Data Fusion Then Search (DFTS) is an automated approach that combines the data from all sources into a derived feature vector and then performs a search on that combined data. The data is typically transformed using a multivariate data reduction technique, such as Principal Component Analysis, to eliminate redundancy across data and to accentuate the meaningful features. This technique is also susceptible to poor results for mixtures, and it has limited capacity for user control of weighting factors.
The present disclosure describes a system and method that overcome these disadvantages, allowing users to identify unknown materials using multiple types of spectroscopic data.
The present disclosure generally relates to spectral analysis and provides for a system and method to search spectral databases and to identify unknown materials. In one embodiment, the disclosure relates to the detection of spectral “outliers.” In another embodiment, the disclosure relates to an adaptive methodology for spectral library augmentation. In spectral analysis, certain spectral data may be classified as “outliers” from a library of reference spectral data sets. Broadly speaking, the term “outlier” may refer to any spectral data set that is not present in a relevant library. For example, in a library or set of reference Raman spectra of known biological threat agents, a target Raman spectral data set or spectrum (e.g., from a perceived biological threat) may be considered as an “outlier” when that spectrum is found not to match with any spectrum in the reference library. However, a simplistic interpretation of the term “outlier” may not be reliable for sensitive and specific detection of hazardous materials including chemical, biological, radiological, nuclear, and explosive (CBRNE) materials. It is noted here that an “outlier” may represent the actual target data or may just include noise. Therefore, additional analysis of initial outlier status determination may be necessary to accurately determine whether the target data set is a true outlier.
The present disclosure provides a detailed discussion of analysis of spectral data collected from a target (or sample under investigation) so as to more clearly define outlier datasets and, in turn, augment existing spectral libraries to adaptively accommodate such outlier datasets to allow for improved detection in the field of previously undetectable or unknown compounds. The discussion below relates to a more accurate identification or detection of outliers among target spectral data sets and to a methodology to determine when an outlier may be added to the reference library data set. The process outlined may be automated so as to accomplish outlier detection and classification without user intervention. Alternatively, a portion of the process may be performed in software and another portion may be performed manually.
The accompanying drawings, which are included to provide further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
In the drawings:
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
The plurality of test data sets 110 includes data that characterizes an unknown material. The plurality of test data sets 110 are obtained from a variety of instruments 140 that produce data representative of the chemical and physical properties of the unknown material. The plurality of test data sets includes spectroscopic data, text descriptions, chemical and physical property data, and chromatographic data. In one embodiment, the plurality of test data sets includes a spectrum or a pattern that characterizes the chemical composition, molecular composition, physical properties and/or elemental composition of an unknown material. In another embodiment, the plurality of test data sets includes one or more of a Raman spectrum, a mid-infrared spectrum, an x-ray diffraction pattern, an energy dispersive x-ray spectrum, and a mass spectrum that are characteristic of the unknown material. In yet another embodiment, the plurality of test data sets may also include an image data set of the unknown material. In still another embodiment, the test data set may include a physical property test data set selected from the group consisting of boiling point, melting point, density, freezing point, solubility, refractive index, specific gravity or molecular weight of the unknown material. In another embodiment, the test data set includes a textual description of the unknown material.
The plurality of spectroscopic data generating instruments 140 includes any analytical instrument that generates a spectrum, an image, a chromatogram, a physical measurement, or a pattern characteristic of the physical properties, the chemical composition, or the structural composition of a material. In one embodiment, the plurality of spectroscopic data generating instruments 140 includes a Raman spectrometer, a mid-infrared spectrometer, an x-ray diffractometer, an energy dispersive x-ray analyzer and a mass spectrometer. In another embodiment, the plurality of spectroscopic data generating instruments 140 further includes a microscope or image generating instrument. In yet another embodiment, the plurality of spectroscopic data generating instruments 140 further includes a chromatographic analyzer.
Library 120 includes a plurality of sublibraries 120a, 120b, 120c, 120d and 120e. Each sublibrary is associated with a different spectroscopic data generating instrument 140. In one embodiment, the sublibraries include a Raman sublibrary, a mid-infrared sublibrary, an x-ray diffraction sublibrary, an energy dispersive sublibrary and a mass spectrum sublibrary. For this embodiment, the associated spectroscopic data generating instruments 140 include a Raman spectrometer, a mid-infrared spectrometer, an x-ray diffractometer, an energy dispersive x-ray analyzer and a mass spectrometer. In another embodiment, the sublibraries further include an image sublibrary associated with a microscope. In yet another embodiment, the sublibraries further include a textual description sublibrary. In still yet another embodiment, the sublibraries further include a physical property sublibrary.
Each sublibrary contains a plurality of reference data sets. The plurality of reference data sets includes data representative of the chemical and physical properties of a plurality of known materials. The plurality of reference data sets includes spectroscopic data, text descriptions, chemical and physical property data, and chromatographic data. In one embodiment, a reference data set includes a spectrum or a pattern that characterizes the chemical composition, the molecular composition and/or elemental composition of a known material. In another embodiment, the reference data set includes a Raman spectrum, a mid-infrared spectrum, an x-ray diffraction pattern, an energy dispersive x-ray spectrum, and a mass spectrum of known materials. In yet another embodiment, the reference data set further includes a physical property test data set of known materials selected from the group consisting of boiling point, melting point, density, freezing point, solubility, refractive index, specific gravity or molecular weight. In still another embodiment, the reference data set further includes an image displaying the shape, size and morphology of known materials. In another embodiment, the reference data set includes feature data having information such as particle size, color and morphology of the known material.
System 100 further includes at least one processor 130 in communication with the library 120 and sublibraries. The processor 130 executes a set of instructions to identify the composition of an unknown material.
In one embodiment, system 100 includes a library 120 having the following sublibraries: a Raman sublibrary associated with a Raman spectrometer; an infrared sublibrary associated with an infrared spectrometer; an x-ray diffraction sublibrary associated with an x-ray diffractometer; an energy dispersive x-ray sublibrary associated with an energy dispersive x-ray spectrometer; and a mass spectrum sublibrary associated with a mass spectrometer. The Raman sublibrary contains a plurality of Raman spectra characteristic of a plurality of known materials. The infrared sublibrary contains a plurality of infrared spectra characteristic of a plurality of known materials. The x-ray diffraction sublibrary contains a plurality of x-ray diffraction patterns characteristic of a plurality of known materials. The energy dispersive sublibrary contains a plurality of energy dispersive spectra characteristic of a plurality of known materials. The mass spectrum sublibrary contains a plurality of mass spectra characteristic of a plurality of known materials. The test data sets include two or more of the following: a Raman spectrum of the unknown material, an infrared spectrum of the unknown material, an x-ray diffraction pattern of the unknown material, an energy dispersive spectrum of the unknown material, and a mass spectrum of the unknown material.
With reference to
In step 210, the test data sets are corrected to remove signals and information that are not due to the chemical composition of the unknown material. Algorithms known to those skilled in the art may be applied to the data sets to remove electronic noise and to correct the baseline of the test data set. The data sets may also be corrected to reject outlier data sets. In one embodiment, the system detects test data sets having signals and information that are not due to the chemical composition of the unknown material. These signals and information are then removed from the test data sets. In another embodiment, the user is issued a warning when the system detects a test data set having signals and information that are not due to the chemical composition of the unknown material.
A detailed discussion of the detection of outliers and augmentation of a spectral library is provided hereinbelow with reference to
With further reference to
In step 225, the set of scores produced in step 220 is converted to a set of relative probability values. The set of relative probability values contains a plurality of relative probability values, one relative probability value for each reference data set.
Referring still to
In step 240, the identity of the unknown material is reported. To determine the identity of the unknown, the highest final probability value from the set of final probability values is selected. This highest final probability value is then compared to a minimum confidence value. If the highest final probability value is greater than or equal to the minimum confidence value, the known material having the highest final probability value is reported. In one embodiment, the minimum confidence value may range from 0.70 to 0.95. In another embodiment, the minimum confidence value ranges from 0.8 to 0.95. In yet another embodiment, the minimum confidence value ranges from 0.90 to 0.95.
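The reporting rule of step 240 can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the function name `report_identity`, its arguments, and the default threshold of 0.90 (one value inside the ranges given above) are assumptions made for the example.

```python
from typing import Optional, Sequence

def report_identity(names: Sequence[str],
                    final_probs: Sequence[float],
                    min_confidence: float = 0.90) -> Optional[str]:
    """Return the known material with the highest final probability value,
    or None when that value is below the minimum confidence value."""
    # Index of the highest final probability value.
    best = max(range(len(final_probs)), key=lambda i: final_probs[i])
    # Report only if the value meets or exceeds the minimum confidence value.
    if final_probs[best] >= min_confidence:
        return names[best]
    return None
```

With this sketch, a top probability of 0.95 would be reported under the default threshold, while a top probability of 0.52 would not.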
As described above, the library 120 contains several different types of sublibraries, each of which is associated with an analytical technique, i.e., the spectroscopic data generating instrument 140. Therefore, each analytical technique provides an independent contribution to identifying the unknown material. Additionally, each analytical technique has a different level of specificity for matching a test data set for an unknown material with a reference data set for a known material. For example, a Raman spectrum generally has higher discriminatory power than a fluorescence spectrum and is thus considered more specific for the identification of an unknown material. The greater discriminatory power of Raman spectroscopy manifests itself as a higher likelihood for matching any given spectrum using Raman spectroscopy than using fluorescence spectroscopy. The method illustrated in
In yet another embodiment, each spectroscopic data generating instrument has a different associated weighting factor. Estimates of these associated weighting factors are determined through automated simulations. In particular, with at least two data records for each spectroscopic data generating instrument (e.g., two Raman spectra per material), the library is split into training and validation sets. The training set is then used as the reference data set. The validation set is used as the test data set and searched against the training set. Without the weighting factors ({W}={1, 1, . . . , 1}), a certain percentage of the validation set will be correctly identified, and some percentage will be incorrectly identified. By explicitly or randomly varying the weighting factors and recording each set of correct and incorrect identification rates, the optimal operating set of weighting factors, for each spectroscopic data generating instrument, is estimated by choosing those weighting factors that result in the best identification rates.
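The weight estimation procedure described above can be sketched as an explicit grid search. The sketch makes several assumptions that are not stated in the text: the weights are applied as exponents on each instrument's relative probabilities before multiplication, the candidate grid values are arbitrary, and the small arrays are made-up numbers used only to exercise the code.

```python
import itertools
import numpy as np

def fuse(prob_sets, weights):
    """Weighted product of per-instrument probabilities.
    Assumed weighting form: each probability is raised to its weight."""
    fused = np.ones_like(prob_sets[0], dtype=float)
    for p, w in zip(prob_sets, weights):
        fused *= np.asarray(p, dtype=float) ** w
    return fused / fused.sum(axis=-1, keepdims=True)

def identification_rate(prob_sets, truth, weights):
    """Fraction of validation samples whose top fused match is correct."""
    fused = fuse(prob_sets, weights)
    return float(np.mean(fused.argmax(axis=1) == truth))

def grid_search_weights(prob_sets, truth, grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Explicitly vary the weights and keep the first set with the best rate."""
    best_w, best_rate = None, -1.0
    for w in itertools.product(grid, repeat=len(prob_sets)):
        if not any(w):          # skip the all-zero weight set
            continue
        rate = identification_rate(prob_sets, truth, w)
        if rate > best_rate:
            best_w, best_rate = w, rate
    return best_w, best_rate

# Made-up validation results: rows are validation samples, columns are
# reference materials; truth[i] is the correct column for sample i.
raman = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
fluor = np.array([[0.2, 0.5, 0.3], [0.4, 0.3, 0.3], [0.3, 0.45, 0.25]])
truth = np.array([0, 1, 2])
best_w, best_rate = grid_search_weights([raman, fluor], truth)
```

In this toy case the unweighted search misidentifies one sample, and the grid search finds a weight set that identifies all three. A random search over the same objective would follow the same pattern.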
The method of the present disclosure also provides for using a text query to limit the number of reference data sets of known compounds in the sublibrary searched in step 220 of
The method of the present disclosure also provides for using images to identify the unknown material. In one embodiment, an image test data set characterizing an unknown material is obtained from an image generating instrument. The test image, of the unknown, is compared to the plurality of reference images for the known materials in an image sublibrary to assist in the identification of the unknown material. In another embodiment, a set of test feature data is extracted from the image test data set using a feature extraction algorithm to generate test feature data. The selection of an extraction algorithm is well known to one of skill in the art of digital imaging. The test feature data includes information concerning particle size, color or morphology of the unknown material. The test feature data is searched against the reference feature data in the image sublibrary, producing a set of scores. The reference feature data includes information such as particle size, color and morphology of the material. The set of scores, from the image sublibrary, are used to calculate a set of probability values. The relative probability values, for the image sublibrary, are fused with the relative probability values for the other plurality of sublibraries as illustrated in
The method of the present disclosure further provides for enabling a user to view one or more reference data sets of the known material identified as representing the unknown material despite the absence of one or more test data sets. For example, the user inputs an infrared test data set and a Raman test data set to the system. The energy dispersive x-ray spectroscopy (“EDS”) sublibrary contains an EDS reference data set for the plurality of known compounds even though the user did not input an EDS test data set. Using the steps illustrated in
The method of the present disclosure also provides for identifying unknowns when one or more of the sublibraries are missing one or more reference data sets. When a sublibrary has fewer reference data sets than the number of known materials characterized within the main library, the system treats this sublibrary as an incomplete sublibrary. To obtain a score for the missing reference data set, the system calculates a mean score based on the set of scores, from step 225, for the incomplete sublibrary. The mean score is then used, in the set of scores, as the score for the missing reference data set.
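The mean-score substitution for an incomplete sublibrary can be sketched as follows. Representing a missing reference data set by `None` or NaN, and the function name `fill_missing_scores`, are assumptions made for illustration.

```python
import math

def fill_missing_scores(scores):
    """Replace missing scores (None or NaN) with the mean of the scores
    that are present in the incomplete sublibrary."""
    present = [s for s in scores if s is not None and not math.isnan(s)]
    if not present:
        raise ValueError("sublibrary has no scores to average")
    mean = sum(present) / len(present)
    return [mean if (s is None or math.isnan(s)) else s for s in scores]
```

For example, a sublibrary with scores of 20 and 10 and one missing entry would receive a substituted score of 15 for the missing entry.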
The method of the present disclosure also provides for identifying miscalibrated test data sets. When one or more of the test data sets fail to match any reference data set in the searched sublibrary, the system treats the test data set as miscalibrated. The assumed miscalibrated test data sets are processed via a grid optimization process where a range of zero and first order corrections are applied to the data to generate one or more corrected test data sets. The system then reanalyzes the corrected test data set using the steps illustrated in
The method of the present disclosure also provides for the identification of the components of an unknown mixture. With reference to
In step 307, the test data sets are corrected to remove signals and information that are not due to the chemical composition of the unknown material. In step 310, each sublibrary is searched for a match for each combined test data set. The searched sublibraries are associated with the spectroscopic data generating instrument used to generate the combined test data sets. The sublibrary search is performed using a spectral unmixing metric that compares the plurality of combined test data sets to each of the reference data sets in each of the searched sublibraries. A spectral unmixing metric is disclosed in U.S. patent application Ser. No. 10/812,233 entitled “Method for Identifying Components of a Mixture via Spectral Analysis,” filed Mar. 29, 2004, which is incorporated herein by reference in its entirety; however, this application forms no part of the present invention. The sublibrary searching produces a corresponding second set of scores for each searched sublibrary. Here, “second” denotes the scores and set of scores produced in the second pass of the searching method. Each second score in said second set of scores indicates a second likelihood of a match between the combined test data sets and each of the reference data sets in the searched sublibraries. The second set of scores contains a plurality of second scores, one second score for each reference data set in the searched sublibrary.
According to a spectral unmixing metric, the combined test data sets define an n-dimensional data space, where n is the number of points in the test data sets. Principal component analysis (PCA) techniques are applied to the n-dimensional data space to reduce the dimensionality of the data space. The dimensionality reduction step results in the selection of m eigenvectors as coordinate axes in the new data space. For each searched sublibrary, the reference data sets are compared to the reduced dimensionality data space generated from the combined test data sets using target factor testing techniques. Each sublibrary reference data set is projected as a vector in the reduced m-dimensional data space. An angle between the sublibrary vector and the data space results from target factor testing. This is performed by calculating the angle between the sublibrary reference data set and the projected sublibrary data. These angles are used as the second scores, which are converted to second probability values for each of the reference data sets and fed into the fusion algorithm in the second pass of the search method. This paragraph forms no part of the present invention.
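The projection-and-angle computation described above can be sketched with a singular value decomposition. This is an illustrative sketch under stated assumptions: the eigenvectors are taken from an SVD of the test data without mean-centering, the number of retained eigenvectors m is supplied by the caller, and the function name and toy spectra are hypothetical.

```python
import numpy as np

def target_factor_angles(test_spectra, references, m):
    """Angle (radians) between each reference spectrum and its projection
    onto the m-dimensional space spanned by the leading eigenvectors of
    the combined test data."""
    X = np.asarray(test_spectra, dtype=float)   # (n_spectra, n_points)
    R = np.asarray(references, dtype=float)     # (n_refs, n_points)
    # Rows of Vt are orthonormal eigenvectors of X^T X (assumption: no centering).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:m]                                  # (m, n_points)
    projected = R @ V.T @ V                     # projection onto the reduced space
    # cos(angle) between a vector and its orthogonal projection.
    cos = np.linalg.norm(projected, axis=1) / np.linalg.norm(R, axis=1)
    return np.arccos(np.clip(cos, 0.0, 1.0))

# Made-up four-point spectra: the test data are mixtures of the first two
# references only, so the third reference lies outside the data space.
mixtures = np.array([[0.7, 0.3, 0.0, 0.0],
                     [0.4, 0.6, 0.0, 0.0],
                     [0.2, 0.8, 0.0, 0.0]])
refs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
angles = target_factor_angles(mixtures, refs, m=2)
```

A small angle suggests that the reference is a plausible component of the mixture; an angle near 90 degrees suggests that it is not.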
Referring still to
From the set of second final probability values, a set of high second final probability values is selected. The set of high second final probability values is then compared to the minimum confidence value, step 325. If each high second final probability value is greater than or equal to the minimum confidence value, step 335, the set of known materials represented in the library having the high second final probability values is then reported. In one embodiment, the minimum confidence value may range from 0.70 to 0.95. In another embodiment, the minimum confidence value may range from 0.8 to 0.95. In yet another embodiment, the minimum confidence value may range from 0.9 to 0.95.
Referring to
COMBINED TEST DATA SET=CONCENTRATION×REFERENCE DATA SET+RESIDUAL
To calculate a residual data set, a linear spectral unmixing algorithm may be applied to the plurality of combined test data sets, to thereby produce a plurality of residual test data, step 410. Each searched sublibrary has an associated residual test data. When a plurality of residual data are not identified in step 410, a report is issued, step 420. In this step, the components of the unknown material are reported as those components determined in step 335 of
In this example, each instrument in a network of n spectroscopic instruments provides test data sets to a central processing unit. Each instrument makes an observation vector {Z} of parameter {X}. For instance, a dispersive Raman spectrum would be modeled with X=dispersive Raman and Z=the spectral data. Each instrument generates a test data set and calculates (using a similarity metric) the likelihoods {pi(Ha)} of the test data set being of type Ha. Bayes' theorem gives:
where:
p(Ha|{Z}): the posterior probability of the test data being of type Ha, given the observations {Z};
p({Z}|Ha): the probability that observations {Z} were taken, given that the test data is type Ha;
p(Ha): the prior probability of type Ha being correct; and
p({Z}): a normalization factor to ensure the posterior probabilities sum to 1.
Assuming that each spectroscopic instrument is independent of the other spectroscopic instruments gives:
and from Bayes rule
gives
Equation 4 is the central equation that uses Bayesian data fusion to combine observations from different spectroscopic instruments to give probabilities of the presumed identities.
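Written out explicitly, the relations described above take the following standard form. This is a sketch reconstructed from the term definitions given above and from the independence assumption; the index i over the n instruments and the symbol α for the normalization factor 1/p({Z}) are notational assumptions.

```latex
\begin{align}
p(H_a \mid \{Z\}) &= \frac{p(\{Z\} \mid H_a)\, p(H_a)}{p(\{Z\})} \\
p(\{Z\} \mid H_a) &= \prod_{i=1}^{n} p(\{Z_i\} \mid H_a) \\
p(H_a \mid \{Z\}) &= \alpha\, p(H_a) \prod_{i=1}^{n} p(\{Z_i\} \mid H_a),
  \qquad \alpha = \frac{1}{p(\{Z\})}
\end{align}
```

The first line is Bayes' theorem, the second expresses the independence of the instruments, and the third substitutes the second into the first to give the fused posterior.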
To infer a presumed identity from the above equation, a value of identity is assigned to the test data having the most probable (maximum a posteriori) result:
To use the above formulation, the test data is converted to probabilities. In particular, the spectroscopic instrument must give p({Z}|Ha), the probability that observations {Z} were taken, given that the test data is type Ha. Each sublibrary is a set of reference data sets that match the test data set with certain probabilities. The probabilities of the unknown matching each of the reference data sets must sum to 1. The sublibrary is considered as a probability distribution.
The system applies a few commonly used similarity metrics consistent with the requirements of this algorithm: Euclidean Distance, the Spectral Angle Mapper (SAM), the Spectral Information Divergence (SID), Mahalanobis distance metric and spectral unmixing. The SID has roots in probability theory and is thus the best choice for use in the data fusion algorithm, although any of these metrics is technically compatible. Euclidean Distance (“ED”) is used to give the distance between spectrum x and spectrum y:
Spectral Angle Mapper (“SAM”) finds the angle between spectrum x and spectrum y:
When SAM is small, it is nearly the same as ED. Spectral Information Divergence (“SID”) takes an information theory approach to similarity and transforms the x and y spectra into probability distributions p and q:
The discrepancy in the self-information of each band is defined as:
So the average discrepancies of x compared to y and y compared to x (which are different) are:
The SID is thus defined as:
SID(x,y)=D(x∥y)+D(y∥x) (Equation 11)
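The three metrics named above can be sketched in Python. The sketch assumes the standard forms of these metrics: the square root of the summed squared differences for ED, the arccosine of the normalized dot product for SAM, and the symmetric relative entropy of the normalized spectra (with natural logarithms) for SID. The function names and the small `eps` guard against zero-valued bands are illustrative choices.

```python
import numpy as np

def euclidean_distance(x, y):
    """ED: straight-line distance between spectrum x and spectrum y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def spectral_angle(x, y):
    """SAM: angle (radians) between spectrum x and spectrum y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def spectral_information_divergence(x, y, eps=1e-12):
    """SID(x, y) = D(x||y) + D(y||x) on spectra normalized to sum to 1."""
    x, y = np.asarray(x, float) + eps, np.asarray(y, float) + eps
    p, q = x / x.sum(), y / y.sum()          # spectra as probability distributions
    d_xy = np.sum(p * np.log(p / q))         # average discrepancy of x relative to y
    d_yx = np.sum(q * np.log(q / p))         # average discrepancy of y relative to x
    return float(d_xy + d_yx)
```

All three functions return zero (or a value very close to zero) for identical spectra and grow as the spectra diverge, so a smaller score indicates a better match.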
A measure of the probabilities of matching a test data set with each entry in the sublibrary is needed. Generalizing a similarity metric as m(x, y), the relative spectral discrimination probabilities are determined by comparing a test data set x against k library entries.
Equation 12 is used as p({Z}|Ha) for each sensor in the fusion formula.
Assume a library consists of three reference data sets: {H}={A, B, C}. Three spectroscopic instruments (each a different modality) are applied to a sample, and the output of each spectroscopic instrument is compared to the appropriate sublibrary (e.g., a dispersive Raman spectrum is compared with a sublibrary of dispersive Raman spectra). If the individual search results, using SID, are:
SID(xRaman,LibraryRaman)={20, 10, 25}
SID(xFluor,LibraryFluor)={40, 35, 50}
SID(xIR,LibraryIR)={50, 20, 40}
Applying Equation 12, the relative probabilities are:
p(Z{Raman}|{H})={0.63, 0.81, 0.55}
p(Z{Fluor}|{H})={0.68, 0.72, 0.6}
p(Z{IR}|{H})={0.55, 0.81, 0.63}
It is assumed that each of the reference data sets is equally likely, with:
p({H})={p(HA), p(HB), p(HC)}={0.33, 0.33, 0.33}
Applying Equation 4 results in:
p({H}|{Z})=α×{0.33, 0.33, 0.33}×[{0.63, 0.81, 0.55}·{0.68, 0.72, 0.6}·{0.55, 0.81, 0.63}]
p({H}|{Z})=α×{0.0779, 0.1591, 0.0687}
Now normalizing with α=1/(0.0779+0.1591+0.0687) results in:
p({H}|{Z})={0.25, 0.52, 0.22}
The search identifies the unknown sample as reference data set B, with an associated probability of 52%.
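The worked example above can be reproduced with a short script. The sketch assumes that the score-to-probability conversion has the form 1 − m(x, y_i)/Σ_j m(x, y_j), which is inferred from the listed values (for instance, 1 − 20/55 ≈ 0.63 for the first Raman entry); the function names are hypothetical.

```python
import numpy as np

def relative_probabilities(scores):
    """Assumed conversion: 1 - m(x, y_i) / sum_j m(x, y_j)."""
    s = np.asarray(scores, dtype=float)
    return 1.0 - s / s.sum()

def fuse(prior, likelihoods):
    """Prior times the product of per-instrument likelihoods,
    normalized so the posterior sums to 1."""
    posterior = np.asarray(prior, dtype=float) * np.prod(likelihoods, axis=0)
    return posterior / posterior.sum()

# SID search results for reference data sets A, B, C, taken from the example.
sid = {
    "Raman": [20, 10, 25],
    "Fluor": [40, 35, 50],
    "IR":    [50, 20, 40],
}
likelihoods = [relative_probabilities(v) for v in sid.values()]
posterior = fuse([1 / 3, 1 / 3, 1 / 3], likelihoods)   # equal priors
best = "ABC"[int(np.argmax(posterior))]                # maximum a posteriori choice
print(np.round(posterior, 2), best)                    # approx. 0.25, 0.52, 0.22 -> B
```

Because the priors are equal, they cancel in the normalization, so the result depends only on the product of the per-instrument probabilities.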
Raman and mid-infrared sublibraries, each having reference data sets for 61 substances, were used. For each of the 61 substances, the Raman and mid-infrared sublibraries were searched using the Euclidean distance vector comparison. In other words, each substance is used sequentially as a target vector. The resulting set of scores for each sublibrary was converted to a set of probability values by first converting each score to a Z value and then looking up the probability from a Normal Distribution probability table. The process was repeated for each spectroscopic technique for each substance and the resulting probabilities were calculated. The set of final probability values was obtained by multiplying the two sets of probability values.
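The score-to-probability conversion used in this experiment can be sketched with the Python standard library. Two details are assumptions, since the text does not state them: the Z value is computed from the sample mean and sample standard deviation of the scores within one sublibrary, and the upper-tail probability is used so that a small distance maps to a probability near 1.

```python
from statistics import NormalDist, mean, stdev

def distances_to_probabilities(distances):
    """Convert Euclidean distance scores to probabilities via Z values."""
    mu, sigma = mean(distances), stdev(distances)
    std_normal = NormalDist()                 # mean 0, standard deviation 1
    # Upper tail (assumption): a small distance gives a negative Z value
    # and therefore a probability close to 1.
    return [1.0 - std_normal.cdf((d - mu) / sigma) for d in distances]

def combine(p_raman, p_mir):
    """Final probability values: element-wise product of the two sets."""
    return [a * b for a, b in zip(p_raman, p_mir)]
```

The call to `NormalDist().cdf` plays the role of the Normal Distribution probability table lookup.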
The results are displayed in Table 1. Based on the calculated probabilities, the top match (the score with the highest probability) was determined for each spectroscopic technique individually and for the combined probabilities. A value of “1” indicates that the target vector successfully found itself while a value of “0” indicates that the target vector found some match other than itself as the top match. The Raman probabilities resulted in four incorrect results, the mid-infrared probabilities resulted in two incorrect results, and the combined probabilities resulted in no incorrect results.
The more significant result is that the distance between the top match and the second match is significantly larger for the combined approach than for Raman or mid-infrared alone for almost all of the 61 substances. In fact, 15 of the combined results have a distance that is four times greater than the distance for either MIR or Raman individually. Only five of the 61 substances do not benefit from the fusion algorithm.
A spectral data set that falls outside a library or reference data set beyond a predetermined confidence interval or tolerance (e.g., 95% match level threshold) may be considered as an outlier (discussed in more detail with reference to
Still referring to
A number of methods may be used for determining class identification or target data classification (i.e., to determine with which class of reference spectra the target data may be associated, if any at all). There are many different methods that can be used for supervised classification. For example, the Mahalanobis Distance (MD) method may be used. The two factors to balance for supervised classification are sensitivity vs. overfitting. Consider the distribution of the set of points representing two classes (of reference spectra or spectral data set) in n-dimensional space. If there is significant overlap of the points for those two classes, that overlap can be removed by drawing classification boundaries that are specific to the points on the boundary. In other words, a jagged line enables more points to be classified correctly than a straight line does. Support Vector Machines (SVM) may allow this greater degree of discrimination, for example, than does MD. It may not be desirable to overfit on a particular training set with an accompanying loss of actual predictive power for spectra that were not included in the calibration set.
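A Mahalanobis Distance classifier with an outlier cutoff can be sketched as follows. The sketch is illustrative only: the cutoff of 3.0, the use of a pseudo-inverse of the class covariance, the function names, and the two-dimensional toy classes are assumptions, not values taken from the disclosure.

```python
import numpy as np

def mahalanobis_distance(x, class_points):
    """Distance from x to the class mean, scaled by the class covariance."""
    pts = np.asarray(class_points, dtype=float)
    mu = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)           # rows are samples, columns are features
    diff = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

def classify(x, classes, max_distance=3.0):
    """Return the nearest class label, or None when x lies beyond the
    cutoff for every class (a candidate outlier)."""
    distances = {label: mahalanobis_distance(x, pts) for label, pts in classes.items()}
    label = min(distances, key=distances.get)
    return label if distances[label] <= max_distance else None

# Made-up two-dimensional class members (e.g., scores in a reduced space).
classes = {
    "threat":     [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]],
    "background": [[5, 5], [6, 5], [5, 6], [6, 6], [5.5, 5.5]],
}
```

A smooth boundary of this kind is less prone to overfitting than a boundary tailored to individual training points, at the cost of misclassifying some points where classes overlap.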
Reporting at step 510 or step 511 may include facts about the classification and the class to which the unknown (i.e., the target data set) was assigned. These may include things like the degree of confidence in the assignment, score associated with the match, whether the class was one of the original classes or a class that was generated via adaptive learning (as discussed later with reference to
The test (of the target data set) may be designed as a two-class problem—the threat class versus the background class. It could alternatively be designed as an n-class problem, where one may attempt to identify the particular class (biological species, chemical characterization of the explosive, etc). In some respects, the n-class problem may be easier than the 2-class problem because there may not be one big diverse class made up of the members of all the different threat classes. This may be the trade-off between one general model versus many smaller specific models. The smaller models may have more uniformly distributed members.
Data fusion (of data from different types of sensors such as, for example, a Raman sensor, a LIBS sensor, a fluorescence sensor, etc.) may take into account the confidence (specificity) associated with each spectroscopic method and the confidence associated with each class. In other words, if a given target is classified as belonging to a given class by a spectroscopic technique that has a high degree of specificity (such as Raman spectroscopy), then another technique that has a lower degree of specificity (such as fluorescence spectroscopy) may not override the classification unless the fluorescence class designation has a much higher degree of confidence than the Raman designation for the particular sample and class. The confidence associated with a class may depend on the degree to which the members of the class evenly and completely cover the space defined by the class—the homogeneity of the class. This can be measured by the density of the class in an n-dimensional space or in a reduced dimensional space. Other ways to measure the quality of the class may include the leverage exhibited by any single member of the class. In other words, if that single member is left out, does the space spanned by the class change drastically? If the change is drastic, then the quality of the class may be called into question. The quality of a match may be measured by how well the target spectrum fits inside the set of data points that define the class.
Note that using multiple target data points rather than a single target data point can greatly increase the confidence associated with a particular classification—e.g., by voting, polling. A plurality of measurements may likely be performed here, given the use of fiber optic bundles that provide multiple parallel measurements (e.g., in case of a fiber array translator (FAST) based spectroscopy unit). Using weighted confidence factors can also be helpful. The weights may be determined by grid search optimizations against ground truth (supervised classification). Thus, for example, a Raman measurement may likely have a higher weighting factor than a fluorescence measurement. This would be balanced against the quality of the match (score) for the one technique (e.g., Raman) vs. the other (e.g., fluorescence). In a similar fashion, a match from a class that is a very uniform class may be given more weight than a class that does not have a uniform distribution of the data points that define that class.
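Voting across multiple measurements with per-technique weighting factors can be sketched as a weighted tally. The weights, scores, and function name below are hypothetical; in practice the weights would come from an optimization against ground truth as described above.

```python
from collections import defaultdict

def weighted_vote(observations, technique_weights):
    """Tally class votes, each weighted by technique weight times match score.

    observations: iterable of (technique, predicted_class, match_score).
    Returns the winning class and the per-class totals.
    """
    totals = defaultdict(float)
    for technique, predicted_class, score in observations:
        totals[predicted_class] += technique_weights[technique] * score
    winner = max(totals, key=totals.get)
    return winner, dict(totals)

# Hypothetical weighting factors: Raman is weighted above fluorescence.
weights = {"raman": 0.7, "fluorescence": 0.3}
```

With these weights, a moderate Raman match (score 0.5) outweighs a strong fluorescence match (score 0.95), while a weak Raman match (score 0.2) does not, which mirrors the balance between weighting factor and match quality described above.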
Note that weights could be continually updated as new members are added to the classes via the process defined in
It is noted here that any target data set not matching with the reference library data set within a predetermined confidence level/tolerance may be considered as representing “noise” data. However, as discussed herein, it may be desirable to further analyze this “noise” to identify whether a true outlier is present in the “noise.”
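The tolerance test described above can be illustrated with a short sketch. Cosine similarity is used here only as a stand-in match score, and the tolerance value, library contents, and function names are assumptions made for the example; the disclosure does not prescribe them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra given as equal-length lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_or_flag(target, library, tolerance=0.95):
    """Return a library match, or flag the target as a "noise" candidate.

    A flagged target is not discarded; it is the input to the further
    analysis that decides whether a true outlier is present.
    """
    best_name, best_score = None, -1.0
    for name, reference in library.items():
        score = cosine_similarity(target, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= tolerance:
        return ("match", best_name, best_score)
    return ("noise_candidate", None, best_score)


# Hypothetical four-channel reference spectra.
reference_library = {
    "agent_a": [1.0, 0.0, 0.0, 1.0],
    "agent_b": [0.0, 1.0, 1.0, 0.0],
}
close_result = match_or_flag([1.0, 0.05, 0.0, 0.9], reference_library)
far_result = match_or_flag([1.0, 1.0, 1.0, 1.0], reference_library)
```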
In
This Adaptive Learning Module 506 in
If a match is not found, the next step may be to submit the (target) data to an unmixing step 522. In this case, fusion may be performed along with unmixing in step 522. Data from auxiliary sensors may show a strong enough match to a given class to overrule the uncertainty associated with the match for the data from the dominant sensor. In one embodiment, the Raman sensor may function as the dominant sensor, whereas other sensors (e.g., the LIBS sensor or the fluorescence sensor) may be considered auxiliary sensors. Successful results from fusion 522 may be reported in step 523. Failures can be reported in step 524.
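One possible reading of the dominant/auxiliary rule above is sketched below. The two thresholds, the data layout, and the function name are hypothetical; the text states only that a sufficiently strong auxiliary match may overrule an uncertain dominant match.

```python
def fuse_dominant_auxiliary(dominant, auxiliaries, accept=0.90, overrule=0.95):
    """Combine a dominant-sensor result with auxiliary-sensor results.

    dominant:    (class_label, confidence) from the dominant sensor.
    auxiliaries: dict mapping sensor name to (class_label, confidence).
    Returns (class_label, source), where source names the deciding sensor,
    or (None, "unresolved") if no sensor is confident enough.
    """
    label, confidence = dominant
    if confidence >= accept:
        # The dominant sensor is confident on its own; auxiliaries do not override.
        return label, "dominant"
    # The dominant match is uncertain: look for an auxiliary match that is
    # strong enough to overrule that uncertainty.
    strong = [
        (aux_conf, aux_label, sensor)
        for sensor, (aux_label, aux_conf) in auxiliaries.items()
        if aux_conf >= overrule
    ]
    if strong:
        _, aux_label, sensor = max(strong)
        return aux_label, sensor
    return None, "unresolved"


# Hypothetical readings: Raman as the dominant sensor, LIBS and
# fluorescence as auxiliary sensors.
decision = fuse_dominant_auxiliary(
    ("class_a", 0.60),
    {"libs": ("class_b", 0.97), "fluorescence": ("class_c", 0.50)},
)
```

Setting the overrule threshold above the accept threshold reflects the requirement that a less specific technique needs a markedly higher confidence before its designation is preferred.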
It is possible that classes may overlap for a small subspace in an n-dimensional space. A uniquely classified target spectrum is represented as point 1 in
Still referring to
If the classification models are unable to assign the target sample to a given class with the proper degree of confidence, then fusion may be performed in step 526. Fusion may allow the polling of additional techniques, if they are available. It is possible that these additional techniques may have high enough degrees of confidence to result in a statistically significant assignment of the target sample to one of the given classes, which may be reported in step 527. If fusion is not successful, then unmixing with fusion can be performed at step 529.
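The fallback order described above can be sketched as a simple cascade. The stage functions here are hypothetical placeholders that return a class label or None; only the ordering (classification, then fusion, then unmixing with fusion) is taken from the text.

```python
def identify(target, stages):
    """Try each stage in order and stop at the first confident assignment.

    stages is a list of (stage_name, function) pairs; each function takes
    the target data and returns a class label, or None if it cannot assign
    the target with sufficient confidence.
    """
    for stage_name, stage in stages:
        label = stage(target)
        if label is not None:
            return label, stage_name
    return None, "unassigned"


# Placeholder stages standing in for the real models (assumptions only).
def classify(target):
    return None  # classification models could not assign the target


def fuse(target):
    return "class_b" if target.get("aux_confident") else None


def unmix_with_fusion(target):
    return None


pipeline = [
    ("classification", classify),
    ("fusion", fuse),
    ("unmixing_with_fusion", unmix_with_fusion),
]
result = identify({"aux_confident": True}, pipeline)
```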
The target data set may be a pure data set or a spectral mixture of, for example, data from a combination of chemical and/or biological entities. Unmixing is attempted if none of the preceding steps in
The Adaptive Learning box 506, shown in
Still referring to
If the matching steps (i.e., the process in
It is observed here that the addition of a data point to a candidate class 548 could affect other candidate classes. The now-bigger candidate class (because of the addition of the data point) could overlap with another class and cause a member of a different class to be reassigned. All candidate classes and even labeled classes may be reviewed to see how the changing class structure affects all other classes. If adaptive weights are used, the weights may need to be recalculated at this point. This is a computationally intensive step that may need to occur in the background (and, hence, is not expressly illustrated in
For illustration,
In further reference to
If the candidate class is confirmed (in step 549 discussed above), this class may be added to the list of labeled classes 550. These results can be reported in step 551. Labeled classes can then be used for assignment in the top half of the flow chart in
Still referring to
The Figures numbered 9A through 9D illustrate an exemplary set of test results to which the methodology described hereinbefore regarding detection of outliers and augmentation of spectral libraries may be applicable.
The average spectra of the known and unknown samples are shown in
It is seen from the scatter plot in
The present disclosure may be embodied in other specific forms without departing from the spirit or essential attributes of the disclosure. Accordingly, reference should be made to the appended claims, rather than the foregoing specification, as indicating the scope of the disclosure. Although the foregoing description is directed to the embodiments of the disclosure, it is noted that other variations and modifications will be apparent to those skilled in the art, and may be made without departing from the spirit or scope of the disclosure.
This application is a continuation-in-part of pending U.S. patent application Ser. No. 11/450,138, titled “Forensic Integrated Search Technology” and filed on Jun. 9, 2006, which, in turn, claims the priority benefits of U.S. Provisional Application No. 60/688,812, filed on Jun. 9, 2005 and titled, “Forensic Integrated Search Technology,” and U.S. Provisional Application No. 60/711,593, filed on Aug. 26, 2005 and titled “Forensic Integrated Search Technology.” The disclosures of all of these applications are incorporated herein by reference in their entireties. This application further claims priority benefit under 35 U.S.C. § 119(e) of the U.S. Provisional Application No. 60/957,757.
Number | Date | Country
---|---|---
60688812 | Jun 2005 | US
60711593 | Aug 2005 | US
60957757 | Aug 2007 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 11450138 | Jun 2006 | US
Child | 12196921 | | US