Claims
- 1. A method of determining whether a sample matches a reference species, the method comprising: selecting N indices l1, l2, . . . lN of peaks in an indexed data set characterizing the reference species; selecting a first set of probabilities p1, p2, . . . pN that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes the sample when the sample matches the reference species; selecting a second set of probabilities q1, q2, . . . qN that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes the sample when the sample does not match the reference species; choosing a threshold Kc; obtaining an indexed observation data set x1, x2, . . . xN, where xj ∈{0, 1} and xj=1 if and only if a peak is present in the sample at lj; deciding that the sample matches the reference species if λ≦Kc, where $\lambda=\sum_{1\le j\le N}\log\left(\frac{1-p_j}{1-q_j}\right)+\sum_{1\le j\le N}x_j\log\left[\frac{p_j(1-q_j)}{q_j(1-p_j)}\right]$; and deciding that the sample does not match the reference species if λ>Kc.
- 2. The method of claim 1, wherein Kc is selected such that, given that the sample matches the reference species, P{λ>Kc}≦α for a predetermined type I error probability α.
- 3. The method of claim 1, wherein said selecting steps comprise iterative proportional scaling calculations.
- 4. The method of claim 1, wherein said selecting steps comprise iterative weighted least squares calculations.
- 5. The method of claim 1, wherein said selecting steps comprise application of a Lancaster model.
- 6. The method of claim 1, wherein said selecting steps comprise application of a latent class model.
- 7. A method of determining whether a sample matches a reference species, the method comprising: selecting N indices l1, l2, . . . lN of peaks in an indexed data set characterizing the reference species; selecting a first set of probabilities p1, p2, . . . pN that peaks will occur at indices l1, l2, . . . lN of an indexed data set that characterizes the sample when the sample matches the reference species; selecting a first set of probability density functions gi(yi; θi) that characterize a measurable feature yi of the peak at index li given the presence of a peak at index li of a data set that characterizes the sample when the sample matches the reference species; selecting a second set of probabilities q1, q2, . . . qN that peaks will occur at indices l1, l2, . . . lN of an indexed data set that characterizes the sample when the sample does not match the reference species; selecting a second set of probability density functions gi(yi; Ωi) that characterize the measurable feature yi of the peak at index li given the presence of a peak at index li of a data set that characterizes the sample when the sample does not match the reference species; selecting a threshold Kc; obtaining an indexed observation data set x1, x2, . . . xN, where xi ∈{0, 1} and xi=1 if and only if a peak is present in the sample at li; obtaining a feature data set y1, y2, . . . yN; and deciding that the sample matches the reference species if λ≦Kc, where $\lambda=\sum_{i=1}^{N}\left[\log\frac{1-p_i}{1-q_i}+x_i\left\{\log\frac{p_i(1-q_i)}{q_i(1-p_i)}+\log\frac{g_i(y_i;\theta_i)}{g_i(y_i;\Omega_i)}\right\}\right]$; and deciding that the sample does not match the reference species if λ>Kc.
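- 8. The method of claim 7, wherein one or more gi(·) is a lognormal density given by $g_i(y_i;\theta_i)=g_i(y_i;\mu_i,\sigma_i^2)=\frac{1}{y_i\sqrt{2\pi\sigma_i^2}}\exp\left\{-\frac{(\log y_i-\mu_i)^2}{2\sigma_i^2}\right\}$, $y_i\ge 0$.
- 9. The method of claim 7, wherein one or more gi(·) is a gamma density given by $g_i(y_i;\theta_i)=g_i(y_i;\alpha_i,\beta_i)=\frac{1}{\Gamma(\alpha_i)\,\beta_i^{\alpha_i}}\,y_i^{\alpha_i-1}\exp(-y_i/\beta_i)$, $y_i\ge 0$.
- 10. The method of claim 7, wherein one or more gi(·) is a Poisson density given by $g_i(y_i;\theta_i)=\frac{\theta_i^{y_i}\exp(-\theta_i)}{y_i!}$, $y_i=0,1,2,\ldots$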
- 11. The method of claim 7, wherein the measurable feature is the intensity of the peak at index li.
- 12. The method of claim 7, wherein the measurable feature is the width of the peak at index li.
- 13. The method of claim 7, wherein the measurable feature is a quantification of the skew of the peak at index li.
- 14. A method, wherein the status of a process at any point t in time is characterized by an indexed observation data set Xt={x1,t, x2,t, . . . xN,t}, where xj,t ∈{0, 1} and xj,t=1 if and only if a peak is present at time t in the sample at index lj, the method comprising: selecting a first set of probabilities p1, p2, . . . pN that peaks will occur at x1,t, x2,t, . . . xN,t, respectively, when the process is operating normally; selecting a second set of probabilities q1, q2, . . . qN that peaks will occur at x1,t, x2,t, . . . xN,t, respectively, when the process is not operating normally; acquiring a sequence X1, X2, . . . XT of indexed observation data sets; intervening in the process when it is determined that Cn equals or exceeds a predetermined value A, where $C_0=0$; $C_n=S_n-\min_{1\le j\le n}\{S_j\}$ for $n\ge 1$; and $S_n=\sum_{1\le j\le n}\log\left(\frac{1-p_j}{1-q_j}\right)+\sum_{1\le j\le n}x_{j,t}\log\left[\frac{p_j(1-q_j)}{q_j(1-p_j)}\right]$.
- 15. The method of claim 14, wherein A is selected as a function of the desired false alarm rate for the test.
- 16. The method of claim 14, wherein said intervening comprises stopping the process.
- 17. A method, wherein the status of a process at any point t in time is characterized by an indexed observation data set Xt={x1,t, x2,t, . . . xN,t}, where xj,t ∈{0, 1} and xj,t=1 if and only if a peak is present at time t in the sample at index lj, and a feature data set Yt={y1,t, y2,t, . . . yN,t}, where if xj,t=0, yj,t=0, and if xj,t=1, yj,t quantifies a feature of the peak at time t in the sample at index lj, the method comprising: selecting a first set of probabilities p1, p2, . . . pN that peaks will occur at x1,t, x2,t, . . . xN,t, respectively, when the process is operating normally; selecting a first set of probability density functions gi(yi; θi) that characterize a measurable feature yi of the peak at index li given the presence of a peak at index li of a data set that characterizes the process when it is operating normally; selecting a second set of probabilities q1, q2, . . . qN that peaks will occur at x1,t, x2,t, . . . xN,t, respectively, when the process is not operating normally; selecting a second set of probability density functions gi(yi; Ωi) that characterize the measurable feature yi of the peak at index li given the presence of a peak at index li of a data set that characterizes the process when it is operating normally; acquiring a sequence X1, X2, . . . XT of indexed observation data sets; acquiring a sequence Y1, Y2, . . . YT of feature data sets; intervening in the process when it is determined that Cn equals or exceeds a predetermined value A, where $C_0=0$; $C_n=S_n-\min_{1\le j\le n}\{S_j\}$ for $n\ge 1$; and $S_n=\sum_{i=1}^{N}\left[\log\frac{1-p_i}{1-q_i}+x_i\left\{\log\frac{p_i(1-q_i)}{q_i(1-p_i)}+\log\frac{g_i(y_i;\theta_i)}{g_i(y_i;\Omega_i)}\right\}\right]$.
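- 18. The method of claim 17, wherein one or more gi(·) is a lognormal density function given by $g_i(y_i;\theta_i)=g_i(y_i;\mu_i,\sigma_i^2)=\frac{1}{y_i\sqrt{2\pi\sigma_i^2}}\exp\left\{-\frac{(\log y_i-\mu_i)^2}{2\sigma_i^2}\right\}$, $y_i\ge 0$.
- 19. The method of claim 17, wherein one or more gi(·) is a gamma density given by $g_i(y_i;\theta_i)=g_i(y_i;\alpha_i,\beta_i^2)=\frac{1}{\Gamma(\alpha_i)\,\beta_i^{\alpha_i}}\,y_i^{\alpha_i-1}\exp(-y_i/\beta_i)$, $y_i\ge 0$.
- 20. The method of claim 17, wherein one or more gi(·) is a Poisson density given by $g_i(y_i;\theta_i)=\frac{\theta_i^{y_i}\exp(-\theta_i)}{y_i!}$, $y_i=0,1,2,\ldots$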
- 21. The method of claim 17, wherein the measurable feature is the intensity of the peak at index li.
- 22. The method of claim 17, wherein the measurable feature is the width of the peak at index li.
- 23. The method of claim 17, wherein the measurable feature is a quantification of the skew of the peak at index li.
- 24. The method of claim 17, wherein A is selected as a function of the desired false alarm rate for the test.
- 25. The method of claim 17, wherein said intervening comprises stopping the process.
- 26. A system for analyzing a sample in comparison with a reference species, comprising: a processor; a memory storing data indicative of: probabilities p1, p2, . . . pN that peaks will occur at indices l1, l2, . . . lN of an indexed data set that characterizes the sample when the sample matches the reference species; probabilities q1, q2, . . . qN that peaks will occur at indices l1, l2, . . . lN of an indexed data set that characterizes the sample when the sample does not match the reference species; a threshold value; and an indexed sample data set x1, x2, . . . xN characterizing the sample, wherein each xi is a binary value that indicates whether or not a peak is present at index li; and a computer-readable medium encoded with programming instructions executable by said processor to: calculate a log-likelihood ratio λ, where $\lambda=\sum_{1\le j\le N}\log\left(\frac{1-p_j}{1-q_j}\right)+\sum_{1\le j\le N}x_j\log\left[\frac{p_j(1-q_j)}{q_j(1-p_j)}\right]$; generate a first signal when λ is less than said threshold value; and generate a second signal when λ is greater than said threshold value.
- 27. A method of performing discriminant analysis, the method comprising: selecting N indices l1, l2, . . . lN of peaks in an indexed data set characterizing a first reference species or a second reference species; selecting a first set of probabilities p1,1, p2,1, . . . pN,1 that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes the sample when the sample matches the first reference species; selecting a second set of probabilities p1,2, p2,2, . . . pN,2 that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes the sample when the sample matches the second reference species; selecting a third set of probabilities q1,1, q2,1, . . . qN,1 that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes the sample when the sample matches a second reference species; selecting a fourth set of probabilities q1,2, q2,2, . . . qN,2 that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes the sample when the sample matches a second reference species; obtaining an indexed observation data set x1, x2, . . . xN, where xj ∈{0, 1} and xj=1 if and only if a peak is present in the sample at lj; calculating $\lambda_1=\sum_{1\le j\le N}\log\left(\frac{1-p_{j,1}}{1-q_{j,1}}\right)+\sum_{1\le j\le N}x_j\log\left[\frac{p_{j,1}(1-q_{j,1})}{q_{j,1}(1-p_{j,1})}\right]$ and $\lambda_2=\sum_{1\le j\le N}\log\left(\frac{1-p_{j,2}}{1-q_{j,2}}\right)+\sum_{1\le j\le N}x_j\log\left[\frac{p_{j,2}(1-q_{j,2})}{q_{j,2}(1-p_{j,2})}\right]$; and deciding that the sample matches the first reference species if λ1≦λ2; and the sample matches the second reference species if λ1>λ2.
- 28. A method of performing a cluster analysis of M samples, comprising: selecting N indices l1, l2, . . . lN of possible peak locations in indexed data sets characterizing the M samples; obtaining indexed data sets Xi={x1,i, x2,i, . . . xN,i}: i=1, 2, . . . M, each data set corresponding to a different sample, wherein xj,i ∈{0, 1} and xj,i=1 if and only if a peak exists in the data set for sample i at index lj; and defining P groups of samples by selecting a first array of probabilities pk,i: k=1, 2, . . . P; i=1, 2, . . . N that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes a sample when the sample is in group k; selecting a second array of probabilities qk,i: k=1, 2, . . . P; i=1, 2, . . . N that peaks will occur at indices l1, l2, . . . lN, respectively, of an indexed data set that characterizes the sample when the sample is not in group k; and selecting gj ∈{1, 2, . . . P}: j=1, 2, . . . M, where sample j is in group gj; wherein pk,i, qk,i, and gj are selected to maximize $\lambda=\sum_{1\le j\le M}\left\{\left.\sum_{1\le i\le N}\left[\log\left(\frac{1-p_{k,i}}{1-q_{k,i}}\right)+x_{i,j}\log\left[\frac{p_{k,i}(1-q_{k,i})}{q_{k,i}(1-p_{k,i})}\right]\right]\right|_{k=g_j}\right\}$.
- 29. The method of claim 28, wherein P is also selected to maximize λ.
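The statistic λ recited in claim 1 can be illustrated with a minimal Python sketch. The function names (`log_likelihood_ratio`, `matches_reference`) and the numeric probabilities are illustrative assumptions and do not come from the claims; the decision rule simply follows the inequality as recited (match if λ≦Kc).

```python
import math

def log_likelihood_ratio(x, p, q):
    """Statistic lambda of claim 1 for binary peak indicators x (hypothetical helper)."""
    if not (len(x) == len(p) == len(q)):
        raise ValueError("x, p and q must all have length N")
    total = 0.0
    for x_j, p_j, q_j in zip(x, p, q):
        # First sum: term that does not depend on the observation.
        total += math.log((1.0 - p_j) / (1.0 - q_j))
        # Second sum: contributes only where a peak is present (x_j == 1).
        total += x_j * math.log(p_j * (1.0 - q_j) / (q_j * (1.0 - p_j)))
    return total

def matches_reference(x, p, q, k_c):
    """Decision rule as recited in claim 1: match if lambda <= Kc."""
    return log_likelihood_ratio(x, p, q) <= k_c

# Made-up probabilities for N = 3 peak indices (illustration only).
p = [0.9, 0.8, 0.7]
q = [0.2, 0.3, 0.1]
x = [1, 0, 1]
lam = log_likelihood_ratio(x, p, q)
print(lam, matches_reference(x, p, q, k_c=3.0))
```

Each index contributes log(pj/qj) when a peak is present and log((1-pj)/(1-qj)) when it is absent, so the two sums in the claim collapse to one term per index.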
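For the feature-augmented statistic of claim 7, combined with the lognormal density of claim 8, a sketch under stated assumptions might look as follows. The names `lognormal_pdf` and `feature_log_likelihood_ratio`, and the representation of each parameter set θi and Ωi as a `(mu, sigma2)` tuple, are hypothetical choices made for illustration only.

```python
import math

def lognormal_pdf(y, mu, sigma2):
    """Lognormal density in the form of claim 8; evaluated here only for y > 0."""
    norm = y * math.sqrt(2.0 * math.pi * sigma2)
    return math.exp(-((math.log(y) - mu) ** 2) / (2.0 * sigma2)) / norm

def feature_log_likelihood_ratio(x, y, p, q, theta, omega):
    """Statistic lambda of claim 7 with lognormal feature densities (assumed).

    theta[i] and omega[i] are (mu, sigma2) tuples for the matching and
    non-matching cases, respectively.
    """
    total = 0.0
    for x_i, y_i, p_i, q_i, th, om in zip(x, y, p, q, theta, omega):
        total += math.log((1.0 - p_i) / (1.0 - q_i))
        if x_i == 1:
            # Peak-presence term.
            total += math.log(p_i * (1.0 - q_i) / (q_i * (1.0 - p_i)))
            # Feature term: log of the density ratio at the observed feature.
            total += math.log(lognormal_pdf(y_i, *th) / lognormal_pdf(y_i, *om))
    return total

# Made-up single-index example: peak present with feature value y = 1.0.
value = feature_log_likelihood_ratio(
    x=[1], y=[1.0], p=[0.5], q=[0.5], theta=[(0.0, 1.0)], omega=[(1.0, 1.0)]
)
print(value)
```

The feature term is skipped where xi=0, which mirrors the factor xi multiplying the braces in the claimed formula and avoids evaluating a density at a feature value that was never observed.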
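The monitoring rule of claim 14 (intervene when Cn equals or exceeds A) can be sketched as a running computation. This sketch assumes one particular reading of Sn, namely the cumulative sum over the observation sets X1 . . . Xn of the per-set statistic from claim 1; that reading, and the names `set_statistic` and `first_alarm_index`, are assumptions rather than anything stated in the claims.

```python
import math

def set_statistic(x, p, q):
    """Per-observation-set statistic in the form of claim 1 (hypothetical helper)."""
    return sum(
        math.log((1.0 - p_j) / (1.0 - q_j))
        + x_j * math.log(p_j * (1.0 - q_j) / (q_j * (1.0 - p_j)))
        for x_j, p_j, q_j in zip(x, p, q)
    )

def first_alarm_index(observations, p, q, a):
    """Return the first n (1-based) with C_n >= A, or None if no alarm occurs.

    Assumes S_n is the running sum of set_statistic over X_1 .. X_n and
    C_n = S_n - min_{1<=j<=n} S_j, as recited in claim 14.
    """
    s_n = 0.0
    s_min = math.inf
    for n, x_t in enumerate(observations, start=1):
        s_n += set_statistic(x_t, p, q)
        s_min = min(s_min, s_n)  # the minimum includes S_n itself
        c_n = s_n - s_min
        if c_n >= a:
            return n
    return None

# Made-up single-index sequence of observation sets (illustration only).
sequence = [[0], [1], [1], [1]]
alarm = first_alarm_index(sequence, p=[0.8], q=[0.2], a=2.5)
print(alarm)
```

Tracking the running minimum alongside the running sum keeps the computation to a single pass, so the check can run as each new observation set is acquired.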
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a continuation-in-part of U.S. patent application Ser. No. 09/288,758, now U.S. Pat. No. 6,253,162 filed on Apr. 7, 1999, which is entitled “Method of Identifying Features in Indexed Data,” and of U.S. patent application Ser. No. 09/765,872, now U.S. Pat. No. 6,366,870 filed on Jan. 19, 2001, which is titled “Identification of Features in Indexed Data and Equipment Therefore.” These documents are hereby incorporated by reference as if fully set forth herein.
Government Interests
This invention was made with Government support under Contract DE-AC0676RLO1830 awarded by the U.S. Department of Energy. The Government has certain rights in the invention.
US Referenced Citations (2)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 6253162 | Jarman et al. | Jun 2001 | B1 |
| 6366870 | Jarman et al. | Apr 2002 | B2 |
Non-Patent Literature Citations (3)
| Entry |
| --- |
| A. Nijhuis et al., “Multivariate statistical process control in chromatography”, Chemometrics and Intelligent Laboratory Systems v. 38, pp. 51-62 (Elsevier Science B.V. 1997). |
| Arnold, R. J. and Reilly, J. P., “Fingerprint Matching of E.Coli Strains with Matrix-assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry of Whole Cells Using a Modified Correlation Approach”, Rapid Communications in Mass Spectrometry, v. 12, pp. 630-636 (1998). |
| Martens, H. and Naes, T., “Multivariate Calibration”, pp. 85-101 (John Wiley & Sons). |
Continuation in Parts (2)
|  | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09/288758 | Apr 1999 | US |
| Child | 09/866201 |  | US |
| Parent | 09/765872 | Jan 2001 | US |
| Child | 09/288758 |  | US |