Claims
- 1. A method of quantifying the relationship between a test spectrum and a reference spectrum from a mass spectrometry device, where the reference spectrum is related to a known peptide sequence, comprising:
constructing from the test spectrum a peak table comprising a list of locations of peaks in the reference spectrum that have a magnitude greater than a predetermined threshold magnitude; determining fingerprint data F for a peptide of length P, wherein said fingerprint data comprises pr,i for all r and i, such that
r is a C- or N-terminus sequence Cy or Ny of length y taken from C- or N-terminus sequences C1, C2, . . . CP, or N1, N2, . . . NP of the peptide, respectively, i is an ion type from 1, 2, . . . I, and pr,i reflects the fraction of replicate spectra in which the peak corresponding to sequence r and ion type i is expected to be observed; and calculating a probability P{HA|x} that the known peptide sequence is present in the sample as a function of fingerprint data F.
- 2. The method of claim 1, wherein said calculating is performed as a function of a value s that quantifies variability in peak location for all r, i.
- 3. The method of claim 2, wherein said value s is specified by the tolerance value of a testing instrument.
- 4. The method of claim 1, wherein said predetermined threshold significance is based on the likelihood of a peak occurring by random chance at a given location.
- 5. The method of claim 4, wherein said predetermined threshold significance is the likelihood of a peak occurring by random chance at a given location.
- 6. The method of claim 5, wherein said predetermined threshold significance is a predetermined offset from the likelihood of a peak occurring by random chance at a given location.
- 7. The method of claim 1, further comprising calculating a probability P{H0|x} that the known peptide sequence is not present in the sample as a function of fingerprint data F.
- 8. The method of claim 7, further comprising calculating a likelihood ratio L of P{HA|x} to P{H0|x}.
- 9. The method of claim 8, further comprising calculating a logarithm λ of the likelihood ratio L.
- 10. The method of claim 1, wherein at least one of said pr,i is determined experimentally from actual replicate spectra.
- 11. The method of claim 1, wherein at least one of said pr,i is determined mathematically from r and i.
- 12. The method of claim 1, wherein said fingerprint data F comprises a set of triples (lr,i, sr,i, pr,i), such that
lr,i is the peak location from the peak table corresponding to sequence r and ion type i, and sr,i reflects the variability in the peak location measurement.
- 13. An apparatus comprising a processor and a memory in communication with said processor, wherein said memory contains programming instructions executable by said processor to:
acquire a spectrum s0, as from tandem mass spectrometry, representative of a sample P0; acquire a spectrum sj, for all j=1, 2, . . . N, as from tandem mass spectrometry, representative of each of a plurality of known peptides P1, P2, . . . PN; acquire a probability pr,i, which reflects the fraction of replicate spectra in which the peak corresponding to sequence r and ion type i is expected to be observed, for each j,
r, a C- or N-terminus sequence Cy or Ny of length y taken from C- or N-terminus sequences C1, C2, . . . CP, or N1, N2, . . . NP of the peptide, respectively, and i, an ion type from 1, 2, . . . I, and for each j, calculate a probability Pj{HA|x} that the known peptide Pj is present in the sample as a function of the pr,i for that j.
- 14. The apparatus of claim 13, wherein for each j the probability Pj{HA|x} is calculated also as a function of the variability in the peak location measurement associated with the spectrum sj.
- 15. The apparatus of claim 13, wherein for each j the probability Pj{HA|x} is calculated also as a function of the peak locations in the spectrum sj.
- 16. The apparatus of claim 13, wherein at least one spectrum sj is obtained experimentally from actual peptide Pj.
- 17. The apparatus of claim 13, wherein at least one spectrum sj is obtained algorithmically from information about theoretical peptide Pj.
- 18. A method of scoring the relationship between a plurality of candidate peptides and a sample, comprising:
generating a list of candidate peptides; and scoring each candidate peptide in the list independently of said generation.
- 19. The method of claim 18, wherein said scoring is performed as a function of the coincidence of peaks in a mass spectrometry spectrum corresponding to the candidate peptide with peaks in a mass spectrometry spectrum corresponding to the sample.
- 20. The method of claim 18, wherein said scoring comprises calculating a probability that the known peptide sequence is present in the sample as a function of
lr,i, the peak location from the peak table corresponding to sequence r and ion type i, sr,i, which reflects the variability in the peak location measurement, and pr,i, which reflects the fraction of replicate spectra in which the peak corresponding to sequence r and ion type i is expected to be observed, for all r ∈ {C- and N-terminus sequences Cy and Ny of length y taken from C- or N-terminus sequences C1, C2, . . . CP, or N1, N2, . . . NP, of the known peptide sequence, respectively}, and i ∈ {ion types from 1, 2, . . . I}.
- 21. A method of finding one or more possible matching peptides to a test peptide associated with a tandem mass spectrometry test spectrum s, comprising:
selecting a function f that takes spectra s1 and s2 as input, where f includes at least one term comprising the number n of peaks that appear in both s1 and a shifted copy of s2; and performing a genetic algorithm on a plurality of candidate peptides using f as the objective function and using s as either s1 or s2.
- 22. The method of claim 21, wherein said performing comprises:
generating a second plurality of candidate peptides from a first plurality of candidate peptides, wherein said generating comprises calculating f(s, t) for each t in a set of spectra representing the first plurality of candidate peptides.
- 23. The method of claim 21, wherein said performing comprises determining n for some s1 and s2 by:
creating an m1×m2 matrix M, where:
m1 is the number of peaks in s1; m2 is the number of peaks in s2; and the cell of M at row i, column j, holds a number representative of the signed difference between the location of peak i in s1 and peak j in s2; and assigning n to be the number of non-distinct values in M.
- 24. The method of claim 21, wherein said performing comprises determining n for some s1 and s2 by:
creating an m1×m2 matrix M, where:
m1 is the number of peaks in s1; m2 is the number of peaks in s2; and the cell of M at row i, column j, holds a number representative of the signed difference between the location of peak i in s1 and peak j in s2; and assigning n to be the maximum number of times a non-distinct value appears in M.
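
By way of illustration only, the matrix-based determination of n recited in claims 23 and 24 might be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the claims: the function names are hypothetical, each spectrum is given as a plain list of peak locations, the "number representative of the signed difference" is taken to be the difference rounded to a fixed number of decimals, and "the number of non-distinct values" is read as the count of difference values that occur more than once in M. Other readings are possible.

```python
from collections import Counter


def difference_matrix(s1, s2, decimals=1):
    """Build the m1 x m2 matrix M of signed differences.

    Row i corresponds to peak i of s1 and column j to peak j of s2.
    Rounding to `decimals` places is an assumed way of making each
    cell "representative of" the difference, so that nearly equal
    shifts compare as equal.
    """
    return [[round(a - b, decimals) for b in s2] for a in s1]


def value_counts(matrix):
    """Count how many cells of M hold each difference value."""
    return Counter(value for row in matrix for value in row)


def n_repeated_values(s1, s2, decimals=1):
    """One reading of claim 23: how many values occur more than once in M."""
    counts = value_counts(difference_matrix(s1, s2, decimals))
    return sum(1 for c in counts.values() if c > 1)


def n_best_shift(s1, s2, decimals=1):
    """One reading of claim 24: the largest occurrence count among
    repeated values in M, i.e. the number of peaks that coincide
    under the single best shift. Returns 0 if no value repeats.
    """
    counts = value_counts(difference_matrix(s1, s2, decimals))
    return max((c for c in counts.values() if c > 1), default=0)


# Hypothetical peak locations: two peaks of s2 sit 10 units above
# two peaks of s1, so the shift -10 appears twice in M.
s1 = [100.0, 200.0, 350.0]
s2 = [110.0, 210.0, 300.0]
print(n_repeated_values(s1, s2))  # 1
print(n_best_shift(s1, s2))       # 2
```

A value that appears k times in M corresponds to a shift of s2 under which k of its peaks coincide with peaks of s1, which is why counting repeated values yields the number n of peaks shared between s1 and a shifted copy of s2. The rounding precision plays the role of a matching tolerance and would need to be chosen to suit the instrument.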
STATEMENT OF GOVERNMENT SUPPORT
[0001] This invention was made with Government support under Contract DE-AC06-76RL01830, awarded by the U.S. Department of Energy. The United States Government may have certain rights in the invention.