The following documents are incorporated herein by reference in their entireties:
U.S. Pat. No. 6,983,213; International Patent Application PCT/US2004/013096, filed on Apr. 28, 2004; U.S. patent application Ser. No. 11/261,440, filed on Oct. 28, 2005; International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005; International Patent Application PCT/US2006/013723, filed on Apr. 11, 2006; U.S. patent application Ser. No. 11/754,305, filed on May 27, 2007; International Patent Application PCT/US2007/069832, filed on May 28, 2007; and U.S. provisional patent application Ser. No. 60/941,656, filed on Jun. 2, 2007.
1. Field of the Invention
The present invention relates to mass spectrometry systems. More particularly, it relates to mass spectrometry systems that are useful for the analysis of complex mixtures of molecules, including large and small organic molecules such as proteins or peptides, environmental pollutants, pharmaceuticals and their metabolites, and petrochemical compounds, to methods of analysis used therein, and to a computer program product having computer code embodied therein for causing a computer, or a computer and a mass spectrometer in combination, to effect such analysis.
2. Prior Art
A previous approach, as described in U.S. Pat. No. 6,983,213, International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005, and U.S. provisional patent application Ser. No. 60/941,656, filed on Jun. 2, 2007, provides a novel method for calibrating mass spectra for improved mass accuracy and line shape correction, improving the ability to perform elemental composition analysis or formula identification.
Very high mass accuracy can be obtained on so-called unit mass resolution systems in accordance with the techniques taught in U.S. Pat. No. 6,983,213.
Accurate line shape calibration provides an additional metric to assist in the unambiguous formula identification by matching the measured spectra to the calculated spectra of candidate formulas, as in International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005.
For higher resolution mass spectrometers where the monoisotopic peak is baseline resolved from the rest of the isotopes, accurate line shape calibration can be performed even without the use of either internal or external calibration standards by simply using the monoisotopic peak of the unknown ion itself as the peak shape calibration standard, as in U.S. provisional patent application Ser. No. 60/941,656, filed on Jun. 2, 2007.
However, obtaining correct elemental compositions from conventional to high resolution mass spectrometry systems remains a challenge to practitioners of mass spectrometry due to the enormous number of possible formulas within a given accurate mass tolerance and the highly tedious process of deciding which elements to consider for the elemental composition.
There exists a significant gap between what current mass spectral systems could offer and what is achieved at present using existing technologies for mass spectral analysis.
It is an object of the invention to provide a mass spectrometry system and a method for operating a mass spectrometry system that overcomes the difficulties described above, in accordance with the methods described herein.
It is another object of the invention to provide a storage medium having thereon computer readable program code for causing a mass spectrometry system to perform the method in accordance with the invention.
An additional aspect of the invention is, in general, a computer readable medium having thereon computer readable code for use with a mass spectrometer system having a data analysis portion including a computer, the computer readable code being for causing the computer to analyze and interpret data by performing the methods described herein. The computer readable medium preferably further comprises computer readable code for causing the computer to perform at least one of the specific methods described.
Of particular significance, the invention is also directed generally to a mass spectrometer system for analyzing chemical composition, the system including a mass spectrometer portion and a data analysis system, the data analysis system operating by obtaining calibrated continuum spectral data by processing raw spectral data, generally in accordance with the methods described herein. The data analysis portion may be configured to operate in accordance with the specifics of these methods. Preferably the mass spectrometer system further comprises a sample preparation portion for preparing samples to be analyzed, and a sample separation portion for performing an initial separation of samples to be analyzed. The separation portion may comprise at least one of an electrophoresis apparatus, a chemical affinity chip, or a chromatograph for separating the sample into various components.
The foregoing aspects and other features of the present invention are explained in the following description, taken in connection with the accompanying drawings, wherein:
Referring to
Analysis system 10 has a sample preparation portion 12, other detector portion 23, a mass spectrometer portion 14, a data analysis system 16, and a computer system 18. The sample preparation portion 12 may include a sample introduction unit 20, of the type that introduces a sample containing proteins, peptides, or small-molecule drugs of interest to system 10, such as LCQ Deca XP Max, manufactured by Thermo Fisher Scientific Corporation of Waltham, Mass., USA. The sample preparation portion 12 may also include an analyte separation unit 22, which is used to perform a preliminary separation of analytes, such as the proteins to be analyzed by system 10. Analyte separation unit 22 may be any one of a chromatography column, an electrophoresis separation unit, such as a gel-based separation unit manufactured by Bio-Rad Laboratories, Inc. of Hercules, Calif., or other separation apparatus as is well known in the art. In electrophoresis, a voltage is applied to the unit to cause the proteins to be separated as a function of one or more variables, such as migration speed through a capillary tube, isoelectric focusing point (Hanash, S. M., Electrophoresis 21, 1202-1209 (2000)), or mass (one-dimensional separation), or as a function of more than one of these variables, such as isoelectric focusing point and mass. An example of the latter is known as two-dimensional electrophoresis.
The mass spectrometer portion 14 may be a conventional mass spectrometer and may be any one available, but is preferably one of MALDI-TOF, quadrupole MS, ion trap MS, qTOF, TOF/TOF, or FTMS. If it has a MALDI or electrospray ionization ion source, such ion source may also provide for sample input to the mass spectrometer portion 14. In general, mass spectrometer portion 14 may include an ion source 24, a mass analyzer 26 for separating ions generated by ion source 24 by mass to charge ratio, an ion detector portion 28 for detecting the ions from mass analyzer 26, and a vacuum system 30 for maintaining a sufficient vacuum for mass spectrometer portion 14 to operate most effectively. If mass spectrometer portion 14 is an ion mobility spectrometer, generally no vacuum system is needed and the data generated are typically called a plasmagram instead of a mass spectrum.
In parallel to the mass spectrometer portion 14, there may be other detector portion 23, to which a portion of the flow is diverted for nearly parallel detection of the sample in a split-flow arrangement. This other detector portion 23 may be a single-channel UV detector, a multi-channel UV spectrometer, a refractive index (RI) detector, a light scattering detector, a radioactivity monitor (RAM), etc. RAM is most widely used in drug metabolism research for ¹⁴C-labeled experiments, where the various metabolites can be traced in near real time and correlated to the mass spectral scans.
The data analysis system 16 includes a data acquisition portion 32, which may include one or a series of analog to digital converters (not shown) for converting signals from ion detector portion 28 into digital data. This digital data is provided to a real time data processing portion 34, which processes the digital data through operations such as summing and/or averaging. A post processing portion 36 may be used to do additional processing of the data from real time data processing portion 34, including library searches, data storage and data reporting.
Computer system 18 provides control of sample preparation portion 12, mass spectrometer portion 14, other detector portion 23, and data analysis system 16, in the manner described below. Computer system 18 may have a conventional computer monitor or display 40 to allow for the entry of data on appropriate screen displays, and for the display of the results of the analyses performed. Computer system 18 may be based on any appropriate personal computer, operating for example with a Windows® or UNIX® operating system, or any other appropriate operating system. Computer system 18 will typically have a hard drive 42 or other type of data storage medium, on which the operating system and the program for performing the data analysis described below are stored. A removable data storage device 44 for accepting a CD, floppy disk, memory stick or other data storage medium is used to load the program in accordance with the invention onto computer system 18. The program for controlling sample preparation portion 12 and mass spectrometer portion 14 will typically be downloaded as firmware for these portions of system 10. Data analysis system 16 may be a program written to implement the processing steps discussed below, in any of several programming languages such as C++, JAVA or Visual Basic.
As mentioned in the U.S. Pat. No. 6,983,213, it is always preferred to have mass spectral data acquired in the profile (sometimes called raw or continuum) mode in order to preserve all key information about the ions under observation (Step 210 in
When it comes to elemental composition determination, such as in the metabolite identification application described above, mass spectrometry at high mass accuracy is a powerful tool for compound ID or validation by virtue of the fact that every unique chemical formula has a unique mass, as referenced in Blaum, K., Physics Reports, Volume 425, Issue 1, March 2006, Pages 1-78. However, even at very high mass accuracy (1-5 ppm), there are still a significant number of formula candidates to consider, because all compounds within the mass error window must be considered, and that number can be very large, as referenced in Kind, T. BMC Bioinformatics 2006, 7, 234. Traditionally, the list of compound candidates can be reduced by limiting the possible elements and applying other chemical constraints, but the list can still easily contain many tens of compounds. For a given compound (ion), the isotope pattern is also unique even if the individual isotopes and isobars are not fully resolved. Simple measurement of the relative intensities of the isotope peaks (M, M+1, M+N . . . ) can be a useful additional metric for paring down the composition list, particularly for Br-, Cl-, or S-containing compounds with their distinctive isotope patterns, as referenced in Kind, T. BMC Bioinformatics 2006, 7, 234. Other approaches include simple computer modeling, as referenced in
More elaborate approaches have been proposed involving the fitting of Gaussian or other assumed mathematical curves to the isotope distribution in an attempt to model the isotope pattern, as referenced in U.S. Pat. No. 6,188,064. However, all of these approaches are only rough approximations to the true isotope pattern because the actual measured line shape is either unknown or not available for use. The resulting modeling errors can be as large as a few percent, a level of error that overwhelms the subtle differences from one formula to another and largely limits the usefulness of isotope pattern modeling.
In the elemental formula determination approaches of currently available hardware and software systems, including those of the cross-referenced related patent applications and patents, there are no interactive visual tools to aid the determination process, during which some elements may need to be added or deleted, the number of included elements may need to be adjusted, chemistry constraints such as double bond equivalence may need to be changed, and the charge state may also need to be adjusted. This application discloses a novel interactive visual approach to address these deficiencies.
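To illustrate why the candidate list grows so quickly, and what the element and double bond equivalence constraints mentioned above act on, the following Python sketch enumerates CcHhNnOo formulas within a ppm window of a neutral monoisotopic mass and filters them by double bond equivalent (DBE). It is an illustrative assumption, not the interactive approach of this application: the element set, the count limits, the 5 ppm default, and the names `candidate_formulas` and `formula_string` are hypothetical, and charge-state and adduct handling are left out.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def formula_string(c, h, n, o):
    """Render counts as a formula string, e.g. C8H10N4O2 (zero counts omitted)."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(target_mass, ppm=5.0, max_c=40, max_n=10, max_o=15,
                       max_h=80, dbe_range=(0.0, 25.0)):
    """Hypothetical helper: list CcHhNnOo formulas within `ppm` of target_mass.

    For each (C, N, O) combination the H count is solved for directly, since one
    H changes the mass by ~1 u, far more than a ppm-level tolerance.
    """
    hits = []
    for c in range(max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = target_mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                h = round(rest / MONO["H"])
                if h < 0 or h > max_h:
                    continue
                mass = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
                error_ppm = (mass - target_mass) / target_mass * 1e6
                if abs(error_ppm) > ppm:
                    continue
                # Double bond equivalent (rings plus double bonds) for CcHhNnOo;
                # half-integer values (odd-electron species) are rejected here.
                dbe = c - h / 2 + n / 2 + 1
                if not (dbe_range[0] <= dbe <= dbe_range[1]) or dbe != int(dbe):
                    continue
                hits.append((formula_string(c, h, n, o), round(error_ppm, 3), dbe))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Usage: neutral monoisotopic mass of caffeine, C8H10N4O2.
caffeine = 8 * MONO["C"] + 10 * MONO["H"] + 4 * MONO["N"] + 2 * MONO["O"]
for formula, error, dbe in candidate_formulas(caffeine):
    print(formula, error, dbe)
```

Even with only four elements and a 5 ppm window, several formulas typically survive; widening the element list or loosening the DBE range makes the list grow further.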
As noted above, previous approaches and documents referred to herein have shown a method by which, using a known calibration ion or ions (either just its monoisotopic peak or its entire isotope profile), accurate correction of the instrument line shape to a known mathematical function can be performed while simultaneously calibrating the mass axis. The calibration standard can be acquired separately, included in the mixture with the unknown as an internal standard and acquired simultaneously, or acquired along with the unknowns at different retention times during the same chromatographic separation.
For example, as mentioned in U.S. Pat. No. 6,983,213, for a given standard ion of known elemental composition, the acquired profile mode mass spectral data $y_0$ and its theoretical counterpart $y$ are related to each other through

$$(g \otimes y_0) = (g \otimes y) \otimes p \qquad \text{(Equation 1)}$$

where $\otimes$ represents convolution, $g$ represents a small Gaussian, and $p$ represents the mass spectral peak shape function. When $y_0$, $y$, and $g$ are known, the actual mass spectral peak shape function $p$ can be readily calculated through deconvolution.
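As a minimal sketch of solving Equation 1 for p, the Python code below uses a regularized (Wiener-style) division in the Fourier domain. This is a swapped-in technique, not necessarily the deconvolution procedure of U.S. Pat. No. 6,983,213; it assumes a uniformly sampled axis with circular boundaries, and the function names, the regularization constant `lam`, and the synthetic signals are hypothetical.

```python
import numpy as np


def gaussian_kernel(n, sigma):
    """Unit-area Gaussian of width sigma (in points), centred at index 0 with circular wrap."""
    idx = np.arange(n)
    dist = np.minimum(idx, n - idx)  # circular distance from index 0
    g = np.exp(-0.5 * (dist / sigma) ** 2)
    return g / g.sum()


def circular_convolve(a, b):
    """Circular convolution of two equal-length real vectors via the FFT."""
    n = len(a)
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n)


def estimate_peak_shape(y0, y, g, lam=1e-10):
    """Hypothetical solver for (g * y0) = (g * y) * p, where * is circular convolution.

    y0: measured profile data; y: theoretical (stick) spectrum; g: small Gaussian.
    lam: Tikhonov-style constant damping frequencies where g * y carries no information.
    """
    n = len(y0)
    g_hat = np.fft.rfft(g)
    gy0 = g_hat * np.fft.rfft(y0)
    gy = g_hat * np.fft.rfft(y)
    p_hat = np.conj(gy) * gy0 / (np.abs(gy) ** 2 + lam)
    return np.fft.irfft(p_hat, n)


# Synthetic check: a two-stick "isotope" pattern blurred by a known peak shape.
n = 256
y = np.zeros(n)
y[0], y[40] = 1.0, 0.5          # monoisotopic stick and a smaller isotope stick
p_true = gaussian_kernel(n, 3.0)
y0 = circular_convolve(y, p_true)
g = gaussian_kernel(n, 1.5)
p_est = estimate_peak_shape(y0, y, g)
```

The constant `lam` keeps the division stable where g ⊗ y carries essentially no signal; with real, noisy data it would have to be tuned rather than fixed at the noise-free value used here.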
It is not always convenient or desirable, or it may simply be impractical to run a separate calibration standard to obtain the actual peak shape function described above. Some of these situations include:
In all of these situations, the analysis would still benefit significantly if the actual peak shape function could be utilized. A way to do so is disclosed in U.S. provisional patent application Ser. No. 60/941,656, filed on Jun. 2, 2007.
Once the peak shape function p is obtained, one may optionally proceed with the mass spectral calibration as referenced in U.S. Pat. No. 6,983,213 to calibrate for the mass axis, while also transforming the actual peak shape into a desired or target peak shape function that is mathematically definable. Alternatively, but less desirably, one could leave the raw mass spectral data as is, except that the actual peak shape function is now known and numerically represented by p, as outlined in Step 210A in
In order for the mass spectral calibration procedure outlined in U.S. Pat. No. 6,983,213 to work with a single monoisotopic peak as the calibration standard, one needs the elemental composition of this calibration ion, which may not yet be known. There are several ways to handle this:
Advantages of this self-calibration approach include:
Another benefit of calibrating to a known and mathematically definable (also called a desired or target) line shape is the possibility of performing highly accurate background interference correction or any other mathematical data analysis, including multivariate statistical analysis. Calibrating a complex run, such as one from a biological matrix, to a known mathematical line shape will significantly improve the ability to discriminate among different sample types associated with a particular biological expression, as is the case in biomarker discovery, through approaches such as principal component analysis.
The referenced U.S. Pat. No. 6,983,213 provides an approach for the use of actual peak shape function in the subsequent peak analysis outlined in Step 210A in
Once the accurate mass is obtained, typically for the monoisotopic peak of the unknown ion, one may proceed to Step 210C in
For each formula on the list of candidate formulas, its theoretical isotope distribution can be readily calculated. By definition, the theoretical isotope distribution comes in the form of a discrete distribution, not a continuum distribution. In order to compare accurately and quantitatively the theoretical distribution and the actual mass spectral data so as to differentiate among the many candidate formulas generated from Step 210C in
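The conversion from a discrete to a continuum distribution can be sketched in Python under stated assumptions: isotope sticks are built by repeatedly combining per-element isotope tables (representative exact masses and natural abundances for C, H, N, O, and S), and each stick is then replaced by a scaled copy of a peak shape, which amounts to a discrete convolution of the sticks with that shape. A Gaussian of hypothetical 0.5 FWHM stands in for the calibrated peak shape function p, and all function names are illustrative.

```python
import numpy as np

# (exact mass in u, fractional natural abundance) for each isotope; representative values.
ISOTOPES = {
    "C": [(12.000000, 0.9893), (13.003355, 0.0107)],
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "N": [(14.003074, 0.99636), (15.000109, 0.00364)],
    "O": [(15.994915, 0.99757), (16.999132, 0.00038), (17.999160, 0.00205)],
    "S": [(31.972071, 0.9499), (32.971458, 0.0075), (33.967867, 0.0425), (35.967081, 0.0001)],
}


def isotope_sticks(formula, min_abundance=1e-8):
    """Discrete isotope distribution {mass: abundance} for e.g. {"C": 6, "H": 12, "O": 6}."""
    sticks = {0.0: 1.0}
    for element, count in formula.items():
        for _ in range(count):
            combined = {}
            for mass, abundance in sticks.items():
                for iso_mass, iso_abundance in ISOTOPES[element]:
                    key = round(mass + iso_mass, 6)  # merge sticks coinciding to 1e-6 u
                    combined[key] = combined.get(key, 0.0) + abundance * iso_abundance
            # Prune negligible combinations so the dictionary stays small.
            sticks = {m: a for m, a in combined.items() if a >= min_abundance}
    return sticks


def gaussian_peak(delta_mz, fwhm=0.5):
    """Stand-in line shape; in practice the calibrated peak shape function p would be used."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * (delta_mz / sigma) ** 2)


def continuum_spectrum(sticks, mz_grid, peak_shape=gaussian_peak):
    """Place a scaled copy of the peak shape at every stick (sticks convolved with the shape)."""
    profile = np.zeros_like(mz_grid)
    for mass, abundance in sticks.items():
        profile += abundance * peak_shape(mz_grid - mass)
    return profile


# Usage: glucose, C6H12O6, rendered on a 0.001 m/z grid.
glucose = {"C": 6, "H": 12, "O": 6}
sticks = isotope_sticks(glucose)
mz = np.arange(178.0, 186.0, 0.001)
profile = continuum_spectrum(sticks, mz)
```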
In addition to the actual peak shape function, there exist other significant differences that need to be addressed before accurately and quantitatively comparing the theoretical and actual mass spectrum. A theoretical mass spectrum can be calculated at any arbitrary intensity scale, while the actual mass spectrum may come at any level of system counts, depending on the analog and digital gains built into the hardware and software system, the ionization efficiency of the ion source, the mass spectral transmission efficiency through the mass analyzer, the sample concentration, and any co-existing ions with ion suppression or enhancement effects. Furthermore, the actual mass spectrum may contain background ions, interference ions, and baselines. Lastly, the actual mass spectrum may not be located at exactly the same mass as the theoretical mass spectrum, due to any residual mass error remaining after even highly accurate mass measurement and calibration. For these reasons, there should be a normalization step before the mass spectral overlay in Step 210E in
The normalization included in Step 210D may take the form of
$$r = Kc + e \qquad \text{(Equation 2)}$$
where r is an (n×1) matrix of the actual mass spectral data, digitized at n m/z values; c is a (p×1) matrix of regression coefficients which are representative of the concentrations of p components in matrix K; K is an (n×p) matrix composed of mass spectral responses for the p components, all sampled at the same n m/z points as r; and e is an (n×1) matrix of a fitting residual with contributions from random noise and any systematic deviations from this model. The p columns of the matrix K may contain the theoretical mass spectrum t and any background, mass spectra of any interfering ions, or baseline components, which may or may not vary with mass. Columns may also be added into matrix K to contain derivative terms of either the actual mass spectrum or theoretical mass spectrum so as to compensate for any residual mass shift, as disclosed in the cross-referenced International Patent Application PCT/US2004/013096 filed on Apr. 28, 2004.
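To show how a derivative column can absorb a small residual mass shift, here is a Python sketch with hypothetical values: K holds the theoretical spectrum t, its m/z derivative, and a flat baseline, and the shift is read back from the first-order Taylor expansion a·t(x − d) ≈ a·t(x) − a·d·t′(x). A single Gaussian peak stands in for a full isotope cluster, and ordinary least squares (`numpy.linalg.lstsq`) performs the fit.

```python
import numpy as np


def gaussian(x, center, sigma):
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)


# Hypothetical unit-resolution-like peak with a small residual mass error.
mz = np.arange(99.0, 101.0, 0.001)
sigma = 0.1
t = gaussian(mz, 100.0, sigma)                 # theoretical spectrum (one peak, for brevity)
true_scale, true_shift, true_offset = 2.5, 0.005, 0.02
r = true_scale * gaussian(mz, 100.0 + true_shift, sigma) + true_offset  # "measured" data

# Columns of K: theoretical spectrum, its m/z derivative, and a flat baseline.
dt = np.gradient(t, mz)
K = np.column_stack([t, dt, np.ones_like(mz)])
c, *_ = np.linalg.lstsq(K, r, rcond=None)

# First-order Taylor expansion: a*t(x - d) ~ a*t(x) - a*d*t'(x), so d ~ -c[1] / c[0].
est_scale = c[0]
est_shift = -c[1] / c[0]
est_offset = c[2]
print(est_scale, est_shift, est_offset)
```

The linear approximation holds only while the shift is small compared with the peak width; larger shifts would call for re-centering and iterating.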
In the above Equation 2, it should be noted that the vectors r and t can be switched to achieve better computational efficiency, where the matrix K is fixed for all candidate formulas and needs to be inverted only once for normalizing the theoretical mass spectra of each different candidate formula.
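The role switch described above might be sketched as follows (a hypothetical Python illustration): the measured spectrum r and a flat baseline form a fixed matrix whose pseudoinverse is computed once, and each candidate's theoretical spectrum is regressed onto it. Ranking by residual norm, the synthetic isotope clusters, and the function name `fit_candidates_switched` are assumptions for illustration.

```python
import numpy as np


def fit_candidates_switched(r, candidates):
    """Regress each theoretical spectrum t onto the fixed matrix [r, 1].

    The pseudoinverse of the fixed matrix is computed once and reused for every
    candidate formula, which is the computational saving described above.
    """
    K_fixed = np.column_stack([r, np.ones_like(r)])
    K_pinv = np.linalg.pinv(K_fixed)          # computed only once
    results = {}
    for name, t in candidates.items():
        coeffs = K_pinv @ t                   # t ~ coeffs[0] * r + coeffs[1]
        fitted = K_fixed @ coeffs
        results[name] = (coeffs, np.linalg.norm(t - fitted))
    return results


mz = np.linspace(199.0, 204.0, 2001)


def pattern(abundances, sigma=0.15):
    """Hypothetical isotope cluster: unit-spaced Gaussian peaks starting at m/z 200."""
    return sum(a * np.exp(-0.5 * ((mz - 200.0 - k) / sigma) ** 2) for k, a in enumerate(abundances))


r = 1000.0 * pattern([1.0, 0.22, 0.03]) + 5.0     # "measured" spectrum with a flat offset
candidates = {"A": pattern([1.0, 0.22, 0.03]), "B": pattern([1.0, 0.11, 0.01])}
results = fit_candidates_switched(r, candidates)
best = min(results, key=lambda name: results[name][1])
print(best)
```

The coefficient fitted on r is the reciprocal of the scale in the unswitched form (here about 1/1000), because the roles of measured and theoretical data are reversed.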
The estimate of the concentration vector c is first obtained as

$$\hat{c} = K^{+} r \qquad \text{(Equation 3)}$$

where $K^{+}$ is the pseudoinverse of matrix K, a process well established in matrix algebra, as referenced in U.S. Pat. No. 6,983,213; International Patent Application PCT/US2004/013096, filed on Apr. 28, 2004; U.S. patent application Ser. No. 11/261,440, filed on Oct. 28, 2005; International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005; International Patent Application PCT/US2006/013723, filed on Apr. 11, 2006; and U.S. provisional patent application Ser. No. 60/941,656, filed on Jun. 2, 2007. Here $\hat{c}$ is the estimated concentration vector c, which can be inserted back into Equation 2 to arrive at a normalized or fitted mass spectral response $\hat{r}$,

$$\hat{r} = K\hat{c} \qquad \text{(Equation 4)}$$
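Equations 2 through 5 map directly onto a few lines of NumPy. The sketch below uses a synthetic isotope cluster, a constant plus linear baseline, and simulated Gaussian noise (all hypothetical choices); it forms K, estimates ĉ with `numpy.linalg.pinv`, and computes the normalized spectrum r̂ and the fitting residual.

```python
import numpy as np

mz = np.linspace(299.0, 304.0, 2501)


def cluster(abundances, sigma=0.12):
    """Hypothetical theoretical spectrum t: unit-spaced Gaussian isotope peaks from m/z 300."""
    return sum(a * np.exp(-0.5 * ((mz - 300.0 - k) / sigma) ** 2) for k, a in enumerate(abundances))


t = cluster([1.0, 0.33, 0.06])
baseline = np.ones_like(mz)
slope = mz - mz.mean()                       # linear baseline term, centred for conditioning
K = np.column_stack([t, baseline, slope])    # n x p matrix of component responses

rng = np.random.default_rng(0)
r = 850.0 * t + 12.0 + 0.5 * slope + rng.normal(0.0, 1.0, mz.size)  # simulated measured data

c_hat = np.linalg.pinv(K) @ r                # Equation 3
r_hat = K @ c_hat                            # Equation 4: normalized (fitted) spectrum
e_hat = r - r_hat                            # Equation 5: fitting residual
print(c_hat)
```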
The normalized mass spectrum $\hat{r}$ and the actual mass spectrum r can now be displayed as overlays in Step 210E in

$$\hat{e} = r - \hat{r} \qquad \text{(Equation 5)}$$
This residual vector can be plugged into the following equation for the calculation of a numeric metric to accurately measure the similarity between the two (Step 210F in
The Spectral Accuracy (SA) thus calculated will be 100% if the actual mass spectrum r matches a theoretical mass spectrum exactly. In the absence of random or systematic error, the Spectral Accuracy would be 100% for the correct formula. In practice, with ion-counting noise on a well-calibrated mass spectrometer, the Spectral Accuracy can exceed 99%, enabling unique formula determination even on a single quadrupole MS system.
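The Spectral Accuracy equation itself does not survive in this text. Purely as an assumption, the Python sketch below uses SA = (1 − ‖ê‖/‖r‖) × 100%, a form that reaches 100% exactly when the residual vanishes, as described above; the candidate patterns, the baseline column, and the function name `spectral_accuracy` are hypothetical.

```python
import numpy as np


def spectral_accuracy(r, K):
    """Hypothetical Spectral Accuracy, ASSUMED here to be (1 - ||e_hat|| / ||r||) * 100.

    r: measured spectrum (n,); K: (n, p) matrix whose first column is a candidate's
    theoretical spectrum and whose remaining columns are baseline terms.
    """
    c_hat = np.linalg.pinv(K) @ r
    e_hat = r - K @ c_hat
    return 100.0 * (1.0 - np.linalg.norm(e_hat) / np.linalg.norm(r))


mz = np.linspace(249.0, 255.0, 3001)


def cluster(abundances, sigma=0.12):
    """Hypothetical isotope cluster of unit-spaced Gaussian peaks starting at m/z 250."""
    return sum(a * np.exp(-0.5 * ((mz - 250.0 - k) / sigma) ** 2) for k, a in enumerate(abundances))


# Two hypothetical candidates with different isotope patterns; the data follow candidate "X".
candidates = {"X": cluster([1.0, 0.15, 0.045]), "Y": cluster([1.0, 0.09, 0.01])}
r = 500.0 * candidates["X"] + 3.0
ranking = sorted(
    candidates,
    key=lambda name: spectral_accuracy(r, np.column_stack([candidates[name], np.ones_like(mz)])),
    reverse=True,
)
print(ranking)
```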
As noted in Step 210A in
At Step 210F in
Following Step 210B in
A new element, S, is then added to the element list (Step 210G in
The process described above includes a fairly comprehensive series of steps, for purposes of illustration and completeness. However, there are many ways in which the process may be varied, including leaving out certain steps, or performing certain steps beforehand or “off-line”. For example, it is possible to follow all the above approaches using disjoint isotope segments (that is, using isotope peaks that are separated in mass, but not the portions of the spectrum between the peaks), especially with data measured on higher resolution MS systems, so as to avoid mass spectrally separated interference peaks that are located within, but do not directly overlap with, the isotope cluster of an ion of interest. Furthermore, one may wish to include only the isotopic peaks that are not overlapped with interferences in the above analysis, using exactly the same vector or matrix algebra during the normalization Step 210D in
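A minimal Python sketch of the disjoint-segment idea, on hypothetical high-resolution data: rows of r and K outside windows around the isotope peaks are dropped, and the same pseudoinverse algebra is applied to the remaining rows. The ±5σ window width, the peak positions, and the interference location are illustrative assumptions.

```python
import numpy as np

mz = np.linspace(399.5, 402.5, 30001)     # 0.0001 m/z spacing (high-resolution example)
sigma = 0.004                             # hypothetical peak width (standard deviation)


def peak(center, height=1.0):
    return height * np.exp(-0.5 * ((mz - center) / sigma) ** 2)


isotope_mz = [400.0, 401.00335, 402.0067]             # hypothetical isotope positions
t = peak(isotope_mz[0]) + peak(isotope_mz[1], 0.25) + peak(isotope_mz[2], 0.04)

# Measured data: twice the theory plus a resolved interference inside the cluster.
r = 2.0 * t + peak(400.48, 0.8)

# Disjoint segments: +/- 5 sigma windows around each isotope peak.
mask = np.zeros(mz.size, dtype=bool)
for center in isotope_mz:
    mask |= np.abs(mz - center) <= 5 * sigma

K = np.column_stack([t, np.ones_like(mz)])            # theory plus flat baseline


def fit_residual_norm(K_rows, r_rows):
    """Equations 3 to 5 applied to the selected rows; returns (c_hat, ||e_hat||)."""
    c_hat = np.linalg.pinv(K_rows) @ r_rows
    return c_hat, np.linalg.norm(r_rows - K_rows @ c_hat)


c_full, resid_full = fit_residual_norm(K, r)                 # all points
c_seg, resid_seg = fit_residual_norm(K[mask], r[mask])       # isotope segments only
print(resid_full, resid_seg)
```

The segment fit leaves a near-zero residual because the resolved interference is excluded, whereas the full-range fit carries the interference into the residual and hence into any similarity metric computed from it.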
For all the analysis described above, it may be advantageous to transform the m/z axis beforehand into another, more appropriate axis, to allow for analysis with a uniform peak shape function in the transformed axis, as pointed out in U.S. Pat. No. 6,983,213 and International Patent Application PCT/US2004/034618 filed on Oct. 20, 2004.
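As one hedged example of such a transform: if the peak width is assumed to grow in proportion to m/z (constant resolving power), resampling onto a uniform grid in ln(m/z) makes the widths uniform. The cited references may prescribe different transforms for different analyzers; the resolving power, grid steps, and function names below are hypothetical.

```python
import numpy as np


def resample_to_log_axis(mz, intensity, step):
    """Resample a spectrum onto a uniform grid in u = ln(m/z) by linear interpolation."""
    u = np.log(mz)
    u_grid = np.arange(u[0], u[-1], step)
    return u_grid, np.interp(u_grid, u, intensity)


def rms_width(x, y):
    """Second-moment (RMS) width of a single peak."""
    centroid = np.sum(x * y) / np.sum(y)
    return np.sqrt(np.sum(y * (x - centroid) ** 2) / np.sum(y))


resolving_power = 2000.0                 # hypothetical, constant across the range
mz = np.arange(150.0, 900.0, 0.005)
intensity = np.zeros_like(mz)
for center in (200.0, 800.0):
    sigma = center / resolving_power     # width grows linearly with m/z
    intensity += np.exp(-0.5 * ((mz - center) / sigma) ** 2)

u_grid, u_intensity = resample_to_log_axis(mz, intensity, step=1e-5)

widths_mz, widths_u = [], []
for center in (200.0, 800.0):
    in_mz = np.abs(mz - center) < 10 * center / resolving_power
    in_u = np.abs(u_grid - np.log(center)) < 10 / resolving_power
    widths_mz.append(rms_width(mz[in_mz], intensity[in_mz]))
    widths_u.append(rms_width(u_grid[in_u], u_intensity[in_u]))
print(widths_mz, widths_u)
```

On the m/z axis the second peak is about four times wider than the first; on the ln(m/z) axis both have essentially the same width, so a single peak shape function can serve the whole range.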
Conversely, certain steps may be combined or performed at the same time as other steps. For example, if the monoisotopic peak is deemed to be impure and overlapped with other monoisotopic peaks in Step 210A and Step 210B in
Additionally, some steps may be simplified or combined in specific situations. For example, the normalization step in Step 210D and the preferred embodiment from Equations 2 to 5 can be simplified to a straight scaling operation involving scalar division or multiplication, or in combination with a mass shift operation via spectral interpolation to align the actual mass spectrum with the theoretical mass spectrum or vice versa.
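The simplified normalization might look like the following Python sketch: a least-squares scalar a = (t·r)/(t·t) combined with a shift applied by linear interpolation, with the shift chosen here by a simple grid search (an assumption; the text does not say how the shift is found). All names and values are hypothetical.

```python
import numpy as np


def cluster(x, start, abundances=(1.0, 0.2, 0.03), sigma=0.12):
    """Hypothetical isotope cluster of unit-spaced Gaussian peaks beginning at `start`."""
    return sum(a * np.exp(-0.5 * ((x - start - k) / sigma) ** 2) for k, a in enumerate(abundances))


def scale_and_align(mz, r, t, shift):
    """Shift t by `shift` (m/z units) via linear interpolation, then apply the
    least-squares scalar a = (t . r) / (t . t)."""
    t_shifted = np.interp(mz, mz + shift, t)     # t evaluated at mz - shift
    a = np.dot(t_shifted, r) / np.dot(t_shifted, t_shifted)
    return a, a * t_shifted


def best_alignment(mz, r, t, shifts):
    """Grid search over trial shifts; returns (shift, scale, residual norm) of the best fit."""
    best = None
    for shift in shifts:
        a, fitted = scale_and_align(mz, r, t, shift)
        err = np.linalg.norm(r - fitted)
        if best is None or err < best[2]:
            best = (shift, a, err)
    return best


mz = np.linspace(499.0, 503.0, 4001)            # 0.001 m/z spacing
t = cluster(mz, 500.0)                          # theoretical spectrum
r = 40.0 * cluster(mz, 500.012)                 # "measured": scaled and offset by 0.012
shift, scale, err = best_alignment(mz, r, t, np.linspace(-0.03, 0.03, 61))
print(shift, scale)
```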
It is noted that the terms “mass” and “mass to charge ratio” are used somewhat interchangeably in connection with information or output as defined by the mass to charge ratio axis of a mass spectrometer. This is a common practice in the scientific literature and in scientific discussions, and no ambiguity will occur, when the terms are read in context, by one skilled in the art.
It is further noted that the terms “peak shape (function)” and “line shape (function)” are used somewhat interchangeably throughout this specification. This is a common practice in the scientific literature and in scientific discussions, and no ambiguity will occur, when the terms are read in context, by one skilled in the art.
The methods of analysis of the present invention can be realized in hardware, software, or a combination of hardware and software. Any kind of computer system, or other apparatus adapted for carrying out the methods and/or functions described herein, is suitable. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when loaded and executed, controls the computer system, which in turn controls an analysis system, such that the system carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system (which in turn controls an analysis system), is able to carry out these methods.
Computer program means or computer program in the present context include any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language, code or notation, and/or reproduction in a different material form.
Thus the invention includes an article of manufacture, which comprises a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the article of manufacture comprises computer readable program code means for causing a computer to effect the steps of a method of this invention. Similarly, the present invention may be implemented as a computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the computer program product comprises computer readable program code means for causing a computer to effect one or more functions of this invention. Furthermore, the present invention may be implemented as a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for causing one or more functions of this invention.
It is noted that the foregoing has outlined some of the more pertinent objects and embodiments of the present invention. The concepts of this invention may be used for many applications. Thus, although the description is made for particular arrangements and methods, the intent and concept of the invention are suitable and applicable to other arrangements and applications. It will be clear to those skilled in the art that other modifications to the disclosed embodiments can be effected without departing from the spirit and scope of the invention. The described embodiments ought to be construed to be merely illustrative of some of the more prominent features and applications of the invention. Thus, it should be understood that the foregoing description is only illustrative of the invention. Various alternatives and modifications can be devised by those skilled in the art without departing from the invention. Other beneficial results can be realized by applying the disclosed invention in a different manner or modifying the invention in ways known to those familiar with the art. Thus, it should be understood that the embodiments have been provided as examples and not as a limitation. Accordingly, the present invention is intended to embrace all alternatives, modifications and variances which fall within the scope of the appended claims.
This application claims priority under 35 U.S.C. § 119(e) from provisional patent application 61/057,804 filed on May 30, 2008, the entire contents of which are incorporated herein by reference for all purposes.
| Number | Date | Country |
|---|---|---|
| 61057804 | May 2008 | US |