1. Field of the Invention
The present invention relates to mass spectrometry systems. More particularly, it relates to mass spectrometry systems that are useful for the analysis of complex mixtures of molecules, including large and small organic molecules such as proteins or peptides, environmental pollutants, pharmaceuticals and their metabolites, and petrochemical compounds, to methods of analysis used therein, and to a computer program product having computer code embodied therein for causing a computer, or a computer and a mass spectrometer in combination, to effect such analysis.
2. Prior Art
In drug metabolism studies, researchers typically create a radio-labeled version of the parent drug before dosing the drug in animal or human test subjects. Through biotransformations, the drug will be transformed into its metabolites, ranging from just a few to as many as 50-70 metabolites. By detecting and following the radioactivity, researchers can trace these biotransformations and account for the metabolites. The sample is typically injected into an LC/MS system for analysis, where various metabolites are separated in (retention) time and detected by mass spectrometry. While these metabolites can be traced by a radioactivity detector in a split flow arrangement in parallel to mass spectrometry, the identification of these metabolites will ultimately have to rely on mass spectrometry due to its mass (m/z) measuring capability. Unfortunately, in many cases, the biological sample, even after extensive clean-up, sample preparation, and LC separation, still suffers from significant matrix or background ion interferences, making metabolite identification a time-consuming and tedious process. To help with the mass spectral identification of possible metabolites, researchers may dose test subjects with a mixture of the native and radio-labeled compound, creating a unique mass spectral signature that is easier for researchers to spot in a mass spectrum. Subject to limitations on total dosage, radioactivity exposure for a given test species, mass spectral saturation, and the uncertainty surrounding the ratio between the native and the radio-labeled version of the drug, metabolite identification remains a daunting task for researchers, even with the aid of radioactivity tracing.
After an ion has been identified as possibly drug-related, its elemental composition typically must be confirmed before structural elucidation through further MS/MS experimentation, or even isolation for NMR analysis. Due to the various backgrounds present, higher resolution mass spectrometry is typically desired in order to avoid interference from the matrix or background ions. Higher resolution mass spectrometry systems such as TOF, qTOF, Orbi-Trap, or FT ICR MS offer two distinct advantages: fewer spectral interferences and higher mass accuracy. Even with elaborate calibration schemes such as lock mass, dual spray, and internal calibration, obtaining a unique elemental composition remains a challenge, even at the extremely high mass accuracy of 100 ppb.
A previous approach, as in U.S. Pat. No. 6,983,213 and International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005, provides a novel method for calibrating mass spectra for improved mass accuracy and line shape correction to improve the ability to perform elemental composition analysis or formula identification.
Very high mass accuracy can be obtained on so-called unit mass resolution systems in accordance with the techniques taught in U.S. Pat. No. 6,983,213.
Accurate line shape calibration provides a highly reliable metric to assist in unambiguous formula identification by matching the measured spectra to calculated candidate formulas, as in International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005.
However, obtaining unique elemental composition from conventional to high resolution mass spectrometry systems remains a challenge to practitioners of mass spectrometry.
Thus, there exists a significant gap between what current mass spectral systems can offer, and what is being achieved at present using existing technologies for mass spectral analysis.
It is an object of the invention to provide a mass spectrometry system and a method for operating a mass spectrometry system that overcomes the disadvantages described above, in accordance with the methods described herein.
It is another object of the invention to provide a storage medium having thereon computer readable program code for causing a mass spectrometry system to perform the method in accordance with the invention.
An additional aspect of the invention is, in general, a computer readable medium having thereon computer readable code for use with a mass spectrometer system having a data analysis portion including a computer, the computer readable code being for causing the computer to analyze data by performing the methods described herein. The computer readable medium preferably further comprises computer readable code for causing the computer to perform at least one of the specific methods described.
Of particular significance, the invention is also directed generally to a mass spectrometer system for analyzing chemical composition, the system including a mass spectrometer portion and a data analysis system, the data analysis system operating by obtaining calibrated continuum spectral data by processing raw spectral data, generally in accordance with the methods described herein. The data analysis portion may be configured to operate in accordance with the specifics of these methods. Preferably the mass spectrometer system further comprises a sample preparation portion for preparing samples to be analyzed, and a sample separation portion for performing an initial separation of samples to be analyzed. The separation portion may comprise at least one of an electrophoresis apparatus, a chemical affinity chip, or a chromatograph for separating the sample into various components.
The foregoing aspects and other features of the present invention are explained in the following description, taken in connection with the accompanying drawings, wherein:
Referring to
Analysis system 10 has a sample preparation portion 12, other detector portion 23, a mass spectrometer portion 14, a data analysis system 16, and a computer system 18. The sample preparation portion 12 may include a sample introduction unit 20, of the type that introduces a sample containing proteins, peptides, or a small molecule drug of interest to system 10, such as the Finnigan LCQ Deca XP Max, manufactured by Thermo Electron Corporation of Waltham, Mass., USA. The sample preparation portion 12 may also include an analyte separation unit 22, which is used to perform a preliminary separation of analytes, such as the proteins to be analyzed by system 10. Analyte separation unit 22 may be any one of a chromatography column or an electrophoresis separation unit, such as a gel-based separation unit manufactured by Bio-Rad Laboratories, Inc. of Hercules, Calif., as is well known in the art. In general, for electrophoresis, a voltage is applied to the unit to cause the proteins to be separated as a function of one or more variables, such as migration speed through a capillary tube, isoelectric focusing point (Hannesh, S. M., Electrophoresis 21, 1202-1209 (2000)), or mass (one dimensional separation), or by more than one of these variables, such as by isoelectric focusing and by mass. An example of the latter is known as two-dimensional electrophoresis.
The mass spectrometer portion 14 may be a conventional mass spectrometer and may be any one available, but is preferably one of MALDI-TOF, quadrupole MS, ion trap MS, qTOF, TOF/TOF, or FTMS. If it has a MALDI or electrospray ionization ion source, such ion source may also provide for sample input to the mass spectrometer portion 14. In general, mass spectrometer portion 14 may include an ion source 24, a mass analyzer 26 for separating ions generated by ion source 24 by mass to charge ratio, an ion detector portion 28 for detecting the ions from mass analyzer 26, and a vacuum system 30 for maintaining a sufficient vacuum for mass spectrometer portion 14 to operate efficiently. If mass spectrometer portion 14 is an ion mobility spectrometer, generally no vacuum system is needed and the data generated are typically called a plasmagram instead of a mass spectrum.
In parallel to the mass spectrometer portion 14, there may be other detector portion 23, where a portion of the flow is diverted, for nearly parallel detection of the sample in a split flow arrangement. This other detector portion 23 may be a single channel UV detector, a multi-channel UV spectrometer, a Refractive Index (RI) detector, a light scattering detector, a radioactivity monitor (RAM), etc. RAM is most widely used in drug metabolism research for 14C-labeled experiments where the various metabolites can be traced in near real time and correlated to the mass spectral scans.
The data analysis system 16 includes a data acquisition portion 32, which may include one or a series of analog to digital converters (not shown) for converting signals from ion detector portion 28 into digital data. This digital data is provided to a real time data processing portion 34, which processes the digital data through operations such as summing and/or averaging. A post-processing portion 36 may be used to do additional processing of the data from real time data processing portion 34, including library searches, data storage and data reporting.
Computer system 18 provides control of sample preparation portion 12, mass spectrometer portion 14, other detector portion 23, and data analysis system 16, in the manner described below. Computer system 18 may have a conventional computer monitor or display 40 to allow for the entry of data on appropriate screen displays, and for the display of the results of the analyses performed. Computer system 18 may be based on any appropriate personal computer, operating for example with a Windows® or UNIX® operating system, or any other appropriate operating system. Computer system 18 will typically have a hard drive 42, on which the operating system and the program for performing the data analysis described below are stored. A drive 44 for accepting a CD or floppy disk is used to load the program in accordance with the invention onto computer system 18. The program for controlling sample preparation portion 12 and mass spectrometer portion 14 will typically be downloaded as firmware for these portions of system 10. Data analysis system 16 may be a program written to implement the processing steps discussed below, in any of several programming languages such as C++, JAVA or Visual Basic.
As described in U.S. Pat. No. 6,983,213, for a given standard ion of known elemental composition, the acquired profile mode mass spectral data y0 and its theoretical counterpart y are related to each other through
$$(g \otimes y_0) = (g \otimes y) \otimes p \qquad \text{(Equation 1)}$$
where $\otimes$ represents convolution, g represents a small Gaussian, and p represents the mass spectral peak shape function. When y0, y, and g are known, the mass spectral peak shape function p can be readily calculated through deconvolution.
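The deconvolution in Equation 1 can be illustrated with a minimal sketch. It assumes circular (FFT-based) convolution on a uniform grid and a regularized Fourier division; the function names, the damping constant `eps`, and the synthetic stick spectrum are hypothetical choices made only for this illustration, not the procedure of the referenced patent.

```python
import numpy as np

def circ_conv(a, b):
    """Circular convolution of two equal-length 1-D arrays via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def centered_gaussian(n, sigma):
    """Unit-area Gaussian on n points, centered on index 0 (wraps around)."""
    idx = np.arange(n)
    d = np.minimum(idx, n - idx)
    w = np.exp(-0.5 * (d / sigma) ** 2)
    return w / w.sum()

def estimate_peak_shape(y0, y, g, eps=1e-6):
    """Solve (g conv y0) = (g conv y) conv p for p by regularized division.

    eps is an assumed damping constant, relative to the largest spectral
    magnitude, that keeps the division stable where (g conv y) is tiny.
    """
    lhs = np.fft.fft(circ_conv(g, y0))
    rhs = np.fft.fft(circ_conv(g, y))
    damping = eps * np.max(np.abs(rhs)) ** 2
    p_hat = lhs * np.conj(rhs) / (np.abs(rhs) ** 2 + damping)
    return np.real(np.fft.ifft(p_hat))

# Synthetic check: theoretical sticks y, a known peak shape, and a small g.
n = 256
y = np.zeros(n)
y[[100, 110, 120]] = [1.0, 0.5, 0.2]      # made-up isotope sticks
p_true = centered_gaussian(n, 3.0)         # made-up instrument peak shape
g = centered_gaussian(n, 1.0)              # the "small Gaussian" g
y0 = circ_conv(y, p_true)                  # simulated measured profile data
p_est = estimate_peak_shape(y0, y, g)
```

In this noise-free setting `p_est` closely follows `p_true`; with real data the damping constant would need to be chosen against the noise level.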
When the measured y0 is a linear combination of two ions at varying relative signal levels, such as the native and radio labeled version of a small molecule drug, additional parameters need to be introduced, such that:
$$y_0 = c_1 y_{1,0} + c_2 y_{2,0} \qquad \text{(Equation 2)}$$
$$y = c_1 y_1 + c_2 y_2 \qquad \text{(Equation 3)}$$
As long as the two additional parameters c1 and c2 are known or their ratio c1/c2 or c2/c1 is given, the same approach outlined in U.S. Pat. No. 6,983,213 can be used to arrive at the peak shape function p. When their relative concentrations are not known, as is the case in drug metabolism research, due to incomplete isotope replacement reaction, an iterative approach to arrive at c1/c2 and p has been disclosed in International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005 and International Patent Application PCT/US2006/013723, filed on Apr. 11, 2006.
While generally producing excellent results, there are situations in which an iterative approach is not preferred due to at least two considerations: it may be computationally intensive, and its convergence is not always guaranteed. For this reason, a more direct, computationally efficient, and reliable approach is disclosed here as a preferred embodiment, in a few distinct steps:
$$y_0 = c_1 (y_1 \otimes p) + c_2 (y_2 \otimes p) \qquad \text{(Equation 4)}$$
which can now be solved to obtain updated values for c1 and c2 using the same y0, y1, and y2 from step b above and the cleaned-up version of p, through, for example, least squares linear regression.
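As a minimal sketch of the least squares step for Equation 4, the two convolved theoretical patterns can be placed as columns of a design matrix and fitted to y0. The function name `fit_concentrations`, the grid, the stick patterns, and the Gaussian used for p are all assumptions made for illustration only.

```python
import numpy as np

def fit_concentrations(y0, y1, y2, p):
    """Least squares fit of y0 = c1*(y1 conv p) + c2*(y2 conv p)."""
    basis = np.column_stack([
        np.convolve(y1, p, mode="same"),   # native ion broadened by p
        np.convolve(y2, p, mode="same"),   # labeled ion broadened by p
    ])
    coeffs, *_ = np.linalg.lstsq(basis, y0, rcond=None)
    return coeffs

# Made-up example: 10 grid points per mass unit, label shifted by +2 units.
n = 200
y1 = np.zeros(n)
y1[[50, 60, 70]] = [1.0, 0.3, 0.05]        # hypothetical native isotope sticks
y2 = np.roll(y1, 20)                        # hypothetical labeled version
offsets = np.arange(-20, 21)
p = np.exp(-0.5 * (offsets / 3.0) ** 2)     # assumed (already cleaned-up) p
p /= p.sum()

y0 = 0.7 * np.convolve(y1, p, mode="same") + 0.3 * np.convolve(y2, p, mode="same")
c1, c2 = fit_concentrations(y0, y1, y2, p)
```

Because the simulated y0 is noise-free, the fit returns the mixing coefficients used to build it; measured data would give estimates with some scatter.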
It should be noted that with the monoisotopic peak from the native ion from a higher resolution system, where the monoisotopic peak is baseline resolved from other isotopes, the true peak shape function p can be directly obtained without iteration, and the relative concentrations c1 and c2 can be obtained from the above Equation 4 in a single step. Once the true peak shape function p is obtained, one may proceed with the mass spectral calibration as referenced in U.S. Pat. No. 6,983,213 to calibrate for the mass axis while also transforming the peak shape into a desired or target peak shape function that is mathematically definable. Alternatively, but less desirably, one could leave the raw mass spectral data as is, except that the peak shape function is now known and numerically represented by p. This completes Step 230 in
One can now move to the next stage, Step 240 in
This similarity in isotope patterns among the parent drug and its various metabolites will now be exploited for an automatic algorithm to identify the possible presence of these resembling ions without actually knowing their precise elemental compositions. Once the peak shape function p has been obtained along with the concentration ratio c1/c2 between the two basic ions (those of known chemical composition, such as, for example, a parent drug, the isotope labeled version of the parent drug, a known fragment of the parent drug or its isotope labeled version, a known metabolite or its fragment, and the isotope labeled version of the known metabolite or its fragment from drug metabolism studies; i.e., those of a composition that is known or has already been determined), a mass spectral isotope pattern t can be established by
$$t = (c_1 y_1 + c_2 y_2) \otimes p \qquad \text{(Equation 5)}$$
where y1 and y2 are theoretically calculated from the elemental compositions of the basic ion and its isotope labeled version, respectively (Step 240 in
r=Kc+e Equation 6
where r is an (n×1) matrix of the profile mode mass spectral data, digitized at n m/z values; c is a (k×1) matrix of regression coefficients which are representative of the concentrations of k components in matrix K; K is an (n×k) matrix composed of profile mode mass spectral responses for the k components, all sampled at the same n m/z points as r; and e is an (n×1) matrix of a fitting residual with contributions from random noise and any systematic deviations from this model. The k columns of the matrix K will contain the isotope pattern t (for example, in its first column, without loss of generality, for ease of subsequent description) and any background or baseline components, which may or may not vary with mass (as additional columns). A least squares solution to Equation 6 leads to
$$\hat{c} = K^{+} r \qquad \text{(Equation 7)}$$
where K+ (dimensioned as k×n) is the pseudo inverse of the matrix K, a process well established in matrix algebra, as referenced in U.S. Pat. No. 6,983,213; International Patent Application PCT/US2004/013096, filed on Apr. 28, 2004; U.S. patent application Ser. No. 11/261,440, filed on Oct. 28, 2005; International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005; and International Patent Application PCT/US2006/013723, filed on Apr. 11, 2006.
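The regression of Equations 5 through 7 can be sketched as follows. The stick patterns, the Gaussian peak shape, the choice of a constant plus a linear baseline column, and all variable names are assumptions introduced for illustration; `numpy.linalg.pinv` is used here as one standard way to form the pseudo inverse.

```python
import numpy as np

n = 101
grid = np.arange(n)

# Made-up theoretical sticks for the basic ion (y1) and its labeled version (y2).
y1 = np.zeros(n)
y1[[40, 50]] = [1.0, 0.3]
y2 = np.zeros(n)
y2[[60, 70]] = [1.0, 0.3]

# Assumed peak shape function p (unit-area Gaussian) and concentration values.
offsets = np.arange(-12, 13)
p = np.exp(-0.5 * (offsets / 3.0) ** 2)
p /= p.sum()
c1, c2 = 0.7, 0.3

# Equation 5: isotope pattern t of the mixed basic ions.
t = np.convolve(c1 * y1 + c2 * y2, p, mode="same")

# Equation 6: K holds t plus baseline components (constant and linear here).
K = np.column_stack([t, np.ones(n), grid / n])

# Equation 7: each row of the pseudo inverse acts as a digital filter on r.
K_pinv = np.linalg.pinv(K)                  # shape (k, n)

# Simulated segment: 5 units of the pattern on a sloping baseline.
r = 5.0 * t + 2.0 + 1.0 * (grid / n)
c_hat = K_pinv @ r
```

The first element of `c_hat` is the estimated contribution of the isotope pattern t, and the remaining elements describe the baseline terms.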
Note that in Equation 7, each row in K+ serves as a digital filter applied to the mass spectral segment r to arrive at a concentration vector c containing the contribution of each component, including the ion isotope pattern t and any components included in matrix K. These digital filters in K+ can be calculated once in a limited mass spectral range and then applied to a mass spectral segment in an extended mass range in a sliding window, much like a convolution filter, in Step 240 in
$$\hat{e} = r - K\hat{c} \qquad \text{(Equation 8)}$$
in Steps 250 and 260 in
$$e = \frac{\lVert \hat{e} \rVert_2}{\lVert r \rVert_2}$$
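A minimal sketch of the sliding-window use of the filters in K+ together with the relative residual error is given below. The function `scan_for_pattern`, the synthetic spectrum, and the two-column K (pattern plus constant baseline) are hypothetical; NumPy 1.20 or later is assumed for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def scan_for_pattern(spectrum, K):
    """Apply the filters in pinv(K) to every window of the spectrum.

    Returns the fitted coefficients per window and the relative residual
    error ||r - K c|| / ||r|| per window.
    """
    K_pinv = np.linalg.pinv(K)                        # computed once
    windows = sliding_window_view(spectrum, K.shape[0])
    coeffs = windows @ K_pinv.T                       # Equation 7 per window
    residuals = windows - coeffs @ K.T                # Equation 8 per window
    norms = np.linalg.norm(windows, axis=1)
    safe_norms = np.where(norms > 0, norms, 1.0)      # avoid division by zero
    rel_err = np.linalg.norm(residuals, axis=1) / safe_norms
    return coeffs, rel_err

# Made-up isotope pattern t on a 41-point window, plus a constant baseline.
w = np.arange(41)
t = np.exp(-0.5 * ((w - 10) / 3.0) ** 2) + 0.5 * np.exp(-0.5 * ((w - 30) / 3.0) ** 2)
K = np.column_stack([t, np.ones(41)])

# Synthetic spectrum: flat baseline, one resembling ion, one unrelated peak.
x = np.arange(300)
spectrum = np.ones(300)
spectrum[100:141] += 3.0 * t                          # resembling ion at window 100
spectrum += 3.0 * np.exp(-0.5 * ((x - 200) / 3.0) ** 2)  # interference near 200

coeffs, rel_err = scan_for_pattern(spectrum, K)
```

In this sketch the window starting at index 100 fits almost perfectly, while the window centered on the unrelated peak leaves a large relative residual. Windows containing only baseline also fit well but with a near-zero coefficient for t, which is one reason a residual-based metric is usefully combined with the signal level.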
While one can relate this residual error directly to the likelihood for the presence of a resembling ion, it may be more convenient intuitively to convert this residual error into a numeric metric that increases when the measured isotope pattern more closely resembles the given isotope pattern t given in Equation 5. This numeric metric may be equal to the t-statistic or one minus the p-value as disclosed in U.S. Pat. No. 6,983,213 and U.S. patent application Ser. No. 11/754,305, filed on May 27, 2007; corresponding to International Patent Application PCT/US2007/069832, filed on May 28, 2007, or some other appropriate function of the residual error. This corresponds to Step 280 in
where the subscript i refers to mass spectral data point i, ri and ei are the mass spectral raw signal and residual corresponding to mass spectral data point i based on the above calculations using a mass spectral segment centered around mass spectral data point i, and a is a user-settable parameter that takes on the form of:
a=0.15, for ei<0.15 or 15% relative residual error
a=0.05, for ei≧0.15 or 15% relative residual error
Comparing the zoomed-in versions of
These high likelihood ions and their elemental compositions are reported out by computer 18 (
Since all weights across the mass spectrum can be summed up into a total weight and plotted out as a function of chromatographic retention time (
For reasons discussed in U.S. Pat. No. 6,983,213; International Patent Application PCT/US2004/013096, filed on Apr. 28, 2004; U.S. patent application Ser. No. 11/261,440, filed on Oct. 28, 2005; International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005; International Patent Application PCT/US2006/013723, filed on Apr. 11, 2006; and U.S. patent application Ser. No. 11/754,305, filed on May 27, 2007; International Patent Application PCT/US2007/069832, filed on May 28, 2007, it is preferred to carry out all of the above calculations using the profile mode mass spectral data and have the raw profile mode data calibrated for both mass and peak shape. The above calculations can, however, be carried out in centroid mode, with or without peak shape calibration, with inferior results. In this case, the peak shape function described in this application becomes a delta function with just one non-zero element in the entire peak shape vector.
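The centroid-mode remark can be checked with a short sketch: convolving a stick spectrum with a peak shape vector that has a single non-zero element leaves the sticks unchanged. The vector length and stick values are made up for illustration.

```python
import numpy as np

# Delta-function peak shape: one non-zero element at the center of the vector.
p_delta = np.zeros(21)
p_delta[10] = 1.0

# Made-up centroid (stick) data.
sticks = np.zeros(100)
sticks[[30, 31, 32]] = [1.0, 0.4, 0.1]

# With a centered delta, "same"-mode convolution is the identity operation.
convolved = np.convolve(sticks, p_delta, mode="same")
```

With this choice of p, the pattern t of Equation 5 reduces to the weighted sum of the theoretical sticks themselves.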
While the description above uses a pair of ions as basic ions for ease of discussion, the same approach applies to cases involving 3 or more ions. For example, when there are two 14C replacements with incomplete reaction, it is possible to have a mixture that is a linear combination of native, singly 14C labeled, and doubly 14C labeled ions. The identical process and algorithm can be utilized for such multiple 14C labeling experiments by simply augmenting the relevant matrices, including K, c, and K+, and adding y3. Although there appear to be three concentration elements in this case, there are actually only two independent concentration elements due to the closure rule:
$$c_3 = 1 - c_1 - c_2$$
which can be utilized to reduce the number of unknowns estimated and improve the numerical and statistical stability of the calculations. As a special case, when there is only one ion involved as the basic ion for the metabolism study of a Br- or Cl-containing drug, all of the above calculations and algorithms still apply, except that there are no concentration estimate steps b, c, or d.
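One way the closure rule can reduce the number of unknowns is sketched below: substituting c3 = 1 - c1 - c2 into the three-ion model leaves a two-parameter least squares problem. The function name `fit_with_closure` and the Gaussian profiles standing in for the convolved patterns are assumptions, and the sketch presumes the profiles are scaled so that the closure rule holds.

```python
import numpy as np

def fit_with_closure(y0, b1, b2, b3):
    """Fit y0 = c1*b1 + c2*b2 + (1 - c1 - c2)*b3 for c1 and c2 only.

    b1, b2, b3 are the (already convolved) profiles of the native, singly
    labeled, and doubly labeled ions.
    """
    design = np.column_stack([b1 - b3, b2 - b3])
    solution, *_ = np.linalg.lstsq(design, y0 - b3, rcond=None)
    c1, c2 = solution
    return c1, c2, 1.0 - c1 - c2

# Made-up profiles: the same peak shifted by the label mass difference.
grid = np.arange(150)

def profile(center):
    return np.exp(-0.5 * ((grid - center) / 3.0) ** 2)

b1, b2, b3 = profile(40), profile(60), profile(80)
y0 = 0.5 * b1 + 0.3 * b2 + 0.2 * b3
c1, c2, c3 = fit_with_closure(y0, b1, b2, b3)
```

Estimating two coefficients instead of three is the stability benefit the closure rule is said to provide; the third follows directly from the other two.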
For all the analyses described above, it may be advantageous to transform the m/z axis into another more appropriate axis beforehand, to allow for analysis with a uniform peak shape function in the transformed axis, as pointed out in U.S. Pat. No. 6,983,213 and International Patent Application PCT/US2004/034618 filed on Oct. 20, 2004.
The process described above includes a fairly comprehensive series of steps, for purposes of illustration, and to be complete. However, there are many ways in which the process may be varied, including leaving out certain steps, or performing certain steps beforehand or “off-line”. For example, it is possible to follow all the above approaches by including disjoint isotope segments (segments that are not continuous with respect to one another, but have spaces between them in the spectrum), especially with data measured from higher resolution MS systems, so as to avoid the mass spectrally separated interference peaks that are located within, but do not directly overlap, the isotope cluster of an ion of interest. Furthermore, one may wish to include only the isotopic peaks that are not overlapped with interferences in the above analysis, using exactly the same vector or matrix algebra during the quantitative comparison Step 250 in
Although matrix operations are used to describe the process, including Equations 1 to 8, mathematically equivalent operations such as digital filtering, convolution, deconvolution, correlation, auto-correlation, regression, optimization, and fitting may also be utilized to the same effect, as is well known by one skilled in the art of digital signal processing and numerical analysis.
This invention discloses an approach to calculate or calibrate the actual peak shape function in order to achieve the best possible results. One may bypass this actual peak shape function and instead simply assume a peak shape function to proceed with the ion isotope pattern identification, with somewhat inferior results.
It is noted that the terms “mass” and “mass to charge ratio” are used somewhat interchangeably in connection with information or output as defined by the mass to charge ratio axis of a mass spectrometer. This is a common practice in the scientific literature and in scientific discussions, and no ambiguity will occur, when the terms are read in context, by one skilled in the art.
The methods of analysis of the present invention can be realized in hardware, software, or a combination of hardware and software. Any kind of computer system, or other apparatus adapted for carrying out the methods and/or functions described herein, is suitable. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when loaded and executed, controls the computer system, which in turn controls an analysis system, such that the system carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system (which in turn controls an analysis system), is able to carry out these methods.
Computer program means or computer program in the present context include any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language, code or notation, and/or reproduction in a different material form.
Thus the invention includes an article of manufacture, which comprises a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the article of manufacture comprises computer readable program code means for causing a computer to effect the steps of a method of this invention. Similarly, the present invention may be implemented as a computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the computer program product comprises computer readable program code means for causing a computer to effect one or more functions of this invention. Furthermore, the present invention may be implemented as a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for causing one or more functions of this invention.
It is noted that the foregoing has outlined some of the more pertinent objects and embodiments of the present invention. The concepts of this invention may be used for many applications. Thus, although the description is made for particular arrangements and methods, the intent and concept of the invention is suitable and applicable to other arrangements and applications. It will be clear to those skilled in the art that other modifications to the disclosed embodiments can be effected without departing from the spirit and scope of the invention. The described embodiments ought to be construed to be merely illustrative of some of the more prominent features and applications of the invention. Thus, it should be understood that the foregoing description is only illustrative of the invention. Various alternatives and modifications can be devised by those skilled in the art without departing from the invention. Other beneficial results can be realized by applying the disclosed invention in a different manner or modifying the invention in ways known to those familiar with the art. Thus, it should be understood that the embodiments have been provided as examples and not as limitations. Accordingly, the present invention is intended to embrace all alternatives, modifications and variances which fall within the scope of the appended claims.
This application claims priority, under 35 U.S.C. §119(e), from provisional patent applications Ser. No. 60/941,656 filed on Jun. 2, 2007 and 60/956,692 filed on Aug. 18, 2007. The entire contents of these applications are incorporated herein by reference, in their entireties. The entire contents of the following documents are also incorporated herein by reference in their entireties: U.S. Pat. No. 6,983,213; International Patent Application PCT/US2004/013096, filed on Apr. 28, 2004; U.S. patent application Ser. No. 11/261,440, filed on Oct. 28, 2005; International Patent Application PCT/US2005/039186, filed on Oct. 28, 2005; International Patent Application PCT/US2006/013723, filed on Apr. 11, 2006; U.S. patent application Ser. No. 11/754,305, filed on May 27, 2007; International Patent Application PCT/US2007/069832, filed on May 28, 2007; and U.S. patent application Ser. No. 11/830,772, which was filed on Jul. 30, 2007 and which claims priority from provisional patent application Ser. No. 60/833,862 filed on Jul. 29, 2006.
Number | Date | Country
---|---|---
60941656 | Jun 2007 | US
60956692 | Aug 2007 | US