This invention relates generally to the field of liquid chromatography and mass spectrometry (LC/MS) and, more specifically, to a method of data interpretation, selection and generation of an extracted ion chromatogram.
Liquid chromatography/mass spectrometry (LC/MS) is widely used to identify and characterize a broad range of chemical and biological samples, from small molecules, such as drugs and drug metabolites, to large molecules such as oligonucleotides, polypeptides and proteins. In LC/MS, liquid chromatography (LC) is used to separate a sample into one or more components or into smaller mixtures of components that may be subsequently analyzed by a mass spectrometer.
Ion chromatograms of sample mixtures are often complicated by the presence of peaks associated with components outside the mass range of interest. For this reason, it is common to select ion chromatographic data from a restricted mass range to produce an extracted ion chromatogram (XIC or EIC) of intensity (I) or relative abundance (RA) versus retention time (RT).
The extraction algorithm used to generate XICs should be objective and insensitive to noise, should require a minimum of user-defined parameters, and should tolerate modest variations and/or slight drifts in mass over the course of a mass spectrometric measurement.
Two standard approaches to generating XICs are to use a priori knowledge about the target to determine a mass range, or to apply an intensity threshold to a plot of intensity versus retention time and mass to charge ratio (m/z), under the assumption that the remaining data points will line up in rows of constant m/z. Many existing extraction schemes are refinements of these two approaches, and both approaches suffer from disadvantages. The a priori knowledge needed to apply the first scheme may not always be available. The second scheme can be excessively sensitive to the choice of intensity threshold, and is complicated by cases in which the data do not line up in well-defined rows. Against this background, there remains a need in the mass spectrometry art for an improved method of generating XICs that avoids the deficiencies of these known techniques.
In accordance with an illustrated embodiment of the present invention, a method is described for generating an extracted ion chromatogram from mass spectrometry data using minimal spanning trees (MST). The MST technique described herein provides XICs without requiring any a priori knowledge of the target and without being excessively sensitive to the choice of an intensity threshold. An illustrative embodiment of the present invention receives mass spectrometry data comprising more than one data point. Each data point represents three values: a measured ion intensity, a mass to charge ratio (m/z) and a chromatographic retention time. The mass spectrometry data is then filtered to give a filtered dataset, which is used to generate a minimal spanning tree (MST), that is, a tree connecting all of the data points with branches of the smallest possible total length. Longer branches of the tree may then be cut, or pruned, in accordance with a specified length threshold to provide one or more sub-trees of data points. The length threshold may be input by the user, and the remaining sub-trees may be interpreted as a set of extracted ion chromatograms (XICs). Advantages that this method may provide over the prior art include:
During the filtering step, the data point(s) with the maximum observed intensity may be identified in order to set a noise threshold. This can be achieved by multiplying the maximum observed intensity by a relative intensity threshold to obtain an absolute intensity threshold, below which data points may be discarded. The remaining data points can then be plotted in a retention time dimension and an m/z dimension.
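The filtering step can be illustrated with a minimal Python sketch. It assumes each data point is held as a hypothetical `(retention_time, mz, intensity)` tuple and that points lying exactly at the absolute threshold are kept; the function name is illustrative only.

```python
from typing import List, Tuple

# Hypothetical layout of one data point: (retention_time, mz, intensity).
Point = Tuple[float, float, float]


def filter_by_relative_intensity(points: List[Point], relative_threshold: float) -> List[Point]:
    """Discard points whose intensity is below relative_threshold * maximum intensity."""
    if not points:
        return []
    max_intensity = max(p[2] for p in points)
    # Absolute noise threshold derived from the most intense data point.
    absolute_threshold = max_intensity * relative_threshold
    return [p for p in points if p[2] >= absolute_threshold]


data = [(1.0, 301.0, 1000.0), (1.1, 301.0, 40.0), (1.2, 301.1, 600.0)]
kept = filter_by_relative_intensity(data, 0.05)
# The absolute threshold is 50.0, so the 40.0-intensity point is discarded.
```

A relative threshold keeps the single user-facing parameter independent of the absolute signal level of a given run.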
Scaling of the m/z or retention time axes may affect sub-tree formation: rescaling one axis changes the relative lengths of the branches between neighboring data points, so the branches that are pruned may be interchanged, resulting in different sub-trees being formed. Therefore, a scaling factor may be applied to one of the axes, preferably the m/z axis, to space out the data points in that direction.
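The effect of axis scaling on which neighbor is closest can be sketched as follows. This assumes, for illustration only, that branch lengths are Euclidean distances in the (retention time, scaled m/z) plane with intensity excluded; the function name and the factor of 10 are hypothetical choices, not values taken from this specification.

```python
import math


def scaled_distance(a, b, mz_scale):
    """Euclidean distance between two (retention_time, mz, intensity) points
    in the (retention time, scaled m/z) plane; intensity is ignored."""
    d_rt = a[0] - b[0]
    d_mz = (a[1] - b[1]) * mz_scale  # stretch the m/z axis by mz_scale
    return math.hypot(d_rt, d_mz)


p = (10.0, 301.0, 500.0)
q = (10.2, 301.0, 450.0)  # same m/z as p, later retention time
r = (10.0, 301.1, 400.0)  # same retention time as p, different m/z

for scale in (1.0, 10.0):
    nearest = "r" if scaled_distance(p, r, scale) < scaled_distance(p, q, scale) else "q"
    print(f"mz_scale={scale}: nearest neighbour of p is {nearest}")
```

Without scaling, the nearest neighbor of p is r, at a different m/z. Stretching the m/z axis by 10 makes q, in the same m/z row, the nearest neighbor, so the short branches then tend to run along the retention time direction.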
The following description is presented to enable a person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the described embodiments herein will be readily apparent to those skilled in the art and the generic principles may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiments and examples shown but is to be given the widest possible scope in accordance with the features and principles shown and described. The particular features and advantages of the invention will become more apparent with reference to the appended
As used herein and unless the context indicates otherwise, singular forms of the terms are to be construed as including the plural form and vice versa. For instance, unless the context indicates otherwise, a singular reference, such as “a” or “an” means “one or more”. Throughout the description and claims of this specification, the words “comprise”, “including”, “having” and “contain” and variations of these words, for example “comprising” and “comprises” etc., mean “including but not limited to”, and are not intended to (and do not) exclude other components. It will be appreciated that variations to the foregoing embodiments of the invention can be made while still falling within the scope of the invention. Each feature disclosed in this specification, unless stated otherwise, may be replaced by alternative features serving the same, equivalent or similar purpose. Thus, unless stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The use of any and all examples, or exemplary language (“for instance”, “such as”, “for example”, “e.g.” and like language) provided herein, is intended merely to better illustrate the invention and does not indicate a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention. Steps described in this specification may be performed in any order or simultaneously unless stated or the context requires otherwise. All of the features disclosed in this specification may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. In particular, the preferred features of the invention are applicable to all aspects of the invention and may be used in any combination. Likewise, features described in non-essential combinations may be used separately (not in combination).
Preferred embodiments of the present invention provide a method for generating an extracted ion chromatogram (XIC or EIC) from mass spectrometry or LC-MS data.
Mass spectrometers are often used to compile a three-dimensional data array of intensity (I) or relative abundance (RA) versus mass to charge ratio (m/z) versus retention time (RT) in MS, LC-MS or LC-MSn. A total ion chromatogram (TIC) is a two-dimensional representation of this 3D data set that displays the summed intensity or relative abundance of all detected ions versus RT. An XIC displays intensity versus RT for only a limited m/z window of the 3D data set. For example, in an MS system with a relatively low-resolution mass analyzer, such as a standard quadrupole MS or an ion trap MS, if the ion of interest had a nominal m/z value of 301.0 Daltons, a typical XIC might be generated over the range 300.5 to 301.5 Daltons for analyte quantification (a major use of MS or LC-MS instruments).
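For comparison with the MST approach, the conventional window-based XIC and TIC described above might be computed as in the following sketch. It assumes hypothetical `(retention_time, mz, intensity)` tuples in which all points from one scan share an identical retention time value, and an inclusive m/z window; the function name is illustrative.

```python
def extract_ion_chromatogram(points, mz_low, mz_high):
    """Sum the intensities of points with mz_low <= m/z <= mz_high at each
    retention time. points are (retention_time, mz, intensity) tuples.
    Returns (retention_time, summed_intensity) pairs sorted by retention time."""
    totals = {}
    for rt, mz, intensity in points:
        totals.setdefault(rt, 0.0)  # every scan appears, even if nothing is in the window
        if mz_low <= mz <= mz_high:
            totals[rt] += intensity
    return sorted(totals.items())


points = [(1.0, 301.0, 100.0), (1.0, 450.2, 80.0),
          (2.0, 301.2, 150.0), (2.0, 299.9, 30.0),
          (3.0, 512.0, 60.0)]

xic = extract_ion_chromatogram(points, 300.5, 301.5)  # window around nominal m/z 301.0
tic = extract_ion_chromatogram(points, float("-inf"), float("inf"))  # all detected ions
```

Here the XIC is [(1.0, 100.0), (2.0, 150.0), (3.0, 0.0)] and the TIC is [(1.0, 180.0), (2.0, 180.0), (3.0, 60.0)]. The fixed window is exactly the kind of a priori mass range that the MST method is intended to make unnecessary.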
A data control system
In step 202 of
In the initial step 201, the mass spectrometry data is received for processing by the data control system 101. The LC/MS instrument produces an array of data points, each data point having values representing retention time, m/z and intensity. These data may be either centroid data, in which case each spectrum is represented as a set of individual peaks, or profile data, in which case each spectrum is represented as a continuous profile of points.
A data set may be represented as a 3-dimensional plot, as shown in
An extracted ion chromatogram (XIC) is generated by restricting the m/z range 302 to a narrow region of interest.
In step 203 (
Minimal spanning trees have several useful properties. They may group clusters of nearby points into sub-trees. If the separation between data points along one axis is typically less than the separation between data points along any other orthogonal axis used in the minimal spanning tree, these sub-trees will tend to form ‘branches’ that run quasi-parallel to this preferred axis. For LC/MS applications, applying a scaling factor in the m/z dimension 302 (
Several well-established algorithms exist to generate minimal spanning trees, such as Kruskal's algorithm [Kruskal, J. B., On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem, Proceedings of the American Mathematical Society, Vol. 7, No. 1, pp. 48-50, 1956] and Prim's algorithm [Prim, R. C., Shortest Connection Networks and Some Generalizations, Bell System Technical Journal, Vol. 36, pp. 1389-1401, 1957]; these publications are herein incorporated by reference. These algorithms have the desirable features of being automatic and objective, and they do not generally depend on user-defined parameters.
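As an illustration of the first cited method, a textbook Kruskal's algorithm with a union-find structure might look like the sketch below. It assumes the points have already been placed in a 2-D plane such as (retention time, scaled m/z) and that branch length is Euclidean distance; the function name and return format are hypothetical, not the embodiment's own implementation.

```python
import math
from itertools import combinations


def kruskal_mst(coords):
    """Kruskal's algorithm on the complete graph over 2-D points.

    coords: list of (x, y) positions, e.g. (retention time, scaled m/z).
    Returns the MST branches as (i, j, length) tuples with i < j.
    """
    n = len(coords)
    parent = list(range(n))  # union-find forest; each point starts alone

    def find(i):
        # Follow parent links to the component root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of points is a candidate branch; consider the shortest first.
    edges = sorted(
        (math.dist(coords[i], coords[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    branches = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # joining two separate components cannot form a cycle
            parent[root_i] = root_j
            branches.append((i, j, length))
            if len(branches) == n - 1:  # a spanning tree has n - 1 branches
                break
    return branches


mst = kruskal_mst([(0, 0), (1, 0), (1, 1), (5, 5)])
```

Considering every pair of points costs O(n² log n), which is only practical for small data sets. In the Euclidean plane the MST is a subgraph of the Delaunay triangulation, so a larger implementation could restrict the candidate branches to Delaunay edges.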
Input to step 203 in
[First_Endpoint:Second_Endpoint:Length]
These elements and the points they contain need not be in any particular order. For example, in the tree shown in
[5:6:0.1512]
By linking together branches that share endpoints, it is possible to reconstruct part or all of the tree. For example, the portion of the tree in
[5:6:0.1512]
[9:7:0.1838]
[7:6:0.2112]
[7:8:0.2990]
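The branch-list representation, and the linking of branches that share endpoints, can be sketched as follows. This assumes endpoints are integer point indices and that each element is written exactly as "[first_endpoint:second_endpoint:length]"; the helper names are hypothetical.

```python
from collections import defaultdict


def parse_branch(text):
    """Parse a branch element written as "[first_endpoint:second_endpoint:length]"."""
    first, second, length = text.strip().strip("[]").split(":")
    return int(first), int(second), float(length)


def link_branches(branches):
    """Link branches that share endpoints into an adjacency map: point -> [(neighbour, length)]."""
    adjacency = defaultdict(list)
    for a, b, length in branches:
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))
    return adjacency


def connected_points(adjacency, start):
    """Collect every point reachable from start by following linked branches."""
    seen, stack = {start}, [start]
    while stack:
        for neighbour, _ in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


elements = ["[5:6:0.1512]", "[9:7:0.1838]", "[7:6:0.2112]", "[7:8:0.2990]"]
adjacency = link_branches(parse_branch(e) for e in elements)
print(sorted(connected_points(adjacency, 5)))  # [5, 6, 7, 8, 9]
```

Because linking only uses shared endpoints, the elements can be supplied in any order, consistent with the unordered branch list described above.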
A minimal spanning tree can be divided into two or more sub-trees by pruning branches whose length exceeds a threshold value.
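Pruning, and the interpretation of each remaining sub-tree as an XIC, might be sketched as follows. Assumptions: branches are `(i, j, length)` tuples indexing into a list of `(retention_time, mz, intensity)` points; only branches strictly longer than the threshold are removed; and labeling each XIC by the mean m/z of its member points is an illustrative choice that this specification does not prescribe. The function names are hypothetical.

```python
from statistics import mean


def prune_into_subtrees(n_points, branches, length_threshold):
    """Drop branches longer than length_threshold; return the remaining
    connected sub-trees as sorted lists of point indices."""
    parent = list(range(n_points))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, length in branches:
        if length <= length_threshold:  # branches above the threshold are pruned
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def subtrees_to_xics(points, subtrees):
    """Interpret each sub-tree as one XIC: (mean_mz, [(rt, intensity), ...])."""
    xics = []
    for members in subtrees:
        trace = sorted((points[i][0], points[i][2]) for i in members)
        xics.append((mean(points[i][1] for i in members), trace))
    return xics


points = [(1.0, 301.0, 100.0), (2.0, 301.0, 150.0), (3.0, 301.0, 120.0), (1.5, 450.0, 80.0)]
branches = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 149.0)]  # illustrative branch lengths
subtrees = prune_into_subtrees(len(points), branches, length_threshold=2.0)
xics = subtrees_to_xics(points, subtrees)
```

Here the long branch to the m/z 450.0 point is cut, leaving one three-point XIC near m/z 301.0 and a single-point sub-tree. If two members of one sub-tree share a retention time, this sketch simply lists both entries; a fuller implementation might sum them.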
In step 204,
This algorithm provides a reliable and objective means of resolving ambiguities associated with the scatter plot in
In
In step
Various modifications to the described embodiments will be readily apparent to those skilled in the art. The generic principles herein may be applied to similar embodiments of the invention described herein. Thus, the present invention is not intended to be limited to the embodiments and examples shown but is to be accorded the widest possible scope in accordance with the features and principles shown and described.
Although exemplary embodiments herein refer to LC-MS applications, the scope and spirit of the invention are not meant to be limited to LC-MS applications. One skilled in the art would readily recognize that the invention may relate to many other types of mass spectrometry based systems, including but not limited to GC-MS, FT-ICR-MS, IR-MS and MALDI-MS, and also to areas that generate similar 3D data plots, for example photodiode array high performance liquid chromatography (PDA-HPLC), which produces absorbance-retention time-wavelength triplet data points.
In describing exemplary embodiments, specific terminology is used where clarity is required. For purposes of description, each specific term is intended to at least include all technical and functional equivalents that operate in a similar manner to accomplish a similar purpose. Additionally, in some instances where a particular exemplary embodiment includes a plurality of method steps, those steps may be replaced with a single step. Likewise, a single step may be replaced with a plurality of steps that serve the same purpose. It will thus be appreciated that those skilled in the art will be able to devise various alternatives that, although not explicitly shown or described herein, embody the principles of the invention and thus are within its spirit and scope.