MASS LABELS

This invention relates to useful reactive labels for labelling peptides and to methods for deconvoluting or simplifying mass spectra in order to identify and quantify peptides. More specifically, the invention relates to methods for the identification of peaks in a spectrum that result from ions from a sample under investigation, and of peaks that result from background radiation, noise or other non-data sources. In particular, the method identifies peaks having specific distributions of isotopic variants. The invention is thus capable of rapidly identifying ions with characteristic isotope distributions by comparison with pre-determined isotope distribution templates. These methods are of particular value for the analysis of data obtained by high resolution and high mass accuracy mass analysers such as orbitraps and time-of-flight mass analysers.

BACKGROUND

Mass spectrometry is emerging as the favoured tool for the analysis of large biomolecules, particularly for the analysis of peptides and proteins. Mann and co-workers, for example, have shown that the mass of a single peptide along with partial sequence information, which can be determined through collision induced dissociation of the peptide, can be sufficient to identify the parent protein (1). Consequently, new methods are being developed in which specific peptides are isolated from each protein in a mixture. Conceptually, the simplest approach to the analysis of complex polypeptide mixtures is seen in the MudPIT procedure in which a mixture of polypeptides is digested with a protease and all digest peptides are analysed by Liquid Chromatography Mass Spectrometry (LC-MS) (2,3). The MudPIT approach overcomes the problem of the complexity of the sample by attempting to separate all of these peptides with high resolution multi-dimensional chromatography, but it is not uncommon for many peptides to elute from the chromatographic column simultaneously. Liquid Chromatography separations are generally interfaced to Mass Spectrometry by an electrospray ionisation source. Electrospray ionisation is a very ‘gentle’ technique for getting ions in the liquid phase into the gas phase but ionisation of large biomolecules tends to result in ions being present in multiple charge states complicating the resulting mass spectra (4). Thus the mass spectra that result from the combination of MudPIT and electrospray mass spectrometry are very complex.

In addition, over the last fifteen years a range of chemical mass tags bearing heavy isotope substitutions have been developed to enable and improve the quantitative analysis of biomolecules by mass spectrometry. Depending on the tag design, members of tag sets are either isochemic having the same chemical structure but different absolute masses, or isobaric having both identical structure and absolute mass. Isochemic tags are typically used for quantitation in MS mode whilst isobaric tags must be fragmented in MS/MS mode to release reporter fragments with a unique mass. To date the isotopically doped mass tags have primarily been employed for the analysis of proteins and nucleic acids.

Early examples of isochemic mass tags were the Isotope-Coded Affinity Tags (ICAT) (5). The ICAT reagents are a pair of mass tags bearing a differential incorporation of heavy isotopes in one (heavy) tag with no substitutions in the other (light) tag. Two samples are labelled with either the heavy or light tag and then mixed prior to analysis by LC-MS. A peptide present in both samples will give a pair of precursor ions with masses differing in proportion to the number of heavy isotope atomic substitutions.

The ICAT method also illustrates ‘sampling’ methods, which are useful as a way of reconciling the need to deal with small populations of peptides, to reduce the complexity of the mass spectra generated, with the need to retain sufficient information about the original sample to identify its components. The ‘isotope encoded affinity tags’ used in the ICAT procedure comprise a pair of biotin linker isotopes, which are reactive to thiols, for the capture of peptides containing cysteine. Typically 90 to 95% of proteins in a proteome will have at least one cysteine-containing peptide, and cysteine-containing peptides typically represent about 1 in 10 peptides overall, so analysis of cysteine-containing peptides greatly reduces sample complexity without losing significant information about the sample. Thus, in the ICAT method, a sample of protein from one source is reacted with a ‘light’ isotope biotin linker while a sample of protein from a second source is reacted with a ‘heavy’ isotope biotin linker, which is typically 4 to 8 daltons heavier than the light isotope. The two samples are then pooled and cleaved with an endopeptidase. The biotinylated cysteine-containing peptides can then be isolated on avidinated beads for subsequent analysis by mass spectrometry. The two samples can be compared quantitatively: corresponding peptide pairs act as reciprocal standards allowing their ratios to be quantified. The ICAT sampling procedure produces a mixture of peptides that represents the source sample and that is less complex than that produced by MudPIT, but large numbers of peptides are still isolated and their analysis by LC-MS/MS generates complex spectra. With two ICAT tags, the number of peptide ions in the mass spectrum is doubled compared to a label-free analysis. Further examples of isochemic tags include the ICPL reagents, which provide up to four different reagents; with ICPL the number of peptide ions in the mass spectrum is quadrupled compared to a label-free analysis. For this reason, it is unlikely to be practical to develop very high levels of multiplexing with simple heavy isotope tag designs.

Whilst isochemic tags allow quantification in proteomic studies and assist with experimental reproducibility, this is achieved at the cost of increasing the complexity of the mass spectrum. To overcome this limitation, and to take advantage of greater specificity of tandem mass spectrometry, isobaric mass tags were developed. Since their introduction in 2000 (WO 01/68664), isobaric mass tags have provided improved means of proteomic expression profiling by universal labelling of amine functions in proteins and peptides prior to mixing and simultaneous analysis of multiple samples. Because the tags are isobaric, having the same mass, they do not increase the complexity of the mass spectrum since all precursors of the same peptide will appear at exactly the same point in the chromatographic separation and have the same aggregate mass. Only when the molecules are fragmented prior to tandem mass spectrometry are unique mass reporters released, thereby allowing the relative or absolute amount of the peptide present in each of the original samples to be calculated.

U.S. Pat. No. 7,294,456 sets out the underlying principles of isobaric mass tags and provides specific examples of suitable tags wherein different specific atoms within the molecules are substituted with heavy isotope forms including 13C and 15N respectively. U.S. Pat. No. 7,294,456 further describes the use of offset masses to make multiple isobaric sets to increase the overall plexing rates available without unduly increasing the size of the individual tags. WO 2004/070352 describes additional sets of isobaric mass tags. WO 2007/012849 describes further sets of isobaric mass tags including 3-[2-(2,6-Dimethyl-piperidin-1-yl)-acetylamino]-propanoic acid-(2,5-dioxo-pyrrolidine-1-yl)-ester (DMPip-βAla-OSu).

Despite the significant benefits of previously disclosed isobaric mass tags, these tags require MS/MS analysis to quantify peptides, and peptides are typically analysed individually, meaning that there is a finite limit on the number of peptides that can be analysed by a single MS/MS-capable instrument in a given amount of time. In a typical analysis, the number of peptides to be analysed exceeds the throughput capability of the instrument.

MS-mode analysis of peptides is useful in that multiple peptides can be analysed simultaneously, increasing the throughput. In addition, with high mass accuracy many peptides can be identified by their mass alone through so-called Accurate Mass Tag (AMT) analysis (6,7). Thus with high mass accuracy MS-mode analysis it is possible to identify a very substantial proportion of any given proteome relatively rapidly. However, it has not been generally shown that it is possible to identify and quantify proteomes using MS-mode tags and AMT approaches, as the MS-mode tags introduce additional complexity and ambiguities into AMT database searches.

Recently, with dramatic improvements in mass accuracy and mass resolution enabled by high mass resolution mass spectrometers such as the Orbitrap (8,9), Fourier Transform Ion Cyclotron Resonance (FT-ICR) mass spectrometers (10) and high resolution Time-of-Flight (TOF) mass spectrometers (11), it has become possible to resolve millidalton differences between ion mass-to-charge ratios. This high resolution capability has been exploited to increase multiplexing of Isobaric Tandem Mass Tags using heavy nucleon substitutions of 13C for 15N, which results in 6.3 millidalton differences in nominally isobaric reporter ions (12,13). Similarly, it has been shown that metabolic labelling with lysine isotopes comprising millidalton mass differences can be resolved by high-resolution mass spectrometry, enabling multiplexing and relative quantification of samples in yeast (14). The authors of that study propose that chemical tags comprising millidalton differences for MS-mode analysis of peptides would be useful but do not suggest any specific tags. Tags comprising very small mass differences are useful in that labelled ions that are related to each other, e.g. corresponding peptides from different samples, will cluster closely in the same ion envelope with very distinctive and unnatural isotope patterns that are readily recognisable. Such clusters are much less likely to interfere with the identification of other peptides, because the ion envelope of the labelled peptides occupies essentially the same space in the mass spectrum as that of the unlabelled species.
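By way of illustration only, the resolving power needed to separate two such closely spaced ions may be estimated as the mass-to-charge ratio divided by the spacing on the m/z axis. The following Python sketch assumes this simple definition; the function name, the m/z value of 500 and the charge state of 2 are hypothetical and are not taken from the cited references. In practice the requirement also depends on peak shape and on the relative intensities of the two ions.

```python
def required_resolving_power(mz, delta_mass_da, charge=1):
    """Estimate the resolving power (m/z divided by the m/z spacing) needed
    to separate two ions whose masses differ by delta_mass_da at a given
    charge state. Simplified estimate; peak shape is ignored."""
    # A mass difference is compressed by the charge on the m/z axis.
    delta_mz = delta_mass_da / charge
    return mz / delta_mz


# Hypothetical example: a doubly charged peptide ion at m/z 500 carrying
# tags that differ by 6.3 mDa (0.0063 Da).
if __name__ == "__main__":
    r = required_resolving_power(500.0, 0.0063, charge=2)
    print(f"approximate resolving power required: {r:,.0f}")
```

The estimate scales linearly with both m/z and charge state, which is why millidalton tag spacings are most readily resolved for small, low-charge ions.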

It is thus an objective of this invention to provide sets of isochemic reactive tags for the purposes of labelling peptides and other biomolecules where the tags in a set are differentiated by very small differences in mass.

Furthermore, while isochemic tags comprising very small mass differences give rise to highly distinctive mass spectra, manual analysis of such spectra would be highly time-consuming particularly for complex samples. Consequently, there is a need for software to rapidly and automatically deconvolute these complex spectra, particularly those generated by electrospray ionisation of peptide mixtures, and to identify specific ion classes in the spectra. Peptides have characteristic isotope distributions due to their relatively predictable carbon, nitrogen, oxygen and hydrogen distributions. Some elements are typically not present in peptides, such as halogen atoms while others, such as sulphur and phosphorus are occasionally present. These different atomic compositions give rise to characteristic isotope compositions for peptides due to the natural variations in the abundances of the isotopes of the elements that typically comprise a peptide. Such distributions can in principle be detected in mass spectral data but effective software for this purpose is not readily available. Similarly, altered distributions can be created by labelling peptides with the tags of this invention that are separated by very small mass differences. There is however no software readily available for the automatic processing of spectra to identify ions with characteristic isotope abundance distributions in complex spectra.
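As a minimal sketch of how such a characteristic distribution arises, the relative abundances of the M, M+1, M+2 peaks of an unlabelled species can be approximated by repeatedly convolving the natural isotope abundances of its constituent atoms. The abundance values below are rounded textbook figures, and the elemental composition used in the example is hypothetical; both are assumptions made for illustration. The model works at nominal mass resolution only and is not the method of the invention.

```python
# Approximate natural abundances at nominal-mass offsets +0, +1, +2.
# Rounded textbook values, assumed here for illustration only.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],
}


def convolve(a, b):
    """Discrete convolution of two abundance lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def isotope_envelope(formula, n_peaks=5):
    """Relative abundances of the first n_peaks isotope peaks for an
    elemental formula given as {element: atom count}."""
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            # Truncating is safe: higher peaks never feed back into lower ones.
            dist = convolve(dist, ABUNDANCE[element])[:n_peaks]
    total = sum(dist)  # renormalise over the retained peaks only
    return [p / total for p in dist]


if __name__ == "__main__":
    # Hypothetical peptide-like composition, not taken from the text.
    envelope = isotope_envelope({"C": 50, "H": 80, "N": 14, "O": 15})
    for k, p in enumerate(envelope):
        print(f"M+{k}: {p:.3f}")
```

A computed envelope of this kind could serve as one form of pre-determined template against which observed peak clusters are compared.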

It is thus a further aim of the present invention to provide a method for distinguishing between peaks in a mass spectrum that result from biomolecules labelled with isotopologue mass labels comprising very small mass differences, and peaks that do not, in order to deconvolute and/or simplify the spectrum. In particular, it is an aim of this invention to provide methods of identifying ions with characteristic isotope distributions in mass spectra, even if the ions may have widely different masses and may exist in multiple charge states.

It is a further object of this invention to provide automated methods of interpreting spectra to identify and quantify ions present in the spectra. In particular, it is an objective to provide methods to identify specific features of labelled peptides to assist in the identification of the peptides.

STATEMENT OF INVENTION

The present invention provides a set of two or more mass labels, wherein each mass label in the set has the same integer mass as every other label in the set, and each mass label in the set has an exact mass which is different to the exact mass of all other mass labels in the set such that all the mass labels in the set are distinguishable from each other by mass spectrometry.

The term mass label used in the present context is intended to refer to a moiety suitable to label an analyte for determination. The term label is synonymous with the term tag.

The exact mass of a mass label is the theoretical mass of the mass label and is the sum of the exact masses of the individual isotopes of the molecule, e.g. ¹²C=12.000000, ¹³C=13.003355, ¹H=1.007825, ¹⁶O=15.994915. This mass takes account of mass defects. The integer mass is also known as the nominal mass, and is the sum of the integer masses of each isotope of each nucleus that comprises the molecule, e.g. ¹²C=12, ¹³C=13, ¹H=1, ¹⁶O=16. The integer mass of an isotope is the sum of protons and neutrons that make up the nucleus of the isotope, i.e. ¹²C comprises 6 protons and 6 neutrons while ¹³C comprises 6 protons and 7 neutrons. This is often also referred to as the atomic mass number or nucleon number of an isotope.
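The distinction between exact mass and integer mass can be illustrated with a short Python sketch using the isotope masses quoted above. The function names and the example molecule (methanol, CH₄O, with and without a single ¹³C) are hypothetical choices made for illustration and are not mass labels of the invention.

```python
# Exact isotope masses in unified atomic mass units, as quoted in the text.
EXACT_MASS = {"12C": 12.000000, "13C": 13.003355, "1H": 1.007825, "16O": 15.994915}
# Integer (nominal) masses: number of protons plus neutrons in each nucleus.
INTEGER_MASS = {"12C": 12, "13C": 13, "1H": 1, "16O": 16}


def exact_mass(composition):
    """Sum of exact isotope masses for a {isotope: count} composition."""
    return sum(EXACT_MASS[iso] * n for iso, n in composition.items())


def integer_mass(composition):
    """Sum of nucleon numbers for a {isotope: count} composition."""
    return sum(INTEGER_MASS[iso] * n for iso, n in composition.items())


# Hypothetical example molecule: methanol, unlabelled and with one 13C.
light = {"12C": 1, "1H": 4, "16O": 1}
heavy = {"13C": 1, "1H": 4, "16O": 1}

if __name__ == "__main__":
    for name, comp in (("light", light), ("heavy", heavy)):
        print(name, integer_mass(comp), round(exact_mass(comp), 6))
```

The difference between the two quantities for a given composition is the mass defect referred to above.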

In one embodiment of the set of two or more mass labels, each mass label comprises a reporter moiety, and each mass label in the set has a reporter moiety which has an exact mass which is different to the exact mass of the reporter moiety of every other label in the set such that the reporter moieties are distinguishable by mass spectrometry.

In another embodiment of the set of two or more mass labels, each mass label comprises a reporter moiety, and each mass label in the set has a reporter moiety which has an integer mass which is different to the integer mass of the reporter moiety of every other label in the set such that the reporter moieties are distinguishable by mass spectrometry.

The difference in exact mass between at least two of the mass labels is usually less than 100 millidaltons, preferably less than 50 millidaltons, most preferably less than 20 millidaltons (mDa). Typically, the difference in exact mass between at least two of the mass labels in a set is 2.5 mDa, 2.9 mDa, 6.3 mDa, 8.3 mDa, 9.3 mDa, or 10.2 mDa due to common isotope substitutions as set out in Table 4 below. For example, if a first label comprises a ¹³C isotope, and in a second label this ¹³C isotope is replaced by ¹²C, and a ¹⁴N isotope is replaced by a ¹⁵N isotope, the difference in exact mass between the two labels will be 6.3 mDa.
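The arithmetic behind such differences may be sketched as follows. The isotope masses are standard tabulated values rounded to six decimal places. Only the ¹³C/¹⁵N pairing is taken from the example above; the other two pairings shown are assumptions chosen for illustration, although their results fall close to two of the values listed above.

```python
# Isotope masses in unified atomic mass units (tabulated values, rounded).
MASS = {
    "1H": 1.007825, "2H": 2.014102,
    "12C": 12.000000, "13C": 13.003355,
    "14N": 14.003074, "15N": 15.000109,
    "16O": 15.994915, "18O": 17.999160,
}


def shift(light, heavy, count=1):
    """Mass gained when `count` atoms of `light` are replaced by `heavy`."""
    return count * (MASS[heavy] - MASS[light])


def mda(a, b):
    """Absolute difference between two mass shifts, in millidaltons."""
    return abs(a - b) * 1000.0


# One 13C versus one 15N (the example given in the text).
c13_vs_n15 = mda(shift("12C", "13C"), shift("14N", "15N"))
# One 2H versus one 13C (illustrative assumption).
h2_vs_c13 = mda(shift("1H", "2H"), shift("12C", "13C"))
# One 18O versus two 13C (illustrative assumption).
o18_vs_2c13 = mda(shift("16O", "18O"), shift("12C", "13C", count=2))

if __name__ == "__main__":
    print(f"13C vs 15N:   {c13_vs_n15:.2f} mDa")
    print(f"2H vs 13C:    {h2_vs_c13:.2f} mDa")
    print(f"18O vs 2x13C: {o18_vs_2c13:.2f} mDa")
```

In each pairing the two substitutions add the same number of nucleons, so the integer mass is unchanged and only the exact mass differs.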

In a preferred embodiment of the set of two or more mass labels, each mass label in the set is an isotopologue of every other mass label in the set. Isotopologues are chemical species that differ only in the isotopic composition of their molecules. For example, water has three hydrogen-related isotopologues: HOH, HOD and DOD, where D stands for deuterium (²H). Isotopologues are distinguished from isotopomers (isotopic isomers) which are isomers having the same number of each isotope but in different positions. The invention provides a set of 2 or more isotopologue mass labels where the tags have the same integer mass but are differentiated from each other by very small differences in mass such that individual tags are differentiated from the nearest tags by typically less than 100 millidaltons.

Typically, the difference in exact mass is provided by a different number or type of heavy isotope substitution(s).

In a preferred embodiment the set comprises n mass labels, where the mᵗʰ mass label comprises (n−m) atoms of a first heavy isotope and (m−1) atoms of a second heavy isotope different from the first, wherein m has values from 1 to n. Typically, each heavy isotope is ²H, ¹³C or ¹⁵N. Preferably, the first heavy isotope is ¹³C and the second heavy isotope is ¹⁵N.
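A minimal Python sketch of this numbering scheme, assuming the preferred ¹³C/¹⁵N pairing, is given below. The function names and the choice of n = 4 are hypothetical. The per-atom mass gains are derived from standard tabulated isotope masses.

```python
# Mass gained per heavy-isotope substitution, in unified atomic mass units.
SHIFT = {"13C": 1.003355, "15N": 0.997035}


def label_set(n, first="13C", second="15N"):
    """Heavy-isotope counts for each of the n labels, m = 1..n:
    (n - m) atoms of the first isotope and (m - 1) of the second."""
    return [{first: n - m, second: m - 1} for m in range(1, n + 1)]


def added_mass(composition):
    """Total exact mass added by the heavy isotopes in one label."""
    return sum(SHIFT[iso] * count for iso, count in composition.items())


labels = label_set(4)  # hypothetical set of four labels

if __name__ == "__main__":
    for m, comp in enumerate(labels, start=1):
        total_heavy = sum(comp.values())
        print(m, comp, total_heavy, round(added_mass(comp), 6))
```

Every label in the sketch carries n−1 heavy atoms in total, each adding one nucleon, so the integer mass is constant across the set, while each step from one label to the next exchanges one ¹³C for one ¹⁵N and changes the exact mass by about 6.3 mDa.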

In another embodiment, the set comprises n mass labels, wherein the mᵗʰ mass label comprises (n−m) atoms of a first heavy isotope selected from ¹⁸O or ³⁴S and (2m−2) atoms of a second heavy isotope different from the first selected from ²H, ¹³C or ¹⁵N, wherein m has values from 1 to n.
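The factor of two in this embodiment can be understood from a short worked calculation, offered here as an illustrative sketch. It assumes that each atom of the first heavy isotope (¹⁸O or ³⁴S) adds two nucleons relative to the most abundant isotope and that each atom of the second heavy isotope adds one. The symbols ΔN, ΔM, δ₁ and δ₂ are introduced only for this illustration, and the numerical example assumes ¹⁸O paired with ¹³C using standard tabulated isotope masses.

```latex
% Nominal mass added to the m-th label: independent of m, so the set is isobaric.
\begin{align*}
\Delta N_m &= 2\,(n-m) + 1\cdot(2m-2) = 2\,(n-1) \\
% Exact mass step between adjacent labels, where \delta_1 and \delta_2 are the
% exact mass gains per atom of the first and second heavy isotope.
\Delta M_{m+1} - \Delta M_m &= 2\,\delta_2 - \delta_1 \\
% Assumed example: first isotope 18O, second isotope 13C.
2\,(1.003355) - 2.004245 &= 0.002465\ \text{u} \approx 2.5\ \text{mDa}
\end{align*}
```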

In one embodiment of the set of two or more mass labels, each label comprises the formula:

X-L-M

wherein X is a reporter moiety, L is a linker cleavable by collision in a mass spectrometer, and M is a mass modifier, and wherein each mass label further comprises a reactive functionality Re for attaching the mass label to an analyte.

The term reporter moiety is used to refer to a moiety to be detected independently, typically after cleavage, by mass spectrometry. However, it will be understood that the remainder of the mass label attached to the analyte as a complement ion may also be detected in methods of the invention. The mass modifier is a moiety which is incorporated into the mass label to ensure that the mass label has a desired exact mass. The reporter moiety of each mass label may sometimes comprise no heavy isotopes.

In some embodiments the Reactive functionality, Re, may be linked through the X group while in other embodiments the Reactive functionality, Re, may be linked through the M group as follows:

X-M-Re or M-X-Re

Typically each mass label comprises the general formula:

X-(L)_k1-M-(L)_k2-Re or M-(L)_k1-X-(L)_k2-Re;

wherein k1 and k2 are independently integers between 0 and 10.

One or more of the moieties X, M, L or Re may be modified with heavy isotopes to achieve the desired exact and/or integer mass.

In a preferred embodiment the linker L comprises an amide bond.

In a most preferred embodiment the reporter moiety is a mass marker moiety, and the mass modifier is a mass normalization moiety, wherein the mass normalization moiety ensures that each mass label has a desired integer or exact mass. The term mass marker moiety used in the present context is intended to refer to a moiety that is to be detected by mass spectrometry.

The term mass normalisation moiety used in the present context is intended to refer to a moiety that is not necessarily to be detected by mass spectrometry, but is present to ensure that a mass label has a desired aggregate mass. However, the mass normalisation moiety may be detected as part of a complement ion (see below). The mass normalisation moiety is not particularly limited structurally, but merely serves to vary the overall mass of the mass label.

In one embodiment, the mass labels are isotopologues of Tandem Mass Tags as defined in WO 01/68664.

Typically, each mass label in the set has one of the following general structures:

embedded image

wherein * represents that oxygen is ¹⁸O, carbon is ¹³C, nitrogen is ¹⁵N or hydrogen is ²H, and wherein each label in the set comprises one or more * such that, in the set of n tags, the mᵗʰ tag comprises (n−m) atoms of a first heavy isotope and (m−1) atoms of a second heavy isotope different from the first, m is from 1 to n and n is 2 or more; and wherein the cyclic unit is aromatic or aliphatic and comprises from 0-3 double bonds independently between any two adjacent atoms; each Z is independently N, N(R¹), C(R¹), CO, CO(R¹) (i.e. -O-C(R¹)- or -C(R¹)-O-), C(R¹)₂, O or S; X is N, C or C(R¹); each R¹ is independently H, a substituted or unsubstituted straight or branched C₁-C₆ alkyl group, a substituted or unsubstituted aliphatic cyclic group, a substituted or unsubstituted aromatic group or a substituted or unsubstituted heterocyclic group or an amino acid side chain; and a is an integer from 0-10; and b is at least 1, and wherein c is at least 1.

In an embodiment of the invention, each mass label in the set has one of the following structures:

embedded image

wherein * represents that the oxygen is ¹⁸O, the carbon is ¹³C or the nitrogen is ¹⁵N or, at sites where the heteroatom is hydrogenated, * may represent ²H, and wherein each label in the set comprises one or more * such that, in the set of n mass labels, the mᵗʰ mass label comprises (n−m) atoms of a first heavy isotope and (m−1) atoms of a second heavy isotope different from the first, wherein m has values from 1 to n and n is 2 or more.

A set of mass labels according to the invention may comprise the following mass labels:

embedded image