The invention relates to methods useful for the structural analysis of glycans. Methods are disclosed for sequencing glycans using stepwise disassembly processes by analysis of the fragments produced therein. Methods are additionally provided for identifying sequential mass spectrometry (MSn) disassembly pathways that are inconsistent with a set of expected structures, and which therefore may indicate the presence of alternative isomeric structures. A method for interactive spectra annotation is also provided.
Glycans include, for example, oligosaccharides that are conjugated to fats (lipids) and to over half of human proteins and other important biomolecules, and play important roles in a wide variety of biological processes. Unlike linear DNA and proteins, glycans are not direct gene products, but instead are synthesized by a step-wise process regulated by numerous enzymes called glycosyltransferases. Therefore, glycan structure cannot be accurately predicted by interpretation of the genetic code and requires sophisticated alternative methods for analysis.
Additionally, glycans are complex branched structures, where one monosaccharide residue may be linked to several others. These linkages also have variables such as linkage position and anomericity, resulting in astonishing numbers of theoretically possible structures. These intrinsic properties make glycan analysis (for example, sequencing or detecting isomeric glycans) a considerable technical challenge.
Glycans are significant in a number of biological and biomedical research areas. For instance, glycans are biomarkers for various cancers and the principal component of new and promising vaccines for diverse cancers, viruses (Dwek et al, Nat. Rev. Drug. Discov., 1: 65-75 (2002)), and bacteria. They drive parasite-host and microbe-host interactions, as well as egg fertilization and protein folding. They are crucial to drug development efforts and are involved in allergic and inflammatory responses. Defective glycan metabolism manifests itself as Congenital Disorders of Glycosylation, Gaucher, Fabry, Tay-Sachs, and Sandhoff diseases, among others. Research in these and related areas is hindered by the lack of effective glycan sequencing tools and methods.
In light of the biological and biomedical importance of glycans, methods useful for the structural analysis of glycans are of considerable utility. Because glycans cannot be amplified as DNA can, glycan sequencing and structural analysis technologies must operate on minute quantities of oligosaccharides. Structural analysis can be augmented with enzymes that cleave glycans in well-defined ways, but these methods are restricted by the limited number of available exo- and endoglycosidases and by the fact that many such enzymes are not completely specific. As such, a need exists for improved glycan sequencing tools and methods.
The invention provides methods useful for glycan structural analysis that employ stepwise disassembly processes. Analysis of the fragments generated by such processes is used, for example, in glycan sequencing and in the determination of isomeric glycans. Stepwise disassembly processes include mass spectrometry (MS) and sequential mass spectrometry (MSn), the sensitivity of which is useful when working with minute analytic samples. The use of mass spectrometry in glycan analysis has largely been limited to the composition of glycan structures as obtaining sequence information has continued to pose considerable technical challenges (Sheridan, Nat Biotechnol. 25: 145-146, 2007). The invention also provides methods of interactive spectra annotation.
In the first aspect, the invention provides a method of glycan sequencing. This method accordingly includes the steps of:
where steps (e)-(h) are, optionally, repeated at least once; and
where fragmentation patterns are mapped to a precomputed composition database.
In certain embodiments, steps (e)-(h) are repeated for all precursor spectra or for a subset of precursor spectra in the fragmentation tree.
In other embodiments, the terminus of the fragmentation tree in (b) is the terminal member, the root member, or an intermediate member.
In some embodiments, the possible substructures generated in (b) are all possible substructures or a subset of all possible structures.
In still other embodiments, a scoring method is used to determine acceptable candidate structures. In certain embodiments, the scoring method includes
In some embodiments, the stepwise disassembly process includes sequential mass spectrometry. In particular embodiments, sequential mass spectrometry uses:
In other embodiments, the stepwise disassembly process further includes the use of at least one glycosidase. In further embodiments, the stepwise disassembly process includes
where steps (d)-(g) are repeated for each remaining pool prepared in (a); and
where the digest of (e) is optionally purified prior to step (f).
In another aspect, the invention provides a method of detecting glycan isomers using sequential mass spectrometry (MSn) including the steps of:
where the disassembly patterns are mapped to a precomputed composition database.
In certain embodiments, the peak selection of (c) is done by a human operator or using a computer algorithm or computer program.
In other embodiments, the scoring method includes identifying each FCP as consistent, possibly consistent, or inconsistent with the corresponding m/z pathway. In still other embodiments, the scoring method involves assigning numerical values to each FCP.
In another aspect, the invention provides a method of interactively annotating a MSn spectrum of an experimental sample including the following steps:
where possible compositions that correspond to a precursor are used to annotate a spectrum;
where ions that do not satisfy a determined threshold are optionally excluded;
where any of the steps (a)-(e), or any combination thereof, may be performed on a precursor more than once; and
where steps (a)-(e) are optionally performed on more than one precursor in a spectrum.
In certain embodiments, the compositions identified in step (d) as not corresponding to the precursor in (c) are eliminated.
In other embodiments, the ions that do not satisfy a determined threshold are excluded. In still other embodiments, the determined threshold may be set by a human operator. In some embodiments, the determined threshold is set by a computer algorithm or program.
In some embodiments, the experimental sample includes a glycan. In certain embodiments, the glycan comprises a five-residue N-linked core.
In any of the methods of the invention, the glycan is a purified glycan, a native glycan, a derivatized glycan, or a glycan that has been cleaved from a glycoconjugate. In any of the methods of the invention, the glycan may be a synthetic glycan. In some embodiments, the glycan has been cleaved from a glycoconjugate using a chemical method or a physical method. In other embodiments, the glycan that is cleaved from a glycoconjugate is a native glycan.
In certain embodiments, the derivatized glycan results from chemical reduction, attachment of a mass tag to the reducing end, functionalization of hydroxyl groups, or any combination thereof. In some embodiments, the derivatized glycan can be optionally purified.
Any of the methods of the invention may be used in any applications where structural analysis of glycans is useful. For example, the methods are useful for the analysis of biomolecules that have a glycoconjugate, including but not limited to, glycoproteins, glycolipids, and glycosaminoglycans (GAGs). These methods may also be used to analyze N-glycans, O-glycans, glycosaminoglycans (GAGs), and all other oligosaccharides that are not conjugated to another biomolecule.
Applications in which methods for the structural analysis of glycans are useful include, but are not limited to: biomarker discovery; drug discovery, manufacturing, and quality control; parasite/host interaction; infectious disease; egg fertilization; embryonic development; protein folding; glycan-modified protein function; cell adhesion; inter- and intra-cellular signaling; molecular recognition; allergic and inflammatory responses; and defective glycan metabolism (e.g., Congenital Disorders of Glycosylation, Gaucher, Fabry, Tay-Sachs, and Sandhoff diseases, among others). In all of these instances, the use of the methods of the invention can provide information about glycan structure that can lead to insights into biological function.
As used herein, by “candidate structure” is meant a proposed glycan structure or substructure resulting from analysis of fragmentation patterns. A candidate structure can be further analyzed to determine whether it has met a threshold level of acceptability established using scoring methods.
As used herein, by “corresponds sufficiently” is meant that the threshold level of acceptability established by the scoring method used for evaluation has been met.
As used herein, by “derivatized glycan” is meant any glycan that has been chemically modified. Glycans can be chemically modified by procedures standard in the art that include, but are not limited to: chemical reduction, attachment of a mass tag to the reducing end, functionalization of hydroxyl groups (e.g., permethylation or peracetylation), or by any combination of these procedures. A derivatized glycan may be optionally purified. Derivatized glycans may optionally be released from a glycoconjugate by procedures standard in the art that include, but are not limited to: chemical methods (e.g., hydrazine or PNGase F) and physical methods (e.g., fragmentation via CID within a mass spectrometer).
As used herein, by “disassembly pattern” is meant any information about a set of glycan structures or substructures that results from performing a stepwise disassembly process on a sample, e.g., a polypeptide or fragment thereof, that includes a glycan. A non-limiting example of a disassembly pattern is the fragmentation pattern obtained by performing mass spectrometry on a sample.
As used herein, by “dissociation mode” is meant the method by which gas phase ions are fragmented in a stepwise disassembly process (for example, sequential mass spectrometry). In sequential mass spectrometry, exemplary dissociation modes include, but are not limited to: collision-induced dissociation (CID), in-source fragmentation, infrared multi-photon dissociation (IRMPD), electron capture dissociation (ECD), and electron transfer dissociation (ETD).
As used herein, by “downtree” or “down-tree” is meant the process of comparing a proposed glycan structure against successive product spectra, moving “down” the fragmentation tree. Scoring may be utilized to rank the proposed structures according to how well each fits the experimental spectra.
As used herein, by “experimental mode” is meant the type of charged gas phase ions produced by a mass spectrometry technique such as, for example, sequential mass spectrometry. In positive experimental mode, positively charged ions are produced. In negative experimental mode, negatively charged ions are produced.
As used herein, by “extended m/z pathway” is meant appending the m/z value of a peak observed in a mass spectrum to the m/z pathway associated with said mass spectrum.
As used herein, by “feasible composition pathway” or “FCP” is meant the compositions of a proposed glycan, or substructures thereof, that could result from a stepwise disassembly process. Feasible composition pathways are generated from a corresponding extended m/z pathway.
As used herein, by “fragmentation” is meant the rupturing of covalent bonds in a glycan, or substructure thereof, following the performance of a stepwise disassembly process. For example, fragmentation can be accomplished by performing mass spectrometry on said glycan or substructure thereof.
As used herein, by “fragmentation pattern” is meant the collection of substructures formed by the fragmentation of a given glycan or a given substructure thereof. A fragmentation pattern is also a collection of fragmentation values. For example, performing mass spectrometry on a glycan will yield a collection of substructures that can be represented by the corresponding m/z peaks, often represented as a mass spectrum. In tandem mass spectrometry, the m/z peak representing an unfragmented glycan may be subsequently isolated and fragmented, yielding a fragmentation pattern for the m/z peak. In sequential mass spectrometry, also known as MSn, this isolate/fragment cycle can be repeated multiple times, allowing for sequential disassembly of the glycan.
As used herein, by “fragmentation tree” is meant a collection of fragmentation patterns. The fragmentation tree includes the fragmentation pattern of the glycan as well as fragmentation patterns for the substructures formed from the initial fragmentation or from multiple disassembly steps. For example, sequential mass spectrometry on a glycan affords a fragmentation pattern that includes the peaks corresponding to the gas phase ions formed by the glycan as well as the peaks formed by further fragmentation of the gas phase ions.
As used herein, by “fragmentation value” is meant a numerical value used to represent the substructures formed following fragmentation of a glycan or substructures thereof. For example, the m/z value for a given peak represents the fragmentation value when mass spectrometry is used.
As used herein, by “glycan” is meant a monosaccharide, an oligosaccharide, a polysaccharide, or these structures found in glycoconjugates. Exemplary glycoconjugates are glycoproteins, glycolipids, and glycosaminoglycans. Glycoconjugates also include gangliosides. A glycan may be a native glycan or it may be a derivatized glycan. A glycan may be synthetic or naturally occurring. For example, a glycan may be a synthetic glycan having the structure of a native glycan. Both N-glycans and O-glycans are useful in the methods of the invention. Glycans that are purified are also useful in the methods of the invention. Glycans may optionally be released from a glycoconjugate by procedures standard in the art that include, but are not limited to: chemical methods (e.g., hydrazine or PNGase F) and physical methods (e.g., fragmentation via CID within a mass spectrometer).
As used herein, by “high abundance” is meant that the ratio of (peak intensity)/(intensity of most abundant ion in MS spectrum) for a given peak is determined to exceed a defined value. The ratio may be between the relative intensities of the target and most abundant peaks, the areas under the two peaks, or between any similar metric that expresses the relative abundance of the two peaks. The defined value may be established by the operator or through the use of analytical software or other algorithms, or by a combination of operator and algorithms or software. For example, an operator or algorithm can determine that a high abundance peak occurs when the ratio of area of the selected peak to the most abundant peak is at least 0.05 (i.e., 5%).
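The abundance test described above can be illustrated with a minimal sketch. The function name, the dictionary representation of a spectrum, and the intensity values are hypothetical and chosen only for illustration; the 0.05 default mirrors the example ratio given in the definition, and the "at least" comparison follows that example.

```python
def high_abundance_peaks(peaks, threshold=0.05):
    """Return the peaks whose abundance relative to the most abundant peak
    is at least `threshold`.

    `peaks` maps an m/z value to an abundance metric (intensity or area).
    Both the data layout and the default threshold are assumptions made
    for this sketch.
    """
    if not peaks:
        return {}
    base = max(peaks.values())  # abundance of the most abundant ion
    if base <= 0:
        raise ValueError("the most abundant peak must have positive abundance")
    return {mz: ab for mz, ab in peaks.items() if ab / base >= threshold}


# Hypothetical spectrum: m/z -> intensity
spectrum = {316.2: 40.0, 506.2: 1000.0, 710.3: 60.0, 914.4: 49.0}
kept = high_abundance_peaks(spectrum)
```

With the hypothetical spectrum above, only the peaks at m/z 506.2 and 710.3 reach the 5% ratio.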
As used herein in connection with the molecular structure of a glycan, by “internal” is meant a monosaccharide that is not at the reducing end or at the non-reducing end of a glycan.
As used herein in connection with a fragmentation tree, by “intermediate member” is meant a member of the fragmentation tree that is not a terminal member or the root.
As used herein, by “ionization method” is meant a method by which a charge is imparted to a target molecule. Examples include electron ionization (EI), electrospray ionization (ESI), matrix-assisted laser desorption/ionization (MALDI), and surface-enhanced laser desorption/ionization (SELDI).
As used herein, by “mass tag” is meant an exogenous molecule that is covalently bound to the glycan, or substructure thereof, that facilitates structural analysis by mass spectrometry. Exemplary mass tags include, but are not limited to, 2-aminobenzoic acid (2-AA) and 2-aminobenzamide (2-AB).
As used herein, by “member of the fragmentation tree” is meant an entity that corresponds to the glycan or the substructures that form following a stepwise disassembly process. Members of the fragmentation tree include the root, the terminal members, and intermediate precursors. A non-limiting example is an intermediate mass spectrum obtained by sequential mass spectrometry.
As used herein, “m/z pathway” corresponds to a series of m/z values that represent one specific sequential disassembly of a glycan structure or substructure. Many different m/z pathways can be generated from the same glycan structure or substructure, each representing a different disassembly sequence.
As used herein, by “native glycan” is meant a glycan as it is found in nature. Native glycans may optionally be released from their glycoconjugate by procedures standard in the art that include, but are not limited to: chemical methods (e.g., hydrazine or PNGase F) and physical methods (e.g., fragmentation via CID within a mass spectrometer).
As used herein, “peak” refers to an observed m/z value in mass spectral data. A peak may be further analyzed to determine whether it is of sufficient abundance to warrant analysis. This determination may be made manually by the operator or may be determined through the use of analytical software or other algorithms, or by a combination of operator and algorithms or software. For example, an algorithm may facilitate the determination of peaks by excluding m/z values that correspond to isotopic variants of a given chemical structure. Peaks may also be referred to as “m/z peaks.”
As used herein, by “precomputed composition database” is meant a database that includes entries for both fragmented and unfragmented glycan compositions. The precomputed composition database may also include entries for glycans that include modifiers such as sulfate and phosphate groups.
As used herein, by “precursor fragmentation pattern” is meant the fragmentation pattern from which a product fragmentation pattern is generated. For example, in sequential mass spectrometry, an ion is isolated on a precursor spectrum and fragmented to produce a product spectrum.
As used herein, by “precursor ion” is meant an ion selected for fragmentation. For example, in sequential mass spectrometry, typically all ions within a given m/z isolation window are isolated and fragmented.
As used herein, by “product fragmentation pattern” is meant the fragmentation pattern resulting from the disassembly of a glycan structure or substructure. For example, in sequential mass spectrometry, isolating and fragmenting a particular m/z ion will generate a product spectrum.
As used herein, by “product ions” is meant ions created by fragmenting a precursor ion.
As used herein in connection with glycans, by “purification” is meant the process of preparing an experimental sample that includes a glycan such that impurities that include, for example, salts and detergents, have been removed. Purification can also refer to the fractionation of an experimental sample that includes more than one glycan by methods known in the art, e.g., high performance liquid chromatography (HPLC) or electrophoresis.
As used herein in connection with a fragmentation tree, by “root” is meant the member of a fragmentation tree that corresponds to the molecular weight of the original glycan structure or substructure submitted for analysis. Typically the root represents an unfragmented glycan, but can represent a glycoconjugate that has been fragmented from, e.g., a glycopeptide or ganglioside. For example, the root terminus of a fragmentation tree obtained using sequential mass spectrometry usually corresponds to the mass spectrum obtained by fragmenting the glycan once.
As used herein in connection with the molecular structure of a glycan, by “root” is meant a monosaccharide at the reducing end of a glycan.
As used herein, by “scoring method” is meant a method used to compare the predicted fragmentation of a glycan, or substructure thereof, with an experimental fragmentation pattern and to assign a value to the glycan, or substructure thereof, based on the comparison. The assigned value is then used to determine whether the proposed glycan, or substructure thereof, meets the threshold of acceptability. Scoring methods may include, but are not limited to, the following criteria: weighting the bond strengths of bonds ruptured in ionization; weighting the likelihood of formation of a proposed substructure; favorably weighting high abundance matching peaks in the experimental data and the predicted data for the candidate structure; penalizing a candidate structure if a predicted substructure has no corresponding experimental peak; or penalizing a candidate structure if a predicted substructure appears in the experimental data with significantly lower abundance than predicted.
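A minimal sketch of how some of the criteria listed above might be combined is given below. It covers only three of them (rewarding high abundance matches, penalizing missing peaks, and penalizing peaks observed at much lower abundance than predicted). The function name, the m/z tolerance, and every weight are hypothetical values chosen for illustration; they are not taken from the disclosure.

```python
def score_candidate(predicted, observed, tol=0.5, high=0.05,
                    miss_penalty=1.0, low_factor=0.5, low_penalty=0.5):
    """Score a candidate structure against an experimental spectrum.

    `predicted` and `observed` map m/z values to relative abundances
    (0.0 to 1.0). All parameters are assumptions of this sketch.
    """
    score = 0.0
    for mz, predicted_ab in predicted.items():
        matches = [ab for obs_mz, ab in observed.items()
                   if abs(obs_mz - mz) <= tol]
        if not matches:
            # Predicted substructure has no corresponding experimental peak.
            score -= miss_penalty
            continue
        observed_ab = max(matches)
        if observed_ab >= high:
            # Favorably weight matching peaks of high abundance.
            score += observed_ab
        if observed_ab < low_factor * predicted_ab:
            # Peak is present but much weaker than predicted.
            score -= low_penalty
    return score


def meets_threshold(score, threshold):
    """True when the score reaches the threshold level of acceptability."""
    return score >= threshold


# Hypothetical predicted and experimental relative abundances
predicted = {914.4: 1.0, 710.3: 0.4, 506.2: 0.2}
observed = {914.5: 1.0, 710.3: 0.1}
candidate_score = score_candidate(predicted, observed)
```

In this hypothetical case the candidate gains from the two matched peaks, loses for the weak match at m/z 710.3, and loses again for the unmatched peak at m/z 506.2, so a threshold of zero would reject it.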
As used herein, by “stepwise disassembly process” is meant any process that disassembles glycans in a stepwise fashion. An exemplary, desirable, stepwise disassembly process is sequential mass spectrometry. Stepwise disassembly of glycans may also be accomplished using chemical or biological agents, e.g., glycosidases. Alternatively, a stepwise disassembly process may use both sequential mass spectrometry and glycosidases.
As used herein, by “structure” is meant an unfragmented glycan or a glycan in which a cleavage event was applied to fragment the glycan from its glycoconjugate (for example, fragmenting the glycan off of a glycopeptide or a glycolipid).
As used herein, by “substructure” is meant a molecular fragment that results from performing a stepwise disassembly process on a glycan.
As used herein in connection with a fragmentation tree, by “terminal member” is meant the member of the fragmentation tree for which no further product spectra were generated. For example, in a fragmentation tree obtained using sequential mass spectrometry, a terminal member is a spectrum for which no contained ion was selected for further fragmentation.
As used herein in connection with the molecular structure of a glycan, by “terminal” is meant a monosaccharide that is at the end of the glycan that is not the reducing end. A terminal monosaccharide may also be referred to as a “leaf.”
As used herein, by “terminus” is meant the member of the fragmentation tree that serves as the starting point for glycan sequencing. A terminus may be selected from a terminal member, the root, or an intermediate member.
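The relationship between the root, intermediate members, and terminal members of a fragmentation tree, and between a member and its m/z pathway, can be sketched with a small data structure. The class and method names are hypothetical, and the sketch assumes that every member records the m/z of the precursor ion that was isolated to produce its spectrum; the peak lists are illustrative values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Member:
    """One spectrum in a fragmentation tree (assumed layout)."""
    precursor_mz: float                      # ion isolated to produce this spectrum
    peaks: List[float]                       # observed m/z peaks in this spectrum
    parent: Optional["Member"] = field(default=None, repr=False)
    children: List["Member"] = field(default_factory=list)

    def add_product(self, precursor_mz, peaks):
        """Attach the product spectrum obtained by isolating one peak."""
        child = Member(precursor_mz, peaks, parent=self)
        self.children.append(child)
        return child

    def kind(self):
        if self.parent is None:
            return "root"
        if not self.children:
            return "terminal member"
        return "intermediate member"

    def mz_pathway(self):
        """Precursor m/z values from the root down to this member."""
        path, node = [], self
        while node is not None:
            path.append(node.precursor_mz)
            node = node.parent
        return path[::-1]

    def extended_mz_pathway(self, peak_mz):
        """Append one observed peak to this member's m/z pathway."""
        return self.mz_pathway() + [peak_mz]


# Hypothetical three-level tree
root = Member(1636.8, peaks=[914.4, 710.3])
ms3 = root.add_product(914.4, peaks=[710.3, 506.2])
ms4 = ms3.add_product(710.3, peaks=[506.2])
```

Here `ms4` is a terminal member, and any of the three members could serve as the terminus from which sequencing starts.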
As used herein, by “threshold level of acceptability” is meant a value used to determine whether a proposed glycan, or substructure thereof, is consistent with the experimental data.
As used herein, by “unfragmented” is meant a molecule that has not been subjected to a stepwise disassembly process. Such a molecule may also be referred to as a “parent” molecule. For example, “unfragmented glycan” can be used interchangeably with “parent glycan.”
As used herein, by “uptree” or “up-tree” is meant the process of creating proposed glycan structures and comparing them against successive precursor spectra, moving “up” the fragmentation tree. Scoring may be utilized to rank the proposed structures according to how well each fits the experimental spectra, and glycans that meet a threshold of acceptability may be passed to the precursor spectrum for further processing.
Other features and advantages of the invention will be apparent from the following Detailed Description, the drawings, and the claims.
The invention provides methods useful for glycan structural analysis that employ stepwise disassembly processes. Analysis of the fragments generated by such processes is used, for example, in glycan sequencing and in determining the presence of isomeric glycans. Stepwise disassembly processes include mass spectrometry (MS) and sequential mass spectrometry (MSn). The invention also provides methods of interactive spectra annotation.
Glycans are formed from monosaccharide building blocks including, for example, glucose (Glc), mannose (Man), galactose (Gal), fucose (Fuc), β-D-N-acetylglucosamine (GlcNAc), N-acetylgalactosamine (GalNAc), and N-acetylneuraminic acid (Neu5Ac). The monosaccharides that form the glycan are also known as residues. Other monosaccharides of interest include, but are not limited to, xylose, iduronic acid, fructose, glucuronic acid, and ribose.
Scheme 1 shows the results of derivatization on the monosaccharides introduced above. We establish class names to represent monomers with identical masses: H for hexose (glucose, mannose, and galactose); F for deoxyhexose (fucose); N for HexNAc (GlcNAc and GalNAc); and S for the sialic acid NeuAc. The methods of the invention support residues that include the three reduced residues derived from H, F, and N; these are designated h, f, and n, respectively. The methods of the invention will also support other residues such as, for example, xylose, the sialic acid NeuGc, and so on, as well as their reduced counterparts.
Scheme 2 shows a simplified representation of the monosaccharides from Scheme 1. A reduced residue is distinguished by the case of its label, not by a difference in shape. This representation is a simplification of the standards established by the Nomenclature Committee of the Consortium for Functional Glycomics.
Interresidue Linkage and Anomericity
Monosaccharides combine to form disaccharides, trisaccharides, and so on, by forming glycosidic bonds in one of two possible stereochemical anomeric orientations, axial (alpha or α) or equatorial (beta or β). The interresidue bonds extend from the anomeric carbon (carbon 2 for sialic acid, carbon 1 otherwise) of the non-reducing-end sugar to an available position (carbons 4, 7, 8, or 9 for sialic acid; otherwise a subset of carbons 2, 3, 4, or 6) of the reducing-end sugar. The linkage positions for certain residues are shown in Scheme 1, with the anomeric carbons highlighted. Other monosaccharide residues, for example fructose, have different linkage positions.
Scheme 3 shows a hypothetical trisaccharide with individual residues labeled with superscripts. Residue F0 is terminal (a leaf), H1 is internal, and n2 is at the reducing end (the root). Using the linkage positions shown, we would designate this structure as F1-4H1-4n; that is, an F residue 1-4 linked to an H, which is 1-4 linked to n.
Domon/Costello Fragment Nomenclature
A popular fragment nomenclature was established in Domon and Costello, Glycoconjugate J., 5: 397-409 (1988). Among other things, it defines particular ion fragments as being of type A, B, C, X, Y, or Z. Ion types B/Y and C/Z are complementary fragments caused by cleavages around the glycosidic oxygen. Scheme 4 is used to illustrate the nomenclature as used herein.
Scheme 4A shows a fully methylated FH disaccharide. According to the customary usage, the rightmost residue is the reducing end. There are two pairs of fragments that can be formed by cleavages around the glycosidic oxygen. Scheme 4B shows a cleavage to the non-reducing side of the oxygen, yielding F-(ene) and H-(oh) fragments; these are, respectively, B and Y ions. Scheme 4C shows a cleavage to the reducing side of the oxygen, yielding F-(oh) and H-(ene) fragments, also called C and Z, respectively.
Generally speaking, a B-type ion indicates an (ene) cleavage at the fragment's reducing end, C-type indicates an (oh) at the reducing end, Y-type indicates an (oh) at the non-reducing end, and Z-type indicates an (ene) at the non-reducing end. Both B/Y and C/Z are complementary pairs.
As an extension of this nomenclature, used herein is notation such as B/Y/Y, meaning a fragment with one (ene) cleavage at the reducing end and two (oh) cleavages at the non-reducing end.
The terms (ene) and (oh) do not imply the location of the scars; the B/C/Y/Z notation is required for that. As such, the (ene)/(oh) notation is better suited to compositions and the B/C/Y/Z notation is better suited for fragments.
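The correspondence between the B/C/Y/Z notation and the (ene)/(oh) scars can be written out as a small lookup. The sketch below is illustrative only; the function and table names are hypothetical, and cross-ring (A- and X-type) ions are deliberately not handled.

```python
from collections import Counter

# Scar type and location implied by each glycosidic ion type,
# following the convention stated in the text.
ION_TYPES = {
    "B": ("ene", "reducing"),
    "C": ("oh", "reducing"),
    "Y": ("oh", "non-reducing"),
    "Z": ("ene", "non-reducing"),
}


def scars_by_location(notation):
    """Count (scar, end) pairs for a notation such as 'B/Y/Y'."""
    counts = Counter()
    for letter in notation.split("/"):
        if letter not in ION_TYPES:
            raise ValueError(f"unsupported ion type: {letter!r}")
        counts[ION_TYPES[letter]] += 1
    return dict(counts)


def scars_for_composition(notation):
    """Drop the location and keep only the (ene)/(oh) counts."""
    counts = Counter()
    for (scar, _end), n in scars_by_location(notation).items():
        counts[scar] += n
    return dict(counts)


located = scars_by_location("B/Y/Y")
flat = scars_for_composition("B/Y/Y")
```

For B/Y/Y, `located` records one (ene) at the reducing end and two (oh) at the non-reducing end, while `flat` keeps only the counts, which is the information a composition carries.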
Domon and Costello also define A- and X-type ions, which represent cleavages across the sugar ring (i.e., cross-ring fragments). Scheme 6 shows one cross-ring fragment that might be observed: part of the H's ring is still attached to the terminal F. The mass of this cross-ring fragment reveals that F0 is linked to either position 4 or 6 of H1. The linkage could just as easily have been 1-6 instead of the 1-4 shown; the mass of the fragment would have been identical. Multiple cross-ring cleavages are sometimes required to confirm a linkage assignment.
Cross-ring fragments are identified by the bonds cleaved to generate the fragment and whether or not the fragment contains the anomeric carbon of the cleaved residue. Scheme 5 shows the bond numbering for a hexose residue. All residues supported by the methods of the invention described herein share this scheme. In this scheme, bond numbers match the carbon which they follow.
Scheme 6 shows the two fragments that would result from cleaving bonds three and five of the reducing-end hexose. The fragment without the anomeric carbon (labeled “1”) is denoted the 3,5A fragment; the complementary fragment is denoted 3,5X. The cross-ring fragment of Scheme 6 could more precisely be described as having composition F-3,5A[HNn], where the [HNn] denotes the residue classes that might have generated the cross-ring fragment. H, N, and n all share the same atomic structure at the relevant parts of the residues, and hence any of these might have generated the fragment. F-3,5A[F] is not a valid composition, because a reducing-end F residue could not produce the fragment exactly as shown: F has no OMe at carbon six. In this case, we know the cross-ring fragment came from a hexose (residue H1, to be specific) and so we further simplify the notation of this fragment from F-3,5A[HNn] to F-3,5A[H].
Composition Notation
Residue compositions are given as residue counts paired with scars. For example, H4N2n represents a composition of four hexoses, two HexNAcs, and one reduced HexNAc. Scars are denoted by (oh) and (ene) modifiers, each of which may be modified by a count. A few examples:
Subscripts denote the number of monomers in an ion composition (e.g., H₂ means two hexoses) and superscripts identify particular residues (H² means the hexose with index 2).
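As an illustration of the residue-count part of this notation, the sketch below parses a plain-text composition such as H4N2n into per-class counts. It assumes that counts are written as ordinary digits following the class letter and that only the seven class letters introduced earlier (H, F, N, S, h, f, n) appear; scar modifiers are deliberately not handled, and the function name is hypothetical.

```python
import re

# Residue classes named in the text: H, F, N, S and the reduced h, f, n.
RESIDUE_CLASSES = set("HFNShfn")


def parse_composition(text):
    """Parse a composition such as 'H4N2n' into {'H': 4, 'N': 2, 'n': 1}.

    A class letter with no digits counts as one residue. Scars such as
    (oh) and (ene) are outside the scope of this sketch.
    """
    if not re.fullmatch(r"(?:[A-Za-z]\d*)+", text):
        raise ValueError(f"not a plain residue composition: {text!r}")
    counts = {}
    for label, number in re.findall(r"([A-Za-z])(\d*)", text):
        if label not in RESIDUE_CLASSES:
            raise ValueError(f"unknown residue class: {label!r}")
        counts[label] = counts.get(label, 0) + (int(number) if number else 1)
    return counts


composition = parse_composition("H4N2n")
total_residues = sum(composition.values())
```

For H4N2n this yields four hexoses, two HexNAcs, and one reduced HexNAc, seven residues in all.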
Annotated Disassembly Pathways
In the methods of the invention, some commands accept an m/z disassembly pathway as an argument. For example, the input notation 1636.8—914.4—710.3—506.2—316.2 represents the pathway m/z 1636.8→914.4→710.3→506.2→316.2.
Each ion in the pathway may optionally be annotated with additional bracketed information. A charge state is given as n+ or n−. If no charge state is given, 1+ is assumed. For example,
1141.6[2+]—1012.0[2+]—1537.0 represents a pathway with the first two ions assigned a charge state of 2+ and the last ion assigned, by default, a charge state of 1+.
Ions in the pathway can also be annotated with an “XR” to indicate that cross-ring fragment compositions can be considered for that ion. In the absence of the XR suffix, ions are interpreted as having compositions consistent with the result of multiple glycosidic cleavages only. For example, in this pathway 1636.8—914.4—710.3—506.2—316.2 [XR], only the last ion (m/z 316.2) will entertain cross-ring fragments for its composition; all other ions in the pathway will consider only glycosidic fragments.
Ion annotations can be combined in a comma-separated list. For example, 1141.6[2+, XR] is a doubly-charged ion that allows cross-ring cleavage interpretations.
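The annotation rules above can be summarized in a short parsing sketch. It is not the parser used by the methods of the invention; the class and function names are hypothetical, and the sketch assumes that any characters between one ion and the next act as a separator.

```python
import re
from dataclasses import dataclass


@dataclass
class PathwayIon:
    mz: float
    charge: int = 1           # 1+ is assumed when no charge state is given
    cross_ring: bool = False  # True when the ion carries an XR annotation


# An m/z value, optionally followed by a bracketed annotation list.
_ION = re.compile(r"(\d+(?:\.\d+)?)\s*(?:\[([^\]]*)\])?")
# A charge state such as 2+ or 2- (ASCII hyphen or Unicode minus sign).
_CHARGE = re.compile(r"(\d+)([+\-\u2212])")


def parse_pathway(text):
    """Parse an annotated m/z disassembly pathway into PathwayIon objects."""
    ions = []
    for match in _ION.finditer(text):
        ion = PathwayIon(mz=float(match.group(1)))
        for token in (match.group(2) or "").split(","):
            token = token.strip()
            if not token:
                continue
            charge = _CHARGE.fullmatch(token)
            if token.upper() == "XR":
                ion.cross_ring = True
            elif charge:
                sign = 1 if charge.group(2) == "+" else -1
                ion.charge = sign * int(charge.group(1))
            else:
                raise ValueError(f"unrecognized annotation: {token!r}")
        ions.append(ion)
    return ions


DASH = "\u2014"  # separator character used in the pathway examples above
charged = parse_pathway(DASH.join(["1141.6[2+]", "1012.0[2+]", "1537.0"]))
with_xr = parse_pathway(
    DASH.join(["1636.8", "914.4", "710.3", "506.2", "316.2 [XR]"])
)
combined = parse_pathway("1141.6[2+, XR]")
```

In `charged`, the first two ions are doubly charged and the third defaults to 1+; in `with_xr`, only the final ion admits cross-ring compositions.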
Structure Notation (Linear Code)
It is often convenient to represent a glycan structure using text instead of a diagram. The representation used by the methods of the invention is based upon the standards established by the Nomenclature Committee of the Consortium for Functional Glycomics. In this linear code, reading from left to right moves from the non-reducing end of the glycan to the reducing end, and so the final monomer listed is the reducing-end residue. Parentheses designate branching.
Table 1 shows a series of hypothetical glycan topologies along with the linear code for each. As residues are added, the topology's complexity increases. In this example, n is always the reducing end residue (or, correspondingly, the root of the tree). Topology 1 shows that linear glycans require no parentheses in their linear code, because, of course, they are not branched. Topology 2 shows how a simple branch is represented in the linear code: one of the branches is parenthesized, but the other is not. (In our notation, the choice of which branch to parenthesize is arbitrary; other similar notations specify complex rules to generate canonical representations.) Topology 3 shows that branches can themselves contain linear components, and so FH and (SH) represent the two non-reducing-end linear sequences. Topology 4 shows how additional branching is represented. Here the right-most H residue has three branches, represented as FH, (SH), and (N) in the linear code. Similarly, we see a reducing-end fucose-substituted n, represented (F)n.
The simple five residue N-linked core (topology 2 in Table 1) is represented H (H) HNn. Optional interresidue linkages may be given as well, yielding H6 (H3) H4N4n. An alternative form is available, where the anomeric carbon that originates the glycosidic bond is also listed: H1-6 (H1-3) H1-4N1-4n. Finally, alpha/beta anomericity may also be included: Ha1-6 (Ha1-3) Hb1-4Nb1-4n. For N-linked structures, the user must indicate each core residue by applying a prime: H′ (H′)H′ N′ n′. If the reducing end of the glycan contains a scar, -(oh) or -(ene) may be appended.
Note that linkage designators are neither subscripted nor superscripted, avoiding possible confusion with monomer quantities or indices, respectively.
The linear code used herein will omit optional components not relevant to the particular algorithm being discussed. For example, when anomericity is not being considered when using the methods of the invention, a/b will always be eliminated.
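To make the reading direction and the role of parentheses concrete, the following sketch parses the topology-only form of the linear code (for example, H (H) HNn) into a tree rooted at the reducing end. The names `Residue`, `parse_topology`, and `size` are hypothetical; linkage positions, anomericity, primes, and scar suffixes are deliberately not handled in this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Residue:
    symbol: str
    children: list = field(default_factory=list)   # residues toward the non-reducing end

def parse_topology(code):
    """Parse a topology-only linear code into a tree whose root is the reducing-end residue."""
    pos = 0

    def parse_chain():
        nonlocal pos
        pending = []   # subtrees waiting to be attached to the next residue read
        while pos < len(code) and code[pos] != ")":
            ch = code[pos]
            pos += 1
            if ch == "(":
                pending.append(parse_chain())      # a parenthesized branch
                if pos >= len(code) or code[pos] != ")":
                    raise ValueError("unbalanced parentheses")
                pos += 1
            elif ch.isalpha():
                # every pending subtree is linked to the residue on its right
                pending = [Residue(ch, pending)]
            elif not ch.isspace():
                raise ValueError(f"unsupported character {ch!r}")
        if len(pending) != 1:
            raise ValueError("a chain must end in exactly one residue")
        return pending[0]

    root = parse_chain()
    if pos != len(code):
        raise ValueError("unbalanced parentheses")
    return root

def size(node):
    """Number of residues in the tree rooted at node."""
    return 1 + sum(size(child) for child in node.children)
```

Because the final monomer listed is the reducing-end residue, each residue read adopts everything accumulated to its left as children, which is why a single left-to-right pass suffices.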
Table 2 defines some equivalent terms which are used interchangeably herein.
The methods of this invention are applicable to glycan types that include, but are not limited to: monosaccharides, glycoconjugates (for example, glycoproteins, glycolipids, and glycosaminoglycans), oligosaccharides, and polysaccharides.
Derivatized glycans may be used in the methods of the invention. Analysts routinely derivatize (chemically modify) glycans before MSn analysis.
Glycans can first be released from their conjoiners and purified. For example, a native glycan can be released from a glycoconjugate such as a glycoprotein, glycolipid, or glycosaminoglycan. Glycans that are released from their conjoiners can afford a complex mixture of oligosaccharides, and direct links back to their sources are lost. Frequently, the exposed hemiacetal bond is reduced to form an alditol, breaking the carbon ring of the reducing-end (root) sugar and giving it a modified mass that serves as a reference anchor during MSn analysis. An exemplary reducing agent used in such processes is sodium borohydride. Other reducing-end tags such as 2-aminobenzoic acid (“2-AA”) and 2-aminobenzamide (“2-AB”) can also be used to derivatize glycans analyzed using the methods of the invention.
Glycans can also be permethylated. Here, methylation replaces all acidic protons, in effect converting all hydroxyl groups (OH) to methoxyl groups (OCH3, abbreviated OMe). Permethylation allows for the detection of cleavages between residues, as will be discussed herein. The complex glycan mixture may optionally be separated, by LC (liquid chromatography) or similar techniques, to reduce the number of glycan structures examined at one time.
N-Glycans and O-Glycans
N-linked glycans, or simply N-glycans, are always attached to proteins at the nitrogen atom (hence, “N”) of the amide group of an asparagine amino acid residue. Importantly, they nearly always contain a trimannosyl core consisting of five residues linked in an unwavering formation: two mannoses α1-3 and α1-6 connected to a single mannose, which is β1-4 connected to an internal GlcNAc, which is β1-4 connected to the reducing end GlcNAc. See Scheme 7. Larger N-glycans attach additional residues to this core.
O-linked glycans, or O-glycans, are attached to the oxygen atom (hence, “O”) of a serine or threonine amino acid. They commonly consist of from one up to approximately a dozen residues and are often classified according to a series of common core structures, Core 1-Core 8, as shown on page 93 of Brooks et al. in Functional and Molecular Glycobiology, BIOS Scientific Publishers Limited (2002).
The methods of the invention map masses to possible compositions via a precomputed database. It includes entries for both fragmented and unfragmented glycan compositions. The database contains compositions, not structures. The database contains entries for glycans composed of (a limited number of) residues and glycan modifiers such as sulfate and phosphate groups, plus fragment entries that allow for the presence of scars on each of these compositions. Given an observed mass, the database returns a list of glycan compositions and glycan fragment compositions that fall within the experimental error of the mass. The tools then use these compositions to complete their tasks. For example, an observed sodiated ion with m/z 1187.7 would be mapped to the glycan composition H3Nn, plus any other compositions that fall within the specified error tolerance of 1187.7. The composition database utilized in the context of this invention is structurally similar to the one described in section 3.5 of Lapadula, Ph.D. Dissertation, University of New Hampshire, Durham, (2007), herein incorporated by reference, with extensions for phosphate and sulfate modifiers, additional cross-ring cleavages, and additional monomer types. Consequently, it is evident to one skilled in the art that the composition database can be assembled using comparable methods.
The methods of the invention are applicable to any stepwise disassembly process performed on a glycan. Such methods include, but are not limited to, mass spectrometric techniques and chemical methods of disassembly (for example, the use of glycosidases). The methods of the invention are also useful with combinations of stepwise disassembly methods. For example, the methods of the invention include performing mass spectrometry on the products resulting from treatment of a glycan (or mixture of glycans) with glycosidases.
Glycosidases
A method well known in the field utilizes glycosidase digests to remove selected monosaccharide residues from glycans. By alternating the application of various glycosidases with measurement techniques such as tandem MS, the target glycan can be sequentially disassembled. The structural changes can be noted after each digest, and the original structure of the glycan can be determined.
Exemplary, non-limiting glycosidases useful in the invention include endoglycosidases and exoglycosidases. Other exemplary glycosidases include amylases, chitinases, fucosidases, galactosidases, hyaluronidases, invertases, lactases, maltases, mannosidases, N-Acetylgalactosaminidases, N-Acetylglucosaminidases, N-Acetylhexosaminidases, neuraminidases, sucrases, and lysozymes. Still other examples of glycosidases include beta-glucosidase; beta-galactosidase; 6-phospho-beta-galactosidase; 6-phospho-beta-glucosidase; lactase-phlorizin hydrolase; beta-mannosidase; myrosinase; PNGase F; Peptide-N-Glycosidase A; O-Glycosidase; Endoglycosidase F1; Endoglycosidase F2; Endoglycosidase F3; Endoglycosidase H; Endo-β-galactosidase; Glycopeptidase A; and Lacto-N-biosidase.
Mass Spectrometry (MS)
A number of ionization and detection technologies are available for use in mass spectrometry. Regardless of ionization source (e.g., electrospray (ESI), Matrix Assisted Laser Desorption Ionization (MALDI)), sequential mass spectrometry (MSn), often implemented using an ion trap (IT-MS), allows the operator to select peaks (“precursor ions”) from a spectrum, fragment them, and record the resulting “product ions” in another spectrum. In sequential mass spectrometry, peak fragmentation is iterative and may be performed as many times as required. In some instances, fragmentation may be limited by the physical capabilities of the instruments. Fragmenting a peak from the initial MS spectrum yields an MS2 spectrum; fragmenting a peak from that yields an MS3 spectrum, and so on.
The fragments generated by MSn disassembly can be analyzed by an analyst and are used in the methods of the invention. For example, the glycosidic bonds joining monomers are often the most labile, and so fragmentation most often occurs there. Thus, it is frequently the case that the most abundant ions are the result of glycosidic cleavages. Cross-ring cleavages, multiple simultaneous cleavages, and other interpretations are possible as well, but these typically yield lower-intensity peaks when using permethylated glycans.
Derivatization of a glycan can also influence the type of fragments formed (e.g., with the lower-intensity peaks discussed above). Additionally, for permethylated glycans, the fragments generated during MSn preserve hints of their original connectivity. Exemplary types of fragments that can form are those that include 1,2-double bonds (“ene”) or those that include a terminal hydroxyl (“oh”). Specifically, the number of (ene) and (oh) scars in each composition indicates the number of cleavages applied to the fragment, although the original linkage and identity of the cleaved residues are not directly recorded. In this case, the observed composition n-(oh) reveals only that the n residue had a single residue connected directly to it, but not the identity of the residue. Similarly, the H-(ene)(oh) fragment tells us that the H residue had previously been directly connected to two residues, and F-(ene) indicates that the F residue had only a single attached residue.
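The precursor/product relationship described above naturally forms a tree of spectra. The following minimal sketch shows one way such an MSn tree might be represented and how disassembly pathways can be read off it; the `Spectrum` class and `pathways` function are hypothetical names, and peak lists are omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Spectrum:
    precursor_mz: float                              # m/z of the peak that was isolated and fragmented
    products: list = field(default_factory=list)     # spectra obtained by fragmenting peaks of this one

def pathways(spectrum, prefix=()):
    """Yield every root-to-leaf disassembly pathway as a tuple of precursor m/z values."""
    path = prefix + (spectrum.precursor_mz,)
    if not spectrum.products:
        yield path
    for child in spectrum.products:
        yield from pathways(child, path)

def ms_level(path):
    """MS level of the last spectrum on a pathway: one isolated precursor gives MS2, two give MS3."""
    return len(path) + 1
```

A pathway of five precursor ions, such as m/z 1636.8→914.4→710.3→506.2→316.2, therefore ends at an MS6 spectrum under this convention.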
The invention includes the use of scoring methods in order to compare the predicted fragmentation of a glycan, or substructure thereof, with an experimental fragmentation pattern and to assign a value to the glycan, or substructure thereof, based on the comparison. The assigned value is then used to determine whether the proposed glycan, or substructure thereof, meets the threshold of acceptability.
Scoring methods may include, but are not limited to, the following criteria:
Scoring methods used in the invention can use descriptive terms as assigned values (for example, “consistent,” “possibly consistent,” or “inconsistent”). Alternatively, numerical values may be used as the assigned value.
Methods for Detection of Glycan Isomers (“gtIsoDetect”)
One method of the invention can be used to detect disassembly pathways that likely did not come from a set of expected glycan structures. These detected pathways may instead have originated from structural isomers. Often an analyst will assume that particular glycan structures are present, and wish to be told which pathways appear to indicate the presence of isomers. Put another way, the analyst would like a list of pathways that do not appear to have come from the expected structures. These issues are addressed by the method of the invention for detecting glycan isomers.
Using the glycan isomer detection method of the invention, it can be determined if a given structure can be sequentially disassembled in such a way as to match the observed ions generated by an MSn experiment. The method enables the comparison of each structure against each MSn pathway (as extracted from the MSn spectra) and produces a full report on the consistency of every structure/pathway pair.
Broadly speaking, the method for detection of glycan isomers includes the following features:
A pathway that is possibly consistent or not consistent may actually represent the disassembly of an unexpected glycan structure which may merit further attention from the analyst.
Step (3) mentions the “predicted disassembly” of a glycan. A detailed example of this for permethylated glycans in positive mode is described in Example 1 and Example 2.
The method for detection of glycan isomers can be performed in the following manner:
The method for detecting glycan isomers described above may also be modified according to the following ways.
Arbitrary Cleavages
The glycan isomer detection method described above works with more than just glycosidic cleavages. It also handles cross-ring cleavages as well as other “non-standard” losses that can nonetheless be predicted from an expected glycan structure. For example, permethylated HexNAc (N) residues often lose their acetyl and N-acetyl groups, which register as losses of 42 Da and 74 Da, respectively. These peaks can easily be understood by gtIsoDetect even though they are not the result of glycosidic cleavages.
Linkage Isomers
Because the method for detecting glycan isomers works with cross-ring cleavages, it can be used to find structural isomers that differ only in linkage. For example, the cross-ring fragments generated by a H1-6N disaccharide (that is, a hexose that is 1-6 linked to a HexNAc) differ from the cross-ring fragments from a H1-3N disaccharide. If the expected linkage was 1-6, but 1-3 fragments were observed in the spectrum, the 1-3 fragments would be called out as inconsistent with the expected structure. In this way, the operator can identify “linkage isomers” using the methods described herein.
Methods for Selecting Residues for Each Composition
The method of detecting glycan isomers can determine which residues in a proposed structure can map to the compositions in a feasible composition pathway. The only requirement of this process is that the residues in a given composition be connected together, and for permethylated glycans, be removable from the glycan by cleavages that leave the expected number and type of scars. An exhaustive search for these embedded compositions is a baseline strategy, but can clearly be improved upon using various techniques such as those described herein. One possible implementation may be performed according to the following procedure:
Various optimizations can be performed to increase the efficiency of the search for residues that match a given composition.
For example, as soon as a subtree contains too many residues of a particular type, that branch of the search can be abandoned. Or, if the subtree under R does not contain enough residues of the appropriate types to aggregate into the target composition, that search branch can be abandoned.
More generally speaking, each residue in the glycan can be marked with the sum of the residue types found in the subtree rooted at the residue. This allows the pruning of the search for subtrees, greatly increasing efficiency.
An expanded version of this optimization can also store, at each residue, (1) the minimum and maximum number of (ene) and (oh) cleavages predicted to occur in the residue's subtree, (2) the minimum and maximum number of possible (not predicted) cleavages that could occur in the residue's subtree. Here (1) allows efficient search pruning for the case where the target composition has a known scar count (as when dealing with permethylated glycans) and (2) allows efficient search pruning for the case where scar counts are not available (as when dealing with native glycans).
A given precursor structure may contain multiple internal substructures that match composition C. (For example, there may be multiple ways to extract HN-(ene) from a glycan.) The gtIsoDetect algorithm can find and report all of these substructures.
Native Glycans
This method for detecting glycan isomers can also be used with native glycans. In native glycans, there are fewer “scars” left behind when residues are cleaved, and so strict scar counts cannot be used in the feasible composition pathways. However, just using the residue counts in the composition is enough to make gtIsoDetect useful for native glycans. For example, if a native fragment was determined to contain three residues, H2S, those three residues can be extracted from GM1a (residues H0H2S4) but not from GM1b (as GM1b does not embed a H2S connected substructure). This is described further in Example 1, Scheme 8 of the specification. Therefore any native pathway containing H2S is marked as inconsistent with GM1b, even though exact scar counts are not used.
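The subtree-total pruning and the native GM1a/GM1b example can be sketched together. This is a simplified illustration that matches residue counts only (no scar bookkeeping), as in the native-glycan case; the names `Residue`, `annotate`, `grow`, and `find_embedded` are hypothetical, and the two topologies (ceramide omitted) are assumed here to be HN(S)HH for GM1a and SHNHH for GM1b.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Residue:
    kind: str
    children: list = field(default_factory=list)
    totals: Counter = field(default_factory=Counter)   # residue counts in this subtree

def annotate(node):
    """Mark each residue with the sum of residue types found in its subtree."""
    node.totals = Counter({node.kind: 1})
    for child in node.children:
        node.totals += annotate(child)
    return node.totals

def grow(node, need):
    """Yield (residues, leftover) for connected subtrees rooted at node that fit within need."""
    if need[node.kind] == 0:
        return                          # too many residues of this type: abandon the branch
    left = need.copy()
    left[node.kind] -= 1
    partial = [([node], left)]
    for child in node.children:         # each child is either skipped or contributes a subtree
        extended = []
        for chosen, rest in partial:
            for sub, sub_rest in grow(child, rest):
                extended.append((chosen + sub, sub_rest))
        partial += extended
    yield from partial

def find_embedded(root, target):
    """Return the residue kinds of every connected substructure whose composition equals target."""
    annotate(root)
    target = Counter(target)
    matches, stack = [], [root]
    while stack:
        node = stack.pop()
        if any(node.totals[kind] < count for kind, count in target.items()):
            continue                    # this subtree (and all below it) lacks needed residues
        for chosen, rest in grow(node, target):
            if sum(rest.values()) == 0:
                matches.append([residue.kind for residue in chosen])
        stack.extend(node.children)
    return matches

# Assumed topologies, written from the reducing end outward.
gm1a = Residue("H", [Residue("H", [Residue("N", [Residue("H")]), Residue("S")])])
gm1b = Residue("H", [Residue("H", [Residue("N", [Residue("H", [Residue("S")])])])])
```

Under these assumptions, searching for H2S finds one connected substructure in `gm1a` and none in `gm1b`, mirroring the conclusion drawn above.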
Multiply-Charged Ions
In addition to singly-charged ions, the methods of the invention can also be used with multiply-charged ions. If ion charge states are determined independently (either by software or by an analyst), the algorithm executes in exactly the same way.
Ions with an undetermined charge state can be processed multiple times, once for each possible charge state. For example, if the doubly-charged precursor m/z 1890.2[2+] yields the product ion m/z 678.4 with an unknown charge state (but which must necessarily be either 2+ or 1+), the method described above could examine this pathway as both 1890.2[2+]—678.4[2+] and 1890.2[2+]—678.4[1+], reporting both results or reporting only the result that is most consistent with an expected structure.
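The enumeration of candidate charge states can be sketched as follows. The function names are hypothetical, positive mode is assumed, and the separator is the long dash of the pathway notation (written as the escape `\u2014`).

```python
def candidate_charges(precursor_charge):
    """Charge states a product ion may carry: from the precursor's charge down to 1."""
    return list(range(precursor_charge, 0, -1))

def expand_pathway(precursor_mz, precursor_charge, product_mz):
    """Return one annotated pathway string per possible product charge state."""
    return [
        f"{precursor_mz}[{precursor_charge}+]\u2014{product_mz}[{z}+]"
        for z in candidate_charges(precursor_charge)
    ]
```

Each returned pathway would then be examined independently, and the caller decides whether to report every result or only the one most consistent with an expected structure.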
The invention provides methods to reconstruct a glycan's original topology given fragmentation data in the form of data obtained from sequential disassembly methods, e.g., MSn spectra. The invention provides methods for glycan sequencing that employ processes that disassemble glycans in a step-wise fashion. Exemplary stepwise disassembly processes include, but are not limited to, mass spectrometry (e.g., sequential mass spectrometry) and the use of glycosidases to chemically disassemble glycans.
The methods of the invention include taking a precursor structure, for example, an intact glycan or a previously-disassembled fragment, and predicting which product fragments would arise if the substructure were fragmented again.
gtSequenceGrow
One method of the invention for glycan sequencing couples the product fragment prediction process described above with the precursor/product nature inherent in glycan disassembly to derive glycan structures. This method is herein referred to as “gtSequenceGrow.”
Other sequencing methods have had limited success because they attempt to enumerate all possible glycans of a given composition and then score each of those glycans against the experimental data. However, once glycans pass a modest size, the vast number of possible structures makes these methods intractable.
The gtSequenceGrow method solves this problem by interleaving up-tree and down-tree phases, walking up and down the MSn spectrum tree. The method may be performed as illustrated in
To better discriminate between candidates, and to make use of the full MSn spectrum tree, gtSequenceGrow also implements a down-tree phase that interrupts the up-tree phase when suitable MSn spectra are available. When multiple product spectra are available, and when those spectra are compatible with the candidates under consideration, the candidates are passed down the MSn spectrum tree (Step 7). At each step, the candidate is predictively fragmented and compared against the experimental spectrum. The candidate's score is updated accordingly: product spectra that include the candidate's predicted fragments increase the candidate's score, and spectra that do not decrease its score.
Each candidate from Step 6 is passed recursively down the MSn spectrum tree and all spectra that the candidate might have reasonably generated participate in updating the candidate's score. This down-tree processing is very similar to the disassembly process used by gtIsoDetect to identify isomeric fragment peaks. As described herein, gtSequenceGrow faces the same problem of deciding whether a given structure should be considered compatible with a given spectrum; that is, given a candidate structure, determining whether a particular spectrum should be used to modify the candidate's score. If the spectrum could not have been generated by the candidate, the candidate's score should not suffer. The candidate should not be penalized just because spectra were collected from an incompatible isomer. To solve this problem, we utilize the gtIsoDetect solution again. As used herein, consistent means that the fragment was predicted, possibly consistent means that the fragment was not predicted but is logically possible, and inconsistent means that the fragment was neither predicted nor logically possible.
Given product spectrum S and candidate C, the gtSequenceGrow method can include the following features:
1) Always apply S to C's score if C is consistent with S (that is, C is predicted to fragment in such a way as to generate S);
2) Optionally apply S to C's score if C is possibly consistent with S; and
3) Never apply S to C's score if C is inconsistent with S.
The optional application of S to C in the possibly consistent case can be resolved by having the algorithm accept an appropriate decision input from the user. In certain implementations of this method, the analyst (or some external algorithm) is able to make this “do/do not apply” decision each time a possibly consistent spectrum is considered.
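The three rules above, together with the user-supplied decision for the possibly consistent case, can be sketched as follows. The names `Consistency`, `should_apply`, and `update_score`, and the representation of a spectrum's effect as a single numeric `delta`, are assumptions made for illustration.

```python
from enum import Enum

class Consistency(Enum):
    CONSISTENT = "consistent"
    POSSIBLY_CONSISTENT = "possibly consistent"
    INCONSISTENT = "inconsistent"

def should_apply(consistency, decide=lambda: False):
    """Decide whether spectrum S may modify candidate C's score.

    `decide` stands in for the analyst (or an external algorithm) and is
    consulted only in the possibly consistent case.
    """
    if consistency is Consistency.CONSISTENT:
        return True
    if consistency is Consistency.POSSIBLY_CONSISTENT:
        return bool(decide())
    return False

def update_score(score, delta, consistency, decide=lambda: False):
    """Return the candidate's score after (possibly) applying the spectrum's contribution."""
    return score + delta if should_apply(consistency, decide) else score
```

Passing the decision as a callable keeps the "do/do not apply" choice outside the scoring logic, so it can be answered interactively or by another program.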
When all up-tree and down-tree processing has been completed, the remaining candidate structures and their scoring details are output. Note that because the candidate structures have walked most (or perhaps all) of the MSn tree, a vast amount of information has been collected about each candidate, for example, which disassembly pathways are consistent with which candidates. All of this additional information can also be presented to the user at the algorithm's conclusion.
The gtSequenceGrow method can also be described as follows.
Special Handling of Complementary Fragments:
If an MSn spectrum has two product spectra that are complements of each other (that is, they appear to be two fragments that, if combined, would reform exactly the precursor ion), then special processing may be applied:
Other features of the sequencing method include, but are not limited to, those described below.
All candidates can be stored at all spectra in the MSn tree, so external intervention (by another algorithm/technique or a human analyst) is possible. For example, an external tool (or analyst) may prefer a given candidate over all others at a given spectrum. All other candidates could then be eliminated, and the algorithm could continue its processing from that point, bubbling new results up the tree. This interactivity will provide much benefit for users of this technique. A specific example is a database that maps experimental spectra to known substructures. That spectrum's “fingerprint” could be used to deduce the structure represented by the spectrum, and all other candidates could be removed from consideration.
Often a single m/z value may have multiple possible compositions. (For example, the m/z 1677.87 spectrum of has two isobaric [mass equivalent] composition possibilities: H2N4h and H3N3n.) Again, external intervention is possible here, where preferred compositions can be indicated, and undesirable compositions eliminated. The algorithm can continue its processing from that point. For this example, however, we only consider the starting composition H3N3n.
When deciding if a predicted peak is present in the spectrum, external intervention is possible. There are times when different isotopic envelopes overlap, or where the charge state of an ion is difficult to ascertain. In these and similar cases, an external tool or human analyst can be consulted to decide if the predicted peak is truly present, and if so, at what abundance. This interactivity produces large benefits to users of this technique.
The peaks that match each candidate/spectrum pair can be stored and made available as part of the algorithm's output. This provides valuable insight into which candidates are consistent with which subsets of the observed peaks. Importantly, the algorithm does not attempt to create all possible candidates for the full glycan. Instead, it only considers those candidates at MSn level N that are a small “edit distance” away from those at level N+1. By limiting the number of candidates passed up at each step, the algorithm's performance is bounded.
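Limiting the number of candidates passed up the tree can be sketched as a simple top-k selection. The function name `cull`, the `(structure, score)` pair representation, and the limit `keep` are hypothetical; the specification does not state how the limit is chosen.

```python
import heapq

def cull(candidates, keep):
    """Keep only the `keep` highest-scoring (structure, score) pairs, best first."""
    return heapq.nlargest(keep, candidates, key=lambda pair: pair[1])
```

Because at most `keep` candidates survive each step, the work done at the next MSn level is bounded by `keep` times the number of edits considered per candidate.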
The entire MSn tree is considered, or put another way, none of the collected data are unjustly ignored. Going up the tree, candidates are created, scored, and culled; coming down the tree, their scores are refined.
gtSequenceAll
In select cases, it may be desirable to generate the exhaustive set of candidate structures for a full glycan, herein referred to as “gtSequenceAll.” According to the methods of the invention, the “downward” phase of the gtSequenceGrow method can be used and each candidate can be scored against the entirety of the MSn tree using the following sequence:
In other uses of the methods of the invention, upfront processing constrains the number of candidates to be considered, and those candidates are scored in a down-tree phase over the MSn tree. This method is herein referred to as “gtSequenceConstrained.”
This method matches gtSequenceAll described above, with only a single change. Instead of “all possible/plausible candidate structures” in Step 2), the gtSequenceConstrained algorithm generates “a set of candidate structures that are (A) compatible with one or more disassembly pathways in the spectra and/or (B) compatible with presumed biosynthetic constraints and/or (C) consistent with a spectrum fingerprint of known glycans and/or (D) any other technique used to eliminate candidate structures as being too unlikely to merit further consideration.”
Additional modifications of the aforementioned methods for glycan sequencing and isomer detection are possible. Exemplary, non-limiting modifications of these methods are described below.
The -ErrTolPPM and -ErrTolMZ Global Options
The -ErrTolPPM switch gives an error tolerance in parts per million (ppm); -ErrTolMZ gives an error tolerance in m/z units. When an experimental mass is used to retrieve possible compositions, all compositions in the larger of these error tolerance windows are considered.
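The "larger of the two windows" rule can be written out as a short calculation. In this sketch the tolerances are treated as symmetric half-widths around the observed m/z, which is an assumption; the function names and the dictionary standing in for the composition database are likewise hypothetical.

```python
def tolerance_window(mz, err_tol_ppm=0.0, err_tol_mz=0.0):
    """Return (low, high) using whichever tolerance gives the wider window at this m/z."""
    ppm_halfwidth = mz * err_tol_ppm / 1e6      # convert ppm to m/z units
    halfwidth = max(ppm_halfwidth, err_tol_mz)
    return mz - halfwidth, mz + halfwidth

def lookup(database, mz, err_tol_ppm=0.0, err_tol_mz=0.0):
    """Return the names of all database entries whose m/z lies inside the window."""
    low, high = tolerance_window(mz, err_tol_ppm, err_tol_mz)
    return [name for name, entry_mz in database.items() if low <= entry_mz <= high]
```

For instance, at m/z 1000 a 500 ppm tolerance corresponds to 0.5 m/z units, so it overrides a 0.2 m/z tolerance, whereas a 100 ppm tolerance (0.1 m/z units) does not.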
The -NLinkedCore Global Option
When the -NLinkedCore global option is given, the methods of the invention will only consider structures that embed the N-linked core motif H3Nn (Scheme 7). The structures will have all interresidue linkages assigned as well. This option may be given when the analyst is investigating the linkage of an N-glycan and wishes to assign residues to the 3- or 6-branch of the N-linked core.
The -NLinkedCoreBranching Global Option
The -NLinkedCoreBranching option is similar to -NLinkedCore with the exception that the interresidue linkages are not specified (although branching is specified). This option is used when the analyst is investigating branching topology only, and is not concerned with linkage assignments.
The -ReducingEndResidue Global Option
The -ReducingEndResidue option specifies which residues are eligible to be the reducing-end sugar of suggested structures. The supported option values are shown in Table 3. The default is -ReducingEndResidue any. Many examples in this work use -ReducingEndResidue reduced. The allowed option values are extended as additional residues are supported in the future.
Spectrum annotation is the process of assigning putative compositions to peaks observed on a mass spectrum. This step allows spectra to be interpreted by either an analyst or a computer algorithm or computer program. Prior to the present invention, there was no tool that performs this task interactively for MSn spectra.
Analysts and algorithms must often convert the observed m/z values into putative compositions in order to attempt a structural analysis. The inherent complexity of having multiple MSn spectra, with a tree of precursor and product spectra, can easily overwhelm an analyst—especially given the number of m/z peaks found on each spectrum. Providing interactive capabilities for annotating these spectra is advantageous in the structural analysis of molecules that include, for example, glycans.
The method for interactive spectra annotation described herein can allow the analyst to provide information to the system to reduce this complexity, and to guide the analyst to the most likely interpretations of the peaks on each spectrum. For example, the analyst can eliminate downstream compositions in order to facilitate analysis. One method that can be used to decide which downstream compositions can be eliminated is as follows.
Given a precursor/product composition pair, the residue types and counts are compared to determine if the product could have been generated from the precursor. When cleavage types and counts are available, as with permethylated glycans, the cleavage scars can also be used to rule out impossible precursor/product pairs.
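The residue-count comparison can be sketched in a few lines. The function name `could_derive` is hypothetical, compositions are represented as mappings from residue symbol to count, and the additional scar-based checks available for permethylated glycans are not implemented in this sketch.

```python
from collections import Counter

def could_derive(precursor, product):
    """True if every residue type in the product occurs at least as often in the precursor."""
    return all(precursor.get(kind, 0) >= count for kind, count in product.items())

def prune_products(precursor, products):
    """Keep only the candidate product compositions that the precursor could have generated."""
    return [product for product in products if could_derive(precursor, product)]
```

A product composition that fails this test can be eliminated from the downstream annotations of that precursor without any further analysis.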
An exemplary method for interactive labeling of spectra can include the following steps:
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the methods and compounds claimed herein are performed, made, and evaluated, and are intended to be purely exemplary of the invention and are not intended to limit the scope of what the inventors regard as their invention.
The data below show that some chemical bonds in permethylated glycans are considerably more likely to rupture (i.e., these bonds are more “labile”) than others, and therefore lead to predictable fragments when the glycans are analyzed via MSn.
It has been well established that permethylated glycans tend to fragment most readily at the glycosidic bonds between residues, especially when the number of residues in the precursor fragment is, for example, four or more. A closer examination shows that certain permethylated residues form weaker glycosidic bonds, leading to a skewed distribution of fragment intensities on the experimental spectrum. That is, fragments formed by the rupture of weak bonds tend to occur with a higher relative abundance than fragments formed by the rupture of strong bonds.
Metal ion (Na+, K+, and Li+) and proton localization (or charge localization) in positive mode and electron delocalization in negative mode lead to predictable fragmentation patterns in mass spectrometers, allowing the algorithms to predict fragments correctly with high probability.
We can assign a rough “cost” to each bond, where larger numbers indicate increasingly strong bonds that are hence more costly to break. See, for example, Table 4.
These bond costs are approximate and can be optionally adjusted. For example, bond cleavage costs can depend upon factors that include, for example:
These estimates give predictions that closely match the observed experimental results. Also important is the type of fragments generated when an inter-residue bond is broken. An oxygen atom is between each pair of residues, and the bond can break on either side of the oxygen (see the Domon and Costello A/X, B/Y, and C/Z ion type complements above). The methods of the invention predict which fragment types are expected to arise when bonds are ruptured as shown in Table 5.
Table 4 and Table 5 combine to predict the relative abundance and type of fragments generated during glycan disassembly. As such, they are the underpinnings of the methods for sequencing and isomer detection of the invention.
These predictions align with experimental data as described below.
Fragmentation of GM1a/GM1b
Scheme 8 shows the fragments expected to arise from the mixture of GM1a/GM1b glycans shown in
These predicted fragments are in close agreement with
Every predicted zero-cost fragment was found on the spectrum and in non-trivial abundance. These data support the contention that the bond-cost fragmentation scheme makes predictions that match experimental results.
Fragmentation of the Intact Glycan
The fetuin glycan m/z 3618.8[1+] (1820.9[2+]) is shown in Scheme 9, with a simplified representation in Scheme 10.
Table 7 lists the ions observed in
The rules set forth herein also correctly predict the cleavage types. For example, ion m/z 847.4 matches the predicted B-type (ene) cleavage to residues N7, N8 and/or N9, and the complementary Y-type (oh) ion is found at m/z 1408.6.
Sequential Fragmentation of an m/z 847.4 Antenna
As another example of predicting the fragmentation of permethylated glycans in positive mode, consider the m/z 847.4 antenna from the previous fetuin glycan shown in Scheme 11a. This example demonstrates the predictability of disassembly on substructures. Given the S-H-N-(ene) linear antenna, we would predict fragments as shown in Table 8.
Again we see that, as predicted, rupturing lower-cost bonds yields fragments in greater abundance. As the precursor ion shrinks (as measured by the number of residues it contains), we begin to observe cross-ring fragments, specifically ions m/z 690.3, 486.2, and 315.1. These are shown in Schemes 11b, 11c, and 11d, respectively.
Fragmentation of Native Glycans in Negative Mode
The principles used to analyze glycans fragmented in positive mode can be adapted to the analysis of native glycans fragmented in negative mode. Unlike the B-, C-, and Y-type ions that dominate the positive mode spectra of permethylated glycans, native/negative spectra contain mainly A-type cross-ring fragments and C-type glycosidic fragments. Also observed in abundance are what are called “D ions,” which are in effect a combination of two cleavages (C and Z) applied to the same residue. Glycan fragmentation in negative mode is discussed in a series of papers by Harvey (J. Am. Soc. Mass. Spectrom., 16: 622-630 (2005); J. Am. Soc. Mass. Spectrom., 16: 631-646 (2005); and J. Am. Soc. Mass. Spectrom., 16: 647-659 (2005)), each of which is incorporated herein by reference.
In negative mode, a lack of “internal fragments” (fragments produced by cleavages at multiple sites) was observed. This result further serves to increase the predictability of native glycan fragmentation in negative mode.
The fragmentation predictability of native glycans in negative mode makes it an excellent fit for structural analysis according to the methods of the invention.
To illustrate the gtIsoDetect algorithm, we apply it to the concrete example of two isomeric glycans found in ovalbumin at m/z 1677.8. The composition pathways used in this example are shown in
Processing 1677.8→1384.5→1125.4→866.4→662.4→444.1
First we demonstrate how gtIsoDetect applies the m/z pathway 1677.8→1384.5→1125.4→866.4→662.4→444.1 to structures B and C. For both structures in parallel, substructures are sought that match the composition of each successive ion in the pathway as shown in Table 9.
As Table 9 shows, structure B is able to fulfill every ion in the pathway via a predicted cleavage. Cleaving above an N yields an (ene) scar and all non-reducing-end cleavages yield (oh) scars.
For m/z 1384.5, residue n7 is lost. For m/z 1125.4, a terminal N must be lost. In both structures, this is ambiguous, as either N4 or N5 can be lost, and so both alternatives are considered. In the very next step (m/z 866.4), however, the other terminal N is lost, eliminating any ambiguity. At m/z 662.4, an internal H is lost, which again is ambiguous as H1 and H2 are both acceptable choices.
Ion m/z 444.1 differs between structures B and C. For B, the ion can be satisfied by the subtree H3N6, which contains the required (ene)(oh)3 scars. gtIsoDetect labels this structure/pathway pair as predicted. However, no such subtree exists within structure C. The corresponding H3N6 residues would contain only three scars when extracted from the full glycan, not the four scars demanded by the composition. As such, gtIsoDetect labels this structure/pathway pair as inconsistent.
Processing 1677.8→1384.5→1125.4→866.4→662.4→458.1
Next we demonstrate how gtIsoDetect applies the m/z pathway 1677.8→1384.5→1125.4→866.4→662.4→458.1 to structures B and C. This pathway is identical to the previous example, except the terminal ion is not m/z 444.1, but rather m/z 458.1, with a composition of HN-(ene)(oh)2. Again, for both structures in parallel, substructures are sought that match the composition of each successive ion in the pathway. See Table 10.
The processing is unchanged until the final ion. Here, the HN-(ene)(oh)2 composition cannot be satisfied by structure B, because the H3N6 substructure can be extracted only with four cleavages, not the required three. Structure B is therefore labeled as inconsistent with this m/z pathway. However, structure C is able to satisfy all losses with predicted cleavages, and so is labeled consistent.
Processing 1677.8→1384.5→1125.4→866.4→662.4→444.1→250.1
Next we demonstrate how gtIsoDetect applies the m/z pathway 1677.8→1384.5→1125.4→866.4→662.4→444.1→250.1 to structures B and C. Ion m/z 250.1 appears on the experimental spectrum of ion m/z 444.1, data not shown. This pathway is identical to the first example, except the new terminal ion m/z 250.1 has been added, with a composition of N-(ene)2. Again, for both structures in parallel, substructures are sought that match the composition of each successive ion in the pathway. See Table 11.
Here, ion m/z 250.1 can be satisfied by structure B, but not by using only predicted fragmentation. The composition of this ion, N-(ene)2, requires an (ene) scar on the non-reducing side of the N residue. This Z-type ion is not predicted; however, it is a logical possibility and so this pathway/structure pair is labeled as possibly consistent. The unsure nature of this assignment is therefore flagged for inspection by the analyst.
Also note that ion m/z 250.1 is not processed for structure C. Because the precursor ion m/z 444.1 is inconsistent with the structure, processing stops and the pathway/structure pair is labeled as inconsistent.
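The labeling logic of the three examples above can be summarized in a short Python sketch. It assumes that the substructure search has already produced one result per product ion; the `Step` enumeration and the `label_pathway` function are hypothetical names, and the sketch is not the gtIsoDetect implementation itself.

```python
from enum import Enum


class Step(Enum):
    PREDICTED = "predicted"      # satisfied using only predicted cleavages
    POSSIBLE = "possible"        # satisfied, but needs a non-predicted cleavage
    UNSATISFIED = "unsatisfied"  # no substructure matches the composition


def label_pathway(step_results):
    """Combine per-ion results into a (label, ions_processed) pair.

    Processing stops at the first ion that cannot be satisfied, so any
    later ions in the pathway are never examined.
    """
    label = "consistent"
    processed = 0
    for result in step_results:
        processed += 1
        if result is Step.UNSATISFIED:
            return "inconsistent", processed
        if result is Step.POSSIBLE:
            label = "possibly consistent"
    return label, processed


# Pathway ending ...->444.1->250.1 as described in the text above.
structure_b = [Step.PREDICTED] * 5 + [Step.POSSIBLE]
structure_c = [Step.PREDICTED] * 4 + [Step.UNSATISFIED, Step.POSSIBLE]

print(label_pathway(structure_b))
print(label_pathway(structure_c))
```

For structure C the final entry is never reached, which mirrors the statement that ion m/z 250.1 is not processed once m/z 444.1 fails.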
Summary of gtIsoDetect Results
Table 12 gives a summary of the gtIsoDetect output for the six examined pathway/structure pairs. The highlighted entries would be suitable for further investigation by the analyst.
In this Example, we use the gtSequenceGrow method to assign a glycan topology. These data were collected via MSn, but this technique can be applied to any technology that fragments glycans in a predictable step-wise manner such as, for example, with a series of glycosidase digests interleaved with MS/MS analysis.
Processing follows the chart of
Simulate m/z 1384.50/H3N3-(ene)
gtSequenceAll Summary
The combination of up-tree and down-tree processing declares 1.3.1.1.2.1.1.1 as the structure that best fits the examined spectra. This structure has been reported as structure “C” in Ashline 2007, page 3835.
Additional Features of gtSequenceAll
Note that this assembly proceeded without the assumption that the target glycan contained the five-residue N-linked core, but rather correctly inferred the core directly from the data. Prior to the methods of the invention described herein, no existing de novo tool has been capable of such a feat for a glycan of this size. Note also that the algorithm found the expected structure without generating a large number of candidate structures. This feature can be advantageous when, for example, computational resources are limited. Other features of the method may be modified and such modifications can be envisioned and executed by those skilled in the art. Exemplary, non-limiting modifications are described below.
Thresholds for “Missing” Ions
Users can set a relative intensity threshold for considering an ion to be absent. In this presentation, absent means a relative intensity of 0%, but 0.1% can also be used in some cases. The threshold can also be varied based on the structure size and number of predicted low-cost bonds, which absorb collisional energy. That is, if many low-cost bonds are present, it becomes more likely that high-cost bonds will not be ruptured in detectable quantities. Alternatively, the threshold can be raised for fragments predicted to be of higher abundance.
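A minimal Python sketch of such a threshold test follows. The function names and the raised threshold of 0.5% are assumptions made for illustration; only the 0% and 0.1% values come from the text above.

```python
def is_absent(relative_intensity_pct, threshold_pct=0.0):
    """Treat an ion as absent when its relative intensity (in percent)
    is at or below the chosen threshold."""
    return relative_intensity_pct <= threshold_pct


def threshold_for(predicted_high_abundance, base_pct=0.0, raised_pct=0.5):
    """Pick a threshold for one fragment.

    The raised value for fragments predicted to be abundant is a
    hypothetical number, not one given in the text.
    """
    return raised_pct if predicted_high_abundance else base_pct


print(is_absent(0.0))                          # default 0% threshold
print(is_absent(0.05))                         # weak peak, 0% threshold
print(is_absent(0.05, threshold_pct=0.1))      # weak peak, 0.1% threshold
print(is_absent(0.3, threshold_for(True)))     # abundant fragment expected
```

Keeping the threshold choice separate from the absence test lets the threshold vary per fragment without changing how absence is decided.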
Simulated fragmentation
Scoring
The method can use fragments that are unique to exactly one candidate and cause the score to accentuate the difference between candidates. Alternatively, the unique fragments could be weighted more heavily. The relative abundance of isomers can also be used to weight the scoring method. If isomer X is known to be much more abundant than isomer Y, then X's major peaks should be more abundant than Y's. In another modification, the penalties applied can be reduced when the corresponding experimental spectrum is of poor quality. On a Thermo LTQ, for example, a low normalization level (NL) may mean that ions were so sparse that minor fragments will not be observed. This can be compensated for by accumulating data for a longer period and data averaging. In this case penalties would be reduced for small values of the product NL*(acquisition time). Penalties can also be reduced if the “missing” fragment could only be generated by applying multiple cleavages to the precursor structure.
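One way the penalty reductions could be combined is sketched below in Python. The text does not give a formula, so the linear scaling, the `quality_reference` constant, and the division by the number of cleavages are all assumptions made purely for illustration.

```python
def missing_fragment_penalty(base_penalty, nl, acquisition_time_s,
                             cleavages_needed=1, quality_reference=1.0e4):
    """Scale the penalty for a missing fragment.

    Assumptions (not from the source text):
      * spectrum quality is approximated by NL * acquisition time,
        capped at 1.0 once it reaches `quality_reference`;
      * a fragment that needs several cleavages has its penalty divided
        by the number of cleavages.
    """
    quality = min(1.0, (nl * acquisition_time_s) / quality_reference)
    penalty = base_penalty * quality
    if cleavages_needed > 1:
        penalty /= cleavages_needed
    return penalty


# A sparse spectrum is penalized less than a well-populated one.
print(missing_fragment_penalty(10.0, nl=1.0e3, acquisition_time_s=1.0))
print(missing_fragment_penalty(10.0, nl=1.0e4, acquisition_time_s=1.0))
```

Any monotonic function of NL*(acquisition time) would serve the same purpose; the linear form is simply the easiest to inspect.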
To better understand Interactive Spectrum Labeling (or Annotating), consider a simplified MSn spectrum tree for IgG glycan m/z 1677.8 as described in Table 20. The process extends, however, to the entire MSn spectrum tree. Also for clarity, this example only considers fragment compositions that can arise from the rupture of glycosidic bonds. Again, the process extends to other types of cleavages, such as cross-ring fragments and the loss of N-acetyl groups. Lastly, this example focuses on permethylated glycans, but this is not an inherent limitation of the procedure.
For this example and for illustrative purposes, we assume that only three spectra have been collected: 1677.8, 1677.8→1418.7, and 1677.8→1418.7→900.4. Further, we assume that each spectrum contains only two m/z peaks.
For each spectrum's terminal ion, there are two possible compositions: m/z 1677.8 can be H3N3n or H2N4h, m/z 1418.7 (as isolated from 1677.8) can be H3N2n-(oh) or H2N3h-(oh), and m/z 900.4 (as isolated from 1677.8→1418.7) can be H3n-(oh)3 or H2Nh-(oh)3. Most of the ions on these spectra also have two interpretations, as shown in the table.
The underlying problem is that Nh has the same mass as Hn—that is, reducing a hexose changes its mass by the same amount as reducing a HexNAc. This leads to the composition ambiguities shown above, where any fragment composition that includes Nh must also have the equivalent composition with Hn substituted instead. This confusion would be greatly magnified over a larger MSn tree where each spectrum would have many m/z peaks.
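The Hn/Nh exchange can be written as a small Python sketch. Compositions are represented here as dictionaries of residue and scar counts, which is an assumed representation, and `swap_reducing_end` is a hypothetical helper name.

```python
from collections import Counter


def swap_reducing_end(composition):
    """Return the isobaric alternative obtained by exchanging Hn <-> Nh.

    Returns None when the composition contains neither an (H, n) pair
    nor an (N, h) pair, in which case no such ambiguity exists.
    """
    counts = Counter(composition)
    if counts["n"] >= 1 and counts["H"] >= 1:
        counts.subtract({"H": 1, "n": 1})
        counts.update({"N": 1, "h": 1})
    elif counts["h"] >= 1 and counts["N"] >= 1:
        counts.subtract({"N": 1, "h": 1})
        counts.update({"H": 1, "n": 1})
    else:
        return None
    # Drop entries whose count fell to zero.
    return {key: value for key, value in counts.items() if value > 0}


# H3N3n <-> H2N4h, the two readings of m/z 1677.8
print(swap_reducing_end({"H": 3, "N": 3, "n": 1}))
# H3n-(oh)3 <-> H2Nh-(oh)3, the two readings of m/z 900.4
print(swap_reducing_end({"H": 3, "n": 1, "oh": 3}))
```

Scar counts such as (oh) pass through unchanged, since the exchange affects only the residues.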
Interactive spectrum annotation reduces this confusion by allowing an external agent (an analyst or algorithm) to eliminate possible compositions at any point in the MSn tree. Removing these compositions will reduce the number of composition possibilities at all subsequent product spectra and their contained peaks.
In this example, we assume that the analyst (or external algorithm) has knowledge that the glycan under investigation does in fact have a reducing-end HexNAc (n) and not a reducing-end hexose (h). The analyst can transfer this knowledge to the system by eliminating H2N4h as a possible composition for spectrum 1677.8. See Table 21 and notice the highlighted (eliminated) composition.
Now the possible compositions of all product spectra derived directly or indirectly from spectrum 1677.8 can be updated. Because of the precursor/product relationship guaranteed by MSn, the only compositions allowed for product spectra are those that are a subset of the composition available at spectrum 1677.8, namely H3N3n. Propagating this change eliminates a composition possibility from spectrum 1677.8→1418.7, which in turn eliminates a composition possibility from spectrum 1677.8→1418.7→900.4. See Table 22.
Now that the spectra have had their composition sets adjusted, we apply similar logic to each contained peak. If a putative peak composition cannot have been generated from any of its spectrum's remaining compositions, the peak composition is excluded.
For example, spectrum 1677.8→1418.7→900.4 contains the peak 696.4, which currently has two possible compositions, H2n-(oh)3 and HNh-(oh)3. However, the spectrum no longer contains a reduced hexose (h) in any of its compositions, and so we eliminate HNh-(oh)3 as a possible composition for this peak. See Table 23 to see the composition changes to three peaks across the three spectra.
For clarity, Table 24 shows the final composition assignments for all spectra and peaks after the propagation has been completed. Notice the reduction in complexity as compared to the starting point of the analysis.
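The propagation just described can be sketched in Python using the compositions named in the text. As a simplifying assumption, the subset test compares residue counts only and ignores scar labels such as (oh); the function names are hypothetical.

```python
RESIDUES = ("H", "N", "h", "n")


def is_subcomposition(product, precursor):
    """True if every residue in `product` is available in `precursor`.

    Scar labels such as 'oh' are ignored (a simplifying assumption).
    """
    return all(precursor.get(key, 0) >= count
               for key, count in product.items() if key in RESIDUES)


def propagate(allowed_precursors, candidates):
    """Keep only the candidates that some remaining precursor can generate."""
    return {name: comp for name, comp in candidates.items()
            if any(is_subcomposition(comp, precursor)
                   for precursor in allowed_precursors.values())}


# The analyst has already eliminated H2N4h at spectrum 1677.8.
ms1 = {"H3N3n": {"H": 3, "N": 3, "n": 1}}
ms2 = propagate(ms1, {
    "H3N2n-(oh)": {"H": 3, "N": 2, "n": 1, "oh": 1},
    "H2N3h-(oh)": {"H": 2, "N": 3, "h": 1, "oh": 1},
})
ms3 = propagate(ms2, {
    "H3n-(oh)3": {"H": 3, "n": 1, "oh": 3},
    "H2Nh-(oh)3": {"H": 2, "N": 1, "h": 1, "oh": 3},
})
peak_696 = propagate(ms3, {
    "H2n-(oh)3": {"H": 2, "n": 1, "oh": 3},
    "HNh-(oh)3": {"H": 1, "N": 1, "h": 1, "oh": 3},
})

print(list(ms2), list(ms3), list(peak_696))
```

Each call narrows one level of the tree, so a single elimination at the root removes the reduced-hexose reading from both product spectra and from peak 696.4.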
Beyond the application of precursor/product constraints as shown above, many other constraints can be applied to reduce the number of composition possibilities for both the examined spectra and their contained peaks. These constraints include, but are not limited to:
Note that two composition possibilities for ion m/z 898.27 have been eliminated: FHNh-(oh) and FH2n-(oh). The user has selected the constraint labeled “Apply precursor/product constraints”, and the sole remaining composition for m/z 1273.62, H3NS-(oh), could generate neither FHNh-(oh) nor FH2n-(oh), so both are eliminated. Selecting product spectra under m/z 1273.62 would reflect additional eliminations caused by the application of product/precursor constraints, or any other constraints selected or provided by the user.
While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure that come within known or customary practice within the art to which the invention pertains and may be applied to the essential features hereinbefore set forth, and follows in the scope of the claims.
All publications, patents, and patent applications mentioned in this specification, including U.S. Provisional Application Nos. 61/057,596 and 61/134,440, are herein incorporated by reference to the same extent as if each independent publication or patent application was specifically and individually indicated to be incorporated by reference.
Other embodiments are within the claims. What is claimed is:
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/US09/45236 | 5/27/2009 | WO | 00 | 2/15/2011 |

| | Number | Date | Country |
|---|---|---|---|
| Parent | 61057596 | May 2008 | US |
| Child | 12995388 | | US |
| Parent | 61134440 | Jul 2008 | US |
| Child | 61057596 | | US |