With respect to glycans, releasing glycans from a glycoprotein and labelling the glycans with a fluorescent tag facilitates relative quantitation of each different glycan structure in a sample. In one example, liquid chromatography (LC) may be utilized to separate the labelled released glycans.
Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, in which:
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to examples. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent, however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
Throughout the present disclosure, the terms “a” and “an” are intended to denote at least one of a particular element. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
Glycan peak assignment apparatuses, methods for glycan peak assignment, and non-transitory computer readable media having stored thereon machine readable instructions to provide glycan peak assignment are disclosed herein.
With respect to release of glycans from a glycoprotein and labelling of the glycans with a fluorescent tag, it can be technically challenging to assign a peak to a glycan structure. In some cases, the retention time (RT) of peaks may be utilized to determine the identity of a glycan structure. However, utilization of the retention times may include drawbacks in that the retention times may vary significantly from day to day, over the life of a chromatographic column, from poorly controlled parameters such as ambient temperature effects, due to small changes in the composition of the liquid chromatography (LC) mobile phase, and/or between individual chromatography columns of the same type.
The selectivity between different oligosaccharides (α) may be relatively stable for separation modes associated with hydrophilic interaction chromatography (HILIC), versus their relatively unstable absolute RTs. In this regard, a carbohydrate standard may be utilized to normalize the retention times, creating a retention index (RI), which may be used to control the aforementioned variations. The RI for a glycan structure may be more stable than the retention time, and may therefore be more suitable for assigning peaks to structures. In one example, peaks may be assigned to glycan structures by referring to a library of expected RIs developed based on separate experimental results with previously encountered glycans with well-confirmed structures.
In one example, implementation of an RI may rely on running a glucose homopolymer mixture (e.g., “ladder”), such as dextran, which is labeled with the same tag as the sample (such as 2-aminobenzamide, or InstantPC). Examples of standards such as the 2-aminobenzamide (2-AB) labelled glucose homopolymer standard, InstantPC labelled maltodextrin standard, etc., may be run as part of the same analysis sequence (e.g., a consecutive run before or after the sample runs) and utilize the same instrument method as far as possible, including mobile phase composition gradient, flow rate and column temperature. The results from a calibration run may be used to fit the relationship between RT and degree of polymerization of the glucose homopolymer with a function such as a 5th degree polynomial function. This approach may be utilized for both size exclusion chromatography mode and HILIC mode chromatography of glycans. For each sample run, the same function may be used to map the RT of each glycan peak to a glucose units (GU) number that represents the degree of polymerization of a labeled glucose homopolymer that would elute at the same time as the glycan peak in question. In some cases, this may be a hypothetical fractional degree of polymerization such as 5.43, 7.81 etc. (e.g., fractional GUs that fall between the RT of two peaks in the labelled glucose homopolymer standard).
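As an illustration of the calibration described above, the following is a minimal sketch, not a definitive implementation, of fitting a 5th degree polynomial GU = f(RT) to glucose ladder data and mapping a sample peak RT to a fractional GU. The ladder RTs, the function names, and the choice of a plain least-squares fit with RT rescaled to [-1, 1] are all assumptions made for the example.

```python
def fit_polynomial(xs, ys, degree=5):
    """Least-squares polynomial fit; returns a callable p(x).

    x is rescaled to [-1, 1] before fitting so that the normal
    equations stay well conditioned for a 5th degree fit.
    """
    lo, hi = min(xs), max(xs)

    def scale(x):
        return (2.0 * x - lo - hi) / (hi - lo)

    n = degree + 1
    ts = [scale(x) for x in xs]
    # Normal equations A c = b for the monomial basis in t.
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]

    def poly(x):
        t = scale(x)
        result = 0.0
        for c in reversed(coeffs):  # Horner evaluation, highest power first
            result = result * t + c
        return result

    return poly


# Hypothetical ladder: degree of polymerization -> observed RT in minutes.
ladder = {3: 4.0, 4: 6.0, 5: 7.9, 6: 9.7, 7: 11.4, 8: 13.0,
          9: 14.5, 10: 15.9, 11: 17.2, 12: 18.4, 13: 19.5, 14: 20.5}

# GU = f(RT): RT is the independent variable, degree of polymerization the target.
gu_of_rt = fit_polynomial(list(ladder.values()), list(ladder.keys()), degree=5)

sample_peak_rt = 10.5  # hypothetical glycan peak RT
print(round(gu_of_rt(sample_peak_rt), 2))  # fractional GU between 6 and 7
```

In this sketch, a sample peak eluting between the 6-mer and 7-mer ladder peaks is mapped to a fractional GU between 6 and 7.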
Implementing an RI approach may help prevent uncertainty over peak identity in an LC run if an experimenter is performing routine analysis and sees changes in the RT, since the experimenter is aware that the shift is due to changes in their system, as opposed to a new glycan structure appearing in the sample.
It is also possible to assign unknown glycan structures by referring to a list of previously confirmed glycan structures (e.g., labeled with the same tag) that were previously shown to give various GU numbers under very similar experimental conditions. Such lists of glycan structures and their GU may be referred to as a GU library or GU database, such as the library known as Glycobase. In some HILIC methods, glycan structures on amide HILIC columns with common tags may elute in the range from 4 to 14 GU, although this range can be higher or lower depending on the column and technique utilized.
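The library lookup described above may be sketched as follows. This is a minimal example under stated assumptions: the structure names and GU values are placeholders rather than entries from any real GU library, and the matching tolerance of 0.1 GU is a hypothetical choice.

```python
def assign_peak(observed_gu, library, tolerance=0.1):
    """Return the library structure whose GU is closest to observed_gu,
    or None when no entry lies within the tolerance (an assumed cutoff)."""
    name, expected_gu = min(library.items(),
                            key=lambda item: abs(item[1] - observed_gu))
    if abs(expected_gu - observed_gu) <= tolerance:
        return name
    return None


# Placeholder library: structure name -> expected GU (illustrative values only).
gu_library = {"structure_A": 5.80, "structure_B": 6.70, "structure_C": 7.60}

print(assign_peak(6.65, gu_library))  # close to structure_B
print(assign_peak(7.10, gu_library))  # no entry within tolerance
```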
The same general principles may be used with electrophoresis methods, in which case the labeled homopolymer standard migration times may be used to create a migration index that achieves the same goals as the retention index. Tables of known structures and their corresponding GU values may be utilized for assigning glycan structures with neutral tags such as 2-aminobenzamide, but may be less accurate and therefore less useful when working with positively charged tags such as InstantPC. This is because of the higher charge heterogeneity of such samples. For example, when glycan samples containing structures with acidic groups are labeled with the positively charged tag, they may have a net 0 or negative charge, unlike the uniformly positively charged homopolymer mixture. In such cases, the GU values may be more sensitive to changes in ionic interactions brought about by experimental conditions that cannot readily be controlled, such as individual column characteristics, column age, and precise mobile phase composition. Regardless of the cause, the approach to assigning unknown peaks based on their GU values may be less reliable with positively charged tags.
In order to address at least the aforementioned technical challenges, the apparatuses, methods, and non-transitory computer readable media disclosed herein apply an adjustment to the expected GU (or RT) of each glycan structure in a library according to the number of acidic monosaccharides it contains, which can be 0, 1, 2, 3 or 4. These “charge groups” of glycans are known, respectively, as S0, S1, S2, S3 and S4. The magnitude of the adjustment applied to the GU (or expected RT) may be determined using the RT of labeled glycans of known structure, either as adjacent, external RT standards, or internal RT standards if some peaks in the sample itself have been confidently assigned. For example, the observed GU of an S0 glycan such as G0F may be determined and then the deviation from the expected value (from a “library” of expected GU values) may be applied as an adjustment to the predicted GU/RT of all other S0 glycans.
With respect to S2 glycans, the observed deviation of GU/RT of an observed S2 glycan peak of a known structure may be determined and applied as an adjustment to the predicted GU/RT of all other S2 glycans. This may be done for all charge groups, or even other types of structural glycan categories such as high mannose glycans.
However, to limit the burden of obtaining many calibration points, effective adjustment values for S1, S3 and S4 glycans may be determined by scaling up or down the adjustments from the S0 or S2 glycans.
The apparatuses, methods, and non-transitory computer readable media disclosed herein thus provide for a user to enter observed RTs for their glucose homopolymer standard and enter RTs for an S0 and an S2 glycan. A library of GU values may be provided for labelled glycans that have been determined experimentally using a HILIC column and separation method. When the apparatus is operated after submittal of calibration retention times by a user, the apparatus may determine the predicted RTs at which the user should expect to see each glycan in a library on their own system.
For the apparatuses, methods, and non-transitory computer readable media disclosed herein, in one example, the resultant library of expected RTs may be automatically (e.g., without human intervention) formatted as a compound database or .CSV file, and imported into a liquid chromatography (LC)/mass spectrometry (MS) data analysis software package such as MassHunter Bioconfirm. Using this database, the software package may automatically assign the signals observed in LC/MS sample data to glycan structures based on both the RT and their mass to charge ratios as detected by a mass spectrometer, providing a relatively higher degree of confidence in the peak assignment compared to the approach of using RT or mass to charge ratio alone. For example, the calculated expected RTs, as disclosed herein, may be exported as a .CSV file that contains, in the following order of comma delimited columns: the chemical formulae of each glycan structure, the expected RTs of each glycan in the library, a blank column, the name of the glycan structure, and a final column for comments. This .CSV file may be loaded as a compound database file, for example, in the data analysis software package. The data analysis software package may be utilized for automatic searching through LC/MS data for signals that match the expected masses of compounds in a compound database. A table of compounds detected along with a certainty score may be output, and the certainty score may be based on how closely the observed mass for a given compound fits the expected mass based on the chemical formula. This allows a user to quickly identify compounds for signals observed in their LC/MS data. The data analysis software package may include additional features that can be used when the compound database includes accurate expected RTs. In that case, the certainty score may be based on the observed versus expected RT of the proposed compounds.
In the case of MassHunter Bioconfirm, the RT certainty score may be a linear function where 0 deviation gives the maximum score of 100. The score drops off linearly down to zero when the deviation from the expected RT reaches a certain value, which can be set by a user. The ability to easily score accuracy or compare the RTs to their expected values is especially useful for glycans with more than one isomer, which cannot be distinguished by mass but may instead be distinguished by RT. The output table may also display the observed versus expected RTs for each signal so that a user can see if the deviation is low or large enough to arouse suspicion that the proposed compound is incorrect. Thus, the highly accurate RT predictions achieved by the apparatuses, methods, and non-transitory computer readable media disclosed herein greatly enhance the accuracy and certainty achieved when using a data analysis software package to automatically analyze LC/MS data from glycan samples.
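The linear RT certainty score described above may be sketched as follows. This is an illustration of the stated behavior only (100 at zero deviation, falling linearly to zero at a user-set deviation), not the implementation used by any particular software package; the function and parameter names are hypothetical.

```python
def rt_certainty_score(observed_rt, expected_rt, max_deviation):
    """Linear score: 100 at zero deviation, 0 at or beyond max_deviation.

    max_deviation is the user-set RT deviation (same units as the RTs)
    at which the score reaches zero.
    """
    if max_deviation <= 0:
        raise ValueError("max_deviation must be positive")
    deviation = abs(observed_rt - expected_rt)
    return max(0.0, 100.0 * (1.0 - deviation / max_deviation))


# Hypothetical RTs in minutes, with the score reaching zero at 0.5 min.
print(rt_certainty_score(10.25, 10.0, 0.5))
```

Two isomers with the same mass but different expected RTs would receive different scores for the same observed peak, which is how the RT term helps separate them.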
The apparatuses, methods, and non-transitory computer readable media disclosed herein provide technical benefits such as an air gap that separates a user's personal data from the software platform of the apparatus. A user may likely collect both confidential and non-confidential experimental data, but the user is only required to enter non-confidential data into the software platform. This data is then processed by the software platform and returned to the user as a list of predicted retention times, which assists the user in interpreting their data. Furthermore, the ability of the apparatuses, methods, and non-transitory computer readable media disclosed herein to provide predicted retention times eliminates the need to understand how to compare GU values to assign peak identities. In one example disclosed herein, the ability to predict retention times is aided by the use of a novel polynomial that expresses retention time as a function of GU. This polynomial may be paired with a novel extrapolation feature as disclosed herein. The implementation of glycan adjustment values as disclosed herein provides another technical advancement that improves the accuracy of retention time predictions and allows calibration refinement using internal sample peaks.
The apparatuses, methods, and non-transitory computer readable media disclosed herein generally include first generating GU values using a glucose homopolymer standard and then adjusting with real glycan RTs. In this regard, the apparatuses, methods, and non-transitory computer readable media disclosed herein may be implemented based on glycan RTs.
According to another aspect of the apparatuses, methods, and non-transitory computer readable media disclosed herein, some glycans elute at a retention time that is greater than that of the highest degree of polymerization peaks that can be reliably integrated using available standards of labelled glucose homopolymer. This presents technical challenges when interpolating from a 5th degree polynomial that is valid from the lowest degree of polymerization until the highest degree of polymerization that was used to fit the function. In this regard, hypothetical RTs may be generated for degrees of polymerization beyond those observed experimentally by fitting to a function, such as a linear function, based on the final 2 or 3 points (e.g., the RTs of the highest, second highest and third highest degree of polymerization labelled glucose homopolymer peaks). These hypothetical glucose homopolymer degree of polymerization retention times may be treated as though they were experimentally observed, allowing the 5th degree polynomial to extend to higher degrees of polymerization (and thus higher GU/RT) than would otherwise be possible.
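The extrapolation described above may be sketched as follows, assuming a straight line fitted by least squares to the final 2 or 3 ladder points. The ladder RTs and the function name are hypothetical and serve only to illustrate the idea.

```python
def extend_ladder(ladder, extra_dps, n_tail=3):
    """Return a copy of the ladder with hypothetical RTs added for extra_dps.

    A straight line is fitted (least squares) to the n_tail highest
    degree of polymerization (DP) points and evaluated at each extra DP.
    """
    dps = sorted(ladder)[-n_tail:]
    rts = [ladder[dp] for dp in dps]
    mean_dp = sum(dps) / len(dps)
    mean_rt = sum(rts) / len(rts)
    slope = (sum((d - mean_dp) * (r - mean_rt) for d, r in zip(dps, rts))
             / sum((d - mean_dp) ** 2 for d in dps))
    intercept = mean_rt - slope * mean_dp
    extended = dict(ladder)
    for dp in extra_dps:
        extended[dp] = intercept + slope * dp
    return extended


# Hypothetical tail of a ladder: DP -> observed RT in minutes.
ladder_tail = {12: 18.4, 13: 19.5, 14: 20.5}
extended = extend_ladder(ladder_tail, extra_dps=[15, 16], n_tail=3)
print({dp: round(rt, 2) for dp, rt in extended.items()})
```

The extended dictionary can then be passed to the polynomial fit as though every entry had been observed experimentally.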
For the apparatuses, methods, and non-transitory computer readable media disclosed herein, the elements of the apparatuses, methods, and non-transitory computer readable media disclosed herein may be any combination of hardware and programming to implement the functionalities of the respective elements. In some examples described herein, the combinations of hardware and programming may be implemented in a number of different ways. For example, the programming for the elements may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the elements may include a processing resource to execute those instructions. In these examples, a computing device implementing such elements may include the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separately stored and accessible by the computing device and the processing resource. In some examples, some elements may be implemented in circuitry.
Referring to
An RT analyzer 108 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
An expected RT generator 112 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A compound identifier 128 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
According to examples disclosed herein, the RT analyzer 108 may determine, based on application of the glucose ladder RTs 104, and the neutral and charged glycan RTs 106 to the expected GU values, the adjusted GU values 110 by determining, based on the glucose ladder RTs 104, a fifth order polynomial GU as a function of RT.
According to examples disclosed herein, the RT analyzer 108 may determine, based on application of the glucose ladder RTs 104, and the neutral and charged glycan RTs 106 to the expected GU values, the adjusted GU values 110 by determining, based on the fifth order polynomial GU and the neutral and charged glycan RTs, an observed GU 116.
According to examples disclosed herein, the RT analyzer 108 may determine, based on application of the glucose ladder RTs 104, and the neutral and charged glycan RTs 106 to the expected GU values, the adjusted GU values 110 by determining, based on the observed GU 116, GU adjustment factors 118. Further, the RT analyzer 108 may determine, based on application of the GU adjustment factors 118 to the expected GU values, the adjusted GU values 110.
According to examples disclosed herein, the GU adjustment factors 118 may include a first value that represents a neutral glycan adjustment value that may be selectable from a set of options that exist in the GU library. In some cases this will allow for the use of a glycan in a sample, for which the identity is known, to perform the adjustment. Since this adjustment is performed with data from the sample run, the accuracy may be further enhanced by eliminating run-to-run variation from this aspect of the calibration. The GU adjustment factors 118 may include a second value that represents an acidic glycan adjustment value applied to S2 glycans that may be similarly selectable from a set of options which exist in the GU library. This will similarly allow for the use of a glycan in a sample, for which the identity is known, to perform the adjustment. The GU adjustment factors 118 may include a third value that represents an acidic glycan adjustment value multiplied by a factor of 0.5 and applied to S1 glycans. The GU adjustment factors 118 may include a fourth value that represents an acidic glycan adjustment value multiplied by a factor of 1.5 and applied to S3 glycans. The GU adjustment factors 118 may include a fifth value that represents an acidic glycan adjustment value multiplied by a factor of 2.0 and applied to S4 glycans.
According to examples disclosed herein, the RT analyzer 108 may determine, based on application of the glucose ladder RTs 104, and the neutral and charged glycan RTs 106 to the expected GU values, the adjusted GU values 110 by determining, based on subtraction of the GU adjustment factors from respective expected GU values, the adjusted GU values.
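The charge-group adjustment described above may be sketched as follows. The scaling factors (0.5, 1.5 and 2.0 for S1, S3 and S4 relative to the S2 adjustment) and the subtraction from the expected GU follow the description; the sign convention for the adjustment (library GU minus observed GU of the calibrant), the names, and all numeric values are assumptions for illustration.

```python
# Scale applied to the acidic (S2-derived) adjustment for each charge group.
CHARGE_GROUP_SCALE = {"S1": 0.5, "S2": 1.0, "S3": 1.5, "S4": 2.0}


def adjustment_for(charge_group, neutral_adj, acidic_adj):
    """GU adjustment factor for a glycan in the given charge group."""
    if charge_group == "S0":
        return neutral_adj
    return CHARGE_GROUP_SCALE[charge_group] * acidic_adj


def adjust_library(library, neutral_adj, acidic_adj):
    """Subtract the charge-group adjustment from each expected GU."""
    return {
        name: expected_gu - adjustment_for(group, neutral_adj, acidic_adj)
        for name, (expected_gu, group) in library.items()
    }


# Placeholder library: name -> (expected GU, charge group). Illustrative only.
library = {
    "neutral_calibrant": (6.00, "S0"),
    "glycan_s1": (7.50, "S1"),
    "acidic_calibrant": (9.00, "S2"),
    "glycan_s3": (10.50, "S3"),
    "glycan_s4": (12.00, "S4"),
}

# Assumed convention: adjustment = library GU - observed GU of the calibrant.
neutral_adj = 6.00 - 5.90  # hypothetical observed GU of the S0 calibrant
acidic_adj = 9.00 - 8.60   # hypothetical observed GU of the S2 calibrant

adjusted = adjust_library(library, neutral_adj, acidic_adj)
print({name: round(gu, 2) for name, gu in adjusted.items()})
```

With this convention, each calibrant's adjusted GU equals its observed GU, and the other charge groups are shifted proportionally.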
According to examples disclosed herein, the RT analyzer 108 may determine, based on application of the glucose ladder RTs 104, and the neutral and charged glycan RTs 106 to the expected GU values, the adjusted GU values 110 by determining, based on the glucose ladder RTs, a fifth order polynomial RT as a function of the GU.
According to examples disclosed herein, the expected RTs 114 may be exported, for example, as a .CSV file 120 that contains, in the following order of comma delimited columns: the chemical formulae of each glycan structure (e.g., glycan formulas 122), the expected RTs 114 of each glycan in the library, a blank column, the name of the glycan structure (e.g., glycan names 124), and a final column for comments. This .CSV file 120 may be formatted, for example, by an LC/MS data analysis formatter 126 and loaded by the compound identifier 128 as a compound database file in a data analysis software package, such as MassHunter Data Analysis or MassHunter Bioconfirm. The data analysis software package may be utilized by the compound identifier 128 for automatic searching through LC/MS data for signals that match the expected masses of compounds in a compound database. For example, the .CSV file 120 may be loaded in a data analysis software package as a compound library to be used for automatic analysis of the LC/MS data. The LC/MS data may be generated by LC/MS acquisition software and an LC/MS instrument that is used to analyze a user's sample. Examples of compounds may include InstantPC (IPC) labeled glycan structures G0F, G2F, G2S2, Man5, Man6. Other examples may also include 2AB labeled glycan structures if a user is working with a 2AB labeling sample prep instead of IPC.
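The column order described above may be illustrated with the following minimal sketch, which writes the five comma delimited columns using Python's standard csv module. The formulas are placeholders, the RTs are invented for the example, and the absence of a header row and the three-decimal RT format are assumptions rather than requirements of any data analysis package.

```python
import csv
import io


def build_compound_csv(entries):
    """Return CSV text with columns: formula, expected RT, blank, name, comment."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for entry in entries:
        writer.writerow([
            entry["formula"],
            f"{entry['expected_rt']:.3f}",
            "",  # blank third column
            entry["name"],
            entry.get("comment", ""),
        ])
    return buffer.getvalue()


# Placeholder formulas and hypothetical expected RTs (minutes).
entries = [
    {"formula": "FORMULA_PLACEHOLDER_1", "expected_rt": 12.5,
     "name": "G0F", "comment": "predicted"},
    {"formula": "FORMULA_PLACEHOLDER_2", "expected_rt": 18.25,
     "name": "G2S2"},
]

csv_text = build_compound_csv(entries)
print(csv_text)
```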
The compound identifier 128 may output a table of compounds (e.g., including the compound 130) detected along with a certainty score, and the certainty score may be based on how closely the observed mass for a given compound fits the expected mass based on the chemical formula. The certainty score may range from 0 to 100, where 100 indicates the best possible match between expected and observed values. In this regard the compound identifier 128 may utilize a scalable system where the expected deviation can be edited in the software (e.g., to account for different experimental set ups having different levels of reproducibility), which affects how quickly the score drops off from 100 as the observed value deviates from expected value. Further, expected mass may be determined by the compound identifier 128 from the chemical formula of the compound, and may be independent of RT.
The certainty score may be utilized as an indicator to identify compounds for signals observed in a user's LC/MS data. Thus, by utilizing the certainty score, the compound identifier 128 may identify, based on the expected RTs 114, at least one compound 130.
Referring to
Specifically, referring to
At 216, a user may run a glycan sample, which may be utilized to obtain the RTs of one neutral glycan and one charged glycan at block 218. These RTs are also received by the RT analyzer 108 at block 214 to determine GU with equation f (observed GU).
The glucose ladder RTs at block 210 may be further received by the RT analyzer 108 at block 220 to determine a 5th order polynomial RT=g(GU).
The operations at blocks 212, 214, and 220, as well as the operations at blocks 200, 202, 204, 224, 226, and 228 may be performed at a server at 222.
The output of blocks 212, 214, and 220 may be utilized by the RT analyzer 108 at block 224 to determine RT with equation g.
The determined RT from block 224 may be utilized with equation f (e.g., block 212), and at block 226, the difference between the result and the predicted GU may be determined. This difference may be divided by the slope (first derivative) of equation f at the x-value RT (calculated at block 224) to generate an adjustment value at block 226, which is then subtracted from the RT of block 224 to determine (e.g., by the expected RT generator 112) expected RTs 114 at block 228. This resulting RT (block 228) may be fed back through the process in block 226 in an iterative fashion, if desired, to generate increasingly accurate expected RTs.
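The refinement described above has the form of a Newton-Raphson update: the RT is corrected by the GU difference divided by the slope of f at that RT. The following minimal sketch illustrates one or more such iterations. The coefficients of f, the starting RT (standing in for the output of equation g), and the target GU are hypothetical values chosen for the example.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients ordered highest power first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def polyder(coeffs):
    """Coefficients of the first derivative (highest power first)."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def refine_rt(f_coeffs, rt_guess, target_gu, iterations=1):
    """Refine an approximate RT so that f(RT) approaches target_gu."""
    slope_coeffs = polyder(f_coeffs)
    rt = rt_guess
    for _ in range(iterations):
        difference = polyval(f_coeffs, rt) - target_gu  # f(RT) minus predicted GU
        rt -= difference / polyval(slope_coeffs, rt)    # subtract the adjustment
    return rt


# Hypothetical GU = f(RT) = 0.002*RT^2 + 0.4*RT + 1.0 (a low degree stand-in).
F_COEFFS = [0.002, 0.4, 1.0]
first_approximation = 13.5  # hypothetical RT from equation g
target_gu = 7.0             # hypothetical adjusted GU from the library

refined = refine_rt(F_COEFFS, first_approximation, target_gu, iterations=3)
print(round(refined, 4), round(polyval(F_COEFFS, refined), 6))
```

Because f is smooth and monotonic over the calibrated range in this example, a small number of iterations is enough for f(RT) to match the target GU closely.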
With respect to determination of expected RTs 114 at block 228, referring to block 234, in an alternative approach, a GU value may be entered into equation f to solve for RT (e.g., GU=f(RT)). In this regard, the implementation at block 224 may represent a 1st approximation of the RT, and the adjustment at block 226 may represent a refinement of that approximation. This alternative approach would also eliminate blocks 220, 224, and 226. Moreover, the circular arrows between blocks 226 and 228 show that the adjustment in block 226 can be performed multiple times to obtain better approximations (e.g., the iterative approach as disclosed herein).
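The alternative of solving GU = f(RT) directly for RT may be sketched with a bracketing root finder. Bisection is used here purely as an illustrative numerical method; it is a choice made for this example and assumes that f is monotonic between the two bracketing RTs. The function f and all values are hypothetical.

```python
def solve_rt(f, target_gu, rt_low, rt_high, tol=1e-6):
    """Find RT in [rt_low, rt_high] with f(RT) = target_gu by bisection."""
    g_low = f(rt_low) - target_gu
    g_high = f(rt_high) - target_gu
    if g_low * g_high > 0:
        raise ValueError("target GU is not bracketed by the given RT range")
    while rt_high - rt_low > tol:
        mid = 0.5 * (rt_low + rt_high)
        g_mid = f(mid) - target_gu
        if g_low * g_mid <= 0:
            rt_high = mid            # root lies in the lower half
        else:
            rt_low, g_low = mid, g_mid  # root lies in the upper half
    return 0.5 * (rt_low + rt_high)


def gu_from_rt(rt):
    """Hypothetical GU = f(RT) used only for this example."""
    return 0.002 * rt ** 2 + 0.4 * rt + 1.0


solved_rt = solve_rt(gu_from_rt, target_gu=7.0, rt_low=4.0, rt_high=20.5)
print(round(solved_rt, 4))
```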
Further, at block 230, the expected RTs 114 may be provided to a user.
The apparatus 100 provides an air gap 232 that separates a user's personal data (e.g., at 206) from the software platform (e.g., at 222). A user may likely collect both confidential and non-confidential experimental data, but they are only required to enter non-confidential data into the software platform for the apparatus 100. This data may be processed by the software platform and returned to the user as a list of predicted RTs, which assists the user in interpreting their data.
The adjustment applied by the RT analyzer 108 at block 204 of
However, to limit the burden of obtaining many calibration points, effective adjustment values for S1, S3 and S4 glycans may be determined by scaling up or down the adjustments from the S0 or S2 glycans.
The apparatus 100 thus allows a user to enter observed RTs for their glucose homopolymer standard and enter RTs for an S0 and an S2 glycan. A library of GU values may be provided for labelled glycans that have been determined experimentally using a HILIC column and separation method. When the apparatus 100 is operated after submittal of calibration RTs by a user, the apparatus may determine the predicted RTs at which the user should expect to see each glycan in a library on their own system.
Referring to
An example of implementation of the flow of
Referring to
Referring to
An example graph of glucose ladder data is shown at 408. The graph of glucose ladder data shows a 5th degree polynomial y = 4E-07x^5 − 5E-05x^4 + 0.0032x^3 − 0.1185x^2 + 2.8982x − 5.0559. Further, the graph of glucose ladder data shows a logarithmic equation y = 13.563 ln(x) − 16.487.
Referring to
The processor 602 of
Referring to
The processor 602 may fetch, decode, and execute the instructions 608 to receive neutral and charged glycan RTs 106.
The processor 602 may fetch, decode, and execute the instructions 610 to determine, based on application of the glucose ladder RTs 104, and the neutral and charged glycan RTs 106 to expected glucose unit (GU) values, adjusted GU values 110.
The processor 602 may fetch, decode, and execute the instructions 612 to determine, based on the adjusted GU values 110, expected RTs 114 for specified glycans.
The processor 602 may fetch, decode, and execute the instructions 614 to identify, based on the expected RTs, at least one compound 130.
Referring to
At block 704, the method may include determining, based on application of the at least one glucose ladder RT 104, and the at least one neutral and charged glycan RT 106 to at least one expected glucose unit (GU) value, at least one GU adjustment factor 118.
At block 706, the method may include determining, based on the at least one GU adjustment factor 118, at least one adjusted GU value 110.
At block 708, the method may include determining, based on the at least one adjusted GU value 110, at least one expected RT 114 for at least one specified glycan.
Referring to
The processor 804 may fetch, decode, and execute the instructions 808 to determine, based on the adjusted GU values 110, expected RTs 114 for specified glycans.
What has been described and illustrated herein is an example along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the spirit and scope of the subject matter, which is intended to be defined by the following claims—and their equivalents—in which all terms are meant in their broadest reasonable sense unless otherwise indicated.
This application claims priority to co-pending U.S. Provisional Patent Application Ser. No. 63/587,593, filed Oct. 3, 2023, titled “GLYCAN PEAK ASSIGNMENT”, the disclosure of which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63/587,593 | Oct 2023 | US