A Sequence Listing, which is a part of the present disclosure, is submitted concurrently with the specification as a text file. The name of the text file containing the Sequence Listing is “57777_Seqlisting.txt.” The Sequence Listing was created on May 13, 2022, and is 557 bytes in size. The subject matter of the Sequence Listing is incorporated by reference herein in its entirety.
The present invention relates to a method of designing carbohydrates.
Glycans are major post-translational modifications, and their structures can directly impact protein characteristics such as binding kinetics, stability, and bioavailability [1, 2]. Therefore, an understanding of their associated biosynthetic pathways is essential for efforts to modify or engineer glycosylation [3-5]. However, since glycan synthesis is highly stochastic and compartmentalized, real-time observation of the glycosylation process is extremely difficult and further complicated by the dynamic structures of the endoplasmic reticulum and Golgi apparatus [6, 7]. Thus, it has been challenging to fully understand the dynamic process of glycan synthesis [8]. Given our incomplete understanding of the glycosylation machinery and the costly and laborious glycomics procedures, predictive computational glycosylation models can be invaluable for capturing the features of the complex glycosylation machinery and for understanding how that machinery responds to external or internal signals and perturbations.
Over the past two decades, several computational models have been built to quantify and model glycan synthesis [9-14]. Recently, a Markov chain method [15, 16] was developed for modeling N-linked glycosylation. This approach has the advantage of being a low-parameter framework that does not require kinetic characterization a priori. The Markov chain process effectively captures the sequential and stochastic nature of glycan modification. In the model, each node represents a glycan and the state transitions are the reactions that add a single sugar to the glycan. Thus, the edge weight is a transition probability: the flux that makes a single glycan from a single precursor glycan, divided by the total flux that makes all glycans from that same precursor. The stationary distribution of a Markov model represents the distribution of all fluxes used to make all measured glycans. One can learn the transition probabilities for each reaction by fitting the model to a single glycoprofile, and subsequently predict changes in glycosylation following glycoengineering. Initial studies have laid the groundwork for this approach, but further work is needed to develop models that are broadly applicable and practical for predicting the glycosylation outcome of complex glycoengineering for diverse protein products.
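The relationship between fluxes and transition probabilities described above can be illustrated with a minimal sketch. The network below is a hypothetical toy example: the node names (G0 to G4), the flux values, and the helper functions are assumptions made for illustration and are not taken from the model of [15, 16]. Each transition probability is computed as one edge's flux divided by the total flux leaving the same precursor, and probability mass is then pushed through the chain until it rests on glycans with no outgoing reaction.

```python
from collections import defaultdict


def transition_probabilities(fluxes):
    """Normalize fluxes {(precursor, product): flux} per precursor.

    Each result is flux(precursor -> product) divided by the total flux
    leaving that precursor, as described in the text.
    """
    totals = defaultdict(float)
    for (precursor, _product), flux in fluxes.items():
        totals[precursor] += flux
    return {
        edge: flux / totals[edge[0]]
        for edge, flux in fluxes.items()
        if totals[edge[0]] > 0
    }


def propagate(tps, start, steps):
    """Push probability mass from `start` through the chain.

    Glycans with no outgoing reaction are treated as absorbing
    (an assumption of this toy example).
    """
    has_outgoing = {precursor for (precursor, _product) in tps}
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            if node not in has_outgoing:
                nxt[node] += mass  # absorbing: mass stays put
                continue
            for (precursor, product), p in tps.items():
                if precursor == node:
                    nxt[product] += mass * p
        dist = dict(nxt)
    return dist


# Hypothetical fluxes, for illustration only.
fluxes = {
    ("G0", "G1"): 3.0,
    ("G0", "G2"): 1.0,
    ("G1", "G3"): 2.0,
    ("G1", "G4"): 2.0,
}
tps = transition_probabilities(fluxes)
profile = propagate(tps, "G0", steps=2)
print(tps)
print(profile)
```

In this toy network the probabilities leaving each precursor sum to one, and the resulting profile plays the role of a predicted glycoprofile.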
One challenge in model-based glycoengineering is how to account for complex regulatory mechanisms of the glycosylation machinery and accurately define enzyme and isozyme specificity for different glycan substrates. Indeed, glycosyltransferase (GT) isozyme specificity and interactions between glycosyltransferases remain unclear and therefore difficult to model. Recently, studies have confirmed functional interactions among several GT isozymes, wherein one GT impacts the function of another. Examples include interactions between β-1,4-galactosyltransferase (B4galt) and Mannosyl-glycoprotein N-acetylglucosaminyltransferases (Mgat), B4galt and β-1,3-N-acetylglucosaminyltransferase (B3gnt), Mgat and B3gnt, and B4galt and beta-galactoside alpha-2,3-sialyltransferase (St3gal) [17-20]. Evidence of these interactions has been based on an observed dependency of glycoprofiles or omics data of GT-knockout cell lines (e.g. ST3GAL1 and B4GALT1 interaction [18]). While these findings suggested GT isozymes interact with each other through direct protein-protein interactions or transcriptional regulation, the specific mechanisms of these interactions and the extent of such interactions have not been extensively studied.
Another significant hurdle for predictive modeling for glycoengineering is our incomplete understanding of GT catalytic specificity. Some glycosyltransferase isozymes, such as those from the B4galt and St3gal families, have more specific catalytic activity on different branches of N-glycans [17, 21-24]. However, the complex GT-GT interactions, unknown glycan substrate specificities, and the difficulty in obtaining comprehensive omics and enzyme kinetic data have all presented great challenges to rational model-driven glycoengineering. Therefore, while considerable efforts have been made to predict the glycosylation patterns of recombinant proteins produced in glycoengineered CHO cells [15, 16, 25, 26], model-based prediction of a glycoengineered glycoprofile from the wildtype glycoprofile is still challenging.
The present invention provides a method for determining the biosynthetic basis of a glycosylation pattern or lipid pattern on a cell, glycolipid, tissue, or a protein to be produced by a cell, or an organism to be engineered, comprising: quantifying the impact on the abundance of a glycan or lipid, stemming from enzyme mutation, gene/protein expression changes, or activities of other enzymes to learn enzyme specificity and enzyme interaction rules; and applying learned enzyme specificity and enzyme interaction rules for glycosylation pattern or lipid pattern to predict an outcome of enzyme mutations or gene/protein expression changes on the glycosylation or lipid pattern on a studied protein, lipid, or cell.
In some embodiments the enzyme is a glycosyltransferase (GT) or a glycosidase or an enzyme in lipid biosynthesis or lipid degradation, such as any one enzyme selected from Table 1 or Table 3.
In some embodiments the enzyme mutations occur by natural mutations, such as by genetic variations of the enzymes or non-naturally by modification of the gene sequence or post-translational modification or enzyme activity through cell culture or chemical treatment, or by changing gene/protein expression levels by natural or non-natural means. In some embodiments the mutations or gene/protein expression changes occur on a single enzyme or multiple enzymes.
In some embodiments the lipids or glycans are free or are attached to a protein, lipid, tissue, recombinant vaccine, or a cell.
In some embodiments the biological source of the glycosylation pattern or lipid pattern (e.g., protein, lipid, cell, tissue sample) is either the same product or a different product from the control product.
In some embodiments the method utilizes a Markov model.
In some embodiments the method to quantify enzyme mutational effects on reactions catalyzed by other enzymes utilizes Markov transition probabilities.
In some embodiments the enzyme mutations or gene/protein expression changes are in different enzymes and/or isozymes.
In some embodiments the cell is a eukaryotic cell, such as a Chinese hamster ovary cell or a human cell, or cancer cell, or mammalian cell, or fish cell, or plant cell, or insect cell, or yeast cell, or fungus, or other microbe.
In some embodiments the glycosylation is any kind of glycosylation, such as N-linked glycosylation, O-linked glycosylation, or glycolipid.
In some embodiments the tissue is any kind of tissue, such as skin, pyloric caeca, or proximal intestine.
In some embodiments the organism is any kind of species, such as a microbe, such as a virus or a bacterium, a plant, or an animal, such as a fish or a mammal.
The present invention provides in a further aspect, a method of producing a protein or lipid or cell or tissue or organism having a desired lipid or glycosylation pattern comprising determining a glycosylation pattern by the methods of the invention, and producing the glycosylated protein or lipid or cell.
The present invention provides in a further aspect a glycan or lipid that is free or part of a protein or cell or tissue or organism produced by the method of the invention.
The present invention provides in a further aspect a method of glycosylated protein or lipid production, tissue engineering, or organism engineering in a biological sample according to the invention, wherein the method is conducted in a biopharmaceutical manufacturing facility.
The present invention provides in a further aspect a method of treatment for a biological sample in need comprising administering to the subject a treatment effective amount of the glycosylated protein or lipid or cell or tissue or organism of the invention, wherein the biological sample is a cell culture.
The present invention provides in a further aspect a method of treatment for a biological sample in need comprising administering to the subject a treatment effective amount of the glycosylated protein or lipid or cell of the invention, wherein the biological sample comprises a mammalian cell (e.g., CHO cells or a human subject), or an animal cell (e.g., salmon cells or fish subjects), or a plant cell, or an insect cell, or yeast cell, or fungus, or other microbe.
The disclosure further provides a method for determining a glycosylation pattern on a protein to be produced by a cell or a lipid pattern produced by a cell, comprising: quantifying glycosyltransferase (GT) or lipid enzyme mutation effects on reactions catalyzed by other GTs or lipid enzymes from a glycosylated product or lipid-comprising product to establish learned GT-GT interaction rules or enzyme-enzyme interaction rules, and applying learned GT-GT or enzyme-enzyme interaction rules from the quantifying step to predict an outcome of GT or enzyme mutations on the glycosylation pattern on the protein or lipid pattern.
In embodiments, the disclosure provides a method for determining a glycosylation pattern on a protein or lipid pattern to be produced by a cell, wherein the mutations occur on more than one GT gene or lipid metabolism gene.
In embodiments, the disclosure provides a method for determining a glycosylation pattern on a protein to be produced by a cell, wherein the glycosylated product is a protein.
In embodiments, the disclosure provides a method for determining a glycosylation pattern on a protein or a lipid pattern to be produced by a cell, wherein the protein is different from the glycosylated product or the lipid product and wherein the method utilizes a Markov model.
In embodiments, the disclosure provides a method for determining a glycosylation pattern on a protein or lipid pattern to be produced by a cell, wherein GT or enzyme mutations are in different GT or lipid metabolism isozymes and wherein the cell is a human cell or a CHO cell or a mammalian cell.
In embodiments, the disclosure provides a method of producing a protein having a desired glycosylated pattern comprising determining a glycosylation pattern and producing the glycosylated protein.
In embodiments, the disclosure provides a method of treatment for a subject in need comprising administering to the subject a treatment effective amount of the glycosylated protein.
The present invention provides an extensive Markov modeling framework for glycosylation and lipids. Specifically, this modeling framework can learn glycosyltransferase or lipid enzyme activities, including substrate specificities of individual GT or lipid-metabolism isozymes. We further present here a model that predicts the glycosylation of protein drugs produced by glycoengineered Chinese hamster ovary (CHO) cells with multiple glycosyltransferase isozyme knockouts. We demonstrate that our model can estimate the isozyme specificity. We further employed the model to predict the glycoprofiles of multiple glycosyltransferase knockouts. Finally, we show our model effectively predicts glycoengineered glycoprofiles for three diverse recombinant protein drugs (Rituximab, Enbrel, and alpha-1 antitrypsin) produced by CHO cells, based solely on their wildtype glycoprofiles. These results demonstrate that our updated modeling framework provides a valuable approach for rational glycoengineering and for elucidating the relationships among glycosyltransferases, wherein one can discover the genetic basis of complex glycosylation regulatory mechanisms.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
Unless defined otherwise, all technical and scientific terms and any acronyms used herein have the same meanings as commonly understood by one of ordinary skill in the art in the field of the invention. Although any methods and materials similar or equivalent to those described herein can be used in the practice of the present invention, the exemplary methods, devices, and materials are described herein.
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are within the skill of the art. Such techniques are explained fully in the literature, such as, Molecular Cloning: A Laboratory Manual, 2nd ed. (Sambrook et al., 1989); Oligonucleotide Synthesis (M. J. Gait, ed., 1984); Animal Cell Culture (R. I. Freshney, ed., 1987); Methods in Enzymology (Academic Press, Inc.); Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987, and periodic updates); PCR: The Polymerase Chain Reaction (Mullis et al., eds., 1994); Remington, The Science and Practice of Pharmacy, 20th ed., (Lippincott, Williams & Wilkins 2003), and Remington, The Science and Practice of Pharmacy, 22nd ed., (Pharmaceutical Press and Philadelphia College of Pharmacy at University of the Sciences 2012).
As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” “contains”, “containing,” “characterized by,” or any other variation thereof, are intended to encompass a non-exclusive inclusion, subject to any limitation explicitly indicated otherwise, of the recited components. For example, a fusion protein, a pharmaceutical composition, and/or a method that “comprises” a list of elements (e.g., components, features, or steps) is not necessarily limited to only those elements (or components or steps), but may include other elements (or components or steps) not expressly listed or inherent to the fusion protein, pharmaceutical composition and/or method.
As used herein, the transitional phrases “consists of” and “consisting of” exclude any element, step, or component not specified. For example, “consists of” or “consisting of” used in a claim would limit the claim to the components, materials or steps specifically recited in the claim except for impurities ordinarily associated therewith (i.e., impurities within a given component). When the phrase “consists of” or “consisting of” appears in a clause of the body of a claim, rather than immediately following the preamble, the phrase “consists of” or “consisting of” limits only the elements (or components or steps) set forth in that clause; other elements (or components) are not excluded from the claim as a whole.
As used herein, the transitional phrases “consists essentially of” and “consisting essentially of” are used to define a fusion protein, pharmaceutical composition, and/or method that includes materials, steps, features, components, or elements, in addition to those literally disclosed, provided that these additional materials, steps, features, components, or elements do not materially affect the basic and novel characteristic(s) of the claimed invention. The term “consisting essentially of” occupies a middle ground between “comprising” and “consisting of”.
When introducing elements of the present invention or the preferred embodiment(s) thereof, the articles “a”, “an”, “the” and “said” are intended to mean that there are one or more of the elements. The terms “comprising”, “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
The term “and/or” when used in a list of two or more items, means that any one of the listed items can be employed by itself or in combination with any one or more of the listed items. For example, the expression “A and/or B” is intended to mean either or both of A and B, i.e. A alone, B alone or A and B in combination. The expression “A, B and/or C” is intended to mean A alone, B alone, C alone, A and B in combination, A and C in combination, B and C in combination or A, B, and C in combination.
It is understood that aspects and embodiments of the invention described herein include “consisting” and/or “consisting essentially of” aspects and embodiments.
It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range. Values or ranges may also be expressed herein as “about,” from “about” one particular value, and/or to “about” another particular value. When such values or ranges are expressed, other embodiments disclosed include the specific value recited, from the one particular value, and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that there are a number of values disclosed therein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. In embodiments, “about” can be used to mean, for example, within 10% of the recited value, within 5% of the recited value, or within 2% of the recited value.
As used herein, “patient” or “subject” means a human or animal subject to be treated.
As used herein the term “pharmaceutical composition” refers to a pharmaceutically acceptable composition, wherein the composition comprises a pharmaceutically active agent, and in some embodiments further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition may be a combination of pharmaceutically active agents and carriers.
As used herein the term “pharmaceutically acceptable” means approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopoeia, other generally recognized pharmacopoeia in addition to other formulations that are safe for use in animals, and more particularly in humans and/or non-human mammals.
As used herein the term “pharmaceutically acceptable carrier” refers to an excipient, diluent, preservative, solubilizer, emulsifier, adjuvant, and/or vehicle with which the pharmaceutically active agent(s) is administered. Such carriers may be sterile liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents. Antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; and agents for the adjustment of tonicity such as sodium chloride or dextrose may also be a carrier. Methods for producing compositions in combination with carriers are known to those of skill in the art. In some embodiments, the language “pharmaceutically acceptable carrier” is intended to include any and all solvents, dispersion media, coatings, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. The use of such media and agents for pharmaceutically active substances is well known in the art. See, e.g., Remington, The Science and Practice of Pharmacy, 20th ed., (Lippincott, Williams & Wilkins 2003). Except insofar as any conventional media or agent is incompatible with the active compound, such use in the compositions is contemplated.
As used herein, “therapeutically effective” refers to an amount of a pharmaceutically active compound(s) that is sufficient to treat or ameliorate, or in some manner reduce the symptoms associated with diseases and medical conditions. When used with reference to a method, the method is sufficiently effective to treat or ameliorate, or in some manner reduce the symptoms associated with diseases or conditions. For example, an effective amount in reference to diseases is that amount which is sufficient to block or prevent onset; or if disease pathology has begun, to palliate, ameliorate, stabilize, reverse or slow progression of the disease, or otherwise reduce pathological consequences of the disease. In any case, an effective amount may be given in single or divided doses.
As used herein, the terms “treat,” “treatment,” or “treating” embraces at least an amelioration of the symptoms associated with diseases in the patient, where amelioration is used in a broad sense to refer to at least a reduction in the magnitude of a parameter, e.g. a symptom associated with the disease or condition being treated. As such, “treatment” also includes situations where the disease, disorder, or pathological condition, or at least symptoms associated therewith, are completely inhibited (e.g. prevented from happening) or stopped (e.g. terminated) such that the patient no longer suffers from the condition, or at least the symptoms that characterize the condition.
As used herein, and unless otherwise specified, the terms “prevent,” “preventing” and “prevention” refer to the prevention of the onset, recurrence or spread of a disease or disorder, or of one or more symptoms thereof. In certain embodiments, the terms refer to the treatment with or administration of a compound or dosage form provided herein, with or without one or more other additional active agent(s), prior to the onset of symptoms, particularly to subjects at risk of disease or disorders provided herein. The terms encompass the inhibition or reduction of a symptom of the particular disease. In certain embodiments, subjects with familial history of a disease are potential candidates for preventive regimens. In certain embodiments, subjects who have a history of recurring symptoms are also potential candidates for prevention. In this regard, the term “prevention” may be interchangeably used with the term “prophylactic treatment.”
As used herein, and unless otherwise specified, a “prophylactically effective amount” of a compound is an amount sufficient to prevent a disease or disorder, or prevent its recurrence. A prophylactically effective amount of a compound means an amount of therapeutic agent, alone or in combination with one or more other agent(s), which provides a prophylactic benefit in the prevention of the disease. The term “prophylactically effective amount” can encompass an amount that improves overall prophylaxis or enhances the prophylactic efficacy of another prophylactic agent.
Numbered embodiments of the invention:
1. A method for determining a glycosylation pattern on a protein to be produced by a cell, comprising:
a. quantifying glycosyltransferase (GT) mutation effects on reactions catalyzed by other GTs from a glycosylated product to establish learned GT-GT interaction rules, and
b. applying learned GT-GT interaction rules from the quantifying step to predict an outcome of GT mutations on the glycosylation pattern on the protein.
2. The method of Embodiment 1, wherein the mutations occur on more than one GT gene.
3. The method of Embodiment 1, wherein the glycosylated product is a protein.
4. The method of Embodiment 1, wherein the protein is different from the glycosylated product.
5. The method of Embodiment 1, wherein the method utilizes a Markov model.
6. The method of Embodiment 1, wherein GT mutations are in different GT isozymes.
7. The method of Embodiment 1, wherein the cell is a human cell.
8. A method of producing a protein having a desired glycosylated pattern comprising determining a glycosylation pattern by the method of any one of Embodiments 1-7 and producing the glycosylated protein.
9. A glycosylated protein produced by the method of Embodiment 8.
10. A method of treatment for a subject in need comprising administering to the subject a treatment effective amount of the glycosylated protein of Embodiment 9.
A Branch-Specific N-glycosylation Markov Model Effectively Predicts Glycosylation of Glycoengineered CHO Cells
Here, we present four major changes to the N-glycosylation Markov model [15, 16] to overcome the aforementioned challenges (see details in Materials and Methods). To test these changes, we defined two different types of models: a branch-specific model and a branch-general model. The branch-specific model introduces the possibility of branch-specific substrate specificity for each isozyme catalyzing sialylation, galactosylation, and poly-LacNAc elongation reactions (see details in Materials and Methods). In contrast, the branch-general model does not distinguish the glycan substrate branches. We subsequently tested this updated framework (
Our newly modified framework demonstrated notable improvements in RMSE and coverage (
Substrate Specificity of Glycosyltransferases can be Predicted by Model Transition Probabilities
To gain insights into effective glycosylation prediction using the branch-specific models, we closely examined the optimized transition probabilities (TPs) of these models. Each transition probability (TP) is regarded as the probability of transition from one state (substrate) to another (product) for a specific reaction type. The wild-type (WT) model is the basis used to compare with the other glycoengineered models. Therefore, we used the wild-type model to explore if substrate specificity of glycosyltransferases could be described by the TPs. The overall WT model showed a good fit (RMSE=7.72e-03) and complete (100%) coverage (
Four important findings from the model TPs (
The Branch-Specific Markov Model Reveals Glycosyltransferase Isozyme Specificity and Co-Dependence
Perturbation experiments are widely used to identify potential regulators (e.g., transcriptional regulators), their gene targets, and their regulatory relationships. Here, we employed the same rationale to study how glycosyltransferases regulate N-linked glycan synthesis, using a comprehensive compilation of GT-perturbed glycoprofiles [24]. Specifically, we systematically quantified the contribution of each GT isozyme to different GT reactions by investigating the impact of a single GT knockout on all other reactions. This was done by computing the fold change of TP vectors between the WT model and the GT-knockout models. A significant interaction between a GT and a reaction is detected if the GT knockout significantly altered both the transition probability (TP) and the reaction flux of the GT-knockout model in comparison with those of the WT model (Materials and Methods).
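The screening step described above can be sketched as follows. This is an illustrative sketch only: the actual significance criteria are given in Materials and Methods, so the single log2 fold-change threshold used here, the small constant `eps`, the reaction labels, and all numerical values are assumptions. A reaction is flagged only when both its transition probability and its flux change beyond the threshold between the WT and knockout models.

```python
import math


def log2_fold_changes(wt, ko, eps=1e-9):
    """Return log2(KO / WT) for every reaction in the WT vector.

    `eps` is an assumed small constant that avoids division by zero.
    """
    return {
        rxn: math.log2((ko[rxn] + eps) / (wt[rxn] + eps))
        for rxn in wt
    }


def significant_interactions(wt_tp, ko_tp, wt_flux, ko_flux, threshold=1.0):
    """Flag reactions whose TP and flux BOTH change beyond `threshold`.

    `threshold` (in |log2 fold change|) is a placeholder, not the
    criterion used in the source.
    """
    tp_fc = log2_fold_changes(wt_tp, ko_tp)
    flux_fc = log2_fold_changes(wt_flux, ko_flux)
    return {
        rxn: tp_fc[rxn]
        for rxn in wt_tp
        if abs(tp_fc[rxn]) > threshold and abs(flux_fc[rxn]) > threshold
    }


# Hypothetical reaction labels and values, for illustration only.
wt_tp = {"rxn_A": 0.80, "rxn_B": 0.20, "rxn_C": 0.50}
ko_tp = {"rxn_A": 0.05, "rxn_B": 0.60, "rxn_C": 0.55}
wt_flux = {"rxn_A": 0.70, "rxn_B": 0.10, "rxn_C": 0.30}
ko_flux = {"rxn_A": 0.02, "rxn_B": 0.50, "rxn_C": 0.10}

hits = significant_interactions(wt_tp, ko_tp, wt_flux, ko_flux)
print(hits)
```

In this toy data, rxn_C is not flagged because its flux changes while its transition probability barely moves, which reflects the requirement that both quantities be altered.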
Our results show the total effects of glycosyltransferases on N-linked glycosylation, as identified by the branch-specific models (
The B3gnt-family glycosyltransferases add GlcNAc to the galactose of the N-linked glycans (poly-LacNAc extension). We observed their differentiated catalytic capabilities on LacNAc extension (red lines in
Intriguingly, although glycosylation is known to be a non-templated glycan synthesis process, all these results suggest that glycosylation is a robust cellular process with mechanisms that respond to GT knockout. While interactions between different isozymes in the same family and other GTs are complicated, our model TPs and flux variation were highly consistent with the GTs' known interactive mechanisms or enzyme kinetics. While further experimental validation is required, our model captured glycosyltransferase isozyme specificity and suggested how glycosyltransferases influence one another's activities. These insights shed light on the regulation of N-linked glycosylation.
Glycoprofiles for Complex GT Mutants can be Predicted from Single GT Knockout Models
Genetic interactions complicate the prediction of multi-gene knockout phenotypes, especially when the genes are involved in the same pathway. However, since our modeling framework captures the pathway architecture in N-linked glycosylation, we examined if our models trained on single GT mutants could predict glycoprofiles for mutants with more complex genotypes. Specifically, after obtaining the fitted models of single GT knockouts, we extracted transition probability (TP) vectors from these models and combined them to create new TP vectors, which predicted the GTs' collective influence on the N-glycosylation synthesis for the combinatory knockout experiments. We developed an algorithm that enabled us to assess the significance of TP fold change vector elements for a multiplex glycoengineered Markov model (Materials and Methods). Briefly, our algorithm identifies the fitted single-knockout TPs that define the changes in reaction flux following the knockout of an isozyme. It subsequently merges these TPs for all gene knockouts in the more complex mutant to establish a new multi-gene knockout TP vector for glycoprofile prediction.
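The merging step can be pictured with a short sketch. The actual merging rule is described in Materials and Methods; the multiplicative combination below is an assumption chosen for simplicity, and the gene-to-reaction fold changes and WT values are hypothetical. Each single-knockout model contributes only the fold changes judged significant, and these are applied in turn to the WT transition probability vector. Any renormalization of the merged vector is left out of this sketch.

```python
def merge_knockouts(wt_tp, single_ko_fc, knockouts):
    """Build a multi-gene knockout TP vector.

    wt_tp:        {reaction: WT transition probability}
    single_ko_fc: {gene: {reaction: fold change}} with only the
                  significant entries from each single-knockout model
    knockouts:    genes knocked out together in the complex mutant

    Assumption: fold changes from different knockouts combine
    multiplicatively on the same reaction.
    """
    merged = dict(wt_tp)
    for gene in knockouts:
        for rxn, fold_change in single_ko_fc[gene].items():
            merged[rxn] *= fold_change
    return merged


# Hypothetical values, for illustration only.
wt_tp = {"rxn_A": 0.4, "rxn_B": 0.5, "rxn_C": 0.1}
single_ko_fc = {
    "gene_1": {"rxn_A": 0.25, "rxn_C": 2.0},
    "gene_2": {"rxn_B": 0.0, "rxn_C": 1.5},
}

double_ko_tp = merge_knockouts(wt_tp, single_ko_fc, ["gene_1", "gene_2"])
print(double_ko_tp)
```

The merged vector would then be supplied to the Markov model in place of the WT vector to predict the glycoprofile of the combined knockout.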
The predicted glycoprofiles produced by our models showed high consistency with the experimental profiles for the multi-gene knockouts. Specifically, glycoprofiles were accurately predicted for eight erythropoietin (EPO) samples, each produced in different glycoengineered CHO cells with different combinations of glycosyltransferases knocked out. The multi-gene knockout models predict glycoprofiles with excellent performance (all log2(RMSEs)<−5.5), comparable to the fitting performance in general (
Glycoprofiles can be Predicted for Additional Glycoengineered Drugs De Novo, Based Solely on TP Fold Changes Learned from EPO
Various factors impact the glycoprofile of each unique protein, including protein sequence, structure, post-translational modifications, etc. Thus, it is unclear if glycosyltransferase preferences for one glycoprotein substrate will translate to other protein substrates. We therefore tested if the EPO-trained models could be generalized to predict the glycoprofiles of other glycoengineered protein drugs (see details in Materials and Methods) directly from their corresponding wildtype models (see
Testing our hypothesis, we predicted glycoprofiles for three different drugs (Rituximab, alpha-1 antitrypsin, and Enbrel) produced by CHO cell lines with both single and multiplex GT knockouts covering all the four GT families (
The Low-Parameter Markov Framework is Further Simplified for More Efficient Modeling of Glycosylation
Over the past two decades, several mathematical models have provided insights into the complex glycosylation machinery [8, 10, 25, 36, 37]. Here, we extended our low-parameter Markov model framework [15] and demonstrated its ability to predict GT substrate specificity and the outcome of multiplex glycosyltransferase mutations. This low-parameter approach does not require the input of kinetic or concentration information, and we further simplified it by updating the transition probability (TP) formulation to describe only the activity of the 20 different glycosyltransferases and glycosidases (the previous formulation considered all transitions at each branch point in the biosynthetic network independently). In essence, the updated framework ties transition probabilities (TPs) closely to the enzymes' catalytic capabilities, which is especially effective for modeling glycoengineered glycoprofiles. By closely examining the fluxes of glycosylation models, we found that the new method comprehensively captures the active parts of the glycosylation network following glycoengineering. For example, our single knockout models (Mgat4b and Mgat5) identified significantly increased poly-LacNAc extension fluxes, which is consistent with known competition between the Mgat isozymes and B3gnt isozymes for the same GlcNAc monosaccharides ([29, 30], see Results). Furthermore, we replaced the original flux variability analysis (FVA) with an efficient global optimization algorithm, Pattern Search. At present, we are able to fit a glycoprofile within 2 hours for a model with 8,435 glycans and 19,719 reactions, a task that took a few days to complete with the original FVA optimization algorithm. Both the reduced number of TPs and the new algorithm make the computational time of fitting a large reaction network more practical.
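To illustrate the kind of derivative-free fitting referred to above, the following is a minimal sketch of a basic coordinate (compass) pattern search that fits two transition probabilities to a measured profile by minimizing RMSE. It is not the implementation used in the source: the two-parameter toy model, the measured values, the step schedule, and the function names are all assumptions. The search polls each parameter in both directions, accepts any move that lowers the objective, and halves the step when no move helps.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length profiles."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def toy_profile(tp):
    """Hypothetical 3-glycan chain: G0 -(p1)-> G1 -(p2)-> G2."""
    p1, p2 = tp
    return [1 - p1, p1 * (1 - p2), p1 * p2]


def pattern_search(objective, x0, step=0.25, tol=1e-6, lower=0.0, upper=1.0):
    """Basic compass search on a box-bounded parameter vector."""
    x = list(x0)
    best = objective(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                # Keep each probability inside [lower, upper].
                trial[i] = min(upper, max(lower, trial[i] + delta))
                value = objective(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2  # shrink the polling pattern
    return x, best


# Hypothetical measured glycoprofile, for illustration only.
measured = [0.2, 0.3, 0.5]
fit, err = pattern_search(lambda tp: rmse(toy_profile(tp), measured), [0.5, 0.5])
print(fit, err)
```

Because the search relies only on objective comparisons, it needs no gradients or kinetic parameters, which matches the low-parameter spirit of the framework.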
Computational Analyses can Unravel Multi-Glycosyltransferase Interactions Impacting Activities Beyond their Simple Enzyme Rules
A critical challenge in developing a predictive glycosylation model lies in quantifying the genetic interactions that go beyond each GT's simple enzyme rules. Recently, large amounts of glycoprofiling data were generated from GT knockouts. These data allow us to capture how each perturbed GT impacts the expected activities of other GTs, providing new insights into the genetic interactions between different glycosyltransferases. We present here a comprehensive documentation of genetic interactions between glycosyltransferases. Importantly, while GTs are expected to be specific to their own catalytic functions, we show here that knocking out a glycosyltransferase can impact the function of other GTs. For instance, the Mgat2 knockout decreased its own GnTII reaction but promoted the b4GalT-Branch2 reaction (galactosylation). These findings raise at least two important issues for biotherapeutic glycoengineering applications. The first concerns the extent to which unintended GT changes (off-target effects) may arise from a specific GT perturbation. The second is that rational glycoengineering of a specific glycoform could be less intuitive than previously thought. However, as multiplex GT mutants are constructed and profiled, computational approaches such as the one presented here can identify and account for genetic interactions, thus helping improve rational glycoengineering of biotherapeutics. Furthermore, such computational analyses can be leveraged to guide research into the underlying molecular mechanisms (e.g., transcription, epigenetics, and feedback loops) regulating GT-GT interactions.
Predicting Glycosylation with Minimal a Priori Knowledge
One of the major goals of developing glycosylation models is to provide guidance for glycoengineering therapeutic proteins. The present findings contribute to the field's understanding of the underlying rules by which single GT knockout models combine into a complex GT mutant model, which enables us to predict the glycoprofiles of multi-gene mutants. The performance of our model indicates that TP fold changes capture the specificity of each isozyme. The TP values that were learned and quantified from glycoengineered EPO profiles could be combined to predict the glycoprofiles of multi-gene mutants producing distinct glycoproteins, as long as the WT glycoprofile for the new protein of interest is available. These results lend credence to the hypothesis that GT interactions are generally encoded in the glycosylation machinery and can be captured by our glycosylation model. The effects of complex GT knockout strategies appear to impact different biologics in a similar manner. The accuracy of the predictions and the generalizability of the model pave the way for further study of glycosyltransferase interactions and for rational glycoengineering of better biopharmaceuticals.
Disentangling the Functions of Different Isozymes
We demonstrated here that model-based analyses can discover or reinforce our understanding of the unique functions of different GT isozymes. We found that there are major isozymes whose knockouts impacted more reactions. Several studies have demonstrated the diversity of GT isozymes. For example, in different mammalian cells, Mgat4b is more responsible for the GlcNAc-β1,4-Man-α1,3 branching [24], B4galt1 for galactosylation [24, 38], St3gal4 for sialylation [39], and B3gnt2 for poly-GlcNAc formation [20, 32, 33]. Our glycosylation modeling framework confirmed the putative GT specificity and reinforced the dominant role of these major GT isozymes in CHO cells. Furthermore, our results suggest that different GT isozymes differ in their functions. For instance, our model suggests that knocking out St3gal6 or St3gal4 had the most severe impact on sialylation (decreasing sialylation fluxes by >85%), whereas knocking out St3gal3 had little influence. These results are in accordance with their primary role in sialylation [39]. This knowledge is particularly important and could be applied to improve product quality through glycoengineering by partially dialing down some glycan epitopes. Indeed, sialylation is a key factor in most glycoengineering, since it can improve the serum half-life and activity of these drugs [40]. On the other hand, limiting sialylation on monoclonal antibodies (mAbs) could enhance antibody-dependent cell-mediated cytotoxicity (ADCC) and complement-dependent cytotoxicity (CDC). In these cases, one could consider knocking out a few sialyltransferases (St3gal3, St3gal4, or St3gal6) for better control of mAb sialylation. The proposed model framework thus provides a toolbox that could help identify the best combination of GT isozymes for desired glycoforms.
The more we are able to disentangle the functions of different isozymes, the better we can ultimately control the glycosylation machinery, which should be an important steppingstone toward rational glycoengineering.
Conclusions
Here we present a substantial improvement to the Markov chain modeling framework for glycosylation, which accounts for branch-specificity and isozyme preference. These refined models effectively simulated the N-glycosylation process of recombinant proteins produced by various glycoengineered CHO cell lines. The essence of our model is transition probabilities, which capture the catalytic capabilities of glycosyltransferase isozymes and quantify the changes in glycosylation after knocking out various isozymes. Exploiting the new modeling framework, we systematically examined the potential interactions between different families of glycosyltransferases and their substrate/branch specificities, which provides insights into the roles of GT isozymes in specific contexts. Our results here further demonstrated that we can predict complex glycoengineered glycoprofiles from single-KO models. With the learned fold changes of transition probabilities from EPO, we achieved de novo prediction of GT-KO glycoprofiles directly from their WT glycoprofiles for new protein drugs produced by CHO cells. Therefore, as this framework facilitates rational glycoengineering of various glycosylated protein drugs, it will accelerate the development of effective, safe, and affordable glycosylated biopharmaceuticals.
Materials and Methods
Framework of Markov Chain Model for the N-Linked Glycosylation
The Markov model of glycosylation is implemented as previously published [15], with a few adaptations described here to improve the fitting to glycoprofiles and subsequent model predictions (
Model Evaluation Metrics—RMSE and Coverage
Two metrics were used to evaluate the performance of our models. The first is the root mean squared error (RMSE), which assesses the goodness of fit between the model-predicted glycan intensities and the experimentally measured glycan intensities. The experimental glycoprofiles were fit by optimizing the TP vectors to minimize the RMSE between the model-predicted glycoprofile and the experimental glycoprofile. The RMSE was calculated by equation 1, where N represents the number of all possible glycan compositions (m/z values) in the experimental glycoprofile, and y_pre,i and y_exp,i represent the predicted and experimentally measured signal intensities, respectively, at the ith m/z value (glycan composition).
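The RMSE described above can be sketched as follows, assuming that equation 1 takes the standard form, i.e. the square root of the mean of the squared differences between y_pre,i and y_exp,i over the N m/z values. The function name and the list-based inputs are illustrative choices, not part of the original implementation.

```python
import math

def rmse(predicted, measured):
    """Root mean squared error between two aligned intensity vectors.

    predicted, measured : sequences of signal intensities, one entry per
                          m/z value (glycan composition), in the same order.
    """
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("intensity vectors must be non-empty and equally long")
    n = len(predicted)
    squared_error = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared_error / n)
```

Because the two vectors are compared position by position, both profiles need to be aligned on the same list of m/z values before the call, with zeros for glycans that are absent from one of them.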
Statistical significance was further assessed using the highest density interval (HDI): when the 95% HDI of the difference between two groups of tested models excludes zero, the two groups are considered credibly different.
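For readers unfamiliar with the HDI, the following is a minimal sketch of how one can be estimated from posterior samples: it is the narrowest interval that contains the requested share of the samples. The function names and the sample-based approach are assumptions made for illustration; the analysis software used in this work may compute the interval differently.

```python
import math

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (empirical HDI)."""
    if not samples:
        raise ValueError("at least one sample is required")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

def excludes_zero(samples, mass=0.95):
    """True when the HDI of the samples lies entirely above or below zero."""
    low, high = hdi(samples, mass)
    return low > 0 or high < 0
```

Applied to posterior samples of a difference between two groups, `excludes_zero` returns True only when the whole interval lies on one side of zero.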
Another model evaluation metric is ‘coverage’ for assessing how many of the experimentally measured glycans were accurately included among the glycans predicted by our framework. For an experimental glycoprofile, the m/z values corresponding to glycans with the top signal intensities and collectively representing at least 90% of the total signal intensity were selected as experimentally detected glycans. The coverage was defined as the ratio of these glycan compositions that can be captured by the glycoprofiles predicted by the Markov models (branch-specific and branch-general models).
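The coverage metric can be sketched as below. Two points are assumptions of this example and are not specified in the text: glycans are ranked by intensity and added until the running total reaches the 90% cutoff, and a glycan counts as "captured" when the model predicts a signal above `min_signal` for its m/z value. The dictionary inputs keyed by m/z are likewise illustrative.

```python
def detected_glycans(profile, cutoff=0.90):
    """Top-intensity m/z values that together reach `cutoff` of total signal."""
    total = sum(profile.values())
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for mz, intensity in ranked:
        if running >= cutoff * total:
            break
        selected.append(mz)
        running += intensity
    return selected

def coverage(experimental, predicted, cutoff=0.90, min_signal=0.0):
    """Fraction of experimentally detected glycans present in the prediction."""
    detected = detected_glycans(experimental, cutoff)
    if not detected:
        raise ValueError("experimental profile has no detectable signal")
    captured = [mz for mz in detected if predicted.get(mz, 0.0) > min_signal]
    return len(captured) / len(detected)

# Hypothetical profiles keyed by glycan composition labels.
experimental = {"a": 50.0, "b": 30.0, "c": 15.0, "d": 5.0}
predicted = {"a": 40.0, "c": 20.0, "d": 40.0}
example_coverage = coverage(experimental, predicted)
```

In this example the detected set is a, b, and c (together 95% of the signal), and the prediction captures a and c, so the coverage is two thirds.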
Predicting Multiple GT Knockouts from Single GT Knockout Models
The TP vector for a given multiple-knockout glycoprofile was derived from the TP vectors of the relevant fitted single-knockout glycoprofiles. Four criteria were used to define the significance of TP vector elements for a multiplex glycoengineered Markov model. Specifically, the fitted single-knockout TPs are required for substantiating the impact of knocking out an isozyme on the reactions listed in Table 2. First, the TP fold change of reaction i after knocking out glycosyltransferase k must be statistically different from 0 (i.e., the 95% highest density interval (HDI) from the BEST analysis does not include 0). Bayesian estimation was used to assess the statistical credibility of the flux and TP changes. Second, the mean flux fold change of reaction i after knocking out glycosyltransferase k must be at least 1.5-fold (|log2(mean flux fold change)|≥0.58), and the mean flux fold change ± one standard deviation must not include 1. Two additional criteria were then established for predicting a new TP for a glycoprofile with combinatorial glycosyltransferase knockouts. Third, if all isozymes of the same family are knocked out, the TP log2 fold changes of the associated direct reaction(s) will be reduced to at most −10 (eliminating the fluxes of direct reactions). Fourth, log2(flux fold change) and log2(TP fold change) must have the same sign for the KO model of glycosyltransferase k. These four criteria were applied in equations 2-3 for deriving the final combined TP vectors:
Briefly, FC(TP^F_{i,k}) is the TP fold change of reaction i when that reaction (denoted 'F') is directly catalyzed by GT isozyme k, and FC(TP^S_{i,k}) is the TP fold change of reaction i when that reaction (denoted 'S') is potentially impacted by the knockout of GT isozyme k. Table 2 lists the reactions directly catalyzed by a given enzyme based on their known reaction rules. The potentially impacted reactions are all the other reactions, which are not directly catalyzed by GT isozyme k but can be indirectly influenced either kinetically or through other known interactions (e.g., B4galt and Mgat4). A_i is the number of non-zero FC(TP^S_{i,k}), and FC(TP^C_{i,k}) is the TP fold change of reaction i for the predicted multiple glycosyltransferase knockout glycoprofile. FC (fold change) is defined as the TP of reaction i for the fitted WT glycoprofile divided by that for the predicted multiple GT-KO glycoprofile. The derived (predicted) TP vector for a combined GT-KO Markov model was then assigned to the initial TPX, which was used in models to predict the multiple knockout glycoprofile (
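The four selection criteria can be sketched as a simple filter. This example covers only the criteria stated in the text and does not attempt to reproduce equations 2-3. The function names, the argument layout, and the assumption that the HDI is supplied as a (low, high) pair from a separate Bayesian analysis are all illustrative.

```python
import math

def is_significant(tp_hdi, flux_fc_mean, flux_fc_sd, tp_fc_mean,
                   min_abs_log2=0.58):
    """Apply criteria 1, 2, and 4 to one reaction in one single-KO model.

    tp_hdi       : (low, high) 95% HDI for the TP change (assumed input)
    flux_fc_mean : mean flux fold change (must be positive)
    flux_fc_sd   : standard deviation of the flux fold change
    tp_fc_mean   : mean TP fold change (must be positive)
    """
    low, high = tp_hdi
    # Criterion 1: the HDI of the TP change does not include 0.
    hdi_excludes_zero = low > 0 or high < 0
    # Criterion 2: at least 1.5-fold flux change, and mean +/- SD excludes 1.
    log2_flux = math.log2(flux_fc_mean)
    large_change = abs(log2_flux) >= min_abs_log2
    sd_excludes_one = not (flux_fc_mean - flux_fc_sd <= 1.0
                           <= flux_fc_mean + flux_fc_sd)
    # Criterion 4: flux and TP fold changes point in the same direction.
    same_sign = log2_flux * math.log2(tp_fc_mean) > 0
    return hdi_excludes_zero and large_change and sd_excludes_one and same_sign

def apply_family_knockout(log2_tp_fc, all_isozymes_knocked_out, floor=-10.0):
    """Criterion 3: cap the log2 TP fold change at `floor` for a full-family KO."""
    if all_isozymes_knocked_out:
        return min(log2_tp_fc, floor)
    return log2_tp_fc
```

Keeping criterion 3 in a separate function reflects that it applies to the combined knockout, whereas the other three are evaluated on each single-knockout model.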
Protein Purification and Glycan Analysis for Additional Glycoengineered Drugs
GT-knockout cell line generation and model protein expression. Glycogene knockout cell lines were derived from the CHO-S cell line (Gibco Cat. #A11557-01) and were generated and verified according to the procedures described previously [53]. Cells were cultured in CD CHO medium (Gibco 10743-029) supplemented with 8 mM L-glutamine (Lonza BE17-605F) and 2 mL/L of anti-clumping agent (Gibco 0010057AE) according to the Gibco guidelines. The day prior to transfection, cells were washed and cultured in exponential phase in medium not supplemented with anti-clumping agent. On the day of transfection, viable cell density was adjusted to 800,000 cells/mL in 125 mL shake flasks (Corning 431143) containing 30 mL of medium supplemented only with 8 mM L-glutamine. Plasmids encoding Rituximab, Enbrel, and alpha-1-antitrypsin, respectively, were used for transient transfections. For each transfection, 30 μg of plasmid was diluted in OptiPro SFM (Gibco 12309019) to a final volume of 750 μL. Separately, 90 μL of FuGENE HD reagent (Promega E2311) was diluted in 660 μL of OptiPro SFM. The plasmid/OptiPro SFM mixture was added to the FuGENE HD/OptiPro SFM mixture and incubated at room temperature for 5 minutes. The resultant 1.5 mL plasmid/lipid mixture was added dropwise to the cells. Supernatants containing model protein were harvested after 72 h by centrifugation of the cell culture at 1,000 × g for 10 minutes and stored at −80 °C until purification and N-glycan analysis.
Protein purification and N-glycan labeling. Rituximab and Enbrel were purified by protein A affinity chromatography. A 5-mL MAbSelect column (GE Healthcare) was equilibrated with 5 column volumes (CV) of 20 mM sodium phosphate, 0.15 M NaCl, pH 7.2. Following column equilibration, the supernatant was loaded, the column was washed with 8 CV of 20 mM sodium phosphate, 0.15 M NaCl, pH 7.2, and the protein was eluted using 0.1 M citrate, pH 3.0. Elution fractions (0.5 mL) were collected in deep-well plates containing 60 μL of 1 M Tris, pH 9, per well. Alpha-1-antitrypsin, C-terminally tagged with the HPC4 tag (amino acids EDQVDPRLIDGK), was purified over a 1-mL column of anti-protein C affinity matrix according to the manufacturer's protocol (Roche, cat. no. 11815024001). CaCl2 (1 mM) was added to the supernatants, equilibration buffer, and wash buffer. The proteins were eluted in 0.5 mL fractions using 5 mM EDTA in the elution buffer. For all four proteins, elution fractions containing the highest concentration of protein were concentrated ten-fold using Amicon Ultra 0.5-mL centrifugal filter units (MWCO 10 kDa). Concentrated protein solution (12 μL; concentrations varying between 0.1 and 1 mg/mL) was subjected to N-glycan labeling using the GlycoWorks RapiFluor-MS N-Glycan Kit (Waters).
N-glycan analysis. N-glycans were labeled with the GlycoWorks RapiFluor-MS N-Glycan Kit (Waters). Briefly, 12 μL of concentrated culture supernatant was labeled according to the manufacturer's instructions. Labeled N-glycans were analyzed by LC-MS as described previously [53]. Initial conditions were 25% 50 mM ammonium formate buffer and 75% acetonitrile, with a separation gradient from 30% to 43% buffer. MS was run in positive mode with no source fragmentation. The normalized, relative amount of each N-glycan was calculated from the area under the peak with Thermo Xcalibur software (Thermo Fisher Scientific).
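As a small worked illustration of the normalization step, relative amounts can be obtained by dividing each peak area by the summed area of all assigned N-glycan peaks. This is a generic sketch; the function name and the percentage scale are assumptions, and the vendor software may apply additional processing.

```python
def relative_abundance(peak_areas):
    """Convert integrated peak areas into percentages of the total area.

    peak_areas : dict mapping a glycan label to its integrated peak area.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {glycan: 100.0 * area / total for glycan, area in peak_areas.items()}
```

For instance, two peaks with areas 2.0 and 6.0 correspond to 25% and 75% of the profile.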
O-Glycosylation Markov Model Effectively Predicts Glycosylation of Salmon Skin Mucus
O-glycosylation plays important roles in developmental and immunological functions in biological systems (Joshi et al., 2018); in particular, mucin-type O-glycosylation has been investigated for its potential use in drug and vaccine development (Tarp and Clausen, 2007). For example, the furunculosis-causing bacterium Aeromonas salmonicida ssp. salmonicida binds differentially to mucins isolated from the skin and intestinal regions of the Atlantic salmon (Padra et al., 2014), resulting in substantial losses in the salmon industry. Therefore, investigation of O-glycosylation could benefit salmon production, especially by improving fish health and, in turn, nutritional value.
Recently, Jin et al. (2015) experimentally measured the mucin-type O-glycosylation of Atlantic salmon. In this study, we developed the first O-glycosylation model of the mucin proteins composing the mucus layer of the Atlantic salmon. Table 1 shows the reaction rules for reconstructing the mucin-type O-glycosylation network. We first fit the reconstructed O-glycosylation network to the experimental data of 5 salmon skin samples (Table 2). The model performed well in predicting the mucin-type O-glycosylation of salmon skin samples (RMSE ranged from 1.74 to 2.48; average RMSE=1.95), and most of the detected glycans were identified by our model (coverage greater than 95%) (
Lipids are essential for a variety of biological functions and are some of the most fundamental components of cells. While advances in lipidomics technology have enabled us to probe the pathogenesis of many severe but common diseases, such as hepatic steatosis, systematic study of shifts in lipidomic mechanisms remains a daunting task due to the tremendous number of lipid subspecies and the unclarified enzymes of analogous reactions (Han et al., 2016; Lydic et al., 2018). Previously published kinetic lipidomics models (ODEs) or FBA models examined only limited pools of specific, well-characterized lipid species, often did not disambiguate isomers, and required a priori estimation of kinetic/constraint parameters, while higher-level, community-based lipid network analyses provided only limited insights into the genetic bases of lipidomic changes (Shih et al., 2008; McAuley & Mooney et al., 2015; Schützhold et al., 2016; Tsouka & Hatzimanikatis et al., 2020). Here, we extended our low-parameter Markov framework (Spahn et al., 2016; Liang et al., 2020) to model the complex synthesis process of the lipidome. To fully realize the potential of Markov modeling, we aim to understand and quantify the underlying lipid synthesis dynamics when presented with comprehensive lipidomics data.
A collaboration between NIST and NIDDK (Quehenberger et al, 2010) published a large set of comprehensive lipidomic samples (100 human plasma lipidomic samples) with 500 measured lipid subspecies. In this study, we developed a comprehensive lipidomic model which predicted a lipidomics sample of the human plasma dataset. Table 3 shows the reaction rules for constructing the lipid synthesis network used to demonstrate the modeling framework, and the scope of the modeling framework was summarized in
#Potential retro-conversions and short-path inter-conversions between lipid species are implicitly combined with the corresponding synthesis reactions, or ignored as they usually happen in mitochondria and peroxisome.
This application is a U.S. National Stage of International Application No. PCT/EP2020/082713, filed Nov. 19, 2020, which claims the benefit of U.S. Provisional Patent Application No. 62/937,932, filed Nov. 20, 2019, the entire contents of each of which are fully incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/082713 | 11/19/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62937932 | Nov 2019 | US |