Glycomics, an integrated approach to structure-function relationships of complex carbohydrates or glycans, is emerging as an important paradigm in post-genomics cellular and molecular biology. In the past few years, there has been a dramatic increase in the known biological roles of glycans in fundamental biological processes, such as cell growth and development, tumor growth and metastasis, anticoagulation, immune recognition/response, cell-cell communication, and microbial pathogenesis. Glycans are primary components of the cell surface and the interface between cell and its extracellular environment. As a result, glycans interact with numerous proteins such as growth factors, cytokines, immune receptors, and enzymes, which modulate their activity and thus impinge on the above biological processes.
Therefore, there is a need to identify and/or characterize glycan binding capabilities.
The present invention provides a system for analyzing glycans and their interaction partners. The inventive system is particularly useful in the identification and analysis of glycoprotein binding interactions. As described herein, the inventive system has been applied to several different glycoprotein analyses, in each case successfully identifying interaction characteristics. The principles of the inventive system are therefore widely applicable across glycan interactions.
Affinity: As is known in the art, “affinity” is a measure of the tightness with which a particular ligand (e.g., an HA polypeptide) binds to its partner (e.g., an HA receptor). Affinities can be measured in different ways.
Biologically active: As used herein, the phrase “biologically active” refers to a characteristic of any agent that has activity in a biological system, and particularly in an organism. For instance, an agent that, when administered to an organism, has a biological effect on that organism, is considered to be biologically active. In particular embodiments, where a protein or polypeptide is biologically active, a portion of that protein or polypeptide that shares at least one biological activity of the protein or polypeptide is typically referred to as a “biologically active” portion.
Broad spectrum human-binding (BSHB) H5 HA polypeptides: As used herein, the phrase “broad spectrum human-binding H5 HA” refers to a version of an H5 HA polypeptide that binds to HA receptors found in human epithelial tissues, and particularly to human HA receptors having α2-6 sialylated glycans. Moreover, inventive BSHB H5 HAs bind to a plurality of different α2-6 sialylated glycans. In some embodiments, BSHB H5 HAs bind to a sufficient number of different α2-6 sialylated glycans found in human samples that viruses containing them have a broad ability to infect human populations, and particularly to bind to upper respiratory tract receptors in those populations. In some embodiments, BSHB H5 HAs bind to umbrella glycans (e.g., long α2-6 sialylated glycans) as described herein.
Characteristic portion: As used herein, the phrase a “characteristic portion” of a protein or polypeptide is one that contains a continuous stretch of amino acids, or a collection of continuous stretches of amino acids, that together are characteristic of a protein or polypeptide. Each such continuous stretch generally will contain at least two amino acids. Furthermore, those of ordinary skill in the art will appreciate that typically at least 5, 10, 15, 20 or more amino acids are required to be characteristic of a protein. In general, a characteristic portion is one that, in addition to the sequence identity specified above, shares at least one functional characteristic with the relevant intact protein.
Characteristic sequence: A “characteristic sequence” is a sequence that is found in all members of a family of polypeptides or nucleic acids, and therefore can be used by those of ordinary skill in the art to define members of the family.
Cone topology: The phrase “cone topology” is used herein to refer to a 3-dimensional arrangement adopted by certain glycans and in particular by glycans on HA receptors. As illustrated in
Corresponding to: As used herein, the term “corresponding to” is often used to designate the position/identity of an amino acid residue in an HA polypeptide. Those of ordinary skill will appreciate that, for purposes of simplicity, a canonical numbering system (based on wild type H3 HA) is utilized herein (as illustrated, for example, in FIGS. 5 and 10-13), so that an amino acid “corresponding to” a residue at position 190, for example, need not actually be the 190th amino acid in a particular amino acid chain but rather corresponds to the residue found at 190 in wild type H3 HA; those of ordinary skill in the art readily appreciate how to identify corresponding amino acids.
Degree of separation removed: As used herein, amino acids that are a “degree of separation removed” are HA amino acids that have indirect effects on glycan binding. For example, one-degree-of-separation-removed amino acids may either: (1) interact with the direct-binding amino acids; and/or (2) otherwise affect the ability of direct-binding amino acids to interact with glycan that is associated with host cell HA receptors; such one-degree-of-separation-removed amino acids may or may not directly bind to glycan themselves. Two-degree-of-separation-removed amino acids either (1) interact with one-degree-of-separation-removed amino acids; and/or (2) otherwise affect the ability of the one-degree-of-separation-removed amino acids to interact with direct-binding amino acids, etc.
Direct-binding amino acids: As used herein, the phrase “direct-binding amino acids” refers to HA polypeptide amino acids which interact directly with one or more glycans that are associated with host cell HA receptors.
Engineered: The term “engineered”, as used herein, describes a polypeptide whose amino acid sequence has been selected by man. For example, an engineered HA polypeptide has an amino acid sequence that differs from the amino acid sequences of HA polypeptides found in natural influenza isolates. In some embodiments, an engineered HA polypeptide has an amino acid sequence that differs from the amino acid sequence of HA polypeptides included in the NCBI database.
H1 polypeptide: An “H1 polypeptide”, as that term is used herein, is an HA polypeptide whose amino acid sequence includes at least one sequence element that is characteristic of H1 and distinguishes H1 from other HA subtypes. Representative such sequence elements can be determined by alignments such as, for example, those illustrated in FIGS. 5 and 10-11 and include, for example, those described herein with regard to H1-specific embodiments of HA Sequence Elements.
H3 polypeptide: An “H3 polypeptide”, as that term is used herein, is an HA polypeptide whose amino acid sequence includes at least one sequence element that is characteristic of H3 and distinguishes H3 from other HA subtypes. Representative such sequence elements can be determined by alignments such as, for example, those illustrated in
H5 polypeptide: An “H5 polypeptide”, as that term is used herein, is an HA polypeptide whose amino acid sequence includes at least one sequence element that is characteristic of H5 and distinguishes H5 from other HA subtypes. Representative such sequence elements can be determined by alignments such as, for example, those illustrated in
Hemagglutinin (HA) polypeptide: As used herein, the term “hemagglutinin polypeptide” (or “HA polypeptide”) refers to a polypeptide whose amino acid sequence includes at least one characteristic sequence of HA. A wide variety of HA sequences from influenza isolates are known in the art; indeed, the National Center for Biotechnology Information (NCBI) maintains a database (www.ncbi.nlm.nih.gov/genomes/FLU/flu.html) that, as of the filing of the present application, included 9796 HA sequences. Those of ordinary skill in the art, referring to this database, can readily identify sequences that are characteristic of HA polypeptides generally, and/or of particular HA polypeptides (e.g., H1, H2, H3, H4, H5, H6, H7, H8, H9, H10, H11, H12, H13, H14, H15, or H16 polypeptides; or of HAs that mediate infection of particular hosts, e.g., avian, camel, canine, cat, civet, environment, equine, human, leopard, mink, mouse, seal, stone marten, swine, tiger, whale, etc.). For example, in some embodiments, an HA polypeptide includes one or more characteristic sequence elements found between about residues 97 and 185, 324 and 340, 96 and 100, and/or 130 and 230 of an HA protein found in a natural isolate of an influenza virus. In some embodiments, an HA polypeptide has an amino acid sequence comprising at least one of HA Sequence Elements 1 and 2, as defined herein. In some embodiments, an HA polypeptide has an amino acid sequence comprising HA Sequence Elements 1 and 2, in some embodiments separated from one another by about 100-200, or by about 125-175, or about 125-160, or about 125-150, or about 129-139, or about 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, or 139 amino acids. In some embodiments, an HA polypeptide has an amino acid sequence that includes residues at positions within the regions 96-100 and/or 130-230 that participate in glycan binding. For example, many HA polypeptides include one or more of the following residues: Tyr98, Ser/Thr136, Trp153, His183, and Leu/Ile194.
In some embodiments, an HA polypeptide includes at least 2, 3, 4, or all 5 of these residues.
Isolated: The term “isolated”, as used herein, refers to an agent or entity that has either (i) been separated from at least some of the components with which it was associated when initially produced (whether in nature or in an experimental setting); or (ii) been produced by the hand of man. Isolated agents or entities may be separated from at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more of the other components with which they were initially associated. In some embodiments, isolated agents are more than 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% pure.
Long oligosaccharide: For purposes of the present disclosure, an oligosaccharide is typically considered to be “long” if it includes at least one linear chain that has at least four saccharide residues.
Non-natural amino acid: The phrase “non-natural amino acid” refers to an entity having the chemical structure of an amino acid (i.e.,:
and therefore being capable of participating in at least two peptide bonds, but having an R group that differs from those found in nature. In some embodiments, non-natural amino acids may also have a second R group rather than a hydrogen, and/or may have one or more other substitutions on the amino or carboxylic acid moieties.
Polypeptide: A “polypeptide”, generally speaking, is a string of at least two amino acids attached to one another by a peptide bond. In some embodiments, a polypeptide may include at least 3-5 amino acids, each of which is attached to others by way of at least one peptide bond. Those of ordinary skill in the art will appreciate that polypeptides sometimes include “non-natural” amino acids or other entities that nonetheless are capable of integrating into a polypeptide chain, optionally.
Pure: As used herein, an agent or entity is “pure” if it is substantially free of other components. For example, a preparation that contains more than about 90% of a particular agent or entity is typically considered to be a pure preparation. In some embodiments, an agent or entity is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% pure.
Short oligosaccharide: For purposes of the present disclosure, an oligosaccharide is typically considered to be “short” if it has fewer than 4, or certainly fewer than 3, residues in any linear chain.
Specificity: As is known in the art, “specificity” is a measure of the ability of a particular ligand (e.g., an HA polypeptide) to distinguish its binding partner (e.g., a human HA receptor, and particularly a human upper respiratory tract HA receptor) from other potential binding partners (e.g., an avian HA receptor).
Therapeutic agent: As used herein, the phrase “therapeutic agent” refers to any agent that elicits a desired biological or pharmacological effect.
Treatment: As used herein, the term “treatment” refers to any method used to alleviate, delay onset, reduce severity or incidence, or yield prophylaxis of one or more symptoms or aspects of a disease, disorder, or condition. For the purposes of the present invention, treatment can be administered before, during, and/or after the onset of symptoms.
Umbrella topology: The phrase “umbrella topology” is used herein to refer to a 3-dimensional arrangement adopted by certain glycans and in particular by glycans on HA receptors. The present invention encompasses the recognition that binding to umbrella topology glycans is characteristic of HA proteins that mediate infection of human hosts. As illustrated in
Vaccination: As used herein, the term “vaccination” refers to the administration of a composition intended to generate an immune response, for example to a disease-causing agent. For the purposes of the present invention, vaccination can be administered before, during, and/or after exposure to a disease-causing agent, and in certain embodiments, before, during, and/or shortly after exposure to the agent. In some embodiments, vaccination includes multiple administrations, appropriately spaced in time, of a vaccinating composition.
Variant: As used herein, the term “variant” is a relative term that describes the relationship between a particular HA polypeptide of interest and a “parent” HA polypeptide to which its sequence is being compared. An HA polypeptide of interest is considered to be a “variant” of a parent HA polypeptide if the HA polypeptide of interest has an amino acid sequence that is identical to that of the parent but for a small number of sequence alterations at particular positions. Typically, fewer than 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, or 2% of the residues in the variant are substituted as compared with the parent. In some embodiments, a variant has 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 substituted residue as compared with a parent. Often, a variant has a very small number (e.g., fewer than 5, 4, 3, 2, or 1) of substituted functional residues (i.e., residues that participate in a particular biological activity). Furthermore, a variant typically has not more than 5, 4, 3, 2, or 1 additions or deletions, and often has no additions or deletions, as compared with the parent. Moreover, any additions or deletions are typically fewer than about 25, 20, 19, 18, 17, 16, 15, 14, 13, 10, 9, 8, 7, or 6 residues, and commonly are fewer than about 5, 4, 3, or 2 residues. In some embodiments, the parent HA polypeptide is one found in a natural isolate of an influenza virus (e.g., a wild type HA).
Vector: As used herein, “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. In some embodiments, vectors are capable of extra-chromosomal replication and/or expression of nucleic acids to which they are linked in a host cell such as a eukaryotic or prokaryotic cell. Vectors capable of directing the expression of operatively linked genes are referred to herein as “expression vectors.”
Wild type: As is understood in the art, the phrase “wild type” generally refers to a normal form of a protein or nucleic acid, as is found in nature. For example, wild type HA polypeptides are found in natural isolates of influenza virus. A variety of different wild type HA sequences can be found in the NCBI influenza virus sequence database, http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html.
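To illustrate the “corresponding to” convention defined above, the following minimal Python sketch maps canonical reference numbering onto a query sequence using two rows taken from the same alignment. The function name, the `ref_start` parameter, and the toy sequences are hypothetical and are not taken from the figures; in practice the rows would come from an actual alignment against wild type H3 HA.

```python
def map_reference_positions(aligned_ref, aligned_query, ref_start=1):
    """Map reference residue numbers to 1-based query residue indices.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Reference positions aligned to a gap in the query are omitted.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = ref_start - 1   # last reference residue number consumed
    query_idx = 0             # last query residue index consumed
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_pos += 1
        if query_char != "-":
            query_idx += 1
        if ref_char != "-" and query_char != "-":
            mapping[ref_pos] = query_idx
    return mapping


# Toy alignment rows (made-up letters, not real HA fragments).
toy_ref = "ACD-EFG"
toy_query = "-CDKEF-"
toy_mapping = map_reference_positions(toy_ref, toy_query)
```

In this toy case, the residue “corresponding to” reference position 4 is the 4th residue of the query, while reference position 2 corresponds to the 1st query residue, showing that the canonical number and the actual chain index need not agree.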
An important family of proteins, often referred to as glycan binding proteins (GBPs), bind to N-linked and O-linked glycans on various glycoproteins and mediate cell-cell adhesion, signaling and trafficking events in immune responses. The main classes of GBPs include C-type lectins, galectins and siglecs. GBPs are typically expressed as either soluble or membrane-bound proteins, in monomeric or multimeric forms with multiple glycan binding sites. Also, GBPs can be dispersed on the cell surface or localized in a microenvironment.
The glycan binding site in a GBP is also known as a carbohydrate recognition domain (CRD). CRDs on GBPs typically accommodate mono- to tetrasaccharide glycan ligand motifs. The interaction between a single CRD and a glycan motif is typically low affinity, with values in the μM range. However, most of the physiological glycan-GBP interactions are multivalent, involving binding of an ensemble of glycan motifs to multimeric CRDs formed by association of GBPs. Thus, unlike protein-protein interactions which either activate or inhibit protein function (digital regulation), glycan-GBP interactions fine tune (analog modulation) protein function through avidity, graded affinity and multivalency.
Decoding structure-function relationships of glycan-protein interactions in the context of biochemical pathways leading to biological function presents unique challenges. One aspect of these challenges arises from the heterogeneity and chemical diversity of glycans due to their non-template biosynthesis involving coordinated expression of multiple glycosyltransferases, some of which have additional tissue specific isoforms. Furthermore, given their biosynthesis and cellular location such as multiple glycosylation sites on proteins, glycans should usually be considered as a heterogeneous mixture of different chemical structures when isolated from cells and tissues. The non-template nature of glycan biosynthesis has also made it challenging to amplify specific glycan structures from biological sources.
Many advances have been made to address the above challenges. Important developments in chemical synthesis strategies have led to synthesis of hundreds of glycan structures which capture the diversity of the glycans present at the cell surface. Using these strategies, different morphologies of glycan motifs, viz. clusters, dendrimers, polymers, etc., have been constructed to match the different types of multivalent associations of glycan binding sites on proteins. These multivalent glycoconjugates have been primarily utilized in competitive assays to assess the relative binding affinities of different GBPs and for designing inhibitors to physiological glycan-GBP interactions. Despite these advances, much less is known about the specificity or recognition of individual physiological glycan motifs by the different GBPs and the selectivity of biological functions modulated by these interactions.
To rapidly expand the current knowledge of known specific glycan-GBP interactions, the Consortium for Functional Glycomics (CFG; www.functionalglycomics.org), an international collaborative research initiative, has developed glycan arrays comprising several glycan structures that have enabled high throughput screening of GBPs for novel glycan ligand specificities. These glycan arrays are continuously being expanded to increase the diversity of glycan motifs to best mimic the physiological diversity of glycans. Most of the glycans on the CFG arrays were derived by chemical and chemoenzymatic synthesis.
The CFG glycan arrays also comprise both monovalent and polyvalent glycan motifs (i.e. attached to polyacrylamide backbone), and are emerging as widely used resources for glycobiologists to discover new glycan ligands for their GBPs of interest. In addition to the glycan array data, the CFG has also been developing state-of-the-art resources to generate diverse datasets ranging from gene expression of glycan biosynthetic enzymes and GBPs to whole organism glycome and phenome analysis.
The public dissemination of CFG datasets via user-friendly interfaces has begun to motivate the development of data mining tools to find interesting patterns or make meaningful predictions by analyzing these complex data sets. Data mining tools are becoming commonplace in the realm of genomics and proteomics. High throughput data dealing with numerous components (genes and proteins) and their interactions in complex networks representing biochemical pathways are analyzed to make statistically significant correlations and predictions. In the case of glycomics, given the analog nature of glycan-GBP interactions, it is necessary to go beyond a single glycan interacting with a single GBP to understand the common features in an ensemble of structures that govern binding to specific GBPs.
As a first step toward building data mining tools for analysis of high throughput glycomics data, we have taken a novel approach in this study to analyze the CFG glycan array data using rule induction based data mining methodologies. Taking advantage of the flexible software architecture and relational databases of the CFG, we have utilized our approach to identify patterns that govern the ability of an ensemble of glycans to bind to a specific GBP. Using specific examples of three different families of GBPs: (1) DC-SIGN and SIGNR; (2) galectins; and (3) hemagglutinins, we identify specific patterns in glycans on the array that govern the interactions with these proteins. We validate the patterns identified by using crystal structures and by predicting binding levels between GBP and glycans that are not found in the glycan array. These patterns enable, for the first time, understanding of interactions between an ensemble of glycan structures (containing a common set of features) and a given GBP, thereby allowing analysis and definition of structure-function relationships for glycan-GBP interactions.
The present invention therefore provides a system for understanding the structure-function relationships of glycan-GBP interactions. In particular, the invention provides a system for understanding how interactions between an ensemble of glycan structures and multivalent CRDs of GBPs modulate fundamental biological processes. The invention identifies features in glycans or their binding partners that determine the specificity of a given interaction. The invention also defines constraints provided by the features, for example based on analytical information (e.g., from X-ray crystallography, NMR, etc.). Such constraints can be used on their own or, optionally, can be coupled with functional or other information. Appropriate functional information can, for example, be obtained from glycan binding studies.
The invention provides computational methods to analyze datasets obtained from glycan arrays such as those developed by the CFG, which are being increasingly utilized for the purpose of identifying novel candidate glycan ligands for different GBPs. As these glycan arrays continue to expand, the value of such computational methods for analyzing the datasets obtained from these arrays and understanding the basis for specificity in glycan-GBP interactions only increases.
For example, using a rule based data mining methodology to analyze the entire glycan array data (including high, medium, low affinity and non-binders), the present invention provides a novel approach to identifying patterns in glycans that have a positive and negative effect on binding to a GBP. One advantage of such a rule based approach is the presentation of the final patterns as a set of straightforward rules which can be easily applied to identify other potential glycans that satisfy these rules.
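The way such rules can be applied to other glycans may be sketched as follows. This is a minimal Python illustration, assuming each glycan has already been reduced to a set of named features; the `Rule` class, the feature labels, and the four toy glycans are hypothetical and are not taken from the CFG data or from the rules reported herein. Each rule pairs features that must be present (positive effect on binding) with features that must be absent (negative effect on binding).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One induced pattern: features that must be present and features
    that must be absent for a glycan to be predicted as a binder."""
    required: frozenset
    forbidden: frozenset = frozenset()

    def matches(self, features):
        has_required = self.required <= features
        has_forbidden = bool(self.forbidden & features)
        return has_required and not has_forbidden


def predict_binders(glycans, rules):
    """Return the names of glycans that satisfy at least one rule."""
    return [
        name
        for name, features in glycans.items()
        if any(rule.matches(features) for rule in rules)
    ]


# Hypothetical feature labels and toy glycans, for illustration only.
toy_rules = [
    Rule(required=frozenset({"high_mannose"})),
    Rule(required=frozenset({"Lewis_x"}),
         forbidden=frozenset({"Gal_3O_substitution"})),
]
toy_glycans = {
    "g1": {"high_mannose"},
    "g2": {"Lewis_x"},
    "g3": {"Lewis_x", "Gal_3O_substitution"},
    "g4": {"sialic_acid"},
}
toy_predicted = predict_binders(toy_glycans, toy_rules)
```

Keeping each rule as an explicit pair of required and forbidden feature sets preserves the readability that the rule based approach is valued for: a rule can be inspected directly and applied unchanged to glycans that are not on the array.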
As described herein, the principles of the present invention were applied to three diverse GBP families to establish proof-of-principle of their effectiveness. In the first example, i.e. DC-SIGN and SIGNR system, the rules gave three broad features for DC-SIGN viz. high mannose, Lewis x [Galb4(Fuca3)GlcNAc] and Fuca4GlcNAc containing motifs and only the high mannose feature for DC-SIGNR. In addition to capturing the common features that governed high affinity binding, the rules also captured features that were detrimental to binding such as absence of any 3-O substitution on the Gal for the Lewis x containing motifs. These negative results were consistent with analysis of the crystal structures of DC-SIGNR, thus highlighting the value of our approach.
In the case of galectins, the rules were more complex. In addition to identifying the main feature (Galb4GlcNAc) required for high affinity ligand binding by galectins 1 and 3, we also determined the role of substitutions to this unit in the context of chain length in governing the interactions with glycan ligands. Similar to the DC-SIGN example, our findings were consistent with the analysis of the crystal structures of galectins. Based on the features, the main difference between glycan binding of galectin-1 and -3 was that galectin-3 preferred linear repeat units of the Galb4GlcNAc rather than these units present in different branches in N-linked glycans. Since galectin-1 typically occurs as a homodimer with noncovalently associated CRDs, it is possible that the presence of Galb4GlcNAc on different branches would enhance high affinity multivalent binding. On the other hand, galectin-3 is a monomer with an N-terminal linker region and would most likely have a preference for linear repeats of the lactosamine unit in comparison with the branched occurrence of these units.
The overall accuracy of our rule based induction approach is good, given that the rules accurately identified from 80% of the high binders (in the case of DC-SIGN) to 100% of the high binders (galectin-3 and DC-SIGNR) and that there were no false positives in any of the cases. Although the glycan array has a diverse set of glycans, it still does not systematically capture the overall diversity of glycans. As a result, there are singleton data points in the screening data, i.e. high affinity glycan structures which do not fall under any specific group defined by a common set of features. Such singleton data points lead to false negatives in our prediction results. It should be observed from Tables 2 and 3 that each of the rules comprises a primary glycan motif that is shared by a set of glycans with high affinity binding. Furthermore, the primary motifs are specified in conjunction with other constraints such as absence of other motifs or chain length requirements.
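The accuracy figures discussed above (fraction of high binders identified, false positives, and false negatives arising from singletons) can be computed along the following lines. This is a minimal Python sketch; the function name and the toy glycan identifiers are hypothetical, and the numbers are chosen only to mirror the 80% case, not taken from the screening data.

```python
def score_predictions(observed_high, predicted_high):
    """Compare observed high binders with rule-predicted high binders."""
    observed = set(observed_high)
    predicted = set(predicted_high)
    true_positives = observed & predicted
    false_positives = predicted - observed   # predicted but not observed
    false_negatives = observed - predicted   # e.g. singleton high binders
    if observed:
        fraction_identified = len(true_positives) / len(observed)
    else:
        fraction_identified = float("nan")
    return {
        "true_positives": sorted(true_positives),
        "false_positives": sorted(false_positives),
        "false_negatives": sorted(false_negatives),
        "fraction_identified": fraction_identified,
    }


# Toy data: five observed high binders, one of which (g5) is a singleton
# that no rule covers.
toy_observed = ["g1", "g2", "g3", "g4", "g5"]
toy_predicted_high = ["g1", "g2", "g3", "g4"]
toy_scores = score_predictions(toy_observed, toy_predicted_high)
```

Here four of the five observed high binders are recovered with no false positives, and the uncovered singleton appears as the only false negative.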
As a part of the overall process of data mining, additional features based on these primary patterns can be defined and the roles of these features on glycan binding can be further investigated. For example, the location of Galb4GlcNAc in terms of distance from reducing end or non-reducing end and occurrence as a part of a linear chain or branched chain can be defined as additional features to evaluate their effect on binding. Also, additional glycan features that combine all modifications to each monosaccharide such as GalNAc, Gal[3-O—SO3], Gal[6-O—SO3] can be combined into a single feature to evaluate the importance of each of these modifications to the binding.
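One way such positional features might be derived is sketched below in Python. The sketch makes strong simplifying assumptions that are not part of the disclosure: the glycan is treated as a single linear chain listed from the non-reducing end to the reducing end, linkage information is ignored, and the function names and the toy chain are hypothetical.

```python
def motif_positions(chain, motif):
    """Return start indices of `motif` within `chain`.

    `chain` and `motif` are lists of residue names ordered from the
    non-reducing end (index 0) to the reducing end.
    """
    if not motif:
        raise ValueError("motif must contain at least one residue")
    width = len(motif)
    return [
        start
        for start in range(len(chain) - width + 1)
        if chain[start:start + width] == motif
    ]


def positional_features(chain, motif):
    """Describe each motif occurrence by its distance from both ends."""
    width = len(motif)
    return [
        {
            "residues_from_nonreducing_end": start,
            "residues_from_reducing_end": len(chain) - (start + width),
        }
        for start in motif_positions(chain, motif)
    ]


# Toy linear chain (linkages omitted), for illustration only.
toy_chain = ["Neu5Ac", "Gal", "GlcNAc", "Gal", "GlcNAc", "Gal", "Glc"]
toy_motif = ["Gal", "GlcNAc"]
toy_features = positional_features(toy_chain, toy_motif)
toy_repeat_count = len(toy_features)
```

Each resulting dictionary could be supplied as an additional feature to the rule induction step; handling branched chains would require a tree representation rather than the flat list assumed here.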
In summary, using CFG glycan array data as a model system, we have outlined an approach to identify rules or patterns in complex datasets that would facilitate their meaningful interpretation. Many large scale glycomics initiatives are positioning their resources to obtain diverse data sets ranging from gene expression of glycan biosynthesis enzymes, GBPs to identifying the repertoire of glycans from specific cell types and tissues isolated from different sources. As these datasets expand, the rule based induction method outlined herein can be utilized to obtain a combination of patterns that would govern gene expression to glycan-GBP interactions and biological functions.
The present invention allows detailed characterization of glycan-GBP binding interaction. The invention therefore provides definitions of sets of glycans that do (or do not) interact with a given GBP. The invention thus allows the preparation of GBP-specific glycan arrays, i.e., of arrays containing a set of glycans sufficient to establish or define the presence or identity of a particular GBP.
For example, once the glycan binding characteristics of a particular GBP are defined as provided herein, an array containing glycans that are bound, glycans that are not bound, and/or combinations thereof can be assembled and used, for example, to detect that particular GBP in samples and/or to characterize derivatives of the GBP.
To give one particular example, one of the GBPs whose binding analysis is exemplified below is the hemagglutinin (HA) H5 protein. Generally speaking, HA interacts with the surface of cells by binding to a glycoprotein receptor. Binding of HA to HA receptors is predominantly mediated by N-linked glycans on the HA receptors. Specifically, HA on the surface of flu virus particles recognizes sialylated glycans that are associated with HA receptors on the surface of the cellular host. After recognition and binding, the host cell engulfs the viral particle and the virus is able to replicate and produce many more virus particles to be distributed to neighboring cells.
HA receptors are modified by either α2-3 or α2-6 sialylated glycans near the receptor's HA-binding site, and the type of linkage of the receptor-bound glycan affects the conformation of the receptor's HA-binding site, thus affecting the receptor's specificity for different HA subtypes. Moreover, the present inventors have determined that the topology of the linked glycans (umbrella-like or cone-like) influences the receptor's specificity for different HAs.
For example, the glycan binding pocket of avian HA is narrow. According to the present invention, this pocket binds to the trans conformation of α2-3 sialylated glycans, and/or to cone-topology glycans, whether α2-3 or α2-6 linked.
HA receptors in avian tissues, and also in human deep lung and gastrointestinal (GI) tract tissues are characterized by α2-3 sialylated glycan linkages, and furthermore (according to the present invention), are characterized by glycans, including α2-3 sialylated and/or α2-6 sialylated glycans, which predominantly adopt cone topologies.
By contrast, human HA receptors in the bronchus and trachea of the upper respiratory tract are modified by α2-6 sialylated glycans. Unlike the α2-3 motif, the α2-6 motif has an additional degree of conformational freedom due to the C6-C5 bond (Russell et al., Glycoconj J 23:85, 2006). HAs that bind to such α2-6 sialylated glycans have a more open binding pocket to accommodate the diversity of structures arising from this conformational freedom. Moreover, according to the present invention, HAs may need to bind to glycans (e.g., α2-6 sialylated glycans) in an umbrella topology, and particularly may need to bind to such umbrella topology glycans with strong affinity and/or specificity, in order to effectively mediate infection of human upper respiratory tract tissues.
As a result of these spatially restricted glycosylation profiles, humans are not usually infected by viruses containing many wild type avian HAs (e.g., avian H5). Specifically, because the portions of the human respiratory tract that are most likely to encounter virus (i.e., the trachea and bronchi) lack receptors with cone glycans (e.g., α2-3 sialylated glycans, and/or short glycans) and wild type avian HAs typically bind primarily or exclusively to receptors associated with cone glycans (e.g., α2-3 sialylated glycans, and/or short glycans), humans rarely become infected with avian viruses. Only when in sufficiently close contact with virus that it can access the deep lung and/or gastrointestinal tract receptors having cone glycans (e.g., α2-3 sialylated glycans, and/or short glycans) do humans become infected.
As described herein, the present invention allows identification of a set of glycans that can be used to detect the H5 HA protein and/or to detect variants of the protein that might emerge with altered binding specificity. In particular, such an inventive array can be used to detect any H5 variant, or indeed any HA protein or variant thereof, with an ability to bind to upper respiratory human HA receptors and/or with an ability to bind (optionally with high affinity and/or specificity, preferably with high affinity) to umbrella-topology glycans.
As demonstrated herein, such arrays are useful for the identification and/or characterization of different HA proteins and their glycan-binding characteristics. In certain embodiments, inventive H5 HA variant proteins are tested on such arrays to assess their ability to bind to umbrella-topology glycans (e.g., α2-6 glycans, and particularly long α2-6 glycans), and particularly to assess their ability to bind to multiple such glycans.
Indeed, the present invention provides arrays of umbrella glycans (e.g., α2-6 glycans, and particularly long α2-6 glycans) and optionally cone-topology glycans (e.g., α2-3 sialylated glycans), that can be used to characterize HA binding capabilities and/or as a diagnostic to detect, for example, human-binding HAs. As will be clear to those of ordinary skill in the art, such arrays are useful not only for characterizing or detecting H5 HAs, but indeed for characterizing or detecting any HAs, including for example, H7 and/or H9, whose ability to bind α2-6 glycans is to be assessed.
The CFG has developed two kinds of glycan arrays: (1) a well-based microarray and (2) a solid phase printed array. The printed array was developed more recently, so most of the initial ligand screening was performed using the well-based microarray. The first version of the well-based array developed by the CFG comprised around 60 different glycans with triplicate representations of each glycan. Each successive version of the array incorporated additional glycans, and the current version comprises 195 glycans with quadruplicate representation of each glycan (see http://www.functionalglycomics.org/static/consortium/resources/resourcecoreh5.shtml). The array predominantly comprises synthetic glycans that capture the physiological diversity of N- and O-linked glycans. The array also comprises polyvalent glycan ligands attached to a polyacrylamide backbone. In addition to the synthetic glycans, N-linked glycan mixtures derived from different mammalian glycoproteins are also represented on the array.
The datasets chosen for analysis in this study were obtained from the CFG web site at: http://www.functionalglycomics.org/glycomics/publicdata/primaryscreen.jsp. Currently, 40 mammalian GBPs have been screened against different versions of the glycan array. The screening data are available both in the raw format, comprising the intensity signals for a given GBP in a given well, and as the mean signal and signal to noise ratio of the GBP for each glycan ligand on the array. It is important to point out that as the glycan array evolved into its current version, the GBPs that were screened using the earlier version of the array generally were not screened again using the latest version. The absence of these data points has had implications for the identification of features that distinguish the binding of glycan ligands from one GBP to another (as discussed herein). The datasets corresponding to the screening of DC-SIGN, DC-SIGNR, human galectin-1 and galectin-3 (and its individual carbohydrate recognition domains), and hemagglutinin H5 were obtained. These datasets were analyzed using the data mining platform described below.
The main steps involved in the data mining process are illustrated in
The data mining platform comprises software modules that interact with each other (
As noted above, features can be extracted from glycans and/or from their binding partners. In the particular applications exemplified herein, certain features were extracted from glycans on the glycan array, as listed in Table 1:
The rationale behind choosing the features shown was that glycan binding sites on GBPs typically accommodate di- to tetrasaccharides. A tree-based representation was used to capture the information on monosaccharides and linkages in the glycan structures (root of the tree at the reducing end). This representation facilitated the abstraction of various features including higher order features such as connected sets of monosaccharide triplets, etc. (
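The tree-based representation and the abstraction of pair and triplet features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the `Node` class, the `pairs` and `triplets` helpers, the feature string format, and the example glycan are all hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """One monosaccharide; `linkage` joins it to its parent (toward the reducing end)."""
    sugar: str
    linkage: Optional[str] = None  # e.g. "b4"; None for the root (reducing end)
    children: List["Node"] = field(default_factory=list)

    def add(self, sugar: str, linkage: str) -> "Node":
        """Attach a child monosaccharide and return it so chains can be built."""
        child = Node(sugar, linkage)
        self.children.append(child)
        return child


def pairs(root: Node) -> Set[str]:
    """Child-parent disaccharide features, written non-reducing end first."""
    found: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            found.add(f"{child.sugar}{child.linkage}{node.sugar}")
            stack.append(child)
    return found


def triplets(root: Node) -> Set[str]:
    """Linear connected triplets (grandchild-child-parent)."""
    found: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            for grandchild in child.children:
                found.add(
                    f"{grandchild.sugar}{grandchild.linkage}"
                    f"{child.sugar}{child.linkage}{node.sugar}"
                )
            stack.append(child)
    return found


# Hypothetical example glycan: Fuca3(Galb4)GlcNAcb3Gal, rooted at the reducing-end Gal.
root = Node("Gal")
glcnac = root.add("GlcNAc", "b3")
glcnac.add("Gal", "b4")
glcnac.add("Fuc", "a3")

print(sorted(pairs(root)))
print(sorted(triplets(root)))
```

Rooting the tree at the reducing end means every feature can be read off by walking from a parent to its children, so branched structures need no special handling beyond iterating over all children.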
Different types of classifiers have been developed and used in many applications. They primarily fall into three main categories: Mathematical Methods, Distance Methods and Logic Methods. These different methods and their advantages and disadvantages are discussed in detail in Weiss & Indurkhya (Predictive Data Mining: A Practical Guide. Morgan Kaufmann, San Francisco, 1998). For this specific application we chose a method called Rule Induction, which falls under Logic Methods. The Rule Induction classifier generates patterns in the form of IF-THEN rules.
One of the main advantages of the Logic Methods, and specifically of classifiers such as the Rule Induction method that generate IF-THEN rules, is that the results of the classifiers can be explained more easily than those of other statistical or mathematical methods. This allows one to explore the structural and biological significance of the rule or pattern discovered. An example rule generated using the features described earlier (see Table 1) is:
IF A Glycan contains “Galb4GlcNAcb3Gal[B]” and DOES NOT contain “Fuca3GlcNAc[B]”, THEN the Glycan will bind with higher affinity to Galectin 3.
The specific Rule Induction algorithm that was used in this case is the one developed by Weiss & Indurkhya (Predictive Data Mining: A Practical Guide. Morgan Kaufmann, San Francisco, 1998).
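To illustrate how an IF-THEN rule of the kind shown above can be applied to a glycan once its features have been extracted, a minimal sketch follows. It only evaluates a given rule; it does not induce rules and is not the Weiss & Indurkhya algorithm. The `Rule` class and the two example feature sets are hypothetical, and the "[B]" suffix of the example rule is omitted because its meaning is not defined in this section.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Rule:
    """IF all `required` features are present AND no `forbidden` feature is, THEN `conclusion`."""
    required: FrozenSet[str]
    forbidden: FrozenSet[str]
    conclusion: str

    def fires(self, features: Set[str]) -> bool:
        has_all_required = self.required <= features
        has_any_forbidden = bool(self.forbidden & features)
        return has_all_required and not has_any_forbidden


# The example galectin-3 rule from the text, written as required/forbidden features.
galectin3_rule = Rule(
    required=frozenset({"Galb4GlcNAcb3Gal"}),
    forbidden=frozenset({"Fuca3GlcNAc"}),
    conclusion="binds with higher affinity to galectin-3",
)

# Hypothetical feature sets for two glycans (assumed for illustration only).
linear_glycan = {"Galb4GlcNAc", "GlcNAcb3Gal", "Galb4GlcNAcb3Gal"}
fucosylated_glycan = linear_glycan | {"Fuca3GlcNAc"}

print(galectin3_rule.fires(linear_glycan))
print(galectin3_rule.fires(fucosylated_glycan))
```

Because each rule is an explicit list of required and forbidden features, a prediction can always be traced back to the structural motifs that caused it, which is the interpretability advantage noted above.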
A threshold that distinguished low affinity and high affinity binding was defined for each of the glycan array screening data sets (
By applying data mining methods to the high throughput CFG glycan array data, we have identified a set of features in glycans that bind to different GBPs. Three specific systems were chosen as examples: (1) DC-SIGN and DC-SIGNR; (2) galectins; and (3) hemagglutinin H5. Each of these GBP families is reasonably well defined in terms of glycan ligand preferences. The first example provides an additional validation of our methodology since a recent study outlined the structural basis of distinct ligand specificities of DC-SIGN and DC-SIGNR based on the glycan array data. Earlier studies have systematically evaluated ligand specificities of different galectins. However, the CFG glycan arrays represent a much larger domain of glycan structures that have been used to screen ligand specificities of different galectins. Thus the application of our methodology to the galectin datasets provides additional rules that govern the binding of different galectins to their glycan ligands.
DC-SIGN and DC-SIGNR belong to the type II transmembrane receptor subfamily of C-type lectins, which recognize and bind to glycan ligands in a Ca2+-dependent manner. DC-SIGN is abundantly expressed in dendritic cells, and plays a key role in adhesion of T-cells to the antigen presenting dendritic cells via the ICAM-3 molecule, thereby initiating an immune response. In addition, DC-SIGN has also been shown to play an important role in recognition of pathogens such as HIV by the dendritic cells. In fact, it has been demonstrated that binding of HIV to DC-SIGN on dendritic cells enhances the infection of the T-cells. On the other hand, DC-SIGNR, which shares 77% sequence identity with DC-SIGN, is found on endothelial cells in liver, lymph nodes and placenta.
Each of these proteins contains a single carbohydrate recognition domain (CRD) at the C-terminus. The extracellular alpha helical domain (adjacent to CRD) on both the proteins facilitates tetramerization of the CRDs, thus enabling multivalent interactions with glycan ligands. There has been a wealth of crystal structure information on DC-SIGN and -SIGNR including crystal structures with different glycan ligands. More recently, these proteins were screened using the CFG glycan arrays and it was demonstrated that they had distinct ligand specificities and signaling properties. Thus, the glycan array data for these proteins provided a good framework to validate the data mining methodology.
As outlined above in Example 1, the glycan features (Table 1) corresponding to glycan screening analysis of DC-SIGN and DC-SIGNR were abstracted from the CFG database. Rule-based classification was then performed on these features, with the mean signal to noise ratio of binding of each glycan to each of the two proteins as the main objective function. The results from the classification methods are summarized in Table 2:
The overall performance of the rule based classification methods was good given that they predicted 100% of the candidate high mannose structures for DC-SIGNR and predicted 16 out of 20 high binders for DC-SIGN. It is significant to note that there were no false positives; in other words, there was no instance of a glycan that was predicted to bind but did not bind. The first implication of these results is that both DC-SIGN and DC-SIGNR share common high affinity binding to high mannose structures. The presence of Mana3(Mana6)Mana6Man is a strong rule that captures 6 different high mannose glycan ligands that bind with high affinity based on the glycan array data. This observation is consistent with the earlier crystal structure studies.
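The performance summary above (high binders captured, false positives, false negatives) can be computed from a set of predicted binders and a set of observed high binders. The sketch below is illustrative only: the `evaluate` function and the glycan identifiers are hypothetical, and the counts are chosen simply to mirror the DC-SIGN figures quoted in the text (16 of 20 high binders captured, no false positives).

```python
from typing import Dict, Set


def evaluate(predicted: Set[str], high_binders: Set[str]) -> Dict[str, int]:
    """Compare rule predictions against observed high-affinity binders."""
    return {
        "true_positives": len(predicted & high_binders),   # predicted and observed
        "false_positives": len(predicted - high_binders),  # predicted but not observed
        "false_negatives": len(high_binders - predicted),  # observed but missed
    }


# Hypothetical identifiers standing in for array glycans (not real CFG entries).
observed_high = {f"glycan_{i}" for i in range(1, 21)}   # 20 observed high binders
rule_predicted = {f"glycan_{i}" for i in range(1, 17)}  # 16 captured by the rules

counts = evaluate(rule_predicted, observed_high)
captured_fraction = counts["true_positives"] / len(observed_high)

print(counts)
print(f"fraction of high binders captured: {captured_fraction:.2f}")
```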
In addition to the high mannose ligands, DC-SIGN bound to an additional set of fucosylated ligands that were characterized by distinct features. These fucosylated ligands did not bind to DC-SIGNR. Fuca4GlcNAc is a commonly observed motif in Lewis a [Fuca4(Galb3)GlcNAc] containing glycan structures. Fuca3(Galb4)GlcNAc is the commonly observed Lewis x motif present on the non-reducing terminal of N- and O-linked glycans. Both these features were characteristic of high affinity binders to DC-SIGN. This observation is consistent with the distinct binding of DC-SIGN to fucosylated ligands that was observed in earlier crystal structures of DC-SIGN with Lewis x containing glycan structures. Based on a detailed investigation of crystal structures of DC-SIGN and DC-SIGNR with high mannose and fucosylated ligands, it was shown that while both these proteins shared a similar mode of binding to the high mannose ligand, the binding to the fucosylated ligands was completely different and could be achieved only by the amino acids in the CRD of DC-SIGN.
Another interesting observation provided by our analysis is the required absence of specific features for high affinity binding. In other words, the presence of Neua3Galb4GlcNAc and Gala3Gal along with the Lewis x motif would be detrimental to binding of these ligands with DC-SIGN. The value of our data mining approach is highlighted by the confirmation of this rule by investigating the crystal structure of DC-SIGN with the Lewis x containing glycan ligand. Since the 3-OH position of the Gal in Fuca3(Galb4)GlcNAc is close to the CRD of DC-SIGN (
Based on the crystal structure of Lewis x containing glycans with DC-SIGN and DC-SIGNR, the primary binding of fucosylated ligands involves the equatorial oxygens 3-OH and 4-OH of the Fuc, which form coordination with the Ca2+ ion. Thus even in the case of the Lewis a antigen Fuca4(Galb3)GlcNAc, the primary binding involves the 3- and 4-OH of Fuc (
Galectins are a family of soluble GBPs that are known to bind β-galactosides; they were earlier defined as S-type lectins because their activity requires reducing thiols. Unlike the C-type lectins (such as DC-SIGN and DC-SIGNR), galectins do not require Ca2+ for ligand binding. Galectins have been implicated in numerous biological roles, viz. cell development, apoptosis, cancer, and immune response. While galectins are generally known to bind to type I (Galb3GlcNAc) and type II (Galb4GlcNAc) lactosamine units, their finer substrate specificity and its implications for their numerous biological roles are less well understood. The data sets for human galectin-1 and -3 were analyzed using the rule based data mining approach. These two galectins are fundamentally different in terms of organization of their CRDs. Both galectin-1 and -3 share a similar C-terminal F3 type CRD. Galectin-1 is typically a homodimer of CRDs whereas galectin-3 comprises a single CRD with an N-terminal linker domain. The N-terminal domain of galectin-3 has been implicated in enhancing its affinity for glycan ligands.
Similar to the above example, the features that enhanced and diminished binding of glycan ligands on the glycan array to galectin-1 and -3 were identified using the rule based data mining approach (Table 3):
The rules that govern the high affinity ligand binding of galectin-1 and -3 are more complex than those derived for DC-SIGN and DC-SIGNR. Although it is known that galectins-1 and -3 bind with similar affinity to both type II and type I lactosamine units, the data from the glycan array did not reveal any type I (Galb3GlcNAc) binders based on the threshold intensities that were used to distinguish high binders.
In the case of galectin-1, the first rule (Table 3), which captured 8 out of the 9 high binders, included the presence of at least one lactosamine unit in a chain length of at least 3 monosaccharides. Again it is significant to note that there were no false positives. Based on analysis of the low and high affinity binders, several patterns in rule 1 were implicated to have a negative effect on binding. Fucosylation of the GlcNAc, terminal fucosylation of the Gal, sialylation of Gal and also the presence of Gala3Gal or Gala4Gal in conjunction with the type II lactosamine unit had negative effects on binding. Furthermore, the -Galb4GlcNAcb6GalNAc- unit, which comprises the type II lactosamine on a Core 2 (or Core 4) O-linked core, had a negative effect on binding.
The second rule gave an interesting pattern which indicated that sialylation did not have an effect on high affinity binding if the glycan motif comprised a type II polylactosamine repeat with at least two Galb4GlcNAc units. Earlier studies have implicated glycans with terminal sialylation as candidate ligands for galectin-1. Since the sialylated glycans used in this study comprised at least two Galb4GlcNAc units, these results are consistent with our rules. Furthermore, our rules also indicate that galectin-1 binds to internal Galb4GlcNAc units and that any other patterns that are farther away in the chain towards the non-reducing end have no effect on high affinity binding. There was only one false negative, which comprised Gal[3-O-SO3]b3GalNAc.
While the rules for galectin-3 binding were similar to galectin-1 there were some differences. These differences are captured in Table 4:
The main difference was in the first rule, which combined the absence of the Mana3(Mana6)Man unit with the other patterns. It is important to point out that this rule does not preclude all N-linked glycans. Instead it implies that galectin-3 favors a linear repeat of Galb4GlcNAc (polylactosamine) in comparison with Galb4GlcNAc occurring on different branches attached to the Mana3(Mana6)Man of the core. Another difference was that the binding to galectin-3 was not inhibited by the fucosylation of the Gal in the lactosamine, whereas the binding to galectin-1 was inhibited by it.
Similar to the DC-SIGN and DC-SIGNR example, the results from our analysis of the galectin data were compared with structural aspects of ligand binding. Structural complexes of galectin-1 and -3 with different ligands such as Galb4GlcNAc, Neu5Aca3Galb4GlcNAc, Neu5Aca3Galb4(Fuca3)GlcNAc, Neu5Aca6Galb4GlcNAc, etc. were analyzed. The crystal structures of galectin-1 and -3 with Galb4GlcNAc ligands were used as the respective frameworks on which to superimpose structures of other ligands and construct the different structural complexes. The 4- and 6-OH groups of Gal and the 3-OH of GlcNAc in the Galb4GlcNAc unit were involved in interactions with the amino acids of the CRD of galectin-1 and -3. Thus, substitution at any of these oxygens resulted in unfavorable steric contacts with the protein.
The rules for galectin binding derived by our approach indicated that Gala4Gal, NeuAca6Gal and Fuca3GlcNAc were detrimental to binding, consistent with the analysis of the structural complexes. The crystal structures also indicated that it is possible to extend Galb4GlcNAc on the non-reducing side with another such unit (via b3 linkage) implying that both galectin-1 and -3 can bind to internal Galb4GlcNAc units. This validates the rule where longer chains having the terminal units such as Galb4(Fuca3)GlcNAc or Neu5Aca3/6Galb4GlcNAc did not have an effect on high affinity binding.
To further validate the rules for binding to Galectin-1 and Galectin-3, the rules were used to predict the relative binding of two different glycans that were not present in the glycan array to Galectin-1 and Galectin-3 (Table 5):
As observed earlier, the rules predicted that galectin-3 favors the linear repeat of lactosamine, whereas galectin-1 favors the lactosamine found in a branched arrangement. This is consistent with the ligand binding propensity that was observed in Hirabayashi et al. (2002).
A framework for the binding of H5N1 subtype to α2-3/6 sialylated glycans was developed (
This analysis provides important insights into the interactions of an HA glycan binding site with a variety of α2-3/6 sialylated glycans, including glycans of either umbrella or cone topology. The second involves a data mining approach to analyze the glycan array data on the different H1, H3 and H5 HAs. This data mining analysis correlates the strong, weak and non-binders of the different wild type and mutant HAs to the structural features of the glycans in the microarray (Table 7).
Importantly, these correlations (classifiers) capture the effect of subtle structural variations of the α2-3/6 sialylated linkages and/or of different topologies on binding to the different HAs. The correlations of glycan features obtained from the data mining analysis are mapped onto the HA glycan binding site, providing a framework to systematically investigate the binding of H1, H3 and H5 HAs to α2-3 and α2-6 sialylated glycans, including glycans of different topologies, as discussed below.
To give but one example, application of this framework to H5 HA according to the present invention illustrates how the length of an α2-6 oligosaccharide chain becomes more important, especially in the context of degree of branching, than the nuances of structural variations around the glycan. For example, the difference between a triantennary structure with a single α2-6 motif and a biantennary structure with a longer α2-6 motif will influence HA-glycan binding more than structural variations around the individual α2-6 motif. This is confirmed by the distinct length dependent classifiers for the α2-6 motif obtained herein from data mining (Table 7).
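A length-dependent distinction of this kind can be expressed as a simple test on the branch that carries the α2-6 motif. The sketch below is a hedged illustration only: it assumes a linear text notation for a single branch written from the non-reducing end (e.g., Neu5Aca6Galb4GlcNAc...), and the function name `classify_branch` and the labels "long a2-6" and "short a2-6" are hypothetical and do not reproduce the classifier names of Table 7.

```python
import re

SIALYL_LACNAC = "Neu5Aca6Galb4GlcNAc"

# Motif extended by a second lactosamine unit via a b3 linkage (longer branch).
LONG_BRANCH = re.compile(r"^Neu5Aca6Galb4GlcNAcb3Galb4GlcNAc")
# Motif attached directly to a Man of the N-linked core (shorter branch).
SHORT_BRANCH = re.compile(r"^Neu5Aca6Galb4GlcNAcb[246]Man")


def classify_branch(branch: str) -> str:
    """Label one glycan branch, written non-reducing end first, by a2-6 branch length."""
    if LONG_BRANCH.match(branch):
        return "long a2-6"
    if SHORT_BRANCH.match(branch):
        return "short a2-6"
    if branch.startswith(SIALYL_LACNAC):
        return "other a2-6"
    return "no a2-6 motif"


# Hypothetical branches for illustration.
examples = [
    "Neu5Aca6Galb4GlcNAcb3Galb4GlcNAcb2Man",
    "Neu5Aca6Galb4GlcNAcb2Man",
    "Neu5Aca3Galb4GlcNAcb2Man",
]
for branch in examples:
    print(branch, "->", classify_branch(branch))
```

The long pattern is tested first because a long branch also begins with the shorter sialyl-lactosamine motif.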
Crystal structures of HAs from H1 (PDB IDs: 1RD8, 1RU7, 1RUY, 1RV0, 1RVT, 1RVX, 1RVZ), H3 (PDB IDs: 1MQL, 1MQM, 1MQN) and H5 (PDB IDs: 1JSN, 1JSO, 2FK0) and their complexes with α2-3 and/or α2-6 sialylated oligosaccharides have provided molecular insights into residues involved in specific HA-glycan interactions. More recently, the glycan receptor specificity of avian and human H1 and H3 subtypes has been elaborated by screening the wild type and mutants on glycan arrays comprising a variety of α2-3 and α2-6 sialylated glycans.
The Asp190Glu mutation in the HA of the 1918 human pandemic virus reversed its specificity from α2-6 to α2-3 sialylated glycans (Stevens et al., J. Mol. Biol., 355:1143, 2006; Glaser et al., J. Virol., 79:11533, 2005). On the other hand, the double mutation Glu190Asp and Gly225Asp on an avian H1 (A/Duck/Alberta/35/1976) reversed its specificity from α2-3 to α2-6 sialylated glycans. In the case of the H3 subtype, the amino acid changes from Gln226 to Leu and Gly228 to Ser between the 1963 avian H3N8 strain and the 1967-68 pandemic human H3N2 strain correlate with the change in their preference from α2-3 to α2-6 sialylated glycans (Rogers et al., Nature, 304:76, 1983). The relationship between the HA glycan binding specificity and transmission efficiency was demonstrated in a ferret model using the highly pathogenic and virulent 1918 H1N1 viruses (Tumpey, T. M. et al. Science 315:655, 2007). Switching the receptor binding specificity from the parental human α2-6 sialylated glycan receptor preference (SC18) to an avian α2-3 sialylated receptor preference (AV18) resulted in a virus that was unable to transmit. On the other hand, a virus with mixed α2-3/α2-6 sialylated glycan specificity (A/New York/1/18 (NY18)) showed no transmission; surprisingly, the A/Texas/36/91 (Tx91) virus, which also has mixed α2-3/α2-6 sialylated glycan specificity, was able to transmit efficiently. Furthermore, as stated above, various strains of the highly pathogenic H5N1 viruses also show mixed α2-3/α2-6 sialylated glycan specificity (Yamada, S. et al. Nature 444:378, 2006), and have yet to transmit from human to human. The confounding results with respect to HA's sialylated glycan specificity and transmission posed the following questions. First, is there diversity in the sialylated glycans found in the upper airways in humans, and could that account for the specificity and tissue tropism of the virus?
Second, are there nuances of glycan conformation that might play a role in how α2-3 and/or α2-6 sialylated glycans bind to the HA glycan binding pocket? Taken together, what are the glycan binding requirements of the Influenza A virus HA for human adaptation?
Analysis of all the HA-glycan co-crystal structures indicates that the orientation of the Neu5Ac sugar (SA) is fixed relative to the HA glycan binding site. A highly conserved set of amino acids (Phe95, Ser/Thr136, Trp153, His183, Leu/Ile194) across different HA subtypes is involved in anchoring the SA. Therefore, the specificity of HA to α2-3 or α2-6 is governed by interactions of the HA glycan binding site with the glycosidic oxygen atom and sugars beyond SA.
The conformation of the Neu5Acα2-3 Gal linkage is such that the positioning of Gal and sugars beyond Gal in α2-3 fall in a cone-like region governed by the glycosidic torsion angles at this linkage (
In addition to the conserved anchor points for sialic acid binding, two critical residues, Gln226 and Glu190, are involved in binding to the Neu5Acα2-3Gal motif. Gln226, located at the base of the binding site, interacts with the glycosidic oxygen atom of the Neu5Acα2-3Gal linkage (
Superimposition of the glycan binding site in the crystal structures of AAI68_H3_23, ADU63_H3_23 and APR34_H1_23 gives additional insights into the positioning of the Glu190 side chain and its effect on HA binding to α2-3 sialylated glycans. The side chain of Glu190 in H1 HA is further (around 1 Å) into the binding site in comparison with that of Glu190 in H3 HA. This could be due to the amino acid difference at position 186 (Pro186 in H1 HA as against Ser186 in H3 HA), which is proximal to the Glu190 residue. This change in side chain conformation of Glu190 could correlate with the binding of avian H1 (and not avian H3) with moderate affinity to some of the α2-6 sialylated glycans as shown by the data mining analysis of the glycan microarray data (Table 7). Further, substitution of Gly228 to Ser, a hallmark change between avian and human H3 subtypes, alters the conformation of Glu190 and interferes with the interaction of human H3 HA with Neu5Acα2-3Gal in the trans conformation. This is further elaborated by the distinct conformation (that is not trans) of the Neu5Acα2-3Gal motif observed in the human AAI68_H3_23 co-crystal structure. The Neu5Acα2-3Gal motif in this conformation provides less optimal contacts with the human H3 HA binding site compared to those provided by this motif in the trans conformation with the avian H3 HA (
How do the structural variations around the Neu5Acα2-3Gal influence HA-glycan interactions? Lys193, which is highly conserved in the avian H5 (
Thus, for binding to α2-3 sialylated glycans, apart from the residues that anchor Neu5Ac, Glu190 and Gln226, which are highly conserved in all avian H1, H3 and H5 subtypes, are critical for binding to the Neu5Acα2-3Gal motif. The contacts with GlcNAc or GalNAc and substitutions such as sulfation and fucosylation in the α2-3 motif involve amino acids at positions 137, 186, 187, 193 and 222. HAs from H1, H3 and H5 exhibit differential binding specificity to the diverse α2-3 sialylated glycans present in the glycan microarray. The amino acid residues in these positions are not conserved across the different HAs, and this accounts for the different binding specificities.
In the case of the Neu5Acα2-6Gal linkage, the presence of the additional C6-C5 bond provides added conformational flexibility. The position of Gal and subsequent sugars in α2-6 would span a much larger umbrella-like region as compared to the cone-like region in the case of α2-3 (
In H1 HA, superimposition of the glycan binding domain of HA from a human H1N1 (A/South Carolina/1/1918) subtype with that of ASI30_H1_26 and APR34_H1_26 provided insights into the amino acids involved in providing specificity to the α2-6 sialylated glycan. Lys222 and Asp225 are positioned to interact with the oxygen atoms of the Gal in the Neu5Acα2-6Gal motif. Asp190 and Ser/Asn193 are positioned to interact with the additional monosaccharides GlcNAcβ1-3Gal of the Neu5Acα2-6Galβ1-4GlcNAcβ1-3Gal motif (
Asp190, Lys222 and Asp225 are highly conserved among the H1 HAs from the 1918 human pandemic strains. Although the amino acid Gln226 is highly conserved in all the avian and human H1 subtypes, it does not appear to be as involved in binding to α2-6 sialylated glycans (in human H1 subtypes) compared to its role in binding to α2-3 sialylated glycans (in the avian H1 subtypes). The data mining analysis of the glycan array results for wild type and mutant form of the avian and human H1 HAs further substantiates the role of the above amino acids in binding to α2-6 sialylated glycans (Table 7). The Glu190Asp/Gly225Asp double mutant of the avian H1 HA reverses its binding to α2-6 sialylated glycans (Table 7). Further, the Lys222Leu mutant of human ANY18_H1 removes its binding to all the sialylated glycans on the array consistent with the essential role of Lys222 in glycan binding.
In order to identify amino acids that provide specificity for H3N2 HA binding to α2-6 sialylated glycans, the glycan binding domains of HA from human H3N2 (AAI68_H3), ADU63_H3_26 and ASI30_H1_26 were superimposed. Analysis of these superimposed structures showed that Leu226 is positioned to provide optimal van der Waals contact with the C6 atom of the Neu5Acα2-6Gal motif and Ser228 is positioned to interact with O9 of the sialic acid. Ser228 in the human H3 also interacts with Glu190 (unlike Gly228 in avian ADU63_H3, which does not), thereby affecting its side chain conformation. The side chain of Glu190 in human H3 HA is displaced slightly into the binding site by about 0.7 Å in comparison with that of Glu190 in avian H3 HA. These differences limit the ability of human H3 HA to bind to α2-3 sialylated glycans and correlate with its preferential binding to α2-6 sialylated glycans. Thus, the Gln226Leu and Gly228Ser mutations cause a reversal of the glycan receptor specificity of avian H3 to human H3 subtype during the 1967 pandemic.
Comparison of HAs from the 1967-68 pandemic H3N2 and those from more recent H3 subtypes (after 1990) shows that Glu190 is mutated to Asp in the recent subtypes. This mutation further enhances the binding of human H3 to α2-6 sialylated glycans since Asp190 in human H3 is positioned to interact favorably with these glycans. This structural implication is further corroborated by the data mining analysis of the glycan array data on a human H3 subtype (A/Moscow/10/1999). This HA comprises Asp190, Leu226 and Ser228 (
The above observations highlight both the similarities and the differences between H1 and H3 HA binding to α2-6 sialylated glycans. In both H1 and H3 HA, Asp190 and Ser/Asn193 are positioned to make favorable contacts with monosaccharides beyond the Neu5Acα2-6Gal motif (
The interactions with α2-6 sialylated glycans provided by the different amino acids in H1 and H3 HA suggested that the current avian H5N1 HA could mutate into an H1-like or H3-like glycan binding site in order to reverse its glycan receptor specificity. Based on the above framework, the hypothesized H1-like and H3-like mutations for H5 HA are further elaborated and tested as discussed below.
Analysis of the superimposed ASI30_H1_26, APR34_H1_26, ADS97_H5_26 and Viet04_H5 structures provided insights into the H1-like binding of H5 HA to α2-6 sialylated glycans. Since the H1 and H5 HAs belong to the same structural clade, their glycan binding sites share a similar topology and distribution of amino acids (Russell et al., Virology, 325:287, 2004). Lys222, which is highly conserved in avian H5 HAs, is positioned to provide optimal contacts with Gal of the Neu5Acα2-6Gal motif, similar to the analogous Lys in H1 HA. Glu190 and Gly225 in Viet04_H5 (in the place of Asp190 and Asp225 in H1) do not provide the necessary contacts with the Neu5Acα2-6Galβ1-4GlcNAc motif that are seen in H1. Therefore Glu190Asp and Gly225Asp mutations in H5 HA could potentially improve the contacts with α2-6 sialylated glycans.
Analysis of the interactions beyond GlcNAc in the Neu5Acα2-6Galβ1-4GlcNAcβ1-3Galβ1-4Glc oligosaccharide and the glycan binding pocket of H1 and H5 HAs showed that while Ser/Asn193 in H1 HA provides favorable contacts with the penultimate Gal, the analogous Lys193 in H5 has unfavorable steric overlaps with the GlcNAcβ1-3Gal motif. Thus, the Lys193Ser mutation can provide additional favorable contacts (along with Glu190Asp and Gly225Asp mutations) with α2-6 sialylated glycans.
The highly conserved Gln226 in H1 HA is also conserved in the avian H5 HA. Given that Gln226 plays a less active role in H1 HA binding to α2-6 sialylated glycans (as discussed above), mutation of this amino acid to a hydrophobic amino acid such as Leu could potentially enhance its van der Waals contact with C6 atom of Gal in Neu5Acα2-6Gal motif.
The superimposition of ADU63_H3_26, AAI68_H3, ADS97_H5_26 and Viet04_H5 provides insights into the H3-like binding of H5 HA to α2-6 sialylated glycans. While this superimposition structurally aligned the glycan binding sites of H5 and H3 HA, it was not as good as the structural alignment between H5 and H1. The favorable van der Waals contact and ionic contact with the Neu5Acα2-6Gal motif provided respectively by Leu226 and Ser228 in H3 HA were absent in H5 HA (with Gln226 and Gly228). Given that Leu226 and Ser228 are critical for binding to α2-6 sialylated glycans in human H3 HA, the Gln226Leu and Gly228Ser mutations in H5 HA could potentially provide optimal contacts with α2-6 sialylated glycans. Further, even in the comparison between H3 and H5, Lys193 is positioned such that it would have unfavorable steric contacts with the monosaccharides beyond the Neu5Acα2-6Gal motif, as against Ser193 in human H3 HA, which is positioned to provide favorable contacts. Although the HA from the 1967-68 pandemic H3N2 comprises Glu190, Asp190 in H5 HA would be positioned to provide better ionic contacts with the Neu5Acα2-6Gal motif in longer oligosaccharides.
The roles of the above mentioned residues were further corroborated by data mining analysis of glycan array data for wild type and mutant forms of Viet04_H5 (Table 7). The double mutant, Glu190Asp/Gly225Asp, does not bind to any glycan structure since it loses the amino acid Glu190 for binding α2-3 sialylated glycans and has the steric interference from Lys193 for binding to α2-6 sialylated glycans. Similarly, the double mutant Gln226Leu/Gly228Ser binds to some of the α2-3 sialylated glycans (α2-3 Type B classifier) but only to a single biantennary α2-6 sialylated glycan (α2-6 Type A classifier).
Analysis of this binding to the biantennary α2-6 sialylated glycan showed that the Neu5Acα2-6Gal linkage in this glycan can potentially bind in an extended conformation to the double mutant albeit with lesser contacts (
Without wishing to be bound by any particular theory, the present inventors propose that a necessary condition for human adaptation of influenza A virus HAs is to gain the ability to bind to long α2-6 (predominantly expressed in human upper airway) with high affinity. For example, an aspect of glycan diversity is the length of the lactosamine branch that is capped with the sialic acid. This is captured by the two distinct features of α2-6 sialylated glycans derived from the data mining analysis (Table 7). One feature is characterized by the Neu5Acα2-6 Galβ1-4GlcNAc linked to the Man of the N-linked core and the other is characterized by this motif linked to another lactose amine unit forming a longer branch (which typically adopts umbrella topology). Thus, the extensive binding of the mutant H5 HAs to the upper airways may only be possible if these mutants bind with high affinity to the glycans with long α2-6 adopting the umbrella topology. For example, according to the present invention, desirable binding patterns include binding to umbrella glycans depicted in
By contrast, we note that a recent report of modified H5 HA proteins (containing Gly228Ser and Gln226Leu/Gly228Ser substitutions) showed binding to only a single biantennary α2-6 sialyl-lactosamine glycan structure on the glycan array (Stevens et al., Science 312:404, 2006). Such modified H5 HA proteins are therefore not BSHB H5 HAs, as described herein.
Thus, the present invention demonstrates that the current avian H5N1 HA can undergo mutations that would alter its specificity towards α2-6 glycans based on interactions of human H1 or H3 HA with these glycans. The Glu190Asp, Lys193Ser, Gly225Asp and Gln226Leu mutations (“DSDL mutant”) could potentially make the H5 HA binding site similar to that of the human H1 HA, while the Glu190Asp, Lys193Ser, Gln226Leu and Gly228Ser mutations (“DSLS mutant”) could potentially make it similar to that of the human H3 HA for optimal interactions with α2-6 sialylated glycans. DSDL and DSLS H5 HA mutants were designed and tested based on the above framework. Wild type and mutant BSHB H5 HAs were expressed in baculovirus and purified as reported earlier (FIG. 10XXXY).
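The mutant naming used above can be made concrete with a short sketch. The Python below is illustrative only: the function names (`parse_substitution`, `apply_substitutions`, `site_signature`) and the dictionary layout are assumptions introduced here and are not part of the described method. It starts from the wild type H5 HA residues at positions 190, 193, 225, 226 and 228 as stated in the text, applies each set of substitutions, and reads off the one-letter codes of the substituted residues in position order, which reproduces the “DSDL” and “DSLS” labels.

```python
import re

# Three-letter to one-letter codes for the residues named in the text.
ONE_LETTER = {"Glu": "E", "Asp": "D", "Lys": "K", "Ser": "S",
              "Gly": "G", "Gln": "Q", "Leu": "L"}

# Wild type H5 HA residues at the binding-site positions discussed above.
WILD_TYPE_H5 = {190: "Glu", 193: "Lys", 225: "Gly", 226: "Gln", 228: "Gly"}

# Substitution sets exactly as listed in the text.
DSDL = ["Glu190Asp", "Lys193Ser", "Gly225Asp", "Gln226Leu"]
DSLS = ["Glu190Asp", "Lys193Ser", "Gln226Leu", "Gly228Ser"]


def parse_substitution(text):
    """Split e.g. 'Glu190Asp' into ('Glu', 190, 'Asp')."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", text)
    if match is None:
        raise ValueError(f"not a three-letter substitution: {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(wild_type, substitutions):
    """Return a copy of the residue map with each substitution applied."""
    site = dict(wild_type)
    for text in substitutions:
        old, position, new = parse_substitution(text)
        if site.get(position) != old:
            raise ValueError(f"{text}: expected {old} at {position}")
        site[position] = new
    return site


def site_signature(substitutions):
    """One-letter codes of the substituted residues, ordered by position."""
    parsed = sorted(parse_substitution(s)[1:] for s in substitutions)
    return "".join(ONE_LETTER[new] for _, new in parsed)


print(site_signature(DSDL))  # DSDL
print(site_signature(DSLS))  # DSLS
print(apply_substitutions(WILD_TYPE_H5, DSLS))
```

The check inside `apply_substitutions` rejects a substitution whose starting residue does not match the wild type map, which catches transcription slips such as a wrong position number.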
We found that only recombinant wild type H5 HA bound extensively to the alveolar region, and very little if any to the trachea or bronchus, consistent with binding of avian H5 HA to α2-3 sialylated glycans. In contrast, only the DSLS mutant (H3-like) bound to the upper airway tracheal and bronchial tissues; further, this mutant did not bind to the deep lung alveolar tissues.
For the tissue binding experiment, tissue sections were deparaffinized, rehydrated and incubated with the WT and the mutant HA proteins (diluted in PBS) for 3 hr. Based on the protein concentration for a given lot after purification, appropriate serial dilutions in the range of 1:10 to 1:100 were tested. After extensive washing with PBS, the sections were blocked with 2% BSA-PBS for 30 min and then incubated with rabbit anti-avian H5N1 hemagglutinin antibody (Pro-Sci Inc, 1:1000 in 2% BSA-PBS) for 3 hr. Sections were washed with PBS and then incubated with secondary goat anti-rabbit antibody (Invitrogen; 1:500 in 2% BSA-PBS) for 90 min. Sections were counterstained with propidium iodide (in red; Invitrogen; 1:200 in PBS) and then viewed under a confocal microscope (Zeiss LSM510 laser scanning confocal microscope). All incubations were at room temperature.
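The dilution arithmetic behind a protocol of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: it assumes that “1:N” means one part stock in N parts total volume, and the 200 µL final volume and the function name `dilution_volumes` are hypothetical values chosen for the example.

```python
def dilution_volumes(dilution_factor, final_volume_ul):
    """Return (stock_ul, diluent_ul) for a 1:dilution_factor dilution.

    Assumes '1:N' means 1 part stock in N parts total volume.
    """
    if dilution_factor < 1:
        raise ValueError("dilution factor must be at least 1")
    stock_ul = final_volume_ul / dilution_factor
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical final volume per section; not specified in the text.
FINAL_VOLUME_UL = 200.0

for factor in (10, 50, 100):
    stock, pbs = dilution_volumes(factor, FINAL_VOLUME_UL)
    print(f"1:{factor} -> {stock:.1f} uL HA stock + {pbs:.1f} uL PBS")
```

The same function covers the antibody and counterstain dilutions quoted in the text (1:1000, 1:500, 1:200) if the corresponding factor and diluent are substituted.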
The observation that the DSLS version of H5 HA, but not the DSDL version, bound to tracheal and bronchial sections (but not to alveolar sections) was intriguing given that both DSDL and DSLS mutants were expected to bind extensively to α2-6 sialylated glycans in the upper airway based on our framework. The Ser193 instead of Lys193 in both these mutants would have removed the steric hindrance imposed by Lys193 (in the wild type H5 HA) to provide them with broad specificity towards α2-6 sialylated glycans. Further, given that H5 and H1 belong to the same structural clade, it would have been more likely for H5 HA to mutate into an H1-like glycan binding site.
To further understand the inability of the DSDL mutant to bind to α2-6 sialylated glycans, this mutation was mapped onto the Viet04_H5 crystal structure, which was further superimposed with the ASI30_H1_26 and APR34_H1_26 crystal structures. This mapping showed that all the contacts with the α2-6 sialylated oligosaccharide are conserved between H1 HAs and the DSDL mutant. However, Asp187, which is highly conserved in avian H5 HA, was in close proximity to the Asp190 in the DSDL mutant. The presence of 3 aspartates (Asp187, Asp190 and Asp225) further explained the pI of 6.8 for the DSDL mutant (as compared to 7.3 for the WT and the DSLS mutant). The interaction between Asp187 and Asp190 could potentially alter the conformation of Asp190, similar to the influence of Ser228 on Glu190 in H3 HA. The effect of the proximity of the amino acid at position 187 on Asp190 is also evident from the differences in SASA of Thr187 in ASI30_H1 interacting with α2-3 vs. α2-6 sialylated glycan. Given that Asp190 is involved in forming optimal contacts with α2-6 sialylated glycans in H1 HA, the effect of Asp187 on Asp190 could potentially disrupt this interaction. Perhaps the mutation of Gln226, which is highly conserved in H1 HA, to Leu in the DSDL mutant could have affected the environment of the HA binding site of this mutant in the context of the other H1-like mutations and made it less optimal for binding to α2-6 sialylated glycans.
The role of Gln226 in the H1-like binding of H5 HA was further tested using a Glu190Asp/Lys193Ser or DS mutant, which retains the Gln226. The lack of binding of the DS mutant to the deep lung tissues is consistent with the loss of binding to α2-3 sialylated glycans (due to the Glu190Asp mutation). Similarly, the lack of binding of this mutant to the upper airway tissues further supports the disruptive effect of Asp187 on Asp190, which could lower the binding of this mutant to α2-6 sialylated glycans. Thus, mutations in the current avian H5N1 HA would preferentially lead to an H3-like (as compared to an H1-like) glycan binding site having a broad specificity for α2-6 sialylated glycans.
The binding of the DSLS mutant to the upper lung raises the question as to the diversity of the α2-6 sialylated glycans in the upper airways. Lectin staining of the human bronchial epithelial (HBE) cells clearly shows that these cells are abundant in different α2-6 sialylated glycans such as N-linked, O-linked and glycolipids (
Specifically, about 70×10^6 16HBE14o- cells (a gift from Dr. D. C. Gruenert; University of California, San Francisco) were harvested with 100 mM citrate saline buffer when they were >90% confluent, and the cell membrane was isolated after treatment with protease inhibitor (Calbiochem) and homogenization. The cell membrane fraction was treated with PNGaseF (New England Biolabs) and the reaction mixture was incubated overnight at 37° C. The reaction mixture was boiled for 10 min to deactivate the enzyme, and the deglycosylated peptides and proteins were removed using a Sep-Pak C18 SPE cartridge (Waters). The glycans were further desalted and purified into neutral (25% acetonitrile fraction) and acidic (50% acetonitrile containing 0.05% trifluoroacetic acid) fractions using graphitized carbon solid-phase extraction columns (Supelco). The acidic fractions (containing sialylated glycans) were analyzed by MALDI-TOF MS in negative ion mode with soft ionization conditions (accelerating voltage 22 kV, grid voltage 93%, guide wire 0.3% and extraction delay time of 150 ns). This MALDI TOF-TOF fragmentation analysis of representative mass peaks illustrated the diversity in terms of branching pattern and increased branch length in the N-linked glycans. The longer branch length versus higher branching observed in the glycan profile can influence the binding of H5 HA to these glycans.
For example, an aspect of glycan diversity is the length of the lactosamine branch that is capped with the sialic acid. This is captured by the two distinct features of α2-6 sialylated glycans derived from the data mining analysis (Table 7). One feature is characterized by the Neu5Acα2-6Galβ1-4GlcNAc linked to the Man of the N-linked core and the other is characterized by this motif linked to another lactosamine unit forming a longer branch. Thus, the extensive binding of the mutant H5 HAs to the upper airways is only possible if these mutants have a broad binding specificity to α2-6 sialylated glycans. For example, according to the present invention, desirable binding patterns include those depicted in
and combinations thereof:
and/or
and combinations thereof.
By contrast, we note that a recent report of modified H5 HA proteins (containing Gly228Ser and Gln226Leu/Gly228Ser substitutions) showed binding to only a single biantennary α2-6 sialyl-lactosamine glycan structure on the glycan array (Stevens et al., Science 312:404, 2006). Such modified H5 HA proteins are therefore not BSHB H5 HAs, as described herein.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. The scope of the present invention is not intended to be limited to the above Description, but rather is as set forth in the following claims:
1 Borderline high binder;
2 Sulfated GlcNAc[6S]/Gal[6S] high binders;
3 Borderline high binders to α2-6 Type B. Only sulfated GlcNAc[6S]/Gal[6S] are high binders;
4 Binds to several non-sialylated glycans;
5 Borderline high to α2-3 sialylated glycans;
6 Few borderline high binders to sulfated GlcNAc on Neu5Acα3Galβ3/4GlcNAc;
7 High binders are Neu5Acα6Galβ4GlcNAcβ3Gal & !GlcNAcα6Man; others are borderline high.
The present application claims priority under 35 USC 119(e) to co-pending U.S. Provisional patent application Ser. No. 60/837,868, filed on Aug. 14, 2006, and to co-pending U.S. provisional patent application Ser. No. 60/837,869, filed on Aug. 14, 2006. The entire contents of each of these prior applications are incorporated herein by reference.
This invention was made with United States government support awarded by the National Institute of General Medical Sciences under contract number U54 GM62116 and by the National Institutes of Health under contract number GM57073. The United States Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US07/18103 | 8/14/2007 | WO | 00 | 1/29/2010 |
Number | Date | Country | |
---|---|---|---|
60837869 | Aug 2006 | US | |
60837868 | Aug 2006 | US |