The present invention relates to a method of selecting a protein variant having modified immunogenicity as compared to the parent protein, to the protein variant and use thereof, as well as to a method for producing said protein variant.
An increasing number of proteins, including enzymes, are being produced industrially, for use in various industries, housekeeping and medicine. Being proteins they are likely to stimulate an immunological response in man and animals, including an allergic response.
Depending on the application, individuals get sensitised to the respective allergens by inhalation, direct contact with skin and eyes, or injection. The general mechanism behind an allergic response is divided into a sensitisation phase and a symptomatic phase. The sensitisation phase involves a first exposure of an individual to an allergen. This event activates specific T- and B-lymphocytes, and leads to the production of allergen specific IgE antibodies (in the present context the antibodies are denoted as usual, i.e. immunoglobulin E as IgE, etc.). These IgE antibodies eventually facilitate allergen capturing and presentation to T-lymphocytes at the onset of the symptomatic phase. This phase is initiated by a second exposure to the same or a resembling antigen. The specific IgE antibodies bind to the specific IgE receptors on mast cells and basophils, among others, and at the same time capture the allergen. The polyclonal nature of this process results in bridging and clustering of the IgE receptors, and subsequently in the activation of mast cells and basophils. This activation triggers the release of various chemical mediators involved in the early as well as late phase reactions of the symptomatic phase of allergy. Prevention of allergy in susceptible individuals is therefore a research area of great importance.
For certain forms of IgE-mediated allergies, a therapy exists, which comprises repeated administration of allergen preparations called ‘allergen vaccines’ (Int. Arch. Allergy Immunol., 1999, vol. 119, pp 1-5). This leads to reduction of the allergic symptoms, possibly due to a redirection of the immune response away from the allergic (Th2) pathway and towards the immunoprotective (Th1) pathway (Int. Arch. Allergy Immunol., 1999, vol. 119, pp 1-5).
Various attempts to reduce the immunogenicity of polypeptides and proteins have been conducted. It has been found that small changes in an epitope may affect the binding to an antibody. This may result in a reduced importance of such an epitope, maybe converting it from a high affinity to a low affinity epitope, or maybe even result in epitope loss, i.e. that the epitope cannot sufficiently bind an antibody to elicit an immunogenic response.
There is a need for methods to identify epitopes on proteins and alter these epitopes in order to modify the immunogenicity of proteins in a targeted manner. Such methods and kits for their execution can have at least four useful purposes:
1) reduce the allergenicity of a commercial protein using protein engineering.
2) reduce the potential of commercial proteins to cross-react with environmental allergens and hence cause allergic reactions in people sensitized to the environmental allergens (or vice versa).
3) improve the immunotherapeutic effect of allergen vaccines.
4) assist characterization of clinical allergies in order to select the appropriate treatment, including allergen vaccination.
In WO 99/53038 (Genencor Int.) as well as in prior references (Kammerer et al, Clin. Exp. Allergy, 1997, vol. 27, pp 1016-1026; Sakakibara et al, J. Vet. Med. Sci., 1998; vol. 60, pp. 599-605), methods are described, which identify linear T-cell epitopes among a library of known peptide sequences, each representing part of the primary sequence of the protein of interest. Further, several similar techniques for localization of B-cell epitopes are disclosed by Walshet et al, J. Immunol. Methods, vol. 121, 1275-280, (1989), and by Schoofs et al. J. Immunol. vol. 140, 611-616, (1987). All of these methods, however, only lead to identification of linear epitopes, not to identification of ‘structural’ or ‘discontinuous’ epitopes, which are found on the 3-dimensional surface of protein molecules and which comprise amino acids from several discrete sites of the primary sequence of the protein. For several allergens, it has been realized that the dominant epitopes are of such discontinuous nature (Collins et al., Clin. Exp. All. 1996, vol. 26, pp. 36-42).
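The limitation of the linear approaches described above can be illustrated with a minimal sketch. It assumes a hypothetical peptide library built as overlapping windows of the primary sequence; the function names, window length and step are illustrative assumptions and are not taken from the cited references. A peptide that is a contiguous stretch of the sequence can be located, whereas residues drawn from discrete sites of the primary sequence cannot.

```python
def overlapping_peptides(sequence, length=10, step=2):
    """Return (1-based start, peptide) pairs, one window every `step` residues.

    Hypothetical library layout: a trailing stretch shorter than `step`
    may remain uncovered by the last window.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = max(len(sequence) - length, 0)
    return [(i + 1, sequence[i:i + length])
            for i in range(0, last_start + 1, step)]


def map_linear_epitope(sequence, peptide):
    """Return all 1-based positions at which `peptide` occurs contiguously."""
    positions = []
    index = sequence.find(peptide)
    while index != -1:
        positions.append(index + 1)
        index = sequence.find(peptide, index + 1)
    return positions


if __name__ == "__main__":
    toy_sequence = "ABCDEFGH"  # placeholder letters, not a real protein
    print(overlapping_peptides(toy_sequence, length=4, step=2))
    print(map_linear_epitope(toy_sequence, "CDE"))  # contiguous stretch
    print(map_linear_epitope(toy_sequence, "ACE"))  # residues from discrete sites
```

The second lookup returns no position, which mirrors why a scan of the primary sequence alone does not reveal a discontinuous epitope.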
Slootstra et al; Molecular Diversity, 2, pp. 156-164, 1996 disclose the screening of a semi-random library of synthetic peptides for their binding properties to three monoclonal antibodies by immobilizing the peptides on polyethylene pins and binding a dilution series of each antibody to the pins. This reference does not disclose any indication of how the antibody binding peptide sequences relate to any full protein antigens or allergens.
In WO 92/10755 a method for modifying proteins to obtain less immunogenic variants is described. Randomly constructed protein variants, revealing a reduced binding of antibodies to the parent enzyme as compared to the parent enzyme itself, are selected for the measurement in animal models in terms of allergenicity. Finally, it is assessed whether reduction in immunogenicity is due to true elimination of an epitope or a reduction in affinity for antibodies. This method targets the identification of amino acids that may be part of structural epitopes by using a complete protein for assessing antigen binding. The major drawbacks of this approach are the ‘trial and error’ character, which makes it a lengthy and expensive process, and the lack of general information on the epitope patterns. Without this information, the results obtained for one protein cannot be applied to another protein.
WO 99/47680 (ALK-ABELLÓ) discloses the identification and modification of B-cell epitopes by protein engineering. However, the method is based on crystal structures of Fab-antigen complexes, and B-cell epitopes are defined as “a section of the surface of the antigen comprising 15-25 amino acid residues, which are within a distance from the atoms of the antibody enabling direct interaction” (p. 3). This publication does not show how one selects which Fab fragment to use (e.g. to target the most dominant allergy epitopes) or how one selects the substitutions to be made. Further, their method cannot be used in the absence of such crystallographic data for antigen-antibody complexes, which are very cumbersome, sometimes impossible, to obtain—especially since one would need a separate crystal structure for each epitope to be changed.
Hence, it is of interest to establish a general and efficient method to identify structural epitopes on the 3-dimensional surface of commercial and environmental allergens.
The present invention relates to a method of selecting a protein variant having modified immunogenicity as compared to a parent protein, comprising the steps of:
a) obtaining antibody binding peptide sequences,
b) using the sequences to localise epitope sequences on the 3-dimensional structure of the parent protein,
c) defining an epitope area including amino acids situated within 5 Å from the epitope amino acids constituting the epitope sequence,
d) changing one or more of the amino acids defining the epitope area of the parent protein by genetic engineering mutations of a DNA sequence encoding the parent protein,
e) introducing the mutated DNA sequence into a suitable host, culturing said host and expressing the protein variant, and
f) evaluating the immunogenicity of the protein variant using the parent protein as reference.
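Step c) above can be sketched in code. The following is a minimal illustration only, under stated assumptions: the structure is represented as a hypothetical list of (residue number, (x, y, z)) atom records in Å, and the distance is taken atom to atom, which the text above does not specify. The function name and data layout are likewise assumptions.

```python
import math


def epitope_area(atoms, epitope_residues, cutoff=5.0):
    """Return residue numbers of the epitope plus all residues that have
    at least one atom within `cutoff` Å of any epitope atom.

    `atoms` is an iterable of (residue_number, (x, y, z)) records.
    The atom-to-atom distance criterion is an assumption of this sketch.
    """
    atoms = list(atoms)
    epitope_residues = set(epitope_residues)
    epitope_atoms = [xyz for res, xyz in atoms if res in epitope_residues]
    area = set(epitope_residues)
    for res, xyz in atoms:
        if res in area:
            continue  # already included
        if any(math.dist(xyz, e) <= cutoff for e in epitope_atoms):
            area.add(res)
    return area


if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    toy_atoms = [
        (1, (0.0, 0.0, 0.0)),
        (2, (3.0, 0.0, 0.0)),
        (3, (10.0, 0.0, 0.0)),
    ]
    print(sorted(epitope_area(toy_atoms, {1})))
```

In practice the atom records would be read from a structure file of the parent protein; the pairwise loop is adequate for a sketch but a spatial index would be preferable for large structures.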
A second aspect of the present invention is a protein variant having modified immunogenicity as compared to its parent protein. The amino acid sequence of the protein variant differs from the amino acid sequence of the parent protein with respect to at least one epitope pattern of the parent protein, such that the immunogenicity of the protein variant is modified as compared with the immunogenicity of the parent protein.
A further aspect of the present invention is a composition comprising a protein variant as defined above, as well as the use of the composition for industrial application, such as the production of a formulation for personal care products (for example shampoo; soap; skin, hand and face lotions; skin, hand and face crèmes; hair dyes; toothpaste), food (for example in the baking industry), detergents and for the production of pharmaceuticals, e.g. vaccines.
Yet another aspect is a DNA molecule encoding a protein variant as defined above.
Further aspects are a vector comprising a DNA molecule as described above as well as a host cell comprising said DNA molecule.
Another aspect is a method of producing a protein variant having modified immunogenicity as compared to the parent protein as defined above.
Prior to a discussion of the detailed embodiments of the invention, a definition of specific terms related to the main aspects of the invention is provided.
In accordance with the present invention there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (herein “Sambrook et al., 1989”); DNA Cloning: A Practical Approach, Volumes I and II (D. N. Glover ed. 1985); Oligonucleotide Synthesis (M. J. Gait ed. 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. (1985)); Transcription And Translation (B. D. Hames & S. J. Higgins, eds. (1984)); Animal Cell Culture (R. I. Freshney, ed. (1986)); Immobilized Cells And Enzymes (IRL Press, (1986)); B. Perbal, A Practical Guide To Molecular Cloning (1984).
When applied to a protein, the term “isolated” indicates that the protein is found in a condition other than its native environment, such as apart from blood and animal tissue. In a preferred form, the isolated protein is substantially free of other proteins, particularly other proteins of animal origin. It is preferred to provide the proteins in a highly purified form, i.e., greater than 95% pure, more preferably greater than 99% pure. When applied to a polynucleotide molecule, the term “isolated” indicates that the molecule is removed from its natural genetic milieu, and is thus free of other extraneous or unwanted coding sequences, and is in a form suitable for use within genetically engineered protein production systems. Such isolated molecules are those that are separated from their natural environment and include cDNA and genomic clones. Isolated DNA molecules of the present invention are free of other genes with which they are ordinarily associated, and may include naturally occurring 5′ and 3′ untranslated regions such as promoters and terminators. The identification of associated regions will be evident to one of ordinary skill in the art (see for example, Dynan and Tjian, Nature 316: 774-78, 1985).
A “polynucleotide” is a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′ to the 3′ end. Polynucleotides include RNA and DNA, and may be isolated from natural sources, synthesized in vitro, or prepared from a combination of natural and synthetic molecules.
A “nucleic acid molecule” refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; “RNA molecules”) or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; “DNA molecules”) in either single stranded form, or a double-stranded helix. Double stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible. The term nucleic acid molecule, and in particular DNA or RNA molecule, refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary or quaternary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear or circular DNA molecules (e.g., restriction fragments), plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5′ to 3′ direction along the nontranscribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA). A “recombinant DNA molecule” is a DNA molecule that has undergone a molecular biological manipulation.
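The strand convention described above can be illustrated with a short sketch. It assumes DNA written with the letters A, C, G and T only; the function names are hypothetical. Given a sequence written 5′ to 3′ along the nontranscribed strand, the mRNA has the same sequence with uridine in place of thymidine, and the transcribed (template) strand is its reverse complement.

```python
# Complement table for unambiguous DNA letters (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(nontranscribed_strand):
    """Return the transcribed strand, written 5' to 3' (reverse complement)."""
    return nontranscribed_strand.upper().translate(COMPLEMENT)[::-1]


def mrna_from_nontranscribed_strand(nontranscribed_strand):
    """Return the mRNA sequence, which matches the nontranscribed strand
    with U substituted for T."""
    return nontranscribed_strand.upper().replace("T", "U")


if __name__ == "__main__":
    toy_dna = "ATGGCC"  # illustrative sequence only
    print(template_strand(toy_dna))
    print(mrna_from_nontranscribed_strand(toy_dna))
```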
A DNA “coding sequence” is a double-stranded DNA sequence, which is transcribed and translated into a polypeptide in a cell in vitro or in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and even synthetic DNA sequences. If the coding sequence is intended for expression in a eukaryotic cell, a polyadenylation signal and transcription termination sequence will usually be located 3′ to the coding sequence.
An “Expression vector” is a DNA molecule, linear or circular, that comprises a segment encoding a polypeptide of interest operably linked to additional segments that provide for its transcription. Such additional segments may include promoter and terminator sequences, and optionally one or more origins of replication, one or more selectable markers, an enhancer, a polyadenylation signal, and the like. Expression vectors are generally derived from plasmid or viral DNA, or may contain elements of both.
Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, terminators, and the like, that provide for the expression of a coding sequence in a host cell. In eukaryotic cells, polyadenylation signals are control sequences.
A “secretory signal sequence” is a DNA sequence that encodes a polypeptide (a “secretory peptide”) that, as a component of a larger polypeptide, directs the larger polypeptide through a secretory pathway of a cell in which it is synthesized. The larger polypeptide is commonly cleaved to remove the secretory peptide during transit through the secretory pathway.
The term “promoter” is used herein for its art-recognized meaning to denote a portion of a gene containing DNA sequences that provide for the binding of RNA polymerase and initiation of transcription. Promoter sequences are commonly, but not always, found in the 5′ non-coding regions of genes.
“Operably linked”, when referring to DNA segments, indicates that the segments are arranged so that they function in concert for their intended purposes, e.g. transcription initiates in the promoter and proceeds through the coding segment to the terminator.
A coding sequence is “under the control” of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then trans-RNA spliced and translated into the protein encoded by the coding sequence.
“Isolated polypeptide” is a polypeptide which is essentially free of other non-[enzyme]polypeptides, e.g., at least about 20% pure, preferably at least about 40% pure, more preferably about 60% pure, even more preferably about 80% pure, most preferably about 90% pure, and even most preferably about 95% pure, as determined by SDS-PAGE.
“Heterologous” DNA refers to DNA not naturally located in the cell, or in a chromosomal site of the cell. Preferably, the heterologous DNA includes a gene foreign to the cell.
A cell has been “transfected” by exogenous or heterologous DNA when such DNA has been introduced inside the cell. A cell has been “transformed” by exogenous or heterologous DNA when the transfected DNA effects a phenotypic change. Preferably, the transforming DNA should be integrated (covalently linked) into chromosomal DNA making up the genome of the cell.
A “clone” is a population of cells derived from a single cell or common ancestor by mitosis.
“Homologous recombination” refers to the insertion of a foreign DNA sequence of a vector in a chromosome. Preferably, the vector targets a specific chromosomal site for homologous recombination. For specific homologous recombination, the vector will contain sufficiently long regions of homology to sequences of the chromosome to allow complementary binding and incorporation of the vector into the chromosome. Longer regions of homology, and greater degrees of sequence similarity, may increase the efficiency of homologous recombination.
The techniques used to isolate or clone a nucleic acid sequence encoding a polypeptide are known in the art and include isolation from genomic DNA, preparation from cDNA, or a combination thereof. The cloning of the nucleic acid sequences of the present invention from such genomic DNA can be effected, e.g., by using the well known polymerase chain reaction (PCR) or antibody screening of expression libraries to detect cloned DNA fragments with shared structural features. See, e.g., Innis et al., 1990, A Guide to Methods and Application, Academic Press, New York. Other nucleic acid amplification procedures such as ligase chain reaction (LCR), ligated activated transcription (LAT) and nucleic acid sequence-based amplification (NASBA) may be used. The nucleic acid sequence may be cloned from a strain producing the polypeptide, or from another related organism and thus, for example, may be an allelic or species variant of the polypeptide encoding region of the nucleic acid sequence.
The term “isolated” nucleic acid sequence as used herein refers to a nucleic acid sequence which is essentially free of other nucleic acid sequences, e.g., at least about 20% pure, preferably at least about 40% pure, more preferably about 60% pure, even more preferably about 80% pure, most preferably about 90% pure, and even most preferably about 95% pure, as determined by agarose gel electrophoresis. For example, an isolated nucleic acid sequence can be obtained by standard cloning procedures used in genetic engineering to relocate the nucleic acid sequence from its natural location to a different site where it will be reproduced. The cloning procedures may involve excision and isolation of a desired nucleic acid fragment comprising the nucleic acid sequence encoding the polypeptide, insertion of the fragment into a vector molecule, and incorporation of the recombinant vector into a host cell where multiple copies or clones of the nucleic acid sequence will be replicated. The nucleic acid sequence may be of genomic, cDNA, RNA, semisynthetic, synthetic origin, or any combinations thereof.
As used herein the term “nucleic acid construct” is intended to indicate any nucleic acid molecule of cDNA, genomic DNA, synthetic DNA or RNA origin. The term “construct” is intended to indicate a nucleic acid segment which may be single- or double-stranded, and which may be based on a complete or partial naturally occurring nucleotide sequence encoding a polypeptide of interest. The construct may optionally contain other nucleic acid segments.
The DNA of interest may suitably be of genomic or cDNA origin, for instance obtained by preparing a genomic or cDNA library and screening for DNA sequences coding for all or part of the polypeptide by hybridization using synthetic oligonucleotide probes in accordance with standard techniques (cf. Sambrook et al., supra).
The nucleic acid construct may also be prepared synthetically by established standard methods, e.g. the phosphoramidite method described by Beaucage and Caruthers, Tetrahedron Letters 22 (1981), 1859-1869, or the method described by Matthes et al., EMBO Journal 3 (1984), 801-805. According to the phosphoramidite method, oligonucleotides are synthesized, e.g. in an automatic DNA synthesizer, purified, annealed, ligated and cloned in suitable vectors.
Furthermore, the nucleic acid construct may be of mixed synthetic and genomic, mixed synthetic and cDNA or mixed genomic and cDNA origin prepared by ligating fragments of synthetic, genomic or cDNA origin (as appropriate), the fragments corresponding to various parts of the entire nucleic acid construct, in accordance with standard techniques.
The nucleic acid construct may also be prepared by polymerase chain reaction using specific primers, for instance as described in U.S. Pat. No. 4,683,202 or Saiki et al., Science 239 (1988), 487-491.
The term nucleic acid construct may be synonymous with the term expression cassette when the nucleic acid construct contains all the control sequences required for expression of a coding sequence of the present invention. The term “coding sequence” as defined herein is a sequence which is transcribed into mRNA and translated into a polypeptide of the present invention when placed under the control of the above mentioned control sequences. The boundaries of the coding sequence are generally determined by a translation start codon ATG at the 5′-terminus and a translation stop codon at the 3′-terminus. A coding sequence can include, but is not limited to, DNA, cDNA, and recombinant nucleic acid sequences.
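The boundaries of a coding sequence as defined above can be illustrated with a minimal sketch. It assumes the standard stop codons TAA, TAG and TGA and takes the first ATG that is followed in frame by a stop codon; both are simplifying assumptions of this example, and the function name is hypothetical.

```python
# Standard-genetic-code stop codons (assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequence(dna):
    """Return (start, end) indices of the first ATG...stop reading frame,
    with `end` exclusive and including the stop codon, or None if absent."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Read codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
        start = dna.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    toy_dna = "CCATGGCTTAAGG"  # illustrative sequence only
    bounds = find_coding_sequence(toy_dna)
    if bounds is not None:
        begin, end = bounds
        print(bounds, toy_dna[begin:end])
```

Real genes may use alternative start codons or contain introns, which this sketch deliberately ignores.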
The term “control sequences” is defined herein to include all components which are necessary or advantageous for expression of the coding sequence of the nucleic acid sequence. Each control sequence may be native or foreign to the nucleic acid sequence encoding the polypeptide. Such control sequences include, but are not limited to, a leader, a polyadenylation sequence, a propeptide sequence, a promoter, a signal sequence, and a transcription terminator. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the nucleic acid sequence encoding a polypeptide.
The control sequence may be an appropriate promoter sequence, a nucleic acid sequence which is recognized by a host cell for expression of the nucleic acid sequence. The promoter sequence contains transcription and translation control sequences which mediate the expression of the polypeptide. The promoter may be any nucleic acid sequence which shows transcriptional activity in the host cell of choice and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
The control sequence may also be a suitable transcription terminator sequence, a sequence recognized by a host cell to terminate transcription. The terminator sequence is operably linked to the 3′ terminus of the nucleic acid sequence encoding the polypeptide. Any terminator which is functional in the host cell of choice may be used in the present invention.
The control sequence may also be a polyadenylation sequence, a sequence which is operably linked to the 3′ terminus of the nucleic acid sequence and which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence which is functional in the host cell of choice may be used in the present invention.
The control sequence may also be a signal peptide coding region, which codes for an amino acid sequence linked to the amino terminus of the polypeptide which can direct the expressed polypeptide into the secretory pathway of the host cell. The 5′ end of the coding sequence of the nucleic acid sequence may inherently contain a signal peptide coding region naturally linked in translation reading frame with the segment of the coding region which encodes the secreted polypeptide.
Alternatively, the 5′ end of the coding sequence may contain a signal peptide coding region which is foreign to that portion of the coding sequence which encodes the secreted polypeptide. A foreign signal peptide coding region may be required where the coding sequence does not normally contain a signal peptide coding region. Alternatively, the foreign signal peptide coding region may simply replace the natural signal peptide coding region in order to obtain enhanced secretion relative to the natural signal peptide coding region normally associated with the coding sequence. The signal peptide coding region may be obtained from a glucoamylase or an amylase gene from an Aspergillus species, a lipase or proteinase gene from a Rhizomucor species, the gene for the alpha-factor from Saccharomyces cerevisiae, an amylase or a protease gene from a Bacillus species, or the calf preprochymosin gene. However, any signal peptide coding region capable of directing the expressed polypeptide into the secretory pathway of a host cell of choice may be used in the present invention.
The control sequence may also be a propeptide coding region, which codes for an amino acid sequence positioned at the amino terminus of a polypeptide. The resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally inactive and can be converted to mature active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide coding region may be obtained from the Bacillus subtilis alkaline protease gene (aprE), the Bacillus subtilis neutral protease gene (nprT), the Saccharomyces cerevisiae alpha-factor gene, or the Myceliophthora thermophilum laccase gene (WO 95/33836).
The nucleic acid constructs of the present invention may also comprise one or more nucleic acid sequences which encode one or more factors that are advantageous in the expression of the polypeptide, e.g., an activator (e.g., a trans-acting factor), a chaperone, and a processing protease. Any factor that is functional in the host cell of choice may be used in the present invention. The nucleic acids encoding one or more of these factors are not necessarily in tandem with the nucleic acid sequence encoding the polypeptide.
An activator is a protein which activates transcription of a nucleic acid sequence encoding a polypeptide (Kudla et al., 1990, EMBO Journal 9:1355-1364; Jarai and Buxton, 1994, Current Genetics 26:2238-244; Verdier, 1990, Yeast 6:271-297). The nucleic acid sequence encoding an activator may be obtained from the genes encoding Bacillus stearothermophilus NprA (nprA), Saccharomyces cerevisiae heme activator protein 1 (hap1), Saccharomyces cerevisiae galactose metabolizing protein 4 (gal4), and Aspergillus nidulans ammonia regulation protein (areA). For further examples, see Verdier, 1990, supra and MacKenzie et al., 1993, Journal of General Microbiology 139:2295-2307.
A chaperone is a protein which assists another polypeptide in folding properly (Hartl et al., 1994, TIBS 19:20-25; Bergeron et al., 1994, TIBS 19:124-128; Demolder et al., 1994, Journal of Biotechnology 32:179-189; Craig, 1993, Science 260:1902-1903; Gething and Sambrook, 1992, Nature 355:33-45; Puig and Gilbert, 1994, Journal of Biological Chemistry 269:7764-7771; Wang and Tsou, 1993, The FASEB Journal 7:1515-11157; Robinson et al., 1994, Bio/Technology 1:381-384). The nucleic acid sequence encoding a chaperone may be obtained from the genes encoding Bacillus subtilis GroE proteins, Aspergillus oryzae protein disulphide isomerase, Saccharomyces cerevisiae calnexin, Saccharomyces cerevisiae BiP/GRP78, and Saccharomyces cerevisiae Hsp70. For further examples, see Gething and Sambrook, 1992, supra, and Hartl et al., 1994, supra.
A processing protease is a protease that cleaves a propeptide to generate a mature biochemically active polypeptide (Enderlin and Ogrydziak, 1994, Yeast 10:67-79; Fuller et al., 1989, Proceedings of the National Academy of Sciences USA 86:1434-1438; Julius et al., 1984, Cell 37:1075-1089; Julius et al., 1983, Cell 32:839-852). The nucleic acid sequence encoding a processing protease may be obtained from the genes encoding Aspergillus niger Kex2, Saccharomyces cerevisiae dipeptidylaminopeptidase, Saccharomyces cerevisiae Kex2, and Yarrowia lipolytica dibasic processing endoprotease (xpr6).
It may also be desirable to add regulatory sequences which allow the regulation of the expression of the polypeptide relative to the growth of the host cell. Examples of regulatory systems are those which cause the expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Regulatory systems in prokaryotic systems would include the lac, tac, and trp operator systems. In yeast, the ADH2 system or GAL1 system may be used. In filamentous fungi, the TAKA alpha-amylase promoter, Aspergillus niger glucoamylase promoter, and the Aspergillus oryzae glucoamylase promoter may be used as regulatory sequences. Other examples of regulatory sequences are those which allow for gene amplification. In eukaryotic systems, these include the dihydrofolate reductase gene which is amplified in the presence of methotrexate, and the metallothionein genes which are amplified with heavy metals. In these cases, the nucleic acid sequence encoding the polypeptide would be placed in tandem with the regulatory sequence.
Examples of suitable promoters for directing the transcription of the nucleic acid constructs of the present invention, especially in a bacterial host cell, are the promoters obtained from the E. coli lac operon, the Streptomyces coelicolor agarase gene (dagA), the Bacillus subtilis levansucrase gene (sacB), the Bacillus subtilis alkaline protease gene, the Bacillus licheniformis alpha-amylase gene (amyL), the Bacillus stearothermophilus maltogenic amylase gene (amyM), the Bacillus amyloliquefaciens alpha-amylase gene (amyQ), the Bacillus amyloliquefaciens BAN amylase gene, the Bacillus licheniformis penicillinase gene (penP), the Bacillus subtilis xylA and xylB genes, the prokaryotic beta-lactamase gene (Villa-Kamaroff et al., 1978, Proceedings of the National Academy of Sciences USA 75:3727-3731), and the Bacillus pumilus xylosidase gene, as well as the tac promoter (DeBoer et al., 1983, Proceedings of the National Academy of Sciences USA 80:21-25), the phage Lambda PR or PL promoters, and the E. coli lac, trp or tac promoters. Further promoters are described in “Useful proteins from recombinant bacteria” in Scientific American, 1980, 242:74-94; and in Sambrook et al., 1989, supra.
Examples of suitable promoters for directing the transcription of the nucleic acid constructs of the present invention in a filamentous fungal host cell are promoters obtained from the genes encoding Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase, Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate isomerase, Aspergillus nidulans acetamidase, Fusarium oxysporum trypsin-like protease (as described in U.S. Pat. No. 4,288,627, which is incorporated herein by reference), and hybrids thereof. Particularly preferred promoters for use in filamentous fungal host cells are the TAKA amylase, NA2-tpi (a hybrid of the promoters from the genes encoding Aspergillus niger neutral alpha-amylase and Aspergillus oryzae triose phosphate isomerase), and glaA promoters. Further suitable promoters for use in filamentous fungus host cells are the ADH3 promoter (McKnight et al., The EMBO J. 4 (1985), 2093-2099) or the tpiA promoter.
Examples of suitable promoters for use in yeast host cells include promoters from yeast glycolytic genes (Hitzeman et al., J. Biol. Chem. 255 (1980), 12073-12080; Alber and Kawasaki, J. Mol. Appl. Gen. 1 (1982), 419-434) or alcohol dehydrogenase genes (Young et al., in Genetic Engineering of Microorganisms for Chemicals (Hollaender et al, eds.), Plenum Press, New York, 1982), or the TPI1 (U.S. Pat. No. 4,599,311) or ADH2-4-c (Russell et al., Nature 304 (1983), 652-654) promoters.
Further useful promoters are obtained from the Saccharomyces cerevisiae enolase (ENO-1) gene, the Saccharomyces cerevisiae galactokinase gene (GAL1), the Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase genes (ADH2/GAP), and the Saccharomyces cerevisiae 3-phosphoglycerate kinase gene. Other useful promoters for yeast host cells are described by Romanos et al., 1992, Yeast 8:423-488. In a mammalian host cell, useful promoters include viral promoters such as those from Simian Virus 40 (SV40), Rous sarcoma virus (RSV), adenovirus, and bovine papilloma virus (BPV).
Examples of suitable promoters for directing the transcription of the DNA encoding the polypeptide of the invention in mammalian cells are the SV40 promoter (Subramani et al., Mol. Cell. Biol. 1 (1981), 854-864), the MT-1 (metallothionein gene) promoter (Palmiter et al., Science 222 (1983), 809-814) or the adenovirus 2 major late promoter.
Examples of suitable promoters for use in insect cells are the polyhedrin promoter (U.S. Pat. No. 4,745,051; Vasuvedan et al., FEBS Lett. 311, (1992) 7-11), the P10 promoter (J. M. Vlak et al., J. Gen. Virology 69, 1988, pp. 765-776), the Autographa californica polyhedrosis virus basic protein promoter (EP 397 485), the baculovirus immediate early gene 1 promoter (U.S. Pat. No. 5,155,037; U.S. Pat. No. 5,162,222), and the baculovirus 39K delayed-early gene promoter (U.S. Pat. No. 5,155,037; U.S. Pat. No. 5,162,222).
Preferred terminators for filamentous fungal host cells are obtained from the genes encoding Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, Aspergillus niger alpha-glucosidase, and Fusarium oxysporum trypsin-like protease. Further suitable terminators for fungal hosts are the TPI1 (Alber and Kawasaki, op. cit.) or ADH3 (McKnight et al., op. cit.) terminators.
Preferred terminators for yeast host cells are obtained from the genes encoding Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), or Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful terminators for yeast host cells are described by Romanos et al., 1992, supra.
Preferred polyadenylation sequences for filamentous fungal host cells are obtained from the genes encoding Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, and Aspergillus niger alpha-glucosidase.
Useful polyadenylation sequences for yeast host cells are described by Guo and Sherman, 1995, Molecular Cellular Biology 15:5983-5990.
Polyadenylation sequences for mammalian host cells are well known in the art, such as those from SV40 or the adenovirus 5 E1b region.
An effective signal peptide coding region for bacterial host cells is the signal peptide coding region obtained from the maltogenic amylase gene from Bacillus NCIB 11837, the Bacillus stearothermophilus alpha-amylase gene, the Bacillus licheniformis subtilisin gene, the Bacillus licheniformis beta-lactamase gene, the Bacillus stearothermophilus neutral protease genes (nprT, nprS, nprM), or the Bacillus subtilis PrsA gene. Further signal peptides are described by Simonen and Palva, 1993, Microbiological Reviews 57:109-137.
An effective signal peptide coding region for filamentous fungal host cells is the signal peptide coding region obtained from the Aspergillus oryzae TAKA amylase gene, the Aspergillus niger neutral amylase gene, the Rhizomucor miehei aspartic proteinase gene, the Humicola lanuginosa cellulase or lipase gene, the Rhizomucor miehei lipase or protease gene, or an Aspergillus sp. amylase or glucoamylase gene. The signal peptide is preferably derived from a gene encoding A. oryzae TAKA amylase, A. niger neutral alpha-amylase, A. niger acid-stable amylase, or A. niger glucoamylase.
Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide coding regions are described by Romanos et al., 1992, supra.
For secretion from yeast cells, the secretory signal sequence may encode any signal peptide which ensures efficient direction of the expressed polypeptide into the secretory pathway of the cell. The signal peptide may be a naturally occurring signal peptide, or a functional part thereof, or it may be a synthetic peptide. Suitable signal peptides have been found to be the alpha-factor signal peptide (cf. U.S. Pat. No. 4,870,008), the signal peptide of mouse salivary amylase (cf. O. Hagenbuchle et al., Nature 289, 1981, pp. 643-646), a modified carboxypeptidase signal peptide (cf. L. A. Valls et al., Cell 48, 1987, pp. 887-897), the yeast BAR1 signal peptide (cf. WO 87/02670), or the yeast aspartic protease 3 (YAP3) signal peptide (cf. M. Egel-Mitani et al., Yeast 6, 1990, pp. 127-137).
For efficient secretion in yeast, a sequence encoding a leader peptide may also be inserted downstream of the signal sequence and upstream of the DNA sequence encoding the polypeptide. The function of the leader peptide is to allow the expressed polypeptide to be directed from the endoplasmic reticulum to the Golgi apparatus and further to a secretory vesicle for secretion into the culture medium (i.e. exportation of the polypeptide across the cell wall or at least through the cellular membrane into the periplasmic space of the yeast cell). The leader peptide may be the yeast alpha-factor leader (the use of which is described in e.g. U.S. Pat. No. 4,546,082, EP 16 201, EP 123 294, EP 123 544 and EP 163 529). Alternatively, the leader peptide may be a synthetic leader peptide, which is to say a leader peptide not found in nature. Synthetic leader peptides may, for instance, be constructed as described in WO 89/02463 or WO 92/11378.
For use in insect cells, the signal peptide may conveniently be derived from an insect gene (cf. WO 90/05783), such as the lepidopteran Manduca sexta adipokinetic hormone precursor signal peptide (cf. U.S. Pat. No. 5,023,328).
The present invention also relates to recombinant expression vectors comprising a nucleic acid sequence of the present invention, a promoter, and transcriptional and translational stop signals. The various nucleic acid and control sequences described above may be joined together to produce a recombinant expression vector which may include one or more convenient restriction sites to allow for insertion or substitution of the nucleic acid sequence encoding the polypeptide at such sites. Alternatively, the nucleic acid sequence of the present invention may be expressed by inserting the nucleic acid sequence or a nucleic acid construct comprising the sequence into an appropriate vector for expression. In creating the expression vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression, and possibly secretion.
The recombinant expression vector may be any vector (e.g., a plasmid or virus) which can be conveniently subjected to recombinant DNA procedures and can bring about the expression of the nucleic acid sequence. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vectors may be linear or closed circular plasmids. The vector may be an autonomously replicating vector, i.e., a vector which exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector may contain any means for assuring self-replication. Alternatively, the vector may be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. The vector system may be a single vector or plasmid or two or more vectors or plasmids which together contain the total DNA to be introduced into the genome of the host cell, or a transposon.
The vectors of the present invention preferably contain one or more selectable markers which permit easy selection of transformed cells. A selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like. Examples of bacterial selectable markers are the dal genes from Bacillus subtilis or Bacillus licheniformis, or markers which confer antibiotic resistance such as ampicillin, kanamycin, chloramphenicol, tetracycline, neomycin, hygromycin or methotrexate resistance. A frequently used mammalian marker is the dihydrofolate reductase gene (DHFR). Suitable markers for yeast host cells are ADE2, HIS3, LEU2, LYS2, MET3, TRP1, and URA3. A selectable marker for use in a filamentous fungal host cell may be selected from the group including, but not limited to, amdS (acetamidase), argB (ornithine carbamoyltransferase), bar (phosphinothricin acetyltransferase), hygB (hygromycin phosphotransferase), niaD (nitrate reductase), pyrG (orotidine-5′-phosphate decarboxylase), sC (sulfate adenyltransferase), trpC (anthranilate synthase), and glufosinate resistance markers, as well as equivalents from other species. Preferred for use in an Aspergillus cell are the amdS and pyrG markers of Aspergillus nidulans or Aspergillus oryzae and the bar marker of Streptomyces hygroscopicus. Furthermore, selection may be accomplished by co-transformation, e.g., as described in WO 91/17243, where the selectable marker is on a separate vector.
The vectors of the present invention preferably contain an element(s) that permits stable integration of the vector into the host cell genome or autonomous replication of the vector in the cell independent of the genome of the cell.
The vectors of the present invention may be integrated into the host cell genome when introduced into a host cell. For integration, the vector may rely on the nucleic acid sequence encoding the polypeptide or any other element of the vector for stable integration of the vector into the genome by homologous or nonhomologous recombination. Alternatively, the vector may contain additional nucleic acid sequences for directing integration by homologous recombination into the genome of the host cell. The additional nucleic acid sequences enable the vector to be integrated into the host cell genome at a precise location(s) in the chromosome(s). To increase the likelihood of integration at a precise location, the integrational elements should preferably contain a sufficient number of nucleotides, such as 100 to 1,500 base pairs, preferably 400 to 1,500 base pairs, and most preferably 800 to 1,500 base pairs, which are highly homologous with the corresponding target sequence to enhance the probability of homologous recombination. The integrational elements may be any sequence that is homologous with the target sequence in the genome of the host cell. Furthermore, the integrational elements may be non-encoding or encoding nucleic acid sequences. On the other hand, the vector may be integrated into the genome of the host cell by non-homologous recombination.
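The length ranges given for the integrational elements can be illustrated with a minimal sketch. The function name and the tier labels are hypothetical and serve only to restate the three ranges (100 to 1,500, 400 to 1,500 and 800 to 1,500 base pairs); the sketch does not assess sequence homology itself.

```python
def homology_arm_tier(length_bp: int) -> str:
    """Classify an integrational element by its length in base pairs.

    The tier labels are illustrative; only the numeric ranges come from the text.
    """
    if not 100 <= length_bp <= 1500:
        return "outside stated range"
    if length_bp >= 800:
        return "most preferred"
    if length_bp >= 400:
        return "preferred"
    return "suitable"


if __name__ == "__main__":
    for n in (50, 250, 600, 1200):
        print(n, homology_arm_tier(n))
```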
For autonomous replication, the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question. Examples of bacterial origins of replication are the origins of replication of plasmids pBR322, pUC19, pACYC177, pACYC184, pUB110, pE194, pTA1060, and pAMβ1. Examples of origins of replication for use in a yeast host cell are the 2 micron origin of replication, the combination of CEN6 and ARS4, and the combination of CEN3 and ARS1. The origin of replication may be one having a mutation which makes its functioning temperature-sensitive in the host cell (see, e.g., Ehrlich, 1978, Proceedings of the National Academy of Sciences USA 75:1433).
More than one copy of a nucleic acid sequence encoding a polypeptide of the present invention may be inserted into the host cell to amplify expression of the nucleic acid sequence. Stable amplification of the nucleic acid sequence can be obtained by integrating at least one additional copy of the sequence into the host cell genome using methods well known in the art and selecting for transformants.
The procedures used to ligate the elements described above to construct the recombinant expression vectors of the present invention are well known to one skilled in the art (see, e.g., Sambrook et al., 1989, supra).
The present invention also relates to recombinant host cells, comprising a nucleic acid sequence of the invention, which are advantageously used in the recombinant production of the polypeptides. The term “host cell” encompasses any progeny of a parent cell which is not identical to the parent cell due to mutations that occur during replication.
The cell is preferably transformed with a vector comprising a nucleic acid sequence of the invention followed by integration of the vector into the host chromosome. “Transformation” means introducing a vector comprising a nucleic acid sequence of the present invention into a host cell so that the vector is maintained as a chromosomal integrant or as a self-replicating extra-chromosomal vector. Integration is generally considered to be an advantage as the nucleic acid sequence is more likely to be stably maintained in the cell. Integration of the vector into the host chromosome may occur by homologous or non-homologous recombination as described above.
The choice of a host cell will to a large extent depend upon the gene encoding the polypeptide and its source. The host cell may be a unicellular microorganism, e.g., a prokaryote, or a non-unicellular microorganism, e.g., a eukaryote. Useful unicellular cells are bacterial cells such as gram positive bacteria including, but not limited to, a Bacillus cell, e.g., Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus coagulans, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis; or a Streptomyces cell, e.g., Streptomyces lividans or Streptomyces murinus, or gram negative bacteria such as E. coli and Pseudomonas sp. In a preferred embodiment, the bacterial host cell is a Bacillus lentus, Bacillus licheniformis, Bacillus stearothermophilus or Bacillus subtilis cell. The transformation of a bacterial host cell may, for instance, be effected by protoplast transformation (see, e.g., Chang and Cohen, 1979, Molecular General Genetics 168:111-115), by using competent cells (see, e.g., Young and Spizizen, 1961, Journal of Bacteriology 81:823-829, or Dubnau and Davidoff-Abelson, 1971, Journal of Molecular Biology 56:209-221), by electroporation (see, e.g., Shigekawa and Dower, 1988, Biotechniques 6:742-751), or by conjugation (see, e.g., Koehler and Thorne, 1987, Journal of Bacteriology 169:5271-5278).
The host cell may be a eukaryote, such as a mammalian cell, an insect cell, a plant cell or a fungal cell.
Useful mammalian cells include Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, COS cells, or any number of other immortalized cell lines available, e.g., from the American Type Culture Collection.
Examples of suitable mammalian cell lines are the COS (ATCC CRL 1650 and 1651), BHK (ATCC CRL 1632, 10314 and 1573, ATCC CCL 10), CHL (ATCC CCL39) or CHO (ATCC CCL 61) cell lines. Methods of transfecting mammalian cells and expressing DNA sequences introduced in the cells are described in e.g. Kaufman and Sharp, J. Mol. Biol. 159 (1982), 601-621; Southern and Berg, J. Mol. Appl. Genet. 1 (1982), 327-341; Loyter et al., Proc. Natl. Acad. Sci. USA 79 (1982), 422-426; Wigler et al., Cell 14 (1978), 725; Corsaro and Pearson, Somatic Cell Genetics 7 (1981), 603, Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, Inc., N.Y., 1987, Hawley-Nelson et al., Focus 15 (1993), 73; Ciccarone et al., Focus 15 (1993), 80; Graham and van der Eb, Virology 52 (1973), 456; and Neumann et al., EMBO J. 1 (1982), 841-845.
In a preferred embodiment, the host cell is a fungal cell. “Fungi” as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as defined by Hawksworth et al., In, Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK) as well as the Oomycota (as cited in Hawksworth et al., 1995, supra, page 171) and all mitosporic fungi (Hawksworth et al., 1995, supra). Representative groups of Ascomycota include, e.g., Neurospora, Eupenicillium (=Penicillium), Emericella (=Aspergillus), Eurotium (=Aspergillus), and the true yeasts listed above. Examples of Basidiomycota include mushrooms, rusts, and smuts. Representative groups of Chytridiomycota include, e.g., Allomyces, Blastocladiella, Coelomomyces, and aquatic fungi. Representative groups of Oomycota include, e.g., Saprolegniomycetous aquatic fungi (water molds) such as Achlya. Examples of mitosporic fungi include Aspergillus, Penicillium, Candida, and Alternaria. Representative groups of Zygomycota include, e.g., Rhizopus and Mucor.
In a preferred embodiment, the fungal host cell is a yeast cell. “Yeast” as used herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes). The ascosporogenous yeasts are divided into the families Spermophthoraceae and Saccharomycetaceae. The latter is comprised of four subfamilies, Schizosaccharomycoideae (e.g., genus Schizosaccharomyces), Nadsonioideae, Lipomycoideae, and Saccharomycoideae (e.g., genera Pichia, Kluyveromyces and Saccharomyces). The basidiosporogenous yeasts include the genera Leucosporidium, Rhodosporidium, Sporidiobolus, Filobasidium, and Filobasidiella. Yeast belonging to the Fungi Imperfecti are divided into two families, Sporobolomycetaceae (e.g., genera Sporobolomyces and Bullera) and Cryptococcaceae (e.g., genus Candida). Since the classification of yeast may change in the future, for the purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast (Skinner, F. A., Passmore, S. M., and Davenport, R. R., eds, Soc. App. Bacteriol. Symposium Series No. 9, 1980). The biology of yeast and manipulation of yeast genetics are well known in the art (see, e.g., Biochemistry and Genetics of Yeast, Bacil, M., Horecker, B. J., and Stopani, A. O. M., editors, 2nd edition, 1987; The Yeasts, Rose, A. H., and Harrison, J. S., editors, 2nd edition, 1987; and The Molecular Biology of the Yeast Saccharomyces, Strathern et al., editors, 1981).
The yeast host cell may be selected from a cell of a species of Candida, Kluyveromyces, Saccharomyces, Schizosaccharomyces, Pichia, Hansenula, or Yarrowia. In a preferred embodiment, the yeast host cell is a Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis or Saccharomyces oviformis cell. Other useful yeast host cells are a Kluyveromyces lactis, Kluyveromyces fragilis, Hansenula polymorpha, Pichia pastoris, Yarrowia lipolytica, Schizosaccharomyces pombe, Ustilago maydis, Candida maltosa, Pichia guilliermondii and Pichia methanolica cell (cf. Gleeson et al., J. Gen. Microbiol. 132, 1986, pp. 3459-3465; U.S. Pat. No. 4,882,279 and U.S. Pat. No. 4,879,231).
In a preferred embodiment, the fungal host cell is a filamentous fungal cell. “Filamentous fungi” include all filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 1995, supra). The filamentous fungi are characterized by a vegetative mycelium composed of chitin, cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by yeasts such as Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon catabolism may be fermentative. In a more preferred embodiment, the filamentous fungal host cell is a cell of a species of, but not limited to, Acremonium, Aspergillus, Fusarium, Humicola, Mucor, Myceliophthora, Neurospora, Penicillium, Thielavia, Tolypocladium, and Trichoderma or a teleomorph or synonym thereof. In an even more preferred embodiment, the filamentous fungal host cell is an Aspergillus cell. In another even more preferred embodiment, the filamentous fungal host cell is an Acremonium cell. In another even more preferred embodiment, the filamentous fungal host cell is a Fusarium cell. In another even more preferred embodiment, the filamentous fungal host cell is a Humicola cell. In another even more preferred embodiment, the filamentous fungal host cell is a Mucor cell. In another even more preferred embodiment, the filamentous fungal host cell is a Myceliophthora cell. In another even more preferred embodiment, the filamentous fungal host cell is a Neurospora cell. In another even more preferred embodiment, the filamentous fungal host cell is a Penicillium cell. In another even more preferred embodiment, the filamentous fungal host cell is a Thielavia cell. In another even more preferred embodiment, the filamentous fungal host cell is a Tolypocladium cell. In another even more preferred embodiment, the filamentous fungal host cell is a Trichoderma cell. 
In a most preferred embodiment, the filamentous fungal host cell is an Aspergillus awamori, Aspergillus foetidus, Aspergillus japonicus, Aspergillus niger, Aspergillus nidulans or Aspergillus oryzae cell. In another most preferred embodiment, the filamentous fungal host cell is a Fusarium cell of the section Discolor (also known as the section Fusarium). For example, the filamentous fungal parent cell may be a Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sulphureum, or Fusarium trichothecioides cell. In another preferred embodiment, the filamentous fungal parent cell is a Fusarium strain of the section Elegans, e.g., Fusarium oxysporum. In another most preferred embodiment, the filamentous fungal host cell is a Humicola insolens or Humicola lanuginosa cell. In another most preferred embodiment, the filamentous fungal host cell is a Mucor miehei cell. In another most preferred embodiment, the filamentous fungal host cell is a Myceliophthora thermophilum cell. In another most preferred embodiment, the filamentous fungal host cell is a Neurospora crassa cell. In another most preferred embodiment, the filamentous fungal host cell is a Penicillium purpurogenum cell. In another most preferred embodiment, the filamentous fungal host cell is a Thielavia terrestris cell or an Acremonium chrysogenum cell. In another most preferred embodiment, the Trichoderma cell is a Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei or Trichoderma viride cell. The use of Aspergillus spp. for the expression of proteins is described in, e.g., EP 272 277, EP 230 023.
Fungal cells may be transformed by a process involving protoplast formation, transformation of the protoplasts, and regeneration of the cell wall in a manner known per se. Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023 and Yelton et al., 1984, Proceedings of the National Academy of Sciences USA 81:1470-1474. A suitable method of transforming Fusarium species, e.g. F. oxysporum, is described by Malardier et al., 1989, Gene 78:147-156 or in copending U.S. Ser. No. 08/269,449. Examples of other fungal cells are cells of filamentous fungi, e.g. Aspergillus spp., Neurospora spp., Fusarium spp. or Trichoderma spp., in particular strains of A. oryzae, A. nidulans or A. niger.
Yeast may be transformed using the procedures described by Becker and Guarente, In Abelson, J. N. and Simon, M. I., editors, Guide to Yeast Genetics and Molecular Biology, Methods in Enzymology, Volume 194, pp 182-187, Academic Press, Inc., New York; Ito et al., 1983, Journal of Bacteriology 153:163; and Hinnen et al., 1978, Proceedings of the National Academy of Sciences USA 75:1920. Mammalian cells may be transformed by direct uptake using the calcium phosphate precipitation method of Graham and Van der Eb (1973, Virology 52:456).
Transformation of insect cells and production of heterologous polypeptides therein may be performed as described in U.S. Pat. No. 4,745,051; U.S. Pat. No. 4,775,624; U.S. Pat. No. 4,879,236; U.S. Pat. No. 5,155,037; U.S. Pat. No. 5,162,222; and EP 397,485, all of which are incorporated herein by reference. The insect cell line used as the host may suitably be a Lepidoptera cell line, such as Spodoptera frugiperda cells or Trichoplusia ni cells (cf. U.S. Pat. No. 5,077,214). Culture conditions may suitably be as described in, for instance, WO 89/01029 or WO 89/01028, or any of the aforementioned references.
The transformed or transfected host cells described above are cultured in a suitable nutrient medium under conditions permitting the production of the desired molecules, after which these are recovered from the cells, or the culture broth.
The medium used to culture the cells may be any conventional medium suitable for growing the host cells, such as minimal or complex media containing appropriate supplements. Suitable media are available from commercial suppliers or may be prepared according to published recipes (e.g. in catalogues of the American Type Culture Collection). The media are prepared using procedures known in the art (see, e.g., references for bacteria and yeast; Bennett, J. W. and LaSure, L., editors, More Gene Manipulations in Fungi, Academic Press, CA, 1991).
If the molecules are secreted into the nutrient medium, they can be recovered directly from the medium. If they are not secreted, they can be recovered from cell lysates. The molecules are recovered from the culture medium by conventional procedures including separating the host cells from the medium by centrifugation or filtration, precipitating the proteinaceous components of the supernatant or filtrate by means of a salt, e.g. ammonium sulphate, and purification by a variety of chromatographic procedures, e.g. ion exchange chromatography, gel filtration chromatography, affinity chromatography, or the like, dependent on the type of molecule in question.
The molecules of interest may be detected using methods known in the art that are specific for the molecules. These detection methods may include use of specific antibodies, formation of a product, or disappearance of a substrate. For example, an enzyme assay may be used to determine the activity of the molecule. Procedures for determining various kinds of activity are known in the art.
The molecules of the present invention may be purified by a variety of procedures known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity, hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative isoelectric focusing (IEF)), differential solubility (e.g., ammonium sulfate precipitation), or extraction (see, e.g., Protein Purification, J-C Janson and Lars Ryden, editors, VCH Publishers, New York, 1989).
The term “immunological response”, used in connection with the present invention, is the response of an organism to a compound, which involves the immune system according to any of the four standard reactions (Type I, II, III and IV according to Coombs & Gell).
Correspondingly, the “immunogenicity” of a compound used in connection with the present invention refers to the ability of this compound to induce an ‘immunological response’ in animals including man.
The term “allergic response”, used in connection with the present invention, is the response of an organism to a compound, which involves IgE mediated responses (Type I reaction according to Coombs & Gell). It is to be understood that sensitisation (i.e. development of compound-specific IgE antibodies) upon exposure to the compound is included in the definition of “allergic response”.
Correspondingly, the “allergenicity” of a compound used in connection with the present invention refers to the ability of this compound to induce an ‘allergic response’ in animals including man.
The term “parent protein” refers to the polypeptide to be modified by creating a library of diversified mutants. The “parent protein” may be a naturally occurring (or wild-type) polypeptide or it may be a variant thereof prepared by any suitable means. For instance, the “parent protein” may be a variant of a naturally occurring polypeptide which has been modified by substitution, deletion or truncation of one or more amino acid residues or by addition or insertion of one or more amino acid residues to the amino acid sequence of a naturally-occurring polypeptide.
The terms “enzyme variant” and “protein variant” refer to a polypeptide of the invention comprising one or more substitutions of the specified amino acid residues. The total number of such substitutions is typically not more than 10, e.g. one, two, three, four, five or six of said substitutions. In addition, the enzyme variant or protein variant of the invention may optionally include other modifications of the parent enzyme, typically not more than 10, e.g. not more than 5 such modifications. The variant generally has a homology with the parent enzyme of at least 80%, e.g. at least 85%, typically at least 90% or at least 95%.
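By way of illustration only, the substitution count and the homology threshold for a variant can be checked as sketched below. The sketch assumes that parent and variant have equal length (substitutions only, no insertions or deletions) and approximates homology by percent identity over the ungapped alignment, which is a simplification not prescribed by the text. The function names and the two sequences are hypothetical.

```python
def substitutions(parent: str, variant: str) -> list[str]:
    """List substitutions (e.g. 'A4G') between two equal-length sequences."""
    if len(parent) != len(variant):
        raise ValueError("this sketch only handles substitutions (equal lengths)")
    return [
        f"{p}{i}{v}"
        for i, (p, v) in enumerate(zip(parent, variant), start=1)
        if p != v
    ]


def percent_identity(parent: str, variant: str) -> float:
    """Percent of positions that are unchanged in the variant."""
    changed = len(substitutions(parent, variant))
    return 100.0 * (len(parent) - changed) / len(parent)


if __name__ == "__main__":
    parent = "MKTAYIAKQR"   # hypothetical parent sequence
    variant = "MKTGYIAKQK"  # hypothetical variant with two substitutions
    subs = substitutions(parent, variant)
    print(subs, percent_identity(parent, variant))
    # Thresholds from the text: typically at most 10 substitutions, at least 80% homology
    print(len(subs) <= 10 and percent_identity(parent, variant) >= 80.0)
```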
The term “randomized library” of protein variants refers to a library with at least partially randomized composition of the members, e.g. protein variants.
An “epitope” is a set of amino acids on a protein that are involved in an immunological response, such as antibody binding or T-cell activation. One particularly useful method of identifying epitopes involved in antibody binding is to screen a library of peptide-phage membrane protein fusions, select those that bind to relevant antigen-specific antibodies, sequence the randomized part of the fusion gene, align the sequences involved in binding, define consensus sequences based on these alignments, and map these consensus sequences onto the surface, the sequence and/or the structure of the antigen, thereby identifying epitopes involved in antibody binding.
By the term “epitope pattern” is meant such a consensus sequence of antibody binding peptides. An example is the epitope pattern A R R<R. The sign “<” in this notation indicates that the aligned antibody binding peptides included a non-consensus amino acid between the second and the third arginine.
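The notation can be illustrated by translating an epitope pattern into a regular expression. This is a minimal sketch under two assumptions that go beyond the text: anchor amino acids separated by a space are taken to be directly adjacent, and each “<” is taken to stand for exactly one arbitrary residue. The function name and the test peptides are hypothetical.

```python
import re


def pattern_to_regex(epitope_pattern: str) -> re.Pattern:
    """Translate an epitope pattern such as 'A R R<R' into a compiled regex.

    Assumptions of this sketch: spaces only separate adjacent anchors,
    and '<' marks exactly one non-consensus position.
    """
    parts = []
    for ch in epitope_pattern:
        if ch.isspace():
            continue
        if ch == "<":
            parts.append(".")          # any single residue
        elif ch.isalpha():
            parts.append(ch.upper())   # anchor amino acid, one-letter code
        else:
            raise ValueError(f"unexpected character in pattern: {ch!r}")
    return re.compile("".join(parts))


if __name__ == "__main__":
    rx = pattern_to_regex("A R R<R")
    peptides = ["GARRSRT", "ARRRK", "ARRR", "QARRAR"]  # hypothetical peptides
    print([p for p in peptides if rx.search(p)])
```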
An “epitope area” is defined as the amino acids situated close to the epitope sequence amino acids. Preferably, the amino acids of an epitope area are located <5 Å from the epitope sequence. Hence, an epitope area also includes the corresponding epitope sequence itself. Modifications of amino acids of the ‘epitope area’ can possibly affect the immunogenic function of the corresponding epitope.
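A minimal sketch of how an epitope area could be derived from a three-dimensional structure is given below. It assumes that the distance between two residues is taken as the smallest distance between any pair of their atoms, which the text does not specify. The function name, the residue A13 and G50 entries and all coordinates are hypothetical; only the 5 Å cut-off and the inclusion of the epitope sequence itself come from the definition above.

```python
import math


def epitope_area(coords, epitope_residues, cutoff=5.0):
    """Return the epitope residues plus all residues closer than `cutoff` Å.

    coords maps a residue label to a list of (x, y, z) atom coordinates in Å.
    Distance criterion (an assumption): minimum atom-to-atom distance.
    """
    area = set(epitope_residues)  # the epitope sequence is part of its own area
    for residue, atoms in coords.items():
        if residue in area:
            continue
        for epitope_residue in epitope_residues:
            if any(
                math.dist(a, b) < cutoff
                for a in atoms
                for b in coords[epitope_residue]
            ):
                area.add(residue)
                break
    return area


if __name__ == "__main__":
    # Hypothetical single-atom coordinates, for illustration only
    coords = {
        "E271": [(0.0, 0.0, 0.0)],
        "Q12": [(3.0, 0.0, 0.0)],
        "A13": [(6.5, 0.0, 0.0)],
        "G50": [(20.0, 0.0, 0.0)],
    }
    print(sorted(epitope_area(coords, ["E271", "Q12"])))
```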
By the term “epitope sequence” is meant the amino acid residues of a parent protein, which have been identified to belong to an epitope by the methods of the present invention (an example of an epitope sequence is E271 Q12 I8 in Savinase).
The term ‘antibody binding peptide’ denotes a peptide that binds with sufficiently high affinity to antibodies. Identification of ‘antibody binding peptides’ and their sequences constitutes the first step of the method of this invention.
“Anchor amino acids” are the individual amino acids of an epitope pattern.
“Hot spot amino acids” are amino acids of the parent protein, which are particularly likely to result in modified immunogenicity if they are mutated. Amino acids, which appear in three or more epitope sequences or which correspond to anchor amino acids are hot spot amino acids.
“Environmental allergens” are protein allergens that are present naturally. They include pollen, dust mite allergens, pet allergens, food allergens, venoms, etc.
“Commercial allergens” are protein allergens that are being brought to the market commercially. They include enzymes, pharmaceutical proteins, antimicrobial peptides, as well as allergens of transgenic plants.
The “donor protein” is the protein that was used to raise antibodies used to identify antibody binding sequences, hence the donor protein provides the information that leads to the epitope patterns.
The “acceptor protein” is the protein, whose structure is used to fit the identified epitope patterns and/or to fit the antibody binding sequences. Hence the acceptor protein is also the parent protein.
An “autoepitope” is one that has been identified using antibodies raised against the parent protein, i.e. the acceptor and the donor proteins are identical.
A “heteroepitope” is one that has been identified with distinct donor and acceptor proteins.
The term “functionality” of protein variants refers to e.g. enzymatic activity; binding to a ligand or receptor; stimulation of a cellular response (e.g. ³H-thymidine incorporation as response to a mitogenic factor); or anti-microbial activity.
By the term “specific polyclonal antibodies” is meant polyclonal antibodies isolated according to their specificity for a certain antigen, e.g. the protein backbone.
By the term “monospecific antibodies” is meant polyclonal antibodies isolated according to their specificity for a certain epitope. Such monospecific antibodies will bind to the same epitope, but with different affinity, as they are produced by a number of antibody producing cells recognizing overlapping but not necessarily identical epitopes.
‘Spiked mutagenesis’ is a form of site-directed mutagenesis, in which the primers used have been synthesized using mixtures of oligonucleotides at one or more positions.
By the term “a protein variant having modified immunogenicity as compared to the parent protein” is meant a protein variant which differs from the parent protein in one or more amino acids whereby the immunogenicity of the variant is modified. The modification of immunogenicity may be confirmed by testing the ability of the protein variant to elicit an IgE/IgG response.
In the present context the term “protein” is intended to cover oligopeptides, polypeptides as well as proteins as such.
The present invention relates to a method of selecting a protein variant having modified immunogenicity as compared to a parent protein, comprising the steps of:
a) obtaining antibody binding peptide sequences,
b) using the sequences to localise epitope sequences on the 3-dimensional structure of the parent protein,
c) defining an epitope area including amino acids situated within 5 Å from the epitope amino acids constituting the epitope sequence,
d) changing one or more of the amino acids defining the epitope area of the parent protein by genetic engineering mutations of a DNA sequence encoding the parent protein,
e) introducing the mutated DNA sequence into a suitable host, culturing said host and expressing the protein variant, and
f) evaluating the immunogenicity of the protein variant using the parent protein as reference.
A first step of the method is to identify peptide sequences, which bind specifically to antibodies.
Antibody binding peptide sequences can be found by testing a set of known peptide sequences for binding to antibodies raised against the donor protein. These sequences are typically selected, such that each represents a segment of the donor protein sequence (Mol. Immunol., 1992, vol. 29, pp. 1383-1389; Am. J. Resp. Cell. Mol. Biol. 2000, vol. 22, pp. 344-351). Also, randomized synthetic peptide libraries can be used to find antibody binding sequences (Slootstra et al; Molecular Diversity, 1996, vol. 2, pp. 156-164).
In a preferred method, the identification of antibody binding sequences may be achieved by screening of a display package library, preferably a phage display library. The principle behind phage display is that a heterologous DNA sequence can be inserted in the gene coding for a coat protein of the phage (WO 92/15679). The phage will make and display the hybrid protein on its surface where it can interact with specific target agents. Such target agent may be antigen-specific antibodies. It is therefore possible to select specific phages that display antibody-binding peptide sequences. The displayed peptides can be of predetermined lengths, for example 9 amino acids long, with randomized sequences, resulting in a random peptide display package library. Thus, by screening for antibody binding, one can isolate the peptide sequences that have sufficiently high affinity for the particular antibody used. The peptides of the hybrid proteins of the specific phages which bind protein-specific antibodies characterize epitopes that are recognized by the immune system.
The antibodies used for reacting with the display package are preferably IgE antibodies to ensure that the epitopes identified are IgE epitopes, i.e. epitopes inducing and binding IgE. In a preferred embodiment the antibodies are polyclonal antibodies, optionally monospecific antibodies.
For the purpose of the present invention polyclonal antibodies are preferred in order to obtain a broader knowledge about the epitopes of a protein.
It is of great importance that the amino acid sequence of the peptides presented by the display packages is long enough to represent a significant part of the epitope to be identified. In a preferred embodiment of the invention the peptides of the peptide display package library are oligopeptides having from 5 to 25 amino acids, preferably at least 8 amino acids, such as 9 amino acids. For a given length of peptide sequences (n), the theoretical number of different possible sequences can be calculated as 20^n. The diversity of the package library used must be large enough to provide a suitable representation of the theoretical number of different sequences. In a phage-display library, each phage has one specific sequence of a determined length. Hence an average phage display library can express 10^8-10^12 different random sequences, and is therefore well-suited to represent the theoretical number of different sequences.
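The comparison between the theoretical number of sequences and the library size can be illustrated with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the coverage estimate assumes that library members are drawn uniformly at random with replacement, which is a simplification not stated in the description.

```python
import math

AMINO_ACIDS = 20  # number of standard amino acids


def theoretical_sequences(n: int) -> int:
    """Number of distinct peptides of length n, i.e. 20^n."""
    return AMINO_ACIDS ** n


def expected_coverage(library_size: float, n: int) -> float:
    """Expected fraction of all length-n peptides present in the library.

    Assumption: members are sampled uniformly at random with replacement,
    so the chance that a given sequence is absent is exp(-size / 20^n).
    """
    total = theoretical_sequences(n)
    return -math.expm1(-library_size / total)


if __name__ == "__main__":
    for n in (5, 8, 9):
        print(
            f"n={n}: 20^n={theoretical_sequences(n):.3e}, "
            f"coverage at 1e8={expected_coverage(1e8, n):.4f}, "
            f"coverage at 1e12={expected_coverage(1e12, n):.4f}"
        )
```

Under these assumptions, a library at the upper end of the stated range covers most 9-mers, whereas a library at the lower end samples only a small fraction of them.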
The antibody binding peptide sequences can be further analysed by consensus alignment e.g. by the methods described by Feng and Doolittle, Meth. Enzymol., 1996, vol. 266, pp. 368-382; Feng and Doolittle, J. Mol. Evol., 1987, vol. 25, pp. 351-360; and Taylor, Meth. Enzymol., 1996, vol. 266, pp. 343-367.
This leads to identification of epitope patterns, which can assist the comparison of the linear information obtained from the antibody binding peptide sequences to the 3-dimensional structure of the acceptor protein in order to identify epitope sequences at the surface of the acceptor protein.
Given a number of antibody binding peptide sequences and possibly the corresponding epitope patterns, one needs the 3-dimensional structure coordinates of an acceptor protein to find the epitope sequences on its surface.
These coordinates can be found in databases (NCBI: http://www.ncbi.nlm.nih.gov/), determined experimentally using conventional methods (Ducruix and Giege: Crystallization of Nucleic Acids and Proteins, IRL Press, Oxford, 1992, ISBN 0-19-963245-6), or they can be deduced from the coordinates of a homologous protein. Typical actions required for the construction of a model structure are: alignment of homologous sequences for which 3-dimensional structures exist, definition of Structurally Conserved Regions (SCRs), assignment of coordinates to SCRs, search for structural fragments/loops in structure databases to replace Variable Regions, assignment of coordinates to these regions, and structural refinement by energy minimization. Regions containing large inserts (>3 residues) relative to the known 3-dimensional structures are known to be quite difficult to model, and structural predictions for such regions must be considered with care.
Using the coordinates, several methods of mapping the linear information onto the 3-dimensional surface are possible, as described in the examples below.
One can match each amino acid residue of the antibody binding peptide to an identical or homologous amino acid on the 3-D surface of the acceptor protein, such that amino acids that are adjacent in the primary sequence are close on the surface of the acceptor protein, with close being <5 Å, preferably <3 Å between any two atoms of the two amino acids.
Alternatively, one can define a geometric body (e.g. an ellipsoid, a sphere, or a box) of a size that matches a possible binding interface between antibody and antigen and look for a positioning of this body where it will contain most of or all the anchor amino acids.
Also, one can use the epitope patterns to facilitate identification of epitope sequences. This can be done by first matching the anchor amino acids on the 3-D structure and subsequently looking for other elements of the antibody binding peptide sequences, which provide additional matches. If there are many residues to be matched, it is only necessary that a suitable number can be found on the 3-D structure. For example, if an epitope pattern comprises 4, 5, 6, or 7 amino acids, it is only necessary that 3 match surface elements of the acceptor protein.
In all cases, it is desirable that amino acids of the epitope sequence are surface exposed (as described below in Examples).
It is known that amino acids that surround binding sequences can affect binding of a ligand without participating actively in the binding process. Based on this knowledge, areas covered by amino acids with potential steric effects on the epitope-antibody interaction were defined around the identified epitope sequences. These areas are called ‘epitope areas’. Practically, all amino acids situated within 5 Å from the amino acids defining the epitope sequence were included. Preferably, the epitope area equals the epitope sequence. The accessibility criterion was not used, as hidden amino acids of an epitope area can also have an effect on the adjacent amino acids of the epitope sequence.
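The 5 Å criterion for an epitope area can be sketched in code. The sketch below is a minimal illustration, not the procedure used in the examples: it assumes that "within 5 Å" is evaluated as the shortest distance between any two atoms of the two residues, that the structure is available as a mapping from a residue label to its atom coordinates, and that the residue labels and coordinates shown are invented toy values.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Atom = Tuple[float, float, float]   # x, y, z in Å
Structure = Dict[str, List[Atom]]   # residue label -> atom coordinates


def min_distance(a: List[Atom], b: List[Atom]) -> float:
    """Shortest atom-to-atom distance between two residues."""
    return min(math.dist(p, q) for p in a for q in b)


def epitope_area(
    structure: Structure,
    epitope_sequence: Iterable[str],
    cutoff: float = 5.0,
) -> Set[str]:
    """Residues of the epitope sequence plus all residues closer than cutoff.

    No surface-accessibility filter is applied, so buried residues that are
    close to the epitope sequence are included as well.
    """
    epitope = set(epitope_sequence)
    area = set(epitope)
    for residue, atoms in structure.items():
        if residue in area:
            continue
        if any(min_distance(atoms, structure[e]) < cutoff for e in epitope):
            area.add(residue)
    return area


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    toy = {
        "R1": [(0.0, 0.0, 0.0)],
        "R2": [(3.0, 0.0, 0.0)],
        "R3": [(4.5, 0.0, 0.0), (9.0, 0.0, 0.0)],
        "R4": [(12.0, 0.0, 0.0)],
    }
    print(sorted(epitope_area(toy, ["R1", "R2"])))
```

Setting a smaller cutoff narrows the area towards the epitope sequence itself, which corresponds to the preferred case in which the epitope area equals the epitope sequence.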
There are at least four ways to utilize the information about epitope sequences, which has been derived by the methods of this invention:
1) reduce the allergenicity of a commercial protein using protein engineering.
2) reduce the potential of commercial proteins to cross-react with environmental allergens and hence cause allergic reactions in people sensitized to the environmental allergens (or vice versa).
3) improve the immunotherapeutic effect of allergen vaccines.
4) assist characterization of clinical allergies in order to select the appropriate allergen vaccine.
Protein Engineering to Reduce the Allergenicity, Cross-Reactivity and/or Immunotherapeutic Effect of Proteins.
The methods described thus far have led to identification of epitope areas on an acceptor protein, each containing epitope sequences. These subsets of amino acids are preferred for introducing mutations that are meant to modify the immunogenicity of the acceptor protein. An even more preferred subset of amino acids to target by mutagenesis are ‘hot spot amino acids’, which appear in several different epitope sequences, or which correspond to anchor amino acids of the epitope patterns.
Thus, genetic engineering mutations should be designed in the epitope areas, preferably in epitope sequences, and more preferably in the ‘hot spot amino acids’.
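The selection of hot spot amino acids, defined above as residues that appear in three or more epitope sequences or that correspond to anchor amino acids, can be sketched as a simple counting step. The function name and the residue labels below are hypothetical and serve only to illustrate the criterion.

```python
from collections import Counter
from typing import Iterable, List, Set


def hot_spots(
    epitope_sequences: List[Iterable[str]],
    anchor_residues: Iterable[str] = (),
    min_count: int = 3,
) -> Set[str]:
    """Return residues found in at least min_count epitope sequences,
    together with all residues that correspond to anchor amino acids."""
    # Count each residue at most once per epitope sequence.
    counts = Counter(res for seq in epitope_sequences for res in set(seq))
    frequent = {res for res, n in counts.items() if n >= min_count}
    return frequent | set(anchor_residues)


if __name__ == "__main__":
    # Invented residue labels, for illustration only.
    sequences = [["A1", "B2"], ["A1", "C3"], ["A1", "B2", "D4"]]
    print(sorted(hot_spots(sequences, anchor_residues=["D4"])))
```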
When the epitope area(s) have been identified, a protein variant exhibiting a modified immunogenicity may be produced by changing the identified epitope area of the parent protein by genetic engineering mutation of a DNA sequence encoding the parent protein.
The epitope identified may be changed by substituting at least one amino acid of the epitope area. In a preferred embodiment at least one anchor amino acid or hot spot amino acid is changed. The change will often be substituting to an amino acid of different size, hydrophilicity, and/or polarity, such as a small amino acid versus a large amino acid, a hydrophilic amino acid versus a hydrophobic amino acid, a polar amino acid versus a non-polar amino acid and a basic versus an acidic amino acid.
Other changes may be the addition/insertion or deletion of at least one amino acid of the epitope sequence, preferably deleting an anchor amino acid or a hot spot amino acid. Furthermore, an epitope pattern may be changed by substituting some amino acids, and deleting/adding others.
In the claims a position to be changed by substitution, insertion, deletion will be indicated by: “Position xx to aaa, bbb, ccc, insertion, deletion”, meaning that position xx can be substituted by the amino acid aaa, bbb, ccc or that any amino acid can be inserted after position xx or that position xx can be deleted, e.g. “Position 27 to A, D, E, insertion, deletion” means that in position 27 the amino acid can be substituted by A, D or E, or that any amino acid can be inserted after position 27, or that the amino acid in position 27 can be deleted.
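The notation can be read mechanically, as the following sketch shows. It is an illustration only: the parser, the `PositionChange` record and their field names are hypothetical and do not form part of the claims.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class PositionChange:
    position: int             # position xx in the parent protein
    substitutions: List[str]  # allowed replacement amino acids
    insertion: bool           # any amino acid may be inserted after xx
    deletion: bool            # the amino acid at xx may be deleted


_PATTERN = re.compile(r"^\s*Position\s+(\d+)\s+to\s+(.+?)\s*$", re.IGNORECASE)


def parse_change(text: str) -> PositionChange:
    """Parse 'Position xx to aaa, bbb, ccc, insertion, deletion'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in the expected notation: {text!r}")
    items = [item.strip() for item in match.group(2).split(",") if item.strip()]
    substitutions: List[str] = []
    insertion = deletion = False
    for item in items:
        if item.lower() == "insertion":
            insertion = True
        elif item.lower() == "deletion":
            deletion = True
        else:
            substitutions.append(item)
    return PositionChange(int(match.group(1)), substitutions, insertion, deletion)


if __name__ == "__main__":
    print(parse_change("Position 27 to A, D, E, insertion, deletion"))
```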
When one uses protein engineering to eliminate epitopes, it is possible that new epitopes are created, or existing epitopes are duplicated. To reduce this risk, one can map the planned mutations at a given position on the 3-dimensional structure of the protein of interest, and check the emerging amino acid constellation against a database of known epitope patterns, to rule out those possible replacement amino acids which are predicted to result in creation or duplication of epitopes. Thus, risk mutations can be identified and eliminated by this procedure, thereby reducing the risk of making mutations that lead to increased rather than decreased allergenicity.
In yet another embodiment, one can design the mutation, such that amino acids suitable for chemical modification are substituted for existing ones in the epitope areas. The protein variant can then be conjugated to activated polymers. Which amino acids to substitute and/or insert depends in principle on the coupling chemistry to be applied. The chemistry for preparation of covalent bioconjugates can be found in “Bioconjugate Techniques”, Hermanson, G. T. (1996), Academic Press Inc., which is hereby incorporated by reference (see below). It is preferred to make conservative substitutions in the polypeptide when the polypeptide has to be conjugated, as conservative substitutions ensure that the impact of the substitution on the polypeptide structure is limited. In the case of providing additional amino groups this may be done by substitution of arginine to lysine, both residues being positively charged, but only the lysine having a free amino group suitable as an attachment group. In the case of providing additional carboxylic acid groups the conservative substitution may for instance be an asparagine to aspartic acid or glutamine to glutamic acid substitution. These residues resemble each other in size and shape, except for the carboxylic groups being present on the acidic residues. In the case of providing SH-groups the conservative substitution may be done by changing threonine or serine to cysteine.
For chemical conjugation, the protein variant needs to be incubated with an active or activated polymer and subsequently separated from the unreacted polymer. This can be done in solution followed by purification or it can conveniently be done using the immobilized protein variants, which can easily be exposed to different reaction environments and washes.
In the case where polymeric molecules are to be conjugated with the polypeptide in question and the polymeric molecules are not active, they must be activated by the use of a suitable technique. It is also contemplated according to the invention to couple the polymeric molecules to the polypeptide through a linker. Suitable linkers are well-known to the skilled person. Methods and chemistry for activation of polymeric molecules as well as for conjugation of polypeptides are extensively described in the literature. Commonly used methods for activation of insoluble polymers include activation of functional groups with cyanogen bromide, periodate, glutaraldehyde, biepoxides, epichlorohydrin, divinylsulfone, carbodiimide, sulfonyl halides, trichlorotriazine etc. (see R. F. Taylor, (1991), “Protein immobilisation. Fundamental and applications”, Marcel Dekker, N.Y.; S. S. Wong, (1992), “Chemistry of Protein Conjugation and Crosslinking”, CRC Press, Boca Raton; G. T. Hermanson et al., (1993), “Immobilized Affinity Ligand Techniques”, Academic Press, N.Y.). Some of the methods concern activation of insoluble polymers but are also applicable to activation of soluble polymers e.g. periodate, trichlorotriazine, sulfonylhalides, divinylsulfone, carbodiimide etc. The functional groups on the polymer (amino, hydroxyl, thiol, carboxyl, aldehyde or sulfhydryl) and the chosen attachment group on the protein must be considered when choosing the activation and conjugation chemistry, which normally consists of i) activation of polymer, ii) conjugation, and iii) blocking of residual active groups.
In the following a number of suitable polymer activation methods will be described briefly. However, it is to be understood that other methods may also be used.
Coupling polymeric molecules to the free acid groups of polypeptides may be performed with the aid of diimide and for example amino-PEG or hydrazino-PEG (Pollak et al., (1976), J. Am. Chem. Soc., 98, 289-291) or diazoacetate/amide (Wong et al., (1992), “Chemistry of Protein Conjugation and Crosslinking”, CRC Press).
Coupling polymeric molecules to hydroxy groups is generally very difficult as it must be performed in water. Usually hydrolysis predominates over reaction with hydroxyl groups.
Coupling polymeric molecules to free sulfhydryl groups can be achieved with special groups like maleimido or the ortho-pyridyl disulfide. Also vinylsulfone (U.S. Pat. No. 5,414,135, (1995), Snow et al.) has a preference for sulfhydryl groups but is not as selective as the others mentioned.
Accessible arginine residues in the polypeptide chain may be targeted by groups comprising two vicinal carbonyl groups.
Techniques involving coupling of electrophilically activated PEGs to the amino groups of Lysines may also be useful. Many of the usual leaving groups for alcohols give rise to an amine linkage. For instance, alkyl sulfonates, such as tresylates (Nilsson et al., (1984), Methods in Enzymology vol. 104, Jacoby, W. B., Ed., Academic Press: Orlando, p. 56-66; Nilsson et al., (1987), Methods in Enzymology vol. 135; Mosbach, K., Ed.; Academic Press: Orlando, pp. 65-79; Scouten et al., (1987), Methods in Enzymology vol. 135, Mosbach, K., Ed., Academic Press: Orlando, 1987; pp 79-84; Crossland et al., (1971), J. Am. Chem. Soc. 1971, 93, pp. 4217-4219), mesylates (Harris, (1985), supra; Harris et al., (1984), J. Polym. Sci. Polym. Chem. Ed. 22, pp 341-352), aryl sulfonates like tosylates, and para-nitrobenzene sulfonates can be used.
Organic sulfonyl chlorides, e.g. Tresyl chloride, effectively convert hydroxy groups in a number of polymers, e.g. PEG, into good leaving groups (sulfonates) that, when reacted with nucleophiles like amino groups in polypeptides, allow stable linkages to be formed between polymer and polypeptide. In addition to high conjugation yields, the reaction conditions are in general mild (neutral or slightly alkaline pH, to avoid denaturation and little or no disruption of activity), and satisfy the non-destructive requirements to the polypeptide.
Tosylate is more reactive than the mesylate but also less stable, decomposing into PEG, dioxane, and sulfonic acid (Zalipsky, (1995), Bioconjugate Chem., 6, 150-165). Epoxides may also be used for creating amine bonds but are much less reactive than the abovementioned groups.
Converting PEG into a chloroformate with phosgene gives rise to carbamate linkages to Lysines. Essentially the same reaction can be carried out in many variants substituting the chlorine with N-hydroxy succinimide (U.S. Pat. No. 5,122,614, (1992); Zalipsky et al., (1992), Biotechnol. Appl. Biochem., 15, p. 100-114; Monfardini et al., (1995), Bioconjugate Chem., 6, 62-69), with imidazole (Allen et al., (1991), Carbohydr. Res., 213, pp 309-319), with para-nitrophenol, DMAP (EP 632 082 A1, (1993), Looze, Y.) etc. The derivatives are usually made by reacting the chloroformate with the desired leaving group. All these groups give rise to carbamate linkages to the peptide.
Furthermore, isocyanates and isothiocyanates may be employed, yielding ureas and thioureas, respectively.
Amides may be obtained from PEG acids using the same leaving groups as mentioned above and cyclic imide thiones (U.S. Pat. No. 5,349,001, (1994), Greenwald et al.). The reactivity of these compounds is very high, but this may make the hydrolysis too fast.
PEG succinate made from reaction with succinic anhydride can also be used. The ester group thereby introduced makes the conjugate much more susceptible to hydrolysis (U.S. Pat. No. 5,122,614, (1992), Zalipsky). This group may be activated with N-hydroxy succinimide.
Furthermore, a special linker can be introduced, the most well studied being cyanuric chloride (Abuchowski et al., (1977), J. Biol. Chem., 252, 3578-3581; U.S. Pat. No. 4,179,337, (1979), Davis et al.; Shafer et al., (1986), J. Polym. Sci. Polym. Chem. Ed., 24, 375-378).
Coupling of PEG to an aromatic amine followed by diazotation yields a very reactive diazonium salt, which can be reacted with a peptide in situ. An amide linkage may also be obtained by reacting an azlactone derivative of PEG (U.S. Pat. No. 5,321,095, (1994), Greenwald, R. B.) thus introducing an additional amide linkage.
As some peptides do not comprise many Lysines it may be advantageous to attach more than one PEG to the same Lysine. This can be done e.g. by the use of 1,3-diamino-2-propanol.
PEGs may also be attached to the amino-groups of the enzyme with carbamate linkages (WO 95/11924, Greenwald et al.). Lysine residues may also be used as the backbone.
The coupling technique used in the examples is the N-succinimidyl carbonate conjugation technique described in WO 90/13590 (Enzon).
In a preferred embodiment, the activated polymer is methyl-PEG which has been activated by N-succinimidyl carbonate as described in WO 90/13590. The coupling can be carried out at alkaline conditions in high yields.
For coupling of polymers to the protein variants, it is preferred to use conditions similar to those described in WO 96/17929 and WO 99/00489 (Novo Nordisk A/S) e.g. mono or bis activated PEGs of molecular weight ranging from 100 to 5000 Da. For instance, a methyl-PEG 350 could be activated with N-succinimidyl carbonate and incubated with protein variant at a molar ratio of more than 5 calculated as equivalents of activated PEG divided by moles of lysines in the protein of interest. For coupling to immobilized protein variant, the PEG:protein ratio should be optimized such that the PEG concentration is low enough for the buffer capacity to maintain alkaline pH throughout the reaction, while the PEG concentration is still high enough to ensure a sufficient degree of modification of the protein. Further, it is important that the activated PEG is kept at conditions that prevent hydrolysis (i.e. dissolved in acid or solvents) and diluted directly into the alkaline reaction buffer. It is essential that primary amines are not present other than those occurring in the lysine residues of the protein. This can be secured by washing thoroughly in borate buffer. The reaction is stopped by separating the fluid phase containing unreacted PEG from the solid phase containing protein and derivatized protein. Optionally, the solid phase can then be washed with tris buffer, to block any unreacted sites on PEG chains that might still be present.
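The molar ratio referred to above (equivalents of activated PEG divided by moles of lysines) can be worked through as a short calculation. The sketch below is illustrative only; the function names and the example figures (1 µmol of a protein with 10 lysines) are hypothetical and are not taken from the examples.

```python
def peg_to_lysine_ratio(
    peg_equivalents: float, protein_moles: float, lysines_per_protein: int
) -> float:
    """Equivalents of activated PEG divided by moles of lysine residues."""
    if protein_moles <= 0 or lysines_per_protein <= 0:
        raise ValueError("protein amount and lysine count must be positive")
    return peg_equivalents / (protein_moles * lysines_per_protein)


def required_peg_equivalents(
    protein_moles: float, lysines_per_protein: int, min_ratio: float = 5.0
) -> float:
    """Activated PEG equivalents at which the ratio equals min_ratio.

    The stated condition is a ratio of more than 5, so the amount actually
    used should exceed this value.
    """
    if protein_moles <= 0 or lysines_per_protein <= 0:
        raise ValueError("protein amount and lysine count must be positive")
    return min_ratio * protein_moles * lysines_per_protein


if __name__ == "__main__":
    # Hypothetical example: 1 micromole of a protein carrying 10 lysines.
    threshold = required_peg_equivalents(1e-6, 10)
    print(f"use more than {threshold * 1e6:.0f} micromole equivalents of activated PEG")
```

For a bis-activated PEG, each mole of polymer contributes two equivalents, so the amount of polymer needed to reach a given ratio is halved.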
In another embodiment, the mutations are designed, such that recognition sites for post-translational modifications are introduced in the epitope areas, and the protein variant is expressed in a suitable host organism capable of the corresponding post-translational modification. These post-translational modifications may serve to shield the epitope and hence lower the immunogenicity of the protein variant relative to the protein backbone. Post-translational modifications include glycosylation, phosphorylation, N-terminal processing, acylation, ribosylation and sulfatation. A good example is N-glycosylation. N-glycosylation is found at sites of the sequence Asn-Xaa-Ser, Asn-Xaa-Thr, or Asn-Xaa-Cys, in which neither the Xaa residue nor the amino acid following the tri-peptide consensus sequence is a proline (T. E. Creighton, “Proteins: Structures and Molecular Properties”, 2nd edition, W.H. Freeman and Co., New York, 1993, pp. 91-93). It is thus desirable to introduce such recognition sites in the sequence of the backbone protein. The specific nature of the glycosyl chain of the glycosylated protein variant may be linear or branched depending on the protein and the host cells. Another example is phosphorylation: The protein sequence can be modified so as to introduce serine phosphorylation sites with the recognition sequence arg-arg-(xaa)n-ser (where n=0, 1, or 2) (SEQ ID NOS: 38 and 39), which can be phosphorylated by the cAMP-dependent kinase, or tyrosine phosphorylation sites with the recognition sequence -lys/arg-(xaa)3-asp/glu-(xaa)3-tyr (SEQ ID NO: 40), which can usually be phosphorylated by tyrosine-specific kinases (T. E. Creighton, “Proteins: Structures and Molecular Properties”, 2nd ed., Freeman, N.Y., 1993).
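The N-glycosylation rule stated above (Asn-Xaa-Ser, Asn-Xaa-Thr or Asn-Xaa-Cys, where neither Xaa nor the residue following the tripeptide is proline) lends itself to a simple sequence scan. The following is a minimal sketch using one-letter amino acid codes; the function names and the short test sequences are hypothetical, and the scan says nothing about whether a site is actually glycosylated in a given host.

```python
import re
from typing import List, Set

# Asn, any residue except Pro, then Ser/Thr/Cys, not followed by Pro.
# The outer lookahead allows overlapping sites to be reported.
_SEQUON = re.compile(r"(?=N[^P][STC](?!P))")


def find_n_glycosylation_sites(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues that start a recognition site."""
    return [m.start() + 1 for m in _SEQUON.finditer(sequence.upper())]


def sites_created_by_substitution(
    sequence: str, position: int, new_residue: str
) -> Set[int]:
    """Recognition sites present after a single substitution but not before.

    position is 1-based; new_residue is a one-letter amino acid code.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside the sequence")
    mutated = sequence[: position - 1] + new_residue + sequence[position:]
    before = set(find_n_glycosylation_sites(sequence))
    return set(find_n_glycosylation_sites(mutated)) - before


if __name__ == "__main__":
    print(find_n_glycosylation_sites("ANGTA"))
    print(sites_created_by_substitution("AQGTA", 2, "N"))
```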
In order to generate protein variants, more than one amino acid residue may be substituted, added or deleted, these amino acids preferably being located in different epitope areas. In that case, it may be difficult to assess a priori how well the functionality of the protein is maintained while antigenicity is reduced, especially since the possible number of mutation-combinations becomes very large, even for a small number of mutations. In that case, it will be an advantage, to establish a library of diversified mutants each having one or more changed amino acids introduced and selecting those variants, which show good retention of function and at the same time a significant reduction in antigenicity.
A diversified library can be established by a range of techniques known to the person skilled in the art (Reetz M T; Jaeger K E, in ‘Biocatalysis - from Discovery to Application’ edited by Fessner W D, Vol. 200, pp. 31-57 (1999); Stemmer, Nature, vol. 370, p. 389-391, 1994; Zhao and Arnold, Proc. Natl. Acad. Sci., USA, vol. 94, pp. 7997-8000, 1997; or Yano et al., Proc. Natl. Acad. Sci., USA, vol. 95, pp 5511-5515, 1998). These include, but are not limited to, ‘spiked mutagenesis’, in which certain positions of the protein sequence are randomized by carrying out PCR mutagenesis using one or more oligonucleotide primers which are synthesized using a mixture of nucleotides for certain positions (Lanio T, Jeltsch A, Biotechniques, Vol. 25(6), 958,962,964-965 (1998)). The mixtures of oligonucleotides used within each triplet can be designed such that the corresponding amino acid of the mutated gene product is randomized within some predetermined distribution function. Algorithms have been disclosed, which facilitate this design (Jensen L J et al., Nucleic Acids Research, Vol. 26(3), 697-702 (1998)).
In an embodiment substitutions are found by a method comprising the following steps: 1) a range of substitutions, additions, and/or deletions are listed encompassing several epitope areas (preferably in the corresponding epitope sequences, anchor amino acids, and/or hot spots), 2) a library is designed which introduces a randomized subset of these changes in the amino acid sequence into the target gene, e.g. by spiked mutagenesis, 3) the library is expressed, and preferred variants are selected. In another embodiment, this method is supplemented with additional rounds of screening and/or family shuffling of hits from the first round of screening (J. E. Ness, et al, Nature Biotechnology, vol. 17, pp. 893-896, 1999) and/or combination with other methods of reducing immunogenicity by genetic means (such as that disclosed in WO 92/10755).
The library may be designed, such that at least one amino acid of the epitope area is substituted. In a preferred embodiment at least one amino acid of the epitope sequence itself is changed, and in an even more preferred embodiment, one or more hot spot amino acids are changed. The library may be biased towards introducing an amino acid of different size, hydrophilicity, and/or polarity relative to the original one of the ‘protein backbone’, for example by changing a small amino acid to a large amino acid, a hydrophilic amino acid to a hydrophobic amino acid, a polar amino acid to a non-polar amino acid or a basic to an acidic amino acid. Other changes may be the addition or deletion of at least one amino acid of the epitope area, preferably deleting an anchor amino acid. Furthermore, substituting some amino acids and deleting or adding others may change an epitope.
Diversity in the protein variant library can be generated at the DNA triplet level, such that individual codons are variegated e.g. by using primers of partially randomized sequence for a PCR reaction. Further, several techniques have been described, by which one can create a library with such diversity at several locations in the gene, which are too far apart to be covered by a single (spiked) oligonucleotide primer. These techniques include the use of in vivo recombination of the individually diversified gene segments as described in WO 97/07205 on page 3, lines 8 to 29 or by using DNA shuffling techniques to create a library of full length genes that combine several gene segments each of which is diversified e.g. by spiked mutagenesis (Stemmer, Nature 370, pp. 389-391, 1994 and U.S. Pat. Nos. 5,605,793 and 5,830,721). In the latter case, one can use the gene encoding the “protein backbone” as a template double-stranded polynucleotide and combine this with one or more single or double-stranded oligonucleotides as described in claim 1 of U.S. Pat. No. 5,830,721. The single-stranded oligonucleotides could be partially randomized during synthesis. The double-stranded oligonucleotides could be PCR products incorporating diversity in a specific region. In both cases, one can dilute the diversity with corresponding segments containing the sequence of the backbone protein in order to limit the number of changes that are on average introduced. As mentioned above, methods have been established for designing the ratios of nucleotides (A; C; T; G) used at a particular codon during primer synthesis, so as to approximate a desired frequency distribution among a set of desired amino acids at that particular codon. This allows one to bias the partially randomized mutagenesis towards e.g. introduction of post-translational modification sites, chemical modification sites, or simply amino acids that are different from those that define the epitope or the epitope area.
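The relation between the nucleotide ratios used at a codon and the resulting amino acid frequencies can be made concrete with a forward calculation. The sketch below is not one of the cited design algorithms; it only computes the amino acid distribution that follows from a given nucleotide mixture, assuming the standard genetic code and independent incorporation of nucleotides at the three codon positions. All names are illustrative.

```python
from itertools import product
from typing import Dict, Sequence

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def amino_acid_distribution(mix: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Amino acid frequencies encoded by a spiked codon.

    mix holds three dictionaries, one per codon position, each mapping a
    base (A, C, G or T) to its fraction in the synthesis mixture.
    '*' denotes a stop codon.
    """
    if len(mix) != 3:
        raise ValueError("a codon has exactly three positions")
    distribution: Dict[str, float] = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for position, base in enumerate(codon):
            p *= mix[position].get(base, 0.0)
        if p > 0.0:
            distribution[aa] = distribution.get(aa, 0.0) + p
    return distribution


if __name__ == "__main__":
    # Equal parts A and G at position 1, then A, then T: codons AAT and GAT.
    spiked = [{"A": 0.5, "G": 0.5}, {"A": 1.0}, {"T": 1.0}]
    print(amino_acid_distribution(spiked))
```

A design algorithm would search over such mixtures for the one whose computed distribution is closest to the desired amino acid frequencies.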
One could also approximate a sequence in a given location or epitope area to the corresponding location on a homologous, human protein.
Occasionally, one would be interested in testing a library that combines a number of known mutations in different locations in the primary sequence of the ‘protein backbone’. These could be introduced post-translational or chemical modification sites, or they could be mutations, which by themselves had proven beneficial for one reason or another (e.g. decreasing antigenicity, or improving specific activity, performance, stability, or other characteristics). In such cases, it may be desirable to create a library of diverse combinations of known sequences. For example if 12 individual mutations are known, one could combine (at least) 12 segments of the ‘protein backbone’ gene in which each segment is present in two forms: one with and one without the desired mutation. By varying the relative amounts of those segments, one could design a library (of size 2^12) for which the average number of mutations per gene can be predicted. This can be a useful way of combining elements that by themselves give some, but not sufficient effect, without resorting to very large libraries, as is often the case when using ‘spiked mutagenesis’. Another way to combine these ‘known mutations’ could be by using family shuffling of oligomeric DNA encoding the known changes with fragments of the full length wild type sequence.
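The prediction of the average number of mutations per gene can be illustrated numerically. The sketch below assumes that each of the segments is incorporated independently and that the fraction of the mutant form of a segment in the mixture equals the probability that a gene carries that mutation; both are simplifying assumptions, and the function names are illustrative.

```python
from typing import List, Sequence


def library_size(n_sites: int) -> int:
    """Number of distinct combinations when each site is mutant or not."""
    return 2 ** n_sites


def expected_mutations(mutant_fractions: Sequence[float]) -> float:
    """Average number of mutations per gene under independent incorporation."""
    return sum(mutant_fractions)


def mutation_count_distribution(mutant_fractions: Sequence[float]) -> List[float]:
    """Probability of finding 0, 1, ..., n mutations in a gene.

    Computed by adding one site at a time; entry k of the result is the
    probability that exactly k of the sites carry the mutation.
    """
    distribution = [1.0]
    for p in mutant_fractions:
        updated = [0.0] * (len(distribution) + 1)
        for k, probability in enumerate(distribution):
            updated[k] += probability * (1.0 - p)
            updated[k + 1] += probability * p
        distribution = updated
    return distribution


if __name__ == "__main__":
    fractions = [0.25] * 12  # mutant form is one quarter of each segment pool
    print(library_size(12))
    print(expected_mutations(fractions))
    print([round(p, 3) for p in mutation_count_distribution(fractions)])
```

Lowering the fraction of the mutant form of each segment shifts the distribution towards genes with fewer mutations while leaving the number of possible combinations unchanged.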
When protein variants have been constructed based on the methods described in this invention, it is desirable to confirm their antibody binding capacity, functionality, immunogenicity and/or allergenicity using a purified preparation. For that use, the protein variant of interest can be expressed in larger scale, purified by conventional techniques, and the antibody binding and functionality should be examined in detail using dose-response curves and e.g. direct or competitive ELISA (C-ELISA).
The potentially reduced allergenicity (which is likely, but not necessarily true, for a variant with low antibody binding) should be tested in in vivo or in vitro model systems, e.g. in vitro assays for immunogenicity such as assays based on cytokine expression profiles or other proliferation or differentiation responses of epithelial and other cells, including B-cells and T-cells. Further, animal models for testing allergenicity should be set up to test a limited number of protein variants that show desired characteristics in vitro. Useful animal models include the guinea pig intratracheal model (GPIT) (Ritz, et al. Fund. Appl. Toxicol., 21, pp. 31-37, 1993), mouse subcutaneous (mouse-SC) (WO 98/30682, Novo Nordisk), the rat intratracheal (rat-IT) (WO 96/17929, Novo Nordisk), and the mouse intranasal (MINT) (Robinson et al., Fund. Appl. Toxicol. 34, pp. 15-24, 1996) models.
The immunogenicity of the protein variant is measured in animal tests, wherein the animals are immunised with the protein variant and the immune response is measured. Specifically, it is of interest to determine the allergenicity of the protein variants by repeatedly exposing the animals to the protein variant by the intratracheal route and following the specific IgG and IgE titers. Alternatively, the mouse intranasal (MINT) test can be used to assess the allergenicity of protein variants. By the present invention the allergenicity is reduced at least 3 times as compared to the allergenicity of the parent protein, preferably 10 times, more preferably 50 times.
However, the present inventors have demonstrated that the performance in ELISA correlates closely to the immunogenic responses measured in animal tests. To obtain a useful reduction of the allergenicity of a protein, the IgE binding capacity of the protein variant must be reduced to below 75%, preferably below 50%, more preferably below 25% of the IgE binding capacity of the parent protein as measured by the performance in IgE ELISA, given that the value for the IgE binding capacity of the parent protein is set to 100%.
Thus a first assessment of the immunogenicity and/or allergenicity of a protein can be made by measuring the antibody binding capacity or antigenicity of the protein variant using appropriate antibodies. This approach has also been used in the literature (WO 99/47680).
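The IgE binding thresholds stated above (below 75%, 50% and 25% of the parent protein, with the parent set to 100%) can be applied as in the following minimal sketch. The signal values stand for any background-corrected IgE ELISA readout. The function names and the example numbers are hypothetical and do not represent measured data.

```python
def relative_ige_binding(variant_signal, parent_signal):
    """IgE binding of the variant as a percentage of the parent (parent = 100%)."""
    if parent_signal <= 0:
        raise ValueError("parent signal must be positive")
    return 100.0 * variant_signal / parent_signal


def classify_ige_binding(percent_of_parent):
    """Map a relative IgE binding value onto the stated threshold levels."""
    if percent_of_parent < 25.0:
        return "below 25% (more preferred)"
    if percent_of_parent < 50.0:
        return "below 50% (preferred)"
    if percent_of_parent < 75.0:
        return "below 75% (useful reduction)"
    return "75% or above (no useful reduction)"


# Hypothetical background-corrected ELISA signals (arbitrary units).
parent = 1.20
for name, signal in [("variant A", 1.05), ("variant B", 0.50), ("variant C", 0.18)]:
    pct = relative_ige_binding(signal, parent)
    print(f"{name}: {pct:.1f}% of parent, {classify_ige_binding(pct)}")
```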
The immunotherapeutic effect of allergen vaccines can be assessed in a number of different ways. One is to measure the specific IgE binding, the reduction of which indicates a better allergen vaccine potential (WO 99/47680, ALK-ABELLO). Also, several cellular assays could be employed to show the modified immune response indicative of good allergen vaccine potential as shown in several publications, all of which are hereby incorporated by reference (van Neerven et al, “T lymphocyte responses to allergens: Epitope-specificity and clinical relevance”, Immunol Today, 1996, vol. 17, pp. 526-532; Hoffmann et al., Allergy, 1999, vol. 54, pp. 446-454, WO 99/07880).
Eventually, clinical trials with allergic patients could be employed using cellular or clinical end-point measurements (Ebner et al., Clin. Exp. All., 1997, vol. 27, pp. 1007-1015; Int. Arch. Allergy Immunol., 1999, vol. 119, pp 1-5).
A wide variety of protein functionality assays are available in the literature. Those suitable for automated analysis are especially useful for this invention. Several have been published in the literature, such as protease assays (WO 99/34011, Genencor International; J. E. Ness, et al, Nature Biotechn., 17, pp. 893-896, 1999), oxidoreductase assays (Cherry et al., Nature Biotechn., 17, pp. 379-384, 1999), and assays for several other enzymes (WO 99/45143, Novo Nordisk). Those assays that employ soluble substrates can be employed for direct analysis of functionality of immobilized protein variants.
A related objective is to reduce cross-reactivity between ‘commercial allergens’ and ‘environmental allergens’. Cross-reactivities between food allergens of different origin are well-known (Akkerdaas et al, Allergy 50, pp 215-220, 1995). Similarly, cross-reactivities between other environmental allergens (like pollen, dust mites etc.) and commercial allergens (like enzyme proteins) have been established in the literature (J. All. Clin. Immunol., 1998, vol. 102, pp. 679-686) and by the present inventors. The molecular reason for this cross-reactivity can be explored using epitope mapping. By finding epitope patterns using antibodies raised against an environmental allergen (the donor protein) and mapping this information on a commercial allergen (the acceptor protein), one may find the epitopes that are common to both proteins, and hence responsible for the cross-reactivity. Obviously, one can also use the commercial allergen as donor and the environmental allergen as acceptor. By modifying the commercial allergen using protein engineering in the epitope areas identified as described above, one can reduce the cross-reactivity of the commercial allergen variant towards the environmental allergens (and vice versa). Hence, the use of the modified commercial allergens would be safer than using the unmodified commercial allergen.
Testing of this approach would be done using an antibody-binding assay with the protein variant (and its parent protein as control) and antibodies raised against the protein that cross-reacts with the parent protein. The method is otherwise identical to those described in the Methods section for characterization of allergenicity and antigenicity.
The modifications of the enzymes in the epitope areas as disclosed in the present application may have effects on the enzyme other than modified immunogenicity. A modification may also change the performance of the enzyme, such as the wash performance, thermostability and storage stability, or increase the catalytic activity of the enzyme.
The ability of an enzyme to catalyze the degradation of various naturally occurring substrates present on the objects to be cleaned during e.g. wash is often referred to as its washing ability, wash-ability, detergency, or wash performance. Throughout this application the term wash performance will be used to encompass this property.
Another aspect of the invention is a composition comprising at least one protein (polypeptide) or enzyme of the invention. The composition may comprise other polypeptides, proteins or enzymes and/or ingredients normally used in personal care products (such as shampoo, soap bars, skin lotion, skin cream, hair dye and toothpaste), household articles, agrochemicals, cleaning preparations e.g. for contact lenses, cosmetics, toiletries, oral and dermal pharmaceuticals, compositions used for treating textiles, and compositions used for manufacturing food, e.g. baking, and feed etc.
Examples of said proteins(polypeptides)/enzymes include enzymes exhibiting protease, lipolytic enzyme, oxidoreductase, carbohydrase, transferase, such as transglutaminase, phytase and/or anti-microbial polypeptide activity. These enzymes may be present as conjugates with reduced activity.
The protein of the invention may furthermore typically be used in a detergent composition. It may be included in the detergent composition in the form of a non-dusting granulate, a stabilized liquid, or a protected enzyme. Non-dusting granulates may be produced, e.g., as disclosed in U.S. Pat. Nos. 4,106,991 and 4,661,452 (both to Novo Industri A/S) and may optionally be coated by methods known in the art. Examples of waxy coating materials are poly(ethylene oxide) products (polyethylene glycol, PEG) with mean molecular weights of 1000 to 20000; ethoxylated nonylphenols having from 16 to 50 ethylene oxide units; ethoxylated fatty alcohols in which the alcohol contains from 12 to 20 carbon atoms and in which there are 15 to 80 ethylene oxide units; fatty alcohols; fatty acids; and mono- and di- and triglycerides of fatty acids. Examples of film-forming coating materials suitable for application by fluid bed techniques are given in patent GB 1483591. Liquid enzyme preparations may, for instance, be stabilized by adding a polyol such as propylene glycol, a sugar or sugar alcohol, lactic acid or boric acid according to established methods. Other enzyme stabilizers are well known in the art. Protected enzymes may be prepared according to the method disclosed in EP 238,216.
The detergent composition may be in any convenient form, e.g. as powder, granules, paste or liquid. A liquid detergent may be aqueous, typically containing up to 70% water and 0-30% organic solvent, or non-aqueous.
The detergent composition comprises one or more surfactants, each of which may be anionic, nonionic, cationic, or zwitterionic. The detergent will usually contain 0-50% of anionic surfactant such as linear alkylbenzenesulfonate (LAS), alpha-olefinsulfonate (AOS), alkyl sulfate (fatty alcohol sulfate) (AS), alcohol ethoxysulfate (AEOS or AES), secondary alkanesulfonates (SAS), alpha-sulfo fatty acid methyl esters, alkyl- or alkenylsuccinic acid, or soap. It may also contain 0-40% of nonionic surfactant such as alcohol ethoxylate (AEO or AE), carboxylated alcohol ethoxylates, nonylphenol ethoxylate, alkylpolyglycoside, alkyldimethylamine oxide, ethoxylated fatty acid monoethanolamide, fatty acid monoethanolamide, or polyhydroxy alkyl fatty acid amide (e.g. as described in WO 92/06154).
The detergent composition may additionally comprise one or more other enzymes, such as e.g. proteases, amylases, lipolytic enzymes, cutinases, cellulases, peroxidases, oxidases, and further anti-microbial polypeptides.
The detergent may contain 1-65% of a detergent builder or complexing agent such as zeolite, diphosphate, triphosphate, phosphonate, citrate, nitrilotriacetic acid (NTA), ethylenediaminetetraacetic acid (EDTA), diethylenetriaminepentaacetic acid (DTMPA), alkyl- or alkenylsuccinic acid, soluble silicates or layered silicates (e.g. SKS-6 from Hoechst). The detergent may also be unbuilt, i.e. essentially free of detergent builder.
The detergent may comprise one or more polymers. Examples are carboxymethylcellulose (CMC), poly(vinylpyrrolidone) (PVP), polyethyleneglycol (PEG), poly(vinyl alcohol) (PVA), polycarboxylates such as polyacrylates, maleic/acrylic acid copolymers and lauryl methacrylate/acrylic acid copolymers.
The detergent may contain a bleaching system which may comprise a H2O2 source such as perborate or percarbonate which may be combined with a peracid-forming bleach activator such as tetraacetylethylenediamine (TAED) or nonanoyloxybenzenesulfonate (NOBS). Alternatively, the bleaching system may comprise peroxyacids of, e.g., the amide, imide, or sulfone type.
The detergent composition of the invention comprising the polypeptide of the invention may be stabilized using conventional stabilizing agents, e.g. a polyol such as propylene glycol or glycerol, a sugar or sugar alcohol, lactic acid, boric acid, or a boric acid derivative such as, e.g., an aromatic borate ester, and the composition may be formulated as described in, e.g., WO 92/19709 and WO 92/19708.
The detergent may also contain other conventional detergent ingredients such as, e.g., fabric conditioners including clays, foam boosters, suds suppressors, anti-corrosion agents, soil-suspending agents, anti-soil-redeposition agents, dyes, bactericides, optical brighteners, or perfume.
The pH (measured in aqueous solution at use concentration) will usually be neutral or alkaline, e.g. in the range of 7-11.
Further, a modified enzyme according to the invention may also be used in dishwashing detergents.
Dishwashing detergent compositions comprise a surfactant which may be anionic, non-ionic, cationic, amphoteric or a mixture of these types. The detergent will contain 0-90% of non-ionic surfactant such as low- to non-foaming ethoxylated propoxylated straight-chain alcohols.
The detergent composition may contain detergent builder salts of inorganic and/or organic types. The detergent builders may be subdivided into phosphorus-containing and non-phosphorus-containing types. The detergent composition usually contains 1-90% of detergent builders.
Examples of phosphorus-containing inorganic alkaline detergent builders, when present, include the water-soluble salts especially alkali metal pyrophosphates, orthophosphates, and polyphosphates. An example of phosphorus-containing organic alkaline detergent builder, when present, includes the water-soluble salts of phosphonates. Examples of non-phosphorus-containing inorganic builders, when present, include water-soluble alkali metal carbonates, borates and silicates as well as the various types of water-insoluble crystalline or amorphous alumino silicates of which zeolites are the best-known representatives.
Examples of suitable organic builders include the alkali metal, ammonium and substituted ammonium citrates, succinates, malonates, fatty acid sulphonates, carboxymethoxy succinates, ammonium polyacetates, carboxylates, polycarboxylates, aminopolycarboxylates, polyacetyl carboxylates and polyhydroxysulphonates.
Other suitable organic builders include the higher molecular weight polymers and co-polymers known to have builder properties, for example appropriate polyacrylic acid, polymaleic and polyacrylic/polymaleic acid copolymers and their salts.
The dishwashing detergent composition may contain bleaching agents of the chlorine/bromine-type or the oxygen-type. Examples of inorganic chlorine/bromine-type bleaches are lithium, sodium or calcium hypochlorite and hypobromite as well as chlorinated trisodium phosphate. Examples of organic chlorine/bromine-type bleaches are heterocyclic N-bromo and N-chloro imides such as trichloroisocyanuric, tribromoisocyanuric, dibromoisocyanuric and dichloroisocyanuric acids, and salts thereof with water-solubilizing cations such as potassium and sodium. Hydantoin compounds are also suitable.
The oxygen bleaches are preferred, for example in the form of an inorganic persalt, preferably with a bleach precursor or as a peroxy acid compound. Typical examples of suitable peroxy bleach compounds are alkali metal perborates, both tetrahydrates and monohydrates, alkali metal percarbonates, persilicates and perphosphates. Preferred activator materials are TAED and glycerol triacetate.
The dishwashing detergent composition of the invention may be stabilized using conventional stabilizing agents for the enzyme(s), e.g. a polyol such as e.g. propylene glycol, a sugar or a sugar alcohol, lactic acid, boric acid, or a boric acid derivative, e.g. an aromatic borate ester.
The dishwashing detergent composition of the invention may also contain other conventional detergent ingredients, e.g. deflocculant material, filler material, foam depressors, anti-corrosion agents, soil-suspending agents, sequestering agents, anti-soil redeposition agents, dehydrating agents, dyes, bactericides, fluorescers, thickeners and perfumes.
Finally, the enzyme of the invention may be used in conventional dishwashing detergents, e.g. in any of the detergents described in any of the following patent publications: EP 518719, EP 518720, EP 518721, EP 516553, EP 516554, EP 516555, GB 2200132, DE 3741617, DE 3727911, DE 4212166, DE 4137470, DE 3833047, WO 93/17089, DE 4205071, WO 92/09680, WO 93/18129, WO 93/04153, WO 92/06157, WO 92/08777, EP 429124, WO 93/21299, U.S. Pat. No. 5,141,664, EP 561452, EP 561446, GB 2234980, WO 93/03129, EP 481547, EP 530870, EP 533239, EP 554943, EP 346137, U.S. Pat. No. 5,112,518, EP 318204, EP 318279, EP 271155, EP 271156, EP 346136, GB 2228945, CA 2006687, WO 93/25651, EP 530635, EP 414197, and U.S. Pat. No. 5,240,632.
A particularly useful application area for low allergenic proteins or for proteins with low cross-reactivity to environmental allergens would be in personal care products where the end-user is in close contact with the protein, and where certain problems with allergenicity have been encountered in experimental set-ups (Kelling et al., J. All. Clin. Imm., 1998, Vol. 101, pp. 179-187 and Johnston et al., Hum. Exp. Toxicol., 1999, Vol. 18, p. 527).
First of all, the conjugate or compositions of the invention can advantageously be used for personal care products, such as hair care and hair treatment products. These include products such as shampoo, balsam, hair conditioners, hair waving compositions, hair dyeing compositions, hair tonic, hair liquid, hair cream, hair rinse and hair spray.
Further contemplated are oral care products such as dentifrice, oral washes, chewing gum.
Also contemplated are skin care products and cosmetics, such as skin cream, skin milk, cleansing cream, cleansing lotion, cleansing milk, cold cream, cream soap, nourishing essence, skin lotion, milky lotion, calamine lotion, hand cream, powder soap, transparent soap, sun oil, sun screen, shaving foam, shaving cream, baby oil, lipstick, lip cream, creamy foundation, face powder, powder eye-shadow, powder, foundation, make-up base, essence powder, whitening powder.
The conjugate of the invention can also be used advantageously in contact lens hygiene products. Such products include cleaning and disinfection products for contact lenses.
Proteases are well-known active ingredients for cleaning of contact lenses. They hydrolyse the proteinaceous soil on the lens and thereby make it soluble. Removal of the protein soil is essential for the wearing comfort.
Proteases are also effective ingredients in skin cleaning products, where they remove the upper layer of dead keratinaceous skin cells and thereby make the skin look brighter and fresher.
Proteases are also used in oral care products, especially for cleaning of dentures, but also in dentifrices.
Further, proteases are used in toiletries, bath and shower products, including shampoos, conditioners, lotions, creams, soap bars, toilet soaps, and liquid soaps.
Lipolytic enzymes can be applied for cosmetic use as active ingredients in skin cleaning products and anti-acne products for removal of excessive skin lipids, and in bath and shower products such as creams and lotions as active ingredients for skin care.
Lipolytic enzymes can also be used in hair cleaning products (e.g. shampoos) for effective removal of sebum and other fatty material from the surface of hair.
Lipolytic enzymes are also effective ingredients in products for cleaning of contact lenses, where they remove lipid deposits from the lens surface.
The most common oxidoreductase for personal care purposes is an oxidase (usually glucose oxidase) with substrate (e.g. glucose) that ensures production of H2O2, which then will initiate the oxidation of for instance SCN− or I− into antimicrobial reagents (SCNO− or I2) by a peroxidase (usually lactoperoxidase). This enzymatic complex is known in nature from e.g. milk and saliva.
It is being utilised commercially as an anti-microbial system in oral care products (mouth rinse, dentifrice, chewing gum), where it can also be combined with an amyloglucosidase to produce the glucose. These systems are also known in cosmetic products for preservation.
Anti-microbial systems comprising the combination of an oxidase and a peroxidase are known in the cleaning of contact lenses.
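The oxidase/peroxidase system described above can be summarised by the following overall reactions. These are the generally accepted textbook stoichiometries for glucose oxidase and lactoperoxidase and are given here as an illustration, not as part of the original disclosure. The thiocyanate oxidation product, hypothiocyanite, is written as OSCN−.

```latex
\begin{align*}
\text{glucose} + \mathrm{O_2}
  &\xrightarrow{\text{glucose oxidase}}
  \text{gluconolactone} + \mathrm{H_2O_2} \\
\mathrm{H_2O_2} + \mathrm{SCN^-}
  &\xrightarrow{\text{lactoperoxidase}}
  \mathrm{OSCN^-} + \mathrm{H_2O} \\
\mathrm{H_2O_2} + 2\,\mathrm{I^-} + 2\,\mathrm{H^+}
  &\xrightarrow{\text{lactoperoxidase}}
  \mathrm{I_2} + 2\,\mathrm{H_2O}
\end{align*}
```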
Another application of oxidoreductases is oxidative hair dyeing using oxidases, peroxidases and laccases.
Free radicals formed on the surface of the skin (and hair) are known to be associated with the ageing process of the skin (spoilage of the hair). The free radicals activate chain reactions that lead to destruction of fatty membranes, collagen, and cells. The application of free radical scavengers such as superoxide dismutase in cosmetics is well known (R. L. Goldemberg, DCI, Nov. 93, p. 48-52).
Protein disulfide isomerase (PDI) is also an oxidoreductase. It can be utilised for waving of hair (reduction and reoxidation of disulfide bonds in hair) and repair of spoiled hair (where the damage is mainly reduction of existing disulfide bonds).
Plaque formed on the surface of teeth is composed mainly of polysaccharides. They stick to the surface of the teeth and the microorganisms. The polysaccharides are mainly α-1,6 bound glucose (dextran) and α-1,3 bound glucose (mutan). The application of different types of glucanases such as mutanase and dextranase helps to hydrolyse the sticky matrix of plaque, making it easier to remove by mechanical action.
Other kinds of biofilm, for instance the biofilm formed in lens cases, can also be removed by the action of glucanases.
Further, conjugated enzymes or polypeptides with reduced immunogenicity according to the invention may advantageously be used in the manufacturing of food and feed.
The gluten in wheat flour is the essential ingredient responsible for the ability of flour to be used in baked foodstuffs. Proteolytic enzymes are sometimes needed to modify the gluten phase of the dough, e.g. a hard wheat flour can be softened with a protease.
Neutrase® is a commercially available neutral metallo protease that can be used to ensure a uniform dough quality and bread texture, and to improve flavour. The gluten proteins are degraded either moderately or more extensively to peptides, whereby close control is necessary in order to avoid excessive softening of the dough.
Proteases are also used for modifying milk protein.
To coagulate casein in milk when producing cheese, proteases such as rennet or chymosin may be used.
In the brewery industry proteases are used for brewing with unmalted cereals and for controlling the nitrogen content.
In animal feed products proteases are used, so to speak, to expand the animal's digestive system.
Addition of lipolytic enzyme results in improved dough properties and an improved breadmaking quality in terms of larger volume, improved crumb structure and whiter crumb colour. The observed effect can be explained by a mechanism where the lipolytic enzyme changes the interaction between gluten and some lipid fragments during dough mixing. This results in an improved gluten network.
The flavour development of blue roan cheese (e.g. Danablue), certain Italian type cheese, and other dairy products containing butter-fat is dependent on the degradation of milk fat into free fatty acids. Lipolytic enzymes may be used for developing flavour in such products.
In the oil- and fat-producing industry lipases are used e.g. to minimize the amount of undesirable side-products, to modify fats by interesterification, and to synthesize esters.
Further, oxidoreductases with reduced immunogenicity according to the invention may advantageously be used in the manufacturing of food and feed.
Several oxidoreductases are used for baking, e.g. glucose oxidase, lipoxygenase, peroxidase, catalase and combinations thereof. Traditionally, bakers strengthen gluten by adding ascorbic acid and potassium bromate. Some oxidoreductases can be used to replace bromate in dough systems by oxidation of free sulfhydryl units in gluten proteins. Hereby disulphide linkages are formed, resulting in stronger, more elastic doughs with greater resistance.
Gluzyme™ (Novozymes A/S) is a glucose oxidase preparation with catalase activity that can be used to replace bromate. The dough strengthening is measured as greater resistance to mechanical shock, better oven spring and larger loaf volume.
Flour has varying content of amylases leading to differences in the baking quality. Addition of amylases can be necessary in order to standardize the flour. Amylases and pentosanases generally provide sugar for the yeast fermentation, improve the bread volume, retard retrogradation, and decrease the staling rate and stickiness that results from pentosan gums. Examples of carbohydrases are given below.
Certain maltogenic amylases can be used for prolonging the shelf life of bread for two or more days without causing gumminess in the product. The enzyme selectively modifies the gelatinized starch by cleaving low-molecular-weight sugars and dextrins from the non-reducing end of the starch molecules. The starch is modified in such a way that retrogradation is less likely to occur. The produced low-molecular-weight sugars improve the baked goods' water retention capacity without creating the intermediate-length dextrins that result in gumminess in the finished product. The enzyme is inactivated during bread baking, so it can be considered a processing aid that does not have to be declared on the label. Overdosing of Novamyl can almost be excluded.
The bread volume can be improved by fungal α-amylases, which further provide good and uniform structure of the bread crumb. Said α-amylases are endoenzymes that produce maltose, dextrins and glucose. Cereal and some bacterial α-amylases are inactivated at temperatures above the gelatinization temperature of starch; therefore, when added to wheat dough, they result in a low bread volume and a sticky bread interior. Fungamyl has the advantage of being thermolabile and is inactivated just below the gelatinization temperature.
Enzyme preparations containing a number of pentosanase and hemi-cellulase activities can improve the handling and stability of the dough, and improve the freshness, the crumb structure and the volume of the bread.
By hydrolysing the pentosan fraction in flour, it will lose a great deal of its water-binding capacity, and the water will then be available for starch and gluten. The gluten becomes more pliable and extensible, and the starch gelatinizes more easily. Pentosanases can be used in combination with or as an alternative to emulsifiers.
Further, carbohydrases are used for producing syrups from starch, which are widely used in soft drinks, sweets, meat products, dairy products, bread products, ice cream, baby food, jam etc.
The conversion of starch is normally carried out in three steps. First the starch is liquefied by the use of alpha-amylases. Maltodextrins, primarily consisting of oligosaccharides and dextrins, are obtained.
The mixture is then treated with an amyloglucosidase for hydrolysing the oligosaccharides and dextrins into glucose. This way a sweeter product is obtained. If high maltose syrups are desired beta-amylases alone or in combination with a pullulanase (de-branching enzyme) may be used.
The glucose mixture can be made even sweeter by isomerization to fructose. For this an immobilized glucose isomerase can be used.
In the sugar industry, it is common practice to speed up the breakdown of starch present in cane juices. Thereby the starch content in the raw sugar is reduced and filtration at the refinery is facilitated.
Furthermore, dextranases are used to break down dextran in raw sugar juices and syrups.
In the alcohol industry alpha-amylases are advantageously used for thinning of starch in distilling mashes.
In the brewing industry alpha-amylases are used for adjunct liquefaction.
In the dairy industry beta-galactosidases (lactase) are used when producing low lactose milk for persons suffering from lactose malabsorption.
When flavoured milk drinks are produced from lactase-treated milk, the addition of sugar can be reduced without reducing the sweetness of the product.
In the production of condensed milk, lactose crystallization can be avoided by lactase treatment, and the risk of thickening caused by casein coagulation in lactose crystals is thus reduced.
When producing ice cream made from lactase-treated milk (or whey) no lactose crystals will be formed and the defect, sandiness, will not occur.
Further, xylanases are known to be used within a number of food/feed industrial applications as described in WO 94/21785 (Novo Nordisk A/S).
Alpha-amylases are used in the animal feed industry to be added to cereal-containing feed to improve the digestibility of starch.
Certain bacteriolytic enzymes may be used e.g. to wash carcasses in the meat packing industry (see U.S. Pat. No. 5,354,681 from Novo Industri A/S).
Transglutaminases with reduced immunogenicity according to the invention may advantageously be used in the manufacturing of food and feed.
Transglutaminases have the ability to crosslink proteins.
This property can be used for gelling of aqueous phases containing proteins. This may be used when producing spreads (DK patent application no. 1071/84 from Novo Nordisk A/S).
Transglutaminases are being used for improvement of baking quality of flour e.g. by modifying wheat flour to be used in the preparation of cakes with improved properties, such as improved taste, dent, mouth-feel and a higher volume (see JP 1-110147).
Further, transglutaminases are used for producing paste type food material, e.g. used as fat substitution in foods such as ice cream, toppings, frozen desserts, mayonnaises and low fat spreads (see WO 93/22930 from Novo Nordisk A/S).
Furthermore, they are used for preparation of gels for yoghurt, mousses, cheese, puddings, orange juice, from milk and milk-like products, and for binding of chopped meat product and improvement of taste and texture of food proteins (see WO 94/21120 and WO 94/21129 from Novo Nordisk A/S).
Phytases of the invention may advantageously be used in the manufacturing of food, such as breakfast cereal, cake, sweets, drinks, bread or soup etc., and animal feed.
Phytases may be used either for exploiting the phosphorus bound in the phytate/phytic acid present in vegetable protein sources or for exploiting the nutritionally important minerals bound in phytic acid complexes.
Microbial phytase may be added to feedstuff of monogastric animals in order to avoid supplementing the feed with inorganic phosphorus (see U.S. Pat. No. 3,297,548).
Further, phytases may be used in soy processing. Soyabean meal may contain high levels of the anti-nutritional factor phytate, which renders this protein source unsuitable for application in baby food and feed for fish, calves and other non-ruminants, since the phytate chelates essential minerals present therein (see EP 0 420 358).
Phytases may also be used for baking purposes. Bread with better quality can be prepared by baking divided pieces of a dough containing wheat flour etc. and phytase (see JP-0-3076529-A).
A high phytase activity, as in koji mold, is known to be used for producing refined sake (see JP-0-6070749-A).
Proteases are used for degumming and sand washing of silk.
Lipolytic enzymes are used for removing fatty matter containing hydrophobic esters (e.g. triglycerides) during the finishing of textiles (see e.g. WO 93/13256 from Novo Nordisk A/S).
In bleach clean up of textiles catalases may serve to remove excess hydrogen peroxide.
Cellulolytic enzymes are widely used in the finishing of denim garments in order to provide a localized variation in the colour density of the fabric (enzyme-facilitated “stone wash”).
Cellulolytic enzymes also find use in the bio-polishing process. Bio-polishing is a specific treatment of the yarn surface which improves fabric quality with respect to handle and appearance without loss of fabric wettability. Bio-polishing may be obtained by applying the method described e.g. in WO 93/20278.
During the weaving of textiles, the threads are exposed to considerable mechanical strain. In order to prevent breaking, the threads are usually reinforced by the coating (sizing) with a gelatinous substance (size). The most common sizing agent is starch in native or modified form. A uniform and durable finish can thus be obtained only after removal of the size from the fabric, the so-called desizing. Desizing of fabrics sized with a size containing starch or modified starch is preferably facilitated by use of amylolytic enzymes.
Different combinations of highly purified proteases (e.g. Trypsin and Chymotrypsin) are used in pharmaceuticals to be taken orally, and dermal pharmaceuticals for combating e.g. inflammations, edemata and injuries.
Transglutaminase is known to be used for casein-finishing of leather, where it acts as a hardening agent (see WO 94/13839 from Novo Nordisk).
Cleaning of hard surfaces, e.g. in the food industry, is often difficult, as equipment used for producing dairy, meat, sea food products, beverages etc. often has a complicated shape. The use of surfactant compositions in the form of gels and foams comprising enzymes has been shown to facilitate and improve hard surface cleaning. Enzymes which may advantageously be added to such surfactant compositions are in particular proteases, lipolytic enzymes, amylases and cellulases.
Such hard surface cleaning compositions comprising enzymes may also advantageously be used in the transport sector, for instance for washing cars and for general vessel wash.
Furthermore this invention relates to the method by which the protein variants are being synthesised and expressed in host cells. This is achieved by culturing host cells capable of expressing a polypeptide in a suitable culture medium to obtain expression and secretion of the polypeptide into the medium, followed by isolation of the polypeptide from the culture medium. The host cell may be any cell suitable for the large-scale production of proteins, capable of expressing a protein and being transformed by an expression vector.
The host cell comprises a DNA construct as defined above; optionally, the cells may be transformed with an expression vector comprising a DNA construct as defined above. The host cell is selected from any suitable cell, such as a bacterial cell, a fungal cell, an animal cell, such as an insect cell or a mammalian cell, or a plant cell.
A number of vaccination approaches have been described for infective diseases as well as for non-infective diseases (such as cancers). In a number of cases, the antigen provided is an isolated protein or protein-adjuvant mixture and, more and more often, the protein is recombinant (e.g. the hepatitis B vaccine from Merck & Co). In these cases, it could be desirable to modify the immunogenicity of the antigen vaccine, such that it offers a stronger or more specific protection. This can be achieved by protein engineering of the amino acid sequence of the antigen, and would be greatly facilitated by the use of the methods of this invention for identifying the epitopes on the antigen vaccine that are the favored sites for modification.
There are several examples of vaccine molecules that have been engineered to achieve a specific immune protection against virus, parasites or cancer (Ryu and Nam, Biotechnol. Prog., 2000, vol. 16 pp. 2-16; and references cited therein). “The goal is often to vaccinate with a minimal structure consisting of a well-defined antigen, to stimulate an effective specific immune response, while avoiding potentially hazardous risks” (Ryu and Nam, Biotechnol. Prog., 2000, vol. 16 pp. 2-16). Thus, the methods of this invention can be used to identify such minimal structures that define an antigen (or epitope thereof), whether in the form of the parent protein scaffold with a number of mutations introduced in it, or in the form of the antibody binding peptides themselves.
Today, a patient suffering from allergic disease may be subjected to allergy vaccine therapy using allergens selected on the basis of testing the specificity of the patient's serum IgE against a bank of allergen extracts (or similar specificity tests of the patient's sensitisation, such as a skin prick test).
One could improve the quality of characterization by using antibody binding peptides corresponding to various epitope sequences on the protein allergens of interest. This would require a kit comprising reagents for such specificity characterization, e.g. the antibody binding peptides of desired specificity. It would be preferred to use antibody binding sequences in the kit which correspond to defined epitope sequences known to be specific for the allergen under investigation (i.e. not identified on other allergens and/or not cross-reacting with sera raised against other allergens). This kit would be useful for specifying which allergy the patient is suffering from. This kit will lead to a more specific answer than those kits used today, and hence to a better selection of allergen vaccine therapy for the individual patient.
Further, the knowledge about cross-reacting epitopes may improve vaccine development.
In an extension of this approach, one could also characterize the patient's serum by identifying the corresponding antibody binding peptides among a random display library using the aforementioned methods. This again may lead to a better selection of allergen vaccine therapy.
Further, one could use the individual antibody binding sequences as allergen vaccines leading to more specific allergen vaccine. These antibody binding sequences could be administered in an isolated form or fused to a membrane protein of the phage display system, or to another protein, which may have beneficial effect for the immunoprotective effect of the antibody binding peptide (Dalum et al., Nature Biotechnology, 1999, Vol. 17, pp. 666-669).
The “parent protein” can in principle be any protein molecule of biological origin, non-limiting examples of which are peptides, polypeptides, proteins, enzymes, post-translationally modified polypeptides such as lipopeptides or glycosylated peptides, anti-microbial peptides or molecules, and proteins having pharmaceutical properties etc.
Accordingly the invention relates to a method, wherein the “parent protein” is chosen from the group consisting of polypeptides, small peptides, lipopeptides, antimicrobials, and pharmaceutical polypeptides.
The term “pharmaceutical polypeptides” is defined as polypeptides, including peptides, such as peptide hormones, proteins and/or enzymes, being physiologically active when introduced into the circulatory system of the body of humans and/or animals.
Pharmaceutical polypeptides are potentially immunogenic as they are introduced into the circulatory system.
Examples of “pharmaceutical polypeptides” contemplated according to the invention include insulin, ACTH, glucagon, somatostatin, somatotropin, thymosin, parathyroid hormone, pigmentary hormones, somatomedin, erythropoietin, luteinizing hormone, chorionic gonadotropin, hypothalamic releasing factors, antidiuretic hormones, thyroid stimulating hormone, relaxin, interferon, thrombopoietin (TPO) and prolactin.
However, the proteins are preferably to be used in industry, housekeeping and/or medicine, such as proteins used in personal care products (for example shampoo; soap; skin, hand and face lotions; skin, hand and face cremes; hair dyes; toothpaste), food (for example in the baking industry), detergents and pharmaceuticals.
The antimicrobial peptide (AMP) may be, e.g., a membrane-active antimicrobial peptide, or an antimicrobial peptide affecting/interacting with intracellular targets, e.g. binding to cell DNA. The AMP is generally a relatively short peptide, consisting of less than 100 amino acid residues, typically 20-80 residues. The antimicrobial peptide has bactericidal and/or fungicidal effect, and it may also have antiviral or antitumour effects. It generally has low cytotoxicity against normal mammalian cells.
The antimicrobial peptide is generally highly cationic and hydrophobic. It typically contains several arginine and lysine residues, and it may not contain a single glutamate or aspartate. It usually contains a large proportion of hydrophobic residues. The peptide generally has an amphiphilic structure, with one surface being highly positive and the other hydrophobic.
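The sequence features described above (length, cationic character, hydrophobic content) can be checked with a short script. The following is a minimal sketch only: the residue groupings, the function name `amp_profile` and the toy peptide are assumptions chosen for illustration and are not part of the disclosure; histidine and the peptide termini are ignored when estimating charge.

```python
# Illustrative sketch; residue classes and the toy peptide are assumptions.
CATIONIC = set("KR")            # lysine, arginine
ANIONIC = set("DE")             # aspartate, glutamate
HYDROPHOBIC = set("AILMFVW")    # one simplified choice of hydrophobic residues


def amp_profile(seq: str) -> dict:
    """Summarise length, approximate net charge and hydrophobic content."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(r in CATIONIC for r in seq)
    negative = sum(r in ANIONIC for r in seq)
    hydrophobic = sum(r in HYDROPHOBIC for r in seq)
    return {
        "length": len(seq),
        "net_charge": positive - negative,          # K/R minus D/E only
        "hydrophobic_fraction": hydrophobic / len(seq),
        "shorter_than_100": len(seq) < 100,         # "less than 100 residues"
        "in_typical_range": 20 <= len(seq) <= 80,   # "typically 20-80 residues"
    }


# Hypothetical peptide, not a sequence taken from the text.
toy_peptide = "KRLLKKLARKLLKWAIRKLFKA"
profile = amp_profile(toy_peptide)
```

Such a profile only screens for the general properties listed above; it says nothing about actual antimicrobial activity or cytotoxicity.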
The bioactive peptide and the encoding nucleotide sequence may be derived from plants, invertebrates, insects, amphibians and mammals, or from microorganisms such as bacteria and fungi.
The antimicrobial peptide may act on cell membranes of target microorganisms, e.g. through nonspecific binding to the membrane, usually in a membrane-parallel orientation, interacting only with one face of the bilayer.
The antimicrobial peptide typically has a structure belonging to one of five major classes: alpha-helical, cystine-rich (defensin-like), beta-sheet, peptides with an unusual composition of regular amino acids, and peptides containing uncommon modified amino acids.
Examples of alpha-helical peptides are Magainin 1 and 2; Cecropin A, B and P1; CAP18; Andropin; Clavanin A or AK; Styelin D and C; and Buforin II. Examples of cystine-rich peptides are alpha-Defensin HNP-1 (human neutrophil peptide), HNP-2 and HNP-3; beta-Defensin-12, Drosomycin, gamma1-purothionin, and Insect defensin A. Examples of beta-sheet peptides are Lactoferricin B, Tachyplesin I, and Protegrin PG1-5. Examples of peptides with an unusual composition are Indolicidin; PR-39; Bactenecin Bac5 and Bac7; and Histatin 5. Examples of peptides with unusual amino acids are Nisin, Gramicidin A, and Alamethicin.
Another example is the antifungal peptide (AFP) from Aspergillus giganteus. As explained in detail in WO 94/01459, which is hereby incorporated by reference, the antifungal polypeptide having the amino acid sequence shown in
However, the antifungal polypeptide, or variants thereof, suitable for the use according to the invention are expected to be derivable from other fungal species, especially other Aspergillus species such as A. pallidus, A. clavatus, A. longivesica, A. rhizopodus and A. clavatonanicus, because of the close relationship which exists between these species and A. giganteus.
In one embodiment of the invention the protein is an enzyme, such as glycosyl hydrolases, carbohydrases, peroxidases, proteases, lipolytic enzymes, phytases, polysaccharide lyases, oxidoreductases, transglutaminases and glycoseisomerases, in particular the following.
Parent proteases (i.e. enzymes classified under the Enzyme Classification number E.C. 3.4 in accordance with the Recommendations (1992) of the International Union of Biochemistry and Molecular Biology (IUBMB)) include proteases within this group.
Examples include proteases selected from those classified under the Enzyme Classification (E.C.) numbers:
3.4.11 (i.e. so-called aminopeptidases), including 3.4.11.5 (Prolyl aminopeptidase), 3.4.11.9 (X-Pro aminopeptidase), 3.4.11.10 (Bacterial leucyl aminopeptidase), 3.4.11.12 (Thermophilic aminopeptidase), 3.4.11.15 (Lysyl aminopeptidase), 3.4.11.17 (Tryptophanyl aminopeptidase) and 3.4.11.18 (Methionyl aminopeptidase);
3.4.21 (i.e. so-called serine endopeptidases), including 3.4.21.1 (Chymotrypsin), 3.4.21.4 (Trypsin), 3.4.21.25 (Cucumisin), 3.4.21.32 (Brachyurin), 3.4.21.48 (Cerevisin) and 3.4.21.62 (Subtilisin);
3.4.22 (i.e. so-called cysteine endopeptidases), including 3.4.22.2 (Papain), 3.4.22.3 (Ficain), 3.4.22.6 (Chymopapain), 3.4.22.7 (Asclepain), 3.4.22.14 (Actinidain), 3.4.22.30 (Caricain) and 3.4.22.31 (Ananain);
3.4.23 (i.e. so-called aspartic endopeptidases), including 3.4.23.1 (Pepsin A), 3.4.23.18 (Aspergillopepsin I), 3.4.23.20 (Penicillopepsin) and 3.4.23.25 (Saccharopepsin); and
3.4.24 (i.e. so-called metalloendopeptidases), including 3.4.24.28 (Bacillolysin).
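The grouping listed above can be expressed as a simple lookup from the first three fields of an E.C. number to the protease group named in the text. This is a minimal sketch; the names `PROTEASE_GROUPS` and `protease_group` are hypothetical, and only the five sub-subclasses mentioned above are covered.

```python
from typing import Optional

# Sub-subclasses and group names as listed in the text above.
PROTEASE_GROUPS = {
    "3.4.11": "aminopeptidases",
    "3.4.21": "serine endopeptidases",
    "3.4.22": "cysteine endopeptidases",
    "3.4.23": "aspartic endopeptidases",
    "3.4.24": "metalloendopeptidases",
}


def protease_group(ec_number: str) -> Optional[str]:
    """Return the group for an E.C. number such as '3.4.21.62', or None."""
    parts = ec_number.strip().split(".")
    if len(parts) < 3:
        return None
    # The first three fields identify the sub-subclass, e.g. '3.4.21'.
    return PROTEASE_GROUPS.get(".".join(parts[:3]))


group = protease_group("3.4.21.62")  # subtilisin, per the list above
```

Numbers outside the five listed sub-subclasses simply return `None` in this sketch.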
A serine protease is an enzyme which catalyzes the hydrolysis of peptide bonds, and in which there is an essential serine residue at the active site (White, Handler and Smith, 1973 “Principles of Biochemistry,” Fifth Edition, McGraw-Hill Book Company, NY, pp. 271-272).
The bacterial serine proteases have molecular weights in the 20,000 to 45,000 Dalton range. They are inhibited by diisopropylfluorophosphate. They hydrolyze simple terminal esters and are similar in activity to eukaryotic chymotrypsin, also a serine protease. A more narrow term, alkaline protease, covering a sub-group, reflects the high pH optimum of some of the serine proteases, from pH 9.0 to 11.0 (for review, see Priest (1977) Bacteriological Rev. 41 711-753).
A sub-group of the serine proteases tentatively designated subtilases has been proposed by Siezen et al., Protein Engng. 4 (1991) 719-737 and Siezen et al. Protein Science 6 (1997) 501-523. They are defined by homology analysis of more than 170 amino acid sequences of serine proteases previously referred to as subtilisin-like proteases. A subtilisin was previously often defined as a serine protease produced by Gram-positive bacteria or fungi, and according to Siezen et al. now is a subgroup of the subtilases. A wide variety of subtilases have been identified, and the amino acid sequence of a number of subtilases has been determined. For a more detailed description of such subtilases and their amino acid sequences reference is made to Siezen et al., (1997).
One subgroup of the subtilases may be classified as savinase-like subtilisins, having at least 81% homology to Savinase, preferably at least 85% homology, more preferably at least 90% homology, even more preferably at least 96% homology, most preferably at least 98% homology to Savinase.
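The graded homology thresholds above can be written as a small classification helper. This is a sketch under stated assumptions: the percent homology to Savinase is taken as already computed by some alignment method, and the function name and tier labels are hypothetical shorthand for the wording of the preceding paragraph.

```python
from typing import Optional

# Thresholds from the paragraph above, highest first; labels are shorthand.
SAVINASE_TIERS = [
    (98.0, "most preferred"),
    (96.0, "even more preferred"),
    (90.0, "more preferred"),
    (85.0, "preferred"),
    (81.0, "savinase-like"),
]


def savinase_tier(percent_homology: float) -> Optional[str]:
    """Return the highest tier met, or None below the 81% minimum."""
    if not 0.0 <= percent_homology <= 100.0:
        raise ValueError("percent homology must lie between 0 and 100")
    for threshold, label in SAVINASE_TIERS:
        if percent_homology >= threshold:
            return label
    return None


tier = savinase_tier(97.5)
```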
The term “parent subtilase” describes a subtilase defined according to Siezen et al. (1991 and 1997). For further details see description of “SUBTILASES” immediately above. A parent subtilase may also be a subtilase isolated from a natural source, wherein subsequent modifications have been made while retaining the characteristic of a subtilase. Furthermore, a parent subtilase may also be a subtilase which has been prepared by the DNA shuffling technique, such as described by J. E. Ness et al., Nature Biotechnology, 17, 893-896 (1999).
Alternatively the term “parent subtilase” may be termed “wild type subtilase”.
The term “modification(s)” used herein is defined to include chemical modification of a subtilase as well as genetic manipulation of the DNA encoding a subtilase. The modification(s) can be replacement(s) of the amino acid side chain(s), substitution(s), deletion(s) and/or insertions in or at the amino acid(s) of interest.
In the context of this invention, the term subtilase variant or mutated subtilase means a subtilase that has been produced by an organism which is expressing a mutant gene derived from a parent microorganism which possessed an original or parent gene and which produced a corresponding parent enzyme, the parent gene having been mutated in order to produce the mutant gene from which said mutated subtilase protease is produced when expressed in a suitable host.
Examples of relevant subtilisins comprise subtilisin BPN′, subtilisin amylosacchariticus, subtilisin 168, subtilisin mesentericopeptidase, subtilisin Carlsberg, subtilisin DY, subtilisin 309, subtilisin 147, PD498 (WO 93/24623), thermitase, aqualysin, Bacillus PB92 protease, proteinase K, Protease TW7, and Protease TW3.
Preferred commercially available protease enzymes include Alcalase™, Savinase™, Primase™, Duralase™, Neutrase®, Dyrazym®, Esperase™, Pyrase®, Pancreatic Trypsin NOVO (PTN), Bio-Feed™ Pro, Clear-Lens Pro, and Relase® (Novozymes A/S), Maxatase™, Maxacal™, Maxapem™, Properase™, Purafect™, Purafect OxP™ (Genencor International Inc.).
It is to be understood that also protease variants are contemplated as the parent protease. Examples of such protease variants are disclosed in EP 130.756 (Genentech), EP 214.435 (Henkel), WO 87/04461 (Amgen), WO 87/05050 (Genex), EP 251.446 (Genencor), EP 260.105 (Genencor), Thomas et al., (1985), Nature. 318, p. 375-376, Thomas et al., (1987), J. Mol. Biol., 193, pp. 803-813, Russel et al., (1987), Nature, 328, p. 496-500, WO 88/08028 (Genex), WO 88/08033 (Amgen), WO 89/06279 (Novo Nordisk A/S), WO 91/00345 (Novo Nordisk A/S), EP 525 610 (Solvay) and WO 94/02618 (Gist-Brocades N.V.).
The activity of proteases can be determined as described in “Methods of Enzymatic Analysis”, third edition, 1984, Verlag Chemie, Weinheim, vol. 5.
Lipolytic enzymes are classified in EC 3.1.1 Carboxylic Ester Hydrolases according to Enzyme Nomenclature (available at http://www.chem.qmw.ac.uk/iubmb/enzyme). The lipolytic enzyme may have a substrate specificity with an activity such as EC 3.1.1.3 triacylglycerol lipase, EC 3.1.1.4 phospholipase A2, EC 3.1.1.5 lysophospholipase, EC 3.1.1.26 galactolipase, EC 3.1.1.32 phospholipase A1, EC 3.1.1.73 feruloyl esterase or EC 3.1.1.74 cutinase.
The parent lipolytic enzyme may be prokaryotic, particularly a bacterial enzyme, e.g. from Pseudomonas. Examples are Pseudomonas lipases, e.g. from P. cepacia (U.S. Pat. No. 5,290,694, pdb file 1OIL), P. glumae (N Frenken et al. (1992), Appl. Envir. Microbiol. 58 3787-3791, pdb files 1TAH and 1QGE), P. pseudoalcaligenes (EP 334 462) and Pseudomonas sp. strain SD 705 (FERM BP-4772) (WO 95/06720, EP 721 981, WO 96/27002, EP 812 910). The P. glumae lipase sequence is identical to the amino acid sequence of Chromobacterium viscosum (DE 3908131 A1). Other examples are bacterial cutinases, e.g. from Pseudomonas such as P. mendocina (U.S. Pat. No. 5,389,536) or P. putida (WO 88/09367).
Alternatively, the parent lipolytic enzyme may be eukaryotic, e.g. a fungal lipolytic enzyme such as lipolytic enzymes of the Humicola family and the Zygomycetes family and fungal cutinases.
Examples of fungal cutinases are the cutinases of Fusarium solani pisi (S. Longhi et al., Journal of Molecular Biology, 268 (4), 779-799 (1997)) and Humicola insolens (U.S. Pat. No. 5,827,719).
The parent lipolytic enzyme may be fungal and may have an amino acid sequence that can be aligned with SEQ ID NO: 1 which is the amino acid sequence shown in positions 1-269 of SEQ ID NO: 2 of U.S. Pat. No. 5,869,438 for the lipase from Thermomyces lanuginosus (synonym Humicola lanuginosa), described in EP 258 068 and EP 305 216 (trade name LIPOLASE). The parent lipolytic enzyme may particularly have an amino acid sequence with at least 50% homology with SEQ ID NO: 1. In addition to the lipase from T. lanuginosus, other examples are a lipase from Penicillium camembertii (P25234), a lipase from Fusarium, lipase/phospholipase from Fusarium oxysporum (EP 130064, WO 98/26057), lipase from F. heterosporum (R87979), lysophospholipase from Aspergillus foetidus (W33009), phospholipase A1 from A. oryzae (JP-A 10-155493), lipase from A. oryzae (D85895), lipase/ferulic acid esterase from A. niger (Y09330), lipase/ferulic acid esterase from A. tubingensis (Y09331), lipase from A. tubingensis (WO 98/45453), lysophospholipase from A. niger (WO 98/31790), and lipase from F. solani having an isoelectric point of 6.9 and an apparent molecular weight of 30 kDa (WO 96/18729).
Other examples are the Zygomycetes family of lipases comprising lipases having at least 50% homology with the lipase of Rhizomucor miehei (P19515). This family also includes the lipases from Absidia reflexa, A. sporophora, A. corymbifera, A. blakesleeana, A. griseola (all described in WO 96/13578 and WO 97/27276) and Rhizopus oryzae (P21811). Numbers in parentheses indicate publication or accession to the EMBL, GenBank, GeneSeqp or Swiss-Prot databases.
Examples of lipases include lipases derived from the following microorganisms. The indicated patent publications are incorporated herein by reference:
Humicola, e.g. H. brevispora, H. brevis var. thermoidea.
Pseudomonas, e.g. Ps. fragi, Ps. stutzeri, Ps. cepacia and Ps. fluorescens (WO 89/04361), or Ps. plantarii or Ps. gladioli (U.S. Pat. No. 4,950,417 (Solvay enzymes)) or Ps. alcaligenes and Ps. pseudoalcaligenes (EP 218 272).
Candida, e.g. C. cylindracea (also called C. rugosa) or C. antarctica (WO 88/02775) or C. antarctica lipase A or B (WO 94/01541 and WO 89/02916).
Geotrichum, e.g. G. candidum (Shimada et al., (1989), J. Biochem., 106, 383-388).
Rhizopus, e.g. R. delemar (Haas et al., (1991), Gene 109, 107-113) or R. niveus (Kugimiya et al., (1992) Biosci. Biotech. Biochem 56, 716-719) or R. oryzae.
Bacillus, e.g. B. subtilis (Dartois et al., (1993) Biochimica et Biophysica Acta 1131, 253-260) or B. stearothermophilus (JP 64/7744992) or B. pumilus (WO 91/16422).
Specific examples of readily available commercial lipases include Lipolase® (WO 98/35026), Lipolase™ Ultra, Lipozyme®, Palatase®, Novozym® 435, Lecitase® (all available from Novozymes A/S).
Examples of other lipases are Lumafast™, Ps. mendocina lipase from Genencor Int. Inc.; Lipomax™, Ps. pseudoalcaligenes lipase from Gist Brocades/Genencor Int. Inc.; Fusarium solani lipase (cutinase) from Unilever; Bacillus sp. lipase from Solvay enzymes. Other lipases are available from other companies.
It is to be understood that also lipase variants are contemplated as the parent enzyme. Examples of such are described in e.g. WO 93/01285 and WO 95/22615.
The activity of the lipase can be determined as described in “Methods of Enzymatic Analysis”, Third Edition, 1984, Verlag Chemie, Weinheim, vol. 4, or as described in AF 95/5 GB (available on request from Novozymes A/S).
Parent oxidoreductases (i.e. enzymes classified under the Enzyme Classification number E.C. 1 (Oxidoreductases) in accordance with the Recommendations (1992) of the International Union of Biochemistry and Molecular Biology (IUBMB)) include oxidoreductases within this group.
Examples include oxidoreductases selected from those classified under the Enzyme Classification (E.C.) numbers:
Glycerol-3-phosphate dehydrogenase (NAD) (1.1.1.8), Glycerol-3-phosphate dehydrogenase [NAD(P)] (1.1.1.94), Glycerol-3-phosphate 1-dehydrogenase [NADP] (1.1.1.94), Glucose oxidase (1.1.3.4), Hexose oxidase (1.1.3.5), Catechol oxidase (1.1.3.14), Bilirubin oxidase (1.3.3.5), Alanine dehydrogenase (1.4.1.1), Glutamate dehydrogenase (1.4.1.2), Glutamate dehydrogenase [NAD(P)] (1.4.1.3), Glutamate dehydrogenase (NADP) (1.4.1.4), L-Amino acid dehydrogenase (1.4.1.5), Serine dehydrogenase (1.4.1.7), Valine dehydrogenase (NADP) (1.4.1.8), Leucine dehydrogenase (1.4.1.9), Glycine dehydrogenase (1.4.1.10), L-Amino-acid oxidase (1.4.3.2), D-Amino-acid oxidase (1.4.3.3), L-Glutamate oxidase (1.4.3.11), Protein-lysine 6-oxidase (1.4.3.13), L-lysine oxidase (1.4.3.14), L-Aspartate oxidase (1.4.3.16), D-amino-acid dehydrogenase (1.4.99.1), Protein disulfide reductase (1.6.4.4), Thioredoxin reductase (1.6.4.5), Protein disulfide reductase (glutathione) (1.8.4.2), Laccase (1.10.3.2), Catalase (1.11.1.6), Peroxidase (1.11.1.7), Lipoxygenase (1.13.11.12), Superoxide dismutase (1.15.1.1).
Said glucose oxidases may be derived from Aspergillus niger.
Said laccases may be derived from Polyporus pinsitus, Myceliophthora thermophila, Coprinus cinereus, Rhizoctonia solani, Rhizoctonia praticola, Scytalidium thermophilum and Rhus vernicifera. Because of the homology found between the above mentioned laccases (see WO 98/38287), they are considered to belong to the same class of laccases, namely the class of “Coprinus-like laccases”. Accordingly, in the present context, the term “Coprinus-like laccase” is intended to indicate a laccase which, on the amino acid level, displays a homology of at least 50% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 55% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 60% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 65% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 70% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 75% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 80% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 85% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, or at least 90% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3, at least 95% and less than 100% or at least 98% and less than 100% to the Coprinus cinereus laccase SEQ ID NO: 3.
Bilirubin oxidases may be derived from Myrothecium verrucaria.
The peroxidase may be derived from e.g. Soy bean, Horseradish or Coprinus cinereus.
The protein disulfide reductase may be any of those mentioned in Danish application nos. 768/93, 265/94 and 264/94 (Novo Nordisk A/S), which are hereby incorporated as references, including Protein Disulfide reductases of bovine origin, Protein Disulfide reductases derived from Aspergillus oryzae or Aspergillus niger, and DsbA or DsbC derived from Escherichia coli.
Specific examples of readily available commercial oxidoreductases include Gluzyme™ (enzyme available from Novozymes A/S). However, other oxidoreductases are available from others.
It is to be understood that also variants of oxidoreductases are contemplated as the parent enzyme.
The activity of oxidoreductases can be determined as described in “Methods of Enzymatic Analysis”, third edition, 1984, Verlag Chemie, Weinheim, vol. 3.
Parent carbohydrases may be defined as all enzymes capable of breaking down carbohydrate chains (e.g. starches) of especially five and six member ring structures (i.e. enzymes classified under the Enzyme Classification number E.C. 3.2 (glycosidases) in accordance with the Recommendations (1992) of the International Union of Biochemistry and Molecular Biology (IUBMB)). Also included in the group of carbohydrases according to the invention are enzymes capable of isomerizing carbohydrates e.g. six member ring structures, such as D-glucose to e.g. five member ring structures like D-fructose.
Examples include carbohydrases selected from those classified under the Enzyme Classification (E.C.) numbers: alpha-amylase (3.2.1.1), beta-amylase (3.2.1.2), glucan 1,4-alpha-glucosidase (3.2.1.3), cellulase (3.2.1.4), endo-1,3(4)-beta-glucanase (3.2.1.6), endo-1,4-beta-xylanase (3.2.1.8), dextranase (3.2.1.11), chitinase (3.2.1.14), polygalacturonase (3.2.1.15), lysozyme (3.2.1.17), beta-glucosidase (3.2.1.21), alpha-galactosidase (3.2.1.22), beta-galactosidase (3.2.1.23), amylo-1,6-glucosidase (3.2.1.33), xylan 1,4-beta-xylosidase (3.2.1.37), glucan endo-1,3-beta-D-glucosidase (3.2.1.39), alpha-dextrin endo-1,6-glucosidase (3.2.1.41), sucrose alpha-glucosidase (3.2.1.48), glucan endo-1,3-alpha-glucosidase (3.2.1.59), glucan 1,4-beta-glucosidase (3.2.1.74), glucan endo-1,6-beta-glucosidase (3.2.1.75), arabinan endo-1,5-alpha-arabinosidase (3.2.1.99), lactase (3.2.1.108), chitosanase (3.2.1.132) and xylose isomerase (5.3.1.5).
Examples of relevant carbohydrases include alpha-1,3-glucanases derived from Trichoderma harzianum; alpha-1,6-glucanases derived from a strain of Paecilomyces; beta-glucanases derived from Bacillus subtilis; beta-glucanases derived from Humicola insolens; beta-glucanases derived from Aspergillus niger; beta-glucanases derived from a strain of Trichoderma; beta-glucanases derived from a strain of Oerskovia xanthineolytica; exo-1,4-alpha-D-glucosidases (glucoamylases) derived from Aspergillus niger; alpha-amylases derived from Bacillus subtilis; alpha-amylases derived from Bacillus amyloliquefaciens; alpha-amylases derived from Bacillus stearothermophilus; alpha-amylases derived from Aspergillus oryzae; alpha-amylases derived from non-pathogenic microorganisms; alpha-galactosidases derived from Aspergillus niger; pentosanases, xylanases, cellobiases, cellulases, hemi-cellulases derived from Humicola insolens; cellulases derived from Trichoderma reesei; cellulases derived from non-pathogenic mold; pectinases, cellulases, arabinases, hemi-cellulases derived from Aspergillus niger; dextranases derived from Penicillium lilacinum; endo-glucanase derived from non-pathogenic mold; pullulanases derived from Bacillus acidopullyticus; beta-galactosidases derived from Kluyveromyces fragilis; xylanases derived from Trichoderma reesei.
Specific examples of readily available commercial carbohydrases include Alpha-Gal™, Bio-Feed™ Alpha, Bio-Feed™ Beta, Bio-Feed™ Plus, Novozyme® 188, Carezyme® (SEQ ID NO: 5), Celluclast®, Cellusoft®, Ceremyl®, Citrozym™, Denimax™, Dezyme™, Dextrozyme™, Finizym®, Fungamyl™, Gamanase™, Glucanex®, Lactozym®, Maltogenase™, Pentopan™, Pectinex™, Promozyme®, Pulpzyme™, Novamyl™, Termamyl®, AMG (Amyloglucosidase Novo), Maltogenase®, Sweetzyme®, Aquazym®, Natalase® (SEQ ID NO: 4), SP722, AA560 (all enzymes available from Novozymes A/S). Other carbohydrases are available from other companies.
The parent cellulase is preferably a microbial cellulase. As such, the cellulase may be selected from bacterial cellulases, e.g. Pseudomonas cellulases or Bacillus, such as the Bacillus strains described in U.S. Pat. No. 4,822,516, U.S. Pat. No. 5,045,464 or EP 468 464, or B. lautus (cf. WO 91/10732), cellulases. More preferably, the parent cellulases may be a fungal cellulase, in particular Humicola, Trichoderma, Irpex, Aspergillus, Penicillium, Myceliophthora or Fusarium cellulases. Examples of suitable parent cellulases are described in, e.g. WO 91/17244. Examples of suitable Trichoderma cellulases are those described in T. T. Teeri, Gene 51, 1987, pp. 43-52. Preferably, the parent cellulase is selected from the cellulases classified in family 45, e.g. the enzymes EG B (Pseudomonas fluorescens) and EG V (Humicola insolens), as described in Henrissat, B. et al.: Biochem. J. (1993), 293, p. 781-788.
It is well known that a number of alpha-amylases produced by Bacillus spp. are highly homologous on the amino acid level. For instance, the B. licheniformis alpha-amylase comprising the amino acid sequence shown in SEQ ID NO: 4 of WO 00/29560 (commercially available as Termamyl®) has been found to be about 89% homologous with the B. amyloliquefaciens alpha-amylase comprising the amino acid sequence shown in SEQ ID NO: 5 of WO 00/29560 and about 79% homologous with the B. stearothermophilus alpha-amylase comprising the amino acid sequence shown in SEQ ID NO: 3 of WO 00/29560. Further homologous alpha-amylases include an alpha-amylase derived from a strain of the Bacillus sp. NCIB 12289, NCIB 12512, NCIB 12513 or DSM 9375, all of which are described in detail in WO 95/26397, and the alpha-amylase described by Tsukamoto et al., Biochemical and Biophysical Research Communications, 151 (1988), pp. 25-31.
Still further homologous alpha-amylases include the alpha-amylase produced by the B. licheniformis strain described in EP 0252666 (ATCC 27811), and the alpha-amylases identified in WO 91/00353 and WO 94/18314. Other commercial Termamyl-like B. licheniformis alpha-amylases are Optitherm® and Takatherm® (available from Solvay), Maxamyl® (available from Gist-brocades/Genencor), Spezyme AA® and Spezyme Delta AA™ (available from Genencor), and Keistase® (available from Daiwa).
Because of the substantial homology found between these alpha-amylases, they are considered to belong to the same class of alpha-amylases, namely the class of “Termamyl-like alpha-amylases”.
Accordingly, in the present context, the term “Termamyl-like alpha-amylase” is intended to indicate an alpha-amylase which, at the amino acid level, exhibits a substantial homology to Termamyl®, i.e., the B. licheniformis alpha-amylase having the amino acid sequence shown in SEQ ID NO: 4 (WO 00/29560). In other words, a Termamyl-like alpha-amylase is an alpha-amylase which has the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7 or 8 of WO 00/29560, and the amino acid sequence shown in SEQ ID NO: 1 of WO 95/26397 (the same as the amino acid sequence shown as SEQ ID NO: 7 of WO 00/29560) or in SEQ ID NO: 2 of WO 95/26397 (the same as the amino acid sequence shown as SEQ ID NO: 8 of WO 00/29560) or in Tsukamoto et al., 1988, (which amino acid sequence is shown in SEQ ID NO: 6 of WO 00/29560) or i) which displays at least 60% homology (identity), preferred at least 70%, more preferred at least 75%, even more preferred at least 80%, especially at least 85%, especially preferred at least 90%, especially at least 95%, even especially more preferred at least 97%, especially at least 99% homology with at least one of said amino acid sequences shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7 or 8 of WO 00/29560 and/or ii) displays immunological cross-reactivity with an antibody raised against one or more of said alpha-amylases, and/or iii) is encoded by a DNA sequence which hybridizes, under the low to very high stringency conditions (said conditions described below) to the DNA sequences encoding the above-specified alpha-amylases which are apparent from SEQ ID NOS: 9, 10, 11, 12, and 32, respectively, of the present application (which encodes the amino acid sequences shown in SEQ ID NOS: 1, 2, 3, 4, and 5 herein, respectively), from SEQ ID NO: 4 of WO 95/26397 (which DNA sequence, together with the stop codon TAA, is shown in SEQ ID NO: 13 herein and encodes the amino acid sequence shown in SEQ ID NO: 8 herein) and from SEQ ID NO: 5 of WO 95/26397 (shown in SEQ ID NO: 14 herein), respectively.
In connection with property i), the “homology” (identity) may be determined by use of any conventional algorithm, preferably by use of the GAP programme from the GCG package version 8 (August 1994) using default values for gap penalties, i.e., a gap creation penalty of 3.0 and gap extension penalty of 0.1 (Genetics Computer Group (1991) Programme Manual for the GCG Package, version 8, 575 Science Drive, Madison, Wis., USA 53711).
The parent Termamyl-like alpha-amylase backbone may in an embodiment have an amino acid sequence which has a degree of identity to SEQ ID NO: 4 (WO 00/29560) of at least 65%, preferably at least 70%, preferably at least 75%, more preferably at least 80%, more preferably at least 85%, even more preferably at least about 90%, even more preferably at least 95%, even more preferably at least 97%, and even more preferably at least 99% identity determined as described above.
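As an illustration of how a degree of identity can be read off an existing pairwise alignment, the following minimal Python sketch counts identical residues in two already-aligned sequences. It does not reproduce the GAP program itself. The function name `percent_identity`, the `-` gap character, and the choice to leave gapped columns out of the denominator are assumptions made for the example only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("ACDE-G", "ACDFHG"))
```

A parent sequence would then be compared against the thresholds listed above (for example, at least 65% identity to SEQ ID NO: 4).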
A structural alignment between Termamyl® (SEQ ID NO: 4) and a Termamyl-like alpha-amylase may be used to identify equivalent/corresponding positions in other Termamyl-like alpha-amylases. One method of obtaining said structural alignment is to use the Pile Up programme from the GCG package using default values of gap penalties, i.e., a gap creation penalty of 3.0 and gap extension penalty of 0.1. Other structural alignment methods include the hydrophobic cluster analysis (Gaboriaud et al., (1987), FEBS LETTERS 224, pp. 149-155) and reverse threading (Huber, T; Torda, A E, PROTEIN SCIENCE Vol. 7, No. 1 pp. 142-149 (1998)).
Parent glucoamylases contemplated according to the present invention include fungal glucoamylases, in particular fungal glucoamylases obtainable from an Aspergillus strain, such as Aspergillus niger or Aspergillus awamori glucoamylases and variants or mutants thereof, homologous glucoamylases, and further glucoamylases being structurally and/or functionally similar to SEQ ID NO: 2 (WO 00/04136). Specifically contemplated are the Aspergillus niger glucoamylases G1 and G2 disclosed in Boel et al. (1984), “Glucoamylases G1 and G2 from Aspergillus niger are synthesized from two different but closely related mRNAs”, EMBO J. 3 (5), p. 1097-1102. The G2 glucoamylase is disclosed in SEQ ID NO: 2 (WO 00/04136). The G1 glucoamylase is disclosed in SEQ ID NO: 13 (WO 00/04136). Another AMG backbone contemplated is Talaromyces emersonii, especially Talaromyces emersonii DSM disclosed in WO 99/28448 (Novo Nordisk).
The homology referred to above of the parent glucoamylase is determined as the degree of identity between two protein sequences indicating a derivation of the first sequence from the second. The homology may suitably be determined by means of computer programs known in the art such as GAP provided in the GCG program package (Program Manual for the Wisconsin Package, Version 8, August 1994, Genetics Computer Group, 575 Science Drive, Madison, Wis., USA 53711) (Needleman, S. B. and Wunsch, C. D., (1970), Journal of Molecular Biology, 48, p. 443-453). Using Gap with the following settings for polypeptide sequence comparison: Gap creation penalty of 3.0 and Gap extension penalty of 0.1, the mature part of a polypeptide encoded by an analogous DNA sequence of the invention exhibits a degree of identity preferably of at least 60%, such as 70%, at least 80%, at least 90%, more preferably at least 95%, more preferably at least 97%, and most preferably at least 99% with the mature part of the amino acid sequence shown in SEQ ID NO: 2 (WO 00/04136).
Preferably, the parent glucoamylase comprises the amino acid sequence of SEQ ID NO: 2 (WO 00/04136); or allelic variants thereof; or fragments thereof that have glucoamylase activity.
A fragment of SEQ ID NO: 2 is a polypeptide which has one or more amino acids deleted from the amino and/or carboxyl terminus of this amino acid sequence. For instance, the AMG G2 (SEQ ID NO: 2) is a fragment of the Aspergillus niger G1 glucoamylase (Boel et al. (1984), EMBO J. 3 (5), p. 1097-1102) having glucoamylase activity. An allelic variant denotes any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in polymorphism within populations. Gene mutations can be silent (no change in the encoded polypeptide) or may encode polypeptides having altered amino acid sequences. An allelic variant of a polypeptide is a polypeptide encoded by an allelic variant of a gene.
It is to be understood that also carbohydrase variants are contemplated as the parent enzyme.
The activity of carbohydrases can be determined as described in “Methods of Enzymatic Analysis”, third edition, 1984, Verlag Chemie, Weinheim, vol. 4.
Parent transferases (i.e. enzymes classified under the Enzyme Classification number E.C. 2 in accordance with the Recommendations (1992) of the International Union of Biochemistry and Molecular Biology (IUBMB)) include transferases within this group.
The parent transferases may be any transferase in the subgroups of transferases: transferases transferring one-carbon groups (E.C. 2.1); transferases transferring aldehyde or ketone residues (E.C. 2.2); acyltransferases (E.C. 2.3); glycosyltransferases (E.C. 2.4); transferases transferring alkyl or aryl groups, other than methyl groups (E.C. 2.5); transferases transferring nitrogenous groups (E.C. 2.6).
In a preferred embodiment the parent transferase is a transglutaminase, E.C. 2.3.2.13 (protein-glutamine γ-glutamyltransferase).
Transglutaminases are enzymes capable of catalyzing an acyl transfer reaction in which a gamma-carboxyamide group of a peptide-bound glutamine residue is the acyl donor. Primary amino groups in a variety of compounds may function as acyl acceptors with the subsequent formation of monosubstituted gamma-amides of peptide-bound glutamic acid. When the epsilon-amino group of a lysine residue in a peptide-chain serves as the acyl acceptor, the transferases form intramolecular or intermolecular gamma-glutamyl-epsilon-lysyl crosslinks.
Examples of transglutaminases are described in the pending DK patent application no. 990/94 (Novo Nordisk A/S).
The parent transglutaminase may be of human, animal (e.g. bovine) or microbial origin.
Examples of such parent transglutaminases are animal derived Transglutaminase, FXIIIa; microbial transglutaminases derived from Physarum polycephalum (Klein et al., Journal of Bacteriology, Vol. 174, p. 2599-2605); transglutaminases derived from Streptomyces sp., including Streptomyces lavendulae, Streptomyces lydicus (former Streptomyces libani) and Streptoverticillium sp., including Streptoverticillium mobaraense, Streptoverticillium cinnamoneum, and Streptoverticillium griseocarneum (Motoki et al., U.S. Pat. No. 5,156,956; Andou et al., U.S. Pat. No. 5,252,469; Kaempfer et al., Journal of General Microbiology, Vol. 137, p. 1831-1892; Ochi et al., International Journal of Systematic Bacteriology, Vol. 44, p. 285-292; Williams et al., Journal of General Microbiology, Vol. 129, p. 1743-1813).
It is to be understood that also transferase variants are contemplated as the parent enzyme.
The activity of transglutaminases can be determined as described in “Methods of Enzymatic Analysis”, third edition, 1984, Verlag Chemie, Weinheim, vol. 1-10.
Parent phytases are included in the group of enzymes classified under the Enzyme Classification number E.C. 3.1.3 (Phosphoric Monoester Hydrolases) in accordance with the Recommendations (1992) of the International Union of Biochemistry and Molecular Biology (IUBMB).
Phytases are enzymes produced by microorganisms which catalyse the conversion of phytate to inositol and inorganic phosphorus.
Phytase producing microorganisms comprise bacteria such as Bacillus subtilis, Bacillus natto and Pseudomonas; yeasts such as Saccharomyces cerevisiae; and fungi such as Aspergillus niger, Aspergillus ficuum, Aspergillus awamori, Aspergillus oryzae, Aspergillus terreus or Aspergillus nidulans, and various other Aspergillus species.
Examples of parent phytases include phytases selected from those classified under the Enzyme Classification (E.C.) numbers: 3-phytase (3.1.3.8) and 6-phytase (3.1.3.26).
The activity of phytases can be determined as described in “Methods of Enzymatic Analysis”, third edition, 1984, Verlag Chemie, Weinheim, vol. 1-10, or may be measured according to the method described in EP-A1-0 420 358, Example 2A.
Suitable lyases include Polysaccharide lyases: Pectate lyases (4.2.2.2) and pectin lyases (4.2.2.10), such as those from Bacillus licheniformis disclosed in WO 99/27083.
Without being limited thereto, suitable protein disulfide isomerases include PDIs described in WO 95/01425 (Novo Nordisk A/S) and suitable glucose isomerases include those described in Biotechnology Letters, Vol. 20, No 6, June 1998, pp. 553-56.
Contemplated isomerases include xylose/glucose isomerase (5.3.1.5) including Sweetzyme®.
The environmental allergens that are of interest for epitope mapping include allergens from pollen, dust mites, mammals, venoms, fungi, food items, and other plants.
Pollen allergens include but are not limited to those of the order Fagales, Oleales, Pinales, Poales, Asterales, and Urticales; including those from Betula, Alnus, Corylus, Carpinus, Olea, Phleum pratense and Artemisia vulgaris, such as Aln g1, Cor a1, Car b1, Cry j1, Amb a1 and a2, Art v1, Par j1, Ole e1, Ave v1, and Bet v1 (WO 99/47680).
Mite allergens include but are not limited to those from Dermatophagoides farinae and Dermatophagoides pteronyssinus, such as Der f1 and f2, and Der p1 and p2.
From mammals, relevant environmental allergens include but are not limited to those from cat, dog, and horse as well as from dandruff from the hair of those animals, such as Fel d1; Can f1; Equ c1; Equ c2; Equ c3.
Venom allergens include but are not limited to PLA2 from bee venom as well as Apis m1 and m2, Ves g1, g2 and g5, Ves v5 and to Pol and Sol allergens.
Fungal allergens include those from Alternaria alternata and Cladosporium herbarum, such as Alt a1 and Cla h1.
Food allergens include but are not limited to those from milk (lactoglobulin), egg (ovalbumin), peanuts, hazelnuts, wheat (alpha-amylase inhibitor).
Other plant allergens include latex (Hevea brasiliensis).
In addition, a number of proteins of interest for expression in transgenic plants could be useful objects for epitope engineering. If for instance a heterologous enzyme is introduced into a transgenic plant e.g. to increase the nutritional value of food or feed derived from that plant, that enzyme may lead to allergenicity problems in humans or animals ingesting the plant-derived material. Epitope mapping and engineering of such heterologous enzymes or other proteins of transgenic plants may lead to reduction or elimination of this problem. Hence, the methods of this patent are also useful for potentially modifying proteins for heterologous expression in plants and plant cells.
Horse Radish Peroxidase labelled pig anti-rabbit-Ig (Dako, DK, P217, dilution 1:1000)
Rat anti-mouse IgE (Serotec MCA419; dilution 1:100)
Mouse anti-rat IgE (Serotec MCA193; dilution 1:200)
Biotin-labelled mouse anti-rat IgG1 monoclonal antibody (Zymed 03-9140; dilution 1:1000)
Biotin-labelled rat anti-mouse IgG1 monoclonal antibody (Serotec MCA336B; dilution 1:2000)
Streptavidin-horse radish peroxidase (Kirkegård & Perry 14-30-00; dilution 1:1000).
PBS (pH 7.2 (1 liter))
Washing buffer PBS, 0.05% (v/v) Tween 20
Blocking buffer PBS, 2% (wt/v) Skim Milk powder
Dilution buffer PBS, 0.05% (v/v) Tween 20, 0.5% (wt/v) Skim Milk powder
Citrate buffer 0.1M, pH 5.0-5.2
Stop-solution (DMG-buffer)
Sodium Borate, borax (Sigma)
3,3-Dimethyl glutaric acid (Sigma)
Tween 20: Polyoxyethylene sorbitan monolaurate (Merck cat no. 822184)
PMSF (phenyl methyl sulfonyl fluoride) from Sigma
Succinyl-Alanine-Alanine-Proline-Phenylalanine-paranitro-anilide (Suc-AAPF-pNP) Sigma no. S-7388, Mw 624.6 g/mol.
mPEG (Fluka)
OPD: o-phenylene-diamine, (Kementec cat no. 4260)
The implementation consists of 3 pieces of code:
1. The core program (see above), written in C (see Appendix A).
2. A “wrapping” cgi-script run by the web server, written in Python (see Appendix B).
3. A HTML page defining the input/submission form (see Appendix C).
The wrapper receives the input and calls the core program and several other utilities.
Apart from the standard Unix utility programs (mv, rm, awk, etc.) the following must be installed:
A web server capable of running cgi-scripts, e.g. Apache
Python 1.5 or later
Gnuplot 3.7 or later
DSSP, version July 1995
1. A Brookhaven PDB file with the structure of the protein
2. The output of DSSP called with the above PDB file.
3. Maximum distance between adjacent residues
4. Minimum solvent accessible surface area for each residue
5. Maximum epitope size (max distance between any two residues in epitope)
6. Maximum number of non-redundant epitopes to include (0=all)
7. The shortest acceptable epitope (as a fraction of the length of the epitope consensus sequence).
8. Epitope consensus sequence describing which residues are possible at the different positions. An example is shown below:
KR (Lys or Arg allowed)
AILV− (Ala, Ile, Leu, Val or missing residue allowed)
* (all residues allowed, but there must be a residue)
DE (Asp or Glu allowed)
(*, ? or − in first or last position is allowed but obsolete. (− in first position is ignored.))
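The consensus format described above can be illustrated with a short Python sketch that turns each line into a set of allowed residues plus a flag saying whether the position may be missing. This is not the parser of the core program in Appendix A. The name `parse_consensus` is hypothetical, trailing explanatory text on a line is simply ignored, and the `?` symbol is not handled because its meaning is not defined in this section.

```python
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
GAP_MARKS = {"-", "\u2212"}  # ASCII hyphen and the typographic minus sign


def parse_consensus(lines):
    """Return one (allowed_residues, gap_allowed) tuple per consensus position."""
    positions = []
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip empty lines
        spec = tokens[0]  # anything after the first token is treated as a comment
        gap_allowed = any(ch in GAP_MARKS for ch in spec)
        letters = {ch for ch in spec if ch not in GAP_MARKS}
        if letters == {"*"}:
            allowed = AMINO_ACIDS  # any residue, but a residue must be present
        elif letters and letters <= AMINO_ACIDS:
            allowed = frozenset(letters)
        else:
            raise ValueError(f"cannot interpret consensus line: {raw!r}")
        positions.append((allowed, gap_allowed))
    return positions


example = ["KR (Lys or Arg allowed)", "AILV\u2212", "*", "DE"]
for allowed, gap_allowed in parse_consensus(example):
    print("".join(sorted(allowed)), gap_allowed)
```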
Examples of matching epitopes:
The “core” of the program is the algorithm that scans the protein surface for the epitope patterns. The principle is that several “trees” are built, where each of their branches describes one epitope:
1. All residues in the protein are checked according to: a) Does the residue type match the first residue of the epitope consensus sequence. b) Is the surface accessibility greater than or equal to the given threshold. If both requirements are fulfilled, the protein residue is considered as one root in the epitope tree. Remark that there are usually many roots.
2. For each of the residues defined as roots, all residues within the given threshold distance between adjacent residues (e.g. 7 Angstroms) are checked for the same as above: a) Does the residue type match the second residue of the epitope consensus sequence. b) Is the surface accessibility greater than or equal to the given threshold. If yes, the protein residue is considered as a “child” of the root. The spatial position of a residue is defined as the coordinates of its C-alpha atom.
3. The procedure from step 2 is repeated for the next residue in the epitope consensus sequence, where each of the “children” found in step 2 is now a “root” of new children. If a gap is defined in the epitope consensus sequence, a “missing” residue is allowed, and the coordinates of the root (also called “parent”) are used.
4. This procedure is repeated for all residues in the epitope consensus sequence.
5. In this way a number of trees (corresponding to the number of roots found in step 1) are found. Notice that the same protein residue can be present many places in the trees.
6. If no epitopes that match the length of the epitope consensus sequence are found, the longest shorter epitopes that match the first n residues of the epitope consensus sequence are used, where n is an integer smaller than the length of the epitope consensus sequence. If n is smaller than the length of the epitope consensus sequence multiplied by the fraction value defining the shortest acceptable epitope length, no epitopes are written to the output, and steps 7, 8 and 9 are skipped.
7. The epitopes are extracted from the trees by traversing down from each of the “children” in the last level. The algorithm also finds epitopes which have the same protein residue present more than once. This is, of course, an artifact and such epitopes are discarded. Every epitope is then checked for its size, that is, the maximum distance between any two residues which are members of the epitope. If this exceeds the threshold, the epitope is discarded.
8. Redundant epitopes are removed. Epitopes containing one or more gaps are redundant if they are subsets of other epitopes without or with fewer gaps. For example: A82-gap-F45-G44-K43 is a subset of A82-L46-F45-G44-K43, and is therefore discarded.
9. For every epitope, the total solvent accessible surface area is calculated (by adding the contributions from each residue as found by the DSSP program). The epitopes are sorted according to this area in descending order. If a maximum number of n non-redundant epitopes has been specified, the n epitopes with largest solvent accessible surface area are selected.
10. The output consists of a list of the found epitopes, along with information of the epitope consensus sequence used and other internal parameters. A separate file containing the number of epitopes that each of the protein residues is a member of is also written.
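The search described in steps 1 to 9 can be sketched in Python as a depth-first traversal. This is an illustrative reconstruction under stated assumptions, not the C program of Appendix A. The function name `find_epitopes` and the dictionary keys (`id`, `aa`, `acc`, `xyz`) are hypothetical, the fallback to shorter epitopes in step 6 is omitted, and epitopes that reuse a residue are pruned during the search instead of afterwards, which gives the same result.

```python
from itertools import combinations
from math import dist


def find_epitopes(residues, consensus, max_step, min_access, max_size, max_count=0):
    """residues: dicts with 'id', 'aa', 'acc' (accessible area), 'xyz' (C-alpha).
    consensus: list of (allowed_residue_set, gap_allowed) per position.
    Returns epitopes as tuples of residue ids, with None marking a gap."""
    by_id = {r["id"]: r for r in residues}
    exposed = [r for r in residues if r["acc"] >= min_access]
    found = []

    def extend(path, anchor, level):
        if level == len(consensus):
            found.append(tuple(path))
            return
        allowed, gap_allowed = consensus[level]
        for r in exposed:
            if r["aa"] not in allowed or r["id"] in path:
                continue  # wrong residue type, or residue already used
            if anchor is not None and dist(anchor["xyz"], r["xyz"]) > max_step:
                continue  # too far from the parent residue
            extend(path + [r["id"]], r, level + 1)
        if gap_allowed and anchor is not None:
            # Missing residue: keep the parent's coordinates (gap at position 1 is ignored).
            extend(path + [None], anchor, level + 1)

    extend([], None, 0)

    def size_ok(ep):
        pts = [by_id[i]["xyz"] for i in ep if i is not None]
        return all(dist(p, q) <= max_size for p, q in combinations(pts, 2))

    def covered(ep, other):
        # ep is redundant if another epitope agrees at every non-gap position.
        return ep != other and all(a is None or a == b for a, b in zip(ep, other))

    sized = [ep for ep in found if size_ok(ep)]
    kept = [ep for ep in sized if not any(covered(ep, o) for o in sized)]
    kept.sort(key=lambda ep: sum(by_id[i]["acc"] for i in ep if i is not None),
              reverse=True)
    return kept[:max_count] if max_count else kept


# Hypothetical five-residue structure, spaced 3.8 Angstroms apart on a line.
demo = [{"id": n, "aa": n[0], "acc": 40.0, "xyz": (3.8 * i, 0.0, 0.0)}
        for i, n in enumerate(["K43", "G44", "F45", "L46", "A82"])]
pattern = [({"A"}, False), ({"L"}, True), ({"F"}, False), ({"G"}, False), ({"K"}, False)]
print(find_epitopes(demo, pattern, max_step=8.0, min_access=10.0, max_size=20.0))
```

In the demonstration the gapped epitope A82-gap-F45-G44-K43 is found as well, and is then removed as a subset of A82-L46-F45-G44-K43, mirroring the example in step 8.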
The wrapper
1. One PDB file, describing one structure, or one ZIP file, containing a number of PDB files, each describing one structure. The ZIP file must not contain subfolders.
2. An epitope consensus sequence or which part of the current epitope library to use (full library or IgE part or IgG part).
3. Maximum distance between adjacent residues
4. Minimum solvent accessible surface area for each residue
5. Maximum epitope size (max distance between any two residues in epitope)
6. Maximum number of non-redundant epitopes to include (0=all)
7. Whether to use sequential numbering (1,2,3,4, etc) or PDB-file numbering.
The core program accepts only one structure and one epitope consensus sequence. It is usually desirable to use a library of epitope consensus sequences and sometimes several protein structures. The wrapper reads the user input and calls the utility programs and the core program the necessary number of times. The output is collected and presented on the web page returned to the user.
Depending on the type of input, the wrapper works in different modes:
Epitope consensus can be given directly or taken from a library
Input type can be a single PDB file or a collection of PDB files given as a ZIP file.
Any of the four possible combinations is allowed.
The epitope library consists of a number of text files, each containing one epitope consensus sequence as specified above.
The layout of the wrapper is like this:
1. Check if the program is already in use from somewhere else (this is done by checking for a lock file when the wrapper starts. If it does not exist, it is created and removed again when the program is finished).
2. If the epitope consensus sequences are to be read from the library, make an internal list of the desired library entries.
3. If the input type is a ZIP file, unzip the file and create one new directory for each of the contained PDB files. Move each PDB file to its corresponding directory.
4. Do a loop over the structures and/or epitope consensus sequences. For each structure/epitope consensus sequence pair, DSSP and the core program are called with the required parameters. If the input type is a ZIP file, the outputs are put in the appropriate directories.
5. If the epitope library is used, a sum file is generated containing the total number of epitopes each residue is a member of. (Such a file is generated by the core program for each epitope consensus sequence; here a sum of these files is calculated.) If the input type is a ZIP file, a sum file is generated for each structure and put in the appropriate directory.
6. If the epitope library is used, a file is generated containing the total number of epitopes found from each entry in the epitope library. If the input type is a PDB file, the file contains only one line (with a number of data corresponding to the library size). If the input type is a ZIP file, there is one line for each structure.
7. Depending on the combination of input type (ZIP or single PDB) and epitope consensus sequence source (typed-in or epitope library), different information is returned to the user:
Single PDB+typed in epitope: Graph of numbers of epitopes that each residue is a member of. List of found epitopes.
ZIP file+typed in epitope: Graphs (one for each structure) of numbers of epitopes that each residue is a member of. Lists (one for each structure) of found epitopes.
Single PDB+epitope library: Graph of numbers of epitopes that each residue is a member of (total for the complete library).
ZIP file+epitope library: Graphs (one for each structure) of numbers of epitopes that each residue is a member of (total for the complete library).
Data flow sheets for the four different modes are shown in the FIG.
8. For all modes except Single PDB+typed in epitope, a ZIP file containing all output files is created and returned to the user.
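Two of the wrapper steps above, the lock file of step 1 and the per-residue sum of step 5, can be sketched in Python as follows. This is not the script of Appendix B. The names `single_instance` and `sum_residue_counts` are hypothetical, and the sketch creates the lock file atomically with `os.O_EXCL` instead of checking for it and then creating it, which is a substituted technique that avoids a race between two simultaneous requests.

```python
import os
from collections import Counter
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Hold a lock file for the duration of a run; fail if one already exists."""
    # O_CREAT | O_EXCL raises FileExistsError when the lock file is present.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # removed again when the program is finished


def sum_residue_counts(per_entry_counts):
    """Add up, per residue, the epitope counts obtained for each library entry."""
    total = Counter()
    for counts in per_entry_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing
    return dict(total)


# Hypothetical counts from two library entries for one structure.
print(sum_residue_counts([{"A82": 2, "F45": 1}, {"A82": 1, "K43": 4}]))
```

A wrapper built this way would run its loop over structures and consensus sequences inside `with single_instance(...)`, so that the lock is released even if a call to DSSP or the core program fails.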
Twenty intratracheal (IT) immunisations were performed weekly with 0.100 ml 0.9% (wt/vol) NaCl (control group), or 0.100 ml of a protein dilution (~0.1-1 mg/ml). Each group contained 10 rats. Blood samples (2 ml) were collected from the eye one week after every second immunisation. Serum was obtained by blood clotting and centrifugation and analysed as indicated below.
Twenty subcutaneous (SC) immunisations were performed weekly with 0.050 ml 0.9% (wt/vol) NaCl (control group), or 0.050 ml of a protein dilution (~0.01-0.1 mg/ml). Each group contained 10 female Balb/C mice (about 20 grams) purchased from Bomholdtgaard, Ry, Denmark. Blood samples (0.100 ml) were collected from the eye one week after every second immunisation. Serum was obtained by blood clotting and centrifugation and analysed as indicated below.
Specific IgG1 and IgE levels were determined using the ELISA specific for mouse or rat IgG1 or IgE. Differences between data sets were analysed by using appropriate statistical methods.
A fresh stock solution of cyanuric chloride in acetone (10 mg/ml) is diluted into PBS, while stirring, to a final concentration of 1 mg/ml and immediately aliquoted into CovaLink NH2 plates (100 microliter per well) and incubated for 5 minutes at room temperature. After three washes with PBS, the plates are dried at 50° C. for 30 minutes, sealed with sealing tape, and stored in plastic bags at room temperature for up to 3 weeks.
Mouse anti-Rat IgE was diluted 200× in PBS (5 microgram/ml). 100 microliter was added to each well. The plates were coated overnight at 4° C.
Unspecific adsorption was blocked by incubating each well for 1 hour at room temperature with 200 microliter blocking buffer. The plates were washed 3× with 300 microliter washing buffer.
Unknown rat sera and a known rat IgE solution were diluted in dilution buffer: Typically 10×, 20× and 40× for the unknown sera, and 1/2 dilutions for the standard IgE starting from 1 μg/ml. 100 microliter was added to each well. Incubation was for 1 hour at room temperature.
Unbound material was removed by washing 3× with washing buffer. The anti-rat IgE (biotin) was diluted 2000× in dilution buffer. 100 microliter was added to each well. Incubation was for 1 hour at room temperature. Unbound material was removed by washing 3× with washing buffer.
Streptavidin was diluted 1000× in dilution buffer. 100 microliter was added to each well. Incubation was for 1 hour at room temperature. Unbound material was removed by washing 3× with 300 microliter washing buffer. OPD (0.6 mg/ml) and H2O2 (0.4 microliter/ml) were dissolved in citrate buffer. 100 microliter was added to each well. Incubation was for 30 minutes at room temperature. The reaction was stopped by addition of 100 microliter H2SO4. The plates were read at 492 nm with 620 nm as reference.
Similar determination of IgG can be performed using anti Rat-IgG and standard rat IgG reagents.
Similar determinations of IgG and IgE in mouse serum can be performed using the corresponding species-specific reagents.
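The standard dilution series and the back-calculation of a serum concentration can be illustrated with a small Python sketch. The two-fold series starting from 1 μg/ml follows the description above. The linear interpolation between neighbouring standards, the function names, and the absorbance values in the example are assumptions made for illustration and are not taken from the procedure.

```python
def two_fold_series(start, steps):
    """Concentrations of a two-fold serial dilution, highest first."""
    return [start / 2 ** i for i in range(steps)]


def interpolate_conc(absorbance, standards):
    """Estimate concentration from (concentration, absorbance) standards.

    Assumption: linear interpolation between the two bracketing standards."""
    pts = sorted(standards, key=lambda p: p[1])
    for (c0, a0), (c1, a1) in zip(pts, pts[1:]):
        if a1 > a0 and a0 <= absorbance <= a1:
            return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)
    raise ValueError("absorbance lies outside the standard curve")


# Standard IgE in microgram/ml: 1, 0.5, 0.25, 0.125
print(two_fold_series(1.0, 4))

# Hypothetical readings: a serum diluted 20x reads 0.75 between two standards.
in_well = interpolate_conc(0.75, [(0.25, 0.5), (0.5, 1.0)])
print(in_well * 20)  # concentration in undiluted serum, microgram/ml
```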
To determine the IgE binding capacity of protein variants one can use an assay, essentially as described above, but using sequential addition of the following reagents:
1) Mouse anti-rat IgE antibodies coated in wells;
2) Known amounts of rat antiserum containing IgE against the parent protein;
3) Dilution series of the protein variant in question (or parent protein as positive control);
4) Rabbit anti-parent antibodies
5) HRPO-labelled anti-rabbit Ig antibodies for detection using OPD as described.
The IgE binding capacity (end-point and/or affinity) of the protein variants relative to that of the parent protein is determined from the dilution-response curves. The IgE-positive serum can be of other animals (including humans that inadvertently have been sensitized to the parent protein) provided that the species-specific anti-IgE capture antibodies are changed accordingly.
C-ELISA was performed according to established procedures. In short, a 96 well ELISA plate was coated with the parent protein. After proper blocking and washing, the coated antigen was incubated with rabbit anti-enzyme polyclonal antiserum in the presence of various amounts of modified protein (the competitor). The residual amount of rabbit antiserum was detected by horseradish peroxidase-labelled pig anti-rabbit immunoglobulin.
For purposes of the present invention, the degree of homology may be suitably determined by means of computer programs known in the art, such as GAP provided in the GCG program package (Program Manual for the Wisconsin Package, Version 8, August 1994, Genetics Computer Group, 575 Science Drive, Madison, Wis., USA 53711) (Needleman, S. B. and Wunsch, C. D., (1970), Journal of Molecular Biology, 48, 443-453).
In the present invention, corresponding (or homologous) positions in subtilisin protease sequences are defined by alignment with Subtilisin Novo (BPN′) from B. amyloliquefaciens, as shown in Table 1A for Alcalase, Protease B, Esperase, Protease C, Protease D, Protease E, Protease A, PD498, Properase, Relase, Savinase.
To find the homologous positions in subtilisin protease sequences not shown in the alignment of Table 1A, the sequence of interest is aligned to the sequence of BPN′ as shown in Table 1B for YaB protease and Subtilisin sendai. The new sequence is aligned to the BPN′ sequence by using the GAP alignment to the most homologous sequence found by the GAP program. GAP is provided in the GCG program package (Program Manual for the Wisconsin Package, Version 8, August 1994, Genetics Computer Group, 575 Science Drive, Madison, Wis., USA 53711) (Needleman, S. B. and Wunsch, C. D., (1970), Journal of Molecular Biology, 48, 443-453).
The sequence of the YaB protease is disclosed by Kaneko, R.; Koyama, N.; Tsai, Y.-C.; Juang, R.-Y.; Yoda, K.; Yamasaki, M.; Molecular cloning of the structural gene for alkaline elastase YaB, a new subtilisin produced by an alkalophilic Bacillus strain. J. Bacteriol. 171:5232 (1989), it has Swissprot number P20724, and is shown in SEQ ID NO: 35.
The sequence of the Subtilisin sendai is disclosed by Yamagata, Y.; Isshiki, K.; Ichishima, E.; Subtilisin Sendai from alkalophilic Bacillus sp.: molecular and enzymatic properties of the enzyme and molecular cloning and characterization of the gene, aprS. Enzyme Microb. Technol. 17:653 (1995), it has SPTREMBL accession number Q45522, and is shown in SEQ ID NO: 34.
Identity to Savinase: 81.7%
Identity to Savinase: 82.09%
These alignments reveal that the homology between various subtilisin proteases ranges between 100% and 40%.
Unless specified, subtilisin sequences and positions mentioned in the present invention are given in the BPN′ numeration, and can be converted by alignment as described above (Tables 1A and 1B).
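The conversion between a protease's own numbering and the BPN′ numeration can be sketched in Python once an alignment is available. The function name `numbering_map`, the `-` gap character, and the short aligned fragments are hypothetical. Returning `None` for residues inserted relative to BPN′ is a simplification chosen for this example, since the section does not state how insertions are labelled.

```python
def numbering_map(aligned_ref, aligned_seq, gap="-"):
    """Map 1-based positions of the aligned sequence to reference positions.

    aligned_ref is the reference row (here BPN') and aligned_seq the row of the
    sequence of interest; both come from the same alignment."""
    if len(aligned_ref) != len(aligned_seq):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    ref_pos = 0
    seq_pos = 0
    for r, s in zip(aligned_ref, aligned_seq):
        if r != gap:
            ref_pos += 1
        if s != gap:
            seq_pos += 1
            # A residue opposite a gap in the reference has no reference number.
            mapping[seq_pos] = ref_pos if r != gap else None
    return mapping


# Hypothetical aligned fragments, for illustration only.
print(numbering_map("AQSV-PYG", "AQ-VTPWG"))
```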
Sequence identities between different pairs of proteases are given below:
Sequence identity to BPN′:
Sequence identity to Savinase:
The protein structure of PD498 is disclosed in WO 98/35026 (Novo Nordisk). The structure of Savinase can be found in BETZEL et al, J.MOL.BIOL., Vol. 223, p. 427, 1992 (1svn.pdb).
Three dimensional structural models of the subtilisins Properase, Relase, Protease C, Protease D, Protease E, and Protease B were constructed based on the three dimensional structure of Savinase (Protein Data Bank entry 1SVN; Betzel, C., Klupsch, S., Papendorf, G., Hastrup, S., Branner, S., Wilson, K. S.: Crystal structure of the alkaline proteinase Savinase from Bacillus lentus at 1.4 Å resolution. J Mol Biol 223 pp. 427 (1992)) using the Modeller 5o (
The sequence of the T. lanuginosus lipase (trade name Lipolase) is provided in SEQ ID NO: 1 and the structure is disclosed in WO 98/35026 and as “1tib”, available in Structural Classification of Proteins (SCOP) on the Internet.
The amylase used in the examples is the alpha-amylase of Bacillus halmapalus (WO 96/23873), which is called amylase SP722 (the wild-type). Its sequence is shown in SEQ ID NO: 2 and the corresponding protein structure was built from the BA2 structure, as described in WO 96/23874. The first four amino acids of the structural model are not defined, hence the sequence used for numeration of amino acid residues in the examples of this invention is four amino acids shorter than the one of the full length protein SP722.
Several variants of this amylase are available (WO 96/23873). One particularly useful variant has the two amino acid residues D-G at positions 183 and 184 of SEQ ID NO: 2 (corresponding to residues 179 and 180 of the modelled structure) deleted. This variant is called JE-1 or Natalase.
Another amylase that is particularly useful is the amylase AA560: This alkaline alpha-amylase may be derived from a strain of Bacillus sp. DSM 12649. The strain was deposited on 25 Jan. 1999 by the assignee under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure at Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ), Mascheroder Weg 1b, D-38124 Braunschweig DE.
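Because the structural model lacks the first four amino acids, residue numbers in the examples are shifted by four relative to the full-length SP722 sequence. A minimal Python sketch of the conversion is given here; the function and constant names are hypothetical.

```python
MODEL_OFFSET = 4  # the first four residues of SP722 are not defined in the model


def seq_to_model(position):
    """Convert a full-length SP722 position to the numbering of the model."""
    if position <= MODEL_OFFSET:
        raise ValueError("residue is not defined in the structural model")
    return position - MODEL_OFFSET


def model_to_seq(position):
    """Convert a model position back to the full-length SP722 numbering."""
    return position + MODEL_OFFSET


# The deleted D-G pair: positions 183 and 184 of the full-length sequence.
print(seq_to_model(183), seq_to_model(184))
```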
The laccase used in this invention is that from Coprinus cinereus (WO 98/38287), the sequence of which is shown as SEQ ID NO: 3. The structure of the Myceliophthora thermophila laccase can be built by homology modeling to the Coprinus cinereus laccase as shown in WO 98/38287.
The cellulase sequence and structure used in the present invention are those of the core fragment of endoglucanase V from Humicola insolens (aka Cel45 or Carezyme). The core fragment structure is available as 3eng.pdb (G. J. DAVIES et al. ACTA CRYSTALLOGR., SECT.D, Vol. 52, p. 7 1996; G. J. DAVIES et al. BIOCHEMISTRY, V. 34, p. 16210, 1995); SwissProt accession number P43316, and the sequence is shown in SEQ ID NO: 4. The corresponding full-length sequence is disclosed in WO 91/17243 and shown here in SEQ ID NO: 5. The numeration in the description and claims of this invention pertains to the core fragment; however, it is contemplated that all claims are also valid for the corresponding positions in the full-length protein.
High diversity libraries (10^12) of phages expressing random hexa-, nona- or dodecapeptides as part of their membrane proteins were screened for their capacity to bind purified specific rabbit IgG, and purified rat and mouse IgG1 and IgE antibodies. The phage libraries were obtained according to prior art (see WO 92/15679, hereby incorporated by reference).
The antibodies were raised in the respective animals by subcutaneous, intradermal, or intratracheal injection of relevant proteins (e.g. proteases, lipolytic enzymes, amylases, oxidoreductases) dissolved in phosphate buffered saline (PBS). The respective antibodies were purified from the serum of immunised animals by affinity chromatography using paramagnetic immunobeads (Dynal AS) loaded with pig anti-rabbit IgG, mouse anti-rat IgG1 or IgE, or rat anti-mouse IgG1 or IgE antibodies.
The respective phage libraries were incubated with the IgG, IgG1 and IgE antibody coated beads. Phages, which express oligopeptides with affinity for rabbit IgG, or rat or mouse IgG1 or IgE antibodies, were collected by exposing these paramagnetic beads to a magnetic field. The collected phages were eluted from the immobilised antibodies by mild acid treatment, or by elution with intact enzyme. The isolated phages were amplified as known to the specialist. Alternatively, immobilised phages were directly incubated with E. coli for infection. In short, F-factor positive E. coli (e.g. XL-1 Blue, JM101, TG1) were infected with M13-derived vector in the presence of a helper-phage (e.g. M13K07), and incubated, typically in 2xYT containing glucose or IPTG, and appropriate antibiotics for selection. Finally, cells were removed by centrifugation. This cycle of events was repeated 2-5 times on the respective cell supernatants. After selection rounds 2, 3, 4, and 5, a fraction of the infected E. coli was incubated on selective 2xYT agar plates, and the specificity of the emerging phages was assessed immunologically. Thus, phages were transferred to a nitrocellulose (NC) membrane. For each plate, 2 NC-replicas were made. One replica was incubated with the selection antibodies, the other replica was incubated with the selection antibodies and the immunogen used to obtain the antibodies as competitor. Those plaques that were absent in the presence of immunogen were considered specific, and were amplified according to the procedure described above.
The specific phage-clones were isolated from the cell supernatant by centrifugation in the presence of polyethylene glycol. DNA was isolated, the DNA sequence coding for the oligopeptide was amplified by PCR, and the DNA sequence was determined, all according to standard procedures. The amino acid sequence of the corresponding oligopeptide was deduced from the DNA sequence.
Thus, a number of peptide sequences with specificity for the protein specific antibodies, described above, were obtained. These sequences were collected in a database, and analysed by sequence alignment to identify epitope patterns. For this sequence alignment, conservative substitutions (e.g. aspartate for glutamate, lysine for arginine, serine for threonine) were considered as one. This showed that most sequences were specific for the protein the antibodies were raised against. However, several cross-reacting sequences were obtained from phages that went through 2 selection rounds only. In the first round 22 epitope patterns were identified.
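The treatment of conservative substitutions "as one" during alignment can be illustrated by the following minimal sketch. It assumes that only the three exchange pairs named above are collapsed; the function names and the example peptides are hypothetical and are not taken from the database.

```python
from collections import defaultdict

# Conservative exchange pairs named in the text; every other residue maps to itself.
# The choice of representative letter (D, K, S) is arbitrary.
CONSERVATIVE_GROUPS = {"D": "D", "E": "D", "K": "K", "R": "K", "S": "S", "T": "S"}


def collapse(peptide):
    """Replace each residue by the representative of its conservative group."""
    return "".join(CONSERVATIVE_GROUPS.get(aa, aa) for aa in peptide.upper())


def group_equivalent(peptides):
    """Group peptides that become identical once conservative exchanges are collapsed."""
    groups = defaultdict(list)
    for peptide in peptides:
        groups[collapse(peptide)].append(peptide)
    return dict(groups)


# Hypothetical peptides: the first two differ only by a lysine/arginine exchange.
print(group_equivalent(["RYPR", "RYPK", "LDQI"]))
```

Peptides that fall into the same group would then be counted as supporting the same epitope pattern.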
In further rounds of phage display, more antibody binding sequences were obtained leading to more epitope patterns. Further, the literature was searched for peptide sequences that have been found to bind environmental allergen-specific antibodies (J All Clin Immunol 93 (1994) pp. 34-43; Int Arch Appl Immunol 103 (1994) pp. 357-364; Clin Exp Allergy 24 (1994) pp. 250-256; Mol Immunol 29 (1992) pp. 1383-1389; J Immunol 121 (1989) pp. 275-280; J. Immunol 147 (1991) pp. 205-211; Mol Immunol 29 (1992) pp. 739-749; Mol Immunol 30 (1993) pp. 1511-1518; Mol Immunol 28 (1991) pp. 1225-1232; J. Immunol 151 (1993) pp. 7206-7213). These antibody binding peptide sequences were included in the database.
A first generation database of the antibody binding peptides identified, together with their corresponding epitope patterns, is shown in Tables 2-7 below.
Tables 2-7: Overview of the antibody binding peptide sequences, epitope patterns and epitope sequences. The type of antibody used for identifying the antibody binding sequences is indicated as IgG or IgE and the species from which the antibodies were derived are indicated as mo (mouse), ra (rat) and hu (human).
Epitope sequences were assessed manually on the screen on the 3D-structure of the protein of interest, using appropriate software (e.g. SwissProt Pdb Viewer, WebLite Viewer).
In a first step, the identified epitope patterns were fitted to the 3D-structure of the enzymes. A sequence of at least 3 amino acids, defining a specific epitope pattern, was localised on the 3D-structure of the acceptor protein. Conservative mutations (e.g. aspartate for glutamate, lysine for arginine, serine for threonine) were considered as one for those patterns for which phage display had evidenced such exchanges to occur. Among the possible sequences provided by the protein structure, only those were retained where the sequence matched a primary sequence, or where it matched a structural sequence of amino acids in which each amino acid was situated within a distance of 5 Å from the next one. Occasionally, the mobility of the amino acid side chains, as provided by the software programme, had to be taken into consideration for this criterion to be fulfilled.
Secondly, the remaining anchor amino acids as well as the variable amino acids, i.e. amino acids that were not defining a pattern but were present in the individual sequences identified by phage library screening, were assessed in the area around the various amino acid sequences localised in step 1. Only amino acids situated within a distance of 5 Å from the next one were included.
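The 5 Å neighbour criterion used in steps 1 and 2 can be sketched as follows. The sketch assumes that each amino acid is represented by a single coordinate; the text does not state which atoms were used for the distance measurement, so this representation and the function name are assumptions.

```python
import math


def within_chain_cutoff(coords, cutoff=5.0):
    """Return True if every consecutive pair of positions is at most `cutoff` Å apart.

    coords: list of (x, y, z) tuples, one per amino acid, in the order of the
    candidate epitope sequence (one representative point per residue is an assumption).
    """
    return all(math.dist(a, b) <= cutoff for a, b in zip(coords, coords[1:]))


# Hypothetical coordinates in Å: consecutive distances are 3 Å and 4 Å.
print(within_chain_cutoff([(0, 0, 0), (3, 0, 0), (3, 4, 0)]))
```

A candidate structural sequence would be retained only when this check holds for the whole chain of residues.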
Finally, an accessibility criterion was introduced. The criterion was that at least half of the anchor amino acids had a surface that was >30% accessible. Typically, 0-2 epitopes were retained for each epitope pattern. In some cases, two different amino acids could with equal probability be part of the epitope (e.g. two leucines located close to each other in the protein 3D-structure). For example, in Savinase two epitopes actually fit to the antibody binding peptide LDQIFFTRW (SEQ ID NO:62): L75 D41 Q2 I79 and L42 D41 Q2 I79. A shorthand notation for such a situation is: L42/L75 D41 Q2 I79.
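The accessibility criterion and the shorthand notation can be made concrete with the following sketch. The function names are hypothetical, and the accessibility values in the example are invented for illustration only.

```python
from itertools import product


def passes_accessibility(anchor_accessibility, threshold=30.0):
    """True if at least half of the anchor amino acids are more than `threshold` % accessible."""
    if not anchor_accessibility:
        return False
    exposed = sum(1 for value in anchor_accessibility if value > threshold)
    return 2 * exposed >= len(anchor_accessibility)


def expand_shorthand(notation):
    """Expand e.g. 'L42/L75 D41 Q2 I79' into every individual epitope sequence it denotes."""
    options = [token.split("/") for token in notation.split()]
    return [" ".join(combo) for combo in product(*options)]


print(expand_shorthand("L42/L75 D41 Q2 I79"))
# Invented accessibilities (%) for four anchor residues: two of four exceed 30 %.
print(passes_accessibility([45.0, 10.0, 31.0, 5.0]))
```

Expanding the Savinase shorthand yields the two epitope sequences given in the text.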
Thus, a number of epitope sequences were identified and localised on the surface of various proteins. As suggested by sequence alignment of the antibody binding peptides, structural analysis confirmed most of the epitopes to be enzyme specific, with only few exceptions. Overall, most of the identified epitopes were at least partially structural. However, some proteins (e.g. amylase) expressed predominantly primary sequence epitopes. Typically, the epitopes were localised in very discrete areas of the enzymes, and different epitope sequences often shared some amino acids (hot-spots).
The identified epitope sequences are shown in Tables 2-7.
Bet v1 (WO 99/47680) was used as the parent protein for identification of epitope sequences that may cross react with enzyme epitopes. The structural coordinates from 1BV1.pdb (Gajhede et al., NAT.STRUCT.BIOL., Vol. 3, p. 1040, 1996) were used, as well as the corresponding sequence (Swissprot accession number P15494). The epitope pattern P>PAP>S (which had been identified from antibody binding peptides specific for anti-Lipolase antibodies) was found to match three (overlapping) epitope sequences on the surface of Bet v1:
It is common knowledge that amino acids that surround binding sequences can affect binding of a ligand without participating actively in the binding process. Based on this knowledge, areas covered by amino acids with potential steric effects on the epitope-antibody interaction were defined around the identified epitopes. In practice, all amino acids situated within 5 Å from the amino acids defining the epitope were included. The accessibility criterion was not applied when defining epitope areas, as hidden amino acids can have an effect on the surrounding structures.
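The definition of an epitope area can be sketched as below. The sketch assumes that distances are measured between any pair of atoms and that the epitope-defining residues themselves are not listed as part of the area; neither detail is stated explicitly in the text. The data layout and the function name are hypothetical.

```python
import math


def epitope_area(atoms_by_residue, epitope_residues, cutoff=5.0):
    """Return the residues having any atom within `cutoff` Å of any epitope atom.

    atoms_by_residue: dict mapping a residue label (e.g. 'Y167') to a list of
    (x, y, z) atom coordinates. Epitope residues are excluded from the result
    (an assumption of this sketch).
    """
    epitope = set(epitope_residues)
    epitope_atoms = [xyz for res in epitope for xyz in atoms_by_residue[res]]
    area = set()
    for residue, atoms in atoms_by_residue.items():
        if residue in epitope:
            continue
        if any(math.dist(a, e) <= cutoff for a in atoms for e in epitope_atoms):
            area.add(residue)
    return area


# Toy structure with one atom per residue (coordinates in Å, invented).
toy = {"A1": [(0, 0, 0)], "B2": [(4, 0, 0)], "C3": [(10, 0, 0)]}
print(epitope_area(toy, {"A1"}))
```

No accessibility filter is applied here, in line with the statement that hidden amino acids may also influence the surrounding structure.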
For Savinase, the following amino acid residues belong to the epitope area that corresponds to each epitope sequence indicated in Table 2:
For PD498, the following amino acid residues belong to the epitope area that corresponds to each epitope sequence indicated in Table 3:
For Lipolase, the following amino acid residues belong to the epitope area that corresponds to each epitope sequence indicated in Table 4:
For Amylase, the following amino acid residues belong to the epitope area that corresponds to each epitope sequence indicated in Table 5:
Having identified ‘antibody binding peptide’ sequences and by consensus analysis also “epitope patterns” (e.g. >DF>>K>), one can identify potential epitope sequences on the 3-dimensional surface of a parent protein (=acceptor protein) in a semi-automated manner using the following method:
The anchor amino acid residues are transferred to a three dimensional structure of the protein of interest, by colouring D red, F white and K blue. Any surface area having all three residues within a distance of 18 Å, preferably 15 Å, more preferably 12 Å, is then claimed to be an epitope. The relevant distance can easily be measured using e.g. molecular graphics programs like Insight II from Molecular Simulations Inc.
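The search for surface areas containing all anchor residues can be sketched as follows for the pattern >DF>>K>. The sketch interprets "within a distance of 18 Å" as a limit on every pairwise distance between the anchor residues and represents each residue by one coordinate; both are assumptions, as are the function name and the example coordinates.

```python
import math
from itertools import combinations, product


def find_anchor_sets(residues, anchors=("D", "F", "K"), max_dist=18.0):
    """Return label tuples, one residue per anchor type, whose pairwise distances are <= max_dist.

    residues: list of (label, one_letter_code, (x, y, z)) for surface residues.
    """
    pools = [[r for r in residues if r[1] == aa] for aa in anchors]
    hits = []
    for combo in product(*pools):
        if len({r[0] for r in combo}) < len(combo):
            continue  # the same residue may not serve as two anchors
        if all(math.dist(a[2], b[2]) <= max_dist for a, b in combinations(combo, 2)):
            hits.append(tuple(r[0] for r in combo))
    return hits


# Invented coordinates in Å; K99 lies far away from the other residues.
surface = [
    ("D10", "D", (0, 0, 0)),
    ("F20", "F", (5, 0, 0)),
    ("K30", "K", (0, 8, 0)),
    ("K99", "K", (40, 0, 0)),
]
print(find_anchor_sets(surface))
```

Lowering `max_dist` to 15 or 12 corresponds to the preferred, stricter distances.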
The residues in question should be surface exposed, meaning that the residue should be more than 20% surface exposed, preferably more than 50% surface exposed, more preferably more than 70% surface exposed. The percentage “surface accessible area” of an amino acid residue of the parent protein is defined as the Connolly surface (ACC value), computed with the DSSP program for the relevant protein part of the structure, divided by the residue total surface area and multiplied by 100. The DSSP program is disclosed in W. Kabsch and C. Sander, BIOPOLYMERS 22 (1983) pp. 2577-2637. The residue total surface areas of the 20 natural amino acids are tabulated in Thomas E. Creighton, PROTEINS; Structure and Molecular Principles, W.H. Freeman and Company, NY, ISBN: 0-7167-1566-X (1984).
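The percentage calculation defined above amounts to the following minimal sketch. The ACC value and the residue total surface area are passed in by the caller; the numbers in the example are invented and are not the tabulated Creighton values.

```python
def percent_accessible(acc_value, total_surface_area):
    """Percentage surface accessible area: ACC value / residue total surface area * 100."""
    if total_surface_area <= 0:
        raise ValueError("total surface area must be positive")
    return acc_value / total_surface_area * 100.0


def is_exposed(acc_value, total_surface_area, threshold=20.0):
    """True if the residue is more than `threshold` % surface exposed (20, 50 or 70 in the text)."""
    return percent_accessible(acc_value, total_surface_area) > threshold


# Invented values in Å²: 50 out of 200 gives 25 %.
print(percent_accessible(50.0, 200.0))
print(is_exposed(50.0, 200.0, threshold=50.0))
```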
Substitution of one or more residues within 18 Å, preferably 15 Å, more preferably 12 Å, of the geometrical center of the residues involved in the epitope with bigger or smaller residues may destroy the epitope and make the protein less antigenic.
The number of residues involved in the epitope is 2, preferably 3, and more preferably 4.
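Candidate positions for such substitutions can be listed by computing the geometrical center of the epitope residues and collecting every residue within the chosen radius. The sketch below assumes one coordinate per residue and includes the epitope residues themselves in the result; the function names and coordinates are hypothetical.

```python
import math


def geometric_center(coords):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def residues_near_center(positions, epitope, radius=18.0):
    """Sorted labels of all residues within `radius` Å of the epitope's geometrical center.

    positions: dict mapping residue label to one (x, y, z) coordinate (an assumption).
    """
    center = geometric_center([positions[r] for r in epitope])
    return sorted(r for r, xyz in positions.items() if math.dist(xyz, center) <= radius)


# Invented coordinates in Å; the center of the three epitope residues is (2, 2, 0).
positions = {
    "D10": (0, 0, 0),
    "F20": (6, 0, 0),
    "K30": (0, 6, 0),
    "A50": (2, 2, 10),
    "G90": (30, 30, 30),
}
print(residues_near_center(positions, ["D10", "F20", "K30"], radius=12.0))
```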
Epitope sequences and hot-spot amino acids were mutated using standard techniques known to the person skilled in the field (e.g. site-directed mutagenesis, error-prone PCR; see for example Sambrook et al. (1989), Molecular Cloning. A Laboratory Manual, Cold Spring Harbor, NY).
In the examples shown below, variants were made by site-directed mutagenesis. Amino acid exchanges giving new epitopes or duplicating existing epitopes, according to the information collected in the epitope-database (See Example 1), were avoided in the mutagenesis process.
Enzyme variants were screened for reduced binding of antibodies raised against the backbone enzyme. Antibody binding was assessed by competitive ELISA as described in the Methods section.
Variants with reduced antibody binding capacity were further evaluated in the mouse SC animal model (See methods section).
The following variants showed reduced IgE and/or reduced IgG levels in the mouse model:
Hot-spots or epitopes were mutated using techniques known to the expert in the field (e.g. site-directed mutagenesis, error-prone PCR).
In the examples shown below, variants were made by site-directed mutagenesis. Amino acid exchanges giving new epitopes or duplicating existing epitopes, according to the information collected in the epitope-database, were avoided in the mutagenesis process.
Enzyme variants were screened for reduced binding of antibodies raised against the backbone enzyme. This antibody binding was assessed by established assays (e.g. competitive ELISA, agglutination assay).
Variants with reduced antibody binding capacity were further evaluated in animal studies.
Mice were immunised subcutaneously weekly, for a period of 20 weeks, with 50 microliters 0.9% (wt/vol) NaCl (control group), or 50 microliters 0.9% (wt/vol) NaCl containing 10 micrograms of protein. Blood samples (100 microliters) were collected from the eye one week after every second immunization. Serum was obtained by blood clotting and centrifugation.
Specific IgG1 and IgE levels were determined using the ELISA specific for mouse or rat IgG1 or IgE. Differences between data sets were analysed by using appropriate statistical methods.
A. Site-Directed Mutagenesis of Amino Acids Defining Epitopes, with an Effect on IgG1 and/or IgE Responses in Mice.
The variant carried the mutation R170F.
In a competitive IgE ELISA, this variant was less effective in competing for anti-savinase antibodies, giving a 15% lower endpoint inhibition as compared to the savinase backbone.
Mouse studies revealed an 80% reduction of the specific IgE levels, as compared to savinase backbone (p<0.01). The IgG1 levels were not significantly affected.
The variant carried the mutation S216W.
In a competitive IgG ELISA, the variant was less effective in competing for Lipolase antibodies, giving a 38% decrease in endpoint inhibition as compared to the enzyme backbone.
Mouse studies revealed a 69% decrease in specific IgG1 levels, compared to the lipolase backbone (p<0.05). The IgE levels were not significantly affected.
B. Site-Directed Mutagenesis of Epitopes, with Examples of Epitope Duplication, and New Epitope Formation, Respectively, Predicted by the Epitope-Database.
The variant carried the mutation E136R.
In a competitive IgG ELISA, the variant was less effective in competing for savinase antibodies, giving a 38% decrease in endpoint inhibition as compared to the savinase backbone.
Mouse studies revealed a dramatic increase in specific IgG1 levels, compared to savinase backbone (p<0.01). The IgE levels were not significantly affected.
Mutation E136R establishes an IgG1 epitope of the R Y P R/K pattern, previously identified on PD498. Apparently, this new epitope was more antigenic in mice than the existing epitope. The introduction of a savinase unrelated epitope on the savinase backbone could explain the observed discrepancy between competitive ELISA and animal studies.
In this example, it was found that using information derived exclusively from screening phage libraries with anti-PD498 antibodies (to identify the R Y P R/K epitope pattern of Table 2) one could predict the outcome of a genetic engineering experiment for Savinase in which the E136R mutation created the PD498-epitope on the Savinase surface, leading to increased immunogenicity of this Savinase variant. This demonstrates that the epitope patterns identified may be used to predict the effect on immunogenicity of substitutions in proteins that are different from the parent protein(s) used to identify the epitope pattern.
C. Site-Directed Mutagenesis of Amino Acids Defining Epitope Areas, with a Differential Effect on IgG1 and IgE Antibody Levels in Mice, and an Inhibiting Effect on IgG Binding, Respectively.
Epitope area: P131, S132, A133, L135, E136, V139, A151, A152, S153, G161, S162, I165, S166, Y167, P168, Y171, N173, A174, A176, Q191, Y192, G195, L196, R247, S259, T260, L262, Y263, G264.
The variant was different at position Y167 by the mutation Y167I.
In a competitive IgE ELISA, the variant was less effective in competing for anti-savinase antibodies, giving an 8% lower endpoint inhibition as compared to its backbone.
Mouse studies revealed a 75% reduction of the specific IgE levels, as compared to the backbone (p<0.01). In contrast, the IgG1 levels were dramatically increased (p<0.01).
Epitope area: V104, I107, A108, L111, E112, G115, S132, A133, T134, Q137, A138, V139, S141, A142, S144, R145, G146, V147, V149, Y167, P168, Y171, A172, A174, M175, N243, R247.
While variant no. 1 was mutated at the epitope position (N140D), variant no. 2 was mutated at N140 (N140D), but also at the epitope area position (A172D).
In a competitive IgG ELISA, variant no. 1 was less effective in competing for anti-savinase antibodies, as compared to savinase. This variant revealed a 21% lower endpoint inhibition as compared to its backbone.
Variant no. 2 resulted in an endpoint inhibition that was 60% lower as compared to savinase, and 40% lower as compared to variant no. 1.
4.9 mg of the Savinase variant was incubated in 50 mM Sodium Borate pH 9.5 with 12 mg of N-succinimidyl carbonate activated bis-PEG 1000 in a reaction volume of approximately 2 ml. The reaction was carried out at ambient temperature using magnetic stirring while keeping the pH within the interval 9.0-9.5 by addition of 0.5 M NaOH. The reaction time was 2 hours.
The derivative was purified and excess reagent removed by size exclusion chromatography on a Superdex-75 column (Pharmacia) equilibrated in 50 mM Sodium Borate, 5 mM Succinic Acid, 150 mM NaCl, 1 mM CaCl2 pH 6.0.
The conjugate was stored at −20° C., in the above described buffer.
Compared to the parent enzyme variant, the protease activity of the conjugate was retained (97% using Dimethyl-casein as substrate at pH 9).
Competitive ELISA was performed according to established procedures. In short, a 96 well ELISA plate was coated with the parent protein. After proper blocking and washing, the coated antigen was incubated with rabbit anti-enzyme polyclonal antiserum in the presence of various amounts of modified protein (the competitor).
The amount of residual rabbit antiserum was detected by horseradish peroxidase labelled pig anti-rabbit immunoglobulin.
The data show that the derivative (60% endpoint inhibition) has reduced capacity to bind enzyme specific immunoglobulins, as compared to the parent protein (100% endpoint inhibition).
For this example the epitope sequences were determined in four environmental allergens (Bet v1; Der f2; Der p2 and Phl p2), based on their structures (1btv.pdb; 1ahm.pdb; 1a9v.pdb; and 1whp.pdb, respectively), sequences (SEQ ID NOS: 6, 7, 8 and 9, respectively) and computer modelling of the epitope patterns that had been assembled in our database (shown in Table 8). The allergens arise from common sources of allergy: Birch (Bet v1 from Betula pendula), House dust mites (Der f2 from Dermatophagoides farinae and Der p2 from Dermatophagoides pteronyssinus), and Timothy grass (Phl p2 from Phleum pratense).
The protein surface is scanned for epitope patterns matching the given “consensus” sequence of about 6-12 residues. First, residues on the protein surface that match the first residue of the consensus sequence are identified. Within a specified distance from each of these, residues on the protein surface that match the next residue of the consensus sequence are identified. This procedure is repeated for the remaining residues of the consensus sequence. The method is further described under the paragraph “Methods” above and the computer program can be found in the Appendixes.
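The stepwise scan described above can be illustrated by the following sketch, which is not the program of the Appendixes. It assumes one coordinate per surface residue, measures the distance from the previously matched residue, and uses a hypothetical default step distance; pattern positions are given as sets of allowed amino acids, with `None` standing for any residue.

```python
import math


def scan_surface(surface, pattern, max_step=10.0):
    """Return every chain of surface residues matching `pattern` position by position.

    surface: list of (label, one_letter_code, (x, y, z)) for surface-exposed residues.
    pattern: list of sets of allowed amino acids (None = any residue).
    max_step: maximum distance in Å between consecutive matched residues (hypothetical default).
    """
    def extend(path, idx):
        if idx == len(pattern):
            yield tuple(r[0] for r in path)
            return
        allowed = pattern[idx]
        for res in surface:
            if res in path:
                continue  # each residue is used at most once per epitope
            if allowed is not None and res[1] not in allowed:
                continue
            if path and math.dist(path[-1][2], res[2]) > max_step:
                continue
            yield from extend(path + [res], idx + 1)

    return list(extend([], 0))


# Invented surface; the pattern R Y P R/K is written as a list of allowed-residue sets.
surface = [
    ("R10", "R", (0, 0, 0)),
    ("Y11", "Y", (4, 0, 0)),
    ("P12", "P", (8, 0, 0)),
    ("K13", "K", (12, 0, 0)),
    ("K50", "K", (50, 0, 0)),
]
print(scan_surface(surface, [{"R"}, {"Y"}, {"P"}, {"R", "K"}], max_step=5.0))
```

The resulting candidates could subsequently be sorted by total surface accessible area, as described for the mapping procedure.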
The critical parameters used in this screening included:
In this way a number of potential epitopes are identified. The epitopes are sorted according to total surface accessible area, and certain entries removed:
The epitope sequences found by this second generation mapping procedure were:
K96, P99, D129, 128, R128, A98
K77, H74, F75, N71, D69, V76
K15, S1, Q2, K14, A39, L17
For this example the third-generation epitope sequences were determined in a further 11 environmental allergens (Bosd2, Equc1, Gald4-mutant (with alanine substituted for glycine in position 102), Hevb8, Profilin1-AC, Profilin1-AT, Profilin2-AC, Profilin-birch pollen, Rag weed pollen5 and Vesv5), based on their sequences (SEQ ID NOS: 12, 13, 15, 16, 17, 18, 19, 20, 21 and 22, respectively), their structures (1bj7.pdb, 1ew3.pdb, 1flu.pdb, 1g5u.pdb, 1prq.pdb, 1a0k.pdb, 1f2k.pdb, 1cqa.pdb, 1bbg.pdb, and 1qnx.pdb, respectively), and computer modelling of the epitope patterns that had been assembled in our database (shown in Table 8). Further, the epitope sequences of the four environmental allergens of example 9, Bet v1, Der f2, Der p2, and Phl p2, were redetermined.
The additional allergens arise from common sources of allergy: cows (Bos d2, which is a bovine member of the lipocalin family of allergens), horses (Equ C1, a major horse allergen also of the lipocalin family), hen egg white (Lysozyme Gal D 4), latex (Hev b8, a profilin from Hevea brasiliensis), Acanthamoeba castellanii (Profilin1-AC, a profilin isoform IA, and Profilin2-AC, a profilin isoform II), Arabidopsis thaliana (Profilin1-AT, a cytoskeleton profilin), birch (Profilin-birch pollen, a birch pollen profilin), ragweed (Rag weed pollen5, ragweed pollen allergen V from Ambrosia trifida) and wasp venom (Ves v5 allergen from Vespula vulgaris venom).
The protein surface is scanned for epitope patterns matching the given “consensus” sequence of about 6-12 residues. First, residues on the protein surface that match the first residue of the consensus sequence are identified. Within a specified distance from each of these, residues on the protein surface that match the next residue of the consensus sequence are identified. This procedure is repeated for the remaining residues of the consensus sequence. The method is further described under the paragraph “Methods” above and the program can be found in Appendixes.
The critical parameters used in this screening included:
In this way a number of potential epitopes are identified. The epitopes are sorted according to total surface accessible area, and certain entries removed:
The epitope sequences found were:
“SAS” is solvent accessible surface. “Size” is the total surface area of the epitope in Å².
K96, P99, D129, 129, R128, A98
Phl p2:
For this example, third-generation epitope sequences were determined for some additional enzymes and redetermined for all of the enzymes in Examples 1-3. The new enzymes are AMG (AMG pdb), BPN′ (1sup.pdb), Esperase (structure see Appendix D), Natalase (structure modelling based on SP722), Amylase-AA560 (structure modelling based on SP722), Protease A, Alcalase, Protease B, ProteaseC, ProteaseD, ProteaseE, Properase and Relase, based on their sequences and structures. The structures of Protease B, Properase, Relase, Protease A, Alcalase, ProteaseC, ProteaseD and ProteaseE can be found by “Homology modelling” (see above) and computer modelling of the epitope patterns that had been assembled in our database (shown in Table 8). Furthermore, the epitope sequences were redetermined for CAREZYME, Laccase, PD498, Savinase, Amylase SP722, and Cellulase, according to the method.
The protein surface is scanned for epitope patterns matching the given “consensus” sequence of about 6-12 residues. First, residues on the protein surface that match the first residue of the consensus sequence are identified. Within a specified distance from each of these, residues on the protein surface that match the next residue of the consensus sequence are identified. This procedure is repeated for the remaining residues of the consensus sequence. The method is further described under the paragraph “Methods” above and the program can be found in Appendixes.
The critical parameters used in this screening included:
In this way a number of potential epitopes are identified. The epitopes are sorted according to total surface accessible area, and certain entries removed:
The subtilisin sequences and positions mentioned in the following are not given in the BPN′ numbering but in each subtilisin's own numbering (see the alignment as described above in Tables 1A and 1B).
The epitope sequences found were:
Epi#03
Epi#37
Epi#51
Epi#41
“SAS” is solvent accessible surface. “Size” is the total surface area of the epitope in Å².
The object of this example is to provide evidence showing that subtilisins with a homology to BPN′ of as low as 44.8% reveal a similar epitope distribution as BPN′.
Alcalase, Protease B, Savinase, Esperase, and PD498 (which range from 44.8% to 69.5% in sequence identity to BPN′) were epitope mapped as described in the above example, and compared with epitope mapped BPN′ (
The data in
Even better overlap between the epitope sequences can be found among proteins of higher sequence identity, such as within the Savinase-like subtilisins with more than 81% identity, preferably more than 85%, more preferably more than 90%, even more preferably more than 96% or most preferably more than 98% identity.
The following example provides results from a number of washing tests that were conducted under the conditions indicated
pH is adjusted to 10.5 which is within the normal range for a powder detergent.
Water hardness was adjusted by adding CaCl2 and MgCl2 (Ca2+:Mg2+=2:1) to deionized water (see also Surfactants in Consumer Products—Theory, Technology and Application, Springer Verlag 1986). pH of the detergent solution was adjusted to pH 10.5 by addition of HCl.
Measurement of reflectance (R) on the test material was done at 460 nm using a Macbeth ColorEye 7000 photometer. The measurements were done according to the manufacturer's protocol.
The wash performance of the variants was evaluated by calculating a performance factor:
P = (R_Variant − R_Blank) / (R_Savinase − R_Blank)
P: Performance factor
R_Variant: Reflectance of test material washed with variant
R_Savinase: Reflectance of test material washed with Savinase®
R_Blank: Reflectance of test material washed with no enzyme
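The performance factor defined above can be computed as in the following minimal sketch. The function name and the reflectance values in the example are hypothetical and are not measured data.

```python
def performance_factor(r_variant, r_savinase, r_blank):
    """P = (R_variant - R_blank) / (R_savinase - R_blank)."""
    denominator = r_savinase - r_blank
    if denominator == 0:
        raise ValueError("Savinase and blank reflectance must differ")
    return (r_variant - r_blank) / denominator


# Invented reflectance values at 460 nm: (70 - 40) / (60 - 40) = 1.5.
print(performance_factor(70.0, 60.0, 40.0))
```

A value of P above 1 indicates that the variant removes more stain than Savinase® under the same conditions.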
The variants all have improved wash performance compared to Savinase®—i.e. P>1.
The variants can be divided into improvement classes designated with capital letters:
As can be seen from Table 12, SAVINASE® variants of the invention exhibit an improvement in wash performance.
Number | Date | Country | Kind |
---|---|---|---|
PA 2000 00707 | Apr 2000 | DK | national |
PA 2001 00327 | Feb 2001 | DK | national |
This application is a continuation of U.S. application Ser. No. 09/957,806 filed on Sep. 21, 2001, which is a continuation of PCT/DK01/00293 filed Apr. 30, 2001 and claims, under 35 U.S.C. 119, priority or the benefit of Danish application nos. PA 2000 00707 and PA 2001 00327 filed Apr. 28, 2000 and Feb. 28, 2001, respectively, and U.S. application Nos. 60/203,345 and 60/277,817 filed May 10, 2000 and Mar. 21, 2001, respectively, the contents of which are fully incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
60203345 | May 2000 | US | |
60277817 | Mar 2001 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 09957806 | Sep 2001 | US |
Child | 12699979 | US | |
Parent | PCT/DK01/00293 | Apr 2001 | US |
Child | 09957806 | US |