Generation of stabilized proteins by combinatorial consensus mutagenesis

Information

  • Patent Application
  • 20050084868
  • Publication Number
    20050084868
  • Date Filed
    October 16, 2003
  • Date Published
    April 21, 2005
Abstract
The present invention provides methods and compositions for the production of stabilized proteins. In particular, the present invention provides methods and compositions for the generation of combinatorial libraries of consensus mutations and screening for improved protein variants.
Description
FIELD OF THE INVENTION

The present invention provides methods and compositions for the production of stabilized proteins. In particular, the present invention provides methods and compositions for the generation of combinatorial libraries of consensus mutations and screening for improved protein variants.


BACKGROUND OF THE INVENTION

Developing libraries of nucleic acids that comprise various combinations of several or many mutant or derivative sequences is recognized as a powerful method of discovering novel products having improved or more desirable characteristics. A number of powerful mutagenesis methods have been developed which, when used iteratively with focused screening to enrich for useful mutants, are known by the general term “directed evolution.”


For example, a variety of in vitro DNA recombination methods have been developed for the purpose of recombining more or less homologous nucleic acid sequences to obtain novel nucleic acids. Such methods include mixing a plurality of homologous, but different, nucleic acids, fragmenting the nucleic acids, and recombining them using PCR to form chimeric molecules. In particular, U.S. Pat. No. 5,605,793 describes methods that generally comprise fragmentation of double stranded DNA molecules by DNase I, while U.S. Pat. No. 5,965,408 provides methods that generally rely on the annealing of relatively short random primers to target genes and extending them with DNA polymerase. Each of these disclosures relies on polymerase chain reaction (PCR)-like thermocycling of fragments in the presence of DNA polymerase to recombine the fragments. Additional methods known in the art take advantage of the phenomenon known as template switching (See e.g., Meyerhans and Wain-Hobson, Nucleic Acids Res., 18:1687-1691 [1990]). One shortcoming of these PCR-based recombination methods, however, is that the recombination points tend to be limited to those areas of relatively significant homology. Accordingly, in recombining more diverse nucleic acids, the frequency of recombination is dramatically reduced and limited.


In many contexts, it is desirable to be able to develop libraries of mutant molecules that mix and match mutations which are known to be important or interesting due to functional or structural data. Several strategies toward combinatorial mutagenesis have been developed, including “gene shuffling” methods in combination with a mixture of specifically designed oligonucleotide primers to incorporate desired mutations into the shuffling scheme (See, Stemmer et al., Biotechn., 18:194-196 [1995]). In other methods (See, Osuna et al., Gene, 106:7-12 [1991]), synthetic DNA fragments comprising 50% wild type codon and 50% of an equimolar mixture of codons for each of the 20 amino acids at positions 144, 145 and 200 of EcoRI endonuclease were produced. The mutagenic primers were added to a solution of ssDNA template, and the primers for the 144 and 145 mutations were used separately from the primers for the 200 site. The separate mixtures from each experiment were hybridized to the template ssDNA and extended for one hour with PolIk polymerase. The fragments were isolated and ligated to produce a full length fragment with mutations at all three sites. The fragment was amplified with PCR, purified, and cloned into a vector. While it was predicted that a balanced distribution of each of the 20 mutants would be obtained at each position, the authors were unable to verify whether the predicted distribution was attained. In another method (See, Tu et al., Biotechn., 20:352-353 [1996]), combinations of mutations are generated by using multiple mutagenic oligonucleotides, which are incorporated into a mutagenic nucleotide by a single round of primer extension followed by ligation.
In yet another method (See, Merino et al., Biotechn., 12:508-509 [1992]) single or combinatorial directed mutagenesis utilizes a universal set of primers complementary to the areas that flank the cloning region of the pUC/M13 vectors used in the mutagenesis scheme for the purpose of optimizing yield of mutants. In a further method (See, PCT Publication No. WO 98/42728) several variations on the theme of recombination of related families of nucleic acids are provided. In particular, this publication describes the use of defined primers in combination with recombination based generation of diversity, the defined primers being used to encourage cross-over recombination at sites not otherwise likely to be cross-over points. Recently, methods have been described that allow the construction of libraries based on gene synthesis where the location and level of diversity in the target gene can be widely controlled (See e.g., Ostermeier, Trends Biotechnol., 21, 244-7 [2003]).


While it is apparent that a number of methods exist to construct libraries, it is desirable to develop more efficient methods to design libraries which contain an increased number of variants with improved traits. Indeed, what is needed are methods that provide means to rapidly and efficiently design proteins with desired improvements (e.g., increased stability).


SUMMARY OF THE INVENTION

The present invention provides methods and compositions for the production of stabilized proteins. In particular, the present invention provides methods and compositions for the generation of combinatorial libraries of consensus mutations and screening for improved protein variants.


In some preferred embodiments, the present invention provides methods for combinatorial consensus mutagenesis comprising the steps: a) identifying a starting gene of interest; b) identifying at least two homologs of the starting gene of interest; c) generating a multiple sequence alignment of the at least two homologs of the starting gene of interest, and the starting gene of interest; d) using the multiple sequence alignment to identify consensus mutations and produce a combinatorial consensus library; and e) screening the combinatorial consensus library to identify at least one initial hit.


In additional embodiments, the present invention provides methods for combinatorial consensus mutagenesis further comprising the steps: f) sequencing at least one initial hit to provide at least one sequenced initial hit; and g) identifying improving mutations in the at least one sequenced initial hit.


In still further embodiments, the present invention provides methods for combinatorial consensus mutagenesis further comprising the steps: h) using the sequenced initial hits to generate an enhanced combinatorial consensus library; and i) screening the enhanced combinatorial consensus library to identify at least one improved hit.


In yet additional embodiments, the methods of the present invention further comprise the step of sequencing improved hits. In alternative embodiments, the improved hits are stabilized variants of the starting gene. In some particularly preferred embodiments, the improved hits comprise performance-enhancing mutations. In still further embodiments of the methods of the present invention, screening comprises determining the stability of the initial hit in at least one assay selected from the group consisting of protease resistance assays, thermostability assays, denaturation assays, and functional assays. In yet additional preferred embodiments, the methods comprise the further step of analyzing the correlation between sequence and stability of at least two initial hits. In other preferred embodiments, methods of the present invention further comprise the step of analyzing the correlation between sequence and stability of at least two sequenced improved hits.


In some embodiments, the multiple sequence alignment identifies amino acids that occur frequently in homologs but are not part of a consensus sequence. In yet additional embodiments, the steps of the methods are repeated at least once, as desired.


The present invention also provides sequence improved hits that are produced according to the methods of the present invention. In additional embodiments, the present invention provides combinatorial consensus mutagenesis libraries produced according to the methods of the present invention.


In some preferred embodiments, the present invention provides stabilized variants of beta-lactamase, wherein the stabilized variant comprises at least one amino acid change selected from the group consisting of V11I, V251I, R91K, Q95E, A153S, N232R, S247T, V293L, V294I, T342K, I262V, and V284I.


In some alternative preferred embodiments, the present invention provides stabilized variants of carcinoembryonic antigen binder, wherein the stabilized variant comprises at least one amino acid change selected from the group consisting of K3Q, L37V, E42G, E136Q, M146V, F170Y, A194D, and A234G.


In yet additional preferred embodiments, the present invention provides stabilized single chain fragment variable region (scFv), wherein the stabilized scFv variant comprises at least one amino acid change selected from the group consisting of K3Q, L37V, E42G, E136Q, M146V, F170Y, A194D, and A234G.




DESCRIPTION OF THE FIGURES


FIG. 1 provides a map of the plasmid pCB04.



FIG. 2 provides the nucleotide sequence (SEQ ID NO:1) of plasmid pCB04.



FIG. 3 provides a graph showing the enrichment of consensus mutations observed during screening of NA04 library.



FIG. 4 provides a table showing the calculated parameters for some mutations.



FIG. 5 provides a graph showing the relative remaining activity of BLA variants of NA04 in the presence of three proteases.



FIG. 6 provides a graph showing the stability distribution of 90 variants from NA01, NA02 and NA03.



FIG. 7 provides the amino acid sequence of CAB1. The sequences of the heavy chain (SEQ ID NO:2), linker (SEQ ID NO:3), light chain (SEQ ID NO:4), and BLA (SEQ ID NO:5) are shown.



FIG. 8 provides a map of plasmid pME27.1, encoding CAB1.



FIG. 9 provides the nucleotide sequence of plasmid pME27.1 (SEQ ID NO:6).



FIG. 10 provides the amino acid sequences of consensus mutations used in constructing library NA05 (SEQ ID NOS:7-9).



FIG. 11 provides a graph showing the binding assay results for variants from the library NA05.



FIG. 12 provides a graph showing the binding of various isolates from NA06 to CEA.



FIG. 13 provides a brief schematic of the steps of the present invention.




DESCRIPTION OF THE INVENTION

The present invention provides methods and compositions for the production of stabilized proteins. In particular, the present invention provides methods and compositions for the generation of combinatorial libraries of consensus mutations and screening for improved protein variants.


Definitions


Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs (See e.g., Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York [1994]; and Hale & Margham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, N.Y. [1991], both of which provide one of skill with a general dictionary of many of the terms used herein). Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively. The headings provided herein are not limitations of the various aspects or embodiments of the invention that can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.


As used herein, the term “combinatorial mutagenesis” refers to the methods of the present invention in which libraries of variants of a starting sequence are generated. In these libraries, the variants contain one or several mutations chosen from a predefined set of mutations. In addition, the methods provide means to introduce random mutations which are not members of the predefined set of mutations. In some embodiments, the methods include those set forth in U.S. patent application Ser. No. 09/699,250, filed Oct. 26, 2000, hereby incorporated by reference. In alternative embodiments, combinatorial mutagenesis methods encompass commercially available kits (e.g., QuikChange Multisite, Stratagene, San Diego, Calif.).


As used herein, the term “library of mutants” refers to a population of cells which are identical in most of their genome but include different homologues of one or more genes. Such libraries can be used, for example, to identify genes or operons with improved traits.


As used herein, the term “starting gene” refers to a gene of interest that encodes a protein of interest that is to be improved and/or changed using the present invention.


As used herein, the term “multiple sequence alignment” (“MSA”) refers to the sequences of multiple homologs of a starting gene that are aligned using an algorithm (e.g., Clustal W).


As used herein, the terms “consensus sequence” and “canonical sequence” refer to an archetypical amino acid sequence against which all variants of a particular protein or sequence of interest are compared. The terms also refer to a sequence that sets forth the nucleotides that are most often present in a DNA sequence of interest. For each position of a gene, the consensus sequence gives the amino acid that is most abundant in that position in the MSA. For example, in the Pribnow box, the canonical sequence is T₈₉ A₈₉ T₅₀ A₆₅ and T₁₀₀, wherein the subscript indicates the percent occurrence of the most frequently found base.
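
The per-position definition above can be illustrated with a minimal Python sketch. It assumes the MSA is already available as equal-length aligned strings with "-" marking gaps. The function name `consensus_sequence` and the toy alignment are illustrative assumptions, not part of the disclosed method, and ties between equally abundant residues are resolved by first occurrence.

```python
from collections import Counter


def consensus_sequence(aligned):
    """Return the most abundant residue at each column of an alignment.

    `aligned` is a list of equal-length strings; '-' marks a gap.
    Ties go to the residue seen first in the column (an arbitrary choice).
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != "-")
        # A column consisting only of gaps stays a gap in the consensus.
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


# Hypothetical toy alignment of four short sequences.
msa = ["MKVLA", "MKILA", "MRILS", "MKILA"]
print(consensus_sequence(msa))  # MKILA
```

In practice the alignment itself would come from a dedicated tool such as Clustal W; the sketch only covers the counting step.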


As used herein, the term “consensus mutation” refers to a difference in the sequence of a starting gene and a consensus sequence. Consensus mutations are identified by comparing the sequences of the starting gene and the consensus sequence resulting from an MSA. In some embodiments, consensus mutations are introduced into the starting gene such that it becomes more similar to the consensus sequence. Consensus mutations also include amino acid changes that change an amino acid in a starting gene to an amino acid that is more frequently found in an MSA at that position relative to the frequency of that amino acid in the starting gene. Thus, the term consensus mutation comprises all single amino acid changes that replace an amino acid of the starting gene with an amino acid that is more abundant in the MSA at that position than the amino acid of the starting gene.
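
As a hedged illustration of this broader definition, the following Python sketch lists every single substitution that replaces a starting residue with one that is more abundant at the same alignment column. It assumes the starting sequence is ungapped and is itself counted as a member of the MSA; the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus_mutations(start, homologs):
    """List substitutions such as 'V3I' toward residues that are more
    abundant in the alignment column than the starting residue.

    Assumes `start` has no gaps and is counted as part of the MSA.
    """
    mutations = []
    for pos, column in enumerate(zip(start, *homologs), start=1):
        start_res = column[0]
        counts = Counter(res for res in column if res != "-")
        for residue, n in sorted(counts.items()):
            if residue != start_res and n > counts[start_res]:
                mutations.append(f"{start_res}{pos}{residue}")
    return mutations


# Hypothetical toy data: one starting sequence and five aligned homologs.
start = "MKVLA"
homologs = ["MKILT", "MRILT", "MKILS", "MKILS", "MKILT"]
print(consensus_mutations(start, homologs))  # ['V3I', 'A5S', 'A5T']
```

At position 5 the sketch reports both A5T (the consensus residue) and A5S (frequent, but not the consensus), which corresponds to the distinction drawn in the definition.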


As used herein, the term “initial hit” refers to a variant that was identified by screening a combinatorial consensus mutagenesis library. In preferred embodiments, initial hits have improved performance characteristics, as compared to the starting gene.


As used herein, the term “improved hit” refers to a variant that was identified by screening an enhanced combinatorial consensus mutagenesis library.


As used herein, the terms “improving mutation” and “performance-enhancing mutation” refer to a mutation that leads to improved performance when it is introduced into the starting gene. In some preferred embodiments, these mutations are identified by sequencing hits that were identified during the screening step of the method. In most embodiments, mutations that are found more frequently in hits than in an unscreened combinatorial consensus mutagenesis library are likely to be improving mutations.


As used herein, the term “enhanced combinatorial consensus mutagenesis library” refers to a combinatorial consensus mutagenesis (CCM) library that is designed and constructed based on screening and/or sequencing results from an earlier round of CCM and screening. In some embodiments, the enhanced CCM library is based on the sequence of an initial hit resulting from an earlier round of CCM. In additional embodiments, the enhanced CCM library is designed such that mutations that were frequently observed in initial hits from earlier rounds of mutagenesis and screening are favored. In some preferred embodiments, this is accomplished by omitting primers that encode performance-reducing mutations or by increasing the concentration of primers that encode performance-enhancing mutations relative to other primers that were used in earlier CCM libraries.


As used herein, the term “performance-reducing mutations” refers to mutations in the combinatorial consensus mutagenesis library that are found less frequently in hits resulting from screening than in an unscreened combinatorial consensus mutagenesis library. In preferred embodiments, the screening process removes and/or reduces the abundance of variants that contain “performance-reducing mutations.”
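
The frequency comparison behind the terms “improving mutation” and “performance-reducing mutation” can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each sequenced clone is represented as a set of mutation labels, the data are invented, and no statistical test is applied, so small differences in real data should not be over-interpreted.

```python
def mutation_frequencies(variants):
    """Fraction of variants that carry each mutation."""
    counts = {}
    for variant in variants:
        for mutation in set(variant):
            counts[mutation] = counts.get(mutation, 0) + 1
    return {m: n / len(variants) for m, n in counts.items()}


def classify_mutations(unscreened, hits):
    """Label each library mutation by comparing its frequency in hits
    with its frequency in the unscreened library."""
    lib_freq = mutation_frequencies(unscreened)
    hit_freq = mutation_frequencies(hits)
    labels = {}
    for mutation, f_lib in lib_freq.items():
        f_hit = hit_freq.get(mutation, 0.0)
        if f_hit > f_lib:
            labels[mutation] = "improving"
        elif f_hit < f_lib:
            labels[mutation] = "reducing"
        else:
            labels[mutation] = "neutral"
    return labels


# Hypothetical sequencing results: one set of mutations per clone.
unscreened = [{"V3I"}, {"A5S"}, {"V3I", "A5S"}, set()]
hits = [{"V3I"}, {"V3I"}, {"V3I", "A5S"}, {"V3I"}]
print(classify_mutations(unscreened, hits))
```

Here V3I rises from 50% of unscreened clones to 100% of hits and is labeled improving, while A5S falls from 50% to 25% and is labeled reducing.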


As used herein, the term “functional assay” refers to an assay that provides an indication of a protein's activity. In particularly preferred embodiments, the term refers to assay systems in which a protein is analyzed for its ability to function in its usual capacity. For example, in the case of enzymes, a functional assay involves determining the effectiveness of the enzyme in catalyzing a reaction.


As used herein, the term “target property” refers to the property of the starting gene that is to be altered. It is not intended that the present invention be limited to any particular target property. However, in some preferred embodiments, the target property is the stability of a gene product (e.g., resistance to denaturation, proteolysis or other degradative factors), while in other embodiments, the level of production in a production host is altered. Indeed, it is contemplated that any property of a starting gene will find use in the present invention.


The term “property” or grammatical equivalents thereof in the context of a nucleic acid, as used herein, refers to any characteristic or attribute of a nucleic acid that can be selected or detected. These properties include, but are not limited to, a property affecting binding to a polypeptide, a property conferred on a cell comprising a particular nucleic acid, a property affecting gene transcription (e.g., promoter strength, promoter recognition, promoter regulation, enhancer function), a property affecting RNA processing (e.g., RNA splicing, RNA stability, RNA conformation, and post-transcriptional modification), and a property affecting translation (e.g., level, regulation, binding of mRNA to ribosomal proteins, post-translational modification). For example, a binding site for a transcription factor, polymerase, regulatory factor, etc., of a nucleic acid may be altered to produce desired characteristics or to identify undesirable characteristics.


The term “property” or grammatical equivalents thereof in the context of a polypeptide, as used herein, refers to any characteristic or attribute of a polypeptide that can be selected or detected. These properties include, but are not limited to, oxidative stability, substrate specificity, catalytic activity, thermal stability, alkaline stability, pH activity profile, resistance to proteolytic degradation, Km, kcat, kcat/Km ratio, protein folding, inducing an immune response, ability to bind to a ligand, ability to bind to a receptor, ability to be secreted, ability to be displayed on the surface of a cell, ability to oligomerize, ability to signal, ability to stimulate cell proliferation, ability to inhibit cell proliferation, ability to induce apoptosis, ability to be modified by phosphorylation or glycosylation, and ability to treat disease.


As used herein, the term “screening” has its usual meaning in the art and is, in general, a multi-step process. In the first step, a mutant nucleic acid or variant polypeptide therefrom is provided. In the second step, a property of the mutant nucleic acid or variant polypeptide is determined. In the third step, the determined property is compared to a property of the corresponding precursor nucleic acid, to the property of the corresponding naturally occurring polypeptide or to the property of the starting material (e.g., the initial sequence) for the generation of the mutant nucleic acid.


It will be apparent to the skilled artisan that the screening procedure for obtaining a nucleic acid or protein with an altered property depends upon the property of the starting material that the generation of the mutant nucleic acid is intended to modify. The skilled artisan will therefore appreciate that the invention is not limited to any specific property to be screened for and that the following description of properties lists illustrative examples only. Methods for screening for any particular property are generally described in the art. For example, one can measure binding, pH, specificity, etc., before and after mutation, wherein a change indicates an alteration. Preferably, the screens are performed in a high-throughput manner, including multiple samples being screened simultaneously, including, but not limited to, assays utilizing chips, phage display, and multiple substrates and/or indicators.


As used herein, in some embodiments, screens encompass selection steps in which variants of interest are enriched from a population of variants. Examples of these embodiments include the selection of variants that confer a growth advantage to the host organism, as well as phage display or any other method of display, where variants can be captured from a population of variants based on their binding or catalytic properties. In a preferred embodiment, a library of variants is exposed to stress (heat, protease, denaturation) and subsequently variants that are still intact are identified in a screen or enriched by selection. It is intended that the term encompass any suitable means for selection. Indeed, it is not intended that the present invention be limited to any particular method of screening.
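
A stress-based screen of this kind can be reduced to a simple comparison, sketched below in Python. The sketch assumes that each variant's activity is measured before and after the stress treatment and that a hit is any variant retaining a larger fraction of its activity than the starting protein; the function names, the hit criterion, and all numbers are illustrative assumptions only.

```python
def residual_activity(before, after):
    """Fraction of activity remaining after a stress treatment."""
    if before <= 0:
        raise ValueError("activity before stress must be positive")
    return after / before


def select_hits(parent, variants):
    """Return names of variants that retain a larger fraction of activity
    than the parent. Measurements are (before, after) pairs."""
    threshold = residual_activity(*parent)
    return sorted(
        name
        for name, (before, after) in variants.items()
        if residual_activity(before, after) > threshold
    )


# Hypothetical activity readings in arbitrary units.
parent = (100.0, 20.0)  # parent keeps 20% of its activity
variants = {
    "v1": (90.0, 45.0),   # keeps 50%
    "v2": (110.0, 11.0),  # keeps 10%
    "v3": (80.0, 40.0),   # keeps 50%
}
print(select_hits(parent, variants))  # ['v1', 'v3']
```

Normalizing by the pre-stress reading keeps differences in expression level from being mistaken for differences in stability.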


In one embodiment of the invention, the template nucleic acid encodes all or a portion of an antibody. The term “antibody” or grammatical equivalents, as used herein, refers to antibodies and antibody fragments that retain the ability to bind to the epitope that the intact antibody binds, and includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, and anti-idiotype (anti-ID) antibodies. Preferably, the antibodies are monoclonal antibodies. Antibody fragments include, but are not limited to, the complementarity-determining regions (CDRs), single-chain fragment variable regions (scFv), heavy chain variable region (VH), and light chain variable region (VL).


As used herein, “host cell” refers to a cell that has the capacity to act as a host and expression vehicle for an incoming sequence. In one embodiment, the host cell is a microorganism.


As used herein, the terms “DNA construct” and “transforming DNA” are used interchangeably to refer to DNA used to introduce sequences into a host cell or organism. The DNA may be generated in vitro by PCR or any other suitable technique(s) known to those in the art. In particularly preferred embodiments, the DNA construct comprises a sequence of interest (e.g., as an incoming sequence). In some embodiments, the sequence is operably linked to additional elements such as control elements (e.g., promoters, etc.). The DNA construct may further comprise a selectable marker. It may further comprise an incoming sequence flanked by homology boxes. In a further embodiment, the transforming DNA comprises other non-homologous sequences, added to the ends (e.g., stuffer sequences or flanks). In some embodiments, the ends of the incoming sequence are closed such that the transforming DNA forms a closed circle. The transforming sequences may be wild-type, mutant or modified. In some embodiments, the DNA construct comprises sequences homologous to the host cell chromosome. In other embodiments, the DNA construct comprises non-homologous sequences. Once the DNA construct is assembled in vitro it may be used to: 1) insert heterologous sequences into a desired target sequence of a host cell; 2) mutagenize a region of the host cell chromosome (i.e., replace an endogenous sequence with a heterologous sequence); 3) delete target genes; and/or 4) introduce a replicating plasmid into the host.


As used herein, the term “targeted randomization” refers to a process that produces a plurality of sequences where one or several positions have been randomized. In some embodiments, randomization is complete (i.e., all four nucleotides, A, T, G, and C, can occur at a randomized position). In alternative embodiments, randomization of a nucleotide is limited to a subset of the four nucleotides. Targeted randomization can be applied to one or several codons of a sequence, coding for one or several proteins of interest. When expressed, the resulting libraries produce protein populations in which one or more amino acid positions can contain a mixture of all 20 amino acids or a subset of amino acids, as determined by the randomization scheme of the randomized codon. In some embodiments, the individual members of a population resulting from targeted randomization differ in the number of amino acids, due to targeted or random insertion or deletion of codons. In further embodiments, synthetic amino acids are included in the protein populations produced. In some preferred embodiments, the majority of members of a population resulting from targeted randomization show greater sequence homology to the consensus sequence than the starting gene.
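
How a randomization scheme determines the set of encoded amino acids can be shown with a short Python sketch that expands a degenerate codon written in IUPAC nucleotide codes and translates it with the standard genetic code. The degenerate codons NNK and RTT are common textbook examples chosen here for illustration; they are not taken from the present disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(degenerate):
    """All concrete codons matched by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


def encoded_amino_acids(degenerate):
    """Set of amino acids ('*' = stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand_codon(degenerate)}


# Complete randomization of the first two bases, G or T at the third.
print(len(expand_codon("NNK")), sorted(encoded_amino_acids("NNK")))
# Limited randomization: only A or G at the first base.
print(sorted(encoded_amino_acids("RTT")))  # ['I', 'V']
```

NNK expands to 32 codons covering all 20 amino acids plus one stop codon (TAG), whereas RTT yields only isoleucine and valine, a subset of the kind described above.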


In some preferred embodiments, mutant DNA sequences are generated with site saturation mutagenesis in at least one codon. In other preferred embodiments, site saturation mutagenesis is performed for two or more codons. In a further embodiment, mutant DNA sequences have more than 40%, more than 45%, more than 50%, more than 55%, more than 60%, more than 65%, more than 70%, more than 75%, more than 80%, more than 85%, more than 90%, more than 95%, or more than 98% homology with the sequence of the starting gene. Alternatively, mutant DNA may be generated in vivo using any known mutagenic procedure (e.g., radiation, nitrosoguanidine, etc.). The DNA construct sequences may be wild-type, mutant or modified. In addition, the sequences may be homologous or heterologous.
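
One simple way to check a variant against thresholds of this kind is to compute percent identity over equal-length sequences, as in the Python sketch below. Treating “homology” as position-by-position identity without gaps is an assumption made for illustration; the text does not specify how the percentage is to be calculated, and the sequences are invented.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-base starting sequence and a single-base mutant.
starting = "ATGGTTAAAC"
mutant = "ATGATTAAAC"
identity = percent_identity(starting, mutant)
print(identity)       # 90.0
print(identity > 85)  # True
```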


The terms “modified sequence” and “modified genes” are used interchangeably herein to refer to a sequence that includes a deletion, insertion or interruption of naturally occurring nucleic acid sequence. In some preferred embodiments, the expression product of the modified sequence is a truncated protein (e.g., if the modification is a deletion or interruption of the sequence). In some particularly preferred embodiments, the truncated protein retains biological activity. In alternative embodiments, the expression product of the modified sequence is an elongated protein (e.g., modifications comprising an insertion into the nucleic acid sequence). In some embodiments, an insertion leads to a truncated protein (e.g., when the insertion results in the formation of a stop codon). Thus, an insertion may result in either a truncated protein or an elongated protein as an expression product.


As used herein, the terms “mutant sequence” and “mutant gene” are used interchangeably and refer to a sequence that has an alteration in at least one codon occurring in a host cell's wild-type sequence. The expression product of the mutant sequence is a protein with an altered amino acid sequence relative to the wild-type. The expression product may have an altered functional capacity (e.g., enhanced enzymatic activity).


The terms “mutagenic primer” or “mutagenic oligonucleotide” (used interchangeably herein) are intended to refer to oligonucleotide compositions which correspond to a portion of the template sequence and which are capable of hybridizing thereto. With respect to mutagenic primers, the primer will not precisely match the template nucleic acid, the mismatch or mismatches in the primer being used to introduce the desired mutation into the nucleic acid library. As used herein, “non-mutagenic primer” or “non-mutagenic oligonucleotide” refers to oligonucleotide compositions which will match precisely to the template nucleic acid. In one embodiment of the invention, only mutagenic primers are used. In another preferred embodiment of the invention, the primers are designed so that for at least one region at which a mutagenic primer has been included, there is also non-mutagenic primer included in the oligonucleotide mixture. By adding a mixture of mutagenic primers and non-mutagenic primers corresponding to at least one of the mutagenic primers, it is possible to produce a resulting nucleic acid library in which a variety of combinatorial mutational patterns are presented. For example, if it is desired that some of the members of the mutant nucleic acid library retain their precursor sequence at certain positions while other members are mutant at such sites, the non-mutagenic primers provide the ability to obtain a specific level of non-mutant members within the nucleic acid library for a given residue. The methods of the invention employ mutagenic and non-mutagenic oligonucleotides which are generally between 10-50 bases in length, more preferably about 15-45 bases in length. However, it may be necessary to use primers that are either shorter than 10 bases or longer than 50 bases to obtain the mutagenesis result desired. 
With respect to corresponding mutagenic and non-mutagenic primers, it is not necessary that the corresponding oligonucleotides be of identical length, but only that there is overlap in the region corresponding to the mutation to be added.


Primers may be added in a pre-defined ratio according to the present invention. For example, if it is desired that the resulting library have a significant level of a certain specific mutation and a lesser amount of a different mutation at the same or different site, by adjusting the amount of primer added, it is possible to produce the desired biased library. Alternatively, by adding lesser or greater amounts of non-mutagenic primers, it is possible to adjust the frequency with which the corresponding mutation(s) are produced in the mutant nucleic acid library.
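
The effect of primer ratios on library composition can be estimated with a rough Python sketch. It assumes an idealized model in which every primer for a site anneals with equal efficiency and sites are incorporated independently, so the expected frequency of a mutation equals its share of the primer mix; real libraries may deviate from this. The primer amounts are hypothetical, and the mutation labels are borrowed from the beta-lactamase list above purely as names.

```python
def expected_frequencies(primer_amounts):
    """Normalize primer amounts per site into expected allele fractions.

    `primer_amounts` maps site -> {allele: relative amount of primer}.
    """
    frequencies = {}
    for site, amounts in primer_amounts.items():
        total = sum(amounts.values())
        frequencies[site] = {a: amt / total for a, amt in amounts.items()}
    return frequencies


def combination_probability(frequencies, choice):
    """Expected fraction of variants with the chosen allele at each site,
    assuming sites are incorporated independently."""
    probability = 1.0
    for site, allele in choice.items():
        probability *= frequencies[site][allele]
    return probability


# Hypothetical mix: 3:1 non-mutagenic to mutagenic primer at one site,
# 1:1 at the other.
amounts = {
    "site_11": {"wild_type": 3.0, "V11I": 1.0},
    "site_91": {"wild_type": 1.0, "R91K": 1.0},
}
freqs = expected_frequencies(amounts)
print(freqs["site_11"]["V11I"])  # 0.25
print(combination_probability(freqs, {"site_11": "V11I", "site_91": "R91K"}))  # 0.125
```

Raising the amount of non-mutagenic primer at a site lowers the expected fraction of mutants at that site, which is the adjustment described in the paragraph above.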


“Contiguous mutations” means mutations which are presented within the same oligonucleotide primer. For example, contiguous mutations may be adjacent or nearby each other, however, they will be introduced into the resulting mutant template nucleic acids by the same primer.


“Discontiguous mutations” means mutations which are presented in separate oligonucleotide primers. For example, discontiguous mutations will be introduced into the resulting mutant template nucleic acids by separately prepared oligonucleotide primers.


An “incoming sequence” as used herein means a DNA sequence that is newly introduced into the host cell. In some embodiments, the incoming sequence becomes integrated into the host chromosome or genome. The sequence may encode one or more proteins of interest. Thus, as used herein, the term “sequence of interest” refers to an incoming sequence or a sequence to be generated by the host cell. The terms “gene of interest” and “sequence of interest” are used interchangeably herein.


The incoming sequence may comprise a promoter operably linked to a sequence of interest. An incoming sequence comprises a sequence that may or may not already be present in the genome of the cell to be transformed (i.e., homologous and heterologous sequences find use with the present invention).


In one embodiment, the incoming sequence encodes at least one heterologous protein, including, but not limited to hormones, enzymes, and growth factors. In an alternative embodiment, the incoming sequence encodes a functional wild-type gene or operon, a functional mutant gene or operon, or a non-functional gene or operon. In some embodiments, the non-functional sequence is inserted into a target sequence to disrupt function, thereby allowing a determination of function of the disrupted gene.


The terms “wild-type sequence,” or “wild-type gene” are used interchangeably herein, to refer to a sequence that is native or naturally occurring in a host cell. In some embodiments, the wild-type sequence refers to a sequence of interest that is the starting point of a protein engineering project. The wild-type sequence may encode either a homologous or heterologous protein. A homologous protein is one the host cell would produce without intervention. A heterologous protein is one that the host cell would not produce but for the intervention.


As used herein, the term “heterologous sequence” refers to a sequence derived from a separate genetic source or species. Heterologous sequences encompass non-host sequences, modified sequences, sequences from a different host cell strain, and homologous sequences from a different chromosomal location of the host cell. In some embodiments, homology boxes flank each side of an incoming sequence.


As used herein, the term “selectable marker” refers to genes that provide an indication that a host cell has taken up an incoming DNA of interest or some other reaction has occurred. Typically, selectable markers are genes that confer antibiotic resistance or a metabolic advantage on the host cell to allow cells containing the exogenous DNA to be distinguished from cells that have not received any exogenous sequence during the transformation. A “residing selectable marker” is one that is located on the chromosome of the microorganism to be transformed. A residing selectable marker encodes a gene that is different from the selectable marker on the transforming DNA construct.


DETAILED DESCRIPTION OF THE INVENTION

The present invention provides methods and compositions for the production of stabilized proteins. In particular, the present invention provides methods and compositions for the generation of combinatorial libraries of consensus mutations and screening for improved protein variants.


Protein sequences of organisms have evolved as a result of random mutagenesis and selection. During this process of evolution, many mutations that de-stabilize or otherwise reduce performance of a protein are removed and performance-enhancing mutations are retained. However, evolution also leads to the accumulation of random mutations that may be performance-reducing but have little impact on the fitness of their host organism. Multiple sequence alignments of homologous proteins make it possible to identify which amino acid is frequently found at a particular position of a protein. These consensus residues are likely to result in functional mutants if they are introduced into a particular sequence of a family of related proteins, and it has been demonstrated that such consensus mutations can lead to variants with improved function (See e.g., Steipe et al., J. Mol. Biol., 240: 188-92 [1994]). Thus, it is possible to improve the performance of a protein by systematically introducing individual consensus mutations into a protein. However, this process is very time consuming, as the number of possible consensus mutations can be large and it may be necessary to incorporate several consensus mutations to achieve the desired performance enhancement. An alternative method involves the direct synthesis of a protein's consensus sequence (Lehmann et al., Protein Eng., 13:49-57 [2000]). Indeed, this approach was used to identify a stabilized phytase variant. However, the authors noted in subsequent studies that not all consensus mutations were stabilizing. Thus, it was necessary to remove a number of consensus mutations, which again is a slow and iterative process (Lehmann et al., Protein Eng., 15:403-11 [2002]).


During the development of the present invention, the assumption was made that consensus mutations can be divided into “improving mutations” and “performance-reducing mutations.” Thus, methods were developed that allow for the rapid generation of variants of a starting protein that contain a number of improving mutations and few if any performance-reducing mutations. As part of the process, combinatorial consensus mutagenesis (CCM) libraries are created that contain multiple combinations of consensus mutations. In some particularly preferred embodiments, these CCM libraries are screened to identify “initial hits” which contain one or several improving mutations and few if any performance-reducing mutations. In some cases, the resulting initial hits are sufficiently improved for their intended application. However, the present invention further provides methods that allow further improvement of these initial hits. By sequencing several initial hits from a CCM library, improving mutations which are more common among the hits as compared to the initial CCM library are identified. This information facilitates the construction of a second (i.e., “enhanced”) CCM library that is enriched in improving mutations. In some embodiments, the enhanced CCM library is constructed based on the starting gene. In alternative embodiments, the enhanced CCM library is started from one or several of the initial hits, which already contain some improving mutations, and further improving mutations (found in other initial hits) are added to them. If further enhancement is desired, further rounds of CCM library construction based on already improved hits and/or based on additional sequence information resulting from improved and initial hits are performed. This combinatorial process allows one to rapidly identify variants of the starting gene that contain multiple improving consensus mutations but few if any performance-reducing mutations. An overview of the CCM process is outlined in FIG. 13.


In particularly preferred embodiments, it is important to note that the effect of mutations on the performance of a protein is not necessarily additive. Thus, mutations that enhance the performance of the starting gene may not necessarily have the same effect in a variant of that gene. One advantage of the CCM process of the present invention is that it explores many combinations of consensus mutations. Thus, the present invention is very likely to identify combinations of such mutations that lead to large improvements in gene performance.


In preferred embodiments, the present invention provides means to identify homologs of a starting gene through use of database searching and/or homology cloning from a sample of interest (e.g., an environmental sample). Once the homolog(s) are identified, multiple sequence alignments (MSA) are generated and consensus mutations are identified. Depending upon the number of differences between the starting sequence and the consensus sequence, the positions at which the MSA gives a clear consensus that differs from the starting gene can be chosen for further investigation. In alternative embodiments, positions are included in the MSA where many homologs differ from the starting sequence, even when there is no clear consensus in that position. In these alternative embodiments, it is possible to generate larger libraries containing more diverse variants.
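
The selection of positions with a clear consensus can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `consensus_mutations`, the gap character `-`, and the majority cutoff `min_fraction` are hypothetical choices made here, and the cutoff is not the consensus definition used by any particular alignment program.

```python
from collections import Counter


def consensus_mutations(alignment, start_index=0, min_fraction=0.5):
    """List positions where a clear consensus residue differs from the starting sequence.

    alignment    -- list of equal-length aligned sequences ('-' marks a gap)
    start_index  -- which aligned sequence is the starting protein
    min_fraction -- hypothetical cutoff: the most common residue must exceed
                    this fraction of the non-gap residues in the column
    Returns (position, starting_residue, consensus_residue) tuples, with
    positions counted 1-based along the ungapped starting sequence.
    """
    start = alignment[start_index]
    hits = []
    position = 0
    for col, start_residue in enumerate(start):
        if start_residue == "-":
            continue  # column absent from the starting sequence
        position += 1
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        residue, count = Counter(residues).most_common(1)[0]
        if count / len(residues) > min_fraction and residue != start_residue:
            hits.append((position, start_residue, residue))
    return hits


# Toy alignment (invented for illustration, not real beta-lactamase data).
toy = ["MKTAV", "MRTAI", "MRTGI", "MRSAI"]
for position, old, new in consensus_mutations(toy):
    print(f"{old}{position}{new}")
```

Lowering `min_fraction` would correspond to the alternative embodiments in which positions without a clear consensus are also included, at the cost of a larger library.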


Next, mutagenic oligonucleotides are designed that introduce the chosen consensus mutation into the starting gene. Then, combinatorial mutagenesis is performed to produce a library of variants. Once this step is completed, improved variants in the library are identified. It is not intended that the present invention be limited to any particular method of screening variants and identifying those with improved properties. Indeed, those of skill in the art know how to best choose a method, as it will depend upon the starting gene, expression host, and the target property to be improved.


In additional embodiments, the variants in the library are sequenced, in particular those that have been improved. In further embodiments, statistical analyses are conducted to estimate the contribution of each individual mutation to the performance of the individual variants. In yet further embodiments, a second combinatorial library is generated, based on the results of the statistical analyses.


EXPERIMENTAL

The following examples are provided in order to demonstrate and further illustrate certain preferred embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.


In the experimental disclosure which follows, the following abbreviations apply: ° C. (degrees Centigrade); rpm (revolutions per minute); H2O (water); dH2O (deionized water); HCl (hydrochloric acid); aa (amino acid); bp (base pair); kb (kilobase pair); kD (kilodaltons); gm (grams); μg and ug (micrograms); mg (milligrams); ng (nanograms); μl (microliters); ml (milliliters); mm (millimeters); nm (nanometers); μm and um (micrometer); M (molar); mM (millimolar); μM and uM (micromolar); U (units); V (volts); MW (molecular weight); sec (seconds); min(s) (minute/minutes); hr(s) (hour/hours); MgCl2 (magnesium chloride); NaCl (sodium chloride); SOC (2% Bacto-Tryptone, 0.5% Bacto Yeast Extract, 10 mM NaCl, 2.5 mM KCl); Terrific Broth (TB; 12 g/l Bacto Tryptone, 24 g/l glycerol, 2.31 g/l KH2PO4, and 12.54 g/l K2HPO4); OD280 (optical density at 280 nm); OD600 (optical density at 600 nm); C (constant region or chain); V (variable chain); vH and VH (variable heavy chain); vL and VL (variable light chain); PAGE (polyacrylamide gel electrophoresis); PBS (phosphate buffered saline [150 mM NaCl, 10 mM sodium phosphate buffer, pH 7.2]); PBST (PBS+0.25% Tween® 20); PEG (polyethylene glycol); PCR (polymerase chain reaction); RT-PCR (reverse transcription PCR); SDS (sodium dodecyl sulfate); Tris (tris(hydroxymethyl)aminomethane); w/v (weight to volume); v/v (volume to volume); CEA (carcinoembryonic antigen); CAB (CEA antigen binder); LA medium (per liter: Difco Tryptone Peptone 20 g, Difco Yeast Extract 10 g, EM Science NaCl 1 g, EM Science Agar 17.5 g, dH2O to 1 L); NCBI (National Center for Biotechnology Information); ATCC (American Type Culture Collection, Rockville, Md.); Applied Biosystems (Applied Biosystems, Foster City, Calif.); Clontech (CLONTECH Laboratories, Palo Alto, Calif.); Difco (Difco Laboratories, Detroit, Mich.); Oxoid (Oxoid Inc., Ogdensburg, N.Y.); GIBCO BRL or Gibco BRL (Life Technologies, Inc., Gaithersburg, Md.); Millipore (Millipore, Billerica, Mass.); Bio-Rad (Bio-Rad, Hercules, Calif.); Invitrogen (Invitrogen Corp., San Diego, Calif.); NEB (New England Biolabs, Beverly, Mass.); Sigma (Sigma Chemical Co., St. Louis, Mo.); Pierce (Pierce Biotechnology, Rockford, Ill.); Takara (Takara Bio Inc., Otsu, Japan); Roche (Hoffmann-La Roche, Basel, Switzerland); EM Science (EM Science, Gibbstown, N.J.); Qiagen (Qiagen, Inc., Valencia, Calif.); Biodesign (Biodesign Intl., Saco, Me.); Aptagen (Aptagen, Inc., Herndon, Va.); Molecular Devices (Molecular Devices, Corp., Sunnyvale, Calif.); Stratagene (Stratagene Cloning Systems, La Jolla, Calif.); and Microsoft (Microsoft, Inc., Redmond, Wash.).


Example 1
Combinatorial Consensus Mutagenesis of BLA

In this Example, the use of combinatorial consensus mutagenesis with beta-lactamase (BLA) is described. These experiments were performed using plasmid pCB04, which directs the expression of BLA from Enterobacter cloacae. BLA expression is driven by a lac promoter. The protein is secreted into the periplasm of E. coli, as it contains a leader peptide from the pIII protein of bacteriophage M13. The BLA gene is fused to a gene coding for the D3 domain of the pIII protein of bacteriophage M13. However, there is an amber stop codon located between the two genes and consequently, TOP10 cells (Invitrogen) carrying the plasmid express BLA and not a fusion protein. Expression of BLA from plasmid pCB04 confers resistance to the antibiotic cefotaxime on the cells. FIG. 1 provides a map of plasmid pCB04, while FIG. 2 provides the nucleotide sequence (SEQ ID NO:1) of plasmid pCB04. Plasmid pCB04 contains the following features:

P lac: 3008-3129 bp
gIII signal: 3200-3253
BLA: 3254-4336
His Tag: 4364-4384
gIII d3: 4421-5053
F1 origin: 175-630
CAT: 3253-3912


Choosing Mutations for Mutagenesis


Forty-three publicly available protein sequences for bacterial beta-lactamases of class C type were identified by a keyword search of protein sequences available at NCBI. Among the available sequences were three of particular note: NCBI accession number PNKBP corresponded to the Enterobacter cloacae enzyme that has been used as the backbone for protein engineering; NCBI accession number AMPC_PSYIM corresponded to a lactamase isolated from a psychrophilic organism; and NCBI accession number AAM23514 corresponded to a lactamase isolated from a thermophilic organism.


Table 1 provides the accession numbers and corresponding species for the 38 BLA sequences used in the multiple sequence alignment.

TABLE 1
Sequences Used in Multiple Sequence Alignment

NCBI Accession #    Organism
AAL49969            Shewanella algae
AAM23514            Thermoanaerobacter tengcongensis
AAM90334            Klebsiella pneumoniae
AF411145_1          Enterobacter cloacae
AF462690_1          Aeromonas punctata
AF492445_2          Citrobacter mutliniae
AF492446_2          Enterobacter cancerogenus
AF492447_2          Citrobacter braakii
AF492448_2          Citrobacter werkmanii
AF492449_1          Escherichia fergusonii
AMPC_CITFR          Citrobacter freundii
AMPC_ECOLI          Escherichia coli K12
AMPC_LYSLA          Lysobacter lactamgenus
AMPC_MORMO          Morganella morganii
AMPC_PROST          Providencia stuartii
AMPC_PSEAE          Pseudomonas aeruginosa
AMPC_PSYIM          Psychrobacter immobilis
AMPC_SERMA          Serratia marcescens
AMPC_YEREN          Yersinia enterocolitica
CAA54602            Klebsiella pneumoniae
CAA56561            Aeromonas sobria
CAA76196            Salmonella enteriditis
CAB36900            Escherichia coli
CAC04522            Ochrobactum anthropi
CAC17149            Ochrobactum anthropi
CAC17622            Ochrobactum anthropi
CAC85157            Enterobacter asburiae
CAC85357            Enterobacter hormaechei
CAC85358            Enterobacter intermedius
CAC85359            Enterobacter dissolvens
CAC94553            Buttiauxella sp BTN01
CAC95129            Enterobacter cancerogenus
CAD32298            Enterobacter amnigenus
CAD32299            Enterobacter nimipressuralis
CAD32304            Citrobacter youngae
NP_313158           Escherichia coli O157:H7
PNKBP               Enterobacter cloacae
S13408              Pseudomonas aeruginosa


The AlignX program within the Vector NTI version 7.0 software suite (Invitrogen) was used to align the 43 sequences identified. AlignX uses a ClustalW algorithm; the alignment parameters used were the default parameters recommended and supplied with the program. The alignment was based on the E. cloacae sequence. Preliminary examination of this initial alignment revealed a duplicate sequence and a cluster of 4 sequences representing broad-spectrum inhibitor-resistant proteins, all of which were excluded from the final protein alignment. The remaining 38 sequences were realigned, again basing the alignment on the E. cloacae sequence. In this alignment, the most distantly related protein was the lactamase from the thermophilic bacterium. The AlignX program was allowed to define a consensus residue at each position where it was able to, using its default definition of a consensus residue. At each position where the alignment indicated a consensus residue, that residue was compared to the corresponding residue in the E. cloacae sequence. In this analysis, 29 residues were identified where the E. cloacae sequence differed from the consensus sequence. These 29 residues were chosen for the first round of mutagenesis.
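
To give a sense of scale for the combinatorial space opened up by 29 positions, the short sketch below counts the possible variants. It assumes that each position carries either the starting residue or the single consensus residue, which is a simplification made here; the function names are illustrative.

```python
from math import comb


def library_size(n_positions):
    """Number of distinct variants when each position is either wild type or consensus."""
    return 2 ** n_positions


def variants_with_k_mutations(n_positions, k):
    """Number of distinct variants carrying exactly k of the consensus mutations."""
    return comb(n_positions, k)


n = 29  # number of consensus positions chosen for the first round
print(library_size(n))
for k in range(1, 5):
    print(k, variants_with_k_mutations(n, k))
```

Under this assumption the full space holds 536,870,912 combinations, far more than can be screened exhaustively, which is why the libraries are sampled and the results analyzed statistically.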


Primers were designed to incorporate the desired amino acid changes into the E. cloacae backbone. General primer design was done following the recommendations of the manufacturer of the Quikchange® Multi-Site kit (Stratagene). Briefly, the constructed primers were 5′ phosphorylated, ranged in length from 35 to 40 nucleotides, and had predicted melting temperatures of >75° C. In most cases, the change to the desired amino acid was accomplished by changing a single nucleotide in the primer, although in a few cases, two changes had to be introduced. The mismatching nucleotide or nucleotides was/were placed in the center of the primer, with generally 15-17 nucleotides on either side of the mismatch. Primers were named corresponding to the amino acid to be changed, its position, and the intended mutation. For example, primer “A214S” corresponds to alanine at position 214 to be changed to serine. The numbering starts with the initial methionine in the signal sequence of the wildtype E. cloacae protein. All primers were designed to the sense strand.


Three libraries were prepared using the QuikChange® Multi-Site Mutagenesis kit (QCMS) (Stratagene), with some modifications as described below. The first library, “NA01,” was prepared using a final concentration of 4 uM for all primers combined (approximately 35 ng of each primer). The second library, “NA02,” was prepared using a concentration of 0.4 uM for all primers combined (approximately 3.5 ng of each primer). The third library, “NA03,” was prepared using a concentration of 0.4 uM for all primers combined (as with NA02), but the reaction was heated to 95° C. for 2 minutes before transformation, in order to determine whether the wild-type background could be reduced. The QCMS protocol recommends the use of 50-100 ng of each primer and up to 5 primers. Thus, the reaction components used in this Example differ somewhat from the standard reaction compositions. It was noted that the experiment with 3.5 ng of each primer worked quite well, whereas the experiment with 35 ng of each primer resulted in fewer mutants.


The QCMS reactions contained 18.5 ul ddH2O, 1.0 ul undiluted (100 uM stock of total primers) or diluted primer mix (10 uM stock of total primers), 1.0 ul dNTPs (provided in kit), 1.0 ul template DNA (pCB04 wt; 160 ng), 1.0 ul enzyme blend (provided in kit), and 2.5 ul buffer (provided in kit), for a total of 25 ul. The cycling conditions were 95° C. for 1 minute (once), followed by 30 cycles of 95° C. for 1 minute, 55° C. for 1 minute, and 65° C. for 10 minutes; the reactions were then held at 4° C. Then, the reactions were digested with DpnI (1 ul) for 2 hours at 37° C., after which a further 0.5 ul of DpnI was added and digestion continued for two more hours. The reaction mixtures were transformed (0.5 ul) into TOP10 electrocompetent cells (Invitrogen). SOC broth was added to make a total volume of 350 ul. Then, 25 ul or 50 ul suspensions of cells were plated on LA+5 ppm CMP (chloramphenicol) (random clones) or LA+5 ppm CMP+0.1 ppm CTX (cefotaxime) (active clones). Following incubation for about 20 hours (i.e., overnight) at 37° C., the numbers of random and active colonies were compared and found to be comparable for all of the libraries. In the case of libraries NA02 and NA03, a single QCMS reaction was carried out, and it was split into 2 portions after DpnI digestion. One portion, “NA02,” was transformed directly into E. coli and the second portion, “NA03,” was heated at 95° C. for 2 min before transformation into E. coli. This was conducted to determine if denaturation of hemimethylated DNA by heating after DpnI digestion would reduce the wild type template background in the libraries. No difference was observed in the wild type background in libraries NA02 and NA03. However, library NA01 had a significantly higher wild type background of 48% compared to NA02 and NA03, which had wild type backgrounds of only 17%.
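
The relationship between the total primer concentration and the approximate mass of each primer can be checked with a rough calculation. The sketch below is an estimate only: the function name is illustrative, the average mass of 330 g/mol per nucleotide is a common rule of thumb rather than a value from this disclosure, and the primer length of 37 nucleotides is simply a value inside the 35 to 40 nucleotide range stated above.

```python
def ng_per_primer(total_primer_uM, reaction_ul, n_primers, primer_nt,
                  g_per_mol_per_nt=330.0):
    """Rough mass (ng) of each primer in an equimolar primer mix.

    total_primer_uM  -- final concentration of all primers combined (uM)
    reaction_ul      -- reaction volume (ul)
    n_primers        -- number of primers sharing that concentration
    primer_nt        -- assumed primer length in nucleotides
    g_per_mol_per_nt -- assumed average mass per nucleotide (rule of thumb)
    """
    total_pmol = total_primer_uM * reaction_ul        # uM x ul = pmol
    pmol_each = total_pmol / n_primers
    pg_per_pmol = primer_nt * g_per_mol_per_nt        # g/mol equals pg/pmol
    return pmol_each * pg_per_pmol / 1000.0           # pg -> ng


# 29 primers in a 25 ul reaction, at the two total concentrations used.
print(round(ng_per_primer(4.0, 25.0, 29, 37), 1))
print(round(ng_per_primer(0.4, 25.0, 29, 37), 1))
```

With these assumptions the estimates come out at a few tens of nanograms and a few nanograms per primer respectively, the same order of magnitude as the approximate values quoted for libraries NA01 and NA02.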


The following list provides the sequences of 29 mutagenic oligonucleotides that were used to generate the combinatorial libraries (the position of the mutation is given based on the entire gene including a 20 amino acid pro-peptide). The T21A primer was later found to be incorrectly designed and the corresponding mutation was not observed in any of the isolates.

Primer   Sequence                                    SEQ ID NO
A173S    CGCGTCTTTACGCCAACTCCAGCATCGGTCTTTTTG        (SEQ ID NO:10)
A214S    GGATTAACGTGCCGAAATCGGAAGAGGCGCATTAC         (SEQ ID NO:11)
A228P    GCTATCGTGACGGTAAACCGGTGCGCGTTTCGCCG         (SEQ ID NO:12)
A33D     GCTGGCGGAGGTGGTCGACAATACGATTACCCCGCT        (SEQ ID NO:13)
F63Y     ACCGCACTATTACACATATGGCAAGGCCGATATCGC        (SEQ ID NO:14)
I282V    AGTCGCGCTACTGGCGTGTCGGGTCAATGTATCAG         (SEQ ID NO:15)
I354L    CTTTATTCCTGAAAAGCAGCTCGGTATTGTGATGCTCGCG    (SEQ ID NO:16)
I85V     CTGTTCGAGCTGGGTTCTGTAAGTAAAACCTTCACCG       (SEQ ID NO:17)
M126L    AGTGGCAGGGTATTCGTCTGCTGGATCTCGCCACC         (SEQ ID NO:18)
N246T    CTATGGCGTGAAAACCACCGTGCAGGATATGGCGA         (SEQ ID NO:19)
N252R    ACGTGCAGGATATGGCGCGCTGGGTCATGGCCAACA        (SEQ ID NO:22)
P315A    GTAAGGTAGCGCTAGCGGCGTTGCCCGTGGCAGAAG        (SEQ ID NO:23)
Q115E    TGACCAGATACTGGCCAGAGCTGACGGGCAAGCAG         (SEQ ID NO:24)
Q239E    CGGGTATGCTGGATGCAGAAGCCTATGGCGTGAAAAC       (SEQ ID NO:25)
R111K    GGACGATGCGGTGACCAAATACTGGCCACAGCTGA         (SEQ ID NO:26)
R125T    AGCAGTGGCAGGGTATTACTATGCTGGATCTCGCCA        (SEQ ID NO:27)
S150A    AGGTCACGGATAACGCCGCCCTGCTGCGCTTTTATC        (SEQ ID NO:28)
S24T     TCTCGCCACGCCAGTGACAGAAAAACAGCTGGCGG         (SEQ ID NO:29)
S267T    GAGAACGTTGCTGATGCCACACTTAAGCAGGGCATCG       (SEQ ID NO:30)
T21A     CTTGCTCTGCTCTCGCCGCGCCAGTGTCAGAAAAAC        (SEQ ID NO:31)
T245S    CAAGCCTATGGCGTGAAATCCAACGTGCAGGATATGG       (SEQ ID NO:32)
T362K    TGTGATGCTCGCGAATAAAAGCTATCCGAACCCGG         (SEQ ID NO:33)
V247A    TGGCGTGAAAACCAACGCGCAGGATATGGCGAACT         (SEQ ID NO:34)
V303L    CCGTGGAGGCAAACACGCTGGTCGAGGGCAGCGAC         (SEQ ID NO:35)
V304I    TGGAGGCAAACACGGTGATCGAGGGCAGCGACAGT         (SEQ ID NO:36)
V31I     GAAAAACAGCTGGCGGAGATCGTCGCGAATACGATTACC     (SEQ ID NO:37)
V45I     TGATGAAAGCACAGAGTATTCCAGGCATGGCGGTG         (SEQ ID NO:38)
Y190F    ACCTTCTGGCATGCCCTTTGAGCAGGCCATGACGA         (SEQ ID NO:39)
Y61F     GGGAAAACCGCACTATTTCACATTTGGCAAGGCCG         (SEQ ID NO:40)
T21A     CTTGCTCTGCTCTCGCCGCGCCAGTGTCAGAAAAAC        (SEQ ID NO:41)
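
The primer names above count positions from the initial methionine, including the 20 amino acid pro-peptide, whereas the variant tables later in this Example (e.g., Q95E in Table 2 for primer Q115E) appear to use numbering that is 20 lower. A small helper for converting between the two conventions might look like the following sketch; the function name and the strict one-letter name format are assumptions made here for illustration.

```python
import re

PRO_PEPTIDE_LENGTH = 20  # length of the pro-peptide stated in the text


def to_mature_numbering(mutation, offset=PRO_PEPTIDE_LENGTH):
    """Convert a primer-style name (e.g. 'Q115E') to mature-protein numbering ('Q95E')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {mutation!r}")
    old_residue, position, new_residue = match.groups()
    new_position = int(position) - offset
    if new_position < 1:
        raise ValueError(f"{mutation} lies within the pro-peptide")
    return f"{old_residue}{new_position}{new_residue}"


for name in ["Q115E", "A173S", "I354L"]:
    print(name, "->", to_mature_numbering(name))
```

Applied to the three primers shown, this reproduces the mutations listed for clone NA03.8 in Table 2.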


Sequencing


Thirty colonies from each library were sequenced using M13 reverse and Dbseq primers by Qiagen Genomic Services (Valencia, Calif.). The sequences of the primers used in this sequencing were:

M13 reverse:    CAGGAAACAGCTATGAC       (SEQ ID NO:42)
Dbseq:          GCCGCTCAAGCTGGACCATA    (SEQ ID NO:43)


The libraries were then screened and analyzed as described in Example 3. Statistical analysis indicated that 11 mutations appeared to stabilize the BLA protein, while 5 mutations appeared to destabilize it. The best clone, “NA03.8” was found to have 2 stabilizing and 1 neutral mutation.


Following the statistical analysis described below, an additional library, “NA04,” was constructed in order to introduce 9 stabilizing mutations into NA03.8.


Screen for Thermostability


Libraries NA01, NA02, and NA03 were plated onto agar plates with LA medium containing 5 mg/l chloramphenicol. Thirty colonies from each library were transferred into a 96-well plate containing 200 ul LB(5 mg/l chloramphenicol). Four additional wells were inoculated with TOP10/pCB04, which served as control during the assay. A master plate was generated by adding glycerol and was stored frozen at −80° C.


A 96-well plate containing 200 ul LB (5 mg/l chloramphenicol and 0.1 mg/l cefotaxime) was inoculated from the master plate using a replication tool. The plate was incubated for 3 days at 25° C. in a humidified incubator at 225 rpm. The following operations were performed with each well of the cultured 96-well plate: 50 ul of culture were transferred into a plate that contained 50 ul B-PER reagent (Pierce). The suspension was incubated at room temperature for 90 min to lyse the cells and liberate BLA from the cells. The lysate was diluted 1000-fold and 10,000-fold into 100 mM citrate/phosphate buffer pH 7.0 containing 0.125% octylglucopyranoside (Sigma). The diluted samples were heated to 56° C. for 1 h with mixing at 650 rpm. Subsequently, 20 ul of the sample were transferred to 180 ul of nitrocefin assay buffer (0.1 mg/l nitrocefin in 50 mM phosphate buffered saline containing 0.125% octylglucopyranoside) and the BLA activity was determined using a Spectramax plus plate reader (Molecular Devices) at 490 nm. In parallel, a control sample was subjected to the same procedure but the heating step was omitted. Based on both activity readings, the fraction of BLA activity that remained after the heat treatment was calculated for each of the 90 variants and 4 controls on the plate.
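
The calculation of remaining activity from the paired readings can be written out as a short sketch. The function names and the numeric readings below are hypothetical and serve only to show the arithmetic; they are not measurements from this Example.

```python
def remaining_activity(heated, unheated):
    """Fraction of BLA activity left after heat treatment (heated / unheated reading)."""
    if unheated <= 0:
        raise ValueError("unheated reading must be positive")
    return heated / unheated


def more_stable_than_control(variant_fraction, control_fractions):
    """True if a variant retains more activity than the mean of the control wells."""
    mean_control = sum(control_fractions) / len(control_fractions)
    return variant_fraction > mean_control


# Hypothetical activity readings (arbitrary units) for one variant and four control wells.
variant = remaining_activity(heated=0.5, unheated=2.0)
controls = [remaining_activity(h, u) for h, u in
            [(0.2, 2.0), (0.3, 2.0), (0.2, 2.0), (0.3, 2.0)]]
print(variant, more_stable_than_control(variant, controls))
```

Comparing each variant against the mean of the control wells on the same plate is one simple design choice; other normalizations would work equally well.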


Out of these 90 clones, 7 clones had mutations which were not intended and appeared to be PCR mistakes that occurred during the QuikChange® reaction. For 3 clones, less than 67% complete sequence was obtained. All clones with unintended mutations or <67% complete sequence were excluded from further analysis.



FIG. 6 shows the remaining BLA activity of the 80 isolates from libraries NA01, NA02, and NA03. Of these isolates, 23 had no mutations; these variants are shown in black. About 38% of the variants are more stable than wild type BLA. Table 2 provides the mutations that were detected in the 5 most stable BLA variants.

TABLE 2
Mutations Detected in Stable BLA Variants

Clone      Mutations
NA03.8     Q95E, A153S, I334L
NA01.18    A13D, F43Y, I65V, Q95E, R105T, T225S, I262V, V284I, T342K
NA02.29    S130A, A153S, A208P, T225S, V284I
NA03.20    A13D, Q95E, M106L, T225S, I262V, I334L
NA02.15    A13D, V25I, I65V, A153S, Q219E, N232R, I262V


Statistical Analysis of the Correlation Between Sequence and Stability


The experiments described herein resulted in the identification of 80 isolates from the library for which stability measurements as well as sequence information were obtained. Of these 80 isolates, 23 contained no mutations, while the remaining 57 isolates contained between one and 11 of the consensus mutations. Seven of the isolates contained random mutations which were ignored in the statistical analysis.


Various statistical methods find use in making the determination of which mutations have a stabilizing effect. The description used herein is but one suitable method for this analysis. Thus, although an adaptation of the Free Wilson method was used here, other statistical methods or graphical analysis could have been used as well.


The contribution of each mutation to BLA stability was calculated based on the remaining activity of the 80 isolates using the Free Wilson method (Free and Wilson, J. Med. Chem., 7:395-399 [1964]). This method has been previously adapted to peptide substrates for proteases (See e.g., Pozsgay et al., Eur. J. Biochem., 115:491-495 [1981]). However, it apparently has not been used to characterize protein variants. During the analysis described herein, it was assumed that individual mutations make additive contributions to the stability of the protein. The analysis included 80 variants for which sufficient sequence information was available. The method assigns a parameter Pk to each of the m mutations in the data set. It also assumes that the remaining activity Ri of each variant can be calculated based on these parameters using equation (1):
$$\log(R_i) = \sum_{k=1}^{m} M_{ki} P_k + C \qquad (1)$$

where Mki equals one if variant i contains mutation k, and zero, if variant i does not contain mutation k and C is a constant that should reflect the remaining activity of the wild type enzyme. The parameters were determined by solving equation (2) using the solver function in Microsoft Excel.
$$\sum_{i=1}^{n} \left\{ \log(R_i) - \sum_{k=1}^{m} M_{ki} P_k - C \right\}^{2} = \min \qquad (2)$$
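
For readers who prefer code to a spreadsheet, the fit can be sketched as an ordinary least-squares problem. This is not the procedure used above (which relied on the Excel solver); it is a minimal illustration that assumes the criterion in equation (2) is a sum of squared residuals, assumes a base-10 logarithm, and solves the normal equations directly. The function names and the synthetic data are invented for the example.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: some mutations cannot be separated")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def free_wilson_fit(variants, mutations):
    """Fit P_k and C of equation (1) by least squares.

    variants  -- list of (set of mutation names, remaining activity R_i > 0)
    mutations -- ordered list of the m mutation names
    Returns ({mutation: P_k}, C).
    """
    # Indicator columns M_ki plus a final column of ones for the constant C.
    rows = [[1.0 if k in muts else 0.0 for k in mutations] + [1.0]
            for muts, _ in variants]
    y = [math.log10(activity) for _, activity in variants]  # base 10 is an assumption
    p = len(mutations) + 1
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return dict(zip(mutations, beta[:-1])), beta[-1]


# Synthetic, noise-free data generated from known parameters (not patent data).
true_p = {"mutA": 0.3, "mutB": -0.2}
wild_type = 0.2
data = [(set(), wild_type),
        ({"mutA"}, wild_type * 10 ** 0.3),
        ({"mutB"}, wild_type * 10 ** -0.2),
        ({"mutA", "mutB"}, wild_type * 10 ** 0.1)]
params, constant = free_wilson_fit(data, ["mutA", "mutB"])
print(params, 10 ** constant)
```

A positive fitted parameter corresponds to a stabilizing mutation under the additivity assumption, and a negative one to a destabilizing mutation. If two mutations always occur together in the data set, their individual contributions cannot be separated and the solver reports a singular system.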


The calculated parameters for some of the mutations are summarized in FIG. 4.


The data illustrate that not all consensus mutations stabilize BLA. Several mutations (Y41F, I65V, M106L, Q219E, and P295A) appear to have a significantly destabilizing effect on BLA. The following mutations are of particular interest, as they show a significant stabilizing effect on BLA: V11I, V25I, R91K, Q95E, A153S, N232R, S247T, I262V, V283L, V284I, and T342K.


The most stable variant, NA03.8, was chosen as the starting template for a further combinatorial library (NA04, described below), in order to introduce several additional stabilizing mutations into variant NA03.8.


Construction of Library NA04


Library NA04 was constructed using NA03.8 as template and 10 mutagenic primers as indicated below. One primer was designed to contain both mutations V303L and V304I because these mutations cannot be simultaneously introduced into a variant by individual mutagenic primers due to their proximity in the sequence. The combinatorial library NA04 was made with the 10 mutagenic primers at a concentration of 0.04 μM (i.e., approximately 11 ng of each primer). The other conditions used to construct the library were identical to those described above for the construction of NA01 through NA03. The mutagenic primers are provided below (the position of the mutation is given based on the entire gene including a 20 amino acid pro-peptide).

Primer        Sequence                                     SEQ ID NO
V31I          GAAAAACAGCTGGCGGAGATCGTCGCGAATACGATTACC      (SEQ ID NO:44)
V45I          TGATGAAAGCACAGAGTATTCCAGGCATGGCGGTG          (SEQ ID NO:45)
R111K         GGACGATGCGGTGACCAAATACTGGCCACAGCTGA          (SEQ ID NO:46)
N252R         ACGTGCAGGATATGGCGCGCTGGGTCATGGCCAACA         (SEQ ID NO:47)
S267T         GAGAACGTTGCTGATGCCACACTTAAGCAGGGCATCG        (SEQ ID NO:48)
I282V         AGTCGCGCTACTGGCGTGTCGGGTCAATGTATCAG          (SEQ ID NO:49)
V303L         CCGTGGAGGCAAACACGCTGGTCGAGGGCAGCGAC          (SEQ ID NO:50)
V304I         TGGAGGCAAACACGGTGATCGAGGGCAGCGACAGT          (SEQ ID NO:51)
T362K         TGTGATGCTCGCGAATAAAAGCTATCCGAACCCGG          (SEQ ID NO:52)
V303, V304    CCGTGGAGGCAAACACGCTGATCGAGGGCAGCGACAGTAAG    (SEQ ID NO:53)


Once the clones grew up, 616 clones from this library were screened for improved resistance to thermolysin, as described below in Example 2.


Example 2
Screening of NA04 for Protease Resistance

In this Example, experiments conducted to screen the NA04 library for protease resistance are described. In particular, in these experiments, library NA04 was screened to identify variants that resist degradation by the protease thermolysin at elevated temperature. Thermolysin is a thermostable protease which has been found to preferentially cleave unfolded proteins (See, Arnold and Ulbrich-Hofmann, Biochem., 36:2166-2172 [1997]).


The library NA04 was plated onto LA agar containing 5 mg/l chloramphenicol and 0.1 mg/l cefotaxime and incubated for 30 h at 37° C. Colonies were transferred into eight 96-well plates containing 160 ul per well of LB medium containing 5 mg/l chloramphenicol and 0.1 mg/l cefotaxime using an automated colony picker. For each plate, 8 wells were inoculated with variant NA03.8, which served as a control. The plates were incubated for 48 h at 37° C. in a humidified incubator shaker. Subsequently, 70 ul of culture was transferred to a 96-well filter plate (Millipore) and 70 ul of B-PER reagent (Pierce) was added. After 30 min of incubation at room temperature to allow cell lysis, the plates were filtered, producing a clear lysate. Then, 90 ul of 25% glycerol was added to the remainder of the culture plates and they were stored at −80° C. The lysate was diluted 500-fold into destabilization buffer (50 mM imidazole pH 7.0, 10 mM CaCl2, 0.005% Tween®-20, 1 mg/l thermolysin (Sigma)). Then, 40 ul of each sample was immediately transferred into a fresh plate containing 10 ul of 50 mM EDTA to inactivate thermolysin. Then, the samples were incubated for 1 hour in a water bath at 46° C. to degrade unstable variants of BLA. Subsequently, a second sample of 40 ul was transferred into a fresh plate containing 10 ul of 50 mM EDTA. The amount of BLA activity was measured in both samples (obtained before and after heat treatment) by addition of 25 ul of sample into 175 ul of assay buffer (0.1 mg/l nitrocefin in 50 mM phosphate buffered saline containing 0.125% octylglucopyranoside), and the BLA activity was determined using a Spectramax plus plate reader (Molecular Devices) at 490 nm. The fraction of remaining BLA activity was calculated for each variant and 22 stabilized variants were chosen for further analysis.


The stability of the 22 variants was confirmed by repeating the same assay but testing 4 wells for each variant. During the confirmation experiment, the 22 stabilized variants had remaining activities of 24-45%, whereas the parent, NA03.8, had only 13.5% of its activity remaining after thermolysin treatment. Table 3 provides the remaining activity and mutations for the 6 most stable variants.

TABLE 3
Remaining Activity and Mutations for Six Variants
Variant | Remaining Activity (%) | Mutations
NA03.8 (parent) | 13.5 | None
NA04.2 | 40 | R91K, S247T, I262V
NA04.10 | 39 | V11I, V25I, N232R, I262V, V284I
NA04.14 | 40 | V11I, R91K, N232R, I262V, V284I
NA04.17 | 45 | V25I, R91K, N232R, I262V, V284I
NA04.18 | 39 | V25I, R91K, I262V
NA04.22 | 40 | V11I, V25I, R91K, N232R, S247T, I262V, V284I, T342K


In addition, 40 random variants were isolated from library NA04 to assess the sequence variation in the library. All 9 intended mutations were observed, at frequencies of 13-50%. Random clones from library NA04 contained an average of 3.15 mutations, versus 3.9 mutations for the 22 stabilized variants. Three mutations, R91K, I262V, and V284I, were significantly enriched during the screen, which indicates that these 3 mutations have a particularly significant stabilizing effect on BLA. In contrast, the frequency of mutation V25I was reduced during the screen, which suggests that this change destabilizes BLA (See, FIG. 3).
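The way mutation frequencies can be tallied is sketched below in Python, using only the six variants listed in Table 3. Because the sequences of the remaining stabilized variants and of the 40 random clones are not listed here, this sketch does not reproduce the enrichment analysis of FIG. 3; the `enrichment` helper is a hypothetical illustration of how the two frequencies could be compared.

```python
from collections import Counter

# Mutations of the six most stable variants, copied from Table 3.
table3 = {
    "NA04.2": ["R91K", "S247T", "I262V"],
    "NA04.10": ["V11I", "V25I", "N232R", "I262V", "V284I"],
    "NA04.14": ["V11I", "R91K", "N232R", "I262V", "V284I"],
    "NA04.17": ["V25I", "R91K", "N232R", "I262V", "V284I"],
    "NA04.18": ["V25I", "R91K", "I262V"],
    "NA04.22": ["V11I", "V25I", "R91K", "N232R", "S247T", "I262V", "V284I", "T342K"],
}

# Number of listed variants carrying each mutation, and the matching frequency.
counts = Counter(m for mutations in table3.values() for m in mutations)
frequency = {m: n / len(table3) for m, n in counts.items()}


def enrichment(freq_selected, freq_random):
    """Hypothetical helper: ratio of a mutation's frequency after and before screening."""
    return freq_selected / freq_random


for mutation, n in counts.most_common():
    print(f"{mutation}: {n} of {len(table3)} listed variants")
```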


Example 3
Testing the Protease Stability of BLA Variants

In this Example, experiments conducted to test the protease stability of three BLA variants (NA03.8, NA04.2, and NA04.17) produced in Example 1 are described. As a control, the parent BLA (pCB04) was also tested. The host cells expressing these variants and control BLA were inoculated into 1 L Terrific Broth containing 5 mg/l chloramphenicol and incubated at 37° C. overnight. Cells were harvested by centrifugation (6000×g for 15 minutes). The pellets were resuspended in 200 ml of phosphate-buffered B-PER solution (Pierce). The suspensions were shaken for about 1 hour at room temperature until the pellets were solubilized. Cell wall debris and insoluble protein were removed by centrifugation (15000×g for 15 minutes). The supernatants were stored at 4° C. until purification.


Proteins were first purified using Ni-IMAC (Applied Biosystems). The purification was done on Bio-Cat (PerSeptive Biosystems, Applied Biosystems). A Waters column of 22 mm×95 mm was used. The column was first loaded with 250 mM NiCl2, then washed with water and equilibrated with 10 mM HEPES, 0.5M NaCl, pH 8.4. Samples were loaded onto the column, washed with equilibration buffer, and eluted with 10 mM HEPES, 0.5M NaCl and a gradient of 200 mM imidazole.


The eluted protein was further purified by affinity chromatography using m-aminophenylboronic acid (PBA) resin (SIGMA). This purification was done by gravity flow. 15 ml PBA resin was packed in a disposable column 15×120 mm (Bio-Rad) and equilibrated with 20 mM TEA, 0.5M NaCl, pH 7. After loading the sample, the columns were washed with 4 column volumes of equilibration buffer, and subsequently BLA was eluted with 0.5M sodium borate, 0.5M NaCl, pH 7. A purity level of 99% was achieved for these proteins, as determined by SDS-PAGE.


Purified proteins (˜1 ug) were incubated in quadruplicate with different concentrations of each test protease in 100 mM Tris-HCl, 10 mM CaCl2, 0.005% TWEEN®-20, pH 7.9, for different time periods at 37° C. Trypsin, chymotrypsin, and thermolysin (SIGMA) were tested in these experiments. The BLA activity of samples with and without protease was measured by monitoring the hydrolysis of the chromogenic substrate nitrocefin (Oxoid). For each variant, the activity of the protease-treated sample was expressed as a percentage of the activity of the untreated sample (i.e., relative remaining activity). The data were normalized to the most stable variant. FIG. 5 provides a graph showing the relative remaining activity of these variants upon exposure to these proteases. As compared to the parent protein, all three of the stabilized variants of BLA were found to be significantly more resistant to protease cleavage by all of the test proteases.


Example 4
Stabilization of an scFv

In this Example, experiments conducted to stabilize a single chain variable fragment (scFv) are described. As described below, the methods of the present invention provide means to identify stabilized variants of CAB1-scFv. Indeed, the method allowed for the screening of relatively small libraries, with six changes being accumulated in the best-performing variant. The Example also demonstrates that fusion of the CAB1-scFv to BLA greatly facilitates the identification of improved variants of this molecule.


A. Construction of pME27.1


Plasmid pME27.1 was generated by inserting a BglI/EcoRV fragment encoding a part of the pelB leader, the CAB1-scFv and a small part of BLA into the expression vector pME25. The amino acid sequence of CAB1 is provided in FIG. 7. FIG. 8 provides a map of this plasmid, while FIG. 9 provides its nucleotide sequence (SEQ ID NO:6). The insert, encoding the CAB1-scFv, was synthesized by Aptagen, based on the sequence of the previously described scFv MFE-23 (See, Boehm et al., Biochem. J., 346(Pt 2): 519-28 [2000]). Both the plasmid containing the synthetic gene (pPCR-GME1) and pME25 were digested with BglI and EcoRV, gel purified, and ligated using the Takara DNA ligation kit (Takara) according to the manufacturer's instructions. The ligated product was transformed into TOP10 (Invitrogen) electrocompetent cells, which were plated on LA medium containing 5 mg/l chloramphenicol and 0.1 mg/l cefotaxime.


Plasmid pME27.1 contains the following features (bases indicated):

P lac: 4992-5113 bp
pel B leader: 13-78
CAB 1 scFv: 79-810
BLA: 811-1896
T7 term.: 2076-2122
CAT: 3253-3912
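As a consistency check on the coordinates listed above, the following Python sketch computes the length of each coding feature and verifies that the leader, the scFv, and BLA are contiguous and in frame. It assumes that the listed coordinates are 1-based and inclusive, which is not stated explicitly in the text.

```python
# Feature coordinates as listed for pME27.1 (assumed 1-based, inclusive).
features = [
    ("pelB leader", 13, 78),
    ("CAB1 scFv", 79, 810),
    ("BLA", 811, 1896),
]


def length_bp(start, end):
    """Length of a feature with inclusive start and end coordinates."""
    return end - start + 1


for name, start, end in features:
    n = length_bp(start, end)
    print(f"{name}: {n} bp, {n // 3} codons, whole codons only: {n % 3 == 0}")

# Each feature should begin immediately after the previous one ends.
contiguous = all(
    features[i][2] + 1 == features[i + 1][1] for i in range(len(features) - 1)
)
print("contiguous:", contiguous)
```

Under this assumption the scFv segment spans 732 bp (244 codons), which is consistent with the highest mutated position (234) listed for library NA05.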


The CAB1 sequence, indicating heavy (SEQ ID NO:2) and light (SEQ ID NO:4) chain domains, as well as the linker (SEQ ID NO:3), and BLA (SEQ ID NO:5) is provided in FIG. 7.


B. Choosing Mutations for Mutagenesis


The vH and vL sequences of CAB1-scFv were compared with a published frequency analysis of human antibodies (Steipe, Sequenzdatenanalyse. (“Sequence Data Analysis”, available in German only) in Zorbas and Lottspeich (eds.), Bioanalytik, Spektrum Akademischer Verlag. S. 233-241 [1998]). The authors aligned sequences of variable segments of human antibodies as found in the Kabat database and calculated the frequency of occurrence of each amino acid at each position. The frequencies were published by the authors on the internet and are shown in Tables 4 and 5. The Tables also show the sequence of CAB1-scFv, the location of the CDRs, and the positions that were selected for CCM.

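TABLE 4
Amino Acid Frequencies in Heavy Chains of Human Antibodies
Position (Heavy Chain) | Number of Observations | Observed Frequencies of 5 Most Abundant Amino Acids in Alignment of Human Sequences | CAB1 Sequence | CDR | Mutated Residues
1 | 291 | E 0.616, Q 0.346, D 0.014, G 0.014, A 0.003, L 0.003 | Q | |
2 | 293 | V 0.887, M 0.027, L 0.024, S 0.020, I 0.017, A 0.007 | V | |
3 | 291 | Q 0.852, H 0.034, R 0.027, T 0.027, E 0.014, V 0.014 | K | | 1
4 | 282 | L 0.975, V 0.011, A 0.007, D 0.004, M 0.004 | L | |
5 | 276 | V 0.645, Q 0.148, L 0.120, R 0.022, M 0.014, N 0.014 | Q | |
6 | 267 | E 0.693, Q 0.263, A 0.022, D 0.011, G 0.007, R 0.004 | Q | |
7 | 265 | S 0.951, W 0.019, X 0.015, T 0.008, A 0.004, N 0.004 | S | |
8 | 266 | G 0.989, S 0.008, T 0.004 | G | |
9 | 274 | G 0.624, A 0.193, P 0.164, S 0.011, E 0.004, H 0.004 | A | |
10 | 271 | G 0.638, E 0.192, D 0.081, A 0.070, T 0.011, V 0.007 | E | |
11 | 270 | L 0.681, V 0.270, F 0.030, S 0.019 | L | |
12 | 267 | V 0.757, K 0.154, I 0.026, N 0.022, L 0.015, A 0.007 | V | |
13 | 247 | K 0.474, Q 0.428, R 0.049, E 0.034, G 0.004, H 0.004 | R | | 1
14 | 251 | P 0.968, A 0.012, K 0.008, G 0.004, L 0.004, S 0.004 | S | | 1
15 | 244 | G 0.783, S 0.156, T 0.033, P 0.016, K 0.008, E 0.004 | G | |
16 | 243 | G 0.488, E 0.131, Q 0.107, A 0.094, R 0.082, S 0.066 | T | | 1
17 | 234 | S 0.766, T 0.204, A 0.009, F 0.009, P 0.004, R 0.004 | S | |
18 | 244 | L 0.812, V 0.155, M 0.008, A 0.004, E 0.004, F 0.004 | V | |
19 | 242 | R 0.545, K 0.240, S 0.161, T 0.037, A 0.012, Q 0.004 | K | |
20 | 246 | L 0.736, V 0.191, I 0.061, E 0.004, R 0.004, X 0.004 | L | |
21 | 218 | S 0.729, T 0.234, G 0.009, I 0.009, A 0.005, D 0.005 | S | |
22 | 217 | C 0.991, R 0.005, S 0.005 | C | |
23 | 231 | A 0.558, K 0.203, T 0.117, E 0.048, V 0.022, I 0.013 | T | |
24 | 235 | A 0.638, V 0.174, G 0.064, I 0.055, T 0.030, F 0.026 | A | |
25 | 226 | S 0.951, Y 0.027, F 0.009, C 0.004, K 0.004, T 0.004 | S | |
26 | 225 | G 0.956, E 0.013, A 0.009, D 0.009, S 0.009, V 0.004 | G | |
27 | 213 | F 0.559, Y 0.164, G 0.150, D 0.080, S 0.019, L 0.014 | F | |
28 | 203 | T 0.571, S 0.286, I 0.049, N 0.049, P 0.015, A 0.005 | N | | 1
29 | 207 | F 0.749, V 0.111, I 0.068, L 0.053, T 0.010, A 0.005 | I | | 1
30 | 202 | S 0.762, T 0.119, N 0.035, G 0.020, R 0.020, A 0.010 | K | | 1
31 | 199 | S 0.482, T 0.136, D 0.104, N 0.087, G 0.060, K 0.040 | D | H1 |
32 | 202 | Y 0.535, S 0.144, N 0.083, A 0.069, D 0.031, G 0.030 | S | H1 |
33 | 197 | A 0.269, Y 0.162, G 0.147, W 0.117, S 0.091, T 0.066 | Y | H1 |
34 | 200 | M 0.520, I 0.210, W 0.070, A 0.055, Y 0.050, V 0.040 | M | H1 |
35 | 196 | S 0.372, H 0.235, N 0.077, A 0.061, G 0.051, Y 0.046 | H | H1 |
35a | 33 | - 0.824, W 0.096, V 0.043, G 0.016, S 0.016, N 0.005 | | H1 |
35b | 27 | - 0.856, N 0.064, G 0.037, S 0.032, A 0.005, R 0.005 | | H1 |
36 | 192 | W 0.990, M 0.005, T 0.005 | W | |
37 | 193 | V 0.741, I 0.228, L 0.021, G 0.005, Q 0.005 | L | | 1
38 | 190 | R 0.989, P 0.005, V 0.005 | R | |
39 | 190 | Q 0.979, T 0.011, G 0.005, R 0.005 | Q | |
40 | 191 | A 0.634, P 0.199, S 0.073, M 0.052, G 0.010, V 0.010 | G | | 1
41 | 187 | P 0.914, S 0.043, T 0.021, A 0.005, L 0.005, Q 0.005 | P | |
42 | 187 | G 0.925, S 0.064, P 0.005, R 0.005 | E | | 1
43 | 186 | K 0.683, Q 0.183, R 0.124, E 0.005, H 0.005 | Q | |
44 | 186 | G 0.882, A 0.048, S 0.043, R 0.027 | G | |
45 | 186 | L 0.978, P 0.022 | L | |
46 | 185 | E 0.956, Q 0.039, V 0.005 | E | |
47 | 184 | W 0.989, S 0.011 | W | |
48 | 185 | V 0.481, M 0.222, I 0.173, L 0.124 | I | |
49 | 185 | G 0.600, S 0.216, A 0.162, E 0.005, L 0.005, T 0.005 | G | |
50 | 185 | R 0.146, W 0.146, V 0.119, A 0.114, G 0.081, Y 0.081 | W | H2 |
51 | 185 | I 0.822, T 0.081, R 0.027, V 0.022, K 0.016, M 0.011 | I | H2 |
52 | 184 | S 0.250, Y 0.239, N 0.123, K 0.060, I 0.054, D 0.050 | D | H2 |
52a | 141 | - 0.230, P 0.180, Y 0.153, G 0.126, N 0.066, V 0.055 | P | H2 |
52b | 34 | - 0.814, K 0.115, R 0.060, G 0.005, Y 0.005 | | H2 |
52c | 22 | - 0.880, T 0.044, V 0.033, S 0.022, A 0.011, G 0.005 | | H2 |
53 | 184 | S 0.228, D 0.163, Y 0.125, G 0.109, N 0.082, H 0.054 | E | H2 |
54 | 183 | G 0.328, S 0.202, D 0.129, N 0.112, K 0.082, F 0.055 | N | H2 |
55 | 182 | G 0.544, S 0.181, D 0.085, W 0.066, Y 0.060, N 0.020 | G | H2 |
56 | 182 | S 0.231, D 0.182, N 0.147, T 0.143, Y 0.077, G 0.060 | D | H2 |
57 | 184 | T 0.582, K 0.120, N 0.065, A 0.054, I 0.054, P 0.022 | T | H2 |
58 | 183 | Y 0.322, N 0.216, D 0.139, R 0.060, H 0.055, T 0.038 | E | H2 |
59 | 184 | Y 0.908, F 0.043, N 0.016, S 0.011, D 0.005, G 0.005 | Y | H2 |
60 | 183 | A 0.579, N 0.153, S 0.104, T 0.055, R 0.044, G 0.027 | A | H2 |
61 | 184 | D 0.277, P 0.239, Q 0.174, A 0.141, V 0.076, T 0.033 | P | H2 |
62 | 185 | S 0.686, K 0.146, P 0.065, N 0.038, G 0.016, R 0.016 | K | H2 |
63 | 186 | V 0.511, L 0.247, F 0.215, S 0.011, A 0.005, K 0.005 | F | H2 |
64 | 186 | K 0.581, Q 0.274, R 0.054, N 0.032, E 0.022, T 0.022 | Q | H2 |
65 | 186 | G 0.688, S 0.237, T 0.032, A 0.016, D 0.011, E 0.011 | G | H2 |
66 | 186 | R 0.935, Q 0.054, H 0.005, I 0.005 | K | | 1
67 | 186 | F 0.462, V 0.409, I 0.065, L 0.054, A 0.005, S 0.005 | A | | 1
68 | 186 | T 0.914, I 0.038, A 0.016, S 0.011, K 0.005, N 0.005 | T | |
69 | 187 | I 0.791, M 0.139, V 0.032, D 0.005, F 0.005, G 0.005 | F | | 1
70 | 187 | S 0.684, T 0.214, N 0.070, L 0.032 | T | |
71 | 187 | R 0.529, V 0.160, A 0.107, P 0.064, T 0.053, K 0.043 | T | | 1
72 | 186 | D 0.902, N 0.071, K 0.016, E 0.011 | D | |
73 | 185 | T 0.368, N 0.266, D 0.177, K 0.070, E 0.059, A 0.011 | T | |
74 | 186 | S 0.946, A 0.048, L 0.005 | S | |
75 | 187 | K 0.674, T 0.139, I 0.070, R 0.027, A 0.021, F 0.021 | S | | 1
76 | 187 | N 0.701, S 0.251, K 0.027, R 0.011, T 0.005, Y 0.005 | N | |
77 | 187 | T 0.615, Q 0.273, S 0.048, M 0.021, L 0.016, P 0.011 | T | |
78 | 186 | L 0.364, A 0.273, F 0.235, V 0.096, I 0.005, M 0.005 | A | |
79 | 187 | Y 0.638, S 0.239, F 0.059, V 0.048, H 0.005, M 0.005 | Y | |
80 | 187 | L 0.782, M 0.207, N 0.005, - 0.005 | L | |
81 | 187 | Q 0.529, E 0.205, K 0.122, R 0.032, T 0.032, N 0.027 | Q | |
82 | 194 | M 0.497, L 0.421, W 0.051, V 0.015, I 0.010, - 0.005 | L | |
82a | 195 | N 0.442, S 0.291, R 0.077, T 0.066, D 0.053, G 0.020 | S | |
82b | 194 | S 0.795, N 0.082, R 0.051, G 0.026, T 0.021, A 0.010 | S | |
82c | 197 | L 0.701, V 0.234, M 0.041, G 0.010, A 0.005, D 0.005 | L | |
83 | 197 | R 0.528, T 0.239, K 0.122, D 0.041, E 0.020, Q 0.015 | T | |
84 | 198 | A 0.495, P 0.182, S 0.177, T 0.051, I 0.035, V 0.030 | S | |
85 | 198 | E 0.591, A 0.172, D 0.126, S 0.051, V 0.045, G 0.015 | E | |
86 | 198 | D 0.975, T 0.010, V 0.010, N 0.005 | D | |
87 | 198 | T 0.929, S 0.035, G 0.010, M 0.010, A 0.005, Q 0.005 | T | |
88 | 198 | A 0.939, G 0.040, P 0.005, T 0.005, V 0.005, Y 0.005 | A | |
89 | 198 | V 0.768, L 0.066, M 0.056, T 0.045, I 0.040, F 0.010 | V | |
90 | 199 | Y 0.980, F 0.010, A 0.005, I 0.005 | Y | |
91 | 199 | Y 0.930, F 0.045, C 0.015, R 0.005, T 0.005 | Y | |
92 | 198 | C 0.990, A 0.005, M 0.005 | C | |
93 | 198 | A 0.838, T 0.076, V 0.061, H 0.005, K 0.005, N 0.005 | N | | 1
94 | 198 | R 0.596, K 0.162, T 0.051, G 0.045, P 0.045, Q 0.025 | E | | 1
95 | 161 | G 0.174, D 0.120, E 0.099, A 0.093, N 0.092, P 0.068 | G | |
96 | 159 | P 0.168, R 0.130, G 0.112, L 0.062, V 0.062, Y 0.062 | T | H3 |
97 | 156 | G 0.170, P 0.094, V 0.094, E 0.088, T 0.069, S 0.063 | P | H3 |
98 | 155 | G 0.152, Y 0.101, L 0.095, D 0.087, V 0.076, S 0.063 | T | H3 |
99 | 143 | G 0.172, Y 0.108, T 0.102, - 0.089, A 0.076, E 0.070 | G | H3 |
100 | 131 | - 0.171, S 0.165, Y 0.146, G 0.095, V 0.070, R 0.051 | P | H3 |
100a | 110 | - 0.304, G 0.146, S 0.095, D 0.046, A 0.044, L 0.044 | Y | H3 |
100b | 99 | - 0.369, G 0.134, S 0.127, T 0.076, Y 0.045, V 0.038 | Y | H3 |
100c | 92 | - 0.410, G 0.122, Y 0.103, D 0.058, S 0.058, P 0.045 | | H3 |
100d | 72 | - 0.538, Y 0.058, G 0.051, S 0.051, C 0.045, L 0.038 | | H3 |
100e | 62 | - 0.600, Y 0.155, S 0.045, F 0.032, G 0.032, A 0.026 | | H3 |
100f | 53 | - 0.658, Y 0.097, H 0.039, R 0.039, P 0.026, S 0.026 | | H3 |
100g | 41 | - 0.735, Y 0.084, G 0.065, Q 0.026, S 0.019, D 0.013 | | H3 |
100h | 30 | - 0.806, Y 0.058, D 0.032, A 0.019, G 0.019, S 0.019 | | H3 |
100i | 24 | - 0.844, Y 0.039, G 0.026, X 0.019, L 0.013, N 0.013 | | H3 |
100j | 80 | - 0.481, Y 0.149, A 0.117, W 0.084, F 0.045, G 0.039 | | H3 |
100k | 138 | F 0.503, M 0.144, L 0.137, - 0.098, D 0.039, V 0.033 | F | H3 |
101 | 149 | D 0.754, A 0.073, R 0.066, N 0.020, Q 0.020, P 0.013 | D | H3 |
102 | 151 | Y 0.368, V 0.224, I 0.112, S 0.086, P 0.072, H 0.053 | Y | H3 |
103 | 154 | W 0.955, E 0.013, F 0.013, D 0.006, R 0.006, Y 0.006 | W | |
104 | 154 | G 0.974, Y 0.013, D 0.006, T 0.006 | G | |
105 | 154 | Q 0.798, R 0.104, K 0.045, E 0.013, N 0.013, S 0.013 | Q | |
106 | 155 | G 0.987, Y 0.006, - 0.006 | G | |
107 | 152 | T 0.908, S 0.026, V 0.020, G 0.013, I 0.007, L 0.007 | T | |
108 | 152 | L 0.645, T 0.178, M 0.105, P 0.020, K 0.013, R 0.013 | T | |
109 | 151 | V 0.967, L 0.013, I 0.007, M 0.007, X 0.007 | V | |
110 | 151 | T 0.940, S 0.026, I 0.013, A 0.007, H 0.007, V 0.007 | T | |
111 | 137 | V 0.978, I 0.015, T 0.007 | V | |
112 | 138 | S 0.971, T 0.014, R 0.007, V 0.007 | S | |
113 | 131 | S 0.962, P 0.015, A 0.008, L 0.008, T 0.008 | S | |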









TABLE 5
Amino Acid Frequencies in Human vL Fragments
Position (Light Chain) | Number of Observations | Observed Frequencies of 5 Most Abundant Amino Acids in Alignment of Human Sequences | CAB1 Sequence | CDR | Mutated Residues
1 | 95 | Q 0.589, S 0.158, N 0.095, H 0.074, D 0.053, F 0.021 | E | | 1
2 | 139 | S 0.446, Y 0.388, F 0.101, V 0.043, L 0.014, T 0.007 | N | | 1
3 | 140 | V 0.307, E 0.243, A 0.207, M 0.093, D 0.064, I 0.043 | V | |
4 | 140 | L 0.971, V 0.029 | L | |
5 | 141 | T 0.915, A 0.021, S 0.021, I 0.014, K 0.007, L 0.007 | T | |
6 | 140 | Q 0.993, E 0.007 | Q | |
7 | 139 | P 0.906, D 0.029, S 0.029, A 0.022, E 0.014 | S | | 1
8 | 139 | P 0.741, A 0.137, H 0.072, R 0.029, L 0.007, S 0.007 | P | |
9 | 139 | S 0.964, A 0.014, V 0.014, R 0.007 | A | | 1
10 | 0 | - 1.000 | I | | 1
11 | 138 | V 0.790, A 0.138, L 0.058, M 0.014 | M | | 1
12 | 139 | S 0.978, F 0.007, T 0.007, E 0.004, Q 0.004 | S | |
13 | 138 | V 0.406, G 0.348, A 0.138, E 0.087, L 0.014, D 0.007 | A | |
14 | 135 | S 0.630, A 0.230, T 0.111, D 0.007, F 0.007, G 0.007 | S | |
15 | 135 | P 0.881, L 0.089, A 0.022, S 0.007 | P | |
16 | 134 | G 0.978, E 0.015, L 0.007 | G | |
17 | 133 | Q 0.811, K 0.098, A 0.045, E 0.024, G 0.015, H 0.008 | E | | 1
18 | 133 | T 0.504, S 0.263, R 0.135, K 0.068, E 0.008, G 0.008 | K | | 1
19 | 130 | V 0.454, A 0.385, I 0.146, G 0.008, L 0.008 | V | |
20 | 128 | T 0.531, R 0.188, S 0.148, K 0.047, I 0.031, M 0.016 | T | |
21 | 121 | I 0.901, V 0.050, L 0.017, A 0.008, F 0.008, M 0.008 | I | |
22 | 120 | S 0.492, T 0.475, A 0.008, G 0.008, I 0.008, N 0.008 | T | |
23 | 117 | C 1.000 | C | |
24 | 112 | S 0.536, T 0.259, G 0.089, A 0.045, Q 0.033, I 0.018 | S | L1 |
25 | 108 | G 0.870, L 0.056, R 0.028, A 0.019, I 0.009, P 0.009 | A | L1 |
26 | 108 | D 0.339, S 0.250, T 0.213, N 0.087, E 0.037, G 0.037 | S | L1 |
27 | 104 | S 0.415, N 0.118, K 0.113, A 0.104, T 0.066, G 0.047 | S | L1 |
28 | 104 | L 0.346, S 0.346, I 0.115, G 0.067, A 0.058, D 0.019 | S | L1 |
29 | 100 | G 0.243, N 0.239, D 0.159, S 0.078, P 0.068, H 0.058 | V | L1 |
30 | 103 | I 0.291, V 0.165, D 0.136, N 0.107, E 0.058, S 0.049 | S | L1 |
31 | 101 | G 0.356, K 0.168, A 0.099, E 0.084, Q 0.084, D 0.069 | Y | L1 |
31a | 54 | - 0.438, S 0.167, G 0.104, N 0.083, Y 0.063, D 0.052 | M | L1 |
31b | 49 | - 0.495, N 0.227, Y 0.155, S 0.041, G 0.021, H 0.021 | H | L1 |
31c | 23 | - 0.760, N 0.134, S 0.031, K 0.021, D 0.012, E 0.010 | | L1 |
31d | 0 | - 1.000 | | L1 |
31e | 0 | - 1.000 | | L1 |
31f | 0 | - 1.000 | | L1 |
32 | 94 | Y 0.515, S 0.134, F 0.093, A 0.072, T 0.052, H 0.041 | | L1 |
33 | 97 | V 0.680, A 0.186, I 0.082, Y 0.021, F 0.010, P 0.010 | | L1 |
34 | 92 | S 0.380, H 0.120, A 0.109, Y 0.098, N 0.076, Q 0.076 | | L1 |
35 | 98 | W 0.990, Y 0.010 | W | |
36 | 96 | Y 0.844, F 0.073, H 0.073, W 0.010 | F | | 1
37 | 95 | Q 0.916, R 0.042, E 0.011, H 0.011, K 0.011, Y 0.011 | Q | |
38 | 94 | Q 0.862, H 0.053, L 0.053, E 0.011, K 0.011, V 0.011 | Q | |
39 | 93 | K 0.333, L 0.172, R 0.161, H 0.151, Q 0.086, V 0.043 | K | |
40 | 93 | P 0.946, S 0.022, A 0.011, L 0.011, R 0.011 | P | |
41 | 93 | G 0.871, H 0.065, D 0.022, R 0.022, P 0.011, V 0.011 | G | |
42 | 92 | Q 0.424, T 0.217, K 0.163, R 0.087, S 0.054, G 0.022 | T | |
43 | 92 | A 0.717, S 0.174, G 0.065, T 0.022, L 0.011, V 0.011 | S | |
44 | 93 | P 0.978, A 0.011, M 0.011 | P | |
45 | 92 | K 0.391, V 0.315, R 0.109, L 0.065, T 0.065, A 0.033 | K | |
46 | 92 | L 0.728, V 0.076, F 0.065, T 0.043, A 0.022, M 0.022 | L | |
47 | 91 | V 0.484, L 0.374, I 0.077, M 0.055, N 0.011 | W | | 1
48 | 91 | I 0.791, V 0.110, M 0.077, L 0.011, S 0.011 | I | |
49 | 91 | Y 0.769, F 0.110, R 0.066, H 0.022, D 0.011, I 0.011 | Y | |
50 | 89 | D 0.303, E 0.210, Q 0.093, V 0.067, G 0.056, K 0.056 | S | L2 |
51 | 88 | D 0.364, N 0.205, V 0.159, H 0.068, T 0.068, G 0.034 | T | L2 |
52 | 89 | N 0.393, T 0.213, S 0.202, D 0.101, A 0.022, F 0.011 | S | L2 |
53 | 88 | K 0.307, D 0.193, Q 0.182, N 0.080, E 0.057, S 0.057 | N | L2 |
54 | 88 | R 0.875, X 0.068, K 0.034, L 0.011, W 0.011 | L | L2 |
55 | 86 | P 0.851, G 0.080, S 0.023, A 0.011, H 0.011, R 0.011 | A | L2 |
56 | 85 | S 0.837, D 0.081, P 0.023, A 0.012, L 0.012, T 0.012 | S | L2 |
57 | 86 | G 0.920, E 0.034, S 0.011, T 0.011, W 0.011, - 0.011 | G | |
58 | 84 | I 0.600, V 0.353, A 0.012, G 0.012, T 0.012, - 0.012 | V | |
59 | 84 | P 0.847, S 0.106, A 0.012, L 0.012, V 0.012, - 0.012 | P | |
60 | 85 | D 0.488, E 0.325, N 0.047, A 0.035, H 0.023, L 0.023 | A | | 1
61 | 87 | R 0.977, D 0.011, - 0.011 | R | |
62 | 88 | F 0.943, I 0.034, L 0.011, R 0.011 | F | |
63 | 87 | S 0.989, F 0.011 | S | |
64 | 87 | G 0.885, A 0.069, S 0.023, V 0.023 | G | |
65 | 87 | S 0.977, G 0.011, Y 0.011 | S | |
66 | 86 | K 0.430, N 0.186, S 0.186, T 0.081, X 0.070, R 0.035 | G | | 1
67 | 85 | S 0.953, T 0.024, K 0.012, L 0.012 | S | |
68 | 85 | G 0.859, S 0.071, A 0.035, D 0.024, Q 0.012 | G | |
69 | 85 | N 0.434, T 0.318, A 0.129, D 0.036, G 0.024, K 0.024 | T | |
70 | 85 | T 0.529, S 0.341, E 0.082, A 0.024, K 0.024 | S | |
71 | 85 | A 0.847, R 0.082, V 0.059, S 0.012 | Y | | 1
72 | 85 | T 0.447, S 0.424, Y 0.082, A 0.035, I 0.012 | S | |
73 | 85 | L 0.988, S 0.012 | L | |
74 | 85 | T 0.706, A 0.165, G 0.106, I 0.012, L 0.012 | T | |
75 | 85 | I 0.929, V 0.047, A 0.012, L 0.012 | I | |
76 | 85 | S 0.718, T 0.200, N 0.035, I 0.024, G 0.012, R 0.012 | S | |
77 | 85 | G 0.765, R 0.129, S 0.094, E 0.012 | R | |
78 | 85 | L 0.588, V 0.224, T 0.106, A 0.071, G 0.012 | M | | 1
79 | 85 | Q 0.659, E 0.153, R 0.071, K 0.047, L 0.024, A 0.012 | E | |
80 | 85 | A 0.459, S 0.235, T 0.200, V 0.047, P 0.035, N 0.012 | A | |
81 | 85 | E 0.541, G 0.235, M 0.071, D 0.047, L 0.024, N 0.024 | E | |
82 | 85 | D 0.964, N 0.024, E 0.012 | D | |
83 | 85 | E 0.976, D 0.012, T 0.012 | A | | 1
84 | 85 | A 0.941, T 0.035, E 0.012, S 0.012 | A | |
85 | 85 | D 0.859, E 0.082, H 0.024, A 0.012, I 0.012, M 0.012 | T | | 1
86 | 85 | Y 0.976, F 0.012, H 0.012 | Y | |
87 | 85 | Y 0.894, F 0.106 | Y | |
88 | 85 | C 0.988, H 0.012 | C | |
89 | 85 | Q 0.482, A 0.153, S 0.141, G 0.094, C 0.059, N 0.035 | Q | L3 |
90 | 85 | S 0.388, T 0.271, A 0.212, V 0.118, L 0.012 | Q | L3 |
91 | 85 | W 0.576, Y 0.247, A 0.059, F 0.035, R 0.035, D 0.012 | R | L3 |
92 | 84 | D 0.606, G 0.095, A 0.071, N 0.061, T 0.048, E 0.024 | S | L3 |
93 | 84 | S 0.405, D 0.179, G 0.107, N 0.095, P 0.071, T 0.060 | S | L3 |
94 | 84 | S 0.536, G 0.155, N 0.073, R 0.060, D 0.058, T 0.048 | Y | L3 |
95 | 82 | S 0.265, L 0.253, G 0.108, N 0.096, T 0.084, A 0.036 | P | L3 |
95a | 60 | - 0.268, S 0.183, D 0.159, N 0.110, T 0.073, Q 0.049 | L | L3 |
95b | 40 | - 0.512, A 0.098, G 0.098, H 0.085, E 0.049, R 0.037 | T | L3 |
95c | 5 | - 0.939, P 0.037, A 0.012, G 0.012 | | L3 |
95d | 1 | - 0.988, G 0.012 | | L3 |
95e | 0 | - 1.000 | | L3 |
95f | 0 | - 1.000 | | L3 |
96 | 80 | V 0.305, G 0.098, P 0.098, W 0.098, A 0.073, N 0.073 | | L3 |
97 | 85 | V 0.788, I 0.118, L 0.047, M 0.035, G 0.012 | | L3 |
98 | 86 | F 0.988, V 0.012 | F | |
99 | 89 | G 0.989, F 0.011 | G | |
100 | 89 | G 0.831, T 0.124, A 0.022, S 0.022 | A | | 1
101 | 89 | G 1.000 | G | |
102 | 89 | T 0.989, G 0.011 | T | |
103 | 88 | K 0.739, N 0.091, R 0.068, Q 0.034, T 0.034, E 0.011 | K | |
104 | 87 | L 0.667, V 0.322, Q 0.011 | L | |
105 | 87 | T 0.954, S 0.023, I 0.011, L 0.011 | E | | 1
106 | 85 | V 0.988, T 0.012 | L | | 1
106a | 84 | L 0.952, V 0.024, P 0.012, Q 0.012 | K | | 1
107 | 78 | G 0.782, S 0.103, R 0.090, C 0.013, L 0.013 | R | | 1
108 | 46 | Q 0.957, P 0.022, R 0.022 | A | | 1
109 | 46 | P 0.957, K 0.022, Q 0.022 | A | | 1



These frequencies were compared with the actual amino acid sequence of CAB1. Based on these comparisons, 33 positions that fulfilled the following criteria were identified: 1) the position is not part of a CDR as defined by the Kabat nomenclature; 2) the amino acid found in CAB1-scFv is observed in the homologous position in less than 10% of human antibodies; and 3) the position is not one of the last 6 amino acids in the light chain of scFv. These 33 positions were then used in the combinatorial mutagenesis methods of the present invention.
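The three selection criteria can be expressed as a simple filter, sketched here in Python on a few rows transcribed from Tables 4 and 5. Two points are assumptions made for illustration: a CAB1 residue that does not appear among the listed residues is treated as having a frequency of 0.0, and the last 6 light chain positions are taken to be the final six rows of Table 5. The function name and the output labels (which use the table positions, not the pME27 numbering) are hypothetical.

```python
CUTOFF = 0.10  # criterion 2: CAB1 residue found in less than 10% of human antibodies

# (chain, table position, CAB1 residue, CDR or None, listed frequencies)
ROWS = [
    ("H", "1", "Q", None, {"E": 0.616, "Q": 0.346, "D": 0.014, "G": 0.014, "A": 0.003, "L": 0.003}),
    ("H", "3", "K", None, {"Q": 0.852, "H": 0.034, "R": 0.027, "T": 0.027, "E": 0.014, "V": 0.014}),
    ("H", "53", "E", "H2", {"S": 0.228, "D": 0.163, "Y": 0.125, "G": 0.109, "N": 0.082, "H": 0.054}),
    ("L", "1", "E", None, {"Q": 0.589, "S": 0.158, "N": 0.095, "H": 0.074, "D": 0.053, "F": 0.021}),
    ("L", "105", "E", None, {"T": 0.954, "S": 0.023, "I": 0.011, "L": 0.011}),
]

# Assumption: the last six rows of Table 5 are the excluded light chain positions.
LAST_SIX_LIGHT = {"105", "106", "106a", "107", "108", "109"}


def consensus_mutation(chain, position, cab1, cdr, freqs):
    """Return a label such as 'H:K3Q' if the position passes all three criteria."""
    if cdr is not None:                                   # criterion 1: not in a CDR
        return None
    if chain == "L" and position in LAST_SIX_LIGHT:       # criterion 3
        return None
    if freqs.get(cab1, 0.0) >= CUTOFF:                    # criterion 2
        return None
    consensus = max(freqs, key=freqs.get)                 # most abundant residue
    return f"{chain}:{cab1}{position}{consensus}"


selected = [m for m in (consensus_mutation(*row) for row in ROWS) if m]
print(selected)
```

In this sample, heavy chain position 53 and light chain position 105 both carry a rare CAB1 residue but are excluded by the CDR rule and the last-six rule, respectively.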


Mutagenic oligonucleotides were synthesized for each of the 33 positions such that the targeted position would be changed from the amino acid in CAB1-scFv to the most abundant amino acid in the homologous position of a human antibody. FIG. 10 provides the sequence of CAB1-scFv, the CDRs, and the mutations that were chosen for combinatorial mutagenesis.


C. Construction of Library NA05


Table 6 provides the sequences of 33 mutagenic oligonucleotides that were used to generate the combinatorial library designated as “NA05.”

TABLE 6
Mutagenic Primers Used to Generate NA05
pos. (pME27) | CAB1 | Consensus aa (VH) | Primer Name | QuikChange® Oligonucleotide Primer Sequence | SEQ ID NO:
3 | K | Q | nsa147.1fp | CGGCCATGGCCCAGGTGCAGCTGCAGCAGTCTGGGGC | 54
13 | R | K | nsa147.2fp | CTGGGGCAGAACTTGTGAAATCAGGGACCTCAGTCAA | 55
14 | S | P | nsa147.3fp | GGGCAGAACTTGTGAGGCCGGGACCTCAGTCAAGTT | 56
16 | T | G | nsa147.4fp | AACTTGTGAGGTCAGGGGGCTCAGTCAAGTTGTCCTG | 57
28 | N | T | nsa147.5fp | GCACAGCTTCTGGCTTCACCATTAAAGACTCCTATAT | 58
29 | I | F | nsa147.6fp | CAGCTTCTGGCTTCAACTTTAAAGACTCCTATATGCA | 59
30 | K | S | nsa147.7fp | CTTCTGGCTTCAACATTAGCGACTCCTATATGCACTG | 60
37 | L | V | nsa147.8fp | ACTCCTATATGCACTGGGTGAGGCAGGGGCCTGAACA | 61
40 | G | A | nsa147.9fp | TGCACTGGTTGAGGCAGGCGCCTGAACAGGGCCTGGA | 62
42 | E | G | nsa147.10fp | GGTTGAGGCAGGGGCCTGGCCAGGGCCTGGAGTGGAT | 63
67 | K | R | nsa147.11fp | CCCCGAAGTTCCAGGGCCGTGCCACTTTTACTACAGA | 64
68 | A | F | nsa147.12fp | CGAAGTTCCAGGGCAAGTTCACTTTTACTACAGACAC | 65
70 | F | I | nsa147.13fp | TCCAGGGCAAGGCCACTATTACTACAGACACATCCTC | 66
72 | T | R | nsa147.14fp | GCAAGGCCACTTTTACTCGCGACACATCCTCCAACAC | 67
76 | S | K | nsa147.15fp | TTACTACAGACACATCCAAAAACACAGCCTACCTGCA | 68
97 | N | A | nsa147.16fp | CTGCCGTCTATTATTGTGCGGAGGGGACTCCGACTGG | 69
98 | E | R | nsa147.17fp | CCGTCTATTATTGTAATCGCGGGACTCCGACTGGGCC | 70
136 | E | Q | nsa147.18fp | CTGGCGGTGGCGGATCACAGAATGTGCTCACCCAGTC | 71
137 | N | S | nsa147.19fp | GCGGTGGCGGATCAGAAAGCGTGCTCACCCAGTCTCC | 72
142 | S | P | nsa147.20fp | GAAAATGTGCTCACCCAGCCGCCAGCAATCATGTCTGC | 73
144 | A | S | nsa147.21fp | TGCTCACCCAGTCTCCAAGCATCATGTCTGCATCTCC | 74
146 | M | V | nsa147.22fp | CCCAGTCTCCAGCAATCGTGTCTGCATCTCCAGGGGA | 75
152 | E | Q | nsa147.23fp | TGTCTGCATCTCCAGGGCAGAAGGTCACCATAACCTG | 76
153 | K | T | nsa147.24fp | CTGCATCTCCAGGGGAGACCGTCACCATAACCTGCAG | 77
170 | F | Y | nsa147.25fp | TAAGTTACATGCACTGGTACCAGCAGAAGCCAGGCAC | 78
181 | W | V | nsa147.26fp | GCACTTCTCCCAAACTCGTGATTTATAGCACATCCAA | 79
194 | A | D | nsa147.27fp | TGGCTTCTGGAGTCCCTGATCGCTTCAGTGGCAGTGG | 80
200 | G | K | nsa147.28fp | CTCGCTTCAGTGGCAGTAAATCTGGGACCTCTTACTC | 81
205 | Y | A | nsa147.29fp | GTGGATCTGGGACCTCTGCGTCTCTCACAATCAGCCG | 82
212 | M | L | nsa147.30fp | CTCTCACAATCAGCCGACTGGAGGCTGAAGATGCTGC | 83
217 | A | E | nsa147.31fp | GAATGGAGGCTGAAGATGAAGCCACTTATTACTGCCA | 84
219 | T | D | nsa147.32fp | AGGCTGAAGATGCTGCCGATTATTACTGCCAGCAAAG | 85
234 | A | G | nsa147.33fp | ACCCACTCACGTTCGGTGGCGGCACCAAGCTGGAGCT | 86


The QuikChange® multi site-directed mutagenesis kit (QCMS; Stratagene Catalog # 200514) was used to construct the combinatorial library NA05 using the above 33 mutagenic primers. The primers were designed so that they had 17 bases flanking each side of the codon of interest based on the template plasmid pME27.1. The codon of interest was changed to encode the appropriate consensus amino acid using an E. coli codon usage table (indicated in the above Table by underlining). All primers were designed to anneal to the same strand of the template DNA (i.e., all were forward primers). The QCMS reaction was carried out as described in the QCMS manual with the exception of the primer concentration used, as approximately 3 ng of each primer were used in the experiments described herein, while the QCMS manual recommends using 50 ng of each primer in the reaction. However, it is not intended that the present invention be limited to any particular primer concentration as other primer concentrations find use in the present invention.
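The primer layout described above (17 bases on each side of the codon of interest) can be sketched in Python as follows. The function name is hypothetical. The template fragment is assembled from the flanking bases of primer nsa147.1fp in Table 6; the wild-type codon at that position is not given in the text and is therefore shown as the placeholder NNN.

```python
FLANK = 17  # bases kept on each side of the mutated codon, as stated in the text


def mutagenic_primer(template, codon_start, new_codon, flank=FLANK):
    """Return a forward primer that replaces the codon at 0-based index codon_start."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    if codon_start < flank or codon_start + 3 + flank > len(template):
        raise ValueError("template too short for the requested flanks")
    left = template[codon_start - flank:codon_start]
    right = template[codon_start + 3:codon_start + 3 + flank]
    return left + new_codon + right


# Flanks taken from primer nsa147.1fp (K3Q); NNN marks the unknown wild-type codon.
LEFT_FLANK = "CGGCCATGGCCCAGGTG"
RIGHT_FLANK = "CTGCAGCAGTCTGGGGC"
template_fragment = LEFT_FLANK + "NNN" + RIGHT_FLANK

primer = mutagenic_primer(template_fragment, codon_start=17, new_codon="CAG")
print(primer, len(primer))
```

With the glutamine codon CAG inserted, the sketch reproduces the 37-base sequence listed for nsa147.1fp.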


In particular, the reaction used in the present Example contained 50-100 ng template plasmid (pME27.1; 5178 bp), 1 μl of primer mix (10 μM stock of all primers combined containing 0.3 μM each primer), 1 μl dNTPs (QCMS kit), 2.5 μl 10× QCMS reaction buffer, 18.5 μl deionized water, and 1 μl enzyme blend (QCMS kit), for a total volume of 25 μl. The thermocycling program was set for 1 cycle at 95° C. for 1 min., followed by 30 cycles of 95° C. for 1 min., 55° C. for 1 min., and 65° C. for 10 minutes. DpnI digestion was performed by adding 1 μl DpnI (provided in the QCMS kit), incubating at 37° C. for 2 hours, adding another 1 μl DpnI, and incubating at 37° C. for an additional 2 hours. Then, 1 μl of the reaction was transformed into 50 μl of TOP10 electrocompetent cells from Invitrogen. After electroporation, 250 μl of SOC was added, followed by a 1 hr incubation with shaking at 37° C. Thereafter, 10-50 μl of the transformation mix was plated on LA plates with 5 ppm chloramphenicol (CMP) or LA plates with 5 ppm CMP and 0.1 ppm of cefotaxime (CTX) for selection of active BLA clones. The active BLA clones from the CMP+CTX plates were used for screening, whereas the random library clones from the CMP plates were sequenced to assess the quality of the library.
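The primer amounts stated for this reaction can be cross-checked with a short Python calculation. The average mass of 330 g/mol per nucleotide is a common rule of thumb for single-stranded DNA and is an assumption of this sketch, as is the use of 37 bases as the primer length.

```python
N_PRIMERS = 33
STOCK_EACH_UM = 0.3         # uM of each primer in the primer mix (from the text)
MIX_VOLUME_UL = 1.0         # ul of primer mix per reaction
REACTION_VOLUME_UL = 25.0
PRIMER_LENGTH_NT = 37       # assumed length: 17 + 3 + 17 bases
AVG_NT_MASS = 330.0         # g/mol per nucleotide, rule-of-thumb assumption

# Combined concentration of all primers in the stock (the text rounds this to 10 uM).
total_stock_um = N_PRIMERS * STOCK_EACH_UM

# uM x ul = pmol of each primer added to the reaction.
pmol_each = STOCK_EACH_UM * MIX_VOLUME_UL

# pmol x g/mol = pg; divide by 1000 to obtain ng.
ng_each = pmol_each * PRIMER_LENGTH_NT * AVG_NT_MASS / 1000.0

# Final concentration of each primer in the 25 ul reaction, in nM.
final_nm_each = STOCK_EACH_UM * MIX_VOLUME_UL / REACTION_VOLUME_UL * 1000.0

print(f"{total_stock_um:.1f} uM total, {ng_each:.2f} ng and {final_nm_each:.0f} nM per primer")
```

Under these assumptions each primer contributes roughly 3.7 ng, in line with the approximately 3 ng per primer stated above.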


Sixteen randomly chosen clones were sequenced. The clones contained different combinations of 1 to 7 mutations.


D. Screen for Improved Expression


It was observed that when TOP10/pME27.1 is cultured in LB medium at 37° C., the concentration of intact fusion protein peaks after one day and most of the fusion protein is degraded by host proteases after 3 days of culture. Degradation appears to occur mainly in the scFv portion of the CAB1 fusion protein, as the cultures contain significant amounts of free BLA after 3 days, which can be detected by Western blotting, or nitrocefin (Oxoid) activity assay. Thus, library NA05 was screened to detect variants of CAB1-scFv that would resist degradation by host proteases over 3 days of culture at 37° C.


To conduct the screen, library NA05 was plated onto agar plates with LA medium containing 5 mg/l chloramphenicol and 0.1 mg/l cefotaxime (Sigma). Then, 910 colonies were transferred into a total of ten 96-well plates containing 100 ul/well of LA medium containing 5 mg/l chloramphenicol and 0.1 mg/l cefotaxime. Four wells in each plate were inoculated with TOP10/pME27.1 as a control and one well per plate was left as a blank. The plates were grown overnight at 37° C. The next day, the cultures were used to inoculate fresh plates (production plates) containing 100 ul of the same medium using a transfer stamping tool, and glycerol was added to the master plates, which were stored at −70° C., as known in the art. The production plates were incubated in a humidified shaker at 37° C. for 3 days. Then, 100 ul/well of B-PER (Pierce) was added to the production plates to release protein from the cells.
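The plate arithmetic behind the 910 colonies can be sketched in Python as follows. The specific well positions chosen for the controls and the blank are hypothetical, since the text gives only their number per plate.

```python
import string

ROW_LABELS = string.ascii_uppercase[:8]           # rows A to H
WELLS = [f"{r}{c}" for r in ROW_LABELS for c in range(1, 13)]

CONTROL_WELLS = {"A1", "D6", "E7", "H12"}         # hypothetical control positions
BLANK_WELL = "H1"                                 # hypothetical blank position
PLATES = 10

FREE_WELLS = [w for w in WELLS if w not in CONTROL_WELLS and w != BLANK_WELL]
library_capacity = PLATES * len(FREE_WELLS)


def layout(colony_ids):
    """Assign colonies to the wells of one plate not reserved for controls or blank."""
    if len(colony_ids) > len(FREE_WELLS):
        raise ValueError("too many colonies for one plate")
    plate = {w: "TOP10/pME27.1" for w in CONTROL_WELLS}
    plate[BLANK_WELL] = "blank"
    plate.update(zip(FREE_WELLS, colony_ids))
    return plate


print(len(FREE_WELLS), library_capacity)
```

Each plate therefore holds 91 library colonies, and ten plates hold the 910 colonies described above.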


Samples from the production plate were diluted 100-fold in PBST (PBS containing 0.125% Tween®-20) and BLA activity was measured by transferring 20 ul of diluted lysate into 180 ul of nitrocefin assay buffer (0.1 mg/ml nitrocefin in 50 mM PBS buffer containing 0.125% octylglucopyranoside (Sigma)). The BLA activity was determined at 490 nm using a Spectramax plus plate reader (Molecular Devices).


Binding to CEA (carcinoembryonic antigen; Biodesign) was measured using the following procedure: 96-well plates were coated with 100 ul per well of 5 ug/ml of CEA in 50 mM carbonate buffer pH 9.6 and incubated overnight at 4° C. The plates were washed with PBST and blocked for 1-2 hours with 300 ul of casein (Pierce) at 25° C. Then, 100 ul of sample from the production plate diluted 100-1000 fold was added to the CEA coated plate and the plates were incubated for 2 h at room temperature. Subsequently, the plates were washed four times with PBST, 200 ul nitrocefin assay buffer were added, and the BLA activity was measured as described above.


The BLA activity determined by the CEA-binding assay and the total BLA activity found in the lysate plates were compared in order to identify variants that showed high levels of total BLA activity and high levels of CEA-binding activities.


The “winners” (i.e., variants with the highest total BLA activity and CEA-binding activity) were confirmed by testing 4 replicates in a similar protocol. The variants were cultured in 2 ml of LB containing 5 mg/l chloramphenicol and 0.1 mg/l cefotaxime for 3 days. Protein was released from the cells using B-PER reagent. The binding assay was performed as described above, but different dilutions of culture lysate were tested for each variant. Thus, a binding curve, which provides a measure of the binding affinity of the variant for the target CEA, was produced. The binding curve obtained is shown in FIG. 11. The culture supernatants were also analyzed by SDS-PAGE. Variant NA05.6 was found to contain a pronounced band at an approximate molecular weight of 65 kD that was significantly weaker for the parent molecule and for most of the other tested isolates. Table 7 provides a list of 6 variants with the largest improvement in stability.

TABLE 7
Sequence of Six Variants
Clone | Mutations
NA05.6 | R13K, T16G, W181V
NA05.8 | R13K, F170Y, A234G
NA05.9 | K3Q, S14P, L37V, E42G, E136Q, M146V, W181V, A234G
NA05.10 | K3Q, L37V, F170Y, W181V
NA05.12 | K3Q, S14P, L37V, M146V
NA05.15 | M146V, F170Y, A194D


E. Construction of Library NA06


Clone NA05.6 was chosen as the best variant and was used as the template for a second round of combinatorial mutagenesis. A subset of the same mutagenic primers that had been used to generate library NA05 was used to generate combinatorial variants with the following mutations: K3Q, L37V, E42G, E136Q, M146V, F170Y, A194D, and A234G, which had been identified in other winners from library NA05. The primer encoding mutation S14P was not used, as its sequence overlapped with mutations R13K and T16G that are present in NA05.6. A combinatorial library (designated “NA06”) was constructed using the QCMS method as described above. The template used was pNA05.6, and 1 μl of primer mix (10 μM stock of all primers combined containing 1.25 μM each primer) was used.
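The theoretical size of library NA06 can be illustrated by enumerating every combination of the eight mutations on the NA05.6 background, as in the Python sketch below. It assumes that each targeted position is either left unchanged or carries the single consensus substitution, independently of the other positions.

```python
from itertools import combinations

template_mutations = ("R13K", "T16G", "W181V")    # already present in NA05.6
new_mutations = [
    "K3Q", "L37V", "E42G", "E136Q", "M146V", "F170Y", "A194D", "A234G",
]

# Every subset of the eight new mutations, added to the NA05.6 background.
variants = [
    template_mutations + combo
    for k in range(len(new_mutations) + 1)
    for combo in combinations(new_mutations, k)
]

print(len(variants))          # number of distinct combinations
print(variants[0])            # the unmodified template
print(variants[-1])           # all eight new mutations combined
```

Under the same assumption, eight positions give 2**8 = 256 combinations, whereas the 33 positions targeted in library NA05 would correspond to 2**33 (about 8.6 × 10^9) combinations.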


F. Screening of Library NA06


The screen was performed as described above, with the following modifications. In these experiments, 291 variants were screened using three 96-well plates. For each well, a 10 μl sample from the lysate plates was added to 180 μl of 10 μg/ml thermolysin (Sigma) in 50 mM imidazole buffer pH 7.0 containing 0.005% Tween®-20 and 10 mM calcium chloride. This mixture was incubated for 1 h at 37° C. to hydrolyze unstable variants of NA05.6. This protease-treated sample was used to perform the CEA-binding assay as described above. Promising variants were cultured in 2 ml medium as described above and binding curves were obtained for samples after thermolysin treatment. FIG. 12 provides binding curves for selected clones. As indicated in the Figure, a number of variants retain much more binding activity after thermolysin incubation than the parent NA05.6. Table 8 provides 6 variants that are significantly more resistant to protease than NA05.6. All 6 of these variants have the mutation L37V, which was rare in randomly chosen clones from the same library. Further testing showed that variant NA06.6 had the highest level of total BLA activity and the highest protease resistance of all the tested variants.

TABLE 8
Six Variants More Protease Resistant than NA05.6
Clone | Mutations
NA06.2 | R13K, T16G, W181V, L37V, E42G, A194D
NA06.4 | R13K, T16G, W181V, L37V, M146V
NA06.6 | R13K, T16G, W181V, L37V, M146V, K3Q
NA06.10 | R13K, T16G, W181V, L37V, M146V, A194D
NA06.11 | R13K, T16G, W181V, L37V, K3Q, A194D
NA06.12 | R13K, T16G, W181V, L37V, E136Q
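The observation that all six variants of Table 8 carry L37V can be checked with a set intersection, as in the following Python sketch. The mutation lists are copied from Table 8; the variable names are illustrative only.

```python
# Mutations of the six protease-resistant variants, copied from Table 8.
table8 = {
    "NA06.2": {"R13K", "T16G", "W181V", "L37V", "E42G", "A194D"},
    "NA06.4": {"R13K", "T16G", "W181V", "L37V", "M146V"},
    "NA06.6": {"R13K", "T16G", "W181V", "L37V", "M146V", "K3Q"},
    "NA06.10": {"R13K", "T16G", "W181V", "L37V", "M146V", "A194D"},
    "NA06.11": {"R13K", "T16G", "W181V", "L37V", "K3Q", "A194D"},
    "NA06.12": {"R13K", "T16G", "W181V", "L37V", "E136Q"},
}

parent = {"R13K", "T16G", "W181V"}                # mutations of the NA05.6 template

shared = set.intersection(*table8.values())       # present in every listed variant
gained_by_all = shared - parent                   # shared mutations new to round two

print(sorted(gained_by_all))
```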


Claims
  • 1. A method for combinatorial consensus mutagenesis comprising the steps: a) identifying a starting gene of interest; b) identifying at least two homologs of said starting gene of interest; c) generating a multiple sequence alignment of said at least two homologs of said starting gene of interest, and said starting gene of interest; d) using said multiple sequence alignment to identify consensus mutations and produce a combinatorial consensus library; and e) screening said combinatorial consensus library to identify at least one initial hit.
  • 2. The method of claim 1, further comprising the steps: f) sequencing said at least one initial hit to provide at least one sequenced initial hit; and g) identifying improving mutations in said at least one sequenced initial hit.
  • 3. The method of claim 2, further comprising the steps: h) using said sequenced initial hits to generate an enhanced combinatorial consensus library; and i) screening said enhanced combinatorial consensus library to identify at least one improved hit.
  • 4. The method of claim 3, further comprising the step of sequencing said improved hits.
  • 5. The method of claim 3, wherein said improved hits are stabilized variants of said starting gene.
  • 6. The method of claim 3, wherein said improved hits comprise performance-enhancing mutations.
  • 7. The method of claim 1, wherein said screening comprises determining the stability of said initial hit in at least one assay selected from the group consisting of protease resistance assays, thermostability assays, denaturation assays, and functional assays.
  • 8. The method of claim 1, further comprising the step of analyzing the correlation between sequence and stability of said at least two initial hits.
  • 9. The method of claim 3, further comprising the step of analyzing the correlation between sequence and stability of said at least two sequenced improved hits.
  • 10. The method of claim 1, wherein said multiple sequence alignment identifies amino acids that occur frequently in said homologs but are not part of a consensus sequence.
  • 11. The method of claim 2, wherein said steps are repeated at least once.
  • 12. The method of claim 3, wherein said steps are repeated at least once.
  • 13. A sequence improved hit produced according to the method of claim 3.
  • 14. A sequence improved hit produced according to the method of claim 2.
  • 15. A combinatorial consensus mutagenesis library produced according to the method of claim 1.
  • 16. A stabilized variant of beta-lactamase, wherein said stabilized variant comprises at least one amino acid change selected from the group consisting of V11I, V251I, R91K, Q95E, A153S, N232R, S247T, V293L, V294I, T342K, I262V, and V284I.
  • 17. A stabilized variant of carcinoembryonic antigen binder, wherein said stabilized variant comprises at least one amino acid change selected from the group consisting of K3Q, L37V, E42G, E136Q, M146V, F170Y, A194D, and A234G.
  • 18. A stabilized single chain fragment variable region (scFV), wherein said stabilized scFV variant comprises at least one amino acid change selected from the group consisting of K3Q, L37V, E42G, E136Q, M146V, F170Y, A194D, and A234G.