The invention relates to a method to generate rational libraries comprising genetic elements which are involved in transcriptional and/or translational regulation of a gene and devised to increase the production yield of the encoded protein, as well as to the rational library and to the application of said rational library.
The number of recombinant proteins used for therapeutic and research purposes has increased substantially during the last decades, and the demand is expected to increase further in the future. The most commonly commercially produced proteins are growth factors, antibodies, enzymes, hormones and blood factors. Monoclonal antibody (MAb) therapeutics are set to play a significant role in the treatment of a wide range of diseases, with cancer, arthritis, immune and inflammatory diseases as the main focus. Today, MAb production is the strongest growth area among therapeutic proteins.
Recombinant proteins can be produced in various cell expression systems, each having its advantages and drawbacks. Bacterial systems have the advantages of easy handling, rapid growth and high-yield protein production at relatively low cost, but lack the post-translational modification (PTM) machinery found in eukaryotes. For production of more complex molecules, such as glycosylated proteins, more advanced cell systems are required, such as eukaryotic systems.
Today mammalian cell systems provide the best opportunity to produce safe and fully biologically active glycosylated proteins, despite their bottleneck of high costs, complicated process technology, and the potential risk of carrying animal viruses.
The classical manufacturing process to generate a stable cell line producing the recombinant protein requires that an expression vector containing the gene of interest together with a selection marker gene must be generated; this vector is then introduced into the cells which are selected for the presence of the marker protein. Single cells surviving selection are expanded to clonal cell lines which are screened for high-yield recombinant protein production. A cell line suitable for industrial production requires three key criteria: a high growth rate, high specific productivity regarding the recombinant protein, and the ability of maintaining high titers over an extended period of time.
In order to accomplish yields up to the biological limit, there is still room for drastic improvements of mammalian cell systems, e.g. by genetic engineering of expression vectors, such that the recombinant protein in question is efficiently synthesised and secreted in larger quantities than currently possible using available technology.
Virtually all recombinant proteins that are in current use are naturally secreted. Such proteins are initially synthesised with a small N-terminal peptide, the signal peptide (SP). The SP is of importance in the process of co-translational translocation of the growing polypeptide chain, whereby the molecule is effectively transferred into the lumen of the endoplasmic reticulum (ER). Having achieved this, the SP is cleaved off the growing polypeptide chain by the enzyme signal peptidase. The synthesised protein is then transported through the ER to the Golgi apparatus, where appropriate PTM occurs. An important PTM is glycosylation, a necessary prerequisite for the biological activity of many secreted proteins, e.g. MAbs.
A very important genetic element that has to be present in a vector to be used for the expression of a gene coding for a secreted protein is thus the signal sequence (SS), the stretch of nucleotides that codes for the SP. There has hitherto been little attention paid to this sequence with respect to enhanced protein production. It has recently become evident to us that the actual choice of SP has a large impact on the level of synthesis/secretion of the protein of choice (Knappskog et al. 2007; Stern et al. 2007; Tröße et al. 2007). Importantly, there appears to be no “universal SP” that can be used for all recombinant proteins (Stern et al. 2007).
The “individual” requirement of each protein to be produced can be extended to include other genetic regulatory elements in a vector involved in transcription and translation, such as 5′ and 3′ untranslated regions (5′UTR and 3′UTR), intron and promoter. To address this issue a novel, intelligent approach is needed. One efficient way is to create rational genetic libraries containing one or more of the genetic elements carrying mutations in one or more pre-defined positions. Generating genetic libraries where all positions of an element are randomised is not a viable approach, since the number of variants would be too large to allow identification of the best ones. For example, a totally random SP of 20 amino acids (AAs) would give approximately 10^26 possible variants at the AA level and 10^36 at the DNA level.
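The orders of magnitude quoted for complete randomisation follow from raising the alphabet size to the number of randomised positions. The short Python sketch below reproduces them and, for comparison, the size of a library restricted to 18 randomised nucleotides; the function name `variant_count` is illustrative only and is not part of the described method.

```python
def variant_count(alphabet_size: int, positions: int) -> int:
    """Number of distinct sequences when each of `positions` sites
    can take any one of `alphabet_size` symbols."""
    return alphabet_size ** positions


full_sp_aa = variant_count(20, 20)    # fully random 20-residue SP, amino acid level
full_sp_dna = variant_count(4, 60)    # the same SP at the DNA level (20 codons = 60 nt)
rational_max = variant_count(4, 18)   # at most 18 randomised nucleotides

print(f"{full_sp_aa:.2e}")    # 1.05e+26
print(f"{full_sp_dna:.2e}")   # 1.33e+36
print(f"{rational_max:.2e}")  # 6.87e+10
```

Restricting randomisation to 18 nucleotides therefore reduces the DNA-level search space by roughly 25 orders of magnitude relative to a fully random 60-nucleotide SS.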
The invention relates to a new method to design a rational library containing genetic elements, wherein said library may be used to screen for clones that result in an increased expression of a gene of interest, wherein the increase is due to the randomisation of one or more nucleotides at specific positions in one of the genetic elements involved in transcriptional and/or translational regulation of a gene. The genetic library is developed using a strategy which combines the selection of specific positions within any of these elements with randomisation of nucleotides at these predetermined positions.
By such a strategy and subsequent transfection of a host cell line with this library a cell pool comprising a large, though limited number of clones will be generated. This will provide the opportunity of finding clones that produce the encoded recombinant protein of interest at much higher levels than would be otherwise obtained.
By the newly invented method, in which specific nucleotide positions are pre-defined as being of interest for randomisation, a library approach becomes feasible. By identifying and using specific “high-impact positions”, the number of different variants will be greatly reduced as compared to a library generated by complete randomisation of the genetic element, while still ensuring that the best variants are included. Therefore the probability of finding the best-performing variant in a screening procedure is much higher than when it has to be found in a much bigger background where complete screening would be impractical.
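Why a smaller library raises the chance of finding the best variant can be illustrated with a simple sampling estimate. The sketch below assumes that every variant is equally represented and that clones are drawn independently, which is an idealisation and not a statement from the text; the function names are hypothetical. Under that assumption, the probability that one particular variant is present among N screened clones from a library of L variants is 1 − (1 − 1/L)^N.

```python
import math


def coverage(library_size: int, clones: int) -> float:
    """Probability that one given variant occurs at least once among `clones`
    independent, uniformly drawn clones (idealised assumption)."""
    return -math.expm1(clones * math.log1p(-1.0 / library_size))


def clones_needed(library_size: int, probability: float = 0.95) -> int:
    """Smallest number of clones giving at least `probability` coverage."""
    return math.ceil(math.log(1.0 - probability) / math.log1p(-1.0 / library_size))


for nucleotides in (6, 12, 18):
    size = 4 ** nucleotides
    print(nucleotides, size, clones_needed(size))
```

Under these assumptions roughly three times as many clones as variants must be screened for 95% coverage, so the screening effort scales directly with library size.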
In a first aspect the invention relates to a method to generate rational libraries comprising genetic elements involved in transcriptional and/or translational regulation of a gene and devised to increase the production yield of the encoded protein, comprising the steps of: providing a genetic element to be optimised for expression capacity and defining at most 18 nucleotide residues, either non-coding or coding for at most 6 amino acid residues, at specific positions in said genetic element to be randomised, amplifying said genetic element, said genetic element being part of a double stranded DNA plasmid being a preliminary vector or a final vector and subjecting said genetic element to randomisation and generating a pool of genetic element variants, amplifying said pool of genetic element variants being part either of a preliminary vector, thus generating a pre-made library or being part of a final vector, thus generating a final library, or introducing said pool of genetic element variants being part of a preliminary vector into a recipient vector in a seamless manner, thus generating a final library, transforming said final library into eukaryotic cells and obtaining a eukaryotic cell pool containing a rational library comprising up to 4^18 ≈ 6.9×10^10 different vector variants.
Such a new rational library approach will, for the first time, make it possible to identify with a very high likelihood, among many clones containing different genetic element variants in which one genetic element has been subjected to randomisation in at most 18 nucleotide positions, the best-performing clone, i.e. the clone that produces the encoded recombinant protein of interest at a substantially higher level than that mediated by the original non-modified nucleotide sequence. Thus the approach is both efficient and guarantees a high success rate.
In a second aspect the invention relates to a method to identify a clonal cell line within a cell pool, harbouring a vector variant where said clonal cell line produces a protein of interest at the highest amount, comprising the steps of generating the genetic element variants in a vector containing the gene encoding the protein of interest or incorporating said genetic element variants from a pre-made library into a vector containing the gene encoding the protein of interest or incorporating the gene encoding the protein of interest into a pre-made library according to what is described in the application, screening for the cell clone that produces the protein of interest to the highest level and obtaining a clonal cell line from the cells transfected with the rational library, giving rise to the highest level of production of the encoded protein.
Such a method to identify a clonal cell line that produces the encoded recombinant protein of interest at higher levels than a cell line not transfected with a vector exposed to specific nucleotide randomisation within its genetic elements will identify such a clone with a very high level of probability, and thereby the production of biologics, biosimilars, industrial proteins, proteins for research or any other protein of interest can be significantly increased.
In a third aspect the invention relates to a rational library based on a vector containing different genetic elements which have been seamlessly cloned, said rational library containing up to 7×10^10 different vector variants wherein each variant contains at most 18 randomised nucleotides, either non-coding or coding for at most 6 amino acid residues, at specific positions in one of the genetic elements and wherein each vector variant mediates a different expression level of the encoded protein of interest as compared to the non-modified vector.
In a final aspect the invention relates to the use of the methods as well as the rational library for the increased production of recombinant proteins in a eukaryotic cell.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those ordinarily skilled in the art to which the invention belongs.
In the context of the present application and invention the following definitions apply:
The terms “modified/mutated” are used interchangeably within the text.
The term “protein of interest” is intended to mean any protein encoded by one or more genes, and of which there is a need for obtaining an increased quantity for specific purposes and which is to be produced in a recombinant manner by cultivated eukaryotic cells.
The term “coding sequence (cds)” is intended to mean a nucleotide sequence which begins with a start codon or the codon encoding the first amino acid of a mature protein and ends with a stop codon.
The terms “signal peptide (SP)” and “signal sequence (SS)” are intended to mean an N-terminal polypeptide targeting a protein for translocation across the endoplasmic reticulum membrane in eukaryotic cells and cleaved off during the translocation process, and the nucleotide sequence which codes for this polypeptide, respectively. Signal peptides may also be called targeting or localisation signals, signal or leader sequences or transit or leader peptides in the literature.
The term “5′ untranslated region (5′UTR)” is intended to mean the nucleotide sequence in a mature mRNA located immediately upstream of any cds and not translated into protein. It extends from the transcription initiation site to just before the beginning of a cds.
The term “3′ untranslated region (3′UTR)” is intended to mean the nucleotide sequence in a mature mRNA located immediately downstream of any cds and not translated into protein. It extends from the first nucleotide after the stop codon of any cds to just before the poly(A) tail of the mRNA.
The term “genetic element(s)” is intended to mean an mRNA element as well as any other nucleotide sequence involved in transcriptional and/or translational regulation of a gene, including but not limited to SS, 5′UTR, 3′UTR, enhancer, promoter, intron, polyadenylation signal and chromatin control elements such as MAR, UCOE and STAR, and any derivatives thereof.
The term “genetic element variant(s)” is intended to mean any genetic element which differs from the parental genetic element by one or more nucleotides and which has been generated by randomisation of said parental genetic element by using a mutagenic primer.
The term “randomised genetic element(s)” is intended to mean a pool of genetic elements being derived from one genetic element subjected to nucleotide randomisation at specific position(s).
The term “seamless cloning” is intended to mean a cloning method, such as PCR-based cloning, that results in the exact assembly of different genetic elements without incorporation of any linker DNA sequences (e.g. restriction sites) at the junctions between the different elements.
The term “secretion cassette” is intended to mean a nucleotide sequence containing the cds(s) of a protein of interest as well as at least the following genetic elements: a specific promoter, a specific 5′UTR, a specific SS, a specific 3′UTR and a specific polyadenylation signal.
The term “vector” is intended to mean a nucleotide sequence, usually being a circular double stranded DNA, having the ability to multiply independently of chromosomal DNA into numbers of copies in a host cell and may also integrate into the genome of the host cell. Furthermore, the vector is stably maintained and propagated in the host by making use of a selectable marker encoded by the vector. The vector may be a bacteriophage, a plasmid, a phagemid, an episomal vector, a viral vector, a plant transformation vector, an insect vector, or a yeast artificial chromosome.
The term “preliminary vector” is intended to mean a vector containing a specific genetic element, the nucleotide sequence of which is to be randomised at specific pre-defined positions, and equipped with special restriction enzyme recognition sites enabling the exact excision of said genetic element and its seamless insertion into any recipient vector containing the cds(s) of a protein of interest.
The term “final vector” is intended to mean a vector containing a specific genetic element(s), the nucleotide sequence(s) of which is/are to be randomised at specific pre-defined positions, and the cds(s) of a protein(s) of interest.
The term “recipient vector” is intended to mean any vector into which a DNA fragment is inserted by recombinant DNA technology.
The term “clonal cell line” is intended to mean the derivation of a cell line arising from a single cell.
The term “rational” is intended to mean based on reasoning and is used in this invention with respect to limiting random mutagenesis to specific positions within a genetic element according to practical experience and/or theoretical considerations.
The term “random” is intended to mean a process without order and is used in this invention with respect to the insertion of any of the four nucleotides at a specific position within a genetic element in an unbiased manner and with respect to selecting a certain number of samples from a pool in an unbiased manner.
The term “mutagenic primer” is intended to mean a synthetic oligonucleotide containing either a specific nucleotide(s) or any of the four nucleotides introduced at a defined position(s) and in this invention designed to cause incorporation of a mutation(s) at a specific position(s) in a genetic element.
The term “library” is intended to mean a pool of vector variants containing all variants of a specific genetic element generated by randomisation of its nucleotide sequence at specific pre-defined positions.
The term “pre-made library” is intended to mean a library generated with a preliminary vector.
The term “final library” is intended to mean a library generated with a final vector.
The terms “tailored” and “tailor-made” are intended to mean adjusted to specific needs and are used in this invention to describe libraries which are particularly efficient with respect to mediating increased production yields of specific proteins/protein classes.
The invention is based on the inventors' observation that the AAs at three specific positions (4, 5 and 13) in a chosen SP (derived from the Oik1 gene of the marine organism Oikopleura dioica) are of particular importance in determining the efficiency with which the SP operates. A series of Oik1 SP AA mutants was generated, fused to the Gaussia princeps luciferase cds (reporter gene) and transfected into CHO cells. Large differences in luciferase activity were observed, ranging from 0 to almost 250% of the activity achieved with the pOik1 wild-type SP. Exchange of leucine (position 4), serine (position 5) and histidine (position 13) in the wild-type Oik1 SP with arginine, glycine, valine; or arginine, threonine, tryptophan; or lysine, valine, alanine in the respective positions led to decreased amounts of product, whereas the combinations valine, leucine, leucine and serine, phenylalanine, leucine resulted in increased amounts.
Identifying critical positions in any genetic element is a major challenge. Based on the inventors' knowledge gathered from a broad series of experimental data, visualisation of the SP patterns using hydropathy plots (Stern et al. 2007) and the use of bioinformatic tools, we are now able, to a certain extent, to target such positions. Although the hallmark of an SP is a core hydrophobic region, we found that its hydropathic score is not a valid measure for the prediction of SP efficiency. Therefore it is imperative to achieve the highest diversity possible with respect to the different residues in each position of the element chosen to be randomised. The generation of such high-quality libraries, containing all variants without any bias for certain residue combinations, and their application form the basis of the approach described here. Its most important aspect is that it will enable “tailored” solutions. Defining the best elements with the best residue combinations for any protein to be produced goes far beyond the currently available “one-for-all” solutions, where in a given production platform the same vector construct, though optimised for high expression, is applied to all proteins of interest. Only with such an approach can one realistically hope to reach the biological limit of production of a recombinant protein in a given expression system.
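A hydropathy plot of an SP can be produced as a sliding-window average over a per-residue hydropathy scale. The sketch below assumes the Kyte-Doolittle scale and a five-residue window; the text does not state which scale or window was used, and the example peptide is purely hypothetical and is not one of the SPs discussed here.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(peptide: str, window: int = 5) -> List[float]:
    """Mean hydropathy of every window of `window` consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the peptide length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical 15-residue peptide, used only to show the call.
for position, score in enumerate(hydropathy_profile("MKLSLLLALLGLAHA"), start=1):
    print(position, round(score, 2))
```

Such a profile makes the hydrophobic core visible, although, as stated above, the hydropathic score alone did not predict SP efficiency.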
In one embodiment the invention relates to a method to generate rational libraries comprising genetic elements which are involved in the expression of a gene and devised to increase the production yield of the encoded protein, comprising the steps of
Said genetic element may be selected from the group consisting of SS, 5′UTR, 3′UTR, enhancer, promoter, intron, polyadenylation signal and chromatin control elements or other genetic elements that might be involved in transcriptional and/or translational regulation of the encoded protein or in mRNA stability, wherein the genetic element is randomised at the level of the nucleotide sequence which in cases where the genetic element is a cds, will give rise to a randomisation at the AA level. Examples of chromatin control elements are selected from the group consisting of MAR, UCOE and STAR.
The method may be one in which a rational library is created by randomising 6, 7, 8, 9, 10, 11 or 12 nucleotides, either non-coding or coding for 2, 3 or 4 AA residues, thus comprising from 4^6 = 4096 up to 4^12 ≈ 1.7×10^7 different vectors.
In a second embodiment said genetic element is a SS.
SPs show a remarkable level of divergence in AA composition; in fact, the only unifying property shared by all SPs seems to be a stretch of at least 6 hydrophobic residues. The tolerability of divergent AA compositions is illustrated by the observation that up to 20% of all random 20-residue sequences can function as secretion signals in yeast. Despite the absence of a consensus sequence and a defined length, three distinct regions can be recognised in most SPs: first comes an amino-terminal, positively charged region of 2-5 residues (n-region), followed by a hydrophobic core of 7-15 residues (h-region) and finally a polar carboxy-terminal region of 3-7 residues (c-region) containing the cleavage site recognised by a membrane-bound signal peptidase. Positions −1 and −3, with respect to the cleavage site (0), are particularly important for specifying the cleavage site.
It was recognised early on that not all SPs are functionally equivalent (Knappskog et al. 2007, Stern et al. 2007, Tröße et al. 2007, Zhang et al. 2005). Many have reported that increased hydrophobicity is associated with enhanced translocation efficiency of SPs into the ER lumen. However, an upper limit for total hydrophobicity of SPs in mammalian cells may exist; one group created mutants with different degrees of hydrophobicity, and surprisingly the most hydrophobic SP was significantly less efficient in mediating translocation than less hydrophobic counterparts. Biophysical studies of SPs have demonstrated that functional SPs show a clear tendency toward stable α-helix formation in the hydrophobic core and have high affinities for lipids (Jones et al. 1990). Position-dependent effects of hydrophobic core modifications on both translocation efficiency and SP cleavage indicate that the h-region has important structural properties in addition to hydrophobicity alone (Cioffi et al. 1989). Systematic introduction of α-helix breaking prolines in the h-region showed position-dependent inhibitory effects on glycoprotein C translocation, indicating functional asymmetry in the hydrophobic core of the SP (Ryan and Edwards 1995).
The charge of the amino-terminal basic region has also been shown to have an effect on SP efficiency. An SP with marginal hydrophobicity in the h-region depends on a sufficient positive charge at the n-region for translocation to occur (Rusch et al. 2002). This dependence diminished when the stretch of hydrophobic residues was increased, indicating that the requirement of positive charge can be compensated for by a longer hydrophobic core (Hikita and Mizushima 1992). Separating the positively charged n-region from the h-region by more than four AAs abolished SP function, indicating that the positioning of these elements is crucial for promoting protein transport into the ER (Rusch et al. 2002). Introduction of negative residues in the n-region has a negative impact on translocation efficiency (Szczesna-Skorupa and Kemper 1989; Izard et al. 1996). The impact of one negative residue in the n-region can be rescued by a highly hydrophobic h-region, but if as many as 3 negative residues are incorporated then the positive effect of this core will be severely affected (Szczesna-Skorupa and Kemper 1989). Growing evidence indicates that SPs contain more information than just simple “tags” for targeting to the ER lumen (Hegde and Bernstein 1998; Martoglio and Dobberstein 1998).
Said SS may be selected from the group consisting of SSs from human, rodent, Gaussia princeps, Metridia longa and Oikopleura dioica, and mutants derived thereof. Examples of nucleotide sequences that contain SSs are SEQ ID NO:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12. Specific examples include SEQ ID NO:1, 2, 3, 4, 5 or 6. The SS may be derived from Gaussia princeps.
One example is the method as described above with which a rational library is created to randomise 9 or 12 nucleotides coding for 3 or 4 AAs, thus randomising 3 or 4 AAs.
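The relation between randomised nucleotides and randomised AAs can be made concrete by enumerating the standard genetic code. The sketch below assumes that each randomised codon is a full NNN triplet with all four nucleotides equally likely, and counts, for a given number of such codons, the DNA-level variants, the variants free of stop codons and the distinct peptides. The names `CODON_TABLE` and `library_summary` are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons encoding each residue (or stop) when a codon is fully randomised.
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def library_summary(randomised_codons: int) -> dict:
    """Variant counts for `randomised_codons` fully randomised (NNN) codons."""
    sense_codons = 64 - CODONS_PER_RESIDUE["*"]
    return {
        "dna_variants": 64 ** randomised_codons,
        "stop_free_dna_variants": sense_codons ** randomised_codons,
        "distinct_peptides": 20 ** randomised_codons,
    }


print(library_summary(3))  # 9 randomised nucleotides
print(library_summary(4))  # 12 randomised nucleotides
print(CODONS_PER_RESIDUE.most_common())
```

Because the genetic code is degenerate, all DNA variants are equally represented while individual residues are encoded by between one and six codons, and a fraction of the variants carries a stop codon in a randomised position.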
The rational library may be a pre-made library equipped with restriction enzyme recognition sites enabling the seamless insertion of the randomised genetic element within said pre-made library into any recipient vector encoding a protein of interest thus generating a final library or the insertion of any cds of a protein of interest into the pre-made library and thus generating a final library. The restriction enzyme recognition sites to be used may vary depending on the sequence of the recipient vector or cds, respectively. For the insertion of the randomised genetic element into a recipient vector the sites must be present only once in the recipient vector and for the insertion of a cds into the pre-made library they must not cut the cds. To choose such sites is well-known for a person skilled in the art.
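The two conditions on restriction sites stated above (exactly one occurrence in the recipient vector, no occurrence in the cds) can be checked with a simple sequence scan. The sketch below is a minimal illustration with hypothetical function names and made-up sequences; it searches both strands, treats the vector as circular, and handles only unambiguous recognition sequences (A, C, G, T).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_sites(seq: str, site: str, circular: bool = False) -> int:
    """Count occurrences of a recognition site on either strand.
    A palindromic site is counted once per position."""
    seq, site = seq.upper(), site.upper()
    if circular:
        # Extend the sequence so that sites spanning the origin are found.
        seq += seq[:len(site) - 1]
    motifs = {site, reverse_complement(site)}
    return sum(
        1 for i in range(len(seq) - len(site) + 1)
        if seq[i:i + len(site)] in motifs
    )


def usable_for_element_insertion(recipient_vector: str, site: str) -> bool:
    """The site must be present exactly once in the (circular) recipient vector."""
    return count_sites(recipient_vector, site, circular=True) == 1


def usable_for_cds_insertion(cds: str, site: str) -> bool:
    """The site must not cut the cds to be inserted into the pre-made library."""
    return count_sites(cds, site) == 0


# Made-up sequences, for illustration only.
print(usable_for_element_insertion("ACGTACGAATTCACGTACGT", "GAATTC"))  # True
print(usable_for_cds_insertion("ATGGGAGTCAAAGTTCTGTTTGCC", "GAATTC"))   # True
```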
The vector used in the library may be an episomal vector suitable due to the fact that it replicates extrachromosomally rather than by integrating into the genome of the host cell. The production level of the recombinant protein thus would not be affected by the integration site which could “camouflage” the effect of the individual vector variants in the library by promoting increased or decreased levels of transcription depending on the status of the chromatin. The invented library may be generated directly in an episomal vector, or be moved into an episomal vector before being transfected into a cell line such as a mammalian cell line.
The eukaryotic cells into which the libraries are transferred may be selected from the group consisting of cells derived from animal, plant, fungi and yeast systems, such that the animal cell may be a mammalian or insect cell. Eukaryotic cells have the ability to perform glycosylation. However, different eukaryotic cells may give rise to different glycosylation patterns and some of the eukaryotic cells may give rise to a pattern that is different to that of the native protein to be produced. In that case, for that particular protein, another eukaryotic cell line would have to be used. Examples of different eukaryotic cells that may be used in the invented method are murine lymphoid cell lines, baby hamster kidney cell lines, human embryo kidney cell lines, human retina-derived cell lines, Chinese hamster ovary cell lines. Other examples are that said mammalian cell is selected from the group consisting of primate-, monkey- and rodent-derived cells. Other examples are that said primate cell is of Homo sapiens or Pan troglodytes origin, said monkey cell is of Cercopithecus aethiops origin, and said rodent cell is of Cricetulus griseus, Mesocricetus auratus, Rattus norvegicus, Oryctolagus cuniculus or Mus musculus origin. Other examples are that said mammalian cell belongs to any of the cell line families CHO, SP2/0, NS0, 293, myeloma, NOS, COS, BHK, HeLa and PER.C6, and derivatives thereof.
In one embodiment the genetic element variants may be generated either by gene synthesis where random nucleotides are incorporated at specific positions or by Thermal Cycling utilising a mutagenic primer. The mutagenic primer used in the Thermal Cycling reaction comprises all randomised nucleotides at all specified positions and has a length between 60 and 100 nucleotides, a total melting temperature (Tm) from 70 to 85 °C and similar Tm values of 55 to 70 °C at both non-mutated ends flanking the mutated region.
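The primer criteria above can be screened with a small helper. The text does not say how the melting temperatures were calculated; the sketch below assumes the approximation Tm = 81.5 + 0.41(%GC) − 675/N − %mismatch that is commonly quoted for mutagenic primers, counts each randomised position (N) as 50% GC and as a mismatch, and uses made-up sequences. All function names are hypothetical.

```python
def expected_gc_fraction(seq: str) -> float:
    """GC fraction, with each randomised position (N) counted as 0.5."""
    weight = {"G": 1.0, "C": 1.0, "A": 0.0, "T": 0.0, "N": 0.5}
    return sum(weight[base] for base in seq.upper()) / len(seq)


def tm_estimate(seq: str, mismatches: int = 0) -> float:
    """Assumed approximation: 81.5 + 0.41*(%GC) - 675/N - %mismatch."""
    n = len(seq)
    return 81.5 + 41.0 * expected_gc_fraction(seq) - 675.0 / n - 100.0 * mismatches / n


def check_primer(left_flank: str, mutated_region: str, right_flank: str) -> dict:
    """Compare a primer design with the stated length and Tm ranges."""
    primer = left_flank + mutated_region + right_flank
    n_random = mutated_region.upper().count("N")
    total_tm = tm_estimate(primer, mismatches=n_random)
    flank_tms = (tm_estimate(left_flank), tm_estimate(right_flank))
    return {
        "length": len(primer),
        "randomised_positions": n_random,
        "total_tm": total_tm,
        "flank_tms": flank_tms,
        "length_ok": 60 <= len(primer) <= 100,
        "randomised_ok": n_random <= 18,
        "total_tm_ok": 70.0 <= total_tm <= 85.0,
        "flank_tms_ok": all(55.0 <= tm <= 70.0 for tm in flank_tms),
    }


# Made-up design: two randomised codons, seven fixed codons, one randomised codon.
report = check_primer("ATGC" * 5, "NNNNNN" + "ATGC" * 5 + "A" + "NNN", "ATGC" * 5)
print(report)
```

A different Tm model (for example a nearest-neighbour calculation) would shift the numbers, so the ranges should be checked with whichever method the primer supplier or kit manual prescribes.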
In a further embodiment the invention relates to a method to identify a clonal cell line harbouring a vector variant where said clonal cell line produces the highest amount of the protein of interest, comprising the steps of
The method may include a step, wherein said screening is performed by flow cytometry and/or cell sorting.
In another embodiment the invention relates to a rational library based on a vector containing different genetic elements which have been seamlessly cloned, said rational library containing up to 7×10^10 different vector variants wherein each variant contains at most 18 changed nucleotides, either non-coding or coding for at most 6 amino acid residues, at specific positions in one of the genetic elements and wherein each vector variant mediates a different expression level of the encoded protein of interest as compared to the non-modified vector. Said rational library may be obtained by the method disclosed above as well as using the steps disclosed in the examples below. The rational library may contain vectors as defined above, wherein said vectors may contain SS, 5′UTR, 3′UTR, enhancer, promoter, such as the human cytomegalovirus major immediate-early promoter/enhancer (hCMV promoter), intron, polyadenylation signal and chromatin control elements such as MAR, UCOE and STAR. The vectors may also contain an origin of replication, restriction enzyme recognition sites as well as one or more selection marker genes. The vector may also contain one or more genes to be expressed by said vector. Said gene of interest may be cloned into said vector in a manner well-known for a person skilled in the art, such as by the use of a suitable method disclosed in the well-known manuals Sambrook J et al. (Molecular Cloning: A Laboratory Manual (Third Edition), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001) and Ausubel F M et al. (Current Protocols in Molecular Biology, Wiley InterScience, 2010). The vector may then be introduced into a eukaryotic host cell line and the cells containing the vector may be selected by cultivating the cells in a medium containing a selection agent, such as hygromycin B or puromycin, depending on the particular selection marker present in the vector used.
In a final embodiment the invention relates to the use of the method as described above for the increased production of recombinant proteins in a eukaryotic cell. The rational libraries disclosed above may be used for, but are not limited to, the production of biologics, biosimilars, industrial proteins and proteins for research.
The production level of the proteins of interest may be determined by the use of an enzyme-linked immunosorbent assay (ELISA), a bioluminescence assay, Western blot analysis, Protein A HPLC, or by any other suitable method as disclosed e.g. in the above mentioned manuals by Sambrook et al. and Ausubel et al.
The following examples are intended to illustrate but not to limit the invention in any manner, shape, or form, either explicitly or implicitly.
The chemicals, enzymes, media and solutions used for the creation, verification and application of the libraries are commonly used and well known for a person skilled in the art of molecular and cell biology; they are available from a number of companies including Amersham, Invitrogen, Stratagene, Sigma, Merck, Fluka, Medicago, Promega, Fermentas and Qiagen, many of them being provided in kits.
Unless indicated otherwise, the methods used in this invention including Polymerase Chain Reaction (PCR), restriction enzyme cloning, DNA purification, bacterial and eukaryotic cell cultivation, transformation, transfection, Western blotting and Enzyme-Linked Immuno Sorbent Assay (ELISA), were performed in a standard manner well known for a person skilled in the art of molecular and cell biology, and such as described in the following manuals: Sambrook J et al. (Molecular Cloning A Laboratory Manual (Third Edition), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001) and Ausubel F M et al. (Current Protocols in Molecular Biology, Wiley InterScience, 2010).
It has been shown previously that the choice of SP is crucial when constructing vectors with the aim of increasing the production level of a recombinant protein. Gaussia princeps luciferase SP (Gluc SP) has proved to be a far better SP than a whole series of other SPs tested in an expression system (transfected CHO cells) using Gaussia luciferase as a reporter protein (Knappskog et al. 2007; Stern et al. 2007).
Among the SPs tested were seven derived from the marine organism Oikopleura dioica. Only one of these, namely the SP of the oikosin 1 protein (Oik1 SP), gave substantial activity, amounting to 45% of that generated when the Gluc SP was used (Tröße et al. 2007). From hydropathy plots of the two SPs (
The template for all plasmids used in a pilot study was a derivative of pTRE2hyg (Clontech) containing a secretion cassette composed of the 5′UTR, the cds and the 3′UTR of Gaussia luciferase cDNA (GenBank accession no. AY015993) and the Oik1 SP cds (SEQ ID NO:9) immediately preceding the luciferase cds. For mutagenesis of the Oik1 SP at positions 4, 5 and 13, the QuikChange Site-Directed Mutagenesis Kit from Stratagene was used according to the manufacturer's recommendations, together with synthetic oligonucleotide primers containing the desired mutations.
Twenty constructs encoding Oik1 SP mutants were generated (termed p - - - , with the “-” sign depicting AA positions 4, 5 or 13 in the Oik1 SP and being replaced by the single-letter code for AAs if the AA differs from that in the wild-type Oik1 SP). These constructs, as well as the reference construct encoding the wild-type Oik1 SP (termed pOik1), were transiently transfected into CHO cells using the MATRa technology by IBA (according to the manufacturer's recommendations). Medium samples were collected after 30 h and the luciferase activity in the medium was measured following the method described below. The results are shown in Table 1. In order to compare RLU values from different measurements, the value obtained with pOik1 was set to 100% and the other values were scaled accordingly.
It can be seen from Panel A in Table 1 that the combination of AAs in positions 4 and 5 had a major impact on the level of luciferase production. The plasmid encoding the Oik1 SP mutant where leucine was switched to arginine and serine to isoleucine (pRI-) was by far the most efficient of this series, giving rise to 125% more activity than plasmids pFS- and pVQ-, and almost 90% more activity than pOik1. The switch of AA in position 13 (histidine in the wild-type Oik1 SP) had an even greater impact on luciferase activity levels (Panel B). The plasmid encoding the Oik1 SP mutant with phenylalanine in this position (p - - - F) gave rise to more than 170% more luciferase than p - - - A and almost 150% more than pOik1. Mutations in all three positions, namely 4, 5 and 13 (Panel C), also resulted in large differences in the levels of luciferase produced. Three mutant plasmids gave lower levels of luciferase than pOik1 while the other two were very effective in producing luciferase.
From these results it is evident that it is virtually impossible to predict which AA is suited for a specific position in the SP. For example, the pKV-mutant (Table 1, Panel A) in the Oik1 SP contains the same AAs in positions 4 and 5 as are found in the Gluc SP (see
An interesting correlation between the improvement in efficiency of the Oik1 wild type SP by mutation and the use of bioinformatics (SignalP 3.0 Server—http://www.cbs.dtu.dk/services/SignalP/) was demonstrated. As seen in Table 1, Panel B, mutation of histidine (wild type) in position 13 to phenylalanine causes a major increase in luciferase production. Comparison of the two S-score plots (
The results shown here demonstrate that the choice of making mutations in positions 4 and 5, and 13 of the Oik1 SP, based on a study of the hydropathy plots, was a correct strategy to provide a basis for improving the performance of the Oik1 wild type SP. It is envisioned that in the future it will be fruitful to couple the use of hydropathy plots with the bioinformatics approach in the identification of positions in SPs that can have a high impact on recombinant protein yields.
Luciferase activity in the medium sample was measured as the amount of photons released when the sample was mixed with coelenterazine (Promega) in a Chameleon multilabel counter (Hidex Oy). Two samples for every cell line transfected with a specific construct were removed from the −80° C. freezer and thawed on ice. To find the optimal dilution, a dilution assay with Renilla buffer (Promega) was performed by measuring dilutions of the samples in the luminometer. Once a linear range was found, a suitable dilution was chosen for the actual measurements of the samples. Ten μl of the diluted samples were added to each of 2 wells in a 96-well plate placed on ice. The plate was put into the luminometer and 150 μl of standardized coelenterazine solution (A267≈0.400) was added to each well by the dispenser. The Relative Light Units (RLUs) data obtained from the luminometer were corrected for the dilutions made and for the number of cells present in the well the sample was taken from (determined with the Nucleocounter from Chemometec).
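The correction and scaling steps described above can be sketched as follows. This is a minimal illustration only: the function names, the multiplicative form of the correction (reading × dilution factor ÷ cell number), the construct name pX and all numeric values are assumptions made for the example and are not taken from the measurements reported here.

```python
from statistics import mean


def corrected_rlu(well_readings, dilution_factor, cell_count):
    """Average replicate wells, scale back to the undiluted sample,
    and divide by the number of cells in the source well.

    Assumption: the correction is multiplicative, i.e.
    mean reading * dilution factor / cell count.
    """
    if dilution_factor <= 0 or cell_count <= 0:
        raise ValueError("dilution_factor and cell_count must be positive")
    return mean(well_readings) * dilution_factor / cell_count


def percent_of_reference(values, reference="pOik1"):
    """Express every construct relative to the reference construct (= 100%)."""
    ref = values[reference]
    return {name: 100.0 * value / ref for name, value in values.items()}


# Hypothetical duplicate-well readings for the reference and one mutant.
samples = {
    "pOik1": corrected_rlu([1000, 1200], dilution_factor=10, cell_count=2000),
    "pX": corrected_rlu([2100, 2300], dilution_factor=10, cell_count=2000),
}
relative = percent_of_reference(samples)
```

Setting the reference to 100% in this way is what allows values from separate measurement runs to be compared, provided each run includes the reference construct.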
a) For the plasmid encoding the wild-type Oik1 SP, pOik1, the AAs at positions 4 and 5 and/or position 13 are given in brackets in consecutive order. The same order is included in the name of the plasmids encoding the mutant Oik1 SPs, p---, with the amino acid(s) differing from the wild-type Oik1 AA(s) specified.
b) Luciferase activity measured in the medium obtained from 3 independent transfections, given in percent with respect to the value obtained with pOik1 set to 100%.
The model protein chosen is a human IgG light chain (LC) derived from GenBank accession no. AB064226. Its cDNA cds without the native SS was fused at the 5′ end either to the codon optimised SS from Gaussia princeps luciferase (SEQ ID NO:6) or to the SS from a mouse IgG LC (SEQ ID NO:12), and at the 3′ end to the 3′UTR from Gaussia princeps luciferase. The respective SSs were fused at their 5′ ends to the 5′UTR from Gaussia princeps luciferase, extended at its 5′ end with the sequence 5′-ATTCAGACAACTGAATCCAAAAGGAAA-3′. The respective 5′UTR-SS-cds-3′UTR units were inserted between the human cytomegalovirus major immediate-early enhancer/promoter (derived from GenBank accession no. NC_006273) at the 5′ end and the rabbit beta-globin polyadenylation signal (derived from GenBank accession no. RABBGLOB) at the 3′ end. Assembly of the various sequences was performed by seamless cloning. The method used is outlined in
The vector used is a derivative of pcDNA3.1(+) (Invitrogen). The respective expression cassettes, equipped with appropriate restriction enzyme recognition sites at their 5′ and 3′ ends, were inserted into the vector by restriction enzyme cloning.
For randomisation the QuikChange Multi Site-Directed Mutagenesis Kit or the QuikChange Lightning Multi Site-Directed Mutagenesis Kit (Stratagene) was used. Although the kits are designed for the site-directed mutagenesis of plasmid DNA at different sites simultaneously and are suitable for nucleotide randomisation, the following adaptations were required to make them suitable for library generation: (i) Since the positions within the SSs to be randomised were located rather close to each other, the incorporation of mutations had to be performed with one primer instead of the several primers recommended by the manufacturer. In order to contain sufficiently long stretches of parental (non-mutated) sequence flanking the region with the mutations, the primers had to be longer than recommended (60 to 100 nucleotides instead of 25-45 nucleotides). They were designed such that their total TM was between 70 and 85° C. and the TMs of the flanking stretches were between 55 and 70° C. and as similar as possible. (ii) The amount of QuikSolution (provided with the kit) in the Thermal Cycling reaction was increased from 3% to 4% and the reaction volume from 25 μl to 50 μl. (iii) The incubation time with DpnI was prolonged from 1 h to 6 h.
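The primer design constraints listed under (i) can be captured in a small checker, sketched below. The TM values are assumed to come from whatever melting-temperature calculator the user prefers, since the formula used is not stated here. The class and function names are hypothetical, and the tolerance `max_flank_tm_gap` is an illustrative stand-in for the requirement that the flank TMs be "as similar as possible".

```python
from dataclasses import dataclass


@dataclass
class PrimerDesign:
    length_nt: int          # total primer length in nucleotides
    total_tm: float         # TM of the whole primer, in deg C
    left_flank_tm: float    # TM of the parental stretch 5' of the mutations
    right_flank_tm: float   # TM of the parental stretch 3' of the mutations


def check_primer(primer, max_flank_tm_gap=5.0):
    """Return a list of violated constraints (empty list = design acceptable).

    Ranges are those given in the text; max_flank_tm_gap is an assumed
    tolerance, not a value from the text.
    """
    problems = []
    if not 60 <= primer.length_nt <= 100:
        problems.append("length outside 60-100 nt")
    if not 70.0 <= primer.total_tm <= 85.0:
        problems.append("total TM outside 70-85 deg C")
    for side, tm in (("left", primer.left_flank_tm),
                     ("right", primer.right_flank_tm)):
        if not 55.0 <= tm <= 70.0:
            problems.append(f"{side} flank TM outside 55-70 deg C")
    if abs(primer.left_flank_tm - primer.right_flank_tm) > max_flank_tm_gap:
        problems.append("flank TMs differ by more than the chosen tolerance")
    return problems


# Hypothetical designs: one acceptable, one too short.
ok = check_primer(PrimerDesign(80, 78.0, 62.0, 63.5))
too_short = check_primer(PrimerDesign(40, 78.0, 62.0, 63.0))
```

Returning the full list of violations, rather than a single pass/fail flag, makes it easier to see which parameter of a candidate primer needs adjusting.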
The positions chosen for mutagenesis in the codon optimised SS from Gaussia princeps luciferase (Gluc SS) and the SS from mouse IgG LC (LC SS), respectively, are shown in
Transformation of E. coli XL-10-Gold ultracompetent cells was performed by using 4 μl aliquots of DNA from the mutagenesis reaction for each transformation reaction. The whole volume of the transformation reaction was plated on Luria Bertani (LB) agar plates (diameter 15 cm) containing ampicillin (100 μg/ml) for plasmid selection, and the LB agar plates were incubated overnight at 37° C. Subsequently, 10 ml LB medium was added to each LB agar plate and the colonies were scraped off with a cell scraper and collected in two GSA bottles. Another 10 ml LB medium was added to each plate to rinse and collect any remaining bacteria on the plates. This colony mix was then directly subjected to plasmid DNA purification using the Qiagen Plasmid Mega Kit (Qiagen) according to the manufacturer's recommendations.
Prior to harvesting the library (see Example 2), a test transformation was performed. This was done in order to assess the transformation efficiency (measured as cfu (colony forming units) per μg pUC18 control plasmid), the number of colonies formed per transformation using 4 μl aliquots of the mutagenesis reaction, as well as the mutagenesis efficiency, i.e. the number of mutants obtained per number of transformants. To assess the latter, approx. 50 single colonies were randomly selected, inoculated and incubated, the cultures were then subjected to plasmid DNA purification and the DNA sequence in the region of the SS was determined. When the parameters were satisfactory, i.e. high transformation and mutagenesis efficiencies and colony number achieved, the libraries were harvested.
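The quantities assessed in the test transformation can be computed as in the following sketch. The function names are hypothetical, and the acceptance thresholds are deliberately left as required arguments because no numeric cut-offs are given in the text; the example values are invented.

```python
def transformation_efficiency(colonies, plasmid_ng, fraction_plated=1.0):
    """Colony forming units per microgram of control plasmid (e.g. pUC18)."""
    micrograms_plated = plasmid_ng / 1000.0 * fraction_plated
    return colonies / micrograms_plated


def mutagenesis_efficiency(mutants, sequenced):
    """Fraction of sequenced transformants carrying a mutated SS."""
    return mutants / sequenced


def ready_to_harvest(cfu_per_ug, colonies_per_transformation, mut_eff, *,
                     min_cfu_per_ug, min_colonies, min_mut_eff):
    """True when all three parameters reach user-chosen thresholds.

    The thresholds are assumptions supplied by the caller; the text only
    asks for 'satisfactory' values.
    """
    return (cfu_per_ug >= min_cfu_per_ug
            and colonies_per_transformation >= min_colonies
            and mut_eff >= min_mut_eff)


# Hypothetical test transformation: 200 colonies from 0.01 ng pUC18,
# and 40 mutants among 50 sequenced colonies.
cfu = transformation_efficiency(200, plasmid_ng=0.01)
eff = mutagenesis_efficiency(40, 50)
```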
The quality of a library is determined by two criteria, namely size and diversity. A library generated based on 9 randomised nucleotides (LC SS) or 12 randomised nucleotides (Gluc SS), respectively (see
According to these numbers, with the Gluc SS and the LC SS libraries a completeness of approx. 10% was achieved by adjusting the following parameters:
In order to evaluate the potential of the library approach and establish proof-of-concept, two SS/SP libraries were constructed, both containing the human IgG LC cds to which either the Gluc SP (Library 1) or the LC SP (Library 2) was fused (for details see Example 2). Using the approach described in Example 1, positions 2, 3, 4 and 11 were identified as key positions in Gluc SP, and positions 4, 5 and 9 were identified as key positions in LC SP. The libraries were generated accordingly. CHO cells transiently transfected with randomly selected mutants were grown and medium samples were collected 30 h after transfection. Three parallel experiments were performed and the IgG LC levels in the medium were measured either by ELISA or by Western blot. The results obtained from the two methods were similar; in
From Library 1 a total of 27 distinct variants of the Gluc SPs were tested, and from Library 2 a total of 35 distinct variants of the LC SPs were examined. None of them contained stop codons or frame-shift mutations. The mutant AA sequences together with the N-terminal sequence of the LC protein were examined by using the SignalP server (http://www.cbs.dtu.dk/services/SignalP/) to ensure that they would function as SPs and also to predict the cleavage site.
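The first screening step mentioned above, excluding variants with stop codons or frame shifts, can be sketched as below. The code uses the standard genetic code; the function names and the toy sequences are hypothetical and do not correspond to the SEQ ID NOs of this document. The SignalP prediction step is not reproduced.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate complete codons of an unambiguous DNA sequence (A, C, G, T)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def screen_variant(ss_cds, parental_length):
    """Return a list of problems found in a sequenced SS variant."""
    problems = []
    if len(ss_cds) != parental_length:
        problems.append("length differs from parental SS")
    if len(ss_cds) % 3 != 0:
        problems.append("frame shift")
    if "*" in translate(ss_cds):
        problems.append("stop codon")
    return problems


# Toy 12-nt sequences (not real signal sequences).
clean = screen_variant("ATGGGAGTCAAA", parental_length=12)
with_stop = screen_variant("ATGTAAGTCAAA", parental_length=12)
shifted = screen_variant("ATGGGAGTCAA", parental_length=12)
```

Sequences containing ambiguous bases raise a `KeyError` in this sketch, which is a deliberate simplification; sequencing reads with uncalled bases would need to be handled before screening.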
In Library 1 a great variation in the expression levels of the mutants was seen, ranging from 0% to 311% compared to the non-modified Gluc SP (
From these observations demonstrating the high degree of variation seen in the small subset of mutants analysed from the two libraries it is evident that such a library approach to improve protein production has a tremendous potential. The SP mutants had a major influence on the secretion levels of the human IgG LC protein. The variation in levels obtained was extremely marked. It was of interest to note that the mutant that gave the highest yield of LC was derived from Gluc SP and was not among the LC SP mutants. It is thus evident that the most effective SP for a given protein need not be derived from the same class of proteins. The results also clearly demonstrate the level of variability among individual SP mutants, showing that mutations of single AAs at pre-chosen sites based on theoretical considerations (see Example 1) have a major impact on yield. It will, though, seemingly be impossible to predict a “best-performing” SP based on AA sequence data alone.
The outcome of this study has three important aspects:
In order to normalise the IgG LC levels in each medium sample for varying transfection efficiencies, co-transfection with 0.5 μg Firefly luciferase encoding plasmid (together with 2 μg of the LC encoding plasmid to be analysed) was performed.
Extracts from the cells were thawed on ice and measured for Firefly luciferase activity using the Luciferase Assay from Promega. First a dilution series of one of the samples was performed in Renilla buffer to determine the linear range of light detection. Then all samples to be analysed were diluted to an appropriate concentration. The luciferase substrate was prepared according to the manufacturer's recommendations. Twenty μl of each sample was loaded onto a 96-well plate and the plate then placed into a Chameleon multilabel counter (Hidex Oy). At room temperature 100 μl luciferase assay substrate was added stepwise to each well and the RLUs were measured.
For normalisation of the ELISA absorbance values obtained for IgG LC, standard curves for both ELISA and the Firefly assay were made and the A450 values were divided by the RLU values.
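The normalisation described above can be sketched as follows for a set of parallel transfections. The sketch assumes that all A450 and RLU readings lie within the linear range established by the standard curves; the function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def normalised_lc(a450, firefly_rlu):
    """ELISA absorbance divided by the Firefly luciferase signal of the same well."""
    if firefly_rlu <= 0:
        raise ValueError("firefly_rlu must be positive")
    return a450 / firefly_rlu


def summarise(replicates, reference_replicates):
    """Mean and standard deviation of a mutant, in percent of the reference SP.

    Both arguments are lists of (A450, RLU) pairs, one pair per parallel
    transfection; at least two pairs are needed for the standard deviation.
    """
    ref = mean([normalised_lc(a, r) for a, r in reference_replicates])
    percent = [100.0 * normalised_lc(a, r) / ref for a, r in replicates]
    return mean(percent), stdev(percent)


# Hypothetical triplicates for the non-modified SP and one mutant.
reference = [(0.50, 1000.0), (0.50, 1000.0), (0.50, 1000.0)]
mutant = [(1.00, 1000.0), (1.00, 1000.0), (1.00, 1000.0)]
mutant_mean, mutant_sd = summarise(mutant, reference)
```

Dividing by the Firefly signal removes differences that stem only from how well a given well was transfected, so that the remaining variation reflects the SP variant itself.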
Several steps during library generation have been optimised (see Example 3) and it will therefore be a relatively straightforward task to obtain rational libraries with maximal size and diversity. To reach 100% completeness will mainly be a matter of increasing the number of transformations performed in parallel. On the other hand, the quality of any de novo created library may vary and verification of size and diversity is time consuming. A “standardisation” would therefore be desirable, where pre-made, quality checked libraries, based on a variety of specific genetic elements randomised at critical positions, could be taken off-the-shelf and applied to any protein of interest. Such libraries, when being re-used repeatedly for various proteins, would, in addition, contribute to the understanding of which proteins or protein classes would benefit most from which type of library. This could considerably speed up and broaden the process of being able to generate “tailor-made libraries” for specific proteins/protein classes in the future.
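The relation between the number of parallel transformations and library completeness can be estimated with a simple sampling model, sketched below. The model assumes that each randomised nucleotide position takes any of the four bases with equal probability and that every mutant clone is an independent draw from the 4^n possible variants; both are simplifying assumptions, and the function names are hypothetical. Under this model completeness approaches 100% asymptotically, so the target must be set below 1.

```python
import math


def theoretical_diversity(randomised_nt):
    """Number of distinct sequences for n fully randomised nucleotide positions."""
    return 4 ** randomised_nt


def expected_completeness(mutant_clones, diversity):
    """Expected fraction of variants present: 1 - exp(-clones / diversity)."""
    return -math.expm1(-mutant_clones / diversity)


def clones_needed(diversity, target):
    """Independent mutant clones required to reach the target completeness."""
    if not 0 <= target < 1:
        raise ValueError("target must satisfy 0 <= target < 1")
    return math.ceil(-diversity * math.log1p(-target))


def transformations_needed(diversity, target, colonies_per_transformation,
                           mutagenesis_efficiency):
    """Parallel transformations required, given the yield of one transformation."""
    mutants_per_transformation = colonies_per_transformation * mutagenesis_efficiency
    return math.ceil(clones_needed(diversity, target) / mutants_per_transformation)


lc_ss_variants = theoretical_diversity(9)     # 9 randomised nucleotides (LC SS)
gluc_ss_variants = theoretical_diversity(12)  # 12 randomised nucleotides (Gluc SS)
```

For example, `transformations_needed(lc_ss_variants, 0.95, colonies_per_transformation, mutagenesis_efficiency)` gives a rough planning figure once the two yield parameters have been determined in a test transformation.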
The challenge of establishing this concept is to devise a cloning strategy for the pre-made libraries. The randomised genetic element has to be seamlessly fused with adjacent elements (i.e. the randomised SS with the 5′UTR and protein cds on either side, respectively), which cannot be performed using a PCR-based cloning approach, such as the one outlined in
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2010/053648 | 8/12/2010 | WO | 00 | 2/7/2012 |
Number | Date | Country | |
---|---|---|---|
61233294 | Aug 2009 | US |