This application includes as part of the originally filed subject matter two compact discs, labeled “Copy 1” and “Copy 2,” each disc containing a Sequence Listing. The machine format of each compact disc is IBM-PC and the operating system of each compact disc is MS-Windows. Each of the compact discs includes a single text file, which is named “WYE-060.ST25.txt” (1,423 KB, created Apr. 20, 2007). The contents of the compact discs labeled “Copy 1” and “Copy 2” are hereby incorporated by reference herein in their entireties.
The present invention relates to methods for identifying genes and proteins that are involved in conferring a particular cell phenotype by differential expression profiling analysis and the use of the genes and proteins in the optimization of cell line culture conditions and transgene expression.
Fundamental to the present-day study of biology is the ability to optimally culture and maintain cell lines. Cell lines not only provide an in vitro model for the study of biological systems and diseases, but are also used to produce organic reagents. Of particular importance is the use of genetically engineered prokaryotic or eukaryotic cell lines to generate mass quantities of recombinant proteins. A recombinant protein may be used in a biological study, or as a therapeutic compound for treating a particular ailment or disease.
The production of recombinant proteins for biopharmaceutical application typically requires vast numbers of cells and/or particular cell culture conditions that influence cell growth and/or expression. In some cases, production of recombinant proteins benefits from the introduction of chemical inducing agents (such as sodium butyrate or valeric acid) to the cell culture medium. Identifying the genes and related genetic pathways that respond to the culture conditions (or particular agents) that increase transgene expression may elucidate potential targets that can be manipulated to increase recombinant protein production and/or influence cell growth.
Research into optimizing recombinant protein production has been primarily devoted to examining gene regulation, cellular responses, cellular metabolism, and pathways activated in response to unfolded proteins. Currently, there is no available method that allows for the simultaneous monitoring of transgene expression and identification of the genetic pathways involved in transgene expression. For example, currently available methods for detecting transgene expression include those that measure only the presence and amount of known proteins (e.g., Western blot analysis, enzyme-linked immunosorbent assay, and fluorescence-activated cell sorting), or the presence and amount of known messenger RNA (mRNA) transcripts (e.g., Northern blot analysis and reverse transcription-polymerase chain reaction). These and similar methods are not only limited in the number of known proteins and/or mRNA transcripts that can be detected at one time, but they also require that the investigator know or “guess” what genes are involved in transgene expression prior to experimentation (so that the appropriate antibodies or oligonucleotide probes are used). Another limitation inherent in blot analyses and similar protocols is that proteins or mRNA that are the same size cannot be distinguished. Considering the vast number of genes contained within a single genome, identification of even a minority of genes involved in a genetic pathway using the methods described above is costly and time-consuming. Additionally, the requirement that the investigator have some idea regarding which genes are involved does not allow for the identification of genes and related pathways that were either previously undiscovered or unknown to be involved in the regulation of transgene expression.
Therefore, there is a need in the field of cell line engineering for a more systematic approach to identify genes and proteins (including previously undiscovered genes and proteins) and related genetic pathways that are involved (directly or indirectly) with a particular cell culture phenotype, e.g., increased and efficient transgene expression. Discovery of these genes and/or related pathways will provide new targets that can be manipulated to improve the yield and quality of recombinant proteins and influence cell growth.
The present invention solves these problems by providing differential expression profiling analysis of industrially relevant cell line phenotypes through the use of nucleic acid microarray and proteomics analysis methods. In particular, the present invention provides methods for systematically identifying genes and proteins and related pathways that maximize protein expression and secretion by expression profiling analysis. The present invention further provides methods for manipulating the identified genes and proteins to engineer improved cell lines.
Thus, in one aspect, the present invention features a method for identifying proteins regulating or indicative of a cell culture phenotype in a cell line. The method includes generating a protein expression profile of a sample derived from a test cell line; comparing the protein expression profile to a control profile derived from a control cell line; and identifying one or more differentially expressed proteins based on the comparison, wherein the test cell line has a cell culture phenotype distinct from that of the control cell line, and the one or more differentially expressed proteins are capable of regulating or indicating the cell culture phenotype. In a preferred embodiment, the cell line is a Chinese hamster ovary (CHO) cell line. In another embodiment, the protein expression profile is generated by fluorescent two-dimensional differential in-gel electrophoresis.
In some embodiments, the cell culture phenotype is a cell growth rate, a cellular productivity (such as a maximum cellular productivity or a sustained high cellular productivity), a peak cell density, a sustained cell viability, a rate of ammonia production or consumption, or a rate of lactate production or consumption. In one embodiment, the cell culture phenotype is a maximum cellular productivity. In another embodiment, the cell culture phenotype is a sustained cell viability. In yet another embodiment, the cell culture phenotype is a peak cell density. In still another embodiment, the cell culture phenotype is a cell growth rate.
The present invention provides a method for improving a cell line by modulating, i.e., up-regulating or down-regulating, one or more proteins identified according to the method described above. As used herein, “up-regulating” includes providing an exogenous nucleic acid (e.g., an over-expression construct) encoding a protein of interest or a variant retaining its activity (such as, for example, a mammalian homolog thereof, such as a primate or rodent homolog) or providing a factor or a molecule indirectly enhancing the protein or gene activity or expression level. As used herein, “down-regulating” includes knocking-out the gene encoding a protein of interest, providing an RNA interference construct, or providing an inhibitor or other factors indirectly inhibiting the protein or gene activity or expression level. In one particular embodiment, the present invention provides a method for improving a cell line by down-regulating one or more proteins identified according to the method described above by RNA interference.
In particular, the present invention provides a method for improving cellular productivity of a cell line including modulating, i.e., up-regulating or down-regulating, one or more proteins identified according to the method described above. In one embodiment, the present invention provides a method for improving cellular productivity of a cell line including modulating, i.e., up-regulating or down-regulating, one or more genes or proteins selected from Tables 2, 3, 9, 10, 11, and 12.
In one embodiment, the present invention provides a method for improving the cell growth rate of a cell line including modulating, i.e., up-regulating or down-regulating, one or more proteins identified according to the method described above. In particular, the present invention provides a method for improving the cell growth rate of a cell line including modulating, i.e., up-regulating or down-regulating, one or more genes or proteins selected from Tables 4, 5, 6, 13, 14, 27 and 28.
In another embodiment, the present invention provides a method for increasing the peak cell density of a cell line including modulating, i.e., up-regulating or down-regulating, one or more proteins identified according to the method described above. In particular, the present invention provides a method for increasing the peak cell density of a cell line including modulating, i.e., up-regulating or down-regulating, one or more genes or proteins selected from Tables 8, 15, 16, and 17.
In another embodiment, the present invention provides a method for increasing the sustained cell viability of a cell line including modulating, i.e., up-regulating or down-regulating, one or more proteins identified according to the method described above. In particular, the present invention provides a method for increasing the sustained cell viability of a cell line including modulating, i.e., up-regulating or down-regulating, one or more genes or proteins selected from Tables 7, 18 and 19.
In another embodiment, the present invention provides a method for regulating the lactate production or consumption of a cell line including modulating, i.e., up-regulating or down-regulating, one or more proteins identified according to the method described above. In particular, the present invention provides a method for regulating the lactate production or consumption of a cell line including modulating, i.e., up-regulating or down-regulating, one or more genes or proteins selected from Tables 7, 18 and 19.
In yet another embodiment, the present invention provides a method for improving a cell line by modulating, i.e., up-regulating or down-regulating, one or more genes or proteins identified according to the method described above. In particular, the present invention provides a method for improving a cell line by modulating, i.e., up-regulating or down-regulating, one or more genes or proteins selected from Tables 20, 24, 25 and 26.
In another aspect, the present invention provides a method for improving a cell line by modulating, i.e., up-regulating or down-regulating, at least two genes or proteins, wherein a first gene or protein affects a first cell culture phenotype and a second gene or protein affects a second, different cell culture phenotype, wherein the cell culture phenotypes are selected from the group consisting of a cell growth rate, a cellular productivity, a peak cell density, a sustained cell viability, a rate of ammonia production or consumption, and a rate of lactate production or consumption. In one embodiment, the method further includes up-regulating or down-regulating a third gene or protein affecting a third cell culture phenotype different from the first and second cell culture phenotypes.
In yet another aspect, the present invention provides a method of assessing a cell culture phenotype of a cell line. The method includes detecting, in a sample from the cell culture, an expression level of a protein identified according to any of the methods described above; and comparing the expression level to a reference level, wherein the comparison is indicative of the cell culture phenotype.
Alternatively, the present invention provides a method of assessing a cell culture phenotype of a cell line. The method includes detecting, in a sample from the cell culture, one or more markers indicative of the cell culture phenotype, wherein the markers are selected from the group consisting of peptides selected from
In another aspect, the present invention provides an engineered cell line with an improved cell culture phenotype containing a population of engineered cells, each of which comprises an engineered construct up-regulating or down-regulating one or more proteins identified according to various methods as described above. In particular, the present invention provides an engineered cell line with an improved cellular productivity containing a population of engineered cells, each of which comprises an engineered construct up-regulating or down-regulating one or more genes or proteins selected from Tables 2, 3, and 9 through 12. In some embodiments, the engineered construct is an over-expression construct. In other embodiments, the engineered construct is an interfering RNA construct.
In other embodiments, the present invention provides an engineered cell line with an improved cell growth rate including a population of engineered cells, each of which includes an engineered construct up-regulating or down-regulating one or more genes or proteins selected from Tables 4, 5, 6, 13, 14, 27 and 28. In some embodiments, the engineered construct is an over-expression construct. In other embodiments, the engineered construct is an interfering RNA construct.
In other embodiments, the present invention provides an engineered cell line with an improved peak cell density containing a population of engineered cells, each of which includes an engineered construct up-regulating or down-regulating one or more genes or proteins selected from Tables 8, 15, 16, and 17. In some embodiments, the engineered construct is an over-expression construct. In other embodiments, the engineered construct is an interfering RNA construct.
In other embodiments, the present invention provides an engineered cell line with an improved sustained cell viability containing a population of engineered cells, each of which comprises an engineered construct up-regulating or down-regulating one or more genes or proteins selected from Tables 18 and 26. In some embodiments, the engineered construct is an over-expression construct. In other embodiments, the engineered construct is an interfering RNA construct.
In other embodiments, the present invention provides an engineered cell line with regulated lactate production or consumption containing a population of engineered cells, each of which comprises an engineered construct up-regulating or down-regulating one or more genes or proteins selected from Tables 29 and 30. In some embodiments, the engineered construct is an over-expression construct. In other embodiments, the engineered construct is an interfering RNA construct.
In some embodiments, the present invention provides an improved cell line containing a population of engineered cells, each of which comprises an engineered construct up-regulating or down-regulating one or more genes or proteins selected from Tables 20, 24, 25 and 26. In some embodiments, the engineered construct is an over-expression construct. In other embodiments, the engineered construct is an interfering RNA construct.
In yet another aspect, the invention provides a method for expression of a protein of interest using engineered cell lines as described above. The method includes the steps of introducing into an engineered cell line according to any one of the embodiments described above a nucleic acid encoding the protein of interest; and harvesting the protein of interest.
In still another aspect, the invention also provides isolated genes or proteins, or polynucleotides or polypeptides that are of previously undiscovered genes or proteins, and/or are involved with regulating or indicative of cell culture phenotypes of interest. In particular, the invention provides an isolated or recombinant nucleic acid containing a sequence selected from Tables 9, 13, and 15, complements thereof, and subsequences thereof. The present invention also provides an isolated or recombinant protein containing a sequence selected from Tables 2 and 3, or fragments thereof. The invention also provides genetically engineered expression vectors, host cells, and transgenic animals comprising the nucleic acid molecules or proteins of the invention. The invention additionally provides inhibitory polynucleotides, e.g., antisense and RNA interference (RNAi) molecules, to the nucleic acid molecules of the invention or the nucleic acid encoding the proteins of the invention.
Other features, objects, and advantages of the present invention are apparent in the detailed description that follows. It should be understood, however, that the detailed description, while indicating embodiments of the present invention, is given by way of illustration only, not limitation. Various changes and modifications within the scope of the invention will become apparent to those skilled in the art from the detailed description.
Exemplary results of over-expression of identified genes on cellular growth and productivity are illustrated in
The present invention provides systematic methods for identifying genes and proteins that influence cell culture phenotypes of interest. The methods of the invention are based on differential expression profiling analysis of industrially relevant cell culture phenotypes through integrated use of DNA microarray and proteomics analysis. Specifically, the method includes generating a gene or protein expression profile of a sample derived from a test cell line; comparing the gene or protein expression profile to a control profile derived from a control cell line which has a cell culture phenotype distinct from that of the test cell line; and identifying one or more differentially expressed genes or proteins based on the comparison. As used herein, the test cell line and the control cell line can be different cell lines with different genetic background or same cell line grown under different cell culture conditions.
The one or more differentially expressed genes or proteins are candidate genes or proteins that regulate or are indicative of the cell culture phenotype of interest. The identified genes and proteins can be further confirmed and validated. The identified genes or proteins may also be manipulated to improve the cell culture phenotype of interest. Therefore, the present invention represents a significant advance in cell engineering for rational designing of improved cell lines and cell culture conditions.
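By way of illustration only, the comparison of a test profile with a control profile can be sketched as follows. The function name, the two-fold cutoff, and the intensity floor are assumptions introduced for this example and are not taken from the specification; a practical analysis would typically also use replicate samples and a statistical test.

```python
import math

def differentially_expressed(test, control, min_fold=2.0, floor=1.0):
    """Return {identifier: log2 fold change} for genes or proteins whose
    level differs between test and control by at least min_fold.

    test, control: dicts mapping an identifier to an expression level.
    min_fold and floor are illustrative assumptions, not values from the text.
    """
    threshold = math.log2(min_fold)
    hits = {}
    # Only identifiers measured in both profiles can be compared.
    for name in test.keys() & control.keys():
        t = max(test[name], floor)     # floor avoids division by zero
        c = max(control[name], floor)
        log2_fc = math.log2(t / c)
        if abs(log2_fc) >= threshold:  # up- or down-regulated
            hits[name] = log2_fc
    return hits

# Hypothetical expression levels for three genes.
test_profile = {"geneA": 400.0, "geneB": 100.0, "geneC": 25.0}
control_profile = {"geneA": 100.0, "geneB": 110.0, "geneC": 100.0}
candidates = differentially_expressed(test_profile, control_profile)
```

In this sketch a positive value marks a candidate that is higher in the test cell line and a negative value marks one that is lower; both directions are retained because either may regulate or indicate the phenotype.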
Various aspects of the invention are described in further detail in the following subsections. The use of subsections is not meant to limit the invention. Each subsection may apply to any aspect of the invention. In this application, the use of “or” means “and/or” unless stated otherwise.
Cell Lines and Cell Culture Phenotypes
The present invention contemplates differential expression profiling analysis and optimization of cell lines derived from a variety of organisms, including, but not limited to, bacteria, plants, fungi, and animals (the latter including, but not limited to, insects and mammals). For example, the present invention may be applied to Escherichia coli, Spodoptera frugiperda, Nicotiana sp., Zea mays, Lemna sp., Saccharomyces sp., Pichia sp., Schizosaccharomyces sp., mammalian cells, including, but not limited to, COS cells, CHO cells, 293 cells, A431 cells, 3T3 cells, CV-1 cells, HeLa cells, L cells, BHK21 cells, HL-60 cells, U937 cells, HEK cells, PerC6 cells, Jurkat cells, normal diploid cells, cell strains derived from in vitro culture of primary tissue, and primary explants. The list of organisms and cell lines is meant only to provide nonlimiting examples.
In particular, the present invention contemplates differential expression profiling analysis of industrially relevant cell lines, such as, for example, CHO cells. CHO cells are a primary host for therapeutic protein production, such as, for example, production of monoclonal antibodies, receptors, and Fc fusion proteins, because CHO cells provide fidelity of folding, processing, and glycosylation. CHO cells are also compatible with deep-tank, serum-free culture and have excellent safety records.
The present invention permits an understanding of pathways, genes and proteins that influence desired cell culture phenotypes or characteristics, for example, cell phenotypes that enable highly productive fed-batch processes. Such desired cell phenotypes include, but are not limited to, high cell growth rate, high peak cell density, sustained high cell viability, high maximum cellular productivity, sustained high cellular productivity, low ammonium production, and low lactate production. Desired phenotypes or characteristics may be inherent properties of established cell lines that have certain genomic backgrounds. Desired phenotypes or characteristics may also be conferred to cells by growing the cells in different conditions, e.g., temperatures, cell densities, the use of agents such as sodium butyrate, to be in different kinetic phases of growth (e.g., lag phase, exponential growth phase, stationary phase or death phase), and/or to become serum-independent, etc. During the period in which these phenotypes are induced, and/or after these phenotypes are achieved, a pool of target nucleic acid or protein samples can be prepared from the cells and analyzed with the oligonucleotide array to determine and identify which genes demonstrate altered expression in response to a particular stimulus (e.g., temperature, sodium butyrate), and therefore are potentially involved in conferring the desired phenotype or characteristic.
Preparation of Pool of Target Nucleic Acids
In order to conduct gene expression profiling analysis, a pool of target nucleic acids is prepared from a sample derived from a cell line. Any biological sample may be used as a source of target nucleic acids. The pool of target nucleic acids can be total RNA, or any nucleic acid derived therefrom, including each of the single strands of cDNA made by reverse transcription of the mRNA, or RNA transcribed from the double-stranded cDNA intermediate. Methods of isolating target nucleic acids for analysis with an oligonucleotide array or other probes, such as phenol-chloroform extraction, ethanol precipitation, magnetic bead separation, or silica-gel affinity purification, are well known to one of skill in the art.
For example, various methods are available for isolating or enriching RNA. These methods include, but are not limited to, RNeasy kits (provided by Qiagen), MasterPure kits (provided by Epicentre Technologies), charge-switch technology (see, e.g., U.S. Published patent application Nos. 2003/0054395 and 2003/0130499), and TRIZOL (provided by Gibco BRL). The RNA isolation protocols provided by Affymetrix can also be employed in the present invention. See, e.g., GeneChip® EXPRESSION ANALYSIS TECHNICAL MANUAL (701021 rev. 3, Affymetrix, Inc. 2002).
Preferably, the pool of target nucleic acids (i.e., mRNA or nucleic acids derived therefrom) should reflect the transcription of gene coding regions. In one example, mRNA is enriched by removing rRNA. Different methods are available for eliminating or reducing the amount of rRNA in a sample. For instance, rRNA can be removed by enzymatic digestion. According to this method, rRNAs are first amplified using reverse transcriptase and specific primers to produce cDNA. The rRNA is allowed to anneal with the cDNA. The sample is then treated with RNase H, which specifically digests RNA within an RNA:DNA hybrid.
Target nucleic acids may be amplified before incubation with an oligonucleotide array or other probes. Suitable amplification methods, including, but not limited to, reverse transcription-polymerase chain reaction, ligase chain reaction, self-sustained sequence replication, and in vitro transcription, are well known in the art. It should be noted that oligonucleotide probes are chosen to be complementary to target nucleic acids. Therefore, if an antisense pool of target nucleic acids is provided (as is often the case when target nucleic acids are amplified by in vitro transcription), the oligonucleotide probes should correspond with subsequences of the sense complement. Conversely, if the pool of target nucleic acids is sense, the oligonucleotide array should be complementary (i.e., antisense) to them. Finally, if target nucleic acids are double-stranded, oligonucleotide probes can be sense or antisense.
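The relationship between the strandedness of the target pool and the orientation of the probes can be sketched as below. This is a minimal illustration under stated assumptions: the function names and the strand labels are hypothetical, sequences are written 5′ to 3′ in the DNA alphabet (RNA targets are treated as if U were T), and for double-stranded targets the sense orientation is chosen arbitrarily because either orientation is acceptable.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def probe_for_target(template_sense, start, length, target_strand):
    """Pick a probe that is complementary to the target pool.

    template_sense: sense-strand template sequence (5'->3').
    target_strand:  'antisense', 'sense', or 'double' (hypothetical labels).
    """
    sense_sub = template_sense[start:start + length]
    if target_strand == "antisense":
        # Antisense targets hybridize to subsequences of the sense strand.
        return sense_sub
    if target_strand == "sense":
        # Sense targets need antisense (reverse-complement) probes.
        return reverse_complement(sense_sub)
    if target_strand == "double":
        # Either orientation hybridizes; sense is an arbitrary choice here.
        return sense_sub
    raise ValueError("target_strand must be 'antisense', 'sense', or 'double'")

probe = probe_for_target("ATGCCGTA", 0, 4, "sense")  # "GCAT"
```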
The present invention involves detecting the hybridization intensity between target nucleic acids and complementary oligonucleotide probes. To accomplish this, target nucleic acids may be attached directly or indirectly with appropriate and detectable labels. Direct labels are detectable labels that are directly attached to or incorporated into target nucleic acids. Indirect labels are attached to polynucleotides after hybridization, often by attaching to a binding moiety that was attached to the target nucleic acids prior to hybridization. Such direct and indirect labels are well known in the art. In a preferred embodiment of the invention, target nucleic acids are detected using the biotin-streptavidin-PE coupling system, where biotin is incorporated into target nucleic acids and hybridization is detected by the binding of streptavidin-PE to biotin.
Target nucleic acids may be labeled before, during or after incubation with an oligonucleotide array. Preferably, the target nucleic acids are labeled before incubation. Labels may be incorporated during the amplification step by using nucleotides that are already labeled (e.g., biotin-coupled dUTP or dCTP) in the reaction. Alternatively, a label may be added directly to the original nucleic acid sample (e.g., mRNA, cDNA) or to the amplification product after the amplification is completed. Means of attaching labels to nucleic acids are well known to those of skill in the art and include, but are not limited to, nick translation, end-labeling, and ligation of target nucleic acids to a nucleic acid linker to join it to a label. Alternatively, several kits specifically designed for isolating and preparing target nucleic acids for microarray analysis are commercially available, including, but not limited to, the GeneChip® IVT Labeling Kit (Affymetrix, Santa Clara, Calif.) and the Bioarray® High Yield® RNA Transcript Labeling Kit with Fluorescein-UTP for Nucleic Acid Arrays (Enzo Life Sciences, Inc., Farmingdale, N.Y.).
Polynucleotides can be fragmented before being labeled with detectable moieties. Exemplary methods for fragmentation include, but are not limited to, heat or ion-mediated hydrolysis.
Oligonucleotide Arrays
Probes suitable for the present invention include oligonucleotide arrays or other probes that are capable of detecting the expression of a plurality of genes (including previously undiscovered genes) by a cell (or cell line), including known cells or cells derived from an unsequenced organism, and of identifying genes (including previously undiscovered genes) and related pathways that may be involved with the induction of a particular cell phenotype, e.g., increased and efficient transgene expression.
Oligonucleotide probes used in this invention may be nucleotide polymers or analogs and modified forms thereof such that hybridizing to a pool of target nucleic acids occurs in a sequence specific manner under oligonucleotide array hybridization conditions. As used herein, the term “oligonucleotide array hybridization conditions” refers to the temperature and ionic conditions that are normally used in oligonucleotide array hybridization. In many examples, these conditions include 16-hour hybridization at 45° C., followed by at least three 10-minute washes at room temperature. The hybridization buffer comprises 100 mM MES, 1 M [Na+], 20 mM EDTA, and 0.01% Tween 20. The pH of the hybridization buffer can range between 6.5 and 6.7. The wash buffer is 6× SSPET, which contains 0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA, and 0.005% Triton X-100. Under more stringent oligonucleotide array hybridization conditions, the wash buffer can contain 100 mM MES, 0.1 M [Na+], and 0.01% Tween 20. See also GENECHIP® EXPRESSION ANALYSIS TECHNICAL MANUAL (701021 rev. 3, Affymetrix, Inc. 2002), which is incorporated herein by reference in its entirety.
As is known by one of skill in the art, oligonucleotide probes can be of any length. Preferably, oligonucleotide probes suitable for the invention are 20 to 70 nucleotides in length. Most preferably, suitable oligonucleotide probes are 25 nucleotides in length. In one embodiment, the nucleic acid probes of the present invention have relatively high sequence complexity. In many examples, the probes do not contain long stretches of the same nucleotide. In addition, the probes may be designed such that they do not have a high proportion of G or C residues at the 3′ ends. In another embodiment, the probes do not have a 3′ terminal T residue. Depending on the type of assay or detection to be performed, sequences that are predicted to form hairpins or interstrand structures, such as “primer dimers,” can be either included in or excluded from the probe sequences. In many embodiments, each probe employed in the present invention does not contain any ambiguous base.
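The probe design preferences described above can be expressed as a simple filter, sketched here for illustration. The 25-nucleotide length is the most preferred value stated above; the numerical cutoffs for a "long stretch" of one nucleotide (more than 4) and for a "high proportion" of G or C residues at the 3′ end (more than 3 of the last 5 bases) are assumptions chosen for this example only.

```python
import re

def passes_probe_filters(probe, length=25, max_run=4, tail=5, max_tail_gc=3):
    """Return True if a candidate probe (written 5'->3') meets the filters.

    max_run, tail, and max_tail_gc are illustrative assumptions; the text
    does not give numerical cutoffs for these criteria.
    """
    probe = probe.upper()
    if len(probe) != length:
        return False
    if re.search(r"[^ACGT]", probe):                 # ambiguous base (e.g., N)
        return False
    if re.search(r"(.)\1{%d,}" % max_run, probe):    # run longer than max_run
        return False
    if probe.endswith("T"):                          # 3' terminal T residue
        return False
    if sum(base in "GC" for base in probe[-tail:]) > max_tail_gc:
        return False                                 # G/C-rich 3' end
    return True

candidates = [
    "ACGTACGATCGATCAGTCAGCTAGA",  # hypothetical sequence, passes all filters
    "ACGTACGATCGATCAGTCAGCTAGT",  # hypothetical sequence, ends in T
]
accepted = [p for p in candidates if passes_probe_filters(p)]
```

Filters for predicted hairpins or primer dimers are omitted because, as noted above, such sequences may be either included or excluded depending on the assay.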
Oligonucleotide probes are made to be specific for (e.g., complementary to (i.e., capable of hybridizing to)) a template sequence. Any part of a template sequence can be used to prepare probes. Multiple probes, e.g., 5, 10, 15, 20, 25, 30, or more, can be prepared for each template sequence. These multiple probes may or may not overlap each other. Overlap among different probes may be desirable in some assays. In many embodiments, the probes for a template sequence have low sequence identities with other template sequences, or the complements thereof. For instance, each probe for a template sequence can have no more than 70%, 60%, 50% or less sequence identity with other template sequences, or the complements thereof. This reduces the risk of undesired cross-hybridization. Sequence identity can be determined using methods known in the art. These methods include, but are not limited to, BLASTN, FASTA, and FASTDB. The Genetics Computer Group (GCG) program, which is a suite of programs including BLASTN and FASTA, can also be used. Preferable sequences for template sequences include, but are not limited to, consensus sequences, transgene sequences, and control sequences (i.e., sequences used to control or normalize for variation between experiments, samples, stringency requirements, and target nucleic acid preparations). Additionally, any subsequence of consensus, transgene and control sequences can be used as a template sequence.
In one embodiment, only certain regions (i.e., tiling regions) of consensus, transgene and control sequences are used as template sequences for the oligonucleotide probes used in this invention. One of skill in the art will recognize that protocols that may be used in practicing the invention, e.g., in vitro transcription protocols, often result in a bias toward the 3′-ends of target nucleic acids. Consequently, in one embodiment of the invention, the region of the consensus sequence or transgene sequence closest to the 3′-end of a consensus sequence is most often used as a template for oligonucleotide probes. Generally, if a poly-A signal could be identified, the 1400 nucleotides immediately prior to the end of the consensus or transgene sequences are designated as a tiling region. Alternatively, if a poly-A signal could not be identified, only the last 600 nucleotides of the consensus or transgene sequence are designated as a tiling region. However, it should be noted that the invention is not limited to using only these tiling regions within the consensus, transgene and control sequences as templates for the oligonucleotide probes. Indeed, a tiling region may occur anywhere within the consensus, transgene or control sequences. For example, the tiling region of a control sequence may comprise regions from both the 5′ and 3′-ends of the control sequence. In fact, the entire consensus, transgene or control sequence may be used as a template for oligonucleotide probes.
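The general rule for designating a 3′ tiling region can be sketched as follows. The function name is hypothetical, and the detection of a poly-A signal is assumed to have been performed elsewhere and is passed in as a flag; only the 1400-nucleotide and 600-nucleotide lengths come from the description above.

```python
def tiling_region(sequence, has_polya_signal,
                  with_signal=1400, without_signal=600):
    """Return (start, end) indices of the 3' tiling region of a sequence.

    Indices are 0-based with an exclusive end, so sequence[start:end]
    yields the tiling region. Sequences shorter than the requested
    length are tiled in full.
    """
    region_length = with_signal if has_polya_signal else without_signal
    end = len(sequence)
    start = max(0, end - region_length)
    return start, end

# Hypothetical 2000-nucleotide consensus sequence.
consensus = "A" * 2000
start, end = tiling_region(consensus, has_polya_signal=True)  # (600, 2000)
```

Because a tiling region may occur anywhere within a consensus, transgene, or control sequence, this sketch covers only the default 3′-biased case.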
An oligonucleotide array suitable for the invention may include perfect match probes to a plurality of consensus sequences (i.e., consensus sequences for multi-sequence clusters, and consensus sequences for exemplar sequences) identified as described above. The oligonucleotide array suitable for the invention may also include perfect match probes to both consensus and transgene sequences. It will be apparent to one of skill in the art that inclusion of oligonucleotide probes to transgene sequences will be useful when a cell line is genetically engineered to express a recombinant protein encoded by a transgene sequence, and the purpose of the analysis is to confirm expression of the transgene and determine the level of such expression. In those cases where the transgene is linked in a bicistronic mRNA to a downstream ORF, such as dihydrofolate reductase (DHFR), the level of transgene expression may also be determined from the level of expression of the downstream sequence. In another embodiment of the invention, the oligonucleotide array further comprises control probes that normalize the inherent variation between experiments, samples, stringency requirements, and preparations of target nucleic acids. Exemplary compositions of each of these types of control probes are described in U.S. Pat. No. 6,040,138 and in U.S. Publication No. 20060010513, the teachings of both of which are incorporated herein in their entirety by reference.
It is well known to one of skill in the art that two pools of target nucleic acids individually processed from the same sample can hybridize to two separate but identical oligonucleotide arrays with varying results. The varying results between these arrays are attributed to several factors, such as the intensity of the labeled pool of target nucleic acids and incubation conditions. To control for these variations, normalization control probes can be added to the array. Normalization control probes are oligonucleotides exactly complementary to known nucleic acid sequences spiked into the pool of target nucleic acids. Any oligonucleotide sequence may serve as a normalization control probe. For example, the normalization control probes may be created from a template obtained from an organism other than that from which the cell line being analyzed is derived. In one embodiment, an oligonucleotide array to mammalian sequences will contain normalization oligonucleotide probes to the following genes: bioB, bioC, and bioD from the organism Escherichia coli, cre from the organism Bacteriophage P1, and dap from the organism Bacillus subtilis, or subsequences thereof. The signal intensities received from the normalization control probes are then used to normalize the signal intensities from all other probes in the array. Additionally, when the known nucleic acid sequences are spiked into the pool of target nucleic acids at known and different concentrations for each transcript, a standard curve correlating signal intensity with transcript concentration can be generated, and expression levels for all transcripts represented on the array can be quantified (see, e.g., Hill et al. (2001) Genome Biol. 2(12):research0055.1-0055.13).
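As an illustration of the standard-curve idea, the following sketch fits a straight line to the spiked-in controls and uses it to convert a measured signal into an estimated concentration. The log-log linear model, the function names, and the example values are assumptions made for this sketch; they are not taken from Hill et al. or from the cited publications.

```python
import math

def fit_standard_curve(spike_concs, spike_signals):
    """Least-squares fit of log10(signal) = slope * log10(conc) + intercept.

    Assumption: signal is roughly linear in concentration on a log-log scale.
    """
    xs = [math.log10(c) for c in spike_concs]
    ys = [math.log10(s) for s in spike_signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def signal_to_concentration(signal, slope, intercept):
    """Invert the fitted curve to estimate a transcript concentration."""
    return 10 ** ((math.log10(signal) - intercept) / slope)

# Hypothetical spike-in controls: known concentrations and measured signals.
slope, intercept = fit_standard_curve([1.0, 10.0, 100.0], [20.0, 200.0, 2000.0])
estimate = signal_to_concentration(400.0, slope, intercept)
```

With these illustrative values the fitted slope is 1 and a signal of 400 maps back to a concentration of about 20 in the same units as the spike-in concentrations.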
Due to the naturally differing metabolic states between cells, expression of specific target nucleic acids varies from sample to sample. In addition, target nucleic acids may be more prone to degradation in one pool compared to another pool. Consequently, in another embodiment of the invention, the oligonucleotide array further comprises oligonucleotide probes that are exactly complementary to constitutively expressed genes, or subsequences thereof, that reflect the metabolic state of a cell. Nonlimiting examples of these types of genes are beta-actin, transferrin receptor and glyceraldehyde-3-phosphate dehydrogenase (GAPDH).
In one embodiment of the invention, the pool of target nucleic acids is derived by converting total RNA isolated from the sample into double-stranded cDNA and transcribing the resulting cDNA into complementary RNA (cRNA) using methods described in U.S. Publication No. 20060010513, the teachings of which are incorporated herein in their entirety by reference. The RNA conversion protocol is started at the 3′-end of the RNA transcript, and if the process is not allowed to go to completion (if, for example, the RNA is nicked, etc.) the amount of the 3′-end message compared to the 5′-end message will be greater, resulting in a 3′-bias. Additionally, RNA degradation may start at the 5′-end (Jacobs Anderson et al. (1998) EMBO J. 17:1497-506). The use of these methods suggests that control probes that measure the quality of the processing and the amount of degradation of the sample preferably should be included in the oligonucleotide array. Examples of such control probes are oligonucleotides exactly complementary to 3′- and 5′-ends of constitutively expressed genes, such as beta-actin, transferrin receptor and GAPDH, as mentioned above. The resulting 3′ to 5′ expression ratio of a constitutively expressed gene is then indicative of the quality of processing and the amount of degradation of the sample; i.e., a 3′ to 5′ ratio greater than three (3) indicates either incomplete processing or high RNA degradation (Auer et al. (2003) Nat. Genet. 35:292-93). Consequently, in a preferred embodiment of the invention, the oligonucleotide array includes control probes that are complementary to the 3′- and 5′-ends of constitutively expressed genes.
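The 3′ to 5′ quality check described above reduces to a simple ratio test. The sketch below is illustrative only; the function names and the example signal values are hypothetical, and the threshold of three is the value given in the text.

```python
def three_to_five_prime_ratio(signal_3p: float, signal_5p: float) -> float:
    """Ratio of the 3'-end probe signal to the 5'-end probe signal."""
    return signal_3p / signal_5p

def flags_poor_quality(signal_3p: float, signal_5p: float,
                       threshold: float = 3.0) -> bool:
    """True when the ratio exceeds the threshold, which per the text indicates
    either incomplete processing or high RNA degradation."""
    return three_to_five_prime_ratio(signal_3p, signal_5p) > threshold
```

For example, hypothetical GAPDH signals of 900 (3′ probes) and 200 (5′ probes) give a ratio of 4.5 and would be flagged, whereas 300 and 200 give 1.5 and would not.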
The quality of the pool of target nucleic acids is not only reflected in the processing and degradation of the target nucleic acids, but also in the origin of the target nucleic acids. Contaminating sequences, such as genomic DNA, may interfere with well-known quantification protocols. Consequently, in a preferred embodiment of the invention, the array further comprises oligonucleotide probes exactly complementary to bacterial genes, ribosomal RNAs, and/or genomic intergenic regions to provide a means to control for the quality of the sample preparation. These probes control for the possibility that the pool of target nucleic acids is contaminated with bacterial DNA, non-mRNA species, and genomic DNA. Such exemplary control sequences are disclosed in U.S. Publication No. 20060010513, the teachings of which are incorporated herein in their entirety by reference.
In a preferred embodiment of the invention, the oligonucleotide array further comprises control mismatch oligonucleotide probes for each perfect match probe. The mismatch probes control for hybridization specificity. Preferably, mismatch control probes are identical to their corresponding perfect match probes with the exception of one or more substituted bases. More preferably, the substitution(s) occurs at a central location on the probe. For example, where a perfect match probe is 25 nucleotides in length, a corresponding mismatch probe will have the identical length and sequence except for a single-base substitution at position 13 (e.g., substitution of a thymine for an adenine, an adenine for a thymine, a cytosine for a guanine, or a guanine for a cytosine). The presence of one or more mismatch bases in the mismatch oligonucleotide probe prevents target nucleic acids that bind to complementary perfect match probes from binding to corresponding mismatch control probes under appropriate conditions. Therefore, mismatch oligonucleotide probes indicate whether the incubation conditions are optimal, i.e., whether the stringency being utilized provides for target nucleic acids binding to only exactly complementary probes present in the array.
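The construction of a mismatch probe from a perfect match probe can be sketched as follows. This is a minimal illustration, assuming a DNA probe written with the letters A, C, G, and T; the function name and the example sequence are hypothetical.

```python
# Substitutions named in the text: T for A, A for T, C for G, G for C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(perfect_match: str) -> str:
    """Return a probe identical to the input except for the central base."""
    center = len(perfect_match) // 2  # index 12, i.e., position 13, for a 25-mer
    swapped = COMPLEMENT[perfect_match[center]]
    return perfect_match[:center] + swapped + perfect_match[center + 1:]

# Hypothetical 25-mer perfect match probe.
pm = "ACGT" * 6 + "A"
mm = mismatch_probe(pm)
```

The two probes have the same length and differ only at position 13, where the adenine of the perfect match probe is replaced by a thymine.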
For each template, a set of perfect match probes exactly complementary to subsequences of consensus, transgene, and/or control sequences (or tiling regions thereof) may be chosen using a variety of strategies. It is known to one of skill in the art that each template can provide for a potentially large number of probes. As is known, apparent probes are sometimes not suitable for inclusion in the array. This can be due to the existence of similar subsequences in other regions of the genome, which causes probes directed to these subsequences to cross-hybridize and give false signals. Another reason some apparent probes may not be suitable for inclusion in the array is because they may form secondary structures that prevent efficient hybridization. Finally, hybridization of target nucleic acids with (or to) an array comprising a large number of probes requires that each of the probes hybridizes to its specific target nucleic acid sequence under the same incubation conditions.
An oligonucleotide array may comprise one perfect match probe for a consensus, transgene, or control sequence, or may comprise a probeset (i.e., more than one perfect match probe) for a consensus, transgene, or control sequence. For example, an oligonucleotide array may comprise 1, 5, 10, 25, 50, 100, or more than 100 different perfect match probes for a consensus, transgene or control sequence. In a preferred embodiment of the invention, the array comprises at least 11-50 different perfect match oligonucleotide probes exactly complementary to subsequences of each consensus and transgene sequence. In an even more preferred embodiment, only the most optimal probeset for each template is included. The suitability of the probes for hybridization can be evaluated using various computer programs. Suitable programs for this purpose include, but are not limited to, LaserGene (DNAStar), Oligo (National Biosciences, Inc.), MacVector (Kodak/IBI), and the standard programs provided by the GCG. Any method or software program known in the art may be used to prepare probes for the template sequences of the present invention. For example, oligonucleotide probes may be generated by using Array Designer, a software package provided by TeleChem International, Inc (Sunnyvale, Calif.). Another exemplary algorithm for choosing optimal probe sets is described in U.S. Pat. No. 6,040,138, the teachings of which are hereby incorporated by reference. Other suitable means to optimize probesets, which will result in a comparable oligonucleotide array, are well known in the art and may be found in, e.g., Lockhart et al. (1996) Nat. Biotechnol. 14:1675-80 and Mei et al. (2003) Proc. Natl. Acad. Sci. USA 100:11237-42.
The oligonucleotide probes of the present invention can be synthesized using a variety of methods. Examples of these methods include, but are not limited to, the use of automated or high throughput DNA synthesizers, such as those provided by Millipore, GeneMachines, and BioAutomation. In many embodiments, the synthesized probes are substantially free of impurities. In many other embodiments, the probes are substantially free of other contaminants that may hinder the desired functions of the probes. The probes can be purified or concentrated using numerous methods, such as reverse phase chromatography, ethanol precipitation, gel filtration, electrophoresis, or any combination thereof.
More detailed information on making an oligonucleotide array suitable for the present invention and exemplary arrays are disclosed in U.S. Publication No. 20060010513, the disclosures of which are hereby incorporated by reference. As described in U.S. Publication No. 20060010513, a CHO chip microarray suitable for the invention includes 122 array quality control sequences (non-CHO), 732 public hamster sequences, 2835 library-derived CHO sequences, and 22 product/process specific sequences. Additional suitable arrays are described in U.S. Pat. No. 6,040,138, the disclosures of which are incorporated by reference.
Incubation of Target Nucleic Acids with an Array to Form a Hybridization Profile
Incubation reactions can be performed in absolute or differential hybridization formats. In the absolute hybridization format, polynucleotides derived from one sample are hybridized to the probes in an oligonucleotide array. Signals detected after the formation of hybridization complexes correlate to the polynucleotide levels in the sample. In the differential hybridization format, polynucleotides derived from two samples are labeled with different labeling moieties. A mixture of these differently labeled polynucleotides is added to an oligonucleotide array. The oligonucleotide array is then examined under conditions in which the emissions from the two different labels are individually detectable. In one embodiment, the fluorophores Cy3 and Cy5 (Amersham Pharmacia Biotech, Piscataway, N.J.) are used as the labeling moieties for the differential hybridization format.
In the present invention, the incubation conditions should be such that target nucleic acids hybridize only to oligonucleotide probes that have a high degree of complementarity. In a preferred embodiment, this is accomplished by incubating the pool of target nucleic acids with an oligonucleotide array under a low stringency condition to ensure hybridization, and then performing washes at successively higher stringencies until the desired level of hybridization specificity is reached. In other embodiments, target nucleic acids are incubated with an array of the invention under stringent or well-known oligonucleotide array hybridization conditions. In many examples, these oligonucleotide array hybridization conditions include 16-hour hybridization at 45° C., followed by at least three 10-minute washes at room temperature. The hybridization buffer comprises 100 mM MES, 1 M [Na+], 20 mM EDTA, and 0.01% Tween 20. The pH of the hybridization buffer can range between 6.5 and 6.7. The wash buffer is 6× SSPET, which contains 0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA, and 0.005% Triton X-100. Under more stringent oligonucleotide array hybridization conditions, the wash buffer can contain 100 mM MES, 0.1 M [Na+], and 0.01% Tween 20. See also GENECHIP® EXPRESSION ANALYSIS TECHNICAL MANUAL (701021 rev. 3, Affymetrix, Inc. 2002), which is incorporated herein by reference in its entirety.
Differential Gene Expression Profiling Analysis
Methods used to detect the hybridization profile of target nucleic acids with oligonucleotide probes are well known in the art. In particular, means of detecting and recording fluorescence of each individual target nucleic acid-oligonucleotide probe hybrid have been well established and are described in, e.g., U.S. Pat. No. 5,631,734 and U.S. Publication No. 20060010513, both of which are incorporated herein in their entirety by reference. For example, a confocal microscope can be controlled by a computer to automatically detect the hybridization profile of the entire array. Additionally, as a further nonlimiting example, the microscope can be equipped with a phototransducer attached to a data acquisition system to automatically record the fluorescence signal produced by each individual hybrid.
It will be appreciated by one of skill in the art that evaluation of the hybridization profile is dependent on the composition of the array, i.e., which oligonucleotide probes were included for analysis. For example, where the array includes oligonucleotide probes to consensus sequences only, or consensus sequences and transgene sequences only, (i.e., the array does not include control probes to normalize for variation between experiments, samples, stringency requirements, and preparations of target nucleic acids), the hybridization profile is evaluated by measuring the absolute signal intensity of each location on the array. Alternatively, the mean, trimmed mean (i.e., the mean signal intensity of all probes after 2-5% of the probesets with the lowest and highest signal intensities are removed), or median signal intensity of the array may be scaled to a preset target value to generate a scaling factor, which will subsequently be applied to each probeset on the array to generate a normalized expression value for each gene (see, e.g., Affymetrix (2000) Expression Analysis Technical Manual, pp. A5-14). Conversely, where the array further comprises control oligonucleotide probes, the resulting hybridization profile is evaluated by normalizing the absolute signal intensity of each location occupied by a test oligonucleotide probe by means of mathematical manipulations with the absolute signal intensity of each location occupied by a control oligonucleotide probe. Typical normalization strategies are well known in the art, and are included, for example, in U.S. Pat. No. 6,040,138 and Hill et al. (2001) Genome Biol. 2(12):research0055.1-0055.13.
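The scaling approach described above (scaling the trimmed mean of an array to a preset target value) can be sketched as follows. The target value of 100, the 2% trim fraction chosen from the 2-5% range in the text, and the function names are assumptions made for illustration.

```python
def trimmed_mean(values, trim_fraction=0.02):
    """Mean after dropping the lowest and highest trim_fraction of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

def scale_array(signals, target=100.0, trim_fraction=0.02):
    """Multiply every probeset signal by one array-wide scaling factor so that
    the trimmed mean of the array equals the preset target value."""
    factor = target / trimmed_mean(list(signals.values()), trim_fraction)
    return {probeset: value * factor for probeset, value in signals.items()}

# Hypothetical array with 100 probesets whose raw signals are 1..100.
raw = {f"ps{i}": float(i) for i in range(1, 101)}
scaled = scale_array(raw)
```

Because a single factor is applied to the whole array, the relative ordering of probesets within the array is preserved while arrays scaled to the same target become comparable.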
Signals gathered from oligonucleotide arrays can be analyzed using commercially available software, such as that provided by Affymetrix or Agilent Technologies. Controls, such as for scan sensitivity, probe labeling and cDNA or cRNA quantitation, may be included in the hybridization experiments. The array hybridization signals can be scaled or normalized before being subjected to further analysis. For instance, the hybridization signal for each probe can be normalized to take into account variations in hybridization intensities when more than one array is used under similar test conditions. Signals for individual target nucleic acids hybridized with complementary probes can also be normalized using the intensities derived from internal normalization controls contained on each array. In addition, genes with relatively consistent expression levels across the samples can be used to normalize the expression levels of other genes.
To identify genes that confer or correlate with a desired phenotype or characteristic, a gene expression profile of a sample derived from a test cell line is compared to a control profile derived from a control cell line that has a cell culture phenotype of interest distinct from that of the test cell line and differentially expressed genes are identified. For example, the method for identifying the genes and related pathways involved in cellular productivity may include the following: 1) growing a first sample of a first cell line with a particular cellular productivity and growing a second sample of a second cell line with a distinct cellular productivity; 2) isolating, processing, and hybridizing total RNA from the first sample to a first oligonucleotide array; 3) isolating, processing, and hybridizing total RNA from the second sample to a second oligonucleotide array; and 4) comparing the resulting hybridization profiles to identify the sequences that are differentially expressed between the first and second samples. Similar methods can be used to identify genes involved in other phenotypes.
Typically, each cell line was represented by at least three biological replicates. Programs known in the art, e.g., GeneExpress 2000 (Gene Logic, Gaithersburg, Md.), were used to analyze the presence or absence of a target sequence and to determine its relative expression level in one cohort of samples (e.g., cell line or condition or time point) compared to another sample cohort. A probeset called present in all replicate samples was considered for further analysis. Generally, fold-change values of 1.2-fold, 1.5-fold or greater were considered statistically significant if the p-values were less than or equal to 0.05.
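The selection criteria above (present in all replicates, a minimum fold change, and a p-value of at most 0.05) can be expressed as a simple filter. The record layout, field names, and example values below are hypothetical, and the p-values are assumed to have been computed beforehand by whatever statistical test the analysis software applies.

```python
def select_differential(probesets, min_fold=1.5, max_p=0.05):
    """Return the ids of probesets that pass the present-call, fold-change,
    and p-value criteria. Mean signals are assumed to be positive."""
    selected = []
    for ps in probesets:
        # Require a "present" call in every replicate sample.
        if not all(ps["present_calls"]):
            continue
        ratio = ps["mean_test"] / ps["mean_control"]
        # Express the change as a fold value >= 1 in either direction.
        fold = ratio if ratio >= 1 else 1 / ratio
        if fold >= min_fold and ps["p_value"] <= max_p:
            selected.append(ps["id"])
    return selected

# Hypothetical probeset records.
records = [
    {"id": "up", "present_calls": [True, True, True],
     "mean_test": 300.0, "mean_control": 100.0, "p_value": 0.01},
    {"id": "down", "present_calls": [True, True, True],
     "mean_test": 50.0, "mean_control": 100.0, "p_value": 0.04},
    {"id": "absent", "present_calls": [True, False, True],
     "mean_test": 300.0, "mean_control": 100.0, "p_value": 0.01},
    {"id": "small", "present_calls": [True, True, True],
     "mean_test": 110.0, "mean_control": 100.0, "p_value": 0.01},
    {"id": "noisy", "present_calls": [True, True, True],
     "mean_test": 300.0, "mean_control": 100.0, "p_value": 0.20},
]
hits = select_differential(records)
```

Passing `min_fold=1.2` applies the less stringent of the two fold-change cutoffs mentioned in the text.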
The identification of differentially expressed genes that correlate with one or more particular cell phenotypes (e.g., cell growth rate, peak cell density, sustained high cell viability, maximum cellular productivity, sustained high cellular productivity, ammonium production or consumption, lactate production or consumption, etc.) can lead to the discovery of genes and pathways, including those that were previously undiscovered, that regulate or are indicative of the cell phenotypes.
The subsequently identified genes are sequenced and the sequences are blasted against various databases to determine whether they are known genes or unknown genes. If genes are known, pathway analysis can be conducted based on the existing knowledge in the art. Both known and unknown genes are further confirmed or validated by various methods known in the art. For example, the identified genes may be manipulated (e.g., up-regulated or down-regulated) to induce or suppress the particular phenotype by the cells.
A harmonized decision tree illustrating this process is shown in
Differential Protein Expression Profiling Analysis
The present invention also provides methods for identifying differentially expressed proteins by protein expression profiling analysis. Protein expression profiles can be generated by any method permitting the resolution and detection of proteins from a sample from a cell line. Methods with higher resolving power are generally preferred, as increased resolution can permit the analysis of greater numbers of individual proteins, increasing the power and usefulness of the profile. A sample can be pre-treated to remove abundant proteins, such as by immunodepletion, prior to protein resolution and detection, as the presence of an abundant protein may mask more subtle changes in expression of other proteins, particularly low-abundance proteins. A sample can also be subjected to one or more procedures to reduce the complexity of the sample. For example, chromatography can be used to fractionate a sample; each fraction would have a reduced complexity, facilitating the analysis of the proteins within the fractions.
Three useful methods for simultaneously resolving and detecting several proteins include array-based methods; mass-spectrometry based methods; and two-dimensional gel electrophoresis based methods.
Protein arrays generally involve a significant number of different protein capture reagents, such as antibodies or antibody variable regions, each immobilized at a different location on a solid support. Such arrays are available, for example, from Sigma-Aldrich as part of their Panorama® line of arrays. The array is exposed to a protein sample and the capture reagents selectively capture the specific protein targets. The captured proteins are detected by detection of a label. For example, the proteins can be labeled before exposure to the array; detection of a label at a particular location on the array indicates the detection of the corresponding protein. If the array is not saturated, the amount of label detected may correlate with the concentration or amount of the protein in the sample. Captured proteins can also be detected by subsequent exposure to a second capture reagent, which can itself be labeled or otherwise detected, as in a sandwich immunoassay format.
Mass spectrometry-based methods include, for example, matrix-assisted laser desorption/ionization (MALDI), Liquid Chromatography/Mass Spectrometry/Mass Spectrometry (LC-MS/MS) and surface enhanced laser desorption/ionization (SELDI) techniques. For example, a protein profile can be generated using electrospray ionization and MALDI. SELDI, as described, for example, in U.S. Pat. No. 6,225,047, incorporates a retention surface on a mass spectrometry chip. A subset of proteins in a protein sample are retained on the surface, reducing the complexity of the mixture. Subsequent time-of-flight mass spectrometry generates a “fingerprint” of the retained proteins.
In methods involving two-dimensional gel electrophoresis, proteins in a sample are generally separated in a first dimension by isoelectric point and in a second dimension by molecular weight during SDS-PAGE. By virtue of the two dimensions of resolution, hundreds or thousands of proteins can be simultaneously resolved and analyzed. The proteins are detected by application of a stain, such as a silver stain, or by the presence of a label on the proteins, such as a Cy2, Cy3, or Cy5 dye. To identify a protein, a gel spot can be cut out and in-gel tryptic digestion performed. The tryptic digest can be analyzed by mass spectrometry, such as MALDI. The resulting mass spectrum of peptides, the peptide mass fingerprint or PMF, is searched against a sequence database. The PMF is compared to the masses of all theoretical tryptic peptides generated in silico by the search program. Programs such as Prospector, Sequest, and Mascot (Matrix Science, Ltd., London, UK) can be used for the database searching. For example, Mascot produces a statistically-based Mowse score that indicates whether any matches are significant. MS/MS can be used to increase the likelihood of getting a database match. CID-MS/MS (collision induced dissociation of tandem MS) of peptides can be used to give a spectrum of fragment ions that contain information about the amino acid sequence. Adding this information to a peptide mass fingerprint allows Mascot to increase the statistical significance of a match. It is also possible in some cases to identify a protein by submitting only a raw MS/MS spectrum of a single peptide.
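The in silico generation of theoretical tryptic peptides mentioned above can be sketched as follows. This sketch assumes the commonly used cleavage rule (cleave on the C-terminal side of lysine or arginine, except when the next residue is proline), which is not stated in the text and is not necessarily the rule applied by any particular search program. It covers the digestion step only; computing peptide masses and scoring matches is left to the search program.

```python
def tryptic_digest(protein: str):
    """Split a protein sequence (one-letter codes) into theoretical tryptic
    peptides, assuming complete digestion with no missed cleavages."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        # Assumed rule: cleave after K or R unless followed by P.
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep any C-terminal peptide that does not end in K or R.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

# Hypothetical short sequence used only to show the rule in action.
fragments = tryptic_digest("MKRPAAKGG")
```

In this example the arginine followed by proline is not cleaved, so the sequence yields three peptides rather than four.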
A recent improvement in comparisons of protein expression profiles involves the use of a mixture of two or more protein samples, each labeled with a different, spectrally-resolvable, charge- and mass-matched dye, such as Cy3 and Cy5. This improvement, called fluorescent 2-dimensional differential in-gel electrophoresis (DIGE), has the advantage that the test and control protein samples are run in the same gel, facilitating the matching of proteins between the two samples and avoiding complications involving non-identical electrophoresis conditions in different gels. The gels are imaged separately and the resulting images can be overlaid directly without further modification. A third spectrally-resolvable dye, such as Cy2, can be used to label a pool of protein samples to serve as an internal control among different gels run in an experiment. Thus, all detectable proteins are included as an internal standard, facilitating comparisons across different gels.
Engineering Cell Lines to Improve Cell Phenotypes
As described above, the present invention provides polynucleotide sequences (or subsequences) of genes or polypeptide sequences (or subsequences) of proteins that are differentially expressed in different cell lines or cell samples with at least one distinct cell phenotype. These sequences are collectively referred to as differential sequences. The differential sequences may be used as targets to effect a cell phenotype, particularly a phenotype characterized by increased and efficient production of a recombinant transgene, increased cell growth rate, high peak cell density, sustained high cell viability, high maximum cellular productivity, sustained high cellular productivity, low ammonium production, and low lactate production, etc.
More particularly, the invention provides each purified and/or isolated polynucleotide or polypeptide sequence referred to in the relevant Tables that is shown to be a suitable target for regulating a CHO cell phenotype, i.e., is differentially expressed by a first CHO cell line compared to a second CHO cell line, herein designated as “differential CHO sequence.” Specifically, as used herein, a differential CHO sequence includes a sequence having and/or consisting essentially of a sequence selected from the gene sequences referenced in the Tables, a fragment or a complement thereof. As used herein, a differential CHO sequence also includes a polypeptide sequence selected from the protein sequences referenced in the Tables, or a fragment thereof. As used herein, a differential CHO sequence also includes a polynucleotide sequence encoding a polypeptide sequence selected from the protein sequences referenced in the Tables, a fragment or a complement thereof. A skilled artisan will recognize that the differential CHO sequences of the invention may include novel CHO sequences (as discussed below), known gene sequences that are attributed with a function that is, or was, not obviously involved in transgene expression, and known sequences that previously had no known function but may now be known to function as targets in regulating a CHO cell phenotype.
The present invention contemplates methods and compositions that may be used to alter (i.e., regulate (e.g., enhance, reduce, or modify)) the expression and/or the activity of the genes or proteins corresponding to the differential CHO sequences in a cell or organism. Altered expression of the differential CHO sequences encompassed by the present invention in a cell or organism may be achieved through down-regulating or up-regulating of the corresponding genes or proteins. For example, the differential CHO sequences may be down-regulated by the use of various inhibitory polynucleotides, such as antisense polynucleotides, ribozymes that bind and/or cleave the mRNA transcribed from the genes of the invention, triplex-forming oligonucleotides that target regulatory regions of the genes, and short interfering RNA that causes sequence-specific degradation of target mRNA (e.g., Galderisi et al. (1999) J. Cell. Physiol. 181:251-57; Sioud (2001) Curr. Mol. Med. 1:575-88; Knauert and Glazer (2001) Hum. Mol. Genet. 10:2243-51; Bass (2001) Nature 411:428-29).
The inhibitory antisense or ribozyme polynucleotides suitable for the invention can be complementary to an entire coding strand of a gene of the invention, or to only a portion thereof. Alternatively, inhibitory polynucleotides can be complementary to a noncoding region of the coding strand of a gene of the invention. The inhibitory polynucleotides of the invention can be constructed using chemical synthesis and/or enzymatic ligation reactions using procedures well known in the art. The nucleoside linkages of chemically synthesized polynucleotides can be modified to enhance their ability to resist nuclease-mediated degradation, as well as to increase their sequence specificity. Such linkage modifications include, but are not limited to, phosphorothioate, methylphosphonate, phosphoroamidate, boranophosphate, morpholino, and peptide nucleic acid (PNA) linkages (Galderisi et al., supra; Heasman (2002) Dev. Biol. 243:209-14; Mickelfield (2001) Curr. Med. Chem. 8:1157-70). Alternatively, antisense molecules can be produced biologically using an expression vector into which a polynucleotide of the present invention has been subcloned in an antisense (i.e., reverse) orientation.
In yet another embodiment, the antisense polynucleotide molecule suitable for the invention is an α-anomeric polynucleotide molecule. An α-anomeric polynucleotide molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other. The antisense polynucleotide molecule can also comprise a 2′-O-methylribonucleotide or a chimeric RNA-DNA analogue, according to techniques that are known in the art.
The inhibitory triplex-forming oligonucleotides (TFOs) suitable for the present invention bind in the major groove of duplex DNA with high specificity and affinity (Knauert and Glazer, supra). Expression of the genes of the present invention can be inhibited by targeting TFOs complementary to the regulatory regions of the genes (i.e., the promoter and/or enhancer sequences) to form triple helical structures that prevent transcription of the genes.
In one embodiment of the invention, the inhibitory polynucleotides are short interfering RNA (siRNA) molecules. These siRNA molecules are short (preferably 19-25 nucleotides; most preferably 19 or 21 nucleotides), double-stranded RNA molecules that cause sequence-specific degradation of target mRNA. This degradation is known as RNA interference (RNAi) (e.g., Bass (2001) Nature 411:428-29). Originally identified in lower organisms, RNAi has been effectively applied to mammalian cells and has recently been shown to prevent fulminant hepatitis in mice treated with siRNA molecules targeted to Fas mRNA (Song et al. (2003) Nat. Med. 9:347-51). In addition, intrathecally delivered siRNA has recently been reported to block pain responses in two models (agonist-induced pain model and neuropathic pain model) in the rat (Dorn et al. (2004) Nucleic Acids Res. 32(5):e49).
The siRNA molecules suitable for the present invention can be generated by annealing two complementary single-stranded RNA molecules together (one of which matches a portion of the target mRNA) (Fire et al., U.S. Pat. No. 6,506,559) or through the use of a single hairpin RNA molecule that folds back on itself to produce the requisite double-stranded portion (Yu et al. (2002) Proc. Natl. Acad. Sci. USA 99:6047-52). The siRNA molecules can be chemically synthesized (Elbashir et al. (2001) Nature 411:494-98) or produced by in vitro transcription using single-stranded DNA templates (Yu et al., supra). Alternatively, the siRNA molecules can be produced biologically, either transiently (Yu et al., supra; Sui et al. (2002) Proc. Natl. Acad. Sci. USA 99:5515-20) or stably (Paddison et al. (2002) Proc. Natl. Acad. Sci. USA 99:1443-48), using an expression vector(s) containing the sense and antisense siRNA sequences. Recently, reduction of levels of target mRNA in primary human cells, in an efficient and sequence-specific manner, was demonstrated using adenoviral vectors that express hairpin RNAs, which are further processed into siRNAs (Arts et al. (2003) Genome Res. 13:2325-32).
The siRNA molecules targeted to the differential CHO sequences of the present invention can be designed based on criteria well known in the art (e.g., Elbashir et al. (2001) EMBO J. 20:6877-88). For example, the target segment of the target mRNA should begin with AA (preferred), TA, GA, or CA; the GC ratio of the siRNA molecule should be 45-55%; the siRNA molecule should not contain three of the same nucleotides in a row; the siRNA molecule should not contain seven mixed G/Cs in a row; and the target segment should be in the ORF region of the target mRNA and should be at least 75 bp after the initiation ATG and at least 75 bp before the stop codon. siRNA molecules targeted to the polynucleotides of the present invention can be designed by one of ordinary skill in the art using the aforementioned criteria or other known criteria.
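The design criteria listed above can be expressed as a simple screening function. The following Python sketch is illustrative only: the function names, the use of the DNA alphabet for the target segment, the application of all sequence rules to the same segment, and the way the 75 bp margins are counted (from the end of the initiation ATG and from the start of the stop codon) are assumptions, not part of the described method.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_design_rules(target, orf_offset, orf_length):
    """Screen a candidate target segment against the listed criteria.

    target     -- target segment, sense strand, DNA alphabet (assumption)
    orf_offset -- 0-based position of the segment's first base, counted
                  from the A of the initiation ATG (assumption)
    orf_length -- ORF length in bases, including the stop codon (assumption)
    """
    s = target.upper()
    # Segment should begin with AA (preferred), TA, GA, or CA.
    if s[:2] not in {"AA", "TA", "GA", "CA"}:
        return False
    # GC ratio should be 45-55%.
    if not 0.45 <= gc_fraction(s) <= 0.55:
        return False
    # No three identical nucleotides in a row.
    if re.search(r"(.)\1\1", s):
        return False
    # No seven mixed G/Cs in a row.
    if re.search(r"[GC]{7}", s):
        return False
    # At least 75 bp after the initiation ATG and 75 bp before the stop codon.
    starts_late_enough = orf_offset >= 3 + 75
    ends_early_enough = orf_offset + len(s) <= orf_length - 3 - 75
    return starts_late_enough and ends_early_enough
```

A segment that passes this screen is only a candidate; the source notes that other known criteria may also be applied.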
Down-regulation of the genes or proteins of the present invention in a cell or organism may also be achieved through the creation of cells or organisms whose endogenous genes corresponding to the differential CHO sequences of the present invention have been disrupted through insertion of extraneous polynucleotide sequences (i.e., a knockout cell or organism). The coding region of the endogenous gene may be disrupted, thereby generating a nonfunctional protein. Alternatively, the upstream regulatory region of the endogenous gene may be disrupted or replaced with different regulatory elements, resulting in the altered expression of the still-functional protein. Methods for generating knockout cells include homologous recombination and are well known in the art (e.g., Wolfer et al. (2002) Trends Neurosci. 25:336-40).
The expression or activity of the CHO differential sequences may also be altered by up-regulating the genes or proteins corresponding to the CHO differential sequences of the invention. Up-regulation includes providing an exogenous nucleic acid (e.g., an over-expression construct) encoding a protein or gene of interest or a variant retaining its activity, or providing a factor or a molecule indirectly enhancing the protein activity. The variant generally shares common structural features with the protein or gene of interest and should retain the activity permitting the improved cellular phenotype. The variant may correspond to a homolog from another species (e.g., a rodent homolog; a primate homolog, such as a human homolog; another mammalian homolog; or a more distant homolog retaining sequence conservation sufficient to convey the desired effect on cellular phenotype). In some cases, the variant may retain at least 70%, at least 80%, at least 90%, or at least 95% sequence identity with the CHO sequence or with a known homolog. In certain embodiments, the variant is a nucleic acid molecule that hybridizes under stringent conditions to the CHO nucleic acid sequence or to the nucleic acid sequence of a known homolog.
For example, the isolated polynucleotides corresponding to the differential CHO sequences of the present invention may be operably linked to an expression control sequence, such as those in the pMT2 and pED expression vectors, for recombinant production of the differentially expressed genes or proteins of the invention. General methods of expressing recombinant proteins are well known in the art.
The expression or activity of the differentially expressed genes or proteins of the present invention may also be altered by exogenous agents, small molecules, pharmaceutical compounds, or other factors that directly or indirectly modulate the activity of the genes or proteins of the present invention. As a result, these agents, small molecules, pharmaceutical compounds, or other factors may be used to regulate the phenotype of CHO cells, e.g., increased production of a recombinant transgene, increased cell growth rate, high peak cell density, sustained high cell viability, high maximum cellular productivity, sustained high cellular productivity, low ammonium production, and low lactate production.
Any combinations of the methods of altering gene or protein expression described above are within the scope of the invention. Any combination of genes or proteins affecting different cell phenotypes can be modulated based on the methods described herein and are within the scope of the invention.
Novel Genes or Proteins
As described above, the present invention provides differential sequences including sequences newly discovered to be expressed by CHO cells. Accordingly, the present invention provides novel isolated and/or purified polynucleotides that are at least part of previously undiscovered genes. Exemplary novel polynucleotide sequences (or subsequences) of genes that are newly discovered to be expressed by CHO cells are illustrated in Tables 9, 13, and 15. The present invention also provides isolated and/or purified polypeptides that are at least part of previously undiscovered proteins. Exemplary novel polypeptide sequences (or subsequences) of proteins that are newly discovered to be expressed by CHO cells are illustrated in Tables 2 and 4. The present invention also provides novel polynucleotides encoding the polypeptide sequences as illustrated in Tables 2 and 4.
Thus, the invention provides each purified and/or isolated polynucleotide sequence selected from Tables 9, 13, and 15 that is, or is part of, a previously undiscovered gene (i.e., a gene that had not been sequenced and/or shown to be expressed by CHO cells) and is verifiably expressed by CHO cells. Alternatively, the invention provides each purified and/or isolated polypeptide sequence selected from Tables 2 and 4 that is, or is part of, a previously undiscovered protein (i.e., a protein that had not been sequenced and/or shown to be expressed by CHO cells) and is verifiably expressed by CHO cells. The invention also provides an isolated and/or purified polynucleotide sequence encoding each polypeptide sequence selected from Tables 2 and 4. These sequences are herein collectively designated as “novel CHO sequences.” Preferred polynucleotide sequences of the invention include DNA sequences including genomic and cDNA sequences and chemically synthesized DNA sequences, RNA sequences, or other modified nucleic acid sequences. Preferred polypeptide sequences of the invention include amino acid sequences or modified amino acid sequences.
It is part of the invention to provide inhibitory polynucleotides to each novel CHO sequence as described above. Polynucleotides of the present invention also include polynucleotides that hybridize under stringent conditions to novel CHO sequences, or complements thereof, and/or encode polypeptides that retain substantial biological activity of polypeptides encoded by novel CHO sequences of the invention. Polynucleotides of the present invention also include continuous portions of novel CHO sequences comprising at least 21 consecutive nucleotides.
Polynucleotides of the present invention also include polynucleotides that encode any of the amino acid sequences encoded by the polynucleotides as described above, or continuous portions thereof, and that differ from the polynucleotides described above only due to the well-known degeneracy of the genetic code.
The isolated polynucleotides of the present invention may be used as hybridization probes (e.g., as an oligonucleotide array, as described above) and primers to identify and isolate nucleic acids having sequences identical to, or similar to, those encoding the disclosed polynucleotides. Hybridization methods for identifying and isolating nucleic acids include polymerase chain reaction (PCR), Southern hybridization, and Northern hybridization, and are well known to those skilled in the art.
Hybridization reactions can be performed under conditions of different stringencies. The stringency of a hybridization reaction includes the difficulty with which any two nucleic acid molecules will hybridize to one another. Preferably, each hybridizing polynucleotide hybridizes to its corresponding polynucleotide under reduced stringency conditions, more preferably stringent conditions, and most preferably highly stringent conditions. Examples of stringency conditions are shown in Table 1 below: highly stringent conditions are those that are at least as stringent as, for example, conditions A-F; stringent conditions are at least as stringent as, for example, conditions G-L; and reduced stringency conditions are at least as stringent as, for example, conditions M-R.
1: The hybrid length is that anticipated for the hybridized region(s) of the hybridizing polynucleotides. When hybridizing a polynucleotide to a target polynucleotide of unknown sequence, the hybrid length is assumed to be that of the hybridizing polynucleotide. When polynucleotides of known sequence are hybridized, the hybrid length can be determined by aligning the sequences of the polynucleotides and identifying the region or regions of optimal sequence complementarity.
H: SSPE (1× SSPE is 0.15 M NaCl, 10 mM NaH2PO4, and 1.25 mM EDTA, pH 7.4) can be substituted for SSC (1× SSC is 0.15 M NaCl and 15 mM sodium citrate) in the hybridization and wash buffers.
TB* − TR*: The hybridization temperature for hybrids anticipated to be less than 50 base pairs in length should be 5-10° C. less than the melting temperature (Tm) of the hybrid, where Tm is determined according to the following equations. For hybrids less than 18 base pairs in length, Tm(° C.) = 2(# of A + T bases) + 4(# of G + C bases). For hybrids between 18 and 49 base pairs in length,
Generally, and as stated above, the isolated polynucleotides of the present invention may also be used as hybridization probes and primers to identify and isolate DNAs homologous to the disclosed polynucleotides. These homologs are polynucleotides isolated from different species than those of the disclosed polynucleotides, or within the same species, but with significant sequence similarity to the disclosed polynucleotides. Preferably, polynucleotide homologs have at least 60% sequence identity (more preferably, at least 75% identity; most preferably, at least 90% identity) with the disclosed polynucleotides. Preferably, homologs of the disclosed polynucleotides are those isolated from mammalian species.
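The identity thresholds above can be illustrated with a short calculation. The Python sketch below is a minimal example, assuming the two sequences have already been aligned to equal length with "-" marking gaps; the function names and the choice of alignment length as the denominator are assumptions, since the source does not specify how percent identity is to be computed.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pairwise alignment (gaps written as '-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def homolog_tier(identity):
    """Map a percent identity onto the preference tiers stated in the text."""
    if identity >= 90:
        return "most preferred"
    if identity >= 75:
        return "more preferred"
    if identity >= 60:
        return "preferred"
    return "below threshold"
```

Other conventions (for example, dividing by the length of the shorter sequence) would give different values for gapped alignments.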
The isolated polynucleotides of the present invention may also be used as hybridization probes and primers to identify cells and tissues that express the polynucleotides of the present invention and the conditions under which they are expressed.
The present invention also contemplates recombinantly expressing the proteins or polypeptides encoded by the novel CHO sequences. A number of cell types may act as suitable host cells for recombinant expression of the polypeptides encoded by the novel CHO sequences of the invention. Mammalian host cells include, but are not limited to, COS cells, CHO cells, 293 cells, A431 cells, 3T3 cells, CV-1 cells, HeLa cells, L cells, BHK21 cells, HL-60 cells, U937 cells, HEK cells, PerC6 cells, Jurkat cells, normal diploid cells, cell strains derived from in vitro culture of primary tissue, and primary explants.
Alternatively, it may be possible to recombinantly produce the polypeptides encoded by the novel CHO sequences of the present invention in lower eukaryotes such as yeast or in prokaryotes. Potentially suitable yeast strains include Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces strains, and Candida strains. Potentially suitable bacterial strains include Escherichia coli, Bacillus subtilis, and Salmonella typhimurium. If the polypeptides are made in yeast or bacteria, it may be necessary to modify them by, e.g., phosphorylation or glycosylation of appropriate sites, in order to obtain functionality. Such covalent attachments may be accomplished using well-known chemical or enzymatic methods.
The polypeptides encoded by polynucleotides of the present invention may also be recombinantly produced by operably linking the isolated novel CHO sequences of the present invention to suitable control sequences in one or more insect expression vectors, such as baculovirus vectors, and employing an insect cell expression system. Materials and methods for baculovirus/Sf9 expression systems are commercially available in kit form (e.g., the MaxBac® kit, Invitrogen, Carlsbad, Calif.).
Following recombinant expression in the appropriate host cells, the polypeptides encoded by polynucleotides of the present invention may then be purified from culture medium or cell extracts using known purification processes, such as gel filtration and ion exchange chromatography. Purification may also include affinity chromatography with agents known to bind the polypeptides encoded by the polynucleotides of the present invention. These purification processes may also be used to purify the polypeptides from natural sources.
Alternatively, the polypeptides encoded by the novel CHO sequences of the present invention may also be recombinantly expressed in a form that facilitates purification. For example, the polypeptides may be expressed as fusions with proteins such as maltose-binding protein (MBP), glutathione-S-transferase (GST), or thioredoxin (TRX). Kits for expression and purification of such fusion proteins are commercially available from New England BioLabs (Beverly, Mass.), Pharmacia (Piscataway, N.J.), and Invitrogen (Carlsbad, Calif.), respectively. The polypeptides encoded by polynucleotides of the present invention can also be tagged with a small epitope and subsequently identified or purified using a specific antibody to the epitope. A preferred epitope is the FLAG epitope, which is commercially available from Eastman Kodak (New Haven, Conn.).
The polypeptides encoded by the novel CHO sequences of the present invention may also be produced by known conventional chemical synthesis. Methods for chemically synthesizing the polypeptides encoded by the novel CHO sequences of the present invention are well known to those skilled in the art. Such chemically synthetic polypeptides may possess biological properties in common with the natural, purified polypeptides, and thus may be employed as biologically active or immunological substitutes for the natural polypeptides.
It should be understood that the above-described embodiments and the following examples are given by way of illustration, not limitation. Various changes and modifications within the scope of the present invention will become apparent to those skilled in the art from the present description.
Cells were cultured in serum-free suspension culture in two basic formats, under two basic conditions. One format was small scale, shake flask culture in which cells were cultured in less than 100 mL in a vented tissue culture flask, rotated on an orbiting shaker in a CO2 incubator. The second format was in bench top bioreactors, 2 L or less working volume, controlled for pH, nutrients, dissolved oxygen, and temperature. The two basic culture conditions were ordinary passage conditions of 37° C., or fed batch culture conditions. In a basic fed batch culture, the cells are grown for a longer period of time, and shifted to a lower temperature in order to prolong cell viability and extend the productive phase of the culture.
CHO cell lines were categorized based on each of the following phenotypes useful for highly productive fed-batch cell culture processes: high cell growth rate, high peak cell density, sustained high cell viability, high maximum cellular productivity, sustained high cellular productivity, low ammonium production, and low lactate production. A cell sample matrix was generated in which the phenotypic categories were populated with the appropriate CHO cell samples taken from shake flask and benchtop bioreactor cultures and included 375 individual samples (including biological triplicates or quadruplicates) and 29 different rCHO lines expressing monoclonal antibodies, cytokines, coagulation factors and Fc:receptor fusion molecules. An exemplary portion of the cell sample matrix is depicted in
Method
Cells were harvested and subjected to standard lysis in 7 M urea, 2 M thiourea, 4% CHAPS, 30 mM Tris, 5 mM magnesium acetate at pH 8.5. 150 μg aliquots of the lysates were analyzed by two-dimensional gel electrophoresis to confirm sample quality using 18 cm immobilized pH gradient isoelectric focusing gradient strips, pH 4-7. The strips were rehydrated overnight with 340 μl of buffer per strip. Samples were loaded at the cathodic end of the strip and subjected to 500 V for 1 hour, 1000 V for 1 hour, and 8000 V for 4 hours and stored at −80° C. until the second dimension on 12.5% acrylamide gels. Electrophoresis in the second dimension was performed at 1.5 W per gel for 30 minutes and then a total of 100 W for 5 hours for a Dalt 6 run of 6 large format gels. Proteins were visualized by silver staining to confirm the quality of the proteins in the lysate.
Aliquots of the original lysates were then labeled with fluorescent dyes in preparation for fluorescent 2-dimensional differential in-gel electrophoresis (DIGE), an overview of which is shown in
The samples were applied to immobilized pH gradient isoelectric focusing strips. The strips were rehydrated overnight for about 20 hours. Samples were loaded at the cathodic end of the strip and subjected to 300V/3 hr/G, 600V/3 hr/S&H, 1000V/3 hr/G, 8000V/3 hr/G, 8000V/4 hr/S&H, and 500V/12 hr/S&H. One hour before SDS-PAGE, the strips were subjected to 8000V for one hour. The strips were equilibrated for 15 minutes in SDS buffer+1% DTT and for 15 minutes in SDS buffer+2.5% iodoacetamide. The strips were applied to polyacrylamide gels and overlaid with agarose. Electrophoresis through the gels was performed at 1.5 W/gel at 10° C. for about 18 hours on a Dalt 12 using 12 large format gels. The gels were scanned on a Typhoon™ 9400 scanner with a variable mode imager; cropped; and imported into DeCyder™ software. Differentially regulated proteins were identified using biological variance analysis (BVA). These proteins were matched to a preparative gel loaded with 400 μg of protein and stained with ruthenium. From the preparative gel, an Ettan Spot Picker was used to pick proteins identified by DIGE as differentially regulated. An Ettan Digestor was used to digest the individual proteins with an overnight trypsin incubation. The resulting peptides were analyzed by mass spectrometry. MALDI is used, particularly for highly abundant samples on gels, for peptide mass fingerprinting.
For lower abundance samples, LC-MS/MS using an MDLC LTQ machine is used. Tryptically digested samples from 2D gel spots were resuspended in 20 μL of LC-MS grade water containing 0.1% TFA and analysed by one-dimensional LC-MS using the Ettan™ MDLC system (GE Healthcare) in high-throughput configuration directly connected to a Finnigan™ LTQ™ (Thermo Electron). Samples were concentrated and desalted on RPC trap columns (Zorbax™ 300SB C18, 0.3 mm×5 mm, Agilent Technologies) and the peptides were separated on a nano-RPC column (Zorbax™ 300SB C18, 0.075 mm×100 mm, Agilent Technologies) using a linear gradient from 0-65% acetonitrile (Riedel-de Haën LC-MS grade) over 60 minutes directly into the LTQ via a 10 μm nanoESI emitter (Presearch FS360-20-10-CE-20). The LTQ ion trap mass spectrometer was used for MS/MS. A scan time of ~0.15 s (one microscan with a maximum ion injection time of 10 ms) over an m/z range of 300-2000 was used, followed by MS/MS analysis of the 3 most abundant peaks from each scan. These peaks were then excluded for the next 60 seconds, followed by MS/MS of the next three most abundant peaks, which in turn were excluded for 60 seconds, and so on. A “collision energy” setting of 35% was applied for ion fragmentation, and dynamic exclusion was used to discriminate against previously analysed ions (data dependent analysis).
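The data dependent selection described above (the 3 most abundant peaks per survey scan, each then excluded for 60 seconds) can be sketched as a small simulation. This Python example is illustrative only and does not represent instrument control software; the function name, the input layout, and the m/z matching tolerance `mz_tol` are assumptions.

```python
def select_precursors(scans, top_n=3, exclusion_s=60.0, mz_tol=0.5):
    """Simulate top-N precursor selection with dynamic exclusion.

    scans -- list of (time_s, [(mz, intensity), ...]) survey scans in time order
    Returns a list of (time_s, [selected mz values]) per survey scan.
    """
    excluded = []  # (mz, expiry_time_s) pairs currently on the exclusion list
    selections = []
    for time_s, peaks in scans:
        # Drop exclusion entries whose 60 s window has elapsed.
        excluded = [(mz, exp) for mz, exp in excluded if exp > time_s]
        # Keep peaks inside the 300-2000 m/z survey range that are not excluded.
        candidates = [
            (mz, inten)
            for mz, inten in peaks
            if 300 <= mz <= 2000
            and not any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in excluded)
        ]
        # Most abundant first; take the top N for MS/MS.
        candidates.sort(key=lambda p: p[1], reverse=True)
        chosen = [mz for mz, _ in candidates[:top_n]]
        excluded.extend((mz, time_s + exclusion_s) for mz in chosen)
        selections.append((time_s, chosen))
    return selections
```

With six persistent peaks, the first scan selects the three most abundant, the next scan selects the remaining three, and the original three become eligible again once their exclusion window expires.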
All buffers used for nanoLC separations contained 0.1% formic acid (Fluka) as the ion pairing reagent. Full scan mass spectra were recorded in profile mode and tandem mass spectra in centroid mode. The peptides were identified using the information in the tandem mass spectra by searching against the SWISS-PROT database using SEQUEST™. An Xcorr value of >1.5 for singly charged peptides, >2.0 for doubly charged peptides, and >2.5 for triply charged peptides was used as the statistical cut-off.
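The charge-dependent Xcorr cut-off can be written as a simple lookup. The Python sketch below is a minimal illustration; the names are hypothetical, and rejecting charge states for which no cut-off is stated is an assumption.

```python
# Xcorr cut-offs by peptide charge state, as stated in the text.
XCORR_CUTOFF = {1: 1.5, 2: 2.0, 3: 2.5}


def passes_xcorr(charge, xcorr):
    """Return True if a peptide match exceeds the cut-off for its charge."""
    cutoff = XCORR_CUTOFF.get(charge)
    if cutoff is None:
        # No cut-off is given for other charge states; reject (assumption).
        return False
    return xcorr > cutoff
```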
Markers for Maximum Cellular Productivity
The protein expression profile of four cultures of a cell line overexpressing PACE (furin preproprotein), having a high maximum cellular productivity, was compared to the protein expression profile of four cultures of a control cell line. Approximately 2000 proteins were matched across all 8 gel experiments (involving a total of 24 images). To be considered as a differentially-expressed protein in the DeCyder analysis, a protein must have been identified in all 24; have demonstrated at least a 1.5-fold up- or down-regulation; and have demonstrated a T-test score less than 0.05. 188 proteins were identified as differentially regulated, most with highly significant T-test scores, including several low abundance proteins.
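The three inclusion criteria stated above can be combined into a single filter. The following Python sketch is illustrative and is not the DeCyder implementation; the `Spot` record and its field names are hypothetical, and the signed ratio follows the convention described for the tables, in which reduced levels carry a negative sign.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    spot_id: int
    images_matched: int    # number of gel images in which the spot was found
    average_ratio: float   # signed fold difference; negative means reduced
    t_test_p: float        # T-test score


def is_differential(spot, total_images=24, min_fold=1.5, max_p=0.05):
    """Apply the three criteria: present in all images, fold change, T-test."""
    return (
        spot.images_matched == total_images
        and abs(spot.average_ratio) >= min_fold
        and spot.t_test_p < max_p
    )
```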
Tables 2 and 3 list several of the spots identified as differentially expressed in the high maximal cellular productivity cell line. For each of the spots listed in the tables, MALDI sequence analysis identified one or two corresponding amino acid sequences. The tables provide, for each spot number, the fold difference in protein levels between the test and control samples, labeled as “Average Ratio”; proteins whose levels are reduced in the test samples are indicated with a negative sign. The tables also provide the p-value that the differences in expression would be the result of random chance and the protein name and accession number corresponding to any identified amino acid sequence. In the MALDI sequence analysis, the molecular weights of the trypsin fragments were compared to predicted molecular weights of trypsin fragments of known sequences. In some cases, in this sequence analysis and in other peptide sequence analyses included in this application, the detected molecular weights are indicative of detection of a modified form of a peptide, such as where cysteine has been modified with iodoacetamide, or where methionine has been partially oxidized. It is understood that this is not necessarily reflective of the initial state of the peptide in the context of the protein in the cell or the cellular milieu. Accordingly, the peptide sequences provided in the sequence listing reflect the unmodified forms of the peptide, and cells engineered to have desirable cellular phenotypes will, in some embodiments, be engineered to regulate genes expressing an amino acid sequence comprising one or more of the peptides.
In the tables, “% coverage” refers to the percentage of the total length of a database sequence for which corresponding trypsin fragments were detected in the experiment. pI and MR refer to the apparent isoelectric point and apparent molecular weight of the protein spot. For some proteins, putative protein functions are also provided in the table.
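The “% coverage” value can be illustrated by marking which residues of a database sequence are matched by detected trypsin fragments. This Python sketch is a minimal example under the assumption that fragments are matched by exact substring search; the function name and the example sequences used with it are hypothetical.

```python
def percent_coverage(protein, peptides):
    """Percentage of protein residues covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        # Mark every occurrence of the peptide in the protein sequence.
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)
```

Overlapping fragments are counted once, so the value never exceeds 100%.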
[Tables 2 and 3; species column: Cricetulus griseus for each surviving entry.]
Sequence data for identified proteins are provided in
Markers for High Cell Growth Rate
The protein expression profile of PA DUKX 378, having a high cell growth rate, was compared to the protein expression profile of PA DUKX 153.8. Tables 4 and 5 list several of the spots identified as differentially expressed in the high cell growth rate cell line. For each of the spots listed in the tables, MALDI sequence analysis identified matches to a corresponding amino acid sequence from Chinese hamsters or from another species. The tables provide, for each spot number, the fold difference in protein levels between the test and control samples, labeled as “Average Ratio”; proteins whose levels are reduced in the test samples are indicated with a negative sign. The tables also provide the p-value (statistical significance) and the protein name, accession number, and species corresponding to any identified amino acid sequence.
[Tables 4 and 5; species column: Cricetulus griseus for each surviving entry.]
Statistics used in DeCyder analysis: +/−1.5 fold change, t-test < 0.05
Sequence data for identified proteins are provided in
Table 7 lists several of the spots identified as differentially expressed in the cells with sustained high cell viability using methods as described in Example 3. Sequence data for the identified proteins are provided in
RNA samples from test and control CHO cell lines were obtained and analyzed on a microchip containing probes for CHO mRNA sequences as described in U.S. Patent Application Publication US2006/0010513, the complete contents of which are herein incorporated by reference. The hybridization cocktail was spiked with a fragmented cRNA standard to generate a standard curve using labeled, fragmented cRNA of control sequences at known concentrations, permitting normalization of the data and assessment of chip sensitivity and saturation. The scan data were quality controlled using the 3′/5′ ratio of β-actin and GAPDH, the signal intensity and consistency, and the percent present. Generally, data normalization was performed using software tools Affy 5.0 and Genesis 2.0; or dChip (see Li et al. (2001) Proc. Natl. Acad. Sci. USA 98:31-36 and Li et al. (2001) Genome Biol. 2:0032.1-0032.11) and Genespring. A p-value less than or equal to 0.05 and a minimum fold-change of 1.2 between the test and control lines were required before a gene would be further considered. An unsupervised Pearson Clustering Analysis is depicted in
An exemplary method of data analysis is depicted in
An exemplary method of data analysis that does not rely on pairwise differences is depicted in
A summary of nucleic acids identified as differentially expressed in cells with high maximum cellular productivity is provided in Tables 9 and 10. For each nucleic acid, a qualifier name, symbol, and title are provided, as well as whether the nucleic acid is up-regulated or down-regulated in the cells with higher maximum cellular productivity. For nucleic acids with human or mouse homologs in the Unigene database, the table provides Unigene ID numbers and statistics relating to the comparison, including e-values, percent sequence identities between the CHO sequence and the Unigene databank entries, and percent coverage (“% QC”).
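The selection thresholds described for the microarray analysis (p-value of at most 0.05 and a fold-change of at least 1.2) and the up-regulated or down-regulated call reported in the tables can be sketched as follows. This Python example is illustrative only; the function names and the use of a signed fold-change computed from normalized test and control intensities are assumptions.

```python
def signed_fold_change(test, control):
    """Signed fold-change: positive if test >= control, negative otherwise."""
    if test <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    ratio = test / control
    return ratio if ratio >= 1 else -1 / ratio


def classify_probe(test, control, p_value, min_fold=1.2, max_p=0.05):
    """Label a probe as up-regulated, down-regulated, or not considered."""
    fc = signed_fold_change(test, control)
    # Both thresholds must be met before a gene is considered further.
    if p_value > max_p or abs(fc) < min_fold:
        return "not considered"
    return "up-regulated" if fc > 0 else "down-regulated"
```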
Nucleic acids encoding proteins associated with the endoplasmic reticulum (ER) or the Golgi complex may contribute to cellular productivity, particularly for the production of a secreted protein. Table 11 summarizes nucleic acids that are differentially expressed by a factor of at least 1.2 in cells overexpressing PACE and encode an ER-associated protein. Table 12 summarizes nucleic acids that are differentially expressed by a factor of at least 1.2 in cells overexpressing PACE and encode a Golgi-associated protein.
A summary of nucleic acids identified as differentially expressed in cells with high cellular growth rate is provided in Tables 13 and 14. For each nucleic acid, a qualifier name, symbol, and title are provided, as well as whether the nucleic acid is up-regulated or down-regulated in the cells with higher cellular growth rate. For nucleic acids with human or mouse homologs in the Unigene database, the table provides Unigene ID numbers and statistics relating to the comparison, including e-values, percent sequence identities between the CHO sequence and the Unigene databank entries, and percent coverage (“% QC”).
A summary of nucleic acids identified as differentially expressed in cells with high peak cell density is provided in Tables 15, 16, and 17. For each nucleic acid, a qualifier name, symbol, and title are provided, as well as whether the nucleic acid is up-regulated or down-regulated in the cells with higher peak cell density. For nucleic acids with human or mouse homologs in the Unigene database, the table provides Unigene ID numbers and statistics relating to the comparison, including e-values, percent sequence identities between the CHO sequence and the Unigene databank entries, and percent coverage (“% QC”).
Bcl-xL is a powerful inhibitor of cell death. Cells overexpressing Bcl-xL demonstrate sustained high cell viability. Tables 18 and 19 summarize nucleic acids that are differentially expressed by a factor of at least 1.2 in cells overexpressing Bcl-xL. Samples were taken at multiple time points for comparison. Table 18 summarizes nucleic acids that are differentially expressed by a factor of at least 1.2 at day 5. Table 19 summarizes nucleic acids that are differentially expressed by a factor of at least 1.2 at a stage later than day 5.
Four cell lines were analyzed from the Platform Process category that exhibit a desired metabolic phenotype when cultured in fed batch culture. That is, the cell lines maintain high viability, and consume lactate and ammonia late in fed batch culture. Multiple time points were collected for each cell line grown in fed batch culture. The time points from each cell line were examined by ANOVA analysis to monitor the changes in gene expression over the course of the culture. The gene lists from each cell line were compared, and those that were in common between all 4 cell lines were identified. Exemplary nucleic acid sequences are listed in Table 20.
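The final comparison step, identifying genes common to the lists from all four cell lines, amounts to a set intersection. The Python sketch below is a minimal illustration; the function name is hypothetical, and the per-line gene lists are assumed to have already been produced by the ANOVA step.

```python
def common_genes(gene_lists):
    """Return the genes present in every cell line's list.

    gene_lists -- dict mapping a cell line name to an iterable of gene IDs
    """
    lists = [set(genes) for genes in gene_lists.values()]
    if not lists:
        return set()
    return lists[0].intersection(*lists[1:])
```

For example, with four hypothetical lines whose lists all contain "gene_a" and "gene_c", only those two identifiers are returned, regardless of what else each list contains.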
The ability of the differentially expressed genes and proteins to affect a cellular phenotype is verified by overexpression of a nucleic acid inhibiting the expression of the relevant gene using methods known in the art. Exemplary methods based on interfering RNA constructs are described below.
Design and Synthesis of siRNA
Typically, targets that are candidates for siRNA-mediated gene knockdown are sequenced, and the sequences verified. Full-length cDNA sequence information is preferred (although not required) to facilitate siRNA design. The target sequence that is a candidate for gene knockdown is compared to gene sequences available on public or proprietary databases (e.g., BLAST search). Sequences within the target gene that overlap with other known sequences (for example, 16-17 contiguous basepairs of homology) are generally not suitable targets for specific siRNA-mediated gene knockdown.
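The specificity check described above can be approximated by looking for shared contiguous stretches between a candidate target and other known sequences. This Python sketch is a simplified stand-in for a database search such as BLAST, not a replacement for it; the function names are hypothetical, only exact matches on the same strand are considered, and the default window of 16 bases reflects the lower end of the 16-17 basepair example.

```python
def shared_kmers(candidate, other, k=16):
    """Return the k-base stretches that occur in both sequences."""
    cand = candidate.upper()
    oth = other.upper()
    other_kmers = {oth[i:i + k] for i in range(len(oth) - k + 1)}
    cand_kmers = {cand[i:i + k] for i in range(len(cand) - k + 1)}
    return cand_kmers & other_kmers


def is_specific(candidate, other_sequences, k=16):
    """True if the candidate shares no k contiguous bases with any other sequence."""
    return not any(shared_kmers(candidate, seq, k) for seq in other_sequences)
```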
siRNAs may be designed using, for example, online design tools, over secure internet connections, such as the one available on the Ambion® website (http://www.ambion.com/techlib/misc/siRNA_finder.html). Alternatively, custom siRNAs may also be requested from Ambion®, which applies the Cenix algorithm for designing effective siRNAs. The standard format for siRNAs is typically 5 nmol, annealed and with standard purity in plates. Upon receipt of synthesized siRNAs, the siRNAs are prepared according to the instructions provided by the manufacturer and stored at the appropriate temperature (−20° C.).
Standard procedures were used for siRNA transfections. Cells to be transfected were typically pre-passaged on the day before transfection to ensure that the cells are in logarithmic growth phase. Typically, an siRNA Fed-Batch assay was used. Exemplary materials, conditions and methods for transfections are as follows.
Transfection (D0)
Per Spin Tube (50 mL)
100 μL R1
2 μL Transit-TKO transfection reagent (Mirus)
10 μL 10 μM siRNA
2 mL 1e5 cells/mL in AS1 medium
Following Transfection
37° C.: 72 hrs
31° C.: 96 hrs
Feed: AQ3 on day 3 (D3)
Sample taken on day 1 (D1), day 3 (D3), day 7 (D7)
24 Well Suspension Transfections
For each experiment, 100,000 cells (e.g., 3C7 cells) in 1 mL total volume, and 50 nM siRNA were used. To make a mix for 3 reactions, 150 μL R1 and 7 μL Mirus TKO reagent were mixed and incubated for 10 minutes at room temperature. 15 μL of 10 μM siRNA was added and the mix was incubated for 10 minutes at room temperature. 57.3 μL of the mix was transferred into each of 3 wells. 942.7 μL of R5CD1 (containing 100,000 cells) was added and the plate was incubated on a rocker at 37° C. for 72 hrs.
Spin Tube siRNA Transfection
For each experiment, 100,000 cells (e.g., 3C7 cells) in 1 mL total volume were used. For each transfection, 100 μL R1 and 2 μL Mirus TKO reagent were mixed and incubated for 10 minutes at room temperature. 10 μL of 10 μM siRNA was added and the mix was incubated for 15 minutes at room temperature with occasional mixing. 1.9 mL of culture was transferred to each spin tube. The siRNA mix (112 μL) was added to each spin tube. The culture was initially incubated at 37° C. and then the temperature was shifted to 31° C. on day 3. Spin tube cultures were shaken rapidly (˜250 RPM). Samples were taken on days 1, 3, and 7. Cultures were terminated on day 7.
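The final siRNA concentration in each spin tube follows from the volumes given above by simple dilution arithmetic (C1·V1 = C2·V2). The sketch below is illustrative only; the function name is an assumption, and the calculation assumes that the full transfection mix is added to the 1.9 mL of culture.

```python
def final_sirna_nM(stock_uM: float, sirna_uL: float, total_uL: float) -> float:
    """Final siRNA concentration in nM after dilution (C1*V1 = C2*V2)."""
    return stock_uM * 1000.0 * sirna_uL / total_uL


mix_uL = 100 + 2 + 10      # R1 + TKO reagent + siRNA
total_uL = 1900 + mix_uL   # 1.9 mL culture plus the transfection mix
conc = final_sirna_nM(stock_uM=10, sirna_uL=10, total_uL=total_uL)
print(f"mix = {mix_uL} uL, final siRNA = {conc:.1f} nM")
```

The component volumes sum to the 112 μL of mix stated in the text, and the resulting concentration is approximately 50 nM.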
Growth and productivity controls were included on each plate. An exemplary productivity control is DHFR (selectable marker on bicistronic mRNA). Treatment with DHFR siRNA reproducibly decreases the amount of antibody in the CM-FcIGEN (antibody production control). An exemplary growth control is CHO1 (kinesin) (see Matuliene et al. (2002) Mol. Biol. Cell 13:1832-45) (typically, about 20-30% growth inhibition was observed with CHO1 treatment). Other standard controls such as no siRNA treatment (transfection reagents only) and non-targeting siRNA treatment (non-specific siRNA) were also included. Plates were then subjected to cell counting (for example, in a 96-well cell counting instrument) to assess growth and to, for example, an automated 96-well titer assay to assess productivity. Genes whose modulation, singly or in combination, is sufficient to modify useful cellular phenotypes were thereby validated, and such changes can be engineered, singly or in combination, into a mammalian cell line to modify its properties.
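One way to express the effect of an siRNA treatment relative to a control is as a percent change in cell count or titer. The sketch below is a hedged illustration of that normalization, not the analysis actually used; the function names are assumptions, and the 20-30% window simply restates the typical CHO1 growth inhibition mentioned above.

```python
def percent_change(treated: float, control: float) -> float:
    """Percent change of a treated well relative to its control well."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (treated - control) / control


def growth_control_in_range(treated: float, control: float,
                            low: float = 20.0, high: float = 30.0) -> bool:
    """True if growth inhibition falls within the window typical for CHO1 siRNA."""
    inhibition = -percent_change(treated, control)
    return low <= inhibition <= high
```

For instance, with hypothetical counts of 0.75 × 10⁶ cells/mL for a CHO1 siRNA well and 1.0 × 10⁶ cells/mL for the non-targeting control, the percent change is −25%, which lies inside the expected window.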
Model cell lines used for the validation purposes and their characteristics are shown in Table 21.
The ability of the differentially expressed genes and proteins to affect a cellular phenotype is verified by overexpression of a nucleic acid encoding the relevant gene using methods known in the art. Exemplary methods are described below.
For example, nucleic acids overexpressing specific targets can be introduced into CHO cells by transient transfection, and the impact of overexpression on cellular growth and productivity is then monitored. An exemplary protocol, 24 well format, was illustrated in
Growth and productivity controls are typically used for overexpression assays. For example, the positive growth/viability controls used in this experiment included Ha-Ras and Bcl-xL. The negative growth controls used included p27. Other suitable growth and productivity controls are known in the art and can be used for overexpression assays. Additional standard controls, such as a no nucleic acid control (transfection reagents only), were also included.
Target genes and the control genes were cloned into the pExpress1 vector and introduced into various model cell lines as shown in Table 22.
The 24 well format was used to distinguish phenotypic effects of transient transfection of various genes on various cell lines. Cellular growth and productivity were determined. Exemplary results are illustrated in
NA = Not tested or awaiting results
(* * *) Strongly increased or decreased
(* *) Increased or decreased
(*) Moderately increased or decreased
The verified target genes are used to effect a cell phenotype, particularly a phenotype characterized by increased and efficient production of a recombinant transgene, increased cell growth rate, high peak cell density, sustained high cell viability, high maximum cellular productivity, sustained high cellular productivity, low ammonium production, and low lactate production, etc. Exemplary target genes are disclosed above, for example, in Tables 2 through 20 and in Tables 24 through 30.
Standard cell engineering methods are used to modify target genes to effect desired cell phenotypes. As discussed above, target genes are modified to achieve desired CHO cell phenotypes by interfering RNA, conventional gene knockout or overexpression methods. Typically, knockout methods or stable transfection methods with overexpression constructs are used to engineer modified CHO cell lines. Other suitable methods are discussed in the general description section and known in the art.
The foregoing description of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible consistent with the above teachings or may be acquired from practice of the invention. Thus, it is noted that the scope of the invention is defined by the claims and their equivalents.
All sequence accession numbers, publications and patent documents cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if the contents of each individual publication or patent document were individually incorporated herein.
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 60/794,299, filed on Apr. 21, 2006, and U.S. Provisional Patent Application No. 60/897,412, filed on Jan. 25, 2007, the entire contents of both of which are incorporated by reference herein.
Number | Date | Country
---|---|---
60897412 | Jan 2007 | US
60794299 | Apr 2006 | US