The invention relates generally to control genes that may be utilized for normalizing hybridization and/or amplification reactions.
Nucleic acid hybridization and other quantitative nucleic acid detection assays are routinely used in medical and biotechnological research and development, diagnostic testing, drug development and forensics. Such technologies have been used to identify genes which are up- or down-regulated in various disease or physiological states, to analyze the roles of the members of cellular signaling cascades and to identify druggable targets for various disease and pathology states.
Examples of technologies commonly used for the detection and/or quantification of nucleic acids include northern blotting (Krumlauf (1994) Mol Biotechnol 2(3), 227-242), in situ hybridization (Parker & Barnes (1999) Methods Mol Biol. 106, 247-83), RNAse protection assays (Hod (1992) Biotechniques 13(6), 852-854; Saccomanno et al. (1992) Biotechniques 13(6):846-50), microarrays, and reverse transcription polymerase chain reaction (RT-PCR) (see Bustin, (2000) Journal of Molecular Endocrinology 25, 169-193).
The reliability of these nucleic acid detection methods depends on the availability of accurate means of accounting for variations between analyses. For example, variations in hybridization conditions, label intensity, reading and detector efficiency, sample concentration and quality, background effects, and image processing effects each contribute to signal heterogeneity. Hegde et al. (2000) Biotechniques 29(3): 548-562; Berger et al. (2000) WO 00/04188. Normalization procedures used to overcome these variations often rely on control hybridizations to housekeeping genes such as β-actin, glyceraldehyde-3-phosphate dehydrogenase, and the transferrin receptor gene. Eickhoff et al. (1999) Nucleic Acids Research 27(22): e33; Spiess et al. (1999) Biotechniques 26(1): 46-50. These methods, however, generally do not provide the signal linearity sufficient to detect small but significant changes in transcription or gene expression. Spiess et al. (1999) Biotechniques 26(1): 46-50. In addition, the steady state levels of many housekeeping genes are susceptible to alterations in expression levels that are dependent on cell differentiation, nutritional state, and specific experimental and stimulation protocols. Eickhoff et al. (1999) Nucleic Acids Research 27(22): e33; Spiess et al. (1999) Biotechniques 26(1): 46-50; Hegde et al. (2000) Biotechniques 29(3): 548-562; and Berger et al. (2000) WO 00/04188. Consequently, there exists a need for the identification and use of additional genes that may serve as effective controls in nucleic acid detection assays.
The present invention includes methods of identifying at least one gene that is consistently expressed across different cell or tissue types in an organism, comprising: preparing gene expression profiles for different cell or tissue types from the organism; calculating a coefficient of variation for at least one gene in each of the profiles across the different cell or tissue types; and selecting any gene whose coefficient of variation indicates that the gene is consistently expressed across the different cell or tissue types. The coefficient of variation may be less than about 40% and the methods may comprise creating gene expression profiles for about 10, 25, 50, 100 or more different cell or tissue types. The gene expression profiles may be prepared by querying a gene expression database.
The invention also includes a set of probes comprising at least two probes that specifically hybridize to a control gene identified by the methods of the invention. Such sets of probes may comprise probes that specifically hybridize to at least about 10, 25, 50 or 100 control genes. In some formats, the sets of probes are attached to a solid substrate such as a microarray or chip.
The invention also includes methods of normalizing the data from a nucleic acid detection assay comprising: detecting the expression level for at least one gene in a nucleic acid sample; and normalizing the expression of said at least one gene with the detected expression of at least one control gene identified by the method of the invention. The number of control genes used to normalize gene expression data may comprise about 10, 25, 50, 100 or more of the control genes herein identified.
In another embodiment, the invention includes a set of probes comprising at least two probes that specifically hybridize to a gene of Table 1 or Table 2. The set may comprise probes that specifically hybridize to at least about 10, 25, 50, 100 or more of the control genes of Table 1 or Table 2. The sets of probes may or may not be attached to a solid substrate such as a chip.
The invention, in another embodiment, includes methods of normalizing the data from a nucleic acid detection assay comprising: detecting the expression level for at least one gene in a nucleic acid sample; and normalizing the expression of said at least one gene with the detected expression of at least one control gene of Table 1 or Table 2. The number of control genes used to normalize gene expression data may comprise about 10, 25, 50, 100 or more of the control genes herein identified.
The present inventors have identified control genes that may be monitored in nucleic acid detection assays and whose expression levels may be used to normalize gene expression data. Normalization of gene expression data from a cell or tissue sample with the expression level(s) of the identified genes allows the accurate assessment of the expression level(s) for genes that are differentially regulated between samples, tissues, treatment conditions, etc. These genes may be used across a broad spectrum of assay formats, but are particularly useful in microarray or hybridization based assay formats.
A. Nucleic Acid Detection Assay Controls
1. Selection of Control Genes
As used herein, the genes and nucleic acids of Tables 1 and 2 are referred to as “control genes.”
Control genes of the invention are produced by a method comprising preparing gene expression profiles (a representation of the expression level for at least one gene, preferably 10, 50, 100 or more, or most preferably nearly all or all expressed genes in a sample) from a variety of cell or tissue types, measuring the level of expression for at least one gene in each of the gene expression profiles to produce gene expression data, calculating a coefficient of variation from the gene expression data for each gene and selecting genes whose coefficient of variation indicates that the gene is consistently expressed at about the same level in the different cell or tissue types.
The gene expression profile may be produced by any means of quantifying gene expression for at least one gene in the tissue or cell sample. In preferred methods, gene expression is quantified by a method selected from the group consisting of a hybridization assay and an amplification assay. Hybridization assays may be any assay format, such as those described below, that relies on the hybridization of a probe or primer to a nucleic acid molecule in the sample. Such formats include, but are not limited to, differential display formats and microarray hybridization, including microarrays produced in chip format. Amplification assays include, but are not limited to, quantitative PCR, semiquantitative PCR and assays that rely on amplification of nucleic acids subsequent to the hybridization of the nucleic acid to a probe or primer. Such assays include the amplification of nucleic acid molecules from a sample that are bound to a microarray or chip.
In other circumstances, gene expression profiles may be produced by querying a gene expression database comprising expression results for genes from various cell or tissue samples. The gene expression results in the database may be produced by any available method, such as differential display methods and microarray-based hybridization methods. The gene expression profile is typically produced by the step of querying the database with the identity of a specific cell or tissue type for the genes that are expressed in the cell or tissue type and/or the genes that are differentially regulated compared to a control cell or tissue sample. Available databases include, but are not limited to, the Gene Logic GeneExpress® database, the Gene Expression Omnibus gene expression and hybridization array repository available through NCBI (www.ncbi.nlm.nih.gov/entrez) and the SAGE™ gene expression database.
The cell or tissue samples that are used to prepare gene expression profiles may include any cell or tissue sample available. Such samples include, but are not limited to, tissues removed as surgical samples, diseased or normal tissues, in vitro or in vivo grown cells, cell culture and cells or tissues exposed to an agent such as a toxin. The number of samples required to calculate a coefficient of variation is variable, but may include about 10, 25, 50, 100, 200, 500 or more cell or tissue samples. The cell or tissue samples may be derived from an animal or plant, preferably a mammal. In some instances, the cell or tissue samples may be human, canine (dog), mouse or rat in origin.
The coefficient of variation may be calculated from raw expression data or from data that has been normalized to control for the mechanics of hybridization, such as data normalized or controlled for background noise due to non-specific hybridization. Such data typically includes, but is not limited to, fluorescence readings from microarray based hybridizations, densitometry readings produced from assays that rely on radiological labels to detect and quantify gene expression and data produced from quantitative or semi-quantitative amplification assays.
The coefficient of variation (% CV) is typically calculated by calculating a mean value for the expression level of a given gene across a number of samples and calculating the standard deviation (SD) from that mean. The % CV may be calculated by the following equation: % CV=SD/Mean×100. Genes with a % CV of less than about 50% and preferably less than about 40%, may be selected as control genes or are considered as genes that are consistently expressed across the different cell or tissue types tested.
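The % CV calculation and selection step described above can be illustrated with a minimal Python sketch. The gene names and expression values are hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption, since the description does not specify which form of SD is used.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Return % CV = SD / Mean x 100 for one gene's expression levels."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean expression is zero; % CV is undefined")
    # Assumption: sample standard deviation (n-1 denominator).
    return stdev(values) / m * 100.0


def select_control_genes(profiles, max_cv=40.0):
    """Keep genes whose % CV across samples is below max_cv.

    profiles maps a gene name to a list of expression levels,
    one value per cell or tissue sample.
    """
    cvs = {gene: percent_cv(levels) for gene, levels in profiles.items()}
    return {gene: cv for gene, cv in cvs.items() if cv < max_cv}


# Hypothetical expression values (arbitrary units) in five samples.
example_profiles = {
    "GENE_A": [100.0, 110.0, 90.0, 105.0, 95.0],   # stable expression
    "GENE_B": [10.0, 200.0, 50.0, 400.0, 5.0],     # highly variable
    "GENE_C": [50.0, 50.0, 50.0, 50.0, 50.0],      # constant
}
selected = select_control_genes(example_profiles)
```

With these illustrative values, GENE_A and GENE_C fall below the 40% threshold and GENE_B does not. Passing max_cv=50.0 applies the broader threshold mentioned above.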
As used herein, “background” refers to signals associated with non-specific binding (cross-hybridization). In addition to cross-hybridization, background may also be produced by intrinsic fluorescence of the hybridization format components themselves.
“Bind(s) substantially” refers to complementary hybridization between an oligonucleotide probe and a nucleic acid sample and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the nucleic acid sample.
The phrase “hybridizing specifically to” refers to the binding, duplexing or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).
2. Preparation of Control Genes, Probes and Primers
The control genes listed in Tables 1 and 2 may be obtained from a variety of natural sources such as organisms, organs, tissues and cells. The sequences of known genes are in the public databases. The GenBank Accession Number corresponding to each of the normalization control genes can be found in the third column of the Tables under “Exemplar Seq: Accession.” The sequences of the genes in GenBank (http://www.ncbi.nlm.nih.gov/) are herein incorporated by reference in their entirety.
Probes or primers for the nucleic acid detection assays described herein that specifically hybridize to a control gene may be produced by any available means. For instance, probe sequences may be prepared by cleaving DNA molecules produced by standard procedures with commercially available restriction endonucleases or other cleaving agent. Following isolation and purification, these resultant normalization control gene fragments can be used directly, amplified by PCR methods or amplified by replication or expression from a vector.
Control genes and control gene probes or primers (i.e., synthetic oligo- and polynucleotides) are most easily synthesized by chemical techniques, for example, the phosphotriester method of Matteucci, et al. ((1981) J. Am. Chem. Soc. 103: 3185-3191) or using automated synthesis methods using the GenBank sequences disclosed in Tables 1 and 2. In addition, larger nucleic acids can readily be prepared by well known methods, such as synthesis of a group of oligonucleotides that define various modular segments of the normalization control genes and normalization control gene segments, followed by ligation of oligonucleotides to build the complete nucleic acid molecule.
B. Normalization Methods
Gene expression data produced from the control genes in a given sample or samples may be used to normalize the gene expression data from other genes using any available arithmetic or calculative means. Such methods include, but are not limited to, methods of data analysis described by Hegde et al. (2000) Biotechniques 29(3): 548-562; Winzeler et al. (1999) Meth. Enzymol. 306(1): 3-18; Tkatchenko et al. (2000) Biochimica et Biophysica Acta 1500: 17-30; Berger et al. (2000) WO 00/04188; Schuchhardt et al. (2000) Nucleic Acids Research 28(10): e47; Eickhoff et al. (1999) Nucleic Acids Research 27(22): e33. Micro-array data analysis and image processing software packages and protocols, including normalization methods, are also available from BioDiscovery (http://www.biodiscovery.com/), Silicon Graphics (http://www.sigenetics.com), Spotfire (http://www.spotfire.com/), Stanford University (http://rana.Stanford.EDU/software/), National Human Genome Research Institute (http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/img_analysis.html), TIGR (http://www.tigr.org/softlab/), and Affymetrix (affy and maffy packages), among others.
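As one illustration of an arithmetic normalization, the following Python sketch divides every expression level in a sample by the mean level of the control genes detected in that same sample. This scaling scheme is an assumption chosen for simplicity and is not taken from the cited references; the gene names and values are hypothetical.

```python
from statistics import mean


def normalize_sample(expression, control_genes):
    """Scale one sample's expression levels by its control-gene mean.

    expression maps gene name -> measured level in a single sample.
    control_genes is an iterable of control gene names.
    """
    controls = [expression[g] for g in control_genes if g in expression]
    if not controls:
        raise ValueError("no control genes were detected in this sample")
    scale = mean(controls)
    if scale <= 0:
        raise ValueError("control-gene mean must be positive")
    return {gene: level / scale for gene, level in expression.items()}


# Hypothetical raw readings: the second sample was measured at half
# the overall signal intensity of the first.
controls = ["CTRL_1", "CTRL_2"]
sample_1 = {"CTRL_1": 100.0, "CTRL_2": 300.0, "TARGET": 400.0}
sample_2 = {"CTRL_1": 50.0, "CTRL_2": 150.0, "TARGET": 400.0}

norm_1 = normalize_sample(sample_1, controls)
norm_2 = normalize_sample(sample_2, controls)
```

In this example the raw TARGET readings are identical, but after normalization the TARGET level in the second sample is twice that in the first, reflecting the lower overall signal in the second measurement.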
C. Assay or Hybridization Formats
The control genes of the present invention may be used in any nucleic acid detection assay format, including solution-based and solid support-based assay formats. As used herein, “hybridization assay format(s)” refer to the organization of the oligonucleotide probes relative to the nucleic acid sample. The hybridization assay formats that may be used with the control genes and methods of the present invention include assays where the nucleic acid sample is labeled with one or more detectable labels, assays where the probes are labeled with one or more detectable labels, and assays where the sample or the probes are immobilized. Hybridization assay formats include but are not limited to: Northern blots, Southern blots, dot blots, solution-based assays, branched-DNA assays, PCR, RT-PCR, quantitative or semi-quantitative RT-PCR, microarrays and biochips.
As used herein, “nucleic acid hybridization” simply involves contacting a probe and nucleic acid sample under conditions where the probe and its complementary target can form stable hybrid duplexes through complementary base pairing (see Lockhart et al., (1999) WO 99/32660). The nucleic acids that do not form hybrid duplexes are then washed away leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label.
It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids.
Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA-DNA, RNA-RNA or RNA-DNA) will form even where the annealed sequences are not perfectly complementary. Thus, specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt) successful hybridization requires fewer mismatches. One of skill in the art will appreciate that hybridization conditions may be selected to provide any degree of stringency. In a preferred embodiment, hybridization is performed at low stringency, in this case in 6×SSPE-T at 37° C. (0.005% Triton X-100), to ensure hybridization, and then subsequent washes are performed at higher stringency (e.g., 1×SSPE-T at 37° C.) to eliminate mismatched hybrid duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down to as low as 0.25×SSPE-T at 37° C. to 50° C.) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity may be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level control, normalization control, mismatch controls, etc.).
As used herein, the term “stringent conditions” refers to conditions under which a probe will hybridize to a complementary control nucleic acid, but with only insubstantial hybridization to other sequences. Stringent conditions are sequence-dependent and will be different under different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
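The relationship between Tm and a stringent temperature about 5° C. below it can be sketched as follows. The description does not state how Tm is to be determined; this Python sketch assumes the Wallace rule (Tm ≈ 2° C. per A/T base plus 4° C. per G/C base), a rough rule of thumb for short oligonucleotides that ignores the ionic strength and pH dependence noted above. The probe sequence is hypothetical.

```python
def wallace_tm(probe):
    """Rough Tm estimate in degrees C using the Wallace rule.

    Assumption: Tm = 2 * (A + T) + 4 * (G + C). This is only a
    rule of thumb for short oligonucleotides.
    """
    seq = probe.upper()
    if not seq or any(base not in "ACGTU" for base in seq):
        raise ValueError("probe must be a non-empty A/C/G/T/U sequence")
    at = sum(seq.count(base) for base in "ATU")
    gc = sum(seq.count(base) for base in "GC")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(probe, offset_c=5.0):
    """Temperature about offset_c degrees C below the estimated Tm."""
    return wallace_tm(probe) - offset_c


# Hypothetical 20-nucleotide probe with 10 A/T and 10 G/C bases.
probe = "ACGTACGTACGTACGTACGT"
tm = wallace_tm(probe)
wash_temp = stringent_temperature(probe)
```

For this probe the estimate is a Tm of 60° C. and a stringent temperature of 55° C. In practice an empirically determined or more detailed Tm model would replace the rule of thumb.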
In general, there is a tradeoff between hybridization specificity (stringency) and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than approximately 10% of the background intensity. In such an embodiment, the hybridized array may be washed with solutions of successively higher stringency and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.
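The signal criterion for choosing a wash can be sketched in Python as follows. The sketch models only the threshold of signal greater than about 10% of background; the requirement that results be consistent is not modeled. The wash labels and intensity readings are hypothetical.

```python
def highest_usable_wash(washes, min_fraction=0.10):
    """Return the label of the most stringent wash that keeps enough signal.

    washes is a list of (label, signal, background) tuples ordered
    from lowest to highest stringency. A wash qualifies when its
    signal exceeds min_fraction of its background intensity.
    Returns None when no wash qualifies.
    """
    chosen = None
    for label, signal, background in washes:
        if signal > min_fraction * background:
            chosen = label  # later entries are more stringent
    return chosen


# Hypothetical readings taken after each successive wash.
readings = [
    ("1x SSPE-T", 500.0, 100.0),
    ("0.5x SSPE-T", 300.0, 90.0),
    ("0.25x SSPE-T", 5.0, 80.0),
]
best = highest_usable_wash(readings)
```

With these illustrative readings the 0.25x wash leaves a signal below 10% of background, so the 0.5x wash is selected.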
The “percentage of sequence identity” or “sequence identity” is determined by comparing two optimally aligned sequences or subsequences over a comparison window or span, wherein the portion of the polynucleotide sequence in the comparison window may optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical residue (e.g., nucleic acid base or amino-acid residue) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Percentage sequence identity when calculated using the programs GAP or BESTFIT (see below) is calculated using default gap weights. Sequences corresponding to the control genes of Tables 1 and 2 may comprise at least about 70% sequence identity to the GenBank IDS of the genes in the Tables, preferably about 75%, 80% or 85% or more preferably, about 90% or 95% or more identity.
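The percentage calculation defined above can be illustrated with a short Python sketch. It assumes the two sequences have already been optimally aligned, with gaps written as "-", and that the comparison window is the full alignment. The alignment step itself (e.g., by GAP or BESTFIT) is not reproduced, and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of window positions with identical residues.

    Both inputs must be pre-aligned strings of equal length.
    Matched positions are divided by the total number of positions
    in the window and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return matches * 100.0 / len(aligned_a)


# Hypothetical alignments: one mismatch, and one gap, in ten positions.
identity_mismatch = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
identity_gap = percent_identity("ACG-TACGTA", "ACGTTACGTA")
```

Both example alignments have nine matched positions in a ten-position window, giving 90% identity, which meets the "about 90%" level referred to above.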
Homology or identity is determined by BLAST (Basic Local Alignment Search Tool) analysis using the algorithm employed by the programs blastp, blastn, blastx, tblastn and tblastx (Karlin et al., (1990) Proc. Natl. Acad. Sci. USA 87, 2264-2268 and Altschul, (1993) J. Mol. Evol. 36, 290-300, fully incorporated by reference) which are tailored for sequence similarity searching. The approach used by the BLAST program is to first consider similar segments between a query sequence and a database sequence, then to evaluate the statistical significance of all matches that are identified and finally to summarize only those matches which satisfy a preselected threshold of significance. For a discussion of basic issues in similarity searching of sequence databases, see Altschul et al., (1994) Nature Genet. 6, 119-129, which is fully incorporated by reference. The search parameters for histogram, descriptions, alignments, expect (i.e., the statistical significance threshold for reporting matches against database sequences), cutoff, matrix and filter are at the default settings. The default scoring matrix used by blastp, blastx, tblastn, and tblastx is the BLOSUM62 matrix (Henikoff et al., (1992) Proc. Natl. Acad. Sci. USA 89, 10915-10919, fully incorporated by reference). Four blastn parameters were adjusted as follows: Q=10 (gap creation penalty); R=10 (gap extension penalty); wink=1 (generates word hits at every winkth position along the query); and gapw=16 (sets the window width within which gapped alignments are generated). The equivalent Blastp parameter settings were Q=9; R=2; wink=1; and gapw=32. A Bestfit comparison between sequences, available in the GCG package version 10.0, uses DNA parameters GAP=50 (gap creation penalty) and LEN=3 (gap extension penalty) and the equivalent settings in protein comparisons are GAP=8 and LEN=2.
As used herein a “probe” or “oligonucleotide probe” is defined as a nucleic acid, capable of binding to a nucleic acid sample or complementary control gene nucleic acid through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, U, C or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.
Probe arrays may contain at least two or more oligonucleotides that are complementary to or hybridize to one or more of the control genes described herein. Such arrays may also contain oligonucleotides that are complementary or hybridize to at least about 2, 3, 5, 7, 10, 50, 100 or more of the genes described herein. Any solid surface to which oligonucleotides or nucleic acid sample can be bound, either directly or indirectly, either covalently or non-covalently, can be used. For example, solid supports for various hybridization assay formats can be filters, polyvinyl chloride dishes, silicon or glass based chips, etc. Glass-based solid supports, for example, are widely available, as well as associated hybridization protocols. (See, e.g., Beattie, WO 95/11755).
A preferred solid support is a high density array or DNA chip. This contains an oligonucleotide probe of a particular nucleotide sequence at a particular location on the array. Each particular location may contain more than one molecule of the probe, but each molecule within the particular location has an identical sequence. Such particular locations are termed features. There may be, for example, 2, 10, 100, 1000, 10,000, 100,000, 400,000, 1,000,000 or more such features on a single solid support. The solid support, or more specifically, the area wherein the probes are attached, may be on the order of a square centimeter.
1. Dot Blots
The control genes listed in Tables 1 and 2 and methods of the present invention may be utilized in numerous hybridization formats such as dot blots, dipstick, branched DNA sandwich and ELISA assays. Dot blot hybridization assays provide a convenient and efficient method of rapidly analyzing nucleic acid samples in a sensitive manner. Dot blots are generally as sensitive as enzyme-linked immunoassays. Dot blot hybridization analyses are well known in the art and detailed methods of conducting and optimizing these assays are described in U.S. Pat. Nos. 6,130,042 and 6,129,828, and Tkatchenko et al. (2000) Biochimica et Biophysica Acta 1500: 17-30. Specifically, labeled or unlabeled nucleic acid sample is denatured and bound to a membrane (e.g., nitrocellulose), and is then contacted with unlabeled or labeled oligonucleotide probes. Buffer and temperature conditions can be adjusted to vary the degree of identity between the oligonucleotide probes and nucleic acid sample necessary for hybridization.
Several modifications of the basic Dot blot hybridization format have been devised. For example, Reverse Dot blot analyses employ the same strategy as the Dot blot method, except that the oligonucleotide probes are bound to the membrane and the nucleic acid sample is applied and hybridized to the bound probes. Similarly, the Dot blot hybridization format can be modified to include formats where either the nucleic acid sample or the oligonucleotide probe is applied to microtiter plates, microbeads or other solid substrates.
2. Membrane-Based Formats
Although each membrane-based format is essentially a variation of the Dot blot hybridization format, several types of these formats are preferred. Specifically, the methods of the present invention may be used in Northern and Southern blot hybridization assays. Although the methods of the present invention are generally used in quantitative nucleic acid hybridization assays, these methods may be used in qualitative or semi-quantitative assays such as Southern blots, in order to facilitate comparison of blots. Southern blot hybridization, for example, involves cleavage of either genomic or cDNA with restriction endonucleases followed by separation of the resultant fragments on a polyacrylamide or agarose gel and transfer of the nucleic acid fragments to a membrane filter. Labeled oligonucleotide probes are then hybridized to the membrane-bound nucleic acid fragments. In addition, intact cDNA molecules may also be used, separated by electrophoresis, transferred to a membrane and analyzed by hybridization to labeled probes. Northern analyses, similarly, are conducted on nucleic acids, either intact or fragmented, that are bound to a membrane. The nucleic acids in Northern analyses, however, are generally RNA.
3. Arrays
Any microarray platform or technology may be used to produce gene expression data that may be normalized with the control genes and methods of the invention. Oligonucleotide probe arrays can be made and used according to any techniques known in the art (see for example, Lockhart et al., (1996) Nat. Biotechnol. 14, 1675-1680; McGall et al., (1996) Proc. Nat. Acad. Sci. USA 93, 13555-13560). Such probe arrays may contain at least one or more oligonucleotides that are complementary to or hybridize to one or more of the nucleic acids of the nucleic acid sample and/or the control genes of Tables 1 and 2. Such arrays may also contain oligonucleotides that are complementary or hybridize to at least 2, 3, 5, 7, 10, 50, 100 or more of the control genes listed in Tables 1 and 2.
Control oligonucleotide probes of the invention are preferably of sufficient length to specifically hybridize only to appropriate, complementary genes or transcripts. Typically the oligonucleotide probes will be at least about 10, 12, 14, 16, 18, 20 or 25 nucleotides in length. In some cases longer probes of at least 30, 40, or 50 nucleotides will be desirable. The oligonucleotide probes of high density array chips include oligonucleotides that range from about 5 to about 45 or 5 to about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length. In other particularly preferred embodiments the probes are 20 or 25 nucleotides in length. In another preferred embodiment, probes are double or single strand DNA sequences. The oligonucleotide probes are capable of specifically hybridizing to the control gene nucleic acids in a sample.
One of skill in the art will appreciate that an enormous number of array designs comprising control probes of the invention are suitable for the practice of this invention. The high density array will typically include a number of probes that specifically hybridize to each control gene nucleic acid, e.g. mRNA or cRNA. (See WO 99/32660 for methods of producing probes for a given gene or genes.) Assays and methods comprising control probes of the invention may utilize available formats to simultaneously screen at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 500,000 or 1,000,000 different nucleic acid hybridizations.
The methods and control genes of this invention may also be used to normalize gene expression data produced using commercially available oligonucleotide arrays that contain or are modified to contain control gene probes of the invention. A preferred oligonucleotide array may be selected from the Affymetrix, Inc. GeneChip® series of arrays which include the GeneChip® Human Genome U95 Set, GeneChip® Hu35K Set, GeneChip® HuGeneFL Array, GeneChip® Human Cancer G110 Array, GeneChip® Rat Genome U34 Set, GeneChip® Mu19K Set, GeneChip® Mu11K Set, GeneChip® Yeast Genome S98 Array, GeneChip® E. coli Genome Array, GeneChip® Arabidopsis Genome Array, GeneChip® HuSNP™ Probe Array, GeneChip® GenFlex™ Tag Array, GeneChip® HIV PRT Plus Probe Array, GeneChip® P53 Probe Array, and the GeneChip® CYP450 Probe Array. In another embodiment, an oligonucleotide array may be selected from the Incyte Pharmaceuticals, Inc. GEM™ series of arrays which includes the UniGEM™ V 2.0, Human Genome GEM 1, Human Genome GEM 2, Human Genome GEM 3, Human Genome GEM 4, Human Genome GEM 5, LifeGEM™ 1 Cancer/Signal Peptide, LifeGEM 2 Inflammation/Blood, Mouse GEM 1, Rat GEM 1 Liver/Kidney, Rat GEM 2 Central Nervous System, Rat GEM 3 Liver/Kidney, S. aureus GEM 1, C. albicans GEM 1, and Arabidopsis GEM.
4. RT-PCR
The control genes and methods of the invention may be used in any type of polymerase chain reaction. A preferred PCR format is reverse transcriptase polymerase chain reaction (RT-PCR), an in vitro method for enzymatically amplifying defined sequences of RNA (Rappolee et al., (1988) Science 241, 708-712) permitting the analysis of different samples from as little as one cell in the same experiment (See Ambion: RT-PCR: The Basics; M. J. McPherson and S. G. Møller, PCR, BIOS Scientific Publishers Ltd, Oxford, OX4 1RE (2000); Dieffenbach et al., PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratory Press 1995 for review). One of ordinary skill in the art may appreciate the enormous number of variations in RT-PCR platforms that are suitable for the practice of the invention, including complex variations aimed at increasing sensitivity such as semi-nested (Wasserman et al., (1999) Molecular Diagnostics 4, 21-28), nested (Israeli et al., (1994) Cancer Research 54, 6303-6310; Soeth et al., (1996) International Journal of Cancer 69, 278-282), and even three-step nested (Funaki et al., (1997) Life Sciences 60, 643-652; Funaki et al., (1998) British Journal of Cancer 77, 1327-1332).
In one embodiment of the invention, separate enzymes are used for reverse transcription and PCR amplification. Two commonly used reverse transcriptases, for example, are those of avian myeloblastosis virus and Moloney murine leukaemia virus. For amplification, a number of thermostable DNA-dependent DNA polymerases are currently available, although they differ in processivity, fidelity, thermal stability and ability to read modified triphosphates such as deoxyuridine and deoxyinosine in the template strand (Adams et al., (1994) Bioorganic and Medicinal Chemistry 2, 659-667; Perler et al., (1996) Advances in Protein Chemistry 48, 377-435). The most commonly used enzyme, Taq DNA polymerase, has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. When fidelity is required, proofreading polymerases such as Vent and Deep Vent (New England Biolabs) or Pfu (Stratagene) may be used (Cline et al., (1996) Nucleic Acids Research 24, 3456-3551). In another embodiment of the invention, a single enzyme approach may be used involving a DNA polymerase with intrinsic reverse transcriptase activity, such as Thermus thermophilus (Tth) polymerase (Bustin, (2000) Journal of Molecular Endocrinology 25, 169-193). A skilled artisan may appreciate the variety of enzymes available for use in the present invention.
The methodologies and control gene primers of the present invention may be used, for example, in any kinetic RT-PCR methodology, including those that combine fluorescence techniques with instrumentation capable of combining amplification, detection and quantification (Orlando et al., (1998) Clinical Chemistry and Laboratory Medicine 36, 255-269). The choice of instrumentation is particularly important in multiplex RT-PCR, wherein multiple primer sets are used to amplify multiple specific targets simultaneously. This requires simultaneous detection of multiple fluorescent dyes. Accurate quantitation while maintaining a broad dynamic range of sensitivity across mRNA levels is the focus of upcoming technologies, any of which are applicable for use in the present invention. Preferred instrumentation may be selected from the ABI Prism 7700 (Perkin-Elmer-Applied Biosystems), the Lightcycler (Roche Molecular Biochemicals) and iCycler Thermal Cycler. Featured aspects of these products include high-throughput capacities or unique photodetection devices.
Without further description, it is believed that one of ordinary skill in the art can, using the preceding description and the following illustrative examples, practice the methods and use the control genes of the present invention. The following working examples, therefore, specifically point out the preferred embodiments of the present invention, and are not to be construed as limiting in any way the remainder of the disclosure.
The control genes were selected by querying the Gene Logic GeneExpress® database to create expression profiles from a variety of human cell and tissue samples. Table 3 A-B lists and describes the tissue or cell samples used to identify control genes listed in Tables 1 and 2. The first column of Table 3 identifies the organ of the particular sample, the second details the morphology, and the third column provides the number of samples. Table 3 A-B includes 695 diseased and 560 normal samples.
The GeneExpress® database was produced from data derived from screening various cell or tissue samples using the Affymetrix human chip set. In general, tissue and cell samples were processed following the Affymetrix GeneChip® Expression Analysis Manual. Frozen tissue was first ground to powder using the Spex Certiprep 6800 Freezer Mill. Total RNA was then extracted using Trizol (Invitrogen Life Technologies), followed by a cleanup step utilizing the RNeasy Mini Kit and, if required, ethanol precipitation to achieve a concentration of 1 μg/μl. Using 10-40 μg of total RNA, double stranded cDNA was created using the SuperScript Choice system (Invitrogen Life Technologies). First strand cDNA synthesis was primed with a T7-(dT24) oligonucleotide. The cDNA was then phenol-chloroform extracted and ethanol precipitated to a final concentration of 1 μg/μl.
55 μg of fragmented cRNA was hybridized on the Human Genome U95 set for twenty-four hours at 60 rpm in a 45° C. hybridization oven, according to the Affymetrix protocol. The chips were washed and stained with Streptavidin Phycoerythrin (SAPE) (Molecular Probes) in Affymetrix fluidics stations. To amplify staining, the chips were washed with SAPE solution, stained with an anti-streptavidin biotinylated antibody (Vector Laboratories) followed by washing with SAPE solution. Hybridization to the probe arrays was detected by fluorometric scanning (Hewlett Packard Gene Array Scanner). Following hybridization and scanning, the microarray images were analyzed for quality control, looking for major chip defects or abnormalities in hybridization signal. After all chips passed quality control, the data was analyzed using Affymetrix GeneChip® software (v3.0).
Gene expression data was then analyzed to identify those genes that are consistently expressed across 1255 normal and disease samples, e.g., genes called Present more than 95% of the time. Table 1 provides an initial list of approximately 560 genes with a % CV less than 30% across the normal and disease samples studied. Table 1 also provides the mean expression value, an exemplary GenBank accession number for each of the genes and the standard deviation value from the mean for each gene. The GenBank accession numbers can be used to locate the publicly available sequences, and all GenBank accession numbers herein reported are specifically incorporated by reference in their entirety. This list of 560 genes was derived from 1255 normal and diseased samples run on the Affymetrix human U95 A GeneChip® and scanned at a high photomultiplier tube (PMT) setting.
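The selection criteria described above (a Present call in more than 95% of samples and a % CV below 30%) can be sketched in code. The following is a minimal illustration only, not the procedure actually used to build Table 1: the function names, the input layout (one list of expression values and one list of Present/Absent calls per gene) and the use of the sample standard deviation are assumptions made for the example.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Percent coefficient of variation: 100 * standard deviation / mean.

    Assumption: the sample (n - 1) standard deviation is used; the text
    does not say which estimator was applied.
    """
    m = mean(values)
    if m == 0:
        return float("inf")  # CV is undefined for a zero mean
    return 100.0 * stdev(values) / m


def select_stable_genes(expression, present_calls, max_cv=30.0, min_present=0.95):
    """Return {gene: % CV} for genes passing both filters.

    expression    -- hypothetical dict: gene -> expression values, one per sample
    present_calls -- hypothetical dict: gene -> booleans (True = called Present)
    """
    selected = {}
    for gene, values in expression.items():
        calls = present_calls[gene]
        present_fraction = sum(calls) / len(calls)
        cv = percent_cv(values)
        # Keep genes called Present in more than 95% of samples
        # whose % CV is below the threshold.
        if present_fraction > min_present and cv < max_cv:
            selected[gene] = cv
    return selected
```

Calling `select_stable_genes` with a higher `max_cv` reproduces the looser cut-offs applied at other PMT settings; only the thresholds change, not the logic.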
The gene list of Table 1 was then re-examined by utilizing human samples run on the Affymetrix human U95 A GeneChip® and scanned at a low photomultiplier tube (PMT) setting. The human samples consisted of 55 human tissue samples and 46 human cancer cell lines. For each of these samples, the mean average difference, standard deviation and % CV were determined for each Affymetrix fragment on the human U95 A GeneChip®. The data was sorted by % CV, and those gene fragments with values less than 40% were chosen for further analysis after all genes with underscore annotations were deleted (i.e. _f, _s, _r, etc.) [see www.affymetrix.com].
The high PMT list was then compared with the low PMT list, and all genes that were not present on both lists were removed. All genes with underscore annotations were then deleted from the list (i.e. _f, _s, _r, etc.). This resulted in a list of 771 genes. The list was then filtered to retain genes with CV values equal to or less than 28% at low PMT settings and equal to or less than 31% at high PMT settings. Six additional human genes with CV values equal to or less than 37% at low PMT settings and equal to or less than 32% at high PMT settings were added to the list. These six genes have rat homologue genes that exhibited constant gene expression over untreated and toxin treated rat samples scanned at low PMT settings (~200 samples). The resulting control gene list is in Table 2.
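The comparison of the high and low PMT lists can be illustrated as a set intersection followed by two threshold filters. This is a sketch under stated assumptions rather than the actual analysis: the probe set identifiers are hypothetical, and the regular expression assumes that underscore annotations appear as a single-letter suffix before `_at` (for example `_s_at`), which the text does not specify.

```python
import re

# Assumed identifier format: an underscore annotation is one lowercase
# letter placed before the trailing "_at", e.g. "1001_s_at".
_UNDERSCORE_ANNOTATION = re.compile(r"_[a-z]_at$")


def has_underscore_annotation(probe_id):
    """True if the probe set ID carries an annotation such as _f, _s or _r."""
    return bool(_UNDERSCORE_ANNOTATION.search(probe_id))


def combine_pmt_lists(high_cv, low_cv, max_high=31.0, max_low=28.0):
    """Return probe sets present on both lists that pass both CV limits.

    high_cv, low_cv -- hypothetical dicts: probe set ID -> % CV at the
                       high and low PMT settings, respectively
    """
    # Step 1: keep only entries present on both lists.
    shared = high_cv.keys() & low_cv.keys()
    # Step 2: delete entries with underscore annotations.
    shared = {p for p in shared if not has_underscore_annotation(p)}
    # Step 3: apply the CV limits for each PMT setting.
    return sorted(
        p for p in shared
        if high_cv[p] <= max_high and low_cv[p] <= max_low
    )
```

The six additional genes described above would correspond to a second call with `max_high=32.0` and `max_low=37.0`, restricted to genes whose rat homologues met the separate constancy criterion.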
The expression levels of one or more genes listed in Tables 1 and 2 may be used to normalize gene expression data produced using quantitative PCR analysis. For example, Table 4 provides sequences for use as TaqMan probes along with the forward and reverse primers for three genes listed in Table 1 or 2: sorting nexin 3, polymerase (RNA) II (DNA directed) polypeptide F, and seryl-tRNA synthetase. Real time PCR detection may be accomplished by the use of the ABI PRISM 7700 Sequence Detection System. The 7700 measures the fluorescence intensity of the sample each cycle and is able to detect the presence of specific amplicons within the PCR reaction. The TaqMan® assay provided by Perkin Elmer may be used to assay quantities of RNA. The primers may be designed from each of the identified genes of Table 1 using Primer Express, a program developed by PE to efficiently find primers and probes for specific sequences. These primers may be used in conjunction with SYBR green (Molecular Probes), a nonspecific double stranded DNA dye, to measure the expression level of the mRNA corresponding to each gene. This gene expression data may then be used to normalize gene expression data of other test genes.
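The normalization step is not spelled out in the text. One commonly used approach for real time PCR data, shown here only as an illustrative assumption, is to express a test gene relative to the mean threshold cycle (Ct) of the control genes, assuming that the amount of product doubles each cycle. The function name, the Ct values and the gene symbols used as keys are hypothetical inputs for the example.

```python
from statistics import mean


def normalized_expression(test_ct, control_cts, efficiency=2.0):
    """Relative expression of a test gene versus control genes.

    test_ct     -- threshold cycle (Ct) of the test gene in one sample
    control_cts -- Ct values of the control genes in the same sample
    efficiency  -- assumed fold increase per cycle (2.0 = perfect doubling)

    Returns efficiency ** -(test_ct - mean control Ct), i.e. the
    delta-Ct method; this is an assumed choice, not one stated in the text.
    """
    if not control_cts:
        raise ValueError("at least one control gene Ct is required")
    delta_ct = test_ct - mean(control_cts)
    return efficiency ** (-delta_ct)


# Hypothetical Ct values for the three control genes named above.
controls = {"SNX3": 20.0, "POLR2F": 21.0, "SARS": 19.0}
relative = normalized_expression(25.0, list(controls.values()))
```

Averaging several control genes, rather than relying on one, reduces the influence of any single control whose expression shifts in a particular sample.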
Although the present invention has been described in detail with reference to examples above, it is understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims. All cited patents and publications referred to in this application are herein incorporated by reference in their entirety.
H. sapiens tunp mRNA for transformation upregulated nuclear
H. sapiens mRNA for ribosomal protein L6.
H. sapiens BAT1 mRNA for nuclear RNA helicase (DEAD family).
Homo sapiens alpha NAC mRNA, complete cds.
Homo sapiens sperm acrosomal protein mRNA, complete cds.
Homo sapiens cytochrome c oxidase subunit IV precursor (COX4)
H. sapiens mRNA for heterogeneous nuclear ribonucleoprotein.
Homo sapiens mRNA for TAP/NXF1 protein (nxf1 gene).
Homo sapiens SPF31 (SPF31) mRNA, complete cds.
Homo sapiens mRNA for KIAA0788 protein, partial cds.
H. sapiens BTF3b mRNA.
Homo sapiens ribosomal protein L30 mRNA, complete cds.
H. sapiens mRNA for transmembrane protein rnp24.
H. sapiens gene for ribosomal protein S7.
H. sapiens mRNA for protein phosphatase 6.
H. sapiens MLN51 mRNA.
H. sapiens gene for ribosomal protein S6.
Homo sapiens fb19 mRNA.
Homo sapiens mRNA for GCP170, complete cds.
H. sapiens mRNA for 23 kD highly basic protein.
Homo sapiens protein kinase C inhibitor (PKCI-1) mRNA, complete
Homo sapiens ribosomal protein L30 mRNA, complete cds.
Homo sapiens mRNA for hypothetical protein.
Homo sapiens 60S ribosomal protein L12 (RPL12) pseudogene,
Homo sapiens mRNA for poly(A)-specific ribonuclease.
Homo sapiens mRNA for squamous cell carcinoma antigen SART-3,
H. sapiens mRNA for ORF.
Homo sapiens mRNA for Prer protein.
H. sapiens mRNA for BCL7B protein.
Homo sapiens integral membrane protein, calnexin, (IP90) mRNA,
H. sapiens mRNA for mitochondrial phosphate carrier protein.
Homo sapiens ribosomal protein L18 (RPL18) mRNA, complete cds.
Homo sapiens mRNA for KIAA0572 protein, partial cds.
Homo sapiens mRNA; cDNA DKFZp564O1716 (from clone
H. sapiens mRNA for ribosomal protein L19.
H. sapiens mRNA for non-muscle type cofilin.
Homo sapiens RHOA proto-oncogene multi-drug-resistance protein
Homo sapiens beta-3A-adaptin subunit of the AP-3 complex mRNA,
Homo sapiens RNA-binding protein regulatory subunit mRNA,
Homo sapiens protein tyrosine phosphatase PIR1 mRNA, complete
H. sapiens mRNA for ribosomal protein L7.
H. sapiens U21.1 mRNA.
Homo sapiens hJTB gene, complete cds.
Homo sapiens sorting nexin 3 (SNX3) mRNA, complete cds.
Homo sapiens chaperonin containing t-complex polypeptide 1, delta
Homo sapiens translation initiation factor eIF3 p66 subunit mRNA,
H. sapiens mRNA for ribosomal protein L11.
Homo sapiens clone 24448 unknown mRNA, partial cds.
Homo sapiens mRNA, expressed in fibroblasts of periodontal
H. sapiens BBC1 mRNA.
Homo sapiens nucleophosmin phosphoprotein (NPM) gene, 3′
Homo sapiens GTP binding protein mRNA, complete cds.
Homo sapiens hnRNP-C like protein mRNA, complete cds.
Homo sapiens TNF-alpha stimulated ABC protein (ABC50) mRNA,
H. sapiens rpS8 gene for ribosomal protein S8.
H. sapiens mRNA for 218 kD Mi-2 protein.
Homo sapiens glycogen synthase kinase 3 mRNA, complete cds.
H. sapiens mRNA for C1D protein.
Homo sapiens mRNA for KIAA0169 protein, partial cds.
Homo sapiens mRNA for ribosomal protein L14, complete cds.
Homo sapiens poly(A) binding protein II (PABP2) gene, complete
Homo sapiens GLI-Krupple related protein (YY1) mRNA, complete
Homo sapiens mRNA for PRP8 protein, complete cds.
Homo sapiens mRNA for human protein homologous to DROER
Homo sapiens mRNA for SART-1, complete cds.
Homo sapiens ribosomal protein S20 (RPS20) mRNA, complete cds.
Homo sapiens GTP-binding protein (rhoA) mRNA, complete cds.
Homo sapiens ribosomal protein L34 (RPL34) mRNA, complete cds.
Homo sapiens mRNA for KIAA1063 protein, partial cds.
H. sapiens mRNA for DRES9 protein.
H. sapiens mRNA for yeast methionyl-tRNA synthetase homologue.
Homo sapiens zinc finger protein (ZPR1) mRNA, complete cds.
Homo sapiens mRNA for Lysyl tRNA Synthetase, complete cds.
H. sapiens mRNA for ribosomal protein L29.
Homo sapiens mRNA for KIAA0745 protein, partial cds.
H. sapiens OZF mRNA.
Homo sapiens mRNA for glycosylphosphatidylinositol anchor
Homo sapiens mRNA for GDP dissociation inhibitor beta.
Homo sapiens histone acetyltransferase (HBO1) mRNA, complete
Homo sapiens translation initiation factor 3 47 kDa subunit mRNA,
Homo sapiens splicing factor mRNA, complete cds.
H. sapiens fau mRNA.
H. sapiens mRNA for elongation factor 2.
Homo sapiens mRNA for Asparaginyl tRNA Synthetase, complete
Homo sapiens mRNA for proton-ATPase-like protein, complete cds.
This application claims priority to U.S. Provisional Application 60/305,154 (filed Jul. 16, 2001), which is herein incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US02/21821 | 7/12/2002 | WO |
Number | Date | Country | |
---|---|---|---|
60305154 | Jul 2001 | US |