1. Field of the Invention
The present invention relates to a method of using artificial genes as universal controls in gene expression analysis systems. More particularly, the present invention relates to a method of producing universal Controls for use in gene expression analysis systems such as macroarrays, real-time PCR, northern blots, SAGE and microarrays, such as those provided in the Microarray ScoreCard system.
2. Description of Related Art
Gene expression profiling is an important biological approach used to better understand the molecular mechanisms that govern cellular function and growth. Microarray analysis is one of the tools that can be applied to measure the relative expression levels of individual genes under different conditions. Microarray measurements often appear to be systematically biased, however, and the factors that contribute to this bias are many and ill-defined (Bowtell, D. L., Nature Genetics 21, 25-32 (1999); Brown, P. P. and Botstein, D., Nature Genetics 21, 33-37 (1999)). Others have recommended the use of “spikes” of purified mRNA at known concentrations as controls in microarray experiments. Affymetrix includes several for use with their GeneChip products. In the current state of the art, these selected genes are actual genes selected from very distantly related organisms. For example, the human chip (designed for use with human mRNA) includes control genes from bacterial and plant sources.
Each of the prior art controls consists of transcribed sequences of DNA from some source. As a result, that source cannot be the subject of a hybridization experiment using those controls, because the controls inherently hybridize to their source. In addition, the lack of universal references that are consistent from experiment to experiment and from species to species greatly reduces the ability of scientists to compare data across labs, users, or time. What is needed, therefore, is a set of universal controls that do not hybridize with the DNA of any source which may be the subject of an experiment. More desirably, there is a need for universal controls for gene expression analysis which do not hybridize with any known source.
Accordingly, this invention provides a process of producing universal controls that are useful in gene expression analysis systems designed for any species and which can be tested to ensure lack of hybridization with mRNA from sources other than the control DNA itself.
The invention relates in a first embodiment to a process for producing at least one universal control for use in a gene expression analysis system. The process comprises selecting at least one non-transcribed (preferably intergenic, also intronic) region of genomic DNA from a known sequence, designing primer pairs for said at least one non-transcribed region and amplifying said at least one non-transcribed region of genomic DNA to generate corresponding double stranded DNA, then cloning said double stranded DNA using a vector to obtain additional double stranded DNA and formulating at least one control comprising said double stranded DNA.
The present invention relates in a second embodiment to a process of producing at least one universal control for use in a gene expression analysis system wherein testing of said at least one non-transcribed region to ensure lack of hybridization with mRNA from sources other than said at least one non-transcribed region of genomic DNA is performed.
The present invention in a third embodiment relates to said process further comprising purifying said DNA and mRNA, determining the concentrations thereof and formulating at least one control comprising said DNA or said mRNA at selected concentrations and ratios.
Another embodiment of the present invention is a universal control for use in a gene expression analysis system comprising a known amount of at least one DNA generated from at least one non-transcribed region of genomic DNA from a known sequence, or comprising a known amount of at least one mRNA generated from DNA generated from at least one non-transcribed region of genomic DNA from a known sequence. The present invention may optionally include generating mRNA complementary to said DNA and formulating at least one control comprising said mRNA, optionally by purifying said DNA and mRNA, determining the concentrations thereof and formulating at least one control comprising said DNA or said mRNA at selected concentrations and ratios.
Another embodiment of the present invention is a universal control for use in a gene expression analysis system wherein a known amount of at least one DNA sequence generated from at least one non-transcribed region of genomic DNA from a known sequence, or a known amount of at least one mRNA generated from DNA generated from at least one non-transcribed region of genomic DNA from a known sequence, is included, and wherein said DNA and mRNA do not hybridize with any DNA or mRNA from a source other than the at least one non-transcribed region of genomic DNA.
The present invention relates to a method of using said universal control as a negative control in a gene expression analysis system by adding a known amount of said control, containing a known amount of DNA, to a gene expression analysis system as a control sample, subjecting the sample to hybridization conditions in the absence of complementary labeled mRNA, and examining the control sample for the absence or presence of signal.
Further, said controls can be used in a gene expression analysis system by adding a known amount of a said control containing a known amount of DNA to a gene expression analysis system as a control sample and subjecting the sample to hybridization conditions, in the presence of a said control containing a known amount of labeled complementary mRNA, and measuring the signal values for the labeled mRNA and determining the expression level of the gene transcript based on the signal value of the labeled mRNA.
Additionally, said controls may be used as calibrators in a gene expression analysis system by adding a known amount of a said control containing known amounts of several DNA sequences to a gene expression analysis system as control samples and subjecting the samples to hybridization conditions in the presence of a said control containing known amounts of corresponding complementary labeled mRNAs, each mRNA being at a different concentration and measuring the signal values for the labeled mRNAs and constructing a dose-response or calibration curve based on the relationship between signal value and concentration of each mRNA.
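By way of illustration only, the construction of a calibration curve from such calibrators can be sketched as follows. The sketch assumes a straight-line relationship between log10(signal) and log10(concentration), which is a common but not mandatory choice; the function name `fit_calibration_curve`, the units, and the example values are hypothetical and are not taken from the experiments described herein.

```python
import math

def fit_calibration_curve(concentrations, signals):
    """Least-squares fit of log10(signal) = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical calibrator amounts (arbitrary mass units) and measured signals.
spike_amounts = [1, 10, 100, 1000]
spike_signals = [50, 500, 5000, 50000]
slope, intercept = fit_calibration_curve(spike_amounts, spike_signals)
```

A slope near 1 on the log-log scale would indicate that signal scales proportionally with the amount of labeled mRNA over the range covered by the calibrators.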
Also, the present invention relates to a method of using said controls as calibrators for gene expression ratios in a two-color gene expression analysis system by adding a known amount of at least one of said controls containing a known amount of DNA to a two-color gene expression analysis system as control samples and subjecting the samples to hybridization conditions in the presence of a said control containing known amounts of two differently labeled corresponding complementary labeled mRNAs for each DNA sample present and measuring the ratio of the signal values for the two differently labeled mRNAs and comparing the signal ratio to the ratio of concentrations of the two or more differently labeled mRNAs.
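The comparison of a measured signal ratio with the known concentration ratio can be sketched as below. This is a minimal example assuming that ratios are compared on a log2 scale; the function name `ratio_bias` and the numbers used are hypothetical.

```python
import math

def ratio_bias(signal_a, signal_b, conc_a, conc_b):
    """Return (observed, expected, bias) as log2 ratios for one ratio control.

    signal_a / signal_b: signals of the two differently labeled mRNAs.
    conc_a / conc_b: known amounts of those mRNAs added to the two samples.
    """
    observed = math.log2(signal_a / signal_b)
    expected = math.log2(conc_a / conc_b)
    return observed, expected, observed - expected

# Hypothetical ratio control spiked at 3:1 that reads out at only 2:1.
observed, expected, bias = ratio_bias(2000.0, 1000.0, 3.0, 1.0)
```

A bias near zero indicates that the system reports the spiked ratio accurately; a negative bias, as in this hypothetical case, indicates ratio compression.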
A further embodiment of the present invention is a process of producing controls that are useful in gene expression analysis systems designed for any species and which can be tested to ensure lack of hybridization with mRNA from sources other than the synthetic sequences of DNA from which the control is produced.
One or more such controls can be produced by a process comprising synthesizing a near-random sequence of non-transcribed DNA, designing primer pairs for said at least one near random sequence and amplifying said non-transcribed DNA to generate corresponding double stranded DNA, then cloning said double stranded DNA using a vector to obtain additional double stranded DNA and formulating at least one control comprising said double stranded DNA.
The process can also be used to produce at least one control for use in a gene expression analysis system wherein testing of said sequence of non-transcribed synthetic DNA to ensure lack of hybridization with mRNA from sources other than said sequence of non-transcribed DNA is performed.
Additionally, mRNA complementary to said synthetic DNA can be generated and formulated to generate at least one control comprising said mRNA.
DNA and mRNA can be subsequently purified, the concentrations thereof determined, and one or more controls comprising said DNA or said mRNA at selected concentrations and ratios be formulated.
Another embodiment of the present invention is a control for use in a gene expression analysis system produced by a process comprising synthesizing a near-random sequence of DNA, designing primer pairs for said synthetic DNA and amplifying said DNA to generate corresponding double stranded DNA, then cloning said double stranded DNA using a vector to obtain additional double stranded DNA and formulating at least one control comprising a known amount of at least one said double stranded DNA or a known amount of at least one mRNA generated from said DNA, and optionally, wherein said DNA and mRNA do not hybridize with any DNA or mRNA from a source other than said sequence of non-transcribed DNA.
The present invention, additionally, relates to a method of using said controls containing a known amount of DNA, as a negative control in a gene expression analysis system including adding a known amount of said control containing a known amount of DNA to a gene expression analysis system as a control sample, and subjecting the sample to hybridization conditions in the absence of complementary labeled mRNA and examining the control sample for the absence or presence of signal.
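Examining a negative control for the absence or presence of signal can be sketched as a comparison against local background. The decision rule used here (mean background plus `k` standard deviations, with `k` defaulting to 3) is an assumption made for illustration and is not prescribed by this description; all names and values are hypothetical.

```python
from statistics import mean, stdev

def negative_control_passes(control_signals, background_signals, k=3.0):
    """Return (passed, threshold) for a set of negative-control spots.

    A spot counts as 'no signal' when it does not exceed
    mean(background) + k * stdev(background).
    """
    threshold = mean(background_signals) + k * stdev(background_signals)
    passed = all(s <= threshold for s in control_signals)
    return passed, threshold

# Hypothetical background readings and two sets of control-spot readings.
background = [100.0, 110.0, 90.0, 105.0, 95.0]
ok, cutoff = negative_control_passes([98.0, 120.0], background)
bad, _ = negative_control_passes([98.0, 400.0], background)
```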
Further, said controls may be used in a gene expression analysis system wherein a known amount of a said control containing a known amount of DNA is added to a gene expression analysis system as a control sample, the sample is subjected to hybridization conditions in the presence of a said control containing a known amount of labeled complementary mRNA, the signal values for the labeled mRNA are measured, and the expression level of the gene transcript is determined based on the signal value of the labeled mRNA.
The present invention also relates to a method of using said controls as calibrators in a gene expression analysis system including adding known amounts of a said control containing known amounts of several DNAs to a gene expression analysis system as control samples and subjecting the samples to hybridization conditions in the presence of a said control containing known amounts of corresponding complementary labeled mRNAs, each mRNA being at a different concentration, and measuring the signal values for the labeled mRNAs and constructing a dose-response or calibration curve based on the relationship between signal value and concentration of each mRNA.
The present invention, additionally, relates to a method of using said controls as calibrators for gene expression ratios in a two-color gene expression analysis system comprising adding a known amount of at least one of said controls containing a known amount of DNA to a two-color gene expression analysis system as control samples and subjecting the samples to hybridization conditions in the presence of a said control containing known amounts of two differently labeled corresponding complementary labeled mRNAs for each DNA sample present and measuring the ratio of the signal values for the two differently labeled mRNAs and comparing the signal ratio to the ratio of concentrations of the two or more differently labeled mRNAs.
Further embodiments and uses of the current invention will become apparent from a consideration of the ensuing description.
The above and other objects and advantages of the present invention will be apparent upon consideration of the following detailed description taken in conjunction with the accompanying drawings, in which like characters refer to like parts throughout, and in which:
The present invention teaches universal Controls for use in gene expression analysis systems such as microarrays. Many have expressed interest in being able to obtain suitable genes and spikes as controls for inclusion in their arrays.
An advantage of the universal Controls of this invention is that a single set can be used with assay systems designed for any species, as these Controls will not be present unless intentionally added. This contrasts with the concept of using genes from “distantly related species.” For example, an analysis system directed at detecting human gene expression might employ a Bacillus subtilis gene as a control, because that gene is not expected to be present in human genetic material. But this control might be present in bacterial genetic material (or at least cross-hybridize with it), and thus it may not be a good control for an experiment on bacterial gene expression. The novel universal Controls presented here provide an advantage over the state of the art in that the same set of controls can be used without regard to the species of the test sample RNA.
The present invention employs the novel approaches of using either non-transcribed genomic sequences or totally random synthetic sequences as a template and generating both DNA and complementary “mRNA” from such sequences, for use as controls. The Controls could be devised de novo by designing near-random sequences and synthesizing them resulting in synthetic macromolecules as universal controls. Totally synthetic random DNA fragments are so designed that they do not cross-hybridize with each other or with RNA from any biologically relevant species (meaning species whose DNA or RNA might be present in the gene expression analysis system). The cost of generating such large synthetic DNA molecules can be high. However, they only need to be generated a single time. Additionally, fragment size can be increased by ligating smaller synthetic fragments together by known methods. In this way, fragments large enough to be easily cloned can be created. Through cloning and PCR sufficient quantities of DNA for use as controls can be produced and mRNA can be generated by in vitro transcription for use in controls.
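The de novo design of a near-random sequence can be sketched as follows. This is an illustrative sketch only: the GC window, the limit on homopolymer runs, and the function names are assumptions rather than criteria stated herein, and the sketch does not test cross-hybridization, which would still have to be checked by sequence comparison and empirical hybridization.

```python
import random

def gc_fraction(seq):
    """Fraction of G and C bases in an upper-case DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def near_random_sequence(length, rng, gc_min=0.40, gc_max=0.60,
                         max_homopolymer=6, max_tries=10000):
    """Draw random DNA until the GC window and homopolymer limit are met.

    The constraints are hypothetical design filters chosen for illustration.
    """
    for _ in range(max_tries):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        if not gc_min <= gc_fraction(seq) <= gc_max:
            continue
        # Reject sequences with a run longer than max_homopolymer.
        if any(base * (max_homopolymer + 1) in seq for base in "ACGT"):
            continue
        return seq
    raise RuntimeError("no sequence satisfied the constraints")

# A seeded generator makes the draw reproducible.
candidate = near_random_sequence(1000, random.Random(0))
```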
A simpler approach is to identify sequences from the intergenic or intronic regions (referred to here as non-transcribed regions) of genomic DNA from an organism, and use these as a template for synthesis via PCR (polymerase chain reaction). Ideally, sequences of around 1000 bases (ranging from 500 to 2000 bases) are selected based on computer searches of publicly accessible sequence data. The criteria for selection include:
PCR primer pairs are designed for the selected sequence(s) and PCR is performed using genomic DNA (as a template) to generate PCR fragments (double strand DNA) corresponding to the non-transcribed sequence(s) as the control DNA. Additional control DNA can be cloned using a vector and standard techniques. Subsequently, standard techniques such as in vitro transcription are used to generate mRNA (complementary to the cDNA and containing a poly-A tail) as the control mRNA. Standard techniques are used for purifying the Control DNA and Control mRNA products, and for estimating their concentrations.
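One standard technique for estimating nucleic acid concentration is absorbance at 260 nm. The sketch below uses the widely used conversion factors of 50 µg/mL per A260 unit for double-stranded DNA and 40 µg/mL per A260 unit for RNA; the function name, the dilution factor, and the readings are hypothetical.

```python
# Conventional conversion factors (µg/mL per A260 unit, 1 cm path length).
UG_PER_ML_PER_A260 = {"dsDNA": 50.0, "RNA": 40.0}

def concentration_ug_per_ml(a260, nucleic_acid, dilution_factor=1.0, path_cm=1.0):
    """Estimate the stock concentration from an A260 reading of a diluted sample."""
    return a260 / path_cm * UG_PER_ML_PER_A260[nucleic_acid] * dilution_factor

# Hypothetical readings: control mRNA diluted 1:50, control DNA read undiluted.
mrna_conc = concentration_ug_per_ml(0.25, "RNA", dilution_factor=50)
dna_conc = concentration_ug_per_ml(0.5, "dsDNA")
```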
Empirical testing is also performed to ensure lack of hybridization between the Control DNA on the array and other mRNAs, as well as with mRNA from important gene expression systems (e.g., human, mouse, Arabidopsis, etc.).
The above approaches were used to generate twenty-three universal control sequences from intergenic regions of the yeast Saccharomyces cerevisiae genome. Specifically, using yeast genome sequence data publicly available at the Stanford University web site, intergenic regions approximately 1 kb in size were identified. These sequences were BLAST'd and those showing no homology to other sequences were identified as candidates for artificial gene controls. Candidates were analyzed for GC-content and a subset with a GC-content of ≧36% was identified. Specific primer sequences have been identified and primers synthesized. PCR products amplified with the specific primers have been cloned directly into the pGEM™-T Easy vector (Promega Corp., Madison, Wis.). Both array targets and templates for spike mRNA have been amplified from these clones using distinct and specific primers.
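The GC-content screen described above can be sketched as a simple filter. The 36% cutoff and the 500 to 2000 base length window come from this description; the function names, region names, and toy sequences are hypothetical, and the homology (BLAST) step is not reproduced here.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def select_candidates(regions, min_gc=36.0, min_len=500, max_len=2000):
    """Keep regions within the length window whose GC content is >= min_gc."""
    return [name for name, seq in regions.items()
            if min_len <= len(seq) <= max_len and gc_percent(seq) >= min_gc]

# Toy sequences standing in for intergenic regions (not real yeast sequence).
regions = {
    "region_a": "AT" * 300 + "GC" * 200,  # 1000 bases, 40% GC
    "region_b": "AT" * 400 + "GC" * 100,  # 1000 bases, 20% GC
    "region_c": "GC" * 100,               # 200 bases, too short
}
kept = select_candidates(regions)
```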
A greater number of intergenic regions have been cloned for testing. DNA samples from all the candidates were amplified, spotted on glass microarray slides and hybridized with mRNA samples from several species and with each candidate spike mRNA, respectively, to identify those that do not cross-hybridize. First, they were screened for absence of cross-hybridization with RNA from different biological species. mRNA from human (eight tissues: skeletal muscle, spleen, liver, heart, kidney, brain, placenta and lung), mouse (six tissues: skeletal muscle, spleen, liver, heart, kidney and brain), rat (six tissues: skeletal muscle, spleen, liver, heart, kidney and brain), yeast (S. cerevisiae) and bacteria (E. coli and two Archaea species), as well as total RNA from plant (Arabidopsis, Oil Palm), were tested against the control candidates. Candidates that did not cross-react with the RNA samples from the species tested were then tested for cross-hybridization with each other. The candidates were hybridized with each candidate mRNA independently.
From the candidate clones that exhibited specific hybridization, twenty-three were included in the final set of universal controls.
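The two-stage screen can be sketched as a filter over a table of hybridization signals. The data layout, the single fixed threshold, and all names and values below are hypothetical; they only illustrate the logic of keeping candidates that give signal with their own mRNA and with nothing else.

```python
def specific_candidates(signals, threshold):
    """Select spotted candidates that hybridize only with their own mRNA.

    signals[spot][sample] is the signal at spotted candidate DNA `spot`
    when the array is hybridized with labeled `sample` (a species RNA
    or another candidate's spike mRNA).
    """
    selected = []
    for spot, row in signals.items():
        on_target = row.get(spot, 0.0)
        off_target = [v for sample, v in row.items() if sample != spot]
        if on_target > threshold and all(v <= threshold for v in off_target):
            selected.append(spot)
    return selected

# Hypothetical signals for three candidates.
signals = {
    "cand_1": {"cand_1": 9000.0, "cand_2": 40.0, "human_liver": 30.0},
    "cand_2": {"cand_1": 35.0, "cand_2": 8000.0, "human_liver": 900.0},
    "cand_3": {"cand_1": 25.0, "cand_2": 30.0, "cand_3": 50.0, "human_liver": 20.0},
}
final_set = specific_candidates(signals, threshold=200.0)
```

In this toy data, `cand_2` is rejected for cross-reacting with a species RNA and `cand_3` is rejected for failing to give signal with its own mRNA.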
These universal controls, when included in microarray experiments, perform as:
Mixtures of several different Control mRNA species can be prepared (spike mixes) at known concentrations and ratios to simplify and standardize the experimental protocol while providing a comprehensive set of precision and accuracy information. Table 1 demonstrates one embodiment of this concept. The mRNA from the final set of clones have been pre-mixed at specific concentrations and ratios so they can serve as the various controls when hybridized to their corresponding control DNA spotted on the arrays. Ten calibrators (those included in the labeling reaction at a ratio of 1:1) spanning a dynamic range of 4.5 orders of magnitude are included as calibration controls. Eight ratio controls are included, at two expression levels (low and medium to high) and reversed with respect to the reference and test samples.
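The arithmetic behind ten calibrators spanning 4.5 orders of magnitude can be sketched as follows. The sketch assumes even spacing on a log scale, which gives a half-log step (about 3.16-fold) between neighbors; the actual amounts of Table 1 are not reproduced here, and the top amount and units are hypothetical.

```python
def calibrator_series(top_amount, n_points=10, orders_of_magnitude=4.5):
    """Log-evenly spaced amounts from top_amount down over the given range."""
    step = orders_of_magnitude / (n_points - 1)  # log10 units between neighbors
    return [top_amount / 10 ** (step * i) for i in range(n_points)]

# Hypothetical top calibrator amount in arbitrary mass units.
series = calibrator_series(30000.0)
fold_range = series[0] / series[-1]
```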
The universal controls as shown in Table 1 can be used as references for microarray validation and standardization across biological species and experimental platforms. These controls can be used to verify the accuracy and precision of gene expression ratios, and the sensitivity and dynamic range of the microarray system. Through the use of Calibration (standard) curves, these controls may allow reporting gene expression levels in consistent mass units, improving the comparisons of results across laboratories.
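Reporting an expression level in mass units amounts to reading a measured signal back through the calibration curve. The sketch below assumes a log-log linear curve of the form log10(signal) = slope × log10(amount) + intercept; the function name and the curve parameters are hypothetical.

```python
import math

def amount_from_signal(signal, slope, intercept):
    """Invert log10(signal) = slope * log10(amount) + intercept."""
    return 10 ** ((math.log10(signal) - intercept) / slope)

# Hypothetical curve parameters and a measured signal for a gene of interest.
slope, intercept = 1.0, math.log10(50.0)
estimated_amount = amount_from_signal(5000.0, slope, intercept)
```

Estimates outside the range covered by the calibrators are extrapolations and would ordinarily be flagged rather than reported.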
The following examples demonstrate how these Control DNA and Control mRNA were generated, and then used as universal controls in microarray gene expression experiments. They are representative of the many different types of experiments that could benefit from the use of these controls. The following examples are offered by way of illustration and not by way of limitation.
Using yeast genomic sequence data publicly available at the Stanford University web site, intergenic regions (YIRs) approximately 1 kb in size were identified. These sequences were BLAST'd and those showing no homology to other sequences were identified as candidates for artificial gene controls. Candidates were analyzed for GC-content and a subset with a GC-content of ≧36% was identified. Specific primer sequences have been identified and synthesized. PCR products amplified with the specific primers have been cloned directly into the pGEM™-T Easy vector (Promega Corp., Madison, Wis.). Both array targets and templates for spike mRNA have been amplified from these clones using distinct and specific primers.
When used as DNA controls, the YIR sequences were amplified by PCR with specific primers, using 5 ng of cloned template (plasmid DNA) and a primer concentration of 0.5 μM in a 100 μl reaction volume, and cycled as follows: 35 cycles of 94° C. 20 sec., 52° C. 20 sec., 72° C. 2 min., followed by extension at 72° C. for 5 min.
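The quantities implied by this protocol can be worked out as follows. The sketch uses only the figures stated above and assumes that the 0.5 μM concentration applies to each primer; the computed time counts programmed hold times only and ignores thermal cycler ramping.

```python
CYCLES = 35
STEP_SECONDS = {"denature_94C": 20, "anneal_52C": 20, "extend_72C": 120}
FINAL_EXTENSION_SECONDS = 5 * 60
PRIMER_UM = 0.5      # assumed to be the concentration of each primer
REACTION_UL = 100
TEMPLATE_NG = 5

# Total programmed hold time, excluding ramping between temperatures.
hold_seconds = CYCLES * sum(STEP_SECONDS.values()) + FINAL_EXTENSION_SECONDS
hold_minutes = hold_seconds / 60

# µmol/L multiplied by µL gives pmol.
primer_pmol = PRIMER_UM * REACTION_UL

# Plasmid template concentration in the reaction.
template_ng_per_ul = TEMPLATE_NG / REACTION_UL
```

Under these assumptions the program holds for 5900 seconds (about 98 minutes), each reaction contains 50 pmol of each primer, and the template is present at 0.05 ng/μl.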
All YIR control mRNAs for the spike mix are generated by in vitro transcription. Templates for in vitro transcription (IVT) are generated by amplification with specific primers that are designed to introduce a T7 RNA polymerase promoter on the 5′ end and a polyT (T21) tail on the 3′ end of the PCR products. Run-off mRNA is produced using 1 μl of these PCR products per reaction with the AmpliScribe system (Epicentre, Madison, Wis.). IVT products are purified using the RNeasy system (Qiagen Inc., Valencia, Calif.) and quantified by spectrophotometry.
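The primer tailing described above can be sketched as simple string construction. The T7 promoter shown is the commonly used consensus sequence with a GGG start; the actual primer sequences are not given in this description, so the gene-specific portions below are hypothetical placeholders.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # common T7 consensus, assumed here
POLY_T = "T" * 21                     # T21 tail carried by the reverse primer

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def tailed_primers(forward_specific, reverse_specific):
    """Add the T7 promoter to the forward primer and T21 to the reverse primer."""
    return T7_PROMOTER + forward_specific, POLY_T + reverse_specific

# Hypothetical gene-specific primer sequences, for illustration only.
fwd, rev = tailed_primers("ACGTTGCAAGCTTGGATCCA", "GGATCCTTAGCAAGTCCGTA")

# The sense strand of the PCR product ends with the reverse complement of
# the reverse primer, so the run-off transcript ends in 21 A residues.
sense_3prime_end = reverse_complement(rev)
```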
Initially, fifty intergenic region sequences have been cloned for testing. DNA samples from all the candidates were amplified, spotted on glass microarray slides and hybridized with mRNA samples from several species and each candidate spike mRNA, respectively, to identify those that do not cross-hybridize. First, they were screened for no cross-hybridization with RNA from different biological species. mRNA from human (8 tissues: skeletal muscle, spleen, liver, heart, kidney, brain, placenta and lung), mouse (6 tissues: skeletal muscle, spleen, liver, heart, kidney and brain), rat (6 tissues: skeletal muscle, spleen, liver, heart, kidney and brain), yeast (S. cerevisiae) and bacteria (E. coli and two Archaea species), as well as total RNA from plant (Arabidopsis, Oil Palm) were tested against the control candidates.
Candidates that did not cross-react with the RNA samples from the species tested were then tested for cross-hybridization with each other. The candidates were hybridized with each candidate mRNA independently.
From the candidate clones that exhibited specific hybridization, twenty-three are included in the final set of universal controls.
Upon confirmation of the exact structure, each of the above-described nucleic acids of confirmed structure is recognized to be immediately useful as a control.
The universal controls (both the spike mixes and their corresponding spotting samples) have been evaluated for their performance in real microarray experiments and tested for the following.
Experimental design, including array design and the hybridization sample concentration, was tested.
Universal utility, including hybridization of the spikes on pre-arrayed slides from various species, was also tested. The controls showed no cross-hybridization on human, rat, mouse, Arabidopsis, yeast and E. coli pre-arrayed slides from commercial sources (data not shown).
Spike mix performance was tested, including ratio performance and calibration curves.
The controls as shown in Table 1 can be used as references for microarray validation and standardization across biological species and experimental platforms. These controls can be used to verify the accuracy and precision of gene expression ratios, and the sensitivity and dynamic range of the microarray system. Through the use of calibration (standard) curves, these controls may allow reporting gene expression levels in consistent mass units, improving the comparisons of results across laboratories.
The above examples illustrate specific aspects of the present invention and are not intended to limit the scope thereof in any respect and should not be so construed.
Those skilled in the art having the benefit of the teachings of the present invention as set forth above, can effect numerous modifications thereto. These modifications are to be construed as being encompassed within the scope of the present invention as set forth in the appended claims.
This application is a continuation of U.S. patent application Ser. No. 10/278,845 filed Oct. 23, 2002, abandoned, which is a continuation-in-part of U.S. patent application Ser. No. 10/140,545 filed May 7, 2002, now U.S. Pat. No. 6,943,242, which claims priority to U.S. provisional patent application No. 60/289,202 filed May 7, 2001 and 60/312,420, filed Aug. 15, 2001. This application also claims priority to U.S. provisional patent application No. 60/335,115 filed Oct. 24, 2001 and 60/391,367 filed Jun. 25, 2002, the disclosures of which are incorporated herein by reference in their entireties.
Number | Date | Country
---|---|---
60289202 | May 2001 | US
60312420 | Aug 2001 | US
60335115 | Oct 2001 | US
60391367 | Jun 2002 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 10278845 | Oct 2002 | US
Child | 11339364 | Jan 2006 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 10140545 | May 2002 | US
Child | 10278845 | Oct 2002 | US