The present invention concerns methods for normalizing the results obtained from quantitative PCR or microarrays.
Gene expression analysis is one of the most interesting ways to compare experimental or clinical conditions. Understanding gene expression profiles is expected to provide insight into complex regulatory networks. Over the last twenty years, real time quantitative PCR (rt-qPCR) has become the method of choice for accurate expression profiling, replacing end-point PCR, RPA (Ribonuclease Protection Assay) and Northern blotting. Although this method is widely used, much needs to be done to increase its reliability and accuracy.
Typically, rt-qPCR requires different steps (
Each of these steps has a variable yield that could alter quantification of the target gene. In addition, PCR suffers from false negative results when enzyme inhibitors are present in the samples or when reagents are missing or degraded.
Steps 1 and 2 are also used for microarray experiments. Accordingly, controls applicable to these steps are relevant for these techniques (
To ensure normalization of initial steps of rt-qPCR and microarrays, different improvements have been developed (van de Peppel et al. (2003) EMBO Rep. 4:387-393; Huggett et al. (2005) Genes Immun 6:279-284). These improvements are summarized in Table 1.
One way to normalize target mRNA expression is to use a fixed amount of total RNA for subsequent RT, namely “total RNA normalization”. Total RNA normalization is deemed inaccurate because total RNA is mainly composed of ribosomal RNA (rRNA), the amount of which differs too much from that of the messenger RNA (mRNA) of interest. Furthermore, total RNA normalization does not control for RNA degradation or output variations during quantification of RNA molecules or RT.
The most commonly used way to normalize gene expression is to express the level of the gene(s) of interest relative to the expression of “HouseKeeping Genes” (HKG) or internal control genes, whose expression is assumed to be stable between cells/tissues/samples and experimental conditions. These HKG can code for mRNA or rRNA. Nevertheless, rRNA is not a good standard because these RNAs are present in the cell in a much larger quantity than the target mRNA. In addition, the present inventors (Caradec et al. (2010) Br. J. Cancer 102:1037-1043) and others (Lee et al. (2002) Genome Res. 12:292-297; Vandesompele et al. (2002) Genome Biol. 3:RESEARCH0034; Radonic et al. (2004) Biochem. Biophys. Res. Commun. 313:856-862) have demonstrated that HKG expression can vary according to samples or experimental procedures, leading to inaccurate normalization, misinterpretation of results and even conflicting reports.
Most microarray experiments make use of the expression levels of all genes as normalization features, assuming that relatively few transcript levels vary between samples, or that any changes that occur are balanced. Van de Peppel et al. showed that this “all-gene” approach does not take into account global changes that often occur during experimental conditions, sampling and sample preparation (van de Peppel et al. (2003)).
Therefore, there are no universal control genes to normalize rt-qPCR and microarray assays. Accordingly, prior to any quantification of target genes, several HKG should be tested for their stability in each condition or sample studied (pre-experimental validation) in order to determine the least variable HKG, which will be the most appropriate reference for the experiment (Tricarico et al. (2002) Anal Biochem 309:293-300; Pfaffl et al. (2004) Biotechnol Lett 26:509-515). This prior analysis is cumbersome, time-consuming and costly. Furthermore, for laboratories that do not use rt-qPCR or microarray assays routinely, the prior study of HKG is not cost-effective. Finally, searching for the best HKG consumes precious samples before the actual analysis.
To determine the best HKG among a set of reference genes tested, mathematical models (Chervoneva et al. (2010) BMC Bioinformatics 11:253) and specialized software (Vandesompele et al. (2002); Pfaffl et al. (2004)) have been developed. However, mathematical models can be complex and difficult to use, and specialized software packages do not always agree on which HKG is the best. In addition, the primers used to amplify common HKG are not always the same between laboratories. Primer sequences can influence rt-qPCR efficiency, resulting in variation of the results obtained using the same HKG in the same experimental protocol in different laboratories. Therefore, there are neither universal control genes nor universally defined protocols to determine the best control genes.
At present, the use of HKG in the normalization of rt-qPCR and microarray results does not seem to be an accurate basis for universal standardization of results and worldwide comparison of genomic expression. The scientific community has suggested developing a universal RNA reference material for rt-qPCR and microarray standardization. The addition of external heterologous RNA (either synthetic, as alien RNA, or from plants when studying animal genes, for example) at the RT step (Step 2) is considered the most promising method to normalize results. It allows monitoring of the RT, PCR efficiency and RNA degradation.
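The pre-experimental validation described above can be pictured with a minimal sketch. It assumes that stability is judged simply by the coefficient of variation of Ct values across the samples tested; this criterion, the function name `least_variable_gene` and the Ct values are illustrative assumptions, and published tools rely on more elaborate stability measures.

```python
from statistics import mean, stdev


def least_variable_gene(ct_by_gene: dict[str, list[float]]) -> str:
    """Return the candidate gene whose Ct values vary least across samples.

    Variability is measured here by the coefficient of variation (SD / mean),
    which is a simplifying assumption made for illustration only.
    """
    def cv(values: list[float]) -> float:
        return stdev(values) / mean(values)

    return min(ct_by_gene, key=lambda gene: cv(ct_by_gene[gene]))


if __name__ == "__main__":
    # Hypothetical Ct values, not data from this document.
    candidates = {
        "HKG_A": [20.1, 20.3, 20.2, 20.0],
        "HKG_B": [18.0, 19.5, 17.2, 20.1],
    }
    print(least_variable_gene(candidates))  # HKG_A
```

Even in this reduced form, every candidate gene must be measured in every condition before the actual experiment, which is the sample and time cost discussed above.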
However, it is of no help to monitor the variation of extraction yield, errors in total RNA quantification or the degradation of the RNA of interest during storage. In addition, since synthetic RNAs are in vitro retro-transcribed, concerns may arise on the different efficiencies of RT and PCR compared to natural RNA with secondary and tertiary structures. Exogenous RNAs are already used for microarrays (Benes and Muckenthaler (2003) Trends Biochem Sci 28:244-249) but their use for normalization of rt-qPCR results is not generalized (Huggett et al. (2005) Genes Immun 6:279-284).
To conclude, there is a real need for an easy-to-use, commercially available method of normalization to monitor all the pre-analytical steps of gene expression analysis assays such as rt-qPCR and microarrays and to facilitate the comparison of data collected in laboratories throughout the world. Reliable normalization methods are also mandatory for microarray and rt-qPCR methods to be widely adopted for clinical diagnostic use.
The present invention arises from the unexpected finding by the inventors that addition of a fixed amount of external control sample during extraction of RNAs to be studied and of a fixed amount of external RNA before the RT, as depicted in
The present invention thus concerns a method for comparing, in at least two samples A1 and A2, the amount of RNA of a target gene t, comprising the steps consisting of:
wherein the reference gene gc is selected in such a way that nucleic acids, primers and/or probes used in step e) to measure the cDNA level of the reference gene gc do not cross-react with cDNAs of the target gene t and of the reference gene gd and wherein the reference gene gd is selected in such a way that nucleic acids, primers and/or probes used in step e) to measure the cDNA level of the reference gene gd do not cross-react with cDNAs of the target gene t and of the reference gene gc.
The present invention also concerns a kit for comparing the amount of RNA of a target gene t in at least two samples A1 and A2, comprising:
wherein the reference genes gc and gd are genes with a relative low expression level and the reference genes g′c and g′d are genes with a relative high expression level.
In the context of the invention, the terms “target gene”, “target RNA” and “target cDNA” refer to the sequences of interest to be quantified and/or compared in samples A1 and A2.
As used herein, the term “sample” refers to any biological or synthetic sample containing ribonucleic acid which can be extracted. Preferably, the samples of the invention are biological samples. In particular, the biological samples may be selected from the group consisting of blood, serum, plasma, urine, feces, cerebrospinal fluid, sperm, puncture fluid, expectorations, saliva, bronchial and alveolar fluids, pus, genital secretions, amniotic fluids, gastric fluids, bile, pancreatic fluid, tissue biopsy, hair, skin, teeth, and lymphatic fluids. In the context of the invention, the biological sample can also consist of cultured cells or medium containing ribonucleic acid. In some embodiments, biological samples may be synthetic and/or man-made, or a mix of natural and synthetic and/or man-made samples. In other embodiments, biological samples may be beverages, perfumes, foods, or any type of fluid that could contain ribonucleic acids.
As used herein, the expression “external control sample” refers to a sample as defined above, preferably a biological sample as defined above, which is obtained from a different organism from the biological samples to be studied and which comprises RNA of a reference gene gc as defined herein below.
As used herein, the expression “external control RNA” refers to a composition consisting essentially of RNA and which comprises RNA of a reference gene gd as defined herein below.
As used herein, the expression “reference gene” refers to a gene, the sequence of which is used for normalization, and whose relative level of expression in the external control biological sample or in the external control RNA is known. In the context of the invention, the reference gene gc is selected in such a way that nucleic acids, primers and/or probes used in step e) of the method of the invention to measure the cDNA level of the reference gene gc do not cross-react with cDNAs of the target gene t and of the reference gene gd and the reference gene gd is selected in such a way that nucleic acids, primers and/or probes used in step e) of the method of the invention to measure the cDNA level of the reference gene gd do not cross-react with cDNAs of the target gene t and of the reference gene gc.
As used herein, the term “cross-reacting” means hybridizing to and/or amplifying another nucleic acid sequence than the nucleic acid of interest.
In particular, the reference genes gc and gd may be homologous genes obtained from distinct species.
Preferably, the reference genes gc and gd are selected in such a way that the order of magnitude of their relative expression level is similar to the order of magnitude of the relative expression level of the target gene t. In particular, when the relative expression level of the target gene t is low, the reference genes gc and gd preferably display a low relative expression level. Similarly, when the relative expression level of the target gene t is high, the reference genes gc and gd preferably display a high relative expression level. Reference genes gc and gd displaying a low relative expression level or a high relative expression level are well known to the skilled person or can be easily determined by the skilled person using conventional techniques for measuring RNA levels in a sample.
The present invention concerns a method for comparing, in at least two samples A1 and A2, preferably at least two biological samples A1 and A2, the amount of RNA of a target gene t, comprising the steps consisting of:
a) mixing each of the at least two samples A1 and A2 with a determined amount of external control sample C, preferably of external control biological sample C, comprising RNA of a reference gene gc;
b) extracting RNA from each of the at least two mixtures A1+C and A2+C obtained in step a), in order to obtain corresponding solutions of extracted RNA;
c) mixing each of the at least two solutions of extracted RNA of A1+C and A2+C with a determined amount of external control D RNA including RNA of a reference gene gd;
d) performing reverse transcriptions on each of the at least two mixtures A1+C+D and A2+C+D obtained in step c), in order to obtain corresponding solutions comprising cDNAs of the target gene t, of the reference gene gc and of the reference gene gd;
e) measuring the cDNA levels of each of the target gene t, of the reference gene gc and of the reference gene gd in each of the at least two cDNA solutions A1+C+D and A2+C+D obtained in step d); and
f) normalizing the cDNA levels of the target gene t from the at least two samples A1 and A2, using cDNA levels of the reference genes gc and gd;
wherein the reference gene gc is selected in such a way that nucleic acids, primers and/or probes used in step e) to measure the cDNA level of the reference gene gc do not cross-react with cDNAs of the target gene t and of the reference gene gd and wherein the reference gene gd is selected in such a way that nucleic acids, primers and/or probes used in step e) to measure the cDNA level of the reference gene gd do not cross-react with cDNAs of the target gene t and of the reference gene gc.
RNA can be extracted from the sample according to any method well known to those of skill in the art. For example, methods of extraction of nucleic acids are described in Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier (1993).
The extracted RNAs can be labeled with one or more labeling moieties to allow for detection of hybridized arrayed/sample nucleic acid molecule complexes. The labeling moieties can include compositions that can be detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. The labeling moieties include radioisotopes, such as 32P, 33P or 35S, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, such as fluorescent markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like. Preferred fluorescent markers include Cy3 and Cy5 fluorophores (Amersham Pharmacia Biotech, Piscataway N.J.).
As used herein, the term “reverse transcription” refers to the transcription of single-stranded RNA into single-stranded DNA (cDNA). Techniques to perform reverse transcription are well known to the skilled person and typically involve the use of reverse transcriptases. Preferably, the step of reverse transcription is performed by RT-PCR, i.e. techniques applying a polymerase chain reaction (PCR) after conversion of RNA into complementary DNA (cDNA) by reverse transcription.
Techniques to measure cDNA levels in step e) are well known to the skilled person. Such techniques include in particular quantitative PCR and microarrays.
As used herein, the expressions “quantitative PCR”, “real time PCR” and “real time RT-PCR” are used interchangeably and refer to fluorescence-based PCR methods on photometric thermocyclers with the option for quantification of original template amounts. These techniques can include additional pre-amplification steps on a traditional thermocycler for a defined number of PCR cycles.
Preferably, the step e) of measuring cDNA levels is performed by quantitative PCR.
When step e) of measuring cDNA levels is performed by quantitative PCR, a cycle threshold Ct value is preferably obtained in step e) for target gene t, and for reference genes gc and gd in each of the at least two cDNA solutions A1+C+D and A2+C+D.
As used herein, the term “Cycle Threshold” or “Ct” refers to the cycle in exponential phase where fluorescent intensity reaches a predetermined manual or computed threshold level, significantly higher than the fluorescent background noise. Ct value thus logarithmically depends on the amount of template cDNA input and characterizes gene expression level.
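The logarithmic relation between Ct and template amount can be illustrated with a short sketch. It assumes an ideal amplification factor of 2 per cycle (perfect doubling) and an identical threshold for both reactions; neither value is stated in this description, and the function name `fold_difference` is hypothetical.

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 2.0) -> float:
    """Return how many times more template reaction A contained than reaction B.

    `efficiency` is the per-cycle amplification factor; 2.0 (perfect doubling)
    is an assumption, real assays usually fall somewhat below it.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    # A lower Ct means the threshold was reached earlier, i.e. more template.
    return efficiency ** (ct_b - ct_a)


if __name__ == "__main__":
    # Reaching the threshold 3 cycles earlier corresponds to 2**3 = 8 times more template.
    print(fold_difference(22.0, 25.0))  # 8.0
```

Because the relation is exponential, a difference of one cycle already corresponds to a two-fold difference in template under this assumption, which is why small Ct shifts caused by variable yields matter.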
As used herein, the term “normalizing” or “normalization” refers to a process enabling, when comparing the expression level of a gene in two samples, cancelling the differences due to the variable yields of each step of the comparison method.
In the context of the invention, in particular when step e) of measuring cDNA levels is performed by quantitative PCR and when a cycle threshold Ct value is obtained in step e) for target gene t, and for reference genes gc and gd, any calculation using the Ct values obtained for reference genes gc and gd can be used to normalize the results and give an estimation of experiment's yield between samples A1 and A2.
Preferably, in the method of the invention, step f) of normalizing is performed using the following equation:
wherein:
R represents the ratio of the cDNA level of the target gene t in the sample A1 on the cDNA level of the target gene t in the sample A2,
In another preferred embodiment, the step e) of measuring cDNA levels is performed using microarrays.
As used herein, the term “microarray” refers to an arrangement of hybridizable array elements. Preferably, in microarrays used according to the invention, the hybridization signal from each of the array elements is individually distinguishable.
Preferably, when step e) of measuring cDNA levels is performed using microarrays, a relative fluorescence intensity is obtained in step e) for target gene t, and for reference genes gc and gd in each of the at least two cDNA solutions A1+C+D and A2+C+D.
In the context of the invention, in particular when step e) of measuring cDNA levels is performed using microarrays and when a relative fluorescence intensity is obtained in step e) for target gene t, and for reference genes gc and gd, any calculation using the relative fluorescence intensity values obtained for reference gene gc and reference gene gd can be used to normalize the results and give an estimation of the experiment's yield between samples A1 and A2.
The present invention also concerns a kit for comparing the amount of RNA of a target gene t in at least two samples A1 and A2, preferably in at least two biological samples A1 and A2, comprising:
(i) a determined amount of an external control sample C, preferably of an external control biological sample C, comprising RNA of a reference gene gc and of a reference gene g′c;
(ii) a determined amount of external control D RNA including RNA of a reference gene gd and of a reference gene g′d;
(iii) a couple of primers that specifically amplify cDNA of the reference gene gc;
(iv) a couple of primers that specifically amplify cDNA of the reference gene g′c;
(v) a couple of primers that specifically amplify cDNA of the reference gene gd; and
(vi) a couple of primers that specifically amplify cDNA of the reference gene g′d;
wherein the reference genes gc and gd are genes with a relative low expression level and the reference genes g′c and g′d are genes with a relative high expression level.
As used herein, the term “couple of primers” refers to oligonucleotides designed to hybridize only to certain regions of target cDNA or external control cDNA to yield amplicons of a specific length in a PCR reaction.
As used herein, the expression “specifically amplify” means that said couple of primers hybridizes to and enables amplifying by PCR a sequence of a given cDNA without hybridizing to or amplifying other sequences.
Preferably, the couple of primers (iii), (iv), (v) and (vi) do not amplify cDNA of the target gene t.
In particular, the reference genes gc and gd, and g′c and g′d, which are specifically amplified by the couple of primers (iii), (iv), (v) and (vi) may be respectively homologous genes obtained from distinct species.
The present invention will be further illustrated, but not limited, by the figures and examples described herein below.
The present example demonstrates the normalizing power of the method according to the invention compared to conventional methods such as methods using HKG.
In the present example, two types of samples A1 and A2 of human origin (Homo sapiens, Hs) were studied for their relative expression of SNAIL and SLUG genes (t genes), which are transcription factors involved in mesenchyme-epithelium transition and in cancer process. Samples A1 and A2 studied are, on one hand, from normal and cirrhotic liver from two different patients with hepatocarcinoma, and on the other hand, from normal and cancerous prostatic cell lines, i.e. PNT2 and LNCaP respectively.
External control C comes from rainbow trout (Oncorhynchus mykiss, Om) and external control RNA D comes from chicken (Gallus gallus, Gs). GAPDH or ACTB genes from Om and Gs were used in the present example as gc/gd external control genes. The three species (Hs, Om and Gs) do not cross-react for the target nucleic acids and external control genes chosen for the study.
To ensure the use of standardized amounts of sample from liver biopsy, snap frozen tissues were sectioned using a cryostat-microtome to obtain 50 μm-sections with a diameter of 2 to 3 mm, corresponding to approximately 5 mg of tissue.
The prostatic cell lines PNT2 and LNCaP were maintained at 37° C. in a humidified atmosphere of 5% CO2, in RPMI 1640 medium (Gibco) supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin solution. Cells were harvested at confluence and counted using glasstic slide 10 with grids (KOVA). For RNA extraction, 50,000 cells were pelleted and stored at −80° C.
In the present example, external control C was prepared from rainbow trout muscle (Oncorhynchus mykiss, Om) in the same conditions as human liver samples. Sections of trout tissue were added to the liver samples with a tissue size ratio of 2/1 (sample/trout), and one section of Om muscle was added to 50,000 cells.
Samples and control tissue were sonicated using a Bandelin Sonopuls HD 2070 for 30 seconds at 75 W in TRIreagent (Invitrogen) and RNA extraction was done following the manufacturer's instructions.
External control RNA was prepared from chicken muscle (Gallus gallus, Gs) using the TRIreagent (Invitrogen) following manufacturer's instructions.
RNA concentration was measured using Nanodrop spectrophotometer (Thermo Scientific) and 100 ng of control RNA from Gs was added to 400 ng of RNA obtained from mixed samples liver-trout or cells-trout.
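The mixing step above reduces to a mass over concentration calculation, sketched below. The 400 ng and 100 ng amounts come from this example; the function names and the concentrations used in the demonstration are hypothetical.

```python
def volume_for_mass(mass_ng: float, concentration_ng_per_ul: float) -> float:
    """Return the volume (in microliters) that contains `mass_ng` of RNA."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / concentration_ng_per_ul


def rt_input_volumes(sample_conc: float, control_conc: float,
                     sample_ng: float = 400.0, control_ng: float = 100.0):
    """Return (sample volume, control RNA volume) in microliters for one RT reaction."""
    return (volume_for_mass(sample_ng, sample_conc),
            volume_for_mass(control_ng, control_conc))


if __name__ == "__main__":
    # Hypothetical spectrophotometer readings: 200 ng/uL for the mixed sample, 50 ng/uL for Gs RNA.
    print(rt_input_volumes(200.0, 50.0))  # (2.0, 2.0)
```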
Reverse transcription of RNA was performed with M-MLV enzyme (Invitrogen) using random primers following manufacturer's protocol.
cDNAs obtained from RT were used to perform real time rt-qPCR. Real time rt-qPCRs were performed in Step One Plus real-time PCR system (Applied Biosystems) using Sybr®green PCR master mix (Applied Biosystems) according to manufacturer's instructions.
To validate the present invention, results obtained with normalization using external controls C and D were compared to results obtained with normalization using the least variable HKG determined beforehand for each experimental condition with statistical analyses (
Expression of target genes, housekeeping genes (HKG) and external control genes was assessed with the following specifically designed forward (F) and reverse (R) primers:
The normalization using housekeeping genes (HKG) was done according to the classical delta Ct equation:
To normalize using external controls, the inventors chose to correct relative expression of SNAIL or SLUG (t genes) in A1 and A2 with their respective relative expression of GAPDH or ACTB from Om (gc genes), before correcting sample A1 with a ratio representing difference in output rt-qPCR between samples A1 and A2. This ratio is the comparison of relative expression of GAPDH or ACTB Gs (gd genes) in sample A2 with those observed in sample A1. The normalization using external controls was thus done as follows:
In another way, the difference of real time RT-PCR output between samples A1 and A2 can be estimated with direct Ct values, with the coefficient (C) calculated as follows:
Theoretically, if C is equal or very close to 1, output difference is negligible and equation 2 can be simplified as follows:
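As a hedged illustration of the two normalization strategies described in this example, the sketch below computes an A1/A2 expression ratio from Ct values. It is not a reproduction of equations 1 to 3: the HKG calculation uses the commonly applied comparative Ct form, the external control calculation is one possible reading of the verbal description above (correction by gc, then by the gd level in A2 relative to A1), and an amplification factor of 2 per cycle is assumed. All function names, dictionary keys and Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        efficiency: float = 2.0) -> float:
    """Expression of the target relative to a reference gene within one sample."""
    return efficiency ** (ct_reference - ct_target)


def ratio_hkg(a1: dict, a2: dict, efficiency: float = 2.0) -> float:
    """A1/A2 ratio of target expression, each sample normalized to its HKG."""
    return (relative_expression(a1["t"], a1["hkg"], efficiency)
            / relative_expression(a2["t"], a2["hkg"], efficiency))


def ratio_external(a1: dict, a2: dict, efficiency: float = 2.0) -> float:
    """A1/A2 ratio normalized to gc, then corrected by the gd level in A2 vs A1."""
    ratio_gc = (relative_expression(a1["t"], a1["gc"], efficiency)
                / relative_expression(a2["t"], a2["gc"], efficiency))
    # gd level in A2 relative to A1; greater than 1 when A1 had the poorer output.
    output_correction = efficiency ** (a1["gd"] - a2["gd"])
    return ratio_gc * output_correction


if __name__ == "__main__":
    # Hypothetical Ct values for two samples processed with identical yields.
    a1 = {"t": 26.0, "hkg": 18.0, "gc": 22.0, "gd": 20.0}
    a2 = {"t": 28.0, "hkg": 18.0, "gc": 22.0, "gd": 20.0}
    print(ratio_hkg(a1, a2))       # 4.0
    print(ratio_external(a1, a2))  # 4.0
```

With identical gd Ct values in both samples the correction factor equals 1 and the external control calculation reduces to the gc-normalized ratio, in the same spirit as the simplification described above for negligible output differences.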
Human samples were assessed with or without external controls (Om and Gs). 500 ng for human samples alone were compared to 400 ng of mixed sample Hs+Om plus 100 ng of Gs RNA. Tables 2 and 3 show the Ct values obtained for SNAIL, SLUG, PGK 1 or ATP5G3 genes in each sample. Corresponding standard deviations and CV values are shown. Unpaired t test was used.
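The summary statistics mentioned above can be computed as in the following sketch. It assumes the equal-variance (Student) form of the unpaired t test, which is an assumption because the variant used is not specified here; only the t statistic is computed, and the function names and Ct values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def coefficient_of_variation(values: list[float]) -> float:
    """Return the coefficient of variation in percent (sample SD / mean * 100)."""
    return 100.0 * stdev(values) / mean(values)


def unpaired_t_statistic(x: list[float], y: list[float]) -> float:
    """Student's t statistic for two independent groups (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1.0 / nx + 1.0 / ny))


if __name__ == "__main__":
    # Hypothetical Ct values for one gene measured without and with external controls.
    alone = [27.9, 28.3, 28.1, 28.0]
    mixed = [27.8, 28.2, 28.0, 28.1]
    print(coefficient_of_variation(alone))
    print(unpaired_t_statistic(alone, mixed))
```

The t statistic would then be compared with the t distribution for nx + ny - 2 degrees of freedom to obtain a p-value.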
Statistical analyses showed that addition of external controls Om and Gs did not significantly alter the Ct values obtained for each sample studied alone. Even when Hs cell lines were diluted with Om and Gs, rt-qPCR output was better, with lower Ct values, especially for LNCaP cells. Since inhibitors of real time RT-PCR could be present in cellular samples, dilution of samples with Om and Gs external controls could reduce the concentration of inhibitors that interfere with the real time RT-PCR process, leading to better experiment output. Another explanation could be the variability in sample preparation between cells alone and cells mixed with Om and Gs. Interestingly, Ct variations of HKG did not systematically follow those of SNAIL and SLUG genes.
Ratios of relative expression of SNAIL and SLUG in PNT2 compared to LNCaP were calculated in about 15 different samples of each cell line, following equation 1 using HKG expression levels (HKG normalization) and equation 2 using Om and Gs GAPDH expression levels (external control normalization). ATP5G3 was determined as the best HKG, i.e. ATP5G3 showed the least variable expression between PNT2 and LNCaP cells; whereas ACTB was considered the worst HKG tested, i.e. ACTB showed the most variable expression level between PNT2 and LNCaP cells (Table 4 and
Relative expression of SNAIL and SLUG genes in PNT2 vs LNCaP samples was measured and results from two representative rt-qPCR assays 1 and 2 are depicted in
Ratios of relative expression of SNAIL and SLUG in cirrhotic compared to normal liver were calculated in about 20 different samples of each tissue, following equation 1 using HKG expression levels (HKG normalization) and equation 2 using Om and Gs ACTB expression levels (external control normalization). PGK 1 was determined as the least variable HKG, but ATP5G3 was also a fairly good candidate for a stable HKG (
Relative expression of SNAIL and SLUG genes in cirrhotic liver vs normal liver tissues was measured and results from two representative real time RT-PCR assays 1 and 2 are depicted in
All these data demonstrated that normalization using Om and Gs external controls did not interfere with the different steps of real time RT-PCR and that the relative expression levels obtained were as reliable as those obtained with classical HKG normalization, provided that the HKG had been determined in pre-analytical experiments to be the least variable in the samples studied.
Normalization using Om and Gs external controls, described in the present example, gave similar results to those obtained with normalization using the least variable HKG, determined with pre-analytical experiments. In addition, normalization with Om and Gs external controls was reproducible. These results demonstrated that normalization according to the present invention is as efficient as normalization with the best HKG determined. The invention could be used in replacement of HKG, avoiding the time-, money- and sample-consuming pre-analytical experiments needed to determine the least variable HKG.
In addition, variations of HKG expression could be observed within the same tissue, according to the localization of the biopsy. This variation could interfere with a good normalization of real time RT-PCR results. Om and Gs external control genes have stable expression, whatever the sample or the localization of the biopsy studied.
Moreover, external controls were essential to pinpoint problems encountered during extraction and/or RT and PCR steps. Indeed, in Tables 8 and 9, cell line or liver tissue samples with high Ct values for Om and Gs are shown.
High Ct values for Om and Gs were most probably due to problems during processing of the samples. In the same samples, HKG ATP5G3 or PGK 1 Ct values were higher but within normal range of gene expression. Thus, when using HKG for data normalization, target gene expression could not be correctly observed, whereas using Om and Gs external controls, problems in output of real time RT-PCR could be revealed. These results show that HKG normalization could lead to a misinterpretation because it does not reflect variation due to sample processing.
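The use of external controls as a processing check can be sketched as follows. The rule applied here (flag a sample when its gc or gd Ct exceeds the median of the run by more than a fixed number of cycles) and the default of 2 cycles are illustrative assumptions, not criteria taken from this document; names and Ct values are hypothetical.

```python
from statistics import median


def flag_processing_problems(ct_by_sample: dict[str, dict[str, float]],
                             max_shift: float = 2.0) -> list[str]:
    """Return samples whose external control Ct is unusually high for the run.

    A sample is flagged when its gc or gd Ct exceeds the run median for that
    gene by more than `max_shift` cycles (an arbitrary illustrative cutoff).
    """
    flagged: list[str] = []
    for gene in ("gc", "gd"):
        typical = median(cts[gene] for cts in ct_by_sample.values())
        for name, cts in ct_by_sample.items():
            if cts[gene] - typical > max_shift and name not in flagged:
                flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical run: sample s3 shows a delayed gc signal, suggesting an extraction problem.
    run = {
        "s1": {"gc": 22.0, "gd": 20.0},
        "s2": {"gc": 22.3, "gd": 20.2},
        "s3": {"gc": 27.5, "gd": 20.1},
    }
    print(flag_processing_problems(run))  # ['s3']
```

In such a scheme, a high gc Ct with a normal gd Ct would point to the extraction step, whereas a high gd Ct would point to the RT or PCR steps, since gd is only added after extraction.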
Finally, the present invention can be commercialized to propose a worldwide standardization of real time rt-qPCR and microarray results, to enable inter-laboratory comparison of genomic expression. External control genes can be chosen according to their relative expression and to the expression of the gene of interest in samples A1 and A2. Indeed, external control genes with low expression could be used to study target genes with a low expression in samples of interest; conversely, external control genes with high expression could be used to study highly expressed genes of interest. This method offers the possibility to normalize results with control genes and target genes showing close Ct values, which is not possible with HKG since they are generally highly expressed.
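The choice between low-expression and high-expression external control genes can be expressed as a simple nearest-Ct rule, sketched below. The rule itself, the function name `closest_reference` and the Ct values are assumptions made for illustration.

```python
def closest_reference(target_ct: float, reference_cts: dict[str, float]) -> str:
    """Return the name of the reference whose typical Ct is nearest to the target Ct."""
    if not reference_cts:
        raise ValueError("at least one reference is required")
    return min(reference_cts, key=lambda name: abs(reference_cts[name] - target_ct))


if __name__ == "__main__":
    # Hypothetical typical Ct values for a low-expression and a high-expression control pair.
    references = {"low_expression_pair": 28.0, "high_expression_pair": 17.0}
    print(closest_reference(29.5, references))  # low_expression_pair
```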
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2012/072062 | 11/7/2012 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61556655 | Nov 2011 | US |