The present invention identifies a pregnancy signature gene set containing 12 genes, i.e., FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), wherein the expression of one or more of these genes by cumulus cells correlates to the competency of an oocyte associated therewith, or from the same female donor.
Based on this discovery, the present invention provides methods and test kits for identifying human oocytes which are potentially suitable for use in IVF procedures by detecting the level of expression of one or more of these 12 genes or corresponding polypeptides consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
Based on this discovery, the present invention provides arrays or test kits containing one or more of these genes or polypeptides or primers or antibodies that provide for the detection and/or quantification of the level of expression of one or more of these 12 genes or corresponding polypeptides consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1). For example, such test kits may contain antibodies that specifically detect one or more of the gene products encoded by these 12 genes and one or more detectable labels. Also, such test kits may comprise primers that provide for the specific amplification of one or more of these 12 genes in a sample such as a nucleic acid sample obtained from cumulus cells which are associated with oocytes potentially to be used for fertilization or IVF procedures.
Based on the foregoing, the present invention further provides genetic methods, and materials (microarrays, test kits) for use therein, for identifying female subjects, preferably human females, having impaired fertility function, e.g., as a result of impaired ovarian function because of age (menopause), an underlying disease condition or drug therapy, by analyzing the expression of one or more of these 12 specific genes on cumulus cells obtained from oocytes isolated from said female subject.
Also, the invention provides methods of evaluating the efficacy of a putative fertility or hormonal treatment by assessing its effect on the expression of one, two, three, four, five, six, seven, eight, nine, ten, eleven or all 12, or any combination thereof, of 12 specific genes, i.e., FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), by cumulus cells of a female subject receiving this fertility or hormonal treatment.
Currently, there is no reliable commercially available genetic or non-genetic procedure for identifying whether a female subject produces oocytes that are “pregnancy competent”, i.e., oocytes which when fertilized by natural or artificial means are capable of giving rise to embryos that in turn are capable of yielding viable offspring when transferred to an appropriate uterine environment. Rather, conventional fertility assessment methods assess fertility e.g., based on hormonal levels, visual inspection of numbers and quality of oocytes, surgical or non-invasive (MRI) inspection of the female reproduction system organs, and the like. Often, when a woman has a problem in producing a viable pregnancy after a prolonged duration, e.g., more than a year, the diagnosis may be an “unexplained” fertility problem and the woman advised to simply keep trying or to seek other options, e.g., adoption or surrogacy.
Perhaps in part because of the lack of a means for identifying pregnancy competent oocytes, the success rate for assisted reproductive technology (ART) and the pregnancy and birth rates following in vitro fertilization (IVF) attempts remain low. Subjective morphological parameters are still a primary criterion for selecting healthy embryos used in IVF and ICSI programs. However, such criteria do not truly predict the competence of an embryo. Many studies have shown that a combination of several different morphologic criteria leads to more accurate embryo selection. Morphological criteria for embryo selection are assessed on the day of transfer, and are principally based on early embryonic cleavage (25-27 h post insemination), the number and size of blastomeres on day two, day three, or day five, fragmentation percentage and the presence of multi-nucleation in the 4 or 8 cell stage (Fenwick et al., Hum Reprod, 17, 407-12. (2002)).
A recent study has shown that the selection of oocytes for insemination does not improve outcome of ART as compared to the transfer of all available embryos, irrespective of their quality (La Sala et al., Fertil Steril. (2008)).
There is a need to identify viable embryos with the highest implantation potential to increase IVF success rates, reduce the number of embryos for fresh replacement and lower multiple pregnancy rates. For all these reasons, several biomarkers for embryo selection are currently being investigated (Haouzi et al., Gynecol Obstet Fertil, 36, 730-742. (2008); He et al., Nature, 444, 12-3. (2006)).
As embryos that result in pregnancy differ in their metabolic profiles compared to embryos that do not, some studies are trying to identify a molecular signature that can be detected by non-invasive evaluation of the embryo culture medium (Brison et al., Hum Reprod, 19, 2319-24. (2004); Gardner et al., Fertil Steril, 76, 1175-80. (2001); Sakkas and Gardner, Curr Opin Obstet Gynecol, 17, 283-8 (2005); Seli et al., Fertil Steril, 88, 1350-7. (2007); Zhu et al. Fertil Steril. (2007)).
Genomics is also providing vital knowledge of genetic and cellular function during embryonic development. McKenzie et al., Hum Reprod, 19, 2869-74. (2004) and Feuerstein et al., Hum Reprod, 22, 3069-77 have reported that the expression of several genes in cumulus cells, such as cyclooxygenase 2 (COX2), was indicative of oocyte and embryo quality. In addition, Gremlin 1 (GREM1), hyaluronic acid synthase 2 (HAS2), steroidogenic acute regulatory protein (STAR), stearoyl-coenzyme A desaturase 1 and 5 (SCD1 and 5), amphiregulin (AREG) and pentraxin 3 (PTX3) have also been reported to be positively correlated with embryo quality (Zhang et al., Fertil Steril, 83 Suppl 1, 1169-79. (2005)). More recently, the expression of glutathione peroxidase 3 (GPX3), chemokine receptor 4 (CXCR4), cyclin D2 (CCND2) and catenin delta 1 (CTNND1) in human cumulus cells has been shown to be inversely correlated with embryo quality, based on early-cleavage rates during embryonic development (van Montfoort et al., Mol Hum Reprod, 14, 157-68. (2008)).
Also, Cillo et al., Reprod. 134:645-50 (2007) suggest a correlation between the expression of certain cumulus genes, i.e., HAS2, GREM1 and PTX3, and oocyte quality and embryo development. Still further, Assidi et al. Biol. Reprod. 79(2) 209-222 (2008) suggest a correlation between the expression of certain cumulus genes, i.e., EGFR, CD44, HAS2, PTGS2 and BTC, and oocyte quality and development of embryos therefrom. Further, Bettegowda et al., Biol. Reprod. 79(2):301-309 (2008) suggest a correlation between the expression of certain proteinase cathepsin genes and bovine oocyte quality and development of offspring therefrom.
In addition, a patent recently issued to Zhang et al. (Aug. 11, 2009) claims the detection of pentraxin 3 and a BCL-2 member on cumulus cells to assess oocyte quality. Also, US20040058975, published on Mar. 25, 2004, teaches that antagonism of the EP2 receptor and/or cyclooxygenase COX-2 promotes cumulus cell proliferation and oocyte development.
Also, while early cleavage has been shown to be a reliable biomarker for predicting pregnancy (Lundin et al., Hum Reprod, 16, 2652-7. (2001); Van Montfoort et al., Hum Reprod, 19, 2103-8 (2004); Yang et al., Fertil Steril, 88, 1573-8 (2007)), little has been reported correlating gene expression profiles of cumulus cells with respect to pregnancy outcome (but see Assou et al., Mol Hum Reprod. 2008 December; 14(12):711-9. Epub 2008 Nov. 21).
Therefore, notwithstanding the foregoing, providing alternative and more predictive methods for identifying oocytes suitable for use in IVF procedures and in identifying the genetic bases of fertility problems in women would be highly desirable. In particular an identification of other genes, and biomarkers, the expression of which by cumulus cells correlates to pregnancy competency of oocytes and test kits and assays using same would be highly desirable as this could enhance the outcome of IVF procedures.
These methods and test kits would in addition provide for the identification of women with oocyte related fertility problems, which is desirable as such fertility problems may correlate to other health issues that preclude pregnancy, e.g., cancer, menopausal condition, hormonal dysfunction, ovarian cyst, or other underlying disease or health related problems.
The present invention relates to a method for selecting a competent oocyte, e.g., one that gives rise to a fertilized embryo that yields a viable pregnancy, comprising a step of measuring the expression level of any combination of one or more of 12 genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1) by a cumulus cell associated with an oocyte or from an oocyte from the same female donor and comparing said gene expression to a suitable control, e.g., cumulus cells of female donors with normal oocytes, i.e., which give rise to viable pregnancies.
The present invention also relates to a method for selecting a competent embryo, comprising a step of measuring the expression level of specific genes in a cumulus cell surrounding the embryo, wherein said genes include or consist of genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
The present invention also relates to a method for selecting a competent oocyte or a competent embryo, comprising a step of measuring in a cumulus cell surrounding said oocyte or said embryo the expression level of one or more genes selected from FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
Aberrant expression levels of one or more of these genes are predictive of a non-competent oocyte or embryo due to early embryo arrest.
As discussed infra, it has been found that the level of expression of these genes by a cumulus cell of a woman donor correlates to the likelihood that an oocyte associated with said cumulus cell or derived from the same subject is “pregnancy competent” when fertilized by natural or artificial means. These genes and expression levels constitute what Applicants refer to as the “pregnancy signature”. In addition, the pregnancy signature may further include one or more of the genes disclosed in Applicant's prior applications identified supra.
It is a related object of the invention to provide a novel method of determining whether an individual has a genetically associated fertility problem which potentially renders the individual's oocytes unsuitable for use in IVF methods based on the detected level of expression of one or more genes or corresponding polypeptides which constitute the “pregnancy signature.” The genes and gene products which constitute the pregnancy signature are again preferably selected from FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
It is another object of the invention to provide a method of evaluating the efficacy of a female fertility treatment which comprises: treating a female subject putatively having a problem that prevents or inhibits her from having a “viable pregnancy” and isolating at least one oocyte from said female subject and cells associated therewith after said fertility treatment; isolating at least one cumulus cell associated with said isolated oocyte, and detecting the level of expression of at least one gene selected from FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), or their orthologs, splice or allelic variants, that is expressed at a characteristic level of expression in “pregnancy competent” oocytes; and determining the putative efficacy of said fertility treatment based on whether said gene is expressed at a level characteristic of “pregnancy competent” oocytes as a result of treatment.
It is another specific object of the invention to provide novel methods of treating infertility by modulating the expression of one or more genes that constitute the pregnancy signature. These methods include the administration of compounds that agonize or antagonize the expression of one or more of the genes selected from FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), or their orthologs, splice or allelic variants.
It is another object of the invention to provide animal models for evaluating the efficacy of putative fertility treatments comprising identifying genes which are expressed at characteristic levels in cumulus cells associated with pregnancy competent oocytes of a non-human animal, e.g., a non-human primate; and assessing the efficacy of a putative fertility treatment in said non-human animal based on its effect on said gene expression levels, i.e., whether said treatment results in said gene expression levels better mimicking gene expression levels observed in cumulus cells associated with pregnancy competent oocytes (“pregnancy signature”), i.e., one or more of the 12 genes selected from FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), or their orthologs, splice or allelic variants.
FIG. 1 contains a flow chart of methods used to identify the subject “pregnancy signature”, i.e., 12 genes the expression of which on cumulus cells correlates to the pregnancy competency or ability of an oocyte associated with said cumulus cell or from the same female human or other mammalian donor to be capable of fertilization and when used in an IVF procedure capable of giving rise to a viable fetus and live offspring.
FIG. 2 shows the predictive value and specificity of the subject gene detection methods according to Youden's index.
Prior to discussing the invention in more detail, the following definitions are provided. Otherwise all words and phrases in this application are to be construed by their ordinary meaning, as they would be interpreted by an ordinary skilled artisan within the context of the invention.
“Pregnancy-competent oocyte”: refers to a female gamete or egg that when fertilized by natural or artificial means is capable of yielding a viable pregnancy when it is comprised in a suitable uterine environment.
The term “competent embryo” similarly refers to an embryo with a high implantation rate leading to pregnancy. The term “high implantation rate” means the potential of the embryo, when transferred into the uterus, to be implanted in the uterine environment and to give rise to a viable fetus, which in turn develops into a viable offspring absent a procedure or event that terminates said pregnancy.
“Viable-pregnancy”: refers to the development of a fertilized oocyte when contained in a suitable uterine environment and its development into a viable fetus, which in turn develops into a viable offspring absent a procedure or event that terminates said pregnancy.
“Cumulus cell” refers to a cell comprised in a mass of cells that surrounds an oocyte. This is an example of an “oocyte associated cell”. These cells are believed to be involved in providing an oocyte some of its nutritional and/or other requirements that are necessary to yield an oocyte which upon fertilization is “pregnancy competent”.
“Differential gene expression” refers to genes the expression of which varies within a tissue of interest; herein preferably a cell associated with an oocyte, e.g., a cumulus cell.
“Real Time RT-PCR”: refers to a method or device used therein that allows for the simultaneous amplification and quantification of specific RNA transcripts in a sample.
“Microarray analysis”: refers to the quantification of the expression levels of specific genes in a particular sample, e.g., tissue or cell sample.
“Pregnancy signature”: herein preferably refers to the normal level of expression of one or more genes or polypeptides that are selected from or encoded by the specific genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), and their orthologs, splice or allelic variants, wherein these genes or polypeptides are expressed in normal cumulus cells at levels which correlate to the likelihood that an oocyte that is associated with a cumulus cell which expresses said one or more genes or polypeptides at these characteristic levels is more likely to give rise to a viable pregnancy. Alternatively, the signature may include one or more of the genes differentially expressed by cumulus cells the expression of which also correlates to pregnancy competent oocytes which are identified in the patent applications incorporated by reference herein.
“Characteristic level of expression of a cumulus gene” herein with respect to a particular detected expressed nucleic acid sequence or polypeptide means that the particular gene or polypeptide is expressed at levels which are substantially similar to the levels observed in cumulus cells that are associated with a normal cumulus cell or one associated with a normal or developmentally competent oocyte.
By “substantially similar” is meant that the levels of expression of individual genes are preferably within the range of +/−1-5 fold of the level of expression by a normal cumulus cell, more preferably within the range of +/−1-3-fold, still more preferably within the range of +/−1-1.5 fold and most preferably within the range of +/−1.0-1.4, 1.0-1.3, 1.0-1.2 or 1.0-1.1 fold of the detected levels of expression of the gene or polypeptide by a normal cumulus cell.
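As an illustration only, the fold-range test described above can be sketched in Python. The sketch assumes that "within N fold" means that the ratio of the sample level to the normal level, or its inverse, does not exceed N; this reading, the function names and the example values are hypothetical and are not taken from the specification.

```python
def fold_difference(sample_level: float, normal_level: float) -> float:
    """Symmetric fold difference (always >= 1) between two positive expression levels."""
    if sample_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = sample_level / normal_level
    # Treat 2-fold higher and 2-fold lower expression as the same distance.
    return max(ratio, 1.0 / ratio)


def is_substantially_similar(sample_level: float, normal_level: float,
                             max_fold: float = 5.0) -> bool:
    """True when the sample lies within max_fold of the normal level (assumed reading)."""
    return fold_difference(sample_level, normal_level) <= max_fold


# Hypothetical normalized expression values, for illustration only.
print(is_substantially_similar(120.0, 100.0, max_fold=1.5))  # 1.2-fold higher
print(is_substantially_similar(20.0, 100.0, max_fold=3.0))   # 5-fold lower
```

Lowering `max_fold` from 5 toward 1.1 corresponds to moving from the preferred range to the most preferred ranges recited above.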
According to the invention, the oocyte may result from a natural cycle, a modified natural cycle or a stimulated cycle for cIVF or ICSI. The term “natural cycle” refers to the natural cycle by which the female or woman produces an oocyte. The term “modified natural cycle” refers to the process by which the female or woman produces an oocyte or two under a mild ovarian stimulation with GnRH antagonists associated with recombinant FSH or hMG. The term “stimulated cycle” refers to the process by which a female or a woman produces one or more oocytes under stimulation with GnRH agonists or antagonists associated with recombinant FSH or hMG.
“Oocyte or cumulus cell determined to possess suitable pregnancy signature or to be pregnancy competent” refers to an oocyte or a cumulus cell associated with the oocyte or an oocyte derived from the same subject at around the same time (within 0-6 months) as the tested cumulus cell which has been determined to express at least one of the genes or polypeptides encoded by the following genes: FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), or an ortholog or splice or allelic variant thereof, in a manner characteristic of the level of expression by a normal cumulus cell. Preferably at least 2 or 3 genes are expressed in a characteristic manner, more preferably at least 3-5 genes, or their allelic or splice variants. It should be understood that if the expression of numerous genes is evaluated in the subject genetic based assays, such as in the order of 10 or more, a suitable pregnancy signature means that all or substantially all, i.e., at least 70-80%, of the detected genes are expressed in a manner characteristic of a normal cumulus cell. For example, if the expression of 10 genes is detected, at least 7, 8 or 9 of the genes will preferably be expressed at the levels consistent with a normal cumulus cell, i.e., one associated with an oocyte capable of giving rise to a normal embryo and viable pregnancy.
In general, with respect to the pregnancy signature, the characteristic levels of expression are observed for any combination of the afore-identified 12-gene pregnancy signature set, i.e., any combination of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 of the afore-identified genes that are expressed at characteristic levels in cumulus cells that surround “pregnancy competent” oocytes. This is intended to encompass the level at which the gene is expressed and the distribution of gene expression within cumulus cells analyzed.
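The "all or substantially all" criterion above can be sketched as a simple fraction test. This is a minimal illustration, assuming that each detected gene has already been scored as expressed, or not expressed, at a level characteristic of a normal cumulus cell; the function name, the default threshold of 70% and the per-gene results shown are hypothetical.

```python
def has_suitable_signature(characteristic_by_gene: dict, min_fraction: float = 0.7) -> bool:
    """True when at least min_fraction of the detected genes are expressed
    in a manner characteristic of a normal cumulus cell."""
    if not characteristic_by_gene:
        raise ValueError("at least one gene must be evaluated")
    n_characteristic = sum(bool(v) for v in characteristic_by_gene.values())
    return n_characteristic / len(characteristic_by_gene) >= min_fraction


# Hypothetical per-gene calls for 10 of the 12 signature genes (illustration only).
calls = {
    "FGF12": True, "GPR137B": True, "SLC2A9": True, "ARID1B": False,
    "NR2F6": True, "ZNF132": True, "FAM36A": True, "ZNF93": False,
    "RHBDL2": True, "DNAJC15": False,
}
print(has_suitable_signature(calls))        # 7 of 10 genes, threshold 70%
print(has_suitable_signature(calls, 0.8))   # 7 of 10 genes, threshold 80%
```

Raising `min_fraction` from 0.7 to 0.8 reflects the upper end of the 70-80% range stated above.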
“Pregnancy signature gene”: refers to a gene which is expressed at characteristic levels by a cumulus cell which is associated with a normal or “pregnancy competent” oocyte. These genes are FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), and their orthologs, splice and allelic variants. These 12 human genes are referenced by their name as well as Accession number. It should be understood that the invention further encompasses detection of allelic and splice variants of these genes and species orthologs.
“Probe suitable for detection of the expression of a pregnancy signature gene or polypeptide” refers to a nucleic acid sequence or sequences or ligand such as an antibody that specifically detects the expression of the transcribed gene or corresponding polypeptide. In a preferred embodiment expression is detected by use of real-time PCR detection methods.
“IVF”: refers to in vitro fertilization.
The term “classical in vitro fertilization” or “cIVF” refers to a process by which oocytes are fertilized by sperm outside of the body, in vitro. IVF is a major treatment in infertility when in vivo conception has failed. The term “intracytoplasmic sperm injection” or “ICSI” refers to an in vitro fertilization procedure in which a single sperm is injected directly into an oocyte. This procedure is most commonly used to overcome male infertility factors, although it may also be used where oocytes cannot easily be penetrated by sperm, and occasionally as a method of in vitro fertilization, especially that associated with sperm donation.
“Zona pellucida” refers to the outermost region of an oocyte.
“Method for detecting differentially expressed genes” encompasses any known method for quantitatively evaluating differential gene expression using a probe that specifically detects the expressed gene transcript or encoded polypeptide. Examples of such methods include indexing differential display reverse transcription polymerase chain reaction (DDRT-PCR; Mahadeva et al, 1998, J. Mol. Biol. 284:1391-1318; WO 94/01582); subtractive mRNA hybridization (see Advanced Mol. Biol.; R. M. Twyman (1999) Bios Scientific Publishers, Oxford, p. 334); the use of nucleic acid arrays or microarrays (see Nature Genetics, 1999, vol. 21, Suppl. 1061); the serial analysis of gene expression (SAGE; see e.g., Velculescu et al, Science (1995) 270:484-487); and real time PCR (RT-PCR). For example, differential levels of a transcribed gene in an oocyte cell can be detected by use of Northern blotting and/or RT-PCR. A preferred method is the CRL amplification protocol, which refers to the novel total RNA amplification protocol that combines template-switching PCR and T7 based amplification methods. This protocol is well suited for samples wherein only a few cells or limited total RNA is available.
Preferably, the “pregnancy signature” genes are detected by hybridization of RNA or DNA to DNA chips, e.g., filter arrays comprising cDNA sequences or glass chips containing cDNA or in situ synthesized oligonucleotide sequences. Filter arrays are typically better for high and medium abundance genes. DNA chips can detect low abundance genes. In the exemplary embodiment the sample may be probed with Affymetrix GeneChips comprising genes from the human genome or a subset thereof.
Alternatively, polypeptide arrays comprising the polypeptides encoded by pregnancy signature genes or antibodies that bind thereto may be produced and used for detection and diagnosis.
“EASE” is a gene ontology protocol that from a list of genes forms subgroups based on functional categories assigned to each gene based on the probability of seeing the number of subgroup genes within a category given the frequency of genes from that category appearing on the microarray.
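The kind of probability referred to in the EASE definition can be illustrated with a plain hypergeometric upper-tail calculation. This is a simplified sketch and not the EASE software itself, whose published score applies a more conservative adjustment that is not reproduced here; the function name, parameter names and counts below are hypothetical.

```python
from math import comb


def category_enrichment_p(list_hits: int, list_size: int,
                          array_hits: int, array_size: int) -> float:
    """Probability of drawing at least list_hits genes of a functional category
    in a gene list of list_size, when array_hits of the array_size genes on the
    microarray belong to that category (hypergeometric upper tail)."""
    if not (0 <= list_hits <= list_size <= array_size and 0 <= array_hits <= array_size):
        raise ValueError("inconsistent counts")
    max_hits = min(list_size, array_hits)
    total = comb(array_size, list_size)
    tail = sum(
        comb(array_hits, k) * comb(array_size - array_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 4 of 12 listed genes fall in a category that
# covers 200 of 20000 genes on the array.
print(category_enrichment_p(list_hits=4, list_size=12, array_hits=200, array_size=20000))
```

A small probability indicates that the category is over-represented in the gene list relative to its frequency on the microarray.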
Based on the foregoing, the present invention provides a novel method of detecting whether a female, preferably human or non-human mammal, produces “pregnancy competent” oocytes or whether a particular oocyte is pregnancy competent. The method involves detecting the levels of expression of one or more genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1) that are expressed at characteristic levels by cumulus cells associated with (surrounding) oocytes that are “pregnancy competent”, i.e., these oocytes when fertilized by natural or artificial means (IVF), and transferred into a suitable uterine environment are capable of yielding a viable pregnancy, i.e., an embryo that develops into a viable fetus and eventually an offspring unless the pregnancy is terminated by some event or procedure, e.g., a surgical or hormonal intervention.
As described herein, the inventors have determined a set of 12 genes expressed in cumulus cells that are biomarkers for embryo potential and pregnancy outcome. They demonstrated that the gene expression profile of the cumulus cells which surround an oocyte correlates to different pregnancy outcomes, allowing the identification of a specific expression signature of embryos developing toward pregnancy. Their results indicate that analysis of cumulus cells surrounding the oocyte is a non-invasive approach for embryo selection.
The set of 12 predictive genes herein are known human genes. However, the expression of these genes (on cumulus cells) had not heretofore been correlated to oocyte competency or embryo development. Therefore, this invention relates to a method for selecting a competent oocyte, comprising a step of measuring the expression level of specific genes in a cumulus cell surrounding said oocyte, wherein said genes include at least one of the genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
The methods of the invention may further comprise a step consisting of comparing the expression level of the genes in the sample with a control, wherein detecting a difference in the expression level of the genes between the sample and the control is indicative of whether the oocyte is competent. The control may consist of a sample comprising cumulus cells associated with a competent oocyte or of a sample comprising cumulus cells associated with an unfertilized oocyte.
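The comparison against a control can be sketched as below, under the assumption that expression is measured by real-time RT-PCR and summarized with the widely used 2^-ΔΔCt relative quantification. The specification does not prescribe this calculation; the use of a reference (housekeeping) gene, the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold expression of a target gene in the sample relative to the control,
    each normalized to a reference gene (2^-ddCt; assumes ~100% PCR efficiency)."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the tested cumulus cells
    delta_control = ct_target_control - ct_ref_control   # dCt in the control cumulus cells
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target amplifies one cycle earlier in the sample
# than in the control, with identical reference-gene Ct values.
fold = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(fold)  # 2.0, i.e. two-fold higher than the control
```

A value near 1 indicates expression similar to the control, while values well above or below 1 indicate a difference in expression level between the sample and the control.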
The methods of the invention are applicable preferably to human women but may be applicable to other mammals (e.g., primates, dogs, cats, pigs, cows) including endangered species wherein IVF procedures are often used in zoos in order to increase population numbers.
The methods of the invention are particularly suitable for assessing the efficacy of an in vitro fertilization treatment. Accordingly, the invention also relates to a method for assessing the efficacy of a controlled ovarian hyperstimulation (COS) protocol in a female subject comprising: i) providing from said female subject at least one oocyte with its cumulus cells; and ii) determining by a method of the invention whether said oocyte is a competent oocyte.
Then, after such a method, the embryologist may select the competent oocytes and in vitro fertilize them, for example using a classical in vitro fertilization (cIVF) protocol or under an intracytoplasmic sperm injection (ICSI) protocol.
A further object of the invention relates to a method for monitoring the efficacy of a controlled ovarian hyperstimulation (COS) protocol comprising: i) isolating from a woman at least one oocyte with its cumulus cells under natural, modified or stimulated cycles; ii) determining by a method of the invention whether said oocyte is a competent oocyte; and iii) monitoring the efficacy of COS treatment based on whether it results in a competent oocyte.
The COS treatment may be based on at least one active ingredient selected from the group consisting of GnRH agonists or antagonists associated with recombinant FSH or hMG.
The present invention also relates to a method for selecting a competent embryo, comprising a step of measuring the expression level of at least one of the 12 genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
The methods of the invention may further comprise a step consisting of comparing the expression level of the genes in the sample with a control, wherein detecting a difference in the expression level of the genes between the sample and the control is indicative of whether the embryo is competent. The control may consist of a sample comprising cumulus cells associated with an embryo that gives rise to a viable fetus or of a sample comprising cumulus cells associated with an embryo that does not give rise to a viable fetus.
It is noted that the methods of the invention are independent of morphological considerations of the embryo. Two embryos may have the same morphological aspects yet, as assessed by a method of the invention, present different implantation rates leading to pregnancy.
The methods of the invention are applicable preferably to human women but may be applicable to other mammals, both domesticated and non-domesticated, such as endangered species (e.g. primates, dogs, cats, pigs, cows, tigers, lions, pandas, cheetahs, etc.).
The present invention also relates to a method for determining whether an embryo is a competent embryo, comprising a step consisting of measuring the expression level of specific genes in a cumulus cell surrounding the embryo, wherein said genes include at least one of the 12 genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
The present invention also relates to a method for determining whether an embryo is a competent embryo, comprising: i) providing an oocyte with its cumulus cells; ii) in vitro fertilizing said oocyte; and iii) determining whether the embryo that results from step ii) is competent by determining by a method of the invention whether said oocyte of step i) is a competent oocyte.
The present invention also relates to a method for selecting a competent oocyte or a competent embryo, comprising a step of measuring in a cumulus cell surrounding said oocyte or said embryo the expression level of at least one of the 12 genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1). Aberrant expression of one or more of the selected genes may be predictive of a non-competent oocyte or embryo, of the inability of the embryo to implant, or of a non-competent oocyte or embryo due to early embryo arrest.
The methods of the invention are particularly suitable for enhancing the pregnancy outcome of a female. Accordingly, the invention also relates to a method for enhancing the pregnancy outcome of a female comprising: i) selecting a competent embryo by performing a method of the invention; ii) implanting the embryo selected at step i) in the uterus of said female, wherein said female may or may not be the oocyte donor.
The method as above described will thus help the embryologist to avoid transferring to the uterus embryos with a poor potential for pregnancy outcome. The method as above described is also particularly suitable for avoiding multiple pregnancies by selecting the competent embryo able to lead to an implantation and a viable, full-term pregnancy.
Determination of the expression level of the genes in the “pregnancy signature”, i.e., at least one of the 12 genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), can be performed by a variety of techniques. Generally, the expression level as determined is a relative expression level.
More preferably, the determination comprises contacting the sample with selective reagents such as probes, primers or ligands, and thereby detecting the presence, or measuring the amount, of polypeptide or nucleic acids of interest originally in the sample. Contacting may be performed in any suitable device, such as a plate, microtitre dish, test tube, well, glass, column, and so forth. In specific embodiments, the contacting is performed on a substrate coated with the reagent, such as a nucleic acid array or a specific ligand array. The substrate may be a solid or semi-solid substrate such as any suitable support comprising glass, plastic, nylon, paper, metal, polymers and the like. The substrate may be of various forms and sizes, such as a slide, a membrane, a bead, a column, a gel, etc. The contacting may be made under any condition suitable for a detectable complex, such as a nucleic acid hybrid or an antibody-antigen complex, to be formed between the reagent and the nucleic acids or polypeptides of the sample.
In a preferred embodiment, the expression level may be determined by determining the quantity of mRNA.
Methods for determining the quantity of mRNA are well known in the art. For example, the nucleic acid contained in the samples (e.g., cell or tissue prepared from the patient) is first extracted according to standard methods, for example using lytic enzymes or chemical solutions, or extracted by nucleic-acid-binding resins following the manufacturer's instructions. The extracted mRNA is then detected by hybridization (e.g., Northern blot analysis) and/or amplification (e.g., RT-PCR). Quantitative or semi-quantitative RT-PCR is preferred. Real-time quantitative or semi-quantitative RT-PCR is particularly advantageous. Other methods of amplification include ligase chain reaction (LCR), transcription-mediated amplification (TMA), strand displacement amplification (SDA) and nucleic acid sequence based amplification (NASBA).
Nucleic acids having at least 10 nucleotides and exhibiting sequence complementarity or homology to the mRNA of interest herein find utility as hybridization probes or amplification primers. It is understood that such nucleic acids need not be identical, but are typically at least about 80% identical to the homologous region of comparable size, more preferably 85% identical and even more preferably 90-95% identical. In certain embodiments, it is advantageous to use nucleic acids in combination with appropriate means, such as a detectable label, for detecting hybridization. A wide variety of appropriate indicators are known in the art, including fluorescent, radioactive, enzymatic or other ligands (e.g. avidin/biotin).
Probes typically comprise single-stranded nucleic acids of between 10 and 1000 nucleotides in length, for instance of between 10 and 800, more preferably of between 15 and 700, typically of between 20 and 500. Primers typically are shorter single-stranded nucleic acids, of between 10 and 25 nucleotides in length, designed to perfectly or almost perfectly match a nucleic acid of interest, to be amplified. The probes and primers are “specific” to the nucleic acids they hybridize to, i.e. they preferably hybridize under high stringency hybridization conditions (corresponding to the highest melting temperature Tm, e.g., 50% formamide, 5× or 6× SSC, where 1× SSC is 0.15 M NaCl, 0.015 M Na-citrate). The nucleic acid primers or probes used in the above amplification and detection method may be assembled as a kit. Such a kit includes consensus primers and molecular probes. A preferred kit also includes the components necessary to determine if amplification has occurred. The kit may also include, for example, PCR buffers and enzymes; positive control sequences; reaction control primers; and instructions for amplifying and detecting the specific sequences.
In a particular embodiment, the methods of the invention comprise the steps of providing total RNAs extracted from cumulus cells and subjecting the RNAs to amplification and hybridization to specific probes, more particularly by means of a quantitative or semiquantitative RT-PCR.
In another preferred embodiment, the expression level is determined by DNA chip analysis. Such a DNA chip or nucleic acid microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead. A microchip may be constituted of polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, or nitrocellulose. Probes comprise nucleic acids such as cDNAs or oligonucleotides that may be about 10 to about 60 base pairs. To determine the expression level, a sample from a test subject, optionally first subjected to a reverse transcription, is labeled and contacted with the microarray in hybridization conditions, leading to the formation of complexes between target nucleic acids that are complementary to probe sequences attached to the microarray surface. The labeled hybridized complexes are then detected and can be quantified or semi-quantified. Labeling may be achieved by various methods, e.g. by using radioactive or fluorescent labeling. Many variants of the microarray hybridization technology are available to the man skilled in the art (see e.g. the review by Hoheisel, Nature Reviews, Genetics, 2006, 7:200-210).
In this context, the invention further provides a DNA chip comprising a solid support which carries nucleic acids that are specific to at least one of the 12 genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1).
Other methods for determining the expression level of said genes include the determination of the quantity of proteins encoded by said genes.
Such methods comprise contacting the sample with a binding partner capable of selectively interacting with a marker protein present in the sample. The binding partner is generally an antibody that may be polyclonal or monoclonal, preferably monoclonal.
The presence of the protein can be detected using standard electrophoretic and immunodiagnostic techniques, including immunoassays such as competition, direct reaction, or sandwich type assays. Such assays include, but are not limited to, Western blots; agglutination tests; enzyme-labeled and mediated immunoassays, such as ELISAs; biotin/avidin type assays; radioimmunoassays; immunoelectrophoresis; immunoprecipitation, etc. The reactions generally include revealing labels such as fluorescent, chemiluminescent, radioactive, enzymatic labels or dye molecules, or other methods for detecting the formation of a complex between the antigen and the antibody or antibodies reacted therewith.
The aforementioned assays generally involve separation of unbound protein in a liquid phase from a solid phase support to which antigen-antibody complexes are bound. Solid supports which can be used in the practice of the invention include substrates such as nitrocellulose (e.g., in membrane or microtitre well form); polyvinylchloride (e.g., sheets or microtitre wells); polystyrene latex (e.g., beads or microtitre plates); polyvinylidene fluoride; diazotized paper; nylon membranes; activated beads, magnetically responsive beads, and the like. More particularly, an ELISA method can be used, wherein the wells of a microtiter plate are coated with an antibody against the protein to be tested. A biological sample containing or suspected of containing the marker protein is then added to the coated wells. After a period of incubation sufficient to allow the formation of antibody-antigen complexes, the plate(s) can be washed to remove unbound moieties and a detectably labeled secondary binding molecule added. The secondary binding molecule is allowed to react with any captured sample marker protein, the plate is washed, and the presence of the secondary binding molecule is detected using methods well known in the art.
Alternatively an immunohistochemistry (IHC) method may be preferred. IHC specifically provides a method of detecting targets in a sample or tissue specimen in situ. The overall cellular integrity of the sample is maintained in IHC, thus allowing detection of both the presence and location of the targets of interest. Typically a sample is fixed with formalin, embedded in paraffin and cut into sections for staining and subsequent inspection by light microscopy. Current methods of IHC use either direct labeling or secondary antibody-based or hapten-based labeling. Examples of known IHC systems include, for example, EnVision™ (DakoCytomation), Powervision® (Immunovision, Springdale, Ariz.), the NBA™ kit (Zymed Laboratories Inc., South San Francisco, Calif.), HistoFine® (Nichirei Corp, Tokyo, Japan).
In a particular embodiment, a tissue section (e.g. a sample comprising cumulus cells) may be mounted on a slide or other support after incubation with antibodies directed against the proteins encoded by the genes of interest. Then, microscopic inspection of the sample mounted on a suitable solid support may be performed. For the production of photomicrographs, sections comprising samples may be mounted on a glass slide or other planar support, to highlight by selective staining the presence of the proteins of interest.
Therefore an IHC procedure may include, for instance: (a) preparing a sample comprising cumulus cells, (b) fixing and embedding said cells, and (c) detecting the proteins of interest in said cell samples. In some embodiments, an IHC staining procedure may comprise steps such as: cutting and trimming tissue, fixation, dehydration, paraffin infiltration, cutting in thin sections, mounting onto glass slides, baking, deparaffination, rehydration, antigen retrieval, blocking steps, applying primary antibodies, washing, applying secondary antibodies (optionally coupled to a suitable detectable label), washing, counter staining, and microscopic examination.
The invention also relates to a kit for performing the methods as above described, wherein said kit comprises means for measuring the expression levels of at least one of the 12 genes selected from the group consisting of FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1) that are indicative of whether the oocyte or the embryo is competent.
The invention is further illustrated by working examples and by the following description of how the inventors determined that the expression of one or more of these 12 genes by cumulus cells correlates with oocyte competency and embryo development upon implantation. However, these examples and description should not be interpreted in any way as limiting the scope of the present invention.
The present inventors used accepted statistical methods to assess specific genes whose levels of expression by cumulus cells correlate with the pregnancy competency of an oocyte associated therewith or from the same donor. The methods are summarized below:
Statistical methods and algorithms used to identify the 12 gene signature of the present invention are further described below.
Gene Signature Refinement
We ran TLDAs on 49 (24N; 25F) samples that have been used in microarray profiling with 196 genes that can be represented on the TLDA.
TLDA Output Normalization
Scaling
From the TLDA analysis, we have two sets of output: Ct values (logged expression levels) and dCt values, where for a given sample, each gene's dCt value is calculated by subtracting the Ct value of an endogenous control, in this case the 18S endogenous control gene imprinted on all TLDA plates, from the gene's Ct value. Since Ct values are logarithmic, this corresponds to dividing each gene's expression value by 18S's expression value. In other words, it is the fold change between a gene and 18S. Proceeding with these values means calculating fold change between groups based on the genes' fold change with respect to 18S. dCt values are referred to as “scaled”.
Delta Ct Value Normalization
Once scaled, further normalization was done so that the vector of 12 gene values for each sample has “length” or “amplitude” 1.
For a given sample, we calculated the “amplitude” or “length” of the 12-value vector (by summing the squares of the gene values and then taking the square root) and then divided each gene value by this number.
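The scaling and unit-length normalization described above can be sketched as follows. This is a minimal illustration, not the inventors' code: the function names, the three-gene sample and all Ct values (including the 18S control Ct) are hypothetical.

```python
import math

def delta_ct(ct_by_gene, control_ct):
    """Scale raw Ct values by subtracting the endogenous control's Ct."""
    return {gene: ct - control_ct for gene, ct in ct_by_gene.items()}

def unit_normalize(values_by_gene):
    """Divide each value by the Euclidean length of the whole vector."""
    length = math.sqrt(sum(v * v for v in values_by_gene.values()))
    if length == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return {gene: v / length for gene, v in values_by_gene.items()}

# Hypothetical Ct values for three genes; a real sample would carry all 12.
sample_ct = {"FGF12": 27.0, "GPR137B": 24.0, "SLC2A9": 30.0}
scaled = delta_ct(sample_ct, control_ct=12.0)  # the 18S Ct is made up here
normalized = unit_normalize(scaled)
```

After normalization the squared values of a sample sum to 1, so samples are compared on the pattern of their gene values rather than on overall magnitude.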
Prediction Analysis
Following normalization, it was observed that 84 genes showed the same direction of expression in both TLDA and microarray results.
In the prediction analysis, we used only the genes in agreement between Affy and TLDA, after filtering out genes that were “undetected” in 25 or more samples. We found 84 genes to be detected and concordant between Affy and TLDA.
Leave-One-Out-Cross-Validation (L1OXV)
To arrive at the smallest, most predictive set from these 84 genes, Gema executed an iterative strategy called leave-one-out-cross-validation (L1OXV). L1OXV is explained as follows:
In this method, the number of genes in the predictive gene set, say P, is first fixed. Then one sample in the training set is left out, and the top P genes that differentiate between N and F are calculated using the remaining samples. Using these P genes, the sample that was left out is predicted as N or F. This process is cycled through all 33 samples in the training set (leaving one out at a time). The total number of correct predictions is listed as the accuracy of the predictor on the training set.
During the L1OXV process, different values for P, the number of predictor genes, are tried, and for those that show good L1OXV prediction accuracy, the genes are applied to the validation set. The number of samples correctly predicted in the validation set is reported as the prediction accuracy in the validation set. The smallest P that yields high training and validation accuracies is reported as the predictor gene set.
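The leave-one-out loop can be sketched as below. The text does not state how the "top P genes" are ranked within each fold, so this sketch assumes ranking by the absolute signal-to-noise ratio and prediction by a weighted vote; the function names and the toy two-gene data set are hypothetical.

```python
from statistics import mean, stdev

def snr(f_vals, n_vals):
    """Signal-to-noise ratio of one gene between the F and N groups."""
    spread = stdev(f_vals) + stdev(n_vals)
    return (mean(f_vals) - mean(n_vals)) / spread if spread else 0.0

def fit(samples, labels, p):
    """Keep the p genes with the largest |SNR| (assumed ranking rule)."""
    model = {}
    for gene in samples[0].keys():
        f = [s[gene] for s, y in zip(samples, labels) if y == "F"]
        n = [s[gene] for s, y in zip(samples, labels) if y == "N"]
        model[gene] = (snr(f, n), (mean(f) + mean(n)) / 2)
    top = sorted(model, key=lambda g: abs(model[g][0]), reverse=True)[:p]
    return {g: model[g] for g in top}

def predict(model, sample):
    """Sum of weighted votes; a positive total is called F."""
    total = sum(w * (sample[g] - b) for g, (w, b) in model.items())
    return "F" if total > 0 else "N"

def loocv_accuracy(samples, labels, p):
    """Leave each sample out once, refit on the rest, and score the call."""
    correct = 0
    for i in range(len(samples)):
        model = fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], p)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)

# Toy data: g1 separates the groups, g2 is noise.
toy_samples = [
    {"g1": 5.0, "g2": 1.0}, {"g1": 5.5, "g2": 2.0},
    {"g1": 4.8, "g2": 1.5}, {"g1": 5.2, "g2": 1.2},
    {"g1": 1.0, "g2": 1.1}, {"g1": 1.4, "g2": 1.9},
    {"g1": 0.8, "g2": 1.4}, {"g1": 1.2, "g2": 1.6},
]
toy_labels = ["F"] * 4 + ["N"] * 4
toy_accuracy = loocv_accuracy(toy_samples, toy_labels, p=1)
```

Gene selection is repeated inside every fold, so the left-out sample never influences which genes are chosen to predict it.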
Prediction Analysis Results
Prediction analysis using these 84 confirmed genes and the normalized TLDA values of the 49 samples yielded a 12 gene signature with ~72% prediction accuracy (35/49 correct predictions: 14/24 N's and 21/25 F's correctly predicted). The predictor gene set remained significant using the Fisher's test, permutation test and randomization test (p-value<0.05).
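As an illustration of how such counts can be checked, the sketch below recomputes the overall accuracy from the reported counts and runs a one-sided Fisher's exact test on the resulting 2x2 table. The text does not say how the Fisher's test was set up, so the table layout and the one-sided alternative are assumptions.

```python
from math import comb

# Counts as reported in the text.
n_total, n_correct = 24, 14  # N samples, of which predicted N
f_total, f_correct = 25, 21  # F samples, of which predicted F

accuracy = (n_correct + f_correct) / (n_total + f_total)

# Assumed setup: among all samples predicted N, how likely is it to see
# at least n_correct true N samples by chance (hypergeometric tail)?
predicted_n = n_correct + (f_total - f_correct)
total = n_total + f_total
p_value = sum(
    comb(n_total, k) * comb(f_total, predicted_n - k)
    for k in range(n_correct, min(n_total, predicted_n) + 1)
) / comb(total, predicted_n)
```

Only the standard library is used; `math.comb` requires Python 3.8 or later.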
Weighted Average Prediction Algorithm
Signal to Noise Ratio
In the weighted voting approach, we used the “signal to noise ratio” (SNR) to assess the predictor value of a gene g (Golub et al., 1999). Let $\mu_F(g)$ and $\mu_N(g)$ be the mean value of gene g in the F and N sample groups, respectively. Similarly, let $\sigma_F(g)$ and $\sigma_N(g)$ be the standard deviation of gene g in the F and N sample groups, respectively. We define $\mathrm{SNR}(g) = [\mu_F(g) - \mu_N(g)] / [\sigma_F(g) + \sigma_N(g)]$. This metric defines a neighborhood in $\mathbb{R}^M$ around ideal gene expression vectors for both groups, where $M = |F| + |N|$ is the total number of samples in the data set. SNR penalizes genes whose expression is highly variable in either group and provides a signed ranking method for a gene's membership. In this case large positive values indicate a good predictor for the F group and negative values that are large in absolute value indicate a good predictor for the N group.
Boundary Value
We also define the decision boundary for a given gene g, i.e. the midpoint between its mean values in the two groups, as $B(g) = [\mu_F(g) + \mu_N(g)]/2$.
Assume we are given a predictor gene set of P genes $G = (g_1, g_2, \ldots, g_P)$, a group of F and N samples and a new sample S to be predicted. The vote of $g_i$, $1 \le i \le P$, is defined as $V_i = \mathrm{SNR}(g_i)\,[S(g_i) - B(g_i)]$, where $S(g_i)$ represents the signal value of gene $g_i$ in S. $V_i$ represents how well $S(g_i)$ relates to the “behavior” of $g_i$ in F and N samples. If $V_i$ is positive, we conclude that, based on $g_i$, S is predicted to be F, and if $V_i$ is negative, $g_i$ predicts S as N. Cycling through all genes in the predictor set we obtain P votes; let $V_F$ be the sum of all positive votes and $V_N$ be the sum of all negative votes. If $V_F$ is greater than $V_N$ in absolute value, we predict sample S as F; otherwise we predict S as N. Alternatively, one can consider the number of positive versus the number of negative votes. If the number of positive votes is greater than P/2, then the sample is predicted as F; otherwise it is predicted as N. Finally, both the “sum” and “number of votes” criteria can be used in combination for sample prediction.
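The SNR, boundary and vote definitions above can be sketched as follows. The function names and the two-gene toy groups are hypothetical, and the sample standard deviation is assumed because the text does not say which estimator was used.

```python
from statistics import mean, stdev

def snr(f_vals, n_vals):
    """SNR(g) = (mean_F - mean_N) / (sd_F + sd_N)."""
    return (mean(f_vals) - mean(n_vals)) / (stdev(f_vals) + stdev(n_vals))

def boundary(f_vals, n_vals):
    """B(g) = (mean_F + mean_N) / 2."""
    return (mean(f_vals) + mean(n_vals)) / 2

def votes(sample, f_group, n_group):
    """V_i = SNR(g_i) * (S(g_i) - B(g_i)) for every gene in the sample."""
    result = {}
    for gene, value in sample.items():
        f_vals = [s[gene] for s in f_group]
        n_vals = [s[gene] for s in n_group]
        result[gene] = snr(f_vals, n_vals) * (value - boundary(f_vals, n_vals))
    return result

def predict_by_sum(v):
    """F if the positive votes outweigh the negative votes in absolute value."""
    v_f = sum(x for x in v.values() if x > 0)
    v_n = sum(x for x in v.values() if x < 0)
    return "F" if v_f > abs(v_n) else "N"

def predict_by_count(v):
    """F if more than half of the genes cast a positive vote."""
    return "F" if sum(1 for x in v.values() if x > 0) > len(v) / 2 else "N"

# Toy groups: gA is higher in F, gB is higher in N.
f_group = [{"gA": 4.0, "gB": 1.0}, {"gA": 5.0, "gB": 2.0}, {"gA": 6.0, "gB": 1.5}]
n_group = [{"gA": 1.0, "gB": 3.0}, {"gA": 2.0, "gB": 4.0}, {"gA": 3.0, "gB": 3.5}]
new_votes = votes({"gA": 5.5, "gB": 1.2}, f_group, n_group)
```

In the toy data gB has a negative SNR, yet a new sample with a low gB value still votes for F, because the negative weight multiplies a negative distance from the boundary.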
Prediction Algorithm
The first step in the prediction algorithm is to calculate prediction values for each gene in each sample. These values are calculated by multiplying the SNR of the gene by the difference between the normalized dCt value and the boundary value.
Once the prediction values for each gene in each sample are calculated, a total prediction value for each sample is calculated by summing the prediction values of each gene in the sample.
The final prediction is made by using the following logic: If the sum of the Prediction Values for that sample is less than 0 and the count of positive Prediction Values across the genes in that sample is less than 7, then the sample is an “F”, otherwise “N”.
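The final decision rule can be written as a small function. This sketch takes the rule exactly as worded in this passage, including its sign convention and the cutoff of 7 positive values; the function and parameter names are hypothetical.

```python
def final_call(prediction_values, positive_cutoff=7):
    """Return "F" when the summed prediction value is negative and fewer
    than positive_cutoff genes have a positive prediction value."""
    total = sum(prediction_values)
    positives = sum(1 for v in prediction_values if v > 0)
    return "F" if total < 0 and positives < positive_cutoff else "N"

# Hypothetical per-gene prediction values for one 12-gene sample.
example_call = final_call([-1.0] * 8 + [0.5] * 4)
```

Both conditions must hold for an “F” call, so a sample with a negative total but seven or more positive per-gene values is still called “N”.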
There are various issues to consider such as handling of data points that have a value of 40, calculating fold change, and whether or not to use logged values. Below, we address such issues providing potential solutions.
Scaling: We have two sets of output: Ct values (logged expression levels) and dCt values, where for a given sample, each gene's dCt value is calculated by subtracting GAPDH's Ct value from the gene's Ct value. Since Ct values are logarithmic, this corresponds to dividing each gene's expression value by GAPDH's expression value. In other words, it is the fold change between a gene and GAPDH. Proceeding with these values means calculating fold change between groups based on the genes' fold change with respect to GAPDH. Since GAPDH is not one of the endogenous controls used on the array, there are no spike-in controls used in TLDA, and small variations on a logarithmic scale may imply large differences in real values, we approach this with some caution. Nevertheless, we provide analysis using both scaled and unscaled values. For the remainder of this report, unscaled values refer to Ct values as obtained in amplification files and scaled values refer to dCt values obtained by subtracting GAPDH's Ct value.
Fold Change:
Assume we have two samples A and B, and gene X's expression values in these samples are $a_X$ and $b_X$, respectively. What we see in the TLDA output (Ct values) are $\log(a_X)$ and $\log(b_X)$. To calculate the fold change between these two samples, subtract the Ct values and raise 2 to that power. That is, $FC = 2^{\log(a_X) - \log(b_X)}$. This follows from the rules $\log p - \log q = \log(p/q)$ and $2^{\log_2 p} = p$. However, since Ct values are reversed, i.e. a smaller value means larger expression, this FC gives the fold change B/A. To exemplify, if we see a Ct value of 10.8 in A and 12.3 in B, this gene is upregulated in A and the fold change for B/A is $2^{10.8 - 12.3} = 2^{-1.5} = 0.35$. In other words, this gene is upregulated in A by $1/0.35 = 2.8$ times. Another way to arrive at this point is first to unlog the Ct values and then calculate FC as usual, except that the direction is reversed, i.e. in the Ct world less means more. Hence, we have the expression level for A $= 2^{10.8} = 1782$, the expression level for B $= 2^{12.3} = 5042$, and FC B/A $= 1782/5042 = 0.35$.
FC values less than 1 are hard to interpret, so we invert them and add a minus sign. For the above example, instead of saying FC for B/A is 0.35, we say FC for B/A is $-1/0.35 = -2.8$. In all our calculations, we subtracted F values from N values (when using log scale) or divided N values by F values (when using unlogged values) and calculated FC for F/N. We used negative values to depict FCs less than 1, as explained above.
The calculation becomes more involved with more than one sample per group. The example above contained only two samples, or, equivalently, one sample in each group. What if we have more than one sample in each group, as in our case (16 N, 19 F)? Averaging Ct values yields the logarithm of the geometric mean of the expression levels. Subtracting the average Ct values of the two groups and raising 2 to that power therefore amounts to calculating FC by dividing the geometric means of expression in the two groups. This follows from the rules $a \log X = \log X^a$ and $\log p + \log q = \log(pq)$.
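The worked example with Ct values 10.8 and 12.3 can be reproduced with a few lines of Python. The function names are hypothetical; the arithmetic follows the text.

```python
def fold_change_b_over_a(ct_a, ct_b):
    """FC of B relative to A; a lower Ct means higher expression."""
    return 2 ** (ct_a - ct_b)

def signed_fold_change(fc):
    """Report an FC below 1 as its negative reciprocal."""
    return fc if fc >= 1 else -1 / fc

fc = fold_change_b_over_a(10.8, 12.3)  # 2 ** -1.5, about 0.35
signed = signed_fold_change(fc)        # about -2.8
```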
To give an example, assume you have expression levels a, b, and c in group N and d, e, f, and g in group F. What we see in TLDA output is log a, log b, . . . , etc. In order to calculate FC (F/N), if we subtract the average value in F from the average value in N and then take that to power 2, we get the following:
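Average in N $= \tfrac{1}{3}[\log a + \log b + \log c] = \tfrac{1}{3}\log(abc) = \log (abc)^{1/3}$
Average in F $= \tfrac{1}{4}[\log d + \log e + \log f + \log g] = \tfrac{1}{4}\log(defg) = \log (defg)^{1/4}$
FC(F/N) $= 2^{\log (abc)^{1/3} - \log (defg)^{1/4}} = 2^{\log[(abc)^{1/3}/(defg)^{1/4}]} = (abc)^{1/3}/(defg)^{1/4}$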
Recall that the geometric mean of n numbers is the nth root of their product. Therefore, we always chose to work with unlogged values. That is, we first raised 2 to the power of each Ct value and then did our analyses.
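The equivalence between averaging Ct values and taking a ratio of geometric means can be checked numerically. In this sketch the Ct values are hypothetical and the function names are illustrative only.

```python
import math

def fc_from_mean_ct(ct_n, ct_f):
    """FC (F/N) from averaged Ct values: 2 ** (mean Ct in N - mean Ct in F)."""
    return 2 ** (sum(ct_n) / len(ct_n) - sum(ct_f) / len(ct_f))

def geometric_mean(values):
    """The nth root of the product of n values."""
    return math.prod(values) ** (1 / len(values))

ct_n = [30.0, 31.0, 32.0]        # hypothetical Ct values, group N
ct_f = [28.0, 29.0, 29.5, 30.5]  # hypothetical Ct values, group F

via_ct = fc_from_mean_ct(ct_n, ct_f)
# The same quantity computed from the unlogged values 2 ** Ct.
via_geomean = (geometric_mean([2 ** c for c in ct_n])
               / geometric_mean([2 ** c for c in ct_f]))
```

The two results agree up to floating-point error, which is the identity derived above for groups of three and four samples.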
Ct values of 40: 40 is an arbitrary Ct value considered high enough to represent a gene that has not been detected. However, if it were set to 42 instead of 40, all the results would change. Therefore, we resolved this by first looking at all values that are not 40 and ranking them. For Hasan Genes, this corresponds to ranking 4623 values. We then looked at the bottom 2% of these values, that is, the lowest 92; calculated their average and standard deviation, which turned out to be 37.9 and 0.8. We then replaced each 40 by a number randomly chosen from the interval [37.9−0.8, 37.9+0.8].
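A minimal sketch of this replacement step follows. It assumes that the "bottom 2%" means the lowest-expressed detected values, i.e. the highest Ct values below 40, which is consistent with the reported average of 37.9. The function name, the uniform draw and the toy Ct values are assumptions.

```python
import random
from statistics import mean, stdev

UNDETECTED = 40.0

def replace_undetected(ct_values, fraction=0.02, rng=None):
    """Replace every Ct of 40 by a random value drawn around the weakest
    detected signals (mean +/- one standard deviation)."""
    rng = rng or random.Random()
    detected = sorted(v for v in ct_values if v != UNDETECTED)
    k = max(2, int(len(detected) * fraction))  # at least 2 so stdev is defined
    weakest = detected[-k:]                    # highest Ct = lowest expression
    center, spread = mean(weakest), stdev(weakest)
    low, high = center - spread, center + spread
    replaced = [rng.uniform(low, high) if v == UNDETECTED else v
                for v in ct_values]
    return replaced, (low, high)

# Toy input: 180 detected values from 20.0 up to about 37.9, plus five 40s.
toy_ct = [20 + 0.1 * i for i in range(180)] + [UNDETECTED] * 5
filled, (low, high) = replace_undetected(toy_ct, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the replacement reproducible, which matters because every downstream fold change depends on the drawn values.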
Outliers: When you manually look at the expression levels, you often see samples that behave as outliers for a given gene. In order to overcome this we removed the highest and lowest expression levels in a group (N or F) when calculating FC. We also repeated this procedure by removing highest two and lowest two samples in each group.
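The trimming step can be sketched as below. The function names and toy values are hypothetical; the ratio follows the N-divided-by-F convention stated above for unlogged values, and the use of an arithmetic mean within each group is an assumption.

```python
def trim(values, k=1):
    """Drop the k highest and k lowest values of a group."""
    if len(values) <= 2 * k:
        raise ValueError("not enough values to trim")
    return sorted(values)[k:-k]

def trimmed_fold_change(n_unlogged, f_unlogged, k=1):
    """Mean of trimmed N values divided by mean of trimmed F values."""
    n_kept, f_kept = trim(n_unlogged, k), trim(f_unlogged, k)
    return (sum(n_kept) / len(n_kept)) / (sum(f_kept) / len(f_kept))

# Toy groups, each with one obvious outlier at the top.
toy_fc = trimmed_fold_change([1.0, 2.0, 3.0, 4.0, 100.0],
                             [0.5, 1.0, 1.0, 1.0, 50.0])
```

Calling the function with `k=2` corresponds to the repeated procedure in which the two highest and two lowest samples are removed from each group.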
The methods used to ascertain the 12 gene pregnancy signature are summarized below.
The first aspect of reducing the invention to practice involved identifying genes which constitute the pregnancy signature in women and potentially other mammals and was achieved by identifying and comparing the expression of genes in cumulus cells collected from women donors which are pregnancy competent or not. This was effected by collecting cumulus cells from different human oocytes of donor women and implanting patients with one or two putatively fertilized eggs. These patients were then, based on the results of the implantation, divided into three groups based on full, partial, and no pregnancy. For each oocyte used in the process, the transcriptional profile of at least one cumulus cell surrounding the particular oocyte was determined using Affymetrix HG 133 Plus 2 arrays containing over 54,000 transcripts. Patients were included in the study only if they did not meet any of the exclusion criteria identified in Table 1.
More particularly, in order to find gene signatures predictive of an oocyte's ability to produce a healthy baby, the inventors profiled the transcriptome of cumulus cells surrounding the oocyte using Affymetrix HG 133 Plus 2 arrays containing over 54,000 transcripts. Total RNA from individual cumulus samples was isolated using the PicoPure RNA isolation kit (Molecular Devices, Sunnyvale, Calif.). Sample RNA was amplified using a protocol developed in-house which ensures faithful and consistent amplification of small amounts of RNA to levels required for microarray analysis (Kocabas, et al., Proc Natl Acad Sci USA, 103, 14027-14032 (2006)).
Resulting amplified RNA (aRNA) was hybridized to the Affymetrix arrays. Thirty-six samples were used for which none of the embryo transfers led to successful pregnancies (labeled N for No success) and 30 samples for which all of the transfers led to successful pregnancies (labeled F for Full success). There were no known confounding factors affecting pregnancy success, and relevant clinical parameters such as age or IVF cycle number did not vary significantly between the F and N groups.
Quality Control (QC) parameters were calculated for all 65 samples using Expression Console™ (EC) software freely available from the manufacturer (Affymetrix). All QC parameters, including scaling factor (coefficient needed to equate the 2% trimmed mean of overall chip intensity), percentage of probe sets called present, and 3′-5′ ratios for spike and labeling controls and housekeeping genes, were within acceptable ranges (as described in the manufacturer's guidelines) for all the samples. There were no known confounding factors affecting pregnancy success, and relevant clinical parameters such as oocyte age or IVF cycle number did not vary significantly (t-test p>0.05) between F and N groups (see Table 1). Additional criteria for acceptance included absence of Polycystic Ovarian Syndrome (PCOS), no history of chemotherapy or radiation to the abdomen or pelvis, absence of >4 cm intramural or submucosal fibroids, and on the male side, no history of testicular biopsy and sperm count of >5 million.
In order to prove the soundness of the prediction model, F and N samples were divided randomly into training and validation sets. The goal was to find a predictive set of genes developed on the training set and then test the performance of the predictive genes on the validation set, which has not been used in development of the predictive model. This strategy (as opposed to using all the samples to develop a signature) prevents over-fitting and provides an assessment of predictive signature's robustness (Nevins, J. R. and Potti, A. (2007) Mining gene expression profiles: expression signatures as cancer phenotypes, Nat Rev Genet, 8, 601-609.)
A detailed summary of the materials and methods used to identify the preferred 12 gene “pregnancy signature” is provided below.
Materials and Methods Used to Identify 12-Gene Pregnancy Signature
Patient Selection, Implantation, and Pregnancy
This Institutional Review Board (IRB)-approved retrospective study included patients undergoing either IVF or ICSI from one clinical site in Chile, Clinica Las Condes (CLC) and from two in the U.S., Jarrett Fertility Group (JFG) and Pacific Fertility Center (PFC). One, two, or three embryos were transferred to each patient, and embryo transfers occurred on day 2, 3, or 5. Clinical pregnancy, defined as the presence of fetal heartbeat and gestational sac by first ultrasound examination, was determined between four and nine weeks following embryo transfer, depending upon the clinic's program. The Centers for Disease Control (CDC) use these as the standard criteria for defining pregnancy to report IVF results in the USA. This study included only samples from patients for whom all embryos transferred resulted in pregnancy (P, full success) or patients for whom zero embryos transferred resulted in pregnancy (N, no success). Live birth outcome was further recorded for patients with clinical pregnancy (P samples). We excluded patients older than 35, patients with fibroids larger than 4 cm in diameter, those with a body mass index greater than 35, or those with a history of chemo- or radiotherapy. Additionally, our study excluded families with severe male factor infertility as defined by a total sperm count of less than 5 million or a history of testicular biopsy.
Patient Stimulation
Clinicians determined the most appropriate means for stimulating their patients, but protocols generally combined either GnRH agonist or antagonist, to suppress spontaneous ovulation, with purified or recombinant FSH; they also either did or did not include hMG or luteal phase support. Ovarian response and follicular development were monitored by serum estradiol level and transvaginal ultrasound. We induced final follicular maturation by administering hCG and retrieved oocytes under ultrasound guidance 36 hours later.
Human CC Collection
Individual cumulus-oocyte-complexes (COCs) were rinsed in culture media to remove any blood, loose cells, or other debris. A small number of CCs were carefully removed mechanically from each COC, taking care not to take the very outermost or innermost layers. Each CC sample was rinsed in PBS and placed in a microcentrifuge tube with 100 μl extraction buffer (Life Technologies, Carlsbad, Calif., USA) and resuspended gently by pipetting. Individual CC samples were incubated at 42° C. for 30 minutes, centrifuged, and frozen in liquid nitrogen until they were shipped to a processing laboratory. Corresponding oocytes were placed in individual culture drops and cultured individually until embryo transfer (ET).
RNA Isolation
RNA isolation was performed using the PicoPure RNA Isolation Kit (Life Technologies, Carlsbad, Calif., USA), according to the manufacturer's instructions. We analyzed total RNA quantity and quality using a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, Del., USA). Total RNA isolation was done at Michigan State University, East Lansing, Mich., USA, and at GeneMarkers in Kalamazoo, Mich., USA.
Microarray Analysis
We performed transcriptional profiling of 64 individual CC samples (29 P, 35 N; Table 2) from 35 patients with Affymetrix HG-U 133 Plus 2.0 chips, which use more than 54,000 probe sets representing over 47,000 transcripts and variants. We synthesized and amplified cDNA using a protocol developed in house, as previously described (Kocabas A M, Crosby J, Ross P J, Otu H H, Beyhan Z, Can H et al. The transcriptome of human oocytes. Proc Natl Acad Sci USA 2006; 103:14027-32). Samples were analyzed with Affymetrix GeneChip Microarray Analysis Suite 5.0 and Expression Console software (Affymetrix Inc., Santa Clara, Calif., USA) for quality control assessment and normalization, following manufacturer's instructions.
Prediction Analysis
We applied the weighted voting approach utilizing “signal to noise ratio” (SNR) to assess the predictor value of a gene g (Golub et al. 1999). Let μP(g) and μN(g) be the mean value of gene g in P and N sample groups, respectively. Similarly, let σP(g) and σN(g) be the standard deviation of gene g in P and N sample groups, respectively. SNR is defined as SNR(g)=[μP(g)−μN(g)]/[σP(g)+σN(g)]. This metric defines a neighborhood in R^M around ideal gene expression vectors for both groups, where M=|P|+|N| is the total number of samples in the data set. SNR penalizes genes whose expression is highly variable in either group and provides a signed ranking method for a gene's membership. In this case large positive values indicate a good predictor for the P group and negative values that are large in absolute value indicate a good predictor for the N group. The boundary between the idealized expression patterns for a given gene g is defined as B(g)=[μP(g)+μN(g)]/2.
Suppose we are given a predictor gene set of T genes G={g1, g2, . . . , gT}, a group of P and N samples, and a new sample S to be predicted. The vote of gi, 1≤i≤T, is defined as Vi=SNR(gi)×[S(gi)−B(gi)], where S(gi) represents the signal value of gene gi in S. Vi represents how well S(gi) relates to the “behavior” of gi in P and N samples. If Vi is positive, we conclude that, based on gi, S is predicted to be P; if Vi is negative, gi predicts S as N. Cycling through all genes in the predictor set, we obtain T votes used in the prediction of sample S.
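The voting scheme described above can be illustrated with a minimal sketch. The gene names and signal values below are hypothetical toy data, the use of the sample standard deviation is an assumption (the text does not say whether the sample or population form was used), and making the final call from the sign of the summed votes follows the general approach of Golub et al. rather than any detail stated here.

```python
from statistics import mean, stdev

def snr(p_values, n_values):
    """SNR(g) = [mean_P - mean_N] / [sd_P + sd_N] (sample SD assumed)."""
    return (mean(p_values) - mean(n_values)) / (stdev(p_values) + stdev(n_values))

def boundary(p_values, n_values):
    """B(g) = [mean_P + mean_N] / 2, the midpoint between the group means."""
    return (mean(p_values) + mean(n_values)) / 2

def weighted_vote(sample, p_group, n_group):
    """Return (predicted label, per-gene votes) for one new sample.

    sample:  dict mapping gene -> signal in the new sample S
    p_group: dict mapping gene -> list of signals in the P samples
    n_group: dict mapping gene -> list of signals in the N samples
    """
    votes = {}
    for gene, signal in sample.items():
        weight = snr(p_group[gene], n_group[gene])
        votes[gene] = weight * (signal - boundary(p_group[gene], n_group[gene]))
    # Assumption: the class is decided by the sign of the summed votes.
    label = "P" if sum(votes.values()) > 0 else "N"
    return label, votes

# Hypothetical toy data: geneA is higher in P, geneB is higher in N.
p_group = {"geneA": [8.0, 9.0, 10.0], "geneB": [3.0, 4.0, 5.0]}
n_group = {"geneA": [4.0, 5.0, 6.0], "geneB": [7.0, 8.0, 9.0]}
new_sample = {"geneA": 9.5, "geneB": 4.5}

label, votes = weighted_vote(new_sample, p_group, n_group)
print(label, votes)
```

In this toy case both genes vote for P: geneA because the sample lies above the boundary of a gene with positive SNR, and geneB because the sample lies below the boundary of a gene with negative SNR.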
When a prediction model is applied on a data set, the data set is first divided into Training and Validation sets. The predictor gene set is calculated on the Training set using leave-one-out cross-validation (L1OXV). In the L1OXV method utilizing a predictive gene set of T genes, one sample in the Training Set is left out and the top T genes that differentiate between N and P are calculated using the remaining samples. Using these T genes, the sample that is left out is predicted as N or P. This process is cycled through all samples in the Training Set, leaving one out at a time. The proportion of correct predictions is reported as the accuracy of the predictor on the training set. The predictor set of T genes is then applied on the Validation set. We assigned significance of the predictor genes using Fisher's test and two additional strategies: (i) a permutation test, in which we randomly permuted class labels of P and N sample groups and identified optimum gene predictors using the same strategy; and (ii) a randomization test, in which we assessed the accuracy of T randomly chosen gene predictors using the original data set class labels. We compared the performance of the original predictor set with the results obtained using permutation and randomization tests to assess the original predictor set's significance. In both tests, we used 1000 realizations.
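The leave-one-out procedure and the permutation test can be sketched as follows. This is an illustration only: the data are hypothetical, ranking genes by absolute SNR is an assumed selection rule, and the way a p-value is derived from the permuted runs (fraction of permutations that match or beat the observed accuracy) is likewise an assumption, since the text does not give these details.

```python
import random
from statistics import mean, stdev

def snr(p, n):
    """Signal-to-noise ratio of one gene between P and N training samples."""
    return (mean(p) - mean(n)) / (stdev(p) + stdev(n))

def loocv_accuracy(samples, labels, top_t):
    """Leave-one-out accuracy; genes are re-selected inside every fold."""
    genes = list(samples[0])
    correct = 0
    for held in range(len(samples)):
        train = [(s, l) for i, (s, l) in enumerate(zip(samples, labels)) if i != held]
        stats = {}
        for g in genes:
            p = [s[g] for s, l in train if l == "P"]
            n = [s[g] for s, l in train if l == "N"]
            stats[g] = (snr(p, n), (mean(p) + mean(n)) / 2)  # (SNR, boundary)
        # Assumed rule: keep the top_t genes with the largest |SNR|.
        chosen = sorted(genes, key=lambda g: abs(stats[g][0]), reverse=True)[:top_t]
        total = sum(stats[g][0] * (samples[held][g] - stats[g][1]) for g in chosen)
        predicted = "P" if total > 0 else "N"
        correct += predicted == labels[held]
    return correct / len(samples)

def permutation_p_value(samples, labels, top_t, n_perm=1000, seed=0):
    """Fraction of label permutations whose accuracy reaches the observed one."""
    rng = random.Random(seed)
    observed = loocv_accuracy(samples, labels, top_t)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if loocv_accuracy(samples, shuffled, top_t) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical toy data: g1 separates the groups, g2 is noise.
samples = [
    {"g1": 9.0, "g2": 5.0}, {"g1": 10.0, "g2": 6.0}, {"g1": 11.0, "g2": 4.0},
    {"g1": 4.0, "g2": 6.0}, {"g1": 5.0, "g2": 4.0}, {"g1": 6.0, "g2": 5.0},
]
labels = ["P", "P", "P", "N", "N", "N"]

print(loocv_accuracy(samples, labels, top_t=1))
print(permutation_p_value(samples, labels, top_t=1, n_perm=200))
```

Selecting genes inside each fold, rather than once on all samples, keeps the held-out sample from influencing the genes used to predict it.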
Quantitative Real-Time PCR
We performed cDNA synthesis using 8 ng total RNA with the High Capacity cDNA Reverse Transcription Kit (Life Technologies, Carlsbad, Calif., USA), according to the manufacturer's protocol. Preamplification was done according to the Taqman PreAmp Pools Protocol (Life Technologies) using a custom PreAmp Pool for 381 unique mRNA assays. Each sample reaction included 25 μL of 2× Taqman PreAmp Master Mix (Life Technologies), 12.5 μL of custom PreAmp Pool (Life Technologies), and 12.5 μL of cDNA template. The thermocycler conditions were as follows: 10 minutes at 95° C., followed by 14 cycles of 15 seconds at 95° C. and then 4 minutes at 60° C. We employed a custom Taqman Low Density Array (TLDA; Life Technologies) and ran one sample per array. Endogenous control genes 18S, GAPDH, and β-actin were included for relative quantification of transcripts. Forty-nine of the 64 individual CC samples previously used on microarray, along with 37 new individual biological CC samples from new patients, were analyzed on TLDA (Table 2).
Statistics
We used the GeNorm algorithm in Real-Time StatMiner (Integromics, Philadelphia, Pa., USA) software to identify the most stable endogenous control gene, or combination of endogenous control genes on the qRT-PCR TLDA across all sample sets. The Mann-Whitney test (Zar J H. Biostatistical Analysis (5th Edition). Upper Saddle River, N.J.: Pearson Prentice-Hall, 2010) was used to evaluate the clinical characteristics between pregnant (P) and nonpregnant (N) groups. Because we assessed several variables, we used α=0.01 to determine statistical significance so as to manage the potentially inflated false-positive error rate. Fisher's exact test was used to determine the significance of prediction results during the pregnancy prediction analysis of the qRT-PCR gene expression data. We employed analysis of variance (ANOVA) to assess categorical variable differences in gene expression, and we used Pearson's correlation to evaluate the relationship between continuous variables and gene expression. The ROC analysis was performed on the gene expression using the clinical pregnancy outcome (P, N) as the basis for truth. The ROC curve was created by plotting the true positive fraction (TPF or sensitivity) versus the false positive fraction (FPF or 1-specificity) determined by moving the cut-point value along the gene expression range. The area under this curve (AUC) indicates the degree of predictive ability of the gene expression ranging from 0.5 (random chance) to 1.0 (perfect). All analyses were carried out using SAS software (SAS V9.2; Cary, N.C., USA) or MedCalc (V11.3.1.0; Mariakerke, Belgium).
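The ROC construction described above (moving a cut-point along the range of prediction values and plotting TPF against FPF) can be sketched in a few lines. The scores and labels are hypothetical, and the trapezoidal rule for the area is an assumed choice; the analyses reported here were run in SAS and MedCalc, not with this code.

```python
def roc_points(scores, labels):
    """Return (FPF, TPF) pairs obtained by lowering the cut-point step by step.

    A sample is called positive when its score is >= the cut-point;
    "P" is treated as the true positive class.
    """
    pos = sum(1 for l in labels if l == "P")
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == "P")
        fp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == "N")
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical prediction values and true outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = ["P", "P", "N", "P", "N", "N"]

curve = roc_points(scores, labels)
print(curve)
print(round(auc(curve), 3))
```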
Results
Patient and Sample Clinical Characteristics
The analysis included a total of 101 CC samples, 86 of which were included on qRT-PCR TLDA from 55 patients (FIG. 1, Table 2). All TLDA P samples that were confirmed as clinical pregnancies at fetal heartbeat check advanced to healthy live birth.
Of the 86 samples used to confirm, refine, and validate the predictive gene set using qRT-PCR, 25, 45, and 16 samples were provided by CLC, JFG, and PFC, respectively (Table 5). The majority of samples came from double ETs (69), while eight CCs came from single ETs, and nine samples corresponded to triple ETs. ETs for 47 samples occurred on days 2/3, and 39 underwent ETs on day 5; no significant difference existed between P and N groups on the day of ET. We found no differences in the primary clinical characteristics, such as oocyte age and cycle number, between P and N groups (Table 7). However, we found a higher number of metaphase II (MII) oocytes (p=0.008) in the P group and a lower fertilization rate (number of 2PN from MII oocytes; p=0.002) in the P group (Table 8). Due to these observed differences between groups, we ran a clinical correlate of gene expression analysis, which we describe in a later section.
Pregnancy Prediction Analysis
First, we used microarrays to obtain transcriptional profiling for 64 individual CC samples (35 N and 29 P; Table 2, FIG. 1). Signal-to-noise ratio (SNR) was used to assess the predictive value of a gene using weighted voting, as previously described (Golub T R, Slonim D K, Tamayo P, Huard C, Gaasenbeek M, Mesirov J P et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999; 286:531-7). This group was divided into (1) a training set (18 N and 15 P) to find a predictive set of genes and (2) a validation set (17 N and 14 P). We used the validation set to test the performance of the predictive genes; the validation set consisted of samples that were not used in development of the predictive model. This strategy prevented overfitting and provided an assessment of the predictive signature's robustness (Nevins J R, Potti A. Mining gene expression profiles: expression signatures as cancer phenotypes. Nat Rev Genet 2007; 8:601-9). In order to find genes that correlated with success, we identified genes in the training set (P versus N) that showed differential expression based on t-tests (p<0.05 with Bonferroni correction for multiple hypothesis testing). The resulting 1180 genes, called “descriptive genes,” were used for L1OXV in the training set (Radmacher M D, McShane L M, Simon R. A paradigm for class prediction using gene expression profiles. J Comput Biol 2002; 9:505-11). Weighted voting analysis revealed a 227-gene predictor set yielding 97% L1OXV accuracy (32/33 correct predictions: 17/18 N and 15/15 P correctly predicted) on the training set and 87% prediction accuracy (27/31 correct predictions: 17/17 N and 10/14 P correctly predicted) on the validation set. The prediction results remained significant using Fisher's test, the permutation test, and the randomization test (p<0.05).
Validation and Refinement of Predictive Genes with qRT-PCR
Of 227 genes found to be predictive of pregnancy outcome, we included 196 in our custom TLDA for qRT-PCR validation. The endogenous controls β-actin, GAPDH, and 18S were evaluated for the most stable expression across the sample set. We found that 18S alone was most stable, and Ct values were normalized to this gene's expression level, providing dCt values which represented the fold change of a sample's gene relative to 18S expression.
We used a subset of 49 samples (24 N and 25 P; Table 1, FIG. 1) out of 64 samples used in microarrays to confirm and further refine the predictive gene set. Following normalization to 18S, we observed that 84 genes showed concordant expression on TLDA, as was previously determined on microarray with the same 49 biological samples. Using pregnancy prediction analysis on these 84 genes with the same strategy (weighted voting utilizing the SNR) yielded a predictive set of 12 genes. In order to further assess the predictive value of the 12-gene set, we ran TLDA on 37 new biological samples from new patients (19 N and 18 P; Table 1, FIG. 1) not used in the microarray analysis. The predictor gene set remained significant using Fisher's test, the permutation test, and the randomization test (p<0.05) during both refinement and validation procedures.
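As an illustration of the normalization step, the sketch below computes a dCt value against the 18S control and converts it to a relative expression level. The Ct values are hypothetical, and the 2^(−dCt) conversion is the usual convention that assumes roughly 100% amplification efficiency; it is not stated in the text.

```python
def delta_ct(ct_target, ct_reference):
    """dCt = Ct of the target gene minus Ct of the endogenous control (18S here)."""
    return ct_target - ct_reference

def relative_expression(dct):
    """Expression relative to the control, assuming ~100% PCR efficiency."""
    return 2.0 ** (-dct)

# Hypothetical Ct values for one sample.
ct_gene = 28.0
ct_18s = 12.0

dct = delta_ct(ct_gene, ct_18s)
print(dct, relative_expression(dct))
```

A lower dCt means the target gene reached threshold sooner relative to 18S, i.e. higher relative expression.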
Gene Expression in Cumulus Cells as a Biomarker of Pregnancy Outcome
The 12-gene predictor set identified using qRT-PCR TLDA on Sample Set A′ (49 samples previously screened by microarray) was validated on Sample Set B (37 new biological samples not used by microarray) using weighted voting as previously described. Seven genes were upregulated in P samples compared to N, and five genes were downregulated in P compared to N group (Table 5). When applied to the validating B data set (37 samples), this pregnancy prediction model yielded an accuracy of 78%, a sensitivity for identifying successful pregnancy outcomes of 72%, a specificity for identifying failed pregnancy outcomes of 84%, a positive predictive value (PPV) of 81%, and a negative predictive value (NPV) of 76% (Table 3).
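The reported performance figures follow from a 2×2 table of predicted versus actual outcomes. In the sketch below, the four counts are not taken from Table 3; they are back-calculated, as an assumption, from the reported percentages and the stated group sizes of Sample Set B (18 P and 19 N), for which they are the only integer counts that round to the reported values.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic summary measures from a 2x2 confusion table."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # P samples correctly called P
        "specificity": tn / (tn + fp),   # N samples correctly called N
        "ppv": tp / (tp + fp),           # called P that are truly P
        "npv": tn / (tn + fn),           # called N that are truly N
    }

# Assumed counts inferred from the reported percentages (18 P, 19 N).
metrics = diagnostic_metrics(tp=13, fn=5, tn=16, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```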
Receiver Operating Characteristic (ROC) analysis, a common method for evaluating the diagnostic utility of a test (Zou K H, O'Malley A J, Mauri L. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 2007; 115:654-7; and Linden A. Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis. J Eval Clin Pract 2006; 12:132-9), was conducted to determine the predictive power of identifying a successful pregnancy outcome based upon the 12-gene prediction values for the validating 37 B samples (Table 4, FIG. 2). The AUC, which indicates the degree of predictive ability, was 0.763±0.079, which is significantly (p=0.0009) greater than 0.5 (random chance prediction). Our sample size and the AUC observed in our ROC analysis fall in line with previous diagnostic reports within the IVF field (Esterhuizen A D, Franken D R, Lourens J G H, Prinsloo E, van Rooyen L H. Sperm chromatin packaging as an indicator of in-vitro fertilization rates. Hum Reprod 2000; 15:657-61; and Fabregues F, Balasch J, Creus M, Carmona F, Puerto B, Quinto L et al. Ovarian Reserve Test with Human Menopausal Gonadotropin as a Predictor of In Vitro Fertilization Outcome. J Assist Reprod Genet 2000; 17:13-9).
Clinical Correlates of Gene Expression
We evaluated patients' clinical characteristics for potential correlation with the 12-gene expression prediction values. Again, because several variables were being assessed, we used α=0.01 to determine statistical significance to manage the potentially inflated false-positive error rate. Of the continuous variables, none significantly correlated with the prediction value (Table 8), including the number of MII oocytes and the fertilization rate (2PN/MII), despite their displaying different values between pregnant and nonpregnant samples. Although the number of MII oocytes and the fertilization rate differed significantly in the pregnancy outcome groups, neither variable correlated with the gene expression signature. That is, despite different numbers of MII oocytes and different fertilization rates between P and N groups, this did not seem to affect the strength of the pregnancy signature.
The differences in the sum of the 12-gene prediction value for the categorical assessments were evaluated using ANOVA. If the overall test for category differences was considered significant at α=0.01, then we evaluated pairwise comparisons of the categories. Only two categorical variables, gonadotropin and ET catheter, were found to differ significantly in gene expression (Table 9). Regarding gonadotropin, only JFG used the pFSH/hMG regimen (n=45); PFC used rFSH exclusively (n=16). Thus, we found a degree of confounding between site and gonadotropin, and these results should be interpreted with caution. Similarly, regarding the ET catheter, results should be interpreted cautiously, as a confounding effect resulted from each site using different catheters exclusively. Further, the Wallace catheter sample size was very small (n=5), providing very little power from which to draw conclusions. Finally, with respect to clinical site, the majority of samples from CLC were collected much earlier and stored longer than those from JFG, likely explaining the difference seen in predictive values between these sites.
Tables 2-9 referenced supra are set forth below.
†Most patients contributed sibling samples to both the Training and Validation Sets
The ability to select viable oocytes and embryos during IVF has significant medical, social, and financial benefits. A diagnostic assay using CCs that complements morphology would present a noninvasive approach to attaining this goal. A critical question, however, has remained whether developing a test robust enough to overcome inherent variations in patients and clinics would be possible. This report describes, for the first time, a novel set of 12 genes—produced from multiple sites and diverse clinical protocols—that predict pregnancy outcome. Our proposed prediction strategy, based on the expression levels of the genes in CCs, paves the way for a noninvasive supplementary tool for selecting viable oocytes. We developed the predictive gene set using a global expression profiling approach and then employed qRT-PCR to validate it on two independent biological sample sets. Additional ROC analysis confirmed that this predictive gene set has significant predictive power.
While the genes that ultimately comprised our final gene set do not overlap with genes reported as predictive of pregnancy previously, this is not entirely surprising. This could be due to several factors: differences in technical approaches such as the use of TLDAs, the fact that our algorithm incorporates weighted voting which places varied contribution of each gene's expression in the prediction model, or a combination of both.
The genes in our predictive set are, in part, involved with glucose metabolism, transcriptional regulation, gonadotropin regulation, and apoptosis, all of which are essential to viable COC processes. Considering the generally known functions of some of the genes or gene families, it is plausible that they would appear in a pregnancy-predictive CC gene panel. For example, the fibroblast growth factor (FGF) family plays an important role in regulating cell survival, and FGF12 appears upregulated in our P group compared to the N group of samples.
Glucose, which is metabolized by the glycolysis pathway, acts as a crucial metabolite for the COC (Leese H J, Baumann C G, Brison D R, McEvoy T G, Sturmey R G. Metabolism of the viable mammalian embryo: quietness revisited. Mol Hum Reprod 2008; 14:667-72). The breakdown of glucose by CCs provides the oocyte with essential nutrients, such as pyruvate and lactate, to complete maturation in preparation for ovulation. Converting glucose into these byproducts has further importance: providing the oocyte with the maternal store of metabolites/energy sources as it is nurtured by the surrounding granulosa cells, of which CCs are one type. Thus, granulosa cells play a critical role in supporting the developing oocyte and establishing its maternal supply of energy resources to carry it through the first few cell divisions (Watson A J. Oocyte cytoplasmic maturation: A key mediator of oocyte and embryo developmental competence. J Anim Sci 2007; 85:E1-E3). SLC2A9 (also known as GLUT9), a member of the SLC2A facilitative transporter family, plays an important role in glucose homeostasis (Sutton-McDowall M L, Gilchrist R B, Thompson J G. The pivotal role of glucose metabolism in determining oocyte developmental competence. Reproduction 2010; 139:685-95). Specifically, SLC2A9 has been demonstrated to transport uric acid and hexose sugars, of which glucose is one example (Augustin R, Carayannopoulos M O, Dowd L O, Phay J E, Moley J F, Moley K H. Identification and characterization of human glucose transporter-like protein-9 (GLUT9): alternative splicing alters trafficking. J Biol Chem 2004; 279:16229-36). In the bovine model, mature COCs were observed to utilize more glucose and its metabolic products than immature COCs (Sutton M L, Cetica P D, Beconi M T, Kind K L, Gilchrist R B, Thompson J G. Influence of oocyte-secreted factors and culture duration on the metabolic activity of bovine cumulus cell complexes. Reproduction 2003; 126:27-34).
Given this fact, the increased expression of SLC2A9 in CCs corresponding to viable oocytes may reflect a more dynamic transport of glucose within those CCs and therefore a more properly functioning metabolic state in these COCs as a whole.
NR2F6 was also upregulated in our P sample sets relative to N. This gene is an orphan nuclear receptor, belonging to a subgroup of the nuclear receptor superfamily of transcription factors and cofactors. While the exact function of NR2F6 remains undefined in CCs, orphan nuclear receptors are known to play a role in many reproductive processes (Bertolin K, Bellefleur A-M, Zhang C, Murphy B D. Orphan nuclear receptor regulation of reproduction. Animal Reproduction 2010; 7:146-53). Specifically, research has shown that NR2F6 inhibits luteinizing hormone receptor (LHr) transcription via promoter repression (Zhang Y, Dufau M L. Nuclear orphan receptors regulate transcription of the gene for the human luteinizing hormone receptor. J Biol Chem 2000; 275:2763-70). The formation of LHr on the surface of CCs plays a key part in proper follicular maturation prior to the LH surge, which induces ovulation. However, overexpression of LHr can also have adverse effects on the ovulatory process, as higher levels of this receptor have been reported in the granulosa cells of women with polycystic ovaries compared to those without (Jakimiuk A J, Weitsman S R, Navab A, Magoffin D A. Luteinizing Hormone Receptor, Steroidogenesis Acute Regulatory Protein, and Steroidogenic Enzyme Messenger Ribonucleic Acids Are Overexpressed in Thecal and Granulosa Cells from Polycystic Ovaries. J Clin Endocrinol Metab 2001; 86:1318-23). The slightly lower expression of NR2F6 seen in our N group may indicate a hyperactive state of LHr expression, which could lead to suboptimal maturation of the follicle.
We found four additional genes that were upregulated in the CCs of P samples compared to N samples: ARID1B, FAM36A, GPR137B, and ZNF132. ARID1B is part of the SWI/SNF chromatin remodeling complex, which plays a critical role in cell cycle control. Research has demonstrated the necessity of open gap junction communication between follicular cells and their oocyte for proper meiotic maturation, which involves chromatin remodeling (Luciano A M, Franciosi F, Modina S C, Lodde V. Gap Junction-Mediated Communications Regulate Chromatin Remodeling During Bovine Oocyte Growth and Differentiation Through cAMP-Dependent Mechanism(s). Biol Reprod 2011; 85:1252-9). Increased ARID1B in our P samples may facilitate gap junction communication and improve oocyte viability. The function of FAM36A is not well characterized, but this protein has been localized in mitochondria and is integral to the membrane. GPR137B is also poorly characterized; however, this gene encodes a G-protein-coupled receptor (GPCR) integral membrane protein. Given the prominent role GPCRs play in interpreting external messages for a cell, this could indicate an important role for GPR137B in signaling within the follicular microenvironment. ZNF132, yet another gene with a poorly understood function, is a member of the zinc finger protein family, which directly affects transcription by acting as the DNA-binding subunit of transcription factors, thus conferring DNA sequence specificity.
Five genes in our signature were downregulated in P versus N samples: DNAJC15, RHBDL2, MTUS1, NUP133, and ZNF93. Little is known about the specific action of these genes. DNAJC15 is localized to mitochondria and membranes and is thought to have heat-shock-binding properties. RHBDL2 is an intramembrane protease, and research increasingly suggests the importance of intramembrane proteolysis in regulating a variety of cellular processes, such as development and metabolism (Erez E, Fass D, Bibi E. How intramembrane proteases bury hydrolytic reactions in the membrane. Nature 2009; 459:371-8). MTUS1 has previously been reported as more highly expressed in ovaries than in other tissues (Nagase T, Ishikawa K-i, Kikuno R, Hirosawa M, Nomura N, Ohara O. Prediction of the Coding Sequences of Unidentified Human Genes. XV. The Complete Sequences of 100 New cDNA Clones from Brain Which Code for Large Proteins in vitro. DNA Research 1999; 6:337-45), although the specific action of this gene in ovarian regions remains undocumented. NUP133 is involved with nucleocytoplasmic transport activity, a subset of which includes glucose transport. Finally, ZNF93, another zinc finger gene, has an as-yet-undescribed function but is thought, like other characterized zinc finger proteins, to regulate transcription in a direct manner as the DNA-binding component of transcription factors.
The functional role of each gene in our predictive set with respect to oocyte and embryo viability remains to be elucidated. Hypothesis-driven experiments are required to interrogate how each gene expressed in CCs acts individually, and in combination, to impart or compromise the developmental competence of their respective oocyte, dependent on its level of expression.
Despite a significant difference in the number of MII oocytes and the fertilization rate between samples from pregnant and nonpregnant patients, the clinical correlates of gene expression analysis has demonstrated that these differences have no correlation with the gene expression values, and therefore no effect on the strength of our predictive gene set.
The effect on gene expression values identified in gonadotropin choice and ET catheter between pregnancy outcome groups appears more indicative of the clinical site, as usage of these factors was confounded with site. Again, regarding the clinical site difference seen between CLC and JFG, the majority of samples from CLC were collected earlier and stored longer than those from JFG, likely explaining the difference seen in this covariate.
The data presented herein reveal a novel 12-gene set in CCs that is predictive of pregnancy; these data, from multiple sites using multiple stimulation protocols, had an overall accuracy of 78%. ROC analysis confirms the predictive power of our test, with an AUC=0.763±0.079, which is significantly greater than the 0.5 of random chance prediction (p=0.0009) and comparable with the expectation for a successful diagnostic test. This is particularly promising given the heterogeneous nature of the patients and the differences in the treatment they received.
This gene signature may be applied in a randomized controlled clinical trial across multiple sites in order to further confirm its pregnancy prediction value in identifying the oocytes with the highest pregnancy potential for embryo transfer.
In conclusion, using accepted statistical methods the inventors identified 12 genes, i.e., FGF12 (Hs00374427_m1), GPR137B (Hs00162803_m1), SLC2A9 (Hs00417125_m1), ARID1B (Hs00368175_m1), NR2F6 (Hs00172870_m1), ZNF132 (Hs01036387_m1), FAM36A (Hs00831105_s1), ZNF93 (Hs01656246_s1), RHBDL2 (Hs00384848_m1), DNAJC15 (Hs00387763_m1), MTUS1 (Hs00826834_m1), and NUP133 (Hs00217272_m1), wherein the levels of expression of one of these genes, or any combination of these genes, by cumulus cells correlate to the capability of an oocyte associated therewith or from the same female donor to result in a viable pregnancy. Therefore, methods which detect the expression of one or more of these 12 genes by a cumulus cell may be used in order to determine whether an oocyte associated therewith or from the same female donor is suitable for use in an IVF procedure, as well as for identifying individuals with conditions that result in oocytes unsuitable for use in IVF procedures, and for monitoring the success of fertility treatments.
Throughout this application, various references describe the state of the art to which this invention pertains. The disclosures of these references are hereby incorporated by reference into the present disclosure.
This PCT application claims priority to U.S. Provisional Application Ser. No. 61/547,403 filed on Oct. 14, 2011 and U.S. Provisional Application Ser. No. 61/581,219 filed on Dec. 29, 2011. This application also relates to PCT application WO/2011/060080, published May 19, 2011, U.S. provisional application Ser. No. 61/388,296 filed Sep. 30, 2010; U.S. provisional application Ser. No. 61/387,313 and 61/387,286 both filed Sep. 28, 2010; U.S. provisional application Ser. No. 61/360,556 filed on Jul. 1, 2010 and U.S. provisional application Ser. No. 61/259,783 filed on Nov. 10, 2009. The contents of all of the identified provisional and non-provisional applications are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2012/060307 | 10/15/2012 | WO | 00 | 4/14/2014 |
Number | Date | Country | |
---|---|---|---|
61547403 | Oct 2011 | US | |
61581219 | Dec 2011 | US |