The present invention relates to a method for assessing endometrial receptivity of a female subject before embryo implantation, comprising performing an assay on fertility-associated biomarkers in methylation profiles of cervical secretions of the female subject.
In vitro fertilization (IVF) has become the most effective treatment for women who have difficulties conceiving since the first baby was born via this medically assisted reproduction method in 1978. The number of IVF treatments performed is continuing to increase globally. A successful pregnancy relies on embryo, endometrium and embryo-endometrium synchronization. Although the selection of euploid embryos has been achieved via the application of preimplantation genetic testing for aneuploidies (PGT-A), resulting in increased clinical pregnancy rates and live birth rates, favorable outcomes after the transfer of embryos are not always guaranteed. Ovulation induction protocols and embryo culture systems in the laboratory have been continuously optimized following decades of development, resulting in improved quantity and quality of embryos. However, the implantation rate remains 25-40%, preventing IVF from having an ideal outcome. To overcome the last barrier to IVF success, namely, the implantation process, endometrial status must become readily assessable.
Implantation requires highly orchestrated interactions between the developing embryo and endometrium. The association between abnormal implantation and reproductive failure is evident. The ability of the endometrium to allow implantation of the embryo is termed receptivity. A successful pregnancy must be established on a receptive endometrium. Although efforts have been made to characterize a receptive endometrium, neither morphological parameters nor molecular biomarkers correlate well with pregnancy outcomes. Normal implantation occurs during a short time period in the mid-secretory phase termed the window of implantation (WOI). In this period, the endometrium becomes optimally receptive to support embryo implantation. Recently, a transcriptomic profile based on endometrial biopsies suggested that implantation failure results from displacement of the WOI. In addition, according to a transcriptomic analysis, pregnancy can be achieved if the timing of embryo transfer is advanced or delayed. Identifying the timeframe of the WOI can improve pregnancy outcomes in IVF by optimizing the synchrony between embryo and endometrium. However, implantation failure is more common for an endometrium with abnormal or absent WOI.
The human endometrium is a unique tissue that undergoes monthly changes involving regeneration, remodeling, and degradation. In each cycle, endometrial stem/progenitor cells are responsible for construction of the new endometrium following shedding of the old one. The substantial rearrangement of endometrial tissue during the menstrual phase is accompanied by vigorous epigenetic alterations. The DNA methylation of the endometrium then remains almost unchanged through the menstrual cycle until the late-secretory phase when the endometrium starts to break down. DNA methylation is a major epigenetic event involving the addition of a methyl group (-CH3) to the carbon at position 5 of cytosine residues in the DNA template. Aberrant methylation of promoter regions of several genes has been found to be strongly associated with diseases. Since DNA methylation of the endometrium drastically changes only when stem/progenitor cells participate in the regeneration, it is likely that each newly grown endometrium has a distinct DNA methylation landscape regulating its behaviors, including the ability to allow embryo implantation. As evidenced by several studies, alterations in DNA methylation impair the expression of genes involved in embryo-endometrium crosstalk, implantation, and decidualization, leading to low fecundity. Evidence also indicates that the DNA methylome of endometrial tissue differs between healthy fertile donors and women suffering recurrent implantation failure. So far, most studies investigating the receptivity of the endometrium have been based on analysis of endometrial tissue obtained through biopsies. Endometrial biopsy is a blind and invasive procedure performed by inserting a thin catheter through the natural opening of the cervix and into the uterine cavity to remove a small piece of tissue from the lining of the uterus.
Since the invasiveness of endometrial biopsies is detrimental to embryo implantation, embryos must be transferred in cycles separate from the analyzed one. Therefore, differences in the endometrium between different menstrual cycles cannot be evaluated by invasive approaches and are thus always ignored. Criticisms of invasive analysis such as inconsistent results being obtained between menstrual cycles in the same individual and inconclusive benefits of personalized embryo transfer based on a transcriptome-defined WOI might be explained by monthly variation of the endometrium.
From experience in cancer screening, cancer-associated DNA methylation can be detected in cell-free DNA or fragmented DNA present in body fluids and secretions. Indeed, the DNA methylome in cervical scrapings has been used as a noninvasive biomarker for the detection of endometrial cancer with high accuracy. Because cervical secretions can reflect the intrauterine environment, methylation profiles may be used as proxies for investigating the differences of DNA methylome in the endometrium between pregnancy and non-pregnancy cycles.
The present invention provides a predictive method for assessing the probability of the success of embryo implantation based on methylation profiles of cervical secretions at the preimplantation stage.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The term “a” or “an” as used herein is to describe elements and ingredients of the present invention. The term is used only for convenience and to provide the basic concepts of the present invention. Furthermore, the description should be understood as comprising one or at least one, and unless otherwise explicitly indicated by the context, singular terms include pluralities and plural terms include the singular. When used in conjunction with the word “comprising” in a claim, the term “a” or “an” may mean one or more than one.
The term “or” as used herein may mean “and/or.”
The endometrium is the mucosa coating the inside of the uterine cavity. Its function is to house the embryo, allowing its implantation and favoring the development of the placenta. This process requires a receptive endometrium capable of responding to the signals of the blastocyst, which is the stage of development of the embryo when it implants. Human endometrium is a tissue cyclically regulated by hormones. The hormones preparing it to reach said receptivity state are estradiol, which induces cell proliferation, and progesterone, which is involved in differentiation and causes a large number of changes in the gene expression profile of the endometrium, which reaches a receptive phenotype for a short time period referred to as the “window of implantation”. Therefore, endometrial receptivity is the state in which the endometrium is prepared for embryo implantation. The present invention first demonstrates that gene methylation patterns from the cervical sample are associated with the change of endometrial receptivity during the pregnancy cycles.
The present invention provides a method for identifying a potential biomarker for determining the probability of the success of embryo implantation, comprising: (1) providing a cervical sample from a female subject; (2) assaying nucleic acids of the cervical sample to generate a methylation profile comprising 1733 genes listed in Table 4; (3) calculating a statistical value of at least one gene from the 1733 genes in the methylation profile; and (4) identifying the at least one gene as a biomarker in the cervical sample for determining the probability of the success of embryo implantation when the statistical value of the at least one gene is higher than a threshold value.
In one embodiment, the cervical sample is a biological sample obtained from the lumen of the cervix. The cervix is the lower part of the uterus in the human female reproductive system, composed of two regions: the ectocervix and the endocervical canal. The cervix connects the vagina with the main body of the uterus, acting as a gateway between them. Anatomically and histologically, the cervix is distinct from the uterus, and hence the present invention considers it as a separate anatomical structure. In a preferred embodiment, the biological sample comprises secretions, epithelial cells, stromal cells, squamous cells, glandular cells, immune cells, vaginal fluids, vaginal microbiota, mucus molecules or water.
In another embodiment, the cervical sample is obtained by using a cotton applicator, a cotton wool ball, a cotton swab, or cotton balls. The applicator can be gently rubbed against the cervix to obtain samples.
An embryo transfer is part of the process of IVF. In one embodiment, the cervical sample is obtained from 1-5 days before or on the day of the female subject receiving embryo transfer. In other words, the cervical sample is obtained on day −5˜−1, or on the day of the female subject receiving embryo transfer. In a preferred embodiment, the cervical sample is obtained on day P+0, P+1, P+2, P+3, P+4 or P+5. In a more preferred embodiment, the cervical sample is obtained on day P+0 or day P+5. P+0 means the day of starting progesterone supplementation. P+5 means the 5th day following the start of progesterone supplementation or administration. Progesterone can be applied orally, vaginally, intramuscularly, or subcutaneously. Different protocols for initiation of progesterone supplementation are reported, ranging from before oocyte retrieval to 6 days after oocyte retrieval. In current IVF practice, day 3 cleavage-stage embryo transfer and day 5 blastocyst-stage embryo transfer are routine in many assisted reproductive technology centers. A day 3 embryo should therefore be transferred 2 days earlier. In a preferred embodiment, the biological sample is obtained before embryo transfer during IVF.
The term “subject” as used herein, refers to an animal including the human species. Accordingly, the term “subject” comprises any mammal, which may benefit from the method of the present invention. The term “mammal” refers to all members of the class Mammalia. In one embodiment, the subject is a human.
The term “methylation” as used herein, refers to the covalent attachment of a methyl group at the C5-position of cytosine within the CpG dinucleotides of the core promoter region of a gene. The term “methylation state” refers to the presence or absence of 5-methyl-cytosine (5-mCyt) at one or a plurality of CpG dinucleotides within a gene or nucleic acid sequence of interest. As used herein, the term “methylation level” refers to the amount of methylation in one or more copies of a gene or nucleic acid sequence of interest. The methylation level may be calculated as an absolute measure of methylation within the gene or nucleic acid sequence of interest. Also, a “relative methylation level” may be determined as the amount of methylated DNA relative to the total amount of DNA present, or as the number of methylated copies of a gene or nucleic acid sequence of interest relative to the total number of copies of the gene or nucleic acid sequence. Additionally, the “methylation level” can be determined as the percentage of methylated CpG sites within the DNA stretch of interest.
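By way of a non-limiting illustration, the two ratio-based definitions of methylation level given above can be written as short Python functions. The function and parameter names are hypothetical and are introduced here only for illustration; they are not part of any assay software.

```python
from typing import Sequence


def relative_methylation_level(methylated_copies: int, total_copies: int) -> float:
    """Methylated copies of a sequence of interest relative to all copies of it."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return methylated_copies / total_copies


def percent_methylated_cpgs(cpg_is_methylated: Sequence[bool]) -> float:
    """Percentage of CpG sites in a DNA stretch that carry 5-methyl-cytosine."""
    if len(cpg_is_methylated) == 0:
        raise ValueError("at least one CpG site is required")
    return 100.0 * sum(cpg_is_methylated) / len(cpg_is_methylated)


# Illustrative values only (not measured data).
print(relative_methylation_level(30, 120))
print(percent_methylated_cpgs([True, False, True, True]))
```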
As used herein, the term “methylation profile” refers to a set of data representing the methylation level of one or more target genes in a sample of interest. In one embodiment, the methylation profile is generated by bisulfite sequencing PCR (BSP), reduced representation bisulfite sequencing (RRBS), whole genome bisulfite sequencing (WGBS), methylated DNA immunoprecipitation sequencing (MeDIP), enzymatic methyl sequencing (EM-Seq), mass spectrometry method, methylation specific PCR, qPCR, PCR, Sanger sequencing, next-generation sequencer, methylation chip, methylation chip array, ion torrent sequencer, real-time nanopore sequencing, smaller genomes sequencing, targeted regions sequencing, targeted amplicons sequencing, fiber optical particle plasmon resonance (FOPPR), or changes in transverse proton relaxation. In a preferred embodiment, the methylation profile is generated by Infinium methylation array, a tiling microarray or methylation specific PCR.
The present invention uses a computational predictor, a mathematical tool that takes a data matrix (here, the data generated with the methylation profile) and learns to distinguish classes (here, two or more classes according to the pregnancy profiles that are generated, i.e., pregnancy and non-pregnancy). The set of samples that trains the classifier to define the classes is referred to as the training set. In other words, the methylation profiles of these samples, together with their known endometrial receptivity, are used by the program to determine which probes are the most informative and to distinguish between classes (different non-receptive and receptive states). This training set will gradually grow as a larger number of samples are tested.
The classification is done by the bioinformatic program using different mathematical algorithms, there being many available. An algorithm is a well-defined, ordered and finite list of operations which allows solving a problem. A final state is reached through successive and well-defined steps given an initial state and an input, obtaining a solution. The classifier calculates the error committed by means of a process called cross-validation, which consists of leaving a subset of the samples of the training set of a known actual class out of the group for defining the classes, and then testing them with the generated model and seeing if it is right. This is done by making all the possible combinations. The efficacy of the classifier is calculated and prediction models are obtained which correctly classify all the samples of the training set. In other words, all the samples of the training set are classified by the predictor in the assigned actual class known by the inventors.
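The cross-validation principle described above can be sketched in Python as follows. This is a minimal illustration only: it assumes a leave-one-out scheme and a simple nearest-class-mean classifier on a single methylation value per sample, neither of which is stated to be the classifier actually used. The function names and the sample values are hypothetical.

```python
from statistics import mean


def nearest_class_mean(train, value):
    """Return the label whose mean training value is closest to `value`.

    train: list of (methylation_value, label) pairs of known actual class.
    """
    labels = {label for _, label in train}
    class_means = {
        label: mean(v for v, lab in train if lab == label) for label in labels
    }
    return min(class_means, key=lambda label: abs(class_means[label] - value))


def leave_one_out_error(samples):
    """Hold out each sample in turn, train on the rest, and count mistakes."""
    errors = 0
    for i, (value, actual_label) in enumerate(samples):
        training_set = samples[:i] + samples[i + 1:]
        if nearest_class_mean(training_set, value) != actual_label:
            errors += 1
    return errors / len(samples)


# Hypothetical beta-values for one probe; labels are the known outcomes.
samples = [
    (0.10, "pregnancy"), (0.15, "pregnancy"), (0.20, "pregnancy"),
    (0.80, "non-pregnancy"), (0.85, "non-pregnancy"), (0.90, "non-pregnancy"),
]
print(leave_one_out_error(samples))
```

Each held-out sample is never seen by the model that classifies it, which is what allows the error estimate to reflect performance on new samples.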
Depending on all the parameters relating to the computational predictor explained above, a prediction model is generated which classifies all the samples according to the assigned actual class. Therefore, the genes of the methylation profile in the cervical sample can be used for the positive identification of the endometrial receptivity.
Therefore, the present invention also provides a method for identifying a potential gene associated with the probability of the success of embryo implantation, comprising: (a) providing a cervical sample from a female subject; (b) extracting nucleic acids from the cervical sample; (c) assaying the nucleic acids to generate a methylation profile; (d) in a programmed computer, inputting the data comprising the methylation levels of genes from the methylation profile in the step (c) to a trained algorithm to identify one or more genes in the cervical sample associated with the success of embryo implantation based on the relationship between the methylation levels of genes and the change of endometrial receptivity; and (e) electronically outputting a report that identifies the one or more genes in the cervical sample associated with the probability of the success of embryo implantation.
The present invention uses a statistical analysis to process the differential methylation detection of the methylation profile from the cervical sample, then selects 1733 genes with the best performance listed in Table 4. The present invention further uses hierarchical models to cluster 1733 genes into three clusters, i.e., cluster A, cluster B and cluster C. According to the level in DNA methylation, the cluster A is a group with lower methylation (<10%) comprising 319 genes, the cluster B is a group with middle methylation (20%˜55%) comprising 174 genes and the cluster C is a group with higher methylation (>55%) comprising 1240 genes. In one embodiment, the 1733 genes are divided into cluster A comprising 319 genes, cluster B comprising 174 genes and cluster C comprising 1240 genes, wherein the genes in the clusters A, B and C are listed in Table 4.
The present invention further identifies multi-gene panels that can serve as epigenetic biomarker panels for determining the probability of the success of embryo implantation. Therefore, the present invention selects at least one gene from clusters A, B and/or C to validate. In one embodiment, the at least one gene is selected from the group consisting of the cluster A, the cluster B and the cluster C. For example, the present invention identifies four-, five- and six-gene panels. The AUC reached 0.81 (>0.8) in the 4-gene combination (SYNE1, KCNC2, SLITRK2 and PDE4C). In another embodiment, the AUC was 0.81 in the 5-gene combination (SYNE1, KCNC2, SLITRK2, PDE4C and TMEM62). In another embodiment, the AUC was 0.82 in the 5-gene combinations (SYNE1, KCNC2, SLITRK2, PDE4C and ARID3C; SYNE1, KCNC2, SLITRK2, PDE4C and CASR). In another embodiment, the AUC was 0.82 in the 6-gene combination (SYNE1, KCNC2, SLITRK2, PDE4C, CASR and TMEM62). In another embodiment, the AUC was 0.83 in the 6-gene combination (SYNE1, KCNC2, SLITRK2, PDE4C, CASR and ARID3C).
The present invention also provides a composition comprising a gene combination, wherein the gene combination comprises SYNE1, KCNC2, SLITRK2 and PDE4C, and the gene combination is used for determining the probability of the success of embryo implantation. Therefore, the composition comprises multi-gene panels which can be used for determining the probability of the success of embryo implantation. More generally, the present invention identifies and validates multi-gene panels that can predict clinical pregnancy outcome in IVF cycles with high precision.
In one embodiment, the gene combination further comprises at least one gene selected from the group consisting of TMEM62, ARID3C and CASR.
The present invention uses a statistical method to calculate the statistical value of a gene panel from 1733 genes. Therefore, the present invention uses 5-fold cross-validation to assess classifier performance. The present invention applies 5-fold cross validation with 10 repetitions (500 iterations) for each of the datasets. The maximum and minimum AUCs are calculated (over the 500 iterations). The AUC is averaged from all 500 repetitions of bootstrap sampling, and the confidence intervals are computed from the concatenation of the predicted and actual values through these iterations. In one embodiment, the statistical value of the at least one gene is a value of an area under the curve (AUC) calculated by a receiver operating characteristic (ROC) curve. In a preferred embodiment, the statistical value of the at least one gene is a value of AUC calculated by k-fold cross validation, wherein the k is an integer. In another embodiment, the k is 4, 5, 10, 20, 50, 100 or 500. In a preferred embodiment, the k is 5 or 500. In one embodiment, the value of AUC is calculated by k-fold cross validation which is performed based on the methylation profile of the cervical samples from the non-pregnancy group after receiving embryo implantation and the pregnancy group after receiving embryo implantation, wherein the k is an integer. In a preferred embodiment, the value of AUC is calculated by 500-times bootstrapping which is performed based on the methylation profile of the cervical sample from the non-pregnancy group after receiving embryo implantation and the pregnancy group after receiving embryo implantation.
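As a non-limiting sketch, repeated k-fold cross-validation with an AUC computed on each held-out fold can be written in Python as below. Several simplifying assumptions are made and should not be read as the actual implementation: a single methylation value per sample is used, the "model" fitted on each training split merely learns whether pregnancy samples tend to have higher or lower values, and the AUC is computed as the proportion of (pregnancy, non-pregnancy) pairs that are ranked correctly. All names and data are hypothetical. The number of train/test splits evaluated equals k multiplied by the number of repetitions.

```python
import random
from statistics import mean


def auc(pos_scores, neg_scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def repeated_kfold_auc(values, labels, k=5, repeats=10, seed=0):
    """Return (mean, min, max) AUC over all evaluable held-out folds.

    values: one methylation measure per sample.
    labels: 1 for pregnancy, 0 for non-pregnancy.
    """
    rng = random.Random(seed)
    indices = list(range(len(values)))
    fold_aucs = []
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for test in folds:
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            # Stand-in training step: learn the direction of the difference.
            pos_mean = mean(values[i] for i in train if labels[i] == 1)
            neg_mean = mean(values[i] for i in train if labels[i] == 0)
            sign = 1.0 if pos_mean >= neg_mean else -1.0
            pos = [sign * values[i] for i in test if labels[i] == 1]
            neg = [sign * values[i] for i in test if labels[i] == 0]
            if pos and neg:  # AUC is undefined for single-class folds
                fold_aucs.append(auc(pos, neg))
    return mean(fold_aucs), min(fold_aucs), max(fold_aucs)


# Hypothetical, perfectly separable data: pregnancy samples are hypomethylated.
values = [0.10 + 0.01 * i for i in range(10)] + [0.80 + 0.01 * i for i in range(10)]
labels = [1] * 10 + [0] * 10
print(repeated_kfold_auc(values, labels, k=5, repeats=10))
```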
In addition, the threshold value is determined by a data point on the ROC curve. In one embodiment, the threshold value is 0.5. In a preferred embodiment, the threshold value is 0.7. In a more preferred embodiment, the threshold value is 0.8. In another embodiment, the threshold value is 0.9.
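The thresholding step can be illustrated with the panel AUC values reported in this description. The sketch below is a minimal example in Python; the function name `panels_above_threshold` is hypothetical, and the comparison is strict ("higher than" the threshold), following the wording of the method.

```python
BASE = ("SYNE1", "KCNC2", "SLITRK2", "PDE4C")

# Panel AUCs as reported in this description.
PANEL_AUC = {
    BASE: 0.81,
    BASE + ("TMEM62",): 0.81,
    BASE + ("ARID3C",): 0.82,
    BASE + ("CASR",): 0.82,
    BASE + ("CASR", "TMEM62"): 0.82,
    BASE + ("CASR", "ARID3C"): 0.83,
}


def panels_above_threshold(panel_auc, threshold):
    """Return the panels whose AUC is strictly higher than the threshold."""
    return [panel for panel, value in panel_auc.items() if value > threshold]


for panel in panels_above_threshold(PANEL_AUC, threshold=0.8):
    print(", ".join(panel), PANEL_AUC[panel])
```

With the more preferred threshold value of 0.8, every listed panel qualifies; a stricter threshold narrows the selection to the higher-scoring combinations.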
The present invention further provides a kit for determining the probability of the success of embryo implantation, comprising a composition, wherein the composition comprises first binding molecules for detecting SYNE1, KCNC2, SLITRK2 and PDE4C.
In one embodiment, the composition further comprises second binding molecules for detecting at least one gene, wherein the at least one gene is selected from the group consisting of TMEM62, ARID3C and CASR.
In one embodiment, the form of the binding molecules comprises antibodies, peptides, primers or probes.
The present invention reveals that DNA methylation profiles from cervical secretions differed between pregnancy and non-pregnancy cycles. Using cervical secretions obtained during procedures of embryo transfer, the accuracy of using the methylation status for predicting pregnancy outcomes can be as high as 86.0%, providing a new way to personalize embryo transfer.
The advantage of the present invention is the use of a noninvasive approach that enables confirmation of the test results using pregnancy outcomes. The detection of the cervical secretions for analyzing is able to ensure the avoidance of perturbation of the implantation environment, providing a tool to investigate the monthly variation of endometrial receptivity. Because the analyzed cycle is the conceptional cycle itself, this noninvasive analysis is applicable to both fresh and frozen-thawed embryos. Even for in vivo fertilized embryos in natural conception, this noninvasive test is a promising way of indicating fertile cycles by identifying the receptive endometrium.
Thus, the present invention demonstrates the feasibility of noninvasively assessing endometrial receptivity using methylation status as determined from cervical secretions. The methylation profiles of mid-secretory samples can identify 96.4% of receptive endometria, as confirmed by a viable ongoing pregnancy after embryo transfer in the very same cycle. Predicting receptivity of the endometrium ahead of embryo transfer through quick diagnostic tests can maximize the likelihood of a successful pregnancy by saving good embryos for cycles with a favorable endometrium. The methylation profile not only provides an objective diagnosis of endometrial receptivity, but also reveals the molecules involved in the establishment of pregnancy, which may pave the way for new therapies for endometrial and obstetric diseases.
The examples below are non-limiting and are merely representative of various aspects and features of the present invention.
Materials and Methods:
1. Clinical Samples
The samples were collected from 2018 to 2021. Cycles with at least one good quality embryo ready for transfer were included in the present invention. Written informed consent was obtained from all participating women, with the approval of the ethics committee. Embryos of good quality were defined as follows: (1) cleavage-stage embryos with an adequate number of cells (4-5 cells on day 2 and 7-9 cells on day 3 of culture) as well as less than 20% fragmentation and (2) blastocysts scored ≥3BB according to the Gardner and Schoolcraft grading system.
A sample of cervical secretion was collected during the embryo transfer procedure. The samples used in the present invention were obtained from a cervical sample before embryo transfer. The cervical sample from the female subject should be taken on the 0, 1st, 2nd, 3rd, 4th, or 5th day of progesterone administration (P+0˜P+5) in a hormone replacement treatment (HRT) cycle (with progesterone administration), or in a natural cycle controlled by human chorionic gonadotropin (hCG) with and without modifications for ovulation triggering. In that case, embryo transfer should be carried out on the 5th, 6th or 7th day after oocyte retrieval. Samples were categorized into the pregnancy group and the non-pregnancy group according to the existence of a viable intrauterine pregnancy at 12 weeks of gestation. Overall, 59 pregnancy and 67 non-pregnancy samples were used for matched analysis. These samples were separated into a discovery set and a validation set. The methylomic profiles were generated using the discovery set, including 27 pregnancy and 30 non-pregnancy samples, which were subsequently used for verification of the array data. The validation set included 32 pregnancy and 37 non-pregnancy samples, which were used to validate the methylation levels of the selected genes (Table 1). Clinical characteristics of the enrolled embryo transfer cycles were recorded, including the age of the women at embryo transfer, the presence of endometriosis, the use of ovarian stimulation, and the number of embryos per transfer. Fresh embryos were transferred after IVF following ovarian stimulation and oocyte retrieval. In cycles of frozen embryo transfer, the endometrium was prepared by hormone replacement treatment. For women with endometriosis or adenomyosis, the preparation of endometrium was preceded by pituitary downregulation for at least 1 month.
2. DNA Extraction
The cervical secretions were collected before the embryo transfer procedure (P+0˜P+5) using a cotton wool ball, which was put into a 50 ml centrifuge tube and stored at 4° C. One milliliter of phosphate buffered saline was used to rinse the cotton wool ball, which was then centrifuged at 1000 g for 10 min to collect the flow-through. Genomic DNA was extracted from the flow-through using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany). DNA extracts were stored at −20° C. or −80° C. before use.
3. Differential Methylomics and Bioinformatic Analysis
The present invention generated methylomic profiles of samples from the discovery set using the Infinium MethylationEPIC BeadChip array, which covered more than 850,000 CpG sites (Illumina, San Diego, Calif., USA). In the beadchip system, the β-value (ranging from 0 to 1), where 0.0 is equivalent to 0% methylation and 1.0 is equivalent to 100% methylation at a given CpG dinucleotide, was used to present the DNA methylation level of each probe. The methylation levels derived from type I and type II probes were normalized by the Beta-Mixture Quantile (BMIQ) method. After probes with single-nucleotide polymorphisms (SNPs) were removed, the differentially methylated probes (DMPs) were identified as probes with a detecting P value <0.05 and an absolute β difference (|Δβ|) >0.02. Next, the present invention focused on DMPs at promoter regions and ranked them by the area under the receiver operating characteristic curve (AUC). A higher AUC meant higher accuracy in differentiating pregnancy and non-pregnancy samples. The performance of various DMP sets, such as Top 3000, Top 2000, and Top 1500, was evaluated by the percentage of correct categorization of samples in terms of pregnancy outcomes. The top 2000 DMP set had the best performance and was selected for the following analysis (Table 2).
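The filtering and ranking steps described above can be sketched in Python as follows. This is an illustration under stated assumptions: the per-probe P value, group mean β-values and AUC are taken as already computed, and the record layout, field names, probe identifiers and numbers are all hypothetical.

```python
def select_dmps(probes, p_cutoff=0.05, delta_beta_cutoff=0.02, top_n=None):
    """Keep probes passing both cutoffs, ranked by AUC (highest first).

    probes: list of dicts with keys 'id', 'p_value', 'beta_pregnancy',
            'beta_non_pregnancy' and 'auc' (hypothetical layout).
    """
    kept = [
        probe for probe in probes
        if probe["p_value"] < p_cutoff
        and abs(probe["beta_pregnancy"] - probe["beta_non_pregnancy"])
        > delta_beta_cutoff
    ]
    ranked = sorted(kept, key=lambda probe: probe["auc"], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical probes (identifiers and values are made up for illustration).
probes = [
    {"id": "probe_1", "p_value": 0.010, "beta_pregnancy": 0.30,
     "beta_non_pregnancy": 0.40, "auc": 0.71},
    {"id": "probe_2", "p_value": 0.200, "beta_pregnancy": 0.10,
     "beta_non_pregnancy": 0.50, "auc": 0.80},   # fails the P value cutoff
    {"id": "probe_3", "p_value": 0.001, "beta_pregnancy": 0.600,
     "beta_non_pregnancy": 0.605, "auc": 0.66},  # fails the beta difference cutoff
    {"id": "probe_4", "p_value": 0.030, "beta_pregnancy": 0.75,
     "beta_non_pregnancy": 0.60, "auc": 0.78},
]
print([probe["id"] for probe in select_dmps(probes)])
```

Passing `top_n` mirrors the comparison of Top 3000, Top 2000 and Top 1500 sets: the same ranked list is simply truncated at different lengths.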
4. Bisulfite Conversion
DNA was bisulfite-converted from 500 pg-2 μg genomic DNA, cDNA or fragmented DNA, using the EZ DNA Methylation Kit, EZ DNA Methylation-Direct Kit, EZ DNA Methylation-Gold Kit, EZ DNA Methylation-Lightning Kit (Zymo Research Corp., Irvine, Calif., USA) or other commercial kits, according to the manufacturer's recommendations.
5. Statistical Analysis
The Mann-Whitney nonparametric U test was used to identify differences in methylation levels between the two sample groups. The significance of all differences was assessed using a two-tailed t test for continuous variables and Fisher's exact test for categorical variables, with a threshold for significance of P<0.05. AUC was calculated using the Youden index in the ROC package. To estimate the performance of gene combinations in predicting pregnancy outcomes, a logistic regression model based on 500 rounds of five-fold cross-validation on all samples was performed to calculate AUC. The aforementioned analyses were performed and the plots were created using the statistical package in R (version 3.3.2) or MedCalc version 19 (MedCalc Software Ltd., Ostend, Belgium; 2018).
6. Biomarker Panel Selection
Heat map analysis combined with hierarchical clustering was performed to investigate whether the 2000 DMPs clearly differentiated between the non-pregnancy group and the pregnancy group (
7. Measurement of Methylation Levels by qMSP
To verify the array data, a biomarker panel was designed and validated. One to two genes in each subgroup of the top 2000 DMPs were selected for quantifying DNA methylation levels using real-time polymerase chain reaction. The primers were designed by Oligo 7.0 Primer Analysis software (Molecular Biology Insights, Inc., Colorado Springs, Colo., USA). Quantitative methylation-specific polymerase chain reaction (qMSP) assays were performed on the LightCycler 480 System (Roche, Indianapolis, Ind., USA). Duplicate testing was conducted for each gene in all samples. To normalize the amount of input DNA in each qMSP reaction, a type II collagen gene (COL2A1), located in a non-CpG region, was used as a reference. DNA methylation levels were estimated by the difference in crossing point (ΔCp) values, defined as follows: ΔCp = Cp of target gene − Cp of COL2A1. Samples with a COL2A1 Cp value >36 were defined as not detectable.
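The ΔCp calculation and the detectability rule can be expressed as a short Python function. This is a minimal sketch; the function name and the Cp values in the example are hypothetical, and the handling of duplicate measurements is deliberately left out because it is not specified here.

```python
from typing import Optional


def delta_cp(cp_target: float, cp_col2a1: float,
             max_reference_cp: float = 36.0) -> Optional[float]:
    """Return Cp(target gene) - Cp(COL2A1), or None if the sample is not detectable.

    A sample is treated as not detectable when the COL2A1 reference Cp
    exceeds max_reference_cp (36 in this description).
    """
    if cp_col2a1 > max_reference_cp:
        return None
    return cp_target - cp_col2a1


# Hypothetical crossing-point values.
print(delta_cp(30.5, 27.0))   # reference within range
print(delta_cp(33.0, 37.2))   # reference Cp above 36: not detectable
```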
8. Hierarchical Cluster Analysis
Hierarchical cluster analysis is a step-by-step clustering procedure. The distance matrix was calculated using Euclidean or Manhattan distance, and the complete linkage method was used to generate a dendrogram. A distance threshold was then applied to separate the optimal subgroups.
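A compact, non-limiting Python sketch of agglomerative clustering with Euclidean distance, complete linkage and a distance threshold is given below. It is written with the standard library only and is not the software used for the analysis; the function names, the threshold and the example β-value vectors are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def complete_linkage_clusters(points, distance_threshold, dist=euclidean):
    """Merge the two closest clusters until the closest pair exceeds the threshold.

    Under complete linkage, the distance between two clusters is the largest
    distance between any member of one and any member of the other.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return max(dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        candidates = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        best_distance, a, b = min(candidates)
        if best_distance > distance_threshold:
            break  # cutting the dendrogram at this height
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical samples described by beta-values at two probes.
points = [(0.05, 0.08), (0.07, 0.06), (0.40, 0.42), (0.45, 0.38), (0.90, 0.88)]
print(complete_linkage_clusters(points, distance_threshold=0.2))
```

Swapping `dist` for a Manhattan distance function changes only the distance matrix; the linkage and threshold logic stay the same.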
Results:
1. Genome-Wide Methylation Profiles of Cervical Secretions
As illustrated in
Samples of cervical secretions were categorized as pregnancy and non-pregnancy according to the existence of a viable intrauterine pregnancy at 12 weeks of gestation following embryo transfer. The discovery set included 28 pregnancy and 29 non-pregnancy samples. Clinical characteristics of embryo transfer cycles enrolled in the discovery set are described in Table 1. The measurement of methylation levels was reliable, as shown by the high correlation (R2=0.99) between technical replicates (
There were 23569 CpG sites with significant differences in methylation between pregnancy and non-pregnancy samples, accounting for 3.2% of total probes (
2. Predicting Pregnancy Outcomes by Differential DNA Methylation
Unsupervised hierarchical clustering analysis of all DMPs correctly categorized 45 out of the 57 samples (78.9%) according to pregnancy status (Table 2). The percentage of correct categorization became higher (84.2%) when only the 5569 DMPs located at promoter regions were used for analysis (Table 2). The present invention further eliminated less relevant probes to identify the panel with the best performance by ranking the promoter DMPs according to AUC, which represented the ability of methylation levels to separate pregnancy from non-pregnancy samples. During this process, the percentages of correct categorization for all samples as well as for pregnancy samples increased until the size of DMPs was less than 2000 (Table 2). The top 2000 promoter DMPs were 86.0% correct for all samples and 96.4% correct for pregnancy samples, which constituted the profile with the fewest probes and the best performance for differentiating pregnancy and non-pregnancy samples.
Analysis of the top 2000 DMPs by unsupervised hierarchical clustering was performed, as shown in Table 3, which revealed three main clusters that divided the 57 cervical secretion samples according to pregnancy outcomes. The first cluster (C1) included 3 samples all from pregnancy cycles. The second cluster (C2) included most of the pregnancy samples, that is, 24 pregnancy and 7 non-pregnancy samples. In contrast, most of the non-pregnancy samples clustered in the third cluster (C3), which included 22 non-pregnancy samples and only one pregnancy sample (Table 3). Factors that may influence pregnancy outcomes were analyzed, such as the age of women receiving embryo transfer, the presence of endometriosis, and the exposure to supraphysiological hormone levels due to ovarian stimulation. None of the above factors was correlated with the three clusters, implying the specificity of the selected DMPs to pregnancy status.
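The relationship between the cluster composition and the reported percentages can be checked with a few lines of Python. The cluster counts are those given above; the assumption, which is not stated explicitly, is that clusters C1 and C2 are read as predicting pregnancy and cluster C3 as predicting non-pregnancy. The variable and function names are hypothetical.

```python
# (pregnancy samples, non-pregnancy samples) per cluster, as given above.
CLUSTERS = {"C1": (3, 0), "C2": (24, 7), "C3": (1, 22)}

# Assumption: these clusters are interpreted as "predicted pregnancy".
PREDICTED_PREGNANCY = {"C1", "C2"}


def categorization_rates(clusters, predicted_pregnancy):
    """Return (% correct over all samples, % correct over pregnancy samples)."""
    total = sum(p + n for p, n in clusters.values())
    total_pregnancy = sum(p for p, _ in clusters.values())
    correct_pregnancy = sum(
        p for name, (p, _) in clusters.items() if name in predicted_pregnancy
    )
    correct_non_pregnancy = sum(
        n for name, (_, n) in clusters.items() if name not in predicted_pregnancy
    )
    overall = 100.0 * (correct_pregnancy + correct_non_pregnancy) / total
    pregnancy_only = 100.0 * correct_pregnancy / total_pregnancy
    return round(overall, 1), round(pregnancy_only, 1)


print(categorization_rates(CLUSTERS, PREDICTED_PREGNANCY))
```

Under this reading, 49 of 57 samples and 27 of 28 pregnancy samples are categorized correctly, which reproduces the 86.0% and 96.4% figures.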
The ability of the top 2000 DMPs to classify samples according to pregnancy outcomes could also be characterized with other machine learning techniques. Upon analysis by k-means clustering, the top 2000 DMPs partitioned the 57 samples into 5 clusters. Two clusters comprised exclusively pregnancy samples and another two clusters comprised exclusively non-pregnancy samples. Only one cluster contained both types of samples: 15 samples, comprising 9 pregnancy and 6 non-pregnancy cases (
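As an illustration of the k-means analysis, the sketch below partitions samples by their methylation vectors and then tabulates how many pregnancy and non-pregnancy samples fall into each cluster. It is a plain Lloyd-style k-means written for clarity; the initial centroids are passed in explicitly, and all function names are hypothetical. The specification does not state which implementation, initialization, or distance measure was used.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    samples: List[Sequence[float]],
    initial_centroids: List[Sequence[float]],  # assumption: supplied by the caller
    max_iter: int = 100,
) -> List[int]:
    """Return the cluster index assigned to each sample."""
    centroids = [list(c) for c in initial_centroids]
    assignment = [-1] * len(samples)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(s, centroids[k]))
            for s in samples
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [s for s, a in zip(samples, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_composition(
    assignment: List[int], is_pregnant: List[bool], k: int
) -> List[Tuple[int, int]]:
    """Per cluster: (number of pregnancy samples, number of non-pregnancy samples)."""
    counts = [[0, 0] for _ in range(k)]
    for a, y in zip(assignment, is_pregnant):
        counts[a][0 if y else 1] += 1
    return [tuple(c) for c in counts]
```

A cluster whose composition has a zero in either position corresponds to the "exclusively pregnancy" or "exclusively non-pregnancy" clusters described above.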
3. Microarray Verification by qMSP
To verify how methylation status reflects pregnancy status as discovered by microarray, the methylation levels of selected genes were measured by qMSP using the same samples from which the microarray results were generated. The top 2000 DMPs were associated with 1733 genes, which are listed as candidate genes in Table 4. Simultaneously, the present invention minimized the number of features to select the best multi-biomarker panel for pregnancy outcome prediction. The 1733 candidate genes could be divided into 3 clusters, A, B and C (Table 4). The algorithm likewise clustered the top 2000 DMPs into three major groups: 355 DMPs in cluster A (comparatively hypomethylated), 191 DMPs in cluster B, and 1454 DMPs in cluster C (comparatively hypermethylated).
4. A Methylation Biomarker Panel
One, two or more genes were selected from each of the 3 subgroups (clusters A, B and C) obtained by hierarchical clustering of the top 2000 DMPs to create a biomarker panel. The differences in methylation levels of the selected genes between pregnancy and non-pregnancy samples were tested by quantitative methylation-specific polymerase chain reaction (qMSP). The present invention further selected SYNE1 from cluster A; ARID3C, CASR, PDE4C and SLITRK2 from cluster B; and TMEM62 and KCNC2 from cluster C to validate the pregnancy outcome prediction in IVF. Among the seven selected genes, the AUCs of each single gene ranged from 0.53 to 0.73 in 20 pregnancy and 23 non-pregnancy samples, and from 0.53 to 0.78 in another 32 pregnancy and 37 non-pregnancy samples. To further test the validity of these markers, all 126 samples were used to estimate the performance of gene combinations by a logistic regression model with 500 rounds of bootstrapping. As demonstrated in Table 5, the AUCs of each single gene ranged from 0.5 to 0.70. Among the selected genes, two (SLITRK2 and KCNC2) had only been reported in the nervous system, and their role in the endometrium was not known. SLITRK2 encodes a transmembrane protein that is involved in the formation and maintenance of synapses. KCNC2 encodes components of voltage-gated potassium channels that are required to maintain high-frequency firing in neocortical GABAergic interneurons. As for two of the other genes, SYNE1 encodes a spectrin repeat-containing protein that anchors the nuclear envelope to the cytoskeleton, which is critical for nuclear positioning, and ARID3C encodes a helix-turn-helix transcription factor, implying a role in the regulation of gene expression during cell growth, differentiation and development. Multiple markers combined in a biomarker panel may improve diagnostic sensitivity and help to optimize the pregnancy outcome prediction in IVF.
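One way to estimate the performance of a gene combination by logistic regression with bootstrapping is sketched below. This is an assumption-laden illustration: the gradient-descent fit, its learning rate and epoch count, and the decision to compute each AUC on the same resample the model was fitted to are choices made here for brevity and are not taken from the specification. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Batch gradient descent on the log-loss; returns [intercept, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = predict(w, row) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict(w: List[float], row: Sequence[float]) -> float:
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))


def bootstrap_auc(X: List[Sequence[float]], y: List[int],
                  n_boot: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Mean AUC and the individual AUCs over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # resample lacks one class, so AUC is undefined
        Xb = [X[i] for i in idx]
        w = fit_logistic(Xb, yb)
        aucs.append(auc([predict(w, r) for r in Xb], yb))
    if not aucs:
        raise ValueError("no resample contained both classes")
    return sum(aucs) / len(aucs), aucs
```

Here each row of `X` would hold the qMSP methylation levels of the genes in one candidate combination, and `y` would hold 1 for pregnancy and 0 for non-pregnancy.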
5. Cross-Validation of Gene Combinations for Predicting Pregnancy Outcomes
To further test the performance of combinations of the selected genes in predicting pregnancy outcomes, five-fold cross-validation was performed on all 126 samples, including the discovery and validation sets, to simulate a larger data set that could be used to estimate the out-of-sample performance. In each round of cross-validation, samples were randomly partitioned into five equal-sized subgroups. Four subgroups were used to perform the analysis (the training set) and the remaining subgroup was used to validate the analysis (the testing set), and the AUC score was computed. The process was repeated 5 times, with each of the subgroups used exactly once as the validation data. After 500 rounds of five-fold cross-validation, the validation results were logistically regressed, as demonstrated in Table 6. A four-gene panel (including SYNE1, KCNC2, SLITRK2, and PDE4C) was established for the prediction model. The ROC curve revealed good predictive performance (AUC=0.81). Five-gene or six-gene combinations showed slightly higher AUCs (0.81-0.83).
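The repeated five-fold cross-validation described above can be sketched as follows. Several details are assumptions made for illustration: the held-out scores of each round are pooled into a single AUC per round, and a simple centroid-difference scorer stands in for the logistic regression model so that the example stays short. The function names are hypothetical, and any model exposing the same `fit` interface could be substituted.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Scorer = Callable[[Row], float]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Randomly partition indices 0..n-1 into k folds whose sizes differ by at most one."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_centroid_scorer(X: List[Row], y: List[int]) -> Scorer:
    """Stand-in model (not logistic regression): score a sample by its projection
    onto the difference between the pregnancy and non-pregnancy centroids."""
    pos = [r for r, t in zip(X, y) if t == 1]
    neg = [r for r, t in zip(X, y) if t == 0]
    diff = [sum(p) / len(pos) - sum(q) / len(neg) for p, q in zip(zip(*pos), zip(*neg))]
    return lambda row: sum(w * x for w, x in zip(diff, row))


def repeated_k_fold_auc(X: List[Row], y: List[int],
                        fit: Callable[[List[Row], List[int]], Scorer],
                        k: int = 5, rounds: int = 500, seed: int = 0) -> List[float]:
    """One AUC per round, computed from scores each sample received while held out."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(rounds):
        held_out_scores = [0.0] * len(X)
        for test_idx in k_fold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            scorer = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:  # each sample is scored exactly once per round
                held_out_scores[i] = scorer(X[i])
        aucs.append(auc(held_out_scores, y))
    return aucs
```

Averaging the returned list gives an out-of-sample AUC estimate for one gene combination; repeating the call for each combination allows four-, five- and six-gene panels to be compared on the same footing.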
Feature selection is necessary along with model estimation to reduce data dimension and model complexity. The above findings suggested that the methylation levels of the selected genes have potential diagnostic use as biomarkers. Importantly, combining features into a multi-biomarker panel could be an effective approach to improving diagnostic accuracy.
The expression of these selected genes in normal endometrium throughout the menstrual cycle was retrieved from publicly available single-cell RNA-seq data. Only KCNC2, PDE4C, SYNE1, and TMEM62 were available in the database. As illustrated in
The endometrium undergoes cyclic changes involving cell proliferation, differentiation and degradation, which are driven by steroid hormones (
Predicting the receptivity of the endometrium ahead of embryo transfer through quick diagnostic tests would maximize the chances of a successful pregnancy by reserving good embryos for cycles with a favorable endometrium. The methylation profile not only provided an objective diagnosis of endometrial receptivity, but also unraveled the molecular mechanisms involved in the establishment of pregnancy, which may pave the way for new therapies for endometrial and obstetrical diseases.
Those skilled in the art will recognize the foregoing as a description of the method for assessing endometrial receptivity. The skilled artisan will recognize that the embodiments described are illustrative only and that many equivalents are possible.
Number | Date | Country | |
---|---|---|---|
63124097 | Dec 2020 | US |