This invention relates to the field of genetic testing of pregnant females for diagnosing fetal chromosomal aneuploidy and determining fetal gender from maternal peripheral blood samples.
Fetal chromosomal aneuploidy results from the presence of abnormal dose(s) of a chromosome or chromosomal region. Down syndrome, or trisomy 21 (T21), is the most common incurable chromosomal aneuploidy in live-born infants and is typically associated with physical and mental disability (Parker et al. 2010). The overall incidence of T21 is approximately 1 in 700 births in the general obstetrical population, but this risk increases to 1 in 35 term births for women 45 years of age. An invasive diagnostic procedure is currently the only way to confirm the diagnosis of T21, commonly by fetal cytogenetic analysis (such as karyotyping), which requires fetal genetic material to be obtained invasively by amniocentesis, chorionic villus sampling or cordocentesis. Because of the risk associated with these procedures, invasive prenatal testing is currently offered only to women in the high-risk group. Although the safety of invasive procedures has improved since their introduction, a well-recognized risk of fetal loss (0.5 to 1% for chorionic villus sampling and amniocentesis) and of follow-up infections still remains (Akolekar et al. 2015). Hence, non-invasive and highly reliable prenatal screening tests that reduce the number of invasive diagnostic procedures are still required.
Since the discovery of fetal genomic material in the form of circulating cell-free fetal DNA (cffDNA) in the blood plasma of pregnant women (Lo et al., 1997), many attempts have been made to use cffDNA for non-invasive, risk-free prenatal testing (NIPT). Early applications of NIPT included the determination of Rhesus D blood-group status and fetal sex as well as the diagnosis of autosomal dominant disorders of paternal inheritance by quantitative real time PCR (qPCR) (Lo et al., 1998; Daniels et al., 2006). However, the application of cffDNA to the prenatal detection of fetal chromosomal aneuploidies has represented a considerable challenge. First of all, cffDNA represents only a subfraction of 6-10% of the total cell-free DNA (cfDNA) in the maternal circulation in first and second trimester pregnancies, rising to 10-20% in third trimester pregnancies (Lun et al., 2008; Lo et al., 2010), and the maternal background can often interfere with the analysis of fetal nucleic acids. One way to deal with the low abundance of fetal DNA was to evaluate the dosage of chromosome 21 by calculating the ratios of polymorphic alleles in placenta-derived DNA/RNA molecules (Lo and Chiu, 2007). However, this method can only be applied to fetuses that are heterozygous for the targeted polymorphisms.
A study by Zimmermann et al. (2002) distinguished between trisomy 21 and euploid fetuses using qPCR, based on the 1.5-fold increase in chromosome 21 dosage in the trisomic cases. Since a 2-fold difference in DNA template concentration corresponds to a difference of only one threshold cycle (CT), the discrimination of a 1.5-fold difference is at the limit of conventional qPCR.
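The cycle-threshold arithmetic behind this limit can be made explicit. The following is a minimal sketch, assuming ideal amplification in which the template doubles in every cycle; the function name `delta_ct` is illustrative only and is not taken from the cited study.

```python
import math

def delta_ct(fold_change: float) -> float:
    """Threshold-cycle shift expected for a given fold change in template,
    assuming perfect doubling per cycle (an idealization)."""
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return math.log2(fold_change)

# A 2-fold difference corresponds to one full cycle, whereas the 1.5-fold
# dosage difference discussed above corresponds to well under one cycle.
print(delta_ct(2.0))
print(round(delta_ct(1.5), 3))
```

Under this assumption a 1.5-fold dosage difference shifts the threshold cycle by roughly 0.58 of a cycle, which illustrates why it sits at the resolution limit of conventional qPCR.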
With the development of massively parallel sequencing (MPS), fetal aneuploidy is detected by counting cfDNA molecules and measuring the over- or underrepresentation of any chromosome in maternal plasma. As previous reports have indicated that cffDNA is shorter than its maternal counterpart (Chan et al, 2004; Li et al, 2004; Fan et al, 2010), MPS has been combined with size fractionation of plasma DNA fragments, performed either prior to sequencing or in silico, to enrich for fetal DNA. However, even though MPS has been widely used in commercial prenatal testing, such an approach requires deep coverage or paired-end sequencing, which increases the cost of the service.
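The size of the over- or underrepresentation that molecule counting has to resolve follows from the fetal fraction. The sketch below assumes that maternal and fetal cfDNA contribute to a chromosome's signal in proportion to chromosome copy number; the function name and its parameters are hypothetical and serve only to illustrate the arithmetic.

```python
def expected_dosage_ratio(fetal_fraction: float, fetal_copies: int = 3) -> float:
    """Expected chromosome dosage in maternal plasma relative to a euploid
    pregnancy, assuming a euploid mother (two copies) and signal that is
    proportional to copy number."""
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must lie between 0 and 1")
    maternal_part = (1.0 - fetal_fraction) * 2
    fetal_part = fetal_fraction * fetal_copies
    return (maternal_part + fetal_part) / 2.0

# With a 10% fetal fraction, a trisomic fetus raises the chromosome's
# representation by only about 5% over the euploid expectation.
print(expected_dosage_ratio(0.10))
```

With fetal fractions in the range quoted above for early pregnancy, the expected shift is only a few percent, which is why deep counting or fetal enrichment is needed.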
An alternative approach to improve the sensitivity and cost-effectiveness of NIPT is preferential targeting of fetal DNA sequences by utilizing epigenetic differences between maternal blood DNA and cffDNA.
Bisulfite conversion, which enables analysis of the methylation status of each CG site, followed by either methylation-specific PCR or sequencing, has been applied to detect methylation differences between maternal and fetal DNA (Chim, et al. 2005; Chiu, et al. 2007; Chim, et al. 2008; Lun et al, 2013; Jensen et al, 2015). However, although it provides high resolution, bisulfite treatment exacerbates the degradation of the already low amounts of fetal DNA, complicating fetal-specific methylome analysis. Furthermore, screening genomes for diagnostic differentially methylated regions (DMRs) by whole-genome bisulfite sequencing is technologically demanding and extremely expensive, leading to an unnecessary increase in the cost of NIPT.
The application of methylation-sensitive restriction digestion involves the use of methylation-sensitive restriction enzymes to remove hypomethylated maternal DNA, thus allowing direct polymerase chain reaction (PCR) analysis of cffDNA (Old, et al. 2007; Tong et al, 2010). However, methylation-sensitive restriction digestion is inherently limited by the sequence specificity of the available enzymes, which restricts the number of DMRs suitable for testing.
The methylcytosine-immunoprecipitation based approach (MeDIP) was used in combination with oligonucleotide array analysis, sequencing and MeDIP-qPCR for the quantification of selected hypermethylated fetal DMRs on chromosome 21 (Papageorgiou et al., 2009, Tsaliki et al, 2012, Keravnou et al, 2016). However, MeDIP enrichment is biased to highly methylated sequences (Weber et al. 2005) and thus, the potential diagnostic informativeness of the less CG dense or less methylated sequences might be lost. Therefore, further developments and advances are necessary for the identification and detection of highly specific and stable fetal-specific markers.
Placental DNA was reported to be generally hypomethylated as compared to maternal blood DNA. Examination of the differential methylation between placenta and maternal blood uncovered large contiguous genomic regions with significant placental hypomethylation relative to non-pregnant female cfDNA (Jensen et al, 2015). Moreover, these regions are of low CpG and gene density and thus could be poorly covered by affinity enrichment methods, such as MeDIP. Since the unmodified CG fraction represents a smaller portion of the human genome (20-30% of CGs are unmethylated), its targeted analysis is more relevant for cost-effective and sensitive detection of fetal-specific DNA fragments in the maternal circulation.
In recent years, we and others have adapted covalent derivatization for epigenome-wide studies of various cytosine modifications (Song et al. 2011; Kriukienė et al. 2013; Staševskij et al. 2017; Gibas et al, 2020, accepted). Generally, robust and highly specific enrichment of a covalently modified minor fraction of cytosines in the cffDNA, for example of unmodified CGs or hydroxymethylated cytosines, could potentially help achieve superior sensitivity and specificity in prenatal diagnostics. More importantly, a method for highly specific targeted analysis of a particular fraction of fetal regions, combined with lower-cost next generation sequencing devices or real time quantitative PCR (qPCR), can significantly reduce the cost and turnaround time of NIPT, increasing the availability of NIPT screening for all pregnancies without restriction to a high-risk group.
In the first aspect, the present invention provides a new method for noninvasive prenatal diagnosis based on analysis of unmodified CG sites (uCG) or hydroxymethylated CGs (hmCGs) in nucleic acid molecules extracted from a biological sample obtained from a pregnant female, typically during the first trimester of gestation, through the use of covalent modification of uCGs or hmCs and subsequent estimation of the labeled fraction of CG sites, enabling genome-wide identification of the fetal-specific regions.
According to one exemplary embodiment, a biological sample received from a pregnant female is analyzed to perform a prenatal diagnosis of a fetal chromosomal aneuploidy, such as trisomy 21 (T21), and to determine fetal gender.
A maternal biological sample includes nucleic acid molecules found in various maternal body fluids, such as peripheral blood or a fractionated portion of peripheral blood, urine, plasma, serum, and other suitable biological samples. In a preferred embodiment, the maternal biological sample is a fractionated portion of maternal peripheral blood.
A large number of differentially labeled regions (DLRs) on chromosomes 21, 13 and 18, which are differentially modified between a non-pregnant female peripheral blood DNA sample and DNA of placental origin (chorionic villi (CV) of the fetal part of the placenta, which are enriched in fetal trophoblasts) or between a non-pregnant female peripheral blood DNA sample and a peripheral blood DNA sample of pregnant women, have been identified using covalent chemical modification of the cytosine base of naturally unmodified CG sites or hydroxymethylated CG sites in maternal nucleic acid molecules. Subsequent PCR amplification with or without enrichment of the labeled fraction of CG sites, coupled with sequence determination of the labeled and amplified nucleic acid molecules, enabled genome-wide identification of the fetal-specific labeled regions. As used herein, the term DLR refers to a “differentially labeled genomic region” that is more or less intensively labeled through enzymatic transfer of a reactive group onto the cytosine base in the nucleic acid molecule. For the purposes of the invention, the preferred DLRs (selected u-DLRs; see Table 4) are those that are hypomethylated, and thus more intensively labeled, in fetal DNA and hypermethylated in maternal DNA. In another aspect, the preferred DLRs (selected hm-DLRs; see Table 5) are those that are hyper-hydroxymethylated, and thus more intensively labeled, in fetal DNA and hypo-hydroxymethylated in maternal DNA.
In one embodiment, a DLR can be confined to a single cytosine or a dinucleotide, preferentially a CG dinucleotide (CG-DLRs).
Representative examples of a subset of these u-DLRs, hm-DLRs and CG-DLRs have been used to accurately predict trisomy 21, in a method based on analysis of fetal-specific hypomethylated or hydroxymethylated DNA in a sample of maternal blood, typically during the first trimester of gestation. Thus, the effectiveness of the disclosed DLRs and methodologies for diagnosing fetal aneuploidies has been demonstrated.
In addition, representative examples of a subset of these u-DLRs and hm-DLRs have been used to accurately predict fetal gender from X and Y chromosomes, in a method based on analysis of fetal-specific hypomethylated DNA in a sample of maternal blood, typically during the first trimester of gestation. Thus, the effectiveness of the disclosed DLRs and methodologies for determining fetal gender has been demonstrated.
Accordingly, the invention pertains to a method for prenatal diagnosis of trisomy 21 and determination of fetal gender using a sample of maternal blood, the method comprising:
(a) enzymatic labeling of uCG and hmC sites of nucleic acid molecules in a sample of maternal blood with a first reactive group, preferably an azide group;
(b) chemically tethering an oligodeoxyribonucleotide (ODN) having the second reactive group, preferably an alkyne group, to the first group in a template nucleic acid;
(c) producing nucleic acid molecules from a template nucleic acid sequence using a nucleic acid polymerase which contacts a template nucleic acid sequence at or around the site of the labeled uCG/hmC and starts polymerization from the 3′-end of a primer non-covalently attached to the ODN;
(d) determining the presence or availability of the CG target sites and hence the level of the unmodified or hydroxymethylated template genomic nucleic acid molecules across the regions of chromosomal DNA shown in Tables 4 or 5, or 6;
(e) comparing the acquired value of the regions of step (d) to a standard reference value for the combination of at least one region from the list shown in Tables 4-6, wherein the standard reference value is (i) a value for a DNA sample from a woman bearing a fetus without trisomy 21; or (ii) a value for a DNA sample from a woman bearing a fetus with trisomy 21;
(f) diagnosing a trisomy based on said comparison, wherein trisomy 21 is diagnosed if the acquired value of the regions of step (d) is (i) higher than the standard reference value from a woman bearing a fetus without trisomy 21; or (ii) lower than the standard reference value from a woman bearing a fetus without trisomy 21; or (iii) comparable to the standard reference value from a woman bearing a fetus with trisomy 21; and
(g) detecting fetal gender based on said comparison wherein female gender of a fetus is detected if the acquired value of the regions of step (d) is comparable to the standard reference value from a woman bearing a female fetus, and male gender of a fetus is detected if the acquired value of the regions of step (d) is comparable to the standard reference value from a woman bearing a male fetus.
In the present embodiment, the method comprises the measurement of the presence or availability of the target CG sites in the template nucleic acid molecules by sequencing of the amplified nucleic acid molecules of the biological sample, such that only the sequence of the targeted CGs and hence the unmodified/hydroxymethylated fraction of CGs is determined. In this embodiment, amplification prior to sequencing is performed through the ODN-directed and ligation-mediated PCR using one primer bound complementary to the ODN or a part of it in the absence of complementarity to the genomic template region, and the second primer bound through non-covalent complementary base pairing to oligonucleotide linkers ligated to both ends of the template nucleic acid molecule. In another aspect of this embodiment, amplification prior to sequencing can be performed by targeted PCR amplification utilizing one primer bound complementary to the ODN or a part of it in the presence (5-7 nucleotides complementarity to the genomic template DNA in the proximity of a CG site) or absence of complementarity to the genomic template DNA, and the second primer bound through non-covalent complementary base pairing to the template DNA in the chromosomal regions shown in Tables 4 or 5 or 6 or 7.
In further embodiments, the method comprises the measurement of the presence or availability of the labeled target sites and hence the level of the unmodified or hydroxymethylated template nucleic acid molecules by real time quantitative polymerase chain reaction (qPCR) of the enriched fetal CGs and DNA regions, which have been previously covalently targeted and pre-amplified using attached ODN as described above, utilizing one primer with its 5′ end bound complementary to the chromosomal regions shown in Tables 4-7 in the very close vicinity (its 5′ end binds at or more than 5 nucleotides to a labeled CG site) to the labeled cytosine, and the second primer bound complementary to the template DNA in the selected chromosomal regions shown in Tables 4 or 5 or 6 or 7.
In yet another aspect, the method comprises the measurement of the presence or availability of the labeled target sites and hence the level of the unmodified or hydroxymethylated template nucleic acid molecules in a non-preamplified DNA sample by real time quantitative polymerase chain reaction, utilizing one primer that recognizes and binds to the ODN and to 5-7 nucleotides adjacent to the target CG site in a template genomic DNA through non-covalent complementary base pairing, and a second primer that binds complementarily to the template DNA in the selected chromosomal regions shown in Tables 4 or 5 or 6 or 7.
In the preferred embodiment of the invention, the plurality of differentially labeled regions (DLRs) preferably is chosen from the lists shown in Tables 4-7. In various embodiments, the levels of the plurality of DLRs are determined for at least one DLR, for example chosen from the lists shown in Tables 4-7. Preferably, the levels of the plurality of DLRs in the labeled DNA sample are determined by real time quantitative polymerase chain reaction (qPCR). As used herein, the term “a plurality of DLRs” is intended to mean one or more DLRs (or CG dinucleotides).
In a further aspect, the present invention pertains to a kit, comprising the composition of the invention. In other embodiments, the kit further comprises:
(a) an enzyme capable of covalent derivatization of the cytosine base with an active group, preferentially an azide group;
(b) a compound comprising the active group (an azide group);
(c) an ODN attached to the second reactive group, preferably an alkyne group; and
(d) oligonucleotide primers (e.g., two or more) for assessment of DLR regions through PCR amplification, wherein one primer binds to the ODN or in the close vicinity to the ODN attachment site through non-covalent complementary base pairing and is able to prime a nucleic acid polymerization reaction from the labeled CG and the second primer binds to the genomic regions described in Tables 4-7;
(e) in another embodiment, the kit can further comprise oligonucleotide linkers for ligation and/or oligonucleotide primers for PCR amplification of the nucleic acid molecules to be analyzed by qPCR or sequencing.
The present invention is based, at least in part, on the inventors' identification of a large panel of differentially labeled regions (DLRs) and CGs (CG-DLRs) that exhibit strong labeling in fetal DNA and weak or absent labeling in maternal DNA. Still further, the invention is based, at least in part, on the inventors' demonstration that hypomethylated/hydroxymethylated fetal DNA can be specifically targeted and enriched through covalent modification of CGs, thereby resulting in a sample enriched for fetal DNA. Still further, the inventors have accurately diagnosed trisomy 21 and determined fetal gender in a panel of maternal peripheral blood samples using representative examples of the DLRs disclosed herein, thereby demonstrating the effectiveness of the identified DLRs and disclosed methodologies in diagnosing fetal aneuploidy T21 and determining fetal gender.
Various aspects of this disclosure are described in further detail in the following subsections.
I. A Method for Non-Invasive Detection of Fetal Aneuploidy T21 and Fetal Gender
Accordingly, the invention pertains to a method for prenatal diagnosis of trisomy 21 and determination of fetal gender using a sample of maternal blood, the method comprising:
(a) enzymatic labeling of uCG or hmC sites of nucleic acid molecules in a sample of maternal blood with a reactive azide group;
(b) chemically tethering an oligodeoxyribonucleotide (ODN) having an alkyne group to the introduced azide groups in a template nucleic acid;
(c) producing nucleic acid molecules from a template nucleic acid sequence starting at the azide-labeled CG sites through PCR amplification;
(d) determining the labeling intensity level of unmodified or hydroxymethylated template genomic nucleic acid molecules across the regions or CG sites of chromosomal DNA shown in Tables 4 or 5, or 6, preferably using qPCR, or using sequencing of the labeled genomic fraction;
(e) comparing the experimentally acquired value of the regions of step (d) to a standard reference value for the combination of at least one region, or at least two regions from the list shown in Tables 4-6, wherein the standard reference value is (i) a value for a DNA sample from a woman bearing a fetus without trisomy 21; or (ii) a value for a DNA sample from a woman bearing a fetus with trisomy 21; and
(f) diagnosing a trisomy 21 based on said comparison, wherein trisomy 21 is diagnosed if the experimentally acquired value of the sample is (i) higher than the standard reference value from a woman bearing a fetus without trisomy 21; or (ii) lower than the standard reference value from a woman bearing a fetus without trisomy 21; or (iii) comparable to the standard reference value from a woman bearing a fetus with trisomy 21.
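The comparison in steps (e) and (f) can be illustrated with a small sketch. The text does not define how "higher", "lower" or "comparable" are quantified, so the z-score criterion, the cutoff of 3 standard deviations and all function names below are assumptions made purely for illustration.

```python
from statistics import mean, stdev

def z_score(value, reference_values):
    """Distance of a sample value from a reference panel, in reference SDs."""
    return (value - mean(reference_values)) / stdev(reference_values)

def consistent_with_t21(value, euploid_ref, t21_ref, z_cutoff=3.0):
    """Hypothetical decision rule: flag the sample if it deviates from the
    euploid reference in either direction (conditions i and ii), or if it
    falls within the T21 reference range (condition iii)."""
    deviates_from_euploid = abs(z_score(value, euploid_ref)) > z_cutoff
    comparable_to_t21 = abs(z_score(value, t21_ref)) <= z_cutoff
    return deviates_from_euploid or comparable_to_t21

# Illustrative (made-up) normalized labeling values for reference panels.
euploid_panel = [1.00, 1.02, 0.98, 1.01, 0.99]
t21_panel = [1.08, 1.10, 1.12, 1.09, 1.11]
print(consistent_with_t21(1.10, euploid_panel, t21_panel))
print(consistent_with_t21(1.01, euploid_panel, t21_panel))
```

In practice the direction of the expected deviation depends on the DLR used, which is why the rule above treats deviation in either direction as informative.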
A schematic illustration of the analytical approach for evaluation of labeling intensity in DLRs using labeling and enrichment of unmodified or hydroxymethylated CGs is demonstrated in
II. Labeling of Unmodified or Hydroxymethylated CG Sites
Methods for the first step of covalent derivatization of genomic DNA sites are known in the art. Covalent labeling of genomic uCG or hmC sites can be performed using an enzyme capable of transfer of a covalent group onto genomic DNA. The enzyme may comprise a methyltransferase or a glucosyltransferase.
An enzyme for covalent labeling of uCG sites is preferably the C5 DNA methyltransferase M.SssI or a modified variant of it, such as the M.SssI variant Q142A/N370A (Kriukienė et al., 2013; Staševskij et al., 2017), which is adapted to work with synthetic cofactors, such as the Ado-6-azide cofactor (Kriukienė et al., 2013; Masevicius et al., 2016).
An enzyme for covalent labeling of hmC/hmCG sites is preferably the phage T4 beta-glucosyltransferase (BGT) which is adapted to work with synthetic cofactors, such as UDP-6-azidoglucose (Song et al, 2011).
The ODN is preferably from 20 to 90 nucleotides in length and is 39 nt long in the exemplary embodiment. The ODN contains the reactive group, preferably the alkyne group, at the second base position from its 5′-end; this group reacts with the azide group that was enzymatically introduced in a template nucleic acid molecule.
It should be noted that DNA after covalent labeling becomes enzymatically and chemically altered but preserves base specificity. As used herein, the term “enzymatically altered” is intended to mean reacting the DNA with an enzymatically transferred chemical group that converts the respective CG sites into azide-CG sites, allowing the labeled sites to be discriminated from template CGs. As used herein, the term “chemically altered” is intended to mean enzymatic transformation of template cytosine into azide-modified cytosine in CG sites. Thus, in the instant method the fetal-specific regions are determined by comparing more intensively and less intensively labeled CG sites in DNA, without the need to directly determine methylation or hydroxymethylation levels of template DNA. Furthermore, in the instant method, the DNA of the maternal blood sample is not subjected to sodium bisulfite conversion or any similar chemical reaction that alters base specificity, nor is the maternal blood sample treated with methylation-sensitive restriction enzyme(s) or subjected to direct or indirect immunoprecipitation to enrich for a portion of maternal blood sample DNA.
Alternatively, the ODN-derivatized template DNA can be enriched on solid surfaces using an affinity tag that is incorporated into the ODN. A useful affinity tag is preferably, but not limited to, biotin, and can be used in the methods of the present invention. In this aspect, the invention includes an additional step of separating maternal nucleic acid sequences on a solid surface, for example on streptavidin/avidin beads, thereby further enriching for nucleic acid molecules containing labeled CG sites. Other approaches known in the art for physical separation of components can also be used. The captured DNA can be used for further analysis without detachment or can be detached from the beads under mild conditions, for example in pure water with heating to 95° C. for 5 min.
III. Producing of Template Nucleic Acid Molecules from the Site of Covalent Labeling
In the diagnostic method, a nucleic acid polymerase primes polymerization of the template nucleic acid at or around the site of labeling using the 3′-end of an externally added primer which is non-covalently attached to the ODN. Non-covalent bonding preferably involves base pairing interaction between the ODN and the externally added primer. In the preferred embodiments shown in
In the diagnostic method, typically after tagging of CGs in the maternal blood sample with the ODN, the tagged CGs and adjacent template nucleic acid are pre-amplified starting from the site of the attachment of the ODN. As used herein, the term “pre-amplified” is intended to mean that additional copies of the DNA are made to thereby increase the number of copies of the DNA, which is typically accomplished using the polymerase chain reaction (PCR).
In the preferred embodiment, the experimentally acquired value for the presence or availability of labeled CGs that were tagged with the ODN in the maternal blood sample can be acquired by amplification of the DNA molecules starting from the tagged CG sites using the ODN-directed and partially ligation-mediated polymerase chain reaction (LM-PCR). The skilled person will be well aware of suitable methods for ligating adaptor sequences to the DNA fragments. In the LM-PCR of the present invention, adaptor nucleic acid sequences are added onto both ends of each DNA fragment, preferably through sticky-end or blunt-end ligation, wherein each strand of an adaptor sequence is capable of hybridizing with a primer for PCR, thereby amplifying the DNA fragments to which the linkers have been ligated. In this aspect of the present invention, only one strand of the ligated partially complementary double-stranded adaptor sequence is used to anchor a primer for amplification of the labeled template DNA strand as shown in
Alternatively, the values of the amplified sequences, or DLRs, are determined through massively parallel sequencing. In this aspect of the embodiments, one strand of the ligated double-stranded adaptor sequence is used to anchor a primer for amplification of the labeled template DNA strand as shown in
In another embodiment of the invention, the experimentally acquired value for the presence or availability of labeled CG is estimated through qPCR, in a maternal blood sample that has not been subjected to adaptor ligation or pre-amplification, as shown in
IV. Differentially Labeled Regions (DLRs).
The diagnostic method of the invention employs a plurality of regions of chromosomal DNA wherein the regions are more intensively labeled in fetal DNA as compared to female peripheral blood samples. In theory, any chromosomal region with the above characteristics can be used in the instant diagnostic method. In particular, methods for identifying such DLRs are described in detail below and in the Examples (see Examples 1 and 2). Moreover, a large panel of DLRs for chromosomes 21, 13 and 18 suitable for use in the diagnostic methods has now been identified (the strategy for identification of DLRs is shown in
Furthermore, representative examples of a subset of these DLRs (4175 tissue-specific u-DLRs; 163 pregnancy-specific u-DLRs; 8815 tissue-specific hm-DLRs, 679 pregnancy-specific hm-DLRs) have been used to accurately predict trisomy 21, in a method based on analysis of fetal-specific DLRs in chromosome 21 by sequencing of labeled CG sites in a maternal blood sample. We also evaluated labeling differences between maternal blood samples of healthy and T21 positive pregnancies and identified 3,490 u-DLRs and 2,002 hm-DLRs which are shown in Tables 4 and 5, respectively. The effectiveness of the disclosed regions and methodologies for diagnosing fetal aneuploidy T21 has been demonstrated in
According to the second exemplary embodiment, DLRs restricted to individual CGs (CG-DLRs) have been identified in chromosomes 21 and X. Representative examples of a subset of these DLRs have been used to accurately predict trisomy 21, in a method based on analysis of fetal-specific hypomethylated or hyper-hydroxymethylated CG-DLRs in chromosome 21 by sequencing of labeled CG sites in a sample of maternal blood. Also, representative examples of a subset of these CG-DLRs have been used to accurately predict fetal gender, in a method based on analysis of fetal-specific CG-DLRs in chromosome X by sequencing of labeled CG sites in a sample of maternal blood. The effectiveness of the disclosed DLRs and methodologies for determination of T21 aneuploidy and fetal gender has been demonstrated in
In the third exemplary embodiment, representative examples of a subset of the CG-DLRs have been used to accurately predict trisomy 21 and fetal gender, in a method based on analysis of fetal-specific DLRs in chromosome 21 and chromosome X and/or Y in a sample of maternal blood by qPCR. Thus, the effectiveness of the disclosed regions and methodologies for diagnosing trisomy 21 and fetal gender has been demonstrated in
In other methods for detecting a fetal aneuploidy, the plurality of DLRs may be on chromosome 13 or chromosome 18, to allow for diagnosis of aneuploidies of either of these chromosomes. In theory, any DLR with the above characteristics in a chromosome of interest can be used in the instant diagnostic method. Methods for identifying such DLRs in chromosome 13 and chromosome 18 are described in Example 1 and the effectiveness of the disclosed regions has been demonstrated in
As used herein, the term “a plurality of DLRs” is intended to mean one or more regions or DLRs, selected from the list shown in Tables 4-7. In various embodiments, the levels of the plurality of DLRs are determined for at least one region. Control regions or control DLRs can also be used in the diagnostic methods of the invention as a reference for evaluation of the labeled signal in the DLR region(s) of interest.
In a particularly preferred embodiment, the plurality of DLRs on chromosome 21 comprise one region or a combination of at least two regions, selected from the group shown in Table 6.
The invention also pertains to a composition comprising nucleic acid probes that selectively detect DLRs shown in Table 6.
The actual nucleotide sequence of any of the DLRs shown in Tables 4-7 is obtainable from the information provided herein together with other information known in the art. More specifically, each of the DLRs shown in Tables 4-7 is defined by a start base position on a particular chromosome, such as, for example “position 10774500” of chromosome 21. Furthermore, primers for targeted detection and/or amplification of a DLR can then be designed, using standard molecular biology methods, based on the nucleotide sequence of the DLR.
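As an illustration of how a tabulated start position can be turned into a genomic interval for sequence retrieval or primer design, the sketch below builds region strings from a chromosome and start coordinate. The 100 bp width is an assumption borrowed from the window size used in the sequencing analysis of this document, the start coordinate is assumed to be 1-based, and the class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DLR:
    chrom: str          # chromosome name without the "chr" prefix, e.g. "21"
    start: int          # start base position, assumed to be 1-based
    width: int = 100    # assumed window width in bp

    @property
    def end(self) -> int:
        """Last base of the region (1-based, inclusive)."""
        return self.start + self.width - 1

    def region_string(self) -> str:
        """Region in the 'chr:start-end' form accepted by genome browsers."""
        return f"chr{self.chrom}:{self.start}-{self.end}"

    def bed_interval(self) -> tuple:
        """Same region as a 0-based, half-open BED-style interval."""
        return (f"chr{self.chrom}", self.start - 1, self.end)

example = DLR(chrom="21", start=10774500)
print(example.region_string())
print(example.bed_interval())
```

The resulting interval can then be passed to any standard tool that extracts the reference sequence for a given region, after which primers are designed against that sequence.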
In another aspect, the invention provides nucleic acid compositions that can be used in the methods and kits of the invention. These nucleic acid compositions are informative for detecting DLRs. As described in detail in Example 3, at least one CG-DLR shown in Table 6 has been selected and identified as being sufficient to accurately diagnose trisomy 21 in a maternal blood sample during pregnancy of a woman bearing a trisomy 21 fetus.
V. Determining Levels of DLRs.
Labeling levels of the identified DLRs can be measured by sequencing or by qPCR.
Labeling levels of a plurality of regions as described above are determined in the unmethylated or hydroxymethylated DNA sample, to thereby obtain a labeling value for the DNA sample. As used herein, the term “the levels of the plurality of DLRs are determined” is intended to mean that the prevalence of the DLRs is determined. The basis for this is that in a fetus with trisomy 21 there will be a larger amount of the DLRs as a result of the trisomy 21, as compared to a normal fetus. In another aspect, when T21-specific DLRs are used, the amount of such DLRs can be greater or smaller than the amount in a fetus without trisomy 21.
In a preferred embodiment, the levels of the plurality of DLRs are determined by real time quantitative polymerase chain reaction (qPCR), a technique well-established in the art. The term “the labeling value” is intended to encompass any quantitative representation of the level of DLRs in the sample. For example, the data obtained from qPCR can be used as “the labeling value” or it can be normalized based on various controls and statistical analyses to obtain one or more numerical values that represent the level of each of the plurality of DLRs in the tested DNA sample. The procedure for detection of DLRs by qPCR, including the primer sequences and the cycle conditions used, is described in Example 3.
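One common way to normalize qPCR data against a control region is the delta-CT calculation, sketched below. This is offered as an illustrative assumption, not as the normalization actually used in Example 3; the function name, its parameters and the default per-cycle amplification factor of 2 are hypothetical.

```python
def relative_labeling_value(ct_target: float, ct_control: float,
                            amplification_factor: float = 2.0) -> float:
    """Level of a target DLR relative to a control region, computed from
    threshold cycles. A lower CT means more starting template, so the
    relative level is amplification_factor ** -(ct_target - ct_control)."""
    if amplification_factor <= 1.0:
        raise ValueError("amplification_factor must exceed 1")
    return amplification_factor ** -(ct_target - ct_control)

# A target that crosses the threshold one cycle later than the control
# is present at half the control's level under ideal doubling.
print(relative_labeling_value(ct_target=25.0, ct_control=24.0))
```

Values obtained this way for a test sample can then be compared with the same quantity measured in reference samples.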
In the analysis of labeling intensity of DLRs by sequencing, the level of differential labeling was calculated for non-overlapping 100 bp regions (windows). In more detail, for each window we computed the total log-transformed coverage and the fraction of identified CGs, which we then normalized by the total log-transformed coverage and the fraction of identified CGs in reference chromosomes 16 (for uCG signal) and 20 (for hmC signal). For each window, full and null logistic regression models were fitted. The full model included coverage, identified fraction and, for T21-specific DLRs, fetal sex and fetal fraction as independent variables. Coverage and identified fraction were excluded from the null model. An ANOVA Chi-squared test was used to compare the full and null models to obtain a p value. In cases where the models did not converge, fetal sex was removed and the p value was evaluated again. Model statistics were moderated using empirical Bayes. FDR was used to adjust p values for multiple testing, and q<0.05 was used as the significance threshold.
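The comparison of the full and null models can be illustrated with a short sketch. It assumes that the Chi-squared comparison is the usual deviance (likelihood-ratio) test and that exactly two terms, coverage and identified fraction, separate the two models, so the statistic is referred to a Chi-squared distribution with 2 degrees of freedom. The function names and the numeric inputs are hypothetical, and model fitting itself is not shown.

```python
import math


def binomial_deviance(labels, fitted_probabilities):
    """Deviance of a fitted logistic model: -2 times the log-likelihood."""
    log_likelihood = 0.0
    for y, p in zip(labels, fitted_probabilities):
        log_likelihood += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_likelihood


def lrt_p_value_two_terms(null_deviance, full_deviance):
    """P value for dropping two terms (assumed: coverage, identified fraction).

    For 2 degrees of freedom the Chi-squared survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    """
    statistic = max(null_deviance - full_deviance, 0.0)
    return statistic, math.exp(-statistic / 2.0)


# Hypothetical deviances for one 100 bp window.
statistic, p_value = lrt_p_value_two_terms(null_deviance=10.0, full_deviance=4.0)
print(statistic, p_value)
```

A different number of dropped terms would require the general Chi-squared survival function rather than this closed form.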
For each pregnancy-specific or tissue-specific DLR, a leave-one-out cross-validation procedure was performed in order to determine its ability to diagnose T21. For each cross-validation cycle, a Bayesian generalized linear model (Gelman et al. 2008) with normalized coverage and identified CG as independent variables was constructed on the training samples. The model was then applied to the testing sample, returning the predicted probability of the sample belonging to the T21 category. After all the cross-validation cycles, the prediction probabilities for all samples were taken together. Different thresholds for converting the continuous probability into a discrete sample class may have different effects on the predictor's specificity and sensitivity. Therefore, a receiver-operating characteristic curve analysis was performed to estimate the effect of any threshold. The area under the receiver-operating characteristic curve (AUC) indicates the overall accuracy of the model. Those DLRs for which the AUC was equal to 100% and which, therefore, could achieve 100% prediction accuracy were deemed to be the T21-predictive DLRs.
An approach that combines individual DLRs into a single predictive model is also possible. Such a model could be, but is not limited to, an elastic net, a random forest or a support vector machine. The model would be evaluated in the same way, by assessing the receiver-operating characteristic and using cross-validation for parameter tuning. Also, bootstrap could be used instead of cross-validation. Other model accuracy measures could be employed, and data could be transformed in different ways. Interactions of DLRs could be taken into account to build new composite features that would be used for subsequent model training and evaluation.
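The cross-validation and AUC steps can be sketched as follows. This is an illustration only: the classifier is a deliberately simple stand-in (a logistic curve centred between the two class means of a single feature), not the Bayesian generalized linear model of Gelman et al., and the feature values, labels and function names are hypothetical.

```python
import math
import statistics


def fit_midpoint_model(values, labels):
    """Stand-in classifier (assumption): midpoint and gap between class means."""
    mean_control = statistics.mean(v for v, y in zip(values, labels) if y == 0)
    mean_case = statistics.mean(v for v, y in zip(values, labels) if y == 1)
    return (mean_control + mean_case) / 2.0, mean_case - mean_control


def predict_probability(model, value):
    """Logistic score for the held-out sample belonging to the case class."""
    midpoint, direction = model
    return 1.0 / (1.0 + math.exp(-direction * (value - midpoint)))


def leave_one_out_probabilities(values, labels):
    """Hold each sample out once, train on the rest, score the held-out one."""
    probabilities = []
    for i in range(len(values)):
        train_values = values[:i] + values[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit_midpoint_model(train_values, train_labels)
        probabilities.append(predict_probability(model, values[i]))
    return probabilities


def roc_auc(labels, scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical normalized coverage for one DLR: three controls, three T21 cases.
example_values = [0.9, 1.0, 1.1, 1.6, 1.7, 1.8]
example_labels = [0, 0, 0, 1, 1, 1]
pooled = leave_one_out_probabilities(example_values, example_labels)
print(roc_auc(example_labels, pooled))
```

Pooling the held-out probabilities before computing a single AUC mirrors the description above, in which the predictions of all cross-validation cycles are taken together.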
VI. Comparison to a Standardized Reference Value.
The labeling value of the fetal DNA (also referred to herein as the “test value”) present in the maternal peripheral blood is compared to a standardized reference value, and the diagnosis of trisomy 21 (or lack of such fetal trisomy 21) is made based on this comparison. Typically, the test value for the fetal DNA sample is compared to a standardized normal reference value for a normal fetus, and diagnosis of fetal trisomy 21 is made when the test value is higher than the standardized normal reference labeling value for a normal fetus. In another aspect, the test value can be lower than the standardized normal reference labeling value for a normal fetus.
Alternatively, the test value for the labeled DNA sample can be compared to a standardized reference labeling value for a fetal trisomy 21 fetus, and diagnosis of fetal trisomy 21 can be made when the test value is comparable to the standardized reference labeling value for a fetal trisomy 21 fetus.
To establish the standardized normal reference labeling values for a normal fetus, maternal blood samples from pregnant women carrying a normal fetus are subjected to the same steps of the diagnostic method, namely amplification of the ODN-derivatized CGs and their neighboring genomic sequences to obtain a reference DNA sample, and then determination, by sequencing or qPCR, of the labeling value and the levels of at least one region of chromosomal DNA selected from Tables 4-7.
In order to establish the standardized normal reference methylation values for a normal fetus, healthy pregnant women carrying healthy fetuses or healthy non-pregnant women are selected. Pregnant women are of similar gestational age, which is within the appropriate time period of pregnancy for screening fetal chromosomal aneuploidy, typically within the first trimester of pregnancy. Standardized reference labeling values for a T21 fetus can be established using the same approach as described above for establishing the standardized reference values for a healthy fetus, except that the maternal blood samples used to establish the T21-specific reference values are from pregnant women who have been determined to be carrying a fetus with fetal trisomy 21.
This example provides the methodology for the preparation of the labeled genomic libraries of the above-mentioned biological samples for genomic mapping of unmodified or hydroxymethylated CGs. This example also provides the strategy for DLR determination and describes how DLRs for detection of trisomy 21 were preferentially chosen.
Biological Samples.
We performed analysis of three distinct sample types, enabling a characterization of the unmethylated and hydroxymethylated CGs in DNA obtained from plasma of pregnant women; we created single CG resolution uCG and 5hmCG maps of placental chorionic villi (CV) tissue samples from the 1st trimester abortions (CVS; n=6 of uCG and n=3 of 5hmCG); cfDNA samples of female non-pregnant controls (NPC; uCG n=6 and 5hmCG n=7) and cfDNA samples of pregnant women carrying healthy fetuses (uCG n=7 and 5hmCG n=6) or fetuses with the trisomy 21 (uCG n=5 and 5hmCG n=4).
Circulating DNA from maternal blood samples was extracted using the MagMax Nucleic Acid Extraction kit (Thermo Fisher Scientific (TS)) or the QIAamp DNA blood Midi Kit (QIAGEN), and DNA from chorionic villi tissue was prepared by phenol extraction.
All the maternal peripheral blood DNA samples (1st trimester pregnancies) and chorionic villi samples (1st trimester abortions) were obtained at Tartu University Hospital (Tartu, Estonia) through collaboration with Tartu University (Estonia). Consent forms approved by the Research Ethics Committee of the University of Tartu (ethical permission No. 246/T-21 and 213/T-21) were collected from each participating mother.
Mapping of Unmodified/Hydroxymethylated CGs in DNA Extracted from Biological Samples.
In uTOP-seq, 4-10 ng of cfDNA (or 100 ng of CV tissue DNA, sheared to 200 bp by Covaris sonicator) were labeled with 0.11 μM eM.SssI (Kriukienė et al. 2013) in 10 mM Tris-HCl (pH 7.4), 50 mM NaCl, 0.5 mM EDTA buffer supplemented with 200 μM Ado-6-azide cofactor (Masevicius et al, 2016) for 1 h at 30° C., followed by thermal inactivation at 65° C. for 20 min and Proteinase K treatment (0.2 mg/ml) for 30 min at 55° C., and finally column purified (GeneJET PCR Purification kit (TS)). In hmTOP-seq, 5hmC glycosylation was carried out with 5-10 ng of cfDNA supplemented with 50 μM UDP-6-azide-glucose (Jena Bioscience) and 2.5-5 U T4 β-glucosyltransferase (TS) for 1 h at 37° C., followed by enzyme inactivation at 65° C. for 20 min and column purification (GeneJET PCR Purification kit (TS)). After ligation of the partially complementary adapters as described previously (Staševskij et al. 2017), covalently labeled DNA was supplemented with 20 μM alkyne-containing DNA oligonucleotide (which was biotinylated for construction of 5hmC maps) (ODN; 5′-T(alkyneT)TTTTGTGTGGTTTGGAGACTGACTACCAGATGTAACA-3′ (or -(biotin)-3′), Base-click) and 8 mM CuBr: 24 mM THPTA mixture (Sigma) in 50% of DMSO, incubated for 20 min at 45° C. and subsequently diluted to <1.5% DMSO before a column purification (GeneJET NGS Cleanup Kit, Protocol A (TS)). DNA recovered after the biotinylation step was incubated with 0.1 mg Dynabeads MyOne C1 Streptavidin (TS) in buffer A (10 mM Tris-HCl (pH 8.5), 1 M NaCl) at room temperature for 3 h on a roller. DNA-bound beads were washed 2× with buffer B (10 mM Tris-HCl (pH 8.5), 3 M NaCl, 0.05% Tween 20); 2× with buffer A (supplemented with 0.05% Tween 20); 1× with 100 mM NaCl and finally resuspended in water and heated for 5 min at 95° C. to recover the enriched DNA fraction.
Purified DNA after oligonucleotide conjugation (uCG) or biotin-enrichment (5hmC) was subsequently used in a priming reaction with 1 U Pfu DNA polymerase (TS), 0.2 mM dNTP, 0.5 μM complementary priming oligonucleotide (EP; 5′-TGTTACATCTGGTAGTCAGTCTCCAAACCACACAA-3′). The reaction mixture was incubated at the following cycling conditions: 95° C. 2 min; 5 cycles at 95° C. 1 min, 65° C. 10 min, 72° C. 10 min. Amplification of the primed DNA library was carried out by adding the above reaction mixture to 100 μl of amplification reaction containing 50 μl of 2× Platinum SuperFi PCR Master Mix (TS) and barcoded fusion PCR primers A(Ad)-EP-barcode-primer (63 nt) and trP1(Ad)-A2-primer (45 nt) at 0.5 μM each. Thermocycler conditions: 94° C. 4 min; 15 cycles (uCG) or 17 cycles (5hmC) at 95° C. 1 min, 60° C. 1 min, 72° C. 1 min. The final libraries were size-selected for ~270 bp fragments (MagJET NGS Cleanup and Size Selection Kit (TS)), and their quality and quantity were tested on a 2100 Bioanalyzer (Agilent). Libraries were subjected to Ion Proton (TS) sequencing.
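The complementarity between the priming oligonucleotide EP and the ODN given above can be checked with a few lines of code. The sketch assumes that the alkyne-modified thymidine of the ODN is written as a plain T; the function name is hypothetical.

```python
# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


# ODN as given in the text; the second base is the alkyne-modified T (assumed plain T here).
ODN = "T" + "T" + "TTTTGTGTGGTTTGGAGACTGACTACCAGATGTAACA"
EP = "TGTTACATCTGGTAGTCAGTCTCCAAACCACACAA"

# EP pairs with the 3'-terminal 35 nt of the 39 nt ODN, so the four 5'-terminal
# T residues of the ODN, including the modified one, are left unpaired.
assert reverse_complement(ODN[len(ODN) - len(EP):]) == EP
print(len(ODN), len(EP))
```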
Data Analysis.
Raw TOP-seq and hmTOP-seq sequencing reads were processed as described in Staševskij et al. (2017) and Gibas et al. (2020, accepted) except for the 3′ sequence ends where adapter sequences were trimmed only if they were identified using cutadapt with maximum allowed error rate 0.1 (Martin 2011). Processed reads were mapped to reference human genome version hg19 and coverage for each CG dinucleotide was computed as the total number of reads starting at or around the CG dinucleotide on either of its strands. We define CG coverage as the total number of reads, c, on any strand starting within absolute distance, d. We retained only reads with d≤3. Only reads aligned to chromosomes 1 to 22, X and Y were used for further analysis. On average, 40% of the raw reads were retained for downstream analysis per sample.
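The per-CG coverage definition can be sketched as follows. This is a minimal illustration, assuming that the read start coordinates of both strands have already been pooled for one chromosome. The function name and the coordinates are hypothetical, and the sketch does not decide how a read lying within 3 bp of two neighbouring CGs is assigned, which the text does not specify.

```python
import bisect


def cg_coverage(cg_positions, read_starts, max_distance=3):
    """Count reads whose start lies within max_distance of each CG position."""
    sorted_starts = sorted(read_starts)
    coverage = {}
    for cg in cg_positions:
        # Indices delimiting read starts in [cg - max_distance, cg + max_distance].
        low = bisect.bisect_left(sorted_starts, cg - max_distance)
        high = bisect.bisect_right(sorted_starts, cg + max_distance)
        coverage[cg] = high - low
    return coverage


# Hypothetical CG coordinates and read starts on one chromosome.
print(cg_coverage([100, 200], [97, 99, 100, 103, 104, 198, 250]))
```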
Outlier identification was performed separately for uCG and 5hmC samples. CG coverage matrices were transformed using Hellinger transformation (Legendre and Gallagher, 2001) and then represented in two-dimensional space using non-metric multidimensional scaling (nMDS) with Bray-Curtis similarity index (Bray and Curtis, 1957). Samples that were further than two standard deviations away from the mean of their own sample group (cfDNA of non-pregnant controls, other cfDNA, CV tissue) in either nMDS1 or nMDS2 dimension were deemed outliers and removed from further analysis. There were three outlying samples in uCG and one in 5hmCG dataset.
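Two steps of the outlier screen can be sketched in plain Python: the Hellinger transformation (square root of each value divided by its sample total) and the two-standard-deviation rule applied to the ordination coordinates of one sample group. The nMDS ordination itself is not shown. The use of the sample standard deviation, the function names and the coordinates are assumptions made for illustration.

```python
import math
import statistics


def hellinger_transform(coverage_rows):
    """Each row is one sample; returns sqrt(value / row total) per cell."""
    transformed = []
    for row in coverage_rows:
        total = sum(row)
        if total == 0:
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(value / total) for value in row])
    return transformed


def flag_outliers(points, n_sd=2.0):
    """Flag samples of one group lying more than n_sd SDs from the group mean
    in either of the two ordination dimensions."""
    flags = [False] * len(points)
    for dimension in range(2):
        values = [point[dimension] for point in points]
        center = statistics.mean(values)
        spread = statistics.stdev(values)  # sample SD (assumption)
        for index, value in enumerate(values):
            if abs(value - center) > n_sd * spread:
                flags[index] = True
    return flags


# Hypothetical nMDS coordinates of eight samples from one group.
group_points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.05, 0.0),
                (-0.05, 0.0), (0.0, 0.0), (0.1, 0.0), (3.0, 0.0)]
print(flag_outliers(group_points))
```

With the sample standard deviation, a single sample can exceed two standard deviations only in groups of about six or more samples, which is worth keeping in mind when applying such a rule to small groups.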
Identification of DLRs in Chromosomes 21, 13 and 18.
The strategy for DLR identification is shown in
First, we obtained the pregnancy-specific u-DLRs by comparing NPC samples with cfDNA samples of healthy pregnancies. For each window, full and null logistic regression models were fitted. The full model included coverage, identified fraction and, for T21-specific DLRs, fetal sex and fetal fraction as independent variables. Coverage and identified fraction were excluded from the null model. An ANOVA Chi-squared test was used to compare the full and null models to obtain a p value. In cases where the models did not converge, fetal sex was removed and the p value was evaluated again. Model statistics were moderated using empirical Bayes adjustment. FDR was used to adjust p values for multiple testing, and q<0.05 was used as the significance threshold.
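The multiple-testing step can be illustrated with a short sketch of the Benjamini-Hochberg procedure. This assumes that "FDR" refers to Benjamini-Hochberg adjustment, which the text does not state explicitly; the function name and the p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    total = len(p_values)
    order = sorted(range(total), key=lambda index: p_values[index])
    q_values = [0.0] * total
    running_minimum = 1.0
    # Walk from the largest p value down so each q value is the minimum of
    # p * total / rank over its own rank and all higher ranks.
    for rank in range(total, 0, -1):
        index = order[rank - 1]
        running_minimum = min(running_minimum, p_values[index] * total / rank)
        q_values[index] = running_minimum
    return q_values


# Hypothetical per-window p values.
window_p_values = [0.01, 0.04, 0.03, 0.20]
window_q_values = benjamini_hochberg(window_p_values)
significant = [q < 0.05 for q in window_q_values]
print(window_q_values, significant)
```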
Next, we used the same strategy to obtain tissue-specific u-DLRs (FDR q<0.05; logistic regression) by comparing NPC and CV tissue samples. The same analytic approach was used separately for uCG and hmCG data. In case of hm-DLRs, nominal p value threshold was used when analysis did not yield any FDR significant DLRs.
Further, for each hypomodified pregnancy-specific and tissue-specific u-DLR or hyper-hydroxymethylated pregnancy-specific and tissue-specific hm-DLR in chromosome 21, a leave-one-out cross-validation procedure was performed in order to determine its ability to diagnose T21. For each cross-validation cycle, a Bayesian generalized linear model (Gelman et al. 2008) with normalized coverage and identified CG as independent variables was constructed on the training samples. The model was then applied to the testing sample, returning the predicted probability of the sample belonging to the T21 category. After all the cross-validation cycles, the prediction probabilities for all samples were taken together. Different thresholds for converting the continuous probability into a discrete sample class may have different effects on the predictor's specificity and sensitivity. Therefore, a receiver-operating characteristic curve analysis was performed to estimate the effect of any threshold. The area under the receiver-operating characteristic curve indicates the overall accuracy of the model. Those DLRs for which the area under the curve was equal to 100% and which, therefore, could achieve 100% prediction accuracy were deemed to be T21-predictive DLRs (
Using the strategy for DLR determination in chromosome 21, we obtained 2,761 pregnancy-specific u-DLRs (FDR q<0.05) and 16,555 fetal tissue-specific u-DLRs (FDR q<0.05; logistic regression). For hm-DLR identification, we used nominal p<0.05 threshold and identified 4,930 pregnancy-specific hm-DLRs and 15,986 tissue-specific hm-DLRs.
An in-depth investigation of the identified DLRs between non-pregnant female peripheral blood and placental DNA samples or non-pregnant and pregnant female cfDNA samples, has led to the selection of a list of DLRs located on chromosome 21 for diagnosing trisomy 21. The selection criteria of the regions were based firstly on the labeling intensity status of the regions in maternal blood samples and CV DNA samples, or on the labeling intensity status of the regions in the non-pregnant and pregnant female maternal blood samples. More specifically, the selected regions should demonstrate a high labeling intensity status in CV tissue DNA and a low labeling intensity or absence of labeling in peripheral blood samples of NPCs, or should show a high labeling intensity status in pregnant female blood samples and a low labeling intensity or absence of labeling in NPCs. Using leave-one-out cross-validation as described above we discovered 4175 tissue-specific u-DLRs; 163 pregnancy-specific u-DLRs; 8815 tissue-specific hm-DLRs, 679 pregnancy-specific hm-DLRs in chromosome 21 that classified the samples according to fetal karyotype with 100% accuracy (the selected DLRs are shown in Tables 4 and 5, for the uCG and hmCG signal, respectively) (
Furthermore, considering global epigenetic changes in Down syndrome affected fetuses (Jin et al. 2013), we also employed an alternative approach to identify the trisomy 21-specific DLRs. We evaluated modification differences between cfDNA samples of healthy and T21-diagnosed pregnancies and identified differentially modified DLRs. A logistic regression model was fitted to each 100 bp window with the CG-coverage and CG-fraction as independent variables and karyotype as the response variable, as above. In addition, we adjusted for possible confounding effects of fetal fraction and fetal gender, which could not be accounted for in the previous analyses. With this approach, we identified 3,490 u-DLRs and 2,002 hm-DLRs (FDR q<0.05; logistic regression). The selected T21-specific DLRs that best discriminate between the sample groups of healthy and T21-diagnosed pregnancies are shown in Tables 4 and 5 for the uCG and hmCG signal, respectively (
Using the same strategy for DLR identification shown in
The total number of fetal specific hypomethylated and hyper-hydroxymethylated tissue- and pregnancy-specific DLRs identified across chromosomes 21, 13 and 18 is summarized in Table 1.
This example provides the strategy for determination of individual labeled CGs (CG-DLRs), following analysis of the samples described in Example 1, that can be used for detection of fetal trisomy 21.
An investigation of labeling intensities of uCGs and hmCGs in peripheral blood samples of women that were confirmed to be carrying a fetus with trisomy 21 against labeling intensities of uCGs and hmCGs in the three types of control samples, i.e. placental CV tissue DNA, peripheral blood samples of non-pregnant women and peripheral blood samples of women pregnant with healthy fetuses, has led to the selection of individual CG-DLRs located on chromosome 21 for detection of fetal T21. The selection criteria of the CG-DLRs were based firstly on the labeling intensity status of CGs in blood samples of women pregnant with T21-diagnosed fetuses. More specifically, the selected CG-DLRs should demonstrate a high labeling intensity status in blood samples of women pregnant with T21-diagnosed fetuses and a low labeling intensity or absence of labeling in the three other sample types: CV tissue DNA, peripheral blood samples of NPCs and peripheral blood samples of pregnant women carrying a healthy fetus.
The CGs with non-zero coverage and non-zero variance were used. The read coverage was log transformed. CGs from chromosome 21 were used for detection of T21 markers. Samples from non-pregnant women, from women pregnant with healthy fetuses and from CV tissue were marked as controls, whereas only the samples from women carrying T21-positive fetuses were marked as cases. A linear regression model was fitted for every CG, and the resulting model fits were moderated using empirical Bayes adjustment. The CGs with an FDR q value less than 0.05 and a log fold change more than 1.2 were taken as significant. The list of the selected T21 CG-DLRs is shown in Table 6 (
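The final selection step can be sketched as a simple filter over per-CG model results. The record layout, the identifiers and the reading of "log fold change more than 1.2" as a signed (case over control) value are assumptions made for illustration; the regression and empirical Bayes moderation are not shown.

```python
def select_cg_dlrs(results, q_max=0.05, min_log_fc=1.2):
    """Keep CGs with q value below q_max and log fold change above min_log_fc."""
    return [
        record["cg"]
        for record in results
        if record["q_value"] < q_max and record["log_fc"] > min_log_fc
    ]


# Hypothetical per-CG results for chromosome 21.
example_results = [
    {"cg": "cg_a", "q_value": 0.010, "log_fc": 1.8},
    {"cg": "cg_b", "q_value": 0.200, "log_fc": 2.5},
    {"cg": "cg_c", "q_value": 0.030, "log_fc": 0.9},
    {"cg": "cg_d", "q_value": 0.049, "log_fc": 1.3},
]
print(select_cg_dlrs(example_results))
```

The same filter with `min_log_fc=1` would correspond to the threshold stated for the fetal sex CG-DLRs.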
Identification of CG-DLRs for Determination of Fetal Sex.
Similarly, CGs from chromosome X (and Y) were analyzed for identification of CG-DLRs for fetal gender determination. A no intercept linear regression model was fitted for each CG and a contrast fit was used to determine differences between male and female samples. Resulting model fits were moderated using empirical Bayes adjustment. The CGs with FDR q value less than 0.05 and log fold change more than 1 were taken as significant. The list of the selected gender CG-DLRs is shown in Table 6 (
In this example, individual CGs or CG-DLRs identified according to the methodology described in Examples 1 and 2 were validated by qPCR. A flowchart diagram of the methodology is shown in
Detection of Fetal Trisomy T21 by qPCR.
The difference in labeling intensity at specific CG-DLRs, shown in Table 6, was tested in blood samples of pregnant women carrying healthy or T21-diagnosed fetuses (
In this embodiment, the plurality of CG-DLRs on chromosome 21 comprises one region or a combination of at least two regions, selected from Table 6. The invention also pertains to a composition comprising nucleic acid probes that selectively detect the regions shown in Table 6, preferably, the pair/set of oligonucleotide primers are selected from Table 2.
Detection of Fetal Gender by qPCR.
In another embodiment of the invention, the experimentally acquired value for the presence or availability of labeled CGs is estimated through qPCR, in a total untreated, i.e. non-ligated to adaptors and non-preamplified, maternal blood sample as shown in
Firstly, the difference in the abundance of DLR regions starting at specific CGs shown in Table 6 was tested in the 1st trimester CV tissue DNA of both genders and non-pregnant female blood sample DNA. Then, we mixed CV tissue DNA and non-pregnant female peripheral blood plasma DNA to the ratios 20/80 and 0/100 of the CV and plasma DNA, respectively. 10 ng of each sample mixture were labeled and derivatized with the ODN as described above. Next, 1.5 ng of each sample was analyzed in replicates by qPCR. The coordinates of the u-CG-DLRs on chromosomes X and Y and primers for qPCR are shown in Table 3.
In more detail, DNA of each sample was labeled with eM.SssI MTase in the presence of 200 μM Ado-6-azide cofactor for 1 hour at 30° C. as described in Example 1, followed by column purification (Oligo Clean&Concentrator-5, Zymo Research). Then, DNA eluted in 8 μl of Elution Buffer was supplemented with 20 μM alkyne DNA oligonucleotide (ODN, 5′-T(alkyneU)TTTTGTGTGGTTTGGAGACTGACTACCAGATGTAACA) and the mixture of 8 mM CuBr and 24 mM of THPTA (Sigma) in 50% of DMSO, incubated for 20 min at 45° C. and subsequently diluted to <1.5% DMSO before purification through the GeneJET NGS Cleanup kit (TS). 1.5 ng of the purified DNA were used for measurement of the labeling intensity of uCGs by qPCR with a Rotor-Gene Q real-time PCR system (Qiagen) using Maxima SybrGreen/ROX qPCR Master Mix (TS). 0.3 mM of each primer pair was used in each reaction, wherein one of the primers binds complementarily to the ODN and to 5 nucleotides of the template genomic DNA adjacent to the derivatized CG site, and another primer binds in the vicinity of the CG to allow PCR amplification of the region (or selected DLR) to occur. The amplification program was set as: 95° C. for 10 min, 40 cycles of 95° C. for 15 s, 65° C. for 30 s, 72° C. for 30 s (
This example describes the independent validation of non-invasive testing for fetal trisomy 21. For this purpose, we have performed qPCR-based analysis of a small group of samples which have not been used in the previous Examples for identification or validation of DLRs. The group consists of 3 maternal peripheral blood samples from women bearing a normal fetus and 2 maternal peripheral blood samples from women bearing a trisomy 21-affected fetus.
These maternal peripheral blood samples were obtained at a gestational age of 12-13 weeks at Tartu University Hospital (Tartu, Estonia) through collaboration with Tartu University (Estonia). Consent forms approved by the Research Ethics Committee of the University of Tartu (ethical permission No. 246/T-21 and 213/T-21) were collected from each participating mother.
The fetal specific approach used herein is illustrated schematically in
An in-depth investigation of our previously identified DLRs, described in Examples 1 and 2, has led to selection of DLRs located on chromosome 21. A group of selected DLRs has been used for identification of fetal trisomy 21 by qPCR (Example 3). These DLRs demonstrate a hypomethylated or hyper-hydroxymethylated, and thus more labeled, status in peripheral blood DNA of pregnant women carrying a T21-diagnosed fetus and a hypermethylated or hypo-hydroxymethylated, and thus less labeled, status in CV tissue DNA and peripheral blood DNA of pregnant women carrying a normal fetus and in peripheral blood DNA of non-pregnant women in order to achieve the enrichment of fetal T21-specific CG-labeled regions. These selected CG-DLRs shown in Table 2 were used for analysis of the samples by qPCR.
The procedure of sample processing and the qPCR cycle conditions used were as described in Examples 1 and 3. Briefly, 5-10 ng of maternal cfDNA was covalently derivatized with the ODN and the adaptors were ligated to the ends of DNA fragments. The labeled CG regions were enriched through the ODN-mediated polymerization of the adjacent genomic regions, and such regions were subsequently amplified using the primers complementary to the ODN and one strand of the adaptors. Then, the amounts of u-CG-DLRs and hm-CG-DLRs were calculated by qPCR as shown in Example 3 using a combination of CG-DLRs and qPCR primers listed in Table 2.
Comparing the obtained test values of the samples with known karyotype (the T21-diagnosed samples show lower test values than normal cases), all T21-diagnosed samples were confirmed as having trisomy 21, indicating 100% specificity and 100% sensitivity of the approach (
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2020/053011 | 3/30/2020 | WO |