This invention relates to a method of analyzing cellular chromosomes, and particularly relates to a method of analyzing the chromosomes of amniotic cells by sequencing.
Fetal chromosomal aneuploidy means a condition in which the number of chromosomes in a fetal genome is not diploid. Normally, there are 44 autosomes and 2 sex chromosomes in a human genome, in which the male karyotype is (46, XY) and the female karyotype is (46, XX). Fetal chromosomal aneuploidy may refer to the condition of having one more chromosome than a normal diploid fetus, i.e. 47 chromosomes in the fetal genome. Taking fetal trisomy 21 as an example, compared with a normal diploid fetus, the fetus with trisomy has an extra chromosome 21 with the karyotype of 47, XX (or XY), +21. Fetal chromosomal aneuploidy also refers to the condition of missing a chromosome in comparison with the normal diploid fetus, i.e. 45 chromosomes in the fetal genome. For example, the fetus with Turner syndrome with the karyotype of 45, XO misses one chromosome relative to the normal diploid fetus. Fetal chromosomal aneuploidy further refers to the condition in which a part of a chromosome is lost, for example, translocated trisomy 21 with the karyotype of 45, XX, der (14; 21)(q10; q10), and Cri du chat syndrome with the karyotype of 46, XX (XY),del (5)(p13).
According to incomplete statistics, the worldwide birth rate of fetuses with chromosomal aneuploidy is 1/160, wherein the birth rate of fetal trisomy 21 (T21, Down syndrome) is 1/800, the birth rate of fetal trisomy 18 (T18, Edwards syndrome) is 1/6000, and the birth rate of trisomy 13 (T13, Patau syndrome) is 1/10000. The development of fetuses with other types of chromosomal aneuploidy stagnates because some developmental stages cannot be completed, resulting in clinically unexplained miscarriage at the early stage of gestation (Deborah A. Driscoll, M. D., and Susan Gross, M. D., Prenatal Screening for Aneuploidy[J]. N Engl J Med 2009; 360:2556-62).
Amniotic cells are epithelial cells floating in the amniotic fluid, which derive from the skin, digestive tract and respiratory tract of the fetus. The procedure of culturing amniotic cells, enriching the number of fetal nucleated cells, preparing chromosomal specimens, and analyzing the fetal chromosomal karyotype is the traditional gold standard for clinically diagnosing chromosomal abnormalities of fetuses. This technique is reliable, accurate, and enables the observation of abnormalities of chromosome number and structure. Its disadvantages, however, are that it is time-consuming, taking 10 days to 3 weeks to yield the result, and that the culturing failure rate is about 1.00% (THEIN A T, ABDEL-FATTAH S A, KYLE P M, et al., An Assessment of the Use of Interphase FISH with Chromosome Specific Probes as an Alternative to Cytogenetics in Prenatal Diagnosis [J]. Prenat Diagn, 2000, 20(4): 275-280).
The reason for the high failure rate of culturing amniotic cells is that amniotic cells are aging and pyknotic cells, which makes them harder to culture than cells from other tissues (Changjun Ma, Yuania Chen, Peidan Huo; The Culture of Amniotic Cells and the Method of Preparing the Chromosomal Specimen Thereof [J]. Reproduction and Contraception, 1985, 5 (1): 53-4). Therefore, successful culturing of amniotic cells plays a critical role in the process of detecting chromosomal aneuploidy. Because of the demanding requirements of culturing amniotic cells, the relatively few viable cells in the amniotic fluid of some pregnant women, the relatively few harvested cells in division phase, and poor chromosomal shape, it is difficult to count and analyze a sufficient number of cells and to detect chromosomes. Furthermore, amniocentesis carries risk, with about 2-3% of pregnant women suffering complications such as uterine contractions, abdominal swelling, tenderness, vaginal bleeding, infection, water leakage, or fetal injury. It would be unacceptable for a pregnant woman to be asked to undergo a second amniocentesis if the culturing of amniotic cells fails or the harvested cells are not sufficient to count and analyze. Moreover, amniocentesis is generally performed at 16-20 weeks' gestation; once the culture fails, the gestational period of many pregnant women has advanced too far, and they have to undergo cordocentesis, which carries even higher risk. After the culturing of amniotic cells, the analysis of karyotype requires so much labor and cost that many hospitals cannot afford the procedure, causing great difficulties in the clinical spread and application of amniocentesis.
With the continuous development of sequencing techniques, they are being increasingly applied in the detection and analysis of the chromosome number. Dennis Lo et al. used the peripheral blood of a pregnant woman as experimental material to examine the abnormality of chromosome number by means of massive sequencing based on mathematical statistics methods (Y. M. Dennis Lo, et al., Quantitative Analysis of Fetal DNA in Maternal Plasma and Serum: Implications for Noninvasive Prenatal Diagnosis. Am. J. Hum. Genet. 62:768-775, 1998). But this method cannot completely replace amniocentesis in clinical application because of some defects of the technique: the cell-free DNAs in plasma are fragmented and cannot form a complete genome on their own, so that aneuploidy, translocation or mosaicism of chromosomes other than chromosomes 21, 18, and 13 either fails to be detected or is detected with low accuracy.
In order to overcome the missed detections caused by the sequencing of peripheral blood, and to resolve the high failure rate of the culturing of amniotic cells, this invention, combining the advantages of karyotype analysis by amniocentesis and of sequencing the cell-free DNAs in plasma, utilizes a method of detecting chromosomal aneuploidy based on massive sequencing of amniotic cells, including the steps of drawing amniotic cells, isolating DNA, conducting high-throughput sequencing, analyzing the obtained data, and acquiring detection results.
In one aspect of the invention, a method of using a high-throughput sequencing technique to analyze the chromosomal information of a subject's cells is provided, comprising the steps of:
a. randomly breaking the genomic DNA of the cells, obtaining DNA fragments with a certain size, and sequencing them;
b. strictly aligning the DNA sequences sequenced in step a to the reference sequence of the human genome to obtain the information about the DNA sequences located on a particular chromosome;
c. for the particular chromosome N, determining the total number of the sequences mapped to a sole region of the chromosome, among the above-sequenced DNA sequences, thereby making ChrN % for chromosome N, i.e. ratio of the total number (S1) of the sequences mapped to the sole place of chromosome N, among the above-sequenced DNA sequences, to the total number (S2) of the sequences located on all chromosomes, among the above-sequenced DNA sequences: ChrN %=S1/S2;
d. comparing ChrN % for chromosome N with ChrN % for the corresponding chromosome coming from standard cells to determine whether there exists a difference between the chromosome of the cells and the corresponding one of the standard cells.
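Steps b and c can be illustrated with a minimal sketch, assuming the alignment step has already produced one chromosome label per uniquely mapped read. The function name `chr_percentages` and the toy input are hypothetical and are not part of the described method; the value is returned as the plain ratio S1/S2 and may be multiplied by 100 if a percentage scale is preferred.

```python
from collections import Counter

def chr_percentages(unique_read_chroms):
    """Return ChrN % = S1 / S2 for every chromosome seen in the input.

    unique_read_chroms: iterable with one chromosome label per uniquely
    mapped read (hypothetical input format).
    """
    counts = Counter(unique_read_chroms)   # S1 for each chromosome N
    total = sum(counts.values())           # S2: unique reads on all chromosomes
    if total == 0:
        raise ValueError("no uniquely mapped reads")
    return {chrom: n / total for chrom, n in counts.items()}

# Toy data: 10 uniquely mapped reads spread over three chromosomes.
reads = ["chr1"] * 6 + ["chr21"] * 1 + ["chrX"] * 3
pct = chr_percentages(reads)
```

Because every value is divided by the same total S2, the result does not depend on how many reads a given run produced, which is the normalization the text relies on in step d.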
In the invention, the cells may be, for example, amniotic cells, wherein the amniotic cells may be uncultured amniotic cells or cultured amniotic cells. In one embodiment of the invention, to avoid culturing amniotic cells, the amniotic cells are uncultured amniotic cells.
In the invention, the genomic DNA of the cells may be obtained by traditional methods of isolating DNA, such as salting-out, column chromatography, and SDS, preferably by column chromatography. The so-called column chromatography involves treating amniotic cells or tissues with cell lysis buffer and proteinase K to expose naked DNA molecules, passing them through a silica membrane column capable of binding negatively charged DNA molecules, to which the genomic DNA molecules in the system are reversibly adsorbed, removing impurities such as proteins or lipids with washing buffers, and eluting with an elution buffer to obtain the DNA of amniotic cells (for more details about specific principles and methods, see the product manuals for product No. 56304 from Qiagen and product DP316 from Tiangen).
In the invention, the DNA molecules are randomly broken by restriction cleavage, atomization, ultrasound, or HydroShear method. HydroShear method is preferably used (when the solution containing DNA is flowing through a passage with small section, the flowing rate is accelerated, creating a force enough to destruct suddenly the DNA to produce DNA fragments in various sizes depending on the flowing rate and the section area. For more details about specific principles and methods, see product manuals of HydroShear from Life Sciences Wild). In this way the DNA molecules are broken into fragments with a narrow range of sizes, of which major bands generally range from 200 bp to 300 bp in size.
The sequencing method adopted in the invention may be the second generation sequencing method such as Illumina/Solexa or ABI/SOLiD. In one embodiment of the invention, the sequencing method is Illumina/Solexa and the resultant sequences are fragments with 35 bp in size.
When the DNA molecules to be examined are from multiple samples, each sample may be attached a different tagged sequence index so as to be processed during the process of sequencing (Micah Hamady, Jeffrey J Walker, J Kirk Harris et al., Error-correcting Barcoded Primers for Pyrosequencing Hundreds of Samples in Multiplex. Nature Methods, 2008, March, Vol. 5 No. 3).
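As an illustration of how tagged reads from one run might be separated by sample, the following is a minimal sketch assuming each read arrives as an (index sequence, read sequence) pair and that indexes are matched exactly. The index sequences, sample names, and the function `demultiplex` are hypothetical; the error-correcting scheme of the cited reference is not reproduced here.

```python
from collections import defaultdict

def demultiplex(reads, index_to_sample):
    """Group reads by sample using an exact match on the index sequence.

    reads: iterable of (index_seq, read_seq) pairs (hypothetical layout).
    index_to_sample: dict mapping index sequence -> sample name.
    Returns (reads grouped by sample, reads whose index was not recognized).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq)
        if sample is None:
            unassigned.append(read_seq)   # unknown or mis-read index
        else:
            by_sample[sample].append(read_seq)
    return dict(by_sample), unassigned

# Hypothetical indexes and reads, for illustration only.
indexes = {"ATCACG": "sample_1", "CGATGT": "sample_2"}
run = [("ATCACG", "ACGT"), ("CGATGT", "TTGA"), ("ATCACG", "GGCC"), ("NNNNNN", "AAAA")]
grouped, unknown = demultiplex(run, indexes)
```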
In the invention, the reference sequence of the human genome is the human genome sequence after masking of the repeated sequences, for example, the latest version of the reference sequence of the human genome in the NCBI database. In one embodiment of the invention, the human genome sequence is the reference sequence of the human genome as shown in version 36 (NCBI Build 36) of NCBI database.
In the invention, aligning strictly with the reference sequence of the human genome means that the adopted method of alignment is a fault-intolerant alignment to a sole region of the reference sequence of the human genome. In one embodiment of the invention, the alignment software Eland (a software package provided by Illumina) was used, and the method adopted was an absolute, fault-intolerant alignment.
In the invention, a DNA sequence that can be located at a sole region of the reference sequence of the human genome is defined as a sole sequence, represented by Unique reads. To avoid the interference of repeated sequences, it is necessary to remove those DNA sequences located at regions of tandem repeats and transpositional repeats within the reference sequence of the human genome and to take into account only those DNA sequences, i.e. sole sequences, which can be located at a sole region. Generally, of all the sequenced DNA sequences, about a quarter to a third can be located at a sole region of the genome, i.e. are sole sequences. The count of these sole sequences represents the distribution of the DNA sequences on the genomic chromosomes.
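The selection of sole sequences can be sketched as a simple filter, assuming each alignment record reports how many genomic locations the read matched and how many mismatches the match contained. The `Alignment` record and its field names are hypothetical and do not correspond to the output format of any particular aligner.

```python
from typing import NamedTuple

class Alignment(NamedTuple):
    read_id: str
    chrom: str
    hits: int        # number of genomic locations the read matches
    mismatches: int  # mismatches at the reported location

def sole_sequences(alignments):
    """Keep reads that map to exactly one location with no mismatches."""
    return [a for a in alignments if a.hits == 1 and a.mismatches == 0]

records = [
    Alignment("r1", "chr21", hits=1, mismatches=0),  # kept: unique, exact
    Alignment("r2", "chr1", hits=4, mismatches=0),   # dropped: repeated region
    Alignment("r3", "chr18", hits=1, mismatches=1),  # dropped: not an exact match
]
kept = sole_sequences(records)
```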
Therefore, the sole sequences can assist in localizing, on a particular chromosome, each DNA sequence that is produced by breaking and sequencing the DNA molecules isolated from amniotic cells. ChrN % is a value produced by normalizing the number of sole sequences found on each chromosome; it depends only on the size of the particular chromosome and not on the amount of data sequenced. Thus the values can be used to analyze the information on each of the 46 chromosomes. Therefore, ChrN % is the basic value for conducting a chromosomal analysis.
In the invention, whether there exists a difference between the number of a particular chromosome in the cellular samples and that in the standard cells can be determined by drawing a boxplot, wherein a sample for which ChrN % corresponds to an outlier that goes beyond 1.5-3 times or above 3 times the interquartile range is determined to differ from the standard cells in the chromosome number, i.e. aneuploidy.
In the invention, determining whether there exists a difference between a particular chromosome respectively in the said cellular samples and in the standard cellular samples may be accomplished by using “z score_ChrN” to indicate the deviation of ChrN % for the said cellular samples from ChrN % for the standard cellular samples.
Specifically, z score_ChrN=(ChrN % for a particular chromosome from detection samples−ChrN % mean (mean_ChrN %) for the particular chromosome)/ChrN % standard deviation (S.D._ChrN %).
If z score_ChrN is extremely large or small, it means that the deviation of the chromosome number in the cellular detection sample from that of the normal sample is significant. When it reaches a given level of significance, it may be believed that there is an apparent difference between the former number and the latter number.
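The z score_ChrN formula above can be sketched as follows. This is a minimal illustration, assuming the sample standard deviation (n − 1 denominator) is intended, which the text does not specify; the reference values are invented toy numbers, not data from the examples.

```python
from statistics import mean, stdev

def z_score(sample_pct, reference_pcts):
    """z score_ChrN = (ChrN % of sample - mean_ChrN %) / S.D._ChrN %.

    reference_pcts: ChrN % of the same chromosome in the standard samples.
    Uses the sample standard deviation (n - 1); this choice is an assumption.
    """
    m = mean(reference_pcts)
    sd = stdev(reference_pcts)
    if sd == 0:
        raise ValueError("standard deviation of the reference samples is zero")
    return (sample_pct - m) / sd

# Toy reference values for one chromosome, plus one elevated test sample.
reference = [1.30, 1.32, 1.28, 1.31, 1.29]
z = z_score(1.95, reference)
```

A large positive value indicates more reads than expected on chromosome N, and a large negative value indicates fewer.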
In the invention, the average value of ChrN % for a particular chromosome from the standard cellular samples may be determined from the ChrN % for that chromosome in, for example, at least 10, 20, 30, 50, or 100 standard cellular samples.
In the invention, the standard cellular samples are the samples of human cells in which the number of the chromosomes is diploid. A normal male cell has 44 autosomes and 2 different sex chromosomes, (46, XY). On the other hand, a normal female cell has 44 autosomes and 2 identical sex chromosomes, (46, XX).
In the invention, the ChrN % standard deviation (S.D._ChrN %) for a particular chromosome from the standard cellular samples may be determined from the ChrN % for that chromosome in, for example, at least 10, 20, 30, 50, or 100 standard cellular samples.
In one embodiment of the invention, the standard cellular samples have 20 samples from normal males and 10 samples from normal females, numbered, respectively, with 1, 2, . . . , 30, in which Nos. 1-20 are the detection samples from normal males, Nos. 21-30 are the detection samples from normal females. The average value of ChrN % (mean_ChrN %) for the standard cellular samples is calculated as follows:
(Note: owing to sequencing fluctuation and the large number of gaps existing on the Y chromosome of the reference sequence, even for the normal female samples a few DNA sequences are aligned with the Y chromosome. The ChrY % for females, however, is much less than that for males. In the examples, the ChrY % for females is around 0.004, whereas the ChrY % for males is around 0.114.)
Based on each ChrN % mean (mean_ChrN %) for the standard cellular samples obtained by the method described above, the ChrN % standard deviation (S.D._ChrN %) is calculated with the following formula:
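The formulas for the mean and the standard deviation are not reproduced in this text. A plausible reconstruction, assuming the ordinary sample mean and sample standard deviation over the n standard cellular samples (n = 30 in this embodiment), is given below; whether the denominator of the standard deviation is n − 1 or n is an assumption that cannot be confirmed from the surviving text.

```latex
\[
\text{mean\_ChrN\%} = \frac{1}{n}\sum_{i=1}^{n} \text{ChrN\%}_{i},
\qquad
\text{S.D.\_ChrN\%} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}
  \left(\text{ChrN\%}_{i} - \text{mean\_ChrN\%}\right)^{2}}
\]
```

Here ChrN %_i denotes the ChrN % of chromosome N in standard sample i. For the sex chromosomes, the sums would run only over the standard samples of one sex.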
Male cells have one X chromosome and one Y chromosome, whereas female cells have two X chromosomes, and the whole length of the X chromosome is about 155 M, whereas that of the Y chromosome is about 59 M. Therefore, in detecting these sex chromosomes, it is necessary to establish separate normal distribution curves for ChrX % or ChrY % for each gender. The most accurate analysis for the X chromosome can be obtained from the gender-specific normal distribution curves.
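Keeping separate reference statistics for male and female standard samples can be sketched as below. The data layout, the function name `reference_stats_by_sex`, and the toy values are hypothetical; the use of the sample standard deviation is likewise an assumption.

```python
from statistics import mean, stdev

def reference_stats_by_sex(samples, chrom):
    """Return {sex: (mean_ChrN %, S.D._ChrN %)} for one chromosome.

    samples: list of (sex, {chromosome: ChrN %}) pairs (hypothetical layout).
    Each sex group needs at least two samples for a standard deviation.
    """
    groups = {}
    for sex, pcts in samples:
        groups.setdefault(sex, []).append(pcts[chrom])
    return {sex: (mean(vals), stdev(vals)) for sex, vals in groups.items()}

# Toy ChrY % values, loosely following the magnitudes quoted in the text.
standards = [
    ("male", {"chrY": 0.114}), ("male", {"chrY": 0.116}), ("male", {"chrY": 0.112}),
    ("female", {"chrY": 0.004}), ("female", {"chrY": 0.005}), ("female", {"chrY": 0.003}),
]
stats = reference_stats_by_sex(standards, "chrY")
```

A test sample would then be compared only against the mean and standard deviation of its own sex group.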
In one embodiment of the invention, 30 standard cellular samples were selected to conduct the chromosomal analysis. A normal distribution curve was then established, and a significance level (such as 0.1%) was set for deciding that the chromosome number of a sample differs from that of the standard cells. Accordingly, an absolute value of the z score_ChrN below 3 was taken to indicate that the number of chromosomes is the same as that in the standard cells. On the basis of the results above, the chromosomes of the detection samples were then analyzed as follows:
If the absolute value of the z score amounts to 3, then the samples have a 99.9% probability that they are not among the normally distributed population, i.e. outliers. This means that the chromosome number of the detected cells differs from that of the standard cells, i.e. chromosomal aneuploidy.
If the absolute value of the z score is less than 3, then the samples have a 99.9% probability that they are normal samples, which means that the chromosome number of the detected cells is the same as that of the standard cells.
If the absolute value of the z score is greater than 3, then the samples have a 99.9% probability that they are abnormal samples, which means that the chromosome number of the detected cells differs from that of the standard cells, i.e. chromosomal aneuploidy.
Further, in the invention, if the absolute value of the z score is greater than 3, for the specific instance of chromosomal aneuploidy occurring in the detected cells, the Z reference value (cutoff value) may be used to determine it. The Z reference value is calculated with the following formula:
Z=(mean_ChrN %×0.5×X%)/S.D._ChrN %
When N represents autosomes, mean_ChrN % and S.D._ChrN % are calculated from all of the samples of the standard cells. When N represents sex chromosomes, mean_ChrN % and S.D._ChrN % are calculated from the samples of the standard cells of the respective gender;
X may be any integer between, inclusive, −100 and 100, such as −100, −90, −80, −70, −60, −50, −40, −30, −20, −10, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100.
In one embodiment of the invention, when X amounts to 100, it represents that the cellular detection samples have one more chromosome than the standard cells. In one embodiment of the invention, when X amounts to −100, it represents that the cellular detection samples have one less chromosome than the standard cells. When X amounts to 50, it represents that the cellular detection samples have an extra half of one chromosome compared with the standard cells. In one embodiment of the invention, when X amounts to −50, it represents that the cellular detection samples lack half of one chromosome in comparison with the standard cells.
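One way to read the factor 0.5 in the Z reference value is sketched below. This derivation is an interpretation and is not stated in the text; it assumes a chromosome that is normally present in two copies and whose share of all sole sequences is small enough that the total S2 barely changes.

```latex
\[
\frac{\text{copies in sample}}{\text{copies in standard cell}}
  = \frac{2 + X\%}{2} = 1 + 0.5 \times X\%
\]
\[
\text{ChrN\%}_{\text{expected}} \approx \text{mean\_ChrN\%} \times \left(1 + 0.5 \times X\%\right)
\]
\[
Z = \frac{\text{ChrN\%}_{\text{expected}} - \text{mean\_ChrN\%}}{\text{S.D.\_ChrN\%}}
  \approx \frac{\text{mean\_ChrN\%} \times 0.5 \times X\%}{\text{S.D.\_ChrN\%}}
\]
```

Under this reading, X = 100 (one extra copy) corresponds to an expected 50% increase of ChrN %, and X = −100 (one missing copy) to an expected 50% decrease.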
In the invention, in calculating the Z reference value for sex chromosomes, mean_ChrN % and S.D._ChrN % are calculated for either female samples or the male samples. That is:
For the male samples, the Z reference value reached may be (mean_ChrN % (male)×0.5×X %)/S.D._ChrN % (male);
For the female samples, Z reference value reached may be (mean_ChrN % (female)×0.5×X %)/S.D._ChrN % (female).
When the absolute value of the z score_ChrN is greater than or equal to 3 and reaches the absolute value of the Z reference value, there is a significant difference between the number of a particular chromosome in the detected cells and that in the standard cells equal to X %. For example, when X amounts to 50, it represents that the detected cells have extra half of the particular chromosome compared with the standard cells; when X amounts to 100, it represents that the detected cells have one particular chromosome more than the standard cells.
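The comparison of a z score_ChrN with the Z reference values can be sketched as follows. The helper names, the choice of X values, and the rule of reporting the largest X whose cutoff is reached are assumptions made for illustration; the toy mean and standard deviation are not data from the examples.

```python
def z_reference(mean_pct, sd_pct, x_percent):
    """Z = (mean_ChrN % x 0.5 x X %) / S.D._ChrN %, with X given in percent."""
    return mean_pct * 0.5 * (x_percent / 100.0) / sd_pct

def interpret(z, mean_pct, sd_pct, x_values=(50, 100)):
    """Return None if |z| < 3, otherwise the signed X whose cutoff is reached.

    Returns 0 when |z| >= 3 but the smallest cutoff is not reached.
    Picking the largest reached X is an assumed reading of the text.
    """
    if abs(z) < 3:
        return None  # chromosome number taken to match the standard cells
    sign = 1 if z > 0 else -1
    reached = [x for x in x_values
               if abs(z) >= abs(z_reference(mean_pct, sd_pct, x))]
    return sign * max(reached) if reached else 0

# Toy reference statistics: the cutoffs are about 25 (X=50) and 50 (X=100).
mean_pct, sd_pct = 1.30, 0.013
results = [interpret(z, mean_pct, sd_pct) for z in (55, 30, 5, 2, -55)]
```

In this toy case a z score of 55 is read as one extra chromosome, 30 as an extra half chromosome, and −55 as one missing chromosome.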
In one embodiment of the invention, the specific method of analyzing the chromosomes of amniotic cells includes the following steps:
A library was built according to the modified Illumina/Solexa standard procedure of building a library after the DNA of amniotic cells was isolated in accordance with the manual from Tiangen Micro Kit. Then adapters for sequencing were added to the both ends of the randomly broken DNA fragments. During the process, a different tagged sequence (index) was attached to each of the samples such that multiple samples could be differentiated in the data obtained from one-time sequencing.
After sequencing by using the second generation sequencing technique, known as Illumina/Solexa sequencing (other alternate sequencing methods can also be used to achieve the same or similar effects), the fragmented DNA sequences of a specified size were produced for each sample, which were subjected to alignment strictly with the reference sequence of the human genome. Thus, the information was obtained on the location of the sequences at the corresponding regions of the genome.
Such a restricted alignment was required because it could not be determined from which chromosome a given DNA sequence originated if fault-tolerant alignment or the alignment with multiple regions was allowed. This would be unfavorable for the subsequent analysis of the data.
The total number of the sole sequences located on each chromosome was calculated by taking each chromosome as a unit, thereby making the ChrN % for each chromosome, i.e. ratio of the total number (S1) of the sequences among the above-sequenced DNA sequences, which are located at the sole place of chromosome N, to the total number (S2) of the sequences among the above-sequenced DNA sequences, which are located on all chromosomes: ChrN %=S1/S2.
This is a method of normalization for different samples having a different sequencing amount. Because amniotic cells contain the whole information of 46 chromosomes, theoretically the total number of the sequences located on a given chromosome is directly proportional to the length of the chromosome.
For example, chromosome 1 is the largest chromosome (about 247 M) in the human genome, whereas chromosome 21 is the smallest chromosome (about 47 M) in the human genome. Therefore, given a certain total amount of sequencing, the ChrN % of each chromosome obtained from normal diploid amniotic cells is nearly a fixed value. Although under some sequencing and experimental conditions the ChrN % is not directly proportional to the size of the chromosome, it is usually a fixed value.
By boxplot analysis, it may be directly determined whether the cellular detection samples are likely to differ from the standard cells in the chromosome number. The samples with abnormal values are directly considered as suspect samples, and the others are considered as standard cellular samples. The detailed process is as follows:
In the invention, in order to help with the data analysis, a boxplot (used to identify abnormal values in mathematical statistics) of the ChrN % produced above is adopted to determine suspect samples. The specific drawing process is as follows:
1) calculating the upper-quartile (75%), median (50%), and lower-quartile (25%);
2) calculating the difference, interquartile range (IQR), between the upper-quartile and lower-quartile;
3) drawing the upper and lower ranges of a boxplot with the upper limit being the upper-quartile and the lower limit being the lower-quartile, and drawing a horizontal line where the median lies inside the box;
4) values lying more than 1.5 times the interquartile range above the upper-quartile or below the lower-quartile are classified as outliers;
5) excluding the outliers, drawing a horizontal line at each of the two value points closest to the upper margin and the lower margin, respectively, as a “whisker” of the boxplot;
6) extreme outliers lying more than three times the interquartile range beyond the box are represented with star points; milder outliers lying between 1.5 and 3 times the interquartile range beyond the box are represented with hollow points.
The bold line in the middle of the box represents the median, and the upper and lower borders represent the upper and lower quartiles, respectively. Outliers are defined as the points lying more than 1.5 times the distance between the upper quartile and lower quartile beyond the box. For example, when the detection samples are standard cellular samples, the ChrN % corresponding to their chromosomes is a fixed value (for example, 1). When the ChrN % corresponding to their chromosomes is 1.5, the difference can be considered highly significant, thereby making the samples suspected samples. That is, it is likely that they are samples that differ from the standard cells in the chromosome number.
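The outlier rule of the boxplot can be sketched without any plotting, as a classification of each ChrN % value by its distance from the box. The quartile convention used here (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not say how quartiles are interpolated; the values are toy numbers.

```python
from statistics import quantiles

def classify_outliers(values):
    """Label each value 'normal', 'mild' (1.5-3 x IQR) or 'extreme' (> 3 x IQR).

    Distances are measured from the nearer edge of the box (lower or upper
    quartile). The quartile interpolation method is an assumed convention.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        dist = max(q1 - v, v - q3, 0.0)   # 0 for values inside the box
        if dist > 3 * iqr:
            labels.append("extreme")
        elif dist > 1.5 * iqr:
            labels.append("mild")
        else:
            labels.append("normal")
    return labels

# Toy ChrN % values for one chromosome across nine samples.
chr_pct = [1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 1.01, 0.99, 1.50]
labels = classify_outliers(chr_pct)
```

Samples labeled "mild" or "extreme" would correspond to the hollow and star points of the boxplot and would be treated as suspect samples.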
If needed, the ChrN % mean and standard deviation (S.D._ChrN %) may be determined, respectively, by the ChrN % for a particular chromosome corresponding to the standard cellular samples. Then the z score_ChrN for the chromosome from the suspected samples are calculated with the following formula:
z score_ChrN=(ChrN % for the particular chromosome from the suspected samples−ChrN % mean)/S.D._ChrN %
If the absolute value of the z score_ChrN is greater than or equal to 3, there is a difference between the number of a particular chromosome in the cellular samples and that in the standard cells.
Further, in the invention, for the specific instance of an abnormal chromosome number occurring in the cells, reference to the Z reference value (cutoff value) may be used to determine it. Value Z is calculated with the following formula:
Z=(mean_ChrN %×0.5×X %)/S.D._ChrN %
When N represents autosomes, the mean_ChrN % and S.D._ChrN % are calculated from all the samples of the standard cells. When N represents sex chromosomes, the mean_ChrN % and S.D._ChrN % are calculated from the samples of the standard cells of the respective gender;
X is assigned to be 50 or 100. Correspondingly, when X is 100, it represents that the cellular detection samples have one more chromosome than the standard cells. When X is 50, it represents that the cellular detection samples have extra half of one chromosome than the standard cells.
In the invention, in calculating the Z reference value for sex chromosomes, the mean_ChrN % and S.D._ChrN % are calculated separately for the female samples and the male samples, that is:
For the male samples, the Z reference value reached is (mean_ChrN % (male)×0.5×X %)/S.D._ChrN % (male);
For the female samples, the Z reference value reached is (mean_ChrN % (female)×0.5×X %)/S.D._ChrN % (female).
When the absolute value of the z score_ChrN is greater than or equal to 3 and reaches the absolute value of the Z reference value, there is a X % difference between the number of a particular chromosome in the cells and that in the standard cells.
The invention can be used for the analysis of cells, such as amniotic cells. In the invention, DNA can be isolated directly from the amniotic cells to be detected without a subculture, which greatly reduces difficulties such as poor attachment, insufficient cell number, or culture failure that arise in culturing amniotic cells.
By using the characteristic of amniotic cells containing the entire genomic information about a fetus, the invention is able to make an analysis of the aneuploidy of all of the chromosomes of the cells, rather than examine only the sex chromosomes X and Y and chromosomes 21, 18, 13.
Besides, though the method of determination involved in the invention, as compared with plasma samples, is also dependent on approximately normal distribution established on standard cellular samples, such dependency on the standard cellular samples is greatly reduced. Additionally, abnormal samples can be directly determined from data abnormalities, assuming sufficient data.
By using the method of the invention, a large number of cellular detection samples can be subjected to batch analysis. Hundreds of thousands of cellular detection samples can be detected at one time, thereby greatly saving labor and costs.
The embodiments of the invention will be described in detail in combination with the following examples. A person skilled in the art would appreciate, however, that the following examples are merely intended to describe the invention and should not be regarded as limiting the scope of the invention. If specific conditions are not specified in the examples, these examples are performed in accordance with commonly used conditions or those advised by manufacturers. If the sources of the reagents, equipment or instruments used in the invention, such as their manufacturers, are not specified, all of them are commercially available. The linkers for sequencing and the tagged sequence indexes used come from the Multiplexing Sample Preparation Oligonucleotide Kit provided by Illumina.
In the following parentheses is manufacturers' product number for each of the reagents or kits.
DNA of amniotic cells was isolated according to the procedure of manipulation of a small amount of genome of Tiangen Micro Kit (DP316), and quantitated with Qubit (Invitrogen, the Quant-iT™ dsDNA HS Assay Kit). The total amount of the isolated DNA varied from 100 ng to 500 ng.
The isolated DNA was either the entire genomic DNA or partially degraded smear-like DNA. A DNA library was built under the modified standard Illumina/Solexa library-building procedure. Adapters were added to both ends of randomly broken DNA molecules, and different tagged sequence indexes were attached. Then these molecules were hybridized with complementary adapters on the surface of a flow cell and allowed to form clusters under particular conditions. 36 sequencing cycles were run on an Illumina Genome Analyzer II, producing DNA sequences of 35 bp.
Specifically, Diagenode Bioruptor was used to randomly break about 100-500 ng of DNA isolated from amniotic cells into 300 bp fragments. 100-500 ng of initially broken DNA was used to build a library under Illumina/Solexa. See the prior art for a detailed procedure (Illumina/Solexa manual for standard library-building provided by Illumina's website). The size of the DNA library was determined by way of 2100 Bioanalyzer (Agilent), and the inserted fragments were 300 bp. After accurate quantitation by QPCR, sequencing was performed.
In the example, batch sequencing of 53 DNA samples isolated from amniotic cells was conducted according to the Cluster Station and GA IIx (SE sequencing) procedures officially published by Illumina/Solexa.
Following the prior art (see the manual concerning the Pipeline method provided at Illumina's website), the sequence information obtained in step 1 was subjected to a single Pipeline run, and low-quality sequences were removed, finally yielding the ELAND alignment result against the reference sequence of the human genome of NCBI version 36. Then the number of the sole sequences located on each chromosome was counted.
The ChrN % for 22 chromosomes and the X/Y chromosome respectively from 53 samples was calculated and a boxplot (see
Percentage of a particular chromosome in detection sample M, ChrN %=the total number of the sole sequences contained in sample M and located on the corresponding chromosome of the reference sequence through alignment (S1)/the total number of the sole sequences contained in sample M and located on all of the chromosomes of the reference sequence through alignment (S2).
According to the boxplot drawn in step 2, it was first determined whether an outlier existed. That is, if a suspected sample lay more than 1.5 times the difference between the upper-quartile and the lower-quartile beyond the upper or lower border, it was likely that it differed from the standard samples in chromosome number.
Specifically, the distribution of the boxplot was observed, and 8 suspected samples (sample Nos. P1-P8) were detected. A normal distribution was established by using as standard samples the data concerning 20 normal males and 10 normal females, chosen randomly from the remaining 45 standard cellular samples after the suspected samples were removed. The ChrN % mean (mean_ChrN %) and standard deviation (S.D._ChrN %) for each chromosome are given in table 1.
Furthermore, in order to examine whether the instance of a half chromosome or an additional chromosome existed in the suspected samples, X was assigned to be 50 or 100 and the corresponding chromosomal Z reference value (cutoff value) was calculated (see table 2):
Z=(mean_ChrN %×0.5×X %)/S.D._ChrN %, wherein N represents chromosomes 1-22, X is 50 or 100.
The z score_ChrN for each chromosome in the suspected samples was calculated with the following formula:
z score_ChrN=(ChrN % for a given chromosome in the detection samples−mean_ChrN %)/S.D._ChrN %.
As seen from the analysis above, the suspected samples were 8 in total among the 53 detection samples of amniotic cells, in which, for the chromosomes in each of the suspected samples, 8 abnormalities of chromosome number with the absolute value of a z score_ChrN greater than 3 were detected (see table 3). Specifically, they were:
1) Chr21 for P1, Chr21 for P2, Chr21 for P3, and Chr21 for P4;
2) Chr18 for P5, Chr18 for P6, and Chr18 for P7; and
3) Chr13 for P8.
It was determined by checking the Z value obtained when X=100 in table 2 that the number of chromosome 21 in samples P1-P4 and the number of chromosome 18 in samples P5-P7 were one more than the number of the corresponding chromosomes in the standard cells, respectively, and the number of chromosome 13 in P8 was half a chromosome more than the number of the corresponding chromosome in the standard cells. That is, P1-P4 were T21 (Down syndrome), P5-P7 were T18 (Edwards syndrome), and P8 was mosaic T13 (mosaic Patau syndrome). The results were completely consistent with the traditional analysis results of chromosomal karyotype.
An additional 6 samples (Q1-Q6) of amniotic cells were treated and sequenced in the same way as the above to produce data for analysis. The z score_ChrN was calculated on mean_ChrN % and S.D._ChrN % calculated from 30 standard cellular samples in example 1. 3 positive samples were identified from the 6 samples.
As seen from the results, Q5 had one more copy of chromosome 21 than the standard cells, i.e. T21; Q3 and Q4 each lacked one copy of chromosome X, i.e. 45, XO (Turner syndrome). The results were completely consistent with the traditional analysis results of chromosomal karyotype.
Although the examples of the invention have been described in great detail, a person skilled in the art will understand that, according to all of the disclosed teachings, a variety of modifications and replacements may be made to those details. Such changes are covered by the scope of protection of the invention. The whole scope of the invention is defined by the attached claims and their equivalents.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2010/001230 | 8/13/2010 | WO | 00 | 4/8/2013 |