This application claims priority under 35 U.S.C. § 119(a) to Taiwan Patent Application No. 107145537, filed on Dec. 18, 2018, the content of which is hereby incorporated by reference in its entirety.
The present invention relates to a method of genome detection, in particular a method for detecting copy number variation (CNV) in cell chromosomes.
Traditionally, when a chromosome microarray chip is used to detect copy number variation, both the test sample DNA and the reference sample DNA, labelled with different fluorescent dyes, must be added to the same chip. The conventional fluorescent dyes are Cy3 (Cyanine Dye 3) and Cy5 (Cyanine Dye 5), which, after excitation, emit green light and red light, respectively. The sample DNA is then denatured into single-stranded DNA and hybridized with the probes on the chromosome microarray chip, and the fluorescent signals are compared to identify whether the chromosomes of the test sample contain a gain or loss in a specific chromosomal region. Because the traditional approach requires a reference sample, every copy number variation detection consumes twice the reagents and human resources, which makes detection burdensome.
Due to the high cost of traditional CNV detection methods in view of related reagents and human resources, a novel method with reduced cost for CNV detection is required.
In order to overcome the shortcomings of the existing technology, the objective of the present invention is to dispense with the reference sample DNA and the related reagents. In addition, owing to the lower standard deviation and the more convergent data, the present invention detects CNV more clearly.
To achieve the above objective, the present invention provides a method for detecting copy number variation, comprising (A) providing at least three test samples; (B) purifying nucleic acid from each test sample to obtain a respective nucleic acid sample for each test sample; (C) dividing all the nucleic acid samples into groups, wherein each group consists of two said nucleic acid samples, to obtain nucleic acid sample groups; (D) conducting whole genome amplification for each nucleic acid sample in the nucleic acid sample groups to obtain amplified nucleic acid samples in groups; (E) labelling one amplified nucleic acid sample in each group with a first fluorescent dye to obtain a first-fluorescent-dye-labelled amplified nucleic acid sample for each group, and labelling the other amplified nucleic acid sample in each group with a second fluorescent dye to obtain a second-fluorescent-dye-labelled amplified nucleic acid sample for each group; (F) mixing the first-fluorescent-dye-labelled amplified nucleic acid sample in each group with the second-fluorescent-dye-labelled amplified nucleic acid sample in said group to obtain a mixture, and conducting hybridization with the mixture on a chip that contains a set of human genome probes to obtain two signal data sets in one group for each chip, wherein each group of signal data sets consists of two signal data sets, and each of the signal data sets consists of signal data of all the probes against a labelled and amplified nucleic acid sample from each test sample; (G) analyzing the signal data sets in one group for each chip via locally weighted scatterplot smoothing (Lowess) to obtain two Lowess-analyzed signal data sets; (H) calibrating the signal data in the Lowess-analyzed signal data sets for the probes arranged in the order of genomic coordinates in view of corresponding probe values in a probe values set for calibration to obtain calibrated results, wherein said probe values set for calibration is generated by: (i) using the signal data sets in groups derived from test samples in the same sample batch as the test sample of interest or in a different sample batch to (ii) obtain a probe values set for calibration via calculation in view of at least three Lowess-analyzed signal data sets, wherein said probe values set for calibration is a collection of probe values for calibration for all the probes; and (I) analyzing the calibrated results to obtain a CNV result of the test sample of interest.
The present invention generates a probe value for calibration for each probe in view of at least three Lowess-analyzed signal data sets as a reference comparison value, and completes the CNV detection after calibrating the signal data of the test sample of interest. Compared with traditional assays for detecting CNV, the present invention does not require a reference sample for each test sample of interest and thus reduces the reagents and human resources required. Owing to its lower cost, the present invention is advantageous for high-throughput CNV detection.
Preferably, the calculation in step (H)(ii) comprises: adjusting all of the at least three Lowess-analyzed signal data sets by mean centering based on the mean value of all the signal data of all the probes for Chromosome 1 to Chromosome 22, and calculating a median signal value for each probe on the chip based on the at least three Lowess-analyzed and mean center-adjusted signal data sets, wherein the median signal value for each probe on the chip is the probe value for calibration for each probe. The Lowess analysis reduces the cross interference between the two fluorescent dyes. The adjustment of mean centering reduces the signal intensity difference due to the difference in the amounts of nucleic acid in the hybridization as well as the difference in labelling efficiencies of the fluorescent dyes. Adopting the median of the signal data sets of the test samples for each probe excludes the outlier of signal data sets among the test samples resulting from experimental errors or deterioration of certain test samples. Such procedure generates the probe values set for calibration equivalent to the reference sample in the traditional CNV detection method.
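The calculation described above can be illustrated with a minimal sketch. It assumes that mean centering is implemented by dividing each signal by the sample's autosomal mean (the text does not state whether centering is subtractive or multiplicative), and the function names `mean_center` and `calibration_probe_values`, the `is_autosomal` flags and the toy numbers are hypothetical.

```python
from statistics import mean, median

def mean_center(signals, is_autosomal):
    """Scale one sample so that the mean over autosomal probes equals 1.0.

    Assumption: centering is multiplicative; the source does not specify this.
    """
    center = mean(s for s, a in zip(signals, is_autosomal) if a)
    return [s / center for s in signals]

def calibration_probe_values(samples, is_autosomal):
    """Return the per-probe median across all mean-centered samples."""
    if len(samples) < 3:
        raise ValueError("at least three Lowess-analyzed signal data sets are required")
    centered = [mean_center(s, is_autosomal) for s in samples]
    # zip(*centered) regroups the data probe by probe
    return [median(values) for values in zip(*centered)]

# Toy data: 3 samples x 4 probes; the last probe is on a sex chromosome,
# and sample 2 carries an outlying value (400.0) on that probe.
samples = [
    [100.0, 200.0, 300.0, 50.0],
    [110.0, 190.0, 300.0, 400.0],
    [90.0, 210.0, 300.0, 60.0],
]
is_autosomal = [True, True, True, False]
reference = calibration_probe_values(samples, is_autosomal)
```

In this toy case the outlying value on the last probe does not affect its probe value for calibration, because the median ignores it. Under the multiplicative assumption, a global scale difference between the test sample and the calibration set becomes a constant offset in the log2 ratios, which the later median-zeroing step removes.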
Preferably, the calibration in step (H) comprises using the probe values set for calibration generated from the steps (i) and (ii) to conduct the calculation as follows:
log2 (each probe signal data in the Lowess-analyzed signal data set of the test sample of interest/the corresponding probe value in the probe values set for calibration)
to obtain the log2 ratio for each probe for the test sample of interest.
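As a minimal sketch of this calculation, the following assumes that both inputs are lists of strictly positive signal values aligned probe by probe; the function name `log2_ratios` and the numbers are hypothetical.

```python
import math

def log2_ratios(sample_signals, calibration_values):
    """log2(test sample signal / probe value for calibration), probe by probe."""
    if len(sample_signals) != len(calibration_values):
        raise ValueError("both inputs must cover the same probes")
    return [math.log2(s / c) for s, c in zip(sample_signals, calibration_values)]

# Toy data: a doubled, an equal and a halved signal relative to the calibration value
ratios = log2_ratios([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
```

A ratio above 0 indicates a signal higher than the calibration value for that probe, and a ratio below 0 indicates a lower signal.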
Preferably, after obtaining the log2 ratio for each probe of the test sample of interest, the calibration in step (H) further comprises the following steps to obtain the calibrated results: adjusting the log2 ratios for all the probes by zeroing the median of the ratios, and then calculating the median and standard deviation for each probe based on the median-zeroing-adjusted log2 ratios of at least three consecutive probes arranged in the order of genomic coordinates. Zeroing the median means calculating the log2 ratios for all the probes on the chip, calculating the median of all the log2 ratios obtained, and subtracting that median from every log2 ratio; for example, if the median is −0.1, subtract −0.1 from all log2 ratios so that the median of all the log2 ratios becomes 0. Calculating the medians based on the data sets of consecutive probes excludes signal outliers within a sample resulting from experimental errors or a single invalid probe. Then, calculate the calibrated result as follows:
the median±the standard deviation for the corresponding probe of the test sample of interest×a coefficient.
If the median for a probe based on the median-zeroing-adjusted log2 ratios is a positive value, calculate the result as:
the median−the standard deviation for the corresponding probe of the test sample of interest×a coefficient.
On the other hand, if the median for a probe based on the median-zeroing-adjusted log2 ratios is a negative value, calculate the result as:
the median+the standard deviation for the corresponding probe of the test sample of interest×a coefficient.
In such manner, the deviated signals are converged in view of the standard deviation.
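The two cases can be written compactly as follows. The symbols are introduced here for illustration only and do not appear in the original text: for probe i, m_i is the median and s_i the standard deviation of the median-zeroing-adjusted log2 ratios of the consecutive probes, k is the coefficient, and c_i is the calibrated result. The text does not state the result when m_i is exactly 0, so that case is left out.

```latex
\[
c_i =
\begin{cases}
m_i - k\, s_i, & m_i > 0,\\
m_i + k\, s_i, & m_i < 0.
\end{cases}
\]
```

For example, with m_i = 0.30, s_i = 0.10 and k = 0.2, the calibrated result is 0.30 − 0.2 × 0.10 = 0.28; with m_i = −0.30 and the same s_i and k, it is −0.30 + 0.02 = −0.28. In both cases the value moves toward 0 by k times the standard deviation.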
More preferably, the coefficient ranges from 0 to 1. Said coefficient adjusts the convergence level of the calibrated results as well as the background signal based on the standard deviation of the whole signal data sets and/or a standard sample of chromosome abnormality (for example, a sample from the Coriell Institute or an abnormal sample detected by a traditional CNV detection method) to highlight the gains or losses of fragments. More preferably, the coefficient ranges from 0.1 to 0.3. When the coefficient ranges from 0.3 to 0.5, the convergence level of the signal data of the majority of probes is the highest (the standard deviation of the signal data sets is the lowest). When the coefficient ranges from 0 to 0.2, the background signal (that is, signal other than the expected signal of the standard sample of chromosome abnormality) is the lowest.
Preferably, the analysis in step (I) is conducted by means including Circular binary segmentation (CBS), BioHMM, Forward-Backward Fragment-Annealing Segmentation or Wavelet smoothing.
Preferably, the first fluorescent dye in step (E) is Cy3 (Cyanine Dye 3) and the second fluorescent dye in step (E) is Cy5 (Cyanine Dye 5).
The term “zeroing the median” or “median-zeroing” used herein means calculating the median of all statistics samples and subtracting said median from each statistics sample value, so that the median value of all statistics samples becomes 0.
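A minimal sketch of this definition follows; the function name `zero_median` and the sample values are hypothetical, with a median of −0.1 chosen to match the example given earlier.

```python
from statistics import median

def zero_median(values):
    """Subtract the median of all values from each value."""
    m = median(values)
    return [v - m for v in values]

# Toy log2 ratios whose median is -0.1
adjusted = zero_median([-0.1, 0.4, -0.3, -0.1, 0.2])
```

After the adjustment, the median of `adjusted` is 0 and the differences between values are unchanged.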
In the following, the specific implementation of the CNV detection method of the present invention is explained through an embodiment. A person skilled in the art can easily understand the benefit and the effect of the present invention via the present specification and make various modifications and variations without departing from the scope and spirit of the present invention in order to implement and exercise the content of the present invention.
First, as shown in step (S1) of
Then, as shown in step (S2) of
As shown in step (S3) of
Next, as shown in step (S4) of
After that, as shown in step (S5) of
Then, as shown in step (S6) of
Next, as shown in step (S7) of
After that, as shown in step (S8) of
(i) using the signal data sets in groups derived from test samples in the same sample batch as the test sample of interest or in a different sample batch to
(ii) obtain a probe values set for calibration via calculation in view of at least three Lowess-analyzed signal data sets, wherein said probe values set for calibration is the collection of probe values for calibration for all the probes. Specifically, all of the 20 Lowess-analyzed signal data sets for all the probes (including those against Chromosome X and Chromosome Y) are adjusted by mean centering based on the mean value of all the signal data of all the probes against Chromosome 1 to Chromosome 22 in order to reduce the data reading error among the chips. In such manner, 20 Lowess-analyzed and mean-centered signal data for each probe are obtained in accordance with the 20 Lowess-analyzed and mean center-adjusted signal data sets. Then, calculate the median signal value for each probe on the CytoOneArray v2.23 chip based on the 20 Lowess-analyzed and mean center-adjusted signal data sets, wherein the median signal value for each probe on said chip is the probe value for calibration for each probe. The collection of probe values for calibration for all the probes is the probe values set for calibration. Next, calibrate the signal data in the Lowess-analyzed signal data sets for the probes in view of corresponding probe values in the probe values set for calibration. In the present embodiment, the test sample of interest is one of the 20 test samples, and the Lowess-analyzed signal data set of the test sample of interest to be calibrated in view of the probe values set for calibration is one of the 20 Lowess-analyzed signal data sets derived from the 20 test samples. However, in other embodiments, new samples other than the 20 test samples can, after completing step (S1) to step (S7), be calibrated in view of the probe values set for calibration generated from the 20 Lowess-analyzed signal data sets derived from the 20 test samples of the present embodiment.
Specifically, the calibration in view of the probe values set for calibration is calculated as follows:
log2 (each probe signal data in the Lowess-analyzed signal data set of the test sample of interest/the corresponding probe value in the probe values set for calibration)
to obtain the log2 ratio for each probe for the test sample of interest. Then, the log2 ratios for all the probes for the test sample of interest are further adjusted by zeroing the median of the ratios. That is, after calculating the log2 ratio for each probe for the test sample, subtract the median, calculated based on the log2 ratios for the 32,816 probes, from the log2 ratio for each probe. In this manner, the median of said 32,816 probes becomes zero. Next, arrange all the probes in the order of genomic coordinates and calculate the median and standard deviation for each probe based on the median-zeroing-adjusted log2 ratios of 5 consecutive probes. In other words, calculate the median and standard deviation for probe 3 based on the ratios of probe 1 to probe 5; calculate the median and standard deviation for probe 4 based on the ratios of probe 2 to probe 6, and so on. After that, calculate as follows:
the median±the standard deviation for the corresponding probe of the test sample of interest×a coefficient
to obtain the calibrated result. If the median for a probe based on the median-zeroing-adjusted log2 ratios is a positive value, calculate the result as:
the median−the standard deviation for the corresponding probe of the test sample of interest×a coefficient.
On the other hand, if the median for a probe based on the median-zeroing-adjusted log2 ratios is a negative value, calculate the result as:
the median+the standard deviation for the corresponding probe of the test sample of interest×a coefficient.
In such manner, the deviated signals are converged in view of the standard deviation. Besides, the coefficient, which ranges from 0 to 1, adjusts the convergence level of the calibrated results based on the standard deviation of the whole signal data sets and a standard sample of chromosome abnormality to highlight the gains or losses of the fragment. In the present embodiment, the calibrated result for each probe is calculated with the coefficient being 0.2.
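The windowed calculation of the present embodiment (5 consecutive probes, coefficient 0.2) can be sketched as follows. Several points are assumptions rather than part of the description: the sample standard deviation is used, the window is truncated at the two ends of the probe list (the text only describes probes with a full window, such as probe 3 from probes 1 to 5), a median of exactly 0 is left unchanged, and the function name `calibrate` and the toy ratios are hypothetical.

```python
from statistics import median, stdev

def calibrate(ratios, window=5, coefficient=0.2):
    """Shrink each probe's windowed median toward 0 by coefficient x windowed SD.

    `ratios` holds median-zeroing-adjusted log2 ratios in genomic order.
    """
    half = window // 2
    results = []
    for i in range(len(ratios)):
        # Assumption: truncate the window at the ends of the probe list
        lo = max(0, i - half)
        hi = min(len(ratios), i + half + 1)
        neighbours = ratios[lo:hi]
        m = median(neighbours)
        s = stdev(neighbours)  # assumption: sample standard deviation
        if m > 0:
            results.append(m - coefficient * s)
        elif m < 0:
            results.append(m + coefficient * s)
        else:
            results.append(m)  # assumption: a median of 0 is left unchanged
    return results

# Toy data: a flat region followed by a region with raised log2 ratios
ratios = [0.0, 0.1, -0.1, 0.0, 0.5, 0.6, 0.4, 0.5, 0.6, 0.0]
calibrated = calibrate(ratios)
```

In this toy case the probe at index 6 has a windowed median of 0.5, and its calibrated result is slightly below 0.5 because 0.2 times the windowed standard deviation is subtracted. The sketch applies the formula as written and does not clamp the result if the subtracted term exceeds the median.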
Next, plot the calibrated results on the Y-axis and the genomic coordinates on the X-axis for analysis (
The experimental procedure in Comparative Example 1 is similar to that of Embodiment 1. However, in step (S1), only one DNA sample purified from a test sample is used, in combination with a reference DNA sample (human genomic DNA, human male, Promega cat # G1521). Hybridization is conducted with the mixture of the Cy5-fluorescent-labelled amplified DNA sample from the test sample and the Cy3-fluorescent-labelled amplified reference DNA sample on the same chip (Phalanx Biotech Group) with 32,816 probes. After Lowess analysis of the signal data of the test sample and the signal data of the reference sample, calculate the log2 ratios as:
log2 (signal datum of probe 1 in the Lowess-analyzed signal data set of the test sample/signal datum of probe 1 in the Lowess-analyzed signal data set of the reference sample)
and so on. After obtaining the log2 ratios of all the 32,816 probes, each log2 ratio is adjusted by mean-zeroing. Next, plot the adjusted results on the Y-axis and the genomic coordinates on the X-axis (
The experimental procedure is similar to that of Embodiment 1. However, DNA samples purified from two test samples (test sample A and test sample B) are used in combination with a reference DNA sample (human genomic DNA, human male, Promega cat # G1521) in step (S1). The mixture of Cy5-labelled amplified DNA sample A and Cy3-labelled amplified reference sample is used to carry out a hybridization assay on one chip (Phalanx Biotech Group) with 32,816 probes; and the mixture of Cy5-labelled amplified DNA sample B and Cy3-labelled amplified reference sample is used to carry out another hybridization assay on another chip (Phalanx Biotech Group) with 32,816 probes. After said two chips (chip A and chip B) are separately analyzed via Lowess, calculate the log2 ratios as:
log2 (signal datum of probe 1 in the Lowess-analyzed signal data set of the test sample A/signal datum of probe 1 in the Lowess-analyzed signal data set of the reference sample on chip B)
and so on. After obtaining the log2 ratios of all the 32,816 probes, each log2 ratio is adjusted by mean-zeroing. Next, plot the adjusted results on the Y-axis and the genomic coordinates on the X-axis (
Comparing the result of the CNV detection method of the present invention (
In summary, the CNV detection method of the present invention eliminates the need for a reference sample, and the related reagents and human resources, for every test sample detection. The cost-reducing effect is particularly pronounced when CNV detection is conducted on a large number of samples.
Number | Date | Country | Kind |
---|---|---|---|
107145537 | Dec 2018 | TW | national |