The present disclosure relates to the fields of biology, medicine and bioinformatics. Particularly, the present disclosure relates to peripheral red blood cell micronuclei DNA and its application in cancer detection.
Cancer is one of the main diseases threatening human health and life. It is reported that, in 2018, there were 18.1 million new cancer cases and 9.6 million cancer deaths worldwide. Nearly half of new cancer cases and more than half of cancer deaths occurred in Asia (Global Cancer Statistics 2018: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. Bray Freddie et al., CA: A Cancer Journal for Clinicians. 2018). Decades of continuous exploration have brought progress in the diagnosis and treatment of cancer, but there is still a huge demand for cancer detection, especially for the screening, diagnosis, classification and staging of cancer.
Blood continuously circulates in the body, and the total blood of normal adults accounts for about 8% of the body weight of men and about 7.5% of the body weight of women. Peripheral blood samples are easy to collect, store and transport and have high stability (Dagur, P. K. and J. J. McCoy, Collection, Storage, and Preparation of Human Blood Cells. Curr Protoc Cytom, 2015. 73: p. 5.1.1-16).
A micronucleus is usually considered to be a small nuclear structure formed when chromosomes or chromosome fragments are not incorporated into one of the daughter nuclei during cell division, which is usually a sign of genotoxic events and chromosome instability. Generally, it is a small nuclear structure formed outside of, and independent of, the main nucleus, arising from incorrectly repaired or unrepaired DNA breaks, or from lagging asymmetric chromosome or chromatid fragments caused by chromosome non-separation (Liu, S., et al., Nuclear envelope assembly defects link mitotic errors to chromothripsis. Nature, 2018. 561(7724): p. 551-555).
Up to now, there is no report on the micronuclei DNA isolated or purified from peripheral red blood cells, and there is no report on the use of peripheral red blood cell micronuclei DNA for cancer detection.
Generally, the present disclosure relates to micronuclei DNA isolated or purified from peripheral red blood cells, its extraction method, and its application in screening, diagnosis, typing and/or staging of diseases.
The first aspect of the present disclosure relates to micronuclei DNA isolated or purified from peripheral red blood cells.
In some embodiments, the micronuclei DNA isolated or purified from peripheral red blood cells does not contain or substantially does not contain nucleated cell genomic DNA.
In some embodiments, the peripheral blood is human peripheral blood. In a specific embodiment, the peripheral blood is fresh human peripheral blood.
In some embodiments, the micronuclei DNA is used for cancer detection, such as early screening, diagnosis, typing and/or staging of cancer. In some specific embodiments, the micronuclei DNA is used for diagnosis of pan-cancer patients, including but not limited to patients suffering from colorectal cancer (also referred to as “CRC” hereinafter), hepatocellular cancer (also referred to as “HCC” hereinafter) or lung cancer (also referred to as “LC” hereinafter).
In some embodiments, the micronuclei DNA is used for early screening, diagnosis, typing and/or staging of cervical cancer.
In some embodiments, the micronuclei DNA is used for early screening, diagnosis, typing and/or staging of cervical cancer, and the micronuclei DNA comprises a gene classifier shown in Table 2, 4 or 6.
In other embodiments, the micronuclei DNA is used for early screening, diagnosis, typing and/or staging of colorectal cancer.
In a further embodiment, the micronuclei DNA is used for early screening, diagnosis, typing and/or staging of colorectal cancer, and the micronuclei DNA comprises a gene classifier shown in Table 8 or 10.
In some further embodiments, the micronuclei DNA is used for early screening, diagnosis, typing and/or staging of hepatocellular cancer.
In some even further embodiments, the micronuclei DNA is used for early screening, diagnosis, typing and/or staging of lung cancer.
In some even further embodiments, the micronuclei DNA is used for discriminating between each pair of cancer patient groups: CRC vs. HCC, LC vs. HCC, and LC vs. CRC.
In some even further embodiments, the micronuclei DNA is used for the multiclass discrimination of different types of cancers. In a specific embodiment, the micronuclei DNA is used for the multiclass discrimination of HD (“healthy donors”), HCC, LC and CRC.
The second aspect of the present disclosure relates to a method for isolating or purifying micronuclei DNA from peripheral red blood cells, which comprises the following steps:
a) providing peripheral blood samples;
b) isolating mononuclear cells and red blood cells from peripheral blood samples;
c) collecting red blood cells;
d) treating collected red blood cells with a red blood cell lysis buffer; and
e) extracting micronuclei DNA from the lysed red blood cells.
In a specific embodiment, the collected red blood cells are subjected to two or more sequential filtrations, e.g., filtrations by cell strainers, e.g., filtrations by 10 μm cell strainers.
In some embodiments, the red blood cell lysis buffer specifically lyses red blood cells by changing the osmotic pressure of cell suspension, but does not lyse nucleated cells.
In some embodiments, the red blood cell lysis buffer comprises NH4Cl, NaHCO3, EDTA or a combination thereof.
In some embodiments, micronuclei DNA is extracted from the lysed red blood cells by a DNA extraction reagent. In certain embodiments, the DNA extraction reagent comprises a protease, such as proteinase K. In certain specific embodiments, the DNA extraction reagent comprises proteinase K and EDTA.
In some embodiments, before step b), a step of diluting the peripheral blood sample is further included, for example, diluting with phosphate buffer solution in equal volume.
In some embodiments, in step b), the peripheral blood sample is subjected to density gradient centrifugation, such as Ficoll density gradient centrifugation, to obtain a mononuclear cell layer and a red blood cell layer.
A third aspect of the present disclosure relates to a method for constructing a gene classifier for cancer detection through peripheral red blood cell micronuclei DNA, which comprises:
a) providing more than one class, wherein each class represents a group of subjects with common characteristics;
b) isolating or purifying peripheral red blood cell micronuclei DNA from peripheral red blood cells of each subject of each class;
c) sequencing the whole genome of the peripheral red blood cell micronuclei DNA to obtain fragment sequence information of the micronuclei DNA;
d) comparing the fragment sequence information of micronuclei DNA from peripheral red blood cells in different classes of subjects;
e) training the characteristic DNA fragment set for specific cancers according to the differences in the distribution of the fragment sequence information of micronuclei DNA in peripheral red blood cells of different classes of subjects, thus obtaining a gene classifier for specific cancer detection.
In certain embodiments, the different classes are cancer subjects and non-cancer subjects for the same cancer.
In certain embodiments, the different classes are subjects with different types of the same cancer.
In certain embodiments, the different classes are subjects at different stages of the same cancer type.
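By way of non-limiting illustration only, step e) could be sketched in Python as follows. The sketch assumes that each subject has already been reduced to a numeric feature vector (for example, the fraction of micronuclei DNA reads falling in each of a fixed set of genomic bins), and it uses a nearest-centroid rule as a simple stand-in for the training procedure, which is not specified in this passage. The function names `train_centroids` and `classify`, the class labels and the feature values are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Average the feature vectors of each class.

    samples: list of (class_label, feature_vector) pairs, where every
    feature vector has the same length (an assumption of this sketch).
    """
    sums = {}
    counts = defaultdict(int)
    for label, vec in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, vec):
    """Assign vec to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical training data, for illustration only (not data from the disclosure).
training = [
    ("cancer", [0.6, 0.4]),
    ("cancer", [0.7, 0.3]),
    ("non-cancer", [0.3, 0.7]),
    ("non-cancer", [0.2, 0.8]),
]
centroids = train_centroids(training)
predicted = classify(centroids, [0.62, 0.38])
```

The same structure would apply to the other class definitions mentioned above (different types or different stages of the same cancer), since only the class labels change.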
The fourth aspect of the present disclosure relates to a gene classifier for cancer detection, which is constructed by peripheral red blood cell micronuclei DNA.
In certain embodiments, the gene classifier comprises the genes shown in Table 2, 4, 6, 8 or 10.
A fifth aspect of the present disclosure relates to a method of cancer detection for a test subject, comprising:
a) extracting micronuclei DNA in peripheral red blood cells of the test subject, wherein the extract does not contain or substantially does not contain nucleated cell genomic DNA;
b) sequencing the micronuclei DNA and sample-matched genomic DNA by whole genome sequencing to obtain a signature of the micronuclei DNA from red blood cells in specific genomic elements or at different bin sizes for the test subject;
c) comparing, by whole genome analysis, the sample-matched genomic DNA with the micronuclei DNA from red blood cells, or the micronuclei DNA from different types of samples, obtained in step b), so as to distinguish the micronuclei DNA from genomic DNA and evaluate the differences in micronuclei DNA signature between types of samples;
d) comparing the signature information of the micronuclei DNA from different classes of cancer patients or healthy donors obtained in step b) with the gene classifier or other deep neural network classifier for cancer detection of the present disclosure, so as to classify the test subjects into one or more of the classes.
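As a non-limiting illustration of the comparison in steps b) and c), one possible signature is the per-bin log2 ratio of the micronuclei DNA read fraction to the sample-matched genomic DNA read fraction. The sketch below assumes that reads are already aligned and represented by (chromosome, start position) pairs. The fixed bin size, the pseudocount and the function names `bin_counts` and `log2_enrichment` are assumptions made for illustration and are not taken from the disclosure.

```python
import math
from collections import Counter


def bin_counts(positions, bin_size):
    """Count reads per (chromosome, bin index) for fixed-size bins."""
    return Counter((chrom, pos // bin_size) for chrom, pos in positions)


def log2_enrichment(mn_positions, gdna_positions, bin_size, pseudocount=1.0):
    """Per-bin log2 ratio of micronuclei DNA to matched genomic DNA read fractions.

    A pseudocount is added to every bin so that empty bins do not cause
    division by zero; this is a choice of the sketch, not of the disclosure.
    """
    mn = bin_counts(mn_positions, bin_size)
    gd = bin_counts(gdna_positions, bin_size)
    bins = sorted(set(mn) | set(gd))
    mn_total = sum(mn.values()) + pseudocount * len(bins)
    gd_total = sum(gd.values()) + pseudocount * len(bins)
    return {
        b: math.log2(
            ((mn[b] + pseudocount) / mn_total) / ((gd[b] + pseudocount) / gd_total)
        )
        for b in bins
    }


# Hypothetical read positions, for illustration only.
mn_reads = [("chr1", 10), ("chr1", 20), ("chr1", 30), ("chr1", 1500)]
gdna_reads = [("chr1", 10), ("chr1", 1500), ("chr1", 1600), ("chr1", 1700)]
signature = log2_enrichment(mn_reads, gdna_reads, bin_size=1000)
```

Positive values mark bins over-represented in the micronuclei DNA relative to the matched genomic DNA, and negative values mark under-represented bins.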
A sixth aspect of the present disclosure relates to a system for cancer detection of a test subject, which comprises a comparison means for comparing peripheral red blood cell micronuclei DNA from the test subject with the gene classifier of the present disclosure.
A seventh aspect of the present disclosure relates to the use of an agent for analyzing micronuclei DNA of peripheral red blood cells in the preparation of a detection device or a detection kit for cancer screening, diagnosing, typing and/or staging.
In some specific embodiments, the screening or diagnosis is early screening or diagnosis.
The eighth aspect of the present disclosure relates to peripheral red blood cell micronuclei DNA for use in cancer detection.
The ninth aspect of the present disclosure relates to a method for isolating peripheral red blood cells.
The tenth aspect of the present disclosure relates to the use of peripheral red blood cells in cancer detection.
The above is a general summary and therefore includes simplifications, generalizations and omissions of detail where necessary. Therefore, those skilled in the art will recognize that this general summary is merely illustrative and is not intended to be limiting in any way. Other aspects, features and advantages of the methods, compositions and/or devices and/or other subjects described herein will become apparent under the teachings herein. A summary is provided to simplify the introduction of some selected concepts, which will be further described in the following detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an auxiliary means to determine the scope of the claimed subject matter. In addition, the contents of all references, patents and published patent applications cited throughout this application are incorporated herein by reference in their entirety.
The inventors extracted micronuclei DNA from peripheral red blood cells for the first time and performed high-throughput sequencing on the extracted micronuclei DNA. Through bioinformatics analysis, erythrocyte micronuclei DNA has been successfully used in cancer screening, diagnosis, risk ranking, typing and staging, which has important guiding significance for cancer prevention, treatment and prognosis.
The invention has achieved superior technical effects in at least the following aspects.
Abundance of Sample Sources
According to the invention, peripheral blood is used as a sample source, and the source is abundant, stable, and easy to obtain, collect, store and transport.
Effectiveness in the Isolation of Micronuclei DNA from Red Blood Cells
By the method disclosed in the present disclosure, micronuclei DNA in red blood cells can be effectively isolated from human peripheral blood. It has not been reported in the art that micronuclei DNA in red blood cells can be effectively isolated from human peripheral blood.
Simple and Fast Operation
According to the present disclosure, only a small amount (for example, only 1 ml) of peripheral blood needs to be collected from the subject, which may relieve the psychological pressure of the subject. Particularly, for the detection of cervical cancer, there is no need to collect cervical exfoliated cells of the subjects, which is easy to operate and can effectively reduce the psychological pressure of the subjects.
In addition, by high-throughput sequencing, micronuclei DNA can be quickly sequenced to obtain genetic information.
High Sensitivity and Specificity of Cancer Detection
Using micronuclei DNA obtained from peripheral red blood cells, cancer can be detected with extremely high sensitivity and specificity by the method of the present disclosure.
The present invention will be more apparent to those skilled in the art through the specific embodiments and examples described in the present disclosure, combined with the following drawings.
While the present invention can be embodied in many different manners, specific illustrative embodiments thereof which demonstrate the principles of the invention are disclosed herein. It should be emphasized that the present invention is not limited to the specific embodiments illustrated. In addition, any chapter titles used herein are for organizational purposes only and should not be interpreted as limiting the described subject matter.
Unless otherwise defined herein, scientific and technical terms used in connection with the present invention will have the meanings commonly understood by those of ordinary skill in the art. In addition, unless the context requires otherwise, the singular term shall include the plural, and the plural term shall include the singular. More specifically, as used in this specification and the appended claims, unless the context clearly indicates otherwise, the singular forms “a,” “an” and “the” include plural referents. Therefore, for example, reference to “a protein” may include a variety of proteins; and reference to “a cell” includes a mixture of cells, etc. In this application, unless otherwise stated, the use of the expression “or” refers to “and/or.” In addition, the use of the term “comprising” and other forms such as “comprise” and “comprises” is not limiting. Furthermore, the ranges provided in the description and the appended claims include all values between endpoints and breakpoints.
Generally, the terms related to cell and tissue culture, molecular biology, immunology, microbiology, genetics, and protein as well as nucleic acid chemistry and hybridization described herein, and their techniques are well-known and commonly used in the art. Unless otherwise stated, the methods and techniques of the present invention are generally carried out according to conventional methods known in the art, as described in various general and more specific references cited and discussed throughout this specification. See, for example, Abbas et al., Cellular and Molecular Immunology, 6th ed., W. B. Saunders Company (2010); Sambrook J. & Russell D. Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2000); Ausubel et al., Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology, Wiley, John & Sons, Inc. (2002); Harlow and Lane Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1998); and Coligan et al., Short Protocols in Protein Science, Wiley, John & Sons, Inc. (2003). Terms related to analytical chemistry, synthetic organic chemistry and drugs and pharmaceutical chemistry described herein, as well as laboratory procedures and techniques, are well-known and commonly used terms in the field. In addition, any chapter titles used herein are for organizational purposes only and are not to be interpreted as limiting the described subject matter.
In order to better understand the present invention, definitions and interpretations of related terms are provided as follows.
In the context of the present disclosure, the term “DNA” refers to deoxyribonucleic acid.
In the context of the present disclosure, the term “micronuclei” is intended to refer to small DNA-containing nuclear structures in a specific cell other than the nucleus. Peripheral red blood cells have no nucleus, so only micronuclei structures are present.
In the context of the present disclosure, the term “cervical cells” includes cells located at any part of the cervix and cells detached from any part of the cervix that can be diseased. In one embodiment, cervical cells are cells isolated from tissues exfoliated from the inner wall of the cervix in a natural or artificial way, also called “cervical exfoliated cells.”
In the context of the present disclosure, a “subject” refers to a subject to be tested. In certain embodiments, the “subject” is a human subject.
In the context of the present disclosure, a “patient” refers to a subject suffering from a certain disease, such as cervical cancer.
In the context of the present disclosure, “cancer” is a general term for malignant tumors. “Tumor” refers to the abnormal proliferation of cells in local tissues under the influence of various tumorigenic factors.
In the context of the present disclosure, “cancer subject” and “cancer patient” are used interchangeably, referring to a subject suffering from a certain cancer, such as cervical cancer or colorectal cancer.
In the context of the present disclosure, a “non-cancer subject” refers to a subject who does not suffer from a certain cancer. For example, a “non-cervical cancer subject” refers to a subject without cervical cancer. In the specific embodiments and examples of the present disclosure, a “non-cancer subject” is also referred to as a “healthy individual,” which likewise means that the individual or subject does not have such cancer.
In the context of the present disclosure, the term “cancer detection” refers to detecting the condition of a subject suffering from cancer. “Detecting” includes but is not limited to screening, diagnosis, typing and staging. “Screening” refers to preliminarily detecting whether there is cancer or the risk of cancer. “Diagnosis” or “medical diagnosis” refers to assessing the patient's condition from a medical point of view. “Typing” refers to further dividing the same kind of cancer into specific subtypes. For example, cervical cancer can be classified into cervical squamous cell carcinoma and cervical adenocarcinoma. “Staging” refers to predicting, assessing or dividing the stage of a cancer. For example, cervical cancer (squamous cell carcinoma) can be divided into four stages: low differentiation, low-medium differentiation, medium differentiation and high differentiation.
In the context of the present disclosure, the term “nucleated cell” refers to a cell in which a nucleus exists. For peripheral blood, the term “nucleated cell” is the general term for a granulocyte, a monocyte and a lymphocyte.
In the context of the present disclosure, the term “genome” refers to the sum of all genetic information in a cell, especially a complete set of haploid genetic material in a cell.
In the context of the present disclosure, the terms “nucleated cell genomic DNA,” “nucleated cell nucleus genome,” and “nucleated cell nucleus genomic DNA” are used interchangeably, meaning all genetic information contained in nuclear chromosomes.
In the context of the present disclosure, the terms “gene classifier” and “classifier” are used interchangeably, referring to a group of DNA fragments or a group of genes in genomic DNA or micronuclei DNA that are specific for a specific disease.
In the context of the present disclosure, the terms “DNA fragment library” and “DNA library” are used interchangeably, referring to double-stranded DNA obtained by completing the ends of a sample DNA fragment, adding a phosphate group at the 5′ end, adding an adenine nucleotide (A) at the 3′ end, and connecting an adapter and a sample barcode at both ends.
In the context of the present disclosure, the terms “micronuclei DNA from red blood cells” and “erythrocyte micronuclei DNA” are used interchangeably, and are intended to refer to micronuclei DNA isolated from red blood cells. In a specific embodiment, the red blood cells are peripheral red blood cells. Accordingly, in the context of the present disclosure, “peripheral red blood cell micronuclei DNA”, “peripheral erythrocyte micronuclei DNA”, and “micronuclei DNA from peripheral red blood cells” are used interchangeably. In a specific embodiment, micronuclei DNA is isolated or purified from peripheral red blood cells.
In the context of the present disclosure, the term “high-throughput sequencing” (also known as Next-Generation Sequencing (NGS)) refers to DNA sequencing technology that simultaneously sequences thousands (even millions) of DNA templates in a single chemical reaction.
In the context of the present disclosure, the term “reads” refers to the sequence of a sample DNA fragment in a DNA fragment library measured by high-throughput sequencing, with the sequence linked in the library preparation stage removed.
In the context of the present disclosure, the term “coverage depth” refers to an effective nucleic acid sequencing fragment for base recognition in a specific region, also known as the number of reads.
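As a non-limiting illustration, the number of reads in a specific region could be counted as follows, assuming that the reads have already been aligned to a reference genome and are represented as (chromosome, start, end) tuples with 0-based, half-open coordinates. The tuple layout, the function name `reads_in_region` and the example coordinates are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# (chromosome, start, end), 0-based and half-open; an assumed representation.
Read = Tuple[str, int, int]


def reads_in_region(reads: Iterable[Read], chrom: str, start: int, end: int) -> int:
    """Count the reads that overlap the region [start, end) on chrom."""
    return sum(
        1
        for read_chrom, read_start, read_end in reads
        if read_chrom == chrom and read_start < end and read_end > start
    )


# Hypothetical aligned reads, for illustration only.
aligned_reads = [
    ("chr1", 100, 250),
    ("chr1", 240, 390),
    ("chr1", 500, 650),
    ("chr2", 100, 250),
]
depth = reads_in_region(aligned_reads, "chr1", 200, 300)
```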
In the context of the present disclosure, the term “sequence alignment” refers to the alignment of reads to a reference genome (e.g., a human reference genome) by the principle of sequence identity.
In the context of the present disclosure, the term “reference genome” is the whole genome sequence of an organism of the same species as the sample DNA, which can be obtained from a public database. In one embodiment, the reference genome is a human reference genome. The public database is not particularly limited. In some embodiments, the public database is GenBank database of NCBI.
In the context of the present disclosure, the term “sensitivity” refers to the percentage of samples with positive tests in the total number of patients. In medical diagnosis, sensitivity can be expressed by the following formula, reflecting the ratio of correctly diagnosing patients:
Sensitivity=true positive number/(true positive number+false negative number) ×100%.
In short, if “true positive,” “false positive,” “false negative” and “true negative” are represented by “a”, “b”, “c” and “d”, respectively, the relationship among sensitivity, specificity, missed diagnosis rate, misdiagnosis rate and accuracy can be shown as follows.
Here, “true positive (a)” refers to the number of cases diagnosed as diseased by pathology for which the result of the method is also positive; “false positive (b)” refers to the number of cases diagnosed as non-diseased by pathology for which the result of the method is positive; “false negative (c)” refers to the number of cases diagnosed as diseased by pathology for which the result of the method is negative; and “true negative (d)” refers to the number of cases diagnosed as non-diseased by pathology for which the result of the method is negative.
Sensitivity(sen)=a/(a+c);
Specificity(spe)=d/(b+d);
Missed diagnosis rate=c/(a+c);
Misdiagnosis rate=b/(b+d);
Accuracy=(a+d)/(a+b+c+d)
As known by those skilled in the art, the higher the value of sensitivity and specificity, the better; and the lower the missed diagnosis rate and misdiagnosis rate, the better.
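For illustration, the five formulas above can be computed from the four counts as in the following Python sketch. The counts follow the per-case definitions given above (a: true positive, b: false positive, c: false negative, d: true negative). The names `ConfusionCounts` and `diagnostic_metrics` and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    a: int  # true positives: diseased by pathology, method positive
    b: int  # false positives: non-diseased by pathology, method positive
    c: int  # false negatives: diseased by pathology, method negative
    d: int  # true negatives: non-diseased by pathology, method negative


def diagnostic_metrics(counts: ConfusionCounts) -> dict:
    """Return the five rates defined above as fractions between 0 and 1."""
    a, b, c, d = counts.a, counts.b, counts.c, counts.d
    if a + c == 0 or b + d == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "missed_diagnosis_rate": c / (a + c),
        "misdiagnosis_rate": b / (b + d),
        "accuracy": (a + d) / (a + b + c + d),
    }


# Hypothetical counts, for illustration only (not data from the disclosure).
example = diagnostic_metrics(ConfusionCounts(a=45, b=10, c=5, d=90))
```

Note that the missed diagnosis rate equals 1 minus the sensitivity, and the misdiagnosis rate equals 1 minus the specificity, so the two pairs always move in opposite directions.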
In the context of the present disclosure, the term “specificity” refers to the percentage of samples with negative tests in healthy people in the total number of healthy people. In medical diagnosis, “specificity” can be expressed by the following formula, which reflects the ratio of correct diagnosis of non-patients:
Specificity=true negative number/(true negative number+false positive number)×100%.
In the context of the present disclosure, the term “missed diagnosis rate,” also known as false negative rate, refers to the percentage of patients who are actually diseased when screening or diagnosing a disease in a population, but are determined as non-patients according to the diagnostic criteria. In medical diagnosis, the missed diagnosis rate can be expressed by the following formula:
Missed diagnosis rate=false negative number/(true positive number+false negative number)×100%.
In the context of the present disclosure, the term “misdiagnosis rate,” also known as false positive rate, refers to the percentage of subjects who do not actually suffer from a disease when screening or diagnosing a disease in a population, but are determined as patients with such a disease according to the diagnostic criteria. In medical diagnosis, the misdiagnosis rate can be expressed by the following formula:
Misdiagnosis rate=false positive number/(true negative number+false positive number)×100%.
In the context of the present disclosure, the expression “about” means that the deviation does not exceed plus or minus 10% of a specified value or range.
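As a simple illustration of this definition for a single specified value, the following Python sketch checks whether a measured value lies within plus or minus 10% of the specified value. The function name `is_about` and the example numbers are hypothetical, and how the tolerance applies to a range is not addressed by this sketch.

```python
def is_about(value: float, specified: float, tolerance: float = 0.10) -> bool:
    """True when value deviates from specified by at most tolerance (10% by default)."""
    return abs(value - specified) <= tolerance * abs(specified)


# Hypothetical example: values compared against a specified value of 50.
within = is_about(55, 50)   # deviation of 5 is exactly 10% of 50
outside = is_about(44, 50)  # deviation of 6 exceeds 10% of 50
```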
Peripheral Blood
In the present disclosure, “peripheral blood” refers to blood released into the circulatory system by hematopoietic organs and participating in the circulation. “Peripheral blood” is different from immature blood cells in hematopoietic organs such as bone marrow. In the present disclosure, peripheral blood can be collected by reference to known methods in the art such as venous blood collection, fingertip blood collection or earlobe blood collection.
Generally, peripheral blood consists of plasma and blood cells, wherein the blood cells further include white blood cells (also called “leukocytes”), red blood cells and platelets. By volume, red blood cells account for about 45%, plasma accounts for about 54.3%, and white blood cells account for about 0.7% of the total peripheral blood. Leukocytes are nucleated cells, a general term covering granulocytes, monocytes and lymphocytes. Normal red blood cells have no nucleus and no genomic DNA; they are anucleate cells.
In the context of the present disclosure, a “peripheral blood mononuclear cell” (PBMC) refers to a cell with a single nucleus in peripheral blood, including monocytes and lymphocytes.
Separation of Peripheral Blood Cells
The separation methods of peripheral blood cells include natural sedimentation, differential sedimentation, sodium chloride separation, density gradient centrifugation and so on.
Different components of peripheral blood can be separated by using the density difference between different components of peripheral blood. For example, different components of peripheral blood can be separated by Ficoll density gradient centrifugation or Percoll method.
In a specific embodiment of the present disclosure, peripheral blood is separated by Ficoll density gradient centrifugation. Specifically, it is carried out in the following ways:
1. Peripheral Blood Collection and Sample Preparation
Peripheral blood is obtained from a subject and diluted appropriately. For example, it can be diluted by adding phosphate buffer solution (PBS). In certain embodiments, about 1-5 ml of fresh peripheral blood is obtained from a subject and diluted by adding an equal volume of PBS to obtain a diluted blood sample. In a specific embodiment, 1 ml of fresh peripheral blood is obtained from a subject, and 1×PBS is added for equal volume dilution to obtain diluted peripheral blood samples.
2. Density Gradient Centrifugation of Peripheral Blood Samples
Initially, an appropriate amount of Ficoll density gradient centrifugation medium is added into the density gradient centrifuge tube, to which the diluted peripheral blood sample described above is subsequently added. In certain embodiments, the medium is added in an amount such that the ratio of the volume of peripheral blood collected from the subject to the volume of Ficoll density gradient centrifugation medium is about 1:3 to 1:10. For example, in a specific embodiment, 1 ml of fresh peripheral blood is obtained from a subject, and 5 ml of Ficoll density gradient centrifugation medium (Stemcell, Lymphoprep™ 07801) is added to the density gradient centrifuge tube.
Then, the diluted peripheral blood sample is slowly layered onto the Ficoll density gradient centrifugation medium in the density gradient centrifuge tube for density gradient centrifugation. Density gradient centrifugation can be carried out for about 10-15 minutes at about 15-25° C. and at about 1000-1500 g. In a specific embodiment, the density gradient centrifugation is performed at 1200 g and 18° C. for 15 minutes.
After density gradient centrifugation, the sample is divided into three layers: the upper layer is plasma, the middle layer is the PBMC layer, and the bottom layer is the RBC layer.
PBMCs and RBCs are then collected separately. For example, the liquid of the upper and middle layers in the density gradient centrifuge tube is drawn off by a suction means (such as a pipette), and the PBMCs are separated and collected. An extraction means (such as a syringe) is used to extract red blood cells from the bottom of the density gradient centrifuge tube, and the RBCs are separated and collected. In a specific embodiment, the bottom red blood cells are transferred from the bottom of the density gradient centrifuge tube to a 1.5 ml centrifuge tube by using a syringe, and 1×PBS is added up to a volume of 1 ml. Centrifugation is conducted at room temperature for 10 min at 300 g, and the red blood cells at the bottom of the tube are collected. The collected RBCs are then subjected to two sequential filtrations through 10 μm cell strainers to remove potential contamination by nucleated cells.
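The separation parameters described above can be summarized, purely for illustration, as a small Python check that flags values outside the stated nominal ranges (equal-volume PBS dilution, a blood-to-medium volume ratio of 1:3 to 1:10, 1000-1500 g, 15-25° C., 10-15 minutes). The class `SeparationPlan` and the function `check_plan` are hypothetical names, and the sketch deliberately ignores the ±10% latitude implied by “about”.

```python
from dataclasses import dataclass


@dataclass
class SeparationPlan:
    blood_ml: float   # volume of fresh peripheral blood
    pbs_ml: float     # volume of PBS used for dilution
    medium_ml: float  # volume of density gradient centrifugation medium
    force_g: float    # relative centrifugal force
    temp_c: float     # centrifugation temperature in degrees Celsius
    minutes: float    # centrifugation time


def check_plan(plan: SeparationPlan) -> list:
    """Return a list of messages for parameters outside the nominal ranges."""
    issues = []
    if plan.pbs_ml != plan.blood_ml:
        issues.append("PBS volume is not equal to the blood volume")
    ratio = plan.medium_ml / plan.blood_ml
    if not 3 <= ratio <= 10:
        issues.append("blood-to-medium volume ratio is outside 1:3 to 1:10")
    if not 1000 <= plan.force_g <= 1500:
        issues.append("centrifugal force is outside 1000-1500 g")
    if not 15 <= plan.temp_c <= 25:
        issues.append("temperature is outside 15-25 degrees Celsius")
    if not 10 <= plan.minutes <= 15:
        issues.append("duration is outside 10-15 minutes")
    return issues


# Values of the specific embodiment described above.
embodiment = SeparationPlan(
    blood_ml=1, pbs_ml=1, medium_ml=5, force_g=1200, temp_c=18, minutes=15
)
embodiment_issues = check_plan(embodiment)
```

With the values of the specific embodiment, the returned list is empty, meaning every parameter falls inside the stated ranges.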
Isolation of Micronuclei DNA from Peripheral Red Blood Cells
To the inventors' knowledge, there is no report in the prior art on isolating micronuclei DNA from human peripheral red blood cells. Unexpectedly, the inventors found that the micronuclei DNA of peripheral red blood cells can be separated simply and efficiently by the method of the present disclosure. In certain embodiments, the collected red blood cells are first lysed and then centrifuged. Thereafter, micronuclei DNA is extracted from the supernatant after centrifugation. In certain embodiments of the present disclosure, “peripheral red blood cell micronuclei DNA” includes all DNA present in peripheral red blood cells. In a specific embodiment of the present disclosure, the isolated “peripheral red blood cell micronuclei DNA” does not contain nucleated cell genomic DNA. In another specific embodiment of the present disclosure, the isolated “peripheral red blood cell micronuclei DNA” substantially does not contain nucleated cell genomic DNA.
The inventors also unexpectedly found that micronuclei DNA isolated from peripheral red blood cells can be used to detect various cancers.
Lysis of Red Blood Cells
In some embodiments, the collected red blood cells are lysed by adding a red blood cell lysis buffer. A red blood cell lysis buffer lyses erythrocytes while hardly damaging nucleated cells (such as PBMCs): it slightly changes the osmotic pressure of the cell suspension, which lyses erythrocytes effectively without affecting nucleated cells. The red blood cell lysis buffers commonly used in the art contain NH4Cl, NaHCO3, EDTA or combinations thereof, for example, NH4Cl, NaHCO3 and EDTA. For example, every 1000 ml of red blood cell lysis buffer contains 8.3 g NH4Cl, 1.0 g NaHCO3, 1.8 ml of 5% EDTA and ultra-pure water.
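For illustration, the example recipe above can be scaled linearly to another final volume as in the following Python sketch. The sketch assumes that ultra-pure water makes up the remainder of the final volume; the function name `scale_lysis_buffer` and the dictionary keys are hypothetical.

```python
# Amounts per 1000 ml of buffer, as given in the example recipe above.
RECIPE_PER_1000_ML = {
    "NH4Cl_g": 8.3,
    "NaHCO3_g": 1.0,
    "EDTA_5pct_ml": 1.8,
}


def scale_lysis_buffer(final_volume_ml: float) -> dict:
    """Scale the example recipe linearly to the requested final volume.

    Ultra-pure water is assumed to be added up to the final volume and is
    therefore not listed as a separate amount.
    """
    if final_volume_ml <= 0:
        raise ValueError("final volume must be positive")
    factor = final_volume_ml / 1000.0
    return {name: amount * factor for name, amount in RECIPE_PER_1000_ML.items()}


# Amounts for 100 ml of buffer: 0.83 g NH4Cl, 0.1 g NaHCO3, 0.18 ml of 5% EDTA.
amounts_100_ml = scale_lysis_buffer(100)
```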
The red blood cell lysis buffer can be, for example, a red blood cell lysis buffer (Biosharp, Cat No./ID: BL503B), a red blood cell lysis buffer (Solarbio, Cat No./ID: R1010) or a BD FACS Lysing Solution red blood cell lysis buffer (BD, Cat No./ID: 349202). In a specific embodiment, 10 ml of red blood cell lysis buffer (Biosharp, Cat No./ID: BL503B) is added to the collected red blood cells, and the collected red blood cells are lysed for 20 minutes at room temperature in the dark.
Centrifugation
Thereafter, supernatant and precipitate (cell debris) are separated by centrifugation. In a specific embodiment, centrifugation is performed at 3000 g at room temperature for 10 minutes, and then the supernatant is taken.
Isolation of Micronuclei DNA
Then, micronuclei DNA is extracted from the supernatant. In certain embodiments, the DNA contained in the supernatant is pretreated by adding EDTA and proteinase K. EDTA is added during the proteinase K digestion to inhibit Mg2+-dependent nucleases. In a specific embodiment, the supernatant is incubated with 10 mM EDTA (Solarbio Cat No./ID: E1170), 200 μg/μl proteinase K (Proteinase K, Ambion, Cat No./ID: AM2548) at 56° C. for 8 hours.
After incubation, commercial kits or reagents are used to extract micronuclei DNA. Examples of commercial kits include but are not limited to QIAamp DNA Blood Mini Kit, DNAzol reagent, PureLink™ Pro 96 Genomic DNA Purification Kit (Thermo, Cat No./ID: K182104A), blood genomic DNA extraction system (0.1-20 ml) (TIANGEN, Cat No./ID: P349), HiPure Blood DNA Midi Kit III (Magen, Cat No./ID: D3114). In a specific embodiment, erythrocyte micronuclei DNA is extracted using QIAamp DNA Blood Mini Kit (Qiagen, Cat No./ID: 51106).
Extraction of Genomic DNA from Peripheral Blood Mononuclear Cells
The genomic DNA of peripheral blood mononuclear cells can be extracted by commercial kits. In a specific embodiment, for peripheral blood mononuclear cell samples obtained after density gradient centrifugation, genomic DNA is extracted using QIAamp DNA Blood Mini Kit (Qiagen, Cat No./ID: 51106).
Whole Genome Amplification
Whole-genome amplification (WGA) is non-selective amplification of the whole genome sequence. Its main purpose is to maximize the amount of DNA on the basis of faithfully reflecting the whole genome, and to amplify the whole genome DNA of micro tissues and single cells without sequence bias.
Whole-genome amplification methods are mainly divided into the following types: first, amplification technology based on thermal cycles and PCR; second, amplification technology based on isothermal reaction and not based on PCR; and the third is MALBAC (Multiple Annealing and Looping-based Amplification Cycles). The WGA technology based on PCR includes degenerate oligonucleotide primer PCR (DOP-PCR), linker-adapter PCR (LA-PCR), interspersed repeat sequence PCR (IRS-PCR), tagged random primer PCR (T-PCR), primer extension preamplification PCR (PEP-PCR), among others. WGA based on isothermal reaction includes multiple displacement amplification (MDA), primase-based whole genome amplification (pWGA) and so on.
The methods of amplifying the whole genome DNA of a single cell mainly include MDA, MALBAC and DOP-PCR. These amplification methods can amplify pg-level or fg-level DNA in cells to μg-level which can satisfy sequencing.
Multiple Displacement Amplification (MDA)
Multiple displacement amplification (MDA) was first proposed by Dr. Lizardi of Yale University in 1998. This method is a constant temperature amplification method based on the principle of strand displacement amplification. Phage Φ29 DNA polymerase is used in multiple displacement amplification. Phage Φ29 DNA polymerase has a strong binding ability to the DNA template and can continuously amplify a 100 kb DNA template without dissociating from it. At the same time, the enzyme has 3′-5′ exonuclease activity and a low amplification error rate.
Multiple displacement amplification has the following advantages:
Commercial kits for MDA include REPLI-g series kits (Qiagen Inc), GenomiPhi series kits (GE Healthcare Inc), among others.
MALBAC (Multiple Annealing and Looping-Based Amplification Cycles)
MALBAC differs from non-linear or exponential amplification in that it uses special primers to make the ends of each amplicon complementary to each other. In this technique, a DNA polymerase with strand displacement activity is used for quasi-linear whole genome pre-amplification, and then exponential amplification is performed by PCR technology, which provides sufficient experimental materials for downstream analysis. In 2012, Science magazine published two articles related to this technology (C. Zong et al., Science 2012: 1622-1626; S. Lu et al., Science: 1627-1630).
MALBAC has the following advantages:
Commercial kits for MALBAC include MALBAC® single cell amplification kit from YiKon.
Degenerate Oligonucleotide Primer PCR (DOP-PCR)
DOP-PCR differs from conventional PCR in that it uses a single semi-degenerate primer and a low annealing temperature, has no species specificity, is independent of the complexity of the DNA, and can uniformly amplify the whole genome.
Commercial kits for DOP-PCR include PicoPlex series kits (Rubicon Genomics Inc), GenomePlex series kits (Sigma Aldrich Inc), SurePlex series kits (BlueGnome, which has been acquired by Illumina) and so on.
In the present disclosure, PBMC genomic DNA and RBC micronuclei DNA can be amplified by the whole genome amplification methods known in the art. In a specific embodiment, PBMC genomic DNA and RBC micronuclei DNA are amplified by MDA. Specifically, PBMC genomic DNA and RBC micronuclei DNA extracted by QIAamp DNA Blood Mini Kit (Qiagen, Cat No./ID: 51106) are each subjected to MDA using REPLI-g Single Cell Kit (Qiagen, Cat No./ID: 150345) to obtain the amplified DNA samples.
The REPLI-g Single Cell Kit adopts multiple displacement amplification (MDA) technology, which can uniformly amplify single cell or purified genomic DNA, and can cover all loci of genome. All buffers and reagents are produced through a strictly controlled process to avoid DNA contamination and ensure reliable results for each experiment.
Library Construction
A library is constructed by fragmenting the genomic DNA into short DNA molecules, then connecting the fragmented genomic DNA to universal adaptors, and then generating millions or even more single-molecule multi-copy PCR clone arrays.
In the present disclosure, any conventional method in the field can be used to fragment the amplified DNA and construct a DNA fragment library. For example, a commercially available kit can be used to fragment genomic DNA and construct a library of DNA fragments.
In certain embodiments, the process of fragmenting genomic DNA and constructing a DNA fragment library by using a kit may include:
(i) performing fragmentation on genomic DNA;
(ii) carrying out terminal modification on the obtained DNA fragments:
In a specific embodiment of the present disclosure, after MDA, the amplified DNA samples are subjected to second-generation (next-generation) sequencing library construction using TruePrep DNA Library Prep Kit V2 for Illumina (Vazyme, TD503).
High Throughput Sequencing
In the present disclosure, as long as the high-throughput sequencing of the DNA fragment library can be realized, there is no special restriction on the sequencing method and apparatus adopted. In certain embodiments, the library of DNA fragments is high-throughput sequenced using a commercially available sequencer. For example, the high-throughput sequencing of the DNA fragment library can be performed by using a sequencer from Illumina, a sequencer from Applied Biosystems (ABI), a sequencer from Roche, a sequencer from Helicos, or a sequencer from Complete Genomics.
In a specific embodiment, the genomic DNA of peripheral blood mononuclear cells and erythrocyte micronuclei DNA are sequenced on the NovaSeq platform (NovaSeq 6000, from Novogene, Beijing), with 10× sequencing depth and 30 G data volume.
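The relationship between data volume and sequencing depth can be sketched as follows. This is only an illustrative calculation: it assumes that "30 G" means about 30 gigabases of sequence and that the human genome is about 3.1 gigabases, neither of which is stated explicitly in the disclosure; the read length used in the second function is likewise hypothetical.

```python
import math

# Assumption: approximate human genome size in bases (not stated in the text).
HUMAN_GENOME_BP = 3.1e9


def mean_depth(total_bases, genome_size=HUMAN_GENOME_BP):
    """Average sequencing depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def reads_needed(target_depth, genome_size, read_length):
    """Number of reads of a given length needed to reach target_depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # About 30 gigabases over a ~3.1 Gb genome gives roughly 10x depth.
    print(f"depth for 30 Gb: {mean_depth(30e9):.1f}x")
```

Under these assumptions, 30 gigabases corresponds to a mean depth a little below 10×, which is consistent with the figures given for the specific embodiment.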
In the specific embodiment of the present disclosure, the original sequencing files for sequencing the erythrocyte micronuclei DNA and the genomic DNA of peripheral blood mononuclear cells are stored in FASTQ files. FASTQ is a standard text-based format to save biological sequences (usually nucleic acid sequences) and their sequencing quality information.
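For illustration, a FASTQ record consists of four lines: a header beginning with "@", the sequence, a separator line beginning with "+", and a quality string of the same length as the sequence. The following minimal sketch reads such records and decodes quality characters, assuming the common Phred+33 encoding (the encoding of the actual files is not stated in the disclosure). The function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for name, seq, qual in read_fastq(example):
        print(name, seq, phred_scores(qual))
```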
Bioinformatics Analysis
After high-throughput sequencing, bioinformatics analysis of the obtained sequencing results generally includes quality control, data comparison (alignment of the reads to a reference genome), post-alignment processing, among others.
In certain embodiments of the present disclosure, quality control is performed on the original sequencing files of erythrocyte micronuclei DNA, and the sequencing data passing the quality control is compared with the reference genome, and then post-processing is performed.
In a further embodiment of the present disclosure, quality control is performed on genomic DNA of peripheral blood mononuclear cells, and sequencing data passing the quality control is compared with a reference genome.
Quality Control
The sequencing data are quality controlled by data quality control software. The process of quality control includes adapter removal, filtering of low-quality reads, removal of low-quality 3′ and 5′ ends, removal of reads containing many N bases, inspection of data quality, etc. Commonly used quality control software includes FastQC, Fastx_toolkit, Trimmomatic and so on.
As a classic quality control tool, FastQC can quickly compute summary statistics for high-throughput sequencing data and produce corresponding chart reports. The software can be obtained at the following website: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
In addition, Fastx_toolkit software can be obtained at the following website: http://hannonlab.cshl.edu/fastx_toolkit/; and Trimmomatic software can be obtained at the following website: http://www.usadellab.org/cms/?page=trimmomatic.
In a specific embodiment of the present disclosure, the original sequencing files of erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells are subjected to adaptor removal by cutadapter software (Kong, Y., Btrim: a fast, lightweight adapter and quality trimming program for next-generation sequencing technologies. Genomics, 2011. 98(2): p. 152-3), and the quality control is carried out by FastQC software.
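The quality control steps listed above (adapter removal, trimming of low-quality ends, and filtering of reads with many N bases or low quality) can be sketched in plain Python as follows. This is a simplified stand-in written for illustration and is not the behavior of the tools named above; the adapter sequence in the example, the thresholds and the function names are all assumptions.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Remove a 3' adapter: a full match anywhere, or an adapter prefix at the read end."""
    idx = seq.find(adapter)
    if idx == -1:
        # Look for the longest adapter prefix that the read ends with.
        for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Strip trailing bases whose Phred score is below min_q (Phred+33 assumed)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(seq, qual, min_len=20, max_n_fraction=0.1, min_mean_q=20, offset=33):
    """Keep reads that are long enough, have few N bases and adequate mean quality."""
    if len(seq) < max(min_len, 1):
        return False
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False
    mean_q = sum(ord(ch) - offset for ch in qual) / len(qual)
    return mean_q >= min_mean_q


if __name__ == "__main__":
    # Hypothetical read whose last four bases are the start of the adapter.
    seq, qual = trim_adapter("ACGTACGTAGAT", "IIIIIIIIIIII", "AGATCGGAAGAGC")
    print(seq, qual, passes_filter(seq, qual, min_len=4))
```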
Data Comparison
After quality control, the data that passed quality control are compared (aligned) to the reference genome by software. Sequencing data comparison software commonly used in this field includes BWA, Bowtie, Maq, Novoalign, etc., which can be obtained from the following websites:
BWA: http://bio-bwa.sourceforge.net
Bowtie: http://bowtie-bio.sourceforge.net
Maq: http://maq.sourceforge.net
Novoalign: http://www.novocraft.com/products/novoalign/
In certain embodiments of the present disclosure, the sequencing data of erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells can each be compared to a reference genome, such as the human genome, through data comparison software in the field. In a specific embodiment of the present disclosure, the sequencing data of erythrocyte micronuclei DNA and peripheral blood mononuclear cell genomic DNA are compared to the human genome (GenBank) by BWA software.
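By way of example only, an alignment step with BWA might be scripted as below. The disclosure names BWA but not the algorithm or options used, so the choice of the `bwa mem` subcommand, the thread count and all file names are assumptions. The sketch only builds the command lines and does not run them.

```python
def bwa_index_cmd(reference_fasta):
    """Command to build the BWA index for a reference FASTA file."""
    return ["bwa", "index", reference_fasta]


def bwa_mem_cmd(reference_fasta, reads_1, reads_2=None, threads=4):
    """Command to align single-end or paired-end reads with `bwa mem`.

    `bwa mem` writes SAM records to standard output, so the caller is
    expected to redirect the output to a file.
    """
    cmd = ["bwa", "mem", "-t", str(threads), reference_fasta, reads_1]
    if reads_2:
        cmd.append(reads_2)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names for the reference and the paired-end reads.
    print(" ".join(bwa_index_cmd("human_ref.fa")))
    print(" ".join(bwa_mem_cmd("human_ref.fa", "rbc_R1.fastq.gz", "rbc_R2.fastq.gz")))
```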
Post-Alignment Processing of Data
Post-alignment processing may include, for example, removing duplicate reads, local realignment around indels, recalibration of base quality scores, and so on. Whether or not to carry out post-alignment processing is determined according to actual needs. The commonly used post-alignment processing includes removing duplicate reads. Different reads aligned onto the same position of the reference genome may be considered duplicates owing to quality problems, sequencing errors, alignment errors, alleles, among others.
In some embodiments of the present disclosure, post-alignment processing is performed by removing duplicate reads. In a specific embodiment of the present disclosure, improperly aligned reads and duplicate reads are removed by Picard software (Weisenfeld, N. I., et al., Direct determination of diploid genome sequences. Genome Res, 2017. 27(5): p. 757-767). Picard software can be obtained from the following website:
http://broadinstitute.github.io/picard/
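The idea of duplicate removal can be illustrated with the following simplified sketch, which treats reads aligned to the same chromosome, position and strand as duplicates and keeps the one with the highest mapping quality. This is a deliberately reduced model for illustration and is not Picard's actual algorithm; the `Read` record and its fields are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal representation of an aligned read.
Read = namedtuple("Read", "name chrom pos strand mapq")


def remove_duplicates(reads):
    """Keep one read per (chrom, pos, strand), preferring the highest mapping quality."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        kept = best.get(key)
        if kept is None or read.mapq > kept.mapq:
            best[key] = read
    # Return the survivors in a stable, coordinate-sorted order.
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))


if __name__ == "__main__":
    reads = [
        Read("a", "chr1", 100, "+", 30),
        Read("b", "chr1", 100, "+", 60),  # duplicate of "a" with higher quality
        Read("c", "chr1", 100, "-", 20),  # opposite strand, not a duplicate
    ]
    print([r.name for r in remove_duplicates(reads)])
```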
Data Analysis
After data processing, the sequencing data obtained are analyzed.
Comparison and Counting of Reads
In certain embodiments of the present disclosure, the fragmentation degree of DNA fragments in red blood cells of different types of subjects is compared to determine whether there are significant differences. For example, the reads of sequencing fragments existing in micronuclei DNA of samples can be counted by software for reads counting (such as HTseq-count, featureCounts, BEDTools, Qualimap, Rsubread, GenomicRanges, etc.). Variance analysis (such as ANOVA test) is applied to judge whether there is a significant difference between the classes.
In certain specific embodiments of the present disclosure, the reads of small sequencing fragments existing in erythrocyte micronuclei DNA are counted against the gene regions of the human genome by HTseq-count software (Anders, S., P. T. Pyl and W. Huber, HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics, 2015. 31(2): p. 166-9).
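The counting of reads against gene regions can be sketched as follows. This is a simplified illustration and not the HTseq-count implementation: genes and reads are given as half-open intervals, and a read is counted only when it overlaps exactly one gene, so that ambiguous reads are left out. The gene names and coordinates are invented for the example.

```python
def count_reads_per_gene(genes, reads):
    """Count reads overlapping exactly one gene.

    genes: dict mapping gene name -> (chrom, start, end), half-open intervals.
    reads: iterable of (chrom, start, end) aligned read intervals.
    """
    counts = {name: 0 for name in genes}
    for chrom, r_start, r_end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and r_start < g_end and g_start < r_end
        ]
        if len(hits) == 1:  # skip reads with no feature or an ambiguous assignment
            counts[hits[0]] += 1
    return counts


if __name__ == "__main__":
    genes = {"G1": ("chr1", 100, 200), "G2": ("chr1", 150, 300)}
    reads = [("chr1", 110, 140), ("chr1", 160, 190), ("chr1", 250, 280)]
    print(count_reads_per_gene(genes, reads))
```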
In a specific embodiment of the present disclosure, one class is peripheral red blood cell micronuclei DNA from cervical cancer patients and the other class is peripheral red blood cell micronuclei DNA from healthy individuals.
In another specific embodiment of the present disclosure, one class is peripheral red blood cell micronuclei DNA from patients with cervical adenocarcinoma and the other class is peripheral red blood cell micronuclei DNA from patients with cervical squamous cell carcinoma.
In another specific embodiment of the present disclosure, one class is peripheral red blood cell micronuclei DNA from patients with medium-differentiated cervical squamous cell carcinoma, and the other class is peripheral red blood cell micronuclei DNA from patients with low-medium differentiated or low differentiated cervical squamous cell carcinoma.
In a further embodiment of the present disclosure, one class is peripheral red blood cell micronuclei DNA from colorectal cancer patients and the other class is peripheral red blood cell micronuclei DNA from healthy individuals.
In a further embodiment of the present disclosure, one class is peripheral red blood cell micronuclei DNA from colon cancer patients and the other class is peripheral red blood cell micronuclei DNA from rectal cancer patients.
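The variance analysis mentioned above for comparing read counts between two or more classes can be sketched as a one-way ANOVA F statistic. The sketch below computes only the F value and its degrees of freedom; obtaining a p-value requires the F distribution, which is omitted here. The function name and the example values are illustrative and are not data from the disclosure.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n_total > k")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


if __name__ == "__main__":
    # Invented read counts for one gene in two classes of samples.
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```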
Data Classification and Classifier Construction
Classification is an important method of data mining. Based on the existing data, a classification function is learned or a classification model is constructed, which is also called a classifier. Classifiers can map data records in the database to a given class, which can be applied to data prediction. Classification methods include decision tree, selection tree, logistic regression, Naive Bayes and deep neural network.
In certain embodiments of the present disclosure, genes with significant differences are selected as features, and a classifier is constructed for known classified samples based on support vector machine (SVM) to predict the specific disease classification of unknown samples (Huang, M. W., et al., SVM and SVM Ensembles in Breast Cancer Prediction. PLoS One, 2017. 12(1): p. e0161501). In some specific embodiments of the present disclosure, through the hierarchical clustering-based support vector machine algorithm, a classifier composed of a group of genes corresponding to DNA fragments is constructed. In a specific embodiment of the present disclosure, two types of samples are randomly clustered according to Pearson correlation to construct a classifier composed of a group of genes.
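To make the notion of a support vector machine classifier more concrete, the following is a minimal sketch of a plain linear SVM trained by sub-gradient descent on the regularized hinge loss. It is not the hierarchical clustering-based variant described above and does not use Pearson correlation; the feature values stand in for standardized read counts of two hypothetical genes, and the hyperparameters are arbitrary.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the L2-regularized hinge loss.

    samples: list of equal-length feature lists; labels: +1 or -1 per sample.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin: move the boundary toward classifying it.
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Sample is safely classified: apply only the regularization shrinkage.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the decision boundary."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Invented, well-separated training data: +1 and -1 stand for two sample classes.
    X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    y = [1, 1, -1, -1]
    w, b = train_linear_svm(X, y)
    print(predict(w, b, [2.5, 2.5]), predict(w, b, [-2.5, -2.5]))
```

In practice an established library implementation would normally be preferred; the sketch is only meant to show how selected gene features and known class labels yield a decision rule for unknown samples.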
In certain embodiments of the present disclosure, specific regions of erythrocyte micronuclei DNA are further selected before constructing the classifier.
In certain embodiments of the present disclosure, MACS2 software is used to search for the fragments of erythrocyte micronuclei DNA which are mainly enriched in a specific region relative to the genome DNA sequencing reads of peripheral blood mononuclear cells, and to remove the peak areas which are more enriched by peripheral blood mononuclear cells relative to peripheral blood mononuclear cells per se as a whole. Compared with peripheral blood mononuclear cells, genome information annotation and pathway enrichment (KEGG, gene ontology) are performed on red blood cell-specific fragments (Chen, L., et al., Gene Ontology and KEGG Pathway Enrichment Analysis of a Drug Target-Based Classification System. PLoS One, 2015. 10(5): p. e0126492.).
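The notion of regions enriched in erythrocyte micronuclei DNA relative to peripheral blood mononuclear cell DNA can be illustrated with a simple per-bin fold-enrichment calculation. This is a much simpler stand-in than the statistical model used by peak-calling software and is shown only to convey the idea; the bin counts, the pseudocount and the fold threshold are assumptions.

```python
def fold_enrichment(treatment_counts, control_counts, pseudocount=1.0):
    """Per-bin ratio of treatment to control read counts after library-size scaling."""
    if len(treatment_counts) != len(control_counts):
        raise ValueError("both samples must use the same bins")
    t_total = sum(treatment_counts)
    c_total = sum(control_counts)
    if t_total == 0 or c_total == 0:
        raise ValueError("both samples must contain reads")
    scale = t_total / c_total  # scale the control to the treatment library size
    return [
        (t + pseudocount) / (c * scale + pseudocount)
        for t, c in zip(treatment_counts, control_counts)
    ]


def enriched_bins(treatment_counts, control_counts, min_fold=2.0):
    """Indices of bins whose fold enrichment is at least min_fold."""
    ratios = fold_enrichment(treatment_counts, control_counts)
    return [i for i, ratio in enumerate(ratios) if ratio >= min_fold]


if __name__ == "__main__":
    # Invented read counts per genomic bin: RBC micronuclei DNA versus PBMC DNA.
    rbc = [10, 50, 10, 10]
    pbmc = [20, 20, 20, 20]
    print(enriched_bins(rbc, pbmc))
```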
Application of Classifier
On the basis of the classifier constructed in the present disclosure, the present invention can be widely used in biological research, medical research, clinical diagnosis and other fields by isolating peripheral blood micronuclei DNA from subjects in the manner described in the present disclosure and performing biological analysis. The invention has important value in scientific research and medical fields.
The inventors have successfully isolated erythrocyte micronuclei DNA from peripheral blood and applied it to cancer detection for the first time, including screening, diagnosis, typing and staging of cancer.
Among cancers, cervical cancer and colorectal cancer account for a large proportion of new cases and fatal cases.
Cervical Cancer
Cervical cancer is one of the most common gynecological tumors, and its incidence is increasing year by year. According to the statistics of the World Health Organization (WHO), there are an average of 530,000 new cases of cervical cancer every year, and about 250,000 women die from cervical cancer, among which developing countries account for 80% of the global total cases (Schiffman, M., et al., Carcinogenic human papillomavirus infection. Nat Rev Dis Primers, 2016. 2: p. 16086). In China, there are about 140,000 new cases of cervical cancer and about 37,000 deaths every year. Therefore, early screening and clinical staging of cervical cancer patients are of great significance to the treatment of cervical cancer.
Pathogenic Factors of Cervical Cancer
Pathogenic factors of cervical cancer include but are not limited to the following aspects:
Virus Infection
HPV infection is the main pathogenic factor of cervical cancer. There are many subtypes of HPV, about 40 of which are related to reproductive tract infection. Persistent infection by high-risk HPV subtypes (subtypes 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59 and 69), especially HPV subtypes 16 and 18, can cause cervical cancer.
Sexual Behavior and Number of Deliveries
Other Biological Factors
Chlamydia trachomatis, herpes simplex virus type II, trichomoniasis and other pathogens have synergistic effects in the pathogenesis of cervical cancer caused by high-risk HPV infection.
Other Behavioral Factors
Smoking as a synergistic factor of HPV infection can increase the risk of cervical cancer. In addition, malnutrition and poor sanitation can also affect the occurrence of diseases.
Early Screening Method for Cervical Cancer in the Prior Art
At present, the early screening of cervical cancer is mainly carried out by virus detection and cytological detection. Among them, virus detection is mainly human papillomavirus (HPV) detection, while cytological detection mainly includes Pap smear and TCT detection.
(1) HPV Detection
HPV can cause squamous epithelial proliferation of human skin mucosa. According to its pathogenicity, it can be divided into low-risk type and high-risk type. Low risk infection can cause common warts, genital warts (condyloma acuminatum) and other symptoms. Persistent high-risk human papillomavirus (HPV) infection is the main cause of cervical cancer. Molecular epidemiological analysis shows that some types of human papillomavirus (HPV) are the main causes of invasive cervical cancer and cervical intraepithelial neoplasia. At present, more than 80 types of HPV have been found, and about 40 of them can infect reproductive tract [Schiffman, M., et al., Carcinogenic human papillomavirus infection. Nat Rev Dis Primers, 2016. 2: p. 16086; Munoz, N., et al., Epidemiologic classification of human papillomavirus types associated with cervical cancer. N Engl J Med, 2003. 348(6): p. 518-27.]. Among them, high-risk HPV (such as HPV 16 and HPV 18) is often associated with invasive cervical cancer. The detection methods of high-risk HPV mainly include morphological observation, immunohistochemistry, dot-blot hybridization, in situ blotting hybridization, PCR/RFLP, PCR/Southern and so on.
Screening cervical cancer by HPV virus detection can identify more than 95% of precancerous cervical lesions, but it is mainly aimed at patients with cervical intraepithelial neoplasia (CIN) grade 2 or more, while the specificity for CIN2 negative patients is relatively low, because most women have spontaneous clearance after transient HPV infection, and rarely progress to CIN3 and cancer (Cook, D. A., et al., Evaluation of a validated methylation triage signature for human papillomavirus positive women in the HPV FOCAL cervical cancer screening trial. Int J Cancer, 2018). HPV detection can only determine whether women are infected with carcinogenic HPV, but cannot determine the risk of individual cancer, and there are still a few HPV negative cervical cancer patients. Therefore, there may be false positives in HPV testing. On the basis of HPV detection, it is usually necessary to combine other clinical detection indications for subsequent diagnosis.
(2) Pap Smear
Pap Smear, also known as cervical smear or Pap test, is a traditional and most commonly used screening method for cervical cancer. In this method, the cervical exfoliated cells are collected, stained and microscopically observed to test whether there are precancerous cells or cancer cells on the cervix, which has always been regarded as the “gold standard” for cervical cancer detection (Rodriguez, A. C. and J. Salmeron, Cervical cancer prevention in upper middle-income countries. Prev Med, 2017. 98: p. 36-38).
Combined with pathological observation, Pap smear can clearly identify the development of cervical cancer, but this method can only detect about 50% of cervical precancerous lesions. The difference in sample collection quality, insufficient cell collection, fewer abnormal cells, and the shielding of abnormal cells by blood or inflammatory cells will affect smear observation, resulting in poor detection sensitivity (Cook, D. A., et al., Evaluation of a validated methylation triage signature for human papillomavirus positive women in the HPV FOCAL cervical cancer screening trial. Int J Cancer, 2018). At the same time, due to the limitation of sampling, it is difficult to have regular detection and trace cases.
(3) TCT Detection
TCT test, also known as liquid-based thin-layer cytology test, collects cervical cell samples through a special sampler, but does not directly carry out smear observation, and instead puts the collector into a culture bottle filled with cell preservation solution for rinsing to obtain enough cell samples (Massad, L. S., et al., 2012 updated consensus guidelines for the management of abnormal cervical cancer screening tests and cancer precursors. Obstet Gynecol, 2013. 121(4): p. 829-46). After that, the cell sample bottles are sent for laboratory inspection, and the cell samples are dispersed and filtered by an automatic cell detector, so as to reduce the interference of blood, mucus and inflammatory tissues and obtain a thin cervical cell layer for further microscopic detection and diagnosis.
TCT detection is an optimized detection scheme for the Pap smear of cervical cancer developed in recent decades. Compared with the traditional Pap smear of cervical cancer, TCT detection significantly improves specimen adequacy and the detection rate of abnormal cells of cervical cancer. The detection rate of cervical cancer cells by TCT was 100%, and some precancerous lesions could also be found (Andy, C., L. F. Turner and J. O. Neher, Clinical inquiries. Is the ThinPrep better than conventional Pap smear at detecting cervical cancer? J Fam Pract, 2004. 53(4): p. 313-5). However, the detection rate of TCT for cervical precancerous lesions is still low, the sensitivity for early screening and detection of cervical cancer is low, and there are still many atypical squamous cells (ASC-US) and atypical glandular cells (AGC) of undetermined significance.
The above methods still have some limitations. First of all, for the above-mentioned methods, it is often necessary to use combined screening methods in clinical use (Zigras, T., et al., Early Cervical Cancer: Current Dilemmas of Staging and Surgery. Curr Oncol Rep, 2017. 19(8): p. 51). Secondly, at present, the samples used for cervical cancer screening by the above methods are cervical exfoliated cells, and the sampling method will inevitably cause damage and psychological burden to patients; at the same time, there are certain restrictions on sampling requirements, and the quality of samples is difficult to control. In addition, screening for cervical cancer often requires regular detection. According to FDA standards, for women over 21 years old, regular detection should be conducted every 3 years to assess the risks. Large fluctuations in sampling quality may compromise long-term follow-up testing. Therefore, a more reliable and stable sample source is needed to provide a more dynamic, accurate and instructive monitoring method and system for cervical cancer screening.
In the context of the present disclosure, “cervical cancer” includes any type of cervical cancer.
Classification and Staging of Cervical Cancer
The occurrence and development of cervical cancer has a gradual evolution process, which can last from several years to several decades. It is generally considered that the evolution can be divided into several stages: mild intraepithelial neoplasia (CINI), moderate intraepithelial neoplasia (CINII), severe intraepithelial neoplasia (CINIII) and invasive cancer.
Cervical cancer can be classified into different types according to different standards.
According to whether cancer has metastasized or not, cervical cancer can be divided into cancer in situ and invasive cancer. Cancer in situ is more common in women aged 30-35, while invasive cancer is more common in women aged 45-55. Lymphatic metastasis may occur in patients with severe cervical cancer. After local infiltration, the cancer invades lymphatic vessels to form tumor plugs, which are carried with the lymph fluid into local lymph nodes and spread within the lymphatic vessels.
According to pathological types, cervical cancer can be divided into three types: squamous cell carcinoma, adenocarcinoma and adeno-squamous carcinoma.
Cervical squamous cell carcinoma is the main type of cervical cancer. According to histological differentiation, it can be divided into three grades: Grade I is highly differentiated squamous cell carcinoma, Grade II is medium differentiated squamous cell carcinoma (non-keratinized large cell type), and Grade III is low-medium differentiated and low differentiated squamous cell carcinoma (small cell type).
Cervical adenocarcinoma includes mucinous adenocarcinoma type and malignant adenoma type. Mucinous adenocarcinoma originates from columnar mucous cells of the cervical canal, and the glandular structure can be seen under the microscope. The hyperplasia of glandular epithelial cells is multilayer, the dysplasia is obvious, and mitosis is seen. The cancer cells protrude into the glandular cavity in a papillary shape. Malignant adenoma is a highly differentiated adenocarcinoma of the cervical canal mucosa. There are many cancerous glands with different sizes and varied shapes, which extend into the deep cervical stroma in a punctate way. The glandular epithelial cells are atypical and often have lymph node metastasis.
Unexpectedly, the inventors found that peripheral red blood cell micronuclei DNA can be used for screening and diagnosing cervical cancer. The inventors further unexpectedly found that the micronuclei DNA of peripheral red blood cells can be used to distinguish the types of cervical cancer, which can be divided into squamous cell carcinoma and adenocarcinoma. The inventors further unexpectedly found that peripheral red blood cell micronuclei DNA can be used to stage cervical cancer; for example, cervical squamous cell carcinoma can be divided into high-differentiated type, medium-differentiated type, and low-medium-differentiated and low-differentiated type. This is of great significance for the early diagnosis, screening, classification and staging of cervical cancer.
Colorectal Cancer
Colorectal cancer (CRC) is a cancer that arises from the colon or rectum. It is one of the most common malignant tumors in the gastrointestinal tract. The early symptoms are not obvious. The symptoms and signs shown with the increase of the cancer can include blood in stools, weight loss, and constant fatigue (General Information About Colon Cancer. NCI. May 12, 2014. Archived from the original on Jul. 4, 2014. Retrieved Jun. 29, 2014).
There are approximately 1.4 million new cases of colorectal cancer each year. Colorectal cancer ranks third among newly diagnosed cancers, and it is also the fourth leading cause of death from cancer. Studies have shown that by 2030, the number of global colorectal cancer cases is expected to increase by 60%, with more than 2.2 million new cases per year and approximately 1.1 million deaths per year (Global patterns and trends in colorectal cancer incidence and mortality. M, et al. Gut. 2017; 66:683-91).
Globally, colorectal cancer is the third most common cancer, accounting for about 10% of all cancer cases. It is especially common in developed countries, where more than 65% of cases are found, and it is usually less common in women than in men (Forman D, Ferlay J (2014). “Chapter 1.1: The global and regional burden of cancer”. In Stewart B W, Wild C P (eds.). World Cancer Report. the International Agency for Research on Cancer, World Health Organization. pp. 16-53. ISBN 978-92-832-0443-5).
With the improvement of people's living standards in China, the incidence of colorectal cancer is on the rise. The latest statistics show that the incidence and mortality of colorectal cancer (CRC) in China have maintained an upward trend. Cancer statistics in China in 2015 show that the incidence and mortality of colorectal cancer in China rank fifth among all malignant tumors, with 376,000 new cases and 191,000 deaths. Among them, the numbers in urban areas are much higher than those in rural areas, and the incidence of colon cancer has increased significantly. Most patients are already in the middle and late stages when diagnosed. Early diagnosis of colorectal cancer is extremely important, and early diagnosis can significantly increase the possibility of successful treatment (Standards for Diagnosis and Treatment of Colorectal Cancer in China (2017 Edition) [J]. Chinese Journal of Medical Frontiers (Electronic Edition), 2018, 10(3): 1-21).
Causes of Disease
Most colorectal cancers are caused by factors like aging and lifestyle, and only a few cases are caused by underlying hereditary diseases. Risk factors include diet, obesity, smoking and lack of physical activity. Another risk factor is inflammatory bowel disease, including Crohn's disease and ulcerative colitis. Some hereditary diseases lead to colorectal cancer, including familial adenomatous polyposis and hereditary nonpolyposis colon cancer. CRC usually begins as a benign tumor, often in the form of a polyp, which may become cancerous with time.
Classification
Classification According to Causes
According to the causes, colorectal cancer can be divided into three classes, two of which have genetic factors:
Sporadic colorectal cancer: Sporadic colorectal cancer is the most common type, with 90% of patients diagnosed at the age of 50 and above. It is not directly related to genetics or family history. About one in every 20 Americans has this type of CRC.
Familial colorectal cancer: Some families are prone to CRC. If more than one person in the family suffers from CRC, especially before the age of 50, attention must be paid to it. If immediate family members (parents, siblings or children) have colorectal cancer, the risk of such family members will double.
Hereditary colorectal cancer: At present, many hereditary diseases have been found to be related to CRC, including hereditary nonpolyposis colon cancer (HNPCC), also known as Lynch syndrome; familial adenomatous polyposis (FAP); attenuated familial adenomatous polyposis (AFAP); APC I1307K; Peutz-Jeghers syndrome; MYH-associated polyposis (MAP); juvenile polyposis; and hereditary polyposis.
Classification According to Focus of Cancer
According to the focus of cancer, colorectal cancer can be divided into colon cancer and rectal cancer.
Importance of Early Screening
Lifestyles such as high-fat diet, smoking and alcoholism may increase the risk of colorectal cancer. More than 90% of colorectal cancer patients are over 50 years old. Usually, the best treatment period is missed because of neglecting the early symptoms of the disease, including bloody feces or changes in defecation habits. Early diagnosis can significantly increase the possibility of successful treatment.
In recent years, the incidence and mortality of CRC in the United States have been gradually decreasing. The microsimulation model MISCAN-Colon suggests that about 53% of the observed decrease in CRC mortality may be attributable to CRC screening. In 2012, 65.1% of adults aged 50-75 in the United States had been screened for CRC, and 27.7% had never been screened. Colonoscopy is the most frequently used screening examination (nearly 62%). From 2002 to 2010, the screening rate increased from 52.3% to 65.4%. With the improvement of the screening rate, early treatment and intervention for at-risk individuals have significantly reduced the incidence and mortality of CRC (Cronin K A, Lake A J, Scott S, et al. Annual Report to the Nation on the Status of Cancer, part I: National cancer statistics. Cancer 2018; 124:2785).
Early Screening and Diagnosis Methods of Colorectal Cancer in the Prior Art
Early screening and diagnosis of colorectal cancer mainly include the following ways:
(1) Colonoscopy
Colonoscopy is the most accurate and versatile diagnostic examination for CRC: it can locate lesions throughout the large intestine, allow biopsy, detect synchronous tumors and remove polyps. Under the endoscope, most colon and rectal cancers are intraluminal masses that originate from the mucosa and protrude into the lumen. Tumors can be exophytic or polypoid. Bleeding (oozing or overt bleeding) may be observed in friable, necrotic or ulcerated lesions. A few gastrointestinal tumor lesions (in both asymptomatic and symptomatic individuals) are non-polypoid. One study found that non-polypoid colorectal tumors are more likely to be cancerous than polypoid tumors. Compared with polypoid lesions, cancers arising from non-polypoid adenomas may be more difficult to detect by colonoscopy, but colonoscopy is more sensitive in this situation than barium enema or CT colonography. When experienced endoscopists use colonoscopy to examine asymptomatic patients, the missed diagnosis rate of CRC is 2%-6%.
(2) Flexible Sigmoidoscopy
It has been observed that over the past 50 years, the proportion of right-sided or proximal colon cancer in the United States and around the world has been gradually increasing, and the incidence of tumors originating from the cecum is increasing the fastest. In view of this, and considering the high incidence of synchronous CRC, flexible sigmoidoscopy is generally not considered an appropriate diagnostic examination for patients with suspected CRC unless the tumor is palpable in the rectum. Even in that case, total colonoscopy is still needed to assess whether the rest of the colon has synchronous polyps or cancer. Flexible sigmoidoscopy is, however, used to screen for CRC. It is one of the few methods that have been shown by randomized controlled trials to reduce the incidence and mortality of CRC.
(3) CT Colonography
CT colonography, also known as virtual colonoscopy, provides a computer-simulated intraluminal view of the inflated colon. This technology uses conventional spiral CT scanning or MRI to acquire a large volume of continuous data and uses complex post-processing software to generate images that allow the operator to navigate in any chosen direction through the cleansed colon lumen. CT colonography requires mechanical bowel preparation similar to that for barium enema, because feces can resemble polyps on the images and cause interference. CT colonography can also detect extracolonic lesions, which can provide information on the causes of symptoms and on tumor staging, but it may also cause anxiety and increase costs through unnecessary examinations. Its detection rate for clinically important lesions may also be low.
Compared with colonoscopy, CT colonography is an alternative with similar sensitivity and less trauma for patients with CRC. However, because colonoscopy allows lesions and any synchronous cancers or polyps seen during the procedure to be removed or biopsied, colonoscopy is still considered the gold standard for investigating symptoms suggestive of CRC. When the use of colonoscopy is limited, CT colonography is preferred over barium enema (Mulder S A, Kranse R, Damhuis R A, et al. Prevalence and prognosis of synchronous colorectal cancer: a Dutch population-based study. Cancer Epidemiol 2011; 35:442).
However, owing to the nature of their sampling and testing procedures, the above screening methods inevitably impose a psychological burden and local injury on some screened individuals, which limits long-term, large-scale screening; the suitability of each screening method for the patient's age must also be considered.
(4) Guaiac-Based Faecal Occult Blood Test (gFOBT)
This test detects whether there is blood in a patient's stool sample. However, the stool blood test is not 100% accurate, because not all cancers cause bleeding, and those that do may not bleed all the time. Therefore, this test can give false negative results. Blood may also be present due to other diseases or conditions, such as hemorrhoids. Detecting fecal hemoglobin with guaiac is an indirect method that detects peroxidase activity. Various foods contain non-hemoglobin components with peroxidase activity, which may cause false positives and thus limit the application of this method. Its advantage lies in the convenience and rapidity of initial detection and screening, which gives it some value in guiding further detection and diagnosis, but its accuracy is relatively low.
(5) Immunochemical Test (Faecal Immunochemical Test, FIT)
This test uses antibodies to detect fecal occult blood. FIT uses monoclonal or polyclonal antibodies to directly detect hemoglobin in human feces and is not affected by diet. In qualitative FIT, a color change is visible once the hemoglobin content in feces exceeds a certain threshold, whereas quantitative FIT measures the hemoglobin value, and a result above a defined normal range is considered positive. Compared with gFOBT, the immunochemical test requires fewer stool samples (only one or two are collected each time), and there is no dietary restriction before collecting stool samples (Mette Kalager, et al. Overdiagnosis in Colorectal Cancer Screening: Time to Acknowledge a Blind Spot[J]. Gastroenterology, 2018 Aug. 1). Even a trace of occult blood in a sample can be detected, and occult blood in a sample indicates intestinal bleeding. This method has relatively high specificity but poor sensitivity, and false positive or false negative results may also arise from interference by other diseases, which makes it impossible to reach a definite diagnosis.
(6) Fecal DNA Test
Colorectal cancer generally arises in colorectal epithelial tissue and first grows into the intestinal lumen. During its growth, tumor cells are continuously shed into the intestinal lumen and discharged with feces. The shed tumor cells in feces contain specific components (such as mutated and methylated human genes) that can be used as tumor markers. The fecal DNA test analyzes several DNA markers of colon cancer or precancerous polyp cells shed into feces. Patients can be provided with a kit containing instructions on how to collect stool samples at home and then send them to the laboratory for detection and analysis. This test is more accurate for detecting colon cancer than polyps, but it cannot detect all DNA mutations that indicate the presence of a tumor. The value of fecal gene detection lies in early diagnosis: it can indicate the occurrence of colorectal cancer, find precancerous adenomas and help patients to find colorectal cancer at an earlier stage (Imperiale, T. F., et al., Multitarget Stool DNA Testing for Colorectal-Cancer Screening. New England Journal of Medicine, 2014. 370(14): p. 1287-1297). However, fecal genetic testing can only be used as an auxiliary diagnostic method: a positive result must be confirmed, and the lesion treated, by colonoscopy. In addition, owing to the complexity of fecal DNA, the low specificity of the test and the low success rate of fecal DNA preparation lead to insufficient cost-effectiveness, which greatly hinders its practical application.
The above methods are relatively convenient for sampling and non-invasive. Non-invasive detection is more acceptable to patients and may be used as an indicator for CRC screening. However, because of the limited specificity and sensitivity of these methods, most of them can only serve as an auxiliary means of diagnosis, and other means such as colonoscopy are still needed for diagnosis and intervention. Meanwhile, the psychological burden associated with stool sampling and handling, as well as the complexity and contamination of stool samples, also cause problems in the stability and repeatability of sample detection (Brenner, H., et al., Prevention, Early Detection, and Overdiagnosis of Colorectal Cancer Within 10 Years of Screening Colonoscopy in Germany. Clinical Gastroenterology and Hepatology, 2015. 13(4): p. 717-723). Therefore, a more reliable and stable sample source is needed to provide a more dynamic, accurate and instructive monitoring system for CRC screening.
Surprisingly, the inventors found that peripheral red blood cell micronuclei DNA can be used to screen and diagnose colorectal cancer. The inventors further surprisingly found that micronuclei DNA of peripheral red blood cells can be used to distinguish the types of colorectal cancer, which can be divided into colon cancer and rectal cancer. It is of great significance for early diagnosis, screening and risk ranking of colorectal cancer.
Lung Cancer
Lung cancer is the most common cancer type worldwide in terms of both incidence and mortality. The key cause of lung cancer is tobacco smoking, which is responsible for 63% of overall global deaths from lung cancer and for more than 90% of lung cancer deaths in countries where smoking is prevalent in both sexes. Other causes of lung cancer include secondhand smoke, a family history of lung cancer, workplace exposure to asbestos, arsenic, chromium, beryllium, nickel, soot or tar, air pollution, etc.
Classification according to histology:
According to histology, lung cancer can be divided into two main classes: small-cell lung carcinoma (SCLC) and non-small-cell lung carcinoma (NSCLC).
SCLC (10%-15%): This type of lung cancer is the most aggressive and rapidly growing of all the types. SCLC is strongly related to cigarette smoking. SCLCs metastasize rapidly to many sites within the body and are most often discovered after they have spread extensively.
NSCLC (85%): NSCLC has three main types, designated by the type of cells found in the tumor: adenocarcinoma, squamous cell carcinoma and large cell carcinoma.
The Diagnosis of Lung Cancer is Mainly Focused on Imaging Examination:
(1) X-ray examination: X-ray examination can reveal the location and size of the lung cancer, and may show local emphysema, atelectasis, or infiltrating lesions or pulmonary inflammation near the lesion caused by bronchial obstruction. (2) Bronchoscopy: the bronchoscope allows direct observation of pathological changes in the bronchial lining and lumen. Tumor tissue can be taken for pathological examination, or bronchial secretions can be aspirated for cytological examination, to confirm the diagnosis and determine the histological type. (3) Cytological examination: sputum cytology is a simple and effective method for general screening and diagnosis of lung cancer. In most patients with primary lung cancer, shed cancer cells can be found in the sputum. The positive rate of sputum cytology for central lung cancer can reach 70% to 90%, while that for peripheral lung cancer is only about 50%. (4) ECT examination: ECT bone imaging can detect bone metastases at an earlier stage. In some cases both the X-ray film and bone imaging give positive findings; if the osteogenic reaction of the lesion is static and its metabolism is inactive, bone imaging is negative while the X-ray film is positive. The two complement each other and can improve the diagnosis rate. (6) Mediastinoscopy: mediastinoscopy is mainly used for patients with mediastinal lymph node metastasis who are not suitable for surgical treatment and for whom a pathological diagnosis cannot be obtained by other methods.
Surprisingly, the inventors found that peripheral red blood cell micronuclei DNA (rbcDNA) can be used to screen and diagnose lung cancer. The inventors further surprisingly found that rbcDNA signature is of great significance for early diagnosis, screening and risk ranking of lung cancer.
Hepatocellular Cancer
Hepatocellular cancer (HCC) is the fifth most common cancer, and its incidence is increasing globally due to the spread of hepatitis B and C virus infections. Other causes include cirrhosis, heavy drinking, obesity and diabetes, abuse of anabolic steroids, iron storage disease and aflatoxin. If caught early, it can sometimes be cured by surgery or transplantation. In more severe cases, it cannot be cured.
Detection of Serum Bio-Markers for Hepatocellular Cancer
(1) The determination of serum alpha-fetoprotein (AFP) is relatively specific for the diagnosis of this disease. If immunoassay shows a persistent serum AFP ≥ 400 μg/L and pregnancy, active liver disease, etc. can be ruled out, a diagnosis of liver cancer can be considered. However, approximately 30% of patients with liver cancer are clinically negative for AFP. (2) Blood enzymology and other tumor marker examinations. The serum levels of γ-glutamyl transpeptidase and its isoenzymes, abnormal prothrombin, alkaline phosphatase and lactate dehydrogenase isoenzymes in patients with liver cancer may be higher than normal, but these markers lack specificity.
Imaging Examination
(1) Ultrasound examination can show the size, shape and location of the tumor and whether there are tumor thrombi in the hepatic vein or portal vein, and its diagnostic coincidence rate can reach 90%. (2) CT examination has a high resolution; its diagnostic coincidence rate for liver cancer can exceed 90%, and it can detect small cancer foci with a diameter of about 1.0 cm. (3) The diagnostic value of MRI is similar to that of CT. MRI is better than CT in distinguishing benign from malignant intrahepatic lesions, especially in distinguishing liver cancer from hemangioma. (4) Selective celiac arteriography or hepatic arteriography. For cancers with abundant blood vessels, this method has a low resolution limit, and for small liver cancers with a tumor size of less than 2.0 cm the positive rate can reach 90%. (5) Needle aspiration cytology by liver puncture. Needle aspiration under the guidance of B-mode ultrasound can help increase the positive rate of cancer diagnosis, but it causes invasive tissue damage.
Surprisingly, the inventors found that peripheral red blood cell micronuclei DNA (rbcDNA) can be used to screen and diagnose hepatocellular cancer. The inventors further surprisingly found that rbcDNA signature is of great significance for early diagnosis, screening and risk ranking of hepatocellular cancer.
Combined Application of the Invention and Other Methods
In certain embodiments, the methods of the present disclosure can also be combined with other methods for screening, diagnosing or risk ranking of cancer. Those skilled in the art can select suitable other methods in the prior art as required.
In certain embodiments, methods related to cervical cancer that can be combined with the methods of the present disclosure include, for example, detection of high-risk HPV and cytological examination of cervical exfoliated cells. In an embodiment, the detection methods for high-risk HPV include morphological observation method, immunohistochemistry method, dot-blot hybridization method, blotting hybridization in situ, PCR/RFLP method, PCR/Southern method and the like. In one embodiment, the cytological examination of cervical exfoliated cells includes TCT, Pap smear, etc.
In certain embodiments, methods related to colorectal cancer that can be combined with the methods of the present disclosure include, for example, colonoscopy, flexible sigmoidoscopy, CT colonography, fecal occult blood test, immunochemical test, fecal DNA test, and the like.
In the following section, the present invention is further illustrated by examples. Examples are provided by way of illustration, but the present invention is not limited to the following examples. In the following examples, the subjects are all human subjects.
Through the following steps, the peripheral blood samples of each subject were subjected to density gradient centrifugation.
Step 1. 1 ml fresh peripheral blood was obtained from a subject, and 1×PBS was added in equal volume to prepare a diluted blood sample.
Step 2. 5 ml of Ficoll density gradient centrifugation medium (Stemcell, Lymphoprep™ 07801) was added to a density gradient centrifuge tube.
Step 3. The diluted blood sample prepared in step 1 was slowly added to the density gradient centrifuge tube in Step 2. Density gradient centrifugation was performed at 1200 g at 18° C. for 15 minutes.
After density gradient centrifugation, the sample was divided into three layers: the upper layer was plasma, the middle layer was peripheral blood mononuclear cells (PBMC), and the bottom layer was red blood cells (as shown in
After density gradient centrifugation in Example 1, peripheral blood mononuclear cells and red blood cells were separated.
Specifically, as shown in
In this example, genomic DNA of peripheral blood mononuclear cells and erythrocyte micronuclei DNA were extracted, respectively.
3.1 Extraction of Genomic DNA from Peripheral Blood Mononuclear Cells
Genomic DNA was extracted from the peripheral blood mononuclear cell sample obtained in Example 2 using QIAamp DNA Blood Mini Kit (Qiagen, Cat No./ID: 51106), as shown in
3.2 Extraction of Erythrocyte Micronuclei DNA
Red blood cells obtained in Example 2 were lysed with a red blood cell lysis buffer. Specifically, 10 ml of red blood cell lysis buffer (Biosharp, Cat No./ID: BL503B) was added to the red blood cells collected in Example 2, which were lysed for 20 minutes at room temperature in the dark and centrifuged at 3000 g at room temperature for 10 minutes. The supernatant was collected and incubated with 10 mM EDTA (Solarbio, Cat No./ID: E1170) and 200 μg/μl proteinase K (Ambion, Cat No./ID: AM2548) at 56° C. for 8 hours. Erythrocyte micronuclei DNA was then extracted using QIAamp DNA Blood Mini Kit (Qiagen, Cat No./ID: 51106).
Genomic DNA of peripheral blood mononuclear cells and erythrocyte micronuclei DNA extracted in Example 3 were amplified, library constructed and sequenced, respectively.
4.1 DNA Amplification
Genomic DNA of peripheral blood mononuclear cells and erythrocyte micronuclei DNA prepared in Example 3 were subjected to multiple displacement amplification (MDA) using REPLI-g Single Cell Kit (Qiagen, Cat No./ID: 150345) to obtain amplified DNA samples.
4.2 Library Construction
After MDA, the amplified DNA samples were subjected to second-generation sequencing library construction using TruePrep DNA Library Prep Kit V2 for Illumina (Vazyme, TD503).
4.3 High-Throughput Sequencing
Genomic DNA of peripheral blood mononuclear cells and erythrocyte micronuclei DNA were sequenced on the NovaSeq platform, with 10× sequencing depth and 30 G of data.
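As a rough consistency check on these figures, mean sequencing depth can be estimated as the total number of sequenced bases divided by the genome size. The sketch below assumes that "30 G" denotes 30 gigabases and uses an approximate human genome size of 3.1 Gb; both values and the function name are illustrative assumptions, not parameters stated in this disclosure.

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Return mean sequencing depth (fold coverage) = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Assumed values (not stated in the disclosure):
TOTAL_BASES = 30e9       # "30 G" interpreted as 30 gigabases
HUMAN_GENOME_BP = 3.1e9  # approximate size of the human genome in base pairs

depth = mean_depth(TOTAL_BASES, HUMAN_GENOME_BP)
print(f"estimated mean depth: {depth:.1f}x")
```

Under these assumptions the estimate is close to the stated 10× depth; the depth actually achieved over mapped regions would depend on read filtering and duplicate removal.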
Bioinformatics analysis was made on micronuclei DNA information in red blood cells by the following steps (see
1. Quality control. Quality control was performed with FastQC software on the raw paired-end sequencing files of erythrocyte micronuclei DNA and of peripheral blood mononuclear cell genomic DNA, respectively.
2. Adaptor removal. Adaptors were removed from the raw sequencing files with cutadapt software (Kong, Y., Btrim: a fast, lightweight adapter and quality trimming program for next-generation sequencing technologies. Genomics, 2011. 98(2): p. 152-3). According to sequencing quality, properly paired reads of small fragments with appropriate length were retained.
3. Data alignment. Sequenced fragments of red blood cell micronuclei DNA and peripheral blood mononuclear cell genomic DNA were aligned to the human genome using bwa software (http://bio-bwa.sourceforge.net), and improperly aligned and duplicate reads were removed by Picard (Weisenfeld, N. I., et al., Direct determination of diploid genome sequences. Genome Res, 2017. 27(5): p. 757-767).
4. Comparison and counting of reads. The reads of sequenced small fragments in erythrocyte micronuclei DNA were counted corresponding to the gene regions of human genome using htseq-count software (Anders, S., P. T. Pyl and W. Huber, HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics, 2015. 31(2): p. 166-9), to compare whether there were significant differences in the degree of DNA fragmentation in red blood cells of healthy individuals and cancer patients.
5. Peak calling. Peaks were called with macs2 software to identify regions in which fragments of red blood cell micronuclei DNA were enriched relative to the genomic DNA sequencing reads of peripheral blood mononuclear cells, and peak areas that were themselves enriched in peripheral blood mononuclear cells relative to the PBMC genome as a whole were removed.
6. Genome information annotation and pathway enrichment of specific broken fragments in erythrocyte micronuclei DNA. Genome information annotation and pathway enrichment (KEGG, Gene Ontology) were performed on the fragments specifically broken in erythrocytes compared with peripheral blood mononuclear cells (Chen, L., et al., Gene Ontology and KEGG Pathway Enrichment Analysis of a Drug Target-Based Classification System. PLoS One, 2015. 10(5): p. e0126492), and the specific broken genes in erythrocyte micronuclei DNA were thereby obtained.
7. Data classification and classifier construction. Differential genes were selected as features to construct classifiers from samples of known class based on a support vector machine (SVM), and to predict unknown samples (Huang, M. W., et al., SVM and SVM Ensembles in Breast Cancer Prediction. PLoS One, 2017. 12(1): p. e0161501).
7.1 Data Classification
Specifically, the read counts in gene regions of “n” experimental samples and “m” control samples were selected each time (wherein “n” and “m” refer to the numbers of samples). The differential genes (also called “characteristic genes”) that distinguish the two types of samples were screened out by ANOVA test.
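The per-gene ANOVA screening can be illustrated with a minimal sketch. It computes the one-way ANOVA F statistic for each gene between the experimental and control groups and keeps genes whose F statistic reaches a cut-off. The gene names, the toy read counts and the cut-off value are hypothetical, and the disclosure does not state the significance threshold actually used; converting F to a p-value is omitted here.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand = mean([x for g in groups for x in g])
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def select_genes(counts, labels, f_threshold):
    """Return genes whose F statistic between the two label groups meets f_threshold.

    counts: dict mapping gene name -> list of read counts, one per sample
    labels: list of class labels (1 = experimental, 0 = control), aligned with samples
    """
    selected = []
    for gene, values in counts.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        if anova_f([case, ctrl]) >= f_threshold:
            selected.append(gene)
    return selected


# Hypothetical toy data: three experimental and three control samples.
labels = [1, 1, 1, 0, 0, 0]
counts = {
    "GENE_A": [10, 12, 11, 1, 2, 3],  # differs strongly between groups
    "GENE_B": [5, 6, 4, 5, 4, 6],     # same mean in both groups
}
print(select_genes(counts, labels, f_threshold=10.0))
```

In practice the read counts would typically be normalized for sequencing depth before testing; that step is not described in the text and is left out of the sketch.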
7.2 Classifier Construction
Classifier parameter adjustment. Based on the characteristic genes screened in step 7.1, the model was trained on the training group (n=100) using the SVM/LOOCV algorithm (support vector machine with leave-one-out cross-validation). First, the true labels of all samples were set (e.g., a sample in the experimental group was recorded as 1, and a sample in the control group was recorded as 0). One sample was picked out at a time as the test set, and all other samples (n−1) were used to build a model, which was then tested on the test set. Each sample served as the test set once, completing n rounds of cross-validation and yielding a test result for each sample. Based on all the test results and the true label of each sample, the accuracy, sensitivity and specificity were calculated, so as to select the best parameters of the model and construct the training model. In this study, the parameters of the SVM were set as C=100 and gamma=10⁻⁴, wherein C is the penalty coefficient, that is, the tolerance of errors, and gamma is a parameter of the RBF function when it is selected as the kernel.
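The leave-one-out procedure and the three performance measures can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier is used as a stand-in; it is not an SVM and is not the classifier of this disclosure. The function names and toy data are hypothetical, and any model exposing equivalent fit and predict functions (for example an RBF-kernel SVM) could be passed in instead.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): store the mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def loocv(X, y, fit, predict):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        model = fit(train_X, train_y)
        predictions.append(predict(model, X[i]))
    return predictions


def metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity (1 = experimental, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical toy data: two features per sample, well-separated classes.
X = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5], [1.0, 9.0], [2.0, 10.0], [1.5, 11.0]]
y = [1, 1, 1, 0, 0, 0]
y_pred = loocv(X, y, nearest_centroid_fit, nearest_centroid_predict)
print(metrics(y, y_pred))
```

Parameter adjustment as described above would amount to repeating this loop for each candidate parameter setting and keeping the setting with the best cross-validated measures.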
7.3 Prediction on Unknown Samples
Based on the training model obtained in the previous step, unknown samples (i.e., test sets) that did not participate in the training were predicted by the classifier constructed in the previous step. The prediction results of the test set were compared with the true labels of the samples, and the proportion of each prediction result in the two classes (i.e., the risk assessment index) was presented. The unknown samples were thus assigned the results of binary classification.
In this example, there were 15 subjects, including:
Experimental group: 9 patients diagnosed with cervical cancer by other methods
Control group: 6 healthy individuals (non-cervical disease individuals).
The peripheral blood samples from patients with cervical cancer were expressed in the form of “P” plus patient number. For example, “P1” represented a peripheral blood sample from the first cervical cancer patient (“Patient 1”), “P2” represented a peripheral blood sample from the second cervical cancer patient (“Patient 2”), and so on. In addition, peripheral blood samples from healthy individuals were expressed in the form of “H” plus individual number. For example, “H1” represents the peripheral blood sample from the first healthy individual, “H2” represents the peripheral blood sample from the second healthy individual, and so on.
The basic information of 9 patients with cervical cancer is shown in Table 1. “cervical cancer type” refers to the type of cervical cancer diagnosed by other methods.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
Specifically, 9 samples of primary cervical cancer and 6 samples of healthy women were selected for read counting, and 2,306 differential genes that distinguish the two classes of samples were screened out by ANOVA test. Then, unsupervised hierarchical clustering of the two classes of samples was carried out according to Pearson correlation, showing that there were significant differences between the two classes of samples.
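Unsupervised hierarchical clustering by Pearson correlation can be sketched as below. The sketch uses 1 − r as the distance between two samples and average linkage to merge clusters; the linkage method is an assumption, since the disclosure does not specify one, and the sample names and per-gene values are hypothetical toy data.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_a * var_b)


def cluster(profiles, n_clusters=2):
    """Average-linkage agglomerative clustering on the distance 1 - Pearson r.

    profiles: dict mapping sample name -> list of per-gene values
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    names = list(profiles)
    dist = {
        (p, q): 1.0 - pearson(profiles[p], profiles[q])
        for p in names for q in names if p != q
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[(p, q)] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy profiles over four genes.
profiles = {
    "P1": [9.0, 8.0, 1.0, 2.0],
    "P2": [8.0, 9.0, 2.0, 1.0],
    "H1": [1.0, 2.0, 9.0, 8.0],
    "H2": [2.0, 1.0, 8.0, 9.0],
}
print(cluster(profiles))
```

Because no class labels enter the computation, a clean split of patient and healthy samples into separate clusters indicates that the selected genes carry the class difference.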
As shown in
The list of 2,306 differential genes is shown in Table 2. Each gene corresponds to each row from top to bottom in
In this example, there were 8 subjects, including 2 patients diagnosed as cervical adenocarcinoma and 5 patients diagnosed as cervical squamous cell carcinoma by other methods.
The peripheral blood samples from patients with cervical cancer were expressed in the form of “P” plus patient number. For example, “P1” represents a peripheral blood sample from the first cervical cancer patient (“Patient 1”), “P2” represents a peripheral blood sample from the second cervical cancer patient (“Patient 2”), and so on.
The basic information of 7 patients with cervical cancer is shown in Table 3. “Cervical cancer type” refers to the type of cervical cancer diagnosed by other methods.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
Specifically, 2 adenocarcinoma samples and 6 squamous cell carcinoma samples (including one HPV-negative sample) among the primary cervical cancer samples were selected for read counting, and 360 differential genes that distinguish the two classes of samples were screened out by ANOVA test. Then, unsupervised hierarchical clustering of the two classes of samples was carried out according to Pearson correlation, showing that there were significant differences between the two classes of samples.
As shown in
The list of 360 differential genes is shown in Table 4. Each gene corresponds to each row from top to bottom in
In this example, there were 5 subjects, including 2 patients diagnosed as medium differentiated cervical squamous cell carcinoma by other methods and 3 patients diagnosed as low differentiated and low-medium differentiated cervical squamous cell carcinoma.
The peripheral blood samples from patients with cervical cancer were expressed in the form of “P” plus patient number. For example, “P1” represents a peripheral blood sample from the first cervical cancer patient (“Patient 1”), “P2” represents a peripheral blood sample from the second cervical cancer patient (“Patient 2”), and so on.
The basic information of 5 patients with cervical cancer is shown in Table 5. “Cervical cancer type” refers to the type of cervical cancer diagnosed by other methods.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
Specifically, 2 medium differentiated cervical squamous cell carcinoma samples and 3 low differentiated and low-medium differentiated squamous cell carcinoma samples among the primary cervical squamous cell carcinoma samples were selected for read counting, and 466 differential genes that distinguish the two classes of samples were screened out by ANOVA test. Then, unsupervised hierarchical clustering of the two classes of samples was carried out according to Pearson correlation, showing that there were significant differences between the two classes of samples.
As shown in
The list of 466 differential genes is shown in Table 6. Each gene corresponds to each row from top to bottom in
Using the classifier (2,306 genes) constructed in Example 6 for clustering healthy individuals and cervical cancer patients, 8 unknown samples from 8 subjects were predicted.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
After testing, it was found that 5 of the 8 samples were at high risk of cervical cancer (the risk probabilities were all over 85%), and 3 were at low risk of cervical cancer (the risk probabilities were all less than 5%). Tracing back the sample sources of subjects predicted to be high risk and subjects predicted to be low risk, it was found that 5 samples with high risk of cervical cancer were obtained from patients who were diagnosed as cervical cancer by other diagnostic methods. Three samples with low risk of cervical cancer were obtained from healthy individuals detected by other diagnostic methods.
The result is shown in
Therefore, the method and the gene classifier of the present disclosure can effectively distinguish cervical cancer patients from healthy individuals.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
Using the classifier (360 genes) constructed in Example 7 for clustering patients with cervical squamous cell carcinoma and cervical adenocarcinoma, three cervical cancer samples with unknown classification were predicted.
After testing, it was found that two of the three samples were of high risk (the risk probabilities were all over 85%) and one was of low risk (the risk probability was less than 5%). Tracing back the sample sources of subjects with high risk of cervical squamous cell carcinoma and subjects with low risk of cervical squamous cell carcinoma, it was found that two samples with high risk of cervical squamous cell carcinoma were obtained from patients with cervical squamous cell carcinoma detected by other diagnostic methods, and one sample with low risk of cervical squamous cell carcinoma was obtained from healthy individuals as detected by other diagnostic methods.
The result is shown in
Therefore, the method and gene classifier of the present disclosure can effectively classify cervical cancer patients and distinguish cervical squamous cell carcinoma from cervical adenocarcinoma.
In this Example, there were 17 subjects, including:
Experimental group: 4 patients diagnosed with colorectal cancer by other methods;
Control group: 13 healthy individuals (non-colorectal cancer individuals).
The peripheral blood samples from patients with colorectal cancer are expressed in the form of “P” plus patient number. For example, “P1” represents a peripheral blood sample from the first colorectal cancer patient (“Patient 1”), “P2” represents a peripheral blood sample from the second colorectal cancer patient (“Patient 2”), and so on. In addition, peripheral blood samples from healthy individuals are expressed in the form of “H” plus individual number. For example, “H1” represents the peripheral blood sample from the first healthy individual, “H2” represents the peripheral blood sample from the second healthy individual, and so on.
The basic information of the 4 patients with colorectal cancer is shown in Table 7. Colorectal cancer type, e.g., “adenocarcinoma”, refers to the type of colorectal cancer diagnosed by other methods.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
Specifically, the read counts of the gene regions of 4 primary colorectal cancer samples and 13 healthy female samples were selected, and 903 differential genes that distinguish the two classes of samples were screened out by ANOVA test. Unsupervised hierarchical clustering of the two classes of samples was then carried out according to Pearson correlation, and it was found that there were significant differences between the two classes of samples.
As shown in
The list of 903 differential genes is shown in Table 8. Each gene corresponds to each row from top to bottom in
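The screening and clustering steps described in this Example can be illustrated with a minimal sketch. It is not the pipeline of Example 5: the gene names, read counts, the F-statistic cutoff `F_CUTOFF`, the use of average linkage and the helper names `anova_f`, `pearson` and `average_linkage` are all assumptions introduced for illustration only. The sketch computes a one-way ANOVA F statistic per gene, keeps genes above the assumed cutoff, and then groups samples by agglomerative clustering with distance 1 minus the Pearson correlation.

```python
from itertools import combinations
from math import sqrt


def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for one gene measured in two groups."""
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering with distance 1 - Pearson r (average linkage)."""
    dist = {
        frozenset(pair): 1 - pearson(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(profiles, 2)
    }

    def linkage(pair):
        a, b = pair
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest mean pairwise distance.
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters


# Hypothetical read counts: one row per gene, one column per sample.
samples = ["P1", "P2", "P3", "H1", "H2", "H3"]
counts = {
    "GENE_A": [90, 85, 95, 10, 12, 8],
    "GENE_B": [5, 7, 6, 60, 66, 58],
    "GENE_C": [30, 28, 33, 31, 29, 32],
    "GENE_D": [70, 75, 68, 20, 25, 22],
}
F_CUTOFF = 20.0  # assumed cutoff; the source does not state a threshold

selected = [
    gene for gene, row in counts.items()
    if anova_f(row[:3], row[3:]) >= F_CUTOFF  # first 3 columns: patients
]
profiles = {s: [counts[g][i] for g in selected] for i, s in enumerate(samples)}
clusters = average_linkage(profiles, n_clusters=2)
print(selected)
print(clusters)
```

In this toy input the uninformative gene is dropped by the screen, and the remaining genes separate the "P" samples from the "H" samples into two clusters. A real analysis would typically use a p-value from the F distribution and normalized counts; those choices are left open here because the text does not specify them.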
In this Example, there were 10 patients with colorectal cancer, including 5 patients diagnosed with colon cancer and 5 patients diagnosed with rectal cancer by other methods.
The peripheral blood samples from the above patients are expressed in the form of “P” plus patient number. For example, “P1” represents a peripheral blood sample from the first colorectal cancer patient (“Patient 1”), “P2” represents a peripheral blood sample from the second colorectal cancer patient (“Patient 2”), and so on.
The basic information of 10 colorectal cancer patients is shown in Table 9. Colorectal cancer type, e.g., “adenocarcinoma”, refers to the type of colorectal cancer diagnosed by other methods.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
Specifically, 97 differential genes were screened out by ANOVA test from the read counts of the gene regions of 5 colon cancer samples and 5 rectal cancer samples. Unsupervised hierarchical clustering of the two classes of samples according to Pearson correlation then showed that there were significant differences between the two classes of samples.
As shown in
The list of 97 differential genes is shown in Table 10. Each gene corresponds to each row from top to bottom in
Using the classifier (903 genes) constructed in Example 11 for clustering healthy individuals and colorectal cancer patients, four unknown samples from four subjects were predicted.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
After testing, two of the four samples were found to be at high risk of colorectal cancer (risk probabilities all above 90%) and two were found to be at low risk of colorectal cancer (risk probabilities all below 5%). Tracing the samples back to their sources showed that the two samples with high risk of colorectal cancer were obtained from patients diagnosed with colorectal cancer by other diagnostic methods, and the two samples with low risk of colorectal cancer were obtained from individuals confirmed to be healthy by other diagnostic methods.
The result is shown in
Therefore, the method and gene classifier of the present disclosure can effectively distinguish colorectal cancer patients from healthy individuals.
Erythrocyte micronuclei DNA and genomic DNA of peripheral blood mononuclear cells of each subject were obtained as described in Examples 1-4, and bioinformatics analysis was carried out as described in Example 5.
Using the classifier (97 genes) constructed in Example 12 for clustering colon cancer and rectal cancer patients, four colorectal cancer samples with unknown classification were predicted.
After testing, two of the four samples were found to be at high risk of colon cancer (risk probabilities all above 85%) and two were found to be at low risk of colon cancer (risk probabilities all below 5%). Tracing the samples back to their sources showed that the two samples with high risk of colon cancer were obtained from patients with colon cancer as determined by other diagnostic methods, and the two samples with low risk of colon cancer were obtained from subjects diagnosed with rectal cancer by other diagnostic methods.
The result is shown in
Therefore, the method and gene classifier of the present disclosure can effectively classify colorectal cancer patients and distinguish colon cancer from rectal cancer.
We randomly assigned HD and cancer samples to a training set (70%, n=236) for model development, a validation set (10%, n=34) for hyper-parameter selection, and a test set (20%, n=68) for model validation. Our results showed that 91% (95% confidence interval 84-100%) of cancer patients, including 85% of LC, 100% of CRC and 90% of HCC patients, were detected at 99% specificity. This includes 86% of patients with stage I, 92% of patients with stage II and 100% of patients with stage III cancers (Table 14). These data suggest the presence of specific rbcDNA signatures that can differentiate between healthy donors and cancer patients. We next tested the efficacy of rbcDNA in differentiating among specific cancer types. rbcDNA signatures exhibited high discriminatory performance in pairwise comparisons of healthy and cancer groups: 90% (95% confidence interval 68-100%) of HCC patients, 100% (95% confidence interval 100-100%) of CRC patients and 85% (95% confidence interval 70-100%) of LC patients were detected at 95% specificity (Table 15). Moreover, pairwise and multiclass tests showed overall high accuracy in detecting specific cancers, indicating significant discriminative power of rbcDNA profiles (
The result is shown in Table 14; the list of differential rbcDNA signatures is shown in Table 16.
The result is shown in Table 15; the lists of differential rbcDNA signatures are shown in Table 17 for HD vs. LC, Table 18 for HD vs. CRC, and Table 19 for HD vs. HCC.
The result is shown in
Table 14 shows accuracy ratios of the pan-cancer deep neural network classification in the test set for each cancer type, including the corresponding sensitivity at 99% specificity (CI, confidence interval).
Table 15 shows accuracy ratios of the cancer-type-specific deep neural network classification in the test set for each cancer type, including the corresponding sensitivity at 95% specificity (CI, confidence interval).
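The evaluation scheme described above (a random split into training, validation and test sets, followed by reporting sensitivity at a fixed specificity) can be sketched as follows. This is a minimal illustration under stated assumptions and not the deep neural network pipeline of the disclosure: the helper names `split_by_sizes` and `sensitivity_at_specificity`, the random seed, and the toy scores and labels are hypothetical. Only the subset sizes (236, 34 and 68) are taken from the text.

```python
import random


def split_by_sizes(sample_ids, sizes, seed=0):
    """Shuffle sample IDs and cut them into consecutive, non-overlapping subsets."""
    sample_ids = list(sample_ids)
    if sum(sizes) != len(sample_ids):
        raise ValueError("subset sizes must add up to the number of samples")
    random.Random(seed).shuffle(sample_ids)
    subsets, start = [], 0
    for size in sizes:
        subsets.append(sample_ids[start:start + size])
        start += size
    return subsets


def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, sensitivity, specificity) at the lowest score
    threshold whose specificity reaches the target.

    scores are predicted cancer probabilities; labels are 1 for cancer and
    0 for healthy. A sample is called positive when score > threshold.
    """
    healthy = [s for s, y in zip(scores, labels) if y == 0]
    cancer = [s for s, y in zip(scores, labels) if y == 1]
    for threshold in sorted(set(scores)):
        specificity = sum(s <= threshold for s in healthy) / len(healthy)
        if specificity >= target_specificity:
            sensitivity = sum(s > threshold for s in cancer) / len(cancer)
            return threshold, sensitivity, specificity
    raise ValueError("target specificity cannot be reached")


# Subset sizes reported in the text: 236 training, 34 validation, 68 test.
train_ids, val_ids, test_ids = split_by_sizes(range(338), [236, 34, 68], seed=0)

# Hypothetical classifier outputs for eight test samples (1 = cancer, 0 = healthy).
toy_scores = [0.02, 0.05, 0.10, 0.30, 0.20, 0.85, 0.90, 0.95]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, sensitivity, specificity = sensitivity_at_specificity(
    toy_scores, toy_labels, target_specificity=1.0
)
print(len(train_ids), len(val_ids), len(test_ids))
print(threshold, sensitivity, specificity)
```

With only four healthy toy samples, any target above 75% specificity can only be met at 100%, so the example uses a target of 1.0. On a test set of realistic size the same function could be called with 0.99 or 0.95 to mirror the operating points reported in Tables 14 and 15.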
Genome-wide sequencing profiles revealed that rbcDNA signals are distributed across autosomal chromosomes with specific patterns distinct from those of the corresponding genomic DNA (gDNA) (
It can be clearly seen from the above examples that the inventors have successfully isolated the micronuclei DNA of peripheral red blood cells, and constructed a classifier for cancer detection by using the micronuclei DNA of peripheral red blood cells, thus realizing the effective detection of cancer, which is of great significance for clinical screening, diagnosis, classification and staging of cancer.
Although the specific embodiments of the present invention have been described in detail, it will be understood by those skilled in the art that various modifications and variations can be made to the details in light of all the teachings disclosed, and these changes are within the scope of protection of the present invention. The full scope of the invention is provided by the appended claims and any equivalents thereof.
Number | Date | Country | Kind |
---|---|---|---|
PCT/CN2020/090545 | May 2020 | WO | international |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/093919 | 5/14/2021 | WO |