The present invention relates to a method for detecting Parkinson's disease by using a Parkinson's disease marker.
Pathologically, Parkinson's disease is a progressive neurodegenerative disease characterized mainly by the formation of Lewy bodies having α-synuclein aggregates as a main component, the degeneration of dopaminergic neurons in the substantia nigra of the midbrain, and cell death. Clinically, it is characterized mainly by movement disorders such as muscle stiffness, tremor, hypokinesis, or gait disturbance.
Parkinson's disease is the second most common neurodegenerative disease after Alzheimer's disease. Its prevalence is 120 to 130 per 100,000 people, and it is estimated that there are approximately 140,000 patients in Japan.
At present, there exists no definitive therapy for Parkinson's disease. It is considered important for QOL maintenance to control symptoms by symptomatic therapy based on the supplementation of L-DOPA or the like.
However, subjective symptoms of movement disorder appear in an intermediate stage thereof or later. Thus, there is a demand for early diagnosis and early intervention of the disease.
For example, the detection of α-synuclein accumulation as well as the detection of microRNA derived from circulating serum (Patent Literature 1) and the measurement of the concentration ratio of tyrosine to phenylalanine in blood (Patent Literature 2) have been proposed as biomarkers for detecting Parkinson's disease. It has also been reported that: the formation of α-synuclein aggregates is observed in the skin, as in the brain, of Parkinson's disease patients (Non Patent Literature 1); and Parkinson's disease patients manifest skin diseases or symptoms such as seborrheic dermatitis, melanoma, bullous pemphigoid, or rosacea (Non Patent Literature 2). Although skin conditions are thus considered to be related in some way to Parkinson's disease, the scientific basis of this relation is unknown.
Meanwhile, techniques of examining current or future physiological states in vivo in humans by the analysis of nucleic acids such as DNA or RNA in biological samples have been developed. The analysis using nucleic acids has the advantages that: exhaustive analysis methods have already been established and abundant information can be obtained by one analysis; and the functional connection of analysis results is easily performed on the basis of many research reports on single-nucleotide polymorphism, RNA functions, and the like. Nucleic acids derived from a biological origin can be extracted from body fluids such as blood, secretions, tissues, and the like. It has recently been reported that: RNA contained in skin surface lipids (SSL) can be used as a biological sample for analysis; and marker genes of the epidermis, the sweat gland, the hair follicle and the sebaceous gland can be detected from SSL (Patent Literature 3).
(Patent Literature 1) JP-A-2019-506183
(Patent Literature 2) JP-A-2016-75644
(Patent Literature 3) WO 2018/008319
(Non Patent Literature 1) Rodriguez-Leyva I et al. Ann Clin Transl Neurol. 2014 (modified)
(Non Patent Literature 2) Ravn A H et al. Clin Cosmet Investig Dermatol. 2017
The present invention relates to the following 1) to 3).
1) A method for detecting Parkinson's disease in a test subject, comprising a step of measuring an expression level of at least one gene selected from the group of 4 genes consisting of SNORA16A, SNORA24, SNORA50 and REXO1L2P or an expression product thereof in a biological sample collected from the test subject.
2) A test kit for detecting Parkinson's disease, the kit being used in a method according to 1), and comprising an oligonucleotide which specifically hybridizes to the gene, or an antibody which recognizes an expression product of the gene.
3) A marker for detecting Parkinson's disease comprising at least one gene selected from the groups of genes shown in Tables 3-1 to 3-4 and Tables 6-1 and 6-2 or an expression product thereof.
The present invention relates to the provision of a marker for detecting Parkinson's disease and a method for detecting Parkinson's disease by using the marker.
The present inventors collected SSL from the skin of Parkinson's disease patients and healthy subjects and exhaustively analyzed the expression state of RNA contained in the SSL as sequence information, and consequently found that the expression levels of particular genes significantly differ therebetween and Parkinson's disease can be detected on the basis of this index.
The present invention enables Parkinson's disease to be conveniently and noninvasively detected in an early stage with high accuracy, sensitivity and specificity.
All patent literatures, non patent literatures, and other publications cited herein are incorporated herein by reference in their entirety.
In the present invention, the term “nucleic acid” or “polynucleotide” means DNA or RNA. The DNA includes all of cDNA, genomic DNA, and synthetic DNA. The “RNA” includes all of total RNA, mRNA, rRNA, tRNA, non-coding RNA and synthetic RNA.
In the present invention, the “gene” encompasses double-stranded DNA including human genomic DNA as well as single-stranded DNA including cDNA (positive strand), single-stranded DNA having a sequence complementary to the positive strand (complementary strand), and their fragments, and means matter containing some biological information in sequence information on bases constituting DNA.
The “gene” encompasses not only a “gene” represented by a particular nucleotide sequence but a nucleic acid encoding a congener (i.e., a homolog or an ortholog), a variant such as gene polymorphism, and a derivative thereof.
The names of genes disclosed herein follow the Official Symbol described in NCBI (www.ncbi.nlm.nih.gov/). Meanwhile, gene ontology (GO) follows the Pathway ID described in String (string-db.org/).
In the present invention, the “expression product” of a gene conceptually encompasses a transcription product and a translation product of the gene. The “transcription product” is RNA resulting from the transcription of the gene (DNA), and the “translation product” means a protein which is encoded by the gene and translationally synthesized on the basis of the RNA.
In the present invention, the “Parkinson's disease” means an idiopathic and progressive disease which has the degeneration of dopaminergic neurons in the substantia nigra pars compacta as a main lesion and manifests three motor symptoms (tremor at rest, rigidity, and bradykinesia or akinesia) in a slowly progressive manner.
In the present invention, the “detection” of Parkinson's disease means to elucidate the presence or absence of Parkinson's disease and may be used interchangeably with the term “test”, “measurement”, “determination”, “evaluation” or “assistance of evaluation”. In the present specification, the term “determination” or “evaluation” does not include determination or evaluation by a physician.
The 4 genes consisting of SNORA16A, SNORA24, SNORA50 and REXO1L2P according to the present invention are genes selected from the 33 genes described in Table A given below for which the expression level of SSL-derived RNA was found to be significantly increased (UP) or decreased (DOWN) in Parkinson's disease patients compared with healthy subjects, as shown in Examples mentioned later. The 4 genes are genes whose relation to Parkinson's disease has previously been unknown (indicated by boldface in the table).
REXO1L2P
SNORA16A
SNORA24
SNORA50
The 33 genes shown in Table A were obtained as follows. Data (read count values) on the expression level of RNA extracted from SSL of test subjects of two tests (Test 1: 15 healthy subjects and 15 Parkinson's disease patients; Test 2: 50 healthy subjects and 50 Parkinson's disease patients) were converted to RPM values, which normalize the read count values for differences in the total number of reads among samples. RNA which attained a p value of 0.05 or less in Student's t-test in Parkinson's disease patients compared with healthy subjects was then identified on the basis of values obtained by converting the RPM values to logarithmic values to base 2 (Log2 RPM values) (Test 1: 111 genes with increased expression and 68 genes with decreased expression (a total of 179 genes, Tables 1-1 to 1-5); Test 2: 565 genes with increased expression and 294 genes with decreased expression (a total of 859 genes, Tables 1-6 to 1-27)). Finally, the genes with increased expression (18 genes) and the genes with decreased expression (15 genes) that were common between Test 1 and Test 2 were selected.
Thus, a gene selected from the group consisting of the 179 genes and the 859 genes (a total of 1,005 genes excluding duplicates) or an expression product thereof is capable of serving as a Parkinson's disease marker for detecting Parkinson's disease. Among them, a gene selected from the group consisting of the 33 genes shown in Table A or an expression product thereof is a preferred Parkinson's disease marker.
In Table A and Table 1 mentioned later, the “p value” refers to the probability, under the null hypothesis of a statistical test, of observing a statistic at least as extreme as the statistic actually calculated from the data. Thus, a smaller “p value” can be regarded as indicating a more significant difference between the objects to be compared.
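The selection procedure described above can be illustrated by the following minimal sketch. It is given for explanation only and is not the implementation used in the Examples. The function names and the placeholder gene names in the usage note are hypothetical, the skipping of genes with an RPM value of zero before the logarithmic conversion is an assumption, and the conversion of the t statistic to a p value (which requires the t distribution with n1 + n2 − 2 degrees of freedom) is left to a statistics library.

```python
import math
from statistics import mean, variance


def to_rpm(counts):
    """Convert one sample's read counts (gene -> count) to reads per million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_values(rpm):
    """Log2-transform RPM values; zero values are skipped (an assumption)."""
    return {gene: math.log2(v) for gene, v in rpm.items() if v > 0}


def student_t(a, b):
    """Two-sample Student's t statistic using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def common_genes(test1, test2):
    """Genes flagged in both tests with the same direction ('UP' or 'DOWN')."""
    return {g for g in test1.keys() & test2.keys() if test1[g] == test2[g]}
```

For example, with the placeholder inputs `{"GENE_A": "UP", "GENE_B": "DOWN"}` for Test 1 and `{"GENE_A": "UP", "GENE_B": "UP"}` for Test 2, `common_genes` keeps only `GENE_A`, because `GENE_B` changes in opposite directions in the two tests.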
Genes represented by “UP” are genes whose expression level is increased in Parkinson's disease patients, and genes represented by “DOWN” are genes whose expression level is decreased in Parkinson's disease patients.
The group of the differentially expressed genes described above was found to include genes related to Parkinson's disease (hsa05012) in search for a biological process (BP) and a KEGG pathway by gene ontology (GO) enrichment analysis (see Table 2 mentioned later). Meanwhile, in the group of the differentially expressed genes described above, genes shown in Tables 3-1 to 3-4 mentioned later are genes whose relation to Parkinson's disease has not been reported so far. Thus, at least one gene selected from the group consisting of these genes or an expression product thereof is a novel Parkinson's disease marker for detecting Parkinson's disease. Particularly, at least one gene selected from the group consisting of SNORA16A, SNORA24, SNORA50 and REXO1L2P which are common between Test 1 and Test 2, or an expression product thereof is preferred as a novel Parkinson's disease marker. Two or more genes selected from the group are more preferred, three or more genes selected therefrom are further more preferred, and all of the four genes are even more preferred. It is also preferred to include at least SNORA24, which is included in common in Table A described above and Table B mentioned later.
Differentially expressed RNA may be identified from data (read count values) on the expression level of RNA by using normalized count values obtained by using, for example, DESeq2 (Love M I et al., Genome Biol. 2014) or logarithmic values to base 2 of the count value plus integer 1 (Log2(count+1) value).
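The two kinds of values mentioned above can be sketched as follows. The Log2(count+1) conversion is shown directly. The size-factor function is a simplified re-implementation of the median-of-ratios idea on which DESeq2 normalization is based; it is not the DESeq2 code, and the function names are hypothetical. The sketch assumes that every sample reports the same genes and that at least one gene has a nonzero count in all samples.

```python
import math
from statistics import median


def log2_count_plus_one(count):
    """Log2(count + 1): defined even when the read count is zero."""
    return math.log2(count + 1)


def size_factors(samples):
    """Median-of-ratios size factors for a list of gene -> count dicts.

    Genes with a zero count in any sample are excluded, because their
    geometric mean across samples would be zero.
    """
    genes = [g for g in samples[0] if all(s[g] > 0 for s in samples)]
    log_geo_mean = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in genes
    }
    return [
        math.exp(median(math.log(s[g]) - log_geo_mean[g] for g in genes))
        for s in samples
    ]


def normalized_counts(sample, factor):
    """Divide each read count of one sample by that sample's size factor."""
    return {gene: c / factor for gene, c in sample.items()}
```

When one sample has exactly twice the read counts of another for every gene, the second size factor is twice the first, so the normalized counts of the two samples coincide.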
For example, RNA which attains a corrected p value (FDR) of 0.25 or less in a likelihood ratio test in Parkinson's disease patients compared with healthy subjects is identified by using normalized count values as data on the expression level of RNA extracted from SSL of test subjects of the two tests mentioned above. As a result, 74 genes with increased expression, 209 genes with decreased expression, and a total of 283 genes (Tables 4-1 to 4-8) are obtained in Test 1, and 151 genes with increased expression, 308 genes with decreased expression, and a total of 459 genes (Tables 4-9 to 4-20) are obtained in Test 2. The expression of 7 genes is increased in common between Test 1 and Test 2 (ANXA1, AQP3, EMP1, KRT16, POLR2L, SERPINB4, and SNORA24), and the expression of 10 genes is decreased in common therebetween (ATP6V0C, BHLHE40, CCL3, CCNI, CXCR4, EGR2, GABARAPL1, RHOA, RNASEK, and SERINC1) (a total of 17 genes, Table B).
Thus, a gene selected from the group consisting of the 283 genes and the 459 genes (a total of 725 genes excluding duplicates) or an expression product thereof is capable of serving as a Parkinson's disease marker for detecting Parkinson's disease. Among them, a gene selected from the group consisting of the 17 genes shown in Table B or an expression product thereof is a preferred Parkinson's disease marker. Among them, a gene selected from the group consisting of 11 genes shown in Table C mentioned later, which are common with the genes shown in Table A described above, or an expression product thereof is a more preferred Parkinson's disease marker.
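The correction method behind the corrected p value (FDR) is not specified above. The following sketch assumes the Benjamini-Hochberg procedure, which is a common choice, purely for illustration; the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_by_fdr(pvalues_by_gene, threshold=0.25):
    """Genes whose adjusted p value is at or below the threshold."""
    genes = list(pvalues_by_gene)
    adjusted = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q <= threshold}
```

The default threshold of 0.25 mirrors the value stated above; the raw p values would come from the likelihood ratio test, which is not reproduced here.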
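The totals stated for the two analyses can be checked by simple counting, as sketched below. The sketch assumes that the genes duplicated between Test 1 and Test 2 are exactly the common genes (33 in the analysis based on Log2 RPM values, 17 in the analysis based on normalized count values); the stated totals are consistent with this assumption. The function name is hypothetical.

```python
def union_size(n_test1, n_test2, n_common):
    """Number of distinct genes when n_common genes appear in both tests."""
    return n_test1 + n_test2 - n_common


# Analysis based on Log2 RPM values: 179 and 859 genes, 33 in common.
assert union_size(179, 859, 33) == 1005
# Analysis based on normalized count values: 283 and 459 genes, 17 in common.
assert union_size(283, 459, 17) == 725
```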
In the group of the differentially expressed genes described above, genes shown in Tables 6-1 and 6-2 mentioned later are genes whose relation to Parkinson's disease has not been reported so far. Thus, at least one gene selected from the group consisting of these genes or an expression product thereof is a novel Parkinson's disease marker for detecting Parkinson's disease. Particularly, SNORA24 (indicated by boldface in the table) which is common between Test 1 and Test 2 or an expression product thereof is preferred as a novel Parkinson's disease marker.
SNORA24
The gene capable of serving as a Parkinson's disease marker (hereinafter, also referred to as a “target gene”) also encompasses a gene having a nucleotide sequence substantially identical to the nucleotide sequence of DNA constituting the gene, as long as the gene is capable of serving as a biomarker for detecting Parkinson's disease. In this context, the nucleotide sequence substantially identical means a nucleotide sequence having 90% or higher, preferably 95% or higher, more preferably 98% or higher, further more preferably 99% or higher identity to the nucleotide sequence of DNA constituting the gene, for example, when searched by using homology calculation algorithm NCBI BLAST under conditions of expectation value=10; gap accepted; filtering=ON; match score=1; and mismatch score=−3.
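The identity thresholds above can be illustrated by the following sketch. It is not a BLAST search: it assumes that the two sequences have already been aligned to the same length without gaps, and it simply counts matching positions. The function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two pre-aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def is_substantially_identical(seq_a, seq_b, threshold=90.0):
    """True when identity meets the threshold (90, 95, 98 or 99 percent)."""
    return percent_identity(seq_a, seq_b) >= threshold
```

For example, two aligned 10-base sequences that differ at one position have 90% identity, which satisfies the 90% threshold but not the 95% threshold.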
The method for detecting Parkinson's disease according to the present invention includes a step of measuring an expression level of a target gene, which is in one aspect, at least one gene selected from the group consisting of SNORA16A, SNORA24, SNORA50 and REXO1L2P or an expression product thereof in a biological sample collected from a test subject.
In the method for detecting Parkinson's disease according to the present invention, examples of the test subject from which the biological sample is collected include mammals including humans and nonhuman mammals. A human is preferred. When the test subject is a human, the human is not particularly limited by sex, age, race, and the like thereof and can include infants to elderly people. Preferably, the test subject is a human who needs or desires detection of Parkinson's disease. The test subject is, for example, a human suspected of developing Parkinson's disease or a human having a genetic predisposition to develop Parkinson's disease.
The biological sample used in the present invention can be a tissue or a biomaterial in which the expression of the gene of the present invention varies with the onset or progression of Parkinson's disease. Examples thereof specifically include organs, skin, blood, urine, saliva, sweat, stratum corneum, skin surface lipids (SSL), body fluids such as tissue exudates, serum, plasma and others prepared from blood, feces, and hair, and preferably include the skin, the stratum corneum and skin surface lipids (SSL), more preferably skin surface lipids (SSL). Examples of the site of the skin from which SSL is collected include, but are not particularly limited to, the skin at an arbitrary site of the body, such as the head, the face, the neck, the body trunk, and the limbs. The skin at a site with high sebum secretion, for example, the skin of the head or the face, is preferred, and facial skin is more preferred.
In this context, the “skin surface lipids (SSL)” refer to a lipid-soluble fraction present on the skin surface, and are also referred to as sebum. In general, SSL mainly contains secretions from exocrine glands such as the sebaceous gland in the skin, and is present on the skin surface in the form of a thin layer that covers the skin surface. SSL contains RNA expressed in skin cells (see Patent Literature 3 described above). In the present specification, the “skin” is a generic name for regions containing tissues such as the stratum corneum, the epidermis, the dermis, and the hair follicle as well as the sweat gland, the sebaceous gland and other glands, unless otherwise specified.
Any approach for use in the recovery or removal of SSL from the skin can be adopted for the collection of SSL from the skin of a test subject. Preferably, an SSL-absorbent material or an SSL-adhesive material mentioned later, or a tool for scraping off SSL from the skin can be used. The SSL-absorbent material or the SSL-adhesive material is not particularly limited as long as the material has affinity for SSL. Examples thereof include polypropylene and pulp. More detailed examples of the procedure of collecting SSL from the skin include a method of allowing SSL to be absorbed to a sheet-like material such as an oil blotting paper or an oil blotting film, a method of allowing SSL to adhere to a glass plate, a tape, or the like, and a method of recovering SSL by scraping with a spatula, a scraper, or the like. In order to improve the adsorbability of SSL, an SSL-absorbent material impregnated in advance with a solvent having high lipid solubility may be used. On the other hand, the SSL-absorbent material preferably has a low content of a solvent having high water solubility or water because the adsorption of SSL to a material containing the solvent having high water solubility or water is inhibited. The SSL-absorbent material is preferably used in a dry state. Examples of the site of the skin from which SSL is collected include, but are not particularly limited to, the skin at an arbitrary site of the body, such as the head, the face, the neck, the body trunk, and the limbs. A site having high secretion of sebum, for example, the facial skin, is preferred.
The RNA-containing SSL collected from the test subject may be preserved for a given period. The collected SSL is preferably preserved under low-temperature conditions as rapidly as possible after collection in order to minimize the degradation of contained RNA. The temperature conditions for the preservation of RNA-containing SSL according to the present invention can be 0° C. or lower and are preferably from −20±20° C. to −80±20° C., more preferably from −20±10° C. to −80±10° C., further more preferably from −20±20° C. to −40±20° C., further more preferably from −20±10° C. to −40±10° C., further more preferably −20±10° C., further more preferably −20±5° C. The period of preservation of the RNA-containing SSL under the low-temperature conditions is not particularly limited and is preferably 12 months or shorter, for example, 6 hours or longer and 12 months or shorter, more preferably 6 months or shorter, for example, 1 day or longer and 6 months or shorter, further more preferably 3 months or shorter, for example, 3 days or longer and 3 months or shorter.
In the present invention, examples of the measurement object for the expression level of a target gene or an expression product thereof include cDNA artificially synthesized from RNA, DNA encoding the RNA, a protein encoded by the RNA, a molecule which interacts with the protein, a molecule which interacts with the RNA, and a molecule which interacts with the DNA. In this context, examples of the molecule which interacts with the RNA, the DNA or the protein include DNA, RNA, proteins, polysaccharides, oligosaccharides, monosaccharides, lipids, fatty acids, and their phosphorylation products, alkylation products, and sugar adducts, and complexes of any of them. The expression level comprehensively means the expression level or activity of the gene or the expression product.
In a preferred aspect, in the method of the present invention, SSL is used as a biological sample. In this case, the expression level of RNA contained in SSL is analyzed. Specifically, RNA is converted to cDNA through reverse transcription, followed by the measurement of the cDNA or an amplification product thereof.
In the extraction of RNA from SSL, a method which is usually used in RNA extraction or purification from a biological sample, for example, phenol/chloroform method, AGPC (acid guanidinium thiocyanate-phenol-chloroform extraction) method, a method using a column such as TRIzol®, RNeasy®, or QIAzol®, a method using special magnetic particles coated with silica, a method using magnetic particles for solid phase reversible immobilization, or extraction with a commercially available RNA extraction reagent such as ISOGEN can be used.
In the reverse transcription, primers which target particular RNA to be analyzed may be used, and random primers are preferably used for more comprehensive nucleic acid preservation and analysis. In the reverse transcription, common reverse transcriptase or reverse transcription reagent kit can be used. Highly accurate and efficient reverse transcriptase or reverse transcription reagent kit is suitably used. Examples thereof include M-MLV reverse transcriptase and its modified forms, and commercially available reverse transcriptase or reverse transcription reagent kits, for example, PrimeScript® Reverse Transcriptase series (Takara Bio Inc.) and SuperScript® Reverse Transcriptase series (Thermo Fisher Scientific, Inc.). SuperScript® III Reverse Transcriptase, SuperScript® VILO cDNA Synthesis kit (both from Thermo Fisher Scientific, Inc.), and the like are preferably used.
The temperature of extension reaction in the reverse transcription is adjusted to preferably 42° C.±1° C., more preferably 42° C.±0.5° C., further more preferably 42° C.±0.25° C., while its reaction time is adjusted to preferably 60 minutes or longer, more preferably from 80 to 120 minutes.
In the case of using RNA, cDNA or DNA as a measurement object, the method for measuring the expression level can be selected from nucleic acid amplification methods typified by PCR using DNA primers which hybridize thereto, real-time RT-PCR, multiplex PCR, SmartAmp, and LAMP, hybridization using a nucleic acid probe which hybridizes thereto (DNA chip, DNA microarray, dot blot hybridization, slot blot hybridization, Northern blot hybridization, and the like), a method of determining a nucleotide sequence (sequencing), and combined methods thereof.
In PCR, only particular DNA to be analyzed may be amplified by using a primer pair which targets the particular DNA, or a plurality of DNAs may be amplified by using a plurality of primer pairs. Preferably, the PCR is multiplex PCR. The multiplex PCR is a method of amplifying a plurality of gene regions at the same time by using a plurality of primer pairs at the same time in a PCR reaction system. The multiplex PCR can be carried out by using a commercially available kit (e.g., Ion AmpliSeq Transcriptome Human Gene Expression Kit; Life Technologies Japan Ltd.).
The temperature of annealing and extension reaction in the PCR depends on the primers used and therefore cannot be generalized. In the case of using the multiplex PCR kit described above, the temperature is preferably 62° C.±1° C., more preferably 62° C.±0.5° C., further more preferably 62° C.±0.25° C. Thus, preferably, the annealing and the extension reaction are performed by one step in the PCR. The time of the step of the annealing and the extension reaction can be adjusted depending on the size of DNA to be amplified, and the like, and is preferably from 14 to 18 minutes.
Conditions for denaturation reaction in the PCR can be adjusted depending on the DNA to be amplified, and are preferably from 95 to 99° C. and from 10 to 60 seconds. The reverse transcription and the PCR using the temperatures and the times as described above can be carried out by using a thermal cycler which is generally used for PCR.
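The preferred reaction conditions stated above for the reverse transcription and for the PCR using the multiplex PCR kit can be collected into a simple check, as sketched below. The setting names and the function name are hypothetical, and only the broadest preferred range of each condition is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value):
        return self.low <= value <= self.high


# Broadest preferred ranges taken from the description above.
PREFERRED = {
    "rt_extension_temp_c": Range(41.0, 43.0),           # 42 +/- 1 deg C
    "rt_extension_time_min": Range(60.0, float("inf")),  # 60 minutes or longer
    "anneal_extend_temp_c": Range(61.0, 63.0),           # 62 +/- 1 deg C
    "anneal_extend_time_min": Range(14.0, 18.0),
    "denature_temp_c": Range(95.0, 99.0),
    "denature_time_s": Range(10.0, 60.0),
}


def out_of_range(settings):
    """Names of the given settings that fall outside the preferred ranges."""
    return sorted(
        name for name, value in settings.items()
        if not PREFERRED[name].contains(value)
    )
```

For example, a thermal cycler program with a denaturation temperature of 94° C. would be reported as outside the preferred range, while a reverse transcription of 90 minutes at 42° C. would not.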
The reaction product obtained by the PCR is preferably purified by the size separation of the reaction product. By the size separation, the PCR reaction product of interest can be separated from the primers and other impurities contained in the PCR reaction solution. The size separation of DNA can be performed by using, for example, a size separation column, a size separation chip, or magnetic beads which can be used in size separation. Preferred examples of the magnetic beads which can be used in size separation include magnetic beads for solid phase reversible immobilization (SPRI) such as Ampure XP.
The purified PCR reaction product may be subjected to further treatment necessary for conducting subsequent quantitative analysis. For example, for DNA sequencing, the purified PCR reaction product may be prepared into an appropriate buffer solution, the PCR primer regions contained in DNA amplified by PCR may be cleaved, and an adaptor sequence may be further added to the amplified DNA. For example, the purified PCR reaction product can be prepared into a buffer solution, and the removal of the PCR primer sequences and adaptor ligation can be performed for the amplified DNA. If necessary, the obtained reaction product can be amplified to prepare a library for quantitative analysis. These operations can be performed, for example, by using 5×VILO RT Reaction Mix attached to SuperScript® VILO cDNA Synthesis kit (Life Technologies Japan Ltd.), 5× Ion AmpliSeq HiFi Mix attached to Ion AmpliSeq Transcriptome Human Gene Expression Kit (Life Technologies Japan Ltd.), and Ion AmpliSeq Transcriptome Human Gene Expression Core Panel according to a protocol attached to each kit.
In the case of measuring the expression level of a target gene or a nucleic acid derived therefrom by use of Northern blot hybridization, examples thereof include the following method. Probe DNA is first labeled with a radioisotope, a fluorescent material, or the like. Subsequently, the obtained labeled DNA is allowed to hybridize to biological sample-derived RNA transferred to a nylon membrane or the like in accordance with a routine method. Then, the formed duplex of the labeled DNA and the RNA is measured by detecting a signal derived from the label.
In the case of measuring the expression level of a target gene or a nucleic acid derived therefrom by use of RT-PCR, for example, cDNA is first prepared from biological sample-derived RNA in accordance with a routine method. This cDNA is used as a template, and a pair of primers (a positive strand which binds to the cDNA (− strand) and an opposite strand which binds to a + strand) prepared so as to be able to amplify the target gene of the present invention is allowed to hybridize thereto. Then, PCR is performed in accordance with a routine method, and the obtained amplified double-stranded DNA is detected. In the detection of the amplified double-stranded DNA, for example, a method of detecting labeled double-stranded DNA produced by the PCR by using primers labeled in advance with RI, a fluorescent material, or the like can be used.
In the case of measuring the expression level of a target gene or a nucleic acid derived therefrom by use of a DNA microarray, for example, an array in which at least one nucleic acid (cDNA or DNA) derived from the target gene of the present invention is immobilized on a support is used. Labeled cDNA or cRNA prepared from mRNA is allowed to bind onto the microarray, and the expression level of the mRNA can be measured by detecting the label on the microarray.
The nucleic acid to be immobilized in the array can be a nucleic acid which specifically (i.e., substantially only to the nucleic acid of interest) hybridizes under stringent conditions, and may be, for example, a nucleic acid having the whole sequence of the target gene of the present invention or may be a nucleic acid consisting of a partial sequence thereof. In this context, examples of the “partial sequence” include nucleic acids consisting of at least 15 to 25 bases. In this context, examples of the stringent conditions can usually include washing conditions on the order of “1×SSC, 0.1% SDS, and 37° C.”. Examples of the more stringent hybridization conditions can include conditions on the order of “0.5×SSC, 0.1% SDS, and 42° C.”. Examples of the much more stringent hybridization conditions can include conditions on the order of “0.1×SSC, 0.1% SDS, and 65° C.”. The hybridization conditions are described in, for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press (2001).
In the case of measuring the expression level of a target gene or a nucleic acid derived therefrom by sequencing, examples thereof include analysis using a next-generation sequencer (e.g., Ion S5/XL system, Life Technologies Japan Ltd.). RNA expression can be quantified on the basis of the number of reads (read count) prepared by the sequencing.
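Quantification on the basis of the number of reads can be sketched as follows. The sketch assumes that each read has already been assigned to a gene by an upstream mapping step, which is not shown, and that unassigned reads are represented by `None`. The function name is hypothetical.

```python
from collections import Counter


def read_counts(read_gene_assignments):
    """Count reads per gene, ignoring reads that were not assigned to a gene."""
    return Counter(g for g in read_gene_assignments if g is not None)
```

The resulting read count values are the kind of data that are subsequently normalized, for example to RPM values.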
The probe or the primers for use in the measurement described above, which correspond to the primers for specifically recognizing and amplifying the target gene of the present invention or a nucleic acid derived therefrom, or the probe for specifically detecting the RNA or the nucleic acid derived therefrom, can be designed on the basis of a nucleotide sequence constituting the target gene. In this context, the phrase “specifically recognize” means that a detected product or an amplification product can be confirmed to be the gene or the nucleic acid derived therefrom in such a way that, for example, substantially only the target gene of the present invention or the nucleic acid derived therefrom can be detected in Northern blot, or, for example, substantially only the nucleic acid is amplified in RT-PCR.
Specifically, an oligonucleotide containing a given number of nucleotides complementary to DNA consisting of a nucleotide sequence constituting the target gene of the present invention, or a complementary strand thereof can be used. In this context, the “complementary strand” refers to one strand of double-stranded DNA consisting of A:T (U for RNA) and/or G:C base pairs with respect to the other strand. The term “complementary” is not limited to the case of being a completely complementary sequence in a region with the given number of consecutive nucleotides, and the sequence can have preferably 80% or higher, more preferably 90% or higher, further more preferably 95% or higher identity of the nucleotide sequence. The identity of the nucleotide sequence can be determined by an algorithm such as BLAST described above.
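The notion of a complementary strand and of partial complementarity can be illustrated by the following sketch. It compares an oligonucleotide with the reverse complement of a target region of the same length, position by position, and therefore does not perform an alignment such as BLAST. The function names are hypothetical.

```python
# A pairs with T (or U in RNA), and G pairs with C.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(sequence):
    """DNA reverse complement of a DNA or RNA sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def percent_complementary(oligo, target):
    """Identity (%) between an oligo and the reverse complement of a target."""
    expected = reverse_complement(target)
    if not oligo or len(oligo) != len(expected):
        raise ValueError("oligo and target must be non-empty and equal in length")
    observed = oligo.upper().replace("U", "T")
    matches = sum(a == b for a, b in zip(observed, expected))
    return 100.0 * matches / len(expected)
```

An oligonucleotide with a value of 80 or higher in this sketch would correspond to the broadest preferred identity stated above.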
For use as a primer, the oligonucleotide can achieve specific annealing and strand extension. Examples thereof usually include oligonucleotides having a strand length of 10 or more bases, preferably 15 or more bases, more preferably 20 or more bases, and 100 or less bases, preferably 50 or less bases, more preferably 35 or less bases. For use as a probe, the oligonucleotide can achieve specific hybridization. An oligonucleotide can be used which has at least a portion or the whole of the sequence of DNA (or a complementary strand thereof) consisting of a nucleotide sequence constituting the target gene of the present invention, and has a strand length of, for example, 10 or more bases, preferably 15 or more bases, and, for example, 100 or less bases, preferably 50 or less bases, more preferably 25 or less bases.
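The strand lengths stated above for primers can be expressed as a simple classification, as sketched below. Pairing each lower limit with the corresponding upper limit (10 to 100, 15 to 50, and 20 to 35 bases) is an interpretation made for this sketch, and the function name and tier labels are hypothetical.

```python
def primer_length_tier(oligo):
    """Classify a primer by strand length according to the stated ranges."""
    n = len(oligo)
    if 20 <= n <= 35:
        return "more preferred"
    if 15 <= n <= 50:
        return "preferred"
    if 10 <= n <= 100:
        return "usual"
    return "outside stated range"
```

A probe could be classified in the same way by replacing the narrowest range with the upper limit of 25 bases stated for probes.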
In this context, the “oligonucleotide” can be DNA or RNA and may be synthetic or natural. The probe for use in hybridization is usually labeled for use.
In the case of measuring a translation product (protein) of the target gene of the present invention, a molecule which interacts with the protein, a molecule which interacts with the RNA, or a molecule which interacts with the DNA, a method such as protein chip analysis, immunoassay (e.g., ELISA), mass spectrometry (e.g., LC-MS/MS and MALDI-TOF/MS), one-hybrid method (PNAS 100, 12271-12276 (2003)), or two-hybrid method (Biol. Reprod. 58, 302-311 (1998)) can be used and can be appropriately selected depending on the measurement object.
For example, in the case of using the protein as a measurement object, the measurement may be carried out by contacting an antibody against the expression product of the present invention with a biological sample, detecting a polypeptide in the sample bound to the antibody, and measuring the level thereof. For example, according to Western blot, the antibody described above is used as a primary antibody, and an antibody which binds to the primary antibody and which is labeled with, for example, a radioisotope, a fluorescent material or an enzyme is used as a secondary antibody to label the primary antibody therewith, followed by the measurement of a signal derived from such a labeling material using a radiation meter, a fluorescence detector, or the like.
The antibody against the translation product may be a polyclonal antibody or a monoclonal antibody. These antibodies can be produced in accordance with a method known in the art. Specifically, the polyclonal antibody may be produced by using a protein which has been expressed in E. coli or the like and purified in accordance with a routine method, or by synthesizing a partial polypeptide of the protein in accordance with a routine method, immunizing a nonhuman animal such as a rabbit therewith, and then obtaining the antibody from the serum of the immunized animal in accordance with a routine method.
Meanwhile, the monoclonal antibody can be obtained from hybridoma cells prepared by immunizing a nonhuman animal such as a mouse with a protein which has been expressed in E. coli or the like and purified in accordance with a routine method, or a partial polypeptide of the protein, and fusing the obtained spleen cells with myeloma cells. Alternatively, the monoclonal antibody may be prepared by use of phage display (Griffiths, A. D.; Duncan, A. R., Current Opinion in Biotechnology, Volume 9, Number 1, February 1998, pp. 102-108 (7)).
In this way, the expression level of the target gene of the present invention or the expression product thereof in a biological sample collected from a test subject is measured, and Parkinson's disease is detected on the basis of the expression level. The detection is specifically performed by comparing the measured expression level of the target gene of the present invention or the expression product thereof with a control level.
In the case of analyzing expression levels of a plurality of target genes by sequencing, as described above, read count values which are data on expression levels, RPM values which normalize the read count values for difference in the total number of reads among samples, values obtained by the conversion of the RPM values to logarithmic values to base 2 (Log2 RPM values), or normalized count values obtained by using DESeq2 or logarithmic values to base 2 of the count value plus integer 1 (Log2(count+1) values) are preferably used as an index. Also, values calculated by, for example, fragments per kilobase of exon per million reads mapped (FPKM), reads per kilobase of exon per million reads mapped (RPKM), or transcripts per million (TPM) which are general quantitative values of RNA-seq may be used. Alternatively, signal values obtained by microarray method or corrected values thereof may be used. In the case of analyzing only a particular target gene by RT-PCR or the like, an analysis method of converting the expression level of the target gene to a relative expression level with respect to the expression level of a housekeeping gene as a standard, or a method of analyzing a copy number obtained by absolute quantification using a plasmid containing a region of the target gene is preferred. A copy number obtained by digital PCR may be used.
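By way of a non-limiting illustration, the RPM, Log2 RPM, and Log2(count+1) conversions mentioned above can be sketched in Python as follows. The function names and the example read counts are hypothetical and do not appear in the Examples; returning None for a gene with zero reads is likewise an assumption of this sketch.

```python
import math

def rpm(counts):
    """Scale one sample's read counts to reads per million (RPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_rpm(counts):
    """Log2 RPM values; a gene with zero reads has no defined logarithm (None)."""
    return {gene: (math.log2(v) if v > 0 else None)
            for gene, v in rpm(counts).items()}

def log2_count_plus_one(counts):
    """Log2(count+1) values, defined even for zero counts."""
    return {gene: math.log2(c + 1) for gene, c in counts.items()}

# Hypothetical read counts for a single sample (illustration only).
sample = {"SNORA24": 250, "RHOA": 750, "CXCR4": 0}
print(rpm(sample))
print(log2_rpm(sample))
```

Because RPM divides by the total number of reads of each sample, samples sequenced to different depths become comparable before the logarithm is taken.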
In this context, examples of the “control level” include an expression level of the target gene or the expression product thereof in a healthy person. The expression level of the healthy person may be a statistic (e.g., a mean) of the expression level of the gene or the expression product thereof measured from a healthy person population. For a plurality of target genes, it is preferred to determine a standard expression level of each individual gene or expression product thereof.
The detection of Parkinson's disease according to the present invention may be performed through an increase and/or decrease in the expression level of the target gene of the present invention or the expression product thereof. In this case, the expression level of the target gene or the expression product thereof in a biological sample derived from a test subject is compared with a cutoff value (reference value) of each gene or the expression product thereof. The cutoff value can be appropriately determined on the basis of a statistical numeric value, such as a mean or standard deviation, of the expression level based on the expression level of the target gene or expression product thereof in a healthy subject obtained as a standard data.
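As one possible illustration of a cutoff value based on the mean and standard deviation of a healthy population, the following Python sketch flags a measurement lying more than k standard deviations from the healthy mean. The multiplier k = 2, the function name, and the example values are assumptions made for illustration; the present description does not fix any of them.

```python
from statistics import mean, stdev

def flag_gene(value, healthy_levels, direction="up", k=2.0):
    """Compare a measured level with a cutoff derived from healthy subjects.

    direction="up"   : marker whose expression increases in patients
    direction="down" : marker whose expression decreases in patients
    k                : assumed number of standard deviations (hypothetical)
    """
    m = mean(healthy_levels)
    s = stdev(healthy_levels)  # sample standard deviation
    if direction == "up":
        return value > m + k * s
    return value < m - k * s

# Hypothetical expression levels of one gene in healthy subjects.
healthy = [4.0, 5.0, 6.0]
print(flag_gene(7.5, healthy, "up"))    # above mean + 2 SD (7.0)
print(flag_gene(5.0, healthy, "up"))    # within the healthy range
```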
A discriminant (prediction model) which discriminates between a Parkinson's disease patient and a healthy person is constructed by using measurement values of an expression level of the target gene or the expression product thereof derived from a Parkinson's disease patient and an expression level of the target gene or the expression product thereof derived from a healthy person, and Parkinson's disease can be detected through the use of the discriminant. Specifically, a discriminant (prediction model) which discriminates between a Parkinson's disease patient and a healthy person is constructed by using measurement values of an expression level of a target gene or an expression product thereof derived from a Parkinson's disease patient and an expression level of the target gene or the expression product thereof derived from a healthy subject as teacher samples, and a cutoff value (reference value) which discriminates between the Parkinson's disease patient and the healthy person is determined on the basis of the discriminant. In the preparation of the discriminant, dimensional compression is performed by principal component analysis (PCA), and a principal component can be used as an explanatory variable.
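The dimensional compression by principal component analysis mentioned above can be illustrated, purely as a sketch, by the following Python code, which extracts only the first principal component of a samples-by-genes matrix. Power iteration on the covariance matrix is used here because it needs no external library; this is a substitute technique chosen for illustration and is not stated to be the procedure used in the Examples. All names and the example data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, per-sample scores) of the first principal component.

    rows: list of samples, each a list of expression values (one per gene).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (genes x genes).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical two-gene data in which the second gene is twice the first.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_principal_component(data)
print(direction, scores)
```

The scores returned for each sample are the values that would be supplied to a discriminant as an explanatory variable in place of the original expression levels.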
The presence or absence of Parkinson's disease in a test subject can be evaluated by similarly measuring a level of the target gene or the expression product thereof from a biological sample collected from the test subject, substituting the obtained measurement value into the discriminant, and comparing the results obtained from the discriminant with the reference value.
In this context, an algorithm known in the art, such as an algorithm for use in machine learning, can be used as the algorithm in the construction of the discriminant. Examples of the machine learning algorithm include random forest, linear kernel support vector machine (SVM linear), rbf kernel support vector machine (SVM rbf), neural network, generalized linear model, regularized linear discriminant analysis, and regularized logistic regression. A predictive value is calculated by inputting verification data into the constructed prediction model, and the model whose predictive values are most compatible with the actually measured values can be selected as the optimum prediction model. For example, recall, precision, and an F value which is the harmonic mean thereof are calculated from the predictive values and the actually measured values, and the model having the largest F value can be selected as the optimum prediction model.
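For illustration, recall, precision, and the F value, together with the selection of the model having the largest F value, can be sketched in Python as follows. The label strings, the model names, and the example predictions are hypothetical.

```python
def recall_precision_f(actual, predicted, positive="PD"):
    """Return (recall, precision, F value) for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    # The F value is the harmonic mean of recall and precision.
    return recall, precision, 2 * recall * precision / (recall + precision)

def select_best_model(actual, predictions_by_model):
    """Pick the model name whose predictions give the largest F value."""
    return max(predictions_by_model,
               key=lambda name: recall_precision_f(actual, predictions_by_model[name])[2])

# Hypothetical verification data: measured labels and two models' predictions.
actual = ["PD", "PD", "PD", "HL", "HL"]
predictions = {
    "model_a": ["PD", "PD", "HL", "PD", "HL"],
    "model_b": ["PD", "PD", "PD", "HL", "HL"],
}
print(recall_precision_f(actual, predictions["model_a"]))
print(select_best_model(actual, predictions))
```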
The method for determining the cutoff value (reference value) is not particularly limited, and the value can be determined in accordance with an approach known in the art. The value can be determined from, for example, an ROC (receiver operating characteristic) curve prepared by using the discriminant. In the ROC curve, the probability (%) of producing positive results in positive patients (sensitivity) is plotted on the ordinate against a value (false positive rate) of 1 minus the probability (%) of producing negative results in negative patients (specificity) on the abscissa. As for “true positive (sensitivity)” and “false positive (1−specificity)” shown in the ROC curve, a value at which “true positive (sensitivity)”−“false positive (1−specificity)” is maximized (Youden index) can be used as the cutoff value (reference value).
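The determination of a cutoff value by the Youden index can be sketched, as one possible illustration, in the following Python code. Each observed score is tried as a threshold, and the threshold maximizing sensitivity minus (1 − specificity) is returned. The convention that a score at or above the threshold is positive, the function name, and the example scores are assumptions of this sketch.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, Youden index) maximizing sensitivity - (1 - specificity).

    scores: discriminant outputs; labels: 1 for patient, 0 for healthy.
    A sample is called positive when its score is >= the cutoff (assumed).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both patients and healthy subjects are required")
    best_cutoff, best_j = None, None
    for t in sorted(set(scores)):
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        j = tp / n_pos - fp / n_neg  # sensitivity - false positive rate
        if best_j is None or j > best_j:
            best_cutoff, best_j = t, j
    return best_cutoff, best_j

# Hypothetical discriminant outputs for 4 healthy subjects and 4 patients.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(youden_cutoff(scores, labels))
```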
As shown in Examples mentioned later, prediction models were constructed by use of machine learning algorithm by using a value of each principal component obtained from expression level data (Log2 RPM values) on target genes shown in Table A (33 genes or 4 genes selected therefrom) as an explanatory variable, and the healthy subjects and the Parkinson's disease patients as objective variables. As a result, Parkinson's disease was found predictable with the model by using the 4 genes SNORA16A, SNORA24, SNORA50, and REXO1L2P. Also, Parkinson's disease was found predictable more accurately with the model by using the 33 genes.
Thus, in the case of preparing the discriminant which discriminates between a Parkinson's disease patient group and a healthy person group, a discriminant which exhibits high recall and precision can be prepared by appropriately adding, to expression data on the 4 target genes SNORA16A, SNORA24, SNORA50 and REXO1L2P, expression data on at least one gene selected from the group consisting of the remaining 29 genes shown in Table A or an expression product thereof as a target gene, preferably by adding thereto an appropriate number of genes with high variable importance based on the variable importance shown in Table 8 mentioned later. Parkinson's disease can thereby be detected with higher accuracy. Specifically, addition of the 8 genes EGR2, RHOA, CCNI, RNASEK, CSF2RB, SERP1, ANKRD12, and SLC25A3 is preferred. Further, addition of 12 genes consisting of these 8 genes and the 4 genes CD83, CXCR4, ITGAX, and UQCRH is preferred, and addition of 18 genes consisting of these 12 genes and the 6 genes KCNQ1OT1, CCL3, C10orf116, SERPINB4, LCE3D, and CNFN is preferred. It is preferred to add all of the 29 genes.
Alternatively, expression data on at least one gene, except for SNORA24, selected from the group consisting of 11 genes which are shown as differentially expressed genes in both Table A and Table B described above, and shown in Table C given below, or an expression product thereof may be appropriately added as a target gene to the 4 target genes SNORA16A, SNORA24, SNORA50 and REXO1L2P.
Expression data on at least one gene selected from the group consisting of genes shown in Table B or an expression product thereof may be used as a target gene for use in preparing the discriminant which discriminates between a Parkinson's disease patient group and a healthy person group. Preferably, SNORA24 as well as at least one of the other genes is used. More preferably, expression data on genes shown in Table C or expression products thereof is used. Further more preferably, expression data on all the genes shown in Table B or expression products thereof is used.
The test kit for detecting Parkinson's disease according to the present invention contains a test reagent for measuring an expression level of the target gene of the present invention or an expression product thereof in a biological sample separated from a patient.
Specific examples thereof include a reagent for nucleic acid amplification and hybridization containing an oligonucleotide (e.g., a primer for PCR) which specifically binds (hybridizes) to the target gene of the present invention or a nucleic acid derived therefrom, and a reagent for immunoassay containing an antibody which recognizes an expression product (protein) of the target gene of the present invention. The oligonucleotide, the antibody, or the like contained in the kit can be obtained by a method known in the art as mentioned above.
The test kit may contain, in addition to the antibody or the nucleic acid, a labeling reagent, a buffer solution, a chromogenic substrate, a secondary antibody, a blocking agent, an instrument necessary for a test, a control, a tool for collecting a biological sample (e.g., an oil blotting film for collecting SSL), and the like.
Aspects and preferred embodiments of the present invention will be given below.
<1> A method for detecting Parkinson's disease in a test subject, comprising a step of measuring an expression level of at least one gene selected from the group of 4 genes consisting of SNORA16A, SNORA24, SNORA50 and REXO1L2P or an expression product thereof in a biological sample collected from the test subject.
<2> The method for detecting Parkinson's disease according to <1>, wherein the method at least comprises measuring an expression level of SNORA24 gene or an expression product thereof.
<3> The method according to <1> or <2>, wherein the expression level of the gene or the expression product thereof is measured as an expression level of mRNA.
<4> The method according to any of <1> to <3>, wherein the gene or the expression product thereof is RNA contained in skin surface lipids of the test subject.
<5> The method according to any of <1> to <4>, wherein the presence or absence of Parkinson's disease is evaluated by comparing the measurement value of the expression level with a reference value of the gene or the expression product thereof.
<6> The method according to any of <1> to <4>, wherein the presence or absence of Parkinson's disease in the test subject is evaluated by the following steps: preparing a discriminant which discriminates between the Parkinson's disease patient and a healthy person by using measurement values of an expression level of the gene or the expression product thereof derived from a Parkinson's disease patient and an expression level of the gene or the expression product thereof derived from a healthy subject as teacher samples; substituting the measurement value of the expression level of the gene or the expression product thereof obtained from the biological sample collected from the test subject into the discriminant; and comparing the obtained results with a reference value.
<7> The method according to <6>, wherein expression levels of all the genes of the group of 4 genes or expression products thereof are measured.
<8> The method according to <6> or <7>, wherein expression levels of the at least one gene selected from the group of 4 genes as well as at least one gene selected from the following group of 29 genes or expression products thereof are measured:
ANKRD12, C10orf116, CCL3, CCNI, CD83, CNFN, CNN2, CSF2RB, CXCR4, EGR2, EMP1, ITGAX, KCNQ1OT1, LCE3D, LITAF, NDUFA4L2, NDUFS5, POLR2L, RHOA, RNASEK, RPL7A, RPS26, SERINC1, SERP1, SERPINB4, SLC25A3, SNRPG, SRRM2, and UQCRH.
<9> The method according to <8>, wherein expression levels of the at least one gene selected from the group of 4 genes as well as at least one gene selected from the following group of 10 genes or expression products thereof are measured:
CCL3, CCNI, CXCR4, EGR2, EMP1, POLR2L, RHOA, RNASEK, SERINC1, and SERPINB4.
<10> The method according to <6> or <7>, wherein expression levels of the at least one gene selected from the group of 4 genes as well as at least one gene selected from the following group of 16 genes or expression products thereof are measured:
ANXA1, AQP3, ATP6V0C, BHLHE40, CCL3, CCNI, CXCR4, EGR2, EMP1, GABARAPL1, KRT16, POLR2L, RHOA, RNASEK, SERINC1, and SERPINB4.
<11> The method according to <6> or <7>, wherein expression levels of the at least one gene selected from the group of 4 genes as well as at least one gene selected from the groups of genes shown in Tables 3-1 to 3-4 mentioned later and Tables 6-1 and 6-2 mentioned later (except for the 4 genes) or expression products thereof are measured.
<12> The method according to <6> or <7>, wherein expression levels of the at least one gene selected from the group of 4 genes as well as at least one gene selected from the groups of 1,005 genes shown in Tables 1-1 to 1-27 mentioned later and 725 genes shown in Tables 4-1 to 4-20 mentioned later except for the 4 genes or expression products thereof are measured.
<13> A test kit for detecting Parkinson's disease, the kit being used in a method according to any of <1> to <10>, and comprising an oligonucleotide which specifically hybridizes to the gene or a nucleic acid derived therefrom, or an antibody which recognizes an expression product of the gene.
<14> Use of at least one gene selected from the groups of genes shown in Tables 3-1 to 3-4 mentioned later and Tables 6-1 and 6-2 mentioned later or an expression product thereof as a marker for detecting Parkinson's disease.
<15> Use of at least one gene selected from the group of 4 genes consisting of SNORA16A, SNORA24, SNORA50 and REXO1L2P or an expression product thereof as a marker for detecting Parkinson's disease.
<16> A marker for detecting Parkinson's disease comprising at least one gene selected from the groups of genes shown in Tables 3-1 to 3-4 mentioned later and Tables 6-1 and 6-2 mentioned later or an expression product thereof.
<17> The marker for detecting Parkinson's disease according to <16>, wherein the detection marker comprises at least one gene selected from the group of 4 genes consisting of SNORA16A, SNORA24, SNORA50 and REXO1L2P or an expression product thereof.
Hereinafter, the present invention will be described in more detail with reference to Examples. However, the present invention is not limited by these examples.
1) SSL Collection
Two tests were conducted as the following Test 1 and Test 2.
Test 1: 15 healthy subjects (from 40 to 89 years old, male and female) and 15 Parkinson's disease patients (PD) (from 40 to 89 years old, male and female) were selected as test subjects.
Test 2: 50 healthy subjects (from 40 to 89 years old, male) and 50 PD (from 40 to 89 years old, male) were selected as test subjects.
Each PD patient had been diagnosed in advance with Parkinson's disease (Hoehn & Yahr stage I or II) by a neurologist. Sebum was recovered from the whole face of each test subject by using an oil blotting film (5×8 cm, made of polypropylene, 3M Company). Then, the oil blotting film was transferred to a vial and preserved at −80° C. for approximately 1 month until use in RNA extraction.
2) RNA Preparation and Sequencing
The oil blotting film of the above section 1) was cut into an appropriate size, and RNA was extracted by using QIAzol Lysis Reagent (Qiagen N.V.) in accordance with the attached protocol. On the basis of the extracted RNA, cDNA was synthesized through reverse transcription at 42° C. for 90 minutes by using SuperScript VILO cDNA Synthesis kit (Life Technologies Japan Ltd.). The primers used for reverse transcription reaction were random primers attached to the kit. A library containing DNA derived from 20802 genes was prepared by multiplex PCR from the obtained cDNA. The multiplex PCR was performed by using Ion AmpliSeq Transcriptome Human Gene Expression Kit (Life Technologies Japan Ltd.) under conditions of [99° C., 2 min→(99° C., 15 sec→62° C., 16 min)×20 cycles→4° C., hold]. The obtained PCR product was purified with Ampure XP (Beckman Coulter Inc.), followed by buffer reconstitution, primer sequence digestion, adaptor ligation, purification, and amplification to prepare a library. The prepared library was loaded on Ion 540 Chip and sequenced by using Ion S5/XL system (Life Technologies Japan Ltd.).
3) Data Analysis
i) RNA Expression Analysis—1
In the data (read count values) on the expression level of RNA derived from the test subjects measured in the above section 2), data with a read count of less than 10 was treated as missing values. After conversion to RPM values which normalized the read count values for difference in the total number of reads among samples, the missing values were compensated for by use of an approach called singular value decomposition (SVD) imputation. However, only genes for which expression level data without missing values was obtained in 80% or more of all the samples were used in the analysis given below. In the analysis, the RPM values were converted to logarithmic values to base 2 (Log2 RPM values) in order to approximate the RPM values, which followed a negative binomial distribution, to a normal distribution.
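The treatment of low read counts as missing values and the 80% detection criterion described above can be sketched in Python as follows. The SVD imputation step is deliberately omitted from this sketch, and the function names and example counts are hypothetical.

```python
def mask_low_counts(counts, min_reads=10):
    """Treat read counts below min_reads as missing (None).

    counts: {gene: [read count in each sample]}
    """
    return {gene: [c if c >= min_reads else None for c in row]
            for gene, row in counts.items()}

def keep_well_detected(masked, min_fraction=0.8):
    """Keep genes with non-missing data in at least min_fraction of the samples."""
    kept = {}
    for gene, row in masked.items():
        present = sum(v is not None for v in row)
        if present / len(row) >= min_fraction:
            kept[gene] = row
    return kept

# Hypothetical read counts for two genes across five samples.
counts = {
    "GENE_A": [12, 30, 9, 25, 40],   # detected in 4 of 5 samples: kept
    "GENE_B": [3, 0, 15, 2, 11],     # detected in 2 of 5 samples: dropped
}
filtered = keep_well_detected(mask_low_counts(counts))
print(filtered)
```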
Differentially expressed RNA which attained a p value of 0.05 or less in Student's t-test in PD compared with the healthy subjects was identified on the basis of the SSL-derived RNA expression levels (Log2 RPM values) of the healthy subjects and PD described above. In Test 1, the expression of 111 RNAs was increased in PD compared with the healthy subjects (Tables 1-1 to 1-3), and the expression of 68 RNAs was decreased therein (Tables 1-4 to 1-5). Meanwhile, in Test 2, the expression of 565 RNAs was increased (Tables 1-6 to 1-19), and the expression of 294 RNAs was decreased (Tables 1-20 to 1-27). The expression of 18 RNAs was increased in common between Test 1 and Test 2, and the expression of 15 RNAs was decreased in common therebetween (genes indicated by boldface in the tables).
C10orf116    1.346318684     0.027914601    UP
CNFN         1.272119089     0.024347366    UP
EMP1         1.428292097     0.010451584    UP
KCNQ1OT1     1.644199329     0.04140012     UP
LCE3D        1.525057902     0.017729736    UP
NDUFA4L2     1.469853745     0.047011103    UP
NDUFS5       1.173366098     0.028285636    UP
POLR2L       1.288069119     0.005102443    UP
REXO1L2P     2.258334633     0.021096388    UP
RPL7A        0.799765552     0.040024088    UP
RPS26        1.048925589     0.020173699    UP
SERPINB4     1.73450959      0.048165225    UP
SLC25A3      0.683663369     0.040816858    UP
SNORA16A     1.233214856     0.005217419    UP
SNORA24      1.397191537     0.001016782    UP
SNORA50      1.299388426     0.010607324    UP
SNRPG        1.989577925     0.002505629    UP
UQCRH        1.081064513     0.010579673    UP
ANKRD12      −1.930754522    0.010768568    DOWN
CCL3         −1.639111096    0.008678309    DOWN
CCNI         −1.932387295    0.00403203     DOWN
CD83         −1.066374053    0.04175246     DOWN
CNN2         −0.629754604    0.023710615    DOWN
CSF2RB       −1.104312619    0.020573046    DOWN
CXCR4        −2.033830014    0.00024412     DOWN
EGR2         −0.997120306    0.005989411    DOWN
ITGAX        −1.11377676     0.027930711    DOWN
LITAF        −0.831805644    0.014085655    DOWN
RHOA         −0.902566363    0.003151667    DOWN
RNASEK       −1.016194703    0.030620951    DOWN
SERINC1      −0.651103256    0.046063301    DOWN
SERP1        −0.82729507     0.033858187    DOWN
SRRM2        −0.752261071    0.036848008    DOWN
Test 2
C10orf116    0.529752336     0.039587014    UP
CNFN         0.990121666     1.85E−05       UP
EMP1         0.948607145     0.000952753    UP
KCNQ1OT1     0.517120259     0.015543571    UP
LCE3D        0.843482517     0.000577787    UP
NDUFA4L2     1.063393782     2.72E−05       UP
NDUFS5       0.457090069     0.011340643    UP
POLR2L       0.3357497       0.037600455    UP
REXO1L2P     0.730041651     0.016022131    UP
RPL7A        0.370261967     0.003107308    UP
RPS26        0.423684057     0.015281796    UP
SERPINB4     0.740104652     0.009405167    UP
SLC25A3      0.266515461     0.031602198    UP
SNORA16A     0.800445194     3.37E−05       UP
SNORA24      0.62246595      0.000620204    UP
SNORA50      0.501154595     0.004445349    UP
SNRPG        0.533113185     0.003903621    UP
UQCRH        0.346746063     0.030618555    UP
CD83         −0.526594744    0.029159029    DOWN
CNN2         −0.478206967    0.045041795    DOWN
CSF2RB       −0.537088027    0.047037042    DOWN
CXCR4        −0.628204085    0.020358444    DOWN
EGR2         −0.299185803    0.033982561    DOWN
ITGAX        −0.64770333     0.014582029    DOWN
LITAF        −0.329270813    0.029150473    DOWN
RHOA         −0.299449889    0.004939206    DOWN
RNASEK       −0.203072703    0.046581317    DOWN
SEMA6B       −0.5268738      0.041383614    DOWN
SERINC1      −0.54365295     0.011959311    DOWN
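The Student's t-test used above to screen for differentially expressed RNA can be partly illustrated by the following Python sketch, which computes the pooled-variance t statistic and its degrees of freedom for one gene. Conversion of the statistic to a p value requires the t distribution, which the Python standard library does not provide, so that step is left out. The function name and the example values are hypothetical.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Two-sample Student's t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled estimate of the common variance of the two groups.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Hypothetical Log2 RPM values of one gene in PD patients and healthy subjects.
pd_levels = [3.0, 4.0, 5.0]
hl_levels = [1.0, 2.0, 3.0]
t_stat, dof = student_t(pd_levels, hl_levels)
print(t_stat, dof)
```

A positive statistic corresponds to higher expression in the first group, and a negative statistic to lower expression.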
A biological process (BP) and a KEGG pathway were searched for by gene ontology (GO) enrichment analysis by using the public database STRING. As a result, 30 and 39 KEGG pathways related to the gene group with increased or decreased expression in the PD patients were obtained in Test 1 and Test 2, respectively, and the term hsa05012 (Parkinson's disease) which indicates Parkinson's disease was found to be included in both the tests (Tables 2-1 and 2-2).
The literature was searched for reports on the relation to Parkinson's disease of the genes shown in Tables 1-1 to 1-27 described above which were differentially expressed in at least either Test 1 or Test 2. As a result, no relation to Parkinson's disease had been reported so far for the 21 genes shown in Table 3-1 among the genes differentially expressed in Test 1 or for the 92 genes shown in Tables 3-2 to 3-4 among the genes differentially expressed in Test 2, demonstrating that these genes are capable of serving as novel markers for detecting Parkinson's disease. Genes indicated by boldface in the tables are genes common to Test 1 and Test 2.
REXO1L2P     UP
SNORA16A     UP
SNORA24      UP
SNORA50

REXO1L2P
SNORA16A
SNORA24
SNORA50
ii) RNA Expression Analysis—2
Data (read count values) on the expression level of RNA derived from the test subjects measured in the above section 2) was normalized by use of an approach called DESeq2. However, samples in which 4161 or more genes were not detected were excluded, and only genes for which expression level data without missing values was obtained in 90% or more of the remaining samples were used in the analysis given below. In the analysis, the normalized count values obtained by DESeq2 were used.
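As a rough illustration of count normalization of the kind performed by DESeq2, the following Python sketch reimplements the median-of-ratios idea: each sample's size factor is the median, over genes, of the ratio of its count to the geometric mean of that gene across samples. This is a simplified stand-alone sketch and not the DESeq2 package itself; the function names and example counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factor for each sample.

    counts: {gene: [read count in each sample]}
    Genes with a zero count in any sample are skipped, since their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for i, c in enumerate(row):
            log_ratios[i].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene is detected in every sample")
    return [math.exp(median(r)) for r in log_ratios]

def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {gene: [c / f for c, f in zip(row, factors)]
            for gene, row in counts.items()}

# Hypothetical counts: the second sample was sequenced twice as deeply.
counts = {"A": [10, 20], "B": [100, 200], "C": [50, 100]}
print(size_factors(counts))
print(normalize(counts))
```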
Differentially expressed RNA which attained a corrected p value (FDR) of 0.25 or less in the likelihood ratio test in PD compared with the healthy subjects was identified on the basis of the SSL-derived RNA expression levels (normalized count values) of the healthy subjects and PD described above. In Test 1, the expression of 74 RNAs was increased in PD compared with the healthy subjects (Tables 4-1 and 4-2), and the expression of 209 RNAs was decreased therein (Tables 4-3 to 4-8). Meanwhile, in Test 2, the expression of 151 RNAs was increased (Tables 4-9 to 4-12), and the expression of 308 RNAs was decreased (Tables 4-13 to 4-20). The expression of 7 RNAs was increased in common between Test 1 and Test 2, and the expression of 10 RNAs was decreased in common therebetween (genes indicated by boldface in the tables).
ANXA1        1.686938977     0.032012546    UP
AQP3         2.056781943     0.207453699    UP
EMP1         2.274143956     0.060301659    UP
KRT16        1.904813057     0.157035049    UP
POLR2L       1.140600646     0.205453026    UP
SERPINB4     2.405038672     0.093218948    UP
SNORA24      1.41317214      0.022725658    UP
ATP6V0C      −0.92893591     0.142104577    DOWN
BHLHE40      −1.574553746    0.003238712    DOWN
CCL3         −2.617993487    0.022303042    DOWN
CCNI         −2.705241728    8.8856E−05     DOWN
CXCR4        −1.852473527    0.024085385    DOWN
EGR2         −0.988003468    0.166431417    DOWN
GABARAPL1    −1.322693883    0.060301659    DOWN
RHOA         −0.846384811    0.166431417    DOWN
RNASEK       −0.803995199    0.134092229    DOWN
SERINC1      −1.248273951    0.073126006    DOWN

ANXA1        0.789867752     0.014394956    UP
AQP3         0.599212307     0.197195688    UP
EMP1         1.53672252      0.000620279    UP
KRT16        0.398735989     0.203917134    UP
POLR2L       0.388253793     0.070687016    UP
SERPINB4     0.673429357     0.142882428    UP
SNORA24      0.379856346     0.249405298    UP
ATP6V0C      −0.454978704    0.029798738    DOWN
BHLHE40      −0.401373324    0.189293656    DOWN
CCL3         −1.013989016    0.019217132    DOWN
CCNI         −0.297462333    0.191525939    DOWN
CXCR4        −0.655969209    0.097540633    DOWN
EGR2         −0.387465028    0.179929185    DOWN
GABARAPL1    −0.427119307    0.02821497     DOWN
RHOA         −0.306833302    0.114612949    DOWN
RNASEK       −0.263726846    0.189823772    DOWN
SERINC1      −0.436066431    0.233336584    DOWN
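The corrected p value (FDR) used as the criterion in this analysis can be illustrated by the following Python sketch of the Benjamini-Hochberg adjustment. The choice of the Benjamini-Hochberg procedure is an assumption made for illustration, since the correction method is not named in this description; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
selected = [i for i, q in enumerate(fdr) if q <= 0.25]
print(fdr, selected)
```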
A biological process (BP) and a KEGG pathway were searched for by gene ontology (GO) enrichment analysis by using the public database STRING. As a result, 30 and 28 KEGG pathways related to the gene group with increased or decreased expression in the PD patients were obtained in Test 1 and Test 2, respectively, and the term hsa05012 (Parkinson's disease) which indicates Parkinson's disease was found to be included in both the tests (Tables 5-1 and 5-2).
The literature was searched for reports on the relation to Parkinson's disease of the genes shown in Tables 4-1 to 4-20 described above which were differentially expressed in at least either Test 1 or Test 2. As a result, no relation to Parkinson's disease had been reported so far for the 19 genes shown in Table 6-1 among the genes differentially expressed in Test 1 or for the 30 genes shown in Table 6-2 among the genes differentially expressed in Test 2, demonstrating that these genes are capable of serving as novel markers for detecting Parkinson's disease. Genes indicated by boldface in the tables are genes common to Test 1 and Test 2.
SNORA24
SNORA24
1) Data Used
In the data (read count values) on the expression level of SSL-derived RNA from the test subjects, data with a read count of less than 10 was treated as missing values, as in RNA expression analysis—1 in Example 1. After conversion to RPM values which normalized the read count values for difference in the total number of reads among samples, the missing values were compensated for by use of an approach called singular value decomposition (SVD) imputation. However, only genes for which expression level data without missing values was obtained in 80% or more of all the samples were used in the analysis given below. In the construction of machine learning models, the RPM values were converted to logarithmic values to base 2 (Log2 RPM values) in order to approximate the RPM values, which followed a negative binomial distribution, to a normal distribution.
2) Data Set Partitioning
In the RNA profile data set obtained from the test subjects of Test 1, RNA profile data from a total of 20 subjects (10 healthy subjects and 10 PD) was used as training data for PD prediction models, and RNA profile data from the remaining 10 subjects was used as test data for use in the evaluation of model precision. In the RNA profile data set obtained from the test subjects of Test 2, RNA profile data from a total of 80 subjects (40 healthy subjects and 40 PD) was used as training data for PD prediction models, and RNA profile data from the remaining 20 subjects was used as test data for use in the evaluation of model precision.
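The partitioning of Test 1 into training data and test data can be sketched in Python as follows. The random assignment, the fixed seed, the equal number of healthy subjects and PD patients in the test data, and the subject identifiers are all assumptions made for illustration; the description above states only the group sizes of the training data.

```python
import random

def stratified_split(subjects, n_test_per_group, seed=0):
    """Split subjects into training and test data group by group.

    subjects: {group name: [subject identifiers]}
    Returns (train, test), each a list of (identifier, group) pairs.
    """
    rng = random.Random(seed)  # fixed seed so that the split is reproducible
    train, test = [], []
    for group, ids in subjects.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        test += [(i, group) for i in shuffled[:n_test_per_group]]
        train += [(i, group) for i in shuffled[n_test_per_group:]]
    return train, test

# Hypothetical identifiers for the 15 healthy subjects and 15 PD patients of Test 1.
test1 = {
    "HL": [f"HL{i:02d}" for i in range(1, 16)],
    "PD": [f"PD{i:02d}" for i in range(1, 16)],
}
train, test = stratified_split(test1, n_test_per_group=5)
print(len(train), len(test))
```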
3) Selection of Feature Gene
18 RNAs whose expression was increased in common between Test 1 and Test 2 and 15 RNAs whose expression was decreased in common between Test 1 and Test 2, in the PD patients compared with the healthy subjects in RNA expression analysis—1 in Example 1 (genes indicated by boldface in Tables 1-1 to 1-27) were selected as feature genes. Their expression level data was converted to principal components by principal component analysis. Then, the first to tenth principal components were used as explanatory variables. Among the 18 RNAs whose expression was increased in common between Test 1 and Test 2 and the 15 RNAs whose expression was decreased in common between Test 1 and Test 2 in the PD patients, 4 genes SNORA16A, SNORA24, SNORA50, and REXO1L2P were selected as feature genes. Their expression level data was converted to principal components by principal component analysis. Then, the first to fourth principal components were used as explanatory variables.
4) Model Construction
Prediction models were constructed by using, as explanatory variables, the values of the principal components obtained from the expression level data (Log2 RPM values) on the feature genes in the training data of SSL-derived RNA, and, as objective variables, the healthy subjects (HL) and PD. The prediction models were trained by 10-fold cross validation by using 7 algorithms, namely random forest, linear kernel support vector machine (SVM linear), rbf kernel support vector machine (SVM rbf), neural network, generalized linear model, regularized linear discriminant analysis, and regularized logistic regression, for each item to be predicted. For each algorithm, the values of the principal components obtained from the feature gene expression levels (Log2 RPM values) of the test data were input to the trained model to calculate a predictive value for each item to be predicted. Recall, precision, and an F value which is the harmonic mean thereof were calculated from the predictive values and the actually measured values, and the model having the largest F value was selected as the optimum prediction model.
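The 10-fold cross validation mentioned above can be illustrated by the following Python sketch, which only generates the training and held-out sample indices for each fold. The interleaved assignment of samples to folds and the function name are assumptions of this sketch; the fitting of each of the 7 algorithms is not shown.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (training indices, held-out indices) for each of k folds."""
    # Sample i is assigned to fold i % k, giving folds of near-equal size.
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in folds:
        held = set(held_out)
        training = [i for i in range(n_samples) if i not in held]
        yield training, held_out

# Example with the 20 training subjects of Test 1: each fold holds out 2 samples.
splits = list(k_fold_indices(20, k=10))
for training, held_out in splits[:2]:
    print(len(training), held_out)
```

In each fold a model would be fitted on the training indices and evaluated on the held-out indices, so that every training sample is used once for validation.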
5) Results
Table 7 shows the algorithm used, the recall, the precision, and the F value of each item to be predicted.
Table 8 shows results of calculating the variable importance of each feature gene when random forest was used in model construction.
The F value of the model obtained by using the 4 genes SNORA16A, SNORA24, SNORA50, and REXO1L2P was 0.67 in Test 1, 0.75 in Test 2, and 0.76 in integrated Test 1+Test 2, indicating that PD was predictable with this model. The F value of the model obtained by using a total of 33 genes, including the 18 RNAs with increased expression and the 15 RNAs with decreased expression in the PD patients, was 0.91 in Test 1, 0.80 in Test 2, and 0.82 in integrated Test 1+Test 2, indicating that PD was predictable with higher accuracy with this model.
1) Data Used
Data (read count values) on the expression level of SSL-derived RNA from the test subjects was normalized by DESeq2, as in RNA expression analysis—2 in Example 1. However, samples in which 4161 or more genes were not detected were excluded, and only genes that produced expression level data without missing values in 90% or more of the samples remaining after exclusion were used in the analysis given below. The normalized count values obtained by DESeq2 were used in the analysis.
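The sample and gene filtering described above can be sketched as follows. This is a minimal illustration, assuming that an undetected gene is represented by None; the data layout, the function name, and the toy values are hypothetical. The default thresholds follow the text (4161 genes, 90%).

```python
def filter_samples_and_genes(counts, max_undetected=4161, min_detected_fraction=0.9):
    """counts: {sample_id: {gene: value or None}}; None marks an undetected gene.

    Samples with `max_undetected` or more undetected genes are excluded, then
    only genes detected in at least `min_detected_fraction` of the remaining
    samples are kept.
    """
    kept = {
        sample: genes
        for sample, genes in counts.items()
        if sum(value is None for value in genes.values()) < max_undetected
    }
    all_genes = set()
    for genes in kept.values():
        all_genes.update(genes)
    n_samples = len(kept)
    kept_genes = sorted(
        gene
        for gene in all_genes
        if sum(kept[s].get(gene) is not None for s in kept)
        >= min_detected_fraction * n_samples
    )
    return kept, kept_genes


# Toy data with a lowered sample threshold so that the effect is visible.
toy_counts = {
    "s1": {"A": 5.0, "B": 3.0, "C": None},
    "s2": {"A": 1.0, "B": None, "C": None},  # 2 undetected genes: excluded
    "s3": {"A": 2.0, "B": 4.0, "C": 7.0},
}
kept, kept_genes = filter_samples_and_genes(toy_counts, max_undetected=2)
```

The gene filter is applied after the sample filter, so the 90% criterion is evaluated only on the samples that remain.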
2) Data Set Partitioning
In the RNA profile data set obtained from the test subjects of Test 1, RNA profile data from a total of 15 subjects (9 healthy subjects and 6 PD) was used as training data for PD prediction models, and RNA profile data from a total of 5 subjects (the remaining 4 healthy subjects and 1 PD) was used as test data for use in the evaluation of model precision. In the RNA profile data set obtained from the test subjects of Test 2, RNA profile data from a total of 72 subjects (37 healthy subjects and 35 PD) was used as training data for PD prediction models, and RNA profile data from a total of 24 subjects (the remaining 13 healthy subjects and 11 PD) was used as test data for use in the evaluation of model precision.
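The partitioning of Test 2 can be sketched as follows. The text gives only the number of subjects per class in each subset, so the random assignment, the seed, and the subject identifiers used here are assumptions made for illustration.

```python
import random


def partition(labels, n_train_per_class, seed=0):
    """Split subject IDs into training and test sets with fixed per-class counts.

    labels: {subject_id: class label}; n_train_per_class: {class label: count}.
    Subjects not drawn for training are assigned to the test set.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls, n_train in n_train_per_class.items():
        ids = sorted(s for s, label in labels.items() if label == cls)
        rng.shuffle(ids)
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return sorted(train), sorted(test)


# Test 2 sizes from the text: 50 healthy subjects (37 + 13) and 46 PD (35 + 11).
labels = {f"HL{i:02d}": "HL" for i in range(50)}
labels.update({f"PD{i:02d}": "PD" for i in range(46)})
train_ids, test_ids = partition(labels, {"HL": 37, "PD": 35})
```

Fixing the per-class counts keeps the ratio of healthy subjects to PD patients similar in the training data and the test data.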
3) Selection of Feature Gene
17 RNAs whose expression was increased or decreased in common between Test 1 and Test 2 in the PD patients compared with the healthy subjects in RNA expression analysis—2 in Example 1 (genes indicated by boldface in Tables 4-1 to 4-20) were selected as feature genes. Their expression level data was converted to principal components by principal component analysis. Then, the first to fourth principal components were used as explanatory variables.
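The conversion of expression level data to principal components can be sketched as follows. This is a minimal pure-Python illustration based on power iteration, not the implementation used in the Examples. It assumes that the loadings are estimated from the training data and then applied to the test data, which the text does not state explicitly; the function names and toy values are hypothetical.

```python
import math
import random


def fit_pca(rows, n_components, n_iter=500, seed=0):
    """Estimate column means and the leading principal axes of rows (samples x genes)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            # Remove the directions already found (Gram-Schmidt deflation).
            for c in components:
                dot = sum(x * y for x, y in zip(w, c))
                w = [x - dot * y for x, y in zip(w, c)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                raise ValueError("fewer directions with nonzero variance than requested")
            v = [x / norm for x in w]
        components.append(v)
    return means, components


def transform(rows, means, components):
    """Project rows onto the principal axes, giving one score per component."""
    return [
        [sum((r[j] - means[j]) * c[j] for j in range(len(means))) for c in components]
        for r in rows
    ]


# Toy expression matrix (6 samples x 3 genes); the values are made up.
train_rows = [
    [1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 6.2, 0.7],
    [4.0, 7.9, 0.3], [5.0, 10.1, 0.6], [6.0, 12.0, 0.5],
]
test_rows = [[2.5, 5.0, 0.5]]
means, components = fit_pca(train_rows, n_components=2)
train_scores = transform(train_rows, means, components)
test_scores = transform(test_rows, means, components)
```

With 17 feature genes and n_components=4, each subject would be represented by four scores, which correspond to the first to fourth principal components used as explanatory variables.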
4) Model Construction
Prediction models were constructed by using, as explanatory variables, the values of the principal components obtained from the expression level data (logarithmic values to base 2 of normalized count values plus 1) of the feature genes selected from the SSL-derived RNA in the training data, and, as objective variables, the healthy subjects (HL) and PD. The prediction models were trained with 10-fold cross validation by using 7 algorithms for each item to be predicted: random forest, linear kernel support vector machine (SVM linear), rbf kernel support vector machine (SVM rbf), neural network, generalized linear model, regularized linear discriminant analysis, and regularized logistic regression. For each algorithm, the values of the principal components obtained from the feature gene expression levels (logarithmic values to base 2 of normalized count values plus 1) of the test data were input to the trained model to calculate a predictive value for each item to be predicted. Recall, precision, and an F value, which is the harmonic mean thereof, were calculated from the predictive values and the actually measured values, and the model having the largest F value was selected as the optimum prediction model.
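The log transformation and the 10-fold cross validation splits can be sketched as follows. The text does not state how the subjects were assigned to folds, so the shuffled assignment and the seed are assumptions; the function names are hypothetical.

```python
import math
import random


def log2_plus_one(normalized_counts):
    """Return log2(count + 1) for each normalized count value."""
    return [math.log2(count + 1) for count in normalized_counts]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training indices, held-out indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        yield [i for i in indices if i not in held], held_out


transformed = log2_plus_one([0, 1, 3])
# 72 is the number of training subjects in Test 2.
splits = list(k_fold_indices(72))
```

Adding 1 before taking the logarithm keeps a normalized count of 0 finite. With 72 training subjects, each fold holds out 7 or 8 subjects, and every subject is held out exactly once.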
5) Results
Table 9 shows the algorithm used, the recall, the precision, and the F value of each item to be predicted.
The F value of the model obtained by using 17 RNAs whose expression was increased or decreased in common between Test 1 and Test 2 in results of the likelihood ratio test after normalization by DESeq2 was 1 in Test 1 and 0.87 in Test 2, indicating that PD was predictable with this model.
1) Data Used
Data (read count values) on the expression level of SSL-derived RNA from the test subjects was normalized by DESeq2, as in RNA expression analysis—2 in Example 1. However, samples in which 4161 or more genes were not detected were excluded, and only genes that produced expression level data without missing values in 90% or more of the samples remaining after exclusion were used in the analysis given below. The normalized count values obtained by DESeq2 were used in the analysis.
2) Data Set Partitioning
In the RNA profile data set obtained from the test subjects of Test 1, RNA profile data from a total of 15 subjects (9 healthy subjects and 6 PD) was used as training data for PD prediction models, and RNA profile data from a total of 5 subjects (the remaining 4 healthy subjects and 1 PD) was used as test data for use in the evaluation of model precision. In the RNA profile data set obtained from the test subjects of Test 2, RNA profile data from a total of 72 subjects (37 healthy subjects and 35 PD) was used as training data for PD prediction models, and RNA profile data from a total of 24 subjects (the remaining 13 healthy subjects and 11 PD) was used as test data for use in the evaluation of model precision.
3) Selection of Feature Gene
19 RNAs whose expression was increased or decreased in Test 1 in the PD patients compared with the healthy subjects (genes shown in Table 6-1) or 30 RNAs whose expression was increased or decreased in Test 2 in the PD patients compared with the healthy subjects (genes shown in Table 6-2) in RNA expression analysis—2 in Example 1 were selected as feature genes. Their expression level data was converted to principal components by principal component analysis. Then, the first to fourth principal components were used as explanatory variables.
4) Model Construction
Prediction models were constructed by using, as explanatory variables, the values of the principal components obtained from the expression level data (logarithmic values to base 2 of normalized count values plus 1) of the feature genes selected from the SSL-derived RNA in the training data, and, as objective variables, the healthy subjects (HL) and PD. The prediction models were trained with 10-fold cross validation by using 7 algorithms for each item to be predicted: random forest, linear kernel support vector machine (SVM linear), rbf kernel support vector machine (SVM rbf), neural network, generalized linear model, regularized linear discriminant analysis, and regularized logistic regression. For each algorithm, the values of the principal components obtained from the feature gene expression levels (logarithmic values to base 2 of normalized count values plus 1) of the test data were input to the trained model to calculate a predictive value for each item to be predicted. Recall, precision, and an F value, which is the harmonic mean thereof, were calculated from the predictive values and the actually measured values, and the model having the largest F value was selected as the optimum prediction model.
5) Results
Tables 10 and 11 show the algorithm used, the recall, the precision, and the F value of each item to be predicted.
The F value of the model obtained by using 19 RNAs whose relation to Parkinson's disease had not been reported so far among RNAs whose expression was increased or decreased in results of the likelihood ratio test after normalization by DESeq2 in Test 1 was 1, indicating that PD was predictable with this model. The F value of the model obtained by using 30 RNAs whose relation to Parkinson's disease had not been reported so far among RNAs whose expression was increased or decreased in results of the likelihood ratio test after normalization by DESeq2 in Test 2 was 0.87, indicating that PD was predictable with this model.
1) Data Used
In the data (read count values) on the expression level of SSL-derived RNA from the test subjects, data with a read count of less than 10 was treated as missing values, as in RNA expression analysis—1 in Example 1. After conversion to RPM values, which normalize the read count values for differences in the total number of reads among samples, the missing values were compensated for by use of singular value decomposition (SVD) imputation. However, only genes that produced expression level data without missing values in 80% or more of all the samples were used in the analysis given below. In the construction of machine learning models, the RPM values were converted to logarithmic values to base 2 (Log2 RPM values) in order to approximate the RPM values, which follow a negative binomial distribution, to a normal distribution.
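The conversion from read counts to Log2 RPM values can be sketched as follows. This minimal illustration assumes that the total number of reads of a sample is the sum of all its read counts, including those below the threshold. The SVD imputation step is omitted, and the gene names and counts are hypothetical.

```python
import math


def log2_rpm(read_counts, min_count=10):
    """Convert one sample's read counts to Log2 RPM values.

    read_counts: {gene: read count}. Counts below `min_count` are returned
    as None (missing value) and would be imputed in a later step.
    """
    total_reads = sum(read_counts.values())
    result = {}
    for gene, count in read_counts.items():
        if count < min_count:
            result[gene] = None
        else:
            rpm = count / total_reads * 1_000_000  # reads per million
            result[gene] = math.log2(rpm)
    return result


# Hypothetical sample with 1,000,000 reads in total.
sample = {"GENE_A": 500_000, "GENE_B": 250_000, "GENE_C": 5, "GENE_D": 249_995}
log2_rpm_values = log2_rpm(sample)
```

Because RPM divides each count by the total number of reads of the sample, the values are comparable among samples sequenced to different depths.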
2) Data Set Partitioning
In the RNA profile data set obtained from the test subjects of Test 1, RNA profile data from a total of 20 subjects (10 healthy subjects and 10 PD) was used as training data for PD prediction models, and RNA profile data from the remaining 10 subjects was used as test data for use in the evaluation of model precision. In the RNA profile data set obtained from the test subjects of Test 2, RNA profile data from a total of 80 subjects (40 healthy subjects and 40 PD) was used as training data for PD prediction models, and RNA profile data from the remaining 20 subjects was used as test data for use in the evaluation of model precision.
3) Selection of Feature Gene
21 RNAs whose expression was increased or decreased in Test 1 in the PD patients compared with the healthy subjects (genes shown in Table 3-1) or 92 RNAs whose expression was increased or decreased in Test 2 in the PD patients compared with the healthy subjects (genes shown in Tables 3-2 to 3-4) in RNA expression analysis—1 in Example 1 were selected as feature genes. Their expression level data was converted to principal components by principal component analysis. Then, the first to fourth principal components were used as explanatory variables.
4) Model Construction
Prediction models were constructed by using, as explanatory variables, the values of the principal components obtained from the expression level data (Log2 RPM values) of the feature genes selected from the SSL-derived RNA in the training data, and, as objective variables, the healthy subjects (HL) and PD. The prediction models were trained with 10-fold cross validation by using 7 algorithms for each item to be predicted: random forest, linear kernel support vector machine (SVM linear), rbf kernel support vector machine (SVM rbf), neural network, generalized linear model, regularized linear discriminant analysis, and regularized logistic regression. For each algorithm, the values of the principal components obtained from the feature gene expression levels (Log2 RPM values) of the test data were input to the trained model to calculate a predictive value for each item to be predicted. Recall, precision, and an F value, which is the harmonic mean thereof, were calculated from the predictive values and the actually measured values, and the model having the largest F value was selected as the optimum prediction model.
5) Results
Tables 12 and 13 show the algorithm used, the recall, the precision, and the F value of each item to be predicted.
The F value of the model obtained by using 21 RNAs whose relation to Parkinson's disease had not been reported so far among RNAs whose expression was increased or decreased in results of the test after normalization by Log2 RPM in Test 1 was 0.91, indicating that PD was predictable with this model. The F value of the model obtained by using 92 RNAs whose relation to Parkinson's disease had not been reported so far among RNAs whose expression was increased or decreased in results of the test after normalization by Log2 RPM in Test 2 was 0.9, indicating that PD was predictable with this model.
Number | Date | Country | Kind |
---|---|---|---|
2020-085430 | May 2020 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/018511 | 5/14/2021 | WO |