This application claims the benefit of priority of Singapore application No. 10202001048U, filed 5 Feb. 2020, the contents of which are hereby incorporated by reference in their entirety for all purposes.
The invention is in the field of biomarkers, in particular biomarkers associated with Parkinson's disease and methods and uses thereof.
Parkinson's disease (PD) is one of the most common age-related neurodegenerative diseases worldwide and contributed to over 200,000 deaths and 3.2 million disability-adjusted life years worldwide in 2016. PD presents as a hypokinetic movement disorder characterized by bradykinesia, postural instability, rigidity and resting tremors resulting from loss of nigrostriatal dopaminergic neurons and other non-dopaminergic structures. At present, there is no cure for PD as symptoms only present at late stages of the disease. Several genes containing rare pathogenic variants have been identified in familial PD, suggesting that while genetic factors play a role in PD pathogenesis, it is extremely heterogeneous and influenced by multiple genes and pathways. This implies that germline genetic variants may serve as stable biomarkers for risk prediction early in life. Although large-scale meta-analyses of genome-wide association studies (GWAS) in the European population have identified several dozen loci implicated in PD pathogenesis and confirmed the involvement of familial PD genes in sporadic PD, there are limited studies in the Asian population, which is the largest worldwide and thus accounts for a significant fraction of PD patients globally.
It is therefore important to identify biomarkers that can be used to diagnose PD, predict risk and identify at-risk individuals for early monitoring and therapeutic intervention. In addition, there is also a need to identify novel, potentially Asian-specific biomarkers to conduct a robust comparison between Asian and European genetic risk for PD.
In one aspect, there is provided a method of identifying whether a subject is at risk of developing PD, whether a subject is suffering from PD, or whether a subject is in need of early therapeutic intervention for PD, the method comprising: a. obtaining a DNA sample from the subject; and b. detecting the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in the sample; wherein the presence of one or more genetic variants identifies that the subject is at risk of developing PD, the subject is suffering from PD, or the subject is in need of early therapeutic intervention for PD.
In one aspect, there is provided a method of determining the prognosis of a subject with PD or a subject at risk of developing PD, the method comprising: a. obtaining a DNA sample from the subject; and b. detecting the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in the sample; wherein the presence of one or more genetic variants indicates that the subject has a poor prognosis.
In another aspect, there is provided a method of calculating a polygenic risk score (PRS) of a subject of developing PD, the method comprising the steps of: a. obtaining a DNA sample from the subject; b. detecting the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in the sample; and running genotyping analysis of DNA; and c. measuring the total number of the genetic variants detected in step b to calculate a PRS of a subject of developing PD.
In another aspect, there is provided a kit comprising one or more reagents to detect the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in a sample, together with instructions for use.
In yet another aspect, there is provided a PD biomarker, wherein the biomarker is a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof.
The following are some definitions that may be helpful in understanding the description of the present invention. These are intended as general definitions and should in no way limit the scope of the present invention to those terms alone, but are put forth for a better understanding of the following description.
As used herein, the term “prognosis” refers to a prediction of the probable course and outcome of a clinical condition or disease. The prognosis, as used herein, can also refer to requirement of therapeutic intervention according to the course and outcome of a clinical condition or disease. A prognosis of a patient is usually made by evaluating factors or symptoms of a disease that are indicative of a favorable or unfavorable course or outcome of the disease. The term “prognosis” does not refer to the ability to predict the course or outcome of a condition with 100% accuracy. Instead, the term “prognosis” refers to an increased probability that a certain course or outcome will occur; that is, that a course or outcome is more likely to occur in a patient exhibiting a given condition, when compared to those individuals not exhibiting the condition. For example, the course or outcome of a condition may be predicted with 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 75%, 70%, 65%, 55%, and 50% accuracy.
As used herein, the term “biomarker” refers to a molecular indicator of a specific biological property, a biochemical feature or facet that can be used to determine the presence or absence and/or severity of a particular disease or condition. One or more biomarkers may be associated with the particular disease or condition. The term “biomarker” may refer to a polypeptide or nucleic acid sequence encoding the polypeptide, a fragment or variant of the polypeptide that is associated with PD. In addition, a “biomarker” can also refer to metabolites or metabolized fragments of the expressed polypeptide. A person skilled in the art would understand that a metabolite of one of the biomarkers referred to herein can still retain the capability of being used as biomarker for the methods described herein. It is also noted that some of the biomarkers in the biomarker set can be present in their variant form or metabolized form while others are still intact. In the present disclosure, the term “biomarker” refers to, but is not limited to, one or more genetic variants, a sequence encoding the genetic variant, the resulting mRNA, or the resulting polypeptide or protein if the genetic variation affects the protein-coding region. For example, a biomarker may be a combination of genetic variants at the loci of one or more genes. Evaluation of such biomarkers and their correlation to a pathological condition or disease can be done by, for example, determining the absence or presence of a biomarker, and comparative analysis between diseased and disease-free samples.
As used herein, the term “polymorphism” refers to genetic polymorphism, which is used to describe diversity in genomes in species, such as a human being. It essentially refers to inter-individual differences in a DNA sequence that is unique to an individual. In other words, a genetic polymorphism is the occurrence, in the same population, of multiple discrete allelic states. Polymorphism involves one of two or more variants of a particular DNA sequence. The most common type of polymorphism involves variation at a single nucleotide, i.e., single nucleotide polymorphism (SNP).
As used herein, the terms “variant” or “genetic variant” refer to a specific region of the genome that differs from a reference genome. Based on the type of alteration, the term “genetic variant” can refer to, but is not limited to, single nucleotide variant (SNV) or single nucleotide polymorphism (SNP). As used herein, the term “SNV” or “SNP” refers to a variant with a single nucleotide substitution in a DNA sequence. Conventionally a SNP is a SNV that is present to some appreciable degree within a population (for example, more than 1% of said population).
SNPs may occur in all positions of the DNA sequence encoding the genetic variant, such as coding regions, non-coding regions, or the regions between genes. They can occur, for example, in the exons, introns, UTRs, regulatory regions such as enhancer, transcription factor binding domain and DNA methylation regions or regions with no known function.
As used herein, the term “locus” refers to a specific position on a chromosome. It is known that multiple genes can reside at the same locus. It would be understood by a person skilled in the art that a SNP occurs at a specific locus on the chromosome which can be either within a gene or in the region between two genes. The locus where a SNP occurs may be named according to the gene that is nearest to the SNP. For example, the locus where SNP rs34311866 occurs may be named as “GAK”. The locus where a SNP occurs may be also named according to multiple genes that are located at varying distances from the SNP within the locus. For example, the locus where SNP rs34311866 occurs may also be named as “TMEM175-GAK-DGKQ”.
As used herein, the term “polygenic score” or “polygenic risk score (PRS)” is a score based on the variation in multiple genetic loci and their associated weights. The PRS is constructed from the effect size for each risk allele or effect allele and generally follows the form:
$$\hat{S} = \sum_{j=1}^{m} X_j \hat{\beta}_j$$
where the PRS, $\hat{S}$, of an individual is equal to the weighted sum of the individual's marker genotypes, $X_j$, at $m$ genetic variants or single nucleotide polymorphisms (SNPs). The weights $\hat{\beta}_j$ are estimated using regression analysis, such as logistic regression.
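The weighted sum described above can be sketched in Python as follows. This is a minimal illustration only, assuming that each marker genotype is coded as the number of copies (0, 1 or 2) of the effect allele; the function name `polygenic_risk_score`, the variant identifiers and the weights are hypothetical and are not taken from the present disclosure.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages over the variants in `weights`.

    dosages: dict mapping a variant identifier to the number of effect-allele
             copies carried by the subject (assumed coding: 0, 1 or 2).
    weights: dict mapping the same identifiers to an estimated effect size.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        x = dosages[variant_id]
        if x not in (0, 1, 2):
            raise ValueError(f"dosage for {variant_id} must be 0, 1 or 2")
        score += x * beta
    return score


# Hypothetical weights and genotypes, for illustration only.
example_weights = {"variant_1": 0.20, "variant_2": 0.10, "variant_3": -0.05}
example_dosages = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
example_score = polygenic_risk_score(example_dosages, example_weights)
```

Setting every weight to 1 reduces the same function to an unweighted count of effect alleles.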
As used herein, the term “principal component analysis (PCA)” refers to a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA may be used to detect and correct allele frequency differences between an individual and controls (one or more individuals of known ancestry) due to systematic ancestry differences, thereby allowing ancestry differences between an individual and controls to be modelled.
As used herein, the terms “isolated” or “isolating” relates to a biological component (such as a nucleic acid molecule, protein or organelle) that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids that have been “isolated” include nucleic acids purified by standard purification methods.
As used herein, the term “sample” refers to single cells, multiple cells, fragments of cells, tissue, or body fluid, which has been obtained from, removed from, or isolated from a subject. An example of a sample includes, but is not limited to, blood, stool, serum, plasma, tears, saliva, urine, sputum, nasal fluid, gastrointestinal fluid, cerebrospinal fluid, bone marrow fluid, exudate, transudate and bronchial lavage. In another example, the sample may be fresh tissue, frozen fresh tissue, paraffin embedded tissue or formalin fixed paraffin embedded tissue. The sample can include, but is not limited to, tissue obtained from the brain, lung, muscle, liver, skin, pancreas, stomach, bladder, and other organs.
As used herein, the term “primer” refers to any single-stranded oligonucleotide sequence capable of being used as a primer in, for example, PCR technology. Thus, a “primer” according to the disclosure refers to a single-stranded oligonucleotide sequence that is capable of acting as a point of initiation for synthesis of a primer extension product that is substantially identical to the nucleic acid strand to be copied (for a forward primer) or substantially the reverse complement of the nucleic acid strand to be copied (for a reverse primer).
As used herein, the term “probe” refers to any nucleic acid fragment that hybridizes to a target sequence. A probe may be labelled with radioactive isotopes, fluorescent tags, antibodies or chemical labels to facilitate detection of the probe.
The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
In one aspect, the present invention refers to a method of identifying whether a subject is at risk of developing PD, whether a subject is suffering from PD, or whether a subject is in need of early therapeutic intervention for PD, the method comprising: a) obtaining a DNA sample from the subject; and b) detecting the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in the sample; wherein the presence of one or more genetic variants identifies that the subject is at risk of developing PD, the subject is suffering from PD, or the subject is in need of early therapeutic intervention for PD.
In one example, the method involves detecting the presence of a genetic variant at the loci of SV2C and WBSCR17.
In another example, the method involves detecting the presence of a genetic variant at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2. The full names of the 11 genetic loci can be found in Table 1.
The method of the invention can therefore be used to identify either whether a subject is at risk of developing PD, or whether a subject is suffering from PD.
A subject or patient who is suffering from PD either has already been diagnosed with, or has not yet been diagnosed with PD. The subject or patient may be symptomatically characterized by one or more of the following features, but not limited to, bradykinesia, postural instability, rigidity, resting tremors, loss of automatic movements, changes in speech and writing, and cognitive impairment. The subject may also be patho-physiologically characterized by one or more of the following features, but not limited to, loss of nigrostriatal dopaminergic neurons and other non-dopaminergic structures. In one example, the characteristics of PD are assessed using the United Kingdom Parkinson's Society Brain Bank Criteria.
A subject or patient who is at risk of developing PD has a higher likelihood of developing PD relative to the rest of the population. The higher likelihood may be attributed to factors including, but not limited to, genetic variations and environmental triggers such as exposure to certain toxins. In some examples, the higher risk is due to genetic predisposition or susceptibility. A subject or patient is said to be developing PD or to have developed PD based on the manifestation of symptoms of PD, such as bradykinesia, postural instability, rigidity, resting tremors, loss of automatic movements, changes in speech and writing, and cognitive impairment, and/or pathological characteristics, such as loss of nigrostriatal dopaminergic neurons and other non-dopaminergic structures.
A subject who is identified as being at risk of developing PD may or may not also be in need of early therapeutic intervention. Similarly, a person who is suffering from PD may or may not also be in need of early therapeutic intervention. Therefore, provided here is also a method to identify whether a subject is in need of early therapeutic intervention for PD.
In one example, early therapeutic intervention includes but is not limited to one or more of the following: monitoring the subject for disease onset and progression, prophylactic treatment with a neuroprotective drug, and dietary or lifestyle changes.
As part of early therapeutic intervention, the subject may be monitored regularly for the onset of PD and/or progression. Further therapeutic intervention may be prescribed based on the outcome of the monitoring.
Early therapeutic intervention may also include prophylactic treatment. Prophylactic treatment in the context of PD refers to a treatment or intervention that is designed and used to prevent PD from occurring, to delay the onset of PD, to reduce the severity of PD or combinations thereof. For example, a prophylactic treatment for PD can be a neuroprotective drug that is commercially available or in clinical trials. It will generally be understood that a neuroprotective drug or a neuroprotective agent is a compound or agent that is capable of salvaging, recovering and/or regenerating the nervous system, neural cells, neural structure or neural function.
Other early intervention therapies include dietary or lifestyle changes such as changes to diet, nutrition intake and exercise.
A genetic variant can occur in many forms, which include, but are not limited to, SNV or SNP. In one example, a genetic variant refers to a SNP.
The genetic variant may be detected in any position of the DNA sequence encoding the genetic variant, for example, exons, introns, UTRs, other regulatory regions or regions without known functions. For example, the genetic variant may be a SNP detected within an intron of a gene.
The consequence of the genetic variation can be synonymous or non-synonymous. For example, the genetic variant may be a synonymous or non-synonymous SNP that occurs in the exon of the gene. Synonymous SNPs are those SNPs that have different alleles that encode the same amino acid. Non-synonymous SNPs are SNPs that have different alleles that encode different amino acids. A synonymous variant occurs when the nucleotide substitution does not result in a change in amino acid, while a non-synonymous variant occurs when the nucleotide substitution leads to an amino acid substitution. In some examples, the non-synonymous SNPs may be missense, nonsense or frameshift. Missense refers to where the nucleotide substitution results in a codon that codes for a different amino acid. Nonsense refers to where the nucleotide substitution results in a premature stop codon and truncation of protein. For example, a non-synonymous SNP may be a missense variant.
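The distinction between synonymous, missense and nonsense substitutions can be illustrated with a short Python sketch. It assumes the standard genetic code and a substitution given as a reference codon and an alternate codon on the coding strand; the function name `classify_substitution` is hypothetical, and the "stop-lost" label is an addition for completeness that does not appear in the present description.

```python
# Standard genetic code in compact form: codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop) encoded
    if alt_aa == "*":
        return "nonsense"     # premature stop codon introduced
    if ref_aa == "*":
        return "stop-lost"    # label not used in the description; added for completeness
    return "missense"         # different amino acid encoded
```

For instance, `classify_substitution("CTT", "CTC")` is synonymous because both codons encode leucine, whereas `classify_substitution("TGG", "TGA")` is nonsense because tryptophan is replaced by a stop codon.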
A subject who has been identified as having or suffering from PD, or as being at risk of developing PD may also be tested to determine their prognosis. As such, in another aspect, the present invention refers to a method of determining the prognosis of a subject with PD or a subject at risk of developing PD, the method comprising: a) obtaining a DNA sample from the subject; and b) detecting the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in the sample; wherein the presence of one or more genetic variants indicates that the subject has a poor prognosis.
The prognosis of a subject in the context of PD includes but is not limited to the response of a subject to a treatment for PD, the progression of PD, the age of onset of PD, the need for early and/or aggressive therapy for PD. A poor prognosis therefore may mean that a subject is not responsive or not likely to respond to PD treatment. A poor prognosis may also mean that a subject is likely to have a rapid progression of PD or a rapid onset of symptoms associated with PD. Further, a poor prognosis may mean that the onset of PD happened or is likely to happen at an early or earlier age relative to a subject that has a good prognosis. A subject with a poor prognosis of PD may also require early and/or aggressive therapy for PD.
Early therapy refers to the treatment of a subject at an early stage of PD, for example, where the symptoms of PD are mild. Aggressive PD therapy refers to the treatment of a subject with more types of drugs, higher doses of drugs, higher frequency of treatment or more types of treatments. Aggressive PD therapy may also refer to intensive monitoring of high-risk individuals at the pre-symptomatic stage or early stages, and possible participation in trials for neuroprotective therapy.
In one example, the method involves detecting the presence of a genetic variant at the loci of SV2C and WBSCR17.
In another example, the method involves detecting the presence of a genetic variant at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2.
In addition to detecting genetic variants at the loci of one or more genes described in the foregoing, the method may further detect the presence of genetic variants at the loci of one or more additional genes. In one example, the one or more additional genes are selected from the group consisting of IL1R2, SCN3A, SATB1, NCKIPSD, CDC71, ALAS1, TLR9, DNAH1, BAP1, PHF7, NISCH, STAB1, ITIH3, ITIH4, ANK2, CAMK2D, ELOVL7, ZNF184, CTSB, SORBS3, PDLIM2, C8orf58, BIN3, SH3GL2, FAM171A1, GALC, COQ7, TOX3, ATP6V0A1, PSMC3I, TUBG2, GBA-SYT11, RAB7L1-NUCKS1, SIPA1L2, ACMSD-TMEM163, STK39, KRT8P25-APOOP2, NMD3, TMEM175-GAK-DGKQ, BST1, HLA-DQB1, GPNMB, FGF20, MMP16, ITGA8, INPP5F, MIR4697, LRRK2, CCDC62, GCH1, TMEM229B, VPS13C, BCKDK-STX1B, SREBF1-RAI1, MAPT, SPPL2B, DDRGK1, USP25, FCGR2A, VAMP4, KCNS3, KCNIP3, LINC00693, KPNA1, MED12L, SPTSSB, LCORL, CLCN3, PAM, C5orf24, TRIM40, RIMS1, RPS12, GS1-124K5.11, FAM49B, UBAP2, GBF1, RNF141, SCAF11, FBRSL1, CAB39L, MBNL2, MIPOL1, RPS6KL1, CD19, NOD2, CNOT1, CHRNB1, UBTF, FAM171A2, BRIP1, DNAH17, ASXL3, MEX3C, CRLS1, DYRK1A and combinations thereof.
In one example, in addition to detecting a genetic variant at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2, a genetic variant is further detected at the loci of BST1, GAK, ASXL3, VPS13C, FGF20, RPS12, ZNF184, SH3GL2, CCDC62, LCORL, RIMS1, UBAP2, RNF141, SCAF11, FBRSL1, RPS6KL1, UBTF and STK39.
In another example, in addition to detecting a genetic variant at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2, a genetic variant is further detected at the loci of IL1R2, SCN3A, SATB1, NCKIPSD, CDC71, ALAS1, TLR9, DNAH1, BAP1, PHF7, NISCH, STAB1, ITIH3, ITIH4, ANK2, CAMK2D, ELOVL7, ZNF184, CTSB, SORBS3, PDLIM2, C8orf58, BIN3, SH3GL2, FAM171A1, GALC, COQ7, TOX3, ATP6V0A1, PSMC3I, TUBG2, GBA-SYT11, RAB7L1-NUCKS1, SIPA1L2, ACMSD-TMEM163, STK39, KRT8P25-APOOP2, NMD3, TMEM175-GAK-DGKQ, BST1, HLA-DQB1, GPNMB, FGF20, MMP16, ITGA8, INPP5F, MIR4697, LRRK2, CCDC62, GCH1, TMEM229B, VPS13C, BCKDK-STX1B, SREBF1-RAI1, MAPT, SPPL2B, DDRGK1, USP25, FCGR2A, VAMP4, KCNS3, KCNIP3, LINC00693, KPNA1, MED12L, SPTSSB, LCORL, CLCN3, PAM, C5orf24, TRIM40, RIMS1, RPS12, GS1-124K5.11, FAM49B, UBAP2, GBF1, RNF141, SCAF11, FBRSL1, CAB39L, MBNL2, MIPOL1, RPS6KL1, CD19, NOD2, CNOT1, CHRNB1, UBTF, FAM171A2, BRIP1, DNAH17, ASXL3, MEX3C, CRLS1 and DYRK1A.
The present invention also provides a method of calculating a risk score for the likelihood or risk of a subject developing PD. In one aspect, the present invention refers to a method of calculating a PRS of a subject of developing PD, the method comprising the steps of: a. obtaining a DNA sample from the subject; b. detecting the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in the sample; and c. measuring the total number of the genetic variants detected in step b to calculate a PRS of a subject of developing PD.
In one example, the method of calculating a PRS involves detecting the presence of a genetic variant and measuring the total number of genetic variants at the loci of SV2C and WBSCR17.
In another example, the method of calculating a PRS involves detecting the presence of a genetic variant and measuring the total number of genetic variants at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2.
In addition to detecting the presence of a genetic variant and measuring the total number of genetic variants at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2 genes, the method of calculating a PRS may further comprise detecting the presence of a genetic variant and measuring the total number of genetic variants at the loci of one or more additional genes. In one example, the one or more additional genes are selected from the group consisting of IL1R2, SCN3A, SATB1, NCKIPSD, CDC71, ALAS1, TLR9, DNAH1, BAP1, PHF7, NISCH, STAB1, ITIH3, ITIH4, ANK2, CAMK2D, ELOVL7, ZNF184, CTSB, SORBS3, PDLIM2, C8orf58, BIN3, SH3GL2, FAM171A1, GALC, COQ7, TOX3, ATP6V0A1, PSMC3I, TUBG2, GBA-SYT11, RAB7L1-NUCKS1, SIPA1L2, ACMSD-TMEM163, STK39, KRT8P25-APOOP2, NMD3, TMEM175-GAK-DGKQ, BST1, HLA-DQB1, GPNMB, FGF20, MMP16, ITGA8, INPP5F, MIR4697, LRRK2, CCDC62, GCH1, TMEM229B, VPS13C, BCKDK-STX1B, SREBF1-RAI1, MAPT, SPPL2B, DDRGK1, USP25, FCGR2A, VAMP4, KCNS3, KCNIP3, LINC00693, KPNA1, MED12L, SPTSSB, LCORL, CLCN3, PAM, C5orf24, TRIM40, RIMS1, RPS12, GS1-124K5.11, FAM49B, UBAP2, GBF1, RNF141, SCAF11, FBRSL1, CAB39L, MBNL2, MIPOL1, RPS6KL1, CD19, NOD2, CNOT1, CHRNB1, UBTF, FAM171A2, BRIP1, DNAH17, ASXL3, MEX3C, CRLS1, DYRK1A and combinations thereof.
In one example, the method of calculating a PRS comprises detecting the presence of a genetic variant and measuring the total number of genetic variants at the loci of the SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2, BST1, GAK, ASXL3, VPS13C, FGF20, RPS12, ZNF184, SH3GL2, CCDC62, LCORL, RIMS1, UBAP2, RNF141, SCAF11, FBRSL1, RPS6KL1, UBTF and STK39 genes.
In another example, the method of calculating a PRS comprises detecting the presence of a genetic variant and measuring the total number of genetic variants at the loci of the SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2, IL1R2, SCN3A, SATB1, NCKIPSD, CDC71, ALAS1, TLR9, DNAH1, BAP1, PHF7, NISCH, STAB1, ITIH3, ITIH4, ANK2, CAMK2D, ELOVL7, ZNF184, CTSB, SORBS3, PDLIM2, C8orf58, BIN3, SH3GL2, FAM171A1, GALC, COQ7, TOX3, ATP6V0A1, PSMC3I, TUBG2, GBA-SYT11, RAB7L1-NUCKS1, SIPA1L2, ACMSD-TMEM163, STK39, KRT8P25-APOOP2, NMD3, TMEM175-GAK-DGKQ, BST1, HLA-DQB1, GPNMB, FGF20, MMP16, ITGA8, INPP5F, MIR4697, LRRK2, CCDC62, GCH1, TMEM229B, VPS13C, BCKDK-STX1B, SREBF1-RAI1, MAPT, SPPL2B, DDRGK1, USP25, FCGR2A, VAMP4, KCNS3, KCNIP3, LINC00693, KPNA1, MED12L, SPTSSB, LCORL, CLCN3, PAM, C5orf24, TRIM40, RIMS1, RPS12, GS1-124K5.11, FAM49B, UBAP2, GBF1, RNF141, SCAF11, FBRSL1, CAB39L, MBNL2, MIPOL1, RPS6KL1, CD19, NOD2, CNOT1, CHRNB1, UBTF, FAM171A2, BRIP1, DNAH17, ASXL3, MEX3C, CRLS1 and DYRK1A genes.
In the method for calculating a PRS, the total number of genetic variants may be unweighted or weighted. In one example, the total number of genetic variants may be weighted by the effect size of each variant.
Effect size or beta (β) is a measure of how the risk of developing PD changes for every copy of risk allele or effect allele carried by an individual. It will generally be understood that each individual carries 2 copies of each chromosome (a paternal and a maternal chromosome) and can therefore carry either 0, 1 or 2 copies of a risk allele or effect allele. The “effect size” measures the relative risk of an individual carrying 2 copies of the risk allele versus 1 copy of the risk allele, or 1 copy of the risk allele versus 0 copies of the risk allele. By comparing the number of copies of a risk allele between patients suffering from PD and controls, an effect size for each risk allele or genetic variant can be determined. The effect size may also be expressed as an “odds ratio (OR)”, which is calculated by taking the exponential of the effect size or beta (β).
In one example, effect size may be −0.300, −0.200, −0.150, −0.100, −0.050, 0.050, 0.100, 0.150, 0.200, 0.250, 0.300, 0.350, 0.400, 0.500, 0.600, 0.700, 0.800 or 0.900. In one example, the reported effect size is 0.211. In another example, the reported effect size is 0.217. In yet another example, the reported effect size is 0.128.
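The relationship between the effect size and the odds ratio described above can be shown with a short Python sketch. It applies the exponential to the three effect sizes reported above; the function names are hypothetical, and the scaling by the number of copies assumes an additive model on the log-odds scale, which is a modelling assumption rather than a statement of the present disclosure.

```python
import math


def odds_ratio(beta):
    """Per-allele odds ratio: the exponential of the effect size (beta)."""
    return math.exp(beta)


def odds_ratio_for_copies(beta, copies):
    """Odds ratio for carrying `copies` effect alleles versus none.

    Assumes an additive model on the log-odds scale (an assumption of this sketch).
    """
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    return math.exp(beta * copies)


# Effect sizes reported above, converted to per-allele odds ratios.
reported_betas = [0.211, 0.217, 0.128]
reported_odds_ratios = [odds_ratio(b) for b in reported_betas]
```

A positive effect size gives an odds ratio above 1.0 and a negative effect size gives an odds ratio below 1.0, which matches the sign convention used for risk alleles.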
In one example, the effect size is determined using logistic regression comparing genotypes in patients suffering from PD versus controls (patients who are not suffering from PD). The effect size is calculated for each risk allele or effect allele and combined to construct a PRS.
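As an illustration of how an effect size may be estimated by logistic regression, the following Python sketch fits a single-variant model, logit(P(case)) = alpha + beta × dosage, by Newton-Raphson iteration. It is a simplified sketch only: it assumes an additive coding of genotypes (0, 1 or 2 copies of the effect allele), includes no covariates such as age, sex or principal components, and uses hypothetical genotype counts that are not data from the present disclosure.

```python
import math


def fit_additive_logistic(dosages, is_case, iterations=25):
    """Fit logit(P(case)) = alpha + beta * dosage by Newton-Raphson; return (alpha, beta)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the (negated) Hessian
        for x, y in zip(dosages, is_case):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def build_dataset(case_counts, control_counts):
    """Expand genotype counts ({dosage: n}) into per-subject dosage and label lists."""
    dosages, labels = [], []
    for label, counts in ((1, case_counts), (0, control_counts)):
        for dosage, n in counts.items():
            dosages.extend([dosage] * n)
            labels.extend([label] * n)
    return dosages, labels


# Hypothetical genotype counts, for illustration only.
toy_dosages, toy_labels = build_dataset({0: 30, 1: 50, 2: 20}, {0: 50, 1: 40, 2: 10})
toy_alpha, toy_beta = fit_additive_logistic(toy_dosages, toy_labels)
```

In this toy data set the effect allele is more frequent among cases than controls, so the fitted `toy_beta` is positive. In practice an established statistical package would normally be used instead of a hand-written solver.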
In one example, in the method for calculating a PRS of a subject of developing PD, the PRS of the subject is compared with PRSs in a reference population to determine the percentile of the subject's risk of developing PD. An example of a reference population is a population without PD. Another example is a representative sample of the general population whose PD status is unknown.
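The comparison of a subject's PRS with a reference population can be sketched in Python as follows. The sketch defines the percentile as the percentage of reference scores that are less than or equal to the subject's score, which is one common convention and an assumption here; the function name `prs_percentile` is hypothetical.

```python
from bisect import bisect_right


def prs_percentile(subject_score, reference_scores):
    """Return the percentage of reference PRSs that are <= the subject's PRS."""
    ordered = sorted(reference_scores)
    if not ordered:
        raise ValueError("reference population must not be empty")
    return 100.0 * bisect_right(ordered, subject_score) / len(ordered)
```

For example, a subject whose score is greater than or equal to every score in the reference population is placed at the 100th percentile under this convention.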
In one example, the PRS percentiles are used to estimate the fold-difference in risk of developing PD. In one example, PRS cut-offs for the top and bottom 5% are determined based on the control population, and the numbers of PD disease cases in the first group with PRS higher than or equal to the top 5 percentile and in the second group with PRS lower than or equal to the bottom 5 percentile are then determined respectively to estimate the fold-difference in risk between the two groups in the disease population. In another example, PRS cut-offs for the top and bottom 10% are determined based on the control population, and the numbers of PD disease cases in the first group with PRS higher than or equal to the top 10 percentile and in the second group with PRS lower than or equal to the bottom 10 percentile are then determined respectively to estimate the fold-difference in risk between the two groups in the disease population.
In one example, the PRS percentile is used to predict the risk of developing PD. In one example, a subject with a PRS that is in a higher percentile has a higher risk of developing PD compared to an individual with a PRS that is in a lower percentile. In another example, an individual with a lower percentile PRS has a lower risk of developing PD compared to an individual with a higher percentile PRS. It will therefore be understood that a subject with a PRS that is in the bottom 5 percentile has the lowest risk of developing PD, and a subject with a PRS that is in the 95-100 percentile or the top 5 percentile has the highest risk of developing PD.
In another example, the PRS may be used to determine the prognosis of a subject with PD, where a subject with a PRS in a higher percentile has a higher risk of having poor prognosis compared to a subject with a PRS that is in a lower percentile. Similarly, a subject with a PRS in a lower percentile has a lower risk of poor prognosis compared to a subject with a PRS that is in a higher percentile.
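The cut-off procedure described above can be sketched in Python as follows. This is an illustrative reading only: it assumes that the cut-offs are taken from the control scores using a nearest-rank percentile and that the fold-difference is the ratio of the number of cases in the high-PRS group to the number of cases in the low-PRS group. The function names and these conventions are assumptions and are not specified in the description.

```python
def percentile_cutoff(sorted_scores, percent):
    """Nearest-rank cut-off: smallest score with at least `percent`% of scores at or below it."""
    rank = max(1, (percent * len(sorted_scores) + 99) // 100)  # integer ceiling
    return sorted_scores[rank - 1]


def fold_difference(control_scores, case_scores, tail_percent=5):
    """Ratio of case counts in the top versus bottom PRS tail defined on controls."""
    ordered = sorted(control_scores)
    low_cutoff = percentile_cutoff(ordered, tail_percent)
    high_cutoff = percentile_cutoff(ordered, 100 - tail_percent)
    cases_high = sum(1 for s in case_scores if s >= high_cutoff)
    cases_low = sum(1 for s in case_scores if s <= low_cutoff)
    if cases_low == 0:
        raise ValueError("no cases in the low-PRS group; fold-difference is undefined")
    return cases_high / cases_low
```

Passing `tail_percent=10` gives the corresponding comparison between the top and bottom 10% groups.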
In one example, in the method of identifying whether a subject is suffering from PD, at risk of developing PD, identifying whether a subject is in need of early therapeutic intervention for PD, determining the prognosis, or calculating a PRS of a subject of developing PD, the one or more genetic variants is a polymorphism.
In one example, the polymorphism is a SNV or SNP. For example, the genetic variant is an effect allele or risk allele of the SNP or SNV.
An effect allele refers to the allele whose effects in relation to the disease are being studied. In some examples, the effect allele may be the risk allele, which is the allele of a SNP that confers the risk of developing the disease. Such an allele has genome-wide significance and has an odds ratio >1.0, which indicates an increased risk relative to the other allele. In other words, the risk allele is associated with a positive effect size as opposed to a negative effect size. In the present disclosure, the term “effect allele” refers to the risk allele, which confers the increased risk of developing PD.
In one example, the genetic variant is a SNP selected from the group consisting of rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs246814, rs31244, rs4130047 and combinations thereof.
In one example, the genetic variants for the genes WBSCR17 and SV2C are rs9638616 and rs246814 respectively. In another example, the genetic variants for the genes WBSCR17 and SV2C are rs9638616 and rs31244 respectively.
In one example, the genetic variants are rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs246814 and rs4130047. In another example, the genetic variants are rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs31244 and rs4130047.
It is well understood that each reference SNP (rs) number can be used as an identification number for a specific SNP at the locus of a gene. In one example, rs246814 is a SNP located within an intron of the SV2C gene. In another example, rs31244 is a missense SNP located within SV2C. In yet another example, rs9638616 is a SNP located within an intron of the WBSCR17 gene.
In some examples, the genetic variant at the loci of SNCA is rs6826785, and the effect allele of rs6826785 is cytosine (C). In some examples, the genetic variant at the loci of LRRK2 is rs141336855, and the effect allele of rs141336855 is thymine (T). In some examples, the genetic variant at the loci of PARK16 is rs6679073, and the effect allele of rs6679073 is adenine (A). In some examples, the genetic variant at the loci of MCCC1 is rs2292056, and the effect allele of rs2292056 is guanine (G). In some examples, the genetic variant at the loci of ITPKB is rs16846351, and the effect allele of rs16846351 is guanine (G). In some examples, the genetic variant at the loci of FAM47E-SCARB2 is rs3816248, and the effect allele of rs3816248 is cytosine (C). In some examples, the genetic variant at the loci of DLG2 is rs12278023, and the effect allele of rs12278023 is cytosine (C). In some examples, the genetic variant at the loci of WBSCR17 is rs9638616, and the effect allele of rs9638616 is thymine (T). In some examples, the genetic variant at the loci of FYN is rs1887316, and the effect allele of rs1887316 is adenine (A). In some examples, the genetic variant at the loci of SV2C is rs246814 or rs31244, and the effect allele of rs246814 is thymine (T) and the effect allele of rs31244 is guanine (G). In some examples, the genetic variant at the loci of RIT2 is rs4130047, and the effect allele of rs4130047 is cytosine (C).
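As an illustration of how the effect alleles listed above might be counted in a genotyped subject, the following Python sketch tallies the number of effect alleles (0, 1 or 2) per SNP. The genotype representation (a two-letter string reported on the same strand as the alleles listed above), the function name `count_effect_alleles` and the placeholder identifier in the usage lines are assumptions made for the example.

```python
# Effect (risk) alleles as listed in the preceding paragraph.
EFFECT_ALLELES = {
    "rs6826785": "C",    # SNCA
    "rs141336855": "T",  # LRRK2
    "rs6679073": "A",    # PARK16
    "rs2292056": "G",    # MCCC1
    "rs16846351": "G",   # ITPKB
    "rs3816248": "C",    # FAM47E-SCARB2
    "rs12278023": "C",   # DLG2
    "rs9638616": "T",    # WBSCR17
    "rs1887316": "A",    # FYN
    "rs246814": "T",     # SV2C
    "rs31244": "G",      # SV2C
    "rs4130047": "C",    # RIT2
}


def count_effect_alleles(genotypes):
    """Map each panel SNP in `genotypes` to its effect-allele count (0-2).

    `genotypes` maps an rs number to a two-letter genotype such as "CT".
    SNPs that are not in the panel are ignored in this sketch.
    """
    counts = {}
    for rsid, genotype in genotypes.items():
        allele = EFFECT_ALLELES.get(rsid)
        if allele is None:
            continue
        if len(genotype) != 2:
            raise ValueError(f"expected a two-allele genotype for {rsid}")
        counts[rsid] = genotype.upper().count(allele)
    return counts


# "rs_not_in_panel" is a made-up placeholder, not a real identifier.
example = count_effect_alleles(
    {"rs246814": "TT", "rs9638616": "CT", "rs_not_in_panel": "AA"}
)
```

Here `example` holds a count of 2 for rs246814 and 1 for rs9638616, and the placeholder entry is dropped.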
In another example, in addition to the genetic variants detected in the foregoing gene list, the method further comprises detecting the presence or measuring the total number of genetic variants at the loci of one or more genes selected from the group consisting of IL1R2, SCN3A, SATB1, NCKIPSD, CDC71, ALAS1, TLR9, DNAH1, BAP1, PHF7, NISCH, STAB1, ITIH3, ITIH4, ANK2, CAMK2D, ELOVL7, ZNF184, CTSB, SORBS3, PDLIM2, C8orf58, BIN3, SH3GL2, FAM171A1, GALC, COQ7, TOX3, ATP6V0A1, PSMC3I, TUBG2, GBA-SYT11, RAB7L1-NUCKS1, SIPA1L2, ACMSD-TMEM163, STK39, KRT8P25-APOOP2, NMD3, TMEM175-GAK-DGKQ, BST1, HLA-DQB1, GPNMB, FGF20, MMP16, ITGA8, INPP5F, MIR4697, LRRK2, CCDC62, GCH1, TMEM229B, VPS13C, BCKDK-STX1B, SREBF1-RAI1, MAPT, SPPL2B, DDRGK1, USP25, FCGR2A, VAMP4, KCNS3, KCNIP3, LINC00693, KPNA1, MED12L, SPTSSB, LCORL, CLCN3, PAM, C5orf24, TRIM40, RIMS1, RPS12, GS1-124K5.11, FAM49B, UBAP2, GBF1, RNF141, SCAF11, FBRSL1, CAB39L, MBNL2, MIPOL1, RPS6KL1, CD19, NOD2, CNOT1, CHRNB1, UBTF, FAM171A2, BRIP1, DNAH17, ASXL3, MEX3C, CRLS1, DYRK1A and combinations thereof, wherein the genetic variant is a SNP selected from the group consisting of rs34043159, GSA-rs353116, rs4073221, rs12497850, rs143918452, rs78738012, rs2694528, rs9468199, rs2740594, rs2280104, rs13294100, rs10906923, rs8005172, rs11343, rs4784227, rs601999, rs35749011, rs10797576, rs6430538, rs1474055, rs115185635, rs34016896, rs34311866, rs11724635, rs9275326, rs199347, rs591323, rs60298754, rs7077361, rs117896735, rs329648, rs11060180, rs11158026, rs1555399, rs2414739, rs14235, rs11868035, rs17649553, rs113579895, rs62120679, rs8118008, rs2823357, rs6658353, rs11578699, rs76116224, rs2042477, rs6808178, rs55961674, rs11707416, rs1450522, rs34025766, rs62333164, rs26431, rs11950533, rs9261484, rs12528068, rs75859381, rs76949143, rs2086641, rs6476434, rs10748818, rs7938782, rs7134559, GSA-rs11610045, rs9568188, rs4771268, rs12147950, rs3742785, rs2904880, rs6500328, rs200564078, rs12600861, rs2269906, rs850738, rs61169879, rs666463, rs1941685, rs8087969, rs77351827, rs2248244, rs4613239, rs1474055 and combinations thereof.
In one example, the genetic variants are rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs246814, rs4130047, rs11724635, rs34311866, rs1941685, rs2414739, rs591323, rs75859381, rs9468199, rs13294100, rs11060180, rs34025766, rs12528068, rs6476434, rs7938782, rs7134559, GSA-rs11610045, rs3742785, rs2269906 and rs1474055.
In another example, the genetic variants are rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs31244, rs4130047, rs11724635, rs34311866, rs1941685, rs2414739, rs591323, rs75859381, rs9468199, rs13294100, rs11060180, rs34025766, rs12528068, rs6476434, rs7938782, rs7134559, GSA-rs11610045, rs3742785, rs2269906 and rs1474055.
In another example, the genetic variants are rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs246814, rs4130047, rs34043159, GSA-rs353116, rs4073221, rs12497850, rs143918452, rs78738012, rs2694528, rs9468199, rs2740594, rs2280104, rs13294100, rs10906923, rs8005172, rs11343, rs4784227, rs601999, rs35749011, rs10797576, rs6430538, rs1474055, rs115185635, rs34016896, rs34311866, rs11724635, rs9275326, rs199347, rs591323, rs60298754, rs7077361, rs117896735, rs329648, rs11060180, rs11158026, rs1555399, rs2414739, rs14235, rs11868035, rs17649553, rs113579895, rs62120679, rs8118008, rs2823357, rs6658353, rs11578699, rs76116224, rs2042477, rs6808178, rs55961674, rs11707416, rs1450522, rs34025766, rs62333164, rs26431, rs11950533, rs9261484, rs12528068, rs75859381, rs76949143, rs2086641, rs6476434, rs10748818, rs7938782, rs7134559, GSA-rs11610045, rs9568188, rs4771268, rs12147950, rs3742785, rs2904880, rs6500328, rs200564078, rs12600861, rs2269906, rs850738, rs61169879, rs666463, rs1941685, rs8087969, rs77351827, rs2248244, rs4613239 and rs1474055.
In yet another example, the genetic variants are rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs31244, rs4130047, rs34043159, GSA-rs353116, rs4073221, rs12497850, rs143918452, rs78738012, rs2694528, rs9468199, rs2740594, rs2280104, rs13294100, rs10906923, rs8005172, rs11343, rs4784227, rs601999, rs35749011, rs10797576, rs6430538, rs1474055, rs115185635, rs34016896, rs34311866, rs11724635, rs9275326, rs199347, rs591323, rs60298754, rs7077361, rs117896735, rs329648, rs11060180, rs11158026, rs1555399, rs2414739, rs14235, rs11868035, rs17649553, rs113579895, rs62120679, rs8118008, rs2823357, rs6658353, rs11578699, rs76116224, rs2042477, rs6808178, rs55961674, rs11707416, rs1450522, rs34025766, rs62333164, rs26431, rs11950533, rs9261484, rs12528068, rs75859381, rs76949143, rs2086641, rs6476434, rs10748818, rs7938782, rs7134559, GSA-rs11610045, rs9568188, rs4771268, rs12147950, rs3742785, rs2904880, rs6500328, rs200564078, rs12600861, rs2269906, rs850738, rs61169879, rs666463, rs1941685, rs8087969, rs77351827, rs2248244, rs4613239 and rs1474055.
It is well known in epidemiology that ethnic variations exist and contribute to the prevalence and etiology of various diseases. In PD, it is known that different ethnic populations have different rates of occurrence, for example, Caucasians vs. Asians. It is also known that different ethnic populations have different disease progression, such as in the development of motor symptoms.
It is understood, with the underlying distinct genetic risk factors and etiologies, that patients with the same disease may show different results to the same method of diagnosis. They may also respond differently to the same treatment. There may be ethnic differences in allele frequencies and effect sizes. For example, a SNP of a gene may be strongly associated with the Asian population, but not European population, suggesting potential genetic or allelic heterogeneity at this gene. A previously identified genetic variant may be limited in use by allelic heterogeneity in a different population. Therefore, the methods of the invention may also be applied to various ethnic populations.
In one example, the methods of the present invention may be used in a subject of Asian ethnicity or ancestry. In another example, the subject is of Han Chinese ancestry or Chinese ethnicity or ancestry with no mixed ancestry, or a South Korean ethnicity or ancestry. In the present disclosure, the terms “ancestry” and “ethnicity” are of the same meaning and hence can be used interchangeably.
In one example, the ancestry or ethnicity of the subject is determined by PCA.
PCA may be used to measure the genetic distance and relatedness between an individual and one or more other individuals of known ancestry or ethnicity. Comparison of the genetic distance between the individual with other individuals of known ancestry or ethnicity allows the ancestry or ethnicity of the individual to be mapped or determined. For example, PCA can be used to confirm the ancestry or ethnicity of an individual as samples of a specific ancestry or ethnicity are expected to cluster together. In another example, PCA can be used to disprove the ancestry or ethnicity of an individual or identify an individual with mixed ancestry when a sample obtained from the individual does not cluster with samples of known ancestry or ethnicity.
In one example, PCA may be used to determine an individual as being of Asian ethnicity or ancestry. In another example, PCA may be used to determine an individual as being of Han Chinese ancestry or Chinese ethnicity or ancestry with no mixed ancestry. In yet another example, PCA may be used to determine an individual as being of South Korean ethnicity or ancestry.
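The clustering check described above can be illustrated with a short Python sketch. It assumes that principal component coordinates have already been computed by a PCA tool, and it tests whether a sample lies within a chosen number of standard deviations of a reference group on every component. The function name `clusters_with_reference`, the threshold of 6 standard deviations and the coordinates are hypothetical choices for illustration, not values taken from the disclosure.

```python
from statistics import mean, stdev


def clusters_with_reference(sample_pcs, reference_pcs, max_sd=6.0):
    """Return True if the sample lies within max_sd standard deviations
    of the reference group mean on every principal component.

    sample_pcs: sequence of PC coordinates for one individual.
    reference_pcs: list of such sequences for individuals of known ancestry
                   (at least two individuals are needed for stdev).
    """
    for k, value in enumerate(sample_pcs):
        reference_values = [ref[k] for ref in reference_pcs]
        centre = mean(reference_values)
        spread = stdev(reference_values)
        if spread == 0:
            # Degenerate component: the sample must match the reference exactly.
            if value != centre:
                return False
            continue
        if abs(value - centre) > max_sd * spread:
            return False
    return True


# Hypothetical coordinates on the first two principal components.
reference_group = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.05, 0.05)]
inside = clusters_with_reference((0.02, 0.0), reference_group)
outside = clusters_with_reference((5.0, 0.0), reference_group)
```

A sample that fails this check would be a candidate for a different or mixed ancestry under the assumptions of the sketch; a real analysis would typically also inspect the PCA plots.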
In another aspect, the present invention refers to a kit comprising one or more reagents to detect the presence of a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof in a sample, together with instructions for use.
In one example, the kit comprises one or more reagents to detect the presence of a genetic variant at the loci of SV2C and WBSCR17 genes.
In another example, the kit comprises one or more reagents to detect the presence of a genetic variant at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2.
In one example, in addition to the 11 genes listed in the foregoing, the kit may further comprise reagents to detect the presence of a genetic variant at the loci of one or more genes selected from the group consisting of IL1R2, SCN3A, SATB1, NCKIPSD, CDC71, ALAS1, TLR9, DNAH1, BAP1, PHF7, NISCH, STAB1, ITIH3, ITIH4, ANK2, CAMK2D, ELOVL7, ZNF184, CTSB, SORBS3, PDLIM2, C8orf58, BIN3, SH3GL2, FAM171A1, GALC, COQ7, TOX3, ATP6V0A1, PSMC3I, TUBG2, GBA-SYT11, RAB7L1-NUCKS1, SIPA1L2, ACMSD-TMEM163, STK39, KRT8P25-APOOP2, NMD3, TMEM175-GAK-DGKQ, BST1, HLA-DQB1, GPNMB, FGF20, MMP16, ITGA8, INPP5F, MIR4697, LRRK2, CCDC62, GCH1, TMEM229B, VPS13C, BCKDK-STX1B, SREBF1-RAI1, MAPT, SPPL2B, DDRGK1, USP25, FCGR2A, VAMP4, KCNS3, KCNIP3, LINC00693, KPNA1, MED12L, SPTSSB, LCORL, CLCN3, PAM, C5orf24, TRIM40, RIMS1, RPS12, GS1-124K5.11, FAM49B, UBAP2, GBF1, RNF141, SCAF11, FBRSL1, CAB39L, MBNL2, MIPOL1, RPS6KL1, CD19, NOD2, CNOT1, CHRNB1, UBTF, FAM171A2, BRIP1, DNAH17, ASXL3, MEX3C, CRLS1, DYRK1A and combinations thereof.
In one example, in addition to detecting a genetic variant at the loci of the SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2, the kit further comprises one or more reagents to detect the presence of a genetic variant at the loci of BST1, GAK, ASXL3, VPS13C, FGF20, RPS12, ZNF184, SH3GL2, CCDC62, LCORL, RIMS1, UBAP2, RNF141, SCAF11, FBRSL1, RPS6KL1, UBTF and STK39 genes.
In another example, in addition to detecting a genetic variant at the loci of the SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2, the kit further comprises one or more reagents to detect the presence of a genetic variant at the loci of the IL1R2, SCN3A, SATB1, NCKIPSD, CDC71, ALAS1, TLR9, DNAH1, BAP1, PHF7, NISCH, STAB1, ITIH3, ITIH4, ANK2, CAMK2D, ELOVL7, ZNF184, CTSB, SORBS3, PDLIM2, C8orf58, BIN3, SH3GL2, FAM171A1, GALC, COQ7, TOX3, ATP6V0A1, PSMC3I, TUBG2, GBA-SYT11, RAB7L1-NUCKS1, SIPA1L2, ACMSD-TMEM163, STK39, KRT8P25-APOOP2, NMD3, TMEM175-GAK-DGKQ, BST1, HLA-DQB1, GPNMB, FGF20, MMP16, ITGA8, INPP5F, MIR4697, LRRK2, CCDC62, GCH1, TMEM229B, VPS13C, BCKDK-STX1B, SREBF1-RAI1, MAPT, SPPL2B, DDRGK1, USP25, FCGR2A, VAMP4, KCNS3, KCNIP3, LINC00693, KPNA1, MED12L, SPTSSB, LCORL, CLCN3, PAM, C5orf24, TRIM40, RIMS1, RPS12, GS1-124K5.11, FAM49B, UBAP2, GBF1, RNF141, SCAF11, FBRSL1, CAB39L, MBNL2, MIPOL1, RPS6KL1, CD19, NOD2, CNOT1, CHRNB1, UBTF, FAM171A2, BRIP1, DNAH17, ASXL3, MEX3C, CRLS1 and DYRK1A genes.
In one example, in the kit, the one or more reagents comprises a reagent to isolate a nucleic acid from the sample and at least one primer for amplification of a sequence encoding the genetic variant or part thereof. In another example, the one or more reagents comprises a reagent to isolate a nucleic acid from the sample and at least one probe for amplification of a sequence encoding the genetic variant or part thereof. In yet another example, the one or more reagents comprises a reagent to isolate a nucleic acid from the sample and at least one primer and at least one probe for amplification of a sequence encoding the genetic variant or part thereof.
In one example, the kit of the present invention may be used to identify whether a subject is at risk of developing PD, to identify whether a subject is suffering from PD or whether a subject is in need of early therapeutic intervention for PD.
In another example, the kit of the present invention may be used to determine the prognosis of a subject with PD or a subject at risk of developing PD.
In yet another example, the kit of the present invention may be used to calculate a PRS of a subject of developing PD.
It will be understood that the kit of the present invention may be used for one or more of the uses recited herein.
The term “sequence encoding the genetic variant” may refer to any portion of the chromosome that encodes the genetic variant or SNP, including coding and non-coding regions. Coding regions may refer to exons. Non-coding regions may refer to regulatory regions or regions without known regulatory functions. Examples of non-coding regions include, but are not limited to, introns, 5′ UTRs, 3′ UTRs, and regulatory regions such as enhancers, transcription factor binding domains and DNA methylation regions. In other words, the term “sequence encoding the genetic variant” may refer to the sequence encoding the gene or the sequence affecting the gene or the disease. In some examples, it may refer to the sequence encoding the isoforms of the gene. In one example, it refers to an exon. In another example, it refers to an intron. In another example, it refers to the promoter region. In another example, it refers to the enhancer region. In yet another example, it refers to the transcription factor binding region.
It will be well understood to one of skill in the art that a genetic variant may be detected by a variety of genotyping methods. Examples of methods to detect genetic variation include but are not limited to polymerase chain reaction (PCR), quantitative PCR (qPCR), microarray, real-time PCR (RT-PCR) and Northern blot. Other examples of detection methods include but are not limited to restriction fragment length polymorphism identification (RFLPI) of genomic DNA, random amplified polymorphic detection (RAPD) of genomic DNA, amplified fragment length polymorphism detection (AFLPD), polymerase chain reaction (PCR), DNA sequencing, allele specific oligonucleotide (ASO) probes, hybridization to DNA microarrays or beads, (epi)GBS (genotyping by sequencing) and RADseq. In some examples, the detection method may be NGS or massively parallel DNA sequencing. In one example, the detection method may be microarray.
It will also be understood to one of skill in the art that a variety of detection reagents may be used to detect the genetic variation. Examples of detection reagents include but are not limited to primers, probes and complementary nucleic acid sequences that hybridize to the gene.
In another example, in the method or the kit as described in the foregoing, the sample is selected from the group consisting of an oral tissue sample, scraping, or wash or a biological fluid sample, saliva, urine or blood or post mortem brain tissue. Examples of the sample include but are not limited to blood, serum, saliva, urine, cerebrospinal fluid or bone marrow fluid. In one example, the sample is blood. Some other examples of the sample include but are not limited to fresh tissue, frozen fresh tissue, paraffin embedded tissue or formalin fixed paraffin embedded tissue. In another example, the sample refers to DNA, RNA or protein extracted from one of various types of tissue. In another example, the sample is DNA extracted from one of various types of tissues. In another example, the sample is DNA extracted from blood collected from subjects.
The present invention also refers to a PD biomarker. A PD biomarker may be a combination of genetic variants at the loci of one or more genes.
In one aspect, the present invention refers to a PD biomarker, wherein the biomarker is a genetic variant at the loci of one or more genes selected from the group consisting of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2, RIT2 and combinations thereof.
In one example, the biomarker is a genetic variant at the loci of SV2C and WBSCR17 genes.
In another example, the biomarker is a genetic variant at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2.
The biomarker can be a genetic variant of different types, for example, SNV or SNP. In one example, the biomarker is a SNP at the loci of SV2C and WBSCR17.
In another example, the biomarker is a SNP at the loci of SV2C, WBSCR17, PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, FYN, DLG2, LRRK2 and RIT2.
In one example, the biomarker is a SNP selected from the group consisting of rs9638616, rs246814, rs31244 and combinations thereof.
In another example, the biomarker is a SNP selected from the group consisting of rs6826785, rs141336855, rs6679073, rs2292056, rs16846351, rs3816248, rs12278023, rs9638616, rs1887316, rs246814, rs31244, rs4130047 and combinations thereof.
In another example, the biomarker is an effect allele or risk allele of the genetic variant, wherein the effect allele or risk allele of rs6826785 is cytosine (C), the effect allele of rs141336855 is thymine (T), the effect allele of rs6679073 is adenine (A), the effect allele of rs2292056 is guanine (G), the effect allele of rs16846351 is guanine (G), the effect allele of rs3816248 is cytosine (C), the effect allele of rs12278023 is cytosine (C), the effect allele of rs9638616 is thymine (T), the effect allele of rs1887316 is adenine (A), the effect allele of rs246814 is thymine (T), the effect allele of rs31244 is guanine (G), and the effect allele of rs4130047 is cytosine (C).
The biomarker can be used to, but not limited to, 1) identify whether a subject is at risk of developing PD, whether a subject is suffering from PD, or whether a subject is in need of early therapeutic intervention for PD; 2) determine the prognosis of a subject with PD or a subject at risk of developing PD including identification of therapeutic needs; 3) calculate a PRS of a subject of developing PD; or 4) stratify subjects who are suffering from PD or at risk of developing PD. It will be understood that the biomarker of the present invention may be used for one or more of the uses recited herein.
The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms “comprising”, “including”, “containing”, etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.
The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.
Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.
Non-limiting examples of the invention and comparative examples will be further described in greater detail by reference to specific Examples, which should not be construed as in any way limiting the scope of the invention.
Methods
Patient Recruitment and Sample Collection
Patients and ethnically- and regionally-matched controls were recruited by thirteen independent centres and study groups from six regions across East Asia. A total of 35,994 subjects were recruited, out of which 34,162 DNA samples (94.9% of recruited subjects) passed quality control for genotyping and 31,575 (92.4% of genotyped samples) were included in the final analysis. Patients were diagnosed with PD using the United Kingdom Parkinson's Disease Society Brain Bank Criteria. The subjects' consent was obtained according to the Declaration of Helsinki. Blood samples were collected from each participant and DNA extraction was performed. This study was approved by the ethics committees or institutional review boards of the respective institutions (SingHealth Centralized Institutional Review Board CIRB 2002/008/A and 2019/2334 and Nanyang Technological University Institutional Review Board IRB-2016-08-011).
GWAS Genotyping and Statistical Analysis
Samples (N=34,162) were genotyped on the Illumina Infinium Global Screening Array-24 v2.0 for 759,993 SNPs. Samples were grouped into five regions: Singapore/Malaysia, Hong Kong, Taiwan, mainland China and South Korea. Genotype data from each batch were exported and converted to the forward strand. Samples with extreme heterozygosity, gender inconsistencies or call rates <95% were excluded, as were SNPs with call rates <95%, minor allele frequencies (MAF) <1%, or Hardy-Weinberg equilibrium (HWE) P<10⁻³ in controls and/or P<10⁻⁶ in all samples, and all non-autosomal SNPs (X, Y and mitochondrial chromosomes).
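The SNP-level exclusion thresholds above can be illustrated with the following Python sketch. It is not the pipeline used in the study: the HWE P value here comes from a simple one-degree-of-freedom chi-square approximation rather than whatever test the genotyping software applied, the MAF is computed on all samples as an assumption, and the function names are hypothetical.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts AA, AB and BB."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Approximate HWE P value from a 1-df chi-square goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(call_rate, control_counts, all_counts):
    """Apply the thresholds quoted in the text: call rate >= 95%, MAF >= 1%,
    HWE P >= 1e-3 in controls and HWE P >= 1e-6 in all samples."""
    return (
        call_rate >= 0.95
        and minor_allele_frequency(*all_counts) >= 0.01
        and hwe_chi2_p(*control_counts) >= 1e-3
        and hwe_chi2_p(*all_counts) >= 1e-6
    )


# Hypothetical genotype counts: the first SNP is in perfect HWE,
# the second has no heterozygotes at all.
kept = snp_passes_qc(0.99, (100, 200, 100), (200, 400, 200))
dropped = snp_passes_qc(0.99, (200, 0, 200), (400, 0, 400))
```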
After identity-by-descent analysis using overlapping genotyped SNPs in PLINK to identify first-degree relative pairs, the relative with the lower sample call rate was excluded from each pair. Principal components analysis was also run on 82,324 independent genotyped SNPs (pruned with pairwise r²<0.1 in a window of 500 SNPs, sliding in steps of 50) after exclusion of SNPs in the five conserved long-range linkage disequilibrium (LD) regions in Chinese. Outliers on the first six principal components were then excluded and principal components analysis was re-run in the remaining samples. A total of 31,575 samples remained for the final analysis.
The software IMPUTE version 2 was used for imputation of untyped SNPs in each dataset following pre-phasing using SHAPEIT2, using the multi-ethnic 1000 Genomes Phase 3 reference panel consisting of 77,818,332 biallelic SNP genotypes in 2,504 individuals from Africa, East and South Asia, Europe, and the Americas. The imputation was run separately for each of the five regions. Further stringent quality control filtering was run at the SNP level, excluding those with MAF <1%, info score <0.8, HWE P in controls <10⁻³, or HWE P in all samples <10⁻⁶. All 11 genome-wide significant SNPs were confirmed to have either good genotyping clusters or high imputation info scores.
Logistic regression analysis was run on genotype dosages, adjusting for the first three principal components, using SNPTEST. The results were combined using a fixed-effects inverse-variance meta-analysis in PLINK.
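For readers unfamiliar with the technique, a fixed-effects inverse-variance meta-analysis can be sketched as follows. This is a generic textbook formulation written in Python for illustration, not the PLINK implementation; the function name and the input values are hypothetical. The sketch also reports Cochran's Q and the I² heterogeneity statistic.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Combine per-dataset effect sizes (log odds ratios) and standard errors.

    Each dataset is weighted by the inverse of its variance.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q and I-squared (percentage of variation due to heterogeneity).
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p_value,
            "odds_ratio": math.exp(beta), "q": q, "i_squared": i_squared}


# Hypothetical inputs: two datasets with identical effects, then two that differ.
homogeneous = fixed_effects_meta([0.2, 0.2], [0.1, 0.1])
heterogeneous = fixed_effects_meta([0.0, 0.4], [0.1, 0.1])
```

With identical per-dataset effects the combined standard error shrinks and I² is zero, whereas the second call yields the same pooled beta but a high I².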
Polygenic Risk Calculations
PRS were calculated in 2,536 PD cases and 21,840 population-based controls from Singapore and Malaysia. Weighted PRS were calculated as the sum of high-risk alleles weighted by their effect sizes (beta), which were either calculated by meta-analysis across the five Asian datasets (11 Asian SNPs) or taken from the respective publications (Chang et al, 2017; Nalls et al, 2014; Nalls et al, 2019) (78 European SNPs). For polygenic risk scores combining Asian and European SNPs, 80 SNPs were included, whereby only the Asian SNP was considered at each of the nine loci that overlapped between the Asian and European PRS models. PRS cut-offs for the top and bottom 5% and 10% were determined based on the 21,840 population controls, and the numbers of PD cases within each score range were then determined to estimate the fold-difference in risk between the two extreme groups.
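The weighted score and the tail comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, the quantile interpolation is whatever `statistics.quantiles` applies by default, and the fold-difference is taken as the ratio of case counts in the two control-defined tails, which is one plausible reading of the text. The two weights in the usage lines are the natural logarithms of the odds ratios reported for rs246814 and rs9638616; the score distributions are made up.

```python
import math
from statistics import quantiles


def weighted_prs(dosages, betas):
    """Sum of effect-allele dosages (0-2), each weighted by its beta."""
    return sum(beta * dosages[rsid] for rsid, beta in betas.items())


def tail_case_ratio(case_scores, control_scores, tail_percent=5):
    """Ratio of cases in the top tail to cases in the bottom tail.

    Cut-offs are taken from the control distribution, so each tail holds
    roughly the same share of controls. tail_percent must divide 100.
    """
    cuts = quantiles(control_scores, n=100 // tail_percent)
    low, high = cuts[0], cuts[-1]
    top = sum(1 for score in case_scores if score >= high)
    bottom = sum(1 for score in case_scores if score <= low)
    if bottom == 0:
        raise ValueError("no cases in the bottom tail; ratio is undefined")
    return top / bottom


# Weights: log odds ratios for two of the SNPs (illustrative subset only).
betas = {"rs246814": math.log(1.24), "rs9638616": math.log(1.14)}
score = weighted_prs({"rs246814": 2, "rs9638616": 1}, betas)

# Hypothetical score distributions for controls and cases.
controls = [float(i) for i in range(1, 101)]
cases = [1.0, 2.0, 50.0, 97.0, 98.0, 99.0, 99.0, 100.0, 100.0]
fold = tail_case_ratio(cases, controls, tail_percent=5)
```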
Fraction of Variance and Area Under Curve Analysis
The percentage of the total variance explained was estimated by calculating Nagelkerke's pseudo R² using the fmsb package, entering SNP genotypes and affection status into the glm function in R (v 3.5.0). Receiver-operating characteristic (ROC) curves and area under the curve (AUC) estimates were generated using the pROC package, using the bootstrap test (n=100) to assess differences between two ROC curves.
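The two summary statistics named above can be written out explicitly. The Python sketch below is not the fmsb or pROC code used in the study; it takes fitted case probabilities and scores as inputs (the model fitting itself is omitted), and the function names and input values are hypothetical.

```python
import math


def nagelkerke_r2(labels, fitted_probs):
    """Nagelkerke's pseudo R-squared for a binary outcome.

    labels: 1 for case, 0 for control (both classes must be present).
    fitted_probs: fitted probability of being a case, strictly between 0 and 1.
    """
    n = len(labels)
    prevalence = sum(labels) / n
    ll_null = sum(math.log(prevalence if y else 1.0 - prevalence) for y in labels)
    ll_model = sum(math.log(p if y else 1.0 - p)
                   for y, p in zip(labels, fitted_probs))
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def auc(case_scores, control_scores):
    """Area under the ROC curve: the fraction of case-control pairs in which
    the case has the higher score, counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical inputs.
r2_null = nagelkerke_r2([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
r2_fit = nagelkerke_r2([1, 0, 1, 0], [0.8, 0.2, 0.8, 0.2])
area = auc([0.9, 0.8], [0.1, 0.85])
```

A model that predicts only the overall prevalence gives an R² of zero, and an AUC of 0.5 corresponds to no discrimination between cases and controls.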
Replication in European-Ancestry and Japanese Samples
SNPs within the two novel loci were analyzed in 988 PD cases and 2,521 controls from Japan, and SNPs in high LD (r²>0.9) were identified using SNiPA. The top SNPs in the largest and most recent European-ancestry PD GWAS (56,306 cases, 1,417,791 controls recruited from North America, Europe, Asia and Australia) from the IPDGC were analyzed.
Results
A total of 31,575 samples remained after quality control filtering, consisting of 6,724 PD cases and 24,851 controls from China (2,279 cases, 2,021 controls), Taiwan (216 cases, 225 controls), Hong Kong (199 cases, 166 controls), South Korea (1,494 cases, 599 controls) and Chinese participants from Singapore and Malaysia (2,536 cases, 21,840 controls). Association statistics were combined using fixed-effects meta-analysis at a total of 5,843,213 SNPs (MAF≥1%; λGC=1.082; λ1000=1.0077; λGC for MAF≥5%=1.092; λ1000=1.0087; LD score intercept=1.02) that were genotyped or successfully imputed at high quality across all five datasets. Sensitivity analyses using leave-one-out meta-analyses suggested that the effect size estimates were not driven by any single study (Table 2).
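The text does not state how λ1000 was derived from λGC. A commonly used rescaling to a study of 1,000 cases and 1,000 controls reproduces the reported figures, as the following Python sketch shows; the function name is hypothetical and the formula is an assumption about the method rather than a statement of it.

```python
def lambda_1000(lambda_gc, n_cases, n_controls):
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lambda_gc - 1.0) * scale


# Sample sizes and inflation factors quoted in the paragraph above.
all_snps = round(lambda_1000(1.082, 6724, 24851), 4)
common_snps = round(lambda_1000(1.092, 6724, 24851), 4)
```

Under this assumption `all_snps` evaluates to 1.0077 and `common_snps` to 1.0087, matching the values quoted for MAF≥1% and MAF≥5% respectively.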
Table 2 Sensitivity analyses using leave-one-out meta-analysis using correlation between beta estimated across all 5,843,213 SNPs using all 5 datasets and beta estimated when one dataset is left out. For the 11 genome-wide significant loci, beta values from each meta-analysis (fixed effects) are shown for the lead SNP.
This meta-analysis revealed eleven genome-wide significant loci out of which nine were previously described (PARK16, ITPKB, MCCC1, SNCA, FAM47E-SCARB2, DLG2, LRRK2, RIT2 and FYN).
Genome-wide significant association was observed at rs246814 (OR=1.24, 95% CI=1.15-1.34, P=3.48×10⁻⁸) located within an intron of the SV2C gene.
Genome-wide significant association was also observed at a second novel locus tagged by rs9638616 (OR=1.14, 95% CI=1.09-1.19, P=2.53×10⁻⁸).
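The relationship between an odds ratio, its 95% confidence interval and the P value can be made explicit with a short calculation. The Python sketch below back-calculates the effect size (beta), standard error and two-sided P value from a published odds ratio and interval, assuming a normal approximation on the log-odds scale; the function name is hypothetical, and because published figures are rounded the result only approximates the reported P value.

```python
import math


def p_from_or_ci(odds_ratio, ci_lower, ci_upper, z_crit=1.96):
    """Approximate beta, standard error, z score and two-sided P value
    from an odds ratio and its 95% confidence interval."""
    beta = math.log(odds_ratio)
    # The interval spans beta +/- z_crit * SE on the log scale.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


# Values reported above for rs246814 at the SV2C locus.
beta, se, z, p_value = p_from_or_ci(1.24, 1.15, 1.34)
```

For rs246814 this gives a positive beta, consistent with a risk allele, and a P value of the same order as the reported 3.48×10⁻⁸.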
The association evidence was evaluated at SNPs and loci previously reported to show genome-wide significant association with PD in European populations (Chang et al, 2017; Nalls et al, 2014; Nalls et al, 2019) in the present GWAS meta-analysis results (Table 5, Table 6). Of the 78 SNPs polymorphic in Asian samples, only three showed genome-wide significant association in Asians, and another six were associated at P<1×10⁻⁵ (Table 5). A total of 63 SNPs had an OR in the same direction (38 with P<0.05) and 15 had an OR in the opposite direction (all with P>0.05 except MEX3C). It is recognized that the present Asian sample set is smaller than the largest European GWAS and has limited statistical power to validate these loci. However, the fraction of polymorphic SNPs showing the same direction of association (63/78=80.8%) and the strong enrichment for significant SNPs (38/78=48.7% at P<0.05; median P=0.055, λ=8.08) suggest a substantial but incomplete overlap in genetic risk between Asian and European populations. At the locus level, SNPs with P<1×10⁻⁵ were observed in 16 of the previously-reported loci (Table 3), while there was no evidence of linked or independent signals crossing P<1×10⁻⁵ at the remaining loci.
Table 5 Variants at reported PD risk loci with P<0.01 in Asian discovery samples. Full SNP rsids and association statistics are listed in Table 6.
To determine if the two novel SNPs are associated with PD risk in other populations, summary statistics from the largest European-ancestry datasets available online, namely the UK Biobank (1,239 cases, 451,025 controls) and the most recent meta-GWAS by the IPDGC (up to 56,306 cases, 1,417,791 controls), were evaluated. Given that the IPDGC dataset includes proxy cases and web-based diagnosed cases and controls, only the subset of clinically diagnosed PD cases consisting of 15,056 cases and 12,637 controls (Table 4) was analysed. In addition, SNPs within these two loci were analysed in 988 cases and 2,521 controls from Japan. Both risk variants are present at lower frequencies in European populations compared to Asian populations (Table 4).
Consistent association was observed at SV2C in Japanese samples (OR=1.11, 95% CI=0.94-1.31, P=0.24) and in European-ancestry samples, including the IPDGC full dataset (OR=1.07, 95% CI=1.04-1.11; P=3.62×10⁻⁵), the IPDGC clinically-diagnosed sub-dataset (OR=1.13, 95% CI=1.06-1.21; P=2.95×10⁻⁴) and UK Biobank data (OR=1.09, 95% CI=0.94-1.26; P=0.25). Based on the full replication datasets, significant replication was observed at the SV2C locus (replication meta-analysis: OR=1.07; 95% CI=1.04-1.11; P=9.74×10⁻⁶; I²=0%, Phet=0.92; combined meta-analysis: OR=1.10; 95% CI=1.07-1.13; P=6.02×10⁻¹⁰; I²=48%, Phet=0.06) (Table 4). Meta-analysis of Asian consortium discovery samples with the European and Japanese clinically-diagnosed PD replication samples provided strong support for the association at both the lead SNP SV2C rs246814 (OR=1.16; 95% CI=1.11-1.21; P=1.17×10⁻¹⁰; I²=0%, Phet=0.50) (Table 4) and the missense variant p.Asp543Asn rs31244 (OR=1.16; 95% CI=1.11-1.21; P=1.80×10⁻¹⁰; I²=0%, Phet=0.53) with low inter-cohort and inter-ethnic heterogeneity.
The WBSCR17 SNP rs9638616 did not appear to be associated with PD risk in European data, whether in the IPDGC full (OR=1.00, 95% CI=0.98-1.02; P=0.76) and clinically-diagnosed datasets (OR=1.01, 95% CI=0.95-1.06; P=0.85) or in UK Biobank (OR=0.97, 95% CI=0.89-1.06; P=0.53), nor in the Japan PD GWAS (OR=1.04, 95% CI=0.94-1.16; P=0.43). This SNP (OR=1.06; 95% CI=1.03-1.10; P=8.37×10−5; I2=67.1%; Phet=3.40×10−3) and locus did not reach genome-wide significance in a meta-analysis between the discovery, Japanese and European clinically-diagnosed PD samples (Table 4).
PRS was calculated based on the 11 genome-wide significant SNPs identified in this Asian PD study (Tables 1 and 7). To evaluate the utility of SNPs identified by European GWAS in predicting risk in the Asian population, separate scores were calculated using 90 risk variants (78 polymorphic) from previously-reported European loci, with effect sizes derived from the GWAS in which they were first reported. The PRS distribution was then evaluated in the largest Asian subset of 2,536 PD cases and 21,840 controls from Singapore and Malaysia (
In the weighted PRS distribution based on the 11 Asian SNPs, a 4.0- and 3.5-fold difference was observed in risk between the top and bottom 5% and 10% of the PRS distribution in controls (
These 11 Asian SNPs were estimated to account for about 2.61% of the variance in PD risk in this dataset (AUC=60.4%; 95% CI=59.5-61.8%), while the 78 polymorphic European SNPs explained about 2.57% of the variance in the same dataset (AUC=60.2%; 95% CI=59.0-61.2%). The AUCs were not significantly different between the two models (P=0.825). While the European PD SNPs are still able to discriminate Asian cases and controls, their utility is limited by allelic heterogeneity, LD differences and variability in effect sizes because of gene-gene or gene-environment interactions. Combining the European and Asian loci (Table 8), a significant improvement was observed in AUC (63.1%; 95% CI: 62.1-64.4%) over the model based on European loci alone (P=6.81×10−12) (
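The weighted PRS and AUC described in this section can be sketched as follows. This is a minimal illustration that assumes the common convention of weighting each risk-allele dosage by the natural log of its odds ratio, and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The odds ratios and genotypes below are hypothetical toy values, not data from Table 7, and the helper names `weighted_prs` and `auc` are illustrative.

```python
from math import log

# Hypothetical per-SNP odds ratios for the risk allele (NOT values from Table 7).
odds_ratios = [1.16, 1.10, 1.25]
weights = [log(o) for o in odds_ratios]  # effect size on the log(OR) scale


def weighted_prs(dosages, snp_weights):
    """Sum of risk-allele dosage (0, 1 or 2) times log(OR) across SNPs."""
    return sum(d * w for d, w in zip(dosages, snp_weights))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk-allele dosages, one row per subject, one column per SNP.
case_genotypes = [[2, 1, 1], [1, 2, 0], [2, 2, 1], [0, 1, 1]]
control_genotypes = [[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]]

case_scores = [weighted_prs(g, weights) for g in case_genotypes]
control_scores = [weighted_prs(g, weights) for g in control_genotypes]
auc_value = auc(case_scores, control_scores)

print(f"AUC on toy data: {auc_value:.3f}")
```

The pairwise AUC loop is quadratic in sample size and is used here only for clarity; for a dataset of the size described above, a rank-based calculation would be the more practical choice.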
Discussion
The largest multicenter Asian GWAS on PD to date has been conducted, analysing 31,575 subjects (6,724 cases, 24,851 controls) from six regions across East Asia. Genome-wide significant association signals were observed at 11 loci and consistent association at nominal significance (P<0.05) at 51 other previously-reported loci. Of the two novel loci identified, strong replication of the association at SV2C was observed across three independent sample collections from European-ancestry and Japanese populations.
The top-associated haplotype at SV2C is consistent between Asian and European-ancestry samples. Despite differences in LD patterns, the top SNP rs246814 is in near-perfect LD with p.Asp543Asn (rs31244) and two other flanking SNPs, rs246813 and rs246815, in both Asians and Europeans, suggesting that the functional variant likely resides on this common haplotype. The lack of significant replication at WBSCR17 in the Japanese dataset may be attributed to the small effect size observed at this locus (68.5% power to detect an association at alpha=0.05). There is no significant genetic heterogeneity between the Japanese replication samples and the present East Asian discovery GWAS samples (Phet=0.24, I2=25.6%).
This study is notable in several aspects. Firstly, strong evidence is provided for the association of genetic variants (including a non-synonymous variant) in SV2C with PD risk in humans. The strong association reported now between this naturally occurring SV2C missense allele and increased risk of PD lends credence to SV2C being a potential therapeutic target.
In addition, the present results demonstrate that there are significant differences in the overall underlying genetic architecture, involving allele frequency and LD patterns and allelic heterogeneity between Europeans and Asians, leading to an improvement in the PRS model upon inclusion of SNPs identified in Asians.
Equivalents
The foregoing examples are presented for the purpose of illustrating the invention and should not be construed as imposing any limitation on the scope of the invention. It will readily be apparent that numerous modifications and alterations may be made to the specific embodiments of the invention described above and illustrated in the examples without departing from the principles underlying the invention. All such modifications and alterations are intended to be embraced by this application.
Number | Date | Country | Kind |
---|---|---|---|
10202001048U | Feb 2020 | SG | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SG2021/050063 | 2/5/2021 | WO |