The present disclosure relates to methods and systems for assessing the risk of a human subject for developing a disease such as coronary artery disease, atrial fibrillation or Type 2 diabetes. These methods may be combined with the subject's clinical risk to improve risk analysis. Such methods may be used to assist decision making about appropriate therapeutic and monitoring regimens.
The most common cardiovascular disease is coronary artery disease (CAD), which involves the reduction of blood flow to the heart muscle due to build-up of plaque (atherosclerosis) in the arteries of the heart. Clinical risk factors include high blood pressure, smoking, diabetes, lack of exercise, obesity, high blood cholesterol, poor diet, depression and excessive alcohol. A number of tests may help with diagnosis, including electrocardiogram, cardiac stress testing, coronary computed tomographic angiography, and coronary angiogram.
Atrial fibrillation (AF) is an abnormal heart rhythm (arrhythmia) characterized by the rapid and irregular beating of the atrial chambers of the heart. AF typically begins as short periods of abnormal beating, which become longer or continuous over time. It may also start as other forms of arrhythmia such as atrial flutter that then transform into AF. High blood pressure and valvular heart disease are the most common alterable risk factors for AF. Other heart-related risk factors include heart failure, coronary artery disease, cardiomyopathy, and congenital heart disease.
Type 2 diabetes (T2D), also known as adult-onset diabetes, is a form of diabetes that is characterized by high blood sugar, insulin resistance, and relative lack of insulin. Long-term complications from high blood sugar include heart disease, strokes, diabetic retinopathy which can result in blindness, kidney failure, and poor blood flow in the limbs which may lead to amputations. Type 2 diabetes primarily occurs as a result of obesity and lack of exercise.
The Framingham Risk Score is a gender-specific algorithm used to estimate the 10-year CAD risk of an individual. It is also used for assessing the risk of developing other diseases such as AF and T2D. The Framingham Risk Score was first developed based on data obtained from the Framingham Heart Study, to estimate the 10-year risk of developing coronary heart disease (Wilson et al., 1998). In order to assess the 10-year cardiovascular disease risk, cerebrovascular events, peripheral artery disease and heart failure were subsequently added as disease outcomes for the 2008 Framingham Risk Score, on top of coronary heart disease (D'Agostino et al., 2008).
There is a need for improved methods of assessing an individual's risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes.
In a first aspect, the present invention provides a method for assessing the risk of a human subject developing coronary artery disease, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing coronary artery disease, wherein the at least one polymorphism is selected from Table 1 and/or Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment of the above aspect, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 825, at least 850, or all of the polymorphisms provided in Table 1, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment of the above aspect, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 850, at least 900, at least 950, at least 1,000, or all of the polymorphisms provided in Tables 1 and 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment of the above aspect, the method comprises detecting each of the polymorphisms provided in Table 1. In an embodiment, the method further comprises detecting each of the polymorphisms provided in Table 2.
In another aspect, the present invention provides a method for assessing the risk of a human subject developing atrial fibrillation, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing atrial fibrillation, wherein the at least one polymorphism is selected from Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment of the above aspect, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, or all of the polymorphisms provided in Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment of the above aspect, the method comprises detecting each of the polymorphisms provided in Table 3.
In a further aspect, the present invention provides a method for assessing the risk of a human subject developing Type 2 diabetes, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing Type 2 diabetes, wherein the at least one polymorphism is selected from Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment of the above aspect, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 85, or all of the polymorphisms provided in Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment of the above aspect, the method comprises detecting each of the polymorphisms provided in Table 4.
In an embodiment, the method further comprises
In an embodiment, the clinical risk assessment includes, but is not limited to, obtaining information from the subject on one or more or all of: age, gender, HDL-cholesterol level (mmol/L), LDL-cholesterol level (mmol/L), total cholesterol level, blood pressure (systolic and/or diastolic (mm Hg)), smoking status, have or has had diabetes, on hypertension medication, C-reactive protein levels, whether the subject's mother or father has had a heart attack (such as by the age of 60), body mass index, ethnicity, measures of deprivation, family history, have or has had chronic kidney disease, and have or has had rheumatoid arthritis.
In an embodiment, the clinical risk assessment includes obtaining information from the subject on one or more or all of: age, gender, HDL-cholesterol level (mmol/L), LDL-cholesterol level (mmol/L), total cholesterol level, blood pressure (systolic and diastolic (mm Hg)), smoking status, have or has had diabetes and on hypertension medication.
In an embodiment, the clinical risk assessment is the Framingham score.
In an embodiment, when the disease is coronary artery disease, the clinical risk assessment is the American College of Cardiology Pooled Cohort Equations (PCE).
In an embodiment, combining the clinical risk assessment and the genetic risk assessment comprises adding or multiplying the risk assessments.
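By way of non-limiting illustration only, the combining step may be sketched in Python as follows. The function name `combine_risks`, its `mode` parameter and the example values are hypothetical and are not taken from the present disclosure. The sketch further assumes that the two risk assessments are expressed on compatible scales (for example, a clinical absolute risk and a genetic relative risk when multiplying).

```python
def combine_risks(clinical_risk: float, genetic_risk: float, mode: str = "multiply") -> float:
    """Combine a clinical and a genetic risk assessment by adding or multiplying.

    Both inputs are assumed (hypothetically) to be on scales for which the
    chosen operation is meaningful; no rescaling is performed here.
    """
    if mode == "add":
        return clinical_risk + genetic_risk
    if mode == "multiply":
        return clinical_risk * genetic_risk
    raise ValueError("mode must be 'add' or 'multiply'")


# Hypothetical example: a clinical 10-year risk of 0.08 scaled by a genetic relative risk of 1.5.
combined = combine_risks(0.08, 1.5, mode="multiply")
```

Whether addition or multiplication is appropriate would depend on how each assessment is scaled; the sketch leaves that choice to the caller.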
In a further aspect, the present invention provides a method of determining the identity of the alleles of fewer than 100,000 polymorphisms in a human subject selected from the group of subjects consisting of humans in need of assessment for the risk of developing coronary artery disease to produce a polymorphic profile of the subject, comprising
In a further aspect, the present invention provides a method of determining the identity of the alleles of fewer than 100,000 polymorphisms in a human subject selected from the group of subjects consisting of humans in need of assessment for the risk of developing atrial fibrillation to produce a polymorphic profile of the subject, comprising
In a further aspect, the present invention provides a method of determining the identity of the alleles of fewer than 100,000 polymorphisms in a human subject selected from the group of subjects consisting of humans in need of assessment for the risk of developing Type 2 diabetes to produce a polymorphic profile of the subject, comprising
In an embodiment of the above three aspects, where relevant, fewer than 100,000 polymorphisms, fewer than 50,000 polymorphisms, fewer than 40,000 polymorphisms, fewer than 30,000 polymorphisms, fewer than 20,000 polymorphisms, fewer than 10,000 polymorphisms, fewer than 7,500 polymorphisms, fewer than 5,000 polymorphisms, fewer than 4,000 polymorphisms, fewer than 3,000 polymorphisms, fewer than 2,000 polymorphisms, fewer than 1,000 polymorphisms, fewer than 900 polymorphisms, fewer than 800 polymorphisms, fewer than 700 polymorphisms, fewer than 600 polymorphisms, fewer than 500 polymorphisms, fewer than 400 polymorphisms, fewer than 300 polymorphisms, fewer than 200 polymorphisms, or fewer than 100 polymorphisms, are selected for allelic identity.
In an embodiment, the polymorphism(s) in linkage disequilibrium has linkage disequilibrium above 0.9. In an embodiment, the polymorphism(s) in linkage disequilibrium has linkage disequilibrium above 0.95. In an embodiment, the polymorphism(s) in linkage disequilibrium has linkage disequilibrium above 0.99. In another embodiment, the polymorphism(s) in linkage disequilibrium has linkage disequilibrium of 1.
In an embodiment, the risk assessment produces a score and the method further comprises comparing the score to a predetermined threshold, wherein if the score is at, or above, the threshold the subject is assessed as being at risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes.
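By way of non-limiting illustration only, the threshold comparison may be sketched in Python as follows. The function name `assess_at_risk` and the example score and threshold are hypothetical; the disclosure does not specify a particular threshold value in this passage.

```python
def assess_at_risk(score: float, threshold: float) -> bool:
    """Return True when the score is at, or above, the predetermined threshold."""
    return score >= threshold


# Hypothetical example values; a score equal to the threshold counts as at risk.
at_risk = assess_at_risk(score=1.8, threshold=1.5)
```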
In a further aspect, the present invention provides a method for determining the need for routine diagnostic testing of a human subject for coronary artery disease, atrial fibrillation or Type 2 diabetes, the method comprising assessing the risk of the subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes using a method of the invention.
In another aspect, the present invention provides a method of screening for coronary artery disease, atrial fibrillation or Type 2 diabetes in a human subject, the method comprising assessing the risk of the subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes using a method of the invention, and routinely screening for coronary artery disease, atrial fibrillation or Type 2 diabetes in the subject if they are assessed as having a risk for developing coronary artery disease, atrial fibrillation or Type 2 diabetes.
In an embodiment of the above two aspects, for coronary artery disease the screening involves conducting one or more of an electrocardiogram (ECG), an exercise stress test, a nuclear stress test, a cardiac catheterization and angiogram, or a cardiac CT scan.
In an embodiment of the above two aspects, for atrial fibrillation the screening involves conducting an electrocardiogram (ECG).
In an embodiment of the above two aspects, for Type 2 diabetes the screening involves analysing one or more of blood glucose levels, urine glucose levels, glycated hemoglobin (HbA1c) levels, fructosamine levels or glucose tolerance of the subject.
In an aspect, the present invention provides a method for determining the need of a human subject for prophylactic anti-coronary artery disease therapy, anti-atrial fibrillation therapy or anti-Type 2 diabetes therapy comprising assessing the risk of the subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes using a method of the invention.
In yet a further aspect, the present invention provides a method for preventing or reducing the risk of coronary artery disease, atrial fibrillation or Type 2 diabetes in a human subject, the method comprising assessing the risk of the subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes using a method of the invention, and if they are assessed as having a risk for developing coronary artery disease, atrial fibrillation or Type 2 diabetes administering anti-coronary artery disease therapy, anti-atrial fibrillation therapy or anti-Type 2 diabetes therapy, respectively.
In a further aspect, the present invention provides an anti-coronary artery disease therapy, anti-atrial fibrillation therapy or anti-Type 2 diabetes therapy for use in preventing coronary artery disease, atrial fibrillation or Type 2 diabetes, respectively, in a human subject at risk thereof, wherein the subject is assessed as having a risk for developing coronary artery disease, atrial fibrillation or Type 2 diabetes using a method of the invention.
In an embodiment, the anti-coronary artery disease therapy is selected from, but not limited to, cholesterol lowering medication such as a statin, blood thinning medication such as aspirin, warfarin or rivaroxaban, a β-blocker, nitrates, or a calcium channel blocker.
In an embodiment, the anti-atrial fibrillation therapy is selected from, but not limited to, cardioversion, a β-blocker, a calcium channel blocker, blood thinning medication such as warfarin, aspirin or rivaroxaban, or an antiarrhythmic drug such as quinidine, flecainide, propafenone, sotalol, dofetilide or amiodarone.
In an embodiment, the anti-Type 2 diabetes therapy is selected from, but not limited to, metformin, insulin, a sulfonylurea such as glimepiride, glyburide or glipizide, a meglitinide such as prandin or starlix, a thiazolidinedione such as rosiglitazone or pioglitazone, a DPP-4 inhibitor such as sitagliptin, saxagliptin, linagliptin or alogliptin, a GLP-1 receptor agonist such as exenatide, liraglutide, lixisenatide, albiglutide or dulaglutide, and an SGLT2 inhibitor such as forxiga, invokana or jardiance.
In another aspect, the present invention provides a method for stratifying a group of human subjects for a clinical trial of a candidate therapy, the method comprising assessing the individual risk of the subjects for developing coronary artery disease, atrial fibrillation or Type 2 diabetes using a method of the invention, and using the results of the assessment to select subjects more likely to be responsive to the therapy.
Also provided is a kit comprising at least two sets of primers for amplifying two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 4, or any combinations thereof such as Tables 1, 3 and 4, or Tables 1 and 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In a further aspect, provided is a genetic array comprising at least two sets of probes for hybridising to two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 4, or any combinations thereof such as Tables 1, 3 and 4, or Tables 1 and 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an aspect, the present invention provides a computer implemented method for assessing the risk of a human subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes, the method operable in a computing system comprising a processor and a memory, the method comprising:
In a further aspect, the present invention provides a computer implemented method for assessing the risk of a human subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes, the method operable in a computing system comprising a processor and a memory, the method comprising:
In another aspect, the present invention provides a system for assessing the risk of a human subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes comprising:
In another aspect, the present invention provides a system for assessing the risk of a human subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes comprising:
In an embodiment, the risk data for the subject is received from a user interface coupled to the computing system. In another embodiment, the risk data for the subject is received from a remote device across a wireless communications network. In another embodiment, the user interface or remote device is a SNP array platform. In another embodiment, outputting comprises outputting information to a user interface coupled to the computing system. In another embodiment, outputting comprises transmitting information to a remote device across a wireless communications network.
Any embodiment herein shall be taken to apply mutatis mutandis to any other embodiment unless specifically stated otherwise.
The present invention is not to be limited in scope by the specific embodiments described herein, which are intended for the purpose of exemplification only. Functionally-equivalent products, compositions and methods are clearly within the scope of the invention, as described herein.
Throughout this specification, unless specifically stated otherwise or the context requires otherwise, reference to a single step, composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e. one or more) of those steps, compositions of matter, groups of steps or group of compositions of matter.
The invention is hereinafter described by way of the following non-limiting Examples and with reference to the accompanying FIGURES.
Unless specifically defined otherwise, all technical and scientific terms used herein shall be taken to have the same meaning as commonly understood by one of ordinary skill in the art (e.g., coronary artery disease, atrial fibrillation and Type 2 diabetes analysis, treatment and prevention, molecular genetics, bioinformatics and biochemistry).
Unless otherwise indicated, the molecular and statistical techniques utilized in the present disclosure are standard procedures, well known to those skilled in the art. Such techniques are described and explained throughout the literature in sources such as, J. Perbal, A Practical Guide to Molecular Cloning, John Wiley and Sons (1984), J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbour Laboratory Press (1989), T. A. Brown (editor), Essential Molecular Biology: A Practical Approach, Volumes 1 and 2, IRL Press (1991), D. M. Glover and B. D. Hames (editors), DNA Cloning: A Practical Approach, Volumes 1-4, IRL Press (1995 and 1996), and F. M. Ausubel et al. (editors), Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience (1988, including all updates until present), Ed Harlow and David Lane (editors) Antibodies: A Laboratory Manual, Cold Spring Harbour Laboratory, (1988), and J. E. Coligan et al. (editors) Current Protocols in Immunology, John Wiley & Sons (including all updates until present).
It is to be understood that this disclosure is not limited to particular embodiments, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, terms in the singular and the singular forms “a,” “an” and “the,” for example, optionally include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a probe” optionally includes a plurality of probe molecules; similarly, depending on the context, use of the term “a nucleic acid” optionally includes, as a practical matter, many copies of that nucleic acid molecule.
As used herein, the term “about”, unless stated to the contrary, refers to +/−10%, more preferably +/−5%, more preferably +/−1%, of the designated value.
The term “and/or”, e.g., “X and/or Y” shall be understood to mean either “X and Y” or “X or Y” and shall be taken to provide explicit support for both meanings or for either meaning.
As used herein, the term “coronary artery disease” refers to the narrowing or blockage of the coronary arteries limiting blood flow to the heart, usually caused by hardening or clogging of the arteries due to the buildup of cholesterol and fatty deposits (called plaques) on the inner walls of the arteries. Symptoms include, but are not limited to, angina, cold sweats, dizziness, light-headedness, nausea or a feeling of indigestion, neck pain, shortness of breath especially with activity, and sleep disturbances. Coronary artery disease is also known in the art as “coronary heart disease” and “atherosclerotic heart disease”.
As used herein, the term “atrial fibrillation” refers to an irregular, typically rapid, heart rate that occurs when the two upper chambers of the heart experience chaotic electrical signals. The result is a fast and irregular heart rhythm. The heart rate in atrial fibrillation generally ranges from 100 to 175 beats a minute. Symptoms include, but are not limited to, general fatigue, rapid and irregular heartbeat, fluttering or thumping in the chest, dizziness, shortness of breath and anxiety, weakness, faintness or confusion, and fatigue when exercising.
As used herein, the term “Type 2 diabetes” or T2D refers to a chronic condition that affects the way the body processes blood sugar (glucose). With type 2 diabetes, the body either does not produce enough insulin, or it resists insulin. Symptoms of Type 2 diabetes include increased thirst, frequent urination, hunger, fatigue and blurred vision. In some cases, there may be no symptoms.
A “polymorphism” is a locus that is variable; that is, within a population, the nucleotide sequence at a polymorphism has more than one version or allele. One example of a polymorphism is a “single nucleotide polymorphism”, which is a polymorphism at a single nucleotide position in a genome (the nucleotide at the specified position varies between individuals or populations).
As used herein, the term “SNP” or “single nucleotide polymorphism” refers to a genetic variation between individuals; e.g., a single nitrogenous base position in the DNA of organisms that is variable. As used herein, “SNPs” is the plural of SNP. Of course, when one refers to DNA herein, such reference may include derivatives of the DNA such as amplicons, RNA transcripts thereof, etc.
The term “allele” refers to one of two or more different nucleotide sequences that occur or are encoded at a specific locus, or two or more different polypeptide sequences encoded by such a locus. For example, a first allele can occur on one chromosome, while a second allele occurs on a second homologous chromosome, e.g., as occurs for different chromosomes of a heterozygous individual, or between different homozygous or heterozygous individuals in a population. An allele “positively” correlates with a trait when it is linked to it and when presence of the allele is an indicator that the trait or trait form will occur in an individual comprising the allele. An allele “negatively” correlates with a trait when it is linked to it and when presence of the allele is an indicator that a trait or trait form will not occur in an individual comprising the allele. The term “risk allele” is used in the context of the present disclosure to refer to an allele indicating a genetic propensity to susceptibility to coronary artery disease, atrial fibrillation or Type 2 diabetes. A subject can be homozygous, heterozygous or null for a particular risk allele.
A marker polymorphism or allele is “correlated” or “associated” with a specified phenotype (coronary artery disease, atrial fibrillation or Type 2 diabetes susceptibility, etc.) when it can be statistically linked (positively or negatively) to the phenotype. Methods for determining whether a polymorphism or allele is statistically linked are known to those in the art. That is, the specified polymorphism(s) occurs more commonly in a case population (e.g., coronary artery disease, atrial fibrillation or Type 2 diabetes patients) than in a control population (e.g., individuals that do not have coronary artery disease, atrial fibrillation or Type 2 diabetes, respectively). This correlation is often inferred as being causal in nature, but it need not be; simple genetic linkage to (association with) a locus for a trait that underlies the phenotype is sufficient for correlation/association to occur.
The phrase “linkage disequilibrium” (LD) is used to describe the statistical correlation between two neighbouring polymorphic genotypes. Typically, LD refers to the correlation between the alleles of a random gamete at the two loci, assuming Hardy-Weinberg equilibrium (statistical independence) between gametes. LD is quantified with either Lewontin's parameter of association (D′) or with Pearson correlation coefficient (r) (Devlin and Risch, 1995). Two loci with a LD value of 1 are said to be in complete LD. At the other extreme, two loci with a LD value of 0 are termed to be in linkage equilibrium. Linkage disequilibrium is calculated following the application of the expectation maximization algorithm (EM) for the estimation of haplotype frequencies (Slatkin and Excoffier, 1996). LD values according to the present disclosure for neighbouring genotypes/loci are selected above 0.5, more preferably, above 0.6, still more preferably, above 0.7, preferably, above 0.8, more preferably above 0.9, ideally about 1.0. Many of the SNPs in linkage disequilibrium with the SNPs of the present disclosure that are described herein have LD values of 0.9 or 1.
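By way of non-limiting illustration only, the two LD measures named above may be sketched in Python for a pair of biallelic loci. The sketch assumes that the four haplotype frequencies are already available (for example, after estimation by an EM procedure, which is not shown), and it reports D′ as an absolute value. The function name `linkage_disequilibrium` and its argument names are hypothetical.

```python
from math import sqrt


def linkage_disequilibrium(f_AB: float, f_Ab: float, f_aB: float, f_ab: float):
    """Return (D, D_prime, r) for two biallelic loci from haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p_A = f_AB + f_Ab  # frequency of allele A at the first locus
    p_B = f_AB + f_aB  # frequency of allele B at the second locus
    if not (0.0 < p_A < 1.0 and 0.0 < p_B < 1.0):
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium: observed AB haplotype frequency minus its expectation
    # under independence of the two loci.
    D = f_AB - p_A * p_B

    # Lewontin's D': D scaled by its maximum attainable magnitude given the
    # allele frequencies (reported here as an absolute value).
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max

    # Pearson correlation coefficient between the alleles at the two loci.
    r = D / sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r
```

Under these assumptions, haplotype frequencies of 0.5, 0, 0, 0.5 correspond to complete LD (both measures equal to 1), whereas four equal frequencies of 0.25 correspond to linkage equilibrium (both measures equal to 0).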
Another way one of skill in the art can readily identify SNPs in linkage disequilibrium with the SNPs of the present disclosure is determining the LOD score for two loci. LOD stands for “logarithm of the odds”, a statistical estimate of whether two genes, or a gene and a disease gene, are likely to be located near each other on a chromosome and are therefore likely to be inherited together. A LOD score of about 2 to 3 or higher is generally understood to mean that two genes are located close to each other on the chromosome. Thus, in an embodiment, LOD values according to the present disclosure for neighbouring genotypes/loci are selected at least above 2, at least above 3, at least above 4, at least above 5, at least above 6, at least above 7, at least above 8, at least above 9, at least above 10, at least above 20, at least above 30, at least above 40, at least above 50.
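By way of non-limiting illustration only, a LOD score may be sketched in Python for the simplest textbook case. The sketch assumes phase-known, fully informative meioses, so that the likelihood of the data at a recombination fraction theta is theta^k (1 - theta)^(n - k) for k recombinants among n meioses, compared against free recombination (theta = 0.5). The function name `lod_score` and the example counts are hypothetical and are not taken from the present disclosure.

```python
from math import log10


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """Return log10 of the odds of linkage at `theta` versus no linkage (theta = 0.5)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("n_recombinants must be between 0 and n_meioses")

    k, n = n_recombinants, n_meioses
    # log10 likelihood under linkage at theta ...
    log_linked = k * log10(theta) + (n - k) * log10(1.0 - theta)
    # ... minus log10 likelihood under free recombination.
    log_unlinked = n * log10(0.5)
    return log_linked - log_unlinked


# Hypothetical example: 10 informative meioses, no recombinants, evaluated at theta = 0.01.
example_lod = lod_score(10, 0, 0.01)
```

In practice the LOD score is typically maximised over theta and computed with dedicated linkage software; the closed form above is only a minimal instance of the odds ratio described.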
In another embodiment, SNPs in linkage disequilibrium with the SNPs of the present disclosure can have a specified genetic recombination distance of about 20 centimorgan (cM) or less. For example, 15 cM or less, 10 cM or less, 9 cM or less, 8 cM or less, 7 cM or less, 6 cM or less, 5 cM or less, 4 cM or less, 3 cM or less, 2 cM or less, 1 cM or less, 0.75 cM or less, 0.5 cM or less, 0.25 cM or less, or 0.1 cM or less. For example, two linked loci within a single chromosome segment can undergo recombination during meiosis with each other at a frequency of less than or equal to about 20%, about 19%, about 18%, about 17%, about 16%, about 15%, about 14%, about 13%, about 12%, about 11%, about 10%, about 9%, about 8%, about 7%, about 6%, about 5%, about 4%, about 3%, about 2%, about 1%, about 0.75%, about 0.5%, about 0.25%, or about 0.1% or less.
In another embodiment, SNPs in linkage disequilibrium with the SNPs of the present disclosure are within 100 kb (which correlates in humans to about 0.1 cM, depending on local recombination rate), within 50 kb, or within 20 kb or less of each other.
One exemplary approach for the identification of surrogate markers for a particular SNP involves a simple strategy that presumes that SNPs surrounding the target SNP are in linkage disequilibrium and can therefore provide information about disease susceptibility. Potential surrogate markers can therefore be identified from publicly available databases, such as HAPMAP, by searching for SNPs fulfilling certain criteria which have been found in the scientific community to be suitable for the selection of surrogate marker candidates.
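By way of non-limiting illustration only, the selection of surrogate marker candidates may be sketched in Python as a simple filter. The selection criteria are not enumerated in this passage; the sketch assumes two illustrative criteria drawn from elsewhere in the present disclosure, namely a physical distance of 100 kb or less from the target SNP and an LD value above 0.9. The record type `CandidateSNP`, the function `select_surrogates`, and all identifiers and positions are hypothetical, and no database query is performed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CandidateSNP:
    snp_id: str            # hypothetical identifier
    position_bp: int       # chromosomal position in base pairs
    ld_with_target: float  # LD value with the target SNP, assumed precomputed


def select_surrogates(
    target_position_bp: int,
    candidates: Iterable[CandidateSNP],
    max_distance_bp: int = 100_000,
    min_ld: float = 0.9,
) -> List[CandidateSNP]:
    """Keep candidates within `max_distance_bp` of the target and with LD above `min_ld`."""
    return [
        c
        for c in candidates
        if abs(c.position_bp - target_position_bp) <= max_distance_bp
        and c.ld_with_target > min_ld
    ]


# Hypothetical candidates around a target SNP at position 1,000,000.
candidates = [
    CandidateSNP("snp_a", 1_050_000, 0.95),  # close and in high LD
    CandidateSNP("snp_b", 1_300_000, 0.99),  # high LD but too distant
    CandidateSNP("snp_c", 990_000, 0.80),    # close but LD too low
]
surrogates = select_surrogates(1_000_000, candidates)
```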
“Allele frequency” refers to the frequency (proportion or percentage) at which an allele is present at a locus within an individual, within a line or within a population of lines. For example, for an allele “A,” diploid individuals of genotype “AA,” “Aa,” or “aa” have allele frequencies of 1.0, 0.5, or 0.0, respectively. One can estimate the allele frequency within a line or population (e.g., cases or controls) by averaging the allele frequencies of a sample of individuals from that line or population. Similarly, one can calculate the allele frequency within a population of lines by averaging the allele frequencies of lines that make up the population.
In an embodiment, the term “allele frequency” is used to define the minor allele frequency (MAF). MAF refers to the frequency at which the least common allele occurs in a given population.
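By way of non-limiting illustration only, the allele frequency and minor allele frequency definitions above may be sketched in Python. The sketch assumes a biallelic locus with genotypes written as two-character strings such as "AA", "Aa" or "aa"; the function names and the example genotypes are hypothetical.

```python
from typing import Sequence


def individual_allele_frequency(genotype: str, allele: str) -> float:
    """Fraction of an individual's alleles at the locus that match `allele`."""
    return genotype.count(allele) / len(genotype)


def population_allele_frequency(genotypes: Sequence[str], allele: str) -> float:
    """Average the per-individual allele frequencies over a sample of individuals."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    return sum(individual_allele_frequency(g, allele) for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[str], allele: str) -> float:
    """For a biallelic locus, the frequency of the less common of the two alleles."""
    p = population_allele_frequency(genotypes, allele)
    return min(p, 1.0 - p)


# Hypothetical sample of four diploid individuals.
sample = ["AA", "AA", "Aa", "aa"]
freq_A = population_allele_frequency(sample, "A")
maf = minor_allele_frequency(sample, "A")
```

For this sample, the individual frequencies of allele "A" are 1.0, 1.0, 0.5 and 0.0, so the population frequency is 0.625 and the minor allele frequency is 0.375.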
An individual is “homozygous” if the individual has only one type of allele at a given locus (e.g., a diploid individual has a copy of the same allele at a locus for each of two homologous chromosomes). An individual is “heterozygous” if more than one allele type is present at a given locus (e.g., a diploid individual with one copy each of two different alleles). The term “homogeneity” indicates that members of a group have the same genotype at one or more specific loci. In contrast, the term “heterogeneity” is used to indicate that individuals within the group differ in genotype at one or more specific loci.
A “locus” is a chromosomal position or region. For example, a polymorphic locus is a position or region where a polymorphic nucleic acid, trait determinant, gene or marker is located. In a further example, a “gene locus” is a specific chromosome location (region) in the genome of a species where a specific gene can be found.
A “marker,” “molecular marker” or “marker nucleic acid” refers to a nucleotide sequence or encoded product thereof (e.g., a protein) used as a point of reference when identifying a locus or a linked locus. A marker can be derived from genomic nucleotide sequence or from expressed nucleotide sequences (e.g., from an RNA, nRNA, mRNA, a cDNA, etc.), or from an encoded polypeptide. The term also refers to nucleic acid sequences complementary to or flanking the marker sequences, such as nucleic acids used as probes or primer pairs capable of amplifying the marker sequence. A “marker probe” is a nucleic acid sequence or molecule that can be used to identify the presence of a marker locus, e.g., a nucleic acid probe that is complementary to a marker locus sequence. Nucleic acids are “complementary” when they specifically hybridize in solution, e.g., according to Watson-Crick base pairing rules. A “marker locus” is a locus that can be used to track the presence of a second linked locus, e.g., a linked or correlated locus that encodes or contributes to the population variation of a phenotypic trait. For example, a marker locus can be used to monitor segregation of alleles at a locus, such as a quantitative trait locus (QTL), that are genetically or physically linked to the marker locus. Thus, a “marker allele,” alternatively an “allele of a marker locus” is one of a plurality of polymorphic nucleotide sequences found at a marker locus in a population that is polymorphic for the marker locus.
In one embodiment, the present disclosure provides marker loci correlating with a phenotype of interest, e.g., coronary artery disease, atrial fibrillation or Type 2 diabetes. Each of the identified markers is expected to be in close physical and genetic proximity (resulting in physical and/or genetic linkage) to a genetic element, e.g., a QTL that contributes to the relevant phenotype. Markers corresponding to genetic polymorphisms between members of a population can be detected by methods well-established in the art. These include, e.g., PCR-based sequence specific amplification methods, detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of allele specific hybridization (ASH), detection of single nucleotide extension, detection of amplified variable sequences of the genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of single nucleotide polymorphisms (SNPs), or detection of amplified fragment length polymorphisms (AFLPs).
The term “amplifying” in the context of nucleic acid amplification is any process whereby additional copies of a selected nucleic acid (or a transcribed form thereof) are produced. Typical amplification methods include various polymerase based replication methods, including the polymerase chain reaction (PCR), ligase mediated methods such as the ligase chain reaction (LCR) and RNA polymerase based amplification (e.g., by transcription) methods.
An “amplicon” is an amplified nucleic acid, e.g., a nucleic acid that is produced by amplifying a template nucleic acid by any available amplification method (e.g., PCR, LCR, transcription, or the like).
A specified nucleic acid is “derived from” a given nucleic acid when it is constructed using the given nucleic acid's sequence, or when the specified nucleic acid is constructed using the given nucleic acid.
A “gene” is one or more sequence(s) of nucleotides in a genome that together encode one or more expressed molecules, e.g., an RNA, or polypeptide. The gene can include coding sequences that are transcribed into RNA which may then be translated into a polypeptide sequence, and can include associated structural or regulatory sequences that aid in replication or expression of the gene.
A “genotype” is the genetic constitution of an individual (or group of individuals) at one or more genetic loci. Genotype is defined by the allele(s) of one or more known loci of the individual, typically, the compilation of alleles inherited from its parents.
A “haplotype” is the genotype of an individual at a plurality of genetic loci on a single DNA strand. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome strand.
A “set” of markers, probes or primers refers to a collection or group of markers, probes, primers, or the data derived therefrom, used for a common purpose (e.g., assessing an individual's risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes). Frequently, data corresponding to the markers, probes or primers, or derived from their use, is stored in an electronic medium. While each of the members of a set possesses utility with respect to the specified purpose, individual markers selected from the set as well as subsets including some, but not all of the markers, are also effective in achieving the specified purpose.
The polymorphisms and genes, and corresponding marker probes, amplicons or primers described above can be embodied in any system herein, either in the form of physical nucleic acids, or in the form of system instructions that include sequence information for the nucleic acids. For example, the system can include primers or amplicons corresponding to (or that amplify a portion of) a gene or polymorphism described herein. As in the methods above, the set of marker probes or primers optionally detects a plurality of polymorphisms in a plurality of said genes or genetic loci. Thus, for example, the set of marker probes or primers detects at least one polymorphism in each of these genes, or any other polymorphism, gene or locus defined herein. Any such probe or primer can include a nucleotide sequence of any such polymorphism or gene, or a complementary nucleic acid thereof, or a transcribed product thereof (e.g., a nRNA or mRNA form produced from a genomic sequence, e.g., by transcription or splicing).
As used herein, “Receiver operating characteristic curves” (ROC curves) refer to a graphical plot of the sensitivity vs. (1−specificity) for a binary classifier system as its discrimination threshold is varied. The ROC curve can also be represented equivalently by plotting the fraction of true positives (TPR=true positive rate) vs. the fraction of false positives (FPR=false positive rate). It is also known as a Relative Operating Characteristic curve, because it is a comparison of two operating characteristics (TPR and FPR) as the criterion changes. ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. Methods of using ROC analysis in the context of the disclosure will be clear to those skilled in the art.
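By way of illustration only, the construction of an ROC curve and its area under the curve (AUC) can be sketched as follows. The function names, and the convention that a subject is classified positive when the score is at or above the threshold, are assumptions made for this sketch and are not part of the disclosure.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs obtained by sweeping the discrimination threshold.

    labels: 1 for an affected subject (case), 0 for an unaffected subject (control).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case and one control")
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time (assumed rule: score >= threshold).
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a list of (FPR, TPR) points ordered by threshold."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An AUC of 0.5 corresponds to a score that does not discriminate cases from controls, and an AUC of 1.0 to perfect discrimination.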
As used herein, the term “combining the genetic risk assessment with the clinical risk assessment to obtain the risk” refers to any suitable mathematical analysis relying on the results of the two assessments. For example, the results of the clinical risk assessment and the genetic risk assessment may be added, more preferably multiplied.
As used herein, the terms “routinely screening for coronary artery disease, atrial fibrillation or Type 2 diabetes” and “more frequent screening” are relative terms, and are based on a comparison to the level of screening recommended to a subject who has no identified risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes.
In an embodiment, the methods of the present disclosure relate to assessing the risk of a subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes by performing a genetic risk assessment.
The genetic risk assessment is performed by analysing the genotype of the subject at one or more loci for single nucleotide polymorphisms.
The present invention provides a method for assessing the risk of a human subject developing coronary artery disease, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing coronary artery disease, wherein the at least one polymorphism is selected from Table 1, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 825, at least 850, or all of the polymorphisms provided in Table 1, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting each of the polymorphisms provided in Table 1.
The present invention also provides a method for assessing the risk of a human subject developing coronary artery disease, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing coronary artery disease, wherein the at least one polymorphism is selected from Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 125, or all of the polymorphisms provided in Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting each of the polymorphisms provided in Table 2.
The present invention also provides a method for assessing the risk of a human subject developing coronary artery disease, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing coronary artery disease, wherein the at least one polymorphism is selected from Table 1 and/or Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting each of the polymorphisms provided in Tables 1 and 2.
The present invention also provides a method for assessing the risk of a human subject developing atrial fibrillation, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing atrial fibrillation, wherein the at least one polymorphism is selected from Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, or all of the polymorphisms provided in Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting each of the polymorphisms provided in Table 3.
The present invention further provides a method for assessing the risk of a human subject developing Type 2 diabetes, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least one polymorphism associated with a risk of developing Type 2 diabetes, wherein the at least one polymorphism is selected from Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting the presence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 85, or all of the polymorphisms provided in Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises detecting each of the polymorphisms provided in Table 4.
As the skilled addressee will appreciate, each SNP which increases the risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes has an odds ratio of association with coronary artery disease, atrial fibrillation or Type 2 diabetes, respectively, of greater than 1.0. Furthermore, each SNP which decreases the risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes has an odds ratio of association with coronary artery disease, atrial fibrillation or Type 2 diabetes, respectively, of less than 1.0. In an embodiment, none of the polymorphisms have an odds ratio of association with coronary artery disease, atrial fibrillation or Type 2 diabetes greater than 3 or greater than 4.
In an example, single nucleotide polymorphisms in linkage disequilibrium with one or more of the single nucleotide polymorphisms selected from Table 1, Table 2, Table 3 or Table 4 have LD values of at least 0.5, at least 0.6, at least 0.7, or at least 0.8. In another example, single nucleotide polymorphisms in linkage disequilibrium have LD values of at least 0.9. In another example, single nucleotide polymorphisms in linkage disequilibrium have LD values of 1.
In an embodiment, the number of SNPs assessed is based on the net reclassification improvement in risk prediction calculated using the net reclassification index (NRI) (Pencina et al., 2008). In an embodiment, the net reclassification improvement of the methods of the present disclosure is greater than 0.01.
In a further embodiment, the net reclassification improvement of the methods of the present disclosure is greater than 0.05. In yet another embodiment, the net reclassification improvement of the methods of the present disclosure is greater than 0.1.
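As a non-limiting illustration, a categorical net reclassification improvement can be computed as sketched below: the net proportion of subjects with events who move to a higher risk category, plus the net proportion of subjects without events who move to a lower risk category. The function name and the integer coding of risk categories are assumptions made for this sketch.

```python
from typing import Sequence


def net_reclassification_improvement(
    old_category: Sequence[int],
    new_category: Sequence[int],
    event: Sequence[int],
) -> float:
    """Categorical NRI comparing a new risk model with an old one.

    old_category / new_category: ordinal risk category per subject (higher = riskier).
    event: 1 if the subject developed the disease, 0 otherwise.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_events = sum(event)
    n_nonevents = len(event) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need subjects both with and without events")
    for old, new, had_event in zip(old_category, new_category, event):
        if new > old:  # reclassified upwards
            if had_event:
                up_event += 1
            else:
                up_nonevent += 1
        elif new < old:  # reclassified downwards
            if had_event:
                down_event += 1
            else:
                down_nonevent += 1
    return (up_event - down_event) / n_events + (down_nonevent - up_nonevent) / n_nonevents
```

A positive value indicates that, on balance, the new model moves affected subjects up and unaffected subjects down relative to the old model.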
In an additional embodiment, variants in linkage disequilibrium with those specifically mentioned herein are easily identified by those of skill in the art. Variants that exist in strong linkage disequilibrium with those specifically mentioned herein, such as variants that are linked with an r²>0.8 and D′>0.8, would also be considered as existing in positive linkage disequilibrium sufficient to be considered a surrogate or proxy marker variant for the variants specifically described here. Individuals skilled in the art will be able to assess the inheritance.
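For illustration, the two linkage disequilibrium measures referred to above can be computed from haplotype and allele frequencies as sketched below. The function names are hypothetical, the inputs are assumed to be population frequencies for two biallelic loci, and D′ is reported as an absolute value.

```python
from typing import Tuple


def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (r squared, |D prime|) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    for freq in (p_a, p_b):
        if not 0.0 < freq < 1.0:
            raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d == 0:
        return r2, 0.0
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


def is_proxy(p_ab: float, p_a: float, p_b: float,
             r2_min: float = 0.8, d_prime_min: float = 0.8) -> bool:
    """Apply the example thresholds r2 > 0.8 and D' > 0.8 to a pair of variants."""
    r2, d_prime = linkage_disequilibrium(p_ab, p_a, p_b)
    return r2 > r2_min and d_prime > d_prime_min
```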
An individual's “genetic risk” can be defined as the weighted sum of the individual's genotypes at multiple genetic loci. In other words, it is a linear combination of the risk allele counts across a set of candidate SNPs. For example, a polygenic risk score (PRS) for individual i can be calculated as:
$$\mathrm{PRS}_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_j x_{ij} + \dots + \beta_p x_{ip}, \qquad (1)$$
where $x_{ij} \in \{0, 1, 2\}$ are risk allele counts and $\beta_j$ are the weights for SNP j=1, . . . , p.
The PRS can also be represented as: PRS=Σ(No. of Risk Alleles×β coefficient), where the sum is taken over the SNPs.
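A minimal sketch of equation (1) is given below for illustration. The function name and the example weights are hypothetical; in practice the weights would be taken from the relevant GWAS summary statistics.

```python
from typing import Sequence


def polygenic_risk_score(allele_counts: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of risk allele counts, one count and one weight per SNP."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    for count in allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("risk allele counts must be 0, 1 or 2")
    return sum(beta * count for beta, count in zip(weights, allele_counts))


# Hypothetical example: three SNPs with illustrative (not reported) weights.
example_score = polygenic_risk_score([0, 1, 2], [0.10, 0.20, -0.05])
```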
In one embodiment, the key steps to construct a polygenic risk score (PRS) are to determine which SNPs to include and how to weight their effects. In one embodiment, the maxCT and SCT methods (Prive et al., 2019) are used. These methods are based on clumping and thresholding. The aim of clumping and thresholding is to remove correlated SNPs while keeping the most important SNPs in the PRS. In order to do this, the values of a range of hyperparameters including the correlation threshold (r²), the clumping window size (kb) and the p-value significance threshold (p) are decided. Different choices of these hyperparameter values will in general give a different selection of SNPs to include. For the weights, the reported GWAS coefficients (i.e., regression coefficients or log odds ratios) from external published GWAS can be used.
The core idea of the maxCT and SCT procedures is to select a set of different hyperparameter values and compute a PRS for each combination of these values.
This will usually create a large number of PRSs, for example, around 100,000 vectors of PRSs for a typical GWAS. After constructing these PRSs, there are two approaches to create the final PRS. In maxCT, the PRS that has the strongest predictive performance (e.g., largest AUC) is selected as the final PRS. In SCT, the PRSs are combined by a penalized logistic regression model, for example, the popular lasso procedure. Since the outcome of SCT will be a linear combination of PRSs, where each PRS is again a linear combination of variants, the final PRS still has the form of (1), which means the effect sizes of SNPs can be obtained and used for prediction.
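The clumping, thresholding and maxCT selection steps described above can be illustrated with the following simplified sketch. It is not the implementation of Prive et al.; the data layout (dictionaries with hypothetical keys "id", "p", "beta" and "pos_kb"), the greedy clumping rule, and the omission of chromosome boundaries are all assumptions made to keep the example short.

```python
from itertools import product


def clump(snps, r2, r2_threshold, window_kb):
    """Greedy clumping: visit SNPs from most to least significant and drop any SNP
    correlated (r2 above threshold, within the window) with an already kept SNP."""
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        correlated = any(
            abs(snp["pos_kb"] - k["pos_kb"]) <= window_kb
            and r2.get(frozenset((snp["id"], k["id"])), 0.0) > r2_threshold
            for k in kept
        )
        if not correlated:
            kept.append(snp)
    return kept


def clumped_thresholded_score(genotype, snps, r2, r2_threshold, window_kb, p_threshold):
    """PRS over the SNPs that survive clumping and have p-value <= p_threshold."""
    selected = [s for s in clump(snps, r2, r2_threshold, window_kb) if s["p"] <= p_threshold]
    return sum(s["beta"] * genotype[s["id"]] for s in selected)


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def max_ct(genotypes, labels, snps, r2, grid):
    """Return (best AUC, (r2_threshold, window_kb, p_threshold)) over the grid."""
    best = None
    for r2_thr, window_kb, p_thr in product(grid["r2"], grid["window_kb"], grid["p"]):
        scores = [
            clumped_thresholded_score(g, snps, r2, r2_thr, window_kb, p_thr)
            for g in genotypes
        ]
        performance = auc(scores, labels)
        if best is None or performance > best[0]:
            best = (performance, (r2_thr, window_kb, p_thr))
    return best
```

In this sketch the maxCT step simply keeps the hyperparameter combination with the largest AUC in the training data; the SCT step, which would instead combine all of the candidate scores in a penalized regression, is not shown.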
In an alternate example, a log-additive risk model can then be used to define three genotypes AA, AB, and BB for a single SNP having relative risk values of 1, OR, and OR², under a rare disease model, where OR is the previously reported disease odds ratio for the high-risk allele, B, vs the low-risk allele, A. If the B allele has frequency (p), then these genotypes have population frequencies of (1−p)², 2p(1−p), and p², assuming Hardy-Weinberg equilibrium. The genotype relative risk values for each SNP can then be scaled so that based on these frequencies the average relative risk in the population is 1. Specifically, given the unscaled population average relative risk:
$$\mu = (1-p)^2 + 2p(1-p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$$
Adjusted risk values 1/μ, OR/μ, and OR²/μ are used for AA, AB, and BB genotypes. Missing genotypes are assigned a relative risk of 1. The following formula can be used to define the genetic risk:
SNP1×SNP2×SNP3×SNP4×SNP5×SNP6×SNP7×SNP8, etc.
Similar calculations can be performed for non-SNP polymorphisms.
An alternate method for calculating the composite SNP risk is described in Mavaddat et al. (2015). In this example, the following formula is used:
$$\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_\kappa x_\kappa + \dots + \beta_n x_n$$
where $\beta_\kappa$ is the per-allele log odds ratio (OR) for coronary artery disease, atrial fibrillation or Type 2 diabetes associated with the minor allele for SNP κ, and $x_\kappa$ the number of alleles for the same SNP (0, 1 or 2), n is the total number of SNPs and PRS is the polygenic risk score (which can also be referred to as composite SNP risk).
It is envisaged that the “risk” of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes can be provided as a relative risk (or risk ratio) or an absolute risk as required.
In an embodiment, the genetic risk assessment obtains the “relative risk” of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes. Relative risk (or risk ratio), measured as the incidence of a disease in individuals with a particular characteristic (or exposure) divided by the incidence of the disease in individuals without the characteristic, indicates whether that particular exposure increases or decreases risk. Relative risk is helpful to identify characteristics that are associated with a disease, but by itself is not particularly helpful in guiding screening decisions because the frequency of the risk (incidence) is cancelled out.
In another embodiment, the genetic risk assessment obtains the “absolute risk” of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes. Absolute risk is the numerical probability of a human subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes within a specified period (e.g. 5, 10, 15, 20 or more years). It reflects a human subject's risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes in so far as it does not consider various risk factors in isolation.
In an embodiment, one or more threshold value(s) are set for determining a particular action such as the need for routine diagnostic testing or preventative therapy. For example, a score determined using a method of the invention is compared to a pre-determined threshold, and if the score is higher than the threshold a recommendation is made to take the pre-determined action. Methods of setting such thresholds have now become widely used in the art and are described in, for example, US 20140018258.
The methods of the present disclosure can comprise performing a clinical risk assessment of the subject. The results of the clinical risk assessment can be combined with the genetic risk assessment to obtain the risk of the subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes.
Any suitable clinical risk assessment procedure can be used in the present disclosure. Preferably, the clinical risk assessment does not involve genotyping the subject at one or more loci.
In one embodiment, the clinical risk assessment procedure includes obtaining information from the subject on one or more of the following: age, gender, HDL-cholesterol level (mmol/L), LDL-cholesterol level (mmol/L), total cholesterol level, blood pressure (systolic and/or diastolic (mm Hg)), smoking status, have or has had diabetes, on hypertension medication, c-reactive protein levels, whether the subject's mother or father have had a heart attack (such as by the age of 60), body mass index, ethnicity, measures of deprivation, family history, have or has had chronic kidney disease, and have or has had rheumatoid arthritis.
In another embodiment, the clinical risk assessment procedure includes obtaining information from the subject on one or more of the following: age, gender, HDL-cholesterol level (mmol/L), LDL-cholesterol level (mmol/L), total cholesterol level, blood pressure (systolic and diastolic (mm Hg)), smoking status, have or has had diabetes and on hypertension medication.
In another embodiment, performing the clinical risk assessment uses a model which calculates the absolute risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes. For example, the absolute risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes can be calculated using coronary artery disease, atrial fibrillation or Type 2 diabetes, respectively, incidence rates while accounting for the competing risk of dying from other causes apart from coronary artery disease, atrial fibrillation or Type 2 diabetes. In an embodiment, the clinical risk assessment provides a 5-year absolute risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes. In another embodiment, the clinical risk assessment provides a 10-year absolute risk of developing coronary artery disease, atrial fibrillation or Type 2 diabetes.
Examples of clinical risk assessment procedures include, but are not limited to, the Framingham score, the Reynolds score, QRISK and CANRISK (for diabetes). In a preferred embodiment, the clinical risk assessment is the Framingham Risk Score for the subject. In another preferred embodiment, the clinical risk assessment is the American College of Cardiology/American Heart Association Pooled Cohort Equations (PCE). In a preferred embodiment, the Framingham score is used to determine the risk of developing atrial fibrillation or Type 2 diabetes. In a preferred embodiment, PCE is used to determine the risk of developing coronary artery disease.
The Framingham Risk Score is an algorithm used to estimate the 10-year cardiovascular risk of an individual. The Framingham Risk Score was first developed based on data obtained from the Framingham Heart Study, to estimate the 10-year risk of developing coronary heart disease (Wilson et al., 1998). In order to assess the 10-year cardiovascular disease risk, cerebrovascular events, peripheral artery disease and heart failure were subsequently added as disease outcomes for the 2008 Framingham Risk Score, on top of coronary heart disease (D'Agostino et al., 2008). The Framingham Risk Score has also been shown to be a useful predictor of developing atrial fibrillation and Type 2 diabetes.
In an embodiment, for a female subject the Framingham Risk Score is determined as follows:
Age: 20-34 years: Minus 7 points. 35-39 years: Minus 3 points. 40-44 years: 0 points. 45-49 years: 3 points. 50-54 years: 6 points. 55-59 years: 8 points. 60-64 years: 10 points. 65-69 years: 12 points. 70-74 years: 14 points. 75-79 years: 16 points.
Total cholesterol, mg/dL: Age 20-39 years: Under 160: 0 points. 160-199: 4 points. 200-239: 8 points. 240-279: 11 points. 280 or higher: 13 points. ⋅ Age 40-49 years: Under 160: 0 points. 160-199: 3 points. 200-239: 6 points. 240-279: 8 points. 280 or higher: 10 points. ⋅ Age 50-59 years: Under 160: 0 points. 160-199: 2 points. 200-239: 4 points. 240-279: 5 points. 280 or higher: 7 points. ⋅ Age 60-69 years: Under 160: 0 points. 160-199: 1 point. 200-239: 2 points. 240-279: 3 points. 280 or higher: 4 points. ⋅ Age 70-79 years: Under 160: 0 points. 160-199: 1 point. 200-239: 1 point. 240-279: 2 points. 280 or higher: 2 points.
If cigarette smoker: Age 20-39 years: 9 points. ⋅ Age 40-49 years: 7 points. ⋅ Age 50-59 years: 4 points. ⋅ Age 60-69 years: 2 points. ⋅ Age 70-79 years: 1 point.
All non smokers: 0 points.
HDL cholesterol, mg/dL: 60 or higher: Minus 1 point. 50-59: 0 points. 40-49: 1 point. Under 40: 2 points.
Systolic blood pressure, mm Hg: Untreated: Under 120: 0 points. 120-129: 1 point. 130-139: 2 points. 140-159: 3 points. 160 or higher: 4 points. ⋅ Treated: Under 120: 0 points. 120-129: 3 points. 130-139: 4 points. 140-159: 5 points. 160 or higher: 6 points.
10-year risk in %: Points total: Under 9 points: <1%. 9-12 points: 1%. 13-14 points: 2%. 15 points: 3%. 16 points: 4%. 17 points: 5%. 18 points: 6%. 19 points: 8%. 20 points: 11%. 21 points: 14%. 22 points: 17%. 23 points: 22%. 24 points: 27%. 25 points or more: Over 30%.
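For illustration only, the female point tables above can be encoded as a lookup, as sketched below. The sketch takes 10 points for ages 60-64, 0 points for total cholesterol under 160 mg/dL at ages 60-69, and 4 smoker points for ages 50-59; it treats a total of 25 points or more as the "over 30%" band; and it assumes age is given in whole years. The function and constant names are hypothetical.

```python
from bisect import bisect_right

# (upper age bound, points) for the female age table.
AGE_POINTS = [(34, -7), (39, -3), (44, 0), (49, 3), (54, 6),
              (59, 8), (64, 10), (69, 12), (74, 14), (79, 16)]
AGE_DECADE_UPPER = [39, 49, 59, 69, 79]  # rows of the cholesterol and smoking tables
CHOL_CUTS = [160, 200, 240, 280]         # mg/dL band boundaries
CHOL_POINTS = [[0, 4, 8, 11, 13], [0, 3, 6, 8, 10], [0, 2, 4, 5, 7],
               [0, 1, 2, 3, 4], [0, 1, 1, 2, 2]]
SMOKER_POINTS = [9, 7, 4, 2, 1]
SBP_CUTS = [120, 130, 140, 160]          # mm Hg band boundaries
SBP_UNTREATED = [0, 1, 2, 3, 4]
SBP_TREATED = [0, 3, 4, 5, 6]
RISK_PERCENT = {9: 1, 10: 1, 11: 1, 12: 1, 13: 2, 14: 2, 15: 3, 16: 4, 17: 5,
                18: 6, 19: 8, 20: 11, 21: 14, 22: 17, 23: 22, 24: 27}


def female_framingham_points(age, total_chol, hdl, sbp, treated, smoker):
    """Total Framingham points for a female subject aged 20-79 (whole years)."""
    if not 20 <= age <= 79:
        raise ValueError("the point tables cover ages 20-79 only")
    age_pts = next(pts for upper, pts in AGE_POINTS if age <= upper)
    decade = next(i for i, upper in enumerate(AGE_DECADE_UPPER) if age <= upper)
    chol_pts = CHOL_POINTS[decade][bisect_right(CHOL_CUTS, total_chol)]
    smoke_pts = SMOKER_POINTS[decade] if smoker else 0
    if hdl >= 60:
        hdl_pts = -1
    elif hdl >= 50:
        hdl_pts = 0
    elif hdl >= 40:
        hdl_pts = 1
    else:
        hdl_pts = 2
    sbp_table = SBP_TREATED if treated else SBP_UNTREATED
    sbp_pts = sbp_table[bisect_right(SBP_CUTS, sbp)]
    return age_pts + chol_pts + smoke_pts + hdl_pts + sbp_pts


def female_ten_year_risk(points):
    """Map a points total to the 10-year risk band of the female table."""
    if points < 9:
        return "<1%"
    if points >= 25:
        return "over 30%"
    return f"{RISK_PERCENT[points]}%"
```

For example, a 55-year-old female smoker with total cholesterol of 220 mg/dL, HDL cholesterol of 45 mg/dL and untreated systolic blood pressure of 135 mm Hg scores 8+4+4+1+2=19 points, which corresponds to a 10-year risk of 8%.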
In an embodiment, for a male subject the Framingham Risk Score is determined as follows:
Age: 20-34 years: Minus 9 points. 35-39 years: Minus 4 points. 40-44 years: 0 points. 45-49 years: 3 points. 50-54 years: 6 points. 55-59 years: 8 points. 60-64 years: 10 points. 65-69 years: 11 points. 70-74 years: 12 points. 75-79 years: 13 points.
Total cholesterol, mg/dL: Age 20-39 years: Under 160: 0 points. 160-199: 4 points. 200-239: 7 points. 240-279: 9 points. 280 or higher: 11 points. ⋅ Age 40-49 years: Under 160: 0 points. 160-199: 3 points. 200-239: 5 points. 240-279: 6 points. 280 or higher: 8 points. ⋅ Age 50-59 years: Under 160: 0 points. 160-199: 2 points. 200-239: 3 points. 240-279: 4 points. 280 or higher: 5 points. ⋅ Age 60-69 years: Under 160: 0 points. 160-199: 1 point. 200-239: 1 point. 240-279: 2 points. 280 or higher: 3 points. ⋅ Age 70-79 years: Under 160: 0 points. 160-199: 0 points. 200-239: 0 points. 240-279: 1 point. 280 or higher: 1 point.
If cigarette smoker: Age 20-39 years: 8 points. ⋅ Age 40-49 years: 5 points. ⋅ Age 50-59 years: 3 points. ⋅ Age 60-69 years: 1 point. ⋅ Age 70-79 years: 1 point.
All non smokers: 0 points.
HDL cholesterol, mg/dL: 60 or higher: Minus 1 point. 50-59: 0 points. 40-49: 1 point. Under 40: 2 points.
Systolic blood pressure, mm Hg: Untreated: Under 120: 0 points. 120-129: 0 points. 130-139: 1 point. 140-159: 1 point. 160 or higher: 2 points. ⋅ Treated: Under 120: 0 points. 120-129: 1 point. 130-139: 2 points. 140-159: 2 points. 160 or higher: 3 points.
10-year risk in %: Points total: 0 point: <1%. 1-4 points: 1%. 5-6 points: 2%. 7 points: 3%. 8 points: 4%. 9 points: 5%. 10 points: 6%. 11 points: 8%. 12 points: 10%. 13 points: 12%. 14 points: 16%. 15 points: 20%. 16 points: 25%. 17 points or more: Over 30%.
In an embodiment, PCE (Goff et al., 2014; Riveros-McKay et al., 2021) includes obtaining information from the subject on one or more or all of the following: age, gender/biological sex, ethnicity, smoking status, diabetes status, HDL cholesterol level (mmol/L), total cholesterol level (mmol/L), systolic blood pressure (mm Hg), use of antihypertensive medication.
In an embodiment, in the PCE model the linear combination (L) of the risk variables, which depends on the patient's ethnicity and gender, is the sum of the products of each variable's value and its β coefficient, as defined in Table 5:
Linear Combination (L)=Σ(Variable Value×β coefficient).
The Reynolds risk calculator takes into account a family history of premature heart disease (which implies a genetic predisposition to cardiovascular disease), and also c-reactive protein (CRP) levels (a marker of inflammation) (Ridker et al., 2007 and 2008). These two risk factors are thought to be more predictive of heart disease in women than in men. The Reynolds score is based on the following risk factors: age, current smoker, systolic blood pressure, HDL cholesterol, CRP level, mother or father with heart attack before age 60.
QRISK (the latest version being QRISK3) is a prediction algorithm for cardiovascular disease that uses traditional risk factors (age, systolic blood pressure, smoking status and ratio of total serum cholesterol to high-density lipoprotein cholesterol) together with body mass index, ethnicity, measures of deprivation, family history, chronic kidney disease, rheumatoid arthritis, atrial fibrillation, diabetes mellitus, and antihypertensive treatment (Hippisley-Cox et al., 2008). A QRISK over 10 (10% risk of CVD event over the next ten years) indicates that primary prevention with lipid lowering therapy (such as statins) should be considered.
CANRISK is a questionnaire that helps individuals identify their risk of pre-diabetes or type 2 diabetes (Robinson et al., 2011). It is mainly for adults between the ages of 45 and 74 years, but may also be used for younger groups in high-risk populations. CANRISK includes assessment of age, gender, weight, height, body mass index, waist circumference, level of physical activity, diet, blood pressure, blood sugar levels, birth weight of children, family history of diabetes, ethnicity of parents and level of education.
In combining the clinical risk assessment with the genetic risk assessment to obtain the “risk” of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes, a maxCT approach can be used. In this embodiment, the fitted logistic regression models for CAD, atrial fibrillation and Type 2 diabetes can be:
logit(CAD)=−2.5396+3.5882×Framingham score+0.2771×PRS,
logit(AF)=−2.0551+3.3736×Framingham score+0.2074×PRS,
logit(T2D)=−2.7209+8.6933×Framingham score+0.4120×PRS.
PRS is defined as the sum of risk allele counts (0, 1, 2) of the selected SNPs weighted by their GWAS effect sizes, for example, PRS for individual i is given by:
$$\mathrm{PRS}_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_j x_{ij} + \dots + \beta_p x_{ip},$$
where $x_{ij} \in \{0, 1, 2\}$ are risk allele counts, $\beta_j$ is the effect size for SNP j, and p is the total number of SNPs selected by the maxCT approach. The effect sizes of SNPs are estimated from the reported summary statistics (i.e., regression coefficients or log odds ratios with the information of rs ID, risk allele and p-value) from external published GWAS.
Similarly, using the SCT approach, the fitted logistic regression models for CAD, atrial fibrillation and Type 2 diabetes can be:
logit(CAD)=−2.6492+3.6002×Framingham score+PRS,
logit(AF)=−3.4577+3.4253×Framingham score+PRS,
logit(T2D)=−2.9857+8.6158×Framingham score+PRS.
PRS is again defined as the linear combination of the risk alleles across the selected SNPs. However, for the SCT approach, the effect sizes of the selected SNPs are not simply estimated by the reported log odds ratios but rather by a linear combination of them. The linear combination is guided by testing a grid of different hyperparameters and selecting the one that gives the best prediction performance in the training data.
The odds and the risk of disease for a given Framingham score and PRS can also be estimated from the logistic regression model, if so desired, and are given by:
odds=exp(logit),
risk=[1+exp(−logit)]⁻¹.
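As an illustration, the conversion from the fitted maxCT models above to odds and risk can be sketched as follows. The coefficients are those listed for the maxCT approach; the function names are hypothetical, and the scale on which the Framingham score is supplied is an assumption that must match the scale used when the models were fitted.

```python
import math

# (intercept, Framingham coefficient, PRS coefficient) for the maxCT models.
MAXCT_MODELS = {
    "CAD": (-2.5396, 3.5882, 0.2771),
    "AF": (-2.0551, 3.3736, 0.2074),
    "T2D": (-2.7209, 8.6933, 0.4120),
}


def logit_value(disease: str, framingham_score: float, prs: float) -> float:
    """Linear predictor of the fitted logistic regression model."""
    intercept, b_framingham, b_prs = MAXCT_MODELS[disease]
    return intercept + b_framingham * framingham_score + b_prs * prs


def odds_from_logit(logit: float) -> float:
    """odds = exp(logit)."""
    return math.exp(logit)


def risk_from_logit(logit: float) -> float:
    """risk = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))
```

The two quantities are related by risk=odds/(1+odds), so either may be reported from the same linear predictor.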
In an alternate embodiment, the following formula can be used:
[Risk (i.e. Clinical Evaluation×SNP risk)]=[Clinical Evaluation risk]×SNP1×SNP2×SNP3×SNP4×SNP5×SNP6×SNP7×SNP8× . . . ×SNPN.
Where Clinical Evaluation is the risk provided by the clinical evaluation, and SNP1 to SNPN are the relative risk for the individual SNPs, each scaled to have a population average of 1 as outlined above. Because the SNP risk values have been “centred” to have a population average risk of 1, if one assumes independence among the SNPs, then the population average risk across all genotypes for the combined value is consistent with the underlying Clinical Evaluation risk estimate.
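The multiplicative combination, and the property that centring each SNP to a population average of 1 leaves the average combined risk equal to the clinical risk under independence, can be checked with the short sketch below. The odds ratios, allele frequencies and clinical risk used in the check are hypothetical values chosen only for illustration.

```python
from itertools import product
from math import prod


def genotype_table(odds_ratio, risk_allele_freq):
    """(population frequency, scaled relative risk) for genotypes AA, AB and BB."""
    p = risk_allele_freq
    freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    raw = [1.0, odds_ratio, odds_ratio ** 2]
    mu = sum(f * r for f, r in zip(freqs, raw))  # unscaled population average
    return [(f, r / mu) for f, r in zip(freqs, raw)]


def combined_risk(clinical_risk, snp_relative_risks):
    """[Clinical Evaluation risk] x SNP1 x SNP2 x ... x SNPN."""
    return clinical_risk * prod(snp_relative_risks)


def population_average_risk(clinical_risk, tables):
    """Average combined risk over all genotype combinations, assuming independent SNPs."""
    total = 0.0
    for combo in product(*tables):
        freq = prod(f for f, _ in combo)
        total += freq * combined_risk(clinical_risk, [r for _, r in combo])
    return total


# Hypothetical check: two SNPs and a clinical risk of 8%.
tables = [genotype_table(1.2, 0.3), genotype_table(0.9, 0.6)]
average = population_average_risk(0.08, tables)  # equals 0.08 up to rounding
```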
In an embodiment the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes is calculated by [Clinical Evaluation risk]×SNP1×SNP2×SNP3×SNP4×SNP5×SNP6×SNP7,×SNP8, . . . ×SNPN etc. In another embodiment the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes is calculated by [Clinical Evaluation 5-year risk]×SNP1×SNP2×SNP3×SNP4×SNP5×SNP6×SNP7,×SNP8, . . . ×SNPN etc.
In another embodiment the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes is calculated by [Clinical Evaluation lifetime risk] × SNP1 × SNP2 × SNP3 × SNP4 × SNP5 × SNP6 × SNP7 × SNP8 × . . . × SNPN. In an embodiment, the Clinical Evaluation is performed by assessing one or more of the following to provide a clinical risk: age, gender, HDL-cholesterol level (mmol/L), LDL-cholesterol level (mmol/L), total cholesterol level, blood pressure (systolic and/or diastolic (mm Hg)), smoking status, whether the subject has or has had diabetes, whether the subject is on hypertension medication, C-reactive protein levels, whether the subject's mother or father has had a heart attack (such as by the age of …), body mass index, ethnicity, measures of deprivation, family history, whether the subject has or has had chronic kidney disease, and whether the subject has or has had rheumatoid arthritis. In another embodiment, the clinical risk assessment procedure includes obtaining information from the subject on one or more of the following to provide a clinical risk: age, gender, HDL-cholesterol level (mmol/L), LDL-cholesterol level (mmol/L), total cholesterol level, blood pressure (systolic and diastolic (mm Hg)), smoking status, whether the subject has or has had diabetes, and whether the subject is on hypertension medication. In this embodiment, the risk (i.e. combined genetic risk × clinical risk) is provided by:
[Risk (i.e. clinical × genetic risk)] = [clinical factor1 × clinical factor2 × . . . × clinical factor5] × SNP1 × SNP2 × SNP3 × SNP4 × SNP5 × SNP6 × SNP7 × SNP8 × . . . × SNPN
In various embodiments the method performance is characterized by an area under the curve (AUC) of at least about 0.6, at least about 0.61, at least about 0.62, at least about 0.63, about 0.6, about 0.61, about 0.62, about 0.63, about 0.72, about 0.73, about 0.74, about 0.75, between 0.6 and 0.8, or between 0.6 and 0.76, for the risk of developing coronary artery disease.
In various embodiments the method performance is characterized by an area under the curve (AUC) of at least about 0.6, at least about 0.61, at least about 0.62, at least about 0.63, about 0.6, about 0.61, about 0.62, about 0.63, about 0.71, about 0.72, about 0.73, about 0.74, between 0.6 and 0.8, or between 0.6 and 0.75 for the risk of developing atrial fibrillation.
In various embodiments the method performance is characterized by an area under the curve (AUC) of at least about 0.77, at least about 0.78, about 0.77, about 0.78, for the risk of developing Type 2 diabetes.
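For reference, the AUC values recited above can be understood as the probability that a randomly chosen subject who develops the disease receives a higher risk score than a randomly chosen subject who does not. The following is a minimal sketch of that pairwise definition. The function name is an assumption, and ties are counted as one half by convention.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: risk scores, higher meaning higher predicted risk
    labels: 1 for subjects who developed the disease, 0 otherwise
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases and controls. The quadratic pairwise loop is chosen for clarity rather than speed.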
In an embodiment, three calculations are required to generate the PCE probabilities (pca) defined below. These calculations use the Linear Combination (L) as defined above and variables defined in Table 5 that are dependent on the patient's ethnicity and gender.
The patient's result is their 10-year risk which is detailed below using the patient's PRS score, PCE probabilities, along with the variables defined in Table 6 that are dependent on the patient's ethnicity and gender.
In an embodiment, when the disease is atrial fibrillation a calibrated Framingham score is determined as follows:
The patient's result is their 10-year risk which is detailed below using the patient's PRS score, Framingham probabilities, along with the variables defined in Table 7 that are dependent on the patient's ethnicity and gender.
In an embodiment, one or more threshold value(s) are set for determining a particular action such as the need for routine diagnostic testing or preventative therapy. For example, a score determined using a method of the invention is compared to a pre-determined threshold, and if the score is higher than the threshold a recommendation is made to take the pre-determined action. Methods of setting such thresholds have now become widely used in the art and are described in, for example, US 20140018258.
In one embodiment relating to assessing the risk of a human subject developing coronary artery disease, the subject is considered at low risk if the 10-year Risk Score is below 7.5%, in particular below 5%, at intermediate risk if the 10-year Risk Score is between 7.5% and 20%, and at high risk if the 10-year Risk Score is above 20%. If the subject's 10-year Risk Score is above 7.5%, in particular if above 20%, it is recommended they be given medication to treat and/or prevent hypertension.
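The threshold comparison for coronary artery disease can be sketched as below. The cut-offs of 7.5% and 20% are those given above. The treatment of scores falling exactly on a cut-off (counted here as intermediate) and the function names are assumptions of this example.

```python
LOW_CUTOFF_PERCENT = 7.5    # below this: low risk
HIGH_CUTOFF_PERCENT = 20.0  # above this: high risk


def cad_risk_category(ten_year_risk_percent):
    """Classify a 10-year Risk Score (in percent) as low, intermediate or high."""
    if ten_year_risk_percent < LOW_CUTOFF_PERCENT:
        return "low"
    if ten_year_risk_percent <= HIGH_CUTOFF_PERCENT:
        return "intermediate"  # boundary values assumed to be intermediate
    return "high"


def medication_recommended(ten_year_risk_percent):
    """True when the score exceeds the 7.5% threshold stated in the text."""
    return ten_year_risk_percent > LOW_CUTOFF_PERCENT
```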
The term “subject” as used herein refers to a human subject. Terms such as “subject”, “patient” or “individual” are terms that can, in context, be used interchangeably in the present disclosure. In an example, the methods of the present disclosure can be used for routine screening of subjects. Routine screening can include testing subjects at pre-determined time intervals. Exemplary time intervals include screening monthly, quarterly, six monthly, yearly, every two years or every three years.
In an embodiment, the subject has at least one symptom of coronary artery disease, atrial fibrillation or Type 2 diabetes. In another embodiment, the subject has a family history of coronary artery disease, atrial fibrillation or Type 2 diabetes.
The methods of the present disclosure can be used to assess risk in male and female subjects.
The methods of the present disclosure can be used for assessing the risk for developing coronary artery disease, atrial fibrillation or Type 2 diabetes in human subjects from various ethnic backgrounds. It is well known that over time there has been blending of different ethnic origins. While in practice, this does not influence the ability of a skilled person to practice the methods described herein, it may be desirable to identify the subject's ethnic background. In this instance, the ethnicity of the human subject can be self-reported by the subject. As an example, subjects can be asked to identify their ethnicity in response to this question: “To what ethnic group do you belong?” In another example, the ethnicity of the subject can be derived from medical records after obtaining the appropriate consent from the subject or from the opinion or observations of a clinician.
In an example, the subject can be classified as Caucasoid, Australoid, Mongoloid or Negroid based on physical anthropology. In an embodiment, the subject can be Caucasian, African American, Hispanic, Asian, Indian, or Latino. In an example, the subject is Caucasian. For example, the subject can be European.
A subject of predominantly European origin, either direct or indirect through ancestry, with white skin is considered Caucasian in the context of the present disclosure. A Caucasian may have, for example, at least 75% Caucasian ancestry (for example, but not limited to, the subject having at least three Caucasian grandparents).
A subject of predominantly central or southern African origin, either direct or indirect through ancestry, is considered Negroid in the context of the present disclosure. A Negroid may have, for example, at least 75% Negroid ancestry. An American subject with predominantly Negroid ancestry and black skin is considered African American in the context of the present disclosure. An African American may have, for example, at least 75% Negroid ancestry. A similar principle applies to, for example, subjects of Negroid ancestry living in other countries (for example Great Britain, Canada or the Netherlands).
A subject predominantly originating from Spain or a Spanish-speaking country, such as a country of Central or South America, either direct or indirect through ancestry, is considered Hispanic in the context of the present disclosure. A Hispanic subject may have, for example, at least 75% Hispanic ancestry.
In an embodiment, when the PCE is used the subject is self-assessed as Caucasian (white), or African.
In performing the methods of the present disclosure, a biological sample from a subject is required. It is considered that terms such as “sample” and “specimen” are terms that can, in context, be used interchangeably in the present disclosure. Any biological material can be used as the above-mentioned sample so long as it can be derived from the subject and DNA can be isolated and analyzed according to the methods of the present disclosure. Samples are typically taken, following informed consent, from a patient by standard medical laboratory methods. The sample may be in a form taken directly from the patient, or may be at least partially processed (purified) to remove at least some non-nucleic acid material.
Exemplary “biological samples” include bodily fluids (blood, saliva, urine etc.), biopsy, tissue, and/or waste from the patient. Thus, tissue biopsies, stool, sputum, saliva, blood, lymph, tears, sweat, urine, vaginal secretions, or the like can easily be screened for SNPs, as can essentially any tissue of interest that contains the appropriate nucleic acids. In one embodiment, the biological sample is a cheek cell sample.
In another embodiment the sample is a blood sample. A blood sample can be treated to remove particular cells using various methods such as centrifugation, affinity chromatography (e.g. immunoabsorbent means), immunoselection and filtration if required. Thus, in an example, the sample can comprise a specific cell type or mixture of cell types isolated directly from the subject or purified from a sample obtained from the subject. In an example, the biological sample is peripheral blood mononuclear cells (pBMC). Various methods of purifying sub-populations of cells are known in the art. For example, pBMC can be purified from whole blood using various known Ficoll based centrifugation methods (e.g. Ficoll-Hypaque density gradient centrifugation).
DNA can be extracted from the sample for detecting SNPs. In an example, the DNA is genomic DNA. Various methods of isolating DNA, in particular genomic DNA, are known to those of skill in the art. In general, known methods involve disruption and lysis of the starting material followed by the removal of proteins and other contaminants and finally recovery of the DNA. For example, techniques involving alcohol precipitation, organic phenol/chloroform extraction and salting out have been used for many years to extract and isolate DNA. There are various commercially available kits for genomic DNA extraction (Qiagen; Life Technologies; Sigma). Purity and concentration of DNA can be assessed by various methods, for example, spectrophotometry.
Amplification primers for amplifying markers (e.g., marker loci) and suitable probes to detect such markers or to genotype a sample with respect to multiple marker alleles can be used in the disclosure. For example, primer selection for long-range PCR is described in U.S. Ser. No. 10/042,406 and U.S. Ser. No. 10/236,480; for short-range PCR, U.S. Ser. No. 10/341,832 provides guidance with respect to primer selection. Also, there are publicly available programs such as “Oligo” available for primer design. With such available primer selection and design software, the publicly available human genome sequence and the polymorphism locations, one of skill in the art can construct primers to amplify the SNPs to practice the disclosure. Further, it will be appreciated that the precise probe to be used for detection of a nucleic acid comprising a SNP (e.g., an amplicon comprising the SNP) can vary, e.g., any probe that can identify the region of a marker amplicon to be detected can be used in conjunction with the present disclosure. Further, the configuration of the detection probes can, of course, vary.
As the skilled person will appreciate, the sequence of the genomic region to which these oligonucleotides hybridize can be used to design primers which are longer at the 5′ and/or 3′ end, possibly shorter at the 5′ and/or 3′ end (as long as the truncated version can still be used for amplification), which have one or a few nucleotide differences (but nonetheless can still be used for amplification), or which share no sequence similarity with those provided but which are designed based on genomic sequences close to where the specifically provided oligonucleotides hybridize and which can still be used for amplification.
In some embodiments, the primers of the disclosure are radiolabelled, or labelled by any suitable means (e.g., using a non-radioactive fluorescent tag), to allow for rapid visualization of differently sized amplicons following an amplification reaction without any additional labelling step or visualization step. In some embodiments, the primers are not labelled, and the amplicons are visualized following their size resolution, e.g., following agarose or acrylamide gel electrophoresis. In some embodiments, ethidium bromide staining of the PCR amplicons following size resolution allows visualization of the different size amplicons.
It is not intended that the primers of the disclosure be limited to generating an amplicon of any particular size. For example, the primers used to amplify the marker loci and alleles herein are not limited to amplifying the entire region of the relevant locus, or any subregion thereof. The primers can generate an amplicon of any suitable length for detection. In some embodiments, marker amplification produces an amplicon at least nucleotides in length, or alternatively, at least 50 nucleotides in length, or alternatively, at least 100 nucleotides in length, or alternatively, at least 200 nucleotides in length. Amplicons of any size can be detected using the various technologies described herein. Differences in base composition or size can be detected by conventional methods such as electrophoresis.
Indeed, it will be appreciated that amplification is not a requirement for marker detection; for example, one can directly detect unamplified genomic DNA simply by performing a Southern blot on a sample of genomic DNA.
Typically, molecular markers are detected by any established method available in the art, including, without limitation, allele specific hybridization (ASH), detection of single nucleotide extension, array hybridization (optionally including ASH), or other methods for detecting single nucleotide polymorphisms, amplified fragment length polymorphism (AFLP) detection, amplified variable sequence detection, randomly amplified polymorphic DNA (RAPD) detection, restriction fragment length polymorphism (RFLP) detection, self-sustained sequence replication detection, simple sequence repeat (SSR) detection, and single-strand conformation polymorphisms (SSCP) detection.
Some techniques for detecting genetic markers utilize hybridization of a probe nucleic acid to nucleic acids corresponding to the genetic marker (e.g., amplified nucleic acids produced using genomic DNA as a template). Hybridization formats, including, but not limited to: solution phase, solid phase, mixed phase, or in situ hybridization assays are useful for allele detection. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) and Sambrook et al. (supra).
PCR detection using dual-labelled fluorogenic oligonucleotide probes, commonly referred to as “TaqMan™” probes, can also be performed according to the present disclosure. These probes are composed of short (e.g., 20-25 bases) oligodeoxynucleotides that are labelled with two different fluorescent dyes. On the 5′ terminus of each probe is a reporter dye, and on the 3′ terminus of each probe a quenching dye is found. The oligonucleotide probe sequence is complementary to an internal target sequence present in a PCR amplicon. When the probe is intact, energy transfer occurs between the two fluorophores and emission from the reporter is quenched by the quencher by FRET. During the extension phase of PCR, the probe is cleaved by 5′ nuclease activity of the polymerase used in the reaction, thereby releasing the reporter from the oligonucleotide-quencher and producing an increase in reporter emission intensity. Accordingly, TaqMan™ probes are oligonucleotides that have a label and a quencher, where the label is released during amplification by the exonuclease action of the polymerase used in amplification. This provides a real time measure of amplification during synthesis. A variety of TaqMan™ reagents are commercially available, e.g., from Applied Biosystems (Division Headquarters in Foster City, Calif.) as well as from a variety of specialty vendors such as Biosearch Technologies (e.g., black hole quencher probes). Further details regarding dual-label probe strategies can be found, e.g., in WO 92/02638.
Other similar methods include e.g. fluorescence resonance energy transfer between two adjacently hybridized probes, e.g., using the “LightCycler®” format described in U.S. Pat. No. 6,174,670.
Array-based detection can be performed using commercially available arrays, e.g., from Affymetrix (Santa Clara, Calif.) or other manufacturers. Reviews regarding the operation of nucleic acid arrays include Sapolsky et al. (1999); Lockhart (1998); Fodor (1997a); Fodor (1997b) and Chee et al. (1996). Array based detection is one preferred method for identification of markers of the disclosure in samples, due to the inherently high-throughput nature of array based detection.
The nucleic acid sample to be analyzed is isolated, amplified and, typically, labelled with biotin and/or a fluorescent reporter group. The labelled nucleic acid sample is then incubated with the array using a fluidics station and hybridization oven. The array can be washed and/or stained or counter-stained, as appropriate to the detection method. After hybridization, washing and staining, the array is inserted into a scanner, where patterns of hybridization are detected. The hybridization data are collected as light emitted from the fluorescent reporter groups already incorporated into the labelled nucleic acid, which is now bound to the probe array. Probes that most clearly match the labelled nucleic acid produce stronger signals than those that have mismatches. Since the sequence and position of each probe on the array are known, by complementarity, the identity of the nucleic acid sample applied to the probe array can be identified.
Correlations between SNPs and risk of coronary artery disease, atrial fibrillation or Type 2 diabetes can be performed by any method that can identify a relationship between an allele and increased coronary artery disease, atrial fibrillation or Type 2 diabetes risk, or a combination of alleles and increased coronary artery disease, atrial fibrillation or Type 2 diabetes risk. For example, alleles in genes or loci defined herein can be correlated with increased risk of coronary artery disease, atrial fibrillation or Type 2 diabetes. Most typically, these methods involve referencing a look up table that comprises correlations between alleles of the polymorphism and the coronary artery disease, atrial fibrillation or Type 2 diabetes risk. The table can include data for multiple allele-risk relationships and can take account of additive or other higher order effects of multiple allele-risk relationships, e.g., through the use of statistical tools such as principal component analysis, heuristic algorithms, etc.
Correlation of a marker to a coronary artery disease, atrial fibrillation or Type 2 diabetes risk optionally includes performing one or more statistical tests for correlation. Many statistical tests are known, and most are computer-implemented for ease of analysis. A variety of statistical methods of determining associations/correlations between phenotypic traits and biological markers are known and can be applied to the present disclosure (see, for example, Hartl (1981)). A variety of appropriate statistical models are described in Lynch and Walsh (1998). These models can, for example, provide for correlations between genotypic and phenotypic values, characterize the influence of a locus on coronary artery disease, atrial fibrillation or Type 2 diabetes risk, sort out the relationship between environment and genotype, determine dominance or penetrance of genes, determine maternal and other epigenetic effects, determine principal components in an analysis (via principal component analysis, or “PCA”), and the like. The references cited in these texts provide considerable further detail on statistical models for correlating markers and coronary artery disease, atrial fibrillation or Type 2 diabetes risk.
In addition to standard statistical methods for determining correlation, other methods that determine correlations by pattern recognition and training, such as the use of genetic algorithms, can be used to determine correlations between markers and coronary artery disease, atrial fibrillation or Type 2 diabetes risk. This is particularly useful when identifying higher order correlations between multiple alleles and coronary artery disease, atrial fibrillation or Type 2 diabetes risk. To illustrate, neural network approaches can be coupled to genetic algorithm-type programming for heuristic development of a structure-function data space model that determines correlations between genetic information and phenotypic outcomes.
In any case, essentially any statistical test can be applied in a computer implemented model, by standard programming methods, or using any of a variety of “off the shelf” software packages that perform such statistical analyses, including, for example, those noted above and those that are commercially available, e.g., from Partek Incorporated (St. Peters, Mo.; www.partek.com), e.g., that provide software for pattern recognition (e.g., which provide Partek Pro 2000 Pattern Recognition Software).
Additional details regarding association studies can be found in U.S. Ser. No. 10/106,097, U.S. Ser. No. 10/042,819, U.S. Ser. No. 10/286,417, U.S. Ser. No. 10/768,788, U.S. Ser. No. 10/447,685, U.S. Ser. No. 10/970,761, and U.S. Pat. No. 7,127,355.
Systems for performing the above correlations are also a feature of the disclosure. Typically, the system will include system instructions that correlate the presence or absence of an allele (whether detected directly or, e.g., through expression levels) with a predicted coronary artery disease, atrial fibrillation or Type 2 diabetes risk.
Optionally, the system instructions can also include software that accepts diagnostic information associated with any detected allele information, e.g., a diagnosis that a subject with the relevant allele has a particular coronary artery disease, atrial fibrillation or Type 2 diabetes risk. This software can be heuristic in nature, using such inputted associations to improve the accuracy of the look up tables and/or interpretation of the look up tables by the system. A variety of such approaches, including neural networks, Markov modelling and other statistical analysis are described above.
The disclosure provides methods of determining the polymorphic profile of an individual at the SNPs outlined in the present disclosure (Tables 1 to 4) or SNPs in linkage disequilibrium with one or more thereof.
The polymorphic profile constitutes the polymorphic forms occupying the various polymorphic sites in an individual. In a diploid genome, two polymorphic forms, the same or different from each other, usually occupy each polymorphic site. Thus, the polymorphic profile at sites X and Y can be represented in the form X (x1, x1), and Y (y1, y2), wherein x1, x1 represents two copies of allele x1 occupying site X and y1, y2 represent heterozygous alleles occupying site Y.
The polymorphic profile of an individual can be scored by comparison with the polymorphic forms associated with susceptibility to coronary artery disease, atrial fibrillation or Type 2 diabetes occurring at each site. The comparison can be performed on at least, e.g., 1, 2, 5, 10, 25, 50, or all of the polymorphic sites, and optionally, others in linkage disequilibrium with them. The polymorphic sites can be analyzed in combination with other polymorphic sites.
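A polymorphic profile of the form X (x1, x1), Y (y1, y2) and its comparison against susceptibility-associated forms can be represented with a simple mapping, as in the sketch below. The site and allele labels follow the notation used above. Which allele is treated as the risk allele at each site is hypothetical, and the function name is an assumption made for illustration.

```python
def count_risk_alleles(profile, risk_alleles):
    """Count copies of the risk allele at each scored polymorphic site.

    profile:      dict mapping site -> pair of alleles carried by the individual
    risk_alleles: dict mapping site -> allele associated with susceptibility
    Sites absent from the profile are skipped rather than counted as zero.
    """
    return {
        site: sum(1 for allele in profile[site] if allele == risk)
        for site, risk in risk_alleles.items()
        if site in profile
    }


# Diploid profile from the text: homozygous at site X, heterozygous at site Y.
profile = {"X": ("x1", "x1"), "Y": ("y1", "y2")}
# Hypothetical assignment of risk alleles, for illustration only.
risk_alleles = {"X": "x1", "Y": "y2"}
counts = count_risk_alleles(profile, risk_alleles)
```

The resulting per-site counts (0, 1 or 2) are the kind of input that a weighted score across sites would take.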
Polymorphic profiling is useful, for example, in selecting agents to effect treatment or prophylaxis of coronary artery disease, atrial fibrillation or Type 2 diabetes in a given individual. Individuals having similar polymorphic profiles are likely to respond to agents in a similar way.
The methods of the present disclosure may be implemented by a system as a computer implemented method. For example, the system may be a computer system comprising one or a plurality of processors which may operate together (referred to for convenience as “processor”) connected to a memory. The memory may be a non-transitory computer readable medium, such as a hard drive, a solid state disk or CD-ROM. Software, that is executable instructions or program code, such as program code grouped into code modules, may be stored on the memory, and may, when executed by the processor, cause the computer system to perform functions such as: determining that a task is to be performed to assist a user to determine the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes; receiving data indicating the genetic risk and optionally the clinical risk of the subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes; processing the data to obtain the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes; and outputting the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes.
For example, the memory may comprise program code which when executed by the processor causes the system to determine the presence of the polymorphism(s) or receive data indicating the presence of the polymorphism(s); process the data to obtain the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes; report the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes. Thus, in an embodiment, the program code causes the system to determine the “genetic risk”.
In another example, the memory may comprise program code which when executed by the processor causes the system to determine the presence of the polymorphism(s), or receive data indicating the presence of the polymorphism(s), and receive or determine clinical risk data for the subject; process the data to combine the genetic risk data with the clinical risk data to obtain the risk of the subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes; and report the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes. For example, the program code can cause the system to combine clinical risk assessment data × genetic risk.
In another embodiment, the system may be coupled to a user interface to enable the system to receive information from a user and/or to output or display information.
For example, the user interface may comprise a graphical user interface, a voice user interface or a touchscreen. In an example, the user interface is a SNP array platform.
In an embodiment, the system may be configured to communicate with at least one remote device or server across a communications network such as a wireless communications network. For example, the system may be configured to receive information from the device or server across the communications network and to transmit information to the same or a different device or server across the communications network. In other embodiments, the system may be isolated from direct user interaction.
In another embodiment, performing the methods of the present disclosure to assess the risk of a subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes enables establishment of a diagnostic or prognostic rule based on the genetic risk of the subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes. For example, the diagnostic or prognostic rule can be based on the genetic risk relative to a control, standard or threshold level of risk. In another example, the diagnostic or prognostic rule can be based on the combined genetic and clinical risk relative to a control, standard or threshold level of risk.
In another embodiment, the diagnostic or prognostic rule is based on the application of a statistical and machine learning algorithm. Such an algorithm uses relationships between a population of SNPs and disease status observed in training data (with known disease status) to infer relationships which are then used to determine the risk of a human subject for developing coronary artery disease, atrial fibrillation or Type 2 diabetes in subjects with an unknown risk. An algorithm is employed which provides a risk of a human subject developing coronary artery disease, atrial fibrillation or Type 2 diabetes. The algorithm performs a multivariate or univariate analysis function.
In an embodiment, the present disclosure provides a kit comprising at least one set of primers for amplifying two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 4, or any combinations thereof such as Tables 1, 3 and 4, or Tables 1 and 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit is for detecting, in a biological sample derived from a human subject, the presence of at least one polymorphism associated with a risk of developing coronary artery disease, wherein the at least one polymorphism is selected from Table 1 and/or Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the kit comprises sets of primers for detecting at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 825, at least 850, at least 900, at least 950, at least 1,000, or all of the polymorphisms provided in Table 1 and/or Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the kit comprises sets of primers for detecting each of the polymorphisms provided in Table 1 and/or Table 2, or all of the polymorphisms in Tables 1 and 2.
In an embodiment, the kit is for detecting, in a biological sample derived from a human subject, the presence of at least one polymorphism associated with a risk of developing atrial fibrillation, wherein the at least one polymorphism is selected from Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the kit comprises sets of primers for detecting at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, or all of the polymorphisms provided in Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the kit comprises sets of primers for detecting each of the polymorphisms provided in Table 3.
In another embodiment, the kit is for detecting, in a biological sample derived from a human subject, the presence of at least one polymorphism associated with a risk of developing Type 2 diabetes, wherein the at least one polymorphism is selected from Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the kit comprises sets of primers for detecting at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 85, or all of the polymorphisms provided in Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the kit comprises sets of primers for detecting each of the polymorphisms provided in Table 4.
As would be appreciated by those of skill in the art, once a SNP is identified, primers can be designed to amplify the SNP as a matter of routine. Various software programs are freely available that can suggest suitable primers for amplifying SNPs of interest.
Again, it would be known to those of skill in the art that the PCR primers of a PCR primer pair can be designed to specifically amplify a region of interest from human DNA. In the context of the present disclosure, the region of interest contains the single-base variation (e.g. single-nucleotide polymorphism, SNP) which is to be genotyped. The two PCR primers of a PCR primer pair can be placed adjacent to a particular single-base variation, on opposite sides of the DNA sequence variation. Furthermore, PCR primers can be designed to avoid any known DNA sequence variation and repetitive DNA sequences in their PCR primer binding sites.
The kit may further comprise other reagents required to perform an amplification reaction such as a buffer, nucleotides and/or a polymerase, as well as reagents for extracting nucleic acids from a sample.
Array-based detection is one preferred method for assessing the SNPs of the disclosure in samples, due to the inherently high-throughput nature of array-based detection. A variety of probe arrays have been described in the literature and can be used in the context of the present disclosure for detection of SNPs that can be correlated to coronary artery disease, atrial fibrillation or Type 2 diabetes. For example, DNA probe array chips are used in one embodiment of the disclosure. The recognition of sample DNA by the set of DNA probes takes place through DNA hybridization. When a DNA sample hybridizes with an array of DNA probes, the sample binds to those probes that are complementary to the sample DNA sequence. By evaluating to which probes the sample DNA for an individual hybridizes more strongly, it is possible to determine whether a known sequence of nucleic acid is present or not in the sample, thereby determining whether a marker found in the nucleic acid is present.
Thus, in another embodiment, the present disclosure provides a genetic array comprising at least two sets of probes for hybridising to two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 4, or any combinations thereof such as Tables 1, 3 and 4, or Tables 1 and 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the genetic array is for detecting, in a biological sample derived from a human subject, the presence of at least one polymorphism associated with a risk of developing coronary artery disease, wherein the at least one polymorphism is selected from Table 1 and/or Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the genetic array comprises probes for detecting at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500, at least 550, at least 600, at least 650, at least 700, at least 750, at least 800, at least 825, at least 850, at least 900, at least 950, at least 1,000, or all of the polymorphisms provided in Table 1 and/or Table 2, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the genetic array comprises probes for detecting each of the polymorphisms provided in Table 1 and/or Table 2, or all of the polymorphisms in Tables 1 and 2.
In an embodiment, the genetic array is for detecting, in a biological sample derived from a human subject, the presence of at least one polymorphism associated with a risk of developing atrial fibrillation, wherein the at least one polymorphism is selected from Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the genetic array comprises probes for detecting at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 50, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, or all of the polymorphisms provided in Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the genetic array comprises probes for detecting each of the polymorphisms provided in Table 3.
In another embodiment, the genetic array is for detecting, in a biological sample derived from a human subject, the presence of at least one polymorphism associated with a risk of developing Type 2 diabetes, wherein the at least one polymorphism is selected from Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the genetic array comprises probes for detecting at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 85, or all of the polymorphisms provided in Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the genetic array comprises probes for detecting each of the polymorphisms provided in Table 4.
In an embodiment, the array comprises less than 100,000, less than 50,000, less than 25,000, less than 10,000, less than 5,000, less than 2,500 or less than 1,000 specific nucleic acids.
Primers and probes for other SNPs can be included with the above exemplified kits/arrays.
The inventors used data from the UK Biobank Axiom Array (Bycroft et al., 2018) to develop polygenic risk scores (PRSs) for five common diseases (Said et al., 2018; Eastwood et al., 2016): CAD, hypertension, atrial fibrillation, stroke and type 2 diabetes. The UK Biobank conducted baseline assessment of over 500,000 participants aged 40-69 years from 2006 to 2010. For quality control, the inventors removed variants with a minor allele frequency less than 0.001, a Hardy-Weinberg equilibrium p-value less than 10^-5, or a genotyping rate below 95%. After quality control and exclusion of prevalent cases, the data used for analysis consisted of 315,327 individuals and 602,976 SNPs.
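The quality-control step can be illustrated with a minimal sketch. It assumes per-variant summary statistics have already been computed, and it reads the genotyping-rate criterion as retaining variants genotyped in at least 95% of samples. The `Variant` record, its field names and the example identifiers are hypothetical and are not taken from the UK Biobank data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant QC summary; field names are illustrative."""
    variant_id: str
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    call_rate: float  # fraction of samples with a genotype call

# Thresholds as stated in the text (call rate read as "keep if >= 95%").
MIN_MAF = 0.001
MIN_HWE_P = 1e-5
MIN_CALL_RATE = 0.95

def passes_qc(v: Variant) -> bool:
    """A variant is kept only if it passes all three filters."""
    return (v.maf >= MIN_MAF
            and v.hwe_p >= MIN_HWE_P
            and v.call_rate >= MIN_CALL_RATE)

# Made-up example variants, one failing each filter.
variants = [
    Variant("snp_ok", maf=0.12, hwe_p=0.40, call_rate=0.99),
    Variant("snp_rare", maf=0.0005, hwe_p=0.40, call_rate=0.99),
    Variant("snp_hwe", maf=0.12, hwe_p=1e-7, call_rate=0.99),
    Variant("snp_missing", maf=0.12, hwe_p=0.40, call_rate=0.90),
]
kept = [v.variant_id for v in variants if passes_qc(v)]
```

In practice these filters would typically be applied by a genotype-processing tool rather than by hand; the sketch only makes the three criteria explicit.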
To adjust for the effect of age and to prevent extremely unbalanced case-control ratios, the inventors applied the following resampling strategy to create the training and testing sets for the study. For each of the diseases, the inventors computed the quintiles of age and divided individuals into five age groups. For each of the five age groups, and for each gender separately, the inventors drew (at most) 5 controls for each case (i.e., if the number of controls was not enough to draw 5 controls per case for all age groups, the inventors drew 4 per case, and so on). With this resampling strategy, the individuals are age and sex matched, and the difference between the numbers of cases and controls is greatly reduced.
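The resampling strategy can be sketched as follows. This is an illustrative reading of the description, not the inventors' code: the function names, the dictionary keys (`age`, `sex`, `case`) and the synthetic cohort are all assumptions. The control-to-case ratio is taken to be the largest value, up to 5, that every age-by-sex stratum can supply.

```python
import random
from collections import defaultdict
from statistics import quantiles

def age_group(age, cuts):
    """Index (0 to 4) of the age quintile that `age` falls into."""
    return sum(age > c for c in cuts)

def matched_sample(people, max_ratio=5, seed=0):
    """people: list of dicts with keys 'age', 'sex' and 'case' (bool)."""
    cuts = quantiles([p["age"] for p in people], n=5)  # four cut points
    strata = defaultdict(lambda: {"cases": [], "controls": []})
    for p in people:
        key = (age_group(p["age"], cuts), p["sex"])
        strata[key]["cases" if p["case"] else "controls"].append(p)
    # Largest controls-per-case ratio that every stratum can supply.
    ratio = max_ratio
    for s in strata.values():
        if s["cases"]:
            ratio = min(ratio, len(s["controls"]) // len(s["cases"]))
    rng = random.Random(seed)
    sample = []
    for s in strata.values():
        sample.extend(s["cases"])
        sample.extend(rng.sample(s["controls"], ratio * len(s["cases"])))
    return sample, ratio

# Synthetic cohort: one case and four controls per age and sex,
# so 5 controls per case is not achievable and the ratio falls back to 4.
cohort = [
    {"age": age, "sex": sex, "case": i == 0}
    for age in range(40, 70) for sex in "FM" for i in range(5)
]
sample, ratio = matched_sample(cohort)
n_cases = sum(p["case"] for p in sample)
n_controls = len(sample) - n_cases
```

Because controls are drawn within each age-by-sex stratum at a fixed ratio, age group and sex carry no information about case status in the resampled data.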
After resampling, the data was split into training (70%) and testing (30%) sets for each disease, with roughly the same case-control ratio. The sizes of the training and testing sets are summarized in Table 8. The training sets were used to build a PRS for each disease and the predictive performances were evaluated on the testing sets. The overall workflow is summarized in
The inventors created their PRSs using a recently developed method called stacked clumping and thresholding (SCT) (Prive et al., 2019). Applying clumping (or pruning) to control linkage disequilibrium, followed by marginal p-value thresholding, is a standard method for computing a PRS. This approach requires users to specify hyperparameters such as the size of the clumping window (kb), the correlation threshold (r2) and the p-value significance threshold for clumped SNPs. In general, it is not straightforward to choose these hyperparameters in practice. Usually, users apply default values for these hyperparameters; for example, the default option in Plink (Purcell et al., 2007) uses r2=0.5 for the correlation threshold, 250 kb for the window size and p=0.01 for the p-value threshold.
SCT is a more general algorithm that is based on the standard clumping and thresholding method. It chooses a set of values for each hyperparameter, runs clumping and thresholding on each combination of those values and gives a PRS for each combination. Table 9 shows an example of these hyperparameter values; these are the default values used in the R package bigsnpr. The PRSs are then stacked using a penalized regression model. The outcome of this algorithm is a linear combination of PRSs, where each PRS is also a linear combination of variants. Therefore, a single vector of variant effect sizes can be obtained in the final prediction model. Instead of stacking these PRSs, one could also select the PRS with the best prediction; this is referred to as the maxCT approach. In general, SCT would identify more genetic variants than maxCT.
The inventors applied SCT and maxCT to create PRSs for five common diseases using UK Biobank data. They used the training sets to create 1,400 risk scores for each chromosome using different values of the hyperparameters, as listed in Table 9. For maxCT, they selected the risk score that maximized the AUC on the training sets as the final PRS. For SCT, they stacked the 30,800 (1,400×22) risk scores from all 22 chromosomes using a penalized logistic regression; the optimal stacking weights were also estimated from the training sets. These steps were conducted using the R package bigsnpr.
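For illustration, standard clumping and thresholding for a single choice of hyperparameters can be sketched as below. This is a simplified, self-contained sketch and not the Plink or bigsnpr implementation; the data layout, the function name `clump_and_threshold` and the example SNPs are hypothetical.

```python
def clump_and_threshold(snps, r2, r2_max=0.5, window_kb=250, p_max=0.01):
    """Greedy clumping followed by p-value thresholding.

    snps: dict mapping SNP id -> (chromosome, position_bp, p_value, beta)
    r2:   dict mapping frozenset({id_a, id_b}) -> squared correlation
          (pairs that are absent are treated as uncorrelated)
    """
    kept = []
    # Visit SNPs from most to least significant.
    for sid in sorted(snps, key=lambda s: snps[s][2]):
        chrom, pos, p, _ = snps[sid]
        if p > p_max:
            # Every remaining SNP is less significant, so stop here. Thresholding
            # inside the loop gives the same set as thresholding after clumping.
            break
        in_ld_with_kept = any(
            snps[k][0] == chrom
            and abs(snps[k][1] - pos) <= window_kb * 1000
            and r2.get(frozenset((sid, k)), 0.0) > r2_max
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(sid)
    return kept

def polygenic_score(kept, snps, dosages):
    """Weighted sum of effect-allele counts over the retained SNPs."""
    return sum(snps[s][3] * dosages[s] for s in kept)

# Made-up example: snpB is in strong LD with the more significant snpA,
# snpC is far away on the same chromosome, snpD fails the p-value threshold.
snps = {
    "snpA": (1, 100_000, 1e-8, 0.20),
    "snpB": (1, 150_000, 1e-6, 0.15),
    "snpC": (1, 900_000, 1e-3, -0.10),
    "snpD": (2, 50_000, 0.2, 0.05),
}
r2 = {frozenset(("snpA", "snpB")): 0.8}
kept = clump_and_threshold(snps, r2)
score = polygenic_score(kept, snps, {"snpA": 2, "snpB": 1, "snpC": 1, "snpD": 0})
```

Changing `r2_max`, `window_kb` or `p_max` changes which SNPs are retained, which is why the choice of these hyperparameters matters for the resulting score.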
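The collapse of the stacked model into a single vector of effect sizes can be written out explicitly. The notation below is introduced here for illustration only: G_{ij} is the effect-allele count of individual i at variant j, \hat\beta_j is the GWAS effect size, S_k is the set of variants retained by clumping and thresholding under the k-th hyperparameter combination, and w_k is the stacking weight of the k-th score (any intercept of the penalized regression is omitted).

```latex
\begin{align}
\mathrm{PRS}_k(i) &= \sum_{j} \hat\beta_j \, \mathbb{1}[j \in S_k] \, G_{ij}, \\
\mathrm{PRS}_{\mathrm{SCT}}(i) &= \sum_{k} w_k \, \mathrm{PRS}_k(i)
  = \sum_{j} \underbrace{\Big( \hat\beta_j \sum_{k} w_k \, \mathbb{1}[j \in S_k] \Big)}_{\tilde\beta_j} G_{ij}.
\end{align}
```

Under this notation, a variant contributes to the final model whenever it is retained by at least one combination with a non-zero weight, which is consistent with SCT generally identifying more variants than maxCT.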
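The maxCT selection step can be sketched in a few lines. This is a hedged illustration rather than the bigsnpr implementation: the `auc` and `max_ct` helpers, the hyperparameter tuples used as keys and the toy scores are all assumptions made for the example.

```python
def auc(scores, labels):
    """Probability that a random case scores higher than a random control
    (ties count one half); equivalent to the area under the ROC curve."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def max_ct(score_grid, labels):
    """Return the hyperparameter combination whose score has the highest
    training-set AUC. score_grid maps a combination to per-person scores."""
    return max(score_grid, key=lambda h: auc(score_grid[h], labels))

# Toy training data: two cases followed by two controls.
labels = [1, 1, 0, 0]
# Keys are hypothetical (r2 threshold, window in kb, p-value threshold) tuples.
score_grid = {
    (0.2, 250, 0.01): [0.9, 0.4, 0.5, 0.1],
    (0.5, 250, 1e-5): [0.9, 0.8, 0.3, 0.1],
}
best = max_ct(score_grid, labels)

# Size of the stacking problem described in the text.
n_stacked_scores = 1_400 * 22
```

SCT differs from this sketch in that it does not pick a single column of the grid; it fits a penalized logistic regression on all of the per-chromosome scores at once.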
To estimate the GWAS effect sizes of SNPs, the inventors obtained summary statistics from large external GWAS. Ambiguous SNPs and variants with duplicated positions or refSNP cluster ID numbers were removed, only keeping SNPs that appeared in both the UK Biobank data and the study from which the inventors used summary statistics. These GWAS and the number of SNPs are summarized in Table 10.
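The filtering of summary statistics can be sketched as follows. The sketch assumes that "ambiguous" refers to strand-ambiguous (A/T and C/G) SNPs, which is a common convention but is not spelled out in the text; the record layout, the function name `harmonise` and the example entries are hypothetical.

```python
from collections import Counter

# Allele pairs that read the same on both strands (assumed meaning of "ambiguous").
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}

def harmonise(sumstats, biobank_ids):
    """Keep summary-statistic rows that are unambiguous, unique by position
    and by identifier, and present in the genotype data.

    sumstats: list of dicts with keys 'id', 'chrom', 'pos', 'a1', 'a2'
    biobank_ids: set of variant identifiers present in the genotype data
    """
    pos_counts = Counter((s["chrom"], s["pos"]) for s in sumstats)
    id_counts = Counter(s["id"] for s in sumstats)
    return [
        s for s in sumstats
        if frozenset((s["a1"], s["a2"])) not in AMBIGUOUS
        and pos_counts[(s["chrom"], s["pos"])] == 1
        and id_counts[s["id"]] == 1
        and s["id"] in biobank_ids
    ]

# Made-up example rows.
sumstats = [
    {"id": "snp1", "chrom": 1, "pos": 100, "a1": "A", "a2": "G"},  # kept
    {"id": "snp2", "chrom": 1, "pos": 200, "a1": "A", "a2": "T"},  # ambiguous
    {"id": "snp3", "chrom": 1, "pos": 300, "a1": "C", "a2": "T"},  # duplicated position
    {"id": "snp4", "chrom": 1, "pos": 300, "a1": "C", "a2": "A"},  # duplicated position
    {"id": "snp5", "chrom": 2, "pos": 100, "a1": "G", "a2": "T"},  # not genotyped
]
kept_ids = [s["id"] for s in harmonise(sumstats, {"snp1", "snp2", "snp3", "snp4"})]
```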
For each disease, the inventors created the risk scores using three different approaches. A risk prediction model was built by (i) using only the genetic variants, (ii) using the Framingham score (D'Agostino et al., 1994 and 2008; Wilson et al., 1998 and 2007; Wolf et al., 1991; Schnabel et al., 2009; Parikh et al., 2008), which uses clinical factors for estimating risk, and (iii) using genetic variants together with the Framingham score. The aim was to examine the predictive performance of genetic variants without other covariates such as age, and to determine whether risk prediction can be improved by combining genetic scores with the clinical factors. After these risk scores were created for each disease, their predictive power on the testing data was quantified by computing the AUC.
The main results are summarized in Table 11. For CAD, the AUCs of the risk scores using only genetic variants were 0.58 (95% CI: [0.57-0.59]) with maxCT and 0.60 (95% CI: [0.59-0.61]) with SCT. Similar AUC values were found in the model that used the Framingham score as the only predictor, which had an AUC of 0.59 (95% CI: [0.58-0.60]). Furthermore, the clinical risk prediction improved when the PRS was combined with the Framingham score; the AUC increased from 0.59 to 0.62 (95% CI: [0.60-0.63]) with maxCT and 0.63 (95% CI: [0.62-0.64]) with SCT. Table 12 shows the numbers of SNPs identified by both methods and Table 13 gives the optimal hyperparameters used in maxCT. SCT identified a set of 389,606 variants that best predicted CAD, while maxCT identified 867 variants, both when the Framingham score was present in the prediction model.
Strong predictive performance was also found for atrial fibrillation. Compared with the Framingham risk model, which has an AUC of 0.54 (95% CI: [0.53-0.56]), the risk scores based on genetic variants provided much better prediction, having an AUC of 0.60 (95% CI: [0.59-0.61]) with maxCT and 0.61 (95% CI: [0.60-0.63]) with SCT. The inventors also found that combining the PRS with the Framingham score substantially increased the AUC from 0.54 to 0.61 (95% CI: [0.60-0.63]) with maxCT and 0.63 (95% CI: [0.61-0.64]) with SCT. In addition, the strong performance of both methods, especially maxCT, is obtained using far fewer variants than in other studies. For example, the PRS for atrial fibrillation developed in Khera et al. (2018) has 6,730,541 SNPs, while the inventors' PRS has 225,032 variants identified by SCT and 265 variants identified by maxCT.
For hypertension and type 2 diabetes, the Framingham score provided a very strong predictive performance, having AUCs of 0.72 (95% CI: [0.71-0.73]) and 0.77 (95% CI: [0.75-0.79]) respectively. For these two diseases, the clinical factor outperformed the PRSs, which had AUCs of 0.59 (95% CI: [0.58-0.60]) for hypertension and 0.60 (95% CI: [0.58-0.62]) for type 2 diabetes, both with SCT. Adding the PRS increased the AUC for type 2 diabetes from 0.77 to 0.78 (95% CI: [0.76-0.79]) and no meaningful improvement was observed for hypertension.
For stroke, the predictive performance of PRS was weak compared with the previous diseases. The risk scores generated by the three approaches all have AUC less than 0.60; in fact, the PRS only has an AUC of 0.52 (95% CI: [0.49-0.54]) with SCT. The Framingham score has an AUC of 0.56 (95% CI: [0.54-0.59]) and the best AUC among all risk scores was found to be 0.57 (95% CI: [0.54-0.59]).
To provide another way of interpreting the strength of risk prediction, the inventors also computed the odds per adjusted standard deviation (OPERA) (Hopper et al., 2015) for the Framingham score, the PRS based on genetic variants by SCT and the combination of these two risk scores. OPERA is a measure used to assess the ability of a risk factor to discriminate between cases and controls on a population basis. It uses the standard deviation of the residuals for controls after adjusting for all other risk factors (such as age or sex). In this case, because age and sex are already adjusted for by the design, the inventors did not include them in the prediction model when computing the OPERA. The results are summarized in Table 14.
To confirm that age had been accounted for by the resampling strategy, the inventors also added the OPERA of age in the first column. The OPERA of age is close to 1 for all five common diseases, which is equivalent to the risk gradient of age being close to 0.5 in terms of AUC. In contrast, without this design, the inventors found that using age alone to predict atrial fibrillation gives an AUC of 0.69. This shows why high AUCs can be misleading if age is simply included in the predictions.
In terms of the OPERA for the Framingham score and the PRS, the results are similar to those the inventors observed before. For example, the Framingham score has strong predictive power for hypertension and type 2 diabetes, having an OPERA of 2.20 (95% CI: [2.12-2.28]) and 2.02 (95% CI: [1.90-2.14]) respectively. The PRS has better predictive power than the Framingham score for CAD and atrial fibrillation, having an OPERA of 1.43 (95% CI: [1.37-1.49]) and 1.50 (95% CI: [1.43-1.58]) respectively, while the Framingham score has an OPERA of 1.31 (95% CI: [1.27-1.36]) for CAD and 1.19 (95% CI: [1.14-1.24]) for atrial fibrillation. As before, the inventors found an improvement in predictions when combining the PRS with the Framingham score: the OPERA increased from 1.31 (95% CI: [1.27-1.36]) to 1.56 (95% CI: [1.50-1.62]) for CAD and from 1.19 (95% CI: [1.14-1.24]) to 1.57 (95% CI: [1.50-1.65]) for atrial fibrillation. No meaningful improvement was found for stroke and hypertension.
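As a minimal sketch of the quantity being reported, OPERA can be computed as the odds ratio per unit of a risk score raised to the power of the adjusted standard deviation in controls. The sketch assumes the log odds ratio per unit has already been estimated (for example by logistic regression) and, because age and sex are matched by design, uses the raw standard deviation among controls in place of a residual standard deviation. The function name and the example values are illustrative only.

```python
import math
from statistics import stdev

def opera(log_or_per_unit, control_values):
    """Odds ratio per (adjusted) standard deviation of a risk score.

    log_or_per_unit: log odds ratio per one unit of the score
    control_values:  score values (or adjusted residuals) among controls
    """
    return math.exp(log_or_per_unit * stdev(control_values))

# Illustrative control scores with a standard deviation of exactly 1.
controls = [-1.0, 0.0, 1.0]
beta = math.log(1.43)  # hypothetical log odds ratio per unit
opera_original = opera(beta, controls)

# Re-expressing the same score in units ten times larger divides the
# per-unit log odds ratio by ten and leaves the OPERA unchanged.
opera_rescaled = opera(beta / 10, [10 * x for x in controls])
```

The invariance to rescaling is what allows risk factors measured on different scales, such as a clinical score and a PRS, to be compared directly.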
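The correspondence between OPERA and AUC referred to above can be made explicit under an added assumption that is not stated in the text: if the adjusted risk score is normally distributed in cases and in controls with equal variance, then AUC = Φ(ln(OPERA)/√2), where Φ is the standard normal cumulative distribution function. The helper name below is hypothetical.

```python
import math

def opera_to_auc(opera):
    """AUC implied by an OPERA under a binormal, equal-variance assumption."""
    z = math.log(opera) / math.sqrt(2)
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

auc_no_gradient = opera_to_auc(1.0)  # an OPERA of 1 gives an AUC of 0.5
auc_cad_prs = opera_to_auc(1.43)     # OPERA reported for the CAD PRS
auc_af_prs = opera_to_auc(1.50)      # OPERA reported for the atrial fibrillation PRS
```

Under this assumption the reported OPERA values of 1.43 and 1.50 imply AUCs that round to 0.60 and 0.61, in line with the SCT-only AUCs reported for CAD and atrial fibrillation.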
The inventors also found combining the PRS with the PCE score (Goff et al., 2014; Riveros-McKay et al., 2021) (and as described herein) further improved test performance compared with the Framingham plus SNPs models, with an increase in AUC from 0.62 (95% CI 0.60-0.63) to 0.75 (95% CI 0.74-0.76). The PCE data included the SNPs in Table 2 in addition to those in Table 1.
Furthermore, the inventors determined that using a calibrated Framingham score (as described herein) improved the performance of the test for atrial fibrillation from an AUC of 0.7184 to 0.7361.
The present application claims priority from AU 2020903793 filed 20 Oct. 2020, the entire contents of which are incorporated herein by reference.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
All publications discussed and/or referenced herein are incorporated herein in their entirety.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Number | Date | Country | Kind |
---|---|---|---|
2020903793 | Oct 2020 | AU | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/AU2021/051218 | 10/19/2021 | WO | |