The present invention relates to the identification and use of diagnostic markers for cardiovascular illness. In various aspects, the invention relates to methods for the prediction of stroke and its sub-types, cardiovascular damage and response to hypertension medication and the development of novel therapies in hypertension treatment and cardiovascular illness.
The following discussion of the background of the invention is merely provided to aid the reader in understanding the invention and is not admitted to describe or constitute prior art to the present invention.
Stroke Background
Stroke is the third leading cause of death in the U.S. and Europe, behind heart disease and cancer. Each year, about 700,000 people suffer a stroke. About 500,000 of these are first attacks, and 200,000 are recurrent attacks. Stroke killed 283,000 people in 2000 and accounted for about 1 of every 14 deaths in the United States.
At all ages, 40,000 more women than men have a stroke. 28% of people who suffer a stroke in a given year are under age 65. Compared with whites, young African Americans have a two- to threefold greater risk of ischemic stroke, and African-American men and women are more likely to die of stroke.
Stroke is the leading cause of serious, long-term disability in the United States. 7.6% of ischemic strokes and 37.5% of hemorrhagic strokes result in death within 30 days. 8% of men and 11% of women will have a stroke within six years after a heart attack. 14% of people who have a stroke or TIA will have another within a year. 22% of men and 25% of women who have an initial stroke die within a year. For further statistics, one can consult the U.S. Centers for Disease Control and Prevention and the Heart Disease and Stroke Statistics—2004 Update, published by the American Heart Association.
A stroke is a sudden interruption in the blood supply of the brain. Most strokes are caused by an abrupt blockage of arteries leading to the brain (ischemic stroke). Other strokes are caused by bleeding into brain tissue when a blood vessel bursts (hemorrhagic stroke). Because stroke occurs rapidly and requires immediate treatment, stroke is also called a brain attack. When the symptoms of a stroke last only a short time (less than an hour), this is called a transient ischemic attack (TIA) or mini-stroke. Stroke has many consequences.
The effects of a stroke depend on which part of the brain is injured, and how severely it is injured. Strokes may cause sudden weakness, loss of sensation, or difficulty with speaking, seeing, or walking. Since different parts of the brain control different areas and functions, it is usually the area immediately surrounding the stroke that is affected. Sometimes people with stroke have a headache, but stroke can also be completely painless. It is very important to recognize the warning signs of stroke and to get immediate medical attention if they occur.
Stroke or brain attack is a sudden problem affecting the blood vessels of the brain. There are several types of stroke, and each type has different causes. The three main types of stroke are listed below.
Ischemic Stroke
The most common type of stroke, accounting for almost 80% of strokes, is caused by a clot or other blockage within an artery leading to the brain.
Intracerebral Hemorrhage
An intracerebral hemorrhage is a type of stroke caused by the sudden rupture of an artery within the brain. Blood is then released into the brain, compressing brain structures.
Subarachnoid Hemorrhage
A subarachnoid hemorrhage is also a type of stroke caused by the sudden rupture of an artery. A subarachnoid hemorrhage differs from an intracerebral hemorrhage in that the location of the rupture leads to blood filling the space surrounding the brain rather than inside of it.
Ischemic stroke occurs when an artery to the brain is blocked. The brain depends on its arteries to bring fresh blood from the heart and lungs. The blood carries oxygen and nutrients to the brain, and takes away carbon dioxide and cellular waste. If an artery is blocked, the brain cells (neurons) cannot make enough energy and will eventually stop working. If the artery remains blocked for more than a few minutes, the brain cells may die. This is why immediate medical treatment is absolutely critical.
Ischemic stroke can be caused by several different kinds of diseases. The most common problem is narrowing of the arteries in the neck or head. This is most often caused by atherosclerosis, or gradual cholesterol deposition. If the arteries become too narrow, blood cells may collect and form blood clots. These blood clots can block the artery where they are formed (thrombosis), or can dislodge and become trapped in arteries closer to the brain (embolism). Another cause of stroke is blood clots in the heart, which can occur as a result of irregular heartbeat (for example, atrial fibrillation), heart attack, or abnormalities of the heart valves. While these are the most common causes of ischemic stroke, there are many other possible causes. Examples include use of street drugs, traumatic injury to the blood vessels of the neck, or disorders of blood clotting.
Ischemic stroke can further be divided into two main types: thrombotic and embolic.
A thrombotic stroke occurs when diseased or damaged cerebral arteries become blocked by the formation of a blood clot within the brain. Clinically referred to as cerebral thrombosis or cerebral infarction, this type of event is responsible for almost 50% of all strokes. Cerebral thrombosis can also be divided into an additional two categories that correlate to the location of the blockage within the brain: large-vessel thrombosis and small-vessel thrombosis. Large-vessel thrombosis is the term used when the blockage is in one of the brain's larger blood-supplying arteries such as the carotid or middle cerebral, while small-vessel thrombosis involves one (or more) of the brain's smaller, yet deeper penetrating arteries. This latter type of stroke is also called a lacunar stroke.
An embolic stroke is also caused by a clot within an artery, but in this case the clot (or embolus) was formed somewhere other than in the brain itself. Often originating in the heart, these emboli travel through the bloodstream until they become lodged and cannot travel any further. This naturally restricts the flow of blood to the brain and results in almost immediate physical and neurological deficits.
Thrombolytic therapy has been proven to be effective for the treatment of acute ischemic stroke, but the increased risk of hemorrhage associated with tissue plasminogen activator (tPA) is still of great clinical concern (see for instance The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. New England Journal of Medicine 1995; 333:1581-7).
As it is critical to restore proper blood flow to the brain as soon as possible to prevent tissue damage, rapid diagnosis of stroke is essential to the survival of the patient and to minimizing the effects of the stroke on the patient. If the stroke is caught within three to six hours after occurrence, most stroke patients can expect full or partial recovery.
Current state-of-the-art diagnosis of stroke involves a physical examination and imaging procedures such as computed tomography (CT) scan, angiogram, electrocardiogram, magnetic resonance imaging (MRI), single photon emission computed tomography (SPECT) and positron emission tomography (PET).
While physical examination is rapid, it can only detect large strokes (defined as significant impairment, that is, a score greater than 12 on the National Institutes of Health Stroke Scale (NIHSS)). In addition, prior studies have found that the accuracy of stroke identification by medical personnel is modest and variable from one community to another. Sensitivity for stroke recognition by prehospital personnel has ranged widely, and positive predictive values have remained between 64% and 77% (see for instance Zweifler R M, York D, U TT, Mendizabal J E, Rothrock J F. Accuracy of paramedic diagnosis of stroke. J Stroke Cerebrovasc Dis. 1998; 7:446-448.). These studies have consistently suggested a tendency for prehospital personnel to overdiagnose stroke by not recognizing stroke mimics, such as patients with alcohol and drug intoxication, postictal hemiparesis, hypoglycemia or other metabolic encephalopathies, and other nonstroke causes of acute neurological deficits. Finally, any clinical neurological screening test will be limited by the training and experience of the examiner. This suggests the need for an adjunctive clinical test that can provide diagnostic information above and beyond screening clinical exams.
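For readers less familiar with the screening metrics quoted above, the following is a minimal sketch of how sensitivity, specificity and positive predictive value are computed from a confusion table. The function name and the patient counts are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)   # fraction of true strokes that were flagged
    specificity = tn / (tn + fp)   # fraction of non-strokes correctly cleared
    ppv = tp / (tp + fp)           # fraction of flagged patients who had a stroke
    return sensitivity, specificity, ppv


# Hypothetical counts (assumption): 100 true strokes, 200 stroke mimics.
sens, spec, ppv = screening_metrics(tp=70, fp=30, fn=30, tn=170)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

In this made-up example, a positive predictive value of 0.70 means that 30% of the patients flagged as stroke were in fact stroke mimics, which is the overdiagnosis pattern described above.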
CT scan produces x-ray images of the brain and is used to determine the location and extent of hemorrhagic stroke. It has widespread availability. CT scan usually cannot produce images showing signs of ischemic stroke until 48 hours after onset. This insensitivity to acute stroke limits its use to post-stroke damage assessment.
SPECT and PET involve injecting a radioactive substance into the bloodstream and monitoring it as it travels through blood vessels in the brain. These tests allow physicians to detect damaged regions of the brain resulting from reduced blood flow. However, this takes several hours, and thus is not used for rapid diagnosis of stroke.
MRI with magnetic resonance angiography (MRA) uses a magnetic field to produce detailed images of brain tissue and arteries in the neck and brain, allowing physicians to detect small-vessel infarct (i.e., stroke in small blood vessels deep in brain tissue). However, as a practical issue, most hospitals do not have these specialized and highly expensive MRI services available in the acute setting. Thus, without a practical and widely available radiological test, the diagnosis of stroke remains largely a clinical decision.
Recently, many researchers have investigated the possibility of blood-borne markers of stroke and its subtypes. This approach is well established in the clinical setting of suspected myocardial ischemia. In acute coronary syndromes, the myocardial isoform of creatine phosphokinase and troponin play an important role both in treatment decisions and clinical research. Similarly, B-type natriuretic peptide has become a routine part of the assessment of patients with congestive heart failure and dyspnea. However, the ischemic cascade of glial activation and ischemic neuronal injury in stroke is far more complex than myocardial ischemia and less amenable to the use of a single biochemical marker. Indeed, the authors of the instant invention know of no individual biochemical marker that has been demonstrated to possess the requisite sensitivity and specificity to allow it to function independently as a clinically useful diagnostic marker.
Thus a panel of markers was envisioned to overcome this deficiency in 1998 or earlier for detecting stroke (see for instance Misz M, Olah L, Kappelmayer J, Blasko G, Udvardy M, Fekete I, Csepany T, Ajzner E, Csiba L. Hemostatic abnormalities in ischemic stroke, Orv Hetil. 1998 Oct. 18; 139(42):2503-7; Tarkowski E, Rosengren L, Blomstrand C, Jensen C, Ekholm S, Tarkowski A. Intrathecal expression of proteins regulating apoptosis in acute stroke. Stroke. 1999 February; 30(2):321-7; Stevens H, Jakobs C, de Jager A E, Cunningham R T, Korf J. Neurone-specific enolase and N-acetyl-aspartate as potential peripheral markers of ischaemic stroke. Eur J Clin Invest. 1999 January; 29(1):6-11.) or its sub-types (see for instance Soderberg S, Ahren B, Stegmayr B, Johnson O, Wiklund P G, Weinehall L, Hallmans G, Olsson T. Leptin is a risk marker for first-ever hemorrhagic stroke in a population-based cohort. Stroke. 1999 February; 30(2):328-37).
In numerous studies since then, many blood-borne proteomic markers have been shown to be associated with stroke and its sub-types. For example, acute stroke has been associated with serum elevations of numerous inflammatory and anti-inflammatory mediators such as interleukin 6 (IL-6) and matrix metalloproteinase-9 (MMP-9) (see for instance Kim J S, Yoon S S, Kim Y H, Ryu J S. Serial measurement of interleukin-6, transforming growth factor-beta, and S-100 protein in patients with acute stroke. Stroke. 1996; 27:1553-1557.; Dziedzic T, Bartus S, Klimkowicz A, Motyl M, Slowik A, Szczudlik A. Intracerebral hemorrhage triggers interleukin-6 and interleukin-10 release in blood. Stroke. 2002; 33:2334-2335.; Beamer N B, Coull B M, Clark W M, Hazel J S, Silberger J R. Interleukin-6 and interleukin-1 receptor antagonist in acute stroke. Ann Neurol. 1995; 37:800-805.; Montaner J, Alvarez-Sabin J, Molina C, et al. Matrix metalloproteinase expression after human cardioembolic stroke: temporal profile and relation to neurological impairment. Stroke. 2001; 32:1759-1766.; Perini F, Morra M, Alecci M, Galloni E, Marchi M, Toso V. Temporal profile of serum anti-inflammatory and pro-inflammatory interleukins in acute ischemic stroke patients. Neurol Sci. 2001; 22:289-296.; Vila N, Castillo J, Davalos A, Chamorro A. Proinflammatory cytokines and early neurological worsening in ischemic stroke. Stroke. 2000; 31: 2325-2329), markers of impaired hemostasis and thrombosis (see for instance Fon E A, Mackey A, Cote R, et al. Hemostatic markers in acute transient ischemic attacks. Stroke. 1994; 25:282-286.; Takano K, Yamaguchi T, Uchida K. Markers of a hypercoagulable state following acute ischemic stroke. Stroke. 1992; 23:194-198.), and markers of glial activation such as S100b (see for instance Buttner T, Weyers S, Postert T, Sprengelmeyer R, Kuhn W. S-100 protein: serum marker of focal brain damage after ischemic territorial MCA infarction. Stroke. 1997; 28:1961-1965.; Martens P, Raabe A, Johnsson P. Serum S-100 and neuron-specific enolase for prediction of regaining consciousness after global cerebral ischemia. Stroke. 1998; 29:2363-2366.). Several of these mediators, including IL-6, have been shown to be elevated within hours after ischemia and correlate with infarct volume (see for instance Fassbender K, Rossol S, Kammer T, et al. Proinflammatory cytokines in serum of patients with acute cerebral ischemia: kinetics of secretion and relation to the extent of brain damage and outcome of disease. J Neurol Sci. 1994; 122:135-139.; Tarkowski E, Rosengren L, Blomstrand C, et al. Early intrathecal production of interleukin-6 predicts the size of brain lesion in stroke. Stroke. 1995; 26:1393-1398).
Other authors have looked at the differentiation between TIA and stroke (see for instance Dambinova S A, Khounteev G A, Skoromets A A. Multiple panel of biomarkers for TIA/stroke evaluation. Stroke. 2002; 33:1181-1182.) or type of hemorrhage (see for instance McGirt M J, Lynch J R, Blessing R, Warner D S, Friedman A H, Laskowitz D T. Serum von Willebrand factor, matrix metalloproteinase-9, and vascular endothelial growth factor levels predict the onset of cerebral vasospasm after aneurysmal subarachnoid hemorrhage. Neurosurgery. 2002; 51:1128-1134).
To date, most of these studies have involved small numbers of patients and, while they have individual markers in common, the panels proposed in each have not been replicated. This is because many reported panels merely add the effects of multiple markers linearly, or perform simple logistic regression to obtain the correlative effects of a panel. One such example of the current state of the art is that of Reynolds et al. (Mark A. Reynolds, Howard J. Kirchick, Jeffrey R. Dahlen, Joseph M. Anderberg, Paul H. McPherson, Kevin K. Nakamura, Daniel T. Laskowitz, Gunars E. Valkirs, and Kenneth F. Buechler, Early biomarkers of stroke, Clinical Chemistry 49:10 1733-1739, 2003). In this paper, a five-marker panel consisting of S-100β, B-type neurotrophic growth factor, von Willebrand factor, matrix metalloproteinase-9, and monocyte chemotactic protein-1 was disclosed as a suggested blood-borne panel to diagnose acute ischemic stroke. In this analysis, univariate analysis was used to select an initial pool of candidate markers, and then multivariate analysis was used to arrive at the final panel. However, as shown in the instant invention, this methodology is flawed. The result of this paper was tested on the same data used to train the model, a typical mistake that usually leads to an irreproducible result.
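The risk of evaluating a panel on its own training data can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden illustration, not the analysis of the cited paper: it uses a 1-nearest-neighbour classifier (chosen only because it memorizes its training set) on random "marker" values whose labels are assigned by coin flip, so no real diagnostic signal exists. All names are hypothetical.

```python
import random


def nearest_neighbor_predict(train_x, train_y, point):
    """Return the label of the training sample closest to `point` (1-NN)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )
    return train_y[best]


def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of evaluation samples whose predicted label is correct."""
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(eval_x, eval_y)
    )
    return hits / len(eval_x)


rng = random.Random(0)
# Synthetic data (assumption): 5 "markers" per subject, labels purely random.
samples = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(400)]
labels = [rng.randint(0, 1) for _ in range(400)]

train_x, train_y = samples[:200], labels[:200]
test_x, test_y = samples[200:], labels[200:]

# Evaluated on the data used for training: looks perfect.
resubstitution_acc = accuracy(train_x, train_y, train_x, train_y)
# Evaluated on subjects never seen during training: close to chance.
held_out_acc = accuracy(train_x, train_y, test_x, test_y)

print(f"training-set accuracy: {resubstitution_acc:.2f}")
print(f"held-out accuracy:     {held_out_acc:.2f}")
```

Because each training sample is its own nearest neighbour, the training-set accuracy is perfect even though the labels are noise, whereas accuracy on the held-out half stays near chance. This is why a result reported only on training data tends not to replicate.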
Another example of the state of the art is U.S. Patent application 20040121343 and/or U.S. patent Ser. No. 10/225,082. In these applications, a variety of markers for the diagnosis of stroke are envisioned, the mere presence or absence of such markers in the blood being indicative of disease. This methodology is fatally flawed, however, since it does not indicate how to relate the collective nonlinear effects of all markers to the outcome of interest, i.e. specify an algorithm to select among such markers and another to classify such markers as related to outcome. Instead, the application anticipates using the thresholded values of such markers as an indicator, giving a simple binary response of each as a value. As such markers are all treated as independent variables, there is no interaction between them, another fatal flaw.
Most existing statistical and computational methods for biomarker feature selection, such as those of U.S. Patent application 20040121343 and/or U.S. patent Ser. No. 10/225,082, have focused on differential expression of markers between diseased and control data sets. This metric is tested by simple calculation of fold changes, by t-test, and/or by F-test. These are based on variations of linear discriminant analysis (i.e., calculating some or the entire covariance matrix between features).
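To make the point about ignored interactions concrete, consider a deliberately extreme toy case in which the outcome depends only on the combination of two thresholded markers. The data and rule names below are hypothetical and constructed for illustration; they do not come from the cited applications.

```python
from itertools import product

# Hypothetical cohort (assumption): each marker is thresholded to 0 (below
# cut-off) or 1 (above), and disease is present when exactly one marker is high.
patients = [
    (a, b, a ^ b)
    for a, b in product((0, 1), repeat=2)
    for _ in range(25)
]


def accuracy(rule):
    """Fraction of patients for which rule(marker_a, marker_b) equals the outcome."""
    return sum(rule(a, b) == outcome for a, b, outcome in patients) / len(patients)


marker_a_alone = accuracy(lambda a, b: a)     # marker A treated independently
marker_b_alone = accuracy(lambda a, b: b)     # marker B treated independently
either_positive = accuracy(lambda a, b: a | b)  # "any marker present" rule
interaction = accuracy(lambda a, b: a ^ b)    # rule that models the interaction

print(marker_a_alone, marker_b_alone, either_positive, interaction)
```

In this construction each marker on its own is no better than chance, and a simple "any marker positive" rule misclassifies a quarter of the cohort, while a rule that combines the two markers nonlinearly is exact. Real marker data are far less clean, but the example shows what independent binary indicators cannot capture.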
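As a point of reference, the two simplest differential-expression scores mentioned above can be computed as follows. This is a minimal sketch using Welch's form of the t statistic (an assumption; the text does not say which variant is meant), and the concentration values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(diseased, control):
    """Ratio of mean marker level in diseased subjects to that in controls."""
    return mean(diseased) / mean(control)


def welch_t(diseased, control):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(
        variance(diseased) / len(diseased) + variance(control) / len(control)
    )
    return (mean(diseased) - mean(control)) / standard_error


# Hypothetical marker concentrations in arbitrary units (not patient data).
diseased = [8.0, 10.0, 12.0, 9.0, 11.0]
control = [4.0, 5.0, 6.0, 5.0, 5.0]

fc = fold_change(diseased, control)
t_stat = welch_t(diseased, control)
print(f"fold change = {fc:.2f}, t = {t_stat:.2f}")
```

Both scores rank a marker by how far apart the two group means are; neither measures how accurately the marker classifies an individual subject.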
However, the majority of these data analysis methods are not effective for biomarker identification and disease diagnosis for the following reasons. First, although the calculation of fold changes or t-test and F-test can identify highly differentially expressed biomarkers, the classification accuracy of biomarkers identified by these methods is, in general, not very high. This is because linear transforms typically extract information from only the second-order correlations in the data (the covariance matrix) and ignore higher-order correlations in the data. We have shown that proteomic datasets are inherently non-symmetric (unpublished data). For such cases, nonlinear transforms are necessary. Second, most scoring methods do not use classification accuracy to measure a biomarker's ability to discriminate between classes. Therefore, biomarkers that are ranked according to these scores may not achieve the highest classification accuracy among biomarkers in the experiments. Even if some scoring methods, which are based on classification methods, are able to identify biomarkers with high classification accuracy among all biomarkers in the experiments, the classification accuracy of a single marker cannot achieve the required accuracy in clinical diagnosis. Third, a simple combination of highly ranked markers according to their scores or discrimination ability is usually not efficient for classification, as shown in the instant invention. If there is high mutual correlation between markers, then complexity increases without much gain.
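The third point, redundancy among highly ranked markers, can be checked with a plain Pearson correlation. The sketch below uses hypothetical marker values and is only meant to show the calculation; it is not the selection method of the instant invention.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical marker levels for five subjects (assumption).
marker_a = [1.0, 2.0, 3.0, 4.0, 5.0]
marker_b = [2.1, 3.9, 6.2, 8.0, 9.8]  # nearly a rescaled copy of marker_a
marker_c = [3.0, 1.0, 4.0, 1.0, 5.0]  # only weakly related to marker_a

r_ab = pearson(marker_a, marker_b)
r_ac = pearson(marker_a, marker_c)
print(f"r(a, b) = {r_ab:.3f}, r(a, c) = {r_ac:.3f}")
```

A pair such as `marker_a` and `marker_b` would both score well individually, yet adding the second to a panel contributes almost no new information, which is the "complexity without much gain" described above.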
Accordingly, the instant invention provides a methodology that can be used for biomarker feature selection and classification, and is applied in the instant application to detection of stroke and its subtypes.
Exemplary Biomarkers Related to Cardiovascular Illness.
As the marker groups described in U.S. Patent application 20040121343 and/or U.S. patent Ser. No. 10/225,082 have literature references detailing normal and damage levels, and overlap with the markers anticipated in the instant invention, we incorporate their description here. However, the instant invention goes beyond what is taught or anticipated in such applications, providing a rigorous methodology for discovering such markers and interpolating between them to determine clinical outcome, while the methodology described in U.S. Patent application 20040121343 and/or U.S. patent Ser. No. 10/225,082 relies on simple linear relationships between markers and linear optimization techniques to find them. As also illustrated in the instant invention, neither the general markers used, the idea of combinations of such markers, nor the techniques used to analyze them are novel.
BNP
B-type natriuretic peptide (BNP), also called brain-type natriuretic peptide, is a 32 amino acid, 4 kDa peptide that is involved in the natriuresis system to regulate blood pressure and fluid balance. See for instance Bonow, R. O., Circulation 93:1946-1950 (1996). The precursor to BNP is synthesized as a 108-amino acid molecule, referred to as “pre pro BNP,” that is proteolytically processed into a 76-amino acid N-terminal peptide (amino acids 1-76), referred to as “NT pro BNP” and the 32-amino acid mature hormone, referred to as BNP or BNP 32 (amino acids 77-108). It has been suggested that each of these species (NT pro BNP, BNP 32, and pre pro BNP) can circulate in human plasma. Tateyama et al., Biochem. Biophys. Res. Commun. 185: 760-7 (1992); Hunt et al., Biochem. Biophys. Res. Commun. 214: 1175-83 (1995). The 2 forms, pre pro BNP and NT pro BNP, and peptides which are derived from BNP, pre pro BNP and NT pro BNP and which are present in the blood as a result of proteolysis of BNP, NT pro BNP and pre pro BNP, are collectively described as markers related to or associated with BNP.
The term “BNP” as used herein refers to the mature 32-amino acid BNP molecule itself. As the skilled artisan will recognize, however, because of its relationship to BNP, the concentration of the NT pro BNP molecule can also provide diagnostic or prognostic information in patients. The phrase “marker related to BNP or BNP related peptide” refers to any polypeptide that originates from the pre pro BNP molecule, other than the 32-amino acid BNP molecule itself. Proteolytic degradation of BNP and of peptides related to BNP has also been described in the literature, and these proteolytic fragments are also encompassed in the term “BNP related peptides.”
BNP and BNP-related peptides are predominantly found in the secretory granules of the cardiac ventricles, and are released from the heart in response to both ventricular volume expansion and pressure overload. See for instance Wilkins, M. et al., Lancet 349: 1307-10 (1997). Elevations of BNP are associated with raised atrial and pulmonary wedge pressures, reduced ventricular systolic and diastolic function, left ventricular hypertrophy, and myocardial infarction. See for instance Sagnella, G. A., Clinical Science 95: 519-29 (1998). Furthermore, there are numerous reports of elevated BNP concentration associated with congestive heart failure and renal failure.
D-dimer
D-dimer is a crosslinked fibrin degradation product with an approximate molecular mass of 200 kDa. The normal plasma concentration of D-dimer is <150 ng/ml (750 pM). The plasma concentration of D-dimer is elevated in patients with acute myocardial infarction and unstable angina, but not stable angina. See for instance Hoffmeister, H. M. et al., Circulation 91: 2520-27 (1995); Bayes-Genis, A. et al., Thromb. Haemost. 81: 865-68 (1999); Gurfinkel, E. et al., Br. Heart J. 71: 151-55 (1994); Kruskal, J. B. et al., N. Engl. J. Med. 317: 1361-65 (1987); and Tanaka, M. and Suzuki, A., Thromb. Res. 76: 289-98 (1994).
The plasma concentration of D-dimer also will be elevated during any condition associated with coagulation and fibrinolysis activation, including stroke, surgery, atherosclerosis, trauma, and thrombotic thrombocytopenic purpura. D-dimer is released into the bloodstream immediately following proteolytic clot dissolution by plasmin. The plasma concentration of D-dimer can exceed 2 µg/ml in patients with unstable angina. See for instance Gurfinkel, E. et al. Br. Heart J. 71: 151-55 (1994). Plasma D-dimer is a specific marker of fibrinolysis and indicates the presence of a prothrombotic state associated with acute myocardial infarction and unstable angina. The plasma concentration of D-dimer is also nearly always elevated in patients with acute pulmonary embolism; thus, normal levels of D-dimer may allow the exclusion of pulmonary embolism. See for instance Egermayer et al., Thorax 53: 830-34 (1998).
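The molar figure quoted in parentheses follows directly from the mass concentration and the molecular mass. A minimal sketch of the conversion is given below; the function name is hypothetical, and the only inputs are the 150 ng/ml threshold and the approximate 200 kDa mass stated above.

```python
def ng_per_ml_to_pM(conc_ng_per_ml, mass_kda):
    """Convert a mass concentration in ng/ml to a molar concentration in pM."""
    grams_per_litre = conc_ng_per_ml * 1e-6             # 1 ng/ml equals 1e-6 g/L
    mol_per_litre = grams_per_litre / (mass_kda * 1e3)  # 1 kDa equals 1e3 g/mol
    return mol_per_litre * 1e12                         # 1 mol/L equals 1e12 pM


d_dimer_pM = ng_per_ml_to_pM(150, 200)
print(f"150 ng/ml of a 200 kDa protein is about {d_dimer_pM:.0f} pM")
```

The same arithmetic reduces to pM = (ng/ml × 1000) / kDa, which gives 150 × 1000 / 200 = 750 pM for D-dimer.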
Cardiac Troponin
Troponin I (TnI) is a 25 kDa inhibitory element of the troponin complex, found in muscle tissue. TnI binds to actin in the absence of Ca2+, inhibiting the ATPase activity of actomyosin. A TnI isoform that is found in cardiac tissue (cTnI) is 40% divergent from skeletal muscle TnI, allowing both isoforms to be immunologically distinguished. The normal plasma concentration of cTnI is <0.1 ng/ml (4 pM). cTnI is released into the bloodstream following cardiac cell death; thus, the plasma cTnI concentration is elevated in patients with acute myocardial infarction. Investigations into changes in the plasma cTnI concentration in patients with unstable angina have yielded mixed results, but cTnI is not elevated in the plasma of individuals with stable angina. See for instance Benamer, H. et al., Am. J. Cardiol. 82: 845-50 (1998); Bertinchant, J. P. et al., Clin. Biochem. 29: 587-94 (1996); Tanasijevic, M. J. et al., Clin. Cardiol. 22: 13-16 (1999); Musso, P. et al., J. Ital. Cardiol. 26:1013-23 (1996); Holvoet, P. et al., JAMA 281: 1718-21 (1999); Holvoet, P. et al., Circulation 98: 1487-94 (1998).
The plasma concentration of cTnI in patients with acute myocardial infarction is significantly elevated 4-6 hours after onset, peaks between 12-16 hours, and can remain elevated for one week. The release kinetics of cTnI associated with unstable angina may be similar. The measurement of specific forms of cardiac troponin, including free cardiac troponin I and complexes of cardiac troponin I with troponin C and/or T may provide the user with the ability to identify various stages of ACS. Free and complexed cardiac-troponin T may be used in a manner analogous to that described for cardiac troponin I. Cardiac troponin T complex may be useful either alone or when expressed as a ratio with total cardiac troponin I to provide information related to the presence of progressing myocardial damage. Ongoing ischemia may result in the release of the cardiac troponin TIC complex, indicating that higher ratios of cardiac troponin TIC:total cardiac troponin I may be indicative of continual damage caused by unresolved ischemia. See for instance U.S. Pat. Nos. 6,147,688, 6,156,521, 5,947,124, and 5,795,725.
One versed in the ordinary state of the art knows that many other markers in the literature, once measured from the blood of diseased and healthy patients and selected through use of a feature selection algorithm, might be diagnostic of cardiovascular illness if measured in combination with others and evaluated together with a nonlinear classification algorithm. We describe some of these other markers, previously considered for diagnosis or prognosis of cardiovascular illness and thus not novel in themselves, using the text from U.S. patent application 20040126767.
Markers Related To Myocardial Injury
Annexin V, also called lipocortin V, endonexin II, calphobindin I, calcium binding protein 33, placental anticoagulant protein I, thromboplastin inhibitor, vascular anticoagulant-α, and anchorin CII, is a 33 kDa calcium-binding protein that is an indirect inhibitor and regulator of tissue factor. Annexin V is composed of four homologous repeats with a consensus sequence common to all annexin family members, binds calcium and phosphatidyl serine, and is expressed in a wide variety of tissues, including heart, skeletal muscle, liver, and endothelial cells (See for instance Giambanco, I. et al., J. Histochem. Cytochem. 39:P1189-1198, 1991; Doubell, A. F. et al., Cardiovasc. Res. 27:1359-1367, 1993). The normal plasma concentration of annexin V is <2 ng/ml (See for instance Kaneko, N. et al., Clin. Chim. Acta 251:65-80, 1996). The plasma concentration of annexin V is elevated in individuals with acute myocardial infarction (See for instance Kaneko, N. et al., Clin. Chim. Acta 251:65-80, 1996). Due to its wide tissue distribution, elevation of the plasma concentration of annexin V may be associated with any condition involving non-cardiac tissue injury. However, one study has found that plasma annexin V concentrations were not significantly elevated in patients with old myocardial infarction, chest pain syndrome, valvular heart disease, lung disease, and kidney disease (See for instance Kaneko, N. et al., Clin. Chim. Acta 251:65-80, 1996). Annexin V is released into the bloodstream soon after acute myocardial infarction onset. The annexin V concentration in the plasma of acute myocardial infarction patients decreased from initial (admission) values, suggesting that it is rapidly cleared from the bloodstream (See for instance Kaneko, N. et al. Clin. Chim. Acta 251:65-80, 1996).
Enolase is a 78 kDa homo- or heterodimeric cytosolic protein produced from α, β, and γ subunits. Enolase catalyzes the interconversion of 2-phosphoglycerate and phosphoenolpyruvate in the glycolytic pathway. Enolase is present as αα, αβ, ββ, αγ, and γγ isoforms. The α subunit is found in most tissues, the β subunit is found in cardiac and skeletal muscle, and the γ subunit is found primarily in neuronal and neuroendocrine tissues. β-enolase is composed of αβ and ββ enolase, and is specific for muscle. The normal plasma concentration of β-enolase is <10 ng/ml (120 pM). β-enolase is elevated in the serum of individuals with acute myocardial infarction, but not in individuals with angina (See for instance Nomura, M. et al., Br. Heart J. 58:29-33, 1987; Herraez-Dominguez, M. V. et al., Clin. Chim. Acta 64:307-315, 1975). Further investigations into possible changes in plasma β-enolase concentration associated with unstable and stable angina need to be performed. The plasma concentration of β-enolase is elevated during heart surgery, muscular dystrophy, and skeletal muscle injury (See for instance Usui, A. et al., Cardiovasc. Res. 23:737-740, 1989; Kato, K. et al., Clin. Chim. Acta 131:75-85, 1983; Matsuda, H. et al., Forensic Sci. Int. 99:197-208, 1999). β-enolase is released into the bloodstream immediately following cardiac or skeletal muscle injury. The plasma β-enolase concentration was elevated to more than 150 ng/ml in the perioperative stage of cardiac surgery, and remained elevated for 1 week. Serum β-enolase concentrations peaked approximately 12-14 hours after the onset of chest pain and acute myocardial infarction and approached baseline after 1 week had elapsed from onset, with maximum levels approaching 1 µg/ml (See for instance Kato, K. et al., Clin. Chim. Acta 131:75-85, 1983; Nomura, M. et al., Br. Heart J. 58:29-33, 1987).
Creatine kinase (CK) is an 85 kDa cytosolic enzyme that catalyzes the reversible formation of ADP and phosphocreatine from ATP and creatine. CK is a homo- or heterodimer composed of M and B chains. CK-MB is the isoform that is most specific for cardiac tissue, but it is also present in skeletal muscle and other tissues. The normal plasma concentration of CK-MB is <5 ng/ml. The plasma CK-MB concentration is significantly elevated in patients with acute myocardial infarction. Plasma CK-MB is not elevated in patients with stable angina, and investigations into plasma CK-MB concentration elevations in patients with unstable angina have yielded mixed results (See for instance Thygesen, K. et al., Eur. J. Clin. Invest. 16:1-4, 1986; Koukkunen, H. et al., Ann. Med. 30:488-496, 1998; Bertinchant, J. P. et al., Clin. Biochem. 29:587-594, 1996; Benamer, H. et al., Am. J. Cardiol. 82:845-850, 1998; and Norregaard-Hansen, K. et al., Eur. Heart J. 13:188-193, 1992). The mixed results associated with unstable angina suggest that CK-MB may be useful in determining the severity of unstable angina because the extent of myocardial ischemia is directly proportional to unstable angina severity. Elevations of the plasma CK-MB concentration are associated with skeletal muscle injury and renal disease. CK-MB is released into the bloodstream following cardiac cell death. The plasma concentration of CK-MB in patients with acute myocardial infarction is significantly elevated 4-6 hours after onset, peaks between 12-24 hours, and returns to baseline after 3 days. The release kinetics of CK-MB associated with unstable angina may be similar.
Glycogen phosphorylase (GP) is a 188 kDa intracellular allosteric enzyme that catalyzes the removal of glucose (liberated as glucose-1-phosphate) from the nonreducing ends of glycogen in the presence of inorganic phosphate during glycogenolysis. GP is present as a homodimer, which associates with another homodimer to form a tetrameric enzymatically active phosphorylase A. There are three isoforms of GP that can be immunologically distinguished. The BB isoform is found in brain and cardiac tissue, the MM isoform is found in skeletal muscle and cardiac tissue, and the LL isoform is predominantly found in liver (See for instance Mair, J. et al., Br. Heart J. 72:125-127, 1994). GP-BB is normally associated with the sarcoplasmic reticulum glycogenolysis complex, and this association is dependent upon the metabolic state of the myocardium (See for instance Mair, J., Clin. Chim. Acta 272:79-86, 1998). At the onset of hypoxia, glycogen is broken down, and GP-BB is converted from a bound form to a free cytoplasmic form (See for instance Krause, E. G. et al. Mol. Cell Biochem. 160-161:289-295, 1996). The normal plasma GP-BB concentration is <7 ng/ml (36 pM). The plasma GP-BB concentration is significantly elevated in patients with acute myocardial infarction and unstable angina with transient ST-T elevations, but not stable angina (See for instance Mair, J. et al., Br. Heart J. 72:125-127, 1994; Mair, J., Clin. Chim. Acta 272:79-86, 1998; Rabitzsch, G. et al., Clin. Chem. 41:966-978, 1995; Rabitzsch, G. et al., Lancet 341:1032-1033, 1993). Furthermore, GP-BB also can be used to detect perioperative acute myocardial infarction and myocardial ischemia in patients undergoing coronary artery bypass surgery (See for instance Rabitzsch, G. et al., Biomed. Biochim. Acta 46:S584-S588, 1987; Mair, P. et al., Eur. J. Clin. Chem. Clin. Biochem. 32:543-547, 1994). 
GP-BB has been demonstrated to be a more sensitive marker of unstable angina and acute myocardial infarction early after onset than CK-MB, cardiac troponin T, and myoglobin (See for instance Rabitzsch, G. et al., Clin. Chem. 41:966-978, 1995). Because it is also found in the brain, the plasma GP-BB concentration also may be elevated during ischemic cerebral injury. GP-BB is released into the bloodstream under ischemic conditions that also involve an increase in the permeability of the cell membrane, usually a result of cellular necrosis. GP-BB is significantly elevated within 4 hours of chest pain onset in individuals with unstable angina and transient ST-T ECG alterations, and is significantly elevated while myoglobin, CK-MB, and cardiac troponin T are still within normal levels (See for instance Mair, J. et al., Br. Heart J. 72:125-127, 1994). Furthermore, GP-BB can be significantly elevated 1-2 hours after chest pain onset in patients with acute myocardial infarction (See for instance Rabitzsch, G. et al., Lancet 341:1032-1033, 1993). The plasma GP-BB concentration in patients with unstable angina and acute myocardial infarction can exceed 50 ng/ml (250 pM) (Mair, J. et al., Br. Heart J. 72:125-127, 1994; Mair, J., Clin. Chim. Acta 272:79-86, 1998; Krause, E. G. et al., Mol. Cell Biochem. 160-161:289-295, 1996; Rabitzsch, G. et al., Clin. Chem. 41:966-978, 1995; Rabitzsch, G. et al., Lancet 341:1032-1033, 1993). GP-BB appears to be a very sensitive marker of myocardial ischemia, with specificity similar to that of CK-BB. GP-BB plasma concentrations are elevated within the first 4 hours after acute myocardial infarction onset, which suggests that it may be a very useful early marker of myocardial damage. Furthermore, GP-BB is not only a more specific marker of cardiac tissue damage, but also of ischemia, since it is released to an unbound form during cardiac ischemia and would not normally be released upon traumatic injury.
This is best illustrated by the usefulness of GP-BB in detecting myocardial ischemia during cardiac surgery. GP-BB may be a very useful marker of early myocardial ischemia during acute myocardial infarction and severe unstable angina.
Heart-type fatty acid binding protein (H-FABP) is a cytosolic 15 kDa lipid-binding protein involved in lipid metabolism. Heart-type FABP antigen is found not only in heart tissue, but also in kidney, skeletal muscle, aorta, adrenals, placenta, and brain (See for instance Veerkamp, J. H. and Maatman, R. G., Prog. Lipid Res. 34:17-52, 1995; Yoshimoto, K. et al., Heart Vessels 10:304-309, 1995). Furthermore, heart-type FABP mRNA can be found in testes, ovary, lung, mammary gland, and stomach (Veerkamp, J. H. and Maatman, R. G., Prog. Lipid Res. 34:17-52, 1995). The normal plasma concentration of FABP is <6 ng/ml (400 pM). The plasma H-FABP concentration is elevated in patients with acute myocardial infarction and unstable angina (See for instance Ishii, J. et al., Clin. Chem. 43:1372-1378, 1997; Tsuji, R. et al., Int. J. Cardiol. 41:209-217, 1993). Furthermore, H-FABP may be useful in estimating infarct size in patients with acute myocardial infarction (Glatz, J. F. et al., Br. Heart J. 71:135-140, 1994). Myocardial tissue as a source of H-FABP can be confirmed by determining the ratio of myoglobin/FABP (grams/grams). A ratio of approximately 5 indicates that FABP is of myocardial origin, while a higher ratio indicates skeletal muscle sources (Van Nieuwenhoven, F. A. et al., Circulation 92:2848-2854, 1995). Because of the presence of H-FABP in skeletal muscle, kidney and brain, elevations in the plasma H-FABP concentration may be associated with skeletal muscle injury, renal disease, or stroke. H-FABP is released into the bloodstream following cardiac tissue necrosis. The plasma H-FABP concentration can be significantly elevated 1-2 hours after the onset of chest pain, earlier than CK-MB and myoglobin (Tsuji, R. et al., Int. J. Cardiol. 41:209-217, 1993; Van Nieuwenhoven, F. A. et al., Circulation 92:2848-2854, 1995; Tanaka, T. et al., Clin. Biochem. 24:195-201, 1991). 
Additionally, H-FABP is rapidly cleared from the bloodstream, and plasma concentrations return to baseline about 24 hours after acute myocardial infarction onset (Glatz, J. F. et al., Br. Heart J. 71:135-140, 1994; Tanaka, T. et al., Clin. Biochem. 24:195-201, 1991).
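The myoglobin/FABP ratio described above can be sketched as follows. This is a minimal illustration rather than a validated decision rule: the text states only that a ratio of approximately 5 indicates myocardial origin and that a higher ratio indicates skeletal muscle sources, so the cutoff separating the two is left as a required argument, and the function names and the example cutoff of 10 are hypothetical.

```python
def myoglobin_fabp_ratio(myoglobin_ng_per_ml: float, fabp_ng_per_ml: float) -> float:
    """Return the myoglobin/FABP mass ratio (grams/grams).

    Both inputs must use the same mass-concentration unit.
    """
    if fabp_ng_per_ml <= 0:
        raise ValueError("FABP concentration must be positive")
    return myoglobin_ng_per_ml / fabp_ng_per_ml


def likely_fabp_source(ratio: float, *, cutoff: float) -> str:
    """Label the probable tissue source of plasma FABP.

    The cutoff is an assumption supplied by the caller; the text gives
    only "approximately 5" (myocardial) versus "higher" (skeletal muscle).
    """
    return "myocardial" if ratio <= cutoff else "skeletal muscle"


# Illustrative values only; the cutoff of 10 is a hypothetical placeholder.
ratio = myoglobin_fabp_ratio(50.0, 10.0)
print(ratio, likely_fabp_source(ratio, cutoff=10.0))
```

Keeping the cutoff explicit avoids implying a threshold that the cited work does not state.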
Phosphoglyceric acid mutase (PGAM) is a 57 kDa homo- or heterodimeric intracellular glycolytic enzyme composed of 29 kDa M or B subunits that catalyzes the interconversion of 3-phosphoglycerate to 2-phosphoglycerate in the presence of magnesium. Cardiac tissue contains isozymes MM, MB, and BB, skeletal muscle contains primarily PGAM-MM, and most other tissues contain PGAM-BB (Durany, N. and Carreras, J., Comp. Biochem. Physiol. B. Biochem. Mol. Biol. 114:217-223, 1996). Thus, PGAM-MB is the most specific isozyme for cardiac tissue. PGAM is elevated in the plasma of patients with acute myocardial infarction, but further studies need to be performed to determine changes in the plasma PGAM concentration associated with acute myocardial infarction, unstable angina and stable angina (Mair, J., Crit. Rev. Clin. Lab. Sci. 34:1-66, 1997). Plasma PGAM-MB concentration elevations may be associated with unrelated myocardial or possibly skeletal tissue damage. PGAM-MB is most likely released into the circulation following cellular necrosis.
S-100 is a 21 kDa homo- or heterodimeric cytosolic Ca.sup.2+-binding protein produced from .alpha. and .beta. subunits. It is thought to participate in the activation of cellular processes along the Ca.sup.2+-dependent signal transduction pathway (Bonfrer, J. M. et al., Br. J. Cancer 77:2210-2214, 1998). S-100ao (.alpha..alpha. isoform) is found in striated muscles, heart and kidney, S-100a (.alpha..beta. isoform) is found in glial cells, but not in Schwann cells, and S-100b (.beta..beta. isoform) is found in high concentrations in glial cells and Schwann cells, where it is a major cytosolic component (Kato, K. and Kimura, S., Biochim. Biophys. Acta 842:146-150, 1985; Hasegawa, S. et al., Eur. Urol. 24:393-396, 1993). The normal serum concentration of S-100ao is <0.25 ng/ml (12 pM), and its concentration may be influenced by age and sex, with higher concentrations in males and older individuals (Kikuchi, T. et al., Hinyokika Kiyo 36:1117-1123, 1990; Morita, T. et al., Nippon Hinyokika Gakkai Zasshi 81:1162-1167, 1990; Usui, A. et al., Clin. Chem. 36:639-641, 1990). The serum concentration of S-100ao is elevated in patients with acute myocardial infarction, but not in patients with angina pectoris with suspected acute myocardial infarction (Usui, A. et al., Clin. Chem. 36:639-641, 1990). Further investigation is needed to determine changes in the plasma concentration of S-100ao associated with unstable and stable angina. S-100ao is elevated in the serum of patients with renal cell carcinoma, bladder tumor, renal failure, and prostate cancer, as well as in patients undergoing open heart surgery (Hasegawa, S. et al., Eur. Urol. 24:393-396, 1993; Kikuchi, T. et al., Hinyokika Kiyo 36:1117-1123, 1990; Morita, T. et al., Nippon Hinyokika Gakkai Zasshi 81:1162-1167, 1990; Usui, A. et al., Clin. Chem. 35:1942-1944, 1989). S-100ao is a cytosolic protein that will be released into the extracellular space following cell death.
The serum concentration of S-100ao is significantly elevated on admission in patients with acute myocardial infarction, increases to peak levels 8 hours after admission, decreases and returns to baseline one week later (Usui, A. et al., Clin. Chem. 36:639-641, 1990). Furthermore, S-100ao appears to be significantly elevated earlier after acute myocardial infarction onset than CK-MB (Usui, A. et al., Clin. Chem. 36:639-641, 1990). The maximum serum S-100ao concentration can exceed 100 ng/ml. S-100ao may be rapidly cleared from the bloodstream by the kidney, as suggested by the rapid decrease of the serum S-100ao concentration of heart surgery patients following reperfusion and its increased urine concentration. S-100ao is found in high concentration in cardiac tissue and appears to be a sensitive marker of cardiac injury. Major sources of non-specificity of this marker include skeletal muscle and renal tissue injury. S-100ao may be significantly elevated soon after acute myocardial infarction onset, and it may allow for the discrimination of acute myocardial infarction from unstable angina. Patients with angina pectoris and suspected acute myocardial infarction, indicating that they were suffering chest pain associated with an ischemic episode, did not have a significantly elevated S-100ao concentration.
Markers Related to Coagulation and Hemostasis
Plasmin is a 78 kDa serine proteinase that proteolytically digests crosslinked fibrin, resulting in clot dissolution. The 70 kDa serine proteinase inhibitor .alpha.2-antiplasmin (.alpha.2AP) regulates plasmin activity by forming a covalent 1:1 stoichiometric complex with plasmin. The resulting .about.150 kDa plasmin-.alpha.2AP complex (PAP), also called plasmin inhibitory complex (PIC) is formed immediately after .alpha.2AP comes in contact with plasmin that is activated during fibrinolysis. The normal serum concentration of PAP is <1 .mu.g/ml (6.9 nM). Elevations in the serum concentration of PAP can be attributed to the activation of fibrinolysis. Elevations in the serum concentration of PAP may be associated with clot presence, or any condition that causes or is a result of fibrinolysis activation. These conditions can include atherosclerosis, disseminated intravascular coagulation, acute myocardial infarction, surgery, trauma, unstable angina, stroke, and thrombotic thrombocytopenic purpura. PAP is formed immediately following proteolytic activation of plasmin. PAP is a specific marker for fibrinolysis activation and the presence of a recent or continual hypercoagulable state.
.beta.-thromboglobulin (.beta.TG) is a 36 kDa platelet .alpha. granule component that is released upon platelet activation. The normal plasma concentration of .beta.TG is <40 ng/ml (1.1 nM). Plasma levels of .beta.TG appear to be elevated in patients with unstable angina and acute myocardial infarction, but not stable angina (De Caterina, R. et al., Eur. Heart J. 9:913-922, 1988; Bazzan, M. et al., Cardiologia 34:217-220, 1989). Plasma .beta.TG elevations also seem to be correlated with episodes of ischemia in patients with unstable angina (Sobel, M. et al., Circulation 63:300-306, 1981). Elevations in the plasma concentration of .beta.TG may be associated with clot presence, or any condition that causes platelet activation. These conditions can include atherosclerosis, disseminated intravascular coagulation, surgery, trauma, thrombotic thrombocytopenic purpura, and stroke (Landi, G. et al., Neurology 37:1667-1671, 1987). .beta.TG is released into the circulation immediately after platelet activation and aggregation. It has a biphasic half-life of 10 minutes, followed by an extended 1 hour half-life in plasma (Switalska, H. I. et al., J. Lab. Clin. Med. 106:690-700, 1985). Plasma .beta.TG concentration is reportedly elevated during unstable angina and acute myocardial infarction. Special precautions must be taken to avoid platelet activation during the blood sampling process. Platelet activation is common during regular blood sampling, and could lead to artificial elevations of plasma .beta.TG concentration. In addition, the amount of .beta.TG released into the bloodstream is dependent on the platelet count of the individual, which can be quite variable. Plasma concentrations of .beta.TG associated with ACS can approach 70 ng/ml (2 nM), but this value may be influenced by platelet activation during the sampling procedure.
Platelet factor 4 (PF4) is a 40 kDa platelet .alpha. granule component that is released upon platelet activation. PF4 is a marker of platelet activation and has the ability to bind and neutralize heparin. The normal plasma concentration of PF4 is <7 ng/ml (175 pM). The plasma concentration of PF4 appears to be elevated in patients with acute myocardial infarction and unstable angina, but not stable angina (Gallino, A. et al., Am. Heart J. 112:285-290, 1986; Sakata, K. et al., Jpn. Circ. J. 60:277-284, 1996; Bazzan, M. et al., Cardiologia 34:217-220, 1989). Plasma PF4 elevations also seem to be correlated with episodes of ischemia in patients with unstable angina (Sobel, M. et al., Circulation 63:300-306, 1981). Elevations in the plasma concentration of PF4 may be associated with clot presence, or any condition that causes platelet activation. These conditions can include atherosclerosis, disseminated intravascular coagulation, surgery, trauma, thrombotic thrombocytopenic purpura, and acute stroke (See for instance Carter, A. M. et al., Arterioscler. Thromb. Vasc. Biol. 18:1124-1131, 1998). PF4 is released into the circulation immediately after platelet activation and aggregation. It has a biphasic half-life of 1 minute, followed by an extended 20 minute half-life in plasma. The half-life of PF4 in plasma can be extended to 20-40 minutes by the presence of heparin (See for instance Rucinski, B. et al., Am. J. Physiol. 251:H800-H807, 1986). Plasma PF4 concentration is reportedly elevated during unstable angina and acute myocardial infarction, but these studies may not be completely reliable. Special precautions must be taken to avoid platelet activation during the blood sampling process. Platelet activation is common during regular blood sampling, and could lead to artificial elevations of plasma PF4 concentration. In addition, the amount of PF4 released into the bloodstream is dependent on the platelet count of the individual, which can be quite variable.
Plasma concentrations of PF4 associated with disease can exceed 100 ng/ml (2.5 nM), but it is likely that this value may be influenced by platelet activation during the sampling procedure.
Fibrinopeptide A (FPA) is a 16 amino acid, 1.5 kDa peptide that is liberated from the amino terminus of fibrinogen by the action of thrombin. Fibrinogen is synthesized and secreted by the liver. The normal plasma concentration of FPA is <5 ng/ml (3.3 nM). The plasma FPA concentration is elevated in patients with acute myocardial infarction, unstable angina, and variant angina, but not stable angina (Gensini, G. F. et al., Thromb. Res. 50:517-525, 1988; Gallino, A. et al., Am. Heart J. 112:285-290, 1986; Sakata, K. et al., Jpn. Circ. J. 60:277-284, 1996; Theroux, P. et al., Circulation 75:156-162, 1987; Merlini, P. A. et al., Circulation 90:61-68, 1994; Manten, A. et al., Cardiovasc. Res. 40:389-395, 1998). Furthermore, plasma FPA may indicate the severity of angina (Gensini, G. F. et al., Thromb. Res. 50:517-525, 1988). Elevations in the plasma concentration of FPA are associated with any condition that involves activation of the coagulation pathway, including stroke, surgery, cancer, disseminated intravascular coagulation, nephrosis, and thrombotic thrombocytopenic purpura. FPA is released into the circulation following thrombin activation and cleavage of fibrinogen. Because FPA is a small polypeptide, it is likely cleared from the bloodstream rapidly. FPA has been demonstrated to be elevated for more than one month following clot formation, and maximum plasma FPA concentrations can exceed 40 ng/ml in active angina (Gensini, G. F. et al., Thromb. Res. 50:517-525, 1988; Tohgi, H. et al., Stroke 21:1663-1667, 1990).
Platelet-derived growth factor (PDGF) is a 28 kDa secreted homo- or heterodimeric protein composed of the homologous subunits A and/or B (Mahadevan, D. et al., J. Biol. Chem. 270:27595-27600, 1995). PDGF is a potent mitogen for mesenchymal cells, and has been implicated in the pathogenesis of atherosclerosis. PDGF is released by aggregating platelets and monocytes near sites of vascular injury. The normal plasma concentration of PDGF is <0.4 ng/ml (15 pM). Plasma PDGF concentrations are higher in individuals with acute myocardial infarction and unstable angina than in healthy controls or individuals with stable angina (Ogawa, H. et al., Am. J. Cardiol. 69:453-456, 1992; Wallace, J. M. et al., Ann. Clin. Biochem. 35:236-241, 1998; Ogawa, H. et al., Coron. Artery Dis. 4:437-442, 1993). Changes in the plasma PDGF concentration in these individuals are most likely due to increased platelet and monocyte activation. Plasma PDGF is elevated in individuals with brain tumors, breast cancer, and hypertension (Kurimoto, M. et al., Acta Neurochir. (Wien) 137:182-187, 1995; Seymour, L. et al., Breast Cancer Res. Treat. 26:247-252, 1993; Rossi, E. et al., Am. J. Hypertens. 11:1239-1243, 1998). Plasma PDGF may also be elevated in any pro-inflammatory condition or any condition that causes platelet activation, including surgery, trauma, disseminated intravascular coagulation, and thrombotic thrombocytopenic purpura. PDGF is released from the secretory granules of platelets and monocytes upon activation. PDGF has a biphasic half-life of approximately 5 minutes and 1 hour in animals (Cohen, A. M. et al., J. Surg. Res. 49:447-452, 1990; Bowen-Pope, D. F. et al., Blood 64:458-469, 1984). The plasma PDGF concentration in ACS can exceed 0.6 ng/ml (22 pM) (Ogawa, H. et al., Am. J. Cardiol. 69:453-456, 1992). PDGF may be a sensitive and specific marker of platelet activation. In addition, it may be a sensitive marker of vascular injury, and the accompanying monocyte and platelet activation.
Prothrombin fragment 1+2 (F1+2) is a 32 kDa polypeptide that is liberated from the amino terminus of prothrombin during thrombin activation. The normal plasma concentration of F1+2 is <32 ng/ml (1 nM). The plasma concentration of F1+2 is reportedly elevated in patients with acute myocardial infarction and unstable angina, but not stable angina, although the changes were not robust (Merlini, P. A. et al., Circulation 90:61-68, 1994). Other reports have indicated that there is no significant change in the plasma F1+2 concentration in cardiovascular disease (Biasucci, L. M. et al., Circulation 93:2121-2127, 1996; Manten, A. et al., Cardiovasc. Res. 40:389-395, 1998). The concentration of F1+2 in plasma can be elevated during any condition associated with coagulation activation, including stroke, surgery, trauma, thrombotic thrombocytopenic purpura, and disseminated intravascular coagulation. F1+2 is released into the bloodstream immediately upon thrombin activation. F1+2 has a half-life of approximately 90 minutes in plasma, and it has been suggested that this long half-life may mask bursts of thrombin formation (Biasucci, L. M. et al., Circulation 93:2121-2127, 1996).
P-selectin, also called granule membrane protein-140, GMP-140, PADGEM, and CD-62P, is a .about.140 kDa adhesion molecule expressed in platelets and endothelial cells. P-selectin is stored in the alpha granules of platelets and in the Weibel-Palade bodies of endothelial cells. Upon activation, P-selectin is rapidly translocated to the surface of endothelial cells and platelets to facilitate the “rolling” cell surface interaction with neutrophils and monocytes. Membrane-bound and soluble forms of P-selectin have been identified. Soluble P-selectin may be produced by shedding of membrane-bound P-selectin, either by proteolysis of the extracellular P-selectin molecule, or by proteolysis of components of the intracellular cytoskeleton in close proximity to the surface-bound P-selectin molecule (Fox, J. E., Blood Coagul. Fibrinolysis 5:291-304, 1994). Additionally, soluble P-selectin may be translated from mRNA that does not encode the N-terminal transmembrane domain (Dunlop, L. C. et al., J. Exp. Med. 175:1147-1150, 1992; Johnston, G. I. et al., J. Biol. Chem. 265:21381-21385, 1990). Activated platelets can shed membrane-bound P-selectin and remain in the circulation, and the shedding of P-selectin can elevate the plasma P-selectin concentration by approximately 70 ng/ml (Michelson, A. D. et al., Proc. Natl. Acad. Sci. U.S.A. 93:11877-11882, 1996). Soluble P-selectin may also adopt a different conformation than membrane-bound P-selectin. Soluble P-selectin has a monomeric rod-like structure with a globular domain at one end, and the membrane-bound molecule forms rosette structures with the globular domain facing outward (Ushiyama, S. et al., J. Biol. Chem. 268:15229-15237, 1993). Soluble P-selectin may play an important role in regulating inflammation and thrombosis by blocking interactions between leukocytes and activated platelets and endothelial cells (Gamble, J. R. et al., Science 249:414-417, 1990). 
The normal plasma concentration of soluble P-selectin is <200 ng/ml. The sensitivity and specificity of membrane-bound P-selectin versus soluble P-selectin for acute myocardial infarction is 71% versus 76% and 32% versus 45% (Hollander, J. E. et al., J. Am. Coll. Cardiol. 34:95-105, 1999). The sensitivity and specificity of membrane-bound P-selectin versus soluble P-selectin for unstable angina+acute myocardial infarction is 71% versus 79% and 30% versus 35% (Hollander, J. E. et al., J. Am. Coll. Cardiol. 34:95-105, 1999). Soluble P-selectin concentration is elevated in the plasma of individuals with idiopathic thrombocytopenic purpura, rheumatoid arthritis, hypercholesterolemia, acute stroke, atherosclerosis, hypertension, acute lung injury, connective tissue disease, thrombotic thrombocytopenic purpura, hemolytic uremic syndrome, disseminated intravascular coagulation, and chronic renal failure (Katayama, M. et al., Br. J. Haematol. 84:702-710, 1993; Haznedaroglu, I. C. et al., Acta Haematol. 101:16-20, 1999; Ertenli, I. et al., J. Rheumatol. 25:1054-1058, 1998; Davi, G. et al., Circulation 97:953-957, 1998; Frijns, C. J. et al., Stroke 28:2214-2218, 1997; Blann, A. D. et al., Thromb. Haemost. 77:1077-1080, 1997; Blann, A. D. et al., J. Hum. Hypertens. 11:607-609, 1997; Sakamaki, F. et al., A. J. Respir. Crit. Care Med. 151:1821-1826, 1995; Takeda, I. et al., Int. Arch. Allergy Immunol. 105:128-134, 1994; Chong, B. H. et al., Blood 83:1535-1541, 1994; Bonomini, M. et al., Nephron 79:399-407, 1998). Additionally, any condition that involves platelet activation can potentially be a source of plasma elevations in P-selectin. P-selectin may be a sensitive and specific marker of platelet and endothelial cell activation, conditions that support thrombus formation and inflammation. It is not, however, a specific marker of ACS. 
When used with another marker that is specific for cardiac tissue injury, P-selectin may be useful in the discrimination of unstable angina and acute myocardial infarction from stable angina. Furthermore, soluble P-selectin may be elevated to a greater degree in acute myocardial infarction than in unstable angina. P-selectin normally exists in two forms, membrane-bound and soluble. Published investigations note that a soluble form of P-selectin is produced by platelets and endothelial cells, and by shedding of membrane-bound P-selectin, potentially through a proteolytic mechanism. Soluble P-selectin may prove to be the most useful currently identified marker of platelet activation, since its plasma concentration may not be as influenced by the blood sampling procedure as other markers of platelet activation, such as PF4 and .beta.-TG.
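The sensitivity and specificity figures quoted above for acute myocardial infarction (membrane-bound P-selectin 71% and 32%; soluble P-selectin 76% and 45%) can be restated as likelihood ratios, which show how far a positive or negative result would shift the odds of disease. The following Python sketch applies the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The likelihood ratios are derived here for illustration and are not reported in the cited study; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity.

    Both inputs are fractions strictly between 0 and 1.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Values quoted in the text for acute myocardial infarction.
for label, sens, spec in [("membrane-bound", 0.71, 0.32), ("soluble", 0.76, 0.45)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Both positive likelihood ratios come out below 1.5, which is consistent with the statement that P-selectin alone is not a specific marker of ACS.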
Thrombin is a 37 kDa serine proteinase that proteolytically cleaves fibrinogen to form fibrin, which is ultimately integrated into a crosslinked network during clot formation. Antithrombin III (ATIII) is a 65 kDa serine proteinase inhibitor that is a physiological regulator of thrombin, factor XIa, factor XIIa, and factor IXa proteolytic activity. The inhibitory activity of ATIII is dependent upon the binding of heparin. Heparin enhances the inhibitory activity of ATIII by 2-3 orders of magnitude, resulting in almost instantaneous inactivation of proteinases inhibited by ATIII. ATIII inhibits its target proteinases through the formation of a covalent 1:1 stoichiometric complex. The normal plasma concentration of the approximately 100 kDa thrombin-ATIII complex (TAT) is <5 ng/ml (50 pM). TAT concentration is elevated in patients with acute myocardial infarction and unstable angina, especially during spontaneous ischemic episodes (Biasucci, L. M. et al., Am. J. Cardiol. 77:85-87, 1996; Kienast, J. et al., Thromb. Haemost. 70:550-553, 1993). Elevation of the plasma TAT concentration is associated with any condition associated with coagulation activation, including stroke, surgery, trauma, disseminated intravascular coagulation, and thrombotic thrombocytopenic purpura. TAT is formed immediately following thrombin activation in the presence of heparin, which is the limiting factor in this interaction. TAT has a half-life of approximately 5 minutes in the bloodstream (Biasucci, L. M. et al., Am. J. Cardiol. 77:85-87, 1996). Following coagulation activation, the TAT concentration is elevated, drops sharply after 15 minutes, and returns to baseline in less than 1 hour. The plasma concentration of TAT can approach 50 ng/ml in ACS (Biasucci, L. M. et al., Circulation 93:2121-2127, 1996). TAT is a specific marker of coagulation activation, specifically, thrombin activation.
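The TAT kinetics described above can be checked against the quoted half-life with a simple calculation. The following Python sketch is illustrative only: it assumes single-phase, first-order clearance with no ongoing TAT formation, which the text does not state, and the function names are hypothetical.

```python
import math


def remaining_concentration(c0: float, half_life_min: float, t_min: float) -> float:
    """Concentration left after t_min minutes of first-order clearance."""
    if half_life_min <= 0 or t_min < 0:
        raise ValueError("half-life must be positive and time non-negative")
    return c0 * 0.5 ** (t_min / half_life_min)


def time_to_reach(c0: float, target: float, half_life_min: float) -> float:
    """Minutes needed for c0 to fall to target under first-order clearance."""
    if target <= 0 or c0 < target or half_life_min <= 0:
        raise ValueError("require c0 >= target > 0 and a positive half-life")
    return half_life_min * math.log2(c0 / target)


# Figures quoted in the text: peak near 50 ng/ml, half-life about 5 minutes,
# upper limit of normal 5 ng/ml.
print(remaining_concentration(50.0, 5.0, 15.0))
print(round(time_to_reach(50.0, 5.0, 5.0), 1))
```

Under these assumptions, a 50 ng/ml peak falls to 6.25 ng/ml after 15 minutes and reaches the 5 ng/ml upper limit of normal in about 17 minutes, which agrees with the sharp drop after 15 minutes and the return to baseline within 1 hour.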
von Willebrand factor (vWF) is a plasma protein produced by platelets, megakaryocytes, and endothelial cells composed of 220 kDa monomers that associate to form a series of high molecular weight multimers. These multimers normally range in molecular weight from 600-20,000 kDa. vWF participates in the coagulation process by stabilizing circulating coagulation factor VIII and by mediating platelet adhesion to exposed subendothelium, as well as to other platelets. The A1 domain of vWF binds to the platelet glycoprotein Ib-IX-V complex and non-fibrillar collagen type VI, and the A3 domain binds fibrillar collagen types I and III (Emsley, J. et al., J. Biol. Chem. 273:10396-10401, 1998). Other domains present in the vWF molecule include the integrin binding domain, which mediates platelet-platelet interactions, and the protease cleavage domain, which appears to be relevant to the pathogenesis of type IIA von Willebrand disease. The interaction of vWF with platelets is tightly regulated to avoid interactions between vWF and platelets in normal physiologic conditions. vWF normally exists in a globular state, and it undergoes a conformation transition to an extended chain structure under conditions of high shear stress, commonly found at sites of vascular injury. This conformational change exposes intramolecular domains of the molecule and allows vWF to interact with platelets. Furthermore, shear stress may cause vWF release from endothelial cells, making a larger number of vWF molecules available for interactions with platelets. The conformational change in vWF can be induced in vitro by the addition of non-physiological modulators like ristocetin and botrocetin (Miyata, S. et al., J. Biol. Chem. 271:9046-9053, 1996). At sites of vascular injury, vWF rapidly associates with collagen in the subendothelial matrix, and virtually irreversibly binds platelets, effectively forming a bridge between platelets and the vascular subendothelium at the site of injury.
Measurement of the total amount of vWF would allow one who is skilled in the art to identify changes in total vWF concentration associated with stroke or cardiovascular disease. This measurement could be performed through the measurement of various forms of the vWF molecule. Measurement of the A1 domain would allow the measurement of active vWF in the circulation, indicating that a pro-coagulant state exists because the A1 domain is accessible for platelet binding. In this regard, an assay that specifically measures vWF molecules with both the exposed A1 domain and either the integrin binding domain or the A3 domain would also allow for the identification of active vWF that would be available for mediating platelet-platelet interactions or mediating crosslinking of platelets to vascular subendothelium, respectively. Measurement of any of these vWF forms, when used in an assay that employs antibodies specific for the protease cleavage domain, may allow assays to be used to determine the circulating concentration of various vWF forms in any individual, regardless of the presence of von Willebrand disease. The normal plasma concentration of vWF is 5-10 .mu.g/ml, or 60-110% activity, as measured by platelet aggregation. The measurement of specific forms of vWF may be of importance in any type of vascular disease, including stroke and cardiovascular disease. The plasma vWF concentration is reportedly elevated in individuals with acute myocardial infarction and unstable angina, but not stable angina (Goto, S. et al., Circulation 99:608-613, 1999; Tousoulis, D. et al., Int. J. Cardiol. 56:259-262, 1996; Yazdani, S. et al., J. Am. Coll. Cardiol. 30:1284-1287, 1997; Montalescot, G. et al., Circulation 98:294-299). Furthermore, elevations of the plasma vWF concentration may be a predictor of adverse clinical outcome in patients with unstable angina (Montalescot, G. et al., Circulation 98:294-299).
vWF concentrations also have been demonstrated to be elevated in patients with stroke and subarachnoid hemorrhage, and also appear to be useful in assessing risk of mortality following stroke (Blann, A. et al., Blood Coagul. Fibrinolysis 10:277-284, 1999; Hirashima, Y. et al. Neurochem Res. 22:1249-1255, 1997; Catto, A. J. et al., Thromb. Hemost. 77:1104-1108, 1997). The plasma concentration of vWF may be elevated in conjunction with any event that is associated with endothelial cell damage or platelet activation. vWF is present at high concentration in the bloodstream, and it is released from platelets and endothelial cells upon activation. vWF would likely have the greatest utility as a marker of platelet activation or, specifically, conditions that favor platelet activation and adhesion to sites of vascular injury. The conformation of vWF is also known to be altered by high shear stress, as would be associated with a partially stenosed blood vessel. As the blood flows past a stenosed vessel, it is subjected to shear stress considerably higher than is encountered in the circulation of an undiseased individual.
Tissue factor (TF) is a 45 kDa cell surface protein expressed in brain, kidney, and heart, and in a transcriptionally regulated manner on perivascular cells and monocytes. TF forms a complex with factor VIIa in the presence of Ca.sup.2+ ions, and it is physiologically active when it is membrane bound. This complex proteolytically cleaves factor X to form factor Xa. It is normally sequestered from the bloodstream. Tissue factor can be detected in the bloodstream in a soluble form, bound to factor VIIa, or in a complex with factor VIIa and tissue factor pathway inhibitor that can also include factor Xa. TF also is expressed on the surface of macrophages, which are commonly found in atherosclerotic plaques. The normal serum concentration of TF is <0.2 ng/ml (4.5 pM). The plasma TF concentration is elevated in patients with ischemic heart disease (Falciani, M. et al., Thromb. Haemost. 79:495-499, 1998). TF is elevated in patients with unstable angina and acute myocardial infarction, but not in patients with stable angina (Falciani, M. et al., Thromb. Haemost. 79:495-499, 1998; Suefuji, H. et al., Am. Heart J. 134:253-259, 1997; Misumi, K. et al., Am. J. Cardiol. 81:22-26, 1998). Elevations in the serum concentration of TF are associated with any condition that causes or is a result of coagulation activation through the extrinsic pathway. These conditions can include subarachnoid hemorrhage, disseminated intravascular coagulation, renal failure, vasculitis, and sickle cell disease (Hirashima, Y. et al., Stroke 28:1666-1670, 1997; Takahashi, H. et al., Am. J. Hematol. 46:333-337, 1994; Koyama, T. et al., Br. J. Haematol. 87:343-347, 1994). TF is released immediately when vascular injury is coupled with extravascular cell injury. TF levels in ischemic heart disease patients can exceed 800 pg/ml within 2 days of onset (Falciani, M. et al., Thromb. Haemost. 79:495-499, 1998).
TF levels were decreased in the chronic phase of acute myocardial infarction, as compared with the acute phase (Suefuji, H. et al., Am. Heart J. 134:253-259, 1997). TF is a specific marker for activation of the extrinsic coagulation pathway and the presence of a general hypercoagulable state. It may be a sensitive marker of vascular injury resulting from plaque rupture.
The coagulation cascade can be activated through either the extrinsic or intrinsic pathways. These enzymatic pathways share one final common pathway. The first step of the common pathway involves the proteolytic cleavage of prothrombin by the factor Xa/factor Va prothrombinase complex to yield active thrombin. Thrombin is a serine proteinase that proteolytically cleaves fibrinogen. Thrombin first removes fibrinopeptide A from fibrinogen, yielding desAA fibrin monomer, which can form complexes with all other fibrinogen-derived proteins, including fibrin degradation products, fibrinogen degradation products, desAA fibrin, and fibrinogen. The desAA fibrin monomer is generically referred to as soluble fibrin, as it is the first product of fibrinogen cleavage, but it is not yet crosslinked via factor XIIIa into an insoluble fibrin clot. DesAA fibrin monomer also can undergo further proteolytic cleavage by thrombin to remove fibrinopeptide B, yielding desAABB fibrin monomer. This monomer can polymerize with other desAABB fibrin monomers to form soluble desAABB fibrin polymer, also referred to as soluble fibrin or thrombus precursor protein (TpP.TM.). TpP.TM. is the immediate precursor to insoluble fibrin, which forms a “mesh-like” structure to provide structural rigidity to the newly formed thrombus. In this regard, measurement of TpP.TM. in plasma is a direct measurement of active clot formation. The normal plasma concentration of TpP.TM. is <6 ng/ml (Laurino, J. P. et al., Ann. Clin. Lab. Sci. 27:338-345, 1997). American Biogenetic Sciences has developed an assay for TpP.TM. (U.S. Pat. Nos. 5,453,359 and 5,843,690) and states that its TpP.TM. assay can assist in the early diagnosis of acute myocardial infarction, the ruling out of acute myocardial infarction in chest pain patients, and the identification of patients with unstable angina that will progress to acute myocardial infarction. Other studies have confirmed that TpP.TM.
is elevated in patients with acute myocardial infarction, most often within 6 hours of onset (Laurino, J. P. et al., Ann. Clin. Lab. Sci. 27:338-345, 1997; Carville, D. G. et al., Clin. Chem. 42:1537-1541, 1996). The plasma concentration of TpP.TM. is also elevated in patients with unstable angina, but these elevations may be indicative of the severity of angina and the eventual progression to acute myocardial infarction (Laurino, J. P. et al., Ann. Clin. Lab. Sci. 27:338-345, 1997). Plasma TpP.TM. concentrations peak within 3 hours of acute myocardial infarction onset, returning to normal after 12 hours from onset. The plasma concentration of TpP.TM. can exceed 30 ng/ml in CVD (Laurino, J. P. et al., Ann. Clin. Lab. Sci. 27:338-345, 1997). TpP.TM. is a sensitive and specific marker of coagulation activation. It has been demonstrated that TpP.TM. is useful in the diagnosis of acute myocardial infarction, but only when it is used in conjunction with a specific marker of cardiac tissue injury.
Markers Related to Atherosclerotic Plaque Rupture
The appearance of markers related to atherosclerotic plaque rupture may precede specific markers of myocardial injury. Potential markers of atherosclerotic plaque rupture include human neutrophil elastase, inducible nitric oxide synthase, lysophosphatidic acid, malondialdehyde-modified low density lipoprotein, and various members of the matrix metalloproteinase (MMP) family, including MMP-1, -2, -3, and -9.
Matrix metalloproteinase-9 (MMP-9), also called gelatinase B, is an 84 kDa zinc- and calcium-binding proteinase that is synthesized as an inactive 92 kDa precursor. Mature MMP-9 cleaves gelatin types I and V, and collagen types IV and V. MMP-9 exists as a monomer, a homodimer, and a heterodimer with a 25 kDa .alpha.2-microglobulin-related protein (Triebel, S. et al., FEBS Lett. 314:386-388, 1992). MMP-9 is synthesized by a variety of cell types, most notably by neutrophils. The normal plasma concentration of MMP-9 is <35 ng/ml (400 pM). MMP-9 expression is elevated in vascular smooth muscle cells within atherosclerotic lesions, and it may be released into the bloodstream in cases of plaque instability (Kai, H. et al., J. Am. Coll. Cardiol. 32:368-372, 1998). Furthermore, the plasma MMP-9 concentration may be elevated in stroke and cerebral hemorrhage (Mun-Bryce, S. and Rosenberg, G. A., J. Cereb. Blood Flow Metab. 18:1163-1172, 1998; Romanic, A. M. et al., Stroke 29:1020-1030, 1998; Rosenberg, G. A., J. Neurotrauma 12:833-842, 1995).
Markers Related to Tissue Injury and Inflammation
C-reactive protein (CRP) is a homopentameric Ca.sup.2+-binding acute phase protein with 21 kDa subunits that is involved in host defense. CRP preferentially binds to phosphorylcholine, a common constituent of microbial membranes. Phosphorylcholine is also found in mammalian cell membranes, but it is not present in a form that is reactive with CRP. The interaction of CRP with phosphorylcholine promotes agglutination and opsonization of bacteria, as well as activation of the complement cascade, all of which are involved in bacterial clearance. Furthermore, CRP can interact with DNA and histones, and it has been suggested that CRP is a scavenger of nuclear material released from damaged cells into the circulation (Robey, F. A. et al., J. Biol. Chem. 259:7311-7316, 1984). CRP synthesis is induced by IL-6, and indirectly by IL-1, since IL-1 can trigger the synthesis of IL-6 by Kupffer cells in the hepatic sinusoids. The normal plasma concentration of CRP is <3 .mu.g/ml (30 nM) in 90% of the healthy population, and <10 .mu.g/ml (100 nM) in 99% of healthy individuals. Plasma CRP concentrations can be measured by rate nephelometry or ELISA. The plasma concentration of CRP is significantly elevated in patients with acute myocardial infarction and unstable angina, but not stable angina (Biasucci, L. M. et al., Circulation 94:874-877, 1996; Biasucci, L. M. et al., Am. J. Cardiol. 77:85-87, 1996; Benamer, H. et al., Am. J. Cardiol. 82:845-850, 1998; Caligiuri, G. et al., J. Am. Coll. Cardiol. 32:1295-1304, 1998; Curzen, N. P. et al., Heart 80:23-27, 1998; Dangas, G. et al., Am. J. Cardiol. 83:583-5, A7, 1999). The concentration of CRP will be elevated in the plasma from individuals with any condition that may elicit an acute phase response, such as infection, surgery, trauma, and stroke. CRP is a secreted protein that is released into the bloodstream soon after synthesis.
CRP synthesis is upregulated by IL-6, and the plasma CRP concentration is significantly elevated within 6 hours of stimulation (Biasucci, L. M. et al., Am. J. Cardiol. 77:85-87, 1996). The plasma CRP concentration peaks approximately 50 hours after stimulation, and begins to decrease with a half-life of approximately 19 hours in the bloodstream (Biasucci, L. M. et al., Am. J. Cardiol. 77:85-87, 1996).
Interleukin-1.beta. (IL-1.beta.) is a 17 kDa secreted proinflammatory cytokine that is involved in the acute phase response and is a pathogenic mediator of many diseases. IL-1.beta. is normally produced by macrophages and epithelial cells. IL-1.beta. is also released from cells undergoing apoptosis. The normal serum concentration of IL-1.beta. is <30 pg/ml (1.8 pM). In theory, IL-1.beta. would be elevated earlier than other acute phase proteins such as CRP in unstable angina and acute myocardial infarction, since IL-1.beta. is an early participant in the acute phase response. Furthermore, IL-1.beta. is released from cells undergoing apoptosis, which may be activated in the early stages of ischemia.
Interleukin-1 receptor antagonist (IL-1ra) is a 17 kDa member of the IL-1 family predominantly expressed in hepatocytes, epithelial cells, monocytes, macrophages, and neutrophils. IL-1ra has both intracellular and extracellular forms produced through alternative splicing. IL-1ra is thought to participate in the regulation of physiological IL-1 activity. IL-1ra has no IL-1-like physiological activity, but is able to bind the IL-1 receptor on T-cells and fibroblasts with an affinity similar to that of IL-1.beta., blocking the binding of IL-1.alpha. and IL-1.beta. and inhibiting their bioactivity (Stockman, B. J. et al., Biochemistry 31:5237-5245, 1992; Eisenberg, S. P. et al., Proc. Natl. Acad. Sci. U.S.A. 88:5232-5236, 1991; Carter, D. B. et al., Nature 344:633-638, 1990). IL-1ra is normally present in higher concentrations than IL-1 in plasma, and it has been suggested that IL-1ra levels are a better correlate of disease severity than IL-1 (Biasucci, L. M. et al., Circulation 99:2079-2084, 1999). Furthermore, there is evidence that IL-1ra is an acute phase protein (Gabay, C. et al., J. Clin. Invest. 99:2930-2940, 1997). The normal plasma concentration of IL-1ra is <200 pg/ml (12 pM). The plasma concentration of IL-1ra is elevated in patients with acute myocardial infarction and unstable angina that proceeded to acute myocardial infarction, death, or refractory angina (Biasucci, L. M. et al., Circulation 99:2079-2084, 1999; Latini, R. et al., J. Cardiovasc. Pharmacol. 23:1-6, 1994). Furthermore, IL-1ra was significantly elevated in severe acute myocardial infarction as compared to uncomplicated acute myocardial infarction (Latini, R. et al., J. Cardiovasc. Pharmacol. 23:1-6, 1994). Elevations in the plasma concentration of IL-1ra are associated with any condition that involves activation of the inflammatory or acute phase response, including infection, trauma, and arthritis. Changes in the plasma concentration of IL-1ra appear to be related to disease severity.
Furthermore, it is likely released in conjunction with or soon after IL-1 release in pro-inflammatory conditions, and it is found at higher concentrations than IL-1. This indicates that IL-1ra may be a useful indirect marker of IL-1 activity, which elicits the production of IL-6.
Interleukin-6 (IL-6) is a 20 kDa secreted protein that is a hematopoietin family proinflammatory cytokine. IL-6 is an acute-phase reactant and stimulates the synthesis of a variety of proteins, including adhesion molecules. Its major function is to mediate the acute phase production of hepatic proteins, and its synthesis is induced by the cytokine IL-1. IL-6 is normally produced by macrophages and T-lymphocytes. The normal serum concentration of IL-6 is <3 pg/ml (0.15 pM). The plasma concentration of IL-6 is elevated in patients with acute myocardial infarction and unstable angina, to a greater degree in acute myocardial infarction (Biasucci, L. M. et al., Circulation 94:874-877, 1996; Manten, A. et al., Cardiovasc. Res. 40:389-395, 1998; Biasucci, L. M. et al., Circulation 99:2079-2084, 1999).
Tumor necrosis factor .alpha. (TNF.alpha.) is a 17 kDa secreted proinflammatory cytokine that is involved in the acute phase response and is a pathogenic mediator of many diseases. TNF.alpha. is normally produced by macrophages and natural killer cells. TNF.alpha. is a protein of 185 amino acids glycosylated at positions 73 and 172. It is synthesized as a precursor protein of 212 amino acids. Monocytes express at least five different molecular forms of TNF.alpha. with molecular masses of 21.5-28 kDa. They mainly differ by post-translational alterations such as glycosylation and phosphorylation. The normal serum concentration of TNF.alpha. is <40 pg/ml (2 pM). The plasma concentration of TNF.alpha. is elevated in patients with acute myocardial infarction, and is marginally elevated in patients with unstable angina (Li, D. et al., Am. Heart J. 137:1145-1152, 1999; Squadrito, F. et al., Inflamm. Res. 45:14-19, 1996; Latini, R. et al., J. Cardiovasc. Pharmacol. 23:1-6, 1994; Carlstedt, F. et al., J. Intern. Med. 242:361-365, 1997). Elevations in the plasma concentration of TNF.alpha. are associated with any proinflammatory condition, including trauma, stroke, and infection. TNF.alpha. has a half-life of approximately 1 hour in the bloodstream, indicating that it may be removed from the circulation soon after symptom onset. In patients with acute myocardial infarction, TNF.alpha. was elevated 4 hours after the onset of chest pain, and gradually declined to normal levels within 48 hours of onset (Li, D. et al., Am. Heart J. 137:1145-1152, 1999). The concentration of TNF.alpha. in the plasma of acute myocardial infarction patients exceeded 300 pg/ml (15 pM) (Squadrito, F. et al., Inflamm. Res. 45:14-19, 1996).
Soluble intercellular adhesion molecule (sICAM-1), also called CD54, is an 85-110 kDa cell surface-bound immunoglobulin-like integrin ligand that facilitates binding of leukocytes to antigen-presenting cells and endothelial cells during leukocyte recruitment and migration. sICAM-1 is normally produced by vascular endothelium, hematopoietic stem cells and non-hematopoietic stem cells, which can be found in intestine and epidermis. sICAM-1 can be released from the cell surface during cell death or as a result of proteolytic activity. The normal plasma concentration of sICAM-1 is approximately 250 ng/ml (2.9 nM). Elevations of the plasma concentration of sICAM-1 are associated with ischemic stroke, head trauma, atherosclerosis, cancer, preeclampsia, multiple sclerosis, cystic fibrosis, and other nonspecific inflammatory states (Kim, J. S., J. Neurol. Sci. 137:69-78, 1996; Laskowitz, D. T. et al., J. Stroke Cerebrovasc. Dis. 7:234-241, 1998). The plasma concentration of sICAM-1 can approach 700 ng/ml (8 nM) in patients with acute myocardial infarction (Pellegatta, F. et al., J. Cardiovasc. Pharmacol. 30:455-460, 1997). ICAM-1 is present in atherosclerotic plaques, and may be released into the bloodstream upon plaque rupture.
Vascular cell adhesion molecule (VCAM), also called CD106, is a 100-110 kDa cell surface-bound immunoglobulin-like integrin ligand that facilitates binding of B lymphocytes and developing T lymphocytes to antigen-presenting cells during lymphocyte recruitment. VCAM is normally produced by endothelial cells, which line blood and lymph vessels, the heart, and other body cavities. VCAM-1 can be released from the cell surface during cell death or as a result of proteolytic activity. The normal serum concentration of sVCAM is approximately 650 ng/ml (6.5 nM). Elevations in the plasma concentration of sVCAM-1 are associated with ischemic stroke, cancer, diabetes, preeclampsia, vascular injury, and other nonspecific inflammatory states (Bitsch, A. et al., Stroke 29:2129-2135, 1998; Otsuki, M. et al., Diabetes 46:2096-2101, 1997; Banks, R. E. et al., Br. J. Cancer 68:122-124, 1993; Steiner, M. et al., Thromb. Haemost. 72:979-984, 1994; Austgulen, R. et al., Eur. J. Obstet. Gynecol. Reprod. Biol. 71:53-58, 1997).
Monocyte chemotactic protein-1 (MCP-1) is a 10 kDa chemotactic factor that is a specific marker of the presence of a pro-inflammatory condition that involves monocyte migration. MCP-1 is normally found in equilibrium between a monomeric and homodimeric form, and it is normally produced in and secreted by monocytes and vascular endothelial cells (Yoshimura, T. et al., FEBS Lett. 244:487-493, 1989; Li, Y. S. et al., Mol. Cell. Biochem. 126:61-68, 1993). MCP-1 has been implicated in the pathogenesis of a variety of diseases that involve monocyte infiltration, including psoriasis, rheumatoid arthritis, and atherosclerosis. The normal concentration of MCP-1 in plasma is <0.1 ng/ml. The plasma concentration of MCP-1 is elevated in patients with acute myocardial infarction, and may be elevated in the plasma of patients with unstable angina, but no elevations are associated with stable angina (Soejima, H. et al., J. Am. Coll. Cardiol. 34:983-988, 1999; Nishiyama, K. et al., Jpn. Circ. J. 62:710-712, 1998; Matsumori, A. et al., J. Mol. Cell. Cardiol. 29:419-423, 1997). The concentration of MCP-1 in plasma from patients with acute myocardial infarction has been reported to approach 1 ng/ml (100 pM), and can remain elevated for one month (Soejima, H. et al., J. Am. Coll. Cardiol. 34:983-988, 1999).
Cellular fibronectin (c-Fn), or ED1+, is mainly synthesized by endothelial cells (see for instance Peters et al. Elevated plasma levels of ED1+ ‘cellular fibronectin’ in patients with vascular injury J Lab Clin Med. 1989. 113:586-597). Because c-Fn is largely confined to the vascular endothelium, high plasma levels of this molecule might be indicative of endothelial damage. Plasma c-Fn levels have been reported to be increased in patients with vascular injury secondary to vasculitis, sepsis, acute major trauma, or diabetes, and in patients with ischemic stroke (see for instance Peters et al. Elevated plasma levels of ED1+ ‘cellular fibronectin’ in patients with vascular injury J Lab Clin Med. 1989. 113:586-597). Plasma c-Fn has also been reported to be associated with hemorrhagic transformation (see for instance Castellanos et al., Plasma Cellular-Fibronectin concentration predicts hemorrhagic transformation after thrombolytic therapy in acute ischemic stroke, Stroke 2004;35:000-000).
How to Measure Various Markers
One of ordinary skill in the art knows several methods and devices for the detection and analysis of the markers of the instant invention. With regard to polypeptides or proteins in patient test samples, immunoassay devices and methods are often used. These devices and methods can utilize labeled molecules in various sandwich, competitive, or non-competitive assay formats, to generate a signal that is related to the presence or amount of an analyte of interest. Additionally, certain methods and devices, such as biosensors and optical immunoassays, may be employed to determine the presence or amount of analytes without the need for a labeled molecule.
Preferably the markers are analyzed using an immunoassay, although other methods are well known to those skilled in the art (for example, the measurement of marker RNA levels). The presence or amount of a marker is generally determined using antibodies specific for each marker and detecting specific binding. Any suitable immunoassay may be utilized, for example, enzyme-linked immunoassays (ELISA), radioimmunoassays (RIAs), competitive binding assays, and the like. Specific immunological binding of the antibody to the marker can be detected directly or indirectly. Direct labels include fluorescent or luminescent tags, metals, dyes, radionuclides, and the like, attached to the antibody. Indirect labels include various enzymes well known in the art, such as alkaline phosphatase, horseradish peroxidase and the like. For an example of how this procedure is carried out on a machine, one can use the RAMP Biomedical device, called the Clinical Reader.TM., which uses the fluorescent tag method, though the skilled artisan will know of many different machines and manual protocols to perform the same assay. Diluted whole blood is applied to the sample well. The red blood cells are retained in the sample pad, and the separated plasma migrates along the strip. Fluorescent dyed latex particles bind to the analyte and are immobilized at the detection zone. Additional particles are immobilized at the internal control zone. The fluorescence of the detection and internal control zones is measured on the RAMP Clinical Reader.TM., and the ratio between these values is calculated. This ratio is used to determine the analyte concentration by interpolation from a lot-specific standard curve supplied by the manufacturer in each test kit for each assay.
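The interpolation step described above can be illustrated with a minimal Python sketch. It assumes a piecewise-linear standard curve; the calibration points, the function name concentration_from_ratio, and the choice of linear interpolation are hypothetical and are not taken from any manufacturer's kit or reader software.

```python
from bisect import bisect_left


def concentration_from_ratio(detection, control, curve):
    """Interpolate an analyte concentration from two fluorescence readings.

    detection, control: fluorescence of the detection and internal control zones.
    curve: list of (ratio, concentration) calibration points with strictly
           increasing ratios (hypothetical stand-in for a lot-specific curve).
    """
    if control <= 0:
        raise ValueError("internal control signal must be positive")
    ratio = detection / control
    ratios = [r for r, _ in curve]
    if not ratios[0] <= ratio <= ratios[-1]:
        raise ValueError("ratio outside the calibrated range")
    hi = bisect_left(ratios, ratio)      # first calibration point with ratio >= measured ratio
    if ratios[hi] == ratio:
        return curve[hi][1]
    (r0, c0), (r1, c1) = curve[hi - 1], curve[hi]
    # Linear interpolation between the two bracketing calibration points.
    return c0 + (c1 - c0) * (ratio - r0) / (r1 - r0)


# Hypothetical standard curve: (detection/control ratio, concentration in ng/ml).
STANDARD_CURVE = [(0.0, 0.0), (0.5, 10.0), (1.0, 30.0), (2.0, 100.0)]

print(concentration_from_ratio(150.0, 200.0, STANDARD_CURVE))
```

Readings outside the calibrated range are rejected rather than extrapolated, which is one possible design choice; a real assay may instead report such values as below or above the reportable range.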
The use of immobilized antibodies specific for the markers is also contemplated by the present invention and is well known by one of ordinary skill in the art. The antibodies could be immobilized onto a variety of solid supports, such as magnetic or chromatographic matrix particles, the surface of an assay plate (such as microtiter wells), pieces of a solid substrate material (such as plastic, nylon, paper), and the like. An assay strip could be prepared by coating the antibody or a plurality of antibodies in an array on a solid support. This strip could then be dipped into the test sample and then processed quickly through washes and detection steps to generate a measurable signal, such as a colored spot.
The analysis of a plurality of markers may be carried out separately or simultaneously with one test sample. Several markers may be combined into one test for efficient processing of multiple samples. In addition, one skilled in the art would recognize the value of testing multiple samples (for example, at successive time points) from the same individual. Such testing of serial samples will allow the identification of changes in marker levels over time. Increases or decreases in marker levels, as well as the absence of change in marker levels, would provide useful information about the disease status that includes, but is not limited to, identifying the approximate time from onset of the event, the presence and amount of salvageable tissue, the appropriateness of drug therapies, the effectiveness of various therapies, identification of the severity of the event, identification of the disease severity, and identification of the patient's outcome, including risk of future events.
An assay consisting of a combination of the markers referenced in the instant invention may be constructed to provide relevant information related to differential diagnosis. Such a panel may be constructed using 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more individual markers. The analysis of a single marker or subsets of markers comprising a larger panel of markers could be carried out using methods described within the instant invention to optimize clinical sensitivity or specificity in various clinical settings. The clinical sensitivity of an assay is defined as the percentage of those with the disease that the assay correctly predicts, and the specificity of an assay is defined as the percentage of those without the disease that the assay correctly predicts (Tietz Textbook of Clinical Chemistry, 2.sup.nd edition, Carl Burtis and Edward Ashwood eds., W. B. Saunders and Company, p. 496).
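The definitions of clinical sensitivity and specificity given above can be made concrete with a short Python sketch. The function name and the example calls are hypothetical; the labels are invented purely to show the arithmetic.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) as percentages.

    predicted, actual: equal-length sequences of booleans, True = disease present.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # diseased, called diseased
    fn = sum((not p) and a for p, a in pairs)        # diseased, missed
    tn = sum((not p) and (not a) for p, a in pairs)  # healthy, called healthy
    fp = sum(p and (not a) for p, a in pairs)        # healthy, called diseased
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased subject")
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


# Invented example: 4 subjects with disease, 6 without.
actual = [True] * 4 + [False] * 6
predicted = [True, True, True, False, False, False, False, False, True, True]
sens, spec = sensitivity_specificity(predicted, actual)
print(f"sensitivity = {sens:.1f}%, specificity = {spec:.1f}%")
```

In this invented example three of the four diseased subjects are detected (75% sensitivity) and four of the six non-diseased subjects are correctly excluded (about 66.7% specificity).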
The analysis of markers could be carried out in a variety of physical formats as well. For example, microtiter plates or automation could be used to facilitate the processing of large numbers of test samples. Alternatively, single sample formats could be developed to facilitate immediate treatment and diagnosis in a timely fashion, for example, in ambulatory transport or emergency room settings. Particularly useful physical formats comprise surfaces having a plurality of discrete, addressable locations for the detection of a plurality of different analytes. Such formats include protein microarrays, or “protein chips” (see, e.g., Ng and Ilag, J. Cell Mol. Med. 6: 329-340 (2002)) and capillary devices.
In accordance with the present invention, a kit is provided for the analysis of markers. Such a kit preferably comprises devices and reagents for the analysis of at least one test sample and instructions for performing the assay, as well as a predictive software-based algorithmic model residing on a computer. Optionally the kits may contain one or more means for using information obtained from immunoassays performed for a marker panel to rule in or out certain diagnoses.
Methodology of Marker Selection, Analysis, and Classification
Non-linear techniques for data analysis and information extraction are important for identifying complex interactions between markers that contribute to overall presentation of the clinical outcome. However, due to the many features involved in association studies such as the one proposed, the construction of these in-silico predictors is a complex process. Often one must consider more markers to test than samples, missing values, poor generalization of results, selection of free parameters in predictor models, confidence in finding a sub-optimal solution and others. Thus, the process for building a predictor is as important as designing the protocol for the association studies. Errors at each step can propagate downstream, affecting the generalizability of the final result.
We now provide an overview of our process of model development, describing the five main steps and some techniques that the instant invention will use to build an optimal biomarker panel of response for each clinical outcome. One of ordinary skill in the art will know that it is best to use a ‘toolbox’ approach to the various steps, trying several different algorithms at each step, and even combining several as in Step Five. Since one does not know a priori the distribution of the true solution space, trying several methods allows a thorough search of the solution space of the observed data in order to find the most optimal solutions (i.e. those best able to generalize to unseen data). One also can give more confidence to predictions if several independent techniques converge to a similar solution.
Data Pre-Processing
After assaying the patients for various markers, it is necessary to perform some basic data ‘inspection’, such as identification of outliers, before starting a program of outcome prediction. Another task is performing data dimensional shifting in the case of discrete data sets such as SNP analysis. For instance, one can describe a three-state SNP vector either three-dimensionally (1,0,0);(0,1,0);(0,0,1) or two-dimensionally (0,0);(1,0);(0,1). For some algorithms, the latter description may have a direct effect on computational cost and classifier accuracy: one can, in effect, collapse several values to a single parameter. The advantage of single parameter is that one can reduce dimensionality with little or no effect on the selection of the optimal feature set. Following pre-processing, one can then perform univariate and multivariate statistical modeling to identify strongly correlative outcome variables and determine a baseline outcome analysis.
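The two encodings of a three-state SNP described above can be sketched in Python as follows. The genotype codes 0, 1, and 2 and the names ONE_HOT_3D, COMPACT_2D, and encode_snps are hypothetical labels chosen for illustration only.

```python
# Three-dimensional description: one indicator per genotype state.
ONE_HOT_3D = {0: (1, 0, 0), 1: (0, 1, 0), 2: (0, 0, 1)}
# Two-dimensional description: the first state is represented by the origin.
COMPACT_2D = {0: (0, 0), 1: (1, 0), 2: (0, 1)}


def encode_snps(genotypes, table):
    """Flatten a list of three-state genotype codes into one feature vector."""
    features = []
    for g in genotypes:
        features.extend(table[g])
    return features


patient = [0, 1, 2]  # hypothetical genotype codes for three SNPs
print(encode_snps(patient, ONE_HOT_3D))  # 9 features
print(encode_snps(patient, COMPACT_2D))  # 6 features
```

For N SNPs the compact form yields 2N features instead of 3N while still distinguishing all three states, which is the dimensionality reduction referred to above.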
Missing Value Estimation
While the call rate and accuracy of high throughput methods are improving, genotype and proteomic data sets usually contain missing values. Missing values arise from missed genotype calls or from the combination of data collected under different protocols. If subsequent analysis requires complete data sets, repeating the experiment can be expensive and removing rows or columns containing missing values in the data set may be wasteful.
Missing values can be replaced with the most likely genotype based on frequency estimates for an individual marker. This row counting method may be sufficient when few markers are genotyped, but it is not optimal for genome wide scans since it does not consider correlation in the data. Other statistical approaches to estimating missing values apply genetic models of inheritance. In large-scale association studies of unrelated participants, lineage information is unavailable. For the dataset gathered in the instant invention, we will apply techniques that do not use complex models and take into account the possibly discrete nature of marker data when models are used. These methods fall into two categories: KNN-based and Bayesian-based methods.
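As a baseline for comparison, the frequency-based replacement described at the start of this paragraph can be sketched in Python. The function name, the use of None to mark a missing call, and the tie rule (the first value encountered wins) are assumptions made for this illustration.

```python
from collections import Counter


def impute_most_frequent(data, missing=None):
    """Fill each missing entry with the most frequent observed value of its marker.

    data: list of rows (patients), each a list of marker values.
    Returns a new matrix; the input is left unchanged.
    """
    filled = [list(row) for row in data]
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not missing]
        if not observed:
            continue  # nothing to learn from for this marker
        # Ties resolve to the value seen first (Counter preserves insertion order).
        mode = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[j] is missing:
                row[j] = mode
    return filled


genotypes = [[0, 1], [0, None], [None, 1], [2, 1]]
print(impute_most_frequent(genotypes))
```

Because each marker is handled in isolation, this sketch ignores correlation between markers, which is the limitation noted above.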
KNN estimates the value of the missing data as the most prevalent genotype among the K Nearest Neighbors. For a data set consisting of M patients and N SNPs, the data is stored in an M by N matrix. For each row with a missing value in a single column, the algorithm locates the K nearest neighbors in the N-1 dimensional subspace. The K nearest neighbors then vote to replace the missing value under majority rule. Ties are broken by random draw. If there are n missing values present in a row, we find the nearest neighbors in the N-n subspace.
The only other consideration is what distance function to use to determine the K nearest neighbors. Typically, the Euclidean distance is well suited for continuous data and the Hamming distance for nominal data. The Hamming distance counts the number of different marker genotypes in the N-n subspace and does not impose an artificial ordinality as does the Euclidean distance. There are other options such as the Manhattan distance, the correlation coefficient, and others that may be used depending on the data set distribution.
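The KNN procedure with a Hamming distance can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: missing calls are marked with None, a candidate neighbor must have an observed value in the column being filled, a neighbor's own missing entries count as mismatches, and ties in distance at the K-th position are resolved by row order. The names hamming and knn_impute are hypothetical.

```python
import random
from collections import Counter


def hamming(row_a, row_b, columns):
    """Count the columns in which two rows carry different genotypes."""
    return sum(row_a[j] != row_b[j] for j in columns)


def knn_impute(data, k=3, missing=None, rng=None):
    """Fill missing genotypes by majority vote among the K nearest rows."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    filled = [list(row) for row in data]
    for g, row in enumerate(data):
        # The N-n subspace: columns observed for this patient.
        observed = [j for j, v in enumerate(row) if v is not missing]
        targets = [j for j, v in enumerate(row) if v is missing]
        for target in targets:
            candidates = [other for h, other in enumerate(data)
                          if h != g and other[target] is not missing]
            if not candidates:
                continue
            candidates.sort(key=lambda other: hamming(row, other, observed))
            votes = Counter(other[target] for other in candidates[:k])
            top = max(votes.values())
            winners = sorted(v for v, c in votes.items() if c == top)
            filled[g][target] = rng.choice(winners)  # random draw breaks vote ties
    return filled


matrix = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, None],
    [2, 2, 0, 2],
    [2, 2, 0, 2],
]
print(knn_impute(matrix, k=2)[2])
```

Distances are always computed on the original matrix rather than on partially filled rows, so the result does not depend on the order in which rows are processed.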
In contrast, Bayesian imputation uses probabilities instead of distances to infer missing values. The objective is to draw an inference about a missing value for a matrix entry in the data set from the posterior probability of the missing value given the observed data, π(Ymiss|Yobs), where Yobs is the set of N-n observed marker values and Ymiss is the missing value. By Bayes' theorem, π(Ymiss|Yobs) can be expressed as follows:
\[ \pi(Y_{miss} \mid Y_{obs}) = \frac{\pi(Y_{obs} \mid Y_{miss})\,\pi(Y_{miss})}{\sum_{k=1}^{m} \pi(Y_{obs} \mid Y_{miss}=k)\,\pi(Y_{miss}=k)} \]
where π(Ymiss) is the probability that a randomly selected missing entry will have the value Ymiss, π(Yobs|Ymiss) is the probability of observing the N-n genotypes given Ymiss, and the sum is over the m possible values for Ymiss.
The likelihood model assumes that the probabilities π(Yobs|Ymiss) can be expressed as functions of unknown parameters of the genotypes Ymiss:
\[ \pi(Y_{obs} \mid Y_{miss}=k) = \prod_{i=1}^{N-n} \pi(y_{gi} \mid \theta_{ik}) \]
where θik are unknown parameters of Ymiss for the N-n observed markers, ygi is the i th marker in the set of Yobs markers, and π(ygi|θik) is the probability of observing ygi given the parameter θik of the marker value Ymiss for variable i. The model is based on the assumption that the probability of observing ygi is independent of the probability of observing ygj for each marker value Ymiss with i≠j.
Missing values are imputed as follows. For each marker for which there is a missing value, the probabilities π(ygi|θik) are estimated based on the observed markers. Using Bayes' theorem, the posterior probability π(Ymiss|Yobs) is calculated. We then sample Ymiss from the posterior. This approach treats the missing value problem as a supervised learning problem in which posterior probability is learned from the pattern of observed markers.
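The imputation steps just described can be sketched in Python under the stated independence assumption. Several choices here are assumptions made for illustration and are not specified in the text: probabilities are estimated as smoothed relative frequencies (additive smoothing with a hypothetical parameter alpha), missing calls are marked with None, and the names posterior_missing and sample_missing are invented.

```python
import random


def posterior_missing(data, g, target, states, missing=None, alpha=1.0):
    """Posterior over the missing entry (g, target) given the row's observed markers."""
    row = data[g]
    observed = [i for i, v in enumerate(row) if v is not missing and i != target]
    # Rows in which the target marker was actually observed.
    train = [r for h, r in enumerate(data) if h != g and r[target] is not missing]
    scores = {}
    for k in states:
        rows_k = [r for r in train if r[target] == k]
        # Smoothed estimate of the prior probability that the missing value is k.
        score = (len(rows_k) + alpha) / (len(train) + alpha * len(states))
        for i in observed:
            seen = [r[i] for r in rows_k if r[i] is not missing]
            # Smoothed estimate of the probability of the observed y_gi given state k.
            score *= (seen.count(row[i]) + alpha) / (len(seen) + alpha * len(states))
        scores[k] = score
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}


def sample_missing(posterior, rng=None):
    """Draw one value for the missing entry from its posterior distribution."""
    if rng is None:
        rng = random.Random(0)
    values = list(posterior)
    return rng.choices(values, weights=[posterior[v] for v in values], k=1)[0]


table = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1], [0, 0, None]]
post = posterior_missing(table, g=4, target=2, states=(0, 1))
print(post, sample_missing(post))
```

Sampling from the posterior, rather than always taking its most probable value, follows the procedure described above and preserves some of the uncertainty about the missing entry.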
Feature Selection
Following missing value replacement, the third step in the predictive panel building process is to perform feature selection on the dataset; this is perhaps the most important step in the predictor development process. Feature selection serves two purposes: (1) to reduce dimensionality of the data and improve classification accuracy, and (2) to identify biomarkers that are relevant to the cause and consequences of disease and drug response.
A feature selection algorithm (FSA) is a computational solution that, given a set of candidate features, selects a subset of relevant features with the best trade-off between its size and the value of its evaluation measure. However, the relevance of a feature, as seen from the classification perspective, may have several definitions depending on the objective desired. An irrelevant feature is not useful for classification, but not all relevant features are necessarily useful for classification.
Another problem from which many classification methods suffer is the curse of dimensionality. That is, as the number of features in a classification task increases, the time requirements for an algorithm grow dramatically, sometimes exponentially. Therefore, when the set of features in the data is sufficiently large, many classification algorithms are simply intractable. This problem is further exacerbated by the fact that many features in a learning task may either be irrelevant or redundant to other features with respect to predicting the class of an instance. In this context, such features serve no purpose except to increase classification time.
FSAs can be divided into two categories based on whether or not feature selection is done independently of the learning algorithm used to construct the classifier. If the feature selection is independent of the learning algorithm, the technique is said to follow a filter approach. Otherwise, it is said to follow a wrapper approach. While the filter approach is generally computationally more efficient than the wrapper approach, a drawback is that an optimal selection of features may not be independent of the inductive and representational biases of the learning algorithm to be used to construct the classifier.
SFS/SBS
Sequential forward search (SFS) and sequential backward search (SBS) are iterative techniques for feature selection. In these wrapper techniques, one feature at a time is added to (SFS) or deleted from (SBS) a set of pre-selected features, and the process is iterated according to a performance metric until the ‘optimal’ set of features is obtained. For example, SFS as used here starts with all possible two-variable input combinations from the entire data set and then builds, one variable at a time, until an optimally performing combination of variables is identified. For instance, with 9 input variables labeled 1-9 (each with a binary descriptor), the two-variable combinations would comprise 1|2, 1|3, 1|4, 1|5, 1|6, 1|7, 1|8, 1|9, 2|3, 2|4, 2|5, 2|6 . . . 8|9. Each of these input combinations is used to train a classifier on the collected data. The combinations that perform best (for example, the top 10% as evaluated using leave-one-out cross validation) are selected for continued addition of variables. If, say, 2|3 is selected as one of the top performers, it is then coupled to each of the other variables not already included in the combination. This results in 2|3|1, 2|3|4, 2|3|5, 2|3|6, 2|3|7, 2|3|8 and 2|3|9. This coupling is performed for all of the top two-variable performers. The resultant three-variable input combinations are used to train a classifier on the collected data and are then evaluated. The top performers are selected and coupled again with all remaining variables in the group, and again used to train a classifier. This is repeated until a maximal predictive accuracy is achieved. In our experience, there is a well-defined ‘hump’ at the point where the addition of further variables begins to degrade system performance.
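The forward search described above can be sketched as follows. This is only an illustration under stated assumptions: a 1-nearest-neighbor classifier with a Hamming distance stands in for whatever classifier is actually trained, the stopping rule halts as soon as the best score stops improving, and the function names and synthetic data (a label that depends only on variables 0 and 1) are hypothetical.

```python
from itertools import combinations, product

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    hits = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum(X[i][f] != X[j][f] for f in features))
        hits += (y[nearest] == y[i])
    return hits / len(X)

def sfs(X, y, keep_fraction=0.1, max_size=None):
    """Grow feature subsets one variable at a time, keeping the top performers."""
    n_features = len(X[0])
    max_size = max_size or n_features
    candidates = list(combinations(range(n_features), 2))  # all two-variable pairs
    best_subset, best_score = None, -1.0
    while candidates:
        scored = sorted(((loo_accuracy(X, y, c), c) for c in candidates), reverse=True)
        top_score, top_subset = scored[0]
        if top_score <= best_score:        # past the 'hump': no further gain
            break
        best_score, best_subset = top_score, top_subset
        if len(top_subset) >= max_size:
            break
        n_keep = max(1, int(len(scored) * keep_fraction))
        survivors = [c for _, c in scored[:n_keep]]
        # couple each survivor with every variable it does not already contain
        candidates = sorted({tuple(sorted(s + (f,)))
                             for s in survivors
                             for f in range(n_features) if f not in s})
    return best_subset, best_score

# Synthetic example: 4 binary variables, label is the XOR of variables 0 and 1
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [row[0] ^ row[1] for row in X]
subset, score = sfs(X, y, keep_fraction=0.1)
```

Keeping only a fraction of the subsets at each size bounds the number of classifiers trained per round, at the cost of possibly discarding a subset that would have improved later.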
SBS starts with the full set of features and eliminates those based upon a performance metric. Although in theory, going backward from the full set of features may capture interacting features more easily, the drawback of this method is that it is computationally expensive.
An example of this is described in U.S. patent application Ser. No. 09/611,220, incorporated in its entirety with all figures by reference, which uses a variation on the SBS technique. In this method, a Genetic Algorithm (please see section on classifiers) is used in combination with a neural network to create and select child features based upon a fitness ranking that takes into account multiple performance measures such as sensitivity and specificity. Only top-ranked child features are used in iterating the algorithm forward.
SFFS
The SFS algorithm suffers from a so-called nesting effect. That is, once a feature has been chosen, there is no way for it to be discarded. To overcome this problem, the sequential forward floating algorithm (SFFS) was proposed. SFFS is an exponential cost algorithm that operates in a sequential manner. In each selection step SFFS performs a forward step followed by a variable number of backward ones. In essence, a feature is first unconditionally added and then features are removed as long as the generated subsets are the best among their respective size. The algorithm is so-called because it has the characteristic of floating around a potentially good solution of the specified size.
E-RFE
Recursive Feature Elimination (RFE) is a well-known feature selection method for support vector machines (SVMs, please see section on classifiers). As a brief overview, an SVM realizes a classification function
$$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b,$$
where the coefficients α=(αi) and b are obtained by training over a set of examples S={(xi, yi)}, i=1, . . . ,N, with xi ∈ R^n and yi ∈ {−1, 1}, and K(xi, x) is the chosen kernel. In the linear case, the SVM expansion defines the hyperplane
$$f(x) = \langle w, x \rangle + b, \quad \text{with } w = \sum_{i=1}^{N} \alpha_i y_i x_i.$$
The idea is to define the importance of a feature for an SVM in terms of its contribution to a cost function J(α). At each step of the RFE procedure, an SVM is trained on the given data set, J is computed, and the feature contributing least to J is discarded. In the case of a linear SVM, the variation due to the elimination of the i-th feature is $\delta J(i) = w_i^2$; in the nonlinear case, $\delta J(i) = \tfrac{1}{2}\alpha^{T} Z \alpha - \tfrac{1}{2}\alpha^{T} Z(-i)\,\alpha$, where $Z_{i,j} = y_i y_j K(x_i, x_j)$ and Z(−i) is Z computed with the i-th feature removed. The heavy computational cost of RFE is a function of the number of variables, as another SVM must be trained each time a variable is removed. In the standard RFE algorithm we would eliminate just one of the many features corresponding to a minimum weight, while it would be convenient to remove all of them at once. We go further in the instant invention by developing an ad hoc strategy for an elimination process based on the structure of the weight distribution. This strategy was first described by Furlanello (24). We introduce an entropy function H as a measure of the weight distribution. To compute the entropy, we split the range of the weights, normalized to the unit interval, into $n_{int}$ intervals (with $n_{int} = \sqrt{\#R}$, where #R is the number of weights), and we compute for each interval $I_k$ the relative frequencies
$$p_k = \frac{\#\{w_j \in I_k\}}{\#R}, \quad k = 1, \ldots, n_{int}.$$
Entropy is then defined as the following function:
$$H = -\sum_{k=1}^{n_{int}} p_k \log_2 p_k.$$
The following inequality follows immediately from the definition of entropy: $0 \le H \le \log_2 n_{int}$, the two bounds corresponding to the situations in which all the weights fall in a single interval (H = 0) and in which the weights are spread evenly over all the intervals (H = log2 n_int).
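The entropy measure of the weight distribution can be sketched as follows, taking H to be the Shannon entropy (base 2) of the relative frequencies of the binned weights, which is consistent with the bounds 0 and log2 n_int stated above. The min-max normalization to the unit interval, the rounding down of the square root, and the function name are assumptions made for this example; the elimination strategy built on top of H is not shown.

```python
import math

def weight_entropy(weights):
    """Return (H, n_int): base-2 entropy of the binned weights and the bin count."""
    n = len(weights)
    n_int = max(1, int(math.sqrt(n)))      # n_int = sqrt(#R), rounded down (assumption)
    lo, hi = min(weights), max(weights)
    span = hi - lo
    counts = [0] * n_int
    for w in weights:
        u = (w - lo) / span if span > 0 else 0.0   # normalize to the unit interval
        idx = min(int(u * n_int), n_int - 1)       # u == 1 goes in the last bin
        counts[idx] += 1
    probs = [c / n for c in counts if c > 0]       # relative frequency per interval
    return -sum(p * math.log2(p) for p in probs), n_int

# Evenly spread weights reach the upper bound log2(n_int)
h_spread, n_bins = weight_entropy(list(range(16)))
# Identical weights all fall in one interval and reach the lower bound 0
h_peaked, _ = weight_entropy([0.5] * 16)
```

A low H signals that many weights are concentrated in a few intervals, which is the situation in which removing several low-weight features in one step is attractive.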
URG
One filter method especially suited for ordinal data has been developed recently by the authors of the instant invention, and offers clearly interpretable results on such data. The feature selection method, tentatively named URG, or Universal Regressor Gauge, is a general method for scoring and ranking the predictive sensitivity of input variables by fitting the gauge, or the scaling, on each of the input variables subject to both the predictive accuracy of a nonparametric regression and a penalty on the L1 norm of the vector of scaling parameters. The result is a sampled-gradient local minimum solution that does not require assumptions of linearity or exhaustive power-set sampling of subsets of variables. The approach penalizes the gauge θ, or the set of scaling parameters (θ1, θ2, . . . , θn), applied to each of the input variables. The authors of the instant invention generalized this method to potentially nonlinear, nonparametric models of arbitrary complexity using a kernel-based nonparametric regressor. The penalty on the gauge is weighted by a regularization coefficient that is scanned across a range of values to put progressively more downward pressure on the scaling parameters, forcing the scale (and the resulting significance in distance-based regression) downward first on those variables that can be most easily eliminated without sacrificing accuracy. Because this process is analog in the state-space of the gauge, nonlinear interactions between subsets can be investigated in a continuous manner, even if the variables themselves are discrete-valued.
Other FSAs contemplated for use in the instant invention include, but are not limited to, HITON Markov Blankets and Bayesian filters.
Classification
The fourth step in the predictor-building process is classification. In the supervised learning task, one is given a training set of labeled fixed-length feature vectors, from which to induce a classification model. This model, in turn, is used to predict the class label for a set of previously unseen instances. Thus, in building a classification model, the information about the class that is inherent in the features is of utmost importance. The dataset that the classifier is trained upon is generally broken up into three different sets: Training, Testing, and Evaluation. With any classifier, distinct subsets of the available data must be used for training and testing to ensure generalizability. The parameters of the classifier are set with respect to the training data set, judged versus competitors on the testing data set, and validated on the evaluation data set. To avoid over-training (i.e., memorization of features in a specific data set that are not applicable in a general manner), this succession of training steps is discontinued when the error on the validation set begins to increase significantly. We use the error on the evaluation data set as an estimate of how well we can expect our classifier to perform on new testing data as it becomes available. This estimate can be measured by 10× leave-one-out cross-validation on the evaluation set (100× in cases of low sample number), or by batch evaluation on larger data sets.
Classifiers contemplated for the instant invention include, but are not limited to, neural networks, support vector machines, genetic algorithms, kernel-based methods, and tree-based methods.
Neural Networks
One tool used to construct classifiers is the mapping neural network. The flexibility of neural nets to generically model data is derived through a technique of “learning”. Given a list of examples of correct input/output pairs, a neural net is trained by systematically varying its free parameters (weights) to minimize its chi-squared error in modeling the training data set. Once these optimal weights have been determined, the trained net can be used as a model of the training data set. If inputs from the training data are fed to the neural net, the net output will be roughly the correct output contained in the training data. The nonlinear interpolatory ability manifests itself when one feeds the net sets of inputs for which no examples appeared in the training data. A neural net “learns” enough features of the training data set to completely reproduce it (up to a variance inherent to the training data); the trained form of the net acts as a black box that produces outputs based on the training data.
Neural networks typically have a number of ad hoc parameters, such as selection of the number of hidden layers, the number of hidden-layer neurons, parameters associated with the learning or optimization technique used, and in many cases they require a validation set for a stopping criterion. In addition, neural network weights are trained iteratively, producing problems with convergence to local minima. We have developed several types of neural networks that solve these problems. Our solutions involve nonlinearly transforming the input pattern fed into the neural network. This transformation is equivalent to feature selection (though one still needs as many inputs into the classifier) and can be quite powerful when combined with the independent feature selection techniques previously described.
Genetic Algorithms
Genetic algorithms (GAs) typically maintain a constant sized population of individual solutions that represent samples of the space to be searched. Each individual is evaluated on the basis of its overall “fitness” with respect to the given application domain. New individuals (samples of the search space) are produced by selecting high performing individuals to produce “offspring” that retain features of their “parents”. This eventually leads to a population that has improved fitness with respect to the given goal.
New individuals (offspring) for the next generation are formed by using two main genetic operators: crossover and mutation. Crossover operates by randomly selecting a point in the two selected parents' gene structures and exchanging the remaining segments of the parents to create new offspring. Therefore, crossover combines the features of two individuals to create two similar offspring. Mutation operates by randomly changing one or more components of a selected individual. It acts as a population perturbation operator and is a means for inserting new information into the population. This operator prevents any stagnation that might occur during the search process.
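The crossover and mutation operators, together with a constant-sized population, can be sketched as follows. This is an illustrative toy, not the configuration used in the invention: the binary gene encoding, the truncation selection of the top half as parents, the mutation rate, and the example fitness function (the number of ones in the individual) are all assumptions.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two parents at a random point."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(individual, rng, rate=0.05):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in individual]

def evolve(fitness, n_genes, pop_size=20, generations=30, seed=0):
    """Evolve a constant-sized population and return its fittest member."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # high performers survive and breed
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rng))
        pop = parents + children[: pop_size - len(parents)]
    return max(pop, key=fitness)

# Toy fitness: count of ones; in feature selection each gene could flag one marker
best = evolve(fitness=sum, n_genes=12)
```

Retaining the parents in the next generation means the best fitness found so far is never lost, while mutation keeps introducing gene values that crossover alone could not produce.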
GAs have demonstrated substantial improvement over a variety of random and local search methods. This is accomplished by their ability to exploit accumulating information about an initially unknown search space in order to bias subsequent search into promising subspaces. Since GAs are basically a domain independent search technique, they are ideal for applications where domain knowledge and theory is difficult or impossible to provide.
SVMs
The key idea behind support vector machines (SVMs, Vapnik, 1995) is to map input vectors (i.e., patient-specific data) into a high dimensional space, and to construct in that space hyperplanes with a large margin. These hyperplanes can be thought of as boundaries separating the categories of the dataset, in this case response and non-response. The support vector machine solution proposes to find the hyperplane separating the classes. This plane is determined by the parameters of a decision function, which is used for classification. The SVM is based on the fact that there is a unique separating hyperplane that maximizes the margin between the classes.
The task of finding the hyperplane is reduced to minimizing the Lagrangian, a function of the margin and constraints associated with each input vector. The constraints depend only on the dot product of an input element and the solution vector. At the minimum of the Lagrangian, for each input vector either the associated constraint holds with equality or the corresponding Lagrange multiplier is exactly zero. Elements of the training set that have nonzero multipliers, and for which the constraints therefore hold with equality, are the so-called support vectors. The support vectors parameterize the decision function and lie on the boundaries of the margin separating the classes.
In many cases, SVMs are more accurate, give greater data understanding, and are more robust than other machine learning methods. Data understanding comes about because SVMs extract support vectors, which as described above are the borderline cases. Exhibiting such borderline cases allows us to identify outliers, to perform data cleaning, and to detect confounding factors. In addition, the margins of the training examples (how far they are from the decision boundary) provide useful information about the relevance of input variables, and allow the selection of the most predictive variables. SVMs are often successful even with sparse data (few examples), biased data (more examples of one category), redundant data (many similar examples), and heterogeneous data (examples coming from different sources). However, they are known to work poorly on discrete data.
In another preferred embodiment of the present invention, regression techniques are used to deliver a diagnostic or prognostic prediction using the markers declared previously. These are well known to those of ordinary skill in the art; however, a short discussion follows. For more detail, one is referred to Kleinbaum et al., Applied Regression Analysis and Multivariable Methods, Third Edition, Duxbury Press, 1998.
In the discussion of weighted least squares a need was found for a method to fit Y to more than one X. Further, it is common that the response variable Y is related to more than one regressor variable simultaneously. If a valid description of the relationship between Y and any of these regressor variables is to be obtained, all must be considered. Also, exclusion of any important regressor variable will adversely affect predictions of Y. In general, the equation to be considered becomes
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_K X_K$$
The Xs may be any relevant regressor variables. Often one X is a (nonlinear) transformation of another, for example $X_2 = \ln(X_1)$.
When dealing with multiple linear regression, fits to data are no longer lines. For example, with K=2, the resulting fit would describe a plane in three-dimensional space with “slopes” $\hat{b}_1$ and $\hat{b}_2$, intersecting the Y axis at $\hat{b}_0$. Beyond K=2 the resulting fit becomes difficult to visualize. The term regression surface is often used to describe a multiple linear regression fit.
Assumptions required for application of least squares methodology to multiple linear regression equations are similar to those cited for the simple linear case. For example, the true relationship between Y and the various Xs must be as given by the linear equation and the spread of the errors must be constant across values of all Xs. Also, a limit exists to the number of Xs that can be considered. Specifically, K+1 must be less than or equal to the sample size n for a unique set of estimates $\hat{b}_0, \ldots, \hat{b}_K$ to be found.
In theory, least squares estimates of $b_0, \ldots, b_K$ are found just as in the simple linear case. The estimates $\hat{b}_0, \ldots, \hat{b}_K$ are the solution obtained by minimizing
$$\sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_{1i} - \cdots - b_K X_{Ki}\right)^2.$$
The description of the resulting equations and associated summary statistics is best made using matrix algebra. The computations are best carried out using a computer.
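As a compact illustration of the matrix formulation, the least squares estimates can be written as shown below. The symbols X (the n×(K+1) design matrix whose first column is all ones), y (the vector of observed responses) and the estimate vector are notation assumed here for the sketch; they do not appear in the original text.

```latex
\hat{\mathbf{b}} = \arg\min_{\mathbf{b}} \lVert \mathbf{y} - X\mathbf{b} \rVert^{2}
\quad\Longrightarrow\quad
X^{\mathsf{T}} X\, \hat{\mathbf{b}} = X^{\mathsf{T}} \mathbf{y},
\qquad
\hat{\mathbf{b}} = \left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} \mathbf{y}
```

The inverse exists only when the columns of X are linearly independent, which is why K+1 cannot exceed the sample size n if a unique set of estimates is to be found.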
The relationship between Y and X, or between Y and several Xs, is not always linear in form, despite transformations that can be applied to yield a linear relationship. In some instances such a transformation may not exist, and in others theoretical concerns may require analysis to be carried out with the untransformed equation.
Least squares methodology can be used to solve nonlinear regression problems. For example, for a model of the form $W = A\,(1 - e^{Bt})^{C}$, the least squares estimates of the parameters A, B and C would be the solution of the minimization of $\sum \left(W - A\,(1 - e^{Bt})^{C}\right)^{2}$.
Application of calculus leads to three equations whose solution requires an iterative technique. For all but the simplest of cases, solving nonlinear least squares problems involves use of computer-based algorithms. A multitude of such algorithms exist emphasizing the number of problems whose valid solution requires the nonlinear least squares technique.
Several variations of nonlinear regression exist, of which one of ordinary skill in the art will be aware. One preferred case in the present invention is the use of deterministic greedy algorithms for building sparse nonlinear regression models from observational data. In this embodiment, the objective is to develop efficient numerical schemes for reducing the training and runtime complexities of nonlinear regression techniques applied to massive datasets. In the spirit of Natarajan's greedy algorithm (Natarajan, 1995), the procedure is to iteratively minimize a loss function subject to a specified constraint on the degree of sparsity required of the final model or an upper bound on the empirical error. There exist various greedy criteria for basis selection and numerical schemes for improving the robustness and computational efficiency of these algorithms.
In another preferred embodiment of the present invention, a kernel-based method is trained to deliver a diagnostic or prognostic prediction using the markers declared previously. One such method is Kernel Fisher's Discriminant (KFD). Fisher's discriminant (Fisher, 1936) is a technique to find linear functions that are able to discriminate between two or more classes. Fisher's idea was to look for a direction w that separates the class mean values well (when projected onto the found direction) while achieving a small variance around these means. The hope is that it is easy to differentiate between the two classes from this projection with a small error. The quantity measuring the difference between the means is called the between class variance, and the quantity measuring the variance around these class means is called the within class variance. The goal is to find a direction that maximizes the between class variance while minimizing the within class variance at the same time. As this technique has been around for almost 70 years it is well known and widely used to build classifiers.
Unfortunately, as previously discussed, many biological datasets are not solvable using linear techniques. Therefore, one of the classifiers we use is a non-linear variant of Fisher's discriminant. This non-linearization is made possible through the use of kernel functions, a “trick” that is borrowed from support vector machines (Boser et al., 1992). Kernel functions represent a very principled and elegant way of formulating non-linear algorithms, and the findings that are derived from using them have clear and intuitive interpretations.
In the KFD technique (Mika, 1999), one first maps the data into some feature space F through some non-linear mapping Φ. One then computes Fisher's linear discriminant in this feature space, thus implicitly yielding a non-linear discriminant in input space. In a methodology similar to SVMs, this mapping is defined in terms of a kernel function k(x,y)=(Φ(x)·Φ(y)). The training examples (i.e. the data vector containing all marker values for each patient) can in turn be expanded in terms of this kernel function as well. From this relationship one can write a formulation of the between and within class variance in terms of dot products of the kernel function and training patterns and thus find Fisher's linear discriminant in F by maximizing the ratio of these two quantities.
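For the two-class case, the ratio maximized by Fisher's discriminant in the feature space F can be written as shown below. The symbols S_B, S_W, m_c, n_c and α_i are notation assumed here for illustration and are not taken from the original text.

```latex
J(w) = \frac{w^{\mathsf{T}} S_B\, w}{w^{\mathsf{T}} S_W\, w},
\qquad
S_B = (m_1 - m_2)(m_1 - m_2)^{\mathsf{T}},
\qquad
S_W = \sum_{c=1}^{2} \sum_{x \in \mathcal{X}_c} \bigl(\Phi(x) - m_c\bigr)\bigl(\Phi(x) - m_c\bigr)^{\mathsf{T}},
\qquad
m_c = \frac{1}{n_c} \sum_{x \in \mathcal{X}_c} \Phi(x)

w = \sum_{i=1}^{N} \alpha_i\, \Phi(x_i)
\quad\Longrightarrow\quad
w^{\mathsf{T}} \Phi(x) = \sum_{i=1}^{N} \alpha_i\, k(x_i, x)
```

Because w is expanded over the mapped training examples, both the numerator and the denominator reduce to expressions in the kernel values k(x_i, x_j), so the mapping Φ never has to be evaluated explicitly.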
In another preferred embodiment of the present invention, an algorithm using Bayesian learning is trained to deliver a diagnostic or prognostic prediction using the markers declared previously. See Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: networks of plausible inference, Morgan Kaufmann, for an overview of Bayesian learning.
While Bayesian networks (BNs) are powerful tools for knowledge representation and inference under conditions of uncertainty, they were not considered as classifiers until the discovery that Naïve-Bayes, a very simple kind of BN that assumes the attributes are independent given the class node, is surprisingly effective. See Langley, P., Iba, W. and Thompson, K. (1992). An analysis of Bayesian classifiers. In Proceedings of AAAI-92 pp. 223-228.
A Bayesian network B is a directed acyclic graph (DAG), where each node n_i represents a domain variable (i.e., a dataset attribute), and each arc between nodes represents a probabilistic dependency, quantified using a conditional probability distribution (CP table) for each node. A BN can be used to compute the conditional probability of one node, given values assigned to the other nodes; hence, a BN can be used as a classifier that gives the posterior probability distribution of the class node given the values of other attributes. A major advantage of BNs over many other types of predictive models, such as neural networks, is that the Bayesian network structure represents the inter-relationships among the dataset attributes. One of ordinary skill in the art can easily understand the network structures and if necessary modify them to obtain better predictive models. By adding decision nodes and utility nodes, BN models can also be extended to decision networks for decision analysis. See Neapolitan, R. E. (1990), Probabilistic reasoning in expert systems: theory and algorithms, John Wiley & Sons.
Applying Bayesian network techniques to classification involves two sub-tasks: BN learning (training) to get a model and BN inference to classify instances. Learning BN models can be very efficient. As for Bayesian network inference, although it is NP-hard in general (See for instance Cooper, G. F. (1990) Computational complexity of probabilistic inference using Bayesian belief networks, In Artificial Intelligence, 42 (pp. 393-405).), it reduces to simple multiplication in a classification context, when all the values of the dataset attributes are known.
The two major tasks in learning a BN are: learning the graphical structure, and then learning the parameters (CP table entries) for that structure. One skilled in the art knows it is easy to learn the parameters for a given structure that are optimal for a given corpus of complete data, the only step being to use the empirical conditional frequencies from the data.
There are two ways to view a BN, each suggesting a particular approach to learning. First, a BN is a structure that encodes the joint distribution of the attributes. This suggests that the best BN is the one that best fits the data, and leads to the scoring-based learning algorithms, which seek a structure that maximizes the Bayesian, MDL or Kullback-Leibler (KL) entropy scoring function. See for instance Cooper, G. F. and Herskovits, E. (1992). A Bayesian Method for the induction of probabilistic networks from data. Machine Learning, 9 (pp. 309-347). Second, the BN structure encodes a group of conditional independence relationships among the nodes, according to the concept of d-separation. See for instance Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: networks of plausible inference, Morgan Kaufmann. This suggests learning the BN structure by identifying the conditional independence relationships among the nodes. These algorithms are referred to as CI-based algorithms or constraint-based algorithms. See for instance Cheng, J., Bell, D. A. and Liu, W. (1997a). An algorithm for Bayesian belief network construction from data. In Proceedings of AI & STAT'97 (pp. 83-90), Florida.
Friedman et al. (1997) show theoretically that the general scoring-based methods may result in poor classifiers since a good classifier maximizes a different function, viz., classification accuracy. Greiner et al. (1997) reach the same conclusion, albeit via a different analysis. Moreover, the scoring-based methods are often less efficient in practice. The preferred embodiment uses CI-based learning algorithms to learn BN classifiers effectively.
The present invention envisions using, but is not limited to, the following five classes of BN classifiers: Naïve-Bayes, Tree augmented Naïve-Bayes (TANs), Bayesian network augmented Naïve-Bayes (BANs), Bayesian multi-nets and general Bayesian networks (GBNs). By use of this methodology it is possible to build a predictive model of the data.
These models can be put on firm theoretical foundations of statistics and probability theory, i.e. in a Bayesian setting. The computation required for inference in these models includes optimization or marginalization over all free parameters in order to make predictions and evaluations of the model. Inference in all but the very simplest models is not analytically tractable, so approximate techniques such as variational approximations and Markov Chain Monte Carlo may be needed. Models include probabilistic kernel-based models, such as Gaussian Processes, and mixture models based on the Dirichlet Process.
Ensemble Networks
The final step in predictor development is the assembly of committee, or ensemble, networks. It is common practice to train many different candidate networks and then to select the best, on the basis of performance on an independent validation set, for instance, and to keep this network, discarding the rest. There are two disadvantages to this approach. First, the effort involved in training the remaining networks is wasted. Second, the generalization performance on the validation set has a random component due to noise on the data, and so the network that had the best performance on the validation set might not be the one with the best performance on the new test set.
These drawbacks can be overcome by combining the networks together to form a committee. This can lead to significant improvements in the predictions on new data while involving little additional computational effort. In fact, the performance of a committee can be better than the performance of the best single network in isolation. If the errors of the individual members have zero mean and are uncorrelated, the error due to the committee can be shown to be:
$$E_{COM} = \frac{1}{L} E_{AV}$$
where L is the number of committee members and E_AV is the average error contributed to the prediction by a single member of the committee. In practice the member errors are usually correlated, so the reduction is smaller than this; typically, however, some useful reduction in error is obtained, and the method is trivial to implement.
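The relation between the committee error and the average member error can be derived in a few lines for a committee that averages its members' outputs. The per-member error symbol ε_i is notation assumed here, and the derivation relies on the member errors having zero mean and being mutually uncorrelated.

```latex
E_{AV} = \frac{1}{L} \sum_{i=1}^{L} \mathbb{E}\!\left[\epsilon_i^{2}\right],
\qquad
E_{COM} = \mathbb{E}\!\left[\Bigl(\frac{1}{L} \sum_{i=1}^{L} \epsilon_i\Bigr)^{2}\right]
        = \frac{1}{L^{2}} \sum_{i=1}^{L} \sum_{j=1}^{L} \mathbb{E}\!\left[\epsilon_i \epsilon_j\right]

\mathbb{E}\!\left[\epsilon_i \epsilon_j\right] = 0 \ \ (i \neq j)
\quad\Longrightarrow\quad
E_{COM} = \frac{1}{L^{2}} \sum_{i=1}^{L} \mathbb{E}\!\left[\epsilon_i^{2}\right] = \frac{1}{L}\, E_{AV}
```

When the cross terms do not vanish the factor 1/L is not attained, but the committee error still cannot exceed the average member error, since the square of an average is never larger than the average of the squares.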
The challenging problem of integration is to decide which of the classifiers to rely on, or how to combine the results produced by the base classifiers. One of the most popular and simplest techniques used is called majority voting. In the voting technique, each base classifier is considered as an equally weighted vote for that particular prediction. The classification that receives the largest number of votes is selected as the final classification (ties are resolved arbitrarily). Often, weighted voting is used: each vote receives a weight, which is usually proportional to the estimated generalization performance of the corresponding classifier. Weighted Voting (WV) usually works much better than simple majority voting.
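Majority voting and weighted voting can be sketched with a single function, as below. The function name, the class labels, and the example weights (standing in for estimated generalization performance) are hypothetical; ties are broken here by taking the alphabetically first label, which is one arbitrary but reproducible choice.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight; equal weights give majority voting."""
    if weights is None:
        weights = [1.0] * len(predictions)
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    # iterate labels in sorted order so that ties resolve deterministically
    return max(sorted(tally), key=lambda label: tally[label])

votes = ["stroke", "mimic", "mimic"]
majority = weighted_vote(votes)                    # each classifier counts equally
weighted = weighted_vote(votes, [0.9, 0.3, 0.4])   # first classifier is trusted most
```

In this example the two schemes disagree: the two weaker classifiers outvote the stronger one under majority voting, but not once the votes are weighted.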
Boosting Networks
Boosting has been found to be a powerful classification technique with remarkable success on a wide variety of problems, especially in higher dimensions. It aims at producing an accurate combined classifier from a sequence of weak (or base) classifiers, which are fitted to iteratively reweighted versions of the data.
In each boosting iteration, m, the observations that have been misclassified at the previous step have their weights increased, whereas the weights are decreased for those that were classified correctly. The mth weak classifier f(m) is thus forced to focus more on individuals that have been difficult to classify correctly at earlier iterations. In other words, the data is re-sampled adaptively so that the weights in the re-sampling are increased for those cases most often misclassified. The combined classifier is equivalent to a weighted majority vote of the weak classifiers.
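The reweighting scheme described above can be sketched with discrete AdaBoost, one well-known boosting algorithm; the text does not state which boosting variant is intended, so this choice is an assumption. Single-feature threshold rules (decision stumps) are used as the weak classifiers, labels are coded as +1 and -1, and all function names and the toy data are hypothetical.

```python
import math

def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] >= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] >= thr else -sign

def adaboost(X, y, rounds=3):
    """Fit weak classifiers to iteratively reweighted versions of the data."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-12), 1 - 1e-12)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)      # vote weight of this stump
        ensemble.append((alpha, stump))
        # misclassified cases are up-weighted, correctly classified ones down-weighted
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted majority vote of the weak classifiers."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold can separate
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
fitted = [predict(ensemble, row) for row in X]
```

No single stump classifies this data set correctly, but the weighted vote of three stumps does, because each later stump concentrates on the cases its predecessors got wrong.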
Entropy-Based
One efficient way to construct an ensemble of diverse classifiers is to use different feature subsets. To be effective, an ensemble should consist of high-accuracy classifiers that disagree on their predictions. To measure the disagreement of a base classifier and the whole ensemble, we calculate the diversity of the base classifier over the instances of the validation set as an average difference in classifications of all possible pairs of classifiers including the given one. A measure of this is based on the concept of entropy:
where N is the number of instances in the data set, S is the number of base classifiers, I is the number of classes, and N_k^i is the number of base classifiers that assign instance i to class k.
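The entropy expression itself is not reproduced in this text. Purely as an assumption, one measure consistent with the definitions above is the average, over the N instances, of the sum over classes k of -(N_k^i / S) log(N_k^i / S). The sketch below computes that quantity; the function name `entropy_diversity` and the layout of its input are hypothetical.

```python
import math
from collections import Counter


def entropy_diversity(predictions):
    """Entropy-based diversity of an ensemble (assumed form, see lead-in).

    predictions[s][i] is the class assigned by base classifier s to
    instance i of the validation set.
    """
    S = len(predictions)        # number of base classifiers
    N = len(predictions[0])     # number of instances
    total = 0.0
    for i in range(N):
        # counts[k] plays the role of N_k^i: how many classifiers chose k.
        counts = Counter(predictions[s][i] for s in range(S))
        for n_ki in counts.values():
            p = n_ki / S
            # Classes that received no votes contribute nothing to the sum.
            total -= p * math.log(p)
    return total / N
```

The value is zero when all base classifiers agree on every instance and grows as their votes spread more evenly across classes.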
In the following, the invention will be explained in further detail with reference to the drawings, in which:
In accordance with the present invention, there are provided methods and apparatus for the identification and use of a panel of markers for the diagnosis of cardiovascular illness, particularly stroke and its sub-types.
Method for Defining Panels of Markers
In practice, data may be obtained from a group of subjects. The subjects may be patients who have been tested for the presence or level of certain markers. Such markers and methods of patient extraction are well known to those skilled in the art. A particular set of markers may be relevant to a particular condition or disease. The method is not dependent on the actual markers. The markers discussed in this document are included only for illustration and are not intended to limit the scope of the invention. Examples of such markers and panels of markers are described in the instant invention and the incorporated references.
Well-known to one of ordinary skill in the art is the collection of patient samples. A preferred embodiment of the instant invention is that the samples come from two or more different sets of patients, one a disease group of interest and the other(s) a control group, which may be healthy or diseased in a different indication than the disease group of interest. For instance, one might want to look at the difference in blood-borne markers between patients who have had stroke and those who had stroke mimic to differentiate between the two populations.
The blood samples are assayed, and the resulting set of values is put into a database, along with outcome (also called phenotype) information detailing the illness type, for instance stroke mimic, once this is known. Additional clinical details, such as time from onset of symptoms and patient physiological, medical, and demographic information, collectively called patient characteristics, are also put into the database. The time from onset is important to know because marker values can change significantly after onset of symptoms, on a timeframe of tens of minutes. Thus, a marker may be significant at one point in the patient history and not at another in predicting diagnosis or prognosis of cardiovascular disease, damage or injury. The database can be as simple as a spreadsheet, i.e. a two-dimensional table of values, with rows being patients and columns being filled with patient marker and other characteristic values.
From this database, a computerized algorithm can first perform pre-processing of the data values. This involves normalization of the values across the dataset and/or transformation into a different representation for further processing. The dataset is then analyzed for missing values. Missing values are either replaced using an imputation algorithm, in a preferred embodiment using KNN or MVC algorithms, or the patient attached to the missing value is excised from the database. If greater than 50% of the other patients have the same missing value, then that value can be ignored.
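For illustration, the missing-value handling described above might be sketched as follows, with each patient stored as a list of marker values and `None` marking a missing value. The function names, the default of two neighbours, and the distance (root-mean-square difference over the markers both patients have) are assumptions of this sketch rather than details of the described method.

```python
import math


def drop_sparse_markers(rows, max_missing=0.5):
    """Remove markers (columns) missing in more than max_missing of patients.

    Returns the reduced table and the indices of the columns that were kept.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    keep = [c for c in range(n_cols)
            if sum(r[c] is None for r in rows) / n_rows <= max_missing]
    return [[r[c] for c in keep] for r in rows], keep


def knn_impute(rows, k=2):
    """Fill each None with the mean of that marker over the k nearest patients."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no marker in common: treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors are other patients that have this marker.
            donors = sorted(
                (distance(row, other), other[c])
                for j, other in enumerate(rows)
                if j != i and other[c] is not None
            )[:k]
            if donors:
                filled[i][c] = sum(v for _, v in donors) / len(donors)
    return filled
```

Because the distance mixes markers measured on different scales, such a routine would normally be applied after the normalization step.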
Once all missing values have been accounted for, the dataset is split up into three parts: a training set comprising 33-80% of the patients and their associated values, a testing set comprising 10-50% of the patients and their associated values, and a validation set comprising 1-50% of the patients and their associated values. These datasets can be further sub-divided or combined according to algorithmic accuracy. A feature selection algorithm is applied to the training dataset. This feature selection algorithm selects the most relevant marker values and/or patient characteristics. Preferred feature selection algorithms include, but are not limited to, Forward or Backward Floating, SVMs, Markov Blankets, Tree Based Methods with node discarding, Genetic Algorithms, Regression-based methods, kernel-based methods, and filter-based methods.
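A minimal sketch of the three-way split is given below. The 60/20/20 proportions are one choice within the ranges stated above, and the function name `split_patients` and the fixed random seed are assumptions made for reproducibility of the example.

```python
import random


def split_patients(patient_ids, train=0.6, test=0.2, seed=0):
    """Shuffle patients and split them into training, testing and validation sets.

    The validation set receives whatever remains after the training and
    testing fractions have been taken.
    """
    if not (0 < train < 1 and 0 < test < 1 and train + test < 1):
        raise ValueError("fractions must be positive and leave room for validation")
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(train * len(ids)))
    n_test = int(round(test * len(ids)))
    return (ids[:n_train],
            ids[n_train:n_train + n_test],
            ids[n_train + n_test:])
```

Splitting by patient rather than by sample keeps all values from one patient in a single set, so the validation set remains unseen during feature selection and classifier training.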
Feature selection is done in a cross-validated fashion, preferably in a naïve or k-fold fashion, so as not to induce bias in the results, and is tested with the testing dataset. Cross-validation is one of several approaches to estimating how well the features selected from some training data are going to perform on future as-yet-unseen data and is well-known to the skilled artisan. Cross-validation is a model evaluation method that is better than evaluation by residuals. The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen. One way to overcome this problem is to not use the entire data set when training a learner. Some of the data is removed before training begins. Then when training is done, the data that was removed can be used to test the performance of the learned model on “new” data.
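The hold-out idea extends to k-fold cross-validation, in which each patient is held out exactly once. The sketch below is illustrative only; `fit` and `score` are hypothetical caller-supplied functions standing in for whichever learner and metric are used.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, held_out_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Every k-th sample goes to the same fold, so folds are disjoint
    # and differ in size by at most one.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        held_out = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, held_out


def cross_validate(n_samples, fit, score, k=10):
    """Average the held-out score of a model refit on each training fold."""
    scores = []
    for train, held_out in k_fold_indices(n_samples, k):
        model = fit(train)                 # sees only the training indices
        scores.append(score(model, held_out))
    return sum(scores) / len(scores)
```

In practice the indices would typically be shuffled, or stratified by outcome, before being assigned to folds.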
Once the algorithm has returned a list of selected markers, one can optimize these selected markers by applying a classifier to the training dataset to predict clinical outcome. A cost function that the classifier optimizes is specified according to the outcome desired, for instance an area under the receiver-operator curve maximizing the product of sensitivity and specificity of the selected markers, or positive or negative predictive accuracy. Testing of the classifier is done on the testing dataset in a cross-validated fashion, preferably naïve or k-fold cross-validation. Further detail is given in U.S. patent application Ser. No. 09/611,220, incorporated by reference. Classifiers map input variables, in this case patient marker values, to outcomes of interest, for instance, prediction of stroke sub-type. Preferred classifiers include, but are not limited to, neural networks, Decision Trees, genetic algorithms, SVMs, Regression Trees, Cascade Correlation, Group Method Data Handling (GMDH), Multivariate Adaptive Regression Splines (MARS), Multilinear Interpolation, Radial Basis Functions, Robust Regression, Cascade Correlation+Projection Pursuit, linear regression, Non-linear regression, Polynomial Regression, Bayes classifiers and networks, Markov Models, and Kernel Methods.
The classification model is then optimized by, for instance, combining the model with other models in an ensemble fashion. Preferred methods for classifier optimization include, but are not limited to, boosting, bagging, entropy-based, and voting networks. This classifier is now known as the final predictive model. The predictive model is tested on the validation data set, which was not used in either feature selection or classification, to obtain an estimate of performance in a similar population.
The predictive model can be translated into a decision tree format for subdividing the patient population and making the decision output of the model easy to understand for the clinician. The marker input values might include a time since symptom onset value and/or a threshold value. Using these marker inputs, the predictive model delivers a diagnostic or prognostic output value along with an associated error. The instant invention anticipates a kit comprised of reagents, devices and instructions for performing the assays, and a computer software program comprised of the predictive model that interprets the assay values when entered into the predictive model run on a computer. The predictive model receives the marker values via the computer that it resides upon.
Once patients are exhibiting symptoms of cardiovascular illness, for instance stroke, a blood sample is drawn from the patient using standard techniques well known to those of ordinary skill in the art and assayed for various blood-borne markers of cardiovascular illness. Assays can be performed through immunoassays or through any of the other techniques well known to the skilled artisan. In a preferred embodiment, the assay is in a format that permits multiple markers to be tested from one sample, such as the Luminex™ platform, and/or in a rapid fashion, defined to be under 30 minutes and, in the most preferred enablement of the instant invention, under 15 minutes. The values of the markers in the samples are input into the trained, tested, and validated algorithm residing on a computer. The computer outputs the result of the algorithm calculations in numerical form, a probability estimate of the clinical diagnosis of the patient, to the user on a display and/or in printed format on paper, and/or transmits the information to another display source. There is an error given to the probability estimate; in a preferred embodiment this error level is a confidence level. The medical worker can then use this diagnosis to help guide treatment of the patient.
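As one hedged example of such a cost function, the product of sensitivity and specificity can be computed from predicted and known outcomes and used to choose a decision cut-off on a classifier's output score. The function names and the 0/1 coding of outcomes are assumptions of this sketch.

```python
def sensitivity_specificity(labels, predictions, positive=1):
    """Return (sensitivity, specificity) for predicted versus known outcomes."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    tn = sum(1 for y, p in pairs if y != positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def best_threshold(labels, scores):
    """Pick the score cut-off that maximizes sensitivity * specificity.

    labels are coded 0/1; a patient is called positive when score >= cut-off.
    Returns (cut_off, product). The lowest cut-off wins when products tie.
    """
    best = None
    for cut in sorted(set(scores)):
        preds = [1 if s >= cut else 0 for s in scores]
        sens, spec = sensitivity_specificity(labels, preds)
        if best is None or sens * spec > best[0]:
            best = (sens * spec, cut)
    return best[1], best[0]
```

The product penalizes a classifier that achieves high sensitivity only by sacrificing specificity, or the reverse, which a plain accuracy figure can hide when the classes are unbalanced.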
Selection of Markers for Detecting Stroke and its Sub-Types
Samples from healthy patients, patients diagnosed with stroke and further diagnosed with a stroke sub-type and patients diagnosed with stroke mimic were assayed for a variety of markers. The goal of this investigation was to discover biological markers that would allow a clinician to determine whether a patient had a stroke and, if so, to perform a differential diagnosis. The research conducted was characterized by three aims.
To achieve the first aim of computing high performance predictive models, a series of models were developed for each selected target clinical classification using various combinations of dataset compositions, data preparation methods, and classification methods. The best of these approaches for each target diagnosis classification problem and dataset composition were then compared and the best dataset preparation method for each selected target clinical classification was used for further feature selection.
The following defines the classification problem and describes the predictive modeling methods used for solving these classification problems. The target classification diagnosis, dataset compositions, data preparation, classification method, and feature selection method details are described herein.
Target Classification Diagnoses
Keeping in context with the goal of a stroke diagnostic and a differential diagnostic, the following target classifications were sought.
1) Stroke Diagnoses
Dataset Composition
Several different predictor dataset compositions were considered to provide maximum feature scope for modeling the different clinical classification targets.
Details of the different dataset compositions are given
The compositions of the different data sets are given in
Data Preparation Methods
A total of five imputation and four normalization methods were explored. Each of these methods is described in
Classification Methods
The classification methods employed were based on a variety of different theories to ensure maximum breadth of predictive modeling. The classification methods are described in terms of learner, learner tuning parameters, and metrics.
Learners
We employed the following methods to investigate the classification sets.
Details about parameter ranges and their optimization can be found in the section on Learner Tuning-Parameters.
Learner Tuning-Parameters
Support Vector Machine Parameters
Metrics
We used the area under the ROC curve (approximated via the trapezoidal rule) as the performance metric. For the first-pass scoring, we used 10-fold cross-validation of single-pass ROC curve construction. For the evaluation of the final selected models, we used 10-fold cross-validation of 50-fold bootstrapped ROC curve construction with smoothing.
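The trapezoidal approximation of the area under the ROC curve can be sketched as follows. The sketch covers only single-pass curve construction; the bootstrapping and smoothing mentioned above are not shown, and the function names and 0/1 outcome coding are assumptions.

```python
def roc_points(labels, scores):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    labels are coded 0/1; a higher score means "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step, from strictest to most permissive.
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For example, with outcomes `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ranked correctly and the computed area is 0.75.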
Feature Selection Methods
Two main families of feature selection methods were used: the first is Markov Blanket techniques and the second is SVM-based feature selection. Performance can be found in
Experimental Design
Initial Investigation Using all Features
All combinations of data preparation methods were employed without feature selection for all target classification diagnosis and dataset compositions utilizing SVM learners. The summary of imputation and normalization methods is given in
We used a nested 10-fold cross-validation design with the outer loop estimating the performance of the best models in the inner loop, and the inner loop was used to optimize parameters for each learner.
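A nested design of this kind can be sketched as below. The helper names, the systematic assignment of samples to folds, and the caller-supplied `fit` and `score` functions are hypothetical and stand in for the actual learners and metric.

```python
def folds(indices, k):
    """Split a list of indices into k disjoint folds."""
    return [indices[f::k] for f in range(k)]


def nested_cv(n_samples, param_grid, fit, score, outer_k=10, inner_k=10):
    """Outer loop estimates performance; inner loop picks the tuning parameter."""
    outer = folds(list(range(n_samples)), outer_k)
    outer_scores = []
    for f, held_out in enumerate(outer):
        train = [i for g, fold in enumerate(outer) if g != f for i in fold]
        inner = folds(train, inner_k)

        def inner_score(param):
            # Mean inner-fold score of one candidate parameter value,
            # computed without touching the outer held-out fold.
            total = 0.0
            for h, inner_val in enumerate(inner):
                inner_train = [i for g, fold in enumerate(inner)
                               if g != h for i in fold]
                total += score(fit(inner_train, param), inner_val)
            return total / len(inner)

        best_param = max(param_grid, key=inner_score)
        # Refit on the whole outer training fold with the chosen parameter,
        # then score once on data never seen during tuning.
        outer_scores.append(score(fit(train, best_param), held_out))
    return sum(outer_scores) / len(outer_scores)
```

Because the outer held-out fold plays no part in choosing the parameter, the averaged outer score is not inflated by the tuning step.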
Feature Selection
The best imputation and normalization per target classification-dataset combination was identified and used for feature selection analysis subsequently. The preliminary analyses results used to identify optimal combinations of imputation and normalization are described in
Again, we used a nested 10-fold cross-validation design with the outer loop estimating the performance of the best models in the inner loop, and the inner loop was used to optimize parameters for each learner.
Results
The summary of performance estimation (via cross-validation) of the final predictive models can be found in FIGS. 8-23: Performance with feature selection.
Discussion
Summary of Results—Main Observations
(a) The analyses produced support an optimistic view of the discernability of stroke and its variants using classifiers built from datasets such as the ones used in the present analysis. For all tasks and datasets, classification performances of the best classifiers almost always exceed 0.8 and often exceed 0.9. The Ischemic (Non-TIA)—Hemorrhagic task appears to be the most challenging classification target of all four but yielded the best result.
(b) Inclusion of clinical and demographic variables uniformly increases classification performance. This leads us to believe that the results obtained with the proteomic markers alone are not a by-product of biased collection of such markers among clinical strata of the population with increased or decreased stroke prevalence. In other words, the good results obtained do not appear to be the results of inadequate pre-analytic control.
(c) Application of feature selection methods does not substantially degrade classification performance (in some cases performance is increased), while the size of the necessary predictor set ranges from 50% to approximately 10% of the original, depending on the task and method used. The typical pattern is that SVM-based feature selection achieves the best classification performance, while Markov Blanket methods provide the largest reduction in predictor set size with small losses in classification performance relative to SVM-based methods or the full predictor set.
(d) We emphasize that in case that feature selection results are used to identify biomarkers suitable for drug development, the inductive biases of SVM-feature selection and Markov Blanket feature selection are inherently different and should be interpreted differently in guiding drug-development or other biological experimentation.
(e) Using a specificity-sensitivity optimized ROC AUC as a way to judge performance, it was found that a four- or five-marker panel was the best differentiator of ischemic stroke versus hemorrhagic stroke with an ROC AUC value of 0.95, a four-, five- or seven-marker panel was the best differentiator of ischemic stroke versus stroke mimic with an ROC AUC value of 0.91, a five- or seven-marker panel was the best differentiator of stroke versus stroke mimic with an ROC AUC value of 0.93, a five- or six-marker panel was the best differentiator of stroke versus non-stroke with an ROC AUC value of 0.93, and a three- or four-marker panel was the best differentiator of ischemic stroke versus normal patients of similar age with an ROC AUC value of 0.99. All results were from naïve data.
Hypertension Background
Hypertension is the presence of elevated pressure within the heart and blood vessels that places the patient at increased risk for damage to a number of organs. The risk of complications, such as heart failure, heart attack, kidney failure, blindness, stroke, and death increases as the pressure rises and as tissue is damaged.
It is estimated that as many as 50 million Americans aged 6 and older suffer from hypertension, which led to the deaths of 42,565 Americans and contributed to the deaths of about 210,000 in 1997. The total cost to the U.S. economy is estimated to be $19 billion a year as of 1996, growing at a rate of 10% a year (see for instance http://www.niddk.nih.gov/health/nutrit/pubs/statobes.htm).
Of the 23.4 million Americans who take anti-hypertensive medication, only 42.9% of these patients are able to control their blood pressure (see for instance Burt V L, Cutler J A, Higgins M, et al: Trends in the prevalence, awareness, treatment, and control of hypertension in the adult US population. Data from the Health Examination Surveys, 1960 to 1991. Hypertension 1995;26:60-69.).
This failure to control blood pressure costs $964 million annually in general and $467 million among people who are actually being treated for high blood pressure, from a 2000 study. These incremental cost estimates are, in all likelihood, on the low side, as no cost was assigned to death from uncontrolled hypertension (see for instance http://www.heartinfo.org/reuters2000/00519elin028.htm).
One of the problems is that the clinical effectiveness of most anti-hypertensive drugs is only in the 40-55% range when used alone. Of those that respond, approximately two-thirds require the highest recommended dose to achieve control (see for instance Materson B J, Reda D J, Cushman W C, et al, for the Department of Veterans Affairs Cooperative Study Group on Antihypertensive Agents: Single drug therapy for hypertension in men. A comparison of six antihypertensive drugs with placebo. N Engl J Med, 1993;328:914-921.).
Another study that analyzed data on the efficacy of specific drugs in individual patients concluded that 10-59% of patients failed to respond to diuretics, 12-86% failed to respond to β-blockers, some patients exhibited heterogeneous responses to ACE inhibitors and calcium antagonists, and a small percentage of patients even showed an increase in blood pressure (see for instance Neutel J M, Rolf C N, Valentine S N, Li J, Lucus C, Marmorstein B L: Low-dose combination therapy as first line treatment of mild-to-moderate hypertension: the efficacy and safety of bisoprolol/HCTZ versus amlodipine, enalapril, and placebo. Cardiovasc Rev Rep 1996;71:33-45.). The variation in the individual response to anti-hypertensive drugs may be due to the heterogeneity of the mechanisms underlying hypertension, inter-individual variations of the pharmacokinetics of the drugs, or both.
In addition, much of the poor patient compliance that leads to treatment failure can be blamed on the side effects caused by anti-hypertensive medication. This factor must be dealt with to achieve blood pressure control.
Most side effects with anti-hypertensive agents are dose dependent. Using smaller doses of various drugs limits dose-dependent side effects. The combination of two complementary agents improves the response rate because more than one physiologic pathway is interrupted, leading to synergistic effects to improve efficacy and avoid the adverse drug reactions associated with higher doses of individual monotherapies. Ideally, one therapy would offset the potential adverse events of the other. Such combinations must address the various factors underlying hypertension in different individuals, including blood volume, vasoconstriction, and the impact of the sympathetic nervous system and the renin-angiotensin system.
Anti-hypertension medications are a primary method for treatment of hypertension. Prescription of anti-hypertensive medication, however, is inexact. Not all patients receiving an anti-hypertensive medication will respond to that treatment. Others may respond, but with serious side effects. The period required to determine the efficacy of treatment response can be both costly and lengthy. Thus a method for rapid identification of appropriate treatment for patients is needed.
Research has indicated that characteristics such as age, gender, ethnicity, weight, diagnosis, and diet affect both the pharmacokinetics and pharmacodynamics of hypertension medication (see for instance Williams B., Kim, J., Cardiovascular drug therapy in the elderly: theoretical and practical considerations. Drugs Aging. 2003; 20(6):445-63.; Ethn. Dis. 1998 Winter; 8(1):98-102. Calcium antagonists—pharmacologic considerations. Prisant L M.; Wassertheil-Smoller, S; Anderson, G; Psaty, B; Black, H; et al. Hypertension and Its Treatment in Postmenopausal Women: Baseline Data from the Women's Health Initiative, Hypertension. 2000 36:780; The rationale and design of the AASK cohort study. Appel L J et al. J Am Soc Nephrol. 2003 July; 14(7 Suppl 2):S166-72; Population analyses of sustained-release verapamil in patients: effects of sex, race, and smoking. Kang D, Verotta D, Krecic-Shepard M E, Modi N B, Gupta S K, Schwartz J B. Clin. Pharmacol. Ther. 2003 January; 73(1):31-40.). However, no method currently exists for incorporating these variables into a predictive algorithm for prescribing medication.
Recently, attention has focused on the identification of Single Nucleotide Polymorphisms, (hereafter SNPs) as factors that specifically influence drug action or act as markers for alleles of genes that influence drug action in hypertension (see for instance Sethi A A, Nordestgaard B G, Tybjaerg-Hansen A. Angiotensinogen gene polymorphism, plasma angiotensinogen, and risk of hypertension and ischemic heart disease: a meta-analysis. Arterioscler Thromb Vasc Biol. 2003 Jul. 1;23(7):1269-75. Epub 2003 Jun. 12.; Bengtsson K, Melander O, Orho-Melander M, Lindblad U, Ranstam J, Rastam L, Groop L. Polymorphism in the beta(1)-adrenergic receptor gene and hypertension. Circulation. 2001 Jul. 10; 104(2):187-90.). However, due to reasons described below, these single SNP variants have been shown to have little or no clinically acceptable and/or statistically significant effect by themselves.
As an independent variable, either a SNP or a patient characteristic is unlikely, itself, to indicate a responder phenotype with acceptable confidence, as a direct causal effect on phenotype is rare. However, understanding the complex interactions that result in a response phenotype for more than a small number of variables is not realistic without comprehensive analysis technology. This patent will show how to use analysis algorithms that have the ability to extract meaningful information from complex interactions occurring between multiple variables.
In recent years, the search for a single gene responsible for hypertension has given way to the understanding that multiple gene variants, acting together with yet unknown environmental risk factors or developmental events, interact in a complex system to account for its expression phenotype. Accordingly, treatments that successfully alleviate hypertension symptoms are likely to act on multiple gene products, and prediction of prognosis or treatment outcome will likewise depend on multiple gene products.
To date, SNPs and various proteins have not been used in combination as markers of hypertension. The current state of the art is single-marker tests which have little or no predictive value in hypertension response over large populations. Myriad Genetics Incorporated, of Salt Lake City, Utah has in the past offered a test for the M235T gene variant in relation to cardiovascular prognosis, including ACE inhibitor response. However, this has not been shown to be of relevance for hypertension in recent studies (see for instance the 1000-plus patient studies Poch E. et al., Genetic polymorphisms of the renin-angiotensin system and essential hypertension, Med Clin (Barc). 2002 Apr. 27; 118(15):575-9.; and Matsubara M et al., T+31C polymorphism (M235T) of the angiotensinogen gene and home blood pressure in the Japanese general population: the Ohasama Study. Hypertens Res. 2003 January; 26(1):47-52.). In addition, Myriad Genetics Incorporated has applied for a patent (U.S. Patent Office Ser. No. 10/331,192 filed Dec. 27, 2002) on relating A145G genetic variation in the human beta-1 adrenergic receptor gene to predicting human hypertension medication response. This as well has been shown (see for instance Bengtsson K, Melander O, Orho-Melander M, Lindblad U, Ranstam J, Rastam L, Groop L. Polymorphism in the beta(1)-adrenergic receptor gene and hypertension. Circulation. 2001 Jul. 10; 104(2):187-90.) to have little or no direct effect on hypertension response. It should now be clear that to diagnose or determine treatment outcome in a complex disease such as hypertension or cardiovascular disease, it is necessary to use multi-factorial genetic and/or proteomic markers, optionally together with environmental and physiological variables, in combination. The present invention provides methods for doing exactly this.
Preferred markers of the invention can aid in the treatment, diagnosis, differentiation, and prognosis of patients with hypertension, cardiovascular disease, and stroke.
Assessing Patient Response To Hypertension Treatment
Responder/non-responder phenotypes of treatment efficacy are determined quantitatively by blood pressure, which first became easy to measure in 1896 when the Italian physician, Riva Rocci, developed what we would now recognize as a conventional mercury sphygmomanometer with a cuff around the arm, which was inflated until the pulsation of the artery could no longer be felt. This gave a very accurate measurement of systolic pressure, although it was subsequently found that it was more accurate if a wider cuff was used. In 1904 Nicolai Korotkoff, a Russian army surgeon, realised that by listening with a stethoscope below the cuff over the artery at the elbow, characteristic sounds were heard at the systolic pressure, but also importantly at the lower pressure (diastolic) when the heart relaxes. It then became very easy to measure both systolic and diastolic pressure accurately with a stethoscope.
Although definitions of hypertension in terms of quantitative measurements of systolic and diastolic blood pressure are continually being modified, usually downwards, the definitions according to the Seventh Joint National Committee on Hypertension (JNC-VII, http://www.nhlbi.nih.gov/quidelines/hypertension; incorporated by reference) are given in Table 1.
DBP, diastolic blood pressure;
SBP, systolic blood pressure.
Drug abbreviations: ACEI, angiotensin converting enzyme inhibitor; ARB, angiotensin receptor blocker; BB, beta-blocker; CCB, calcium channel blocker.
*Treatment determined by highest BP category.
†Initial combined therapy should be used cautiously in those at risk for orthostatic hypotension.
‡Treat patients with chronic kidney disease or diabetes to BP goal of <130/80 mmHg.
Classification of Patient Response/Non-Response
While the goal of all hypertension therapy is to achieve normotensive status in the patient (normal blood pressure for an adult is around 120/80 mmHg), sometimes this is not achievable, and a person skilled in the art of treating hypertension will recognize that a patient can still have a ‘response’ to a medication without achieving normotensive status.
Current diagnostic methods for hypertension treatment are basically trial-and-error. A person is given a medication at usually a low dosage, then titrated upwards in dosage over a period of weeks or months. After several months, the person is evaluated again by a physician to determine if the person's hypertension level has changed and/or an adverse event is registered. If it has not changed enough in a positive direction to suit the patient and/or physician, the person is gradually titrated downwards on the first drug and the process repeats itself with another medication. It is not uncommon for a patient to repeat this process over a period of years, all the while suffering physically, emotionally, and financially.
Accordingly, there is a present need in the art for a rapid, sensitive and specific diagnostic assay for hypertension treatment that can differentiate the type of medication and also identify those individuals at risk for adverse events. Such a diagnostic assay would greatly increase the number of patients that can receive beneficial treatment and therapy, and reduce the costs associated with incorrect therapy.
Another preferred application of the instant invention relates to the identification and use of diagnostic and/or prognostic markers for anti-hypertensives, ACE Inhibitors, and/or the anti-hypertensives captopril, benazepril, enalapril, enalaprilat, fosinopril, lisinopril, quinapril, ramipril, and trandolapril. The methods and compositions described herein can meet the need in the art for a rapid, sensitive and specific diagnostic assay to be used to facilitate the treatment of hypertension patients and the development of additional diagnostic indicators. Moreover, the methods and compositions of the instant invention can also be used in diagnosis, differentiation and prognosis of various forms of cardiovascular disorders as well as cardiovascular drug discovery.
In yet another aspect, the instant invention features methods of diagnosing hypertension by analyzing a test sample obtained from a patient for the presence or amount of one or more SNPs associated with genes in the absorption, distribution, receptor or effector biochemical pathways of anti-hypertension medications. These methods can include identifying one or more SNPs, the presence or amount of which is associated with the treatment, diagnosis, prognosis, or differentiation of hypertension. Once such SNP(s) are identified, the pattern of such SNPs in a patient sample can be measured. In certain embodiments, these markers can be compared to a diagnostic level determined by an algorithm that is associated with the treatment, diagnosis, prognosis, or differentiation of hypertension. By correlating the patient pattern to the diagnostic pattern, the presence or absence of hypertension, and the probability of treatment outcomes in a patient may be rapidly and accurately determined.
For purposes of the following discussion, the methods described as applicable to the treatment outcome and diagnosis of hypertension treatment generally may be considered applicable to the treatment outcome and diagnosis of cardiac failure and other cardiovascular diseases such as stroke and atherosclerosis.
In certain embodiments, a plurality of SNPs are combined to increase the predictive value of the analysis in comparison to that obtained from the markers individually or in smaller groups. Preferably, one or more specific markers for hypertension treatment can be combined with one or more non-specific markers for hypertension treatment to enhance the predictive value of the described methods.
In certain embodiments, a diagnostic or prognostic indicator is correlated to a condition or disease by merely its presence or absence. In other embodiments, an algorithm is needed to relate the pattern of markers to a desired prediction outcome in the patient. A preferred algorithmic technique for relating markers of the present invention is a linear regression technique, a nonlinear regression technique, an ANOVA technique, a neural network technique, a genetic algorithm technique, a support vector machine technique, a greedy algorithm technique, a tree algorithm technique, a kernel-based technique, or a Bayesian technique. The skilled artisan will recognize that the word “technique” refers to a process in which a predictor is built by using patient exemplar pairs of markers and phenotypes, and then refining such predictor algorithm in an iterative process by testing a version of the algorithm on unseen data and making changes to mathematical coefficients of such algorithm in such a way as to increase the accuracy and specificity of the predictor algorithm.
In other embodiments, the invention relates to methods for determining a treatment regimen for use in a patient diagnosed with hypertension, particularly for ACE Inhibitors. The methods preferably comprise determining a level of one or more diagnostic or prognostic markers as described herein, and using the markers to determine a diagnosis for a patient. One or more treatment regimens that improve the patient's prognosis by reducing the increased disposition for an adverse outcome associated with the diagnosis can then be used to treat the patient. Such methods may also be used to screen pharmacological compounds for agents capable of improving the patient's prognosis as above.
In yet another embodiment, multiple determination of one or more diagnostic or prognostic markers can be made, and a temporal change in the marker can be used to monitor the efficacy of appropriate therapies. In such an embodiment, one might expect to see a decrease or an increase in the marker(s) over time during the course of effective therapy.
In yet other embodiments, multiple determination of one or more diagnostic or prognostic markers can be made, and a temporal change in the marker can be used to determine a diagnosis or prognosis. For example, a diagnostic indicator may be determined at an initial time, and again at a second time. In such embodiments, an increase in the marker from the initial time to the second time may be diagnostic of a particular type of hypertension, such as treatment-resistant hypertension, or a given prognosis. Likewise, a decrease in the marker from the initial time to the second time may be indicative of a particular type of hypertension, or a given prognosis. Furthermore, the degree of change of one or more markers may be related to the severity of the disease and future adverse events.
In a further aspect, the invention relates to kits for determining the diagnosis or prognosis of a patient. These kits preferably comprise devices and reagents for measuring one or more SNP patterns or marker levels in a patient sample, and instructions for performing the assay. Optionally, the kits may contain one or more means for converting SNP patterns or marker level(s) to a prognosis. Such kits preferably contain sufficient reagents to perform one or more such determinations.
In accordance with the instant invention, there are provided methods and compositions for the identification and use of markers that are associated with the diagnosis, prognosis, or differentiation of hypertension in a patient. Such markers can be used in diagnosing and treating a patient and/or to monitor the course of a treatment regimen; and for screening compounds and pharmaceutical compositions that might provide a benefit in treating or preventing such conditions.
Definitions
Before describing this application of the instant invention in greater detail, the following definitions are set forth to illustrate and define the meaning and scope of the terms used to describe the invention herein.
The terms “cardiovascular disease and anti-hypertensives” relate to the diseases of hypertension, cardiac failure, stroke, other cardiovascular or renal disorders and the pharmaceutical agents used to treat them, respectively. One skilled in the art will recognize these terms, which are described in “The Merck Manual of Diagnosis and Therapy” Seventeenth Edition, 1999, Mark H. Beers, and Robert Berkow, editors, chapters 197-213, incorporated by reference only. In various aspects, the invention relates to materials and procedures for identifying markers that are associated with the diagnosis, prognosis, or differentiation of hypertension treatment in a patient; to using such markers in diagnosing and treating a patient and/or to monitor the course of a treatment regimen; and for screening compounds and pharmaceutical compositions that might provide a benefit in treating or preventing such conditions.
The terms “genetic variant,” “mutation,” “nucleotide variant,” and “nucleotide substitution” are used herein interchangeably to refer to nucleotide changes in a reference nucleotide sequence of a particular gene.
The term “gene”, when used herein, encompasses genomic, mRNA and cDNA sequences encoding the gene protein, including the untranslated regulatory regions of the genomic DNA.
The term “genotype” as used herein means the nucleotide characters at a particular nucleotide variant marker (or locus) in either one allele or both alleles of a gene (or a particular chromosome region). A genotype can be homozygous or heterozygous. Accordingly, “genotyping” means determining the genotype, that is, the nucleotide(s) at a particular gene locus.
As used herein, the terms “amino acid variant,” “amino acid mutation,” and “amino acid substitution” are used interchangeably to refer to amino acid changes to a reference protein sequence resulting from a genetic variant or a mutation to the reference gene sequence encoding the reference protein.
The term “reference sequence” refers to a polynucleotide or polypeptide sequence known in the art, including those disclosed in publicly accessible databases, e.g., GenBank, or a newly identified gene or protein sequence, used simply as a reference with respect to the genetic variant or amino acid variant provided in the present invention.
The term “allele” or “gene allele” is used herein to refer generally to a gene having a reference sequence or a gene containing a specific genetic variant.
The term “locus” refers to a specific position or site in a gene sequence or protein sequence. Thus, there may be one or more contiguous nucleotides in a particular gene locus, or one or more amino acids at a particular locus in a polypeptide. Moreover, “locus” may also be used to refer to a particular position in a gene sequence where one or more nucleotides have been deleted, inserted, or inverted.
As used herein, the terms “polypeptide,” “protein,” and “peptide” are used interchangeably to refer to amino acid chains in which the amino acid residues are linked by covalent peptide bonds. The amino acid chains can be of any length of at least two amino acids, including full-length proteins. Unless otherwise specified, the terms “polypeptide,” “protein,” and “peptide” also encompass various modified forms thereof, including but not limited to glycosylated forms, phosphorylated forms, etc. This term also does not specify or exclude post-translation modifications of polypeptides. For example, polypeptides that include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.
The terms “primer,” “probe,” and “oligonucleotide” may be used herein interchangeably to refer to a relatively short nucleic acid fragment or sequence. They can be DNA, RNA, or a hybrid thereof, or a chemically modified analog or derivatives thereof. Typically, they are single stranded. However, they can also be double-stranded having two complementing strands which can be separated apart by denaturation. Normally, they have a length of from about 8 nucleotides, and more preferably about 18 to about 50 nucleotides. They can be labeled with detectable markers or modified in any conventional manners for various molecular biological applications.
A “promoter” refers to a DNA sequence recognized by the synthetic machinery of the cell required to initiate the specific transcription of a gene.
The term “primer” denotes a specific oligonucleotide sequence which is complementary to a target nucleotide sequence and used to hybridize to the target nucleotide sequence. A primer serves as an initiation point for nucleotide polymerization catalyzed by either DNA polymerase, RNA polymerase or reverse transcriptase.
The term “probe” denotes a defined nucleic acid segment (or nucleotide analog segment, e.g., polynucleotide) which can be used to identify a specific polynucleotide sequence present in samples, said nucleic acid segment comprising a nucleotide sequence complementary to the specific polynucleotide sequence to be identified.
The location of nucleotides in a polynucleotide with respect to the center of the polynucleotide is described herein in the following manner. When a polynucleotide has an odd number of nucleotides, the nucleotide at an equal distance from the 3′ and 5′ ends of the polynucleotide is considered to be “at the center” of the polynucleotide, and any nucleotide immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is considered to be “within 1 nucleotide of the center.” With an odd number of nucleotides in a polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even number of nucleotides, there would be a bond and not a nucleotide at the center of the polynucleotide. Thus, either of the two central nucleotides would be considered to be “within 1 nucleotide of the center” and any of the four nucleotides in the middle of the polynucleotide would be considered to be “within 2 nucleotides of the center”, and so on. For polymorphisms which involve the substitution, insertion or deletion of 1 or more nucleotides, the polymorphism, allele or biallelic marker is “at the center” of a polynucleotide if the difference between the distance from the substituted, inserted, or deleted polynucleotides of the polymorphism and the 3′ end of the polynucleotide, and the distance from the substituted, inserted, or deleted polynucleotides of the polymorphism and the 5′ end of the polynucleotide is zero or one nucleotide. If this difference is 0 to 3, then the polymorphism is considered to be “within 1 nucleotide of the center.” If the difference is 0 to 5, the polymorphism is considered to be “within 2 nucleotides of the center.” If the difference is 0 to 7, the polymorphism is considered to be “within 3 nucleotides of the center,” and so on.
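The difference rule stated above for polymorphisms can be expressed compactly: a polymorphism is “within k nucleotides of the center” when the difference between its distances to the two ends is at most 2k + 1, with k = 0 corresponding to “at the center.” The following minimal sketch assumes a single-nucleotide polymorphism and 1-based positions counted from the 5′ end; the function names are hypothetical.

```python
def end_distance_difference(length: int, position: int) -> int:
    """Absolute difference between the distance to the 5' end and the distance
    to the 3' end, for a 1-based position (assumed convention)."""
    if not 1 <= position <= length:
        raise ValueError("position must lie within the polynucleotide")
    distance_to_5_prime = position - 1
    distance_to_3_prime = length - position
    return abs(distance_to_5_prime - distance_to_3_prime)

def within_center(length: int, position: int, k: int) -> bool:
    """True when the position is within k nucleotides of the center.
    k = 0 means 'at the center' (difference of zero or one)."""
    return end_distance_difference(length, position) <= 2 * k + 1
```

For a 7-nucleotide polynucleotide, position 4 is at the center, positions 3 to 5 are within 1 nucleotide of the center, and positions 2 to 6 are within 2 nucleotides of the center, matching the odd-length description above.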
The term “upstream” is used herein to refer to a location which is toward the 5′ end of the polynucleotide from a specific reference point.
The terms “base paired” and “Watson & Crick base paired” are used interchangeably herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that found in double-helical DNA with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (See Stryer, L., Biochemistry, 4th edition, 1995).
The terms “complementary” or “complement thereof” are used herein to refer to the sequences of polynucleotides that are capable of forming Watson & Crick base pairing with another specified polynucleotide throughout the entirety of the complementary region. For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide is paired with its complementary base. Complementary bases are, generally, A and T (or A and U), or C and G. “Complement” is used herein as a synonym for “complementary polynucleotide”, “complementary nucleic acid” and “complementary nucleotide sequence”. These terms are applied to pairs of polynucleotides based solely upon their sequences and not any particular set of conditions under which the two polynucleotides would actually bind.
The term “isolated” requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide could be part of a composition, and still be isolated in that the vector or composition is not part of its natural environment.
Specifically excluded from the definition of “isolated” are: naturally-occurring chromosomes (such as chromosome spreads), artificial chromosome libraries, genomic libraries, and cDNA libraries that exist either as an in vitro nucleic acid preparation or as a transfected/transformed host cell preparation, wherein the host cells are either an in vitro heterogeneous preparation or plated as a heterogeneous population of single colonies. Also specifically excluded are the above libraries wherein a specified 5′ EST makes up less than 5% of the number of nucleic acid inserts in the vector molecules. Further specifically excluded are whole cell genomic DNA or whole cell RNA preparations (including said whole cell preparations which are mechanically sheared or enzymatically digested). Further specifically excluded are the above whole cell preparations as either an in vitro preparation or as a heterogeneous mixture separated by electrophoresis (including blot transfers of the same) wherein the polynucleotide of the invention has not further been separated from the heterologous polynucleotides in the electrophoresis medium (e.g., further separating by excising a single band from a heterogeneous band population in an agarose gel or nylon blot).
The term “purified” does not require absolute purity; rather, it is intended as a relative definition. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated. As an example, purification from 0.1% concentration to 10% concentration is two orders of magnitude.
The term “purified polynucleotide” or “purified polynucleotide vector” is used herein to describe a polynucleotide or polynucleotide vector of the invention which has been separated from other compounds including, but not limited to other nucleic acids, carbohydrates, lipids and proteins (such as the enzymes used in the synthesis of the polynucleotide), or the separation of covalently closed polynucleotides from linear polynucleotides. A polynucleotide is substantially pure when at least about 50%, preferably 60 to 75% of a sample exhibits a single polynucleotide sequence and conformation (linear versus covalently closed). A substantially pure polynucleotide typically comprises about 50%, preferably 60 to 90% weight/weight of a nucleic acid sample, more usually about 95%, and preferably is over about 99% pure. Polynucleotide purity or homogeneity is indicated by a number of means well known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by visualizing a single polynucleotide band upon staining the gel. For certain purposes higher resolution can be provided by using HPLC or other means well known in the art.
The invention also concerns gene-related biallelic markers. As used herein the term “gene-related biallelic marker” relates to a set of biallelic markers in linkage disequilibrium with said named gene.
The terms “hypertension” and “hypertensive” used herein refer to symptoms related to undesirably high levels of blood pressure. Individuals said to have “symptoms related to hypertension” have blood pressure levels at an undesirably high level. For example, an individual with a diastolic blood pressure above 89 mmHg and a systolic blood pressure above 139 mmHg, is considered to have an undesirably high level of blood pressure by the medical community.
“Antihypertensive” treatment and “treating hypertension” as used herein refer to treatment intended to reduce diastolic and/or systolic blood pressure from an undesirably high level (i.e., a level that is considered a disease or disorder under conventional medical standards, or a level that is desired to be reduced for any reason). Individuals with only temporary periods of hypertension—wherein their blood pressure levels only temporarily exceed levels which become undesirable, but then fall to more desirable levels—may also be deemed as having symptoms related to hypertension. Patients with primary, essential, idiopathic hypertension, and secondary hypertension (e.g., renal hypertension and endocrine hypertension) are included in the category of individuals with hypertension.
The terms “diuretic” and “diuretic antihypertensive” are used herein to refer to drugs that affect sodium diuresis and volume depletion in a patient. Thus, diuretic antihypertensives include thiazides (such as hydrochlorothiazide, chlorothiazide, and chlorthalidone), metolazone, loop diuretics (such as furosemide, bumetanide, ethacrynic acid, piretanide and torsemide), and aldosterone antagonists (such as spironolactone, triamterene, and amiloride).
The terms “beta blocker” and “beta blocker antihypertensive” are used herein to refer to beta-adrenergic receptor blocking agents, e.g., drugs that block sympathetic effects on the heart and are generally most effective in reducing cardiac output and in lowering arterial pressure when there is increased cardiac sympathetic nerve activity. In addition, these drugs block the adrenergic nerve-mediated release of renin from the renal juxtaglomerular cells. Examples of this group of drugs include, but are not limited to, chemical agents such as propranolol, metoprolol, nadolol, atenolol, timolol, betaxolol, carteolol, pindolol, acebutolol, labetalol, and carvedilol.
The terms “angiotensin converting enzyme inhibitor,” and “angiotensin converting enzyme inhibitor antihypertensive” are used herein to refer to drugs that are commonly known as ACE inhibitors. This group of drugs includes, for example, chemical agents such as captopril, benazepril, enalapril, enalaprilat, fosinopril, lisinopril, quinapril, ramipril, and trandolapril.
The term “sample” or “test sample” as used herein refers to a biological sample obtained for the purpose of diagnosis, prognosis, or evaluation. In certain embodiments, such a sample may be obtained for the purpose of determining the outcome of an ongoing condition or the effect of a treatment regimen on a condition. Preferred test samples include blood, serum, plasma, urine and saliva. In addition, one of skill in the art would realize that some samples would be more readily analyzed following a fractionation or purification procedure, for example, separation of whole blood into serum or plasma components.
The term “specific marker of hypertension treatment” as used herein refers to SNPs that are typically associated with genes in the Renin-angiotensin-aldosterone system, and which can be correlated with hypertension, but are not correlated with other types of disease. These systems, and others proposed to be involved in hypertension and affected by specific drugs, are in certain embodiments of the invention candidates for gene/SNP sets to be used as system inputs for a predictive algorithm. These specific markers are described in detail hereinafter.
The term “non-specific marker of hypertension therapeutic action” as used herein refers to molecules that are typically general markers of cardiovascular disease. Such markers may be present in the event of myocardial injury, atherosclerotic plaque rupture, acute coronary syndrome, coagulation, and myocardial ischemia or necrosis but may also be present in general hypertensives.
Said non-specific marker(s) for myocardial ischemia are of one or more markers selected from the group consisting of an MMP-9 level, a TpP level, an MCP-1 level, an H-FABP level, a CRP level, a creatine kinase level, an MB isoenzyme level, a cardiac troponin I level, a cardiac troponin T level, and a level of complexes comprising cardiac troponin I and cardiac troponin T.
Said non-specific marker(s) of atherosclerotic plaque rupture are of one or more markers selected from the group consisting of human neutrophil elastase, inducible nitric oxide synthase, lysophosphatidic acid, malondialdehyde-modified low density lipoprotein, matrix metalloproteinase-1, matrix metalloproteinase-2, matrix metalloproteinase-3, and matrix metalloproteinase-9.
Said non-specific marker(s) of coagulation are of one or more markers selected from the group consisting of β-thromboglobulin, D-dimer, fibrinopeptide A, platelet-derived growth factor, plasmin-α-2-antiplasmin complex, platelet factor 4, prothrombin fragment 1+2, P-selectin, thrombin-antithrombin III complex, thrombus precursor protein, tissue factor, and von Willebrand factor.
Said non-specific marker(s) of acute coronary syndrome are of one or more markers selected from the group consisting of matrix metalloprotease-9 (MMP-9), an MMP-9-related marker, TpP, MCP-1, H-FABP, C-reactive protein, creatine kinase, MB isoenzyme, cardiac troponin I, cardiac troponin T, complexes comprising cardiac troponin I and cardiac troponin T, and B-type natriuretic peptide.
Said non-specific marker(s) for myocardial injury are of one or more markers selected from the group consisting of annexin V, B-type natriuretic peptide, β-enolase, cardiac troponin I, creatine kinase-MB, glycogen phosphorylase-BB, heart-type fatty acid binding protein, phosphoglyceric acid mutase-MB, S-100ao, a marker of atherosclerotic plaque rupture, a marker of coagulation, C-reactive protein, caspase-3, hemoglobin α2, human lipocalin-type prostaglandin D synthase, interleukin-1β, interleukin-1 receptor antagonist, interleukin-6, monocyte chemotactic protein-1, soluble intercellular adhesion molecule-1, soluble vascular cell adhesion molecule-1, MMP-9, TpP, and tumor necrosis factor α.
Said non-specific marker(s) for myocardial necrosis are BNP and/or NT pro-BNP.
Said marker(s) for stroke are two or more of Cellular-Fibronectin, apolipoprotein CI (ApoC-I), apolipoprotein CIII (ApoC-III), serum amyloid A (SAA), antithrombin-III fragment (AT-III fragment), Creatine kinase, troponin, CPK, LDH Isoenzymes, Antithrombin III, Protein C, Protein S, fibrinogen, Factor VIII, activated Protein C resistance, E-selectin, P-selectin, von Willebrand factor (vWF), platelet-derived microvesicles (PDM), plasminogen activator inhibitor-1 (PAI-1), annexin V, B-type natriuretic peptide (BNP), pro-BNP, N-terminal pro-atrial natriuretic peptide, beta-enolase, cardiac troponin I, creatine kinase-MB, glycogen phosphorylase-BB, heart-type fatty acid binding protein (H-FABP), phosphoglyceric acid mutase-MB, S-100beta, a marker of atherosclerotic plaque rupture, a marker of coagulation, NR2A/2B (a subtype of N-methyl-D-aspartate (NMDA) receptors), CD54, C-reactive protein, caspase-3, hemoglobin α2, human lipocalin-type prostaglandin D synthase, interleukin-1 beta, interleukin-1 receptor antagonist, interleukin 2, interleukin 2 receptor, interleukin-6, monocyte chemotactic protein-1, soluble intercellular adhesion molecule-1, soluble vascular cell adhesion molecule-1, MMP-9, tissue factor (TF), fibrin D-dimer (D-dimer), total sialic acid (TSA), TpP, tumor necrosis factor alpha, and tumor necrosis factor receptors 1 and 2.
Other non-specific markers of hypertension include genetic variants and protein products of genes encoding components in lipid metabolism such as CETP and LDLR.
The skilled artisan will recognize that nucleotide position can be found from reference sequence number (hereafter RS#) information by referring to a public database such as www.snpper.chip.org, and that if no RS# exists one can refer to the literature for sequence information. An example of this latter case is the set of mutations in the haptoglobin gene, which are referred to by the names haptoglobin 1-1, haptoglobin 1-2, and haptoglobin 1-3. Detailed descriptions of these three mutations and their respective genetic positions are to be found by referring to Yano A. Yamamoto Y. Miyaishi S. Ishizu H. Haptoglobin genotyping by allele-specific polymerase chain reaction amplification. Acta Medica Okayama. 52(4):173-81, 1998 Aug. UI: 98454552, and Hill A V. Bowden D K. Flint J. Whitehouse D B. Hopkinson D A. Oppenheimer S J. Serjeantson S W. Clegg J B. A population genetic survey of the haptoglobin polymorphism in Melanesians by DNA analysis. American Journal of Human Genetics. 38(3):382-9, 1986 Mar., both incorporated in their entirety by reference.
The phrase “diagnosis” as used herein refers to methods by which the skilled artisan can estimate and even determine whether or not a patient is suffering from a given disease or condition. The skilled artisan often makes a diagnosis on the basis of one or more diagnostic indicators, i.e., a marker, the presence, absence, or amount of which is indicative of the presence, severity, or absence of the condition.
Similarly, a prognosis is often determined by examining one or more “prognostic indicators.” These are markers, the presence or amount of which in a patient (or a sample obtained from the patient) signal a probability that a given course or outcome, including treatment outcome, will occur. For example, when one or more prognostic indicators exhibit a certain pattern or level in samples obtained from such patients, the pattern or level may signal that the patient is at an increased probability for experiencing a future event in comparison to a similar patient exhibiting a different pattern or lower marker level. A certain pattern, level or a change in level of a prognostic indicator, which in turn is associated with an increased probability of disease recurrence or side effect such as obesity, is referred to as being “associated with an increased predisposition to an adverse outcome” in a patient. Preferred prognostic markers can predict the onset of delayed adverse events in a patient, or the chance of a person responding or not responding to a certain drug.
The term “correlating,” as used herein in reference to the use of diagnostic and prognostic indicators, refers to comparing the presence or amount of the indicator in a patient to its presence or amount in persons known to respond to a certain treatment; suffer from, or known to be at risk of, a given condition; or in persons known to be free of a given condition, i.e. “normal individuals”. For example, a SNP pattern or marker level in a patient sample can be compared to a SNP pattern or level known to be associated with response to a certain hypertension medication. The sample's marker pattern or level is said to have been correlated with a diagnosis; that is, the skilled artisan can use the marker pattern or level to determine whether the patient will respond to a certain medication, and prescribe accordingly. Alternatively, the sample's SNP pattern or marker level can be compared to a SNP pattern or marker level known to be associated with an adverse event (e.g., excessive dry cough or angioedema), such as a SNP pattern or average level found in a population of normal individuals.
The skilled artisan will understand that, while in certain embodiments comparative measurements are made of the same diagnostic marker at multiple time points, one could also measure a given marker at one time point, and a second marker at a second time point, and a comparison of these markers may provide diagnostic information. The skilled artisan will also understand that, while proteomic or gene expression values may change over time, SNP patterns by definition are fixed in time.
The phrase “determining the prognosis” as used herein refers to methods by which the skilled artisan can predict the course or outcome of a condition in a patient. The term “prognosis” does not refer to the ability to predict the course or outcome of a condition with 100% accuracy, or even that a given course or outcome is predictably more or less likely to occur based on the presence, absence or levels of test markers. Instead, the skilled artisan will understand that the term “prognosis” refers to an increased probability that a certain course or outcome will occur; that is, that a course or outcome is more likely to occur in a patient exhibiting a given condition, such as nicotine dependence, when compared to those individuals not exhibiting the condition.
The skilled artisan will understand that associating a prognostic indicator with a predisposition to an adverse outcome is a statistical analysis. For example, a marker level of greater than 80 pg/mL may signal that a patient is more likely to suffer from an adverse outcome than patients with a level less than or equal to 80 pg/mL, as determined by a level of statistical significance. Additionally, a change in marker concentration from baseline levels may be reflective of patient prognosis, and the degree of change in marker level may be related to the severity of adverse events. Statistical significance is often determined by comparing two or more populations and determining a confidence interval and/or a p value. See, e.g., Dowdy and Wearden, Statistics for Research, John Wiley & Sons, New York, 1983. Preferred confidence intervals of the invention are 90%, 95%, 97.5%, 98%, 99%, 99.5%, 99.9% and 99.99%, while preferred p values are 0.1, 0.05, 0.025, 0.02, 0.01, 0.005, 0.001, and 0.0001. Exemplary statistical tests and algorithmic methods for associating a prognostic indicator with a predisposition to an adverse outcome and success or failure on a treatment regimen are described hereinafter.
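As one illustration of the comparison described above, the following sketch splits patients at the 80 pg/mL cutoff and computes a one-sided p value with Fisher's exact test. Fisher's exact test is an assumption made for this example; the text does not prescribe a particular test. The patient records are hypothetical, and the function names are illustrative only.

```python
from math import comb

def contingency_table(records, cutoff=80.0):
    """Build a 2x2 table from (marker_level_pg_per_mL, had_adverse_outcome) pairs.
    Rows: level > cutoff, level <= cutoff. Columns: adverse outcome, no adverse outcome."""
    a = sum(1 for level, adverse in records if level > cutoff and adverse)
    b = sum(1 for level, adverse in records if level > cutoff and not adverse)
    c = sum(1 for level, adverse in records if level <= cutoff and adverse)
    d = sum(1 for level, adverse in records if level <= cutoff and not adverse)
    return a, b, c, d

def fisher_exact_one_sided(a, b, c, d):
    """Probability of seeing at least `a` adverse outcomes in the high-marker row,
    given fixed row and column totals (hypergeometric upper tail)."""
    row_high = a + b
    col_adverse = a + c
    total = a + b + c + d
    upper = min(row_high, col_adverse)
    numerator = sum(
        comb(row_high, x) * comb(total - row_high, col_adverse - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, col_adverse)

# Hypothetical records, for illustration only (not study data).
records = [(95, True), (110, True), (88, True), (130, True), (101, True), (84, False),
           (60, False), (72, False), (55, False), (79, True), (40, False), (66, False)]
p_value = fisher_exact_one_sided(*contingency_table(records))
significant_at_0_05 = p_value <= 0.05
```

The resulting p value can then be compared against any of the preferred thresholds listed above.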
Introduction
Blood pressure levels are homeostatically maintained through complex networks of many interrelated biochemical, physiologic, and anatomic traits organized to provide redundant systems with counterbalancing pressor and depressor effects. Despite this underlying complexity, all hypertension can be viewed as a consequence of inappropriate vasoconstriction relative to the concomitant intravascular fluid volume or of overfilling of the arterial vascular bed with excess fluid relative to its capacity. Because many components of the regulatory systems are proteins that may vary in structure, configuration, or quantity because of genetic differences, it is expected that interindividual variation in antihypertensive drug responses is at least in part genetically determined. Logical candidate genes to influence antihypertensive drug responses are those that code for components of the system(s) targeted by the drug or of counter regulatory system(s) opposing an initial drug-induced fall in blood pressure. The only difference between a drug response trait and other genetically influenced traits is that the drug(s) must be administered for the response trait to manifest. Otherwise, the same analytical approaches can be taken to identify and characterize genetic and environmental sources of variation in a drug response trait.
Angiotensinogen, of the serine protease inhibitor family, is a cell-secreted plasma protein in the circulation originating predominantly from the liver. It is cleaved by Renin to release a small 10-amino-acid peptide, Angiotensin I. A positive feedback mechanism exists between Angiotensin II and Angiotensinogen expression.
Angiotensin Converting Enzyme (ACE), a dipeptidyl carboxypeptidase, has 2 mechanisms for vasoconstriction. It converts angiotensin I to angiotensin II, which is a potent vasoconstrictor, and it inactivates bradykinin, which is a vasodilator. ACE contains one functional site for cleaving terminal dipeptides. There are different classes of ACE inhibitors. Non-peptide inhibitors chelate zinc and heavy metal ions needed for enzymatic activity, creating a catalytically defective enzyme. A second class of inhibitors are peptides that interact with ACE similarly to endogenous substrates. Most medications are in this class.
Angiotensin II receptors bind free angiotensin II and initiate the biochemical signal pathways that lead to many downstream physiological effects. These receptors are members of the superfamily of G protein-coupled receptors that have seven transmembrane regions.
In recent years, the search for a single gene responsible for major depressive disorder has given way to the understanding that multiple gene variants, acting together with yet unknown environmental risk factors or developmental events, interact in a complex system to account for its expression phenotype. In accordance, treatments that successfully alleviate hypertension symptoms are likely to act on multiple gene products of the above pathway.
A study of 107 hypertensive patients taking ACE inhibitors to control their blood pressure was performed. Normal blood pressure for an adult is around 120/80 mmHg. We defined a hypertensive patient as a patient with BP above 149/90 mmHg. The patients were newly diagnosed and not on any previous psychotropic medication. We did not distinguish between individual ACE inhibitors, and we also did not distinguish non-response due to lack of BP reduction from non-response due to adverse effects.
The definition of a “Responder” used was a patient who is hypertensive, takes an anti-hypertensive medication, and has consistent normotensive BP after medication treatment without adverse side effects.
We used the initial diagnosis of hypertension by the physician on the patient's chart as sufficient information to classify the patient as hypertensive. A BP at diagnosis, or a series of prior hypertensive BP measurements, reinforces this diagnosis.
We set a minimum duration of 6 months since initial diagnosis of hypertension. We further reinforce this by requiring a minimum 6-month duration of one medication therapy if a patient has switched medication.
To be classified as normotensive after medication, patients must have at least 3 normotensive BP measurements over a minimum 6-month duration of therapy, and no more than 1 hypertensive measurement after medication start date. Patients must also have no adverse side effects noted on charts during the duration of medication. If a patient was not quite normotensive, yet the doctor's discretion kept the patient on the medication for over a year, we also allowed responder classification on the basis of a 40+ mmHg reduction in systolic or diastolic blood pressure.
The simplest definition of a “Non-responder” is a patient who is hypertensive, takes an anti-hypertensive medication, and continues to have a hypertensive BP after medication treatment, experiences adverse side effects, or must revert to alternative medication to attain normotensive BP measurements.
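The responder and non-responder definitions above can be sketched as a small classification routine. This is a minimal illustration only, not the procedure actually used in the study: the `Patient` record, its field names, the rule that a reading counts as hypertensive when either value exceeds 149/90 mmHg, and the use of the last on-therapy reading to measure the 40 mmHg reduction are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Limits taken from the text. Treating a reading as hypertensive when EITHER
# value exceeds its limit is an assumption of this sketch.
SYS_LIMIT, DIA_LIMIT = 149, 90


@dataclass
class Patient:  # hypothetical record layout
    baseline_bp: Tuple[int, int]          # (systolic, diastolic) at diagnosis
    on_therapy_bp: List[Tuple[int, int]]  # readings after medication start
    months_on_therapy: float
    adverse_effects: bool


def is_hypertensive(bp: Tuple[int, int]) -> bool:
    return bp[0] > SYS_LIMIT or bp[1] > DIA_LIMIT


def classify(p: Patient) -> str:
    if p.months_on_therapy < 6:
        return "unclassified"      # minimum 6-month duration not met
    if p.adverse_effects:
        return "non-responder"
    hyper = sum(is_hypertensive(bp) for bp in p.on_therapy_bp)
    normo = len(p.on_therapy_bp) - hyper
    if normo >= 3 and hyper <= 1:
        return "responder"
    # Discretionary rule: over a year on the drug and a 40+ mmHg reduction
    # in systolic or diastolic pressure (measured here at the last reading).
    if p.months_on_therapy > 12 and p.on_therapy_bp:
        last = p.on_therapy_bp[-1]
        if p.baseline_bp[0] - last[0] >= 40 or p.baseline_bp[1] - last[1] >= 40:
            return "responder"
    return "non-responder"
```

Returning "unclassified" for patients with less than 6 months of therapy is a design choice of the sketch; the text only states that such a minimum duration was required.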
Analysis of Patient Data
Following completion of the study we had collected 107 patient samples with: (1) response data, and (2) genotype information for 42 SNPs. The identities of the SNPs are given in Appendix A. Of the total 107 samples there were 68 Responders and 39 Non-Responders.
Linear Analysis
As a first step, a linear association analysis was performed to screen for “Golden SNPs”, single SNPs that could be used independently to predict response. Since hypertension is a complex disease involving many genes as detailed above, we did not expect to find any. However, these SNPs alone or in combination could be relevant to disease prediction in smaller subgroups of people.
To show the amount of linear correlation with response, we give χ² and Pearson's r coefficients and their significance. To review, the χ² coefficient gives a measurement of association. The null hypothesis is that two variables have no association. A high χ² value indicates that the null hypothesis is unlikely. The chi-square statistic is given by
$$\chi^2 = \sum_{i,j} \frac{(N_{ij} - n_{ij})^2}{n_{ij}}$$
In our case, we are measuring the association between phenotype and genotype. There are two phenotype values, response/non-response, and three genotype values, 2 homozygous combinations and one heterozygous combination. Here, N_ij is the observed number of responses/non-responses for each genotype value, and n_ij is the number of responses/non-responses expected under the null hypothesis.
The significance of χ² is given by Q(χ²|ν), an incomplete gamma function, where ν is the number of degrees of freedom. Strictly speaking, Q is the probability that the sum of the squares of ν random normal variables of unit variance (and zero mean) will be greater than χ². Essentially, we are asking whether or not the sum of errors, N−n, is less than the sum of errors from an uncorrelated distribution of random variables. If Q is high, then it is likely that the errors of the uncorrelated distribution are greater than the errors in the data, indicating that the data are likely more correlated than some multivariate normally distributed data set.
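As an illustration of the chi-square calculation for a 2 x 3 table of phenotype against genotype, the following minimal sketch computes the statistic and its tail probability. The counts are hypothetical and do not come from the study. The expected counts are derived from the row and column totals, which is the usual contingency-table convention and is assumed here. For the two degrees of freedom of a 2 x 3 table, the tail probability Q has the closed form exp(−χ²/2), so no incomplete gamma routine is needed in this special case.

```python
import math


def chi_square(table):
    """table[i][j]: observed count for phenotype i (row) and genotype j (column)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total  # count under no association
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


def q_two_dof(chi2):
    # Valid only for 2 degrees of freedom: Q(chi2 | 2) = exp(-chi2 / 2).
    return math.exp(-chi2 / 2.0)


# Hypothetical counts: rows = responders, non-responders;
# columns = first homozygote, heterozygote, second homozygote.
table = [[30, 28, 10], [10, 15, 14]]
chi2, dof = chi_square(table)
print(chi2, dof, q_two_dof(chi2))
```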
We then calculate Pearson's r, the linear correlation coefficient, for each genotype. Pearson's r gives a number between −1 and 1, with −1 indicating a high negative correlation, 1 a high positive correlation and 0 no correlation. In case a variable is constant (no genotypic variation), Pearson's r is undefined. We also supply the significance of the correlation, P, equal to 1−Q. P is given by the complementary error function, and it is the probability that |r| should be larger than its observed value under the null hypothesis that the variable is uncorrelated with response/non-response. A small value of P indicates that the variable is significantly correlated with response/non-response.
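The correlation step can be sketched as follows. The numeric coding of genotypes (0, 1, 2) and of response (1 for responder, 0 for non-responder) is an assumption made for illustration, and the significance uses the common large-sample approximation P = erfc(|r|·√(N/2)), which is assumed to be the complementary error function form referred to above.

```python
import math


def pearson_r(x, y):
    """Linear correlation coefficient; returns None when a variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined, e.g. no genotypic variation
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def significance(r, n):
    """Large-sample probability of exceeding |r| under the null hypothesis."""
    return math.erfc(abs(r) * math.sqrt(n / 2.0))


# Hypothetical coding: genotype 0/1/2, response 1 = responder, 0 = non-responder.
genotype = [0, 0, 1, 1, 2, 2, 2, 1]
response = [0, 0, 0, 1, 1, 1, 1, 0]
r = pearson_r(genotype, response)
print(r, significance(r, len(genotype)))
```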
We found no “Golden SNP” that delivered predictive success greater than 62%. Using a simple binary predictor, which counts the number of each outcome category for each genotype and assigns an outcome for that genotype based on the outcome category with the highest count, we identified the top performing individual SNP to have an r coefficient of only 0.31. Complete data is given in
As previously stated, we chose 43 SNPs in candidate genes in the renin-angiotensin-aldosterone pathway as well as the adrenergic and endothelial biochemical systems. This gives a network with 129 inputs, but we have only ~100 patient samples. Clearly, we would not expect to be able to adequately train a network with 129 inputs with only 100 examples. A global search algorithm was run that winnowed down the number of possible combinations of SNPs from 43! ~ 10^53 to those that are the most predictive of response or non-response to ACE inhibitors in a separate nonlinear adaptive algorithm (NAA). This search algorithm preserved nonlinear interactions between SNPs (epistasis) that we hypothesize are the primary contributors to determining response in drugs that act along multigenic biochemical pathways.
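The input count and the size of the search space quoted above can be checked with a short sketch. It assumes that each SNP contributes three indicator inputs, one per genotype value, which is one natural way to arrive at 43 × 3 = 129 inputs; the genotype labels are hypothetical.

```python
import math

# Hypothetical labels: two homozygous genotypes and one heterozygous genotype.
GENOTYPES = ("AA", "Aa", "aa")


def one_hot(genotypes):
    """Encode one patient's genotypes as three indicator inputs per SNP."""
    vec = []
    for g in genotypes:
        vec.extend(1 if g == label else 0 for label in GENOTYPES)
    return vec


patient = ["AA"] * 43                    # a made-up patient with 43 SNP genotypes
inputs = one_hot(patient)
print(len(inputs))                       # 129
print(math.log10(math.factorial(43)))    # about 52.8, i.e. 43! is on the order of 10^53
```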
The search algorithm selected genotype combinations with a predictive accuracy >70% and a laundering <10% (i.e., >90% samples remaining) for evaluation of performance on an unlaundered (i.e., complete) dataset. Evaluation was performed using leave-one-out cross validation to minimize the chance of Type II errors.
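Leave-one-out cross validation can be sketched as below. The predictor used here is a simple majority-vote lookup over genotype combinations, in the spirit of the simple binary predictor mentioned earlier; it stands in for the nonlinear adaptive algorithm, whose details are not reproduced here. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (genotype_tuple, label). Builds a majority-vote lookup."""
    votes = defaultdict(Counter)
    for genos, label in samples:
        votes[genos][label] += 1
    # Fallback for genotype combinations never seen in training.
    default = Counter(label for _, label in samples).most_common(1)[0][0]
    table = {g: c.most_common(1)[0][0] for g, c in votes.items()}
    return table, default


def predict(model, genos):
    table, default = model
    return table.get(genos, default)


def leave_one_out_accuracy(samples):
    hits = 0
    for i, (genos, label) in enumerate(samples):
        model = train(samples[:i] + samples[i + 1:])  # hold out sample i
        hits += predict(model, genos) == label
    return hits / len(samples)
```

Each sample is predicted by a model that never saw it, so the reported accuracy is not inflated by memorizing the held-out example.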
Laundering is a dynamic process that evaluates whether a SNP genotype combination is found in both the responder and non-responder patient groups. Those patient samples that have SNP genotype combinations that occur for both responders and non-responders are removed from the dataset before the neural net is trained, tested and evaluated. When looking at a 2 SNP input combination the degree of laundering is high (perhaps >65% of samples are removed). However, as the SNP genotype input number increases, the likelihood of finding the same genotype combination in both the responder and non-responder groups becomes low and, hence, the degree of laundering decreases (perhaps <10% of samples are removed).
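The laundering step described above can be sketched as a filter over the samples. The data layout (a list of genotype lists paired with a label) and the function name are assumptions made for illustration.

```python
def launder(samples, snp_idx):
    """Drop samples whose genotype combination occurs in both outcome groups.

    samples: list of (genotype_list, label); snp_idx: indices of the SNPs
    that make up the combination being evaluated.
    """
    labels_seen = {}
    for genos, label in samples:
        key = tuple(genos[i] for i in snp_idx)
        labels_seen.setdefault(key, set()).add(label)
    kept = [(genos, label) for genos, label in samples
            if len(labels_seen[tuple(genos[i] for i in snp_idx)]) == 1]
    removed_fraction = 1 - len(kept) / len(samples)
    return kept, removed_fraction


# Made-up example: laundering falls as more SNPs enter the combination.
samples = [(["AA", "Aa"], "R"), (["AA", "aa"], "N"),
           (["aa", "Aa"], "R"), (["AA", "Aa"], "N")]
print(launder(samples, [0])[1])      # 0.75
print(launder(samples, [0, 1])[1])   # 0.5
```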
In further analysis, the global search algorithm was run with the goal of simultaneously optimizing the number of SNPs to use as inputs to our final nonlinear predictor and Predictive Accuracy. The global search algorithm was also used to find the most common SNPs that appear in the top performing inputs that contained 20 or fewer SNPs.
The global search algorithm (hereafter called Step-Up) was run again, this time using only the most frequent SNPs at a level of 10% to the most frequent. This reduced the number of SNP-genotypes to 9 and increased the predictive accuracy to 81±3%. Additionally, a different algorithm using genetic-algorithm optimized neural networks, described in patent application Ser. No. 09/611,220 and hereby incorporated by reference, demonstrated a predictive accuracy of 79±5% with 13 SNPs. Predictive accuracy was determined for Step-Up by leave-one-out cross validation performed over the entire sample set (n=118), and for GA Master Net by random batch removal (30%) performed with results averaged over greater than 20 tests.
We discovered fifty combinations of SNP predictors that had a prediction rate over the entire population of 76% or greater. The top predictor had a predictive success of 81%, with a specificity of 92% and a sensitivity of 87%.
From
This study confirms our initial hypothesis that response to anti-hypertension medications is a strongly nonlinear epistatic process.
We have examined and ruled out the possibility that random chance is responsible for the strong positive results we are achieving by testing the global search algorithm against a random SNP dataset. It would not be unreasonable to question whether a 107 patient sample group could be partitioned into responders and non-responders using 43 random variables. Upon examination of this possibility by subjecting a random dataset identical in dimension to that of the hypertension dataset (e.g. 43 SNPs, 107 patients) to Step-Up, we found our technology was unable to select any combinations of random variables with a predictive ability greater than 55%. This supports our conclusion that we have identified select SNPs with relevant information for predicting outcome and that nonlinear algorithms are capable of extracting minimally representative information contained in complex multi-variable groups.
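A random control dataset of the same dimensions can be generated along the following lines. This is a sketch under stated assumptions: the genotype labels, the uniform genotype frequencies, and the fixed seed are illustrative choices, and the text does not say how the random dataset was actually constructed.

```python
import random


def random_dataset(n_patients=107, n_snps=43, n_responders=68, seed=0):
    """Random genotypes with the same shape and class balance as the study data."""
    rng = random.Random(seed)
    labels = ["R"] * n_responders + ["N"] * (n_patients - n_responders)
    rng.shuffle(labels)
    # Uniform genotype frequencies are an assumption of this sketch.
    genotypes = [[rng.choice(("AA", "Aa", "aa")) for _ in range(n_snps)]
                 for _ in range(n_patients)]
    return list(zip(genotypes, labels))


control = random_dataset()
print(len(control), len(control[0][0]))
```

Running the same search and evaluation pipeline on such a dataset gives a baseline for how much predictive accuracy can arise by chance alone.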
In a preferred embodiment of the present invention, to enable higher predictive accuracy, one can use the top N SNP groups to train a committee network, described below, in a voting scheme. Basically, N predictors trained on N SNP groups each give a “vote” on every new, previously unseen example presented to them. The votes are added up and a final output is given based upon this “group vote”. This methodology with the dataset yielded a predictive accuracy of 89±2%.
In still another preferred embodiment of the present invention, one or more of the top 50 SNP groups given below might work better, singly or in combination with other SNP groups, within a certain subsection of the population. One can then train a predictor algorithm with these specific combinations.
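The voting scheme can be sketched in a few lines. The predictors below are hypothetical stand-ins (simple rules on made-up genotype values), not the trained networks of the invention; only the vote counting is the point of the example.

```python
from collections import Counter


def committee_predict(predictors, sample):
    """Each predictor casts one vote; the most common vote is the final output."""
    votes = Counter(p(sample) for p in predictors)
    return votes.most_common(1)[0][0]


# Hypothetical predictors, each looking at a different SNP of the sample.
predictors = [
    lambda s: "R" if s["snp1"] == "AA" else "N",
    lambda s: "R" if s["snp2"] != "aa" else "N",
    lambda s: "R" if s["snp3"] == "Aa" else "N",
]
sample = {"snp1": "AA", "snp2": "Aa", "snp3": "aa"}
print(committee_predict(predictors, sample))   # R (two votes to one)
```

Using an odd number of predictors avoids tied votes in a two-class problem.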
Said specific combinations are the following, gene abbreviations followed by specific mutation, with a translation of gene abbreviations and mutations in table 2, put into vertical columns labeled one through fifty:
Diagnostic Detection of Hypertension Disease-Associated and Treatment-Relevant Mutations:
According to the present invention, base changes in the genes can be detected and used as a diagnostic for Hypertension. A variety of techniques are available for isolating DNA and RNA and for detecting mutations in the isolated AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s).
A number of sample preparation methods are available for isolating DNA and RNA from patient blood samples. For example, the DNA from a blood sample is obtained by cell lysis following alkali treatment. Often, there are multiple copies of RNA message per DNA. Accordingly, it is useful from the standpoint of detection sensitivity to have a sample preparation protocol which isolates both forms of nucleic acid. Total nucleic acid may be isolated by guanidinium isothiocyanate/phenol-chloroform extraction, or by proteinase K/phenol-chloroform treatment. Commercially available sample preparation methods such as those from Qiagen Inc. (Chatsworth, Calif.) can also be utilized.
As discussed more fully hereinbelow, hybridization with one or more labelled probes containing complements of the variant sequences enables detection of the Hypertension mutations. Since each Hypertension patient can be heteroplasmic (possessing both the Hypertension mutation and the normal sequence), a quantitative or semi-quantitative measure (depending on the detection method) of such heteroplasmy can be obtained by comparing the amount of signal from the Hypertension probe to the amount from the Hypertension⁻ (normal or wild-type) probe.
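One simple way to turn the two probe signals into a semi-quantitative measure is a background-corrected ratio, sketched below. The formula, the background subtraction, and the assumption that both probes hybridize with equal efficiency are illustrative choices and are not specified in the text.

```python
def mutant_fraction(mutant_signal, wildtype_signal, background=0.0):
    """Fraction of total probe signal attributable to the mutant probe."""
    m = max(mutant_signal - background, 0.0)
    w = max(wildtype_signal - background, 0.0)
    if m + w == 0:
        raise ValueError("no signal above background")
    return m / (m + w)


print(mutant_fraction(30.0, 70.0))         # 0.3
print(mutant_fraction(60.0, 60.0, 10.0))   # 0.5
```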
A variety of techniques, as discussed more fully hereinbelow, are available for detecting the specific mutations in the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s). The detection methods include, for example, cloning and sequencing, ligation of oligonucleotides, use of the polymerase chain reaction and variations thereof, use of single nucleotide primer-guided extension assays, hybridization techniques using target-specific oligonucleotides and sandwich hybridization methods.
Cloning and sequencing of the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) can serve to detect Hypertension mutations in patient samples. Sequencing can be carried out with commercially available automated sequencers utilizing fluorescently labelled primers. An alternate sequencing strategy is the “sequencing by hybridization” method using high density oligonucleotide arrays on silicon chips (Fodor et al., Nature 364:555-556 (1993); Pease et al., Proc. Natl. Acad. Sci. USA, 91:5022-5026 (1994)). For example, fluorescently-labelled target nucleic acid generated, for example, from PCR amplification of the target genes using fluorescently labeled primers, is hybridized with a chip containing a set of short oligonucleotides which probe regions of complementarity with the target sequence. The resulting hybridization patterns are useful for reassembling the original target DNA sequence.
Mutational analysis can also be carried out by methods based on ligation of oligonucleotide sequences which anneal immediately adjacent to each other on a target DNA or RNA molecule (Wu and Wallace, Genomics 4:560-569 (1989); Landegren et al., Science 241:1077-1080 (1988); Nickerson et al., Proc. Natl. Acad. Sci. 87:8923-8927 (1990); Barany, F., Proc. Natl. Acad. Sci. 88:189-193 (1991)). Ligase-mediated covalent attachment occurs only when the oligonucleotides are correctly base-paired. The Ligase Chain Reaction (LCR), which utilizes the thermostable Taq ligase for target amplification, is particularly useful for interrogating Hypertension mutation loci. The elevated reaction temperatures permit the ligation reaction to be conducted with high stringency (Barany, F., PCR Methods and Applications 1:5-16 (1991)).
Analysis of point mutations in DNA can also be carried out by using the polymerase chain reaction (PCR) and variations thereof. Mismatches can be detected by competitive oligonucleotide priming under hybridization conditions where binding of the perfectly matched primer is favored (Gibbs et al., Nucl. Acids. Res. 17:2437-2448 (1989)). In the amplification refractory mutation system technique (ARMS), primers are designed to have perfect matches or mismatches with target sequences either internal or at the 3′ residue (Newton et al., Nucl. Acids. Res. 17:2503-2516 (1989)). Under appropriate conditions, only the perfectly annealed oligonucleotide functions as a primer for the PCR reaction, thus providing a method of discrimination between normal and mutant (Hypertension) sequences.
Genotyping analysis of the Aldosterone synthase CYP11B2, Angiotensin converting enzyme ACE, CYP2C9, alpha-adducin, Angiotensinogen AGT, Angiotensin II type 1 receptor AGTR1, Angiotensin II type 2 receptor AGTR2, Mineralocorticoid receptor MLR, RGS2, Renin REN, Adrenergic alpha-1a receptor ADRA1a, Adrenergic alpha-1b receptor ADRA1b, Adrenergic alpha-2 receptor ADRA2A, Adrenergic beta-1 receptor ADRB1, Adrenergic beta-2 receptor ADRB2, Adrenergic beta-3 receptor ADRB3, Endothelin receptor type A EDNRA, Endothelin receptor type B EDNRB, Endothelial nitric oxide synthase ENOS, Apolipoprotein A APOA, Apolipoprotein B APOB, Apolipoprotein E APOE, Lipase hepatic LIPC, Haptoglobin and Cholesteryl ester transfer protein CETP genes can also be carried out using single nucleotide primer-guided extension assays, where the specific incorporation of the correct base is provided by the high fidelity of the DNA polymerase (Syvanen et al., Genomics 8:684-692 (1990); Kuppuswamy et al., Proc. Natl. Acad. Sci. USA. 88:1143-1147 (1991)). Another primer extension assay, which allows for the quantification of heteroplasmy by simultaneously interrogating both wild-type and mutant nucleotides, is disclosed in a pending U.S. patent application entitled, “Multiplexed Primer Extension Methods”, naming Eoin Fahy and Soumitra Ghosh as inventors, filed on Mar. 24, 1995, Ser. No. 08/410,658, the disclosure of which is incorporated by reference.
Detection of single base mutations in target nucleic acids can be conveniently accomplished by differential hybridization techniques using target-specific oligonucleotides (Suggs et al., Proc. Natl. Acad. Sci. 78:6613-6617 (1981); Conner et al., Proc. Natl. Acad. Sci. 80:278-282 (1983); Saiki et al., Proc. Natl. Acad. Sci. 86:6230-6234 (1989)). For example, mutations are diagnosed on the basis of the higher thermal stability of the perfectly matched probes as compared to the mismatched probes. The hybridization reactions may be carried out in a filter-based format, in which the target nucleic acids are immobilized on nitrocellulose or nylon membranes and probed with oligonucleotide probes. Any of the known hybridization formats may be used, including Southern blots, slot blots, “reverse” dot blots, solution hybridization, solid support based sandwich hybridization, bead-based, silicon chip-based and microtiter well-based hybridization formats.
An alternative strategy involves detection of the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) by sandwich hybridization methods. In this strategy, the mutant and wild-type (normal) target nucleic acids are separated from non-homologous DNA/RNA using a common capture oligonucleotide immobilized on a solid support and detected by specific oligonucleotide probes tagged with reporter labels. The capture oligonucleotides can be immobilized on microtitre plate wells or on beads (Gingeras et al., J. Infect. Dis. 164:1066-1074 (1991); Richman et al., Proc. Natl. Acad. Sci. 88:11241-11245 (1991)).
While radio-isotopic labeled detection oligonucleotide probes are highly sensitive, non-isotopic labels are preferred due to concerns about handling and disposal of radioactivity. A number of strategies are available for detecting target nucleic acids by non-isotopic means (Matthews et al., Anal. Biochem., 169:1-25 (1988)). The non-isotopic detection method may be direct or indirect.
In the indirect detection process, the oligonucleotide probe is generally covalently labelled with a hapten or ligand such as digoxigenin (DIG) or biotin. Following the hybridization step, the target-probe duplex is detected by an antibody- or streptavidin-enzyme complex. Enzymes commonly used in DNA diagnostics are horseradish peroxidase and alkaline phosphatase. One particular indirect method, the Genius™ detection system (Boehringer Mannheim), is especially useful for mutational analysis of the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s). This indirect method uses digoxigenin as the tag for the oligonucleotide probe, which is detected by an anti-digoxigenin-antibody-alkaline phosphatase conjugate.
Direct detection methods include the use of fluorophor-labeled oligonucleotides, lanthanide chelate-labeled oligonucleotides or oligonucleotide-enzyme conjugates. Examples of fluorophor labels are fluorescein, rhodamine and phthalocyanine dyes. Examples of lanthanide chelates include complexes of Eu³⁺ and Tb³⁺. Directly labeled oligonucleotide-enzyme conjugates are preferred for detecting point mutations when using target-specific oligonucleotides as they provide very high sensitivities of detection.
Oligonucleotide-enzyme conjugates can be prepared by a number of methods (Jablonski et al., Nucl. Acids Res., 14:6115-6128 (1986); Li et al., Nucl. Acids Res. 15:5275-5287 (1987); Ghosh et al., Bioconjugate Chem. 1:71-76 (1990)), and alkaline phosphatase is the enzyme of choice for obtaining high sensitivities of detection. The detection of target nucleic acids using these conjugates can be carried out by filter hybridization methods or by bead-based sandwich hybridization (Ishii et al., Bioconjugate Chemistry 4:34-41 (1993)).
Detection of the probe label may be accomplished by the following approaches. For radioisotopes, detection is by autoradiography, scintillation counting or phosphor imaging. For hapten or biotin labels, detection is with antibody or streptavidin bound to a reporter enzyme such as horseradish peroxidase or alkaline phosphatase, which is then detected by enzymatic means. For fluorophor or lanthanide-chelate labels, fluorescent signals may be measured with spectrofluorimeters with or without time-resolved mode or using automated microtitre plate readers. With enzyme labels, detection is by color or dye deposition (p-nitrophenyl phosphate or 5-bromo-4-chloro-3-indolyl phosphate/nitroblue tetrazolium for alkaline phosphatase and 3,3′-diaminobenzidine-NiCl₂ for horseradish peroxidase), fluorescence (e.g., 4-methyl umbelliferyl phosphate for alkaline phosphatase) or chemiluminescence (the alkaline phosphatase dioxetane substrates LumiPhos 530 from Lumigen Inc., Detroit Mich. or AMPPD and CSPD from Tropix, Inc.). Chemiluminescent detection may be carried out with X-ray or Polaroid film or by using single photon counting luminometers. This is the preferred detection format for alkaline phosphatase labelled probes.
The oligonucleotide probes for detection preferably range in size between 10 and 100 bases, more preferably between 15 and 30 bases in length. Examples of such nucleotide probes are found below in Tables 4 and 5. Tables 5 and 6 provide representative sequences of probes for detecting mutations in AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) and representative antisense sequences. In order to obtain the required target discrimination using the detection oligonucleotide probes, the hybridization reactions are preferably run between 20° C. and 60° C., and more preferably between 30° C. and 55° C. As known to those skilled in the art, optimal discrimination between perfect and mismatched duplexes can be obtained by manipulating the temperature and/or salt concentrations or inclusion of formamide in the stringency washes.
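The probe length ranges given above lend themselves to a quick screening check, sketched below. The melting temperature estimate uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) in degrees Celsius, a rough rule of thumb for short oligonucleotides that is brought in here as an assumption; the text does not prescribe any particular Tm formula. The example sequence is made up.

```python
def probe_report(seq):
    """Check a candidate probe against the stated length ranges and estimate Tm."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("probe must contain only A, C, G, T")
    n = len(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return {
        "length": n,
        "acceptable": 10 <= n <= 100,   # preferred size range
        "preferred": 15 <= n <= 30,     # more preferred size range
        "wallace_tm_c": 2 * at + 4 * gc,  # rough estimate, short oligos only
    }


print(probe_report("ACGTACGTACGTACGTAC"))
```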
As an alternative to detection of mutations in the nucleic acids associated with the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s), it is also possible to analyze the protein products of the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s). In particular, point mutations in these genes are expected to alter the structure of the proteins that these genes encode. These altered proteins (variant polypeptides) can be isolated and used to prepare antisera and monoclonal antibodies that specifically detect the products of the mutated genes and not those of non-mutated or wild-type genes. Mutated gene products also can be used to immunize animals for the production of polyclonal antibodies. Recombinantly produced peptides can also be used to generate polyclonal antibodies. These peptides may represent small fragments of gene products produced by expressing regions of the mitochondrial genome containing point mutations.
More particularly, variant polypeptides from point mutations in said genes can be used to immunize an animal for the production of polyclonal antiserum. For example, a recombinantly produced fragment of a variant polypeptide can be injected into a mouse along with an adjuvant so as to generate an immune response. Murine immunoglobulins which bind the recombinant fragment with a binding affinity of at least 1×10⁷ M⁻¹ can be harvested from the immunized mouse as an antiserum, and may be further purified by affinity chromatography or other means. Additionally, spleen cells are harvested from the mouse and fused to myeloma cells to produce a bank of antibody-secreting hybridoma cells. The bank of hybridomas can be screened for clones that secrete immunoglobulins which bind the recombinantly produced fragment with an affinity of at least 1×10⁶ M⁻¹. More specifically, immunoglobulins that selectively bind to the variant polypeptides but poorly or not at all to wild-type polypeptides are selected, either by pre-absorption with wild-type proteins or by screening of hybridoma cell lines for specific idiotypes that bind the variant, but not wild-type, polypeptides.
Nucleic acid sequences capable of ultimately expressing the desired variant polypeptides can be formed from a variety of different polynucleotides (genomic or cDNA, RNA, synthetic oligonucleotides, etc.) as well as by a variety of different techniques.
The DNA sequences can be expressed in hosts after the sequences have been operably linked to (i.e., positioned to ensure the functioning of) an expression control sequence. These expression vectors are typically replicable in the host organisms either as episomes or as an integral part of the host chromosomal DNA. Commonly, expression vectors can contain selection markers (e.g., markers based on tetracycline resistance or hygromycin resistance) to permit detection and/or selection of those cells transformed with the desired DNA sequences. Further details can be found in U.S. Pat. No. 4,704,362.
Polynucleotides encoding a variant polypeptide may include sequences that facilitate transcription (expression sequences) and translation of the coding sequences such that the encoded polypeptide product is produced. Construction of such polynucleotides is well known in the art. For example, such polynucleotides can include a promoter, a transcription termination site (polyadenylation site in eukaryotic expression hosts), a ribosome binding site, and, optionally, an enhancer for use in eukaryotic expression hosts, and, optionally, sequences necessary for replication of a vector.
E. coli is one prokaryotic host useful particularly for cloning DNA sequences of the present invention. Other microbial hosts suitable for use include bacilli, such as Bacillus subtilis, and other enterobacteriaceae, such as Salmonella, Serratia, and various Pseudomonas species. In these prokaryotic hosts one can also make expression vectors, which will typically contain expression control sequences compatible with the host cell (e.g., an origin of replication). In addition, any number of a variety of well-known promoters will be present, such as the lactose promoter system, a tryptophan (Trp) promoter system, a beta-lactamase promoter system, or a promoter system from phage lambda. The promoters will typically control expression, optionally with an operator sequence, and have ribosome binding site sequences, for example, for initiating and completing transcription and translation.
Other microbes, such as yeast, may also be used for expression. Saccharomyces can be a suitable host, with suitable vectors having expression control sequences, such as promoters, including 3-phosphoglycerate kinase or other glycolytic enzymes, and an origin of replication, termination sequences, etc. as desired.
In addition to microorganisms, mammalian tissue cell culture may also be used to express and produce the polypeptides of the present invention. Eukaryotic cells are actually preferred, because a number of suitable host cell lines capable of secreting intact human proteins have been developed in the art, and include the CHO cell lines, various COS cell lines, HeLa cells, myeloma cell lines, Jurkat cells, and so forth. Expression vectors for these cells can include expression control sequences, such as an origin of replication, a promoter, an enhancer, and necessary information processing sites, such as ribosome binding sites, RNA splice sites, polyadenylation sites, and transcriptional terminator sequences. Preferred expression control sequences are promoters derived from immunoglobulin genes, SV40, Adenovirus, Bovine Papilloma Virus, and so forth. The vectors containing the DNA segments of interest (e.g., polynucleotides encoding a variant polypeptide) can be transferred into the host cell by well-known methods, which vary depending on the type of cellular host. For example, calcium chloride transfection is commonly utilized for prokaryotic cells, whereas calcium phosphate treatment or electroporation may be used for other cellular hosts.
The method lends itself readily to the formulation of test kits for use in diagnosis. Such a kit would comprise a carrier compartmentalized to receive in close confinement one or more containers wherein a first container may contain suitably labeled DNA or immunological probes. Other containers may contain reagents useful in the localization of the labeled probes, such as enzyme substrates. Still other containers may contain restriction enzymes, buffers etc., together with instructions for use.
Therapeutic Treatment of Hypertension:
Suppressing the effects of the mutations through antisense or short interfering RNA (siRNA) technology provides an effective therapy for Hypertension. Much is known about “antisense” or siRNA therapies targeting messenger RNA (mRNA) or nuclear DNA (Hélène et al., Biochim. Biophys. Acta 1049:99-125 (1990)). The diagnostic test of the present invention is useful for determining which of the specific Hypertension mutations exist in a particular Hypertension patient; this allows for “custom” treatment of the patient with antisense or siRNA oligonucleotides only for the detected mutations. This patient-specific antisense therapy is also novel, and minimizes the exposure of the patient to any unnecessary antisense or siRNA therapeutic treatment. As used herein, an “antisense” oligonucleotide is one that base pairs with single stranded DNA or RNA by Watson-Crick base pairing and with duplex target DNA via Hoogsteen hydrogen bonds.
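For the Watson-Crick case, the sequence of an antisense DNA oligonucleotide is simply the reverse complement of the targeted mRNA segment. The sketch below illustrates this; the function name and the example sequence are hypothetical, and triplex (Hoogsteen) targeting is not modeled.

```python
# DNA base that pairs with each mRNA base (Watson-Crick).
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_dna(mrna):
    """Return the DNA oligonucleotide (5'->3') complementary to an mRNA segment (5'->3')."""
    return "".join(_COMPLEMENT[base] for base in reversed(mrna.upper()))


print(antisense_dna("AUGGCU"))   # AGCCAT
```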
RNA interference refers to the process of sequence-specific post-transcriptional gene silencing in animals mediated by short interfering RNAs (siRNAs). The process of post-transcriptional gene silencing is an evolutionarily conserved cellular defense mechanism believed to prevent the expression of foreign genes. Such protection from foreign gene expression may have evolved in response to the production of double-stranded RNAs (dsRNAs) derived from viral infection, or from the random integration of transposon elements into the host genome. The presence of dsRNA in cells triggers the RNAi response through a mechanism that has yet to be fully characterized. The presence of long dsRNAs in cells stimulates the activity of a ribonuclease III enzyme referred to as dicer.
Dicer is involved in the processing of the dsRNA into short pieces of dsRNA known as short interfering RNAs (siRNAs). Short interfering RNAs derived from dicer activity are typically about 21 to about 23 nucleotides in length and comprise about 19 base pair duplexes. The RNAi response also uses an endonuclease complex, commonly referred to as an RNA-induced silencing complex (RISC). RISC mediates cleavage of single-stranded RNA having sequence complementarity to the antisense strand associated with the complex. RNA interference (RNAi) has been harnessed in laboratory cell culture systems and widely applied to identify the function of genes and their respective proteins. Moreover, RNAi holds promise for the development of a brand new class of drugs, capable of turning off disease-causing genes. These drugs could have specificity and potential applications in a number of therapeutic indications. For more detail see U.S. Pat. No. 5,854,038 entitled ‘Localization Of Therapeutic Agent In A Cell In Vitro’.
Another preferred methodology uses DNA-directed RNA interference (ddRNAi). ddRNAi relies on RNA polymerase III (Pol III) promoters (e.g., U6 or H1) for the expression of siRNA target sequences that have been transfected into mammalian cells.
Pol III directs the synthesis of small RNA transcripts whose 3′ ends are defined by termination within a stretch of 4-5 thymidines. These characteristics allow for the use of DNA templates to synthesize, in vivo, small RNA duplexes that are structurally equivalent to active siRNAs synthesized in vitro.
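The template layout implied above (sense strand, loop, antisense strand, then a run of thymidines that terminates Pol III transcription) can be sketched in Python. This is a minimal illustration only: the function name `shrna_template`, the nine-base loop `TTCAAGAGA`, and the example target are assumptions chosen for the sketch and are not taken from the present disclosure.

```python
# Hypothetical helper: assemble a DNA template for a short hairpin RNA
# to be expressed from a Pol III promoter (e.g. U6 or H1).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def shrna_template(target: str, loop: str = "TTCAAGAGA",
                   terminator_len: int = 5) -> str:
    """Build sense + loop + antisense + T-run (Pol III terminator)."""
    target = target.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    if terminator_len not in (4, 5):
        raise ValueError("terminator is a stretch of 4-5 thymidines")
    hairpin = target + loop.upper() + reverse_complement(target)
    # A run of four or more Ts inside the hairpin would end transcription early.
    if "TTTT" in hairpin:
        raise ValueError("hairpin contains a premature Pol III terminator")
    return hairpin + "T" * terminator_len


if __name__ == "__main__":
    print(shrna_template("GCAGCACGACTTCTTCAAG"))
```

The check for an internal `TTTT` run reflects the termination behavior described above; the loop sequence and stem length would need to be selected for the particular expression vector.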
siRNA/RISC duplexes form in the cell and lead to the degradation of the target mRNA. siRNA target sequences can be introduced into the cell by a ddRNAi expression cassette or by being cloned into an siRNA expression vector. For more detail see Gou, D. et al. (2003) Gene silencing in mammalian cells by PCR-based short hairpin RNA. FEBS Lett. 548:113-118.
The destructive effect of the Hypertension mutations in AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) is preferably reduced or eliminated using antisense or siRNA oligonucleotide agents. Such antisense agents target DNA, by triplex formation with double-stranded DNA, by duplex formation with single-stranded DNA during transcription, or both. In a preferred embodiment, antisense agents target messenger RNA coding for the mutated AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s). Since the sequences of both the DNA and the mRNA are the same, it is not necessary to determine accurately the precise target to account for the desired effect. Procedures for inhibiting gene expression in cell culture and in vivo can be found, for example, in C. F. Bennett, et al. J. Liposome Res., 3:85 (1993) and C. Wahlestedt, et al. Nature, 363:260 (1993).
Antisense oligonucleotide therapeutic agents demonstrate a high degree of pharmaceutical specificity. This allows the combination of two or more antisense therapeutics at the same time, without increased cytotoxic effects. Thus, when a patient is diagnosed as having two or more Hypertension mutations in AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s), the therapy is preferably tailored to treat the multiple mutations simultaneously. When combined with the present diagnostic test, this approach to “patient-specific therapy” results in treatment restricted to the specific mutations detected in a patient. This patient-specific therapy circumvents the need for ‘broad spectrum’ antisense treatment using all possible mutations. The end result is less costly treatment, with less chance for toxic side effects.
One method to inhibit the synthesis of proteins is through the use of antisense or triplex oligonucleotides, analogues or expression constructs. These methods entail introducing into the cell a nucleic acid sufficiently complementary in sequence so as to specifically hybridize to the target gene or to mRNA. In the event that the gene is targeted, these methods can be extremely efficient since only a few copies per cell are required to achieve complete inhibition. Antisense methodology inhibits the normal processing, translation or half-life of the target message. Such methods are well known to one skilled in the art.
Antisense and triplex methods generally involve the treatment of cells or tissues with a relatively short oligonucleotide, although longer sequences can be used to achieve inhibition. The oligonucleotide can be either deoxyribo- or ribonucleic acid and must be of sufficient length to form a stable duplex or triplex with the target RNA or DNA at physiological temperatures and salt concentrations. It should also be sufficiently complementary or sequence specific to specifically hybridize to the target nucleic acid. Oligonucleotide lengths sufficient to achieve this specificity are preferably about 10 to 60 nucleotides long, more preferably about 10 to 20 nucleotides long. However, hybridization specificity is not only influenced by length and physiological conditions but may also be influenced by such factors as GC content and the primary sequence of the oligonucleotide. Such principles are well known in the art and can be routinely determined by one who is skilled in the art.
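The length and GC-content considerations above can be illustrated with a short Python sketch. The function names are hypothetical, and the melting-temperature estimate uses the Wallace rule (2° C. per A/T and 4° C. per G/C), which is only a rough guide for short oligonucleotides and is an assumption of this sketch rather than a method stated in the disclosure.

```python
# Hypothetical screening helpers for candidate antisense oligonucleotides.

def gc_fraction(oligo: str) -> float:
    """Fraction of G and C bases in the oligonucleotide."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T") + oligo.count("U")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def screen(oligo: str, min_len: int = 10, max_len: int = 60) -> dict:
    """Summarize a candidate against the preferred 10-60 nucleotide range."""
    return {
        "length": len(oligo),
        "length_ok": min_len <= len(oligo) <= max_len,
        "gc_fraction": gc_fraction(oligo),
        "wallace_tm_c": wallace_tm(oligo),
    }


if __name__ == "__main__":
    print(screen("ATGCATGCATGCATGC"))
```

In practice the hybridization temperature would still be determined empirically for each probe, as the text notes.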
As an example, many of the oligonucleotide sequences used in connection with probes can also be used as antisense agents, directed to either the DNA or resultant messenger RNA.
A great range of antisense sequences can be designed for a given mutation. Oligonucleotide sequences can be easily designed by one of ordinary skill in the art to function as RNA and DNA antisense sequences for the mutant genes AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s).
As can be seen, permutations can be generated for a selected mutant antigene by truncating the 5′ end, truncating the 3′ end, extending the 5′ end, or extending the 3′ end. Both light chain and heavy chain mtDNA can be targeted. Other variations such as truncating the 5′ end and truncating the 3′ end, extending the 5′ end and extending the 3′ end, and truncating the 5′ end and extending the 3′ end, extending the 5′ end and truncating the 3′ end, and so forth are possible.
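The truncation and extension permutations described above can be enumerated programmatically. The following Python sketch is illustrative only; `antisense_variants`, the shift limit, and the example sequence are assumptions. Positions are given on the target strand, so moving the left edge of the target window changes the 3′ end of the antisense oligonucleotide and moving the right edge changes its 5′ end.

```python
# Hypothetical enumeration of truncated/extended antisense candidates.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def antisense_variants(target: str, start: int, end: int,
                       max_shift: int = 2, min_len: int = 10) -> list:
    """List (start, end, antisense) for every window within max_shift bases.

    Negative shifts of `start` or positive shifts of `end` extend the window;
    the opposite shifts truncate it. Windows outside the target or shorter
    than min_len are skipped.
    """
    variants = []
    for d_start in range(-max_shift, max_shift + 1):
        for d_end in range(-max_shift, max_shift + 1):
            s, e = start + d_start, end + d_end
            if s < 0 or e > len(target) or e - s < min_len:
                continue
            variants.append((s, e, reverse_complement(target[s:e])))
    return variants


if __name__ == "__main__":
    example = "ATGGCCTTAGCAGGTCCATGACGTTAGCCA"  # arbitrary illustrative sequence
    for s, e, oligo in antisense_variants(example, 10, 25, max_shift=1):
        print(s, e, oligo)
```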
The composition of the antisense or triplex oligonucleotides can also influence the efficiency of inhibition. For example, it is preferable to use oligonucleotides that are resistant to degradation by the action of endogenous nucleases. Nuclease resistance will confer a longer in vivo half-life to the oligonucleotide thus increasing its efficacy and reducing the required dose. Greater efficacy may also be obtained by modifying the oligonucleotide so that it is more permeable to cell membranes. Such modifications are well known in the art and include the alteration of the negatively charged phosphate backbone bases, or modification of the sequences at the 5′ or 3′ terminus with agents such as intercalators and crosslinking molecules. Specific examples of such modifications include oligonucleotide analogs that contain methylphosphonate (Miller, P. S., Biotechnology, 2:358-362 (1991)), phosphorothioate (Stein, Science 261:1004-1011 (1993)) and phosphorodithioate linkages (Brill, W. K-D., J. Am. Chem. Soc., 111:2322 (1989)). Other types of linkages and modifications exist as well, such as a polyamide backbone in peptide nucleic acids (Nielson et al., Science 254:1497 (1991)), formacetal (Matteucci, M., Tetrahedron Lett. 31:2385-2388 (1990)) carbamate and morpholine linkages as well as others known to those skilled in the art. In addition to the specificity afforded by the antisense agents, the target RNA or genes can be irreversibly modified by incorporating reactive functional groups in these molecules which covalently link the target sequences e.g. by alkylation.
Recombinant methods known in the art can also be used to achieve the antisense or triplex inhibition of a target nucleic acid. For example, vectors containing antisense nucleic acids can be employed to express protein or antisense message to reduce the expression of the target nucleic acid and therefore its activity. Such vectors are known or can be constructed by those skilled in the art and should contain all expression elements necessary to achieve the desired transcription of the antisense or triplex sequences. Other beneficial characteristics can also be contained within the vectors such as mechanisms for recovery of the nucleic acids in a different form. Phagemids are a specific example of such beneficial vectors because they can be used either as plasmids or as bacteriophage vectors. Examples of other vectors include viruses, such as bacteriophages, baculoviruses and retroviruses, cosmids, plasmids, liposomes and other recombination vectors. The vectors can also contain elements for use in either procaryotic or eukaryotic host systems. One of ordinary skill in the art will know which host systems are compatible with a particular vector.
The vectors can be introduced into cells or tissues by any one of a variety of known methods within the art. Such methods are described for example in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York (1992), which is hereby incorporated by reference, and in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, Baltimore, Md. (1989), which is also hereby incorporated by reference. The methods include, for example, stable or transient transfection, lipofection, electroporation and infection with recombinant viral vectors. Introduction of nucleic acids by infection offers several advantages over the other listed methods which includes their use in both in vitro and in vivo settings. Higher efficiency can also be obtained due to their infectious nature. Moreover, viruses are very specialized and typically infect and propagate in specific cell types. Thus, their natural specificity can be used to target the antisense vectors to specific cell types in vivo or within a tissue or mixed culture of cells. Viral vectors can also be modified with specific receptors or ligands to alter target specificity through receptor mediated events.
A specific example of a viral vector for introducing and expressing antisense nucleic acids is the adenovirus-derived vector Adenop53TK. This vector expresses a herpes virus thymidine kinase (TK) gene for either positive or negative selection and an expression cassette for desired recombinant sequences such as antisense sequences. This vector can be used to infect cells including most cancers of epithelial origin, glial cells and other cell types. This vector as well as others that exhibit similar desired functions can be used to treat a mixed population of cells to selectively express the antisense sequence of interest. A mixed population of cells can include, for example, in vitro or ex vivo culture of cells, a tissue or a human subject.
Additional features may be added to the vector to ensure its safety and/or enhance its therapeutic efficacy. Such features include, for example, markers that can be used to negatively select against cells infected with the recombinant virus. An example of such a negative selection marker is the TK gene described above, which confers sensitivity to the antiviral drug ganciclovir. Negative selection is therefore a means by which infection can be controlled because it provides inducible suicide through the addition of the drug. Such protection ensures that if, for example, mutations arise that produce mutant forms of the viral vector or antisense sequence, cellular transformation will not occur. Moreover, features that limit expression to particular cell types can also be included. Such features include, for example, promoter and expression elements that are specific for the desired cell type.
The foregoing and following description of the invention and the various embodiments is not intended to be limiting of the invention but rather is illustrative thereof. Those skilled in the art of molecular genetics can formulate further embodiments encompassed within the scope of the present invention.
1× SSC=150 mM sodium chloride, 15 mM sodium citrate, pH 6.5-8
SDS=sodium dodecyl sulfate
BSA=bovine serum albumin, fraction IV
probe=a labelled nucleic acid, generally a single-stranded oligonucleotide, which is complementary to the DNA target immobilized on the membrane. The probe may be labelled with radioisotopes (such as ³²P), haptens (such as digoxigenin), biotin, enzymes (such as alkaline phosphatase or horseradish peroxidase), fluorophores (such as fluorescein or Texas Red), or chemilumiphores (such as acridine).
PCR=polymerase chain reaction, as described by Erlich et al., Nature 331:461-462 (1988) hereby incorporated by reference.
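As a worked illustration of the SSC definition given above, the following Python sketch computes the salt concentrations of an n× SSC solution and the volumes needed to dilute a concentrated stock. The function names and the 20× stock strength are assumptions made for the example.

```python
# Hypothetical buffer arithmetic based on 1x SSC = 150 mM NaCl, 15 mM sodium citrate.
NACL_MM_AT_1X = 150.0
CITRATE_MM_AT_1X = 15.0


def ssc_composition(strength: float) -> tuple:
    """Return (NaCl mM, sodium citrate mM) for an n-fold SSC solution."""
    return (NACL_MM_AT_1X * strength, CITRATE_MM_AT_1X * strength)


def dilution_volumes(stock_strength: float, target_strength: float,
                     final_volume_ml: float) -> tuple:
    """Return (stock ml, water ml) to prepare final_volume_ml at target strength."""
    if target_strength > stock_strength:
        raise ValueError("target cannot be more concentrated than the stock")
    stock_ml = final_volume_ml * target_strength / stock_strength
    return (stock_ml, final_volume_ml - stock_ml)


if __name__ == "__main__":
    print(ssc_composition(10))            # 10x SSC: 1.5 M NaCl, 0.15 M citrate
    print(dilution_volumes(20, 1, 1000))  # 1 litre of 1x from an assumed 20x stock
```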
Sequencing of AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s)
Plasmid DNA containing the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene insert(s) is isolated using the Plasmid Quik™ Plasmid Purification Kit (Stratagene, San Diego, Calif.) or the Plasmid Kit (Qiagen, Chatsworth, Calif., Catalog #12145). Plasmid DNA is purified from 50 ml bacterial cultures. For the Stratagene protocol “Procedure for Midi Columns,” steps 10-12 of the kit protocol are replaced with a precipitation step using 2 volumes of 100% ethanol at −20° C., centrifugation at 6,000× g for 15 minutes, a wash step using 80% ethanol and resuspension of the DNA sample in 100 µl TE buffer. DNA concentration is determined by horizontal agarose gel electrophoresis, or by UV absorption at 260 nm.
Sequencing reactions using double-stranded plasmid DNA are performed using the Sequenase Kit (United States Biochemical Corp., Cleveland, Ohio; catalog #70770), the BaseStation T7 Kit (Millipore Corp.; catalog #MBBLSEQ01), the Vent Sequencing Kit (Millipore Corp.; catalog #MBBLVEN01), the AmpliTaq Cycle Sequencing Kit (Perkin Elmer Corp.; catalog #N808-0110) and the Taq DNA Sequencing Kit (Boehringer Mannheim). The DNA sequences are detected by fluorescence using the BaseStation Automated DNA Sequencer (Millipore Corp.). For gene walking experiments, fluorescent oligonucleotide primers are synthesized on the Cyclone Plus DNA Synthesizer (Millipore Corp.) or the GeneAssembler DNA Synthesizer (Pharmacia LKB Biotechnology, Inc.) utilizing beta-cyanoethylphosphoramidite chemistry. Primer sequences are prepared from the published Cambridge sequences of the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) by using public reference sources such as http://www.snpperchip.org. Primers are deprotected and purified as described above. DNA concentration is determined by UV absorption at 260 nm.
Sequencing reactions are performed according to manufacturer's instructions except for the following modifications: 1) the reactions are terminated and reduced in volume by heating the samples without capping to 94° C. for 5 minutes, after which 4 µl of stop dye (3 mg/ml dextran blue, 95%-99% formamide; as formulated by Millipore Corp.) are added; 2) the temperature cycles performed for the AmpliTaq Cycle Sequencing Kit reactions, the Vent Sequencing Kit reactions, and the Taq DNA Sequencing Kit reactions consist of one cycle at 95° C. for 10 seconds, 30 cycles at 95° C. for 20 seconds, at 44° C. for 20 seconds and at 72° C. for 20 seconds, followed by a reduction in volume by heating without capping to 94° C. for 5 minutes before adding 4 µl of stop dye.
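The cycle-sequencing temperature profile in modification 2) can be written down as a small data structure, which makes the total programmed hold time easy to check. This Python sketch is illustrative; the class and function names are hypothetical, ramp times between temperatures are ignored, and the final 94° C. volume-reduction step is left out because it is performed uncapped after cycling.

```python
# Hypothetical representation of the thermal cycling program described above.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in degrees C
    seconds: int    # hold time in seconds


@dataclass(frozen=True)
class Stage:
    cycles: int     # number of times the steps are repeated
    steps: tuple    # ordered Step objects


PROGRAM = (
    Stage(1, (Step(95, 10),)),                                # initial denaturation
    Stage(30, (Step(95, 20), Step(44, 20), Step(72, 20))),    # 30 amplification cycles
)


def total_hold_seconds(program) -> int:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in program)


if __name__ == "__main__":
    print(total_hold_seconds(PROGRAM), "seconds of programmed hold time")
```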
Electrophoresis and gel analysis are performed using the BioImage and BaseStation Software provided by the manufacturer for the BaseStation Automated DNA Sequencer (Millipore Corp.). Sequencing gels are prepared according to the manufacturer's specifications. An average of ten different clones from each individual is sequenced. The resulting AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) sequences are aligned and compared with published Cambridge sequences. Mutations in the derived sequence are noted and confirmed by resequencing the variant region.
As an alternative procedure for sequencing the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s), plasmid DNA containing the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene insert(s) is isolated using the Plasmid Quik™ Plasmid Purification Kit with Midi Columns (Qiagen, Chatsworth, Calif.). Plasmid DNA is purified from 35 ml bacterial cultures. The isolated DNA is resuspended in 100 µl TE buffer. DNA concentrations are determined by OD (260) absorption.
As an alternative method, sequencing reactions using double-stranded plasmid DNA are performed using the Prism™ Ready Reaction DyeDeoxy™ Terminator Cycle Sequencing Kit (Applied Biosystems, Inc., Foster City, Calif.). The DNA sequences are detected by fluorescence using the ABI 373A Automated DNA Sequencer (Applied Biosystems, Inc., Foster City, Calif.). For gene walking experiments, oligonucleotide primers are synthesized on the ABI 394 DNA/RNA Synthesizer (Applied Biosystems, Inc., Foster City, Calif.) using standard beta-cyanoethylphosphoramidite chemistry. Primer sequences are prepared from the published Cambridge sequences of the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s).
Sequencing reactions are performed according to the manufacturer's instructions. Electrophoresis and sequence analysis are performed using the ABI 373A Data Collection and Analysis Software and the Sequence Navigator Software (ABI, Foster City, Calif.). Sequencing gels are prepared according to the manufacturer's specifications. An average of ten different clones from each individual is sequenced. The resulting AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) sequences are aligned and compared with the published Cambridge sequence. Mutations in the derived sequence are noted and confirmed by sequencing the complementary DNA strand.
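The comparison of clone sequences against a reference sequence can be sketched as follows. This is a simplified Python illustration, not the software named above: it assumes the clone reads have already been aligned to the reference and have the same length (no insertions or deletions), and the function name and the 50% reporting threshold are hypothetical.

```python
# Hypothetical comparison of pre-aligned clone sequences with a reference.
from collections import Counter


def call_variants(reference: str, clones: list, min_fraction: float = 0.5) -> list:
    """Return (position, ref_base, alt_base, clone_count) for each substitution.

    Positions are 1-based. A substitution is reported only when it appears in
    at least min_fraction of the clones.
    """
    if any(len(clone) != len(reference) for clone in clones):
        raise ValueError("clones must be pre-aligned to the reference length")
    variants = []
    for i, ref_base in enumerate(reference):
        counts = Counter(clone[i] for clone in clones)
        for base, n in counts.most_common():
            if base != ref_base and n / len(clones) >= min_fraction:
                variants.append((i + 1, ref_base, base, n))
    return variants


if __name__ == "__main__":
    ref = "ACGTACGT"
    reads = ["ACGTACGT", "ACGAACGT", "ACGAACGT"]
    print(call_variants(ref, reads))
```

Any position reported this way would still be confirmed by sequencing the complementary strand, as described above.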
Mutations in each AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) for each individual are compiled. Comparisons of mutations between normal and Hypertension patients are made and an algorithm, described below, is used to provide diagnostic or prognostic prediction.
Detection of AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) Mutations by Hybridization Without Prior Amplification
This example illustrates taking a test sample of blood, blotting the DNA, and detecting by oligonucleotide hybridization in a dot blot format. This example uses two probes to determine the presence of the abnormal mutations of the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) in DNA of Hypertension patients. This example utilizes a dot-blot format for hybridization; however, other known hybridization formats, such as Southern blots, slot blots, “reverse” dot blots, solution hybridization, solid support based sandwich hybridization, bead-based, silicon chip-based and microtiter well-based hybridization formats can also be used.
Sample Preparation Extracts and Blotting of DNA onto Membranes:
Whole blood is taken from the patient. The blood is mixed with an equal volume of 0.5-1 N NaOH, and is incubated at ambient temperature for ten to twenty minutes to lyse cells, degrade proteins, and denature any DNA. The mixture is then blotted directly onto prewashed nylon membranes, in multiple aliquots. The membranes are rinsed in 10× SSC (1.5 M NaCl, 0.15 M Sodium Citrate, pH 7.0) for five minutes to neutralize the membrane, then rinsed for five minutes in 1× SSC. For storage, if any, membranes are air-dried and sealed. In preparation for hybridization, membranes are rinsed in 1× SSC, 1% SDS.
Alternatively, 1-10 ml of whole blood is fractionated by standard methods, and the white cell layer (“buffy coat”) is separated. The white cells are lysed, digested, and the DNA extracted by conventional methods (organic extraction, non-organic extraction, or solid phase). The DNA is quantitated by UV absorption or fluorescent dye techniques. Standardized amounts of DNA (0.1-5 µg) are denatured in base, and blotted onto membranes. The membranes are then rinsed.
Alternative methods of preparing cellular DNA, such as isolation of DNA by mild cellular lysis and centrifugation, may also be used.
Hybridization and Detection:
For examples of synthesis, labelling, use, and detection of oligonucleotide probes, see “Oligonucleotides and Analogues: A Practical Approach”, F. Eckstein, ed., Oxford University Press (1992); and “Synthetic Chemistry of Oligonucleotides and Analogs”, S. Agrawal, ed., Humana Press (1993), which are incorporated herein by reference.
For detection and quantitation of the abnormal mutation, membranes containing duplicate samples of DNA are hybridized in parallel; one membrane is hybridized with the wild-type probe, the other with the Hypertension gene probe. Alternatively, the same membrane can be hybridized sequentially with both probes and the results compared.
For example, the membranes with immobilized DNA are hydrated briefly (10-60 minutes) in 1× SSC, 1% SDS, then prehybridized and blocked in 5× SSC, 1% SDS, 0.5% casein, for 30-60 minutes at hybridization temperature (35-60° C., depending on which probe is used). Fresh hybridization solution containing probe (0.1-10 nM, ideally 2-3 nM) is added to the membrane, followed by hybridization at appropriate temperature for 15-60 minutes. The membrane is washed in 1× SSC, 1% SDS, 1-3 times at 45-60° C. for 5-10 minutes each (depending on probe used), then 1-2 times in 1× SSC at ambient temperature. The hybridized probe is then detected by appropriate means.
The average proportion of Hypertension AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) to wild-type gene(s) in the same patient can be determined by the ratio of the signal of the Hypertension probe to the normal probe. This is a semiquantitative measure of % heteroplasmy in the Hypertension patient and can be correlated to the severity of the disease.
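The signal-ratio calculation described above can be sketched in Python. The function names, the background-subtraction step, and the example intensities are assumptions of this sketch; the disclosure itself specifies only that the ratio of the Hypertension-probe signal to the normal-probe signal is used.

```python
# Hypothetical semiquantitative estimate from two probe signals.

def signal_ratio(mutant_signal: float, wild_type_signal: float,
                 background: float = 0.0) -> float:
    """Ratio of background-corrected mutant signal to wild-type signal."""
    mutant = mutant_signal - background
    wild_type = wild_type_signal - background
    if wild_type <= 0:
        raise ValueError("wild-type signal must exceed background")
    return max(mutant, 0.0) / wild_type


def mutant_fraction(mutant_signal: float, wild_type_signal: float,
                    background: float = 0.0) -> float:
    """Mutant signal as a fraction of the total corrected signal."""
    mutant = max(mutant_signal - background, 0.0)
    wild_type = max(wild_type_signal - background, 0.0)
    total = mutant + wild_type
    if total <= 0:
        raise ValueError("no signal above background")
    return mutant / total


if __name__ == "__main__":
    # Arbitrary illustrative intensities (e.g. densitometry units).
    print(signal_ratio(300, 900, background=100))
    print(mutant_fraction(300, 900, background=100))
```

This assumes both probes hybridize and are detected with comparable efficiency; otherwise the signals would need to be calibrated against standards before comparison.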
The above and other probes for detection and quantitation of wild-type and mutant DNA samples can be found at http://www.snpper.chip.org by entering the RS numbers of the relevant mutations.
Detection of AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) Mutations by Hybridization (Without Prior Amplification)
A. Slot-Blot Detection of RNA/DNA with ³²P Probes
This example illustrates detection of AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) mutations by slot-blot detection of DNA with ³²P probes. The reagents are prepared as follows: 4× BP: 2% (w/v) bovine serum albumin (BSA) and 2% (w/v) polyvinylpyrrolidone (PVP, Mol. Wt.: 40,000) are dissolved in sterile H₂O, filtered through 0.22-µm cellulose acetate membranes (Corning) and stored at −20° C. in 50-ml conical tubes.
DNA is denatured by adding TE to the sample for a final volume of 90 µl. 10 µl of 2 N NaOH is then added and the sample vortexed, incubated at 65° C. for 30 minutes, and then put on ice. The sample is neutralized with 100 µl of 2 M ammonium acetate.
A wet piece of nitrocellulose or nylon is cut to fit the slot-blot apparatus according to the manufacturer's directions, and the denatured samples are loaded. The nucleic acids are fixed to the filter by baking at 80° C. under vacuum for 1 hr or exposing to UV light (254 nm). The filter is prehybridized for 10-30 minutes in 5 ml of 1× BP, 5× SSPE, 1% SDS at the temperature to be used for the hybridization incubation. For 15-30-base probes, the range of hybridization temperatures is between 35-60° C. For shorter probes or probes with low G-C content, a lower temperature is used. At least 2×10⁶ cpm of detection oligonucleotide per ml of hybridization solution is added. The filter is double sealed in Scotchpak™ heat sealable pouches (Kapak Corporation) and incubated for 90 min. The filter is washed 3 times at room temperature with 5-minute washes of 20× SSPE (3 M NaCl, 0.02 M EDTA, 0.2 M sodium phosphate, pH 7.4), 1% SDS on a platform shaker. For higher stringency, the filter can be washed once at the hybridization temperature in 1× SSPE, 1% SDS for 1 minute. Visualization is by autoradiography on Kodak XAR film at −70° C. with an intensifying screen. The amount of target is estimated by visual comparison with hybridization standards of known concentration.
B. Detection of RNA/DNA by Slot-Blot Analysis with Alkaline Phosphatase-Oligonucleotide Conjugate Probes
This example illustrates detection of AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) mutations by slot-blot detection of DNA with alkaline phosphatase-oligonucleotide conjugate probes, using either a color reagent or a chemiluminescent reagent. The reagents are prepared as follows:
For the color reagent, the following are freshly mixed together: 0.16 mg/ml 5-bromo-4-chloro-3-indolyl phosphate (BCIP), 0.17 mg/ml nitroblue tetrazolium (NBT) in 100 mM NaCl, 100 mM Tris-HCl, 5 mM MgCl₂ and 0.1 mM ZnCl₂, pH 9.5.
Chemiluminescent Reagent:
For the chemiluminescent reagent, the following are mixed together: 250 µM 3-adamantyl 4-methoxy 4-(2-phospho)phenyl dioxetane (AMPPD) (Tropix Inc., Bedford, Mass.) in 100 mM diethanolamine-HCl, 1 mM MgCl₂, pH 9.5, or preformulated dioxetane substrate Lumiphos™ 530 (Lumigen, Inc., Southfield, Mich.).
DNA target (0.01-50 fmol) is immobilized on a nylon membrane as described above. The nylon membrane is incubated in blocking buffer (0.2% I-Block (Tropix, Inc.), 0.5× SSC, 0.1% Tween 20) for 30 min. at room temperature with shaking. The filter is then prehybridized in hybridization solution (5× SSC, 0.5% BSA, 1% SDS) for 30 minutes at the hybridization temperature (37-60° C.) in a sealable bag using 50-100 µl of hybridization solution per cm² of membrane. The solution is removed and the membrane is briefly washed in warm hybridization buffer. The conjugate probe is then added to give a final concentration of 2-5 nM in fresh hybridization solution and a final volume of 50-100 µl/cm² of membrane. After incubating for 30 minutes at the hybridization temperature with agitation, the membrane is transferred to a wash tray containing 1.5 ml of preheated wash-1 solution (1× SSC, 0.1% SDS)/cm² of membrane and agitated at the wash temperature (usually optimum hybridization temperature minus 10° C.) for 10 minutes. Wash-1 solution is removed and this step is repeated once more. Then wash-2 solution (1× SSC) is added and the membrane is agitated at the wash temperature for 10 minutes. Wash-2 solution is removed and immediate detection is done by color.
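Because the hybridization and wash volumes above are specified per unit area of membrane (50-100 µl of hybridization solution and 1.5 ml of wash-1 solution per cm²), the working volumes for a given membrane follow from its dimensions. The Python sketch below is illustrative; the function name and the 8 cm by 12 cm membrane are assumptions.

```python
# Hypothetical volume calculator for a rectangular membrane.
HYB_UL_PER_CM2_MIN = 50.0    # lower bound of hybridization solution, microlitres per cm^2
HYB_UL_PER_CM2_MAX = 100.0   # upper bound of hybridization solution, microlitres per cm^2
WASH1_ML_PER_CM2 = 1.5       # preheated wash-1 solution, millilitres per cm^2


def membrane_volumes(width_cm: float, height_cm: float) -> dict:
    """Return membrane area and the solution volumes it calls for."""
    area = width_cm * height_cm
    return {
        "area_cm2": area,
        "hyb_ml_min": area * HYB_UL_PER_CM2_MIN / 1000.0,
        "hyb_ml_max": area * HYB_UL_PER_CM2_MAX / 1000.0,
        "wash1_ml": area * WASH1_ML_PER_CM2,
    }


if __name__ == "__main__":
    for name, value in membrane_volumes(8, 12).items():
        print(name, value)
```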
Detection by color is done by immersing the membrane fully in color reagent, and incubating at 20-37° C. until color development is adequate. When color development is adequate, the development is quenched by washing in water.
For chemiluminescent detection, the following wash steps are performed after the hybridization step (see above). The membrane is washed for 10 min. with wash-1 solution at room temperature, followed by two 3-5 min. washes at 50-60° C. with wash-3 solution (0.5× SSC, 0.1% SDS). The membrane is then washed once with wash-4 solution (1× SSC, 1% Triton X 100) at room temperature for 10 min., followed by a 10 min. wash at room temperature with wash-2 solution. The membrane is then rinsed briefly (~1 min.) with wash-5 solution (50 mM NaHCO₃/1 mM MgCl₂, pH 9.5).
Detection by chemiluminescence is done by immersing the membrane in luminescent reagent, using 25-50 µl solution/cm² of membrane. Kodak XAR-5 film (or equivalent; emission maximum is at 477 nm) is exposed in a light-tight cassette for 1-24 hours, and the film developed.
Detection of AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) Mutations by Amplification and Hybridization
This example illustrates taking a test sample of blood, preparing DNA, amplifying a section of a specific AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) by polymerase chain reaction (PCR), and detecting the mutation by oligonucleotide hybridization in a dot blot format.
Sample Preparation and Preparing of DNA:
Whole blood is taken from the patient. The blood is lysed, and the DNA prepared for PCR by using procedures described in Example III.
Amplification of Target AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS Gene(s) by Polymerase Chain Reaction, and Blotting onto Membranes:
The treated DNA from the test sample is amplified using procedures described in Example 1. After amplification, the DNA is denatured, and blotted directly onto prewashed nylon membranes, in multiple aliquots. The membranes are rinsed in 10× SSC for five minutes to neutralize the membrane, then rinsed for five minutes in 1× SSC. For storage, if any, membranes are air-dried and sealed. In preparation for hybridization, membranes are rinsed in 1× SSC, 1% SDS.
Hybridization and Detection:
Hybridization and detection of the amplified genes are accomplished as detailed in Example V.
Although the invention has been described with reference to the disclosed embodiments, those skilled in the art will readily appreciate that the specific examples provided herein are only illustrative of the invention and not limitative thereof. It should be understood that various modifications can be made without departing from the scope of the invention.
Standard manufacturer protocols for solid phase phosphoramidite-based DNA or RNA synthesis using an ABI DNA synthesizer are employed to prepare antisense oligomers. Phosphoramidite reagent monomers (T, C, A, G, and U) are used as received from the supplier (Applied Biosystems Division/Perkin Elmer, Foster City, Calif.). For routine oligomer synthesis, 1 µmole scale synthesis reactions are carried out utilizing THF/I₂/lutidine for oxidation of the phosphoramidite and Beaucage reagent for preparation of the phosphorothioate oligomers. Cleavage from the solid support and deprotection are carried out using ammonium hydroxide under standard conditions. Purification is carried out via reverse phase HPLC, and quantification and identification are performed by UV absorption measurements at 260 nm and by mass spectrometry.
Inhibition of Mutant DNA in Cell Culture
Antisense phosphorothioate oligomer complementary to the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene mutation(s), and thus non-complementary to wild-type AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene RNA(s), respectively, is added to fresh medium containing Lipofectin® (Gibco BRL, Gaithersburg, Md.) at a concentration of 10 µg/ml to make final concentrations of 0.1, 0.33, 1, 3.3, and 10 µM. These are incubated for 15 minutes then applied to the cell culture. The culture is allowed to incubate for 24 hours and the cells are harvested and the DNA isolated and sequenced as in previous examples. Quantitative analysis results show a decrease in mutant AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) DNA(s) to a level of less than 1% of total AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s), respectively.
The antisense phosphorothioate oligomer non-complementary to the AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) gene mutation(s) and non-complementary to wild-type AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s), respectively is added to fresh medium containing lipofectin at a concentration of 10 .mu.g/mL to make final concentrations of 0. 1, 0.33, 1, 3.3, and 10 .mu.M. These are incubated for 15 minutes then applied to the cell culture. The culture is allowed to incubate for 24 hours and the cells are harvested and the DNA isolated and sequenced as in previous examples. Quantitative analysis results showed no decrease in mutant AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS gene(s) DNA, respectively.
Inhibition of Mutant DNA In Vivo
Mice are divided into six groups of 10 animals per group. The animals are housed and fed as per standard protocols. To groups 1 to 4 is administered ICV antisense phosphorothioate oligonucleotide, prepared as described in Example V, complementary to mutant AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS RNA(s), at 0.1, 0.33, 1.0, and 3.3 nmol, respectively, each in 5 μL. To group 5 is administered ICV 1.0 nmol in 5 μL of phosphorothioate oligonucleotide non-complementary to both mutant AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS RNA(s) and wild-type AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS RNA(s), respectively. To group 6 is administered ICV vehicle only. Dosing is performed once a day for ten days. The animals are sacrificed and samples of relevant tissue collected. This tissue is treated as previously described and the DNA isolated and quantitatively analyzed as in previous examples. Results show a decrease in mutant AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS DNA to a level of less than 1% of total AGT, ACE, AGTR1, GPB, EDN1, EDN2, alpha-adducin, haptoglobin, CYP2C9, RGS2, ADRA1a, 11betaHSD2, ADRA1b, ADRA2A, ADRAB1, ADRAB2, REN, APOA, APOB, CETP, LIPC, EDNRB, or ENOS DNA for the antisense-treated groups and no decrease for the control groups.
Algorithmic Methodology for Determining Relevance of Combinations of Mutations to Hypertension Diagnosis or Prognosis
As was mentioned previously, mutations might combine in non-linear fashion in determining their effect on diagnosis and prognosis. The present invention demonstrates this as well. A previous example showed that, using a trained learning algorithm of the neural network and support vector machine type, an average predictability rate of 80% could be achieved in a population that the trained algorithm had never seen before, i.e., an evaluation population.
It is well known to those of ordinary skill in the art that predictive algorithms are tested at three levels, each providing stronger validation than the last: how well the algorithm does on data it has learned from, called a training population; how well the algorithm does on data that is similar to the original dataset but was not used for training, called a testing population; and how well the algorithm does on data it has never seen before, called an evaluation population. A notable feature of the present invention is its level of predictability in an evaluation population, which indicates its generalizability to a larger population.
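The three populations described above can be illustrated with a minimal sketch. The function name `split_population`, the 60/20/20 proportions, and the random seed are assumptions chosen for illustration; the specification does not state how the populations were actually partitioned.

```python
import random


def split_population(patient_ids, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle patient IDs and split them into three disjoint populations.

    The fractions and the seed are illustrative assumptions, not values
    taken from the specification.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_frac)
    n_test = round(len(ids) * test_frac)
    training = ids[:n_train]                 # data the algorithm learns from
    testing = ids[n_train:n_train + n_test]  # similar data, not trained on
    evaluation = ids[n_train + n_test:]      # held back until the very end
    return training, testing, evaluation


training, testing, evaluation = split_population(range(100))
print(len(training), len(testing), len(evaluation))
```

Keeping the evaluation population untouched until model selection is finished is what allows its score to be read as an estimate of generalizability.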
It is therefore important to realize that, for marker data to be interpreted into a clinical result, an algorithm must be used to determine the individual contribution each marker makes to the phenotype of interest.
As with identification of the pertinent alleles in the first instance, an algorithm is both (i) selected and (ii) trained to relate (a) identified pre-selected markers and/or characteristics of SNP patterns (as selectively appear in the genomic sequences of each of a large number of historical patients) with (b) the clinical histories of the response of these patients to some particular disease (e.g., breast cancer) in consideration of therapies applied, most commonly drugs. As before, (i) selecting and (ii) training the algorithm on the commonly vast historical clinical data, and on some scores or even hundreds of alleles, is a computationally intensive task normally performed over the period of some hours or days on a supercomputer.
Properly performed, and provided that causal relationships, howsoever complex and permuted, reside somewhere within the data, the resulting (i) selected and (ii) trained algorithm will itself be the “synthesis solution”. The algorithm will itself be the expression of what can be known from the data. The later use, and exercise, of the algorithm is only so as to give “answers” to particular questions (e.g., what should be expected from administration of some particular drug) for particular patients (e.g., those possessed of a particular pattern of markers and/or SNP pattern). Notably, the algorithm can be exercised so as to validate its own performance (or lack thereof). The clinical data for the many patients, and patient histories, can be fed into the (selected, trained) algorithm, one patient at a time. Does the algorithm accurately predict what historical data shows to have actually happened? A properly selected and trained algorithm is normally much more accurate in its prognostications (for the useful questions that it may suitably answer) than is any human physician. The physician's judgment ultimately controls, but the “advice” of the algorithm “solution” constitutes a useful adjunct to the physician's judgment in the considerably complex area of relating a patient's therapy to his or her genetic profile.
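The self-validation step described above, in which historical patients are fed to the algorithm one at a time and each prediction is compared with what actually happened, might be sketched as follows. The names `replay_accuracy` and `toy_predict`, the marker keys, and the four made-up records are hypothetical and serve only to make the sketch runnable; they are not data from the specification.

```python
def replay_accuracy(predict, records):
    """Feed historical patients to `predict` one at a time and score the matches.

    `records` is a list of (markers, observed_outcome) pairs; `predict` is any
    callable mapping markers to a predicted outcome. Both are hypothetical
    stand-ins for the trained algorithm and the clinical database.
    """
    if not records:
        raise ValueError("no historical records to replay")
    hits = 0
    for markers, observed in records:
        if predict(markers) == observed:
            hits += 1
    return hits / len(records)


def toy_predict(markers):
    # Toy stand-in for a trained algorithm: predicts "responder" when at
    # least two of the markers are present.
    return "responder" if sum(markers.values()) >= 2 else "non-responder"


# Made-up records, for illustration only.
history = [
    ({"AGT": 1, "ACE": 1, "REN": 0}, "responder"),
    ({"AGT": 0, "ACE": 0, "REN": 1}, "non-responder"),
    ({"AGT": 1, "ACE": 0, "REN": 0}, "responder"),
    ({"AGT": 0, "ACE": 1, "REN": 1}, "responder"),
]
accuracy = replay_accuracy(toy_predict, history)
print(accuracy)  # 3 of the 4 toy records match, so 0.75
```

For the resulting figure to say anything about generalizability, the replayed records should come from patients the algorithm was not trained on.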
Without indicating preference for a particular algorithmic technique, one of the preferred embodiments of the present invention uses a neural network to deliver a diagnostic or prognostic prediction using the markers declared previously. In this embodiment, a neural network is used to map the inputs to the outputs. Inputs are selected using a feature selection algorithm. As above, we note that the construct of a neural network is not crucial to our method. Any mapping procedure between inputs and outputs that produces a measure of goodness of fit for the training data and maximizes it with a standard optimization routine would also suffice.
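As a minimal sketch of such a mapping, the following trains a small one-hidden-layer network by gradient ascent on the mean log-likelihood, which plays the role of the goodness-of-fit measure. The class `TinyNetwork`, its layer size, learning rate, and epoch count, and the two-marker toy data (outcome present only when both markers are present) are all assumptions for illustration; the specification does not disclose a network architecture or training schedule.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """One hidden layer of sigmoid units feeding a single sigmoid output."""

    def __init__(self, n_inputs, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # Each hidden unit: one weight per input plus a trailing bias term.
        self.hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                       for _ in range(n_hidden)]
        # Output unit: one weight per hidden unit plus a trailing bias term.
        self.output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(unit, x)) + unit[-1])
             for unit in self.hidden]
        p = sigmoid(sum(w * v for w, v in zip(self.output, h)) + self.output[-1])
        return h, p

    def predict(self, x):
        return self.forward(x)[1]

    def log_likelihood(self, data):
        """Goodness of fit: mean Bernoulli log-likelihood of the labels."""
        total = 0.0
        for x, y in data:
            p = min(max(self.predict(x), 1e-12), 1.0 - 1e-12)  # avoid log(0)
            total += math.log(p) if y == 1 else math.log(1.0 - p)
        return total / len(data)

    def train(self, data, epochs=2000, rate=0.5):
        """Maximize the log-likelihood by per-sample gradient ascent."""
        for _ in range(epochs):
            for x, y in data:
                h, p = self.forward(x)
                err = y - p  # log-likelihood gradient at the output pre-activation
                # Hidden error terms use the output weights before their update.
                deltas = [err * self.output[j] * h[j] * (1.0 - h[j])
                          for j in range(len(h))]
                for j in range(len(h)):
                    self.output[j] += rate * err * h[j]
                self.output[-1] += rate * err
                for j, unit in enumerate(self.hidden):
                    for i in range(len(x)):
                        unit[i] += rate * deltas[j] * x[i]
                    unit[-1] += rate * deltas[j]


# Toy data (made up): the outcome occurs only when both markers are present.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
net = TinyNetwork(n_inputs=2)
before = net.log_likelihood(data)
net.train(data)
after = net.log_likelihood(data)
print(before, after)
```

Any other model exposing a fit measure and an optimizer could be substituted here, which is the point the paragraph above makes about the network itself not being crucial.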
Once the network is trained, it is ready for use by a clinician. The clinician enters the same network inputs used during training of the network, and the trained network outputs a maximum likelihood estimator for the value of the output given the inputs for the current patient. The clinician or patient can then act on this value. We note that a straightforward extension of our technique could produce an optimum range of output values given the patient's inputs.
In another preferred embodiment of the present invention, an algorithm using a committee network is trained to deliver a diagnostic or prognostic prediction using the markers or combinations thereof declared previously.
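The specification does not say how the committee combines its members. One common reading, assumed here, is that several independently trained predictors each output a probability and the committee reports their average. The function `committee_predict`, the 0.5 threshold, and the three hand-written member functions with markers "A" and "B" are hypothetical stand-ins for trained networks.

```python
from statistics import mean


def committee_predict(members, markers, threshold=0.5):
    """Average the members' probability estimates and threshold the mean."""
    if not members:
        raise ValueError("committee needs at least one member")
    votes = [member(markers) for member in members]
    average = mean(votes)
    return average, average >= threshold


# Three hypothetical members; in practice each would be a separately
# trained network rather than a hand-written rule.
members = [
    lambda m: 0.9 if m["A"] and m["B"] else 0.2,
    lambda m: 0.7 if m["A"] else 0.3,
    lambda m: 0.6 if m["B"] else 0.4,
]

average, positive = committee_predict(members, {"A": 1, "B": 1})
print(average, positive)
```

Averaging tends to damp the errors of any single member, which is the usual motivation for a committee; weighted averages or majority votes are equally plausible combination rules.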
While the invention has been described and exemplified in sufficient detail for those skilled in this art to make and use it, various alternatives, modifications, and improvements should be apparent without departing from the spirit and scope of the invention.
One skilled in the art readily appreciates that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The examples provided herein are representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Modifications therein and other uses will occur to those skilled in the art. These modifications are encompassed within the spirit of the invention and are defined by the scope of the claims.
All patents and publications mentioned in the specification are indicative of the levels of those of ordinary skill in the art to which the invention pertains. All patents and publications are herein incorporated by reference to the same extent as if each individual publication was specifically and individually indicated to be incorporated by reference.
The invention illustratively described herein suitably may be practiced in the absence of any element or elements, limitation or limitations which are not specifically disclosed herein. Thus, for example, in each instance herein any of the terms “comprising”, “consisting essentially of” and “consisting of” may be replaced with either of the other two terms. The terms and expressions which have been employed are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention as defined by the appended claims.
Other embodiments are set forth within the following claims.
This application is related to and claims priority from U.S. Provisional Patent Application No. 60/505,606, filed on Sep. 23, 2003, which is hereby incorporated by reference in its entirety. This application is also related to and claims priority from U.S. Provisional Patent Application No. 60/556,411, filed on Mar. 24, 2004, which is hereby incorporated by reference in its entirety.
Application No. | Filing Date | Country
---|---|---
60/505,606 | Sep. 23, 2003 | US
60/556,411 | Mar. 24, 2004 | US