The determination of the underlying etiology of symptoms suggestive of obstructive coronary artery disease (obstructive CAD, >70% stenosis in a major coronary artery, by clinical read) is a common clinical challenge in both primary care and cardiology clinics. Usual care in low- to medium-risk patients often involves a family history and risk factor assessment, followed by stress testing with or without non-invasive imaging. If the stress test is positive, it is often followed by invasive coronary angiography (ICA). Despite extensive adoption of this usual care paradigm, more than 60% of patients referred for angiography do not have obstructive CAD. The development of novel diagnostic tests may identify symptomatic patients without obstructive CAD, allowing the patient to avoid subsequent cardiac testing and the clinicians to look elsewhere for the cause of the patient's symptoms.
Previous work has demonstrated that peripheral blood gene expression profiling can be used to determine the likelihood of obstructive CAD in symptomatic patients (e.g., Corus; see related, co-owned patents including U.S. Pat. Nos. 9,122,777 and 8,914,240, each of which is herein incorporated by reference, in its entirety, for all purposes). Peripheral blood gene expression is typically limited at present to interrogating the changes in gene expression within circulating cells of the immune system due to the interaction of the cells with the diseased tissue. In addition, gene expression-based assays can be expensive to utilize and can be difficult to implement in a clinical lab setting, which can limit the placement of such assays in those settings.
The various limitations of a gene expression-based approach are overcome or minimized by the various approaches described herein, e.g., by instead utilizing an approach that includes protein-based expression data. Proteins, which can be released into circulation in response to CAD, may capture a more direct response to CAD, e.g., the proteins are released directly from the diseased site, or a more systemic reflection of the disease, e.g., the proteins are released from multiple tissues or organs affected by CAD. In addition, protein-based assays can be more cost effective than gene expression based assays, and are generally easier to implement in the clinical lab setting, thus expanding the potential placement of such assays in those facilities. Finally, certain approaches taken herein have been demonstrated in varying head-to-head studies in the Examples described herein to have better performance and to be more predictive of CAD relative to Corus, e.g., as measured using area under the curve (AUC).
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the U.S. Patent and Trademark Office upon request and payment of the necessary fee. These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, and accompanying drawings, where:
Described herein is a method for determining coronary artery disease risk in a subject, comprising: performing or having performed at least one protein detection assay on a sample from the subject to generate a dataset comprising data representing protein expression levels corresponding to at least two markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6; and generating or having generated, by a computer processor, a score indicative of coronary artery disease (CAD) risk by mathematically combining the data representing the protein expression levels, wherein a higher score relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) indicates an increased likelihood that the subject has CAD or a lower score relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA indicates a decreased likelihood that the subject has CAD.
Also disclosed herein is a method for determining coronary artery disease risk in a subject, comprising: obtaining or having obtained a dataset associated with a sample from the subject comprising data representing protein expression levels corresponding to at least two markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6; and generating or having generated, by a computer processor, a score indicative of coronary artery disease (CAD) risk by mathematically combining the data representing the protein expression levels, wherein a higher score relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) indicates an increased likelihood that the subject has CAD or a lower score relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA indicates a decreased likelihood that the subject has CAD.
Also disclosed herein is a method for generating a dataset comprising data representing protein expression levels for a subject that has CAD or is suspected of having CAD, comprising: obtaining or having obtained a sample from the subject, wherein the subject has CAD or is suspected of having CAD; performing or having performed at least one protein detection assay on the sample to generate a dataset comprising data representing protein expression levels corresponding to at least two markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6. In some aspects, the method further comprises generating, by a computer processor, a score indicative of coronary artery disease (CAD) risk by mathematically combining the data representing the protein expression levels, wherein a higher score relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) indicates an increased likelihood that the subject has CAD or a lower score relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA indicates a decreased likelihood that the subject has CAD. In some aspects, the at least one protein detection assay is an immunoassay, a protein-binding assay, an antibody-based assay, an antigen-binding protein-based assay, a protein-based array, an enzyme-linked immunosorbent assay (ELISA), flow cytometry, a protein array, a blot, a Western blot, nephelometry, turbidimetry, chromatography, mass spectrometry, enzymatic activity, and an immunoassay selected from RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, immunoelectrophoretic, a competitive immunoassay, and immunoprecipitation.
In some aspects, the at least one protein detection assay is at least one enzyme-linked immunosorbent assay (ELISA), wherein the dataset comprises data representing expression levels corresponding to at least five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6, and wherein the score is more predictive of CAD than a score produced using Corus with the sample as measured using AIC or AUC. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising APOB, HSP70, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6.
In some aspects, a method disclosed herein further comprises classifying a sample according to the score. In some aspects, a method disclosed herein further comprises rating CAD risk using the score.
In some aspects, a sample comprises protein extracted from the blood of the subject.
In some aspects, the mathematical combination is based on a predictive model, optionally wherein the predictive model is a partial least squares model, a logistic regression model, a linear regression model, a linear discriminant analysis model, a ridge regression model, or a tree-based recursive partitioning model.
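By way of illustration only, the mathematical combination described above can be sketched with a logistic regression model. The sketch below is a minimal example under stated assumptions: the function name `cad_score`, the three markers chosen, the coefficient values, and the intercept are all hypothetical and are not taken from the disclosure, and the marker levels are assumed to have been standardized beforehand.

```python
import math

# Hypothetical coefficients and intercept, for illustration only.
# They are NOT values from the disclosure or from any fitted model.
EXAMPLE_COEFS = {"APOB": 0.8, "NTproBNP": 0.5, "adiponectin": -0.6}
EXAMPLE_INTERCEPT = -0.2


def cad_score(levels, coefs=EXAMPLE_COEFS, intercept=EXAMPLE_INTERCEPT):
    """Combine standardized marker levels into a score between 0 and 1.

    levels: dict mapping marker name to a (standardized) expression level.
    coefs:  dict mapping marker name to a logistic regression coefficient.
    """
    missing = [m for m in coefs if m not in levels]
    if missing:
        raise ValueError(f"missing marker levels: {missing}")
    # Linear predictor: intercept plus weighted sum of marker levels.
    linear = intercept + sum(coefs[m] * levels[m] for m in coefs)
    # Logistic transform maps the linear predictor onto (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    sample = {"APOB": 1.2, "NTproBNP": 0.4, "adiponectin": -0.3}
    print(round(cad_score(sample), 3))
```

In this sketch a higher score corresponds to a higher modeled likelihood of CAD; any cut-off used to classify a sample would have to be chosen separately, e.g., from a validation cohort.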
In some aspects, CAD is obstructive CAD.
In some aspects, method performance is characterized by an area under the curve (AUC) ranging from 0.52 to 0.81, 0.50 to 0.99, 0.55 to 0.65, 0.50 to 0.70, 0.70 to 0.79, 0.80 to 0.89, or 0.90 to 0.99. In some aspects, method performance is characterized by an area under the curve (AUC) of at least 0.5, 0.52, 0.6, 0.7, 0.8, or 0.81.
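For reference, the AUC of a score can be computed as the probability that a randomly chosen subject with CAD receives a higher score than a randomly chosen subject without CAD, with ties counted as one half. The following is a minimal sketch of that pairwise calculation; the function name `auc` and the example scores are hypothetical and are not data from the Examples.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    case_scores:    scores for subjects with CAD.
    control_scores: scores for subjects without CAD.
    Ties between a case and a control count as 0.5.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    print(auc([0.8, 0.4], [0.6, 0.2]))
```

An AUC of 0.5 corresponds to a score that does not separate the two groups, and an AUC of 1.0 corresponds to perfect separation.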
In some aspects, a method disclosed herein further comprises obtaining data representing at least one clinical factor associated with the subject, optionally wherein the clinical factor comprises age of the subject and/or gender of the subject, and optionally mathematically combining the data representing the at least one clinical factor with the data representing the protein expression levels to generate the score. In some aspects, a method disclosed herein further comprises obtaining data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises at least one of age and gender. In some aspects, a method disclosed herein further comprises obtaining data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises age and gender. In some aspects, a method disclosed herein further comprises mathematically combining the data representing the at least one clinical factor with the data representing the protein expression levels to generate the score.
In some aspects, a subject is human.
In some aspects, the at least one protein detection assay is an immunoassay, a protein-binding assay, an antibody-based assay, an antigen-binding protein-based assay, a protein-based array, an enzyme-linked immunosorbent assay (ELISA), flow cytometry, a protein array, a blot, a Western blot, nephelometry, turbidimetry, chromatography, mass spectrometry, enzymatic activity, and an immunoassay selected from RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, immunoelectrophoretic, a competitive immunoassay, and immunoprecipitation.
In some aspects, a method disclosed herein further comprises taking at least one action based on the score, optionally wherein the at least one action comprises treating the subject, advising lifestyle changes to the subject, performing a procedure on the subject, performing further diagnostics on the subject, assessing the subject's health further, optimizing medical therapy, investigating non-cardiac etiologies of symptoms, or performing angiography on the subject.
In some aspects, obtaining the dataset comprises obtaining the sample and processing the sample to experimentally determine the dataset. In some aspects, obtaining the dataset comprises performing at least one protein detection assay, optionally wherein the at least one protein detection assay is an immunoassay, a protein-binding assay, an antibody-based assay, an antigen-binding protein-based assay, a protein-based array, ELISA, flow cytometry, a blot, or mass spectrometry. In some aspects, the at least one protein detection assay is an immunoassay, a protein-binding assay, an antibody-based assay, an antigen-binding protein-based assay, a protein-based array, an enzyme-linked immunosorbent assay (ELISA), flow cytometry, a protein array, a blot, a Western blot, nephelometry, turbidimetry, chromatography, mass spectrometry, enzymatic activity, and an immunoassay selected from RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, immunoelectrophoretic, a competitive immunoassay, and immunoprecipitation. In some aspects, obtaining the dataset comprises receiving the dataset from a third party that has processed the sample to experimentally determine the dataset.
Also disclosed herein is a system for determining coronary artery disease risk in a subject, comprising: a storage memory for storing a dataset associated with a sample from the subject comprising data representing protein expression levels corresponding to at least two markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6; and a processor communicatively coupled to the storage memory for generating a score indicative of CAD risk by mathematically combining the data representing the protein expression levels, wherein a higher score relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) indicates an increased likelihood that the subject has CAD or a lower score relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA indicates a decreased likelihood that the subject has CAD.
In some aspects, the dataset comprises data representing expression levels corresponding to at least five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6, and wherein the score is more predictive of CAD than a score produced using Corus with the sample as measured using AIC or AUC. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising APOB, HSP70, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6.
In some aspects, a system further comprises code for classifying the sample according to the score. In some aspects, a system further comprises code for rating CAD risk using the score.
In some aspects, the sample comprises protein extracted from the blood of the subject.
In some aspects, the mathematical combination is based on a predictive model, optionally wherein the predictive model is a partial least squares model, a logistic regression model, a linear regression model, a linear discriminant analysis model, a ridge regression model, or a tree-based recursive partitioning model.
In some aspects, CAD is obstructive CAD. In some aspects, a subject is human.
In some aspects, performance of the mathematical combination is characterized by an area under the curve (AUC) ranging from 0.52 to 0.81, 0.50 to 0.99, 0.55 to 0.65, 0.50 to 0.70, 0.70 to 0.79, 0.80 to 0.89, or 0.90 to 0.99. In some aspects, performance of the mathematical combination is characterized by an area under the curve (AUC) of at least 0.5, 0.52, 0.6, 0.7, 0.8, or 0.81.
In some aspects, a system further comprises a storage memory comprising data representing at least one clinical factor associated with the subject, optionally wherein the clinical factor comprises age of the subject and/or gender of the subject. In some aspects, the system further comprises a storage memory comprising data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises at least one of age and gender. In some aspects, the system further comprises a storage memory comprising data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises age and gender. In some aspects, the system further comprises a processor communicatively coupled to the storage memory for mathematically combining the data representing the at least one clinical factor with the data representing the protein expression levels to generate the score.
In some aspects, a system further comprises an apparatus for providing a readout that provides instructions for taking at least one action based on the score, optionally wherein the at least one action comprises treating the subject, advising lifestyle changes to the subject, performing a procedure on the subject, performing further diagnostics on the subject, assessing the subject's health further, optimizing medical therapy, investigating non-cardiac etiologies of symptoms, or performing angiography on the subject.
Also disclosed herein is a computer-readable storage medium storing computer-executable program code for determining coronary artery disease risk in a subject, comprising: program code for storing a dataset associated with a sample from the subject comprising data representing protein expression levels corresponding to at least two markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6; and program code for generating a score indicative of CAD risk by mathematically combining the data representing the protein expression levels, wherein a higher score relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) indicates an increased likelihood that the subject has CAD or a lower score relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA indicates a decreased likelihood that the subject has CAD.
In some aspects, the dataset comprises data representing expression levels corresponding to at least five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6, and wherein the score is more predictive of CAD than a score produced using Corus with the sample as measured using AIC or AUC. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising APOB, HSP70, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6.
In some aspects, a medium further comprises program code for classifying the sample according to the score. In some aspects, a medium further comprises program code for rating CAD risk using the score.
In some aspects, a sample comprises protein extracted from the blood of the subject.
In some aspects, the mathematical combination is based on a predictive model, optionally wherein the predictive model is a partial least squares model, a logistic regression model, a linear regression model, a linear discriminant analysis model, a ridge regression model, or a tree-based recursive partitioning model.
In some aspects, CAD is obstructive CAD. In some aspects, a subject is human.
In some aspects, performance of the mathematical combination is characterized by an area under the curve (AUC) ranging from 0.52 to 0.81, 0.50 to 0.99, 0.55 to 0.65, 0.50 to 0.70, 0.70 to 0.79, 0.80 to 0.89, or 0.90 to 0.99. In some aspects, performance of the mathematical combination is characterized by an area under the curve (AUC) of at least 0.5, 0.52, 0.6, 0.7, 0.8, or 0.81.
In some aspects, a medium further comprises program code for storing data representing at least one clinical factor associated with the subject, optionally wherein the clinical factor comprises age of the subject and/or gender of the subject. In some aspects, the medium further comprises program code for storing data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises at least one of age and gender. In some aspects, the medium further comprises program code for storing data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises age and gender. In some aspects, the medium further comprises program code for mathematically combining the data representing the at least one clinical factor with the data representing the protein expression levels to generate the score.
In some aspects, a medium further comprises program code for storing instructions for taking at least one action based on the score, optionally wherein the at least one action comprises treating the subject, advising lifestyle changes to the subject, performing a procedure on the subject, performing further diagnostics on the subject, assessing the subject's health further, optimizing medical therapy, investigating non-cardiac etiologies of symptoms, or performing angiography on the subject.
Also disclosed herein is a kit for determining coronary artery disease risk in a subject, comprising: a set of reagents for generating a dataset via at least one protein detection assay that is associated with a sample from the subject comprising data representing protein expression levels corresponding to at least two markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6; and instructions for generating a score indicative of CAD risk by mathematically combining the data representing the protein expression levels, wherein a higher score relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) indicates an increased likelihood that the subject has CAD or a lower score relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA indicates a decreased likelihood that the subject has CAD.
In some aspects, the at least one protein detection assay is at least one enzyme-linked immunosorbent assay (ELISA), wherein the dataset comprises data representing expression levels corresponding to at least five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6, and wherein the score is more predictive of CAD than a score produced using Corus with the sample as measured using AIC or AUC. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6. In some aspects, the dataset comprises data representing expression levels corresponding to at least three, four, or five markers comprising APOB, HSP70, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6.
In some aspects, a kit further comprises instructions for classifying the sample according to the score. In some aspects, a kit further comprises instructions for rating CAD risk using the score.
In some aspects, a sample comprises protein extracted from the blood of the subject.
In some aspects, the mathematical combination is based on a predictive model, optionally wherein the predictive model is a partial least squares model, a logistic regression model, a linear regression model, a linear discriminant analysis model, a ridge regression model, or a tree-based recursive partitioning model.
In some aspects, CAD is obstructive CAD. In some aspects, a subject is human.
In some aspects, performance of the instructions for generating the score is characterized by an area under the curve (AUC) ranging from 0.52 to 0.81, 0.50 to 0.99, 0.55 to 0.65, 0.50 to 0.70, 0.70 to 0.79, 0.80 to 0.89, or 0.90 to 0.99. In some aspects, performance of the instructions for generating the score is characterized by an area under the curve (AUC) of at least 0.5, 0.52, 0.6, 0.7, 0.8, or 0.81.
In some aspects, a kit further comprises instructions for obtaining data representing at least one clinical factor associated with the subject, optionally wherein the clinical factor comprises age of the subject and/or gender of the subject, and optionally comprising instructions for mathematically combining the data representing the at least one clinical factor with the data representing the protein expression levels to generate the score. In some aspects, the kit further comprises instructions for obtaining data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises at least one of age and gender. In some aspects, the kit further comprises instructions for obtaining data representing at least one clinical factor associated with the subject, wherein the at least one clinical factor comprises age and gender. In some aspects, the kit further comprises instructions for mathematically combining the data representing the at least one clinical factor with the data representing the protein expression levels to generate the score.
In some aspects, the at least one protein detection assay is an immunoassay, a protein-binding assay, an antibody-based assay, an antigen-binding protein-based assay, a protein-based array, an enzyme-linked immunosorbent assay (ELISA), flow cytometry, a protein array, a blot, a Western blot, nephelometry, turbidimetry, chromatography, mass spectrometry, enzymatic activity, and an immunoassay selected from RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, immunoelectrophoretic, a competitive immunoassay, and immunoprecipitation.
In some aspects, the reagents comprise one or more antibodies that bind to one or more of the markers, optionally wherein the antibodies are monoclonal antibodies or polyclonal antibodies.
In some aspects, a kit further comprises instructions for taking at least one action based on the score, optionally wherein the at least one action comprises treating the subject, advising lifestyle changes to the subject, performing a procedure on the subject, performing further diagnostics on the subject, assessing the subject's health further, optimizing medical therapy, investigating non-cardiac etiologies of symptoms, or performing angiography on the subject.
Circulating proteins are well-established as biomarkers of disease. 137 protein biomarkers were interrogated for association with coronary artery disease, and subsequently a multi-analyte predictive model utilizing a subset of markers was created. The identification of biomarkers associated with the likelihood of coronary artery disease and creation of a predictive model could lead, e.g., to better patient stratification for further cardiovascular workup and intervention. Models to assist in determining the likelihood of coronary artery disease in a subject based on protein markers were developed and tested. These models have been demonstrated to have greater predictive value for the likelihood of coronary artery disease relative to earlier coronary artery disease tests, including Corus.
Terms used in the claims and specification are defined as set forth below unless otherwise specified.
It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.
A “subject” in the context of the present teachings is generally a mammal, e.g., a human. The subject can be a human patient, e.g., a human heart failure patient. The term “mammal” as used herein includes but is not limited to a human, non-human primate, dog, cat, mouse, rat, cow, horse, and pig. Mammals other than humans can be advantageously used as subjects that represent animal models of, e.g., heart failure. A subject can be male or female. A subject can be one who has been previously diagnosed or identified as having coronary artery disease. A subject can be one who has already undergone, or is undergoing, a therapeutic intervention for coronary artery disease. A subject can also be one who has not been previously diagnosed as having coronary artery disease; e.g., a subject can be one who exhibits one or more symptoms or risk factors for coronary artery disease, or a subject who does not exhibit symptoms or risk factors for coronary artery disease, or a subject who is asymptomatic for coronary artery disease.
A “sample” in the context of the present teachings refers to any biological sample that is isolated from a subject. A sample can include, without limitation, a single cell or multiple cells, fragments of cells, an aliquot of body fluid, whole blood, platelets, serum, plasma, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, synovial fluid, lymphatic fluid, ascites fluid, and interstitial or extracellular fluid. The term “sample” also encompasses the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, cerebrospinal fluid (CSF), saliva, mucous, sputum, semen, sweat, urine, or any other bodily fluids. “Blood sample” can refer to whole blood or any fraction thereof, including blood cells, red blood cells, white blood cells or leucocytes, platelets, serum and plasma. Samples can be obtained from a subject by means including but not limited to venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage, scraping, surgical incision, or intervention or other means known in the art. In one embodiment the sample is a whole blood sample. A sample can include protein extracted from blood of a subject.
“Marker,” “markers,” “biomarker,” or “biomarkers” all refer to a sequence characteristic of a particular variant allele (i.e., polymorphic site) or wild-type allele. A marker can include any allele, including wild-type alleles, SNPs, microsatellites, insertions, deletions, duplications, and translocations. A marker can also include a peptide encoded by an allele comprising nucleic acids. A marker in the context of the present teachings encompasses, without limitation, cytokines, chemokines, growth factors, proteins, peptides, nucleic acids, oligonucleotides, and metabolites, together with their related metabolites, mutations, variants, polymorphisms, modifications, fragments, subunits, degradation products, elements, and other analytes or sample-derived measures. Markers can also include mutated proteins, mutated nucleic acids, variations in copy numbers and/or transcript variants. Markers also encompass non-blood borne factors and non-analyte physiological markers of health status, and/or other factors or markers not measured from samples (e.g., biological samples such as bodily fluids), such as clinical parameters and traditional factors for clinical assessments. Markers can also include any indices that are calculated and/or created mathematically. Markers can also include combinations of any one or more of the foregoing measurements, including temporal trends and differences. As used herein, markers typically refer to sequence characteristics of the D-loop mtDNA, e.g., Tm and/or single or multiple SNPs and/or number of polymorphisms.
To “analyze” includes measurement and/or detection of data associated with a marker (such as, e.g., presence or absence of a SNP, allele, melting temperature (Tm) or constituent expression levels) in the sample (or, e.g., by obtaining a dataset reporting such measurements, as described below). In some aspects, an analysis can include comparing the measurement and/or detection against a measurement and/or detection in a sample or set of samples from the same subject or other control subject(s). The markers of the present teachings can be analyzed by any of various conventional methods known in the art.
A “dataset” is a set of data (e.g., numerical values) resulting from evaluation of a sample (or population of samples) under a desired condition. The values of the dataset can be obtained, for example, by experimentally obtaining measures from a sample and constructing a dataset from these measurements; or alternatively, by obtaining a dataset from a service provider such as a laboratory, or from a database or a server on which the dataset has been stored. Similarly, the term “obtaining a dataset associated with a sample” encompasses obtaining a set of data determined from at least one sample. Obtaining a dataset encompasses obtaining a sample, and processing the sample to experimentally determine the data, e.g., via measuring, sequencing, PCR, RT-PCR, microarray, contacting with one or more primers, contacting with one or more probes, antibody binding, or ELISA. The phrase also encompasses receiving a set of data, e.g., from a third party that has processed the sample to experimentally determine the dataset. Additionally, the phrase encompasses mining data from at least one database or at least one publication or a combination of databases and publications.
“Measuring” or “measurement” in the context of the present teachings refers to determining the presence, absence, quantity, amount, or effective amount of a substance in a clinical or subject-derived sample, including the presence, absence, or concentration levels of such substances, and/or evaluating the values or categorization of a subject's clinical parameters based on a control.
The term “acute coronary syndrome” encompasses all forms of unstable coronary artery disease.
The term “coronary artery disease” or “CAD” encompasses all forms of atherosclerotic disease affecting the coronary arteries. In particular, CAD includes obstructive CAD.
The term “FDR” means false discovery rate. FDR can be estimated by analyzing randomly-permuted datasets and tabulating the average number of genes at a given p-value threshold.
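By way of non-limiting illustration, a permutation-based FDR estimate can be sketched as follows. The sketch assumes that per-gene p-values have already been computed for the observed dataset and for each randomly-permuted dataset. The function name, the example p-values, and the choice to divide the average permuted count by the observed count are assumptions made for illustration and are not taken from the present disclosure.

```python
def estimate_fdr(observed_pvalues, permuted_pvalue_sets, threshold):
    """Estimate FDR at a p-value threshold from permuted datasets.

    observed_pvalues: per-gene p-values from the real dataset.
    permuted_pvalue_sets: one list of per-gene p-values per permuted dataset.
    """
    # Number of genes called significant in the observed data.
    n_observed = sum(p <= threshold for p in observed_pvalues)
    if n_observed == 0:
        return 0.0
    # Average number of genes passing the threshold under permutation,
    # used as the expected number of false discoveries (an assumption).
    counts = [sum(p <= threshold for p in pset) for pset in permuted_pvalue_sets]
    expected_false = sum(counts) / len(counts)
    return min(1.0, expected_false / n_observed)


# Hypothetical p-values, for illustration only.
observed = [0.001, 0.004, 0.02, 0.03, 0.2, 0.6]
permuted = [
    [0.04, 0.3, 0.5, 0.7, 0.8, 0.9],
    [0.2, 0.3, 0.6, 0.01, 0.9, 0.95],
    [0.5, 0.6, 0.7, 0.8, 0.9, 0.99],
]
fdr = estimate_fdr(observed, permuted, threshold=0.05)
```

In this hypothetical example, four genes pass the threshold in the observed data and, on average, two-thirds of a gene passes in the permuted data, so the estimate is one-sixth.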
The terms “highly correlated gene expression” or “highly correlated marker expression” refer to gene or marker expression values that have a sufficient degree of correlation to allow their interchangeable use in a predictive model of coronary artery disease. For example, if gene x having expression value X is used to construct a predictive model, highly correlated gene y having expression value Y can be substituted into the predictive model in a straightforward way readily apparent to those having ordinary skill in the art and the benefit of the instant disclosure. Assuming an approximately linear relationship between the expression values of genes x and y such that Y=a+bX, then X can be substituted into the predictive model with (Y−a)/b. For non-linear correlations, similar mathematical transformations can be used that effectively convert the expression value of gene y into the corresponding expression value for gene x. The terms “highly correlated marker” or “highly correlated substitute marker” refer to markers that can be substituted into and/or added to a predictive model based on, e.g., the above criteria. A highly correlated marker can be used in at least two ways: (1) by substitution of the highly correlated marker(s) for the original marker(s) and generation of a new model for predicting CAD risk; or (2) by substitution of the highly correlated marker(s) for the original marker(s) in the existing model for predicting CAD risk.
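By way of non-limiting illustration, the linear substitution described above can be sketched as follows. The sketch assumes an approximately linear relationship Y=a+bX estimated by ordinary least squares from paired measurements; the function names and the example values are hypothetical and are not taken from the present disclosure.

```python
from statistics import mean


def fit_linear(x_values, y_values):
    """Least-squares fit of Y = a + b*X; returns (a, b)."""
    x_bar, y_bar = mean(x_values), mean(y_values)
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def substitute_marker(y, a, b):
    """Convert an expression value of gene y into the equivalent value for gene x."""
    if b == 0:
        raise ValueError("slope b must be non-zero to invert Y = a + b*X")
    return (y - a) / b


# Hypothetical paired expression values for genes x and y (Y = 0.5 + 2X).
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [2.5, 4.5, 6.5, 8.5]
a, b = fit_linear(x_train, y_train)

# The value (Y - a)/b can then be supplied to the existing model in place of X.
x_equivalent = substitute_marker(6.5, a, b)
```

This corresponds to use (2) above, in which the existing model is retained; under use (1), the model would instead be refit using the substitute marker directly.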
The term “myocardial infarction” refers to an ischemic myocardial necrosis. This is usually the result of abrupt reduction in coronary blood flow to a segment of the myocardium, the muscular tissue of the heart. Myocardial infarction can be classified into ST-elevation and non-ST elevation MI (also referred to as unstable angina). Myocardial necrosis results in either classification. Myocardial infarction, of either ST-elevation or non-ST elevation classification, is an unstable form of atherosclerotic cardiovascular disease.
The term “obtaining a dataset associated with a sample” encompasses obtaining a set of data determined from at least one sample. Obtaining a dataset encompasses obtaining a sample, and processing the sample to experimentally determine the data. The phrase also encompasses receiving a set of data, e.g., from a third party that has processed the sample to experimentally determine the dataset. Additionally, the phrase encompasses mining data from at least one database or at least one publication or a combination of databases and publications. A dataset can be obtained by one of skill in the art in a variety of known ways, including retrieval from a storage memory on which the dataset is stored.
As used herein “Corus” or “CorusCAD” refers to a commercially available test offered by CardioDx. This test is described in U.S. Pat. Nos. 9,122,777 and 8,914,240, each of which is herein incorporated by reference, in its entirety, for all purposes. In summary, Corus is a test where RNA is extracted from a sample of peripheral blood cells of a subject, converted to cDNA, and then assessed for the expression level of 23 distinct genes using RT-qPCR, followed by the transformation of the expression level data plus age and gender functions by an algorithm into a score that is predictive of the likelihood of CAD in the subject. Genes included in the Corus test are: S100A12, CLEC4E, S100A8, CASP5, IL18RAP, TNFAIP6, AQP9, NCF4, CD3D, TMC8, CD79B, SPIB, HNRPF, TFCP2, RPL28, AF161365, AF289562, SLAMF7, KLRC4, IL8RB, TNFRSF10C, KCNE3, and TLR4. The algorithm for producing the score is as shown below:
Methods
Disclosed herein are various methods of determining CAD risk in a subject from a sample. Such methods can include obtaining a dataset associated with a sample from a subject comprising data representing protein expression levels for one or more markers; and combining the data in the dataset to produce a score that is indicative of CAD risk associated with the sample. Such methods can include obtaining a dataset associated with a sample from a subject comprising data representing one or more clinical factors and data representing protein expression levels for markers; and combining the data in the dataset to produce a score that is indicative of CAD risk associated with the sample. Such methods can be computer-implemented, performed as physical assays, or a combination thereof. Such methods can be useful in informing later actions to be taken by the subject on whom the method is performed or by a physician that is assisting the subject. For example, a score that suggests a subject is at increased risk of CAD can be used by a physician to inform an action that is likely to reduce that risk, such as administering aspirin. Other actions that can be taken can include treating the subject, advising lifestyle changes to the subject, performing a procedure on the subject, performing further diagnostics on the subject, assessing the subject's health further, optimizing medical therapy, investigating non-cardiac etiologies of symptoms, or performing angiography on the subject.
Markers
“Marker,” “markers,” “biomarker,” or “biomarkers” all refer to a sequence characteristic of a particular variant allele (i.e., polymorphic site) or wild-type allele. A marker can include any allele, including wild-type alleles, SNPs, microsatellites, insertions, deletions, duplications, and translocations. A marker can also include a peptide encoded by an allele comprising nucleic acids. A marker in the context of the present teachings encompasses, without limitation, cytokines, chemokines, growth factors, proteins, peptides, nucleic acids, oligonucleotides, and metabolites, together with their related metabolites, mutations, variants, polymorphisms, modifications, fragments, subunits, degradation products, elements, and other analytes or sample-derived measures. Markers can also include mutated proteins, mutated nucleic acids, variations in copy numbers and/or transcript variants. Markers also encompass non-blood borne factors and non-analyte physiological markers of health status, and/or other factors or markers not measured from samples (e.g., biological samples such as bodily fluids), such as clinical parameters and traditional factors for clinical assessments. Markers can also include any indices that are calculated and/or created mathematically. Markers can also include combinations of any one or more of the foregoing measurements, including temporal trends and differences.
Various markers are shown in the tables. In some aspects, a marker can include at least one of Adiponectin, APOA1, NT-proBNP, PIGF, and S100A8-MPO.
A marker can include one or more of corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6. A marker can include one or more of: APOB, HSP70, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6. A marker can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 of: corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6.
The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. That is, a description directed to a polypeptide applies equally to a description of a peptide and a description of a protein, and vice versa. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers in which one or more amino acid residues is a non-naturally encoded amino acid. As used herein, the terms encompass amino acid chains of any length, including full length proteins, wherein the amino acid residues are linked by covalent peptide bonds.
The term “amino acid” refers to naturally occurring and non-naturally occurring amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally encoded amino acids are the 20 common amino acids (alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine) and pyrrolysine and selenocysteine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, such as, homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (such as, norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Reference to an amino acid includes, for example, naturally occurring proteogenic L-amino acids; D-amino acids, chemically modified amino acids such as amino acid variants and derivatives; naturally occurring non-proteogenic amino acids such as β-alanine, ornithine, etc.; and chemically synthesized compounds having properties known in the art to be characteristic of amino acids. Examples of non-naturally occurring amino acids include, but are not limited to, α-methyl amino acids (e.g., α-methyl alanine), D-amino acids, histidine-like amino acids (e.g., 2-amino-histidine, β-hydroxy-histidine, homohistidine), amino acids having an extra methylene in the side chain (“homo” amino acids), and amino acids in which a carboxylic acid functional group in the side chain is replaced with a sulfonic acid group (e.g., cysteic acid).
The incorporation of non-natural amino acids, including synthetic non-native amino acids, substituted amino acids, or one or more D-amino acids into the proteins of the present invention may be advantageous in a number of different ways. D-amino acid-containing peptides, etc., exhibit increased stability in vitro or in vivo compared to L-amino acid-containing counterparts. Thus, the construction of peptides, etc., incorporating D-amino acids can be particularly useful when greater intracellular stability is desired or required. More specifically, D-peptides, etc., are resistant to endogenous peptidases and proteases, thereby providing improved bioavailability of the molecule, and prolonged lifetimes in vivo when such properties are desirable. Additionally, D-peptides, etc., cannot be processed efficiently for major histocompatibility complex class II-restricted presentation to T helper cells, and are therefore, less likely to induce humoral immune responses in the whole organism.
Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
A derivative or a variant of a polypeptide is said to share “homology” or be “homologous” with the peptide if the amino acid sequence of the derivative or variant has at least 50% identity with a 100 amino acid sequence from the original peptide. In certain embodiments, the derivative or variant is at least 75% the same as that of either the peptide or a fragment of the peptide having the same number of amino acid residues as the derivative. In certain embodiments, the derivative or variant is at least 85% the same as that of either the peptide or a fragment of the peptide having the same number of amino acid residues as the derivative. In certain embodiments, the amino acid sequence of the derivative is at least 90% the same as the peptide or a fragment of the peptide having the same number of amino acid residues as the derivative. In some embodiments, the amino acid sequence of the derivative is at least 95% the same as the peptide or a fragment of the peptide having the same number of amino acid residues as the derivative. In certain embodiments, the derivative or variant is at least 99% the same as that of either the peptide or a fragment of the peptide having the same number of amino acid residues as the derivative.
The term “modified,” as used herein, refers to any changes made to a given polypeptide, such as changes to the length of the polypeptide, the amino acid sequence, chemical structure, co-translational modification, or post-translational modification of a polypeptide. The term “(modified)” means that the polypeptides being discussed are optionally modified, that is, the polypeptides under discussion can be modified or unmodified.
In some aspects, a marker comprises an amino acid sequence that is at least 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% identical to a relevant amino acid sequence or fragment thereof set forth in the Table(s) or accession number(s) disclosed herein. In some aspects, a marker comprises an amino acid sequence encoded by a polynucleotide that is at least 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% identical to a relevant nucleotide sequence or fragment thereof set forth in Table(s) or accession number(s) disclosed herein. Accession numbers of certain markers are shown in Table 9.1.
Predictive Models
As disclosed herein the invention includes a method of generating a prediction model for likelihood of CAD in subjects. Also disclosed herein are methods of using the predictive model to determine the likelihood of CAD in a subject.
A predictive model can include, for example, a partial least squares model, a logistic regression model, a linear regression model, a linear discriminant analysis model, a ridge regression model, and a tree-based recursive partitioning model. In some embodiments, a predictive model can also include Support Vector Machines, quadratic discriminant analysis, or a LASSO regression model. See Elements of Statistical Learning, Springer 2003, Hastie, Tibshirani, Friedman; which is herein incorporated by reference in its entirety for all purposes.
Predictive model performance can be characterized by an area under the curve (AUC). In some embodiments, predictive model performance is characterized by an AUC ranging from 0.68 to 0.70. In some embodiments, predictive model performance is characterized by an AUC ranging from 0.70 to 0.79. In some embodiments, predictive model performance is characterized by an AUC ranging from 0.80 to 0.89. In some embodiments, predictive model performance is characterized by an AUC ranging from 0.90 to 0.99. AUC can range from 0.52 to 0.81, 0.50 to 0.99, 0.55 to 0.65, 0.50 to 0.70, 0.70 to 0.79, 0.80 to 0.89, or 0.90 to 0.99. AUC can be at least 0.5, 0.52, 0.6, 0.7, 0.8, or 0.81.
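By way of non-limiting illustration, an AUC can be computed from model scores and known disease status as sketched below. The sketch uses the rank-based (Mann-Whitney) formulation, in which the AUC equals the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The function name and example values are hypothetical and are not taken from the present disclosure.

```python
def area_under_curve(scores, labels):
    """AUC as the probability that a case (label 1) outscores a control (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both cases and controls are required")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties contribute one half
    return wins / (len(cases) * len(controls))


# Hypothetical scores; label 1 = obstructive CAD, label 0 = no obstructive CAD.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = area_under_curve(scores, labels)
```

In this hypothetical example, seven of the nine case/control pairs are correctly ordered, giving an AUC of 7/9. An AUC of 0.5 corresponds to no discrimination and an AUC of 1.0 to perfect discrimination.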
AIC can be used to measure model performance. Normal AIC is a combination of the log likelihood, or deviance, of the model adjusted by the number of parameters in the model. AIC can also be expressed as a corrected AIC (AICc), which is further adjusted for the number of cases available in the dataset from which a given estimate is calculated. For example, corrected AIC can be calculated by: AICc = AIC + 2p(p+1)/(n−p−1), where p is the number of parameters in the model and n is the number of cases used in model fitting. AIC can range from 485 to 601, e.g., at least 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600 or greater (inclusive).
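By way of non-limiting illustration, AIC and corrected AIC can be computed as sketched below. The sketch assumes the conventional definition AIC = 2p − 2 ln(L), where ln(L) is the maximized log likelihood, together with the standard small-sample correction AICc = AIC + 2p(p+1)/(n − p − 1). The function names and example values are hypothetical and are not taken from the present disclosure.

```python
def aic(log_likelihood, p):
    """Akaike information criterion for a model with p parameters."""
    return 2.0 * p - 2.0 * log_likelihood


def aicc(aic_value, p, n):
    """Corrected AIC for a model with p parameters fit to n cases."""
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    return aic_value + (2.0 * p * (p + 1)) / (n - p - 1)


# Hypothetical model: 5 parameters fit to 200 cases, log likelihood -250.
model_aic = aic(log_likelihood=-250.0, p=5)
model_aicc = aicc(model_aic, p=5, n=200)
```

The correction term shrinks toward zero as n grows relative to p, so AIC and AICc agree closely for large datasets; lower values indicate a better trade-off between fit and complexity.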
Relative Risk
In one embodiment, significance associated with one or more markers is measured by a relative risk. In a further embodiment, the significance is measured by a percentage. In one embodiment, a significant increased risk is measured as a relative risk of at least about 1.2, including but not limited to: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8 and 1.9. In a further embodiment, a relative risk of at least 1.2 is significant. In a further embodiment, a relative risk of at least about 1.5 is significant. In a further embodiment, a relative risk of at least about 1.7 is significant. In a further embodiment, a significant increase in risk is at least about 20%, including but not limited to about 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% and 98%. In a further embodiment, a significant increase in risk is at least about 50%.
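By way of non-limiting illustration, a relative risk and the corresponding percentage increase in risk can be computed from a two-by-two table as sketched below. The sketch assumes subjects are divided into a marker-positive group and a marker-negative group and that the relative risk is the ratio of the CAD proportions in the two groups. The function names, the grouping, and the counts are hypothetical and are not taken from the present disclosure.

```python
def relative_risk(cases_pos, total_pos, cases_neg, total_neg):
    """Ratio of CAD proportion in marker-positive vs. marker-negative subjects."""
    if total_pos <= 0 or total_neg <= 0 or cases_neg <= 0:
        raise ValueError("group sizes and reference-group cases must be positive")
    risk_pos = cases_pos / total_pos
    risk_neg = cases_neg / total_neg
    return risk_pos / risk_neg


def percent_increase(rr):
    """Express a relative risk as a percentage increase over the reference group."""
    return (rr - 1.0) * 100.0


# Hypothetical counts: 30 of 100 marker-positive and 20 of 100 marker-negative
# subjects have CAD.
rr = relative_risk(cases_pos=30, total_pos=100, cases_neg=20, total_neg=100)
increase = percent_increase(rr)
```

With these hypothetical counts the relative risk is approximately 1.5, i.e., an increase in risk of approximately 50%.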
Risk of CAD can be calculated by combining data representing expression levels of multiple protein markers, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more protein markers. Risk of CAD can be calculated by combining data representing expression levels of multiple protein markers, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more protein markers with data representing one or more clinical factors (e.g., age and/or gender). Such data combination will typically result in a score. Oftentimes such a score will be indicative of CAD risk. For example, a higher score for a given subject relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) can indicate an increased likelihood that the subject has CAD. Alternatively or in addition, a lower score for a given subject relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA can indicate a decreased likelihood that the subject has CAD.
A score produced via a combination of data can be useful in classifying, sorting, or rating a sample from which the score was generated. For example, a score can be used to classify a sample. A score can also be used to rate CAD risk for a given sample.
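By way of non-limiting illustration, the combination of protein expression data with clinical factors to produce a score, and the use of that score to classify a sample, can be sketched as follows. The sketch assumes a logistic regression form, which is one of the model types listed above. The intercept, coefficients, choice of markers, input scaling, and classification threshold are all hypothetical values chosen for illustration and are not taken from the present disclosure.

```python
import math

# Hypothetical model parameters, for illustration only (not from the disclosure).
INTERCEPT = -4.0
COEFFICIENTS = {
    "NTproBNP": 0.8,      # hypothetical weight on a normalized protein level
    "adiponectin": -0.5,  # hypothetical weight
    "APOA1": -0.3,        # hypothetical weight
    "age": 0.04,          # clinical factor, in years
    "male": 0.6,          # clinical factor, 1 if male, else 0
}


def cad_score(features):
    """Combine marker levels and clinical factors into a score between 0 and 1."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


def classify(score, threshold=0.5):
    """Classify a sample by comparing its score to a hypothetical threshold."""
    return "elevated" if score >= threshold else "not elevated"


# Hypothetical subject.
subject = {"NTproBNP": 2.0, "adiponectin": 1.0, "APOA1": 1.0, "age": 60, "male": 1}
score = cad_score(subject)
label = classify(score)
```

In practice, the coefficients would be estimated from a training dataset with known angiographic status, and the threshold would be chosen to achieve a desired sensitivity or negative predictive value.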
Assays
Examples of assays for one or more markers include DNA assays, microarrays, polymerase chain reaction (PCR), RT-PCR, Southern blots, Northern blots, antibody-binding assays, enzyme-linked immunosorbent assays (ELISAs), flow cytometry, protein assays, Western blots, nephelometry, turbidimetry, chromatography, mass spectrometry, immunoassays, including, by way of example, but not limitation, RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, or competitive immunoassays, immunoprecipitation, and the assays described in the Examples section below. The information from the assay can be quantitative and sent to a computer system of the invention. The information can also be qualitative, such as observing patterns or fluorescence, which can be translated into a quantitative measure by a user or automatically by a reader or computer system. In an embodiment, the subject can also provide information other than assay information to a computer system, such as race, height, weight, age, gender, eye color, hair color, family medical history and any other information that may be useful to a user, such as a clinical factor described above.
Protein detection assays are assays used to detect the expression level of a given protein from a sample. Protein detection assays are generally known in the art and can include an immunoassay, a protein-binding assay, an antibody-based assay, an antigen-binding protein-based assay, a protein-based array, an enzyme-linked immunosorbent assay (ELISA), flow cytometry, a protein array, a blot, a Western blot, nephelometry, turbidimetry, chromatography, mass spectrometry, enzymatic activity, and an immunoassay selected from RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, immunoelectrophoretic, a competitive immunoassay, and immunoprecipitation. Reagents for use in such assays such as ELISA are shown in Table 9.2.
Protein based analysis, using an antibody as described above that specifically binds to a polypeptide encoded by an altered nucleic acid or an antibody that specifically binds to a polypeptide encoded by a non-altered nucleic acid, or an antibody that specifically binds to a particular splicing variant encoded by a nucleic acid, can be used to identify the presence in a test sample of a particular splicing variant or of a polypeptide encoded by a polymorphic or altered nucleic acid, or the absence in a test sample of a particular splicing variant or of a polypeptide encoded by a non-polymorphic or non-altered nucleic acid. The presence of a polypeptide encoded by a polymorphic or altered nucleic acid, or the absence of a polypeptide encoded by a non-polymorphic or non-altered nucleic acid, is diagnostic for a susceptibility to coronary artery disease.
In one aspect, the level or amount of polypeptide encoded by a nucleic acid in a test sample is compared with the level or amount of the polypeptide encoded by the nucleic acid in a control sample. A level or amount of the polypeptide in the test sample that is higher or lower than the level or amount of the polypeptide in the control sample, such that the difference is statistically significant, is indicative of an alteration in the expression of the polypeptide encoded by the nucleic acid, and is diagnostic. Alternatively, the composition of the polypeptide encoded by a nucleic acid in a test sample is compared with the composition of the polypeptide encoded by the nucleic acid in a control sample (e.g., the presence of different splicing variants). A difference in the composition of the polypeptide in the test sample, as compared with the composition of the polypeptide in the control sample, is diagnostic. In another aspect, both the level or amount and the composition of the polypeptide can be assessed in the test sample and in the control sample. A difference in the amount or level of the polypeptide in the test sample, compared to the control sample; a difference in composition in the test sample, compared to the control sample; or both a difference in the amount or level, and a difference in the composition, is indicative of a likelihood of CAD, either increased or decreased.
In addition, one of skill will also understand that the above described methods can also generally be used to detect markers that do not include a polymorphism.
Clinical Factors
In some embodiments, one or more clinical factors in a subject, e.g., a heart failure patient, can be assessed. In some embodiments, assessment of one or more clinical factors in a subject can be combined with a marker analysis in the subject to identify likelihood of CAD in the subject.
The term “clinical factor” refers to a measure of a condition of a subject, e.g., disease activity or severity. “Clinical factor” encompasses all markers of a subject's health status, including non-sample markers, and/or other characteristics of a subject, such as, without limitation, age and gender. A clinical factor can be a score, a value, or a set of values that can be obtained from evaluation of a sample (or population of samples) from a subject or a subject under a determined condition. A clinical factor can also be predicted by markers and/or other parameters such as gene expression surrogates.
A clinical factor can include age of a subject. A clinical factor can include gender of a subject. A clinical factor can include age and gender of a subject.
Various clinical factors are generally known to one of ordinary skill in the art to be associated with sudden cardiac events. In some embodiments, clinical factors known to one of ordinary skill in the art to be associated with coronary artery disease, such as an arrhythmia, can include age, gender, race, implant indication, prior pacing status, ICD presence, cardiac resynchronization therapy defibrillator (CRT-D) presence, total number of devices, device type, defibrillation thresholds performed, number of programming zones, heart failure (HF) etiology, HF onset, left ventricular ejection fraction (LVEF) at implant, New York Heart Association (NYHA) class, months from most recent myocardial infarction (MI) at implant, prior arrhythmia event in setting of MI or arthroscopic chondral osseous autograft transplantation (Cor procedure), diabetes status, Blood Urea Nitrogen (BUN), Cr, renal disease history, rhythm parameters to determine sinus v. non-sinus, heart rate, QRS duration prior to implant, left bundle branch block, systolic blood pressure, history of hypertension, smoking status, pulmonary disease, body mass index (BMI), family history of sudden cardiac death, B-type natriuretic peptide (BNP) levels, prior cardiac surgeries, medications, microvolt-level T-wave alternans (MTWA) result, and/or inducibility at electro-physiologic study (EPS).
In an embodiment, a condition can include one clinical factor or a plurality of clinical factors. In an embodiment, a clinical factor can be included within a dataset. A dataset can include one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty-one or more, twenty-two or more, twenty-three or more, twenty-four or more, twenty-five or more, twenty-six or more, twenty-seven or more, twenty-eight or more, twenty-nine or more, or thirty or more overlapping or distinct clinical factor(s). A clinical factor can be, for example, the condition of a subject in the presence of a disease or in the absence of a disease. Alternatively, or in addition, a clinical factor can be the health status of a subject. Alternatively, or in addition, a clinical factor can be age, gender, chest pain type, neutrophil count, ethnicity, disease duration, diastolic blood pressure, systolic blood pressure, a family history parameter, a medical history parameter, a medical symptom parameter, height, weight, a body-mass index, resting heart rate, and smoker/non-smoker status. Clinical factors can include whether the subject has stable chest pain, whether the subject has typical angina, whether the subject has atypical angina, whether the subject has an anginal equivalent, whether the subject has been previously diagnosed with MI, whether the subject has had a revascularization procedure, whether the subject has diabetes, whether the subject has an inflammatory condition, whether the subject has an infectious condition, whether the subject is taking a steroid, whether the subject is taking an immunosuppressive agent, and/or whether the subject is taking a chemotherapeutic agent.
Computer Implementation
The methods of the invention, including the methods of generating a prediction model and the methods for determining the likelihood of CAD in a subject, are, in some embodiments, performed on a computer.
In one embodiment, a computer comprises at least one processor coupled to a chipset. Also coupled to the chipset are a memory, a storage device, a keyboard, a graphics adapter, a pointing device, and a network adapter. A display is coupled to the graphics adapter. In one embodiment, the functionality of the chipset is provided by a memory controller hub and an I/O controller hub. In another embodiment, the memory is coupled directly to the processor instead of the chipset.
The storage device is any device capable of holding data, like a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory holds instructions and data used by the processor. The pointing device may be a mouse, track ball, or other type of pointing device, and is used in combination with the keyboard to input data into the computer system. The graphics adapter displays images and other information on the display. The network adapter couples the computer system to a local or wide area network.
As is known in the art, a computer can have different and/or other components than those described previously. In addition, the computer can lack certain components. Moreover, the storage device can be local and/or remote from the computer (such as embodied within a storage area network (SAN)).
As is known in the art, the computer is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic utilized to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device, loaded into the memory, and executed by the processor.
Embodiments of the entities described herein can include other and/or different modules than the ones described here. In addition, the functionality attributed to the modules can be performed by other or different modules in other embodiments. Moreover, this description occasionally omits the term “module” for purposes of clarity and convenience.
Methods of Therapy
The methods disclosed herein can be employed together with the treatment of subjects, e.g., by using the diagnostic methods disclosed herein in connection with such treatment.
In some aspects, a subject has stable chest pain. In some aspects, a subject has typical angina or atypical angina or an anginal equivalent. In some aspects, a subject has no previous diagnosis of myocardial infarction (MI). In some aspects, a subject has not had a revascularization procedure. In some aspects, a subject does not have diabetes. In some aspects, a subject does not have a systemic autoimmune or infectious condition. In some aspects, a subject is not currently taking a steroid, an immunosuppressive agent, or a chemotherapeutic agent.
In some embodiments, methods can be employed for the treatment of other diseases or conditions associated with CAD. A therapeutic agent can be used both in methods of treatment of CAD, as well as in methods of treatment of other diseases or conditions associated with CAD.
The methods of treatment (prophylactic and/or therapeutic) can also utilize a therapeutic agent. The therapeutic agent(s) are administered in a therapeutically effective amount (i.e., an amount that is sufficient for “treatment,” as described above). The amount which will be therapeutically effective in the treatment of a particular individual's disorder or condition will depend on the symptoms and severity of the disease, and can be determined by standard clinical techniques. In addition, in vitro or in vivo assays may optionally be employed to help identify optimal dosage ranges. The precise dose to be employed in the formulation will also depend on the route of administration, and the seriousness of the disease or disorder, and should be decided according to the judgment of a practitioner and each patient's circumstances. Effective doses may be extrapolated from dose-response curves derived from in vitro or animal model test systems.
Therapies for a subject with CAD or a subject with an increased risk of CAD can include lifestyle changes, administration of therapeutics such as drugs, and undertaking one or more procedures. Lifestyle changes can include quitting smoking, avoiding secondhand smoke, eating a heart-healthy diet, regular exercise, achieving and/or maintaining a healthy weight, weight management, enrollment in a cardiac rehabilitation program, reducing blood pressure, reducing cholesterol, managing diabetes (if present), and keeping a healthy mental attitude. Therapeutics can include aspirin, antiplatelets, ACE inhibitors, beta-blockers, statins, PCSK9 targeting therapeutics (e.g., PCSK9 inhibitors such as monoclonal antibodies such as evolocumab, bococizumab, and alirocumab), and angina medicines such as nitroglycerin. Procedures include angioplasty (with or without stenting) and bypass surgery.
Kits
Also disclosed herein are kits for assessing CAD. Such kits can include reagents for detecting expression levels of one or more markers and instructions for calculating a score based on the expression levels.
A kit can comprise a set of reagents for generating a dataset via at least one protein detection assay that is associated with a sample from the subject comprising data representing protein expression levels corresponding to at least two markers comprising corin, APOB, HSP70, RBP4, SERPINA12, NTproBNP, PIGF, adiponectin, APOA1, S100A8, MPO, S100A12, or TNFAIP6; and instructions for generating a score indicative of CAD risk by mathematically combining the data representing the protein expression levels, wherein a higher score relative to a control subject having less than 50% stenosis in all major vessels as measured using Quantitative Coronary Angiography (QCA) indicates an increased likelihood that the subject has CAD or a lower score relative to a control subject having greater than or equal to 50% stenosis in at least one major coronary vessel as measured using QCA indicates a decreased likelihood that the subject has CAD. In certain aspects, the reagents can be selected from Table 9.2. In certain aspects, the reagents comprise one or more antibodies that bind to one or more of the markers, optionally wherein the antibodies are monoclonal antibodies or polyclonal antibodies. The reagents can include reagents for performing ELISA including buffers and detection agents.
A kit can further include software for performing instructions included with the kit, optionally wherein the software and instructions are provided together. For example, a kit can include software for generating a score indicative of CAD risk by mathematically combining data generated using the set of reagents.
A kit can include instructions for classifying a sample according to a score. A kit can include instructions for rating CAD risk using a score.
A kit can include instructions for obtaining data representing at least one clinical factor associated with a subject, wherein the at least one clinical factor comprises at least one of age and gender. In certain aspects, a kit can include instructions for mathematically combining the data representing at least one clinical factor with data representing protein expression levels to generate a score.
A kit can include instructions for use of a set of reagents. For example, a kit can include instructions for performing at least one protein detection assay such as an immunoassay, a protein-binding assay, an antibody-based assay, an antigen-binding protein-based assay, a protein-based array, an enzyme-linked immunosorbent assay (ELISA), flow cytometry, a protein array, a blot, a Western blot, nephelometry, turbidimetry, chromatography, mass spectrometry, enzymatic activity, and an immunoassay selected from RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, immunoelectrophoretic, a competitive immunoassay, and immunoprecipitation.
A kit can include instructions for taking at least one action based on a score for a subject, e.g., treating the subject, advising lifestyle changes to the subject, performing a procedure on the subject, performing further diagnostics on the subject, assessing the subject's health further, optimizing medical therapy, investigating non-cardiac etiologies of symptoms, or performing angiography on the subject.
Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.
The practice of the present invention will employ, unless otherwise indicated, conventional methods of protein chemistry, biochemistry, recombinant DNA techniques and pharmacology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., T. E. Creighton, Proteins: Structures and Molecular Properties (W.H. Freeman and Company, 1993); A. L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.); Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pa.: Mack Publishing Company, 1990); Carey and Sundberg Advanced Organic Chemistry 3rd Ed. (Plenum Press) Vols A and B (1992).
Study Population
Subjects enrolled in the multicenter PREDICT trial (ClinicalTrials.gov; NCT00500617, herein incorporated by reference) served as the starting population for this study. PREDICT enrolled subjects who were symptomatic or high risk asymptomatic patients referred for invasive coronary angiography with no known previous history of myocardial infarction or cardiac intervention.
For the purpose of these analyses two sets of PREDICT non-diabetic subjects were utilized, Set 1 for the initial assessment of candidate markers and Set 2 for validation of positive markers identified in Set 1 and subsequent multi-protein model development.
Basic clinical demographics for these two sets of subjects are shown in Tables 1.1 and 2.1.
Methods
Study Logistics
The study was divided into 2 phases with regard to sets of potential biomarkers: Phase 1 evaluated 126 assays that had been previously characterized by MesoScale Discovery and were commercially available; Phase 2 evaluated 9 additional assays that were developed for CardioDx by MesoScale Discovery. Summaries of Phase 1 and 2 assays are provided below.
Phase 1 in Set 1
Panel Types
Phase 2 in Set 1
Reactions
Data Handling
Single Analyte Model Fitting—Set 1
In Phase 2, all assays measured protein at levels well above the LLOD.
The following models were fit for both Phase 1 and 2 assays:
Each assay was tested independently for association with disease. Where appropriate, concentrations were pre-adjusted for covariates.
Tables 3.1 and 4.1 summarize the results for Phase 1 biomarkers showing significant association with CAD.
Extracted p values (marker labels not preserved): 0.030, 0.038, 0.023, 0.035, 0.033, 0.027, 0.046, 0.036, 0.044, 0.042, 0.028, 0.022, 0.045, 0.021, 0.027, 0.034, 0.022, 0.009, 0.016, 0.001, 0.001, 0.009, 0.001, 0.004, 0.017, 0.044, 0.014, 0.047, 0.016.
Correlation between top Phase 1 markers was assessed; overall, pairwise correlation was low (r<0.7).
Table 6.1a gives individual p values and directionality for all CAD models using Phase 2 markers in Set 1. Significant p values are in bold.
Extracted p values (marker labels not preserved): 0.004, 0.045, 0.024, 0.004, 0.039, 0.024, 0.004, 0.032, 0.037, 0.036.
Model Building and Performance Estimates in Set 2.
In order to utilize multiple proteins in predicting a patient's disease status, two versions of disease likelihood score were produced by fitting L1-penalized logistic regression models (the “LASSO” method). The outcome variable for these models was a patient's CAD status, as defined by >=50% max stenosis if QCA was available; or if not, by >=70% max stenosis by clinical angiography and/or >=50% stenosis in the left main vessel.
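As a rough illustration of L1-penalized logistic regression, the following sketch fits such a model by proximal gradient descent with soft-thresholding. The function names, the penalty value `lam`, the iteration count, and the solver itself are illustrative assumptions; the description does not state which software or penalty value was used.

```python
import math

def soft_threshold(value, amount):
    """Shrink value toward zero by amount (the proximal operator of the L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(rows, labels, lam, n_iter=2000):
    """Fit logit Pr(y=1) = b0 + sum(b_j * x_j) with an L1 penalty lam on the b_j.

    rows: list of feature lists; labels: list of 0/1 values.
    The intercept b0 is not penalized. Returns (b0, [b_1, ..., b_p]).
    """
    n, p = len(rows), len(rows[0])
    # Conservative step size from a bound on the gradient's Lipschitz constant.
    step = 4.0 / max(1.0 + sum(x * x for x in row) for row in rows)
    b0, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for row, y in zip(rows, labels):
            eta = b0 + sum(b * x for b, x in zip(coefs, row))
            resid = 1.0 / (1.0 + math.exp(-eta)) - y
            grad0 += resid / n
            for j, x in enumerate(row):
                grad[j] += resid * x / n
        b0 -= step * grad0
        coefs = [soft_threshold(b - step * g, step * lam)
                 for b, g in zip(coefs, grad)]
    return b0, coefs
```

A sufficiently large penalty drives every marker coefficient to exactly zero, which is the property that lets this kind of model select a subset of markers.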
The first version of the risk score was fitted using all 14 selected markers (Table 6.1), and is as follows:
SCORE1=0.03165626−0.126123955*APOA1+0.115560254*NT-proBNP
The second version of the risk score was produced by restricting attention to the markers that were included in Set 1A models (Table 6.1). Due to the more restrictive initial selection, the resulting model was more permissive about including proteins and is as follows:
RISKSCORE2=0.033643483+0.288633218*NT-proBNP−0.259370805*APOA1−0.09760706*Adiponectin+0.067488037*PlGF+0.106117284*S100A8-MPO
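The two score equations above can be written as plain functions, as in the sketch below. The coefficients are copied as printed. The parameter names are illustrative, the inputs are assumed to be marker values pre-processed in the same way as the data used for fitting, and "S100A8-MPO" is read here as a single combined term; none of these assumptions is confirmed by the equations themselves.

```python
def score1(apoa1, nt_probnp):
    """First risk score: intercept plus weighted APOA1 and NT-proBNP values."""
    return 0.03165626 - 0.126123955 * apoa1 + 0.115560254 * nt_probnp

def riskscore2(nt_probnp, apoa1, adiponectin, plgf, s100a8_mpo):
    """Second risk score; s100a8_mpo is assumed to be one combined S100A8/MPO value."""
    return (0.033643483
            + 0.288633218 * nt_probnp
            - 0.259370805 * apoa1
            - 0.09760706 * adiponectin
            + 0.067488037 * plgf
            + 0.106117284 * s100a8_mpo)
```

In both scores a higher NT-proBNP value raises the score and a higher APOA1 value lowers it, consistent with the signs of the printed coefficients.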
Tables 6.1b and 7.1 summarize the markers and coefficients for the two models side by side, including the model weights.
Model performance was estimated via 2500 iterations of cross-validation on random holdout sets of 14 patients; Area-Under-the-Curve (AUC) estimates are given in Table 8.1.
Overview
The purpose of this analysis was to determine the combined performance of certain markers and/or factors at predicting obstructive CAD (oCAD). This process utilized stages of model building and selection, as well as some variable selection in the form of clinical covariate inclusion. The main analyses presented here center on the CADP2 group of PREDICT patients, which were independent of the PREDICT CADP1 set used to select the proteomics markers initially. The marker set forming the basis of this analysis is a composite of clinical data, Corus test results, and several proteomics data sets that were generated in different stages. Of these, the new results presented here are the addition of the 5 selected markers from Custom Set 2 to the previously selected 10 protein markers from the Catalog 126 and Custom Set 1. There were a total of N=472 patients with full data on the most recent proteomics data set (Custom Set 2), and this forms the basis of the group that was analyzed. Their clinical characteristics are summarized in Tables 1 and 2.
Methods
Cohort and Marker Selection
Markers for this experiment were selected from several sets of candidate markers previously assayed on the CADP1 set of patients (1A, 1B or both). From the Catalog 126 and Custom Set 1 experiments, the markers NT-proBNP, PlGF, S100A8, MPO, APOA1, Adiponectin, S100A12, and TNFAIP6 were selected. From the Custom Set 2 experiment using CADP1A patients (n=183, m=15), 5 markers were selected to continue into this validation set: APOB, corin, HSP70, RBP4, and SERPINA12. CADP1A is a group of cases and controls matched for age, gender, and some covariates, selected for extreme case and control status. The data from this discovery set was produced by Mesoscale (MSD), while the validation data was produced in-house, using antibody-coated plates created by MSD for the prior discovery study.
Markers and Reagents
The accession numbers for markers are shown in Table 9.1. Reagents used to detect each marker via ELISA are shown in Table 9.2.
Response Variables
The response variable used is a combined reference (continuous variable=Stenosis.Combo, case/control=CAD or CAD.RespNum), which defines a case as QCAMaxStenosis (QCA)>50%, if available. If QCA is unavailable, a case is defined as MaxStenosis>70%; all remaining patients are controls. MaxStenosis is the clinical angiographic read, while QCAMaxStenosis is the quantitative coronary angiography read. Quantitative Coronary Angiography (QCA) is described in Garrone P, Biondi-Zoccai G, Salvetti I, Sina N, Sheiban I, Stella PR, Agostoni P. Quantitative coronary angiography in the current era: principles and applications. J Interv Cardiol. 2009 December; 22(6):527-36. doi: 10.1111/j.1540-8183.2009.00491.x. Epub 2009 Jul. 13. Review. PubMed PMID: 19627430.
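The case/control rule described above can be sketched as a small function. The function and argument names are hypothetical, stenosis values are assumed to be percentages, and `None` is assumed to mark an unavailable read. The strict inequalities follow this paragraph as written.

```python
def is_case(qca_max_stenosis=None, max_stenosis=None):
    """Return True for a case and False for a control.

    qca_max_stenosis: quantitative angiography read in percent, or None.
    max_stenosis: clinical angiographic read in percent, or None.
    """
    if qca_max_stenosis is not None:
        # QCA takes precedence whenever it is available.
        return qca_max_stenosis > 50
    if max_stenosis is not None:
        return max_stenosis > 70
    # All remaining patients are treated as controls.
    return False
```

Because the QCA read takes precedence, a subject with a QCA value of 40% is a control under this rule even if the clinical read exceeds 70%.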
Clinical Covariates (Clinical Factors)
For clinical covariates, earlier work had indicated that age and gender are important predictors of oCAD, so they were included in all main models. Some of the exploratory models do not include these predictors. Earlier work indicated some non-linearity in the relationship between oCAD, age and gender. To explore this, various splines for these predictors were put into different models. The main spline used was to include 3 knots for age, at 20, 60, and 80 years, based on the previous results.
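One common way to build a 3-knot spline for age is a restricted (natural) cubic spline basis, sketched below with knots at 20, 60, and 80 years. The choice of a restricted cubic spline and the scaling of the nonlinear term are assumptions made for illustration; the description states the knot locations but not the spline family.

```python
def restricted_cubic_basis(x, knots=(20.0, 60.0, 80.0)):
    """Return [x, nonlinear terms...] for a restricted cubic spline.

    With k knots there are k - 2 nonlinear terms, so 3 knots give 2 columns.
    The fitted curve is linear below the first knot and above the last knot.
    """
    def cube_pos(u):
        return u ** 3 if u > 0 else 0.0

    t = list(knots)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2  # keeps the nonlinear term on the scale of x
    width = t[k - 1] - t[k - 2]
    terms = [x]
    for j in range(k - 2):
        term = (cube_pos(x - t[j])
                - cube_pos(x - t[k - 2]) * (t[k - 1] - t[j]) / width
                + cube_pos(x - t[k - 1]) * (t[k - 2] - t[j]) / width)
        terms.append(term / scale)
    return terms
```

With 3 knots the age effect costs two parameters (one linear and one nonlinear column), which is relevant when counting degrees of freedom for model complexity.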
Three other clinical covariates had been found in prior work to be important predictors within subsets of age and gender: Smoking status, Dyslipidemia diagnosis, and type of Chest Pain. These were included in some of the main models, encoded as binary variables based on prior observations. They were encoded as follows:
Signal Pre-Processing: Compilation, Inference, Truncation, and Transformation
The two marker data sets (Catalog 126+Custom Set 1, Custom Set 2) have different pre-processing steps to arrive at the actual values used in this analysis.
The Custom Set 2 data was generated by splitting the patients into 6 patient sets. For each protein to be assayed, duplicate plates were produced for each patient set. For the APOB assay, 3 of the patient sets were diluted to one level, while the other 3 patient sets were diluted to another, less-dilute level. There were noticeable and consistent shifts in the APOB values from the first 3 sets relative to the second 3, even after the standard curve adjustment. Some of the other markers also showed evidence of systematic plate effects, although not as dramatic as the APOB shift. Additional normalization beyond the standard curve application was therefore performed by first log2 transforming the concentration values, then subtracting off the deviations of individual plate medians from the overall median of each assay (centering the concentration values within each assay). Missing values were then imputed, and the mean of the two replicate values per sample, per assay was calculated. This was the original value used for analysis. No truncation or attempt to identify outliers was performed at this time, after visual examination of the data implied that this was not warranted. More specifically, the imputation was performed as follows: for samples with a missing value and an indication that the replicate was below the lower limit of detection, the imputed value was sampled with uniform probability from the range (min. observed concentration, 2.5 percentile). For samples with a missing value and an indication that they were above the upper limit of quantification, the imputed value was sampled with uniform probability from the range (97.5 percentile, max. observed concentration). For samples that had missing values but no indication that they were above or below limits of detection, if the replicate value was non-missing, this was used to replace the missing replicate value. 
There were no cases where both replicates of the sample for a marker were missing and no below or above limits of quantification flag was given. Imputed values were then truncated to ±3 MAD (median absolute deviation) of the study median, as calculated within each marker.
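The pre-processing steps described above (plate-median centering on the log2 scale, uniform imputation at the extremes, and truncation to ±3 MAD) can be sketched as follows. All function names are hypothetical. The percentile interpolation rule and the use of an unscaled MAD are assumptions, since the description does not specify either.

```python
import math
import random
import statistics

def center_plates(conc_by_plate):
    """log2-transform, then remove each plate median's deviation from the assay median."""
    logged = {plate: [math.log2(c) for c in values]
              for plate, values in conc_by_plate.items()}
    overall = statistics.median(v for values in logged.values() for v in values)
    return {plate: [v - (statistics.median(values) - overall) for v in values]
            for plate, values in logged.items()}

def percentile(sorted_values, pct):
    """Percentile by linear interpolation between order statistics (an assumed rule)."""
    pos = (len(sorted_values) - 1) * pct / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def impute_out_of_range(observed, side, rng):
    """Draw uniformly from (min, 2.5th pct) for 'below' or (97.5th pct, max) for 'above'."""
    ordered = sorted(observed)
    if side == "below":
        return rng.uniform(ordered[0], percentile(ordered, 2.5))
    if side == "above":
        return rng.uniform(percentile(ordered, 97.5), ordered[-1])
    raise ValueError("side must be 'below' or 'above'")

def truncate_mad(values, n_mad=3.0):
    """Clamp values to median ± n_mad * MAD (unscaled median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    low, high = med - n_mad * mad, med + n_mad * mad
    return [min(max(v, low), high) for v in values]
```

After centering, every plate shares the assay-wide median, which removes a constant plate shift such as the one attributed to the APOB dilution difference.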
Missingness
There was a small amount of missing data in the data set which was complete for all Custom Set 2 markers (N=472). In general, the strategy that was taken was to impute values for the missing data because of its low frequency. The exception was that two of these subjects were missing data for all 10 Catalog 126/Custom Set 1 markers, and these were excluded from further analysis. The imputation details are as follows: For the Catalog 126 markers, three subjects were missing S100A8, S100A12, MPO and TNFAIP6 data. These subjects were imputed to have the median value for each marker in the data set.
Thirteen subjects were missing Corus scores. These were imputed to be the median Corus score by age group and gender of the subject, where age group was defined here as 25-40, 41-50, 51-60, 61-70 and 71-95 years of age. This bucketing was selected because of the importance of the 60 year old cutoff in the Corus female scores, and to create reasonably similar sized groups.
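The group-median imputation described above can be sketched as follows, using the stated age buckets. The record layout (dictionaries with `age`, `gender`, and `corus` keys) and the function names are hypothetical, and ages are assumed to be whole years.

```python
import statistics

AGE_GROUPS = [(25, 40), (41, 50), (51, 60), (61, 70), (71, 95)]

def age_group(age):
    """Return the (low, high) bucket containing an age given in whole years."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return (low, high)
    raise ValueError(f"age {age} is outside the defined groups")

def impute_group_median(subjects):
    """Fill missing 'corus' values with the median of the same age group and gender.

    subjects: list of dicts with keys 'age', 'gender', 'corus' (None if missing).
    Returns a new list; the input records are not modified.
    """
    by_group = {}
    for s in subjects:
        if s["corus"] is not None:
            key = (age_group(s["age"]), s["gender"])
            by_group.setdefault(key, []).append(s["corus"])
    filled = []
    for s in subjects:
        score = s["corus"]
        if score is None:
            score = statistics.median(by_group[(age_group(s["age"]), s["gender"])])
        filled.append({**s, "corus": score})
    return filled
```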
For the clinical covariates, 24 subjects were missing a Dyslipidemia diagnosis, and two additional subjects were missing smoking and chest pain data. See details in the model selection section, but due to the gender-specific coding of these covariates in the models, if the subject was of a gender or gender*age group that was automatically treated as 0, this was the imputed value for that subject and covariate. If the subject was in the gender*age group that had the potential to have a 1 value, the imputed value was sampled from a binary variable whose probability of being 1 was the frequency of that category in the patient group as a whole. For example, for Dyslipidemia, the 16 male subjects with missing data were imputed to have a 0 value, the seven females who were younger than 65 years were imputed to be 1 with a probability of 0.38, which is the frequency of Dyslipidemia in the entire patient set, and the remaining female was imputed to be 0.
Five subjects were missing Diamond-Forrester predicted values. Two of these were the same subjects missing the Chest Pain variables above. These were imputed to have intermediate risks for their age group and sex. The other three were imputed to have the risks associated with their age group, sex, and chest pain symptoms.
Data Characterization
Results
Multivariate Model Building
For the model building, there were several things to consider, including marker selection, the amount of complexity to specify for each term, and the amount of summarization to use to account for collinearity of model terms. Several decisions on these items had been made in prior analyses, and these were carried forward. With regards to the marker selection, a set of candidate markers had been selected as top hits from three previous discovery data sets using the CADP1 patient groups (the Catalog 126, Custom 1 and Custom 2 sets). The use of clinical covariates in models was limited to top predictors identified in previous analyses.
With regards to collinearity and the optimal amount of summarization of the predictor variables, two pairs of markers had been previously identified as being highly correlated (S100A8, MPO) and (S100A12, TNFAIP6). The mean value of each of these pairs was used in all models, rather than the individual marker values due to the extent of the correlation, and these are referred to elsewhere herein as A8MPO and A12TNF. The data currently available for modelling the Catalog 126 and Custom Set 1 was pre-processed, including some form of outlier identification and removal, centering and scaling and estimation of a ‘Batch’ effect. Additionally, some other predictor variables showed some correlation amongst themselves. The largest of these was the pair of Adiponectin and APOA1. For those models that looked at the effect additional summarization would have on the fit, the mean of these two values was used as a single term, calculated from their centered and scaled forms because that was the form in which the data were currently available.
Following Harrell's general rules of thumb, and based on approximately N=440 subjects available, the target for the total number of degrees of freedom available for model selection was set at approximately N/15≈29. For the minimal full linear model, the predictor variables of interest were determined to require: 7 parameters for the Catalog126+Custom Set 1 (6 markers after combining into A8MPO and A12TNF plus a Batch adjustment term), 5 parameters for the Custom Set 2, and 5 clinical covariates plus an overall intercept and a term for the Catalog model, or 19 total parameters. This left roughly 10 degrees of freedom available for modelling non-linear complexity that could reasonably be specified in the models. Complexity was partitioned out in the rank order of the predictor strengths, based on previous results. The first priority was to model complexity of the relationship of Age with oCAD. Then NTproBNP, HSP70, APOA1, RBP4, Adiponectin, and corin were the rank order of the previous effect estimate strengths. After some consideration, only Age and NTproBNP non-linearity were explored in these models, due to sample size. Without being bound by theory, it is thought that further model optimization could be pursued during algorithm development based on the results observed here.
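The degrees-of-freedom budget in this paragraph can be checked with a few lines of arithmetic. The dictionary labels below are descriptive shorthand chosen for this sketch; the counts are those stated in the paragraph.

```python
def df_budget(n_subjects, subjects_per_df=15):
    """Rule-of-thumb budget: about one model degree of freedom per 15 subjects."""
    return n_subjects // subjects_per_df

linear_parameters = {
    "Catalog126 + Custom Set 1 (6 marker terms + Batch)": 7,
    "later custom marker set (5 markers)": 5,
    "clinical covariates": 5,
    "overall intercept": 1,
    "Catalog model term": 1,
}

total_budget = df_budget(440)                    # 440 // 15 = 29
total_linear = sum(linear_parameters.values())   # 7 + 5 + 5 + 1 + 1 = 19
remaining_for_nonlinearity = total_budget - total_linear  # 29 - 19 = 10
```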
Based on these calculations, a pre-specified set of 11 main models to address these primary questions of interest in modelling (summarization, complexity, and some limited marker selection), was compiled. To protect against inflation of model performance estimates, and yet still be able to use all the available data for model selection, Efron's optimism bootstrap was employed for all model performance measure estimates such as AUC, Sensitivity, etc. It was found that optimism estimates appeared to be converging after approximately 400 bootstrap iterations were performed. In the end 1000 iterations per main model were run, for these results.
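The optimism bootstrap mentioned above can be sketched generically: a model is refitted on each bootstrap resample, its performance on the resample is compared with its performance on the original data, and the average difference (the optimism) is subtracted from the apparent performance. The toy threshold model and accuracy metric below are placeholders chosen to keep the example short; they are not the models or metrics used in this analysis.

```python
import random

def optimism_corrected(data, fit, metric, n_boot=200, seed=0):
    """Optimism-corrected estimate of a performance metric.

    fit(data) -> model; metric(model, data) -> float.
    """
    rng = random.Random(seed)
    apparent = metric(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit(sample)
        optimism += metric(model, sample) - metric(model, data)
    return apparent - optimism / n_boot

def fit_midpoint(data):
    """Toy model: a threshold halfway between the class means of x."""
    cases = [x for x, y in data if y == 1]
    controls = [x for x, y in data if y == 0]
    if not cases or not controls:  # degenerate resample with one class only
        return sum(x for x, _ in data) / len(data)
    return (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2

def accuracy(threshold, data):
    """Fraction of (x, y) pairs where 'x > threshold' matches y == 1."""
    return sum((x > threshold) == (y == 1) for x, y in data) / len(data)
```

Because every subject contributes to both fitting and evaluation, this approach uses all available data while still adjusting the performance estimate for overfitting.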
Independence of Information
During the model planning process, the independence of the predictor variables was assessed to determine whether any correlated variables might be better represented by summarization of their data into a single variable. Several measures of similarity were considered, including the rank correlation measure of Spearman (Table 18).
Main Model Set
The main models considered were all logistic regression models with a binary response: stenosis >=50% by QCA defined a case, stenosis >=70% defined a case if QCA was unavailable, and all others were controls (Table 7). The available Catalog126+Custom Set 1 data had a Batch intercept effect for which the terms in these sets needed to be adjusted, so it was decided to create hierarchical models, where first a model was fitted for just this set of data alone. The predicted model values (on the Xβ scale) were calculated for each subject and used to create an additional variable, called the “Catalog” term (Table 6). There were 5 such base-level Catalog models considered. This Catalog term was then put into the higher level main model as a single predictor. The models are described below. The general strategy for the main models was to explore the effects of increased predictor complexity, increased predictor summarization, and the effects of both together.
Model Performance
AIC Values and Determining the Best Model
The ability of each model to explain the variation in the data was compared using two statistics: AIC, which is the deviance of the model plus two times the number of parameters estimated by the model, and AICc, which penalizes the number of parameters in the model more severely than the original AIC, relative to the number of subjects in the data set. AICc can be calculated as AICc=AIC+2p(p+1)/(n−p−1), where p is the number of parameters in the model and n is the number of subjects used in model fitting.
The median AIC values for the main models are shown in
With regard to AIC, Models 6 and 9 look the most promising, but by the AICc measure Model 7 is superior. Models 6 and 9 are similar to each other, both with non-linear age and gender splines, the summarization of Adiponectin and APOA1 into a single term, and a 3-knot spline fitted to NTproBNP. Model 9 additionally has the clinical covariates. Model 7 is a relatively simple linear, additive model, differing from Model 1 only in the combined Adiponectin-APOA1 term. Model 7 was selected as the reference point for the performance of the current proteomic marker set after Discovery efforts for three reasons: it has adequate AIC while Models 6 and 9 look less appealing by AICc; the coefficients fitted to the spline terms varied widely among the bootstrap models; and its reduced complexity could be of benefit in diagnostic development.
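The two information criteria compared in this section can be computed as in the sketch below. AIC follows the definition given here (deviance plus twice the number of parameters), and the small-sample correction is the standard AICc form. The function and argument names are illustrative.

```python
def aic(deviance, n_params):
    """AIC as defined here: model deviance plus two times the parameter count."""
    return deviance + 2 * n_params

def aicc(deviance, n_params, n_subjects):
    """AICc: AIC plus the small-sample correction 2p(p+1)/(n - p - 1)."""
    p, n = n_params, n_subjects
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n_subjects > n_params + 1")
    return aic(deviance, p) + 2 * p * (p + 1) / (n - p - 1)
```

The correction grows with the number of parameters, so a spline-heavy model can rank well by AIC yet fall behind a simpler additive model by AICc.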
Odds Ratios for the Selected Model
For explorative purposes, the odds ratios for the final Model 7, as fitted to the full CADP2 data set are shown in
AUC Values and ROC Curves
Although Model 7 was selected based on AICc and not on the basis of its AUC, it does have a superior value in the main model set (see Table 21).
Other Model Performance Statistics
With regard to other measures of model performance such as Sensitivity and Specificity, two cutoffs were considered for the proteomics model. The first was to set the cutoff so that a positive result was all subjects with predicted probabilities >20% of having oCAD >50%. This was the criterion for the cutoff set for the original Corus test. The second cut point examined was the Youden cutoff, which takes the point at which the minimum distance from the upper left corner to the ROC curve occurs. This tends to maximize sensitivity and specificity simultaneously. This was compared to the performance in the same CADP2 patient set using a cutoff of 15 for Corus (Table 10).
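The cutoff choices discussed above can be sketched as follows. Two data-driven criteria are shown because they are distinct in general: the Youden index maximizes sensitivity + specificity − 1, whereas the closest-to-corner rule minimizes the distance from the ROC point to the upper left corner. The two often select the same cutoff but need not. Function names and the convention that a positive call means probability strictly above the cutoff are assumptions.

```python
import math

def sens_spec(probs, labels, cutoff):
    """Sensitivity and specificity when 'positive' means probability > cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p > cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p <= cutoff)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg

def youden_cutoff(probs, labels):
    """Cutoff maximizing J = sensitivity + specificity - 1."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda c: sum(sens_spec(probs, labels, c)) - 1)

def corner_cutoff(probs, labels):
    """Cutoff whose ROC point lies closest to (FPR=0, TPR=1)."""
    def distance(c):
        sens, spec = sens_spec(probs, labels, c)
        return math.hypot(1 - sens, 1 - spec)
    return min(sorted(set(probs)), key=distance)
```

A fixed cutoff such as 20% is evaluated by calling `sens_spec(probs, labels, 0.20)` directly.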
Model 7 Performance on Certain Subsets of Subjects.
After fitting the main model 7 on the entire N=470 CADP2 group of patients, it was then applied to several subsets to compare performance in these groups to Corus. Because the comparison of the model to Corus was the primary goal here, the fitted model was then used to predict the data excluding the subjects with imputed Corus data (referred to as the ‘All’ set). The subsets were then taken from this non-Corus imputed data set. Note that these estimates are unadjusted for optimism, and so are somewhat higher than the actual performance estimates given in the earlier main results. However, both Corus performance and Model 7 performance are calculated on the same subsets of subjects (Table 11).
Exploratory Analyses
Several sets of additional models were run in this analysis for exploratory purposes. The first was a set of models comparing results from the best proteomics model of the main field with a variety of combinations of Corus results on the same subjects (N1=457).
The second set looked at performance of both Corus and the best proteomics model on the cohort, excluding all subjects used in Corus AlgDev originally (N2=364). The third set of models looked at the performance of Corus, and the best proteomics model. The fourth set examined the model 7 predictor terms in a proportional odds regression model, performed on the full set of 470 subjects.
Exploratory Set 1: Corus and Proteomics
These models were run on a data set very close to that used for the main models (Table 12). The only difference was that the 13 subjects missing Corus scores were excluded from this exploratory analysis, resulting in a total sample size of 457. In the earlier main model results, Corus scores were imputed for these 13 (see earlier sections for details).
Exploratory Set 2: Excluding AlgDev Samples
Exploratory Set 2 was run on the subjects with available proteomic data that were not originally used in Algorithm Development for Corus (N2=364). The models are listed in Table 13.
Exploratory Set 3: Proteomics, Corus
Exploratory Set 3 was run on the CADP2A subjects (N3=176). The models are listed in Table 14.
Exploratory Set 4: Ordinal Regression
Exploratory Set 4 was run on the full set of subjects (N4=470). The model is listed in Table 15.
Model 7 Calibration and Discrimination
A comparison of the predicted values for the Model 7 results to the stenosis of the patient can be seen in
A comparison of the predicted values for the Corus RNA expression-based test and the Model 7 results can be seen in
Tables 23-25: Listing of Odds Ratio estimates from logistic regression exploratory models
Final Model 7 Equation
For a logistic regression model where logit{Pr(CAD=1|X)}=Xβ, the final fitted equation is:
Xβ̂=−5.29177+1.34519*I(Sex=Male)+0.6996(Age)+0.76010(Set1)+0.02924(corin)+0.26173(APOB)−0.12978(HSP70)−0.05482(RBP4)−0.20628(SERPINA12)
Set1=−0.38017Batch−0.47149AdipA1+0.43946NTproBNP+0.18471PlGF+0.17573A8MPO+0.19449A12TNF
where I(Sex=Male) is an indicator function that is 1 if the subject is male and 0 otherwise, Age is expressed in years, and the protein marker values are transformed as log2(Calculated Concentration+2). Set1 is the predictor function from a nested logistic regression model with the same response variable, that is, logit{Pr(CAD=1|X)}, where the predictors X differ from those of the full model, as listed in the equations above. Several terms are the means of two protein assays within a patient: AdipA1 (mean of Adiponectin and APOA1), A8MPO (mean of S100A8 and MPO), and A12TNF (mean of S100A12 and TNFAIP6).
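The fitted equation above can be evaluated as in the following sketch. The coefficients are copied as printed and the function names are hypothetical. The marker arguments are assumed to be already transformed and pre-processed in the same way as the fitting data, and the coding of the Batch term is not specified here, so the output should be treated as illustrative only.

```python
import math

def transform(concentration):
    """Marker transformation stated in the text: log2(calculated concentration + 2)."""
    return math.log2(concentration + 2)

def set1_term(batch, adip_a1, nt_probnp, plgf, a8mpo, a12tnf):
    """Nested (lower level) linear predictor that enters Model 7 as 'Set1'."""
    return (-0.38017 * batch - 0.47149 * adip_a1 + 0.43946 * nt_probnp
            + 0.18471 * plgf + 0.17573 * a8mpo + 0.19449 * a12tnf)

def model7_linear_predictor(is_male, age, set1, corin, apob, hsp70, rbp4, serpina12):
    """Model 7 linear predictor on the log-odds scale."""
    return (-5.29177 + 1.34519 * (1 if is_male else 0) + 0.6996 * age
            + 0.76010 * set1 + 0.02924 * corin + 0.26173 * apob
            - 0.12978 * hsp70 - 0.05482 * rbp4 - 0.20628 * serpina12)

def model7_probability(**inputs):
    """Inverse-logit of the linear predictor: predicted probability of CAD."""
    xb = model7_linear_predictor(**inputs)
    return 1.0 / (1.0 + math.exp(-xb))
```

Keeping the nested Set1 term as its own function mirrors the hierarchical structure of the model, in which the lower level predictor is computed first and then entered as a single term.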
This example provides results of a subtractive analysis where all possible subsets of the full model of interest were run using logistic regression:
High Level: logit{Pr(obstructive CAD)}=Intercept+Age+Sex+APOB+corin+HSP70+RBP4+SERPINA12+Lower Level Model Fitted Value
Lower Level: logit{Pr(obstructive CAD)}=Intercept+AdipA1+NTproBNP+PlGF+A8MPO+A12TNF,
where AdipA1 is the mean of Adiponectin and APOA1, A8MPO is the mean of S100A8 and MPO and A12TNF is the mean of S100A12 and TNFAIP6.
Each new model created via the subtractive analysis was a logistic regression model, which was fitted using an iteratively reweighted least squares method. Each time a new model was fit, this method calculated the coefficients or “weights” of the terms that minimize the least squares criterion for that specific model. For each particular model, these can vary due to the presence/absence of particular terms and the amount of information they each give about the response variable.
Two measures of model performance were collected for each new sub-model: AICc, Akaike's Information Criterion corrected for the number of cases in the model fitting set (here AICc=AIC+2p(p+1)/(n−p−1), where p is the number of parameters in the model and n is the number of cases used in model fitting; n=156), and AUC (area under the curve). For the AICc, the smaller the value, the better the model captures the information in the data set, while for the AUC, the larger the value, the better the model classifies patients as having or not having obstructive CAD. AUC is the area under the ROC curve, calculated in the standard way; it is a rank-based statistic equal to the probability, over all possible (case, control) pairs, that the model assigns the case a higher risk of disease than the control.
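The pairwise interpretation of AUC described above can be computed directly, as in this sketch. Counting tied scores as half a correct ordering is a common convention and is an assumption here, as is the function name.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (case, control) pairs ordered correctly.

    scores: model risk scores; labels: 1 for a case, 0 for a control.
    Tied scores count as half a correctly ordered pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

For example, with scores 0.1 and 0.4 for two controls and 0.35 and 0.8 for two cases, three of the four (case, control) pairs are ordered correctly, giving an AUC of 0.75.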
The AICc and the AUC were calculated after the models were fit, where the coefficient values were determined. They were calculated in the same way for all models and are therefore broadly comparable across models, despite the differences in the specific terms used in each model as part of the subtractive analysis. The individual models and values of AICc and AUC for each model are given in Table 26A-B. In total, 4094 distinct, new models were generated and tested for this example.
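One accounting that reproduces the stated number of models is sketched below: the 12 named terms of the high level and lower level models are toggled independently, and the empty model and the full model are excluded, giving 2^12 − 2 = 4094 subsets. This enumeration scheme is an assumption made for illustration; the description does not spell out how the subsets were formed.

```python
from itertools import combinations

HIGH_LEVEL = ["Age", "Sex", "APOB", "corin", "HSP70", "RBP4", "SERPINA12"]
LOWER_LEVEL = ["AdipA1", "NTproBNP", "PlGF", "A8MPO", "A12TNF"]

def proper_subsets(terms):
    """Yield every non-empty subset of terms that omits at least one term."""
    for size in range(1, len(terms)):
        yield from combinations(terms, size)

n_models = sum(1 for _ in proper_subsets(HIGH_LEVEL + LOWER_LEVEL))
```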
While the invention has been particularly shown and described with reference to a preferred embodiment and various alternate embodiments, it will be understood by persons skilled in the relevant art that various changes in form and details can be made therein without departing from the spirit and scope of the invention.
All references, issued patents and patent applications cited within the body of the instant specification are hereby incorporated by reference in their entirety, for all purposes.
This application claims the benefit of U.S. Provisional Application No. 62/212,935, filed Sep. 1, 2015, which is hereby incorporated by reference, in its entirety, for all purposes.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US16/49717 | 8/31/2016 | WO | 00 |
| Number | Date | Country |
|---|---|---|
| 62212935 | Sep 2015 | US |